On Tuesday 19 February 2008 16:44, KOSAKI Motohiro wrote:
> background
> ========================================
> current VM implementation doesn't has limit of # of parallel reclaim.
> when heavy workload, it bring to 2 bad things
>   - heavy lock contention
>   - unnecessary swap out
>
> abount 2 month ago, KAMEZA Hiroyuki proposed the patch of page
> reclaim throttle and explain it improve reclaim time.
>       http://marc.info/?l=linux-mm&m=119667465917215&w=2
>
> but unfortunately it works only memcgroup reclaim.
> Today, I implement it again for support global reclaim and mesure it.
>
>
> test machine, method and result
> ==================================================
> <test machine>
>       CPU:  IA64 x8
>       MEM:  8GB
>       SWAP: 2GB
>
> <test method>
>       got hackbench from
>               http://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c
>
>       $ /usr/bin/time hackbench 120 process 1000
>
>       this parameter mean consume all physical memory and
>       1GB swap space on my test environment.
>
> <test result (average of 3 times measurement)>
>
> before:
>       hackbench result:               282.30
>       /usr/bin/time result
>               user:                   14.16
>               sys:                    1248.47
>               elapse:                 432.93
>               major fault:            29026
>       max parallel reclaim tasks:     1298
>       max consumption time of
>        try_to_free_pages():           70394
>
> after:
>       hackbench result:               30.36
>       /usr/bin/time result
>               user:                   14.26
>               sys:                    294.44
>               elapse:                 118.01
>               major fault:            3064
>       max parallel reclaim tasks:     4
>       max consumption time of
>        try_to_free_pages():           12234
>
>
> conclusion
> =========================================
> this patch improve 3 things.
> 1. reduce unnecessary swap
>    (see above major fault. about 90% reduced)
> 2. improve throughput performance
>    (see above hackbench result. about 90% reduced)
> 3. improve interactive performance.
>    (see above max consumption of try_to_free_pages.
>     about 80% reduced)
> 4. reduce lock contention.
>    (see above sys time. about 80% reduced)
>
>
> Now, we got about 1000% performance improvement of hackbench :)
>
>
>
> foture works
> ==========================================================
>  - more discussion with memory controller guys.

Hi,

Yeah this is definitely needed and a nice result.

I'm worried about a) placing a global limit on parallelism, and b)
placing a limit on parallelism at all.

I think it should maybe be a per-zone thing...

What happens if you make it a per-zone mutex, and allow just a single
process to reclaim pages from a given zone at a time? I guess that is
going to slow down throughput a little bit in some cases though...

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to