Hi Minchan,
On Mon, May 19, 2014 at 5:16 AM, Minchan Kim wrote:
> Linux doesn't have the ability to free pages lazily, while other OSes
> already support it via madvise(MADV_FREE).
Since this patch changes the ABI, could you please CC future
iterations to linux-...@vger.kernel.org as per
Documentation/SubmitChecklist.
Thanks,
Michael
> The gain is clear: the kernel can discard freed pages rather than
> swapping them out or hitting OOM when memory pressure happens.
>
> Without memory pressure, freed pages can be reused by userspace
> without additional overhead (e.g., page fault + allocation +
> zeroing).
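A minimal userspace sketch of the reuse path described above. It assumes a kernel and C library headers that provide MADV_FREE; the `#ifndef` fallback to MADV_DONTNEED and the helper names `lazy_free` and `demo_reuse_after_free` are hypothetical illustrations, not part of this patch. The sketch dirties a page, frees it lazily, then stores to it again. A store after MADV_FREE re-dirties the page, so the value just written is what reads back whether or not reclaim ran in between.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: headers that predate MADV_FREE degrade to MADV_DONTNEED. */
#ifndef MADV_FREE
#define MADV_FREE MADV_DONTNEED
#endif

/* Hypothetical helper: mark [p, p + len) as lazily freeable, falling back
 * to MADV_DONTNEED if the running kernel rejects MADV_FREE. */
static int lazy_free(void *p, size_t len)
{
    if (madvise(p, len, MADV_FREE) == 0)
        return 0;
    return madvise(p, len, MADV_DONTNEED);
}

/* Dirty a page, free it lazily, reuse it. Returns the byte read back
 * after reuse, or -1 on error. */
static int demo_reuse_after_free(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    assert((uintptr_t)p % len == 0);   /* madvise() needs page alignment */

    memset(p, 0xAA, len);              /* live data: the page is dirty */
    if (lazy_free(p, len) != 0) {      /* contents are now disposable */
        munmap(p, len);
        return -1;
    }

    /* If reclaim has not run, the page is still mapped and this store just
     * sets the dirty bit again (no fault, allocation or zeroing). If the
     * page was discarded, the store faults in a fresh zeroed page. Either
     * way the value just written is what reads back. */
    p[0] = 0x55;
    int v = p[0];
    munmap(p, len);
    return v;
}
```

`demo_reuse_after_free()` should return 0x55. Bytes that were not rewritten may read back as either the old 0xAA data or zero, depending on whether reclaim discarded the page.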
>
> How it works:
>
> When the madvise syscall is called, the VM clears the dirty bit of the
> ptes in the range. If memory pressure happens, the VM checks the dirty
> bit in the page table; if the page is still "clean", it is a
> "lazyfree page", so the VM can discard it instead of swapping it out.
> If there was a store to the page before the VM picked it for reclaim,
> the dirty bit is set, so the VM swaps the page out instead of
> discarding it.
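The dirty-bit protocol above can be modeled as a toy state machine. This is an illustration under simplifying assumptions, not kernel code. `struct toy_pte` and the `toy_*` functions are hypothetical. A single flag stands in for the hardware dirty bit, and every anonymous page is assumed to start out dirty. Swap cache and TLB handling are ignored. The real patch also has to flush the TLB after clearing the bits (cf. the v5 PPC fix in the changelog), because a cached dirty TLB entry could let a later store skip setting the PTE dirty bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one anonymous page's PTE. Hypothetical, not kernel code.
 * Assumption: pages that have not been through MADV_FREE are dirty. */
struct toy_pte {
    bool dirty;                 /* stands in for the hardware dirty bit */
};

enum reclaim_action { DISCARD, SWAP_OUT };

/* madvise(MADV_FREE) over the range: clear the dirty bit. */
static void toy_madvise_free(struct toy_pte *pte)
{
    pte->dirty = false;
}

/* Any userspace store: the MMU sets the dirty bit. */
static void toy_store(struct toy_pte *pte)
{
    pte->dirty = true;
}

/* Reclaim under memory pressure: a page that is still clean is a
 * "lazyfree" page and can be dropped; a dirty one was written after
 * MADV_FREE, so its data must be preserved by swapping it out. */
static enum reclaim_action toy_reclaim(const struct toy_pte *pte)
{
    return pte->dirty ? SWAP_OUT : DISCARD;
}

/* Both scenarios from the description; returns 1 if they behave as
 * described. */
static int toy_scenarios_ok(void)
{
    struct toy_pte untouched = { .dirty = true };
    struct toy_pte reused = { .dirty = true };

    toy_madvise_free(&untouched);   /* freed and never written again */
    toy_madvise_free(&reused);
    toy_store(&reused);             /* written before reclaim picked it */

    assert(!untouched.dirty && reused.dirty);
    return toy_reclaim(&untouched) == DISCARD &&
           toy_reclaim(&reused) == SWAP_OUT;
}
```

The key design point the model captures is that no extra page flag is needed: the dirty bit alone distinguishes "freed and untouched" from "reused after free".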
>
> The first heavy users would be general-purpose allocators (e.g.,
> jemalloc and tcmalloc; hopefully glibc will support it too), and
> jemalloc/tcmalloc already support the feature on other OSes (e.g., FreeBSD).
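An allocator adopting this would typically call it from its purge path. The following sketch uses a hypothetical `purge_span()` hook; it is not jemalloc's or tcmalloc's actual code. Because madvise() operates on whole pages and requires a page-aligned start address, the freed span is first shrunk inward to the pages it fully covers. If the running kernel rejects MADV_FREE, the sketch falls back to MADV_DONTNEED.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: headers that predate MADV_FREE degrade to MADV_DONTNEED. */
#ifndef MADV_FREE
#define MADV_FREE MADV_DONTNEED
#endif

/* Shrink [start, start + len) inward to the whole pages it covers.
 * Stores the aligned start in *out_start and returns the aligned length
 * (0 if the span covers no complete page). */
static size_t page_interior(uintptr_t start, size_t len, size_t page,
                            uintptr_t *out_start)
{
    assert(page != 0 && (page & (page - 1)) == 0); /* mask needs 2^k */
    uintptr_t mask = ~(uintptr_t)(page - 1);
    uintptr_t lo = (start + page - 1) & mask;      /* round start up */
    uintptr_t hi = (start + len) & mask;           /* round end down */
    *out_start = lo;
    return hi > lo ? (size_t)(hi - lo) : 0;
}

/* Hypothetical allocator purge hook: return the whole pages of a freed
 * span to the kernel lazily, or eagerly if MADV_FREE is rejected. */
static int purge_span(void *span, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    uintptr_t start;
    size_t n = page_interior((uintptr_t)span, len, page, &start);

    if (n == 0)
        return 0;                   /* no complete page to give back */
    if (madvise((void *)start, n, MADV_FREE) == 0)
        return 0;
    return madvise((void *)start, n, MADV_DONTNEED);
}
```

Rounding inward rather than outward matters. A partially covered page may still hold live objects that belong to neighboring allocations.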
>
> barrios@blaptop:~/benchmark/ebizzy$ lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                4
> On-line CPU(s) list:   0-3
> Thread(s) per core:    2
> Core(s) per socket:    2
> Socket(s):             1
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 42
> Stepping:              7
> CPU MHz:               2801.000
> BogoMIPS:              5581.64
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              4096K
> NUMA node0 CPU(s):     0-3
>
> ebizzy benchmark (./ebizzy -S 10 -n 512)
>
>                  vanilla-jemalloc     MADV_free-jemalloc
>
> 1 thread
>   records:       10                   10
>   avg:           7682.10              15306.10
>   std:           62.35 (0.81%)        347.99 (2.27%)
>   max:           7770.00              15622.00
>   min:           7598.00              14772.00
>
> 2 threads
>   records:       10                   10
>   avg:           12747.50             24171.00
>   std:           792.06 (6.21%)       895.18 (3.70%)
>   max:           13337.00             26023.00
>   min:           10535.00             23152.00
>
> 4 threads
>   records:       10                   10
>   avg:           16474.60             33717.90
>   std:           1496.45 (9.08%)      2008.97 (5.96%)
>   max:           17877.00             35958.00
>   min:           12224.00             29565.00
>
> 8 threads
>   records:       10                   10
>   avg:           16778.50             33308.10
>   std:           825.53 (4.92%)       1668.30 (5.01%)
>   max:           17543.00             36010.00
>   min:           14576.00             29577.00
>
> 16 threads
>   records:       10                   10
>   avg:           20614.40             35516.30
>   std:           602.95 (2.92%)       1283.65 (3.61%)
>   max:           21753.00             37178.00
>   min:           19605.00             33217.00
>
> 32 threads
>   records:       10                   10
>   avg:           22771.70             36018.50
>   std:           598.94 (2.63%)       1046.76 (2.91%)
>   max:           24035.00             37266.00
>   min:           22108.00             34149.00
>
> In summary, MADV_FREE is about 2 times faster than MADV_DONTNEED up to
> 8 threads, with the gap narrowing to about 1.7x at 16 threads and 1.6x
> at 32 threads.
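The speedup can be checked directly from the avg rows of the ebizzy table. The following small sketch divides the MADV_free-jemalloc average by the vanilla-jemalloc average for each thread count. Like the summary, it treats the vanilla run as the MADV_DONTNEED baseline.

```c
#include <assert.h>
#include <stdio.h>

/* avg rows of the ebizzy table (records/s, 10 runs each) */
static const int threads[] = { 1, 2, 4, 8, 16, 32 };
static const double vanilla[] = {
    7682.10, 12747.50, 16474.60, 16778.50, 20614.40, 22771.70
};
static const double madv_free[] = {
    15306.10, 24171.00, 33717.90, 33308.10, 35516.30, 36018.50
};

#define NROWS (sizeof(threads) / sizeof(threads[0]))

/* Ratio of MADV_FREE throughput to the vanilla baseline for row i. */
static double speedup(size_t i)
{
    return madv_free[i] / vanilla[i];
}

static void print_speedups(void)
{
    for (size_t i = 0; i < NROWS; i++)
        printf("%2d threads: %.2fx\n", threads[i], speedup(i));
}
```

For these averages the ratios come out to roughly 1.99, 1.90, 2.05, 1.99, 1.72 and 1.58.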
>
> * From v6
> * Remove page from swapcache at syscall time
> * Move utility functions from memory.c to madvise.c - Johannes
> * Rename utility functions - Johannes
> * Remove unnecessary checks from vmscan.c - Johannes
> * Rebased on v3.15-rc5-mmotm-2014-05-16-16-56
> * Drop Reviewed-by because there were some changes since then.
>
> * From v5
> * Fix PPC problem of not flushing the TLB - Rik
> * Remove unnecessary lazyfree_range stub function - Rik
> * Rebased on v3.15-rc5
>
> * From v4
> * Add Reviewed-by: Zhang Yanfei
> * Rebase on v3.15-rc1-mmotm-2014-04-15-16-14
>
> * From v3
> * Add a "how it works" part to the description - Zhang
> * Add page_discardable utility function - Zhang
> * Clean up
>
> * From v2
> * Remove forceful dirty marking of pages read in from swap - Johannes
> * Remove deactivation logic of lazyfreed pages
> * Rebased on 3.14
> * Remove RFC tag
>
> * From v1
> * Use custom page table walker for madvise_free - Johannes
> * Remove PG_lazypage flag - Johannes
> * Do madvise_dontneed instead of madvise_free on swapless systems
>
> Cc: Hugh Dickins
> Cc: Johannes Weiner
> Cc: Rik van Riel
> Cc: KOSAKI Motohiro
> Cc: Mel Gorman
> Cc: Jason Evans
> Cc: Zhang Yanfei
> Signed-off-by: Minchan Kim
> ---
> include/linux/rmap.h | 8 +-
> include/linux/vm_event_item.h | 1 +
> include/uapi/asm-generic/mman-common.h |