Hello, Namhyung.

2015-03-23 15:30 GMT+09:00 Namhyung Kim <[email protected]>:
> Hello,
>
> Currently perf kmem command only analyzes SLAB memory allocation. And
> I'd like to introduce page allocation analysis also. Users can use
> --slab and/or --page option to select it. If none of these options
> are used, it does slab allocation analysis for backward compatibility.
>
> * changes in v3)
>   - add live page statistics
>
> * changes in v2)
>   - Use thousand grouping for big numbers - i.e. 12345 -> 12,345 (Ingo)
>   - Improve output stat readability (Ingo)
>   - Remove alloc size column as it can be calculated from hits and order
>
> Patch 1 is to support thousand grouping on stat output. Patch 2
> implements basic support for page allocation analysis, patch 3 deals
> with the callsite and finally patch 4 implements sorting.
>
> In this patchset, I used two kmem events: kmem:mm_page_alloc and
> kmem_page_free for analysis as they can track almost all of memory
> allocation/free path AFAIK. However, unlike slab tracepoint events,
> those page allocation events don't provide callsite info directly. So
> I recorded callchains and extracted callsites like below:
>
> Normal page allocation callchains look like this:
>
>   360a7e __alloc_pages_nodemask
>   3a711c alloc_pages_current
>   357bc7 __page_cache_alloc   <-- callsite
>   357cf6 pagecache_get_page
>    48b0a prepare_pages
>    494d3 __btrfs_buffered_write
>    49cdf btrfs_file_write_iter
>   3ceb6e new_sync_write
>   3cf447 vfs_write
>   3cff99 sys_write
>   7556e9 system_call
>     f880 __write_nocancel
>    33eb9 cmd_record
>    4b38e cmd_kmem
>    7aa23 run_builtin
>    27a9a main
>    20800 __libc_start_main
>
> But first two are internal page allocation functions so it should be
> skipped. To determine such allocation functions, I used following regex:
>
>   ^_?_?(alloc|get_free|get_zeroed)_pages?
>
> This gave me a following list of functions (you can see this with -v):
>
>   alloc func: __get_free_pages
>   alloc func: get_zeroed_page
>   alloc func: alloc_pages_exact
>   alloc func: __alloc_pages_direct_compact
>   alloc func: __alloc_pages_nodemask
>   alloc func: alloc_page_interleave
>   alloc func: alloc_pages_current
>   alloc func: alloc_pages_vma
>   alloc func: alloc_page_buffers
>   alloc func: alloc_pages_exact_nid
>
> After skipping those function, it got '__page_cache_alloc'.
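
To check my understanding of the skipping step, here is a rough Python sketch of it. This is not the perf code, only an illustration: the regex is the one quoted above, while find_callsites() and its depth parameter are hypothetical names I made up, with depth standing for "how many frames to keep after the allocator functions".

```python
import re

# Regex quoted from the patch description above.
ALLOC_FUNC_RE = re.compile(r"^_?_?(alloc|get_free|get_zeroed)_pages?")


def find_callsites(callchain, depth=1):
    """Return up to `depth` frames that follow the leading allocator functions.

    `callchain` is a list of symbol names, innermost frame first.
    `depth` is a hypothetical knob, not an existing perf option.
    """
    for i, sym in enumerate(callchain):
        if not ALLOC_FUNC_RE.match(sym):
            return callchain[i:i + depth]
    return []  # the chain contained only allocator functions


# First frames of the callchain shown in the patch description.
chain = [
    "__alloc_pages_nodemask",
    "alloc_pages_current",
    "__page_cache_alloc",
    "pagecache_get_page",
    "prepare_pages",
    "__btrfs_buffered_write",
]

print(find_callsites(chain))           # ['__page_cache_alloc']
print(find_callsites(chain, depth=3))  # ['__page_cache_alloc', 'pagecache_get_page', 'prepare_pages']
```

With depth=1 this gives the single callsite described in the patch; a larger depth keeps the callers of the callsite as well.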

It would be better to have an option for recording more of the call
stack. A single callsite isn't sufficient to identify the real caller
for some functions. For example, new_slab(), one of the callsites in
your example, doesn't tell us which subsystem tried to allocate a slab
object and fell through to the page allocator.

> Other information such as allocation order, migration type and gfp
> flags are provided by tracepoint events.
>
> Basically the output will be sorted by total allocation bytes, but you
> can change it by using -s/--sort option. The following sort keys are
> added to support page analysis: page, order, mtype, gfp. Existing
> 'callsite', 'bytes' and 'hit' sort keys also can be used.
>
> An example follows:
>
>   # perf kmem record --slab --page sleep 1
>   [ perf record: Woken up 0 times to write data ]
>   [ perf record: Captured and wrote 49.277 MB perf.data (191027 samples) ]
>
>   # perf kmem stat --page --caller -l 10 -s order,hit
>
> ------------------------------------------------------------------------------------
> Total alloc (KB) |      Hits | Order | Migration type | GFP flags | Callsite
> ------------------------------------------------------------------------------------
>               64 |         4 |     2 |    RECLAIMABLE |  00285250 | new_slab
>           50,144 |    12,536 |     0 |        MOVABLE |  0102005a | __page_cache_alloc
>               52 |        13 |     0 |      UNMOVABLE |  002084d0 | pte_alloc_one
>               40 |        10 |     0 |        MOVABLE |  000280da | handle_mm_fault
>               28 |         7 |     0 |      UNMOVABLE |  000000d0 | __pollwait
>               20 |         5 |     0 |        MOVABLE |  000200da | do_wp_page
>               20 |         5 |     0 |        MOVABLE |  000200da | do_cow_fault
>               16 |         4 |     0 |      UNMOVABLE |  00000200 | __tlb_remove_page
>               16 |         4 |     0 |      UNMOVABLE |  000084d0 | __pmd_alloc
>                8 |         2 |     0 |      UNMOVABLE |  000084d0 | __pud_alloc
>              ... |       ... |   ... |            ... |       ... | ...
> ------------------------------------------------------------------------------------

How about printing GFP flags more intuitively, for example,
GFP_NOFS|GFP_ZERO?
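
Something along these lines is what I have in mind. It is only a rough Python sketch, not a proposal for the actual implementation: gfp_to_str() is a made-up name, and the bit values in GFP_BITS are my assumption of a small subset of the ___GFP_* definitions in include/linux/gfp.h for kernels of this vintage. They differ between kernel versions, so a real implementation would have to take them from the kernel headers or the tracepoint format.

```python
# Illustrative subset of GFP bits. ASSUMPTION: values follow include/linux/gfp.h
# of kernels from around this time; they are not stable across versions.
GFP_BITS = [
    (0x02, "GFP_HIGHMEM"),
    (0x08, "GFP_MOVABLE"),
    (0x10, "GFP_WAIT"),
    (0x40, "GFP_IO"),
    (0x80, "GFP_FS"),
    (0x400, "GFP_REPEAT"),
    (0x8000, "GFP_ZERO"),
    (0x20000, "GFP_HARDWALL"),
]


def gfp_to_str(flags):
    """Turn a raw gfp_flags value into a '|'-separated list of names."""
    names = []
    for bit, name in GFP_BITS:
        if flags & bit:
            names.append(name)
            flags &= ~bit
    if flags:
        # Bits not covered by the table above are kept as hex.
        names.append(hex(flags))
    return "|".join(names) if names else "0"


# Values taken from the "GFP flags" column in the example output.
print(gfp_to_str(0x000000d0))  # GFP_WAIT|GFP_IO|GFP_FS
print(gfp_to_str(0x000084d0))  # GFP_WAIT|GFP_IO|GFP_FS|GFP_REPEAT|GFP_ZERO
```

Composite names such as GFP_KERNEL or GFP_NOFS could be matched first, before falling back to the individual bits.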
The mm_page_alloc tracepoint already prints its output in this format.

Thanks.

