Hi, one of our web servers got repeatedly stuck in the memory compaction code; two PHP processes have been busy at 100% inside memory compaction after a page fault:

  100.00%     0.00%  php-cgi7.0  [kernel.vmlinux]  [k] page_fault
            |
            ---page_fault
               __do_page_fault
               handle_mm_fault
               __handle_mm_fault
               do_huge_pmd_anonymous_page
               __alloc_pages_nodemask
               __alloc_pages_slowpath
               __alloc_pages_direct_compact
               try_to_compact_pages
               compact_zone_order
               compact_zone
               |
               |--61.30%--isolate_migratepages_block
               |          |
               |          |--20.44%--node_page_state
               |          |
               |          |--5.88%--compact_unlock_should_abort.isra.33
               |          |
               |           --3.28%--_cond_resched
               |                     |
               |                      --2.19%--rcu_all_qs
               |
                --3.37%--pageblock_skip_persistent

ftrace:

  <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
  <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
  <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
  <...>-962300 [033] .... 236536.493919: _cond_resched <-isolate_migratepages_block
  <...>-962300 [033] .... 236536.493919: rcu_all_qs <-_cond_resched
  <...>-962300 [033] .... 236536.493919: compact_unlock_should_abort.isra.33 <-isolate_migratepages_block
  <...>-962300 [033] .... 236536.493919: pageblock_skip_persistent <-compact_zone
  <...>-962300 [033] .... 236536.493919: isolate_migratepages_block <-compact_zone
  <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
  <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
  <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
  <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
  <...>-962300 [033] .... 236536.493920: node_page_state <-isolate_migratepages_block
  <...>-962300 [033] .... 236536.493920: node_page_state <-isolate_migratepages_block
  <...>-962300 [033] .... 236536.493920: _cond_resched <-isolate_migratepages_block
  <...>-962300 [033] .... 236536.493920: rcu_all_qs <-_cond_resched
  <...>-962300 [033] .... 236536.493920: compact_unlock_should_abort.isra.33 <-isolate_migratepages_block
  <...>-962300 [033] .... 236536.493920: pageblock_skip_persistent <-compact_zone
  <...>-962300 [033] .... 236536.493920: isolate_migratepages_block <-compact_zone
  <...>-962300 [033] .... 236536.493920: node_page_state <-isolate_migratepages_block

Nothing useful in /proc/PID/{stack,wchan,syscall}.

slabinfo/kmalloc-{16,32} are going through the roof (~ 15 GB each), and this memleak-lookalike triggering the oomkiller all the time is what drew our attention to this server.

Right now, the server is still stuck, and I can attempt to collect more information on request.

Max