Greetings! I have a bunch of Supermicro servers which serve large ZFS storage. Once in a while they report a strange page allocation failure when the monitoring system calls ipmi-dcmi. No out-of-memory situation is visible, and I have not been able to reproduce the error on purpose yet. All I have found so far is an NVIDIA forum thread which describes a similar error message during high I/O: https://forums.developer.nvidia.com/t/455-23-04-page-allocation-failure-in-kernel-module-at-random-points/155250 I can confirm that the errors appear when the server does a lot of I/O, several hundred MiB/s.
I cannot see any actual problem for our workload on these machines but still I wanted to report this in case anyone else might use the information.

Regards,
Beni

Further Information about system, OS and the trace:

product: X11SPi-TF
vendor: Supermicro
Firmware Revision: 01.73.03
Firmware Build Time: 06/30/2020
BIOS Version: 3.3
BIOS Build Time: 02/21/2020
CentOS 8.4
Kernel 4.18.0-193.el8.x86_64

Jun 8 04:55:40 bck-srv ipmi_exporter[3724813]: time="2021-06-08T04:55:40+02:00" level=error msg="Error while calling ipmi-dcmi for [local]: ipmi_ctx_find_inband: out of memory\n" source="collector.go:278"||
Jun 08 04:55:40 bck-srv011201 kernel: ipmi-dcmi: page allocation failure: order:4, mode:0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null)
Jun 08 04:55:40 bck-srv kernel: ipmi-dcmi cpuset=/ mems_allowed=0
Jun 08 04:55:40 bck-srv kernel: CPU: 4 PID: 3348161 Comm: ipmi-dcmi Tainted: P OE --------- - - 4.18.0-193.el8.x86_64 #1
Jun 08 04:55:40 bck-srv kernel: Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020
Jun 08 04:55:40 bck-srv kernel: Call Trace:
Jun 08 04:55:40 bck-srv kernel: dump_stack+0x5c/0x80
Jun 08 04:55:40 bck-srv kernel: warn_alloc.cold.123+0x6f/0x101
Jun 08 04:55:40 bck-srv kernel: ? _cond_resched+0x15/0x30
Jun 08 04:55:40 bck-srv kernel: ? __alloc_pages_direct_compact+0x128/0x130
Jun 08 04:55:40 bck-srv kernel: __alloc_pages_slowpath+0xccd/0xd00
Jun 08 04:55:40 bck-srv kernel: ? enqueue_entity+0xf6/0x630
Jun 08 04:55:40 bck-srv kernel: __alloc_pages_nodemask+0x245/0x280
Jun 08 04:55:40 bck-srv kernel: kmalloc_order+0x14/0x30
Jun 08 04:55:40 bck-srv kernel: kmalloc_order_trace+0x1d/0xa0
Jun 08 04:55:40 bck-srv kernel: ipmi_create_user+0x5a/0x1e0 [ipmi_msghandler]
Jun 08 04:55:40 bck-srv kernel: ? _cond_resched+0x15/0x30
Jun 08 04:55:40 bck-srv kernel: ? kmem_cache_alloc_trace+0x140/0x1c0
Jun 08 04:55:40 bck-srv kernel: ipmi_open+0x4d/0xd0 [ipmi_devintf]
Jun 08 04:55:40 bck-srv kernel: chrdev_open+0xcb/0x1e0
Jun 08 04:55:40 bck-srv kernel: ? cdev_default_release+0x20/0x20
Jun 08 04:55:40 bck-srv kernel: do_dentry_open+0x132/0x330
Jun 08 04:55:40 bck-srv kernel: path_openat+0x573/0x14d0
Jun 08 04:55:40 bck-srv kernel: ? do_page_fault+0x32/0x110
Jun 08 04:55:40 bck-srv kernel: do_filp_open+0x93/0x100
Jun 08 04:55:40 bck-srv kernel: ? strncpy_from_user+0x7c/0x1b0
Jun 08 04:55:40 bck-srv kernel: do_sys_open+0x184/0x220
Jun 08 04:55:40 bck-srv kernel: do_syscall_64+0x5b/0x1a0
Jun 08 04:55:40 bck-srv kernel: entry_SYSCALL_64_after_hwframe+0x65/0xca
Jun 08 04:55:40 bck-srv kernel: RIP: 0033:0x7fe8708c904f
Jun 08 04:55:40 bck-srv kernel: Code: 52 89 f0 25 00 00 41 00 3d 00 00 41 00 74 44 8b 05 06 d4 20 00 85 c0 75 65 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 9d 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
Jun 08 04:55:40 bck-srv kernel: RSP: 002b:00007ffed94225c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Jun 08 04:55:40 bck-srv kernel: RAX: ffffffffffffffda RBX: 000055d126e47ec0 RCX: 00007fe8708c904f
Jun 08 04:55:40 bck-srv kernel: RDX: 0000000000000002 RSI: 00007fe870cc3720 RDI: 00000000ffffff9c
Jun 08 04:55:40 bck-srv kernel: RBP: 000055d126e47ea0 R08: 000055d126e481f0 R09: 000055d126e47eb0
Jun 08 04:55:40 bck-srv kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Jun 08 04:55:40 bck-srv kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000007
Jun 08 04:55:40 bck-srv kernel: warn_alloc_show_mem: 3 callbacks suppressed
Jun 08 04:55:40 bck-srv kernel: Mem-Info:
Jun 08 04:55:40 bck-srv kernel: active_anon:665748 inactive_anon:440411 isolated_anon:0 active_file:153083 inactive_file:92603 isolated_file:0 unevictable:0 dirty:41 writeback:0 unstable:0 slab_reclaimable:75153 slab_unreclaimable:392945 mapped:62785 shmem:1044900 pagetables:2392 bounce:0 free:4211681 free_pcp:0 free_cma:0
Jun 08 04:55:40 bck-srv kernel: Node 0 active_anon:2662992kB inactive_anon:1761644kB active_file:612332kB inactive_file:370412kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:251140kB dirty:164kB writeback:0kB shmem:4179600kB shmem_thp: 0kB shmem_pmdmapped: 0kB>
Jun 08 04:55:40 bck-srv kernel: Node 0 DMA free:15360kB min:8kB low:20kB high:32kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15360kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_>
Jun 08 04:55:40 bck-srv kernel: lowmem_reserve[]: 0 1336 95014 95014 95014
Jun 08 04:55:40 bck-srv kernel: Node 0 DMA32 free:374900kB min:948kB low:2316kB high:3684kB active_anon:47800kB inactive_anon:30864kB active_file:29832kB inactive_file:24268kB unevictable:0kB writepending:0kB present:1725332kB managed:1397652kB mlocked:0kB kernel_stack:0kB pa>
Jun 08 04:55:40 bck-srv kernel: lowmem_reserve[]: 0 0 93677 93677 93677
Jun 08 04:55:40 bck-srv kernel: Node 0 Normal free:16456464kB min:66620kB low:162544kB high:258468kB active_anon:2615192kB inactive_anon:1730780kB active_file:582500kB inactive_file:346144kB unevictable:0kB writepending:164kB present:97517568kB managed:95932772kB mlocked:0kB >
Jun 08 04:55:40 bck-srv kernel: lowmem_reserve[]: 0 0 0 0 0
Jun 08 04:55:40 bck-srv kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15360kB
Jun 08 04:55:40 bck-srv kernel: Node 0 DMA32: 57*4kB (UM) 180*8kB (UM) 131*16kB (UM) 68*32kB (UM) 39*64kB (UM) 25*128kB (UM) 17*256kB (UM) 13*512kB (M) 10*1024kB (M) 3*2048kB (UM) 82*4096kB (UM) = 374900kB
Jun 08 04:55:40 bck-srv kernel: Node 0 Normal: 9421*4kB (UME) 318223*8kB (UME) 368527*16kB (UME) 249283*32kB (UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16456956kB
Jun 08 04:55:40 bck-srv kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jun 08 04:55:40 bck-srv kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun 08 04:55:40 bck-srv kernel: 1263502 total pagecache pages
Jun 08 04:55:40 bck-srv kernel: 13 pages in swap cache
Jun 08 04:55:40 bck-srv kernel: Swap cache stats: add 349058, delete 349045, find 620/1882
Jun 08 04:55:40 bck-srv kernel: Free swap = 8374524kB
Jun 08 04:55:40 bck-srv kernel: Total swap = 8388604kB
Jun 08 04:55:40 bck-srv kernel: 24814716 pages RAM
Jun 08 04:55:40 bck-srv kernel: 0 pages HighMem/MovableOnly
Jun 08 04:55:40 bck-srv kernel: 478270 pages reserved
Jun 08 04:55:40 bck-srv kernel: 0 pages hwpoisoned
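
A note on reading the trace, in case it helps someone: the failed request was order:4, which with 4 kB pages means 2^4 = 16 contiguous pages, i.e. one 64 kB block. The "Node 0 Normal" free list in the trace shows plenty of free memory overall, but zero free blocks of 64 kB or larger, so the failure looks consistent with fragmentation rather than with the machine running out of memory. Below is a minimal Python sketch of that check. It assumes 4 kB base pages, and the function names (parse_free_list, can_satisfy) are made up for illustration; it only inspects the one free-list line copied from the trace and does not model zone fallback, reclaim, or compaction.

```python
import re

PAGE_KB = 4  # assumption: 4 kB base pages (the x86_64 default)


def parse_free_list(line):
    """Return {block_size_kB: free_block_count} from a 'Node N Zone: ...' free-list line."""
    # Each entry looks like '9421*4kB'; the trailing '= 16456956kB' total has no '*'
    # and is therefore not matched.
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def can_satisfy(free_list, order):
    """True if at least one free block of 2**order pages (or larger) is listed."""
    needed_kb = PAGE_KB << order  # order 4 -> 64 kB
    return any(count > 0
               for size, count in free_list.items()
               if size >= needed_kb)


# Free-list line for the Normal zone, copied from the trace above.
normal = ("Node 0 Normal: 9421*4kB (UME) 318223*8kB (UME) 368527*16kB (UME) "
          "249283*32kB (UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB "
          "0*4096kB = 16456956kB")

free_list = parse_free_list(normal)
print("order 3 (32 kB) available:", can_satisfy(free_list, 3))
print("order 4 (64 kB) available:", can_satisfy(free_list, 4))
```

The same parsing idea could be pointed at the live per-order counts in /proc/buddyinfo during heavy I/O to see whether the higher orders drain before the error shows up; the column layout there is different (counts only, one column per order), so the regular expression above would need adapting.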