Re: alloc failed, but?
On 06/28/12 11:27, the mail apparently from Tom Gall included:

> Hi All,
>
> I'm stressing a system with apachebench. As one scales up work on a system, there's always a point where the wheels fall off, the engine explodes, or something else exciting happens. But as Han Solo would say ... "hold together baby", I'd like to eke out as much as I can. (If you're really interested, here's what I'm up to: http://fullshovel.wordpress.com/ start with part 1)
>
> In this case with apachebench, I'm getting the following allocation errors in the kernel and need a little help deciphering. It sure looks like there's plenty of space to swap out; however, if I have this right, we're getting so much network traffic that the kernel gets inundated and it OOMs in the network stack.
>
> I did later try setting sysctl -w vm.min_free_kbytes=32768 but that didn't really seem to help.
>
> The much more complete dmesg dump is located at http://people.linaro.org/~tgall/dmesg-dump.txt
>
> [127100.245117] swapper/0: page allocation failure: order:3, mode:0x20
> [127100.245666] [<80100f14>] (__alloc_pages_nodemask+0x678/0x7a4) from [<80695270>] (kmem_getpages.isra.35+0x3c/0xc0)
> [127100.245666] [<80695270>] (kmem_getpages.isra.35+0x3c/0xc0) from [<80695380>] (cache_grow.constprop.37+0x8c/0x1fc)
> [127100.245666] [<80695380>] (cache_grow.constprop.37+0x8c/0x1fc) from [<8069570c>] (cache_alloc_refill+0x21c/0x274)
> [127100.245819] [<8069570c>] (cache_alloc_refill+0x21c/0x274) from [<80132dac>] (__kmalloc_track_caller+0xac/0x1b0)
> [127100.245910] [<80132dac>] (__kmalloc_track_caller+0xac/0x1b0) from [<8057a37c>] (__alloc_skb+0x60/0xfc)
> [127100.245971] [<8057a37c>] (__alloc_skb+0x60/0xfc) from [<8057a874>] (__netdev_alloc_skb+0x2c/0x54)
> [127100.245971] [<8057a874>] (__netdev_alloc_skb+0x2c/0x54) from [<8049dbb8>] (rx_submit+0x2c/0x1d4)
> [127100.245971] [<8049dbb8>] (rx_submit+0x2c/0x1d4) from [<8049e1c0>] (rx_complete+0x1a4/0x1b8)
> [127100.245971] [<8049e1c0>] (rx_complete+0x1a4/0x1b8) from [<804a5f38>] (usb_hcd_giveback_urb+0xb0/0xfc)
> [127100.246246] [<804a5f38>] (usb_hcd_giveback_urb+0xb0/0xfc) from [<804b887c>] (ehci_urb_done+0xb8/0xc4)
> [127100.246246] [<804b887c>] (ehci_urb_done+0xb8/0xc4) from [<804bb240>] (qh_completions+0xc8/0x49c)

Just some not directly useful extra info...

I noticed these yesterday in dmesg as well while adding the 32K min_free_kbytes in tilt-3.4 as a hack.

It seems to be part of some syndrome with the smsc driver and network memory allocation that's in mainline and not Panda-specific. Yesterday I saw in Google the same problems plaguing Raspberry Pi folks.

When I tried to stress the Panda a week or so ago by cloning gcc with a plan to compile it, it in fact lost sanity during the download with a storm of these kevent lost messages, hence the 32K hack being added.

I also remember the same problems with kevents being dropped getting looked at about a year ago without any solid result; it'll be interesting if anyone understands and can explain what the underlying issue is.

-Andy

--
Andy Green | TI Landing Team Leader
Linaro.org │ Open source software for ARM SoCs

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev
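
A note on reading the "page allocation failure: order:3, mode:0x20" line from the trace: the sketch below decodes the two fields. It assumes 4 KiB pages and the gfp flag bit values used by 3.x-era kernels (the layout differs in later kernels), and the function name decode_failure is purely illustrative. Under those assumptions, order:3 is a request for 8 physically contiguous pages (32 KiB) and mode:0x20 is __GFP_HIGH alone, which is what GFP_ATOMIC expanded to at the time. If that reading is right, the allocation is made from the USB interrupt path shown in the trace and is not allowed to sleep, so it cannot wait for reclaim or swap. That would be consistent with free swap not helping, but it is an interpretation and not a confirmed diagnosis.

```python
import re

# Assumption: low gfp bits as defined in 3.x-era kernels. Later kernels
# renumbered and renamed some of these, so treat the table as illustrative.
GFP_BITS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",   # allocation may sleep and reclaim
    0x20: "__GFP_HIGH",   # may dip into emergency reserves
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}

PAGE_SIZE = 4096  # assumption: 4 KiB pages

FAILURE_RE = re.compile(
    r"page allocation failure: order:(\d+), mode:(0x[0-9a-fA-F]+)"
)


def decode_failure(line):
    """Decode one 'page allocation failure' dmesg line, or return None."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    order = int(match.group(1))
    mode = int(match.group(2), 16)
    # Only the low eight bits listed above are named; higher bits are ignored.
    flags = [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]
    return {
        "order": order,
        "contiguous_bytes": PAGE_SIZE << order,  # 2**order pages
        "flags": flags,
        "can_sleep": bool(mode & 0x10),
    }


if __name__ == "__main__":
    line = "[127100.245117] swapper/0: page allocation failure: order:3, mode:0x20"
    print(decode_failure(line))
```

The same function can be run over every line of a saved dmesg dump to see whether all of the failures are high-order atomic requests or whether other modes show up as well.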
alloc failed, but?
Hi All,

I'm stressing a system with apachebench. As one scales up work on a system, there's always a point where the wheels fall off, the engine explodes, or something else exciting happens. But as Han Solo would say ... "hold together baby", I'd like to eke out as much as I can. (If you're really interested, here's what I'm up to: http://fullshovel.wordpress.com/ start with part 1)

In this case with apachebench, I'm getting the following allocation errors in the kernel and need a little help deciphering. It sure looks like there's plenty of space to swap out; however, if I have this right, we're getting so much network traffic that the kernel gets inundated and it OOMs in the network stack.

I did later try setting sysctl -w vm.min_free_kbytes=32768 but that didn't really seem to help.

The much more complete dmesg dump is located at http://people.linaro.org/~tgall/dmesg-dump.txt

Thanks in advance for thoughts and advice.

[127089.668487] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[127089.675994] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[127089.683502] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[127089.690979] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[127089.698455] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[127089.705932] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[127089.713409] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[127089.720886] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[127089.728363] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[127096.934051] Initial hdmi_get_current_hpd says disconnected
[127100.245117] warn_alloc_failed: 52 callbacks suppressed
[127100.245117] swapper/0: page allocation failure: order:3, mode:0x20
[127100.245117] [<8001b8d0>] (unwind_backtrace+0x0/0xec) from [<806905d0>] (dump_stack+0x20/0x24)
[127100.245544] [<806905d0>] (dump_stack+0x20/0x24) from [<800fe390>] (warn_alloc_failed+0xfc/0x11c)
[127100.245544] [<800fe390>] (warn_alloc_failed+0xfc/0x11c) from [<80100f14>] (__alloc_pages_nodemask+0x678/0x7a4)
[127100.245666] [<80100f14>] (__alloc_pages_nodemask+0x678/0x7a4) from [<80695270>] (kmem_getpages.isra.35+0x3c/0xc0)
[127100.245666] [<80695270>] (kmem_getpages.isra.35+0x3c/0xc0) from [<80695380>] (cache_grow.constprop.37+0x8c/0x1fc)
[127100.245666] [<80695380>] (cache_grow.constprop.37+0x8c/0x1fc) from [<8069570c>] (cache_alloc_refill+0x21c/0x274)
[127100.245819] [<8069570c>] (cache_alloc_refill+0x21c/0x274) from [<80132dac>] (__kmalloc_track_caller+0xac/0x1b0)
[127100.245910] [<80132dac>] (__kmalloc_track_caller+0xac/0x1b0) from [<8057a37c>] (__alloc_skb+0x60/0xfc)
[127100.245971] [<8057a37c>] (__alloc_skb+0x60/0xfc) from [<8057a874>] (__netdev_alloc_skb+0x2c/0x54)
[127100.245971] [<8057a874>] (__netdev_alloc_skb+0x2c/0x54) from [<8049dbb8>] (rx_submit+0x2c/0x1d4)
[127100.245971] [<8049dbb8>] (rx_submit+0x2c/0x1d4) from [<8049e1c0>] (rx_complete+0x1a4/0x1b8)
[127100.245971] [<8049e1c0>] (rx_complete+0x1a4/0x1b8) from [<804a5f38>] (usb_hcd_giveback_urb+0xb0/0xfc)
[127100.246246] [<804a5f38>] (usb_hcd_giveback_urb+0xb0/0xfc) from [<804b887c>] (ehci_urb_done+0xb8/0xc4)
[127100.246246] [<804b887c>] (ehci_urb_done+0xb8/0xc4) from [<804bb240>] (qh_completions+0xc8/0x49c)
[127100.246307] [<804bb240>] (qh_completions+0xc8/0x49c) from [<804bdcd0>] (scan_async+0x88/0x154)
[127100.246398] [<804bdcd0>] (scan_async+0x88/0x154) from [<804be138>] (ehci_work+0x40/0x98)
[127100.246398] [<804be138>] (ehci_work+0x40/0x98) from [<804bf9c4>] (ehci_irq+0x33c/0x3a4)
[127100.246459] [<804bf9c4>] (ehci_irq+0x33c/0x3a4) from [<804a53ac>] (usb_hcd_irq+0x40/0x50)
[127100.246459] [<804a53ac>] (usb_hcd_irq+0x40/0x50) from [<800bf45c>] (handle_irq_event_percpu+0xc4/0x298)
[127100.246459] [<800bf45c>] (handle_irq_event_percpu+0xc4/0x298) from [<800bf67c>] (handle_irq_event+0x4c/0x6c)
[127100.246459] [<800bf67c>] (handle_irq_event+0x4c/0x6c) from [<800c23b4>] (handle_fasteoi_irq+0xd8/0x124)
[127100.246734] [<800c23b4>] (handle_fasteoi_irq+0xd8/0x124) from [<800bedd0>] (generic_handle_irq+0x30/0x40)
[127100.246765] [<800bedd0>] (generic_handle_irq+0x30/0x40) from [<800140c4>] (handle_IRQ+0x88/0xc8)
[127100.246826] [<800140c4>] (handle_IRQ+0x88/0xc8) from [<800086d8>] (gic_handle_irq+0x80/0xac)
[127100.246917] [<800086d8>] (gic_handle_irq+0x80/0xac) from [<806a91c0>] (__irq_svc+0x40/0x70)
[127100.246917] Exception stack(0x80a47bf8 to 0x80a47c40)
[127100.246978] 7be0: 000d 2180
[127100.247009] 7c00: 000d 0005 8bb55040 8236d180 00f7a819 81e00d5a 0068 80c5ee80
[127100.247100] 7c20: 0002 80a47c5c 80a47c40 80a47c40 805c734c 805c9a70 80070113
[127100.247100] [<806a91c0>] (__irq_svc+0x40/0x70) from [<805c9a70>] (tcp_event_data_recv+0x118/0x194)
[127100.247161] [<805c9a70>] (tcp_event_data_recv+0x118/0x194) from [<805cc554>] (tcp_data_queue+0x30c/0x998)
[127100.247222] [<805cc554>] (tcp_data_queue+0x30c/0x998) from [<805cfa30>] (tc