Re: alloc failed, but?

2012-06-27 Thread Andy Green

On 06/28/12 11:27, the mail apparently from Tom Gall included:

Hi All,

I'm stressing a system with apachebench. As one scales up work on a
system obviously there's always a point where the wheels fall off, the
engine explodes or something else exciting happens. But as Han Solo
would say ... "hold together baby", I'd like to eke out as much as
I can. (If you're really interested, here's what I'm up to :
http://fullshovel.wordpress.com/  start with part 1)

In this case with apachebench, I'm getting the following allocation
errors in the kernel and need a little help deciphering. It sure looks
like there's plenty of space to swap out; however, if I have this right,
we're getting so much network traffic that the kernel gets inundated
and it OOMs in the network stack.

I did later try setting sysctl -w vm.min_free_kbytes=32768  but that
didn't really seem to help.

The much more complete dmesg dump is located at
http://people.linaro.org/~tgall/dmesg-dump.txt



[127100.245117] swapper/0: page allocation failure: order:3, mode:0x20



[127100.245666] [<80100f14>] (__alloc_pages_nodemask+0x678/0x7a4) from
[<80695270>] (kmem_getpages.isra.35+0x3c/0xc0)
[127100.245666] [<80695270>] (kmem_getpages.isra.35+0x3c/0xc0) from
[<80695380>] (cache_grow.constprop.37+0x8c/0x1fc)
[127100.245666] [<80695380>] (cache_grow.constprop.37+0x8c/0x1fc) from
[<8069570c>] (cache_alloc_refill+0x21c/0x274)
[127100.245819] [<8069570c>] (cache_alloc_refill+0x21c/0x274) from
[<80132dac>] (__kmalloc_track_caller+0xac/0x1b0)
[127100.245910] [<80132dac>] (__kmalloc_track_caller+0xac/0x1b0) from
[<8057a37c>] (__alloc_skb+0x60/0xfc)
[127100.245971] [<8057a37c>] (__alloc_skb+0x60/0xfc) from [<8057a874>]
(__netdev_alloc_skb+0x2c/0x54)
[127100.245971] [<8057a874>] (__netdev_alloc_skb+0x2c/0x54) from
[<8049dbb8>] (rx_submit+0x2c/0x1d4)
[127100.245971] [<8049dbb8>] (rx_submit+0x2c/0x1d4) from [<8049e1c0>]
(rx_complete+0x1a4/0x1b8)
[127100.245971] [<8049e1c0>] (rx_complete+0x1a4/0x1b8) from
[<804a5f38>] (usb_hcd_giveback_urb+0xb0/0xfc)
[127100.246246] [<804a5f38>] (usb_hcd_giveback_urb+0xb0/0xfc) from
[<804b887c>] (ehci_urb_done+0xb8/0xc4)
[127100.246246] [<804b887c>] (ehci_urb_done+0xb8/0xc4) from
[<804bb240>] (qh_completions+0xc8/0x49c)
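
As an aid to reading the failure line in the trace above, here is a minimal Python sketch that decodes the "order" and "mode" fields. It rests on two assumptions that are not stated in the thread: 4 KiB pages, and the GFP bit values from include/linux/gfp.h as they were in 3.x-era kernels (they are different in other kernel versions). The names decode_failure, GFP_BITS and PAGE_SIZE are made up for this illustration.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages

# Assumption: low GFP bits as defined in 3.x-era include/linux/gfp.h.
# Check the header of the kernel you are actually running.
GFP_BITS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}

FAILURE_RE = re.compile(
    r"page allocation failure: order:(\d+), mode:(0x[0-9a-fA-F]+)"
)


def decode_failure(line):
    """Return order, size and flag names for one dmesg failure line."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    order = int(match.group(1))
    mode = int(match.group(2), 16)
    pages = 1 << order  # an order-N request is 2**N contiguous pages
    flags = [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]
    return {
        "order": order,
        "pages": pages,
        "bytes": pages * PAGE_SIZE,
        "mode": mode,
        "flags": flags,
    }


if __name__ == "__main__":
    line = "[127100.245117] swapper/0: page allocation failure: order:3, mode:0x20"
    print(decode_failure(line))
```

Under those assumptions, order:3 is a request for 8 physically contiguous pages (32768 bytes), and mode:0x20 is __GFP_HIGH alone, which is what GFP_ATOMIC expanded to in kernels of that era. With __GFP_WAIT clear the allocator may not sleep, so it cannot reclaim or swap to satisfy the request. That would fit the call chain in the trace, where rx_submit is reached from rx_complete in the USB completion path, and it may be why free swap space does not help here.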


Just some not directly useful extra info...

I noticed these yesterday in dmesg as well while adding the 32K 
min_free_kbytes in tilt-3.4 as a hack.  It seems to be part of some 
syndrome with smsc driver and network memory allocation that's in 
mainline and not Panda-specific.  Yesterday I saw in Google the same 
problems plaguing Raspberry Pi folks.


When I tried to stress the Panda a week or so ago by cloning gcc with 
a plan to compile it, it in fact lost sanity during the download with 
a storm of these kevent lost messages, hence the 32K hack being added.


I also remember the same problems about kevents being dropped getting 
looked at like a year ago without any solid result.  It'll be interesting 
if anyone understands and can explain what the underlying issue is.
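
To put a number on the kevent storm, a saved dmesg can be summarised with a small Python sketch like the one below. The function name count_dropped_kevents is hypothetical; the message format is taken from the smsc95xx lines in the dump. As far as I recall from include/linux/usb/usbnet.h, event 2 is EVENT_RX_MEMORY, which usbnet defers when it cannot allocate a receive buffer, but treat that mapping as an assumption and check it against the tree in use.

```python
import re
from collections import Counter

# Matches e.g.
# [127089.668487] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
KEVENT_RE = re.compile(
    r"\[\s*(\d+)\.\d+\].*?(\w+): kevent (\d+) may have been dropped"
)


def count_dropped_kevents(dmesg_text):
    """Count 'kevent N may have been dropped' lines.

    Returns two Counters: one keyed by the whole-second timestamp,
    one keyed by (interface, event number).
    """
    per_second = Counter()
    per_event = Counter()
    for line in dmesg_text.splitlines():
        match = KEVENT_RE.search(line)
        if match is None:
            continue
        second = int(match.group(1))
        iface = match.group(2)
        event = int(match.group(3))
        per_second[second] += 1
        per_event[(iface, event)] += 1
    return per_second, per_event


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python count_kevents.py
    seconds, events = count_dropped_kevents(sys.stdin.read())
    for (iface, event), n in sorted(events.items()):
        print(f"{iface}: kevent {event} dropped {n} times")
    for second, n in sorted(seconds.items()):
        print(f"t={second}s: {n} messages")
```

If the per-second counts line up with the timestamps of the page allocation failures, that would support the idea that both symptoms come from the same receive-buffer allocation problem.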


-Andy

--
Andy Green | TI Landing Team Leader
Linaro.org │ Open source software for ARM SoCs | Follow Linaro
http://facebook.com/pages/Linaro/155974581091106  - 
http://twitter.com/#!/linaroorg - http://linaro.org/linaro-blog




___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev


alloc failed, but?

2012-06-27 Thread Tom Gall
Hi All,

I'm stressing a system with apachebench. As one scales up work on a
system obviously there's always a point where the wheels fall off, the
engine explodes or something else exciting happens. But as Han Solo
would say ... "hold together baby", I'd like to eke out as much as
I can. (If you're really interested, here's what I'm up to :
http://fullshovel.wordpress.com/  start with part 1)

In this case with apachebench, I'm getting the following allocation
errors in the kernel and need a little help deciphering. It sure looks
like there's plenty of space to swap out; however, if I have this right,
we're getting so much network traffic that the kernel gets inundated
and it OOMs in the network stack.

I did later try setting sysctl -w vm.min_free_kbytes=32768  but that
didn't really seem to help.
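
One possible reason the min_free_kbytes change makes little difference: that sysctl raises the watermarks for the total number of free pages, but it does not by itself guarantee that any of those pages are contiguous. A quick way to check is to look at /proc/buddyinfo, which lists the free block count per order for each zone. The Python sketch below parses that file; the helper names and the 4 KiB page size are assumptions made for this illustration, and the sample numbers in the comment are invented, not taken from the system in this thread.

```python
import os

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def parse_buddyinfo(text):
    """Parse /proc/buddyinfo text into {(node, zone): [count per order]}.

    Each line looks like:
        Node 0, zone   Normal    900    300     40      2      0 ...
    where column k after the zone name is the number of free blocks
    of order k (2**k contiguous pages).
    """
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = parts[1].rstrip(",")
        zone = parts[3]
        zones[(node, zone)] = [int(n) for n in parts[4:]]
    return zones


def free_blocks_at_or_above(counts, order):
    """Number of free blocks that could satisfy a request of this order."""
    return sum(counts[order:])


def free_kib(counts):
    """Total free memory in the zone, in KiB, regardless of contiguity."""
    pages = sum(count << order for order, count in enumerate(counts))
    return pages * PAGE_SIZE // 1024


if __name__ == "__main__":
    path = "/proc/buddyinfo"
    if os.path.exists(path):
        with open(path) as f:
            for (node, zone), counts in parse_buddyinfo(f.read()).items():
                print(
                    f"node {node} zone {zone}: {free_kib(counts)} KiB free, "
                    f"{free_blocks_at_or_above(counts, 3)} blocks of order >= 3"
                )
```

A zone can show several megabytes free in total while having almost no blocks of order 3 or higher. In that state an atomic order-3 request can fail even though the system as a whole looks far from out of memory.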

The much more complete dmesg dump is located at
http://people.linaro.org/~tgall/dmesg-dump.txt

Thanks in advance for thoughts and advice.

[127089.668487] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[127089.675994] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[127089.683502] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[127089.690979] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[127089.698455] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[127089.705932] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[127089.713409] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[127089.720886] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[127089.728363] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[127096.934051] Initial hdmi_get_current_hpd says disconnected
[127100.245117] warn_alloc_failed: 52 callbacks suppressed
[127100.245117] swapper/0: page allocation failure: order:3, mode:0x20
[127100.245117] [<8001b8d0>] (unwind_backtrace+0x0/0xec) from
[<806905d0>] (dump_stack+0x20/0x24)
[127100.245544] [<806905d0>] (dump_stack+0x20/0x24) from [<800fe390>]
(warn_alloc_failed+0xfc/0x11c)
[127100.245544] [<800fe390>] (warn_alloc_failed+0xfc/0x11c) from
[<80100f14>] (__alloc_pages_nodemask+0x678/0x7a4)
[127100.245666] [<80100f14>] (__alloc_pages_nodemask+0x678/0x7a4) from
[<80695270>] (kmem_getpages.isra.35+0x3c/0xc0)
[127100.245666] [<80695270>] (kmem_getpages.isra.35+0x3c/0xc0) from
[<80695380>] (cache_grow.constprop.37+0x8c/0x1fc)
[127100.245666] [<80695380>] (cache_grow.constprop.37+0x8c/0x1fc) from
[<8069570c>] (cache_alloc_refill+0x21c/0x274)
[127100.245819] [<8069570c>] (cache_alloc_refill+0x21c/0x274) from
[<80132dac>] (__kmalloc_track_caller+0xac/0x1b0)
[127100.245910] [<80132dac>] (__kmalloc_track_caller+0xac/0x1b0) from
[<8057a37c>] (__alloc_skb+0x60/0xfc)
[127100.245971] [<8057a37c>] (__alloc_skb+0x60/0xfc) from [<8057a874>]
(__netdev_alloc_skb+0x2c/0x54)
[127100.245971] [<8057a874>] (__netdev_alloc_skb+0x2c/0x54) from
[<8049dbb8>] (rx_submit+0x2c/0x1d4)
[127100.245971] [<8049dbb8>] (rx_submit+0x2c/0x1d4) from [<8049e1c0>]
(rx_complete+0x1a4/0x1b8)
[127100.245971] [<8049e1c0>] (rx_complete+0x1a4/0x1b8) from
[<804a5f38>] (usb_hcd_giveback_urb+0xb0/0xfc)
[127100.246246] [<804a5f38>] (usb_hcd_giveback_urb+0xb0/0xfc) from
[<804b887c>] (ehci_urb_done+0xb8/0xc4)
[127100.246246] [<804b887c>] (ehci_urb_done+0xb8/0xc4) from
[<804bb240>] (qh_completions+0xc8/0x49c)
[127100.246307] [<804bb240>] (qh_completions+0xc8/0x49c) from
[<804bdcd0>] (scan_async+0x88/0x154)
[127100.246398] [<804bdcd0>] (scan_async+0x88/0x154) from [<804be138>]
(ehci_work+0x40/0x98)
[127100.246398] [<804be138>] (ehci_work+0x40/0x98) from [<804bf9c4>]
(ehci_irq+0x33c/0x3a4)
[127100.246459] [<804bf9c4>] (ehci_irq+0x33c/0x3a4) from [<804a53ac>]
(usb_hcd_irq+0x40/0x50)
[127100.246459] [<804a53ac>] (usb_hcd_irq+0x40/0x50) from [<800bf45c>]
(handle_irq_event_percpu+0xc4/0x298)
[127100.246459] [<800bf45c>] (handle_irq_event_percpu+0xc4/0x298) from
[<800bf67c>] (handle_irq_event+0x4c/0x6c)
[127100.246459] [<800bf67c>] (handle_irq_event+0x4c/0x6c) from
[<800c23b4>] (handle_fasteoi_irq+0xd8/0x124)
[127100.246734] [<800c23b4>] (handle_fasteoi_irq+0xd8/0x124) from
[<800bedd0>] (generic_handle_irq+0x30/0x40)
[127100.246765] [<800bedd0>] (generic_handle_irq+0x30/0x40) from
[<800140c4>] (handle_IRQ+0x88/0xc8)
[127100.246826] [<800140c4>] (handle_IRQ+0x88/0xc8) from [<800086d8>]
(gic_handle_irq+0x80/0xac)
[127100.246917] [<800086d8>] (gic_handle_irq+0x80/0xac) from
[<806a91c0>] (__irq_svc+0x40/0x70)
[127100.246917] Exception stack(0x80a47bf8 to 0x80a47c40)
[127100.246978] 7be0:
 000d 2180
[127100.247009] 7c00: 000d 0005 8bb55040 8236d180 00f7a819
81e00d5a 0068 80c5ee80
[127100.247100] 7c20: 0002 80a47c5c 80a47c40 80a47c40 805c734c
805c9a70 80070113 
[127100.247100] [<806a91c0>] (__irq_svc+0x40/0x70) from [<805c9a70>]
(tcp_event_data_recv+0x118/0x194)
[127100.247161] [<805c9a70>] (tcp_event_data_recv+0x118/0x194) from
[<805cc554>] (tcp_data_queue+0x30c/0x998)
[127100.247222] [<805cc554>] (tcp_data_queue+0x30c/0x998) from
[<805cfa30>] (tc