On 07/23/2012 12:41 AM, Pekka Riikonen wrote:
> Hi,
>
> In our 64-byte packet test with 12 10GbE ports we encountered some
> interesting softlockups and interrupt rates.  For some reason we suddenly
> started seeing softlockups, usually in kworker (doing various work), while
> processing packets.  In this test we sent a total of 40 Mpps to all ports,
> using a heavily modified ixgbe from sourceforge.net with pause frames off.
>
> Softlockups such as:
>
> [  250.133274] BUG: soft lockup - CPU#10 stuck for 22s! [kworker/10:1:77]
> [  250.133404] Process kworker/10:1 (pid: 77, threadinfo ffff88107c7c0000, 
> [  250.133441] Call Trace:
> [  250.133444]  <IRQ>
> [  250.133456]  [<ffffffffa0048a89>] ixgbe_clean_rx_irq+0x269/0x4e0 [ixgbe]
> [  250.133464]  [<ffffffffa004932c>] ixgbe_poll+0x25c/0x660 [ixgbe]
> [  250.133472]  [<ffffffff81397e87>] net_rx_action+0xa7/0x2a0
> [  250.133481]  [<ffffffff8103c108>] __do_softirq+0x98/0x120
> [  250.133489]  [<ffffffff81469f8c>] call_softirq+0x1c/0x30
> [  250.133497]  [<ffffffff810044fd>] do_softirq+0x4d/0x80
> [  250.133503]  [<ffffffff8103c3b5>] irq_exit+0x65/0x70
> [  250.133508]  [<ffffffff8100439e>] do_IRQ+0x5e/0xd0
> [  250.133517]  [<ffffffff81468813>] common_interrupt+0x13/0x13
> [  250.133521]  <EOI>
> [  250.133528]  [<ffffffffa004eb5f>] ? ixgbe_update_stats+0x13f/0xca0 [ixgbe]
> [  250.133535]  [<ffffffff8146880e>] ? common_interrupt+0xe/0x13
> [  250.133543]  [<ffffffffa004fd55>] ixgbe_service_task+0x695/0x970 [ixgbe]
> [  250.133551]  [<ffffffffa004f6c0>] ? ixgbe_update_stats+0xca0/0xca0 [ixgbe]
> [  250.133558]  [<ffffffff8104c0c1>] process_one_work+0x101/0x390
> [  250.133564]  [<ffffffff8104c91f>] worker_thread+0x15f/0x350
> [  250.133569]  [<ffffffff8104c7c0>] ? manage_workers.isra.32+0x220/0x220
> [  250.133577]  [<ffffffff81050b27>] kthread+0x87/0x90
> [  250.133584]  [<ffffffff81469e94>] kernel_thread_helper+0x4/0x10
> [  250.133590]  [<ffffffff81050aa0>] ? kthread_worker_fn+0x130/0x130
> [  250.133595]  [<ffffffff81469e90>] ? gs_change+0xb/0xb
>
> I traced the problem to the NAPI poll return value in ixgbe_poll()
> when exiting polling mode.  In that case ixgbe returns 0, and not the
> actual value of work done.  This helps throughput but also makes NET_RX
> run longer in hardirq context.  OTOH, if I changed it to the true work done
> value the throughput suffered too much, so I settled on workdone >> 2 as a
> hack.
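
For illustration only, here is a simplified user-space model of the budget accounting described above. It is not kernel or ixgbe code. It assumes a NET_RX-style loop that subtracts each poll's return value from a fixed budget and stops once the budget is used up; the names `reported_work`, `model_rx_action` and `report_mode`, and the `max_polls` cap (a stand-in for the kernel's time limit), are hypothetical. The point is only to show how the reported value changes how many polls fit into one pass.

```c
/* Simplified model, NOT kernel code: all names here are hypothetical. */

enum report_mode {
    REPORT_ZERO,     /* poll reports 0 regardless of work done */
    REPORT_QUARTER,  /* poll reports workdone >> 2 (the hack) */
    REPORT_FULL      /* poll reports the true work done */
};

/* Value a hypothetical poll routine reports after cleaning `done` packets. */
static int reported_work(enum report_mode mode, int done)
{
    switch (mode) {
    case REPORT_ZERO:
        return 0;
    case REPORT_QUARTER:
        return done >> 2;
    default:
        return done;
    }
}

/*
 * Count how many polls run in one pass.  The pass ends when the budget
 * is exhausted or when max_polls is reached; max_polls stands in for a
 * time limit so that the REPORT_ZERO case terminates at all.
 */
static int model_rx_action(enum report_mode mode, int budget,
                           int work_per_poll, int max_polls)
{
    int polls = 0;

    while (budget > 0 && polls < max_polls) {
        budget -= reported_work(mode, work_per_poll);
        polls++;
    }
    return polls;
}
```

With an assumed budget of 300 and 64 packets cleaned per poll, `model_rx_action(REPORT_FULL, 300, 64, 1000)` stops after 5 polls and `REPORT_QUARTER` after 19, while `REPORT_ZERO` never drains the budget and runs until the cap.
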
>
> But I still wanted to know why this problem happens because even if 0 is 
> returned in poll() the softirqs aren't designed to run forever.  So I 
> started looking at the interrupt rate and noticed that in this particular 
> test it oscillated a lot, sometimes going up to 300k+ ints/sec, even 
> though the traffic is stable.  Apparently the interrupt rate was so high 
> that it could starve the user context.
>
> The problem went away with the hack in ixgbe_poll(), but it got me
> wondering why the ITR value is not updated on every ixgbe_poll() call
> rather than only after napi_complete().  It should be more stable if it
> were updated at each poll().
>
> The problem also went away when reducing the number of ports, which makes
> me think this problem will reappear when we finally start testing with
> 16-24 ports.
>
> And of course, softlockups went away when the traffic was stopped.
>
> Now, my analysis could be wrong, and I have to say that we have a heavily
> modified ixgbe driver and kernel, so it's possible that this problem
> doesn't happen with a vanilla driver and kernel.
>
>       Pekka
>
Hello Pekka,

You say you heavily modified the ixgbe driver.  I was wondering if you
are able to see the same issue with an unmodified driver?

Based on your description it sounds like there may be an issue with the
interrupt moderation for the adapter.  You might try using our ethregs
utility, available at e1000.sf.net, to dump the contents of the EITR
registers for the adapter.  That way you could at least verify what
interrupt rate is being programmed into the adapters.
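
As a rough aid for reading such a dump, the sketch below converts a raw EITR value into an approximate maximum interrupt rate.  It rests on an assumption that should be checked against the datasheet for the adapter in question: that the interval field occupies bits 11:3 and counts in 2 usec units at 10GbE link speed.  The macro and function names are hypothetical and are not part of ethregs or ixgbe.

```c
#include <stdint.h>

/* ASSUMED layout: interval in bits 11:3, 2 usec per count (verify in datasheet). */
#define EITR_INTERVAL_MASK   0x00000FF8u
#define EITR_INTERVAL_SHIFT  3
#define EITR_UNIT_USEC       2u

/* Minimum gap between interrupts, in microseconds, for a raw EITR value. */
static uint32_t eitr_interval_usec(uint32_t eitr)
{
    return ((eitr & EITR_INTERVAL_MASK) >> EITR_INTERVAL_SHIFT) * EITR_UNIT_USEC;
}

/*
 * Approximate upper bound on interrupts per second for one vector.
 * Returns 0 when the interval field is 0, i.e. no moderation is programmed.
 */
static uint32_t eitr_max_irqs_per_sec(uint32_t eitr)
{
    uint32_t usec = eitr_interval_usec(eitr);

    return usec ? 1000000u / usec : 0;
}
```

Under that assumption, a register value of 0x28 corresponds to a 10 usec interval, or at most about 100000 interrupts per second on that vector, and 0xC8 corresponds to 50 usec, or about 20000.
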

Thanks,

Alex

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired