> -----Original Message-----
> From: Tim Pepper [mailto:[email protected]]
> Sent: Wednesday, February 15, 2012 1:19 PM
> To: [email protected]
> Subject: [E1000-devel] e1000_close() and concurrent reset
>
> I've got some systems whose NICs periodically go up and down, plus I
> believe there are occasionally some "interesting" external network
> issues triggering NIC resets. These machines are running the 1.6.3
> Intel driver on a 2.6.32-based kernel. I have two sets of crashes that
> go:
>
> Feb 9 22:50:41 kernel: e1000e 0000:0b:00.0: eth1: Reset adapter
> ...
> Feb 9 22:50:41 kernel: WARNING: at extra_drivers/open/e1000e_1_6_3/netdev.c:4676 e1000_close+0x162/0x170 [e1000e_1_6_3]()
> ...
> Feb 9 22:50:41 kernel: BUG: unable to handle kernel NULL pointer dereference at 00000004
> Feb 9 22:50:41 kernel: IP: [<f8747e55>] e1000_put_txbuf+0x15/0x90 [e1000e_1_6_3]
> ...
> Feb 9 22:50:45 kernel: kernel BUG at drivers/pci/msi.c:284!
> Feb 9 22:50:45 kernel: invalid opcode: 0000 [#2] SMP
>
>
> Feb 14 13:50:06 kernel: e1000e 0000:15:00.0: eth0: Reset adapter
> ...
> Feb 14 13:50:06 kernel: WARNING: at extra_drivers/open/e1000e_1_6_3/netdev.c:4676 e1000_close+0x162/0x170 [e1000e_1_6_3]()
> ...
> Feb 14 13:50:06 kernel: BUG: unable to handle kernel NULL pointer dereference at 00000008
> Feb 14 13:50:06 kernel: IP: [<f866f6a8>] e1000_alloc_rx_buffers+0x98/0x270 [e1000e_1_6_3]
> ...
> Feb 14 13:50:07 kernel: kernel BUG at drivers/pci/msi.c:284!
> Feb 14 13:50:07 kernel: invalid opcode: 0000 [#2] SMP
>
>
> A very similar bug report is here:
> http://lists.openwall.net/netdev/2011/11/14/127
> and notes two issues:
> 1) The napi_enable() and napi_disable() calls should only be made in
>    the e1000_open and e1000_close functions, respectively.
> 2) There is no synchronization preventing a call to the driver close
>    while executing error processing.
>
> This led to upstream kernel commit
> 5f4a780ddd453c4918555fed9d9c5f2d455a087d with respect to #1 about a
> month after 1.6.3 came out. I don't see the fix for #1 in driver 1.9.5,
> though, which came out a few weeks after the upstream commit. Is this
> fix going to be available in an Intel driver update in the future?
Yes; it should be released in a few weeks.

>
> We don't explicitly set CONFIG_E1000E_NAPI in our build, but it looks
> like src/kcompat.h probably automagically sets it since we haven't
> defined E1000E_NO_NAPI. So we likely hit issue #1.
>
> But what about #2? It seems like something would still be needed to
> address that, and given a reading of the code paths involved with the
> above kernel warnings/bugs, that concurrency issue seems to be just
> what we're hitting. Does Intel have a fix in the works for that
> portion? Any patches we might be able to test?

I'm not aware of anything in the works for this.

>
>
> --
> Tim Pepper <[email protected]>
> IBM Linux Technology Center

_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired
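The race described in issue #2 (close running while error recovery is tearing down and rebuilding the rings) can be illustrated with a small user-space sketch. This is a hypothetical model, not e1000e code: `struct fake_adapter`, `fake_reset_task()` and `fake_close()` are invented names, and a C11 `atomic_bool` stands in for the kernel's bit operations on adapter state. The idea shown is that both paths claim the same "resetting" flag before touching the rings, and the reset path re-checks whether the interface is still open once it owns the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical adapter state; NOT the real e1000e structure. */
struct fake_adapter {
    atomic_bool resetting;  /* held by whichever path currently owns the rings */
    bool rings_allocated;   /* stands in for the Tx/Rx descriptor rings */
    int resets_done;        /* number of resets that actually rebuilt the rings */
};

static void fake_adapter_init(struct fake_adapter *a)
{
    atomic_init(&a->resetting, false);
    a->rings_allocated = true;  /* as if the interface had just been opened */
    a->resets_done = 0;
}

/* Busy-wait until this caller owns the flag. A kernel driver would
 * normally sleep between attempts instead of spinning. */
static void acquire_resetting(struct fake_adapter *a)
{
    while (atomic_exchange(&a->resetting, true)) {
        /* another path (reset or close) holds the flag; retry */
    }
}

static void release_resetting(struct fake_adapter *a)
{
    assert(atomic_load(&a->resetting)); /* releasing an unheld flag is a bug */
    atomic_store(&a->resetting, false);
}

/* Error-recovery path: tear down and rebuild the rings, but only if the
 * interface was not closed while this task was waiting for the flag. */
static void fake_reset_task(struct fake_adapter *a)
{
    acquire_resetting(a);
    if (a->rings_allocated) {
        a->rings_allocated = false; /* roughly where the "down" work happens */
        a->rings_allocated = true;  /* roughly where the "up" work happens */
        a->resets_done++;
    }
    release_resetting(a);
}

/* Close path: waits for any in-flight reset before freeing the rings. */
static void fake_close(struct fake_adapter *a)
{
    acquire_resetting(a);
    a->rings_allocated = false;
    release_resetting(a);
}
```

In this model, calling `fake_close()` and then a late `fake_reset_task()` leaves the rings freed instead of reallocating buffers for a closed interface, which is the kind of use-after-close the traces above suggest. A shared flag is only one possible design; a mutex held across both the close and reset paths would serve the same purpose.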
