> -----Original Message-----
> From: Tim Pepper [mailto:[email protected]]
> Sent: Wednesday, February 15, 2012 1:19 PM
> To: [email protected]
> Subject: [E1000-devel] e1000_close() and concurrent reset
> 
> I've got some systems whose NICs periodically go up and down, plus I
> believe there are occasionally some "interesting" external network
> issues triggering NIC resets.  These machines are running the 1.6.3
> Intel driver on a 2.6.32-based kernel.  I have two sets of crashes
> that go:
> 
> Feb  9 22:50:41 kernel: e1000e 0000:0b:00.0: eth1: Reset adapter
>     ...
> Feb  9 22:50:41 kernel: WARNING: at extra_drivers/open/e1000e_1_6_3/netdev.c:4676 e1000_close+0x162/0x170 [e1000e_1_6_3]()
>     ...
> Feb  9 22:50:41 kernel: BUG: unable to handle kernel NULL pointer dereference at 00000004
> Feb  9 22:50:41 kernel: IP: [<f8747e55>] e1000_put_txbuf+0x15/0x90 [e1000e_1_6_3]
>     ...
> Feb  9 22:50:45 kernel: kernel BUG at drivers/pci/msi.c:284!
> Feb  9 22:50:45 kernel: invalid opcode: 0000 [#2] SMP
> 
> 
> Feb 14 13:50:06 kernel: e1000e 0000:15:00.0: eth0: Reset adapter
>     ...
> Feb 14 13:50:06 kernel: WARNING: at extra_drivers/open/e1000e_1_6_3/netdev.c:4676 e1000_close+0x162/0x170 [e1000e_1_6_3]()
>     ...
> Feb 14 13:50:06 kernel: BUG: unable to handle kernel NULL pointer dereference at 00000008
> Feb 14 13:50:06 kernel: IP: [<f866f6a8>] e1000_alloc_rx_buffers+0x98/0x270 [e1000e_1_6_3]
>     ...
> Feb 14 13:50:07 kernel: kernel BUG at drivers/pci/msi.c:284!
> Feb 14 13:50:07 kernel: invalid opcode: 0000 [#2] SMP
> 
> 
> A very similar bug report is here:
> http://lists.openwall.net/netdev/2011/11/14/127
> and notes two issues:
>    1) The napi_enable() and napi_disable() should only be called in
>       the e1000_open and e1000_close functions respectively
>    2) There is no synchronization preventing a call to the driver
>       close while executing error processing.
> 
> This led to upstream kernel commit
> 5f4a780ddd453c4918555fed9d9c5f2d455a087d with respect to #1 about a
> month after 1.6.3 came out.  I don't see the fix for #1 in driver
> 1.9.5 though which came out a few weeks after the upstream commit.
> Is this fix going to be available in an Intel driver update in the
> future?

Yes; it should be released in a few weeks.

> 
> We don't explicitly set CONFIG_E1000E_NAPI in our build, but it looks
> like src/kcompat.h probably automagically sets it since we haven't
> defined E1000E_NO_NAPI.  So we likely hit issue #1.
> 
> But what about #2?  It seems like something would still be needed to
> address that and, given a reading of the code paths involved with the
> above kernel warnings/bugs, that concurrency issue seems to be just
> what we're hitting.  Does Intel have a fix in the works for that
> portion?  Any patches we might be able to test?

I'm not aware of anything in the works for this.
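
For illustration only (this is not a driver patch and has not been run
against e1000e): issue #2 is a race between the reset/error path and
close. A minimal user-space sketch of one way to serialize the two is
below. It assumes a hypothetical `struct adapter` with a mutex and a
`down` flag; none of these names or functions come from the e1000e
source, and a real kernel fix would use kernel locking primitives
rather than pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for driver state; NOT the e1000e adapter struct. */
struct adapter {
    pthread_mutex_t lock;  /* serializes the reset path against close */
    bool down;             /* set once close has started */
    int *tx_ring;          /* resource that close frees */
    int resets;            /* number of resets actually performed */
};

/* Allocate resources and mark the adapter as up. Returns 0 on success. */
int adapter_open(struct adapter *a)
{
    if (pthread_mutex_init(&a->lock, NULL) != 0)
        return -1;
    a->tx_ring = calloc(16, sizeof(*a->tx_ring));
    if (!a->tx_ring) {
        pthread_mutex_destroy(&a->lock);
        return -1;
    }
    a->down = false;
    a->resets = 0;
    return 0;
}

/*
 * Reset path: only touches the ring if close has not begun.
 * Returns true if the reset ran, false if it was skipped.
 */
bool adapter_reset(struct adapter *a)
{
    bool ran = false;

    pthread_mutex_lock(&a->lock);
    if (!a->down) {
        /* Close cannot free the ring while we hold the lock. */
        assert(a->tx_ring != NULL);
        a->tx_ring[0] = 0;
        a->resets++;
        ran = true;
    }
    pthread_mutex_unlock(&a->lock);
    return ran;
}

/*
 * Close path: marks the adapter down and frees the ring under the same
 * lock. The mutex is deliberately not destroyed here so that a late
 * reset can still take the lock and see the down flag.
 */
void adapter_close(struct adapter *a)
{
    pthread_mutex_lock(&a->lock);
    a->down = true;
    free(a->tx_ring);
    a->tx_ring = NULL;
    pthread_mutex_unlock(&a->lock);
}
```

With this shape, a reset that arrives after close has started becomes a
no-op instead of dereferencing a freed (NULL) ring, which is the kind of
failure the traces above suggest.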

> 
> 
> --
> Tim Pepper  <[email protected]>
> IBM Linux Technology Center


_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired
