Re: igb network lockups

2013-03-05 Thread Barney Cordoba


--- On Mon, 3/4/13, Zaphod Beeblebrox  wrote:

> From: Zaphod Beeblebrox 
> Subject: Re: igb network lockups
> To: "Jack Vogel" 
> Cc: "Nick Rogers" , "Sepherosa Ziehau" 
> , "Christopher D. Harrison" , 
> "freebsd-net@freebsd.org" 
> Date: Monday, March 4, 2013, 1:58 PM
> For everyone having lockup problems
> with IGB, I'd like to ask if they could
> try disabling hyperthreads --- this worked for me on one
> system but has
> been unnecessary on others.

Gee, maybe binding an interrupt to a virtual cpu isn't a good idea?
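If you want to see how the vectors ended up spread and move one off a
hyperthread sibling, something along these lines should work (the irq and
cpu numbers below are purely illustrative):

    # vmstat -i | grep igb      # list the igb interrupt vectors and their irq numbers
    # cpuset -l 0 -x 256        # rebind irq 256 to cpu 0 (a physical core)

cpuset -x moves a single interrupt; repeat per vector, or just test with
hyperthreading off as suggested above.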

BC

Re: igb network lockups

2013-03-04 Thread Nick Rogers
On Mon, Mar 4, 2013 at 10:22 AM, Jack Vogel  wrote:
>
>>
>> Thanks. So on FreeBSD 9.1-RELEASE it is advisable to set
>> hw.em.enable_msix=0 for 82574L? Are there other em(x) NICs where this
>> is advisable?
>>
>
> As I explained in a previous email, this is not advisable unless you are
> experiencing problems (like hangs), if you are then its one possible
> cause, so try falling back to MSI to see if it eliminates your problem.
>
> And, 82574 is the only devise the em driver supports at present that is
> capable of MSIX, all others use the igb driver.
>

Jack, thanks for clarifying. It's much appreciated.

> Regards,
>
> Jack
>


Re: igb network lockups

2013-03-04 Thread Zaphod Beeblebrox
For everyone having lockup problems with IGB, I'd like to ask if they could
try disabling hyperthreads --- this worked for me on one system but has
been unnecessary on others.
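One way to try this without a BIOS visit, assuming the tunable is present
on your kernel, is a loader.conf entry (sketch; takes effect on reboot):

    # /boot/loader.conf
    machdep.hyperthreading_allowed="0"   # keep the scheduler off HTT logical CPUs

Disabling HT in the BIOS/firmware setup accomplishes the same thing.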


Re: igb network lockups

2013-03-04 Thread Jack Vogel
>
> Thanks. So on FreeBSD 9.1-RELEASE it is advisable to set
> hw.em.enable_msix=0 for 82574L? Are there other em(x) NICs where this
> is advisable?
>
>
As I explained in a previous email, this is not advisable unless you are
experiencing problems (like hangs); if you are, then it's one possible
cause, so try falling back to MSI to see if it eliminates your problem.

And the 82574 is the only device the em driver supports at present that is
capable of MSI-X; all others use the igb driver.
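For reference, a minimal way to try that fallback is the loader tunable,
e.g. in /boot/loader.conf (sketch; takes effect on reboot):

    hw.em.enable_msix="0"    # force em(4) to use plain MSI instead of MSI-X

Remove the line again if it makes no difference, since MSI-X is the better
default when the platform behaves.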

Regards,

Jack


Re: igb network lockups

2013-03-04 Thread Jack Vogel
On Sun, Mar 3, 2013 at 2:14 AM, Sepherosa Ziehau wrote:
...

>
>
> For 82574L, i.e. supported by em(4), MSI-X must _not_ be enabled; it
> is simply broken (you could check 82574 errata on Intel's website to
> confirm what I have said here).
>

If you actually checked the errata you would find that it's not "simply
broken"; furthermore, it most certainly SHOULD be enabled, and it is by
default in the Linux driver as well as mine. The issue is not with the
82574; it's with some system designs that have upstream PCIe problems.

If you experience problems with a particular system, then we recommend
disabling MSIX to determine if this hardware issue may be behind it. In most
cases MSIX works just fine.
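A quick way to confirm which mode a port actually came up in is the
interrupt listing (vector names vary by driver version):

    # vmstat -i | grep em0

With MSI-X you will typically see several vectors for the port (separate
rx/tx/link entries); with plain MSI or INTx there is just a single em0
line.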


>
> For 82575, i.e. supported by igb(4), MSI-X must _not_ be enabled; it
> is simply broken (you could check 82575 errata on Intel's website to
> confirm what I have said here).
>
>
The same issue applies to the 82575: it's a system issue, and we have
tested the part on our reference systems under prolonged stress without any
problem. So the same recommendation as above applies.

Jack Vogel
Intel Network Division


Re: igb network lockups

2013-03-04 Thread Nick Rogers
On Sun, Mar 3, 2013 at 2:14 AM, Sepherosa Ziehau  wrote:
> On Sat, Mar 2, 2013 at 12:18 AM, Nick Rogers  wrote:
>> On Fri, Mar 1, 2013 at 8:04 AM, Nick Rogers  wrote:
>>> FWIW I have been experiencing a similar issue on a number of systems
>>> using the em(4) driver under 9.1-RELEASE. This is after upgrading from
>>> a snapshot of 8.3-STABLE. My systems use PF+ALTQ as well. The symptoms
>>> are: interface stops passing traffic until the system is rebooted. I
>>> have not yet been able to gain access to the systems to dig around
>>> (after they have crashed), however my kernel/network settings are
>>> properly tuned (high mbuf limit, hw.em.rxd/txd=4096, etc). It seems to
>>> happen about once a day on systems with around a sustained 50Mb/s of
>>> traffic.
>>>
>>> I realize this is not much to go on but perhaps it helps. I am
>>> debating trying the e1000 driver in the latest CURRENT on top of
>>> 9.1-RELEASE. I noticed the Intel shared code was updated about a week
>>> ago. Would this change or perhaps another change to e1000 since
>>> 9.1-RELEASE possibly affect stability in a positive way?
>>>
>>> Thanks.
>>
>> Heres relevant pciconf output:
>>
>> em0@pci0:1:0:0: class=0x02 card=0x10d315d9 chip=0x10d38086 rev=0x00 
>> hdr=0x00
>> vendor = 'Intel Corporation'
>> device = '82574L Gigabit Network Connection'
>> class  = network
>> subclass   = ethernet
>> cap 01[c8] = powerspec 2  supports D0 D3  current D0
>> cap 05[d0] = MSI supports 1 message, 64 bit
>> cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
>> cap 11[a0] = MSI-X supports 5 messages in map 0x1c enabled
>> ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
>
> For 82574L, i.e. supported by em(4), MSI-X must _not_ be enabled; it
> is simply broken (you could check 82574 errata on Intel's website to
> confirm what I have said here).

Thanks. So on FreeBSD 9.1-RELEASE, is it advisable to set
hw.em.enable_msix=0 for the 82574L? Are there other em(4) NICs where this
is advisable?

>
> For 82575, i.e. supported by igb(4), MSI-X must _not_ be enabled; it
> is simply broken (you could check 82575 errata on Intel's website to
> confirm what I have said here).
>
> Best Regards,
> sephe
>
> --
> Tomorrow Will Never Die
>

Re: igb network lockups

2013-03-03 Thread Sepherosa Ziehau
On Sat, Mar 2, 2013 at 12:18 AM, Nick Rogers  wrote:
> On Fri, Mar 1, 2013 at 8:04 AM, Nick Rogers  wrote:
>> FWIW I have been experiencing a similar issue on a number of systems
>> using the em(4) driver under 9.1-RELEASE. This is after upgrading from
>> a snapshot of 8.3-STABLE. My systems use PF+ALTQ as well. The symptoms
>> are: interface stops passing traffic until the system is rebooted. I
>> have not yet been able to gain access to the systems to dig around
>> (after they have crashed), however my kernel/network settings are
>> properly tuned (high mbuf limit, hw.em.rxd/txd=4096, etc). It seems to
>> happen about once a day on systems with around a sustained 50Mb/s of
>> traffic.
>>
>> I realize this is not much to go on but perhaps it helps. I am
>> debating trying the e1000 driver in the latest CURRENT on top of
>> 9.1-RELEASE. I noticed the Intel shared code was updated about a week
>> ago. Would this change or perhaps another change to e1000 since
>> 9.1-RELEASE possibly affect stability in a positive way?
>>
>> Thanks.
>
> Heres relevant pciconf output:
>
> em0@pci0:1:0:0: class=0x02 card=0x10d315d9 chip=0x10d38086 rev=0x00 
> hdr=0x00
> vendor = 'Intel Corporation'
> device = '82574L Gigabit Network Connection'
> class  = network
> subclass   = ethernet
> cap 01[c8] = powerspec 2  supports D0 D3  current D0
> cap 05[d0] = MSI supports 1 message, 64 bit
> cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
> cap 11[a0] = MSI-X supports 5 messages in map 0x1c enabled
> ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected

For 82574L, i.e. supported by em(4), MSI-X must _not_ be enabled; it
is simply broken (you could check 82574 errata on Intel's website to
confirm what I have said here).

For 82575, i.e. supported by igb(4), MSI-X must _not_ be enabled; it
is simply broken (you could check 82575 errata on Intel's website to
confirm what I have said here).
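For anyone who wants to follow that advice, the loader tunables below
should force both drivers back to MSI (sketch for /boot/loader.conf, not
tested here):

    hw.em.enable_msix="0"     # em(4): 82574L
    hw.igb.enable_msix="0"    # igb(4): 82575 and friends

Note that Jack disputes this blanket recommendation elsewhere in the
thread, so treat it as a diagnostic step rather than a default.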

Best Regards,
sephe

--
Tomorrow Will Never Die


Re: igb network lockups

2013-03-02 Thread Barney Cordoba


--- On Mon, 2/25/13, Christopher D. Harrison  wrote:

> From: Christopher D. Harrison 
> Subject: Re: igb network lockups
> To: "Jack Vogel" 
> Cc: freebsd-net@freebsd.org
> Date: Monday, February 25, 2013, 1:38 PM
> Sure,
> The problem appears on both systems running with ALTQ and vanilla.
>      -C
> On 02/25/13 12:29, Jack Vogel wrote:
> > I've not heard of this problem, but I think most users do not use
> > ALTQ, and we (Intel) do not test using it. Can it be eliminated from
> > the equation?
> >
> > Jack
> >
> >
> > On Mon, Feb 25, 2013 at 10:16 AM, Christopher D. Harrison
> > <harri...@biostat.wisc.edu> wrote:
> >
> >     I recently have been experiencing network "freezes" and network
> >     "lockups" on our Freebsd 9.1 systems which are running zfs and nfs
> >     file servers.
> >     I upgraded from 9.0 to 9.1 about 2 months ago and we have been
> >     having issues almost bi-monthly.   The issue manifests in the
> >     system becoming unresponsive to any/all nfs clients.   The system
> >     is not resource bound as our I/O is low to disk and our network is
> >     usually in the 20mbit/40mbit range.   We do notice a correlation
> >     between temporary i/o spikes and network freezes but not enough to
> >     send our system in to "lockup" mode for the next 5min.   Currently
> >     we have 4 igb nics in 2 aggr's with 8 queues per nic and our
> >     dev.igb reports:
> >
> >     dev.igb.3.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.4
> >
> >     I am almost certain the problem is with the igb driver as a friend
> >     is also experiencing the same problem with the same intel igb nic.
> >     He has addressed the issue by restarting the network using netif
> >     on his systems.   According to my friend, once the network
> >     interfaces get cleared, everything comes back and starts working
> >     as expected.
> >
> >     I have noticed an issue with the igb driver and I was looking for
> >     thoughts on how to help address this problem.
> >     http://freebsd.1045724.n5.nabble.com/em-igb-if-transmit-drbr-and-ALTQ-td5760338.html
> >
> >     Thoughts/Ideas are greatly appreciated!!!
> >
> >         -C

Do you have 32 cpus in the system? You've created a lock contention
nightmare; frankly, I'm surprised that the system runs at all.

Try running with 1 queue per nic. The point of using queues is to spread
the load; the fact that you're even using queues with such a minuscule load
is a commentary on the blind use of "features" without any explanation or
understanding of what they do.
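For what it's worth, on 9.x that can be done with a loader tunable rather
than a driver rebuild (sketch for /boot/loader.conf):

    hw.igb.num_queues="1"    # one rx/tx queue pair per igb port; 0 = automatic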

Does igb still bind to CPUs without regard to whether it's a real cpu or
a hyperthread? This needs to be removed.

I wish that someone who understood this stuff would have a beer with Jack
and explain to him why this design is defective. The "default" for this
driver is almost always the wrong configuration.

You don't need to spread the load with 40Mb/s of throughput, and using
multiple queues will use a lot more CPU than using just 1. Do you really
want 4 cpus using 10% instead of 1 using 14%?

You should also consider increasing your tx buffers; a property of
applications like ALTQ is that they tend to send out big bursts of
packets, which can overflow the rings. I'm not specifically familiar with
ALTQ, so I'm not sure how it handles such things, nor am I sure how it
handles multiple tx queues, if at all.
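If you go that route, the descriptor ring sizes are loader tunables as well
(values illustrative; 4096 is the hardware maximum per ring for igb):

    hw.igb.txd="4096"
    hw.igb.rxd="4096"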

BC


Re: igb network lockups

2013-03-01 Thread Nick Rogers
On Fri, Mar 1, 2013 at 8:04 AM, Nick Rogers  wrote:
> FWIW I have been experiencing a similar issue on a number of systems
> using the em(4) driver under 9.1-RELEASE. This is after upgrading from
> a snapshot of 8.3-STABLE. My systems use PF+ALTQ as well. The symptoms
> are: interface stops passing traffic until the system is rebooted. I
> have not yet been able to gain access to the systems to dig around
> (after they have crashed), however my kernel/network settings are
> properly tuned (high mbuf limit, hw.em.rxd/txd=4096, etc). It seems to
> happen about once a day on systems with around a sustained 50Mb/s of
> traffic.
>
> I realize this is not much to go on but perhaps it helps. I am
> debating trying the e1000 driver in the latest CURRENT on top of
> 9.1-RELEASE. I noticed the Intel shared code was updated about a week
> ago. Would this change or perhaps another change to e1000 since
> 9.1-RELEASE possibly affect stability in a positive way?
>
> Thanks.

Here's the relevant pciconf output:

em0@pci0:1:0:0: class=0x02 card=0x10d315d9 chip=0x10d38086 rev=0x00 hdr=0x00
vendor = 'Intel Corporation'
device = '82574L Gigabit Network Connection'
class  = network
subclass   = ethernet
cap 01[c8] = powerspec 2  supports D0 D3  current D0
cap 05[d0] = MSI supports 1 message, 64 bit
cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
cap 11[a0] = MSI-X supports 5 messages in map 0x1c enabled
ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
em1@pci0:2:0:0: class=0x02 card=0x10d315d9 chip=0x10d38086 rev=0x00 hdr=0x00
vendor = 'Intel Corporation'
device = '82574L Gigabit Network Connection'
class  = network
subclass   = ethernet
cap 01[c8] = powerspec 2  supports D0 D3  current D0
cap 05[d0] = MSI supports 1 message, 64 bit
cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
cap 11[a0] = MSI-X supports 5 messages in map 0x1c enabled
ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
em2@pci0:7:0:0: class=0x02 card=0x10d315d9 chip=0x10d38086 rev=0x00 hdr=0x00
vendor = 'Intel Corporation'
device = '82574L Gigabit Network Connection'
class  = network
subclass   = ethernet
cap 01[c8] = powerspec 2  supports D0 D3  current D0
cap 05[d0] = MSI supports 1 message, 64 bit
cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
cap 11[a0] = MSI-X supports 5 messages in map 0x1c enabled
ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
em3@pci0:8:0:0: class=0x02 card=0x10d315d9 chip=0x10d38086 rev=0x00 hdr=0x00
vendor = 'Intel Corporation'
device = '82574L Gigabit Network Connection'
class  = network
subclass   = ethernet
cap 01[c8] = powerspec 2  supports D0 D3  current D0
cap 05[d0] = MSI supports 1 message, 64 bit
cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
cap 11[a0] = MSI-X supports 5 messages in map 0x1c enabled
ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected


>
> On Mon, Feb 25, 2013 at 10:45 AM, Jack Vogel  wrote:
>> Have you done any poking around, looking at stats to determine why the
>> hangs? For instance,
>> might your mbuf pool be depleted? Some other network resource perhaps?
>>
>> Jack
>>
>>
>> On Mon, Feb 25, 2013 at 10:38 AM, Christopher D. Harrison <
>> harri...@biostat.wisc.edu> wrote:
>>
>>>  Sure,
>>> The problem appears on both systems running with ALTQ and vanilla.
>>> -C
>>>
>>> On 02/25/13 12:29, Jack Vogel wrote:
>>>
>>> I've not heard of this problem, but I think most users do not use ALTQ,
>>> and we (Intel) do not
>>> test using it. Can it be eliminated from the equation?
>>>
>>> Jack
>>>
>>>
>>> On Mon, Feb 25, 2013 at 10:16 AM, Christopher D. Harrison <
>>> harri...@biostat.wisc.edu> wrote:
>>>
 I recently have been experiencing network "freezes" and network "lockups"
 on our Freebsd 9.1 systems which are running zfs and nfs file servers.
 I upgraded from 9.0 to 9.1 about 2 months ago and we have been having
 issues with almost bi-monthly.   The issue manifests in the system becomes
 unresponsive to any/all nfs clients.   The system is not resource bound as
 our I/O is low to disk and our network is usually in the 20mbit/40mbit
 range.   We do notice a correlation between temporary i/o spikes and
 network freezes but not enough to send our system in to "lockup" mode for
 the next 5min.   Currently we have 4 igb nics in 2 aggr's with 8 queue's
 per nic and our dev.igb reports:

 dev.igb.3.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.4

 I am almost certain the problem is with the ibg driver as a friend is
 also experiencing the same problem with the same intel igb nic.   He has
 addressed the issue by restarting the network using netif on his systems.
 According to my friend, once the network interfaces get cleared, everything
 comes back a

Re: igb network lockups

2013-03-01 Thread Nick Rogers
FWIW I have been experiencing a similar issue on a number of systems
using the em(4) driver under 9.1-RELEASE. This is after upgrading from
a snapshot of 8.3-STABLE. My systems use PF+ALTQ as well. The symptoms
are: interface stops passing traffic until the system is rebooted. I
have not yet been able to gain access to the systems to dig around
(after they have crashed), however my kernel/network settings are
properly tuned (high mbuf limit, hw.em.rxd/txd=4096, etc). It seems to
happen about once a day on systems with around a sustained 50Mb/s of
traffic.
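
For the record, the tuning in question is just a handful of loader.conf
entries along these lines (values illustrative; the mbuf limit is sized
for our traffic):

    kern.ipc.nmbclusters="262144"
    hw.em.rxd="4096"
    hw.em.txd="4096"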

I realize this is not much to go on but perhaps it helps. I am
debating trying the e1000 driver in the latest CURRENT on top of
9.1-RELEASE. I noticed the Intel shared code was updated about a week
ago. Would this change or perhaps another change to e1000 since
9.1-RELEASE possibly affect stability in a positive way?

Thanks.

On Mon, Feb 25, 2013 at 10:45 AM, Jack Vogel  wrote:
> Have you done any poking around, looking at stats to determine why the
> hangs? For instance,
> might your mbuf pool be depleted? Some other network resource perhaps?
>
> Jack
>
>
> On Mon, Feb 25, 2013 at 10:38 AM, Christopher D. Harrison <
> harri...@biostat.wisc.edu> wrote:
>
>>  Sure,
>> The problem appears on both systems running with ALTQ and vanilla.
>> -C
>>
>> On 02/25/13 12:29, Jack Vogel wrote:
>>
>> I've not heard of this problem, but I think most users do not use ALTQ,
>> and we (Intel) do not
>> test using it. Can it be eliminated from the equation?
>>
>> Jack
>>
>>
>> On Mon, Feb 25, 2013 at 10:16 AM, Christopher D. Harrison <
>> harri...@biostat.wisc.edu> wrote:
>>
>>> I recently have been experiencing network "freezes" and network "lockups"
>>> on our Freebsd 9.1 systems which are running zfs and nfs file servers.
>>> I upgraded from 9.0 to 9.1 about 2 months ago and we have been having
>>> issues with almost bi-monthly.   The issue manifests in the system becomes
>>> unresponsive to any/all nfs clients.   The system is not resource bound as
>>> our I/O is low to disk and our network is usually in the 20mbit/40mbit
>>> range.   We do notice a correlation between temporary i/o spikes and
>>> network freezes but not enough to send our system in to "lockup" mode for
>>> the next 5min.   Currently we have 4 igb nics in 2 aggr's with 8 queue's
>>> per nic and our dev.igb reports:
>>>
>>> dev.igb.3.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.4
>>>
>>> I am almost certain the problem is with the ibg driver as a friend is
>>> also experiencing the same problem with the same intel igb nic.   He has
>>> addressed the issue by restarting the network using netif on his systems.
>>> According to my friend, once the network interfaces get cleared, everything
>>> comes back and starts working as expected.
>>>
>>> I have noticed an issue with the igb driver and I was looking for
>>> thoughts on how to help address this problem.
>>>
>>> http://freebsd.1045724.n5.nabble.com/em-igb-if-transmit-drbr-and-ALTQ-td5760338.html
>>>
>>> Thoughts/Ideas are greatly appreciated!!!
>>>
>>> -C
>>>
>>> ___
>>> freebsd-net@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
>>>
>>
>>
>>
> ___
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: igb network lockups

2013-02-25 Thread Jack Vogel
Have you done any poking around, looking at stats, to determine why it
hangs? For instance, might your mbuf pool be depleted? Some other network
resource, perhaps?
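The quick checks for that are stock FreeBSD commands, roughly:

    # netstat -m                  # mbuf/cluster usage vs. limits, "requests for mbufs denied"
    # vmstat -z | grep -i mbuf    # per-zone allocation failure counters

If the denied/failed counters climb while the interface is wedged, the
mbuf pool is the likely suspect.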

Jack


On Mon, Feb 25, 2013 at 10:38 AM, Christopher D. Harrison <
harri...@biostat.wisc.edu> wrote:

>  Sure,
> The problem appears on both systems running with ALTQ and vanilla.
> -C
>
> On 02/25/13 12:29, Jack Vogel wrote:
>
> I've not heard of this problem, but I think most users do not use ALTQ,
> and we (Intel) do not
> test using it. Can it be eliminated from the equation?
>
> Jack
>
>
> On Mon, Feb 25, 2013 at 10:16 AM, Christopher D. Harrison <
> harri...@biostat.wisc.edu> wrote:
>
>> I recently have been experiencing network "freezes" and network "lockups"
>> on our Freebsd 9.1 systems which are running zfs and nfs file servers.
>> I upgraded from 9.0 to 9.1 about 2 months ago and we have been having
>> issues with almost bi-monthly.   The issue manifests in the system becomes
>> unresponsive to any/all nfs clients.   The system is not resource bound as
>> our I/O is low to disk and our network is usually in the 20mbit/40mbit
>> range.   We do notice a correlation between temporary i/o spikes and
>> network freezes but not enough to send our system in to "lockup" mode for
>> the next 5min.   Currently we have 4 igb nics in 2 aggr's with 8 queue's
>> per nic and our dev.igb reports:
>>
>> dev.igb.3.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.4
>>
>> I am almost certain the problem is with the ibg driver as a friend is
>> also experiencing the same problem with the same intel igb nic.   He has
>> addressed the issue by restarting the network using netif on his systems.
>> According to my friend, once the network interfaces get cleared, everything
>> comes back and starts working as expected.
>>
>> I have noticed an issue with the igb driver and I was looking for
>> thoughts on how to help address this problem.
>>
>> http://freebsd.1045724.n5.nabble.com/em-igb-if-transmit-drbr-and-ALTQ-td5760338.html
>>
>> Thoughts/Ideas are greatly appreciated!!!
>>
>> -C
>>
>> ___
>> freebsd-net@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
>>
>
>
>


Re: igb network lockups

2013-02-25 Thread Christopher D. Harrison

Sure,
The problem appears on both systems running with ALTQ and vanilla.
-C
On 02/25/13 12:29, Jack Vogel wrote:
I've not heard of this problem, but I think most users do not use
ALTQ, and we (Intel) do not test using it. Can it be eliminated from the
equation?

Jack


On Mon, Feb 25, 2013 at 10:16 AM, Christopher D. Harrison
<harri...@biostat.wisc.edu> wrote:


I recently have been experiencing network "freezes" and network
"lockups" on our Freebsd 9.1 systems which are running zfs and nfs
file servers.
I upgraded from 9.0 to 9.1 about 2 months ago and we have been
having issues with almost bi-monthly.   The issue manifests in the
system becomes unresponsive to any/all nfs clients.   The system
is not resource bound as our I/O is low to disk and our network is
usually in the 20mbit/40mbit range.   We do notice a correlation
between temporary i/o spikes and network freezes but not enough to
send our system in to "lockup" mode for the next 5min.   Currently
we have 4 igb nics in 2 aggr's with 8 queue's per nic and our
dev.igb reports:

dev.igb.3.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.4

I am almost certain the problem is with the ibg driver as a friend
is also experiencing the same problem with the same intel igb nic.
  He has addressed the issue by restarting the network using netif
on his systems.   According to my friend, once the network
interfaces get cleared, everything comes back and starts working
as expected.

I have noticed an issue with the igb driver and I was looking for
thoughts on how to help address this problem.

http://freebsd.1045724.n5.nabble.com/em-igb-if-transmit-drbr-and-ALTQ-td5760338.html

Thoughts/Ideas are greatly appreciated!!!

-C



Re: igb network lockups

2013-02-25 Thread Jack Vogel
I've not heard of this problem, but I think most users do not use ALTQ, and
we (Intel) do not
test using it. Can it be eliminated from the equation?

Jack


On Mon, Feb 25, 2013 at 10:16 AM, Christopher D. Harrison <
harri...@biostat.wisc.edu> wrote:

> I recently have been experiencing network "freezes" and network "lockups"
> on our Freebsd 9.1 systems which are running zfs and nfs file servers.
> I upgraded from 9.0 to 9.1 about 2 months ago and we have been having
> issues with almost bi-monthly.   The issue manifests in the system becomes
> unresponsive to any/all nfs clients.   The system is not resource bound as
> our I/O is low to disk and our network is usually in the 20mbit/40mbit
> range.   We do notice a correlation between temporary i/o spikes and
> network freezes but not enough to send our system in to "lockup" mode for
> the next 5min.   Currently we have 4 igb nics in 2 aggr's with 8 queue's
> per nic and our dev.igb reports:
>
> dev.igb.3.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.4
>
> I am almost certain the problem is with the ibg driver as a friend is also
> experiencing the same problem with the same intel igb nic.   He has
> addressed the issue by restarting the network using netif on his systems.
> According to my friend, once the network interfaces get cleared, everything
> comes back and starts working as expected.
>
> I have noticed an issue with the igb driver and I was looking for thoughts
> on how to help address this problem.
> http://freebsd.1045724.n5.nabble.com/em-igb-if-transmit-drbr-and-ALTQ-td5760338.html
>
> Thoughts/Ideas are greatly appreciated!!!
>
> -C
>