Re: 7.1-STABLE Sun Mar 29 01:06:46 ADT 2009 Locks up ...

2009-05-11 Thread John Baldwin
On Saturday 09 May 2009 6:43:16 pm Marc G. Fournier wrote:
> On Tue, 28 Apr 2009, Gavin Atkinson wrote:
>
> > On Fri, 2009-04-24 at 20:39 +0200, Martin Schmidt wrote:
> > > Hi Marc and List,
> > >
> > > i had similar issues with FreeBSD 7.2-PRERELEASE. Server (zfs,nfs)
> > > seems to hang in intervals of about 8 hours.
> > > kernel is still there but no connections can be made to nfs/ssh and
> > > login on local console doesn't seem to
> > > work due to incredible slowness. breaking to the debugger takes a
> > > moment but works.
> > > (compiling kernel with WITNESS didnt help)
> > >
> > > the server had been solid before with 7 stable kernel from around 19
> > > October 2008.
> > >
> > > I now added these lines to /boot/loader.conf
> > >
> > > hw.pci.enable_msi=0
> > > hw.pci.enable_msix=0
> > >
> > > to disable Message Signaled Interrupts. Which are used by the 3ware
> > > twa driver and igb network driver on our server.
> >
> > If you are willing to test further on your server, it may be helpful if
> > you could determine which of those two lines in loader.conf fixes the
> > problem for you.  It would also be useful to provide a dmesg from the
> > machine when both msi and msix are enabled.
> >
> > FWIW, looking at the vmstat -i output it appears that only the igb
> > driver that are using MSI/MSIX, unless you have a reason to suspect
> > otherwise?
>
> How do you tell that, about igb?  looking at the server I have the igb
> device on, it doesn't seem to say anything about that ...

IRQs >= 256 are MSI/MSI-X.

-- 
John Baldwin
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
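
The rule in the reply above (interrupt numbers of 256 and up are MSI/MSI-X vectors) can be applied mechanically to `vmstat -i` output. The following is only a rough sketch, not anything posted in the thread: the function name `msi_devices` and the regular expression are made up for illustration, and the sample text is an excerpt of the listings quoted in this thread with the total and rate columns separated.

```python
import re

# Assumption taken from the reply above: on FreeBSD, interrupt numbers
# >= 256 are MSI/MSI-X vectors. The names here are illustrative only.
MSI_IRQ_BASE = 256

# Matches lines such as "irq256: igb0  1282945461  598".
IRQ_LINE = re.compile(r"^irq(\d+):\s+(.+?)\s+(\d+)\s+(\d+)$")


def msi_devices(vmstat_output):
    """Return {device: [irq, ...]} for interrupt sources with irq >= 256."""
    found = {}
    for line in vmstat_output.splitlines():
        match = IRQ_LINE.match(line.strip())
        if not match:
            continue  # header, "cpuN: timer" and "Total" lines
        irq, device = int(match.group(1)), match.group(2)
        if irq >= MSI_IRQ_BASE:
            found.setdefault(device, []).append(irq)
    return found


# Excerpt from the igb server's listing in this thread.
SAMPLE_IGB = """\
interrupt  total  rate
irq1: atkbd0 162 0
irq30: twa0 402647215 187
irq256: igb0 1282945461 598
irq257: igb0 215507100 100
cpu0: timer 4284778818 1999
"""

# Excerpt from the bge/ciss server's listing in this thread.
SAMPLE_BGE = """\
irq25: bge0 4614816 213
irq72: ciss0 1835763 85
cpu0: timer 43113685 1997
"""

print(msi_devices(SAMPLE_IGB))  # {'igb0': [256, 257]}
print(msi_devices(SAMPLE_BGE))  # {}
```

By this rule, none of the interrupt sources in the second excerpt has a number in the MSI/MSI-X range.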


RE: 7.1-STABLE Sun Mar 29 01:06:46 ADT 2009 Locks up ...

2009-05-09 Thread Marc G. Fournier

On Tue, 28 Apr 2009, Gavin Atkinson wrote:

> On Fri, 2009-04-24 at 20:39 +0200, Martin Schmidt wrote:
> > Hi Marc and List,
> >
> > i had similar issues with FreeBSD 7.2-PRERELEASE. Server (zfs,nfs)
> > seems to hang in intervals of about 8 hours.
> > kernel is still there but no connections can be made to nfs/ssh and
> > login on local console doesn't seem to
> > work due to incredible slowness. breaking to the debugger takes a
> > moment but works.
> > (compiling kernel with WITNESS didnt help)
> >
> > the server had been solid before with 7 stable kernel from around 19
> > October 2008.
> >
> > I now added these lines to /boot/loader.conf
> >
> > hw.pci.enable_msi=0
> > hw.pci.enable_msix=0
> >
> > to disable Message Signaled Interrupts. Which are used by the 3ware
> > twa driver and igb network driver on our server.
>
> If you are willing to test further on your server, it may be helpful if
> you could determine which of those two lines in loader.conf fixes the
> problem for you.  It would also be useful to provide a dmesg from the
> machine when both msi and msix are enabled.
>
> FWIW, looking at the vmstat -i output it appears that only the igb
> driver that are using MSI/MSIX, unless you have a reason to suspect
> otherwise?

How do you tell that, about igb?  looking at the server I have the igb 
device on, it doesn't seem to say anything about that ...


# vmstat -i
interrupt                        total    rate
irq1: atkbd0                       162       0
irq30: twa0                  402647215     187
cpu0: timer                 4284778818    1999
irq256: igb0                1282945461     598
irq257: igb0                 215507100     100
irq258: igb0                 417702261     194
irq259: igb0                 314601966     146
irq260: igb0                 568062067     265
irq261: igb0                         3       0
cpu5: timer                     428475    1999
cpu6: timer                 4284731466    1999
cpu7: timer                 4284724508    1999
cpu1: timer                 4284893874    1999
cpu3: timer                 4284899807    1999
cpu2: timer                 4284892325    1999
cpu4: timer                 4284897264    1999
Total                      37480028742   17493


On the server(s) where I am experiencing the hangs, vmstat -i shows:

# vmstat -i
interrupt                        total    rate
irq1: atkbd0                         2       0
irq3: sio1                           8       0
irq25: bge0                    4614816     213
irq72: ciss0                   1835763      85
cpu0: timer                   43113685    1997
cpu1: timer                   43116889    1997
Total                         92681163    4293

Are any of these similarly using MSI/MSI-X?


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . scra...@hub.org  MSN . scra...@hub.org
Yahoo . yscrappy   Skype: hub.org   ICQ . 7615664


RE: 7.1-STABLE Sun Mar 29 01:06:46 ADT 2009 Locks up ...

2009-05-09 Thread Marc G. Fournier



'k, based on grep'ng the source files, turns out that the if_bge device 
driver uses msi, while, as you point out, the igb uses msix ... I have 
disabled msi on the two servers with bge devices, and msix on the one with 
igb ... all three have given the same sort of problem after varying 
periods of time ... let's see if I can get to 30 days uptime with this ...






Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . scra...@hub.org  MSN . scra...@hub.org
Yahoo . yscrappy   Skype: hub.org   ICQ . 7615664


RE: 7.1-STABLE Sun Mar 29 01:06:46 ADT 2009 Locks up ...

2009-04-28 Thread Gavin Atkinson
On Fri, 2009-04-24 at 20:39 +0200, Martin Schmidt wrote:
> Hi Marc and List,
>
> i had similar issues with FreeBSD 7.2-PRERELEASE. Server (zfs,nfs)
> seems to hang in intervals of about 8 hours.
> kernel is still there but no connections can be made to nfs/ssh and
> login on local console doesn't seem to
> work due to incredible slowness. breaking to the debugger takes a
> moment but works.
> (compiling kernel with WITNESS didnt help)
>
> the server had been solid before with 7 stable kernel from around 19
> October 2008.
>
> I now added these lines to /boot/loader.conf
>
> hw.pci.enable_msi=0
> hw.pci.enable_msix=0
>
> to disable Message Signaled Interrupts. Which are used by the 3ware
> twa driver and igb network driver on our server.

If you are willing to test further on your server, it may be helpful if
you could determine which of those two lines in loader.conf fixes the
problem for you.  It would also be useful to provide a dmesg from the
machine when both msi and msix are enabled.

FWIW, looking at the vmstat -i output, it appears that only the igb
driver is using MSI/MSI-X, unless you have a reason to suspect
otherwise?

Gavin
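
One way to carry out the test suggested above is to boot with each on/off combination of the two tunables in turn and note which ones hang. The snippet below is only a sketch of how the four /boot/loader.conf fragments could be enumerated; the helper name `loader_conf_variants` is made up for illustration, and it assumes that 1 (enabled) is the default value of both tunables.

```python
from itertools import product

# The two tunables named in the thread. Assumption: 1 means enabled
# (taken here to be the default) and 0 means disabled.
TUNABLES = ("hw.pci.enable_msi", "hw.pci.enable_msix")


def loader_conf_variants():
    """Yield one loader.conf fragment per on/off combination of the tunables."""
    for values in product((1, 0), repeat=len(TUNABLES)):
        yield "\n".join(
            f"{name}={value}" for name, value in zip(TUNABLES, values)
        )


for number, fragment in enumerate(loader_conf_variants(), start=1):
    print(f"# variant {number}")
    print(fragment)
    print()
```

The first variant leaves both mechanisms enabled, the last one disables both (the configuration Martin reported), and the two in between each disable only one, which is what separates an MSI problem from an MSI-X problem.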


RE: 7.1-STABLE Sun Mar 29 01:06:46 ADT 2009 Locks up ...

2009-04-24 Thread Martin Schmidt

Hi Marc and List,

I had similar issues with FreeBSD 7.2-PRERELEASE. The server (zfs, nfs)
seems to hang at intervals of about 8 hours.
The kernel is still there, but no connections can be made to nfs/ssh, and
login on the local console doesn't seem to work due to incredible
slowness. Breaking to the debugger takes a moment but works.

(Compiling the kernel with WITNESS didn't help.)

The server had been solid before with a 7-STABLE kernel from around 19
October 2008.


I have now added these lines to /boot/loader.conf

hw.pci.enable_msi=0
hw.pci.enable_msix=0

to disable Message Signaled Interrupts, which are used by the 3ware
twa driver and the igb network driver on our server.

With this the server ran for 3 days with no hangs. I then enabled MSI
again and had a hang within 24 hours. I disabled it again, and the
server has now been online without an issue for 6 days.

I'm not 100% sure yet whether this really is the sole source of the
problems (e.g. workload might be another factor), but I guess it's worth
a try to check whether it helps you too.

If this is a known problem, or there are any other hints for solving it,
or if the server configuration just seems wrong, I would appreciate
the feedback.


regards,
Martin

pciconf (with msi):
hos...@pci0:0:0:0:  class=0x06 card=0xa28015d9 chip=0x40038086 rev=0x20 hdr=0x00
    cap 01[50] = powerspec 3  supports D0 D3  current D0
    cap 05[58] = MSI supports 2 messages
    cap 10[6c] = PCI-Express 2 root port
pc...@pci0:0:1:0:  class=0x060400 card=0xa28015d9 chip=0x40218086 rev=0x20 hdr=0x01
    cap 01[50] = powerspec 3  supports D0 D3  current D0
    cap 05[58] = MSI supports 2 messages
    cap 10[6c] = PCI-Express 2 root port
    cap 0d[b0] = PCI Bridge card=0xa28015d9
pc...@pci0:0:3:0:  class=0x060400 card=0xa28015d9 chip=0x40238086 rev=0x20 hdr=0x01
    cap 01[50] = powerspec 3  supports D0 D3  current D0
    cap 05[58] = MSI supports 2 messages
    cap 10[6c] = PCI-Express 2 root port
    cap 0d[b0] = PCI Bridge card=0xa28015d9
pc...@pci0:0:5:0:  class=0x060400 card=0xa28015d9 chip=0x40258086 rev=0x20 hdr=0x01
    cap 01[50] = powerspec 3  supports D0 D3  current D0
    cap 05[58] = MSI supports 2 messages
    cap 10[6c] = PCI-Express 2 root port
    cap 0d[b0] = PCI Bridge card=0xa28015d9
pc...@pci0:0:7:0:  class=0x060400 card=0xa28015d9 chip=0x40278086 rev=0x20 hdr=0x01
    cap 01[50] = powerspec 3  supports D0 D3  current D0
    cap 05[58] = MSI supports 2 messages
    cap 10[6c] = PCI-Express 2 root port
    cap 0d[b0] = PCI Bridge card=0xa28015d9
pc...@pci0:0:9:0:  class=0x060400 card=0xa28015d9 chip=0x40298086 rev=0x20 hdr=0x01
    cap 01[50] = powerspec 3  supports D0 D3  current D0
    cap 05[58] = MSI supports 2 messages
    cap 10[6c] = PCI-Express 2 root port
    cap 0d[b0] = PCI Bridge card=0xa28015d9
no...@pci0:0:15:0:  class=0x088000 card=0xa28015d9 chip=0x402f8086 rev=0x20 hdr=0x00
    cap 01[50] = powerspec 3  supports D0 D3  current D0
    cap 11[58] = MSI-X supports 4 messages in map 0x10
    cap 10[6c] = PCI-Express 2 type 0
hos...@pci0:0:16:0:  class=0x06 card=0xa28015d9 chip=0x40308086 rev=0x20 hdr=0x00
hos...@pci0:0:16:1:  class=0x06 card=0xa28015d9 chip=0x40308086 rev=0x20 hdr=0x00
hos...@pci0:0:16:2:  class=0x06 card=0xa28015d9 chip=0x40308086 rev=0x20 hdr=0x00
hos...@pci0:0:16:3:  class=0x06 card=0xa28015d9 chip=0x40308086 rev=0x20 hdr=0x00
hos...@pci0:0:16:4:  class=0x06 card=0xa28015d9 chip=0x40308086 rev=0x20 hdr=0x00
hos...@pci0:0:17:0:  class=0x06 card=0xa28015d9 chip=0x40318086 rev=0x20 hdr=0x00
hos...@pci0:0:21:0:  class=0x06 card=0xa28015d9 chip=0x40358086 rev=0x20 hdr=0x00
hos...@pci0:0:21:1:  class=0x06 card=0xa28015d9 chip=0x40358086 rev=0x20 hdr=0x00
hos...@pci0:0:22:0:  class=0x06 card=0xa28015d9 chip=0x40368086 rev=0x20 hdr=0x00
host...@pci0:0:22:1:  class=0x06 card=0xa28015d9 chip=0x40368086 rev=0x20 hdr=0x00
pc...@pci0:0:28:0:  class=0x060400 card=0xa28015d9 chip=0x26908086 rev=0x09 hdr=0x01
    cap 10[40] = PCI-Express 1 root port
    cap 05[80] = MSI supports 1 message
    cap 0d[90] = PCI Bridge card=0xa28015d9
    cap 01[a0] = powerspec 2  supports D0 D3  current D0
uh...@pci0:0:29:0:  class=0x0c0300 card=0xa28015d9 chip=0x26888086 rev=0x09 hdr=0x00
uh...@pci0:0:29:1:  class=0x0c0300 card=0xa28015d9 chip=0x26898086 rev=0x09 hdr=0x00
uh...@pci0:0:29:2:  class=0x0c0300 card=0xa28015d9 chip=0x268a8086 rev=0x09 hdr=0x00
eh...@pci0:0:29:7:  class=0x0c0320 card=0xa28015d9 chip=0x268c8086 rev=0x09 hdr=0x00
    cap 01[50] = powerspec 2  supports D0 D3  current D0
    cap 0a[58] = EHCI Debug Port at offset 0xa0 in map 0x14
pci...@pci0:0:30:0:  class=0x060401 card=0xa28015d9 chip=0x244e8086 rev=0xd9 hdr=0x01
    cap 0d[50] = PCI Bridge card=0xa28015d9
is...@pci0:0:31:0:  class=0x060100 card=0xa28015d9 chip=0x26708086 rev=0x09 hdr=0x00
atap...@pci0:0:31:1:  class=0x01018a card=0xa28015d9 chip=0x269e8086 rev=0x09 hdr=0x00