Re: [Bug 245779] Re: Server 8.04 LTS: soft lockup - CPU#1 stuck for 11s! [bond1:3795] - bond - bond0

2009-07-22 Thread Warren V
Upgrade your kernel to 2.6.28. CentOS is now on 2.6.28-128; I noted the
problem went away around 2.6.28-92.

Ubuntu is stuck with whatever is currently out.

On Wed, Jul 22, 2009 at 12:55 PM, Ryan Lovett r...@spacecoaster.org
wrote:

 We are seeing this problem with 2.6.24-24-server. We were running with
 -23 recently but had this problem and upgraded after reading this
 report. Additionally, the CPU load on the machine is around 190 though
 summing the individual processes in top doesn't approach that total.
 kswapd is near the top though the machine still has a lot of real RAM
 unallocated. /var/log/kern.log grows rapidly at a rate of about 200
 KB/minute. It has a lot of Call Traces and Pid: 15713, comm: process
 Not tainted 2.6.24-24-server #1.

 --
 Server 8.04 LTS: soft lockup - CPU#1 stuck for 11s! [bond1:3795] - bond -
 bond0
 https://bugs.launchpad.net/bugs/245779
 You received this bug notification because you are a direct subscriber
 of the bug.

 Status in “linux” package in Ubuntu: Incomplete
 Status in “linux” package in Debian: Fix Released

 Bug description:
 Hi!
 Ubuntu Server 8.04 LTS with all patches and the latest kernel
 Hardware: HP DL360 G4 Xeon
 Bonding with:
 - bond0 2x1Gb Intel (802.3ad / 4)
 - bond1 8x1Gb Intel (802.3ad / 4)
 Nagios (only nrpe and plugin)
 Heartbeat2 (without CRM)
 Vlan

 Today it crashed (after two weeks of uptime since the kernel upgrade) with this output:

 6640927 firewall 11:46:54 kernel: [431168.944816] BUG: soft lockup - CPU#1
 stuck for 11s! [bond1:3795]
 6640928 firewall 11:46:54 kernel: [431168.944849]
 6640929 firewall 11:46:54 kernel: [431168.944853] Pid: 3795, comm: bond1
 Not tainted (2.6.24-19-server #1)
 6640930 firewall 11:46:54 kernel: [431168.944856] EIP:
 0060:[ipv6:_spin_lock+0xa/0x10] EFLAGS: 0286 CPU: 1
 6640931 firewall 11:46:54 kernel: [431168.944865] EIP is at
 _spin_lock+0xa/0x10
 6640932 firewall 11:46:54 kernel: [431168.944867] EAX: f749f334 EBX:
 f749f25c ECX: 0001 EDX: f749f25c
 6640933 firewall 11:46:54 kernel: [431168.944870] ESI:  EDI:
 f7ca1000 EBP: f6c35c80 ESP: f6835cc0
 6640934 firewall 11:46:54 kernel: [431168.944872] DS: 007b ES: 007b FS:
 00d8 GS:  SS: 0068
 6640935 firewall 11:46:54 kernel: [431168.944875] CR0: 8005003b CR2:
 b7bfd0a0 CR3: 35908000 CR4: 06b0
 6640936 firewall 11:46:54 kernel: [431168.944878] DR0:  DR1:
  DR2:  DR3: 
 6640937 firewall 11:46:54 kernel: [431168.944880] DR6: 0ff0 DR7:
 0400
 6640938 firewall 11:46:54 kernel: [431168.944887] [f8b67606]
 ad_rx_machine+0x26/0x690 [bonding]
 6640939 firewall 11:46:54 kernel: [431168.944899]
 [nf_nat:_read_lock_bh+0x8/0x50] _read_lock_bh+0x8/0x20
 6640940 firewall 11:46:54 kernel: [431168.944920] [arp_process+0x8b/0x5f0]
 arp_process+0x8b/0x5f0
 6640941 firewall 11:46:54 kernel: [431168.944930] [f8b67e6a]
 bond_3ad_lacpdu_recv+0x1fa/0x240 [bonding]
 6640942 firewall 11:46:54 kernel: [431168.944946]
 [ip_local_deliver_finish+0xf9/0x210] ip_local_deliver_finish+0xf9/0x210
 6640943 firewall 11:46:54 kernel: [431168.944955]
 [ip_rcv_finish+0xff/0x370] ip_rcv_finish+0xff/0x370
 6640944 firewall 11:46:54 kernel: [431168.944960]
 [sock_def_write_space+0x12/0xa0] sock_def_write_space+0x12/0xa0
 6640945 firewall 11:46:54 kernel: [431168.944968] [f8967a4b]
 e1000_alloc_rx_buffers+0xab/0x3a0 [e1000]
 6640946 firewall 11:46:54 kernel: [431168.944982] [arp_rcv+0x0/0x140]
 arp_rcv+0x0/0x140
 6640947 firewall 11:46:54 kernel: [431168.944994]
 [e1000:__netdev_alloc_skb+0x22/0x2a80] __netdev_alloc_skb+0x22/0x50
 6640948 firewall 11:46:54 kernel: [431168.945000] [f8b67c70]
 bond_3ad_lacpdu_recv+0x0/0x240 [bonding]
 6640949 firewall 11:46:54 kernel: [431168.945011]
 [tg3:netif_receive_skb+0x379/0x720] netif_receive_skb+0x379/0x440
 6640950 firewall 11:46:54 kernel: [431168.945024] [f8968474]
 e1000_clean_rx_irq+0x174/0x500 [e1000]
 6640951 firewall 11:46:54 kernel: [431168.945037] [f8968378]
 e1000_clean_rx_irq+0x78/0x500 [e1000]
 6640952 firewall 11:46:54 kernel: [431168.945059] [f8968300]
 e1000_clean_rx_irq+0x0/0x500 [e1000]
 6640953 firewall 11:46:54 kernel: [431168.945071] [f896569e]
 e1000_clean+0x5e/0x250 [e1000]
 6640954 firewall 11:46:54 kernel: [431168.945085]
 [net_rx_action+0x12d/0x210] net_rx_action+0x12d/0x210
 6640955 firewall 11:46:54 kernel: [431168.945099] [__do_softirq+0x82/0x110]
 __do_softirq+0x82/0x110
 6640956 firewall 11:46:54 kernel: [431168.945109] [do_softirq+0x55/0x60]
 do_softirq+0x55/0x60
 6640957 firewall 11:46:54 kernel: [431168.945113] [irq_exit+0x6d/0x80]
 irq_exit+0x6d/0x80
 6640958 firewall 11:46:54 kernel: [431168.945117] [do_IRQ+0x40/0x70]
 do_IRQ+0x40/0x70
 6640959 firewall 11:46:54 kernel: [431168.945121]
 [find_busiest_group+0x1bd/0x760] find_busiest_group+0x1bd/0x760
 6640960 firewall 11:46:54 kernel: [431168.945130]
 [common_interrupt+0x23/0x28] common_interrupt+0x23/0x28
 6640961 firewall 11:46:54 kernel: [431168.945142] [f897007b]
 e1000_init_hw+0x34b/0xb50 [e1000]
 6640962 firewall 11:46:54 
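
For anyone who wants to count how often these messages show up, here is a minimal Python sketch. It assumes each kernel message sits on a single line in the real log file (the archive above wraps them across two lines), and the function names are illustrative only, not part of any existing tool.

```python
import re
from collections import Counter

# Matches the kernel's soft lockup message as quoted in this thread, e.g.
# "BUG: soft lockup - CPU#1 stuck for 11s! [bond1:3795]".
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]*):(?P<pid>\d+)\]"
)


def find_soft_lockups(lines):
    """Yield (cpu, seconds, comm, pid) for every soft lockup line."""
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            yield (
                int(match["cpu"]),
                int(match["secs"]),
                match["comm"],
                int(match["pid"]),
            )


def summarize(lines):
    """Count soft lockups per task name (the 'comm' field)."""
    return Counter(comm for _, _, comm, _ in find_soft_lockups(lines))
```

Passing an open handle on /var/log/kern.log (the path mentioned earlier in the thread) to `summarize` would give a per-task count, which may help tell the bonding lockups apart from lockups in other tasks.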

Re: [Bug 245779] Re: Server 8.04 LTS: soft lockup - CPU#1 stuck for 11s! [bond1:3795] - bond - bond0

2009-07-22 Thread Warren V
At one point I was able to track down a kernel.org post about the root of
the problem, but I can't remember exactly what it was. I do recall it was
due to a mistake on the part of one of the kernel devs.

-W

On Wed, Jul 22, 2009 at 2:36 PM, Ryan Lovett r...@spacecoaster.org
wrote:

 On Wed, Jul 22, 2009 at 06:23:01PM -, Warren V wrote:
  Upgrade your kernel to 2.6.28. CentOS is now on 2.6.28-128, I noted the
  problem went away around 2.6.28-92.
 
  Ubuntu is  stuck with whatever is currently out.

 Do you know which patch addressed the issue? If so, the Ubuntu kernel devs
 might be able to backport it to the LTS release.

 Ryan


Re: [Bug 245779] Re: Server 8.04 LTS: soft lockup - CPU#1 stuck for 11s! [bond1:3795] - bond - bond0

2009-02-13 Thread Warren V
Are you using 2.6.24 or 2.6.18-128?

-W

On Fri, Feb 13, 2009 at 12:12 AM, Hark ubu...@komkommerkom.com wrote:

 Yesterday I got this error again:
  Feb 12 19:20:52 xxx kernel: [1410045.600863] BUG: soft lockup - CPU#3
 stuck for 11s! [kvm:10534]

 I had to use the remote power switch to get the machine running again,
 and that's definitely not something I want on a production machine!

 --
 Server 8.04 LTS: soft lockup - CPU#1 stuck for 11s! [bond1:3795] - bond -
 bond0
 https://bugs.launchpad.net/bugs/245779
 You received this bug notification because you are a direct subscriber
 of the bug.

 Status in linux source package in Ubuntu: Confirmed
 Status in linux source package in Debian: Fix Released

 Bug description:
 Hi!
 Ubuntu Server 8.04 LTS with all patches and the latest kernel
 Hardware: HP DL360 G4 Xeon
 Bonding with:
 - bond0 2x1Gb Intel (802.3ad / 4)
 - bond1 8x1Gb Intel (802.3ad / 4)
 Nagios (only nrpe and plugin)
 Heartbeat2 (without CRM)
 Vlan

 Today it crashed (after two weeks of uptime since the kernel upgrade) with this output:

 6640927 firewall 11:46:54 kernel: [431168.944816] BUG: soft lockup - CPU#1
 stuck for 11s! [bond1:3795]
 6640928 firewall 11:46:54 kernel: [431168.944849]
 6640929 firewall 11:46:54 kernel: [431168.944853] Pid: 3795, comm: bond1
 Not tainted (2.6.24-19-server #1)
 6640930 firewall 11:46:54 kernel: [431168.944856] EIP:
 0060:[ipv6:_spin_lock+0xa/0x10] EFLAGS: 0286 CPU: 1
 6640931 firewall 11:46:54 kernel: [431168.944865] EIP is at
 _spin_lock+0xa/0x10
 6640932 firewall 11:46:54 kernel: [431168.944867] EAX: f749f334 EBX:
 f749f25c ECX: 0001 EDX: f749f25c
 6640933 firewall 11:46:54 kernel: [431168.944870] ESI:  EDI:
 f7ca1000 EBP: f6c35c80 ESP: f6835cc0
 6640934 firewall 11:46:54 kernel: [431168.944872] DS: 007b ES: 007b FS:
 00d8 GS:  SS: 0068
 6640935 firewall 11:46:54 kernel: [431168.944875] CR0: 8005003b CR2:
 b7bfd0a0 CR3: 35908000 CR4: 06b0
 6640936 firewall 11:46:54 kernel: [431168.944878] DR0:  DR1:
  DR2:  DR3: 
 6640937 firewall 11:46:54 kernel: [431168.944880] DR6: 0ff0 DR7:
 0400
 6640938 firewall 11:46:54 kernel: [431168.944887] [f8b67606]
 ad_rx_machine+0x26/0x690 [bonding]
 6640939 firewall 11:46:54 kernel: [431168.944899]
 [nf_nat:_read_lock_bh+0x8/0x50] _read_lock_bh+0x8/0x20
 6640940 firewall 11:46:54 kernel: [431168.944920] [arp_process+0x8b/0x5f0]
 arp_process+0x8b/0x5f0
 6640941 firewall 11:46:54 kernel: [431168.944930] [f8b67e6a]
 bond_3ad_lacpdu_recv+0x1fa/0x240 [bonding]
 6640942 firewall 11:46:54 kernel: [431168.944946]
 [ip_local_deliver_finish+0xf9/0x210] ip_local_deliver_finish+0xf9/0x210
 6640943 firewall 11:46:54 kernel: [431168.944955]
 [ip_rcv_finish+0xff/0x370] ip_rcv_finish+0xff/0x370
 6640944 firewall 11:46:54 kernel: [431168.944960]
 [sock_def_write_space+0x12/0xa0] sock_def_write_space+0x12/0xa0
 6640945 firewall 11:46:54 kernel: [431168.944968] [f8967a4b]
 e1000_alloc_rx_buffers+0xab/0x3a0 [e1000]
 6640946 firewall 11:46:54 kernel: [431168.944982] [arp_rcv+0x0/0x140]
 arp_rcv+0x0/0x140
 6640947 firewall 11:46:54 kernel: [431168.944994]
 [e1000:__netdev_alloc_skb+0x22/0x2a80] __netdev_alloc_skb+0x22/0x50
 6640948 firewall 11:46:54 kernel: [431168.945000] [f8b67c70]
 bond_3ad_lacpdu_recv+0x0/0x240 [bonding]
 6640949 firewall 11:46:54 kernel: [431168.945011]
 [tg3:netif_receive_skb+0x379/0x720] netif_receive_skb+0x379/0x440
 6640950 firewall 11:46:54 kernel: [431168.945024] [f8968474]
 e1000_clean_rx_irq+0x174/0x500 [e1000]
 6640951 firewall 11:46:54 kernel: [431168.945037] [f8968378]
 e1000_clean_rx_irq+0x78/0x500 [e1000]
 6640952 firewall 11:46:54 kernel: [431168.945059] [f8968300]
 e1000_clean_rx_irq+0x0/0x500 [e1000]
 6640953 firewall 11:46:54 kernel: [431168.945071] [f896569e]
 e1000_clean+0x5e/0x250 [e1000]
 6640954 firewall 11:46:54 kernel: [431168.945085]
 [net_rx_action+0x12d/0x210] net_rx_action+0x12d/0x210
 6640955 firewall 11:46:54 kernel: [431168.945099] [__do_softirq+0x82/0x110]
 __do_softirq+0x82/0x110
 6640956 firewall 11:46:54 kernel: [431168.945109] [do_softirq+0x55/0x60]
 do_softirq+0x55/0x60
 6640957 firewall 11:46:54 kernel: [431168.945113] [irq_exit+0x6d/0x80]
 irq_exit+0x6d/0x80
 6640958 firewall 11:46:54 kernel: [431168.945117] [do_IRQ+0x40/0x70]
 do_IRQ+0x40/0x70
 6640959 firewall 11:46:54 kernel: [431168.945121]
 [find_busiest_group+0x1bd/0x760] find_busiest_group+0x1bd/0x760
 6640960 firewall 11:46:54 kernel: [431168.945130]
 [common_interrupt+0x23/0x28] common_interrupt+0x23/0x28
 6640961 firewall 11:46:54 kernel: [431168.945142] [f897007b]
 e1000_init_hw+0x34b/0xb50 [e1000]
 6640962 firewall 11:46:54 kernel: [431168.945156]
 [ipv6:_spin_lock+0x3/0x10] _spin_lock+0x3/0x10
 6640963 firewall 11:46:54 kernel: [431168.945163] [f8b67606]
 ad_rx_machine+0x26/0x690 [bonding]
 6640964 firewall 11:46:54 kernel: [431168.945179]
 [lock_timer_base+0x27/0x60] lock_timer_base+0x27/0x60
 6640965 firewall 11:46:54 kernel: [431168.945183]
 [delayed_work_timer_fn+0x0/0x20] 

Re: [Bug 245779] Re: Server 8.04 LTS: soft lockup - CPU#1 stuck for 11s! [bond1:3795] - bond - bond0

2009-02-02 Thread Warren V
Hi-

Actually, I think I have one better. The latest Red Hat kernel patch release
for 2.6.18-128 seems to have fixed the issue (two weeks now, no reboot or
lockup), even though there is no official fix listed. It looks like they
made some alterations to the bonding code to fix some bogus MAC address
tracking silliness, which may be preventing the larger issue.

The patch discussion is at:
https://rhn.redhat.com/errata/RHSA-2009-0225.html
I downloaded the patch from:
http://people.redhat.com/dzickus/el5/128.el5/i686/

For those of us running CentOS, this is a straight rpm -ivh install. I
thought about doing the roll-my-own 2.6.24 install, but it was just too much
of a jump ahead in kernel versions for me to be comfortable.

Thanks for the message!

-Warren V

On Mon, Feb 2, 2009 at 9:30 AM, Ryan Sitzman sitz...@gmail.com wrote:

 This isn't a solution to the bug, but you may find that using the
 backports repository to install xen 3.3.0 and the 2.6.24-23 kernel
 yields some positive results. On one of my boxes, I could consistently
 trigger the 'CPU#1 stuck' problem, and after upgrading it hasn't locked
 up once. Of course, on a different box with slightly different hardware,
 it locks up just as frequently as before... so ymmv.


Re: [Bug 245779] Re: Server 8.04 LTS: soft lockup - CPU#1 stuck for 11s! [bond1:3795] - bond - bond0

2009-01-20 Thread Warren V
Howdy-

All of my units are using dual- or quad-core Intels. The quads don't seem to
throw the soft lockup error; top just shows that one of the cores is slammed,
and everything slows to a crawl. I can't seem to kill the bonding
kmod, so I always end up having to reboot the units. I'm going to update my
dev environment to 2.6.24-22, and will advise on any oddness that I run
into.

-Warren V
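
When a bond misbehaves like this, the bonding driver's status file (normally /proc/net/bonding/bond0, /proc/net/bonding/bond1, and so on) is one place to check the mode and the link state of each slave. Below is a minimal Python sketch for reading it. It assumes the usual "Key: value" layout of that file, which can differ between driver versions, and the function name is made up for illustration.

```python
def parse_bonding_status(text):
    """Return (bonding_mode, {slave_name: mii_status}) from status-file text."""
    mode = None
    slaves = {}
    current_slave = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and section titles carry no "Key: value" pair
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current_slave = value
            slaves[current_slave] = None
        elif key == "MII Status" and current_slave is not None:
            # An "MII Status" line before the first slave belongs to the bond
            # itself, so only lines seen after a slave header are recorded.
            slaves[current_slave] = value
    return mode, slaves
```

Reading the text of /proc/net/bonding/bond1 and passing it to `parse_bonding_status` should show whether the bond is really in 802.3ad mode and which slaves report a link.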


On Tue, Jan 20, 2009 at 8:26 AM, John Leach j...@johnleach.co.uk
wrote:

 I think there might be two bugs here.  Something regarding bonding,
 which I've seen on our Dell machines with Centos 5 too.

 And then a general cpu softlock problem, which I'm also experiencing
 with Hardy as a Xen guest - that I think is Xen related (I see it come
 up with various processes - whatever is busy really).  This bug here is
 probably the best place to report those types of problems:
 https://bugs.launchpad.net/ubuntu/+source/linux-meta/+bug/259487


[Bug 245779] Re: Server 8.04 LTS: soft lockup - CPU#1 stuck for 11s! [bond1:3795] - bond - bond0

2009-01-19 Thread Warren V
I am seeing this same issue across multiple Dell platforms running
CentOS 5.2. I have a bug report there:
http://bugs.centos.org/view.php?id=3318

I'm seeing this issue with both the 2.6.17 and 2.6.18 kernels; the base
kernels seem to be the most stable. Occasionally, I don't get the soft
lockup message: on my faster multicore machines, a core will go to
100%+, but the machine will not completely fail.

I sure wish that the kernel team hadn't run away from this bug.
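
For the case where no soft lockup message is logged and a single core just sits at 100%, a rough way to spot it is to compare two snapshots of /proc/stat. The Python sketch below is only an illustration under that assumption. The function names and the 0.95 threshold are my own choices, not anything from the kernel or from this bug report.

```python
def parse_proc_stat(text):
    """Return {cpu_name: (busy_ticks, total_ticks)} for each per-core line."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines start with "cpu0", "cpu1", ...; skip the aggregate "cpu".
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(value) for value in fields[1:]]
        # Columns: user nice system idle iowait irq softirq steal ...
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
        total = sum(ticks[:8])  # guest time is already counted inside user
        stats[fields[0]] = (total - idle, total)
    return stats


def busy_fractions(before, after):
    """Fraction of time each core was busy between two parsed snapshots."""
    fractions = {}
    for cpu, (busy_after, total_after) in after.items():
        busy_before, total_before = before.get(cpu, (0, 0))
        elapsed = total_after - total_before
        if elapsed > 0:
            fractions[cpu] = (busy_after - busy_before) / elapsed
    return fractions


def pegged_cores(before_text, after_text, threshold=0.95):
    """Names of cores whose busy fraction is at or above the threshold."""
    fractions = busy_fractions(
        parse_proc_stat(before_text), parse_proc_stat(after_text)
    )
    return sorted(cpu for cpu, frac in fractions.items() if frac >= threshold)
```

In practice you would read /proc/stat, sleep a few seconds, read it again, and pass both texts to `pegged_cores`. Time spent in softirq counts as busy here, which matters because the traces in this bug show the CPU stuck in the network receive softirq path.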

-- 
Server 8.04 LTS: soft lockup - CPU#1 stuck for 11s! [bond1:3795] - bond - bond0
https://bugs.launchpad.net/bugs/245779
You received this bug notification because you are a member of Ubuntu
Bugs, which is a direct subscriber.

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs