Re: [Bug 245779] Re: Server 8.04 LTS: soft lockup - CPU#1 stuck for 11s! [bond1:3795] - bond - bond0

2009-07-22 Thread Warren V
At one point I was able to track down a kernel.org post about the root of
the problem, but I can't remember exactly what it was. I do recall that it
was due to a mistake on the part of one of the kernel devs.

-W

On Wed, Jul 22, 2009 at 2:36 PM, Ryan Lovett 
wrote:

> On Wed, Jul 22, 2009 at 06:23:01PM -, Warren V wrote:
> > Upgrade your kernel to 2.6.28. CentOS is now on 2.6.28-128; I noted the
> > problem went away around 2.6.28-92.
> >
> > Ubuntu is stuck with whatever is currently out.
>
> Do you know which patch addressed the issue? If so, the Ubuntu kernel devs
> might be able to backport it to the LTS release.
>
> Ryan
>
> --
> Server 8.04 LTS: soft lockup - CPU#1 stuck for 11s! [bond1:3795] - bond -
> bond0
> https://bugs.launchpad.net/bugs/245779
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in “linux” package in Ubuntu: Incomplete
> Status in “linux” package in Debian: Fix Released
>
> Bug description:
> Hi!
> Ubuntu Server 8.04 LTS with all patches and the latest kernel
> Hardware: HP DL360 G4 Xeon
> Bonding with :
> - bond0 2x1Gb Intel (802.3ad / 4)
> - bond1 8x1Gb Intel (802.3ad / 4)
> Nagios (only nrpe and plugin)
> Heartbeat2 (without CRM)
> Vlan
>
> Today it crashed (after two weeks of uptime since the kernel upgrade) with this output
>
> 6640927 firewall 11:46:54 kernel: [431168.944816] BUG: soft lockup - CPU#1
> stuck for 11s! [bond1:3795]
> 6640928 firewall 11:46:54 kernel: [431168.944849]
> 6640929 firewall 11:46:54 kernel: [431168.944853] Pid: 3795, comm: bond1
> Not tainted (2.6.24-19-server #1)
> 6640930 firewall 11:46:54 kernel: [431168.944856] EIP:
> 0060:[ipv6:_spin_lock+0xa/0x10] EFLAGS: 0286 CPU: 1
> 6640931 firewall 11:46:54 kernel: [431168.944865] EIP is at
> _spin_lock+0xa/0x10
> 6640932 firewall 11:46:54 kernel: [431168.944867] EAX: f749f334 EBX:
> f749f25c ECX: 0001 EDX: f749f25c
> 6640933 firewall 11:46:54 kernel: [431168.944870] ESI:  EDI:
> f7ca1000 EBP: f6c35c80 ESP: f6835cc0
> 6640934 firewall 11:46:54 kernel: [431168.944872] DS: 007b ES: 007b FS:
> 00d8 GS:  SS: 0068
> 6640935 firewall 11:46:54 kernel: [431168.944875] CR0: 8005003b CR2:
> b7bfd0a0 CR3: 35908000 CR4: 06b0
> 6640936 firewall 11:46:54 kernel: [431168.944878] DR0:  DR1:
>  DR2:  DR3: 
> 6640937 firewall 11:46:54 kernel: [431168.944880] DR6: 0ff0 DR7:
> 0400
> 6640938 firewall 11:46:54 kernel: [431168.944887] []
> ad_rx_machine+0x26/0x690 [bonding]
> 6640939 firewall 11:46:54 kernel: [431168.944899]
> [nf_nat:_read_lock_bh+0x8/0x50] _read_lock_bh+0x8/0x20
> 6640940 firewall 11:46:54 kernel: [431168.944920] [arp_process+0x8b/0x5f0]
> arp_process+0x8b/0x5f0
> 6640941 firewall 11:46:54 kernel: [431168.944930] []
> bond_3ad_lacpdu_recv+0x1fa/0x240 [bonding]
> 6640942 firewall 11:46:54 kernel: [431168.944946]
> [ip_local_deliver_finish+0xf9/0x210] ip_local_deliver_finish+0xf9/0x210
> 6640943 firewall 11:46:54 kernel: [431168.944955]
> [ip_rcv_finish+0xff/0x370] ip_rcv_finish+0xff/0x370
> 6640944 firewall 11:46:54 kernel: [431168.944960]
> [sock_def_write_space+0x12/0xa0] sock_def_write_space+0x12/0xa0
> 6640945 firewall 11:46:54 kernel: [431168.944968] []
> e1000_alloc_rx_buffers+0xab/0x3a0 [e1000]
> 6640946 firewall 11:46:54 kernel: [431168.944982] [arp_rcv+0x0/0x140]
> arp_rcv+0x0/0x140
> 6640947 firewall 11:46:54 kernel: [431168.944994]
> [e1000:__netdev_alloc_skb+0x22/0x2a80] __netdev_alloc_skb+0x22/0x50
> 6640948 firewall 11:46:54 kernel: [431168.945000] []
> bond_3ad_lacpdu_recv+0x0/0x240 [bonding]
> 6640949 firewall 11:46:54 kernel: [431168.945011]
> [tg3:netif_receive_skb+0x379/0x720] netif_receive_skb+0x379/0x440
> 6640950 firewall 11:46:54 kernel: [431168.945024] []
> e1000_clean_rx_irq+0x174/0x500 [e1000]
> 6640951 firewall 11:46:54 kernel: [431168.945037] []
> e1000_clean_rx_irq+0x78/0x500 [e1000]
> 6640952 firewall 11:46:54 kernel: [431168.945059] []
> e1000_clean_rx_irq+0x0/0x500 [e1000]
> 6640953 firewall 11:46:54 kernel: [431168.945071] []
> e1000_clean+0x5e/0x250 [e1000]
> 6640954 firewall 11:46:54 kernel: [431168.945085]
> [net_rx_action+0x12d/0x210] net_rx_action+0x12d/0x210
> 6640955 firewall 11:46:54 kernel: [431168.945099] [__do_softirq+0x82/0x110]
> __do_softirq+0x82/0x110
> 6640956 firewall 11:46:54 kernel: [431168.945109] [do_softirq+0x55/0x60]
> do_softirq+0x55/0x60
> 6640957 firewall 11:46:54 kernel: [431168.945113] [irq_exit+0x6d/0x80]
> irq_exit+0x6d/0x80
> 6640958 firewall 11:46:54 kernel: [431168.945117] [do_IRQ+0x40/0x70]
> do_IRQ+0x40/0x70
> 6640959 firewall 11:46:54 kernel: [431168.945121]
> [find_busiest_group+0x1bd/0x760] find_busiest_group+0x1bd/0x760
> 6640960 fi

Re: [Bug 245779] Re: Server 8.04 LTS: soft lockup - CPU#1 stuck for 11s! [bond1:3795] - bond - bond0

2009-07-22 Thread Warren V
Upgrade your kernel to 2.6.28. CentOS is now on 2.6.28-128; I noted the
problem went away around 2.6.28-92.

Ubuntu is stuck with whatever is currently out.

On Wed, Jul 22, 2009 at 12:55 PM, Ryan Lovett 
wrote:

> We are seeing this problem with 2.6.24-24-server. We were running with
> -23 recently but had this problem and upgraded after reading this
> report. Additionally, the CPU load on the machine is around 190 though
> summing the individual processes in top doesn't approach that total.
> kswapd is near the top though the machine still has a lot of real RAM
> unallocated. /var/log/kern.log grows rapidly at a rate of about 200
> KB/minute. It has a lot of Call Traces and "Pid: 15713, comm: 
> Not tainted 2.6.24-24-server #1".
>

Re: [Bug 245779] Re: Server 8.04 LTS: soft lockup - CPU#1 stuck for 11s! [bond1:3795] - bond - bond0

2009-02-13 Thread Warren V
Are you using 2.6.24 or 2.6.18-128?

-W

On Fri, Feb 13, 2009 at 12:12 AM, Hark  wrote:

> Yesterday I got this error again:
>  Feb 12 19:20:52 xxx kernel: [1410045.600863] BUG: soft lockup - CPU#3
> stuck for 11s! [kvm:10534]
>
> I had to use the remote power switch to get the machine running again,
> and that's definitely not something I want on a production machine!
>
> --
> Server 8.04 LTS: soft lockup - CPU#1 stuck for 11s! [bond1:3795] - bond -
> bond0
> https://bugs.launchpad.net/bugs/245779
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in "linux" source package in Ubuntu: Confirmed
> Status in "linux" source package in Debian: Fix Released
>
> Bug description:
> Hi!
> Ubuntu Server 8.04 LTS with all patches and the latest kernel
> Hardware: HP DL360 G4 Xeon
> Bonding with :
> - bond0 2x1Gb Intel (802.3ad / 4)
> - bond1 8x1Gb Intel (802.3ad / 4)
> Nagios (only nrpe and plugin)
> Heartbeat2 (without CRM)
> Vlan
>
> Today it crashed (after two weeks of uptime since the kernel upgrade) with this output
>
> 6640927 firewall 11:46:54 kernel: [431168.944816] BUG: soft lockup - CPU#1
> stuck for 11s! [bond1:3795]
> 6640928 firewall 11:46:54 kernel: [431168.944849]
> 6640929 firewall 11:46:54 kernel: [431168.944853] Pid: 3795, comm: bond1
> Not tainted (2.6.24-19-server #1)
> 6640930 firewall 11:46:54 kernel: [431168.944856] EIP:
> 0060:[ipv6:_spin_lock+0xa/0x10] EFLAGS: 0286 CPU: 1
> 6640931 firewall 11:46:54 kernel: [431168.944865] EIP is at
> _spin_lock+0xa/0x10
> 6640932 firewall 11:46:54 kernel: [431168.944867] EAX: f749f334 EBX:
> f749f25c ECX: 0001 EDX: f749f25c
> 6640933 firewall 11:46:54 kernel: [431168.944870] ESI:  EDI:
> f7ca1000 EBP: f6c35c80 ESP: f6835cc0
> 6640934 firewall 11:46:54 kernel: [431168.944872] DS: 007b ES: 007b FS:
> 00d8 GS:  SS: 0068
> 6640935 firewall 11:46:54 kernel: [431168.944875] CR0: 8005003b CR2:
> b7bfd0a0 CR3: 35908000 CR4: 06b0
> 6640936 firewall 11:46:54 kernel: [431168.944878] DR0:  DR1:
>  DR2:  DR3: 
> 6640937 firewall 11:46:54 kernel: [431168.944880] DR6: 0ff0 DR7:
> 0400
> 6640938 firewall 11:46:54 kernel: [431168.944887] []
> ad_rx_machine+0x26/0x690 [bonding]
> 6640939 firewall 11:46:54 kernel: [431168.944899]
> [nf_nat:_read_lock_bh+0x8/0x50] _read_lock_bh+0x8/0x20
> 6640940 firewall 11:46:54 kernel: [431168.944920] [arp_process+0x8b/0x5f0]
> arp_process+0x8b/0x5f0
> 6640941 firewall 11:46:54 kernel: [431168.944930] []
> bond_3ad_lacpdu_recv+0x1fa/0x240 [bonding]
> 6640942 firewall 11:46:54 kernel: [431168.944946]
> [ip_local_deliver_finish+0xf9/0x210] ip_local_deliver_finish+0xf9/0x210
> 6640943 firewall 11:46:54 kernel: [431168.944955]
> [ip_rcv_finish+0xff/0x370] ip_rcv_finish+0xff/0x370
> 6640944 firewall 11:46:54 kernel: [431168.944960]
> [sock_def_write_space+0x12/0xa0] sock_def_write_space+0x12/0xa0
> 6640945 firewall 11:46:54 kernel: [431168.944968] []
> e1000_alloc_rx_buffers+0xab/0x3a0 [e1000]
> 6640946 firewall 11:46:54 kernel: [431168.944982] [arp_rcv+0x0/0x140]
> arp_rcv+0x0/0x140
> 6640947 firewall 11:46:54 kernel: [431168.944994]
> [e1000:__netdev_alloc_skb+0x22/0x2a80] __netdev_alloc_skb+0x22/0x50
> 6640948 firewall 11:46:54 kernel: [431168.945000] []
> bond_3ad_lacpdu_recv+0x0/0x240 [bonding]
> 6640949 firewall 11:46:54 kernel: [431168.945011]
> [tg3:netif_receive_skb+0x379/0x720] netif_receive_skb+0x379/0x440
> 6640950 firewall 11:46:54 kernel: [431168.945024] []
> e1000_clean_rx_irq+0x174/0x500 [e1000]
> 6640951 firewall 11:46:54 kernel: [431168.945037] []
> e1000_clean_rx_irq+0x78/0x500 [e1000]
> 6640952 firewall 11:46:54 kernel: [431168.945059] []
> e1000_clean_rx_irq+0x0/0x500 [e1000]
> 6640953 firewall 11:46:54 kernel: [431168.945071] []
> e1000_clean+0x5e/0x250 [e1000]
> 6640954 firewall 11:46:54 kernel: [431168.945085]
> [net_rx_action+0x12d/0x210] net_rx_action+0x12d/0x210
> 6640955 firewall 11:46:54 kernel: [431168.945099] [__do_softirq+0x82/0x110]
> __do_softirq+0x82/0x110
> 6640956 firewall 11:46:54 kernel: [431168.945109] [do_softirq+0x55/0x60]
> do_softirq+0x55/0x60
> 6640957 firewall 11:46:54 kernel: [431168.945113] [irq_exit+0x6d/0x80]
> irq_exit+0x6d/0x80
> 6640958 firewall 11:46:54 kernel: [431168.945117] [do_IRQ+0x40/0x70]
> do_IRQ+0x40/0x70
> 6640959 firewall 11:46:54 kernel: [431168.945121]
> [find_busiest_group+0x1bd/0x760] find_busiest_group+0x1bd/0x760
> 6640960 firewall 11:46:54 kernel: [431168.945130]
> [common_interrupt+0x23/0x28] common_interrupt+0x23/0x28
> 6640961 firewall 11:46:54 kernel: [431168.945142] []
> e1000_init_hw+0x34b/0xb50 [e1000]
> 6640962 firewall 11:46:54 kernel: [431168.945156]
> [ipv6:_spin_lock+0x3/0x10] _spin_lock+0x3/0x10
> 6640963 firewall 11:46:54 kernel: [431168.945163] []
> ad_rx_machine+0x26/0x690 [bonding]
> 6640964 firewall 11:46:54 kernel: [431168.945179]
> [lock_timer_base+0x27/0x60] lock_timer_base+0x27/0x60
> 6640965 firewall 11:46:54 kernel: [431168.945183]
> [delayed_work_timer_fn+0x0/0x20

Re: [Bug 245779] Re: Server 8.04 LTS: soft lockup - CPU#1 stuck for 11s! [bond1:3795] - bond - bond0

2009-02-02 Thread Warren V
Hi-

Actually, I think I have one better. The latest Red Hat kernel patch release
for 2.6.18-128 seems to have fixed the issue (two weeks now, no reboot or
lockup), even though there is no "official" fix listed. It looks like they
made some alterations to the bonding code to fix some bogus MAC address
tracking silliness, and that may be what prevents the larger issue.

The patch discussion is at:
https://rhn.redhat.com/errata/RHSA-2009-0225.html
I downloaded the patch from:
http://people.redhat.com/dzickus/el5/128.el5/i686/

For those of us running CentOS, this is a straight rpm -ivh install. I
thought about doing the roll-my-own 2.6.24 install, but it was just too much
of a jump ahead in kernel versions for me to be comfortable.
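
If you want to check whether a box is already at or past a given package
release before installing, here is a minimal Python sketch. The helper names
(release_key, at_least) are made up for illustration, and it assumes the
release string starts with three dotted numbers followed by a dash and a build
number, as in "2.6.18-128.el5" or "2.6.24-19-server". Build numbers are only
comparable within one distro's kernel series, so don't compare a CentOS
release against an Ubuntu one.

```python
import re


def release_key(release):
    """Turn '2.6.18-128.el5' or '2.6.24-19-server' into a tuple of ints.

    Only the leading numeric fields are kept; suffixes such as 'el5'
    or 'server' are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def at_least(running, wanted):
    """True if the running release is the same as or newer than the wanted one."""
    return release_key(running) >= release_key(wanted)
```

On a live machine the first argument could come from platform.release() in the
standard library, e.g. at_least(platform.release(), "2.6.18-128").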

Thanks for the message!

-Warren V

On Mon, Feb 2, 2009 at 9:30 AM, Ryan Sitzman  wrote:

> This isn't a solution to the bug, but you may find that using the
> backports repository to install xen 3.3.0 and the 2.6.24-23 kernel
> yields some positive results. On one of my boxes, I could consistently
> trigger the 'CPU#1 stuck' problem, and after upgrading it hasn't locked
> up once. Of course, on a different box with slightly different hardware,
> it locks up just as frequently as before... so ymmv.
>

Re: [Bug 245779] Re: Server 8.04 LTS: soft lockup - CPU#1 stuck for 11s! [bond1:3795] - bond - bond0

2009-01-20 Thread Warren V
Howdy-

All of my units are using dual- or quad-core Intels. The quads don't seem to
throw the soft lockup error; top just shows that one of the cores is slammed,
and everything slows to a crawl. I can't seem to kill the bonding kmod, so I
always end up having to reboot the units. I'm going to update my
dev environment to 2.6.24-22, and will advise on any oddness that I run
into.
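
For spotting the "one core is slammed" case without watching top, a rough
Python sketch follows. It compares two snapshots of /proc/stat and reports,
per core, the busy fraction and the share of time spent in softirq. The
function names are hypothetical. The guess that a wedged bonding receive path
would show up as softirq time is an assumption, based only on net_rx_action
and __do_softirq appearing in the call trace from the bug description.

```python
def parse_proc_stat(text):
    """Return {cpu_name: [jiffy counters]} for per-core lines (cpu0, cpu1, ...)."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and everything that is not a core.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            cores[fields[0]] = [int(value) for value in fields[1:]]
    return cores


def core_usage(before, after):
    """Compare two /proc/stat snapshots.

    Returns {cpu_name: (busy_fraction, softirq_fraction)} for the interval.
    Column order is user, nice, system, idle, iowait, irq, softirq, steal.
    """
    first, second = parse_proc_stat(before), parse_proc_stat(after)
    usage = {}
    for cpu, new in second.items():
        old = first.get(cpu)
        if old is None or len(new) < 7 or len(old) < 7:
            continue
        delta = [n - o for n, o in zip(new, old)]
        total = sum(delta[:8])  # guest time, if present, is already in user
        if total <= 0:
            continue
        idle = delta[3] + delta[4]  # idle + iowait
        usage[cpu] = ((total - idle) / total, delta[6] / total)
    return usage
```

To use it, read /proc/stat twice a few seconds apart and pass both strings to
core_usage(); a core with a busy fraction near 1.0 for several samples in a row
is the one to look at.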

-Warren V


On Tue, Jan 20, 2009 at 8:26 AM, John Leach 
wrote:

> I think there might be two bugs here.  Something regarding bonding,
> which I've seen on our Dell machines with Centos 5 too.
>
> And then a general cpu softlock problem, which I'm also experiencing
> with Hardy as a Xen guest - that I think is Xen related (I see it come
> up with various processes - whatever is busy really).  This bug here is
> probably the best place to report those types of problems:
> https://bugs.launchpad.net/ubuntu/+source/linux-meta/+bug/259487
>

[Bug 245779] Re: Server 8.04 LTS: soft lockup - CPU#1 stuck for 11s! [bond1:3795] - bond - bond0

2009-01-19 Thread Warren V
I am seeing this same issue across multiple Dell platforms running
CentOS 5.2. I have a bug report there:
http://bugs.centos.org/view.php?id=3318

I'm seeing this issue with both the 2.6.17 and 2.6.18 kernels; the base
kernels seem to be the most stable. Occasionally, I don't get the soft
lockup message: on my faster multicore machines, a core will go to
100%+, but the machine will not completely fail.

I sure wish that the kernel team hadn't run away from this bug.
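
For anyone comparing notes across machines, here is a small Python sketch that
tallies soft lockup reports in a kernel log by process name and CPU. The
function name is hypothetical, and the pattern assumes each report sits on one
line in the form seen in this thread ("BUG: soft lockup - CPU#1 stuck for 11s!
[bond1:3795]"); other kernel versions may word the message differently.

```python
import re
from collections import Counter

# Matches e.g. "BUG: soft lockup - CPU#1 stuck for 11s! [bond1:3795]"
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def count_lockups(lines):
    """Count soft lockup reports, keyed by (process name, CPU number)."""
    counts = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            counts[(match.group("comm"), int(match.group("cpu")))] += 1
    return counts
```

Feeding it the lines of /var/log/kern.log (or wherever your syslog puts kernel
messages) shows at a glance whether the reports are all against the bonding
threads or spread across unrelated processes.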

-- 
Server 8.04 LTS: soft lockup - CPU#1 stuck for 11s! [bond1:3795] - bond - bond0
https://bugs.launchpad.net/bugs/245779
You received this bug notification because you are a member of Ubuntu
Bugs, which is a direct subscriber.
