Re: Soft lockup in e100 driver ?
On 8/11/05, Stephen D. Williams <[EMAIL PROTECTED]> wrote:
> The chipset is an Intel 8x0 something. Unfortunately, there is a
> heatsink semi-permanently installed over everything. Is there a /proc
> pseudofile that will give me good identifying chipset info to report here?

You can show the chipset details with lspci;
"lspci -n" will show device IDs and revision IDs.

Interesting failure case on the e100; I haven't a clue what's going on.
netdev @ vger might be a good place to continue the discussion about
e100 issues.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
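For anyone scripting a bug report around the "lspci -n" suggestion above, here is a minimal sketch in C that pulls the vendor ID, device ID and revision out of one line of that output. The line layout it expects (slot, class field ending in ": ", then vendor:device and an optional "(rev xx)") is an assumption about typical pciutils output, and the struct and function names are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Numeric PCI identity as printed by "lspci -n"; field names are illustrative. */
struct pci_id {
	unsigned vendor;
	unsigned device;
	unsigned revision;	/* 0 when the line carries no "(rev xx)" suffix */
};

/*
 * Parse one line of "lspci -n" output, assumed to look like
 *   00:08.0 0200: 8086:1229 (rev 08)
 * Returns 0 on success, -1 if no vendor:device pair is found.
 */
int parse_lspci_n_line(const char *line, struct pci_id *out)
{
	unsigned vendor, device, revision = 0;
	const char *ids = strstr(line, ": ");	/* end of the class field */

	if (ids == NULL)
		return -1;
	/* The revision is optional, so two converted fields are enough. */
	if (sscanf(ids + 2, "%4x:%4x (rev %2x)", &vendor, &device, &revision) < 2)
		return -1;

	out->vendor = vendor;
	out->device = device;
	out->revision = revision;
	return 0;
}
```

The sample line in the comment is invented input, not output from the machine discussed in this thread. Searching for ": " rather than counting columns is a deliberate choice so that a line with a "Class " prefix before the class code would still parse.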
Re: Soft lockup in e100 driver ?
"noapic" didn't work, nor did "noacpi", etc.

Going to 2.6.13-rc6.2 solved the problem (once I integrated udev, etc.).

The chipset is an Intel 8x0 something. Unfortunately, there is a
heatsink semi-permanently installed over everything. Is there a /proc
pseudofile that will give me good identifying chipset info to report here?

If there is a FAQ for this, we should post a message about it once in a
while. Nothing here indicates chipset:
http://www.kernel.org/pub/linux/docs/lkml/reporting-bugs.html

The CPU is an Intel Celeron CPU 2.00GHz running at 1495.772 MHz, 128KB
cache.

sdw

Matti Aarnio wrote:
> For a while to have network card working I had to boot with "noapic"
> option in my home SMP box. In an UP box it is about same to boot
> as "noapic", but in SMP it does result in "one CPU does all interrupts"
> thingie. (In some rare cases it could be desirable, even.)
Re: Soft lockup in e100 driver ?
On Wed, Aug 10, 2005 at 08:32:45PM -0400, Stephen D. Williams wrote:
> I just noticed that the Ubuntu setup says "GSI 20(level,low) -> IRQ 20"
> whereas I remember my built kernels saying "No GSI.. IRQ 11". I'll
> investigate what that means and how to enable it. Pointers appreciated.

That is most likely unrelated, but I had similar experiences at times.
It turned out that something done recently in the APIC management code
did break things, but the latest version is working again.

For a while, to have the network card working, I had to boot with the
"noapic" option on my home SMP box. On a UP box booting with "noapic"
makes little difference, but on SMP it does result in the "one CPU does
all interrupts" thingie. (In some rare cases it could be desirable, even.)

/Matti Aarnio
Re: Soft lockup in e100 driver ?
I just noticed that the Ubuntu setup says "GSI 20(level,low) -> IRQ 20"
whereas I remember my built kernels saying "No GSI.. IRQ 11". I'll
investigate what that means and how to enable it. Pointers appreciated.

sdw
Re: Soft lockup in e100 driver ?
I have been working for days to get a recent kernel to work with these
small-format UP Celeron 2GHz (running at 1.33GHz) motherboards that I
am planning to use as thin clients. I'm doing a PXE boot, loading
kernels, and trying to get networking to come up.

I eventually realized that the problem is that the e100 driver loads
but does not allow any packet traffic. The system isn't crashed, but
I do get transmit timeouts.

I've used kernels: 2.6.10, 2.6.11, and 2.6.12.4, stock with only the
"squashfs" patch applied and compiled as 586/

The interesting thing is that Ubuntu 5.04, booted "Live" on the box,
works just fine with the e100 driver with a kernel shown as:
"2.6.10-5-386". I'm going to work on pulling this kernel and its
modules off to use.

Any help urgently appreciated.

sdw
Re: Soft lockup in e100 driver ?
On Tue, Aug 09, 2005 at 09:16:21AM -0700, Daniel Walker wrote:
> It looks like this might be an SMP race, it seems that both processors
> are in e100_down(). There is a while loop in e100_clean_cbs() that
> appears to have an unsafe looping condition.
>
> It looks like cbs_avail might jump over params.cbs.count, then you
> would have to wait for a rollover. Is this a PREEMPT_NONE kernel?

  # CONFIG_PREEMPT is not set
  # CONFIG_PREEMPT_BKL is not set

which is probably the same as "NONE".

There is _one_ processor in down, but the other may be trying to send
some data out, or otherwise polling the card.

However... while these are real bugs in their own right, none of them is
as important as the original "card dies" thing, during the recovery from
which all this soft-lockup merriment happens.

Also, as it happens only once a week or so (except when it happens
right after another), testing code patches is rather slow.
I can guess which things make it more likely, but I can't make
it happen at will.

/Matti Aarnio
Re: Soft lockup in e100 driver ?
It looks like this might be an SMP race, it seems that both processors
are in e100_down(). There is a while loop in e100_clean_cbs() that
appears to have an unsafe looping condition.

It looks like cbs_avail might jump over params.cbs.count, then you
would have to wait for a rollover. Is this a PREEMPT_NONE kernel?

This patch may help, but it's not a complete fix.

--- linux-2.6.12.orig/drivers/net/e100.c 2005-08-05 16:45:59.0 +
+++ linux-2.6.12/drivers/net/e100.c 2005-08-09 16:14:45.0 +
@@ -1393,7 +1393,7 @@ static inline int e100_tx_clean(struct n
 static void e100_clean_cbs(struct nic *nic)
 {
 	if(nic->cbs) {
-		while(nic->cbs_avail != nic->params.cbs.count) {
+		while(nic->cbs_avail < nic->params.cbs.count) {
 			struct cb *cb = nic->cb_to_clean;
 			if(cb->skb) {
 				pci_unmap_single(nic->pdev,

On Tue, 2005-08-09 at 16:36 +0300, Matti Aarnio wrote:
> Running very recent Fedora Core Development kernel I can get the following
> soft-oops.. ( 2.6.12-1.1455_FC5smp )
>
> e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
> BUG: soft lockup detected on CPU#0!
>
> Pid: 10743, comm: ifconfig
> EIP: 0060:[] CPU: 0
> EIP is at e100_clean_cbs+0x2f/0x12b [e100]
>  EFLAGS: 0293    Not tainted (2.6.12-1.1455_FC5smp)
> EAX: 495c7c2b EBX: 495c7c2b ECX: f6c311a0 EDX:
> ESI: 0040 EDI: f6c3 EBP: f71a4b20 DS: 007b ES: 007b
> CR0: 8005003b CR2: 0804a544 CR3: 01e9cd80 CR4: 06f0
>  [] e100_down+0x66/0x9a [e100]
>  [] e100_close+0xa/0xd [e100]
>  [] dev_close+0x40/0x7e
>  [] dev_change_flags+0x46/0xf5
>  [] devinet_ioctl+0x564/0x5df
>  [] sock_ioctl+0xc3/0x250
>  [] sock_ioctl+0x0/0x250
>  [] do_ioctl+0x1f/0x6d
>  [] vfs_ioctl+0x50/0x1c6
>  [] sys_ioctl+0x5d/0x6f
>  [] syscall_call+0x7/0xb
>  [] softlockup_tick+0x6f/0x80
>  [] timer_interrupt+0x2d/0x75
>  [] handle_IRQ_event+0x2e/0x5a
>  [] __do_IRQ+0xc2/0x127
>  [] do_IRQ+0x4e/0x86
>  ===
>  [] smp_apic_timer_interrupt+0xc1/0xca
>  [] common_interrupt+0x1a/0x20
>  [] e100_clean_cbs+0x2f/0x12b [e100]
>  [] e100_down+0x66/0x9a [e100]
>  [] e100_close+0xa/0xd [e100]
>  [] dev_close+0x40/0x7e
>  [] dev_change_flags+0x46/0xf5
>  [] devinet_ioctl+0x564/0x5df
>  [] sock_ioctl+0xc3/0x250
>  [] sock_ioctl+0x0/0x250
>  [] do_ioctl+0x1f/0x6d
>  [] vfs_ioctl+0x50/0x1c6
>  [] sys_ioctl+0x5d/0x6f
>  [] syscall_call+0x7/0xb
>
> Preconditions for this are:
>
> - E100 card stopped working for some reason (no idea why, it just
>   does sometimes at this oldish 2x P-III machine)
> - There are active datastreams running in and out
>   (around 0.2 Mbps out, multiple megabits in.)
> - Commanding then "ifconfig eth0 down" results in what feels like
>   system freezing, but it does recover in about 30-60 seconds
>   (it takes long enough for me to sweat bullets...)
> - While in freeze state, keyboard can go crazy, but mouse does
>   respond, as well as tvtime shows bt848 captured live video.
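To make the "jump over, then wait for a rollover" argument concrete, here is a toy model in C of the two exit tests from the patch above. It is only a sketch: the 8-bit counter width and the function names are assumptions chosen so the wraparound is cheap to count, and it leaves out everything else the real loop does (unmapping buffers, advancing cb_to_clean).

```c
#include <stdint.h>

/*
 * Toy model of the cleanup loop's exit test. "avail" plays the role of
 * nic->cbs_avail and "count" of nic->params.cbs.count. The 8-bit width is
 * an assumption made here, not the driver's actual field width.
 */

/* Exit test of the unpatched loop: stop only on exact equality. */
unsigned iterations_until_equal(uint8_t avail, uint8_t count)
{
	unsigned iterations = 0;

	while (avail != count) {
		avail++;	/* wraps from 255 back to 0 */
		iterations++;
	}
	return iterations;
}

/* Exit test of the patched loop: stop once avail reaches or passes count. */
unsigned iterations_while_below(uint8_t avail, uint8_t count)
{
	unsigned iterations = 0;

	while (avail < count) {
		avail++;
		iterations++;
	}
	return iterations;
}
```

Starting at avail = 60 with count = 64, both versions stop after 4 steps. Starting at avail = 65, one past the limit, the equality version needs 255 steps to come back around while the "<" version exits immediately. With a wider counter the trip around would be correspondingly longer, which would fit a stall that eventually recovers. The "<" test only bounds the spin; it says nothing about how the counter got past the limit, which may be why the patch is offered as not a complete fix.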
Re: Soft lockup in e100 driver ?
On Tue, 2005-08-09 at 18:55 +0300, Matti Aarnio wrote:
> The fundamental thing is, IT LOCKS UP (for a while), when I do
> "ifconfig eth0 down" and there is active traffic but the card DIES
> somehow. Apparently it requires marginal/unreliable hardware to
> happen as well. (Which for e100 is rather rare.)

This does look like a problem with the e100. I have an SMP machine and
another machine with an e100 card, but not both together, and I'm not
about to start pulling cards. Does this only happen on SMP, or do you also
see this problem running a UP kernel (you only need to run a UP kernel
on the SMP machine to get the same results)?

I'm running Debian, but I guess I could run the Fedora kernel to see if I
can get the same behavior.

> That is: at first the card dies, then I notice it, and do the ifconfig.
> Then things go _bad_, and recover. Then I do 'rmmod e100', and
> restart network (which reloads the driver module), and things work
> once again.

So you have something locking up momentarily, then coming back to
normal? After the rmmod of e100 and bringing the network back up, all is
in order? Just confirming what you see.

> Fedora kernel sources have this "softlockups" patch file: (size and date)
>    6159 May 12 04:50 linux-2.6.12-detect-softlockups.patch
>
> That file I can upload, if you want. Or send in email.
> Rest of the RPM-wrapper CPIO package I would prefer not to...

Did you add that patch yourself, or did it come with an update? I was
just fiddling with rpms and I can use them too; with rpm2cpio it
works nicely. So if you can just point to a link then I'll download it and
try it out.

I found
http://download.fedora.redhat.com/pub/fedora/linux/core/updates/testing/4/i386/kernel-smp-2.6.12-1.1411_FC4.i686.rpm
but this is 1411 and not what you showed (1455).

-- Steve
Re: Soft lockup in e100 driver ?
On Tue, Aug 09, 2005 at 11:41:40AM -0400, Steven Rostedt wrote:
> Matti,
>
> I believe Fedora must have added Ingo's soft lockup detect code. I've
> made additions to this code as well. Could you point me to a link where I
> could download this kernel source? No rpm's or package managers please.
> Just a tarball would be fine.

The fundamental thing is, IT LOCKS UP (for a while), when I do
"ifconfig eth0 down" and there is active traffic but the card DIES
somehow. Apparently it requires marginal/unreliable hardware to
happen as well. (Which for e100 is rather rare.)

That is: at first the card dies, then I notice it, and do the ifconfig.
Then things go _bad_, and recover. Then I do 'rmmod e100', and
restart network (which reloads the driver module), and things work
once again.

Fedora kernel sources have this "softlockups" patch file: (size and date)
   6159 May 12 04:50 linux-2.6.12-detect-softlockups.patch

That file I can upload, if you want. Or send in email.
Rest of the RPM-wrapper CPIO package I would prefer not to...

> Thanks,
> -- Steve

/Matti Aarnio
Re: Soft lockup in e100 driver ?
On Tue, 2005-08-09 at 17:37 +0300, Matti Aarnio wrote:
> The kernel in question is less than 3 days old RedHat Fedora Core
> Development kernel with baseline as:
>   * Sun Aug 07 2005 Dave Jones <[EMAIL PROTECTED]>
>   - 2.6.13-rc5-git4
>
> Those merges have not helped.

Matti,

I believe Fedora must have added Ingo's soft lockup detect code. I've
made additions to this code as well. Could you point me to a link where I
could download this kernel source? No rpm's or package managers please.
Just a tarball would be fine.

Thanks,

-- Steve
Re: Soft lockup in e100 driver ?
On Tue, 2005-08-09 at 11:23 -0400, Steven Rostedt wrote:
> I just downloaded 2.6.13-rc6-git and I don't see the merge of the soft
> lockup code. Is this a Fedora thing? If so, could someone point me to
> a link to download this Fedora kernel. I'm currently using Debian.

I seem to recall seeing Fedora using voluntary preempt before it was
merged.

Daniel
Re: Soft lockup in e100 driver ?
On Tue, 2005-08-09 at 10:58 -0400, Lee Revell wrote:
> On Tue, 2005-08-09 at 16:36 +0300, Matti Aarnio wrote:
> > Running very recent Fedora Core Development kernel I can get the following
> > soft-oops.. ( 2.6.12-1.1455_FC5smp )
> >
> > e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
> > BUG: soft lockup detected on CPU#0!
>
> Could this be a false positive? It's suspicious that the soft lockup
> detector was just merged to mainline then you got this.

I just downloaded 2.6.13-rc6-git and I don't see the merge of the soft
lockup code. Is this a Fedora thing? If so, could someone point me to
a link to download this Fedora kernel. I'm currently using Debian.

-- Steve
Re: Soft lockup in e100 driver ?
On Tue, 2005-08-09 at 16:36 +0300, Matti Aarnio wrote:
> Running very recent Fedora Core Development kernel I can get the following
> soft-oops.. ( 2.6.12-1.1455_FC5smp )
>
> e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
> BUG: soft lockup detected on CPU#0!

Could this be a false positive? It's suspicious that the soft lockup
detector was just merged to mainline then you got this.

Lee
Re: Soft lockup in e100 driver ?
On Tue, Aug 09, 2005 at 03:55:49PM +0200, Jesper Juhl wrote:
> On 8/9/05, Matti Aarnio <[EMAIL PROTECTED]> wrote:
> > Running very recent Fedora Core Development kernel I can get the following
> > soft-oops.. ( 2.6.12-1.1455_FC5smp )
> >
> Various patches to the e100 driver have been merged since 2.6.12.1
> (which is ~1.5 months old), so it would make sense to try a more recent
> kernel like 2.6.13-rc6, 2.6.13-rc6-git1 or 2.6.13-rc5-mm1 and see if
> you can still reproduce the problem with those.

The kernel in question is a less than 3 days old RedHat Fedora Core
Development kernel with baseline as:
  * Sun Aug 07 2005 Dave Jones <[EMAIL PROTECTED]>
  - 2.6.13-rc5-git4

Those merges have not helped.

> --
> Jesper Juhl <[EMAIL PROTECTED]>

/Matti Aarnio
Re: Soft lockup in e100 driver ?
On 8/9/05, Matti Aarnio <[EMAIL PROTECTED]> wrote:
> Running very recent Fedora Core Development kernel I can get the following
> soft-oops.. ( 2.6.12-1.1455_FC5smp )
>
Various patches to the e100 driver have been merged since 2.6.12.1
(which is ~1.5 months old), so it would make sense to try a more recent
kernel like 2.6.13-rc6, 2.6.13-rc6-git1 or 2.6.13-rc5-mm1 and see if
you can still reproduce the problem with those.

--
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html