Re: Soft lockup in e100 driver ?

2005-08-11 Thread Jesse Brandeburg
On 8/11/05, Stephen D. Williams <[EMAIL PROTECTED]> wrote:
> The chipset is an Intel 8x0 something.  Unfortunately, there is a
> heatsink semi-permanently installed over everything.  Is there a /proc
> pseudofile that will give me good identifying chipset info to report here?

You can show the chipset details with lspci;
lspci -n will show the numeric vendor and device IDs along with the revision IDs.
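
For anyone who would rather read the IDs from a pseudofile than run lspci, here is a minimal C sketch. It assumes the kernel exposes /proc/bus/pci/devices and that each line starts with a bus/devfn field followed by a combined vendor and device ID field. The function names are made up for illustration, and lspci itself may well be the better tool on a given system.

```c
#include <assert.h>
#include <stdio.h>

/* Vendor and device ID pair as stored in PCI configuration space. */
struct pci_id {
    unsigned int vendor;
    unsigned int device;
};

/*
 * Parse one line of /proc/bus/pci/devices.  Assumed layout: the first
 * field is bus number plus devfn (four hex digits), the second field
 * is vendor ID immediately followed by device ID (eight hex digits).
 * Returns 0 on success, -1 if the line does not match.
 */
int parse_pci_line(const char *line, struct pci_id *out)
{
    unsigned int busdevfn, vendor, device;

    assert(line != NULL && out != NULL);
    if (sscanf(line, "%4x %4x%4x", &busdevfn, &vendor, &device) != 3)
        return -1;
    out->vendor = vendor;
    out->device = device;
    return 0;
}

/*
 * Print every Intel (vendor 0x8086) function listed in the file at
 * `path`.  Returns the number of matches, or -1 if the file cannot
 * be opened.
 */
int list_intel_devices(const char *path)
{
    char line[512];
    struct pci_id id;
    int found = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_pci_line(line, &id) == 0 && id.vendor == 0x8086) {
            printf("%04x:%04x\n", id.vendor, id.device);
            found++;
        }
    }
    fclose(f);
    return found;
}
```

Calling list_intel_devices("/proc/bus/pci/devices") should print one vendor:device pair per Intel function, which can then be looked up in a PCI ID list.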

Interesting failure case on the e100; I haven't a clue what's going on.

netdev @ vger might be a good place to continue the discussion about e100 issues.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Soft lockup in e100 driver ?

2005-08-11 Thread Stephen D. Williams

"noapic" didn't work, nor did "noacpi", etc.
Going to 2.6.13-rc6.2 solved the problem (once I integrated udev, etc.).

The chipset is an Intel 8x0 something.  Unfortunately, there is a 
heatsink semi-permanently installed over everything.  Is there a /proc 
pseudofile that will give me good identifying chipset info to report here?


If there is a FAQ for this, we should post a message about it once in a 
while.

Nothing here indicates chipset:
http://www.kernel.org/pub/linux/docs/lkml/reporting-bugs.html

The CPU is an Intel Celeron CPU 2.00GHz running at 1495.772 MHz, with 128 KB 
of cache.


sdw





Re: Soft lockup in e100 driver ?

2005-08-11 Thread Matti Aarnio
On Wed, Aug 10, 2005 at 08:32:45PM -0400, Stephen D. Williams wrote:
> I just noticed that the Ubuntu setup says "GSI 20(level,low) -> IRQ 20" 
> whereas I remember my built kernels saying "No GSI..  IRQ 11".  I'll 
> investigate what that means and how to enable it.  Pointers appreciated.

That is most likely unrelated, but I have had similar experiences
at times.  It turned out that a recent change in the APIC
management code broke things, but the latest version is working
again.  For a while, to get the network card working, I had to boot
my home SMP box with the "noapic" option.

On a UP box, booting with "noapic" makes little difference, but on SMP it
means that one CPU handles all interrupts.  (In some
rare cases that could even be desirable.)
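
One way to check whether a single CPU really is taking every interrupt for a device is to look at the per-CPU counters in /proc/interrupts. The following is a minimal C sketch of parsing one such line. It assumes the usual layout of an IRQ label ending in a colon followed by one decimal counter per CPU; the function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the per-CPU counters from one /proc/interrupts line.
 * Stores at most `max` counters in `counts` and returns how many
 * were read, or -1 if the line has no "label:" prefix (for example
 * the CPU header line).
 */
int parse_irq_counts(const char *line, unsigned long *counts, int max)
{
    const char *p;
    int n = 0;

    assert(line != NULL && counts != NULL);
    p = strchr(line, ':');
    if (p == NULL)
        return -1;
    p++;
    while (n < max) {
        char *end;

        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;              /* reached the controller name */
        counts[n] = strtoul(p, &end, 10);
        if (*end != '\0' && !isspace((unsigned char)*end))
            break;              /* a name that merely starts with a digit */
        n++;
        p = end;
    }
    return n;
}

/* Count how many CPUs have serviced this interrupt at least once. */
int cpus_taking_irq(const unsigned long *counts, int n)
{
    int i, busy = 0;

    for (i = 0; i < n; i++)
        if (counts[i] != 0)
            busy++;
    return busy;
}
```

If cpus_taking_irq() returns 1 for the network card's line on a two-CPU box, all of that card's interrupts have gone to a single CPU, which is the "noapic" behaviour described above.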

   /Matti Aarnio


> sdw
> 
> Stephen D. Williams wrote:
> 
> >I have been working for days to get a recent kernel to work with these 
> >small-format UP Celeron 2Ghz (running at 1.33Ghz) motherboards that I 
> >am planning to use as thin clients.  I'm doing a PXE boot, loading 
> >kernels, and trying to get networking to come up.
> >
> >I eventually realized that the problem is that the e100 driver loads 
> >but does not allow any packet traffic.  The system isn't crashed, but 
> >I do get transmit timeouts.
> >
> >I've used kernels: 2.6.10, 2.6.11, and 2.6.12.4, stock with only the 
> >"squashfs" patch applied and compiled as 586/
> >
> >The interesting thing is that Ubuntu 5.04, booted "Live" on the box, 
> >works just fine with the e100 driver with a kernel shown as: 
> >"2.6.10-5-386".  I'm going to work on pulling this kernel and its 
> >modules off to use.
> >
> >Any help urgently appreciated.
> >
> >sdw




Re: Soft lockup in e100 driver ?

2005-08-10 Thread Stephen D. Williams
I just noticed that the Ubuntu setup says "GSI 20(level,low) -> IRQ 20" 
whereas I remember my built kernels saying "No GSI..  IRQ 11".  I'll 
investigate what that means and how to enable it.  Pointers appreciated.


sdw



Re: Soft lockup in e100 driver ?

2005-08-10 Thread Stephen D. Williams
I have been working for days to get a recent kernel to work with these 
small-format UP Celeron 2GHz (running at 1.33GHz) motherboards that I am 
planning to use as thin clients.  I'm doing a PXE boot, loading kernels, 
and trying to get networking to come up.


I eventually realized that the problem is that the e100 driver loads but 
does not allow any packet traffic.  The system isn't crashed, but I do 
get transmit timeouts.


I've used kernels: 2.6.10, 2.6.11, and 2.6.12.4, stock with only the 
"squashfs" patch applied and compiled as 586/


The interesting thing is that Ubuntu 5.04, booted "Live" on the box, 
works just fine with the e100 driver with a kernel shown as: 
"2.6.10-5-386".  I'm going to work on pulling this kernel and its 
modules off to use.


Any help urgently appreciated.

sdw


Re: Soft lockup in e100 driver ?

2005-08-09 Thread Matti Aarnio
On Tue, Aug 09, 2005 at 09:16:21AM -0700, Daniel Walker wrote:
> It looks like this might be an SMP race , it seem that both processors
> are in e100_down(). There is a while loop in e100_clean_cbs() that
> appears to have an unsafe looping condition . 
> 
> It looks like cbs_avail might jump over params.cbs.count , then you
> would have to wait for a rollover . Is this a PREEMPT_NONE kernel?

  # CONFIG_PREEMPT is not set
  # CONFIG_PREEMPT_BKL is not set

which is probably same as "NONE".

There is _one_ processor in down, but the other may be trying to send
some data out, or otherwise polling the card.

However...  while these are real bugs in their own right, none of them is
as important as the original "card dies" problem, during recovery from
which all this soft-lockup merriment happens.

Also, as it happens only once a week or so (except when it happens
again right after), testing code patches is rather slow.
I can guess which things make it more likely, but I can't make it
happen at will.

  /Matti Aarnio


> This patch may help, but it's not a complete fix.
> 
> --- linux-2.6.12.orig/drivers/net/e100.c	2005-08-05 16:45:59.000000000 +0000
> +++ linux-2.6.12/drivers/net/e100.c	2005-08-09 16:14:45.000000000 +0000
> @@ -1393,7 +1393,7 @@ static inline int e100_tx_clean(struct n
>  static void e100_clean_cbs(struct nic *nic)
>  {
>  	if(nic->cbs) {
> -		while(nic->cbs_avail != nic->params.cbs.count) {
> +		while(nic->cbs_avail < nic->params.cbs.count) {
>  			struct cb *cb = nic->cb_to_clean;
>  			if(cb->skb) {
>  				pci_unmap_single(nic->pdev,
> 
> 
> 
> On Tue, 2005-08-09 at 16:36 +0300, Matti Aarnio wrote:
> > Running very recent Fedora Core Development kernel I can following
> > soft-oops..   ( 2.6.12-1.1455_FC5smp )
> > 
> > 
> > e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
> > BUG: soft lockup detected on CPU#0!
> > 
> > Pid: 10743, comm: ifconfig
> > EIP: 0060:[] CPU: 0
> > EIP is at e100_clean_cbs+0x2f/0x12b [e100]
> >  EFLAGS: 0293Not tainted  (2.6.12-1.1455_FC5smp)
> > EAX: 495c7c2b EBX: 495c7c2b ECX: f6c311a0 EDX: 
> > ESI: 0040 EDI: f6c3 EBP: f71a4b20 DS: 007b ES: 007b
> > CR0: 8005003b CR2: 0804a544 CR3: 01e9cd80 CR4: 06f0
> >  [] e100_down+0x66/0x9a [e100]
> >  [] e100_close+0xa/0xd [e100]
> >  [] dev_close+0x40/0x7e
> >  [] dev_change_flags+0x46/0xf5
> >  [] devinet_ioctl+0x564/0x5df
> >  [] sock_ioctl+0xc3/0x250
> >  [] sock_ioctl+0x0/0x250
> >  [] do_ioctl+0x1f/0x6d
> >  [] vfs_ioctl+0x50/0x1c6
> >  [] sys_ioctl+0x5d/0x6f
> >  [] syscall_call+0x7/0xb
> >  [] softlockup_tick+0x6f/0x80
> >  [] timer_interrupt+0x2d/0x75
> >  [] handle_IRQ_event+0x2e/0x5a
> >  [] __do_IRQ+0xc2/0x127
> >  [] do_IRQ+0x4e/0x86
> >  ===
> >  [] smp_apic_timer_interrupt+0xc1/0xca
> >  [] common_interrupt+0x1a/0x20
> >  [] e100_clean_cbs+0x2f/0x12b [e100]
> >  [] e100_down+0x66/0x9a [e100]
> >  [] e100_close+0xa/0xd [e100]
> >  [] dev_close+0x40/0x7e
> >  [] dev_change_flags+0x46/0xf5
> >  [] devinet_ioctl+0x564/0x5df
> >  [] sock_ioctl+0xc3/0x250
> >  [] sock_ioctl+0x0/0x250
> >  [] do_ioctl+0x1f/0x6d
> >  [] vfs_ioctl+0x50/0x1c6
> >  [] sys_ioctl+0x5d/0x6f
> >  [] syscall_call+0x7/0xb
> > 
> > 
> > 
> > Preconditions for this are:
> > 
> > - E100 card stopped working for some reason (no idea why, it just
> >   does sometimes at this oldish 2x P-III machine)
> > - There are active datastreams running in and out
> >   (around 0.2 Mbps out, multiple megabits in.)
> > - Commanding then "ifconfig eth0 down" results in what feels like 
> >   system freezing, but it does recover in about 30-60 seconds
> >   (it takes long enough for me to sweat bullets...)
> > - While in freeze state, keyboard can go crazy, but mouse does
> >   respond, as well as tvtime shows bt848 captured live video.


Re: Soft lockup in e100 driver ?

2005-08-09 Thread Daniel Walker

It looks like this might be an SMP race; it seems that both processors
are in e100_down(). There is a while loop in e100_clean_cbs() that
appears to have an unsafe looping condition.

It looks like cbs_avail might jump over params.cbs.count, and then you
would have to wait for a rollover. Is this a PREEMPT_NONE kernel?


This patch may help, but it's not a complete fix.

--- linux-2.6.12.orig/drivers/net/e100.c	2005-08-05 16:45:59.000000000 +0000
+++ linux-2.6.12/drivers/net/e100.c	2005-08-09 16:14:45.000000000 +0000
@@ -1393,7 +1393,7 @@ static inline int e100_tx_clean(struct n
 static void e100_clean_cbs(struct nic *nic)
 {
 	if(nic->cbs) {
-		while(nic->cbs_avail != nic->params.cbs.count) {
+		while(nic->cbs_avail < nic->params.cbs.count) {
 			struct cb *cb = nic->cb_to_clean;
 			if(cb->skb) {
 				pci_unmap_single(nic->pdev,
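
To illustrate why the loop condition matters, here is a small userspace sketch in C. It is only a model: it assumes a 16-bit unsigned counter that is incremented once per pass, and neither that width nor the helper names are taken from the driver. It counts how many passes each condition needs before the loop exits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Passes needed by a loop of the form "while (avail != count)".
 * If avail ever starts above count, the loop only ends after the
 * counter wraps around and climbs back up to count.
 */
unsigned long passes_not_equal(uint16_t avail, uint16_t count)
{
    unsigned long passes = 0;

    while (avail != count) {
        avail++;                /* wraps from 65535 back to 0 */
        passes++;
    }
    return passes;
}

/* Passes needed by a loop of the form "while (avail < count)". */
unsigned long passes_less_than(uint16_t avail, uint16_t count)
{
    unsigned long passes = 0;

    while (avail < count) {
        avail++;
        passes++;
    }
    return passes;
}

/* Compare both conditions for a normal and an overshoot start value. */
void self_check(void)
{
    /* Normal case: both conditions stop after the same number of passes. */
    assert(passes_not_equal(60, 64) == 4);
    assert(passes_less_than(60, 64) == 4);

    /* Overshoot by one: "!=" has to go all the way around the counter. */
    assert(passes_not_equal(65, 64) == 65535);
    assert(passes_less_than(65, 64) == 0);
}
```

In this model the "<" form exits at once after an overshoot, but it also skips whatever cleanup the remaining passes would have done, which fits the remark that the patch is not a complete fix.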



On Tue, 2005-08-09 at 16:36 +0300, Matti Aarnio wrote:
> Running a very recent Fedora Core Development kernel I can get the following
> soft-oops..   ( 2.6.12-1.1455_FC5smp )
> 
> 
> e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
> BUG: soft lockup detected on CPU#0!
> 
> Pid: 10743, comm: ifconfig
> EIP: 0060:[] CPU: 0
> EIP is at e100_clean_cbs+0x2f/0x12b [e100]
>  EFLAGS: 0293Not tainted  (2.6.12-1.1455_FC5smp)
> EAX: 495c7c2b EBX: 495c7c2b ECX: f6c311a0 EDX: 
> ESI: 0040 EDI: f6c3 EBP: f71a4b20 DS: 007b ES: 007b
> CR0: 8005003b CR2: 0804a544 CR3: 01e9cd80 CR4: 06f0
>  [] e100_down+0x66/0x9a [e100]
>  [] e100_close+0xa/0xd [e100]
>  [] dev_close+0x40/0x7e
>  [] dev_change_flags+0x46/0xf5
>  [] devinet_ioctl+0x564/0x5df
>  [] sock_ioctl+0xc3/0x250
>  [] sock_ioctl+0x0/0x250
>  [] do_ioctl+0x1f/0x6d
>  [] vfs_ioctl+0x50/0x1c6
>  [] sys_ioctl+0x5d/0x6f
>  [] syscall_call+0x7/0xb
>  [] softlockup_tick+0x6f/0x80
>  [] timer_interrupt+0x2d/0x75
>  [] handle_IRQ_event+0x2e/0x5a
>  [] __do_IRQ+0xc2/0x127
>  [] do_IRQ+0x4e/0x86
>  ===
>  [] smp_apic_timer_interrupt+0xc1/0xca
>  [] common_interrupt+0x1a/0x20
>  [] e100_clean_cbs+0x2f/0x12b [e100]
>  [] e100_down+0x66/0x9a [e100]
>  [] e100_close+0xa/0xd [e100]
>  [] dev_close+0x40/0x7e
>  [] dev_change_flags+0x46/0xf5
>  [] devinet_ioctl+0x564/0x5df
>  [] sock_ioctl+0xc3/0x250
>  [] sock_ioctl+0x0/0x250
>  [] do_ioctl+0x1f/0x6d
>  [] vfs_ioctl+0x50/0x1c6
>  [] sys_ioctl+0x5d/0x6f
>  [] syscall_call+0x7/0xb
> 
> 
> 
> Preconditions for this are:
> 
> - E100 card stopped working for some reason (no idea why, it just
>   does sometimes at this oldish 2x P-III machine)
> - There are active datastreams running in and out
>   (around 0.2 Mbps out, multiple megabits in.)
> - Commanding then "ifconfig eth0 down" results in what feels like 
>   system freezing, but it does recover in about 30-60 seconds
>   (it takes long enough for me to sweat bullets...)
> - While in freeze state, keyboard can go crazy, but mouse does
>   respond, as well as tvtime shows bt848 captured live video.


Re: Soft lockup in e100 driver ?

2005-08-09 Thread Steven Rostedt
On Tue, 2005-08-09 at 18:55 +0300, Matti Aarnio wrote:

> The fundamental thing is, IT LOCKS UP (for a while), when I do 
> "ifconfig eth0 down" and there is active traffic but the card DIES
> somehow.  Apparently it requires marginal/unreliable hardware to
> happen as well.  (Which for e100 is rather rare.)

This does look like a problem with the e100. I have an SMP machine and
another machine with an e100 card, but not both together, and I'm not
about to start pulling cards.  Does this only happen on SMP, or do you
also see this problem running a UP kernel (you only need to run a UP
kernel on the SMP machine to test this)? I'm running Debian, but I
guess I could run the Fedora kernel to see if I can get the same
behavior.

> That is: at first the card dies, then I notice it, and do the ifconfig.
> Then things go _bad_, and recover.  Then I do 'rmmod e100', and
> restart network (which reloads the driver module), and things work
> once again.

So you have something locking up momentarily, then coming back to
normal?  After the rmmod of e100 and bringing back up the network, all
is in order?  Just confirming what you see.

> 
> Fedora kernel sources have this "softlockups" patch file: (size and date)
>6159 May 12 04:50 linux-2.6.12-detect-softlockups.patch
> 
> That file I can upload, if you want.  Or send in email.
> Rest of the RPM-wrapper CPIO package I would prefer not to...

Did you add that patch yourself, or did it come with an update?  I was
just fiddling with rpms and I can use them too; with rpm2cpio it
works nicely.  So if you can just point me to a link, I'll download it
and try it out.  I found
http://download.fedora.redhat.com/pub/fedora/linux/core/updates/testing/4/i386/kernel-smp-2.6.12-1.1411_FC4.i686.rpm
but this is build 1411 and not the one you showed (1455).

-- Steve




Re: Soft lockup in e100 driver ?

2005-08-09 Thread Matti Aarnio
On Tue, Aug 09, 2005 at 11:41:40AM -0400, Steven Rostedt wrote:
> On Tue, 2005-08-09 at 17:37 +0300, Matti Aarnio wrote:
> > On Tue, Aug 09, 2005 at 03:55:49PM +0200, Jesper Juhl wrote:
> > > On 8/9/05, Matti Aarnio <[EMAIL PROTECTED]> wrote:
> > > > Running very recent Fedora Core Development kernel I can following
> > > > soft-oops..   ( 2.6.12-1.1455_FC5smp )
> > > > 
> > > Various patches to the e100 driver have been merged since 2.6.12.1
> > > (which is ~1.5months old), so it would make sense to try a more recent
> > > kernel like 2.6.13-rc6, 2.6.13-rc6-git1 or 2.6.13-rc5-mm1 and see if
> > > you can still reproduce the problem with those.
> > 
> > The kernel in question is less than 3 days old RedHat Fedora Core
> > Development kernel with baseline as:
> >   * Sun Aug 07 2005 Dave Jones <[EMAIL PROTECTED]>
> > - 2.6.13-rc5-git4
> > 
> > Those merges have not helped.
> 
> Matti,
> 
> I believe Fedora must have added Ingo's soft lockup detect code.  I've
> made additions to this code as well. Could you point me to a link that I
> could download this kernel source.  No rpm's or packagemanagers please.
> Just a tarball would be fine.

The fundamental thing is, IT LOCKS UP (for a while), when I do 
"ifconfig eth0 down" and there is active traffic but the card DIES
somehow.  Apparently it requires marginal/unreliable hardware to
happen as well.  (Which for e100 is rather rare.)

That is: at first the card dies, then I notice it, and do the ifconfig.
Then things go _bad_, and recover.  Then I do 'rmmod e100', and
restart network (which reloads the driver module), and things work
once again.

Fedora kernel sources have this "softlockups" patch file: (size and date)
   6159 May 12 04:50 linux-2.6.12-detect-softlockups.patch

That file I can upload, if you want.  Or send in email.
Rest of the RPM-wrapper CPIO package I would prefer not to...


> Thanks,
> -- Steve

/Matti Aarnio


Re: Soft lockup in e100 driver ?

2005-08-09 Thread Steven Rostedt
On Tue, 2005-08-09 at 17:37 +0300, Matti Aarnio wrote:
> On Tue, Aug 09, 2005 at 03:55:49PM +0200, Jesper Juhl wrote:
> > On 8/9/05, Matti Aarnio <[EMAIL PROTECTED]> wrote:
> > > Running very recent Fedora Core Development kernel I can following
> > > soft-oops..   ( 2.6.12-1.1455_FC5smp )
> > > 
> > Various patches to the e100 driver have been merged since 2.6.12.1
> > (which is ~1.5months old), so it would make sense to try a more recent
> > kernel like 2.6.13-rc6, 2.6.13-rc6-git1 or 2.6.13-rc5-mm1 and see if
> > you can still reproduce the problem with those.
> 
> The kernel in question is less than 3 days old RedHat Fedora Core
> Development kernel with baseline as:
>   * Sun Aug 07 2005 Dave Jones <[EMAIL PROTECTED]>
> - 2.6.13-rc5-git4
> 
> Those merges have not helped.

Matti,

I believe Fedora must have added Ingo's soft lockup detection code.  I've
made additions to this code as well. Could you point me to a link where I
could download this kernel source?  No RPMs or package managers, please.
Just a tarball would be fine.

Thanks,

-- Steve




Re: Soft lockup in e100 driver ?

2005-08-09 Thread Daniel Walker
On Tue, 2005-08-09 at 11:23 -0400, Steven Rostedt wrote:
> 
> I just downloaded 2.6.13-rc6-git and I don't see the merge of the soft
> lockup code.  Is this a Fedora thing?  If so, could someone point me to
> a link to download this Fedora kernel. I'm currently using Debian.

I seem to recall seeing fedora using voluntary preempt before it was
merged.

Daniel



Re: Soft lockup in e100 driver ?

2005-08-09 Thread Steven Rostedt
On Tue, 2005-08-09 at 10:58 -0400, Lee Revell wrote:
> On Tue, 2005-08-09 at 16:36 +0300, Matti Aarnio wrote:
> > Running very recent Fedora Core Development kernel I can following
> > soft-oops..   ( 2.6.12-1.1455_FC5smp )
> > 
> > 
> > e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
> > BUG: soft lockup detected on CPU#0!
> 
> Could this be a false positive?  It's suspicious that the soft lockup
> detector was just merged to mainline then you got this.

I just downloaded 2.6.13-rc6-git and I don't see the merge of the soft
lockup code.  Is this a Fedora thing?  If so, could someone point me to
a link to download this Fedora kernel. I'm currently using Debian.

-- Steve




Re: Soft lockup in e100 driver ?

2005-08-09 Thread Lee Revell
On Tue, 2005-08-09 at 16:36 +0300, Matti Aarnio wrote:
> Running very recent Fedora Core Development kernel I can following
> soft-oops..   ( 2.6.12-1.1455_FC5smp )
> 
> 
> e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
> BUG: soft lockup detected on CPU#0!

Could this be a false positive?  It's suspicious that the soft lockup
detector was just merged to mainline and then you got this.
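
When weighing the false-positive question, it may help to have a rough model of what such a detector checks. The C sketch below is a userspace model, not the Fedora or mainline patch (which is not shown in this thread). It assumes a per-CPU timestamp that a watchdog task refreshes whenever it gets to run, and a timer-tick hook that complains once the timestamp is older than a threshold. The names, the tick rate, and the 10-second threshold are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TICKS_PER_SEC 1000UL                    /* assumed tick rate */
#define LOCKUP_TICKS  (10UL * TICKS_PER_SEC)    /* assumed threshold */

/* Per-CPU state of the model detector. */
struct cpu_watch {
    unsigned long last_touch;   /* tick at which the watchdog task last ran */
    bool reported;              /* true once the current stall was reported */
};

/* Called by the watchdog task each time the scheduler lets it run. */
void watch_touch(struct cpu_watch *w, unsigned long now)
{
    assert(w != NULL);
    w->last_touch = now;
    w->reported = false;
}

/*
 * Called from the timer tick.  Returns true the first time the
 * watchdog task is found to have been starved for longer than the
 * threshold, and false otherwise.
 */
bool watch_tick(struct cpu_watch *w, unsigned long now)
{
    assert(w != NULL);
    if (w->reported)
        return false;
    if (now - w->last_touch > LOCKUP_TICKS) {
        w->reported = true;
        return true;
    }
    return false;
}
```

Under this model a report means only that no task got scheduled on that CPU for the threshold period while timer interrupts kept arriving. A long-running loop in kernel code on a non-preemptible kernel would produce exactly that, so a report of this kind is not necessarily spurious.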

Lee



Re: Soft lockup in e100 driver ?

2005-08-09 Thread Matti Aarnio
On Tue, Aug 09, 2005 at 03:55:49PM +0200, Jesper Juhl wrote:
> On 8/9/05, Matti Aarnio <[EMAIL PROTECTED]> wrote:
> > Running very recent Fedora Core Development kernel I can following
> > soft-oops..   ( 2.6.12-1.1455_FC5smp )
> > 
> Various patches to the e100 driver have been merged since 2.6.12.1
> (which is ~1.5months old), so it would make sense to try a more recent
> kernel like 2.6.13-rc6, 2.6.13-rc6-git1 or 2.6.13-rc5-mm1 and see if
> you can still reproduce the problem with those.

The kernel in question is a Red Hat Fedora Core Development kernel that is
less than 3 days old, with this baseline:
  * Sun Aug 07 2005 Dave Jones <[EMAIL PROTECTED]>
- 2.6.13-rc5-git4

Those merges have not helped.

> -- 
> Jesper Juhl <[EMAIL PROTECTED]>

  /Matti Aarnio


Re: Soft lockup in e100 driver ?

2005-08-09 Thread Jesper Juhl
On 8/9/05, Matti Aarnio <[EMAIL PROTECTED]> wrote:
> Running a very recent Fedora Core Development kernel I can get the following
> soft-oops..   ( 2.6.12-1.1455_FC5smp )
> 
Various patches to the e100 driver have been merged since 2.6.12.1
(which is ~1.5months old), so it would make sense to try a more recent
kernel like 2.6.13-rc6, 2.6.13-rc6-git1 or 2.6.13-rc5-mm1 and see if
you can still reproduce the problem with those.

-- 
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html