Re: Request for net guru help: waitqueue oops

2000-10-04 Thread Hans Grobler

On Wed, 4 Oct 2000, Petko Manolov wrote:
> > The timer routines (there are 4) are used to switch hardware states and
> > must therefore be mutually exclusive with respect to the interrupt handler.
> > There are no bottom halves used in this driver. Andrew Morton suggested
> > that the problem could be in my use of the skb pointers, which seems
> > a likely candidate. I'll check that.
> 
> It might be, but it might not. Be careful about locking and calling
> procedures which can sleep from interrupt context.
> 
> Sorry if I am not specific enough; I haven't seen the code ;-)

I have found another driver in the standard kernel that also causes this
oops and have posted to linux-net (as this appears to be networking 
related). 

Thanks
-- Hans.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Request for net guru help: waitqueue oops

2000-10-04 Thread Petko Manolov

Hans Grobler wrote:
> 
> Ok. I originally had them outside locks as they appeared to be atomic. I
> moved them in, in case they were the cause of the problem.


Don't bother about them - see include/linux/netdevice.h to be sure.

 
> The timer routines (there are 4) are used to switch hardware states and
> must therefore be mutually exclusive with respect to the interrupt handler.
> There are no bottom halves used in this driver. Andrew Morton suggested
> that the problem could be in my use of the skb pointers, which seems
> a likely candidate. I'll check that.


It might be, but it might not. Be careful about locking and calling
procedures which can sleep from interrupt context.

Sorry if I am not specific enough; I haven't seen the code ;-)


best,
Petkan



Re: Request for net guru help: waitqueue oops

2000-10-03 Thread Hans Grobler

On Tue, 3 Oct 2000, Petko Manolov wrote:
> None of these can sleep. netif_*_queue routines are quite simple.
> They are all atomic so there is no need to protect them with locks.

Ok. I originally had them outside locks as they appeared to be atomic. I
moved them in, in case they were the cause of the problem.

> It is not clear from the example above whether the timer routine needs
> the lock, or what the lock protects there. In any case, be careful
> about locking regions shared between interrupts/bottom halves and
> user context, as mistakes there are common.

The timer routines (there are 4) are used to switch hardware states and
must therefore be mutually exclusive with respect to the interrupt handler. 
There are no bottom halves used in this driver. Andrew Morton suggested
that the problem could be in my use of the skb pointers, which seems
a likely candidate. I'll check that.
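
The thread does not say what, if anything, was wrong with the skb handling. Purely as an illustration of the kind of pointer misuse being suspected, here is a userspace sketch of a reference-counted buffer that reports when it is released more often than it was acquired. The names struct buf_model, buf_init, buf_get and buf_put are hypothetical and are not the kernel's skb API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy buffer with a user count, loosely analogous to a
 * reference-counted skb. All names here are hypothetical. */
struct buf_model {
    int users;     /* outstanding references */
    int released;  /* set once, when the last reference is dropped */
};

/* A freshly created buffer is owned by exactly one user. */
static void buf_init(struct buf_model *b)
{
    assert(b != NULL);
    b->users = 1;
    b->released = 0;
}

/* Take an additional reference, e.g. before queueing the buffer. */
static void buf_get(struct buf_model *b)
{
    assert(b->users > 0);
    b->users++;
}

/* Drop one reference.
 * Returns 1 if this call released the buffer, 0 if other users remain,
 * and -1 if the buffer had already been released (the bug to catch). */
static int buf_put(struct buf_model *b)
{
    if (b->users <= 0)
        return -1;
    if (--b->users == 0) {
        b->released = 1;
        return 1;
    }
    return 0;
}
```

In a real driver an extra release like this would not return -1; it would free memory that something else still points at, which can surface later as an oops in unrelated code.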

Thanks
-- Hans




Re: Request for net guru help: waitqueue oops

2000-10-03 Thread Petko Manolov

Hans Grobler wrote:
> 
> On Tue, 3 Oct 2000, Petko Manolov wrote:
>
> > It seems you're trying to sleep without process context (most likely in
> > net_tx_action). It would be clearer if you sent that part of the
> > code.
> 
> Since I don't explicitly sleep anywhere, I'm not sure which code fragment
> would be useful... (net_tx_action is part of the networking layers). Which
> network functions can sleep (netif_rx, netif_stop_queue, netif_wake_queue,
> ...) ?


None of these can sleep. netif_*_queue routines are quite simple.
They are all atomic so there is no need to protect them with locks.
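
A userspace sketch of why such routines need no extra lock, assuming (as I recall for 2.4, but treat it as an assumption) that stop, wake and "is stopped" boil down to atomic operations on one flag bit. It uses C11 atomics; struct dev_model, STATE_XOFF and the q_* helpers are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define STATE_XOFF 0x1u   /* hypothetical bit: "transmit queue stopped" */

/* Stand-in for the few fields of a net device this sketch needs. */
struct dev_model {
    atomic_uint state;      /* flag word, only ever touched atomically */
    atomic_uint scheduled;  /* how many times a reschedule was requested */
};

static void dev_model_init(struct dev_model *d)
{
    assert(d != NULL);
    atomic_init(&d->state, 0u);
    atomic_init(&d->scheduled, 0u);
}

/* Stop the queue: a single atomic OR. */
static void q_stop(struct dev_model *d)
{
    atomic_fetch_or(&d->state, STATE_XOFF);
}

/* Report whether the queue is currently stopped. */
static int q_stopped(struct dev_model *d)
{
    return (atomic_load(&d->state) & STATE_XOFF) != 0;
}

/* Wake the queue: atomically clear the bit, and request a reschedule
 * only if this call was the one that cleared it. */
static void q_wake(struct dev_model *d)
{
    unsigned int old = atomic_fetch_and(&d->state, ~STATE_XOFF);

    if (old & STATE_XOFF)
        atomic_fetch_add(&d->scheduled, 1u);
}
```

Under this model a separate "is it stopped?" test before waking is redundant, because the wake operation performs the test and the clear in one atomic step.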

 
> After reading the softnet HOWTO, and some of the network drivers, I
> was unsure about the netif_stop_queue and netif_wake_queue functions. The
> howto indicated that these two should be protected from concurrent
> execution by a private lock. Not all the drivers seem to do this. In my
> case (although I'm running UP at the moment), I've used a driver global
> spinlock, for example:
> 
>   spinlock_t driver_lock = SPIN_LOCK_UNLOCKED;
>
>   int scc72_hard_xmit (struct sk_buff *skb, struct net_device *dev)
>   {
>     unsigned long flags;
>
>     /* ... */
>
>     spin_lock_irqsave (&driver_lock, flags);
>     netif_stop_queue (dev);
>     spin_unlock_irqrestore (&driver_lock, flags);
>
>     /* ... */
>   }
>
>   /* Example timer callback, to wake the queue */
>   void scc72_interframewait (unsigned long channel)
>   {
>     unsigned long flags;
>     struct scc72_channel *scc = (struct scc72_channel *) channel;
>
>     /* ... */
>
>     spin_lock_irqsave (&driver_lock, flags);
>
>     /* ... */
>
>     if (netif_queue_stopped (scc->dev))
>       netif_wake_queue (scc->dev);
>
>     spin_unlock_irqrestore (&driver_lock, flags);
>   }

 
It is not clear from the example above whether the timer routine needs
the lock, or what the lock protects there. In any case, be careful
about locking regions shared between interrupts/bottom halves and
user context, as mistakes there are common.




best,
Petkan



Re: Request for net guru help: waitqueue oops

2000-10-03 Thread Hans Grobler

Hi Petkan,

Thanks for your comment.

On Tue, 3 Oct 2000, Petko Manolov wrote:
> > A driver I'm working on seems to be doing/triggering something related
> > to waitqueues. This causes a perfectly reproducible oops (small mercies!).
> > Since the oops is not happening in my driver, I'm having a hard time
> > figuring out what's going wrong. I suspect a networking guru will take
> > one look and know what I'm doing wrong. Any suggestions please?
> 
> 
> It seems you're trying to sleep without process context (most likely in
> net_tx_action). It would be clearer if you sent that part of the
> code.

Since I don't explicitly sleep anywhere, I'm not sure which code fragment
would be useful... (net_tx_action is part of the networking layers). Which
network functions can sleep (netif_rx, netif_stop_queue, netif_wake_queue,
...) ?

After reading the softnet HOWTO, and some of the network drivers, I
was unsure about the netif_stop_queue and netif_wake_queue functions. The
howto indicated that these two should be protected from concurrent
execution by a private lock. Not all the drivers seem to do this. In my
case (although I'm running UP at the moment), I've used a driver global
spinlock, for example:

  spinlock_t driver_lock = SPIN_LOCK_UNLOCKED;

  int scc72_hard_xmit (struct sk_buff *skb, struct net_device *dev)
  {
    unsigned long flags;

    /* ... */

    spin_lock_irqsave (&driver_lock, flags);
    netif_stop_queue (dev);
    spin_unlock_irqrestore (&driver_lock, flags);

    /* ... */
  }

  /* Example timer callback, to wake the queue */
  void scc72_interframewait (unsigned long channel)
  {
    unsigned long flags;
    struct scc72_channel *scc = (struct scc72_channel *) channel;

    /* ... */

    spin_lock_irqsave (&driver_lock, flags);

    /* ... */

    if (netif_queue_stopped (scc->dev))
      netif_wake_queue (scc->dev);

    spin_unlock_irqrestore (&driver_lock, flags);
  }

I've just checked my driver, and below is the list of all the external
functions called. Any idea which of these could be trying to sleep?

  dev_kfree_skb_any     (called from both hard IRQ and non IRQ context)
  dev_alloc_skb         (called from both hard IRQ and non IRQ context)
  del_timer             (called from both hard IRQ and non IRQ context)
  add_timer             (called from both hard IRQ and non IRQ context)
  netif_rx              (called from IRQ context)
  netif_start_queue     (called from non hard IRQ context, ex: dev_open)
  netif_stop_queue      (called from non hard IRQ context, ex: hard_start_xmit)
  netif_wake_queue      (called from non hard IRQ context, ex: timer callbacks)
  netif_queue_stopped   (called from non hard IRQ context, ex: timer callbacks)
  skb_queue_tail        (called from non hard IRQ context, ex: hard_start_xmit)
  skb_dequeue           (called from both hard IRQ and non IRQ context)
  skb_queue_head_init   (called from non hard IRQ context, ex: dev_open)

and the standard functions dev_init_buffers, register_netdevice, 
   copy_from_user, unregister_netdev, etc. called in the standard places.

skb_queue_tail, skb_dequeue and skb_queue_head_init are used to manage
an internal queue of outgoing skb's.

Thanks.
-- Hans
  






Re: Request for net guru help: waitqueue oops

2000-10-03 Thread Petko Manolov

Hans Grobler wrote:
> 
> Hi all,
> 
> A driver I'm working on seems to be doing/triggering something related
> to waitqueues. This causes a perfectly reproducible oops (small mercies!).
> Since the oops is not happening in my driver, I'm having a hard time
> figuring out what's going wrong. I suspect a networking guru will take
> one look and know what I'm doing wrong. Any suggestions please?


It seems you're trying to sleep without process context (most likely in
net_tx_action). It would be clearer if you sent that part of the
code.
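
To make the rule concrete, here is a small userspace model of "no sleeping without process context". It only tracks a nesting counter and flags any attempt to block while the counter is non-zero. The names irq_depth, irq_enter_model, irq_exit_model and may_block are hypothetical and do not correspond to kernel interfaces.

```c
#include <assert.h>
#include <stdio.h>

static int irq_depth;   /* > 0 while a simulated interrupt handler runs */
static int violations;  /* number of attempts to block in atomic context */

/* Mark entry to and exit from a simulated interrupt handler. */
static void irq_enter_model(void)
{
    irq_depth++;
}

static void irq_exit_model(void)
{
    assert(irq_depth > 0);
    irq_depth--;
}

/* Call at the top of any routine that may block.
 * Returns 1 if blocking is allowed here, 0 (and records a violation)
 * if the caller is inside a simulated interrupt handler. */
static int may_block(const char *what)
{
    if (irq_depth > 0) {
        fprintf(stderr, "%s would sleep in interrupt context\n", what);
        violations++;
        return 0;
    }
    return 1;
}
```

A driver audit along these lines amounts to walking every call made from the interrupt handler and timer callbacks and asking whether the callee could ever reach a blocking point.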


 
best,
Petkan



Request for net guru help: waitqueue oops

2000-10-03 Thread Hans Grobler

Hi all,

A driver I'm working on seems to be doing/triggering something related
to waitqueues. This causes a perfectly reproducible oops (small mercies!).
Since the oops is not happening in my driver, I'm having a hard time
figuring out what's going wrong. I suspect a networking guru will take
one look and know what I'm doing wrong. Any suggestions please?

Initially, I was getting the first oops below. After browsing the waitqueue
code, I found and enabled the WAITQUEUE_DEBUG define. Now I'm getting the
second oops. The values 8729, 8731 in eax ebx ecx (first oops) and in the
magic & creator field (second oops) look very weird... something
incrementing... 
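
For readers unfamiliar with this kind of debug check, here is a simplified userspace model of it. It assumes the check compares a field against its own address, which the "should be c2dfbbd4" value in the second oops suggests. The struct layout and the names wq_head_model, wq_init and wq_check are illustrative only, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for a wait queue head with debug fields. */
struct wq_head_model {
    int       lock;     /* placeholder for the real spinlock */
    uintptr_t magic;    /* expected to hold this field's own address */
    uintptr_t creator;  /* identifies whoever initialised the head */
};

static void wq_init(struct wq_head_model *q, uintptr_t creator)
{
    assert(q != NULL);
    q->lock = 0;
    q->magic = (uintptr_t)&q->magic;
    q->creator = creator;
}

/* Returns 1 if the head looks intact, 0 after printing a report. */
static int wq_check(const struct wq_head_model *q)
{
    if (q->magic == (uintptr_t)&q->magic)
        return 1;
    fprintf(stderr, "bad magic %lx (should be %lx, creator %lx)\n",
            (unsigned long)q->magic,
            (unsigned long)(uintptr_t)&q->magic,
            (unsigned long)q->creator);
    return 0;
}
```

With a self-referential magic like this, small consecutive values in both the magic and creator fields would mean the memory was never a wait queue head, or has since been overwritten with unrelated data.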

In my driver I have all pointers protected by magic numbers. These are
validated before every use (will do a BUG() on invalid pointer).
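
A minimal userspace sketch of the magic-number guard described above. The struct chan type, the CHAN_MAGIC value and the chan_* helpers are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CHAN_MAGIC 0x5CC72A11u  /* hypothetical value, not from the driver */

struct chan {
    uint32_t magic;  /* equals CHAN_MAGIC while the object is live */
    int      state;
};

/* Allocate a channel and stamp it as valid. Returns NULL on failure. */
static struct chan *chan_alloc(void)
{
    struct chan *c = calloc(1, sizeof(*c));

    if (c)
        c->magic = CHAN_MAGIC;
    return c;
}

/* Return 1 if the pointer looks like a live channel, 0 otherwise. */
static int chan_valid(const struct chan *c)
{
    return c != NULL && c->magic == CHAN_MAGIC;
}

/* Clear the stamp before freeing, as a best-effort marker that the
 * object is no longer live. */
static void chan_free(struct chan *c)
{
    assert(chan_valid(c));
    c->magic = 0;
    free(c);
}
```

A guard like this only vouches for the driver's own objects. It cannot notice damage to structures owned by other layers, which may be why the checks stay silent while the oops appears elsewhere.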

TIA
-- Hans.

---[ OOPS1 ]--

ksymoops 2.3.4 on i686 2.4.0-test9.  Options used
 -v /usr/src/linux/vmlinux (specified)
 -k ./ksyms (specified)
 -l ./modules (specified)
 -o /lib/modules/2.4.0-test9 (specified)
 -m /usr/src/linux/System.map (specified)

Unable to handle kernel paging request at virtual address 00008731
c0113a70
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c0113a70>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010003
eax: 00008729   ebx: 00008731   ecx: 00008731   edx: 00000021
esi: 00000000   edi: 0000000d   ebp: c0231f40   esp: c0231f1c
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c0231000)
Stack: c3fc59a0 c3fa8800 0000000d 00000110 00008731 c17aec6c 00000246 00000001
       00000021 c0231fa4 c01a5155 c3fc59a0 c01a4a53 c3fc59a0 00000000 c01a55d0
       c3fa8800 0000000d c010a00d c01a7129 c3fa8800 00000001 c0269c08 0000000d
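Call Trace: [<c01a5155>] [<c01a4a53>] [<c01a55d0>] [<c010a00d>] [<c01a7129>]
[<c01192ee>] [<c010a1a8>]
       [<c0107160>] [<c0107160>] [<c0108df0>] [<c0107160>] [<c0107160>] [<c0100018>]
[<c0107183>] [<c01071e4>]
       [<c0105000>] [<c0100192>]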
Code: 8b 1b 89 5d ec 8b 48 04 8b 11 89 d0 24 df 85 45 fc 0f 84 79 

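>>EIP; c0113a70 <__wake_up+50/144>   <=====
Trace; c01a5155 <sock_def_write_space+2d/74>
Trace; c01a4a53 <sock_wfree+17/30>
Trace; c01a55d0 <__kfree_skb+7c/11c>
Trace; c010a00d <handle_IRQ_event+31/5c>
Trace; c01a7129 <net_tx_action+45/a0>
Trace; c01192ee <do_softirq+4e/74>
Trace; c010a1a8 <do_IRQ+9c/ac>
Trace; c0107160 <default_idle+0/28>
Trace; c0107160 <default_idle+0/28>
Trace; c0108df0 <ret_from_intr+0/20>
Trace; c0107160 <default_idle+0/28>
Trace; c0107160 <default_idle+0/28>
Trace; c0100018 <startup_32+18/13a>
Trace; c0107183 <default_idle+23/28>
Trace; c01071e4 <cpu_idle+3c/50>
Trace; c0105000 <empty_bad_page+0/1000>
Trace; c0100192 <L6+0/2>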
Code;  c0113a70 <__wake_up+50/144>
00000000 <_EIP>:
Code;  c0113a70 <__wake_up+50/144>   <=====
   0:   8b 1b                     mov    (%ebx),%ebx   <=====
Code;  c0113a72 <__wake_up+52/144>
   2:   89 5d ec                  mov    %ebx,0xffffffec(%ebp)
Code;  c0113a75 <__wake_up+55/144>
   5:   8b 48 04                  mov    0x4(%eax),%ecx
Code;  c0113a78 <__wake_up+58/144>
   8:   8b 11                     mov    (%ecx),%edx
Code;  c0113a7a <__wake_up+5a/144>
   a:   89 d0                     mov    %edx,%eax
Code;  c0113a7c <__wake_up+5c/144>
   c:   24 df                     and    $0xdf,%al
Code;  c0113a7e <__wake_up+5e/144>
   e:   85 45 fc                  test   %eax,0xfffffffc(%ebp)
Code;  c0113a81 <__wake_up+61/144>
  11:   0f 84 79 00 00 00         je     90 <_EIP+0x90> c0113b00 <__wake_up+e0/144>

Aiee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!

---[ OOPS2 ]--

ksymoops 2.3.4 on i686 2.4.0-test9.  Options used
 -v /usr/src/linux/vmlinux (specified)
 -k ./ksyms (specified)
 -l ./modules (specified)
 -o /lib/modules/2.4.0-test9 (specified)
 -m /usr/src/linux/System.map (specified)

bad magic 8722 (should be c2dfbbd4, creator 8723), wq bug, forcing oops.
kernel BUG at /usr/src/linux/include/linux/wait.h:155!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c01b3715>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010296
eax: 00000037   ebx: c2dfbbc8   ecx: c0240b48   edx: 00000000
esi: c3bbe060   edi: 0000000d   ebp: c0253fa4   esp: c0253f34
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c0253000)
Stack: c02291e4 c02291c0 0000009b c3bbe060 c3f87260 c01b2ea7 c3bbe060 00000000
       c01b3bc0 c3f87260 0000000d 00000000 c01b582a c3f87260 00000001 c028bc08
       0000000d c0253fa4 c011b1ae c028bc08 000000a0 c02839a0 00000005 c010a4a5
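Call Trace: [<c02291e4>] [<c02291c0>] [<c01b2ea7>] [<c01b3bc0>] [<c01b582a>]
[<c011b1ae>] [<c010a4a5>]
       [<c0107160>] [<c0107160>] [<c010902c>] [<c0107160>] [<c0107160>] [<c0100018>]
[<c0107183>] [<c01071e4>]
       [<c0105000>] [<c0100192>]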
Code: 0f 0b 83 c4 0c 8d b6 00 00 00 00 8d 43 04 39 43 04 74 0d 8b 

>>EIP; c01b3715 <sock_def_write_space+5d/c4>   <=====
Trace; c02291e4 <RCSid+6ee4/9360>
Trace; c02291c0 <RCSid+6ec0/9360>
Trace; c01b2ea7 
Trace; c01b3bc0 <__kfree_skb+7c/11c>
Trace; c01b582a 
Trace; c011b1ae 
Trace; c010a4a5 
Trace; c0107160 
Trace; c0107160 
Trace; c010902c 
Trace; c0107160 
Trace; c0107160 
Trace; c0100018 
Trace; c0107183 
Trace; c01071e4 
Trace; c0105000 
Trace; c0100192 
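Code;  c01b3715 <sock_def_write_space+5d/c4>
00000000 <_EIP>:
Code;  c01b3715 <sock_def_write_space+5d/c4>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c01b3717 <sock_def_write_space+5f/c4>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c01b371a <sock_def_write_space+62/c4>
   5:   8d b6 00 00 00 00         lea    0x0(%esi),%esi
Code;  c01b3720 <sock_def_write_space+68/c4>
   b:   8d 43 04                  lea    0x4(%ebx),%eax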
