Scheduling in interrupt BUG.

2001-05-14 Thread Marcell GAL

Hi Guys,

Once upon a time on my
x86 UP box, UP kernel 2.4.4 (64M RAM, 260M swap),
http://home.sch.bme.hu/~cell/.config
I hit a reproducible "Scheduling in interrupt" BUG.
I also reproduced it with 128M RAM and low memory pressure
(at first I suspected it was related to swapping).
It happens when running lots of pppd version 2.4.0 (pppoe) sessions
started at almost the same time.
(Before the crash the pppoe sessions work fine.)
It crashed after 89 sessions one time, 473 another time (depending
on the phase of Jupiter's moons I guess .. I still have to verify this),
usually long before memory is exhausted (30k mem/pppd process).
To reproduce this you have to patch ppp_generic.c with
http://x-dsl.hu/~cell/ppp_generic_hash/, because
otherwise the 'NULL ptr in all_ppp_units list'
BUG is _much more likely_ to hit than this 'sched.c line 709 thingy'..

EIP: c010faa4 <= sched.c schedule(), line 709,
which is ~ printk("Scheduling in interrupt"); BUG();

Trace:

0xc01ddac5 <__lock_sock+53>:    movl   $0x0,0x1c(%esp,1)
0xc01ddacd <__lock_sock+61>:    mov    %ebx,0x20(%esp,1)
0xc01ddad1 <__lock_sock+65>:    movl   $0x0,0x24(%esp,1)
0xc01ddad9 <__lock_sock+73>:    movl   $0x0,0x28(%esp,1)
0xc01ddae1 <__lock_sock+81>:    lea    0x1c(%esp,1),%esi
0xc01ddae5 <__lock_sock+85>:    lea    0x34(%edi),%eax
0xc01ddae8 <__lock_sock+88>:    mov    %esi,%edx
0xc01ddaea <__lock_sock+90>:    call   0xc0110598
0xc01ddaef <__lock_sock+95>:    nop
0xc01ddaf0 <__lock_sock+96>:    movl   $0x2,(%ebx)
0xc01ddaf6 <__lock_sock+102>:   decl   0xc02f75ec
0xc01ddafc <__lock_sock+108>:   call   0xc010f71c
*
0xc01ddb01 <__lock_sock+113>:   incl   0xc02f75ec
0xc01ddb07 <__lock_sock+119>:   cmpl   $0x0,0x30(%edi)
0xc01ddb0b <__lock_sock+123>:   jne    0xc01ddaf0 <__lock_sock+96>

-
0xc01a315c :    push   %esi
0xc01a315d :    push   %ebx
0xc01a315e :    mov    0xc(%esp,1),%ebx
0xc01a3162 :    incl   0xc02f75ec
0xc01a3168 :    cmpl   $0x0,0x30(%ebx)
0xc01a316c :    je     0xc01a3177
0xc01a316e :    push   %ebx
0xc01a316f :    call   0xc01dda90 <__lock_sock>
0xc01a3174 :    add    $0x4,%esp
0xc01a3177 :    movl   $0x1,0x30(%ebx)
0xc01a317e :    decl   0xc02f75ec
0xc01a3184 :    mov    0x10(%esp,1),%eax

0xc01ddb2c <__release_sock>:    push   %esi
0xc01ddb2d <__release_sock+1>:  push   %ebx
0xc01ddb2e <__release_sock+2>:  mov    0xc(%esp,1),%esi
0xc01ddb32 <__release_sock+6>:  mov    0xb8(%esi),%eax
0xc01ddb38 <__release_sock+12>: movl   $0x0,0xbc(%esi)
0xc01ddb42 <__release_sock+22>: movl   $0x0,0xb8(%esi)
0xc01ddb4c <__release_sock+32>: lea    0x0(%esi,1),%esi
0xc01ddb50 <__release_sock+36>: mov    (%eax),%ebx
0xc01ddb52 <__release_sock+38>: movl   $0x0,(%eax)
0xc01ddb58 <__release_sock+44>: push   %eax
0xc01ddb59 <__release_sock+45>: push   %esi
0xc01ddb5a <__release_sock+46>: mov    0x31c(%esi),%eax
0xc01ddb60 <__release_sock+52>: call   *%eax
0xc01ddb62 <__release_sock+54>: mov    %ebx,%eax
0xc01ddb64 <__release_sock+56>: add    $0x8,%esp
0xc01ddb67 <__release_sock+59>: test   %eax,%eax


int pppoe_backlog_rcv(struct sock *sk, struct sk_buff *skb)
{
    lock_sock(sk);
    pppoe_rcv_core(sk, skb);
    release_sock(sk);
    return 0;
}


What else should I check? How can we fix it?
PPPoE is used more and more frequently nowadays because of ADSL
services. I think this can affect its stability (guess in which direction
;-) even with one session (though probably not as badly as with many
sessions).
 
Have a nice week:
Cell

-- 
Alice? WTFIA?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Scheduling in interrupt BUG. [Patch]

2001-05-14 Thread Marcell Gal

Hi,

This patch solved the problem. Should be ready for inclusion in 2.4.
No more 'Scheduling in interrupt' under those conditions.
Thanx for the thoughts, solution and the amazing speed.
You guys are doing a really great job!

I hope we can get the earlier mentioned NULL ptr in the all_ppp_units list
straightened out soon. (I have a simple workaround, the mentioned hash, which
even improves speed, but a real fix would be more satisfying. The relevant
part of ppp_generic.c is so simple that it's really strange it is not
correct.. ).

thanx:
Cell

Michal Ostrowski wrote:

> Anybody care to comment on this?
> [EMAIL PROTECTED]

--- linuxold/drivers/net/pppoe.c   Mon May 14 22:06:44 2001
+++ linux/drivers/net/pppoe.c   Mon May 14 22:11:25 2001
@@ -4,9 +4,9 @@
  * PPPoX --- Generic PPP encapsulation socket family
  * PPPoE --- PPP over Ethernet (RFC 2516)
  *
  *
- * Version:0.6.5
+ * Version:0.6.6
  *
  * 030700 : Fixed connect logic to allow for disconnect.
  * 270700 :Fixed potential SMP problems; we must protect against
  * simultaneous invocation of ppp_input
@@ -18,8 +18,9 @@
  * in pppoe_release.
  * 051000 :Initialization cleanup.
  * 00 :Fix recvmsg.
  * 050101 :Fix PADT procesing.
+ * 140501 :pppoe_backlog_rcv must call bh_lock_sock, not lock_sock.
  *
  * Author: Michal Ostrowski <[EMAIL PROTECTED]>
  * Contributors:
  * Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
@@ -383,11 +384,11 @@
  *
  ***/
 int pppoe_backlog_rcv(struct sock *sk, struct sk_buff *skb)
 {
-   lock_sock(sk);
+   bh_lock_sock(sk);
    pppoe_rcv_core(sk, skb);
-   release_sock(sk);
+   bh_unlock_sock(sk);
    return 0;
 }
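
The changelog line in the patch ("pppoe_backlog_rcv must call bh_lock_sock,
not lock_sock") can be illustrated with a small user-space model. This is only
a sketch under assumptions, not kernel code: `bh_count` stands in for the
counter at 0xc02f75ec that the trace increments and decrements around the call
into the scheduler, `owned` stands in for the flag tested at offset 0x30, and
all `model_*` names are hypothetical. It also assumes that the backlog handler
runs while the caller already owns the socket with the counter raised, which
the trace suggests but does not prove.

```c
#include <stddef.h>

enum outcome { TOOK_LOCK, WOULD_SLEEP, SCHED_IN_INTERRUPT };

/* Hypothetical stand-in for the counter at 0xc02f75ec in the trace. */
static int bh_count;

struct model_sock {
    int owned;              /* stand-in for the flag tested at 0x30(%ebx) */
};

/* Models the sanity check: scheduling with the counter raised is a bug. */
static enum outcome model_schedule(void)
{
    return bh_count != 0 ? SCHED_IN_INTERRUPT : WOULD_SLEEP;
}

/* Sleeping lock: if the socket is already owned, drop the counter once
 * and call the scheduler, as the __lock_sock disassembly does. */
enum outcome model_lock_sock(struct model_sock *sk)
{
    enum outcome result = TOOK_LOCK;

    bh_count++;
    if (sk->owned) {
        bh_count--;
        result = model_schedule();
        bh_count++;
    } else {
        sk->owned = 1;
    }
    bh_count--;
    return result;
}

/* Spin-type lock: never calls the scheduler. */
enum outcome model_bh_lock_sock(struct model_sock *sk)
{
    (void)sk;
    return TOOK_LOCK;
}

/* ASSUMPTION: the backlog handler is entered with the socket owned
 * and the counter already at 1. */
enum outcome run_backlog(enum outcome (*lock_fn)(struct model_sock *))
{
    struct model_sock sk = { .owned = 1 };
    enum outcome result;

    bh_count = 1;
    result = lock_fn(&sk);
    bh_count = 0;
    return result;
}
```

In this model run_backlog(model_lock_sock) ends in SCHED_IN_INTERRUPT, because
the nested lock only lowers the counter from 2 to 1 before scheduling, while
run_backlog(model_bh_lock_sock) simply takes the lock.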


--
You'll never see all the places, or read all the books, but fortunately,
they're not all recommended.






2.4.3 oopses at lots of ppp sessions

2001-04-26 Thread Marcell GAL

Hi Guys,

2.4.3 (UP kernel UP machine, http://home.sch.bme.hu/~cell/.config) 
oopses when I start lots of pppd eth0 simultaneously.
(I guess the problem is not pppoe specific, but I do not know exactly)

The last pppd sighs: PPP: couldn't register device (-17)
Note that these are 2 oopses, not just 1...

:51 lima pppd[2093]: pppd 2.4.0 started by root, uid 0
:51 lima pppd[2093]: Sending PADI
:51 lima pppd[2093]: HOST_UNIQ successful match 
:51 lima pppd[2093]: Tag error: TAG_SYS_ERR
:51 lima pppd[2093]: Failed to negotiate PPPoE connection: 25
Inappropriate ioctl for device
:51 lima pppd[2093]: Exit.
:51 lima kernel: EIP:0010:[ppp_create_interface+400/452]
:51 lima kernel: EFLAGS: 00010286
:51 lima kernel: eax: c3ab77d8   ebx: c17d4e00   ecx:    edx:
c3ab77d8
:51 lima kernel: esi: 0033   edi: c3ab77a0   ebp:    esp:
c3b23f28
:51 lima kernel: ds: 0018   es: 0018   ss: 0018
:51 lima kernel: Process pppd (pid: 2031, stackpage=c3b23000)
:51 lima kernel: Stack: fff2 08076f48 c2cc3d80 ffe7 c3ab77a0
 c029de64 c019e775 
:51 lima kernel: c3b23f5c 08076f48 08076f48 c004743e
fff2 c019e13c  
:51 lima kernel:c2cc3d80 c004743e 08076f48 c2cc3d80 08076f48
c004743e ffe7 fff2 
:51 lima kernel: Call Trace: [ppp_unattached_ioctl+97/320]
[ppp_ioctl+48/1544] [sys_ioctl+363/388] [system_call+51/56] 
:51 lima kernel: 
:51 lima kernel: Code: 89 41 04 8b 5c 24 10 89 4b 38 89 50 04 89 02 8b
44 24 24 8b 
:51 lima kernel: PPP: couldn't register device (-17)
:51 lima kernel: Unable to handle kernel NULL pointer dereference at
virtual address 0008
:51 lima kernel:  printing eip:
:51 lima kernel: c01a014f
:51 lima kernel: *pde = 
:51 lima kernel: Oops: 
:51 lima kernel: CPU:0
:51 lima kernel: EIP:0010:[ppp_create_interface+79/452]
:51 lima kernel: EFLAGS: 00010286
:51 lima kernel: eax: 0033   ebx: ffc8   ecx: 0032   edx:
0032
:51 lima kernel: esi:    edi: c321ec00   ebp: ffe7   esp:
c17bff28
:51 lima kernel: ds: 0018   es: 0018   ss: 0018
:51 lima kernel: Process pppd (pid: 2035, stackpage=c17bf000)
:51 lima kernel: Stack: fff2 08076f48 c321ec00 ffe7 ffc8
ffef  c019e775 
:51 lima kernel: c17bff5c 08076f48 08076f48 c004743e
fff2 c019e13c  
:51 lima kernel:c321ec00 c004743e 08076f48 c321ec00 08076f48
c004743e ffe7 fff2 
:51 lima kernel: Call Trace: [ppp_unattached_ioctl+97/320]
[ppp_ioctl+48/1544] [sys_ioctl+363/388] [system_call+51/56] 
:51 lima kernel: 
:51 lima kernel: Code: 8b 53 40 39 c2 7f 0f eb ca 8b 7c 24 10 8b 47 40
89 c2 39 d6 



0xc01a0100 :   sub    $0xc,%esp
0xc01a0103 :   push   %ebp
0xc01a0104 :   push   %edi
0xc01a0105 :   push   %esi
0xc01a0106 :   push   %ebx
0xc01a0107 :   mov    0x20(%esp,1),%esi
0xc01a010b :   mov    $0x,%ecx
0xc01a0110 :   movl   $0xffef,0x14(%esp,1)
0xc01a0118 :   movl   $0xc029de64,0x18(%esp,1)
0xc01a0120 :   jmp    0xc01a012c
0xc01a0122 :   cmp    %edx,%esi
0xc01a0124 :   je     0xc01a029f
0xc01a012a :   mov    %edx,%ecx
0xc01a012c :   mov    0x18(%esp,1),%eax
0xc01a0130 :   mov    (%eax),%eax
0xc01a0132 :   mov    %eax,0x18(%esp,1)
0xc01a0136 :   cmp    $0xc029de64,%eax
0xc01a013b :   je     0xc01a0165
0xc01a013d :   add    $0xffc8,%eax
0xc01a0140 :   mov    %eax,0x10(%esp,1)
0xc01a0144 :   test   %esi,%esi
0xc01a0146 :   jge    0xc01a0158
0xc01a0148 :   mov    0x10(%esp,1),%ebx
0xc01a014c :   lea    0x1(%ecx),%eax
0xc01a014f :   mov    0x40(%ebx),%edx
^^ NULL pointer dereference HERE
0xc01a0152 :   cmp    %eax,%edx
0xc01a0154 :   jg     0xc01a0165
0xc01a0156 :   jmp    0xc01a0122


0xc01a0265 :   call   0xc0111d7c
0xc01a026a :   push   %ebx
0xc01a026b :   call   0xc0124330
0xc01a0270 :   push   %edi
0xc01a0271 :   call   0xc0124330
0xc01a0276 :   add    $0x10,%esp
0xc01a0279 :   jmp    0xc01a029f
0xc01a027b :   nop
0xc01a027c :   lea    0x0(%esi,1),%esi
0xc01a0280 :   mov    0x18(%esp,1),%ecx
0xc01a0284 :   mov    0x4(%ecx),%edx
0xc01a0287 :   mov    0x10(%esp,1),%eax
0xc01a028b :   add    $0x38,%eax
0xc01a028e :   mov    (%edx),%ecx
0xc01a0290 :   mov    %eax,0x4(%ecx)
^^ NULL pointer dereference HERE
0xc01a0293 :   mov    0x10(%esp,1),%ebx
0xc01a0297 :   mov    %ecx,0x38(%ebx)

---
ppp_create_interface(int unit, int *retp)
{
    struct ppp *ppp;
    struct net_device *dev;
    struct list_head *list;
    int last_unit = -1;
    int ret = -EEXIST;
    int i;

    spin_lock(&all_ppp_lock);
    list = &all_ppp_units;
    while ((list = list->next) != &all_ppp_units) {
        ppp = list_entry(list, struct ppp, file.list);
        /* _MAYBE_ the ppp->file.index load below is ppp_create_interface+79 ?? */
        if ((unit < 0 && ppp->file.index > last_unit + 1)
            || (unit >= 0 && unit < ppp->file.index))
            break;
        if (unit == ppp->file.index)
...
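
As a rough cross-check of the guess about ppp_create_interface+79, the pointer
arithmetic can be worked through in a few lines of C. The two offsets are read
off the disassembly above (0x38 is added and stored through around +400, and
the faulting load at +79 uses 0x40); treating them as the offsets of file.list
and file.index inside struct ppp is an assumption, and the function names are
hypothetical. The sketch only computes addresses and never dereferences them.

```c
#include <stdint.h>

/* ASSUMPTION: offsets taken from the disassembly, not from the headers. */
#define LIST_OFFSET  0x38u   /* assumed offset of file.list in struct ppp  */
#define INDEX_OFFSET 0x40u   /* assumed offset of file.index in struct ppp */

/* list_entry()-style step: member pointer minus member offset. */
uintptr_t entry_base(uintptr_t list_ptr)
{
    return list_ptr - LIST_OFFSET;
}

/* Address that a load of the index field would touch. */
uintptr_t index_load_address(uintptr_t list_ptr)
{
    return entry_base(list_ptr) + INDEX_OFFSET;
}
```

With these offsets a NULL list pointer gives index_load_address(0) == 0x8,
which is at least consistent with the low digits of the faulting address
printed in the second oops.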


Re: Should VLANs be devices or something else?

2001-06-19 Thread Marcell Gal

Hi,

Ben Greear wrote:

> >  > > Should VLANs be devices or some other thing?
> I found it to be the easiest way to implement things.  It allowed
> me to not have to touch any of layer 3, and I did not have to patch
> any user-space program like ip or ifconfig.

I faced the same issue when implementing RFC 2684 (formerly RFC 1483)
Ethernet over ATM-AAL5. Since users want to do the same things
(ifconfig, tcpdump, RFC 2516 pppoe, dhcp, ipx) as with a traditional eth0,
using register_netdev was 'the right thing'.
However, the possibility of having many devices annoyed
some people (up to approx. 4095 per ATM VC in the case of vlan over rfc2684
over atm ;-).

My answer to the (old) 'long ifconfig listing' argument:
Users do not have more interfaces in the ifconfig listing than those they
create for themselves.
That's ok, exactly what they want. Those who do not like many interfaces
do not create many.
The real thrill would be maintaining new (or patched) tools just because
we want to avoid having the _possibility_ of long listings at any cost...

I remember
/proc/sys/net/ipv4/conf/
was broken for more than about 300 devices. I do not know how it is today.

> Adding the hashed lookup for devices took the exponential curve out of
> ip and ifconfig's performance, btw.

n^2 for creating n devices (in the unfortunate increasing or random
order), not 2^n, I guess.
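
To make the n^2 estimate concrete, here is a minimal sketch that counts how
many list nodes are walked when n units are inserted in increasing order into
a sorted list with a linear scan. It is a toy model with hypothetical names,
not the kernel's device list code; it only assumes that each creation scans
the existing entries from the head.

```c
#include <stdlib.h>

struct node {
    int unit;
    struct node *next;
};

/* Insert `unit` into an ascending list; return how many nodes were walked. */
static unsigned long insert_sorted(struct node **head, int unit)
{
    unsigned long visited = 0;
    struct node **link = head;
    struct node *n;

    while (*link != NULL && (*link)->unit < unit) {
        link = &(*link)->next;
        visited++;
    }
    n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->unit = unit;
    n->next = *link;
    *link = n;
    return visited;
}

/* Total nodes walked when creating units 0..n-1 in increasing order. */
unsigned long total_visits_increasing(int n)
{
    struct node *head = NULL;
    unsigned long total = 0;
    int unit;

    for (unit = 0; unit < n; unit++)
        total += insert_sorted(&head, unit);

    while (head != NULL) {          /* free the list again */
        struct node *next = head->next;
        free(head);
        head = next;
    }
    return total;
}
```

The k-th insertion walks k existing nodes, so the total is n(n-1)/2: quadratic
in n rather than exponential. A hashed lookup avoids the walk altogether.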

Cell

--
You'll never see all the places, or read all the books, but fortunately,
they're not all recommended.


