Re: [Bug 8473] New: Oops: 0010 [1] SMP

2007-06-08 Thread Andrew Morton
On Fri, 08 Jun 2007 14:20:48 -0500
Paul Fulghum <[EMAIL PROTECTED]> wrote:

> On Fri, 2007-06-08 at 10:16 -0500, Paul Fulghum wrote:
> > On Fri, 2007-06-08 at 05:06 +0200, Björn Steinbrink wrote:
> > > This is do_tty_hangup() exchanging the fops while we're waiting for the
> > > lock. The new fops (hung_up_tty_fops) only have the unlocked variant and
> > > thus we Oops our way.
> ...
> > Here is a patch that restores the locked ioctl for hung_up_tty_ioctl.
> > Can you try it and see if it removes your oops?
> 
> Unfortunately I can't get the timing right to trigger this,
> but it is very clear the locked ioctl fop must not be
> allowed to disappear like my original patch allows.
> 
> Andrew:
> 
> Would you prefer I resend the entire compat ioctl patch or
> submit an incremental patch like in my message I'm quoting above?
> 

The compat_ioctl patch is in mainline, and has been for some time.

Hence a patch against mainline would be appropriate, thanks.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 8473] New: Oops: 0010 [1] SMP

2007-06-08 Thread Paul Fulghum
On Fri, 2007-06-08 at 10:16 -0500, Paul Fulghum wrote:
> On Fri, 2007-06-08 at 05:06 +0200, Björn Steinbrink wrote:
> > This is do_tty_hangup() exchanging the fops while we're waiting for the
> > lock. The new fops (hung_up_tty_fops) only have the unlocked variant and
> > thus we Oops our way.
...
> Here is a patch that restores the locked ioctl for hung_up_tty_ioctl.
> Can you try it and see if it removes your oops?

Unfortunately I can't get the timing right to trigger this,
but it is very clear the locked ioctl fop must not be
allowed to disappear like my original patch allows.

Andrew:

Would you prefer I resend the entire compat ioctl patch or
submit an incremental patch like in my message I'm quoting above?

-- 
Paul Fulghum
Microgate Systems, Ltd



Re: [Bug 8473] New: Oops: 0010 [1] SMP

2007-06-08 Thread Paul Fulghum
On Fri, 2007-06-08 at 05:06 +0200, Björn Steinbrink wrote:
> This is do_tty_hangup() exchanging the fops while we're waiting for the
> lock. The new fops (hung_up_tty_fops) only have the unlocked variant and
> thus we Oops our way.
> 
> The following program reproduces it quite easily on a SMP box. I'm
> running it from X as root like this:
> while true; do xterm /path/to/program; done

I am unable to reproduce the oops on either i386 SMP or x86_64 SMP
using your test program. This is against plain 2.6.21 with only
my compat ioctl patch applied.

Here is a patch that restores the locked ioctl for hung_up_tty_ioctl.
Can you try it and see if it removes your oops?

--- a/drivers/char/tty_io.c 2007-06-08 10:07:39.0 -0500
+++ b/drivers/char/tty_io.c 2007-06-08 10:09:59.0 -0500
@@ -1150,8 +1150,14 @@ static unsigned int hung_up_tty_poll(str
 	return POLLIN | POLLOUT | POLLERR | POLLHUP | POLLRDNORM | POLLWRNORM;
 }
 
-static long hung_up_tty_ioctl(struct file * file,
-			      unsigned int cmd, unsigned long arg)
+static int hung_up_tty_ioctl(struct inode * inode, struct file * file,
+			     unsigned int cmd, unsigned long arg)
+{
+	return cmd == TIOCSPGRP ? -ENOTTY : -EIO;
+}
+
+static long hung_up_tty_compat_ioctl(struct file * file,
+				     unsigned int cmd, unsigned long arg)
 {
 	return cmd == TIOCSPGRP ? -ENOTTY : -EIO;
 }
@@ -1199,8 +1205,8 @@ static const struct file_operations hung
 	.read		= hung_up_tty_read,
 	.write		= hung_up_tty_write,
 	.poll		= hung_up_tty_poll,
-	.unlocked_ioctl	= hung_up_tty_ioctl,
-	.compat_ioctl	= hung_up_tty_ioctl,
+	.ioctl		= hung_up_tty_ioctl,
+	.compat_ioctl	= hung_up_tty_compat_ioctl,
 	.release	= tty_release,
 };
 




Re: [Bug 8473] New: Oops: 0010 [1] SMP

2007-06-08 Thread Paul Fulghum
On Thu, 2007-06-07 at 20:16 -0700, Andrew Morton wrote:
> On Fri, 8 Jun 2007 05:06:29 +0200 Björn Steinbrink <[EMAIL PROTECTED]> wrote:
> > This is do_tty_hangup() exchanging the fops while we're waiting for the
> > lock. The new fops (hung_up_tty_fops) only have the unlocked variant and
> > thus we Oops our way.
> 
> ah, yes, that explains it, thanks.  Culprit:
> 
> commit e10cc1df1d2014f68a4bdcf73f6dd122c4561f94
> Author: Paul Fulghum <[EMAIL PROTECTED]>
> Date:   Thu May 10 22:22:50 2007 -0700
> 
> tty: add compat_ioctl
> 
> Add compat_ioctl method for tty code to allow processing of 32 bit ioctl
> calls on 64 bit systems by tty core, tty drivers, and line disciplines.

OK, the change of hung_up_tty_ioctl() from locked to unlocked
is not necessary for this patch. On the surface it seemed a clever
way of not needing two different functions to do the same simple:

return cmd == TIOCSPGRP ? -ENOTTY : -EIO;

which does not need any locking for its own sake. But clearly
the hangup behavior requires the locked version.

I'll redo the patch with hung_up_tty_ioctl remaining locked.

That will separate the compat ioctl stuff from this issue.
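
As a rough user-space illustration of why keeping the locked handler matters: if the hung-up table still provides a locked `.ioctl` entry, a caller that already committed to the locked path before the hangup finds a valid handler once it gets the lock. This is only a sketch, not kernel code. `sketch_fops`, `sketch_hung_up_fops` and `sketch_locked_dispatch()` are hypothetical names, the handlers take only the command number, and -EFAULT is an arbitrary marker for the spot where a NULL function pointer would otherwise be called. The return expression is the one quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>	/* TIOCSPGRP, TIOCGPGRP */

/* Hypothetical, much reduced stand-in for struct file_operations. */
struct sketch_fops {
	long (*unlocked_ioctl)(unsigned int cmd);
	int (*ioctl)(unsigned int cmd);		/* locked variant */
	long (*compat_ioctl)(unsigned int cmd);
};

/* Locked handler for a hung-up tty, same result as the quoted one-liner. */
static int sketch_hung_up_ioctl(unsigned int cmd)
{
	return cmd == TIOCSPGRP ? -ENOTTY : -EIO;
}

/* Separate compat handler, because its signature differs. */
static long sketch_hung_up_compat_ioctl(unsigned int cmd)
{
	return cmd == TIOCSPGRP ? -ENOTTY : -EIO;
}

/* Hung-up table that keeps .ioctl populated and leaves .unlocked_ioctl NULL. */
const struct sketch_fops sketch_hung_up_fops = {
	.ioctl		= sketch_hung_up_ioctl,
	.compat_ioctl	= sketch_hung_up_compat_ioctl,
};

/*
 * Models the tail of the locked path: the caller saw a non-NULL .ioctl in
 * the old table, waited for the lock, and now calls through whatever table
 * is installed after the wait.
 */
long sketch_locked_dispatch(const struct sketch_fops *fops_after_wait,
			    unsigned int cmd)
{
	assert(fops_after_wait != NULL);
	if (fops_after_wait->ioctl == NULL)
		return -EFAULT;	/* marker: the real code would call NULL here */
	return fops_after_wait->ioctl(cmd);
}
```

With this table, `sketch_locked_dispatch(&sketch_hung_up_fops, TIOCSPGRP)` yields -ENOTTY and any other command yields -EIO, so the locked path never reaches the marker.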

-- 
Paul Fulghum
Microgate Systems, Ltd



Re: [Bug 8473] New: Oops: 0010 [1] SMP

2007-06-07 Thread Andrew Morton
On Fri, 8 Jun 2007 05:06:29 +0200 Björn Steinbrink <[EMAIL PROTECTED]> wrote:

> On 2007.05.26 21:10:15 +0200, Nicolas Mailhot wrote:
> > On Thursday 17 May 2007 at 18:59 +0200, Nicolas Mailhot wrote:
> > > On Thursday 17 May 2007 at 09:45 -0700, Randy Dunlap wrote:
> > > 
> > > > Can you boot with "kstack=32" so that we can see more of the stack?
> > > 
> > > I can try. It's not triggering quickly though
> > 
> > Seems I was completely wrong about the trigger, but anyway it happened
> > again, this time on 2.6.22-rc2.mm1.cfs14 (and I had kept kstack=32)
> > 
> >  BUG: using smp_processor_id() in preemptible [0001] code: bash/3857
> >  caller is oops_begin+0xb/0x6f
> >  
> >  Call Trace:
> >  [<8020ab4d>] show_trace+0x34/0x4f
> >  [<8020ab7a>] dump_stack+0x12/0x17
> >  [<8030d92d>] debug_smp_processor_id+0xad/0xbc
> >  [<8042388f>] oops_begin+0xb/0x6f
> >  [<8042520b>] do_page_fault+0x66a/0x7c0
> >  [<804234bd>] error_exit+0x0/0x84

hm that was dumb.  I'll stick a raw_smp_processor_id() in there.

> >  Unable to handle kernel NULL pointer dereference at  RIP: 
> >  [<>]
> >  PGD bdd2067 PUD c133067 PMD 0 
> >  Oops: 0010 [1] PREEMPT SMP 
> >  CPU 1 
> >  Pid: 3857, comm: bash Not tainted 2.6.22-0.8.rc2.mm1.cfs14.fc8.nim #1
> >  RIP: 0010:[<>]  [<>]
> >  RSP: 0018:81000cb03ee0  EFLAGS: 00010296
> >  RAX: 8044dbc0 RBX: 81000c3aa8c0 RCX: 7fff549dcae4
> >  RDX: 5410 RSI: 81000c3aa8c0 RDI: 81000ba913d8
> >  RBP: 7fff549dcae4 R08:  R09: 
> >  R10: 0008 R11: 0246 R12: 5410
> >  R13: 00ff R14: 00ff R15: 
> >  FS:  2b06560d8f40() GS:810004017180() knlGS:
> >  CS:  0010 DS:  ES:  CR0: 8005003b
> >  CR2:  CR3: 0bc55000 CR4: 06e0
> >  Process bash (pid: 3857, threadinfo 81000cb02000, task 81000adc59a0)
> >  Stack:  8028ada9 81000c3aa8c0 7fff549dcae4 7fff549dcae4
> >  8028b016 5410 00ff 81000c3aa8c0
> >   7fff549dcae4 5410 00ff
> >  8028b088  80209571 
> >  7fff549dce87 0f11 7fff549dcfb8 7fff549dddb0
> >  802095dc 0246 0008 
> >   ffda  7fff549dcae4
> >  5410 00ff 0010 003d340c9117
> >  Call Trace:
> >  Inexact backtrace:
> >  [<8028ada9>] do_ioctl+0x55/0x6b
> >  [<8028b016>] vfs_ioctl+0x257/0x270
> >  [<8028b088>] sys_ioctl+0x59/0x79
> >  [<802095dc>] tracesys+0xdc/0xe1
> >  
> >  INFO: lockdep is turned off.
> >  
> >  Code:  Bad RIP value.
> >  RIP  [<>]
> >  RSP <81000cb03ee0>
> >  CR2: 
> 
> This is do_tty_hangup() exchanging the fops while we're waiting for the
> lock. The new fops (hung_up_tty_fops) only have the unlocked variant and
> thus we Oops our way.

ah, yes, that explains it, thanks.  Culprit:

commit e10cc1df1d2014f68a4bdcf73f6dd122c4561f94
Author: Paul Fulghum <[EMAIL PROTECTED]>
Date:   Thu May 10 22:22:50 2007 -0700

tty: add compat_ioctl

Add compat_ioctl method for tty code to allow processing of 32 bit ioctl
calls on 64 bit systems by tty core, tty drivers, and line disciplines.

Based on patch by Arnd Bergmann:
http://www.uwsg.iu.edu/hypermail/linux/kernel/0511.0/1732.html

[EMAIL PROTECTED]: make things static]
Signed-off-by: Paul Fulghum <[EMAIL PROTECTED]>
Acked-by: Arnd Bergmann <[EMAIL PROTECTED]>
Cc: Alan Cox <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>


> The following program reproduces it quite easily on a SMP box. I'm
> running it from X as root like this:
> while true; do xterm /path/to/program; done
> 
> #include <pthread.h>
> #include <stdio.h>
> #include <unistd.h>
> 
> #include <sys/ioctl.h>
> 
> pid_t pid;
> 
> void *thread(void *arg)
> {
> 	while (1)
> 		ioctl(0, TIOCSPGRP, &pid);
> }
> 
> int main()
> {
> 	pthread_t t;
> 
> 	pid = getpid();
> 
> 	pthread_create(&t, NULL, thread, NULL);
> 	sleep(1);
> 	vhangup();
> 	perror("vhangup");
> 	return 0;
> }
> 
> I'm not exactly sure how to solve that in a clean way, though. Moving
> the call to lock_kernel() up would make the Oops go away, but could
> result in the wrong error code being returned. Checking for ioctl first
> and unlocked_ioctl last would cause useless locking. And retrying the
> unlocked ioctl doesn't look nice either :-(
> 
> Björn


Re: [Bug 8473] New: Oops: 0010 [1] SMP

2007-06-07 Thread Björn Steinbrink
On 2007.05.26 21:10:15 +0200, Nicolas Mailhot wrote:
> On Thursday 17 May 2007 at 18:59 +0200, Nicolas Mailhot wrote:
> > On Thursday 17 May 2007 at 09:45 -0700, Randy Dunlap wrote:
> > 
> > > Can you boot with "kstack=32" so that we can see more of the stack?
> > 
> > I can try. It's not triggering quickly though
> 
> Seems I was completely wrong about the trigger, but anyway it happened
> again, this time on 2.6.22-rc2.mm1.cfs14 (and I had kept kstack=32)
> 
>  BUG: using smp_processor_id() in preemptible [0001] code: bash/3857
>  caller is oops_begin+0xb/0x6f
>  
>  Call Trace:
>  [<8020ab4d>] show_trace+0x34/0x4f
>  [<8020ab7a>] dump_stack+0x12/0x17
>  [<8030d92d>] debug_smp_processor_id+0xad/0xbc
>  [<8042388f>] oops_begin+0xb/0x6f
>  [<8042520b>] do_page_fault+0x66a/0x7c0
>  [<804234bd>] error_exit+0x0/0x84
>  
>  Unable to handle kernel NULL pointer dereference at  RIP: 
>  [<>]
>  PGD bdd2067 PUD c133067 PMD 0 
>  Oops: 0010 [1] PREEMPT SMP 
>  CPU 1 
>  Pid: 3857, comm: bash Not tainted 2.6.22-0.8.rc2.mm1.cfs14.fc8.nim #1
>  RIP: 0010:[<>]  [<>]
>  RSP: 0018:81000cb03ee0  EFLAGS: 00010296
>  RAX: 8044dbc0 RBX: 81000c3aa8c0 RCX: 7fff549dcae4
>  RDX: 5410 RSI: 81000c3aa8c0 RDI: 81000ba913d8
>  RBP: 7fff549dcae4 R08:  R09: 
>  R10: 0008 R11: 0246 R12: 5410
>  R13: 00ff R14: 00ff R15: 
>  FS:  2b06560d8f40() GS:810004017180() knlGS:
>  CS:  0010 DS:  ES:  CR0: 8005003b
>  CR2:  CR3: 0bc55000 CR4: 06e0
>  Process bash (pid: 3857, threadinfo 81000cb02000, task 81000adc59a0)
>  Stack:  8028ada9 81000c3aa8c0 7fff549dcae4 7fff549dcae4
>  8028b016 5410 00ff 81000c3aa8c0
>   7fff549dcae4 5410 00ff
>  8028b088  80209571 
>  7fff549dce87 0f11 7fff549dcfb8 7fff549dddb0
>  802095dc 0246 0008 
>   ffda  7fff549dcae4
>  5410 00ff 0010 003d340c9117
>  Call Trace:
>  Inexact backtrace:
>  [<8028ada9>] do_ioctl+0x55/0x6b
>  [<8028b016>] vfs_ioctl+0x257/0x270
>  [<8028b088>] sys_ioctl+0x59/0x79
>  [<802095dc>] tracesys+0xdc/0xe1
>  
>  INFO: lockdep is turned off.
>  
>  Code:  Bad RIP value.
>  RIP  [<>]
>  RSP <81000cb03ee0>
>  CR2: 

This is do_tty_hangup() exchanging the fops while we're waiting for the
lock. The new fops (hung_up_tty_fops) only have the unlocked variant and
thus we Oops our way.

The following program reproduces it quite easily on a SMP box. I'm
running it from X as root like this:
while true; do xterm /path/to/program; done

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#include <sys/ioctl.h>

pid_t pid;

void *thread(void *arg)
{
	while (1)
		ioctl(0, TIOCSPGRP, &pid);
}

int main()
{
	pthread_t t;

	pid = getpid();

	pthread_create(&t, NULL, thread, NULL);
	sleep(1);
	vhangup();
	perror("vhangup");
	return 0;
}

I'm not exactly sure how to solve that in a clean way, though. Moving
the call to lock_kernel() up would make the Oops go away, but could
result in the wrong error code being returned. Checking for ioctl first
and unlocked_ioctl last would cause useless locking. And retrying the
unlocked ioctl doesn't look nice either :-(
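
For readers following along, the race described above (do_tty_hangup() swapping the fops while the caller waits for the lock) can be modelled deterministically in user space. This is only a rough sketch under stated assumptions, not the kernel's dispatch code. `demo_fops`, `demo_file`, `demo_open()`, `demo_hangup()`, `demo_do_ioctl()` and `DEMO_CMD` are hypothetical names, the wait for the lock is simulated by the `hangup_while_waiting` flag instead of lock_kernel(), and -EFAULT is an arbitrary marker for the point where the real code would jump through a NULL pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_CMD 0x5410u	/* arbitrary command number for the demo */

/* Hypothetical, much reduced stand-in for struct file_operations. */
struct demo_fops {
	long (*unlocked_ioctl)(unsigned int cmd);
	int (*ioctl)(unsigned int cmd);		/* locked variant */
};

struct demo_file {
	const struct demo_fops *f_op;
};

static int live_ioctl(unsigned int cmd)
{
	(void)cmd;
	return 0;
}

static long hung_up_unlocked_ioctl(unsigned int cmd)
{
	(void)cmd;
	return -EIO;
}

/* Before the hangup: only the locked handler exists. */
static const struct demo_fops live_fops = { .ioctl = live_ioctl };

/* After the hangup, as in the problematic change: only the unlocked one. */
static const struct demo_fops hung_up_fops = {
	.unlocked_ioctl = hung_up_unlocked_ioctl,
};

void demo_open(struct demo_file *filp)
{
	filp->f_op = &live_fops;
}

/* Stand-in for the hangup path: swaps the table under the caller. */
void demo_hangup(struct demo_file *filp)
{
	filp->f_op = &hung_up_fops;
}

long demo_do_ioctl(struct demo_file *filp, unsigned int cmd,
		   int hangup_while_waiting)
{
	assert(filp != NULL && filp->f_op != NULL);

	if (filp->f_op->unlocked_ioctl)
		return filp->f_op->unlocked_ioctl(cmd);

	if (filp->f_op->ioctl) {
		/* The caller would sleep on the lock here; the hangup runs. */
		if (hangup_while_waiting)
			demo_hangup(filp);

		/* f_op is read again after the wait, so .ioctl may be NULL. */
		if (filp->f_op->ioctl == NULL)
			return -EFAULT;	/* marker: real code would call NULL */
		return filp->f_op->ioctl(cmd);
	}
	return -ENOTTY;
}
```

Starting from `demo_open()`, a call with `hangup_while_waiting` set to 0 returns 0, a call with it set to 1 hits the marker, and every later call takes the unlocked path and returns -EIO.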

Björn


Re: [Bug 8473] New: Oops: 0010 [1] SMP

2007-05-26 Thread Nicolas Mailhot
On Thursday 17 May 2007 at 18:59 +0200, Nicolas Mailhot wrote:
> On Thursday 17 May 2007 at 09:45 -0700, Randy Dunlap wrote:
> 
> > Can you boot with "kstack=32" so that we can see more of the stack?
> 
> I can try. It's not triggering quickly though

Seems I was completely wrong about the trigger, but anyway it happened
again, this time on 2.6.22-rc2.mm1.cfs14 (and I had kept kstack=32)

 BUG: using smp_processor_id() in preemptible [0001] code: bash/3857
 caller is oops_begin+0xb/0x6f
 
 Call Trace:
 [<8020ab4d>] show_trace+0x34/0x4f
 [<8020ab7a>] dump_stack+0x12/0x17
 [<8030d92d>] debug_smp_processor_id+0xad/0xbc
 [<8042388f>] oops_begin+0xb/0x6f
 [<8042520b>] do_page_fault+0x66a/0x7c0
 [<804234bd>] error_exit+0x0/0x84
 
 Unable to handle kernel NULL pointer dereference at  RIP: 
 [<>]
 PGD bdd2067 PUD c133067 PMD 0 
 Oops: 0010 [1] PREEMPT SMP 
 CPU 1 
 Pid: 3857, comm: bash Not tainted 2.6.22-0.8.rc2.mm1.cfs14.fc8.nim #1
 RIP: 0010:[<>]  [<>]
 RSP: 0018:81000cb03ee0  EFLAGS: 00010296
 RAX: 8044dbc0 RBX: 81000c3aa8c0 RCX: 7fff549dcae4
 RDX: 5410 RSI: 81000c3aa8c0 RDI: 81000ba913d8
 RBP: 7fff549dcae4 R08:  R09: 
 R10: 0008 R11: 0246 R12: 5410
 R13: 00ff R14: 00ff R15: 
 FS:  2b06560d8f40() GS:810004017180() knlGS:
 CS:  0010 DS:  ES:  CR0: 8005003b
 CR2:  CR3: 0bc55000 CR4: 06e0
 Process bash (pid: 3857, threadinfo 81000cb02000, task 81000adc59a0)
 Stack:  8028ada9 81000c3aa8c0 7fff549dcae4 7fff549dcae4
 8028b016 5410 00ff 81000c3aa8c0
  7fff549dcae4 5410 00ff
 8028b088  80209571 
 7fff549dce87 0f11 7fff549dcfb8 7fff549dddb0
 802095dc 0246 0008 
  ffda  7fff549dcae4
 5410 00ff 0010 003d340c9117
 Call Trace:
 Inexact backtrace:
 [<8028ada9>] do_ioctl+0x55/0x6b
 [<8028b016>] vfs_ioctl+0x257/0x270
 [<8028b088>] sys_ioctl+0x59/0x79
 [<802095dc>] tracesys+0xdc/0xe1
 
 INFO: lockdep is turned off.
 
 Code:  Bad RIP value.
 RIP  [<>]
 RSP <81000cb03ee0>
 CR2: 

-- 
Nicolas Mailhot


Re: [Bug 8473] New: Oops: 0010 [1] SMP

2007-05-17 Thread Nicolas Mailhot
On Thursday 17 May 2007 at 09:45 -0700, Randy Dunlap wrote:
> On Thu, 17 May 2007 12:00:02 +0200 Nicolas Mailhot wrote:
> 
> > On Monday 14 May 2007 at 01:25 +0200, Nicolas Mailhot wrote:
> > 
> > > It happened once so far. The load was moderate (and certainly not
> > > comparable to what I did for Mel yesterday)
> > 
> > Make that twice. The interesting thing is it was preceded by CD/DVD
> > access just before, so something is rotten there.
> > 
> > 10:52:35 ISO 9660 Extensions: RRIP_1991A
> > 11:52:36 Unable to handle kernel NULL pointer dereference at  RIP: 
> > 11:52:36 [<>]
> > 11:52:36 PGD 2438a067 PUD c484067 PMD 0 
> > 11:52:36 Oops: 0010 [1] SMP 
> > 11:52:36 CPU 1 
> > 11:52:36 Pid: 30655, comm: bash Not tainted 2.6.21-11.mm2.fc7.nim #1
> 
> so just what is this kernel?  A hybrid of -mm and -fc7 or what?

-mm + md-improve-partition-detection-in-md-array.patch revert + Mel's
patches for bug #8464 + the fedora nouveau patch

the fc7.nim is there because it was built in an F7 buildroot, not
because it carries the full Fedora patchset

> Can you boot with "kstack=32" so that we can see more of the stack?

I can try. It's not triggering quickly though

-- 
Nicolas Mailhot


Re: [Bug 8473] New: Oops: 0010 [1] SMP

2007-05-17 Thread Randy Dunlap
On Thu, 17 May 2007 12:00:02 +0200 Nicolas Mailhot wrote:

> On Monday 14 May 2007 at 01:25 +0200, Nicolas Mailhot wrote:
> 
> > It happened once so far. The load was moderate (and certainly not
> > comparable to what I did for Mel yesterday)
> 
> Make that twice. The interesting thing is it was preceded by CD/DVD
> access just before, to something is rotten there.
> 
> 10:52:35 ISO 9660 Extensions: RRIP_1991A
> 11:52:36 Unable to handle kernel NULL pointer dereference at  
> RIP: 
> 11:52:36 [<>]
> 11:52:36 PGD 2438a067 PUD c484067 PMD 0 
> 11:52:36 Oops: 0010 [1] SMP 
> 11:52:36 CPU 1 
> 11:52:36 Pid: 30655, comm: bash Not tainted 2.6.21-11.mm2.fc7.nim #1

so just what is this kernel?  A hybrid of -mm and -fc7 or what?

> 11:52:36 RIP: 0010:[<>]  [<>]
> 11:52:36 RSP: :810006199ee0  EFLAGS: 00010296
> 11:52:36 RAX: 804426a0 RBX: 81000903f800 RCX: 7fff8a5e9874
> 11:52:36 RDX: 5410 RSI: 81000903f800 RDI: 810006a37b88
> 11:52:36 RBP: 7fff8a5e9874 R08:  R09: 0004
> 11:52:36 R10: 0008 R11: 0246 R12: 5410
> 11:52:36 R13: 00ff R14: 00ff R15: 0008
> 11:52:36 FS:  2b1c204ccf40() GS:810004017180() 
> knlGS:
> 11:52:36 CS:  0010 DS:  ES:  CR0: 8005003b
> 11:52:36 CR2:  CR3: 0f83b000 CR4: 06e0
> 11:52:36 Process bash (pid: 30655, threadinfo 810006198000, task 
> 810024319c80)
> 11:52:36 Stack:  80285451 81000903f800 7fff8a5e9874 
> 7fff8a5e9874
> 11:52:36 802856be 5410 00ff 81000903f800
> 11:52:36  7fff8a5e9874 5410 00ff
> 11:52:36 Call Trace:
> 11:52:36 [<80285451>] do_ioctl+0x55/0x6b
> 11:52:36 [<802856be>] vfs_ioctl+0x257/0x270
> 11:52:36 [<80285730>] sys_ioctl+0x59/0x79
> 11:52:36 [<8020955c>] tracesys+0xdc/0xe1
> 11:52:36 
> 11:52:36 INFO: lockdep is turned off.
> 11:52:36 
> 11:52:36 Code:  Bad RIP value.
> 11:52:36 RIP  [<>]
> 11:52:36 RSP 810006199ee0
> 11:52:36 CR2: 
> 
> I'd try rc2-mm1, but I don't know if the patches of Mel Gorman &
> Christoph Lameter for bug #8464 have been merged yet

Can you boot with "kstack=32" so that we can see more of the stack?


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [Bug 8473] New: Oops: 0010 [1] SMP

2007-05-17 Thread Nicolas Mailhot
On Monday 14 May 2007 at 01:25 +0200, Nicolas Mailhot wrote:

> It happened once so far. The load was moderate (and certainly not
> comparable to what I did for Mel yesterday)

Make that twice. The interesting thing is it was preceded by CD/DVD
access just before, so something is rotten there.

10:52:35 ISO 9660 Extensions: RRIP_1991A
11:52:36 Unable to handle kernel NULL pointer dereference at  
RIP: 
11:52:36 [<>]
11:52:36 PGD 2438a067 PUD c484067 PMD 0 
11:52:36 Oops: 0010 [1] SMP 
11:52:36 CPU 1 
11:52:36 Pid: 30655, comm: bash Not tainted 2.6.21-11.mm2.fc7.nim #1
11:52:36 RIP: 0010:[<>]  [<>]
11:52:36 RSP: :810006199ee0  EFLAGS: 00010296
11:52:36 RAX: 804426a0 RBX: 81000903f800 RCX: 7fff8a5e9874
11:52:36 RDX: 5410 RSI: 81000903f800 RDI: 810006a37b88
11:52:36 RBP: 7fff8a5e9874 R08:  R09: 0004
11:52:36 R10: 0008 R11: 0246 R12: 5410
11:52:36 R13: 00ff R14: 00ff R15: 0008
11:52:36 FS:  2b1c204ccf40() GS:810004017180() 
knlGS:
11:52:36 CS:  0010 DS:  ES:  CR0: 8005003b
11:52:36 CR2:  CR3: 0f83b000 CR4: 06e0
11:52:36 Process bash (pid: 30655, threadinfo 810006198000, task 
810024319c80)
11:52:36 Stack:  80285451 81000903f800 7fff8a5e9874 
7fff8a5e9874
11:52:36 802856be 5410 00ff 81000903f800
11:52:36  7fff8a5e9874 5410 00ff
11:52:36 Call Trace:
11:52:36 [<80285451>] do_ioctl+0x55/0x6b
11:52:36 [<802856be>] vfs_ioctl+0x257/0x270
11:52:36 [<80285730>] sys_ioctl+0x59/0x79
11:52:36 [<8020955c>] tracesys+0xdc/0xe1
11:52:36 
11:52:36 INFO: lockdep is turned off.
11:52:36 
11:52:36 Code:  Bad RIP value.
11:52:36 RIP  [<>]
11:52:36 RSP 810006199ee0
11:52:36 CR2: 

I'd try rc2-mm1, but I don't know if the patches of Mel Gorman &
Christoph Lameter for bug #8464 have been merged yet

Regards,

-- 
Nicolas Mailhot



Re: [Bug 8473] New: Oops: 0010 [1] SMP

2007-05-13 Thread Nicolas Mailhot
On Sunday 13 May 2007 at 15:47 -0700, Andrew Morton wrote:
> On Sun, 13 May 2007 14:02:50 -0700 [EMAIL PROTECTED] wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=8473
> 
> Please follow up via emailed reply-to-all.
> 
> In fact, please report -mm bugs via email.  bugzilla is more suited to
> longer-term problems, and -mm bugs are super-short-term, we hope.

Can't attach trace screenshots or long log dumps to mails :(

> May 13 22:59:43 rousalka kernel: Unable to handle kernel NULL pointer
> dereference at  RIP: 
> May 13 22:59:43 rousalka kernel: [<>]

> Anything you can do to make that wordwrapping go away for ever would be
> great, thanks.

The full kernel log, with no wrapping, is attached here:
http://bugzilla.kernel.org/attachment.cgi?id=11492

> I don't know what would have caused this.  do_ioctl() did a jump-to-zero,
> but it has code in there to explicitly test for null pointers.
> 
> Perhaps some weird race, although I find it hard to imagine how we could
> have such a race in any ioctl which bash is likely to be calling.
> 
> Is it repeatable at all?

It happened once so far. The load was moderate (and certainly not
comparable to what I did for Mel yesterday)

-- 
Nicolas Mailhot



Re: [Bug 8473] New: Oops: 0010 [1] SMP

2007-05-13 Thread Andrew Morton
On Sun, 13 May 2007 14:02:50 -0700 [EMAIL PROTECTED] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=8473

Please follow up via emailed reply-to-all.

In fact, please report -mm bugs via email.  bugzilla is more suited to
longer-term problems, and -mm bugs are super-short-term, we hope.


>Summary: Oops: 0010 [1] SMP
> Kernel Version: 2.6.21-mm2
> Status: NEW
>   Severity: high
>  Owner: [EMAIL PROTECTED]
>  Submitter: [EMAIL PROTECTED]
> 
> 
> Most recent kernel where this bug did *NOT* occur: 2.6.21-mm1 (though I didn't
> test it that long)
> Distribution: Fedora Devel
> Hardware Environment: AMD C2 on CK804
> Software Environment: Normal workload (building a small package)
> Problem Description:
> 
> Steps to reproduce:
> May 13 22:59:43 rousalka kernel: Unable to handle kernel NULL pointer
> dereference at  RIP: 
> May 13 22:59:43 rousalka kernel: [<>]
> May 13 22:59:43 rousalka kernel: PGD bf1a067 PUD f295067 PMD 0 
> May 13 22:59:43 rousalka kernel: Oops: 0010 [1] SMP 
> May 13 22:59:43 rousalka kernel: CPU 0 
> May 13 22:59:43 rousalka kernel: Pid: 8758, comm: bash Not tainted
> 2.6.21-9.mm2.fc7.nim #1
> May 13 22:59:43 rousalka kernel: RIP: 0010:[<>] 
> [<>]
> May 13 22:59:43 rousalka kernel: RSP: 0018:81000ffedee0  EFLAGS: 00010296
> May 13 22:59:43 rousalka kernel: RAX: 804426a0 RBX: 81000abe6a80
> RCX: 7fff565bbdc4
> May 13 22:59:43 rousalka kernel: RDX: 5410 RSI: 81000abe6a80
> RDI: 810009cc6fa0
> May 13 22:59:43 rousalka kernel: RBP: 7fff565bbdc4 R08: 
> R09: 009033a4
> May 13 22:59:43 rousalka kernel: R10: 0008 R11: 0246
> R12: 5410
> May 13 22:59:43 rousalka kernel: R13: 00ff R14: 00ff
> R15: 
> May 13 22:59:43 rousalka kernel: FS:  2b2d544faf40()
> GS:8056b000() knlGS:
> May 13 22:59:43 rousalka kernel: CS:  0010 DS:  ES:  CR0: 
> 8005003b
> May 13 22:59:43 rousalka kernel: CR2:  CR3: 0bdf4000
> CR4: 06e0
> May 13 22:59:43 rousalka kernel: Process bash (pid: 8758, threadinfo
> 81000ffec000, task 81000bc82ac0)
> May 13 22:59:43 rousalka kernel: Stack:  8028545d 81000abe6a80
> 7fff565bbdc4 7fff565bbdc4
> May 13 22:59:43 rousalka kernel: 802856ca 5410
> 00ff 81000abe6a80
> May 13 22:59:43 rousalka kernel:  7fff565bbdc4
> 5410 00ff
> May 13 22:59:43 rousalka kernel: Call Trace:
> May 13 22:59:43 rousalka kernel: [<8028545d>] do_ioctl+0x55/0x6b
> May 13 22:59:43 rousalka kernel: [<802856ca>] vfs_ioctl+0x257/0x270
> May 13 22:59:43 rousalka kernel: [<8028573c>] sys_ioctl+0x59/0x79
> May 13 22:59:43 rousalka kernel: [<8020955c>] tracesys+0xdc/0xe1
> May 13 22:59:43 rousalka kernel: 
> May 13 22:59:43 rousalka kernel: INFO: lockdep is turned off.
> May 13 22:59:43 rousalka kernel: 
> May 13 22:59:43 rousalka kernel: Code:  Bad RIP value.
> May 13 22:59:43 rousalka kernel: RIP  [<>]
> May 13 22:59:43 rousalka kernel: RSP 81000ffedee0
> May 13 22:59:43 rousalka kernel: CR2: 

Anything you can do to make that wordwrapping go away for ever would be
great, thanks.

I don't know what would have caused this.  do_ioctl() did a jump-to-zero,
but it has code in there to explicitly test for null pointers.

Perhaps some weird race, although I find it hard to imagine how we could
have such a race in any ioctl which bash is likely to be calling.

Is it repeatable at all?

