Re: 2.6.24-rc1-gb4f5550 oops

2007-11-14 Thread Oleg Nesterov
On 11/14, Srivatsa Vaddagiri wrote:
>
> On Wed, Nov 14, 2007 at 06:17:08PM +0300, Oleg Nesterov wrote:
> > Suppose that old user_struct/task_group is freed/reused, and the task does
> 
> Shouldn't this old user actually be the root user_struct?

Well, the S_ISUID application can switch ->user from !root_user (note that
setuid exec doesn't change ->user), and we also have CAP_SETUID...

But yes, I agree, this all is very unlikely.

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24-rc1-gb4f5550 oops

2007-11-14 Thread Srivatsa Vaddagiri
On Wed, Nov 14, 2007 at 06:17:08PM +0300, Oleg Nesterov wrote:
> Suppose that old user_struct/task_group is freed/reused, and the task does

Shouldn't this old user actually be the root user_struct?

-- 
Regards,
vatsa


Re: 2.6.24-rc1-gb4f5550 oops

2007-11-14 Thread Ingo Molnar

btw., could you get your Signed-off-by line for that fix?

Ingo


Re: 2.6.24-rc1-gb4f5550 oops

2007-11-14 Thread Ingo Molnar

* Oleg Nesterov <[EMAIL PROTECTED]> wrote:

> > > I suspect I see the bug in that area, but I am not sure it can 
> > > explain this trace completely.
> > 
> > there's a fix pending from Dmitry - please see below. It took days 
> > for Grant to trigger the crash so it needs some time to be confirmed 
> > but it could explain the crash in theory.
> 
> Yes I agree, it can explain the crash.
> 
> However, this patch can't fix the bug I was talking about (of course, 
> unless I missed something). It is still possible that the 
> "fair_sched_class" task can have ->se.cfs_rq/parent pointing to the 
> freed memory, no?

yeah, agreed. I've queued up your fix for merging - thanks!

Ingo


Re: 2.6.24-rc1-gb4f5550 oops

2007-11-14 Thread Oleg Nesterov
On 11/14, Ingo Molnar wrote:
> 
> * Oleg Nesterov <[EMAIL PROTECTED]> wrote:
> 
> > > [18073.371126] Unable to handle kernel NULL pointer dereference at 
> > > 0120 RIP:
> > > [18073.371134]  [] check_preempt_wakeup+0x6e/0x110
> > > [18073.371144] PGD 81f9067 PUD 81c8067 PMD 0
> > > [18073.371151] Oops:  [1] PREEMPT SMP
> > > [18073.371157] CPU 2
> > > [18073.371161] Modules linked in: vfat fat
> > > [18073.371168] Pid: 4639, comm: kwin Not tainted 2.6.24-rc1 #1
> > > [18073.371171] RIP: 0010:[]  [] 
> > > check_preempt_wakeup+0x6e/0x110
> > > [18073.371177] RSP: 0018:810008531a78  EFLAGS: 00010006
> > > [18073.371179] RAX:  RBX:  RCX: 
> > > 
> > > [18073.371183] RDX: 810004441bf0 RSI: 81000801e860 RDI: 
> > > 81000444ab80
> > > [18073.371186] RBP: 810008531aa8 R08: 00d0d47a4a90 R09: 
> > > 
> > > [18073.371188] R10: 810004441bf0 R11: 0001 R12: 
> > > 810006520400
> > > [18073.371190] R13: 81000801e860 R14: 81000a63a000 R15: 
> > > 81000443d8e0
> > > [18073.371193] FS:  2b7d646a86f0() GS:810004c11780() 
> > > knlGS:
> > > [18073.371196] CS:  0010 DS:  ES:  CR0: 8005003b
> > > [18073.371199] CR2: 0120 CR3: 08495000 CR4: 
> > > 06e0
> > > [18073.371202] DR0:  DR1:  DR2: 
> > > 
> > > [18073.371211] DR3:  DR6: 0ff0 DR7: 
> > > 0400
> > > [18073.371214] Process kwin (pid: 4639, threadinfo 81000853, task 
> > > 81000840a860)
> > > [18073.371216] Stack:  81000444ab80 0001 81000801e860 
> > > 81000444ab80
> > > [18073.371231]  0002 81000443d8e0 810008531b38 
> > > 8023061e
> > > [18073.371238]   810004441b80 0002 
> > > 0001
> > > [18073.371245] Call Trace:
> > > [18073.371250]  [] try_to_wake_up+0x2fe/0x3a0
> > 
> > I suspect I see the bug in that area, but I am not sure it can explain 
> > this trace completely.
> 
> there's a fix pending from Dmitry - please see below. It took days for 
> Grant to trigger the crash so it needs some time to be confirmed but it 
> could explain the crash in theory.

Yes I agree, it can explain the crash.

However, this patch can't fix the bug I was talking about (of course, unless
I missed something). It is still possible that the "fair_sched_class" task
can have ->se.cfs_rq/parent pointing to the freed memory, no?

Oleg.



Re: 2.6.24-rc1-gb4f5550 oops

2007-11-14 Thread Ingo Molnar

* Oleg Nesterov <[EMAIL PROTECTED]> wrote:

> > [18073.371126] Unable to handle kernel NULL pointer dereference at 
> > 0120 RIP:
> > [18073.371134]  [] check_preempt_wakeup+0x6e/0x110
> > [18073.371144] PGD 81f9067 PUD 81c8067 PMD 0
> > [18073.371151] Oops:  [1] PREEMPT SMP
> > [18073.371157] CPU 2
> > [18073.371161] Modules linked in: vfat fat
> > [18073.371168] Pid: 4639, comm: kwin Not tainted 2.6.24-rc1 #1
> > [18073.371171] RIP: 0010:[]  [] 
> > check_preempt_wakeup+0x6e/0x110
> > [18073.371177] RSP: 0018:810008531a78  EFLAGS: 00010006
> > [18073.371179] RAX:  RBX:  RCX: 
> > 
> > [18073.371183] RDX: 810004441bf0 RSI: 81000801e860 RDI: 
> > 81000444ab80
> > [18073.371186] RBP: 810008531aa8 R08: 00d0d47a4a90 R09: 
> > 
> > [18073.371188] R10: 810004441bf0 R11: 0001 R12: 
> > 810006520400
> > [18073.371190] R13: 81000801e860 R14: 81000a63a000 R15: 
> > 81000443d8e0
> > [18073.371193] FS:  2b7d646a86f0() GS:810004c11780() 
> > knlGS:
> > [18073.371196] CS:  0010 DS:  ES:  CR0: 8005003b
> > [18073.371199] CR2: 0120 CR3: 08495000 CR4: 
> > 06e0
> > [18073.371202] DR0:  DR1:  DR2: 
> > 
> > [18073.371211] DR3:  DR6: 0ff0 DR7: 
> > 0400
> > [18073.371214] Process kwin (pid: 4639, threadinfo 81000853, task 
> > 81000840a860)
> > [18073.371216] Stack:  81000444ab80 0001 81000801e860 
> > 81000444ab80
> > [18073.371231]  0002 81000443d8e0 810008531b38 
> > 8023061e
> > [18073.371238]   810004441b80 0002 
> > 0001
> > [18073.371245] Call Trace:
> > [18073.371250]  [] try_to_wake_up+0x2fe/0x3a0
> 
> I suspect I see the bug in that area, but I am not sure it can explain 
> this trace completely.

there's a fix pending from Dmitry - please see below. It took days for 
Grant to trigger the crash so it needs some time to be confirmed but it 
could explain the crash in theory.

Ingo

-->
Subject: sched: fix __set_task_cpu() SMP race
From: Dmitry Adamushko <[EMAIL PROTECTED]>

Grant Wilson has reported rare SCHED_FAIR_USER crashes on his quad-core 
system, which crashes can only be explained via runqueue corruption.

there is a narrow SMP race in __set_task_cpu(): after ->cpu is set up to 
a new value, task_rq_lock(p, ...) can be successfully executed on another 
CPU. We must ensure that updates of per-task data have been completed by 
this moment.

this bug has been hiding in the Linux scheduler for an eternity (we 
never had any explicit barrier for task->cpu in set_task_cpu() - so the 
bug was introduced in 2.5.1), but only became visible via 
set_task_cfs_rq() being accidentally put after the task->cpu update. It 
also probably needs a sufficiently out-of-order CPU to trigger.

Reported-by: Grant Wilson <[EMAIL PROTECTED]>
Signed-off-by: Dmitry Adamushko <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 kernel/sched.c |   18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -217,15 +217,15 @@ static inline struct task_group *task_gr
 }
 
 /* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
-static inline void set_task_cfs_rq(struct task_struct *p)
+static inline void set_task_cfs_rq(struct task_struct *p, unsigned int cpu)
 {
-   p->se.cfs_rq = task_group(p)->cfs_rq[task_cpu(p)];
-   p->se.parent = task_group(p)->se[task_cpu(p)];
+   p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
+   p->se.parent = task_group(p)->se[cpu];
 }
 
 #else
 
-static inline void set_task_cfs_rq(struct task_struct *p) { }
+static inline void set_task_cfs_rq(struct task_struct *p, unsigned int cpu) { }
 
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 
@@ -1023,10 +1023,16 @@ unsigned long weighted_cpuload(const int
 
 static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
 {
+   set_task_cfs_rq(p, cpu);
 #ifdef CONFIG_SMP
+   /*
+    * After ->cpu is set up to a new value, task_rq_lock(p, ...) can be
+    * successfully executed on another CPU. We must ensure that updates of
+    * per-task data have been completed by this moment.
+    */
+   smp_wmb();
    task_thread_info(p)->cpu = cpu;
 #endif
-   set_task_cfs_rq(p);
 }
 
 #ifdef CONFIG_SMP
@@ -7111,7 +7117,7 @@ void sched_move_task(struct task_struct 
        tsk->sched_class->put_prev_task(rq, tsk);
    }
 
-   set_task_cfs_rq(tsk);
+   set_task_cfs_rq(tsk, task_cpu(tsk));
 
    if (on_rq) {
        if (unlikely(running))

Re: 2.6.24-rc1-gb4f5550 oops

2007-11-14 Thread Oleg Nesterov
Grant Wilson wrote:
>
> [18073.371126] Unable to handle kernel NULL pointer dereference at 
> 0120 RIP:
> [18073.371134]  [] check_preempt_wakeup+0x6e/0x110
> [18073.371144] PGD 81f9067 PUD 81c8067 PMD 0
> [18073.371151] Oops:  [1] PREEMPT SMP
> [18073.371157] CPU 2
> [18073.371161] Modules linked in: vfat fat
> [18073.371168] Pid: 4639, comm: kwin Not tainted 2.6.24-rc1 #1
> [18073.371171] RIP: 0010:[]  [] 
> check_preempt_wakeup+0x6e/0x110
> [18073.371177] RSP: 0018:810008531a78  EFLAGS: 00010006
> [18073.371179] RAX:  RBX:  RCX: 
> 
> [18073.371183] RDX: 810004441bf0 RSI: 81000801e860 RDI: 
> 81000444ab80
> [18073.371186] RBP: 810008531aa8 R08: 00d0d47a4a90 R09: 
> 
> [18073.371188] R10: 810004441bf0 R11: 0001 R12: 
> 810006520400
> [18073.371190] R13: 81000801e860 R14: 81000a63a000 R15: 
> 81000443d8e0
> [18073.371193] FS:  2b7d646a86f0() GS:810004c11780() 
> knlGS:
> [18073.371196] CS:  0010 DS:  ES:  CR0: 8005003b
> [18073.371199] CR2: 0120 CR3: 08495000 CR4: 
> 06e0
> [18073.371202] DR0:  DR1:  DR2: 
> 
> [18073.371211] DR3:  DR6: 0ff0 DR7: 
> 0400
> [18073.371214] Process kwin (pid: 4639, threadinfo 81000853, task 
> 81000840a860)
> [18073.371216] Stack:  81000444ab80 0001 81000801e860 
> 81000444ab80
> [18073.371231]  0002 81000443d8e0 810008531b38 
> 8023061e
> [18073.371238]   810004441b80 0002 
> 0001
> [18073.371245] Call Trace:
> [18073.371250]  [] try_to_wake_up+0x2fe/0x3a0

I suspect I see the bug in that area, but I am not sure it can explain this
trace completely.

Suppose that the SCHED_FIFO task does

switch_uid(new_user);

Now, p->se.cfs_rq and p->se.parent both point into the old user_struct->tg
because sched_move_task() doesn't call set_task_cfs_rq() for !fair_sched_class
case.

Suppose that old user_struct/task_group is freed/reused, and the task does

sched_setscheduler(SCHED_NORMAL);

__setscheduler() sets fair_sched_class, but doesn't update ->se.cfs_rq/parent
which point to the freed memory.

This means that check_preempt_wakeup() doing

	while (!is_same_group(se, pse)) {
		se = parent_entity(se);
		pse = parent_entity(pse);
	}

may OOPS in a similar way if rq->curr or p did something like above.

Perhaps we need something like the patch below, note that __setscheduler()
can't do set_task_cfs_rq().

Oleg.

--- kernel/sched.c  2007-11-14 17:32:21.0 +0300
+++ -   2007-11-14 18:14:00.245492756 +0300
@@ -7068,8 +7068,10 @@ void sched_move_task(struct task_struct 
 
    rq = task_rq_lock(tsk, &flags);
 
-   if (tsk->sched_class != &fair_sched_class)
+   if (tsk->sched_class != &fair_sched_class) {
+       set_task_cfs_rq(tsk);
        goto done;
+   }
 
    update_rq_clock(rq);
 



Re: 2.6.24-rc1-gb4f5550 oops

2007-11-12 Thread Grant Wilson
On Mon, 12 Nov 2007 19:05:49 +0100
Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> 
> On Thu, 2007-11-08 at 23:49 +0100, Rafael J. Wysocki wrote:
> > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > On Thu, 8 Nov 2007 22:42:21 +0100
> > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > 
> > > > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > > > On Thu, 8 Nov 2007 16:53:10 +0100
> > > > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > > > 
> > > > > > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > > > > > On Thu, 8 Nov 2007 01:06:21 +0100
> > > > > > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > > > > > 
> > > > > > > > On Monday, 5 of November 2007, Grant Wilson wrote:
> > > > > > > > > Hi,
> > > > > > > > > I got this oops on 2.6.24-rc1-641-gb4f5550:
> > > > > > > > 
> > > > > > > > (1) Is this reproducible?
> > > > > > > > (2) Did it happen previously on your system?
> > > > > > > >
> > > > > > > > [18073.371126] Unable to handle kernel NULL pointer dereference 
> > > > > > > > at 0120 RIP: 
> > > > > > > > [18073.371134]  [] 
> > > > > > > > check_preempt_wakeup+0x6e/0x110
> > > > > > > 
> > > > > > > This has now happened twice - the second time was last night when
> > > > > > > running 2.6.24-rc2.
> > > > > > > 
> > > > > > > Here's that second occurrence:
> > > > > > > 
> > > > > [snip]
> > > > > > 
> > > > > > Hmm.
> > > > > > 
> > > > > > Please run "gdb vmlinux" and see what code corresponds to
> > > > > > check_preempt_wakeup+0x6e in your kernel.
> > > > >
> > > > > 
> > > > > Dump of assembler code for function check_preempt_wakeup:
> > > > 
> > > > Well, thanks, but I meant the source code.  Please do "gdb vmlinux" and 
> > > > then
> > > > "l *check_preempt_wakeup+0x6e" in gdb.
> > > 
> > > Here's the requested output:
> > > 
> > > (gdb) l *check_preempt_wakeup+0x6e
> > > 0x802329ae is in check_preempt_wakeup (kernel/sched_fair.c:668).
> > > 663
> > > 664 /* Do the two (enqueued) entities belong to the same group ? */
> > > 665 static inline int
> > > 666 is_same_group(struct sched_entity *se, struct sched_entity *pse)
> > > 667 {
> > > 668         if (se->cfs_rq == pse->cfs_rq)
> > > 669                 return 1;
> > > 670
> > > 671         return 0;
> > > 672 }
> > 
> > Well, it looks like either se or pse is NULL.
> > 
> > Ingo, can you please have a look?
> 
> Most puzzling this, it should be guaranteed that the top sched_entities
> are of the same group, therefore avoiding this loop into NULL. Obviously
> something has gone wrong.
> 
> Grant, is there anything specific you can tell us about how to reproduce
> this?

I'm afraid not.  It has only happened twice and both times I was away
from the box in question at the time of failure, so it wasn't doing a
great deal.

I'm running 2.6.24-rc2 on two boxes and both times it happened on the
box running a quad core processor.

Grant



Re: 2.6.24-rc1-gb4f5550 oops

2007-11-12 Thread Peter Zijlstra

On Thu, 2007-11-08 at 23:49 +0100, Rafael J. Wysocki wrote:
> On Thursday, 8 of November 2007, Grant Wilson wrote:
> > On Thu, 8 Nov 2007 22:42:21 +0100
> > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > 
> > > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > > On Thu, 8 Nov 2007 16:53:10 +0100
> > > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > > > > On Thu, 8 Nov 2007 01:06:21 +0100
> > > > > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > > > > 
> > > > > > > On Monday, 5 of November 2007, Grant Wilson wrote:
> > > > > > > > Hi,
> > > > > > > > I got this oops on 2.6.24-rc1-641-gb4f5550:
> > > > > > > 
> > > > > > > (1) Is this reproducible?
> > > > > > > (2) Did it happen previously on your system?
> > > > > > >
> > > > > > > [18073.371126] Unable to handle kernel NULL pointer dereference 
> > > > > > > at 0120 RIP: 
> > > > > > > [18073.371134]  [] 
> > > > > > > check_preempt_wakeup+0x6e/0x110
> > > > > > 
> > > > > > This has now happened twice - the second time was last night when
> > > > > > running 2.6.24-rc2.
> > > > > > 
> > > > > > Here's that second occurrence:
> > > > > > 
> > > > [snip]
> > > > > 
> > > > > Hmm.
> > > > > 
> > > > > Please run "gdb vmlinux" and see what code corresponds to
> > > > > check_preempt_wakeup+0x6e in your kernel.
> > > >
> > > > 
> > > > Dump of assembler code for function check_preempt_wakeup:
> > > 
> > > Well, thanks, but I meant the source code.  Please do "gdb vmlinux" and 
> > > then
> > > "l *check_preempt_wakeup+0x6e" in gdb.
> > 
> > Here's the requested output:
> > 
> > (gdb) l *check_preempt_wakeup+0x6e
> > 0x802329ae is in check_preempt_wakeup (kernel/sched_fair.c:668).
> > 663
> > 664 /* Do the two (enqueued) entities belong to the same group ? */
> > 665 static inline int
> > 666 is_same_group(struct sched_entity *se, struct sched_entity *pse)
> > 667 {
> > 668         if (se->cfs_rq == pse->cfs_rq)
> > 669                 return 1;
> > 670
> > 671         return 0;
> > 672 }
> 
> Well, it looks like either se or pse is NULL.
> 
> Ingo, can you please have a look?

Most puzzling this, it should be guaranteed that the top sched_entities
are of the same group, therefore avoiding this loop into NULL. Obviously
something has gone wrong.

Grant, is there anything specific you can tell us about how to reproduce
this?




Re: 2.6.24-rc1-gb4f5550 oops

2007-11-12 Thread Grant Wilson
On Mon, 12 Nov 2007 19:05:49 +0100
Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> 
> On Thu, 2007-11-08 at 23:49 +0100, Rafael J. Wysocki wrote:
> > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > On Thu, 8 Nov 2007 22:42:21 +0100
> > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > 
> > > > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > > > On Thu, 8 Nov 2007 16:53:10 +0100
> > > > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > > > 
> > > > > > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > > > > > On Thu, 8 Nov 2007 01:06:21 +0100
> > > > > > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > > > > > 
> > > > > > > > On Monday, 5 of November 2007, Grant Wilson wrote:
> > > > > > > > > Hi,
> > > > > > > > > I got this oops on 2.6.24-rc1-641-gb4f5550:
> > > > > > > > 
> > > > > > > > (1) Is this reproducible?
> > > > > > > > (2) Did it happen previously on your system?
> > > > > > > >
> > > > > > > > [18073.371126] Unable to handle kernel NULL pointer dereference at 
> > > > > > > > 0120 RIP: 
> > > > > > > > [18073.371134]  [<8023572e>] check_preempt_wakeup+0x6e/0x110
> > > > > > > 
> > > > > > > This has now happened twice - the second time was last night when
> > > > > > > running 2.6.24-rc2.
> > > > > > > 
> > > > > > > Here's that second occurrence:
> > > > > > > 
> > > > > [snip]
> > > > > > 
> > > > > > Hmm.
> > > > > > 
> > > > > > Please run "gdb vmlinux" and see what code corresponds to
> > > > > > check_preempt_wakeup+0x6e in your kernel.
> > > > >
> > > > > 
> > > > > Dump of assembler code for function check_preempt_wakeup:
> > > > 
> > > > Well, thanks, but I meant the source code.  Please do "gdb vmlinux" and then
> > > > "l *check_preempt_wakeup+0x6e" in gdb.
> > > 
> > > Here's the requested output:
> > > 
> > > (gdb) l *check_preempt_wakeup+0x6e
> > > 0x802329ae is in check_preempt_wakeup (kernel/sched_fair.c:668).
> > > 663
> > > 664 /* Do the two (enqueued) entities belong to the same group ? */
> > > 665 static inline int
> > > 666 is_same_group(struct sched_entity *se, struct sched_entity *pse)
> > > 667 {
> > > 668 if (se->cfs_rq == pse->cfs_rq)
> > > 669 return 1;
> > > 670
> > > 671 return 0;
> > > 672 }
> > 
> > Well, it looks like either se or pse is NULL.
> > 
> > Ingo, can you please have a look?
> 
> Most puzzling this, it should be guaranteed that the top sched_entities
> are of the same group, therefore avoiding this loop into NULL. Obviously
> something has gone wrong.
> 
> Grant, is there anything specific you can tell us about how to reproduce
> this?

I'm afraid not.  It has only happened twice and both times I was away
from the box in question at the time of failure, so it wasn't doing a
great deal.

I'm running 2.6.24-rc2 on two boxes and both times it happened on the
box running a quad core processor.

Grant



Re: 2.6.24-rc1-gb4f5550 oops

2007-11-08 Thread Rafael J. Wysocki
On Thursday, 8 of November 2007, Grant Wilson wrote:
> On Thu, 8 Nov 2007 22:42:21 +0100
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> 
> > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > On Thu, 8 Nov 2007 16:53:10 +0100
> > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > 
> > > > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > > > On Thu, 8 Nov 2007 01:06:21 +0100
> > > > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > > > 
> > > > > > On Monday, 5 of November 2007, Grant Wilson wrote:
> > > > > > > Hi,
> > > > > > > I got this oops on 2.6.24-rc1-641-gb4f5550:
> > > > > > 
> > > > > > (1) Is this reproducible?
> > > > > > (2) Did it happen previously on your system?
> > > > > >
> > > > > > [18073.371126] Unable to handle kernel NULL pointer dereference at 
> > > > > > 0120 RIP: 
> > > > > > [18073.371134]  [<8023572e>] check_preempt_wakeup+0x6e/0x110
> > > > > 
> > > > > This has now happened twice - the second time was last night when
> > > > > running 2.6.24-rc2.
> > > > > 
> > > > > Here's that second occurrence:
> > > > > 
> > > [snip]
> > > > 
> > > > Hmm.
> > > > 
> > > > Please run "gdb vmlinux" and see what code corresponds to
> > > > check_preempt_wakeup+0x6e in your kernel.
> > >
> > > 
> > > Dump of assembler code for function check_preempt_wakeup:
> > 
> > Well, thanks, but I meant the source code.  Please do "gdb vmlinux" and then
> > "l *check_preempt_wakeup+0x6e" in gdb.
> 
> Here's the requested output:
> 
> (gdb) l *check_preempt_wakeup+0x6e
> 0x802329ae is in check_preempt_wakeup (kernel/sched_fair.c:668).
> 663
> 664 /* Do the two (enqueued) entities belong to the same group ? */
> 665 static inline int
> 666 is_same_group(struct sched_entity *se, struct sched_entity *pse)
> 667 {
> 668 if (se->cfs_rq == pse->cfs_rq)
> 669 return 1;
> 670
> 671 return 0;
> 672 }

Well, it looks like either se or pse is NULL.

Ingo, can you please have a look?

Thanks,
Rafael
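
Rafael's reading can be checked against the numbers in the oops. The sketch below is illustrative only: `struct fake_entity` and its padding are assumptions chosen to reproduce the 0x118 and 0x120 displacements seen in the disassembly posted in this thread, not the real `struct sched_entity` layout, and `fault_address` is a hypothetical helper. The field at 0x118 is presumed to be the parent pointer; gdb attributes the access at 0x120 to `cfs_rq`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: only the two offsets matter here. 64-bit
 * integers stand in for the pointers so the layout is the same on
 * any host.
 */
struct fake_entity {
    unsigned char other_fields[0x118]; /* everything before the parent pointer */
    uint64_t parent;                   /* loaded via 0x118(%rax) */
    uint64_t cfs_rq;                   /* loaded via 0x120(%rax) */
};

static_assert(offsetof(struct fake_entity, parent) == 0x118, "parent offset");
static_assert(offsetof(struct fake_entity, cfs_rq) == 0x120, "cfs_rq offset");

/* Address the CPU would access for base->member, computed without dereferencing. */
static uint64_t fault_address(uint64_t base, size_t member_offset)
{
    return base + member_offset;
}
```

With a NULL base pointer, reading the member at offset 0x120 touches address 0x120, which is the value the oops reports in CR2.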


Re: 2.6.24-rc1-gb4f5550 oops

2007-11-08 Thread Grant Wilson
On Thu, 8 Nov 2007 22:42:21 +0100
"Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:

> On Thursday, 8 of November 2007, Grant Wilson wrote:
> > On Thu, 8 Nov 2007 16:53:10 +0100
> > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > 
> > > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > > On Thu, 8 Nov 2007 01:06:21 +0100
> > > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > On Monday, 5 of November 2007, Grant Wilson wrote:
> > > > > > Hi,
> > > > > > I got this oops on 2.6.24-rc1-641-gb4f5550:
> > > > > 
> > > > > (1) Is this reproducible?
> > > > > (2) Did it happen previously on your system?
> > > > >
> > > > > [18073.371126] Unable to handle kernel NULL pointer dereference at 
> > > > > 0120 RIP: 
> > > > > [18073.371134]  [<8023572e>] check_preempt_wakeup+0x6e/0x110
> > > > 
> > > > This has now happened twice - the second time was last night when
> > > > running 2.6.24-rc2.
> > > > 
> > > > Here's that second occurrence:
> > > > 
> > [snip]
> > > 
> > > Hmm.
> > > 
> > > Please run "gdb vmlinux" and see what code corresponds to
> > > check_preempt_wakeup+0x6e in your kernel.
> >
> > 
> > Dump of assembler code for function check_preempt_wakeup:
> 
> Well, thanks, but I meant the source code.  Please do "gdb vmlinux" and then
> "l *check_preempt_wakeup+0x6e" in gdb.

Here's the requested output:

(gdb) l *check_preempt_wakeup+0x6e
0x802329ae is in check_preempt_wakeup (kernel/sched_fair.c:668).
663
664 /* Do the two (enqueued) entities belong to the same group ? */
665 static inline int
666 is_same_group(struct sched_entity *se, struct sched_entity *pse)
667 {
668 if (se->cfs_rq == pse->cfs_rq)
669 return 1;
670
671 return 0;
672 }

Grant


Re: 2.6.24-rc1-gb4f5550 oops

2007-11-08 Thread Rafael J. Wysocki
On Thursday, 8 of November 2007, Grant Wilson wrote:
> On Thu, 8 Nov 2007 16:53:10 +0100
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> 
> > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > On Thu, 8 Nov 2007 01:06:21 +0100
> > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > 
> > > > On Monday, 5 of November 2007, Grant Wilson wrote:
> > > > > Hi,
> > > > > I got this oops on 2.6.24-rc1-641-gb4f5550:
> > > > 
> > > > (1) Is this reproducible?
> > > > (2) Did it happen previously on your system?
> > > >
> > > > [18073.371126] Unable to handle kernel NULL pointer dereference at 
> > > > 0120 RIP: 
> > > > [18073.371134]  [<8023572e>] check_preempt_wakeup+0x6e/0x110
> > > 
> > > This has now happened twice - the second time was last night when
> > > running 2.6.24-rc2.
> > > 
> > > Here's that second occurrence:
> > > 
> [snip]
> > 
> > Hmm.
> > 
> > Please run "gdb vmlinux" and see what code corresponds to
> > check_preempt_wakeup+0x6e in your kernel.
>
> 
> Dump of assembler code for function check_preempt_wakeup:

Well, thanks, but I meant the source code.  Please do "gdb vmlinux" and then
"l *check_preempt_wakeup+0x6e" in gdb.

Thanks,
Rafael


Re: 2.6.24-rc1-gb4f5550 oops

2007-11-08 Thread Grant Wilson
On Thu, 8 Nov 2007 16:53:10 +0100
"Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:

> On Thursday, 8 of November 2007, Grant Wilson wrote:
> > On Thu, 8 Nov 2007 01:06:21 +0100
> > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > 
> > > On Monday, 5 of November 2007, Grant Wilson wrote:
> > > > Hi,
> > > > I got this oops on 2.6.24-rc1-641-gb4f5550:
> > > 
> > > (1) Is this reproducible?
> > > (2) Did it happen previously on your system?
> > >
> > > [18073.371126] Unable to handle kernel NULL pointer dereference at 
> > > 0120 RIP: 
> > > [18073.371134]  [<8023572e>] check_preempt_wakeup+0x6e/0x110
> > 
> > This has now happened twice - the second time was last night when
> > running 2.6.24-rc2.
> > 
> > Here's that second occurrence:
> > 
[snip]
> 
> Hmm.
> 
> Please run "gdb vmlinux" and see what code corresponds to
> check_preempt_wakeup+0x6e in your kernel.

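Dump of assembler code for function check_preempt_wakeup:
0x80232940 <check_preempt_wakeup+0>:    push   %rbp
0x80232941 <check_preempt_wakeup+1>:    mov    %rsp,%rbp
0x80232944 <check_preempt_wakeup+4>:    sub    $0x30,%rsp
0x80232948 <check_preempt_wakeup+8>:    mov    %r13,-0x18(%rbp)
0x8023294c <check_preempt_wakeup+12>:   mov    %rbx,-0x28(%rbp)
0x80232950 <check_preempt_wakeup+16>:   mov    %rsi,%r13
0x80232953 <check_preempt_wakeup+19>:   mov    %r12,-0x20(%rbp)
0x80232957 <check_preempt_wakeup+23>:   mov    %r14,-0x10(%rbp)
0x8023295b <check_preempt_wakeup+27>:   mov    %r15,-0x8(%rbp)
0x8023295f <check_preempt_wakeup+31>:   cmpl   $0x63,0x20(%rsi)
0x80232963 <check_preempt_wakeup+35>:   mov    0x750(%rdi),%r14
0x8023296a <check_preempt_wakeup+42>:   mov    0x168(%r14),%r12
0x80232971 <check_preempt_wakeup+49>:   jle    0x80232a1c <check_preempt_wakeup+220>
0x80232977 <check_preempt_wakeup+55>:   cmpl   $0x3,0x17c(%rsi)
0x8023297e <check_preempt_wakeup+62>:   je     0x802329f8 <check_preempt_wakeup+184>
0x80232980 <check_preempt_wakeup+64>:   testb  $0x10,0x593cb9(%rip)        # 0x807c6640 <sysctl_sched_features>
0x80232987 <check_preempt_wakeup+71>:   je     0x802329f8 <check_preempt_wakeup+184>
0x80232989 <check_preempt_wakeup+73>:   cmp    0x168(%rsi),%r12
0x80232990 <check_preempt_wakeup+80>:   lea    0x48(%r14),%rbx
0x80232994 <check_preempt_wakeup+84>:   lea    0x48(%rsi),%rax
0x80232998 <check_preempt_wakeup+88>:   je     0x802329be <check_preempt_wakeup+126>
0x8023299a <check_preempt_wakeup+90>:   nopw   0x0(%rax,%rax,1)
0x802329a0 <check_preempt_wakeup+96>:   mov    0x118(%rbx),%rbx
0x802329a7 <check_preempt_wakeup+103>:  mov    0x118(%rax),%rax
0x802329ae <check_preempt_wakeup+110>:  mov    0x120(%rax),%rdx
0x802329b5 <check_preempt_wakeup+117>:  cmp    %rdx,0x120(%rbx)
0x802329bc <check_preempt_wakeup+124>:  jne    0x802329a0 <check_preempt_wakeup+96>
0x802329be <check_preempt_wakeup+126>:  cmpq   $0x400,(%rbx)
0x802329c5 <check_preempt_wakeup+133>:  mov    0x40(%rbx),%r12
0x802329c9 <check_preempt_wakeup+137>:  mov    0x40(%rax),%r15
0x802329cd <check_preempt_wakeup+141>:  mov    0x593c81(%rip),%edi        # 0x807c6654 <sysctl_sched_wakeup_granularity>
0x802329d3 <check_preempt_wakeup+147>:  jne    0x80232a37 <check_preempt_wakeup+247>
0x802329d5 <check_preempt_wakeup+149>:  sub    %r15,%r12
0x802329d8 <check_preempt_wakeup+152>:  cmp    %r12,%rdi
0x802329db <check_preempt_wakeup+155>:  jge    0x802329f8 <check_preempt_wakeup+184>
0x802329dd <check_preempt_wakeup+157>:  testb  $0x20,0x593c5c(%rip)        # 0x807c6640 <sysctl_sched_features>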

Cheers,
Grant
-- 
Running Linux 2.6.24-rc2 on x86_64


Re: 2.6.24-rc1-gb4f5550 oops

2007-11-08 Thread Rafael J. Wysocki
On Thursday, 8 of November 2007, Grant Wilson wrote:
> On Thu, 8 Nov 2007 01:06:21 +0100
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> 
> > On Monday, 5 of November 2007, Grant Wilson wrote:
> > > Hi,
> > > I got this oops on 2.6.24-rc1-641-gb4f5550:
> > 
> > (1) Is this reproducible?
> > (2) Did it happen previously on your system?
> >
> > [18073.371126] Unable to handle kernel NULL pointer dereference at 
> > 0120 RIP: 
> > [18073.371134]  [<8023572e>] check_preempt_wakeup+0x6e/0x110
> 
> This has now happened twice - the second time was last night when
> running 2.6.24-rc2.
> 
> Here's that second occurrence:
> 
> [ 7224.233621] Unable to handle kernel NULL pointer dereference at 
> 0120 RIP:
> [ 7224.233631]  [<8023572e>] check_preempt_wakeup+0x6e/0x110
> [ 7224.233644] PGD 0
> [ 7224.233650] Oops:  [1] PREEMPT SMP
> [ 7224.233660] CPU 0
> [ 7224.233716] Modules linked in:
> [ 7224.233722] Pid: 7622, comm: ssh Not tainted 2.6.24-rc2 #1
> [ 7224.233726] RIP: 0010:[<8023572e>]  [<8023572e>] 
> check_preempt_wakeup+0x6e/0x110
> [ 7224.233735] RSP: 0018:81000aafba78  EFLAGS: 00010006
> [ 7224.233738] RAX:  RBX:  RCX: 
> 5745
> [ 7224.233743] RDX: 81000442fbf0 RSI: 810006c96860 RDI: 
> 810004438b80
> [ 7224.233748] RBP: 81000aafbaa8 R08: 007e8e25cf71 R09: 
> 
> [ 7224.233752] R10: 81000442fbf0 R11: 0001 R12: 
> 810007479d80
> [ 7224.233756] R13: 810006c96860 R14: 810009924860 R15: 
> 81000442b8e0
> [ 7224.233760] FS:  2b8bf089d350() GS:808d7000() 
> knlGS:
> [ 7224.233764] CS:  0010 DS:  ES:  CR0: 8005003b
> [ 7224.233768] CR2: 0120 CR3: 0ab3f000 CR4: 
> 06e0
> [ 7224.233771] DR0:  DR1:  DR2: 
> 
> [ 7224.233775] DR3:  DR6: 0ff0 DR7: 
> 0400
> [ 7224.233779] Process ssh (pid: 7622, threadinfo 81000aafa000, task 
> 81000abc2000)
> [ 7224.233782] Stack:  810004438b80 0001 810006c96860 
> 810004438b80
> [ 7224.233796]   81000442b8e0 81000aafbb38 
> 8023061e
> [ 7224.233807]  0400 81000442fb80  
> 0001
> [ 7224.233816] Call Trace:
> [ 7224.233823]  [<8023061e>] try_to_wake_up+0x2fe/0x3a0
> [ 7224.233828]  [<802306cd>] default_wake_function+0xd/0x10
> [ 7224.233833]  [<8022daca>] __wake_up_common+0x5a/0x90
> [ 7224.233839]  [<8023095a>] __wake_up_sync+0x4a/0x70
> [ 7224.233845]  [<80602fbf>] unix_write_space+0x8f/0xa0
> [ 7224.233850]  [<805936f9>] sock_wfree+0x49/0x50
> [ 7224.233854]  [<80595579>] __kfree_skb+0x69/0xe0
> [ 7224.233859]  [<80595607>] kfree_skb+0x17/0x30
> [ 7224.233863]  [<806016c7>] unix_stream_recvmsg+0x267/0x610
> [ 7224.233869]  [<8058e457>] sock_aio_read+0x107/0x110
> [ 7224.233875]  [<802928f1>] do_sync_read+0xf1/0x130
> [ 7224.233882]  [<802a4f20>] __d_free+0x30/0x40
> [ 7224.233887]  [<80251830>] autoremove_wake_function+0x0/0x40
> [ 7224.233892]  [<80293246>] vfs_read+0x156/0x160
> [ 7224.233897]  [<80293650>] sys_read+0x50/0x90
> [ 7224.233901]  [<8020bd7e>] system_call+0x7e/0x83
> [ 7224.233904]
> [ 7224.233907]
> [ 7224.233907] Code: 48 8b 90 20 01 00 00 48 39 93 20 01 00 00 75 e2 48 81 3b 
> 00
> [ 7224.233951] RIP  [<8023572e>] check_preempt_wakeup+0x6e/0x110
> [ 7224.233957]  RSP <81000aafba78>
> [ 7224.233961] CR2: 0120
> [ 7224.233967] note: ssh[7622] exited with preempt_count 3

Hmm.

Please run "gdb vmlinux" and see what code corresponds to
check_preempt_wakeup+0x6e in your kernel.
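
The `Code:` line in the quoted oops can also be decoded by hand to see which access faulted. A minimal sketch, assuming the byte dump starts at the faulting instruction (this matches the instruction at check_preempt_wakeup+110 in the disassembly posted in this thread). `decode_mov_disp32` is a hypothetical helper that handles only this one encoding (REX.W prefix 0x48, opcode 0x8b, ModRM with mod=2 and no SIB byte); it is not a general x86 decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mov_disp32 {
    unsigned reg;   /* destination register number (2 = %rdx) */
    unsigned rm;    /* base register number (0 = %rax) */
    uint32_t disp;  /* displacement added to the base register */
};

/*
 * Decode "48 8b /r disp32", i.e. mov disp32(%base),%dst with 64-bit
 * operands. Returns 0 on success, -1 if the bytes are some other form.
 */
static int decode_mov_disp32(const uint8_t *code, size_t len,
                             struct mov_disp32 *out)
{
    assert(code != NULL && out != NULL);

    if (len < 7 || code[0] != 0x48 || code[1] != 0x8b)
        return -1;
    if ((code[2] >> 6) != 2)        /* mod field must be 10b */
        return -1;
    if ((code[2] & 7) == 4)         /* rm=100b means a SIB byte follows */
        return -1;

    out->reg = (code[2] >> 3) & 7;
    out->rm = code[2] & 7;
    /* The displacement is stored little-endian. */
    out->disp = (uint32_t)code[3] | ((uint32_t)code[4] << 8) |
                ((uint32_t)code[5] << 16) | ((uint32_t)code[6] << 24);
    return 0;
}
```

Given the first seven bytes from the oops (48 8b 90 20 01 00 00), the helper reports base register 0 (%rax), destination register 2 (%rdx) and displacement 0x120, the same value the oops shows in CR2.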


Re: 2.6.24-rc1-gb4f5550 oops

2007-11-07 Thread Rafael J. Wysocki
On Monday, 5 of November 2007, Grant Wilson wrote:
> Hi,
> I got this oops on 2.6.24-rc1-641-gb4f5550:

(1) Is this reproducible?
(2) Did it happen previously on your system?


> [18073.371126] Unable to handle kernel NULL pointer dereference at 
> 0120 RIP: 
> [18073.371134]  [<8023572e>] check_preempt_wakeup+0x6e/0x110
> [18073.371144] PGD 81f9067 PUD 81c8067 PMD 0 
> [18073.371151] Oops:  [1] PREEMPT SMP 
> [18073.371157] CPU 2 
> [18073.371161] Modules linked in: vfat fat
> [18073.371168] Pid: 4639, comm: kwin Not tainted 2.6.24-rc1 #1
> [18073.371171] RIP: 0010:[<8023572e>]  [<8023572e>] 
> check_preempt_wakeup+0x6e/0x110
> [18073.371177] RSP: 0018:810008531a78  EFLAGS: 00010006
> [18073.371179] RAX:  RBX:  RCX: 
> 
> [18073.371183] RDX: 810004441bf0 RSI: 81000801e860 RDI: 
> 81000444ab80
> [18073.371186] RBP: 810008531aa8 R08: 00d0d47a4a90 R09: 
> 
> [18073.371188] R10: 810004441bf0 R11: 0001 R12: 
> 810006520400
> [18073.371190] R13: 81000801e860 R14: 81000a63a000 R15: 
> 81000443d8e0
> [18073.371193] FS:  2b7d646a86f0() GS:810004c11780() 
> knlGS:
> [18073.371196] CS:  0010 DS:  ES:  CR0: 8005003b
> [18073.371199] CR2: 0120 CR3: 08495000 CR4: 
> 06e0
> [18073.371202] DR0:  DR1:  DR2: 
> 
> [18073.371211] DR3:  DR6: 0ff0 DR7: 
> 0400
> [18073.371214] Process kwin (pid: 4639, threadinfo 81000853, task 
> 81000840a860)
> [18073.371216] Stack:  81000444ab80 0001 81000801e860 
> 81000444ab80
> [18073.371231]  0002 81000443d8e0 810008531b38 
> 8023061e
> [18073.371238]   810004441b80 0002 
> 0001
> [18073.371245] Call Trace:
> [18073.371250]  [<8023061e>] try_to_wake_up+0x2fe/0x3a0
> [18073.371253]  [<802306cd>] default_wake_function+0xd/0x10
> [18073.371257]  [<8022daca>] __wake_up_common+0x5a/0x90
> [18073.371260]  [<8023095a>] __wake_up_sync+0x4a/0x70
> [18073.371264]  [<80602c3f>] unix_write_space+0x8f/0xa0
> [18073.371269]  [<80593359>] sock_wfree+0x49/0x50
> [18073.371272]  [<805951d9>] __kfree_skb+0x69/0xe0
> [18073.371275]  [<80595267>] kfree_skb+0x17/0x30
> [18073.371278]  [<80601347>] unix_stream_recvmsg+0x267/0x610
> [18073.371283]  [<8058e0b7>] sock_aio_read+0x107/0x110
> [18073.371287]  [<802929a1>] do_sync_read+0xf1/0x130
> [18073.371291]  [<8058e230>] sock_ioctl+0x0/0x260
> [18073.371295]  [<80251830>] autoremove_wake_function+0x0/0x40
> [18073.371299]  [<80600522>] unix_ioctl+0xb2/0xf0
> [18073.371302]  [<8058e301>] sock_ioctl+0xd1/0x260
> [18073.371305]  [<802a00c1>] do_ioctl+0x31/0x90
> [18073.371308]  [<802932f6>] vfs_read+0x156/0x160
> [18073.371311]  [<80293700>] sys_read+0x50/0x90
> [18073.371315]  [<8020bd7e>] system_call+0x7e/0x83
> [18073.371317] 
> [18073.371319] 
> [18073.371319] Code: 48 8b 90 20 01 00 00 48 39 93 20 01 00 00 75 e2 48 81 3b 
> 00 
> [18073.371346] RIP  [<8023572e>] check_preempt_wakeup+0x6e/0x110
> [18073.371351]  RSP <810008531a78>
> [18073.371354] CR2: 0120
> [18073.371358] note: kwin[4639] exited with preempt_count 3


Re: 2.6.24-rc1-gb4f5550 oops

2007-11-07 Thread Rafael J. Wysocki
On Monday, 5 of November 2007, Grant Wilson wrote:
 Hi,
 I got this oops on 2.6.24-rc1-641-gb4f5550:

(1) Is this reproducible?
(2) Did it happen previously on your system?


> [18073.371126] Unable to handle kernel NULL pointer dereference at
> 0120 RIP:
> [18073.371134]  [8023572e] check_preempt_wakeup+0x6e/0x110
> [18073.371144] PGD 81f9067 PUD 81c8067 PMD 0
> [18073.371151] Oops:  [1] PREEMPT SMP
> [18073.371157] CPU 2
> [18073.371161] Modules linked in: vfat fat
> [18073.371168] Pid: 4639, comm: kwin Not tainted 2.6.24-rc1 #1
> [18073.371171] RIP: 0010:[8023572e]  [8023572e]
> check_preempt_wakeup+0x6e/0x110
> [18073.371177] RSP: 0018:810008531a78  EFLAGS: 00010006
> [18073.371179] RAX:  RBX:  RCX:
>
> [18073.371183] RDX: 810004441bf0 RSI: 81000801e860 RDI:
> 81000444ab80
> [18073.371186] RBP: 810008531aa8 R08: 00d0d47a4a90 R09:
>
> [18073.371188] R10: 810004441bf0 R11: 0001 R12:
> 810006520400
> [18073.371190] R13: 81000801e860 R14: 81000a63a000 R15:
> 81000443d8e0
> [18073.371193] FS:  2b7d646a86f0() GS:810004c11780()
> knlGS:
> [18073.371196] CS:  0010 DS:  ES:  CR0: 8005003b
> [18073.371199] CR2: 0120 CR3: 08495000 CR4:
> 06e0
> [18073.371202] DR0:  DR1:  DR2:
>
> [18073.371211] DR3:  DR6: 0ff0 DR7:
> 0400
> [18073.371214] Process kwin (pid: 4639, threadinfo 81000853, task
> 81000840a860)
> [18073.371216] Stack:  81000444ab80 0001 81000801e860
> 81000444ab80
> [18073.371231]  0002 81000443d8e0 810008531b38
> 8023061e
> [18073.371238]   810004441b80 0002
> 0001
> [18073.371245] Call Trace:
> [18073.371250]  [8023061e] try_to_wake_up+0x2fe/0x3a0
> [18073.371253]  [802306cd] default_wake_function+0xd/0x10
> [18073.371257]  [8022daca] __wake_up_common+0x5a/0x90
> [18073.371260]  [8023095a] __wake_up_sync+0x4a/0x70
> [18073.371264]  [80602c3f] unix_write_space+0x8f/0xa0
> [18073.371269]  [80593359] sock_wfree+0x49/0x50
> [18073.371272]  [805951d9] __kfree_skb+0x69/0xe0
> [18073.371275]  [80595267] kfree_skb+0x17/0x30
> [18073.371278]  [80601347] unix_stream_recvmsg+0x267/0x610
> [18073.371283]  [8058e0b7] sock_aio_read+0x107/0x110
> [18073.371287]  [802929a1] do_sync_read+0xf1/0x130
> [18073.371291]  [8058e230] sock_ioctl+0x0/0x260
> [18073.371295]  [80251830] autoremove_wake_function+0x0/0x40
> [18073.371299]  [80600522] unix_ioctl+0xb2/0xf0
> [18073.371302]  [8058e301] sock_ioctl+0xd1/0x260
> [18073.371305]  [802a00c1] do_ioctl+0x31/0x90
> [18073.371308]  [802932f6] vfs_read+0x156/0x160
> [18073.371311]  [80293700] sys_read+0x50/0x90
> [18073.371315]  [8020bd7e] system_call+0x7e/0x83
> [18073.371317]
> [18073.371319]
> [18073.371319] Code: 48 8b 90 20 01 00 00 48 39 93 20 01 00 00 75 e2 48 81 3b
> 00
> [18073.371346] RIP  [8023572e] check_preempt_wakeup+0x6e/0x110
> [18073.371351]  RSP 810008531a78
> [18073.371354] CR2: 0120
> [18073.371358] note: kwin[4639] exited with preempt_count 3


2.6.24-rc1-gb4f5550 oops

2007-11-04 Thread Grant Wilson
Hi,
I got this oops on 2.6.24-rc1-641-gb4f5550:

[18073.371126] Unable to handle kernel NULL pointer dereference at 
0120 RIP: 
[18073.371134]  [8023572e] check_preempt_wakeup+0x6e/0x110
[18073.371144] PGD 81f9067 PUD 81c8067 PMD 0 
[18073.371151] Oops:  [1] PREEMPT SMP 
[18073.371157] CPU 2 
[18073.371161] Modules linked in: vfat fat
[18073.371168] Pid: 4639, comm: kwin Not tainted 2.6.24-rc1 #1
[18073.371171] RIP: 0010:[8023572e]  [8023572e] 
check_preempt_wakeup+0x6e/0x110
[18073.371177] RSP: 0018:810008531a78  EFLAGS: 00010006
[18073.371179] RAX:  RBX:  RCX: 
[18073.371183] RDX: 810004441bf0 RSI: 81000801e860 RDI: 81000444ab80
[18073.371186] RBP: 810008531aa8 R08: 00d0d47a4a90 R09: 
[18073.371188] R10: 810004441bf0 R11: 0001 R12: 810006520400
[18073.371190] R13: 81000801e860 R14: 81000a63a000 R15: 81000443d8e0
[18073.371193] FS:  2b7d646a86f0() GS:810004c11780() 
knlGS:
[18073.371196] CS:  0010 DS:  ES:  CR0: 8005003b
[18073.371199] CR2: 0120 CR3: 08495000 CR4: 06e0
[18073.371202] DR0:  DR1:  DR2: 
[18073.371211] DR3:  DR6: 0ff0 DR7: 0400
[18073.371214] Process kwin (pid: 4639, threadinfo 81000853, task 
81000840a860)
[18073.371216] Stack:  81000444ab80 0001 81000801e860 
81000444ab80
[18073.371231]  0002 81000443d8e0 810008531b38 
8023061e
[18073.371238]   810004441b80 0002 
0001
[18073.371245] Call Trace:
[18073.371250]  [8023061e] try_to_wake_up+0x2fe/0x3a0
[18073.371253]  [802306cd] default_wake_function+0xd/0x10
[18073.371257]  [8022daca] __wake_up_common+0x5a/0x90
[18073.371260]  [8023095a] __wake_up_sync+0x4a/0x70
[18073.371264]  [80602c3f] unix_write_space+0x8f/0xa0
[18073.371269]  [80593359] sock_wfree+0x49/0x50
[18073.371272]  [805951d9] __kfree_skb+0x69/0xe0
[18073.371275]  [80595267] kfree_skb+0x17/0x30
[18073.371278]  [80601347] unix_stream_recvmsg+0x267/0x610
[18073.371283]  [8058e0b7] sock_aio_read+0x107/0x110
[18073.371287]  [802929a1] do_sync_read+0xf1/0x130
[18073.371291]  [8058e230] sock_ioctl+0x0/0x260
[18073.371295]  [80251830] autoremove_wake_function+0x0/0x40
[18073.371299]  [80600522] unix_ioctl+0xb2/0xf0
[18073.371302]  [8058e301] sock_ioctl+0xd1/0x260
[18073.371305]  [802a00c1] do_ioctl+0x31/0x90
[18073.371308]  [802932f6] vfs_read+0x156/0x160
[18073.371311]  [80293700] sys_read+0x50/0x90
[18073.371315]  [8020bd7e] system_call+0x7e/0x83
[18073.371317] 
[18073.371319] 
[18073.371319] Code: 48 8b 90 20 01 00 00 48 39 93 20 01 00 00 75 e2 48 81 3b 
00 
[18073.371346] RIP  [8023572e] check_preempt_wakeup+0x6e/0x110
[18073.371351]  RSP 810008531a78
[18073.371354] CR2: 0120
[18073.371358] note: kwin[4639] exited with preempt_count 3

Here's my config:

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24-rc1
# Sun Nov  4 13:21:29 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
# CONFIG_CGROUPS is not set
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_FAIR_USER_SCHED=y
# CONFIG_FAIR_CGROUP_SCHED is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set