Re: 2.6.24-rc1-gb4f5550 oops
On 11/14, Srivatsa Vaddagiri wrote:
>
> On Wed, Nov 14, 2007 at 06:17:08PM +0300, Oleg Nesterov wrote:
> > Suppose that old user_struct/task_group is freed/reused, and the task does
>
> Shouldn't this old user actually be the root user_struct?

Well, the S_ISUID application can switch ->user from !root_user (note that
setuid exec doesn't change ->user), and we also have CAP_SETUID...

But yes, I agree, this all is very unlikely.

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: 2.6.24-rc1-gb4f5550 oops
On Wed, Nov 14, 2007 at 06:17:08PM +0300, Oleg Nesterov wrote:
> Suppose that old user_struct/task_group is freed/reused, and the task does

Shouldn't this old user actually be the root user_struct?

-- 
Regards,
vatsa
Re: 2.6.24-rc1-gb4f5550 oops
btw., could you get your Signed-off-by line for that fix?

	Ingo
Re: 2.6.24-rc1-gb4f5550 oops
* Oleg Nesterov <[EMAIL PROTECTED]> wrote:

> > > I suspect I see the bug in that area, but I am not sure it can
> > > explain this trace completely.
> >
> > there's a fix pending from Dmitry - please see below. It took days
> > for Grant to trigger the crash so it needs some time to be confirmed
> > but it could explain the crash in theory.
>
> Yes I agree, it can explain the crash.
>
> However, this patch can't fix the bug I was talking about (of course,
> unless I missed something). It is still possible that the
> "fair_sched_class" task can have ->se.cfs_rq/parent pointing to the
> freed memory, no?

yeah, agreed. I've queued up your fix for merging - thanks!

	Ingo
Re: 2.6.24-rc1-gb4f5550 oops
On 11/14, Ingo Molnar wrote:
>
> * Oleg Nesterov <[EMAIL PROTECTED]> wrote:
>
> > > [18073.371126] Unable to handle kernel NULL pointer dereference at 0120 RIP:
> > > [18073.371134] [8023572e] check_preempt_wakeup+0x6e/0x110
> > > [18073.371144] PGD 81f9067 PUD 81c8067 PMD 0
> > > [18073.371151] Oops: [1] PREEMPT SMP
> > > [18073.371157] CPU 2
> > > [18073.371161] Modules linked in: vfat fat
> > > [18073.371168] Pid: 4639, comm: kwin Not tainted 2.6.24-rc1 #1
> > > [18073.371171] RIP: 0010:[8023572e] [8023572e] check_preempt_wakeup+0x6e/0x110
> > > [18073.371177] RSP: 0018:810008531a78 EFLAGS: 00010006
> > > [18073.371179] RAX: RBX: RCX:
> > > [18073.371183] RDX: 810004441bf0 RSI: 81000801e860 RDI: 81000444ab80
> > > [18073.371186] RBP: 810008531aa8 R08: 00d0d47a4a90 R09:
> > > [18073.371188] R10: 810004441bf0 R11: 0001 R12: 810006520400
> > > [18073.371190] R13: 81000801e860 R14: 81000a63a000 R15: 81000443d8e0
> > > [18073.371193] FS: 2b7d646a86f0() GS:810004c11780() knlGS:
> > > [18073.371196] CS: 0010 DS: ES: CR0: 8005003b
> > > [18073.371199] CR2: 0120 CR3: 08495000 CR4: 06e0
> > > [18073.371202] DR0: DR1: DR2:
> > > [18073.371211] DR3: DR6: 0ff0 DR7: 0400
> > > [18073.371214] Process kwin (pid: 4639, threadinfo 81000853, task 81000840a860)
> > > [18073.371216] Stack: 81000444ab80 0001 81000801e860 81000444ab80
> > > [18073.371231] 0002 81000443d8e0 810008531b38 8023061e
> > > [18073.371238] 810004441b80 0002 0001
> > > [18073.371245] Call Trace:
> > > [18073.371250] [8023061e] try_to_wake_up+0x2fe/0x3a0
> >
> > I suspect I see the bug in that area, but I am not sure it can explain
> > this trace completely.
>
> there's a fix pending from Dmitry - please see below. It took days for
> Grant to trigger the crash so it needs some time to be confirmed but it
> could explain the crash in theory.

Yes I agree, it can explain the crash.

However, this patch can't fix the bug I was talking about (of course,
unless I missed something). It is still possible that the "fair_sched_class"
task can have ->se.cfs_rq/parent pointing to the freed memory, no?

Oleg.
Re: 2.6.24-rc1-gb4f5550 oops
* Oleg Nesterov <[EMAIL PROTECTED]> wrote:

> > [18073.371126] Unable to handle kernel NULL pointer dereference at 0120 RIP:
> > [18073.371134] [8023572e] check_preempt_wakeup+0x6e/0x110
> > [18073.371144] PGD 81f9067 PUD 81c8067 PMD 0
> > [18073.371151] Oops: [1] PREEMPT SMP
> > [18073.371157] CPU 2
> > [18073.371161] Modules linked in: vfat fat
> > [18073.371168] Pid: 4639, comm: kwin Not tainted 2.6.24-rc1 #1
> > [18073.371171] RIP: 0010:[8023572e] [8023572e] check_preempt_wakeup+0x6e/0x110
> > [18073.371177] RSP: 0018:810008531a78 EFLAGS: 00010006
> > [18073.371179] RAX: RBX: RCX:
> > [18073.371183] RDX: 810004441bf0 RSI: 81000801e860 RDI: 81000444ab80
> > [18073.371186] RBP: 810008531aa8 R08: 00d0d47a4a90 R09:
> > [18073.371188] R10: 810004441bf0 R11: 0001 R12: 810006520400
> > [18073.371190] R13: 81000801e860 R14: 81000a63a000 R15: 81000443d8e0
> > [18073.371193] FS: 2b7d646a86f0() GS:810004c11780() knlGS:
> > [18073.371196] CS: 0010 DS: ES: CR0: 8005003b
> > [18073.371199] CR2: 0120 CR3: 08495000 CR4: 06e0
> > [18073.371202] DR0: DR1: DR2:
> > [18073.371211] DR3: DR6: 0ff0 DR7: 0400
> > [18073.371214] Process kwin (pid: 4639, threadinfo 81000853, task 81000840a860)
> > [18073.371216] Stack: 81000444ab80 0001 81000801e860 81000444ab80
> > [18073.371231] 0002 81000443d8e0 810008531b38 8023061e
> > [18073.371238] 810004441b80 0002 0001
> > [18073.371245] Call Trace:
> > [18073.371250] [8023061e] try_to_wake_up+0x2fe/0x3a0
>
> I suspect I see the bug in that area, but I am not sure it can explain
> this trace completely.

there's a fix pending from Dmitry - please see below. It took days for
Grant to trigger the crash so it needs some time to be confirmed but it
could explain the crash in theory.

	Ingo

-->
Subject: sched: fix __set_task_cpu() SMP race
From: Dmitry Adamushko <[EMAIL PROTECTED]>

Grant Wilson has reported rare SCHED_FAIR_USER crashes on his quad-core
system, crashes which can only be explained by runqueue corruption.

there is a narrow SMP race in __set_task_cpu(): after ->cpu is set up to
a new value, task_rq_lock(p, ...) can be successfully executed on another
CPU. We must ensure that updates of per-task data have been completed by
this moment.

this bug has been hiding in the Linux scheduler for an eternity (we never
had any explicit barrier for task->cpu in set_task_cpu() - so the bug was
introduced in 2.5.1), but only became visible via set_task_cfs_rq() being
accidentally put after the task->cpu update. It also probably needs a
sufficiently out-of-order CPU to trigger.

Reported-by: Grant Wilson <[EMAIL PROTECTED]>
Signed-off-by: Dmitry Adamushko <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 kernel/sched.c |   18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -217,15 +217,15 @@ static inline struct task_group *task_gr
 }
 
 /* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
-static inline void set_task_cfs_rq(struct task_struct *p)
+static inline void set_task_cfs_rq(struct task_struct *p, unsigned int cpu)
 {
-	p->se.cfs_rq = task_group(p)->cfs_rq[task_cpu(p)];
-	p->se.parent = task_group(p)->se[task_cpu(p)];
+	p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
+	p->se.parent = task_group(p)->se[cpu];
 }
 
 #else
 
-static inline void set_task_cfs_rq(struct task_struct *p) { }
+static inline void set_task_cfs_rq(struct task_struct *p, unsigned int cpu) { }
 
 #endif	/* CONFIG_FAIR_GROUP_SCHED */
 
@@ -1023,10 +1023,16 @@ unsigned long weighted_cpuload(const int
 
 static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
 {
+	set_task_cfs_rq(p, cpu);
 #ifdef CONFIG_SMP
+	/*
+	 * After ->cpu is set up to a new value, task_rq_lock(p, ...) can be
+	 * successfully executed on another CPU. We must ensure that updates of
+	 * per-task data have been completed by this moment.
+	 */
+	smp_wmb();
 	task_thread_info(p)->cpu = cpu;
 #endif
-	set_task_cfs_rq(p);
 }
 
 #ifdef CONFIG_SMP
@@ -7111,7 +7117,7 @@ void sched_move_task(struct task_struct
 		tsk->sched_class->put_prev_task(rq, tsk);
 	}
 
-	set_task_cfs_rq(tsk);
+	set_task_cfs_rq(tsk, task_cpu(tsk));
 
 	if (on_rq) {
 		if (unlikely(running))
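The patch changes two things: it moves set_task_cfs_rq() ahead of the ->cpu store (which is why it now takes an explicit cpu argument instead of reading task_cpu(p)), and it adds smp_wmb() so other CPUs also observe that order. The publish-then-observe ordering can be illustrated in user space. This is a minimal sketch under stated assumptions, not kernel code: it uses C11 <stdatomic.h> and POSIX threads, a release fence stands in for smp_wmb(), an acquire load stands in for whatever ordering the kernel's task_rq_lock() side provides, and `struct race_task`, `race_set_task_cpu()`, `race_observe()` and `race_run_trials()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the task fields touched by __set_task_cpu(). */
struct race_task {
    int cfs_rq_id;   /* plays the role of p->se.cfs_rq */
    int parent_id;   /* plays the role of p->se.parent */
    atomic_int cpu;  /* plays the role of task_thread_info(p)->cpu */
};

#define RACE_NEW_CPU 2

static struct race_task race_p;

/* Writer in the patched order: per-task data first, then the new CPU. */
static void *race_set_task_cpu(void *arg)
{
    (void)arg;
    race_p.cfs_rq_id = 100 + RACE_NEW_CPU;  /* like set_task_cfs_rq(p, cpu) */
    race_p.parent_id = 200 + RACE_NEW_CPU;
    /* Analogue of smp_wmb(): the stores above are ordered before the store below. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&race_p.cpu, RACE_NEW_CPU, memory_order_relaxed);
    return NULL;
}

/* Reader: once it observes the new CPU, it must also observe the new data. */
static void *race_observe(void *arg)
{
    int *consistent = arg;
    while (atomic_load_explicit(&race_p.cpu, memory_order_acquire) != RACE_NEW_CPU)
        ;  /* spin until the writer publishes */
    *consistent = race_p.cfs_rq_id == 100 + RACE_NEW_CPU &&
                  race_p.parent_id == 200 + RACE_NEW_CPU;
    return NULL;
}

/* Runs the writer/reader pair `trials` times; returns the number of consistent runs. */
static int race_run_trials(int trials)
{
    int good = 0;
    for (int i = 0; i < trials; i++) {
        pthread_t writer, reader;
        int consistent = 0;
        race_p.cfs_rq_id = 0;
        race_p.parent_id = 0;
        atomic_store(&race_p.cpu, 0);
        if (pthread_create(&reader, NULL, race_observe, &consistent) != 0)
            return -1;
        if (pthread_create(&writer, NULL, race_set_task_cpu, NULL) != 0)
            race_set_task_cpu(NULL);  /* fall back to publishing from this thread */
        else
            pthread_join(writer, NULL);
        pthread_join(reader, NULL);
        good += consistent;
    }
    return good;
}
```

With the stores in this order and the fence in place, every run should be consistent. The sketch cannot demonstrate the original bug reliably, because the pre-patch code was wrong in program order (cfs_rq updated after ->cpu), not only in hardware reordering.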
Re: 2.6.24-rc1-gb4f5550 oops
Grant Wilson wrote:
>
> [18073.371126] Unable to handle kernel NULL pointer dereference at 0120 RIP:
> [18073.371134] [8023572e] check_preempt_wakeup+0x6e/0x110
> [18073.371144] PGD 81f9067 PUD 81c8067 PMD 0
> [18073.371151] Oops: [1] PREEMPT SMP
> [18073.371157] CPU 2
> [18073.371161] Modules linked in: vfat fat
> [18073.371168] Pid: 4639, comm: kwin Not tainted 2.6.24-rc1 #1
> [18073.371171] RIP: 0010:[8023572e] [8023572e] check_preempt_wakeup+0x6e/0x110
> [18073.371177] RSP: 0018:810008531a78 EFLAGS: 00010006
> [18073.371179] RAX: RBX: RCX:
> [18073.371183] RDX: 810004441bf0 RSI: 81000801e860 RDI: 81000444ab80
> [18073.371186] RBP: 810008531aa8 R08: 00d0d47a4a90 R09:
> [18073.371188] R10: 810004441bf0 R11: 0001 R12: 810006520400
> [18073.371190] R13: 81000801e860 R14: 81000a63a000 R15: 81000443d8e0
> [18073.371193] FS: 2b7d646a86f0() GS:810004c11780() knlGS:
> [18073.371196] CS: 0010 DS: ES: CR0: 8005003b
> [18073.371199] CR2: 0120 CR3: 08495000 CR4: 06e0
> [18073.371202] DR0: DR1: DR2:
> [18073.371211] DR3: DR6: 0ff0 DR7: 0400
> [18073.371214] Process kwin (pid: 4639, threadinfo 81000853, task 81000840a860)
> [18073.371216] Stack: 81000444ab80 0001 81000801e860 81000444ab80
> [18073.371231] 0002 81000443d8e0 810008531b38 8023061e
> [18073.371238] 810004441b80 0002 0001
> [18073.371245] Call Trace:
> [18073.371250] [8023061e] try_to_wake_up+0x2fe/0x3a0

I suspect I see the bug in that area, but I am not sure it can explain
this trace completely.

Suppose that the SCHED_FIFO task does

	switch_uid(new_user);

Now, p->se.cfs_rq and p->se.parent both point into the old user_struct->tg
because sched_move_task() doesn't call set_task_cfs_rq() for the
!fair_sched_class case.

Suppose that old user_struct/task_group is freed/reused, and the task does

	sched_setscheduler(SCHED_NORMAL);

__setscheduler() sets fair_sched_class, but doesn't update ->se.cfs_rq/parent,
which point to the freed memory.

This means that check_preempt_wakeup() doing

	while (!is_same_group(se, pse)) {
		se = parent_entity(se);
		pse = parent_entity(pse);
	}

may OOPS in a similar way if rq->curr or p did something like the above.

Perhaps we need something like the patch below. Note that __setscheduler()
can't do set_task_cfs_rq().

Oleg.

--- kernel/sched.c	2007-11-14 17:32:21.0 +0300
+++ -	2007-11-14 18:14:00.245492756 +0300
@@ -7068,8 +7068,10 @@ void sched_move_task(struct task_struct
 
 	rq = task_rq_lock(tsk, &flags);
 
-	if (tsk->sched_class != &fair_sched_class)
+	if (tsk->sched_class != &fair_sched_class) {
+		set_task_cfs_rq(tsk);
 		goto done;
+	}
 
 	update_rq_clock(rq);
 
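The stale-pointer scenario described above can be modelled in a few lines of user-space C. This is a minimal sketch under stated assumptions, not scheduler code: `struct demo_group`, `struct demo_task`, `demo_move_task_old()` and `demo_move_task_new()` are hypothetical; a task group is reduced to one cfs_rq slot per CPU, and the fair-class path of sched_move_task() (dequeue, set_task_cfs_rq(), enqueue) is collapsed into a single refresh. It only shows that skipping the refresh for a non-fair task leaves the cached pointer aimed at the old group, and that the proposed hunk closes that gap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_DEMO 4

/* Hypothetical miniature of a task group: one cfs_rq slot per CPU. */
struct demo_group {
    int cfs_rq[NR_CPUS_DEMO];
};

enum demo_class { DEMO_RT, DEMO_FAIR };

struct demo_task {
    enum demo_class sched_class;
    int cpu;
    struct demo_group *group;   /* like task_group(p), derived from p->user */
    int *cached_cfs_rq;         /* like p->se.cfs_rq */
};

/* Like set_task_cfs_rq(): recompute the cached pointer from group and CPU. */
static void demo_set_task_cfs_rq(struct demo_task *t)
{
    t->cached_cfs_rq = &t->group->cfs_rq[t->cpu];
}

/* Before the proposed patch: a non-fair task skips the refresh. */
static void demo_move_task_old(struct demo_task *t, struct demo_group *g)
{
    t->group = g;
    if (t->sched_class != DEMO_FAIR)
        return;                 /* cached pointer still refers to the old group */
    demo_set_task_cfs_rq(t);
}

/* With the proposed patch: refresh before bailing out. */
static void demo_move_task_new(struct demo_task *t, struct demo_group *g)
{
    t->group = g;
    if (t->sched_class != DEMO_FAIR) {
        demo_set_task_cfs_rq(t);   /* the call the hunk adds */
        return;                    /* "goto done" in the real code */
    }
    /* fair path: the real code dequeues, refreshes and re-enqueues */
    demo_set_task_cfs_rq(t);
}

/* True if the cached pointer matches the task's current group and CPU. */
static bool demo_cache_is_current(const struct demo_task *t)
{
    return t->cached_cfs_rq == &t->group->cfs_rq[t->cpu];
}
```

Without the added call, the cached pointer still points into the old group after the move; once that group's storage is freed and reused, this is the dangling ->se.cfs_rq/parent a later switch to SCHED_NORMAL would inherit.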
Re: 2.6.24-rc1-gb4f5550 oops
On Mon, 12 Nov 2007 19:05:49 +0100
Peter Zijlstra <[EMAIL PROTECTED]> wrote:

>
> On Thu, 2007-11-08 at 23:49 +0100, Rafael J. Wysocki wrote:
> > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > On Thu, 8 Nov 2007 22:42:21 +0100
> > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > >
> > > > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > > > On Thu, 8 Nov 2007 16:53:10 +0100
> > > > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > > > > > On Thu, 8 Nov 2007 01:06:21 +0100
> > > > > > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > > > > >
> > > > > > > > On Monday, 5 of November 2007, Grant Wilson wrote:
> > > > > > > > > Hi,
> > > > > > > > > I got this oops on 2.6.24-rc1-641-gb4f5550:
> > > > > > > >
> > > > > > > > (1) Is this reproducible?
> > > > > > > > (2) Did it happen previously on your system?
> > > > > > > >
> > > > > > > > [18073.371126] Unable to handle kernel NULL pointer dereference at 0120 RIP:
> > > > > > > > [18073.371134] [8023572e] check_preempt_wakeup+0x6e/0x110
> > > > > > >
> > > > > > > This has now happened twice - the second time was last night when
> > > > > > > running 2.6.24-rc2.
> > > > > > >
> > > > > > > Here's that second occurrence:
> > > > > >
> > > > > > [snip]
> > > > > >
> > > > > > Hmm.
> > > > > >
> > > > > > Please run "gdb vmlinux" and see what code corresponds to
> > > > > > check_preempt_wakeup+0x6e in your kernel.
> > > > >
> > > > > Dump of assembler code for function check_preempt_wakeup:
> > > >
> > > > Well, thanks, but I meant the source code. Please do "gdb vmlinux" and then
> > > > "l *check_preempt_wakeup+0x6e" in gdb.
> > >
> > > Here's the requested output:
> > >
> > > (gdb) l *check_preempt_wakeup+0x6e
> > > 0x802329ae is in check_preempt_wakeup (kernel/sched_fair.c:668).
> > > 663
> > > 664	/* Do the two (enqueued) entities belong to the same group ? */
> > > 665	static inline int
> > > 666	is_same_group(struct sched_entity *se, struct sched_entity *pse)
> > > 667	{
> > > 668		if (se->cfs_rq == pse->cfs_rq)
> > > 669			return 1;
> > > 670
> > > 671		return 0;
> > > 672	}
> >
> > Well, it looks like either se or pse is NULL.
> >
> > Ingo, can you please have a look?
>
> Most puzzling this, it should be guaranteed that the top sched_entities
> are of the same group, therefore avoiding this loop into NULL. Obviously
> something has gone wrong.
>
> Grant, is there anything specific you can tell us about how to reproduce
> this?

I'm afraid not. It has only happened twice and both times I was away from
the box in question at the time of failure, so it wasn't doing a great deal.

I'm running 2.6.24-rc2 on two boxes and both times it happened on the box
running a quad core processor.

Grant
Re: 2.6.24-rc1-gb4f5550 oops
On Thu, 2007-11-08 at 23:49 +0100, Rafael J. Wysocki wrote:
> On Thursday, 8 of November 2007, Grant Wilson wrote:
> > On Thu, 8 Nov 2007 22:42:21 +0100
> > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> >
> > > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > > On Thu, 8 Nov 2007 16:53:10 +0100
> > > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > > > > On Thu, 8 Nov 2007 01:06:21 +0100
> > > > > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > On Monday, 5 of November 2007, Grant Wilson wrote:
> > > > > > > > Hi,
> > > > > > > > I got this oops on 2.6.24-rc1-641-gb4f5550:
> > > > > > >
> > > > > > > (1) Is this reproducible?
> > > > > > > (2) Did it happen previously on your system?
> > > > > > >
> > > > > > > [18073.371126] Unable to handle kernel NULL pointer dereference at 0120 RIP:
> > > > > > > [18073.371134] [8023572e] check_preempt_wakeup+0x6e/0x110
> > > > > >
> > > > > > This has now happened twice - the second time was last night when
> > > > > > running 2.6.24-rc2.
> > > > > >
> > > > > > Here's that second occurrence:
> > > > >
> > > > > [snip]
> > > > >
> > > > > Hmm.
> > > > >
> > > > > Please run "gdb vmlinux" and see what code corresponds to
> > > > > check_preempt_wakeup+0x6e in your kernel.
> > > >
> > > > Dump of assembler code for function check_preempt_wakeup:
> > >
> > > Well, thanks, but I meant the source code. Please do "gdb vmlinux" and then
> > > "l *check_preempt_wakeup+0x6e" in gdb.
> >
> > Here's the requested output:
> >
> > (gdb) l *check_preempt_wakeup+0x6e
> > 0x802329ae is in check_preempt_wakeup (kernel/sched_fair.c:668).
> > 663
> > 664	/* Do the two (enqueued) entities belong to the same group ? */
> > 665	static inline int
> > 666	is_same_group(struct sched_entity *se, struct sched_entity *pse)
> > 667	{
> > 668		if (se->cfs_rq == pse->cfs_rq)
> > 669			return 1;
> > 670
> > 671		return 0;
> > 672	}
>
> Well, it looks like either se or pse is NULL.
>
> Ingo, can you please have a look?

Most puzzling this, it should be guaranteed that the top sched_entities
are of the same group, therefore avoiding this loop into NULL. Obviously
something has gone wrong.

Grant, is there anything specific you can tell us about how to reproduce
this?
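Peter's argument, that the two walks must meet once they reach the top-level entities, can be sketched with a toy version of the loop from check_preempt_wakeup(). The sketch assumes a single level of group scheduling (task entities queued on a per-user cfs_rq, group entities on the root cfs_rq with a NULL parent), so the lockstep walk is only meaningful for entities at equal depth. `struct demo_se` and the `demo_*` functions are hypothetical, and the NULL check exists only to report where the real loop would fault.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of struct sched_entity under group scheduling. */
struct demo_se {
    const void *cfs_rq;       /* runqueue this entity is queued on */
    struct demo_se *parent;   /* group entity one level up; NULL at the top */
};

/* Mirrors is_same_group() from kernel/sched_fair.c as listed by gdb above. */
static int demo_is_same_group(const struct demo_se *se, const struct demo_se *pse)
{
    return se->cfs_rq == pse->cfs_rq;
}

/*
 * Same shape as the loop in check_preempt_wakeup(): climb both entities in
 * lockstep until they are queued on the same cfs_rq. Returns the number of
 * levels climbed, or -1 at the point where the real loop would pass a NULL
 * entity to is_same_group().
 */
static int demo_walk_to_common_level(const struct demo_se *se,
                                     const struct demo_se *pse)
{
    int levels = 0;

    while (!demo_is_same_group(se, pse)) {
        se = se->parent;
        pse = pse->parent;
        if (se == NULL || pse == NULL)
            return -1;        /* the kernel loop has no such check */
        levels++;
    }
    return levels;
}
```

In this model, an entity whose parent pointer has gone wrong (here simply NULL) makes the walk run off the top before the two sides meet, which fits Rafael's reading that either se or pse ended up NULL.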
Re: 2.6.24-rc1-gb4f5550 oops
On Thu, 2007-11-08 at 23:49 +0100, Rafael J. Wysocki wrote: On Thursday, 8 of November 2007, Grant Wilson wrote: On Thu, 8 Nov 2007 22:42:21 +0100 Rafael J. Wysocki [EMAIL PROTECTED] wrote: On Thursday, 8 of November 2007, Grant Wilson wrote: On Thu, 8 Nov 2007 16:53:10 +0100 Rafael J. Wysocki [EMAIL PROTECTED] wrote: On Thursday, 8 of November 2007, Grant Wilson wrote: On Thu, 8 Nov 2007 01:06:21 +0100 Rafael J. Wysocki [EMAIL PROTECTED] wrote: On Monday, 5 of November 2007, Grant Wilson wrote: Hi, I got this oops on 2.6.24-rc1-641-gb4f5550: (1) Is this reproducible? (2) Did it happen previously on your system? [18073.371126] Unable to handle kernel NULL pointer dereference at 0120 RIP: [18073.371134] [8023572e] check_preempt_wakeup+0x6e/0x110 This has now happened twice - the second time was last night when running 2.6.24-rc2. Here's that second occurrence: [snip] Hmm. Please run gdb vmlinux and see what code corresponds to check_preempt_wakeup+0x6e in your kernel. Dump of assembler code for function check_preempt_wakeup: Well, thanks, but I meant the source code. Please do gdb vmlinux and then l *check_preempt_wakeup+0x6e in gdb. Here's the requested output: (gdb) l *check_preempt_wakeup+0x6e 0x802329ae is in check_preempt_wakeup (kernel/sched_fair.c:668). 663 664 /* Do the two (enqueued) entities belong to the same group ? */ 665 static inline int 666 is_same_group(struct sched_entity *se, struct sched_entity *pse) 667 { 668 if (se-cfs_rq == pse-cfs_rq) 669 return 1; 670 671 return 0; 672 } Well, it looks like either se or pse is NULL. Ingo, can you please have a look? Most puzzling this, it should be guaranteed that the top sched_entities are of the same group, therefore avoiding this loop into NULL. Obviously something has gone wrong. Grant, is there anything specific you can tell us about how to reproduce this? 
- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc1-gb4f5550 oops
On Mon, 12 Nov 2007 19:05:49 +0100
Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> On Thu, 2007-11-08 at 23:49 +0100, Rafael J. Wysocki wrote:
> > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > On Thu, 8 Nov 2007 22:42:21 +0100
> > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > >
> > > > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > > > On Thu, 8 Nov 2007 16:53:10 +0100
> > > > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > > > > > On Thu, 8 Nov 2007 01:06:21 +0100
> > > > > > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > > > > >
> > > > > > > > On Monday, 5 of November 2007, Grant Wilson wrote:
> > > > > > > > > Hi,
> > > > > > > > > I got this oops on 2.6.24-rc1-641-gb4f5550:
> > > > > > > >
> > > > > > > > (1) Is this reproducible?
> > > > > > > > (2) Did it happen previously on your system?
> > > > > > > >
> > > > > > > > [18073.371126] Unable to handle kernel NULL pointer dereference at 0120 RIP:
> > > > > > > > [18073.371134] [8023572e] check_preempt_wakeup+0x6e/0x110
> > > > > > >
> > > > > > > This has now happened twice - the second time was last night when
> > > > > > > running 2.6.24-rc2.
> > > > > > >
> > > > > > > Here's that second occurrence:
> > > > > >
> > > > > > [snip]
> > > > > >
> > > > > > Hmm.
> > > > > >
> > > > > > Please run "gdb vmlinux" and see what code corresponds to
> > > > > > check_preempt_wakeup+0x6e in your kernel.
> > > > >
> > > > >
> > > > > Dump of assembler code for function check_preempt_wakeup:
> > > >
> > > > Well, thanks, but I meant the source code. Please do "gdb vmlinux" and then
> > > > "l *check_preempt_wakeup+0x6e" in gdb.
> > >
> > > Here's the requested output:
> > >
> > > (gdb) l *check_preempt_wakeup+0x6e
> > > 0x802329ae is in check_preempt_wakeup (kernel/sched_fair.c:668).
> > > 663
> > > 664     /* Do the two (enqueued) entities belong to the same group ? */
> > > 665     static inline int
> > > 666     is_same_group(struct sched_entity *se, struct sched_entity *pse)
> > > 667     {
> > > 668             if (se->cfs_rq == pse->cfs_rq)
> > > 669                     return 1;
> > > 670
> > > 671             return 0;
> > > 672     }
> >
> > Well, it looks like either se or pse is NULL.
> >
> > Ingo, can you please have a look?
>
> Most puzzling this, it should be guaranteed that the top sched_entities
> are of the same group, therefore avoiding this loop into NULL.
>
> Obviously something has gone wrong.
>
> Grant, is there anything specific you can tell us about how to
> reproduce this?

I'm afraid not. It has only happened twice and both times I was away
from the box in question at the time of failure, so it wasn't doing a
great deal.

I'm running 2.6.24-rc2 on two boxes and both times it happened on the
box running a quad core processor.

Grant
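Peter's remark about the top-level sched_entities can be illustrated with a simplified user-space model. This is a minimal sketch, not the kernel's code: `struct group`, `struct entity`, `same_group()` and `find_common_level()` are hypothetical names, with `group` standing in for `->cfs_rq` and `parent` for the entity's parent pointer. The loop climbs both hierarchies in lockstep until the two entities are queued on the same group, which is what the loop around `is_same_group()` in the posted disassembly (+96 to +124) appears to do. Unlike that loop, the sketch stops and reports failure when a chain runs out of parents instead of dereferencing NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a run queue / group (plays the role of ->cfs_rq). */
struct group {
    const char *name;
};

/* Hypothetical stand-in for a scheduling entity. */
struct entity {
    struct entity *parent;  /* NULL at the top of the hierarchy */
    struct group *group;    /* the queue this entity is enqueued on */
};

/* Mirrors the pointer comparison done by is_same_group(). */
static int same_group(const struct entity *a, const struct entity *b)
{
    return a->group == b->group;
}

/*
 * Climb both hierarchies one level at a time until the two entities share
 * a group. Returns 0 on success (with *a and *b updated), or -1 if either
 * chain runs out of parents first: the case in which an unchecked loop
 * would go on to read ->group (->cfs_rq) through a NULL pointer.
 */
static int find_common_level(const struct entity **a, const struct entity **b)
{
    assert(*a != NULL && *b != NULL);  /* callers start from real entities */
    while (!same_group(*a, *b)) {
        const struct entity *pa = (*a)->parent;
        const struct entity *pb = (*b)->parent;
        if (pa == NULL || pb == NULL)
            return -1;
        *a = pa;
        *b = pb;
    }
    return 0;
}
```

With both top-level entities queued on the same root group, the walk stops after one step. If one chain instead ends at an entity queued on a different group, `find_common_level()` returns -1, which corresponds to the situation where the kernel loop would have stepped past the top of the hierarchy.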
Re: 2.6.24-rc1-gb4f5550 oops
On Thursday, 8 of November 2007, Grant Wilson wrote:
> On Thu, 8 Nov 2007 22:42:21 +0100
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
>
> > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > On Thu, 8 Nov 2007 16:53:10 +0100
> > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > >
> > > > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > > > On Thu, 8 Nov 2007 01:06:21 +0100
> > > > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > On Monday, 5 of November 2007, Grant Wilson wrote:
> > > > > > > Hi,
> > > > > > > I got this oops on 2.6.24-rc1-641-gb4f5550:
> > > > > >
> > > > > > (1) Is this reproducible?
> > > > > > (2) Did it happen previously on your system?
> > > > > >
> > > > > > [18073.371126] Unable to handle kernel NULL pointer dereference at 0120 RIP:
> > > > > > [18073.371134] [8023572e] check_preempt_wakeup+0x6e/0x110
> > > > >
> > > > > This has now happened twice - the second time was last night when
> > > > > running 2.6.24-rc2.
> > > > >
> > > > > Here's that second occurrence:
> > > >
> > > > [snip]
> > > >
> > > > Hmm.
> > > >
> > > > Please run "gdb vmlinux" and see what code corresponds to
> > > > check_preempt_wakeup+0x6e in your kernel.
> > >
> > >
> > > Dump of assembler code for function check_preempt_wakeup:
> >
> > Well, thanks, but I meant the source code. Please do "gdb vmlinux" and then
> > "l *check_preempt_wakeup+0x6e" in gdb.
>
> Here's the requested output:
>
> (gdb) l *check_preempt_wakeup+0x6e
> 0x802329ae is in check_preempt_wakeup (kernel/sched_fair.c:668).
> 663
> 664     /* Do the two (enqueued) entities belong to the same group ? */
> 665     static inline int
> 666     is_same_group(struct sched_entity *se, struct sched_entity *pse)
> 667     {
> 668             if (se->cfs_rq == pse->cfs_rq)
> 669                     return 1;
> 670
> 671             return 0;
> 672     }

Well, it looks like either se or pse is NULL.

Ingo, can you please have a look?

Thanks,
Rafael
Re: 2.6.24-rc1-gb4f5550 oops
On Thu, 8 Nov 2007 22:42:21 +0100
"Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:

> On Thursday, 8 of November 2007, Grant Wilson wrote:
> > On Thu, 8 Nov 2007 16:53:10 +0100
> > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> >
> > > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > > On Thu, 8 Nov 2007 01:06:21 +0100
> > > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > On Monday, 5 of November 2007, Grant Wilson wrote:
> > > > > > Hi,
> > > > > > I got this oops on 2.6.24-rc1-641-gb4f5550:
> > > > >
> > > > > (1) Is this reproducible?
> > > > > (2) Did it happen previously on your system?
> > > > >
> > > > > [18073.371126] Unable to handle kernel NULL pointer dereference at 0120 RIP:
> > > > > [18073.371134] [8023572e] check_preempt_wakeup+0x6e/0x110
> > > >
> > > > This has now happened twice - the second time was last night when
> > > > running 2.6.24-rc2.
> > > >
> > > > Here's that second occurrence:
> > >
> > > [snip]
> > >
> > > Hmm.
> > >
> > > Please run "gdb vmlinux" and see what code corresponds to
> > > check_preempt_wakeup+0x6e in your kernel.
> >
> >
> > Dump of assembler code for function check_preempt_wakeup:
>
> Well, thanks, but I meant the source code. Please do "gdb vmlinux" and then
> "l *check_preempt_wakeup+0x6e" in gdb.

Here's the requested output:

(gdb) l *check_preempt_wakeup+0x6e
0x802329ae is in check_preempt_wakeup (kernel/sched_fair.c:668).
663
664     /* Do the two (enqueued) entities belong to the same group ? */
665     static inline int
666     is_same_group(struct sched_entity *se, struct sched_entity *pse)
667     {
668             if (se->cfs_rq == pse->cfs_rq)
669                     return 1;
670
671             return 0;
672     }

Grant
Re: 2.6.24-rc1-gb4f5550 oops
On Thursday, 8 of November 2007, Grant Wilson wrote:
> On Thu, 8 Nov 2007 16:53:10 +0100
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
>
> > On Thursday, 8 of November 2007, Grant Wilson wrote:
> > > On Thu, 8 Nov 2007 01:06:21 +0100
> > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > >
> > > > On Monday, 5 of November 2007, Grant Wilson wrote:
> > > > > Hi,
> > > > > I got this oops on 2.6.24-rc1-641-gb4f5550:
> > > >
> > > > (1) Is this reproducible?
> > > > (2) Did it happen previously on your system?
> > > >
> > > > [18073.371126] Unable to handle kernel NULL pointer dereference at 0120 RIP:
> > > > [18073.371134] [8023572e] check_preempt_wakeup+0x6e/0x110
> > >
> > > This has now happened twice - the second time was last night when
> > > running 2.6.24-rc2.
> > >
> > > Here's that second occurrence:
> >
> > [snip]
> >
> > Hmm.
> >
> > Please run "gdb vmlinux" and see what code corresponds to
> > check_preempt_wakeup+0x6e in your kernel.
>
>
> Dump of assembler code for function check_preempt_wakeup:

Well, thanks, but I meant the source code. Please do "gdb vmlinux" and then
"l *check_preempt_wakeup+0x6e" in gdb.

Thanks,
Rafael
Re: 2.6.24-rc1-gb4f5550 oops
On Thursday, 8 of November 2007, Grant Wilson wrote:
> On Thu, 8 Nov 2007 01:06:21 +0100
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
>
> > On Monday, 5 of November 2007, Grant Wilson wrote:
> > > Hi,
> > > I got this oops on 2.6.24-rc1-641-gb4f5550:
> >
> > (1) Is this reproducible?
> > (2) Did it happen previously on your system?
> >
> > [18073.371126] Unable to handle kernel NULL pointer dereference at 0120 RIP:
> > [18073.371134] [8023572e] check_preempt_wakeup+0x6e/0x110
>
> This has now happened twice - the second time was last night when
> running 2.6.24-rc2.
>
> Here's that second occurrence:
>
> [ 7224.233621] Unable to handle kernel NULL pointer dereference at 0120 RIP:
> [ 7224.233631] [8023572e] check_preempt_wakeup+0x6e/0x110
> [ 7224.233644] PGD 0
> [ 7224.233650] Oops: [1] PREEMPT SMP
> [ 7224.233660] CPU 0
> [ 7224.233716] Modules linked in:
> [ 7224.233722] Pid: 7622, comm: ssh Not tainted 2.6.24-rc2 #1
> [ 7224.233726] RIP: 0010:[8023572e] [8023572e] check_preempt_wakeup+0x6e/0x110
> [ 7224.233735] RSP: 0018:81000aafba78 EFLAGS: 00010006
> [ 7224.233738] RAX: RBX: RCX: 5745
> [ 7224.233743] RDX: 81000442fbf0 RSI: 810006c96860 RDI: 810004438b80
> [ 7224.233748] RBP: 81000aafbaa8 R08: 007e8e25cf71 R09:
> [ 7224.233752] R10: 81000442fbf0 R11: 0001 R12: 810007479d80
> [ 7224.233756] R13: 810006c96860 R14: 810009924860 R15: 81000442b8e0
> [ 7224.233760] FS: 2b8bf089d350() GS:808d7000() knlGS:
> [ 7224.233764] CS: 0010 DS: ES: CR0: 8005003b
> [ 7224.233768] CR2: 0120 CR3: 0ab3f000 CR4: 06e0
> [ 7224.233771] DR0: DR1: DR2:
> [ 7224.233775] DR3: DR6: 0ff0 DR7: 0400
> [ 7224.233779] Process ssh (pid: 7622, threadinfo 81000aafa000, task 81000abc2000)
> [ 7224.233782] Stack: 810004438b80 0001 810006c96860 810004438b80
> [ 7224.233796] 81000442b8e0 81000aafbb38 8023061e
> [ 7224.233807] 0400 81000442fb80 0001
> [ 7224.233816] Call Trace:
> [ 7224.233823] [8023061e] try_to_wake_up+0x2fe/0x3a0
> [ 7224.233828] [802306cd] default_wake_function+0xd/0x10
> [ 7224.233833] [8022daca] __wake_up_common+0x5a/0x90
> [ 7224.233839] [8023095a] __wake_up_sync+0x4a/0x70
> [ 7224.233845] [80602fbf] unix_write_space+0x8f/0xa0
> [ 7224.233850] [805936f9] sock_wfree+0x49/0x50
> [ 7224.233854] [80595579] __kfree_skb+0x69/0xe0
> [ 7224.233859] [80595607] kfree_skb+0x17/0x30
> [ 7224.233863] [806016c7] unix_stream_recvmsg+0x267/0x610
> [ 7224.233869] [8058e457] sock_aio_read+0x107/0x110
> [ 7224.233875] [802928f1] do_sync_read+0xf1/0x130
> [ 7224.233882] [802a4f20] __d_free+0x30/0x40
> [ 7224.233887] [80251830] autoremove_wake_function+0x0/0x40
> [ 7224.233892] [80293246] vfs_read+0x156/0x160
> [ 7224.233897] [80293650] sys_read+0x50/0x90
> [ 7224.233901] [8020bd7e] system_call+0x7e/0x83
> [ 7224.233904]
> [ 7224.233907]
> [ 7224.233907] Code: 48 8b 90 20 01 00 00 48 39 93 20 01 00 00 75 e2 48 81 3b 00
> [ 7224.233951] RIP [8023572e] check_preempt_wakeup+0x6e/0x110
> [ 7224.233957] RSP 81000aafba78
> [ 7224.233961] CR2: 0120
> [ 7224.233967] note: ssh[7622] exited with preempt_count 3

Hmm.

Please run "gdb vmlinux" and see what code corresponds to
check_preempt_wakeup+0x6e in your kernel.
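Both oopses report CR2: 0120 (the leading zeros are missing in this archive), and in the disassembly Grant posted, the instruction at check_preempt_wakeup+110 is `mov 0x120(%rax),%rdx`. A read through a NULL base pointer faults at exactly the member offset, so CR2 identifies the field being read. The sketch below models that reasoning. `struct entity_layout`, `enum member`, `load_address()` and `member_at()` are hypothetical: the struct reproduces only the two offsets visible in the disassembly (0x118, presumably the parent pointer loaded at +96/+103, and 0x120, the field compared as `->cfs_rq`), and it assumes 8-byte pointers as on x86_64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout that reproduces only the two offsets seen in the
 * disassembly; everything before them is lumped into padding.
 * Assumes 8-byte pointers (x86_64): parent at 0x118, cfs_rq at 0x120.
 */
struct entity_layout {
    unsigned char other_fields[0x118];
    struct entity_layout *parent;  /* presumably what +96/+103 load */
    void *cfs_rq;                  /* what +110 loads: the faulting read */
};

enum member { MEMBER_UNKNOWN, MEMBER_PARENT, MEMBER_CFS_RQ };

/* Address touched when reading the member at `offset` through `base`.
 * Done on integers, so modelling a NULL (0) base is well defined. */
static uintptr_t load_address(uintptr_t base, size_t offset)
{
    assert(base <= UINTPTR_MAX - offset);  /* no wrap-around */
    return base + offset;
}

/* For a read through a NULL pointer, the fault address equals the offset. */
static enum member member_at(uintptr_t fault_addr)
{
    if (fault_addr == offsetof(struct entity_layout, parent))
        return MEMBER_PARENT;
    if (fault_addr == offsetof(struct entity_layout, cfs_rq))
        return MEMBER_CFS_RQ;
    return MEMBER_UNKNOWN;
}
```

Read this way, CR2 = 0x120 matches the `->cfs_rq` load at +110 rather than the parent-pointer loads. Assuming the blank RAX field in the archived dump stood for zero, the pointer in RAX had already become NULL by the time the group comparison ran, which fits Peter's remark that the walk should never go past the top-level entities.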
Re: 2.6.24-rc1-gb4f5550 oops
On Thu, 8 Nov 2007 16:53:10 +0100
"Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:

> On Thursday, 8 of November 2007, Grant Wilson wrote:
> > On Thu, 8 Nov 2007 01:06:21 +0100
> > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> >
> > > On Monday, 5 of November 2007, Grant Wilson wrote:
> > > > Hi,
> > > > I got this oops on 2.6.24-rc1-641-gb4f5550:
> > >
> > > (1) Is this reproducible?
> > > (2) Did it happen previously on your system?
> > >
> > > [18073.371126] Unable to handle kernel NULL pointer dereference at 0120 RIP:
> > > [18073.371134] [8023572e] check_preempt_wakeup+0x6e/0x110
> >
> > This has now happened twice - the second time was last night when
> > running 2.6.24-rc2.
> >
> > Here's that second occurrence:
>
> [snip]
>
> Hmm.
>
> Please run "gdb vmlinux" and see what code corresponds to
> check_preempt_wakeup+0x6e in your kernel.

Dump of assembler code for function check_preempt_wakeup:
0x80232940 <check_preempt_wakeup+0>:    push   %rbp
0x80232941 <check_preempt_wakeup+1>:    mov    %rsp,%rbp
0x80232944 <check_preempt_wakeup+4>:    sub    $0x30,%rsp
0x80232948 <check_preempt_wakeup+8>:    mov    %r13,-0x18(%rbp)
0x8023294c <check_preempt_wakeup+12>:   mov    %rbx,-0x28(%rbp)
0x80232950 <check_preempt_wakeup+16>:   mov    %rsi,%r13
0x80232953 <check_preempt_wakeup+19>:   mov    %r12,-0x20(%rbp)
0x80232957 <check_preempt_wakeup+23>:   mov    %r14,-0x10(%rbp)
0x8023295b <check_preempt_wakeup+27>:   mov    %r15,-0x8(%rbp)
0x8023295f <check_preempt_wakeup+31>:   cmpl   $0x63,0x20(%rsi)
0x80232963 <check_preempt_wakeup+35>:   mov    0x750(%rdi),%r14
0x8023296a <check_preempt_wakeup+42>:   mov    0x168(%r14),%r12
0x80232971 <check_preempt_wakeup+49>:   jle    0x80232a1c <check_preempt_wakeup+220>
0x80232977 <check_preempt_wakeup+55>:   cmpl   $0x3,0x17c(%rsi)
0x8023297e <check_preempt_wakeup+62>:   je     0x802329f8 <check_preempt_wakeup+184>
0x80232980 <check_preempt_wakeup+64>:   testb  $0x10,0x593cb9(%rip)        # 0x807c6640 <sysctl_sched_features>
0x80232987 <check_preempt_wakeup+71>:   je     0x802329f8 <check_preempt_wakeup+184>
0x80232989 <check_preempt_wakeup+73>:   cmp    0x168(%rsi),%r12
0x80232990 <check_preempt_wakeup+80>:   lea    0x48(%r14),%rbx
0x80232994 <check_preempt_wakeup+84>:   lea    0x48(%rsi),%rax
0x80232998 <check_preempt_wakeup+88>:   je     0x802329be <check_preempt_wakeup+126>
0x8023299a <check_preempt_wakeup+90>:   nopw   0x0(%rax,%rax,1)
0x802329a0 <check_preempt_wakeup+96>:   mov    0x118(%rbx),%rbx
0x802329a7 <check_preempt_wakeup+103>:  mov    0x118(%rax),%rax
0x802329ae <check_preempt_wakeup+110>:  mov    0x120(%rax),%rdx
0x802329b5 <check_preempt_wakeup+117>:  cmp    %rdx,0x120(%rbx)
0x802329bc <check_preempt_wakeup+124>:  jne    0x802329a0 <check_preempt_wakeup+96>
0x802329be <check_preempt_wakeup+126>:  cmpq   $0x400,(%rbx)
0x802329c5 <check_preempt_wakeup+133>:  mov    0x40(%rbx),%r12
0x802329c9 <check_preempt_wakeup+137>:  mov    0x40(%rax),%r15
0x802329cd <check_preempt_wakeup+141>:  mov    0x593c81(%rip),%edi        # 0x807c6654 <sysctl_sched_wakeup_granularity>
0x802329d3 <check_preempt_wakeup+147>:  jne    0x80232a37 <check_preempt_wakeup+247>
0x802329d5 <check_preempt_wakeup+149>:  sub    %r15,%r12
0x802329d8 <check_preempt_wakeup+152>:  cmp    %r12,%rdi
0x802329db <check_preempt_wakeup+155>:  jge    0x802329f8 <check_preempt_wakeup+184>
0x802329dd <check_preempt_wakeup+157>:  testb  $0x20,0x593c5c(%rip)        # 0x807c6640 <sysctl_sched_features>

Cheers,
Grant
--
Running Linux 2.6.24-rc2 on x86_64
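The numbers in this exchange can be cross-checked by hand. The function starts at 0x80232940, so +0x6e lands on 0x802329ae, the address gdb reports for kernel/sched_fair.c:668; and the "Code:" bytes in both oopses (`48 8b 90 20 01 00 00`, `48 39 93 20 01 00 00`, `75 e2`) appear to encode the instructions at +110, +117 and +124. A small C sketch of that arithmetic follows; `insn_address()`, `rel8_target()` and `disp32_le()` are hypothetical helpers that only handle a short jump's 8-bit displacement and a little-endian 32-bit displacement field, not general x86 decoding.

```c
#include <assert.h>
#include <stdint.h>

/* Address of an instruction given the function start and its +offset. */
static uint64_t insn_address(uint64_t func_start, uint64_t offset)
{
    return func_start + offset;
}

/*
 * Target of a short jump such as "75 e2" (jne rel8): the address of the
 * next instruction plus the sign-extended 8-bit displacement.
 */
static uint64_t rel8_target(uint64_t insn_addr, unsigned insn_len, uint8_t rel8)
{
    assert(insn_len >= 2 && insn_len <= 15);  /* opcode + rel8; x86 max is 15 */
    int64_t disp = rel8 < 0x80 ? (int64_t)rel8 : (int64_t)rel8 - 256;
    return insn_addr + insn_len + (uint64_t)disp;  /* wraps modulo 2^64 */
}

/* A 32-bit little-endian displacement, e.g. the "20 01 00 00" in "48 8b 90 20 01 00 00". */
static uint32_t disp32_le(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

For example, `rel8_target(0x802329bc, 2, 0xe2)` gives 0x802329a0 (+96), matching the `jne` target in the listing, and `disp32_le` applied to `20 01 00 00` gives 0x120, the same value that shows up in CR2.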
Re: 2.6.24-rc1-gb4f5550 oops
On Monday, 5 of November 2007, Grant Wilson wrote:
> Hi,
> I got this oops on 2.6.24-rc1-641-gb4f5550:

(1) Is this reproducible?
(2) Did it happen previously on your system?

> [18073.371126] Unable to handle kernel NULL pointer dereference at 0120 RIP:
> [18073.371134] [8023572e] check_preempt_wakeup+0x6e/0x110
> [18073.371144] PGD 81f9067 PUD 81c8067 PMD 0
> [18073.371151] Oops: [1] PREEMPT SMP
> [18073.371157] CPU 2
> [18073.371161] Modules linked in: vfat fat
> [18073.371168] Pid: 4639, comm: kwin Not tainted 2.6.24-rc1 #1
> [18073.371171] RIP: 0010:[8023572e] [8023572e] check_preempt_wakeup+0x6e/0x110
> [18073.371177] RSP: 0018:810008531a78 EFLAGS: 00010006
> [18073.371179] RAX: RBX: RCX:
> [18073.371183] RDX: 810004441bf0 RSI: 81000801e860 RDI: 81000444ab80
> [18073.371186] RBP: 810008531aa8 R08: 00d0d47a4a90 R09:
> [18073.371188] R10: 810004441bf0 R11: 0001 R12: 810006520400
> [18073.371190] R13: 81000801e860 R14: 81000a63a000 R15: 81000443d8e0
> [18073.371193] FS: 2b7d646a86f0() GS:810004c11780() knlGS:
> [18073.371196] CS: 0010 DS: ES: CR0: 8005003b
> [18073.371199] CR2: 0120 CR3: 08495000 CR4: 06e0
> [18073.371202] DR0: DR1: DR2:
> [18073.371211] DR3: DR6: 0ff0 DR7: 0400
> [18073.371214] Process kwin (pid: 4639, threadinfo 81000853, task 81000840a860)
> [18073.371216] Stack: 81000444ab80 0001 81000801e860 81000444ab80
> [18073.371231] 0002 81000443d8e0 810008531b38 8023061e
> [18073.371238] 810004441b80 0002 0001
> [18073.371245] Call Trace:
> [18073.371250] [8023061e] try_to_wake_up+0x2fe/0x3a0
> [18073.371253] [802306cd] default_wake_function+0xd/0x10
> [18073.371257] [8022daca] __wake_up_common+0x5a/0x90
> [18073.371260] [8023095a] __wake_up_sync+0x4a/0x70
> [18073.371264] [80602c3f] unix_write_space+0x8f/0xa0
> [18073.371269] [80593359] sock_wfree+0x49/0x50
> [18073.371272] [805951d9] __kfree_skb+0x69/0xe0
> [18073.371275] [80595267] kfree_skb+0x17/0x30
> [18073.371278] [80601347] unix_stream_recvmsg+0x267/0x610
> [18073.371283] [8058e0b7] sock_aio_read+0x107/0x110
> [18073.371287] [802929a1] do_sync_read+0xf1/0x130
> [18073.371291] [8058e230] sock_ioctl+0x0/0x260
> [18073.371295] [80251830] autoremove_wake_function+0x0/0x40
> [18073.371299] [80600522] unix_ioctl+0xb2/0xf0
> [18073.371302] [8058e301] sock_ioctl+0xd1/0x260
> [18073.371305] [802a00c1] do_ioctl+0x31/0x90
> [18073.371308] [802932f6] vfs_read+0x156/0x160
> [18073.371311] [80293700] sys_read+0x50/0x90
> [18073.371315] [8020bd7e] system_call+0x7e/0x83
> [18073.371317]
> [18073.371319]
> [18073.371319] Code: 48 8b 90 20 01 00 00 48 39 93 20 01 00 00 75 e2 48 81 3b 00
> [18073.371346] RIP [8023572e] check_preempt_wakeup+0x6e/0x110
> [18073.371351] RSP 810008531a78
> [18073.371354] CR2: 0120
> [18073.371358] note: kwin[4639] exited with preempt_count 3
2.6.24-rc1-gb4f5550 oops
Hi,
I got this oops on 2.6.24-rc1-641-gb4f5550:

[18073.371126] Unable to handle kernel NULL pointer dereference at 0120 RIP:
[18073.371134] [8023572e] check_preempt_wakeup+0x6e/0x110
[18073.371144] PGD 81f9067 PUD 81c8067 PMD 0
[18073.371151] Oops: [1] PREEMPT SMP
[18073.371157] CPU 2
[18073.371161] Modules linked in: vfat fat
[18073.371168] Pid: 4639, comm: kwin Not tainted 2.6.24-rc1 #1
[18073.371171] RIP: 0010:[8023572e] [8023572e] check_preempt_wakeup+0x6e/0x110
[18073.371177] RSP: 0018:810008531a78 EFLAGS: 00010006
[18073.371179] RAX: RBX: RCX:
[18073.371183] RDX: 810004441bf0 RSI: 81000801e860 RDI: 81000444ab80
[18073.371186] RBP: 810008531aa8 R08: 00d0d47a4a90 R09:
[18073.371188] R10: 810004441bf0 R11: 0001 R12: 810006520400
[18073.371190] R13: 81000801e860 R14: 81000a63a000 R15: 81000443d8e0
[18073.371193] FS: 2b7d646a86f0() GS:810004c11780() knlGS:
[18073.371196] CS: 0010 DS: ES: CR0: 8005003b
[18073.371199] CR2: 0120 CR3: 08495000 CR4: 06e0
[18073.371202] DR0: DR1: DR2:
[18073.371211] DR3: DR6: 0ff0 DR7: 0400
[18073.371214] Process kwin (pid: 4639, threadinfo 81000853, task 81000840a860)
[18073.371216] Stack: 81000444ab80 0001 81000801e860 81000444ab80
[18073.371231] 0002 81000443d8e0 810008531b38 8023061e
[18073.371238] 810004441b80 0002 0001
[18073.371245] Call Trace:
[18073.371250] [8023061e] try_to_wake_up+0x2fe/0x3a0
[18073.371253] [802306cd] default_wake_function+0xd/0x10
[18073.371257] [8022daca] __wake_up_common+0x5a/0x90
[18073.371260] [8023095a] __wake_up_sync+0x4a/0x70
[18073.371264] [80602c3f] unix_write_space+0x8f/0xa0
[18073.371269] [80593359] sock_wfree+0x49/0x50
[18073.371272] [805951d9] __kfree_skb+0x69/0xe0
[18073.371275] [80595267] kfree_skb+0x17/0x30
[18073.371278] [80601347] unix_stream_recvmsg+0x267/0x610
[18073.371283] [8058e0b7] sock_aio_read+0x107/0x110
[18073.371287] [802929a1] do_sync_read+0xf1/0x130
[18073.371291] [8058e230] sock_ioctl+0x0/0x260
[18073.371295] [80251830] autoremove_wake_function+0x0/0x40
[18073.371299] [80600522] unix_ioctl+0xb2/0xf0
[18073.371302] [8058e301] sock_ioctl+0xd1/0x260
[18073.371305] [802a00c1] do_ioctl+0x31/0x90
[18073.371308] [802932f6] vfs_read+0x156/0x160
[18073.371311] [80293700] sys_read+0x50/0x90
[18073.371315] [8020bd7e] system_call+0x7e/0x83
[18073.371317]
[18073.371319]
[18073.371319] Code: 48 8b 90 20 01 00 00 48 39 93 20 01 00 00 75 e2 48 81 3b 00
[18073.371346] RIP [8023572e] check_preempt_wakeup+0x6e/0x110
[18073.371351] RSP 810008531a78
[18073.371354] CR2: 0120
[18073.371358] note: kwin[4639] exited with preempt_count 3

Here's my config:

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24-rc1
# Sun Nov 4 13:21:29 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
# CONFIG_CGROUPS is not set
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_FAIR_USER_SCHED=y
# CONFIG_FAIR_CGROUP_SCHED is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
2.6.24-rc1-gb4f5550 oops
Hi, I got this oops on 2.6.24-rc1-641-gb4f5550:

[18073.371126] Unable to handle kernel NULL pointer dereference at 0120 RIP:
[18073.371134] [8023572e] check_preempt_wakeup+0x6e/0x110
[18073.371144] PGD 81f9067 PUD 81c8067 PMD 0
[18073.371151] Oops: [1] PREEMPT SMP
[18073.371157] CPU 2
[18073.371161] Modules linked in: vfat fat
[18073.371168] Pid: 4639, comm: kwin Not tainted 2.6.24-rc1 #1
[18073.371171] RIP: 0010:[8023572e] [8023572e] check_preempt_wakeup+0x6e/0x110
[18073.371177] RSP: 0018:810008531a78 EFLAGS: 00010006
[18073.371179] RAX: RBX: RCX:
[18073.371183] RDX: 810004441bf0 RSI: 81000801e860 RDI: 81000444ab80
[18073.371186] RBP: 810008531aa8 R08: 00d0d47a4a90 R09:
[18073.371188] R10: 810004441bf0 R11: 0001 R12: 810006520400
[18073.371190] R13: 81000801e860 R14: 81000a63a000 R15: 81000443d8e0
[18073.371193] FS: 2b7d646a86f0() GS:810004c11780() knlGS:
[18073.371196] CS: 0010 DS: ES: CR0: 8005003b
[18073.371199] CR2: 0120 CR3: 08495000 CR4: 06e0
[18073.371202] DR0: DR1: DR2:
[18073.371211] DR3: DR6: 0ff0 DR7: 0400
[18073.371214] Process kwin (pid: 4639, threadinfo 81000853, task 81000840a860)
[18073.371216] Stack: 81000444ab80 0001 81000801e860 81000444ab80
[18073.371231] 0002 81000443d8e0 810008531b38 8023061e
[18073.371238] 810004441b80 0002 0001
[18073.371245] Call Trace:
[18073.371250] [8023061e] try_to_wake_up+0x2fe/0x3a0
[18073.371253] [802306cd] default_wake_function+0xd/0x10
[18073.371257] [8022daca] __wake_up_common+0x5a/0x90
[18073.371260] [8023095a] __wake_up_sync+0x4a/0x70
[18073.371264] [80602c3f] unix_write_space+0x8f/0xa0
[18073.371269] [80593359] sock_wfree+0x49/0x50
[18073.371272] [805951d9] __kfree_skb+0x69/0xe0
[18073.371275] [80595267] kfree_skb+0x17/0x30
[18073.371278] [80601347] unix_stream_recvmsg+0x267/0x610
[18073.371283] [8058e0b7] sock_aio_read+0x107/0x110
[18073.371287] [802929a1] do_sync_read+0xf1/0x130
[18073.371291] [8058e230] sock_ioctl+0x0/0x260
[18073.371295] [80251830] autoremove_wake_function+0x0/0x40
[18073.371299] [80600522] unix_ioctl+0xb2/0xf0
[18073.371302] [8058e301] sock_ioctl+0xd1/0x260
[18073.371305] [802a00c1] do_ioctl+0x31/0x90
[18073.371308] [802932f6] vfs_read+0x156/0x160
[18073.371311] [80293700] sys_read+0x50/0x90
[18073.371315] [8020bd7e] system_call+0x7e/0x83
[18073.371317]
[18073.371319]
[18073.371319] Code: 48 8b 90 20 01 00 00 48 39 93 20 01 00 00 75 e2 48 81 3b 00
[18073.371346] RIP [8023572e] check_preempt_wakeup+0x6e/0x110
[18073.371351] RSP 810008531a78
[18073.371354] CR2: 0120
[18073.371358] note: kwin[4639] exited with preempt_count 3

Here's my config:

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24-rc1
# Sun Nov 4 13:21:29 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
# CONFIG_CGROUPS is not set
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_FAIR_USER_SCHED=y
# CONFIG_FAIR_CGROUP_SCHED is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set