Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]
* Alessandro Suardi <[EMAIL PROTECTED]> wrote:

> > > ok, we just found the reason for the 8-way crash, the delta fix
> > > from Peter is below if any of you have tried the previous combo
> > > patch. Updated sched.git as well, new HEAD is
> > > fec13e45305d69fd0bd23b30bd05a0a42cf341f8.
> >
> > With the previous patch and this patch applied, the issue is not
> > reproducible here.
>
> The problem is fixed for me as well with the previous patch + the
> patch below, VKTM now enters S state and Oracle shuts down properly
> again.

thanks a lot for testing this. The scheduler queue is now looking good
in testing, will probably send a pull request to Linus later today.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]
On Tuesday, 12 of February 2008, Peter Zijlstra wrote:
>
> On Tue, 2008-02-12 at 00:12 +0100, Rafael J. Wysocki wrote:
> > On Monday, 11 of February 2008, Ingo Molnar wrote:
> > >
> > > * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> > >
> > > > no, they were not lost, they just didn't pass QA here (they crashed on
> > > > a particularly hard to debug 8-way box i have) and Peter worked on
> > > > that queue of fixes up until today to get it really correct. Could you
> > > > check:
> > > >
> > > >    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
> > > >
> > > > combo patch below as well - whichever you prefer. The shortlog can be
> > > > found below as well - but i don't yet consider this pullable, i'd like
> > > > to see it pass a full night of randconfig tests on my test-systems.
> > >
> > > ok, we just found the reason for the 8-way crash, the delta fix from
> > > Peter is below if any of you have tried the previous combo patch.
> > > Updated sched.git as well, new HEAD is
> > > fec13e45305d69fd0bd23b30bd05a0a42cf341f8.
> >
> > With the previous patch and this patch applied, the issue is not
> > reproducible here.
>
> Did you enable CONFIG_RT_GROUP_SCHED (it defaults to n)?
>
> If you didn't, could you try with it set to y?

Tested with CONFIG_RT_GROUP_SCHED set and it also works as expected.

Thanks,
Rafael
Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]
On Tue, 2008-02-12 at 15:35 +0100, Alessandro Suardi wrote:
> On Feb 12, 2008 2:44 PM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> >
> > On Tue, 2008-02-12 at 00:12 +0100, Rafael J. Wysocki wrote:
> > > On Monday, 11 of February 2008, Ingo Molnar wrote:
> > > >
> > > > * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > no, they were not lost, they just didn't pass QA here (they crashed on
> > > > > a particularly hard to debug 8-way box i have) and Peter worked on
> > > > > that queue of fixes up until today to get it really correct. Could you
> > > > > check:
> > > > >
> > > > >    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
> > > > >
> > > > > combo patch below as well - whichever you prefer. The shortlog can be
> > > > > found below as well - but i don't yet consider this pullable, i'd like
> > > > > to see it pass a full night of randconfig tests on my test-systems.
> > > >
> > > > ok, we just found the reason for the 8-way crash, the delta fix from
> > > > Peter is below if any of you have tried the previous combo patch.
> > > > Updated sched.git as well, new HEAD is
> > > > fec13e45305d69fd0bd23b30bd05a0a42cf341f8.
> > >
> > > With the previous patch and this patch applied, the issue is not
> > > reproducible here.
> >
> > Did you enable CONFIG_RT_GROUP_SCHED (it defaults to n)?
> >
> > If you didn't, could you try with it set to y?
>
> I just rebuilt 2.6.25-rc1-git2 with Ingo's patch and your patch on top,
> and the Oracle VKTM issue is still gone even with
>
> [EMAIL PROTECTED] ~]$ grep GROUP_SCHED /share/src/linux-2.6.25-rc1-git2-orafix/.config
> CONFIG_GROUP_SCHED=y
> CONFIG_FAIR_GROUP_SCHED=y
> CONFIG_RT_GROUP_SCHED=y
> # CONFIG_CGROUP_SCHED is not set
>
> so it's good for me.
>
> Or is it necessary to also enable CONFIG_CGROUP_SCHED and retest ?

No, that should be quite all right. Thanks for testing!
Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]
On Feb 12, 2008 2:44 PM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
>
> On Tue, 2008-02-12 at 00:12 +0100, Rafael J. Wysocki wrote:
> > On Monday, 11 of February 2008, Ingo Molnar wrote:
> > >
> > > * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> > >
> > > > no, they were not lost, they just didn't pass QA here (they crashed on
> > > > a particularly hard to debug 8-way box i have) and Peter worked on
> > > > that queue of fixes up until today to get it really correct. Could you
> > > > check:
> > > >
> > > >    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
> > > >
> > > > combo patch below as well - whichever you prefer. The shortlog can be
> > > > found below as well - but i don't yet consider this pullable, i'd like
> > > > to see it pass a full night of randconfig tests on my test-systems.
> > >
> > > ok, we just found the reason for the 8-way crash, the delta fix from
> > > Peter is below if any of you have tried the previous combo patch.
> > > Updated sched.git as well, new HEAD is
> > > fec13e45305d69fd0bd23b30bd05a0a42cf341f8.
> >
> > With the previous patch and this patch applied, the issue is not
> > reproducible here.
>
> Did you enable CONFIG_RT_GROUP_SCHED (it defaults to n)?
>
> If you didn't, could you try with it set to y?

I just rebuilt 2.6.25-rc1-git2 with Ingo's patch and your patch on top,
and the Oracle VKTM issue is still gone even with

[EMAIL PROTECTED] ~]$ grep GROUP_SCHED /share/src/linux-2.6.25-rc1-git2-orafix/.config
CONFIG_GROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
# CONFIG_CGROUP_SCHED is not set

so it's good for me.

Or is it necessary to also enable CONFIG_CGROUP_SCHED and retest ?

--alessandro

 "We act as though comfort and luxury were the chief requirements of
 life, when all that we need to make us really happy is something to
 be enthusiastic about."
                                        (Charles Kingsley)
Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]
On Tue, 2008-02-12 at 00:12 +0100, Rafael J. Wysocki wrote:
> On Monday, 11 of February 2008, Ingo Molnar wrote:
> >
> > * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >
> > > no, they were not lost, they just didn't pass QA here (they crashed on
> > > a particularly hard to debug 8-way box i have) and Peter worked on
> > > that queue of fixes up until today to get it really correct. Could you
> > > check:
> > >
> > >    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
> > >
> > > combo patch below as well - whichever you prefer. The shortlog can be
> > > found below as well - but i don't yet consider this pullable, i'd like
> > > to see it pass a full night of randconfig tests on my test-systems.
> >
> > ok, we just found the reason for the 8-way crash, the delta fix from
> > Peter is below if any of you have tried the previous combo patch.
> > Updated sched.git as well, new HEAD is
> > fec13e45305d69fd0bd23b30bd05a0a42cf341f8.
>
> With the previous patch and this patch applied, the issue is not
> reproducible here.

Did you enable CONFIG_RT_GROUP_SCHED (it defaults to n)?

If you didn't, could you try with it set to y?
Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]
On Feb 12, 2008 12:12 AM, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> On Monday, 11 of February 2008, Ingo Molnar wrote:
> >
> > * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >
> > > no, they were not lost, they just didn't pass QA here (they crashed on
> > > a particularly hard to debug 8-way box i have) and Peter worked on
> > > that queue of fixes up until today to get it really correct. Could you
> > > check:
> > >
> > >    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
> > >
> > > combo patch below as well - whichever you prefer. The shortlog can be
> > > found below as well - but i don't yet consider this pullable, i'd like
> > > to see it pass a full night of randconfig tests on my test-systems.
> >
> > ok, we just found the reason for the 8-way crash, the delta fix from
> > Peter is below if any of you have tried the previous combo patch.
> > Updated sched.git as well, new HEAD is
> > fec13e45305d69fd0bd23b30bd05a0a42cf341f8.
>
> With the previous patch and this patch applied, the issue is not
> reproducible here.
>
> Thanks,
> Rafael

The problem is fixed for me as well with the previous patch + the
patch below, VKTM now enters S state and Oracle shuts down properly
again.

Thanks !

> > Index: linux-2.6/kernel/sched.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/sched.c
> > +++ linux-2.6/kernel/sched.c
> > @@ -219,6 +219,10 @@ static void start_rt_bandwidth(struct rt
> >  	if (rt_b->rt_runtime == RUNTIME_INF)
> >  		return;
> >
> > +	if (hrtimer_active(&rt_b->rt_period_timer))
> > +		return;
> > +
> > +	spin_lock(&rt_b->rt_runtime_lock);
> >  	for (;;) {
> >  		if (hrtimer_active(&rt_b->rt_period_timer))
> >  			break;
> > @@ -229,6 +233,7 @@ static void start_rt_bandwidth(struct rt
> >  			      rt_b->rt_period_timer.expires,
> >  			      HRTIMER_MODE_ABS);
> >  	}
> > +	spin_unlock(&rt_b->rt_runtime_lock);
> >  }
> >
> >  #ifdef CONFIG_RT_GROUP_SCHED

--alessandro
Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]
On Monday, 11 of February 2008, Ingo Molnar wrote:
>
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> > no, they were not lost, they just didn't pass QA here (they crashed on
> > a particularly hard to debug 8-way box i have) and Peter worked on
> > that queue of fixes up until today to get it really correct. Could you
> > check:
> >
> >    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
> >
> > combo patch below as well - whichever you prefer. The shortlog can be
> > found below as well - but i don't yet consider this pullable, i'd like
> > to see it pass a full night of randconfig tests on my test-systems.
>
> ok, we just found the reason for the 8-way crash, the delta fix from
> Peter is below if any of you have tried the previous combo patch.
> Updated sched.git as well, new HEAD is
> fec13e45305d69fd0bd23b30bd05a0a42cf341f8.

With the previous patch and this patch applied, the issue is not
reproducible here.

Thanks,
Rafael

> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -219,6 +219,10 @@ static void start_rt_bandwidth(struct rt
>  	if (rt_b->rt_runtime == RUNTIME_INF)
>  		return;
>
> +	if (hrtimer_active(&rt_b->rt_period_timer))
> +		return;
> +
> +	spin_lock(&rt_b->rt_runtime_lock);
>  	for (;;) {
>  		if (hrtimer_active(&rt_b->rt_period_timer))
>  			break;
> @@ -229,6 +233,7 @@ static void start_rt_bandwidth(struct rt
>  			      rt_b->rt_period_timer.expires,
>  			      HRTIMER_MODE_ABS);
>  	}
> +	spin_unlock(&rt_b->rt_runtime_lock);
>  }
>
>  #ifdef CONFIG_RT_GROUP_SCHED
Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]
* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> no, they were not lost, they just didn't pass QA here (they crashed on
> a particularly hard to debug 8-way box i have) and Peter worked on
> that queue of fixes up until today to get it really correct. Could you
> check:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
>
> combo patch below as well - whichever you prefer. The shortlog can be
> found below as well - but i don't yet consider this pullable, i'd like
> to see it pass a full night of randconfig tests on my test-systems.

ok, we just found the reason for the 8-way crash, the delta fix from
Peter is below if any of you have tried the previous combo patch.
Updated sched.git as well, new HEAD is
fec13e45305d69fd0bd23b30bd05a0a42cf341f8.

	Ingo

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -219,6 +219,10 @@ static void start_rt_bandwidth(struct rt
 	if (rt_b->rt_runtime == RUNTIME_INF)
 		return;
 
+	if (hrtimer_active(&rt_b->rt_period_timer))
+		return;
+
+	spin_lock(&rt_b->rt_runtime_lock);
 	for (;;) {
 		if (hrtimer_active(&rt_b->rt_period_timer))
 			break;
@@ -229,6 +233,7 @@ static void start_rt_bandwidth(struct rt
 			      rt_b->rt_period_timer.expires,
 			      HRTIMER_MODE_ABS);
 	}
+	spin_unlock(&rt_b->rt_runtime_lock);
 }
 
 #ifdef CONFIG_RT_GROUP_SCHED
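The delta patch in this message adds an unlocked hrtimer_active() test ahead of the loop and then holds rt_runtime_lock around the loop that arms the period timer. A minimal userspace sketch of that "check, lock, re-check" shape follows. It is not kernel code and makes no claim about why the 8-way box crashed: struct period_timer, period_timer_init() and period_timer_start() are hypothetical names, a pthread mutex stands in for the spinlock, and an atomic flag stands in for hrtimer_active().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the timer part of struct rt_bandwidth. */
struct period_timer {
	pthread_mutex_t lock;	/* plays the role of rt_runtime_lock */
	atomic_bool active;	/* plays the role of hrtimer_active() */
	int arm_count;		/* how often the timer was armed; guarded by lock */
};

static void period_timer_init(struct period_timer *t)
{
	assert(t != NULL);
	pthread_mutex_init(&t->lock, NULL);
	atomic_init(&t->active, false);
	t->arm_count = 0;
}

/* Arm the timer unless it is already running. */
static void period_timer_start(struct period_timer *t)
{
	assert(t != NULL);

	/* Fast path: skip the lock entirely when the timer is already active. */
	if (atomic_load(&t->active))
		return;

	pthread_mutex_lock(&t->lock);
	/* Re-check under the lock: another caller may have armed it meanwhile. */
	if (!atomic_load(&t->active)) {
		t->arm_count++;
		atomic_store(&t->active, true);
	}
	pthread_mutex_unlock(&t->lock);
}
```

Calling period_timer_start() twice on a freshly initialised timer leaves arm_count at 1: the second call returns on the unlocked fast path.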
Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]
On Monday, 11 of February 2008, Linus Torvalds wrote:
>
> On Mon, 11 Feb 2008, Rafael J. Wysocki wrote:
>
> > On Monday, 11 of February 2008, Alessandro Suardi wrote:
> > > >
> > > > 2.6.24-git1 is okay
> > > > 2.6.24-git2 is bad
>
> Ok, that's git ID's
>
> 	b47711bfbcd4eb77ca61ef0162487b20e023ae55 2.6.24-git1
> 	9b73e76f3cf63379dcf45fcd4f112f5812418d0a 2.6.24-git2
>
> so if you get a git tree, you can do
>
> 	gitk b47711b..9b73e76
>
> to see what happened in there.
>
> However, the obvious candidates are the scheduler or the ocfs2 merge, and
> the latter is only relevant in case you use ocfs2, of course.
>
> The rest of it tends to be the DVB and SCSI updates.
>
> But it would be great if you could do a bisect and verify. Just do
>
> 	git bisect start
> 	git bisect good b47711bfbcd4eb77ca61ef0162487b20e023ae55
> 	git bisect bad 9b73e76f3cf63379dcf45fcd4f112f5812418d0a
>
> and off you go..

Well, I've already bisected that down to commit
6f505b16425a51270058e4a93441fe64de3dd435 "sched: rt group scheduling" and
provided a simple test case.

Moreover, there are patches from Peter that fix the problem, but they are lost
somewhere on the way from him to you (please see
http://lkml.org/lkml/2008/2/5/535 and http://lkml.org/lkml/2008/2/6/320).

Thanks,
Rafael
Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]
On Mon, 11 Feb 2008, Rafael J. Wysocki wrote:

> On Monday, 11 of February 2008, Alessandro Suardi wrote:
> > >
> > > 2.6.24-git1 is okay
> > > 2.6.24-git2 is bad

Ok, that's git ID's

	b47711bfbcd4eb77ca61ef0162487b20e023ae55 2.6.24-git1
	9b73e76f3cf63379dcf45fcd4f112f5812418d0a 2.6.24-git2

so if you get a git tree, you can do

	gitk b47711b..9b73e76

to see what happened in there.

However, the obvious candidates are the scheduler or the ocfs2 merge, and
the latter is only relevant in case you use ocfs2, of course.

The rest of it tends to be the DVB and SCSI updates.

But it would be great if you could do a bisect and verify. Just do

	git bisect start
	git bisect good b47711bfbcd4eb77ca61ef0162487b20e023ae55
	git bisect bad 9b73e76f3cf63379dcf45fcd4f112f5812418d0a

and off you go..

		Linus
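The bisect suggested in this message narrows a known-good/known-bad pair down to the first bad commit by repeated halving. A rough sketch of that search is below, under a simplifying assumption that history is a linear, numbered sequence (git bisect itself walks the real commit graph). first_bad(), is_bad_fn and bad_from() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Callback that builds and tests commit number `index`; nonzero means bad. */
typedef int (*is_bad_fn)(size_t index, void *ctx);

/*
 * Given a known-good index and a known-bad index (good < bad), return the
 * index of the first bad commit. Each loop iteration corresponds to one
 * build-and-test round; the number of rounds is stored in *steps when
 * steps is non-NULL.
 */
static size_t first_bad(size_t good, size_t bad, is_bad_fn is_bad,
			void *ctx, size_t *steps)
{
	assert(good < bad);

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;	/* regression is at mid or earlier */
		else
			good = mid;	/* regression is after mid */
		if (steps)
			(*steps)++;
	}
	return bad;
}

/* Example predicate: every commit from *ctx onwards shows the regression. */
static int bad_from(size_t index, void *ctx)
{
	return index >= *(const size_t *)ctx;
}
```

With a culprit at index 37 between a good index 0 and a bad index 100, first_bad(0, 100, bad_from, &culprit, &steps) returns 37 after a handful of rounds, which is why bisecting even a large merge window stays practical.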
Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]
On Monday, 11 of February 2008, Alessandro Suardi wrote:
> On Feb 9, 2008 6:10 PM, Alessandro Suardi <[EMAIL PROTECTED]> wrote:
> > I finally had a bit of time to try out different kernel versions to find
> > out where this began... and it's in 2.6.24-git2.
> >
> > What happens: Oracle 11g starts up and forks a number of so
> > called background processes. Starting in 2.6.24-git2 the VKTM
> > process never fully completes its initialization but gets in R state,
> > never accumulating CPU, and can't be straced/gdb'd/killed.
> >
> > Sysrq-T reports for VKTM looks like this
> >
> > Feb 9 16:10:46 sandman kernel: ===
> > Feb 9 16:10:46 sandman kernel: oracle R running 2684 2258 1
> > Feb 9 16:10:46 sandman kernel: f591dfb0 00200086 f6bbc3c4 f6863cc0 c010547a b794f62c b7b70600
> > Feb 9 16:10:46 sandman kernel: b79453dc f591d000 c0103caa b794f62c b7943708 b79453e4 b7b70600 b79453dc
> > Feb 9 16:10:46 sandman kernel: bfb0dd5c b79500b0 007b 007b c032 0e072d7a 0073
> > Feb 9 16:10:46 sandman kernel: Call Trace:
> > Feb 9 16:10:46 sandman kernel: [<c010547a>] ? do_IRQ+0xac/0xc1
> > Feb 9 16:10:46 sandman kernel: [<c0103caa>] work_resched+0x5/0x16
> > Feb 9 16:10:46 sandman kernel: [<c032>] ? pci_setup+0xb3/0x104
> > Feb 9 16:10:46 sandman kernel: ===
> >
> >
> > 2.6.24-git1 is okay
> > 2.6.24-git2 is bad
> > ...
> > 2.6.24-git20 is bad
> >
> > Only differences in kernel .config between -git1 and -git2 are
> >
> > [EMAIL PROTECTED] src]$ diff .config-2.6.24-git[12]
> > 3,4c3,4
> > < # Linux kernel version: 2.6.24-git1
> > < # Sat Jan 26 01:04:43 2008
> > ---
> > > # Linux kernel version: 2.6.24-git2
> > > # Sat Jan 26 12:10:15 2008
> > 121a122,123
> > > CONFIG_CLASSIC_RCU=y
> > > # CONFIG_PREEMPT_RCU is not set
> > 187a190
> > > # CONFIG_RCU_TRACE is not set
> > 230a234
> > > # CONFIG_SCHED_HRTICK is not set
> > 755a760
> > > # CONFIG_PATA_NINJA32 is not set
> > 1807a1813
> > > # CONFIG_LATENCYTOP is not set
> >
> > Symptom is similar to what Rafael reported here
> >
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0801.3/4114.html
> >
> > and similarly VKTM attempts to run at elevated priority as normal
> > user process (Oracle kernel binary is not setuid root).

Yes, I think this is the same problem.

Please try to unset CONFIG_GROUP_SCHED and see if that helps.

> > Peter Zijlstra's patches mentioned in the above thread, at
> >
> > http://programming.kicks-ass.net/kernel-patches/sched-rt-group ,
> >
> > do not appear to be in -git20 yet.
> >
> >
> > I'm available for further testing. Thanks, ciao,
>
> Only to add that 2.6.25-rc1 is still broken.

Yes, it is.

Thanks,
Rafael
Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]
On Feb 9, 2008 6:10 PM, Alessandro Suardi <[EMAIL PROTECTED]> wrote:
> I finally had a bit of time to try out different kernel versions to find
> out where this began... and it's in 2.6.24-git2.
>
> What happens: Oracle 11g starts up and forks a number of so
> called background processes. Starting in 2.6.24-git2 the VKTM
> process never fully completes its initialization but gets in R state,
> never accumulating CPU, and can't be straced/gdb'd/killed.
>
> Sysrq-T reports for VKTM looks like this
>
> Feb 9 16:10:46 sandman kernel: ===
> Feb 9 16:10:46 sandman kernel: oracle R running 2684 2258 1
> Feb 9 16:10:46 sandman kernel: f591dfb0 00200086 f6bbc3c4 f6863cc0 c010547a b794f62c b7b70600
> Feb 9 16:10:46 sandman kernel: b79453dc f591d000 c0103caa b794f62c b7943708 b79453e4 b7b70600 b79453dc
> Feb 9 16:10:46 sandman kernel: bfb0dd5c b79500b0 007b 007b c032 0e072d7a 0073
> Feb 9 16:10:46 sandman kernel: Call Trace:
> Feb 9 16:10:46 sandman kernel: [<c010547a>] ? do_IRQ+0xac/0xc1
> Feb 9 16:10:46 sandman kernel: [<c0103caa>] work_resched+0x5/0x16
> Feb 9 16:10:46 sandman kernel: [<c032>] ? pci_setup+0xb3/0x104
> Feb 9 16:10:46 sandman kernel: ===
>
>
> 2.6.24-git1 is okay
> 2.6.24-git2 is bad
> ...
> 2.6.24-git20 is bad
>
> Only differences in kernel .config between -git1 and -git2 are
>
> [EMAIL PROTECTED] src]$ diff .config-2.6.24-git[12]
> 3,4c3,4
> < # Linux kernel version: 2.6.24-git1
> < # Sat Jan 26 01:04:43 2008
> ---
> > # Linux kernel version: 2.6.24-git2
> > # Sat Jan 26 12:10:15 2008
> 121a122,123
> > CONFIG_CLASSIC_RCU=y
> > # CONFIG_PREEMPT_RCU is not set
> 187a190
> > # CONFIG_RCU_TRACE is not set
> 230a234
> > # CONFIG_SCHED_HRTICK is not set
> 755a760
> > # CONFIG_PATA_NINJA32 is not set
> 1807a1813
> > # CONFIG_LATENCYTOP is not set
>
> Symptom is similar to what Rafael reported here
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0801.3/4114.html
>
> and similarly VKTM attempts to run at elevated priority as normal
> user process (Oracle kernel binary is not setuid root).
>
>
> Peter Zijlstra's patches mentioned in the above thread, at
>
> http://programming.kicks-ass.net/kernel-patches/sched-rt-group ,
>
> do not appear to be in -git20 yet.
>
>
> I'm available for further testing. Thanks, ciao,

Only to add that 2.6.25-rc1 is still broken.

thanks,

--alessandro
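The report in this message hinges on the VKTM task sitting in R state (and, once fixed, in S state). One way to read a task's state letter from userspace, without ps, is the field that follows the parenthesised command name in /proc/<pid>/stat. The sketch below assumes a Linux system with /proc mounted; proc_state() and self_state() are hypothetical helper names, not part of any tool mentioned in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return the scheduler state letter of `pid` ('R', 'S', 'D', ...) as shown
 * in /proc/<pid>/stat, or '\0' if the file cannot be read or parsed.
 */
static char proc_state(pid_t pid)
{
	char path[64];
	char buf[512];
	size_t n;
	char *p;
	FILE *f;

	assert(pid > 0);

	snprintf(path, sizeof(path), "/proc/%ld/stat", (long)pid);
	f = fopen(path, "r");
	if (!f)
		return '\0';
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';

	/*
	 * The command name is wrapped in parentheses and may itself contain
	 * spaces or ')', so locate the last ')' and take the field after it.
	 */
	p = strrchr(buf, ')');
	if (!p || p[1] != ' ' || p[2] == '\0')
		return '\0';
	return p[2];
}

/* State of the calling process; it is running while it asks, so 'R'. */
static char self_state(void)
{
	return proc_state(getpid());
}
```

Polling proc_state() for the VKTM pid after startup would show whether the task stays in 'R' (the regression) or settles into 'S' (the behaviour reported once the patches were applied).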
Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]
On Mon, 11 Feb 2008, Rafael J. Wysocki wrote: On Monday, 11 of February 2008, Alessandro Suardi wrote: 2.6.24-git1 is okay 2.6.24-git2 is bad Ok, that's git ID's b47711bfbcd4eb77ca61ef0162487b20e023ae55 2.6.24-git1 9b73e76f3cf63379dcf45fcd4f112f5812418d0a 2.6.24-git2 so if you get a git tree, you can do gitk b47711b..9b73e76 to see what happened in there. However, the obvious candidates are the scheduler or the ocfs2 merge, and the latter is only relevant in case you use ocfs2, of course. The rest of it tends to be the DVB and SCSI updates. But it would be great if you could do a bisect and verify. Just do git bisect start git bisect good b47711bfbcd4eb77ca61ef0162487b20e023ae55 git bisect bad 9b73e76f3cf63379dcf45fcd4f112f5812418d0a and off you go.. Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]
On Monday, 11 of February 2008, Alessandro Suardi wrote: On Feb 9, 2008 6:10 PM, Alessandro Suardi [EMAIL PROTECTED] wrote: I finally had a bit of time to try out different kernel versions to find out where this began... and it's in 2.6.24-git2. What happens: Oracle 11g starts up and forks a number of so called background processes. Starting in 2.6.24-git2 the VKTM process never fully completes its initialization but gets in R state, never accumulating CPU, and can't be straced/gdb'd/killed. Sysrq-T reports for VKTM looks like this Feb 9 16:10:46 sandman kernel: === Feb 9 16:10:46 sandman kernel: oracleR running 2684 2258 1 Feb 9 16:10:46 sandman kernel:f591dfb0 00200086 f6bbc3c4 f6863cc0 c010547a b794f62c b7b70600 Feb 9 16:10:46 sandman kernel:b79453dc f591d000 c0103caa b794f62c b7943708 b79453e4 b7b70600 b79453dc Feb 9 16:10:46 sandman kernel:bfb0dd5c b79500b0 007b 007b c032 0e072d7a 0073 Feb 9 16:10:46 sandman kernel: Call Trace: Feb 9 16:10:46 sandman kernel: [c010547a] ? do_IRQ+0xac/0xc1 Feb 9 16:10:46 sandman kernel: [c0103caa] work_resched+0x5/0x16 Feb 9 16:10:46 sandman kernel: [c032] ? pci_setup+0xb3/0x104 Feb 9 16:10:46 sandman kernel: === 2.6.24-git1 is okay 2.6.24-git2 is bad ... 2.6.24-git20 is bad Only differences in kernel .config between -git1 and -git2 are [EMAIL PROTECTED] src]$ diff .config-2.6.24-git[12] 3,4c3,4 # Linux kernel version: 2.6.24-git1 # Sat Jan 26 01:04:43 2008 --- # Linux kernel version: 2.6.24-git2 # Sat Jan 26 12:10:15 2008 121a122,123 CONFIG_CLASSIC_RCU=y # CONFIG_PREEMPT_RCU is not set 187a190 # CONFIG_RCU_TRACE is not set 230a234 # CONFIG_SCHED_HRTICK is not set 755a760 # CONFIG_PATA_NINJA32 is not set 1807a1813 # CONFIG_LATENCYTOP is not set Symptom is similar to what Rafael reported here http://www.ussg.iu.edu/hypermail/linux/kernel/0801.3/4114.html and similarly VKTM attempts to run at elevated priority as normal user process (Oracle kernel binary is not setuid root). Yes, I think this is the same problem. 
Please try to unset CONFIG_GROUP_SCHED and see if that helps. Peter Zijlstra's patches mentioned in the above thread, at http://programming.kicks-ass.net/kernel-patches/sched-rt-group , do not appear to be in -git20 yet. I'm available for further testing. Thanks, ciao, Only to add that 2.6.25-rc1 is still broken. Yes, it is. Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]
On Monday, 11 of February 2008, Ingo Molnar wrote:
>
> * Ingo Molnar [EMAIL PROTECTED] wrote:
>
> > no, they were not lost, they just didn't pass QA here (they crashed
> > on a particularly hard to debug 8-way box i have) and Peter worked
> > on that queue of fixes up until today to get it really correct.
> > Could you check:
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
> >
> > combo patch below as well - whichever you prefer. The shortlog can
> > be found below as well - but i don't yet consider this pullable,
> > i'd like to see it pass a full night of randconfig tests on my
> > test-systems.
>
> ok, we just found the reason for the 8-way crash, the delta fix from
> Peter is below if any of you have tried the previous combo patch.
> Updated sched.git as well, new HEAD is
> fec13e45305d69fd0bd23b30bd05a0a42cf341f8.

With the previous patch and this patch applied, the issue is not
reproducible here.

Thanks,
Rafael

> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -219,6 +219,10 @@ static void start_rt_bandwidth(struct rt
>  	if (rt_b->rt_runtime == RUNTIME_INF)
>  		return;
>  
> +	if (hrtimer_active(&rt_b->rt_period_timer))
> +		return;
> +
> +	spin_lock(&rt_b->rt_runtime_lock);
>  	for (;;) {
>  		if (hrtimer_active(&rt_b->rt_period_timer))
>  			break;
> @@ -229,6 +233,7 @@ static void start_rt_bandwidth(struct rt
>  			      rt_b->rt_period_timer.expires,
>  			      HRTIMER_MODE_ABS);
>  	}
> +	spin_unlock(&rt_b->rt_runtime_lock);
>  }
>  
>  #ifdef CONFIG_RT_GROUP_SCHED
Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]
On Feb 12, 2008 12:12 AM, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
> On Monday, 11 of February 2008, Ingo Molnar wrote:
> >
> > * Ingo Molnar [EMAIL PROTECTED] wrote:
> >
> > > no, they were not lost, they just didn't pass QA here (they
> > > crashed on a particularly hard to debug 8-way box i have) and
> > > Peter worked on that queue of fixes up until today to get it
> > > really correct. Could you check:
> > >
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
> > >
> > > combo patch below as well - whichever you prefer. The shortlog
> > > can be found below as well - but i don't yet consider this
> > > pullable, i'd like to see it pass a full night of randconfig
> > > tests on my test-systems.
> >
> > ok, we just found the reason for the 8-way crash, the delta fix
> > from Peter is below if any of you have tried the previous combo
> > patch. Updated sched.git as well, new HEAD is
> > fec13e45305d69fd0bd23b30bd05a0a42cf341f8.
>
> With the previous patch and this patch applied, the issue is not
> reproducible here.
>
> Thanks,
> Rafael

The problem is fixed for me as well with the previous patch + the
patch below, VKTM now enters S state and Oracle shuts down properly
again.

Thanks!

> > Index: linux-2.6/kernel/sched.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/sched.c
> > +++ linux-2.6/kernel/sched.c
> > @@ -219,6 +219,10 @@ static void start_rt_bandwidth(struct rt
> >  	if (rt_b->rt_runtime == RUNTIME_INF)
> >  		return;
> >  
> > +	if (hrtimer_active(&rt_b->rt_period_timer))
> > +		return;
> > +
> > +	spin_lock(&rt_b->rt_runtime_lock);
> >  	for (;;) {
> >  		if (hrtimer_active(&rt_b->rt_period_timer))
> >  			break;
> > @@ -229,6 +233,7 @@ static void start_rt_bandwidth(struct rt
> >  			      rt_b->rt_period_timer.expires,
> >  			      HRTIMER_MODE_ABS);
> >  	}
> > +	spin_unlock(&rt_b->rt_runtime_lock);
> >  }
> >  
> >  #ifdef CONFIG_RT_GROUP_SCHED

--alessandro

 We act as though comfort and luxury were the chief requirements
 of life, when all that we need to make us really happy is
 something to be enthusiastic about.
   (Charles Kingsley)
Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]
* Ingo Molnar [EMAIL PROTECTED] wrote:

> no, they were not lost, they just didn't pass QA here (they crashed
> on a particularly hard to debug 8-way box i have) and Peter worked on
> that queue of fixes up until today to get it really correct. Could
> you check:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
>
> combo patch below as well - whichever you prefer. The shortlog can be
> found below as well - but i don't yet consider this pullable, i'd
> like to see it pass a full night of randconfig tests on my
> test-systems.

ok, we just found the reason for the 8-way crash, the delta fix from
Peter is below if any of you have tried the previous combo patch.
Updated sched.git as well, new HEAD is
fec13e45305d69fd0bd23b30bd05a0a42cf341f8.

	Ingo

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -219,6 +219,10 @@ static void start_rt_bandwidth(struct rt
 	if (rt_b->rt_runtime == RUNTIME_INF)
 		return;
 
+	if (hrtimer_active(&rt_b->rt_period_timer))
+		return;
+
+	spin_lock(&rt_b->rt_runtime_lock);
 	for (;;) {
 		if (hrtimer_active(&rt_b->rt_period_timer))
 			break;
@@ -229,6 +233,7 @@ static void start_rt_bandwidth(struct rt
 			      rt_b->rt_period_timer.expires,
 			      HRTIMER_MODE_ABS);
 	}
+	spin_unlock(&rt_b->rt_runtime_lock);
 }
 
 #ifdef CONFIG_RT_GROUP_SCHED
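For readers following the delta fix: it adds an unlocked early return when the period timer is already active and wraps the existing arming loop, which tests the timer again, in rt_runtime_lock. Below is a minimal userspace sketch of that check, lock, re-check shape. It is an illustration only, not kernel code: the demo_* names are hypothetical, a pthread mutex stands in for the spinlock, an atomic flag stands in for hrtimer_active(), and the kernel's retry loop is reduced to a single re-check.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_MAX_THREADS 64

/* Hypothetical userspace stand-in for the period timer state. */
struct demo_bandwidth {
	pthread_mutex_t lock;     /* plays the role of rt_runtime_lock */
	atomic_bool timer_active; /* plays the role of hrtimer_active() */
	int starts;               /* how often the "timer" was armed */
};

/* Arm the "timer" at most once, however many threads race to do it. */
static void demo_start_bandwidth(struct demo_bandwidth *b)
{
	/* Fast path: skip the lock if the timer is already running. */
	if (atomic_load(&b->timer_active))
		return;

	pthread_mutex_lock(&b->lock);
	/* Re-check under the lock: another thread may have won the race. */
	if (!atomic_load(&b->timer_active)) {
		b->starts++;
		atomic_store(&b->timer_active, true);
	}
	pthread_mutex_unlock(&b->lock);
}

static void *demo_worker(void *arg)
{
	demo_start_bandwidth(arg);
	return NULL;
}

/* Run nthreads concurrent callers; return how often the timer was armed. */
int demo_run(int nthreads)
{
	struct demo_bandwidth b;
	pthread_t tid[DEMO_MAX_THREADS];
	int i, created = 0;

	assert(nthreads > 0 && nthreads <= DEMO_MAX_THREADS);

	pthread_mutex_init(&b.lock, NULL);
	atomic_init(&b.timer_active, false);
	b.starts = 0;

	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tid[created], NULL, demo_worker, &b) != 0)
			break;
		created++;
	}
	for (i = 0; i < created; i++)
		pthread_join(tid[i], NULL);

	pthread_mutex_destroy(&b.lock);
	return b.starts;
}
```

Calling demo_run(8) from a small main() and printing the result shows whether the timer was armed more than once. Without the re-check under the lock, two callers could both pass the unlocked test and both arm it.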
Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]
On Feb 9, 2008 6:10 PM, Alessandro Suardi [EMAIL PROTECTED] wrote:
> I finally had a bit of time to try out different kernel versions to
> find out where this began... and it's in 2.6.24-git2.
>
> What happens: Oracle 11g starts up and forks a number of so-called
> background processes. Starting in 2.6.24-git2 the VKTM process never
> fully completes its initialization but gets in R state, never
> accumulating CPU, and can't be straced/gdb'd/killed.
>
> Sysrq-T report for VKTM looks like this:
>
> Feb 9 16:10:46 sandman kernel: ===
> Feb 9 16:10:46 sandman kernel: oracle R running 2684 2258 1
> Feb 9 16:10:46 sandman kernel: f591dfb0 00200086 f6bbc3c4 f6863cc0 c010547a b794f62c b7b70600
> Feb 9 16:10:46 sandman kernel: b79453dc f591d000 c0103caa b794f62c b7943708 b79453e4 b7b70600 b79453dc
> Feb 9 16:10:46 sandman kernel: bfb0dd5c b79500b0 007b 007b c032 0e072d7a 0073
> Feb 9 16:10:46 sandman kernel: Call Trace:
> Feb 9 16:10:46 sandman kernel: [c010547a] ? do_IRQ+0xac/0xc1
> Feb 9 16:10:46 sandman kernel: [c0103caa] work_resched+0x5/0x16
> Feb 9 16:10:46 sandman kernel: [c032] ? pci_setup+0xb3/0x104
> Feb 9 16:10:46 sandman kernel: ===
>
> 2.6.24-git1 is okay
> 2.6.24-git2 is bad
> ...
> 2.6.24-git20 is bad
>
> Only differences in kernel .config between -git1 and -git2 are
>
> [EMAIL PROTECTED] src]$ diff .config-2.6.24-git[12]
> 3,4c3,4
> < # Linux kernel version: 2.6.24-git1
> < # Sat Jan 26 01:04:43 2008
> ---
> > # Linux kernel version: 2.6.24-git2
> > # Sat Jan 26 12:10:15 2008
> 121a122,123
> > CONFIG_CLASSIC_RCU=y
> > # CONFIG_PREEMPT_RCU is not set
> 187a190
> > # CONFIG_RCU_TRACE is not set
> 230a234
> > # CONFIG_SCHED_HRTICK is not set
> 755a760
> > # CONFIG_PATA_NINJA32 is not set
> 1807a1813
> > # CONFIG_LATENCYTOP is not set
>
> Symptom is similar to what Rafael reported here
>
>   http://www.ussg.iu.edu/hypermail/linux/kernel/0801.3/4114.html
>
> and similarly VKTM attempts to run at elevated priority as normal
> user process (Oracle kernel binary is not setuid root).
>
> Peter Zijlstra's patches mentioned in the above thread, at
>
>   http://programming.kicks-ass.net/kernel-patches/sched-rt-group
>
> do not appear to be in -git20 yet.
>
> I'm available for further testing. Thanks, ciao,

Only to add that 2.6.25-rc1 is still broken.

thanks,

--alessandro

 We act as though comfort and luxury were the chief requirements
 of life, when all that we need to make us really happy is
 something to be enthusiastic about.
   (Charles Kingsley)
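The report above says VKTM tries to run at elevated priority as an ordinary user process. The thread does not say which call VKTM makes, and this is not Rafael's test case. As a rough sketch only, assuming "elevated priority" means a request for a realtime policy such as SCHED_FIFO, an unprivileged request could look like the following. The demo_* names are hypothetical; sched_setscheduler() and sched_get_priority_min() are the standard POSIX calls.

```c
#include <errno.h>
#include <sched.h>
#include <string.h>

/* Map a scheduling policy constant to a printable name. */
const char *demo_policy_name(int policy)
{
	switch (policy) {
	case SCHED_OTHER:
		return "SCHED_OTHER";
	case SCHED_FIFO:
		return "SCHED_FIFO";
	case SCHED_RR:
		return "SCHED_RR";
	default:
		return "unknown";
	}
}

/*
 * Ask the kernel to move the calling process to SCHED_FIFO at the
 * lowest realtime priority. Returns 0 on success, or the errno value
 * on failure (EPERM is the usual answer for an unprivileged process
 * that has no realtime priority allowance).
 */
int demo_request_realtime(void)
{
	struct sched_param sp;

	memset(&sp, 0, sizeof(sp));
	sp.sched_priority = sched_get_priority_min(SCHED_FIFO);

	if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
		return errno;
	return 0;
}
```

A caller can print demo_policy_name(sched_getscheduler(0)) before and after demo_request_realtime() to see whether the policy changed. On a kernel showing the problem described in this thread, treat any such experiment with care and run it on a test box.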
Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]
On Monday, 11 of February 2008, Linus Torvalds wrote:
>
> On Mon, 11 Feb 2008, Rafael J. Wysocki wrote:
> >
> > On Monday, 11 of February 2008, Alessandro Suardi wrote:
> > >
> > > 2.6.24-git1 is okay
> > > 2.6.24-git2 is bad
>
> Ok, that's git ID's
>
>   b47711bfbcd4eb77ca61ef0162487b20e023ae55 2.6.24-git1
>   9b73e76f3cf63379dcf45fcd4f112f5812418d0a 2.6.24-git2
>
> so if you get a git tree, you can do
>
>   gitk b47711b..9b73e76
>
> to see what happened in there.
>
> However, the obvious candidates are the scheduler or the ocfs2 merge,
> and the latter is only relevant in case you use ocfs2, of course. The
> rest of it tends to be the DVB and SCSI updates.
>
> But it would be great if you could do a bisect and verify. Just do
>
>   git bisect start
>   git bisect good b47711bfbcd4eb77ca61ef0162487b20e023ae55
>   git bisect bad 9b73e76f3cf63379dcf45fcd4f112f5812418d0a
>
> and off you go..

Well, I've already bisected that down to commit
6f505b16425a51270058e4a93441fe64de3dd435

    sched: rt group scheduling

and provided a simple test case.

Moreover, there are patches from Peter that fix the problem, but they
are lost somewhere on the way from him to you (please see
http://lkml.org/lkml/2008/2/5/535 and
http://lkml.org/lkml/2008/2/6/320).

Thanks,
Rafael