Re: [1/6] 2.6.21-rc4: known regressions
On Fri, 2007-03-23 at 11:28 -0700, Linus Torvalds wrote:
>
> On Fri, 23 Mar 2007, Linus Torvalds wrote:
> >
> > Thomas, please fix.
>
> Here's a possible fix. It compiles. And I still wish we had common files.

You beat me by 30 seconds.

> ia64 shouldn't be affected, because ia64 doesn't #define the
> ARCH_APICTIMER_STOPS_ON_C3 flag (and then we don't use the "c2_ok" thing
> either).

Right, ia64 does not see it.

> But this is still pretty damn ugly.

Yes it is.

> Maybe a field in "struct acpi_processor" for C2/C3 problems?

Hmm, the acpi processor stuff is modular.

	tglx

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [1/6] 2.6.21-rc4: known regressions
On Fri, 23 Mar 2007, Linus Torvalds wrote:
>
> Thomas, please fix.

Here's a possible fix. It compiles. And I still wish we had common files.

ia64 shouldn't be affected, because ia64 doesn't #define the
ARCH_APICTIMER_STOPS_ON_C3 flag (and then we don't use the "c2_ok" thing
either).

But this is still pretty damn ugly. Maybe a field in "struct acpi_processor"
for C2/C3 problems?

		Linus

---
diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
index 723417d..46acf4f 100644
--- a/arch/x86_64/kernel/apic.c
+++ b/arch/x86_64/kernel/apic.c
@@ -47,6 +47,10 @@ int apic_calibrate_pmtmr __initdata;
 
 int disable_apic_timer __initdata;
 
+/* Local APIC timer works in C2? */
+int local_apic_timer_c2_ok;
+EXPORT_SYMBOL_GPL(local_apic_timer_c2_ok);
+
 static struct resource *ioapic_resources;
 static struct resource lapic_resource = {
 	.name = "Local APIC",
@@ -1192,6 +1196,13 @@ static __init int setup_nolapic(char *str)
 }
 early_param("nolapic", setup_nolapic);
 
+static int __init parse_lapic_timer_c2_ok(char *arg)
+{
+	local_apic_timer_c2_ok = 1;
+	return 0;
+}
+early_param("lapic_timer_c2_ok", parse_lapic_timer_c2_ok);
+
 static __init int setup_noapictimer(char *str)
 {
 	if (str[0] != ' ' && str[0] != 0)
diff --git a/include/asm-x86_64/apic.h b/include/asm-x86_64/apic.h
index e81d0f2..7cfb39c 100644
--- a/include/asm-x86_64/apic.h
+++ b/include/asm-x86_64/apic.h
@@ -102,5 +102,6 @@ void switch_ipi_to_APIC_timer(void *cpumask);
 #define ARCH_APICTIMER_STOPS_ON_C3	1
 
 extern unsigned boot_cpu_id;
+extern int local_apic_timer_c2_ok;
 
 #endif /* __ASM_APIC_H */
Re: [1/6] 2.6.21-rc4: known regressions
On Fri, 23 Mar 2007, Linus Torvalds wrote:
>
> I really wish we had an x86-64 maintainer that understood that it's
> confusing that files in arch/i386/ are also used for arch/x86-64.

Sorry, that was unfair. The patch was simply buggy. It added the test to
drivers/acpi/ *without* adding it to the architectures that used it; it
wasn't an i386/x86-64 thing.

Thomas, please fix.

		Linus
Re: [1/6] 2.6.21-rc4: known regressions
On Fri, 23 Mar 2007, Thomas Gleixner wrote:
>
> We should revert that patch and add a "trust_lapic_timer_in_c2"
> commandline option instead. So we are on the safe side.

Damn. I applied your patch, but it breaks on x86-64:

  drivers/acpi/processor_idle.c:271: error: 'local_apic_timer_c2_ok' undeclared (first use in this function)

I really wish we had an x86-64 maintainer that understood that it's
confusing that files in arch/i386/ are also used for arch/x86-64.

		Linus
Re: [1/6] 2.6.21-rc4: known regressions
On Fri, Mar 23, 2007 at 10:37:38AM +0100, Nick Piggin wrote:
> On Fri, Mar 23, 2007 at 08:51:13AM +0100, Michal Piotrowski wrote:
> > On 23/03/07, Nick Piggin <[EMAIL PROTECTED]> wrote:
> > >>
> > >> and that in turn points to the kernel log:
> > >>
> > >> http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4/git-console.log
> > >
> > >Seems convincing. Michal, can you post your .config, and if you had
> > >dynticks and hrtimers enabled, try reproducing without them?
> >
> > http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4/git-config
> >
> > I don't know how to reproduce this bug on 2.6.21-rc4. On 2.6.21-rc2-mm1
> > it was very simple, just run youtube, bash_shared_mapping etc. In fact
> > I didn't see this bug for a week.
>
> OK... for some reason this is listed as a regression against 2.6.21-rc4.
> ...

Due to http://lkml.org/lkml/2007/3/16/288

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed
Re: [1/6] 2.6.21-rc4: known regressions
* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> [ Ok, I think it's those timers again...

agreed - this seems to be a genuine CONFIG_HIGH_RES_TIMERS=y bug. (which
has probably not been fixed since -rc4 either, we have no bugfix in this
area that could explain the expires_next==KTIME_MAX timer state visible
in SysRq-Q.)

there seems to be a trend in the reports: HT P4 CPUs.

> Ingo: let me just state how *happy* I am that I told you off when
> you wanted to merge the hires timers and NO_HZ before 2.6.20 because
> they were "stable". You were wrong, and 2.6.20 is at least in
> reasonable shape. [...]

yes - i was quite wrong pushing it so hard. (and doubly so given your
stated focus of making v2.6.20 a quiet release) Sorry :-/

> [...] Now we just need to make sure that 2.6.21 will be too.. ]

yeah - we are working hard on it.

	Ingo
Re: [1/6] 2.6.21-rc4: known regressions
On Fri, 2007-03-23 at 12:42 +0100, Ingo Molnar wrote:
> there's a new post-rc4 regression: my T60 hangs during early bootup. I
> bisected the hang down to this recent commit:
>
> | commit 25496caec111481161e7f06bbfa12a533c43cc6f
> | Author: Thomas Renninger <[EMAIL PROTECTED]>
> | Date:   Tue Feb 27 12:13:00 2007 -0500
> |
> |     ACPI: Only use IPI on known broken machines (AMD, Dothan/Banias Pentium M)
>
> undoing this change fixes my T60 so it correctly boots again.
>
> the commit has this confidence-raising comment:
>
> | However, I am not sure about the naming of the parameter and how it
> | could/should get integrated into the dyntick part
> | (CONFIG_GENERIC_CLOCKEVENTS). There, a more fine grained check (TSC
> | still running?, ..) is needed?
>
> could we please revert this commit until it's done correctly?
>
> and did this end up being a 'fix'? The change weakens the scope of a
> hardware workaround, which IMO has no place so late in the cycle. At a
> minimum the clockevents maintainer (Thomas) should have been Cc:-ed on
> it.

Ingo,

I had seen it before, and I had no objections under the premise that it
does not break things and especially survives on Andrew's VAIO. I
expected that to come in via -mm so it gets enough testing.

We should revert that patch and add a "trust_lapic_timer_in_c2"
commandline option instead. So we are on the safe side.

	tglx
Re: [1/6] 2.6.21-rc4: known regressions
there's a new post-rc4 regression: my T60 hangs during early bootup. I
bisected the hang down to this recent commit:

| commit 25496caec111481161e7f06bbfa12a533c43cc6f
| Author: Thomas Renninger <[EMAIL PROTECTED]>
| Date:   Tue Feb 27 12:13:00 2007 -0500
|
|     ACPI: Only use IPI on known broken machines (AMD, Dothan/Banias Pentium M)

undoing this change fixes my T60 so it correctly boots again.

the commit has this confidence-raising comment:

| However, I am not sure about the naming of the parameter and how it
| could/should get integrated into the dyntick part
| (CONFIG_GENERIC_CLOCKEVENTS). There, a more fine grained check (TSC
| still running?, ..) is needed?

could we please revert this commit until it's done correctly?

and did this end up being a 'fix'? The change weakens the scope of a
hardware workaround, which IMO has no place so late in the cycle. At a
minimum the clockevents maintainer (Thomas) should have been Cc:-ed on
it.

	Ingo
Re: [1/6] 2.6.21-rc4: known regressions
On Fri, Mar 23, 2007 at 08:51:13AM +0100, Michal Piotrowski wrote:
> On 23/03/07, Nick Piggin <[EMAIL PROTECTED]> wrote:
> >>
> >> and that in turn points to the kernel log:
> >>
> >> http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4/git-console.log
> >
> >Seems convincing. Michal, can you post your .config, and if you had
> >dynticks and hrtimers enabled, try reproducing without them?
>
> http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4/git-config
>
> I don't know how to reproduce this bug on 2.6.21-rc4. On 2.6.21-rc2-mm1
> it was very simple, just run youtube, bash_shared_mapping etc. In fact
> I didn't see this bug for a week.

OK... for some reason this is listed as a regression against 2.6.21-rc4.

You do have CONFIG_NO_HZ=y, and it is likely to be the cause of your
2.6.21-rc2-mm1 problems, but maybe there have been fixes since then?
Ingo?
Re: [1/6] 2.6.21-rc4: known regressions
On 23/03/07, Nick Piggin <[EMAIL PROTECTED]> wrote:
> On Thu, Mar 22, 2007 at 06:40:41PM -0700, Linus Torvalds wrote:
> >
> > [ Ok, I think it's those timers again...
> >
> > Ingo: let me just state how *happy* I am that I told you off when you
> > wanted to merge the hires timers and NO_HZ before 2.6.20 because they
> > were "stable". You were wrong, and 2.6.20 is at least in reasonable
> > shape. Now we just need to make sure that 2.6.21 will be too.. ]
> >
> > On Thu, 22 Mar 2007, Mingming Cao wrote:
> > >
> > > I might have missed something, so far I can't see a deadlock yet.
> > > If there is a deadlock, I think we should see ext3_xattr_release_block()
> > > and ext3_forget() on the stack. Is this the case?
> >
> > No. What's strange is that two (maybe more, I didn't check) processes seem
> > to be stuck in
> >
> >    [<c0318981>] schedule_timeout+0x70/0x8e
> >    [<c03189b4>] schedule_timeout_uninterruptible+0x15/0x17
> >    [<c01b964a>] journal_stop+0xe2/0x1e6
> >    [<c01ba2b0>] journal_force_commit+0x1d/0x1f
> >    [<c01b29fb>] ext3_force_commit+0x22/0x24
> >    [<c01ad607>] ext3_write_inode+0x34/0x3a
> >    [<c0189f74>] __writeback_single_inode+0x1c5/0x2cb
> >    [<c018a096>] sync_inode+0x1c/0x2e
> >    [<c01a9ff7>] ext3_sync_file+0xab/0xc0
> >    [<c018c8c5>] do_fsync+0x4b/0x98
> >    [<c018c932>] __do_fsync+0x20/0x2f
> >    [<c018c960>] sys_fsync+0xd/0xf
> >    [<c0104064>] syscall_call+0x7/0xb
> >
> > but that thing is literally:
> >
> >	...
> >	do {
> >		old_handle_count = transaction->t_handle_count;
> >		schedule_timeout_uninterruptible(1);
> >	} while (old_handle_count != transaction->t_handle_count);
> >	...
> >
> > and especially if nothing is happening, I'd not expect
> > "transaction->t_handle_count" to keep changing, so it should stop very
> > quickly.
> >
> > Maybe it's CONFIG_NO_HZ again, and the problem is that timeout, and simply
> > no timer tick happening?
> >
> > Bingo. I think that's it.
> >
> >    active timers:
> >    #0: hardirq_stack, tick_sched_timer, S:01
> >    # expires at 953089300 nsecs [in -2567889 nsecs]
> >    #1: hardirq_stack, hrtimer_wakeup, S:01
> >    # expires at 10858649798503 nsecs [in 1327754230614 nsecs]
> >   .expires_next   : 953089300 nsecs
> >
> > See
> >
> >	http://lkml.org/lkml/2007/3/16/288
> >
> > and that in turn points to the kernel log:
> >
> >	http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4/git-console.log
>
> Seems convincing. Michal, can you post your .config, and if you had
> dynticks and hrtimers enabled, try reproducing without them?

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4/git-config

I don't know how to reproduce this bug on 2.6.21-rc4. On 2.6.21-rc2-mm1
it was very simple, just run youtube, bash_shared_mapping etc. In fact
I didn't see this bug for a week.

Unfortunately, I wasn't able to take a crash dump because of a sound card
driver bug (I've got a crash dump from 2.6.21-rc2-mm1).

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
Re: [1/6] 2.6.21-rc4: known regressions
On Thu, Mar 22, 2007 at 06:40:41PM -0700, Linus Torvalds wrote:
>
> [ Ok, I think it's those timers again...
>
> Ingo: let me just state how *happy* I am that I told you off when you
> wanted to merge the hires timers and NO_HZ before 2.6.20 because they
> were "stable". You were wrong, and 2.6.20 is at least in reasonable
> shape. Now we just need to make sure that 2.6.21 will be too.. ]
>
> On Thu, 22 Mar 2007, Mingming Cao wrote:
> >
> > I might have missed something, so far I can't see a deadlock yet.
> > If there is a deadlock, I think we should see ext3_xattr_release_block()
> > and ext3_forget() on the stack. Is this the case?
>
> No. What's strange is that two (maybe more, I didn't check) processes seem
> to be stuck in
>
>    [<c0318981>] schedule_timeout+0x70/0x8e
>    [<c03189b4>] schedule_timeout_uninterruptible+0x15/0x17
>    [<c01b964a>] journal_stop+0xe2/0x1e6
>    [<c01ba2b0>] journal_force_commit+0x1d/0x1f
>    [<c01b29fb>] ext3_force_commit+0x22/0x24
>    [<c01ad607>] ext3_write_inode+0x34/0x3a
>    [<c0189f74>] __writeback_single_inode+0x1c5/0x2cb
>    [<c018a096>] sync_inode+0x1c/0x2e
>    [<c01a9ff7>] ext3_sync_file+0xab/0xc0
>    [<c018c8c5>] do_fsync+0x4b/0x98
>    [<c018c932>] __do_fsync+0x20/0x2f
>    [<c018c960>] sys_fsync+0xd/0xf
>    [<c0104064>] syscall_call+0x7/0xb
>
> but that thing is literally:
>
>	...
>	do {
>		old_handle_count = transaction->t_handle_count;
>		schedule_timeout_uninterruptible(1);
>	} while (old_handle_count != transaction->t_handle_count);
>	...
>
> and especially if nothing is happening, I'd not expect
> "transaction->t_handle_count" to keep changing, so it should stop very
> quickly.
>
> Maybe it's CONFIG_NO_HZ again, and the problem is that timeout, and simply
> no timer tick happening?
>
> Bingo. I think that's it.
>
>    active timers:
>    #0: hardirq_stack, tick_sched_timer, S:01
>    # expires at 953089300 nsecs [in -2567889 nsecs]
>    #1: hardirq_stack, hrtimer_wakeup, S:01
>    # expires at 10858649798503 nsecs [in 1327754230614 nsecs]
>   .expires_next   : 953089300 nsecs
>
> See
>
>	http://lkml.org/lkml/2007/3/16/288
>
> and that in turn points to the kernel log:
>
>	http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4/git-console.log

Seems convincing. Michal, can you post your .config, and if you had
dynticks and hrtimers enabled, try reproducing without them?
Re: [1/6] 2.6.21-rc4: known regressions
[ Ok, I think it's those timers again...

  Ingo: let me just state how *happy* I am that I told you off when you
  wanted to merge the hires timers and NO_HZ before 2.6.20 because they
  were "stable". You were wrong, and 2.6.20 is at least in reasonable
  shape. Now we just need to make sure that 2.6.21 will be too.. ]

On Thu, 22 Mar 2007, Mingming Cao wrote:
>
> I might have missed something, so far I can't see a deadlock yet.
> If there is a deadlock, I think we should see ext3_xattr_release_block()
> and ext3_forget() on the stack. Is this the case?

No. What's strange is that two (maybe more, I didn't check) processes seem
to be stuck in

   [<c0318981>] schedule_timeout+0x70/0x8e
   [<c03189b4>] schedule_timeout_uninterruptible+0x15/0x17
   [<c01b964a>] journal_stop+0xe2/0x1e6
   [<c01ba2b0>] journal_force_commit+0x1d/0x1f
   [<c01b29fb>] ext3_force_commit+0x22/0x24
   [<c01ad607>] ext3_write_inode+0x34/0x3a
   [<c0189f74>] __writeback_single_inode+0x1c5/0x2cb
   [<c018a096>] sync_inode+0x1c/0x2e
   [<c01a9ff7>] ext3_sync_file+0xab/0xc0
   [<c018c8c5>] do_fsync+0x4b/0x98
   [<c018c932>] __do_fsync+0x20/0x2f
   [<c018c960>] sys_fsync+0xd/0xf
   [<c0104064>] syscall_call+0x7/0xb

but that thing is literally:

	...
	do {
		old_handle_count = transaction->t_handle_count;
		schedule_timeout_uninterruptible(1);
	} while (old_handle_count != transaction->t_handle_count);
	...

and especially if nothing is happening, I'd not expect
"transaction->t_handle_count" to keep changing, so it should stop very
quickly.

Maybe it's CONFIG_NO_HZ again, and the problem is that timeout, and simply
no timer tick happening?

Bingo. I think that's it.

   active timers:
   #0: hardirq_stack, tick_sched_timer, S:01
   # expires at 953089300 nsecs [in -2567889 nsecs]
   #1: hardirq_stack, hrtimer_wakeup, S:01
   # expires at 10858649798503 nsecs [in 1327754230614 nsecs]
  .expires_next   : 953089300 nsecs

See

	http://lkml.org/lkml/2007/3/16/288

and that in turn points to the kernel log:

	http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4/git-console.log
Re: [1/6] 2.6.21-rc4: known regressions
On Thu, 2007-03-22 at 08:21 -0700, Linus Torvalds wrote:
> 
> On Thu, 22 Mar 2007, Nick Piggin wrote:
> > 
> > Nothing sleeps on PageUptodate, so I don't think that could explain it.
> 
> Good point. I forget that we just test "uptodate", but then always sleep
> on "locked".
> 
> > The fs: fix __block_write_full_page error case buffer submission patch
> > does change the locking, but I'd be really surprised if that was the
> > problem, because it changes locking to match the regular non-error path
> > submission.
> 
> I'd agree, except something clearly has changed ;^)
> 
> > > Alternatively, maybe it really is an _io_ problem (and the buffer-head thing
> > > is just a red herring, and it could happen to other IO, it's just that
> > > metadata IO uses buffer heads), and it's the scheduler changes since
> > > 2.6.20..
> > 
> > I see what you mean. Could it be an ext3 or jbd change I wonder?
> 
> jbd hasn't changed since 2.6.20, and the ext3 changes are mostly
> things like const'ness fixes. And others were things like changing
> "journal_current_handle()" into "ext3_journal_current_handle()", which
> looked exciting considering that the hung processes were waiting for the
> journal, but the fact is, that's just an inline function that just calls
> the old function, so..
> 
> But interestingly, there *is* an "EA block reference count racing fix"
> that does move a lock_buffer()/unlock_buffer() to cover a bigger area. It
> looks "obviously correct", but maybe there's a deadlock possibility there
> with ext3_forget() or something?
> 

I might have missed something, but so far I can't see a deadlock. If
there is a deadlock, I think we should see ext3_xattr_release_block() and
ext3_forget() on the stack. Is this the case?

Regards,
Mingming

> Linus
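Mingming's test is mechanical: both ext3_xattr_release_block() and ext3_forget() must appear on the same stack. It can be run over a console log with a small helper. The C sketch below is hypothetical and assumes the i386 frame format "[<addr>] name+0xoff/0xlen" seen in these traces. It requires the name to start a frame and be followed by '+', so that, for example, "journal_force" does not match "journal_force_commit".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Does a pasted call trace contain a frame for `sym`? A real frame name
 * starts after a space or ']' and is immediately followed by '+'. */
static bool trace_has_symbol(const char *trace, const char *sym)
{
    assert(trace != NULL && sym != NULL);
    size_t len = strlen(sym);

    for (const char *p = strstr(trace, sym); p != NULL; p = strstr(p + 1, sym)) {
        bool starts_frame = (p == trace) || p[-1] == ' ' || p[-1] == ']';
        if (starts_frame && p[len] == '+')
            return true;
    }
    return false;
}

/* The signature described above: both functions on one stack. */
static bool looks_like_xattr_forget_deadlock(const char *trace)
{
    return trace_has_symbol(trace, "ext3_xattr_release_block") &&
           trace_has_symbol(trace, "ext3_forget");
}
```

Run against the kget trace quoted in this thread, the helper finds journal_stop but not the xattr/forget pair.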
Re: [1/6] 2.6.21-rc4: known regressions
Hello,

> > In contrast, the hang reported by Mariusz Kozlowski has a slightly
> > different feel to it, but there's a tantalizing pattern in there too:

Just to make things clear: I didn't say I could reproduce it on
2.6.21-rc4. In fact I'm running 2.6.21-rc4-mm1 with no problems so far.
I just replied to show my sysrq dumps of process states with
2.6.21-rc2-mm1. I could reproduce similar (but each time slightly
different) hangs on the -mm series from 2.6.20-mm1 to 2.6.21-rc2-mm1.
2.6.21-rc3-mm1 worked well for me, so I'm not sure if my report is still
valid here. Sorry if I didn't make that clear enough.

> >   http://www.ussg.iu.edu/hypermail/linux/kernel/0703.0/1243.html
> > 
> >   Call Trace:
> >    [<c03ec87e>] io_schedule+0x42/0x59
> >    [<c0184915>] sleep_on_buffer+0x8/0xc
> >    [<c03ed217>] __wait_on_bit+0x47/0x6c
> >    [<c03ed297>] out_of_line_wait_on_bit+0x5b/0x64
> >    [<c01848a8>] __wait_on_buffer+0x27/0x2d
> >    [<c01b4228>] journal_commit_transaction+0x707/0x127f
> >    [<c01b868b>] kjournald+0xac/0x1ed
> >    [<c0126af5>] kthread+0xa2/0xc9
> >    [<c010422b>] kernel_thread_helper+0x7/0x1c
> > 
> > which certainly also looks like an IO never completed (or completed but
> > never woke anything up).

As I noted before, each time the system hung, I/O activity to the disk
looked dead (I couldn't even do sysrq-s).

> It could be possible that ext3 is doing something weird and expecting

True. I'm using ext3.

> fs: nobh data leak... again hard to see how it could cause an unlock/wakeup
> to get lost. Is Mariusz using the nobh mount option?

No. He is not.

Regards,
    Mariusz Kozlowski
Re: [1/6] 2.6.21-rc4: known regressions
On Thu, 22 Mar 2007, Nick Piggin wrote:
> 
> Nothing sleeps on PageUptodate, so I don't think that could explain it.

Good point. I forget that we just test "uptodate", but then always sleep
on "locked".

> The fs: fix __block_write_full_page error case buffer submission patch
> does change the locking, but I'd be really surprised if that was the
> problem, because it changes locking to match the regular non-error path
> submission.

I'd agree, except something clearly has changed ;^)

> > Alternatively, maybe it really is an _io_ problem (and the buffer-head thing
> > is just a red herring, and it could happen to other IO, it's just that
> > metadata IO uses buffer heads), and it's the scheduler changes since
> > 2.6.20..
> 
> I see what you mean. Could it be an ext3 or jbd change I wonder?

jbd hasn't changed since 2.6.20, and the ext3 changes are mostly
things like const'ness fixes. And others were things like changing
"journal_current_handle()" into "ext3_journal_current_handle()", which
looked exciting considering that the hung processes were waiting for the
journal, but the fact is, that's just an inline function that just calls
the old function, so..

But interestingly, there *is* an "EA block reference count racing fix"
that does move a lock_buffer()/unlock_buffer() to cover a bigger area. It
looks "obviously correct", but maybe there's a deadlock possibility there
with ext3_forget() or something?

        Linus
Re: [1/6] 2.6.21-rc4: known regressions
On Thu, Mar 22, 2007 at 06:40:41PM -0700, Linus Torvalds wrote:
> [ Ok, I think it's those timers again...
> 
>   Ingo: let me just state how *happy* I am that I told you off when you
>   wanted to merge the hires timers and NO_HZ before 2.6.20 because they
>   were stable. You were wrong, and 2.6.20 is at least in reasonable shape.
>   Now we just need to make sure that 2.6.21 will be too.. ]
> 
> On Thu, 22 Mar 2007, Mingming Cao wrote:
> > 
> > I might have missed something, but so far I can't see a deadlock. If
> > there is a deadlock, I think we should see ext3_xattr_release_block() and
> > ext3_forget() on the stack. Is this the case?
> 
> No. What's strange is that two (maybe more, I didn't check) processes
> seem to be stuck in
> 
>    [<c0318981>] schedule_timeout+0x70/0x8e
>    [<c03189b4>] schedule_timeout_uninterruptible+0x15/0x17
>    [<c01b964a>] journal_stop+0xe2/0x1e6
>    [<c01ba2b0>] journal_force_commit+0x1d/0x1f
>    [<c01b29fb>] ext3_force_commit+0x22/0x24
>    [<c01ad607>] ext3_write_inode+0x34/0x3a
>    [<c0189f74>] __writeback_single_inode+0x1c5/0x2cb
>    [<c018a096>] sync_inode+0x1c/0x2e
>    [<c01a9ff7>] ext3_sync_file+0xab/0xc0
>    [<c018c8c5>] do_fsync+0x4b/0x98
>    [<c018c932>] __do_fsync+0x20/0x2f
>    [<c018c960>] sys_fsync+0xd/0xf
>    [<c0104064>] syscall_call+0x7/0xb
> 
> but that thing is literally:
> 
>         ...
>         do {
>                 old_handle_count = transaction->t_handle_count;
>                 schedule_timeout_uninterruptible(1);
>         } while (old_handle_count != transaction->t_handle_count);
>         ...
> 
> and especially if nothing is happening, I'd not expect
> transaction->t_handle_count to keep changing, so it should stop very
> quickly.
> 
> Maybe it's CONFIG_NO_HZ again, and the problem is that timeout, and
> simply no timer tick happening?
> 
> Bingo. I think that's it.
> 
>   active timers:
>    #0: hardirq_stack, tick_sched_timer, S:01
>    # expires at 9530893000000 nsecs [in -2567889 nsecs]
>    #1: hardirq_stack, hrtimer_wakeup, S:01
>    # expires at 10858649798503 nsecs [in 1327754230614 nsecs]
>     .expires_next   : 9530893000000 nsecs
> 
> See
> 
>    http://lkml.org/lkml/2007/3/16/288
> 
> and that in turn points to the kernel log:
> 
>    http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4/git-console.log

Seems convincing. Michal, can you post your .config, and if you had
dynticks and hrtimers enabled, try reproducing without them?
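The journal_stop() loop quoted above only terminates if each one-jiffy sleep actually ends. Below is a user-space model in C. It assumes a hypothetical tick_source callback in place of the kernel's timer machinery; the kernel does not poll like this. When a tick is delivered and nothing touches t_handle_count, the loop exits after one sleep. When no tick ever arrives, which is the CONFIG_NO_HZ failure suspected here, the sleep never returns. The model reports that case instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for the one field the loop reads. */
struct sim_transaction {
    int t_handle_count;
};

/* Hypothetical tick source: true when a timer tick arrives that would
 * wake a task sleeping in schedule_timeout_uninterruptible(1). */
typedef bool (*tick_source)(void);

static bool tick_arrives(void) { return true; }
static bool tick_never_arrives(void) { return false; }

/* Models the quoted do/while loop. Returns the number of completed
 * one-jiffy sleeps, or -1 if a sleep never ends (the task would stay
 * in D state). max_polls only bounds the simulation. */
static int sim_journal_stop_wait(struct sim_transaction *transaction,
                                 tick_source tick, int max_polls)
{
    int old_handle_count;
    int sleeps = 0;

    assert(max_polls > 0);
    do {
        old_handle_count = transaction->t_handle_count;
        /* schedule_timeout_uninterruptible(1): only a tick ends the sleep */
        int polls = 0;
        while (!tick()) {
            if (++polls == max_polls)
                return -1;
        }
        sleeps++;
    } while (old_handle_count != transaction->t_handle_count);

    return sleeps;
}
```

The design choice is deliberate: the loop relies on time passing, not on any event, so it is only as reliable as the tick that ends each sleep.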
Re: [1/6] 2.6.21-rc4: known regressions
Linus Torvalds wrote:

> In contrast, the hang reported by Mariusz Kozlowski has a slightly
> different feel to it, but there's a tantalizing pattern in there too:
> 
>   http://www.ussg.iu.edu/hypermail/linux/kernel/0703.0/1243.html
> 
>   Call Trace:
>    [<c03ec87e>] io_schedule+0x42/0x59
>    [<c0184915>] sleep_on_buffer+0x8/0xc
>    [<c03ed217>] __wait_on_bit+0x47/0x6c
>    [<c03ed297>] out_of_line_wait_on_bit+0x5b/0x64
>    [<c01848a8>] __wait_on_buffer+0x27/0x2d
>    [<c01b4228>] journal_commit_transaction+0x707/0x127f
>    [<c01b868b>] kjournald+0xac/0x1ed
>    [<c0126af5>] kthread+0xa2/0xc9
>    [<c010422b>] kernel_thread_helper+0x7/0x1c
> 
> which certainly also looks like an IO never completed (or completed but
> never woke anything up).
> 
> It also seems to be related to *buffers*. Maybe the whole bh layer thing
> is a fluke, but it's not waiting for normal data, it's very much waiting
> for those journal things that all use buffer heads.
> 
> Which just makes me worry about those patches by Nick (which did come in
> through Andrew). I don't think it's the memorder one (it looks safe and
> shouldn't matter on x86 anyway!), but what about the fs: fix
> __block_write_full_page error case buffer submission locking change for
> example? Or that "fs: fix nobh data leak" thing with its fix? It uses
> "SetPageUptodate(page);" without waking up anybody who might wait for it
> (but the waiters here seem to wait on buffers, so that's probably not
> it)..

Nothing sleeps on PageUptodate, so I don't think that could explain it.

The fs: fix __block_write_full_page error case buffer submission patch
does change the locking, but I'd be really surprised if that was the
problem, because it changes locking to match the regular non-error path
submission.

It could be possible that ext3 is doing something weird and expecting
the old behaviour if it failed get_block, but that seems pretty weird to
do, and would need fixing.

fs: nobh data leak... again hard to see how it could cause an unlock/wakeup
to get lost. Is Mariusz using the nobh mount option?

It wouldn't hurt to test with these patches backed out...

> Alternatively, maybe it really is an _io_ problem (and the buffer-head thing
> is just a red herring, and it could happen to other IO, it's just that
> metadata IO uses buffer heads), and it's the scheduler changes since
> 2.6.20..

I see what you mean. Could it be an ext3 or jbd change I wonder?

-- 
SUSE Labs, Novell Inc.
Re: [1/6] 2.6.21-rc4: known regressions
On Sun, 18 Mar 2007, Adrian Bunk wrote:
> 
> Subject    : weird system hangs
> References : http://lkml.org/lkml/2007/3/16/288
> Submitter  : Michal Piotrowski <[EMAIL PROTECTED]>
>              Mariusz Kozlowski <[EMAIL PROTECTED]>
> Status     : unknown

According to the console log, it seems to be hung because a lot of
processes are stuck in D state in various variations of this:

  Call Trace:
   [<c01ba134>] start_this_handle+0x2d7/0x355
   [<c01ba265>] journal_start+0xb3/0xe1
   [<c01b2837>] ext3_journal_start_sb+0x48/0x4a
   [<c01b0924>] ext3_create+0x47/0xe2
   [<c017820c>] vfs_create+0xcd/0x13e
   [<c017ab6e>] open_namei+0x176/0x5b5
   [<c0170026>] do_filp_open+0x26/0x3b
   [<c017007e>] do_sys_open+0x43/0xc2
   [<c0170135>] sys_open+0x1c/0x1e
   [<c0104064>] syscall_call+0x7/0xb

and then you have "kget" (whatever that is) which is doing

  Call Trace:
   [<c0318981>] schedule_timeout+0x70/0x8e
   [<c03189b4>] schedule_timeout_uninterruptible+0x15/0x17
   [<c01b964a>] journal_stop+0xe2/0x1e6
   [<c01ba2b0>] journal_force_commit+0x1d/0x1f
   [<c01b29fb>] ext3_force_commit+0x22/0x24
   [<c01ad607>] ext3_write_inode+0x34/0x3a
   [<c0189f74>] __writeback_single_inode+0x1c5/0x2cb
   [<c018a096>] sync_inode+0x1c/0x2e
   [<c01a9ff7>] ext3_sync_file+0xab/0xc0
   [<c018c8c5>] do_fsync+0x4b/0x98
   [<c018c932>] __do_fsync+0x20/0x2f
   [<c018c951>] sys_fdatasync+0x10/0x12
   [<c0104064>] syscall_call+0x7/0xb

with kjournald in D sleep at

   [<c01bb7b2>] journal_commit_transaction+0x15d/0x11d3
   [<c01bfcbe>] kjournald+0xab/0x1e8
   [<c01333dd>] kthread+0xb5/0xe0
   [<c0104cd3>] kernel_thread_helper+0x7/0x10

which certainly looks like something is waiting for an IO to finish.

In contrast, the hang reported by Mariusz Kozlowski has a slightly
different feel to it, but there's a tantalizing pattern in there too:

  http://www.ussg.iu.edu/hypermail/linux/kernel/0703.0/1243.html

  Call Trace:
   [<c03ec87e>] io_schedule+0x42/0x59
   [<c0184915>] sleep_on_buffer+0x8/0xc
   [<c03ed217>] __wait_on_bit+0x47/0x6c
   [<c03ed297>] out_of_line_wait_on_bit+0x5b/0x64
   [<c01848a8>] __wait_on_buffer+0x27/0x2d
   [<c01b4228>] journal_commit_transaction+0x707/0x127f
   [<c01b868b>] kjournald+0xac/0x1ed
   [<c0126af5>] kthread+0xa2/0xc9
   [<c010422b>] kernel_thread_helper+0x7/0x1c

which certainly also looks like an IO never completed (or completed but
never woke anything up).

It also seems to be related to *buffers*. Maybe the whole bh layer thing
is a fluke, but it's not waiting for normal data, it's very much waiting
for those journal things that all use buffer heads.

Which just makes me worry about those patches by Nick (which did come in
through Andrew). I don't think it's the memorder one (it looks safe and
shouldn't matter on x86 anyway!), but what about the fs: fix
__block_write_full_page error case buffer submission locking change for
example? Or that "fs: fix nobh data leak" thing with its fix? It uses
"SetPageUptodate(page);" without waking up anybody who might wait for it
(but the waiters here seem to wait on buffers, so that's probably not
it)..

Alternatively, maybe it really is an _io_ problem (and the buffer-head thing
is just a red herring, and it could happen to other IO, it's just that
metadata IO uses buffer heads), and it's the scheduler changes since
2.6.20..

Jens, Nick.. Could you take a look?

        Linus
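The distinction drawn in this thread is that waiters only test "uptodate" but sleep on "locked". It can be sketched in user space. This is a hypothetical model, not kernel code: a pthread mutex and condition variable stand in for the buffer's bit wait queue. The waiter gives up after one second, so a lost wakeup appears as a return value rather than a hang. Setting the uptodate flag needs no wakeup. Clearing the lock bit without one leaves the waiter asleep, which is the "completed but never woke anything up" case.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a buffer head: the two state bits from the
 * thread plus a wait queue. */
struct sim_buffer {
    pthread_mutex_t mu;
    pthread_cond_t wq;        /* plays the role of the bit wait queue */
    pthread_cond_t ready_cv;  /* test plumbing: "the waiter is asleep" */
    bool locked;              /* like BH_Lock: waiters sleep until it clears */
    bool uptodate;            /* like BH_Uptodate: tested, never slept on */
    bool waiter_ready;
    bool woken;               /* did the waiter see the unlock in time? */
};

/* Rough analogue of __wait_on_buffer(). The kernel sleeps with no
 * timeout; here we stop after 1 s and record the lost wakeup. */
static void *sim_wait_on_buffer(void *arg)
{
    struct sim_buffer *b = arg;
    struct timespec deadline;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 1;

    pthread_mutex_lock(&b->mu);
    b->waiter_ready = true;
    pthread_cond_signal(&b->ready_cv);
    b->woken = true;
    while (b->locked) {
        if (pthread_cond_timedwait(&b->wq, &b->mu, &deadline) == ETIMEDOUT) {
            b->woken = false;   /* nobody woke us: the "D state" case */
            break;
        }
    }
    pthread_mutex_unlock(&b->mu);
    return NULL;
}

/* Park a waiter on a locked buffer, then "complete the I/O". With
 * wake_on_unlock false, the lock bit is cleared but no wakeup is sent.
 * Returns whether the waiter was woken. */
static bool sim_run_io(bool wake_on_unlock)
{
    struct sim_buffer b = { .locked = true };   /* lock_buffer(): I/O in flight */
    pthread_t waiter;

    pthread_mutex_init(&b.mu, NULL);
    pthread_cond_init(&b.wq, NULL);
    pthread_cond_init(&b.ready_cv, NULL);
    int rc = pthread_create(&waiter, NULL, sim_wait_on_buffer, &b);
    assert(rc == 0);
    (void)rc;

    pthread_mutex_lock(&b.mu);
    while (!b.waiter_ready)
        pthread_cond_wait(&b.ready_cv, &b.mu);
    /* Holding mu again means the waiter is blocked inside timedwait. */
    b.uptodate = true;                  /* marking uptodate needs no wakeup */
    b.locked = false;                   /* unlock_buffer() clears the bit... */
    if (wake_on_unlock)
        pthread_cond_broadcast(&b.wq);  /* ...and must wake the waiters */
    pthread_mutex_unlock(&b.mu);

    pthread_join(waiter, NULL);
    pthread_cond_destroy(&b.ready_cv);
    pthread_cond_destroy(&b.wq);
    pthread_mutex_destroy(&b.mu);
    return b.woken;
}
```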
Re: [1/6] 2.6.21-rc4: known regressions
On Tue, Mar 20, 2007 at 11:24:41AM +0100, Tobias Diedrich wrote:
> Adrian Bunk wrote:
> > This email lists some known regressions in Linus' tree compared to 2.6.20.
> 
> Since I didn't see any mention of this:
> 
> I'm seeing an Oops when removing the ohci1394 module:
> 
> [ 16.047275] ieee1394: Node removed: ID:BUS[158717321-38:0860]  GUID[c033ced6]
> [ 16.047287] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000094
> [ 16.047451]  printing eip:
> [ 16.047524] c02daf3d
> [ 16.047527] *pde = 00000000
> [ 16.047603] Oops: 0000 [#1]
> [ 16.047676] PREEMPT
> [ 16.047788] Modules linked in: backlight ohci1394 parport_pc parport
> [ 16.048069] CPU:    0
> [ 16.048071] EIP:    0060:[<c02daf3d>]    Not tainted VLI
> [ 16.048074] EFLAGS: 00010246   (2.6.21-rc4 #35)
> [ 16.048298] EIP is at class_device_remove_attrs+0xa/0x30
> [ 16.048377] eax: dfd04338   ebx: 00000000   ecx: df655988   edx: 00000000
> [ 16.048456] esi: 00000000   edi: dfd04338   ebp: 00000000   esp: df506e38
> [ 16.048535] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> [ 16.048614] Process rmmod (pid: 1455, ti=df506000 task=df6cc0b0 task.ti=df506000)
> [ 16.048693] Stack: dfd04338 dfd04340 c02db02f dfd04338 dfd041e4 c0331871
> [ 16.049159]        c02db065 dfd041b0 c0331858 c055006d 0975d589 0026 035c
> [ 16.049626]        c033ced6 df24c000 c0331879 c02d859f df24c0bc df24c0bc
> [ 16.050091] Call Trace:
> [ 16.050233]  [<c02db02f>] class_device_del+0xcc/0xfa
> [ 16.050352]  [<c0331871>] __nodemgr_remove_host_dev+0x0/0xb
>...
> [ 16.057248] EIP: [<c02daf3d>] class_device_remove_attrs+0xa/0x30 SS:ESP 0068:df506e38
>...

You missed the following entry in my list [1]:

Subject    : Oops in __nodemgr_remove_host_dev
References : http://lkml.org/lkml/2007/3/14/4
             http://lkml.org/lkml/2007/3/18/87
Submitter  : Ismail Dönmez <[EMAIL PROTECTED]>
             Stefan Richter <[EMAIL PROTECTED]>
             Thomas Meyer <[EMAIL PROTECTED]>
Caused-By  : Greg Kroah-Hartman <[EMAIL PROTECTED]>
             commit 43cb76d91ee85f579a69d42bc8efc08bac560278
             commit 40cf67c5fcc513406558c01b91129280208e57bf
Handled-By : Stefan Richter <[EMAIL PROTECTED]>
Status     : problem is being debugged

cu
Adrian

[1] not meant as an offence - there are so many items in the list that
    it's easy to miss one

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed
Re: [1/6] 2.6.21-rc4: known regressions
Adrian Bunk wrote:
> This email lists some known regressions in Linus' tree compared to 2.6.20.

Since I didn't see any mention of this:

I'm seeing an Oops when removing the ohci1394 module:

[ 16.047275] ieee1394: Node removed: ID:BUS[158717321-38:0860]  GUID[c033ced6]
[ 16.047287] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000094
[ 16.047451]  printing eip:
[ 16.047524] c02daf3d
[ 16.047527] *pde = 00000000
[ 16.047603] Oops: 0000 [#1]
[ 16.047676] PREEMPT
[ 16.047788] Modules linked in: backlight ohci1394 parport_pc parport
[ 16.048069] CPU:    0
[ 16.048071] EIP:    0060:[<c02daf3d>]    Not tainted VLI
[ 16.048074] EFLAGS: 00010246   (2.6.21-rc4 #35)
[ 16.048298] EIP is at class_device_remove_attrs+0xa/0x30
[ 16.048377] eax: dfd04338   ebx: 00000000   ecx: df655988   edx: 00000000
[ 16.048456] esi: 00000000   edi: dfd04338   ebp: 00000000   esp: df506e38
[ 16.048535] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
[ 16.048614] Process rmmod (pid: 1455, ti=df506000 task=df6cc0b0 task.ti=df506000)
[ 16.048693] Stack: dfd04338 dfd04340 c02db02f dfd04338 dfd041e4 c0331871
[ 16.049159]        c02db065 dfd041b0 c0331858 c055006d 0975d589 0026 035c
[ 16.049626]        c033ced6 df24c000 c0331879 c02d859f df24c0bc df24c0bc
[ 16.050091] Call Trace:
[ 16.050233]  [<c02db02f>] class_device_del+0xcc/0xfa
[ 16.050352]  [<c0331871>] __nodemgr_remove_host_dev+0x0/0xb
[ 16.050475]  [<c02db065>] class_device_unregister+0x8/0x10
[ 16.050595]  [<c0331858>] nodemgr_remove_ne+0x61/0x7a
[ 16.050714]  [<c033ced6>] ether1394_header_cache+0x0/0x43
[ 16.050835]  [<c0331879>] __nodemgr_remove_host_dev+0x8/0xb
[ 16.050954]  [<c02d859f>] device_for_each_child+0x1a/0x3c
[ 16.051073]  [<c0331b98>] nodemgr_remove_host+0x30/0x90
[ 16.051192]  [<c032f12c>] __unregister_host+0x1a/0xad
[ 16.051311]  [<c032ee17>] hl_get_hostinfo+0x5b/0x76
[ 16.051430]  [<c032f34a>] highlevel_remove_host+0x21/0x42
[ 16.051549]  [<c032ed9d>] hpsb_remove_host+0x37/0x56
[ 16.051668]  [<e0869263>] ohci1394_pci_remove+0x44/0x1c7 [ohci1394]
[ 16.051794]  [<c027e5b0>] pci_device_remove+0x16/0x35
[ 16.053376]  [<c02da6d7>] __device_release_driver+0x6e/0x8b
[ 16.053496]  [<c02dab77>] driver_detach+0xa1/0xde
[ 16.053613]  [<c02da33f>] bus_remove_driver+0x57/0x75
[ 16.053733]  [<c02dabd4>] driver_unregister+0x8/0x13
[ 16.053850]  [<c027e732>] pci_unregister_driver+0xc/0x6e
[ 16.053969]  [<c0134d56>] sys_delete_module+0x174/0x19a
[ 16.054091]  [<c0113cea>] do_page_fault+0x277/0x525
[ 16.054211]  [<c0148b0d>] do_munmap+0x193/0x1ac
[ 16.054331]  [<c0103d0c>] syscall_call+0x7/0xb
[ 16.054450]  ===
[ 16.054523] Code: ff c3 85 c0 74 08 83 c0 08 e9 9b f8 ea ff b8 ea ff ff ff c3 85 c0 74 08 83 c0 08 e9 b9 db ea ff c3 57 89 c7 56 53 31 db 8b 70 44 <83> be 94 00 00 00 00 75 09 eb 17 89 f8 e8 d7 ff ff ff 89 da 83
[ 16.057248] EIP: [<c02daf3d>] class_device_remove_attrs+0xa/0x30 SS:ESP 0068:df506e38

-- 
Tobias                                          PGP: http://9ac7e0bc.uguu.de
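The Code: bytes account for the small faulting address. Decoded as i386 instructions, 8b 70 44 is mov 0x44(%eax),%esi. The bytes at the fault marker, 83 be 94 00 00 00 00, are cmpl $0x0,0x94(%esi). If %esi holds zero, that read lands at address 0x94. The C sketch below uses a hypothetical struct, since the real layout is not shown in the Oops. It illustrates that reading a member through a NULL pointer touches address 0 + offsetof(member). The address is computed without dereferencing anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only: some member sits at
 * offset 0x94, the displacement encoded in the faulting instruction. */
struct example_obj {
    char earlier_fields[0x94];
    int flag_at_0x94;
};

static_assert(offsetof(struct example_obj, flag_at_0x94) == 0x94,
              "member must sit at the faulting displacement");

/* Address the CPU would compute for p->flag_at_0x94 when p == base. */
static uintptr_t member_address(uintptr_t base)
{
    return base + offsetof(struct example_obj, flag_at_0x94);
}
```

With a NULL base, member_address(0) is 0x94. That matches the "unable to handle kernel NULL pointer dereference at virtual address 00000094" line, so the pointer loaded from offset 0x44 of the object in %eax was presumably NULL.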
Re: [1/6] 2.6.21-rc4: known regressions
Adrian Bunk wrote:
> This email lists some known regressions in Linus' tree compared to 2.6.20.

Since I didn't see any mention of this:
I'm seeing an Oops when removing the ohci1394 module:

[ 16.047275] ieee1394: Node removed: ID:BUS[158717321-38:0860] GUID[c033ced6]
[ 16.047287] BUG: unable to handle kernel NULL pointer dereference at virtual address 0094
[ 16.047451] printing eip:
[ 16.047524] c02daf3d
[ 16.047527] *pde =
[ 16.047603] Oops: [#1]
[ 16.047676] PREEMPT
[ 16.047788] Modules linked in: backlight ohci1394 parport_pc parport
[ 16.048069] CPU:0
[ 16.048071] EIP:0060:[c02daf3d]Not tainted VLI
[ 16.048074] EFLAGS: 00010246 (2.6.21-rc4 #35)
[ 16.048298] EIP is at class_device_remove_attrs+0xa/0x30
[ 16.048377] eax: dfd04338 ebx: ecx: df655988 edx:
[ 16.048456] esi: edi: dfd04338 ebp: esp: df506e38
[ 16.048535] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
[ 16.048614] Process rmmod (pid: 1455, ti=df506000 task=df6cc0b0 task.ti=df506000)
[ 16.048693] Stack: dfd04338 dfd04340 c02db02f dfd04338 dfd041e4 c0331871
[ 16.049159] c02db065 dfd041b0 c0331858 c055006d 0975d589 0026 035c
[ 16.049626] c033ced6 df24c000 c0331879 c02d859f df24c0bc df24c0bc
[ 16.050091] Call Trace:
[ 16.050233] [c02db02f] class_device_del+0xcc/0xfa
[ 16.050352] [c0331871] __nodemgr_remove_host_dev+0x0/0xb
[ 16.050475] [c02db065] class_device_unregister+0x8/0x10
[ 16.050595] [c0331858] nodemgr_remove_ne+0x61/0x7a
[ 16.050714] [c033ced6] ether1394_header_cache+0x0/0x43
[ 16.050835] [c0331879] __nodemgr_remove_host_dev+0x8/0xb
[ 16.050954] [c02d859f] device_for_each_child+0x1a/0x3c
[ 16.051073] [c0331b98] nodemgr_remove_host+0x30/0x90
[ 16.051192] [c032f12c] __unregister_host+0x1a/0xad
[ 16.051311] [c032ee17] hl_get_hostinfo+0x5b/0x76
[ 16.051430] [c032f34a] highlevel_remove_host+0x21/0x42
[ 16.051549] [c032ed9d] hpsb_remove_host+0x37/0x56
[ 16.051668] [e0869263] ohci1394_pci_remove+0x44/0x1c7 [ohci1394]
[ 16.051794] [c027e5b0] pci_device_remove+0x16/0x35
[ 16.053376] [c02da6d7] __device_release_driver+0x6e/0x8b
[ 16.053496] [c02dab77] driver_detach+0xa1/0xde
[ 16.053613] [c02da33f] bus_remove_driver+0x57/0x75
[ 16.053733] [c02dabd4] driver_unregister+0x8/0x13
[ 16.053850] [c027e732] pci_unregister_driver+0xc/0x6e
[ 16.053969] [c0134d56] sys_delete_module+0x174/0x19a
[ 16.054091] [c0113cea] do_page_fault+0x277/0x525
[ 16.054211] [c0148b0d] do_munmap+0x193/0x1ac
[ 16.054331] [c0103d0c] syscall_call+0x7/0xb
[ 16.054450] ===
[ 16.054523] Code: ff c3 85 c0 74 08 83 c0 08 e9 9b f8 ea ff b8 ea ff ff ff c3 85 c0 74 08 83 c0 08 e9 b9 db ea ff c3 57 89 c7 56 53 31 db 8b 70 44 83 be 94 00 00 00 00 75 09 eb 17 89 f8 e8 d7 ff ff ff 89 da 83
[ 16.057248] EIP: [c02daf3d] class_device_remove_attrs+0xa/0x30 SS:ESP 0068:df506e38

-- 
Tobias
PGP: http://9ac7e0bc.uguu.de
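The oops reports a fault at virtual address 0094, and counting the `Code:` bytes from the start of the function (`57 89 c7 56 53 31 db 8b 70 44`, 0xa bytes) puts EIP (class_device_remove_attrs+0xa) on `83 be 94 00 00 00 00`, i.e. `cmpl $0x0,0x94(%esi)`, right after `8b 70 44` (`mov 0x44(%eax),%esi`). A fault address that small typically means a field at offset 0x94 was read through a NULL structure pointer. The sketch below only illustrates that arithmetic in user space; `struct fake_class`, its padding and `fault_address()` are hypothetical and do not reproduce the real kernel structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL layout, not the real 2.6.21 struct class: the padding
 * only puts the field at offset 0x94 to mirror the oops. The field is
 * 32 bits wide because the oopsing machine is i386, where pointers are
 * 4 bytes (this also keeps the offset at 0x94 on 64-bit hosts). */
struct fake_class {
    unsigned char pad[0x94];
    uint32_t class_dev_attrs;
};

static_assert(offsetof(struct fake_class, class_dev_attrs) == 0x94,
              "field must sit at offset 0x94");

/* Address the CPU would touch when reading class_dev_attrs through a
 * base pointer whose numeric value is `base`. Plain integer arithmetic,
 * so nothing is actually dereferenced here. */
static uintptr_t fault_address(uintptr_t base)
{
    return base + offsetof(struct fake_class, class_dev_attrs);
}
```

For example, `fault_address(0)` yields 0x94, matching the reported `virtual address 0094`.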
Re: [1/6] 2.6.21-rc4: known regressions
On Tue, Mar 20, 2007 at 11:24:41AM +0100, Tobias Diedrich wrote:
> Adrian Bunk wrote:
> > This email lists some known regressions in Linus' tree compared to 2.6.20.
>
> Since I didn't see any mention of this:
> I'm seeing an Oops when removing the ohci1394 module:
>
> [ 16.047275] ieee1394: Node removed: ID:BUS[158717321-38:0860] GUID[c033ced6]
> [ 16.047287] BUG: unable to handle kernel NULL pointer dereference at virtual address 0094
> [ 16.047451] printing eip:
> [ 16.047524] c02daf3d
> [ 16.047527] *pde =
> [ 16.047603] Oops: [#1]
> [ 16.047676] PREEMPT
> [ 16.047788] Modules linked in: backlight ohci1394 parport_pc parport
> [ 16.048069] CPU:0
> [ 16.048071] EIP:0060:[c02daf3d]Not tainted VLI
> [ 16.048074] EFLAGS: 00010246 (2.6.21-rc4 #35)
> [ 16.048298] EIP is at class_device_remove_attrs+0xa/0x30
> [ 16.048377] eax: dfd04338 ebx: ecx: df655988 edx:
> [ 16.048456] esi: edi: dfd04338 ebp: esp: df506e38
> [ 16.048535] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
> [ 16.048614] Process rmmod (pid: 1455, ti=df506000 task=df6cc0b0 task.ti=df506000)
> [ 16.048693] Stack: dfd04338 dfd04340 c02db02f dfd04338 dfd041e4 c0331871
> [ 16.049159] c02db065 dfd041b0 c0331858 c055006d 0975d589 0026 035c
> [ 16.049626] c033ced6 df24c000 c0331879 c02d859f df24c0bc df24c0bc
> [ 16.050091] Call Trace:
> [ 16.050233] [c02db02f] class_device_del+0xcc/0xfa
> [ 16.050352] [c0331871] __nodemgr_remove_host_dev+0x0/0xb
> ...
> [ 16.057248] EIP: [c02daf3d] class_device_remove_attrs+0xa/0x30 SS:ESP 0068:df506e38
> ...
You missed the following entry in my list [1]:

Subject    : Oops in __nodemgr_remove_host_dev
References : http://lkml.org/lkml/2007/3/14/4
             http://lkml.org/lkml/2007/3/18/87
Submitter  : Ismail Dönmez [EMAIL PROTECTED]
             Stefan Richter [EMAIL PROTECTED]
             Thomas Meyer [EMAIL PROTECTED]
Caused-By  : Greg Kroah-Hartman [EMAIL PROTECTED]
             commit 43cb76d91ee85f579a69d42bc8efc08bac560278
             commit 40cf67c5fcc513406558c01b91129280208e57bf
Handled-By : Stefan Richter [EMAIL PROTECTED]
Status     : problem is being debugged

cu
Adrian

[1] not meant as an offence - there are so many items in the list that it's easy to miss one

-- 

"Is there not promise of rain?" Ling Tan asked suddenly out of the darkness.
There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
[1/6] 2.6.21-rc4: known regressions
This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either the submitter of one of the bugs, the maintainer of an affected subsystem or driver, the author of a patch that caused a breakage, or someone I consider possibly involved with one or more of these issues in some other way.

Due to the huge number of recipients, please trim the Cc when answering.

Subject    : weird system hangs
References : http://lkml.org/lkml/2007/3/16/288
Submitter  : Michal Piotrowski <[EMAIL PROTECTED]>
             Mariusz Kozlowski <[EMAIL PROTECTED]>
Status     : unknown

Subject    : crashes in KDE
References : http://bugzilla.kernel.org/show_bug.cgi?id=8157
Submitter  : Oliver Pinter <[EMAIL PROTECTED]>
Status     : unknown

Subject    : kwin dies silently
References : http://lkml.org/lkml/2007/2/28/112
Submitter  : Sid Boyce <[EMAIL PROTECTED]>
Status     : unknown