Re: [1/6] 2.6.21-rc4: known regressions

2007-03-23 Thread Thomas Gleixner
On Fri, 2007-03-23 at 11:28 -0700, Linus Torvalds wrote:
> 
> On Fri, 23 Mar 2007, Linus Torvalds wrote:
> > 
> > Thomas, please fix.
> 
> Here's a possible fix. It compiles. And I still wish we had common files.

You beat me by 30 seconds.

> ia64 shouldn't be affected, because ia64 doesn't #define the 
> ARCH_APICTIMER_STOPS_ON_C3 flag (and then we don't use the "c2_ok" thing 
> either. 

Right, ia64 does not see it.

> But this is still pretty damn ugly.

Yes it is.

> Maybe a field in "struct acpi_processor" for C2/C3 problems?

Hmm, the acpi processor stuff is modular.

tglx


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [1/6] 2.6.21-rc4: known regressions

2007-03-23 Thread Linus Torvalds


On Fri, 23 Mar 2007, Linus Torvalds wrote:
> 
> Thomas, please fix.

Here's a possible fix. It compiles. And I still wish we had common files.

ia64 shouldn't be affected, because ia64 doesn't #define the 
ARCH_APICTIMER_STOPS_ON_C3 flag (and then we don't use the "c2_ok" thing 
either). But this is still pretty damn ugly.

Maybe a field in "struct acpi_processor" for C2/C3 problems?

Linus

---
diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
index 723417d..46acf4f 100644
--- a/arch/x86_64/kernel/apic.c
+++ b/arch/x86_64/kernel/apic.c
@@ -47,6 +47,10 @@ int apic_calibrate_pmtmr __initdata;
 
 int disable_apic_timer __initdata;
 
+/* Local APIC timer works in C2? */
+int local_apic_timer_c2_ok;
+EXPORT_SYMBOL_GPL(local_apic_timer_c2_ok);
+
 static struct resource *ioapic_resources;
 static struct resource lapic_resource = {
    .name = "Local APIC",
@@ -1192,6 +1196,13 @@ static __init int setup_nolapic(char *str)
 } 
 early_param("nolapic", setup_nolapic);
 
+static int __init parse_lapic_timer_c2_ok(char *arg)
+{
+   local_apic_timer_c2_ok = 1;
+   return 0;
+}
+early_param("lapic_timer_c2_ok", parse_lapic_timer_c2_ok);
+
 static __init int setup_noapictimer(char *str) 
 { 
    if (str[0] != ' ' && str[0] != 0)
diff --git a/include/asm-x86_64/apic.h b/include/asm-x86_64/apic.h
index e81d0f2..7cfb39c 100644
--- a/include/asm-x86_64/apic.h
+++ b/include/asm-x86_64/apic.h
@@ -102,5 +102,6 @@ void switch_ipi_to_APIC_timer(void *cpumask);
 #define ARCH_APICTIMER_STOPS_ON_C3 1
 
 extern unsigned boot_cpu_id;
+extern int local_apic_timer_c2_ok;
 
 #endif /* __ASM_APIC_H */
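
For readers following the c2_ok discussion, here is a rough userspace sketch
of how such a flag could gate the C-state decision. Only
local_apic_timer_c2_ok and the C2/C3 distinction come from the thread; the
enum and needs_timer_broadcast() are hypothetical names, and the policy shown
is an assumption, not the code in drivers/acpi/processor_idle.c.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the ACPI C-state types (not kernel definitions). */
enum cstate { CSTATE_C1 = 1, CSTATE_C2 = 2, CSTATE_C3 = 3 };

/* Same name as the flag added by the patch: nonzero means the user asserted
 * on the command line that the local APIC timer keeps running in C2. */
int local_apic_timer_c2_ok;

/*
 * Assumed policy: the local APIC timer is never trusted in C3, and is
 * trusted in C2 only when the flag is set. Returns true when some other
 * timer source would have to cover for the local APIC timer.
 */
bool needs_timer_broadcast(enum cstate state)
{
	enum cstate first_bad = local_apic_timer_c2_ok ? CSTATE_C3 : CSTATE_C2;

	return state >= first_bad;
}
```

With the flag clear, C2 and C3 are both treated as unsafe; setting it moves
the cutoff to C3 only.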


Re: [1/6] 2.6.21-rc4: known regressions

2007-03-23 Thread Linus Torvalds


On Fri, 23 Mar 2007, Linus Torvalds wrote:
> 
> I really wish we had an x86-64 maintainer that understood that it's 
> confusing that files in arch/i386/ are also used for arch/x86-64.

Sorry, that was unfair. The patch was simply buggy. It added the test to 
drivers/acpi/ *without* adding it to the architectures that used it, it 
wasn't an i386/x86-64 thing.
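
One way shared code can avoid this class of build break is to touch the
symbol only where the architecture announces it. The sketch below assumes
that pattern; ARCH_APICTIMER_STOPS_ON_C3 and local_apic_timer_c2_ok are the
names from this thread, while lapic_timer_trusted_in_c2() is a hypothetical
helper and the definitions are inlined only so the example compiles on its
own.

```c
/* Normally provided by the architecture's apic.h; defined here so the
 * sketch is self-contained. */
#define ARCH_APICTIMER_STOPS_ON_C3 1

#ifdef ARCH_APICTIMER_STOPS_ON_C3
/* In the kernel this would be an extern declared by the arch header and
 * defined once in the arch's apic.c. */
int local_apic_timer_c2_ok;
#endif

/* Hypothetical helper: nonzero if the local APIC timer may be used in C2. */
int lapic_timer_trusted_in_c2(void)
{
#ifdef ARCH_APICTIMER_STOPS_ON_C3
	return local_apic_timer_c2_ok;
#else
	/* Architectures that do not define the macro never reference the
	 * variable, so they cannot hit the "undeclared" error. */
	return 1;
#endif
}
```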

Thomas, please fix.

Linus


Re: [1/6] 2.6.21-rc4: known regressions

2007-03-23 Thread Linus Torvalds


On Fri, 23 Mar 2007, Thomas Gleixner wrote:
> 
> We should revert that patch and add a "trust_lapic_timer_in_c2"
> commandline option instead. So we are on the safe side.

Damn. I applied your patch, but it breaks on x86-64:

   drivers/acpi/processor_idle.c:271: error: 'local_apic_timer_c2_ok' 
undeclared (first use in this function)

I really wish we had an x86-64 maintainer that understood that it's 
confusing that files in arch/i386/ are also used for arch/x86-64.

Linus


Re: [1/6] 2.6.21-rc4: known regressions

2007-03-23 Thread Adrian Bunk
On Fri, Mar 23, 2007 at 10:37:38AM +0100, Nick Piggin wrote:
> On Fri, Mar 23, 2007 at 08:51:13AM +0100, Michal Piotrowski wrote:
> > On 23/03/07, Nick Piggin <[EMAIL PROTECTED]> wrote:
> > >>
> > >> and that in turn points to the kernel log:
> > >>
> > >>   
> > >http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4/git-console.log
> > >
> > >Seems convincing. Michal, can you post your .config, and if you had
> > >dynticks and hrtimers enabled, try reproducing without them?
> > >
> > 
> > http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4/git-config
> > 
> > I don't know how to reproduce this bug on 2.6.21-rc4. On 2.6.21-rc2-mm1
> > it was very simple, just run youtube, bash_shared_mapping etc. In fact
> > I didn't see this bug for a week.
> 
> OK... for some reason this is listed as a regression against 2.6.21-rc4.
>...

Due to
   http://lkml.org/lkml/2007/3/16/288

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: [1/6] 2.6.21-rc4: known regressions

2007-03-23 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> [ Ok, I think it's those timers again...

agreed - this seems to be a genuine CONFIG_HIGH_RES_TIMERS=y bug. (which 
has probably not been fixed since -rc4 either, we have no bugfix in this 
area that could explain the expires_next==KTIME_MAX timer state visible 
in SysRq-Q.)

there seems to be a trend in the reports: HT P4 CPUs.

>   Ingo: let me just state how *happy* I am that I told you off when 
>   you wanted to merge the hires timers and NO_HZ before 2.6.20 because 
>   they were "stable". You were wrong, and 2.6.20 is at least in 
>   reasonable shape. [...]

yes - i was quite wrong pushing it so hard. (and doubly so given your 
stated focus of making v2.6.20 a quiet release) Sorry :-/

> [...] Now we just need to make sure that 2.6.21 will be too.. ]

yeah - we are working hard on it.

Ingo


Re: [1/6] 2.6.21-rc4: known regressions

2007-03-23 Thread Thomas Gleixner
On Fri, 2007-03-23 at 12:42 +0100, Ingo Molnar wrote:
> there's a new post-rc4 regression: my T60 hangs during early bootup. I 
> bisected the hang down to this recent commit:
> 
> | commit 25496caec111481161e7f06bbfa12a533c43cc6f
> | Author: Thomas Renninger <[EMAIL PROTECTED]>
> | Date:   Tue Feb 27 12:13:00 2007 -0500
> |
> |ACPI: Only use IPI on known broken machines (AMD, Dothan/BaniasPentium M)
> 
> undoing this change fixes my T60 so it correctly boots again.
> 
> the commit has this confidence-raising comment:
> 
> |   However, I am not sure about the naming of the parameter and how it 
> |   could/should get integrated into the dyntick part 
> |   (CONFIG_GENERIC_CLOCKEVENTS). There, a more fine grained check (TSC 
> |   still running?, ..) is needed?
> 
> could we please revert this commit until it's done correctly?
> 
> and did this end up being a 'fix'? The change weakens the scope of a 
> hardware workaround, which IMO has no place so late in the cycle. At a 
> minimum the clockevents maintainer (Thomas) should have been Cc:-ed on 
> it.

Ingo, 

I had seen it before, and I had no objections under the premise that it
does not break things and especially survives on Andrew's VAIO. I
expected that to come in via -mm so it gets enough testing.

We should revert that patch and add a "trust_lapic_timer_in_c2"
commandline option instead. So we are on the safe side.
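
A minimal sketch of the opt-in idea, assuming a simplified space-separated
command line. cmdline_has_flag() is a hypothetical userspace helper, not a
kernel API; the kernel has its own parameter parsing, and the patch posted in
this thread registers the option with early_param() under the name
lapic_timer_c2_ok.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns true only if 'flag' appears in 'cmdline' as a whole
 * space-delimited word. Absent the word, the caller keeps the safe
 * default of not trusting the local APIC timer in C2.
 */
bool cmdline_has_flag(const char *cmdline, const char *flag)
{
	size_t len = strlen(flag);
	const char *p = cmdline;

	if (len == 0)
		return false;

	while ((p = strstr(p, flag)) != NULL) {
		bool starts = (p == cmdline) || (p[-1] == ' ');
		bool ends = (p[len] == '\0') || (p[len] == ' ');

		if (starts && ends)
			return true;
		p += len;	/* skip this partial match and keep looking */
	}
	return false;
}
```

The point of the design is that the risky behaviour needs an explicit word
on the command line; machines that say nothing keep the workaround.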

tglx




Re: [1/6] 2.6.21-rc4: known regressions

2007-03-23 Thread Ingo Molnar

there's a new post-rc4 regression: my T60 hangs during early bootup. I 
bisected the hang down to this recent commit:

| commit 25496caec111481161e7f06bbfa12a533c43cc6f
| Author: Thomas Renninger <[EMAIL PROTECTED]>
| Date:   Tue Feb 27 12:13:00 2007 -0500
|
|ACPI: Only use IPI on known broken machines (AMD, Dothan/BaniasPentium M)

undoing this change fixes my T60 so it correctly boots again.

the commit has this confidence-raising comment:

|   However, I am not sure about the naming of the parameter and how it 
|   could/should get integrated into the dyntick part 
|   (CONFIG_GENERIC_CLOCKEVENTS). There, a more fine grained check (TSC 
|   still running?, ..) is needed?

could we please revert this commit until it's done correctly?

and did this end up being a 'fix'? The change weakens the scope of a 
hardware workaround, which IMO has no place so late in the cycle. At a 
minimum the clockevents maintainer (Thomas) should have been Cc:-ed on 
it.

Ingo


Re: [1/6] 2.6.21-rc4: known regressions

2007-03-23 Thread Nick Piggin
On Fri, Mar 23, 2007 at 08:51:13AM +0100, Michal Piotrowski wrote:
> On 23/03/07, Nick Piggin <[EMAIL PROTECTED]> wrote:
> >>
> >> and that in turn points to the kernel log:
> >>
> >>   
> >http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4/git-console.log
> >
> >Seems convincing. Michal, can you post your .config, and if you had
> >dynticks and hrtimers enabled, try reproducing without them?
> >
> 
> http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4/git-config
> 
> I don't know how to reproduce this bug on 2.6.21-rc4. On 2.6.21-rc2-mm1
> it was very simple, just run youtube, bash_shared_mapping etc. In fact
> I didn't see this bug for a week.

OK... for some reason this is listed as a regression against 2.6.21-rc4.

You do have CONFIG_NO_HZ=y, and it is likely to be the cause of your
2.6.21-rc2-mm1 problems, but maybe there have been fixes since then? Ingo?



Re: [1/6] 2.6.21-rc4: known regressions

2007-03-23 Thread Michal Piotrowski

On 23/03/07, Nick Piggin <[EMAIL PROTECTED]> wrote:

> On Thu, Mar 22, 2007 at 06:40:41PM -0700, Linus Torvalds wrote:
> >
> > [ Ok, I think it's those timers again...
> >
> >   Ingo: let me just state how *happy* I am that I told you off when you
> >   wanted to merge the hires timers and NO_HZ before 2.6.20 because they
> >   were "stable". You were wrong, and 2.6.20 is at least in reasonable
> >   shape. Now we just need to make sure that 2.6.21 will be too.. ]
> >
> > On Thu, 22 Mar 2007, Mingming Cao wrote:
> > >
> > > I might have missed something, so far I can't see a deadlock yet.
> > > If there is a deadlock, I think we should see ext3_xattr_release_block()
> > > and ext3_forget() on the stack. Is this the case?
> >
> > No. What's strange is that two (maybe more, I didn't check) processes seem
> > to be stuck in
> >
> >  [<c0318981>] schedule_timeout+0x70/0x8e
> >  [<c03189b4>] schedule_timeout_uninterruptible+0x15/0x17
> >  [<c01b964a>] journal_stop+0xe2/0x1e6
> >  [<c01ba2b0>] journal_force_commit+0x1d/0x1f
> >  [<c01b29fb>] ext3_force_commit+0x22/0x24
> >  [<c01ad607>] ext3_write_inode+0x34/0x3a
> >  [<c0189f74>] __writeback_single_inode+0x1c5/0x2cb
> >  [<c018a096>] sync_inode+0x1c/0x2e
> >  [<c01a9ff7>] ext3_sync_file+0xab/0xc0
> >  [<c018c8c5>] do_fsync+0x4b/0x98
> >  [<c018c932>] __do_fsync+0x20/0x2f
> >  [<c018c960>] sys_fsync+0xd/0xf
> >  [<c0104064>] syscall_call+0x7/0xb
> >
> > but that thing is literally:
> >
> >     ...
> >     do {
> >         old_handle_count = transaction->t_handle_count;
> >         schedule_timeout_uninterruptible(1);
> >     } while (old_handle_count != transaction->t_handle_count);
> >     ...
> >
> > and especially if nothing is happening, I'd not expect
> > "transaction->t_handle_count" to keep changing, so it should stop very
> > quickly.
> >
> > Maybe it's CONFIG_NO_HZ again, and the problem is that timeout, and simply
> > no timer tick happening?
> >
> > Bingo. I think that's it.
> >
> >   active timers:
> >    #0: hardirq_stack, tick_sched_timer, S:01
> >    # expires at 953089300 nsecs [in -2567889 nsecs]
> >    #1: hardirq_stack, hrtimer_wakeup, S:01
> >    # expires at 10858649798503 nsecs [in 1327754230614 nsecs]
> >   .expires_next   : 953089300 nsecs
> >
> > See
> >
> >   http://lkml.org/lkml/2007/3/16/288
> >
> > and that in turn points to the kernel log:
> >
> >   http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4/git-console.log
>
> Seems convincing. Michal, can you post your .config, and if you had
> dynticks and hrtimers enabled, try reproducing without them?



http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4/git-config

I don't know how to reproduce this bug on 2.6.21-rc4. On 2.6.21-rc2-mm1
it was very simple, just run youtube, bash_shared_mapping etc. In fact
I didn't see this bug for a week.

Unfortunately, I wasn't able to take a crash dump because of sound
card driver bug (I've got crash dump from 2.6.21-rc2-mm1).

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)


Re: [1/6] 2.6.21-rc4: known regressions

2007-03-22 Thread Nick Piggin
On Thu, Mar 22, 2007 at 06:40:41PM -0700, Linus Torvalds wrote:
> 
> [ Ok, I think it's those timers again...
> 
>   Ingo: let me just state how *happy* I am that I told you off when you 
>   wanted to merge the hires timers and NO_HZ before 2.6.20 because they 
>   were "stable". You were wrong, and 2.6.20 is at least in reasonable 
>   shape. Now we just need to make sure that 2.6.21 will be too.. ]
> 
> On Thu, 22 Mar 2007, Mingming Cao wrote:
> > 
> > > I might have missed something, so far I can't see a deadlock yet.
> > If there is a deadlock, I think we should see ext3_xattr_release_block()
> > and ext3_forget() on the stack. Is this the case?
> 
> No. What's strange is that two (maybe more, I didn't check) processes seem 
> to be stuck in
> 
>  [<c0318981>] schedule_timeout+0x70/0x8e
>  [<c03189b4>] schedule_timeout_uninterruptible+0x15/0x17
>  [<c01b964a>] journal_stop+0xe2/0x1e6
>  [<c01ba2b0>] journal_force_commit+0x1d/0x1f
>  [<c01b29fb>] ext3_force_commit+0x22/0x24
>  [<c01ad607>] ext3_write_inode+0x34/0x3a
>  [<c0189f74>] __writeback_single_inode+0x1c5/0x2cb
>  [<c018a096>] sync_inode+0x1c/0x2e
>  [<c01a9ff7>] ext3_sync_file+0xab/0xc0
>  [<c018c8c5>] do_fsync+0x4b/0x98
>  [<c018c932>] __do_fsync+0x20/0x2f
>  [<c018c960>] sys_fsync+0xd/0xf
>  [<c0104064>] syscall_call+0x7/0xb
> 
> but that thing is literally:
> 
>     ...
>     do {
>         old_handle_count = transaction->t_handle_count;
>         schedule_timeout_uninterruptible(1);
>     } while (old_handle_count != transaction->t_handle_count);
>     ...
> 
> and especially if nothing is happening, I'd not expect 
> "transaction->t_handle_count" to keep changing, so it should stop very 
> quickly.
> 
> Maybe it's CONFIG_NO_HZ again, and the problem is that timeout, and simply 
> no timer tick happening?
> 
> Bingo. I think that's it.
> 
>   active timers:
>#0: hardirq_stack, tick_sched_timer, S:01
># expires at 953089300 nsecs [in -2567889 nsecs]
>#1: hardirq_stack, hrtimer_wakeup, S:01
># expires at 10858649798503 nsecs [in 1327754230614 nsecs]
> .expires_next   : 953089300 nsecs
> 
> See
> 
>   http://lkml.org/lkml/2007/3/16/288
> 
> and that in turn points to the kernel log:
> 
>   
> http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4/git-console.log

Seems convincing. Michal, can you post your .config, and if you had
dynticks and hrtimers enabled, try reproducing without them?


Re: [1/6] 2.6.21-rc4: known regressions

2007-03-22 Thread Linus Torvalds

[ Ok, I think it's those timers again...

  Ingo: let me just state how *happy* I am that I told you off when you 
  wanted to merge the hires timers and NO_HZ before 2.6.20 because they 
  were "stable". You were wrong, and 2.6.20 is at least in reasonable 
  shape. Now we just need to make sure that 2.6.21 will be too.. ]

On Thu, 22 Mar 2007, Mingming Cao wrote:
> 
> I might have missed something, so far I can't see a deadlock yet.
> If there is a deadlock, I think we should see ext3_xattr_release_block()
> and ext3_forget() on the stack. Is this the case?

No. What's strange is that two (maybe more, I didn't check) processes seem 
to be stuck in

 [<c0318981>] schedule_timeout+0x70/0x8e
 [<c03189b4>] schedule_timeout_uninterruptible+0x15/0x17
 [<c01b964a>] journal_stop+0xe2/0x1e6
 [<c01ba2b0>] journal_force_commit+0x1d/0x1f
 [<c01b29fb>] ext3_force_commit+0x22/0x24
 [<c01ad607>] ext3_write_inode+0x34/0x3a
 [<c0189f74>] __writeback_single_inode+0x1c5/0x2cb
 [<c018a096>] sync_inode+0x1c/0x2e
 [<c01a9ff7>] ext3_sync_file+0xab/0xc0
 [<c018c8c5>] do_fsync+0x4b/0x98
 [<c018c932>] __do_fsync+0x20/0x2f
 [<c018c960>] sys_fsync+0xd/0xf
 [<c0104064>] syscall_call+0x7/0xb

but that thing is literally:

	...
	do {
		old_handle_count = transaction->t_handle_count;
		schedule_timeout_uninterruptible(1);
	} while (old_handle_count != transaction->t_handle_count);
	...

and especially if nothing is happening, I'd not expect 
"transaction->t_handle_count" to keep changing, so it should stop very 
quickly.

Maybe it's CONFIG_NO_HZ again, and the problem is that timeout, and simply 
no timer tick happening?

Bingo. I think that's it.

active timers:
 #0: hardirq_stack, tick_sched_timer, S:01
 # expires at 953089300 nsecs [in -2567889 nsecs]
 #1: hardirq_stack, hrtimer_wakeup, S:01
 # expires at 10858649798503 nsecs [in 1327754230614 nsecs]
  .expires_next   : 953089300 nsecs

See

http://lkml.org/lkml/2007/3/16/288

and that in turn points to the kernel log:


http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4/git-console.log


Re: [1/6] 2.6.21-rc4: known regressions

2007-03-22 Thread Mingming Cao
On Thu, 2007-03-22 at 08:21 -0700, Linus Torvalds wrote:
> 
> On Thu, 22 Mar 2007, Nick Piggin wrote:
> > 
> > Nothing sleeps on PageUptodate, so I don't think that could explain it.
> 
> Good point. I forget that we just test "uptodate", but then always sleep 
> on "locked".
> 
> > The fs: fix __block_write_full_page error case buffer submission patch
> > does change the locking, but I'd be really surprised if that was the
> > problem, because it changes locking to match the regular non-error path
> > submission.
> 
> I'd agree, except something clearly has changed ;^)
> 
> > > Alternatively, maybe it really is an _io_ problem (and the buffer-head 
> > > thing
> > > is just a red herring, and it could happen to other IO, it's just that
> > > metadata IO uses buffer heads), and it's the scheduler changes since
> > > 2.6.20..
> > 
> > I see what you mean. Could it be an ext3 or jbd change I wonder?
> 
> jbd hasn't changed since 2.6.20, and the ext3 changes are mostly 
> things like const'ness fixes. And others were things like changing 
> "journal_current_handle()" into "ext3_journal_current_handle()", which 
> looked exciting considering that the hung processes were waiting for the 
> journal, but the fact is, that's just an inline function that just calls 
> the old function, so..
> 
> But interestingly, there *is* a "EA block reference count racing fix" 
> that does move a lock_buffer()/unlock_buffer() to cover a bigger area. It 
> looks "obviously correct", but maybe there's a deadlock possibility there 
> with ext3_forget() or something?
> 

I might have missed something, so far I can't see a deadlock yet.
If there is a deadlock, I think we should see ext3_xattr_release_block()
and ext3_forget() on the stack. Is this the case?

Regards,
Mingming

>   Linus


Re: [1/6] 2.6.21-rc4: known regressions

2007-03-22 Thread Mariusz Kozłowski
Hello,

> > In contrast, the hang reported by Mariusz Kozlowski has a slightly 
> > different feel to it, but there's a tantalizing pattern in there too:

Just to make things clear. I didn't say I could reproduce it on 2.6.21-rc4.
In fact I'm running 2.6.21-rc4-mm1 with no problems so far. I just replied
to show my sysrq dumps of process states with 2.6.21-rc2-mm1.

I could reproduce similar (but still each time slightly different) hangs 
on -mm series from 2.6.20-mm1 to 2.6.21-rc2-mm1. 2.6.21-rc3-mm1 worked well
for me, so I'm not sure if my report is still valid here.

Sorry if I didn't make it clear enough.

> >   http://www.ussg.iu.edu/hypermail/linux/kernel/0703.0/1243.html
> > 
> > Call Trace:
> > [<c03ec87e>] io_schedule+0x42/0x59
> > [<c0184915>] sleep_on_buffer+0x8/0xc
> > [<c03ed217>] __wait_on_bit+0x47/0x6c
> > [<c03ed297>] out_of_line_wait_on_bit+0x5b/0x64
> > [<c01848a8>] __wait_on_buffer+0x27/0x2d
> > [<c01b4228>] journal_commit_transaction+0x707/0x127f
> > [<c01b868b>] kjournald+0xac/0x1ed
> > [<c0126af5>] kthread+0xa2/0xc9
> > [<c010422b>] kernel_thread_helper+0x7/0x1c
> > 
> > which certainly also looks like an IO never completed (or completed but 
> > never woke anything up).

As I previously noticed, each time the system hung, I/O activity to the disk
looked dead (couldn't even sysrq-s).

> It could be possible that ext3 is doing something weird and expecting

True. I'm using ext3.

> fs: nobh data leak... again hard to see how it could cause an unlock/wakeup
> to get lost. Is Mariusz using the nobh mount option?

No. He is not.

Regards,

Mariusz Kozlowski


Re: [1/6] 2.6.21-rc4: known regressions

2007-03-22 Thread Linus Torvalds


On Thu, 22 Mar 2007, Nick Piggin wrote:
> 
> Nothing sleeps on PageUptodate, so I don't think that could explain it.

Good point. I forget that we just test "uptodate", but then always sleep 
on "locked".

> The fs: fix __block_write_full_page error case buffer submission patch
> does change the locking, but I'd be really surprised if that was the
> problem, because it changes locking to match the regular non-error path
> submission.

I'd agree, except something clearly has changed ;^)

> > Alternatively, maybe it really is an _io_ problem (and the buffer-head thing
> > is just a red herring, and it could happen to other IO, it's just that
> > metadata IO uses buffer heads), and it's the scheduler changes since
> > 2.6.20..
> 
> I see what you mean. Could it be an ext3 or jbd change I wonder?

jbd hasn't changed since 2.6.20, and the ext3 changes are mostly 
things like const'ness fixes. And others were things like changing 
"journal_current_handle()" into "ext3_journal_current_handle()", which 
looked exciting considering that the hung processes were waiting for the 
journal, but the fact is, that's just an inline function that just calls 
the old function, so..

But interestingly, there *is* a "EA block reference count racing fix" 
that does move a lock_buffer()/unlock_buffer() to cover a bigger area. It 
looks "obviously correct", but maybe there's a deadlock possibility there 
with ext3_forget() or something?

Linus


Re: [1/6] 2.6.21-rc4: known regressions

2007-03-21 Thread Nick Piggin

Linus Torvalds wrote:
> In contrast, the hang reported by Mariusz Kozlowski has a slightly
> different feel to it, but there's a tantalizing pattern in there too:
>
>   http://www.ussg.iu.edu/hypermail/linux/kernel/0703.0/1243.html
>
> Call Trace:
> [<c03ec87e>] io_schedule+0x42/0x59
> [<c0184915>] sleep_on_buffer+0x8/0xc
> [<c03ed217>] __wait_on_bit+0x47/0x6c
> [<c03ed297>] out_of_line_wait_on_bit+0x5b/0x64
> [<c01848a8>] __wait_on_buffer+0x27/0x2d
> [<c01b4228>] journal_commit_transaction+0x707/0x127f
> [<c01b868b>] kjournald+0xac/0x1ed
> [<c0126af5>] kthread+0xa2/0xc9
> [<c010422b>] kernel_thread_helper+0x7/0x1c
>
> which certainly also looks like an IO never completed (or completed but
> never woke anything up).
>
> It also seems to be related to *buffers*. Maybe the whole bh layer thing
> is a fluke, but it's not waiting for normal data, it's very much waiting
> for those journal things that all use buffer heads. Which just makes me
> worry about those patches by Nick (which did come in through Andrew). I
> don't think it's the memorder one (it looks safe and shouldn't matter on
> x86 anyway!), but what about the
>
> fs: fix __block_write_full_page error case buffer submission
>
> locking change for example? Or that "fs: fix nobh data leak" thing with
> its fix? It uses "SetPageUptodate(page);" without waking up anybody who
> might wait for it (but the waiters here seem to wait on buffers, so that's
> probably not it)..

Nothing sleeps on PageUptodate, so I don't think that could explain it.

The fs: fix __block_write_full_page error case buffer submission patch
does change the locking, but I'd be really surprised if that was the
problem, because it changes locking to match the regular non-error path
submission.

It could be possible that ext3 is doing something weird and expecting
the old behaviour if it failed get_block, but that seems pretty weird
to do, and would need fixing.

fs: nobh data leak... again hard to see how it could cause an unlock/wakeup
to get lost. Is Mariusz using the nobh mount option?

It wouldn't hurt to test with these patches backed out...

> Alternatively, maybe it really is an _io_ problem (and the buffer-head
> thing is just a red herring, and it could happen to other IO, it's just
> that metadata IO uses buffer heads), and it's the scheduler changes since
> 2.6.20..

I see what you mean. Could it be an ext3 or jbd change I wonder?

--
SUSE Labs, Novell Inc.


Re: [1/6] 2.6.21-rc4: known regressions

2007-03-21 Thread Linus Torvalds


On Sun, 18 Mar 2007, Adrian Bunk wrote:
> 
> Subject: weird system hangs
> References : http://lkml.org/lkml/2007/3/16/288
> Submitter  : Michal Piotrowski <[EMAIL PROTECTED]>
>  Mariusz Kozlowski <[EMAIL PROTECTED]>
> Status : unknown

According to the console log, it seems to be hung because a lot of 
processes are stuck in D state in various variations of this:

Call Trace:
 [<c01ba134>] start_this_handle+0x2d7/0x355
 [<c01ba265>] journal_start+0xb3/0xe1
 [<c01b2837>] ext3_journal_start_sb+0x48/0x4a
 [<c01b0924>] ext3_create+0x47/0xe2
 [<c017820c>] vfs_create+0xcd/0x13e
 [<c017ab6e>] open_namei+0x176/0x5b5
 [<c0170026>] do_filp_open+0x26/0x3b
 [<c017007e>] do_sys_open+0x43/0xc2
 [<c0170135>] sys_open+0x1c/0x1e
 [<c0104064>] syscall_call+0x7/0xb

and then you have "kget" (whatever that is) which is doing

Call Trace:
 [<c0318981>] schedule_timeout+0x70/0x8e
 [<c03189b4>] schedule_timeout_uninterruptible+0x15/0x17
 [<c01b964a>] journal_stop+0xe2/0x1e6
 [<c01ba2b0>] journal_force_commit+0x1d/0x1f
 [<c01b29fb>] ext3_force_commit+0x22/0x24
 [<c01ad607>] ext3_write_inode+0x34/0x3a
 [<c0189f74>] __writeback_single_inode+0x1c5/0x2cb
 [<c018a096>] sync_inode+0x1c/0x2e
 [<c01a9ff7>] ext3_sync_file+0xab/0xc0
 [<c018c8c5>] do_fsync+0x4b/0x98
 [<c018c932>] __do_fsync+0x20/0x2f
 [<c018c951>] sys_fdatasync+0x10/0x12
 [<c0104064>] syscall_call+0x7/0xb

with kjournald in D sleep at

 [<c01bb7b2>] journal_commit_transaction+0x15d/0x11d3
 [<c01bfcbe>] kjournald+0xab/0x1e8
 [<c01333dd>] kthread+0xb5/0xe0
 [<c0104cd3>] kernel_thread_helper+0x7/0x10

which certainly looks like something is waiting for an IO to finish.

In contrast, the hang reported by Mariusz Kozlowski has a slightly 
different feel to it, but there's a tantalizing pattern in there too:

  http://www.ussg.iu.edu/hypermail/linux/kernel/0703.0/1243.html

Call Trace:
[<c03ec87e>] io_schedule+0x42/0x59
[<c0184915>] sleep_on_buffer+0x8/0xc
[<c03ed217>] __wait_on_bit+0x47/0x6c
[<c03ed297>] out_of_line_wait_on_bit+0x5b/0x64
[<c01848a8>] __wait_on_buffer+0x27/0x2d
[<c01b4228>] journal_commit_transaction+0x707/0x127f
[<c01b868b>] kjournald+0xac/0x1ed
[<c0126af5>] kthread+0xa2/0xc9
[<c010422b>] kernel_thread_helper+0x7/0x1c

which certainly also looks like an IO never completed (or completed but 
never woke anything up).

It also seems to be related to *buffers*. Maybe the whole bh layer thing 
is a fluke, but it's not waiting for normal data, it's very much waiting 
for those journal things that all use buffer heads. Which just makes me 
worry about those patches by Nick (which did come in through Andrew). I 
don't think it's the memorder one (it looks safe and shouldn't matter on 
x86 anyway!), but what about the

fs: fix __block_write_full_page error case buffer submission

locking change for example? Or that "fs: fix nobh data leak" thing with 
its fix? It uses "SetPageUptodate(page);" without waking up anybody who 
might wait for it (but the waiters here seem to wait on buffers, so that's 
probably not it)..

Alternatively, maybe it really is an _io_ problem (and the buffer-head 
thing is just a red herring, and it could happen to other IO, it's just 
that metadata IO uses buffer heads), and it's the scheduler changes since 
2.6.20..

Jens, Nick.. Could you take a look?

Linus


Re: [1/6] 2.6.21-rc4: known regressions

2007-03-20 Thread Adrian Bunk
On Tue, Mar 20, 2007 at 11:24:41AM +0100, Tobias Diedrich wrote:
> Adrian Bunk wrote:
> > This email lists some known regressions in Linus' tree compared to 2.6.20.
> 
> Since I didn't see any mention of this:
> 
> I'm seeing an Oops when removing the ohci1394 module:
> 
> [   16.047275] ieee1394: Node removed: ID:BUS[158717321-38:0860]  
> GUID[c033ced6]
> [   16.047287] BUG: unable to handle kernel NULL pointer dereference at 
> virtual address 0094
> [   16.047451]  printing eip:
> [   16.047524] c02daf3d
> [   16.047527] *pde = 
> [   16.047603] Oops:  [#1]
> [   16.047676] PREEMPT 
> [   16.047788] Modules linked in: backlight ohci1394 parport_pc parport
> [   16.048069] CPU:0
> [   16.048071] EIP:0060:[]Not tainted VLI
> [   16.048074] EFLAGS: 00010246   (2.6.21-rc4 #35)
> [   16.048298] EIP is at class_device_remove_attrs+0xa/0x30
> [   16.048377] eax: dfd04338   ebx:    ecx: df655988   edx: 
> [   16.048456] esi:    edi: dfd04338   ebp:    esp: df506e38
> [   16.048535] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> [   16.048614] Process rmmod (pid: 1455, ti=df506000 task=df6cc0b0 
> task.ti=df506000)
> [   16.048693] Stack: dfd04338 dfd04340  c02db02f  dfd04338 
> dfd041e4 c0331871 
> [   16.049159] c02db065 dfd041b0 c0331858 c055006d 0975d589 
> 0026 035c 
> [   16.049626] c033ced6  df24c000 c0331879 c02d859f 
> df24c0bc df24c0bc 
> [   16.050091] Call Trace:
> [   16.050233]  [] class_device_del+0xcc/0xfa
> [   16.050352]  [] __nodemgr_remove_host_dev+0x0/0xb
>...
> [   16.057248] EIP: [] class_device_remove_attrs+0xa/0x30 SS:ESP 
> 0068:df506e38
>...

You missed the following entry in my list [1]:

Subject    : Oops in __nodemgr_remove_host_dev
References : http://lkml.org/lkml/2007/3/14/4
             http://lkml.org/lkml/2007/3/18/87
Submitter  : Ismail Dönmez <[EMAIL PROTECTED]>
             Stefan Richter <[EMAIL PROTECTED]>
             Thomas Meyer <[EMAIL PROTECTED]>
Caused-By  : Greg Kroah-Hartman <[EMAIL PROTECTED]>
             commit 43cb76d91ee85f579a69d42bc8efc08bac560278
             commit 40cf67c5fcc513406558c01b91129280208e57bf
Handled-By : Stefan Richter <[EMAIL PROTECTED]>
Status     : problem is being debugged


cu
Adrian

[1] not meant as an offence - there are so many items in the list
that it's easy to miss one

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: [1/6] 2.6.21-rc4: known regressions

2007-03-20 Thread Tobias Diedrich
Adrian Bunk wrote:
> This email lists some known regressions in Linus' tree compared to 2.6.20.

Since I didn't see any mention of this:

I'm seeing an Oops when removing the ohci1394 module:

[   16.047275] ieee1394: Node removed: ID:BUS[158717321-38:0860]  GUID[c033ced6]
[   16.047287] BUG: unable to handle kernel NULL pointer dereference at virtual address 0094
[   16.047451]  printing eip:
[   16.047524] c02daf3d
[   16.047527] *pde = 
[   16.047603] Oops:  [#1]
[   16.047676] PREEMPT 
[   16.047788] Modules linked in: backlight ohci1394 parport_pc parport
[   16.048069] CPU:0
[   16.048071] EIP:    0060:[<c02daf3d>]    Not tainted VLI
[   16.048074] EFLAGS: 00010246   (2.6.21-rc4 #35)
[   16.048298] EIP is at class_device_remove_attrs+0xa/0x30
[   16.048377] eax: dfd04338   ebx:    ecx: df655988   edx: 
[   16.048456] esi:    edi: dfd04338   ebp:    esp: df506e38
[   16.048535] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
[   16.048614] Process rmmod (pid: 1455, ti=df506000 task=df6cc0b0 task.ti=df506000)
[   16.048693] Stack: dfd04338 dfd04340  c02db02f  dfd04338 dfd041e4 c0331871 
[   16.049159] c02db065 dfd041b0 c0331858 c055006d 0975d589 0026 035c 
[   16.049626] c033ced6  df24c000 c0331879 c02d859f df24c0bc df24c0bc 
[   16.050091] Call Trace:
[   16.050233]  [<c02db02f>] class_device_del+0xcc/0xfa
[   16.050352]  [<c0331871>] __nodemgr_remove_host_dev+0x0/0xb
[   16.050475]  [<c02db065>] class_device_unregister+0x8/0x10
[   16.050595]  [<c0331858>] nodemgr_remove_ne+0x61/0x7a
[   16.050714]  [<c033ced6>] ether1394_header_cache+0x0/0x43
[   16.050835]  [<c0331879>] __nodemgr_remove_host_dev+0x8/0xb
[   16.050954]  [<c02d859f>] device_for_each_child+0x1a/0x3c
[   16.051073]  [<c0331b98>] nodemgr_remove_host+0x30/0x90
[   16.051192]  [<c032f12c>] __unregister_host+0x1a/0xad
[   16.051311]  [<c032ee17>] hl_get_hostinfo+0x5b/0x76
[   16.051430]  [<c032f34a>] highlevel_remove_host+0x21/0x42
[   16.051549]  [<c032ed9d>] hpsb_remove_host+0x37/0x56
[   16.051668]  [<e0869263>] ohci1394_pci_remove+0x44/0x1c7 [ohci1394]
[   16.051794]  [<c027e5b0>] pci_device_remove+0x16/0x35
[   16.053376]  [<c02da6d7>] __device_release_driver+0x6e/0x8b
[   16.053496]  [<c02dab77>] driver_detach+0xa1/0xde
[   16.053613]  [<c02da33f>] bus_remove_driver+0x57/0x75
[   16.053733]  [<c02dabd4>] driver_unregister+0x8/0x13
[   16.053850]  [<c027e732>] pci_unregister_driver+0xc/0x6e
[   16.053969]  [<c0134d56>] sys_delete_module+0x174/0x19a
[   16.054091]  [<c0113cea>] do_page_fault+0x277/0x525
[   16.054211]  [<c0148b0d>] do_munmap+0x193/0x1ac
[   16.054331]  [<c0103d0c>] syscall_call+0x7/0xb
[   16.054450]  ===
[   16.054523] Code: ff c3 85 c0 74 08 83 c0 08 e9 9b f8 ea ff b8 ea ff ff ff c3 85 c0 74 08 83 c0 08 e9 b9 db ea ff c3 57 89 c7 56 53 31 db 8b 70 44 <83> be 94 00 00 00 00 75 09 eb 17 89 f8 e8 d7 ff ff ff 89 da 83 
[   16.057248] EIP: [<c02daf3d>] class_device_remove_attrs+0xa/0x30 SS:ESP 0068:df506e38

-- 
Tobias  PGP: http://9ac7e0bc.uguu.de
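
Note on the trace above: the bytes on the "Code:" line just before the
marked instruction, "8b 70 44", load a pointer from offset 0x44 of the
function argument into %esi, and the marked instruction "<83> be 94 00 00
00 00" compares the 32-bit word at offset 0x94 from %esi with zero. With
%esi holding a NULL pointer, that read lands at address 0x94, which is
consistent with the reported NULL pointer dereference. The following is
only a user-space sketch of that arithmetic. The struct and field names
are hypothetical stand-ins padded to the offsets seen in the code bytes;
they are not the kernel's real definitions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layouts, padded so the members sit at the offsets that
 * appear in the oops "Code:" bytes. Pointer slots are modelled as
 * uint32_t because the oops comes from a 32-bit x86 kernel.
 */
struct sketch_class {
	unsigned char pad[0x94];
	uint32_t class_dev_attrs;	/* assumed name; slot at offset 0x94 */
};

struct sketch_class_device {
	unsigned char pad[0x44];
	uint32_t class_ptr;		/* assumed name; slot at offset 0x44 */
};

/*
 * Address the CPU touches when it reads a member at member_offset
 * through a struct pointer whose 32-bit value is base.
 */
static uint32_t fault_address(uint32_t base, size_t member_offset)
{
	return base + (uint32_t)member_offset;
}
```

Under these assumptions, fault_address(0, offsetof(struct sketch_class,
class_dev_attrs)) evaluates to 0x94: a NULL base pointer plus the member
offset gives the small non-zero address that faults.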


[1/6] 2.6.21-rc4: known regressions

2007-03-18 Thread Adrian Bunk
This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, a maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I consider you in some other way
possibly involved with one or more of these issues.

Due to the large number of recipients, please trim the Cc when answering.


Subject    : weird system hangs
References : http://lkml.org/lkml/2007/3/16/288
Submitter  : Michal Piotrowski <[EMAIL PROTECTED]>
             Mariusz Kozlowski <[EMAIL PROTECTED]>
Status     : unknown


Subject    : crashes in KDE
References : http://bugzilla.kernel.org/show_bug.cgi?id=8157
Submitter  : Oliver Pinter <[EMAIL PROTECTED]>
Status     : unknown


Subject    : kwin dies silently
References : http://lkml.org/lkml/2007/2/28/112
Submitter  : Sid Boyce <[EMAIL PROTECTED]>
Status     : unknown
