Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Daniel Walker
On Tue, 2007-04-24 at 23:20 +0200, Andi Kleen wrote:
> On Tuesday 24 April 2007 22:52:27 Daniel Walker wrote:
> > On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
> > 
> > > And sched_clock's use of local_irq_save/restore appears to be absolutely
> > > correct, so I think it must be triggering a bug in either the self-tests
> > > or lockdep itself.
> > 
> > Why does sched_clock need to disable interrupts?
> 
> It's only used in the unstable path, which is kind of "I already threw up
> my hands" anyway.
> 
> I use it because when you transition from stable (TSC) to unstable (jiffies)
> the only way to avoid the clock jumping backwards is to remember and update
> the last value. To avoid races with parallel cpufreq handlers or timer
> interrupts this small section needs to run with interrupts disabled.

Preemption is already disabled by get_cpu_var(), so it seems like
the timer interrupt is the only worry? I find it confusing that the
access to jiffies_64 isn't protected from interrupts; it's only the
per_cpu data, which should already be protected by
get_cpu_var()/put_cpu_var().

Daniel

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Andi Kleen
On Tuesday 24 April 2007 22:52:27 Daniel Walker wrote:
> On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
> 
> > And sched_clock's use of local_irq_save/restore appears to be absolutely
> > correct, so I think it must be triggering a bug in either the self-tests
> > or lockdep itself.
> 
> Why does sched_clock need to disable interrupts?

It's only used in the unstable path, which is kind of "I already threw up
my hands" anyway.

I use it because when you transition from stable (TSC) to unstable (jiffies)
the only way to avoid the clock jumping backwards is to remember and update the
last value. To avoid races with parallel cpufreq handlers or timer
interrupts this small section needs to run with interrupts disabled.
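
A minimal userspace sketch of the "remember and update the last value" idea
described above. This is not the actual sched_clock() code: clamp_monotonic()
and last_value are hypothetical names, and the comment only marks where the
kernel version would run with interrupts disabled.

```c
#include <stdint.h>

/* Hypothetical per-CPU state: the last timestamp handed out. */
static uint64_t last_value;

/*
 * Return a timestamp that never goes backwards, even if the raw source
 * does (for example right after switching from TSC to jiffies).
 *
 * In the kernel, this read-compare-update is the small section that
 * would sit between local_irq_save() and local_irq_restore(), so that
 * a timer interrupt cannot change last_value half way through.
 */
static uint64_t clamp_monotonic(uint64_t raw_ns)
{
    if (raw_ns > last_value)
        last_value = raw_ns;
    return last_value;
}
```

Feeding it 1000, then 900, then 1500 yields 1000, 1000, 1500: the backwards
step is absorbed instead of being reported to the caller.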

The alternative would be a seqlock, but people have complained about this
earlier too, so I judged irq disabling better.
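
For comparison, a rough userspace sketch of the seqlock pattern mentioned
above, written with C11 atomics rather than the kernel's seqlock API. The
names seq_clock, seq_clock_write() and seq_clock_read() are hypothetical, and
the sketch assumes a single writer.

```c
#include <stdatomic.h>
#include <stdint.h>

struct seq_clock {
    atomic_uint seq;           /* even: stable, odd: write in progress */
    _Atomic uint64_t last_ns;  /* the value being protected */
};

/* Writer (assumed to be the only one): bump seq around the update. */
static void seq_clock_write(struct seq_clock *c, uint64_t ns)
{
    atomic_fetch_add(&c->seq, 1);   /* seq becomes odd */
    atomic_store(&c->last_ns, ns);
    atomic_fetch_add(&c->seq, 1);   /* seq becomes even again */
}

/* Reader: retry if a write was in progress or happened meanwhile. */
static uint64_t seq_clock_read(struct seq_clock *c)
{
    unsigned int start;
    uint64_t ns;

    do {
        start = atomic_load(&c->seq);
        ns = atomic_load(&c->last_ns);
    } while ((start & 1u) || atomic_load(&c->seq) != start);

    return ns;
}
```

Readers never block the writer, but they may have to loop; a real use would
normally protect several related fields rather than a single value.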

-Andi


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Andrew Morton
On Tue, 24 Apr 2007 22:59:18 +0200 Ingo Molnar <[EMAIL PROTECTED]> wrote:

> 
> * Daniel Walker <[EMAIL PROTECTED]> wrote:
> 
> > On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
> > 
> > > And sched_clock's use of local_irq_save/restore appears to be absolutely
> > > correct, so I think it must be triggering a bug in either the self-tests
> > > or lockdep itself.
> > 
> > Why does sched_clock need to disable interrupts?
> 
> i concur. To me it appears not "absolutely correct" that someone 
> apparently added local_irq_save/restore to sched_clock(), but "absolute 
> madness". sched_clock() is _very_ performance-sensitive for the 
> scheduler, do not mess with it.

Why does a local_irq_save/restore make the selftests fail??


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Ingo Molnar

* Daniel Walker <[EMAIL PROTECTED]> wrote:

> On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
> 
> > And sched_clock's use of local_irq_save/restore appears to be absolutely
> > correct, so I think it must be triggering a bug in either the self-tests
> > or lockdep itself.
> 
> Why does sched_clock need to disable interrupts?

i concur. To me it appears not "absolutely correct" that someone 
apparently added local_irq_save/restore to sched_clock(), but "absolute 
madness". sched_clock() is _very_ performance-sensitive for the 
scheduler, do not mess with it.

Ingo


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Daniel Walker
On Tue, 2007-04-24 at 22:59 +0200, Ingo Molnar wrote:
> * Daniel Walker <[EMAIL PROTECTED]> wrote:
> 
> > On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
> > 
> > > And sched_clock's use of local_irq_save/restore appears to be absolutely
> > > correct, so I think it must be triggering a bug in either the self-tests
> > > or lockdep itself.
> > 
> > Why does sched_clock need to disable interrupts?
> 
> i concur. To me it appears not "absolutely correct" that someone 
> apparently added local_irq_save/restore to sched_clock(), but "absolute 
> madness". sched_clock() is _very_ performance-sensitive for the 
> scheduler, do not mess with it.

It looks like it's used in some sort of warp check, but only when
jiffies is used. So I'm totally stumped why it's in there.

Daniel



Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Daniel Walker
On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:

> And sched_clock's use of local_irq_save/restore appears to be absolutely
> correct, so I think it must be triggering a bug in either the self-tests
> or lockdep itself.

Why does sched_clock need to disable interrupts?

Daniel



Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Jeremy Fitzhardinge
Andrew Morton wrote:
> It's weird.  And I don't think the locking selftest code calls
> sched_clock() (or any other time-related thing) at all, does it?
>   

I guess it ends up going through the scheduler, which does use it. 
But... 

J


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Jeremy Fitzhardinge
Andrew Morton wrote:
> On Tue, 24 Apr 2007 13:00:49 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> 
> wrote:
>
>   
>> Andrew Morton wrote:
>> 
>>> Well, it _is_ mysterious.
>>>
>>> Did you try to locate the code which failed?  I got lost in macros and
>>> include files, and gave up very very easily.  Stop hiding, Ingo.
>>>   
>>>   
>> OK, I've managed to reproduce it.  Removing the local_irq_save/restore
>> from sched_clock() makes it go away, as I'd expect (otherwise it would
>> really be magic).
>> 
>
> erm, why do you expect that?  A local_irq_save()/local_irq_restore() pair
> shouldn't be affecting anything?
>   

Well, yes.  I have no idea why it causes a problem.  But other than
that, sched_clock does absolutely nothing which would affect lockdep state.

>>  But given that it never seems to touch the softlockup
>> during testing, I have no idea what difference it makes...
>> 
>
> To what softlockup are you referring, and what does that have to do with
> anything?

You dropped this patch, "Ignore stolen time in the softlockup watchdog"
because its presence triggers the lock tester errors.  The only thing
this patch does is use sched_clock() rather than jiffies to measure
lockup time.  It therefore appears, for some reason, that using
sched_clock() in the softlockup code is making the lock-test fail. 
Since the lock test doesn't explicitly do any softlockup stuff, the
connection must be implicit via sched_clock - but how, I can't imagine.
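
A small simulation of what the patch is meant to achieve, assuming a clock
that only advances while the guest actually runs. It is a sketch, not the
softlockup code: struct sim_cpu and the sim_* helpers are hypothetical, and
the 10s threshold is taken from the patch description.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC_SIM    1000000000ULL
#define LOCKUP_THRESHOLD_NS (10ULL * NSEC_PER_SEC_SIM)

/* Hypothetical per-CPU state for the simulation. */
struct sim_cpu {
    uint64_t wall_ns;    /* real time elapsed */
    uint64_t stolen_ns;  /* time the hypervisor gave to someone else */
    uint64_t touch_ns;   /* per-CPU timestamp of the last watchdog touch */
};

/* Stand-in for a sched_clock() that returns unstolen nanoseconds. */
static uint64_t sim_sched_clock(const struct sim_cpu *cpu)
{
    return cpu->wall_ns - cpu->stolen_ns;
}

/* Each CPU records its own timestamp, since the clock is per-CPU. */
static void sim_touch_watchdog(struct sim_cpu *cpu)
{
    cpu->touch_ns = sim_sched_clock(cpu);
}

/* Report a lockup only if the guest itself ran too long untouched. */
static bool sim_lockup_detected(const struct sim_cpu *cpu)
{
    return sim_sched_clock(cpu) - cpu->touch_ns > LOCKUP_THRESHOLD_NS;
}
```

If 15s of wall time pass but all of it was stolen, sim_lockup_detected()
stays false; it only fires once more than 10s of unstolen time have gone by
without a touch.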

Since sched_clock() itself looks perfectly OK, and the softlockup
watchdog seems fine too, I can only conclude it's a bug in the lock
testing stuff.  But I don't know what.

J



Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Andrew Morton
On Tue, 24 Apr 2007 13:24:24 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> 
wrote:

> Jeremy Fitzhardinge wrote:
> > Andrew Morton wrote:
> >   
> >> Well, it _is_ mysterious.
> >>
> >> Did you try to locate the code which failed?  I got lost in macros and
> >> include files, and gave up very very easily.  Stop hiding, Ingo.
> >>   
> >> 
> >
> > OK, I've managed to reproduce it.  Removing the local_irq_save/restore
> > from sched_clock() makes it go away, as I'd expect (otherwise it would
> > really be magic).  But given that it never seems to touch the softlockup
> > during testing, I have no idea what difference it makes...
> 
> And sched_clock's use of local_irq_save/restore appears to be absolutely
> correct, so I think it must be triggering a bug in either the self-tests
> or lockdep itself.

It's weird.  And I don't think the locking selftest code calls
sched_clock() (or any other time-related thing) at all, does it?

> The only way I could actually extract the test code itself was to run
> the whole thing through cpp+indent, but it doesn't shed much light.
> 
> It's also not clear to me if there are 6 independent failures, or if
> they're a cascade.

Oh well.  I'll restore the patches and when people hit problems we can
blame Ingo!



Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Jeremy Fitzhardinge
Jeremy Fitzhardinge wrote:
> Andrew Morton wrote:
>   
>> Well, it _is_ mysterious.
>>
>> Did you try to locate the code which failed?  I got lost in macros and
>> include files, and gave up very very easily.  Stop hiding, Ingo.
>>   
>> 
>
> OK, I've managed to reproduce it.  Removing the local_irq_save/restore
> from sched_clock() makes it go away, as I'd expect (otherwise it would
> really be magic).  But given that it never seems to touch the softlockup
> during testing, I have no idea what difference it makes...

And sched_clock's use of local_irq_save/restore appears to be absolutely
correct, so I think it must be triggering a bug in either the self-tests
or lockdep itself.

The only way I could actually extract the test code itself was to run
the whole thing through cpp+indent, but it doesn't shed much light.

It's also not clear to me if there are 6 independent failures, or if
they're a cascade.

J


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Andrew Morton
On Tue, 24 Apr 2007 13:00:49 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> 
wrote:

> Andrew Morton wrote:
> > Well, it _is_ mysterious.
> >
> > Did you try to locate the code which failed?  I got lost in macros and
> > include files, and gave up very very easily.  Stop hiding, Ingo.
> >   
> 
> OK, I've managed to reproduce it.  Removing the local_irq_save/restore
> from sched_clock() makes it go away, as I'd expect (otherwise it would
> really be magic).

erm, why do you expect that?  A local_irq_save()/local_irq_restore() pair
shouldn't be affecting anything?

>  But given that it never seems to touch the softlockup
> during testing, I have no idea what difference it makes...

To what softlockup are you referring, and what does that have to do with
anything?




Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Jeremy Fitzhardinge
Andrew Morton wrote:
> Well, it _is_ mysterious.
>
> Did you try to locate the code which failed?  I got lost in macros and
> include files, and gave up very very easily.  Stop hiding, Ingo.
>   

OK, I've managed to reproduce it.  Removing the local_irq_save/restore
from sched_clock() makes it go away, as I'd expect (otherwise it would
really be magic).  But given that it never seems to touch the softlockup
during testing, I have no idea what difference it makes...

J


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Andrew Morton
On Tue, 24 Apr 2007 11:16:09 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> 
wrote:

> Andrew Morton wrote:
> > I said that because the damn thing went away when I was hunting it down
> > because I lost the config and was unable to remember the right combination
> > of debug settings.  Fortunately it later came back so I took care to
> > preserve the config.
> >   
> 
> sched_clock doesn't *do* anything except flap interrupts.

Well, it _is_ mysterious.

Did you try to locate the code which failed?  I got lost in macros and
include files, and gave up very very easily.  Stop hiding, Ingo.

> Oh, wait, have
> you got Andi's bugfixed version of the sched_clock patch?  The first
> version did a local_save_flags rather than a local_irq_save.

I have whatever I pulled from firstfloor over the weekend.  It's in
rc7-mm1.  No, it doesn't use local_save_flags.

> >> Hm, is it caused by using sched_clock() to generate the printk
> >> timestamps while generating the lock test output?
> >> 
> >
> > Conceivably.  What does that locking API test do?
> >   
> 
> Didn't make a difference here.  Building your config now.
> 
> > I was using printk timestamps and netconsole at the time.
> >   
> 
> Ah, great, now you're going to make me setup netconsole...
> 

That's a doddle.

On test system, boot with

netconsole=@/eth0,@/

On workstation:

sudo netcat -u -l -p  | tee -a ~/.log/log-


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Jeremy Fitzhardinge
Andrew Morton wrote:
> I said that because the damn thing went away when I was hunting it down
> because I lost the config and was unable to remember the right combination
> of debug settings.  Fortunately it later came back so I took care to
> preserve the config.
>   

sched_clock doesn't *do* anything except flap interrupts. Oh, wait, have
you got Andi's bugfixed version of the sched_clock patch?  The first
version did a local_save_flags rather than a local_irq_save.
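
To make the difference between the two calls concrete, here is a userspace
simulation. The sim_* functions are hypothetical stand-ins that model the
interrupt flag as a plain boolean; they are not the kernel macros themselves.

```c
#include <stdbool.h>

/* Stand-in for the CPU's interrupt-enable flag. */
static bool sim_irqs_enabled = true;

/* Like local_save_flags(): record the current state, change nothing. */
static void sim_save_flags(bool *flags)
{
    *flags = sim_irqs_enabled;
}

/* Like local_irq_save(): record the current state, then disable. */
static void sim_irq_save(bool *flags)
{
    *flags = sim_irqs_enabled;
    sim_irqs_enabled = false;
}

/* Like local_irq_restore(): put back whatever state was recorded. */
static void sim_irq_restore(bool flags)
{
    sim_irqs_enabled = flags;
}
```

With sim_save_flags() followed by sim_irq_restore(), the section in between
still runs with interrupts enabled, so it is not protected at all; only the
sim_irq_save() variant actually closes the window.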

>> Hm, is it caused by using sched_clock() to generate the printk
>> timestamps while generating the lock test output?
>> 
>
> Conceivably.  What does that locking API test do?
>   

Didn't make a difference here.  Building your config now.

> I was using printk timestamps and netconsole at the time.
>   

Ah, great, now you're going to make me set up netconsole...

J



Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Andrew Morton
On Tue, 24 Apr 2007 10:51:35 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> 
wrote:

> Andrew Morton wrote:
> > It seems fairly sensitive to .config settings.  See
> > http://userweb.kernel.org/~akpm/config-sony.txt
> >   
> 
> I haven't tried your config yet, but I haven't managed to reproduce it
> by playing with the usual suspects in my config (SMP, PREEMPT).  Any
> idea about which config changes make the difference?

I said that because the damn thing went away when I was hunting it down
because I lost the config and was unable to remember the right combination
of debug settings.  Fortunately it later came back so I took care to
preserve the config.

> Hm, is it caused by using sched_clock() to generate the printk
> timestamps while generating the lock test output?

Conceivably.  What does that locking API test do?

I was using printk timestamps and netconsole at the time.


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Jeremy Fitzhardinge
Andrew Morton wrote:
> It seems fairly sensitive to .config settings.  See
> http://userweb.kernel.org/~akpm/config-sony.txt
>   

I haven't tried your config yet, but I haven't managed to reproduce it
by playing with the usual suspects in my config (SMP, PREEMPT).  Any
idea about which config changes make the difference?

Hm, is it caused by using sched_clock() to generate the printk
timestamps while generating the lock test output?

J


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Andrew Morton
On Mon, 23 Apr 2007 23:58:20 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> 
wrote:

> Andrew Morton wrote:
> > On Tue, 27 Mar 2007 14:49:20 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> 
> > wrote:
> >
> >   
> >> The softlockup watchdog is currently a nuisance in a virtual machine,
> >> since the whole system could have the CPU stolen from it for a long
> >> period of time.  While it would be unlikely for a guest domain to be
> >> denied timer interrupts for over 10s, it could happen and any softlockup
> >> message would be completely spurious.
> >>
> >> Earlier I proposed that sched_clock() return time in unstolen
> >> nanoseconds, which is how Xen and VMI currently implement it.  If the
> >> softlockup watchdog uses sched_clock() to measure time, it would
> >> automatically ignore stolen time, and therefore only report when the
> >> guest itself locked up.  When running native, sched_clock() returns
> >> real-time nanoseconds, so the behaviour would be unchanged.
> >>
> >> Note that sched_clock() used this way is inherently per-cpu, so this
> >> patch makes sure that the per-processor watchdog thread initialized
> >> its own timestamp.
> >> 
> >
> > This patch
> > (ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/ignore-stolen-time-in-the-softlockup-watchdog.patch)
> > causes six failures in the locking self-tests, which I must say is rather
> > clever of it.
> >   
> 
> Interesting.

I'll say.

>  Which variation of sched_clock do you have in your tree at
> the moment?

Andi's, plus the below fix.

Sigh.  I thought I was only two more bugs away from a release, then...


[18014389.347124] BUG: unable to handle kernel paging request at virtual address 6b6b7193
[18014389.347142]  printing eip:
[18014389.347149] c029a80c
[18014389.347156] *pde = 
[18014389.347166] Oops:  [#1]
[18014389.347174] Modules linked in: i915 drm ipw2200 sonypi ipv6 autofs4 hidp l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables cpufreq_ondemand video sbs button battery asus_acpi ac nvram ohci1394 ieee1394 ehci_hcd uhci_hcd sg joydev snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm sr_mod cdrom snd_timer ieee80211 i2c_i801 piix ieee80211_crypt i2c_core generic snd soundcore snd_page_alloc ext3 jbd ide_disk ide_core
[18014389.347520] CPU:0
[18014389.347521] EIP: 0060:[<c029a80c>] Tainted: G  D VLI
[18014389.347522] EFLAGS: 00010296   (2.6.21-rc7-mm1 #35)
[18014389.347547] EIP is at input_release_device+0x8/0x4e
[18014389.347555] eax: c99709a8   ebx: 6b6b6b6b   ecx: 0286   edx: 
[18014389.347563] esi: 6b6b6b6b   edi: c99709cc   ebp: c21e3d40   esp: c21e3d38
[18014389.347571] ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
[18014389.347580] Process khubd (pid: 159, ti=c21e2000 task=c20a62f0 task.ti=c21e2000)
[18014389.347588] Stack: 6b6b6b6b c99709a8 c21e3d60 c029b489 c2014ec8 c9182000 c96b167c c9970954 
[18014389.347655]c9970954 c99709cc c21e3d80 c029d401 c9977a6c c96b1000 c21e3d90 c9970954 
[18014389.347708]c99709a8 c9164000 c21e3d90 c029d4b5 c96b1000 c9970564 c21e3db0 c029c50b 
[18014389.347771] Call Trace:
[18014389.347792]  [] input_close_device+0x13/0x51
[18014389.347810]  [] mousedev_destroy+0x29/0x7e
[18014389.347827]  [] mousedev_disconnect+0x5f/0x63
[18014389.347842]  [] input_unregister_device+0x6a/0x100
[18014389.347858]  [] hidinput_disconnect+0x24/0x41
[18014389.347874]  [] hid_disconnect+0x79/0xc9
[18014389.347889]  [] usb_unbind_interface+0x47/0x8f
[18014389.347916]  [] __device_release_driver+0x74/0x90
[18014389.347933]  [] device_release_driver+0x37/0x4e
[18014389.347957]  [] bus_remove_device+0x73/0x82
[18014389.347977]  [] device_del+0x214/0x28c
[18014389.348132]  [] usb_disable_device+0x62/0xc2
[18014389.348148]  [] usb_disconnect+0x99/0x126
[18014389.348163]  [] hub_thread+0x3a5/0xb07
[18014389.348178]  [] kthread+0x6e/0x79
[18014389.348194]  [] kernel_thread_helper+0x7/0x10
[18014389.348210]  ===
[18014389.348218] INFO: lockdep is turned off.
[18014389.348224] Code: 5b 5d c3 55 b9 f0 ff ff ff 8b 50 0c 89 e5 83 ba 28 06 00 00 00 75 08 89 82 28 06 00 00 31 c9 5d 89 c8 c3 55 89 e5 56 53 8b 70 0c <39> 86 28 06 00 00 75 3a 8b 9e e4 08 00 00 c7 86 28 06 00 00 00 

I dunno.  I'll keep plugging for another couple hours then I'll shove
out what I have as a -mm snapshot whatsit.

Things are just ridiculous.  I'm thinking of having a hard-disk crash and
accidentally losing everything.



From: Andrew Morton <[EMAIL PROTECTED]>

WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to 
.init.text:sc_cpu_event from .data between 'sc_cpu_notifier' (at offset 0x2110) 
and 'mcelog'

Use hotcpu_notifier().  This takes care of making sure that the unused code
disappears from vmlinux if !CONFIG_HOTPLUG_CPU, too.
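
A userspace analogue of how a registration macro can make the callback
vanish when the feature is configured out. This is only an illustration of
the pattern: SIM_HOTPLUG_CPU, sim_hotcpu_notifier(), registered_fn and
sc_cpu_event_sim() are hypothetical names, not the kernel's definitions.

```c
#include <stddef.h>

typedef int (*cpu_event_fn)(int cpu);

/* Stand-in for the notifier chain: holds at most one callback. */
static cpu_event_fn registered_fn;

#ifdef SIM_HOTPLUG_CPU
/* Feature enabled: really register the callback. */
#define sim_hotcpu_notifier(fn) do { registered_fn = (fn); } while (0)
#else
/*
 * Feature disabled: mention fn so there is no "unused" warning, but
 * register nothing. With no real reference left, the compiler is free
 * to discard the static callback from the final image.
 */
#define sim_hotcpu_notifier(fn) do { (void)(fn); } while (0)
#endif

/* Hypothetical callback, analogous in role to sc_cpu_event. */
static int sc_cpu_event_sim(int cpu)
{
    return cpu;
}

static void sim_init(void)
{
    sim_hotcpu_notifier(sc_cpu_event_sim);
}
```

Built without -DSIM_HOTPLUG_CPU, sim_init() leaves registered_fn as NULL, so
nothing outside the init path ever points at the callback.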


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Jeremy Fitzhardinge
Andrew Morton wrote:
> On Tue, 27 Mar 2007 14:49:20 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> 
> wrote:
>
>   
>> The softlockup watchdog is currently a nuisance in a virtual machine,
>> since the whole system could have the CPU stolen from it for a long
>> period of time.  While it would be unlikely for a guest domain to be
>> denied timer interrupts for over 10s, it could happen and any softlockup
>> message would be completely spurious.
>>
>> Earlier I proposed that sched_clock() return time in unstolen
>> nanoseconds, which is how Xen and VMI currently implement it.  If the
>> softlockup watchdog uses sched_clock() to measure time, it would
>> automatically ignore stolen time, and therefore only report when the
>> guest itself locked up.  When running native, sched_clock() returns
>> real-time nanoseconds, so the behaviour would be unchanged.
>>
>> Note that sched_clock() used this way is inherently per-cpu, so this
>> patch makes sure that the per-processor watchdog thread initialized
>> its own timestamp.
>> 
>
> This patch
> (ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/ignore-stolen-time-in-the-softlockup-watchdog.patch)
> causes six failures in the locking self-tests, which I must say is rather
> clever of it.
>   

Interesting.  Which variation of sched_clock do you have in your tree at
the moment?

J


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Andrew Morton
On Tue, 27 Mar 2007 14:49:20 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> 
wrote:

> The softlockup watchdog is currently a nuisance in a virtual machine,
> since the whole system could have the CPU stolen from it for a long
> period of time.  While it would be unlikely for a guest domain to be
> denied timer interrupts for over 10s, it could happen and any softlockup
> message would be completely spurious.
> 
> Earlier I proposed that sched_clock() return time in unstolen
> nanoseconds, which is how Xen and VMI currently implement it.  If the
> softlockup watchdog uses sched_clock() to measure time, it would
> automatically ignore stolen time, and therefore only report when the
> guest itself locked up.  When running native, sched_clock() returns
> real-time nanoseconds, so the behaviour would be unchanged.
> 
> Note that sched_clock() used this way is inherently per-cpu, so this
> patch makes sure that the per-processor watchdog thread initialized
> its own timestamp.

This patch
(ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/ignore-stolen-time-in-the-softlockup-watchdog.patch)
causes six failures in the locking self-tests, which I must say is rather
clever of it.


Here's the first one:

[17179569.184000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[17179569.184000] ... MAX_LOCKDEP_SUBCLASSES:8
[17179569.184000] ... MAX_LOCK_DEPTH:  30
[17179569.184000] ... MAX_LOCKDEP_KEYS:2048
[17179569.184000] ... CLASSHASH_SIZE:   1024
[17179569.184000] ... MAX_LOCKDEP_ENTRIES: 8192
[17179569.184000] ... MAX_LOCKDEP_CHAINS:  16384
[17179569.184000] ... CHAINHASH_SIZE:  8192
[17179569.184000]  memory used by lock dependency info: 992 kB
[17179569.184000]  per task-struct memory footprint: 1200 bytes
[17179569.184000] 
[17179569.184000] | Locking API testsuite:
[17179569.184000] 
[17179569.184000]  | spin |wlock |rlock |mutex | wsem | rsem |
[17179569.184000]   --
[17179569.184000]  A-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184000]  A-B-B-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184000]  A-B-B-C-C-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184001]  A-B-C-A-B-C deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184002]  A-B-B-C-C-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184003]  A-B-C-D-B-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184004]  A-B-C-D-B-C-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184005] double unlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184006]   initialize held:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184006]  bad unlock order:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184006]   --
[17179569.184006]   recursive read-lock: |  ok  |     |  ok  |
[17179569.184006]recursive read-lock #2: |  ok  |     |  ok  |
[17179569.184007] mixed read-write-lock: |  ok  |     |  ok  |
[17179569.184007] mixed write-read-lock: |  ok  |     |  ok  |
[17179569.184007]   --
[17179569.184007]  hard-irqs-on + irq-safe-A/12:  ok  |  ok  |  ok  |
[17179569.184007]  soft-irqs-on + irq-safe-A/12:  ok  |  ok  |  ok  |
[17179569.184007]  hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[17179569.184007]  soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[17179569.184007]sirq-safe-A => hirqs-on/12:  ok  |  ok  |irq event stamp: 458
[17179569.184007] hardirqs last  enabled at (458): [<c01e4116>] irqsafe2A_rlock_12+0x96/0xa3
[17179569.184007] hardirqs last disabled at (457): [<c01095b9>] sched_clock+0x5e/0xe9
[17179569.184007] softirqs last  enabled at (454): [<c01e4101>] irqsafe2A_rlock_12+0x81/0xa3
[17179569.184007] softirqs last disabled at (450): [<c01e408b>] irqsafe2A_rlock_12+0xb/0xa3
[17179569.184007] FAILED| [<c0104cf0>] dump_trace+0x63/0x1ec
[17179569.184007]  [<c0104e93>] show_trace_log_lvl+0x1a/0x30
[17179569.184007]  [<c01059ec>] show_trace+0x12/0x14
[17179569.184007]  [<c0105a45>] dump_stack+0x16/0x18
[17179569.184007]  [<c01e1eb5>] dotest+0x6b/0x3d0
[17179569.184007]  [<c01eb249>] locking_selftest+0x915/0x1a58
[17179569.184007]  [<c048c979>] start_kernel+0x1d0/0x2a2
[17179569.184007]  ===
[17179569.184007] 
[17179569.184007]sirq-safe-A => hirqs-on/21:irq event stamp: 462
[17179569.184007] hardirqs last  enabled at (462): [] 

Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Andrew Morton
On Tue, 27 Mar 2007 14:49:20 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] 
wrote:

 The softlockup watchdog is currently a nuisance in a virtual machine,
 since the whole system could have the CPU stolen from it for a long
 period of time.  While it would be unlikely for a guest domain to be
 denied timer interrupts for over 10s, it could happen and any softlockup
 message would be completely spurious.
 
 Earlier I proposed that sched_clock() return time in unstolen
 nanoseconds, which is how Xen and VMI currently implement it.  If the
 softlockup watchdog uses sched_clock() to measure time, it would
 automatically ignore stolen time, and therefore only report when the
 guest itself locked up.  When running native, sched_clock() returns
 real-time nanoseconds, so the behaviour would be unchanged.
 
 Note that sched_clock() used this way is inherently per-cpu, so this
 patch makes sure that the per-processor watchdog thread initialized
 its own timestamp.

This patch
(ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/ignore-stolen-time-in-the-softlockup-watchdog.patch)
causes six failures in the locking self-tests, which I must say is rather
clever of it.


Here's the first one:

[17179569.184000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., 
Ingo Molnar
[17179569.184000] ... MAX_LOCKDEP_SUBCLASSES:8
[17179569.184000] ... MAX_LOCK_DEPTH:  30
[17179569.184000] ... MAX_LOCKDEP_KEYS:2048
[17179569.184000] ... CLASSHASH_SIZE:   1024
[17179569.184000] ... MAX_LOCKDEP_ENTRIES: 8192
[17179569.184000] ... MAX_LOCKDEP_CHAINS:  16384
[17179569.184000] ... CHAINHASH_SIZE:  8192
[17179569.184000]  memory used by lock dependency info: 992 kB
[17179569.184000]  per task-struct memory footprint: 1200 bytes
[17179569.184000] 
[17179569.184000] | Locking API testsuite:
[17179569.184000] 

[17179569.184000]                              | spin |wlock |rlock |mutex | wsem | rsem |
[17179569.184000]   ----------------------------------------------------------------------
[17179569.184000]                  A-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184000]              A-B-B-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184000]          A-B-B-C-C-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184001]          A-B-C-A-B-C deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184002]      A-B-B-C-C-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184003]      A-B-C-D-B-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184004]      A-B-C-D-B-C-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184005]                 double unlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184006]               initialize held:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184006]              bad unlock order:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184006]   ----------------------------------------------------------------------
[17179569.184006]           recursive read-lock:             |  ok  |             |  ok  |
[17179569.184006]        recursive read-lock #2:             |  ok  |             |  ok  |
[17179569.184007]         mixed read-write-lock:             |  ok  |             |  ok  |
[17179569.184007]         mixed write-read-lock:             |  ok  |             |  ok  |
[17179569.184007]   ----------------------------------------------------------------------
[17179569.184007]  hard-irqs-on + irq-safe-A/12:  ok  |  ok  |  ok  |
[17179569.184007]  soft-irqs-on + irq-safe-A/12:  ok  |  ok  |  ok  |
[17179569.184007]  hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[17179569.184007]  soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[17179569.184007]     sirq-safe-A = hirqs-on/12:  ok  |  ok  |irq event stamp: 458
[17179569.184007] hardirqs last  enabled at (458): [c01e4116] irqsafe2A_rlock_12+0x96/0xa3
[17179569.184007] hardirqs last disabled at (457): [c01095b9] sched_clock+0x5e/0xe9
[17179569.184007] softirqs last  enabled at (454): [c01e4101] irqsafe2A_rlock_12+0x81/0xa3
[17179569.184007] softirqs last disabled at (450): [c01e408b] irqsafe2A_rlock_12+0xb/0xa3
[17179569.184007] FAILED| [c0104cf0] dump_trace+0x63/0x1ec
[17179569.184007]  [c0104e93] show_trace_log_lvl+0x1a/0x30
[17179569.184007]  [c01059ec] show_trace+0x12/0x14
[17179569.184007]  [c0105a45] dump_stack+0x16/0x18
[17179569.184007]  [c01e1eb5] dotest+0x6b/0x3d0
[17179569.184007]  [c01eb249] locking_selftest+0x915/0x1a58
[17179569.184007]  [c048c979] start_kernel+0x1d0/0x2a2
[17179569.184007]  ===
[17179569.184007] 
[17179569.184007]sirq-safe-A = hirqs-on/21:irq event stamp: 462

Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Jeremy Fitzhardinge
Andrew Morton wrote:
> On Tue, 27 Mar 2007 14:49:20 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:
>
> > The softlockup watchdog is currently a nuisance in a virtual machine,
> > since the whole system could have the CPU stolen from it for a long
> > period of time.  While it would be unlikely for a guest domain to be
> > denied timer interrupts for over 10s, it could happen and any softlockup
> > message would be completely spurious.
> >
> > Earlier I proposed that sched_clock() return time in unstolen
> > nanoseconds, which is how Xen and VMI currently implement it.  If the
> > softlockup watchdog uses sched_clock() to measure time, it would
> > automatically ignore stolen time, and therefore only report when the
> > guest itself locked up.  When running native, sched_clock() returns
> > real-time nanoseconds, so the behaviour would be unchanged.
> >
> > Note that sched_clock() used this way is inherently per-cpu, so this
> > patch makes sure that the per-processor watchdog thread initialized
> > its own timestamp.
>
> This patch
> (ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/ignore-stolen-time-in-the-softlockup-watchdog.patch)
> causes six failures in the locking self-tests, which I must say is rather
> clever of it.

Interesting.  Which variation of sched_clock do you have in your tree at
the moment?

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Andrew Morton
On Mon, 23 Apr 2007 23:58:20 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] 
wrote:

> Andrew Morton wrote:
> > On Tue, 27 Mar 2007 14:49:20 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:
> >
> > > The softlockup watchdog is currently a nuisance in a virtual machine,
> > > since the whole system could have the CPU stolen from it for a long
> > > period of time.  While it would be unlikely for a guest domain to be
> > > denied timer interrupts for over 10s, it could happen and any softlockup
> > > message would be completely spurious.
> > >
> > > Earlier I proposed that sched_clock() return time in unstolen
> > > nanoseconds, which is how Xen and VMI currently implement it.  If the
> > > softlockup watchdog uses sched_clock() to measure time, it would
> > > automatically ignore stolen time, and therefore only report when the
> > > guest itself locked up.  When running native, sched_clock() returns
> > > real-time nanoseconds, so the behaviour would be unchanged.
> > >
> > > Note that sched_clock() used this way is inherently per-cpu, so this
> > > patch makes sure that the per-processor watchdog thread initialized
> > > its own timestamp.
> >
> > This patch
> > (ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/ignore-stolen-time-in-the-softlockup-watchdog.patch)
> > causes six failures in the locking self-tests, which I must say is rather
> > clever of it.
>
> Interesting.

I'll say.

> Which variation of sched_clock do you have in your tree at
> the moment?

Andi's, plus the below fix.

Sigh.  I thought I was only two more bugs away from a release, then...


[18014389.347124] BUG: unable to handle kernel paging request at virtual 
address 6b6b7193
[18014389.347142]  printing eip:
[18014389.347149] c029a80c
[18014389.347156] *pde = 
[18014389.347166] Oops:  [#1]
[18014389.347174] Modules linked in: i915 drm ipw2200 sonypi ipv6 autofs4 hidp 
l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 
xt_state nf_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables 
cpufreq_ondemand video sbs button battery asus_acpi ac nvram ohci1394 ieee1394 
ehci_hcd uhci_hcd sg joydev snd_hda_intel snd_seq_dummy snd_seq_oss 
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm 
sr_mod cdrom snd_timer ieee80211 i2c_i801 piix ieee80211_crypt i2c_core generic 
snd soundcore snd_page_alloc ext3 jbd ide_disk ide_core
[18014389.347520] CPU:0
[18014389.347521] EIP:0060:[c029a80c]Tainted: G  D VLI
[18014389.347522] EFLAGS: 00010296   (2.6.21-rc7-mm1 #35)
[18014389.347547] EIP is at input_release_device+0x8/0x4e
[18014389.347555] eax: c99709a8   ebx: 6b6b6b6b   ecx: 0286   edx: 
[18014389.347563] esi: 6b6b6b6b   edi: c99709cc   ebp: c21e3d40   esp: c21e3d38
[18014389.347571] ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
[18014389.347580] Process khubd (pid: 159, ti=c21e2000 task=c20a62f0 
task.ti=c21e2000)
[18014389.347588] Stack: 6b6b6b6b c99709a8 c21e3d60 c029b489 c2014ec8 c9182000 
c96b167c c9970954 
[18014389.347655]c9970954 c99709cc c21e3d80 c029d401 c9977a6c c96b1000 
c21e3d90 c9970954 
[18014389.347708]c99709a8 c9164000 c21e3d90 c029d4b5 c96b1000 c9970564 
c21e3db0 c029c50b 
[18014389.347771] Call Trace:
[18014389.347792]  [c029b489] input_close_device+0x13/0x51
[18014389.347810]  [c029d401] mousedev_destroy+0x29/0x7e
[18014389.347827]  [c029d4b5] mousedev_disconnect+0x5f/0x63
[18014389.347842]  [c029c50b] input_unregister_device+0x6a/0x100
[18014389.347858]  [c02abf9c] hidinput_disconnect+0x24/0x41
[18014389.347874]  [c02aef29] hid_disconnect+0x79/0xc9
[18014389.347889]  [c028e1db] usb_unbind_interface+0x47/0x8f
[18014389.347916]  [c0256852] __device_release_driver+0x74/0x90
[18014389.347933]  [c0256c5f] device_release_driver+0x37/0x4e
[18014389.347957]  [c02561c6] bus_remove_device+0x73/0x82
[18014389.347977]  [c02547c1] device_del+0x214/0x28c
[18014389.348132]  [c028bb72] usb_disable_device+0x62/0xc2
[18014389.348148]  [c0288893] usb_disconnect+0x99/0x126
[18014389.348163]  [c0288d2c] hub_thread+0x3a5/0xb07
[18014389.348178]  [c012cbe5] kthread+0x6e/0x79
[18014389.348194]  [c0104917] kernel_thread_helper+0x7/0x10
[18014389.348210]  ===
[18014389.348218] INFO: lockdep is turned off.
[18014389.348224] Code: 5b 5d c3 55 b9 f0 ff ff ff 8b 50 0c 89 e5 83 ba 28 06 
00 00 00 75 08 89 82 28 06 00 00 31 c9 5d 89 c8 c3 55 89 e5 56 53 8b 70 0c 39 
86 28 06 00 00 75 3a 8b 9e e4 08 00 00 c7 86 28 06 00 00 00 

I dunno.  I'll keep plugging for another couple hours then I'll shove
out what I have as a -mm snapshot whatsit.

Things are just ridiculous.  I'm thinking of having a hard-disk crash and
accidentally losing everything.



From: Andrew Morton [EMAIL PROTECTED]

WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to 
.init.text:sc_cpu_event from .data between 'sc_cpu_notifier' (at offset 0x2110) 
and 'mcelog'

Use hotcpu_notifier().  This takes care of making sure that the unused code

Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Jeremy Fitzhardinge
Andrew Morton wrote:
> It seems fairly sensitive to .config settings.  See
> http://userweb.kernel.org/~akpm/config-sony.txt

I haven't tried your config yet, but I haven't managed to reproduce it
by playing with the usual suspects in my config (SMP, PREEMPT).  Any
idea about which config changes make the difference?

Hm, is it caused by using sched_clock() to generate the printk
timestamps while generating the lock test output?

J


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Andrew Morton
On Tue, 24 Apr 2007 10:51:35 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] 
wrote:

> Andrew Morton wrote:
> > It seems fairly sensitive to .config settings.  See
> > http://userweb.kernel.org/~akpm/config-sony.txt
>
> I haven't tried your config yet, but I haven't managed to reproduce it
> by playing with the usual suspects in my config (SMP, PREEMPT).  Any
> idea about which config changes make the difference?

I said that because the damn thing went away when I was hunting it down
because I lost the config and was unable to remember the right combination
of debug settings.  Fortunately it later came back so I took care to
preserve the config.

> Hm, is it caused by using sched_clock() to generate the printk
> timestamps while generating the lock test output?

Conceivably.  What does that locking API test do?

I was using printk timestamps and netconsole at the time.


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Jeremy Fitzhardinge
Andrew Morton wrote:
> I said that because the damn thing went away when I was hunting it down
> because I lost the config and was unable to remember the right combination
> of debug settings.  Fortunately it later came back so I took care to
> preserve the config.

sched_clock doesn't *do* anything except flap interrupts. Oh, wait, have
you got Andi's bugfixed version of the sched_clock patch?  The first
version did a local_save_flags rather than a local_irq_save.

> > Hm, is it caused by using sched_clock() to generate the printk
> > timestamps while generating the lock test output?
>
> Conceivably.  What does that locking API test do?

Didn't make a difference here.  Building your config now.

> I was using printk timestamps and netconsole at the time.

Ah, great, now you're going to make me setup netconsole...

J



Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Andrew Morton
On Tue, 24 Apr 2007 11:16:09 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] 
wrote:

> Andrew Morton wrote:
> > I said that because the damn thing went away when I was hunting it down
> > because I lost the config and was unable to remember the right combination
> > of debug settings.  Fortunately it later came back so I took care to
> > preserve the config.
>
> sched_clock doesn't *do* anything except flap interrupts.

Well, it _is_ mysterious.

Did you try to locate the code which failed?  I got lost in macros and
include files, and gave up very very easily.  Stop hiding, Ingo.

> Oh, wait, have
> you got Andi's bugfixed version of the sched_clock patch?  The first
> version did a local_save_flags rather than a local_irq_save.

I have whatever I pulled from firstfloor over the weekend.  It's in
rc7-mm1.  No, it doesn't use local_save_flags.

> > > Hm, is it caused by using sched_clock() to generate the printk
> > > timestamps while generating the lock test output?
> >
> > Conceivably.  What does that locking API test do?
>
> Didn't make a difference here.  Building your config now.
>
> > I was using printk timestamps and netconsole at the time.
>
> Ah, great, now you're going to make me setup netconsole...

That's a doddle.

On test system, boot with

netconsole=@test-system-ip-addr/eth0,udp-port-no@workstation-ip-addr/workstation-mac-addr

On workstation:

sudo netcat -u -l -p udp-port-no | tee -a ~/.log/log-test-system-hostname


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Jeremy Fitzhardinge
Andrew Morton wrote:
> Well, it _is_ mysterious.
>
> Did you try to locate the code which failed?  I got lost in macros and
> include files, and gave up very very easily.  Stop hiding, Ingo.

OK, I've managed to reproduce it.  Removing the local_irq_save/restore
from sched_clock() makes it go away, as I'd expect (otherwise it would
really be magic).  But given that it never seems to touch the softlockup
during testing, I have no idea what difference it makes...

J


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Andrew Morton
On Tue, 24 Apr 2007 13:00:49 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] 
wrote:

> Andrew Morton wrote:
> > Well, it _is_ mysterious.
> >
> > Did you try to locate the code which failed?  I got lost in macros and
> > include files, and gave up very very easily.  Stop hiding, Ingo.
>
> OK, I've managed to reproduce it.  Removing the local_irq_save/restore
> from sched_clock() makes it go away, as I'd expect (otherwise it would
> really be magic).

erm, why do you expect that?  A local_irq_save()/local_irq_restore() pair
shouldn't be affecting anything?

> But given that it never seems to touch the softlockup
> during testing, I have no idea what difference it makes...

To what softlockup are you referring, and what does that have to do with
anything?

feels dumb


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Jeremy Fitzhardinge
Jeremy Fitzhardinge wrote:
> Andrew Morton wrote:
> > Well, it _is_ mysterious.
> >
> > Did you try to locate the code which failed?  I got lost in macros and
> > include files, and gave up very very easily.  Stop hiding, Ingo.
>
> OK, I've managed to reproduce it.  Removing the local_irq_save/restore
> from sched_clock() makes it go away, as I'd expect (otherwise it would
> really be magic).  But given that it never seems to touch the softlockup
> during testing, I have no idea what difference it makes...

And sched_clock's use of local_irq_save/restore appears to be absolutely
correct, so I think it must be triggering a bug in either the self-tests
or lockdep itself.

The only way I could actually extract the test code itself was to run
the whole thing through cpp+indent, but it doesn't shed much light.

It's also not clear to me if there are 6 independent failures, or if
they're a cascade.

J


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Andrew Morton
On Tue, 24 Apr 2007 13:24:24 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] 
wrote:

> Jeremy Fitzhardinge wrote:
> > Andrew Morton wrote:
> > > Well, it _is_ mysterious.
> > >
> > > Did you try to locate the code which failed?  I got lost in macros and
> > > include files, and gave up very very easily.  Stop hiding, Ingo.
> >
> > OK, I've managed to reproduce it.  Removing the local_irq_save/restore
> > from sched_clock() makes it go away, as I'd expect (otherwise it would
> > really be magic).  But given that it never seems to touch the softlockup
> > during testing, I have no idea what difference it makes...
>
> And sched_clock's use of local_irq_save/restore appears to be absolutely
> correct, so I think it must be triggering a bug in either the self-tests
> or lockdep itself.

It's weird.  And I don't think the locking selftest code calls
sched_clock() (or any other time-related thing) at all, does it?

> The only way I could actually extract the test code itself was to run
> the whole thing through cpp+indent, but it doesn't shed much light.
>
> It's also not clear to me if there are 6 independent failures, or if
> they're a cascade.

Oh well.  I'll restore the patches and when people hit problems we can
blame Ingo!



Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Jeremy Fitzhardinge
Andrew Morton wrote:
> On Tue, 24 Apr 2007 13:00:49 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:
>
> > Andrew Morton wrote:
> > > Well, it _is_ mysterious.
> > >
> > > Did you try to locate the code which failed?  I got lost in macros and
> > > include files, and gave up very very easily.  Stop hiding, Ingo.
> >
> > OK, I've managed to reproduce it.  Removing the local_irq_save/restore
> > from sched_clock() makes it go away, as I'd expect (otherwise it would
> > really be magic).
>
> erm, why do you expect that?  A local_irq_save()/local_irq_restore() pair
> shouldn't be affecting anything?

Well, yes.  I have no idea why it causes a problem.  But other than
that, sched_clock does absolutely nothing which would affect lockdep state.

> > But given that it never seems to touch the softlockup
> > during testing, I have no idea what difference it makes...
>
> To what softlockup are you referring, and what does that have to do with
> anything?

You dropped this patch, "Ignore stolen time in the softlockup watchdog",
because its presence triggers the lock tester errors.  The only thing
this patch does is use sched_clock() rather than jiffies to measure
lockup time.  It therefore appears, for some reason, that using
sched_clock() in the softlockup code is making the lock-test fail.
Since the lock test doesn't explicitly do any softlockup stuff, the
connection must be implicit via sched_clock() - but how, I can't imagine.

Since sched_clock() itself looks perfectly OK, and the softlockup
watchdog seems fine too, I can only conclude it's a bug in the lock
testing stuff.  But I don't know what.
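
To make the "use sched_clock() rather than jiffies to measure lockup time"
point concrete, here is a minimal userspace sketch, not the patch itself.
The names (struct watchdog, watchdog_touch, watchdog_is_stuck) are
hypothetical, and the 10 second threshold is an assumption taken from the
"over 10s" wording in the patch description. The caller is assumed to pass
in whatever the clock source reports, in nanoseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
/* Assumed threshold, taken from the "over 10s" in the patch description. */
#define LOCKUP_THRESHOLD_NS (10ULL * NSEC_PER_SEC)

/* Hypothetical per-CPU watchdog state: when this CPU last made progress. */
struct watchdog {
    uint64_t touch_ns;
};

/* Record that the CPU is alive, using the caller's clock reading. */
static void watchdog_touch(struct watchdog *wd, uint64_t now_ns)
{
    assert(wd != NULL);
    wd->touch_ns = now_ns;
}

/* Report a lockup only if more than the threshold has elapsed on that clock. */
static bool watchdog_is_stuck(const struct watchdog *wd, uint64_t now_ns)
{
    assert(wd != NULL);
    return now_ns - wd->touch_ns > LOCKUP_THRESHOLD_NS;
}
```

If 15s of wall time pass but 12s of that were stolen by the hypervisor, a
clock that counts only unstolen time advances 3s and watchdog_is_stuck()
stays false, whereas feeding it the 15s wall-clock delta would report a
lockup that never happened in the guest.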

J



Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Jeremy Fitzhardinge
Andrew Morton wrote:
> It's weird.  And I don't think the locking selftest code calls
> sched_clock() (or any other time-related thing) at all, does it?

I guess it ends up going through the scheduler, which does use it. 
But... shrug

J


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Daniel Walker
On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:

> And sched_clock's use of local_irq_save/restore appears to be absolutely
> correct, so I think it must be triggering a bug in either the self-tests
> or lockdep itself.

Why does sched_clock need to disable interrupts?

Daniel



Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Ingo Molnar

* Daniel Walker [EMAIL PROTECTED] wrote:

> On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
>
> > And sched_clock's use of local_irq_save/restore appears to be absolutely
> > correct, so I think it must be triggering a bug in either the self-tests
> > or lockdep itself.
>
> Why does sched_clock need to disable interrupts?

i concur. To me it appears not absolutely correct that someone 
apparently added local_irq_save/restore to sched_clock(), but absolute 
madness. sched_clock() is _very_ performance-sensitive for the 
scheduler, do not mess with it.

Ingo


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Daniel Walker
On Tue, 2007-04-24 at 22:59 +0200, Ingo Molnar wrote:
> * Daniel Walker [EMAIL PROTECTED] wrote:
>
> > On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
> >
> > > And sched_clock's use of local_irq_save/restore appears to be absolutely
> > > correct, so I think it must be triggering a bug in either the self-tests
> > > or lockdep itself.
> >
> > Why does sched_clock need to disable interrupts?
>
> i concur. To me it appears not absolutely correct that someone
> apparently added local_irq_save/restore to sched_clock(), but absolute
> madness. sched_clock() is _very_ performance-sensitive for the
> scheduler, do not mess with it.

It looks like it's used in some sort of warp check, but only when
jiffies is used .. So I'm totally stumped why it's in there..

Daniel



Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Andrew Morton
On Tue, 24 Apr 2007 22:59:18 +0200 Ingo Molnar [EMAIL PROTECTED] wrote:

 
> * Daniel Walker [EMAIL PROTECTED] wrote:
>
> > On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
> >
> > > And sched_clock's use of local_irq_save/restore appears to be absolutely
> > > correct, so I think it must be triggering a bug in either the self-tests
> > > or lockdep itself.
> >
> > Why does sched_clock need to disable interrupts?
>
> i concur. To me it appears not absolutely correct that someone
> apparently added local_irq_save/restore to sched_clock(), but absolute
> madness. sched_clock() is _very_ performance-sensitive for the
> scheduler, do not mess with it.

Why does a local_irq_save/restore make the selftests fail??


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Andi Kleen
On Tuesday 24 April 2007 22:52:27 Daniel Walker wrote:
> On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
>
> > And sched_clock's use of local_irq_save/restore appears to be absolutely
> > correct, so I think it must be triggering a bug in either the self-tests
> > or lockdep itself.
>
> Why does sched_clock need to disable interrupts?

It's only used in the instable path which is kind of "i already threw up
my hands" anyways

I use it because when you transition from stable (TSC) to instable (jiffies)
the only way to avoid the clock jumping backwards is to remember and update the 
last value. To avoid races with parallel cpufreq handlers or timer
interrupts this small section needs to run with interrupts disabled.

The alternative would be a seqlock, but people have complained about this
earlier too so i judged irq disabling better.
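
The "remember and update the last value" step can be sketched in plain C as
follows. This is an illustration under stated assumptions, not the code from
the sched_clock patch: struct clock_state and monotonic_ns are hypothetical
names, and raw_ns stands for whatever the current source (TSC-based or
jiffies-based) reports. Userspace has no interrupts to disable, so the
comment only marks where the protected section would be.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU state; not the real patch's layout. */
struct clock_state {
    uint64_t last_ns;  /* last timestamp handed to a caller */
};

/*
 * Clamp a raw reading so the returned time never goes backwards.
 * In the kernel, the compare-and-store below is the small section
 * that would run with interrupts disabled.
 */
static uint64_t monotonic_ns(struct clock_state *st, uint64_t raw_ns)
{
    assert(st != NULL);
    if (raw_ns < st->last_ns)
        raw_ns = st->last_ns;  /* source stepped back: repeat the old value */
    st->last_ns = raw_ns;
    return raw_ns;
}
```

The read-modify-write of last_ns is the part that races: if a timer
interrupt or cpufreq handler updated it between the compare and the store,
a stale value could be written back and a later caller could see time move
backwards.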

-Andi


Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Daniel Walker
On Tue, 2007-04-24 at 23:20 +0200, Andi Kleen wrote:
> On Tuesday 24 April 2007 22:52:27 Daniel Walker wrote:
> > On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
> >
> > > And sched_clock's use of local_irq_save/restore appears to be absolutely
> > > correct, so I think it must be triggering a bug in either the self-tests
> > > or lockdep itself.
> >
> > Why does sched_clock need to disable interrupts?
>
> It's only used in the instable path which is kind of "i already threw up
> my hands" anyways
>
> I use it because when you transition from stable (TSC) to instable (jiffies)
> the only way to avoid the clock jumping backwards is to remember and update the
> last value. To avoid races with parallel cpufreq handlers or timer
> interrupts this small section needs to run with interrupts disabled.

Preemption is already disabled with the get_cpu_var() , so it seems like
the timer interrupt is the only worry? I find it confusing that the
access of jiffies_64 isn't protected from interrupts, it's only the
per_cpu data which should already be protected by the
get_cpu_var()/put_cpu_var ..

Daniel
