Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Tue, 2007-04-24 at 23:20 +0200, Andi Kleen wrote:
> On Tuesday 24 April 2007 22:52:27 Daniel Walker wrote:
> > On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
> >
> > > And sched_clock's use of local_irq_save/restore appears to be absolutely
> > > correct, so I think it must be triggering a bug in either the self-tests
> > > or lockdep itself.
> >
> > Why does sched_clock need to disable interrupts?
>
> It's only used in the instable path which is kind of "i already threw up
> my hands" anyways
>
> I use it because when you transition from stable (TSC) to instable (jiffies)
> the only way to avoid the clock jumping backwards is to remember and update
> the last value. To avoid races with parallel cpufreq handlers or timer
> interrupts this small section needs to run with interrupts disabled.

Preemption is already disabled with the get_cpu_var(), so it seems like
the timer interrupt is the only worry?

I find it confusing that the access of jiffies_64 isn't protected from
interrupts, it's only the per_cpu data which should already be protected
by the get_cpu_var()/put_cpu_var ..

Daniel

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Tuesday 24 April 2007 22:52:27 Daniel Walker wrote:
> On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
>
> > And sched_clock's use of local_irq_save/restore appears to be absolutely
> > correct, so I think it must be triggering a bug in either the self-tests
> > or lockdep itself.
>
> Why does sched_clock need to disable interrupts?

It's only used in the instable path which is kind of "i already threw up
my hands" anyways

I use it because when you transition from stable (TSC) to instable (jiffies)
the only way to avoid the clock jumping backwards is to remember and update
the last value. To avoid races with parallel cpufreq handlers or timer
interrupts this small section needs to run with interrupts disabled.

The alternative would be a seqlock, but people have complained about this
earlier too so i judged irq disabling better.

-Andi
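[Archive note] The "remember and update the last value" idea in the message above can be modelled in plain userspace C. This is only a sketch under stated assumptions, not Andi's actual patch: the struct, the function name and the two input timestamps are hypothetical, and the sketch tracks the last value on every call for simplicity. In the kernel, the read-modify-write of the saved value is the small section that would run with interrupts disabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU clock state; the names are illustrative only. */
struct clock_state {
    int      stable;   /* nonzero while the fast source (TSC) is trusted */
    uint64_t last_ns;  /* last timestamp handed out to a caller */
};

/*
 * Return a timestamp that never runs backwards, even when the clock
 * switches from the fast source (fast_ns) to the coarse one (coarse_ns).
 * Assumption: the caller has already serialised access to *st, which is
 * what disabling interrupts (or a seqlock) would do in kernel code.
 */
uint64_t clamped_clock(struct clock_state *st,
                       uint64_t fast_ns, uint64_t coarse_ns)
{
    uint64_t now;

    assert(st != NULL);
    now = st->stable ? fast_ns : coarse_ns;

    /* Clamp: a coarse source may lag behind what was already returned. */
    if (now < st->last_ns)
        now = st->last_ns;

    st->last_ns = now;
    return now;
}
```

If an interrupt could run between the comparison and the store to `last_ns`, two callers on the same CPU could each publish a value the other has already passed, which is the race the message describes.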
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Tue, 24 Apr 2007 22:59:18 +0200 Ingo Molnar <[EMAIL PROTECTED]> wrote:

>
> * Daniel Walker <[EMAIL PROTECTED]> wrote:
>
> > On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
> >
> > > And sched_clock's use of local_irq_save/restore appears to be absolutely
> > > correct, so I think it must be triggering a bug in either the self-tests
> > > or lockdep itself.
> >
> > Why does sched_clock need to disable interrupts?
>
> i concur. To me it appears not "absolutely correct" that someone
> apparently added local_irq_save/restore to sched_clock(), but "absolute
> madness". sched_clock() is _very_ performance-sensitive for the
> scheduler, do not mess with it.

Why does a local_irq_save/restore make the selftests fail??
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
* Daniel Walker <[EMAIL PROTECTED]> wrote:

> On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
>
> > And sched_clock's use of local_irq_save/restore appears to be absolutely
> > correct, so I think it must be triggering a bug in either the self-tests
> > or lockdep itself.
>
> Why does sched_clock need to disable interrupts?

i concur. To me it appears not "absolutely correct" that someone
apparently added local_irq_save/restore to sched_clock(), but "absolute
madness". sched_clock() is _very_ performance-sensitive for the
scheduler, do not mess with it.

Ingo
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Tue, 2007-04-24 at 22:59 +0200, Ingo Molnar wrote:
> * Daniel Walker <[EMAIL PROTECTED]> wrote:
>
> > On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
> >
> > > And sched_clock's use of local_irq_save/restore appears to be absolutely
> > > correct, so I think it must be triggering a bug in either the self-tests
> > > or lockdep itself.
> >
> > Why does sched_clock need to disable interrupts?
>
> i concur. To me it appears not "absolutely correct" that someone
> apparently added local_irq_save/restore to sched_clock(), but "absolute
> madness". sched_clock() is _very_ performance-sensitive for the
> scheduler, do not mess with it.

It looks like it's used in some sort of warp check, but only when
jiffies is used .. So I'm totally stumped why it's in there..

Daniel
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:

> And sched_clock's use of local_irq_save/restore appears to be absolutely
> correct, so I think it must be triggering a bug in either the self-tests
> or lockdep itself.

Why does sched_clock need to disable interrupts?

Daniel
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
Andrew Morton wrote:
> It's weird. And I don't think the locking selftest code calls
> sched_clock() (or any other time-related thing) at all, does it?
>

I guess it ends up going through the scheduler, which does use it. But...

J
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
Andrew Morton wrote:
> On Tue, 24 Apr 2007 13:00:49 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
>
>> Andrew Morton wrote:
>>
>>> Well, it _is_ mysterious.
>>>
>>> Did you try to locate the code which failed? I got lost in macros and
>>> include files, and gave up very very easily. Stop hiding, Ingo.
>>>
>> OK, I've managed to reproduce it. Removing the local_irq_save/restore
>> from sched_clock() makes it go away, as I'd expect (otherwise it would
>> really be magic).
>>
>
> erm, why do you expect that? A local_irq_save()/local_irq_restore() pair
> shouldn't be affecting anything?
>

Well, yes. I have no idea why it causes a problem. But other than that,
sched_clock does absolutely nothing which would affect lockdep state.

>> But given that it never seems to touch the softlockup
>> during testing, I have no idea what difference it makes...
>>
>
> To what softlockup are you referring, and what does that have to do with
> anything?

You dropped this patch, "Ignore stolen time in the softlockup watchdog",
because its presence triggers the lock tester errors. The only thing this
patch does is use sched_clock() rather than jiffies to measure lockup
time. It therefore appears, for some reason, that using sched_clock() in
the softlockup code is making the lock-test fail. Since the lock test
doesn't explicitly do any softlockup stuff, the connection must be
implicit via sched_clock - but how, I can't imagine.

Since sched_clock() itself looks perfectly OK, and the softlockup watchdog
seems fine too, I can only conclude it's a bug in the lock testing stuff.
But I don't know what.

J
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Tue, 24 Apr 2007 13:24:24 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> Jeremy Fitzhardinge wrote:
> > Andrew Morton wrote:
> >
> >> Well, it _is_ mysterious.
> >>
> >> Did you try to locate the code which failed? I got lost in macros and
> >> include files, and gave up very very easily. Stop hiding, Ingo.
> >>
> >
> > OK, I've managed to reproduce it. Removing the local_irq_save/restore
> > from sched_clock() makes it go away, as I'd expect (otherwise it would
> > really be magic). But given that it never seems to touch the softlockup
> > during testing, I have no idea what difference it makes...
>
> And sched_clock's use of local_irq_save/restore appears to be absolutely
> correct, so I think it must be triggering a bug in either the self-tests
> or lockdep itself.

It's weird. And I don't think the locking selftest code calls
sched_clock() (or any other time-related thing) at all, does it?

> The only way I could actually extract the test code itself was to run
> the whole thing through cpp+indent, but it doesn't shed much light.
>
> It's also not clear to me if there are 6 independent failures, or if
> they're a cascade.

Oh well. I'll restore the patches and when people hit problems we can
blame Ingo!
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
Jeremy Fitzhardinge wrote:
> Andrew Morton wrote:
>
>> Well, it _is_ mysterious.
>>
>> Did you try to locate the code which failed? I got lost in macros and
>> include files, and gave up very very easily. Stop hiding, Ingo.
>>
>
> OK, I've managed to reproduce it. Removing the local_irq_save/restore
> from sched_clock() makes it go away, as I'd expect (otherwise it would
> really be magic). But given that it never seems to touch the softlockup
> during testing, I have no idea what difference it makes...

And sched_clock's use of local_irq_save/restore appears to be absolutely
correct, so I think it must be triggering a bug in either the self-tests
or lockdep itself.

The only way I could actually extract the test code itself was to run
the whole thing through cpp+indent, but it doesn't shed much light.

It's also not clear to me if there are 6 independent failures, or if
they're a cascade.

J
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Tue, 24 Apr 2007 13:00:49 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > Well, it _is_ mysterious.
> >
> > Did you try to locate the code which failed? I got lost in macros and
> > include files, and gave up very very easily. Stop hiding, Ingo.
> >
>
> OK, I've managed to reproduce it. Removing the local_irq_save/restore
> from sched_clock() makes it go away, as I'd expect (otherwise it would
> really be magic).

erm, why do you expect that? A local_irq_save()/local_irq_restore() pair
shouldn't be affecting anything?

> But given that it never seems to touch the softlockup
> during testing, I have no idea what difference it makes...

To what softlockup are you referring, and what does that have to do with
anything?
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
Andrew Morton wrote:
> Well, it _is_ mysterious.
>
> Did you try to locate the code which failed? I got lost in macros and
> include files, and gave up very very easily. Stop hiding, Ingo.
>

OK, I've managed to reproduce it. Removing the local_irq_save/restore
from sched_clock() makes it go away, as I'd expect (otherwise it would
really be magic). But given that it never seems to touch the softlockup
during testing, I have no idea what difference it makes...

J
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Tue, 24 Apr 2007 11:16:09 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > I said that because the damn thing went away when I was hunting it down
> > because I lost the config and was unable to remember the right combination
> > of debug settings. Fortunately it later came back so I took care to
> > preserve the config.
> >
>
> sched_clock doesn't *do* anything except flap interrupts.

Well, it _is_ mysterious.

Did you try to locate the code which failed? I got lost in macros and
include files, and gave up very very easily. Stop hiding, Ingo.

> Oh, wait, have
> you got Andi's bugfixed version of the sched_clock patch? The first
> version did a local_save_flags rather than a local_irq_save.

I have whatever I pulled from firstfloor over the weekend. It's in rc7-mm1.

No, it doesn't use local_save_flags.

> >> Hm, is it caused by using sched_clock() to generate the printk
> >> timestamps while generating the lock test output?
> >>
> >
> > Conceivably. What does that locking API test do?
> >
>
> Didn't make a difference here. Building your config now.
>
> > I was using printk timestamps and netconsole at the time.
> >
>
> Ah, great, now you're going to make me setup netconsole...
>

That's a doddle. On test system, boot with

    netconsole=@/eth0,@/

On workstation:

    sudo netcat -u -l -p | tee -a ~/.log/log-
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
Andrew Morton wrote:
> I said that because the damn thing went away when I was hunting it down
> because I lost the config and was unable to remember the right combination
> of debug settings. Fortunately it later came back so I took care to
> preserve the config.
>

sched_clock doesn't *do* anything except flap interrupts. Oh, wait, have
you got Andi's bugfixed version of the sched_clock patch? The first
version did a local_save_flags rather than a local_irq_save.

>> Hm, is it caused by using sched_clock() to generate the printk
>> timestamps while generating the lock test output?
>>
>
> Conceivably. What does that locking API test do?
>

Didn't make a difference here. Building your config now.

> I was using printk timestamps and netconsole at the time.
>

Ah, great, now you're going to make me setup netconsole...

J
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Tue, 24 Apr 2007 10:51:35 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > It seems fairly sensitive to .config settings. See
> > http://userweb.kernel.org/~akpm/config-sony.txt
> >
>
> I haven't tried your config yet, but I haven't managed to reproduce it
> by playing with the usual suspects in my config (SMP, PREEMPT). Any
> idea about which config changes make the difference?

I said that because the damn thing went away when I was hunting it down
because I lost the config and was unable to remember the right combination
of debug settings. Fortunately it later came back so I took care to
preserve the config.

> Hm, is it caused by using sched_clock() to generate the printk
> timestamps while generating the lock test output?

Conceivably. What does that locking API test do?

I was using printk timestamps and netconsole at the time.
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
Andrew Morton wrote:
> It seems fairly sensitive to .config settings. See
> http://userweb.kernel.org/~akpm/config-sony.txt
>

I haven't tried your config yet, but I haven't managed to reproduce it
by playing with the usual suspects in my config (SMP, PREEMPT). Any
idea about which config changes make the difference?

Hm, is it caused by using sched_clock() to generate the printk
timestamps while generating the lock test output?

J
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Mon, 23 Apr 2007 23:58:20 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > On Tue, 27 Mar 2007 14:49:20 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
> >
> >> The softlockup watchdog is currently a nuisance in a virtual machine,
> >> since the whole system could have the CPU stolen from it for a long
> >> period of time. While it would be unlikely for a guest domain to be
> >> denied timer interrupts for over 10s, it could happen and any softlockup
> >> message would be completely spurious.
> >>
> >> Earlier I proposed that sched_clock() return time in unstolen
> >> nanoseconds, which is how Xen and VMI currently implement it. If the
> >> softlockup watchdog uses sched_clock() to measure time, it would
> >> automatically ignore stolen time, and therefore only report when the
> >> guest itself locked up. When running native, sched_clock() returns
> >> real-time nanoseconds, so the behaviour would be unchanged.
> >>
> >> Note that sched_clock() used this way is inherently per-cpu, so this
> >> patch makes sure that the per-processor watchdog thread initialized
> >> its own timestamp.
> >>
> >
> > This patch
> > (ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/ignore-stolen-time-in-the-softlockup-watchdog.patch)
> > causes six failures in the locking self-tests, which I must say is rather
> > clever of it.
> >
>
> Interesting.

I'll say.

> Which variation of sched_clock do you have in your tree at
> the moment?

Andi's, plus the below fix.

Sigh. I thought I was only two more bugs away from a release, then...

[18014389.347124] BUG: unable to handle kernel paging request at virtual address 6b6b7193
[18014389.347142] printing eip:
[18014389.347149] c029a80c
[18014389.347156] *pde =
[18014389.347166] Oops: [#1]
[18014389.347174] Modules linked in: i915 drm ipw2200 sonypi ipv6 autofs4 hidp l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables cpufreq_ondemand video sbs button battery asus_acpi ac nvram ohci1394 ieee1394 ehci_hcd uhci_hcd sg joydev snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm sr_mod cdrom snd_timer ieee80211 i2c_i801 piix ieee80211_crypt i2c_core generic snd soundcore snd_page_alloc ext3 jbd ide_disk ide_core
[18014389.347520] CPU:0
[18014389.347521] EIP:0060:[]Tainted: G D VLI
[18014389.347522] EFLAGS: 00010296 (2.6.21-rc7-mm1 #35)
[18014389.347547] EIP is at input_release_device+0x8/0x4e
[18014389.347555] eax: c99709a8 ebx: 6b6b6b6b ecx: 0286 edx:
[18014389.347563] esi: 6b6b6b6b edi: c99709cc ebp: c21e3d40 esp: c21e3d38
[18014389.347571] ds: 007b es: 007b fs: 00d8 gs: ss: 0068
[18014389.347580] Process khubd (pid: 159, ti=c21e2000 task=c20a62f0 task.ti=c21e2000)
[18014389.347588] Stack: 6b6b6b6b c99709a8 c21e3d60 c029b489 c2014ec8 c9182000 c96b167c c9970954
[18014389.347655]        c9970954 c99709cc c21e3d80 c029d401 c9977a6c c96b1000 c21e3d90 c9970954
[18014389.347708]        c99709a8 c9164000 c21e3d90 c029d4b5 c96b1000 c9970564 c21e3db0 c029c50b
[18014389.347771] Call Trace:
[18014389.347792] [] input_close_device+0x13/0x51
[18014389.347810] [] mousedev_destroy+0x29/0x7e
[18014389.347827] [] mousedev_disconnect+0x5f/0x63
[18014389.347842] [] input_unregister_device+0x6a/0x100
[18014389.347858] [] hidinput_disconnect+0x24/0x41
[18014389.347874] [] hid_disconnect+0x79/0xc9
[18014389.347889] [] usb_unbind_interface+0x47/0x8f
[18014389.347916] [] __device_release_driver+0x74/0x90
[18014389.347933] [] device_release_driver+0x37/0x4e
[18014389.347957] [] bus_remove_device+0x73/0x82
[18014389.347977] [] device_del+0x214/0x28c
[18014389.348132] [] usb_disable_device+0x62/0xc2
[18014389.348148] [] usb_disconnect+0x99/0x126
[18014389.348163] [] hub_thread+0x3a5/0xb07
[18014389.348178] [] kthread+0x6e/0x79
[18014389.348194] [] kernel_thread_helper+0x7/0x10
[18014389.348210] ===
[18014389.348218] INFO: lockdep is turned off.
[18014389.348224] Code: 5b 5d c3 55 b9 f0 ff ff ff 8b 50 0c 89 e5 83 ba 28 06 00 00 00 75 08 89 82 28 06 00 00 31 c9 5d 89 c8 c3 55 89 e5 56 53 8b 70 0c <39> 86 28 06 00 00 75 3a 8b 9e e4 08 00 00 c7 86 28 06 00 00 00

I dunno. I'll keep plugging for another couple hours then I'll shove out
what I have as a -mm snapshot whatsit.

Things are just ridiculous. I'm thinking of having a hard-disk crash and
accidentally losing everything.


From: Andrew Morton <[EMAIL PROTECTED]>

WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to .init.text:sc_cpu_event from .data between 'sc_cpu_notifier' (at offset 0x2110) and 'mcelog'

Use hotcpu_notifier(). This takes care of making sure that the unused code
disappears from vmlinux if !CONFIG_HOTPLUG_CPU, too.
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
Andrew Morton wrote:
> On Tue, 27 Mar 2007 14:49:20 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
>
>> The softlockup watchdog is currently a nuisance in a virtual machine,
>> since the whole system could have the CPU stolen from it for a long
>> period of time. While it would be unlikely for a guest domain to be
>> denied timer interrupts for over 10s, it could happen and any softlockup
>> message would be completely spurious.
>>
>> Earlier I proposed that sched_clock() return time in unstolen
>> nanoseconds, which is how Xen and VMI currently implement it. If the
>> softlockup watchdog uses sched_clock() to measure time, it would
>> automatically ignore stolen time, and therefore only report when the
>> guest itself locked up. When running native, sched_clock() returns
>> real-time nanoseconds, so the behaviour would be unchanged.
>>
>> Note that sched_clock() used this way is inherently per-cpu, so this
>> patch makes sure that the per-processor watchdog thread initialized
>> its own timestamp.
>>
>
> This patch
> (ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/ignore-stolen-time-in-the-softlockup-watchdog.patch)
> causes six failures in the locking self-tests, which I must say is rather
> clever of it.
>

Interesting. Which variation of sched_clock do you have in your tree at
the moment?

J
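[Archive note] The patch description quoted above argues that a watchdog timed against "unstolen" nanoseconds will not fire just because the hypervisor took the CPU away. A minimal userspace model of that argument follows. It is a sketch, not the patch itself: every name here is hypothetical, the 10 s threshold is assumed from the "over 10s" figure in the description, and real kernel code would read sched_clock() rather than take timestamps as parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed threshold: 10 s, taken from the figure in the description. */
#define LOCKUP_THRESHOLD_NS (10ULL * 1000000000ULL)

/* Hypothetical per-CPU watchdog state. */
struct watchdog {
    uint64_t touch_ns;  /* clock value when the watchdog thread last ran */
};

/* Model of an "unstolen" clock: wall time minus time the guest was denied. */
uint64_t unstolen_ns(uint64_t wall_ns, uint64_t stolen_ns)
{
    assert(stolen_ns <= wall_ns);
    return wall_ns - stolen_ns;
}

/* Called by the per-CPU watchdog thread each time it gets to run. */
void watchdog_touch(struct watchdog *wd, uint64_t now_ns)
{
    wd->touch_ns = now_ns;
}

/* Called from the timer tick: has the thread been starved for too long? */
bool watchdog_lockup(const struct watchdog *wd, uint64_t now_ns)
{
    return now_ns - wd->touch_ns > LOCKUP_THRESHOLD_NS;
}
```

With 15 s of wall time of which 12 s were stolen, the unstolen clock has advanced only 3 s, so `watchdog_lockup()` stays quiet, whereas the same check against wall time would report a lockup. Because each CPU's unstolen clock advances independently, the timestamp has to be initialised and compared on the same CPU, which is the per-cpu point made in the description.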
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Tue, 27 Mar 2007 14:49:20 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> The softlockup watchdog is currently a nuisance in a virtual machine,
> since the whole system could have the CPU stolen from it for a long
> period of time. While it would be unlikely for a guest domain to be
> denied timer interrupts for over 10s, it could happen and any softlockup
> message would be completely spurious.
>
> Earlier I proposed that sched_clock() return time in unstolen
> nanoseconds, which is how Xen and VMI currently implement it. If the
> softlockup watchdog uses sched_clock() to measure time, it would
> automatically ignore stolen time, and therefore only report when the
> guest itself locked up. When running native, sched_clock() returns
> real-time nanoseconds, so the behaviour would be unchanged.
>
> Note that sched_clock() used this way is inherently per-cpu, so this
> patch makes sure that the per-processor watchdog thread initialized
> its own timestamp.

This patch
(ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/ignore-stolen-time-in-the-softlockup-watchdog.patch)
causes six failures in the locking self-tests, which I must say is rather
clever of it.

Here's the first one:

[17179569.184000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[17179569.184000] ... MAX_LOCKDEP_SUBCLASSES:8
[17179569.184000] ... MAX_LOCK_DEPTH: 30
[17179569.184000] ... MAX_LOCKDEP_KEYS:2048
[17179569.184000] ... CLASSHASH_SIZE: 1024
[17179569.184000] ... MAX_LOCKDEP_ENTRIES: 8192
[17179569.184000] ... MAX_LOCKDEP_CHAINS: 16384
[17179569.184000] ... CHAINHASH_SIZE: 8192
[17179569.184000] memory used by lock dependency info: 992 kB
[17179569.184000] per task-struct memory footprint: 1200 bytes
[17179569.184000]
[17179569.184000] | Locking API testsuite:
[17179569.184000]
[17179569.184000]                                 | spin |wlock |rlock |mutex | wsem | rsem |
[17179569.184000] --
[17179569.184000]                     A-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184000]                 A-B-B-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184000]             A-B-B-C-C-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184001]             A-B-C-A-B-C deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184002]         A-B-B-C-C-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184003]         A-B-C-D-B-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184004]         A-B-C-D-B-C-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184005]                    double unlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184006]                  initialize held:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184006]                 bad unlock order:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184006] --
[17179569.184006]              recursive read-lock:             |  ok  |             |  ok  |
[17179569.184006]           recursive read-lock #2:             |  ok  |             |  ok  |
[17179569.184007]            mixed read-write-lock:             |  ok  |             |  ok  |
[17179569.184007]            mixed write-read-lock:             |  ok  |             |  ok  |
[17179569.184007] --
[17179569.184007]     hard-irqs-on + irq-safe-A/12:  ok  |  ok  |  ok  |
[17179569.184007]     soft-irqs-on + irq-safe-A/12:  ok  |  ok  |  ok  |
[17179569.184007]     hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[17179569.184007]     soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[17179569.184007]       sirq-safe-A => hirqs-on/12:  ok  |  ok  |irq event stamp: 458
[17179569.184007] hardirqs last enabled at (458): [c01e4116] irqsafe2A_rlock_12+0x96/0xa3
[17179569.184007] hardirqs last disabled at (457): [c01095b9] sched_clock+0x5e/0xe9
[17179569.184007] softirqs last enabled at (454): [c01e4101] irqsafe2A_rlock_12+0x81/0xa3
[17179569.184007] softirqs last disabled at (450): [c01e408b] irqsafe2A_rlock_12+0xb/0xa3
[17179569.184007] FAILED| [c0104cf0] dump_trace+0x63/0x1ec
[17179569.184007] [c0104e93] show_trace_log_lvl+0x1a/0x30
[17179569.184007] [c01059ec] show_trace+0x12/0x14
[17179569.184007] [c0105a45] dump_stack+0x16/0x18
[17179569.184007] [c01e1eb5] dotest+0x6b/0x3d0
[17179569.184007] [c01eb249] locking_selftest+0x915/0x1a58
[17179569.184007] [c048c979] start_kernel+0x1d0/0x2a2
[17179569.184007] ===
[17179569.184007]
[17179569.184007]       sirq-safe-A => hirqs-on/21:irq event stamp: 462
[17179569.184007] hardirqs last enabled at (462): []
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
Andrew Morton wrote:
> On Tue, 27 Mar 2007 14:49:20 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:
>
> > The softlockup watchdog is currently a nuisance in a virtual machine, since the whole system could have the CPU stolen from it for a long period of time. While it would be unlikely for a guest domain to be denied timer interrupts for over 10s, it could happen and any softlockup message would be completely spurious.
> >
> > Earlier I proposed that sched_clock() return time in unstolen nanoseconds, which is how Xen and VMI currently implement it. If the softlockup watchdog uses sched_clock() to measure time, it would automatically ignore stolen time, and therefore only report when the guest itself locked up. When running native, sched_clock() returns real-time nanoseconds, so the behaviour would be unchanged.
> >
> > Note that sched_clock() used this way is inherently per-cpu, so this patch makes sure that the per-processor watchdog thread initialized its own timestamp.
>
> This patch (ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/ignore-stolen-time-in-the-softlockup-watchdog.patch) causes six failures in the locking self-tests, which I must say is rather clever of it.

Interesting. Which variation of sched_clock do you have in your tree at the moment?

J
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Mon, 23 Apr 2007 23:58:20 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:

> Andrew Morton wrote:
> > On Tue, 27 Mar 2007 14:49:20 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:
> >
> > > The softlockup watchdog is currently a nuisance in a virtual machine, since the whole system could have the CPU stolen from it for a long period of time. While it would be unlikely for a guest domain to be denied timer interrupts for over 10s, it could happen and any softlockup message would be completely spurious.
> > >
> > > Earlier I proposed that sched_clock() return time in unstolen nanoseconds, which is how Xen and VMI currently implement it. If the softlockup watchdog uses sched_clock() to measure time, it would automatically ignore stolen time, and therefore only report when the guest itself locked up. When running native, sched_clock() returns real-time nanoseconds, so the behaviour would be unchanged.
> > >
> > > Note that sched_clock() used this way is inherently per-cpu, so this patch makes sure that the per-processor watchdog thread initialized its own timestamp.
> >
> > This patch (ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/ignore-stolen-time-in-the-softlockup-watchdog.patch) causes six failures in the locking self-tests, which I must say is rather clever of it.
>
> Interesting.

I'll say.

> Which variation of sched_clock do you have in your tree at the moment?

Andi's, plus the below fix.

Sigh. I thought I was only two more bugs away from a release, then...

[18014389.347124] BUG: unable to handle kernel paging request at virtual address 6b6b7193
[18014389.347142] printing eip:
[18014389.347149] c029a80c
[18014389.347156] *pde =
[18014389.347166] Oops: [#1]
[18014389.347174] Modules linked in: i915 drm ipw2200 sonypi ipv6 autofs4 hidp l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables cpufreq_ondemand video sbs button battery asus_acpi ac nvram ohci1394 ieee1394 ehci_hcd uhci_hcd sg joydev snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm sr_mod cdrom snd_timer ieee80211 i2c_i801 piix ieee80211_crypt i2c_core generic snd soundcore snd_page_alloc ext3 jbd ide_disk ide_core
[18014389.347520] CPU:0
[18014389.347521] EIP:0060:[c029a80c]Tainted: G D VLI
[18014389.347522] EFLAGS: 00010296 (2.6.21-rc7-mm1 #35)
[18014389.347547] EIP is at input_release_device+0x8/0x4e
[18014389.347555] eax: c99709a8 ebx: 6b6b6b6b ecx: 0286 edx:
[18014389.347563] esi: 6b6b6b6b edi: c99709cc ebp: c21e3d40 esp: c21e3d38
[18014389.347571] ds: 007b es: 007b fs: 00d8 gs: ss: 0068
[18014389.347580] Process khubd (pid: 159, ti=c21e2000 task=c20a62f0 task.ti=c21e2000)
[18014389.347588] Stack: 6b6b6b6b c99709a8 c21e3d60 c029b489 c2014ec8 c9182000 c96b167c c9970954
[18014389.347655]c9970954 c99709cc c21e3d80 c029d401 c9977a6c c96b1000 c21e3d90 c9970954
[18014389.347708]c99709a8 c9164000 c21e3d90 c029d4b5 c96b1000 c9970564 c21e3db0 c029c50b
[18014389.347771] Call Trace:
[18014389.347792] [c029b489] input_close_device+0x13/0x51
[18014389.347810] [c029d401] mousedev_destroy+0x29/0x7e
[18014389.347827] [c029d4b5] mousedev_disconnect+0x5f/0x63
[18014389.347842] [c029c50b] input_unregister_device+0x6a/0x100
[18014389.347858] [c02abf9c] hidinput_disconnect+0x24/0x41
[18014389.347874] [c02aef29] hid_disconnect+0x79/0xc9
[18014389.347889] [c028e1db] usb_unbind_interface+0x47/0x8f
[18014389.347916] [c0256852] __device_release_driver+0x74/0x90
[18014389.347933] [c0256c5f] device_release_driver+0x37/0x4e
[18014389.347957] [c02561c6] bus_remove_device+0x73/0x82
[18014389.347977] [c02547c1] device_del+0x214/0x28c
[18014389.348132] [c028bb72] usb_disable_device+0x62/0xc2
[18014389.348148] [c0288893] usb_disconnect+0x99/0x126
[18014389.348163] [c0288d2c] hub_thread+0x3a5/0xb07
[18014389.348178] [c012cbe5] kthread+0x6e/0x79
[18014389.348194] [c0104917] kernel_thread_helper+0x7/0x10
[18014389.348210] ===
[18014389.348218] INFO: lockdep is turned off.
[18014389.348224] Code: 5b 5d c3 55 b9 f0 ff ff ff 8b 50 0c 89 e5 83 ba 28 06 00 00 00 75 08 89 82 28 06 00 00 31 c9 5d 89 c8 c3 55 89 e5 56 53 8b 70 0c 39 86 28 06 00 00 75 3a 8b 9e e4 08 00 00 c7 86 28 06 00 00 00

I dunno. I'll keep plugging for another couple hours then I'll shove out what I have as a -mm snapshot whatsit.

Things are just ridiculous. I'm thinking of having a hard-disk crash and accidentally losing everything.

From: Andrew Morton [EMAIL PROTECTED]

WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to .init.text:sc_cpu_event from .data between 'sc_cpu_notifier' (at offset 0x2110) and 'mcelog'

Use hotcpu_notifier(). This takes care of making sure that the unused code
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
Andrew Morton wrote:
> It seems fairly sensitive to .config settings. See http://userweb.kernel.org/~akpm/config-sony.txt

I haven't tried your config yet, but I haven't managed to reproduce it by playing with the usual suspects in my config (SMP, PREEMPT). Any idea about which config changes make the difference?

Hm, is it caused by using sched_clock() to generate the printk timestamps while generating the lock test output?

J
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Tue, 24 Apr 2007 10:51:35 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:

> Andrew Morton wrote:
> > It seems fairly sensitive to .config settings. See http://userweb.kernel.org/~akpm/config-sony.txt
>
> I haven't tried your config yet, but I haven't managed to reproduce it by playing with the usual suspects in my config (SMP, PREEMPT). Any idea about which config changes make the difference?

I said that because the damn thing went away when I was hunting it down because I lost the config and was unable to remember the right combination of debug settings. Fortunately it later came back so I took care to preserve the config.

> Hm, is it caused by using sched_clock() to generate the printk timestamps while generating the lock test output?

Conceivably. What does that locking API test do?

I was using printk timestamps and netconsole at the time.
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
Andrew Morton wrote:
> I said that because the damn thing went away when I was hunting it down because I lost the config and was unable to remember the right combination of debug settings. Fortunately it later came back so I took care to preserve the config.

sched_clock doesn't *do* anything except flap interrupts.

Oh, wait, have you got Andi's bugfixed version of the sched_clock patch? The first version did a local_save_flags rather than a local_irq_save.

> > Hm, is it caused by using sched_clock() to generate the printk timestamps while generating the lock test output?
>
> Conceivably. What does that locking API test do?

Didn't make a difference here. Building your config now.

> I was using printk timestamps and netconsole at the time.

Ah, great, now you're going to make me setup netconsole...

J
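For readers unfamiliar with the two primitives Jeremy contrasts here, the following is a minimal user-space sketch of the difference. It is only a model: the `model_*` names and the single `irqs_enabled` flag are assumptions made for illustration, not kernel code. In the kernel, local_save_flags() records the current interrupt state without changing it, while local_irq_save() records it and then disables interrupts, which is why a version using the former would not actually protect anything.

```c
#include <assert.h>
#include <stdbool.h>

/* Models the CPU's interrupt-enable flag (hypothetical, for illustration). */
static bool irqs_enabled = true;

/* Like local_save_flags(): report the current state, change nothing. */
static bool model_local_save_flags(void)
{
    return irqs_enabled;
}

/* Like local_irq_save(): report the current state, then disable interrupts. */
static bool model_local_irq_save(void)
{
    bool flags = irqs_enabled;
    irqs_enabled = false;
    return flags;
}

/* Like local_irq_restore(): put back whatever state was saved earlier. */
static void model_local_irq_restore(bool flags)
{
    irqs_enabled = flags;
}

/* Sanity check for code that must run with interrupts off. */
static void model_assert_irqs_off(void)
{
    assert(!irqs_enabled);
}
```

With this model, a section bracketed by model_local_save_flags() and model_local_irq_restore() still runs with interrupts enabled, whereas one bracketed by model_local_irq_save() and model_local_irq_restore() does not.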
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Tue, 24 Apr 2007 11:16:09 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:

> Andrew Morton wrote:
> > I said that because the damn thing went away when I was hunting it down because I lost the config and was unable to remember the right combination of debug settings. Fortunately it later came back so I took care to preserve the config.
>
> sched_clock doesn't *do* anything except flap interrupts.

Well, it _is_ mysterious.

Did you try to locate the code which failed? I got lost in macros and include files, and gave up very very easily. Stop hiding, Ingo.

> Oh, wait, have you got Andi's bugfixed version of the sched_clock patch? The first version did a local_save_flags rather than a local_irq_save.

I have whatever I pulled from firstfloor over the weekend. It's in rc7-mm1. No, it doesn't use local_save_flags.

> > > Hm, is it caused by using sched_clock() to generate the printk timestamps while generating the lock test output?
> >
> > Conceivably. What does that locking API test do?
>
> Didn't make a difference here. Building your config now.
>
> > I was using printk timestamps and netconsole at the time.
>
> Ah, great, now you're going to make me setup netconsole...

That's a doddle. On test system, boot with

    netconsole=@test-system-ip-addr/eth0,udp-port-no@workstation-ip-addr/workstation-mac-addr

On workstation:

    sudo netcat -u -l -p udp-port-no | tee -a ~/.log/log-test-system-hostname
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
Andrew Morton wrote:
> Well, it _is_ mysterious.
>
> Did you try to locate the code which failed? I got lost in macros and include files, and gave up very very easily. Stop hiding, Ingo.

OK, I've managed to reproduce it. Removing the local_irq_save/restore from sched_clock() makes it go away, as I'd expect (otherwise it would really be magic).

But given that it never seems to touch the softlockup during testing, I have no idea what difference it makes...

J
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Tue, 24 Apr 2007 13:00:49 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:

> Andrew Morton wrote:
> > Well, it _is_ mysterious.
> >
> > Did you try to locate the code which failed? I got lost in macros and include files, and gave up very very easily. Stop hiding, Ingo.
>
> OK, I've managed to reproduce it. Removing the local_irq_save/restore from sched_clock() makes it go away, as I'd expect (otherwise it would really be magic).

erm, why do you expect that? A local_irq_save()/local_irq_restore() pair shouldn't be affecting anything?

> But given that it never seems to touch the softlockup during testing, I have no idea what difference it makes...

To what softlockup are you referring, and what does that have to do with anything?

feels dumb
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
Jeremy Fitzhardinge wrote:
> Andrew Morton wrote:
> > Well, it _is_ mysterious.
> >
> > Did you try to locate the code which failed? I got lost in macros and include files, and gave up very very easily. Stop hiding, Ingo.
>
> OK, I've managed to reproduce it. Removing the local_irq_save/restore from sched_clock() makes it go away, as I'd expect (otherwise it would really be magic). But given that it never seems to touch the softlockup during testing, I have no idea what difference it makes...

And sched_clock's use of local_irq_save/restore appears to be absolutely correct, so I think it must be triggering a bug in either the self-tests or lockdep itself.

The only way I could actually extract the test code itself was to run the whole thing through cpp+indent, but it doesn't shed much light. It's also not clear to me if there are 6 independent failures, or if they're a cascade.

J
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Tue, 24 Apr 2007 13:24:24 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:

> Jeremy Fitzhardinge wrote:
> > Andrew Morton wrote:
> > > Well, it _is_ mysterious.
> > >
> > > Did you try to locate the code which failed? I got lost in macros and include files, and gave up very very easily. Stop hiding, Ingo.
> >
> > OK, I've managed to reproduce it. Removing the local_irq_save/restore from sched_clock() makes it go away, as I'd expect (otherwise it would really be magic). But given that it never seems to touch the softlockup during testing, I have no idea what difference it makes...
>
> And sched_clock's use of local_irq_save/restore appears to be absolutely correct, so I think it must be triggering a bug in either the self-tests or lockdep itself.

It's weird. And I don't think the locking selftest code calls sched_clock() (or any other time-related thing) at all, does it?

> The only way I could actually extract the test code itself was to run the whole thing through cpp+indent, but it doesn't shed much light. It's also not clear to me if there are 6 independent failures, or if they're a cascade.

Oh well. I'll restore the patches and when people hit problems we can blame Ingo!
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
Andrew Morton wrote:
> On Tue, 24 Apr 2007 13:00:49 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:
>
> > Andrew Morton wrote:
> > > Well, it _is_ mysterious.
> > >
> > > Did you try to locate the code which failed? I got lost in macros and include files, and gave up very very easily. Stop hiding, Ingo.
> >
> > OK, I've managed to reproduce it. Removing the local_irq_save/restore from sched_clock() makes it go away, as I'd expect (otherwise it would really be magic).
>
> erm, why do you expect that? A local_irq_save()/local_irq_restore() pair shouldn't be affecting anything?

Well, yes. I have no idea why it causes a problem. But other than that, sched_clock does absolutely nothing which would affect lockdep state.

> > But given that it never seems to touch the softlockup during testing, I have no idea what difference it makes...
>
> To what softlockup are you referring, and what does that have to do with anything?

You dropped this patch, "Ignore stolen time in the softlockup watchdog", because its presence triggers the lock tester errors. The only thing this patch does is use sched_clock() rather than jiffies to measure lockup time. It therefore appears, for some reason, that using sched_clock() in the softlockup code is making the lock-test fail. Since the lock test doesn't explicitly do any softlockup stuff, the connection must be implicit via sched_clock - but how, I can't imagine.

Since sched_clock() itself looks perfectly OK, and the softlockup watchdog seems fine too, I can only conclude it's a bug in the lock testing stuff. But I don't know what.

J
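To make concrete what the dropped patch is meant to achieve, here is a small user-space sketch of a watchdog that measures elapsed time on a clock excluding stolen time. It is a toy model under stated assumptions, not the patch itself: `struct vcpu_time`, `unstolen_clock()`, the `watchdog_*` helpers and the 10-second threshold are all hypothetical names and values chosen for illustration (the thread only mentions "over 10s").

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
/* Hypothetical threshold, chosen to match the "over 10s" in the thread. */
#define LOCKUP_THRESHOLD_NS (10 * NSEC_PER_SEC)

/* Toy model of one virtual CPU's view of time (not a kernel structure). */
struct vcpu_time {
    uint64_t wall_ns;   /* real time elapsed, including stolen time */
    uint64_t stolen_ns; /* time during which the guest was not running */
};

/* Models a sched_clock() that returns unstolen nanoseconds. */
static uint64_t unstolen_clock(const struct vcpu_time *t)
{
    assert(t->stolen_ns <= t->wall_ns);
    return t->wall_ns - t->stolen_ns;
}

/* Per-CPU watchdog state: timestamp of the last time the watchdog ran. */
struct watchdog {
    uint64_t touch_ts;
};

static void watchdog_touch(struct watchdog *w, uint64_t now_ns)
{
    w->touch_ts = now_ns;
}

/* Reports a lockup if more than the threshold passed since the last touch. */
static bool watchdog_is_locked_up(const struct watchdog *w, uint64_t now_ns)
{
    return now_ns - w->touch_ts > LOCKUP_THRESHOLD_NS;
}
```

If the hypervisor steals 15 seconds and the guest then runs for 1 second, a watchdog fed wall-clock time sees 16 seconds and reports a lockup, while one fed unstolen_clock() sees only 1 second and stays quiet. Because each CPU has its own clock in this scheme, each watchdog has to record its own starting timestamp, as the patch description notes.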
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
Andrew Morton wrote:
> It's weird. And I don't think the locking selftest code calls sched_clock() (or any other time-related thing) at all, does it?

I guess it ends up going through the scheduler, which does use it. But... shrug

J
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:

> And sched_clock's use of local_irq_save/restore appears to be absolutely correct, so I think it must be triggering a bug in either the self-tests or lockdep itself.

Why does sched_clock need to disable interrupts?

Daniel
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
* Daniel Walker [EMAIL PROTECTED] wrote:

> On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
>
> > And sched_clock's use of local_irq_save/restore appears to be absolutely correct, so I think it must be triggering a bug in either the self-tests or lockdep itself.
>
> Why does sched_clock need to disable interrupts?

i concur. To me it appears not absolutely correct that someone apparently added local_irq_save/restore to sched_clock(), but absolute madness. sched_clock() is _very_ performance-sensitive for the scheduler, do not mess with it.

Ingo
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Tue, 2007-04-24 at 22:59 +0200, Ingo Molnar wrote:

> * Daniel Walker [EMAIL PROTECTED] wrote:
>
> > On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
> >
> > > And sched_clock's use of local_irq_save/restore appears to be absolutely correct, so I think it must be triggering a bug in either the self-tests or lockdep itself.
> >
> > Why does sched_clock need to disable interrupts?
>
> i concur. To me it appears not absolutely correct that someone apparently added local_irq_save/restore to sched_clock(), but absolute madness. sched_clock() is _very_ performance-sensitive for the scheduler, do not mess with it.

It looks like it's used in some sort of warp check, but only when jiffies is used .. So I'm totally stumped why it's in there..

Daniel
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Tue, 24 Apr 2007 22:59:18 +0200 Ingo Molnar [EMAIL PROTECTED] wrote:

> * Daniel Walker [EMAIL PROTECTED] wrote:
>
> > On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
> >
> > > And sched_clock's use of local_irq_save/restore appears to be absolutely correct, so I think it must be triggering a bug in either the self-tests or lockdep itself.
> >
> > Why does sched_clock need to disable interrupts?
>
> i concur. To me it appears not absolutely correct that someone apparently added local_irq_save/restore to sched_clock(), but absolute madness. sched_clock() is _very_ performance-sensitive for the scheduler, do not mess with it.

Why does a local_irq_save/restore make the selftests fail??
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Tuesday 24 April 2007 22:52:27 Daniel Walker wrote:

> On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
>
> > And sched_clock's use of local_irq_save/restore appears to be absolutely correct, so I think it must be triggering a bug in either the self-tests or lockdep itself.
>
> Why does sched_clock need to disable interrupts?

It's only used in the instable path which is kind of "i already threw up my hands" anyways

I use it because when you transition from stable (TSC) to instable (jiffies) the only way to avoid the clock jumping backwards is to remember and update the last value. To avoid races with parallel cpufreq handlers or timer interrupts this small section needs to run with interrupts disabled.

The alternative would be a seqlock, but people have complained about this earlier too so i judged irq disabling better.

-Andi
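The "remember and update the last value" idea Andi describes can be sketched in user space as follows. This is an illustrative model only, not the code from his patch: `struct clock_state`, `model_sched_clock()` and the parameter names are assumptions. The stable path simply records what it returned; the unstable path clamps the coarser jiffies-derived value so it never falls below the last value handed out.

```c
#include <assert.h>
#include <stdint.h>

/* Toy per-CPU state; names are hypothetical, not taken from the patch. */
struct clock_state {
    int      tsc_stable; /* nonzero while the TSC-based clock is trusted */
    uint64_t last_ns;    /* last value returned to a caller */
};

/*
 * tsc_ns models the TSC-derived time, jiffies_ns the jiffies-derived time.
 * Returns a timestamp that does not go backwards across a
 * stable -> unstable transition.
 */
static uint64_t model_sched_clock(struct clock_state *s,
                                  uint64_t tsc_ns, uint64_t jiffies_ns)
{
    uint64_t now;

    assert(s);
    if (s->tsc_stable) {
        /* Stable path: just remember what was handed out. */
        now = tsc_ns;
        s->last_ns = now;
    } else {
        /*
         * Unstable path: compare against and update last_ns. In the
         * kernel this read-modify-write is the small section that would
         * run between local_irq_save() and local_irq_restore().
         */
        now = jiffies_ns;
        if (now < s->last_ns)
            now = s->last_ns;
        s->last_ns = now;
    }
    return now;
}
```

The compare-and-update of last_ns is a read-modify-write, so an interrupt handler that also calls the clock in the middle of it could leave a stale value behind; that is the race the interrupt disabling is meant to close in the real code.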
Re: [patch 1/4] Ignore stolen time in the softlockup watchdog
On Tue, 2007-04-24 at 23:20 +0200, Andi Kleen wrote:

> On Tuesday 24 April 2007 22:52:27 Daniel Walker wrote:
>
> > On Tue, 2007-04-24 at 13:24 -0700, Jeremy Fitzhardinge wrote:
> >
> > > And sched_clock's use of local_irq_save/restore appears to be absolutely correct, so I think it must be triggering a bug in either the self-tests or lockdep itself.
> >
> > Why does sched_clock need to disable interrupts?
>
> It's only used in the instable path which is kind of "i already threw up my hands" anyways
>
> I use it because when you transition from stable (TSC) to instable (jiffies) the only way to avoid the clock jumping backwards is to remember and update the last value. To avoid races with parallel cpufreq handlers or timer interrupts this small section needs to run with interrupts disabled.

Preemption is already disabled with the get_cpu_var(), so it seems like the timer interrupt is the only worry?

I find it confusing that the access of jiffies_64 isn't protected from interrupts, it's only the per_cpu data which should already be protected by the get_cpu_var()/put_cpu_var ..

Daniel