Re: x86: kvmclock: abstract save/restore sched_clock_state
On Tue, 7 Feb 2012, Marcelo Tosatti wrote: > > Upon resume from hibernation, CPU 0's hvclock area contains the old > values for system_time and tsc_timestamp. It is necessary for the > hypervisor to update these values with uptodate ones before the CPU uses > them. > > Abstract TSC's save/restore sched_clock_state functions and use > restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update. > > Fixes suspend-to-disk with kvmclock. > > Signed-off-by: Marcelo Tosatti > > diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h > index 15d9915..c91e8b9 100644 > --- a/arch/x86/include/asm/tsc.h > +++ b/arch/x86/include/asm/tsc.h > @@ -61,7 +61,7 @@ extern void check_tsc_sync_source(int cpu); > extern void check_tsc_sync_target(void); > > extern int notsc_setup(char *); > -extern void save_sched_clock_state(void); > -extern void restore_sched_clock_state(void); > +extern void tsc_save_sched_clock_state(void); > +extern void tsc_restore_sched_clock_state(void); > > #endif /* _ASM_X86_TSC_H */ > diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h > index 5d0afac..baaca8d 100644 > --- a/arch/x86/include/asm/x86_init.h > +++ b/arch/x86/include/asm/x86_init.h > @@ -162,6 +162,8 @@ struct x86_cpuinit_ops { > * @is_untracked_pat_range exclude from PAT logic > * @nmi_init enable NMI on cpus > * @i8042_detect pre-detect if i8042 controller exists > + * @save_sched_clock_state: save state for sched_clock() on suspend > + * @restore_sched_clock_state: restore state for sched_clock() on > resume > */ > struct x86_platform_ops { > unsigned long (*calibrate_tsc)(void); > @@ -173,6 +175,8 @@ struct x86_platform_ops { > void (*nmi_init)(void); > unsigned char (*get_nmi_reason)(void); > int (*i8042_detect)(void); > + void (*save_sched_clock_state)(void); > + void (*restore_sched_clock_state)(void); > }; > > struct pci_dev; > diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c > index ca4e735..57e6b78 100644 > --- a/arch/x86/kernel/kvmclock.c > +++ b/arch/x86/kernel/kvmclock.c > @@ -136,6 +136,15 @@ int kvm_register_clock(char *txt) > return ret; > } > > +void kvm_save_sched_clock_state(void) static ? > +{ > +} > + > +void kvm_restore_sched_clock_state(void) Ditto > +{ > + kvm_register_clock("primary cpu clock, resume"); > +} > + Otherwise: Reviewed-by: Thomas Gleixner -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: x86: kvmclock: abstract save/restore sched_clock_state (v2)
On 02/13/2012 05:52 PM, Marcelo Tosatti wrote: > > > { > > >+ x86_platform.restore_sched_clock_state(); > > Isn't it too early? It is scarry to say hypervisor to write to some > > memory location and than completely replace page-tables and half of > > cpu state in __restore_processor_state. Wouldn't that have a potential > > of writing into a place that is not restored hv_clock and restored > > hv_clock might still be stale? > > No, memory is copied in swsusp_arch_resume(), which happens > before restore_processor_state. restore_processor_state() is only > setting up registers and MTRR. > In addition, kvmclock uses physical addresses, so page table changes don't matter. Note we could have done this in __save_processor_state()/__restore_processor_state() (it's just reading and writing an MSR, like we do for MSR_IA32_MISC_ENABLE), but I think your patch is the right way. I'd like an ack from the x86 maintainers though. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: x86: kvmclock: abstract save/restore sched_clock_state (v2)
On (Mon) 13 Feb 2012 [11:07:27], Marcelo Tosatti wrote: > > Upon resume from hibernation, CPU 0's hvclock area contains the old > values for system_time and tsc_timestamp. It is necessary for the > hypervisor to update these values with uptodate ones before the CPU uses > them. > > Abstract TSC's save/restore sched_clock_state functions and use > restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update. > > Also move restore_sched_clock_state before __restore_processor_state, > since the later calls CONFIG_LOCK_STAT's lockstat_clock (also for TSC). > Thanks to Igor Mammedov for tracking it down. > > Fixes suspend-to-disk with kvmclock. > > Signed-off-by: Marcelo Tosatti This works fine, thanks. Tested-by: Amit Shah Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: x86: kvmclock: abstract save/restore sched_clock_state (v2)
On Mon, Feb 13, 2012 at 04:20:24PM +0100, Igor Mammedov wrote: > On 02/13/2012 02:07 PM, Marcelo Tosatti wrote: > > > >Upon resume from hibernation, CPU 0's hvclock area contains the old > >values for system_time and tsc_timestamp. It is necessary for the > >hypervisor to update these values with uptodate ones before the CPU uses > >them. > > > >Abstract TSC's save/restore sched_clock_state functions and use > >restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update. > > > >Also move restore_sched_clock_state before __restore_processor_state, > >since the later calls CONFIG_LOCK_STAT's lockstat_clock (also for TSC). > >Thanks to Igor Mammedov for tracking it down. > > > >Fixes suspend-to-disk with kvmclock. > > > >Signed-off-by: Marcelo Tosatti > > > >diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h > >index 15d9915..c91e8b9 100644 > >--- a/arch/x86/include/asm/tsc.h > >+++ b/arch/x86/include/asm/tsc.h > >@@ -61,7 +61,7 @@ extern void check_tsc_sync_source(int cpu); > > extern void check_tsc_sync_target(void); > > > > extern int notsc_setup(char *); > >-extern void save_sched_clock_state(void); > >-extern void restore_sched_clock_state(void); > >+extern void tsc_save_sched_clock_state(void); > >+extern void tsc_restore_sched_clock_state(void); > > > > #endif /* _ASM_X86_TSC_H */ > >diff --git a/arch/x86/include/asm/x86_init.h > >b/arch/x86/include/asm/x86_init.h > >index 5d0afac..baaca8d 100644 > >--- a/arch/x86/include/asm/x86_init.h > >+++ b/arch/x86/include/asm/x86_init.h > >@@ -162,6 +162,8 @@ struct x86_cpuinit_ops { > > * @is_untracked_pat_range exclude from PAT logic > > * @nmi_init enable NMI on cpus > > * @i8042_detect pre-detect if i8042 controller exists > >+ * @save_sched_clock_state: save state for sched_clock() on suspend > >+ * @restore_sched_clock_state: restore state for sched_clock() on > >resume > > */ > > struct x86_platform_ops { > > unsigned long (*calibrate_tsc)(void); > >@@ -173,6 +175,8 @@ struct x86_platform_ops { > > void (*nmi_init)(void); > > unsigned char (*get_nmi_reason)(void); > > int (*i8042_detect)(void); > >+void (*save_sched_clock_state)(void); > >+void (*restore_sched_clock_state)(void); > > }; > > > > struct pci_dev; > >diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c > >index ca4e735..57e6b78 100644 > >--- a/arch/x86/kernel/kvmclock.c > >+++ b/arch/x86/kernel/kvmclock.c > >@@ -136,6 +136,15 @@ int kvm_register_clock(char *txt) > > return ret; > > } > > > >+void kvm_save_sched_clock_state(void) > >+{ > >+} > >+ > >+void kvm_restore_sched_clock_state(void) > >+{ > >+kvm_register_clock("primary cpu clock, resume"); > >+} > >+ > > #ifdef CONFIG_X86_LOCAL_APIC > > static void __cpuinit kvm_setup_secondary_clock(void) > > { > >@@ -195,6 +204,8 @@ void __init kvmclock_init(void) > > x86_cpuinit.early_percpu_clock_init = > > kvm_setup_secondary_clock; > > #endif > >+x86_platform.save_sched_clock_state = kvm_save_sched_clock_state; > >+x86_platform.restore_sched_clock_state = kvm_restore_sched_clock_state; > > machine_ops.shutdown = kvm_shutdown; > > #ifdef CONFIG_KEXEC > > machine_ops.crash_shutdown = kvm_crash_shutdown; > >diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c > >index a62c201..aed2aa1 100644 > >--- a/arch/x86/kernel/tsc.c > >+++ b/arch/x86/kernel/tsc.c > >@@ -629,7 +629,7 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, int > >cpu) > > > > static unsigned long long cyc2ns_suspend; > > > >-void save_sched_clock_state(void) > >+void tsc_save_sched_clock_state(void) > > { > > if (!sched_clock_stable) > > return; > >@@ -645,7 +645,7 @@ void save_sched_clock_state(void) > > * that sched_clock() continues from the point where it was left off during > > * suspend. > > */ > >-void restore_sched_clock_state(void) > >+void tsc_restore_sched_clock_state(void) > > { > > unsigned long long offset; > > unsigned long flags; > >diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c > >index 6f2ec53..e9f265f 100644 > >--- a/arch/x86/kernel/x86_init.c > >+++ b/arch/x86/kernel/x86_init.c > >@@ -108,7 +108,9 @@ struct x86_platform_ops x86_platform = { > > .is_untracked_pat_range = is_ISA_range, > > .nmi_init = default_nmi_init, > > .get_nmi_reason = default_get_nmi_reason, > >-.i8042_detect = default_i8042_detect > >+.i8042_detect = default_i8042_detect, > >+.save_sched_clock_state = tsc_save_sched_clock_state, > >+.restore_sched_clock_state = tsc_restore_sched_clock_state, > > }; > > > > EXPORT_SYMBOL_GPL(x86_platform); > >diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c > >index f10c0af..0e76a28 100644 > >--- a/arch/x86/power/cpu.c > >+++ b/arch/x86/power/cpu.c > >@@ -114,7 +114,7 @@ static void __save_processor_sta
Re: x86: kvmclock: abstract save/restore sched_clock_state (v2)
On 02/13/2012 02:07 PM, Marcelo Tosatti wrote: Upon resume from hibernation, CPU 0's hvclock area contains the old values for system_time and tsc_timestamp. It is necessary for the hypervisor to update these values with uptodate ones before the CPU uses them. Abstract TSC's save/restore sched_clock_state functions and use restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update. Also move restore_sched_clock_state before __restore_processor_state, since the later calls CONFIG_LOCK_STAT's lockstat_clock (also for TSC). Thanks to Igor Mammedov for tracking it down. Fixes suspend-to-disk with kvmclock. Signed-off-by: Marcelo Tosatti diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h index 15d9915..c91e8b9 100644 --- a/arch/x86/include/asm/tsc.h +++ b/arch/x86/include/asm/tsc.h @@ -61,7 +61,7 @@ extern void check_tsc_sync_source(int cpu); extern void check_tsc_sync_target(void); extern int notsc_setup(char *); -extern void save_sched_clock_state(void); -extern void restore_sched_clock_state(void); +extern void tsc_save_sched_clock_state(void); +extern void tsc_restore_sched_clock_state(void); #endif /* _ASM_X86_TSC_H */ diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h index 5d0afac..baaca8d 100644 --- a/arch/x86/include/asm/x86_init.h +++ b/arch/x86/include/asm/x86_init.h @@ -162,6 +162,8 @@ struct x86_cpuinit_ops { * @is_untracked_pat_rangeexclude from PAT logic * @nmi_init enable NMI on cpus * @i8042_detect pre-detect if i8042 controller exists + * @save_sched_clock_state:save state for sched_clock() on suspend + * @restore_sched_clock_state: restore state for sched_clock() on resume */ struct x86_platform_ops { unsigned long (*calibrate_tsc)(void); @@ -173,6 +175,8 @@ struct x86_platform_ops { void (*nmi_init)(void); unsigned char (*get_nmi_reason)(void); int (*i8042_detect)(void); + void (*save_sched_clock_state)(void); + void (*restore_sched_clock_state)(void); }; struct pci_dev; diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index ca4e735..57e6b78 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -136,6 +136,15 @@ int kvm_register_clock(char *txt) return ret; } +void kvm_save_sched_clock_state(void) +{ +} + +void kvm_restore_sched_clock_state(void) +{ + kvm_register_clock("primary cpu clock, resume"); +} + #ifdef CONFIG_X86_LOCAL_APIC static void __cpuinit kvm_setup_secondary_clock(void) { @@ -195,6 +204,8 @@ void __init kvmclock_init(void) x86_cpuinit.early_percpu_clock_init = kvm_setup_secondary_clock; #endif + x86_platform.save_sched_clock_state = kvm_save_sched_clock_state; + x86_platform.restore_sched_clock_state = kvm_restore_sched_clock_state; machine_ops.shutdown = kvm_shutdown; #ifdef CONFIG_KEXEC machine_ops.crash_shutdown = kvm_crash_shutdown; diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index a62c201..aed2aa1 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -629,7 +629,7 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu) static unsigned long long cyc2ns_suspend; -void save_sched_clock_state(void) +void tsc_save_sched_clock_state(void) { if (!sched_clock_stable) return; @@ -645,7 +645,7 @@ void save_sched_clock_state(void) * that sched_clock() continues from the point where it was left off during * suspend. */ -void restore_sched_clock_state(void) +void tsc_restore_sched_clock_state(void) { unsigned long long offset; unsigned long flags; diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c index 6f2ec53..e9f265f 100644 --- a/arch/x86/kernel/x86_init.c +++ b/arch/x86/kernel/x86_init.c @@ -108,7 +108,9 @@ struct x86_platform_ops x86_platform = { .is_untracked_pat_range = is_ISA_range, .nmi_init = default_nmi_init, .get_nmi_reason = default_get_nmi_reason, - .i8042_detect = default_i8042_detect + .i8042_detect = default_i8042_detect, + .save_sched_clock_state = tsc_save_sched_clock_state, + .restore_sched_clock_state = tsc_restore_sched_clock_state, }; EXPORT_SYMBOL_GPL(x86_platform); diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index f10c0af..0e76a28 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -114,7 +114,7 @@ static void __save_processor_state(struct saved_context *ctxt) void save_processor_state(void) { __save_processor_state(&saved_context); - save_sched_clock_state(); + x86_platform.save_sched_clock_state(); } #ifdef CONFIG_X86_32 EXPORT_SYMBOL(save_processor_state); @@ -230,8 +230,8 @@ static void __restore_processor_state(struct saved_context *ctxt) /* Needed by apm.c
Re: x86: kvmclock: abstract save/restore sched_clock_state
On (Fri) 10 Feb 2012 [10:33:37], Marcelo Tosatti wrote: > On Fri, Feb 10, 2012 at 10:32:16AM -0200, Marcelo Tosatti wrote: > > On Fri, Feb 10, 2012 at 03:32:11PM +0530, Amit Shah wrote: > > > On (Thu) 09 Feb 2012 [16:13:29], Igor Mammedov wrote: > > > > > > > Stalls are probably caused by uninitialized percpu hv_clock, with > > > > following patch I don't see stalls. Although I might be just lucky. > > > > http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=e2971ac7e1d186af059e088d305496c5cb47d487 > > > > > > Your commit does make things better, I don't see any stalls on the > > > first resume. > > > > > > However, a subsequent s4 causes the stall to re-appear on resume, and > > > this time there are no stall messages; the kernel just sits there > > > spinning on something. I've not found the solution to this one yet (I > > > had a commit similar to Marcelo's in the works, which got me to the > > > previous works-but-stalls behaviour). > > > > I cannot reproduce it here. Suspend/resume are operating normally after > > several iterations. Igor do you see anything similar? > > > > Amit, can you please enable CONFIG_PRINTK_TIME=y and post a full dmesg > > (both during suspend and also the new kernel during resume). > > Also is it reproducible with UP guest? Yes, it is. Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: x86: kvmclock: abstract save/restore sched_clock_state
On (Fri) 10 Feb 2012 [13:43:05], Igor Mammedov wrote: > Another thing is to try smp guest without kvmclock and see if it helps. > It might be just something else. Nope, it's related to kvmclock. Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: x86: kvmclock: abstract save/restore sched_clock_state
On (Fri) 10 Feb 2012 [21:58:47], Igor Mammedov wrote: > BTW Amit, > your config doesn't have CONFIG_KVM_GUEST set, which causes primary cpu clock > to be > uninitialized too in case of SMP kernel. Interesting. I didn't notice that. However, if I enable that option, resume fails for me even the first time. Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: x86: kvmclock: abstract save/restore sched_clock_state
I've to revoke my ack and say NAK to this patch. Patch itself is in right direction but clock restore happens too late. With patch I've used to hunt down overflow for cpu hotplug (maybe it will be better for it to be in kernel, which will help to detect overflow problem without spending a lot of time to debug it?): diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c index 42eb330..0081e10 100644 --- a/arch/x86/kernel/pvclock.c +++ b/arch/x86/kernel/pvclock.c @@ -43,7 +43,10 @@ void pvclock_set_flags(u8 flags) static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow) { - u64 delta = native_read_tsc() - shadow->tsc_timestamp; + u64 tsc = native_read_tsc(); + u64 shadow_timestamp = shadow->tsc_timestamp; + u64 delta = tsc - shadow_timestamp; + BUG_ON (tsc < shadow_timestamp); return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul, shadow->tsc_shift); } I've got following back-trace: #29 0x81004925 in oops_end (flags=70, regs=, signr=11) at arch/x86/kernel/dumpstack.c:246 #30 0x81004a77 in die (str=0x8152ca7f "invalid opcode", regs=0x8800149859e8, err=0) at arch/x86/kernel/dumpstack.c:305 #31 0x81001fed in do_trap (trapnr=6, signr=4, str=0x8152ca7f "invalid opcode", regs=0x8800149859e8, error_code=0, info=0x880014985948) at arch/x86/kernel/traps.c:168 #32 0x810021f7 in do_invalid_op (regs=0x8800149859e8, error_code=0) at arch/x86/kernel/traps.c:209 #33 #34 pvclock_get_nsec_offset (shadow=0x880014985a98) at arch/x86/kernel/pvclock.c:51 #35 pvclock_clocksource_read (src=0x88001f1d1ac0) at arch/x86/kernel/pvclock.c:107 #36 0x8101aa97 in kvm_clock_read () at arch/x86/kernel/kvmclock.c:79 #37 0x810087cc in paravirt_sched_clock () at /home/imammedov/fc15/linux-2.6/arch/x86/include/asm/paravirt.h:230 #38 sched_clock () at arch/x86/kernel/tsc.c:71 #39 0x8104f025 in sched_clock_local (scd=0x88001f1d2600) at kernel/sched/clock.c:147 #40 0x8104f1b7 in sched_clock_cpu (cpu=0) at kernel/sched/clock.c:232 #41 0x8104f1e5 in local_clock () at kernel/sched/clock.c:316 #42 0x810699d2 in lockstat_clock () at kernel/lockdep.c:158 #43 __lock_acquire (lock=0x816131b8, subclass=, trylock=0, read=0, check=2, hardirqs_off=, nest_lock=0x0, ip=18446744071578924098, references=0) at kernel/lockdep.c:3098 #44 0x8106a924 in lock_acquire (lock=0x816131b8, subclass=0, trylock=0, read=0, check=2, nest_lock=0x0, ip=18446744071578924098) at kernel/lockdep.c:3555 #45 0x8136b1c3 in __raw_spin_lock (lock=0x816131a0) at include/linux/spinlock_api_smp.h:143 #46 _raw_spin_lock (lock=0x816131a0) at kernel/spinlock.c:137 #47 0x81013442 in prepare_set () at arch/x86/kernel/cpu/mtrr/generic.c:676 #48 0x8101368c in generic_set_all () at arch/x86/kernel/cpu/mtrr/generic.c:723 #49 0x8101258b in mtrr_bp_restore () at arch/x86/kernel/cpu/mtrr/main.c:738 #50 0x812cabfa in __restore_processor_state (ctxt=) at arch/x86/power/cpu.c:227 #51 restore_processor_state () at arch/x86/power/cpu.c:233 ---Type to continue, or q to quit--- #52 0x8105a1b6 in create_image (platform_mode=0) at kernel/power/hibernate.c:291 #53 hibernation_snapshot (platform_mode=0) at kernel/power/hibernate.c:363 #54 0x8105a825 in hibernate () at kernel/power/hibernate.c:629 #55 0x81058868 in state_store (kobj=, attr=, buf=0x88001366e000 , n=5) at kernel/power/main.c:284 #56 0x811e6cc3 in kobj_attr_store (kobj=, attr=, buf=, count=) at lib/kobject.c:699 #57 0x811372c2 in flush_write_buffer (count=, buffer=, dentry=) at fs/sysfs/file.c:202 #58 sysfs_write_file (file=, buf=0x7f9ce710d000 , count=, ppos=0x880014985f58) at fs/sysfs/file.c:236 #59 0x810ebfbe in vfs_write (file=0x88001cb48000, buf=0x7f9ce710d000 , count=, pos=0x880014985f58) at fs/read_write.c:435 #60 0x810ec175 in sys_write (fd=, buf=0x7f9ce710d000 , count=) at fs/read_write.c:487 that shows an access to clock happens right before x86_platform.restore_sched_clock_state is called. Moving x86_platform.restore_sched_clock_state before mtrr_bp_restore solves issue. It isn't bugged on me for 20 save/restore cycles with this change, without this change it bugs on 2nd-3rd cycle. BTW Amit, your config doesn't have CONFIG_KVM_GUEST set, which causes primary cpu clock to be uninitialized too in case of SMP kernel. Commit ca3f10172eea9b95b moved it from kvmclock_init to kvm_guest_init but forgot to change Kconfig too and kvm_guest_init is called only when KVM_GUEST is selected. This commit created following implicit deps if SMP && KVM_CLOCK then select KVM_GUEST But I don't know if it's possible to express it in Kconfig. That rises a question: Do we need KVM_CLOCK option? Maybe better
Re: x86: kvmclock: abstract save/restore sched_clock_state
On (Fri) 10 Feb 2012 [10:32:16], Marcelo Tosatti wrote: > On Fri, Feb 10, 2012 at 03:32:11PM +0530, Amit Shah wrote: > > On (Thu) 09 Feb 2012 [16:13:29], Igor Mammedov wrote: > > > > > Stalls are probably caused by uninitialized percpu hv_clock, with > > > following patch I don't see stalls. Although I might be just lucky. > > > http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=e2971ac7e1d186af059e088d305496c5cb47d487 > > > > Your commit does make things better, I don't see any stalls on the > > first resume. > > > > However, a subsequent s4 causes the stall to re-appear on resume, and > > this time there are no stall messages; the kernel just sits there > > spinning on something. I've not found the solution to this one yet (I > > had a commit similar to Marcelo's in the works, which got me to the > > previous works-but-stalls behaviour). > > I cannot reproduce it here. Suspend/resume are operating normally after > several iterations. Igor do you see anything similar? > > Amit, can you please enable CONFIG_PRINTK_TIME=y and post a full dmesg > (both during suspend and also the new kernel during resume). In my case, I run a ping to the host (10.0.2.2) while the s4 suspend/resume operations are performed. Complete dmesg, for all 3 invocations of the guest. First one boots, starts ping, starts s4. Second one starts s4 after confirming ping is working fine. Third one just stays there, spinning. $ ./x86_64-softmmu/qemu-system-x86_64 -kernel ~/src/linux/arch/x86/boot/bzImage -append 'root=/dev/vda1 console=tty0 console=ttyS0 no_console_suspend' -drive file=/guests/f14-suspend.qcow2,if=none,id=dr0 -device virtio-blk-pci,drive=dr0 -net nic,model=virtio -net user -serial stdio -enable-kvm -m 512 -cpu host -smp 4 [0.00] Initializing cgroup subsys cpu [0.00] Linux version 3.3.0-rc2+ (a...@amit.redhat.com) (gcc version 4.6.2 20111027 (Red Hat 4.6.2-1) (GCC) ) #295 SMP PREEMPT Fri Feb 10 18:39:48 IST 2012 [0.00] Command line: root=/dev/vda1 console=tty0 console=ttyS0 no_console_suspend [0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: - 0009dc00 (usable) [0.00] BIOS-e820: 0009dc00 - 000a (reserved) [0.00] BIOS-e820: 000f - 0010 (reserved) [0.00] BIOS-e820: 0010 - 1fffd000 (usable) [0.00] BIOS-e820: 1fffd000 - 2000 (reserved) [0.00] BIOS-e820: feffc000 - ff00 (reserved) [0.00] BIOS-e820: fffc - 0001 (reserved) [0.00] NX (Execute Disable) protection: active [0.00] DMI 2.4 present. [0.00] No AGP bridge found [0.00] last_pfn = 0x1fffd max_arch_pfn = 0x4 [0.00] x86 PAT enabled: cpu 0, old 0x70406, new 0x7010600070106 [0.00] init_memory_mapping: -1fffd000 [0.00] RAMDISK: 1fa2e000 - 1fff [0.00] ACPI: RSDP 000fd3f0 00014 (v00 BOCHS ) [0.00] ACPI: RSDT 1fffd660 00034 (v01 BOCHS BXPCRSDT 0001 BXPC 0001) [0.00] ACPI: FACP 1f80 00074 (v01 BOCHS BXPCFACP 0001 BXPC 0001) [0.00] ACPI: DSDT 1fffd9b0 02589 (v01 BXPC BXDSDT 0001 INTL 20100528) [0.00] ACPI: FACS 1f40 00040 [0.00] ACPI: SSDT 1fffd7e0 001C1 (v01 BOCHS BXPCSSDT 0001 BXPC 0001) [0.00] ACPI: APIC 1fffd6e0 0008A (v01 BOCHS BXPCAPIC 0001 BXPC 0001) [0.00] ACPI: HPET 1fffd6a0 00038 (v01 BOCHS BXPCHPET 0001 BXPC 0001) [0.00] kvm-clock: Using msrs 4b564d01 and 4b564d00 [0.00] kvm-clock: cpu 0, msr 0:18509c1, boot clock [0.00] Zone PFN ranges: [0.00] DMA 0x0010 -> 0x1000 [0.00] DMA320x1000 -> 0x0010 [0.00] Normal empty [0.00] Movable zone start PFN for each node [0.00] Early memory PFN ranges [0.00] 0: 0x0010 -> 0x009d [0.00] 0: 0x0100 -> 0x0001fffd [0.00] ACPI: PM-Timer IO Port: 0xb008 [0.00] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) [0.00] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled) [0.00] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled) [0.00] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled) [0.00] ACPI: IOAPIC (id[0x04] address[0xfec0] gsi_base[0]) [0.00] IOAPIC[0]: apic_id 4, version 17, address 0xfec0, GSI 0-23 [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level) [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level) [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level) [0.00] Using ACPI
Re: x86: kvmclock: abstract save/restore sched_clock_state
On 02/10/2012 01:33 PM, Marcelo Tosatti wrote: On Fri, Feb 10, 2012 at 10:32:16AM -0200, Marcelo Tosatti wrote: On Fri, Feb 10, 2012 at 03:32:11PM +0530, Amit Shah wrote: On (Thu) 09 Feb 2012 [16:13:29], Igor Mammedov wrote: Stalls are probably caused by uninitialized percpu hv_clock, with following patch I don't see stalls. Although I might be just lucky. http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=e2971ac7e1d186af059e088d305496c5cb47d487 Your commit does make things better, I don't see any stalls on the first resume. However, a subsequent s4 causes the stall to re-appear on resume, and this time there are no stall messages; the kernel just sits there spinning on something. I've not found the solution to this one yet (I had a commit similar to Marcelo's in the works, which got me to the previous works-but-stalls behaviour). I cannot reproduce it here. Suspend/resume are operating normally after several iterations. Igor do you see anything similar? I wasn't able to reproduce it either but I haven't tried with Amit's config yet. Amit, can you please enable CONFIG_PRINTK_TIME=y and post a full dmesg (both during suspend and also the new kernel during resume). Also is it reproducible with UP guest? Another thing is to try smp guest without kvmclock and see if it helps. It might be just something else. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Thanks, Igor -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: x86: kvmclock: abstract save/restore sched_clock_state
On Fri, Feb 10, 2012 at 10:32:16AM -0200, Marcelo Tosatti wrote: > On Fri, Feb 10, 2012 at 03:32:11PM +0530, Amit Shah wrote: > > On (Thu) 09 Feb 2012 [16:13:29], Igor Mammedov wrote: > > > > > Stalls are probably caused by uninitialized percpu hv_clock, with > > > following patch I don't see stalls. Although I might be just lucky. > > > http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=e2971ac7e1d186af059e088d305496c5cb47d487 > > > > Your commit does make things better, I don't see any stalls on the > > first resume. > > > > However, a subsequent s4 causes the stall to re-appear on resume, and > > this time there are no stall messages; the kernel just sits there > > spinning on something. I've not found the solution to this one yet (I > > had a commit similar to Marcelo's in the works, which got me to the > > previous works-but-stalls behaviour). > > I cannot reproduce it here. Suspend/resume are operating normally after > several iterations. Igor do you see anything similar? > > Amit, can you please enable CONFIG_PRINTK_TIME=y and post a full dmesg > (both during suspend and also the new kernel during resume). Also is it reproducible with UP guest? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: x86: kvmclock: abstract save/restore sched_clock_state
On Fri, Feb 10, 2012 at 03:32:11PM +0530, Amit Shah wrote: > On (Thu) 09 Feb 2012 [16:13:29], Igor Mammedov wrote: > > > Stalls are probably caused by uninitialized percpu hv_clock, with > > following patch I don't see stalls. Although I might be just lucky. > > http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=e2971ac7e1d186af059e088d305496c5cb47d487 > > Your commit does make things better, I don't see any stalls on the > first resume. > > However, a subsequent s4 causes the stall to re-appear on resume, and > this time there are no stall messages; the kernel just sits there > spinning on something. I've not found the solution to this one yet (I > had a commit similar to Marcelo's in the works, which got me to the > previous works-but-stalls behaviour). I cannot reproduce it here. Suspend/resume are operating normally after several iterations. Igor do you see anything similar? Amit, can you please enable CONFIG_PRINTK_TIME=y and post a full dmesg (both during suspend and also the new kernel during resume). Thanks. > > However there is/are a warning/s on suspend path and with following patch: > > I didn't see this. This is unrelated. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: x86: kvmclock: abstract save/restore sched_clock_state
On (Fri) 10 Feb 2012 [05:11:00], Igor Mammedov wrote: > Could you send me your .config and commit id of kernel you are using? Kernel's based on bd3ce7d57c380af110c86d19e256115d0e7053ca plus your commit + Marcelo's patch. config is attached below. # # Automatically generated file; DO NOT EDIT. # Linux/x86_64 3.3.0-rc2 Kernel Configuration # CONFIG_64BIT=y # CONFIG_X86_32 is not set CONFIG_X86_64=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf64-x86-64" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig" CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_MMU=y CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y # CONFIG_RWSEM_GENERIC_SPINLOCK is not set CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_DEFAULT_IDLE=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ZONE_DMA32=y CONFIG_AUDIT_ARCH=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_64_SMP=y CONFIG_X86_HT=y CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" # CONFIG_KTIME_SCALAR is not set CONFIG_ARCH_CPU_PROBE_RELEASE=y CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_HAVE_IRQ_WORK=y CONFIG_IRQ_WORK=y # # General setup # CONFIG_EXPERIMENTAL=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" CONFIG_LOCALVERSION="" # CONFIG_LOCALVERSION_AUTO is not set CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y # CONFIG_KERNEL_GZIP is not set # CONFIG_KERNEL_BZIP2 is not set CONFIG_KERNEL_LZMA=y # CONFIG_KERNEL_XZ is not set # CONFIG_KERNEL_LZO is not set CONFIG_DEFAULT_HOSTNAME="virthost" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_BSD_PROCESS_ACCT=y # CONFIG_BSD_PROCESS_ACCT_V3 is not set # CONFIG_FHANDLE is not set # CONFIG_TASKSTATS is not set CONFIG_AUDIT=y CONFIG_AUDITSYSCALL=y CONFIG_AUDIT_WATCH=y CONFIG_AUDIT_TREE=y # CONFIG_AUDIT_LOGINUID_IMMUTABLE is not set CONFIG_HAVE_GENERIC_HARDIRQS=y # # IRQ subsystem # CONFIG_GENERIC_HARDIRQS=y CONFIG_HAVE_SPARSE_IRQ=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y # # RCU Subsystem # CONFIG_TREE_PREEMPT_RCU=y CONFIG_PREEMPT_RCU=y # CONFIG_RCU_TRACE is not set CONFIG_RCU_FANOUT=64 # CONFIG_RCU_FANOUT_EXACT is not set # CONFIG_RCU_FAST_NO_HZ is not set # CONFIG_TREE_RCU_TRACE is not set # CONFIG_RCU_BOOST is not set # CONFIG_IKCONFIG is not set CONFIG_LOG_BUF_SHIFT=17 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_CGROUPS=y # CONFIG_CGROUP_DEBUG is not set # CONFIG_CGROUP_FREEZER is not set # CONFIG_CGROUP_DEVICE is not set # CONFIG_CPUSETS is not set # CONFIG_CGROUP_CPUACCT is not set # CONFIG_RESOURCE_COUNTERS is not set # CONFIG_CGROUP_PERF is not set CONFIG_CGROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y # CONFIG_CFS_BANDWIDTH is not set # CONFIG_RT_GROUP_SCHED is not set # CONFIG_BLK_CGROUP is not set # CONFIG_CHECKPOINT_RESTORE is not set CONFIG_NAMESPACES=y CONFIG_UTS_NS=y CONFIG_IPC_NS=y CONFIG_USER_NS=y CONFIG_PID_NS=y CONFIG_NET_NS=y CONFIG_SCHED_AUTOGROUP=y # CONFIG_SYSFS_DEPRECATED is not set CONFIG_RELAY=y CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE="" CONFIG_RD_GZIP=y CONFIG_RD_BZIP2=y CONFIG_RD_LZMA=y CONFIG_RD_XZ=y CONFIG_RD_LZO=y CONFIG_CC_OPTIMIZE_FOR_SIZE=y CONFIG_SYSCTL=y CONFIG_ANON_INODES=y # CONFIG_EXPERT is not set # CONFIG_SYSCTL_SYSCALL is not set CONFIG_KALLSYMS=y CONFIG_KALLSYMS_ALL=y CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_PCSPKR_PLATFORM=y CONFIG_HAVE_PCSPKR_PLATFORM=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_TIMERFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_AIO=y # CONFIG_EMBEDDED is not set CONFIG_HAVE_PERF_EVENTS=y # # Kernel Performance Events And Counters # CONFIG_PERF_EVENTS=y # CONFIG_PERF_COUNTERS is not set # CONFIG_DEBUG_PERF_USE_VMALLOC is not set CONFIG_VM_EVENT_COUNTERS=y CONFIG_PCI_QUIRKS=y CONFIG_SLUB_DEBUG=y # CONFIG_COMPAT_BRK is not set # CONFIG_SLAB is not set CONFIG_SLUB=y # CONFIG_PROFILING is not set CONFIG_TRACEPOINTS=y CONFIG_HAVE_OPROFILE=y CONFIG_OPROFILE_NMI_TIMER=y # CONFIG_KPROBES is not set # CONFIG_JUMP_LABEL is not set CONFIG_HAVE_
Re: x86: kvmclock: abstract save/restore sched_clock_state
Could you send me your .config and commit id of kernel you are using? - Original Message - > From: "Amit Shah" > To: "Igor Mammedov" > Cc: "Marcelo Tosatti" , kvm@vger.kernel.org, > t...@linutronix.de, mi...@redhat.com, > h...@zytor.com, x...@kernel.org, johns...@us.ibm.com, r...@redhat.com, > a...@redhat.com > Sent: Friday, February 10, 2012 11:02:11 AM > Subject: Re: x86: kvmclock: abstract save/restore sched_clock_state > > On (Thu) 09 Feb 2012 [16:13:29], Igor Mammedov wrote: > > > Stalls are probably caused by uninitialized percpu hv_clock, with > > following patch I don't see stalls. Although I might be just lucky. > > http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=e2971ac7e1d186af059e088d305496c5cb47d487 > > Your commit does make things better, I don't see any stalls on the > first resume. > > However, a subsequent s4 causes the stall to re-appear on resume, and > this time there are no stall messages; the kernel just sits there > spinning on something. I've not found the solution to this one yet > (I > had a commit similar to Marcelo's in the works, which got me to the > previous works-but-stalls behaviour). > > > However there is/are a warning/s on suspend path and with following > > patch: > > I didn't see this. > > Amit > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: x86: kvmclock: abstract save/restore sched_clock_state
On (Thu) 09 Feb 2012 [16:13:29], Igor Mammedov wrote: > Stalls are probably caused by uninitialized percpu hv_clock, with > following patch I don't see stalls. Although I might be just lucky. > http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=e2971ac7e1d186af059e088d305496c5cb47d487 Your commit does make things better, I don't see any stalls on the first resume. However, a subsequent s4 causes the stall to re-appear on resume, and this time there are no stall messages; the kernel just sits there spinning on something. I've not found the solution to this one yet (I had a commit similar to Marcelo's in the works, which got me to the previous works-but-stalls behaviour). > However there is/are a warning/s on suspend path and with following patch: I didn't see this. Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: x86: kvmclock: abstract save/restore sched_clock_state
On 02/09/2012 01:27 PM, Amit Shah wrote: On (Tue) 07 Feb 2012 [19:05:42], Marcelo Tosatti wrote: Upon resume from hibernation, CPU 0's hvclock area contains the old values for system_time and tsc_timestamp. It is necessary for the hypervisor to update these values with uptodate ones before the CPU uses them. Abstract TSC's save/restore sched_clock_state functions and use restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update. Fixes suspend-to-disk with kvmclock. There are stalls after resume, see trace below. ./x86_64-softmmu/qemu-system-x86_64 -kernel ~/src/linux/arch/x86/boot/bzImage -append 'root=/dev/vda1 console=tty0 console=ttyS0 no_console_suspend' -drive file=/guests/f14-suspend.qcow2,if=none,id=dr0 -device virtio-blk-pci,drive=dr0 -net nic,model=virtio -net user -serial stdio -enable-kvm -m 512 -cpu host -smp 4 Disabling non-boot CPUs ... CPU 1 is now offline CPU 2 is now offline CPU 3 is now offline lockdep: fixing up alternatives. kvm-clock: cpu 0, msr 0:1f1d19c1, primary cpu clock, resume PM: Restoring platform NVS memory Enabling non-boot CPUs ... lockdep: fixing up alternatives. Booting Node 0 Processor 1 APIC 0x1 Calibrating delay loop (skipped) already calibrated this CPU INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=314192 jiffies) Pid: 662, comm: bash Not tainted 3.3.0-rc2+ #293 Call Trace: [] __rcu_pending+0x268/0x3c5 [] rcu_check_callbacks+0xab/0x108 [] update_process_times+0x3f/0x75 [] tick_sched_timer+0x6d/0x8c [] __run_hrtimer+0xc9/0x15c [] ? tick_nohz_handler+0xd5/0xd5 [] hrtimer_interrupt+0xea/0x1b4 [] smp_apic_timer_interrupt+0x76/0x89 [] apic_timer_interrupt+0x73/0x80 [] ? arch_local_irq_restore+0x6/0xd [] native_cpu_up+0x108/0x127 [] _cpu_up+0x92/0xfc [] enable_nonboot_cpus+0x4d/0xb2 [] hibernation_snapshot+0x1f3/0x2a0 [] ? cleanup_srcu_struct+0x52/0x58 [] hibernate+0x97/0x196 [] state_store+0x5c/0x106 [] kobj_attr_store+0x17/0x19 [] sysfs_write_file+0x10e/0x14a [] vfs_write+0xab/0xd2 [] ? fget_light+0x3a/0xa1 [] sys_write+0x4d/0x74 [] system_call_fastpath+0x16/0x1b INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 0, t=314193 jiffies) Pid: 662, comm: bash Not tainted 3.3.0-rc2+ #293 Call Trace: [] __rcu_pending+0x268/0x3c5 [] rcu_check_callbacks+0xe9/0x108 [] update_process_times+0x3f/0x75 [] tick_sched_timer+0x6d/0x8c [] __run_hrtimer+0xc9/0x15c [] ? tick_nohz_handler+0xd5/0xd5 [] hrtimer_interrupt+0xea/0x1b4 [] smp_apic_timer_interrupt+0x76/0x89 [] apic_timer_interrupt+0x73/0x80 [] ? arch_local_irq_restore+0x6/0xd [] native_cpu_up+0x108/0x127 [] _cpu_up+0x92/0xfc [] enable_nonboot_cpus+0x4d/0xb2 [] hibernation_snapshot+0x1f3/0x2a0 [] ? cleanup_srcu_struct+0x52/0x58 [] hibernate+0x97/0x196 [] state_store+0x5c/0x106 [] kobj_attr_store+0x17/0x19 [] sysfs_write_file+0x10e/0x14a [] vfs_write+0xab/0xd2 [] ? fget_light+0x3a/0xa1 [] sys_write+0x4d/0x74 [] system_call_fastpath+0x16/0x1b INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=1713579840232 jiffies) Pid: 662, comm: bash Not tainted 3.3.0-rc2+ #293 Call Trace: [] __rcu_pending+0x268/0x3c5 [] rcu_check_callbacks+0xab/0x108 [] update_process_times+0x3f/0x75 [] tick_sched_timer+0x6d/0x8c [] __run_hrtimer+0xc9/0x15c [] ? tick_nohz_handler+0xd5/0xd5 [] hrtimer_interrupt+0xea/0x1b4 [] smp_apic_timer_interrupt+0x76/0x89 [] apic_timer_interrupt+0x73/0x80 [] ? arch_local_irq_restore+0x6/0xd [] native_cpu_up+0x108/0x127 [] _cpu_up+0x92/0xfc [] enable_nonboot_cpus+0x4d/0xb2 [] hibernation_snapshot+0x1f3/0x2a0 [] ? cleanup_srcu_struct+0x52/0x58 [] hibernate+0x97/0x196 [] state_store+0x5c/0x106 [] kobj_attr_store+0x17/0x19 [] sysfs_write_file+0x10e/0x14a [] vfs_write+0xab/0xd2 [] ? fget_light+0x3a/0xa1 [] sys_write+0x4d/0x74 [] system_call_fastpath+0x16/0x1b INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 0, t=1713579840233 jiffies) Pid: 662, comm: bash Not tainted 3.3.0-rc2+ #293 Call Trace: [] __rcu_pending+0x268/0x3c5 [] rcu_check_callbacks+0xe9/0x108 [] update_process_times+0x3f/0x75 [] tick_sched_timer+0x6d/0x8c [] __run_hrtimer+0xc9/0x15c [] ? tick_nohz_handler+0xd5/0xd5 [] hrtimer_interrupt+0xea/0x1b4 [] smp_apic_timer_interrupt+0x76/0x89 [] apic_timer_interrupt+0x73/0x80 [] ? arch_local_irq_restore+0x6/0xd [] native_cpu_up+0x108/0x127 [] _cpu_up+0x92/0xfc [] enable_nonboot_cpus+0x4d/0xb2 [] hibernation_snapshot+0x1f3/0x2a0 [] ? cleanup_srcu_struct+0x52/0x58 [] hibernate+0x97/0x196 [] state_store+0x5c/0x106 [] kobj_attr_store+0x17/0x19 [] sysfs_write_file+0x10e/0x14a [] vfs_write+0xab/0xd2 [] ? fget_light+0x3a/0xa1 [] sys_write+0x4d/0x74 [] system_call_fastpath+0x16/0x1b Amit Stalls are probably caused by uninitialized percpu hv_clock, with following patch I don't see stalls. Although I might be j
Re: x86: kvmclock: abstract save/restore sched_clock_state
On (Tue) 07 Feb 2012 [19:05:42], Marcelo Tosatti wrote: > > Upon resume from hibernation, CPU 0's hvclock area contains the old > values for system_time and tsc_timestamp. It is necessary for the > hypervisor to update these values with uptodate ones before the CPU uses > them. > > Abstract TSC's save/restore sched_clock_state functions and use > restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update. > > Fixes suspend-to-disk with kvmclock. There are stalls after resume, see trace below. ./x86_64-softmmu/qemu-system-x86_64 -kernel ~/src/linux/arch/x86/boot/bzImage -append 'root=/dev/vda1 console=tty0 console=ttyS0 no_console_suspend' -drive file=/guests/f14-suspend.qcow2,if=none,id=dr0 -device virtio-blk-pci,drive=dr0 -net nic,model=virtio -net user -serial stdio -enable-kvm -m 512 -cpu host -smp 4 Disabling non-boot CPUs ... CPU 1 is now offline CPU 2 is now offline CPU 3 is now offline lockdep: fixing up alternatives. kvm-clock: cpu 0, msr 0:1f1d19c1, primary cpu clock, resume PM: Restoring platform NVS memory Enabling non-boot CPUs ... lockdep: fixing up alternatives. Booting Node 0 Processor 1 APIC 0x1 Calibrating delay loop (skipped) already calibrated this CPU INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=314192 jiffies) Pid: 662, comm: bash Not tainted 3.3.0-rc2+ #293 Call Trace: [] __rcu_pending+0x268/0x3c5 [] rcu_check_callbacks+0xab/0x108 [] update_process_times+0x3f/0x75 [] tick_sched_timer+0x6d/0x8c [] __run_hrtimer+0xc9/0x15c [] ? tick_nohz_handler+0xd5/0xd5 [] hrtimer_interrupt+0xea/0x1b4 [] smp_apic_timer_interrupt+0x76/0x89 [] apic_timer_interrupt+0x73/0x80 [] ? arch_local_irq_restore+0x6/0xd [] native_cpu_up+0x108/0x127 [] _cpu_up+0x92/0xfc [] enable_nonboot_cpus+0x4d/0xb2 [] hibernation_snapshot+0x1f3/0x2a0 [] ? cleanup_srcu_struct+0x52/0x58 [] hibernate+0x97/0x196 [] state_store+0x5c/0x106 [] kobj_attr_store+0x17/0x19 [] sysfs_write_file+0x10e/0x14a [] vfs_write+0xab/0xd2 [] ? fget_light+0x3a/0xa1 [] sys_write+0x4d/0x74 [] system_call_fastpath+0x16/0x1b INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 0, t=314193 jiffies) Pid: 662, comm: bash Not tainted 3.3.0-rc2+ #293 Call Trace: [] __rcu_pending+0x268/0x3c5 [] rcu_check_callbacks+0xe9/0x108 [] update_process_times+0x3f/0x75 [] tick_sched_timer+0x6d/0x8c [] __run_hrtimer+0xc9/0x15c [] ? tick_nohz_handler+0xd5/0xd5 [] hrtimer_interrupt+0xea/0x1b4 [] smp_apic_timer_interrupt+0x76/0x89 [] apic_timer_interrupt+0x73/0x80 [] ? arch_local_irq_restore+0x6/0xd [] native_cpu_up+0x108/0x127 [] _cpu_up+0x92/0xfc [] enable_nonboot_cpus+0x4d/0xb2 [] hibernation_snapshot+0x1f3/0x2a0 [] ? cleanup_srcu_struct+0x52/0x58 [] hibernate+0x97/0x196 [] state_store+0x5c/0x106 [] kobj_attr_store+0x17/0x19 [] sysfs_write_file+0x10e/0x14a [] vfs_write+0xab/0xd2 [] ? fget_light+0x3a/0xa1 [] sys_write+0x4d/0x74 [] system_call_fastpath+0x16/0x1b INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=1713579840232 jiffies) Pid: 662, comm: bash Not tainted 3.3.0-rc2+ #293 Call Trace: [] __rcu_pending+0x268/0x3c5 [] rcu_check_callbacks+0xab/0x108 [] update_process_times+0x3f/0x75 [] tick_sched_timer+0x6d/0x8c [] __run_hrtimer+0xc9/0x15c [] ? tick_nohz_handler+0xd5/0xd5 [] hrtimer_interrupt+0xea/0x1b4 [] smp_apic_timer_interrupt+0x76/0x89 [] apic_timer_interrupt+0x73/0x80 [] ? arch_local_irq_restore+0x6/0xd [] native_cpu_up+0x108/0x127 [] _cpu_up+0x92/0xfc [] enable_nonboot_cpus+0x4d/0xb2 [] hibernation_snapshot+0x1f3/0x2a0 [] ? cleanup_srcu_struct+0x52/0x58 [] hibernate+0x97/0x196 [] state_store+0x5c/0x106 [] kobj_attr_store+0x17/0x19 [] sysfs_write_file+0x10e/0x14a [] vfs_write+0xab/0xd2 [] ? fget_light+0x3a/0xa1 [] sys_write+0x4d/0x74 [] system_call_fastpath+0x16/0x1b INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 0, t=1713579840233 jiffies) Pid: 662, comm: bash Not tainted 3.3.0-rc2+ #293 Call Trace: [] __rcu_pending+0x268/0x3c5 [] rcu_check_callbacks+0xe9/0x108 [] update_process_times+0x3f/0x75 [] tick_sched_timer+0x6d/0x8c [] __run_hrtimer+0xc9/0x15c [] ? tick_nohz_handler+0xd5/0xd5 [] hrtimer_interrupt+0xea/0x1b4 [] smp_apic_timer_interrupt+0x76/0x89 [] apic_timer_interrupt+0x73/0x80 [] ? arch_local_irq_restore+0x6/0xd [] native_cpu_up+0x108/0x127 [] _cpu_up+0x92/0xfc [] enable_nonboot_cpus+0x4d/0xb2 [] hibernation_snapshot+0x1f3/0x2a0 [] ? cleanup_srcu_struct+0x52/0x58 [] hibernate+0x97/0x196 [] state_store+0x5c/0x106 [] kobj_attr_store+0x17/0x19 [] sysfs_write_file+0x10e/0x14a [] vfs_write+0xab/0xd2 [] ? fget_light+0x3a/0xa1 [] sys_write+0x4d/0x74 [] system_call_fastpath+0x16/0x1b Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: x86: kvmclock: abstract save/restore sched_clock_state
On 02/07/2012 10:05 PM, Marcelo Tosatti wrote: Upon resume from hibernation, CPU 0's hvclock area contains the old values for system_time and tsc_timestamp. It is necessary for the hypervisor to update these values with uptodate ones before the CPU uses them. Abstract TSC's save/restore sched_clock_state functions and use restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update. Fixes suspend-to-disk with kvmclock. Signed-off-by: Marcelo Tosatti Acked-by: Igor Mammedov -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html