Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels
On Fri, Jan 09, 2015 at 02:15:00PM +1100, Cyril Bur wrote:
> 
> > I am not too familiar with it, but the kernel/watchdog.c code has calls to
> > kvm_check_and_clear_guest_paused(), which is probably a good place to
> > start.
> > 
> Ah yes, that. I did initially have a look at what it does when I
> undertook to solve the problem on Power, and I suppose the two solutions
> are similar in that they both just use a virtualised time source. The
> similarities stop there, though: the paravirtualised clock that x86 uses
> provides (as the name of the function implies) a 'was paused' flag.
> Obviously the flag isn't something the VTB register on POWER8 can
> provide, and since we have a VTB, it's preferable to use that.
> 

Perhaps x86 can do something with running_clock?

Marcelo? Drew?

Cheers,
Don

> 
> Regards,
> 
> Cyril
> 
> > Cheers,
> > Don
> > 
> > > > Not sure if that is useful or could be incorporated into the POWER8 code.
> > > > Though to be honest I am curious if the steal_time code could be ported to
> > > > your solution as it seems the watchdog code could remove all the
> > > > steal_time warts.
> > > Happy to help suss out the situation here; again, if you could pass on
> > > what the x86 guys are working on, thanks.
> > > 
> > > Thanks,
> > > 
> > > Cyril
> > > > I have cc'd Marcelo into this discussion as he was the last person I
> > > > remember talking with about this problem.
> > > > 
> > > > Cheers,
> > > > Don

-- 
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels
On Tue, 2015-01-06 at 10:01 -0500, Don Zickus wrote:
> On Tue, Jan 06, 2015 at 10:53:35AM +1100, Cyril Bur wrote:
> > On Mon, 2015-01-05 at 11:50 -0500, Don Zickus wrote:
> > > cc'ing Marcelo
> > > 
> > > On Mon, Dec 22, 2014 at 04:06:02PM +1100, Cyril Bur wrote:
> > > > When the hypervisor pauses a virtualised kernel, the kernel will
> > > > observe a jump in timebase; this can cause spurious messages from
> > > > the softlockup detector.
> > > > 
> > > > Whilst these messages are harmless, they are accompanied by a stack
> > > > trace, which causes undue concern; more problematically, the stack
> > > > trace in the guest has nothing to do with the observed problem and
> > > > can only be misleading.
> > > > 
> > > > Furthermore, on POWER8 this is completely avoidable with the
> > > > introduction of the Virtual Time Base (VTB) register.
> > > 
> > > Hi Cyril,
> > > 
> > > Your solution seems simple and doesn't disturb the softlockup code as much
> > > as the x86 solution does. The only small issue I had was the use of
> > > sched_clock instead of local_clock. I keep forgetting the difference
> > > (unstable clock is the biggest reason, I think).
> > My apologies, it appears I stuffed up there: local_clock was used
> > initially in the softlockup code. I'll send a v2.
> 
> Thanks!
> 
> > > Other than that, I am not the biggest fan of putting multiple virtual
> > > guest solutions for the same problem into the watchdog code. I would
> > > prefer a common solution/framework to leverage.
> > Agreed.
> > 
> > > I have the x86 folks focusing on the steal_time stuff. It started with
> > > KVM and I believe VMware is working on utilizing it too (and maybe Xen).
> > I'm not sure I've ever seen this, could you please point me towards
> > something I can look at?
> 
> I am not too familiar with it, but the kernel/watchdog.c code has calls to
> kvm_check_and_clear_guest_paused(), which is probably a good place to
> start.
> 

Ah yes, that. I did initially have a look at what it does when I
undertook to solve the problem on Power, and I suppose the two solutions
are similar in that they both just use a virtualised time source. The
similarities stop there, though: the paravirtualised clock that x86 uses
provides (as the name of the function implies) a 'was paused' flag.
Obviously the flag isn't something the VTB register on POWER8 can
provide, and since we have a VTB, it's preferable to use that.

Perhaps x86 can do something with running_clock?

Regards,

Cyril

> Cheers,
> Don
> 
> > > Not sure if that is useful or could be incorporated into the POWER8 code.
> > > Though to be honest I am curious if the steal_time code could be ported to
> > > your solution as it seems the watchdog code could remove all the
> > > steal_time warts.
> > Happy to help suss out the situation here; again, if you could pass on
> > what the x86 guys are working on, thanks.
> > 
> > Thanks,
> > 
> > Cyril
> > > I have cc'd Marcelo into this discussion as he was the last person I
> > > remember talking with about this problem.
> > > 
> > > Cheers,
> > > Don
Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels
On Tue, Jan 06, 2015 at 10:53:35AM +1100, Cyril Bur wrote:
> On Mon, 2015-01-05 at 11:50 -0500, Don Zickus wrote:
> > cc'ing Marcelo
> > 
> > On Mon, Dec 22, 2014 at 04:06:02PM +1100, Cyril Bur wrote:
> > > When the hypervisor pauses a virtualised kernel, the kernel will
> > > observe a jump in timebase; this can cause spurious messages from the
> > > softlockup detector.
> > > 
> > > Whilst these messages are harmless, they are accompanied by a stack
> > > trace, which causes undue concern; more problematically, the stack
> > > trace in the guest has nothing to do with the observed problem and can
> > > only be misleading.
> > > 
> > > Furthermore, on POWER8 this is completely avoidable with the
> > > introduction of the Virtual Time Base (VTB) register.
> > 
> > Hi Cyril,
> > 
> > Your solution seems simple and doesn't disturb the softlockup code as much
> > as the x86 solution does. The only small issue I had was the use of
> > sched_clock instead of local_clock. I keep forgetting the difference
> > (unstable clock is the biggest reason, I think).
> My apologies, it appears I stuffed up there: local_clock was used
> initially in the softlockup code. I'll send a v2.

Thanks!

> 
> > Other than that, I am not the biggest fan of putting multiple virtual
> > guest solutions for the same problem into the watchdog code. I would
> > prefer a common solution/framework to leverage.
> Agreed.
> 
> > I have the x86 folks focusing on the steal_time stuff. It started with
> > KVM and I believe VMware is working on utilizing it too (and maybe Xen).
> I'm not sure I've ever seen this, could you please point me towards
> something I can look at?

I am not too familiar with it, but the kernel/watchdog.c code has calls to
kvm_check_and_clear_guest_paused(), which is probably a good place to
start.

Cheers,
Don

> 
> > Not sure if that is useful or could be incorporated into the POWER8 code.
> > Though to be honest I am curious if the steal_time code could be ported to
> > your solution as it seems the watchdog code could remove all the
> > steal_time warts.
> Happy to help suss out the situation here; again, if you could pass on
> what the x86 guys are working on, thanks.
> 
> Thanks,
> 
> Cyril
> > I have cc'd Marcelo into this discussion as he was the last person I
> > remember talking with about this problem.
> > 
> > Cheers,
> > Don
Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels
On Mon, 2015-01-05 at 14:09 -0800, Andrew Morton wrote:
> On Mon, 22 Dec 2014 16:06:02 +1100 Cyril Bur wrote:
> 
> > When the hypervisor pauses a virtualised kernel, the kernel will observe a
> > jump in timebase; this can cause spurious messages from the softlockup
> > detector.
> > 
> > Whilst these messages are harmless, they are accompanied by a stack trace,
> > which causes undue concern; more problematically, the stack trace in the
> > guest has nothing to do with the observed problem and can only be
> > misleading.
> > 
> > Furthermore, on POWER8 this is completely avoidable with the introduction
> > of the Virtual Time Base (VTB) register.
> 
> Does this problem apply to other KVM implementations and to Xen? If
> so, what would implementations of running_clock() for those look like?
> If not, why not?

Yes, the problem should appear on other KVM implementations. I'm not really
sure about Xen, but I don't see why the problem wouldn't crop up there too.

On x86 there is a method for dealing with it in the softlockup detector: a
check was added to the softlockup code that uses a paravirtualised clock
from which the guest can discover whether it has been paused, and Xen could
be using that too. It doesn't appear that s390 does anything.

Thanks,

Cyril
Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels
On Mon, 2015-01-05 at 11:50 -0500, Don Zickus wrote:
> cc'ing Marcelo
> 
> On Mon, Dec 22, 2014 at 04:06:02PM +1100, Cyril Bur wrote:
> > When the hypervisor pauses a virtualised kernel, the kernel will observe a
> > jump in timebase; this can cause spurious messages from the softlockup
> > detector.
> > 
> > Whilst these messages are harmless, they are accompanied by a stack trace,
> > which causes undue concern; more problematically, the stack trace in the
> > guest has nothing to do with the observed problem and can only be
> > misleading.
> > 
> > Furthermore, on POWER8 this is completely avoidable with the introduction
> > of the Virtual Time Base (VTB) register.
> 
> Hi Cyril,
> 
> Your solution seems simple and doesn't disturb the softlockup code as much
> as the x86 solution does. The only small issue I had was the use of
> sched_clock instead of local_clock. I keep forgetting the difference
> (unstable clock is the biggest reason, I think).

My apologies, it appears I stuffed up there: local_clock was used
initially in the softlockup code. I'll send a v2.

> Other than that, I am not the biggest fan of putting multiple virtual
> guest solutions for the same problem into the watchdog code. I would
> prefer a common solution/framework to leverage.

Agreed.

> I have the x86 folks focusing on the steal_time stuff. It started with
> KVM and I believe VMware is working on utilizing it too (and maybe Xen).

I'm not sure I've ever seen this, could you please point me towards
something I can look at?

> Not sure if that is useful or could be incorporated into the POWER8 code.
> Though to be honest I am curious if the steal_time code could be ported to
> your solution as it seems the watchdog code could remove all the
> steal_time warts.

Happy to help suss out the situation here; again, if you could pass on
what the x86 guys are working on, thanks.

Thanks,

Cyril

> I have cc'd Marcelo into this discussion as he was the last person I
> remember talking with about this problem.
> 
> Cheers,
> Don
Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels
On Mon, 22 Dec 2014 16:06:02 +1100 Cyril Bur wrote:

> When the hypervisor pauses a virtualised kernel, the kernel will observe a
> jump in timebase; this can cause spurious messages from the softlockup
> detector.
> 
> Whilst these messages are harmless, they are accompanied by a stack trace,
> which causes undue concern; more problematically, the stack trace in the
> guest has nothing to do with the observed problem and can only be
> misleading.
> 
> Furthermore, on POWER8 this is completely avoidable with the introduction
> of the Virtual Time Base (VTB) register.

Does this problem apply to other KVM implementations and to Xen? If
so, what would implementations of running_clock() for those look like?
If not, why not?
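The failure the cover letter describes can be illustrated with a toy model. The sketch below is a user-space simulation under stated assumptions, not the patch: `struct guest_clocks`, its fields and the helper functions are hypothetical, and the 20 s threshold is an assumed stand-in for the watchdog's configured value. In this model, the guest's ordinary timebase keeps advancing while the host has it paused, whereas a running clock (the role VTB plays on POWER8 here) only advances while the guest executes, so the same lockup check fires spuriously on the former and not on the latter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
/* Assumed soft-lockup threshold for this model only. */
#define SOFTLOCKUP_THRESH_NS (20ULL * NSEC_PER_SEC)

/* Hypothetical pair of guest-visible clocks, both in nanoseconds. */
struct guest_clocks {
	uint64_t timebase_ns; /* keeps counting while the host pauses us */
	uint64_t running_ns;  /* VTB-like: counts only while we execute */
};

/* The guest actually executes for ns nanoseconds. */
void guest_runs(struct guest_clocks *c, uint64_t ns)
{
	c->timebase_ns += ns;
	c->running_ns += ns;
}

/* The hypervisor stops the guest for ns nanoseconds. */
void host_pauses_guest(struct guest_clocks *c, uint64_t ns)
{
	c->timebase_ns += ns;
}

/* Would a soft lockup be reported, given the clock value when the
 * watchdog thread last ran (touch_ns) and the current value (now_ns)? */
bool looks_like_softlockup(uint64_t touch_ns, uint64_t now_ns)
{
	assert(now_ns >= touch_ns); /* both clocks are monotonic here */
	return now_ns - touch_ns > SOFTLOCKUP_THRESH_NS;
}
```

Running the guest for 2 s, pausing it for 60 s and running it for another 2 s leaves the timebase 64 s past the last touch but the running clock only 4 s past it, so only the timebase-based check reports a lockup. Answering Andrew's question for another hypervisor amounts to finding some guest-visible counter that behaves like `running_ns` here.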
Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels
cc'ing Marcelo

On Mon, Dec 22, 2014 at 04:06:02PM +1100, Cyril Bur wrote:
> When the hypervisor pauses a virtualised kernel, the kernel will observe a
> jump in timebase; this can cause spurious messages from the softlockup
> detector.
> 
> Whilst these messages are harmless, they are accompanied by a stack trace,
> which causes undue concern; more problematically, the stack trace in the
> guest has nothing to do with the observed problem and can only be
> misleading.
> 
> Furthermore, on POWER8 this is completely avoidable with the introduction
> of the Virtual Time Base (VTB) register.

Hi Cyril,

Your solution seems simple and doesn't disturb the softlockup code as much
as the x86 solution does. The only small issue I had was the use of
sched_clock instead of local_clock. I keep forgetting the difference
(unstable clock is the biggest reason, I think).

Other than that, I am not the biggest fan of putting multiple virtual
guest solutions for the same problem into the watchdog code. I would
prefer a common solution/framework to leverage.

I have the x86 folks focusing on the steal_time stuff. It started with
KVM and I believe VMware is working on utilizing it too (and maybe Xen).
Not sure if that is useful or could be incorporated into the POWER8 code.
Though to be honest I am curious if the steal_time code could be ported to
your solution as it seems the watchdog code could remove all the
steal_time warts.

I have cc'd Marcelo into this discussion as he was the last person I
remember talking with about this problem.

Cheers,
Don
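Don's steal_time idea can be sketched as a small calculation. This sketch assumes the hypervisor reports the time a vCPU was kept off the CPU as a monotonically increasing steal counter, and that this counter actually covers the interval in question. That second assumption is significant, because a full pause of the guest may not be accounted as steal time at all. Under those assumptions, a watchdog could discount stolen time from the elapsed interval before comparing it with its threshold. `effective_elapsed_ns()` is a hypothetical user-space helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: time the vCPU really had to run between the watchdog's
 * last touch and now, i.e. the clock delta minus the steal-time delta.
 * All values are nanoseconds read from monotonic counters. */
uint64_t effective_elapsed_ns(uint64_t touch_clock_ns, uint64_t now_clock_ns,
			      uint64_t touch_steal_ns, uint64_t now_steal_ns)
{
	assert(now_clock_ns >= touch_clock_ns);
	assert(now_steal_ns >= touch_steal_ns);

	uint64_t elapsed = now_clock_ns - touch_clock_ns;
	uint64_t stolen = now_steal_ns - touch_steal_ns;

	/* Clamp rather than wrap if steal accounting overshoots the clock. */
	return stolen >= elapsed ? 0 : elapsed - stolen;
}
```

With 64 s elapsed on the clock, of which 60 s were reported as stolen, the function returns 4 s. The clamp is the main design choice. Without it, a steal delta larger than the clock delta would make the unsigned subtraction wrap to a huge value, which would itself look like a lockup.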