Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels

2015-01-09 Thread Don Zickus
On Fri, Jan 09, 2015 at 02:15:00PM +1100, Cyril Bur wrote:
> > 
> > I am not too familiar with it, but the kernel/watchdog.c code has calls to
> > kvm_check_and_clear_guest_paused(), which is probably a good place to
> > start.
> > 
> Ah yes, that. I did have a look at what it does when I first set out to
> solve the problem on POWER, and I suppose the two solutions are similar
> in that they both rely on a virtualised time source. The similarities
> stop there, though: the paravirtualised clock that x86 uses provides (as
> the name of the function implies) a 'was paused' flag. The VTB register
> on POWER8 obviously can't provide such a flag, and since we have a VTB,
> it's preferable to use that. Perhaps x86 can do something with
> running_clock?

Marcelo?  Drew?

Cheers,
Don

> 
> Regards,
> 
> Cyril
> 
> > Cheers,
> > Don
> > 
> > > 
> > > > Not sure if that is useful or could be incorporated into the power8 code.
> > > > Though to be honest I am curious if the steal_time code could be ported to
> > > > your solution as it seems the watchdog code could remove all the
> > > > steal_time warts.
> > > Happy to help suss out the situation here, again, if you could pass on
> > > what the x86 guys are working on, thanks.
> > > 
> > > 
> > > Thanks,
> > > 
> > > Cyril
> > > > I have cc'd Marcelo into this discussion as he was the last person I
> > > > remember talking with about this problem.
> > > > 
> > > > Cheers,
> > > > Don
> > > 
> > > 
> 
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels

2015-01-08 Thread Cyril Bur

On Tue, 2015-01-06 at 10:01 -0500, Don Zickus wrote:
> On Tue, Jan 06, 2015 at 10:53:35AM +1100, Cyril Bur wrote:
> > On Mon, 2015-01-05 at 11:50 -0500, Don Zickus wrote:
> > > cc'ing Marcelo
> > > 
> > > On Mon, Dec 22, 2014 at 04:06:02PM +1100, Cyril Bur wrote:
> > > > When the hypervisor pauses a virtualised kernel the kernel will observe a jump
> > > > in timebase, this can cause spurious messages from the softlockup detector.
> > > >
> > > > Whilst these messages are harmless, they are accompanied with a stack trace
> > > > which causes undue concern and more problematically the stack trace in the
> > > > guest has nothing to do with the observed problem and can only be misleading.
> > > >
> > > > Furthermore, on POWER8 this is completely avoidable with the introduction of
> > > > the Virtual Time Base (VTB) register.
> > > 
> > > Hi Cyril,
> > > 
> > > Your solution seems simple and doesn't disturb the softlockup code as much
> > > as the x86 solution does.  The only small issue I had was the use of
> > > sched_clock instead of local_clock.  I keep forgetting the difference
> > > (unstable clock is the biggest reason I think).
> > My apologies there it appears I stuffed up, local_clock was used
> > initially in the softlockup code, I'll send a v2.
> 
> Thanks!
> 
> > 
> > > Other than that, I am not the biggest fan of putting multiple virtual
> > > guest solutions for the same problem into the watchdog code.  I would
> > > prefer a common solution/framework to leverage.
> > Agreed.
> > 
> > > I have the x86 folks focusing on the steal_time stuff.  It started with
> > > KVM and I believe VMWare is working on utilizing it too (and maybe Xen).
> > I'm not sure I've ever seen this, could you please point me towards
> > something I can look at?
> 
> I am not too familiar with it, but the kernel/watchdog.c code has calls to
> kvm_check_and_clear_guest_paused(), which is probably a good place to
> start.
> 
Ah yes, that. I did have a look at what it does when I first set out to
solve the problem on POWER, and I suppose the two solutions are similar
in that they both rely on a virtualised time source. The similarities
stop there, though: the paravirtualised clock that x86 uses provides (as
the name of the function implies) a 'was paused' flag. The VTB register
on POWER8 obviously can't provide such a flag, and since we have a VTB,
it's preferable to use that. Perhaps x86 can do something with
running_clock?

Regards,

Cyril

> Cheers,
> Don
> 
> > 
> > > Not sure if that is useful or could be incorporated into the power8 code.
> > > Though to be honest I am curious if the steal_time code could be ported to
> > > your solution as it seems the watchdog code could remove all the
> > > steal_time warts.
> > Happy to help suss out the situation here, again, if you could pass on
> > what the x86 guys are working on, thanks.
> > 
> > 
> > Thanks,
> > 
> > Cyril
> > > I have cc'd Marcelo into this discussion as he was the last person I
> > > remember talking with about this problem.
> > > 
> > > Cheers,
> > > Don
> > 
> > 





Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels

2015-01-06 Thread Don Zickus
On Tue, Jan 06, 2015 at 10:53:35AM +1100, Cyril Bur wrote:
> On Mon, 2015-01-05 at 11:50 -0500, Don Zickus wrote:
> > cc'ing Marcelo
> > 
> > On Mon, Dec 22, 2014 at 04:06:02PM +1100, Cyril Bur wrote:
> > > When the hypervisor pauses a virtualised kernel the kernel will observe a jump
> > > in timebase, this can cause spurious messages from the softlockup detector.
> > >
> > > Whilst these messages are harmless, they are accompanied with a stack trace
> > > which causes undue concern and more problematically the stack trace in the
> > > guest has nothing to do with the observed problem and can only be misleading.
> > >
> > > Furthermore, on POWER8 this is completely avoidable with the introduction of
> > > the Virtual Time Base (VTB) register.
> > 
> > Hi Cyril,
> > 
> > Your solution seems simple and doesn't disturb the softlockup code as much
> > as the x86 solution does.  The only small issue I had was the use of
> > sched_clock instead of local_clock.  I keep forgetting the difference
> > (unstable clock is the biggest reason I think).
> My apologies there it appears I stuffed up, local_clock was used
> initially in the softlockup code, I'll send a v2.

Thanks!

> 
> > Other than that, I am not the biggest fan of putting multiple virtual
> > guest solutions for the same problem into the watchdog code.  I would
> > prefer a common solution/framework to leverage.
> Agreed.
> 
> > I have the x86 folks focusing on the steal_time stuff.  It started with
> > KVM and I believe VMWare is working on utilizing it too (and maybe Xen).
> I'm not sure I've ever seen this, could you please point me towards
> something I can look at?

I am not too familiar with it, but the kernel/watchdog.c code has calls to
kvm_check_and_clear_guest_paused(), which is probably a good place to
start.

Cheers,
Don

> 
> > Not sure if that is useful or could be incorporated into the power8 code.
> > Though to be honest I am curious if the steal_time code could be ported to
> > your solution as it seems the watchdog code could remove all the
> > steal_time warts.
> Happy to help suss out the situation here, again, if you could pass on
> what the x86 guys are working on, thanks.
> 
> 
> Thanks,
> 
> Cyril
> > I have cc'd Marcelo into this discussion as he was the last person I
> > remember talking with about this problem.
> > 
> > Cheers,
> > Don
> 
> 


Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels

2015-01-05 Thread Cyril Bur
On Mon, 2015-01-05 at 14:09 -0800, Andrew Morton wrote:
> On Mon, 22 Dec 2014 16:06:02 +1100 Cyril Bur  wrote:
> 
> > When the hypervisor pauses a virtualised kernel the kernel will observe a jump
> > in timebase, this can cause spurious messages from the softlockup detector.
> >
> > Whilst these messages are harmless, they are accompanied with a stack trace
> > which causes undue concern and more problematically the stack trace in the
> > guest has nothing to do with the observed problem and can only be misleading.
> >
> > Furthermore, on POWER8 this is completely avoidable with the introduction of
> > the Virtual Time Base (VTB) register.
> 
> Does this problem apply to other KVM implementations and to Xen?  If
> so, what would implementations of running_clock() for those look like? 
> If not, why not?
Yes, the problem should appear on other KVM implementations. I'm not
really sure about Xen, but I don't see why the problem wouldn't crop up
there too.

x86 does have a method for dealing with it in the softlockup detector:
they've added a check that uses a paravirtualised clock, through which
the guest can discover whether it has been paused. Xen could be using it
too.
It doesn't appear that s390 does anything.

Thanks,

Cyril
> 
> 




Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels

2015-01-05 Thread Cyril Bur
On Mon, 2015-01-05 at 11:50 -0500, Don Zickus wrote:
> cc'ing Marcelo
> 
> On Mon, Dec 22, 2014 at 04:06:02PM +1100, Cyril Bur wrote:
> > When the hypervisor pauses a virtualised kernel the kernel will observe a jump
> > in timebase, this can cause spurious messages from the softlockup detector.
> >
> > Whilst these messages are harmless, they are accompanied with a stack trace
> > which causes undue concern and more problematically the stack trace in the
> > guest has nothing to do with the observed problem and can only be misleading.
> >
> > Furthermore, on POWER8 this is completely avoidable with the introduction of
> > the Virtual Time Base (VTB) register.
> 
> Hi Cyril,
> 
> Your solution seems simple and doesn't disturb the softlockup code as much
> as the x86 solution does.  The only small issue I had was the use of
> sched_clock instead of local_clock.  I keep forgetting the difference
> (unstable clock is the biggest reason I think).
My apologies there, it appears I stuffed up: local_clock was used
initially in the softlockup code. I'll send a v2.

> Other than that, I am not the biggest fan of putting multiple virtual
> guest solutions for the same problem into the watchdog code.  I would
> prefer a common solution/framework to leverage.
Agreed.

> I have the x86 folks focusing on the steal_time stuff.  It started with
> KVM and I believe VMWare is working on utilizing it too (and maybe Xen).
I'm not sure I've ever seen this. Could you please point me towards
something I can look at?

> Not sure if that is useful or could be incorporated into the power8 code.
> Though to be honest I am curious if the steal_time code could be ported to
> your solution as it seems the watchdog code could remove all the
> steal_time warts.
Happy to help suss out the situation here, again, if you could pass on
what the x86 guys are working on, thanks.


Thanks,

Cyril
> I have cc'd Marcelo into this discussion as he was the last person I
> remember talking with about this problem.
> 
> Cheers,
> Don




Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels

2015-01-05 Thread Andrew Morton
On Mon, 22 Dec 2014 16:06:02 +1100 Cyril Bur  wrote:

> When the hypervisor pauses a virtualised kernel the kernel will observe a jump
> in timebase, this can cause spurious messages from the softlockup detector.
> 
> Whilst these messages are harmless, they are accompanied with a stack trace
> which causes undue concern and more problematically the stack trace in the
> guest has nothing to do with the observed problem and can only be misleading.
> 
> Furthermore, on POWER8 this is completely avoidable with the introduction of
> the Virtual Time Base (VTB) register.

Does this problem apply to other KVM implementations and to Xen?  If
so, what would implementations of running_clock() for those look like? 
If not, why not?




Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels

2015-01-05 Thread Don Zickus
cc'ing Marcelo

On Mon, Dec 22, 2014 at 04:06:02PM +1100, Cyril Bur wrote:
> When the hypervisor pauses a virtualised kernel the kernel will observe a jump
> in timebase, this can cause spurious messages from the softlockup detector.
> 
> Whilst these messages are harmless, they are accompanied with a stack trace
> which causes undue concern and more problematically the stack trace in the
> guest has nothing to do with the observed problem and can only be misleading.
> 
> Furthermore, on POWER8 this is completely avoidable with the introduction of
> the Virtual Time Base (VTB) register.

Hi Cyril,

Your solution seems simple and doesn't disturb the softlockup code as much
as the x86 solution does.  The only small issue I had was the use of
sched_clock instead of local_clock.  I keep forgetting the difference
(unstable clock is the biggest reason I think).

Other than that, I am not the biggest fan of putting multiple virtual
guest solutions for the same problem into the watchdog code.  I would
prefer a common solution/framework to leverage.

I have the x86 folks focusing on the steal_time stuff.  It started with
KVM and I believe VMWare is working on utilizing it too (and maybe Xen).

Not sure if that is useful or could be incorporated into the power8 code.
Though to be honest I am curious if the steal_time code could be ported to
your solution as it seems the watchdog code could remove all the
steal_time warts.

I have cc'd Marcelo into this discussion as he was the last person I
remember talking with about this problem.

Cheers,
Don