Re: [PATCH 0/4] Avoid soft lockup message when KVM is stopped by host

2011-09-20 Thread Jamie Lokier
Marcelo Tosatti wrote:
> In case the VM stops for whatever reason, the host system is not
> supposed to adjust time related hardware state to compensate, in an
> attempt to present apparent continuous time.
> 
> If you save a VM and then restore it later, it is the guest
> responsability to adjust its time representation.

If the guest doesn't know it's been stopped, then its time
representation will be wrong until it finds out, e.g. after a few
minutes with NTP, or even a seconds can be too long.

That is sad when it happens because it breaks the coherence of any
timed-lease caching the guest is involved in.  I.e. where the guest
acquires a lock on some data object (like a file in NFS) that it can
efficiently access without network round trips (similar to MESI), with
all nodes having agreed that it's coherent for, say, 5 seconds before
renewing or breaking.  (It's just a way to reduce latency.)

But we can't trust CLOCK_MONOTONIC when a VM is involved, it's just
one of those facts of life.  So instead the effort is to try and
detect when a VM is involved and then distrust the clock.

(Non-VM) suspend/resume is similar, but there's usually a way to
be notified about that as it happens.

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] Avoid soft lockup message when KVM is stopped by host

2011-09-20 Thread Marcelo Tosatti
On Tue, Sep 20, 2011 at 03:00:54PM -0400, Eric B Munson wrote:
> On Thu, 15 Sep 2011, Marcelo Tosatti wrote:
> 
> > On Tue, Sep 13, 2011 at 04:49:55PM -0400, Eric B Munson wrote:
> > > On Fri, 09 Sep 2011, Marcelo Tosatti wrote:
> > > 
> > > > On Thu, Sep 01, 2011 at 02:27:49PM -0600, emun...@mgebm.net wrote:
> > > > > On Thu, 01 Sep 2011 14:24:12 -0500, Anthony Liguori wrote:
> > > > > >On 08/30/2011 07:26 AM, Marcelo Tosatti wrote:
> > > > > >>On Mon, Aug 29, 2011 at 05:27:11PM -0600, Eric B Munson wrote:
> > > > > >>>Currently, when qemu stops a guest kernel that guest will
> > > > > >>>issue a soft lockup
> > > > > >>>message when it resumes.  This set provides the ability for
> > > > > >>>qemu to comminucate
> > > > > >>>to the guest that it has been stopped.  When the guest hits
> > > > > >>>the watchdog on
> > > > > >>>resume it will check if it was suspended before issuing the
> > > > > >>>warning.
> > > > > >>>
> > > > > >>>Eric B Munson (4):
> > > > > >>>   Add flag to indicate that a vm was stopped by the host
> > > > > >>>   Add functions to check if the host has stopped the vm
> > > > > >>>   Add generic stubs for kvm stop check functions
> > > > > >>>   Add check for suspended vm in softlockup detector
> > > > > >>>
> > > > > >>>  arch/x86/include/asm/pvclock-abi.h |1 +
> > > > > >>>  arch/x86/include/asm/pvclock.h |2 ++
> > > > > >>>  arch/x86/kernel/kvmclock.c |   14 ++
> > > > > >>>  include/asm-generic/pvclock.h  |   14 ++
> > > > > >>>  kernel/watchdog.c  |   12 
> > > > > >>>  5 files changed, 43 insertions(+), 0 deletions(-)
> > > > > >>>  create mode 100644 include/asm-generic/pvclock.h
> > > > > >>>
> > > > > >>>--
> > > > > >>>1.7.4.1
> > > > > >>
> > > > > >>How is the host supposed to set this flag?
> > > > > >>
> > > > > >>As mentioned previously, if you save save/restore the offset
> > > > > >>added to
> > > > > >>kvmclock on stop/cont (and the TSC MSR, forgot to mention that), no
> > > > > >>paravirt infrastructure is required. Which means the issue is
> > > > > >>also fixed
> > > > > >>for older guests.
> > > > > >
> > > 
> > > Marcelo,
> > > 
> > > I think that stopping the TSC is the wrong approach because it will break 
> > > time
> > > between the two systems so timething that expects the monotonic clock to 
> > > move
> > > consistently will be wrong.
> > 
> > In case the VM stops for whatever reason, the host system is not
> > supposed to adjust time related hardware state to compensate, in an
> > attempt to present apparent continuous time.
> > 
> > If you save a VM and then restore it later, it is the guest
> > responsability to adjust its time representation.
> > 
> > QEMU exposing continuous TSC and kvmclock state between "stop" and
> > "cont" should not be a reason to introduce new paravirt infrastructure.
> > 
> > >  IMO, messing with the TSC at run time to avoid a watchdog message
> > > is the wrong solution, better to teach the watchdog to ignore this
> > > special case.
> > 
> > OK then, it is not a harmful addition, can you post the QEMU patches to
> > set the "ignore watchdog" bit.
> 
> I am about to start working on the interface to set this bit, but to double
> check I tried your suggestion to save the clock value on stop and restore it 
> on
> continue (see qemu patch below) and I am seeing very odd behavior from it.  On
> random resumes the vm won't come back immediately (it took one 15 mintues to
> respond) 

You should also save/restore the TSC MSR for each VCPU. And it appears
there is a missing

kvm_for_each_vcpu(vcpu)
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu)

in KVM_SET_CLOCK ioctl handling.

> and the wall clock stays behind my host wall clock by the amount of
> time it took to resume.

This is expected, similar to savevm/loadvm.

> ---
>  cpus.c |   14 ++
>  1 files changed, 14 insertions(+), 0 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index 54c188c..4573e23 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -63,6 +63,7 @@
>  #endif /* CONFIG_LINUX */
>  
>  static CPUState *next_cpu;
> +struct kvm_clock_data data;
>  
>  /***/
>  void hw_error(const char *fmt, ...)
> @@ -788,6 +789,7 @@ static int all_vcpus_paused(void)
>  void pause_all_vcpus(void)
>  {
>  CPUState *penv = first_cpu;
> +int ret;
>  
>  while (penv) {
>  penv->stop = 1;
> @@ -803,11 +805,18 @@ void pause_all_vcpus(void)
>  penv = (CPUState *)penv->next_cpu;
>  }
>  }
> +
> +ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
> +if (ret < 0) {
> +fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
> +data.clock = 0;
> +}
>  }
>  
>  void resume_all_vcpus(void)
>  {
>  CPUState *penv = first_cpu;
> +int ret;
>  
>  while (penv) {
>  penv->stop = 0;
> @@ -815,6 +824,11 @@ void resume_all_vcpus(void)
>  qemu_cpu_kick(penv);
>

Re: [PATCH 0/4] Avoid soft lockup message when KVM is stopped by host

2011-09-20 Thread Eric B Munson
On Thu, 15 Sep 2011, Marcelo Tosatti wrote:

> On Tue, Sep 13, 2011 at 04:49:55PM -0400, Eric B Munson wrote:
> > On Fri, 09 Sep 2011, Marcelo Tosatti wrote:
> > 
> > > On Thu, Sep 01, 2011 at 02:27:49PM -0600, emun...@mgebm.net wrote:
> > > > On Thu, 01 Sep 2011 14:24:12 -0500, Anthony Liguori wrote:
> > > > >On 08/30/2011 07:26 AM, Marcelo Tosatti wrote:
> > > > >>On Mon, Aug 29, 2011 at 05:27:11PM -0600, Eric B Munson wrote:
> > > > >>>Currently, when qemu stops a guest kernel that guest will
> > > > >>>issue a soft lockup
> > > > >>>message when it resumes.  This set provides the ability for
> > > > >>>qemu to comminucate
> > > > >>>to the guest that it has been stopped.  When the guest hits
> > > > >>>the watchdog on
> > > > >>>resume it will check if it was suspended before issuing the
> > > > >>>warning.
> > > > >>>
> > > > >>>Eric B Munson (4):
> > > > >>>   Add flag to indicate that a vm was stopped by the host
> > > > >>>   Add functions to check if the host has stopped the vm
> > > > >>>   Add generic stubs for kvm stop check functions
> > > > >>>   Add check for suspended vm in softlockup detector
> > > > >>>
> > > > >>>  arch/x86/include/asm/pvclock-abi.h |1 +
> > > > >>>  arch/x86/include/asm/pvclock.h |2 ++
> > > > >>>  arch/x86/kernel/kvmclock.c |   14 ++
> > > > >>>  include/asm-generic/pvclock.h  |   14 ++
> > > > >>>  kernel/watchdog.c  |   12 
> > > > >>>  5 files changed, 43 insertions(+), 0 deletions(-)
> > > > >>>  create mode 100644 include/asm-generic/pvclock.h
> > > > >>>
> > > > >>>--
> > > > >>>1.7.4.1
> > > > >>
> > > > >>How is the host supposed to set this flag?
> > > > >>
> > > > >>As mentioned previously, if you save save/restore the offset
> > > > >>added to
> > > > >>kvmclock on stop/cont (and the TSC MSR, forgot to mention that), no
> > > > >>paravirt infrastructure is required. Which means the issue is
> > > > >>also fixed
> > > > >>for older guests.
> > > > >
> > 
> > Marcelo,
> > 
> > I think that stopping the TSC is the wrong approach because it will break 
> > time
> > between the two systems so timething that expects the monotonic clock to 
> > move
> > consistently will be wrong.
> 
> In case the VM stops for whatever reason, the host system is not
> supposed to adjust time related hardware state to compensate, in an
> attempt to present apparent continuous time.
> 
> If you save a VM and then restore it later, it is the guest
> responsability to adjust its time representation.
> 
> QEMU exposing continuous TSC and kvmclock state between "stop" and
> "cont" should not be a reason to introduce new paravirt infrastructure.
> 
> >  IMO, messing with the TSC at run time to avoid a watchdog message
> > is the wrong solution, better to teach the watchdog to ignore this
> > special case.
> 
> OK then, it is not a harmful addition, can you post the QEMU patches to
> set the "ignore watchdog" bit.

I am about to start working on the interface to set this bit, but to double
check I tried your suggestion to save the clock value on stop and restore it on
continue (see qemu patch below) and I am seeing very odd behavior from it.  On
random resumes the vm won't come back immediately (it took one 15 mintues to
respond) and the wall clock stays behind my host wall clock by the amount of
time it took to resume.

---
 cpus.c |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/cpus.c b/cpus.c
index 54c188c..4573e23 100644
--- a/cpus.c
+++ b/cpus.c
@@ -63,6 +63,7 @@
 #endif /* CONFIG_LINUX */
 
 static CPUState *next_cpu;
+struct kvm_clock_data data;
 
 /***/
 void hw_error(const char *fmt, ...)
@@ -788,6 +789,7 @@ static int all_vcpus_paused(void)
 void pause_all_vcpus(void)
 {
 CPUState *penv = first_cpu;
+int ret;
 
 while (penv) {
 penv->stop = 1;
@@ -803,11 +805,18 @@ void pause_all_vcpus(void)
 penv = (CPUState *)penv->next_cpu;
 }
 }
+
+ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
+if (ret < 0) {
+fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
+data.clock = 0;
+}
 }
 
 void resume_all_vcpus(void)
 {
 CPUState *penv = first_cpu;
+int ret;
 
 while (penv) {
 penv->stop = 0;
@@ -815,6 +824,11 @@ void resume_all_vcpus(void)
 qemu_cpu_kick(penv);
 penv = (CPUState *)penv->next_cpu;
 }
+if (data.clock != 0) {
+ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
+if (ret < 0)
+fprintf(stderr, "KVM_SET_CLOCK failed: %s\n", strerror(ret));
+}
 }
 
 static void qemu_tcg_init_vcpu(void *_env)
> 


signature.asc
Description: Digital signature


Re: [PATCH] KVM: PPC: E500: Support hugetlbfs

2011-09-20 Thread Scott Wood
On 09/19/2011 06:35 PM, Alexander Graf wrote:
> + /*
> +  * e500 doesn't implement the lowest tsize bit,
> +  * or 1K pages.
> +  */
> + tsize = max(BOOK3E_PAGESZ_4K, tsize & ~1);
> +
> + /* take the smallest page size that satisfies host and
> +guest mapping */

kernel comment style...

I don't personally care much, but this leaves a checkpatch complaint for
the next person to modify the comment.  Such as to make it say "take the
largest page size that...". :-)

> + asm (PPC_CNTLZL "%0,%1" : "=r" (lz) : "r" (psize));
> + tsize = min(21 - lz, tsize);

No need to open-code the cntlz and subtract-from-21:

tsize = min(ilog2(psize) - 10, tsize);

/*
 * e500 doesn't implement the lowest tsize bit,
 * or 1K pages.
 */
tsize = max(BOOK3E_PAGESZ_4K, tsize & ~1);

There's still an open-coded subtraction of 10, but that relates more
straightforwardly to the definition of tsize (and could be factored out
into a size-to-tsize function).

>   }
>  
>   up_read(¤t->mm->mmap_sem);
>   }
>  
>   if (likely(!pfnmap)) {
> + unsigned long tsize_pages = 1 << (tsize - 2);

1 << (tsize + 10 - PAGE_SHIFT);

>   pfn = gfn_to_pfn_memslot(vcpu_e500->vcpu.kvm, slot, gfn);
> + pfn &= ~(tsize_pages - 1);
> + gvaddr &= ~((tsize_pages << PAGE_SHIFT) - 1);
>   if (is_error_pfn(pfn)) {

Won't the masking of pfn affect is_error_pfn()?

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html