Re: SVM: vmload/vmsave-free VM exits?

2015-04-14 Thread Jan Kiszka
On 2015-04-14 08:39, Valentine Sinitsyn wrote:
> Hi all,
> 
> On 13.04.2015 22:41, Avi Kivity wrote:
>> On 04/13/2015 08:35 PM, Jan Kiszka wrote:
>>> On 2015-04-13 19:29, Avi Kivity wrote:
 On 04/13/2015 10:01 AM, Jan Kiszka wrote:
> On 2015-04-07 07:43, Jan Kiszka wrote:
>> On 2015-04-05 19:12, Valentine Sinitsyn wrote:
>>> Hi Jan,
>>>
>>> On 05.04.2015 13:31, Jan Kiszka wrote:
 studying the VM exit logic of Jailhouse, I was wondering when AMD's
 vmload/vmsave can be avoided. Jailhouse as well as KVM currently
 use
 these instructions unconditionally. However, I think both only need
 GS.base, i.e. the per-cpu base address, to be saved and restored
 if no
 user space exit or no CPU migration is involved (both is always
 true for
 Jailhouse). Xen avoids vmload/vmsave on lightweight exits but it
 also
 still uses rsp-based per-cpu variables.

 So the question boils down to what is generally faster:

 A) vmload
   vmrun
   vmsave

 B) wrmsrl(MSR_GS_BASE, guest_gs_base)
   vmrun
   rdmsrl(MSR_GS_BASE, guest_gs_base)

 Of course, KVM also has to take into account that heavyweight exits
 still require vmload/vmsave, thus become more expensive with B)
 due to
 the additional MSR accesses.

 Any thoughts or results of previous experiments?
>>> That's a good question, I also thought about it when I was
>>> finalizing
>>> Jailhouse AMD port. I tried "lightweight exits" with apic-demo
>>> but it
>>> didn't seem to affect the latency in any noticeable way. That's
>>> why I
>>> decided not to push the patch (in fact, I was even unable to find it
>>> now).
>>>
>>> Note however that how AMD chips store host state during VM
>>> switches are
>>> implementation-specific. I did my quick experiments on one CPU
>>> only, so
>>> your mileage may vary.
>>>
>>> Regarding your question, I feel B will be faster anyways but again
>>> I'm
>>> afraid that the gain could be within statistical error of the
>>> experiment.
>> It is, at least 160 cycles with hot caches on an AMD A6-5200 APU,
>> more
>> towards 600 if they are colder (added some usleep to each loop in the
>> test).
>>
>> I've tested via vmmcall from guest userspace under Jailhouse. KVM
>> should
>> be adjustable in a similar way. Attached the benchmark, patch will
>> be in
>> the Jailhouse next branch soon. We need to check more CPU types,
>> though.
> Avi, I found some preparatory patches of yours from 2010 [1]. Do you
> happen to remember if it was never completed for a technical reason?
 IIRC, I came to the conclusion that it was impossible.  Something about
 TR.size not receiving a reasonable value.  Let me see.
>>> To my understanding, TR doesn't play a role until we leave ring 0 again.
>>> Or what could make the CPU look for any of the fields in the 64-bit TSS
>>> before that?
>>
>> Exceptions that utilize the IST.  I found a writeup [17] that describes
>> this, but I think it's even more impossible than that writeup implies.
> Pardon my slowness, but how does it affect Jailhouse running on AMD? For
> NMI, we do #VMEXIT, but we can disable IST (I'm not sure it's enabled
> already, in fact). Double faults don't cause #VMEXIT, so there is no
> VMLOAD/VMSAVE issue. I'm not sure about MCE, but for now they are sort
> of flawed in Jailhouse anyways IIRC.
> 
> What am I missing here?

Nothing. As I said in the other branch of this thread, Jailhouse is not
affected as it doesn't use the IST. Only KVM is because Linux - in host
mode - requires it for the cases Avi mentioned.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SVM: vmload/vmsave-free VM exits?

2015-04-14 Thread Valentine Sinitsyn

On 14.04.2015 12:02, Jan Kiszka wrote:

What am I missing here?


Nothing. As I said in the other branch of this thread, Jailhouse is not
affected as it doesn't use the IST. Only KVM is because Linux - in host
mode - requires it for the cases Avi mentioned.
Then what I am missing is correct discussion branch. :) Thanks for 
explaining.


Valentine
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked

2015-04-14 Thread Wu, Feng


> -Original Message-
> From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
> Sent: Tuesday, March 31, 2015 7:56 AM
> To: Wu, Feng
> Cc: h...@zytor.com; t...@linutronix.de; mi...@redhat.com; x...@kernel.org;
> g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org;
> j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com;
> eric.au...@linaro.org; linux-ker...@vger.kernel.org;
> io...@lists.linux-foundation.org; kvm@vger.kernel.org
> Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> is blocked
> 
> On Mon, Mar 30, 2015 at 04:46:55AM +, Wu, Feng wrote:
> >
> >
> > > -Original Message-
> > > From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
> > > Sent: Saturday, March 28, 2015 3:30 AM
> > > To: Wu, Feng
> > > Cc: h...@zytor.com; t...@linutronix.de; mi...@redhat.com;
> x...@kernel.org;
> > > g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org;
> > > j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com;
> > > eric.au...@linaro.org; linux-ker...@vger.kernel.org;
> > > io...@lists.linux-foundation.org; kvm@vger.kernel.org
> > > Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when
> vCPU
> > > is blocked
> > >
> > > On Fri, Mar 27, 2015 at 06:34:14AM +, Wu, Feng wrote:
> > > > > > Currently, the following code is executed before 
> > > > > > local_irq_disable() is
> > > called,
> > > > > > so do you mean 1)moving local_irq_disable() to the place before it. 
> > > > > > 2)
> after
> > > > > interrupt
> > > > > > is disabled, set KVM_REQ_EVENT in case the ON bit is set?
> > > > >
> > > > > 2) after interrupt is disabled, set KVM_REQ_EVENT in case the ON bit
> > > > > is set.
> > > >
> > > > Here is my understanding about your comments here:
> > > > - Disable interrupts
> > > > - Check 'ON'
> > > > - Set KVM_REQ_EVENT if 'ON' is set
> > > >
> > > > Then we can put the above code inside " if
> > > (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) "
> > > > just like it used to be. However, I still have some questions about this
> > > comment:
> > > >
> > > > 1. Where should I set KVM_REQ_EVENT? In function vcpu_enter_guest(),
> or
> > > other places?
> > >
> > > See below:
> > >
> > > > If in vcpu_enter_guest(), since currently local_irq_disable() is called 
> > > > after
> > > 'KVM_REQ_EVENT'
> > > > is checked, is it helpful to set KVM_REQ_EVENT after 
> > > > local_irq_disable() is
> > > called?
> > >
> > > local_irq_disable();
> > >
> > >   *** add code here ***
> >
> > So we need add code like the following here, right?
> >
> >   if ('ON' is set)
> >   kvm_make_request(KVM_REQ_EVENT, vcpu);
> 

Hi Marcelo,

I changed the code as above, then I found that the ping latency was extremely 
big, (70ms - 400ms).
I digged into it and got the root cause. We cannot use "checking-on" as the 
judgment, since 'ON'
can be cleared by hypervisor software in lots of places. In this case, 
KVM_REQ_EVENT cannot be
set when we check 'ON' bit, hence the interrupts are not injected to the guest 
in time.

Please refer to the following code, in which 'ON' bit can be cleared:

apic_find_highest_irr () --> vmx_sync_pir_to_irr () --> pi_test_and_clear_on()

Searching from the code step by step, apic_find_highest_irr() can be called by 
many other guys.

Thanks,
Feng

> Yes.
> 
> > > if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> > >   ^^
> 
> Point *1.
> 
> > > || need_resched() || signal_pending(current)) {
> > > vcpu->mode = OUTSIDE_GUEST_MODE;
> > > smp_wmb();
> > > local_irq_enable();
> > > preempt_enable();
> > > vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
> > > r = 1;
> > > goto cancel_injection;
> > > }
> > >
> > > > 2. 'ON' is set by VT-d hardware, it can be set even when interrupt is
> disabled
> > > (the related bit in PIR is also set).
> > >
> > > Yes, we are checking if the HW has set an interrupt in PIR while
> > > outside VM (which requires PIR->VIRR transfer by software).
> > >
> > > If the interrupt it set by hardware after local_irq_disable(),
> > > VMX-entry will handle the interrupt and perform the PIR->VIRR
> > > transfer and reevaluate interrupts, injecting to guest
> > > if necessary, is that correct ?
> > >
> > > > So does it make sense to check 'ON' and set KVM_REQ_EVENT accordingly
> > > after interrupt is disabled?
> > >
> > > To replace the costly
> > >
> > > +*/
> > > +   if (kvm_x86_ops->hwapic_irr_update)
> > > +   kvm_x86_ops->hwapic_irr_update(vcpu,
> > > +   kvm_lapic_find_highest_irr(vcpu));
> > >
> > > Yes, i think so.
> >
> > After adding the "checking ON and setting KVM_REQ_EVENT" operations
> listed in my
> > comments above, do you mean we still need to keep the costly code above
> > inside "if

Re: [PATCH v2 04/10] KVM: arm: guest debug, add stub KVM_SET_GUEST_DEBUG ioctl

2015-04-14 Thread Alex Bennée

David Hildenbrand  writes:

>> On Tue, Mar 31, 2015 at 04:08:02PM +0100, Alex Bennée wrote:
>> > This commit adds a stub function to support the KVM_SET_GUEST_DEBUG
>> > ioctl. Currently any operation flag will return EINVAL. Actual
>> > functionality will be added with further patches.
>> > 
>> > Signed-off-by: Alex Bennée .
>> > 
>> > ---
>> > v2
>> >   - simplified form of the ioctl (stuff will go into setup_debug)
>> > 
>> > diff --git a/Documentation/virtual/kvm/api.txt 
>> > b/Documentation/virtual/kvm/api.txt
>> > index b112efc..06c5064 100644
>> > --- a/Documentation/virtual/kvm/api.txt
>> > +++ b/Documentation/virtual/kvm/api.txt
>> > @@ -2604,7 +2604,7 @@ handled.
>> >  4.87 KVM_SET_GUEST_DEBUG
>> >  
>> >  Capability: KVM_CAP_SET_GUEST_DEBUG
>> > -Architectures: x86, s390, ppc
>> > +Architectures: x86, s390, ppc, arm64
>> >  Type: vcpu ioctl
>> >  Parameters: struct kvm_guest_debug (in)
>> >  Returns: 0 on success; -1 on error
>> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> > index 5560f74..445933d 100644
>> > --- a/arch/arm/kvm/arm.c
>> > +++ b/arch/arm/kvm/arm.c
>> > @@ -183,6 +183,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
>> > ext)
>> >case KVM_CAP_ARM_PSCI:
>> >case KVM_CAP_ARM_PSCI_0_2:
>> >case KVM_CAP_READONLY_MEM:
>> > +  case KVM_CAP_SET_GUEST_DEBUG:
>> >r = 1;
>> >break;
>> 
>> shouldn't you wait with advertising this capability until you've
>> implemented support for it?
>> 
>
> I think this would work for now, however it's not very practical
> - in the end one has to sense which debug flags are actually supported.
>
> Question is if he wants to add initial support and extend functionality and
> flags with each patch or enable the whole set of features in one shot at the
> end.

This is what in effect I was doing. Testing each feature in turn and
ensuring it failed gracefully when kernel features where not present
(both missing the capability or returning EINVAL).

> Doing the latter seems more practicable to me (especially as the debug 
> features
> are added in the same patch series).

Well in practice the whole series will go in together so this is only
really relevant to what happens when you bisect the series.

>
> David

-- 
Alex Bennée
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 02/10] KVM: define common __KVM_GUESTDBG_USE_SW/HW_BP values

2015-04-14 Thread Christoffer Dall
On Mon, Apr 13, 2015 at 03:51:33PM +0100, Alex Bennée wrote:
> 
> Christoffer Dall  writes:
> 
> > On Tue, Mar 31, 2015 at 04:08:00PM +0100, Alex Bennée wrote:
> >> Currently x86, powerpc and soon arm64 use the same two architecture
> >> specific bits for guest debug support for software and hardware
> >> breakpoints. This makes the shared values explicit while leaving the
> >> gate open for another architecture to use some other value if they
> >> really really want to.
> >> 
> >> Signed-off-by: Alex Bennée 
> >> 
> >> diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
> >> b/arch/powerpc/include/uapi/asm/kvm.h
> >> index ab4d473..1731569 100644
> >> --- a/arch/powerpc/include/uapi/asm/kvm.h
> >> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> >> @@ -310,8 +310,8 @@ struct kvm_guest_debug_arch {
> >>   * and upper 16 bits are architecture specific. Architecture specific 
> >> defines
> >>   * that ioctl is for setting hardware breakpoint or software breakpoint.
> >>   */
> >> -#define KVM_GUESTDBG_USE_SW_BP0x0001
> >> -#define KVM_GUESTDBG_USE_HW_BP0x0002
> >> +#define KVM_GUESTDBG_USE_SW_BP__KVM_GUESTDBG_USE_SW_BP
> >> +#define KVM_GUESTDBG_USE_HW_BP__KVM_GUESTDBG_USE_HW_BP
> >>  
> >>  /* definition of registers in kvm_run */
> >>  struct kvm_sync_regs {
> >> diff --git a/arch/x86/include/uapi/asm/kvm.h 
> >> b/arch/x86/include/uapi/asm/kvm.h
> >> index d7dcef5..1438202 100644
> >> --- a/arch/x86/include/uapi/asm/kvm.h
> >> +++ b/arch/x86/include/uapi/asm/kvm.h
> >> @@ -250,8 +250,8 @@ struct kvm_debug_exit_arch {
> >>__u64 dr7;
> >>  };
> >>  
> >> -#define KVM_GUESTDBG_USE_SW_BP0x0001
> >> -#define KVM_GUESTDBG_USE_HW_BP0x0002
> >> +#define KVM_GUESTDBG_USE_SW_BP__KVM_GUESTDBG_USE_SW_BP
> >> +#define KVM_GUESTDBG_USE_HW_BP__KVM_GUESTDBG_USE_HW_BP
> >>  #define KVM_GUESTDBG_INJECT_DB0x0004
> >>  #define KVM_GUESTDBG_INJECT_BP0x0008
> >>  
> >> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >> index 5eedf84..ce2db14 100644
> >> --- a/include/uapi/linux/kvm.h
> >> +++ b/include/uapi/linux/kvm.h
> >> @@ -525,8 +525,16 @@ struct kvm_s390_irq {
> >>  
> >>  /* for KVM_SET_GUEST_DEBUG */
> >>  
> >> -#define KVM_GUESTDBG_ENABLE   0x0001
> >> -#define KVM_GUESTDBG_SINGLESTEP   0x0002
> >> +#define KVM_GUESTDBG_ENABLE   (1 << 0)
> >> +#define KVM_GUESTDBG_SINGLESTEP   (1 << 1)
> >> +
> >> +/*
> >> + * Architecture specific stuff uses the top 16 bits of the field,
> >
> > can you be more specific than 'stuff' here?  features?
> >
> >> + * however there is some shared commonality for the common cases
> >
> > I don't like this sentence; shared commonality is a pleonasm and the use
> > of however makes it sounds like there's some caveat here.
> 
> OK I can see that - after I looked it up ;-)
> 
> > If the top 16 bits are indeed arhictecture specific, then I think they
> > should just be defined in their architecture specific headers.  Unless
> > the idea here is that there's a fixed set of of flags that architectures
> > can choose to support, in which case it should simply be defined in the
> > common header.
> 
> Well an architecture might not support some features and want to use
> those bits for something else? I didn't want to force the bottom two
> of the architecture specific bits to wasted if the features don't exist.
> 
In that case I think the definition is local to each architecture and
should indeed just be duplicated.  The __ definitions complicate more
than they help as they are exported to userspace etc.  The KVM
maintainers may have a different view on this though.

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 06/10] KVM: arm64: guest debug, add SW break point support

2015-04-14 Thread Christoffer Dall
On Tue, Mar 31, 2015 at 04:08:04PM +0100, Alex Bennée wrote:
> This adds support for SW breakpoints inserted by userspace.
> 
> We do this by trapping all BKPT exceptions in the
> hypervisor (MDCR_EL2_TDE).

you mean trapping all exceptions in the guest to the hypervisor?

> The kvm_debug_exit_arch carries the address
> of the exception.

why?  can userspace not simply read out the PC using GET_ONE_REG?

> If user-space doesn't know of the breakpoint then we
> have a guest inserted breakpoint and the hypervisor needs to start again
> and deliver the exception to guest.
> 
> Signed-off-by: Alex Bennée 
> 
> ---
> v2
>   - update to use new exit struct
>   - tweak for C setup
>   - do our setup in debug_setup/clear code
>   - fixed up comments
> 
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index 06c5064..17d4f9c 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2626,7 +2626,7 @@ when running. Common control bits are:
>  The top 16 bits of the control field are architecture specific control
>  flags which can include the following:
>  
> -  - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86]
> +  - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64]
>- KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390]
>- KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86]
>- KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86]
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 7ea8b0e..d3bc8dc 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -304,7 +304,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>   kvm_arm_set_running_vcpu(NULL);
>  }
>  
> -#define KVM_GUESTDBG_VALID (KVM_GUESTDBG_ENABLE)
> +#define KVM_GUESTDBG_VALID (KVM_GUESTDBG_ENABLE|KVM_GUESTDBG_USE_SW_BP)

nit: spaces around the operator

>  
>  int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
>   struct kvm_guest_debug *dbg)
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index 8a29d0b..cff0475 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -45,11 +45,18 @@ void kvm_arch_setup_debug(struct kvm_vcpu *vcpu)
>   vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM | MDCR_EL2_TPMCR);
>   vcpu->arch.mdcr_el2 |= (MDCR_EL2_TDRA | MDCR_EL2_TDOSA);
>  
> + /* Trap debug register access? */

other patch

>   if (!vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
>   vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
>   else
>   vcpu->arch.mdcr_el2 &= ~MDCR_EL2_TDA;
>  
> + /* Trap breakpoints? */
> + if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
> + vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
> + else
> + vcpu->arch.mdcr_el2 &= ~MDCR_EL2_TDE;

so now you're trapping all debug exceptions, right?

what happens if the guest is using the hardware to debug debug stuff and
generates other kinds of debug exceptions, like a hardware breakpoint,
will we not see an unhandled exception and the guest being forcefully
killed?

> +
>  }
>  
>  void kvm_arch_clear_debug(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 524fa25..ed1bbb4 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -82,6 +82,37 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct 
> kvm_run *run)
>   return 1;
>  }
>  
> +/**
> + * kvm_handle_debug_exception - handle a debug exception instruction

handle a software breadkpoint exception

> + *
> + * @vcpu:the vcpu pointer
> + * @run: access to the kvm_run structure for results
> + *
> + * We route all debug exceptions through the same handler as we

all debug exceptions?  software breakpoints and all?  then why the above
shot text?

> + * just need to report the PC and the HSR values to userspace.
> + * Userspace may decide to re-inject the exception and deliver it to
> + * the guest if it wasn't for the host to deal with.

now I'm confused - does userspace setup the guest to receive an
exception or does it tell KVM to emulate an exception for the guest or
do we execute the breakpoint without trapping the debug exception?

> + */
> +static int kvm_handle_guest_debug(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +{
> + u32 hsr = kvm_vcpu_get_hsr(vcpu);
> +
> + run->exit_reason = KVM_EXIT_DEBUG;
> + run->debug.arch.hsr = hsr;
> +
> + switch (hsr >> ESR_ELx_EC_SHIFT) {
> + case ESR_ELx_EC_BKPT32:
> + case ESR_ELx_EC_BRK64:
> + run->debug.arch.pc = *vcpu_pc(vcpu);
> + break;
> + default:
> + kvm_err("%s: un-handled case hsr: %#08x\n",
> + __func__, (unsigned int) hsr);

this should never happen right?

> + break;
> + }
> + return 0;
> +}
> +
>  static exit_handle_fn arm_exit_handlers[] = {
>   [ESR_ELx_EC_WFx]= kvm_han

Re: [PATCH v2 07/10] KVM: arm64: guest debug, add support for single-step

2015-04-14 Thread Christoffer Dall
Hi Alex,

On Tue, Mar 31, 2015 at 04:08:05PM +0100, Alex Bennée wrote:
> This adds support for single-stepping the guest. As userspace can and
> will manipulate guest registers before restarting any tweaking of the
> registers has to occur just before control is passed back to the guest.

this sentence is hard to read.  Do you mean:

(a) As userspace can and will manipulate guest register, we must ensure
that any tweaking of the registers before restarting the guest happens
immediately before...

or

(b) As userspace manipulates guest registers before restarting the
guest, we must ensure that any tweaking of the register happens
immediately before...

> Furthermore while guest debugging is in effect we need to squash the

Furthermore, while guest debugging is in effect,
(commas)

> ability of the guest to single-step itself as we have no easy way of
> re-entering the guest after the exception has been delivered to the
> hypervisor.

I'm not sure I understand this last part of the sentence.  Is the point
that if we trap on a guest single-step exception we cannot easily inject
such an exception back into the guest and therefore we trap the guest if
it tries to set itself up for single-stepping?

What is our recourse then?  To just ignore the single-step setting of
the guest and execute it as normal (while single-stepping the guest from
the outside)?

> 
> Signed-off-by: Alex Bennée 
> 
> ---
> v2
>   - Move pstate/mdscr manipulation into C
>   - don't export guest_debug to assembly
>   - add accessor for saved_debug regs
>   - tweak save/restore of mdscr_el1
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index d3bc8dc..c1ed8cb 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -304,7 +304,21 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>   kvm_arm_set_running_vcpu(NULL);
>  }
>  
> -#define KVM_GUESTDBG_VALID (KVM_GUESTDBG_ENABLE|KVM_GUESTDBG_USE_SW_BP)
> +#define KVM_GUESTDBG_VALID (KVM_GUESTDBG_ENABLE |\
> + KVM_GUESTDBG_USE_SW_BP | \
> + KVM_GUESTDBG_SINGLESTEP)
> +
> +/**
> + * kvm_arch_vcpu_ioctl_set_guest_debug - Setup guest debugging
> + * @kvm: pointer to the KVM struct
> + * @kvm_guest_debug: the ioctl data buffer
> + *
> + * This sets up the VM for guest debugging. Care has to be taken when
> + * manipulating guest registers as these will be set/cleared by the
> + * hyper-visor controller, typically before each kvm_run event. As a

which guest registers are set/cleared by userspace exactly?

s/by the hyper-visor controller/by userspace/

> + * result modification of the guest registers needs to take place

As a result, (comma)

s/needs to/must/

> + * after they have been restored in the hyp.S trampoline code.

trampoline code?  The trampoline code we are referring to is in
hyp-init.S.  Do you mean in EL2?  Then just sya in hyp.S or say in EL2
or in hyp mode.

> + */
>  
>  int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
>   struct kvm_guest_debug *dbg)
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 0631840..6a33647 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -121,6 +121,13 @@ struct kvm_vcpu_arch {
>* here.
>*/
>  
> + /* Registers pre any guest debug manipulations */

I couldn't find 'pre' as an independent word in any English
dictionaries.  I'm also not entirely sure what you mean?  Who modifies
this when, and why do we need to store this?

> + struct {
> + u32 pstate_ss_bit;
> + u32 mdscr_el1_bits;
> +
> + } debug_saved_regs;

If I understood this state correctly (see below), then guest_debug_state
is probably a better name for this struct.

> +
>   /* Don't run the guest */
>   bool pause;
>  
> @@ -143,6 +150,7 @@ struct kvm_vcpu_arch {
>  
>  #define vcpu_gp_regs(v)  (&(v)->arch.ctxt.gp_regs)
>  #define vcpu_sys_reg(v,r)((v)->arch.ctxt.sys_regs[(r)])
> +#define vcpu_debug_saved_reg(v, r) ((v)->arch.debug_saved_regs.r)

hmm, not sure this is warranted if the 'saved_regs' is not the current
state of the VM, which is sort of what the vcpu_gp_regs() and friends
hint at.  Maybe I'm just not getting exactly what piece of state it is.

>  /*
>   * CP14 and CP15 live in the same array, as they are backed by the
>   * same system registers.
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index cff0475..b32362c 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -19,8 +19,16 @@
>  
>  #include 
>  
> +#include 
> +#include 
>  #include 
>  #include 
> +#include 
> +
> +/* These are the bits of MDSCR_EL1 we may mess with */

we may mess with?  Can you be more specific?

> +#define MDSCR_EL1_DEBUG_BITS (DBG_MDSCR_SS | \
> + DBG_MDSCR_KDE | \
> + DBG_MDSCR_MDE)
>  
>  /**
>   * kvm_arch_se

Re: [PATCH v2 08/10] KVM: arm64: guest debug, HW assisted debug support

2015-04-14 Thread Christoffer Dall
On Tue, Mar 31, 2015 at 04:08:06PM +0100, Alex Bennée wrote:
> This adds support for userspace to control the HW debug registers for
> guest debug. We'll only copy the $ARCH defined number across as that is
> all that hyp.S will use anyway. 

I don't really understand what this sentence means?

> I've moved some helper functions into
> the hw_breakpoint.h header for re-use.
> 
> As with single step we need to tweak the guest registers to enable the
> exceptions so we need to save and restore those bits.
> 
> Two new capabilities have been added to the KVM_EXTENSION ioctl to allow
> userspace to query the number of hardware break and watch points
> available on the host hardware.
> 
> As QEMU tests for watchpoints based on the address and not the PC we
> also need to export the value of far_el2 to userspace.
> 
> Signed-off-by: Alex Bennée 
> 
> ---
> v2
>- switched to C setup
>- replace host debug registers directly into context
>- minor tweak to api docs
>- setup right register for debug
>- add FAR_EL2 to debug exit structure
>- add support fro trapping debug register access
> 
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index 17d4f9c..ac34093 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2627,7 +2627,7 @@ The top 16 bits of the control field are architecture 
> specific control
>  flags which can include the following:
>  
>- KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64]
> -  - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390]
> +  - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390, arm64]
>- KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86]
>- KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86]
>- KVM_GUESTDBG_EXIT_PENDING:  trigger an immediate guest exit [s390]
> @@ -2642,6 +2642,10 @@ updated to the correct (supplied) values.
>  The second part of the structure is architecture specific and
>  typically contains a set of debug registers.
>  
> +For arm64 the number of debug registers is implementation defined and
> +can be determined by querying the KVM_CAP_GUEST_DEBUG_HW_BPS and
> +KVM_CAP_GUEST_DEBUG_HW_WPS capabilities.
> +

can you document their behavior more specifically?  I assume they both
return 0 if HW assisted debugging is not supported and return the number
of implemented hardware registers otherwise?

How does this work on big.LITTLE systems where cores may have a different
number of implemented registers?

>  When debug events exit the main run loop with the reason
>  KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run
>  structure containing architecture specific debug information.
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index c1ed8cb..a286026 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -306,6 +306,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  
>  #define KVM_GUESTDBG_VALID (KVM_GUESTDBG_ENABLE |\
>   KVM_GUESTDBG_USE_SW_BP | \
> + KVM_GUESTDBG_USE_HW_BP | \
>   KVM_GUESTDBG_SINGLESTEP)
>  
>  /**
> @@ -328,6 +329,26 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu 
> *vcpu,
>   return -EINVAL;
>  
>   vcpu->guest_debug = dbg->control;
> +
> + /* Hardware assisted Break and Watch points */
> + if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) {
> + int nb = get_num_brps();
> + int nw = get_num_wrps();
> +
> + /* Copy across up to IMPDEF debug registers to our
> +  * shadow copy in the vcpu structure. The debug code
> +  * will then set them up before we re-enter the guest.
> +  */
> + memcpy(vcpu->arch.guest_debug_regs.dbg_bcr,
> + dbg->arch.dbg_bcr, sizeof(__u64)*nb);
> + memcpy(vcpu->arch.guest_debug_regs.dbg_bvr,
> + dbg->arch.dbg_bvr, sizeof(__u64)*nb);
> + memcpy(vcpu->arch.guest_debug_regs.dbg_wcr,
> + dbg->arch.dbg_wcr, sizeof(__u64)*nw);
> + memcpy(vcpu->arch.guest_debug_regs.dbg_wvr,
> + dbg->arch.dbg_wvr, sizeof(__u64)*nw);
> + }
> +
>   } else {
>   /* If not enabled clear all flags */
>   vcpu->guest_debug = 0;
> diff --git a/arch/arm64/include/asm/hw_breakpoint.h 
> b/arch/arm64/include/asm/hw_breakpoint.h
> index 52b484b..c450552 100644
> --- a/arch/arm64/include/asm/hw_breakpoint.h
> +++ b/arch/arm64/include/asm/hw_breakpoint.h
> @@ -130,6 +130,18 @@ static inline void ptrace_hw_copy_thread(struct 
> task_struct *task)
>  }
>  #endif
>  
> +/* Determine number of BRP registers available. */
> +static inline int get_num_brps(vo

Re: [PATCH v2 4/4] KVM: x86: Clear CR2 on VCPU reset

2015-04-14 Thread Wanpeng Li
Hi Nadav,
On Thu, Apr 02, 2015 at 03:10:38AM +0300, Nadav Amit wrote:
>CR2 is not cleared as it should after reset.  See Intel SDM table named "IA-32
>Processor States Following Power-up, Reset, or INIT".

How you trigger the reset instead of the "Power-up" one?

Regards,
Wanpeng Li 

>
>Signed-off-by: Nadav Amit 
>---
> arch/x86/kvm/x86.c | 2 ++
> 1 file changed, 2 insertions(+)
>
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index e4ac17e..8fdad04 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -7117,6 +7117,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool 
>init_event)
>   vcpu->arch.dr7 = DR7_FIXED_1;
>   kvm_update_dr7(vcpu);
> 
>+  vcpu->arch.cr2 = 0;
>+
>   kvm_make_request(KVM_REQ_EVENT, vcpu);
>   vcpu->arch.apf.msr_val = 0;
>   vcpu->arch.st.msr_val = 0;
>-- 
>1.9.1
>
>--
>To unsubscribe from this list: send the line "unsubscribe kvm" in
>the body of a message to majord...@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 08/10] KVM: arm64: guest debug, HW assisted debug support

2015-04-14 Thread Christoffer Dall
On Fri, Apr 10, 2015 at 02:25:21PM +0200, Andrew Jones wrote:

[...]

> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -196,16 +196,49 @@ static bool trap_dbgauthstatus_el1(struct kvm_vcpu 
> > *vcpu,
> >   * - If the dirty bit is set, save guest registers, restore host
> >   *   registers and clear the dirty bit. This ensure that the host can
> >   *   now use the debug registers.
> > + *
> > + * We also use this mechanism to set-up the debug registers for guest
> setup

since I'm in this mood:

setup: noun or adjective
set-up: noun derived from the phrasal verb, example "Run! It's a set-up."
set up: verb

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 09/10] KVM: arm64: trap nested debug register access

2015-04-14 Thread Christoffer Dall
On Mon, Apr 13, 2015 at 08:59:21AM +0100, Alex Bennée wrote:

[...]

> >> +  /* MDSCR_EL1 */
> >> +  if (r->reg == MDSCR_EL1) {
> >> +  if (p->is_write)
> >> +  vcpu_debug_saved_reg(vcpu, mdscr_el1) =
> >> +  *vcpu_reg(vcpu, p->Rt);
> >> +  else
> >> +  *vcpu_reg(vcpu, p->Rt) =
> >> +  vcpu_debug_saved_reg(vcpu, mdscr_el1);
> >
> > With this lines wrapping, {}'s might be nice.
> 
> My natural inclination is to wrap in {}'s but I know the kernel is a fan
> of the single-statement if forms.
> 
It's accepted to use braces for multi-line single statements - and I
prefer it too :)

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 09/10] KVM: arm64: trap nested debug register access

2015-04-14 Thread Christoffer Dall
On Tue, Mar 31, 2015 at 04:08:07PM +0100, Alex Bennée wrote:
> When we are using the hardware registers for guest debug we need to deal
> with the guests access to them. There is already a mechanism for dealing
> with these accesses so we build on top of that.
> 
>   - mdscr_el1_bits is renamed as we save the whole register
>   - any access to mdscr_el1 is now stored in the mirror location
>   - if we are using HW assisted debug we do the same with DBG[WB][CV]R
> 
> There is one register (MDCCINT_EL1) which guest debug doesn't care about
> so this behaves as before.
> 
> Signed-off-by: Alex Bennée 
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 2c359c9..3d32d45 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -122,10 +122,13 @@ struct kvm_vcpu_arch {
>* here.
>*/
>  
> - /* Registers pre any guest debug manipulations */
> + /* Registers before any guest debug manipulations. These
> +  * shadow registers are updated by the kvm_handle_sys_reg
> +  * trap handler if the guest accesses or updates them
> +  */
>   struct {
>   u32 pstate_ss_bit;
> - u32 mdscr_el1_bits;
> + u32 mdscr_el1;
>  
>   struct kvm_guest_debug_arch debug_regs;
>   } debug_saved_regs;
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index 3b368f3..638c111 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -55,8 +55,6 @@ void kvm_arch_setup_debug(struct kvm_vcpu *vcpu)
>   vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM | MDCR_EL2_TPMCR);
>   vcpu->arch.mdcr_el2 |= (MDCR_EL2_TDRA | MDCR_EL2_TDOSA);
>  
> - trace_kvm_arch_setup_debug_reg32("MDCR_EL2", vcpu->arch.mdcr_el2);
> -
>   /*
>* If we are not treating debug registers are dirty we need
>* to trap if the guest starts accessing them.
> @@ -71,8 +69,10 @@ void kvm_arch_setup_debug(struct kvm_vcpu *vcpu)
>   /* Save pstate/mdscr */
>   vcpu_debug_saved_reg(vcpu, pstate_ss_bit) =
>   *vcpu_cpsr(vcpu) & DBG_SPSR_SS;
> - vcpu_debug_saved_reg(vcpu, mdscr_el1_bits) =
> - vcpu_sys_reg(vcpu, MDSCR_EL1) & MDSCR_EL1_DEBUG_BITS;
> +
> + vcpu_debug_saved_reg(vcpu, mdscr_el1) =
> + vcpu_sys_reg(vcpu, MDSCR_EL1);
> +

you can avoid this churn in the patches by following Drew's advice to a
previous patch.

>   /*
>* Single Step (ARM ARM D2.12.3 The software step state
>* machine)
> @@ -161,9 +161,8 @@ void kvm_arch_clear_debug(struct kvm_vcpu *vcpu)
>   *vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
>   *vcpu_cpsr(vcpu) |= vcpu_debug_saved_reg(vcpu, pstate_ss_bit);
>  
> - vcpu_sys_reg(vcpu, MDSCR_EL1) &= ~MDSCR_EL1_DEBUG_BITS;
> - vcpu_sys_reg(vcpu, MDSCR_EL1) |=
> - vcpu_debug_saved_reg(vcpu, mdscr_el1_bits);
> + vcpu_sys_reg(vcpu, MDSCR_EL1) =
> + vcpu_debug_saved_reg(vcpu, mdscr_el1);
>  
>   /*
>* If we were using HW debug we need to restore the
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index be9b188..d43d7d1 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -208,39 +208,61 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
>   const struct sys_reg_params *p,
>   const struct sys_reg_desc *r)
>  {
> - if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) {
> - struct kvm_guest_debug_arch *saved;
> - __u64 *val;
> -
> - saved = &vcpu_debug_saved_reg(vcpu, debug_regs);
> -
> - if (r->reg >= DBGBCR0_EL1 && r->reg <= DBGBCR15_EL1)
> - val = &saved->dbg_bcr[r->reg - DBGBCR0_EL1];
> - else if (r->reg >= DBGBVR0_EL1 && r->reg <= DBGBVR15_EL1)
> - val = &saved->dbg_bvr[r->reg - DBGBVR0_EL1];
> - else if (r->reg >= DBGWCR0_EL1 && r->reg <= DBGWCR15_EL1)
> - val = &saved->dbg_wcr[r->reg - DBGWCR0_EL1];
> - else if (r->reg >= DBGWVR0_EL1 && r->reg <= DBGWVR15_EL1)
> - val = &saved->dbg_wvr[r->reg - DBGWVR0_EL1];
> - else {
> - kvm_err("Bad register index %d\n", r->reg);
> - return false;
> + if (vcpu->guest_debug) {
> +
> + /* MDSCR_EL1 */
> + if (r->reg == MDSCR_EL1) {
> + if (p->is_write)
> + vcpu_debug_saved_reg(vcpu, mdscr_el1) =
> + *vcpu_reg(vcpu, p->Rt);
> + else
> + *vcpu_reg(vcpu, p->Rt) =
> + vcpu_debug_saved_reg(vcpu, mdscr_el1);
> +
> + return t

Re: [PATCH v2 4/4] KVM: x86: Clear CR2 on VCPU reset

2015-04-14 Thread Nadav Amit
Wanpeng Li  wrote:

> Hi Nadav,
> On Thu, Apr 02, 2015 at 03:10:38AM +0300, Nadav Amit wrote:
>> CR2 is not cleared as it should after reset.  See Intel SDM table named 
>> "IA-32
>> Processor States Following Power-up, Reset, or INIT".
> 
> How you trigger the reset instead of the "Power-up" one?

I sent an IPI of INIT for the KVM “reset” flow. I posted a unit-test:
http://www.spinics.net/lists/kvm/msg115525.html

The actual reset is handled by qemu, but KVM is still able to introduce bugs
in it, as it did in not reseting DR0-DR3.

Nadav--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 10/10] KVM: arm64: add trace points for guest_debug debug

2015-04-14 Thread Christoffer Dall
On Tue, Mar 31, 2015 at 04:08:08PM +0100, Alex Bennée wrote:
> This includes trace points for:
>   kvm_arch_setup_guest_debug
>   kvm_arch_clear_guest_debug
>   kvm_handle_guest_debug
> 
> I've also added some generic register setting trace events so I can
> watch the register values being built up over time. The local
> dump_dbg_regs() function dumps all the HW BKPT and WPT registers.
> 
> I've also added a #define trace_dreg to shorten some lines.
> 
> Signed-off-by: Alex Bennée 
> 
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index 638c111..7c96288 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -25,12 +25,37 @@
>  #include 
>  #include 
>  
> +#include "trace.h"
> +
> +#define trace_dreg(name, value) trace_kvm_arch_setup_debug_reg32(name, value)
> +
>  /* These are the bits of MDSCR_EL1 we may mess with */
>  #define MDSCR_EL1_DEBUG_BITS (DBG_MDSCR_SS | \
>   DBG_MDSCR_KDE | \
>   DBG_MDSCR_MDE)
>  
>  /**
> + * dump_dbg_regs - simple debug helper
> + *
> + * This provides a simple helper to dump the HW debug registers
> + */
> +static void dump_dbg_regs(struct kvm_vcpu *vcpu, int nb, int nw)
> +{
> + int i;
> +
> + for (i = 0; i < nb; i++) {
> + trace_printk("bkpt%d: 0x%08x:0x%llx\n", i,
> + (u32) vcpu_sys_reg(vcpu, DBGBCR0_EL1 + i),
> + vcpu_sys_reg(vcpu, DBGBVR0_EL1 + i));
> + }
> + for (i = 0; i < nb; i++) {
> + trace_printk("wtpt%d: 0x%08x:0x%llx\n", i,
> + (u32) vcpu_sys_reg(vcpu, DBGWCR0_EL1 + i),
> + vcpu_sys_reg(vcpu, DBGWVR0_EL1 + i));
> + }
> +}
> +
> +/**
>   * kvm_arch_setup_debug - set-up debug related stuff
>   *
>   * @vcpu:the vcpu pointer
> @@ -52,9 +77,13 @@ void kvm_arch_setup_debug(struct kvm_vcpu *vcpu)
>  {
>   bool trap_debug = false;
>  
> + trace_kvm_arch_setup_debug(vcpu->guest_debug);
> +
>   vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM | MDCR_EL2_TPMCR);
>   vcpu->arch.mdcr_el2 |= (MDCR_EL2_TDRA | MDCR_EL2_TDOSA);
>  
> + trace_kvm_arch_setup_debug_reg32("MDCR_EL2", vcpu->arch.mdcr_el2);
> +
>   /*
>* If we are not treating debug registers are dirty we need
>* to trap if the guest starts accessing them.
> @@ -66,6 +95,8 @@ void kvm_arch_setup_debug(struct kvm_vcpu *vcpu)
>   if (vcpu->guest_debug) {
>   vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
>  
> + trace_dreg("MDCR_EL2", vcpu->arch.mdcr_el2);
> +
>   /* Save pstate/mdscr */
>   vcpu_debug_saved_reg(vcpu, pstate_ss_bit) =
>   *vcpu_cpsr(vcpu) & DBG_SPSR_SS;
> @@ -73,6 +104,11 @@ void kvm_arch_setup_debug(struct kvm_vcpu *vcpu)
>   vcpu_debug_saved_reg(vcpu, mdscr_el1) =
>   vcpu_sys_reg(vcpu, MDSCR_EL1);
>  
> + trace_dreg("Save: PSTATE.SS",
> + vcpu_debug_saved_reg(vcpu, pstate_ss_bit));
> + trace_dreg("Save: MDSCR",
> + vcpu_debug_saved_reg(vcpu, mdscr_el1));
> +
>   /*
>* Single Step (ARM ARM D2.12.3 The software step state
>* machine)
> @@ -88,6 +124,8 @@ void kvm_arch_setup_debug(struct kvm_vcpu *vcpu)
>   *vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
>   vcpu_sys_reg(vcpu, MDSCR_EL1) &= ~DBG_MDSCR_SS;
>   }
> + trace_dreg("SPSR_EL2", *vcpu_cpsr(vcpu));
> + trace_dreg("MDSCR_EL1", vcpu_sys_reg(vcpu, MDSCR_EL1));
>  
>   /*
>* HW Break/Watch points
> @@ -136,6 +174,9 @@ void kvm_arch_setup_debug(struct kvm_vcpu *vcpu)
>  &host->dbg_wvr,
>  sizeof(__u64)*nw);
>  
> + if (trace_kvm_arch_setup_debug_reg32_enabled())
> + dump_dbg_regs(vcpu, nb, nw);
> +
>   /* Make sure hyp.S copies them in/out */
>   vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
>   /* Also track guest changes */
> @@ -147,15 +188,24 @@ void kvm_arch_setup_debug(struct kvm_vcpu *vcpu)
>   vcpu->arch.mdcr_el2 &= ~MDCR_EL2_TDE;
>   }
>  
> + trace_kvm_arch_setup_debug_reg32("MDCR_EL2", vcpu->arch.mdcr_el2);
> + trace_kvm_arch_setup_debug_reg32("MDSCR_EL1",
> + vcpu_sys_reg(vcpu, MDSCR_EL1));
> +
> +
>   /* Trap debug register access? */
>   if (trap_debug)
>   vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
>   else
>   vcpu->arch.mdcr_el2 &= ~MDCR_EL2_TDA;
> +
> + trace_kvm_arch_setup_debug_reg32("MDCR_EL2", vcpu->arch.mdcr_el2);
>  }
>  
>  void kvm_arch_clear_debug(struct kvm_vcpu *vcpu)
>  {
> + trace_kvm_arch_clear_debug(vcpu->guest_debug);
> +
>   if (vcpu->guest_debug) {
>   /* Restore pstate/mdscr bits we may have messed with 

Re: [PATCH] KVM: arm: irqfd: fix value returned by kvm_irq_map_gsi

2015-04-14 Thread Christoffer Dall
On Mon, Apr 13, 2015 at 03:01:59PM +0200, Eric Auger wrote:
> irqfd/arm curently does not support routing. kvm_irq_map_gsi is
> supposed to return all the routing entries associated with the
> provided gsi and return the number of those entries. We should
> return 0 at this point.
> 
> Signed-off-by: Eric Auger 
> ---
>  virt/kvm/arm/vgic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 9ad074e..9b4f7d4 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1947,7 +1947,7 @@ int kvm_irq_map_gsi(struct kvm *kvm,
>   struct kvm_kernel_irq_routing_entry *entries,
>   int gsi)
>  {
> - return gsi;
> + return 0;
>  }
>  
>  int kvm_irq_map_chip_pin(struct kvm *kvm, unsigned irqchip, unsigned pin)
> -- 

Acked-by: Christoffer Dall 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: H_CLEAR_REF and H_CLEAR_MOD

2015-04-14 Thread Paul Mackerras
On Sat, Apr 11, 2015 at 12:57:54PM -0700, Nathan Whitehorn wrote:
> 
> 
> On 02/18/15 15:33, Nathan Whitehorn wrote:
> >
> >On 02/18/15 14:00, Paul Mackerras wrote:
> >>On Wed, Feb 18, 2015 at 09:34:54AM +0100, Alexander Graf wrote:
> Am 18.02.2015 um 07:12 schrieb Nathan Whitehorn
> :
> 
> It seems like KVM doesn't implement the H_CLEAR_REF and H_CLEAR_MOD
> hypervisor calls, which are absolutely critical for memory
> management in the FreeBSD kernel (and are marked "mandatory" in the
> PAPR manual). It seems some patches have been contributed already in 
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2011-December/095013.html,
> so it would be fantastic if these could end up upstream.
> >>>Paul, I guess we never included this because  there was no user. If
> >>>FreeBSD does use it though, I think it makes a lot of sense to resend
> >>>it for inclusion.
> >>I agree.  I just need to check the locking and synchronization around
> >>the reference and change bit recording, then I'll resend it.
> >
> >Thanks much! Please let me know if I can help at all with this.
> 
> Any news on this? I'm happy to test the patch if you like.
> -Nathan

I have discovered a bug where we can lose the host-side view of the
dirtiness of pages when the guest does a H_CLEAR_MOD on a huge-page
HPTE, so I'm working on a fix for that.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/12] Remaining improvements for HV KVM

2015-04-14 Thread Paul Mackerras
On Thu, Apr 09, 2015 at 12:57:58AM +0200, Alexander Graf wrote:
> On 03/28/2015 04:21 AM, Paul Mackerras wrote:
> >This is the rest of my current patch queue for HV KVM on PPC.  This
> >series is based on Alex Graf's kvm-ppc-queue branch.  The only change
> >from the previous version of this series is that patch 2 has been
> >updated to take account of the timebase offset.
> >
> >The last patch in this series needs a definition of PPC_MSGCLR that is
> >added by the patch "powerpc/powernv: Fixes for hypervisor doorbell
> >handling", which has now gone upstream into Linus' tree as commit
> >755563bc79c7 via the linuxppc-dev mailing list.  Alex, how do you want
> >to handle that?  You could pull in the master branch of the kvm tree,
> >which includes 755563bc79c7, or you could cherry-pick 755563bc79c7 and
> >let the subsequent merge fix it up.
> 
> I've just cherry-picked it for now since it still lives in my queue, so it
> will get thrown out automatically once I rebase on next if it's included in
> there.
> 
> Paolo / Marcelo, could you please try to somehow get the commit above into
> the next branch somehow? I guess the easiest would be to merge linus/master
> into kvm/next.
> 
> Thanks, applied all to kvm-ppc-queue.

Did you forget to push it out or something?  Your kvm-ppc-queue branch
is still at 4.0-rc1 as far as I can see.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM call agenda for 2015-04-14

2015-04-14 Thread Juan Quintela
Juan Quintela  wrote:
> Hi
>
> Please, send any topic that you are interested in covering.
>


As there are no agenda, call get cancelled.

Later, Juan.


>
>  Call details:
>
> By popular demand, a google calendar public entry with it
>
>   
> https://www.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
>
> (Let me know if you have any problems with the calendar entry.  I just
> gave up about getting right at the same time CEST, CET, EDT and DST).
>
> If you need phone number details,  contact me privately
>
> Thanks, Juan.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 93251] qemu-kvm guests randomly hangs after reboot command in guest

2015-04-14 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=93251

--- Comment #15 from Igor Mammedov  ---
Fixed in v4.0
commit 744961341d472db6272ed9b42319a90f5a2aa7c4
kvm: avoid page allocation failure in kvm_set_memory_region()

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 7/8] vhost: feature to set the vring endianness

2015-04-14 Thread Cornelia Huck
On Fri, 10 Apr 2015 12:19:16 +0200
Greg Kurz  wrote:

> This patch brings cross-endian support to vhost when used to implement
> legacy virtio devices. Since it is a relatively rare situation, the
> feature availability is controlled by a kernel config option (not set
> by default).
> 
> The vq->is_le boolean field is added to cache the endianness to be
> used for ring accesses. It defaults to native endian, as expected
> by legacy virtio devices. When the ring gets active, we force little
> endian if the device is modern. When the ring is deactivated, we
> revert to the native endian default.
> 
> If cross-endian was compiled in, a vq->user_be boolean field is added
> so that userspace may request a specific endianness. This field is
> used to override the default when activating the ring of a legacy
> device. It has no effect on modern devices.
> 
> Signed-off-by: Greg Kurz 
> ---
>  drivers/vhost/Kconfig  |   10 ++
>  drivers/vhost/vhost.c  |   76 
> +++-
>  drivers/vhost/vhost.h  |   12 +--
>  include/uapi/linux/vhost.h |9 +
>  4 files changed, 103 insertions(+), 4 deletions(-)

> +#ifdef CONFIG_VHOST_SET_ENDIAN_LEGACY
> +static long vhost_set_vring_big_endian(struct vhost_virtqueue *vq,
> +int __user *argp)
> +{
> + struct vhost_vring_state s;
> +
> + if (vq->private_data)
> + return -EBUSY;
> +
> + if (copy_from_user(&s, argp, sizeof(s)))
> + return -EFAULT;
> +
> + if (s.num && s.num != 1)

Checking for s.num > 1 might be more obvious at a glance?

> + return -EINVAL;
> +
> + vq->user_be = s.num;
> +
> + return 0;
> +}
> +

(...)

> +#ifdef CONFIG_VHOST_SET_ENDIAN_LEGACY
> +static void vhost_init_is_le(struct vhost_virtqueue *vq)
> +{
> + vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1) || !vq->user_be;

So if the endianness is not explicitly set to BE, it will be forced to
LE (instead of native endian)? Won't that break userspace that does not
yet use the new interface when CONFIG_VHOST_SET_ENDIAN_LEGACY is set?

> +}
> +#else
> +static void vhost_init_is_le(struct vhost_virtqueue *vq)
> +{
> + if (vhost_has_feature(vq, VIRTIO_F_VERSION_1))
> + vq->is_le = true;
> +}
> +#endif
> +
>  int vhost_init_used(struct vhost_virtqueue *vq)
>  {
>   __virtio16 last_used_idx;
>   int r;
> - if (!vq->private_data)
> + if (!vq->private_data) {
> + vq->is_le = virtio_legacy_is_little_endian();
>   return 0;
> + }
> +
> + vhost_init_is_le(vq);
> 
>   r = vhost_update_used_flags(vq);
>   if (r)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 7/8] vhost: feature to set the vring endianness

2015-04-14 Thread Greg Kurz
On Tue, 14 Apr 2015 16:20:23 +0200
Cornelia Huck  wrote:

> On Fri, 10 Apr 2015 12:19:16 +0200
> Greg Kurz  wrote:
> 
> > This patch brings cross-endian support to vhost when used to implement
> > legacy virtio devices. Since it is a relatively rare situation, the
> > feature availability is controlled by a kernel config option (not set
> > by default).
> > 
> > The vq->is_le boolean field is added to cache the endianness to be
> > used for ring accesses. It defaults to native endian, as expected
> > by legacy virtio devices. When the ring gets active, we force little
> > endian if the device is modern. When the ring is deactivated, we
> > revert to the native endian default.
> > 
> > If cross-endian was compiled in, a vq->user_be boolean field is added
> > so that userspace may request a specific endianness. This field is
> > used to override the default when activating the ring of a legacy
> > device. It has no effect on modern devices.
> > 
> > Signed-off-by: Greg Kurz 
> > ---
> >  drivers/vhost/Kconfig  |   10 ++
> >  drivers/vhost/vhost.c  |   76 
> > +++-
> >  drivers/vhost/vhost.h  |   12 +--
> >  include/uapi/linux/vhost.h |9 +
> >  4 files changed, 103 insertions(+), 4 deletions(-)
> 
> > +#ifdef CONFIG_VHOST_SET_ENDIAN_LEGACY
> > +static long vhost_set_vring_big_endian(struct vhost_virtqueue *vq,
> > +  int __user *argp)
> > +{
> > +   struct vhost_vring_state s;
> > +
> > +   if (vq->private_data)
> > +   return -EBUSY;
> > +
> > +   if (copy_from_user(&s, argp, sizeof(s)))
> > +   return -EFAULT;
> > +
> > +   if (s.num && s.num != 1)
> 
> Checking for s.num > 1 might be more obvious at a glance?
> 

Sure since s.num is unsigned.

> > +   return -EINVAL;
> > +
> > +   vq->user_be = s.num;
> > +
> > +   return 0;
> > +}
> > +
> 
> (...)
> 
> > +#ifdef CONFIG_VHOST_SET_ENDIAN_LEGACY
> > +static void vhost_init_is_le(struct vhost_virtqueue *vq)
> > +{
> > +   vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1) || !vq->user_be;
> 
> So if the endianness is not explicitly set to BE, it will be forced to
> LE (instead of native endian)? Won't that break userspace that does not
> yet use the new interface when CONFIG_VHOST_SET_ENDIAN_LEGACY is set?
> 

If userspace doesn't use the new interface, then vq->user_be will retain its
default value that was set in vhost_vq_reset(), i.e. :

 vq->user_be = !virtio_legacy_is_little_endian();

This means default is native endian.

What about adding this comment ?

static void vhost_init_is_le(struct vhost_virtqueue *vq)
{
/* Note for legacy virtio: user_be is initialized in vhost_vq_reset()
 * according to the host endianness. If userspace does not set an
 * explicit endianness, the default behavior is native endian, as
 * expected by legacy virtio.
 */
vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1) || !vq->user_be;
}

> > +}
> > +#else
> > +static void vhost_init_is_le(struct vhost_virtqueue *vq)
> > +{
> > +   if (vhost_has_feature(vq, VIRTIO_F_VERSION_1))
> > +   vq->is_le = true;
> > +}
> > +#endif
> > +
> >  int vhost_init_used(struct vhost_virtqueue *vq)
> >  {
> > __virtio16 last_used_idx;
> > int r;
> > -   if (!vq->private_data)
> > +   if (!vq->private_data) {
> > +   vq->is_le = virtio_legacy_is_little_endian();
> > return 0;
> > +   }
> > +
> > +   vhost_init_is_le(vq);
> > 
> > r = vhost_update_used_flags(vq);
> > if (r)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 7/8] vhost: feature to set the vring endianness

2015-04-14 Thread Cornelia Huck
On Tue, 14 Apr 2015 17:13:52 +0200
Greg Kurz  wrote:

> On Tue, 14 Apr 2015 16:20:23 +0200
> Cornelia Huck  wrote:
> 
> > On Fri, 10 Apr 2015 12:19:16 +0200
> > Greg Kurz  wrote:

> > 
> > > +#ifdef CONFIG_VHOST_SET_ENDIAN_LEGACY
> > > +static void vhost_init_is_le(struct vhost_virtqueue *vq)
> > > +{
> > > + vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1) || !vq->user_be;
> > 
> > So if the endianness is not explicitly set to BE, it will be forced to
> > LE (instead of native endian)? Won't that break userspace that does not
> > yet use the new interface when CONFIG_VHOST_SET_ENDIAN_LEGACY is set?
> > 
> 
> If userspace doesn't use the new interface, then vq->user_be will retain its
> default value that was set in vhost_vq_reset(), i.e. :
> 
>  vq->user_be = !virtio_legacy_is_little_endian();
> 
> This means default is native endian.
> 
> What about adding this comment ?
> 
> static void vhost_init_is_le(struct vhost_virtqueue *vq)
> {
>   /* Note for legacy virtio: user_be is initialized in vhost_vq_reset()
>* according to the host endianness. If userspace does not set an
>* explicit endianness, the default behavior is native endian, as
>* expected by legacy virtio.
>*/

Good idea, as this is easy to miss.

>   vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1) || !vq->user_be;
> }
> 
> > > +}
> > > +#else
> > > +static void vhost_init_is_le(struct vhost_virtqueue *vq)
> > > +{
> > > + if (vhost_has_feature(vq, VIRTIO_F_VERSION_1))
> > > + vq->is_le = true;
> > > +}
> > > +#endif
> > > +
> > >  int vhost_init_used(struct vhost_virtqueue *vq)
> > >  {
> > >   __virtio16 last_used_idx;
> > >   int r;
> > > - if (!vq->private_data)
> > > + if (!vq->private_data) {
> > > + vq->is_le = virtio_legacy_is_little_endian();
> > >   return 0;
> > > + }
> > > +
> > > + vhost_init_is_le(vq);
> > > 
> > >   r = vhost_update_used_flags(vq);
> > >   if (r)
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm: mmu: don't do overflow memslot check

2015-04-14 Thread Andres Lagar-Cavilla
On Mon, Apr 13, 2015 at 10:51 PM, Wanpeng Li  wrote:
> As Andre pointed out:
(Andres)
>
> | I don't understand the value of this check here. Are we looking for a
> | broken memslot? Shouldn't this be a BUG_ON? Is this the place to care
> | about these things? npages is capped to KVM_MEM_MAX_NR_PAGES, i.e.
> | 2^31. A 64 bit overflow would be caused by a gigantic gfn_start which
> | would be trouble in many other ways.
>
> This patch drops the memslot overflow check to make the codes more simple.
>
> Signed-off-by: Wanpeng Li 
Reviewed-by: Andres Lagar-Cavilla 

Thanks
> ---
>  arch/x86/kvm/mmu.c | 12 ++--
>  1 file changed, 2 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 2a0d77e..9265fda 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -4505,19 +4505,12 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
> bool flush = false;
> unsigned long *rmapp;
> unsigned long last_index, index;
> -   gfn_t gfn_start, gfn_end;
>
> spin_lock(&kvm->mmu_lock);
>
> -   gfn_start = memslot->base_gfn;
> -   gfn_end = memslot->base_gfn + memslot->npages - 1;
> -
> -   if (gfn_start >= gfn_end)
> -   goto out;
> -
> rmapp = memslot->arch.rmap[0];
> -   last_index = gfn_to_index(gfn_end, memslot->base_gfn,
> -   PT_PAGE_TABLE_LEVEL);
> +   last_index = gfn_to_index(memslot->base_gfn + memslot->npages - 1,
> +   memslot->base_gfn, PT_PAGE_TABLE_LEVEL);
>
> for (index = 0; index <= last_index; ++index, ++rmapp) {
> if (*rmapp)
> @@ -4535,7 +4528,6 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
> if (flush)
> kvm_flush_remote_tlbs(kvm);
>
> -out:
> spin_unlock(&kvm->mmu_lock);
>  }
>
> --
> 1.9.1
>



-- 
Andres Lagar-Cavilla | Google Kernel Team | andre...@google.com
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] vfio-pci: Log device requests more verbosely

2015-04-14 Thread Alex Williamson
Log some clues indicating whether the user is receiving device
request interfaces or not listening.  This can help indicate why a
driver unbind is blocked or explain why QEMU automatically unplugged
a device from the VM.

Signed-off-by: Alex Williamson 
---
 drivers/vfio/pci/vfio_pci.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index d9f7ec5..5eb9cbe 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -923,8 +923,14 @@ static void vfio_pci_request(void *device_data, unsigned 
int count)
mutex_lock(&vdev->igate);
 
if (vdev->req_trigger) {
-   dev_dbg(&vdev->pdev->dev, "Requesting device from user\n");
+   if (!(count % 10))
+   dev_notice_ratelimited(&vdev->pdev->dev,
+   "Relaying device request to user (#%u)\n",
+   count);
eventfd_signal(vdev->req_trigger, 1);
+   } else if (count == 0) {
+   dev_warn(&vdev->pdev->dev,
+   "No device request channel registered, blocked until 
released by user\n");
}
 
mutex_unlock(&vdev->igate);

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] vfio: Flush signals on device request interruption

2015-04-14 Thread Alex Williamson
Signals don't just interrupt our wait, they remain pending such that
subsequent wait_events timeout immediately.  This can cause a CPU to
spin with a single signal.  Flush signals if we receive an
interruption and also log information about the reason the task is
blocked.

Signed-off-by: Alex Williamson 
Cc: sta...@vger.kernel.org # v4.0
---
 drivers/vfio/vfio.c |   13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 0d33662..4c786d4 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -710,6 +710,7 @@ void *vfio_del_group_dev(struct device *dev)
void *device_data = device->device_data;
struct vfio_unbound_dev *unbound;
unsigned int i = 0;
+   long ret;
 
/*
 * The group exists so long as we have a device reference.  Get
@@ -755,9 +756,15 @@ void *vfio_del_group_dev(struct device *dev)
 
vfio_device_put(device);
 
-   } while (wait_event_interruptible_timeout(vfio.release_q,
- !vfio_dev_present(group, dev),
- HZ * 10) <= 0);
+   ret = wait_event_interruptible_timeout(vfio.release_q,
+   !vfio_dev_present(group, dev), HZ * 10);
+   if (ret == -ERESTARTSYS) {
+   flush_signals(current);
+   dev_warn_ratelimited(dev, "Device is currently in use, 
task \"%s\" (%d) blocked until device is released",
+current->comm,
+task_pid_nr(current));
+   }
+   } while (ret <= 0);
 
vfio_group_put(group);
 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] vfio: Add signal flush and log more verbosely

2015-04-14 Thread Alex Williamson
Fix an issue where signals are not flushed causing our interruptible
wait_event to go into a tight loop once a signal is received.  Also
add more logging throughout.  We have no -EBUSY return path for a
driver remove request and it's confusing to users when tasks get
blocked waiting for the device to release.  Provide more clues what's
happening.  Thanks,

Alex

---

Alex Williamson (2):
  vfio: Flush signals on device request interruption
  vfio-pci: Log device requests more verbosely


 drivers/vfio/pci/vfio_pci.c |8 +++-
 drivers/vfio/vfio.c |   13 ++---
 2 files changed, 17 insertions(+), 4 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] VFIO updates for v4.1-rc1

2015-04-14 Thread Alex Williamson
Hi Linus,

The following changes since commit 06e5801b8cb3fc057d88cb4dc03c0b64b2744cda:

  Linux 4.0-rc4 (2015-03-15 17:38:20 -0700)

are available in the git repository at:

  git://github.com/awilliam/linux-vfio.git tags/vfio-v4.1-rc1

for you to fetch changes up to 5a0ff17741c1785b27229a16b5ab77470d71b170:

  vfio-pci: Fix use after free (2015-04-08 08:11:51 -0600)


VFIO updates for v4.1
 - VFIO platform bus driver support (Baptiste Reynal, Antonios Motakis, testing 
and review by Eric Auger)
 - Split VFIO irqfd support to separate module (Alex Williamson)
 - vfio-pci VGA arbiter client (Alex Williamson)
 - New vfio-pci.ids= module option (Alex Williamson)
 - vfio-pci D3 power state support for idle devices (Alex Williamson)


Alex Williamson (8):
  vfio: Split virqfd into a separate module for vfio bus drivers
  vgaarb: Stub vga_set_legacy_decoding()
  vfio-pci: Add module option to disable VGA region access
  vfio-pci: Add VGA arbiter client
  vfio-pci: Allow PCI IDs to be specified as module options
  vfio-pci: Remove warning if try-reset fails
  vfio-pci: Move idle devices to D3hot power state
  vfio-pci: Fix use after free

Antonios Motakis (20):
  vfio/platform: initial skeleton of VFIO support for platform devices
  vfio: platform: probe to devices on the platform bus
  vfio: platform: add the VFIO PLATFORM module to Kconfig
  vfio: amba: VFIO support for AMBA devices
  vfio: amba: add the VFIO for AMBA devices module to Kconfig
  vfio/platform: return info for bound device
  vfio/platform: return info for device memory mapped IO regions
  vfio/platform: read and write support for the device fd
  vfio/platform: support MMAP of MMIO regions
  vfio/platform: return IRQ info
  vfio/platform: initial interrupts support code
  vfio/platform: trigger an interrupt via eventfd
  vfio/platform: support for level sensitive interrupts
  vfio: add a vfio_ prefix to virqfd_enable and virqfd_disable and export
  vfio: virqfd: rename vfio_pci_virqfd_init and vfio_pci_virqfd_exit
  vfio: add local lock for virqfd instead of depending on VFIO PCI
  vfio: pass an opaque pointer on virqfd initialization
  vfio: move eventfd support code for VFIO_PCI to a separate file
  vfio: initialize the virqfd workqueue in VFIO generic code
  vfio/platform: implement IRQ masking/unmasking via an eventfd

Zhen Lei (1):
  vfio: put off the allocation of "minor" in vfio_create_group

kbuild test robot (1):
  vfio: virqfd_lock can be static

 drivers/vfio/Kconfig  |   6 +
 drivers/vfio/Makefile |   4 +
 drivers/vfio/pci/Kconfig  |   1 +
 drivers/vfio/pci/vfio_pci.c   | 188 --
 drivers/vfio/pci/vfio_pci_intrs.c | 238 +---
 drivers/vfio/pci/vfio_pci_private.h   |   3 -
 drivers/vfio/platform/Kconfig |  20 +
 drivers/vfio/platform/Makefile|   8 +
 drivers/vfio/platform/vfio_amba.c | 115 ++
 drivers/vfio/platform/vfio_platform.c | 103 +
 drivers/vfio/platform/vfio_platform_common.c  | 521 ++
 drivers/vfio/platform/vfio_platform_irq.c | 336 +
 drivers/vfio/platform/vfio_platform_private.h |  85 +
 drivers/vfio/vfio.c   |  13 +-
 drivers/vfio/virqfd.c | 226 +++
 include/linux/vfio.h  |  25 ++
 include/linux/vgaarb.h|   5 +
 include/uapi/linux/vfio.h |   2 +
 18 files changed, 1640 insertions(+), 259 deletions(-)
 create mode 100644 drivers/vfio/platform/Kconfig
 create mode 100644 drivers/vfio/platform/Makefile
 create mode 100644 drivers/vfio/platform/vfio_amba.c
 create mode 100644 drivers/vfio/platform/vfio_platform.c
 create mode 100644 drivers/vfio/platform/vfio_platform_common.c
 create mode 100644 drivers/vfio/platform/vfio_platform_irq.c
 create mode 100644 drivers/vfio/platform/vfio_platform_private.h
 create mode 100644 drivers/vfio/virqfd.c


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] vhost: use kthread_run macro

2015-04-14 Thread Bandan Das

Code Cleanup, kthread_run is a convenient wrapper
around kthread_create() that even wakes up the process
for us. Use it and remove no longer needed temp
task_struct variable.

Signed-off-by: Bandan Das 
---
 drivers/vhost/vhost.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 2ee2826..9ac66f7 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -366,7 +366,6 @@ EXPORT_SYMBOL_GPL(vhost_dev_has_owner);
 /* Caller should have device mutex */
 long vhost_dev_set_owner(struct vhost_dev *dev)
 {
-   struct task_struct *worker;
int err;
 
/* Is there an owner already? */
@@ -377,15 +376,12 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 
/* No owner, become one */
dev->mm = get_task_mm(current);
-   worker = kthread_create(vhost_worker, dev, "vhost-%d", current->pid);
-   if (IS_ERR(worker)) {
-   err = PTR_ERR(worker);
+   dev->worker = kthread_run(vhost_worker, dev, "vhost-%d", current->pid);
+   if (IS_ERR(dev->worker)) {
+   err = PTR_ERR(dev->worker);
goto err_worker;
}
 
-   dev->worker = worker;
-   wake_up_process(worker);/* avoid contributing to loadavg */
-
err = vhost_attach_cgroups(dev);
if (err)
goto err_cgroup;
@@ -396,7 +392,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 
return 0;
 err_cgroup:
-   kthread_stop(worker);
+   kthread_stop(dev->worker);
dev->worker = NULL;
 err_worker:
if (dev->mm)
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] kvm: mmu: don't do memslot overflow check

2015-04-14 Thread Wanpeng Li
As Andres pointed out:

| I don't understand the value of this check here. Are we looking for a
| broken memslot? Shouldn't this be a BUG_ON? Is this the place to care
| about these things? npages is capped to KVM_MEM_MAX_NR_PAGES, i.e.
| 2^31. A 64 bit overflow would be caused by a gigantic gfn_start which
| would be trouble in many other ways.

This patch drops the memslot overflow check to make the codes more simple.

Reviewed-by: Andres Lagar-Cavilla 
Signed-off-by: Wanpeng Li 
---
v1 -> v2:
 * Fix Andres's name
 * Add Andres's Reviewed-by 

 arch/x86/kvm/mmu.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2a0d77e..9265fda 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4505,19 +4505,12 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
bool flush = false;
unsigned long *rmapp;
unsigned long last_index, index;
-   gfn_t gfn_start, gfn_end;
 
spin_lock(&kvm->mmu_lock);
 
-   gfn_start = memslot->base_gfn;
-   gfn_end = memslot->base_gfn + memslot->npages - 1;
-
-   if (gfn_start >= gfn_end)
-   goto out;
-
rmapp = memslot->arch.rmap[0];
-   last_index = gfn_to_index(gfn_end, memslot->base_gfn,
-   PT_PAGE_TABLE_LEVEL);
+   last_index = gfn_to_index(memslot->base_gfn + memslot->npages - 1,
+   memslot->base_gfn, PT_PAGE_TABLE_LEVEL);
 
for (index = 0; index <= last_index; ++index, ++rmapp) {
if (*rmapp)
@@ -4535,7 +4528,6 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
if (flush)
kvm_flush_remote_tlbs(kvm);
 
-out:
spin_unlock(&kvm->mmu_lock);
 }
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html