Re: [CFT PATCH v2 0/2] KVM: support XSAVES usage in the host

2014-11-26 Thread Nadav Amit
Wanpeng Li wanpeng...@linux.intel.com wrote:

 Hi all,
 On Tue, Nov 25, 2014 at 04:50:06PM +0200, Nadav Amit wrote:
 On Nov 25, 2014, at 16:17, Paolo Bonzini pbonz...@redhat.com wrote:
 
 
 
 On 25/11/2014 15:05, Nadav Amit wrote:
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 373b0ab9a32e..ca26681455c2 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -6955,6 +6955,9 @@ int fx_init(struct kvm_vcpu *vcpu)
   return err;
 
   fpu_finit(vcpu-arch.guest_fpu);
 + if (cpu_has_xsaves)
 + vcpu-arch.guest_fpu.state-xsave.xsave_hdr.xcomp_bv =
 + host_xcr0 | XSTATE_COMPACTION_ENABLED;
 
   /*
* Ensure guest xcr0 is valid for loading
 
 The second version works for me (w/qemu v2.1.0; Linux 3.13 guest). I
 
 Could you try 3.17 guest which has xsaves enabled? Because I'm not sure if 
 the below codes from Paolo is enough to mask XSAVES, should we also add 
 F(XSAVES)?
I guess this paragraph is intended for me. As I said before, I got limited
access to the machine - I will not be able to run these experiments before
Sunday, and I am not sure that I will have time to fully debug it. If you
insist, the very least please run some experiments running a 3.16 (or older)
guest kernel and running 3.17 with ‘noxsaves’ kernel parameter.

 +   const u32 kvm_supported_word10_x86_features =
 +   F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1);
 +
 
 In addition, the 3.17 guest is still hang as I mentioned even if I add the 
 F(XSAVES) to the kvm_supported_word10_x86_features.
 
 did not need to apply this patch on top. [although I am not sure whether
 relying on userspace to call KVM_SET_XSAVE early enough is a good 
 practice].
 
 Did you actually try the patch? :)  If it works, I'm tempted to apply it
 anyway.
 Yes, I tried it both with and without this patch.
 Due to time constraints I only tested minimal functionality (Linux boot).
 I will run more tests in the near future. Anyhow, you can put the:
 
 Tested-by: Nadav Amit na...@cs.technion.ac.il
 
 One disclaimer: Since I got limited time with the machine, I executed
 a slightly modified kernel/qemu, and not the latest version.
 Anyhow, I don’t think these differences can have any impact.
 
 Yes, that is no problem.
 
 I am just worried that Wanpeng reported it fails, while I report it works...
 
 I have another patch which enable xsaves in KVM and the patch is still
 under debug with Paolo's patch KVM: x86: support XSAVES usage in the host,
 so the 1/2 patch from Paolo can be dropped if my patch is ready. Anyway,
 a quick fix is needed before enable xsaves in kvm. 
xsaves is already enabled in KVM (the host) since 3.17. It just didn’t work…

Nadav

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CFT PATCH v2 0/2] KVM: support XSAVES usage in the host

2014-11-26 Thread Wanpeng Li
Hi Nadav,
On Wed, Nov 26, 2014 at 11:00:34AM +0200, Nadav Amit wrote:
Wanpeng Li wanpeng...@linux.intel.com wrote:

 Hi all,
 On Tue, Nov 25, 2014 at 04:50:06PM +0200, Nadav Amit wrote:
 On Nov 25, 2014, at 16:17, Paolo Bonzini pbonz...@redhat.com wrote:
 
 
 
 On 25/11/2014 15:05, Nadav Amit wrote:
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 373b0ab9a32e..ca26681455c2 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -6955,6 +6955,9 @@ int fx_init(struct kvm_vcpu *vcpu)
  return err;
 
  fpu_finit(vcpu-arch.guest_fpu);
 +if (cpu_has_xsaves)
 +vcpu-arch.guest_fpu.state-xsave.xsave_hdr.xcomp_bv =
 +host_xcr0 | XSTATE_COMPACTION_ENABLED;
 
  /*
   * Ensure guest xcr0 is valid for loading
 
 The second version works for me (w/qemu v2.1.0; Linux 3.13 guest). I
 
 Could you try 3.17 guest which has xsaves enabled? Because I'm not sure if 
 the below codes from Paolo is enough to mask XSAVES, should we also add 
 F(XSAVES)?
I guess this paragraph is intended for me. As I said before, I got limited
access to the machine - I will not be able to run these experiments before
Sunday, and I am not sure that I will have time to fully debug it. If you
insist, the very least please run some experiments running a 3.16 (or older)
guest kernel and running 3.17 with ‘noxsaves’ kernel parameter.

 +   const u32 kvm_supported_word10_x86_features =
 +   F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1);
 +
 
 In addition, the 3.17 guest is still hang as I mentioned even if I add the 
 F(XSAVES) to the kvm_supported_word10_x86_features.
 
 did not need to apply this patch on top. [although I am not sure whether
 relying on userspace to call KVM_SET_XSAVE early enough is a good 
 practice].
 
 Did you actually try the patch? :)  If it works, I'm tempted to apply it
 anyway.
 Yes, I tried it both with and without this patch.
 Due to time constraints I only tested minimal functionality (Linux boot).
 I will run more tests in the near future. Anyhow, you can put the:
 
 Tested-by: Nadav Amit na...@cs.technion.ac.il
 
 One disclaimer: Since I got limited time with the machine, I executed
 a slightly modified kernel/qemu, and not the latest version.
 Anyhow, I don’t think these differences can have any impact.
 
 Yes, that is no problem.
 
 I am just worried that Wanpeng reported it fails, while I report it works...
 
 I have another patch which enable xsaves in KVM and the patch is still
 under debug with Paolo's patch KVM: x86: support XSAVES usage in the host,
 so the 1/2 patch from Paolo can be dropped if my patch is ready. Anyway,
 a quick fix is needed before enable xsaves in kvm. 
xsaves is already enabled in KVM (the host) since 3.17. It just didn’t work…


Not yet, it is enabled in native currrently instead of vmx and my patch
is still under debug.

Regards,
Wanpeng Li 

Nadav
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] target-i386: add feature flags for CPUID[EAX=0xd,ECX=1]

2014-11-26 Thread Paolo Bonzini


On 25/11/2014 21:02, Paolo Bonzini wrote:
   +static const char *cpuid_xsave_feature_name[] = {
   +xsaveopt, xsavec, xgetbv1, xsaves,
  
  None of the above features introduce any new state that might need to be
  migrated, or will require other changes in QEMU to work, right?
  
  It looks like they don't introduce any extra state, but if they do, they
  need to be added to unmigratable_flags until migration support is
  implemented.
  
  If they require other QEMU changes, it would be nice if KVM reported
  them using KVM_CHECK_EXTENSION instead of GET_SUPPORTED_CPUID, so it
  wouldn't break -cpu host.
 
 No, they don't.

Actually, xsaves does but I don't think KVM_CHECK_EXTENSION is right.
It's just another MSR, and we haven't used KVM_CHECK_EXTENSION for new
MSRs and new XSAVE areas (last example: avx512).

Since no hardware really exists for it, and KVM does not support it
anyway, I think it's simplest to leave xsaves out for now.  Is this right?

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 1/2] KVM: don't check for PF_VCPU when yielding

2014-11-26 Thread David Hildenbrand
 This change is a trade-off.
 PRO: This patch would improve the case of preemption on s390. This is 
 probably a corner case as most distros have preemption off anyway.
 CON: The downside is that kvm_vcpu_yield_to is called also from 
 kvm_vcpu_on_spin. Here we want to avoid the scheduler overhead for a wrong 
 decision.   

Won't most of that part be covered by:
if (!ACCESS_ONCE(vcpu-preempted))

vcpu-preempted is only set when scheduled out involuntarily. It is cleared
when scheduled in. s390 sets it manually, to speed up waking up a vcpu.

So when our task is scheduled in (an PF_VCPU is set), this check will already
avoid scheduler overhead in kvm_vcpu_on_spin() or am I missing something?

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 1/2] KVM: don't check for PF_VCPU when yielding

2014-11-26 Thread Christian Borntraeger
Am 26.11.2014 um 10:23 schrieb David Hildenbrand:
 This change is a trade-off.
 PRO: This patch would improve the case of preemption on s390. This is 
 probably a corner case as most distros have preemption off anyway.
 CON: The downside is that kvm_vcpu_yield_to is called also from 
 kvm_vcpu_on_spin. Here we want to avoid the scheduler overhead for a wrong 
 decision.   
 
 Won't most of that part be covered by:
   if (!ACCESS_ONCE(vcpu-preempted))

Hmm, right. Checking vcpu-preempted and PF_VCPU might boil down to the same.
Would be good if to have to performance regression test, though. 

 
 vcpu-preempted is only set when scheduled out involuntarily. It is cleared
 when scheduled in. s390 sets it manually, to speed up waking up a vcpu.
 
 So when our task is scheduled in (an PF_VCPU is set), this check will already
 avoid scheduler overhead in kvm_vcpu_on_spin() or am I missing something?
 

CC Raghavendra KT. Could be rerun your kernbench/sysbench/ebizzy setup on x86 
to see if the patch in this thread causes any regression? If think your commit 
7bc7ae25b143kvm: Iterate over only vcpus that are preempted might have really 
made the PF_VCPU check unnecessary

CC Michael Mueller, do we still have our yield performance setup handy to check 
if this patch causes any regression?


Christian

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 2/3] KVM: arm: add irqfd support

2014-11-26 Thread Christoffer Dall
On Tue, Nov 25, 2014 at 02:12:31PM +0100, Eric Auger wrote:
 On 11/25/2014 11:19 AM, Christoffer Dall wrote:
  [Re-adding cc list]
  
  On Mon, Nov 24, 2014 at 05:42:30PM +0100, Eric Auger wrote:
  On 11/24/2014 04:47 PM, Christoffer Dall wrote:
  On Mon, Nov 24, 2014 at 12:02 PM, Eric Auger eric.au...@linaro.org 
  wrote:
  On 11/24/2014 11:00 AM, Christoffer Dall wrote:
  On Sun, Nov 23, 2014 at 06:56:59PM +0100, Eric Auger wrote:
  This patch enables irqfd on arm.
 
  Both irqfd and resamplefd are supported. Injection is implemented
  in vgic.c without routing.
 
  This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD.
 
  KVM_CAP_IRQFD is now advertised. KVM_CAP_IRQFD_RESAMPLE capability
  automatically is advertised as soon as CONFIG_HAVE_KVM_IRQFD is set.
 
  Signed-off-by: Eric Auger eric.au...@linaro.org
 
  ---
  
  [...]
  
  +int kvm_set_irq(struct kvm *kvm, int irq_source_id,
  +u32 irq, int level, bool line_status)
  +{
  +unsigned int spi = irq + VGIC_NR_PRIVATE_IRQS;
  +
  +kvm_debug(irqfd sets vIRQ %d to %d\n, irq, level);
  +
  +if (likely(vgic_initialized(kvm))) {
  +if (spi  kvm-arch.vgic.nr_irqs)
  +return -EINVAL;
  +return kvm_vgic_inject_irq(kvm, 0, spi, level);
  +} else if (level  kvm_irq_has_notifier(kvm, 0, irq)) {
  +/*
  + * any IRQ injected before vgic readiness is
  + * ignored and the notifier, if any, is called
  + * immediately as if the virtual IRQ were completed
  + * by the guest
  + */
  +kvm_notify_acked_irq(kvm, 0, irq);
  +return -EAGAIN;
 
  This looks fishy, I don't fully understand which case you're catering
  towards here (the comment is hard to understand), but my gut feeling is
  that you should back out (probably with an error) if the vgic is not
  initialized when calling this function.  Setting up the routing in the
  first place should probably error out if the vgic is not available.
  The issue is related to integration with QEMU and VFIO.
  Currently VFIO signaling (we tell VFIO to signal the eventfd on a
  peculiar physical IRQ) and irqfd setup (we tell KVM/IRQFD to handle
  previous eventfd when triggered and inject a GSI) are done by QEMU
  *before* the first vcpu run. so before VGIC is ready.
 
  Both are done in a so called QEMU machine init done notifier. It could
  be done in a QEMU reset notifier but I guess the problem would be the
  same. and I think the same at which QEMU initializes it is correct.
 
  As soon as both VFIO signaling and irqfd are setup, virtual IRQ are
  likely to be injected and this is what happens on some circumstances.
  This happens on the 2d QEMU run after killing the 1st QEMU session. For
  some reason I guess the HW device - in my case the xgmac - was released
  in such a way the IRQ wire was not reset. So as soon as VFIO driver
  calls request_irq, the IRQ hits.
 
  I tried to change that this xgmac driver behavior but I must acknowledge
  that for the time beeing I failed. I will continue investigating that.
 
  Indeed I might be fixing the issue at the wrong place. But anyway this
  relies on the fact the assigned device driver toggled down the IRQ
  properly when releasing. At the time the signaling is setup we have not
  entered yet any driver reset code.
 
  I see, it's because of the quirky way we initialize the vgic at time
  of running the first CPU, so user space can't really hook into the
  sweet spot after initializing the vgic but just before actually
  running guest code.
 
  Yes currently irqfd generic code has no way to test if the virtual
  interrupt controller is ready (KVM__IRQFD - kvm_irqfd_assign )
  because I guess other archs don't have the problem.
 
  Could it be that we simply need to let user space init the vgic after
  creating all its vcpus and only then allow setting up the irqfd?
 
  I start VFIO signaling and setup KVM_IRQFD in a QEMU machine init done
  notifier. the vgic init then must happen before then.
  
  have all the vcpus that QEMU wants to create, been created by then?
 Hi Christoffer,
 
 My understanding of QEMU start sequence is as follows:
 at machine init, CPU realize function is called for each CPU
 (target-arm/cpu.c arm_cpu_realizefn)
 qemu_init_vcpu (cpus.c)  qemu_kvm_start_vcpu (cpus.c)  launch VCPU
 thread (qemu_kvm_cpu_thread_fn)  kvm_init_vcpu
 kvm_init_vcpu does the while (1) loop issuing KVM_RUN ioctl if *cpu_can_run*
 
 cpu_can_run returns true when vm_start is called
 (resume_all_vcpuscpu_resume)
 
 QEMU machine init done or reset callbacks happen after machine init and
 before vm_start.
 
 The actual vgic readiness is set in the first vcpu run, ie on the
 vm_start.
 
 vgic instantiation in virt machine file is done after cpu_instantiation.
 We could force vgic init at that time, in the gic realize fn
 (kvm_arm_gic_realize)?. All vcpus are instantiated: nr_vcpus and nr_irqs
 are 

Re: [PATCH] target-i386: add feature flags for CPUID[EAX=0xd,ECX=1]

2014-11-26 Thread Eduardo Habkost
On Wed, Nov 26, 2014 at 10:20:12AM +0100, Paolo Bonzini wrote:
 
 
 On 25/11/2014 21:02, Paolo Bonzini wrote:
+static const char *cpuid_xsave_feature_name[] = {
+xsaveopt, xsavec, xgetbv1, xsaves,
   
   None of the above features introduce any new state that might need to be
   migrated, or will require other changes in QEMU to work, right?
   
   It looks like they don't introduce any extra state, but if they do, they
   need to be added to unmigratable_flags until migration support is
   implemented.
   
   If they require other QEMU changes, it would be nice if KVM reported
   them using KVM_CHECK_EXTENSION instead of GET_SUPPORTED_CPUID, so it
   wouldn't break -cpu host.
  
  No, they don't.
 
 Actually, xsaves does but I don't think KVM_CHECK_EXTENSION is right.
 It's just another MSR, and we haven't used KVM_CHECK_EXTENSION for new
 MSRs and new XSAVE areas (last example: avx512).

If the changes needed are only to support migration (this is the case if
it's just another MSR handled by KVM, or additional XSAVE areas),
GET_SUPPORTED_CPUID is still reasonable, because features that are
unknown to QEMU are always considered unmigratable until we add the
feature name to the feature_name arrays. (That's why we need to know if
the feature introduces additional state when adding the feature names to
the array.)

If other changes are required to make the feature work even if no
migration is required, then adding them to GET_SUPPORTED_CPUID would
break -cpu host on older QEMUs. I don't think that's the case here,
but I wanted to confirm.

(Should we add those observations to Documentation/virtual/kvm/api.txt?)


 
 Since no hardware really exists for it, and KVM does not support it
 anyway, I think it's simplest to leave xsaves out for now.  Is this right?

If unsure, it won't hurt to add the feature to unmigratable_features by
now. Making QEMU aware of the feature name is still useful.

-- 
Eduardo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CFT PATCH v2 2/2] KVM: x86: support XSAVES usage in the host

2014-11-26 Thread Radim Krčmář
2014-11-24 17:43+0100, Paolo Bonzini:
 Userspace is expecting non-compacted format for KVM_GET_XSAVE, but
 struct xsave_struct might be using the compacted format.  Convert
 in order to preserve userspace ABI.
 
 Likewise, userspace is passing non-compacted format for KVM_SET_XSAVE
 but the kernel will pass it to XRSTORS, and we need to convert back.

Future instructions might force us to calling xsave/xrstor directly, so
we could do that even now and save the explicit conversion ...

What I mean is:  we could be using the native xsave.*/xrstor.* while in
kernel and use xsave/xrstor for communication with userspace.
Hardware would take care of everything in the conversion.

get_xsave = native_xrstor(guest_xsave);  xsave(aligned_userspace_buffer)
set_xsave = xrstor(aligned_userspace_buffer);  native_xsave(guest_xsave)

Could that work?

 Fixes: f31a9f7c71691569359fa7fb8b0acaa44bce0324
 Cc: Fenghua Yu fenghua...@intel.com
 Cc: H. Peter Anvin h...@linux.intel.com
 Cc: Nadav Amit na...@cs.technion.ac.il
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  arch/x86/kvm/x86.c | 87 
 +-
  1 file changed, 80 insertions(+), 7 deletions(-)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 08b5657e57ed..373b0ab9a32e 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -3132,15 +3132,89 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct 
 kvm_vcpu *vcpu,
   return 0;
  }
  
 +#define XSTATE_COMPACTION_ENABLED (1ULL  63)

(arch/x86/include/asm/xsave.h)

 +
 +static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 +{
 + struct xsave_struct *xsave = vcpu-arch.guest_fpu.state-xsave;
 + u64 xstate_bv = vcpu-arch.guest_supported_xcr0 | XSTATE_FPSSE;

(I don't think this is necessary.  We haven't modified it before and
 userspace worked, so we can save explicit copying of initialized data.)

 + u64 valid;
 +
 + /*
 +  * Copy legacy XSAVE area, to avoid complications with CPUID
 +  * leaves 0 and 1 in the loop below.
 +  */
 + memcpy(dest, xsave, XSAVE_HDR_OFFSET);

(Yeah, there is an exception for SSE;  I don't see any effect it has on
 restore though, so we could probably ignore it as well.)

 +
 + /* Set XSTATE_BV */
 + *(u64 *)(dest + XSAVE_HDR_OFFSET) = xstate_bv;
 +
 + /*
 +  * Copy each region from the possibly compacted offset to the
 +  * non-compacted offset.
 +  */
 + valid = xstate_bv  ~XSTATE_FPSSE;

(We could read xstate_bv from xsave and  it with supported.)

 + while (valid) {
 + u64 feature = valid  -valid;
 + int index = fls64(feature) - 1;
 + void *src = get_xsave_addr(xsave, feature);

(xcomp_bv never changes, so it works for compacted xsave.)

 +
 + if (src) {
 + u32 size, offset, ecx, edx;
 + cpuid_count(XSTATE_CPUID, index,
 + size, offset, ecx, edx);

(ok, setup_xstate_features() has the same code.)

 + memcpy(dest + offset, src, size);
 + }
 +
 + valid -= feature;
 + }
 +}
 +
 +static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
 +{
 + struct xsave_struct *xsave = vcpu-arch.guest_fpu.state-xsave;
 + u64 xstate_bv = *(u64 *)(src + XSAVE_HDR_OFFSET);
 + u64 valid;
 +
 + /*
 +  * Copy legacy XSAVE area, to avoid complications with CPUID
 +  * leaves 0 and 1 in the loop below.
 +  */
 + memcpy(xsave, src, XSAVE_HDR_OFFSET);
 +
 + /* Set XSTATE_BV and possibly XCOMP_BV.  */
 + xsave-xsave_hdr.xstate_bv = xstate_bv;
 + if (cpu_has_xsaves)
 + xsave-xsave_hdr.xcomp_bv = host_xcr0 | 
 XSTATE_COMPACTION_ENABLED;

Userspace can trigger a #GP if it passes xstate_bv bit that isn't in
xcomp_bv, so we could  them back into xstate_bv as well.

(Linux probably won't start using IA32_XSS, so using just xcr0 is fine.)

 +
 + /*
 +  * Copy each region from the non-compacted offset to the
 +  * possibly compacted offset.
 +  */
 + valid = xstate_bv  ~XSTATE_FPSSE;
 + while (valid) {
 + u64 feature = valid  -valid;
 + int index = fls64(feature) - 1;
 + void *dest = get_xsave_addr(xsave, feature);
 +
 + if (dest) {
 + u32 size, offset, ecx, edx;
 + cpuid_count(XSTATE_CPUID, index,
 + size, offset, ecx, edx);
 + memcpy(dest, src + offset, size);
 + } else
 + WARN_ON_ONCE(1);
 +
 + valid -= feature;
 + }
 +}
 +
  static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
struct kvm_xsave *guest_xsave)
  {
   if (cpu_has_xsave) {
 - memcpy(guest_xsave-region,
 - vcpu-arch.guest_fpu.state-xsave,
 - vcpu-arch.guest_xstate_size);
 - *(u64 

Re: Exposing host debug capabilities to userspace

2014-11-26 Thread Alex Bennée

Paolo Bonzini pbonz...@redhat.com writes:

 On 25/11/2014 17:35, Alexander Graf wrote:
 Unfortunately on ARM you also have a few other constraints - the debug
 register space is partitioned into magic super debug registers at the
 top (with an implementation specific amount) and normal debug registers
 in the lower end of the space.

 Does the gdbstub ever need to use the magic super debug registers?  Even
 if it does, the logic is the same as x86.  On x86 you might run out of
 breakpoints, on ARM you might run out of breakpoints in general or magic
 super breakpoints in particular.

By default gdbstub uses normal software breakpoints unless a) the user
asks for a HW one or b) the memory is RO (i.e a ROM). For watchpoints
the situation is reversed and they are used by default if available.

 I would just treat gdbstub debugging as the ugly step child that it is
 and not introduce anything special for it (except for debug event
 deflection to user space). Then only ever work on guest debug registers
 and call it a day. Chances are just too high that we get the interfaces
 wrong and shoot ourselves in the foot.

 I agree.  But we do need a way to retrieve the number of valid fields in
 struct kvm_guest_debug_arch, if that is not architecturally defined
 (e.g. on x86 it's just always 4).

It's IMPDEF (implementation defined) up to a maximum of 16.

 A hidden ONE_REG, whose existence
 is determined by (ARM || ARM64) 
 kvm_check_extension(KVM_CAP_SET_GUEST_DEBUG), is certainly fine and I
 like it because it doesn't pollute the global KVM_CAP_* namespace.

My original implementation more or less did that but my main worry is
migration code tends to iterate over the whole ONE_REG list and
introducing a value that doesn't happen to be strictly the guests is
semantically impure. Of course if you mean by hidden don't export it
via GET_REG_LIST then I guess that would work. Does seem a little ugly
as the internal KVM code now needs to learn about hidden and non-hidden
registers.

 The alternative is a capability; however, if you start with two
 capabilities you'll end up needing four, and then realize that returning
 a struct via ONE_REG would have been a better idea.  :)  We already have
 how many breakpoints, how many watchpoints, how many magic super debug
 registers,...

Only two, not sure what the super debug registers are?

Certainly on the QEMU side the capability based approach is beautifully
simple:

max_hw_wp = kvm_check_extension(cs-kvm_state, KVM_CAP_GUEST_DEBUG_HW_WPS);
max_hw_bp = kvm_check_extension(cs-kvm_state, KVM_CAP_GUEST_DEBUG_HW_BPS);

thanks to kvm_check_extension zeroing failure modes.

And in it's defence it's a generic enough capability to be used across
any other architectures that need to expose this information compared to
the many PPC specific capabilities. Perhaps a KVM_ARCH_EXTENSION ioctl
would be more useful in reducing the growth of the space?


 Paolo

-- 
Alex Bennée
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Exposing host debug capabilities to userspace

2014-11-26 Thread Peter Maydell
On 25 November 2014 at 16:50, Paolo Bonzini pbonz...@redhat.com wrote:


 On 25/11/2014 17:35, Alexander Graf wrote:
 Unfortunately on ARM you also have a few other constraints - the debug
 register space is partitioned into magic super debug registers at the
 top (with an implementation specific amount) and normal debug registers
 in the lower end of the space.

 Does the gdbstub ever need to use the magic super debug registers?

No. The extra features of the trailing registers in the range
are context matching, which means you can set them up as
breakpoint on this context ID or breakpoint on this VMID
[read: breakpoints specific to a particular process ID or
particular VM], generally as linked breakpoints adding a
constraint to a plain address-match breakpoint register.
The gdbstub doesn't use context bps and I don't think Linux
does either (though of course it might in future).

-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Another Obsolete Fix me in trace.h?

2014-11-26 Thread Radim Krčmář
2014-11-24 22:49+0100, Radim Krčmář:
 2014-11-24 16:19-0500, Steven Rostedt:
  That wouldn't be too hard to implement.
 
 I'll look at the patch tommorrow.

The kernel part is trivial.
Most of the code is going to be in tools/lib/traceevent/event-parse.c.

I wasn't sure whether to
1) define new 'enum print_arg_type' for our function and do exactly what
   other __print* did
2) extend existing PRINT_FUNC with variable arguments and register it as
   a global event handler

as (2) makes more sense to me, but we already had it when some __print*
was implemented ... (and I didn't want to dig too deep into the project)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] target-i386: add feature flags for CPUID[EAX=0xd,ECX=1]

2014-11-26 Thread Paolo Bonzini


On 26/11/2014 12:40, Eduardo Habkost wrote:
 On Wed, Nov 26, 2014 at 10:20:12AM +0100, Paolo Bonzini wrote:


 On 25/11/2014 21:02, Paolo Bonzini wrote:
 +static const char *cpuid_xsave_feature_name[] = {
 +xsaveopt, xsavec, xgetbv1, xsaves,

 None of the above features introduce any new state that might need to be
 migrated, or will require other changes in QEMU to work, right?

 It looks like they don't introduce any extra state, but if they do, they
 need to be added to unmigratable_flags until migration support is
 implemented.

 If they require other QEMU changes, it would be nice if KVM reported
 them using KVM_CHECK_EXTENSION instead of GET_SUPPORTED_CPUID, so it
 wouldn't break -cpu host.

 No, they don't.

 Actually, xsaves does but I don't think KVM_CHECK_EXTENSION is right.
 It's just another MSR, and we haven't used KVM_CHECK_EXTENSION for new
 MSRs and new XSAVE areas (last example: avx512).
 
 If the changes needed are only to support migration (this is the case if
 it's just another MSR handled by KVM, or additional XSAVE areas),
 GET_SUPPORTED_CPUID is still reasonable, because features that are
 unknown to QEMU are always considered unmigratable until we add the
 feature name to the feature_name arrays. (That's why we need to know if
 the feature introduces additional state when adding the feature names to
 the array.)
 
 If other changes are required to make the feature work even if no
 migration is required, then adding them to GET_SUPPORTED_CPUID would
 break -cpu host on older QEMUs. I don't think that's the case here,
 but I wanted to confirm.

KVM may need more changes (I don't know, the details of the feature are
not public yet), but a new userspace API is very unlikely based on Intel
documentation.

 (Should we add those observations to Documentation/virtual/kvm/api.txt?)
 

 Since no hardware really exists for it, and KVM does not support it
 anyway, I think it's simplest to leave xsaves out for now.  Is this right?
 
 If unsure, it won't hurt to add the feature to unmigratable_features by
 now. Making QEMU aware of the feature name is still useful.

Ok, thanks.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CFT PATCH v2 0/2] KVM: support XSAVES usage in the host

2014-11-26 Thread Paolo Bonzini


On 26/11/2014 02:24, Wanpeng Li wrote:
 Hi all,
 On Tue, Nov 25, 2014 at 04:50:06PM +0200, Nadav Amit wrote:

 On Nov 25, 2014, at 16:17, Paolo Bonzini pbonz...@redhat.com wrote:



 On 25/11/2014 15:05, Nadav Amit wrote:
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 373b0ab9a32e..ca26681455c2 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -6955,6 +6955,9 @@ int fx_init(struct kvm_vcpu *vcpu)
   return err;

   fpu_finit(vcpu-arch.guest_fpu);
 + if (cpu_has_xsaves)
 + vcpu-arch.guest_fpu.state-xsave.xsave_hdr.xcomp_bv =
 + host_xcr0 | XSTATE_COMPACTION_ENABLED;

   /*
* Ensure guest xcr0 is valid for loading

 The second version works for me (w/qemu v2.1.0; Linux 3.13 guest). I
 
 Could you try 3.17 guest which has xsaves enabled? Because I'm not sure if 
 the below codes from Paolo is enough to mask XSAVES, should we also add 
 F(XSAVES)?
 
 +   const u32 kvm_supported_word10_x86_features =
 +   F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1);
 +
 
 In addition, the 3.17 guest is still hang as I mentioned even if I add the 
 F(XSAVES) to the kvm_supported_word10_x86_features.

Note that the point is _hiding_ XSAVES while leaving
XSAVEOPT/XSAVEC/XGETBV1 enabled.  This series only fixes the problems
with xsaves in the host, it doesn't try making XSAVES work in the guest.

 I have another patch which enable xsaves in KVM and the patch is still
 under debug with Paolo's patch KVM: x86: support XSAVES usage in the host,
 so the 1/2 patch from Paolo can be dropped if my patch is ready. Anyway,
 a quick fix is needed before enable xsaves in kvm. 

Please add XSAVES to kvm_supported_word10_x86_features in your patches
instead.  We still need to mask any (not yet defined) feature in bit 4
and above of CPUID[EAX=0xd,ECX].EAX.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 2/3] KVM: arm: add irqfd support

2014-11-26 Thread Eric Auger
On 11/26/2014 12:31 PM, Christoffer Dall wrote:
 On Tue, Nov 25, 2014 at 02:12:31PM +0100, Eric Auger wrote:
 On 11/25/2014 11:19 AM, Christoffer Dall wrote:
 [Re-adding cc list]

 On Mon, Nov 24, 2014 at 05:42:30PM +0100, Eric Auger wrote:
 On 11/24/2014 04:47 PM, Christoffer Dall wrote:
 On Mon, Nov 24, 2014 at 12:02 PM, Eric Auger eric.au...@linaro.org 
 wrote:
 On 11/24/2014 11:00 AM, Christoffer Dall wrote:
 On Sun, Nov 23, 2014 at 06:56:59PM +0100, Eric Auger wrote:
 This patch enables irqfd on arm.

 Both irqfd and resamplefd are supported. Injection is implemented
 in vgic.c without routing.

 This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD.

 KVM_CAP_IRQFD is now advertised. KVM_CAP_IRQFD_RESAMPLE capability
 automatically is advertised as soon as CONFIG_HAVE_KVM_IRQFD is set.

 Signed-off-by: Eric Auger eric.au...@linaro.org

 ---

 [...]

 +int kvm_set_irq(struct kvm *kvm, int irq_source_id,
 +u32 irq, int level, bool line_status)
 +{
 +unsigned int spi = irq + VGIC_NR_PRIVATE_IRQS;
 +
 +kvm_debug(irqfd sets vIRQ %d to %d\n, irq, level);
 +
 +if (likely(vgic_initialized(kvm))) {
 +if (spi  kvm-arch.vgic.nr_irqs)
 +return -EINVAL;
 +return kvm_vgic_inject_irq(kvm, 0, spi, level);
 +} else if (level  kvm_irq_has_notifier(kvm, 0, irq)) {
 +/*
 + * any IRQ injected before vgic readiness is
 + * ignored and the notifier, if any, is called
 + * immediately as if the virtual IRQ were completed
 + * by the guest
 + */
 +kvm_notify_acked_irq(kvm, 0, irq);
 +return -EAGAIN;

 This looks fishy, I don't fully understand which case you're catering
 towards here (the comment is hard to understand), but my gut feeling is
 that you should back out (probably with an error) if the vgic is not
 initialized when calling this function.  Setting up the routing in the
 first place should probably error out if the vgic is not available.
 The issue is related to integration with QEMU and VFIO.
 Currently VFIO signaling (we tell VFIO to signal the eventfd on a
 peculiar physical IRQ) and irqfd setup (we tell KVM/IRQFD to handle
 previous eventfd when triggered and inject a GSI) are done by QEMU
 *before* the first vcpu run. so before VGIC is ready.

 Both are done in a so called QEMU machine init done notifier. It could
 be done in a QEMU reset notifier but I guess the problem would be the
 same. and I think the same at which QEMU initializes it is correct.

 As soon as both VFIO signaling and irqfd are setup, virtual IRQ are
 likely to be injected and this is what happens on some circumstances.
 This happens on the 2d QEMU run after killing the 1st QEMU session. For
 some reason I guess the HW device - in my case the xgmac - was released
 in such a way the IRQ wire was not reset. So as soon as VFIO driver
 calls request_irq, the IRQ hits.

 I tried to change that this xgmac driver behavior but I must acknowledge
 that for the time beeing I failed. I will continue investigating that.

 Indeed I might be fixing the issue at the wrong place. But anyway this
 relies on the fact the assigned device driver toggled down the IRQ
 properly when releasing. At the time the signaling is setup we have not
 entered yet any driver reset code.

 I see, it's because of the quirky way we initialize the vgic at time
 of running the first CPU, so user space can't really hook into the
 sweet spot after initializing the vgic but just before actually
 running guest code.

 Yes currently irqfd generic code has no way to test if the virtual
 interrupt controller is ready (KVM__IRQFD - kvm_irqfd_assign )
 because I guess other archs don't have the problem.

 Could it be that we simply need to let user space init the vgic after
 creating all its vcpus and only then allow setting up the irqfd?

 I start VFIO signaling and setup KVM_IRQFD in a QEMU machine init done
 notifier. the vgic init then must happen before then.

 have all the vcpus that QEMU wants to create, been created by then?
 Hi Christoffer,

 My understanding of QEMU start sequence is as follows:
 at machine init, CPU realize function is called for each CPU
 (target-arm/cpu.c arm_cpu_realizefn)
 qemu_init_vcpu (cpus.c)  qemu_kvm_start_vcpu (cpus.c)  launch VCPU
 thread (qemu_kvm_cpu_thread_fn)  kvm_init_vcpu
 kvm_init_vcpu does the while (1) loop issuing KVM_RUN ioctl if *cpu_can_run*

 cpu_can_run returns true when vm_start is called
 (resume_all_vcpuscpu_resume)

 QEMU machine init done or reset callbacks happen after machine init and
 before vm_start.

 The actual vgic readiness is set in the first vcpu run, ie on the
 vm_start.

 vgic instantiation in virt machine file is done after cpu_instantiation.
 We could force vgic init at that time, in the gic realize fn
 (kvm_arm_gic_realize)?. All vcpus are instantiated: nr_vcpus and nr_irqs
 are known. Also base addresses are known.



 Currently 

Re: [PATCH 2/7] KVM: arm: guest debug, define API headers

2014-11-26 Thread Alex Bennée

Paolo Bonzini pbonz...@redhat.com writes:

 On 25/11/2014 18:13, Peter Maydell wrote:
 On 25 November 2014 at 17:05, Paolo Bonzini pbonz...@redhat.com wrote:
  So there is no register that says this breakpoint has triggered or
  this watchpoint has triggered?
 Nope. You take a debug exception; the syndrome register tells
 you if it was a bp or a wp, and if it was a wp the fault address
 register tells you the address being accessed (if it was a bp
 you know the program counter, obviously). The debugger is expected
 to be able to figure it out from there, if it cares.

 That's already good enough---do the KVM_DEBUG_EXIT_* constants match the
 syndrome register, or if not why?

No they don't. I did consider it at the time but I was wary of pulling
too much over into the uapi headers wholesale. If your happy to do that
I'll include the change in my next version.

I could also rationalise the exit handlers as they all pretty much do
the same thing (save for the exit/syndrome related info). Again I was
keeping things nicely separated in case any particular exception needed
excessive special case handling.

Would you like those changes?


 Paolo

-- 
Alex Bennée
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CFT PATCH v2 2/2] KVM: x86: support XSAVES usage in the host

2014-11-26 Thread Paolo Bonzini


On 26/11/2014 13:07, Radim Krčmář wrote:
 2014-11-24 17:43+0100, Paolo Bonzini:
 Userspace is expecting non-compacted format for KVM_GET_XSAVE, but
 struct xsave_struct might be using the compacted format.  Convert
 in order to preserve userspace ABI.

 Likewise, userspace is passing non-compacted format for KVM_SET_XSAVE
 but the kernel will pass it to XRSTORS, and we need to convert back.
 
 Future instructions might force us to calling xsave/xrstor directly, so
 we could do that even now and save the explicit conversion ...
 
 What I mean is:  we could be using the native xsave.*/xrstor.* while in
 kernel and use xsave/xrstor for communication with userspace.
 Hardware would take care of everything in the conversion.
 
 get_xsave = native_xrstor(guest_xsave);  xsave(aligned_userspace_buffer)
 set_xsave = xrstor(aligned_userspace_buffer);  native_xsave(guest_xsave)
 
 Could that work?

It could, though it is more like

   get_fpu()
   native_xrstor(guest_xsave)
   xsave(buffer)
   put_fpu()

and vice versa.  Also, the userspace buffer is mos likely not aligned,
so you need some kind of bounce buffer.  It can be done if the CPUID
turns out to be a bottleneck, apart from that it'd most likely be slower.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/7] KVM: arm: guest debug, define API headers

2014-11-26 Thread Paolo Bonzini


On 26/11/2014 14:13, Alex Bennée wrote:
 
 Paolo Bonzini pbonz...@redhat.com writes:
 
 On 25/11/2014 18:13, Peter Maydell wrote:
 On 25 November 2014 at 17:05, Paolo Bonzini pbonz...@redhat.com wrote:
 So there is no register that says this breakpoint has triggered or
 this watchpoint has triggered?
 Nope. You take a debug exception; the syndrome register tells
 you if it was a bp or a wp, and if it was a wp the fault address
 register tells you the address being accessed (if it was a bp
 you know the program counter, obviously). The debugger is expected
 to be able to figure it out from there, if it cares.

 That's already good enough---do the KVM_DEBUG_EXIT_* constants match the
 syndrome register, or if not why?
 
 No they don't. I did consider it at the time but I was wary of pulling
 too much over into the uapi headers wholesale. If your happy to do that
 I'll include the change in my next version.
 
 I could also rationalise the exit handlers as they all pretty much do
 the same thing (save for the exit/syndrome related info). Again I was
 keeping things nicely separated in case any particular exception needed
 excessive special case handling.
 
 Would you like those changes?

Yes, please.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: x86: Generate #UD when memory operand is required

2014-11-26 Thread Nadav Amit
Certain x86 instructions that use modrm operands only allow memory operand
(i.e., mod012), and cause a #UD exception otherwise. KVM ignores this fact.
Currently, the instructions that are such and are emulated by KVM are MOVBE,
MOVNTPS, MOVNTPD and MOVNTI.  MOVBE is the most blunt example, since it may be
emulated by the host regardless of MMIO.

The fix introduces a new group for handling such instructions, marking mod3 as
illegal instruction.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 38 ++
 1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 6a57157..3817334 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -123,6 +123,7 @@
 #define Prefix  (315) /* Instruction varies with 66/f2/f3 prefix */
 #define RMExt   (415) /* Opcode extension in ModRM r/m if mod == 3 */
 #define Escape  (515) /* Escape to coprocessor instruction */
+#define InstrDual   (615) /* Alternate instruction decoding of mod == 3 
*/
 #define Sse (118) /* SSE Vector instruction */
 /* Generic ModRM decode. */
 #define ModRM   (119)
@@ -211,6 +212,7 @@ struct opcode {
const struct group_dual *gdual;
const struct gprefix *gprefix;
const struct escape *esc;
+   const struct instr_dual *idual;
void (*fastop)(struct fastop *fake);
} u;
int (*check_perm)(struct x86_emulate_ctxt *ctxt);
@@ -233,6 +235,11 @@ struct escape {
struct opcode high[64];
 };
 
+struct instr_dual {
+   struct opcode mod012;
+   struct opcode mod3;
+};
+
 /* EFLAGS bit definitions. */
 #define EFLG_ID (121)
 #define EFLG_VIP (120)
@@ -3679,6 +3686,7 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
 #define EXT(_f, _e) { .flags = ((_f) | RMExt), .u.group = (_e) }
 #define G(_f, _g) { .flags = ((_f) | Group | ModRM), .u.group = (_g) }
 #define GD(_f, _g) { .flags = ((_f) | GroupDual | ModRM), .u.gdual = (_g) }
+#define ID(_f, _i) { .flags = ((_f) | InstrDual | ModRM), .u.idual = (_i) }
 #define E(_f, _e) { .flags = ((_f) | Escape | ModRM), .u.esc = (_e) }
 #define I(_f, _e) { .flags = (_f), .u.execute = (_e) }
 #define F(_f, _e) { .flags = (_f) | Fastop, .u.fastop = (_e) }
@@ -3840,8 +3848,12 @@ static const struct gprefix pfx_0f_6f_0f_7f = {
I(Mmx, em_mov), I(Sse | Aligned, em_mov), N, I(Sse | Unaligned, em_mov),
 };
 
+static const struct instr_dual instr_dual_0f_2b = {
+   I(0, em_mov), N
+};
+
 static const struct gprefix pfx_0f_2b = {
-   I(0, em_mov), I(0, em_mov), N, N,
+   ID(0, instr_dual_0f_2b), ID(0, instr_dual_0f_2b), N, N,
 };
 
 static const struct gprefix pfx_0f_28_0f_29 = {
@@ -3915,6 +3927,10 @@ static const struct escape escape_dd = { {
N, N, N, N, N, N, N, N,
 } };
 
+static const struct instr_dual instr_dual_0f_c3 = {
+   I(DstMem | SrcReg | ModRM | No16 | Mov, em_mov), N
+};
+
 static const struct opcode opcode_table[256] = {
/* 0x00 - 0x07 */
F6ALU(Lock, em_add),
@@ -4117,7 +4133,7 @@ static const struct opcode twobyte_table[256] = {
D(DstReg | SrcMem8 | ModRM | Mov), D(DstReg | SrcMem16 | ModRM | Mov),
/* 0xC0 - 0xC7 */
F2bv(DstMem | SrcReg | ModRM | SrcWrite | Lock, em_xadd),
-   N, I(DstMem | SrcReg | ModRM | No16 | Mov, em_mov),
+   N, ID(0, instr_dual_0f_c3),
N, N, N, GD(0, group9),
/* 0xC8 - 0xCF */
X8(I(DstReg, em_bswap)),
@@ -4130,12 +4146,20 @@ static const struct opcode twobyte_table[256] = {
N, N, N, N, N, N, N, N, N, N, N, N, N, N, N, N
 };
 
+static const struct instr_dual instr_dual_0f_38_f0 = {
+   I(DstReg | SrcMem | Mov, em_movbe), N
+};
+
+static const struct instr_dual instr_dual_0f_38_f1 = {
+   I(DstMem | SrcReg | Mov, em_movbe), N
+};
+
 static const struct gprefix three_byte_0f_38_f0 = {
-   I(DstReg | SrcMem | Mov, em_movbe), N, N, N
+   ID(0, instr_dual_0f_38_f0), N, N, N
 };
 
 static const struct gprefix three_byte_0f_38_f1 = {
-   I(DstMem | SrcReg | Mov, em_movbe), N, N, N
+   ID(0, instr_dual_0f_38_f1), N, N, N
 };
 
 /*
@@ -4536,6 +4560,12 @@ done_prefixes:
else
opcode = opcode.u.esc-op[(ctxt-modrm  3)  
7];
break;
+   case InstrDual:
+   if ((ctxt-modrm  6) == 3)
+   opcode = opcode.u.idual-mod3;
+   else
+   opcode = opcode.u.idual-mod012;
+   break;
default:
return EMULATION_FAILED;
}
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests] x86: Test illegal movbe

2014-11-26 Thread Nadav Amit
Previously KVM ignored the mod field of MOVBE instruction, so MOVBE from
register to register succeeds, although it should fail (cause a #UD exception).
This test check that a #UD is indeed delivered upon such MOVBE.

The test would not work if MOVBE is unsupported.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 x86/emulator.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/x86/emulator.c b/x86/emulator.c
index 5aa4dbf..709978b 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -1051,6 +1051,27 @@ static void test_simplealu(u32 *mem)
 report(test, *mem == 0x8400);
 }
 
+static void illegal_movbe_handler(struct ex_regs *regs)
+{
+   extern char bad_movbe_cont;
+
+   ++exceptions;
+   regs-rip = (ulong)bad_movbe_cont;
+}
+
+static void test_illegal_movbe(void)
+{
+   if (!(cpuid(1).c  (1  22)))
+   printf(SKIP: illegal movbe\n);
+
+   exceptions = 0;
+   handle_exception(UD_VECTOR, illegal_movbe_handler);
+   asm volatile(.byte 0x0f; .byte 0x38; .byte 0xf0; .byte 0xc0;\n\t
+ bad_movbe_cont: : : : rax);
+   report(illegal movbe, exceptions == 1);
+   handle_exception(UD_VECTOR, 0);
+}
+
 int main()
 {
void *mem;
@@ -1119,6 +1140,7 @@ int main()
test_string_io_mmio(mem);
 
test_jmp_noncanonical(mem);
+   test_illegal_movbe();
 
return report_summary();
 }
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: Generate #UD when memory operand is required

2014-11-26 Thread Paolo Bonzini


On 26/11/2014 14:47, Nadav Amit wrote:
 Certain x86 instructions that use modrm operands only allow memory operand
 (i.e., mod012), and cause a #UD exception otherwise. KVM ignores this fact.
 Currently, the instructions that are such and are emulated by KVM are MOVBE,
 MOVNTPS, MOVNTPD and MOVNTI.  MOVBE is the most blunt example, since it may be
 emulated by the host regardless of MMIO.
 
 The fix introduces a new group for handling such instructions, marking mod3 as
 illegal instruction.
 
 Signed-off-by: Nadav Amit na...@cs.technion.ac.il
 ---
  arch/x86/kvm/emulate.c | 38 ++
  1 file changed, 34 insertions(+), 4 deletions(-)
 
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index 6a57157..3817334 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -123,6 +123,7 @@
  #define Prefix  (315) /* Instruction varies with 66/f2/f3 prefix */
  #define RMExt   (415) /* Opcode extension in ModRM r/m if mod == 3 
 */
  #define Escape  (515) /* Escape to coprocessor instruction */
 +#define InstrDual   (615) /* Alternate instruction decoding of mod == 
 3 */
  #define Sse (118) /* SSE Vector instruction */
  /* Generic ModRM decode. */
  #define ModRM   (119)
 @@ -211,6 +212,7 @@ struct opcode {
   const struct group_dual *gdual;
   const struct gprefix *gprefix;
   const struct escape *esc;
 + const struct instr_dual *idual;
   void (*fastop)(struct fastop *fake);
   } u;
   int (*check_perm)(struct x86_emulate_ctxt *ctxt);
 @@ -233,6 +235,11 @@ struct escape {
   struct opcode high[64];
  };
  
 +struct instr_dual {
 + struct opcode mod012;
 + struct opcode mod3;
 +};
 +
  /* EFLAGS bit definitions. */
  #define EFLG_ID (121)
  #define EFLG_VIP (120)
 @@ -3679,6 +3686,7 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
  #define EXT(_f, _e) { .flags = ((_f) | RMExt), .u.group = (_e) }
  #define G(_f, _g) { .flags = ((_f) | Group | ModRM), .u.group = (_g) }
  #define GD(_f, _g) { .flags = ((_f) | GroupDual | ModRM), .u.gdual = (_g) }
 +#define ID(_f, _i) { .flags = ((_f) | InstrDual | ModRM), .u.idual = (_i) }
  #define E(_f, _e) { .flags = ((_f) | Escape | ModRM), .u.esc = (_e) }
  #define I(_f, _e) { .flags = (_f), .u.execute = (_e) }
  #define F(_f, _e) { .flags = (_f) | Fastop, .u.fastop = (_e) }
 @@ -3840,8 +3848,12 @@ static const struct gprefix pfx_0f_6f_0f_7f = {
   I(Mmx, em_mov), I(Sse | Aligned, em_mov), N, I(Sse | Unaligned, em_mov),
  };
  
 +static const struct instr_dual instr_dual_0f_2b = {
 + I(0, em_mov), N
 +};
 +
  static const struct gprefix pfx_0f_2b = {
 - I(0, em_mov), I(0, em_mov), N, N,
 + ID(0, instr_dual_0f_2b), ID(0, instr_dual_0f_2b), N, N,
  };
  
  static const struct gprefix pfx_0f_28_0f_29 = {
 @@ -3915,6 +3927,10 @@ static const struct escape escape_dd = { {
   N, N, N, N, N, N, N, N,
  } };
  
 +static const struct instr_dual instr_dual_0f_c3 = {
 + I(DstMem | SrcReg | ModRM | No16 | Mov, em_mov), N
 +};
 +
  static const struct opcode opcode_table[256] = {
   /* 0x00 - 0x07 */
   F6ALU(Lock, em_add),
 @@ -4117,7 +4133,7 @@ static const struct opcode twobyte_table[256] = {
   D(DstReg | SrcMem8 | ModRM | Mov), D(DstReg | SrcMem16 | ModRM | Mov),
   /* 0xC0 - 0xC7 */
   F2bv(DstMem | SrcReg | ModRM | SrcWrite | Lock, em_xadd),
 - N, I(DstMem | SrcReg | ModRM | No16 | Mov, em_mov),
 + N, ID(0, instr_dual_0f_c3),
   N, N, N, GD(0, group9),
   /* 0xC8 - 0xCF */
   X8(I(DstReg, em_bswap)),
 @@ -4130,12 +4146,20 @@ static const struct opcode twobyte_table[256] = {
   N, N, N, N, N, N, N, N, N, N, N, N, N, N, N, N
  };
  
 +static const struct instr_dual instr_dual_0f_38_f0 = {
 + I(DstReg | SrcMem | Mov, em_movbe), N
 +};
 +
 +static const struct instr_dual instr_dual_0f_38_f1 = {
 + I(DstMem | SrcReg | Mov, em_movbe), N
 +};
 +
  static const struct gprefix three_byte_0f_38_f0 = {
 - I(DstReg | SrcMem | Mov, em_movbe), N, N, N
 + ID(0, instr_dual_0f_38_f0), N, N, N
  };
  
  static const struct gprefix three_byte_0f_38_f1 = {
 - I(DstMem | SrcReg | Mov, em_movbe), N, N, N
 + ID(0, instr_dual_0f_38_f1), N, N, N
  };
  
  /*
 @@ -4536,6 +4560,12 @@ done_prefixes:
   else
   opcode = opcode.u.esc-op[(ctxt-modrm  3)  
 7];
   break;
 + case InstrDual:
 + if ((ctxt-modrm  6) == 3)
 + opcode = opcode.u.idual-mod3;
 + else
 + opcode = opcode.u.idual-mod012;
 + break;
   default:
   return EMULATION_FAILED;
   }
 

Thanks, looks good (not applied yet).

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to 

[GIT PULL] KVM changes for 3.18-rc7 (or final)

2014-11-26 Thread Paolo Bonzini
Linus,

The following changes since commit b914c5b2130239fd378d1a719ab3c53f0c782be9:

  Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linux (2014-11-25 
19:05:41 -0800)

are available in the git repository at:


  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to d3fccc7ef831d1d829b4da5eaa081db55b1e38f3:

  kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn() (2014-11-26 
14:40:45 +0100)


Last minute KVM/ARM fixes; even the generic change actually
affects nothing but ARM.


Ard Biesheuvel (2):
  arm/arm64: kvm: drop inappropriate use of kvm_is_mmio_pfn()
  kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()

Christoffer Dall (2):
  arm64: KVM: Handle traps of ICC_SRE_EL1 as RAZ/WI
  arm/arm64: KVM: vgic: Fix error code in kvm_vgic_create()

Mark Rutland (1):
  arm64: KVM: fix unmapping with 48-bit VAs

 arch/arm/kvm/mmu.c| 10 --
 arch/arm64/kvm/sys_regs.c |  9 +
 arch/ia64/kvm/kvm-ia64.c  |  2 +-
 arch/x86/kvm/mmu.c|  6 +++---
 include/linux/kvm_host.h  |  2 +-
 virt/kvm/arm/vgic.c   |  8 
 virt/kvm/kvm_main.c   | 16 
 7 files changed, 34 insertions(+), 19 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CFT PATCH v2 2/2] KVM: x86: support XSAVES usage in the host

2014-11-26 Thread Radim Krčmář
2014-11-26 14:13+0100, Paolo Bonzini:
 On 26/11/2014 13:07, Radim Krčmář wrote:
  2014-11-24 17:43+0100, Paolo Bonzini:
  Userspace is expecting non-compacted format for KVM_GET_XSAVE, but
  struct xsave_struct might be using the compacted format.  Convert
  in order to preserve userspace ABI.
 
  Likewise, userspace is passing non-compacted format for KVM_SET_XSAVE
  but the kernel will pass it to XRSTORS, and we need to convert back.
  
  Future instructions might force us to calling xsave/xrstor directly, so
  we could do that even now and save the explicit conversion ...
  
  What I mean is:  we could be using the native xsave.*/xrstor.* while in
  kernel and use xsave/xrstor for communication with userspace.
  Hardware would take care of everything in the conversion.
  
  get_xsave = native_xrstor(guest_xsave);  xsave(aligned_userspace_buffer)
  set_xsave = xrstor(aligned_userspace_buffer);  native_xsave(guest_xsave)
  
  Could that work?
 
 It could, though it is more like
 
get_fpu()
native_xrstor(guest_xsave)
xsave(buffer)
put_fpu()
 
 and vice versa.  Also, the userspace buffer is mos likely not aligned,
 so you need some kind of bounce buffer.  It can be done if the CPUID
 turns out to be a bottleneck, apart from that it'd most likely be slower.

Yeah, it was mostly making this code more future-proof ... it is easier
to convince xsave.h to export its structures if CPUID is the problem.
(I still see some hope for Linux, so performance isn't my primary goal.)

I'm quite interested in CPUID now though, so I'll try to benchmark it,
someday.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 25/42] vhost: add memory access wrappers

2014-11-26 Thread Cornelia Huck
On Tue, 25 Nov 2014 18:43:10 +0200
Michael S. Tsirkin m...@redhat.com wrote:

These wrappers are needed to handle virtio endianness conversions.

?

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  drivers/vhost/vhost.h | 33 -
  1 file changed, 32 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
 index 3eda654..b9032e8 100644
 --- a/drivers/vhost/vhost.h
 +++ b/drivers/vhost/vhost.h
 @@ -174,6 +174,37 @@ enum {
 
  static inline int vhost_has_feature(struct vhost_virtqueue *vq, int bit)
  {
 - return vq-acked_features  (1  bit);
 + return vq-acked_features  (1ULL  bit);

Should this hunk go into patch 28?

 +}

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests] x86: Test illegal movbe

2014-11-26 Thread Paolo Bonzini


On 26/11/2014 14:48, Nadav Amit wrote:
 Previously KVM ignored the mod field of MOVBE instruction, so MOVBE from
 register to register succeeds, although it should fail (cause a #UD 
 exception).
 This test check that a #UD is indeed delivered upon such MOVBE.
 
 The test would not work if MOVBE is unsupported.
 
 Signed-off-by: Nadav Amit na...@cs.technion.ac.il
 ---
  x86/emulator.c | 22 ++
  1 file changed, 22 insertions(+)
 
 diff --git a/x86/emulator.c b/x86/emulator.c
 index 5aa4dbf..709978b 100644
 --- a/x86/emulator.c
 +++ b/x86/emulator.c
 @@ -1051,6 +1051,27 @@ static void test_simplealu(u32 *mem)
  report(test, *mem == 0x8400);
  }
  
 +static void illegal_movbe_handler(struct ex_regs *regs)
 +{
 + extern char bad_movbe_cont;
 +
 + ++exceptions;
 + regs-rip = (ulong)bad_movbe_cont;
 +}
 +
 +static void test_illegal_movbe(void)
 +{
 + if (!(cpuid(1).c  (1  22)))
 + printf(SKIP: illegal movbe\n);
 +
 + exceptions = 0;
 + handle_exception(UD_VECTOR, illegal_movbe_handler);
 + asm volatile(.byte 0x0f; .byte 0x38; .byte 0xf0; .byte 0xc0;\n\t
 +   bad_movbe_cont: : : : rax);
 + report(illegal movbe, exceptions == 1);
 + handle_exception(UD_VECTOR, 0);
 +}
 +
  int main()
  {
   void *mem;
 @@ -1119,6 +1140,7 @@ int main()
   test_string_io_mmio(mem);
  
   test_jmp_noncanonical(mem);
 + test_illegal_movbe();
  
   return report_summary();
  }
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Thanks,

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CFT PATCH v2 2/2] KVM: x86: support XSAVES usage in the host

2014-11-26 Thread Paolo Bonzini


On 26/11/2014 14:53, Radim Krčmář wrote:
   get_xsave = native_xrstor(guest_xsave);  xsave(aligned_userspace_buffer)
   set_xsave = xrstor(aligned_userspace_buffer);  native_xsave(guest_xsave)
   
   Could that work?
  
  It could, though it is more like
  
 get_fpu()
 native_xrstor(guest_xsave)
 xsave(buffer)
 put_fpu()
  
  and vice versa.  Also, the userspace buffer is mos likely not aligned,
  so you need some kind of bounce buffer.  It can be done if the CPUID
  turns out to be a bottleneck, apart from that it'd most likely be slower.
 Yeah, it was mostly making this code more future-proof ... it is easier
 to convince xsave.h to export its structures if CPUID is the problem.
 (I still see some hope for Linux, so performance isn't my primary goal.)
 
 I'm quite interested in CPUID now though, so I'll try to benchmark it,
 someday.

I'm not sure what is more future proof. :)  I wonder if native_xrstor
could be a problem the day XRSTORS actually sets/restores MSRs as the
processor documentation promises.  We do not need that to pass them to
userspace via KVM_GET/SET_XSAVE because we have KVM_GET/SET_MSR for
that, but it may cause problems if get_xsave uses XRSTORS and thus sets
the MSRs to unanticipated values.  Difficult to say without more
information on Intel's plans.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


With '-cpu host' in L1 nested guest fails: KVM: entry failed, hardware error 0x7

2014-11-26 Thread Kashyap Chamarthy
Hi,

This error occurs when using using '-cpu host'. And, is consistently
reproducible with Fedora 21's Kernels.


One way to reproduce the issue
--

a) Enable nested virt on host, I use this procedure[1]

b) Boot a a guest (ensure /dev/kvm character device shows up)

c) Invoke `libguestfs-test-tool` utility in L1 (which will run a QEMU
   appliance with -cpu host). You should see the failure in this log:

~/.cache/libvirt/qemu/log/guestfs-vegb4aor6a8d1r3e.log


QEMU CLI: 

[. . .]
2014-11-26 12:16:02.449+: starting up
LC_ALL=C 
PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/kashyapc/.local/bin:/home/kashyapc/bin:/usr/local/sbin:/usr/sbin:/sbin
 HOME=/home/kashyapc USER=kashyapc LOGNAME=kashyapc QEMU_AUDIO_DRV=none 
TMPDIR=/var/tmp /bin/qemu-kvm -name guestfs-vegb4aor6a8d1r3e -S -machine 
pc-i440fx-2.1,accel=kvm,usb=off -cpu host -m 500 -realtime mlock=off -smp 
1,sockets=1,cores=1,threads=1 -uuid 27869b93-0b83-4b57-a909-9265c677d307 
-nographic -no-user-config -nodefaults -device sga -chardev 
socket,id=charmonitor,path=/home/kashyapc/.config/libvirt/qemu/lib/guestfs-vegb4aor6a8d1r3e.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew 
-global kvm-pit.lost_tick_policy=discard -no-hpet -no-reboot -no-acpi -boot 
strict=on -kernel /var/tmp/.guestfs-1000/appliance.d/kernel -initrd 
/var/tmp/.guestfs-1000/appliance.d/initrd -append panic=1 console=ttyS0 
udevtimeout=6000 udev.event-timeout=6000 no_timer_check acpi=off printk.time=1 
cgroup_disable=memory root=/dev/sdb selinux=0 TERM=screen -device 
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device 
virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 -device 
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive 
file=/tmp/libguestfsenp1an/devnull1,if=none,id=drive-scsi0-0-0-0,format=raw,cache=writeback
 -device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
 -drive 
file=/tmp/libguestfsenp1an/overlay2,if=none,id=drive-scsi0-0-1-0,format=qcow2,cache=unsafe
 -device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=1,lun=0,drive=drive-scsi0-0-1-0,id=scsi0-0-1-0
 -chardev socket,id=charserial0,path=/tmp/libguestfsenp1an/console.sock -device 
isa-serial,chardev=charserial0,id=serial0 -chardev 
socket,id=charchannel0,path=/tmp/libguestfsenp1an/guestfsd.sock -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.libguestfs.channel.0
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
Domain id=2 is tainted: custom-argv
Domain id=2 is tainted: host-cpu
KVM: entry failed, hardware error 0x7
RAX=000f RBX= RCX=038f 
RDX=
RSI=000f RDI=038f RBP=88001ee4faa0 
RSP=88001ee4faa0
R8 = R9 =88001f00cf60 R10=88001ee4fa48 
R11=0005
R12=038f R13=81809390 R14= 
R15=88001f00ca40
RIP=8105b8ca RFL=0046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =   00c0
CS =0010   00a09b00 DPL=0 CS64 [-RA]
SS =   00c0
DS =   00c0
FS =   00c0
GS = 88001f00  00c0
LDT=   00c0
TR =0040 88001f012080 2087 8b00 DPL=0 TSS64-busy
GDT= 88001f00a000 007f
IDT= ff56b000 0fff
CR0=8005003b CR2= CR3=01c11000 CR4=07f0
DR0= DR1= DR2= 
DR3= 
DR6=0ff0 DR7=0400
EFER=0d01
Code=20 48 89 f8 48 09 f0 5d c3 90 55 89 f0 89 f9 48 89 e5 0f 30 31 c0 5d c3 
66 90 55 89 f9 48 89 e5 0f 33 48 89 d7 89 c1 48 c1 e7 20 48 89 f8 48 09 c8 5d
[. . .]


I'll admit, with Fedora Rawhide Kernel, libvirt and QEMU versions I
could not reproduce the issue:

  $ uname -r; rpm -q qemu-system-x86 libvirt-daemon-kvm 
  3.18.0-0.rc6.git0.1.fc22.x86_64
  qemu-system-x86-2.2.0-0.1.rc1.fc22.x86_64
  libvirt-daemon-kvm-1.2.10-3.fc22.x86_64


There's an existing bug for Fedora, here[2]


[1] 
https://kashyapc.fedorapeople.org/virt/procedure-to-enable-nested-virt-on-intel-machines.txt
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1016748#c9 -- KVM: entry
failed, hardware error 0x7 for nested kvm guest

-- 
/kashyap
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 25/42] vhost: add memory access wrappers

2014-11-26 Thread Michael S. Tsirkin
On Wed, Nov 26, 2014 at 02:54:38PM +0100, Cornelia Huck wrote:
 On Tue, 25 Nov 2014 18:43:10 +0200
 Michael S. Tsirkin m...@redhat.com wrote:
 
 These wrappers are needed to handle virtio endianness conversions.
 
 ?

yes, it's same as virtio ones. I'll add this text, thanks.

  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   drivers/vhost/vhost.h | 33 -
   1 file changed, 32 insertions(+), 1 deletion(-)
  
  diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
  index 3eda654..b9032e8 100644
  --- a/drivers/vhost/vhost.h
  +++ b/drivers/vhost/vhost.h
  @@ -174,6 +174,37 @@ enum {
  
   static inline int vhost_has_feature(struct vhost_virtqueue *vq, int bit)
   {
  -   return vq-acked_features  (1  bit);
  +   return vq-acked_features  (1ULL  bit);
 
 Should this hunk go into patch 28?

Well, this is needed here since 1  32 is not legal C.

I can move it - this means patch 28 will have to move earlier
in series though.



  +}
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Another Obsolete Fix me in trace.h?

2014-11-26 Thread Steven Rostedt
On Wed, 26 Nov 2014 13:40:26 +0100
Radim Krčmář rkrc...@redhat.com wrote:

 2014-11-24 22:49+0100, Radim Krčmář:
  2014-11-24 16:19-0500, Steven Rostedt:
   That wouldn't be too hard to implement.
  
  I'll look at the patch tommorrow.
 
 The kernel part is trivial.
 Most of the code is going to be in tools/lib/traceevent/event-parse.c.
 
 I wasn't sure whether to
 1) define new 'enum print_arg_type' for our function and do exactly what
other __print* did
 2) extend existing PRINT_FUNC with variable arguments and register it as
a global event handler
 
 as (2) makes more sense to me, but we already had it when some __print*
 was implemented ... (and I didn't want to dig too deep into the project)

Add a new hardcoded instance. The PRINT_FUNC is for plugins that have a
unique function for the events they handle. Notice that event-parse
does not define any func handlers. If this is to be generic, then lets
make it specified as a new enum.

Thanks,

-- Steve
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 25/42] vhost: add memory access wrappers

2014-11-26 Thread Cornelia Huck
On Wed, 26 Nov 2014 16:05:39 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 On Wed, Nov 26, 2014 at 02:54:38PM +0100, Cornelia Huck wrote:
  On Tue, 25 Nov 2014 18:43:10 +0200
  Michael S. Tsirkin m...@redhat.com wrote:

   @@ -174,6 +174,37 @@ enum {
   
static inline int vhost_has_feature(struct vhost_virtqueue *vq, int bit)
{
   - return vq-acked_features  (1  bit);
   + return vq-acked_features  (1ULL  bit);
  
  Should this hunk go into patch 28?
 
 Well, this is needed here since 1  32 is not legal C.
 
 I can move it - this means patch 28 will have to move earlier
 in series though.

Yes, I think it makes sense to move patch 28 earlier.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/7] KVM: add commentary for kvm_debug_exit_arch struct

2014-11-26 Thread Andrew Jones
On Tue, Nov 25, 2014 at 04:09:59PM +, Alex Bennée wrote:
 Bring into line with the commentary for the other structures and their
 KVM_EXIT_* cases.
 
 Signed-off-by: Alex Bennée alex.ben...@linaro.org
 
 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
 index 6076882..523f476 100644
 --- a/include/uapi/linux/kvm.h
 +++ b/include/uapi/linux/kvm.h
 @@ -226,6 +226,7 @@ struct kvm_run {
   __u32 count;
   __u64 data_offset; /* relative to kvm_run start */
   } io;
 + /* KVM_EXIT_DEBUG */
   struct {
   struct kvm_debug_exit_arch arch;
   } debug;
 -- 
 2.1.3

Can we change this to a 'KVM: add commentary for kvm exit type data'
patch? We're also missing

/* KVM_EXIT_INTERNAL_ERROR */

and

/* KVM_EXIT_PAPR_HCALL */

 
 ___
 kvmarm mailing list
 kvm...@lists.cs.columbia.edu
 https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 25/42] vhost: add memory access wrappers

2014-11-26 Thread Michael S. Tsirkin
On Wed, Nov 26, 2014 at 03:17:50PM +0100, Cornelia Huck wrote:
 On Wed, 26 Nov 2014 16:05:39 +0200
 Michael S. Tsirkin m...@redhat.com wrote:
 
  On Wed, Nov 26, 2014 at 02:54:38PM +0100, Cornelia Huck wrote:
   On Tue, 25 Nov 2014 18:43:10 +0200
   Michael S. Tsirkin m...@redhat.com wrote:
 
@@ -174,6 +174,37 @@ enum {

 static inline int vhost_has_feature(struct vhost_virtqueue *vq, int 
bit)
 {
-   return vq-acked_features  (1  bit);
+   return vq-acked_features  (1ULL  bit);
   
   Should this hunk go into patch 28?
  
  Well, this is needed here since 1  32 is not legal C.
  
  I can move it - this means patch 28 will have to move earlier
  in series though.
 
 Yes, I think it makes sense to move patch 28 earlier.

Will do, thanks.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/7] KVM: arm: guest debug, define API headers

2014-11-26 Thread Andrew Jones
On Tue, Nov 25, 2014 at 04:10:00PM +, Alex Bennée wrote:
 This commit defines the API headers for guest debugging. There are two
 architecture specific debug structures:
 
   - kvm_guest_debug_arch, allows us to pass in HW debug registers
   - kvm_debug_exit_arch, signals the exact debug exit and address
 
 The type of debugging being used is control by the architecture specific
 control bits of the kvm_guest_debug-control flags in the ioctl
 structure.
 
 Signed-off-by: Alex Bennée alex.ben...@linaro.org
 
 diff --git a/arch/arm64/include/uapi/asm/kvm.h 
 b/arch/arm64/include/uapi/asm/kvm.h
 index 8e38878..de2450c 100644
 --- a/arch/arm64/include/uapi/asm/kvm.h
 +++ b/arch/arm64/include/uapi/asm/kvm.h
 @@ -93,10 +93,30 @@ struct kvm_sregs {
  struct kvm_fpu {
  };
  
 +/* See ARM ARM D7.3: Debug Registers
 + *
 + * The control registers are stored as 64bit values as the setup code
 + * is shared with the normal cpu context restore code in hyp.S which
 + * is 64 bit only.
 + */
 +#define KVM_ARM_NDBG_REGS 16
  struct kvm_guest_debug_arch {
 + __u64 dbg_bcr[KVM_ARM_NDBG_REGS];
 + __u64 dbg_bvr[KVM_ARM_NDBG_REGS];
 + __u64 dbg_wcr[KVM_ARM_NDBG_REGS];
 + __u64 dbg_wvr[KVM_ARM_NDBG_REGS];
  };
  
 +/* Exit types which define why we did a debug exit */
 +#define KVM_DEBUG_EXIT_ERROR 0x0
 +#define KVM_DEBUG_EXIT_SINGLE_STEP   0x1
 +#define KVM_DEBUG_EXIT_SW_BKPT   0x2
 +#define KVM_DEBUG_EXIT_HW_BKPT   0x3
 +#define KVM_DEBUG_EXIT_HW_WTPT   0x4
 +
  struct kvm_debug_exit_arch {
 + __u64 address;
 + __u32 exit_type;
  };
  
  struct kvm_sync_regs {
 @@ -198,4 +218,12 @@ struct kvm_arch_memory_slot {
  
  #endif
  
 +/* Architecture related debug defines - upper 16 bits of
 + * kvm_guest_debug-control
 + */
 +#define KVM_GUESTDBG_USE_SW_BP_SHIFT 16
 +#define KVM_GUESTDBG_USE_SW_BP   (1  
 KVM_GUESTDBG_USE_SW_BP_SHIFT)
 +#define KVM_GUESTDBG_USE_HW_BP_SHIFT 17
 +#define KVM_GUESTDBG_USE_HW_BP   (1  
 KVM_GUESTDBG_USE_HW_BP_SHIFT)
 +

I see this are defined in arch/x86/include/uapi/asm/kvm.h,
so you needed to reproduce them here, but shouldn't they
be promoted to include/uapi/linux/kvm.h instead?

  #endif /* __ARM_KVM_H__ */
 -- 
 2.1.3
 
 ___
 kvmarm mailing list
 kvm...@lists.cs.columbia.edu
 https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 26/42] vhost/net: force len for TX to host endian

2014-11-26 Thread Cornelia Huck
On Tue, 25 Nov 2014 18:43:14 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 We use native endian-ness internally but never
 expose it to guest.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  drivers/vhost/net.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)
 
 diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 index 8dae2f7..dce5c58 100644
 --- a/drivers/vhost/net.c
 +++ b/drivers/vhost/net.c
 @@ -48,15 +48,15 @@ MODULE_PARM_DESC(experimental_zcopytx, Enable Zero Copy 
 TX;
   * status internally; used for zerocopy tx only.
   */
  /* Lower device DMA failed */
 -#define VHOST_DMA_FAILED_LEN 3
 +#define VHOST_DMA_FAILED_LEN ((__force __virtio32)3)
  /* Lower device DMA done */
 -#define VHOST_DMA_DONE_LEN   2
 +#define VHOST_DMA_DONE_LEN   ((__force __virtio32)2)
  /* Lower device DMA in progress */
 -#define VHOST_DMA_IN_PROGRESS1
 +#define VHOST_DMA_IN_PROGRESS((__force __virtio32)1)
  /* Buffer unused */
 -#define VHOST_DMA_CLEAR_LEN  0
 +#define VHOST_DMA_CLEAR_LEN  ((__force __virtio32)0)

I find these constants a bit confusing: What does __virtio32 mean
without the context of a vq or device?

 
 -#define VHOST_DMA_IS_DONE(len) ((len) = VHOST_DMA_DONE_LEN)
 +#define VHOST_DMA_IS_DONE(len) ((__force u32)(len) = (__force 
 u32)VHOST_DMA_DONE_LEN)

And here you cast it to a plain u32 again.

I looked at the final code, and you seem either to use the above
constants for .len or do a cpu_to_vhost32(). Wouldn't you need to
convert the constants as well?

 
  enum {
   VHOST_NET_FEATURES = VHOST_FEATURES |

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/7] KVM: arm: guest debug, add stub KVM_SET_GUEST_DEBUG ioctl

2014-11-26 Thread Andrew Jones
On Tue, Nov 25, 2014 at 04:10:01PM +, Alex Bennée wrote:
 This commit adds a stub function to support the KVM_SET_GUEST_DEBUG
 ioctl. Currently any operation flag will return EINVAL. Actual
 functionality will be added with further patches.

Technically the stub is already there, and you're extending it to
start looking at control flags, but still not doing anything yet.

 
 Signed-off-by: Alex Bennée alex.ben...@linaro.org.
 
 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index 7610eaa..2c6386e 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -2570,7 +2570,7 @@ handled.
  4.87 KVM_SET_GUEST_DEBUG
  
  Capability: KVM_CAP_SET_GUEST_DEBUG
 -Architectures: x86, s390, ppc
 +Architectures: x86, s390, ppc, arm64
  Type: vcpu ioctl
  Parameters: struct kvm_guest_debug (in)
  Returns: 0 on success; -1 on error
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index 9e193c8..a0ff410 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -180,6 +180,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
 ext)
   case KVM_CAP_ARM_PSCI:
   case KVM_CAP_ARM_PSCI_0_2:
   case KVM_CAP_READONLY_MEM:
 + case KVM_CAP_SET_GUEST_DEBUG:
   r = 1;
   break;
   case KVM_CAP_COALESCED_MMIO:
 @@ -302,7 +303,34 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
  int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
   struct kvm_guest_debug *dbg)
  {
 - return -EINVAL;
 + /* If it's not enabled clear all flags */
 + if (!(dbg-control  KVM_GUESTDBG_ENABLE)) {
 + vcpu-guest_debug = 0;
 + return 0;
 + }
 +
 + vcpu-guest_debug = dbg-control;
 + kvm_info(%s: guest_debug is 0x%lx\n, __func__, vcpu-guest_debug);
 +
 + /* Single Step */
 + if (vcpu-guest_debug  KVM_GUESTDBG_SINGLESTEP) {
 + kvm_info(SS requested, not yet implemented\n);
 + return -EINVAL;
 + }
 +
 + /* Software Break Points */
 + if (vcpu-guest_debug  KVM_GUESTDBG_USE_SW_BP) {
 + kvm_info(SW BP support requested, not yet implemented\n);
 + return -EINVAL;
 + }
 +
 + /* Hardware assisted Break and Watch points */
 + if (vcpu-guest_debug  KVM_GUESTDBG_USE_HW_BP) {
 + kvm_info(HW BP support requested, not yet implemented\n);
 + return -EINVAL;
 + }
 +
 + return 0;
  }

I guess all the kvm_info's were useful for developing this patch series,
but do we still need them?

  
  
 -- 
 2.1.3
 
 ___
 kvmarm mailing list
 kvm...@lists.cs.columbia.edu
 https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 26/42] vhost/net: force len for TX to host endian

2014-11-26 Thread Michael S. Tsirkin
On Wed, Nov 26, 2014 at 03:31:02PM +0100, Cornelia Huck wrote:
 On Tue, 25 Nov 2014 18:43:14 +0200
 Michael S. Tsirkin m...@redhat.com wrote:
 
  We use native endian-ness internally but never
  expose it to guest.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   drivers/vhost/net.c | 10 +-
   1 file changed, 5 insertions(+), 5 deletions(-)
  
  diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
  index 8dae2f7..dce5c58 100644
  --- a/drivers/vhost/net.c
  +++ b/drivers/vhost/net.c
  @@ -48,15 +48,15 @@ MODULE_PARM_DESC(experimental_zcopytx, Enable Zero 
  Copy TX;
* status internally; used for zerocopy tx only.
*/
   /* Lower device DMA failed */
  -#define VHOST_DMA_FAILED_LEN   3
  +#define VHOST_DMA_FAILED_LEN   ((__force __virtio32)3)
   /* Lower device DMA done */
  -#define VHOST_DMA_DONE_LEN 2
  +#define VHOST_DMA_DONE_LEN ((__force __virtio32)2)
   /* Lower device DMA in progress */
  -#define VHOST_DMA_IN_PROGRESS  1
  +#define VHOST_DMA_IN_PROGRESS  ((__force __virtio32)1)
   /* Buffer unused */
  -#define VHOST_DMA_CLEAR_LEN0
  +#define VHOST_DMA_CLEAR_LEN((__force __virtio32)0)
 
 I find these constants a bit confusing: What does __virtio32 mean
 without the context of a vq or device?
 
  
  -#define VHOST_DMA_IS_DONE(len) ((len) = VHOST_DMA_DONE_LEN)
  +#define VHOST_DMA_IS_DONE(len) ((__force u32)(len) = (__force 
  u32)VHOST_DMA_DONE_LEN)
 
 And here you cast it to a plain u32 again.
 
 I looked at the final code, and you seem either to use the above
 constants for .len or do a cpu_to_vhost32(). Wouldn't you need to
 convert the constants as well?

I tried to explain it in the commit message.
It's a hack in vhost: it keeps virtio used structure in host
memory, but abuses length field for internal housekeeping.
This works because length in used ring for tx is always 0.


  
   enum {
  VHOST_NET_FEATURES = VHOST_FEATURES |
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CFT PATCH v2 2/2] KVM: x86: support XSAVES usage in the host

2014-11-26 Thread Radim Krčmář
2014-11-26 14:57+0100, Paolo Bonzini:
 
 
 On 26/11/2014 14:53, Radim Krčmář wrote:
get_xsave = native_xrstor(guest_xsave);  
xsave(aligned_userspace_buffer)
set_xsave = xrstor(aligned_userspace_buffer);  
native_xsave(guest_xsave)

Could that work?
   
   It could, though it is more like
   
  get_fpu()
  native_xrstor(guest_xsave)
  xsave(buffer)
  put_fpu()
   
   and vice versa.  Also, the userspace buffer is mos likely not aligned,
   so you need some kind of bounce buffer.  It can be done if the CPUID
   turns out to be a bottleneck, apart from that it'd most likely be slower.
  Yeah, it was mostly making this code more future-proof ... it is easier
  to convince xsave.h to export its structures if CPUID is the problem.
  (I still see some hope for Linux, so performance isn't my primary goal.)
  
  I'm quite interested in CPUID now though, so I'll try to benchmark it,
  someday.

(Sorry, I don't fully understand your thoughts and I just talk more of
 the same in those scenarios.)

 I'm not sure what is more future proof. :)  I wonder if native_xrstor
 could be a problem the day XRSTORS actually sets/restores MSRs as the
 processor documentation promises.

Isn't that a problem only for emulation?

We do not need that to pass them to
 userspace via KVM_GET/SET_XSAVE because we have KVM_GET/SET_MSR for
 that, but it may cause problems if get_xsave uses XRSTORS and thus sets
 the MSRs to unanticipated values.

KVM_GET/SET_XSAVE is defined to use the format of XSAVE/XRSTOR.
(Userspace shouldn't know how we actually store guest's state;
 KVM_GET/SET_MSR doesn't read host's state.)

XRSTORS won't affect the guest in any way, we are just going to use it
to convert the xsave, so any side-effects are going to stay in the host.
(This could break the host though.)

Difficult to say without more
 information on Intel's plans.

My main presumption is that XSAVE*-XRSTOR*-XSAVE-XRSTOR has the same
result as XSAVE-XRSTOR, because we are only interested in the state,
not in any metadata.
(If it isn't possible to combine intructions, like XSAVE after XRSTORS,
 this solution won't work.)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Another Obsolete Fix me in trace.h?

2014-11-26 Thread Radim Krčmář
2014-11-26 09:15-0500, Steven Rostedt:
 On Wed, 26 Nov 2014 13:40:26 +0100
 Radim Krčmář rkrc...@redhat.com wrote:
  The kernel part is trivial.
  Most of the code is going to be in tools/lib/traceevent/event-parse.c.
  
  I wasn't sure whether to
  1) define new 'enum print_arg_type' for our function and do exactly what
 other __print* did
  2) extend existing PRINT_FUNC with variable arguments and register it as
 a global event handler
  
  as (2) makes more sense to me, but we already had it when some __print*
  was implemented ... (and I didn't want to dig too deep into the project)
 
 Add a new hardcoded instance. The PRINT_FUNC is for plugins that have a
 unique function for the events they handle. Notice that event-parse
 does not define any func handlers. If this is to be generic, then lets
 make it specified as a new enum.

Ok, thanks.

(It just seemed weird that we have an infrastructure that could allow a
 generic plugin to cover this functionality.)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 26/42] vhost/net: force len for TX to host endian

2014-11-26 Thread Cornelia Huck
On Wed, 26 Nov 2014 16:44:00 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 On Wed, Nov 26, 2014 at 03:31:02PM +0100, Cornelia Huck wrote:
  On Tue, 25 Nov 2014 18:43:14 +0200
  Michael S. Tsirkin m...@redhat.com wrote:
  
   We use native endian-ness internally but never
   expose it to guest.
   
   Signed-off-by: Michael S. Tsirkin m...@redhat.com
   ---
drivers/vhost/net.c | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)
   
   diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
   index 8dae2f7..dce5c58 100644
   --- a/drivers/vhost/net.c
   +++ b/drivers/vhost/net.c
   @@ -48,15 +48,15 @@ MODULE_PARM_DESC(experimental_zcopytx, Enable Zero 
   Copy TX;
 * status internally; used for zerocopy tx only.
 */
/* Lower device DMA failed */
   -#define VHOST_DMA_FAILED_LEN 3
   +#define VHOST_DMA_FAILED_LEN ((__force __virtio32)3)
/* Lower device DMA done */
   -#define VHOST_DMA_DONE_LEN   2
   +#define VHOST_DMA_DONE_LEN   ((__force __virtio32)2)
/* Lower device DMA in progress */
   -#define VHOST_DMA_IN_PROGRESS1
   +#define VHOST_DMA_IN_PROGRESS((__force __virtio32)1)
/* Buffer unused */
   -#define VHOST_DMA_CLEAR_LEN  0
   +#define VHOST_DMA_CLEAR_LEN  ((__force __virtio32)0)
  
  I find these constants a bit confusing: What does __virtio32 mean
  without the context of a vq or device?
  
   
   -#define VHOST_DMA_IS_DONE(len) ((len) = VHOST_DMA_DONE_LEN)
   +#define VHOST_DMA_IS_DONE(len) ((__force u32)(len) = (__force 
   u32)VHOST_DMA_DONE_LEN)
  
  And here you cast it to a plain u32 again.
  
  I looked at the final code, and you seem either to use the above
  constants for .len or do a cpu_to_vhost32(). Wouldn't you need to
  convert the constants as well?
 
 I tried to explain it in the commit message.
 It's a hack in vhost: it keeps virtio used structure in host
 memory, but abuses length field for internal housekeeping.
 This works because length in used ring for tx is always 0.

Ah, ok. It might make sense to add this explanation to the patch :)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/7] KVM: arm: guest debug, define API headers

2014-11-26 Thread Alex Bennée

Andrew Jones drjo...@redhat.com writes:

 On Tue, Nov 25, 2014 at 04:10:00PM +, Alex Bennée wrote:
 This commit defines the API headers for guest debugging. There are two
 architecture specific debug structures:
snip
 +/* Architecture related debug defines - upper 16 bits of
 + * kvm_guest_debug-control
 + */
 +#define KVM_GUESTDBG_USE_SW_BP_SHIFT16
 +#define KVM_GUESTDBG_USE_SW_BP  (1  
 KVM_GUESTDBG_USE_SW_BP_SHIFT)
 +#define KVM_GUESTDBG_USE_HW_BP_SHIFT17
 +#define KVM_GUESTDBG_USE_HW_BP  (1  
 KVM_GUESTDBG_USE_HW_BP_SHIFT)
 +

 I see this are defined in arch/x86/include/uapi/asm/kvm.h,
 so you needed to reproduce them here, but shouldn't they
 be promoted to include/uapi/linux/kvm.h instead?

Well if we move them to common uapi we either restrict the $ARCH
specific options that don't have SW/HW BKPTS (would be weird but...) or
make them generic in the lower 16 bits (breaks API).

But in principle I have no objection if other don't.

-- 
Alex Bennée
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 26/42] vhost/net: force len for TX to host endian

2014-11-26 Thread Michael S. Tsirkin
On Wed, Nov 26, 2014 at 03:54:40PM +0100, Cornelia Huck wrote:
 On Wed, 26 Nov 2014 16:44:00 +0200
 Michael S. Tsirkin m...@redhat.com wrote:
 
  On Wed, Nov 26, 2014 at 03:31:02PM +0100, Cornelia Huck wrote:
   On Tue, 25 Nov 2014 18:43:14 +0200
   Michael S. Tsirkin m...@redhat.com wrote:
   
We use native endian-ness internally but never
expose it to guest.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/net.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8dae2f7..dce5c58 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -48,15 +48,15 @@ MODULE_PARM_DESC(experimental_zcopytx, Enable Zero 
Copy TX;
  * status internally; used for zerocopy tx only.
  */
 /* Lower device DMA failed */
-#define VHOST_DMA_FAILED_LEN   3
+#define VHOST_DMA_FAILED_LEN   ((__force __virtio32)3)
 /* Lower device DMA done */
-#define VHOST_DMA_DONE_LEN 2
+#define VHOST_DMA_DONE_LEN ((__force __virtio32)2)
 /* Lower device DMA in progress */
-#define VHOST_DMA_IN_PROGRESS  1
+#define VHOST_DMA_IN_PROGRESS  ((__force __virtio32)1)
 /* Buffer unused */
-#define VHOST_DMA_CLEAR_LEN0
+#define VHOST_DMA_CLEAR_LEN((__force __virtio32)0)
   
   I find these constants a bit confusing: What does __virtio32 mean
   without the context of a vq or device?
   

-#define VHOST_DMA_IS_DONE(len) ((len) = VHOST_DMA_DONE_LEN)
+#define VHOST_DMA_IS_DONE(len) ((__force u32)(len) = (__force 
u32)VHOST_DMA_DONE_LEN)
   
   And here you cast it to a plain u32 again.
   
   I looked at the final code, and you seem either to use the above
   constants for .len or do a cpu_to_vhost32(). Wouldn't you need to
   convert the constants as well?
  
  I tried to explain it in the commit message.
  It's a hack in vhost: it keeps virtio used structure in host
  memory, but abuses length field for internal housekeeping.
  This works because length in used ring for tx is always 0.
 
 Ah, ok. It might make sense to add this explanation to the patch :)

Absolutely.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/7] KVM: arm: guest debug, add stub KVM_SET_GUEST_DEBUG ioctl

2014-11-26 Thread Alex Bennée

Andrew Jones drjo...@redhat.com writes:

 On Tue, Nov 25, 2014 at 04:10:01PM +, Alex Bennée wrote:
 This commit adds a stub function to support the KVM_SET_GUEST_DEBUG
 ioctl. Currently any operation flag will return EINVAL. Actual
 functionality will be added with further patches.

 Technically the stub is already there, and you're extending it to
 start looking at control flags, but still not doing anything yet.

Sure we do:

  int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
  struct kvm_guest_debug *dbg)
  {
 -return -EINVAL;
 +/* If it's not enabled clear all flags */
 +if (!(dbg-control  KVM_GUESTDBG_ENABLE)) {
 +vcpu-guest_debug = 0;
 +return 0;
 +}

That's some class non-functionality right there ;-)

 +vcpu-guest_debug = dbg-control;
 +kvm_info(%s: guest_debug is 0x%lx\n, __func__, vcpu-guest_debug);
 +
 +/* Single Step */
 +if (vcpu-guest_debug  KVM_GUESTDBG_SINGLESTEP) {
 +kvm_info(SS requested, not yet implemented\n);
 +return -EINVAL;
 +}
 +
 +/* Software Break Points */
 +if (vcpu-guest_debug  KVM_GUESTDBG_USE_SW_BP) {
 +kvm_info(SW BP support requested, not yet implemented\n);
 +return -EINVAL;
 +}
 +
 +/* Hardware assisted Break and Watch points */
 +if (vcpu-guest_debug  KVM_GUESTDBG_USE_HW_BP) {
 +kvm_info(HW BP support requested, not yet implemented\n);
 +return -EINVAL;
 +}
 +
 +return 0;
  }

 I guess all the kvm_info's were useful for developing this patch series,
 but do we still need them?

They also served the very useful roll of stopping checkpatch.pl bitching
about my reluctance to remove braces from the if () { } clauses. However
I take your point. I can certainly remove the kvm_info() statements as
each bit of functionality is added while leaving this one to help when
someone is bisecting and confused.

-- 
Alex Bennée
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/7] KVM: arm: guest debug, define API headers

2014-11-26 Thread Alex Bennée

Peter Maydell peter.mayd...@linaro.org writes:

 On 25 November 2014 at 16:10, Alex Bennée alex.ben...@linaro.org wrote:
 +/* Exit types which define why we did a debug exit */
 +#define KVM_DEBUG_EXIT_ERROR   0x0
 +#define KVM_DEBUG_EXIT_SINGLE_STEP 0x1
 +#define KVM_DEBUG_EXIT_SW_BKPT 0x2
 +#define KVM_DEBUG_EXIT_HW_BKPT 0x3
 +#define KVM_DEBUG_EXIT_HW_WTPT 0x4

 The names of these imply that they're generic, but they're
 defined in an arch-specific header file...

Yeah, I think these will die and I'll just export the syndrome
information directly to QEMU.

-- 
Alex Bennée
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: x86: Fix reserved x2apic registers

2014-11-26 Thread Nadav Amit
x2APIC has no registers for DFR and ICR (see Intel SDM 10.12.1.2 x2APIC
Register Address Space). KVM needs to cause #GP on such accesses.

Fix it.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/lapic.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e0e5642..7611b75 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1897,6 +1897,12 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, 
u64 *data)
if (!irqchip_in_kernel(vcpu-kvm) || !apic_x2apic_mode(apic))
return 1;
 
+   if (reg == APIC_DFR || reg == APIC_ICR2) {
+   apic_debug(KVM_APIC_READ: read x2apic reserved register %x\n,
+  reg);
+   return 1;
+   }
+
if (apic_reg_read(apic, reg, 4, low))
return 1;
if (msr == 0x830)
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Another Obsolete Fix me in trace.h?

2014-11-26 Thread Steven Rostedt
On Wed, 26 Nov 2014 15:49:34 +0100
Radim Krčmář rkrc...@redhat.com wrote:

 
 (It just seemed weird that we have an infrastructure that could allow a
  generic plugin to cover this functionality.)

Maybe in the future we may do something like that. But for now, think
of these as optimized ;-)

-- Steve
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: Fix reserved x2apic registers

2014-11-26 Thread Nadav Amit
The commit message has a typo - it should be “DFR and ICR2”.
I didn’t post a new version in order not to confuse.

Nadav

Nadav Amit na...@cs.technion.ac.il wrote:

 x2APIC has no registers for DFR and ICR (see Intel SDM 10.12.1.2 x2APIC
 Register Address Space). KVM needs to cause #GP on such accesses.
 
 Fix it.
 
 Signed-off-by: Nadav Amit na...@cs.technion.ac.il
 ---
 arch/x86/kvm/lapic.c | 6 ++
 1 file changed, 6 insertions(+)
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index e0e5642..7611b75 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -1897,6 +1897,12 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 
 msr, u64 *data)
   if (!irqchip_in_kernel(vcpu-kvm) || !apic_x2apic_mode(apic))
   return 1;
 
 + if (reg == APIC_DFR || reg == APIC_ICR2) {
 + apic_debug(KVM_APIC_READ: read x2apic reserved register %x\n,
 +reg);
 + return 1;
 + }
 +
   if (apic_reg_read(apic, reg, 4, low))
   return 1;
   if (msr == 0x830)
 -- 
 1.9.1
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: Fix reserved x2apic registers

2014-11-26 Thread Radim Krčmář
2014-11-26 17:11+0200, Nadav Amit:
 x2APIC has no registers for DFR and ICR (see Intel SDM 10.12.1.2 x2APIC
 Register Address Space). KVM needs to cause #GP on such accesses.

We should to take care of write to ICR2 as well.

 Fix it.
 
 Signed-off-by: Nadav Amit na...@cs.technion.ac.il
 ---
  arch/x86/kvm/lapic.c | 6 ++
  1 file changed, 6 insertions(+)
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index e0e5642..7611b75 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -1897,6 +1897,12 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 
 msr, u64 *data)
   if (!irqchip_in_kernel(vcpu-kvm) || !apic_x2apic_mode(apic))
   return 1;
  
 + if (reg == APIC_DFR || reg == APIC_ICR2) {
 + apic_debug(KVM_APIC_READ: read x2apic reserved register %x\n,
 +reg);
 + return 1;
 + }
 +
   if (apic_reg_read(apic, reg, 4, low))
   return 1;
   if (msr == 0x830)
 -- 
 1.9.1
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] KVM: x86: Fix reserved x2apic registers

2014-11-26 Thread Nadav Amit
x2APIC has no registers for DFR and ICR2 (see Intel SDM 10.12.1.2 x2APIC
Register Address Space). KVM needs to cause #GP on such accesses.

Fix it (DFR and ICR2 on read, ICR2 on write, DFR already handled on writes).

Signed-off-by: Nadav Amit na...@cs.technion.ac.il

---
v1-v2: Cause #GP on ICR2 write access (as Radim pointed)
---
 arch/x86/kvm/lapic.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e0e5642..9c90d31 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1883,6 +1883,9 @@ int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, 
u64 data)
if (!irqchip_in_kernel(vcpu-kvm) || !apic_x2apic_mode(apic))
return 1;
 
+   if (reg == APIC_ICR2)
+   return 1;
+
/* if this is ICR write vector before command */
if (msr == 0x830)
apic_reg_write(apic, APIC_ICR2, (u32)(data  32));
@@ -1897,6 +1900,12 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, 
u64 *data)
if (!irqchip_in_kernel(vcpu-kvm) || !apic_x2apic_mode(apic))
return 1;
 
+   if (reg == APIC_DFR || reg == APIC_ICR2) {
+   apic_debug(KVM_APIC_READ: read x2apic reserved register %x\n,
+  reg);
+   return 1;
+   }
+
if (apic_reg_read(apic, reg, 4, low))
return 1;
if (msr == 0x830)
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] KVM: x86: Fix reserved x2apic registers

2014-11-26 Thread Radim Krčmář
2014-11-26 17:56+0200, Nadav Amit:
 x2APIC has no registers for DFR and ICR2 (see Intel SDM 10.12.1.2 x2APIC
 Register Address Space). KVM needs to cause #GP on such accesses.
 
 Fix it (DFR and ICR2 on read, ICR2 on write, DFR already handled on writes).
 
 Signed-off-by: Nadav Amit na...@cs.technion.ac.il
 
 ---

Reviewed-by: Radim Krčmář rkrc...@redhat.com

 v1-v2: Cause #GP on ICR2 write access (as Radim pointed)
 ---
  arch/x86/kvm/lapic.c | 9 +
  1 file changed, 9 insertions(+)
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index e0e5642..9c90d31 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -1883,6 +1883,9 @@ int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 
 msr, u64 data)
   if (!irqchip_in_kernel(vcpu-kvm) || !apic_x2apic_mode(apic))
   return 1;
  
 + if (reg == APIC_ICR2)
 + return 1;
 +
   /* if this is ICR write vector before command */
   if (msr == 0x830)
   apic_reg_write(apic, APIC_ICR2, (u32)(data  32));
 @@ -1897,6 +1900,12 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 
 msr, u64 *data)
   if (!irqchip_in_kernel(vcpu-kvm) || !apic_x2apic_mode(apic))
   return 1;
  
 + if (reg == APIC_DFR || reg == APIC_ICR2) {
 + apic_debug(KVM_APIC_READ: read x2apic reserved register %x\n,
 +reg);
 + return 1;
 + }
 +
   if (apic_reg_read(apic, reg, 4, low))
   return 1;
   if (msr == 0x830)
 -- 
 1.9.1
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: x86: use MSR_ICR instead of a number

2014-11-26 Thread Radim Krčmář
0x830 MSR is 0x300 xAPIC MMIO, which is MSR_ICR.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 This applies on top of Amit's
   [PATCH v2] KVM: x86: Fix reserved x2apic registers
 in which I noticed this minor deficit.

 arch/x86/kvm/lapic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9c90d31..687874f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1887,7 +1887,7 @@ int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, 
u64 data)
return 1;
 
/* if this is ICR write vector before command */
-   if (msr == 0x830)
+   if (reg == APIC_ICR)
apic_reg_write(apic, APIC_ICR2, (u32)(data  32));
return apic_reg_write(apic, reg, (u32)data);
 }
@@ -1908,7 +1908,7 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, 
u64 *data)
 
if (apic_reg_read(apic, reg, 4, low))
return 1;
-   if (msr == 0x830)
+   if (reg == APIC_ICR)
apic_reg_read(apic, APIC_ICR2, 4, high);
 
*data = (((u64)high)  32) | low;
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7] KVM: arm64: guest debug, add SW break point support

2014-11-26 Thread Andrew Jones
On Tue, Nov 25, 2014 at 04:10:02PM +, Alex Bennée wrote:
 This adds support for SW breakpoints inserted by userspace.
 
 First we need to trap all BKPT exceptions in the hypervisor (ELS). This
 in controlled through the MDCR_EL2 register. I've added a new field to
 the vcpu structure to hold this value. There should be scope to
 rationlise this with the VCPU_DEBUG_FLAGS/KVM_ARM64_DEBUG_DIRTY_SHIFT
 manipulation in later patches.

I think we should start using the new mdcr_el2 field everywhere we plan
to within the same series that it is introduced. Otherwise it's hard
to tell if we need an mdcr_el2 field, or if a more generic field would
suffice. We can always translate bits of a more generic field to
mdcr_el2 bits when necessary, but not the reverse.

 
 Once the exception arrives we triggers an exit from the main hypervisor
s/triggers/trigger/
 
 loop to the hypervisor controller. The kvm_debug_exit_arch carries the
 address of the exception. If the controller doesn't know of breakpoint
 ^ a
 then we have a guest inserted breakpoint and the hypervisor needs to
 start again and deliver the exception to guest.
 
 Signed-off-by: Alex Bennée alex.ben...@linaro.org
 
 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index 2c6386e..9383359 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -2592,7 +2592,7 @@ when running. Common control bits are:
  The top 16 bits of the control field are architecture specific control
  flags which can include the following:
  
 -  - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86]
 +  - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64]
- KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390]
- KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86]
- KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86]
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index a0ff410..48d26bb 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -303,6 +303,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
  int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
   struct kvm_guest_debug *dbg)
  {
 + /* Route debug traps to el2? */
 + bool route_el2 = false;
 +
   /* If it's not enabled clear all flags */
   if (!(dbg-control  KVM_GUESTDBG_ENABLE)) {
   vcpu-guest_debug = 0;
 @@ -320,8 +323,8 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu 
 *vcpu,
  
   /* Software Break Points */
   if (vcpu-guest_debug  KVM_GUESTDBG_USE_SW_BP) {
 - kvm_info(SW BP support requested, not yet implemented\n);
 - return -EINVAL;
 + kvm_info(SW BP support requested\n);
 + route_el2 = true;
   }
  
   /* Hardware assisted Break and Watch points */
 @@ -330,6 +333,20 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu 
 *vcpu,
   return -EINVAL;
   }
  
 + /* If we are going to handle any debug exceptions we need to
 +  * set MDCE_EL2.TDE to ensure they are routed to the
 +  * hypervisor. If userspace determines the exception was
 +  * actually internal to the guest it needs to handle
 +  * re-injecting the exception.
 +  */

kernel comment blocks typically start with an empty line, e.g.
/*
 * comment block
 */

 + if (route_el2) {
 + kvm_info(routing debug exceptions);
 + vcpu-arch.mdcr_el2 = MDCR_EL2_TDE;
 + } else {
 + kvm_info(not routing debug exceptions);
 + vcpu-arch.mdcr_el2 = 0;
 + }

This looks weird because we're only managing some of the mdcr_el2 bits
with the mdcr_el2 field. If we were managing all of them then these
would need to be |= MDCR_EL2_TDE and = ~SOME_MASK instead. If we never
plan to manage all the bits, then I think that points more towards the
need for a more generic field instead.

 +
   return 0;
  }
  
 diff --git a/arch/arm64/include/asm/kvm_host.h 
 b/arch/arm64/include/asm/kvm_host.h
 index 2012c4b..38b0f07 100644
 --- a/arch/arm64/include/asm/kvm_host.h
 +++ b/arch/arm64/include/asm/kvm_host.h
 @@ -98,6 +98,7 @@ struct kvm_vcpu_arch {
  
   /* HYP configuration */
   u64 hcr_el2;
 + u32 mdcr_el2;
  
   /* Exception Information */
   struct kvm_vcpu_fault_info fault;
 diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
 index 9a9fce0..8da1043 100644
 --- a/arch/arm64/kernel/asm-offsets.c
 +++ b/arch/arm64/kernel/asm-offsets.c
 @@ -122,6 +122,7 @@ int main(void)
DEFINE(VCPU_HPFAR_EL2, offsetof(struct kvm_vcpu, 
 arch.fault.hpfar_el2));
DEFINE(VCPU_DEBUG_FLAGS,   offsetof(struct kvm_vcpu, arch.debug_flags));
DEFINE(VCPU_HCR_EL2,   offsetof(struct kvm_vcpu, 
 arch.hcr_el2));
 +  DEFINE(VCPU_MDCR_EL2,  offsetof(struct kvm_vcpu, 

Re: [CFT PATCH v2 2/2] KVM: x86: support XSAVES usage in the host

2014-11-26 Thread Paolo Bonzini


On 26/11/2014 15:42, Radim Krčmář wrote:
 2014-11-26 14:57+0100, Paolo Bonzini:


 On 26/11/2014 14:53, Radim Krčmář wrote:
 get_xsave = native_xrstor(guest_xsave);  xsave(aligned_userspace_buffer)
 set_xsave = xrstor(aligned_userspace_buffer);  native_xsave(guest_xsave)

 Could that work?

 It could, though it is more like

get_fpu()
native_xrstor(guest_xsave)
xsave(buffer)
put_fpu()

 and vice versa.  Also, the userspace buffer is mos likely not aligned,
 so you need some kind of bounce buffer.  It can be done if the CPUID
 turns out to be a bottleneck, apart from that it'd most likely be slower.
 Yeah, it was mostly making this code more future-proof ... it is easier
 to convince xsave.h to export its structures if CPUID is the problem.
 (I still see some hope for Linux, so performance isn't my primary goal.)

 I'm quite interested in CPUID now though, so I'll try to benchmark it,
 someday.
 
 (Sorry, I don't fully understand your thoughts and I just talk more of
  the same in those scenarios.)
 
 I'm not sure what is more future proof. :)  I wonder if native_xrstor
 could be a problem the day XRSTORS actually sets/restores MSRs as the
 processor documentation promises.
 
 XRSTORS won't affect the guest in any way, we are just going to use it
 to convert the xsave, so any side-effects are going to stay in the host.
 (This could break the host though.)

Yes, that's the problem. :)

 My main presumption is that XSAVE*-XRSTOR*-XSAVE-XRSTOR has the same
 result as XSAVE-XRSTOR, because we are only interested in the state,
 not in any metadata.
 (If it isn't possible to combine intructions, like XSAVE after XRSTORS,
  this solution won't work.)

Yes, that should be right.  But actually what KVM would do it is
XRSTOR-XSAVE*-XRSTOR*-XSAVE.  The problem here is the side effects of
doing XRSTORS far from a guest entry... though that would likely be
handled by load_guest_fpu/put_guest_fpu.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/7] KVM: arm64: guest debug, add support for single-step

2014-11-26 Thread Andrew Jones
On Tue, Nov 25, 2014 at 04:10:03PM +, Alex Bennée wrote:
 This adds support for single-stepping the guest. As userspace can and
 will manipulate guest registers before restarting any tweaking of the
 registers has to occur just before control is passed back to the guest.
 Furthermore while guest debugging is in effect we need to squash the
 ability of the guest to single-step itself as we have no easy way of
 re-entering the guest after the exception has been delivered to the
 hypervisor.
 
 Signed-off-by: Alex Bennée alex.ben...@linaro.org
 
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index 48d26bb..a76daae 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -38,6 +38,7 @@
  #include asm/tlbflush.h
  #include asm/cacheflush.h
  #include asm/virt.h
 +#include asm/debug-monitors.h
  #include asm/kvm_arm.h
  #include asm/kvm_asm.h
  #include asm/kvm_mmu.h
 @@ -300,6 +301,17 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
   kvm_arm_set_running_vcpu(NULL);
  }
  
 +/**
 + * kvm_arch_vcpu_ioctl_set_guest_debug - Setup guest debugging
 + * @kvm: pointer to the KVM struct
 + * @kvm_guest_debug: the ioctl data buffer
 + *
 + * This sets up the VM for guest debugging. Care has to be taken when
 + * manipulating guest registers as these will be set/cleared by the
 + * hyper-visor controller, typically before each kvm_run event. As a
 + * result modification of the guest registers needs to take place
 + * after they have been restored in the hyp.S trampoline code.
 + */
  int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
   struct kvm_guest_debug *dbg)
  {
 @@ -317,8 +329,8 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu 
 *vcpu,
  
   /* Single Step */
   if (vcpu-guest_debug  KVM_GUESTDBG_SINGLESTEP) {
 - kvm_info(SS requested, not yet implemented\n);
 - return -EINVAL;
 + kvm_info(SS requested\n);
 + route_el2 = true;
   }
  
   /* Software Break Points */
 diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
 index 8da1043..78e5ae1 100644
 --- a/arch/arm64/kernel/asm-offsets.c
 +++ b/arch/arm64/kernel/asm-offsets.c
 @@ -121,6 +121,7 @@ int main(void)
DEFINE(VCPU_FAR_EL2,   offsetof(struct kvm_vcpu, 
 arch.fault.far_el2));
DEFINE(VCPU_HPFAR_EL2, offsetof(struct kvm_vcpu, 
 arch.fault.hpfar_el2));
DEFINE(VCPU_DEBUG_FLAGS,   offsetof(struct kvm_vcpu, arch.debug_flags));
 +  DEFINE(GUEST_DEBUG,offsetof(struct kvm_vcpu, guest_debug));
DEFINE(VCPU_HCR_EL2,   offsetof(struct kvm_vcpu, 
 arch.hcr_el2));
DEFINE(VCPU_MDCR_EL2,  offsetof(struct kvm_vcpu, arch.mdcr_el2));
DEFINE(VCPU_IRQ_LINES, offsetof(struct kvm_vcpu, arch.irq_lines));
 diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
 index 28dc92b..6def054 100644
 --- a/arch/arm64/kvm/handle_exit.c
 +++ b/arch/arm64/kvm/handle_exit.c
 @@ -91,6 +91,25 @@ static int kvm_handle_bkpt(struct kvm_vcpu *vcpu, struct 
 kvm_run *run)
   return 0;
  }
  
 +/**
 + * kvm_handle_ss - handle single step exceptions
 + *
 + * @vcpu:the vcpu pointer

same @run comment as other handler header in previous patch

 + *
 + * See: ARM ARM D2.12 for the details. While the host is routing debug
 + * exceptions to it's handlers we have to suppress the ability of the
 + * guest to trigger exceptions.
 + */
 +static int kvm_handle_ss(struct kvm_vcpu *vcpu, struct kvm_run *run)
 +{
 + WARN_ON(!(vcpu-guest_debug  KVM_GUESTDBG_SINGLESTEP));

I'm not sure about this WARN_ON. Is there some scenario you were
thinking of when you put it here? Is there some scenario where this
could trigger so frequently we kill the log buffer?

 +
 + run-exit_reason = KVM_EXIT_DEBUG;
 + run-debug.arch.exit_type = KVM_DEBUG_EXIT_SINGLE_STEP;
 + run-debug.arch.address = *vcpu_pc(vcpu);
 + return 0;
 +}
 +
  static exit_handle_fn arm_exit_handlers[] = {
   [ESR_EL2_EC_WFI]= kvm_handle_wfx,
   [ESR_EL2_EC_CP15_32]= kvm_handle_cp15_32,
 @@ -105,6 +124,7 @@ static exit_handle_fn arm_exit_handlers[] = {
   [ESR_EL2_EC_SYS64]  = kvm_handle_sys_reg,
   [ESR_EL2_EC_IABT]   = kvm_handle_guest_abort,
   [ESR_EL2_EC_DABT]   = kvm_handle_guest_abort,
 + [ESR_EL2_EC_SOFTSTP]= kvm_handle_ss,
   [ESR_EL2_EC_BKPT32] = kvm_handle_bkpt,
   [ESR_EL2_EC_BRK64]  = kvm_handle_bkpt,
  };
 diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
 index 3c733ea..c0bc218 100644
 --- a/arch/arm64/kvm/hyp.S
 +++ b/arch/arm64/kvm/hyp.S
 @@ -16,6 +16,7 @@
   */
  
  #include linux/linkage.h
 +#include linux/kvm.h
  
  #include asm/assembler.h
  #include asm/memory.h
 @@ -168,6 +169,31 @@
   // x19-x29, lr, sp*, elr*, spsr*
   restore_common_regs
  
 + // After restoring the guest registers but before we return to the guest
 + // we may want to make some final 

Re: [PATCH 2/7] KVM: arm: guest debug, define API headers

2014-11-26 Thread Paolo Bonzini


On 26/11/2014 15:58, Alex Bennée wrote:
 
 Andrew Jones drjo...@redhat.com writes:
 
 On Tue, Nov 25, 2014 at 04:10:00PM +, Alex Bennée wrote:
 This commit defines the API headers for guest debugging. There are two
 architecture specific debug structures:
 snip
 +/* Architecture related debug defines - upper 16 bits of
 + * kvm_guest_debug-control
 + */
 +#define KVM_GUESTDBG_USE_SW_BP_SHIFT   16
 +#define KVM_GUESTDBG_USE_SW_BP (1  
 KVM_GUESTDBG_USE_SW_BP_SHIFT)
 +#define KVM_GUESTDBG_USE_HW_BP_SHIFT   17
 +#define KVM_GUESTDBG_USE_HW_BP (1  
 KVM_GUESTDBG_USE_HW_BP_SHIFT)
 +

 I see this are defined in arch/x86/include/uapi/asm/kvm.h,
 so you needed to reproduce them here, but shouldn't they
 be promoted to include/uapi/linux/kvm.h instead?
 
 Well if we move them to common uapi we either restrict the $ARCH
 specific options that don't have SW/HW BKPTS (would be weird but...) or
 make them generic in the lower 16 bits (breaks API).
 
 But in principle I have no objection if other don't.

I think it's a matter of personal taste.  Architecture-specific means
not all architectures may support it, but it's certainly a good idea
to reuse the same value if multiple architectures do support a #define.

What you did is fine, another possibility is to do

#define __KVM_GUESTDBG_USE_SW_BP   (1  16)

in include/uapi/linux/kvm.h, and

#define KVM_GUESTDBG_USE_SW_BP __KVM_GUESTDBG_USE_SW_BP

in the arch-specific file.  Andrew, is this closer to what you intended?

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] KVM: arm64: re-factor hyp.S debug register code

2014-11-26 Thread Andrew Jones
On Tue, Nov 25, 2014 at 04:10:04PM +, Alex Bennée wrote:
 This is a pre-cursor to sharing the code with the guest debug support.
 This replaces the big macro that fishes data out of a fixed location
 with a more general helper macro to restore a set of debug registers. It
 uses macro substitution so it can be re-used for debug control and value
 registers. It does however rely on the debug registers being 64 bit
 aligned (as they happen to be in the hyp ABI).
 
 Signed-off-by: Alex Bennée alex.ben...@linaro.org
 
 diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
 index c0bc218..b38ce3d 100644
 --- a/arch/arm64/kvm/hyp.S
 +++ b/arch/arm64/kvm/hyp.S
 @@ -490,65 +490,12 @@
   msr mdscr_el1,  x25
  .endm
  
 -.macro restore_debug
 - // x2: base address for cpu context
 - // x3: tmp register
 -
 - mrs x26, id_aa64dfr0_el1
 - ubfxx24, x26, #12, #4   // Extract BRPs
 - ubfxx25, x26, #20, #4   // Extract WRPs
 - mov w26, #15
 - sub w24, w26, w24   // How many BPs to skip
 - sub w25, w26, w25   // How many WPs to skip
 -
 - add x3, x2, #CPU_SYSREG_OFFSET(DBGBCR0_EL1)
 -
 - adr x26, 1f
 - add x26, x26, x24, lsl #2
 - br  x26
 -1:
 - ldr x20, [x3, #(15 * 8)]
 - ldr x19, [x3, #(14 * 8)]
 - ldr x18, [x3, #(13 * 8)]
 - ldr x17, [x3, #(12 * 8)]
 - ldr x16, [x3, #(11 * 8)]
 - ldr x15, [x3, #(10 * 8)]
 - ldr x14, [x3, #(9 * 8)]
 - ldr x13, [x3, #(8 * 8)]
 - ldr x12, [x3, #(7 * 8)]
 - ldr x11, [x3, #(6 * 8)]
 - ldr x10, [x3, #(5 * 8)]
 - ldr x9, [x3, #(4 * 8)]
 - ldr x8, [x3, #(3 * 8)]
 - ldr x7, [x3, #(2 * 8)]
 - ldr x6, [x3, #(1 * 8)]
 - ldr x5, [x3, #(0 * 8)]
 -
 +.macro setup_debug_registers type
 +// x3: pointer to register set
 +// x4: number of registers to copy
 +// x5..x20/x26 trashed
 ^ comma instead?
   adr x26, 1f
 - add x26, x26, x24, lsl #2
 - br  x26
 -1:
 - msr dbgbcr15_el1, x20
 - msr dbgbcr14_el1, x19
 - msr dbgbcr13_el1, x18
 - msr dbgbcr12_el1, x17
 - msr dbgbcr11_el1, x16
 - msr dbgbcr10_el1, x15
 - msr dbgbcr9_el1, x14
 - msr dbgbcr8_el1, x13
 - msr dbgbcr7_el1, x12
 - msr dbgbcr6_el1, x11
 - msr dbgbcr5_el1, x10
 - msr dbgbcr4_el1, x9
 - msr dbgbcr3_el1, x8
 - msr dbgbcr2_el1, x7
 - msr dbgbcr1_el1, x6
 - msr dbgbcr0_el1, x5
 -
 - add x3, x2, #CPU_SYSREG_OFFSET(DBGBVR0_EL1)
 -
 - adr x26, 1f
 - add x26, x26, x24, lsl #2
 + add x26, x26, x4, lsl #2
   br  x26
  1:
   ldr x20, [x3, #(15 * 8)]
 @@ -569,116 +516,25 @@
   ldr x5, [x3, #(0 * 8)]
  
   adr x26, 1f
 - add x26, x26, x24, lsl #2
 - br  x26
 -1:
 - msr dbgbvr15_el1, x20
 - msr dbgbvr14_el1, x19
 - msr dbgbvr13_el1, x18
 - msr dbgbvr12_el1, x17
 - msr dbgbvr11_el1, x16
 - msr dbgbvr10_el1, x15
 - msr dbgbvr9_el1, x14
 - msr dbgbvr8_el1, x13
 - msr dbgbvr7_el1, x12
 - msr dbgbvr6_el1, x11
 - msr dbgbvr5_el1, x10
 - msr dbgbvr4_el1, x9
 - msr dbgbvr3_el1, x8
 - msr dbgbvr2_el1, x7
 - msr dbgbvr1_el1, x6
 - msr dbgbvr0_el1, x5
 -
 - add x3, x2, #CPU_SYSREG_OFFSET(DBGWCR0_EL1)
 -
 - adr x26, 1f
 - add x26, x26, x25, lsl #2
 + add x26, x26, x4, lsl #2
   br  x26
  1:
 - ldr x20, [x3, #(15 * 8)]
 - ldr x19, [x3, #(14 * 8)]
 - ldr x18, [x3, #(13 * 8)]
 - ldr x17, [x3, #(12 * 8)]
 - ldr x16, [x3, #(11 * 8)]
 - ldr x15, [x3, #(10 * 8)]
 - ldr x14, [x3, #(9 * 8)]
 - ldr x13, [x3, #(8 * 8)]
 - ldr x12, [x3, #(7 * 8)]
 - ldr x11, [x3, #(6 * 8)]
 - ldr x10, [x3, #(5 * 8)]
 - ldr x9, [x3, #(4 * 8)]
 - ldr x8, [x3, #(3 * 8)]
 - ldr x7, [x3, #(2 * 8)]
 - ldr x6, [x3, #(1 * 8)]
 - ldr x5, [x3, #(0 * 8)]
 -
 - adr x26, 1f
 - add x26, x26, x25, lsl #2
 - br  x26
 -1:
 - msr dbgwcr15_el1, x20
 - msr dbgwcr14_el1, x19
 - msr dbgwcr13_el1, x18
 - msr dbgwcr12_el1, x17
 - msr dbgwcr11_el1, x16
 - msr dbgwcr10_el1, x15
 - msr dbgwcr9_el1, x14
 - msr dbgwcr8_el1, x13
 - msr dbgwcr7_el1, x12
 - msr dbgwcr6_el1, x11
 - msr dbgwcr5_el1, x10
 - msr dbgwcr4_el1, x9
 - msr dbgwcr3_el1, x8
 - msr dbgwcr2_el1, x7
 - msr dbgwcr1_el1, x6
 - msr dbgwcr0_el1, x5
 -
 - add x3, x2, #CPU_SYSREG_OFFSET(DBGWVR0_EL1)
 -
 - adr x26, 1f
 - add x26, x26, x25, lsl #2
 - br  x26
 -1:
 - 

Re: [RFC PATCH 4/5] ARM: enable linking against eventfd and irqchip

2014-11-26 Thread Eric Auger
On 11/24/2014 10:27 PM, Nikolay Nikolaev wrote:
 This enables compilation of the eventfd feature on ARM.
Hi Nikolay,

why irqchip in the title?

Best Regards

Eric
 
 Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com
 ---
  arch/arm/kvm/Kconfig  |1 +
  arch/arm/kvm/Makefile |2 +-
  2 files changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
 index 466bd29..a4b0312 100644
 --- a/arch/arm/kvm/Kconfig
 +++ b/arch/arm/kvm/Kconfig
 @@ -20,6 +20,7 @@ config KVM
   bool Kernel-based Virtual Machine (KVM) support
   select PREEMPT_NOTIFIERS
   select ANON_INODES
 + select HAVE_KVM_EVENTFD
   select HAVE_KVM_CPU_RELAX_INTERCEPT
   select KVM_MMIO
   select KVM_ARM_HOST
 diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
 index 443b8be..539c1a5 100644
 --- a/arch/arm/kvm/Makefile
 +++ b/arch/arm/kvm/Makefile
 @@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt)
  AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt)
  
  KVM := ../../../virt/kvm
 -kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o
 +kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o
  
  obj-y += kvm-arm.o init.o interrupts.o
  obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs

2014-11-26 Thread Nadav Amit
Paolo Bonzini pbonz...@redhat.com wrote:

 
 
 On 06/11/2014 17:45, Radim Krčmář wrote:
 2014-11-06 10:34+0100, Paolo Bonzini:
 On 05/11/2014 21:45, Nadav Amit wrote:
 If I understand the SDM correctly, in such scenario (all APICs are
 software disabled) the mode is left as the default - flat mode (see
 
 APIC doesn't have any global mode (it is just KVM's simplification), so
 when a message lands on the system bus, it just compares MDA with LDR
 and DFR ...
 
 section 10.6.2.2 Logical Destination Mode”): All processors that
 have their APIC software enabled (using the spurious vector
 enable/disable bit) must have their DFRs (Destination Format
 Registers) programmed identically. The default mode for DFR is flat
 mode.”
 
 I think the default mode points to reset state, which is flat DFR;
 and it is mentioned only because of the following sentence
  If you are using cluster mode, DFRs must be programmed before the APIC
  is software enabled.
 
 That's not what either Bochs or QEMU do, though.  (Though in the case of
 Bochs I cannot find the place where reception of IPIs is prevented for
 software-disabled APICs, so I'm not sure how much to trust it in this case).
 
 I'm not sure why software-disabled APICs could have different DFRs, if
 the APICs can receive NMI IPIs.  I'll ask Intel.
 
 When changing the mode, we can't switch DFR synchronously, so it has to
 happen and NMI may be needed (watchdog?) before APIC configuration.
 Explicit statement might have been a hint to be _extra_ careful when
 using logical destination for INIT, NMI, ... I wonder what they'll say.
 
 Anyway, Paolo's patch seems to be in the right direction, I'd modify it
 a bit though:
 
 LDR=0 isn't addressable in any logical mode, so we can ignore APICs that
 don't set it and decide the mode by the last nonzero one.
 This works in a situation, where one part is configured for cluster and
 the rest is still in reset state.
 
 (It gets harder if we allow nonzero LDRs with different DFR ...
 we'd need to split our logical map to handle it.)
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 758f838..6da303e1 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -148,10 +148,6 @@ static void recalculate_apic_map(struct kvm *kvm)
  goto out;
 
  new-ldr_bits = 8;
 -/* flat mode is default */
 -new-cid_shift = 8;
 -new-cid_mask = 0;
 -new-lid_mask = 0xff;
  new-broadcast = APIC_BROADCAST;
 
  kvm_for_each_vcpu(i, vcpu, kvm) {
 @@ -166,7 +162,7 @@ static void recalculate_apic_map(struct kvm *kvm)
  new-cid_mask = (1  KVM_X2APIC_CID_BITS) - 1;
  new-lid_mask = 0x;
  new-broadcast = X2APIC_BROADCAST;
 -} else if (kvm_apic_hw_enabled(apic)) {
 +} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
  if (kvm_apic_get_reg(apic, APIC_DFR) ==
  APIC_DFR_CLUSTER) {
  new-cid_shift = 4;
 
 I merged this patch and Nadav’s.

Sorry for the late and long reply, but I got an issue with the new version
(and my previous version as well). Indeed, the SDM states that DFR should
be the same for enabled CPUs, and that the BIOS should get all CPUs in
either xAPIC or x2APIC. Yet, there is nothing that says all CPUs need to be
in xAPIC/x2APIC mode.

In my tests (which pass on bare-metal), I got a scenario in which some CPUs
are in xAPIC mode, the BSP changed (which is currently not handled correctly
by KVM) and the BSP has x2APIC enabled. All the core APICs are
software-enabled. The expected behaviour is that the CPUs with x2APIC
enabled would work in x2APIC mode.

I think such a transitory scenario is possible on real-systems as well,
perhaps during CPU hot-plug. It appears the previous version (before all of
our changes) handled it better. I presume the most efficient way is to start
determining the APIC logical mode from the BSP, and if it is disabled,
traverse the rest of the CPUs until finding the first one with APIC enabled.
Yet, I have not finished doing and checking the BSP fix and other dependent
INIT signal handling fixes.

In the meanwhile, would you be ok with restoring some of the previous
behaviour - i.e., x2APIC is enabled if any CPU turned it on (regardless to
whether APIC is software enabled), otherwise use the configuration of the
last enabled APIC?

-- 8 —
Subject: [PATCH] KVM: x86: Traverse all CPUs during recalculate_apic_map

---
 arch/x86/kvm/lapic.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9c90d31..6dc2be6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -139,6 +139,7 @@ static void recalculate_apic_map(struct kvm *kvm)
struct kvm_apic_map *new, *old = NULL;
struct kvm_vcpu *vcpu;
int i;
+   bool any_enabled = false;
 
new = kzalloc(sizeof(struct 

Re: [RFC PATCH 1/5] KVM: redesing kvm_io_bus_ API to pass VCPU structure to the callbacks.

2014-11-26 Thread Eric Auger
Hi Nikolay,

typo in the title,

On 11/24/2014 10:26 PM, Nikolay Nikolaev wrote:
 This is needed in e.g. ARM vGIC emulation, where the MMIO handling
 depends on the VCPU that does the access.
 
 Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com
 ---
  arch/ia64/kvm/kvm-ia64.c   |4 ++--
  arch/powerpc/kvm/powerpc.c |4 ++--
  arch/s390/kvm/diag.c   |2 +-
  arch/x86/kvm/vmx.c |2 +-
  arch/x86/kvm/x86.c |   11 ++-
  include/linux/kvm_host.h   |   10 +-
  virt/kvm/coalesced_mmio.c  |5 +++--
  virt/kvm/eventfd.c |4 ++--
  virt/kvm/iodev.h   |   23 +++
  virt/kvm/kvm_main.c|   32 
  10 files changed, 53 insertions(+), 44 deletions(-)
 
 diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
 index ec6b9ac..16f1d26 100644
 --- a/arch/ia64/kvm/kvm-ia64.c
 +++ b/arch/ia64/kvm/kvm-ia64.c
 @@ -246,10 +246,10 @@ static int handle_mmio(struct kvm_vcpu *vcpu, struct 
 kvm_run *kvm_run)
   return 0;
  mmio:
   if (p-dir)
 - r = kvm_io_bus_read(vcpu-kvm, KVM_MMIO_BUS, p-addr,
 + r = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, p-addr,
   p-size, p-data);
   else
 - r = kvm_io_bus_write(vcpu-kvm, KVM_MMIO_BUS, p-addr,
 + r = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, p-addr,
p-size, p-data);
   if (r)
   printk(KERN_ERRkvm: No iodevice found! addr:%lx\n, p-addr);
 diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
 index c1f8f53..5ac065b 100644
 --- a/arch/powerpc/kvm/powerpc.c
 +++ b/arch/powerpc/kvm/powerpc.c
 @@ -814,7 +814,7 @@ int kvmppc_handle_load(struct kvm_run *run, struct 
 kvm_vcpu *vcpu,
  
   idx = srcu_read_lock(vcpu-kvm-srcu);
  
 - ret = kvm_io_bus_read(vcpu-kvm, KVM_MMIO_BUS, run-mmio.phys_addr,
 + ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, run-mmio.phys_addr,
 bytes, run-mmio.data);
  
   srcu_read_unlock(vcpu-kvm-srcu, idx);
 @@ -887,7 +887,7 @@ int kvmppc_handle_store(struct kvm_run *run, struct 
 kvm_vcpu *vcpu,
  
   idx = srcu_read_lock(vcpu-kvm-srcu);
  
 - ret = kvm_io_bus_write(vcpu-kvm, KVM_MMIO_BUS, run-mmio.phys_addr,
 + ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, run-mmio.phys_addr,
  bytes, run-mmio.data);
  
   srcu_read_unlock(vcpu-kvm-srcu, idx);
 diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
 index 9254aff..329ec75 100644
 --- a/arch/s390/kvm/diag.c
 +++ b/arch/s390/kvm/diag.c
 @@ -213,7 +213,7 @@ static int __diag_virtio_hypercall(struct kvm_vcpu *vcpu)
* - gpr 3 contains the virtqueue index (passed as datamatch)
* - gpr 4 contains the index on the bus (optionally)
*/
 - ret = kvm_io_bus_write_cookie(vcpu-kvm, KVM_VIRTIO_CCW_NOTIFY_BUS,
 + ret = kvm_io_bus_write_cookie(vcpu, KVM_VIRTIO_CCW_NOTIFY_BUS,
 vcpu-run-s.regs.gprs[2]  0x,
 8, vcpu-run-s.regs.gprs[3],
 vcpu-run-s.regs.gprs[4]);
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 3e556c6..e6d9f01 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -5620,7 +5620,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
   gpa_t gpa;
  
   gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
 - if (!kvm_io_bus_write(vcpu-kvm, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
 + if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
   skip_emulated_instruction(vcpu);
   return 1;
   }
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 0033df3..bbf9375 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -4053,7 +4053,7 @@ static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t 
 addr, int len,
   n = min(len, 8);
   if (!(vcpu-arch.apic 
 !kvm_iodevice_write(vcpu-arch.apic-dev, addr, n, v))
vcpu param should be added above too.

Eric
 -  kvm_io_bus_write(vcpu-kvm, KVM_MMIO_BUS, addr, n, v))
 +  kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, n, v))
   break;
   handled += n;
   addr += n;
 @@ -4072,8 +4072,9 @@ static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t 
 addr, int len, void *v)
   do {
   n = min(len, 8);
   if (!(vcpu-arch.apic 
 -   !kvm_iodevice_read(vcpu-arch.apic-dev, addr, n, v))
 -  kvm_io_bus_read(vcpu-kvm, KVM_MMIO_BUS, addr, n, v))
 +   !kvm_iodevice_read(vcpu, vcpu-arch.apic-dev,
 +  addr, n, v))
 +  kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, n, v))
   break;
   trace_kvm_mmio(KVM_TRACE_MMIO_READ, n, addr, *(u64 *)v);
  

Re: [PATCH 4/7] KVM: arm64: guest debug, add SW break point support

2014-11-26 Thread Peter Maydell
On 26 November 2014 at 16:07, Andrew Jones drjo...@redhat.com wrote:
 There appears to be a typo in the ARM ARM. Subsection Software
 Breakpoint Instruction exception of D1.10.4 says BRK (ESR_EL2_EC_BRK64)
 is 0x39, but the table above that has it correctly as 0x3c.

Thanks for pointing out this typo -- I've reported it to the
appropriate documentation people.

-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC v3 12/12] s390x/virtio-ccw: enable virtio 1.0

2014-11-26 Thread Cornelia Huck
virtio-ccw should now have everything in place to operate virtio 1.0
devices, so let's enable revision 1.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 hw/s390x/virtio-ccw.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/s390x/virtio-ccw.h b/hw/s390x/virtio-ccw.h
index 03d5955..08edd8d 100644
--- a/hw/s390x/virtio-ccw.h
+++ b/hw/s390x/virtio-ccw.h
@@ -73,7 +73,7 @@ typedef struct VirtIOCCWDeviceClass {
 #define VIRTIO_CCW_FEATURE_SIZE NR_VIRTIO_FEATURE_WORDS
 
 /* The maximum virtio revision we support. */
-#define VIRTIO_CCW_REV_MAX 0
+#define VIRTIO_CCW_REV_MAX 1
 
 /* Performance improves when virtqueue kick processing is decoupled from the
  * vcpu thread using ioeventfd for some devices. */
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC v3 11/12] virtio-net/virtio-blk: enable virtio 1.0

2014-11-26 Thread Cornelia Huck
virtio-net (non-vhost) and virtio-blk have everything in place to support
virtio 1.0: let's enable the feature bit for them.

Note that VIRTIO_F_VERSION_1 is technically a transport feature; once
every device is ready for virtio 1.0, we can move this setting this
feature bit out of the individual devices.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 hw/block/virtio-blk.c |4 
 hw/net/virtio-net.c   |4 
 2 files changed, 8 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 6d86f60..3781f98 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -569,6 +569,10 @@ static uint32_t virtio_blk_get_features(VirtIODevice 
*vdev, unsigned int index,
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
 
+if (index == 1) {
+features |= (1  (VIRTIO_F_VERSION_1 - 32));
+}
+
 if (index  0) {
 return features;
 }
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 1e214b5..fcfc95f 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -447,6 +447,10 @@ static uint32_t virtio_net_get_features(VirtIODevice 
*vdev, unsigned int index,
 VirtIONet *n = VIRTIO_NET(vdev);
 NetClientState *nc = qemu_get_queue(n-nic);
 
+if (index == 1  !get_vhost_net(nc-peer)) {
+features |= (1  (VIRTIO_F_VERSION_1 - 32));
+}
+
 if (index  0) {
 return features;
 }
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC v3 10/12] s390x/virtio-ccw: support virtio-1 set_vq format

2014-11-26 Thread Cornelia Huck
Support the new CCW_CMD_SET_VQ format for virtio-1 devices.

While we're at it, refactor the code a bit and enforce big endian
fields (which had always been required, even for legacy).

Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 hw/s390x/virtio-ccw.c |  114 ++---
 1 file changed, 80 insertions(+), 34 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index e79f3b8..60d67a3 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -238,11 +238,20 @@ VirtualCssBus *virtual_css_bus_init(void)
 }
 
 /* Communication blocks used by several channel commands. */
-typedef struct VqInfoBlock {
+typedef struct VqInfoBlockLegacy {
 uint64_t queue;
 uint32_t align;
 uint16_t index;
 uint16_t num;
+} QEMU_PACKED VqInfoBlockLegacy;
+
+typedef struct VqInfoBlock {
+uint64_t desc;
+uint32_t res0;
+uint16_t index;
+uint16_t num;
+uint64_t avail;
+uint64_t used;
 } QEMU_PACKED VqInfoBlock;
 
 typedef struct VqConfigBlock {
@@ -269,17 +278,20 @@ typedef struct VirtioRevInfo {
 } QEMU_PACKED VirtioRevInfo;
 
 /* Specify where the virtqueues for the subchannel are in guest memory. */
-static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t addr, uint32_t align,
-  uint16_t index, uint16_t num)
+static int virtio_ccw_set_vqs(SubchDev *sch, VqInfoBlock *info,
+  VqInfoBlockLegacy *linfo)
 {
 VirtIODevice *vdev = virtio_ccw_get_vdev(sch);
+uint16_t index = info ? info-index : linfo-index;
+uint16_t num = info ? info-num : linfo-num;
+uint64_t desc = info ? info-desc : linfo-queue;
 
 if (index  VIRTIO_PCI_QUEUE_MAX) {
 return -EINVAL;
 }
 
 /* Current code in virtio.c relies on 4K alignment. */
-if (addr  (align != 4096)) {
+if (linfo  desc  (linfo-align != 4096)) {
 return -EINVAL;
 }
 
@@ -287,8 +299,12 @@ static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t 
addr, uint32_t align,
 return -EINVAL;
 }
 
-virtio_queue_set_addr(vdev, index, addr);
-if (!addr) {
+if (info) {
+virtio_queue_set_rings(vdev, index, desc, info-avail, info-used);
+} else {
+virtio_queue_set_addr(vdev, index, desc);
+}
+if (!desc) {
 virtio_queue_set_vector(vdev, index, 0);
 } else {
 /* Fail if we don't have a big enough queue. */
@@ -303,10 +319,66 @@ static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t 
addr, uint32_t align,
 return 0;
 }
 
-static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
+static int virtio_ccw_handle_set_vq(SubchDev *sch, CCW1 ccw, bool check_len,
+bool is_legacy)
 {
 int ret;
 VqInfoBlock info;
+VqInfoBlockLegacy linfo;
+size_t info_len = is_legacy ? sizeof(linfo) : sizeof(info);
+
+if (check_len) {
+if (ccw.count != info_len) {
+return -EINVAL;
+}
+} else if (ccw.count  info_len) {
+/* Can't execute command. */
+return -EINVAL;
+}
+if (!ccw.cda) {
+return -EFAULT;
+}
+if (is_legacy) {
+linfo.queue = ldq_be_phys(address_space_memory, ccw.cda);
+linfo.align = ldl_be_phys(address_space_memory,
+  ccw.cda + sizeof(linfo.queue));
+linfo.index = lduw_be_phys(address_space_memory,
+   ccw.cda + sizeof(linfo.queue)
+   + sizeof(linfo.align));
+linfo.num = lduw_be_phys(address_space_memory,
+ ccw.cda + sizeof(linfo.queue)
+ + sizeof(linfo.align)
+ + sizeof(linfo.index));
+ret = virtio_ccw_set_vqs(sch, NULL, linfo);
+} else {
+info.desc = ldq_be_phys(address_space_memory, ccw.cda);
+info.index = lduw_be_phys(address_space_memory,
+  ccw.cda + sizeof(info.desc)
+  + sizeof(info.res0));
+info.num = lduw_be_phys(address_space_memory,
+ccw.cda + sizeof(info.desc)
+  + sizeof(info.res0)
+  + sizeof(info.index));
+info.avail = ldq_be_phys(address_space_memory,
+ ccw.cda + sizeof(info.desc)
+ + sizeof(info.res0)
+ + sizeof(info.index)
+ + sizeof(info.num));
+info.used = ldq_be_phys(address_space_memory,
+ccw.cda + sizeof(info.desc)
++ sizeof(info.res0)
++ sizeof(info.index)
++ sizeof(info.num)
++ sizeof(info.avail));
+ret = virtio_ccw_set_vqs(sch, 

[PATCH RFC v3 07/12] dataplane: allow virtio-1 devices

2014-11-26 Thread Cornelia Huck
Handle endianness conversion for virtio-1 virtqueues correctly.

Note that dataplane now needs to be built per-target.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 hw/block/dataplane/virtio-blk.c   |4 +-
 hw/scsi/virtio-scsi-dataplane.c   |2 +-
 hw/virtio/Makefile.objs   |2 +-
 hw/virtio/dataplane/Makefile.objs |2 +-
 hw/virtio/dataplane/vring.c   |   86 ++---
 include/hw/virtio/dataplane/vring-accessors.h |   75 +
 include/hw/virtio/dataplane/vring.h   |   14 +---
 7 files changed, 131 insertions(+), 54 deletions(-)
 create mode 100644 include/hw/virtio/dataplane/vring-accessors.h

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 1222a37..2d8cc15 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -16,7 +16,9 @@
 #include qemu/iov.h
 #include qemu/thread.h
 #include qemu/error-report.h
+#include hw/virtio/virtio-access.h
 #include hw/virtio/dataplane/vring.h
+#include hw/virtio/dataplane/vring-accessors.h
 #include sysemu/block-backend.h
 #include hw/virtio/virtio-blk.h
 #include virtio-blk.h
@@ -75,7 +77,7 @@ static void complete_request_vring(VirtIOBlockReq *req, 
unsigned char status)
 VirtIOBlockDataPlane *s = req-dev-dataplane;
 stb_p(req-in-status, status);
 
-vring_push(req-dev-dataplane-vring, req-elem,
+vring_push(s-vdev, req-dev-dataplane-vring, req-elem,
req-qiov.size + sizeof(*req-in));
 
 /* Suppress notification to guest by BH and its scheduled
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 03a1e8c..418d73b 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -94,7 +94,7 @@ void virtio_scsi_vring_push_notify(VirtIOSCSIReq *req)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(req-vring-parent);
 
-vring_push(req-vring-vring, req-elem,
+vring_push(vdev, req-vring-vring, req-elem,
req-qsgl.size + req-resp_iov.size);
 
 if (vring_should_notify(vdev, req-vring-vring)) {
diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index d21c397..19b224a 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -2,7 +2,7 @@ common-obj-y += virtio-rng.o
 common-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
 common-obj-y += virtio-bus.o
 common-obj-y += virtio-mmio.o
-common-obj-$(CONFIG_VIRTIO) += dataplane/
+obj-$(CONFIG_VIRTIO) += dataplane/
 
 obj-y += virtio.o virtio-balloon.o 
 obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
diff --git a/hw/virtio/dataplane/Makefile.objs 
b/hw/virtio/dataplane/Makefile.objs
index 9a8cfc0..753a9ca 100644
--- a/hw/virtio/dataplane/Makefile.objs
+++ b/hw/virtio/dataplane/Makefile.objs
@@ -1 +1 @@
-common-obj-y += vring.o
+obj-y += vring.o
diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c
index a051775..0da8d6b 100644
--- a/hw/virtio/dataplane/vring.c
+++ b/hw/virtio/dataplane/vring.c
@@ -18,7 +18,9 @@
 #include hw/hw.h
 #include exec/memory.h
 #include exec/address-spaces.h
+#include hw/virtio/virtio-access.h
 #include hw/virtio/dataplane/vring.h
+#include hw/virtio/dataplane/vring-accessors.h
 #include qemu/error-report.h
 
 /* vring_map can be coupled with vring_unmap or (if you still have the
@@ -83,7 +85,7 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
 vring_init(vring-vr, virtio_queue_get_num(vdev, n), vring_ptr, 4096);
 
 vring-last_avail_idx = virtio_queue_get_last_avail_idx(vdev, n);
-vring-last_used_idx = vring-vr.used-idx;
+vring-last_used_idx = vring_get_used_idx(vdev, vring);
 vring-signalled_used = 0;
 vring-signalled_used_valid = false;
 
@@ -104,7 +106,7 @@ void vring_teardown(Vring *vring, VirtIODevice *vdev, int n)
 void vring_disable_notification(VirtIODevice *vdev, Vring *vring)
 {
 if (!(vdev-guest_features[0]  (1  VIRTIO_RING_F_EVENT_IDX))) {
-vring-vr.used-flags |= VRING_USED_F_NO_NOTIFY;
+vring_set_used_flags(vdev, vring, VRING_USED_F_NO_NOTIFY);
 }
 }
 
@@ -117,10 +119,10 @@ bool vring_enable_notification(VirtIODevice *vdev, Vring 
*vring)
 if (vdev-guest_features[0]  (1  VIRTIO_RING_F_EVENT_IDX)) {
 vring_avail_event(vring-vr) = vring-vr.avail-idx;
 } else {
-vring-vr.used-flags = ~VRING_USED_F_NO_NOTIFY;
+vring_clear_used_flags(vdev, vring, VRING_USED_F_NO_NOTIFY);
 }
 smp_mb(); /* ensure update is seen before reading avail_idx */
-return !vring_more_avail(vring);
+return !vring_more_avail(vdev, vring);
 }
 
 /* This is stolen from linux/drivers/vhost/vhost.c:vhost_notify() */
@@ -134,12 +136,13 @@ bool vring_should_notify(VirtIODevice *vdev, Vring *vring)
 smp_mb();
 
 if ((vdev-guest_features[0]  VIRTIO_F_NOTIFY_ON_EMPTY) 
-unlikely(vring-vr.avail-idx == vring-last_avail_idx)) {
+unlikely(!vring_more_avail(vdev, 

[PATCH RFC v3 06/12] virtio: allow virtio-1 queue layout

2014-11-26 Thread Cornelia Huck
For virtio-1 devices, we allow a more complex queue layout that doesn't
require descriptor table and rings on a physically-contigous memory area:
add virtio_queue_set_rings() to allow transports to set this up.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 hw/virtio/virtio.c |   16 
 include/hw/virtio/virtio.h |2 ++
 2 files changed, 18 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 4149f45..2c6bb91 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -96,6 +96,13 @@ static void virtqueue_init(VirtQueue *vq)
 {
 hwaddr pa = vq-pa;
 
+if (pa == -1ULL) {
+/*
+ * This is a virtio-1 style vq that has already been setup
+ * in virtio_queue_set.
+ */
+return;
+}
 vq-vring.desc = pa;
 vq-vring.avail = pa + vq-vring.num * sizeof(VRingDesc);
 vq-vring.used = vring_align(vq-vring.avail +
@@ -719,6 +726,15 @@ hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n)
 return vdev-vq[n].pa;
 }
 
+void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
+hwaddr avail, hwaddr used)
+{
+vdev-vq[n].pa = -1ULL;
+vdev-vq[n].vring.desc = desc;
+vdev-vq[n].vring.avail = avail;
+vdev-vq[n].vring.used = used;
+}
+
 void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
 {
 /* Don't allow guest to flip queue between existent and
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 40e567c..f840320 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -227,6 +227,8 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, 
hwaddr addr);
 hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n);
 void virtio_queue_set_num(VirtIODevice *vdev, int n, int num);
 int virtio_queue_get_num(VirtIODevice *vdev, int n);
+void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
+hwaddr avail, hwaddr used);
 void virtio_queue_set_align(VirtIODevice *vdev, int n, int align);
 void virtio_queue_notify(VirtIODevice *vdev, int n);
 uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC v3 09/12] s390x/virtio-ccw: add virtio set-revision call

2014-11-26 Thread Cornelia Huck
From: Thomas Huth th...@linux.vnet.ibm.com

Handle the virtio-ccw revision according to what the guest sets.
When revision 1 is selected, we have a virtio-1 standard device
with byteswapping for the virtio rings.

When a channel gets disabled, we have to revert to the legacy behavior
in case the next user of the device does not negotiate the revision 1
anymore (e.g. the boot firmware uses revision 1, but the operating
system only uses the legacy mode).

Note that revisions  0 are still disabled; but we still extend the
feature bit size to be able to handle the VERSION_1 bit.

Signed-off-by: Thomas Huth th...@linux.vnet.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 hw/s390x/virtio-ccw.c |   52 +
 hw/s390x/virtio-ccw.h |7 ++-
 2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 62ec852..e79f3b8 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -20,9 +20,11 @@
 #include hw/virtio/virtio-net.h
 #include hw/sysbus.h
 #include qemu/bitops.h
+#include hw/virtio/virtio-access.h
 #include hw/virtio/virtio-bus.h
 #include hw/s390x/adapter.h
 #include hw/s390x/s390_flic.h
+#include linux/virtio_config.h
 
 #include ioinst.h
 #include css.h
@@ -260,6 +262,12 @@ typedef struct VirtioThinintInfo {
 uint8_t isc;
 } QEMU_PACKED VirtioThinintInfo;
 
+typedef struct VirtioRevInfo {
+uint16_t revision;
+uint16_t length;
+uint8_t data[0];
+} QEMU_PACKED VirtioRevInfo;
+
 /* Specify where the virtqueues for the subchannel are in guest memory. */
 static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t addr, uint32_t align,
   uint16_t index, uint16_t num)
@@ -299,6 +307,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 {
 int ret;
 VqInfoBlock info;
+VirtioRevInfo revinfo;
 uint8_t status;
 VirtioFeatDesc features;
 void *config;
@@ -373,6 +382,13 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
ccw.cda + sizeof(features.features));
 if (features.index  ARRAY_SIZE(dev-host_features)) {
 features.features = dev-host_features[features.index];
+/*
+ * Don't offer version 1 to the guest if it did not
+ * negotiate at least revision 1.
+ */
+if (features.index == 1  dev-revision = 0) {
+features.features = ~(1  (VIRTIO_F_VERSION_1 - 32));
+}
 } else {
 /* Return zeroes if the guest supports more feature bits. */
 features.features = 0;
@@ -400,6 +416,13 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
ccw.cda + sizeof(features.features));
 features.features = ldl_le_phys(address_space_memory, ccw.cda);
 if (features.index  ARRAY_SIZE(vdev-guest_features)) {
+/*
+ * The guest should not set version 1 if it didn't
+ * negotiate a revision = 1.
+ */
+if (features.index == 1  dev-revision = 0) {
+features.features = ~(1  (VIRTIO_F_VERSION_1 - 32));
+}
 virtio_set_features(vdev, features.index, features.features);
 } else {
 /*
@@ -600,6 +623,25 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 }
 }
 break;
+case CCW_CMD_SET_VIRTIO_REV:
+len = sizeof(revinfo);
+if (ccw.count  len || (check_len  ccw.count  len)) {
+ret = -EINVAL;
+break;
+}
+if (!ccw.cda) {
+ret = -EFAULT;
+break;
+}
+cpu_physical_memory_read(ccw.cda, revinfo, len);
+if (dev-revision = 0 ||
+revinfo.revision  VIRTIO_CCW_REV_MAX) {
+ret = -ENOSYS;
+break;
+}
+ret = 0;
+dev-revision = revinfo.revision;
+break;
 default:
 ret = -ENOSYS;
 break;
@@ -607,6 +649,13 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 return ret;
 }
 
+static void virtio_sch_disable_cb(SubchDev *sch)
+{
+VirtioCcwDevice *dev = sch-driver_data;
+
+dev-revision = -1;
+}
+
 static int virtio_ccw_device_init(VirtioCcwDevice *dev, VirtIODevice *vdev)
 {
 unsigned int cssid = 0;
@@ -733,6 +782,7 @@ static int virtio_ccw_device_init(VirtioCcwDevice *dev, 
VirtIODevice *vdev)
 css_sch_build_virtual_schib(sch, 0, VIRTIO_CCW_CHPID_TYPE);
 
 sch-ccw_cb = virtio_ccw_cb;
+sch-disable_cb = virtio_sch_disable_cb;
 
 /* Build senseid data. */
 memset(sch-id, 0, sizeof(SenseId));
@@ -740,6 +790,8 @@ static int virtio_ccw_device_init(VirtioCcwDevice *dev, 
VirtIODevice *vdev)
 sch-id.cu_type = VIRTIO_CCW_CU_TYPE;
 sch-id.cu_model = vdev-device_id;
 
+dev-revision = 

[PATCH RFC v3 02/12] virtio: cull virtio_bus_set_vdev_features

2014-11-26 Thread Cornelia Huck
The only user of this function was virtio-ccw, and it should use
virtio_set_features() like everybody else: We need to make sure
that bad features are masked out properly, which this function did
not do.

Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 hw/s390x/virtio-ccw.c  |3 +--
 hw/virtio/virtio-bus.c |   14 --
 include/hw/virtio/virtio-bus.h |3 ---
 3 files changed, 1 insertion(+), 19 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index ea236c9..84f17bc 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -400,8 +400,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
ccw.cda + sizeof(features.features));
 features.features = ldl_le_phys(address_space_memory, ccw.cda);
 if (features.index  ARRAY_SIZE(dev-host_features)) {
-virtio_bus_set_vdev_features(dev-bus, features.features);
-vdev-guest_features = features.features;
+virtio_set_features(vdev, features.features);
 } else {
 /*
  * If the guest supports more feature bits, assert that it
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index eb77019..a8ffa07 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -109,20 +109,6 @@ uint32_t virtio_bus_get_vdev_features(VirtioBusState *bus,
 return k-get_features(vdev, requested_features);
 }
 
-/* Set the features of the plugged device. */
-void virtio_bus_set_vdev_features(VirtioBusState *bus,
-  uint32_t requested_features)
-{
-VirtIODevice *vdev = virtio_bus_get_device(bus);
-VirtioDeviceClass *k;
-
-assert(vdev != NULL);
-k = VIRTIO_DEVICE_GET_CLASS(vdev);
-if (k-set_features != NULL) {
-k-set_features(vdev, requested_features);
-}
-}
-
 /* Get bad features of the plugged device. */
 uint32_t virtio_bus_get_vdev_bad_features(VirtioBusState *bus)
 {
diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
index 0756545..0d2e7b4 100644
--- a/include/hw/virtio/virtio-bus.h
+++ b/include/hw/virtio/virtio-bus.h
@@ -84,9 +84,6 @@ size_t virtio_bus_get_vdev_config_len(VirtioBusState *bus);
 /* Get the features of the plugged device. */
 uint32_t virtio_bus_get_vdev_features(VirtioBusState *bus,
 uint32_t requested_features);
-/* Set the features of the plugged device. */
-void virtio_bus_set_vdev_features(VirtioBusState *bus,
-  uint32_t requested_features);
 /* Get bad features of the plugged device. */
 uint32_t virtio_bus_get_vdev_bad_features(VirtioBusState *bus);
 /* Get config of the plugged device. */
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC v3 08/12] s390x/css: Add a callback for when subchannel gets disabled

2014-11-26 Thread Cornelia Huck
From: Thomas Huth th...@linux.vnet.ibm.com

We need a possibility to run code when a subchannel gets disabled.
This patch adds the necessary infrastructure.

Signed-off-by: Thomas Huth th...@linux.vnet.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 hw/s390x/css.c |   12 
 hw/s390x/css.h |1 +
 2 files changed, 13 insertions(+)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index b67c039..735ec55 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -588,6 +588,7 @@ int css_do_msch(SubchDev *sch, SCHIB *orig_schib)
 {
 SCSW *s = sch-curr_status.scsw;
 PMCW *p = sch-curr_status.pmcw;
+uint16_t oldflags;
 int ret;
 SCHIB schib;
 
@@ -610,6 +611,7 @@ int css_do_msch(SubchDev *sch, SCHIB *orig_schib)
 copy_schib_from_guest(schib, orig_schib);
 /* Only update the program-modifiable fields. */
 p-intparm = schib.pmcw.intparm;
+oldflags = p-flags;
 p-flags = ~(PMCW_FLAGS_MASK_ISC | PMCW_FLAGS_MASK_ENA |
   PMCW_FLAGS_MASK_LM | PMCW_FLAGS_MASK_MME |
   PMCW_FLAGS_MASK_MP);
@@ -625,6 +627,12 @@ int css_do_msch(SubchDev *sch, SCHIB *orig_schib)
 (PMCW_CHARS_MASK_MBFC | PMCW_CHARS_MASK_CSENSE);
 sch-curr_status.mba = schib.mba;
 
+/* Has the channel been disabled? */
+if (sch-disable_cb  (oldflags  PMCW_FLAGS_MASK_ENA) != 0
+ (p-flags  PMCW_FLAGS_MASK_ENA) == 0) {
+sch-disable_cb(sch);
+}
+
 ret = 0;
 
 out:
@@ -1443,6 +1451,10 @@ void css_reset_sch(SubchDev *sch)
 {
 PMCW *p = sch-curr_status.pmcw;
 
+if ((p-flags  PMCW_FLAGS_MASK_ENA) != 0  sch-disable_cb) {
+sch-disable_cb(sch);
+}
+
 p-intparm = 0;
 p-flags = ~(PMCW_FLAGS_MASK_ISC | PMCW_FLAGS_MASK_ENA |
   PMCW_FLAGS_MASK_LM | PMCW_FLAGS_MASK_MME |
diff --git a/hw/s390x/css.h b/hw/s390x/css.h
index 33104ac..7fa807b 100644
--- a/hw/s390x/css.h
+++ b/hw/s390x/css.h
@@ -81,6 +81,7 @@ struct SubchDev {
 uint8_t ccw_no_data_cnt;
 /* transport-provided data: */
 int (*ccw_cb) (SubchDev *, CCW1);
+void (*disable_cb)(SubchDev *);
 SenseId id;
 void *driver_data;
 };
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC v3 05/12] virtio: introduce legacy virtio devices

2014-11-26 Thread Cornelia Huck
Introduce a helper function to indicate  whether a virtio device is
operating in legacy or virtio standard mode.

It may be used to make decisions about the endianess of virtio accesses
and other virtio-1 specific changes, enabling us to support transitional
devices.

Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 hw/virtio/virtio.c|6 +-
 include/hw/virtio/virtio-access.h |4 
 include/hw/virtio/virtio.h|   13 +++--
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 2eb5d3c..4149f45 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -883,7 +883,11 @@ static bool virtio_device_endian_needed(void *opaque)
 VirtIODevice *vdev = opaque;
 
 assert(vdev-device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN);
-return vdev-device_endian != virtio_default_endian();
+if (virtio_device_is_legacy(vdev)) {
+return vdev-device_endian != virtio_default_endian();
+}
+/* Devices conforming to VIRTIO 1.0 or later are always LE. */
+return vdev-device_endian != VIRTIO_DEVICE_ENDIAN_LITTLE;
 }
 
 static const VMStateDescription vmstate_virtio_device_endian = {
diff --git a/include/hw/virtio/virtio-access.h 
b/include/hw/virtio/virtio-access.h
index 46456fd..c123ee0 100644
--- a/include/hw/virtio/virtio-access.h
+++ b/include/hw/virtio/virtio-access.h
@@ -19,6 +19,10 @@
 
 static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
 {
+if (!virtio_device_is_legacy(vdev)) {
+/* Devices conforming to VIRTIO 1.0 or later are always LE. */
+return false;
+}
 #if defined(TARGET_IS_BIENDIAN)
 return virtio_is_big_endian(vdev);
 #elif defined(TARGET_WORDS_BIGENDIAN)
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b408166..40e567c 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -275,9 +275,18 @@ void virtio_queue_set_host_notifier_fd_handler(VirtQueue 
*vq, bool assign,
 void virtio_queue_notify_vq(VirtQueue *vq);
 void virtio_irq(VirtQueue *vq);
 
+static inline bool virtio_device_is_legacy(VirtIODevice *vdev)
+{
+return !(vdev-guest_features[1]  (1  (VIRTIO_F_VERSION_1 - 32)));
+}
+
 static inline bool virtio_is_big_endian(VirtIODevice *vdev)
 {
-assert(vdev-device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN);
-return vdev-device_endian == VIRTIO_DEVICE_ENDIAN_BIG;
+if (virtio_device_is_legacy(vdev)) {
+assert(vdev-device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN);
+return vdev-device_endian == VIRTIO_DEVICE_ENDIAN_BIG;
+}
+/* Devices conforming to VIRTIO 1.0 or later are always LE. */
+return false;
 }
 #endif
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC v3 04/12] s390x/virtio-ccw: fix check for WRITE_FEAT

2014-11-26 Thread Cornelia Huck
We need to check guest feature size, not host feature size to find
out whether we should call virtio_set_features(). This check is
possible now that vdev-guest_features is an array.

Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 hw/s390x/virtio-ccw.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 2f52b82..62ec852 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -399,7 +399,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 features.index = ldub_phys(address_space_memory,
ccw.cda + sizeof(features.features));
 features.features = ldl_le_phys(address_space_memory, ccw.cda);
-if (features.index  ARRAY_SIZE(dev-host_features)) {
+if (features.index  ARRAY_SIZE(vdev-guest_features)) {
 virtio_set_features(vdev, features.index, features.features);
 } else {
 /*
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC v3 00/12] qemu: towards virtio-1 host support

2014-11-26 Thread Cornelia Huck
Next version of virtio-1 patches for qemu.

Only change from v2 is splitting out the vring accessors into a
separate header file - should hopefully fix the build issues.

Cornelia Huck (9):
  virtio: cull virtio_bus_set_vdev_features
  virtio: support more feature bits
  s390x/virtio-ccw: fix check for WRITE_FEAT
  virtio: introduce legacy virtio devices
  virtio: allow virtio-1 queue layout
  dataplane: allow virtio-1 devices
  s390x/virtio-ccw: support virtio-1 set_vq format
  virtio-net/virtio-blk: enable virtio 1.0
  s390x/virtio-ccw: enable virtio 1.0

Thomas Huth (3):
  linux-headers/virtio_config: Update with VIRTIO_F_VERSION_1
  s390x/css: Add a callback for when subchannel gets disabled
  s390x/virtio-ccw: add virtio set-revision call

 hw/9pfs/virtio-9p-device.c|7 +-
 hw/block/dataplane/virtio-blk.c   |4 +-
 hw/block/virtio-blk.c |   13 +-
 hw/char/virtio-serial-bus.c   |9 +-
 hw/net/virtio-net.c   |   42 --
 hw/s390x/css.c|   12 ++
 hw/s390x/css.h|1 +
 hw/s390x/s390-virtio-bus.c|9 +-
 hw/s390x/virtio-ccw.c |  186 +++--
 hw/s390x/virtio-ccw.h |7 +-
 hw/scsi/vhost-scsi.c  |7 +-
 hw/scsi/virtio-scsi-dataplane.c   |2 +-
 hw/scsi/virtio-scsi.c |   10 +-
 hw/virtio/Makefile.objs   |2 +-
 hw/virtio/dataplane/Makefile.objs |2 +-
 hw/virtio/dataplane/vring.c   |   96 +++--
 hw/virtio/virtio-balloon.c|8 +-
 hw/virtio/virtio-bus.c|   23 +--
 hw/virtio/virtio-mmio.c   |9 +-
 hw/virtio/virtio-pci.c|   13 +-
 hw/virtio/virtio-rng.c|2 +-
 hw/virtio/virtio.c|   51 +--
 include/hw/virtio/dataplane/vring-accessors.h |   75 ++
 include/hw/virtio/dataplane/vring.h   |   14 +-
 include/hw/virtio/virtio-access.h |4 +
 include/hw/virtio/virtio-bus.h|   10 +-
 include/hw/virtio/virtio.h|   34 -
 linux-headers/linux/virtio_config.h   |3 +
 28 files changed, 467 insertions(+), 188 deletions(-)
 create mode 100644 include/hw/virtio/dataplane/vring-accessors.h

-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC v3 01/12] linux-headers/virtio_config: Update with VIRTIO_F_VERSION_1

2014-11-26 Thread Cornelia Huck
From: Thomas Huth th...@linux.vnet.ibm.com

Add the new VIRTIO_F_VERSION_1 definition to the virtio_config.h
linux header.

Signed-off-by: Thomas Huth th...@linux.vnet.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 linux-headers/linux/virtio_config.h |3 +++
 1 file changed, 3 insertions(+)

diff --git a/linux-headers/linux/virtio_config.h 
b/linux-headers/linux/virtio_config.h
index 75dc20b..16aa289 100644
--- a/linux-headers/linux/virtio_config.h
+++ b/linux-headers/linux/virtio_config.h
@@ -54,4 +54,7 @@
 /* Can the device handle any descriptor layout? */
 #define VIRTIO_F_ANY_LAYOUT27
 
+/* v1.0 compliant. */
+#define VIRTIO_F_VERSION_1 32
+
 #endif /* _LINUX_VIRTIO_CONFIG_H */
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC v3 03/12] virtio: support more feature bits

2014-11-26 Thread Cornelia Huck
With virtio-1, we support more than 32 feature bits. Let's make
vdev-guest_features depend on the number of supported feature bits,
allowing us to grow the feature bits automatically.

We also need to enhance the internal functions dealing with getting
and setting features with an additional index field, so that all feature
bits may be accessed (in chunks of 32 bits).

vhost and migration have been ignored for now.

Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 hw/9pfs/virtio-9p-device.c |7 ++-
 hw/block/virtio-blk.c  |9 +++--
 hw/char/virtio-serial-bus.c|9 +++--
 hw/net/virtio-net.c|   38 ++
 hw/s390x/s390-virtio-bus.c |9 +
 hw/s390x/virtio-ccw.c  |   17 ++---
 hw/scsi/vhost-scsi.c   |7 +--
 hw/scsi/virtio-scsi.c  |   10 +-
 hw/virtio/dataplane/vring.c|   10 +-
 hw/virtio/virtio-balloon.c |8 ++--
 hw/virtio/virtio-bus.c |9 +
 hw/virtio/virtio-mmio.c|9 +
 hw/virtio/virtio-pci.c |   13 +++--
 hw/virtio/virtio-rng.c |2 +-
 hw/virtio/virtio.c |   29 +
 include/hw/virtio/virtio-bus.h |7 ---
 include/hw/virtio/virtio.h |   19 ++-
 17 files changed, 135 insertions(+), 77 deletions(-)

diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index 2572747..c29c8c8 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -21,8 +21,13 @@
 #include virtio-9p-coth.h
 #include hw/virtio/virtio-access.h
 
-static uint32_t virtio_9p_get_features(VirtIODevice *vdev, uint32_t features)
+static uint32_t virtio_9p_get_features(VirtIODevice *vdev, unsigned int index,
+   uint32_t features)
 {
+if (index  0) {
+return features;
+}
+
 features |= 1  VIRTIO_9P_MOUNT_TAG;
 return features;
 }
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index b19b102..6d86f60 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -564,10 +564,15 @@ static void virtio_blk_set_config(VirtIODevice *vdev, 
const uint8_t *config)
 aio_context_release(blk_get_aio_context(s-blk));
 }
 
-static uint32_t virtio_blk_get_features(VirtIODevice *vdev, uint32_t features)
+static uint32_t virtio_blk_get_features(VirtIODevice *vdev, unsigned int index,
+uint32_t features)
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
 
+if (index  0) {
+return features;
+}
+
 features |= (1  VIRTIO_BLK_F_SEG_MAX);
 features |= (1  VIRTIO_BLK_F_GEOMETRY);
 features |= (1  VIRTIO_BLK_F_TOPOLOGY);
@@ -601,7 +606,7 @@ static void virtio_blk_set_status(VirtIODevice *vdev, 
uint8_t status)
 return;
 }
 
-features = vdev-guest_features;
+features = vdev-guest_features[0];
 
 /* A guest that supports VIRTIO_BLK_F_CONFIG_WCE must be able to send
  * cache flushes.  Thus, the auto writethrough behavior is never
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index a7b1b68..55de504 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -75,7 +75,7 @@ static VirtIOSerialPort *find_port_by_name(char *name)
 static bool use_multiport(VirtIOSerial *vser)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(vser);
-return vdev-guest_features  (1  VIRTIO_CONSOLE_F_MULTIPORT);
+return vdev-guest_features[0]  (1  VIRTIO_CONSOLE_F_MULTIPORT);
 }
 
 static size_t write_to_port(VirtIOSerialPort *port,
@@ -467,10 +467,15 @@ static void handle_input(VirtIODevice *vdev, VirtQueue 
*vq)
 {
 }
 
-static uint32_t get_features(VirtIODevice *vdev, uint32_t features)
+static uint32_t get_features(VirtIODevice *vdev, unsigned int index,
+ uint32_t features)
 {
 VirtIOSerial *vser;
 
+if (index  0) {
+return features;
+}
+
 vser = VIRTIO_SERIAL(vdev);
 
 if (vser-bus.max_nr_ports  1) {
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 9b88775..1e214b5 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -86,7 +86,7 @@ static void virtio_net_set_config(VirtIODevice *vdev, const 
uint8_t *config)
 
 memcpy(netcfg, config, n-config_size);
 
-if (!(vdev-guest_features  VIRTIO_NET_F_CTRL_MAC_ADDR  1) 
+if (!(vdev-guest_features[0]  VIRTIO_NET_F_CTRL_MAC_ADDR  1) 
 memcmp(netcfg.mac, n-mac, ETH_ALEN)) {
 memcpy(n-mac, netcfg.mac, ETH_ALEN);
 qemu_format_nic_info_str(qemu_get_queue(n-nic), n-mac);
@@ -305,7 +305,7 @@ static RxFilterInfo 
*virtio_net_query_rxfilter(NetClientState *nc)
 info-multicast_table = str_list;
 info-vlan_table = get_vlan_table(n);
 
-if (!((1  VIRTIO_NET_F_CTRL_VLAN)  vdev-guest_features)) {
+if (!((1  VIRTIO_NET_F_CTRL_VLAN)  

Re: [CFT PATCH v2 2/2] KVM: x86: support XSAVES usage in the host

2014-11-26 Thread Radim Krčmář
2014-11-26 17:26+0100, Paolo Bonzini:
 On 26/11/2014 15:42, Radim Krčmář wrote:
  I'm not sure what is more future proof. :)  I wonder if native_xrstor
  could be a problem the day XRSTORS actually sets/restores MSRs as the
  processor documentation promises.
  
  XRSTORS won't affect the guest in any way, we are just going to use it
  to convert the xsave, so any side-effects are going to stay in the host.
  (This could break the host though.)
 
 Yes, that's the problem. :)

(It would be a bug in Linux's xsave API, if we were using it correctly.)

  My main presumption is that XSAVE*-XRSTOR*-XSAVE-XRSTOR has the same
  result as XSAVE-XRSTOR, because we are only interested in the state,
  not in any metadata.
  (If it isn't possible to combine intructions, like XSAVE after XRSTORS,
   this solution won't work.)
 
 Yes, that should be right.  But actually what KVM would do it is
 XRSTOR-XSAVE*-XRSTOR*-XSAVE.  The problem here is the side effects of
 doing XRSTORS far from a guest entry...

The entry can be aborted after doing XRSTORS, so we are going to know if
this doesn't work :)

 though that would likely be
 handled by load_guest_fpu/put_guest_fpu.

Yes, I don't see a principal difference between manipulating xsave for
vmentry and this conversion, it can be wrapped in the same way.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] KVM: arm64: guest debug, HW assisted debug support

2014-11-26 Thread Andrew Jones
On Tue, Nov 25, 2014 at 04:10:05PM +, Alex Bennée wrote:
 This adds support for userspace to control the HW debug registers for
 guest debug. We'll only copy the $ARCH defined number across as that's
 all that hyp.S will use anyway. I've moved some helper functions into
 the hw_breakpoint.h header for re-use.
 
 As with single step we need to tweak the guest registers to enable the
 exceptions but we don't want to overwrite the guest copy of these
 registers so this is done close to the guest entry.
 
 Two new capabilities have been added to the KVM_EXTENSION ioctl to allow
 userspace to query the number of hardware break and watch points
 available on the host hardware.
 
 Signed-off-by: Alex Bennée alex.ben...@linaro.org
 
 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index 9383359..5e8c673 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -2593,7 +2593,7 @@ The top 16 bits of the control field are architecture 
 specific control
  flags which can include the following:
  
- KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64]
 -  - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390]
 +  - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390, arm64]
- KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86]
- KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86]
- KVM_GUESTDBG_EXIT_PENDING:  trigger an immediate guest exit [s390]
 @@ -2606,7 +2606,10 @@ we need to ensure the guest vCPUs architecture 
 specific registers are
  updated to the correct (supplied) values.
  
  The second part of the structure is architecture specific and
 -typically contains a set of debug registers.
 +typically contains a set of debug registers. For arm64 the number of
 +debug registers is implementation defined and can be determined by
 +querying the KVM_CAP_GUEST_DEBUG_HW_BPS and KVM_CAP_GUEST_DEBUG_HW_WPS
 +capabilities.
  
  When debug events exit the main run loop with the reason
  KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index a76daae..c8ec23a 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -39,6 +39,7 @@
  #include asm/cacheflush.h
  #include asm/virt.h
  #include asm/debug-monitors.h
 +#include asm/hw_breakpoint.h
  #include asm/kvm_arm.h
  #include asm/kvm_asm.h
  #include asm/kvm_mmu.h
 @@ -341,8 +342,37 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu 
 *vcpu,
  
   /* Hardware assisted Break and Watch points */
   if (vcpu-guest_debug  KVM_GUESTDBG_USE_HW_BP) {
 - kvm_info(HW BP support requested, not yet implemented\n);
 - return -EINVAL;
 + int i;
 + int nb = get_num_brps();
 + int nw = get_num_wrps();
 +
 + /* Copy across up to IMPDEF debug registers to our
 +  * shadow copy in the vcpu structure. The hyp.S code
 +  * will then set them up before we re-enter the guest.
 +  */
 + memcpy(vcpu-arch.guest_debug_regs.dbg_bcr,
 + dbg-arch.dbg_bcr, sizeof(__u64)*nb);
 + memcpy(vcpu-arch.guest_debug_regs.dbg_bvr,
 + dbg-arch.dbg_bvr, sizeof(__u64)*nb);
 + memcpy(vcpu-arch.guest_debug_regs.dbg_wcr,
 + dbg-arch.dbg_wcr, sizeof(__u64)*nw);
 + memcpy(vcpu-arch.guest_debug_regs.dbg_wvr,
 + dbg-arch.dbg_wvr, sizeof(__u64)*nw);
 +
 + kvm_info(HW BP support requested\n);
 + for (i = 0; i  nb; i++) {
 + kvm_info(%d: dbg_bcr=0x%llx dbg_bvr=0x%llx\n,
 +  i,
 + vcpu-arch.guest_debug_regs.dbg_bcr[i],
 + vcpu-arch.guest_debug_regs.dbg_bvr[i]);
 + }
 + for (i = 0; i  nw; i++) {
 + kvm_info(%d: dbg_wcr=0x%llx dbg_wvr=0x%llx\n,
 +  i,
 +  vcpu-arch.guest_debug_regs.dbg_wcr[i],
 +  vcpu-arch.guest_debug_regs.dbg_wvr[i]);
 + }

I think we decided to drop the kvm_info's. Here's several more lines of
logging to drop too.

 + route_el2 = true;
   }
  
   /* If we are going to handle any debug exceptions we need to
 diff --git a/arch/arm64/include/asm/hw_breakpoint.h 
 b/arch/arm64/include/asm/hw_breakpoint.h
 index 52b484b..c450552 100644
 --- a/arch/arm64/include/asm/hw_breakpoint.h
 +++ b/arch/arm64/include/asm/hw_breakpoint.h
 @@ -130,6 +130,18 @@ static inline void ptrace_hw_copy_thread(struct 
 task_struct *task)
  }
  #endif
  
 +/* Determine number of BRP registers available. */
 +static inline int get_num_brps(void)
 +{
 + return ((read_cpuid(ID_AA64DFR0_EL1)  12)  0xf) + 1;
 +}
 +
 +/* Determine number of WRP registers available. */
 +static inline int 

Re: [PATCH 2/7] KVM: arm: guest debug, define API headers

2014-11-26 Thread Andrew Jones
On Wed, Nov 26, 2014 at 05:46:05PM +0100, Paolo Bonzini wrote:
 
 
 On 26/11/2014 15:58, Alex Bennée wrote:
  
  Andrew Jones drjo...@redhat.com writes:
  
  On Tue, Nov 25, 2014 at 04:10:00PM +, Alex Bennée wrote:
  This commit defines the API headers for guest debugging. There are two
  architecture specific debug structures:
  snip
  +/* Architecture related debug defines - upper 16 bits of
  + * kvm_guest_debug-control
  + */
  +#define KVM_GUESTDBG_USE_SW_BP_SHIFT 16
  +#define KVM_GUESTDBG_USE_SW_BP   (1  
  KVM_GUESTDBG_USE_SW_BP_SHIFT)
  +#define KVM_GUESTDBG_USE_HW_BP_SHIFT 17
  +#define KVM_GUESTDBG_USE_HW_BP   (1  
  KVM_GUESTDBG_USE_HW_BP_SHIFT)
  +
 
  I see this are defined in arch/x86/include/uapi/asm/kvm.h,
  so you needed to reproduce them here, but shouldn't they
  be promoted to include/uapi/linux/kvm.h instead?
  
  Well if we move them to common uapi we either restrict the $ARCH
  specific options that don't have SW/HW BKPTS (would be weird but...) or
  make them generic in the lower 16 bits (breaks API).
  
  But in principle I have no objection if other don't.
 
 I think it's a matter of personal taste.  Architecture-specific means
 not all architectures may support it, but it's certainly a good idea
 to reuse the same value if multiple architectures do support a #define.
 
 What you did is fine, another possibility is to do
 
 #define __KVM_GUESTDBG_USE_SW_BP   (1  16)
 
 in include/uapi/linux/kvm.h, and
 
 #define KVM_GUESTDBG_USE_SW_BP __KVM_GUESTDBG_USE_SW_BP
 
 in the arch-specific file.  Andrew, is this closer to what you intended?


I just reread Documentation/virtual/kvm/api.txt a bit more closely and
see that these fall in the architecture specific control region of the
field. So forget what I said. But your suggestion of __KVM_GUESTDBG_USE_SW_BP
looks like a good idea to me.

drew
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs

2014-11-26 Thread Paolo Bonzini


On 26/11/2014 18:01, Nadav Amit wrote:
 Paolo Bonzini pbonz...@redhat.com wrote:
 


 On 06/11/2014 17:45, Radim Krčmář wrote:
 2014-11-06 10:34+0100, Paolo Bonzini:
 On 05/11/2014 21:45, Nadav Amit wrote:
 If I understand the SDM correctly, in such scenario (all APICs are
 software disabled) the mode is left as the default - flat mode (see

 APIC doesn't have any global mode (it is just KVM's simplification), so
 when a message lands on the system bus, it just compares MDA with LDR
 and DFR ...

 section 10.6.2.2 Logical Destination Mode”): All processors that
 have their APIC software enabled (using the spurious vector
 enable/disable bit) must have their DFRs (Destination Format
 Registers) programmed identically. The default mode for DFR is flat
 mode.”

 I think the default mode points to reset state, which is flat DFR;
 and it is mentioned only because of the following sentence
  If you are using cluster mode, DFRs must be programmed before the APIC
  is software enabled.

 That's not what either Bochs or QEMU do, though.  (Though in the case of
 Bochs I cannot find the place where reception of IPIs is prevented for
 software-disabled APICs, so I'm not sure how much to trust it in this 
 case).

 I'm not sure why software-disabled APICs could have different DFRs, if
 the APICs can receive NMI IPIs.  I'll ask Intel.

 When changing the mode, we can't switch DFR synchronously, so it has to
 happen and NMI may be needed (watchdog?) before APIC configuration.
 Explicit statement might have been a hint to be _extra_ careful when
 using logical destination for INIT, NMI, ... I wonder what they'll say.

 Anyway, Paolo's patch seems to be in the right direction, I'd modify it
 a bit though:

 LDR=0 isn't addressable in any logical mode, so we can ignore APICs that
 don't set it and decide the mode by the last nonzero one.
 This works in a situation, where one part is configured for cluster and
 the rest is still in reset state.

 (It gets harder if we allow nonzero LDRs with different DFR ...
 we'd need to split our logical map to handle it.)

 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 758f838..6da303e1 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -148,10 +148,6 @@ static void recalculate_apic_map(struct kvm *kvm)
 goto out;

 new-ldr_bits = 8;
 -   /* flat mode is default */
 -   new-cid_shift = 8;
 -   new-cid_mask = 0;
 -   new-lid_mask = 0xff;
 new-broadcast = APIC_BROADCAST;

 kvm_for_each_vcpu(i, vcpu, kvm) {
 @@ -166,7 +162,7 @@ static void recalculate_apic_map(struct kvm *kvm)
 new-cid_mask = (1  KVM_X2APIC_CID_BITS) - 1;
 new-lid_mask = 0x;
 new-broadcast = X2APIC_BROADCAST;
 -   } else if (kvm_apic_hw_enabled(apic)) {
 +   } else if (kvm_apic_get_reg(apic, APIC_LDR)) {
 if (kvm_apic_get_reg(apic, APIC_DFR) ==
 APIC_DFR_CLUSTER) {
 new-cid_shift = 4;

 I merged this patch and Nadav’s.
 
 Sorry for the late and long reply, but I got an issue with the new version
 (and my previous version as well). Indeed, the SDM states that DFR should
 be the same for enabled CPUs, and that the BIOS should get all CPUs in
 either xAPIC or x2APIC. Yet, there is nothing that says all CPUs need to be
 in xAPIC/x2APIC mode.
 
 In my tests (which pass on bare-metal), I got a scenario in which some CPUs
 are in xAPIC mode, the BSP changed (which is currently not handled correctly
 by KVM) and the BSP has x2APIC enabled. All the core APICs are
 software-enabled. The expected behaviour is that the CPUs with x2APIC
 enabled would work in x2APIC mode.
 
 I think such a transitory scenario is possible on real-systems as well,
 perhaps during CPU hot-plug. It appears the previous version (before all of
 our changes) handled it better. I presume the most efficient way is to start
 determining the APIC logical mode from the BSP, and if it is disabled,
 traverse the rest of the CPUs until finding the first one with APIC enabled.
 Yet, I have not finished doing and checking the BSP fix and other dependent
 INIT signal handling fixes.
 
 In the meanwhile, would you be ok with restoring some of the previous
 behaviour - i.e., x2APIC is enabled if any CPU turned it on (regardless to
 whether APIC is software enabled), otherwise use the configuration of the
 last enabled APIC?
 
 -- 8 —
 Subject: [PATCH] KVM: x86: Traverse all CPUs during recalculate_apic_map
 
 ---
  arch/x86/kvm/lapic.c | 20 +++-
  1 file changed, 11 insertions(+), 9 deletions(-)
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 9c90d31..6dc2be6 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -139,6 +139,7 @@ static void recalculate_apic_map(struct kvm *kvm)
   struct kvm_apic_map *new, *old = NULL;
   struct kvm_vcpu *vcpu;
   int i;
 + bool any_enabled 

Re: [PATCH 5/7] KVM: arm64: guest debug, add support for single-step

2014-11-26 Thread Alex Bennée

Andrew Jones drjo...@redhat.com writes:

 On Tue, Nov 25, 2014 at 04:10:03PM +, Alex Bennée wrote:
 This adds support for single-stepping the guest. As userspace can and
 will manipulate guest registers before restarting any tweaking of the
 registers has to occur just before control is passed back to the guest.
 Furthermore while guest debugging is in effect we need to squash the
 ability of the guest to single-step itself as we have no easy way of
 re-entering the guest after the exception has been delivered to the
 hypervisor.
 
 Signed-off-by: Alex Bennée alex.ben...@linaro.org
 
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index 48d26bb..a76daae 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -38,6 +38,7 @@
  #include asm/tlbflush.h
  #include asm/cacheflush.h
  #include asm/virt.h
 +#include asm/debug-monitors.h
  #include asm/kvm_arm.h
  #include asm/kvm_asm.h
  #include asm/kvm_mmu.h
 @@ -300,6 +301,17 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
  kvm_arm_set_running_vcpu(NULL);
  }
  
 +/**
 + * kvm_arch_vcpu_ioctl_set_guest_debug - Setup guest debugging
 + * @kvm:pointer to the KVM struct
 + * @kvm_guest_debug: the ioctl data buffer
 + *
 + * This sets up the VM for guest debugging. Care has to be taken when
 + * manipulating guest registers as these will be set/cleared by the
 + * hyper-visor controller, typically before each kvm_run event. As a
 + * result modification of the guest registers needs to take place
 + * after they have been restored in the hyp.S trampoline code.
 + */
  int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
  struct kvm_guest_debug *dbg)
  {
 @@ -317,8 +329,8 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu 
 *vcpu,
  
  /* Single Step */
  if (vcpu-guest_debug  KVM_GUESTDBG_SINGLESTEP) {
 -kvm_info(SS requested, not yet implemented\n);
 -return -EINVAL;
 +kvm_info(SS requested\n);
 +route_el2 = true;
  }
  
  /* Software Break Points */
 diff --git a/arch/arm64/kernel/asm-offsets.c 
 b/arch/arm64/kernel/asm-offsets.c
 index 8da1043..78e5ae1 100644
 --- a/arch/arm64/kernel/asm-offsets.c
 +++ b/arch/arm64/kernel/asm-offsets.c
 @@ -121,6 +121,7 @@ int main(void)
DEFINE(VCPU_FAR_EL2,  offsetof(struct kvm_vcpu, 
 arch.fault.far_el2));
DEFINE(VCPU_HPFAR_EL2,offsetof(struct kvm_vcpu, 
 arch.fault.hpfar_el2));
DEFINE(VCPU_DEBUG_FLAGS,  offsetof(struct kvm_vcpu, arch.debug_flags));
 +  DEFINE(GUEST_DEBUG,   offsetof(struct kvm_vcpu, guest_debug));
DEFINE(VCPU_HCR_EL2,  offsetof(struct kvm_vcpu, 
 arch.hcr_el2));
DEFINE(VCPU_MDCR_EL2, offsetof(struct kvm_vcpu, arch.mdcr_el2));
DEFINE(VCPU_IRQ_LINES,offsetof(struct kvm_vcpu, arch.irq_lines));
 diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
 index 28dc92b..6def054 100644
 --- a/arch/arm64/kvm/handle_exit.c
 +++ b/arch/arm64/kvm/handle_exit.c
 @@ -91,6 +91,25 @@ static int kvm_handle_bkpt(struct kvm_vcpu *vcpu, struct 
 kvm_run *run)
  return 0;
  }
  
 +/**
 + * kvm_handle_ss - handle single step exceptions
 + *
 + * @vcpu:   the vcpu pointer

 same @run comment as other handler header in previous patch

Yeah I think I'll be merging them all together given the comments about
passing syndrome info directly.

 + *
 + * See: ARM ARM D2.12 for the details. While the host is routing debug
 + * exceptions to it's handlers we have to suppress the ability of the
 + * guest to trigger exceptions.
 + */
 +static int kvm_handle_ss(struct kvm_vcpu *vcpu, struct kvm_run *run)
 +{
 +WARN_ON(!(vcpu-guest_debug  KVM_GUESTDBG_SINGLESTEP));

 I'm not sure about this WARN_ON. Is there some scenario you were
 thinking of when you put it here? Is there some scenario where this
 could trigger so frequently we kill the log buffer?

The main one I had in mind was not suppressing the guest's attempt to
step while guest debugging was running.

snip
  
 -/* for KVM_SET_GUEST_DEBUG */
 -
 -#define KVM_GUESTDBG_ENABLE 0x0001
 -#define KVM_GUESTDBG_SINGLESTEP 0x0002
 -
  struct kvm_guest_debug {
  __u32 control;
  __u32 pad;
 @@ -1189,4 +1186,15 @@ struct kvm_assigned_msix_entry {
  __u16 padding[3];
  };
  
 +#endif /* __ASSEMBLY__ */
 +
 +/* for KVM_SET_GUEST_DEBUG */
 +
 +#define KVM_GUESTDBG_ENABLE_SHIFT   0
 +#define KVM_GUESTDBG_ENABLE (1  KVM_GUESTDBG_ENABLE_SHIFT)
 +#define KVM_GUESTDBG_SINGLESTEP_SHIFT   1
 +#define KVM_GUESTDBG_SINGLESTEP (1  KVM_GUESTDBG_SINGLESTEP_SHIFT)

 EALIGN: we can tab these defines up better

Sure, I'll clean those up.

-- 
Alex Bennée
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC v3 07/12] dataplane: allow virtio-1 devices

2014-11-26 Thread Greg Kurz
On Wed, 26 Nov 2014 18:28:38 +0100
Cornelia Huck cornelia.h...@de.ibm.com wrote:

 Handle endianness conversion for virtio-1 virtqueues correctly.
 
 Note that dataplane now needs to be built per-target.
 
 Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com

Build is ok now.

Acked-by: Greg Kurz gk...@linux.vnet.ibm.com

 ---
  hw/block/dataplane/virtio-blk.c   |4 +-
  hw/scsi/virtio-scsi-dataplane.c   |2 +-
  hw/virtio/Makefile.objs   |2 +-
  hw/virtio/dataplane/Makefile.objs |2 +-
  hw/virtio/dataplane/vring.c   |   86 
 ++---
  include/hw/virtio/dataplane/vring-accessors.h |   75 +
  include/hw/virtio/dataplane/vring.h   |   14 +---
  7 files changed, 131 insertions(+), 54 deletions(-)
  create mode 100644 include/hw/virtio/dataplane/vring-accessors.h
 
 diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
 index 1222a37..2d8cc15 100644
 --- a/hw/block/dataplane/virtio-blk.c
 +++ b/hw/block/dataplane/virtio-blk.c
 @@ -16,7 +16,9 @@
  #include qemu/iov.h
  #include qemu/thread.h
  #include qemu/error-report.h
 +#include hw/virtio/virtio-access.h
  #include hw/virtio/dataplane/vring.h
 +#include hw/virtio/dataplane/vring-accessors.h
  #include sysemu/block-backend.h
  #include hw/virtio/virtio-blk.h
  #include virtio-blk.h
 @@ -75,7 +77,7 @@ static void complete_request_vring(VirtIOBlockReq *req, 
 unsigned char status)
  VirtIOBlockDataPlane *s = req-dev-dataplane;
  stb_p(req-in-status, status);
 
 -vring_push(req-dev-dataplane-vring, req-elem,
 +vring_push(s-vdev, req-dev-dataplane-vring, req-elem,
 req-qiov.size + sizeof(*req-in));
 
  /* Suppress notification to guest by BH and its scheduled
 diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
 index 03a1e8c..418d73b 100644
 --- a/hw/scsi/virtio-scsi-dataplane.c
 +++ b/hw/scsi/virtio-scsi-dataplane.c
 @@ -94,7 +94,7 @@ void virtio_scsi_vring_push_notify(VirtIOSCSIReq *req)
  {
  VirtIODevice *vdev = VIRTIO_DEVICE(req-vring-parent);
 
 -vring_push(req-vring-vring, req-elem,
 +vring_push(vdev, req-vring-vring, req-elem,
 req-qsgl.size + req-resp_iov.size);
 
  if (vring_should_notify(vdev, req-vring-vring)) {
 diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
 index d21c397..19b224a 100644
 --- a/hw/virtio/Makefile.objs
 +++ b/hw/virtio/Makefile.objs
 @@ -2,7 +2,7 @@ common-obj-y += virtio-rng.o
  common-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
  common-obj-y += virtio-bus.o
  common-obj-y += virtio-mmio.o
 -common-obj-$(CONFIG_VIRTIO) += dataplane/
 +obj-$(CONFIG_VIRTIO) += dataplane/
 
  obj-y += virtio.o virtio-balloon.o 
  obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
 diff --git a/hw/virtio/dataplane/Makefile.objs 
 b/hw/virtio/dataplane/Makefile.objs
 index 9a8cfc0..753a9ca 100644
 --- a/hw/virtio/dataplane/Makefile.objs
 +++ b/hw/virtio/dataplane/Makefile.objs
 @@ -1 +1 @@
 -common-obj-y += vring.o
 +obj-y += vring.o
 diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c
 index a051775..0da8d6b 100644
 --- a/hw/virtio/dataplane/vring.c
 +++ b/hw/virtio/dataplane/vring.c
 @@ -18,7 +18,9 @@
  #include hw/hw.h
  #include exec/memory.h
  #include exec/address-spaces.h
 +#include hw/virtio/virtio-access.h
  #include hw/virtio/dataplane/vring.h
 +#include hw/virtio/dataplane/vring-accessors.h
  #include qemu/error-report.h
 
  /* vring_map can be coupled with vring_unmap or (if you still have the
 @@ -83,7 +85,7 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
  vring_init(vring-vr, virtio_queue_get_num(vdev, n), vring_ptr, 4096);
 
  vring-last_avail_idx = virtio_queue_get_last_avail_idx(vdev, n);
 -vring-last_used_idx = vring-vr.used-idx;
 +vring-last_used_idx = vring_get_used_idx(vdev, vring);
  vring-signalled_used = 0;
  vring-signalled_used_valid = false;
 
 @@ -104,7 +106,7 @@ void vring_teardown(Vring *vring, VirtIODevice *vdev, int 
 n)
  void vring_disable_notification(VirtIODevice *vdev, Vring *vring)
  {
  if (!(vdev-guest_features[0]  (1  VIRTIO_RING_F_EVENT_IDX))) {
 -vring-vr.used-flags |= VRING_USED_F_NO_NOTIFY;
 +vring_set_used_flags(vdev, vring, VRING_USED_F_NO_NOTIFY);
  }
  }
 
 @@ -117,10 +119,10 @@ bool vring_enable_notification(VirtIODevice *vdev, 
 Vring *vring)
  if (vdev-guest_features[0]  (1  VIRTIO_RING_F_EVENT_IDX)) {
  vring_avail_event(vring-vr) = vring-vr.avail-idx;
  } else {
 -vring-vr.used-flags = ~VRING_USED_F_NO_NOTIFY;
 +vring_clear_used_flags(vdev, vring, VRING_USED_F_NO_NOTIFY);
  }
  smp_mb(); /* ensure update is seen before reading avail_idx */
 -return !vring_more_avail(vring);
 +return !vring_more_avail(vdev, vring);
  }
 
  /* This is stolen from linux/drivers/vhost/vhost.c:vhost_notify() */
 @@ 

Re: [Qemu-devel] [PATCH RFC v3 05/12] virtio: introduce legacy virtio devices

2014-11-26 Thread Greg Kurz
On Wed, 26 Nov 2014 18:28:36 +0100
Cornelia Huck cornelia.h...@de.ibm.com wrote:

 Introduce a helper function to indicate  whether a virtio device is
 operating in legacy or virtio standard mode.
 
 It may be used to make decisions about the endianess of virtio accesses
 and other virtio-1 specific changes, enabling us to support transitional
 devices.
 
 Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com
 Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
 ---
  hw/virtio/virtio.c|6 +-
  include/hw/virtio/virtio-access.h |4 
  include/hw/virtio/virtio.h|   13 +++--
  3 files changed, 20 insertions(+), 3 deletions(-)
 
 diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
 index 2eb5d3c..4149f45 100644
 --- a/hw/virtio/virtio.c
 +++ b/hw/virtio/virtio.c
 @@ -883,7 +883,11 @@ static bool virtio_device_endian_needed(void *opaque)
  VirtIODevice *vdev = opaque;
 
  assert(vdev-device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN);
 -return vdev-device_endian != virtio_default_endian();
 +if (virtio_device_is_legacy(vdev)) {
 +return vdev-device_endian != virtio_default_endian();
 +}
 +/* Devices conforming to VIRTIO 1.0 or later are always LE. */
 +return vdev-device_endian != VIRTIO_DEVICE_ENDIAN_LITTLE;
  }
 

Sorry but I still don't understand why we need to stream the device_endian
subsection if we have a virtio-1 device... this field is only used on
legacy device paths. Can you share an example where it is needed ?

  static const VMStateDescription vmstate_virtio_device_endian = {
 diff --git a/include/hw/virtio/virtio-access.h 
 b/include/hw/virtio/virtio-access.h
 index 46456fd..c123ee0 100644
 --- a/include/hw/virtio/virtio-access.h
 +++ b/include/hw/virtio/virtio-access.h
 @@ -19,6 +19,10 @@
 
  static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
  {
 +if (!virtio_device_is_legacy(vdev)) {
 +/* Devices conforming to VIRTIO 1.0 or later are always LE. */
 +return false;
 +}
  #if defined(TARGET_IS_BIENDIAN)
  return virtio_is_big_endian(vdev);
  #elif defined(TARGET_WORDS_BIGENDIAN)
 diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
 index b408166..40e567c 100644
 --- a/include/hw/virtio/virtio.h
 +++ b/include/hw/virtio/virtio.h
 @@ -275,9 +275,18 @@ void virtio_queue_set_host_notifier_fd_handler(VirtQueue 
 *vq, bool assign,
  void virtio_queue_notify_vq(VirtQueue *vq);
  void virtio_irq(VirtQueue *vq);
 
 +static inline bool virtio_device_is_legacy(VirtIODevice *vdev)
 +{
 +return !(vdev-guest_features[1]  (1  (VIRTIO_F_VERSION_1 - 32)));
 +}
 +
  static inline bool virtio_is_big_endian(VirtIODevice *vdev)
  {
 -assert(vdev-device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN);
 -return vdev-device_endian == VIRTIO_DEVICE_ENDIAN_BIG;
 +if (virtio_device_is_legacy(vdev)) {
 +assert(vdev-device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN);
 +return vdev-device_endian == VIRTIO_DEVICE_ENDIAN_BIG;
 +}
 +/* Devices conforming to VIRTIO 1.0 or later are always LE. */
 +return false;
  }
  #endif

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC v3 11/12] virtio-net/virtio-blk: enable virtio 1.0

2014-11-26 Thread Michael S. Tsirkin
On Wed, Nov 26, 2014 at 06:28:42PM +0100, Cornelia Huck wrote:
 virtio-net (non-vhost) and virtio-blk have everything in place to support
 virtio 1.0: let's enable the feature bit for them.

Hmm I doubt that.
At least not without more patches.

For block, scsi, wce and config-wce must be off, and
it must support ANY_LAYOUT (which might be a bit more code).

For net, header size is different when mergeable bufs are off,
and mac is read-only.



 Note that VIRTIO_F_VERSION_1 is technically a transport feature; once
 every device is ready for virtio 1.0, we can move this setting this
 feature bit out of the individual devices.
 
 Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
 ---
  hw/block/virtio-blk.c |4 
  hw/net/virtio-net.c   |4 
  2 files changed, 8 insertions(+)
 
 diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
 index 6d86f60..3781f98 100644
 --- a/hw/block/virtio-blk.c
 +++ b/hw/block/virtio-blk.c
 @@ -569,6 +569,10 @@ static uint32_t virtio_blk_get_features(VirtIODevice 
 *vdev, unsigned int index,
  {
  VirtIOBlock *s = VIRTIO_BLK(vdev);
  
 +if (index == 1) {
 +features |= (1  (VIRTIO_F_VERSION_1 - 32));
 +}
 +
  if (index  0) {
  return features;
  }
 diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
 index 1e214b5..fcfc95f 100644
 --- a/hw/net/virtio-net.c
 +++ b/hw/net/virtio-net.c
 @@ -447,6 +447,10 @@ static uint32_t virtio_net_get_features(VirtIODevice 
 *vdev, unsigned int index,
  VirtIONet *n = VIRTIO_NET(vdev);
  NetClientState *nc = qemu_get_queue(n-nic);
  
 +if (index == 1  !get_vhost_net(nc-peer)) {
 +features |= (1  (VIRTIO_F_VERSION_1 - 32));
 +}
 +
  if (index  0) {
  return features;
  }
 -- 
 1.7.9.5
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH RFC v3 05/12] virtio: introduce legacy virtio devices

2014-11-26 Thread Michael S. Tsirkin
On Wed, Nov 26, 2014 at 07:46:38PM +0100, Greg Kurz wrote:
 On Wed, 26 Nov 2014 18:28:36 +0100
 Cornelia Huck cornelia.h...@de.ibm.com wrote:
 
  Introduce a helper function to indicate  whether a virtio device is
  operating in legacy or virtio standard mode.
  
  It may be used to make decisions about the endianess of virtio accesses
  and other virtio-1 specific changes, enabling us to support transitional
  devices.
  
  Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com
  Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
  ---
   hw/virtio/virtio.c|6 +-
   include/hw/virtio/virtio-access.h |4 
   include/hw/virtio/virtio.h|   13 +++--
   3 files changed, 20 insertions(+), 3 deletions(-)
  
  diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
  index 2eb5d3c..4149f45 100644
  --- a/hw/virtio/virtio.c
  +++ b/hw/virtio/virtio.c
  @@ -883,7 +883,11 @@ static bool virtio_device_endian_needed(void *opaque)
   VirtIODevice *vdev = opaque;
  
   assert(vdev-device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN);
  -return vdev-device_endian != virtio_default_endian();
  +if (virtio_device_is_legacy(vdev)) {
  +return vdev-device_endian != virtio_default_endian();
  +}
  +/* Devices conforming to VIRTIO 1.0 or later are always LE. */
  +return vdev-device_endian != VIRTIO_DEVICE_ENDIAN_LITTLE;
   }
  
 
 Sorry but I still don't understand why we need to stream the device_endian
 subsection if we have a virtio-1 device... this field is only used on
 legacy device paths. Can you share an example where it is needed ?

I think it's needed.
A transitional device can be used with legacy native endian and
modern little endian format.


   static const VMStateDescription vmstate_virtio_device_endian = {
  diff --git a/include/hw/virtio/virtio-access.h 
  b/include/hw/virtio/virtio-access.h
  index 46456fd..c123ee0 100644
  --- a/include/hw/virtio/virtio-access.h
  +++ b/include/hw/virtio/virtio-access.h
  @@ -19,6 +19,10 @@
  
   static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
   {
  +if (!virtio_device_is_legacy(vdev)) {
  +/* Devices conforming to VIRTIO 1.0 or later are always LE. */
  +return false;
  +}
   #if defined(TARGET_IS_BIENDIAN)
   return virtio_is_big_endian(vdev);
   #elif defined(TARGET_WORDS_BIGENDIAN)
  diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
  index b408166..40e567c 100644
  --- a/include/hw/virtio/virtio.h
  +++ b/include/hw/virtio/virtio.h
  @@ -275,9 +275,18 @@ void 
  virtio_queue_set_host_notifier_fd_handler(VirtQueue *vq, bool assign,
   void virtio_queue_notify_vq(VirtQueue *vq);
   void virtio_irq(VirtQueue *vq);
  
  +static inline bool virtio_device_is_legacy(VirtIODevice *vdev)
  +{
  +return !(vdev-guest_features[1]  (1  (VIRTIO_F_VERSION_1 - 32)));
  +}
  +
   static inline bool virtio_is_big_endian(VirtIODevice *vdev)
   {
  -assert(vdev-device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN);
  -return vdev-device_endian == VIRTIO_DEVICE_ENDIAN_BIG;
  +if (virtio_device_is_legacy(vdev)) {
  +assert(vdev-device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN);
  +return vdev-device_endian == VIRTIO_DEVICE_ENDIAN_BIG;
  +}
  +/* Devices conforming to VIRTIO 1.0 or later are always LE. */
  +return false;
   }
   #endif
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/7] KVM: arm64: guest debug, add support for single-step

2014-11-26 Thread Peter Maydell
On 25 November 2014 at 16:10, Alex Bennée alex.ben...@linaro.org wrote:
 This adds support for single-stepping the guest. As userspace can and
 will manipulate guest registers before restarting any tweaking of the
 registers has to occur just before control is passed back to the guest.
 Furthermore while guest debugging is in effect we need to squash the
 ability of the guest to single-step itself as we have no easy way of
 re-entering the guest after the exception has been delivered to the
 hypervisor.

A corner case I don't think this patch handles: if the debugger
tries to single step an insn which is emulated by the
hypervisor (because it's a load/store which is trapped and
handled as emulated mmio in userspace) then we won't
correctly update the single-step state machine (and so we'll end
up incorrectly stopping after the following insn rather than
before, I think).

You should be able to achieve this effect by simply always clearing
the guest's PSTATE.SS when you advance the PC to skip the emulated
instruction (cf the comment in the pseudocode SSAdvance() function).

I think we should also be doing this PC advance on return from
userspace's handling of the mmio rather than before we drop back
to userspace as we do now, but I can't remember why I think that.
Christoffer, I don't suppose you recall, do you? I think it was
you I had this conversation with on IRC a month or so back...

-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2] x86: Test illegal movbe

2014-11-26 Thread Nadav Amit
Previously KVM ignored the mod field of MOVBE instruction, so MOVBE from
register to register succeeds, although it should fail (cause a #UD exception).
This test check that a #UD is indeed delivered upon such MOVBE.

The test would not work if MOVBE is unsupported.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
v1-v2: Add missing return upon no support for MOVBE
---
 x86/emulator.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/x86/emulator.c b/x86/emulator.c
index 5aa4dbf..567e421 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -1051,6 +1051,29 @@ static void test_simplealu(u32 *mem)
 report(test, *mem == 0x8400);
 }
 
+static void illegal_movbe_handler(struct ex_regs *regs)
+{
+   extern char bad_movbe_cont;
+
+   ++exceptions;
+   regs-rip = (ulong)bad_movbe_cont;
+}
+
+static void test_illegal_movbe(void)
+{
+   if (!(cpuid(1).c  (1  22))) {
+   printf(SKIP: illegal movbe\n);
+   return;
+   }
+
+   exceptions = 0;
+   handle_exception(UD_VECTOR, illegal_movbe_handler);
+   asm volatile(.byte 0x0f; .byte 0x38; .byte 0xf0; .byte 0xc0;\n\t
+ bad_movbe_cont: : : : rax);
+   report(illegal movbe, exceptions == 1);
+   handle_exception(UD_VECTOR, 0);
+}
+
 int main()
 {
void *mem;
@@ -1119,6 +1142,7 @@ int main()
test_string_io_mmio(mem);
 
test_jmp_noncanonical(mem);
+   test_illegal_movbe();
 
return report_summary();
 }
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

2014-11-26 Thread Luis R. Rodriguez
From: Luis R. Rodriguez mcg...@suse.com

Some folks had reported that some xen hypercalls take a long time
to complete when issued from the userspace private ioctl mechanism,
this can happen for instance with some hypercalls that have many
sub-operations, this can happen for instance on hypercalls that use
multi-call feature whereby Xen lets one hypercall batch out a series
of other hypercalls on the hypervisor. At times such hypercalls can
even end up triggering the TASK_UNINTERRUPTIBLE hanger check (default
120 seconds), this a non-issue issue on preemptible kernels though as
the kernel may deschedule such long running tasks. Xen for instance
supports multicalls to be preempted as well, this is what Xen calls
continuation (see xen commit 42217cbc5b which introduced this [0]).
On systems without CONFIG_PREEMPT though -- a kernel with voluntary
or no preemption -- a long running hypercall will not be descheduled
until the hypercall is complete and the ioctl returns to user space.

To help with this David had originally implemented support for use
of preempt_schedule_irq() [1] for non CONFIG_PREEMPT kernels. This
solution never went upstream though and upon review to help refactor
this I've concluded that usage of preempt_schedule_irq() would be
a bit abussive of existing APIs -- for a few reasons:

0) we want to avoid spreading its use on non CONFIG_PREEMPT kernels

1) we want try to consider solutions that might work for other
   hypervisors for this same problem, and identify it its an issue
   even present on other hypervisors or if this is a self
   inflicted architectural issue caused by use of multicalls

2) there is no documentation or profiling of the exact hypercalls
   that were causing these issues, nor do we have any context
   to help evaluate this any further

I at least checked with kvm folks and it seems hypercall preemption
is not needed there. We can survey other hypervisors...

If 'something like preemption' is needed then CONFIG_PREEMPT
should just be enabled and encouraged, it seems we want to
encourage CONFIG_PREEMPT on xen, specially when multicalls are
used. In the meantime this tries to address a solution to help
xen on non CONFIG_PREEMPT kernels.

One option tested and evaluated was to put private hypercalls in
process context, however this would introduce complexities such
originating hypercalls from different contexts. Current xen
hypercall callback handlers would need to be changed per architecture,
for instance, we'd also incur the cost of switching states from
user / kernel (this cost is also present if preempt_schedule_irq()
is used). There may be other issues which could be introduced with
this strategy as well. The simplest *shared* alternative is instead
to just explicitly schedule() at the end of a private hypercall on non
preempt kernels. This forces our private hypercall call mechanism
to try to be fair only on non CONFIG_PREEMPT kernels at the cost of
more context switch but keeps the private hypercall context intact.

[0] 
http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9
[1] 
http://ftp.suse.com/pub/people/mcgrof/xen-preempt-hypercalls/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch

Cc: Davidlohr Bueso dbu...@suse.de
Cc: Joerg Roedel jroe...@suse.de
Cc: Borislav Petkov b...@suse.de
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Juergen Gross jgr...@suse.com
Cc: Olaf Hering oher...@suse.de
Cc: David Vrabel david.vra...@citrix.com
Signed-off-by: Luis R. Rodriguez mcg...@suse.com
---
 drivers/xen/privcmd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 569a13b..e29edba 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
   hypercall.arg[0], hypercall.arg[1],
   hypercall.arg[2], hypercall.arg[3],
   hypercall.arg[4]);
+#ifndef CONFIG_PREEMPT
+   schedule();
+#endif
 
return ret;
 }
-- 
2.1.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] KVM: PPC: BOOK3S: HV: Rename variable for better readability

2014-11-26 Thread Paul Mackerras
On Mon, Oct 20, 2014 at 07:59:00PM +0530, Aneesh Kumar K.V wrote:
 Minor cleanup
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 ---
  arch/powerpc/kvm/book3s_hv_rm_mmu.c | 25 +
  1 file changed, 13 insertions(+), 12 deletions(-)
 
 diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
 b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
 index 78e689b066f1..2922f8d127ff 100644
 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
 +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
 @@ -523,7 +523,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
   unsigned long *args = vcpu-arch.gpr[4];
   __be64 *hp, *hptes[4];
   unsigned long tlbrb[4];
 - long int i, j, k, n, found, indexes[4];
 + long int i, j, k, collected_hpte, found, indexes[4];

Hmmm... I don't find it more readable, because collected_hpte sounds
like it contains a HPTE value.  Also I don't like using a long name
for something that is just a temporary value inside a function, and
n is a suitable name for a temporary variable counting the number of
things we have.  I would prefer just adding a comment like this:

-   n = 0;
+   n = 0;  /* # values collected in tlbrb[], indexes[] etc. */

Paul.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[question] what is virtio-1 device?

2014-11-26 Thread Zhang Haoyu
Hi,
what is virtio-1 device?

Thanks,
Zhang Haoyu

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [question] what is virtio-1 device?

2014-11-26 Thread Chen, Tiejun

On 2014/11/27 9:26, Zhang Haoyu wrote:

Hi,
what is virtio-1 device?



Are you mean subsystem device id is '1'? That should be a network card.

Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [question] what is virtio-1 device?

2014-11-26 Thread Zhang Haoyu
 Hi,
 what is virtio-1 device?


Are you mean subsystem device id is '1'? That should be a network card.

In the mail qemu: towards virtio-1 host support, virtio: allow virtio-1 
queue layout,
dataplane: allow virtio-1 devices, etc., virtio-1 devices was mentioned,
I don't know which devices it incorporates?

Thanks,
Zhang Haoyu
Tiejun

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [question] what is virtio-1 device?

2014-11-26 Thread Chen, Tiejun

On 2014/11/27 9:46, Zhang Haoyu wrote:

Hi,
what is virtio-1 device?



Are you mean subsystem device id is '1'? That should be a network card.


In the mail qemu: towards virtio-1 host support, virtio: allow virtio-1 queue 
layout,
dataplane: allow virtio-1 devices, etc., virtio-1 devices was mentioned,
I don't know which devices it incorporates?



That should mean they're carrying forward virtio based on latest Virtual 
I/O Device (VIRTIO) Version 1.0.


Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [question] what is virtio-1 device?

2014-11-26 Thread Zhang Haoyu
 Hi,
 what is virtio-1 device?


 Are you mean subsystem device id is '1'? That should be a network card.

 In the mail qemu: towards virtio-1 host support, virtio: allow virtio-1 
 queue layout,
 dataplane: allow virtio-1 devices, etc., virtio-1 devices was mentioned,
 I don't know which devices it incorporates?


That should mean they're carrying forward virtio based on latest Virtual 
I/O Device (VIRTIO) Version 1.0.

Got it, thanks.

Tiejun

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

2014-11-26 Thread Juergen Gross

On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:

From: Luis R. Rodriguez mcg...@suse.com

Some folks had reported that some xen hypercalls take a long time
to complete when issued from the userspace private ioctl mechanism,
this can happen for instance with some hypercalls that have many
sub-operations, this can happen for instance on hypercalls that use
multi-call feature whereby Xen lets one hypercall batch out a series
of other hypercalls on the hypervisor. At times such hypercalls can
even end up triggering the TASK_UNINTERRUPTIBLE hanger check (default
120 seconds), this a non-issue issue on preemptible kernels though as
the kernel may deschedule such long running tasks. Xen for instance
supports multicalls to be preempted as well, this is what Xen calls
continuation (see xen commit 42217cbc5b which introduced this [0]).
On systems without CONFIG_PREEMPT though -- a kernel with voluntary
or no preemption -- a long running hypercall will not be descheduled
until the hypercall is complete and the ioctl returns to user space.

To help with this David had originally implemented support for use
of preempt_schedule_irq() [1] for non CONFIG_PREEMPT kernels. This
solution never went upstream though and upon review to help refactor
this I've concluded that usage of preempt_schedule_irq() would be
a bit abussive of existing APIs -- for a few reasons:

0) we want to avoid spreading its use on non CONFIG_PREEMPT kernels

1) we want try to consider solutions that might work for other
hypervisors for this same problem, and identify it its an issue
even present on other hypervisors or if this is a self
inflicted architectural issue caused by use of multicalls

2) there is no documentation or profiling of the exact hypercalls
that were causing these issues, nor do we have any context
to help evaluate this any further

I at least checked with kvm folks and it seems hypercall preemption
is not needed there. We can survey other hypervisors...

If 'something like preemption' is needed then CONFIG_PREEMPT
should just be enabled and encouraged, it seems we want to
encourage CONFIG_PREEMPT on xen, specially when multicalls are
used. In the meantime this tries to address a solution to help
xen on non CONFIG_PREEMPT kernels.

One option tested and evaluated was to put private hypercalls in
process context, however this would introduce complexities such
originating hypercalls from different contexts. Current xen
hypercall callback handlers would need to be changed per architecture,
for instance, we'd also incur the cost of switching states from
user / kernel (this cost is also present if preempt_schedule_irq()
is used). There may be other issues which could be introduced with
this strategy as well. The simplest *shared* alternative is instead
to just explicitly schedule() at the end of a private hypercall on non
preempt kernels. This forces our private hypercall call mechanism
to try to be fair only on non CONFIG_PREEMPT kernels at the cost of
more context switch but keeps the private hypercall context intact.

[0] 
http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9
[1] 
http://ftp.suse.com/pub/people/mcgrof/xen-preempt-hypercalls/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch

Cc: Davidlohr Bueso dbu...@suse.de
Cc: Joerg Roedel jroe...@suse.de
Cc: Borislav Petkov b...@suse.de
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Juergen Gross jgr...@suse.com
Cc: Olaf Hering oher...@suse.de
Cc: David Vrabel david.vra...@citrix.com
Signed-off-by: Luis R. Rodriguez mcg...@suse.com
---
  drivers/xen/privcmd.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 569a13b..e29edba 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
   hypercall.arg[0], hypercall.arg[1],
   hypercall.arg[2], hypercall.arg[3],
   hypercall.arg[4]);
+#ifndef CONFIG_PREEMPT
+   schedule();
+#endif

return ret;
  }



Sorry, I don't think this will solve anything. You're calling schedule()
right after the long running hypercall just nanoseconds before returning
to the user.

I suppose you were mislead by the int 0x82 in [0]. This is the
hypercall from the kernel into the hypervisor, e.g. inside of
privcmd_call().


Juergen
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next] vhost: remove unnecessary forward declarations in vhost.h

2014-11-26 Thread Jason Wang
Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/vhost/vhost.h | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 3eda654..7d039ef 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -12,8 +12,6 @@
 #include linux/virtio_ring.h
 #include linux/atomic.h
 
-struct vhost_device;
-
 struct vhost_work;
 typedef void (*vhost_work_fn_t)(struct vhost_work *work);
 
@@ -54,8 +52,6 @@ struct vhost_log {
u64 len;
 };
 
-struct vhost_virtqueue;
-
 /* The virtqueue structure describes a queue attached to a device. */
 struct vhost_virtqueue {
struct vhost_dev *dev;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [question] lots of interrupts injected to vm when pressing some key w/o releasing

2014-11-26 Thread Zhang Haoyu
 I tested win-server-2008 with -cpu 
 core2duo,hv_spinlocks=0x,hv_relaxed,hv_time,
 this problem still happened, about 200,000 vmexits per-second, 
 bringing very bad experience, just like being stuck.
 
 Please upload a full trace somewhere, or at least the perf report 
 output.
 
 
 And, if I remove the commit of 0bc830b0, the problem disappeared.
 
 Please send the full trace file.  If you compress it, it should be small.
 
 See the attach 1, please.
 
 Paolo

Can you try the follow draft patch to see whether it solve your problem? This 
patch is based on commit 0bc830b0.

After applying this patch, VM got stuck with black-screen at boot stage,
# trace-cmd report:
version = 6
CPU 0 is empty
CPU 1 is empty
CPU 2 is empty
CPU 3 is empty
CPU 5 is empty
CPU 7 is empty
cpus=8
 kvm-1266  [004] 14399.834397: kvm_set_irq:  gsi 9 level 1 
source 0
 kvm-1266  [004] 14399.834403: kvm_pic_set_irq:  chip 1 pin 1 
(edge|masked)
 kvm-1266  [004] 14399.834411: kvm_apic_accept_irq:  apicid 0 vec 
177 (LowPrio|level)
 kvm-1266  [004] 14399.834412: kvm_ioapic_set_irq:   pin 9 dst 3 
vec=177 (LowPrio|logical|level)
 kvm-1266  [004] 14402.180013: kvm_set_irq:  gsi 9 level 1 
source 0
 kvm-1266  [004] 14402.180019: kvm_pic_set_irq:  chip 1 pin 1 
(edge|masked)
 kvm-1266  [004] 14402.180028: kvm_apic_accept_irq:  apicid 1 vec 
177 (LowPrio|level)
 kvm-1266  [004] 14402.180029: kvm_ioapic_set_irq:   pin 9 dst 3 
vec=177 (LowPrio|logical|level)
 kvm-1266  [004] 14404.525627: kvm_set_irq:  gsi 9 level 1 
source 0
 kvm-1266  [004] 14404.525634: kvm_pic_set_irq:  chip 1 pin 1 
(edge|masked)
 kvm-1266  [004] 14404.525641: kvm_apic_accept_irq:  apicid 0 vec 
177 (LowPrio|level)
 kvm-1266  [004] 14404.525642: kvm_ioapic_set_irq:   pin 9 dst 3 
vec=177 (LowPrio|logical|level)
 kvm-1266  [004] 14406.871238: kvm_set_irq:  gsi 9 level 1 
source 0
 kvm-1266  [004] 14406.871245: kvm_pic_set_irq:  chip 1 pin 1 
(edge|masked)
 kvm-1266  [004] 14406.871254: kvm_apic_accept_irq:  apicid 1 vec 
177 (LowPrio|level)
 kvm-1266  [004] 14406.871256: kvm_ioapic_set_irq:   pin 9 dst 3 
vec=177 (LowPrio|logical|level)
 kvm-1266  [006] 14409.216849: kvm_set_irq:  gsi 9 level 1 
source 0
 kvm-1266  [006] 14409.216855: kvm_pic_set_irq:  chip 1 pin 1 
(edge|masked)
 kvm-1266  [006] 14409.216862: kvm_apic_accept_irq:  apicid 0 vec 
177 (LowPrio|level)
 kvm-1266  [006] 14409.216863: kvm_ioapic_set_irq:   pin 9 dst 3 
vec=177 (LowPrio|logical|level)
 kvm-1266  [006] 14411.562475: kvm_set_irq:  gsi 9 level 1 
source 0
 kvm-1266  [006] 14411.562481: kvm_pic_set_irq:  chip 1 pin 1 
(edge|masked)
 kvm-1266  [006] 14411.562489: kvm_apic_accept_irq:  apicid 1 vec 
177 (LowPrio|level)
 kvm-1266  [006] 14411.562491: kvm_ioapic_set_irq:   pin 9 dst 3 
vec=177 (LowPrio|logical|level)
 kvm-1266  [004] 14413.908074: kvm_set_irq:  gsi 9 level 1 
source 0
 kvm-1266  [004] 14413.908080: kvm_pic_set_irq:  chip 1 pin 1 
(edge|masked)
 kvm-1266  [004] 14413.908088: kvm_apic_accept_irq:  apicid 0 vec 
177 (LowPrio|level)
 kvm-1266  [004] 14413.908089: kvm_ioapic_set_irq:   pin 9 dst 3 
vec=177 (LowPrio|logical|level)

Thanks,
Zhang Haoyu

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 25e16a6..8f4e211 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -305,6 +305,7 @@ static int ioapic_service(struct kvm_ioapic *ioapic, int 
irq, bool line_status)
return ret;
 }
 
+static int irq_status[256];
 int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id,
   int level, bool line_status)
 {
@@ -312,10 +313,13 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int 
irq, int irq_source_id,
u32 mask = 1  irq;
union kvm_ioapic_redirect_entry entry;
int ret, irq_level;
+   int old_irq;
 
BUG_ON(irq  0 || irq = IOAPIC_NUM_PINS);
 
spin_lock(ioapic-lock);
+   old_irq = irq_status[irq];
+   irq_status[irq] = level;
old_irr = ioapic-irr;
irq_level = __kvm_irq_line_state(ioapic-irq_states[irq],
 irq_source_id, level);
@@ -332,6 +336,10 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int 
irq, int irq_source_id,
goto out;
}
ioapic-irr |= mask;
+   if (edge  old_irq == irq_status[irq]  old_irq) {
+   ret = 0;
+   goto out;
+   }
if ((edge  old_irr != ioapic-irr) ||
(!edge  !entry.fields.remote_irr))
ret = 

Re: [PATCH 3/3] KVM: PPC: BOOK3S: HV: Rename variable for better readability

2014-11-26 Thread Paul Mackerras
On Mon, Oct 20, 2014 at 07:59:00PM +0530, Aneesh Kumar K.V wrote:
 Minor cleanup
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 ---
  arch/powerpc/kvm/book3s_hv_rm_mmu.c | 25 +
  1 file changed, 13 insertions(+), 12 deletions(-)
 
 diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
 b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
 index 78e689b066f1..2922f8d127ff 100644
 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
 +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
 @@ -523,7 +523,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
   unsigned long *args = vcpu-arch.gpr[4];
   __be64 *hp, *hptes[4];
   unsigned long tlbrb[4];
 - long int i, j, k, n, found, indexes[4];
 + long int i, j, k, collected_hpte, found, indexes[4];

Hmmm... I don't find it more readable, because collected_hpte sounds
like it contains a HPTE value.  Also I don't like using a long name
for something that is just a temporary value inside a function, and
n is a suitable name for a temporary variable counting the number of
things we have.  I would prefer just adding a comment like this:

-   n = 0;
+   n = 0;  /* # values collected in tlbrb[], indexes[] etc. */

Paul.
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html