Re: [PATCH] KVM: svm: add support for RDTSCP

2015-11-12 Thread Joerg Roedel
Hi Paolo,

On Thu, Nov 12, 2015 at 02:49:16PM +0100, Paolo Bonzini wrote:
> RDTSCP was never supported for AMD CPUs, which nobody noticed because
> Linux does not use it.  But exactly the fact that Linux does not
> use it makes the implementation very simple; we can freely trash
> MSR_TSC_AUX while running the guest.
> 
> Cc: Joerg Roedel <j...@8bytes.org>
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---
>  arch/x86/kvm/svm.c | 24 +++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 83a1c64..c302614 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -86,6 +86,7 @@ static const u32 host_save_user_msrs[] = {
>   MSR_FS_BASE,
>  #endif
>   MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
> + MSR_TSC_AUX,
>  };
>  
>  #define NR_HOST_SAVE_USER_MSRS ARRAY_SIZE(host_save_user_msrs)
> @@ -135,6 +136,7 @@ struct vcpu_svm {
>   uint64_t asid_generation;
>   uint64_t sysenter_esp;
>   uint64_t sysenter_eip;
> + uint64_t tsc_aux;
>  
>   u64 next_rip;
>  
> @@ -1238,6 +1240,9 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int 
> cpu)
>   wrmsrl(MSR_AMD64_TSC_RATIO, tsc_ratio);
>   }
>   }
> + /* This assumes that the kernel never uses MSR_TSC_AUX */
> + if (static_cpu_has(X86_FEATURE_RDTSCP))
> + wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>  }

Hmm, you seem to still intercept MSR_TSC_AUX, is that intentional?
Loading the guests value into the real cpu msr only makes sense to me
when the MSR is no longer intercepted.


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Joerg Roedel
On Tue, Nov 10, 2015 at 01:04:36PM +1100, Benjamin Herrenschmidt wrote:
> The "in absence of the new DT binding" doesn't make that much sense.
> 
> Those platforms use device-trees defined since the dawn of ages by
> actual open firmware implementations, they either have no iommu
> representation in there (Macs, the platform code hooks it all up) or
> have various properties related to the iommu but no concept of "bypass"
> in there.
> 
> We can *add* a new property under some circumstances that indicates a
> bypass on a per-device basis, however that doesn't completely solve it:
> 
>   - As I said above, what does the absence of that property mean ? An
> old qemu that does bypass on all virtio or a new qemu trying to tell
> you that the virtio device actually does use the iommu (or some other
> environment that isn't qemu) ?

You have the same problem when real PCIe devices appear that speak
virtio. I think the only real (still not very nice) solution is to add a
quirk to powerpc platform code that sets noop dma-ops for the existing
virtio vendor/device-ids and add a DT property to opt-out of that quirk.

New vendor/device-ids (as for real devices) would just not be covered by
the quirk and existing emulated devices continue to work.

The absence of the property just means that the quirk is in place and
the system assumes no translation for virtio devices.


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/3] virtio DMA API core stuff

2015-11-08 Thread Joerg Roedel
On Sun, Nov 08, 2015 at 12:37:47PM +0200, Michael S. Tsirkin wrote:
> I have no problem with that. For example, can we teach
> the DMA API on intel x86 to use PT for virtio by default?
> That would allow merging Andy's patches with
> full compatibility with old guests and hosts.

Well, the only incompatibility comes from an experimental qemu feature,
more explicitly from a bug in that features implementation. So why
should we work around that in the kernel? I think it is not too hard to
fix qemu to generate a correct DMAR table which excludes the virtio
devices from iommu translation.


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-11-05 Thread Joerg Roedel
On Mon, Nov 02, 2015 at 07:32:19PM +0200, Shamir Rabinovitch wrote:
> Correct. This issue is one of the concerns here in the previous replies.
> I will take different approach which will not require the IOMMU bypass
> per mapping. Will try to shift to the x86 'iommu=pt' approach.

Yeah, it doesn't really make sense to have an extra remappable area when
the device can access all physical memory anyway.

> We had a bunch of issues around SPARC IOMMU. Not all of them relate to
> performance. The first issue was that on SPARC, currently, we only have 
> limited address space to IOMMU so we had issue to do large DMA mappings
> for Infiniband. Second issue was that we identified high contention on 
> the IOMMU locks even in ETH driver.

Contended IOMMU locks are not only a problem on SPARC, but on x86 and
various other IOMMU drivers too. But I have some ideas on how to improve
the situation there.

> I do not want to put too much information here but you can see some results:
> 
> rds-stress test from sparc t5-2 -> x86:
> 
> with iommu bypass:
> -
> sparc->x86 cmdline = -r XXX -s XXX -q 256 -a 8192 -T 10 -d 10 -t 3 -o XXX
> tsks   tx/s   rx/s  tx+rx K/smbi K/smbo K/s tx us/c   rtt us cpu %
>3 141278  0 1165565.81   0.00   0.008.93   376.60 -1.00  
> (average)
> 
> without iommu bypass:
> -
> sparc->x86 cmdline = -r XXX -s XXX -q 256 -a 8192 -T 10 -d 10 -t 3 -o XXX
> tsks   tx/s   rx/s  tx+rx K/smbi K/smbo K/s tx us/c   rtt us cpu %
>3  78558  0  648101.41   0.00   0.00   15.05   876.72 -1.00  
> (average)
> 
> + RDMA tests are totally not working (might be due to failure to DMA map all 
> the memory).
> 
> So IOMMU bypass give ~80% performance boost.

Interesting. Have you looked more closely on what causes the performance
degradation? Is it the lock contention or something else?


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] dma: Provide simple noop dma ops

2015-11-05 Thread Joerg Roedel
On Tue, Nov 03, 2015 at 12:54:37PM +0100, Christian Borntraeger wrote:
> We are going to require dma_ops for several common drivers, even for
> systems that do have an identity mapping. Lets provide some minimal
> no-op dma_ops that can be used for that purpose.
> 
> Signed-off-by: Christian Borntraeger <borntrae...@de.ibm.com>
> ---
>  include/linux/dma-mapping.h |  2 ++
>  lib/Makefile|  1 +
>  lib/dma-noop.c  | 75 
> +
>  3 files changed, 78 insertions(+)
>  create mode 100644 lib/dma-noop.c

Reviewed-by: Joerg Roedel <jroe...@suse.de>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] vfio: Fix bug in vfio_device_get_from_name()

2015-11-04 Thread Joerg Roedel
From: Joerg Roedel <jroe...@suse.de>

The vfio_device_get_from_name() function might return a
non-NULL pointer, when called with a device name that is not
found in the list. This causes undefined behavior, in my
case calling an invalid function pointer later on:

 kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
 BUG: unable to handle kernel paging request at 8800cb3ddc08

[...]

 Call Trace:
  [] ? vfio_group_fops_unl_ioctl+0x253/0x410 [vfio]
  [] do_vfs_ioctl+0x2cd/0x4c0
  [] ? __fget+0x77/0xb0
  [] SyS_ioctl+0x79/0x90
  [] ? syscall_return_slowpath+0x50/0x130
  [] entry_SYSCALL_64_fastpath+0x16/0x75

Fix the issue by returning NULL when there is no device with
the requested name in the list.

Cc: sta...@vger.kernel.org # v4.2+
Fixes: 4bc94d5dc95d ("vfio: Fix lockdep issue")
Signed-off-by: Joerg Roedel <jroe...@suse.de>
---
 drivers/vfio/vfio.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 563c510..8c50ea6 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -692,11 +692,12 @@ EXPORT_SYMBOL_GPL(vfio_device_get_from_dev);
 static struct vfio_device *vfio_device_get_from_name(struct vfio_group *group,
 char *buf)
 {
-   struct vfio_device *device;
+   struct vfio_device *it, *device = NULL;
 
mutex_lock(>device_lock);
-   list_for_each_entry(device, >device_list, group_next) {
-   if (!strcmp(dev_name(device->dev), buf)) {
+   list_for_each_entry(it, >device_list, group_next) {
+   if (!strcmp(dev_name(it->dev), buf)) {
+   device = it;
vfio_device_get(device);
break;
}
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] Provide simple noop dma ops

2015-11-02 Thread Joerg Roedel
On Fri, Oct 30, 2015 at 02:20:35PM +0100, Christian Borntraeger wrote:
> +static void *dma_noop_alloc(struct device *dev, size_t size,
> + dma_addr_t *dma_handle, gfp_t gfp,
> + struct dma_attrs *attrs)
> +{
> + void *ret;
> +
> + ret = (void *)__get_free_pages(gfp, get_order(size));
> + if (ret) {
> + memset(ret, 0, size);

There is no need to zero out the memory here. If the user wants
initialized memory it can call dma_zalloc_coherent. Having the memset
here means to clear the memory twice in the dma_zalloc_coherent path.

Otherwise it looks good.


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] alpha: use common noop dma ops

2015-11-02 Thread Joerg Roedel
On Fri, Oct 30, 2015 at 02:20:36PM +0100, Christian Borntraeger wrote:
> Some of the alpha pci noop dma ops are identical to the common ones.
> Use them.
> 
> Signed-off-by: Christian Borntraeger <borntrae...@de.ibm.com>
> ---
>  arch/alpha/kernel/pci-noop.c | 46 
> 
>  1 file changed, 4 insertions(+), 42 deletions(-)

Reviewed-by: Joerg Roedel <jroe...@suse.de>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-11-02 Thread Joerg Roedel
On Thu, Oct 29, 2015 at 09:32:32AM +0200, Shamir Rabinovitch wrote:
> For the IB case, setting and tearing DMA mappings for the drivers data buffers
> is expensive. But we could for example consider to map all the HW control 
> objects
> that validate the HW access to the drivers data buffers as IOMMU protected 
> and so
> have IOMMU protection on those critical objects while having fast 
> set-up/tear-down
> of driver data buffers. The HW control objects have stable mapping for the 
> lifetime
> of the system. So the case of using both types of DMA mappings is still valid.

How do you envision this per-mapping by-pass to work? For the DMA-API
mappings you have only one device address space. For by-pass to work,
you need to map all physical memory of the machine into this space,
which leaves the question where you want to put the window for
remapping.

You surely have to put it after the physical mappings, but any
protection will be gone, as the device can access all physical memory.

So instead of working around the shortcomings of DMA-API
implementations, can you present us some numbers and analysis of how bad
the performance impact with an IOMMU is and what causes it?

I know that we have lock-contention issues in our IOMMU drivers, which
can be fixed. Maybe the performance problems you are seeing can be fixed
too, when you give us more details about them.


Joerg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] s390/dma: Allow per device dma ops

2015-11-02 Thread Joerg Roedel
On Fri, Oct 30, 2015 at 02:20:37PM +0100, Christian Borntraeger wrote:
> As virtio-ccw now has dma ops, we can no longer default to the PCI ones.
> Make use of dev_archdata to keep the dma_ops per device. The pci devices
> now use that to override the default, and the default is changed to use
> the noop ops for everything that is not PCI. To compile without PCI
> support we also have to enable the DMA api with virtio.
> 
> Signed-off-by: Christian Borntraeger <borntrae...@de.ibm.com>
> ---
>  arch/s390/Kconfig   | 3 ++-
>  arch/s390/include/asm/device.h  | 6 +-
>  arch/s390/include/asm/dma-mapping.h | 6 --
>  arch/s390/pci/pci.c | 1 +
>  arch/s390/pci/pci_dma.c | 4 ++--
>  5 files changed, 14 insertions(+), 6 deletions(-)

Reviewed-by: Joerg Roedel <jroe...@suse.de>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-11-02 Thread Joerg Roedel
On Fri, Oct 30, 2015 at 11:32:06AM +0100, Arnd Bergmann wrote:
> I wonder if the 'iommu=force' attribute is too coarse-grained though,
> and if we should perhaps allow a per-device setting on architectures
> that allow this.

Yeah, definitly. Currently we only have iommu=pt to enable pass-through
mode for _all_ devices. I think it makes sense to introduce a per-device
opt-in for pass-through, but have it configured by the user and not by
the device driver.

If the user enables the IOMMU in his system, he expects to be secure
against DMA attacks. If drivers could opt-out, every protection would be
voided.


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] Provide simple noop dma ops

2015-10-30 Thread Joerg Roedel
On Fri, Oct 30, 2015 at 01:55:56PM +0100, Christian Borntraeger wrote:
> It not trivial without understanding the dma mask details. Do I read
> the x86 implementation right, that it limits the dma to 32 bit? Then
> we cannot collapse both implementations. Or maybe we can hide this in
> dma_capable. Dont know

No, DMA is not limited to 32bit on x86. Each device has its own
dma_mask, and the requested address+size is checked against it. The
DMA_BIT_MASK(32) check is only to there to print a warning when a 32bit
capable device trys to access memory above 4GB.



Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/3] virtio DMA API core stuff

2015-10-30 Thread Joerg Roedel
On Thu, Oct 29, 2015 at 11:01:41AM +0200, Michael S. Tsirkin wrote:
> Example: you have a mix of assigned devices and virtio devices. You
> don't trust your assigned device vendor not to corrupt your memory so
> you want to limit the damage your assigned device can do to your guest,
> so you use an IOMMU for that.  Thus existing iommu=pt within guest is out.
> 
> But you trust your hypervisor (you have no choice anyway),
> and you don't want the overhead of tweaking IOMMU
> on data path for virtio. Thus iommu=on is out too.

IOMMUs on x86 usually come with an ACPI table that describes which
IOMMUs are in the system and which devices they translate. So you can
easily describe all devices there that are not behind an IOMMU.

The ACPI table is built by the BIOS, and the platform intialization code
sets the device dma_ops accordingly. If the BIOS provides wrong
information in the ACPI table this is a platform bug.

> I'm not sure what ACPI has to do with it.  It's about a way for guest
> users to specify whether they want to bypass an IOMMU for a given
> device.

We have no way yet to request passthrough-mode per-device from the IOMMU
drivers, but that can easily be added. But as I see it:

> By the way, a bunch of code is missing on the QEMU side
> to make this useful:
> 1. virtio ignores the iommu
> 2. vhost user ignores the iommu
> 3. dataplane ignores the iommu
> 4. vhost-net ignores the iommu
> 5. VFIO ignores the iommu

Qemu does not implement IOMMU translation for virtio devices anyway
(which is fine), so it just should tell the guest so in the ACPI table
built to describe the emulated IOMMU.


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] Provide simple noop dma ops

2015-10-27 Thread Joerg Roedel
Hi Christian,

On Tue, Oct 27, 2015 at 11:48:48PM +0100, Christian Borntraeger wrote:
> +static dma_addr_t dma_noop_map_page(struct device *dev, struct page *page,
> +   unsigned long offset, size_t size,
> +   enum dma_data_direction dir,
> +   struct dma_attrs *attrs)
> +{
> + return page_to_phys(page) + offset;
> +}

X86 also has its own version of these noop dma_ops, see
arch/x86/kernel/pci-nommu.c. This one also checks the dma_mask and
prints a warning if the physical address doesn't fit into the mask.

I think this would make sense here too, and that we can also make x86
use the same generic noop-dma-ops your are introducing.


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC 0/4] dma ops and virtio

2015-10-27 Thread Joerg Roedel
Hi Christian,

On Tue, Oct 27, 2015 at 11:48:47PM +0100, Christian Borntraeger wrote:
> Here is a very quick (still untested) shot at providing the s390 part:
> - patch1: dummy dma ops, inspired by the alpha code
> - patch2: replace some of the alpha functions with the dummy ones
> - patch3: allow per device dma ops for s390
> - patch4: wire up virtio dma ops

Thanks for the patches, I think they are a good start. I sent you
comments for two of the patches, the others look good to me.


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] s390/virtio: use noop dma ops

2015-10-27 Thread Joerg Roedel
On Tue, Oct 27, 2015 at 11:48:51PM +0100, Christian Borntraeger wrote:
> @@ -1093,6 +1094,7 @@ static void virtio_ccw_auto_online(void *data, 
> async_cookie_t cookie)
>   struct ccw_device *cdev = data;
>   int ret;
>  
> + cdev->dev.archdata.dma_ops = _noop_ops;
>   ret = ccw_device_set_online(cdev);
>   if (ret)
>   dev_warn(>dev, "Failed to set online: %d\n", ret);

Hmm, drivers usually don't deal with setting the dma_ops for their
devices, as they depend on the platform and not so much on the device
itself.

Can you do this special-case handling from device independent platform
code, where you also setup dma_ops for other devices?


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] virtio_ring: Support DMA APIs

2015-10-27 Thread Joerg Roedel
Hi Andy,

On Tue, Oct 27, 2015 at 06:17:09PM -0700, Andy Lutomirski wrote:
> From: Andy Lutomirski 
> 
> virtio_ring currently sends the device (usually a hypervisor)
> physical addresses of its I/O buffers.  This is okay when DMA
> addresses and physical addresses are the same thing, but this isn't
> always the case.  For example, this never works on Xen guests, and
> it is likely to fail if a physical "virtio" device ever ends up
> behind an IOMMU or swiotlb.

The overall code looks good, but I havn't seen and dma_sync* calls.
When swiotlb=force is in use this would break.

> + vq->vring.desc[head].addr = cpu_to_virtio64(_vq->vdev, 
> vring_map_single(
> + vq,
> + desc, total_sg * sizeof(struct vring_desc),
> + DMA_TO_DEVICE));

Nit: I think readability could be improved here by using a temporary
variable for the return value of vring_map_single().


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] virtio_net: Stop doing DMA from the stack

2015-10-27 Thread Joerg Roedel
On Tue, Oct 27, 2015 at 06:17:08PM -0700, Andy Lutomirski wrote:
> From: Andy Lutomirski <l...@amacapital.net>
> 
> Once virtio starts using the DMA API, we won't be able to safely DMA
> from the stack.  virtio-net does a couple of config DMA requests
> from small stack buffers -- switch to using dynamically-allocated
> memory.
> 
> This should have no effect on any performance-critical code paths.
> 
> Signed-off-by: Andy Lutomirski <l...@kernel.org>
> ---
>  drivers/net/virtio_net.c | 53 
> 
>  1 file changed, 36 insertions(+), 17 deletions(-)

Reviewed-by: Joerg Roedel <jroe...@suse.de>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] virtio_pci: Use the DMA API

2015-10-27 Thread Joerg Roedel
On Tue, Oct 27, 2015 at 06:17:10PM -0700, Andy Lutomirski wrote:
> From: Andy Lutomirski 
> 
> This fixes virtio-pci on platforms and busses that have IOMMUs.  This
> will break the experimental QEMU Q35 IOMMU support until QEMU is
> fixed.  In exchange, it fixes physical virtio hardware as well as
> virtio-pci running under Xen.
> 
> We should clean up the virtqueue API to do its own allocation and
> teach virtqueue_get_avail and virtqueue_get_used to return DMA
> addresses directly.
> 
> Signed-off-by: Andy Lutomirski 
> ---
>  drivers/virtio/virtio_pci_common.h |  3 ++-
>  drivers/virtio/virtio_pci_legacy.c | 19 +++
>  drivers/virtio/virtio_pci_modern.c | 34 --
>  3 files changed, 41 insertions(+), 15 deletions(-)

Same here, you need to call the dma_sync* functions when passing data
from/to the virtio-device.

I think a good test for that is to boot a virtio kvm-guest with
swiotlb=force and see if it still works.



Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] virtio_ring: Support DMA APIs

2015-10-27 Thread Joerg Roedel
On Tue, Oct 27, 2015 at 07:13:56PM -0700, Andy Lutomirski wrote:
> On Tue, Oct 27, 2015 at 7:06 PM, Joerg Roedel <jroe...@suse.de> wrote:
> > Hi Andy,
> >
> > On Tue, Oct 27, 2015 at 06:17:09PM -0700, Andy Lutomirski wrote:
> >> From: Andy Lutomirski <l...@amacapital.net>
> >>
> >> virtio_ring currently sends the device (usually a hypervisor)
> >> physical addresses of its I/O buffers.  This is okay when DMA
> >> addresses and physical addresses are the same thing, but this isn't
> >> always the case.  For example, this never works on Xen guests, and
> >> it is likely to fail if a physical "virtio" device ever ends up
> >> behind an IOMMU or swiotlb.
> >
> > The overall code looks good, but I havn't seen and dma_sync* calls.
> > When swiotlb=force is in use this would break.
> >
> >> + vq->vring.desc[head].addr = cpu_to_virtio64(_vq->vdev, 
> >> vring_map_single(
> >> + vq,
> >> + desc, total_sg * sizeof(struct vring_desc),
> >> + DMA_TO_DEVICE));
> >
> 
> Are you talking about a dma_sync call on the descriptor ring itself?
> Isn't dma_alloc_coherent supposed to make that unnecessary?  I should
> move the allocation into the virtqueue code.
> 
> The docs suggest that I might need to "flush the processor's write
> buffers before telling devices to read that memory".  I'm not sure how
> to do that.

The write buffers should be flushed by the dma-api functions if
necessary.  For dma_alloc_coherent allocations you don't need to call
dma_sync*, but for the map_single/map_page/map_sg ones, as these might
be bounce-buffered.


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] virtio_pci: Use the DMA API

2015-10-27 Thread Joerg Roedel
On Wed, Oct 28, 2015 at 11:15:30AM +0900, Joerg Roedel wrote:
> Same here, you need to call the dma_sync* functions when passing data
> from/to the virtio-device.

Okay, forget about this comment. This patch only converts to
dma_coherent allocations, which don't need synchronization.

> I think a good test for that is to boot a virtio kvm-guest with
> swiotlb=force and see if it still works.

But this holds, Its a good way to test if your changes work with
bounce-buffering. Together with DMA_API_DEBUG you also see if your
specified dma_directions are right.



Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] virtio_pci: Use the DMA API

2015-10-27 Thread Joerg Roedel
On Wed, Oct 28, 2015 at 11:22:52AM +0900, David Woodhouse wrote:
> On Wed, 2015-10-28 at 11:15 +0900, Joerg Roedel wrote:
> > I think a good test for that is to boot a virtio kvm-guest with
> > swiotlb=force and see if it still works.
> 
> That's useful but doesn't cover the cases where dma_wmb() is needed,
> right? 
> 
> We should make sure we're handling descriptors properly as we would for
> real hardware, and ensuring that addresses etc. are written to the
> descriptor (and a write barrier occurs) *before* the bit is set which
> tells the 'hardware' that it owns that descriptor.

Hmm, good question. The virtio code has virtio_rmb/wmb and should
already call it in the right places.

The virtio-barriers default to rmb()/wmb() unless weak_barriers is set,
in which case it calls dma_rmb()/wmb(), just a compiler barrier on x86.

And weak_barriers is set by the virtio drivers when creating the queue,
and the drivers should know what they are doing. Saying this, I think
the virtio-drivers should already get this right.


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/12] KVM: x86: add support for VMX TSC scaling

2015-10-23 Thread Joerg Roedel
On Tue, Oct 20, 2015 at 03:39:00PM +0800, Haozhong Zhang wrote:
> VMX TSC scaling shares some common logics with SVM TSC ratio which
> is already supported by KVM. Patch 1 ~ 8 move those common logics from
> SVM code to the common code. Upon them, patch 9 ~ 12 add VMX-specific
> support for VMX TSC scaling.

Have you tested your changes on an AMD machine too?


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/12] KVM: x86: add support for VMX TSC scaling

2015-10-23 Thread Joerg Roedel
On Fri, Oct 23, 2015 at 08:32:28PM +0800, Haozhong Zhang wrote:
> No, since I don't have AMD machines at hand. The modifications to SVM
> code are mostly lifting common code with VMX TSC scaling code, so it
> should still work on AMD machines.

Well, I think it would be good if you can provide a Tested-by on AMD
machines from someone who has one. Or get one yourself when changing AMD
specific code, they are not that expensive :)
I can do some testing when I am back from my travels, but that will not
be before early November.

Joerg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] kvm: svm: Only propagate next_rip when guest supports it

2015-10-14 Thread Joerg Roedel
On Fri, Oct 09, 2015 at 01:15:11PM +0200, Paolo Bonzini wrote:
> This could be a bit expensive to do on every vmexit.  Can you benchmark
> it with kvm-unit-tests, or just cache the result in struct vcpu_svm?

Yes, caching it is certainly a good idea. I updated the patch:

>From 94ee662c527683c26ea5fa98a5a8f2c798c58470 Mon Sep 17 00:00:00 2001
From: Joerg Roedel <jroe...@suse.de>
Date: Wed, 7 Oct 2015 13:38:19 +0200
Subject: [PATCH] kvm: svm: Only propagate next_rip when guest supports it

Currently we always write the next_rip of the shadow vmcb to
the guests vmcb when we emulate a vmexit. This could confuse
the guest when its cpuid indicated no support for the
next_rip feature.

Fix this by only propagating next_rip if the guest actually
supports it.

Cc: Bandan Das <b...@redhat.com>
Cc: Dirk Mueller <dmuel...@suse.com>
Tested-By: Dirk Mueller <dmuel...@suse.com>
Signed-off-by: Joerg Roedel <jroe...@suse.de>
---
 arch/x86/kvm/cpuid.h | 21 +
 arch/x86/kvm/svm.c   | 11 ++-
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index dd05b9c..effca1f 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -133,4 +133,25 @@ static inline bool guest_cpuid_has_mpx(struct kvm_vcpu 
*vcpu)
best = kvm_find_cpuid_entry(vcpu, 7, 0);
return best && (best->ebx & bit(X86_FEATURE_MPX));
 }
+
+/*
+ * NRIPS is provided through cpuidfn 0x800a.edx bit 3
+ */
+#define BIT_NRIPS  3
+
+static inline bool guest_cpuid_has_nrips(struct kvm_vcpu *vcpu)
+{
+   struct kvm_cpuid_entry2 *best;
+
+   best = kvm_find_cpuid_entry(vcpu, 0x800a, 0);
+
+   /*
+* NRIPS is a scattered cpuid feature, so we can't use
+* X86_FEATURE_NRIPS here (X86_FEATURE_NRIPS would be bit
+* position 8, not 3).
+*/
+   return best && (best->edx & bit(BIT_NRIPS));
+}
+#undef BIT_NRIPS
+
 #endif
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2f9ed1f..e9e3294 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -159,6 +159,9 @@ struct vcpu_svm {
u32 apf_reason;
 
u64  tsc_ratio;
+
+   /* cached guest cpuid flags for faster access */
+   bool nrips_enabled  : 1;
 };
 
 static DEFINE_PER_CPU(u64, current_tsc_ratio);
@@ -2365,7 +2368,9 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
nested_vmcb->control.exit_info_2   = vmcb->control.exit_info_2;
nested_vmcb->control.exit_int_info = vmcb->control.exit_int_info;
nested_vmcb->control.exit_int_info_err = 
vmcb->control.exit_int_info_err;
-   nested_vmcb->control.next_rip  = vmcb->control.next_rip;
+
+   if (svm->nrips_enabled)
+   nested_vmcb->control.next_rip  = vmcb->control.next_rip;
 
/*
 * If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
@@ -4098,6 +4103,10 @@ static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t 
gfn, bool is_mmio)
 
 static void svm_cpuid_update(struct kvm_vcpu *vcpu)
 {
+   struct vcpu_svm *svm = to_svm(vcpu);
+
+   /* Update nrips enabled cache */
+   svm->nrips_enabled = !!guest_cpuid_has_nrips(>vcpu);
 }
 
 static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
-- 
1.8.4.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm: svm: Only propagate next_rip when guest supports it

2015-10-09 Thread Joerg Roedel
From: Joerg Roedel <jroe...@suse.de>

Currently we always write the next_rip of the shadow vmcb to
the guests vmcb when we emulate a vmexit. This could confuse
the guest when its cpuid indicated no support for the
next_rip feature.

Fix this by only propagating next_rip if the guest actually
supports it.

Cc: Bandan Das <b...@redhat.com>
Cc: Dirk Mueller <dmuel...@suse.com>
Tested-by: Dirk Mueller <dmuel...@suse.com>
Signed-off-by: Joerg Roedel <jroe...@suse.de>
---
 arch/x86/kvm/cpuid.h | 21 +
 arch/x86/kvm/svm.c   |  4 +++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index dd05b9c..effca1f 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -133,4 +133,25 @@ static inline bool guest_cpuid_has_mpx(struct kvm_vcpu 
*vcpu)
best = kvm_find_cpuid_entry(vcpu, 7, 0);
return best && (best->ebx & bit(X86_FEATURE_MPX));
 }
+
+/*
+ * NRIPS is provided through cpuidfn 0x800a.edx bit 3
+ */
+#define BIT_NRIPS  3
+
+static inline bool guest_cpuid_has_nrips(struct kvm_vcpu *vcpu)
+{
+   struct kvm_cpuid_entry2 *best;
+
+   best = kvm_find_cpuid_entry(vcpu, 0x800a, 0);
+
+   /*
+* NRIPS is a scattered cpuid feature, so we can't use
+* X86_FEATURE_NRIPS here (X86_FEATURE_NRIPS would be bit
+* position 8, not 3).
+*/
+   return best && (best->edx & bit(BIT_NRIPS));
+}
+#undef BIT_NRIPS
+
 #endif
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2f9ed1f..4084b33 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2365,7 +2365,9 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
nested_vmcb->control.exit_info_2   = vmcb->control.exit_info_2;
nested_vmcb->control.exit_int_info = vmcb->control.exit_int_info;
nested_vmcb->control.exit_int_info_err = 
vmcb->control.exit_int_info_err;
-   nested_vmcb->control.next_rip  = vmcb->control.next_rip;
+
+   if (guest_cpuid_has_nrips(>vcpu))
+   nested_vmcb->control.next_rip  = vmcb->control.next_rip;
 
/*
 * If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


BUG: unable to handle kernel paging request with v4.3-rc4

2015-10-09 Thread Joerg Roedel
Hi Alex,

while playing around with attaching a 32bit PCI device to a guest via
VFIO I triggered this oops:

[  192.289917] kernel tried to execute NX-protected page - exploit attempt? 
(uid: 0)
[  192.298245] BUG: unable to handle kernel paging request at 880224582608
[  192.306195] IP: [] 0x880224582608
[  192.312302] PGD 2026067 PUD 2029067 PMD 8002244001e3 
[  192.318589] Oops: 0011 [#1] PREEMPT SMP 
[  192.323363] Modules linked in: kvm_amd kvm vfio_pci vfio_iommu_type1 
vfio_virqfd vfio bnep bluetooth rfkill iscsi_ibft iscsi_boot_sysfs af_packet 
snd_hda_codec_via snd_hda_codec_generic snd_hda_codec_hdmi raid1 snd_hda_intel 
crct10dif_pclmul crc32_pclmul snd_hda_codec crc32c_intel ghash_clmulni_intel 
snd_hwdep snd_hda_core snd_pcm snd_timer aesni_intel aes_x86_64 md_mod 
glue_helper lrw gf128mul ablk_helper be2net snd serio_raw cryptd sp5100_tco 
pcspkr xhci_pci vxlan ip6_udp_tunnel fam15h_power sky2 udp_tunnel xhci_hcd 
soundcore dm_mod k10temp i2c_piix4 shpchp wmi acpi_cpufreq asus_atk0110 button 
processor ata_generic firewire_ohci firewire_core ohci_pci crc_itu_t radeon 
i2c_algo_bit drm_kms_helper pata_jmicron syscopyarea sysfillrect sysimgblt 
fb_sys_fops ttm drm sg [last unloaded: kvm]
[  192.399986] CPU: 4 PID: 2037 Comm: qemu-system-x86 Not tainted 4.3.0-rc4+ #4
[  192.408260] Hardware name: System manufacturer System Product Name/Crosshair 
IV Formula, BIOS 302710/28/2011
[  192.419746] task: 880223e24040 ti: 8800cae5c000 task.ti: 
8800cae5c000
[  192.428506] RIP: 0010:[]  [] 
0x880224582608
[  192.437376] RSP: 0018:8800cae5fe58  EFLAGS: 00010286
[  192.443940] RAX: 8800cb3c8800 RBX: 8800cba55800 RCX: 0004
[  192.452370] RDX: 0004 RSI: 8802233e7887 RDI: 0001
[  192.460796] RBP: 8800cae5fe98 R08: 0ff8 R09: 0008
[  192.469145] R10: 0001d300 R11:  R12: 8800cba55800
[  192.477584] R13: 8802233e7880 R14: 8800cba55830 R15: 7fff43b30b50
[  192.486025] FS:  7f94375b2c00() GS:88022ed0() 
knlGS:
[  192.495445] CS:  0010 DS:  ES:  CR0: 80050033
[  192.502481] CR2: 880224582608 CR3: cb9d9000 CR4: 000406e0
[  192.510850] Stack:
[  192.514094]  a03f9733 0001 0001 
880223c74600
[  192.522876]  8800ca4f6d88 7fff43b30b50 3b6a 
7fff43b30b50
[  192.531582]  8800cae5ff08 811efc7d 8800cae5fec8 
880223c74600
[  192.540439] Call Trace:
[  192.544145]  [] ? vfio_group_fops_unl_ioctl+0x253/0x410 
[vfio]
[  192.552898]  [] do_vfs_ioctl+0x2cd/0x4c0
[  192.559713]  [] ? __fget+0x77/0xb0
[  192.565998]  [] SyS_ioctl+0x79/0x90
[  192.572373]  [] ? syscall_return_slowpath+0x50/0x130
[  192.580258]  [] entry_SYSCALL_64_fastpath+0x16/0x75
[  192.588049] Code: 88 ff ff d8 25 58 24 02 88 ff ff e8 25 58 24 02 88 ff ff 
e8 25 58 24 02 88 ff ff 58 a2 70 21 02 88 ff ff c0 65 39 cb 00 88 ff ff <08> 2d 
58 24 02 88 ff ff 08 88 3c cb 00 88 ff ff d8 58 c1 24 02 
[  192.610309] RIP  [] 0x880224582608
[  192.616940]  RSP 
[  192.621805] CR2: 880224582608
[  192.632826] ---[ end trace ce135ef0c9b1869f ]---

I am not sure whether this is an IOMMU or VFIO bug, have you seen
something like this before?


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: unable to handle kernel paging request with v4.3-rc4

2015-10-09 Thread Joerg Roedel
On Fri, Oct 09, 2015 at 09:30:40AM -0600, Alex Williamson wrote:
> I have not seen this one yet.  There literally have been no changes for
> vfio in 4.3, so if this is new, it may be collateral from changes
> elsewhere.  32bit devices really shouldn't make any difference to vfio,
> I'll see if I can reproduce it myself though.  Thanks,

Great, thanks. It looks like some sort of memory corruption to me. What
I did was trying to assign the 32bit dev together with its PCI bridge.
Probing the bridge in VFIO fails due to the first check in the probe
function and then I get the oops.

I also tried assigning other bridges, or only the bridge to the 32bit
bus, but this did not trigger the oops (but fails too, as it should).

I'll also try to debug this more.


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

2015-10-07 Thread Joerg Roedel
On Tue, Oct 06, 2015 at 01:59:27PM -0400, Bandan Das wrote:
> Joerg Roedel <j...@8bytes.org> writes:
> >
> > So svm->vmcb->control.next_rip is only written by hardware or in
> > svm_check_intercept(). Both cases write only to this field, if the
> > hardware supports X86_FEATURE_NRIPS. The write in nested_svm_vmexit only
> 
> Not until commit f104765b4f81fd74d69e0eb161e89096deade2db. So, an older L1
> kernel will trigger it.

But we don't care if L1 writes something into its own next_rip, as we
never read this value from its VMCB. We only copy the next_rip value we
get from our shadow-vmcb to it on an emulated vmexit. So I still don't
understand what triggers the reported problem or why the WARN_ON is
necessary.


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm: svm: Only propagate next_rip when guest supports it

2015-10-07 Thread Joerg Roedel
On Wed, Oct 07, 2015 at 01:03:35PM +0200, Joerg Roedel wrote:
> But we don't care if L1 writes something into its own next_rip, as we
> never read this value from its VMCB. We only copy the next_rip value we
> get from our shadow-vmcb to it on an emulated vmexit. So I still don't
> understand what triggers the reported problem or why the WARN_ON is
> necessary.

Okay, I think I have an idea now. I talked a bit with Dirk and the
WARN_ON triggers in the guest, and not on the host. This makes a lot
more sense.

In nested-svm we always copy the next_rip from the shadow-vmcb to the
guests vmcb, even when the nrips bit in cpuid is not set for the guest.
This obviously triggers the WARN_ON() in the L1 KVM (I still don't
understand why the WARN_ON was introduced in the first place).

So the right fix is to only copy next_rip to the guests vmcb when its
cpuid indicates that next_rip is supported there, like in this patch:

>From 019afc60507618b8e44e0c67d5ea2d850d88c9dd Mon Sep 17 00:00:00 2001
From: Joerg Roedel <jroe...@suse.de>
Date: Wed, 7 Oct 2015 13:38:19 +0200
Subject: [PATCH] kvm: svm: Only propagate next_rip when guest supports it

Currently we always write the next_rip of the shadow vmcb to
the guests vmcb when we emulate a vmexit. This could confuse
the guest when its cpuid indicated no support for the
next_rip feature.

Fix this by only propagating next_rip if the guest actually
supports it.

Signed-off-by: Joerg Roedel <jroe...@suse.de>
---
 arch/x86/kvm/cpuid.h | 21 +
 arch/x86/kvm/svm.c   |  7 ++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index dd05b9c..effca1f 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -133,4 +133,25 @@ static inline bool guest_cpuid_has_mpx(struct kvm_vcpu 
*vcpu)
best = kvm_find_cpuid_entry(vcpu, 7, 0);
return best && (best->ebx & bit(X86_FEATURE_MPX));
 }
+
+/*
+ * NRIPS is provided through cpuidfn 0x800a.edx bit 3
+ */
+#define BIT_NRIPS  3
+
+static inline bool guest_cpuid_has_nrips(struct kvm_vcpu *vcpu)
+{
+   struct kvm_cpuid_entry2 *best;
+
+   best = kvm_find_cpuid_entry(vcpu, 0x800a, 0);
+
+   /*
+* NRIPS is a scattered cpuid feature, so we can't use
+* X86_FEATURE_NRIPS here (X86_FEATURE_NRIPS would be bit
+* position 8, not 3).
+*/
+   return best && (best->edx & bit(BIT_NRIPS));
+}
+#undef BIT_NRIPS
+
 #endif
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 94b7d15..e1a8824 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2459,7 +2459,9 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
nested_vmcb->control.exit_info_2   = vmcb->control.exit_info_2;
nested_vmcb->control.exit_int_info = vmcb->control.exit_int_info;
nested_vmcb->control.exit_int_info_err = 
vmcb->control.exit_int_info_err;
-   nested_vmcb->control.next_rip  = vmcb->control.next_rip;
+
+   if (guest_cpuid_has_nrips(vcpu))
+   nested_vmcb->control.next_rip  = vmcb->control.next_rip;
 
/*
 * If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
@@ -2714,6 +2716,9 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
svm->vmcb->control.event_inj = nested_vmcb->control.event_inj;
svm->vmcb->control.event_inj_err = nested_vmcb->control.event_inj_err;
 
+   /* Clear next_rip, as real hardware would do */
+   nested_vmcb->control.next_rip = 0;
+
nested_svm_unmap(page);
 
/* Enter Guest-Mode */
-- 
1.8.4.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

2015-10-07 Thread Joerg Roedel
On Wed, Oct 07, 2015 at 10:58:07AM -0400, Bandan Das wrote:
> Ok, looks like I am making some incorrect "vmx" assumptions here. What happens
> when we exit from L2 to L0, arent' we looking at the VMCB L1 is using to run
> L2 ? Wouldn't that trigger the warning if the host processor does not support
> nrips and the field is set ?

No, because the L1 hypervisor can't write to the shadow-vmcb, and this
is the only one where we _read_ next_rip from.



Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: svm: Only propagate next_rip when guest supports it

2015-10-07 Thread Joerg Roedel
On Wed, Oct 07, 2015 at 11:48:36AM -0400, Bandan Das wrote:
> Ok, understood now. The warn_on would trigger in L1 only if it has
> decided to disable nrips for some reason as was the case here. So,
> my reasoning behind putting the warning was incorrect.

Okay, so I think the warning can be removed.

> > +
> > +   if (guest_cpuid_has_nrips(vcpu))
> > +   nested_vmcb->control.next_rip  = vmcb->control.next_rip;

Note that there is a bug here, instead of vcpu it must be >vcpu.
Somehow I missed to at least compile-test this.

Dirk is currently testing whether this (fixed) patch solves the problem
in his setup.

> >  
> > /*
> >  * If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
> > @@ -2714,6 +2716,9 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
> > svm->vmcb->control.event_inj = nested_vmcb->control.event_inj;
> > svm->vmcb->control.event_inj_err = nested_vmcb->control.event_inj_err;
> >  
> > +   /* Clear next_rip, as real hardware would do */
> > +   nested_vmcb->control.next_rip = 0;
> > +
> 
> Why do we need this ? And are you sure this is what real hardware does ?
> I couldn't find anything in the spec.

Yeah, probably right. Since we only write guests next_rip when the guest
supports it via cpuid, there is probably no point in resetting it at
vmrun emulation.


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] iommu/s390: add iommu api for s390 pci devices

2015-10-06 Thread Joerg Roedel
On Thu, Oct 01, 2015 at 07:30:28PM +0200, Gerald Schaefer wrote:
> Yes, the DMA API is already implemented in arch/s390/pci/pci_dma.c.
> I thought about moving it over to the new location in drivers/iommu/,
> but I don't see any benefit from it.

Okay, this is true for now. At some point we hopefully have a common
DMA-API implementation for all IOMMU driver, at which point s390 can
make use of it too and abandon its own implementation.

> Also, the two APIs are quite different on s390 and must not be mixed-up.
> For example, we have optimizations in the DMA API to reduce TLB flushes
> based on iommu bitmap wrap-around, which is not possible for the map/unmap
> logic in the IOMMU API. There is also the requirement that each device has
> its own DMA page table (not shared), which is important for DMA API device
> recovery and map/unmap on s390.

This sounds quite similar to what other IOMMU drivers also implement,
especially the AMD IOMMU driver. It also uses non-shared page-tables for
devices and implements the bitmap-allocator optimization.

> Hmm, not sure how this can replace my own struct. I need the struct to
> maintain a list of all devices that share a dma page table. And the
> devices need to be added and removed to/from that list in attach/detach_dev.
> 
> I also need that list during map/unmap, in order to do a TLB flush for
> all affected devices, and this happens under a spin lock.
> 
> So I guess I cannot use the iommu_group->devices list, which is managed
> in add/remove_device and under a mutex, if that was on your mind.

Yeah, right. Thanks for the explanation.


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] iommu/s390: add iommu api for s390 pci devices

2015-10-06 Thread Joerg Roedel
On Thu, Aug 27, 2015 at 03:33:03PM +0200, Gerald Schaefer wrote:
> This adds an IOMMU API implementation for s390 PCI devices.
> 
> Reviewed-by: Sebastian Ott 
> Signed-off-by: Gerald Schaefer 
> ---
>  MAINTAINERS |   7 +
>  arch/s390/Kconfig   |   1 +
>  arch/s390/include/asm/pci.h |   4 +
>  arch/s390/include/asm/pci_dma.h |   5 +-
>  arch/s390/pci/pci_dma.c |  37 +++--
>  drivers/iommu/Kconfig   |   7 +
>  drivers/iommu/Makefile  |   1 +
>  drivers/iommu/s390-iommu.c  | 337 
> 
>  8 files changed, 386 insertions(+), 13 deletions(-)
>  create mode 100644 drivers/iommu/s390-iommu.c

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

2015-10-06 Thread Joerg Roedel
On Mon, Oct 05, 2015 at 01:42:43PM -0400, Bandan Das wrote:
> Joerg Roedel <j...@8bytes.org> writes:
> 
> > On Mon, Oct 05, 2015 at 12:54:43PM -0400, Bandan Das wrote:
> >> Joerg Roedel <j...@8bytes.org> writes:
> >> The problems is that the next_rip field could be stale. If the processor 
> >> supports
> >> next_rip, then it will clear it out on the next entry. If it doesn't,
> >> an old value just sits there (no matter who wrote it) and the problem
> >> happens when skip_emulated_instruction advances the rip with an incorrect
> >> value.
> >
> > So the right fix would be to just set the guests next_rip to 0 when we
> > emulate vmrun, just like real hardware does, no?
> 
> Agreed, resetting to 0 if nrips isn't supported seems right. It would still
> help having a printk_once in this case IMO :)

I meant to reset it always to 0 on vmrun, like real hardware does.



Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

2015-10-06 Thread Joerg Roedel
On Thu, Oct 01, 2015 at 06:31:27PM -0400, Bandan Das wrote:
> >> @@ -514,7 +514,7 @@ static void skip_emulated_instruction(struct kvm_vcpu 
> >> *vcpu)
> >>struct vcpu_svm *svm = to_svm(vcpu);
> >>  
> >>if (svm->vmcb->control.next_rip != 0) {
> >> -  WARN_ON(!static_cpu_has(X86_FEATURE_NRIPS));
> >> +  WARN_ON_ONCE(!static_cpu_has(X86_FEATURE_NRIPS));
> >>svm->next_rip = svm->vmcb->control.next_rip;
> >>}

I looked again how this could possibly be triggered, and I am somewhat
confused now.

So svm->vmcb->control.next_rip is only written by hardware or in
svm_check_intercept(). Both cases write only to this field, if the
hardware supports X86_FEATURE_NRIPS. The write in nested_svm_vmexit only
targets the guests VMCB, and we don't use that one again.

So I can't see how the WARN_ON above could be triggered. Do I miss
something or might this also be a miscompilation of static_cpu_has?


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

2015-10-05 Thread Joerg Roedel
On Thu, Oct 01, 2015 at 06:31:27PM -0400, Bandan Das wrote:
> Paolo Bonzini  writes:
> 
> > On 01/10/2015 13:43, Dirk Müller wrote:
> >> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >> index 94b7d15..0a42859 100644
> >> --- a/arch/x86/kvm/svm.c
> >> +++ b/arch/x86/kvm/svm.c
> >> @@ -514,7 +514,7 @@ static void skip_emulated_instruction(struct kvm_vcpu 
> >> *vcpu)
> >>struct vcpu_svm *svm = to_svm(vcpu);
> >>  
> >>if (svm->vmcb->control.next_rip != 0) {
> >> -  WARN_ON(!static_cpu_has(X86_FEATURE_NRIPS));
> >> +  WARN_ON_ONCE(!static_cpu_has(X86_FEATURE_NRIPS));
> >>svm->next_rip = svm->vmcb->control.next_rip;
> >>}
> >>  
> >
> > Bandan, what was the reason for warning here?
> 
> I added the warning so that we catch if the next_rip field is being written
> to (even if the feature isn't supported) by a buggy L1 hypervisor.

Even if the L1 hypervisor writes to the next_rip field in the VMCB, we
would never see it in this code path, as we access the shadow VMCB in
this statement.

We don't even care if the L1 hypervisor writes to its next_rip field
because we only write to this field on an emulatated VMEXIT and never
read it back.

So what's the point in adding a guest-triggerable warning at all?



Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

2015-10-05 Thread Joerg Roedel
On Mon, Oct 05, 2015 at 12:54:43PM -0400, Bandan Das wrote:
> Joerg Roedel <j...@8bytes.org> writes:
> The problems is that the next_rip field could be stale. If the processor 
> supports
> next_rip, then it will clear it out on the next entry. If it doesn't,
> an old value just sits there (no matter who wrote it) and the problem
> happens when skip_emulated_instruction advances the rip with an incorrect
> value.

So the right fix would be to just set the guests next_rip to 0 when we
emulate vmrun, just like real hardware does, no?


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] iommu/s390: add iommu api for s390 pci devices

2015-09-29 Thread Joerg Roedel
Hi Gerald,

thanks for your patch. It looks pretty good and addresses my previous
review comments. I have a few questions, first one is how this
operates with DMA-API on s390. Is there a seperate DMA-API
implementation besides the IOMMU-API one for PCI devices?

My other question is inline:

On Thu, Aug 27, 2015 at 03:33:03PM +0200, Gerald Schaefer wrote:
> +struct s390_domain_device {
> + struct list_headlist;
> + struct zpci_dev *zdev;
> +};

Instead of using your own struct here, have you considered using the
struct iommu_group instead? The struct devices contains a pointer to an
iommu_group and the struct itself contains pointers to the domain it is
currently bound to.


Regards,

Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 18/18] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts

2015-09-21 Thread Joerg Roedel
On Fri, Sep 18, 2015 at 10:29:56PM +0800, Feng Wu wrote:
> Enable VT-d Posted-Interrtups and add a command line
> parameter for it.
> 
> Signed-off-by: Feng Wu <feng...@intel.com>
> Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
> ---
>  Documentation/kernel-parameters.txt |  1 +
>  drivers/iommu/irq_remapping.c   | 12 
>  2 files changed, 9 insertions(+), 4 deletions(-)

Acked-by: Joerg Roedel <jroe...@suse.de>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RTF] kvm:x86:Fix error checking in the function pf_interception

2015-08-07 Thread Joerg Roedel
On Thu, Aug 06, 2015 at 10:10:23PM -0400, Nicholas Krause wrote:
 This fixes error checking in the function pf_interception by
 checking if the call to kvm_mmu_unprotect_page_virt returns
 zero to indicate the function has failed internally and if
 this occurs we must return immediately to the caller of the
 function pf_interception with the return value of zero from
 the call to the function kvm_mmu_unprotect_page_virt to indicate
 failure to the caller of the function pf_interception.
 
 Signed-off-by: Nicholas Krause xerofo...@gmail.com
 ---
  arch/x86/kvm/svm.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index 8e0c084..a57aee1 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -1814,8 +1814,11 @@ static int pf_interception(struct vcpu_svm *svm)
   error_code = svm-vmcb-control.exit_info_1;
  
   trace_kvm_page_fault(fault_address, error_code);
 - if (!npt_enabled  kvm_event_needs_reinjection(svm-vcpu))
 - kvm_mmu_unprotect_page_virt(svm-vcpu, fault_address);
 + if (!npt_enabled  kvm_event_needs_reinjection(svm-vcpu)) {
 + r = kvm_mmu_unprotect_page_virt(svm-vcpu, 
 fault_address);
 + if (!r)
 + break;
 + }

NAK. kvm_mmu_unprotect_page_virt does not return an error but whether it
unprotected a page or not. This patch breaks SVM.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/1] iommu: Detach device from domain when removed from group

2015-08-03 Thread Joerg Roedel
On Tue, Jul 28, 2015 at 07:55:55PM +0200, Gerald Schaefer wrote:
 On s390, this eventually leads to a kernel panic when binding the device
 again to its non-vfio PCI driver, because of the missing arch-specific
 cleanup in detach_dev. On x86, the detach_dev callback will also not be
 called directly, but there is a notifier that will catch
 BUS_NOTIFY_REMOVED_DEVICE and eventually do the cleanup. Other
 architectures w/o the notifier probably have at least some kind of memory
 leak in this scenario, so a general fix would be nice.

This notifier is not arch-specific, but registered against the bus the
iommu-ops are set for. Why does it not run on s390?


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Lockdep warning in VFIO using v4.2-rc3

2015-07-23 Thread Joerg Roedel
Hi Alex,

On Thu, Jul 23, 2015 at 11:07:37AM -0600, Alex Williamson wrote:
 Thanks for the report.  I think I found it.  I'll do further testing
 myself, but would appreciate if you're able to see if this clears the
 problem.  Thanks,

Just tested it and the lockdep warning is gone. Thanks for your quick
look.

Tested-by: Joerg Roedel jroe...@suse.de

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Lockdep warning in VFIO using v4.2-rc3

2015-07-23 Thread Joerg Roedel
Hi Alex,

I stumbled over this lockdep warning yesterday while testing my VT-d
changes. It looks like one code path is taking the locks:

group-device_lock
driver_lock
pci_bus_sem

while another path is taking

pci_bus_sem
group-device_lock

which could lead to a deadlock. I attach the full warning, can you
please have a look?

[  252.892008] ==
[  252.892008] [ INFO: possible circular locking dependency detected ]
[  252.892008] 4.2.0-rc3+ #16 Not tainted
[  252.892008] ---
[  252.892008] qemu-system-x86/4986 is trying to acquire lock:
[  252.892008]  (group-device_lock){+.+.+.}, at: [a0569da4] 
vfio_group_get_device+0x24/0xb0 [vfio]
[  252.892008] 
but task is already holding lock:
[  252.892008]  (pci_bus_sem){.+}, at: [813ee47e] 
pci_walk_bus+0x2e/0xa0
[  252.892008] 
which lock already depends on the new lock.

[  252.892008] 
the existing dependency chain (in reverse order) is:
[  252.892008] 
- #2 (pci_bus_sem){.+}:
[  252.892008][810c9152] __lock_acquire+0xca2/0x1550
[  252.892008][810c9b6f] lock_acquire+0xdf/0x2c0
[  252.892008][81710c4c] down_read+0x4c/0xa0
[  252.892008][813ee47e] pci_walk_bus+0x2e/0xa0
[  252.892008][a0657dcb] vfio_pci_release+0x18b/0x3c0 
[vfio_pci]
[  252.892008][a056a100] vfio_device_fops_release+0x20/0x40 
[vfio]
[  252.892008][81211460] __fput+0xf0/0x200
[  252.892008][812115be] fput+0xe/0x10
[  252.892008][8108f5fd] task_work_run+0x8d/0xc0
[  252.892008][8106d35c] do_exit+0x32c/0xc30
[  252.892008][8106f10c] do_group_exit+0x4c/0xc0
[  252.892008][8107d678] get_signal+0x328/0x9e0
[  252.892008][81003458] do_signal+0x28/0x9e0
[  252.892008][81003e75] do_notify_resume+0x65/0x80
[  252.892008][81713e2e] int_signal+0x12/0x17
[  252.892008] 
- #1 (driver_lock){+.+.+.}:
[  252.892008][810c9152] __lock_acquire+0xca2/0x1550
[  252.892008][810c9b6f] lock_acquire+0xdf/0x2c0
[  252.892008][8170e71b] mutex_lock_nested+0x6b/0x420
[  252.892008][a06576b8] vfio_pci_open+0x38/0x270 [vfio_pci]
[  252.892008][a056ab17] 
vfio_group_fops_unl_ioctl+0x267/0x460 [vfio]
[  252.892008][8122454d] do_vfs_ioctl+0x30d/0x580
[  252.892008][81224839] SyS_ioctl+0x79/0x90
[  252.892008][81713c32] entry_SYSCALL_64_fastpath+0x16/0x7a
[  252.892008] 
- #2 (group-device_lock){+.+.+.}:
[  252.892008][810c6c6c] check_prevs_add+0x8fc/0x900
[  252.892008][810c9152] __lock_acquire+0xca2/0x1550
[  252.892008][810c9b6f] lock_acquire+0xdf/0x2c0
[  252.892008][8170e71b] mutex_lock_nested+0x6b/0x420
[  252.892008][a0569da4] vfio_group_get_device+0x24/0xb0 
[vfio]
[  252.892008][a056a165] vfio_device_get_from_dev+0x45/0xa0 
[vfio]
[  252.892008][a065762c] vfio_pci_get_devs+0x2c/0x80 
[vfio_pci]
[  252.892008][a065706d] vfio_pci_walk_wrapper+0x5d/0x70 
[vfio_pci]
[  252.892008][813ee4c5] pci_walk_bus+0x75/0xa0
[  252.892008][a0657e93] vfio_pci_release+0x253/0x3c0 
[vfio_pci]
[  252.892008][a056a100] vfio_device_fops_release+0x20/0x40 
[vfio]
[  252.892008][81211460] __fput+0xf0/0x200
[  252.892008][812115be] fput+0xe/0x10
[  252.892008][8108f5fd] task_work_run+0x8d/0xc0
[  252.892008][8106d35c] do_exit+0x32c/0xc30
[  252.892008][8106f10c] do_group_exit+0x4c/0xc0
[  252.892008][8107d678] get_signal+0x328/0x9e0
[  252.892008][81003458] do_signal+0x28/0x9e0
[  252.892008][81003e75] do_notify_resume+0x65/0x80
[  252.892008][81713e2e] int_signal+0x12/0x17
[  252.892008] 
other info that might help us debug this:

[  252.892008] Chain exists of:
  group-device_lock -- driver_lock -- pci_bus_sem

[  252.892008]  Possible unsafe locking scenario:

[  252.892008]CPU0CPU1
[  252.892008]
[  252.892008]   lock(pci_bus_sem);
[  252.892008]lock(driver_lock);
[  252.892008]lock(pci_bus_sem);
[  252.892008]   lock(group-device_lock);
[  252.892008] 
 *** DEADLOCK ***

[  252.892008] 2 locks held by qemu-system-x86/4986:
[  252.892008]  #0:  (driver_lock){+.+.+.}, at: [a0657c67] 
vfio_pci_release+0x27/0x3c0 [vfio_pci]
[  252.892008]  #1:  (pci_bus_sem){.+}, at: [813ee47e] 
pci_walk_bus+0x2e/0xa0
[  252.892008] 
stack backtrace:
[  252.892008] 

Re: [PATCH] KVM: svm: remove KVM_QUIRK_CD_NW_CLEARED quirk

2015-07-10 Thread Joerg Roedel
On Fri, Jul 10, 2015 at 02:01:33PM +0200, Paolo Bonzini wrote:
 We can disable CD unconditionally when there is no assigned device.
 KVM now forces guest PAT to all-writeback in that case, so it makes
 sense to also force CR0.CD=0.
 
 When there are assigned devices, emulate cache-disabled operation
 through the page tables.  This behavior is consistent with VMX,
 where CD/NW are not touched by vmentry/vmexit.
 
 Note that buggy firmware that does not clear CD/NW is _seriously_
 old: SeaBIOS for example has been doing it since October 2008.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  arch/x86/kvm/svm.c | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)

Looks good to me.

Reviewed-by: Joerg Roedel jroe...@suse.de

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 0/6] IRQ bypass manager and irqfd consumer

2015-07-09 Thread Joerg Roedel
On Thu, Jul 09, 2015 at 04:38:41PM +0200, Paolo Bonzini wrote:
  On Tue, Jul 07, 2015 at 11:17:48AM -0600, Alex Williamson wrote:
  If we think that it's *only* a kvm-vfio interaction then we could add it
  to virt/kvm/vfio.c.  vfio could use symbol_get to avoid a module
  dependency and effectively disable the code path when not used with kvm.
  The reverse model of hosting it in vfio and using symbol_get from
  kvm-vfio would also work.  Do we really want to declare it to be
  kvm-vfio specific though?  Another option would be to simply host it
  under virt/lib with module dependencies for both vfio and kvm.
 
 I wonder if in the future we may have some kind of driver-mediated
 passthrough, e.g. for network drivers.  They might use the bypass
 mechanism too.  So I think drivers/vfio is too restrictive.
 
 virt/ right now only hosts KVM, but it could for example host lguest
 too.  virt/lib/ is okay with me.

Yeah, virt/lib is probably the best choice.


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 0/6] IRQ bypass manager and irqfd consumer

2015-07-09 Thread Joerg Roedel
On Tue, Jul 07, 2015 at 11:17:48AM -0600, Alex Williamson wrote:
 Hosting the bypass manager in kernel/irq seemed appropriate, but really
 it could be anywhere.  Does anyone have a different preference or
 specifically want it under their scope?  We had originally thought of
 this as an IOMMU service, but I think we've generalized it beyond that.
 I expect we should also add the necessary hooks to turn it into a
 loadable module to keep the tinification folks happy, I'll incorporate
 the current working changes and post a version with that.

Yeah, this is only an IOMMU service on x86, afaik. So drivers/iommu is
probably the wrong place to host it.

Will there be any other producers than VFIO or any other consumers than
KVM? If not, it should live in one of these spaces. KVM is probably the
best choice, as any hardware feature that uses this targets
virtualization, so there will hardly ever be another consumer than KVM.


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/RFT PATCH v3 0/4] KVM: x86: full virtualization of guest MTRR

2015-07-08 Thread Joerg Roedel
On Wed, Jul 08, 2015 at 05:18:26PM +0200, Paolo Bonzini wrote:
 I do not have any AMD machines that support an IOMMU, so I would like
 some help testing these patches.

Works here, tested on an AMD IOMMUv2 machine and could successfully
download a file over the assigned NIC.

Tested-by: Joerg Roedel jroe...@suse.de

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/RFT PATCH v2 0/4] KVM: x86: full virtualization of guest MTRR

2015-07-07 Thread Joerg Roedel
On Tue, Jul 07, 2015 at 04:09:07PM +0200, Paolo Bonzini wrote:
 The guest should not take ages to boot, which is typical of messed-up
 MTRR/PAT.  And if you add some actual usage of the attached device, it
 actually is sufficient.  Even a simple ping test, or typing through a
 USB keyboard on a passed-through controller, is enough.

Okay, I do some testing on this tomorrow and let you know.


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/RFT PATCH v2 0/4] KVM: x86: full virtualization of guest MTRR

2015-07-07 Thread Joerg Roedel
Hi Paolo,

On Tue, Jul 07, 2015 at 03:45:35PM +0200, Paolo Bonzini wrote:
 I do not have any AMD machines that support an IOMMU, so I would like
 some help testing these patches.  Thanks,

What kind of testing do you want? Booting a guest with a device attached
is probably not sufficient, right?


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] KVM: SVM: use NPT page attributes

2015-07-07 Thread Joerg Roedel
Hi Paolo,

On Tue, Jul 07, 2015 at 02:58:12PM +0200, Paolo Bonzini wrote:
 +static void svm_set_guest_pat(struct vcpu_svm *svm, u64 *g_pat)
 +{
 + struct kvm_vcpu *vcpu = svm-vcpu;
 +
 + /* Unlike Intel, AMD takes the guest's CR0.CD into account.
 +  *
 +  * AMD doesn't have snooping control in the IOMMU, but if the guest

The AMD IOMMU has snooping control, its just called 'Force Coherent'
there. The AMD IOMMU driver always sets the FC bit in the page tables.


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

2015-06-29 Thread Joerg Roedel
On Mon, Jun 29, 2015 at 09:14:54AM +, Wu, Feng wrote:
 Do you mean updating the hardware IRTEs for all the entries in the irq
 routing table, no matter whether it is the updated one?

Right, that's what I mean. It seems wrong to me to work around the API
interface by creating a diff between the old and the new routing table.
It is much simpler (and easier to maintain) to just update the IRTE
and PI structures for all IRQs in the routing table, especially since
this is not a hot-path.


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

2015-06-29 Thread Joerg Roedel
Hi Feng,

On Thu, Jun 25, 2015 at 09:11:52AM -0600, Alex Williamson wrote:
 So the trouble is that QEMU vfio updates a single MSI vector, but that
 just updates a single entry within a whole table of routes, then the
 whole table is pushed to KVM.  But in kvm_set_irq_routing() we have
 access to both the new and the old tables, so we do have the ability to
 detect the change.  We can therefore detect which GSI changed and
 cross-reference that to KVMs irqfds.  If we have an irqfd that matches
 the GSI then we have all the information we need, right?  We can use the
 eventfd_ctx of the irqfd to call into the IRQ bypass manager if we need
 to.  If it's an irqfd that's already enabled for bypass then we may
 already have the data we need to tweak the PI config.
 
 Yes, I agree it's more difficult, but it doesn't appear to be
 impossible, right?

Since this also doesn't happen very often, you could also just update _all_
PI data-structures from kvm_set_irq_routing, no? This would just
resemble the way the API works anyway.

You just need to be careful to update the data structures only when the
function can't fail anymore, so that you don't have to roll back
anything.


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm: irqchip: Fix possible memory leak in kvm_set_irq_routing()

2015-06-26 Thread Joerg Roedel
Hi Dan,

On Fri, Jun 26, 2015 at 12:00:22PM +0300, Dan Carpenter wrote:
 The patch e73f61e41f3b: kvm: irqchip: Break up high order
 allocations of kvm_irq_routing_table from May 8, 2015, leads to the
 following static checker warning:
215  r = -EINVAL;
216  if (ue-flags)
217  goto out;
   
 Leaked here.  Move in front of the allocation?

Right, this is a potential leak, thanks for the report. The patch below
should fix it:

From 14abe455d04f7208a16237a2f1321fd5e5c5d115 Mon Sep 17 00:00:00 2001
From: Joerg Roedel jroe...@suse.de
Date: Fri, 26 Jun 2015 18:02:47 +0200
Subject: [PATCH] kvm: irqchip: Fix possible memory leak in
 kvm_set_irq_routing()

If ue-flags field is checked after the allocation of the
kvm_kernel_irq_routing_entry, it will be leaked if the check
succeeds. Do the check before the allocation instead to
avoid this leak.

Reported-by: Dan Carpenter dan.carpen...@oracle.com
Fixes: e73f61e41f3b: kvm: irqchip: Break up high order allocations of 
kvm_irq_routing_table
Signed-off-by: Joerg Roedel jroe...@suse.de
---
 virt/kvm/irqchip.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 21c1424..239f4ec 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -207,14 +207,15 @@ int kvm_set_irq_routing(struct kvm *kvm,
for (i = 0; i  nr; ++i) {
struct kvm_kernel_irq_routing_entry *e;
 
+   r = -EINVAL;
+   if (ue-flags)
+   goto out;
+
r = -ENOMEM;
e = kzalloc(sizeof(*e), GFP_KERNEL);
if (!e)
goto out;
 
-   r = -EINVAL;
-   if (ue-flags)
-   goto out;
r = setup_routing_entry(new, e, ue);
if (r)
goto out;
-- 
1.8.4.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

2015-06-24 Thread Joerg Roedel
On Mon, Jun 15, 2015 at 06:17:03PM +0200, Eric Auger wrote:
 I guess this discussion also is relevant wrt [RFC v6 00/16] KVM-VFIO
 IRQ forward control series? Or is that central registry maintained by
 a posted interrupts manager something more specific to x86?

From what I understood so far, the feature you implemented for ARM is a
bit different from the ones that get introduced to x86.

Can you please share some details on how the ARM version works? I am
interested in how the GICv2 is configured for IRQ forwarding. The
question is whether the forwarding information needs to be updated from
KVM and what information about the IRQ KVM needs for this.


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

2015-06-24 Thread Joerg Roedel
On Thu, Jun 18, 2015 at 02:04:08PM -0600, Alex Williamson wrote:
 There are plenty of details to be filled in,

I also need to fill plenty of details in my head first, so here are some
suggestions based on my current understanding. Please don't hesitate to
correct me if where I got something wrong.

So first I totally agree that the handling of PI/non-PI configurations
should be transparent to user-space.

I read a bit through the VT-d spec, and my understanding of posted
interrupts so far is that:

1) Each VCPU gets a PI-Descriptor with its pending Posted
   Interrupts. This descriptor needs to be updated when a VCPU
   is migrated to another PCPU and should thus be under control
   of KVM.

   This is similar to the vAPIC backing page in the AMD version
   of this, except that the PCPU routing information is stored
   somewhere else on AMD.

2) As long as the VCPU runs the IRTEs are configured for
   posting, when the VCPU goes to sleep the old remapped entry is
   established again. So when the VCPU sleeps the interrupt
   would get routed to VFIO and forwarded through the eventfd.

   This would be different to the AMD version, where we have a
   running bit. When this is clear the IOMMU will trigger an event
   in its event-log. This might need special handling in VFIO
   ('might' because VFIO does not need to forward the interrupt,
it just needs to make sure the VCPU wakes up).

   Please correct me if my understanding of the Intel version is
   wrong.

So most of the data structures the IOMMU reads for this need to be
updated from KVM code (either x86-generic or AMD/Intel specific code),
as KVM has the information about VCPU load/unload and the IRQ routing.

What KVM needs from VFIO are the informations about the physical
interrupts, and it makes total sense to attach them as metadata to the
eventfd.

But the problems start at how this metadata should look like. It would
be good to have some generic description, but not sure if this is
possible. Otherwise this metadata would need to be requested by VFIO
from the IOMMU driver and passed on to KVM, which it then passes back to
the IOMMU driver. Or something like that.



Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5 0/4] Consolidated KVM vPMU support for x86

2015-06-16 Thread Joerg Roedel
On Fri, Jun 12, 2015 at 01:34:52AM -0400, Wei Huang wrote:
 This patchset is directlyh applicable on kvm.git/queue.
 
 V5:
   * Remove the get_pmu_ops from sub_arch; instead define pmu dispatcher
 in kvm_x86_ops-pmu_ops. The dispatcher is initialized in sub-arch.
 The PMU interface functions are changed accordingly (suggested by 
 Joerg Rodel).

Tested it again, works like a charm on my AMD system :) So for all patches
again:

Tested-by: Joerg Roedel jroe...@suse.de


Thanks,

Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5 2/4] KVM: x86/vPMU: Create vPMU interface for VMX and SVM

2015-06-16 Thread Joerg Roedel
On Fri, Jun 12, 2015 at 01:34:54AM -0400, Wei Huang wrote:
 This patch splits existing vPMU code into a common vPMU interface (pmc.c)
 and Intel specific vPMU code (pmu_intel.c) using the following steps:
 
 - Part of arechitectural vPMU code is extracted and moved to pmu_intel.c
   file. They are hooked up with the newly-defined intel_pmu_ops, which will
   be called from pmu interface;
 - Create a dummy pmu_amd.c file for AMD SVM with empty functions;
 
 All architectural vPMU functions are now called via PMU function dispatcher.
 This function dispatcher is defined in kvm_x86_ops-pmu_ops, which is
 initialized by sub-arch. Also note that Intel and AMD modules are now
 generated by combinig their corresponding arch files (vmx.c/svm.c) and pmu
 files (pmu_intel.c/pmu_amd.c).
 
 Tested-by: Radim Krčmář rkrc...@redhat.com
 Signed-off-by: Wei Huang w...@redhat.com

Reviewed-by: Joerg Roedel jroe...@suse.de

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V4 2/4] KVM: x86/vPMU: Create vPMU interface for VMX and SVM

2015-06-10 Thread Joerg Roedel
On Fri, Jun 05, 2015 at 01:20:14AM -0400, Wei Huang wrote:
 +
 + struct kvm_pmu_ops *(*get_pmu_ops)(void);

Can't you just set kvm_pmu_ops in svm.c and vmx.c and save this
call-back? Besides that the patch looks good.


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V4 0/4] Consolidated KVM vPMU support for x86

2015-06-10 Thread Joerg Roedel
On Fri, Jun 05, 2015 at 01:20:12AM -0400, Wei Huang wrote:
 Wei Huang (4):
   KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch
   KVM: x86/vPMU: Create vPMU interface for VMX and SVM
   KVM: x86/vPMU: Implement AMD vPMU code for KVM
   KVM: x86/vPMU: Enable PMU handling for AMD PERFCTRn and EVNTSELn MSRs
 
  arch/x86/include/asm/kvm_host.h |  50 ++--
  arch/x86/kvm/Makefile   |   4 +-
  arch/x86/kvm/cpuid.c|   3 +-
  arch/x86/kvm/pmu.c  | 557 
 +++-
  arch/x86/kvm/pmu.h  |  99 +++
  arch/x86/kvm/pmu_amd.c  | 207 +++
  arch/x86/kvm/pmu_intel.c| 360 ++
  arch/x86/kvm/svm.c  |   8 +
  arch/x86/kvm/vmx.c  |   8 +
  arch/x86/kvm/x86.c  |  76 ++
  10 files changed, 896 insertions(+), 476 deletions(-)
  create mode 100644 arch/x86/kvm/pmu.h
  create mode 100644 arch/x86/kvm/pmu_amd.c
  create mode 100644 arch/x86/kvm/pmu_intel.c

Okay, I reviewed and tested the series. It looks good to me, except for
the one comment I had on patch 2.

So feel free to add

Reviewed-by: Joerg Roedel jroe...@suse.de
Tested-by: Joerg Roedel jroe...@suse.de

to patches 1, 3, and 4 in the next interation.


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V4 2/4] KVM: x86/vPMU: Create vPMU interface for VMX and SVM

2015-06-10 Thread Joerg Roedel
On Wed, Jun 10, 2015 at 01:47:42PM -0500, Wei Huang wrote:
 On 06/10/2015 01:05 PM, Joerg Roedel wrote:
  So how about putting the pmu-ops directly into the kvm_x86_ops as a
  (const) member, 
 I like this idea better. Here is the (expanded) design:
 
 1) add const struct kvm_pmu_ops *pmu_ops to kvm_x86_ops;
 2) In VMX, vmx_x86_ops.pmu_ops = intel_pmu_ops;
 3) In SVM, svm_x86_ops.pmu_ops = amd_pmu_ops.
 
 Common kvm_pmu_ functions (in pmu.c) will change accordingly
 afterwards. If no objection, I will do this way.

Yeah, thats probably the best solution then. Extending kvm_init would be
a lot more hassle.


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V4 2/4] KVM: x86/vPMU: Create vPMU interface for VMX and SVM

2015-06-10 Thread Joerg Roedel
On Wed, Jun 10, 2015 at 11:43:20AM -0500, Wei Huang wrote:
 On 06/10/2015 05:12 AM, Joerg Roedel wrote:
  On Fri, Jun 05, 2015 at 01:20:14AM -0400, Wei Huang wrote:
  +
  +  struct kvm_pmu_ops *(*get_pmu_ops)(void);
  
  Can't you just set kvm_pmu_ops in svm.c and vmx.c and save this
  call-back? Besides that the patch looks good.
 Hi Joerg,
 
 Thanks for your review. How about setting up kvm_pmu_ops in
 .hardware_setup function of VMX and SVM? More specifically:
 
 1) EXPORT_SYMBOL_GPL(kvm_pmu_ops) from pmu.c file;
 2) In vmx.c, set kvm_pmu_ops = intel_pmu_ops in hardware_setup();
 3) In svm.c, set kvm_pmu_ops = amd_pmu_ops in svm_hardware_setup().
 
 With that, we can get rid of the call-back.

Okay, just had a look at how this works with kvm_x86_ops, because my
suggestion needs the EXPORT_SYMBOL_GPL(kvm_pmu_ops), which is better
also avoided.

So how about putting the pmu-ops directly into the kvm_x86_ops as a
(const) member, or alternativly as an additional parameter to kvm_init?
This still saves us the access functions.


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: irqchip: Break up high order allocations of kvm_irq_routing_table

2015-06-05 Thread Joerg Roedel
Hi Paolo,

On Mon, May 11, 2015 at 03:27:26PM +0200, Paolo Bonzini wrote:
 Great, I'll apply the patch.

Gentle ping. I don't see the patch in the queue or next branches of the
KVM tree yet. Do you plan to apply it for v4.2?


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: irqchip: Break up high order allocations of kvm_irq_routing_table

2015-05-11 Thread Joerg Roedel
Hi Paolo,

On Fri, May 08, 2015 at 06:26:13PM +0200, Paolo Bonzini wrote:
 It probably doesn't matter much indeed, but can you time the difference?
  kvm_set_irq_routing is not too frequent, but happens enough often that
 we had to use a separate SRCU instance just to speed it up (see commit
 719d93cd5f5, kvm/irqchip: Speed up KVM_SET_GSI_ROUTING, 2014-01-16).

The results vary a lot, but what I can say for sure is that the
kvm_set_irq_routing function takes at least twice as long (~10.000 vs
~22.000 cycles) as before on my AMD Kaveri machine (maximum was between
3-4 times as long).

On the other side this function is only called 2 times at boot in my
test, so I couldn't detect a noticable effect on the overall boot time
of the guest (37 disks were attached).


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] kvm: iommu: Add cond_resched to legacy device assignment code

2015-01-27 Thread Joerg Roedel
From: Joerg Roedel jroe...@suse.de

When assigning devices to large memory guests (=128GB guest
memory in the failure case) the functions to create the
IOMMU page-tables for the whole guest might run for a very
long time. On non-preemptible kernels this might cause
Soft-Lockup warnings. Fix these by adding a cond_resched()
to the mapping and unmapping loops.

Signed-off-by: Joerg Roedel jroe...@suse.de
---
 arch/x86/kvm/iommu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/iommu.c b/arch/x86/kvm/iommu.c
index 17b73ee..7dbced3 100644
--- a/arch/x86/kvm/iommu.c
+++ b/arch/x86/kvm/iommu.c
@@ -138,7 +138,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct 
kvm_memory_slot *slot)
 
gfn += page_size  PAGE_SHIFT;
 
-
+   cond_resched();
}
 
return 0;
@@ -306,6 +306,8 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
kvm_unpin_pages(kvm, pfn, unmap_pages);
 
gfn += unmap_pages;
+
+   cond_resched();
}
 }
 
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: iommu: Add cond_resched to legacy device assignment code

2014-12-17 Thread Joerg Roedel
On Wed, Dec 17, 2014 at 09:38:57AM +0800, Chen, Tiejun wrote:
 On 2014/12/16 23:47, Joerg Roedel wrote:
 diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
 index c1e6ae9..ac427e8 100644
 --- a/virt/kvm/iommu.c
 +++ b/virt/kvm/iommu.c
 
 This file is already gone after one latest commit c274e03af705,
 kvm: x86: move assigned-dev.c and iommu.c to arch/x86/ is
 introduced, so you need to pull your tree firstly :)

Hmm, I based the patch on kvm/master, where the file is still present.
What is the branch to base kvm patches on these days?


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: iommu: Add cond_resched to legacy device assignment code

2014-12-17 Thread Joerg Roedel
On Wed, Dec 17, 2014 at 11:46:24AM +0100, Paolo Bonzini wrote:
 
 
 On 17/12/2014 11:35, Joerg Roedel wrote:
   
   This file is already gone after one latest commit c274e03af705,
   kvm: x86: move assigned-dev.c and iommu.c to arch/x86/ is
   introduced, so you need to pull your tree firstly :)
  Hmm, I based the patch on kvm/master, where the file is still present.
  What is the branch to base kvm patches on these days?
 
 kvm/master in the -rc period, kvm/next during the merge window and for
 the next release.  But it's not a problem, I can sort it out.

Great, thanks Paolo. So I will base my future patches on kvm/next.


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm: iommu: Add cond_resched to legacy device assignment code

2014-12-16 Thread Joerg Roedel
From: Joerg Roedel jroe...@suse.de

When assigning devices to large memory guests (=128GB guest
memory in the failure case) the functions to create the
IOMMU page-tables for the whole guest might run for a very
long time. On non-preemptible kernels this might cause
Soft-Lockup warnings. Fix these by adding a cond_resched()
to the mapping and unmapping loops.

Signed-off-by: Joerg Roedel jroe...@suse.de
---
 virt/kvm/iommu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index c1e6ae9..ac427e8 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -137,7 +137,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct 
kvm_memory_slot *slot)
 
gfn += page_size  PAGE_SHIFT;
 
-
+   cond_resched();
}
 
return 0;
@@ -311,6 +311,8 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
kvm_unpin_pages(kvm, pfn, unmap_pages);
 
gfn += unmap_pages;
+
+   cond_resched();
}
 }
 
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 02/26] iommu: add capability IOMMU_CAP_NOEXEC

2014-09-26 Thread Joerg Roedel
On Tue, Sep 23, 2014 at 04:46:01PM +0200, Antonios Motakis wrote:
 Some IOMMUs accept an IOMMU_NOEXEC protection flag in addition to
 IOMMU_READ and IOMMU_WRITE. Expose this as an IOMMU capability.
 
 Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
 ---
  include/linux/iommu.h | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/include/linux/iommu.h b/include/linux/iommu.h
 index e1a644c..0433553 100644
 --- a/include/linux/iommu.h
 +++ b/include/linux/iommu.h
 @@ -59,6 +59,7 @@ struct iommu_domain {
  
  #define IOMMU_CAP_CACHE_COHERENCY0x1
  #define IOMMU_CAP_INTR_REMAP 0x2 /* isolates device intrs */
 +#define IOMMU_CAP_NOEXEC 0x3 /* IOMMU_NOEXEC flag */

This changed recently to an enum, please rebase to latest iommu/next
branch.


With that change:

Acked-by: Joerg Roedel jroe...@suse.de

for patches 1 and 2.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests 1/3] x86: svm: fix exitinfo values for NPT tests

2014-09-02 Thread Joerg Roedel
On Tue, Sep 02, 2014 at 05:05:26PM +0200, Paolo Bonzini wrote:
 The exitinfo values were plain wrong for the page-walk tests
 (including npt_rsvd), or else they were missing bits 32:33.
 Expect the right values.

Are bits 32:33 really emulated? IIRC they were not emulated in the
inital implementation, and they are missing in some hardware NPT
implementations as well.


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] KVM: nSVM: propagate the NPF EXITINFO to the guest

2014-09-02 Thread Joerg Roedel
Ah, here you add emulation of these bits.

On Tue, Sep 02, 2014 at 05:13:48PM +0200, Paolo Bonzini wrote:
 This is similar to what the EPT code does with the exit qualification.
 This allows the guest to see a valid value for bits 33:32.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  arch/x86/kvm/paging_tmpl.h |  6 ++
  arch/x86/kvm/svm.c | 26 ++
  2 files changed, 28 insertions(+), 4 deletions(-)
 
 diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
 index 410776528265..99d4c4e836a0 100644
 --- a/arch/x86/kvm/paging_tmpl.h
 +++ b/arch/x86/kvm/paging_tmpl.h
 @@ -322,8 +322,14 @@ retry_walk:
  
   real_gfn = mmu-translate_gpa(vcpu, gfn_to_gpa(table_gfn),
 PFERR_USER_MASK|PFERR_WRITE_MASK);
 +
 + /*
 +  * Can this happen (except if the guest is playing TOCTTOU 
 games)?
 +  * We should have gotten a nested page fault on table_gfn 
 instead.
 +  */

Comment is true, but doesn't make the check below obsolete, no?

   if (unlikely(real_gfn == UNMAPPED_GVA))
   goto error;
 @@ -1974,10 +1974,28 @@ static void nested_svm_inject_npf_exit(struct 
 kvm_vcpu *vcpu,
  {
   struct vcpu_svm *svm = to_svm(vcpu);
  
 - svm-vmcb-control.exit_code = SVM_EXIT_NPF;
 - svm-vmcb-control.exit_code_hi = 0;
 - svm-vmcb-control.exit_info_1 = fault-error_code;
 - svm-vmcb-control.exit_info_2 = fault-address;
 + /*
 +  * We can keep the value that the processor stored in the VMCB,
 +  * but make up something sensible if we hit the WARN.
 +  */
 + if (WARN_ON(svm-vmcb-control.exit_code != SVM_EXIT_NPF)) {
 + svm-vmcb-control.exit_code = SVM_EXIT_NPF;
 + svm-vmcb-control.exit_code_hi = 0;
 + svm-vmcb-control.exit_info_1 = (1ULL  32);
 + svm-vmcb-control.exit_info_2 = fault-address;
 + }

Its been a while since I looked into this, but is an injected NPF exit
always the result of a real NPF exit? How about an io-port emulated on
L1 but passed through to L2 by the nested hypervisor. On emulation of
INS or OUTS, KVM would need to read/write to an L2 address space, maybe
causing NPF faults to be injected. In this case an IOIO exit would cause
an injected NPF exit for L1.


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] KVM: nSVM: propagate the NPF EXITINFO to the guest

2014-09-02 Thread Joerg Roedel
On Tue, Sep 02, 2014 at 06:46:06PM +0200, Paolo Bonzini wrote:
 Il 02/09/2014 18:33, Joerg Roedel ha scritto:
  Comment is true, but doesn't make the check below obsolete, no?
 
 No, it doesn't.  I'll rewrite it as
 
   /*
* This cannot happen unless the guest is playing TOCTTOU games,
* because we would have gotten a nested page fault on table_gfn
* instead.  If this happens, the exit qualification / exit info
* field will incorrectly have guest page access as the
* nested page fault's cause, instead of guest page structure
* access.
*/

Sounds good.

  Its been a while since I looked into this, but is an injected NPF exit
  always the result of a real NPF exit?
 
 I think so, but that's why I CCed you. :)
 
  How about an io-port emulated on
  L1 but passed through to L2 by the nested hypervisor. On emulation of
  INS or OUTS, KVM would need to read/write to an L2 address space,
 
 It would need to read/write to *L1* (that's where the VMCB's IOIO map
 lies), which could result into a regular page fault injected into L1.

Hmm, INS and OUTS read/write to a virtual memory location, right? So if
we have an io-port intercepted by bare-metal KVM but not by the L1
hypervisor, and the L2 guest accesses this io-port, bare-metal KVM would
get an IOIO exit it has to emulate itself (it can't inject an IOIO exit
to L1 anyway, since L1 does not intercept the io-port).

During emulation the bare-metal KVM would read/write at an L2 virtual
address, because that is the context the instruction is executed in.
And while resolving this L2 virtual address, an NPF fault could be
detected that needs to be injected into the L1 hypervisor, right?



Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


LPC IOMMU and VFIO MicroConference - Call for Participation

2014-08-12 Thread Joerg Roedel
LPC IOMMU and VFIO MicroConference - Call for Participation
===

We are pleased to announce that this year there will be the first IOMMU
and VFIO MicroConference held at Linux Plumbers Conference in
Düsseldorf. An initial request for support of this micro conference
generated, among others, the following possible topic ideas:

* Improving generic IOMMU code and move code out of drivers
* IOMMU device error handling
* IOMMU Power Management
* Virtualizing IOMMUs
* Interface between IOMMUs an memory management

More suggested topics can be found at the wiki page of the micro
conference:

http://wiki.linuxplumbersconf.org/2014:iommu_microconference

We now ask for formal proposals for these discussions along with any
other topics or problems that need to be discussed in this area.

The format of the micro conference will be roughly half-hour slots for
each topic, where the discussion lead gives a short introduction to the
problem and maybe sketches possible solutions. The rest of the slot is
open for discussions so that we come to an agreement how to move
forward.

Please submit your formal proposal on the Linux Plumbers website (OpenID
login required) until August 31st at:


http://www.linuxplumbersconf.org/2014/how-to-submit-microconference-discussions-topics/

Hope to see you in Düsseldorf!


Joerg Roedel and Alex Williamson

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nSVM: Do not report CLTS via SVM_EXIT_WRITE_CR0 to L1

2014-07-01 Thread Joerg Roedel
On Sun, Jun 29, 2014 at 09:55:53PM +0200, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com
 
 CLTS only changes TS which is not monitored by selected CR0
 interception. So skip any attempt to translate WRITE_CR0 to
 CR0_SEL_WRITE for this instruction.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Reviewed-by: Joerg Roedel jroe...@suse.de
Acked-by: Joerg Roedel jroe...@suse.de


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nSVM: Fix IOIO bitmap evaluation

2014-07-01 Thread Joerg Roedel
On Mon, Jun 30, 2014 at 10:54:17AM +0200, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com
 
 First, kvm_read_guest returns 0 on success. And then we need to take the
 access size into account when testing the bitmap: intercept if any of
 bits corresponding to the access is set.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Reviewed-by: Joerg Roedel jroe...@suse.de
Acked-by: Joerg Roedel jroe...@suse.de

I have the slight hope that this fixes the issues with L2 Linux guests
on L1 Windows hypervisors. Have to check that at some point :)

 - if (kvm_read_guest(svm-vcpu.kvm, gpa, val, 1))
 - val = (1  bit);
 + if (kvm_read_guest(svm-vcpu.kvm, gpa, val, iopm_len))
 + return NESTED_EXIT_DONE;

Not related to that fix, but as a further improvement we should probably
do a #vmexit(invalid-vmcb) or something if we can't read the iopm.


Joerg


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nSVM: Set correct port for IOIO interception evaluation

2014-07-01 Thread Joerg Roedel
On Mon, Jun 30, 2014 at 12:52:55PM +0200, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com
 
 Obtaining the port number from DX is bogus as a) there are immediate
 port accesses and b) user space may have changed the register content
 while processing the PIO access. Forward the correct value from the
 instruction emulator instead.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Reviewed-by: Joerg Roedel jroe...@suse.de
Acked-by: Joerg Roedel jroe...@suse.de


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability IOMMU_CAP_INTR_REMAP

2014-06-16 Thread Joerg Roedel
On Sun, Jun 08, 2014 at 12:31:29PM +0200, Christoffer Dall wrote:
 On Thu, Jun 05, 2014 at 07:03:12PM +0200, Antonios Motakis wrote:
  With an ARM SMMU, interrupt remapping should always be safe from the
  SMMU's point of view, as it is properly handled by the GIC.
  
  Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
  ---
   drivers/iommu/arm-smmu.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
  
  diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
  index 15ab2af..ff29402 100644
  --- a/drivers/iommu/arm-smmu.c
  +++ b/drivers/iommu/arm-smmu.c
  @@ -1544,7 +1544,7 @@ static int arm_smmu_domain_has_cap(struct 
  iommu_domain *domain,
  if (smmu_domain-root_cfg.smmu-features  ARM_SMMU_FEAT_COHERENT_WALK)
  caps |= IOMMU_CAP_CACHE_COHERENCY;
   
  -   caps |= IOMMU_CAP_NOEXEC;
  +   caps |= IOMMU_CAP_NOEXEC | IOMMU_CAP_INTR_REMAP;
   
  return !!(cap  caps);
   }
  -- 
  1.8.3.2
  
 What does IOMMU_CAP_INTR_REMAP signify exactly?  Is there docs/examples
 somewhere I can look at?  (A quick scan of the Linux souce code doesn't
 reveal much, and I'm not sure if this is purely MSI related or what...)

The flag was introduced for x86 IOMMUs to detect whether an IOMMU in the
system has and enables interrupt remapping to allow safe device
assignment to KVM guests. Without interrupt remapping a malicious guest
could attack the host with MSIs from the attached device.

How are PCI MSIs implemented on ARM with SMMU and GIC enabled. MSIs are
only memory dma transactions in the end, is it guaranteed on ARM that a
device only sends MSI transactions it is allowed to?


Joerg


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability IOMMU_CAP_INTR_REMAP

2014-06-16 Thread Joerg Roedel
On Mon, Jun 16, 2014 at 04:13:29PM +0100, Will Deacon wrote:
 MSIs look just like memory accesses made by the device, so the SMMU
 will translate them to point at the GIC ITS (doorbell). The ITS then
 has tables to work out how to route the MSI.
 
 So, if IOMMU_CAP_INTR_REMAP is simply supposed to indicate that the
 SMMU can translate MSIs to point somewhere else, then the ARM SMMU can
 always do it.  If it's supposed to indicate that the actual MSI
 payload can be filtered/routed, then that requires the GIC ITS.
 
 The part I'm unsure of is how VFIO knows where to map the MSIs to.
 That requires knowledge of the physical and virtual doorbell pages --
 is that discoverable in the API?

VFIO does not care about the actual routing, it only cares that the
device can not send interrupts it is not allowed to send (e.g.
interrupts to vectors used by other devices or, on x86, exception vectors).
If that is guaranteed by the SMMU or the GIC ITS hardware and driver
then it is fine to set this flag.


Joerg


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability IOMMU_CAP_INTR_REMAP

2014-06-16 Thread Joerg Roedel
On Mon, Jun 16, 2014 at 04:25:26PM +0100, Will Deacon wrote:
 Ok, thanks. In which case, I think this is really a combined property of
 the SMMU and the interrupt controller, so we might need some extra code
 so that the SMMU can check that the interrupt controller for the device
 is also capable of interrupt remapping.

Right, that this is part of IOMMU code has more or less historic reasons
on x86. Interrupt remapping is purely implemented in the IOMMU there, so
on ARM some clue-code between interrupt controler and smmu is needed.


Joerg


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Another preempt folding issue?

2014-02-12 Thread Joerg Roedel
On Wed, Feb 12, 2014 at 11:40:17AM +0100, Borislav Petkov wrote:
 Also what I'm wondering about and what's not clear from Stefan's reply
 is whether this is purely a 32-bit issue, i.e. a 32-bit host running a
 64-bit qemu running a 32-bit iso or what is it?
 
 Or do we have reports for both 32-bit and 64-bit hosts?

As I  understand it, the problem reproduces only on a 32bit host, which
also limits the guests to 32bit, at least when ran with KVM. This is
true even if a x86-64 qemu is used. This is also independent on whether
the host-cpu supports 64bit or not.


Joerg


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Vga passthrough to KVM Guest issues

2013-11-01 Thread Joerg Roedel
Hey Max,

On Mon, Oct 28, 2013 at 10:17:19AM -0600, Alex Williamson wrote:
 On Wed, 2013-10-16 at 21:08 +0200, Max Schettler wrote:
  Hi guys,
  
  Im trying to set up vga passthrough. I use the latest mainline kernel
  (3.12rc5) and patched qemu (1.6.50). When i try to start a VM using this
  command:
  
  qemu-system-x86_64 -enable-kvm -M q35 -m 1024 -cpu qemu64
  -bios /usr/share/qemu/bios.bin -vga none 
  -device
  ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1
  -device vfio-pci,host=06:00.0,bus=root.1,multifunction=on,x-vga=on
  -device vfio-pci,host=06:00.1,bus=root.1
  
  The screen attached to the passthroughed GPU turns on but does not show
  any output.
  Also I get some warnings from amd_iommu and my kernel messages are full
  of AMD-Vi messages:
  AMD-Vi: Completion-Wait loop timed out and
  AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0
  address=0x000438544b90
  
  If i dont stop the qemu process this eventually forces me to reboot
  since the host gets unusable.
  
  I uploaded the whole dmesg output, if its of any help here:
  http://pastebin.com/ki13dqEd
  
  My hardware is an AMD FX-8350 with an Gigabyte 970A-UD3 and a Gigabyte
  7870OC.
  
  Any help is much appreciated, thanks in advance!

That's interesting, what GPU is it that you are trying to pass through
to the guest? It seems to be an ATS capable device. Either there is a
problem in the AMD IOMMU driver with handling ATS or there is a problem
with the hardware and ATS should not be enabled there. Either way, the
fix/workaround has to be in the AMD IOMMU driver.


Joerg


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Vga passthrough to KVM Guest issues

2013-11-01 Thread Joerg Roedel
On Fri, Nov 01, 2013 at 05:35:38PM +0100, Max Schettler wrote:
 I fixed this issue in the meanwhile by switching to another Mainboard,
 so I guess it was at least somehow associated with the Hardware.
 
 The first Setup was this one:
 AMD FX-8350
 Gigabyte 970A-UD3
 Gigabyte 7870GHz Edition
 Sapphire 6450
 
 I switched to an Asrock 990FX Extreme3.

Hmm, in this case it might also have been a BIOS issue. The Gigabyte
BIOS might have configured the IOMMU in a wrong way causing those
errors. Have you tried with the latest BIOS on the Gigabyte Mainboard?


Joerg


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] intel-iommu: Fix leaks in pagetable freeing

2013-08-14 Thread Joerg Roedel
On Sat, Jun 15, 2013 at 10:27:19AM -0600, Alex Williamson wrote:
 At best the current code only seems to free the leaf pagetables and
 the root.  If you're unlucky enough to have a large gap (like any
 QEMU guest with more than 3G of memory), only the first chunk of leaf
 pagetables are freed (plus the root).  This is a massive memory leak.
 This patch re-writes the pagetable freeing function to use a
 recursive algorithm and manages to not only free all the pagetables,
 but does it without any apparent performance loss versus the current
 broken version.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 Cc: sta...@vger.kernel.org

Applied to iommu/fixes, thanks Alex. Will send this for v3.11 after a
couple of days in next.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: I/O port permission bit inheritance between threads

2013-05-21 Thread Joerg Roedel
Hey Stephen,

On Mon, May 20, 2013 at 02:24:31PM -0700, Stephen Hemminger wrote:
 ioperm() inheritance across threads is different in KVM then when run
 on physical hardware.  The following program runs on physical hardware
 but get SEGV under KVM.
 
 It appears that the I/O permission bits are not shared between threads
 in the same way.

Is this specific to SVM or do you see it on VMX too? My first guess
would be that the KVM instruction emulator does not check to
IO-permissions correctly, but that would affect VMX and SVM.


Joerg


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm:x86/svm: enable SVM lock if host supports it

2013-04-17 Thread Joerg Roedel
On Wed, Apr 17, 2013 at 11:03:33PM +0530, prasadjoshi.li...@gmail.com wrote:
 From: Prasad Joshi prasadjoshi.li...@gmail.com
 
 SVM lock features allows software from preventing update to EFER.SVME.
 Enable the SVM lock in guest if it is supported on host machine.
 
 Signed-off-by: Prasad Joshi prasadjoshi.li...@gmail.com
 ---
  arch/x86/kvm/svm.c |4 
  1 file changed, 4 insertions(+)
 
 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index e1b1ce2..fcdfdea 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -4031,6 +4031,10 @@ static void svm_set_supported_cpuid(u32 func, struct 
 kvm_cpuid_entry2 *entry)
   if (boot_cpu_has(X86_FEATURE_NRIPS))
   entry-edx |= SVM_FEATURE_NRIP;
  
 + /* support SVM Lock if host supports it */
 + if (boot_cpu_has(X86_FEATURE_SVML))
 + entry-edx |= SVM_FEATURE_SVML;
 +

No. SVM lock is way more complex to emulate than just setting the CPUID
bit.


Joerg


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: vfio API changes needed for powerpc (v3)

2013-04-11 Thread Joerg Roedel
On Tue, Apr 09, 2013 at 01:22:15AM +, Yoder Stuart-B08248 wrote:
  What happens if a normal unmap call is done on the MSI iova?  Do we
  need a separate unmap?
 
 I was thinking a normal unmap on an MSI windows would be an error...but
 I'm not set on that.   I put the msi unmap there to make things symmetric,
 a normal unmap would work as well...and then we could drop the msi unmap.

Hmm, this API semantic isn't very clean. When you explicitly map the MSI
banks a clean API would also allow to unmap them. But that is not
possible in your design because the kernel is responsible for mapping
MSIs and you can't unmap a MSI bank that is in use by the kernel.

So since the kernel owns the MSI setup anyways it should also take care
of mapping the MSI banks. What is the reason to not let the kernel
allocate the MSI banks top-down from the end of the DMA window space?
Just let userspace know (or even set if needed) in advance how many of
the windows it configures the kernel will take for mapping MSI banks and
you are fine, no?


Joerg


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nSVM/nVMX: Implement vmexit on INIT assertion

2013-02-27 Thread Joerg Roedel
On Sun, Feb 24, 2013 at 03:08:53PM +0100, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 @@ -2390,6 +2390,21 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
   return 0;
  }
  +static bool nested_svm_handle_init_received(struct kvm_vcpu *vcpu)
 +{
 + struct vcpu_svm *svm = to_svm(vcpu);
 +
 + if (!is_guest_mode(vcpu) ||
 + !(svm-nested.intercept  (1ULL  INTERCEPT_INIT)))
 + return false;
 +
 + svm-vmcb-control.exit_code = SVM_EXIT_INIT;
 + svm-vmcb-control.exit_int_info = 0;
 + nested_svm_vmexit(svm);
 +
 + return true;
 +}

[...]

 + if (vcpu-arch.mp_state == KVM_MP_STATE_INIT_RECEIVED 
 + kvm_x86_ops-handle_init_received(vcpu)) {
 + /* nested vmexit, L1 is runnable now */
 + vcpu-arch.mp_state = KVM_MP_STATE_RUNNABLE;
 + return 1;
 + }

Hmm, looks like the INIT signal is lost after the VMEXIT. But on SVM the
INIT signal is still pending an will be delivered when GIF becomes one
again. Do I miss anything?


Joerg


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nSVM/nVMX: Implement vmexit on INIT assertion

2013-02-27 Thread Joerg Roedel
On Mon, Feb 25, 2013 at 10:04:50AM +0100, Jan Kiszka wrote:
 Is the nested-related state already saved on AMD, Jörg? If not, adding
 this one would not make things worse at least. Still, missing user space
 save/restore already breaks reset, not only migration (dunno if this is
 better on AMD).

Not sure if this is what you are asking for, but nested state is at not
saved/restored for migration or anything. This is a long-standing issue
which needs to be fixed at some point.


Joerg


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nVMX: Fix direct injection of interrupts from L0 to L2

2013-02-19 Thread Joerg Roedel
On Tue, Feb 19, 2013 at 11:04:01AM +0100, Jan Kiszka wrote:
 I had a look at SVM to check how it deals with this, but I'm not sure
 if I understand the logic correctly. SVM does:
 
 static int nested_svm_vmexit(struct vcpu_svm *svm)
 {
   ...
   /*
* If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
* to make sure that we do not lose injected events. So check event_inj
* here and copy it to exit_int_info if it is valid.
* Exit_int_info and event_inj can't be both valid because the case
* below only happens on a VMRUN instruction intercept which has
* no valid exit_int_info set.
*/
   if (vmcb-control.event_inj  SVM_EVTINJ_VALID) {
   struct vmcb_control_area *nc = nested_vmcb-control;
 
   nc-exit_int_info = vmcb-control.event_inj;
   nc-exit_int_info_err = vmcb-control.event_inj_err;
   }
 
 nested_svm_vmexit is only called when we leave L2 toward L1, right?

Right.

 So, vmcb-control.event_inj might have been set on last VMRUN emulation, and
 if that one failed, this value shall become the nested exit_int_info. So
 far, so good.

Important fact here: This L2-L1 exit is emulated in the same real
#vmexit cycle as the VMRUN was emulated. So what happens is:

1. VMRUN intercept from L1
2. We emulate the VMRUN and load L2 state into VMCB
3. On the way back to guest mode (to actually run the L2) we
   detect a #vmexit condition
4. So we roll-back by calling nested_svm_vmexit()
5. We enter the guest again which continues execution right
   after its VMRUN instruction.

So we never actually entered L2, but for L1 it has to look like it was
in L2 and made no progress. But when coming out of a guest event_inj is
never valid, so without the special case above we make sure that the L1
hypervisor re-injects the event so it is not lost.


 But what if that injection succeeded and we are now exiting L2 past the
 execution of VMRUN, e.g. L1 intercepts the execution of some special
 instruction in L2? Doesn't the nested exit_int_info now gain a stale
 value? Or does the hardware clear the valid bit int EVENTINJ on
 successful injection? Didn't find an indication in the spec on first
 glance.

Hardware clears event_inj. If the injection was not successful the event
is reported in exit_int_info.

 Otherwise the logic seems to be like this:
  - EVENTINJ is set to the nested value on VMRUN emulation, and only
there (that's in contrast to current VMX, but it makes sense)
  - Interrupt completion with state transfer the VCPU event queues is
*only* performed on L2-to-L1 exits (that's like VMX is trying to do
it as well)
  - There is a special case around nested.exit_required that I didn't
fully get yet, nor can I say how it corresponds to logic in VMX.

Which special case do you mean? There are checks in
nested_svm_check_exception() and nested_svm_intr().


Regards,

Joerg


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] kvm, svm: Update MAINTAINERS entry

2012-10-29 Thread Joerg Roedel
I have no access to my AMD email address anymore. Update
entry in MAINTAINERS to the new address.

Cc: Avi Kivity a...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Joerg Roedel j...@8bytes.org
---
 MAINTAINERS |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0267ba2..d881321 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4234,10 +4234,10 @@ F:  include/linux/kvm*
 F: virt/kvm/
 
 KERNEL VIRTUAL MACHINE (KVM) FOR AMD-V
-M: Joerg Roedel joerg.roe...@amd.com
+M: Joerg Roedel j...@8bytes.org
 L: kvm@vger.kernel.org
 W: http://kvm.qumranet.com
-S: Supported
+S: Maintained
 F: arch/x86/include/asm/svm.h
 F: arch/x86/kvm/svm.c
 
-- 
1.7.9.5


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: AMD KVM Pci Passthrough reports device busy

2012-07-11 Thread Joerg Roedel
Hi Andreas,

On Wed, Jul 11, 2012 at 04:26:30PM +0200, Andreas Hartmann wrote:
 May I please ask, if you meanwhile could get any information about
 potential peer-to-peer communication between the functions of the
 following multifunction device:

Good news: I actually found the right person to ask and the answer is,
 that peer-to-peer communication between these devices is not
 possible. So it is safe to pass them through to guests
 individually.

Regards,

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: AMD KVM Pci Passthrough reports device busy

2012-06-25 Thread Joerg Roedel
On Mon, Jun 25, 2012 at 07:55:55AM +0200, Andreas Hartmann wrote:
 Hello Joerg,
 
 Joerg Roedel wrote:
  On Tue, Jun 05, 2012 at 08:27:05AM -0600, Alex Williamson wrote:
  Joerg, the question is whether the multifunction device above allows
  peer-to-peer between functions that could bypass the iommu.  If not, we
  can make it the first entry in device specific acs enabled function I
  proposed:
 
  https://lkml.org/lkml/2012/5/30/438
 
  and it would greatly simplify assigning PCI devices on these systems
  with VFIO.  Thanks,
  
  Hm, good question. I will ask around and let you know what I find out.
 
 May I please ask, if you meanwhile could get any information about
 potential peer-to-peer communication between the functions of the
 following multifunction device:
 
 00:14.0 SMBus: Advanced Micro Devices [AMD] nee ATI SBx00 SMBus Controller 
 (rev 42)
 00:14.1 IDE interface: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 
 IDE Controller (rev 40) (prog-if 8a [Master SecP PriP])
 00:14.2 Audio device: Advanced Micro Devices [AMD] nee ATI SBx00 Azalia 
 (Intel HDA) (rev 40)
 00:14.3 ISA bridge: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 
 LPC host controller (rev 40)
 00:14.4 PCI bridge: Advanced Micro Devices [AMD] nee ATI SBx00 PCI to PCI 
 Bridge (rev 40) (prog-if 01 [Subtractive decode])
 00:14.5 USB controller: Advanced Micro Devices [AMD] nee ATI 
 SB7x0/SB8x0/SB9x0 USB OHCI2 Controller (prog-if 10 [OHCI])
 
 00:14.0 0c05: 1002:4385 (rev 42)
 00:14.1 0101: 1002:439c (rev 40)
 00:14.2 0403: 1002:4383 (rev 40)
 00:14.3 0601: 1002:439d (rev 40)
 00:14.4 0604: 1002:4384 (rev 40)
 00:14.5 0c03: 1002:4399

I asked the 'supposed-to-be-resonsible-person' but didn't get a resonse
so far. I will ping them again.


Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: AMD KVM Pci Passthrough reports device busy

2012-06-06 Thread Joerg Roedel
On Tue, Jun 05, 2012 at 08:27:05AM -0600, Alex Williamson wrote:
 Joerg, the question is whether the multifunction device above allows
 peer-to-peer between functions that could bypass the iommu.  If not, we
 can make it the first entry in device specific acs enabled function I
 proposed:
 
 https://lkml.org/lkml/2012/5/30/438
 
 and it would greatly simplify assigning PCI devices on these systems
 with VFIO.  Thanks,

Hm, good question. I will ask around and let you know what I find out.

Regards,

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm-unittests: Fix apic unittest for latest qemu-kvm

2012-05-08 Thread Joerg Roedel
Since commit b334ec56 in qemu-kvm the use of irqlines  15
is not supported anymore. This causes the apic unittest to
fail since this commit. Since the commit-msg explicitly
states that potential users of this needs to be fixed, here
is the fix which uses irq lines 14 and 15 instead in the
unittest.

Cc: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 x86/apic.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/x86/apic.c b/x86/apic.c
index 2725b9a..50e77fc 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -139,8 +139,8 @@ static void ioapic_isr_77(isr_regs_t *regs)
 static void test_ioapic_intr(void)
 {
 handle_irq(0x77, ioapic_isr_77);
-set_ioapic_redir(0x10, 0x77);
-toggle_irq_line(0x10);
+set_ioapic_redir(0x0e, 0x77);
+toggle_irq_line(0x0e);
 asm volatile (nop);
 report(ioapic interrupt, g_isr_77 == 1);
 }
@@ -168,11 +168,11 @@ static void test_ioapic_simultaneous(void)
 {
 handle_irq(0x78, ioapic_isr_78);
 handle_irq(0x66, ioapic_isr_66);
-set_ioapic_redir(0x10, 0x78);
-set_ioapic_redir(0x11, 0x66);
+set_ioapic_redir(0x0e, 0x78);
+set_ioapic_redir(0x0f, 0x66);
 irq_disable();
-toggle_irq_line(0x11);
-toggle_irq_line(0x10);
+toggle_irq_line(0x0f);
+toggle_irq_line(0x0e);
 irq_enable();
 asm volatile (nop);
 report(ioapic simultaneous interrupt,
-- 
1.7.0.4


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: X86: Add mmx movq emulation

2012-05-07 Thread Joerg Roedel
On Sun, May 06, 2012 at 01:08:05PM +0300, Avi Kivity wrote:
 On 05/04/2012 02:47 PM, Joerg Roedel wrote:
  Add support to the MMX versions of the movq instructions to
  the instruction emulator. Also handle possible exceptions
  they may cause.
 
 This is already in (cbe2c9d30, e59717550e).  Are you looking at master
 instead of next?

Right, I was looking at master. Probably I should have re-read your mail
about the new git workflow.

 Since you've just thought of the issues involved, I'd appreciate a
 review of the commits above, both wrt correctness and
 maintainability.

The patches above look correct to me. In fact cbe2c9d30 is more general
than my implementation because it fetches all possible mmx operands.

My implementation on the other side should be a bit faster because it
looks for FP exceptions directly when the registers are accessed which
saves one get_fpu/put_fpu cycle (and an fwait instruction).

 In fact I already see one difference - my patches do reg = 7, while
 your patches generate #UD for %mm8-%mm15.

Your version is correct. Documentation says that REX-prefixes are
ignored where not supported or misplaced. I also tried that directly on
hardware and it works as documented and implemented in KVM.


Regards,

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: X86: Remove stale values from ctxt-memop before emulation

2012-05-07 Thread Joerg Roedel
On Sun, May 06, 2012 at 11:21:52AM +0300, Avi Kivity wrote:
  diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
  index d4bf50c..1b516ec 100644
  --- a/arch/x86/kvm/emulate.c
  +++ b/arch/x86/kvm/emulate.c
  @@ -3937,6 +3937,7 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, 
  void *insn, int insn_len)
  struct opcode opcode;
   
  ctxt-memop.type = OP_NONE;
  +   ctxt-memop.val  = 0;
  ctxt-memopp = NULL;
  ctxt-_eip = ctxt-eip;
  ctxt-fetch.start = ctxt-_eip;
 
 This only works for long sized values - it doesn't initialize val64 on
 i386, for example.  So I think it's better to change bsr (and family) to
 use emualte_2op_SrcV_nobyte() instead (which has the added benefit of
 using the same values as the processor for the undefined bits).

Right, thats a better solution. How about the attached patch? The zf
check shouldn't be necessary anymore because the generated assembly uses
dst.val as input and output so writeback shouldn't do anything wrong.
The bsr and bsf unittests all pass again with this patch.

Joerg

From e9262f18e90111d32b584084c0b5564cbd728d65 Mon Sep 17 00:00:00 2001
From: Joerg Roedel joerg.roe...@amd.com
Date: Mon, 7 May 2012 12:05:28 +0200
Subject: [PATCH] KVM: X86: convert bsf/bsr instructions to
 emulate_2op_SrcV_nobyte()

The instruction emulation for bsrw is broken in KVM because
the code always uses bsr with 32 or 64 bit operand size for
emulation. Fix that by using emulate_2op_SrcV_nobyte() macro
to use guest operand size for emulation.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/kvm/emulate.c |   26 ++
 1 file changed, 2 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0d151e2..a6f8488 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3134,35 +3134,13 @@ static int em_btc(struct x86_emulate_ctxt *ctxt)
 
 static int em_bsf(struct x86_emulate_ctxt *ctxt)
 {
-   u8 zf;
-
-   __asm__ (bsf %2, %0; setz %1
-: =r(ctxt-dst.val), =q(zf)
-: r(ctxt-src.val));
-
-   ctxt-eflags = ~X86_EFLAGS_ZF;
-   if (zf) {
-   ctxt-eflags |= X86_EFLAGS_ZF;
-   /* Disable writeback. */
-   ctxt-dst.type = OP_NONE;
-   }
+   emulate_2op_SrcV_nobyte(ctxt, bsf);
return X86EMUL_CONTINUE;
 }
 
 static int em_bsr(struct x86_emulate_ctxt *ctxt)
 {
-   u8 zf;
-
-   __asm__ (bsr %2, %0; setz %1
-: =r(ctxt-dst.val), =q(zf)
-: r(ctxt-src.val));
-
-   ctxt-eflags = ~X86_EFLAGS_ZF;
-   if (zf) {
-   ctxt-eflags |= X86_EFLAGS_ZF;
-   /* Disable writeback. */
-   ctxt-dst.type = OP_NONE;
-   }
+   emulate_2op_SrcV_nobyte(ctxt, bsr);
return X86EMUL_CONTINUE;
 }
 
-- 
1.7.9.5


-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: X86: Add mmx movq emulation

2012-05-07 Thread Joerg Roedel
On Sun, May 06, 2012 at 01:08:05PM +0300, Avi Kivity wrote:
 This is already in (cbe2c9d30, e59717550e).  Are you looking at master
 instead of next?

Btw. your mail about the new git-workflow states something about an
auto-next branch. But I don't see that branch in the KVM tree (looking
at git://git.kernel.org/pub/scm/virt/kvm/kvm.git). Is there another
branch that contains all fixes and everything for the next merge window?
Basically I am looking for a branch which has the new master and next
merged.

Thanks,

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: X86: Remove stale values from ctxt-memop before emulation

2012-05-07 Thread Joerg Roedel
On Mon, May 07, 2012 at 01:18:01PM +0300, Avi Kivity wrote:
 On 05/07/2012 01:12 PM, Joerg Roedel wrote:
 
  The instruction emulation for bsrw is broken in KVM because
  the code always uses bsr with 32 or 64 bit operand size for
  emulation. Fix that by using emulate_2op_SrcV_nobyte() macro
  to use guest operand size for emulation.
 
 
 It looks fine.  Do you know what triggered this regression?  (for
 figuring out if it's 3.4 material)

Looks like it is 3.4 (and -stable) material. I tested a few older
kernels and the test passes on 3.0 but fails on 3.2 an later kernels
(I have not tested 3.1).


Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: X86: Add mmx movq emulation

2012-05-07 Thread Joerg Roedel
On Mon, May 07, 2012 at 01:30:48PM +0300, Avi Kivity wrote:
 On 05/07/2012 01:28 PM, Joerg Roedel wrote:
  On Sun, May 06, 2012 at 01:08:05PM +0300, Avi Kivity wrote:
   This is already in (cbe2c9d30, e59717550e).  Are you looking at master
   instead of next?
 
  Btw. your mail about the new git-workflow states something about an
  auto-next branch. But I don't see that branch in the KVM tree (looking
  at git://git.kernel.org/pub/scm/virt/kvm/kvm.git). 
 
 We forgot to generate it.
 
  Is there another
  branch that contains all fixes and everything for the next merge window?
  Basically I am looking for a branch which has the new master and next
  merged.
 
 Right.  I'll get my scripts to generate it.  (btw:  auto-next =
 merge(upstream, master, next)).

Cool thanks. That is perfect for our internal testing :)


Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: X86: Add mmx movq emulation

2012-05-04 Thread Joerg Roedel
Add support to the MMX versions of the movq instructions to
the instruction emulator. Also handle possible exceptions
they may cause.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/include/asm/kvm_emulate.h |2 +-
 arch/x86/kvm/emulate.c |  155 +++-
 2 files changed, 154 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index c222e1a..e4833f8 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -200,7 +200,7 @@ typedef u32 __attribute__((vector_size(16))) sse128_t;
 
 /* Type, address-of, and value of an instruction's operand. */
 struct operand {
-   enum { OP_REG, OP_MEM, OP_IMM, OP_XMM, OP_NONE } type;
+   enum { OP_REG, OP_MEM, OP_IMM, OP_XMM, OP_MMX, OP_NONE } type;
unsigned int bytes;
union {
unsigned long orig_val;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8375622..d4bf50c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -142,6 +142,7 @@
 #define Src2FS  (OpFS  Src2Shift)
 #define Src2GS  (OpGS  Src2Shift)
 #define Src2Mask(OpMask  Src2Shift)
+#define Mmx (1ULL35)
 
 #define X2(x...) x, x
 #define X3(x...) X2(x), x
@@ -537,6 +538,11 @@ static int emulate_nm(struct x86_emulate_ctxt *ctxt)
return emulate_exception(ctxt, NM_VECTOR, 0, false);
 }
 
+static int emulate_mf(struct x86_emulate_ctxt *ctxt)
+{
+   return emulate_exception(ctxt, MF_VECTOR, 0, false);
+}
+
 static u16 get_segment_selector(struct x86_emulate_ctxt *ctxt, unsigned seg)
 {
u16 selector;
@@ -859,6 +865,110 @@ static void write_sse_reg(struct x86_emulate_ctxt *ctxt, 
sse128_t *data,
ctxt-ops-put_fpu(ctxt);
 }
 
+#define __READ_MMX_SAFE(mmxreg)\
+   asm volatile(2: movq %% mmxreg , %[d]\n\t   \
+xor %[err], %[err]\n\t   \
+1:\n\t   \
+.section .fixup,\ax\\n\t   \
+3: mov %[fault], %[err]; jmp 1b\n\t  \
+.previous\n\t\
+_ASM_EXTABLE(2b, 3b)   \
+: [err] =r (err), [d] =m(*data)\
+: [fault] i (X86EMUL_PROPAGATE_FAULT));
+
+
+#define __WRITE_MMX_SAFE(mmxreg)   \
+   asm volatile(2: movq %[d], %% mmxreg \n\t   \
+ xor %[err], %[err]\n\t  \
+1:\n\t   \
+.section .fixup,\ax\\n\t   \
+3: mov %[fault], %[err]; jmp 1b\n\t  \
+.previous\n\t\
+_ASM_EXTABLE(2b, 3b)   \
+: [err] =r (err) \
+: [d] m(*data),  \
+  [fault] i (X86EMUL_PROPAGATE_FAULT));
+
+static int read_mmx_reg(struct x86_emulate_ctxt *ctxt, sse128_t *data, int reg)
+{
+   int err = X86EMUL_CONTINUE;
+
+   ctxt-ops-get_fpu(ctxt);
+   switch (reg) {
+   case 0:
+   __READ_MMX_SAFE(mm0);
+   break;
+   case 1:
+   __READ_MMX_SAFE(mm1);
+   break;
+   case 2:
+   __READ_MMX_SAFE(mm2);
+   break;
+   case 3:
+   __READ_MMX_SAFE(mm3);
+   break;
+   case 4:
+   __READ_MMX_SAFE(mm4);
+   break;
+   case 5:
+   __READ_MMX_SAFE(mm5);
+   break;
+   case 6:
+   __READ_MMX_SAFE(mm6);
+   break;
+   case 7:
+   __READ_MMX_SAFE(mm7);
+   break;
+   default:
+   BUG();
+   }
+   ctxt-ops-put_fpu(ctxt);
+
+   return err;
+}
+
+static int write_mmx_reg(struct x86_emulate_ctxt *ctxt, sse128_t *data,
+int reg)
+{
+   int err = X86EMUL_CONTINUE;
+
+   ctxt-ops-get_fpu(ctxt);
+   switch (reg) {
+   case 0:
+   __WRITE_MMX_SAFE(mm0);
+   break;
+   case 1:
+   __WRITE_MMX_SAFE(mm1);
+   break;
+   case 2:
+   __WRITE_MMX_SAFE(mm2);
+   break;
+   case 3:
+   __WRITE_MMX_SAFE(mm3);
+   break;
+   case 4:
+   __WRITE_MMX_SAFE(mm4);
+   break;
+   case 5:
+   __WRITE_MMX_SAFE(mm5);
+   break;
+   case 6:
+   __WRITE_MMX_SAFE(mm6);
+   break;
+   case 7:
+   __WRITE_MMX_SAFE(mm7);
+   break;
+   default:
+   BUG();
+   }
+   ctxt-ops-put_fpu(ctxt);
+
+   return err

  1   2   3   4   5   6   7   8   9   10   >