Re: [PATCH v5 2/3] x86/asm: add _ASM_ARG* constants for argument registers to

2018-06-21 Thread Ingo Molnar


* Nick Desaulniers  wrote:

> On Thu, Jun 14, 2018 at 5:17 PM H. Peter Anvin  wrote:
> >
> > On 06/14/18 13:59, Nick Desaulniers wrote:
> > > On Thu, Jun 14, 2018 at 1:48 PM H. Peter Anvin  wrote:
> > >>
> > >> On 06/13/18 14:05, Nick Desaulniers wrote:
> > >>> From: "H. Peter Anvin" 
> > >>>
> > >>> i386 and x86-64 uses different registers for arguments; make them
> > >>> available so we don't have to #ifdef in the actual code.
> > >>>
> > >>> Native size and specified size (q, l, w, b) versions are provided.
> > >>>
> > >>> Suggested-by: Sedat Dilek 
> > >>> Signed-off-by: H. Peter Anvin 
> > >>> Signed-off-by: Nick Desaulniers 
> > >>
> > >> I still object to Suggested-by: here.  Sedat did a correction, which is
> > >> a Reviewed-by:, but unless I'm completely out to sea, there was no
> > >> suggestion on Sedat's part of this; and I had certainly not seen it when
> > >> I wrote the code.
> > >
> > > I'm fine with changing it from a Suggested-by to a Reviewed-by.  Can
> > > you do that when you apply the set, or should I send a v6?
> > >
> >
> > I'm not handling patch mechanics for x86 at the moment.
> 
> Hi Ingo and Thomas,
> Have you had a chance to review this patch series for application?
> 
> Note that Juergen (the paravirt maintainer) acked the series:
> https://bugs.llvm.org/show_bug.cgi?id=37880
> The only issue being swapping a Suggested-by for a Reviewed-by tag for
> one commit, as above.

Could you please update the series with all suggestions and tags and send a new 
series?

Thanks,

Ingo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-21 Thread Cornelia Huck
On Wed, 20 Jun 2018 22:48:58 +0300
"Michael S. Tsirkin"  wrote:

> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote:
> > In any case, I'm not sure anymore why we'd want the extra uuid.  
> 
> It's mostly so we can have e.g. multiple devices with same MAC
> (which some people seem to want in order to then use
> then with different containers).
> 
> But it is also handy for when you assign a PF, since then you
> can't set the MAC.
> 

OK, so what about the following:

- introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
  that we have a new uuid field in the virtio-net config space
- in QEMU, add a property for virtio-net that allows to specify a uuid,
  offer VIRTIO_NET_F_STANDBY_UUID if set
- when configuring, set the property to the group UUID of the vfio-pci
  device
- in the guest, use the uuid from the virtio-net device's config space
  if applicable; else, fall back to matching by MAC as done today

That should work for all virtio transports.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-21 Thread Michael S. Tsirkin
On Wed, Jun 13, 2018 at 01:40:59PM +0800, Jason Wang wrote:
> 
> 
> On 2018年06月13日 12:24, Samudrala, Sridhar wrote:
> > On 6/12/2018 7:38 PM, Jason Wang wrote:
> > > 
> > > 
> > > On 2018年06月12日 19:54, Michael S. Tsirkin wrote:
> > > > On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote:
> > > > > 
> > > > > On 2018年06月05日 20:33, Michael S. Tsirkin wrote:
> > > > > > I don't think this is sufficient.
> > > > > > 
> > > > > > If both primary and standby devices are present, a
> > > > > > legacy guest without
> > > > > > support for the feature might see two devices with same mac and get
> > > > > > confused.
> > > > > > 
> > > > > > I think that we should only make primary visible after
> > > > > > guest acked the
> > > > > > backup feature bit.
> > > > > I think we want exactly the reverse? E.g fail the
> > > > > negotiation when guest
> > > > > does not ack backup feature.
> > > > > 
> > > > > Otherwise legacy guest won't even have the chance to see
> > > > > primary device in
> > > > > the guest.
> > > > That's by design.
> > > 
> > > So management needs to know the capability of guest to set the
> > > backup feature. This looks a chicken or egg problem to me.
> > 
> > I don't think so. If the tenant requests 'accelerated datapath feature',
> > the management
> > will set 'standby' feature bit on virtio-net interface and if the guest
> > virtio-net driver
> > supports this feature, then the tenant VM will get that capability via a
> > hot-plugged
> > primary device.
> 
> Ok, I thought exactly the reverse because of the commit title is "enable
> virtio_net to act as a standby for a passthru device". But re-read the
> commit log content, I understand the case a little bit. Btw, VF is not
> necessarily faster than virtio-net, especially consider virtio-net may have
> a lot of queues.

Don't do that then, right?

> > 
> > 
> > > 
> > > > 
> > > > > > And on reset or when backup is cleared in some other way, unplug the
> > > > > > primary.
> > > > > What if guest does not support hotplug?
> > > > It shouldn't ack the backup feature then and it's a good point.
> > > > We should both document this and check kernel config has
> > > > hotplug support. Sridhar could you take a look pls?
> > > > 
> > > > > > Something like the below will do the job:
> > > > > > 
> > > > > > Primary device is added with a special "primary-failover" flag.
> > > > > > A virtual machine is then initialized with just a standby virtio
> > > > > > device. Primary is not yet added.
> > > > > A question is how to do the matching? Qemu knows nothing about e.g mac
> > > > > address of a pass-through device I believe?
> > > > Supply a flag to the VFIO when it's added, this way QEMU will know.
> > > > 
> > > > > > Later QEMU detects that guest driver device set DRIVER_OK.
> > > > > > It then exposes the primary device to the guest, and triggers
> > > > > > a device addition event (hot-plug event) for it.
> > > > > Do you mean we won't have primary device in the initial qemu cli?
> > > > No, that's not what I mean.
> > > > 
> > > > I mean management will supply a flag to VFIO and then
> > > > 
> > > > 
> > > > - VFIO defers exposing
> > > > primary to guest until guest acks the feature bit.
> > > > - When we see guest ack, initiate hotplug.
> > > > - On reboot, hide it again.
> > > > - On reset without reboot, request hot-unplug and on eject hide
> > > > it again.
> > > 
> > > This sounds much like a kind of bonding in qemu.
> > > 
> > > > 
> > > > > > If QEMU detects guest driver removal, it initiates a
> > > > > > hot-unplug sequence
> > > > > > to remove the primary driver.  In particular, if QEMU detects guest
> > > > > > re-initialization (e.g. by detecting guest reset) it
> > > > > > immediately removes
> > > > > > the primary device.
> > > > > I believe guest failover module should handle this gracefully?
> > > > It can't control enumeration order, if primary is enumerated before
> > > > standby then guest will load its driver and it's too late
> > > > when failover driver is loaded.
> > > 
> > > Well, even if we can delay the visibility of primary until
> > > DRIVER_OK, there still be a race I think? And it looks to me it's
> > > still a bug of guest:
> > > 
> > > E.g primary could be probed before failover_register() in guest.
> > > Then we will miss the enslaving of primary forever.
> > 
> > That is not an issue. Even if the primary is probed before failover
> > driver, it will
> > enslave the primary via the call to failover_existing_slave_register()
> > as part of
> > failover_register() routine.
> 
> Aha I get it. So the enumeration order is not an issue.
> 
> Consider primary may still be seen by guest kernel even if we delay its
> visibility, I wonder whether we can control the lifecycle of primary through
> driver but not qemu. This can simplify a lot of things.
> 
> Thanks
> 
> > 
> > > 
> > > Thanks
> > > 
> > > > 
> > > > > Thanks
> > > > > 
> > > > > > We can move some of this code to management as well,
> > > > > > ar

Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-21 Thread Michael S. Tsirkin
On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote:
> On Wed, 20 Jun 2018 22:48:58 +0300
> "Michael S. Tsirkin"  wrote:
> 
> > On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote:
> > > In any case, I'm not sure anymore why we'd want the extra uuid.  
> > 
> > It's mostly so we can have e.g. multiple devices with same MAC
> > (which some people seem to want in order to then use
> > then with different containers).
> > 
> > But it is also handy for when you assign a PF, since then you
> > can't set the MAC.
> > 
> 
> OK, so what about the following:
> 
> - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
>   that we have a new uuid field in the virtio-net config space
> - in QEMU, add a property for virtio-net that allows to specify a uuid,
>   offer VIRTIO_NET_F_STANDBY_UUID if set
> - when configuring, set the property to the group UUID of the vfio-pci
>   device
> - in the guest, use the uuid from the virtio-net device's config space
>   if applicable; else, fall back to matching by MAC as done today
> 
> That should work for all virtio transports.

True. I'm a bit unhappy that it's virtio net specific though
since down the road I expect we'll have a very similar feature
for scsi (and maybe others).

But we do not have a way to have fields that are portable
both across devices and transports, and I think it would
be a useful addition. How would this work though? Any idea?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2 0/5] Add virtio-iommu driver

2018-06-21 Thread Jean-Philippe Brucker
Implement the base virtio-iommu driver, following version 0.7 of the
specification [1].

Changes since last version [2]:
* Address comments, thanks again for the review.
* As suggested, add a DT binding description in patch 1.
* Depend on VIRTIO_MMIO=y to fix a build failure¹
* Switch to v0.7 of the spec, which changes resv_mem parameters and adds
  an MMIO flag. These are trivial but not backward compatible. Once
  device or driver is upstream, updates to the spec will rely on feature
  bits to stay compatible with this code.
* Implement the new tlb_sync interface, by splitting add_req() and
  sync_req(). I noticed a small improvement on netperf stream because
  the synchronous iommu_unmap() also benefits from this. Other
  experiments, such as using kmem_cache for requests instead of kmalloc,
  didn't show any improvement.

Driver is available on branch virtio-iommu/v0.7 [3], and the x86+ACPI
prototype is on branch virtio-iommu/devel. That x86 code hasn't changed,
it still requires the DMA ifdeffery and I lack the expertise to tidy it
up. The kvmtool example device has been cleaned up and is available on
branch virtio-iommu/v0.7 [4].

Feedback welcome!

Thanks,
Jean

[1] virtio-iommu specification v0.7
https://www.spinics.net/lists/linux-virtualization/msg34127.html
[2] virtio-iommu driver v1
https://www.spinics.net/lists/kvm/msg164322.html
[3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.7

http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/virtio-iommu/v0.7
[4] git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.7 

  ---
¹ A word on the module story. Because of complex dependencies IOMMU
drivers cannot yet be .ko modules. Enabling it is outside the scope of
this series but I have a small prototype on branch virtio-iommu/
module-devel. It seems desirable since some distros currently ship the
transport code as module and are unlikely to change this on our account.
Note that this series works fine with arm64 defconfig, which selects
VIRTIO_MMIO=y.

I could use some help to clean this up. Currently my solution is to split
virtio-iommu into a module and a 3-lines built-in stub, which isn't
graceful but could have been worse.

Keeping the whole virtio-iommu driver as builtin would require accessing
any virtio utility through get_symbol. So far we only have seven of
those and could keep a list of pointer ops, but I find this solution
terrible. If virtio or virtio_mmio is a module, virtio-iommu also needs
to be one.

The minimal set of changes to make a module out of an OF-based IOMMU
driver seems to be:
* Export IOMMU symbols used by drivers
* Don't give up deferring probe in of_iommu
* Move IOMMU OF tables to .rodata
* Create a static virtio-iommu stub that declares the virtio-iommu OF
  table entry. The build system doesn't pick up IOMMU_OF_DECLARE when
  done from an object destined to be built as module :(
* Create a device_link between endpoint and IOMMU, to ensure that
  removing the IOMMU driver also unbinds the endpoint driver. Putting
  this in IOMMU core seems like a better approach since even when not
  built as module, unbinding an IOMMU device via sysfs will cause
  crashes.

With this, as long as virtio-mmio isn't loaded, the OF code defers probe
of endpoints that depend on virtio-iommu. Once virtio-mmio is loaded,
the probe is still deferred until virtio-iommu registers itself to the
virtio bus. Once virtio-iommu is loaded, probe of endpoints managed by
the IOMMU follows.

I'll investigate ACPI IORT when I find some time, though I don't expect
much complication and suggest we tackle one problem at a time. Since it
is a luxury that requires changes to the IOMMU core, module support is
left as a future improvement.
  ---

Jean-Philippe Brucker (5):
  dt-bindings: virtio: Specify #iommu-cells value for a virtio-iommu
  iommu: Add virtio-iommu driver
  iommu/virtio: Add probe request
  iommu/virtio: Add event queue
  vfio: Allow type-1 IOMMU instantiation for ARM

 .../devicetree/bindings/virtio/mmio.txt   |8 +
 MAINTAINERS   |6 +
 drivers/iommu/Kconfig |   11 +
 drivers/iommu/Makefile|1 +
 drivers/iommu/virtio-iommu.c  | 1164 +
 drivers/vfio/Kconfig  |2 +-
 include/uapi/linux/virtio_ids.h   |1 +
 include/uapi/linux/virtio_iommu.h |  172 +++
 8 files changed, 1364 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/virtio-iommu.c
 create mode 100644 include/uapi/linux/virtio_iommu.h

-- 
2.17.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH v2 1/5] dt-bindings: virtio: Specify #iommu-cells value for a virtio-iommu

2018-06-21 Thread Jean-Philippe Brucker
A virtio-mmio node may represent a virtio-iommu device. This is discovered
by the virtio driver at probe time, but the DMA topology isn't
discoverable and must be described by firmware. For DT the standard IOMMU
description is used, as specified in bindings/iommu/iommu.txt and
bindings/pci/pci-iommu.txt. Like many other IOMMUs, virtio-iommu
distinguishes masters by their endpoint IDs, which requires one IOMMU cell
in the "iommus" property.

Signed-off-by: Jean-Philippe Brucker 
---
 Documentation/devicetree/bindings/virtio/mmio.txt | 8 
 1 file changed, 8 insertions(+)

diff --git a/Documentation/devicetree/bindings/virtio/mmio.txt 
b/Documentation/devicetree/bindings/virtio/mmio.txt
index 5069c1b8e193..337da0e3a87f 100644
--- a/Documentation/devicetree/bindings/virtio/mmio.txt
+++ b/Documentation/devicetree/bindings/virtio/mmio.txt
@@ -8,6 +8,14 @@ Required properties:
 - reg: control registers base address and size including configuration 
space
 - interrupts:  interrupt generated by the device
 
+Required properties for virtio-iommu:
+
+- #iommu-cells:When the node describes a virtio-iommu device, it is
+   linked to DMA masters using the "iommus" property as
+   described in devicetree/bindings/iommu/iommu.txt. For
+   virtio-iommu #iommu-cells must be 1, each cell describing
+   a single endpoint ID.
+
 Example:
 
virtio_block@3000 {
-- 
2.17.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2 2/5] iommu: Add virtio-iommu driver

2018-06-21 Thread Jean-Philippe Brucker
The virtio IOMMU is a para-virtualized device, allowing to send IOMMU
requests such as map/unmap over virtio-mmio transport without emulating
page tables. This implementation handles ATTACH, DETACH, MAP and UNMAP
requests.

The bulk of the code transforms calls coming from the IOMMU API into
corresponding virtio requests. Mappings are kept in an interval tree
instead of page tables.

Signed-off-by: Jean-Philippe Brucker 
---
 MAINTAINERS   |   6 +
 drivers/iommu/Kconfig |  11 +
 drivers/iommu/Makefile|   1 +
 drivers/iommu/virtio-iommu.c  | 929 ++
 include/uapi/linux/virtio_ids.h   |   1 +
 include/uapi/linux/virtio_iommu.h | 117 
 6 files changed, 1065 insertions(+)
 create mode 100644 drivers/iommu/virtio-iommu.c
 create mode 100644 include/uapi/linux/virtio_iommu.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 55c08968aaab..bfb09dfa8e02 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15248,6 +15248,12 @@ S: Maintained
 F: drivers/virtio/virtio_input.c
 F: include/uapi/linux/virtio_input.h
 
+VIRTIO IOMMU DRIVER
+M: Jean-Philippe Brucker 
+S: Maintained
+F: drivers/iommu/virtio-iommu.c
+F: include/uapi/linux/virtio_iommu.h
+
 VIRTUAL BOX GUEST DEVICE DRIVER
 M: Hans de Goede 
 M: Arnd Bergmann 
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 958417f22020..820709383899 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -412,4 +412,15 @@ config QCOM_IOMMU
help
  Support for IOMMU on certain Qualcomm SoCs.
 
+config VIRTIO_IOMMU
+   bool "Virtio IOMMU driver"
+   depends on VIRTIO_MMIO=y
+   select IOMMU_API
+   select INTERVAL_TREE
+   select ARM_DMA_USE_IOMMU if ARM
+   help
+ Para-virtualised IOMMU driver with virtio.
+
+ Say Y here if you intend to run this kernel as a guest.
+
 endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 244ad7913a81..b559c6ae81ea 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -33,3 +33,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
 obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
 obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
+obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
new file mode 100644
index ..ea0242d8624b
--- /dev/null
+++ b/drivers/iommu/virtio-iommu.c
@@ -0,0 +1,929 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Virtio driver for the paravirtualized IOMMU
+ *
+ * Copyright (C) 2018 Arm Limited
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define MSI_IOVA_BASE  0x800
+#define MSI_IOVA_LENGTH0x10
+
+#define VIOMMU_REQUEST_VQ  0
+#define VIOMMU_NR_VQS  1
+
+#define VIOMMU_REQUEST_TIMEOUT 1 /* 10s */
+
+struct viommu_dev {
+   struct iommu_device iommu;
+   struct device   *dev;
+   struct virtio_device*vdev;
+
+   struct ida  domain_ids;
+
+   struct virtqueue*vqs[VIOMMU_NR_VQS];
+   spinlock_t  request_lock;
+   struct list_headrequests;
+
+   /* Device configuration */
+   struct iommu_domain_geometrygeometry;
+   u64 pgsize_bitmap;
+   u8  domain_bits;
+};
+
+struct viommu_mapping {
+   phys_addr_t paddr;
+   struct interval_tree_node   iova;
+   u32 flags;
+};
+
+struct viommu_domain {
+   struct iommu_domain domain;
+   struct viommu_dev   *viommu;
+   struct mutexmutex;
+   unsigned intid;
+
+   spinlock_t  mappings_lock;
+   struct rb_root_cached   mappings;
+
+   unsigned long   nr_endpoints;
+};
+
+struct viommu_endpoint {
+   struct viommu_dev   *viommu;
+   struct viommu_domain*vdomain;
+};
+
+struct viommu_request {
+   struct list_headlist;
+   void*writeback;
+   unsigned intwrite_offset;
+   unsigned intlen;
+   charbuf[];
+};
+
+#define to_viommu_domain(domain)   \
+   container_of(domain, struct viommu_domain, domain)
+
+static int viommu_get_req_errno(void *buf, size_t len)
+{
+   struct virtio_iommu_req_tail *tail = buf + len - sizeof(*tail);
+
+   switch (tail->status) {
+   case VIRTIO_IOMMU_S_OK:
+

[PATCH v2 3/5] iommu/virtio: Add probe request

2018-06-21 Thread Jean-Philippe Brucker
When the device offers the probe feature, send a probe request for each
device managed by the IOMMU. Extract RESV_MEM information. When we
encounter a MSI doorbell region, set it up as a IOMMU_RESV_MSI region.
This will tell other subsystems that there is no need to map the MSI
doorbell in the virtio-iommu, because MSIs bypass it.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c  | 149 --
 include/uapi/linux/virtio_iommu.h |  37 
 2 files changed, 180 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index ea0242d8624b..29ce9f4398ef 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -48,6 +48,7 @@ struct viommu_dev {
struct iommu_domain_geometrygeometry;
u64 pgsize_bitmap;
u8  domain_bits;
+   u32 probe_size;
 };
 
 struct viommu_mapping {
@@ -69,8 +70,10 @@ struct viommu_domain {
 };
 
 struct viommu_endpoint {
+   struct device   *dev;
struct viommu_dev   *viommu;
struct viommu_domain*vdomain;
+   struct list_headresv_regions;
 };
 
 struct viommu_request {
@@ -121,6 +124,9 @@ static off_t viommu_get_req_offset(struct viommu_dev 
*viommu,
 {
size_t tail_size = sizeof(struct virtio_iommu_req_tail);
 
+   if (req->type == VIRTIO_IOMMU_T_PROBE)
+   return len - viommu->probe_size - tail_size;
+
return len - tail_size;
 }
 
@@ -404,6 +410,103 @@ static int viommu_replay_mappings(struct viommu_domain 
*vdomain)
return ret;
 }
 
+static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
+  struct virtio_iommu_probe_resv_mem *mem,
+  size_t len)
+{
+   struct iommu_resv_region *region = NULL;
+   unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+
+   u64 start = le64_to_cpu(mem->start);
+   u64 end = le64_to_cpu(mem->end);
+   size_t size = end - start + 1;
+
+   if (len < sizeof(*mem))
+   return -EINVAL;
+
+   switch (mem->subtype) {
+   default:
+   if (mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
+   mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_MSI)
+   dev_warn(vdev->dev, "unknown resv mem subtype 0x%x\n",
+mem->subtype);
+   /* Fall-through */
+   case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
+   region = iommu_alloc_resv_region(start, size, 0,
+IOMMU_RESV_RESERVED);
+   break;
+   case VIRTIO_IOMMU_RESV_MEM_T_MSI:
+   region = iommu_alloc_resv_region(start, size, prot,
+IOMMU_RESV_MSI);
+   break;
+   }
+
+   list_add(&vdev->resv_regions, ®ion->list);
+   return 0;
+}
+
+static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev)
+{
+   int ret;
+   u16 type, len;
+   size_t cur = 0;
+   size_t probe_len;
+   struct virtio_iommu_req_probe *probe;
+   struct virtio_iommu_probe_property *prop;
+   struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+   struct viommu_endpoint *vdev = fwspec->iommu_priv;
+
+   if (!fwspec->num_ids)
+   return -EINVAL;
+
+   probe_len = sizeof(*probe) + viommu->probe_size +
+   sizeof(struct virtio_iommu_req_tail);
+   probe = kzalloc(probe_len, GFP_KERNEL);
+   if (!probe)
+   return -ENOMEM;
+
+   probe->head.type = VIRTIO_IOMMU_T_PROBE;
+   /*
+* For now, assume that properties of an endpoint that outputs multiple
+* IDs are consistent. Only probe the first one.
+*/
+   probe->endpoint = cpu_to_le32(fwspec->ids[0]);
+
+   ret = viommu_send_req_sync(viommu, probe, probe_len);
+   if (ret)
+   goto out_free;
+
+   prop = (void *)probe->properties;
+   type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK;
+
+   while (type != VIRTIO_IOMMU_PROBE_T_NONE &&
+  cur < viommu->probe_size) {
+   len = le16_to_cpu(prop->length);
+
+   switch (type) {
+   case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
+   ret = viommu_add_resv_mem(vdev, (void *)prop->value, 
len);
+   break;
+   default:
+   dev_dbg(dev, "unknown viommu prop 0x%x\n", type);
+   }
+
+   if (ret)
+   dev_err(dev, "failed to parse viommu prop 0x%x\n", 
type);
+
+   cur += sizeof(*prop) + len;
+   if (cur >= viommu->probe_size)
+   break;
+
+   prop = (void *)probe->properties + cur;
+   type =

[PATCH v2 4/5] iommu/virtio: Add event queue

2018-06-21 Thread Jean-Philippe Brucker
The event queue offers a way for the device to report access faults from
endpoints. It is implemented on virtqueue #1. Whenever the host needs to
signal a fault, it fills one of the buffers offered by the guest and
interrupts it.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c  | 116 +++---
 include/uapi/linux/virtio_iommu.h |  18 +
 2 files changed, 125 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 29ce9f4398ef..c075404e97d6 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -29,7 +29,8 @@
 #define MSI_IOVA_LENGTH0x10
 
 #define VIOMMU_REQUEST_VQ  0
-#define VIOMMU_NR_VQS  1
+#define VIOMMU_EVENT_VQ1
+#define VIOMMU_NR_VQS  2
 
 #define VIOMMU_REQUEST_TIMEOUT 1 /* 10s */
 
@@ -43,6 +44,7 @@ struct viommu_dev {
struct virtqueue*vqs[VIOMMU_NR_VQS];
spinlock_t  request_lock;
struct list_headrequests;
+   void*evts;
 
/* Device configuration */
struct iommu_domain_geometrygeometry;
@@ -84,6 +86,15 @@ struct viommu_request {
charbuf[];
 };
 
+#define VIOMMU_FAULT_RESV_MASK 0xff00
+
+struct viommu_event {
+   union {
+   u32 head;
+   struct virtio_iommu_fault fault;
+   };
+};
+
 #define to_viommu_domain(domain)   \
container_of(domain, struct viommu_domain, domain)
 
@@ -507,6 +518,69 @@ static int viommu_probe_endpoint(struct viommu_dev 
*viommu, struct device *dev)
return ret;
 }
 
+static int viommu_fault_handler(struct viommu_dev *viommu,
+   struct virtio_iommu_fault *fault)
+{
+   char *reason_str;
+
+   u8 reason   = fault->reason;
+   u32 flags   = le32_to_cpu(fault->flags);
+   u32 endpoint= le32_to_cpu(fault->endpoint);
+   u64 address = le64_to_cpu(fault->address);
+
+   switch (reason) {
+   case VIRTIO_IOMMU_FAULT_R_DOMAIN:
+   reason_str = "domain";
+   break;
+   case VIRTIO_IOMMU_FAULT_R_MAPPING:
+   reason_str = "page";
+   break;
+   case VIRTIO_IOMMU_FAULT_R_UNKNOWN:
+   default:
+   reason_str = "unknown";
+   break;
+   }
+
+   /* TODO: find EP by ID and report_iommu_fault */
+   if (flags & VIRTIO_IOMMU_FAULT_F_ADDRESS)
+   dev_err_ratelimited(viommu->dev, "%s fault from EP %u at %#llx 
[%s%s%s]\n",
+   reason_str, endpoint, address,
+   flags & VIRTIO_IOMMU_FAULT_F_READ ? "R" : 
"",
+   flags & VIRTIO_IOMMU_FAULT_F_WRITE ? "W" : 
"",
+   flags & VIRTIO_IOMMU_FAULT_F_EXEC ? "X" : 
"");
+   else
+   dev_err_ratelimited(viommu->dev, "%s fault from EP %u\n",
+   reason_str, endpoint);
+   return 0;
+}
+
+static void viommu_event_handler(struct virtqueue *vq)
+{
+   int ret;
+   unsigned int len;
+   struct scatterlist sg[1];
+   struct viommu_event *evt;
+   struct viommu_dev *viommu = vq->vdev->priv;
+
+   while ((evt = virtqueue_get_buf(vq, &len)) != NULL) {
+   if (len > sizeof(*evt)) {
+   dev_err(viommu->dev,
+   "invalid event buffer (len %u != %zu)\n",
+   len, sizeof(*evt));
+   } else if (!(evt->head & VIOMMU_FAULT_RESV_MASK)) {
+   viommu_fault_handler(viommu, &evt->fault);
+   }
+
+   sg_init_one(sg, evt, sizeof(*evt));
+   ret = virtqueue_add_inbuf(vq, sg, 1, evt, GFP_ATOMIC);
+   if (ret)
+   dev_err(viommu->dev, "could not add event buffer\n");
+   }
+
+   if (!virtqueue_kick(vq))
+   dev_err(viommu->dev, "kick failed\n");
+}
+
 /* IOMMU API */
 
 static struct iommu_domain *viommu_domain_alloc(unsigned type)
@@ -893,16 +967,35 @@ static struct iommu_ops viommu_ops = {
 static int viommu_init_vqs(struct viommu_dev *viommu)
 {
struct virtio_device *vdev = dev_to_virtio(viommu->dev);
-   const char *name = "request";
-   void *ret;
+   const char *names[] = { "request", "event" };
+   vq_callback_t *callbacks[] = {
+   NULL, /* No async requests */
+   viommu_event_handler,
+   };
 
-   ret = virtio_find_single_vq(vdev, NULL, name);
-   if (IS_ERR(ret)) {
-   dev_err(viommu->dev, "cannot find VQ\n");
-   return PTR_ERR(ret);
-   }
+   return virtio_find_vqs(vdev, VIOMMU_NR_VQS, viommu->vqs, callbacks,
+ 

[PATCH v2 5/5] vfio: Allow type-1 IOMMU instantiation for ARM

2018-06-21 Thread Jean-Philippe Brucker
ARM platforms may implement several kinds of IOMMUs (various SMMU or
SMMUv3 implementations, virtio-iommu). They are all type-1, so
automatically select VFIO_IOMMU_TYPE1 on ARM if IOMMU is selected.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/vfio/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index c84333eb5eb5..9de5ed38da83 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -21,7 +21,7 @@ config VFIO_VIRQFD
 menuconfig VFIO
tristate "VFIO Non-Privileged userspace driver framework"
depends on IOMMU_API
-   select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3)
+   select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM || ARM64)
select ANON_INODES
help
  VFIO provides a framework for secure userspace device drivers.
-- 
2.17.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 2/5] iommu: Add virtio-iommu driver

2018-06-21 Thread Michael S. Tsirkin
On Thu, Jun 21, 2018 at 08:06:52PM +0100, Jean-Philippe Brucker wrote:
> The virtio IOMMU is a para-virtualized device, allowing to send IOMMU
> requests such as map/unmap over virtio-mmio transport without emulating
> page tables. This implementation handles ATTACH, DETACH, MAP and UNMAP
> requests.
> 
> The bulk of the code transforms calls coming from the IOMMU API into
> corresponding virtio requests. Mappings are kept in an interval tree
> instead of page tables.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  MAINTAINERS   |   6 +
>  drivers/iommu/Kconfig |  11 +
>  drivers/iommu/Makefile|   1 +
>  drivers/iommu/virtio-iommu.c  | 929 ++
>  include/uapi/linux/virtio_ids.h   |   1 +
>  include/uapi/linux/virtio_iommu.h | 117 
>  6 files changed, 1065 insertions(+)
>  create mode 100644 drivers/iommu/virtio-iommu.c
>  create mode 100644 include/uapi/linux/virtio_iommu.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 55c08968aaab..bfb09dfa8e02 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -15248,6 +15248,12 @@ S:   Maintained
>  F:   drivers/virtio/virtio_input.c
>  F:   include/uapi/linux/virtio_input.h
>  
> +VIRTIO IOMMU DRIVER
> +M:   Jean-Philippe Brucker 
> +S:   Maintained
> +F:   drivers/iommu/virtio-iommu.c
> +F:   include/uapi/linux/virtio_iommu.h
> +
>  VIRTUAL BOX GUEST DEVICE DRIVER
>  M:   Hans de Goede 
>  M:   Arnd Bergmann 

Please add the virtualization mailing list.

> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 958417f22020..820709383899 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -412,4 +412,15 @@ config QCOM_IOMMU
>   help
> Support for IOMMU on certain Qualcomm SoCs.
>  
> +config VIRTIO_IOMMU
> + bool "Virtio IOMMU driver"
> + depends on VIRTIO_MMIO=y
> + select IOMMU_API
> + select INTERVAL_TREE
> + select ARM_DMA_USE_IOMMU if ARM
> + help
> +   Para-virtualised IOMMU driver with virtio.
> +
> +   Say Y here if you intend to run this kernel as a guest.
> +
>  endif # IOMMU_SUPPORT
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 244ad7913a81..b559c6ae81ea 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -33,3 +33,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
>  obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
>  obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
> +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> new file mode 100644
> index ..ea0242d8624b
> --- /dev/null
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -0,0 +1,929 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Virtio driver for the paravirtualized IOMMU
> + *
> + * Copyright (C) 2018 Arm Limited
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#define MSI_IOVA_BASE0x800
> +#define MSI_IOVA_LENGTH  0x10
> +
> +#define VIOMMU_REQUEST_VQ0
> +#define VIOMMU_NR_VQS1
> +
> +#define VIOMMU_REQUEST_TIMEOUT   1 /* 10s */

Where does this come from? Do you absolutely have to have
an arbitrary timeout?

> +
> +struct viommu_dev {
> + struct iommu_device iommu;
> + struct device   *dev;
> + struct virtio_device*vdev;
> +
> + struct ida  domain_ids;
> +
> + struct virtqueue*vqs[VIOMMU_NR_VQS];
> + spinlock_t  request_lock;
> + struct list_headrequests;
> +
> + /* Device configuration */
> + struct iommu_domain_geometrygeometry;
> + u64 pgsize_bitmap;
> + u8  domain_bits;
> +};
> +
> +struct viommu_mapping {
> + phys_addr_t paddr;
> + struct interval_tree_node   iova;
> + u32 flags;
> +};
> +
> +struct viommu_domain {
> + struct iommu_domain domain;
> + struct viommu_dev   *viommu;
> + struct mutexmutex;
> + unsigned intid;
> +
> + spinlock_t  mappings_lock;
> + struct rb_root_cached   mappings;
> +
> + unsigned long   nr_endpoints;
> +};
> +
> +struct viommu_endpoint {
> + struct viommu_dev   *viommu;
> + struct viommu_domain*vdomain;
> +};
> +
> +struct viommu_request {
> + struct list_headlist;
> + void*writeback;
> + unsigned int  

Re: [PATCH v2 3/5] iommu/virtio: Add probe request

2018-06-21 Thread Michael S. Tsirkin
On Thu, Jun 21, 2018 at 08:06:53PM +0100, Jean-Philippe Brucker wrote:
> When the device offers the probe feature, send a probe request for each
> device managed by the IOMMU. Extract RESV_MEM information. When we
> encounter a MSI doorbell region, set it up as a IOMMU_RESV_MSI region.
> This will tell other subsystems that there is no need to map the MSI
> doorbell in the virtio-iommu, because MSIs bypass it.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/virtio-iommu.c  | 149 --
>  include/uapi/linux/virtio_iommu.h |  37 
>  2 files changed, 180 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index ea0242d8624b..29ce9f4398ef 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -48,6 +48,7 @@ struct viommu_dev {
>   struct iommu_domain_geometrygeometry;
>   u64 pgsize_bitmap;
>   u8  domain_bits;
> + u32 probe_size;
>  };
>  
>  struct viommu_mapping {
> @@ -69,8 +70,10 @@ struct viommu_domain {
>  };
>  
>  struct viommu_endpoint {
> + struct device   *dev;
>   struct viommu_dev   *viommu;
>   struct viommu_domain*vdomain;
> + struct list_headresv_regions;
>  };
>  
>  struct viommu_request {
> @@ -121,6 +124,9 @@ static off_t viommu_get_req_offset(struct viommu_dev 
> *viommu,
>  {
>   size_t tail_size = sizeof(struct virtio_iommu_req_tail);
>  
> + if (req->type == VIRTIO_IOMMU_T_PROBE)
> + return len - viommu->probe_size - tail_size;
> +
>   return len - tail_size;
>  }
>  
> @@ -404,6 +410,103 @@ static int viommu_replay_mappings(struct viommu_domain 
> *vdomain)
>   return ret;
>  }
>  
> +static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
> +struct virtio_iommu_probe_resv_mem *mem,
> +size_t len)
> +{
> + struct iommu_resv_region *region = NULL;
> + unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
> +
> + u64 start = le64_to_cpu(mem->start);
> + u64 end = le64_to_cpu(mem->end);
> + size_t size = end - start + 1;
> +
> + if (len < sizeof(*mem))
> + return -EINVAL;
> +
> + switch (mem->subtype) {
> + default:
> + if (mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
> + mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_MSI)
> + dev_warn(vdev->dev, "unknown resv mem subtype 0x%x\n",
> +  mem->subtype);
> + /* Fall-through */
> + case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
> + region = iommu_alloc_resv_region(start, size, 0,
> +  IOMMU_RESV_RESERVED);
> + break;
> + case VIRTIO_IOMMU_RESV_MEM_T_MSI:
> + region = iommu_alloc_resv_region(start, size, prot,
> +  IOMMU_RESV_MSI);
> + break;
> + }
> +
> + list_add(&vdev->resv_regions, ®ion->list);
> + return 0;
> +}
> +
> +static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device 
> *dev)
> +{
> + int ret;
> + u16 type, len;
> + size_t cur = 0;
> + size_t probe_len;
> + struct virtio_iommu_req_probe *probe;
> + struct virtio_iommu_probe_property *prop;
> + struct iommu_fwspec *fwspec = dev->iommu_fwspec;
> + struct viommu_endpoint *vdev = fwspec->iommu_priv;
> +
> + if (!fwspec->num_ids)
> + return -EINVAL;
> +
> + probe_len = sizeof(*probe) + viommu->probe_size +
> + sizeof(struct virtio_iommu_req_tail);
> + probe = kzalloc(probe_len, GFP_KERNEL);
> + if (!probe)
> + return -ENOMEM;
> +
> + probe->head.type = VIRTIO_IOMMU_T_PROBE;
> + /*
> +  * For now, assume that properties of an endpoint that outputs multiple
> +  * IDs are consistent. Only probe the first one.
> +  */
> + probe->endpoint = cpu_to_le32(fwspec->ids[0]);
> +
> + ret = viommu_send_req_sync(viommu, probe, probe_len);
> + if (ret)
> + goto out_free;
> +
> + prop = (void *)probe->properties;
> + type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK;
> +
> + while (type != VIRTIO_IOMMU_PROBE_T_NONE &&
> +cur < viommu->probe_size) {
> + len = le16_to_cpu(prop->length);
> +
> + switch (type) {
> + case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
> + ret = viommu_add_resv_mem(vdev, (void *)prop->value, 
> len);
> + break;
> + default:
> + dev_dbg(dev, "unknown viommu prop 0x%x\n", type);
> + }
> +
> + if (ret)
> + dev_err(dev, "failed to parse viommu prop 0x%x\n", 
> type);
> +
> +   

Re: [virtio-dev] Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-21 Thread Siwei Liu
On Thu, Jun 21, 2018 at 11:14 AM, Michael S. Tsirkin  wrote:
> On Wed, Jun 13, 2018 at 01:40:59PM +0800, Jason Wang wrote:
>>
>>
>> On 2018年06月13日 12:24, Samudrala, Sridhar wrote:
>> > On 6/12/2018 7:38 PM, Jason Wang wrote:
>> > >
>> > >
>> > > On 2018年06月12日 19:54, Michael S. Tsirkin wrote:
>> > > > On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote:
>> > > > >
>> > > > > On 2018年06月05日 20:33, Michael S. Tsirkin wrote:
>> > > > > > I don't think this is sufficient.
>> > > > > >
>> > > > > > If both primary and standby devices are present, a
>> > > > > > legacy guest without
>> > > > > > support for the feature might see two devices with same mac and get
>> > > > > > confused.
>> > > > > >
>> > > > > > I think that we should only make primary visible after
>> > > > > > guest acked the
>> > > > > > backup feature bit.
>> > > > > I think we want exactly the reverse? E.g fail the
>> > > > > negotiation when guest
>> > > > > does not ack backup feature.
>> > > > >
>> > > > > Otherwise legacy guest won't even have the chance to see
>> > > > > primary device in
>> > > > > the guest.
>> > > > That's by design.
>> > >
>> > > So management needs to know the capability of guest to set the
>> > > backup feature. This looks a chicken or egg problem to me.
>> >
>> > I don't think so. If the tenant requests 'accelerated datapath feature',
>> > the management
>> > will set 'standby' feature bit on virtio-net interface and if the guest
>> > virtio-net driver
>> > supports this feature, then the tenant VM will get that capability via a
>> > hot-plugged
>> > primary device.
>>
>> Ok, I thought exactly the reverse because of the commit title is "enable
>> virtio_net to act as a standby for a passthru device". But re-read the
>> commit log content, I understand the case a little bit. Btw, VF is not
>> necessarily faster than virtio-net, especially consider virtio-net may have
>> a lot of queues.
>
> Don't do that then, right?

I don't understand. Where did the standby feature come to imply the
"accelerated datapath" thing? Isn't failover/standby a generic high
availblity term, rather than marry it to the concept of device model
specifics? Do we expect scsi to work exactly the same way with
"accelerated datapath"?

-Siwei


>
>> >
>> >
>> > >
>> > > >
>> > > > > > And on reset or when backup is cleared in some other way, unplug 
>> > > > > > the
>> > > > > > primary.
>> > > > > What if guest does not support hotplug?
>> > > > It shouldn't ack the backup feature then and it's a good point.
>> > > > We should both document this and check kernel config has
>> > > > hotplug support. Sridhar could you take a look pls?
>> > > >
>> > > > > > Something like the below will do the job:
>> > > > > >
>> > > > > > Primary device is added with a special "primary-failover" flag.
>> > > > > > A virtual machine is then initialized with just a standby virtio
>> > > > > > device. Primary is not yet added.
>> > > > > A question is how to do the matching? Qemu knows nothing about e.g 
>> > > > > mac
>> > > > > address of a pass-through device I believe?
>> > > > Supply a flag to the VFIO when it's added, this way QEMU will know.
>> > > >
>> > > > > > Later QEMU detects that guest driver device set DRIVER_OK.
>> > > > > > It then exposes the primary device to the guest, and triggers
>> > > > > > a device addition event (hot-plug event) for it.
>> > > > > Do you mean we won't have primary device in the initial qemu cli?
>> > > > No, that's not what I mean.
>> > > >
>> > > > I mean management will supply a flag to VFIO and then
>> > > >
>> > > >
>> > > > - VFIO defers exposing
>> > > > primary to guest until guest acks the feature bit.
>> > > > - When we see guest ack, initiate hotplug.
>> > > > - On reboot, hide it again.
>> > > > - On reset without reboot, request hot-unplug and on eject hide
>> > > > it again.
>> > >
>> > > This sounds much like a kind of bonding in qemu.
>> > >
>> > > >
>> > > > > > If QEMU detects guest driver removal, it initiates a
>> > > > > > hot-unplug sequence
>> > > > > > to remove the primary driver.  In particular, if QEMU detects guest
>> > > > > > re-initialization (e.g. by detecting guest reset) it
>> > > > > > immediately removes
>> > > > > > the primary device.
>> > > > > I believe guest failover module should handle this gracefully?
>> > > > It can't control enumeration order, if primary is enumerated before
>> > > > standby then guest will load its driver and it's too late
>> > > > when failover driver is loaded.
>> > >
>> > > Well, even if we can delay the visibility of primary until
>> > > DRIVER_OK, there still be a race I think? And it looks to me it's
>> > > still a bug of guest:
>> > >
>> > > E.g primary could be probed before failover_register() in guest.
>> > > Then we will miss the enslaving of primary forever.
>> >
>> > That is not an issue. Even if the primary is probed before failover
>> > driver, it will
>> > enslave the primary via the call to failover_existing_slave_register()
>> > as part

Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-21 Thread Siwei Liu
On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck  wrote:
> On Wed, 20 Jun 2018 22:48:58 +0300
> "Michael S. Tsirkin"  wrote:
>
>> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote:
>> > In any case, I'm not sure anymore why we'd want the extra uuid.
>>
>> It's mostly so we can have e.g. multiple devices with same MAC
>> (which some people seem to want in order to then use
>> then with different containers).
>>
>> But it is also handy for when you assign a PF, since then you
>> can't set the MAC.
>>
>
> OK, so what about the following:
>
> - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
>   that we have a new uuid field in the virtio-net config space
> - in QEMU, add a property for virtio-net that allows to specify a uuid,
>   offer VIRTIO_NET_F_STANDBY_UUID if set
> - when configuring, set the property to the group UUID of the vfio-pci
>   device

If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe
to still expose UUID in the config space on virtio-pci?

I'm not even sure if it's sane to expose group UUID on the PCI bridge
where the corresponding vfio-pci device attached to for a guest which
doesn't support the feature (legacy).

-Siwei


> - in the guest, use the uuid from the virtio-net device's config space
>   if applicable; else, fall back to matching by MAC as done today
>
> That should work for all virtio transports.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v6 3/3] x86: paravirt: make native_save_fl extern inline

2018-06-21 Thread Ingo Molnar


* Nick Desaulniers  wrote:

> native_save_fl() is marked static inline, but by using it as
> a function pointer in arch/x86/kernel/paravirt.c, it MUST be outlined.

> --- a/arch/x86/include/asm/irqflags.h
> +++ b/arch/x86/include/asm/irqflags.h
> @@ -13,7 +13,7 @@
>   * Interrupt control:
>   */
>  
> -static inline unsigned long native_save_fl(void)
> +extern inline unsigned long native_save_fl(void)
>  {
>   unsigned long flags;
>  

What's the code generation effect of this on say a defconfig kernel vmlinux 
with 
paravirt enabled?

Thanks,

Ingo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-21 Thread Michael S. Tsirkin
On Thu, Jun 21, 2018 at 06:07:18PM -0700, Siwei Liu wrote:
> On Thu, Jun 21, 2018 at 11:14 AM, Michael S. Tsirkin  wrote:
> > On Wed, Jun 13, 2018 at 01:40:59PM +0800, Jason Wang wrote:
> >>
> >>
> >> On 2018年06月13日 12:24, Samudrala, Sridhar wrote:
> >> > On 6/12/2018 7:38 PM, Jason Wang wrote:
> >> > >
> >> > >
> >> > > On 2018年06月12日 19:54, Michael S. Tsirkin wrote:
> >> > > > On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote:
> >> > > > >
> >> > > > > On 2018年06月05日 20:33, Michael S. Tsirkin wrote:
> >> > > > > > I don't think this is sufficient.
> >> > > > > >
> >> > > > > > If both primary and standby devices are present, a
> >> > > > > > legacy guest without
> >> > > > > > support for the feature might see two devices with same mac and 
> >> > > > > > get
> >> > > > > > confused.
> >> > > > > >
> >> > > > > > I think that we should only make primary visible after
> >> > > > > > guest acked the
> >> > > > > > backup feature bit.
> >> > > > > I think we want exactly the reverse? E.g fail the
> >> > > > > negotiation when guest
> >> > > > > does not ack backup feature.
> >> > > > >
> >> > > > > Otherwise legacy guest won't even have the chance to see
> >> > > > > primary device in
> >> > > > > the guest.
> >> > > > That's by design.
> >> > >
> >> > > So management needs to know the capability of guest to set the
> >> > > backup feature. This looks a chicken or egg problem to me.
> >> >
> >> > I don't think so. If the tenant requests 'accelerated datapath feature',
> >> > the management
> >> > will set 'standby' feature bit on virtio-net interface and if the guest
> >> > virtio-net driver
> >> > supports this feature, then the tenant VM will get that capability via a
> >> > hot-plugged
> >> > primary device.
> >>
> >> Ok, I thought exactly the reverse because of the commit title is "enable
> >> virtio_net to act as a standby for a passthru device". But re-read the
> >> commit log content, I understand the case a little bit. Btw, VF is not
> >> necessarily faster than virtio-net, especially consider virtio-net may have
> >> a lot of queues.
> >
> > Don't do that then, right?
> 
> I don't understand. Where did the standby feature come to imply the
> "accelerated datapath" thing?
> Isn't failover/standby a generic high
> availblity term, rather than marry it to the concept of device model
> specifics? Do we expect scsi to work exactly the same way with
> "accelerated datapath"?

That's not what I said.
The semantics are that the primary is always used if present in
preference to standby.
Jason said virtio net is sometimes preferable.
If that's the case don't make it a standby.

More advanced use-cases do exist and e.g. Alexander Duyck
suggested using a switch-dev. failover isn't it though.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-21 Thread Michael S. Tsirkin
On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote:
> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck  wrote:
> > On Wed, 20 Jun 2018 22:48:58 +0300
> > "Michael S. Tsirkin"  wrote:
> >
> >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote:
> >> > In any case, I'm not sure anymore why we'd want the extra uuid.
> >>
> >> It's mostly so we can have e.g. multiple devices with same MAC
> >> (which some people seem to want in order to then use
> >> then with different containers).
> >>
> >> But it is also handy for when you assign a PF, since then you
> >> can't set the MAC.
> >>
> >
> > OK, so what about the following:
> >
> > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
> >   that we have a new uuid field in the virtio-net config space
> > - in QEMU, add a property for virtio-net that allows to specify a uuid,
> >   offer VIRTIO_NET_F_STANDBY_UUID if set
> > - when configuring, set the property to the group UUID of the vfio-pci
> >   device
> 
> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe
> to still expose UUID in the config space on virtio-pci?


Yes but guest is not supposed to read it.

> I'm not even sure if it's sane to expose group UUID on the PCI bridge
> where the corresponding vfio-pci device attached to for a guest which
> doesn't support the feature (legacy).
> 
> -Siwei

Yes but you won't add the primary behind such a bridge.

> 
> > - in the guest, use the uuid from the virtio-net device's config space
> >   if applicable; else, fall back to matching by MAC as done today
> >
> > That should work for all virtio transports.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization