[PATCH] VMXNET3: Check for map error in vmxnet3_set_mc

2014-09-02 Thread Andy King
We should check if the map of the table actually succeeds, and also free
resources accordingly. This fixes the kernel panic reported by Tetsuo
Handa.

Version bumped to 1.2.1.0

Acked-by: Shelley Gong 
Acked-by: Bhavesh Davda 
Signed-off-by: Andy King 
---
 drivers/net/vmxnet3/vmxnet3_drv.c |   14 --
 drivers/net/vmxnet3/vmxnet3_int.h |4 ++--
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index d6e90c7..f450010 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -2056,7 +2056,6 @@ vmxnet3_set_mc(struct net_device *netdev)
if (!netdev_mc_empty(netdev)) {
new_table = vmxnet3_copy_mc(netdev);
if (new_table) {
-   new_mode |= VMXNET3_RXM_MCAST;
rxConf->mfTableLen = cpu_to_le16(
netdev_mc_count(netdev) * ETH_ALEN);
new_table_pa = dma_map_single(
@@ -2064,15 +2063,18 @@ vmxnet3_set_mc(struct net_device *netdev)
new_table,
rxConf->mfTableLen,
PCI_DMA_TODEVICE);
+   }
+
+   if (new_table_pa) {
+   new_mode |= VMXNET3_RXM_MCAST;
rxConf->mfTablePA = cpu_to_le64(new_table_pa);
} else {
-   netdev_info(netdev, "failed to copy mcast list"
-   ", setting ALL_MULTI\n");
+   netdev_info(netdev,
+   "failed to copy mcast list, setting 
ALL_MULTI\n");
new_mode |= VMXNET3_RXM_ALL_MULTI;
}
}
 
-
if (!(new_mode & VMXNET3_RXM_MCAST)) {
rxConf->mfTableLen = 0;
rxConf->mfTablePA = 0;
@@ -2091,11 +2093,11 @@ vmxnet3_set_mc(struct net_device *netdev)
   VMXNET3_CMD_UPDATE_MAC_FILTERS);
spin_unlock_irqrestore(&adapter->cmd_lock, flags);
 
-   if (new_table) {
+   if (new_table_pa)
dma_unmap_single(&adapter->pdev->dev, new_table_pa,
 rxConf->mfTableLen, PCI_DMA_TODEVICE);
+   if (new_table)
kfree(new_table);
-   }
 }
 
 void
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index 29ee77f2..3759479 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -69,10 +69,10 @@
 /*
  * Version numbers
  */
-#define VMXNET3_DRIVER_VERSION_STRING   "1.2.0.0-k"
+#define VMXNET3_DRIVER_VERSION_STRING   "1.2.1.0-k"
 
 /* a 32-bit int, each byte encode a verion number in VMXNET3_DRIVER_VERSION */
-#define VMXNET3_DRIVER_VERSION_NUM  0x0102
+#define VMXNET3_DRIVER_VERSION_NUM  0x01020100
 
 #if defined(CONFIG_PCI_MSI)
/* RSS only makes sense if MSI-X is supported. */
-- 
1.7.4.1



Re: [PATCH] VMXNET3: Check for map error in vmxnet3_set_mc

2014-09-02 Thread Sergei Shtylyov

Hello.

On 09/02/2014 08:30 PM, Andy King wrote:


We should check if the map of the table actually succeeds, and also free
resources accordingly. This fixes the kernel panic reported by Tetsuo
Handa.


   There's a "Reported-by:" line for that.


Version bumped to 1.2.1.0



Acked-by: Shelley Gong 
Acked-by: Bhavesh Davda 
Signed-off-by: Andy King 
---
  drivers/net/vmxnet3/vmxnet3_drv.c |   14 --
  drivers/net/vmxnet3/vmxnet3_int.h |4 ++--
  2 files changed, 10 insertions(+), 8 deletions(-)



diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index d6e90c7..f450010 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c

[...]

@@ -2091,11 +2093,11 @@ vmxnet3_set_mc(struct net_device *netdev)
   VMXNET3_CMD_UPDATE_MAC_FILTERS);
spin_unlock_irqrestore(&adapter->cmd_lock, flags);

-   if (new_table) {
+   if (new_table_pa)
dma_unmap_single(&adapter->pdev->dev, new_table_pa,
 rxConf->mfTableLen, PCI_DMA_TODEVICE);
+   if (new_table)
kfree(new_table);


   The above *if* is not needed -- kfree() already checks for NULL.

[...]

WBR, Sergei
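
For reference, the mapping-failure check being discussed here is normally spelled with dma_mapping_error() rather than by testing the returned handle against zero, and the cleanup path can lean on kfree(NULL) being a no-op. A minimal sketch along those lines (helper names are hypothetical; this is not the posted patch):

#include <linux/dma-mapping.h>
#include <linux/slab.h>

/* Sketch only: validate a streaming DMA mapping and clean it up. */
static dma_addr_t map_mc_table(struct device *dev, void *buf, size_t len)
{
	dma_addr_t pa = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

	if (dma_mapping_error(dev, pa))
		return 0;			/* caller treats 0 as "not mapped" */
	return pa;
}

static void unmap_mc_table(struct device *dev, dma_addr_t pa, void *buf,
			   size_t len)
{
	if (pa)
		dma_unmap_single(dev, pa, len, DMA_TO_DEVICE);
	kfree(buf);				/* safe even when buf is NULL */
}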



[no subject]

2014-09-02 Thread Andy King
This version addresses Sergei's comments.

o Fixed description and added Reported-by
o Removed NULL check for kfree()



[PATCH] VMXNET3: Check for map error in vmxnet3_set_mc

2014-09-02 Thread Andy King
We should check if the map of the table actually succeeds, and also free
resources accordingly.

Version bumped to 1.2.1.0

Acked-by: Shelley Gong 
Acked-by: Bhavesh Davda 
Signed-off-by: Andy King 
Reported-by: Tetsuo Handa 
---
 drivers/net/vmxnet3/vmxnet3_drv.c |   15 ---
 drivers/net/vmxnet3/vmxnet3_int.h |4 ++--
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index d6e90c7..6dfcbf5 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -2056,7 +2056,6 @@ vmxnet3_set_mc(struct net_device *netdev)
if (!netdev_mc_empty(netdev)) {
new_table = vmxnet3_copy_mc(netdev);
if (new_table) {
-   new_mode |= VMXNET3_RXM_MCAST;
rxConf->mfTableLen = cpu_to_le16(
netdev_mc_count(netdev) * ETH_ALEN);
new_table_pa = dma_map_single(
@@ -2064,15 +2063,18 @@ vmxnet3_set_mc(struct net_device *netdev)
new_table,
rxConf->mfTableLen,
PCI_DMA_TODEVICE);
+   }
+
+   if (new_table_pa) {
+   new_mode |= VMXNET3_RXM_MCAST;
rxConf->mfTablePA = cpu_to_le64(new_table_pa);
} else {
-   netdev_info(netdev, "failed to copy mcast list"
-   ", setting ALL_MULTI\n");
+   netdev_info(netdev,
+   "failed to copy mcast list, setting 
ALL_MULTI\n");
new_mode |= VMXNET3_RXM_ALL_MULTI;
}
}
 
-
if (!(new_mode & VMXNET3_RXM_MCAST)) {
rxConf->mfTableLen = 0;
rxConf->mfTablePA = 0;
@@ -2091,11 +2093,10 @@ vmxnet3_set_mc(struct net_device *netdev)
   VMXNET3_CMD_UPDATE_MAC_FILTERS);
spin_unlock_irqrestore(&adapter->cmd_lock, flags);
 
-   if (new_table) {
+   if (new_table_pa)
dma_unmap_single(&adapter->pdev->dev, new_table_pa,
 rxConf->mfTableLen, PCI_DMA_TODEVICE);
-   kfree(new_table);
-   }
+   kfree(new_table);
 }
 
 void
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index 29ee77f2..3759479 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -69,10 +69,10 @@
 /*
  * Version numbers
  */
-#define VMXNET3_DRIVER_VERSION_STRING   "1.2.0.0-k"
+#define VMXNET3_DRIVER_VERSION_STRING   "1.2.1.0-k"
 
 /* a 32-bit int, each byte encode a verion number in VMXNET3_DRIVER_VERSION */
-#define VMXNET3_DRIVER_VERSION_NUM  0x0102
+#define VMXNET3_DRIVER_VERSION_NUM  0x01020100
 
 #if defined(CONFIG_PCI_MSI)
/* RSS only makes sense if MSI-X is supported. */
-- 
1.7.4.1



Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2014-09-02 Thread Benjamin Herrenschmidt
On Mon, 2014-09-01 at 22:55 -0700, Andy Lutomirski wrote:
> 
> On x86, at least, I doubt that we'll ever see a physically addressed
> PCI virtio device for which ACPI advertises an IOMMU, since any sane
> hypervisor will just not advertise an IOMMU for the virtio device.
> But are there arm64 or PPC guests that use virtio_pci, that have
> IOMMUs, and that will malfunction if the virtio_pci driver ends up
> using the IOMMU?  I certainly hope not, since these systems might be
> very hard-pressed to work right if someone plugged in a physical
> virtio-speaking PCI device.

It will definitely not work on ppc64. We always have IOMMUs on pseries,
all PCI busses do, and because it's a paravirtualized environment,
mapping/unmapping pages means hypercalls -> expensive.

But our virtio implementation bypasses it in qemu, so if virtio-pci
starts using the DMA mapping API without changing the DMA ops under the
hood, it will break for us.

Cheers,
Ben.




Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2014-09-02 Thread Konrad Rzeszutek Wilk
On Wed, Sep 03, 2014 at 06:53:33AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2014-09-01 at 22:55 -0700, Andy Lutomirski wrote:
> > 
> > On x86, at least, I doubt that we'll ever see a physically addressed
> > PCI virtio device for which ACPI advertises an IOMMU, since any sane
> > hypervisor will just not advertise an IOMMU for the virtio device.
> > But are there arm64 or PPC guests that use virtio_pci, that have
> > IOMMUs, and that will malfunction if the virtio_pci driver ends up
> > using the IOMMU?  I certainly hope not, since these systems might be
> > very hard-pressed to work right if someone plugged in a physical
> > virtio-speaking PCI device.
> 
> It will definitely not work on ppc64. We always have IOMMUs on pseries,
> all PCI busses do, and because it's a paravirtualized environment,
> mapping/unmapping pages means hypercalls -> expensive.
> 
> But our virtio implementation bypasses it in qemu, so if virtio-pci
> starts using the DMA mapping API without changing the DMA ops under the
> hood, it will break for us.

What is the default dma_ops that the Linux guests start with as
guests under ppc64?

Thanks!
> 
> Cheers,
> Ben.
> 
> 


Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2014-09-02 Thread Benjamin Herrenschmidt
On Tue, 2014-09-02 at 16:56 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Sep 03, 2014 at 06:53:33AM +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2014-09-01 at 22:55 -0700, Andy Lutomirski wrote:
> > > 
> > > On x86, at least, I doubt that we'll ever see a physically addressed
> > > PCI virtio device for which ACPI advertises an IOMMU, since any sane
> > > hypervisor will just not advertise an IOMMU for the virtio device.
> > > But are there arm64 or PPC guests that use virtio_pci, that have
> > > IOMMUs, and that will malfunction if the virtio_pci driver ends up
> > > using the IOMMU?  I certainly hope not, since these systems might be
> > > very hard-pressed to work right if someone plugged in a physical
> > > virtio-speaking PCI device.
> > 
> > It will definitely not work on ppc64. We always have IOMMUs on pseries,
> > all PCI busses do, and because it's a paravirtualized environment,
> > mapping/unmapping pages means hypercalls -> expensive.
> > 
> > But our virtio implementation bypasses it in qemu, so if virtio-pci
> > starts using the DMA mapping API without changing the DMA ops under the
> > hood, it will break for us.
> 
> What is the default dma_ops that the Linux guests start with as
> guests under ppc64?

On pseries (which is what we care the most about nowadays) it's
dma_iommu_ops() which in turn call into the "TCE" code for populating
the IOMMU entries which calls the hypervisor.

Cheers,
Ben.

> Thanks!
> > 
> > Cheers,
> > Ben.
> > 
> > 




Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2014-09-02 Thread Michael S. Tsirkin
On Mon, Sep 01, 2014 at 10:55:29PM -0700, Andy Lutomirski wrote:
> On Mon, Sep 1, 2014 at 3:16 PM, Benjamin Herrenschmidt
>  wrote:
> > On Mon, 2014-09-01 at 10:39 -0700, Andy Lutomirski wrote:
> >> Changes from v1:
> >>  - Using the DMA API is optional now.  It would be nice to improve the
> >>DMA API to the point that it could be used unconditionally, but s390
> >>proves that we're not there yet.
> >>  - Includes patch 4, which fixes DMA debugging warnings from virtio_net.
> >
> > I'm not sure if you saw my reply on the other thread but I have a few
> > comments based on the above "it would be nice if ..."
> >
> 
> Yeah, sorry, I sort of thought I responded, but I didn't do a very good job.
> 
> > So here we have both a yes and a no :-)
> >
> > It would be nice to avoid those if () games all over and indeed just
> > use the DMA API, *however* we most certainly don't want to actually
> > create IOMMU mappings for the KVM virtio case. This would be a massive
> > loss in performance on several platforms and generally doesn't make
> > much sense.
> >
> > However, we can still use the API without that on any architecture
> > where the dma mapping API ends up calling the generic dma_map_ops,
> > it becomes just a matter of virtio setting up some special "nop" ops
> > when needed.
> 
> I'm not quite convinced that this is a good idea.  I think that there
> are three relevant categories of virtio devices:
> 
> a) Any virtio device where the normal DMA ops are nops.  This includes
> x86 without an IOMMU (e.g. in a QEMU/KVM guest), 32-bit ARM, and
> probably many other architectures.  In this case, what we do only
> matters for performance, not for correctness.  Ideally the arch DMA
> ops are fast.
> 
> b) Virtio devices that use physical addressing on systems where DMA
> ops either don't exist at all (most s390) or do something nontrivial.
> In this case, we must either override the DMA ops or just not use
> them.
> 
> c) Virtio devices that use bus addressing.  This includes everything
> on Xen (because the "physical" addresses are nonsense) and any actual
> physical PCI device that speaks virtio on a system with an IOMMU.  In
> this case, we must use the DMA ops.
> 
> The issue is that, on systems with DMA ops that do something, we need
> to make sure that we know whether we're in case (b) or (c).  In these
> patches, I've made the assumption that, if the virtio devices lives on
> the PCI bus, then it uses the same type of addressing that any other
> device on that PCI bus would use.
> 
> On x86, at least, I doubt that we'll ever see a physically addressed
> PCI virtio device for which ACPI advertises an IOMMU, since any sane
> hypervisor will just not advertise an IOMMU for the virtio device.

How exactly does one not advertise an IOMMU for a specific
device? Could you please clarify?

> But are there arm64 or PPC guests that use virtio_pci, that have
> IOMMUs, and that will malfunction if the virtio_pci driver ends up
> using the IOMMU?  I certainly hope not, since these systems might be
> very hard-pressed to work right if someone plugged in a physical
> virtio-speaking PCI device.

One simple fix is to defer this all until virtio 1.0.
virtio 1.0 has an alternative set of IDs for virtio pci,
that can be used if you are making an incompatible change.
We can use that if there's an iommu.


> >
> > The difficulty here resides in the fact that we have never completely
> > made the dma_map_ops generic. The ops themselves are defined generically
> > as are the dma_map_* interfaces based on them, but the location of the
> > ops pointer is still more/less arch specific and some architectures
> > still chose not to use that indirection at all I believe.
> >
> 
> I'd be happy to update the patches if someone does this, but I don't
> really want to attack the DMA API on all architectures right now.  In
> the mean time, at least s390 requires that we be able to compile out
> the DMA API calls.  I'd rather see s390 provide working no-op dma ops
> for all of the struct devices that provide virtio interfaces.
> 
> On a related note, shouldn't virtio be doing something to provide dma
> ops to the virtio device and any of its children?  I don't know how it
> would even try to do this, given how architecture-dependent this code
> currently is.  Calling dma_map_single on the virtio device (as opposed
> to its parent) is currently likely to crash on x86.  Fortunately,
> nothing does this.
> 
> --Andy


Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2014-09-02 Thread Andy Lutomirski
On Tue, Sep 2, 2014 at 1:53 PM, Benjamin Herrenschmidt
 wrote:
> On Mon, 2014-09-01 at 22:55 -0700, Andy Lutomirski wrote:
>>
>> On x86, at least, I doubt that we'll ever see a physically addressed
>> PCI virtio device for which ACPI advertises an IOMMU, since any sane
>> hypervisor will just not advertise an IOMMU for the virtio device.
>> But are there arm64 or PPC guests that use virtio_pci, that have
>> IOMMUs, and that will malfunction if the virtio_pci driver ends up
>> using the IOMMU?  I certainly hope not, since these systems might be
>> very hard-pressed to work right if someone plugged in a physical
>> virtio-speaking PCI device.
>
> It will definitely not work on ppc64. We always have IOMMUs on pseries,
> all PCI busses do, and because it's a paravirtualized environment,
> mapping/unmapping pages means hypercalls -> expensive.
>
> But our virtio implementation bypasses it in qemu, so if virtio-pci
> starts using the DMA mapping API without changing the DMA ops under the
> hood, it will break for us.
>

Let's take a step back from the implementation.  What is a driver
for a virtio PCI device (i.e. a PCI device with vendor 0x1af4)
supposed to do on ppc64?

It can send the device physical addresses and ignore the normal PCI
DMA semantics, which is what the current virtio_pci driver does.  This
seems like a layering violation, and this won't work if the device is
a real PCI device.  Alternatively, it can treat the device like any
other PCI device and use the IOMMU.  This is a bit slower, and it is
also incompatible with current hypervisors.

There really are virtio devices that are pieces of silicon and not
figments of a hypervisor's imagination [1].  We could teach virtio_pci
to use physical addressing on ppc64, but that seems like a pretty
awful hack, and it'll start needing quirks as soon as someone tries to
plug a virtio-speaking PCI card into a ppc64 machine.

Ideas?  x86 and arm seem to be safe here, since AFAIK there is no such
thing as a physically addressed virtio "PCI" device on a bus with an
IOMMU on x86, arm, or arm64.

[1] https://lwn.net/Articles/580186/

> Cheers,
> Ben.
>
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2014-09-02 Thread Andy Lutomirski
On Tue, Sep 2, 2014 at 2:10 PM, Michael S. Tsirkin  wrote:
> On Mon, Sep 01, 2014 at 10:55:29PM -0700, Andy Lutomirski wrote:
>> On Mon, Sep 1, 2014 at 3:16 PM, Benjamin Herrenschmidt
>>  wrote:
>> > On Mon, 2014-09-01 at 10:39 -0700, Andy Lutomirski wrote:
>> >> Changes from v1:
>> >>  - Using the DMA API is optional now.  It would be nice to improve the
>> >>DMA API to the point that it could be used unconditionally, but s390
>> >>proves that we're not there yet.
>> >>  - Includes patch 4, which fixes DMA debugging warnings from virtio_net.
>> >
>> > I'm not sure if you saw my reply on the other thread but I have a few
>> > comments based on the above "it would be nice if ..."
>> >
>>
>> Yeah, sorry, I sort of thought I responded, but I didn't do a very good job.
>>
>> > So here we have both a yes and a no :-)
>> >
>> > It would be nice to avoid those if () games all over and indeed just
>> > use the DMA API, *however* we most certainly don't want to actually
>> > create IOMMU mappings for the KVM virtio case. This would be a massive
>> > loss in performance on several platforms and generally doesn't make
>> > much sense.
>> >
>> > However, we can still use the API without that on any architecture
>> > where the dma mapping API ends up calling the generic dma_map_ops,
>> > it becomes just a matter of virtio setting up some special "nop" ops
>> > when needed.
>>
>> I'm not quite convinced that this is a good idea.  I think that there
>> are three relevant categories of virtio devices:
>>
>> a) Any virtio device where the normal DMA ops are nops.  This includes
>> x86 without an IOMMU (e.g. in a QEMU/KVM guest), 32-bit ARM, and
>> probably many other architectures.  In this case, what we do only
>> matters for performance, not for correctness.  Ideally the arch DMA
>> ops are fast.
>>
>> b) Virtio devices that use physical addressing on systems where DMA
>> ops either don't exist at all (most s390) or do something nontrivial.
>> In this case, we must either override the DMA ops or just not use
>> them.
>>
>> c) Virtio devices that use bus addressing.  This includes everything
>> on Xen (because the "physical" addresses are nonsense) and any actual
>> physical PCI device that speaks virtio on a system with an IOMMU.  In
>> this case, we must use the DMA ops.
>>
>> The issue is that, on systems with DMA ops that do something, we need
>> to make sure that we know whether we're in case (b) or (c).  In these
>> patches, I've made the assumption that, if the virtio devices lives on
>> the PCI bus, then it uses the same type of addressing that any other
>> device on that PCI bus would use.
>>
>> On x86, at least, I doubt that we'll ever see a physically addressed
>> PCI virtio device for which ACPI advertises an IOMMU, since any sane
>> hypervisor will just not advertise an IOMMU for the virtio device.
>
> How exactly does one not advertise an IOMMU for a specific
> device? Could you please clarify?

See 
https://software.intel.com/en-us/blogs/2009/09/11/decoding-the-dmar-tables-in-acpiiommu-part-2

I think that all that needs to happen is for ACPI to not list the
device in the scope of any drhd unit.  I don't know whether this works
correctly, but it looks like the iommu_dummy and the
init_no_remapping_devices code in intel-iommu.c exists for almost
exactly this purpose.

>
>> But are there arm64 or PPC guests that use virtio_pci, that have
>> IOMMUs, and that will malfunction if the virtio_pci driver ends up
>> using the IOMMU?  I certainly hope not, since these systems might be
>> very hard-pressed to work right if someone plugged in a physical
>> virtio-speaking PCI device.
>
> One simple fix is to defer this all until virtio 1.0.
> virtio 1.0 has an alternative set of IDs for virtio pci,
> that can be used if you are making an incompatible change.
> We can use that if there's an iommu.

How?  If someone builds a physical device compliant with the virtio
1.0 specification, how do can that device know whether it's behind an
IOMMU?  The IOMMU is part of the host (or Xen, sort of), not the PCI
device.  I suppose that virtio 1.0 could add a bit indicating that the
virtio device is a physical piece of hardware (presumably this should
be PCI-specific).

--Andy


Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2014-09-02 Thread Benjamin Herrenschmidt
On Tue, 2014-09-02 at 14:37 -0700, Andy Lutomirski wrote:

> Let's take a step back from the implementation.  What is a driver
> for a virtio PCI device (i.e. a PCI device with vendor 0x1af4)
> supposed to do on ppc64?

Today, it's supposed to send guest physical addresses. We can make that
optional via some nego or capabilities to support more esoteric setups
but for backward compatibility, this must remain the default behaviour.

> It can send the device physical addresses and ignore the normal PCI
> DMA semantics, which is what the current virtio_pci driver does.  This
> seems like a layering violation, and this won't work if the device is
> a real PCI device.

Correct, it's an original virtio implementation choice for maximum
performances.

>   Alternatively, it can treat the device like any
> other PCI device and use the IOMMU.  This is a bit slower, and it is
> also incompatible with current hypervisors.

This is potentially a LOT slower and is backward incompatible with
current qemu/KVM and kvmtool, yes.

The slowness can be alleviated using various techniques, for example on
ppc64 we can create a DMA window that contains a permanent mapping of
the entire guest space, so we could use such a thing for virtio.

Another thing we could do potentially is advertise via the device-tree
that such a bus uses a direct mapping and have the guest use appropriate
"direct map" dma_ops.

But we need to keep backward compatibility with existing
guest/hypervisors so the default must remain as it is.

> There really are virtio devices that are pieces of silicon and not
> figments of a hypervisor's imagination [1].

I am aware of that. There are also attempts at using virtio to make two
machines communicate via a PCIe link (either with one as endpoint of the
other or via a non-transparent switch).

Which is why I'm not objecting to what you are trying to do ;-)

My suggestion was that it might be a cleaner approach to do that by
having the individual virtio drivers always use the dma_map_* API, and
limiting the kludgery to a combination of virtio_pci "core" and arch
code by selecting an appropriate set of dma_map_ops, defaulting with a
"transparent" (or direct) one as our current default case (and thus
overriding the iommu ones provided by the arch).
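
For concreteness, a minimal sketch of what such "direct" dma_map_ops could look like on a kernel of this vintage, assuming the platform attaches them to the virtio device with something like powerpc's set_dma_ops() (all names below are hypothetical, and powerpc already carries a linear dma_direct_ops that could serve much the same purpose):

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static dma_addr_t virtio_direct_map_page(struct device *dev, struct page *page,
					 unsigned long offset, size_t size,
					 enum dma_data_direction dir,
					 struct dma_attrs *attrs)
{
	return page_to_phys(page) + offset;	/* guest-physical address */
}

static void virtio_direct_unmap_page(struct device *dev, dma_addr_t addr,
				     size_t size, enum dma_data_direction dir,
				     struct dma_attrs *attrs)
{
	/* nothing to tear down for a 1:1 mapping */
}

static int virtio_direct_map_sg(struct device *dev, struct scatterlist *sgl,
				int nents, enum dma_data_direction dir,
				struct dma_attrs *attrs)
{
	struct scatterlist *sg;
	int i;

	for_each_sg(sgl, sg, nents, i) {
		sg->dma_address = sg_phys(sg);
		sg_dma_len(sg) = sg->length;
	}
	return nents;
}

/* "Bypass" ops a platform quirk could install on the virtio device
 * before the driver starts mapping buffers. */
static struct dma_map_ops virtio_direct_dma_ops = {
	.map_page   = virtio_direct_map_page,
	.unmap_page = virtio_direct_unmap_page,
	.map_sg     = virtio_direct_map_sg,
};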

>   We could teach virtio_pci
> to use physical addressing on ppc64, but that seems like a pretty
> awful hack, and it'll start needing quirks as soon as someone tries to
> plug a virtio-speaking PCI card into a ppc64 machine.

But x86_64 is the same, no? The day it starts growing an iommu emulation
in qemu (and I've heard it's happening) it will still want to do direct
bypass for virtio for performance.

> Ideas?  x86 and arm seem to be safe here, since AFAIK there is no such
> thing as a physically addressed virtio "PCI" device on a bus with an
> IOMMU on x86, arm, or arm64.

Today I wouldn't bet on it remaining that way. The qemu
implementation of virtio is physically addressed and you don't
necessarily have a choice of which device gets an iommu and which not.

Cheers,
Ben.

> [1] https://lwn.net/Articles/580186/
> 
> > Cheers,
> > Ben.
> >
> >
> 
> 
> 




Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2014-09-02 Thread Andy Lutomirski
On Tue, Sep 2, 2014 at 3:10 PM, Benjamin Herrenschmidt
 wrote:
> On Tue, 2014-09-02 at 14:37 -0700, Andy Lutomirski wrote:
>
>> Let's take a step back from the implementation.  What is a driver
>> for a virtio PCI device (i.e. a PCI device with vendor 0x1af4)
>> supposed to do on ppc64?
>
> Today, it's supposed to send guest physical addresses. We can make that
> optional via some nego or capabilities to support more esoteric setups
> but for backward compatibility, this must remain the default behaviour.

I think it only needs to remain the default in cases where the
alternative (bus addressing) won't work.  I think that, so far, this
is just ppc64.  But see below...

>
> My suggestion was that it might be a cleaner approach to do that by
> having the individual virtio drivers always use the dma_map_* API, and
> limiting the kludgery to a combination of virtio_pci "core" and arch
> code by selecting an appropriate set of dma_map_ops, defaulting with a
> "transparent" (or direct) one as our current default case (and thus
> overriding the iommu ones provided by the arch).

I think the cleanest way of all would be to get the bus drivers to do
the right thing so that all of the virtio code can just use the dma
api.  I don't know whether this is achievable.

>
>>   We could teach virtio_pci
>> to use physical addressing on ppc64, but that seems like a pretty
>> awful hack, and it'll start needing quirks as soon as someone tries to
>> plug a virtio-speaking PCI card into a ppc64 machine.
>
> But x86_64 is the same no ? The day it starts growing an iommu emulation
> in qemu (and I've heard it's happening) it will still want to do direct
> bypass for virtio for performance.

I don't think so.  I would argue that it's a straight-up bug for QEMU
to expose a physically-addressed virtio-pci device to the guest behind
an emulated IOMMU.  QEMU may already be doing that on ppc64, but it
isn't on x86_64 or arm (yet).

On x86_64, I'm pretty sure that QEMU can emulate an IOMMU for
everything except the virtio-pci devices.  The ACPI DMAR stuff is
quite expressive.

On ARM, I hope the QEMU will never implement a PCI IOMMU.  As far as I
could tell when I looked last week, none of the newer QEMU-emulated
ARM machines even support PCI.  Even if QEMU were to implement a PCI
IOMMU on some future ARM machine, it could continue using virtio-mmio
for virtio devices.

So ppc might actually be the only system that has or will have
physically-addressed virtio PCI devices that are behind an IOMMU.  Can
this be handled in a ppc64-specific way?  Is there any way that the
kernel can distinguish a QEMU-provided virtio PCI device from a
physical PCIe thing?  It would be kind of nice to address this without
adding complexity to the virtio spec.  Maybe virtio 1.0 devices could
be assumed to use bus addressing unless a new devicetree property says
otherwise.

--Andy


Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2014-09-02 Thread Andy Lutomirski
On Tue, Sep 2, 2014 at 4:20 PM, Benjamin Herrenschmidt  wrote:
> On Tue, 2014-09-02 at 16:11 -0700, Andy Lutomirski wrote:
>
>> I don't think so.  I would argue that it's a straight-up bug for QEMU
>> to expose a physically-addressed virtio-pci device to the guest behind
>> an emulated IOMMU.  QEMU may already be doing that on ppc64, but it
>> isn't on x86_64 or arm (yet).
>
> Last I looked, it does on everything, it bypasses the DMA layer in qemu
> which is where IOMMUs are implemented.

I believe you, but I'm not convinced that this means much from the
guest's POV, except on ppc64.

>
>> On x86_64, I'm pretty sure that QEMU can emulate an IOMMU for
>> everything except the virtio-pci devices.  The ACPI DMAR stuff is
>> quite expressive.
>
> Well, *except* virtio, exactly...

But there aren't any ACPI systems with both virtio-pci and IOMMUs,
right?  So we could say that, henceforth, ACPI systems must declare
whether virtio-pci devices live behind IOMMUs without breaking
backward compatibility.

>
>> On ARM, I hope the QEMU will never implement a PCI IOMMU.  As far as I
>> could tell when I looked last week, none of the newer QEMU-emulated
>> ARM machines even support PCI.  Even if QEMU were to implement a PCI
>> IOMMU on some future ARM machine, it could continue using virtio-mmio
>> for virtio devices.
>
> Possibly...
>
>> So ppc might actually be the only system that has or will have
>> physically-addressed virtio PCI devices that are behind an IOMMU.  Can
>> this be handled in a ppc64-specific way?
>
> I wouldn't be so certain, as I said, the way virtio is implemented in
> qemu bypasses the DMA layer, which is where IOMMUs sit. The fact that
> currently x86 doesn't put an IOMMU there is not even guaranteed, is it?
> What happens if you try to mix and match virtio and other emulated
> devices that require the iommu on the same bus ?

AFAIK QEMU doesn't support IOMMUs at all on x86, so current versions
of QEMU really do guarantee that virtio-pci on x86 has no IOMMU, even
if that guarantee is purely accidental.

>
> If we could discriminate virtio devices to a specific host bridge and
> guarantee no mix & match, we could probably add a concept of
> "IOMMU-less" bus but that would require guest changes which limits the
> usefulness.
>
>>   Is there any way that the
>> kernel can distinguish a QEMU-provided virtio PCI device from a
>> physical PCIe thing?
>
> Not with existing guests which cannot be changed. Existing distros are
> out with those drivers. If we add a backward compatibility mechanism,
> then we could add something yes, provided we can segregate virtio onto a
> dedicated host bridge (which can be a problem with the libvirt
> trainwreck...)

Ugh.

So here's an ugly proposal:

Step 1: Make virtio-pci use the DMA API only on x86.  This will at
least fix Xen and people experimenting with virtio hardware on x86,
and it won't break anything, since there are no emulated IOMMUs on
x86.

Step 2: Update the virtio spec.  Virtio 1.0 PCI devices should set a
new bit if they are physically addressed.  If that bit is clear, then
the device is assumed to be addressed in accordance with the
platform's standard addressing model for PCI.  Presumably this would
be something like VIRTIO_F_BUS_ADDRESSING = 33, and the spec would say
something like "Physical devices compatible with this specification
MUST offer VIRTIO_F_BUS_ADDRESSING.  Drivers MUST implement this
feature."  Alternatively, this could live in a PCI configuration
capability.

Step 3: Update virtio-pci to use the DMA API for all devices on x86
and for devices that advertise bus addressing on other architectures.

I think this proposal will work, but I also think it sucks and I'd
really like to see a better counter-proposal.


--Andy


Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2014-09-02 Thread Andy Lutomirski
On Tue, Sep 2, 2014 at 5:25 PM, Benjamin Herrenschmidt  wrote:
> On Tue, 2014-09-02 at 16:42 -0700, Andy Lutomirski wrote:
>> So here's an ugly proposal:
>>
>> Step 1: Make virtio-pci use the DMA API only on x86.  This will at
>> least fix Xen and people experimenting with virtio hardware on x86,
>> and it won't break anything, since there are no emulated IOMMUs on
>> x86.
>
> I think we should make all virtio drivers use the DMA API and just have
> different set of dma_ops. We can make a simple ifdef powerpc if needed
> in virtio-pci that force the dma-ops of the device to some direct
> "bypass" ops at init time.
>
> That way no need to select whether to use the DMA API or not, just
> always use it, and add a tweak to replace the DMA ops with the direct
> ones on the archs/platforms that need that. That was my original
> proposal and I still think it's the best approach.

I agree *except* that implementing it will be a real PITA and (I
think) can't be done without changing code in arch/.  My patches plus
an ifdef powerpc will be functionally equivalent, just uglier.

>
> As I said, make it always use the DMA API, but add a quirk to replace
> the dma_ops with some NULL ops on platforms that need it.
>
> The only issue with that is the location of the dma ops is arch
> specific, so that one function will contain some ifdefs, but the rest of
> the code can just use the DMA API.

Bigger quirk: on a standard s390 virtio guest configuration,
dma_map_single etc will fail to link.  I tried this in v1 of these
patches.  So we can poke at the archdata all day, but we can't build a
kernel like that :(

So until the dma_ops pointer moves into struct device and
CONFIG_HAS_DMA becomes mandatory (or mandatory enough that virtio can
depend on it), I don't think we can do it this way.
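
To make the link problem concrete, one way a transport could keep building on !CONFIG_HAS_DMA configurations is a compile-time wrapper along these lines (helper name hypothetical; a sketch rather than a proposal):

#include <linux/dma-mapping.h>
#include <linux/io.h>

#ifdef CONFIG_HAS_DMA
static inline dma_addr_t vring_map_buf(struct device *dev, void *buf,
				       size_t size, enum dma_data_direction dir)
{
	return dma_map_single(dev, buf, size, dir);
}
#else
static inline dma_addr_t vring_map_buf(struct device *dev, void *buf,
				       size_t size, enum dma_data_direction dir)
{
	return virt_to_phys(buf);	/* today's physical addressing */
}
#endif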

I'll send a v5 that is the same as v4 except with physical addressing
hardcoded in for powerpc.

--Andy


[PATCH 3/3] virtio_ring: unify direct/indirect code paths.

2014-09-02 Thread Rusty Russell
virtqueue_add() populates the virtqueue descriptor table from the sgs
given.  If it uses an indirect descriptor table, then it puts a single
descriptor in the descriptor table pointing to the kmalloc'ed indirect
table where the sg is populated.

Previously vring_add_indirect() did the allocation and the simple
linear layout.  We replace that with alloc_indirect() which allocates
the indirect table then chains it like the normal descriptor table so
we can reuse the core logic.

This slows down pktgen by less than 1/2 a percent (which uses direct
descriptors), as well as vring_bench, but it's far neater.

vring_bench before:
1061485790-1104800648(1.08254e+09+/-6.6e+06)ns
vring_bench after:
1125610268-1183528965(1.14172e+09+/-8e+06)ns

pktgen before:
   787781-796334(793165+/-2.4e+03)pps 365-369(367.5+/-1.2)Mb/sec (365530384-369498976(3.68028e+08+/-1.1e+06)bps) errors: 0

pktgen after:
   779988-790404(786391+/-2.5e+03)pps 361-366(364.35+/-1.3)Mb/sec (361914432-366747456(3.64885e+08+/-1.2e+06)bps) errors: 0

Now, if we force indirect descriptors by turning off any_header_sg
in virtio_net.c:

pktgen before:
  713773-721062(718374+/-2.1e+03)pps 331-334(332.95+/-0.92)Mb/sec (331190672-334572768(3.33325e+08+/-9.6e+05)bps) errors: 0
pktgen after:
  710542-719195(714898+/-2.4e+03)pps 329-333(331.15+/-1.1)Mb/sec (329691488-333706480(3.31713e+08+/-1.1e+06)bps) errors: 0

Signed-off-by: Rusty Russell 
---
 drivers/virtio/virtio_ring.c | 125 +--
 1 file changed, 50 insertions(+), 75 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 374399c62080..a4ebbffac141 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -99,18 +99,10 @@ struct vring_virtqueue
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
-/* Set up an indirect table of descriptors and add it to the queue. */
-static inline int vring_add_indirect(struct vring_virtqueue *vq,
-struct scatterlist *sgs[],
-unsigned int total_sg,
-unsigned int out_sgs,
-unsigned int in_sgs,
-gfp_t gfp)
+static struct vring_desc *alloc_indirect(unsigned int total_sg, gfp_t gfp)
 {
-   struct vring_desc *desc;
-   unsigned head;
-   struct scatterlist *sg;
-   int i, n;
+   struct vring_desc *desc;
+   unsigned int i;
 
/*
 * We require lowmem mappings for the descriptors because
@@ -119,51 +111,13 @@ static inline int vring_add_indirect(struct vring_virtqueue *vq,
 */
gfp &= ~(__GFP_HIGHMEM | __GFP_HIGH);
 
-   desc = kmalloc(total_sg * sizeof(struct vring_desc), gfp);
-   if (!desc)
-   return -ENOMEM;
-
-   /* Transfer entries from the sg lists into the indirect page */
-   i = 0;
-   for (n = 0; n < out_sgs; n++) {
-   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   desc[i].flags = VRING_DESC_F_NEXT;
-   desc[i].addr = sg_phys(sg);
-   desc[i].len = sg->length;
-   desc[i].next = i+1;
-   i++;
-   }
-   }
-   for (; n < (out_sgs + in_sgs); n++) {
-   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
-   desc[i].addr = sg_phys(sg);
-   desc[i].len = sg->length;
-   desc[i].next = i+1;
-   i++;
-   }
-   }
-   BUG_ON(i != total_sg);
-
-   /* Last one doesn't continue. */
-   desc[i-1].flags &= ~VRING_DESC_F_NEXT;
-   desc[i-1].next = 0;
-
-   /* We're about to use a buffer */
-   vq->vq.num_free--;
-
-   /* Use a single buffer which doesn't continue */
-   head = vq->free_head;
-   vq->vring.desc[head].flags = VRING_DESC_F_INDIRECT;
-   vq->vring.desc[head].addr = virt_to_phys(desc);
-   /* kmemleak gives a false positive, as it's hidden by virt_to_phys */
-   kmemleak_ignore(desc);
-   vq->vring.desc[head].len = i * sizeof(struct vring_desc);
-
-   /* Update free pointer */
-   vq->free_head = vq->vring.desc[head].next;
+   desc = kmalloc(total_sg * sizeof(struct vring_desc), gfp);
+   if (!desc)
+   return NULL;
 
-   return head;
+   for (i = 0; i < total_sg; i++)
+   desc[i].next = i+1;
+   return desc;
 }
 
 static inline int virtqueue_add(struct virtqueue *_vq,
@@ -176,8 +130,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 {
struct vring_virtqueue *vq = to_vvq(_vq);
struct scatterlist *sg;
+   struct vring_desc *desc;
unsigned int i, n, avail, uninitialized_var(prev);
int head;
+   bool indirect;
 
  

[PATCH 2/3] virtio_ring: assume sgs are always well-formed.

2014-09-02 Thread Rusty Russell
We used to have several callers which just used arrays.  They're
gone, so we can use sg_next() everywhere, simplifying the code.

On my laptop, this slowed down vring_bench by 15%:

vring_bench before:
936153354-967745359(9.44739e+08+/-6.1e+06)ns
vring_bench after:
1061485790-1104800648(1.08254e+09+/-6.6e+06)ns

However, a more realistic test using pktgen on an AMD FX(tm)-8320 saw
a few percent improvement:

pktgen before:
  767390-792966(785159+/-6.5e+03)pps 356-367(363.75+/-2.9)Mb/sec (356068960-367936224(3.64314e+08+/-3e+06)bps) errors: 0

pktgen after:
   787781-796334(793165+/-2.4e+03)pps 365-369(367.5+/-1.2)Mb/sec (365530384-369498976(3.68028e+08+/-1.1e+06)bps) errors: 0

Signed-off-by: Rusty Russell 
---
 drivers/virtio/virtio_ring.c | 68 +---
 1 file changed, 19 insertions(+), 49 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 4d08f45a9c29..374399c62080 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -99,28 +99,10 @@ struct vring_virtqueue
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
-static inline struct scatterlist *sg_next_chained(struct scatterlist *sg,
- unsigned int *count)
-{
-   return sg_next(sg);
-}
-
-static inline struct scatterlist *sg_next_arr(struct scatterlist *sg,
- unsigned int *count)
-{
-   if (--(*count) == 0)
-   return NULL;
-   return sg + 1;
-}
-
 /* Set up an indirect table of descriptors and add it to the queue. */
 static inline int vring_add_indirect(struct vring_virtqueue *vq,
 struct scatterlist *sgs[],
-struct scatterlist *(*next)
-  (struct scatterlist *, unsigned int *),
 unsigned int total_sg,
-unsigned int total_out,
-unsigned int total_in,
 unsigned int out_sgs,
 unsigned int in_sgs,
 gfp_t gfp)
@@ -144,7 +126,7 @@ static inline int vring_add_indirect(struct vring_virtqueue *vq,
/* Transfer entries from the sg lists into the indirect page */
i = 0;
for (n = 0; n < out_sgs; n++) {
-   for (sg = sgs[n]; sg; sg = next(sg, &total_out)) {
+   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
desc[i].flags = VRING_DESC_F_NEXT;
desc[i].addr = sg_phys(sg);
desc[i].len = sg->length;
@@ -153,7 +135,7 @@ static inline int vring_add_indirect(struct vring_virtqueue *vq,
}
}
for (; n < (out_sgs + in_sgs); n++) {
-   for (sg = sgs[n]; sg; sg = next(sg, &total_in)) {
+   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
desc[i].addr = sg_phys(sg);
desc[i].len = sg->length;
@@ -186,10 +168,7 @@ static inline int vring_add_indirect(struct vring_virtqueue *vq,
 
 static inline int virtqueue_add(struct virtqueue *_vq,
struct scatterlist *sgs[],
-   struct scatterlist *(*next)
- (struct scatterlist *, unsigned int *),
-   unsigned int total_out,
-   unsigned int total_in,
+   unsigned int total_sg,
unsigned int out_sgs,
unsigned int in_sgs,
void *data,
@@ -197,7 +176,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 {
struct vring_virtqueue *vq = to_vvq(_vq);
struct scatterlist *sg;
-   unsigned int i, n, avail, uninitialized_var(prev), total_sg;
+   unsigned int i, n, avail, uninitialized_var(prev);
int head;
 
START_USE(vq);
@@ -222,13 +201,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
}
 #endif
 
-   total_sg = total_in + total_out;
-
/* If the host supports indirect descriptor tables, and we have multiple
 * buffers, then go indirect. FIXME: tune this threshold */
if (vq->indirect && total_sg > 1 && vq->vq.num_free) {
-   head = vring_add_indirect(vq, sgs, next, total_sg, total_out,
- total_in,
+   head = vring_add_indirect(vq, sgs, total_sg, 
  out_sgs, in_sgs, gfp);
if (likely(head >= 0))
goto add_head;
@@ -254,7 +230,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 
head = i =

[PATCH 0/3] virtio: simplify virtio_ring.

2014-09-02 Thread Rusty Russell
I resurrected these patches after prompting from Andy Lutomirski's
recent patches.  I put them on the back-burner because vring_bench
had a 15% slowdown on my laptop: pktgen testing revealed a speedup,
if anything, so I've cleaned them up.

Rusty Russell (3):
  virtio_net: pass well-formed sgs to virtqueue_add_*()
  virtio_ring: assume sgs are always well-formed.
  virtio_ring: unify direct/indirect code paths.

 drivers/net/virtio_net.c |   5 +-
 drivers/virtio/virtio_ring.c | 185 +++
 2 files changed, 69 insertions(+), 121 deletions(-)

-- 
1.9.1



[PATCH 1/3] virtio_net: pass well-formed sgs to virtqueue_add_*()

2014-09-02 Thread Rusty Russell
This is the only driver which doesn't hand virtqueue_add_inbuf and
virtqueue_add_outbuf a well-formed, well-terminated sg.  Fix it,
so we can make virtio_add_* simpler.

pktgen results:
modprobe pktgen
echo 'add_device eth0' > /proc/net/pktgen/kpktgend_0
echo nowait 1 > /proc/net/pktgen/eth0
echo count 100 > /proc/net/pktgen/eth0
echo clone_skb 10 > /proc/net/pktgen/eth0
echo dst_mac 4e:14:25:a9:30:ac > /proc/net/pktgen/eth0
echo dst 192.168.1.2 > /proc/net/pktgen/eth0
for i in `seq 20`; do echo start > /proc/net/pktgen/pgctrl; tail -n1 /proc/net/pktgen/eth0; done

Before:
  746547-793084(786421+/-9.6e+03)pps 346-367(364.4+/-4.4)Mb/sec (346397808-367990976(3.649e+08+/-4.5e+06)bps) errors: 0

After:
  767390-792966(785159+/-6.5e+03)pps 356-367(363.75+/-2.9)Mb/sec (356068960-367936224(3.64314e+08+/-3e+06)bps) errors: 0

Signed-off-by: Rusty Russell 
---
 drivers/net/virtio_net.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 59caa06f34a6..31cac511b3c3 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -546,8 +546,8 @@ static int add_recvbuf_small(struct receive_queue *rq, gfp_t gfp)
skb_put(skb, GOOD_PACKET_LEN);
 
hdr = skb_vnet_hdr(skb);
+   sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
sg_set_buf(rq->sg, &hdr->hdr, sizeof hdr->hdr);
-
skb_to_sgvec(skb, rq->sg + 1, 0, skb->len);
 
err = virtqueue_add_inbuf(rq->vq, rq->sg, 2, skb, gfp);
@@ -563,6 +563,8 @@ static int add_recvbuf_big(struct receive_queue *rq, gfp_t gfp)
char *p;
int i, err, offset;
 
+   sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
+
/* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
first = get_a_page(rq, gfp);
@@ -899,6 +901,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
if (vi->mergeable_rx_bufs)
hdr->mhdr.num_buffers = 0;
 
+   sg_init_table(sq->sg, MAX_SKB_FRAGS + 2);
if (can_push) {
__skb_push(skb, hdr_len);
num_sg = skb_to_sgvec(skb, sq->sg, 0, skb->len);
-- 
1.9.1



Re: [PATCH 1/3] virtio_net: pass well-formed sgs to virtqueue_add_*()

2014-09-02 Thread Andy Lutomirski
On Tue, Sep 2, 2014 at 9:29 PM, Rusty Russell  wrote:
> This is the only driver which doesn't hand virtqueue_add_inbuf and
> virtqueue_add_outbuf a well-formed, well-terminated sg.  Fix it,
> so we can make virtio_add_* simpler.

OK, I get it now: you're reinitializing the table every time, clearing
old end marks.  There's room to microoptimize this, but it's probably
not worth it.

IOW, looks good to me.

--Andy
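
A minimal sketch of the end-mark behaviour being discussed (helper name hypothetical): sg_init_table() zeroes the array and puts the termination mark on the last slot, so an end mark left behind by a previous, shorter chain can no longer stop sg_next() early.

#include <linux/scatterlist.h>

/* Sketch: build a two-entry chain in a reusable scatterlist array. */
static void fill_two_entry_chain(struct scatterlist *sg, unsigned int nents,
				 void *hdr, unsigned int hdr_len,
				 void *data, unsigned int data_len)
{
	sg_init_table(sg, nents);	/* clears any stale end mark */
	sg_set_buf(&sg[0], hdr, hdr_len);
	sg_set_buf(&sg[1], data, data_len);
	sg_mark_end(&sg[1]);		/* terminate after two entries */
}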


Setting up VHost without QEMU?

2014-09-02 Thread 가니스
Dear all,

I have a few questions. Vhost takes QEMU out of virtio device emulation, but in
KVM, vhost still needs QEMU to set up the virtqueues. Why can't that be done in
the kernel? Is it because the guest VMs are running in QEMU?

My team and I are developing our own hypervisor, and our guest VMs do not run in
QEMU. Is there any implementation out there that uses vhost without needing QEMU
at all?

Thank you for your time. Have a nice day/evening!


Ganis
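
For reference, the virtqueue setup that QEMU normally performs on vhost's behalf is just a sequence of ioctls on the vhost character device, so any userspace control process (it does not have to be QEMU) can issue them. A rough sketch for one queue of /dev/vhost-net, with error handling, feature negotiation (VHOST_GET_FEATURES/VHOST_SET_FEATURES) and the guest memory table left out; the function name is made up:

#include <fcntl.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

static int vhost_net_setup_queue(unsigned int idx, unsigned int num,
				 struct vhost_vring_addr *addr, int tap_fd)
{
	int vhost = open("/dev/vhost-net", O_RDWR);
	int kick = eventfd(0, 0);		/* guest -> host notification */
	int call = eventfd(0, 0);		/* host -> guest interrupt */

	struct vhost_vring_state num_s  = { .index = idx, .num = num };
	struct vhost_vring_state base_s = { .index = idx, .num = 0 };
	struct vhost_vring_file kick_f  = { .index = idx, .fd = kick };
	struct vhost_vring_file call_f  = { .index = idx, .fd = call };
	struct vhost_vring_file back_f  = { .index = idx, .fd = tap_fd };

	ioctl(vhost, VHOST_SET_OWNER);			/* bind the fd to this process */
	/* VHOST_SET_MEM_TABLE with the guest memory layout goes here. */
	ioctl(vhost, VHOST_SET_VRING_NUM, &num_s);	/* ring size */
	ioctl(vhost, VHOST_SET_VRING_BASE, &base_s);	/* starting index */
	ioctl(vhost, VHOST_SET_VRING_ADDR, addr);	/* desc/avail/used addresses */
	ioctl(vhost, VHOST_SET_VRING_KICK, &kick_f);
	ioctl(vhost, VHOST_SET_VRING_CALL, &call_f);
	ioctl(vhost, VHOST_NET_SET_BACKEND, &back_f);	/* attach the tap fd */
	return vhost;
}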

Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2014-09-02 Thread Rusty Russell
Andy Lutomirski  writes:
> There really are virtio devices that are pieces of silicon and not
> figments of a hypervisor's imagination [1].

Hi Andy,

As you're discovering, there's a reason no one has done the DMA
API before.

So the problem is that ppc64's IOMMU is a platform thing, not a bus
thing.  They really do carve out an exception for virtio devices,
because performance (LOTS of performance).  It remains to be seen if
other platforms have the same performance issues, but in absence of
other evidence, the answer is yes.

It's a hack.  But having specific virtual-only devices is an even
bigger hack.

Physical virtio devices have been talked about, but don't actually exist
in Real Life.  And anyone building a virtio PCI card is going to have serious
performance issues: mainly because they'll want the rings in the card's
MMIO region, not allocated by the driver.  Being broken on PPC is really
the least of their problems.

So, what do we do?  It'd be nice if Linux virtio Just Worked under Xen,
though Xen's IOMMU is outside the virtio spec.  Since virtio_pci can be
a module, obvious hacks like having xen_arch_setup initialize a dma_ops pointer
exposed by virtio_pci.c are out.

I think the best approach is to have a new feature bit (25 is free),
VIRTIO_F_USE_BUS_MAPPING which indicates that a device really wants to
use the mapping for the bus it is on.  A real device would set this,
or it won't work behind an IOMMU.  A Xen device would also set this.
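
As an illustration of how the driver side could consume such a bit, a minimal sketch (the bit number follows the proposal above and the helpers are hypothetical, not existing code):

#include <linux/dma-mapping.h>
#include <linux/io.h>
#include <linux/virtio.h>
#include <linux/virtio_config.h>

#define VIRTIO_F_USE_BUS_MAPPING 25	/* proposed above, not in any header yet */

/* Devices that negotiate the bit go through the DMA API (real silicon,
 * Xen); everything else keeps today's guest-physical addressing. */
static bool vring_use_dma_api(struct virtio_device *vdev)
{
	return virtio_has_feature(vdev, VIRTIO_F_USE_BUS_MAPPING);
}

static dma_addr_t vring_map_single(struct virtio_device *vdev, void *cpu_addr,
				   size_t size, enum dma_data_direction dir)
{
	if (vring_use_dma_api(vdev))
		return dma_map_single(vdev->dev.parent, cpu_addr, size, dir);
	return virt_to_phys(cpu_addr);
}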

Thoughts?
Rusty.

PS.  I cc'd OASIS virtio-dev: it's subscriber only for IP reasons (to
 subscribe you have to promise we can use your suggestion in the
 standard).  Feel free to remove in any replies, but it's part of
 the world we live in...