Re: [PATCH v2 1/5] virtio: add functions for piecewise addition of buffers

2013-01-10 Thread Paolo Bonzini
Il 08/01/2013 01:12, Rusty Russell ha scritto:
 >>> Unfortunately, that cannot work because not all architectures support
 >>> chained scatterlists.
>>> >> 
>>> >> WHAT?  I can't figure out what an arch needs to do to support this?
>> >
>> > It needs to use the iterator functions in its DMA driver.
> But we don't care for virtio.

True.

>>> >> All archs we care about support them, though, so I think we can ignore
>>> >> this issue for now.
>> >
>> > Kind of... In principle all QEMU-supported arches can use virtio, and
>> > the speedup can be quite useful.  And there is no Kconfig symbol for SG
>> > chains that I can use to disable virtio-scsi on unsupported arches. :/
> Well, we #error if it's not supported.  Then the lazy architectures can
> fix it.

Yeah, that would be one approach.

But frankly, your patch is really disgusting. :)  Not your fault, of
course, but I still prefer a limited amount of duplication.

Perhaps we can get the best of both worlds, I'll take a look when I have
some time.

Paolo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V3 1/2] virtio-net: fix the set affinity bug when CPU IDs are not consecutive

2013-01-10 Thread Wanlong Gao
On 01/10/2013 08:49 AM, Rusty Russell wrote:
> Wanlong Gao  writes:
>> On 01/09/2013 07:31 AM, Rusty Russell wrote:
>>> Wanlong Gao  writes:
   */
  static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff 
 *skb)
  {
 -  int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
 -smp_processor_id();
 +  int txq = 0;
 +
 +  if (skb_rx_queue_recorded(skb))
 +  txq = skb_get_rx_queue(skb);
 +  else if ((txq = per_cpu(vq_index, smp_processor_id())) == -1)
 +  txq = 0;
>>>
>>> You should use __get_cpu_var() instead of smp_processor_id() here, ie:
>>>
>>> else if ((txq = __get_cpu_var(vq_index)) == -1)
>>>
>>> And AFAICT, no reason to initialize txq to 0 to start with.
>>>
>>> So:
>>>
>>> int txq;
>>>
>>> if (skb_rx_queue_recorded(skb))
>>> txq = skb_get_rx_queue(skb);
>>> else {
>>> txq = __get_cpu_var(vq_index);
>>> if (txq == -1)
>>> txq = 0;
>>> }
>>
>> Got it, thank you.
>>
>>>
>>> Now, just to confirm, I assume this can happen even if we use vq_index,
>>> right, because of races with virtnet_set_channels?
>>
>> I still can't understand this race, could you explain more? thank you.
> 
> I assume that someone can call virtnet_set_channels() while we are
> inside virtnet_select_queue(), so they reduce dev->real_num_tx_queues,
> causing virtnet_set_channels to do:
> 
>   while (unlikely(txq >= dev->real_num_tx_queues))
>   txq -= dev->real_num_tx_queues;
> 
> Otherwise, when is this loop called?

How about just remove this loop? 

Eric, can you give a help here?

Thanks,
Wanlong Gao

> 
> Thanks,
> Rusty.
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.

2013-01-10 Thread Rusty Russell
Rusty Russell  writes:
> It basically involves moving much of vring.c into a virtio_host.c: the
> parts which actually touch the ring.  Then it provides accessors for
> vring.c to use which are __user-safe (all casts are inside
> virtio_host.c).
>
> I should have something to post by end of today, my time...

Well, that was optimistic.

I now have some lightly-tested code (via a userspace harness).  The
interface will probably change again as I try to adapt vhost to use it.

The emphasis is getting a sglist out of the vring as efficiently as
possible.  This involves some hacks: I'm still wondering if we should
move the address mapping logic into the virtio_host core, with a callout
if an address we want is outside a single range.

Not sure why vring/net doesn't built a packet and feed it in
netif_rx_ni().  This is what tun seems to do, and with this code it
should be fairly optimal.

Cheers,
Rusty.

virtio_host: host-side implementation of virtio rings.

Getting use of virtio rings correct is tricky, and a recent patch saw
an implementation of in-kernel rings (as separate from userspace).

This patch attempts to abstract the business of dealing with the
virtio ring layout from the access (userspace or direct); to do this,
we use function pointers, which gcc inlines correctly.

The new API should be more efficient than the existing vhost code,
too, since we convert directly to chained sg lists, which can be
modified in place to map the pages.

Disadvantages:
1) The spec allows chained indirect entries, we don't.  Noone does this,
   but it's not as crazy as it sounds, so perhaps we should support it.  If
   we did, we'd almost certainly invoke an function ptr call to check the
   validity of the indirect mem.

2) Getting an accessor is ugly; if it's indirect, the caller has to check
   that it's valid.  Efficient, but it's a horrible API.

No doubt this will change as I try to adapt existing vhost drivers.

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 202bba6..38ec470 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -1,6 +1,7 @@
 config VHOST_NET
tristate "Host kernel accelerator for virtio net (EXPERIMENTAL)"
depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP) && 
EXPERIMENTAL
+   select VHOST
---help---
  This kernel module can be loaded in host kernel to accelerate
  guest networking with virtio_net. Not to be confused with virtio_net
diff --git a/drivers/vhost/Kconfig.tcm b/drivers/vhost/Kconfig.tcm
index a9c6f76..f4c3704 100644
--- a/drivers/vhost/Kconfig.tcm
+++ b/drivers/vhost/Kconfig.tcm
@@ -1,6 +1,7 @@
 config TCM_VHOST
tristate "TCM_VHOST fabric module (EXPERIMENTAL)"
depends on TARGET_CORE && EVENTFD && EXPERIMENTAL && m
+   select VHOST
default n
---help---
Say M here to enable the TCM_VHOST fabric module for use with 
virtio-scsi guests
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 8d5bddb..fd95d3e 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -5,6 +5,12 @@ config VIRTIO
  bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
  CONFIG_RPMSG or CONFIG_S390_GUEST.
 
+config VHOST
+   tristate
+   ---help---
+ This option is selected by any driver which needs to access
+ the host side of a virtio ring.
+
 menu "Virtio drivers"
 
 config VIRTIO_PCI
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 9076635..9833cd5 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -2,3 +2,4 @@ obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
 obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
 obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
+obj-$(CONFIG_VHOST) += virtio_host.o
diff --git a/drivers/virtio/virtio_host.c b/drivers/virtio/virtio_host.c
new file mode 100644
index 000..169c8e2
--- /dev/null
+++ b/drivers/virtio/virtio_host.c
@@ -0,0 +1,668 @@
+/*
+ * Helpers for the host side of a virtio ring.
+ *
+ * Since these may be in userspace, we use (inline) accessors.
+ */
+#include 
+#include 
+#include 
+#include 
+
+/* An inline function, for easy cold marking. */
+static __cold bool __vringh_bad(void)
+{
+   static DEFINE_RATELIMIT_STATE(vringh_rs,
+ DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
+   return __ratelimit(&vringh_rs);
+}
+
+#define vringh_bad(fmt, ...)   \
+   do { if (__vringh_bad())\
+   printk(KERN_NOTICE "vringh: " fmt "\n", __VA_ARGS__); \
+   } while(0)
+
+/* Returns vring->num if empty, -ve on error. */
+static inline int __vringh_get_head(const struct vringh *vrh,
+   int (*getu16)(u16 *val, const u16 *p),
+   u16 *last_ava

Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.

2013-01-10 Thread Michael S. Tsirkin
On Thu, Jan 10, 2013 at 09:00:55PM +1030, Rusty Russell wrote:
> Not sure why vring/net doesn't built a packet and feed it in
> netif_rx_ni().  This is what tun seems to do, and with this code it
> should be fairly optimal.

Because we want to use NAPI.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE

2013-01-10 Thread David Vrabel
On 04/01/13 15:15, Daniel Kiper wrote:
> On Thu, Jan 03, 2013 at 09:34:55AM +, Jan Beulich wrote:
> On 27.12.12 at 03:18, Daniel Kiper  wrote:
>>> Some implementations (e.g. Xen PVOPS) could not use part of identity page 
>>> table
>>> to construct transition page table. It means that they require separate 
>>> PUDs,
>>> PMDs and PTEs for virtual and physical (identity) mapping. To satisfy that
>>> requirement add extra pointer to PGD, PUD, PMD and PTE and align existing
>>> code.
>>
>> So you keep posting this despite it having got pointed out on each
>> earlier submission that this is unnecessary, proven by the fact that
>> the non-pvops Xen kernels can get away without it. Why?
> 
> Sorry but I forgot to reply for your email last time.
> 
> I am still not convinced. I have tested SUSE kernel itself and it does not 
> work.
> Maybe I missed something but... Please check 
> arch/x86/kernel/machine_kexec_64.c:init_transition_pgtable()
> 
> I can see:
> 
> vaddr = (unsigned long)relocate_kernel;
> 
> and later:
> 
> pgd += pgd_index(vaddr);
> ...
> 
> It is wrong. relocate_kernel() virtual address in Xen is different
> than its virtual address in Linux Kernel. That is why transition
> page table could not be established in Linux Kernel and so on...
> How does this work in SUSE? I do not have an idea.

The real problem here is attempting to transition from the Xen page
tables to an identity mapping set of page tables by using some
trampoline code and page tables provided by the dom0 kernel.

This works[*] with PV because the page tables from the PV dom0 have
machine addresses and get mapped into the fixmap on kexec load, but it's
completely broken for a PVH dom0.

I shall be ditching this (bizarre) method and putting the trampoline and
transition/identity map page tables into Xen.

David

[*] Works for us in our old classic kernels, YMMV.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-10 Thread David Vrabel
On 04/01/13 17:01, Daniel Kiper wrote:
> On Fri, Jan 04, 2013 at 02:38:44PM +, David Vrabel wrote:
>> On 04/01/13 14:22, Daniel Kiper wrote:
>>> On Wed, Jan 02, 2013 at 11:26:43AM +, Andrew Cooper wrote:
 On 27/12/12 18:02, Eric W. Biederman wrote:
> Andrew Cooper  writes:
>
>> On 27/12/2012 07:53, Eric W. Biederman wrote:
>>> The syscall ABI still has the wrong semantics.
>>>
>>> Aka totally unmaintainable and umergeable.
>>>
>>> The concept of domU support is also strange.  What does domU support 
>>> even mean, when the dom0 support is loading a kernel to pick up Xen 
>>> when Xen falls over.
>> There are two requirements pulling at this patch series, but I agree
>> that we need to clarify them.
> It probably make sense to split them apart a little even.
>
>

 Thinking about this split, there might be a way to simply it even more.

 /sbin/kexec can load the "Xen" crash kernel itself by issuing
 hypercalls using /dev/xen/privcmd.  This would remove the need for
 the dom0 kernel to distinguish between loading a crash kernel for
 itself and loading a kernel for Xen.

 Or is this just a silly idea complicating the matter?
>>>
>>> This is impossible with current Xen kexec/kdump interface.
>>> It should be changed to do that. However, I suppose that
>>> Xen community would not be interested in such changes.
>>
>> I don't see why the hypercall ABI cannot be extended with new sub-ops
>> that do the right thing -- the existing ABI is a bit weird.
>>
>> I plan to start prototyping something shortly (hopefully next week) for
>> the Xen kexec case.
> 
> Wow... As I can this time Xen community is interested in...
> That is great. I agree that current kexec interface is not ideal.

I spent some more time looking at the existing interface and
implementation and it really is broken.

> David, I am happy to help in that process. However, if you wish I could
> carry it myself. Anyway, it looks that I should hold on with my
> Linux kexec/kdump patches.

I should be able to post some prototype patches for Xen in a few weeks.
 No guarantees though.

> My .5 cents:
>   - We should focus on KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload;
> probably we should introduce KEXEC_CMD_kexec_load2 and 
> KEXEC_CMD_kexec_unload2;
> load should __LOAD__ kernel image and other things into hypervisor memory;

Yes, but I don't see how we can easily support both ABIs easily.  I'd be
in favour of replacing the existing hypercalls and requiring updated
kexec tools in dom0 (this isn't that different to requiring the correct
libxc in dom0).

> I suppose that allmost all things could be copied from 
> linux/kernel/kexec.c,
> linux/arch/x86/kernel/{machine_kexec_$(BITS).c,relocate_kernel_$(BITS).c};
> I think that KEXEC_CMD_kexec should stay as is,

I don't think we want all the junk from Linux inside Xen -- we only want
to support the kdump case and do not have to handle returning from the
kexec image.

>   - Hmmm... Now I think that we should still use kexec syscall to load image
> into Xen memory (with new KEXEC_CMD_kexec_load2) because it establishes
> all things which are needed to call kdump if dom0 crashes; however,
> I could be wrong...

I don't think we need the kexec syscall.  The kernel can unconditionally
do the crash hypercall, which will return if the kdump kernel isn't
loaded and the kernel can fall back to the regular non-kexec panic.

This will allow the kexec syscall to be used only for the domU kexec case.

>   - last but not least, we should think about support for PV guests
> too.

I won't be looking at this.

To avoid confusion about the two largely orthogonal sorts of kexec how
about defining some terms.  I suggest:

Xen kexec: Xen executes the image in response to a Xen crash or a
hypercall from a privileged domain.

Guest kexec: The guest kernel executes the images within the domain in
response to a guest kernel crash or a system call.

David
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC PATCH 0/2] make mac programming for virtio net more robust

2013-01-10 Thread akong
From: Amos Kong 

Currenly mac is programmed byte by byte. This means that we
have an intermediate step where mac is wrong. 

Second patch introduced a new vq control command to set mac
address in one time.

Amos Kong (2):
  move virtnet_send_command() above virtnet_set_mac_address()
  virtio-net: introduce a new control to set macaddr

 drivers/net/virtio_net.c| 105 ++--
 include/uapi/linux/virtio_net.h |   8 ++-
 2 files changed, 66 insertions(+), 47 deletions(-)

-- 
1.7.11.7

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC PATCH 1/2] move virtnet_send_command() above virtnet_set_mac_address()

2013-01-10 Thread akong
From: Amos Kong 

We will send vq command to set mac address in virtnet_set_mac_address()
a little fix of coding style

Signed-off-by: Amos Kong 
---
 drivers/net/virtio_net.c | 89 
 1 file changed, 44 insertions(+), 45 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a6fcf15..395ab4f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -753,6 +753,50 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
net_device *dev)
return NETDEV_TX_OK;
 }
 
+/*
+ * Send command via the control virtqueue and check status.  Commands
+ * supported by the hypervisor, as indicated by feature bits, should
+ * never fail unless improperly formated.
+ */
+static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
+struct scatterlist *data, int out, int in)
+{
+   struct scatterlist *s, sg[VIRTNET_SEND_COMMAND_SG_MAX + 2];
+   struct virtio_net_ctrl_hdr ctrl;
+   virtio_net_ctrl_ack status = ~0;
+   unsigned int tmp;
+   int i;
+
+   /* Caller should know better */
+   BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ||
+   (out + in > VIRTNET_SEND_COMMAND_SG_MAX));
+
+   out++; /* Add header */
+   in++; /* Add return status */
+
+   ctrl.class = class;
+   ctrl.cmd = cmd;
+
+   sg_init_table(sg, out + in);
+
+   sg_set_buf(&sg[0], &ctrl, sizeof(ctrl));
+   for_each_sg(data, s, out + in - 2, i)
+   sg_set_buf(&sg[i + 1], sg_virt(s), s->length);
+   sg_set_buf(&sg[out + in - 1], &status, sizeof(status));
+
+   BUG_ON(virtqueue_add_buf(vi->cvq, sg, out, in, vi, GFP_ATOMIC) < 0);
+
+   virtqueue_kick(vi->cvq);
+
+   /* Spin for a response, the kick causes an ioport write, trapping
+* into the hypervisor, so the request should be handled immediately.
+*/
+   while (!virtqueue_get_buf(vi->cvq, &tmp))
+   cpu_relax();
+
+   return status == VIRTIO_NET_OK;
+}
+
 static int virtnet_set_mac_address(struct net_device *dev, void *p)
 {
struct virtnet_info *vi = netdev_priv(dev);
@@ -819,51 +863,6 @@ static void virtnet_netpoll(struct net_device *dev)
 }
 #endif
 
-/*
- * Send command via the control virtqueue and check status.  Commands
- * supported by the hypervisor, as indicated by feature bits, should
- * never fail unless improperly formated.
- */
-static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
-struct scatterlist *data, int out, int in)
-{
-   struct scatterlist *s, sg[VIRTNET_SEND_COMMAND_SG_MAX + 2];
-   struct virtio_net_ctrl_hdr ctrl;
-   virtio_net_ctrl_ack status = ~0;
-   unsigned int tmp;
-   int i;
-
-   /* Caller should know better */
-   BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ||
-   (out + in > VIRTNET_SEND_COMMAND_SG_MAX));
-
-   out++; /* Add header */
-   in++; /* Add return status */
-
-   ctrl.class = class;
-   ctrl.cmd = cmd;
-
-   sg_init_table(sg, out + in);
-
-   sg_set_buf(&sg[0], &ctrl, sizeof(ctrl));
-   for_each_sg(data, s, out + in - 2, i)
-   sg_set_buf(&sg[i + 1], sg_virt(s), s->length);
-   sg_set_buf(&sg[out + in - 1], &status, sizeof(status));
-
-   BUG_ON(virtqueue_add_buf(vi->cvq, sg, out, in, vi, GFP_ATOMIC) < 0);
-
-   virtqueue_kick(vi->cvq);
-
-   /*
-* Spin for a response, the kick causes an ioport write, trapping
-* into the hypervisor, so the request should be handled immediately.
-*/
-   while (!virtqueue_get_buf(vi->cvq, &tmp))
-   cpu_relax();
-
-   return status == VIRTIO_NET_OK;
-}
-
 static void virtnet_ack_link_announce(struct virtnet_info *vi)
 {
rtnl_lock();
-- 
1.7.11.7

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC PATCH 2/2] virtio-net: introduce a new control to set macaddr

2013-01-10 Thread akong
From: Amos Kong 

Currently we write MAC address to pci config space byte by byte,
this means that we have an intermediate step where mac is wrong.
This patch introduced a new control command to set MAC address
in one time.

VIRTIO_NET_F_CTRL_MAC_ADDR is a new feature bit for compatibility.

Signed-off-by: Amos Kong 
---
 drivers/net/virtio_net.c| 16 +++-
 include/uapi/linux/virtio_net.h |  8 +++-
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 395ab4f..ff22bcd 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -803,13 +803,26 @@ static int virtnet_set_mac_address(struct net_device 
*dev, void *p)
struct virtio_device *vdev = vi->vdev;
int ret;
 
+   struct scatterlist sg;
+
ret = eth_mac_addr(dev, p);
if (ret)
return ret;
 
-   if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
+   if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
+   /* Set MAC address by sending vq command */
+   sg_init_one(&sg, dev->dev_addr, dev->addr_len);
+   virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC,
+VIRTIO_NET_CTRL_MAC_ADDR_SET,
+&sg, 1, 0);
+   return 0;
+   }
+
+   if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) {
+   /* Set MAC address by writing config space */
vdev->config->set(vdev, offsetof(struct virtio_net_config, mac),
  dev->dev_addr, dev->addr_len);
+   }
 
return 0;
 }
@@ -1627,6 +1640,7 @@ static unsigned int features[] = {
VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
+   VIRTIO_NET_F_CTRL_MAC_ADDR,
 };
 
 static struct virtio_driver virtio_net_driver = {
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index 848e358..a5a8c88 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -53,6 +53,7 @@
 * network */
 #define VIRTIO_NET_F_MQ22  /* Device supports Receive Flow
 * Steering */
+#define VIRTIO_NET_F_CTRL_MAC_ADDR 23  /* Set MAC address */
 
 #define VIRTIO_NET_S_LINK_UP   1   /* Link is up */
 #define VIRTIO_NET_S_ANNOUNCE  2   /* Announcement is needed */
@@ -127,7 +128,7 @@ typedef __u8 virtio_net_ctrl_ack;
  #define VIRTIO_NET_CTRL_RX_NOBCAST  5
 
 /*
- * Control the MAC filter table.
+ * Control the MAC
  *
  * The MAC filter table is managed by the hypervisor, the guest should
  * assume the size is infinite.  Filtering should be considered
@@ -140,6 +141,10 @@ typedef __u8 virtio_net_ctrl_ack;
  * first sg list contains unicast addresses, the second is for multicast.
  * This functionality is present if the VIRTIO_NET_F_CTRL_RX feature
  * is available.
+ *
+ * The ADDR_SET command requests one out scatterlist, it contains a
+ * 6 bytes MAC address. This functionality is present if the
+ * VIRTIO_NET_F_CTRL_MAC_ADDR feature is available.
  */
 struct virtio_net_ctrl_mac {
__u32 entries;
@@ -148,6 +153,7 @@ struct virtio_net_ctrl_mac {
 
 #define VIRTIO_NET_CTRL_MAC1
  #define VIRTIO_NET_CTRL_MAC_TABLE_SET0
+ #define VIRTIO_NET_CTRL_MAC_ADDR_SET 1
 
 /*
  * Control VLAN filtering
-- 
1.7.11.7

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC PATCH] virtio-net: introduce a new macaddr control

2013-01-10 Thread akong
From: Amos Kong 

In virtio-net guest driver, currently we write MAC address to
pci config space byte by byte, this means that we have an
intermediate step where mac is wrong. This patch introduced
a new control command to set MAC address in one time.

VIRTIO_NET_F_CTRL_MAC_ADDR is a new feature bit for compatibility.

Signed-off-by: Amos Kong 
---
 hw/virtio-net.c | 9 +
 hw/virtio-net.h | 9 -
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index dc7c6d6..fc11106 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -247,6 +247,7 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev, 
uint32_t features)
 VirtIONet *n = to_virtio_net(vdev);
 
 features |= (1 << VIRTIO_NET_F_MAC);
+features |= (1 << VIRTIO_NET_F_CTRL_MAC_ADDR);
 
 if (!peer_has_vnet_hdr(n)) {
 features &= ~(0x1 << VIRTIO_NET_F_CSUM);
@@ -282,6 +283,7 @@ static uint32_t virtio_net_bad_features(VirtIODevice *vdev)
 /* Linux kernel 2.6.25.  It understood MAC (as everyone must),
  * but also these: */
 features |= (1 << VIRTIO_NET_F_MAC);
+features |= (1 << VIRTIO_NET_F_CTRL_MAC_ADDR);
 features |= (1 << VIRTIO_NET_F_CSUM);
 features |= (1 << VIRTIO_NET_F_HOST_TSO4);
 features |= (1 << VIRTIO_NET_F_HOST_TSO6);
@@ -349,6 +351,13 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
 {
 struct virtio_net_ctrl_mac mac_data;
 
+if (cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET && elem->out_num == 2) {
+/* Set MAC address */
+memcpy(n->mac, elem->out_sg[1].iov_base, elem->out_sg[1].iov_len);
+qemu_format_nic_info_str(&n->nic->nc, n->mac);
+return VIRTIO_NET_OK;
+}
+
 if (cmd != VIRTIO_NET_CTRL_MAC_TABLE_SET || elem->out_num != 3 ||
 elem->out_sg[1].iov_len < sizeof(mac_data) ||
 elem->out_sg[2].iov_len < sizeof(mac_data))
diff --git a/hw/virtio-net.h b/hw/virtio-net.h
index d46fb98..9394cc0 100644
--- a/hw/virtio-net.h
+++ b/hw/virtio-net.h
@@ -44,6 +44,8 @@
 #define VIRTIO_NET_F_CTRL_VLAN  19  /* Control channel VLAN filtering */
 #define VIRTIO_NET_F_CTRL_RX_EXTRA 20   /* Extra RX mode control support */
 
+#define VIRTIO_NET_F_CTRL_MAC_ADDR   23 /* Set MAC address */
+
 #define VIRTIO_NET_S_LINK_UP1   /* Link is up */
 
 #define TX_TIMER_INTERVAL 15 /* 150 us */
@@ -106,7 +108,7 @@ typedef uint8_t virtio_net_ctrl_ack;
  #define VIRTIO_NET_CTRL_RX_MODE_NOBCAST  5
 
 /*
- * Control the MAC filter table.
+ * Control the MAC
  *
  * The MAC filter table is managed by the hypervisor, the guest should
  * assume the size is infinite.  Filtering should be considered
@@ -119,6 +121,10 @@ typedef uint8_t virtio_net_ctrl_ack;
  * first sg list contains unicast addresses, the second is for multicast.
  * This functionality is present if the VIRTIO_NET_F_CTRL_RX feature
  * is available.
+ *
+ * The ADDR_SET command requests one out scatterlist, it contains a
+ * 6 bytes MAC address. This functionality is present if the
+ * VIRTIO_NET_F_CTRL_MAC_ADDR feature is available.
  */
 struct virtio_net_ctrl_mac {
 uint32_t entries;
@@ -126,6 +132,7 @@ struct virtio_net_ctrl_mac {
 };
 #define VIRTIO_NET_CTRL_MAC1
  #define VIRTIO_NET_CTRL_MAC_TABLE_SET0
+ #define VIRTIO_NET_CTRL_MAC_ADDR_SET 1
 
 /*
  * Control VLAN filtering
-- 
1.7.11.7

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 1/2] move virtnet_send_command() above virtnet_set_mac_address()

2013-01-10 Thread Jason Wang
On 01/10/2013 10:45 PM, ak...@redhat.com wrote:
> From: Amos Kong 
>
> We will send vq command to set mac address in virtnet_set_mac_address()
> a little fix of coding style

Maybe what you need is just a forward declaration.
>
> Signed-off-by: Amos Kong 
> ---
>  drivers/net/virtio_net.c | 89 
> 
>  1 file changed, 44 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index a6fcf15..395ab4f 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -753,6 +753,50 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, 
> struct net_device *dev)
>   return NETDEV_TX_OK;
>  }
>  
> +/*
> + * Send command via the control virtqueue and check status.  Commands
> + * supported by the hypervisor, as indicated by feature bits, should
> + * never fail unless improperly formated.
> + */
> +static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
> +  struct scatterlist *data, int out, int in)
> +{
> + struct scatterlist *s, sg[VIRTNET_SEND_COMMAND_SG_MAX + 2];
> + struct virtio_net_ctrl_hdr ctrl;
> + virtio_net_ctrl_ack status = ~0;
> + unsigned int tmp;
> + int i;
> +
> + /* Caller should know better */
> + BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ||
> + (out + in > VIRTNET_SEND_COMMAND_SG_MAX));
> +
> + out++; /* Add header */
> + in++; /* Add return status */
> +
> + ctrl.class = class;
> + ctrl.cmd = cmd;
> +
> + sg_init_table(sg, out + in);
> +
> + sg_set_buf(&sg[0], &ctrl, sizeof(ctrl));
> + for_each_sg(data, s, out + in - 2, i)
> + sg_set_buf(&sg[i + 1], sg_virt(s), s->length);
> + sg_set_buf(&sg[out + in - 1], &status, sizeof(status));
> +
> + BUG_ON(virtqueue_add_buf(vi->cvq, sg, out, in, vi, GFP_ATOMIC) < 0);
> +
> + virtqueue_kick(vi->cvq);
> +
> + /* Spin for a response, the kick causes an ioport write, trapping
> +  * into the hypervisor, so the request should be handled immediately.
> +  */
> + while (!virtqueue_get_buf(vi->cvq, &tmp))
> + cpu_relax();
> +
> + return status == VIRTIO_NET_OK;
> +}
> +
>  static int virtnet_set_mac_address(struct net_device *dev, void *p)
>  {
>   struct virtnet_info *vi = netdev_priv(dev);
> @@ -819,51 +863,6 @@ static void virtnet_netpoll(struct net_device *dev)
>  }
>  #endif
>  
> -/*
> - * Send command via the control virtqueue and check status.  Commands
> - * supported by the hypervisor, as indicated by feature bits, should
> - * never fail unless improperly formated.
> - */
> -static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
> -  struct scatterlist *data, int out, int in)
> -{
> - struct scatterlist *s, sg[VIRTNET_SEND_COMMAND_SG_MAX + 2];
> - struct virtio_net_ctrl_hdr ctrl;
> - virtio_net_ctrl_ack status = ~0;
> - unsigned int tmp;
> - int i;
> -
> - /* Caller should know better */
> - BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ||
> - (out + in > VIRTNET_SEND_COMMAND_SG_MAX));
> -
> - out++; /* Add header */
> - in++; /* Add return status */
> -
> - ctrl.class = class;
> - ctrl.cmd = cmd;
> -
> - sg_init_table(sg, out + in);
> -
> - sg_set_buf(&sg[0], &ctrl, sizeof(ctrl));
> - for_each_sg(data, s, out + in - 2, i)
> - sg_set_buf(&sg[i + 1], sg_virt(s), s->length);
> - sg_set_buf(&sg[out + in - 1], &status, sizeof(status));
> -
> - BUG_ON(virtqueue_add_buf(vi->cvq, sg, out, in, vi, GFP_ATOMIC) < 0);
> -
> - virtqueue_kick(vi->cvq);
> -
> - /*
> -  * Spin for a response, the kick causes an ioport write, trapping
> -  * into the hypervisor, so the request should be handled immediately.
> -  */
> - while (!virtqueue_get_buf(vi->cvq, &tmp))
> - cpu_relax();
> -
> - return status == VIRTIO_NET_OK;
> -}
> -
>  static void virtnet_ack_link_announce(struct virtnet_info *vi)
>  {
>   rtnl_lock();

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Qemu-devel] [RFC PATCH 2/2] virtio-net: introduce a new control to set macaddr

2013-01-10 Thread Jason Wang
On 01/10/2013 10:45 PM, ak...@redhat.com wrote:
> From: Amos Kong 
>
> Currently we write MAC address to pci config space byte by byte,
> this means that we have an intermediate step where mac is wrong.
> This patch introduced a new control command to set MAC address
> in one time.
>
> VIRTIO_NET_F_CTRL_MAC_ADDR is a new feature bit for compatibility.
>
> Signed-off-by: Amos Kong 
> ---
>  drivers/net/virtio_net.c| 16 +++-
>  include/uapi/linux/virtio_net.h |  8 +++-
>  2 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 395ab4f..ff22bcd 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -803,13 +803,26 @@ static int virtnet_set_mac_address(struct net_device 
> *dev, void *p)
>   struct virtio_device *vdev = vi->vdev;
>   int ret;
>  
> + struct scatterlist sg;
> +
>   ret = eth_mac_addr(dev, p);
>   if (ret)
>   return ret;
>  
> - if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
> + if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
> + /* Set MAC address by sending vq command */
> + sg_init_one(&sg, dev->dev_addr, dev->addr_len);
> + virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC,
> +  VIRTIO_NET_CTRL_MAC_ADDR_SET,
> +  &sg, 1, 0);
> + return 0;
> + }
> +

Better to check the return of virtnet_send_command() and give a warn
like other command. Btw, need we fail back to try the old way then?
> + if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) {
> + /* Set MAC address by writing config space */
>   vdev->config->set(vdev, offsetof(struct virtio_net_config, mac),
> dev->dev_addr, dev->addr_len);
> + }
>  
>   return 0;
>  }
> @@ -1627,6 +1640,7 @@ static unsigned int features[] = {
>   VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
>   VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
>   VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
> + VIRTIO_NET_F_CTRL_MAC_ADDR,
>  };
>  
>  static struct virtio_driver virtio_net_driver = {
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index 848e358..a5a8c88 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -53,6 +53,7 @@
>* network */
>  #define VIRTIO_NET_F_MQ  22  /* Device supports Receive Flow
>* Steering */
> +#define VIRTIO_NET_F_CTRL_MAC_ADDR 23/* Set MAC address */
>  
>  #define VIRTIO_NET_S_LINK_UP 1   /* Link is up */
>  #define VIRTIO_NET_S_ANNOUNCE2   /* Announcement is needed */
> @@ -127,7 +128,7 @@ typedef __u8 virtio_net_ctrl_ack;
>   #define VIRTIO_NET_CTRL_RX_NOBCAST  5
>  
>  /*
> - * Control the MAC filter table.
> + * Control the MAC
>   *
>   * The MAC filter table is managed by the hypervisor, the guest should
>   * assume the size is infinite.  Filtering should be considered
> @@ -140,6 +141,10 @@ typedef __u8 virtio_net_ctrl_ack;
>   * first sg list contains unicast addresses, the second is for multicast.
>   * This functionality is present if the VIRTIO_NET_F_CTRL_RX feature
>   * is available.
> + *
> + * The ADDR_SET command requests one out scatterlist, it contains a
> + * 6 bytes MAC address. This functionality is present if the
> + * VIRTIO_NET_F_CTRL_MAC_ADDR feature is available.
>   */
>  struct virtio_net_ctrl_mac {
>   __u32 entries;
> @@ -148,6 +153,7 @@ struct virtio_net_ctrl_mac {
>  
>  #define VIRTIO_NET_CTRL_MAC1
>   #define VIRTIO_NET_CTRL_MAC_TABLE_SET0
> + #define VIRTIO_NET_CTRL_MAC_ADDR_SET 1
>  
>  /*
>   * Control VLAN filtering

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 1/2] move virtnet_send_command() above virtnet_set_mac_address()

2013-01-10 Thread Michael S. Tsirkin
On Thu, Jan 10, 2013 at 10:51:52PM +0800, Jason Wang wrote:
> On 01/10/2013 10:45 PM, ak...@redhat.com wrote:
> > From: Amos Kong 
> >
> > We will send vq command to set mac address in virtnet_set_mac_address()
> > a little fix of coding style
> 
> Maybe what you need is just a forward declaration.


Nah functions should be ordered sensibly.

> >
> > Signed-off-by: Amos Kong 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 0/2] make mac programming for virtio net more robust

2013-01-10 Thread Amos Kong
On Thu, Jan 10, 2013 at 10:45:39PM +0800, ak...@redhat.com wrote:
> From: Amos Kong 
> 
> Currenly mac is programmed byte by byte. This means that we
> have an intermediate step where mac is wrong. 
> 
> Second patch introduced a new vq control command to set mac
> address in one time.

I just posted RFC patches (kernel & qemu) out,
specifiction update will be sent when the solution
is accepted.

MST has another solution:
1. add a feature to disable mac address in pci space
2. introduce a new vq control cmd to add new mac address
   to mac filter table. (no long check n->mac in receiver_filter())

Welcome to give comments, thanks!
Amos
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 2/2] virtio-net: introduce a new control to set macaddr

2013-01-10 Thread Michael S. Tsirkin
On Thu, Jan 10, 2013 at 10:45:41PM +0800, ak...@redhat.com wrote:
> From: Amos Kong 
> 
> Currently we write MAC address to pci config space byte by byte,
> this means that we have an intermediate step where mac is wrong.
> This patch introduced a new control command to set MAC address
> in one time.
> 
> VIRTIO_NET_F_CTRL_MAC_ADDR is a new feature bit for compatibility.
> 
> Signed-off-by: Amos Kong 
> ---
>  drivers/net/virtio_net.c| 16 +++-
>  include/uapi/linux/virtio_net.h |  8 +++-
>  2 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 395ab4f..ff22bcd 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -803,13 +803,26 @@ static int virtnet_set_mac_address(struct net_device 
> *dev, void *p)
>   struct virtio_device *vdev = vi->vdev;
>   int ret;
>  
> + struct scatterlist sg;
> +
>   ret = eth_mac_addr(dev, p);
>   if (ret)
>   return ret;
>  
> - if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
> + if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
> + /* Set MAC address by sending vq command */
> + sg_init_one(&sg, dev->dev_addr, dev->addr_len);
> + virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC,
> +  VIRTIO_NET_CTRL_MAC_ADDR_SET,
> +  &sg, 1, 0);

I think we should check status and return an error if the command failed.
Also, delay eth_mac_addr until after it passes.

> + return 0;
> + }
> +
> + if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) {
> + /* Set MAC address by writing config space */
>   vdev->config->set(vdev, offsetof(struct virtio_net_config, mac),
> dev->dev_addr, dev->addr_len);
> + }
>  

By the way, why don't we fail the command if VIRTIO_NET_F_MAC is off?
Rusty?

>   return 0;
>  }
> @@ -1627,6 +1640,7 @@ static unsigned int features[] = {
>   VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
>   VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
>   VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
> + VIRTIO_NET_F_CTRL_MAC_ADDR,
>  };
>  
>  static struct virtio_driver virtio_net_driver = {
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index 848e358..a5a8c88 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -53,6 +53,7 @@
>* network */
>  #define VIRTIO_NET_F_MQ  22  /* Device supports Receive Flow
>* Steering */
> +#define VIRTIO_NET_F_CTRL_MAC_ADDR 23/* Set MAC address */
>  
>  #define VIRTIO_NET_S_LINK_UP 1   /* Link is up */
>  #define VIRTIO_NET_S_ANNOUNCE2   /* Announcement is needed */
> @@ -127,7 +128,7 @@ typedef __u8 virtio_net_ctrl_ack;
>   #define VIRTIO_NET_CTRL_RX_NOBCAST  5
>  
>  /*
> - * Control the MAC filter table.
> + * Control the MAC
>   *
>   * The MAC filter table is managed by the hypervisor, the guest should
>   * assume the size is infinite.  Filtering should be considered
> @@ -140,6 +141,10 @@ typedef __u8 virtio_net_ctrl_ack;
>   * first sg list contains unicast addresses, the second is for multicast.
>   * This functionality is present if the VIRTIO_NET_F_CTRL_RX feature
>   * is available.
> + *
> + * The ADDR_SET command requests one out scatterlist, it contains a
> + * 6 bytes MAC address. This functionality is present if the
> + * VIRTIO_NET_F_CTRL_MAC_ADDR feature is available.
>   */
>  struct virtio_net_ctrl_mac {
>   __u32 entries;
> @@ -148,6 +153,7 @@ struct virtio_net_ctrl_mac {
>  
>  #define VIRTIO_NET_CTRL_MAC1
>   #define VIRTIO_NET_CTRL_MAC_TABLE_SET0
> + #define VIRTIO_NET_CTRL_MAC_ADDR_SET 1
>  
>  /*
>   * Control VLAN filtering
> -- 
> 1.7.11.7
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 0/2] make mac programming for virtio net more robust

2013-01-10 Thread Michael S. Tsirkin
On Thu, Jan 10, 2013 at 10:45:39PM +0800, ak...@redhat.com wrote:
> From: Amos Kong 
> 
> Currenly mac is programmed byte by byte. This means that we
> have an intermediate step where mac is wrong. 
> 
> Second patch introduced a new vq control command to set mac
> address in one time.

As you mention we could alternatively do it without
new commands, simply add a feature bit that says that MACs are
in the mac table.
This would be a much bigger patch, and I'm fine with either way.
Rusty what do you think?

> Amos Kong (2):
>   move virtnet_send_command() above virtnet_set_mac_address()
>   virtio-net: introduce a new control to set macaddr
> 
>  drivers/net/virtio_net.c| 105 
> ++--
>  include/uapi/linux/virtio_net.h |   8 ++-
>  2 files changed, 66 insertions(+), 47 deletions(-)
> 
> -- 
> 1.7.11.7
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.

2013-01-10 Thread Sjur Brændeland
Hi Rusty,

On Thu, Jan 10, 2013 at 11:30 AM, Rusty Russell  wrote:
...
>I now have some lightly-tested code (via a userspace harness).

Great - thank you for looking into this. I will start integrating this
with my patches
when you send out a proper patch.

...
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 8d5bddb..fd95d3e 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -5,6 +5,12 @@ config VIRTIO
>   bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
>   CONFIG_RPMSG or CONFIG_S390_GUEST.
>
> +config VHOST
> +   tristate

Inclusion of drivers/virtio from drivers/Makefile depends on VIRTIO.
So I guess VHOST should select VIRTIO to ensure that
drivers/virtio/virtio_host.c
is part of the build.

> +   ---help---
> + This option is selected by any driver which needs to access
> + the host side of a virtio ring.
> +
...
> +/* Returns vring->num if empty, -ve on error. */
> +static inline int __vringh_get_head(const struct vringh *vrh,
> +   int (*getu16)(u16 *val, const u16 *p),
> +   u16 *last_avail_idx)
> +{
> +   u16 avail_idx, i, head;
> +   int err;
> +
> +   err = getu16(&avail_idx, &vrh->vring.avail->idx);
> +   if (err) {
> +   vringh_bad("Failed to access avail idx at %p",
> +  &vrh->vring.avail->idx);
> +   return err;
> +   }
> +
> +   err = getu16(last_avail_idx, &vring_avail_event(&vrh->vring));
> +   if (err) {
> +   vringh_bad("Failed to access last avail idx at %p",
> +  &vring_avail_event(&vrh->vring));
> +   return err;
> +   }
> +
> +   if (*last_avail_idx == avail_idx)
> +   return vrh->vring.num;
> +
> +   /* Only get avail ring entries after they have been exposed by guest. 
> */
> +   smp_rmb();

We are accessing memory shared with a remote device (modem), so we probably
need mandatory barriers here, e.g. something like the virtio_rmb
defined in virtio_ring.c.

> +
> +   i = *last_avail_idx & (vrh->vring.num - 1);
> +
> +   err = getu16(&head, &vrh->vring.avail->ring[i]);
> +   if (err) {
> +   vringh_bad("Failed to read head: idx %d address %p",
> +  *last_avail_idx, &vrh->vring.avail->ring[i]);
> +   return err;
> +   }
> +
> +   if (head >= vrh->vring.num) {
> +   vringh_bad("Guest says index %u > %u is available",
> +  head, vrh->vring.num);
> +   return -EINVAL;
> +   }
> +   return head;
> +}
...
> +static inline int
> +__vringh_sg(struct vringh_acc *acc,
> +   struct vringh_sg *vsg,
> +   unsigned max,
> +   u16 write_flag,
> +   gfp_t gfp,
> +   int (*getdesc)(struct vring_desc *dst, const struct vring_desc 
> *s))
> +{
> +   unsigned count = 0, num_descs = 0;
> +   struct vringh_sg *orig_vsg = vsg;
> +   int err;
> +
> +   /* This sends end marker on sg[max-1], so we know when to chain. */
> +   if (max)
> +   sg_init_table(&vsg->sg, max);
> +
> +   for (;;) {
> +   /* Exhausted this descriptor?  Read next. */
> +   if (acc->desc.len == 0) {
> +   if (!(acc->desc.flags & VRING_DESC_F_NEXT))
> +   break;
> +
> +   if (num_descs++ == acc->max) {
> +   err = -ELOOP;
> +   goto fail;
> +   }
> +
> +   if (acc->desc.next >= acc->max) {
> +   vringh_bad("Chained index %u > %u",
> +  acc->desc.next, acc->max);
> +   err = -EINVAL;
> +   goto fail;
> +   }
> +
> +   acc->idx = acc->desc.next;
> +   err = getdesc(&acc->desc, acc->start + acc->idx);
> +   if (unlikely(err))
> +   goto fail;
> +   }
> +
> +   if (unlikely(!max)) {
> +   vringh_bad("Unexpected %s descriptor",
> +  write_flag ? "writable" : "readable");
> +   return -EINVAL;
> +   }
> +
> +   /* No more readable/writable descriptors? */
> +   if ((acc->desc.flags & VRING_DESC_F_WRITE) != write_flag) {
> +   /* We should not have readable after writable */
> +   if (write_flag) {
> +   vringh_bad("Readable desc %p after writable",
> +  acc->start + acc->idx);
> +   err = -EINVAL;
> +   goto fail;
> +   

Re: [PATCH V3 1/2] virtio-net: fix the set affinity bug when CPU IDs are not consecutive

2013-01-10 Thread Ben Hutchings
On Thu, 2013-01-10 at 11:19 +1030, Rusty Russell wrote:
> Wanlong Gao  writes:
> > On 01/09/2013 07:31 AM, Rusty Russell wrote:
> >> Wanlong Gao  writes:
> >>>   */
> >>>  static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff 
> >>> *skb)
> >>>  {
> >>> - int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
> >>> -   smp_processor_id();
> >>> + int txq = 0;
> >>> +
> >>> + if (skb_rx_queue_recorded(skb))
> >>> + txq = skb_get_rx_queue(skb);
> >>> + else if ((txq = per_cpu(vq_index, smp_processor_id())) == -1)
> >>> + txq = 0;
> >> 
> >> You should use __get_cpu_var() instead of smp_processor_id() here, ie:
> >> 
> >> else if ((txq = __get_cpu_var(vq_index)) == -1)
> >> 
> >> And AFAICT, no reason to initialize txq to 0 to start with.
> >> 
> >> So:
> >> 
> >> int txq;
> >> 
> >> if (skb_rx_queue_recorded(skb))
> >>txq = skb_get_rx_queue(skb);
> >> else {
> >> txq = __get_cpu_var(vq_index);
> >> if (txq == -1)
> >> txq = 0;
> >> }
> >
> > Got it, thank you.
> >
> >> 
> >> Now, just to confirm, I assume this can happen even if we use vq_index,
> >> right, because of races with virtnet_set_channels?
> >
> > I still can't understand this race, could you explain more? thank you.
> 
> I assume that someone can call virtnet_set_channels() while we are
> inside virtnet_select_queue(), so they reduce dev->real_num_tx_queues,
> causing virtnet_set_channels to do:
> 
>   while (unlikely(txq >= dev->real_num_tx_queues))
>   txq -= dev->real_num_tx_queues;
> 
> Otherwise, when is this loop called?

In fact, this race can result in the TX scheduler using a queue that has
been disabled, or other weirdness (consider what happens if
real_num_tx_queues increases between those two uses).

virtnet_set_channels() really must disable TX temporarily:

netif_tx_lock(dev);
netif_device_detach(dev);
netif_tx_unlock(dev);
...
netif_device_attach(dev);

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: linux-next: Tree for Jan 10 (vmci)

2013-01-10 Thread Randy Dunlap
On 01/10/13 11:17, Randy Dunlap wrote:
> On 01/09/13 19:32, Stephen Rothwell wrote:
>> Hi all,
>>
>> Changes since 20130109:
>>
> 
> 
> on i386, when CONFIG_PCI is not enabled:
> 
>   CC [M]  drivers/misc/vmw_vmci/vmci_guest.o
> drivers/misc/vmw_vmci/vmci_guest.c:58:20: error: array type has incomplete 
> element type
> drivers/misc/vmw_vmci/vmci_guest.c: In function 'vmci_enable_msix':
> drivers/misc/vmw_vmci/vmci_guest.c:384:2: error: implicit declaration of 
> function 'pci_enable_msix' [-Werror=implicit-function-declaration]
> drivers/misc/vmw_vmci/vmci_guest.c: In function 'vmci_guest_probe_device':
> drivers/misc/vmw_vmci/vmci_guest.c:466:2: error: implicit declaration of 
> function 'pcim_enable_device' [-Werror=implicit-function-declaration]
> drivers/misc/vmw_vmci/vmci_guest.c:592:2: error: implicit declaration of 
> function 'pci_enable_msi' [-Werror=implicit-function-declaration]
> drivers/misc/vmw_vmci/vmci_guest.c:654:3: error: implicit declaration of 
> function 'pci_disable_msix' [-Werror=implicit-function-declaration]
> drivers/misc/vmw_vmci/vmci_guest.c:656:3: error: implicit declaration of 
> function 'pci_disable_msi' [-Werror=implicit-function-declaration]
> cc1: some warnings being treated as errors
> make[4]: *** [drivers/misc/vmw_vmci/vmci_guest.o] Error 1
> 
> 
> Should all of vmci depend on PCI ??
> 
> 


Also on i386, when CONFIG_BLOCK is not enabled:

  CC [M]  drivers/misc/vmw_vmci/vmci_queue_pair.o
In file included from drivers/misc/vmw_vmci/vmci_queue_pair.c:16:0:
include/linux/device-mapper.h:48:56: warning: 'struct bio' declared inside 
parameter list [enabled by default]
include/linux/device-mapper.h:48:56: warning: its scope is only this definition 
or declaration, which is probably not what you want [enabled by default]
include/linux/device-mapper.h:50:13: warning: 'struct request' declared inside 
parameter list [enabled by default]
include/linux/device-mapper.h:61:15: warning: 'struct bio' declared inside 
parameter list [enabled by default]
include/linux/device-mapper.h:64:15: warning: 'struct request' declared inside 
parameter list [enabled by default]
include/linux/device-mapper.h:80:15: warning: 'struct bvec_merge_data' declared 
inside parameter list [enabled by default]
include/linux/device-mapper.h:92:12: warning: 'struct queue_limits' declared 
inside parameter list [enabled by default]
include/linux/device-mapper.h:265:13: error: field 'clone' has incomplete type
include/linux/device-mapper.h: In function 'dm_bio_get_target_request_nr':
include/linux/device-mapper.h:280:9: warning: initialization from incompatible 
pointer type [enabled by default]
include/linux/device-mapper.h: At top level:
include/linux/device-mapper.h:376:42: warning: 'struct request' declared inside 
parameter list [enabled by default]
include/linux/device-mapper.h:563:33: warning: 'struct request' declared inside 
parameter list [enabled by default]
include/linux/device-mapper.h:564:41: warning: 'struct request' declared inside 
parameter list [enabled by default]
include/linux/device-mapper.h:565:38: warning: 'struct request' declared inside 
parameter list [enabled by default]
drivers/misc/vmw_vmci/vmci_queue_pair.c: In function 'qp_alloc_ppn_set':
drivers/misc/vmw_vmci/vmci_queue_pair.c:476:6: error: implicit declaration of 
function 'kmalloc' [-Werror=implicit-function-declaration]
drivers/misc/vmw_vmci/vmci_queue_pair.c:475:15: warning: assignment makes 
pointer from integer without a cast [enabled by default]
drivers/misc/vmw_vmci/vmci_queue_pair.c:480:15: warning: assignment makes 
pointer from integer without a cast [enabled by default]
drivers/misc/vmw_vmci/vmci_queue_pair.c:483:3: error: implicit declaration of 
function 'kfree' [-Werror=implicit-function-declaration]
drivers/misc/vmw_vmci/vmci_queue_pair.c: In function 'qp_host_alloc_queue':
drivers/misc/vmw_vmci/vmci_queue_pair.c:619:2: error: implicit declaration of 
function 'kzalloc' [-Werror=implicit-function-declaration]
drivers/misc/vmw_vmci/vmci_queue_pair.c:619:8: warning: assignment makes 
pointer from integer without a cast [enabled by default]
drivers/misc/vmw_vmci/vmci_queue_pair.c: In function 'qp_release_pages':
drivers/misc/vmw_vmci/vmci_queue_pair.c:717:3: error: implicit declaration of 
function 'page_cache_release' [-Werror=implicit-function-declaration]
drivers/misc/vmw_vmci/vmci_queue_pair.c: In function 'qp_guest_endpoint_create':
drivers/misc/vmw_vmci/vmci_queue_pair.c:975:8: warning: assignment makes 
pointer from integer without a cast [enabled by default]
drivers/misc/vmw_vmci/vmci_queue_pair.c: In function 'qp_alloc_hypercall':
drivers/misc/vmw_vmci/vmci_queue_pair.c:1033:12: warning: assignment makes 
pointer from integer without a cast [enabled by default]
drivers/misc/vmw_vmci/vmci_queue_pair.c: In function 'qp_broker_create':
drivers/misc/vmw_vmci/vmci_queue_pair.c:1396:8: warning: assignment makes 
pointer from integer without a cast [enabled by default]
drivers/misc/vmw_vmci/vmci_queue_pair.c:1

Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.

2013-01-10 Thread Rusty Russell
"Michael S. Tsirkin"  writes:
> On Thu, Jan 10, 2013 at 09:00:55PM +1030, Rusty Russell wrote:
>> Not sure why vhost/net doesn't built a packet and feed it in
>> netif_rx_ni().  This is what tun seems to do, and with this code it
>> should be fairly optimal.
>
> Because we want to use NAPI.

Not quite what I was asking; it was more a question of why we're using a
raw socket, when we trivially have a complete skb already which we
should be able to feed to Linux like any network packet.

And that path is pretty well optimized...

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.

2013-01-10 Thread Rusty Russell
Sjur Brændeland  writes:
> Hi Rusty,
>
> On Thu, Jan 10, 2013 at 11:30 AM, Rusty Russell  wrote:
> ...
>>I now have some lightly-tested code (via a userspace harness).
>
> Great - thank you for looking into this. I will start integrating this
> with my patches
> when you send out a proper patch.

Hi Sjur!

OK, the Internet was no help here, how do you pronounce Sjur?
I'm guessing "shoor" rhyming with tour until I know better.

>> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
>> index 8d5bddb..fd95d3e 100644
>> --- a/drivers/virtio/Kconfig
>> +++ b/drivers/virtio/Kconfig
>> @@ -5,6 +5,12 @@ config VIRTIO
>>   bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
>>   CONFIG_RPMSG or CONFIG_S390_GUEST.
>>
>> +config VHOST
>> +   tristate
>
> Inclusion of drivers/virtio from drivers/Makefile depends on VIRTIO.
> So I guess VHOST should select VIRTIO to ensure that
> drivers/virtio/virtio_host.c
> is part of the build.

Maybe I should move drivers/virtio/virtio_host.c to
drivers/vhost/vringh.c; I'll look at it.

It makes sense for vhost/ to contain the host-side stuff, since it
already exists.

>> +   if (*last_avail_idx == avail_idx)
>> +   return vrh->vring.num;
>> +
>> +   /* Only get avail ring entries after they have been exposed by 
>> guest. */
>> +   smp_rmb();
>
> We are accessing memory shared with a remote device (modem), so we probably
> need mandatory barriers here, e.g. something like the virtio_rmb
> defined in virtio_ring.c.

Fair enough, we can put those in a header.

>> +   /* Append the pages into the sg. */
>> +   err = add_to_sg(&vsg, (void *)(long)acc->desc.addr,
>> +   acc->desc.len, gfp);
>
> I would prefer not to split into pages at this point, but rather provide an
> iterator or the original list found in the descriptor to the client.
>
> In our case we use virtio rings to talk to a LTE-modem over shared memory.
> The IP traffic is received over the air, interleaved and arrives in
> the virtio driver in
> large bursts. So virtio driver on the modem receives multiple datagrams
> held in large contiguous buffers. Our current approach is to handle each
> buffer as a chained descriptor list, where each datagram is kept in
> separate chained descriptors. When the buffers are consumed on the linux
> host, the modem will read the chained descriptors from the used-ring and
> free the entire contiguous buffer in one operation.

In other words, boundaries matter?

While the sg-in-place hack is close to optimal for TCM_VHOST, neither
net not you can use it directly.  I'll switch to an iovec (with a
similar use-caller-supplied-if-it-fits trick); they're smaller anyway.

More code coming...

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH 2/2] virtio-net: introduce a new control to set macaddr

2013-01-10 Thread Rusty Russell
"Michael S. Tsirkin"  writes:
>> +if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) {
>> +/* Set MAC address by writing config space */
>>  vdev->config->set(vdev, offsetof(struct virtio_net_config, mac),
>>dev->dev_addr, dev->addr_len);
>> +}
>>  
>
> By the way, why don't we fail the command if VIRTIO_NET_F_MAC is off?
> Rusty?

Looked back through the history for this one.  I think the theory is that
if the guest doesn't set VIRTIO_NET_F_MAC, it means it doesn't care:
it's up to the guest.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 0/2] make mac programming for virtio net more robust

2013-01-10 Thread Rusty Russell
"Michael S. Tsirkin"  writes:

> On Thu, Jan 10, 2013 at 10:45:39PM +0800, ak...@redhat.com wrote:
>> From: Amos Kong 
>> 
>> Currenly mac is programmed byte by byte. This means that we
>> have an intermediate step where mac is wrong. 
>> 
>> Second patch introduced a new vq control command to set mac
>> address in one time.
>
> As you mention we could alternatively do it without
> new commands, simply add a feature bit that says that MACs are
> in the mac table.
> This would be a much bigger patch, and I'm fine with either way.
> Rusty what do you think?

Hmm, mac filtering and "my mac address" are not quite the same thing.  I
don't know if it matters for anyone: does it?  The mac address is abused
for things like identifying machines, etc.

If we keep it as a separate concept, Amos' patch seems to make sense.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-10 Thread Eric W. Biederman
Konrad Rzeszutek Wilk  writes:

> On Mon, Jan 07, 2013 at 01:34:04PM +0100, Daniel Kiper wrote:
>> I think that new kexec hypercall function should mimics kexec syscall.
>> It means that all arguments passed to hypercall should have same types
>> if it is possible or if it is not possible then conversion should be done
>> in very easy way. Additionally, I think that one call of new hypercall
>> load function should load all needed thinks in right place and
>> return relevant status. Last but not least, new functionality should
>
> We are not restricted to just _one_ hypercall. And this loading
> thing could be similar to the micrcode hypercall - which just points
> to a virtual address along with the length - and says 'load me'.
>
>> be available through /dev/xen/privcmd or directly from kernel without
>> bigger effort.
>
> Perhaps we should have a email thread on xen-devel where we hash out
> some ideas. Eric, would you be OK included on this - it would make
> sense for this mechanism to be as future-proof as possible - and I am not
> sure what your plans for kexec are in the future?

The basic kexec interface is.

load ranges of virtual addresses physical addresses.
jump to the physical address  with identity mapped page tables.

There are a few flags to allow for different usage scenarios like
kexec on panic vs normal kexec.

It is very very simple and very extensible.  All of the weird glue
happens in userspace.

Eric
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V4 1/2] virtio-net: fix the set affinity bug when CPU IDs are not consecutive

2013-01-10 Thread Wanlong Gao
As Michael mentioned, set affinity and select queue will not work very
well when CPU IDs are not consecutive, this can happen with hot unplug.
Fix this bug by traversal the online CPUs, and create a per cpu variable
to find the mapping from CPU to the preferable virtual-queue.

Cc: Rusty Russell 
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: Eric Dumazet 
Cc: virtualization@lists.linux-foundation.org
Cc: net...@vger.kernel.org
Signed-off-by: Wanlong Gao 
---
V3->V4:
move vq_index into virtnet_info (Jason)
change the mapping value when not setting affinity (Jason)
address the comments about select_queue (Rusty)
Not Addressed yet:
race between select_queue and set_channels need discuss more
but not the same problem with this (Rusty)

 drivers/net/virtio_net.c | 53 ++--
 1 file changed, 42 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a6fcf15..ca17a58 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -123,6 +123,9 @@ struct virtnet_info {
 
/* Does the affinity hint is set for virtqueues? */
bool affinity_hint_set;
+
+   /* Per-cpu variable to show the mapping from CPU to virtqueue */
+   int __percpu *vq_index;
 };
 
 struct skb_vnet_hdr {
@@ -1016,6 +1019,7 @@ static int virtnet_vlan_rx_kill_vid(struct net_device 
*dev, u16 vid)
 static void virtnet_set_affinity(struct virtnet_info *vi, bool set)
 {
int i;
+   int cpu;
 
/* In multiqueue mode, when the number of cpu is equal to the number of
 * queue pairs, we let the queue pairs to be private to one cpu by
@@ -1029,16 +1033,29 @@ static void virtnet_set_affinity(struct virtnet_info 
*vi, bool set)
return;
}
 
-   for (i = 0; i < vi->max_queue_pairs; i++) {
-   int cpu = set ? i : -1;
-   virtqueue_set_affinity(vi->rq[i].vq, cpu);
-   virtqueue_set_affinity(vi->sq[i].vq, cpu);
-   }
+   if (set) {
+   i = 0;
+   for_each_online_cpu(cpu) {
+   virtqueue_set_affinity(vi->rq[i].vq, cpu);
+   virtqueue_set_affinity(vi->sq[i].vq, cpu);
+   *per_cpu_ptr(vi->vq_index, cpu) = i;
+   i++;
+   }
 
-   if (set)
vi->affinity_hint_set = true;
-   else
+   } else {
+   for(i = 0; i < vi->max_queue_pairs; i++) {
+   virtqueue_set_affinity(vi->rq[i].vq, -1);
+   virtqueue_set_affinity(vi->sq[i].vq, -1);
+   }
+
+   i = 0;
+   for_each_online_cpu(cpu)
+   *per_cpu_ptr(vi->vq_index, cpu) =
+   ++i % vi->curr_queue_pairs;
+
vi->affinity_hint_set = false;
+   }
 }
 
 static void virtnet_get_ringparam(struct net_device *dev,
@@ -1127,12 +1144,19 @@ static int virtnet_change_mtu(struct net_device *dev, 
int new_mtu)
 
 /* To avoid contending a lock hold by a vcpu who would exit to host, select the
  * txq based on the processor id.
- * TODO: handle cpu hotplug.
  */
 static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
 {
-   int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
- smp_processor_id();
+   int txq;
+   struct virtnet_info *vi = netdev_priv(dev);
+
+   if (skb_rx_queue_recorded(skb)) {
+   txq = skb_get_rx_queue(skb);
+   } else {
+   txq = *__this_cpu_ptr(vi->vq_index);
+   if (txq == -1)
+   txq = 0;
+   }
 
while (unlikely(txq >= dev->real_num_tx_queues))
txq -= dev->real_num_tx_queues;
@@ -1453,6 +1477,10 @@ static int virtnet_probe(struct virtio_device *vdev)
if (vi->stats == NULL)
goto free;
 
+   vi->vq_index = alloc_percpu(int);
+   if (vi->vq_index == NULL)
+   goto free_stats;
+
mutex_init(&vi->config_lock);
vi->config_enable = true;
INIT_WORK(&vi->config_work, virtnet_config_changed_work);
@@ -1476,7 +1504,7 @@ static int virtnet_probe(struct virtio_device *vdev)
/* Allocate/initialize the rx/tx queues, and invoke find_vqs */
err = init_vqs(vi);
if (err)
-   goto free_stats;
+   goto free_index;
 
netif_set_real_num_tx_queues(dev, 1);
netif_set_real_num_rx_queues(dev, 1);
@@ -1520,6 +1548,8 @@ free_recv_bufs:
 free_vqs:
cancel_delayed_work_sync(&vi->refill);
virtnet_del_vqs(vi);
+free_index:
+   free_percpu(vi->vq_index);
 free_stats:
free_percpu(vi->stats);
 free:
@@ -1554,6 +1584,7 @@ static void virtnet_remove(struct virtio_device *vdev)
 
flush_work(&vi->config_work);
 
+   free_percpu(vi->vq_index);
free_percpu(vi->stats);
free_net

[PATCH V4 2/2] virtio-net: reset virtqueue affinity when doing cpu hotplug

2013-01-10 Thread Wanlong Gao
Add a cpu notifier to virtio-net, so that we can reset the
virtqueue affinity if the cpu hotplug happens. It improve
the performance through enabling or disabling the virtqueue
affinity after doing cpu hotplug.

Cc: Rusty Russell 
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: Eric Dumazet 
Cc: virtualization@lists.linux-foundation.org
Cc: net...@vger.kernel.org
Signed-off-by: Wanlong Gao 
---
 drivers/net/virtio_net.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ca17a58..e0b1f25 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int napi_weight = 128;
 module_param(napi_weight, int, 0444);
@@ -126,6 +127,9 @@ struct virtnet_info {
 
/* Per-cpu variable to show the mapping from CPU to virtqueue */
int __percpu *vq_index;
+
+   /* CPU hot plug notifier */
+   struct notifier_block nb;
 };
 
 struct skb_vnet_hdr {
@@ -1058,6 +1062,23 @@ static void virtnet_set_affinity(struct virtnet_info 
*vi, bool set)
}
 }
 
+static int virtnet_cpu_callback(struct notifier_block *nfb,
+   unsigned long action, void *hcpu)
+{
+   struct virtnet_info *vi = container_of(nfb, struct virtnet_info, nb);
+   switch(action) {
+   case CPU_ONLINE:
+   case CPU_ONLINE_FROZEN:
+   case CPU_DEAD:
+   case CPU_DEAD_FROZEN:
+   virtnet_set_affinity(vi, true);
+   break;
+   default:
+   break;
+   }
+   return NOTIFY_OK;
+}
+
 static void virtnet_get_ringparam(struct net_device *dev,
struct ethtool_ringparam *ring)
 {
@@ -1527,6 +1548,13 @@ static int virtnet_probe(struct virtio_device *vdev)
}
}
 
+   vi->nb.notifier_call = &virtnet_cpu_callback;
+   err = register_hotcpu_notifier(&vi->nb);
+   if (err) {
+   pr_debug("virtio_net: registering cpu notifier failed\n");
+   goto free_recv_bufs;
+   }
+
/* Assume link up if device can't report link status,
   otherwise get link status from config. */
if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
@@ -1573,6 +1601,8 @@ static void virtnet_remove(struct virtio_device *vdev)
 {
struct virtnet_info *vi = vdev->priv;
 
+   unregister_hotcpu_notifier(&vi->nb);
+
/* Prevent config work handler from accessing the device. */
mutex_lock(&vi->config_lock);
vi->config_enable = false;
-- 
1.8.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.

2013-01-10 Thread Rusty Russell
Untested, but I wanted to post before the weekend.

I think the implementation is a bit nicer, and though we have a callback
to get the guest-to-userspace offset, it might be faster since I think
most cases will re-use the same mapping.

Feedback on API welcome!
Rusty.

virtio_host: host-side implementation of virtio rings (untested!)

Getting use of virtio rings correct is tricky, and a recent patch saw
an implementation of in-kernel rings (as separate from userspace).

This patch attempts to abstract the business of dealing with the
virtio ring layout from the access (userspace or direct); to do this,
we use function pointers, which gcc inlines correctly.

FIXME: strong barriers a-la virtio weak_barrier flag.
FIXME: separate notify call with flag if we wrapped.
FIXME: move to vhost/vringh.c.
FIXME: test :)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 202bba6..38ec470 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -1,6 +1,7 @@
 config VHOST_NET
tristate "Host kernel accelerator for virtio net (EXPERIMENTAL)"
depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP) && 
EXPERIMENTAL
+   select VHOST
---help---
  This kernel module can be loaded in host kernel to accelerate
  guest networking with virtio_net. Not to be confused with virtio_net
diff --git a/drivers/vhost/Kconfig.tcm b/drivers/vhost/Kconfig.tcm
index a9c6f76..f4c3704 100644
--- a/drivers/vhost/Kconfig.tcm
+++ b/drivers/vhost/Kconfig.tcm
@@ -1,6 +1,7 @@
 config TCM_VHOST
tristate "TCM_VHOST fabric module (EXPERIMENTAL)"
depends on TARGET_CORE && EVENTFD && EXPERIMENTAL && m
+   select VHOST
default n
---help---
Say M here to enable the TCM_VHOST fabric module for use with 
virtio-scsi guests
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 8d5bddb..fd95d3e 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -5,6 +5,12 @@ config VIRTIO
  bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
  CONFIG_RPMSG or CONFIG_S390_GUEST.
 
+config VHOST
+   tristate
+   ---help---
+ This option is selected by any driver which needs to access
+ the host side of a virtio ring.
+
 menu "Virtio drivers"
 
 config VIRTIO_PCI
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 9076635..9833cd5 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -2,3 +2,4 @@ obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
 obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
 obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
+obj-$(CONFIG_VHOST) += virtio_host.o
diff --git a/drivers/virtio/virtio_host.c b/drivers/virtio/virtio_host.c
new file mode 100644
index 000..7416741
--- /dev/null
+++ b/drivers/virtio/virtio_host.c
@@ -0,0 +1,618 @@
+/*
+ * Helpers for the host side of a virtio ring.
+ *
+ * Since these may be in userspace, we use (inline) accessors.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static __printf(1,2) __cold void vringh_bad(const char *fmt, ...)
+{
+   static DEFINE_RATELIMIT_STATE(vringh_rs,
+ DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
+   if (__ratelimit(&vringh_rs)) {
+   va_list ap;
+   va_start(ap, fmt);
+   printk(KERN_NOTICE "vringh:");
+   vprintk(fmt, ap);
+   va_end(ap);
+   }
+}
+
+/* Returns vring->num if empty, -ve on error. */
+static inline int __vringh_get_head(const struct vringh *vrh,
+   int (*getu16)(u16 *val, const u16 *p),
+   u16 *last_avail_idx)
+{
+   u16 avail_idx, i, head;
+   int err;
+
+   err = getu16(&avail_idx, &vrh->vring.avail->idx);
+   if (err) {
+   vringh_bad("Failed to access avail idx at %p",
+  &vrh->vring.avail->idx);
+   return err;
+   }
+
+   err = getu16(last_avail_idx, &vring_avail_event(&vrh->vring));
+   if (err) {
+   vringh_bad("Failed to access last avail idx at %p",
+  &vring_avail_event(&vrh->vring));
+   return err;
+   }
+
+   if (*last_avail_idx == avail_idx)
+   return vrh->vring.num;
+
+   /* Only get avail ring entries after they have been exposed by guest. */
+   smp_rmb();
+
+   i = *last_avail_idx & (vrh->vring.num - 1);
+
+   err = getu16(&head, &vrh->vring.avail->ring[i]);
+   if (err) {
+   vringh_bad("Failed to read head: idx %d address %p",
+  *last_avail_idx, &vrh->vring.avail->ring[i]);
+   return err;
+   }
+
+   if (head >= vrh->vring.num) {
+   vringh_bad("Guest says index %u > %u is available",
+  head, vrh->vrin

Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.

2013-01-10 Thread Michael S. Tsirkin
On Fri, Jan 11, 2013 at 09:18:33AM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin"  writes:
> > On Thu, Jan 10, 2013 at 09:00:55PM +1030, Rusty Russell wrote:
> >> Not sure why vhost/net doesn't built a packet and feed it in
> >> netif_rx_ni().  This is what tun seems to do, and with this code it
> >> should be fairly optimal.
> >
> > Because we want to use NAPI.
> 
> Not quite what I was asking; it was more a question of why we're using a
> raw socket, when we trivially have a complete skb already which we
> should be able to feed to Linux like any network packet.
> 
> And that path is pretty well optimized...
> 
> Cheers,
> Rusty.


Oh for some reason I thought you were talking about virtio.
I don't really understand what you are saying here - vhost
actually calls out to tun to build and submit the skb.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 0/2] make mac programming for virtio net more robust

2013-01-10 Thread Michael S. Tsirkin
On Fri, Jan 11, 2013 at 12:53:07PM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin"  writes:
> 
> > On Thu, Jan 10, 2013 at 10:45:39PM +0800, ak...@redhat.com wrote:
> >> From: Amos Kong 
> >> 
> >> Currenly mac is programmed byte by byte. This means that we
> >> have an intermediate step where mac is wrong. 
> >> 
> >> Second patch introduced a new vq control command to set mac
> >> address in one time.
> >
> > As you mention we could alternatively do it without
> > new commands, simply add a feature bit that says that MACs are
> > in the mac table.
> > This would be a much bigger patch, and I'm fine with either way.
> > Rusty what do you think?
> 
> Hmm, mac filtering and "my mac address" are not quite the same thing.  I
> don't know if it matters for anyone: does it?
> The mac address is abused
> for things like identifying machines, etc.

I don't know either. I think net core differentiates between mac and
uc_list because linux has to know which mac to use when building
up packets, so at some level, I agree it might be useful to identify the
machine.

BTW netdev/davem should have been copied on this, Amos I think it's a
good idea to remember to do it next time you post.

> 
> If we keep it as a separate concept, Amos' patch seems to make sense.

Yes. It also keeps the patch small, I just thought I'd mention the
option.

> 
> Cheers,
> Rusty.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization