Re: [PATCH 0/6] virtio_add_buf replacement.

2013-03-06 Thread Asias He
On Wed, Mar 06, 2013 at 04:15:02PM +1100, Rusty Russell wrote:
> OK, so I've spent a few days benchmarking.  Turns out 80% of
> virtio_add_buf cases are uni-directional (including the
> always-performance-sensitive networking code), and that gets no
> performance penalty (though tests with real networking would be
> appreciated!).
> 
> I'm not reposting all the "convert driver to virtio_add_outbuf()"
> patches: just the scsi one which I didn't have before.  I won't actually
> remove virtio_add_buf() until the *following* merge window, just to be
> sure.

Why not send out all the patches in this series? It would be much easier
for people to read in one thread.

> One annoying thing about benchmarking is that in some cases, speeding up
> one side can make the whole thing slower, due to more wakeups.
> Device-side polling techniques might be required in future to get more
> performance.
> 
> Cheers,
> Rusty.
> ___
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization

-- 
Asias


[PATCH 1/1] virtio: rng: disallow multiple device registrations, fixes crashes

2013-03-06 Thread Amit Shah
The code currently only supports one virtio-rng device at a time.
Invoking guests with multiple devices causes the guest to blow up.

Check if we've already registered and initialised the driver.  Also
clean up in case of registration errors or hot-unplug so that a new
device can be used.

Reported-by: Peter Krempa 
Reported-by: 
Signed-off-by: Amit Shah 
---

Also valid for stable?

 drivers/char/hw_random/virtio-rng.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/char/hw_random/virtio-rng.c 
b/drivers/char/hw_random/virtio-rng.c
index 10fd71c..6bf4d47 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -92,14 +92,22 @@ static int probe_common(struct virtio_device *vdev)
 {
int err;
 
+   if (vq) {
+   /* We only support one device for now */
+   return -EBUSY;
+   }
/* We expect a single virtqueue. */
vq = virtio_find_single_vq(vdev, random_recv_done, "input");
-   if (IS_ERR(vq))
-   return PTR_ERR(vq);
+   if (IS_ERR(vq)) {
+   err = PTR_ERR(vq);
+   vq = NULL;
+   return err;
+   }
 
err = hwrng_register(&virtio_hwrng);
if (err) {
vdev->config->del_vqs(vdev);
+   vq = NULL;
return err;
}
 
@@ -112,6 +120,7 @@ static void remove_common(struct virtio_device *vdev)
busy = false;
hwrng_unregister(&virtio_hwrng);
vdev->config->del_vqs(vdev);
+   vq = NULL;
 }
 
 static int virtrng_probe(struct virtio_device *vdev)
-- 
1.8.1.2



Re: [PATCH 4/6] virtio_ring: virtqueue_add_outbuf / virtqueue_add_inbuf.

2013-03-06 Thread Asias He
On Wed, Mar 06, 2013 at 04:23:24PM +1100, Rusty Russell wrote:
> These are specialized versions of virtqueue_add_buf(), which cover
> over 80% of cases and are far clearer.
> 
> In particular, the scatterlists passed to these functions don't have
> to be clean (ie. we ignore end markers).
> 
> Signed-off-by: Rusty Russell  

So, what is the plan for the following ideas discussed in the other
thread?

   '''
   > Looking at code, it seems that most users really have a single sg, in
   > low memory. So how about simply passing void * instead of sg? Whoever
   > has multiple sgs can use the rich interface.
   
   Good point, let's do that:
   1) Make virtqueue_add_outbuf()/inbuf() take a void * and len.
   2) Transfer users across to use that.
   3) Make everyone else use clean scatterlists with virtqueue_add_sgs[].
   4) Remove virtqueue_add_bufs().

   '''
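The single-buffer fast path proposed in step (1) above can be modelled in a few lines of userspace C. This is only a sketch of the idea — the names, the toy scatterlist struct, and the signatures below are illustrative stand-ins, not the kernel API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the kernel's scatterlist; illustrative only. */
struct scatterlist { void *addr; unsigned int length; };

static unsigned int last_len;   /* records what the sg-based path saw */

/* Stands in for the rich sg-based interface (virtqueue_add_sgs). */
static int add_sgs(struct scatterlist *sg, unsigned int num, void *data)
{
    unsigned int i;

    last_len = 0;
    for (i = 0; i < num; i++)
        last_len += sg[i].length;
    (void)data;
    return 0;
}

/* Step (1): a single-buffer fast path taking (buf, len) directly and
 * wrapping it in a one-entry scatterlist internally, so callers with a
 * single buffer in low memory never touch scatterlists at all. */
static int add_outbuf(void *buf, unsigned int len, void *data)
{
    struct scatterlist sg = { buf, len };
    return add_sgs(&sg, 1, data);
}
```

Callers with multiple sgs would keep using the sg-based interface, per step (3).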
 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index a78ad45..5217baf 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -366,6 +366,50 @@ int virtqueue_add_sgs(struct virtqueue *_vq,
>  EXPORT_SYMBOL_GPL(virtqueue_add_sgs);
>  
>  /**
> + * virtqueue_add_outbuf - expose output buffers to other end
> + * @vq: the struct virtqueue we're talking about.
> + * @sgs: array of scatterlists (need not be terminated!)
> + * @num: the number of scatterlists readable by other side
> + * @data: the token identifying the buffer.
> + * @gfp: how to do memory allocations (if necessary).
> + *
> + * Caller must ensure we don't call this with other virtqueue operations
> + * at the same time (except where noted).
> + *
> + * Returns zero or a negative error (ie. ENOSPC, ENOMEM).
> + */
> +int virtqueue_add_outbuf(struct virtqueue *vq,
> +  struct scatterlist sg[], unsigned int num,
> +  void *data,
> +  gfp_t gfp)
> +{
> + return virtqueue_add(vq, &sg, sg_next_arr, num, 0, 1, 0, data, gfp);
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
> +
> +/**
> + * virtqueue_add_inbuf - expose input buffers to other end
> + * @vq: the struct virtqueue we're talking about.
> + * @sgs: array of scatterlists (need not be terminated!)
> + * @num: the number of scatterlists writable by other side
> + * @data: the token identifying the buffer.
> + * @gfp: how to do memory allocations (if necessary).
> + *
> + * Caller must ensure we don't call this with other virtqueue operations
> + * at the same time (except where noted).
> + *
> + * Returns zero or a negative error (ie. ENOSPC, ENOMEM).
> + */
> +int virtqueue_add_inbuf(struct virtqueue *vq,
> + struct scatterlist sg[], unsigned int num,
> + void *data,
> + gfp_t gfp)
> +{
> + return virtqueue_add(vq, &sg, sg_next_arr, 0, num, 0, 1, data, gfp);
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
> +
> +/**
>   * virtqueue_kick_prepare - first half of split virtqueue_kick call.
>   * @vq: the struct virtqueue
>   *
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 6eff15b..b442500 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -40,6 +40,16 @@ int virtqueue_add_buf(struct virtqueue *vq,
> void *data,
> gfp_t gfp);
>  
> +int virtqueue_add_outbuf(struct virtqueue *vq,
> +  struct scatterlist sg[], unsigned int num,
> +  void *data,
> +  gfp_t gfp);
> +
> +int virtqueue_add_inbuf(struct virtqueue *vq,
> + struct scatterlist sg[], unsigned int num,
> + void *data,
> + gfp_t gfp);
> +
>  int virtqueue_add_sgs(struct virtqueue *vq,
> struct scatterlist *sgs[],
> unsigned int out_sgs,

-- 
Asias


Re: [PATCH 2/5] tcm_vhost: Introduce tcm_vhost_check_feature()

2013-03-06 Thread Stefan Hajnoczi
On Wed, Mar 06, 2013 at 02:16:27PM +0800, Asias He wrote:
> This helper is useful to check if a feature is supported.
> 
> Signed-off-by: Asias He 
> ---
>  drivers/vhost/tcm_vhost.c | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
> index b3e50d7..fdbf986 100644
> --- a/drivers/vhost/tcm_vhost.c
> +++ b/drivers/vhost/tcm_vhost.c
> @@ -91,6 +91,20 @@ static int iov_num_pages(struct iovec *iov)
>  ((unsigned long)iov->iov_base & PAGE_MASK)) >> PAGE_SHIFT;
>  }
>  
> +static bool tcm_vhost_check_feature(struct vhost_scsi *vs, u64 feature)
> +{
> + u64 acked_features;
> + bool ret = false;
> +
> + mutex_lock(&vs->dev.mutex);
> + acked_features = vs->dev.acked_features;
> + if (acked_features & 1ULL << feature)
> + ret = true;
> + mutex_unlock(&vs->dev.mutex);
> +
> + return ret;
> +}

This is like vhost_has_feature() except it acquires dev.mutex?

In any case it isn't tcm_vhost-specific and could be in vhost.c.

Stefan


Re: [PATCH 5/5] tcm_vhost: Add hotplug/hotunplug support

2013-03-06 Thread Stefan Hajnoczi
On Wed, Mar 06, 2013 at 02:16:30PM +0800, Asias He wrote:
> +static struct tcm_vhost_evt *tcm_vhost_allocate_evt(struct vhost_scsi *vs,
> + u32 event, u32 reason)
> +{
> + struct tcm_vhost_evt *evt;
> +
> + if (atomic_read(&vs->vs_events_nr) > VHOST_SCSI_MAX_EVENT)
> + return NULL;
> +
> + evt = kzalloc(sizeof(*evt), GFP_KERNEL);
> +
> + if (evt) {
> + atomic_inc(&vs->vs_events_nr);

This looks suspicious: checking vs_events_nr > VHOST_SCSI_MAX_EVENT
first and then incrementing later isn't atomic!

> +static int tcm_vhost_hotunplug(struct tcm_vhost_tpg *tpg, struct se_lun *lun)
> +{
> + struct vhost_scsi *vs = tpg->vhost_scsi;
> +
> + mutex_lock(&tpg->tv_tpg_mutex);
> + vs = tpg->vhost_scsi;
> + mutex_unlock(&tpg->tv_tpg_mutex);
> + if (!vs)
> + return -EOPNOTSUPP;
> +
> + if (!tcm_vhost_check_feature(vs, VIRTIO_SCSI_F_HOTPLUG))
> + return -EOPNOTSUPP;
> +
> + return tcm_vhost_send_evt(vs, tpg, lun,
> + VIRTIO_SCSI_T_TRANSPORT_RESET,
> + VIRTIO_SCSI_EVT_RESET_REMOVED);
> +}

tcm_vhost_hotplug() and tcm_vhost_hotunplug() are the same function
except for VIRTIO_SCSI_EVT_RESET_RESCAN vs
VIRTIO_SCSI_EVT_RESET_REMOVED.  That can be passed in as an argument and
the code duplication can be eliminated.


Re: virtio PCI on KVM without IO BARs

2013-03-06 Thread Michael S. Tsirkin
On Tue, Mar 05, 2013 at 11:14:31PM -0800, H. Peter Anvin wrote:
> On 03/05/2013 04:05 PM, H. Peter Anvin wrote:
> > On 02/28/2013 07:24 AM, Michael S. Tsirkin wrote:
> >>
> >> 3. hypervisor assigned IO address
> >>qemu can reserve IO addresses and assign to virtio devices.
> >>2 bytes per device (for notification and ISR access) will be
> >>enough. So we can reserve 4K and this gets us 2000 devices.
> >> From KVM perspective, nothing changes.
> >>We'll want some capability in the device to let guest know
> >>this is what it should do, and pass the io address.
> >>One way to reserve the addresses is by using the bridge.
> >>Pros: no need for host kernel support
> >>Pros: regular PIO so fast
> >>Cons: does not help assigned devices, breaks nested virt
> >>
> >> Simply counting pros/cons, option 3 seems best. It's also the
> >> easiest to implement.
> >>
> > 
> > The problem here is the 4K I/O window for IO device BARs in bridges.
> > Why not simply add a (possibly proprietary) capability to the PCI bridge
> > to allow a much narrower window?  That fits much more nicely into the
> > device resource assignment on the guest side, and could even be
> > implemented on a real hardware device -- we can offer it to the PCI-SIG
> > for standardization, even.
> > 
> 
> Just a correction: I'm of course not talking about BARs but of the
> bridge windows.  The BARs are not a problem; an I/O BAR can cover as
> little as four bytes.
> 
>   -hpa

Right. Though even with better granularity, bridge windows
would still be a (smaller) problem, causing fragmentation.

If we were to extend the PCI spec I would go for a bridge without
windows at all: a bridge can snoop on configuration transactions and
responses programming devices behind it and build a full map of address
to device mappings.

In particular, this would be a good fit for an uplink bridge in a PCI
express switch, which is integrated with downlink bridges on the same
silicon, so bridge windows do nothing but add overhead.

> -- 
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel.  I don't speak on their behalf.


Re: [PATCH 3/6] virtio_ring: inline internal vring functions more aggressively.

2013-03-06 Thread Michael S. Tsirkin
On Wed, Mar 06, 2013 at 04:22:09PM +1100, Rusty Russell wrote:
> We use inline and get gcc to do the right thing inlining the
> "indirect" traversal functions.  This also means we don't need to
> clean the sgs.
> 
> for i in `seq 50`; do /usr/bin/time -f 'Wall time:%e' ./vringh_test 
> --indirect --eventidx --parallel --fast-vringh; done 2>&1 | stats 
> --trim-outliers:
> 
> Before:
>   Using CPUS 0 and 3
>   Guest: notified 0, pinged 39014-39063(39062)
>   Host: notified 39014-39063(39062), pinged 0
>   Wall time:1.90-2.35(1.921875)
> 
> After:
>   Using CPUS 0 and 3
>   Guest: notified 0, pinged 39062-39063(39063)
>   Host: notified 39062-39063(39063), pinged 0
>   Wall time:1.76-2.22(1.789167)
> 
> Signed-off-by: Rusty Russell  

It's pretty cool that gcc is able to optimize it like this.
Which gcc version did you use here?


> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index c537385..a78ad45 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -98,13 +98,31 @@ struct vring_virtqueue
>  
>  #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
>  
> +static inline struct scatterlist *sg_next_chained(struct scatterlist *sg,
> +   unsigned int *count)
> +{
> + return sg_next(sg);
> +}
> +
> +static inline struct scatterlist *sg_next_arr(struct scatterlist *sg,
> +   unsigned int *count)
> +{
> + if (--(*count) == 0)
> + return NULL;
> + return sg + 1;
> +}
> +
>  /* Set up an indirect table of descriptors and add it to the queue. */
> -static int vring_add_indirect(struct vring_virtqueue *vq,
> -   struct scatterlist *sgs[],
> -   unsigned int total_sg,
> -   unsigned int out_sgs,
> -   unsigned int in_sgs,
> -   gfp_t gfp)
> +static inline int vring_add_indirect(struct vring_virtqueue *vq,
> +  struct scatterlist *sgs[],
> +  struct scatterlist *(*next)
> +(struct scatterlist *, unsigned int *),
> +  unsigned int total_sg,
> +  unsigned int total_out,
> +  unsigned int total_in,
> +  unsigned int out_sgs,
> +  unsigned int in_sgs,
> +  gfp_t gfp)
>  {
>   struct vring_desc *desc;
>   unsigned head;
> @@ -125,7 +143,7 @@ static int vring_add_indirect(struct vring_virtqueue *vq,
>   /* Transfer entries from the sg lists into the indirect page */
>   i = 0;
>   for (n = 0; n < out_sgs; n++) {
> - for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> + for (sg = sgs[n]; sg; sg = next(sg, &total_out)) {
>   desc[i].flags = VRING_DESC_F_NEXT;
>   desc[i].addr = sg_phys(sg);
>   desc[i].len = sg->length;
> @@ -134,7 +152,7 @@ static int vring_add_indirect(struct vring_virtqueue *vq,
>   }
>   }
>   for (; n < (out_sgs + in_sgs); n++) {
> - for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> + for (sg = sgs[n]; sg; sg = next(sg, &total_in)) {
>   desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
>   desc[i].addr = sg_phys(sg);
>   desc[i].len = sg->length;
> @@ -163,17 +181,20 @@ static int vring_add_indirect(struct vring_virtqueue 
> *vq,
>   return head;
>  }
>  
> -static int virtqueue_add(struct virtqueue *_vq,
> -  struct scatterlist *sgs[],
> -  unsigned int total_sg,
> -  unsigned int out_sgs,
> -  unsigned int in_sgs,
> -  void *data,
> -  gfp_t gfp)
> +static inline int virtqueue_add(struct virtqueue *_vq,
> + struct scatterlist *sgs[],
> + struct scatterlist *(*next)
> +   (struct scatterlist *, unsigned int *),
> + unsigned int total_out,
> + unsigned int total_in,
> + unsigned int out_sgs,
> + unsigned int in_sgs,
> + void *data,
> + gfp_t gfp)
>  {
>   struct vring_virtqueue *vq = to_vvq(_vq);
>   struct scatterlist *sg;
> - unsigned int i, n, avail, uninitialized_var(prev);
> + unsigned int i, n, avail, uninitialized_var(prev), total_sg;
>   int head;
>  
>   START_USE(vq);
> @@ -193,11 +214,14 @@ static int virtqueue_add(struct virtqueue *_vq,
>   }
>  #endif
>  
> + total_s

Re: [PATCH 2/2] tools/virtio: make barriers stronger.

2013-03-06 Thread Michael S. Tsirkin
On Wed, Mar 06, 2013 at 03:54:42PM +1100, Rusty Russell wrote:
> In the coming vringh_test, we share an mmap with another userspace process
> for testing.  This requires real barriers.
> 
> Signed-off-by: Rusty Russell 
> 
> diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
> index aff61e1..7a63693 100644
> --- a/tools/virtio/asm/barrier.h
> +++ b/tools/virtio/asm/barrier.h
> @@ -3,8 +3,8 @@
>  #define mb() __sync_synchronize()
>  
>  #define smp_mb() mb()
> -# define smp_rmb()   barrier()
> -# define smp_wmb()   barrier()
> +# define smp_rmb()   mb()
> +# define smp_wmb()   mb()
>  /* Weak barriers should be used. If not - it's a bug */
>  # define rmb()   abort()
>  # define wmb()   abort()

Hmm this seems wrong on x86 which has strong order in hardware.
It should not matter whether the other side is a userspace
process or a kernel thread.
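MST's observation can be sketched in userspace C. This is a hedged illustration of the point, not the actual tools/virtio/asm/barrier.h: on x86, stores are not reordered with other stores and loads are not reordered with other loads, so the smp_* barriers only need to defeat compiler reordering — and that holds whether the peer is a kernel thread or another process sharing an mmap:

```c
/* Compiler-only barrier: stops the compiler reordering memory accesses
 * across it, but emits no fence instruction. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Full hardware memory barrier. */
#define mb() __sync_synchronize()

/* On x86 the hardware already orders stores with stores and loads with
 * loads, so the smp_* read/write barriers can be compiler-only; other
 * architectures would need real fences here. */
#if defined(__x86_64__) || defined(__i386__)
# define smp_wmb() barrier()
# define smp_rmb() barrier()
#else
# define smp_wmb() mb()
# define smp_rmb() mb()
#endif

static int data, flag;

static void producer(void)
{
    data = 42;
    smp_wmb();      /* publish data before setting the flag */
    flag = 1;
}

static int consumer(void)
{
    while (!flag)
        ;
    smp_rmb();      /* observe the flag before reading the data */
    return data;
}
```

The producer/consumer pair is the classic flag-protected handoff these barriers exist to order.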

-- 
MST


Re: virtio PCI on KVM without IO BARs

2013-03-06 Thread H. Peter Anvin
On 03/06/2013 01:21 AM, Michael S. Tsirkin wrote:
> 
> Right. Though even with better granularity, bridge windows
> would still be a (smaller) problem, causing fragmentation.
> 
> If we were to extend the PCI spec I would go for a bridge without
> windows at all: a bridge can snoop on configuration transactions and
> responses programming devices behind it and build a full map of address
> to device mappings.
> 
> In particular, this would be a good fit for an uplink bridge in a PCI
> express switch, which is integrated with downlink bridges on the same
> silicon, so bridge windows do nothing but add overhead.
> 

True, but the real problem is that the downlink (type 1 header) is
typically on a different piece of silicon than the device BAR (type 0
header).

I am not sure that a snooping-based system will work and not be
prohibitive in its hardware cost on an actual hardware system.  I
suspect it would decay into needing a large RAM array in every bridge.

-hpa




Re: virtio PCI on KVM without IO BARs

2013-03-06 Thread Michael S. Tsirkin
On Wed, Mar 06, 2013 at 03:15:16AM -0800, H. Peter Anvin wrote:
> On 03/06/2013 01:21 AM, Michael S. Tsirkin wrote:
> > 
> > Right. Though even with better granularity, bridge windows
> > would still be a (smaller) problem, causing fragmentation.
> > 
> > If we were to extend the PCI spec I would go for a bridge without
> > windows at all: a bridge can snoop on configuration transactions and
> > responses programming devices behind it and build a full map of address
> > to device mappings.
> > 
> > In particular, this would be a good fit for an uplink bridge in a PCI
> > express switch, which is integrated with downlink bridges on the same
> > silicon, so bridge windows do nothing but add overhead.
> > 
> 
> True, but the real problem is that the downlink (type 1 header) is
> typically on a different piece of silicon than the device BAR (type 0
> header).
> 
> I am not sure that a snooping-based system will work and not be
> prohibitive in its hardware cost on an actual hardware system.  I
> suspect it would decay into needing a large RAM array in every bridge.
> 
>   -hpa

This might be more difficult if you allow an arbitrary number of functions
behind a bridge.  However, this is not a problem for integrated functions
(e.g. on the same silicon or board).

The bridge could specify which slots host integrated functions and do
snooping for those, with regular window rules applying to open slots.


-- 
MST


Re: [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config

2013-03-06 Thread Ohad Ben-Cohen
On Wed, Mar 6, 2013 at 12:50 PM, Sjur Brændeland  wrote:
> The caif_virtio driver is using both host and guest ring sides, host side
> rings in RX direction and guest side rings in TX direction. The reason
> behind this is to enable zero-copy on the producer sides.
> (For details see recent discussion with Ohad "Wrappers for vringh"

FWIW, I'm not convinced that host side rings are necessary for
zero-copy - you could always just pass pointers to your huge data
inside buffers posted by guest side rings.

That said, I do believe that mixing guest side and host side rings may
simplify the overall solution in some cases, especially in more
complex multi core scenarios (where several cores all communicate with
each other).


Re: [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config

2013-03-06 Thread Sjur Brændeland
On Wed, Mar 6, 2013 at 1:16 PM, Ohad Ben-Cohen  wrote:
> On Wed, Mar 6, 2013 at 12:50 PM, Sjur Brændeland  wrote:
>> The caif_virtio driver is using both host and guest ring sides, host side
>> rings in RX direction and guest side rings in TX direction. The reason
>> behind this is to enable zero-copy on the producer sides.
>> (For details see recent discussion with Ohad "Wrappers for vringh"
>
> FWIW, I'm not convinced that host side rings are necessary for
> zero-copy - you could always just pass pointers to your huge data
> inside buffers posted by guest side rings.

Well, to do that for CAIF you would have to integrate virtio buffer handling
deep into the lower layer of the modem's 3GPP stack so that the 3GPP stack
could take buffers from the available ring.

From an outside point of view this perhaps sounds doable, but in reality you
wouldn't get any Network Signalling engineers implementing the 3GPP stack
to buy into that.

Another issue is that our modem also supports other interface technologies
such as USB and HSI. It simply doesn't make sense to enforce virtio rings
internally if virtio isn't otherwise used on the modem.

Without host side rings we simply wouldn't be able to implement zero-copy
in the RX direction for the STE-modem.

Regards,
Sjur

>
> That said, I do believe that mixing guest side and host side rings may
> simplify the overall solution in some cases, especially in more
> complex multi core scenarios (where several cores all communicate with
> each other).

Re: [PATCH] tcm_vhost: Add hotplug/hotunplug support

2013-03-06 Thread Paolo Bonzini
Il 05/03/2013 23:51, Nicholas A. Bellinger ha scritto:
> For making progress with Paolo's vhost-scsi branch on my side, this was
> still being blocked by bios boot hangs:
> 
> http://www.spinics.net/lists/target-devel/msg03830.html
> 
> Paolo & Co, any ideas on the latter..?

Nope.  Asias, do you have time to look at it?  Only patches 3 and 4 in
that series are needed, they are at

http://article.gmane.org/gmane.comp.emulators.qemu/192143
http://article.gmane.org/gmane.comp.emulators.qemu/192147

(vhost_scsi_set_endpoint and vhost_scsi_clear_endpoint need to be
integrated in vhost_scsi_start and vhost_scsi_stop).

Paolo


Re: [PATCH] virtio-spec: Define virtio-mmio registers as LE

2013-03-06 Thread Pawel Moll
On Tue, 2013-03-05 at 00:11 +, Rusty Russell wrote:
> Pawel Moll  writes:
> > On Fri, 2013-03-01 at 11:21 +, Marc Zyngier wrote:
> >> > Having said that, Rusty was contemplating enforcing LE config space in
> >> > the new PCI layout...
> >> 
> >> I wouldn't complain about that, and would like to see a similar thing on
> >> MMIO.
> >
> > Wherever PCI goes, MMIO follows :-)
> 
> Yes, but if you switch from 'guest-endian' to 'little-endian' how will
> you tell?  For PCI, we'd detect it by using the new layout.

The version register/value. At some point there will be a
new(ish) MMIO layout anyway to deal with 64-bit addresses, replacing the
ring page number with two 32-bit hi/lo physical address registers. This
was discussed not long after the driver got merged...

> I'd rather you specify MMIO as little endian, and we fix the kernel
> config accessors to be endian aware (ie. 8, 16, 32, 64-bit accessors).
> Since noone BE is using MMIO right now, it's safe...

That's absolutely fine with me, however I don't see anything I could do
in the virtio_mmio driver and spec - the virtio_config_ops specifies
get/set as void * operations and I simply do a byte-by-byte copy. Have I
missed some config/endianness/PCI related discussion?

Paweł



Re: [RFC PATCH v1 01/33] mm: introduce common help functions to deal with reserved/managed pages

2013-03-06 Thread Russell King - ARM Linux
On Tue, Mar 05, 2013 at 08:47:22PM +0100, Sam Ravnborg wrote:
> On Tue, Mar 05, 2013 at 10:54:44PM +0800, Jiang Liu wrote:
> > +static inline void free_initmem_default(int poison)
> > +{
> 
> Why request user to supply the poison argumet. If this is the default
> implmentation then use the default poison value too (POISON_FREE_INITMEM)

That poison value is inappropriate on some architectures like ARM - it's
executable.  The default poison value leads to:

   0:   stclgt  12, cr12, [ip], {204}   ; 0xcc

or

   4:   ldmia   r4!, {r2, r3, r6, r7}

And we might as well forget using any kind of poison in that case.

The value which we use is an undefined instruction on ARM and Thumb.

Notice the calls to poison_init_mem() in arch/arm/mm/init.c, which are
left by these patches, allowing us to continue using an appropriate
architecture specific value which will help to ensure that people
calling discarded init functions get appropriately bitten.


Re: [PATCH] tcm_vhost: Add hotplug/hotunplug support

2013-03-06 Thread Paolo Bonzini
Il 06/03/2013 02:28, Asias He ha scritto:
>> > This can queue up quite a bit of memory if the handler thread
>> > is delayed, no? Can we limit the # of outstanding events?
>> > Will guest recover from a missed event?
> Hmm, good point. Will limit the number. The size of 'struct
> tcm_vhost_evt' is around 20 bytes. So if we limit it to 128, it is ~2.5K
> of memory.
> 
> Paolo, if we limit the number of outstanding events and set
> vs->vs_events_dropped, the guest will recover from a missed event, right?

Yes.  At least it should (it doesn't yet).

Paolo



Re: [PATCH] lguest: fix paths in comments

2013-03-06 Thread Rusty Russell
Wanlong Gao  writes:
> After commit 07fe997, lguest tool has already moved from
> Documentation/virtual/lguest/ to tools/lguest/.

Thanks, applied!

Cheers,
Rusty.


Re: [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config

2013-03-06 Thread Sjur Brændeland
On Wed, Mar 6, 2013 at 1:37 PM, Sjur Brændeland  wrote:
> On Wed, Mar 6, 2013 at 1:16 PM, Ohad Ben-Cohen  wrote:
>> On Wed, Mar 6, 2013 at 12:50 PM, Sjur Brændeland  wrote:
>>> The caif_virtio driver is using both host and guest ring sides, host side
>>> rings in RX direction and guest side rings in TX direction. The reason
>>> behind this is to enable zero-copy on the producer sides.
>>> (For details see recent discussion with Ohad "Wrappers for vringh"
>>
>> FWIW, I'm not convinced that host side rings are necessary for
>> zero-copy - you could always just pass pointers to your huge data
>> inside buffers posted by guest side rings.

When I read this again I realized that I missed your point regarding
passing pointers inside buffers completely. My rambling below
argued against using the buffers provided in the available ring as a
replacement of the modem internal allocator. Sorry for the confusion.

I haven't really spent time thinking through the implications of the alternative
of passing buffer pointers inside buffers posted by guest side rings.
But it seems to me that there would be some issues related to returning
consumed buffers if you use the rings in this way.

Thanks,
Sjur

> Well, to do that for CAIF you would have to integrate virtio buffer handling
> deep into the lower layer of the modem's 3GPP stack so that the 3GPP stack
> could take buffers from the available ring.
>
> From the outside point of view, perhaps this sounds doable, but in reality you
> wouldn't get any Network Signalling engineers implementing the 3GPP stack
> to buy into that.
>
> Another issue is that our modem also supports other interface technologies
> such as USB and HSI. It simply doesn't make sense to enforce virtio rings
> internally if virtio isn't otherwise used on the modem.
>
> Without host side rings we simply wouldn't be able to implement zero-copy
> in the RX direction for the STE-modem.
>
> Regards,
> Sjur
>
>>
>> That said, I do believe that mixing guest side and host side rings may
>> simplify the overall solution in some cases, especially in more
>> complex multi core scenarios (where several cores all communicate with
>> each other).

Re: [PATCH 5/5] tcm_vhost: Add hotplug/hotunplug support

2013-03-06 Thread Asias He
On Wed, Mar 06, 2013 at 10:21:09AM +0100, Stefan Hajnoczi wrote:
> On Wed, Mar 06, 2013 at 02:16:30PM +0800, Asias He wrote:
> > +static struct tcm_vhost_evt *tcm_vhost_allocate_evt(struct vhost_scsi *vs,
> > +   u32 event, u32 reason)
> > +{
> > +   struct tcm_vhost_evt *evt;
> > +
> > +   if (atomic_read(&vs->vs_events_nr) > VHOST_SCSI_MAX_EVENT)
> > +   return NULL;
> > +
> > +   evt = kzalloc(sizeof(*evt), GFP_KERNEL);
> > +
> > +   if (evt) {
> > +   atomic_inc(&vs->vs_events_nr);
> 
> This looks suspicious: checking vs_events_nr > VHOST_SCSI_MAX_EVENT
> first and then incrementing later isn't atomic!

This does not matter. (1) and (2) are okay. In case (3), the other side
can only decrease the number of events, so the limit will not be exceeded.

(1) 
   atomic_dec()
   atomic_read() 
   atomic_inc()
(2)
   atomic_read() 
   atomic_inc()
   atomic_dec()

(3)
   atomic_read() 
   atomic_dec()
   atomic_inc()
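The interleavings above can be modelled in plain userspace C11 (a sketch, not the kernel code): the check-then-increment is tolerable precisely because the only concurrent writer decrements, so a race can only leave the counter lower than what the check saw. If increments could also race, a compare-exchange loop — the moral equivalent of the kernel's atomic_add_unless() — would be needed instead:

```c
#include <stdatomic.h>

#define MAX_EVENTS 128

static atomic_int events_nr;

/* Non-atomic check-then-increment, as in the patch: safe only because
 * the peer never increments, so a racing decrement can only make the
 * count smaller than the value the check observed.  The limit is soft
 * but can never be overshot by concurrent decrements. */
static int alloc_evt_relaxed(void)
{
    if (atomic_load(&events_nr) > MAX_EVENTS)
        return -1;                      /* over the limit */
    atomic_fetch_add(&events_nr, 1);
    return 0;
}

/* Strict alternative: compare-exchange loop enforcing a hard limit even
 * with multiple concurrent incrementers. */
static int alloc_evt_strict(void)
{
    int old = atomic_load(&events_nr);

    do {
        if (old > MAX_EVENTS)
            return -1;
        /* on failure, 'old' is reloaded with the current value */
    } while (!atomic_compare_exchange_weak(&events_nr, &old, old + 1));
    return 0;
}

static void free_evt(void)
{
    atomic_fetch_sub(&events_nr, 1);
}
```

With a single incrementing side, both variants give the same bound, which is why the simpler form is defensible here.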

  
> > +static int tcm_vhost_hotunplug(struct tcm_vhost_tpg *tpg, struct se_lun 
> > *lun)
> > +{
> > +   struct vhost_scsi *vs = tpg->vhost_scsi;
> > +
> > +   mutex_lock(&tpg->tv_tpg_mutex);
> > +   vs = tpg->vhost_scsi;
> > +   mutex_unlock(&tpg->tv_tpg_mutex);
> > +   if (!vs)
> > +   return -EOPNOTSUPP;
> > +
> > +   if (!tcm_vhost_check_feature(vs, VIRTIO_SCSI_F_HOTPLUG))
> > +   return -EOPNOTSUPP;
> > +
> > +   return tcm_vhost_send_evt(vs, tpg, lun,
> > +   VIRTIO_SCSI_T_TRANSPORT_RESET,
> > +   VIRTIO_SCSI_EVT_RESET_REMOVED);
> > +}
> 
> tcm_vhost_hotplug() and tcm_vhost_hotunplug() are the same function
> except for VIRTIO_SCSI_EVT_RESET_RESCAN vs
> VIRTIO_SCSI_EVT_RESET_REMOVED.  That can be passed in as an argument and
> the code duplication can be eliminated.

I thought about this also. We can have a tcm_vhost_do_hotplug() helper.

   tcm_vhost_do_hotplug(tpg, lun, plug)
   
   tcm_vhost_hotplug() {
tcm_vhost_do_hotplug(tpg, lun, true)
   }
   
   tcm_vhost_hotunplug() {
tcm_vhost_do_hotplug(tpg, lun, false)
   }

The reason I did not do that is that I do not like the true/false argument,
but it does remove the duplication. I will do it.
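A userspace model of that refactoring, following Stefan's suggestion of passing the event reason directly instead of a bool (all names and constant values below are illustrative stand-ins for the tcm_vhost code, not the real virtio-scsi definitions):

```c
/* Illustrative placeholder values, not the real virtio-scsi constants. */
enum { VIRTIO_SCSI_T_TRANSPORT_RESET = 1 };
enum { VIRTIO_SCSI_EVT_RESET_RESCAN = 1, VIRTIO_SCSI_EVT_RESET_REMOVED = 2 };

static int last_event, last_reason;

/* Stub recording what tcm_vhost_send_evt() would have been called with. */
static int send_evt(int event, int reason)
{
    last_event = event;
    last_reason = reason;
    return 0;
}

/* One shared helper carries the reason, so the feature check and the
 * send path live in a single place... */
static int do_hotplug(int reason)
{
    return send_evt(VIRTIO_SCSI_T_TRANSPORT_RESET, reason);
}

/* ...and the two entry points collapse to one line each, with no
 * opaque true/false argument at the call sites. */
static int hotplug(void)   { return do_hotplug(VIRTIO_SCSI_EVT_RESET_RESCAN); }
static int hotunplug(void) { return do_hotplug(VIRTIO_SCSI_EVT_RESET_REMOVED); }
```

Passing the reason enum keeps the call sites self-documenting, which avoids the readability concern about a bare bool.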

-- 
Asias


Re: [PATCH 2/5] tcm_vhost: Introduce tcm_vhost_check_feature()

2013-03-06 Thread Asias He
On Wed, Mar 06, 2013 at 10:06:20AM +0100, Stefan Hajnoczi wrote:
> On Wed, Mar 06, 2013 at 02:16:27PM +0800, Asias He wrote:
> > This helper is useful to check if a feature is supported.
> > 
> > Signed-off-by: Asias He 
> > ---
> >  drivers/vhost/tcm_vhost.c | 14 ++
> >  1 file changed, 14 insertions(+)
> > 
> > diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
> > index b3e50d7..fdbf986 100644
> > --- a/drivers/vhost/tcm_vhost.c
> > +++ b/drivers/vhost/tcm_vhost.c
> > @@ -91,6 +91,20 @@ static int iov_num_pages(struct iovec *iov)
> >((unsigned long)iov->iov_base & PAGE_MASK)) >> PAGE_SHIFT;
> >  }
> >  
> > +static bool tcm_vhost_check_feature(struct vhost_scsi *vs, u64 feature)
> > +{
> > +   u64 acked_features;
> > +   bool ret = false;
> > +
> > +   mutex_lock(&vs->dev.mutex);
> > +   acked_features = vs->dev.acked_features;
> > +   if (acked_features & 1ULL << feature)
> > +   ret = true;
> > +   mutex_unlock(&vs->dev.mutex);
> > +
> > +   return ret;
> > +}
> 
> This is like vhost_has_feature() except it acquires dev.mutex?
> 
> In any case it isn't tcm_vhost-specific and could be in vhost.c.

tcm_vhost_check_feature() is called outside the vhost worker thread, so
the dev.mutex is needed.

Mst, what's your preference here? Add a vhost_has_feature_locked() in
vhost.c, or should I use vhost_has_feature() in tcm_vhost_check_feature()?

Thanks.
-- 
Asias


Re: [PATCH] tcm_vhost: Add hotplug/hotunplug support

2013-03-06 Thread Asias He
On Wed, Mar 06, 2013 at 06:41:47PM +0100, Paolo Bonzini wrote:
> Il 06/03/2013 02:28, Asias He ha scritto:
> >> > This can queue up quite a bit of memory if the handler thread
> >> > is delayed, no? Can we limit the # of outstanding events?
> >> > Will guest recover from a missed event?
> > Hmm, good point. Will limit the number. The size of 'struct
> > tcm_vhost_evt' is around 20 bytes. So if we limit it to 128, it is ~2.5K
> > of memory.
> > 
> > Paolo, if we limit the number of outstanding events and set
> > vs->vs_events_dropped, the guest will recover from a missed event, right?
> 
> Yes.  At least it should (it doesn't yet).

Ok, thanks.

> Paolo
> 

-- 
Asias


Re: [PATCH] tcm_vhost: Add hotplug/hotunplug support

2013-03-06 Thread Asias He

On Wed, Mar 06, 2013 at 03:28:00PM +0100, Paolo Bonzini wrote:
> Il 05/03/2013 23:51, Nicholas A. Bellinger ha scritto:
> > For making progress with Paolo's vhost-scsi branch on my side, this was
> > still being blocked by bios boot hangs:
> > 
> > http://www.spinics.net/lists/target-devel/msg03830.html
> > 
> > Paolo & Co, any ideas on the latter..?
> 
> Nope.  Asias, do you have time to look at it?  Only patches 3 and 4 in
> that series are needed, they are at
> 
> http://article.gmane.org/gmane.comp.emulators.qemu/192143
> http://article.gmane.org/gmane.comp.emulators.qemu/192147
> 
> (vhost_scsi_set_endpoint and vhost_scsi_clear_endpoint need to be
> integrated in vhost_scsi_start and vhost_scsi_stop).

Okay. I will take a look. I am CC'ing Dominik Dingel, who might be
interested in this.

Nicholas, could you share the code where you left off (on top of Paolo's),
so I can start from there?

-- 
Asias


Re: [PATCH 4/6] virtio_ring: virtqueue_add_outbuf / virtqueue_add_inbuf.

2013-03-06 Thread Rusty Russell
Asias He  writes:
> On Wed, Mar 06, 2013 at 04:23:24PM +1100, Rusty Russell wrote:
>> These are specialized versions of virtqueue_add_buf(), which cover
>> over 80% of cases and are far clearer.
>> 
>> In particular, the scatterlists passed to these functions don't have
>> to be clean (ie. we ignore end markers).
>> 
>> Signed-off-by: Rusty Russell  
>
> So, what is the plan for the following ideas discussed in the other
> thread?
>
>'''
>> Looking at code, it seems that most users really have a single sg, in
>> low memory. So how about simply passing void * instead of sg? Whoever
>> has multiple sgs can use the rich interface.
>
>Good point, let's do that:
>1) Make virtqueue_add_outbuf()/inbuf() take a void * and len.
>2) Transfer users across to use that.
>3) Make everyone else use clean scatterlists with virtqueue_add_sgs[].
>4) Remove virtqueue_add_bufs().

Networking performance: there is still a performance penalty in using
virtqueue_add_sgs(), and it can't use a simple void * and len.

So I changed my mind.  Again...

Cheers,
Rusty.


Re: [PATCH 0/6] virtio_add_buf replacement.

2013-03-06 Thread Rusty Russell
Asias He  writes:
> On Wed, Mar 06, 2013 at 04:15:02PM +1100, Rusty Russell wrote:
>> OK, so I've spent a few days benchmarking.  Turns out 80% of
>> virtio_add_buf cases are uni-directional (including the
>> always-performance-sensitive networking code), and that gets no
>> performance penalty (though tests with real networking would be
>> appreciated!).
>> 
>> I'm not reposting all the "convert driver to virtio_add_outbuf()"
>> patches: just the scsi one which I didn't have before.  I won't actually
>> remove virtio_add_buf() until the *following* merge window, just to be
>> sure.
>
> Why not send out all the patches in this series? It would be much easier
> for people to read in one thread.

I could re-spam people, but the patches are unchanged and uninteresting:
the scsi one I wanted an Ack for, however.

I really want people to review the core patches, and if they're fine,
I'll post the whole thing one last time before putting them in my
virtio-next branch (where they can't change).

Cheers,
Rusty.


Re: [PATCH 3/6] virtio_ring: inline internal vring functions more aggressively.

2013-03-06 Thread Rusty Russell
"Michael S. Tsirkin"  writes:

> On Wed, Mar 06, 2013 at 04:22:09PM +1100, Rusty Russell wrote:
>> We use inline and get gcc to do the right thing inlining the
>> "indirect" traversal functions.  This also means we don't need to
>> clean the sgs.
>> 
>> for i in `seq 50`; do /usr/bin/time -f 'Wall time:%e' ./vringh_test 
>> --indirect --eventidx --parallel --fast-vringh; done 2>&1 | stats 
>> --trim-outliers:
>> 
>> Before:
>>  Using CPUS 0 and 3
>>  Guest: notified 0, pinged 39014-39063(39062)
>>  Host: notified 39014-39063(39062), pinged 0
>>  Wall time:1.90-2.35(1.921875)
>> 
>> After:
>>  Using CPUS 0 and 3
>>  Guest: notified 0, pinged 39062-39063(39063)
>>  Host: notified 39062-39063(39063), pinged 0
>>  Wall time:1.76-2.22(1.789167)
>> 
>> Signed-off-by: Rusty Russell  
>
> It's pretty cool that gcc is able to optimize it like this.
> Which gcc version did you use here?

Gcc has done this for quite a while now... I reported a bug about it in
2008: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35728

This was i686-linux-gnu-gcc-4.7 (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2.

Cheers,
Rusty.


[PATCH 1/2] virtio-scsi: use pr_err() instead of printk()

2013-03-06 Thread Wanlong Gao
Convert the virtio-scsi driver to use pr_err() instead of printk().

Signed-off-by: Wanlong Gao 
---
 drivers/scsi/virtio_scsi.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 612e320..f679b8c 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -13,6 +13,8 @@
  *
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include 
 #include 
 #include 
@@ -771,8 +773,7 @@ static int __init init(void)
 
virtscsi_cmd_cache = KMEM_CACHE(virtio_scsi_cmd, 0);
if (!virtscsi_cmd_cache) {
-   printk(KERN_ERR "kmem_cache_create() for "
-   "virtscsi_cmd_cache failed\n");
+   pr_err("kmem_cache_create() for virtscsi_cmd_cache failed\n");
goto error;
}
 
@@ -781,8 +782,7 @@ static int __init init(void)
mempool_create_slab_pool(VIRTIO_SCSI_MEMPOOL_SZ,
 virtscsi_cmd_cache);
if (!virtscsi_cmd_pool) {
-   printk(KERN_ERR "mempool_create() for"
-   "virtscsi_cmd_pool failed\n");
+   pr_err("mempool_create() for virtscsi_cmd_pool failed\n");
goto error;
}
ret = register_virtio_driver(&virtio_scsi_driver);
-- 
1.8.2.rc2



[PATCH 2/2] virtio-scsi: reorder the goto label in init()

2013-03-06 Thread Wanlong Gao
Reorder the goto labels in init() to make the error handling clearer, and
remove the useless NULL pointer reassignments.

Signed-off-by: Wanlong Gao 
---
 drivers/scsi/virtio_scsi.c | 27 +++
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index f679b8c..55dfb06 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -774,32 +774,27 @@ static int __init init(void)
virtscsi_cmd_cache = KMEM_CACHE(virtio_scsi_cmd, 0);
if (!virtscsi_cmd_cache) {
pr_err("kmem_cache_create() for virtscsi_cmd_cache failed\n");
-   goto error;
+   return ret;
}
 
-
-   virtscsi_cmd_pool =
-   mempool_create_slab_pool(VIRTIO_SCSI_MEMPOOL_SZ,
-virtscsi_cmd_cache);
+   virtscsi_cmd_pool = mempool_create_slab_pool(VIRTIO_SCSI_MEMPOOL_SZ,
+virtscsi_cmd_cache);
if (!virtscsi_cmd_pool) {
pr_err("mempool_create() for virtscsi_cmd_pool failed\n");
-   goto error;
+   goto out_destroy_cache;
}
+
ret = register_virtio_driver(&virtio_scsi_driver);
if (ret < 0)
-   goto error;
+   goto out_destroy_pool;
 
return 0;
 
-error:
-   if (virtscsi_cmd_pool) {
-   mempool_destroy(virtscsi_cmd_pool);
-   virtscsi_cmd_pool = NULL;
-   }
-   if (virtscsi_cmd_cache) {
-   kmem_cache_destroy(virtscsi_cmd_cache);
-   virtscsi_cmd_cache = NULL;
-   }
+out_destroy_pool:
+   mempool_destroy(virtscsi_cmd_pool);
+out_destroy_cache:
+   kmem_cache_destroy(virtscsi_cmd_cache);
+
return ret;
 }
 
-- 
1.8.2.rc2



[PATCH] vhost_net: remove tx polling state

2013-03-06 Thread Jason Wang
After commit 2b8b328b61c799957a456a5a8dab8cc7dea68575 (vhost_net: handle polling
errors when setting backend), we in fact track the polling state through
poll->wqh, so there is no need to duplicate the work with an extra
vhost_net_poll_state. This patch removes it and makes the code simpler.

Signed-off-by: Jason Wang 
---
 drivers/vhost/net.c   |   60 
 drivers/vhost/vhost.c |3 ++
 2 files changed, 13 insertions(+), 50 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 959b1cd..d1a03dd 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -64,20 +64,10 @@ enum {
VHOST_NET_VQ_MAX = 2,
 };
 
-enum vhost_net_poll_state {
-   VHOST_NET_POLL_DISABLED = 0,
-   VHOST_NET_POLL_STARTED = 1,
-   VHOST_NET_POLL_STOPPED = 2,
-};
-
 struct vhost_net {
struct vhost_dev dev;
struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX];
struct vhost_poll poll[VHOST_NET_VQ_MAX];
-   /* Tells us whether we are polling a socket for TX.
-* We only do this when socket buffer fills up.
-* Protected by tx vq lock. */
-   enum vhost_net_poll_state tx_poll_state;
/* Number of TX recently submitted.
 * Protected by tx vq lock. */
unsigned tx_packets;
@@ -155,28 +145,6 @@ static void copy_iovec_hdr(const struct iovec *from, 
struct iovec *to,
}
 }
 
-/* Caller must have TX VQ lock */
-static void tx_poll_stop(struct vhost_net *net)
-{
-   if (likely(net->tx_poll_state != VHOST_NET_POLL_STARTED))
-   return;
-   vhost_poll_stop(net->poll + VHOST_NET_VQ_TX);
-   net->tx_poll_state = VHOST_NET_POLL_STOPPED;
-}
-
-/* Caller must have TX VQ lock */
-static int tx_poll_start(struct vhost_net *net, struct socket *sock)
-{
-   int ret;
-
-   if (unlikely(net->tx_poll_state != VHOST_NET_POLL_STOPPED))
-   return 0;
-   ret = vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
-   if (!ret)
-   net->tx_poll_state = VHOST_NET_POLL_STARTED;
-   return ret;
-}
-
 /* In case of DMA done not in order in lower device driver for some reason.
  * upend_idx is used to track end of used idx, done_idx is used to track head
  * of used idx. Once lower device DMA done contiguously, we will signal KVM
@@ -231,6 +199,7 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, 
bool success)
 static void handle_tx(struct vhost_net *net)
 {
struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
+   struct vhost_poll *poll = net->poll + VHOST_NET_VQ_TX;
unsigned out, in, s;
int head;
struct msghdr msg = {
@@ -256,7 +225,7 @@ static void handle_tx(struct vhost_net *net)
wmem = atomic_read(&sock->sk->sk_wmem_alloc);
if (wmem >= sock->sk->sk_sndbuf) {
mutex_lock(&vq->mutex);
-   tx_poll_start(net, sock);
+   vhost_poll_start(poll, sock->file);
mutex_unlock(&vq->mutex);
return;
}
@@ -265,7 +234,7 @@ static void handle_tx(struct vhost_net *net)
vhost_disable_notify(&net->dev, vq);
 
if (wmem < sock->sk->sk_sndbuf / 2)
-   tx_poll_stop(net);
+   vhost_poll_stop(poll);
hdr_size = vq->vhost_hlen;
zcopy = vq->ubufs;
 
@@ -287,7 +256,7 @@ static void handle_tx(struct vhost_net *net)
 
wmem = atomic_read(&sock->sk->sk_wmem_alloc);
if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
-   tx_poll_start(net, sock);
+   vhost_poll_start(poll, sock->file);
set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
break;
}
@@ -298,7 +267,7 @@ static void handle_tx(struct vhost_net *net)
(vq->upend_idx - vq->done_idx) :
(vq->upend_idx + UIO_MAXIOV - vq->done_idx);
if (unlikely(num_pends > VHOST_MAX_PEND)) {
-   tx_poll_start(net, sock);
+   vhost_poll_start(poll, sock->file);
set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
break;
}
@@ -364,7 +333,7 @@ static void handle_tx(struct vhost_net *net)
}
vhost_discard_vq_desc(vq, 1);
if (err == -EAGAIN || err == -ENOBUFS)
-   tx_poll_start(net, sock);
+   vhost_poll_start(poll, sock->file);
break;
}
if (err != len)
@@ -627,7 +596,6 @@ static int vhost_net_open(struct inode *inode, struct file 
*f)
 
vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
vhost_poll_init(n->poll + VHOST_N

Re: [PATCH 1/1] virtio: rng: disallow multiple device registrations, fixes crashes

2013-03-06 Thread Rusty Russell
Amit Shah  writes:
> The code currently only supports one virtio-rng device at a time.
> Invoking guests with multiple devices causes the guest to blow up.
>
> Check if we've already registered and initialised the driver.  Also
> cleanup in case of registration errors or hot-unplug so that a new
> device can be used.
>
> Reported-by: Peter Krempa 
> Reported-by: 
> Signed-off-by: Amit Shah 
> ---
>
> Also valid for stable?

Yes.  We could fix virtio-rng to allow multiple rngs, but of course it
will fail anyway since hwrng wants unique names.  And changing the name
to be virtio-%u will probably break things, for no real upside.

Applied, and Cc:stable added.

Thanks,
Rusty.


Re: [PATCH 2/2] tools/virtio: make barriers stronger.

2013-03-06 Thread Rusty Russell
"Michael S. Tsirkin"  writes:
> On Wed, Mar 06, 2013 at 03:54:42PM +1100, Rusty Russell wrote:
>> In the coming vringh_test, we share an mmap with another userspace process
>> for testing.  This requires real barriers.
>> 
>> Signed-off-by: Rusty Russell 
>> 
>> diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
>> index aff61e1..7a63693 100644
>> --- a/tools/virtio/asm/barrier.h
>> +++ b/tools/virtio/asm/barrier.h
>> @@ -3,8 +3,8 @@
>>  #define mb() __sync_synchronize()
>>  
>>  #define smp_mb()mb()
>> -# define smp_rmb()  barrier()
>> -# define smp_wmb()  barrier()
>> +# define smp_rmb()  mb()
>> +# define smp_wmb()  mb()
>>  /* Weak barriers should be used. If not - it's a bug */
>>  # define rmb()  abort()
>>  # define wmb()  abort()
>
> Hmm this seems wrong on x86 which has strong order in hardware.
> It should not matter whether the other side is a userspace
> process or a kernel thread.

Actually, this code is completely generic now, though overkill for x86's
smp_wmb().

Interestingly, when I tried defining them, 32-bit x86 slowed down (it seems
that gcc uses "lock orl $0x0,(%esp)" for __sync_synchronize()):

On my 32-bit laptop: Intel(R) Core(TM) i5 CPU   M 560  @ 2.67GHz

Before:
Wall time:1.66-1.79(1.682500)
After:
Wall time:1.93-3.62(1.960625)

64 bit it's a win:
On 2.6.32-358.el6.x86_64, gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), Intel(R) 
Xeon(R) CPU   E5620  @ 2.40GHz:

Before:
real0m2.937000-8.339000(3.123979)s
user0m2.811000-8.233000(2.954813)s
sys 0m0.052000-0.154000(0.089396)s
After:
real0m2.559000-2.936000(2.726729)s
user0m2.397000-2.651000(2.506396)s
sys 0m0.055000-0.152000(0.090667)s

Raw performance doesn't really matter, of course, but it's tempting to
use these asm barriers for __x86_64__, and __sync_synchronize() everywhere
else.

Thoughts?
Rusty.

diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
index 7a63693..8de720a 100644
--- a/tools/virtio/asm/barrier.h
+++ b/tools/virtio/asm/barrier.h
@@ -1,11 +1,12 @@
 #if defined(__i386__) || defined(__x86_64__)
 #define barrier() asm volatile("" ::: "memory")
-#define mb() __sync_synchronize()
 
-#define smp_mb()   mb()
-# define smp_rmb() mb()
-# define smp_wmb() mb()
+#define smp_mb()   asm volatile("mfence":::"memory")
+#define smp_rmb()  asm volatile("lfence":::"memory")
+#define smp_wmb()  asm volatile("sfence" ::: "memory")
+
 /* Weak barriers should be used. If not - it's a bug */
+# define mb()  abort()
 # define rmb() abort()
 # define wmb() abort()
 #else


Re: [PATCH] virtio-spec: Define virtio-mmio registers as LE

2013-03-06 Thread Rusty Russell
Pawel Moll  writes:
> On Tue, 2013-03-05 at 00:11 +, Rusty Russell wrote:
>> Pawel Moll  writes:
>> > On Fri, 2013-03-01 at 11:21 +, Marc Zyngier wrote:
>> >> > Having said that, Rusty was contemplating enforcing LE config space in
>> >> > the new PCI layout...
>> >> 
>> >> I wouldn't complain about that, and would like to see a similar thing on
>> >> MMIO.
>> >
>> > Wherever PCI goes, MMIO follows :-)
>> 
>> Yes, but if you switch from 'guest-endian' to 'little-endian' how will
>> you tell?  For PCI, we'd detect it by using the new layout.
>
> The version register/value. At some point there will be a
> new(ish) MMIO layout anyway to deal with 64-bit addresses, replacing the
> ring page number with two 32-bit hi/lo physical address registers. This
> was discussed not long after the driver got merged...

As long as you have a plan for older guests...

>> I'd rather you specify MMIO as little endian, and we fix the kernel
>> config accessors to be endian aware (ie. 8, 16, 32, 64-bit accessors).
>> Since noone BE is using MMIO right now, it's safe...
>
> That's absolutely fine with me; however, I don't see anything I could do
> in the virtio_mmio driver and spec - the virtio_config_ops specifies
> get/set as void * operations and I simply do a byte-by-byte copy. Have I
> missed some config/endianness/PCI-related discussion?

Yes, that's exactly what I mean, they'd have to be split into
8/16/32/64 accessors.  Which would do byte-by-byte for PCI.

The spec can simply be updated.

Cheers,
Rusty.