Re: [PATCH 14/18] virtio: add api for delayed callbacks

2011-05-19 Thread Michael S. Tsirkin
On Mon, May 16, 2011 at 04:43:21PM +0930, Rusty Russell wrote:
 On Sun, 15 May 2011 15:48:18 +0300, Michael S. Tsirkin m...@redhat.com 
 wrote:
  On Mon, May 09, 2011 at 03:27:33PM +0930, Rusty Russell wrote:
   On Wed, 4 May 2011 23:52:33 +0300, Michael S. Tsirkin m...@redhat.com 
   wrote:
Add an API that tells the other side that callbacks
should be delayed until a lot of work has been done.
Implement using the new used_event feature.
   
   Since you're going to add a capacity query anyway, why not add the
   threshold argument here?
  
  I thought that if we keep the API kind of generic
  there might be more of a chance that future transports
  will be able to implement it. For example, with an
  old host we can't commit to a specific index.
 
 No, it's always a hint anyway: you can be notified before the threshold
 is reached.  But best make it explicit I think.
 
 Cheers,
 Rusty.


I tried doing that and remembered the real reason I went for this API:

capacity is limited by descriptor table space, not
used ring space: each entry in the used ring can free up multiple entries
in the descriptor ring. Thus the ring can't provide a
callback once capacity reaches N: capacity only becomes
known after we get bufs.

We could try to make the API pass in the number of freed bufs, however:
- this is not really what virtio-net cares about (it cares about
  capacity)
- if the driver passes a number > the number of outstanding bufs, it will
  never get a callback. So to stay correct the driver would need to
  track the number of outstanding requests. The simpler API avoids that.


APIs are easy to change, so I'm guessing it's not a major blocker:
we can change this later when e.g. block tries to
pass in some kind of extra hint; we'll be smarter
about how this API should change then.

Right?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 09/18] virtio: use avail_event index

2011-05-19 Thread Michael S. Tsirkin
On Wed, May 18, 2011 at 09:49:42AM +0930, Rusty Russell wrote:
 On Tue, 17 May 2011 09:10:31 +0300, Michael S. Tsirkin m...@redhat.com 
 wrote:
  Well one can imagine a driver doing:
  
  while (virtqueue_get_buf()) {
  virtqueue_add_buf()
  }
  virtqueue_kick()
  
  which looks sensible (batch kicks) but might
  process any number of bufs between kicks.
 
 No, we currently only expose the buffers in the kick, so it can only
 fill the ring doing that.
 
 We could change that (and maybe that's worth looking at)...

That's actually what one of the early patches in the series did.
I guess I can try and reorder the patches; I do believe
it makes sense to publish immediately, as this way the
host can work in parallel with the guest.

  If we look at drivers closely enough, I think none
  of them do the equivalent of the above, but not 100% sure.
 
 I'm pretty sure we don't have this kind of 'echo' driver yet.  Drivers
 tend to take OS requests and queue them.  The only one which does
 anything even partially sophisticated is the net driver...
 
 Thanks,
 Rusty.

I guess I'll just need to do the legwork and check then.

-- 
MST


Re: [PATCH] arch/tile: add /proc/tile, /proc/sys/tile, and a sysfs cpu attribute

2011-05-19 Thread Arnd Bergmann
(adding virtualization mailing list)

On Thursday 19 May 2011, Chris Metcalf wrote:
 On 5/19/2011 9:41 AM, Arnd Bergmann wrote:
  /proc/tile/hvconfig
Detailed configuration description of the hypervisor config
 
 I'm concerned about moving this one out of /proc, since it's just (copious)
 free text.  An hvconfig (hypervisor config) file describes hypervisor
 driver dedicated tiles that run things like network packet or PCIe
 ingress/egress processing, etc.  In addition it lists hypervisor driver
 options, boot flags for the kernel, etc, all kinds of things -- and you
 can't really guarantee that it will fit on a 4KB page, though in practice
 it usually does.  The hypervisor reads this file from the boot stream when
 it boots, and then makes it available to Linux not for Linux's use, or even
 for programmatic userspace use, but just for end users to be able to review
 and verify that the configuration they think they booted is really what
 they got, for customer remote debugging, etc.  The remote debugging
 aspect makes truncation to page size a particularly worrisome idea.

Since it's not the kernel that is imposing the format here, you could
make it a binary sysfs attribute, which works in the same way as
a proc file and does not have the size limitations.

  /proc/tile/board
Information on part numbers, serial numbers, etc., of the
hardware that the kernel is executing on
 
  /proc/tile/switch
The type of control path for the onboard network switch, if any.
 
 These two report information about the hardware, not the hypervisor.  For
 example:
 
 # cat /proc/tile/board
 board_part: 402-2-05
 board_serial: NBS-5002-00012
 chip_serial: P62338.01.110
 chip_revision: A0
 board_revision: 2.2
 board_description: Tilera TILExpressPro-64, TILEPro64 processor (866 
 MHz-capable), 1 10GbE, 6 1GbE
 # cat /proc/tile/switch
 control: mdio gbe/0

I think it's ok to have it below /sys/hypervisor, because the information
is provided through a hypervisor ABI, even though it describes something
else. This is more like /sys/firmware, but the boundaries between that
and /sys/hypervisor are not clearly defined when running virtualized anyway.

 The chip_serial and chip_revision can certainly hang off
 /sys/devices/system/cpu along with chip_height and chip_width (I've made
 this change now) but I don't know where the remaining board level
 description could go.  Note that (as you can see in the source) certain
 boards will also include four lines of output with the mezzanine board
 part number, serial number, revision, and description; this particular
 example doesn't have a mezzanine board.  The switch info is broken out
 into a separate file just to make it easier to script some /etc/rc code
 that launches a configurator for the Marvell switch on some of our boards,
 but is conceptually part of the board info.
 
  /proc/tile/hardwall
Information on the set of currently active hardwalls (note that
the implementation is already present in arch/tile/kernel/hardwall.c;
this change just enables it)
 
 This one is not a hypervisor-related file.  It just lists information about
 the set of Linux hardwalls currently active.  Again, it's not primarily
 intended for programmatic use, but as a diagnostic tool.

Same here: I'd still put it into the hypervisor structure.

Arnd


[PATCH RFC] virtio_ring: fix delayed enable cb implementation

2011-05-19 Thread Michael S. Tsirkin
Fix some signed/unsigned mistakes in virtqueue_enable_cb_delayed
by using u16 math throughout.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

---

I'll put this on my v1 branch as well

@@ -398,7 +397,7 @@ EXPORT_SYMBOL_GPL(virtqueue_enable_cb);
 bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
-   int bufs;
+   u16 bufs;
 
START_USE(vq);
 
@@ -412,7 +411,7 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
	bufs = (vq->vring.avail->idx - vq->last_used_idx) * 3 / 4;
	vring_used_event(&vq->vring) = vq->last_used_idx + bufs;
	virtio_mb();
-	if (unlikely(vq->vring.used->idx - vq->last_used_idx > bufs)) {
+	if (unlikely((u16)(vq->vring.used->idx - vq->last_used_idx) > bufs)) {
END_USE(vq);
return false;
}
 
-- 
1.7.5.53.gc233e


vmbus driver

2011-05-19 Thread K. Y. Srinivasan
 
Greg,

A few days ago you applied all the outstanding patches for the Hyper-V
drivers. With these patches, I have addressed all of the known review 
comments for the  vmbus driver (and a lot of comments/issues in other
drivers as well). I am still hoping I can address 
whatever other issues/comments there might be with the intention to 
get the vmbus driver out of staging in the current window. What is your 
sense of how feasible this is? From my side, I can assure you 
that I will address all legitimate issues in a very timely manner and this
will not be dependent upon the location of the drivers (staging or 
outside staging). Looking forward to hearing from you.

Regards,

K. Y



[PATCHv2 01/14] virtio: event index interface

2011-05-19 Thread Michael S. Tsirkin
Define a new feature bit for the guest and host to utilize
an event index (like Xen) instead of a flag bit to enable/disable
interrupts and kicks.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 include/linux/virtio_ring.h |   15 ++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index e4d144b..70b0b39 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -29,6 +29,12 @@
 /* We support indirect buffer descriptors */
#define VIRTIO_RING_F_INDIRECT_DESC	28
 
+/* The Guest publishes the used index for which it expects an interrupt
+ * at the end of the avail ring. Host should ignore the avail->flags field. */
+/* The Host publishes the avail index for which it expects a kick
+ * at the end of the used ring. Guest should ignore the used->flags field. */
+#define VIRTIO_RING_F_EVENT_IDX	29
+
 /* Virtio ring descriptors: 16 bytes.  These can chain together via next. */
 struct vring_desc {
/* Address (guest-physical). */
@@ -83,6 +89,7 @@ struct vring {
  * __u16 avail_flags;
  * __u16 avail_idx;
  * __u16 available[num];
+ * __u16 used_event_idx;
  *
  * // Padding to the next align boundary.
  * char pad[];
@@ -91,8 +98,14 @@ struct vring {
  * __u16 used_flags;
  * __u16 used_idx;
  * struct vring_used_elem used[num];
+ * __u16 avail_event_idx;
  * };
  */
+/* We publish the used event index at the end of the available ring, and vice
+ * versa. They are at the end for backwards compatibility. */
+#define vring_used_event(vr) ((vr)->avail->ring[(vr)->num])
+#define vring_avail_event(vr) (*(__u16 *)&(vr)->used->ring[(vr)->num])
+
 static inline void vring_init(struct vring *vr, unsigned int num, void *p,
  unsigned long align)
 {
@@ -107,7 +120,7 @@ static inline unsigned vring_size(unsigned int num, 
unsigned long align)
 {
return ((sizeof(struct vring_desc) * num + sizeof(__u16) * (2 + num)
	 + align - 1) & ~(align - 1))
-   + sizeof(__u16) * 2 + sizeof(struct vring_used_elem) * num;
+   + sizeof(__u16) * 3 + sizeof(struct vring_used_elem) * num;
 }
 
 #ifdef __KERNEL__
-- 
1.7.5.53.gc233e



[PATCHv2 02/14] virtio ring: inline function to check for events

2011-05-19 Thread Michael S. Tsirkin
With the new used_event and avail_event features, both
host and guest need similar logic to check whether events are
enabled, so it helps to put the common code in the header.

Note that Xen has similar logic for notification hold-off
in include/xen/interface/io/ring.h with req_event and req_prod
corresponding to event_idx + 1 and new_idx respectively.
+1 comes from the fact that req_event and req_prod in Xen start at 1,
while event index in virtio starts at 0.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 include/linux/virtio_ring.h |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 70b0b39..cf020e3 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -123,6 +123,20 @@ static inline unsigned vring_size(unsigned int num, 
unsigned long align)
+ sizeof(__u16) * 3 + sizeof(struct vring_used_elem) * num;
 }
 
+/* The following is used with USED_EVENT_IDX and AVAIL_EVENT_IDX */
+/* Assuming a given event_idx value from the other side, if
+ * we have just incremented index from old to new_idx,
+ * should we trigger an event? */
+static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
+{
+   /* Note: Xen has similar logic for notification hold-off
+* in include/xen/interface/io/ring.h with req_event and req_prod
+* corresponding to event_idx + 1 and new_idx respectively.
+* Note also that req_event and req_prod in Xen start at 1,
+* event indexes in virtio start at 0. */
+	return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
+}
+
 #ifdef __KERNEL__
 #include linux/irqreturn.h
 struct virtio_device;
-- 
1.7.5.53.gc233e



[PATCHv2 00/14] virtio and vhost-net performance enhancements

2011-05-19 Thread Michael S. Tsirkin
OK, here is the large patchset that implements the virtio spec update
that I sent earlier (the spec itself needs a minor update, will send
that out too next week, but I think we are on the same page here
already). It supersedes the PUBLISH_USED_IDX patches I sent
out earlier.

What will follow will be a patchset that actually includes 4 sets of
patches.  I note below their status.  Please consider for 2.6.40, at
least partially. Rusty, do you think it's feasible?

List of patches and what they do:

I) With the first patchset, we change virtio ring notification
hand-off to work like the one in Xen -
each side publishes an event index, the other one
notifies when it reaches that value -
With the one difference that the event index starts at 0,
same as the request index (in Xen the event index starts at 1).

These are the patches in this set:
virtio: event index interface
virtio ring: inline function to check for events
virtio_ring: support event idx feature
vhost: support event index
virtio_test: support event index

Changes in this part of the patchset from v1: address comments by Rusty et al.

I tested this a lot with virtio net and block and with the simulator, and
especially with the simulator it's easy to see a drastic performance
improvement here:

[virtio]# time ./virtio_test 
spurious wakeus: 0x7

real	0m0.169s
user	0m0.140s
sys	0m0.019s
[virtio]# time ./virtio_test --no-event-idx
spurious wakeus: 0x11

real	0m0.649s
user	0m0.295s
sys	0m0.335s

And these patches are mostly unchanged from the very first version,
changes being almost exclusively code cleanups. So I consider this part
the most stable; I strongly think these patches should go into 2.6.40.
One extra reason besides performance is that maintaining
them out of tree is very painful as guest/host ABI is affected.

II) Second set of patches: new apis and use in virtio_net
With the indexes in place it becomes possible to request an event after
many requests (and not just on the next one as done now). This shall fix
the TX queue overrun which currently triggers a storm of interrupts.

Another issue I tried to fix is capacity checks in virtio-net:
there's a new API for that, and on top of that,
I implemented a patch improving the real-time characteristics
of virtio_net.

Thus we get the second patchset:
virtio: add api for delayed callbacks
virtio_net: delay TX callbacks
virtio_ring: Add capacity check API
virtio_net: fix TX capacity checks using new API
virtio_net: limit xmit polling

This has some fixes that I posted previously applied,
but is otherwise identical to v1. I tried to change the API
for enable_cb_delayed as Rusty suggested but failed to do so;
I think it's not possible to define cleanly.

These work fine for me; I think they can be merged for 2.6.40
too, but it would be nice to hear back from Shirley, Tom, Krishna.

III) There's also a patch that adds a tweak to virtio ring
virtio: don't delay avail index update

This seems to help small message sizes where we are constantly draining
the RX VQ.

I'll need to benchmark this to be able to give any numbers
with confidence, but I don't see how it can hurt anything.
Thoughts?

IV) Last part is a set of patches to extend feature bits
to 64 bit. I tested this by using feature bit 32.
vhost: fix 64 bit features
virtio_test: update for 64 bit features
virtio: 64 bit features

It's nice to have, as this set used up the last free bit.
But it's not a must now that a single bit controls
use of the event index on both sides.

The patchset is on top of net-next which at the time
I last rebased was 15ecd03 - so roughly 2.6.39-rc2.
For testing I usually do merge v2.6.39 on top.

qemu patch is also ready.  Code can be pulled from here:

git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git 
vhost-net-next-event-idx-v3
git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu-kvm.git 
virtio-net-event-idx-v3

Rusty, I think it will be easier to merge vhost and virtio bits in one
go. Can it all go in through your tree (Dave in the past acked
sending a very similar patch through you, so it should not be a problem)?



-- 
1.7.5.53.gc233e


Michael S. Tsirkin (13):
  virtio: event index interface
  virtio ring: inline function to check for events
  virtio_ring: support event idx feature
  vhost: support event index
  virtio_test: support event index
  virtio: add api for delayed callbacks
  virtio_net: delay TX callbacks
  virtio_net: fix TX capacity checks using new API
  virtio_net: limit xmit polling
  virtio: don't delay avail index update
  virtio: 64 bit features
  virtio_test: update for 64 bit features
  vhost: fix 64 bit features

Shirley Ma (1):
  virtio_ring: Add capacity check API

 drivers/lguest/lguest_device.c |8 +-
 drivers/net/virtio_net.c   |   27 +---
 drivers/s390/kvm/kvm_virtio.c  |8 +-
 drivers/vhost/net.c|   12 ++--
 drivers/vhost/test.c   |6 +-
 drivers/vhost/vhost.c  |  138 ++--
 drivers/vhost/vhost.h  |   29 +---

[PATCHv2 03/14] virtio_ring: support event idx feature

2011-05-19 Thread Michael S. Tsirkin
Support for the new event idx feature:
1. When enabling interrupts, publish the current avail index
   value to the host to get interrupts on the next update.
2. Use the new avail_event feature to reduce the number
   of exits from the guest.

Simple test with the simulator:

[virtio]# time ./virtio_test
spurious wakeus: 0x7

real	0m0.169s
user	0m0.140s
sys	0m0.019s
[virtio]# time ./virtio_test --no-event-idx
spurious wakeus: 0x11

real	0m0.649s
user	0m0.295s
sys	0m0.335s

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_ring.c |   26 --
 1 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cc2f73e..1d0f9be 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -82,6 +82,9 @@ struct vring_virtqueue
/* Host supports indirect buffers */
bool indirect;
 
+   /* Host publishes avail event idx */
+   bool event;
+
/* Number of free buffers */
unsigned int num_free;
/* Head of free buffer list. */
@@ -237,18 +240,22 @@ EXPORT_SYMBOL_GPL(virtqueue_add_buf_gfp);
 void virtqueue_kick(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
+   u16 new, old;
START_USE(vq);
/* Descriptors and available array need to be set before we expose the
 * new available array entries. */
virtio_wmb();
 
-	vq->vring.avail->idx += vq->num_added;
+	old = vq->vring.avail->idx;
+	new = vq->vring.avail->idx = old + vq->num_added;
vq-num_added = 0;
 
/* Need to update avail index before checking if we should notify */
virtio_mb();
 
-	if (!(vq->vring.used->flags & VRING_USED_F_NO_NOTIFY))
+	if (vq->event ?
+	    vring_need_event(vring_avail_event(&vq->vring), new, old) :
+	    !(vq->vring.used->flags & VRING_USED_F_NO_NOTIFY))
		/* Prod other side to tell it about changes. */
		vq->notify(&vq->vq);
 
@@ -324,6 +331,14 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned 
int *len)
	ret = vq->data[i];
	detach_buf(vq, i);
	vq->last_used_idx++;
+   /* If we expect an interrupt for the next entry, tell host
+* by writing event index and flush out the write before
+* the read in the next get_buf call. */
+	if (!(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) {
+		vring_used_event(&vq->vring) = vq->last_used_idx;
+   virtio_mb();
+   }
+
END_USE(vq);
return ret;
 }
@@ -345,7 +360,11 @@ bool virtqueue_enable_cb(struct virtqueue *_vq)
 
/* We optimistically turn back on interrupts, then check if there was
 * more to do. */
+   /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
+* either clear the flags bit or point the event index at the next
+* entry. Always do both to keep code simple. */
	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
+	vring_used_event(&vq->vring) = vq->last_used_idx;
virtio_mb();
if (unlikely(more_used(vq))) {
END_USE(vq);
@@ -437,6 +456,7 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
 #endif
 
	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
+	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
 
/* No callback?  Tell other side not to bother us. */
if (!callback)
@@ -471,6 +491,8 @@ void vring_transport_features(struct virtio_device *vdev)
switch (i) {
case VIRTIO_RING_F_INDIRECT_DESC:
break;
+   case VIRTIO_RING_F_EVENT_IDX:
+   break;
default:
/* We don't understand this bit. */
			clear_bit(i, vdev->features);
-- 
1.7.5.53.gc233e



[PATCHv2 05/14] virtio_test: support event index

2011-05-19 Thread Michael S. Tsirkin
Add the ability to test the new event idx feature,
enabled by default.
---
 tools/virtio/virtio_test.c |   19 +--
 1 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
index df0c6d2..74d3331 100644
--- a/tools/virtio/virtio_test.c
+++ b/tools/virtio/virtio_test.c
@@ -198,6 +198,14 @@ const struct option longopts[] = {
.val = 'h',
},
{
+		.name = "event-idx",
+   .val = 'E',
+   },
+   {
+		.name = "no-event-idx",
+   .val = 'e',
+   },
+   {
		.name = "indirect",
.val = 'I',
},
@@ -211,13 +219,17 @@ const struct option longopts[] = {
 
 static void help()
 {
-	fprintf(stderr, "Usage: virtio_test [--help] [--no-indirect]\n");
+	fprintf(stderr, "Usage: virtio_test [--help]"
+		" [--no-indirect]"
+		" [--no-event-idx]"
+		"\n");
 }
 
 int main(int argc, char **argv)
 {
struct vdev_info dev;
-	unsigned long long features = 1ULL << VIRTIO_RING_F_INDIRECT_DESC;
+	unsigned long long features = (1ULL << VIRTIO_RING_F_INDIRECT_DESC) |
+		(1ULL << VIRTIO_RING_F_EVENT_IDX);
int o;
 
for (;;) {
@@ -228,6 +240,9 @@ int main(int argc, char **argv)
case '?':
help();
exit(2);
+   case 'e':
+			features &= ~(1ULL << VIRTIO_RING_F_EVENT_IDX);
+   break;
case 'h':
help();
goto done;
-- 
1.7.5.53.gc233e



[PATCHv2 04/14] vhost: support event index

2011-05-19 Thread Michael S. Tsirkin
Support the new event index feature. When acked,
utilize it to reduce the # of interrupts sent to the guest.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/net.c   |   12 ++--
 drivers/vhost/test.c  |6 +-
 drivers/vhost/vhost.c |  138 +
 drivers/vhost/vhost.h |   21 +---
 4 files changed, 127 insertions(+), 50 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 2f7c76a..e224a92 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -144,7 +144,7 @@ static void handle_tx(struct vhost_net *net)
}
 
	mutex_lock(&vq->mutex);
-	vhost_disable_notify(vq);
+	vhost_disable_notify(&net->dev, vq);
 
	if (wmem < sock->sk->sk_sndbuf / 2)
tx_poll_stop(net);
@@ -166,8 +166,8 @@ static void handle_tx(struct vhost_net *net)
			set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
break;
}
-		if (unlikely(vhost_enable_notify(vq))) {
-			vhost_disable_notify(vq);
+		if (unlikely(vhost_enable_notify(&net->dev, vq))) {
+			vhost_disable_notify(&net->dev, vq);
continue;
}
break;
@@ -315,7 +315,7 @@ static void handle_rx(struct vhost_net *net)
return;
 
	mutex_lock(&vq->mutex);
-	vhost_disable_notify(vq);
+	vhost_disable_notify(&net->dev, vq);
	vhost_hlen = vq->vhost_hlen;
	sock_hlen = vq->sock_hlen;
 
@@ -334,10 +334,10 @@ static void handle_rx(struct vhost_net *net)
break;
/* OK, now we need to know about added descriptors. */
if (!headcount) {
-			if (unlikely(vhost_enable_notify(vq))) {
+			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
				/* They have slipped one in as we were
				 * doing that: check again. */
-				vhost_disable_notify(vq);
+				vhost_disable_notify(&net->dev, vq);
continue;
}
/* Nothing new?  Wait for eventfd to tell us
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 099f302..734e1d7 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -49,7 +49,7 @@ static void handle_vq(struct vhost_test *n)
return;
 
	mutex_lock(&vq->mutex);
-	vhost_disable_notify(vq);
+	vhost_disable_notify(&n->dev, vq);
 
for (;;) {
		head = vhost_get_vq_desc(&n->dev, vq, vq->iov,
@@ -61,8 +61,8 @@ static void handle_vq(struct vhost_test *n)
break;
/* Nothing new?  Wait for eventfd to tell us they refilled. */
		if (head == vq->num) {
-			if (unlikely(vhost_enable_notify(vq))) {
-				vhost_disable_notify(vq);
+			if (unlikely(vhost_enable_notify(&n->dev, vq))) {
+				vhost_disable_notify(&n->dev, vq);
continue;
}
break;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 2ab2912..2a10786 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -37,6 +37,9 @@ enum {
VHOST_MEMORY_F_LOG = 0x1,
 };
 
+#define vhost_used_event(vq) ((u16 __user *)&vq->avail->ring[vq->num])
+#define vhost_avail_event(vq) ((u16 __user *)&vq->used->ring[vq->num])
+
 static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
poll_table *pt)
 {
@@ -161,6 +164,8 @@ static void vhost_vq_reset(struct vhost_dev *dev,
	vq->last_avail_idx = 0;
	vq->avail_idx = 0;
	vq->last_used_idx = 0;
+	vq->signalled_used = 0;
+	vq->signalled_used_valid = false;
	vq->used_flags = 0;
	vq->log_used = false;
	vq->log_addr = -1ull;
@@ -489,16 +494,17 @@ static int memory_access_ok(struct vhost_dev *d, struct 
vhost_memory *mem,
return 1;
 }
 
-static int vq_access_ok(unsigned int num,
+static int vq_access_ok(struct vhost_dev *d, unsigned int num,
struct vring_desc __user *desc,
struct vring_avail __user *avail,
struct vring_used __user *used)
 {
+   size_t s = vhost_has_feature(d, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
	return access_ok(VERIFY_READ, desc, num * sizeof *desc) &&
	       access_ok(VERIFY_READ, avail,
-			 sizeof *avail + num * sizeof *avail->ring) &&
+			 sizeof *avail + num * sizeof *avail->ring + s) &&
	       access_ok(VERIFY_WRITE, used,
-			 sizeof *used + num * sizeof *used->ring);
+

[PATCHv2 06/14] virtio: add api for delayed callbacks

2011-05-19 Thread Michael S. Tsirkin
Add an API that tells the other side that callbacks
should be delayed until a lot of work has been done.
Implement using the new event_idx feature.

Note: it might seem advantageous to let the drivers
ask for a callback after a specific capacity has
been reached. However, as a single head can
free many entries in the descriptor table,
we don't really have a clue about capacity
until get_buf is called. The API is the simplest
to implement at the moment, we'll see what kind of
hints drivers can pass when there's more than one
user of the feature.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_ring.c |   27 +++
 include/linux/virtio.h   |9 +
 2 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1d0f9be..6578e1a 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -376,6 +376,33 @@ bool virtqueue_enable_cb(struct virtqueue *_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_enable_cb);
 
+bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   u16 bufs;
+
+   START_USE(vq);
+
+   /* We optimistically turn back on interrupts, then check if there was
+* more to do. */
+   /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
+* either clear the flags bit or point the event index at the next
+* entry. Always do both to keep code simple. */
+	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
+	/* TODO: tune this threshold */
+	bufs = (u16)(vq->vring.avail->idx - vq->last_used_idx) * 3 / 4;
+	vring_used_event(&vq->vring) = vq->last_used_idx + bufs;
+	virtio_mb();
+	if (unlikely((u16)(vq->vring.used->idx - vq->last_used_idx) > bufs)) {
+   END_USE(vq);
+   return false;
+   }
+
+   END_USE(vq);
+   return true;
+}
+EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed);
+
 void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index aff5b4f..7108857 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -51,6 +51,13 @@ struct virtqueue {
  * This re-enables callbacks; it returns false if there are pending
  * buffers in the queue, to detect a possible race between the driver
  * checking for more work, and enabling callbacks.
+ * virtqueue_enable_cb_delayed: restart callbacks after disable_cb.
+ * vq: the struct virtqueue we're talking about.
+ * This re-enables callbacks but hints to the other side to delay
+ * interrupts until most of the available buffers have been processed;
+ * it returns false if there are many pending buffers in the queue,
+ * to detect a possible race between the driver checking for more work,
+ * and enabling callbacks.
  * virtqueue_detach_unused_buf: detach first unused buffer
  * vq: the struct virtqueue we're talking about.
  * Returns NULL or the data token handed to add_buf
@@ -86,6 +93,8 @@ void virtqueue_disable_cb(struct virtqueue *vq);
 
 bool virtqueue_enable_cb(struct virtqueue *vq);
 
+bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
+
 void *virtqueue_detach_unused_buf(struct virtqueue *vq);
 
 /**
-- 
1.7.5.53.gc233e



[PATCHv2 07/14] virtio_net: delay TX callbacks

2011-05-19 Thread Michael S. Tsirkin
Ask for delayed callbacks on TX ring full, to give the
other side more of a chance to make progress.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/net/virtio_net.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0cb0b06..f685324 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -609,7 +609,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
net_device *dev)
 * before it gets out of hand.  Naturally, this wastes entries. */
	if (capacity < 2+MAX_SKB_FRAGS) {
		netif_stop_queue(dev);
-		if (unlikely(!virtqueue_enable_cb(vi->svq))) {
+		if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
			/* More just got used, free them then recheck. */
			capacity += free_old_xmit_skbs(vi);
			if (capacity >= 2+MAX_SKB_FRAGS) {
-- 
1.7.5.53.gc233e



[PATCHv2 08/14] virtio_ring: Add capacity check API

2011-05-19 Thread Michael S. Tsirkin
From: Shirley Ma mashi...@us.ibm.com

Signed-off-by: Shirley Ma x...@us.ibm.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_ring.c |8 
 include/linux/virtio.h   |5 +
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 6578e1a..eed5f29 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -344,6 +344,14 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned 
int *len)
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_buf);
 
+int virtqueue_get_capacity(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
	return vq->num_free;
+}
+EXPORT_SYMBOL_GPL(virtqueue_get_capacity);
+
 void virtqueue_disable_cb(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 7108857..58c0953 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -42,6 +42,9 @@ struct virtqueue {
  * vq: the struct virtqueue we're talking about.
  * len: the length written into the buffer
  * Returns NULL or the data token handed to add_buf.
+ * virtqueue_get_capacity: get the current capacity of the queue
+ * vq: the struct virtqueue we're talking about.
+ * Returns remaining capacity of the queue.
  * virtqueue_disable_cb: disable callbacks
  * vq: the struct virtqueue we're talking about.
  * Note that this is not necessarily synchronous, hence unreliable and only
@@ -89,6 +92,8 @@ void virtqueue_kick(struct virtqueue *vq);
 
 void *virtqueue_get_buf(struct virtqueue *vq, unsigned int *len);
 
+int virtqueue_get_capacity(struct virtqueue *vq);
+
 void virtqueue_disable_cb(struct virtqueue *vq);
 
 bool virtqueue_enable_cb(struct virtqueue *vq);
-- 
1.7.5.53.gc233e



[PATCHv2 09/14] virtio_net: fix TX capacity checks using new API

2011-05-19 Thread Michael S. Tsirkin
virtio net uses the number of sg entries to
check for TX ring capacity freed. But this
gives incorrect results when indirect buffers
are used. Use the new capacity API instead.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/net/virtio_net.c |9 -
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f685324..f33c92b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -509,19 +509,17 @@ again:
return received;
 }
 
-static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
+static void free_old_xmit_skbs(struct virtnet_info *vi)
 {
struct sk_buff *skb;
-   unsigned int len, tot_sgs = 0;
+   unsigned int len;
 
	while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
		pr_debug("Sent skb %p\n", skb);
		vi->dev->stats.tx_bytes += skb->len;
		vi->dev->stats.tx_packets++;
-		tot_sgs += skb_vnet_hdr(skb)->num_sg;
dev_kfree_skb_any(skb);
}
-   return tot_sgs;
 }
 
 static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
@@ -611,7 +609,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
net_device *dev)
netif_stop_queue(dev);
		if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
			/* More just got used, free them then recheck. */
-			capacity += free_old_xmit_skbs(vi);
+			free_old_xmit_skbs(vi);
+			capacity = virtqueue_get_capacity(vi->svq);
			if (capacity >= 2+MAX_SKB_FRAGS) {
				netif_start_queue(dev);
				virtqueue_disable_cb(vi->svq);
-- 
1.7.5.53.gc233e
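The problem this patch fixes is easy to see with a little arithmetic: with indirect descriptors, a buffer of N fragments consumes a single entry in the main descriptor table, so crediting num_sg entries per freed skb over-reports how much room came back. A toy model (all names invented here, not the kernel API):

```c
#include <assert.h>

/* Toy model: descriptor-table entries consumed by one buffer.
 * Without indirect descriptors each fragment takes one entry; with
 * them, the whole chain lives in a separate indirect table and
 * costs a single entry in the main ring. */
static int descs_used(int num_sg, int indirect)
{
	return indirect ? 1 : num_sg;
}

/* Old virtio-net accounting: credit num_sg entries per freed skb,
 * regardless of whether the buffer was indirect. */
static int freed_capacity_old(int num_sg, int indirect)
{
	(void)indirect;		/* the old code ignored this */
	return num_sg;
}
```

With indirect buffers on, freeing a 5-fragment skb actually returns one entry while the old accounting credited five, so the tracked capacity drifted upward and the queue could be restarted too early. Querying the ring's real free count avoids the drift.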



[PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-19 Thread Michael S. Tsirkin
Current code might introduce a lot of latency variation
if there are many pending bufs at the time we
attempt to transmit a new one. This is bad for
real-time applications and can't be good for TCP either.

Free up just enough to both clean up all buffers
eventually and to be able to xmit the next packet.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/net/virtio_net.c |   22 ++
 1 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f33c92b..42935cb 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -509,17 +509,25 @@ again:
return received;
 }
 
-static void free_old_xmit_skbs(struct virtnet_info *vi)
+static bool free_old_xmit_skbs(struct virtnet_info *vi, int capacity)
 {
struct sk_buff *skb;
unsigned int len;
-
-	while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
+	bool c;
+	int n;
+
+	/* We try to free up at least 2 skbs per one sent, so that we'll get
+	 * all of the memory back if they are used fast enough. */
+	for (n = 0;
+	     ((c = virtqueue_get_capacity(vi->svq) < capacity) || n < 2) &&
+	     ((skb = virtqueue_get_buf(vi->svq, &len)));
+	     ++n) {
		pr_debug("Sent skb %p\n", skb);
		vi->dev->stats.tx_bytes += skb->len;
		vi->dev->stats.tx_packets++;
dev_kfree_skb_any(skb);
}
+   return !c;
 }
 
 static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
@@ -574,8 +582,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
net_device *dev)
struct virtnet_info *vi = netdev_priv(dev);
int capacity;
 
-   /* Free up any pending old buffers before queueing new ones. */
-   free_old_xmit_skbs(vi);
+   /* Free enough pending old buffers to enable queueing new ones. */
+   free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS);
 
/* Try to transmit */
capacity = xmit_skb(vi, skb);
@@ -609,9 +617,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
net_device *dev)
netif_stop_queue(dev);
		if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(vi);
-			capacity = virtqueue_get_capacity(vi->svq);
-			if (capacity >= 2+MAX_SKB_FRAGS) {
+			if (!likely(free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
				netif_start_queue(dev);
				virtqueue_disable_cb(vi->svq);
}
-- 
1.7.5.53.gc233e
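The bounded-cleanup policy above can be modeled without any ring at all: keep freeing completed buffers while capacity is below the target or fewer than two have been reclaimed, and stop when nothing is pending. A toy version (names and the fixed per-buffer credit are invented for illustration):

```c
#include <assert.h>

/* Toy model of the bounded cleanup in free_old_xmit_skbs(): free
 * completed buffers while capacity is below the target OR fewer
 * than 2 have been freed, stopping when none remain.  Assumes each
 * freed buffer returns a fixed number of descriptor entries.
 * Returns how many buffers get freed. */
static int buffers_to_free(int capacity, int per_buf, int pending, int target)
{
	int n = 0;

	while (((capacity < target) || n < 2) && pending > 0) {
		capacity += per_buf;	/* entries returned by this buffer */
		pending--;
		n++;
	}
	return n;
}
```

Note the "at least 2" clause fires even when capacity is already sufficient, which is what guarantees all completed buffers are eventually reclaimed instead of piling up.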



[PATCHv2 11/14] virtio: don't delay avail index update

2011-05-19 Thread Michael S. Tsirkin
Update avail index immediately instead of upon kick:
for virtio-net RX this helps parallelism with the host.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_ring.c |   28 +++-
 1 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index eed5f29..8218fe6 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -89,7 +89,7 @@ struct vring_virtqueue
unsigned int num_free;
/* Head of free buffer list. */
unsigned int free_head;
-   /* Number we've added since last sync. */
+   /* Number we've added since last kick. */
unsigned int num_added;
 
/* Last used index we've seen. */
@@ -174,6 +174,13 @@ int virtqueue_add_buf_gfp(struct virtqueue *_vq,
 
BUG_ON(data == NULL);
 
+   /* Prevent drivers from adding more than num bufs without a kick. */
+	if (vq->num_added == vq->vring.num) {
+		printk(KERN_ERR "gaaa!!!\n");
+   END_USE(vq);
+   return -ENOSPC;
+   }
+
/* If the host supports indirect descriptor tables, and we have multiple
 * buffers, then go indirect. FIXME: tune this threshold */
	if (vq->indirect && (out + in) > 1 && vq->num_free) {
@@ -227,8 +234,14 @@ add_head:
 
	/* Put entry in available array (but don't update avail->idx until they
	 * do sync).  FIXME: avoid modulus here? */
-	avail = (vq->vring.avail->idx + vq->num_added++) % vq->vring.num;
+	avail = vq->vring.avail->idx % vq->vring.num;
	vq->vring.avail->ring[avail] = head;
+	vq->num_added++;
+
+   /* Descriptors and available array need to be set before we expose the
+* new available array entries. */
+   virtio_wmb();
+	vq->vring.avail->idx++;
 
	pr_debug("Added buffer head %i to %p\n", head, vq);
END_USE(vq);
@@ -242,17 +255,14 @@ void virtqueue_kick(struct virtqueue *_vq)
struct vring_virtqueue *vq = to_vvq(_vq);
u16 new, old;
START_USE(vq);
-   /* Descriptors and available array need to be set before we expose the
-* new available array entries. */
-   virtio_wmb();
-
-	old = vq->vring.avail->idx;
-	new = vq->vring.avail->idx = old + vq->num_added;
-	vq->num_added = 0;
 
/* Need to update avail index before checking if we should notify */
virtio_mb();
 
+	new = vq->vring.avail->idx;
+	old = new - vq->num_added;
+	vq->num_added = 0;
+
	if (vq->event ?
	    vring_need_event(vring_avail_event(&vq->vring), new, old) :
	    !(vq->vring.used->flags & VRING_USED_F_NO_NOTIFY))
-- 
1.7.5.53.gc233e
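With avail->idx now bumped on every add, the kick path reconstructs the pre-batch index as new minus the add counter. Because both values are u16 and the subtraction is modular, this stays correct across index wraparound; a sketch of just that arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: recover the avail index as it was before the current batch
 * of adds, from the published index and the number of buffers added
 * since the last kick.  Modular 16-bit arithmetic makes this safe
 * across wraparound. */
static uint16_t old_avail_idx(uint16_t new_idx, uint16_t num_added)
{
	return (uint16_t)(new_idx - num_added);
}
```

This is why the patch can drop the deferred idx update without losing the (old, new) pair that the notification test needs.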



[PATCHv2 13/14] virtio_test: update for 64 bit features

2011-05-19 Thread Michael S. Tsirkin
Extend the virtio_test tool so it can work with
64 bit features.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 tools/virtio/virtio_test.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
index 74d3331..96cf9bf 100644
--- a/tools/virtio/virtio_test.c
+++ b/tools/virtio/virtio_test.c
@@ -55,7 +55,6 @@ void vhost_vq_setup(struct vdev_info *dev, struct vq_info 
*info)
 {
	struct vhost_vring_state state = { .index = info->idx };
	struct vhost_vring_file file = { .index = info->idx };
-	unsigned long long features = dev->vdev.features[0];
	struct vhost_vring_addr addr = {
		.index = info->idx,
		.desc_user_addr = (uint64_t)(unsigned long)info->vring.desc,
@@ -63,6 +62,10 @@ void vhost_vq_setup(struct vdev_info *dev, struct vq_info 
*info)
		.used_user_addr = (uint64_t)(unsigned long)info->vring.used,
};
int r;
+	unsigned long long features = dev->vdev.features[0];
+	if (sizeof features > sizeof dev->vdev.features[0])
+		features |= ((unsigned long long)dev->vdev.features[1]) << 32;
+
	r = ioctl(dev->control, VHOST_SET_FEATURES, &features);
	assert(r >= 0);
	state.num = info->vring.num;
@@ -107,7 +110,8 @@ static void vdev_info_init(struct vdev_info* dev, unsigned 
long long features)
int r;
memset(dev, 0, sizeof *dev);
	dev->vdev.features[0] = features;
-	dev->vdev.features[1] = features >> 32;
+	if (sizeof features > sizeof dev->vdev.features[0])
+		dev->vdev.features[1] = features >> 32;
	dev->buf_size = 1024;
	dev->buf = malloc(dev->buf_size);
	assert(dev->buf);
-- 
1.7.5.53.gc233e
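The split/recombine being patched in is plain shift-and-mask arithmetic: the device stores features as two 32-bit words while the ioctl wants a single 64-bit value. A self-contained sketch of both directions (helper names invented here):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the word split/join in the patch: low and high halves
 * of a 64-bit feature mask, as stored in two 32-bit device words. */
static uint32_t feat_lo(uint64_t f)
{
	return (uint32_t)f;		/* bits 0..31 */
}

static uint32_t feat_hi(uint64_t f)
{
	return (uint32_t)(f >> 32);	/* bits 32..63 */
}

static uint64_t feat_join(uint32_t lo, uint32_t hi)
{
	return (uint64_t)lo | ((uint64_t)hi << 32);
}
```

The `sizeof features > sizeof dev->vdev.features[0]` guard in the patch just keeps the code compiling unchanged on configurations where the stored word is already 64 bits wide.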



[PATCHv2 12/14] virtio: 64 bit features

2011-05-19 Thread Michael S. Tsirkin
Extend features to 64 bit so we can use more
transport bits.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/lguest/lguest_device.c |8 
 drivers/s390/kvm/kvm_virtio.c  |8 
 drivers/virtio/virtio.c|8 
 drivers/virtio/virtio_pci.c|   34 --
 drivers/virtio/virtio_ring.c   |2 ++
 include/linux/virtio.h |2 +-
 include/linux/virtio_config.h  |   15 +--
 include/linux/virtio_pci.h |9 -
 8 files changed, 60 insertions(+), 26 deletions(-)

diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
index 69c84a1..d2d6953 100644
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -93,17 +93,17 @@ static unsigned desc_size(const struct lguest_device_desc 
*desc)
 }
 
 /* This gets the device's feature bits. */
-static u32 lg_get_features(struct virtio_device *vdev)
+static u64 lg_get_features(struct virtio_device *vdev)
 {
unsigned int i;
-   u32 features = 0;
+   u64 features = 0;
	struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
	u8 *in_features = lg_features(desc);

	/* We do this the slow but generic way. */
-	for (i = 0; i < min(desc->feature_len * 8, 32); i++)
+	for (i = 0; i < min(desc->feature_len * 8, 64); i++)
		if (in_features[i / 8] & (1 << (i % 8)))
-			features |= (1 << i);
+			features |= (1ull << i);
 
return features;
 }
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index 414427d..c56293c 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -79,16 +79,16 @@ static unsigned desc_size(const struct kvm_device_desc 
*desc)
 }
 
 /* This gets the device's feature bits. */
-static u32 kvm_get_features(struct virtio_device *vdev)
+static u64 kvm_get_features(struct virtio_device *vdev)
 {
unsigned int i;
-   u32 features = 0;
+   u64 features = 0;
	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
	u8 *in_features = kvm_vq_features(desc);

-	for (i = 0; i < min(desc->feature_len * 8, 32); i++)
+	for (i = 0; i < min(desc->feature_len * 8, 64); i++)
		if (in_features[i / 8] & (1 << (i % 8)))
-			features |= (1 << i);
+			features |= (1ull << i);
return features;
 }
 
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index efb35aa..52b24d7 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -112,7 +112,7 @@ static int virtio_dev_probe(struct device *_d)
struct virtio_device *dev = container_of(_d,struct virtio_device,dev);
	struct virtio_driver *drv = container_of(dev->dev.driver,
 struct virtio_driver, driver);
-   u32 device_features;
+   u64 device_features;
 
/* We have a driver! */
add_status(dev, VIRTIO_CONFIG_S_DRIVER);
@@ -124,14 +124,14 @@ static int virtio_dev_probe(struct device *_d)
	memset(dev->features, 0, sizeof(dev->features));
	for (i = 0; i < drv->feature_table_size; i++) {
		unsigned int f = drv->feature_table[i];
-		BUG_ON(f >= 32);
-		if (device_features & (1 << f))
+		BUG_ON(f >= 64);
+		if (device_features & (1ull << f))
			set_bit(f, dev->features);
	}

	/* Transport features always preserved to pass to finalize_features. */
	for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++)
-		if (device_features & (1 << i))
+		if (device_features & (1ull << i))
			set_bit(i, dev->features);

	dev->config->finalize_features(dev);
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 4fb5b2b..04b216f 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -44,6 +44,8 @@ struct virtio_pci_device
spinlock_t lock;
struct list_head virtqueues;
 
+   /* 64 bit features */
+   int features_hi;
/* MSI-X support */
int msix_enabled;
int intx_enabled;
@@ -103,26 +105,46 @@ static struct virtio_pci_device *to_vp_device(struct 
virtio_device *vdev)
 }
 
 /* virtio config-get_features() implementation */
-static u32 vp_get_features(struct virtio_device *vdev)
+static u64 vp_get_features(struct virtio_device *vdev)
 {
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   u32 flo, fhi;
 
-   /* When someone needs more than 32 feature bits, we'll need to
+   /* When someone needs more than 32 feature bits, we need to
 * steal a bit to indicate that the rest are somewhere else. */
-	return ioread32(vp_dev->ioaddr + VIRTIO_PCI_HOST_FEATURES);
+	flo = ioread32(vp_dev->ioaddr + VIRTIO_PCI_HOST_FEATURES);
+	if (flo & (0x1 << VIRTIO_F_FEATURES_HI)) {
+   

[PATCHv2 14/14] vhost: fix 64 bit features

2011-05-19 Thread Michael S. Tsirkin
Update vhost_has_feature to make it work correctly for bit  32.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/vhost.h |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 8e03379..64889d2 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -123,7 +123,7 @@ struct vhost_dev {
struct vhost_memory __rcu *memory;
struct mm_struct *mm;
struct mutex mutex;
-   unsigned acked_features;
+   u64 acked_features;
struct vhost_virtqueue *vqs;
int nvqs;
struct file *log_file;
@@ -176,14 +176,14 @@ enum {
	 (1ULL << VIRTIO_NET_F_MRG_RXBUF),
 };
 
-static inline int vhost_has_feature(struct vhost_dev *dev, int bit)
+static inline bool vhost_has_feature(struct vhost_dev *dev, int bit)
 {
-   unsigned acked_features;
+   u64 acked_features;
 
/* TODO: check that we are running from vhost_worker or dev mutex is
 * held? */
	acked_features = rcu_dereference_index_check(dev->acked_features, 1);
-	return acked_features & (1 << bit);
+	return acked_features & (1ull << bit);
 }
 
 #endif
-- 
1.7.5.53.gc233e
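The bug being fixed here is the shift width: `1 << bit` is an int shift, so any feature bit at 32 or above either invokes undefined behavior or tests as absent, while `1ull << bit` covers the full 64-bit range. A minimal before/after sketch (the old behavior is modeled as "never set" for bit >= 32, since the real narrow shift would be undefined):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fixed check, as in the patch: a 64-bit mask covers bits 0..63. */
static bool has_feature64(uint64_t acked, int bit)
{
	return acked & (1ull << bit);
}

/* Old-style 32-bit check.  For bit >= 32 the shift itself would be
 * undefined behavior in C, so model that case as "always false";
 * either way a high feature bit can never be observed. */
static bool has_feature32(uint32_t acked, int bit)
{
	return bit < 32 && (acked & (1u << bit));
}
```

For bits below 32 the two agree, which is why the bug only surfaces once transport features start landing in the high word.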


Re: [PATCHv2 00/14] virtio and vhost-net performance enhancements

2011-05-19 Thread David Miller
From: Michael S. Tsirkin m...@redhat.com
Date: Fri, 20 May 2011 02:10:07 +0300

 Rusty, I think it will be easier to merge vhost and virtio bits in one
 go. Can it all go in through your tree (Dave in the past acked
 sending a very similar patch through you so should not be a problem)?

And in case you want an explicit ack for the net bits:

Acked-by: David S. Miller da...@davemloft.net

:-)


[PATCHv2 0/2] virtio-net: 64 bit features, event index

2011-05-19 Thread Michael S. Tsirkin
OK, here's a patch that implements the virtio spec update that I
sent earlier. It supersedes the PUBLISH_USED_IDX patches
I sent out earlier.

Support is added in both userspace and vhost-net.

If you see issues or are just curious, you can
turn the new feature on or off per device. For example:

-global virtio-net-pci.event_idx=on
-global virtio-blk-pci.event_idx=off

Also, it's possible to try both vhost-net and virtio-net.

Another part is adding support for 64 bit features in
place. The high bits are actually unused, to test
hack qemu to set some high bit.

linux code is here:
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git 
vhost-net-next-event-idx-v3
git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu-kvm.git 
virtio-net-event-idx-v3


Changes from v1:
- unify used and avail ring handling in a single feature bit
- copy avail event idx fix from vhost-net

Michael S. Tsirkin (2):
  virtio/vhost: support 64 bit features
  virtio+vhost: event idx feature

 hw/qdev-properties.c   |   39 +---
 hw/qdev.h  |   10 
 hw/s390-virtio-bus.c   |5 +-
 hw/s390-virtio-bus.h   |2 +-
 hw/syborg_virtio.c |7 ++-
 hw/vhost_net.c |   14 --
 hw/vhost_net.h |4 +-
 hw/virtio-9p.c |2 +-
 hw/virtio-balloon.c|2 +-
 hw/virtio-blk.c|2 +-
 hw/virtio-blk.h|2 +-
 hw/virtio-net.c|   11 +++--
 hw/virtio-net.h|   34 +++---
 hw/virtio-pci.c|   91 +++--
 hw/virtio-serial-bus.c |2 +-
 hw/virtio.c|  116 ++--
 hw/virtio.h|   24 +++---
 17 files changed, 275 insertions(+), 92 deletions(-)

-- 
1.7.5.53.gc233e


[PATCHv2 1/2] virtio/vhost: support 64 bit features

2011-05-19 Thread Michael S. Tsirkin
Add support for extended feature bits: up to 64 bit.
Only virtio-pci is actually implemented,
s390 and syborg are stubbed out (and untested).

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/qdev-properties.c   |   39 
 hw/qdev.h  |   10 +
 hw/s390-virtio-bus.c   |5 ++-
 hw/s390-virtio-bus.h   |2 +-
 hw/syborg_virtio.c |7 ++--
 hw/vhost_net.c |8 ++--
 hw/vhost_net.h |4 +-
 hw/virtio-9p.c |2 +-
 hw/virtio-balloon.c|2 +-
 hw/virtio-blk.c|2 +-
 hw/virtio-blk.h|2 +-
 hw/virtio-net.c|   11 +++---
 hw/virtio-net.h|   34 +-
 hw/virtio-pci.c|   91 +++-
 hw/virtio-serial-bus.c |2 +-
 hw/virtio.c|   24 ++---
 hw/virtio.h|   17 +
 17 files changed, 179 insertions(+), 83 deletions(-)

diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
index 1088a26..a933c9e 100644
--- a/hw/qdev-properties.c
+++ b/hw/qdev-properties.c
@@ -10,20 +10,35 @@ void *qdev_get_prop_ptr(DeviceState *dev, Property *prop)
 return ptr;
 }
 
-static uint32_t qdev_get_prop_mask(Property *prop)
+static uint64_t qdev_get_prop_mask(Property *prop)
 {
    assert(prop->info->type == PROP_TYPE_BIT);
-    return 0x1 << prop->bitnr;
+    return 0x1ULL << prop->bitnr;
+}
+
+static uint64_t qdev_get_prop_val(DeviceState *dev, Property *prop)
+{
+    assert(prop->info->type == PROP_TYPE_BIT);
+    if (prop->info->size == sizeof(uint32_t)) {
+return *(uint32_t *)qdev_get_prop_ptr(dev, prop);
+} else {
+return *(uint64_t *)qdev_get_prop_ptr(dev, prop);
+}
 }
 
 static void bit_prop_set(DeviceState *dev, Property *props, bool val)
 {
-uint32_t *p = qdev_get_prop_ptr(dev, props);
-uint32_t mask = qdev_get_prop_mask(props);
+uint64_t p = qdev_get_prop_val(dev, props);
+uint64_t mask = qdev_get_prop_mask(props);
 if (val)
-*p |= mask;
+p |= mask;
 else
-        *p &= ~mask;
+        p &= ~mask;
+    if (props->info->size == sizeof(uint32_t)) {
+*(uint32_t *)qdev_get_prop_ptr(dev, props) = p;
+} else {
+*(uint64_t *)qdev_get_prop_ptr(dev, props) = p;
+}
 }
 
 static void qdev_prop_cpy(DeviceState *dev, Property *props, void *src)
@@ -51,8 +66,8 @@ static int parse_bit(DeviceState *dev, Property *prop, const 
char *str)
 
 static int print_bit(DeviceState *dev, Property *prop, char *dest, size_t len)
 {
-uint32_t *p = qdev_get_prop_ptr(dev, prop);
-    return snprintf(dest, len, (*p & qdev_get_prop_mask(prop)) ? "on" : "off");
+    uint64_t val = qdev_get_prop_val(dev, prop);
+    return snprintf(dest, len, (val & qdev_get_prop_mask(prop)) ? "on" : "off");
 }
 
 PropertyInfo qdev_prop_bit = {
@@ -63,6 +78,14 @@ PropertyInfo qdev_prop_bit = {
 .print = print_bit,
 };
 
+PropertyInfo qdev_prop_bit64 = {
+    .name  = "on/off",
+.type  = PROP_TYPE_BIT,
+.size  = sizeof(uint64_t),
+.parse = parse_bit,
+.print = print_bit,
+};
+
 /* --- 8bit integer --- */
 
 static int parse_uint8(DeviceState *dev, Property *prop, const char *str)
diff --git a/hw/qdev.h b/hw/qdev.h
index 8a13ec9..e65cab0 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -219,6 +219,7 @@ int do_device_del(Monitor *mon, const QDict *qdict, QObject 
**ret_data);
 /*** qdev-properties.c ***/
 
 extern PropertyInfo qdev_prop_bit;
+extern PropertyInfo qdev_prop_bit64;
 extern PropertyInfo qdev_prop_uint8;
 extern PropertyInfo qdev_prop_uint16;
 extern PropertyInfo qdev_prop_uint32;
@@ -257,6 +258,15 @@ extern PropertyInfo qdev_prop_pci_devfn;
 .defval= (bool[]) { (_defval) }, \
 }
 
+#define DEFINE_PROP_BIT64(_name, _state, _field, _bit, _defval) {  \
+.name  = (_name),\
+    .info  = &(qdev_prop_bit64),   \
+.bitnr= (_bit),  \
+.offset= offsetof(_state, _field)\
++ type_check(uint64_t,typeof_field(_state, _field)), \
+.defval= (bool[]) { (_defval) }, \
+}
+
 #define DEFINE_PROP_UINT8(_n, _s, _f, _d)   \
 DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_uint8, uint8_t)
 #define DEFINE_PROP_UINT16(_n, _s, _f, _d)  \
diff --git a/hw/s390-virtio-bus.c b/hw/s390-virtio-bus.c
index 175e5cb..89e8c8e 100644
--- a/hw/s390-virtio-bus.c
+++ b/hw/s390-virtio-bus.c
@@ -310,10 +310,11 @@ static void virtio_s390_notify(void *opaque, uint16_t 
vector)
 kvm_s390_virtio_irq(s390_cpu_addr2state(0), 0, token);
 }
 
-static unsigned virtio_s390_get_features(void *opaque)
+static uint64_t virtio_s390_get_features(void *opaque)
 {
 VirtIOS390Device *dev = (VirtIOS390Device*)opaque;
-    return dev->host_features;
+    /* TODO: support high 32 bit features */
+    return dev->host_features &

[PATCHv2 2/2] virtio+vhost: event idx feature

2011-05-19 Thread Michael S. Tsirkin
Add support for used_event feature, and utilize it to
reduce the number of interrupts and exits for the guest.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/vhost_net.c |6 
 hw/virtio.c|   92 ++-
 hw/virtio.h|9 +-
 3 files changed, 97 insertions(+), 10 deletions(-)

diff --git a/hw/vhost_net.c b/hw/vhost_net.c
index 7e94f61..41841d9 100644
--- a/hw/vhost_net.c
+++ b/hw/vhost_net.c
@@ -50,6 +50,9 @@ uint64_t vhost_net_get_features(struct vhost_net *net, 
uint64_t features)
    if (!(net->dev.features & (1 << VIRTIO_RING_F_INDIRECT_DESC))) {
        features &= ~(1 << VIRTIO_RING_F_INDIRECT_DESC);
    }
+    if (!(net->dev.features & (1 << VIRTIO_RING_F_EVENT_IDX))) {
+        features &= ~(1 << VIRTIO_RING_F_EVENT_IDX);
+    }
    if (!(net->dev.features & (1 << VIRTIO_NET_F_MRG_RXBUF))) {
        features &= ~(1 << VIRTIO_NET_F_MRG_RXBUF);
    }
 }
@@ -65,6 +68,9 @@ void vhost_net_ack_features(struct vhost_net *net, uint64_t 
features)
    if (features & (1 << VIRTIO_RING_F_INDIRECT_DESC)) {
        net->dev.acked_features |= (1 << VIRTIO_RING_F_INDIRECT_DESC);
    }
+    if (features & (1 << VIRTIO_RING_F_EVENT_IDX)) {
+        net->dev.acked_features |= (1 << VIRTIO_RING_F_EVENT_IDX);
+    }
    if (features & (1 << VIRTIO_NET_F_MRG_RXBUF)) {
        net->dev.acked_features |= (1 << VIRTIO_NET_F_MRG_RXBUF);
    }
 }
diff --git a/hw/virtio.c b/hw/virtio.c
index b9acbd4..d51ac93 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -72,7 +72,17 @@ struct VirtQueue
 VRing vring;
 target_phys_addr_t pa;
 uint16_t last_avail_idx;
+/* Last used index value we have signalled on */
+uint16_t signalled_used;
+
+/* Whether the value in signalled_used is valid */
+bool signalled_used_valid;
+
+/* Notification enabled? */
+bool notification;
+
 int inuse;
+
 uint16_t vector;
 void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq);
 VirtIODevice *vdev;
@@ -141,6 +151,11 @@ static inline uint16_t vring_avail_ring(VirtQueue *vq, int 
i)
 return lduw_phys(pa);
 }
 
+static inline uint16_t vring_used_event(VirtQueue *vq)
+{
+    return vring_avail_ring(vq, vq->vring.num);
+}
+
 static inline void vring_used_ring_id(VirtQueue *vq, int i, uint32_t val)
 {
 target_phys_addr_t pa;
@@ -162,11 +177,11 @@ static uint16_t vring_used_idx(VirtQueue *vq)
 return lduw_phys(pa);
 }
 
-static inline void vring_used_idx_increment(VirtQueue *vq, uint16_t val)
+static inline void vring_used_idx_set(VirtQueue *vq, uint16_t val)
 {
 target_phys_addr_t pa;
    pa = vq->vring.used + offsetof(VRingUsed, idx);
-stw_phys(pa, vring_used_idx(vq) + val);
+stw_phys(pa, val);
 }
 
 static inline void vring_used_flags_set_bit(VirtQueue *vq, int mask)
@@ -183,12 +198,26 @@ static inline void vring_used_flags_unset_bit(VirtQueue 
*vq, int mask)
    stw_phys(pa, lduw_phys(pa) & ~mask);
 }
 
+static inline void vring_avail_event(VirtQueue *vq, uint16_t val)
+{
+target_phys_addr_t pa;
+    if (!vq->notification) {
+        return;
+    }
+    pa = vq->vring.used + offsetof(VRingUsed, ring[vq->vring.num]);
+stw_phys(pa, val);
+}
+
 void virtio_queue_set_notification(VirtQueue *vq, int enable)
 {
-if (enable)
+    vq->notification = enable;
+    if (vq->vdev->guest_features & (1 << VIRTIO_RING_F_EVENT_IDX)) {
+vring_avail_event(vq, vring_avail_idx(vq));
+} else if (enable) {
 vring_used_flags_unset_bit(vq, VRING_USED_F_NO_NOTIFY);
-else
+} else {
 vring_used_flags_set_bit(vq, VRING_USED_F_NO_NOTIFY);
+}
 }
 
 int virtio_queue_ready(VirtQueue *vq)
@@ -234,11 +263,16 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement 
*elem,
 
 void virtqueue_flush(VirtQueue *vq, unsigned int count)
 {
+uint16_t old, new;
 /* Make sure buffer is written before we update index. */
 wmb();
 trace_virtqueue_flush(vq, count);
-vring_used_idx_increment(vq, count);
+old = vring_used_idx(vq);
+new = old + count;
+vring_used_idx_set(vq, new);
    vq->inuse -= count;
+    if (unlikely((int16_t)(new - vq->signalled_used) < (uint16_t)(new - old)))
+        vq->signalled_used_valid = false;
 }
 
 void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
@@ -395,6 +429,9 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)
    max = vq->vring.num;

    i = head = virtqueue_get_head(vq, vq->last_avail_idx++);
+    if (vq->vdev->guest_features & (1 << VIRTIO_RING_F_EVENT_IDX)) {
+vring_avail_event(vq, vring_avail_idx(vq));
+}
 
    if (vring_desc_flags(desc_pa, i) & VRING_DESC_F_INDIRECT) {
 if (vring_desc_len(desc_pa, i) % sizeof(VRingDesc)) {
@@ -478,6 +515,9 @@ void virtio_reset(void *opaque)
        vdev->vq[i].last_avail_idx = 0;
        vdev->vq[i].pa = 0;
        vdev->vq[i].vector = VIRTIO_NO_VECTOR;
+        vdev->vq[i].signalled_used = 0;
+        vdev->vq[i].signalled_used_valid = false;
+        vdev->vq[i].notification = true;
 }
 }
 
@@ -629,13 
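The suppression test underpinning both this patch and the vhost side is the event-index check from the virtio ring header: signal only if the used index just crossed the guest's published threshold. Reproducing the well-known formula (modular 16-bit comparisons handle wraparound):

```c
#include <assert.h>
#include <stdint.h>

/* The event-index notification test, as in the virtio ring header:
 * returns nonzero if moving the index from old to new_idx stepped
 * over event_idx, i.e. the other side asked to be signalled. */
static int vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	/* Note: subtract 1 so an event_idx equal to the last written
	 * index still triggers; all arithmetic is mod 2^16. */
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}
```

Tracking signalled_used in the device model, as the patch does, is what supplies a meaningful `old` value between interrupts.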

Re: [PATCH RFC] virtio_net: fix patch: virtio_net: limit xmit polling

2011-05-19 Thread Rusty Russell
On Thu, 19 May 2011 01:01:25 +0300, Michael S. Tsirkin m...@redhat.com 
wrote:
 The patch "virtio_net: limit xmit polling"
 got the logic reversed: it polled while we had
 capacity not while ring was empty.
 
 Fix it up and clean up a bit by using a for loop.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 
 OK, turns out that patch was borken. Here's
 a fix that survived stress test on my box.
 Pushed on my branch, I'll send a rebased series
 with Rusty's comments addressed ASAP.

Normally you would have missed the merge window by now, but I'd really
like this stuff in, so I'm holding it open for this.  I want these patches
in linux-next for at least a few days before I push them.

If you think we're not close enough, please tell me and I'll push
the rest of the virtio patches to Linus now.  

Thanks,
Rusty.