Re: [PATCH 1/1] XEN: enlighten, use uninitialized_var(cx)

2009-08-25 Thread Ingo Molnar

* Jiri Slaby  wrote:

> To avoid a wrong compiler warning, use unitialized_var(cx) in
> xen_init_cpuid_mask.
> 
> cx needn't be initialized for cpuid when ax is 1.
> 
> Signed-off-by: Jiri Slaby 
> Cc: Jeremy Fitzhardinge 
> Cc: Chris Wright 
> ---
>  arch/x86/xen/enlighten.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index e90540a..5ab75e2 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -202,7 +202,7 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx,
>  
>  static __init void xen_init_cpuid_mask(void)
>  {
> - unsigned int ax, bx, cx, dx;
> + unsigned int ax, bx, uninitialized_var(cx), dx;

Please dont use uninitialized_var(), it's an unreliable facility: if 
this variable ever grows a real used-without-initialization bug in 
the future, the compiler warning is turned off permanently. It's 
rare but might happen. We are better off with initializing it to 
zero.

Ingo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 1/1] XEN: enlighten, use uninitialized_var(cx)

2009-08-25 Thread Jiri Slaby
To avoid a wrong compiler warning, use unitialized_var(cx) in
xen_init_cpuid_mask.

cx needn't be initialized for cpuid when ax is 1.

Signed-off-by: Jiri Slaby 
Cc: Jeremy Fitzhardinge 
Cc: Chris Wright 
---
 arch/x86/xen/enlighten.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index e90540a..5ab75e2 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -202,7 +202,7 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx,
 
 static __init void xen_init_cpuid_mask(void)
 {
-   unsigned int ax, bx, cx, dx;
+   unsigned int ax, bx, uninitialized_var(cx), dx;
 
cpuid_leaf1_edx_mask =
~((1 << X86_FEATURE_MCE)  |  /* disable MCE */
-- 
1.6.3.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/1] XEN: enlighten, use uninitialized_var(cx)

2009-08-25 Thread Jeremy Fitzhardinge
On 08/25/09 14:00, Jiri Slaby wrote:
> To avoid a wrong compiler warning, use unitialized_var(cx) in
> xen_init_cpuid_mask.
>
> cx needn't be initialized for cpuid when ax is 1.
>
> Signed-off-by: Jiri Slaby 
> Cc: Jeremy Fitzhardinge 
> Cc: Chris Wright 
>   
Acked-by: Jeremy Fitzhardinge 

> ---
>  arch/x86/xen/enlighten.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index e90540a..5ab75e2 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -202,7 +202,7 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx,
>  
>  static __init void xen_init_cpuid_mask(void)
>  {
> - unsigned int ax, bx, cx, dx;
> + unsigned int ax, bx, uninitialized_var(cx), dx;
>  
>   cpuid_leaf1_edx_mask =
>   ~((1 << X86_FEATURE_MCE)  |  /* disable MCE */
>   

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv4 2/2] vhost_net: a kernel-level virtio server

2009-08-25 Thread Michael S. Tsirkin
On Tue, Aug 25, 2009 at 09:40:40PM +0930, Rusty Russell wrote:
> > +   u32 __user *featurep = argp;
> > +   int __user *fdp = argp;
> > +   u32 features;
> > +   int fd, r;
> > +   switch (ioctl) {
> > +   case VHOST_NET_SET_SOCKET:
> > +   r = get_user(fd, fdp);
> > +   if (r < 0)
> > +   return r;
> > +   return vhost_net_set_socket(n, fd);
> > +   case VHOST_GET_FEATURES:
> > +   /* No features for now */
> > +   features = 0;
> > +   return put_user(features, featurep);
> 
> We may well get more than 32 feature bits, at least for virtio_net, which will
> force us to do some trickery in virtio_pci.

Unlike PCI, if we ever run out of bits we can just
add FEATURES_EXTENDED ioctl, no need for trickery.

>  I'd like to avoid that here,
> though it's kind of ugly.  We'd need VHOST_GET_FEATURES (and ACK) to take a
> struct like:
> 
>   u32 feature_size;
>   u32 features[];


Thinking about this proposal some more, how will the guest
determine the size to supply the GET_FEATURES ioctl?

Since we are a bit tight in 32 bit space already,
let's just use a 64 bit integer and be done with it?
Right?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] XEN: remove undefined functions

2009-08-25 Thread Jeremy Fitzhardinge
On 08/25/09 03:01, Jaswinder Singh Rajput wrote:
> mk_pirq_info(), gsi_from_irq() and vector_from_irq() are static functions
> and no one is calling them.
>
> This fixed following compilation warnings :
>
>   drivers/xen/events.c:134: warning: ‘mk_pirq_info’ defined but not used
>   drivers/xen/events.c:180: warning: ‘gsi_from_irq’ defined but not used
>   drivers/xen/events.c:190: warning: ‘vector_from_irq’ defined but not used
>
> Signed-off-by: Jaswinder Singh Rajput 
>   

Hm, they're all about to become used.

J
> ---
>  drivers/xen/events.c |   27 ---
>  1 files changed, 0 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/xen/events.c b/drivers/xen/events.c
> index abad71b..d43957a 100644
> --- a/drivers/xen/events.c
> +++ b/drivers/xen/events.c
> @@ -131,13 +131,6 @@ static struct irq_info mk_virq_info(unsigned short 
> evtchn, unsigned short virq)
>   .cpu = 0, .u.virq = virq };
>  }
>  
> -static struct irq_info mk_pirq_info(unsigned short evtchn,
> - unsigned short gsi, unsigned short vector)
> -{
> - return (struct irq_info) { .type = IRQT_PIRQ, .evtchn = evtchn,
> - .cpu = 0, .u.pirq = { .gsi = gsi, .vector = vector } };
> -}
> -
>  /*
>   * Accessors for packed IRQ information.
>   */
> @@ -177,26 +170,6 @@ static unsigned virq_from_irq(unsigned irq)
>   return info->u.virq;
>  }
>  
> -static unsigned gsi_from_irq(unsigned irq)
> -{
> - struct irq_info *info = info_for_irq(irq);
> -
> - BUG_ON(info == NULL);
> - BUG_ON(info->type != IRQT_PIRQ);
> -
> - return info->u.pirq.gsi;
> -}
> -
> -static unsigned vector_from_irq(unsigned irq)
> -{
> - struct irq_info *info = info_for_irq(irq);
> -
> - BUG_ON(info == NULL);
> - BUG_ON(info->type != IRQT_PIRQ);
> -
> - return info->u.pirq.vector;
> -}
> -
>  static enum xen_irq_type type_from_irq(unsigned irq)
>  {
>   return info_for_irq(irq)->type;
>   

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: vhost net: performance with ping benchmark

2009-08-25 Thread Avi Kivity
On 08/25/2009 04:08 PM, Anthony Liguori wrote:
> Michael S. Tsirkin wrote:
>>> My preference is ring proxying.  Not we'll need ring proxying (or 
>>> at  least event proxying) for non-MSI guests.
>>
>> Exactly, that's what I meant earlier. That's enough, isn't it, Anthony?
>
> It is if we have a working implementation that demonstrates the 
> userspace interface is sufficient.  Once it goes into the upstream 
> kernel, we need to have backwards compatibility code in QEMU forever 
> to support that kernel version.

Not at all.  We still have pure userspace support, so if we don't like 
the first two versions of vhost, we can simply not support them.  Of 
course I'm not advocating merging something known bad or untested, just 
pointing out that the cost of an error is not that bad.

-- 
error compiling committee.c: too many arguments to function

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/2] virtio: Add a can_add_buf helper

2009-08-25 Thread Amit Shah
On (Tue) Aug 25 2009 [23:57:54], Rusty Russell wrote:
> On Mon, 24 Aug 2009 05:09:44 pm Amit Shah wrote:
> > sg_init_table(sg, 2+MAX_SKB_FRAGS);
> > -   for (;;) {
> > +   ret = 1;
> > +   while(ret > 0) { /* Is there space to add another buffer to the vq */
> 
> This is called a "do" loop :)
> 
> Please test, and I'll apply.

:-) drat! shouldn't have done that on a Sunday anyway

I'll test it soon.

Amit
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/2] virtio: Add a can_add_buf helper

2009-08-25 Thread Rusty Russell
On Mon, 24 Aug 2009 05:09:44 pm Amit Shah wrote:
>   sg_init_table(sg, 2+MAX_SKB_FRAGS);
> - for (;;) {
> + ret = 1;
> + while(ret > 0) { /* Is there space to add another buffer to the vq */

This is called a "do" loop :)

Please test, and I'll apply.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: vhost net: performance with ping benchmark

2009-08-25 Thread Michael S. Tsirkin
On Tue, Aug 25, 2009 at 08:06:39AM -0500, Anthony Liguori wrote:
> Avi Kivity wrote:
>>> I think this is likely going to be needed regardless.  I also think  
>>> the tap compatibility suggestion would simplify the consumption of  
>>> this in userspace.
>>
>> What about veth pairs?
>
> Does veth support GSO and checksum offload?

AFAIK, no. But again, improving veth is a separate project :)

>>> I'd like some time to look at get_state/set_state ioctl()s along with 
>>> dirty tracking support.  It's a much better model for live migration  
>>> IMHO.
>>
>> My preference is ring proxying.  Not we'll need ring proxying (or at  
>> least event proxying) for non-MSI guests.
>
> I avoided suggested ring proxying because I didn't want to suggest that  
> merging should be contingent on it.

Happily, the proposed interface supports is.

> Regards,
>
> Anthony Liguori
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: vhost net: performance with ping benchmark

2009-08-25 Thread Michael S. Tsirkin
On Tue, Aug 25, 2009 at 08:08:05AM -0500, Anthony Liguori wrote:
> Once it goes into the upstream  
> kernel, we need to have backwards compatibility code in QEMU forever to  
> support that kernel version.

BTW, qemu can keep doing the userspace thing if some capability it needs
is missing. It won't be worse off than if the driver is not upstream at
all :).

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: vhost net: performance with ping benchmark

2009-08-25 Thread Michael S. Tsirkin
On Tue, Aug 25, 2009 at 08:24:07AM -0500, Anthony Liguori wrote:
> Avi Kivity wrote:
>> My preference is ring proxying.  Not we'll need ring proxying (or at  
>> least event proxying) for non-MSI guests.
>
> Thinking about this more...
>
> How does the hand off work?  Assuming you normally don't proxy ring  
> entries and switch to proxying them when you want to migration, do you  
> have a set of ioctl()s that changes the semantics of the ring to be host  
> virtual addresses instead of guest physical?  If so, what do you do with  
> in flight requests?  Does qemu have to buffer new requests and wait for  
> old ones to complete?
>
> Unless you always do ring proxying.  If that's the case, we don't need  
> any of the slot management code in vhost.
>
> Regards,
>
> Anthony Liguori

Here's how it works. It relies on the fact that in virtio, guest can not
assume that descriptors have been used unless they appeared in used
buffers.

When migration starts, we do this:

1. stop kernel (disable socket)
2. call VHOST_SET_VRING_USED: note it gets virtual address, bot guest
   physical. We point it at buffer in qemu memory
3. call VHOST_SET_VRING_CALL, pass eventfd created by qemu
5. copy over existing used buffer
4. unstop kernel (reenable socket)

Now when migration is in progress, we do this:
A. poll eventfd in 3 above
B. When event is seen, look at used buffer that we gave to kernel
C. Parse descriptors and mark pages that kernel wrote to
   as dirty
D. update used buffer that guest looks at
E. signal eventfd for guest

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: vhost net: performance with ping benchmark

2009-08-25 Thread Michael S. Tsirkin
On Tue, Aug 25, 2009 at 08:08:05AM -0500, Anthony Liguori wrote:
> Michael S. Tsirkin wrote:
>>> My preference is ring proxying.  Not we'll need ring proxying (or at  
>>> least event proxying) for non-MSI guests.
>>> 
>>
>> Exactly, that's what I meant earlier. That's enough, isn't it, Anthony?
>>   
>
> It is if we have a working implementation that demonstrates the
> userspace interface is sufficient.

The idea is trivial enough to be sure the interface is sufficient:
we point kernel at used buffer at address X, and
copy stuff from there to guest buffer, then signal guest.
I'll post a code snippet to show how it's done if you like.

> Once it goes into the upstream
> kernel, we need to have backwards compatibility code in QEMU forever
> to  support that kernel version.

Don't worry: kernel needs to handle old userspace as well, and neither I
nor Rusty want to have a compatibility mess in kernel.

> Regards,
>
> Anthony Liguori
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv4 2/2] vhost_net: a kernel-level virtio server

2009-08-25 Thread Michael S. Tsirkin
Thanks for the comments, I'll work on them ASAP.
Answers to questions and more comments below.

On Tue, Aug 25, 2009 at 09:40:40PM +0930, Rusty Russell wrote:
> On Thu, 20 Aug 2009 12:33:09 am Michael S. Tsirkin wrote:
> > What it is: vhost net is a character device that can be used to reduce
> > the number of system calls involved in virtio networking.
> > Existing virtio net code is used in the guest without modification.
> 
> ...
> 
> > +config VHOST_NET
> > +   tristate "Host kernel accelerator for virtio net"
> > +   depends on NET && EVENTFD
> > +   ---help---
> > + This kernel module can be loaded in host kernel to accelerate
> > + guest networking with virtio_net. Not to be confused with virtio_net
> > + module itself which needs to be loaded in guest kernel.
> > +
> > + To compile this driver as a module, choose M here: the module will
> > + be called vhost_net.
> 
> Just want to note that the patch explanation and the Kconfig help text are
> exceptional examples of reader-oriented text.  Nice!

High praise indeed from the author of the lguest Quest :).

> > +   /* Sanity check */
> > +   if (vq->iov->iov_len != sizeof(struct virtio_net_hdr)) {
> > +   vq_err(vq, "Unexpected header len for TX: "
> > +  "%ld expected %zd\n", vq->iov->iov_len,
> > +  sizeof(struct virtio_net_hdr));
> > +   break;
> > +   }
> 
> OK, this check, which is in the qemu version, is *wrong*.  There should be
> no assumption on sg boundaries.  For example, the guest should be able to
> put the virtio_net_hdr in the front of the skbuf data if there is room.
> 
> You should try to explicitly "consume" sizeof(struct virtio_net_hdr) of the
> iov, and if fail, do this message.  You can skip the out <= 1 test then,
> too.
> 
> Anyway, I really prefer vq->iov[0]. to vq->iov-> here.

I'll fix that. Probably should fix qemu as well.

> > +   /* Sanity check */
> > +   if (vq->iov->iov_len != sizeof(struct virtio_net_hdr)) {
> > +   vq_err(vq, "Unexpected header len for RX: "
> > +  "%ld expected %zd\n",
> > +  vq->iov->iov_len, sizeof(struct virtio_net_hdr));
> > +   break;
> 
> Here too.
> 
> > +   u32 __user *featurep = argp;
> > +   int __user *fdp = argp;
> > +   u32 features;
> > +   int fd, r;
> > +   switch (ioctl) {
> > +   case VHOST_NET_SET_SOCKET:
> > +   r = get_user(fd, fdp);
> > +   if (r < 0)
> > +   return r;
> > +   return vhost_net_set_socket(n, fd);
> > +   case VHOST_GET_FEATURES:
> > +   /* No features for now */
> > +   features = 0;
> > +   return put_user(features, featurep);
> 
> We may well get more than 32 feature bits, at least for virtio_net, which will
> force us to do some trickery in virtio_pci.  I'd like to avoid that here,
> though it's kind of ugly.  We'd need VHOST_GET_FEATURES (and ACK) to take a
> struct like:
> 
>   u32 feature_size;
>   u32 features[];

Do you feel just making it 64 bit won't be enough?  How about 128 bit?

> > +int vhost_net_init(void)
> 
> static?
> 
> > +void vhost_net_exit(void)
> 
> static?

Good catch.

> > +/* Start polling a file. We add ourselves to file's wait queue. The user 
> > must
> > + * keep a reference to a file until after vhost_poll_stop is called. */
> 
> I experienced minor confusion from the comments in this file.  Where you said
> "user" I think "caller".  No biggie though.
> 
> > +   memory->nregions = 2;
> > +   memory->regions[0].guest_phys_addr = 1;
> > +   memory->regions[0].userspace_addr = 1;
> > +   memory->regions[0].memory_size = ~0ULL;
> > +   memory->regions[1].guest_phys_addr = 0;
> > +   memory->regions[1].userspace_addr = 0;
> > +   memory->regions[1].memory_size = 1;
> 
> Not sure I understand why there are two regions to start?

We are trying to cover a whole 2^64 range here.  It's size does not fit
in a single 64 bit length value.  I could special case 0 length to mean
2^64, but decided against it.

> > +   case VHOST_SET_VRING_BASE:
> > +   r = copy_from_user(&s, argp, sizeof s);
> > +   if (r < 0)
> > +   break;
> > +   if (s.num > 0x) {
> > +   r = -EINVAL;
> > +   break;
> > +   }
> > +   vq->last_avail_idx = s.num;
> > +   break;
> > +   case VHOST_GET_VRING_BASE:
> > +   s.index = idx;
> > +   s.num = vq->last_avail_idx;
> > +   r = copy_to_user(argp, &s, sizeof s);
> > +   break;
> 
> Ah, this is my fault.  I didn't expose the last_avail_idx in the ring
> because the other side doesn't need it; but without it the ring state is not
> fully observable from outside (no external save / restore!).
> 
> I have a patch which published these indices (we have room), see:
>   
> http://ozlabs.org/~rusty/kernel/rr-2009-

Re: vhost net: performance with ping benchmark

2009-08-25 Thread Anthony Liguori
Avi Kivity wrote:
> My preference is ring proxying.  Not we'll need ring proxying (or at 
> least event proxying) for non-MSI guests.

Thinking about this more...

How does the hand off work?  Assuming you normally don't proxy ring 
entries and switch to proxying them when you want to migration, do you 
have a set of ioctl()s that changes the semantics of the ring to be host 
virtual addresses instead of guest physical?  If so, what do you do with 
in flight requests?  Does qemu have to buffer new requests and wait for 
old ones to complete?

Unless you always do ring proxying.  If that's the case, we don't need 
any of the slot management code in vhost.

Regards,

Anthony Liguori
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: vhost net: performance with ping benchmark

2009-08-25 Thread Anthony Liguori
Michael S. Tsirkin wrote:
>> My preference is ring proxying.  Not we'll need ring proxying (or at  
>> least event proxying) for non-MSI guests.
>> 
>
> Exactly, that's what I meant earlier. That's enough, isn't it, Anthony?
>   

It is if we have a working implementation that demonstrates the 
userspace interface is sufficient.  Once it goes into the upstream 
kernel, we need to have backwards compatibility code in QEMU forever to 
support that kernel version.

Regards,

Anthony Liguori
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: vhost net: performance with ping benchmark

2009-08-25 Thread Anthony Liguori
Avi Kivity wrote:
>> I think this is likely going to be needed regardless.  I also think 
>> the tap compatibility suggestion would simplify the consumption of 
>> this in userspace.
>
> What about veth pairs?

Does veth support GSO and checksum offload?

>> I'd like some time to look at get_state/set_state ioctl()s along with 
>> dirty tracking support.  It's a much better model for live migration 
>> IMHO.
>
> My preference is ring proxying.  Not we'll need ring proxying (or at 
> least event proxying) for non-MSI guests.

I avoided suggested ring proxying because I didn't want to suggest that 
merging should be contingent on it.

Regards,

Anthony Liguori
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [Qemu-devel] Re: virtio-serial: An interface for host-guest communication

2009-08-25 Thread Gerd Hoffmann
On 08/25/09 14:43, Rusty Russell wrote:
> On Thu, 20 Aug 2009 05:14:29 pm Gerd Hoffmann wrote:
>> I think strings are better as numbers for identifying protocols as you
>> can work without a central registry for the numbers then.
>
> Yep, all you need is a central registry for names :)

There are schemes to do without (reverse domain names, i.e. 
au.com.rustcorp.* is all yours and you don't have to register).  Also 
with names the namespace is much bigger and clashes are much less 
likely, so chances that it works out with everybody simply picking a 
sane name are much higher.

cheers,
   Gerd
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [Qemu-devel] Re: virtio-serial: An interface for host-guest communication

2009-08-25 Thread Rusty Russell
On Thu, 20 Aug 2009 05:14:29 pm Gerd Hoffmann wrote:
> I think strings are better as numbers for identifying protocols as you 
> can work without a central registry for the numbers then.

Yep, all you need is a central registry for names :)

Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: vhost net: performance with ping benchmark

2009-08-25 Thread Arnd Bergmann
On Tuesday 25 August 2009, Avi Kivity wrote:
> On 08/25/2009 05:22 AM, Anthony Liguori wrote:
> >
> > I think 2.6.32 is pushing it. 
> 
> 2.6.32 is pushing it, but we need to push it.

Agreed.

> > I think some time is needed to flush out the userspace interface.  In 
> > particular, I don't think Mark's comments have been adequately 
> > addressed.  If a version were merged without GSO support, some 
> > mechanism to do feature detection would be needed in the userspace API. 
> 
> I don't see any point  in merging without gso (unless it beats userspace 
> with gso, which I don't think will happen).  In any case we'll need 
> feature negotiation.

The feature negotiation that Michael has put in seems sufficient for this.
If you care more about latency than bandwidth, the current driver is
an enormous improvement over the user space code, which I find is enough
reason to have it now.

> > I think this is likely going to be needed regardless.  I also think 
> > the tap compatibility suggestion would simplify the consumption of 
> > this in userspace.
> 
> What about veth pairs?

I think you are talking about different issues. Veth pairs let you connect
vhost to a bridge device like you do in the typical tun/tap setup, which
addresses one problem.

The other issue that came up is that raw sockets require root permissions,
so you have to start qemu as root or with an open file descriptor for the
socket (e.g. through libvirt). Permission handling on tap devices is ugly
as well, but is a solved problem. The solution is to be able to pass
a tap device (or a socket from a tap device) into vhost net. IMHO we
need this eventually, but it's not a show stopper for 2.6.32.

Along the same lines, I also think it should support TCP and UDP sockets
so we can offload 'qemu -net socket,mcast' and 'qemu -net socket,connect'
in addition to 'qemu -net socket,raw'.

Arnd <><
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv4 2/2] vhost_net: a kernel-level virtio server

2009-08-25 Thread Rusty Russell
On Thu, 20 Aug 2009 12:33:09 am Michael S. Tsirkin wrote:
> What it is: vhost net is a character device that can be used to reduce
> the number of system calls involved in virtio networking.
> Existing virtio net code is used in the guest without modification.

...

> +config VHOST_NET
> + tristate "Host kernel accelerator for virtio net"
> + depends on NET && EVENTFD
> + ---help---
> +   This kernel module can be loaded in host kernel to accelerate
> +   guest networking with virtio_net. Not to be confused with virtio_net
> +   module itself which needs to be loaded in guest kernel.
> +
> +   To compile this driver as a module, choose M here: the module will
> +   be called vhost_net.

Just want to note that the patch explanation and the Kconfig help text are
exceptional examples of reader-oriented text.  Nice!

> + /* Sanity check */
> + if (vq->iov->iov_len != sizeof(struct virtio_net_hdr)) {
> + vq_err(vq, "Unexpected header len for TX: "
> +"%ld expected %zd\n", vq->iov->iov_len,
> +sizeof(struct virtio_net_hdr));
> + break;
> + }

OK, this check, which is in the qemu version, is *wrong*.  There should be
no assumption on sg boundaries.  For example, the guest should be able to
put the virtio_net_hdr in the front of the skbuf data if there is room.

You should try to explicitly "consume" sizeof(struct virtio_net_hdr) of the
iov, and if fail, do this message.  You can skip the out <= 1 test then,
too.

Anyway, I really prefer vq->iov[0]. to vq->iov-> here.

> + /* Sanity check */
> + if (vq->iov->iov_len != sizeof(struct virtio_net_hdr)) {
> + vq_err(vq, "Unexpected header len for RX: "
> +"%ld expected %zd\n",
> +vq->iov->iov_len, sizeof(struct virtio_net_hdr));
> + break;

Here too.

> + u32 __user *featurep = argp;
> + int __user *fdp = argp;
> + u32 features;
> + int fd, r;
> + switch (ioctl) {
> + case VHOST_NET_SET_SOCKET:
> + r = get_user(fd, fdp);
> + if (r < 0)
> + return r;
> + return vhost_net_set_socket(n, fd);
> + case VHOST_GET_FEATURES:
> + /* No features for now */
> + features = 0;
> + return put_user(features, featurep);

We may well get more than 32 feature bits, at least for virtio_net, which will
force us to do some trickery in virtio_pci.  I'd like to avoid that here,
though it's kind of ugly.  We'd need VHOST_GET_FEATURES (and ACK) to take a
struct like:

u32 feature_size;
u32 features[];

> +int vhost_net_init(void)

static?

> +void vhost_net_exit(void)

static?

> +/* Start polling a file. We add ourselves to file's wait queue. The user must
> + * keep a reference to a file until after vhost_poll_stop is called. */

I experienced minor confusion from the comments in this file.  Where you said
"user" I think "caller".  No biggie though.

> + memory->nregions = 2;
> + memory->regions[0].guest_phys_addr = 1;
> + memory->regions[0].userspace_addr = 1;
> + memory->regions[0].memory_size = ~0ULL;
> + memory->regions[1].guest_phys_addr = 0;
> + memory->regions[1].userspace_addr = 0;
> + memory->regions[1].memory_size = 1;

Not sure I understand why there are two regions to start?

> + case VHOST_SET_VRING_BASE:
> + r = copy_from_user(&s, argp, sizeof s);
> + if (r < 0)
> + break;
> + if (s.num > 0x) {
> + r = -EINVAL;
> + break;
> + }
> + vq->last_avail_idx = s.num;
> + break;
> + case VHOST_GET_VRING_BASE:
> + s.index = idx;
> + s.num = vq->last_avail_idx;
> + r = copy_to_user(argp, &s, sizeof s);
> + break;

Ah, this is my fault.  I didn't expose the last_avail_idx in the ring
because the other side doesn't need it; but without it the ring state is not
fully observable from outside (no external save / restore!).

I have a patch which published these indices (we have room), see:

http://ozlabs.org/~rusty/kernel/rr-2009-08-12-1/virtio:ring-publish-indices.patch

Perhaps we should use that mechanism instead?  We don't actually have to
offer the feature (we don't care about the guest state), but it's nice as
documentation.  I've been waiting for an excuse to use that patch.

> +long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, unsigned long 
> arg)
> +{
> + void __user *argp = (void __user *)arg;
> + long r;
> +
> + mutex_lock(&d->mutex);
> + if (ioctl == VHOST_SET_OWNER) {
> + r = vhost_dev_set_owner(d);
> + goto done;
> + }
> +
> + r = vhost_dev_check_owner(d);

You can do a VHOST_SET_OWNE

Re: [PATCHv4 2/2] virtio: refactor find_vqs

2009-08-25 Thread Michael S. Tsirkin
Sorry it took a bit to respond to this.  I kept hoping I'll have time to
test it but looks like I won't before I'm on vacation.  I agree it's
good cleanup, and maybe we can go even further, see below.

On Mon, Aug 10, 2009 at 09:07:52AM +0930, Rusty Russell wrote:
> On Tue, 28 Jul 2009 06:03:08 pm Michael S. Tsirkin wrote:
> > On Tue, Jul 28, 2009 at 12:44:31PM +0930, Rusty Russell wrote:
> > > On Mon, 27 Jul 2009 01:17:09 am Michael S. Tsirkin wrote:
> > > > This refactors find_vqs, making it more readable and robust, and fixing
> > > > two regressions from 2.6.30:
> > > > - double free_irq causing BUG_ON on device removal
> > > > - probe failure when vq can't be assigned to msi-x vector
> > > >   (reported on old host kernels)
> > > > 
> > > > An older version of this patch was tested by Amit Shah.
> > > 
> > > OK, I've applied both of these; I'd like to see a new test by Amit to
> > > make sure tho.
> > > 
> > > I really like this cleanup!  I looked harder at this code, and my best
> > > attempts to untangle it further came to very little.  This is what I
> > > ended up with, but it's all cosmetic and can wait until next merge window.
> > > See what you think.
> > > 
> > > Thanks!
> > > Rusty.
> > > 
> > > virtio_pci: minor MSI-X cleanups
> > > 
> > > 1) Rename vp_request_vectors to vp_request_msix_vectors, and take
> > >non-MSI-X case out to caller.
> > 
> > I'm not sure this change was for the best: we still have a separate code
> > path under if !use_msix, only in another place now.  See below.
> > And this seems to break the symmetry between request_ and free_vectors.
> 
> Yes, but unfortunately request_vectors was never really symmetrical :(
> 
> That's because we didn't do the request_irq's for the per_vector case, because
> we don't have the names.  This is what prevented me from doing a nice
> encapsulation.

Yes. But let's split free_vectors out into free_msix_vectors and
free_intx as well?

> > > - err = vp_request_vectors(vdev, nvectors, per_vq_vectors);
> > > + if (!use_msix) {
> > > + /* Old style: one normal interrupt for change and all vqs. */
> > > + vp_dev->msix_vectors = 0;
> > > + vp_dev->per_vq_vectors = false;
> > > + err = request_irq(vp_dev->pci_dev->irq, vp_interrupt,
> > > +   IRQF_SHARED, dev_name(&vdev->dev), vp_dev);
> > > + if (!err)
> > > + vp_dev->intx_enabled = 1;
> > 
> > shorter as vp_dev->intx_enabled = !err
> > 
> > > + return err;
> > 
> > Is that all? Don't we need to create the vqs?
> 
> Oh, yeah :)
> 
> This patch applies on top.  Basically reverts that part, and
> renames vector to msix_vector, which I think makes the code more
> readable.

Yes, I agree, this is good cleanup, structure field names should be
desriptive.  Are you sure we want to make all local variables named
vector renamed to msix_vector though? CodingStyle says local var names
should be short ...  A couple of ideas below.

> virtio: more PCI minor cleanups
> 
> Signed-off-by: Rusty Russell 
> ---
>  drivers/virtio/virtio_pci.c |   73 
> +---
>  1 file changed, 36 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> --- a/drivers/virtio/virtio_pci.c
> +++ b/drivers/virtio/virtio_pci.c
> @@ -84,7 +84,7 @@ struct virtio_pci_vq_info
>   struct list_head node;
>  
>   /* MSI-X vector (or none) */
> - unsigned vector;
> + unsigned msix_vector;

Good. And now we don't need the comment.

>  };
>  
>  /* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
> @@ -349,7 +349,7 @@ error:
>  static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
> void (*callback)(struct virtqueue *vq),
> const char *name,
> -   u16 vector)
> +   u16 msix_vector)
>  {
>   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>   struct virtio_pci_vq_info *info;
> @@ -374,7 +374,7 @@ static struct virtqueue *setup_vq(struct
>  
>   info->queue_index = index;
>   info->num = num;
> - info->vector = vector;
> + info->msix_vector = msix_vector;
>  
>   size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_VRING_ALIGN));
>   info->queue = alloc_pages_exact(size, GFP_KERNEL|__GFP_ZERO);
> @@ -398,10 +398,10 @@ static struct virtqueue *setup_vq(struct
>   vq->priv = info;
>   info->vq = vq;
>  
> - if (vector != VIRTIO_MSI_NO_VECTOR) {
> - iowrite16(vector, vp_dev->ioaddr + VIRTIO_MSI_QUEUE_VECTOR);
> - vector = ioread16(vp_dev->ioaddr + VIRTIO_MSI_QUEUE_VECTOR);
> - if (vector == VIRTIO_MSI_NO_VECTOR) {
> + if (msix_vector != VIRTIO_MSI_NO_VECTOR) {
> + iowrite16(msix_vector, vp_dev->ioaddr+VIRTIO_MSI_QUEUE_VECTOR);
> + msix_vector = ioread16(vp_dev->ioaddr+VIRTIO_MSI_QUEUE_VECT

[PATCH] XEN: remove undefined functions

2009-08-25 Thread Jaswinder Singh Rajput

mk_pirq_info(), gsi_from_irq() and vector_from_irq() are static functions
and no one is calling them.

This fixed following compilation warnings :

  drivers/xen/events.c:134: warning: ‘mk_pirq_info’ defined but not used
  drivers/xen/events.c:180: warning: ‘gsi_from_irq’ defined but not used
  drivers/xen/events.c:190: warning: ‘vector_from_irq’ defined but not used

Signed-off-by: Jaswinder Singh Rajput 
---
 drivers/xen/events.c |   27 ---
 1 files changed, 0 insertions(+), 27 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index abad71b..d43957a 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -131,13 +131,6 @@ static struct irq_info mk_virq_info(unsigned short evtchn, 
unsigned short virq)
.cpu = 0, .u.virq = virq };
 }
 
-static struct irq_info mk_pirq_info(unsigned short evtchn,
-   unsigned short gsi, unsigned short vector)
-{
-   return (struct irq_info) { .type = IRQT_PIRQ, .evtchn = evtchn,
-   .cpu = 0, .u.pirq = { .gsi = gsi, .vector = vector } };
-}
-
 /*
  * Accessors for packed IRQ information.
  */
@@ -177,26 +170,6 @@ static unsigned virq_from_irq(unsigned irq)
return info->u.virq;
 }
 
-static unsigned gsi_from_irq(unsigned irq)
-{
-   struct irq_info *info = info_for_irq(irq);
-
-   BUG_ON(info == NULL);
-   BUG_ON(info->type != IRQT_PIRQ);
-
-   return info->u.pirq.gsi;
-}
-
-static unsigned vector_from_irq(unsigned irq)
-{
-   struct irq_info *info = info_for_irq(irq);
-
-   BUG_ON(info == NULL);
-   BUG_ON(info->type != IRQT_PIRQ);
-
-   return info->u.pirq.vector;
-}
-
 static enum xen_irq_type type_from_irq(unsigned irq)
 {
return info_for_irq(irq)->type;
-- 
1.6.2.5


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: [PATCHv4 0/2] vhost: a kernel-level virtio server

2009-08-25 Thread Rusty Russell
On Thu, 20 Aug 2009 12:30:29 am Michael S. Tsirkin wrote:
> Rusty, could you review and comment on the patches please?  Since most
> of the code deals with virtio from host side, I think it will make sense
> to merge them through your tree. What do you think?

Yep, I've been waiting for all the other comments, then I got ill.

Now I'm all recovered, what better way to spend an evening than reviewing
fresh code?

New dir seems fine to me, too.

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 3/3] virtio-console: Add interface for generic guest-host communication

2009-08-25 Thread Amit Shah
This interface extends the virtio-console device to handle
multiple ports that present a char device from which bits can
be sent and read.

Sample uses for such a device can be obtaining info from the
guest like the file systems used, apps installed, etc. for
offline usage and logged-in users, clipboard copy-paste, etc.
for online usage.

Each port is to be assigned a unique function, for example, the
first 4 ports may be reserved for libvirt usage, the next 4 for
generic streaming data and so on. This port-function mapping
isn't finalised yet.

For requirements, use-cases and some history see

http://www.linux-kvm.org/page/VMchannel_Requirements

Signed-off-by: Amit Shah 
---
 hw/pc.c |   16 ++-
 hw/virtio-console.c |  423 +--
 hw/virtio-console.h |   47 ++
 monitor.c   |7 +
 qemu-monitor.hx |   10 ++
 qemu-options.hx |2 +-
 sysemu.h|   10 +-
 vl.c|   41 +++---
 8 files changed, 479 insertions(+), 77 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index ca681b8..96eb7ac 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1453,11 +1453,17 @@ static void pc_init1(ram_addr_t ram_size,
 }
 
 /* Add virtio console devices */
-if (pci_enabled) {
-for(i = 0; i < MAX_VIRTIO_CONSOLES; i++) {
-if (virtcon_hds[i]) {
-pci_create_simple(pci_bus, -1, "virtio-console-pci");
-}
+if (pci_enabled && virtcon_nr_ports) {
+void *dev;
+
+dev = pci_create_simple(pci_bus, -1, "virtio-console-pci");
+if (!dev) {
+fprintf(stderr, "qemu: could not create virtio console pci 
device\n");
+exit(1);
+}
+
+for (i = 0; i < virtcon_nr_ports; i++) {
+virtio_console_new_port(dev, virtcon_idx[i]);
 }
 }
 
diff --git a/hw/virtio-console.c b/hw/virtio-console.c
index 92c953c..f8f9866 100644
--- a/hw/virtio-console.c
+++ b/hw/virtio-console.c
@@ -2,9 +2,11 @@
  * Virtio Console Device
  *
  * Copyright IBM, Corp. 2008
+ * Copyright Red Hat, Inc. 2009
  *
  * Authors:
  *  Christian Ehrhardt 
+ *  Amit Shah 
  *
  * This work is licensed under the terms of the GNU GPL, version 2.  See
  * the COPYING file in the top-level directory.
@@ -12,39 +14,158 @@
  */
 
 #include "hw.h"
+#include "pci.h"
+#include "monitor.h"
 #include "qemu-char.h"
 #include "virtio.h"
 #include "virtio-console.h"
 
-
 typedef struct VirtIOConsole
 {
 VirtIODevice vdev;
+PCIDevice *dev;
 VirtQueue *ivq, *ovq;
-CharDriverState *chr;
+struct VirtIOConsolePort *ports;
+struct virtio_console_config *config;
+uint32_t guest_features;
 } VirtIOConsole;
 
-static VirtIOConsole *to_virtio_console(VirtIODevice *vdev)
+typedef struct VirtIOConsolePort {
+VirtIOConsole *vcon;
+CharDriverState *hd;
+bool guest_connected;
+} VirtIOConsolePort;
+
+static VirtIOConsole *virtio_console;
+static struct virtio_console_config virtcon_config;
+
+static VirtIOConsolePort *get_port_from_id(uint32_t id)
+{
+if (id > MAX_VIRTIO_CONSOLE_PORTS)
+return NULL;
+
+return &virtio_console->ports[id];
+}
+
+static int get_id_from_port(VirtIOConsolePort *port)
 {
-return (VirtIOConsole *)vdev;
+uint32_t i;
+
+for (i = 0; i < MAX_VIRTIO_CONSOLE_PORTS; i++) {
+if (port == &virtio_console->ports[i]) {
+return i;
+}
+}
+return VIRTIO_CONSOLE_BAD_ID;
+}
+
+static bool use_multiport(void)
+{
+return virtio_console->guest_features & (1 << VIRTIO_CONSOLE_F_MULTIPORT);
+}
+
+void virtio_console_monitor_command(Monitor *mon,
+const char *command, const char *param)
+{
+int ret;
+
+if(!strncmp(command, "add_port", 8)) {
+if (!param) {
+monitor_printf(mon, "Error: need port id to add new port\n");
+return;
+}
+ret = init_virtio_console_port(virtcon_nr_ports, param);
+if (ret < 0) {
+monitor_printf(mon, "Error: cannot add new port: %s\n",
+   strerror(-ret));
+return;
+}
+virtio_console_new_port(NULL, virtcon_idx[virtcon_nr_ports]);
+virtcon_nr_ports++;
+virtio_console->config->nr_active_ports = 
cpu_to_le32(virtcon_nr_ports);
+return;
+}
+}
+
+static size_t flush_buf(uint32_t id, VirtIOConsolePort *port,
+const uint8_t *buf, size_t len)
+{
+size_t write_len = 0;
+
+if (port->hd) {
+write_len = qemu_chr_write(port->hd, buf, len);
+return write_len;
+}
+return 0;
+}
+
+/* Guest wants to notify us of some event */
+static void handle_control_message(VirtIOConsolePort *port,
+   struct virtio_console_control *cpkt)
+{
+switch(cpkt->event) {
+case VIRTIO_CONSOLE_PORT_OPEN:
+port->guest_connected = cpkt->value;
+break;
+}
 }
 
+/* Guest wrote something to so