Re: [kvm-devel] [PATCH RFC 1/3] virtio infrastructure

2007-06-04 Thread Avi Kivity
Rusty Russell wrote:
>> Networking hardware generally services descriptors in a FIFO manner.
>
> Well, ethernet guarantees order.  Not sure about others tho...

OT: Does that hold for bonded interfaces too?

>> virtio may not (for example, it may offload copies of larger packets to
>> a DMA engine such as I/OAT, resulting in a delay, but copy smaller
>> packets immediately).  That means that there will be some mismatch
>> between virtio drivers and real hardware drivers.
>
> I think your point is that the completion bitmap (or indeed, the current
> approach) does not maintain order?  Hmm, this is more convincing to me
> than cache arguments, since some devices might want ordering and want
> more than a single io in flight.

Well, it wasn't really; sorry for being unclear.  My point was that 
virtio interfaces will not match hardware exactly.

My objection is to scanning all slots, occupied or not, for completion.  I 
think virtio should present completed descriptors without the need for 
scanning, even if it means looking a bit different from a typical 
ethernet driver.
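
For illustration, a minimal sketch of the kind of no-scan presentation
meant here (all names are hypothetical, not from the posted patch):

	/* Hypothetical layout: the host appends the ids of completed
	 * descriptors at tail; the guest consumes them at head.  No
	 * empty slot is ever examined. */
	#define COMP_RING_SIZE 256

	struct completion_ring {
		unsigned int head;			/* guest-written */
		unsigned int tail;			/* host-written */
		unsigned int id[COMP_RING_SIZE];	/* completed descriptor ids */
	};

	static void process_completions(struct completion_ring *r)
	{
		while (r->head != r->tail) {
			unsigned int id = r->id[r->head % COMP_RING_SIZE];

			/* complete_descriptor() stands in for whatever
			 * driver-specific completion work is needed. */
			complete_descriptor(id);
			r->head++;
		}
	}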


-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH RFC 1/3] virtio infrastructure

2007-06-04 Thread Rusty Russell
On Mon, 2007-06-04 at 14:25 +0300, Avi Kivity wrote:
> Rusty Russell wrote:
>>> Networking hardware generally services descriptors in a FIFO manner.
>>
>> Well, ethernet guarantees order.  Not sure about others tho...
>
> OT: Does that hold for bonded interfaces too?

Sorry, I don't know.  The ethernet standard promises in-order delivery,
but I'd imagine you'd need to prepend a header to get this to work with
bonding in general...

>>> virtio may not (for example, it may offload copies of larger packets to
>>> a DMA engine such as I/OAT, resulting in a delay, but copy smaller
>>> packets immediately).  That means that there will be some mismatch
>>> between virtio drivers and real hardware drivers.
>>
>> I think your point is that the completion bitmap (or indeed, the current
>> approach) does not maintain order?  Hmm, this is more convincing to me
>> than cache arguments, since some devices might want ordering and want
>> more than a single io in flight.
>
> Well, it wasn't really; sorry for being unclear.  My point was that
> virtio interfaces will not match hardware exactly.
>
> My objection is to scanning all slots, occupied or not, for completion.  I
> think virtio should present completed descriptors without the need for
> scanning, even if it means looking a bit different from a typical
> ethernet driver.

It's not just the ethernet driver, it's virtio drivers in general.  One
reason the Xen drivers are viewed with such horror is that they look
nothing like normal Linux drivers.

But that just means that the linked list(s) should be in the struct
virtio_device rather than an arg to the interrupt handler.  I think,
given that the network code doesn't want to process used outbufs in the
interrupt handler, this is the Right Thing anyway.
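
Roughly this arrangement (a sketch only; the field names are hypothetical,
not from the posted patch):

	struct virtio_device {
		struct device dev;
		struct virtio_ops *ops;
		/* The transport chains completed buffers onto these lists;
		 * drivers pull them outside the interrupt handler (eg. from
		 * the net driver's poll routine). */
		struct list_head used_in_list;
		struct list_head used_out_list;
	};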

I'll send it here once it's done...

Thanks,
Rusty.




Re: [kvm-devel] [PATCH RFC 1/3] virtio infrastructure

2007-06-04 Thread Herbert Xu
On Mon, Jun 04, 2007 at 02:25:32PM +0300, Avi Kivity wrote:
>
> OT: Does that hold for bonded interfaces too?

Yes.  By default traffic to the same destination MAC always sticks to
one interface.  You could select a layer3+4 hashing policy, but even
that guarantees that a single flow will stick to one physical interface,
unless it contains IP fragments, which should never happen for TCP.
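
For a feel of why flows stick, a simplified sketch of a layer3+4 style
hash (illustrative only; the real bonding xmit_hash_policy code is more
careful about fragments and non-TCP/UDP protocols):

	/* Pick a slave from the (saddr, daddr, sport, dport) tuple; the
	 * same flow always hashes to the same physical interface. */
	static unsigned int l34_hash(u32 saddr, u32 daddr,
				     u16 sport, u16 dport,
				     unsigned int nslaves)
	{
		u32 h = saddr ^ daddr ^ ((u32)sport << 16 | dport);

		return h % nslaves;
	}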

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



Re: [kvm-devel] [PATCH RFC 1/3] virtio infrastructure

2007-06-02 Thread Avi Kivity
Rusty Russell wrote:
> This attempts to implement a virtual I/O layer which should allow
> common drivers to be efficiently used across most virtual I/O
> mechanisms.  It will no doubt need further enhancement.
>
> The details of probing the device are left to hypervisor-specific
> code: it simply constructs the struct virtio_device and hands it to
> the probe function (eg. virtnet_probe() or virtblk_probe()).
>
> The virtio drivers add and detach input and output buffers; as the
> buffers are used up their associated used pointers are filled in.

Good stuff.

> +/**
> + * virtio_ops - virtio abstraction layer
> + * @add_outbuf: prepare to send data to the other end:
> + *	vdev: the virtio_device
> + *	sg: the description of the buffer(s).
> + *	num: the size of the sg array.
> + *	used: the length sent (set once sending is done).
> + *  Returns an identifier or an error.
> + * @add_inbuf: prepare to receive data from the other end:
> + *	vdev: the virtio_device
> + *	sg: the description of the buffer(s).
> + *	num: the size of the sg array.
> + *	used: the length received (set once data received).
> + *  Returns an identifier or an error (eg. -ENOSPC).

Instead of 'used', how about a completion callback (with an associated
data pointer)?  A new helper, virtio_complete(), would call the callback
for all completed requests.  It would eliminate all the tedious scanning
used to match the identifier.
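
Something like this, as a sketch of the shape (signatures hypothetical,
not a worked-out proposal):

	typedef void (*virtio_callback_t)(struct virtio_device *vdev,
					  void *data, unsigned long used);

	/* add_outbuf/add_inbuf would take the callback instead of 'used': */
	unsigned long (*add_outbuf)(struct virtio_device *vdev,
				    const struct scatterlist sg[],
				    unsigned int num,
				    virtio_callback_t cb, void *data);

	/* Calls cb(vdev, data, used) once for each completed request. */
	void virtio_complete(struct virtio_device *vdev);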

It would also be nice to support a bit of non-buffer data, like a set of
bitflags.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH RFC 1/3] virtio infrastructure

2007-06-02 Thread Rusty Russell
On Sat, 2007-06-02 at 09:30 +0300, Avi Kivity wrote:
> Rusty Russell wrote:
>> + * virtio_ops - virtio abstraction layer
>> + * @add_outbuf: prepare to send data to the other end:
>> + *	vdev: the virtio_device
>> + *	sg: the description of the buffer(s).
>> + *	num: the size of the sg array.
>> + *	used: the length sent (set once sending is done).
>> + *  Returns an identifier or an error.
>> + * @add_inbuf: prepare to receive data from the other end:
>> + *	vdev: the virtio_device
>> + *	sg: the description of the buffer(s).
>> + *	num: the size of the sg array.
>> + *	used: the length received (set once data received).
>> + *  Returns an identifier or an error (eg. -ENOSPC).
>
> Instead of 'used', how about a completion callback (with an associated
> data pointer)?  A new helper, virtio_complete(), would call the callback
> for all completed requests.  It would eliminate all the tedious scanning
> used to match the identifier.

Hi Avi,

There were several considerations here.  My first was that this way the
drivers look much more like normal drivers, rather than getting a callback
for every buffer.  Secondly, 'used' batches much more nicely than a
completion callback.  Finally, the used length is something you really
want to know anyway, so the driver doesn't have to zero its inbufs (an
untrusted other side says it sent you 1500 bytes but actually sent
nothing, and now you spray kernel memory out the NIC).

I also considered some scheme like:

	struct virtio_used_info
	{
		unsigned long len;
		void *next_token;
	};
	...
	unsigned long (*add_outbuf)(struct virtio_device *vdev,
				    const struct scatterlist sg[],
				    unsigned int num,
				    void *token,
				    struct virtio_used_info *used_info);

So the used becomes a used/next pair and you can just walk the
linked list.  But I wasn't convinced that walking the buffers is going
to be a performance issue (tho the net driver puts them in a contiguous
array for cache friendliness as a nod to this concern).
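
Walking it would look roughly like this (a sketch; token_to_info() and
the list head are hypothetical driver-side bookkeeping, not part of the
scheme above):

	void *token = first_used_token;		/* hypothetical list head */

	while (token) {
		/* hypothetical lookup from a token to its used_info */
		struct virtio_used_info *info = token_to_info(token);

		driver_complete(token, info->len);
		token = info->next_token;	/* NULL ends the chain */
	}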

> It would also be nice to support a bit of non-buffer data, like a set of
> bitflags.

I expect this might be necessary, but it hasn't been so far.  The
non-buffer data tends to go in sg[0]: the block driver works this way,
and the network driver will for GSO.  Of course, a specialized virtio_ops
backend might well take this and put the info somewhere else.
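
As a sketch of that convention (struct and field names illustrative,
not from the posted patch):

	/* Out-of-band request info rides in sg[0]; payload follows. */
	struct vblk_hdr {
		u32 type;		/* eg. a hypothetical VBLK_READ */
		u64 sector;
	} hdr = { .type = VBLK_READ, .sector = sector };

	struct scatterlist sg[2];
	unsigned long used;

	sg_set_buf(&sg[0], &hdr, sizeof(hdr));	/* header in sg[0] */
	sg_set_buf(&sg[1], data, len);		/* payload after it */
	vdev->ops->add_outbuf(vdev, sg, 2, &used);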

I also considered a separate publish/examine interface for things
which aren't really messages, but again, haven't needed it yet.

Thanks!
Rusty.




Re: [kvm-devel] [PATCH RFC 1/3] virtio infrastructure

2007-05-31 Thread Dor Laor
> This attempts to implement a virtual I/O layer which should allow
> common drivers to be efficiently used across most virtual I/O
> mechanisms.  It will no doubt need further enhancement.
>
> The details of probing the device are left to hypervisor-specific
> code: it simply constructs the struct virtio_device and hands it to
> the probe function (eg. virtnet_probe() or virtblk_probe()).
>
> The virtio drivers add and detach input and output buffers; as the
> buffers are used up their associated used pointers are filled in.
>
> I have written two virtio device drivers (net and block) and two
> virtio implementations (for lguest): a read-write socket-style
> implementation, and a more efficient descriptor-based implementation.
>
> Signed-off-by: Rusty Russell [EMAIL PROTECTED]

That's exactly what I was planning to add to KVM/Linux.
All virtual I/O devices should have a common interface and share the core
functionality.  Since the Xen PV drivers are already performance-optimized
and feature-rich, we were planning to generalize their hypervisor-specific
backend in order to reuse them.
This is a good step toward such sharing.
Cheers, Dor.
