Re: [kvm-devel] [PATCH RFC 1/3] virtio infrastructure
Rusty Russell wrote:
>> Networking hardware generally services descriptors in a FIFO manner.
>
> Well, ethernet guarantees order.  Not sure about others tho...

OT: Does that hold for bonded interfaces too?

>> virtio may not (for example, it may offload copies of larger packets
>> to a dma engine such as I/OAT, resulting in a delay, but copy smaller
>> packets immediately).  that means that there will be some mismatch
>> between virtio drivers and real hardware drivers.
>
> I think your point is that the completion bitmap (or indeed, the
> current approach) does not maintain order?  Hmm, this is more
> convincing to me than cache arguments, since some devices might want
> ordering and want more than a single io in flight.

Well, it wasn't really; sorry for being unclear.  My point was that
virtio interfaces will not match hardware exactly.

My objection is to scan all slots, occupied or not, for completion.  I
think virtio should present completed descriptors without the need for
scanning, even if it means looking a bit different from a typical
ethernet driver.

--
error compiling committee.c: too many arguments to function

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
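The difference between scanning every slot and being handed only the completed descriptors can be sketched roughly as follows. This is a user-space illustration, not code from the patch: struct slot, NUM_SLOTS, mark_done() and both reap functions are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SLOTS 8   /* hypothetical ring size */

/* Hypothetical per-buffer slot (not taken from the patch). */
struct slot {
    int occupied;            /* a buffer was posted in this slot */
    int done;                /* the other side has finished with it */
    unsigned long used;      /* bytes consumed, valid once done is set */
    struct slot *next_done;  /* link in the completed list */
};

/* Scanning approach: every slot is visited, occupied or not. */
static size_t reap_by_scanning(struct slot ring[], size_t n, size_t *visited)
{
    size_t reaped = 0;

    *visited = 0;
    for (size_t i = 0; i < n; i++) {
        (*visited)++;
        if (ring[i].occupied && ring[i].done) {
            ring[i].occupied = 0;
            ring[i].done = 0;
            reaped++;
        }
    }
    return reaped;
}

/* List approach, backend side: link a finished slot onto the done list. */
static void mark_done(struct slot **done_list, struct slot *s, unsigned long len)
{
    assert(s->occupied);
    s->done = 1;
    s->used = len;
    s->next_done = *done_list;
    *done_list = s;
}

/* List approach, driver side: only completed slots are ever touched. */
static size_t reap_from_list(struct slot **done_list, size_t *visited)
{
    size_t reaped = 0;

    *visited = 0;
    while (*done_list) {
        struct slot *s = *done_list;

        *done_list = s->next_done;
        s->next_done = NULL;
        s->occupied = 0;
        s->done = 0;
        (*visited)++;
        reaped++;
    }
    return reaped;
}
```

With one completed buffer in an eight-slot ring, the scan touches all eight slots while the list walk touches one. Note that this simple list hands completions back in reverse order of completion, so a device that needs ordering would want a tail-appended list instead.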
Re: [kvm-devel] [PATCH RFC 1/3] virtio infrastructure
On Mon, 2007-06-04 at 14:25 +0300, Avi Kivity wrote:
> Rusty Russell wrote:
>>> Networking hardware generally services descriptors in a FIFO manner.
>>
>> Well, ethernet guarantees order.  Not sure about others tho...
>
> OT: Does that hold for bonded interfaces too?

Sorry, I don't know.  The ethernet standard promises in-order, but I'd
imagine you'd need to prepend a header to get this to work with bonding
in general...

>>> virtio may not (for example, it may offload copies of larger packets
>>> to a dma engine such as I/OAT, resulting in a delay, but copy smaller
>>> packets immediately).  that means that there will be some mismatch
>>> between virtio drivers and real hardware drivers.
>>
>> I think your point is that the completion bitmap (or indeed, the
>> current approach) does not maintain order?  Hmm, this is more
>> convincing to me than cache arguments, since some devices might want
>> ordering and want more than a single io in flight.
>
> Well, it wasn't really; sorry for being unclear.  My point was that
> virtio interfaces will not match hardware exactly.
>
> My objection is to scan all slots, occupied or not, for completion.  I
> think virtio should present completed descriptors without the need for
> scanning, even if it means looking a bit different from a typical
> ethernet driver.

It's not just the ethernet driver, it's virtio drivers in general.  One
reason the Xen drivers are viewed with such horror is that they look
nothing like normal Linux drivers.

But that just means that the linked list(s) should be in the struct
virtio_device rather than an arg to the interrupt handler.  I think,
given that the network code doesn't want to process used outbufs in the
interrupt handler, this is the Right Thing anyway.

I'll send here once it's done...

Thanks,
Rusty.
Re: [kvm-devel] [PATCH RFC 1/3] virtio infrastructure
On Mon, Jun 04, 2007 at 02:25:32PM +0300, Avi Kivity wrote:
> OT: Does that hold for bonded interfaces too?

Yes.  By default, traffic to the same destination MAC always sticks to
one interface.  You could select a layer3+4 hashing policy, but even
that guarantees a single flow will stick to one physical interface
unless it contains IP fragments, which should never happen for TCP.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
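Why a hash-based transmit policy keeps one flow on one interface can be shown with a small sketch. It assumes a simplified XOR-of-MAC policy (last address byte only) and is an illustration, not the bonding driver's actual code; pick_slave_l2() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: choose an outgoing interface index from the
 * source and destination MAC addresses.  Because the result depends
 * only on the two addresses, every frame between the same pair of
 * hosts leaves through the same interface and cannot be reordered
 * by the bond itself.
 */
static unsigned int pick_slave_l2(const uint8_t src[6], const uint8_t dst[6],
                                  unsigned int slave_count)
{
    assert(slave_count > 0);
    return (unsigned int)(src[5] ^ dst[5]) % slave_count;
}
```

A layer3+4 variant would, under the same assumption, hash IP addresses and ports instead, which is why fragments (which carry no port numbers after the first) are the case where a flow could be split.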
Re: [kvm-devel] [PATCH RFC 1/3] virtio infrastructure
Rusty Russell wrote:
> This attempts to implement a virtual I/O layer which should allow
> common drivers to be efficiently used across most virtual I/O
> mechanisms.  It will no doubt need further enhancement.
>
> The details of probing the device are left to hypervisor-specific
> code: it simply constructs the struct virtio_device and hands it to
> the probe function (eg. virtnet_probe() or virtblk_probe()).
>
> The virtio drivers add and detach input and output buffers; as the
> buffers are used up their associated used pointers are filled in.

Good stuff.

> +/**
> + * virtio_ops - virtio abstraction layer
> + * @add_outbuf: prepare to send data to the other end:
> + *      vdev: the virtio_device
> + *      sg: the description of the buffer(s).
> + *      num: the size of the sg array.
> + *      used: the length sent (set once sending is done).
> + *      Returns an identifier or an error.
> + * @add_inbuf: prepare to receive data from the other end:
> + *      vdev: the virtio_device
> + *      sg: the description of the buffer(s).
> + *      num: the size of the sg array.
> + *      used: the length sent (set once data received).
> + *      Returns an identifier or an error (eg. -ENOSPC).

Instead of 'used', how about a completion callback (with associated data
pointer)?  A new helper, virtio_complete(), would call the callback for
all completed requests.  It would eliminate all the tedious scanning
used to match the identifier.

It would also be nice to support a bit of non-buffer data, like a set of
bitflags.

--
Do not meddle in the internals of kernels, for they are subtle and quick
to panic.
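One possible shape for the completion-callback idea is sketched below. The message names virtio_complete() but gives no signature, so everything here (the signature, struct completion_queue, queue_completion(), MAX_REQS, count_bytes()) is an assumption made for illustration, written as plain user-space C.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REQS 16  /* hypothetical fixed queue depth */

/* Callback type: driver-private data pointer plus the length used. */
typedef void (*virtio_done_fn)(void *data, unsigned long len);

struct request {
    virtio_done_fn done;
    void *data;
    unsigned long len;
};

struct completion_queue {
    struct request reqs[MAX_REQS];
    size_t count;
};

/* Backend side (hypothetical): record a finished request.
 * Returns 0 on success, -1 if the queue is full. */
static int queue_completion(struct completion_queue *q, virtio_done_fn done,
                            void *data, unsigned long len)
{
    assert(done != NULL);
    if (q->count == MAX_REQS)
        return -1;
    q->reqs[q->count].done = done;
    q->reqs[q->count].data = data;
    q->reqs[q->count].len = len;
    q->count++;
    return 0;
}

/* Sketch of the proposed helper: run the callback of every completed
 * request and return how many were processed.  No identifier matching
 * is needed because each request carries its own data pointer. */
static size_t virtio_complete(struct completion_queue *q)
{
    size_t n = q->count;

    for (size_t i = 0; i < n; i++)
        q->reqs[i].done(q->reqs[i].data, q->reqs[i].len);
    q->count = 0;
    return n;
}

/* Example callback: accumulate a byte count in driver-private data. */
static void count_bytes(void *data, unsigned long len)
{
    *(unsigned long *)data += len;
}
```

The trade-off is one indirect call per buffer, whereas a 'used' field lets the driver handle a whole batch in a single pass.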
Re: [kvm-devel] [PATCH RFC 1/3] virtio infrastructure
On Sat, 2007-06-02 at 09:30 +0300, Avi Kivity wrote:
> Rusty Russell wrote:
>> + * virtio_ops - virtio abstraction layer
>> + * @add_outbuf: prepare to send data to the other end:
>> + *      vdev: the virtio_device
>> + *      sg: the description of the buffer(s).
>> + *      num: the size of the sg array.
>> + *      used: the length sent (set once sending is done).
>> + *      Returns an identifier or an error.
>> + * @add_inbuf: prepare to receive data from the other end:
>> + *      vdev: the virtio_device
>> + *      sg: the description of the buffer(s).
>> + *      num: the size of the sg array.
>> + *      used: the length sent (set once data received).
>> + *      Returns an identifier or an error (eg. -ENOSPC).
>
> Instead of 'used', how about a completion callback (with associated data
> pointer)?  A new helper, virtio_complete(), would call the callback for
> all completed requests.  It would eliminate all the tedious scanning
> used to match the identifier.

Hi Avi,

There were several considerations here.  My first was that the drivers
look much more like normal devices than getting a callback for every
buffer.  Secondly, 'used' batches much more nicely than a completion.
Finally, it's also something you really want to know, so the driver
doesn't have to zero its inbufs (an untrusted other side says it sends
you 1500 bytes but actually sent nothing, and now you spray kernel
memory out the NIC).

I also considered some scheme like:

    struct virtio_used_info
    {
            unsigned long len;
            void *next_token;
    };
    ...
    unsigned long (*add_outbuf)(struct virtio_device *vdev,
                                const struct scatterlist sg[],
                                unsigned int num,
                                void *token,
                                struct virtio_used_info *used_info);

So the used becomes a used/next pair and you can just walk the linked
list.  But I wasn't convinced that walking the buffers is going to be a
performance issue (tho the net driver puts them in a contiguous array
for cache friendliness as a nod to this concern).

> It would also be nice to support a bit of non-buffer data, like a set of
> bitflags.

I expect this might be necessary, but it wasn't so far.  The non-buffer
data tends to go in sg[0]: the block driver works this way, and the
network driver will for GSO.  Of course, a specialized virtio_ops
backend might well take this and put the info somewhere else.

I also considered a separate publish/examine interface for things which
aren't really messages, but again, haven't needed it yet.

Thanks!
Rusty.
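How a driver might walk the used/next pairs described above can be sketched like this. struct virtio_used_info is copied from the message; struct my_buf, walk_used() and the convention that a buffer's own address serves as its token are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* As proposed in the message above. */
struct virtio_used_info {
    unsigned long len;
    void *next_token;
};

/* Hypothetical driver buffer; its address doubles as the token. */
struct my_buf {
    char data[2048];
    struct virtio_used_info used;
};

/*
 * Follow next_token from the first completed buffer until the chain
 * ends.  Returns the total number of bytes reported as used and
 * stores the number of buffers visited in *nbufs.  Only completed
 * buffers are touched; nothing is scanned.
 */
static unsigned long walk_used(void *token, unsigned int *nbufs)
{
    unsigned long total = 0;

    assert(nbufs != NULL);
    *nbufs = 0;
    while (token) {
        struct my_buf *buf = token;

        total += buf->used.len;
        (*nbufs)++;
        token = buf->used.next_token;
    }
    return total;
}
```

Because the length comes from the used field rather than from the packet contents, the driver can rely on it to bound how much of each buffer it reads.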
Re: [kvm-devel] [PATCH RFC 1/3] virtio infrastructure
> This attempts to implement a virtual I/O layer which should allow
> common drivers to be efficiently used across most virtual I/O
> mechanisms.  It will no doubt need further enhancement.
>
> The details of probing the device are left to hypervisor-specific
> code: it simply constructs the struct virtio_device and hands it to
> the probe function (eg. virtnet_probe() or virtblk_probe()).
>
> The virtio drivers add and detach input and output buffers; as the
> buffers are used up their associated used pointers are filled in.
>
> I have written two virtio device drivers (net and block) and two
> virtio implementations (for lguest): a read-write socket-style
> implementation, and a more efficient descriptor-based implementation.
>
> Signed-off-by: Rusty Russell [EMAIL PROTECTED]

These are exactly the things I was planning to add to KVM/Linux.
All virtual I/O devices should have a common interface and share the
core functionality.  Since the Xen PV drivers are already
performance-optimized and feature-rich, we were planning to generalize
the hypervisor-specific backend in order to reuse them.
This is a good step toward such sharing.

Cheers,
Dor.