Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
Thanks for your patient answer! :-)

On 2014/9/30 17:33, Michael S. Tsirkin wrote:
> On Tue, Sep 30, 2014 at 04:36:00PM +0800, Zhangjie (HZ) wrote:
>> Hi, there is packet loss when we do packet forwarding in a VM,
>> especially when we use DPDK to do the forwarding. Enlarging the vring
>> can alleviate the problem.
>
> I think this has to do with the fact that DPDK disables checksum
> offloading, which has the side effect of disabling segmentation
> offloading. Please fix DPDK to support checksum offloading, and I think
> the problem will go away.

In some application scenarios, loss of UDP packets is not allowed, and the
UDP packets are always shorter than the MTU. So we need to support
high-pps forwarding (e.g. 0.3M packets/s), and offloading cannot fix that.

>> But now the vring size is limited to 1024, as follows:
>>
>>     VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
>>                                 void (*handle_output)(VirtIODevice *, VirtQueue *))
>>     {
>>         ...
>>         if (i == VIRTIO_PCI_QUEUE_MAX || queue_size > VIRTQUEUE_MAX_SIZE)
>>             abort();
>>     }
>>
>>     ps: #define VIRTQUEUE_MAX_SIZE 1024
>>
>> I deleted the check and set the vring size to 2048; the VM started
>> successfully and the network was OK too. So why is the vring size
>> limited to 1024, and what is the influence? Thanks!
>
> There are several reasons for this limit. First, the guest has to
> allocate the descriptor buffer, which is 16 * vring size. With a 1K size
> that is already 16K, which might be tricky to allocate contiguously if
> memory is fragmented when a device is added by hotplug.
>
> The second issue is that we want to be able to implement the device on
> top of the Linux kernel, and a single descriptor might use all of the
> virtqueue. In this case we won't be able to pass the descriptor directly
> to Linux as a single iov, since that is limited to 1K entries.

For the second issue, I wonder if it is OK to set the vring size of
virtio-net to larger than 1024: for networking there are at most 18 pages
for an skb, so it will not exceed the iov limit.

--
Best Wishes!
Zhang Jie
Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
On Wed, Oct 08, 2014 at 03:17:56PM +0800, Zhangjie (HZ) wrote:
> Thanks for your patient answer! :-)
> [...]
> In some application scenarios, loss of UDP packets is not allowed, and
> the UDP packets are always shorter than the MTU. So we need to support
> high-pps forwarding (e.g. 0.3M packets/s), and offloading cannot fix
> that.

That's the point. With UFO you get larger-than-MTU UDP packets:
http://www.linuxfoundation.org/collaborate/workgroups/networking/ufo

Additionally, checksum offloading reduces CPU utilization and reduces the
number of data copies, allowing higher pps with smaller buffers.

It might look like queue depth helps performance for netperf, but in
real-life workloads the latency under load will suffer; with more
protocols implementing tunnelling on top of UDP, such extreme bufferbloat
will not be tolerated.

> [...]
> For the second issue, I wonder if it is OK to set the vring size of
> virtio-net to larger than 1024: for networking there are at most 18
> pages for an skb, so it will not exceed the iov limit.
Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
On 09/30/2014 12:33 PM, Michael S. Tsirkin wrote:
> a single descriptor might use all of the virtqueue. In this case we
> won't be able to pass the descriptor directly to linux as a single iov,
> since that is limited to 1K entries.

You could separate the maximum request scatter/gather list size from the
virtqueue size. They are totally unrelated - even now you can have a
larger request by using indirect descriptors.
Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
MST, thanks very much, I get it.

On 2014/10/8 15:37, Michael S. Tsirkin wrote:
> [...]
> That's the point. With UFO you get larger-than-MTU UDP packets:
> http://www.linuxfoundation.org/collaborate/workgroups/networking/ufo

The VM only does forwarding, and does not create new packets itself. As
we cannot GRO normal UDP packets, when the UDP packets come from the NIC
of the host, UFO cannot work.

> [...]

--
Best Wishes!
Zhang Jie
Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
On 2014/10/8 15:43, Avi Kivity wrote:
> You could separate the maximum request scatter/gather list size from the
> virtqueue size. They are totally unrelated - even now you can have a
> larger request by using indirect descriptors.

Yes, from the code there is no strong correlation between the virtqueue
size and the iov.

--
Best Wishes!
Zhang Jie
Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
On Wed, Oct 08, 2014 at 04:07:47PM +0800, Zhangjie (HZ) wrote:
> MST, thanks very much, I get it.
> [...]
> The VM only does forwarding, and does not create new packets itself. As
> we cannot GRO normal UDP packets, when the UDP packets come from the NIC
> of the host, UFO cannot work.

This is something I've been thinking about for a while now. We really
should add a GRO-like path for UDP. LRO can often work with UDP too, but
Linux discards too much info on LRO; if you are doing drivers in
userspace, you might be able to support this.

> Additionally, checksum offloading reduces CPU utilization and reduces
> the number of data copies, allowing higher pps with smaller buffers.
> [...]
Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
On Wed, Oct 08, 2014 at 10:43:07AM +0300, Avi Kivity wrote:
> On 09/30/2014 12:33 PM, Michael S. Tsirkin wrote:
>> a single descriptor might use all of the virtqueue. In this case we
>> won't be able to pass the descriptor directly to linux as a single iov,
>> since that is limited to 1K entries.
>
> You could separate the maximum request scatter/gather list size from the
> virtqueue size. They are totally unrelated - even now you can have a
> larger request by using indirect descriptors.

We could add a feature to have a smaller or larger S/G length limit. Is
this something useful?

--
MST
Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
On 10/08/2014 12:15 PM, Michael S. Tsirkin wrote:
> [...]
> We could add a feature to have a smaller or larger S/G length limit. Is
> this something useful?

Having a larger ring size is useful, especially with zero-copy transmit,
and you would need the sglist length limit in order to not require
linearization on Linux hosts. So the limit is not useful in itself, only
indirectly. Google Compute Engine exposes virtio ring sizes of 16384.

Even more useful is getting rid of the desc array and instead passing
descs inline in avail and used.
Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
On Wed, Oct 08, 2014 at 12:51:21PM +0300, Avi Kivity wrote:
> [...]
> Having a larger ring size is useful, especially with zero-copy transmit,
> and you would need the sglist length limit in order to not require
> linearization on Linux hosts. So the limit is not useful in itself, only
> indirectly. Google Compute Engine exposes virtio ring sizes of 16384.

OK, this sounds useful, I'll queue this up for consideration. Thanks!

> Even more useful is getting rid of the desc array and instead passing
> descs inline in avail and used.

You expect this to improve performance? Quite possibly, but this will
have to be demonstrated.

--
MST
Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
On 10/08/2014 01:14 PM, Michael S. Tsirkin wrote:
> [...]
> OK, this sounds useful, I'll queue this up for consideration. Thanks!

Thanks.

>> Even more useful is getting rid of the desc array and instead passing
>> descs inline in avail and used.
>
> You expect this to improve performance? Quite possibly, but this will
> have to be demonstrated.

The top vhost function in small-packet workloads is vhost_get_vq_desc,
and the top instruction within that (50%) is the one that reads the first
8 bytes of desc. It's a guaranteed cache line miss (and again on the
guest side when it's time to reuse).

Inline descriptors will amortize the cache miss over 4 descriptors, and
will allow the hardware to prefetch, since the descriptors are linear in
memory.
Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
On Wed, Oct 08, 2014 at 01:37:25PM +0300, Avi Kivity wrote:
> [...]
> The top vhost function in small-packet workloads is vhost_get_vq_desc,
> and the top instruction within that (50%) is the one that reads the
> first 8 bytes of desc. It's a guaranteed cache line miss (and again on
> the guest side when it's time to reuse).

OK, so basically what you are pointing out is that we get 5 accesses:
read of available head, read of available ring, read of descriptor,
write of used ring, write of used ring head.

If processing is in-order, we could build a much simpler design, with a
valid bit in the descriptor, cleared by the host as descriptors are
consumed. Basically, get rid of both the used and available rings.
Sounds good in theory.

> Inline descriptors will amortize the cache miss over 4 descriptors, and
> will allow the hardware to prefetch, since the descriptors are linear in
> memory.

If descriptors are used in order (as they are with current qemu), then
aren't they amortized already?

--
MST
Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
On 10/08/2014 01:55 PM, Michael S. Tsirkin wrote:
> OK, so basically what you are pointing out is that we get 5 accesses:
> read of available head, read of available ring, read of descriptor,
> write of used ring, write of used ring head.

Right. And only the read of the descriptor is not amortized.

> If processing is in-order, we could build a much simpler design, with a
> valid bit in the descriptor, cleared by the host as descriptors are
> consumed. Basically, get rid of both the used and available rings.

That only works if you don't allow reordering, which is never the case
for block, and not the case for zero-copy net. It also has writers on
both sides of the ring.

The right design is to keep avail and used, but instead of making them
rings of pointers to descs, make them rings of descs. The host reads
descs from avail, processes them, then writes them back on used (possibly
out-of-order). The guest writes descs to avail and reads them back from
used. You'll probably have to add a 64-bit cookie to desc so you can
complete without an additional lookup.
Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
On 10/08/2014 01:55 PM, Michael S. Tsirkin wrote:
>> Inline descriptors will amortize the cache miss over 4 descriptors, and
>> will allow the hardware to prefetch, since the descriptors are linear
>> in memory.
>
> If descriptors are used in order (as they are with current qemu), then
> aren't they amortized already?

The descriptors are only in-order for non-zero-copy net. They are out of
order for block and zero-copy net. (Also, the guest has to be careful in
how it allocates descriptors.)
Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
On Wed, Oct 08, 2014 at 01:59:13PM +0300, Avi Kivity wrote:
> [...]
> The right design is to keep avail and used, but instead of making them
> rings of pointers to descs, make them rings of descs. The host reads
> descs from avail, processes them, then writes them back on used
> (possibly out-of-order). The guest writes descs to avail and reads them
> back from used. You'll probably have to add a 64-bit cookie to desc so
> you can complete without an additional lookup.

My old presentation from 2012 or so suggested something like this. We
don't need a 64-bit cookie, I think - a small 16-bit one should be
enough.
Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
On 10/08/2014 03:22 PM, Michael S. Tsirkin wrote:
> [...]
> My old presentation from 2012 or so suggested something like this. We
> don't need a 64-bit cookie, I think - a small 16-bit one should be
> enough.

A 16-bit cookie means you need an extra table to hold the real request
pointers. With a 64-bit cookie you can store a pointer to the skbuff or
bio in the ring itself, and avoid the extra lookup.

The extra lookup isn't the end of the world, since it doesn't cross core
boundaries, but it's worth avoiding.
Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
On 10/08/2014 03:28 PM, Avi Kivity wrote:
>> My old presentation from 2012 or so suggested something like this. We
>> don't need a 64-bit cookie, I think - a small 16-bit one should be
>> enough.
>
> A 16-bit cookie means you need an extra table to hold the real request
> pointers. With a 64-bit cookie you can store a pointer to the skbuff or
> bio in the ring itself, and avoid the extra lookup.
>
> The extra lookup isn't the end of the world, since it doesn't cross core
> boundaries, but it's worth avoiding.

What you can do is have two types of descriptors, head and fragment:

    union desc {
        struct head {
            u16 nfrags;
            u16 flags;
            u64 cookie;
        };
        struct frag {
            u64 paddr;
            u16 flen;
            u16 flags;
        };
    };

so now a request's length is 12*(nfrags+1). You can be evil and steal
some bits from paddr/cookie, and have each descriptor 8 bytes long.

btw, I also recommend storing things like vnet_hdr in the ring itself,
instead of out-of-line in memory. Maybe the ring should just transport
bytes and let the upper layer decide how it's formatted.
Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
On Tue, Sep 30, 2014 at 04:36:00PM +0800, Zhangjie (HZ) wrote:
> Hi, there is packet loss when we do packet forwarding in a VM,
> especially when we use DPDK to do the forwarding. Enlarging the vring
> can alleviate the problem.

I think this has to do with the fact that DPDK disables checksum
offloading, which has the side effect of disabling segmentation
offloading. Please fix DPDK to support checksum offloading, and I think
the problem will go away.

> But now the vring size is limited to 1024, as follows:
>
>     VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
>                                 void (*handle_output)(VirtIODevice *, VirtQueue *))
>     {
>         ...
>         if (i == VIRTIO_PCI_QUEUE_MAX || queue_size > VIRTQUEUE_MAX_SIZE)
>             abort();
>     }
>
>     ps: #define VIRTQUEUE_MAX_SIZE 1024
>
> I deleted the check and set the vring size to 2048; the VM started
> successfully and the network was OK too. So why is the vring size
> limited to 1024, and what is the influence? Thanks!

There are several reasons for this limit. First, the guest has to
allocate the descriptor buffer, which is 16 * vring size. With a 1K size
that is already 16K, which might be tricky to allocate contiguously if
memory is fragmented when a device is added by hotplug.

The second issue is that we want to be able to implement the device on
top of the Linux kernel, and a single descriptor might use all of the
virtqueue. In this case we won't be able to pass the descriptor directly
to Linux as a single iov, since that is limited to 1K entries.