Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM
On Tuesday 22 September 2009, Stephen Hemminger wrote:
> > My idea for that was to open multiple file descriptors to the same
> > macvtap device and let the kernel figure out the right thing to
> > do with that. You can do the same with raw packet sockets in case
> > of vhost_net, but I wouldn't want to add more complexity to the
> > tun/tap driver for this.
>
> Or get tap out of the way entirely. The packets should not have
> to go out to user space at all (see veth).

How does veth relate to that? Do you mean vhost_net? With vhost_net,
you could still open multiple sockets, only the access is in the kernel.
Obviously, once it is all in the kernel, that could be done under the
covers, but I think it would be cleaner to treat vhost_net purely as a
way to bypass the syscalls for user space, with as little visible impact
otherwise as possible.

	Arnd <><
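As an illustration only, here is a minimal userspace sketch of the
"multiple sockets" idea for a vhost_net style backend: one raw packet
socket per queue, each bound to the same device. How vhost_net would
consume such sockets, and how the kernel would fan traffic out across
them, is exactly the open question in this thread; the helper below is
hypothetical.

#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <string.h>
#include <sys/socket.h>

/* Create one raw packet socket bound to ifname; call once per queue. */
int open_queue_socket(const char *ifname)
{
	struct sockaddr_ll sll;
	int fd;

	fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
	if (fd < 0)
		return -1;

	memset(&sll, 0, sizeof(sll));
	sll.sll_family = AF_PACKET;
	sll.sll_protocol = htons(ETH_P_ALL);
	sll.sll_ifindex = if_nametoindex(ifname);
	if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0)
		return -1;

	return fd;	/* one such fd per guest queue pair */
}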
Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM
On Tue, 22 Sep 2009 13:50:54 +0200 Arnd Bergmann wrote:
> On Tuesday 22 September 2009, Michael S. Tsirkin wrote:
> > > > More importantly, when virtualization is used with multi-queue
> > > > NICs, the virtio-net NIC is a single-CPU bottleneck. The virtio-net
> > > > NIC should preserve the parallelism (lock free) using multiple
> > > > receive/transmit queues. The number of queues should equal the
> > > > number of CPUs.
> > >
> > > Yup, multiqueue virtio is on the todo list ;-)
> >
> > Note we'll need multiqueue tap for that to help.
>
> My idea for that was to open multiple file descriptors to the same
> macvtap device and let the kernel figure out the right thing to
> do with that. You can do the same with raw packet sockets in case
> of vhost_net, but I wouldn't want to add more complexity to the
> tun/tap driver for this.
>
> 	Arnd <><

Or get tap out of the way entirely. The packets should not have
to go out to user space at all (see veth).
Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM
On Tuesday 22 September 2009, Michael S. Tsirkin wrote:
> > > More importantly, when virtualization is used with multi-queue
> > > NICs, the virtio-net NIC is a single-CPU bottleneck. The virtio-net
> > > NIC should preserve the parallelism (lock free) using multiple
> > > receive/transmit queues. The number of queues should equal the
> > > number of CPUs.
> >
> > Yup, multiqueue virtio is on the todo list ;-)
>
> Note we'll need multiqueue tap for that to help.

My idea for that was to open multiple file descriptors to the same
macvtap device and let the kernel figure out the right thing to
do with that. You can do the same with raw packet sockets in case
of vhost_net, but I wouldn't want to add more complexity to the
tun/tap driver for this.

	Arnd <><
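For illustration, a minimal userspace sketch of what that could look like
from the backend's side, assuming the macvtap character device accepted
multiple opens and the kernel hashed flows across the descriptors; neither
behaviour existed at the time of this thread, and /dev/tap42 and NR_QUEUES
are made-up placeholders.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_QUEUES 4	/* ideally one per vCPU */

int main(void)
{
	int fd[NR_QUEUES];
	int i;

	for (i = 0; i < NR_QUEUES; i++) {
		/* every open() of the same macvtap node would give one queue */
		fd[i] = open("/dev/tap42", O_RDWR);
		if (fd[i] < 0) {
			perror("open macvtap queue");
			exit(1);
		}
		/* each fd would then back one virtio-net tx/rx queue pair,
		 * e.g. be handed to its own backend thread or vhost instance */
	}
	return 0;
}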
Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM
On Mon, Sep 21, 2009 at 09:27:18AM -0700, Chris Wright wrote:
> * Stephen Hemminger (shemmin...@vyatta.com) wrote:
> > On Mon, 21 Sep 2009 16:37:22 +0930
> > Rusty Russell wrote:
> > > > > Actually this framework can apply to traditional network adapters
> > > > > which have just one tx/rx queue pair. And applications using the
> > > > > same user/kernel interface can utilize this framework to
> > > > > send/receive network traffic directly through a tx/rx queue pair
> > > > > in a network adapter.
> >
> > More importantly, when virtualization is used with multi-queue
> > NICs, the virtio-net NIC is a single-CPU bottleneck. The virtio-net
> > NIC should preserve the parallelism (lock free) using multiple
> > receive/transmit queues. The number of queues should equal the
> > number of CPUs.
>
> Yup, multiqueue virtio is on the todo list ;-)
>
> thanks,
> -chris

Note we'll need multiqueue tap for that to help.

--
MST
Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM
* Stephen Hemminger (shemmin...@vyatta.com) wrote:
> On Mon, 21 Sep 2009 16:37:22 +0930
> Rusty Russell wrote:
> > > > Actually this framework can apply to traditional network adapters
> > > > which have just one tx/rx queue pair. And applications using the
> > > > same user/kernel interface can utilize this framework to
> > > > send/receive network traffic directly through a tx/rx queue pair
> > > > in a network adapter.
>
> More importantly, when virtualization is used with multi-queue NICs, the
> virtio-net NIC is a single-CPU bottleneck. The virtio-net NIC should
> preserve the parallelism (lock free) using multiple receive/transmit
> queues. The number of queues should equal the number of CPUs.

Yup, multiqueue virtio is on the todo list ;-)

thanks,
-chris
Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM
On Mon, 21 Sep 2009 16:37:22 +0930
Rusty Russell wrote:
> > > Actually this framework can apply to traditional network adapters
> > > which have just one tx/rx queue pair. And applications using the
> > > same user/kernel interface can utilize this framework to send/receive
> > > network traffic directly through a tx/rx queue pair in a network
> > > adapter.

More importantly, when virtualization is used with multi-queue NICs, the
virtio-net NIC is a single-CPU bottleneck. The virtio-net NIC should
preserve the parallelism (lock free) using multiple receive/transmit
queues. The number of queues should equal the number of CPUs.
Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM
On Wed, 2 Sep 2009 01:35:18 am Stephen Hemminger wrote:
> On Tue, 1 Sep 2009 14:58:19 +0800
> "Xin, Xiaohui" wrote:
>
> > [RFC] Virtual Machine Device Queues (VMDq) support on KVM
> >
> > A network adapter with VMDq technology presents multiple pairs of tx/rx
> > queues, and provides a network L2 sorting mechanism based on MAC
> > addresses and VLAN tags for each tx/rx queue pair. Here we present a
> > generic framework in which network traffic to/from a tx/rx queue pair
> > can be directed from/to a KVM guest without any software copy.
> >
> > Actually this framework can apply to traditional network adapters which
> > have just one tx/rx queue pair. And applications using the same
> > user/kernel interface can utilize this framework to send/receive
> > network traffic directly through a tx/rx queue pair in a network
> > adapter.
> >
> > We use the virtio-net architecture to illustrate the framework.
> >
> > [data-path diagram: the Qemu process (virtio-net backend service) in
> > user space exchanges TX/RX buffers with the guest kernel's virtio-net
> > driver through the vqueues (pop/push, add_buf/get_buf); AIO read and
> > write combined with Direct I/O, which substitute for synced file
> > operations, connect it to the host kernel, where a NIC driver and the
> > TUN/TAP driver form a likely queue pair instance; reads and writes are
> > copy-less, with user space directly mapped into the kernel and the
> > payload directly DMAed into user space (read) or hooked to a skb
> > (write); below sits the traditional adapter or a tx/rx queue pair.]
> >
> > The basic idea is to utilize kernel Asynchronous I/O combined with
> > Direct I/O to implement a copy-less TUN/TAP device. AIO and Direct I/O
> > are not new to the kernel; we can still see them in the SCSI tape
> > driver.
> >
> > With traditional file operations, a copy of the payload contents
> > from/to the kernel DMA address to/from a user buffer is needed. That is
> > the copy we want to save.
> >
> > The proposed framework is like this:
> > A TUN/TAP device is bound to a traditional NIC adapter or a tx/rx queue
> > pair on the host side. The KVM virtio-net backend service, a user space
> > program, submits asynchronous read/write I/O requests to the host
> > kernel through the TUN/TAP device. The requests correspond to the
> > vqueue elements, including both transmission and receive. They can be
> > queued in one AIO request and, later, completion will be notified
> > through the underlying packet tx/rx processing of the rx/tx queue pair.
> >
> > Detailed path:
> >
> > To the guest virtio-net driver, packet receive corresponds to
> > asynchronous read I/O requests of the backend service.
> >
> > 1) The guest virtio-net driver provides header and payload addresses
> > through the receive vqueue to the virtio-net backend service.
> >
> > 2) The virtio-net backend service encapsulates multiple vqueue elements
> > into multiple AIO control blocks and composes them into one AIO read
> > request.
> >
> > 3) The virtio-net backend service uses the io_submit() syscall to pass
> > the request to the TUN/TAP device.
> >
> > 4) The virtio-net backend service uses the io_getevents() syscall to
> > check the completion of the request.
> >
> > 5) The TUN/TAP driver receives packets from the queue pair of the NIC
> > and prepares for Direct I/O.
> > A modified NIC driver may render a skb whose header is allocated in
> > host kernel memory, but whose payload buffer is directly mapped from
> > the user space buffers that were rendered through the AIO request by
> > the backend service. get_user_pages() may do this.
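As an aside, steps 2) to 4) of the quoted proposal amount to the standard
Linux AIO pattern. A minimal userspace sketch follows, assuming a tap
driver extended to honour AIO reads as the RFC proposes (stock tun/tap
did not support this); tap_fd, NR_BUFS and BUF_SZ are placeholders, the
io_context_t must come from io_setup(), and the program links with -laio.

#include <libaio.h>

#define NR_BUFS 8	/* vqueue elements batched per AIO request */
#define BUF_SZ  4096

/* Submit one batched AIO read for NR_BUFS guest rx buffers and reap
 * completions; each completed event maps back to one vqueue element. */
int poll_receive(io_context_t ctx, int tap_fd, void *bufs[NR_BUFS])
{
	struct iocb cbs[NR_BUFS];
	struct iocb *cbp[NR_BUFS];
	struct io_event events[NR_BUFS];
	int i;

	/* 2) one AIO control block per vqueue element */
	for (i = 0; i < NR_BUFS; i++) {
		io_prep_pread(&cbs[i], tap_fd, bufs[i], BUF_SZ, 0);
		cbp[i] = &cbs[i];
	}

	/* 3) hand the whole batch to the tap device in one syscall */
	if (io_submit(ctx, NR_BUFS, cbp) < 0)
		return -1;

	/* 4) wait for at least one completion */
	return io_getevents(ctx, 1, NR_BUFS, events, NULL);
}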
RE: [RFC] Virtual Machine Device Queues(VMDq) support on KVM
> * Code is easier to review than bullet points.

Yes. We'd send the code soon.

> * Direct I/O has to be safe when a page is shared by multiple threads,
>   and has to be non-blocking since network I/O can take indeterminately
>   long (think big queues, tunneling, ...)

In this situation, a one-queue-pair NIC is assigned to only one guest,
the pages are locked, and the KVM guest will not be swapped out.

> * In the past, attempts at Direct I/O on the network have always had SMP
>   TLB issues. The page has to be flipped or marked as COW on all CPUs,
>   and the cost of the Inter Processor Interrupt to steal the page has
>   been slower than copying.

It may be; we have not thought about this more. Thanks.

Thanks
Xiaohui

-----Original Message-----
From: Stephen Hemminger [mailto:shemmin...@vyatta.com]
Sent: Wednesday, September 02, 2009 12:05 AM
To: Xin, Xiaohui
Cc: m...@redhat.com; net...@vger.kernel.org;
virtualizat...@lists.linux-foundation.org; kvm@vger.kernel.org;
linux-ker...@vger.kernel.org; mi...@elte.hu; linux...@kvack.org;
a...@linux-foundation.org; h...@zytor.com; gregory.hask...@gmail.com
Subject: Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

On Tue, 1 Sep 2009 14:58:19 +0800
"Xin, Xiaohui" wrote:

> [RFC] Virtual Machine Device Queues (VMDq) support on KVM
>
> A network adapter with VMDq technology presents multiple pairs of tx/rx
> queues, and provides a network L2 sorting mechanism based on MAC
> addresses and VLAN tags for each tx/rx queue pair. Here we present a
> generic framework in which network traffic to/from a tx/rx queue pair
> can be directed from/to a KVM guest without any software copy.
>
> Actually this framework can apply to traditional network adapters which
> have just one tx/rx queue pair. And applications using the same
> user/kernel interface can utilize this framework to send/receive network
> traffic directly through a tx/rx queue pair in a network adapter.
>
> We use the virtio-net architecture to illustrate the framework.
>
> [data-path diagram: the Qemu process (virtio-net backend service) in
> user space exchanges TX/RX buffers with the guest kernel's virtio-net
> driver through the vqueues (pop/push, add_buf/get_buf); AIO read and
> write combined with Direct I/O, which substitute for synced file
> operations, connect it to the host kernel, where a NIC driver and the
> TUN/TAP driver form a likely queue pair instance; reads and writes are
> copy-less, with user space directly mapped into the kernel and the
> payload directly DMAed into user space (read) or hooked to a skb
> (write); below sits the traditional adapter or a tx/rx queue pair.]
>
> The basic idea is to utilize kernel Asynchronous I/O combined with
> Direct I/O to implement a copy-less TUN/TAP device. AIO and Direct I/O
> are not new to the kernel; we can still see them in the SCSI tape
> driver.
>
> With traditional file operations, a copy of the payload contents from/to
> the kernel DMA address to/from a user buffer is needed. That is the copy
> we want to save.
>
> The proposed framework is like this:
> A TUN/TAP device is bound to a traditional NIC adapter or a tx/rx queue
> pair on the host side. The KVM virtio-net backend service, a user space
> program, submits asynchronous read/write I/O requests to the host kernel
> through the TUN/TAP device. The requests correspond to the vqueue
> elements, including both transmission
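The point above about the pages being locked presumably maps onto
something like the following kernel-side sketch, using the
get_user_pages() interface as it looked in the 2.6.3x kernels this thread
is contemporary with; pin_payload_pages() itself is a made-up helper for
illustration, not code from the proposed patch.

#include <linux/mm.h>
#include <linux/sched.h>

/* Pin the user/guest payload buffer behind one AIO request so the NIC
 * can DMA into it and the pages cannot be swapped out. */
static int pin_payload_pages(unsigned long uaddr, int nr_pages,
			     struct page **pages)
{
	int pinned;

	down_read(&current->mm->mmap_sem);
	/* write = 1: the device will write (DMA) into these pages */
	pinned = get_user_pages(current, current->mm, uaddr, nr_pages,
				1, 0, pages, NULL);
	up_read(&current->mm->mmap_sem);

	return pinned;	/* caller must put_page() each page when done */
}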
Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM
On Tue, 1 Sep 2009 14:58:19 +0800
"Xin, Xiaohui" wrote:

> [RFC] Virtual Machine Device Queues (VMDq) support on KVM
>
> A network adapter with VMDq technology presents multiple pairs of tx/rx
> queues, and provides a network L2 sorting mechanism based on MAC
> addresses and VLAN tags for each tx/rx queue pair. Here we present a
> generic framework in which network traffic to/from a tx/rx queue pair
> can be directed from/to a KVM guest without any software copy.
>
> Actually this framework can apply to traditional network adapters which
> have just one tx/rx queue pair. And applications using the same
> user/kernel interface can utilize this framework to send/receive network
> traffic directly through a tx/rx queue pair in a network adapter.
>
> We use the virtio-net architecture to illustrate the framework.
>
> [data-path diagram: the Qemu process (virtio-net backend service) in
> user space exchanges TX/RX buffers with the guest kernel's virtio-net
> driver through the vqueues (pop/push, add_buf/get_buf); AIO read and
> write combined with Direct I/O, which substitute for synced file
> operations, connect it to the host kernel, where a NIC driver and the
> TUN/TAP driver form a likely queue pair instance; reads and writes are
> copy-less, with user space directly mapped into the kernel and the
> payload directly DMAed into user space (read) or hooked to a skb
> (write); below sits the traditional adapter or a tx/rx queue pair.]
>
> The basic idea is to utilize kernel Asynchronous I/O combined with
> Direct I/O to implement a copy-less TUN/TAP device. AIO and Direct I/O
> are not new to the kernel; we can still see them in the SCSI tape
> driver.
>
> With traditional file operations, a copy of the payload contents from/to
> the kernel DMA address to/from a user buffer is needed. That is the copy
> we want to save.
>
> The proposed framework is like this:
> A TUN/TAP device is bound to a traditional NIC adapter or a tx/rx queue
> pair on the host side. The KVM virtio-net backend service, a user space
> program, submits asynchronous read/write I/O requests to the host kernel
> through the TUN/TAP device. The requests correspond to the vqueue
> elements, including both transmission and receive. They can be queued in
> one AIO request and, later, completion will be notified through the
> underlying packet tx/rx processing of the rx/tx queue pair.
>
> Detailed path:
>
> To the guest virtio-net driver, packet receive corresponds to
> asynchronous read I/O requests of the backend service.
>
> 1) The guest virtio-net driver provides header and payload addresses
> through the receive vqueue to the virtio-net backend service.
>
> 2) The virtio-net backend service encapsulates multiple vqueue elements
> into multiple AIO control blocks and composes them into one AIO read
> request.
>
> 3) The virtio-net backend service uses the io_submit() syscall to pass
> the request to the TUN/TAP device.
>
> 4) The virtio-net backend service uses the io_getevents() syscall to
> check the completion of the request.
>
> 5) The TUN/TAP driver receives packets from the queue pair of the NIC
> and prepares for Direct I/O.
> A modified NIC driver may render a skb whose header is allocated in host
> kernel memory, but whose payload buffer is directly mapped from the user
> space buffers that were rendered through the AIO request by the backend
> service. get_user_pages() may do this. For one AIO read request, the
> TUN/TAP driver maintains a list of the directly mapped buffers, and the
> NIC driver tries to get those buffers as payload buffers to compose the
> new skbs. Of course, if getting the buffers fails, then kernel-allocated
> buffers are used.
>
> 6) Modern NIC cards now most
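A hypothetical sketch of the per-request buffer list described at the end
of step 5): the tap driver would keep the pinned user buffers for one AIO
read request on a list, and the NIC driver would pop entries from it to
use as skb payload, falling back to normal kernel allocation when the
list is empty. None of these structures exist in the mainline tun/tap
driver; the names are illustrative only.

#include <linux/list.h>
#include <linux/mm.h>
#include <linux/spinlock.h>

struct mapped_buf {
	struct list_head node;
	struct page *page;	/* pinned user page (see get_user_pages) */
	unsigned int offset;
	unsigned int len;
};

struct aio_req_ctx {
	struct list_head bufs;	/* mapped_buf entries for one AIO request */
	spinlock_t lock;
};

/* NIC rx path: try to take a pre-mapped user buffer for the payload. */
static struct mapped_buf *grab_mapped_buf(struct aio_req_ctx *ctx)
{
	struct mapped_buf *b = NULL;

	spin_lock(&ctx->lock);
	if (!list_empty(&ctx->bufs)) {
		b = list_first_entry(&ctx->bufs, struct mapped_buf, node);
		list_del(&b->node);
	}
	spin_unlock(&ctx->lock);

	return b;	/* NULL means fall back to a kernel-allocated buffer */
}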