Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-22 Thread Arnd Bergmann
On Tuesday 22 September 2009, Stephen Hemminger wrote:
> > My idea for that was to open multiple file descriptors to the same
> > macvtap device and let the kernel figure out the right thing to
> > do with that. You can do the same with raw packet sockets in case
> > of vhost_net, but I wouldn't want to add more complexity to the
> > tun/tap driver for this.
> > 
> Or get tap out of the way entirely. The packets should not have
> to go out to user space at all (see veth)

How does veth relate to that, do you mean vhost_net? With vhost_net,
you could still open multiple sockets, only the access is in the kernel.
Obviously, once it all is in the kernel, that could be done under the
covers, but I think it would be cleaner to treat vhost_net purely as
a way to bypass the syscalls for user space, with as little as possible
visible impact otherwise.

Arnd <><
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-22 Thread Stephen Hemminger
On Tue, 22 Sep 2009 13:50:54 +0200
Arnd Bergmann wrote:

> On Tuesday 22 September 2009, Michael S. Tsirkin wrote:
> > > > More importantly, when virtualization is used with multi-queue
> > > > NICs the virtio-net NIC is a single CPU bottleneck. The virtio-net
> > > > NIC should preserve the parallelism (lock free) using multiple
> > > > receive/transmit queues. The number of queues should equal the
> > > > number of CPUs.
> > > 
> > > Yup, multiqueue virtio is on todo list ;-)
> > > 
> > 
> > Note we'll need multiqueue tap for that to help.
> 
> My idea for that was to open multiple file descriptors to the same
> macvtap device and let the kernel figure out the right thing to
> do with that. You can do the same with raw packet sockets in case
> of vhost_net, but I wouldn't want to add more complexity to the
> tun/tap driver for this.
> 
>   Arnd <><


Or get tap out of the way entirely. The packets should not have
to go out to user space at all (see veth)


Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-22 Thread Arnd Bergmann
On Tuesday 22 September 2009, Michael S. Tsirkin wrote:
> > > More importantly, when virtualization is used with multi-queue
> > > NICs the virtio-net NIC is a single CPU bottleneck. The virtio-net
> > > NIC should preserve the parallelism (lock free) using multiple
> > > receive/transmit queues. The number of queues should equal the
> > > number of CPUs.
> > 
> > Yup, multiqueue virtio is on todo list ;-)
> > 
> 
> Note we'll need multiqueue tap for that to help.

My idea for that was to open multiple file descriptors to the same
macvtap device and let the kernel figure out the right thing to
do with that. You can do the same with raw packet sockets in case
of vhost_net, but I wouldn't want to add more complexity to the
tun/tap driver for this.

Arnd <><


Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-22 Thread Michael S. Tsirkin
On Mon, Sep 21, 2009 at 09:27:18AM -0700, Chris Wright wrote:
> * Stephen Hemminger (shemmin...@vyatta.com) wrote:
> > On Mon, 21 Sep 2009 16:37:22 +0930
> > Rusty Russell wrote:
> > 
> > > > > > Actually this framework can apply to traditional network adapters
> > > > > > which have just one tx/rx queue pair. And applications using the
> > > > > > same user/kernel interface can utilize this framework to
> > > > > > send/receive network traffic directly thru a tx/rx queue pair in
> > > > > > a network adapter.
> > > > > > 
> > 
> > More importantly, when virtualization is used with multi-queue
> > NICs the virtio-net NIC is a single CPU bottleneck. The virtio-net
> > NIC should preserve the parallelism (lock free) using multiple
> > receive/transmit queues. The number of queues should equal the
> > number of CPUs.
> 
> Yup, multiqueue virtio is on todo list ;-)
> 
> thanks,
> -chris

Note we'll need multiqueue tap for that to help.

-- 
MST


Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-21 Thread Chris Wright
* Stephen Hemminger (shemmin...@vyatta.com) wrote:
> On Mon, 21 Sep 2009 16:37:22 +0930
> Rusty Russell wrote:
> 
> > > > > Actually this framework can apply to traditional network adapters
> > > > > which have just one tx/rx queue pair. And applications using the
> > > > > same user/kernel interface can utilize this framework to
> > > > > send/receive network traffic directly thru a tx/rx queue pair in
> > > > > a network adapter.
> > > > > 
> 
> More importantly, when virtualization is used with multi-queue NICs the
> virtio-net NIC is a single CPU bottleneck. The virtio-net NIC should
> preserve the parallelism (lock free) using multiple receive/transmit
> queues. The number of queues should equal the number of CPUs.

Yup, multiqueue virtio is on todo list ;-)

thanks,
-chris


Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-21 Thread Stephen Hemminger
On Mon, 21 Sep 2009 16:37:22 +0930
Rusty Russell wrote:

> > > Actually this framework can apply to traditional network adapters
> > > which have just one tx/rx queue pair. And applications using the
> > > same user/kernel interface can utilize this framework to
> > > send/receive network traffic directly thru a tx/rx queue pair in
> > > a network adapter.
> > > 

More importantly, when virtualization is used with multi-queue NICs the
virtio-net NIC is a single CPU bottleneck. The virtio-net NIC should
preserve the parallelism (lock free) using multiple receive/transmit
queues. The number of queues should equal the number of CPUs.


Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-21 Thread Rusty Russell
On Wed, 2 Sep 2009 01:35:18 am Stephen Hemminger wrote:
> On Tue, 1 Sep 2009 14:58:19 +0800
> "Xin, Xiaohui" wrote:
> 
> >   [RFC] Virtual Machine Device Queues (VMDq) support on KVM
> > 
> > Network adapter with VMDq technology presents multiple pairs of tx/rx
> > queues, and renders network L2 sorting mechanism based on MAC addresses
> > and VLAN tags for each tx/rx queue pair. Here we present a generic
> > framework, in which network traffic to/from a tx/rx queue pair can be
> > directed from/to a KVM guest without any software copy.
> > 
> > Actually this framework can apply to traditional network adapters which
> > have just one tx/rx queue pair. And applications using the same
> > user/kernel interface can utilize this framework to send/receive network
> > traffic directly thru a tx/rx queue pair in a network adapter.
> > 
> > We use virtio-net architecture to illustrate the framework.
> > 
> > 
> > |--------------------|     pop               add_buf      |----------------|
> > |    Qemu process    |  <---------    TX   <----------    |  Guest Kernel  |
> > |                    |  --------->         ---------->    |                |
> > |     Virtio-net     |     push              get_buf      |                |
> > | (Backend service)  |  --------->    RX   ---------->    |   Virtio-net   |
> > |                    |  <---------         <----------    |     driver     |
> > |                    |     push              get_buf      |                |
> > |--------------------|                                    |----------------|
> >           |
> >           |
> >           | AIO (read & write) combined with Direct I/O
> >           |   (which substitute synced file operations)
> > |--------------------------------------------------------------------------|
> > |   Host kernel    | read: copy-less with directly mapped user             |
> > |                  |        space to kernel, payload directly DMAed        |
> > |                  |        into user space                                |
> > |                  | write: copy-less with directly mapped user            |
> > |                  |        space to kernel, payload directly hooked       |
> > |                  |        to a skb                                       |
> > |                  |                                                       |
> > |  (a likely       |                                                       |
> > |   queue pair     |                                                       |
> > |   instance)      |                                                       |
> > |        |         |                                                       |
> > |  NIC driver <-->  TUN/TAP driver                                         |
> > |--------------------------------------------------------------------------|
> >           |
> >           |
> >    traditional adapter or a tx/rx queue pair
> > 
> > The basic idea is to utilize the kernel Asynchronous I/O combined with
> > Direct I/O to implement a copy-less TUN/TAP device. AIO and Direct I/O
> > are not new to the kernel; we can still see them in the SCSI tape driver.
> > 
> > With traditional file operations, a copying of payload contents from/to
> > the kernel DMA address to/from a user buffer is needed. That's the
> > copying we want to save.
> > 
> > The proposed framework is like this:
> > A TUN/TAP device is bound to a traditional NIC adapter or a tx/rx queue
> > pair on the host side. The KVM virtio-net Backend service, the user space
> > program, submits asynchronous read/write I/O requests to the host kernel
> > through the TUN/TAP device. The requests correspond to the vqueue
> > elements, including both transmission & receive. They can be queued in
> > one AIO request and later, the completion will be notified through the
> > underlying packets tx/rx processing of the rx/tx queue pair.
> > 
> > Detailed path:
> > 
> > For the guest Virtio-net driver, packet receive corresponds to
> > asynchronous read I/O requests of the Backend service.
> > 
> > 1) Guest Virtio-net driver provides header and payload address through the
> > receive vqueue to Virtio-net backend service.
> > 
> > 2) Virtio-net backend service encapsulates multiple vqueue elements into
> > multiple AIO control blocks and composes them into one AIO read request.
> > 
> > 3) Virtio-net backend service uses io_submit() syscall to pass the request
> > to the TUN/TAP device.
> > 
> > 4) Virtio-net backend service uses io_getevents() syscall to check the
> > completion of the request.
> > 
> > 5) The TUN/TAP driver receives packets from the queue pair of NIC, and
> > prepares for Direct I/O.
> >    A modified NIC driver may render a skb whose header is allocated in
> > host kernel, but whose payload buffer is directly mapped from user space
> > buffers which are rendered through the AIO request by the Backend service.
> > get_user_pages() may do this.
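
Steps 2) through 4) of the quoted RFC batch several AIO control blocks into
one request, hand it over with io_submit(), and reap completions with
io_getevents(). A minimal C sketch of that submission pattern follows. It is
not the RFC's implementation: it assumes an ordinary file descriptor stands in
for the TUN/TAP device, since the copy-less TUN/TAP path is only proposed in
this thread, and the helper name aio_batch_read and the MAX_BATCH limit are
hypothetical. Raw syscalls are used because glibc does not wrap the native
Linux AIO interface.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/aio_abi.h>  /* aio_context_t, struct iocb, struct io_event */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#define MAX_BATCH 16  /* hypothetical cap on control blocks per request */

/* Thin wrappers around the native Linux AIO syscalls. */
static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{
    return syscall(SYS_io_setup, nr, ctx);
}

static long sys_io_destroy(aio_context_t ctx)
{
    return syscall(SYS_io_destroy, ctx);
}

static long sys_io_submit(aio_context_t ctx, long nr, struct iocb **cbs)
{
    return syscall(SYS_io_submit, ctx, nr, cbs);
}

static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *ev, struct timespec *timeout)
{
    return syscall(SYS_io_getevents, ctx, min_nr, nr, ev, timeout);
}

/*
 * Hypothetical helper: read n buffers of len bytes each from fd, buffer i
 * from file offset i * len, using a single io_submit() call. Returns the
 * number of reads that completed with exactly len bytes, or -1 on error.
 */
long aio_batch_read(int fd, char *bufs[], size_t len, long n)
{
    aio_context_t ctx = 0;
    struct iocb cbs[MAX_BATCH];
    struct iocb *ptrs[MAX_BATCH];
    struct io_event events[MAX_BATCH];
    long i, done = 0, ok = 0;

    if (n < 1 || n > MAX_BATCH)
        return -1;
    if (sys_io_setup((unsigned)n, &ctx) < 0)
        return -1;

    /* Step 2: one control block per buffer (per vqueue element in the RFC). */
    for (i = 0; i < n; i++) {
        memset(&cbs[i], 0, sizeof(cbs[i]));
        cbs[i].aio_fildes = (uint32_t)fd;
        cbs[i].aio_lio_opcode = IOCB_CMD_PREAD;
        cbs[i].aio_buf = (uint64_t)(uintptr_t)bufs[i];
        cbs[i].aio_nbytes = len;
        cbs[i].aio_offset = (int64_t)((size_t)i * len);
        cbs[i].aio_data = (uint64_t)i;  /* echoed back in the completion event */
        ptrs[i] = &cbs[i];
    }

    /* Step 3: pass the whole batch to the kernel in one syscall. */
    if (sys_io_submit(ctx, n, ptrs) != n) {
        sys_io_destroy(ctx);
        return -1;
    }

    /* Step 4: collect completions until every control block is accounted for. */
    while (done < n) {
        long got = sys_io_getevents(ctx, 1, n - done, events, NULL);
        if (got < 0) {
            if (errno == EINTR)
                continue;
            sys_io_destroy(ctx);
            return -1;
        }
        for (i = 0; i < got; i++)
            if (events[i].res == (long long)len)
                ok++;
        done += got;
    }
    assert(done == n);

    sys_io_destroy(ctx);
    return ok;
}
```

Called on a descriptor that holds the eight bytes "ABCDEFGH", with two 4-byte
buffers and n = 2, the helper should return 2 and leave "ABCD" and "EFGH" in
the buffers. On a buffered file the kernel may finish such reads inside
io_submit() itself, so this shows only the batching interface, not the
asynchronous or copy-less behaviour the RFC is after.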

RE: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-01 Thread Xin, Xiaohui
>* Code is easier to review than bullet points.

Yes. We'll send the code soon.

>* Direct I/O has to be safe when page is shared by multiple threads,
> and has to be non-blocking since network I/O can take indeterminately
> long (think big queues, tunneling, ...)

In this situation, one NIC queue pair is assigned to only one guest; the
pages are locked and the KVM guest will not be swapped out.


>* In the past attempts at Direct I/O on network have always had SMP
> TLB issues. The page has to be flipped or marked as COW on all CPUs
> and the cost of the Inter Processor Interrupt to steal the page has
> been slower than copying

That may be so; we have not thought about this further. Thanks.

Thanks
Xiaohui

-----Original Message-----
From: Stephen Hemminger [mailto:shemmin...@vyatta.com] 
Sent: Wednesday, September 02, 2009 12:05 AM
To: Xin, Xiaohui
Cc: m...@redhat.com; net...@vger.kernel.org; 
virtualizat...@lists.linux-foundation.org; kvm@vger.kernel.org; 
linux-ker...@vger.kernel.org; mi...@elte.hu; linux...@kvack.org; 
a...@linux-foundation.org; h...@zytor.com; gregory.hask...@gmail.com
Subject: Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

On Tue, 1 Sep 2009 14:58:19 +0800
"Xin, Xiaohui" wrote:

>   [RFC] Virtual Machine Device Queues (VMDq) support on KVM
> 
> Network adapter with VMDq technology presents multiple pairs of tx/rx
> queues, and renders network L2 sorting mechanism based on MAC addresses
> and VLAN tags for each tx/rx queue pair. Here we present a generic
> framework, in which network traffic to/from a tx/rx queue pair can be
> directed from/to a KVM guest without any software copy.
> 
> Actually this framework can apply to traditional network adapters which
> have just one tx/rx queue pair. And applications using the same
> user/kernel interface can utilize this framework to send/receive network
> traffic directly thru a tx/rx queue pair in a network adapter.
> 
> We use virtio-net architecture to illustrate the framework.
> 
> 
> |--------------------|     pop               add_buf      |----------------|
> |    Qemu process    |  <---------    TX   <----------    |  Guest Kernel  |
> |                    |  --------->         ---------->    |                |
> |     Virtio-net     |     push              get_buf      |                |
> | (Backend service)  |  --------->    RX   ---------->    |   Virtio-net   |
> |                    |  <---------         <----------    |     driver     |
> |                    |     push              get_buf      |                |
> |--------------------|                                    |----------------|
>           |
>           |
>           | AIO (read & write) combined with Direct I/O
>           |   (which substitute synced file operations)
> |--------------------------------------------------------------------------|
> |   Host kernel    | read: copy-less with directly mapped user             |
> |                  |        space to kernel, payload directly DMAed        |
> |                  |        into user space                                |
> |                  | write: copy-less with directly mapped user            |
> |                  |        space to kernel, payload directly hooked       |
> |                  |        to a skb                                       |
> |                  |                                                       |
> |  (a likely       |                                                       |
> |   queue pair     |                                                       |
> |   instance)      |                                                       |
> |        |         |                                                       |
> |  NIC driver <-->  TUN/TAP driver                                         |
> |--------------------------------------------------------------------------|
>           |
>           |
>    traditional adapter or a tx/rx queue pair
> 
> The basic idea is to utilize the kernel Asynchronous I/O combined with
> Direct I/O to implement a copy-less TUN/TAP device. AIO and Direct I/O
> are not new to the kernel; we can still see them in the SCSI tape driver.
> 
> With traditional file operations, a copying of payload contents from/to
> the kernel DMA address to/from a user buffer is needed. That's the
> copying we want to save.
> 
> The proposed framework is like this:
> A TUN/TAP device is bound to a traditional NIC adapter or a tx/rx queue
> pair on the host side. The KVM virtio-net Backend service, the user space
> program, submits asynchronous read/write I/O requests to the host kernel
> through the TUN/TAP device. The requests correspond to the vqueue
> elements, including both transmission

Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-01 Thread Stephen Hemminger
On Tue, 1 Sep 2009 14:58:19 +0800
"Xin, Xiaohui" wrote:

>   [RFC] Virtual Machine Device Queues (VMDq) support on KVM
> 
> Network adapter with VMDq technology presents multiple pairs of tx/rx
> queues, and renders network L2 sorting mechanism based on MAC addresses
> and VLAN tags for each tx/rx queue pair. Here we present a generic
> framework, in which network traffic to/from a tx/rx queue pair can be
> directed from/to a KVM guest without any software copy.
> 
> Actually this framework can apply to traditional network adapters which
> have just one tx/rx queue pair. And applications using the same
> user/kernel interface can utilize this framework to send/receive network
> traffic directly thru a tx/rx queue pair in a network adapter.
> 
> We use virtio-net architecture to illustrate the framework.
> 
> 
> |--------------------|     pop               add_buf      |----------------|
> |    Qemu process    |  <---------    TX   <----------    |  Guest Kernel  |
> |                    |  --------->         ---------->    |                |
> |     Virtio-net     |     push              get_buf      |                |
> | (Backend service)  |  --------->    RX   ---------->    |   Virtio-net   |
> |                    |  <---------         <----------    |     driver     |
> |                    |     push              get_buf      |                |
> |--------------------|                                    |----------------|
>           |
>           |
>           | AIO (read & write) combined with Direct I/O
>           |   (which substitute synced file operations)
> |--------------------------------------------------------------------------|
> |   Host kernel    | read: copy-less with directly mapped user             |
> |                  |        space to kernel, payload directly DMAed        |
> |                  |        into user space                                |
> |                  | write: copy-less with directly mapped user            |
> |                  |        space to kernel, payload directly hooked       |
> |                  |        to a skb                                       |
> |                  |                                                       |
> |  (a likely       |                                                       |
> |   queue pair     |                                                       |
> |   instance)      |                                                       |
> |        |         |                                                       |
> |  NIC driver <-->  TUN/TAP driver                                         |
> |--------------------------------------------------------------------------|
>           |
>           |
>    traditional adapter or a tx/rx queue pair
> 
> The basic idea is to utilize the kernel Asynchronous I/O combined with
> Direct I/O to implement a copy-less TUN/TAP device. AIO and Direct I/O
> are not new to the kernel; we can still see them in the SCSI tape driver.
> 
> With traditional file operations, a copying of payload contents from/to
> the kernel DMA address to/from a user buffer is needed. That's the
> copying we want to save.
> 
> The proposed framework is like this:
> A TUN/TAP device is bound to a traditional NIC adapter or a tx/rx queue
> pair on the host side. The KVM virtio-net Backend service, the user space
> program, submits asynchronous read/write I/O requests to the host kernel
> through the TUN/TAP device. The requests correspond to the vqueue
> elements, including both transmission & receive. They can be queued in
> one AIO request and later, the completion will be notified through the
> underlying packets tx/rx processing of the rx/tx queue pair.
> 
> Detailed path:
> 
> For the guest Virtio-net driver, packet receive corresponds to
> asynchronous read I/O requests of the Backend service.
> 
> 1) Guest Virtio-net driver provides header and payload address through the
> receive vqueue to Virtio-net backend service.
> 
> 2) Virtio-net backend service encapsulates multiple vqueue elements into
> multiple AIO control blocks and composes them into one AIO read request.
> 
> 3) Virtio-net backend service uses io_submit() syscall to pass the request to
> the TUN/TAP device.
> 
> 4) Virtio-net backend service uses io_getevents() syscall to check the
> completion of the request.
> 
> 5) The TUN/TAP driver receives packets from the queue pair of NIC, and
> prepares for Direct I/O.
>    A modified NIC driver may render a skb whose header is allocated in
> host kernel, but whose payload buffer is directly mapped from user space
> buffers which are rendered through the AIO request by the Backend service.
> get_user_pages() may do this. For one AIO read request, the TUN/TAP driver
> maintains a list for the directly mapped buffers, and a NIC driver tries
> to get the buffers as payload buffer to compose the new skbs. Of course,
> if getting the buffers fails, then kernel allocated buffers are used.
> 
> 6) Modern NIC cards now most