Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-22 Thread Michael S. Tsirkin
On Mon, Sep 21, 2009 at 09:27:18AM -0700, Chris Wright wrote:
 * Stephen Hemminger (shemmin...@vyatta.com) wrote:
  On Mon, 21 Sep 2009 16:37:22 +0930
  Rusty Russell ru...@rustcorp.com.au wrote:
  
 Actually this framework can apply to traditional network adapters which have
 just one tx/rx queue pair. And applications using the same user/kernel
 interface can utilize this framework to send/receive network traffic
 directly thru a tx/rx queue pair in a network adapter.
 
  
  More importantly, when virtualization is used with multi-queue
  NICs the virtio-net NIC is a single CPU bottleneck. The virtio-net
  NIC should preserve the parallelism (lock free) using multiple
  receive/transmit queues. The number of queues should equal the
  number of CPUs.
 
 Yup, multiqueue virtio is on todo list ;-)
 
 thanks,
 -chris

Note we'll need multiqueue tap for that to help.

-- 
MST


Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-22 Thread Arnd Bergmann
On Tuesday 22 September 2009, Michael S. Tsirkin wrote:
   More importantly, when virtualization is used with multi-queue
   NICs the virtio-net NIC is a single CPU bottleneck. The virtio-net
   NIC should preserve the parallelism (lock free) using multiple
   receive/transmit queues. The number of queues should equal the
   number of CPUs.
  
  Yup, multiqueue virtio is on todo list ;-)
  
 
 Note we'll need multiqueue tap for that to help.

My idea for that was to open multiple file descriptors to the same
macvtap device and let the kernel figure out the right thing to
do with that. You can do the same with raw packet sockets in case
of vhost_net, but I wouldn't want to add more complexity to the
tun/tap driver for this.
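
For illustration only, a minimal user-space sketch of that idea. The
/dev/tap5 path is a placeholder (the node is named after the macvtap
ifindex), and the per-fd queue semantics are hypothetical -- they are the
proposal above, not current behaviour:

/* Hypothetical: open the same macvtap chardev once per guest queue and
 * let the kernel spread flows across the descriptors.  Today each
 * open() just sees the same device.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NQUEUES 4

int main(void)
{
	int fd[NQUEUES];
	char buf[2048];
	ssize_t n;
	int i;

	for (i = 0; i < NQUEUES; i++) {
		fd[i] = open("/dev/tap5", O_RDWR);	/* placeholder node */
		if (fd[i] < 0) {
			perror("open /dev/tap5");
			exit(1);
		}
	}

	/* Each fd would back one virtio-net queue; read one frame from
	 * the first descriptor as a smoke test. */
	n = read(fd[0], buf, sizeof(buf));
	printf("queue 0: read %zd bytes\n", n);

	for (i = 0; i < NQUEUES; i++)
		close(fd[i]);
	return 0;
}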

Arnd 


Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-22 Thread Stephen Hemminger
On Tue, 22 Sep 2009 13:50:54 +0200
Arnd Bergmann a...@arndb.de wrote:

 On Tuesday 22 September 2009, Michael S. Tsirkin wrote:
 More importantly, when virtualization is used with multi-queue
 NICs the virtio-net NIC is a single CPU bottleneck. The virtio-net
 NIC should preserve the parallelism (lock free) using multiple
 receive/transmit queues. The number of queues should equal the
 number of CPUs.
   
   Yup, multiqueue virtio is on todo list ;-)
   
  
  Note we'll need multiqueue tap for that to help.
 
 My idea for that was to open multiple file descriptors to the same
 macvtap device and let the kernel figure out the right thing to
 do with that. You can do the same with raw packet sockets in case
 of vhost_net, but I wouldn't want to add more complexity to the
 tun/tap driver for this.
 
   Arnd 


Or get tap out of the way entirely. The packets should not have
to go out to user space at all (see veth).


Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-22 Thread Arnd Bergmann
On Tuesday 22 September 2009, Stephen Hemminger wrote:
  My idea for that was to open multiple file descriptors to the same
  macvtap device and let the kernel figure out the right thing to
  do with that. You can do the same with raw packet sockets in case
  of vhost_net, but I wouldn't want to add more complexity to the
  tun/tap driver for this.
  
 Or get tap out of the way entirely. The packets should not have
 to go out to user space at all (see veth).

How does veth relate to that? Do you mean vhost_net? With vhost_net,
you could still open multiple sockets, only the access is in the kernel.
Obviously, once it is all in the kernel, that could be done under the
covers, but I think it would be cleaner to treat vhost_net purely as
a way to bypass the syscalls for user space, with as little visible
impact otherwise as possible.
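
As a rough sketch of the raw-socket variant mentioned above: one AF_PACKET
socket per queue, all bound to the same interface. The interface name eth0
is a placeholder, and the handoff of each fd to vhost_net is omitted since
that ioctl interface was still in flux at the time:

/* Sketch: create one raw packet socket per queue, bound to the same
 * interface.  Each descriptor would then be handed to the in-kernel
 * vhost_net backend (or polled by a userspace backend); that part is
 * left out here.
 */
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

#define NQUEUES 4

int main(void)
{
	struct sockaddr_ll sll;
	int fd[NQUEUES];
	int i;

	memset(&sll, 0, sizeof(sll));
	sll.sll_family   = AF_PACKET;
	sll.sll_protocol = htons(ETH_P_ALL);
	sll.sll_ifindex  = if_nametoindex("eth0");	/* placeholder NIC */

	for (i = 0; i < NQUEUES; i++) {
		fd[i] = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
		if (fd[i] < 0 ||
		    bind(fd[i], (struct sockaddr *)&sll, sizeof(sll)) < 0) {
			perror("packet socket");
			exit(1);
		}
	}

	printf("opened %d packet sockets bound to eth0\n", NQUEUES);
	return 0;
}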

Arnd 


Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-21 Thread Stephen Hemminger
On Mon, 21 Sep 2009 16:37:22 +0930
Rusty Russell ru...@rustcorp.com.au wrote:

   Actually this framework can apply to traditional network adapters which
   have just one tx/rx queue pair. And applications using the same
   user/kernel interface can utilize this framework to send/receive network
   traffic directly thru a tx/rx queue pair in a network adapter.
   

More importantly, when virtualization is used with multi-queue NICs the
virtio-net NIC is a single CPU bottleneck. The virtio-net NIC should
preserve the parallelism (lock free) using multiple receive/transmit
queues. The number of queues should equal the number of CPUs.


Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-21 Thread Chris Wright
* Stephen Hemminger (shemmin...@vyatta.com) wrote:
 On Mon, 21 Sep 2009 16:37:22 +0930
 Rusty Russell ru...@rustcorp.com.au wrote:
 
 Actually this framework can apply to traditional network adapters which have
 just one tx/rx queue pair. And applications using the same user/kernel
 interface can utilize this framework to send/receive network traffic
 directly thru a tx/rx queue pair in a network adapter.

 
 More importantly, when virtualization is used with multi-queue NICs the
 virtio-net NIC is a single CPU bottleneck. The virtio-net NIC should
 preserve the parallelism (lock free) using multiple receive/transmit
 queues. The number of queues should equal the number of CPUs.

Yup, multiqueue virtio is on todo list ;-)

thanks,
-chris


RE: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-02 Thread Xin, Xiaohui
* Code is easier to review than bullet points.

Yes. We will send the code soon.

* Direct I/O has to be safe when a page is shared by multiple threads,
 and has to be non-blocking since network I/O can take indeterminately
 long (think big queues, tunneling, ...)

In this situation, one NIC queue pair is assigned to only one guest; the
pages are locked and the KVM guest memory will not be swapped out.
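
For context, a rough kernel-side sketch of the pinning that keeps such
pages resident, using get_user_pages_fast() with its signature from the
2.6.x kernels this discussion targets; the helper name pin_user_buffer and
the simplified error handling are ours, not from the patch set:

/* Pin the user/guest payload buffer so the NIC can DMA into it without
 * the pages being swapped out.  Simplified: real code must cope with
 * partial pins and release every page with put_page() after the DMA
 * completes.
 */
#include <linux/errno.h>
#include <linux/mm.h>

static int pin_user_buffer(unsigned long uaddr, size_t len,
			   struct page **pages, int max_pages)
{
	unsigned long first = uaddr >> PAGE_SHIFT;
	unsigned long last = (uaddr + len - 1) >> PAGE_SHIFT;
	int nr_pages = last - first + 1;

	if (len == 0 || nr_pages > max_pages)
		return -EINVAL;

	/* write=1: received payload will be written into these pages */
	return get_user_pages_fast(uaddr, nr_pages, 1, pages);
}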


* In the past, attempts at Direct I/O on the network path have always had
 SMP TLB issues. The page has to be flipped or marked as COW on all CPUs,
 and the cost of the Inter-Processor Interrupt to steal the page has been
 slower than copying.

It may be; we have not thought about this further. Thanks.

Thanks
Xiaohui

-Original Message-
From: Stephen Hemminger [mailto:shemmin...@vyatta.com] 
Sent: Wednesday, September 02, 2009 12:05 AM
To: Xin, Xiaohui
Cc: m...@redhat.com; net...@vger.kernel.org; 
virtualization@lists.linux-foundation.org; k...@vger.kernel.org; 
linux-ker...@vger.kernel.org; mi...@elte.hu; linux...@kvack.org; 
a...@linux-foundation.org; h...@zytor.com; gregory.hask...@gmail.com
Subject: Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

On Tue, 1 Sep 2009 14:58:19 +0800
Xin, Xiaohui xiaohui@intel.com wrote:

   [RFC] Virtual Machine Device Queues (VMDq) support on KVM
 
 Network adapter with VMDq technology presents multiple pairs of tx/rx queues,
 and renders network L2 sorting mechanism based on MAC addresses and VLAN tags
 for each tx/rx queue pair. Here we present a generic framework, in which
 network traffic to/from a tx/rx queue pair can be directed from/to a KVM
 guest without any software copy.
 
 Actually this framework can apply to traditional network adapters which have
 just one tx/rx queue pair. And applications using the same user/kernel
 interface can utilize this framework to send/receive network traffic
 directly thru a tx/rx queue pair in a network adapter.
 
 We use virtio-net architecture to illustrate the framework.
 
 
 |---------------------|   pop     add_buf |---------------------|
 |    Qemu process     | <--TX---  <------ |    Guest Kernel     |
 |                     |                   |                     |
 |    Virtio-net       |   push    get_buf |                     |
 | (Backend service)   | ---RX-->  ------> |    Virtio-net       |
 |                     |   push    get_buf |      driver         |
 |---------------------|                   |---------------------|
            |
            |  AIO (read & write) combined with Direct I/O
            |    (which substitutes for synced file operations)
            |
 |-------------------------------------------------------------|
 | Host kernel  | read:  copy-less with directly mapped user    |
 |              |        space to kernel, payload directly      |
 |              |        DMAed into user space                  |
 |              | write: copy-less with directly mapped user    |
 |              |        space to kernel, payload directly      |
 |              |        hooked to a skb                        |
 |              |                                               |
 | (a likely    |                                               |
 |  queue pair  |                                               |
 |  instance)   |                                               |
 |              |                                               |
 |  NIC driver <------------------------->  TUN/TAP driver      |
 |-------------------------------------------------------------|
            |
            |
   traditional adapter or a tx/rx queue pair
 
 The basic idea is to utilize kernel Asynchronous I/O combined with Direct
 I/O to implement a copy-less TUN/TAP device. AIO and Direct I/O are not new
 to the kernel; we can still see them in the SCSI tape driver.

 With traditional file operations, a copy of the payload contents between the
 kernel DMA address and a user buffer is needed. That copy is what we want to
 save.
 
 The proposed framework is like this:
 A TUN/TAP device is bound to a traditional NIC adapter or a tx/rx queue pair
 on the host side. The KVM virtio-net backend service, a user space program,
 submits asynchronous read/write I/O requests to the host kernel through the
 TUN/TAP device. The requests correspond to the vqueue elements and include
 both transmit and receive. They can be queued in one AIO request, and the
 completion will later be notified through the underlying tx/rx packet
 processing of the rx/tx queue pair.
 
 Detailed path:
 
 To the guest Virtio-net driver, packet receive corresponds to the
 asynchronous read I/O requests of the backend service.
 
 1) Guest Virtio-net driver provides header

Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-01 Thread Stephen Hemminger
On Tue, 1 Sep 2009 14:58:19 +0800
Xin, Xiaohui xiaohui@intel.com wrote:

   [RFC] Virtual Machine Device Queues (VMDq) support on KVM
 
 Network adapter with VMDq technology presents multiple pairs of tx/rx queues,
 and renders network L2 sorting mechanism based on MAC addresses and VLAN tags
 for each tx/rx queue pair. Here we present a generic framework, in which
 network traffic to/from a tx/rx queue pair can be directed from/to a KVM
 guest without any software copy.
 
 Actually this framework can apply to traditional network adapters which have
 just one tx/rx queue pair. And applications using the same user/kernel
 interface can utilize this framework to send/receive network traffic
 directly thru a tx/rx queue pair in a network adapter.
 
 We use virtio-net architecture to illustrate the framework.
 
 
 |---------------------|   pop     add_buf |---------------------|
 |    Qemu process     | <--TX---  <------ |    Guest Kernel     |
 |                     |                   |                     |
 |    Virtio-net       |   push    get_buf |                     |
 | (Backend service)   | ---RX-->  ------> |    Virtio-net       |
 |                     |   push    get_buf |      driver         |
 |---------------------|                   |---------------------|
            |
            |  AIO (read & write) combined with Direct I/O
            |    (which substitutes for synced file operations)
            |
 |-------------------------------------------------------------|
 | Host kernel  | read:  copy-less with directly mapped user    |
 |              |        space to kernel, payload directly      |
 |              |        DMAed into user space                  |
 |              | write: copy-less with directly mapped user    |
 |              |        space to kernel, payload directly      |
 |              |        hooked to a skb                        |
 |              |                                               |
 | (a likely    |                                               |
 |  queue pair  |                                               |
 |  instance)   |                                               |
 |              |                                               |
 |  NIC driver <------------------------->  TUN/TAP driver      |
 |-------------------------------------------------------------|
            |
            |
   traditional adapter or a tx/rx queue pair
 
 The basic idea is to utilize kernel Asynchronous I/O combined with Direct
 I/O to implement a copy-less TUN/TAP device. AIO and Direct I/O are not new
 to the kernel; we can still see them in the SCSI tape driver.

 With traditional file operations, a copy of the payload contents between the
 kernel DMA address and a user buffer is needed. That copy is what we want to
 save.
 
 The proposed framework is like this:
 A TUN/TAP device is bound to a traditional NIC adapter or a tx/rx queue pair
 on the host side. The KVM virtio-net backend service, a user space program,
 submits asynchronous read/write I/O requests to the host kernel through the
 TUN/TAP device. The requests correspond to the vqueue elements and include
 both transmit and receive. They can be queued in one AIO request, and the
 completion will later be notified through the underlying tx/rx packet
 processing of the rx/tx queue pair.
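
For illustration of the submission flow (and of steps 2) to 4) below), a
minimal libaio sketch of a backend batching several receive buffers into one
request against the TUN/TAP descriptor; the fd, buffer count and sizes are
placeholders, and the copy-less behaviour of course depends on the driver
changes proposed here:

/* Userspace backend sketch: one iocb per vqueue element, submitted as a
 * single batch with io_submit() and reaped with io_getevents().
 * Build with -laio; tap_fd is assumed to be the proposed AIO-capable
 * TUN/TAP device.
 */
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_BUFS  8
#define BUF_SIZE 2048

int submit_rx(int tap_fd)
{
	struct iocb iocbs[NR_BUFS], *iocbp[NR_BUFS];
	struct io_event events[NR_BUFS];
	io_context_t ctx = 0;
	void *buf[NR_BUFS];
	int i, n;

	if (io_setup(NR_BUFS, &ctx) < 0)
		return -1;

	for (i = 0; i < NR_BUFS; i++) {
		buf[i] = malloc(BUF_SIZE);	/* stands in for a vqueue buffer */
		io_prep_pread(&iocbs[i], tap_fd, buf[i], BUF_SIZE, 0);
		iocbp[i] = &iocbs[i];
	}

	/* Step 3: one syscall carries all the per-buffer control blocks. */
	if (io_submit(ctx, NR_BUFS, iocbp) < 0)
		return -1;

	/* Step 4: completions are reported as packets land in the buffers. */
	n = io_getevents(ctx, 1, NR_BUFS, events, NULL);
	printf("%d receive buffers completed\n", n);

	for (i = 0; i < NR_BUFS; i++)
		free(buf[i]);
	io_destroy(ctx);
	return 0;
}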
 
 Detailed path:
 
 To the guest Virtio-net driver, packet receive corresponds to the
 asynchronous read I/O requests of the backend service.
 
 1) Guest Virtio-net driver provides header and payload addresses through the
 receive vqueue to the Virtio-net backend service.
 
 2) Virtio-net backend service encapsulates multiple vqueue elements into
 multiple AIO control blocks and composes them into one AIO read request.
 
 3) Virtio-net backend service uses io_submit() syscall to pass the request to
 the TUN/TAP device.
 
 4) Virtio-net backend service uses io_getevents() syscall to check the
 completion of the request.
 
 5) The TUN/TAP driver receives packets from the queue pair of the NIC and
 prepares for Direct I/O.
 A modified NIC driver may build an skb whose header is allocated in the host
 kernel, but whose payload buffer is directly mapped from the user space
 buffers provided through the AIO request by the backend service;
 get_user_pages() may do this. For one AIO read request, the TUN/TAP driver
 maintains a list of the directly mapped buffers, and the NIC driver tries to
 use those buffers as payload buffers when composing new skbs. Of course, if
 getting the buffers fails, kernel-allocated buffers are used instead.
 
 6) Modern NIC cards now mostly have the header split feature. The NIC queue
 pair then may directly DMA the payload