On Wed, Dec 05, 2012 at 09:46:59PM +0100, Stefan Hajnoczi wrote:
> This series adds the -device virtio-blk-pci,x-data-plane=on property that
> enables a high performance I/O codepath.  A dedicated thread is used to process
> virtio-blk requests outside the global mutex and without going through the QEMU
> block layer.
>
> Khoa Huynh <k...@us.ibm.com> reported an increase from 140,000 IOPS to 600,000
> IOPS for a single VM using virtio-blk-data-plane in July:
>
>   http://comments.gmane.org/gmane.comp.emulators.kvm.devel/94580
>
> The virtio-blk-data-plane approach was originally presented at Linux Plumbers
> Conference 2010.  The following slides contain a brief overview:
>
>   http://linuxplumbersconf.org/2010/ocw/system/presentations/651/original/Optimizing_the_QEMU_Storage_Stack.pdf
>
> The basic approach is:
> 1. Each virtio-blk device has a thread dedicated to handling ioeventfd
>    signalling when the guest kicks the virtqueue.
> 2. Requests are processed without going through the QEMU block layer using
>    Linux AIO directly.
> 3. Completion interrupts are injected via irqfd from the dedicated thread.
>
> To try it out:
>
>   qemu -drive if=none,id=drive0,cache=none,aio=native,format=raw,file=...
>        -device virtio-blk-pci,drive=drive0,scsi=off,x-data-plane=on
>
> Limitations:
>  * Only format=raw is supported
>  * Live migration is not supported
>  * Block jobs, hot unplug, and other operations fail with -EBUSY
>  * I/O throttling limits are ignored
>  * Only Linux hosts are supported due to Linux AIO usage
>
> The code has reached a stage where I feel it is ready to merge.  Users have
> been playing with it for some time and want the significant performance boost.
>
> We are refactoring QEMU to get rid of the global mutex.  I believe that
> virtio-blk-data-plane can eventually become the default mode of operation.
>
> Instead of waiting for global mutex removal efforts to finish, I want to use
> virtio-blk-data-plane as an example device for AioContext and threaded hw
> dispatch refactoring.  This means:
>
> 1. When the block layer can bind to an AioContext and execute I/O outside the
>    global mutex, virtio-blk-data-plane can use this (and gain image format
>    support).
>
> 2. When hw dispatch no longer needs the global mutex we can use hw/virtio.c
>    again and perhaps run a pool of iothreads instead of dedicated data plane
>    threads.
>
> But in the meantime, I have cleaned up the virtio-blk-data-plane code so that
> it can be merged as an experimental feature.
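
For readers who want a picture of the three-step approach in the quoted cover
letter (a dedicated thread woken by a kick, requests handled off the global
lock, completion signalled back), here is a rough toy model. It is only a
sketch and not the code in this series: the class name ToyDataPlane is made up,
plain pipes stand in for ioeventfd and irqfd, and upper-casing a string stands
in for the Linux AIO request.

```python
import os
import select
import threading
from collections import deque


class ToyDataPlane:
    """Toy model of one device served by its own request-processing thread.

    Hypothetical illustration only: pipes replace ioeventfd/irqfd and the
    "I/O" is a string transformation, not Linux AIO.
    """

    def __init__(self):
        # kick pipe: "guest" -> device thread (stand-in for ioeventfd)
        self.kick_r, self.kick_w = os.pipe()
        # irq pipe: device thread -> "guest" (stand-in for irqfd)
        self.irq_r, self.irq_w = os.pipe()
        self.lock = threading.Lock()   # protects only this device's queue
        self.pending = deque()         # stand-in for the virtqueue
        self.completed = []            # stand-in for the used ring
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            # Sleep until the guest kicks; no global lock is involved.
            select.select([self.kick_r], [], [])
            if os.read(self.kick_r, 1) == b"q":
                return  # shutdown token
            with self.lock:
                batch = list(self.pending)
                self.pending.clear()
            if not batch:
                continue  # an earlier kick already drained the queue
            # Pretend to perform the I/O for every request in the batch.
            self.completed.extend(req.upper() for req in batch)
            # Signal completion from the dedicated thread.
            os.write(self.irq_w, b"i")

    def submit(self, req):
        """Queue a request and kick the device thread."""
        with self.lock:
            self.pending.append(req)
        os.write(self.kick_w, b"k")

    def wait_for_completion(self):
        """Block until the device thread signals one completion event."""
        os.read(self.irq_r, 1)

    def stop(self):
        """Ask the device thread to exit and release the pipes."""
        os.write(self.kick_w, b"q")
        self.thread.join()
        for fd in (self.kick_r, self.kick_w, self.irq_r, self.irq_w):
            os.close(fd)
```

A caller would create a ToyDataPlane, call submit() with a request string,
call wait_for_completion(), and then read the result from the completed list.
The point of the model is only that the submitter and the device thread
communicate through two file descriptors and a per-device lock.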

I mostly looked at the virtio side of the patchset.
I don't see any bugs here. I sent some improvement suggestions
but we can do them in tree as well.

> v5:
>  * Omit memory regions with dirty logging enabled from hostmem [Michael]
>  * Add doc comment about quiescing requests across memory hot unplug [Michael]
>  * Clarify which Linux vhost version the vring code originates from [Michael]
>  * Break up indirect vring buffer into 1 hostmem_lookup() per descriptor [Michael]
>  * Barriers in hw/dataplane/vring.c to force fields to be loaded [Michael]
>  * split vring_set_notification() into enable/disable [Paolo]
>  * barriers in vring.c instead of virtio-blk.c [Michael]
>  * move setup code from hw/virtio-blk.c into hw/dataplane/virtio-blk.c [Michael]
>
>  * Note I did not get rid of the mutex+condvar approach to draining requests.
>    I've had good feedback on the performance of the patch series so I'm not
>    worried about eliminating the lock (it's very rarely contended).  Hope
>    Michael and Paolo are okay with this approach.
>
> v4:
>  * Add qemu_iovec_concat_iov() [Paolo]
>  * Use QEMUIOVector to copy out virtio_blk_inhdr [Michael, Paolo]
>
> v3:
>  * Don't assume iovec layout [Michael]
>  * Better naming for hostmem.c MemoryListener callbacks [Don]
>  * More vring quarantining if commands are bogus instead of exiting [Blue]
>
> v2:
>  * Use MemoryListener for thread-safe memory mapping [Paolo, Anthony, and everyone else pointed this out ;-)]
>  * Quarantine invalid vring instead of exiting [Blue]
>  * Replace __u16 kernel types with uint16_t [Blue]
>
> Changes from the RFC v9:
>  * Add x-data-plane=on|off option and coexist with regular virtio-blk code
>  * Create thread from BH so it inherits iothread cpusets
>  * Drain requests on vm_stop() so stopped guest does not access image file
>  * Add migration blocker
>  * Add bdrv_in_use() to prevent block jobs and other operations that can interfere
>  * Drop IOQueue request merging for simplicity
>  * Drop ioctl interrupt injection and always use irqfd for simplicity
>  * Major cleanup to split up source files
>  * Rebase from qemu-kvm.git onto qemu.git
>  * Address Michael Tsirkin's review comments
>
> Stefan Hajnoczi (11):
>   raw-posix: add raw_get_aio_fd() for virtio-blk-data-plane
>   configure: add CONFIG_VIRTIO_BLK_DATA_PLANE
>   dataplane: add host memory mapping code
>   dataplane: add virtqueue vring code
>   dataplane: add event loop
>   dataplane: add Linux AIO request queue
>   iov: add iov_discard() to remove data
>   test-iov: add iov_discard() testcase
>   iov: add qemu_iovec_concat_iov()
>   dataplane: add virtio-blk data plane code
>   virtio-blk: add x-data-plane=on|off performance feature
>
>  block.h                    |    9 +
>  block/raw-posix.c          |   34 ++++
>  configure                  |   21 ++
>  hw/Makefile.objs           |    2 +-
>  hw/dataplane/Makefile.objs |    3 +
>  hw/dataplane/event-poll.c  |  109 +++++++++++
>  hw/dataplane/event-poll.h  |   40 ++++
>  hw/dataplane/hostmem.c     |  173 +++++++++++++++++
>  hw/dataplane/hostmem.h     |   57 ++++++
>  hw/dataplane/ioq.c         |  118 ++++++++++++
>  hw/dataplane/ioq.h         |   57 ++++++
>  hw/dataplane/virtio-blk.c  |  463 +++++++++++++++++++++++++++++++++++++++++++++
>  hw/dataplane/virtio-blk.h  |   43 +++++
>  hw/dataplane/vring.c       |  361 +++++++++++++++++++++++++++++++++++
>  hw/dataplane/vring.h       |   63 ++++++
>  hw/virtio-blk.c            |   28 ++-
>  hw/virtio-blk.h            |    1 +
>  hw/virtio-pci.c            |    3 +
>  iov.c                      |   80 ++++++--
>  iov.h                      |   13 +
>  qemu-common.h              |    3 +
>  tests/test-iov.c           |  129 +++++++++++++
>  trace-events               |    9 +
>  23 files changed, 1805 insertions(+), 14 deletions(-)
>  create mode 100644 hw/dataplane/Makefile.objs
>  create mode 100644 hw/dataplane/event-poll.c
>  create mode 100644 hw/dataplane/event-poll.h
>  create mode 100644 hw/dataplane/hostmem.c
>  create mode 100644 hw/dataplane/hostmem.h
>  create mode 100644 hw/dataplane/ioq.c
>  create mode 100644 hw/dataplane/ioq.h
>  create mode 100644 hw/dataplane/virtio-blk.c
>  create mode 100644 hw/dataplane/virtio-blk.h
>  create mode 100644 hw/dataplane/vring.c
>  create mode 100644 hw/dataplane/vring.h
>
> --
> 1.8.0.1