On Tue, Jun 08, 2021 at 10:52:05AM +0800, zhenwei pi wrote:
> On 6/7/21 11:08 PM, Stefan Hajnoczi wrote:
> > On Mon, Jun 07, 2021 at 09:32:52PM +0800, zhenwei pi wrote:
> > > Since 2020, I started to develop a userspace NVMF initiator library:
> > > https://github.com/bytedance/libnvmf
> > > and released v0.1 recently.
> > >
> > > Also developed a block driver for the QEMU side:
> > > https://github.com/pizhenwei/qemu/tree/block-nvmf
> > >
> > > Testing with the Linux kernel NVMF target (TCP), QEMU gets about
> > > 220K IOPS, which seems good.
> >
> > How does the performance compare to the Linux kernel NVMe-oF initiator?
> >
> > In case you're interested, some Red Hat developers have started working
> > on a new library called libblkio. For now it supports io_uring, but PCI
> > NVMe and virtio-blk are on the roadmap. The library supports blocking,
> > event-driven, and polling modes. There isn't a direct overlap with
> > libnvmf, but maybe they can learn from each other.
> > https://gitlab.com/libblkio/libblkio/-/blob/main/docs/blkio.rst
> >
> > Stefan
>
> I'm sorry I did not give enough information about the QEMU block nvmf
> driver and libnvmf.
>
> Kernel initiator & userspace initiator
> Rather than an io_uring/libaio + kernel initiator solution (read 500K+
> IOPS & write 200K+ IOPS), I prefer QEMU block nvmf + libnvmf (RW 200K+
> IOPS):
> 1. I don't have to upgrade the host kernel; it also runs on older kernel
>    versions.
> 2. During re-connection, if the target side hits a panic, the initiator
>    side does not end up in the 'D' (uninterruptible) state in the kernel,
>    so QEMU can always be killed.
> 3. It's easier to troubleshoot a userspace application.
I see, thanks for sharing.

> Default NVMe-oF IO queues
> The mechanism of QEMU + libnvmf:
> 1. A QEMU iothread creates a request and dispatches it to an NVMe-oF IO
>    queue thread via a lockless list.
> 2. The QEMU iothread tries to kick the NVMe-oF IO queue thread.
> 3. The NVMe-oF IO queue thread processes the request and returns the
>    response to the QEMU iothread.
>
> When the QEMU iothread reaches its limit, 4 NVMe-oF IO queues get better
> performance.

Can you explain this bottleneck? Even with 4 NVMe-oF IO queues there is
still just 1 IOThread submitting requests, so why are 4 IO queues faster
than 1?
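For reference, here is a minimal sketch of the handoff as I understand it
from your description: one submitting thread pushes requests onto a
lockless list and then kicks the IO queue thread. The eventfd-based kick
and all the names below are just my assumptions for illustration, not
taken from libnvmf or the block-nvmf driver.

/*
 * Rough sketch of the dispatch path described above: a submitting thread
 * pushes requests onto a lockless singly linked list and wakes the IO
 * queue thread.  Illustrative names only, not libnvmf/block-nvmf code.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <unistd.h>

struct req {
    struct req *next;
    int id;                                /* stand-in for the NVMe command */
};

static _Atomic(struct req *) pending;      /* lockless list head */
static int kick_fd;                        /* wakes the IO queue thread */

/* Submitter side (the QEMU iothread in your description). */
static void submit(struct req *r)
{
    struct req *old = atomic_load(&pending);
    do {
        r->next = old;
    } while (!atomic_compare_exchange_weak(&pending, &old, r));

    uint64_t one = 1;
    (void)write(kick_fd, &one, sizeof(one));        /* the "kick" */
}

/* IO queue thread: wait for a kick, then take the whole list at once. */
static void *io_queue_thread(void *arg)
{
    (void)arg;
    for (;;) {
        uint64_t cnt;
        if (read(kick_fd, &cnt, sizeof(cnt)) != sizeof(cnt)) {
            break;
        }

        /* Requests come back in LIFO order; real code would reorder. */
        struct req *list = atomic_exchange(&pending, (struct req *)NULL);
        while (list) {
            struct req *r = list;
            list = list->next;
            printf("processing request %d\n", r->id);  /* NVMe-oF submission goes here */
            free(r);
        }
    }
    return NULL;
}

int main(void)
{
    kick_fd = eventfd(0, 0);

    pthread_t tid;
    pthread_create(&tid, NULL, io_queue_thread, NULL);

    for (int i = 0; i < 4; i++) {
        struct req *r = calloc(1, sizeof(*r));
        r->id = i;
        submit(r);
    }

    sleep(1);       /* let the queue thread drain before exiting */
    return 0;
}

In that picture the queue thread drains everything that was queued on each
kick, which is why I'd expect the single submitting IOThread rather than
the number of IO queues to be the limiting factor.

Stefan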