On Fri, Jun 27, 2014 at 8:01 PM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
> On Thu, Jun 26, 2014 at 11:14:16PM +0800, Ming Lei wrote:
>> Hi Stefan,
>>
>> I found that VM block I/O throughput is decreased by more than 40%
>> on my laptop, and it looks much worse in my server environment.
>> It is caused by your commit 580b6b2aa2:
>>
>>     dataplane: use the QEMU block layer for I/O
>>
>> I run fio with the config below to test random read:
>>
>> [global]
>> direct=1
>> size=4G
>> bsrange=4k-4k
>> timeout=20
>> numjobs=4
>> ioengine=libaio
>> iodepth=64
>> filename=/dev/vdc
>> group_reporting=1
>>
>> [f]
>> rw=randread
>>
>> Together with the throughput drop, the latency is improved a little.
>>
>> With this commit, the I/O batches submitted to the fs become much
>> smaller than before, and more io_submit() calls need to be made to
>> the kernel, which means the effective iodepth may become much lower.
>>
>> I am not surprised by the result, since I compared VM I/O performance
>> between qemu and lkvm before. lkvm has no big-lock problem and handles
>> I/O in a dedicated thread, but its block I/O is still much worse than
>> qemu's in terms of throughput, because, IMO, lkvm doesn't submit block
>> I/O in batches the way the previous dataplane code did.
>>
>> But now you have changed the way I/O is submitted, so could you share
>> the motivation for the change? Is the throughput drop expected?
>
> Thanks for reporting this. 40% is a serious regression.
>
> We were expecting a regression, since the custom Linux AIO codepath has
> been replaced with the QEMU block layer (which offers features like
> image formats, snapshots, and I/O throttling).
>
> Let me know if you get stuck working on a patch. Implementing batching
> sounds like a good idea. I never measured the impact when I wrote the
> ioq code; it just seemed like a natural way to structure the code.
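For anyone following along, the batching in question is the usual Linux
AIO pattern: queue a number of iocbs and hand them all to the kernel
with a single io_submit() instead of one syscall per request. A minimal
standalone sketch of that pattern (written for this mail, not the actual
ioq code; error handling omitted; build with -laio):

    #define _GNU_SOURCE             /* for O_DIRECT */
    #include <fcntl.h>
    #include <libaio.h>
    #include <stdlib.h>

    enum { BATCH = 64, BLK = 4096 };

    int main(void)
    {
        io_context_t ctx = 0;
        struct iocb iocbs[BATCH], *ptrs[BATCH];
        struct io_event events[BATCH];
        int fd, i;

        fd = open("/dev/vdc", O_RDONLY | O_DIRECT); /* device from the fio job */
        io_setup(BATCH, &ctx);

        for (i = 0; i < BATCH; i++) {
            void *buf;
            posix_memalign(&buf, BLK, BLK);         /* O_DIRECT needs alignment */
            io_prep_pread(&iocbs[i], fd, buf, BLK, (long long)i * BLK);
            ptrs[i] = &iocbs[i];                    /* queue it, don't submit yet */
        }

        /* One io_submit() carries all 64 requests and keeps the device
         * queue deep, instead of 64 separate syscalls with one request
         * each. */
        io_submit(ctx, BATCH, ptrs);
        io_getevents(ctx, BATCH, BATCH, events, NULL);
        return 0;
    }

With one request per io_submit(), the guest-visible iodepth collapses
toward 1 no matter what fio asks for, which is consistent with the
throughput drop above.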
I just implemented plug&unplug based batching (rough sketch in the P.S.
below), and it is working now, but throughput still shows no obvious
improvement. The load in the IOThread looks a bit low, so I am wondering
if there is a blocking point somewhere in the QEMU block layer.

> Hopefully this 40% number is purely due to batching and we can get most
> of the performance back.

I will double-check it, but based on my previous comparison between
lkvm and qemu, batching is the only difference.

Thanks,
--
Ming Lei
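P.S. The plug&unplug batching I am testing boils down to roughly the
following toy sketch. The names (ioq_plug, ioq_unplug, ioq_submit) are
made up for this mail; the real patch hooks the equivalent logic into
the QEMU block layer, so treat this as an illustration of the idea, not
the patch itself:

    #include <libaio.h>

    #define MAX_QUEUED 64

    struct ioq {
        io_context_t ctx;
        struct iocb *pending[MAX_QUEUED];
        int n_pending;
        int plugged;
    };

    void ioq_plug(struct ioq *q)
    {
        q->plugged = 1;                 /* defer submissions from now on */
    }

    int ioq_flush(struct ioq *q)
    {
        /* One syscall for everything queued so far */
        int ret = io_submit(q->ctx, q->n_pending, q->pending);
        q->n_pending = 0;
        return ret;
    }

    int ioq_submit(struct ioq *q, struct iocb *iocb)
    {
        q->pending[q->n_pending++] = iocb;
        if (!q->plugged || q->n_pending == MAX_QUEUED) {
            return ioq_flush(q);        /* unplugged or full: submit now */
        }
        return 0;                       /* plugged: just queue */
    }

    int ioq_unplug(struct ioq *q)
    {
        q->plugged = 0;
        return q->n_pending ? ioq_flush(q) : 0;
    }

The request loop then brackets vring processing with ioq_plug() and
ioq_unplug(), so every request queued in between goes to the kernel in
a single io_submit().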