On Thu, Dec 05, 2013 at 09:40:56AM +0100, Matthias Brugger wrote:
> 2013/11/11 Stefan Hajnoczi <stefa...@redhat.com>:
> > On Mon, Nov 11, 2013 at 11:00:45AM +0100, Matthias Brugger wrote:
> >> 2013/11/5 Stefan Hajnoczi <stefa...@gmail.com>:
> >> > I'd also like to see the thread pool implementation you wish to add
> >> > before we add a layer of indirection which has no users yet.
> >>
> >> Fair enough, I will evaluate whether it makes more sense to implement
> >> a new AIO infrastructure instead of trying to reuse the thread pool.
> >> My implementation will differ in that there will be several worker
> >> threads, each with its own queue, and requests will be distributed
> >> between them based on an identifier. The request function the worker
> >> threads call will be the same as with aio=threads, so I'm not quite
> >> sure which is the way to go. Any opinions and hints, like the one you
> >> gave, are highly appreciated.
> >
> > If I understand the slides you linked to correctly, the guest passes
> > an identifier with each request, and the host has worker threads so
> > that each stream of requests can be serviced independently. The idea
> > is to associate guest processes with unique identifiers.
> >
> > The guest I/O scheduler is supposed to submit requests in a way that
> > meets certain policies (e.g. fairness between processes, deadlines,
> > etc).
> >
> > Why is it necessary to push this task down into the host? I don't
> > understand the advantage of this approach, except that maybe it works
> > around certain misconfigurations, I/O scheduler quirks, or plain old
> > bugs - all of which should be investigated and fixed at the source
> > instead of adding another layer of code to mask them.
>
> It is about I/O scheduling. CFQ, the state-of-the-art I/O scheduler,
> merges adjacent requests from the same PID before dispatching them to
> the disk.
> If we can distinguish between the different threads of a virtual
> machine that read/write a file, the I/O scheduler in the host can
> merge requests effectively for sequential access. QEMU fails at this
> because of its architecture. Apart from the fact that there is
> currently no way to distinguish the guest threads from each other (I'm
> working on some kernel patches), QEMU has one big queue from which
> several worker threads grab requests and dispatch them to the disk.
> Even with one large read from just one thread in the guest, the I/O
> scheduler in the host sees requests from different PIDs (= worker
> threads) and cannot merge them.
From clone(2):

    If several threads are doing I/O on behalf of the same process
    (aio_read(3), for instance), they should employ CLONE_IO to get
    better I/O performance.

I think we should just make sure that worker threads share the same
io_context in QEMU. That way the host I/O scheduler will treat their
requests as one.

Can you benchmark CLONE_IO and see if it improves the situation?

> In former versions, there was some work done to merge requests in
> QEMU, but I don't think it was very useful, because you don't know
> what the layout of the image file looks like on the physical disk.
> Anyway, I think those parts of the code have been removed.

If you mean bdrv_aio_multiwrite(), it's still there and used by
virtio-blk emulation. It's only used for write requests, not reads.

Stefan