On Tue, Mar 05, 2013 at 09:46:30PM +0100, Benoît Canet wrote:
> > You need to set a more specific goal. Some questions to get started:
> > * Which workloads do you care about and what are their
> >   characteristics (sequential or random I/O, queue depth)?
> > * Do you care about 1 vcpu guests or 4+ vcpu guests? (SMP scalability)
> > * Are you using an image format?
>
> The usage would be a typical HPC workload: 4 vcpu in SMP and random IO on raw
> devices.
Okay. If you want to do performance analysis on the existing stack, then the
next step is to choose a benchmark that represents this workload. Then you can
collect profiles and see where there is room for improvement.

In the virtio-blk data plane world, I'm currently converting core QEMU code to
support multiple AioContexts. This is needed in order to use BlockDriverStates
from threads without holding the global mutex. This is a pretty linear task.

The second half of this work is enabling device emulation code to run in
threads without the global mutex. The trickiest part here is probably guest
memory access: making guest memory access and DMA work safely from a thread.
I haven't really started on this. Ping Fan Liu has worked on it in the past;
you could chat with him to find out the current status.

Stefan