On Mon, Jan 11, 2010 at 09:13:23AM -0600, Anthony Liguori wrote:
> On 01/11/2010 08:46 AM, Avi Kivity wrote:
>> On 01/11/2010 04:37 PM, Anthony Liguori wrote:
>>>> That has the downside of bouncing a cache line on unrelated exits.
>>>
>>>
>>> The read and write sides of the ring are widely separated in physical
>>> memory specifically to avoid cache line bouncing.
>>
>> I meant, exits on random vcpus will cause the cacheline containing the
>> notification disable flag to bounce around. As it is, we read it on
>> the vcpu that owns the queue and write it on that vcpu or the I/O
>> thread.
>
> Bottom halves are always run from the IO thread.
>>>> It probably doesn't matter with qemu as it is now, since it will
>>>> bounce qemu_mutex, but it will hurt with large guests (especially
>>>> if they have many rings).
>>>>
>>>> IMO we should get things to work well without riding on unrelated
>>>> exits, especially as we're trying to reduce those exits.
>>>
>>> A block I/O request can potentially be very, very long lived. By
>>> serializing requests like this, there's a high likelihood that it's
>>> going to kill performance with anything capable of processing
>>> multiple requests.
>>
>> Right, that's why I suggested having a queue depth at which disabling
>> notification kicks in. The patch hardcodes this depth to 1, unpatched
>> qemu is infinite, a good value is probably spindle count + VAT.
>
> That means we would need a user-visible option, which is quite unfortunate.
>
> Also, that logic only really makes sense with cache=off. With
> cache=writethrough, you can get pathological cases where you have an
> uncached access followed by cached accesses. In fact, with read-ahead,
> this is probably not an uncommon scenario.
>
>>> OTOH, if we aggressively poll the ring when we have an opportunity
>>> to, there's very little downside to that and it addresses the
>>> serialization problem.
>>
>> But we can't guarantee that we'll get those opportunities, so it
>> doesn't address the problem in a general way. A guest that doesn't
>> use hpet and only has a single virtio-blk device will not have any
>> reason to exit to qemu.
>
> We can mitigate this with a timer but honestly, we need to do perf
> measurements to see. My feeling is that we will need some more
> aggressive form of polling than just waiting for IO completion. I don't
> think queue depth is enough because it assumes that all requests are
> equal. When dealing with cache=off or even just storage with its own
> cache, that's simply not the case.
>
> Regards,
>
> Anthony Liguori
>
BTW this is why vhost net uses queue depth in bytes.

--
MST
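
Purely as illustration of the byte-based threshold idea mentioned above, here is a rough sketch; the struct, the function names, and the 1 MiB cutoff are invented for this example and are not taken from vhost or from the patch under discussion:

/*
 * Hypothetical sketch (not actual QEMU/vhost code): decide whether to
 * suppress further guest notifications based on how much work is already
 * in flight, measured in bytes rather than in number of requests, so one
 * large request counts for more than many small ones.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative threshold: keep taking kicks until ~1 MiB is in flight. */
#define NOTIFY_DISABLE_BYTES (1024 * 1024)

struct ring_state {
    size_t bytes_in_flight;   /* payload bytes of submitted, uncompleted reqs */
    bool   notify_enabled;    /* whether we ask the guest to kick us */
};

/* Called when a request of req_bytes is popped from the ring. */
static void on_request_submitted(struct ring_state *rs, size_t req_bytes)
{
    rs->bytes_in_flight += req_bytes;
    if (rs->bytes_in_flight >= NOTIFY_DISABLE_BYTES) {
        /* Enough work queued: stop asking for kicks and rely on polling
         * the ring when completions come back. */
        rs->notify_enabled = false;
    }
}

/* Called when a request of req_bytes completes. */
static void on_request_completed(struct ring_state *rs, size_t req_bytes)
{
    rs->bytes_in_flight -= req_bytes;
    if (rs->bytes_in_flight < NOTIFY_DISABLE_BYTES && !rs->notify_enabled) {
        /* Below the threshold again: re-enable notifications so we do not
         * depend on unrelated exits to notice new requests. */
        rs->notify_enabled = true;
    }
}

int main(void)
{
    struct ring_state rs = { 0, true };

    on_request_submitted(&rs, 512);             /* small request: kicks stay on */
    printf("after 512 B request:  notify=%d\n", rs.notify_enabled);

    on_request_submitted(&rs, 4 * 1024 * 1024); /* one big request crosses it */
    printf("after 4 MiB request:  notify=%d\n", rs.notify_enabled);

    on_request_completed(&rs, 4 * 1024 * 1024);
    printf("after its completion: notify=%d\n", rs.notify_enabled);
    return 0;
}

The point is only that a single large request can cross the threshold on its own, while many small ones are needed to do the same, which matches the concern above that not all requests are equal.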