On Thu, Aug 27, 2015 at 5:48 PM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> On Mon, Aug 25, 2014 at 03:50:02PM -0600, Chris Friesen wrote:
>> On 08/23/2014 01:56 AM, Benoît Canet wrote:
>> >The Friday 22 Aug 2014 à 18:59:38 (-0600), Chris Friesen wrote :
>> >>On 07/21/2014 10:10 AM, Benoît Canet wrote:
>> >>>The Monday 21 Jul 2014 à 09:35:29 (-0600), Chris Friesen wrote :
>> >>>>On 07/21/2014 09:15 AM, Benoît Canet wrote:
>> >>>>>The Monday 21 Jul 2014 à 08:59:45 (-0600), Chris Friesen wrote :
>> >>>>>>On 07/19/2014 02:45 AM, Benoît Canet wrote:
>> >>>>>>
>> >>>>>>>I think in the throttling case the number of in-flight operations is
>> >>>>>>>limited by the emulated hardware queue. Otherwise requests would pile
>> >>>>>>>up and throttling would be ineffective.
>> >>>>>>>
>> >>>>>>>So this number should be around: #define VIRTIO_PCI_QUEUE_MAX 64 or
>> >>>>>>>something like that.
>> >>>>>>
>> >>>>>>Okay, that makes sense. Do you know how much data can be written as
>> >>>>>>part of a single operation? We're using 2MB hugepages for the guest
>> >>>>>>memory, and we saw the qemu RSS numbers jump from 25-30MB during
>> >>>>>>normal operation up to 120-180MB when running dbench. I'd like to
>> >>>>>>know what the worst-case would
>> >>>
>> >>>Sorry I didn't understand this part at first read.
>> >>>
>> >>>In the linux guest can you monitor:
>> >>>benoit@Laure:~$ cat /sys/class/block/xyz/inflight ?
>> >>>
>> >>>This would give us a fairly precise number of the requests actually in
>> >>>flight between the guest and qemu.
>> >>
>> >>
>> >>After a bit of a break I'm looking at this again.
>> >>
>> >
>> >Strange.
>> >
>> >I would use dd with the flag oflag=nocache to make sure the write request
>> >does not go into the guest cache though.
>> >
>> >Best regards
>> >
>> >Benoît
>> >
>> >>While doing "dd if=/dev/zero of=testfile bs=1M count=700" in the guest, I
>> >>got a max "inflight" value of 181. This seems quite a bit higher than
>> >>VIRTIO_PCI_QUEUE_MAX.
>> >>
>> >>I've seen throughput as high as ~210 MB/sec, which also kicked the RSS
>> >>numbers up above 200MB.
>> >>
>> >>I tried dropping VIRTIO_PCI_QUEUE_MAX down to 32 (it didn't seem to work
>> >>at all for values much less than that, though I didn't bother getting an
>> >>exact value) and it didn't really make any difference, I saw inflight
>> >>values as high as 177.
>>
>> I think I might have a glimmering of what's going on. Someone please
>> correct me if I get something wrong.
>>
>> I think that VIRTIO_PCI_QUEUE_MAX doesn't really mean anything with
>> respect to max inflight operations, and neither does virtio-blk calling
>> virtio_add_queue() with a queue size of 128.
>>
>> I think what's happening is that virtio_blk_handle_output() spins,
>> pulling data off the 128-entry queue and calling
>> virtio_blk_handle_request(). At that point the queue entry can be
>> reused, so the queue size isn't really relevant.
>
> The number of pending virtio-blk requests is finite. You missed the
> vring descriptor table where buffer descriptors live - that's what
> prevents the guest from issuing an infinite number of pending requests.
>
> You are correct that the host moves along the "avail" queue, but the
> actual buffer descriptors in the vring (struct vring_desc) stay put
> until request completion is processed by the guest driver from the
> "used" ring.
>
> Each virtio-blk request takes at least 2 vring descriptors (data buffer
> + request status byte). I think 3 is common in practice because drivers
> like to submit struct virtio_blk_outhdr in its own descriptor.
>
> So we have a limit of 128 / 2 = 64 I/O requests or 128 / 3 = 42 I/O
> requests.
>
> If you rerun the tests with the fio job file I posted, the results
> should show that only 64 or 42 requests are pending at any given time.
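To make the 64-vs-42 arithmetic above concrete, here is a small standalone C
sketch (not QEMU or guest-driver code; the header struct mirrors struct
virtio_blk_outhdr from linux/virtio_blk.h, and the queue size of 128 is the
virtio_add_queue() value mentioned in the quoted discussion):

/* Standalone illustration: why a 128-entry vring bounds the number of
 * in-flight virtio-blk requests at 64 (2 descriptors per request) or
 * 42 (3 descriptors per request). */
#include <stdio.h>
#include <stdint.h>

/* Request header, mirroring struct virtio_blk_outhdr in linux/virtio_blk.h
 * (16 bytes: request type, priority, starting sector). */
struct virtio_blk_outhdr {
    uint32_t type;
    uint32_t ioprio;
    uint64_t sector;
};

int main(void)
{
    const unsigned vring_entries = 128;  /* virtio-blk queue size in QEMU */

    /* Minimum layout: header shares a descriptor with the data buffer,
     * plus one device-writable descriptor for the status byte. */
    unsigned min_descs_per_req = 2;

    /* Common layout: header, data and status each get their own descriptor. */
    unsigned typical_descs_per_req = 3;

    printf("outhdr is %zu bytes\n", sizeof(struct virtio_blk_outhdr));
    printf("2 descs/request -> max %u in flight\n",
           vring_entries / min_descs_per_req);
    printf("3 descs/request -> max %u in flight\n",
           vring_entries / typical_descs_per_req);
    return 0;
}

Run as-is it prints 64 and 42, the two limits quoted above.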
By the way, there's one more case: indirect vring descriptors. If indirect
vring descriptors are used then each request takes just 1 descriptor in the
vring descriptor table. In that case up to 128 in-flight I/O requests are
supported.

Stefan
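For completeness, a minimal sketch of that indirect-descriptor case, assuming
the struct vring_desc fields and VRING_DESC_F_* flag values from the virtio
specification; the buffer addresses are invented and the indirect table is
just ordinary heap memory here, but it shows why the whole request costs only
one slot in the main table:

/* Illustration only, not QEMU or kernel code. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define VRING_DESC_F_NEXT     1   /* chained to the descriptor in 'next' */
#define VRING_DESC_F_WRITE    2   /* device writes into this buffer */
#define VRING_DESC_F_INDIRECT 4   /* 'addr' points at a descriptor table */

struct vring_desc {
    uint64_t addr;   /* guest-physical address of the buffer */
    uint32_t len;    /* buffer length in bytes */
    uint16_t flags;
    uint16_t next;   /* index of the next descriptor in the chain */
};

int main(void)
{
    /* Per-request indirect table: header, data, status = 3 descriptors,
     * allocated outside the main vring (addresses are made up). */
    struct vring_desc *indirect = calloc(3, sizeof(*indirect));
    indirect[0] = (struct vring_desc){ 0x1000, 16,   VRING_DESC_F_NEXT,  1 };
    indirect[1] = (struct vring_desc){ 0x2000, 4096, VRING_DESC_F_NEXT,  2 };
    indirect[2] = (struct vring_desc){ 0x3000, 1,    VRING_DESC_F_WRITE, 0 };

    /* The main table spends only ONE descriptor on the whole request:
     * it points at the indirect table instead of the buffers themselves. */
    struct vring_desc main_slot = {
        .addr  = (uint64_t)(uintptr_t)indirect,
        .len   = 3 * sizeof(struct vring_desc),
        .flags = VRING_DESC_F_INDIRECT,
        .next  = 0,
    };

    printf("main table slots used per request: 1 (flags=0x%x)\n",
           main_slot.flags);
    printf("so a 128-entry vring allows up to 128 in-flight requests\n");

    free(indirect);
    return 0;
}

The trade-off is an extra descriptor table per request, allocated by the
guest driver, in exchange for a vring whose size no longer caps the number of
in-flight requests at size/2 or size/3.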