On Mon, Aug 25, 2014 at 09:12:54AM -0600, Chris Friesen wrote:
> On 08/23/2014 01:56 AM, Benoît Canet wrote:
> >The Friday 22 Aug 2014 à 18:59:38 (-0600), Chris Friesen wrote :
> >>On 07/21/2014 10:10 AM, Benoît Canet wrote:
> >>>The Monday 21 Jul 2014 à 09:35:29 (-0600), Chris Friesen wrote :
> >>>>On 07/21/2014 09:15 AM, Benoît Canet wrote:
> >>>>>The Monday 21 Jul 2014 à 08:59:45 (-0600), Chris Friesen wrote :
> >>>>>>On 07/19/2014 02:45 AM, Benoît Canet wrote:
> >>>>>>
> >>>>>>>I think in the throttling case the number of in-flight operations is
> >>>>>>>limited by the emulated hardware queue. Otherwise requests would pile
> >>>>>>>up and throttling would be ineffective.
> >>>>>>>
> >>>>>>>So this number should be around #define VIRTIO_PCI_QUEUE_MAX 64, or
> >>>>>>>something like that.
> >>>>>>
> >>>>>>Okay, that makes sense. Do you know how much data can be written as
> >>>>>>part of a single operation? We're using 2MB hugepages for the guest
> >>>>>>memory, and we saw the qemu RSS numbers jump from 25-30MB during
> >>>>>>normal operation up to 120-180MB when running dbench. I'd like to
> >>>>>>know what the worst-case would
> >>>
> >>>Sorry, I didn't understand this part at first read.
> >>>
> >>>In the Linux guest can you monitor:
> >>>benoit@Laure:~$ cat /sys/class/block/xyz/inflight ?
> >>>
> >>>This would give us a fairly precise number of the requests actually in
> >>>flight between the guest and qemu.
> >>
> >>After a bit of a break I'm looking at this again.
> >
> >Strange.
> >
> >I would use dd with the flag oflag=nocache to make sure the write
> >request does not go into the guest cache though.
>
> I set up another test, checking the inflight value every second.
>
> Running just "dd if=/dev/zero of=testfile2 bs=1M count=700 oflag=nocache&"
> gave a bit over 100 inflight requests.
>
> If I simultaneously run "dd if=testfile of=/dev/null bs=1M count=700
> oflag=nocache&" then the number of inflight write requests peaks at 176.
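The per-second sampling loop described above can be sketched in a few lines of Python (a hypothetical helper, not part of the thread; the device name "vdb" and the monitoring interval are assumptions — the sysfs inflight file simply holds two integers, reads and writes currently in flight):

```python
import time

def sample_inflight(path="/sys/class/block/vdb/inflight"):
    """Return (reads, writes) currently in flight for a block device.

    The sysfs inflight file contains two integers: the number of read
    requests and the number of write requests issued to the device but
    not yet completed.
    """
    with open(path) as f:
        reads, writes = (int(x) for x in f.read().split())
    return reads, writes

def monitor(path="/sys/class/block/vdb/inflight", seconds=10):
    """Print the in-flight counts once per second, as in the test above."""
    for _ in range(seconds):
        r, w = sample_inflight(path)
        print(f"inflight reads={r} writes={w}")
        time.sleep(1)
```

Run monitor() inside the guest while the dd commands are active to see the peak values quoted above.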
Please use oflag=direct bs=4k, because oflag=nocache is not an accurate
measure of the number of I/O requests. It's also a good idea to reduce bs,
since the block queue limits in the Linux guest driver are around 128 KB,
so the driver cannot submit 1 MB in a single request.

oflag=nocache uses fadvise(POSIX_FADV_DONTNEED), as shown by strace(1):

read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
fadvise64(0, 2866688, 4096, POSIX_FADV_DONTNEED) = 0

That's not the same as bypassing the page cache! All I/O is going through
the page cache, but dd is telling the kernel it doesn't care about the
cached memory anymore *afterwards*. This means the kernel can still do
readahead/writebehind to some extent, so the inflight count does not
represent the true number of I/O operations to the disk!
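For illustration, here is a minimal Python sketch of what dd's oflag=nocache
does on Linux (the scratch file name is an assumption): an ordinary buffered
write, then posix_fadvise(POSIX_FADV_DONTNEED) to drop the cached pages
afterwards. The data still travels through the page cache.

```python
import os

# Sketch of dd's oflag=nocache behavior on Linux (hypothetical scratch
# file). The write is buffered and lands in the page cache; fadvise with
# POSIX_FADV_DONTNEED only asks the kernel to discard the cached pages
# after the fact -- it does not bypass the cache.
path = "nocache-demo.tmp"
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
try:
    n = os.write(fd, b"\0" * 4096)   # buffered: goes into the page cache
    os.fsync(fd)                     # force writeback so the pages are clean
    os.posix_fadvise(fd, 0, n, os.POSIX_FADV_DONTNEED)
    print("wrote", n, "bytes via the page cache, then dropped them")
finally:
    os.close(fd)
    os.unlink(path)
```

By contrast, oflag=direct opens the file with O_DIRECT, so each dd write
becomes a real I/O request to the device (subject to buffer and block-size
alignment restrictions).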