On Wed, Nov 13, 2019 at 12:38:48PM +0000, Denis Plotnikov wrote:
> 
> On 06.11.2019 15:03, Michael S. Tsirkin wrote:
> > On Thu, Oct 24, 2019 at 11:34:34AM +0000, Denis Lunev wrote:
> >> On 10/24/19 12:28 AM, Michael S. Tsirkin wrote:
> >>> On Fri, Oct 18, 2019 at 02:55:47PM +0300, Denis Plotnikov wrote:
> >>>> From: "Denis V. Lunev" <d...@openvz.org>
> >>>>
> >>>> Linux guests submit IO requests no longer than PAGE_SIZE * max_seg,
> >>>> the field reported by the SCSI controller. Thus a typical 1 MB
> >>>> sequential read results in the following IO pattern from the guest:
> >>>>     8,16   1    15754     2.766095122  2071  D   R 2095104 + 1008 [dd]
> >>>>     8,16   1    15755     2.766108785  2071  D   R 2096112 + 1008 [dd]
> >>>>     8,16   1    15756     2.766113486  2071  D   R 2097120 + 32 [dd]
> >>>>     8,16   1    15757     2.767668961     0  C   R 2095104 + 1008 [0]
> >>>>     8,16   1    15758     2.768534315     0  C   R 2096112 + 1008 [0]
> >>>>     8,16   1    15759     2.768539782     0  C   R 2097120 + 32 [0]
> >>>> The IO was generated by
> >>>>     dd if=/dev/sda of=/dev/null bs=1024 iflag=direct
> >>>>
> >>>> This effectively means that on rotational disks we will observe
> >>>> 3 IOPS for every 2 MB processed, which definitely hurts both guest
> >>>> and host IO performance.
> >>>>
> >>>> The cure is relatively simple - we should report the lengthy
> >>>> scatter-gather ability of the SCSI controller. Fortunately the
> >>>> situation here is very good: the VirtIO transport layer can
> >>>> accommodate 1024 items in one request, while we are using only 128.
> >>>> This has been the case since almost the very beginning. 2 items
> >>>> are dedicated to request metadata, thus we should publish
> >>>> VIRTQUEUE_MAX_SIZE - 2 as max_seg.
> >>>>
> >>>> The following pattern is observed after the patch:
> >>>>     8,16   1     9921     2.662721340  2063  D   R 2095104 + 1024 [dd]
> >>>>     8,16   1     9922     2.662737585  2063  D   R 2096128 + 1024 [dd]
> >>>>     8,16   1     9923     2.665188167     0  C   R 2095104 + 1024 [0]
> >>>>     8,16   1     9924     2.665198777     0  C   R 2096128 + 1024 [0]
> >>>> which is much better.
> >>>>
> >>>> The dark side of this patch is that we are tweaking a
> >>>> guest-visible parameter, though this should be relatively safe,
> >>>> as the transport layer support above has been present in
> >>>> QEMU/host Linux for a very long time. The patch adds a
> >>>> configurable property for VirtIO SCSI with a new default, and a
> >>>> hardcoded option for VirtBlock, which does not provide a good
> >>>> configurable framework.
> >>>>
> >>>> Unfortunately the commit can not be applied as is. For the real
> >>>> cure we need the guest to be fixed to accommodate that queue
> >>>> length, which is done only in the latest 4.14 kernel. Thus we are
> >>>> going to expose the property and tweak it on the machine type
> >>>> level.
> >>>>
> >>>> The problem with the old kernels is that they have the
> >>>> max_segments <= virtqueue_size restriction, which causes the
> >>>> guest to crash in the case of violation.
> >>> This isn't just in the guests: the virtio spec also seems to imply
> >>> this, or at least be vague on this point.
> >>>
> >>> So I think it'll need a feature bit.
> >>> Doing that in a safe way will also allow being compatible with old
> >>> guests.
> >>>
> >>> The only downside is that it's a bit more work, as we need to spec
> >>> this out and add guest support.
> >>>
> >>>> To fix the case described above in the old kernels we can
> >>>> increase virtqueue_size to 256 and max_segments to 254. The
> >>>> pitfall here is that seabios only allows virtqueue_size-s up to
> >>>> 128; however, the seabios patch extending that value to 256 is
> >>>> pending.
> >>> And the fix here is just to limit large vq sizes to virtio 1.0.
> >>> In that mode it's fine, I think:
> >>>
> >>>     /* check if the queue is available */
> >>>     if (vp->use_modern) {
> >>>         num = vp_read(&vp->common, virtio_pci_common_cfg, queue_size);
> >>>         if (num > MAX_QUEUE_NUM) {
> >>>             vp_write(&vp->common, virtio_pci_common_cfg, queue_size,
> >>>                      MAX_QUEUE_NUM);
> >>>             num = vp_read(&vp->common, virtio_pci_common_cfg, queue_size);
> >>>         }
> >>>     } else {
> >>>         num = vp_read(&vp->legacy, virtio_pci_legacy, queue_num);
> >>>     }
> >> Do you mean to put code like this into virtio_pci_realize() inside
> >> QEMU?
> >>
> >> If not, can you please clarify which component should be touched?
> >>
> >> Den
> > I mean:
> > - add an API to change the default queue size
> > - add a validate features callback; in there, check for the modern
> >   flag set in features and increase the queue size
> >
> > Maybe all this is too much work. We could block this for
> > transitional devices, but your patch does not do it: you need to
> > check that legacy is enabled, not that modern is not disabled.
> To develop the idea of how to adjust the queue size further, I'd like
> to summarize what we have:
>
> 1. A variety of guests without(?) queue size limitations, which can
>    support queue sizes up to MAX (1024)
>
> 2. seabios setups with two possible max queue size limitations: 128
>    and 256 (recently committed)
>
> 3. non-seabios setups with unknown max queue size limitations
>
> Taking into account that the queue size may be limited in the BIOS
> (EFI), to safely support queue sizes > 128 we need to distinguish
> those that can support greater-than-128 from those that can't.
> seabios potentially can't do it, so, as far as I understood, the idea
> is to start with queue size = 128 and then increase the queue size
> when the guest driver is engaged.
>
> To achieve that, we need to
>
> 1. understand which driver is currently working with a virtio device:
>    seabios, guest, other.
>    Things here are quite complex, since we can't modify any guest,
>    seabios or other drivers to explicitly tell that to the device.
Anyone negotiating VIRTIO_1.

> 2. be able to increase the queue size dynamically (re-create
>    queues?). At the moment this functionality is absent, at least in
>    qemu virtio-scsi. Is it possible by design?

Why not, it's just an array. This is what I meant when I said we need
an API to resize a queue.

> 3. choose a place for queue size extending (re-creation).
>    VirtioDeviceClass->reset?

Definitely not reset, that gets you back to the original state.

> I actually don't know how to do it reliably, so I would really
> appreciate some help or advice.

validate features sounds like a good place. This is why I wrote "add a
validate features callback".

> You've mentioned that old seabios won't use the modern interface, so
> would it be ok if we
>
> * define DEFAULT_QUEUE_SIZE = 128
> * leave queue creation as is at VirtioDeviceClass->realize()
>   with queue_size = conf.queue_size
> * on VirtioDeviceClass->reset() check whether the device is accessed
>   through the "legacy" interface; if so, then (in pseudocode)
>
>       if (current_queue_size > DEFAULT_QUEUE_SIZE) {
>           for (queue in all_queues) {
>               reduce_queue_size(queue, DEFAULT_QUEUE_SIZE); // recreate_queue()?
>           }
>       } else if (conf.queue_size > current_queue_size) {
>           for (queue in all_queues) {
>               increase_queue_size(queue, conf.queue_size);
>           }
>       }
>
> Might this approach work? Is it what you meant?
>
> Denis

I don't think you can do anything useful in reset. We need to check
features after they have been negotiated. So we'd start with a small
queue, min(DEFAULT_QUEUE_SIZE, current_queue_size)?, and if VIRTIO_1
is set, increase the size.

This is very compatible, but it is certainly ugly, as we are
second-guessing the user.

Simpler idea: add a new property that is simply unsupported with
legacy. E.g. "modern-queue-size"? If someone sets it, legacy must be
disabled, otherwise we fail. Way less compatible, but hey.