On 10/24/19 12:28 AM, Michael S. Tsirkin wrote:
> On Fri, Oct 18, 2019 at 02:55:47PM +0300, Denis Plotnikov wrote:
>> From: "Denis V. Lunev" <d...@openvz.org>
>>
>> Linux guests submit IO requests no longer than PAGE_SIZE * max_seg
>> field reported by the SCSI controller. Thus a typical sequential read
>> with 1 MB size results in the following IO pattern from the guest:
>>
>> 8,16 1 15754 2.766095122 2071 D R 2095104 + 1008 [dd]
>> 8,16 1 15755 2.766108785 2071 D R 2096112 + 1008 [dd]
>> 8,16 1 15756 2.766113486 2071 D R 2097120 + 32 [dd]
>> 8,16 1 15757 2.767668961 0 C R 2095104 + 1008 [0]
>> 8,16 1 15758 2.768534315 0 C R 2096112 + 1008 [0]
>> 8,16 1 15759 2.768539782 0 C R 2097120 + 32 [0]
>>
>> The IO was generated by
>>
>> dd if=/dev/sda of=/dev/null bs=1024 iflag=direct
>>
>> This effectively means that on rotational disks we will observe 3 I/O
>> requests for each 1 MB processed. This definitely hurts both guest
>> and host IO performance.
>>
>> The cure is relatively simple: we should report the lengthy
>> scatter-gather ability of the SCSI controller. Fortunately the
>> situation here is very good. The VirtIO transport layer can
>> accommodate 1024 items in one request while we are using only 128,
>> and this has been the case since almost the very beginning. Two items
>> are dedicated to request metadata, so we should publish
>> VIRTQUEUE_MAX_SIZE - 2 as max_seg.
>>
>> The following pattern is observed after the patch:
>>
>> 8,16 1 9921 2.662721340 2063 D R 2095104 + 1024 [dd]
>> 8,16 1 9922 2.662737585 2063 D R 2096128 + 1024 [dd]
>> 8,16 1 9923 2.665188167 0 C R 2095104 + 1024 [0]
>> 8,16 1 9924 2.665198777 0 C R 2096128 + 1024 [0]
>>
>> which is much better.
>>
>> The dark side of this patch is that we are tweaking a guest-visible
>> parameter, though this should be relatively safe as the transport-layer
>> support has been present in QEMU/host Linux for a very long time.
>> The patch adds a configurable property for VirtIO SCSI with a new
>> default, and a hardcoded option for VirtBlock, which does not provide
>> a good configuration framework.
>>
>> Unfortunately the commit cannot be applied as is. For the real cure we
>> need the guest to be fixed to accommodate that queue length, which is
>> done only in the latest 4.14 kernel. Thus we are going to expose the
>> property and tweak it at the machine-type level.
>>
>> The problem with the old kernels is that they have a
>> max_segments <= virtqueue_size restriction, which causes the guest to
>> crash if it is violated.
>
> This isn't just in the guests: the virtio spec also seems to imply this,
> or at least be vague on this point.
>
> So I think it'll need a feature bit.
> Doing that in a safe way will also allow being compatible with old
> guests.
>
> The only downside is that it's a bit more work, as we need to
> spec this out and add guest support.
>
>> To fix the case described above on the old kernels we can increase
>> virtqueue_size to 256 and max_segments to 254. The pitfall here is
>> that seabios only allows virtqueue_size values below 128; however, the
>> seabios patch extending that value to 256 is pending.
>
> And the fix here is just to limit large vq sizes to virtio 1.0.
> In that mode it's fine, I think:
>
>     /* check if the queue is available */
>     if (vp->use_modern) {
>         num = vp_read(&vp->common, virtio_pci_common_cfg, queue_size);
>         if (num > MAX_QUEUE_NUM) {
>             vp_write(&vp->common, virtio_pci_common_cfg, queue_size,
>                      MAX_QUEUE_NUM);
>             num = vp_read(&vp->common, virtio_pci_common_cfg, queue_size);
>         }
>     } else {
>         num = vp_read(&vp->legacy, virtio_pci_legacy, queue_num);
>     }
Do you mean to put code like this into virtio_pci_realize() inside QEMU?
If not, could you please clarify which component should be touched?

Den