On 05.12.2019 10:59, Denis Plotnikov wrote:
> Ping!
>
> On 25.11.2019 12:16, Denis Plotnikov wrote:
>>
>> On 06.11.2019 15:03, Michael S. Tsirkin wrote:
>>> On Thu, Oct 24, 2019 at 11:34:34AM +0000, Denis Lunev wrote:
>>>> On 10/24/19 12:28 AM, Michael S. Tsirkin wrote:
>>>>> On Fri, Oct 18, 2019 at 02:55:47PM +0300, Denis Plotnikov wrote:
>>>>>> From: "Denis V. Lunev" <d...@openvz.org>
>>>>>>
>>>>>> Linux guests submit IO requests no longer than PAGE_SIZE * max_seg
>>>>>> field reported by the SCSI controller. Thus a typical 1 MB sequential
>>>>>> read results in the following pattern of IO from the guest:
>>>>>>
>>>>>>   8,16   1  15754   2.766095122  2071  D  R  2095104 + 1008 [dd]
>>>>>>   8,16   1  15755   2.766108785  2071  D  R  2096112 + 1008 [dd]
>>>>>>   8,16   1  15756   2.766113486  2071  D  R  2097120 + 32 [dd]
>>>>>>   8,16   1  15757   2.767668961     0  C  R  2095104 + 1008 [0]
>>>>>>   8,16   1  15758   2.768534315     0  C  R  2096112 + 1008 [0]
>>>>>>   8,16   1  15759   2.768539782     0  C  R  2097120 + 32 [0]
>>>>>>
>>>>>> The IO was generated by
>>>>>>
>>>>>>   dd if=/dev/sda of=/dev/null bs=1024 iflag=direct
>>>>>>
>>>>>> This effectively means that on rotational disks we will observe
>>>>>> 3 IOPS for each 2 MB processed. This definitely affects both guest
>>>>>> and host IO performance negatively.
>>>>>>
>>>>>> The cure is relatively simple - we should report the long
>>>>>> scatter-gather ability of the SCSI controller. Fortunately the
>>>>>> situation here is very good: the VirtIO transport layer can
>>>>>> accommodate 1024 items in one request, while we are using only 128.
>>>>>> This has been the case since almost the very beginning. 2 items are
>>>>>> dedicated to request metadata, thus we should publish
>>>>>> VIRTQUEUE_MAX_SIZE - 2 as max_seg.
>>>>>>
>>>>>> The following pattern is observed after the patch:
>>>>>>
>>>>>>   8,16   1   9921   2.662721340  2063  D  R  2095104 + 1024 [dd]
>>>>>>   8,16   1   9922   2.662737585  2063  D  R  2096128 + 1024 [dd]
>>>>>>   8,16   1   9923   2.665188167     0  C  R  2095104 + 1024 [0]
>>>>>>   8,16   1   9924   2.665198777     0  C  R  2096128 + 1024 [0]
>>>>>>
>>>>>> which is much better.
>>>>>>
>>>>>> The dark side of this patch is that we are tweaking a guest-visible
>>>>>> parameter, though this should be relatively safe as the transport
>>>>>> layer support mentioned above has been present in QEMU/host Linux
>>>>>> for a very long time. The patch adds a configurable property for
>>>>>> VirtIO SCSI with a new default, and a hardcoded option for VirtBlock,
>>>>>> which does not provide a good configuration framework.
>>>>>>
>>>>>> Unfortunately the commit cannot be applied as is. For the real cure
>>>>>> we need the guest to be fixed to accommodate that queue length,
>>>>>> which is done only in the latest 4.14 kernel. Thus we are going to
>>>>>> expose the property and tweak it at the machine type level.
>>>>>>
>>>>>> The problem with the old kernels is that they have the
>>>>>> max_segments <= virtqueue_size restriction, which causes the guest
>>>>>> to crash in case of violation.
>>>>> This isn't just in the guests: the virtio spec also seems to imply
>>>>> this, or at least be vague on this point.
>>>>>
>>>>> So I think it'll need a feature bit.
>>>>> Doing that in a safe way will also allow being compatible with old
>>>>> guests.
>>>>>
>>>>> The only downside is it's a bit more work as we need to spec this
>>>>> out and add guest support.
>>>>>
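As an aside, regarding the "publish VIRTQUEUE_MAX_SIZE - 2 as max_seg" idea
quoted above: a rough sketch of what deriving seg_max from the queue size
could look like on the virtio-scsi side. This is an illustration only, not
the actual patch; in particular the s->conf.virtqueue_size property name is
an assumption.

    static void virtio_scsi_get_config(VirtIODevice *vdev, uint8_t *config)
    {
        VirtIOSCSIConfig *scsiconf = (VirtIOSCSIConfig *)config;
        VirtIOSCSICommon *s = VIRTIO_SCSI_COMMON(vdev);

        /* instead of the effectively hardcoded 128 - 2 discussed above,
         * derive seg_max from the (configurable) virtqueue size, keeping
         * 2 descriptors for the request/response headers */
        virtio_stl_p(vdev, &scsiconf->seg_max, s->conf.virtqueue_size - 2);
        ...
    }
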
>>>>>> To fix the case described above on the old kernels we can increase
>>>>>> virtqueue_size to 256 and max_segments to 254. The pitfall here is
>>>>>> that seabios only allows virtqueue sizes < 128; however, the seabios
>>>>>> patch extending that value to 256 is pending.
>>>>> And the fix here is just to limit large vq sizes to virtio 1.0.
>>>>> In that mode it's fine I think:
>>>>>
>>>>>     /* check if the queue is available */
>>>>>     if (vp->use_modern) {
>>>>>         num = vp_read(&vp->common, virtio_pci_common_cfg, queue_size);
>>>>>         if (num > MAX_QUEUE_NUM) {
>>>>>             vp_write(&vp->common, virtio_pci_common_cfg, queue_size,
>>>>>                      MAX_QUEUE_NUM);
>>>>>             num = vp_read(&vp->common, virtio_pci_common_cfg,
>>>>>                           queue_size);
>>>>>         }
>>>>>     } else {
>>>>>         num = vp_read(&vp->legacy, virtio_pci_legacy, queue_num);
>>>>>     }
>> The same seabios snippet, but more detailed:
>>
>> vp_find_vq()
>> {
>>     ...
>>     /* check if the queue is available */
>>     if (vp->use_modern) {
>>         num = vp_read(&vp->common, virtio_pci_common_cfg, queue_size);
>>         if (num > MAX_QUEUE_NUM) {
>>             vp_write(&vp->common, virtio_pci_common_cfg, queue_size,
>>                      MAX_QUEUE_NUM);
>>             num = vp_read(&vp->common, virtio_pci_common_cfg, queue_size);
>>         }
>>     } else {
>>         num = vp_read(&vp->legacy, virtio_pci_legacy, queue_num);
>>     }
>>     if (!num) {
>>         dprintf(1, "ERROR: queue size is 0\n");
>>         goto fail;
>>     }
>>     if (num > MAX_QUEUE_NUM) {
>>         dprintf(1, "ERROR: queue size %d > %d\n", num, MAX_QUEUE_NUM);
>>         goto fail;
>>     }
>>     ...
>> }
>>
>> It turned out that the problem is here, but not because of the seabios
>> code. The virtqueue size is written and then an incorrect value is
>> re-read. Thanks to Roman Kagan (rka...@virtuozzo.com) for investigating
>> the root cause of the problem.
>>
>> As the code above shows, for modern devices seabios reads the queue
>> size and, if it is greater than what seabios can support, reduces the
>> queue size to the maximum value seabios supports.
>>
>> This doesn't work.
>>
>> The reason is that the size is read from the virtio device,
>>
>> virtio_pci_common_read()
>> {
>>     ...
>>     case VIRTIO_PCI_COMMON_Q_SIZE:
>>         val = virtio_queue_get_num(vdev, vdev->queue_sel);
>>         break;
>>     ...
>> }
>>
>> but is written to the proxy
>>
>> virtio_pci_common_write()
>> {
>>     ...
>>     case VIRTIO_PCI_COMMON_Q_SIZE:
>>         proxy->vqs[vdev->queue_sel].num = val;
>>         break;
>>     ...
>> }.
>>
>> In the final stage the size is propagated from the proxy to the device
>> when the virtqueue is enabled:
>>
>> virtio_pci_common_write()
>> {
>>     ...
>>     case VIRTIO_PCI_COMMON_Q_ENABLE:
>>         virtio_queue_set_num(vdev, vdev->queue_sel,
>>                              proxy->vqs[vdev->queue_sel].num);
>>         virtio_queue_set_rings(vdev, vdev->queue_sel,
>>             ((uint64_t)proxy->vqs[vdev->queue_sel].desc[1]) << 32 |
>>             proxy->vqs[vdev->queue_sel].desc[0],
>>             ((uint64_t)proxy->vqs[vdev->queue_sel].avail[1]) << 32 |
>>             proxy->vqs[vdev->queue_sel].avail[0],
>>             ((uint64_t)proxy->vqs[vdev->queue_sel].used[1]) << 32 |
>>             proxy->vqs[vdev->queue_sel].used[0]);
>>         proxy->vqs[vdev->queue_sel].enabled = 1;
>>         break;
>>     ...
>> }.
>>
>> So we have the following workflow: suppose the device has virtqueue
>> size = 256 and seabios MAX_QUEUE_NUM = 128. In that case seabios does
>> the following:
>>
>> 1. if vp->use_modern: read the queue size (256)
>> 2. 256 > 128
>> 3. write queue size = 128
>> 4. re-read queue size = 256 !!!
>> 5. fail because of the check
>>
>>     if (num > MAX_QUEUE_NUM) {
>>         dprintf(1, "ERROR: queue size %d > %d\n", num, MAX_QUEUE_NUM);
>>         goto fail;
>>     }
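The mismatch above (reads served by the device, writes landing only in the
proxy) is the crux. Purely to illustrate "read back what was written", one
possible shape of a change in virtio_pci_common_read() is sketched below.
This is a sketch against the structures quoted above, not a tested patch,
and whether the proxy is the right place is exactly the open question that
follows.

    case VIRTIO_PCI_COMMON_Q_SIZE:
        /* serve the read from where the write landed, falling back to
         * the device's default when the driver has not written a size */
        val = proxy->vqs[vdev->queue_sel].num ?
              proxy->vqs[vdev->queue_sel].num :
              virtio_queue_get_num(vdev, vdev->queue_sel);
        break;
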
>> To fix the issue, we need to read and write the virtqueue size in the
>> same place. Should we do that with the proxy?
>> Is there any reason to read from the device and write to the proxy?
>>
>> Furthermore, the size setting has a few flaws:
>>
>> 1. The size being set should be a power of 2
>> 2. The size being set should be less than or equal to the virtqueue
>>    size (and greater than 2?)
>>
>> Denis
>>>> Do you mean to put code like this into virtio_pci_realize() inside
>>>> QEMU?
>>>>
>>>> If not, can you please clarify which component should be touched.
>>>>
>>>> Den
>>> I mean:
>>> - add an API to change the default queue size
>>> - add a validate features callback; in there, check the features, and
>>>   if the modern flag is set, increase the queue size
>>>
>>> Maybe all this is too much work. We could block this for transitional
>>> devices, but your patch does not do that: you need to check that
>>> legacy is enabled, not that modern is not disabled.
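To make that last point concrete, a rough sketch of the distinction being
asked for follows. The virtio_pci_legacy()/virtio_pci_modern() helpers are
assumed here to reflect the proxy's disable-legacy/disable-modern settings,
and where exactly this check lives (realize vs. feature validation) is left
open.

    /* sketch: a transitional device may still be driven by a legacy
     * driver, so gate on "legacy is enabled", not on "modern is not
     * disabled" */
    if (virtio_pci_legacy(proxy)) {
        queue_size = 128;   /* conservative size while legacy is possible */
    } else {
        queue_size = 256;   /* modern-only: a larger virtqueue is fine */
    }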