On Tue, Apr 27, 2021 at 10:28 PM Stefan Hajnoczi <stefa...@redhat.com> wrote:
>
> On Tue, Apr 27, 2021 at 06:24:55PM +0800, Yongji Xie wrote:
> > On Mon, Apr 26, 2021 at 11:34 PM Stefan Hajnoczi <stefa...@redhat.com>
> > wrote:
> > >
> > > On Thu, Apr 08, 2021 at 06:12:49PM +0800, Xie Yongji wrote:
> > > > Since we already have some ways to emulate a vDPA block device
> > > > in the kernel[1] or in userspace[2], this series tries to
> > > > introduce a new vhost-vdpa block device for that. To use it, we
> > > > can add something like:
> > > >
> > > > qemu-system-x86_64 \
> > > >     -device vhost-vdpa-blk-pci,vdpa-dev=/dev/vhost-vdpa-0
> > >
> > > This device is similar to vhost-user-blk. QEMU does not see it as a
> > > block device, so storage migration, I/O throttling, image formats,
> > > etc. are not supported. Stefano Garzarella and I discussed how
> > > vdpa-blk devices could integrate more deeply with QEMU's block
> > > layer. The QEMU block layer could be enabled only when necessary
> > > and otherwise bypassed for maximum performance.
> > >
> >
> > Do you mean we can make use of the shadow virtqueue for the slow path
> > (I/O will go through the block layer) and add a fast path (like what
> > we do now) to bypass the block layer?
>
> Yes.
>
> > > This alternative approach is similar to how vhost-net is
> > > implemented in QEMU. A BlockDriver would handle the vdpa-blk device
> > > and the regular virtio-blk-pci device would still be present. The
> > > virtqueues could be delegated to the vdpa-blk device in order to
> > > bypass the QEMU block layer.
> > >
> > > I wanted to mention this since it's likely that this kind of
> > > vdpa-blk device implementation will be posted in the future and you
> > > might be interested. It makes live migration with non-shared
> > > storage possible, for example.
> > >
> >
> > That would be nice, I'm looking forward to it!
> >
> > So do you think it's necessary to continue with this approach? It
> > looks like we wouldn't need a vhost-vdpa-blk device any more in the
> > new approach.
>
> There is no code for the vdpa-blk BlockDriver yet and the
> implementation will be more complex than this patch series (more risk
> that the feature could be delayed).
>
> If you need vdpa-blk as soon as possible then this patch series may be
> a pragmatic solution. As long as it doesn't limit the vdpa-blk
> interface in a way that prevents the BlockDriver implementation, I
> think QEMU could support both.
>
> In the long term the vdpa-blk BlockDriver could replace -device
> vdpa-blk-pci since the performance should be identical and it supports
> more use cases.
>
> It's up to you. You're welcome to wait for the BlockDriver, work
> together with Stefano on the BlockDriver, or continue with your patch
> series.
>

I like the new idea! Let me wait for it.
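
If I understand the direction correctly, the command line in the new
approach might end up looking roughly like this (just a sketch to check
my understanding; the "vdpa-blk" blockdev driver and its options below
are hypothetical and don't exist yet):

# Hypothetical syntax only: a vdpa-blk BlockDriver would open the
# vhost-vdpa device, while the guest still sees a regular
# virtio-blk-pci device whose virtqueues could be delegated to it.
qemu-system-x86_64 \
    -blockdev driver=vdpa-blk,path=/dev/vhost-vdpa-0,node-name=vdpa0 \
    -device virtio-blk-pci,drive=vdpa0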

> > > An issue with vhost-user-blk is that the ownership of qdev
> > > properties and VIRTIO Configuration Space fields was not clearly
> > > defined. Some things are handled by QEMU's vhost-user-blk code,
> > > some things are handled by the vhost-user device backend, and some
> > > things are negotiated between both entities. This patch series
> > > follows the existing vhost-user-blk approach, which I think will
> > > show serious issues as the device is more widely used and whenever
> > > virtio-blk or the implementation is extended with new features. It
> > > is very hard to provide backwards compatibility with the current
> > > approach where the ownership of qdev properties and VIRTIO
> > > Configuration Space fields is ad-hoc and largely undefined.
> > >
> > > Since vDPA has VIRTIO Configuration Space APIs, I suggest that the
> > > vhost-vDPA device controls the entire configuration space. QEMU
> > > should simply forward accesses between the guest and vhost-vdpa.
> > >
> >
> > Is this already achieved by vhost_ops->vhost_get_config? And I want
> > to know how to handle the endianness issue if QEMU just does the
> > forwarding and doesn't care about the contents of the config space.
>
> QEMU just copies bytes between the driver and the device via
> vdpa_config_ops->get/set_config(). I don't think it needs to worry
> about endianness in the VIRTIO Configuration Space access code, or did
> I miss something?
>
> My understanding is that vDPA currently supports same-endian Legacy
> and little-endian Modern devices. Cross-endian Legacy devices are
> currently not supported because there is no API to set the endianness.
>

OK, I get you.

> If such an API is added in the future, then QEMU can use it to tell
> the vDPA device about guest endianness. QEMU still won't need to
> modify Configuration Space data itself.
>
> > > Regarding qdev properties, it's a little trickier because QEMU
> > > needs to do the emulated VIRTIO device setup (allocating
> > > virtqueues, setting their sizes, etc.). Can QEMU query this
> > > information from the vDPA device? If not, which qdev properties are
> > > read-only and must match the configuration of the vDPA device, and
> > > which are read-write and can control the vDPA device?
> > >
> >
> > Yes, that's an issue. We have to make sure the number of virtqueues
> > and their sizes set by QEMU are not greater than the hardware limits.
> > I think we can now query the max queue size, but it looks like we
> > don't have an interface to query the max number of virtqueues.
>
> Okay, this is something that Jason and Stefano can discuss more.
> Ideally the QEMU command line does not need to duplicate information
> about the vDPA device. The vdpa management tool
> (https://github.com/shemminger/iproute2/tree/main/vdpa) and the VDUSE
> virtio-blk device will probably be where the main device configuration
> happens.
>
> As a starting point, can you describe how your VDUSE virtio-blk device
> is configured? Does it have a command-line/config file/RPC API to set
> the number of virtqueues, block size, etc?
>

Yes, we have a command line to set those properties, something like:

qemu-storage-daemon \
    --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
    --monitor chardev=charmonitor \
    --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
    --export type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queue-size=128

Thanks,
Yongji
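
P.S. For reference, the vdpa management tool you mentioned is also
driven from the command line. A rough sketch, using the in-kernel block
simulator as an example (the module and device names here are only
illustrative and may differ on a given setup):

# Rough sketch: create a vDPA block device from a management device and
# inspect it with the iproute2 vdpa tool.
modprobe vdpa_sim_blk
vdpa mgmtdev show
vdpa dev add name vdpa-blk0 mgmtdev vdpasim_blk
vdpa dev show vdpa-blk0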