On 30.08.22 22:16, Stefan Hajnoczi wrote:
> On Thu, Aug 25, 2022 at 09:43:16AM +0200, David Hildenbrand wrote:
>> On 23.08.22 21:22, Stefan Hajnoczi wrote:
>>> On Tue, Aug 23, 2022 at 10:01:59AM +0200, David Hildenbrand wrote:
>>>> On 23.08.22 00:24, Stefan Hajnoczi wrote:
>>>>> Register guest RAM using BlockRAMRegistrar and set the
>>>>> BDRV_REQ_REGISTERED_BUF flag so block drivers can optimize memory
>>>>> accesses in I/O requests.
>>>>>
>>>>> This is for vdpa-blk, vhost-user-blk, and other I/O interfaces that
>>>>> rely on DMA mapping/unmapping.
>>>>
>>>> Can you explain why we're monitoring RAMRegistrar to hook into "guest
>>>> RAM" instead of going the usual path of the MemoryListener?
>>>
>>> The requirements are similar to VFIO, which uses RAMBlockNotifier. We
>>
>> Only VFIO NVMe uses the RAMBlockNotifier. Ordinary VFIO uses the
>> MemoryListener.
>>
>> Maybe the difference is that ordinary VFIO has to replicate the actual
>> guest physical memory layout, whereas VFIO NVMe is only interested in
>> possible guest RAM inside guest physical memory.
>>
>>> need to learn about all guest RAM because that's where I/O buffers are
>>> located.
>>>
>>> Do you think RAMBlockNotifier should be avoided?
>>
>> I assume it depends on the use case. For saying "this might be used for
>> I/O" it might be good enough, I guess.
>>
>>>> What will BDRV_REQ_REGISTERED_BUF actually do? Pin all guest memory
>>>> in the worst case, as io_uring fixed buffers would do? (I hope not.)
>>>
>>> BDRV_REQ_REGISTERED_BUF is a hint that no bounce buffer is necessary
>>> because the I/O buffer is located in memory that was previously
>>> registered with bdrv_register_buf().
>>>
>>> The RAMBlockNotifier calls bdrv_register_buf() to let the libblkio
>>> driver know about RAM. Some libblkio drivers ignore this hint, io_uring
>>> may use the fixed buffers feature, vhost-user sends the shared memory
>>> file descriptors to the vhost device server, and VFIO/vhost may pin
>>> pages.
>>>
>>> So the blkio block driver doesn't add anything new; it's the union of
>>> the VFIO/vhost/vhost-user/etc. memory requirements.
>>
>> The issue is if that backend pins memory inside any of these regions.
>> Then you're instantly incompatible with anything that relies on sparse
>> RAMBlocks, such as memory ballooning or virtio-mem, and have to
>> properly fence it.
>>
>> In that case, you'd have to successfully trigger
>> ram_block_discard_disable(true) first, before pinning. Who would do
>> that conditionally now, just like e.g. VFIO does?
>>
>> io_uring fixed buffers would be one such example that pins memory and
>> is problematic. VFIO (unless on s390x) is another example, as you
>> point out.
>
> Okay, I think libblkio needs to expose a bool property called
> "mem-regions-pinned" so QEMU knows whether or not the registered
> buffers will be pinned.
>
> Then the QEMU BlockDriver can do:
>
>     if (mem_regions_pinned) {
>         if (ram_block_discard_disable(true) < 0) {
>             ...fail to open block device...
>         }
>     }
>
> Does that sound right?

Yes, I think so.
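FWIW, on the QEMU side I'd expect that to end up looking roughly like the
sketch below in the blkio driver's open path. Take it with a grain of salt:
"mem-regions-pinned" is only the property proposed above, and I'm making up
how it would be queried (blkio_get_bool()), the state field to remember the
decision, and the error handling details:

    /*
     * Sketch only: "mem-regions-pinned" doesn't exist in libblkio yet,
     * and the s->ram_discard_disabled field is made up.
     */
    bool pinned = false;

    if (blkio_get_bool(s->blkio, "mem-regions-pinned", &pinned) < 0) {
        pinned = true; /* property unknown -> assume the worst */
    }

    if (pinned) {
        /*
         * Pinning registered buffers is incompatible with discarding
         * RAM (memory ballooning, virtio-mem), so fence it just like
         * VFIO does and fail opening the device if that's not possible.
         */
        if (ram_block_discard_disable(true) < 0) {
            error_setg(errp, "Cannot disable RAM discarding: the blkio "
                       "backend would pin registered memory regions");
            return -EBUSY;
        }
        s->ram_discard_disabled = true; /* undo this again on close */
    }

Treating a missing property as "pinned" would keep older library versions
safe, at the cost of not coexisting with virtio-mem/ballooning there.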
> Is "pinned" the best word to describe this or is there a more general
> characteristic we are looking for?

Pinning should be the right term. We want to express that all user page
tables will immediately get populated and that a kernel subsystem will
take long-term references on the mapped pages, which will go out of sync
as soon as we discard memory, e.g., using madvise(MADV_DONTNEED). We just
should not confuse it with memlock / locking into memory, which has yet
different semantics (e.g., don't swap it out).

>>
>> This has to be treated with care. Another thing to consider is that
>> different backends might only support a limited number of such regions.
>> I assume there is a way for QEMU to query this limit upfront? It might
>> be required for memory hot(un)plug to figure out how many memory slots
>> we actually have (for ordinary DIMMs, and if we ever want to make this
>> compatible with virtio-mem, it might be required as well when the
>> backend pins memory).
>
> Yes, libblkio reports the maximum number of blkio_mem_regions supported
> by the device. The property is called "max-mem-regions".
>
> The QEMU BlockDriver currently doesn't use this information. Are there
> any QEMU APIs that should be called to propagate this value?

I assume we have to do exactly the same thing as e.g.
vhost_has_free_slot()/kvm_has_free_slot() do. In particular,
hw/mem/memory-device.c needs care, and the slots_limit/used_memslots
handling in hw/virtio/vhost.c might be relevant as well.

Note that I have some patches pending that extend that handling by also
providing how many used+free slots there are, such as:

https://lore.kernel.org/all/20211027124531.57561-3-da...@redhat.com/
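Just to illustrate where that would hook in: the pre-plug check in
hw/mem/memory-device.c looks roughly like the snippet below, and a
"max-mem-regions" limit would need an equivalent check there. The
blkio_has_free_mem_regions() helper is made up purely to show the idea;
nothing like it exists yet:

    /* Roughly the existing check; the last hunk is hypothetical. */
    static void memory_device_check_addable(MachineState *ms, uint64_t size,
                                            Error **errp)
    {
        /* We will need a new memory slot for kvm and vhost. */
        if (kvm_enabled() && !kvm_has_free_slot(ms)) {
            error_setg(errp, "hypervisor has no free memory slots left");
            return;
        }
        if (!vhost_has_free_slot()) {
            error_setg(errp,
                       "a used vhost backend has no free memory slots left");
            return;
        }

        /* Hypothetical equivalent for a blkio "max-mem-regions" limit. */
        if (!blkio_has_free_mem_regions()) {
            error_setg(errp,
                       "a used blkio backend has no free memory regions left");
            return;
        }

        /* (Address space / region size checks omitted here.) */
    }

-- 
Thanks,

David / dhildenb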