Hi Akihiko,
> Subject: Re: [PATCH v2 10/10] virtio-gpu-dmabuf: Create dmabuf for blobs associated with VFIO devices
> >>>>>>>> On 2025/11/09 14:33, Vivek Kasireddy wrote:
> >>>>>>>>> In addition to memfd, a blob resource can also have its backing
> >>>>>>>>> storage in a VFIO device region. Therefore, we first need to figure
> >>>>>>>>> out if the blob is backed by a VFIO device region or a memfd before
> >>>>>>>>> we can call the right API to get a dmabuf fd created.
> >>>>>>>>>
> >>>>>>>>> So, once we have the ramblock and the associated mr, we rely on
> >>>>>>>>> memory_region_is_ram_device() to tell us where the backing storage
> >>>>>>>>> is located. If the blob resource is VFIO backed, we try to find the
> >>>>>>>>> right VFIO device that contains the blob and then invoke the API
> >>>>>>>>> vfio_device_create_dmabuf().
> >>>>>>>>>
> >>>>>>>>> Note that in virtio_gpu_remap_udmabuf(), we first try to test if
> >>>>>>>>> the VFIO dmabuf exporter supports mmap or not. If it doesn't, we
> >>>>>>>>> use the VFIO device fd directly to create the CPU mapping.
> >>>>>>>>
> >>>>>>>> I have just remembered that mremap() will work for either of
> >>>>>>>> udmabuf and VFIO. That will avoid having two different methods and
> >>>>>>>> make vfio_get_region_index_from_mr() and vfio_device_get_region_info()
> >>>>>>>> unnecessary.
> >>>>>>> IIUC, the name virtio_gpu_remap_dmabuf() is misleading because we are
> >>>>>>> not actually doing remap but are simply calling mmap(). In other words,
> >>>>>>> we are not expanding or shrinking existing mapping but are creating a
> >>>>>>> new mapping. And, for dmabufs associated with VFIO devices, without
> >>>>>>> having to call vfio_get_region_index_from_mr() and
> >>>>>>> vfio_device_get_region_info(), I don't see any other way to determine
> >>>>>>> the region offset.
> >>>>>>>
> >>>>>>> So, I guess I'll create a new patch to do s/remapped/map.
> >>>>>>
> >>>>>> I mean calling mremap() with 0 as the old_size parameter. The man page
> >>>>>> says:
> >>>>>> > If the value of old_size is zero, and old_address refers to a
> >>>>>> > shareable mapping (see the description of MAP_SHARED in mmap(2)),
> >>>>>> > then mremap() will create a new mapping of the same pages.
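
As an aside, the duplication behaviour the man page describes is easy to see
with a tiny standalone test. The following is only an illustrative sketch, not
code from the patch, and it uses an anonymous MAP_SHARED page in place of a
real resource:

    #define _GNU_SOURCE
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t sz = 4096;

        /* A shareable page standing in for the resource's backing storage. */
        char *orig = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        assert(orig != MAP_FAILED);
        strcpy(orig, "hello");

        /* old_size == 0: create a second mapping of the same pages.
         * MREMAP_MAYMOVE is needed so the kernel can pick a new address. */
        char *dup = mremap(orig, 0, sz, MREMAP_MAYMOVE);
        assert(dup != MAP_FAILED && dup != orig);

        dup[0] = 'H';
        printf("%s\n", orig);   /* prints "Hello": both mappings alias the pages */
        return 0;
    }
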
> >>>>> It might be possible to use mremap() here but using mmap() seems very
> >>>>> straightforward given that we are actually not shrinking or expanding
> >>>>> an existing mapping but are instead creating a new mapping. Also, I am
> >>>>> wondering what benefit would mremap() bring as opposed to just using
> >>>>> mmap()?
> >>>>
> >>>> As I noted earlier, mremap() removes the need of having two different
> >>>> paths for udmabuf and VFIO, and make vfio_get_region_index_from_mr()
> >>>> and vfio_device_get_region_info() unnecessary, reducing code complexity.
> >>> Sorry, I should have researched thoroughly before but after looking at
> >>> the code again, I don't see how mremap() removes the need for having two
> >>> different paths for udmabuf and VFIO and make vfio_get_region_index_from_mr()
> >>> and vfio_device_get_region_info() unnecessary. Could you please elaborate
> >>> how it can be done?
> >>
> >> Not tested, but something like the following:
> >>
> >>     head = qemu_ram_mmap(-1, res->blob_size, qemu_real_host_page_size(),
> >>                          QEMU_MAP_READONLY | QEMU_MAP_SHARED, 0);
> >>     if (head == MAP_FAILED) {
> >>         return NULL;
> >>     }
> >>
> >>     cursor = head;
> >>
> >>     for (i = 0; i < res->iov_cnt; i++) {
> >>         if (mremap(res->iov[i].iov_base, 0, res->iov[i].iov_len,
> >>                    MREMAP_FIXED, cursor) == MAP_FAILED) {
> > This is very elegant and I can now see how it is expected to work. However,
> > I went ahead and tested it and it does not seem to work for VFIO backed
> > buffers. It works for buffers based out of System RAM though. Here is the
> > actual code I tested with that I am unconditionally calling for both VFIO
> > and udmabuf cases:
> >
> > static void *vfio_dmabuf_mmap2(struct virtio_gpu_simple_resource *res,
> >                                VFIODevice *vdev)
> > {
> >     void *head, *cursor;
> >     int i;
> >
> >     head = qemu_ram_mmap(-1, res->blob_size, qemu_real_host_page_size(),
> >                          QEMU_MAP_READONLY | QEMU_MAP_SHARED, 0);
>
> By the way, please do:
> head = mmap(NULL, res->blob_size, PROT_NONE, MAP_SHARED, -1, 0);
>
> I forgot that we don't need to map a RAM but mmap() with PROT_NONE is
> sufficient. It will catch a bug that fails to mmap() a real resource on
> top of it.
>
> >     if (head == MAP_FAILED) {
> >         return head;
> >     }
> >
> >     cursor = head;
> >     for (i = 0; i < res->iov_cnt; i++) {
> >         if (mremap(res->iov[i].iov_base, 0, res->iov[i].iov_len,
> >                    MREMAP_FIXED | MREMAP_MAYMOVE, cursor) == MAP_FAILED) {
> >             goto err;
> >         }
> >         cursor += res->iov[i].iov_len;
> >     }
> >     return head;
> > err:
> >     qemu_ram_munmap(-1, head, res->blob_size);
> >     return MAP_FAILED;
> > }
> >
> > It (mremap) initially errored with -EINVAL in all cases but adding
> > MREMAP_MAYMOVE fixed it for buffers based out of RAM but for VFIO backed
> > buffers, it seems to be throwing -EFAULT/Bad Address error. I did not yet
> > check why or where the kernel driver is returning this error from.
>
> The man page says that EFAULT means:
> > Some address in the range old_address to old_address+old_size is an
> > invalid virtual memory address for this process. You can also get
> > EFAULT even if there exist mappings that cover the whole address space
> > requested, but those mappings are of different types.
>
> None of this should be true so it should be a bug, though I'm not sure
> if it is a bug of QEMU, Linux, or the man page (i.e., the man page
> failed to mention another failure scenario). In any case it needs to be
> debugged.
I found the check that fails and causes -EFAULT to be returned; it is this one:
https://elixir.bootlin.com/linux/v6.18-rc6/source/mm/mremap.c#L1736
Removing that check makes mremap() work, so it is not clear whether mremap()
is meant to be supported for VMAs that have the VM_DONTEXPAND and VM_PFNMAP
flags set, as is the case with VFIO. I guess I'll send a patch removing this
check to linux-mm to discuss the issue with the mm developers.
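
For reference, the check behind that link boils down to a vm_flags test along
these lines (paraphrased here, not verbatim kernel code; the exact function it
sits in may differ):

    /* Paraphrased from mm/mremap.c: VMAs with these flags are rejected,
     * which is why the VFIO-backed case fails with -EFAULT. */
    if (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP))
        return -EFAULT;
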
Thanks,
Vivek
>
> Regards,
> Akihiko Odaki