Re: [PATCH v4 0/5] virtio-gpu: Add userptr support for compute workloads

Akihiko Odaki Fri, 16 Jan 2026 03:05:13 -0800

On 2026/01/16 19:32, Honglei Huang wrote:

On 2026/1/16 18:01, Akihiko Odaki wrote:
On 2026/01/16 18:39, Honglei Huang wrote:
On 2026/1/16 16:54, Akihiko Odaki wrote:
On 2026/01/16 16:20, Honglei Huang wrote:
On 2026/1/15 17:20, Akihiko Odaki wrote:
On 2026/01/15 16:58, Honglei Huang wrote:
From: Honglei Huang <[email protected]>

Hello,

This series adds virtio-gpu userptr support to enable ROCm native
context for compute workloads. The userptr feature allows the host to directly access guest userspace memory without memcpy overhead, which is
essential for GPU compute performance.
The userptr implementation provides buffer-based zero-copy memory access. This approach pins guest userspace pages and exposes them to the host
via scatter-gather tables, enabling efficient compute operations.
This description looks identical to what VIRTIO_GPU_BLOB_MEM_HOST3D_GUEST does, so there should be some explanation of how it makes a difference.
I have already pointed this out when reviewing the QEMU patches[1], but I note it here too, since QEMU is just a middleman and this matter is better discussed by Linux and virglrenderer developers.
[1] https://lore.kernel.org/qemu-devel/35a8add7-da49-4833-9e69-[email protected]/
Thanks for raising this important point about the distinction between
VIRTGPU_BLOB_FLAG_USE_USERPTR and VIRTIO_GPU_BLOB_MEM_HOST3D_GUEST.
I might not have explained it clearly previously.

The key difference is memory ownership and lifecycle:

BLOB_MEM_HOST3D_GUEST:
   - Kernel allocates memory (drm_gem_shmem_create)
   - Userspace accesses via mmap(GEM_BO)
   - Use case: Graphics resources (Vulkan/OpenGL)

BLOB_FLAG_USE_USERPTR:
   - Userspace pre-allocates memory (malloc/mmap)
"Kernel allocates memory" and "userspace pre-allocates memory" is a bit ambiguous phrasing. Either way, the userspace requests the kernel to map memory with a system call, brk() or mmap().
They are different:
BLOB_MEM_HOST3D_GUEST (kernel-managed pages):
   - Allocated via drm_gem_shmem_create() as GFP_KERNEL pages
   - Kernel guarantees pages won't swap or migrate while GEM object exists
   - Physical addresses remain stable → safe for DMA

BLOB_FLAG_USE_USERPTR (userspace pages):
   - From regular malloc/mmap - subject to MM policies
   - Can be swapped, migrated, or compacted by kernel
   - Requires FOLL_LONGTERM pinning to make DMA-safe
The device must treat them differently. Kernel-managed pages have stable physical addresses. Userspace pages need explicit pinning, and the device must be prepared
for potential invalidation.
This is why all compute drivers (amdgpu, i915, nouveau) implement userptr: to make arbitrary userspace allocations DMA-accessible while respecting their different
page mobility characteristics.
And DRM already has a better framework for it, SVM; this version is a super-simplified version. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/drm_gpusvm.c#:~:text=*%20GPU%20Shared%20Virtual%20Memory%20(GPU%20SVM)%20layer%20for%20the%20Direct%20Rendering%20Manager%20(DRM)
I referred to the phrasing "kernel allocates" vs "userspace allocates". Using GFP_KERNEL, swapping, migrating, or pinning is all done by the kernel.
I am talking about the virtio gpu driver side; the virtio gpu driver needs to handle those two types of memory differently.
   - Kernel only gets existing pages
   - Use case: Compute workloads (ROCm/CUDA) with large datasets, like
a GPU that needs to load a big model file (10G+): the UMD mmaps the file and passes the mmap pointer to the driver, so no additional copy is needed. But if shmem is used, userspace needs to copy the file data into the shmem mmap pointer, which adds copy overhead.
Userptr:

file -> open/mmap -> userspace ptr -> driver

shmem:

user alloc shmem ──→ mmap shmem ──→ shmem userspace ptr -> driver
                                               ↑
                                               │ copy
                                               │
file ──→ open/mmap ──→ file userptr ──────────┘
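A minimal userspace sketch of the two flows in the diagram, assuming malloc() as a stand-in for the GEM BO/shmem mapping (the real path goes through the virtio-gpu ioctls, which are not modeled here). map_model_zero_copy() and load_model_with_copy() are hypothetical names; the byte counter only makes the extra memcpy visible.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Userptr-style flow: map the file and hand this mapping itself to the
 * driver, so the file's own pages are what would get pinned. */
static void *map_model_zero_copy(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* shmem-style flow: the destination buffer is allocated separately
 * (malloc() stands in for the mapped GEM BO), so every byte of the file
 * has to be copied into it before the GPU can use it. */
static void *load_model_with_copy(int fd, size_t len, size_t *bytes_copied)
{
    void *file = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (file == MAP_FAILED)
        return NULL;

    void *bo = malloc(len);          /* hypothetical GEM BO mapping */
    if (bo) {
        memcpy(bo, file, len);       /* the copy the userptr flow avoids */
        *bytes_copied += len;
    }
    munmap(file, len);
    return bo;
}
```

In the userptr flow the returned mapping is what would be registered with the driver; in the shmem flow the data passes through memcpy() once before use.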


For compute workloads, this matters significantly:
Without userptr: malloc(8GB) → alloc GEM BO → memcpy 8GB → compute → memcpy 8GB back
With userptr: malloc(8GB) → create userptr BO → compute (zero-copy)
Why don't you alloc GEM BO first and read the file into there?
Because that defeats the purpose of zero-copy.

With GEM-BO-first (what you suggest):

void *gembo = virtgpu_gem_create(10GB);     // Allocate GEM buffer
void *model = mmap(..., model_file_fd, 0);  // Map model file
memcpy(gembo, model, 10GB);                 // Copy 10GB - NOT zero-copy
munmap(model, 10GB);
gpu_compute(gembo);

Result: 10GB copy overhead + double memory usage during copy.
How about:

void *gembo = virtgpu_gem_create(10GB);
read(model_file_fd, gembo, 10GB);
I believe there is still a memory copy in the read operation
model_file_fd -> gembo, since they have different physical pages,
but the userptr/SVM feature will access the model_file_fd physical pages directly.


You can use O_DIRECT if you want.

Result: zero-copy + simpler code.
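The read()-into-BO alternative can be sketched as follows, assuming an aligned buffer from posix_memalign() stands in for the mapped GEM BO; read_into_bo() is a hypothetical helper. With O_DIRECT the kernel can transfer file data straight into the destination buffer instead of copying it out of the page cache, but O_DIRECT imposes alignment requirements and not every filesystem supports it, so the sketch falls back to a buffered read when the kernel reports EINVAL.

```c
#define _GNU_SOURCE              /* O_DIRECT and posix_memalign() on Linux */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read a whole file into 'buf' (the BO mapping).
 * For O_DIRECT, buf, len, and the file offset should be aligned to the
 * filesystem's logical block size. Returns bytes read or -1. */
static ssize_t read_into_bo(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 && errno == EINVAL)       /* filesystem rejects O_DIRECT */
        fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n < 0 && errno == EINTR)
            continue;                    /* interrupted: retry */
        if (n < 0 && errno == EINVAL && (fcntl(fd, F_GETFL) & O_DIRECT)) {
            /* Alignment constraint not met: drop O_DIRECT and retry. */
            if (fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) & ~O_DIRECT) < 0)
                break;
            continue;
        }
        if (n <= 0)
            break;                       /* EOF or unrecoverable error */
        done += (size_t)n;
    }
    close(fd);
    return (ssize_t)done;
}
```

Either way the data ends up in the BO's own pages rather than in the file's page-cache pages that a userptr mapping of the file would reference, which is the distinction the two sides are weighing.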
With userptr (zero-copy):

void *model = mmap(..., model_file_fd, 0);  // Map model file
hsa_memory_register(model, 10GB);           // Pin pages, create userptr BO
gpu_compute(model);                          // GPU reads directly from file pages
The explicit flag serves three purposes:
1. Although both send scatter-gather entries to the host, the flag makes the intent unambiguous.
Why will the host care?
The flag tells the host this is a userptr; the host side needs to handle it specially.
Please provide the concrete requirement. What is the special handling the host side needs to perform?
Every piece of hardware has its own special API to handle userptr; for amdgpu ROCm
it is hsaKmtRegisterMemoryWithFlags.

On the host side, BLOB_MEM_HOST3D_GUEST will always result in a userspace pointer. Below is how the address is translated:


1) (with the ioctl you are adding)
   Guest kernel translates guest userspace pointer to guest PA.
2) (with IOMMU)
   Guest kernel translates guest PA to device VA
3) The host VMM translates device VA to host userspace pointer
4) virglrenderer passes userspace pointer to the GPU API (ROCm)

BLOB_FLAG_USE_USERPTR tells that 1) happened, but the succeeding process is not affected by that.
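To make the argument concrete, here is a toy model of steps 1) to 3), assuming single-level tables indexed by page frame number. Real guest page tables, IOMMU tables, and the VMM's memory map are far more complex; struct stage and the functions are hypothetical. Steps 2) and 3) only consume the address produced by the previous step, so nothing in them can tell whether step 1) came from the userptr ioctl or from a shmem-backed BO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NPAGES     8

/* One toy translation stage: input page frame number -> output PFN. */
struct stage {
    uint64_t pfn[NPAGES];
};

static uint64_t translate(const struct stage *s, uint64_t addr)
{
    uint64_t in_pfn = addr >> PAGE_SHIFT;
    assert(in_pfn < NPAGES);                /* toy tables are tiny */
    return (s->pfn[in_pfn] << PAGE_SHIFT) | (addr & (PAGE_SIZE - 1));
}

/* Steps 1)-3); step 4) would pass the result to the GPU API. */
static uint64_t guest_uva_to_host_uva(const struct stage *guest_pt,
                                      const struct stage *iommu,
                                      const struct stage *vmm,
                                      uint64_t guest_uva)
{
    uint64_t gpa = translate(guest_pt, guest_uva); /* 1) guest kernel */
    uint64_t dva = translate(iommu, gpa);          /* 2) guest IOMMU  */
    return translate(vmm, dva);                    /* 3) host VMM     */
}
```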

2. Ensures consistency between flag and userptr address field.
Addresses are represented with the nr_entries and following struct virtio_gpu_mem_entry entries whenever VIRTIO_GPU_CMD_RESOURCE_CREATE_BLOB or VIRTIO_GPU_CMD_RESOURCE_ATTACH_BACKING is used. Having a special flag introduces inconsistency.
For this part I am talking about the virtio gpu guest UMD side: in the blob create ioctl we need this flag to
check the userptr address and whether it has the read-only attribute:
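As an illustration of that representation, the sketch below coalesces page-aligned guest physical addresses into entries shaped like struct virtio_gpu_mem_entry (addr, length, padding). It is not the driver's code (the Linux driver builds these from an sg_table), and build_mem_entries() is a hypothetical helper. The resulting nr_entries and entry array look the same whether the pages were pinned from a userptr or allocated for a shmem BO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Local mirror of struct virtio_gpu_mem_entry; on the wire the fields are
 * little-endian, host-endian here for simplicity. */
struct mem_entry {
    uint64_t addr;
    uint32_t length;
    uint32_t padding;
};

/* Merge physically contiguous pages into runs. 'out' must have room for
 * npages entries. Returns the value that would go into nr_entries. */
static uint32_t build_mem_entries(const uint64_t *page_pa, size_t npages,
                                  struct mem_entry *out)
{
    uint32_t nr = 0;
    for (size_t i = 0; i < npages; i++) {
        assert(page_pa[i] % PAGE_SIZE == 0);          /* page-aligned */
        if (nr > 0 &&
            out[nr - 1].addr + out[nr - 1].length == page_pa[i] &&
            out[nr - 1].length <= UINT32_MAX - PAGE_SIZE) {
            out[nr - 1].length += PAGE_SIZE;          /* extend the run */
        } else {
            out[nr].addr = page_pa[i];                /* start a new run */
            out[nr].length = PAGE_SIZE;
            out[nr].padding = 0;
            nr++;
        }
    }
    return nr;
}
```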
     if (rc_blob->blob_flags & VIRTGPU_BLOB_FLAG_USE_USERPTR) {
         if (!rc_blob->userptr)
             return -EINVAL;
     } else {
         if (rc_blob->userptr)
             return -EINVAL;

         if (rc_blob->blob_flags & VIRTGPU_BLOB_FLAG_USERPTR_RDONLY)
             return -EINVAL;
     }
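The check above can be exercised outside the kernel with a small harness. The flag values and struct blob_args below are placeholders, not the uapi definitions from the series; only the branching mirrors the quoted code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define VIRTGPU_BLOB_FLAG_USE_USERPTR    (1u << 3)
#define VIRTGPU_BLOB_FLAG_USERPTR_RDONLY (1u << 4)

/* Stand-in for the ioctl argument, reduced to the fields checked here. */
struct blob_args {
    uint32_t blob_flags;
    uint64_t userptr;
};

static int check_userptr_args(const struct blob_args *rc_blob)
{
    if (rc_blob->blob_flags & VIRTGPU_BLOB_FLAG_USE_USERPTR) {
        if (!rc_blob->userptr)
            return -EINVAL;          /* flag set but no address given */
    } else {
        if (rc_blob->userptr)
            return -EINVAL;          /* address given without the flag */
        if (rc_blob->blob_flags & VIRTGPU_BLOB_FLAG_USERPTR_RDONLY)
            return -EINVAL;          /* RDONLY is only valid with userptr */
    }
    return 0;
}
```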
I see. That shows VIRTGPU_BLOB_FLAG_USE_USERPTR is necessary for the ioctl.
3. Future HMM support: There is a plan to upgrade the userptr implementation to use Heterogeneous Memory Management for better GPU coherency and dynamic page migration. The flag provides a clean path for future upgrades.
How will the upgrade path with the flag and the one without the flag look, and in what aspect is the upgrade path with the flag "cleaner"?
As I mentioned above, the userptr handling is different from shmem/GEM BO.
All the above describes the guest-internal behavior. What about the interaction between the guest and host? How will having VIRTIO_GPU_BLOB_FLAG_USE_USERPTR in virtio, as a guest-host interface, ease future upgrades?
It depends on how we implement it; the current version is the simplest implementation, similar to the implementation in Intel's i915.
If the virtio side needs HMM to implement an SVM-type userptr feature,
I think VIRTIO_GPU_BLOB_FLAG_USE_USERPTR is required: the stack needs to know if it is a userptr resource in order to perform advanced operations such as updating page tables, splitting BOs, etc.

Why does the device need to know if it is a userptr resource to perform operations when the device always gets device VAs?

I understand the concern about API complexity. I'll defer to the virtio-gpu maintainers for the final decision on whether this design is acceptable or if they prefer an alternative approach.
It is fine to have API complexity. The problem here is the lack of clear motivation and documentation.
Another way to put this is: how will you explain the flag in the virtio specification? It should say "the driver MAY/SHOULD/MUST do something" and/or "the device MAY/SHOULD/MUST do something", and then Linux and virglrenderer can implement the flag accordingly.
You're absolutely right that the specification should
be written in proper virtio spec language. The draft should be:

VIRTIO_GPU_BLOB_FLAG_USE_USERPTR:

Linux virtio driver requirements:
- MUST set userptr to valid guest userspace VA in drm_virtgpu_resource_create_blob
- SHOULD keep VA mapping valid until resource destruction
- MUST pin pages or use HMM at blob creation time
These descriptions are not for the virtio specification. The virtio specification describes the interaction between the driver and device. These statements describe the interaction between the guest userspace and the guest kernel.
Virglrenderer requirements:
- must use corresponding API for userptr resource
What is the "corresponding API"?
It could be:
**VIRTIO_GPU_BLOB_FLAG_USE_USERPTR specification:**

Driver requirements:
- MUST populate mem_entry[] with valid guest physical addresses of pinned userspace pages

"Userspace" is a guest-internal concept and irrelevant to the interaction between the driver and device.

- MUST set blob_mem to VIRTIO_GPU_BLOB_FLAG_USE_USERPTR when using this flag


When should the driver use the flag?

- SHOULD keep pages pinned until VIRTIO_GPU_CMD_RESOURCE_UNREF

It is not a new requirement. The pages must stay at the same position whether VIRTIO_GPU_BLOB_FLAG_USE_USERPTR is used or not.

Device requirements:
- MUST establish IOMMU mappings using the provided iovec array with a specific API (hsaKmtRegisterMemoryWithFlags for ROCm).

This should also be true even when VIRTIO_GPU_BLOB_FLAG_USE_USERPTR is not set.

Really, thanks for your comments; I believe we need some input from the
virtio gpu maintainers.
VIRTIO_GPU_BLOB_FLAG_USE_USERPTR is a flag describing how a resource is used, and it doesn't conflict with VIRTGPU_BLOB_MEM_HOST3D_GUEST, just as a resource can be used with VIRTGPU_BLOB_FLAG_USE_SHAREABLE while being either a guest resource or a host resource.
If we don't have the VIRTIO_GPU_BLOB_FLAG_USE_USERPTR flag, we may have some
resource conflicts on the host side. The guest kernel can use the 'userptr' param to identify it, but on the host side the 'userptr' param is lost; we only know it is a guest resource.

I still don't see why knowing it is a guest resource is insufficient for the host.


Regards,
Akihiko Odaki
