On 2026/1/15 17:20, Akihiko Odaki wrote:
On 2026/01/15 16:58, Honglei Huang wrote:
From: Honglei Huang <[email protected]>

Hello,

This series adds virtio-gpu userptr support to enable ROCm native
context for compute workloads. The userptr feature allows the host to
directly access guest userspace memory without memcpy overhead, which is
essential for GPU compute performance.

The userptr implementation provides buffer-based zero-copy memory access.
This approach pins guest userspace pages and exposes them to the host
via scatter-gather tables, enabling efficient compute operations.

This description looks identical to what VIRTIO_GPU_BLOB_MEM_HOST3D_GUEST does, so there should be some explanation of how it differs.

I already pointed this out when reviewing the QEMU patches[1], but I am noting it here too, since QEMU is just a middleman and this matter is better discussed by Linux and virglrenderer developers.

[1] https://lore.kernel.org/qemu-devel/35a8add7-da49-4833-9e69- [email protected]/


Thanks for raising this important point about the distinction between
VIRTGPU_BLOB_FLAG_USE_USERPTR and VIRTIO_GPU_BLOB_MEM_HOST3D_GUEST.
I might not have explained it clearly previously.

The key difference is memory ownership and lifecycle:

BLOB_MEM_HOST3D_GUEST:
  - Kernel allocates memory (drm_gem_shmem_create)
  - Userspace accesses via mmap(GEM_BO)
  - Use case: Graphics resources (Vulkan/OpenGL)

BLOB_FLAG_USE_USERPTR:
  - Userspace pre-allocates memory (malloc/mmap)
  - Kernel only pins the existing pages
  - Use case: Compute workloads (ROCm/CUDA) with large datasets. For
    example, when the GPU needs to load a large model file (10 GB+), the
    UMD mmaps the file and passes the resulting pointer to the driver, so
    no additional copy is needed. With shmem, userspace has to copy the
    file data into the shmem mapping, which adds copy overhead.

Userptr:

file -> open/mmap -> userspace ptr -> driver

shmem:

user alloc shmem ──→ mmap shmem ──→ shmem userspace ptr -> driver
                                              ↑
                                              │ copy
                                              │
file ──→ open/mmap ──→ file userptr ──────────┘


For compute workloads, this matters significantly:
  Without userptr: malloc(8GB) → alloc GEM BO → memcpy 8GB → compute → memcpy 8GB back
  With userptr:    malloc(8GB) → create userptr BO → compute (zero-copy)
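
The two data paths above can be sketched in plain POSIX C. This is only an illustration under assumptions: map_file_readonly() and stage_into_buffer() are hypothetical helper names, an anonymous mapping stands in for an mmap'ed GEM BO, and no virtio-gpu ioctl is issued.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Userptr-style path: map the file and use the mapping itself as the
 * buffer that would be handed to the driver. No data is copied here.
 */
static void *map_file_readonly(const char *path, size_t *len_out)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the fd is closed */
    if (p == MAP_FAILED)
        return NULL;
    *len_out = (size_t)st.st_size;
    return p;
}

/*
 * shmem-style path: the buffer is allocated separately (an anonymous
 * mapping stands in for mmap(GEM_BO)), so the file data must be copied.
 */
static void *stage_into_buffer(const void *src, size_t len)
{
    void *bo = mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (bo == MAP_FAILED)
        return NULL;
    memcpy(bo, src, len); /* this is the copy the userptr path avoids */
    return bo;
}
```

In the first helper the pointer returned by mmap() is the buffer; in the second a full-size copy is made before the buffer can be used, which is the overhead described above.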

The explicit flag serves three purposes:

1. Both paths send scatter-gather entries to the host, but the flag makes the intent unambiguous.

2. It ensures consistency between the flag and the userptr address field.

3. Future HMM support: There is a plan to upgrade the userptr implementation to use Heterogeneous Memory Management for better GPU coherency and dynamic page migration. The flag provides a clean path to that upgrade.

I understand the concern about API complexity. I'll defer to the virtio-gpu maintainers for the final decision on whether this design is acceptable or if they prefer an alternative approach.

Regards,
Honglei Huang


Key features:
- Zero-copy memory access between guest userspace and host GPU
- Read-only and read-write userptr support
- Runtime feature detection via VIRTGPU_PARAM_RESOURCE_USERPTR
- ROCm capset support for ROCm stack integration
- Proper page lifecycle management with FOLL_LONGTERM pinning
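
Because pinning works on whole pages, a userptr range that is not page-aligned covers more pages than size / PAGE_SIZE suggests. The following is a minimal sketch of that arithmetic, assuming 4 KiB pages; example_userptr_num_pages() is a hypothetical helper and is not taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define EXAMPLE_PAGE_SHIFT 12u

/*
 * Number of pages touched by the byte range [addr, addr + size).
 * The caller must pass size > 0 and a range that does not wrap.
 */
static uint64_t example_userptr_num_pages(uint64_t addr, uint64_t size)
{
    uint64_t first, last;

    assert(size > 0);
    assert(addr + size - 1 >= addr);

    first = addr >> EXAMPLE_PAGE_SHIFT;             /* page of first byte */
    last = (addr + size - 1) >> EXAMPLE_PAGE_SHIFT; /* page of last byte */
    return last - first + 1;
}
```

For example, a 32-byte range starting 16 bytes before a page boundary spans two pages, and a page-aligned 8 GiB buffer spans 2097152 pages, each of which would need a scatter-gather page reference.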

Patches overview:
1. Add VIRTIO_GPU_CAPSET_ROCM capability for compute workloads
2. Add virtio-gpu API definitions for userptr blob resources
3. Extend DRM UAPI with comprehensive userptr support
4. Implement core userptr functionality with page management
5. Integrate userptr into blob resource creation and advertise to userspace

Performance: In popular compute benchmarks, this implementation achieves
approximately 70% of bare-metal OpenCL performance on AMD V2000 hardware
and 92% on AMD W7900 hardware.

Testing: Verified with the ROCm stack and OpenCL applications in virtio
virtualized environments.
- The full OpenCL CTS passed with ROCm 5.7.0 on the V2000 platform.
- Nearly 70% of the OpenCL CTS tests passed with ROCm 7.0 on the W7900 platform.
- Most HIP catch tests passed with ROCm 7.0 on the W7900 platform.
- Some AI applications were enabled with ROCm 7.0 on the W7900 platform.

V4 changes:
     - Renamed VIRTIO_GPU_CAPSET_HSAKMT to VIRTIO_GPU_CAPSET_ROCM
     - Removed userptr feature probing because it can reuse the guest
       blob resource code path, reducing the patch count from 6 to 5
     - Updated corresponding commit messages
     - Consolidated userptr feature detection in final patch
     - Updated corresponding cover letter content

V3 changes:
     - Split into focused patches for easier review
     - Removed complex interval tree userptr management
     - Simplified resource creation without deduplication
     - Added VIRTGPU_PARAM_RESOURCE_USERPTR for feature detection
     - Improved UAPI documentation and error handling
     - Enhanced code quality with proper cleanup paths
     - Removed MMU notifier dependencies for simplicity
     - Fixed resource lifecycle management issues

V2: - Split adding the HSAKMT context and the blob userptr resource into
       two patches.
     - Removed MMU notifier related patches, because using non-movable
       userspace memory with an MMU notifier is not a good idea.
     - Removed the HSAKMT context check at context creation so that all
       contexts support the userptr feature.
     - Removed MMU notifier related content from the cover letter.
     - Added more comments for patch 6 to the cover letter.

Honglei Huang (5):
   drm/virtio-gpu: Add VIRTIO_GPU_CAPSET_ROCM capability
   virtio-gpu api: add blob userptr resource
   drm/virtgpu api: add blob userptr resource
   drm/virtio: implement userptr support for zero-copy memory access
   drm/virtio: advertise base userptr feature to userspace

  drivers/gpu/drm/virtio/Makefile          |   3 +-
  drivers/gpu/drm/virtio/virtgpu_drv.h     |  33 ++++
  drivers/gpu/drm/virtio/virtgpu_ioctl.c   |   9 +-
  drivers/gpu/drm/virtio/virtgpu_object.c  |   6 +
  drivers/gpu/drm/virtio/virtgpu_userptr.c | 231 +++++++++++++++++++++++
  include/uapi/drm/virtgpu_drm.h           |   9 +
  include/uapi/linux/virtio_gpu.h          |   7 +
  7 files changed, 295 insertions(+), 3 deletions(-)
  create mode 100644 drivers/gpu/drm/virtio/virtgpu_userptr.c


