Currently, kfd manages contexts on a one-context-per-program basis:
each user space program owns exactly one kfd context (kfd_process).
This model works fine for most programs, but it is a poor fit for a
hypervisor like QEMU, because all programs in the guest user space
share QEMU's single kfd context. This is problematic in several ways,
including but not limited to the following.

As illustrated in Figure 1, all guest user space programs share the
same fd of /dev/kfd, the same kfd_process, and the same PASID, which
leads to the same GPU_VM address space. The IOVA ranges of the guest
user space programs are therefore not isolated from each other, and
the programs can attack each other through GPU DMA.

+----------------------------------------------------------------------------------+
|                                                                                  |
|  +-----------+      +-----------+      +------------+      +------------+        |
|  |           |      |           |      |            |      |            |        |
|  | Program 1 |      | Program 2 |      | Program 3  |      | Program N  |        |
|  |           |      |           |      |            |      |            |        |
|  +----+------+      +--------+--+      +--+---------+      +-----+------+        |
|       |                      |            |                      |               |
|       |                      |            |                      |     Guest     |
|       |                      |            |                      |               |
+-------+----------------------+------------+----------------------+---------------+
        |                      |            |                      |
        |                   +--+------------+---+                  |
        |                   |  file descriptor  |                  |
        +-------------------+    of /dev/kfd    +------------------+
                            |  opened by QEMU   |
                            |                   |
                            +---------+---------+           User Space
                                      |                     QEMU
                                      |
--------------------------------------+---------------------------------------------
                                      |                     Kernel Space
                                      |                     KFD Module
                             +--------+--------+
                             |                 |
                             |   kfd_process   |<------ The only one KFD context
                             |                 |
                             +--------+--------+
                                      |
                             +--------+--------+
                             |      PASID      |
                             +--------+--------+
                                      |
                             +--------+--------+
                             |     GPU_VM      |
                             +-----------------+

                                  Figure 1

This series implements a multiple contexts solution:
- Allow each program to create and hold multiple contexts (kfd processes)
- Each context has its own fd of /dev/kfd and an exclusive kfd_process,
  which is a secondary kfd context. Each context therefore has its own
  PASID and GPU VM, which isolate its IOVA address space from the others.

As a result, the contexts cannot attack each other through GPU DMA.
The design is illustrated in Figure 2 below:

+-------------------------------------------------------------------------------------------------------+
|                                                                                                       |
| +-----------------------------------------------------------------------------+                       |
| |                                                                             |                       |
| |  +-----------+     +-----------+     +-----------+     +-----------+        |                       |
| |  |           |     |           |     |           |     |           |        |                       |
| |  | Program 1 |     | Program 2 |     | Program 3 |     | Program N |        |                       |
| |  |           |     |           |     |           |     |           |        |                       |
| |  +-----+-----+     +-----+-----+     +-----+-----+     +-----+-----+        |                       |
| |        |                 |                 |                 |              |                       |
| |        |                 |                 |                 |        Guest |                       |
| |        |                 |                 |                 |              |                       |
| +--------+-----------------+-----------------+-----------------+--------------+                       |
|          |                 |                 |                 |                          QEMU        |
|          |                 |                 |                 |                                      |
+----------+-----------------+-----------------+-----------------+-----------------+--------------------+
           |                 |                 |                 |                 |
           |                 |                 |                 |                 |
       +---+----+        +---+----+        +---+----+        +---+----+        +---+-----+
       |        |        |        |        |        |        |        |        | Primary |
       |  FD 1  |        |  FD 2  |        |  FD 3  |        |  FD 4  |        |   FD    |
       |        |        |        |        |        |        |        |        |         |
       +---+----+        +---+----+        +---+----+        +---+----+        +---+-----+
           |                 |                 |                 |                 |    User Space
           |                 |                 |                 |                 |
-----------+-----------------+-----------------+-----------------+-----------------+---------------------
           |                 |                 |                 |                 |    Kernel Space
           |                 |                 |                 |                 |
+----------+-----------------+-----------------+-----------------+-----------------+--------------------+
|   +------+------+   +------+------+   +------+------+   +------+------+   +------+------+             |
|   |  Secondary  |   |  Secondary  |   |  Secondary  |   |  Secondary  |   |   Primary   |  KFD Module |
|   |kfd_process 1|   |kfd_process 2|   |kfd_process 3|   |kfd_process 4|   | kfd_process |             |
|   |             |   |             |   |             |   |             |   |             |             |
|   +------+------+   +------+------+   +------+------+   +------+------+   +------+------+             |
|          |                 |                 |                 |                 |                    |
|   +------+------+   +------+------+   +------+------+   +------+------+   +------+------+             |
|   |    PASID    |   |    PASID    |   |    PASID    |   |    PASID    |   |    PASID    |             |
|   +------+------+   +------+------+   +------+------+   +------+------+   +------+------+             |
|          |                 |                 |                 |                 |                    |
|   +------+------+   +------+------+   +------+------+   +------+------+   +------+------+             |
|   |   GPU_VM    |   |   GPU_VM    |   |   GPU_VM    |   |   GPU_VM    |   |   GPU_VM    |             |
|   +-------------+   +-------------+   +-------------+   +-------------+   +-------------+             |
|                                                                                                       |
+-------------------------------------------------------------------------------------------------------+

                                              Figure 2

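
To make the relationships in Figure 2 concrete, here is a minimal user
space toy model of the design, not code from this series. Every toy_*
name is hypothetical. In particular, using id 0 for the primary context
and scoping ids per mm are assumptions made only for this sketch. It
shows three properties described above: every context gets its own
PASID, a lookup by mm resolves to the primary context, and a secondary
context can be looked up by its id and goes away when it is closed.

```c
/*
 * Toy user space model of the primary/secondary context design.
 * This is NOT kernel code: every toy_* name is hypothetical.
 */
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CTX 8

struct toy_ctx {
	int in_use;
	int id;             /* assumption: 0 marks the primary context */
	unsigned int pasid; /* one PASID, hence one GPU VM, per context */
	const void *mm;     /* address space of the owning program */
};

static struct toy_ctx toy_table[TOY_MAX_CTX];
static unsigned int toy_next_pasid = 1;

/* Create a context for @mm: the first one is primary, later ones secondary. */
struct toy_ctx *toy_create_ctx(const void *mm)
{
	struct toy_ctx *slot = NULL;
	int next_id = 0;
	size_t i;

	assert(mm != NULL);
	for (i = 0; i < TOY_MAX_CTX; i++) {
		struct toy_ctx *c = &toy_table[i];

		if (!c->in_use) {
			if (!slot)
				slot = c; /* remember the first free slot */
		} else if (c->mm == mm && c->id >= next_id) {
			next_id = c->id + 1;
		}
	}
	if (!slot)
		return NULL; /* table full */

	slot->in_use = 1;
	slot->id = next_id;
	slot->pasid = toy_next_pasid++;
	slot->mm = mm;
	return slot;
}

/* Look up one specific context of @mm by its id. */
struct toy_ctx *toy_lookup_by_id(const void *mm, int id)
{
	size_t i;

	for (i = 0; i < TOY_MAX_CTX; i++) {
		struct toy_ctx *c = &toy_table[i];

		if (c->in_use && c->mm == mm && c->id == id)
			return c;
	}
	return NULL;
}

/* A lookup by mm always resolves to the primary context. */
struct toy_ctx *toy_lookup_by_mm(const void *mm)
{
	return toy_lookup_by_id(mm, 0);
}

/* Models closing the fd of a secondary context; the primary is refused. */
int toy_close_secondary(struct toy_ctx *c)
{
	if (!c || !c->in_use || c->id == 0)
		return -1;
	c->in_use = 0;
	return 0;
}
```

In this model a hypervisor would call toy_create_ctx() once for itself
(the primary context) and once more per guest program. Because each
call hands out a fresh PASID, no two contexts share a GPU VM, which is
the isolation property the series is after.
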
The relevant reference user space ROCm changes can be found at:
https://github.com/AMD-ROCm-Internal/rocm-systems/pull/78
https://github.com/AMD-ROCm-Internal/rocm-systems/pull/110
Thanks!

Zhu Lingshan (18):
  amdkfd: enlarge the hashtable of kfd_process
  amdkfd: mark the first kfd_process as the primary one
  amdkfd: find_process_by_mm always return the primary context
  amdkfd: Introduce kfd_create_process_sysfs as a separate function
  amdkfd: destroy kfd secondary contexts through fd close
  amdkfd: process svm ioctl only on the primary kfd process
  amdkfd: process USERPTR allocation only on the primary kfd process
  amdkfd: identify a secondary kfd process by its id
  amdkfd: find kfd_process by filep->private_data in kfd_mmap
  amdkfd: remove DIQ support
  amdkfd: process pointer of a HIQ should be NULL
  amdkfd: remove test_kq
  amdkfd: introduce new helper kfd_lookup_process_by_id
  amdkfd: record kfd process id into kfd process_info
  amdkfd: record kfd process id in amdkfd_fence
  amdkfd: fence handler evict and restore a kfd process by its id
  amdkfd: set_debug_trap ioctl only works on a primary kfd_process
    target
  amdkfd: introduce new ioctl AMDKFD_IOC_CREATE_PROCESS

drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 8 +-
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 10 +-
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 8 +-
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 102 +++++++-
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 7 +-
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 6 +-
drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 61 +----
.../drm/amd/amdkfd/kfd_packet_manager_v9.c | 4 -
.../drm/amd/amdkfd/kfd_packet_manager_vi.c | 4 -
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 17 +-
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 244 +++++++++++++-----
.../amd/amdkfd/kfd_process_queue_manager.c | 39 +--
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +-
include/uapi/linux/kfd_ioctl.h | 8 +-
14 files changed, 321 insertions(+), 199 deletions(-)
--
2.51.0
