Re: [PATCH] drm/amdgpu: Fix uninitialized return value
Am 27.11.23 um 22:55 schrieb Alex Deucher:

On Mon, Nov 27, 2023 at 2:22 PM Christian König wrote:

Am 27.11.23 um 19:29 schrieb Lijo Lazar:

The return value is uninitialized if ras context is NULL.

Fixes: 0f4c8faa043c ("drm/amdgpu: Move mca debug mode decision to ras")
Signed-off-by: Lijo Lazar
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 1a8668a63e67..f6b47ebce9d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -3410,7 +3410,7 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device *adev)
 int amdgpu_ras_set_mca_debug_mode(struct amdgpu_device *adev, bool enable)
 {
 	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
-	int ret;
+	int ret = 0;

That's usually considered very bad coding style and complained about by automated checkers. Instead explicitly set the return value in the code paths not actually setting it.

In this case, the function is so short, I think it makes things less readable to do that.

Yeah, indeed but that doesn't help us with the coding style checkers.

We could do something like this instead:

	if (!con)
		return 0;

	ret = amdgpu_mca_smu_set_debug_mode(adev, enable);
	...

Regards,
Christian.

Reviewed-by: Alex Deucher

Regards,
Christian.

 	if (con) {
 		ret = amdgpu_mca_smu_set_debug_mode(adev, enable);
Re: [PATCH] drm/amdgpu: Fix uninitialized return value
On 11/28/2023 3:07 PM, Christian König wrote:

Am 27.11.23 um 22:55 schrieb Alex Deucher:

On Mon, Nov 27, 2023 at 2:22 PM Christian König wrote:

Am 27.11.23 um 19:29 schrieb Lijo Lazar:

The return value is uninitialized if ras context is NULL.

Fixes: 0f4c8faa043c ("drm/amdgpu: Move mca debug mode decision to ras")
Signed-off-by: Lijo Lazar
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 1a8668a63e67..f6b47ebce9d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -3410,7 +3410,7 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device *adev)
 int amdgpu_ras_set_mca_debug_mode(struct amdgpu_device *adev, bool enable)
 {
 	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
-	int ret;
+	int ret = 0;

That's usually considered very bad coding style and complained about by automated checkers. Instead explicitly set the return value in the code paths not actually setting it.

In this case, the function is so short, I think it makes things less readable to do that.

Yeah, indeed but that doesn't help us with the coding style checkers.

Are these checkers for real? I see many instances of variable initialization, including in core kernel code (e.g. the workqueue code).

Thanks,
Lijo

We could do something like this instead:

	if (!con)
		return 0;

	ret = amdgpu_mca_smu_set_debug_mode(adev, enable);
	...

Regards,
Christian.

Reviewed-by: Alex Deucher

Regards,
Christian.

 	if (con) {
 		ret = amdgpu_mca_smu_set_debug_mode(adev, enable);
[PATCH 0/8] drm/plane-helpers: Minor clean ups
Move drm_plane_helper_atomic_check() into udl, which is the only driver using it. Remove several unnecessary include statements for <drm/drm_plane_helper.h>.

Thomas Zimmermann (8):
  drm/plane-helper: Move drm_plane_helper_atomic_check() into udl
  drm/amdgpu: Do not include <drm/drm_plane_helper.h>
  drm/loongson: Do not include <drm/drm_plane_helper.h>
  drm/shmobile: Do not include <drm/drm_plane_helper.h>
  drm/solomon: Do not include <drm/drm_plane_helper.h>
  drm/ofdrm: Do not include <drm/drm_plane_helper.h>
  drm/simpledrm: Do not include <drm/drm_plane_helper.h>
  drm/xlnx: Do not include <drm/drm_plane_helper.h>

 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  1 -
 drivers/gpu/drm/drm_plane_helper.c            | 32 ---
 drivers/gpu/drm/loongson/lsdc_plane.c         |  1 -
 .../drm/renesas/shmobile/shmob_drm_plane.c    |  1 -
 drivers/gpu/drm/solomon/ssd130x.h             |  1 -
 drivers/gpu/drm/tiny/ofdrm.c                  |  1 -
 drivers/gpu/drm/tiny/simpledrm.c              |  1 -
 drivers/gpu/drm/udl/udl_modeset.c             | 19 +-
 drivers/gpu/drm/xlnx/zynqmp_kms.c             |  1 -
 include/drm/drm_plane_helper.h                |  2 --
 10 files changed, 17 insertions(+), 43 deletions(-)

--
2.43.0
[PATCH 2/8] drm/amdgpu: Do not include <drm/drm_plane_helper.h>
Remove unnecessary include statements for <drm/drm_plane_helper.h>. The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes.

Signed-off-by: Thomas Zimmermann
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index aa43f1761acd3..b8c3a9b104a41 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -92,7 +92,6 @@
 #include
 #include
 #include
-#include <drm/drm_plane_helper.h>
 #include
--
2.43.0
[PATCH 6/8] drm/ofdrm: Do not include <drm/drm_plane_helper.h>
Remove unnecessary include statements for <drm/drm_plane_helper.h>. The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes.

Signed-off-by: Thomas Zimmermann
---
 drivers/gpu/drm/tiny/ofdrm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
index 05a72473cfc65..ab89b7fc7bf61 100644
--- a/drivers/gpu/drm/tiny/ofdrm.c
+++ b/drivers/gpu/drm/tiny/ofdrm.c
@@ -19,7 +19,6 @@
 #include
 #include
 #include
-#include <drm/drm_plane_helper.h>
 #include
 #include
--
2.43.0
[PATCH 4/8] drm/shmobile: Do not include <drm/drm_plane_helper.h>
Remove unnecessary include statements for <drm/drm_plane_helper.h>. The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes.

Signed-off-by: Thomas Zimmermann
---
 drivers/gpu/drm/renesas/shmobile/shmob_drm_plane.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/renesas/shmobile/shmob_drm_plane.c b/drivers/gpu/drm/renesas/shmobile/shmob_drm_plane.c
index 8f9a728affde8..07ad17d24294d 100644
--- a/drivers/gpu/drm/renesas/shmobile/shmob_drm_plane.c
+++ b/drivers/gpu/drm/renesas/shmobile/shmob_drm_plane.c
@@ -14,7 +14,6 @@
 #include
 #include
 #include
-#include <drm/drm_plane_helper.h>

 #include "shmob_drm_drv.h"
 #include "shmob_drm_kms.h"
--
2.43.0
[PATCH 3/8] drm/loongson: Do not include <drm/drm_plane_helper.h>
Remove unnecessary include statements for <drm/drm_plane_helper.h>. The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes.

Signed-off-by: Thomas Zimmermann
---
 drivers/gpu/drm/loongson/lsdc_plane.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/loongson/lsdc_plane.c b/drivers/gpu/drm/loongson/lsdc_plane.c
index 0d50946332229..d227a2c1dcf16 100644
--- a/drivers/gpu/drm/loongson/lsdc_plane.c
+++ b/drivers/gpu/drm/loongson/lsdc_plane.c
@@ -9,7 +9,6 @@
 #include
 #include
 #include
-#include <drm/drm_plane_helper.h>

 #include "lsdc_drv.h"
 #include "lsdc_regs.h"
--
2.43.0
[PATCH 1/8] drm/plane-helper: Move drm_plane_helper_atomic_check() into udl
The udl driver is the only caller of drm_plane_helper_atomic_check(). Move the function into the driver. No functional changes.

Signed-off-by: Thomas Zimmermann
---
 drivers/gpu/drm/drm_plane_helper.c | 32 --
 drivers/gpu/drm/udl/udl_modeset.c  | 19 --
 include/drm/drm_plane_helper.h     |  2 --
 3 files changed, 17 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/drm_plane_helper.c b/drivers/gpu/drm/drm_plane_helper.c
index 5e95089676ff8..7982be4b0306d 100644
--- a/drivers/gpu/drm/drm_plane_helper.c
+++ b/drivers/gpu/drm/drm_plane_helper.c
@@ -279,35 +279,3 @@ void drm_plane_helper_destroy(struct drm_plane *plane)
 	kfree(plane);
 }
 EXPORT_SYMBOL(drm_plane_helper_destroy);
-
-/**
- * drm_plane_helper_atomic_check() - Helper to check plane atomic-state
- * @plane: plane to check
- * @state: atomic state object
- *
- * Provides a default plane-state check handler for planes whose atomic-state
- * scale and positioning are not expected to change since the plane is always
- * a fullscreen scanout buffer.
- *
- * This is often the case for the primary plane of simple framebuffers. See
- * also drm_crtc_helper_atomic_check() for the respective CRTC-state check
- * helper function.
- *
- * RETURNS:
- * Zero on success, or an errno code otherwise.
- */
-int drm_plane_helper_atomic_check(struct drm_plane *plane, struct drm_atomic_state *state)
-{
-	struct drm_plane_state *new_plane_state = drm_atomic_get_new_plane_state(state, plane);
-	struct drm_crtc *new_crtc = new_plane_state->crtc;
-	struct drm_crtc_state *new_crtc_state = NULL;
-
-	if (new_crtc)
-		new_crtc_state = drm_atomic_get_new_crtc_state(state, new_crtc);
-
-	return drm_atomic_helper_check_plane_state(new_plane_state, new_crtc_state,
-						   DRM_PLANE_NO_SCALING,
-						   DRM_PLANE_NO_SCALING,
-						   false, false);
-}
-EXPORT_SYMBOL(drm_plane_helper_atomic_check);
diff --git a/drivers/gpu/drm/udl/udl_modeset.c b/drivers/gpu/drm/udl/udl_modeset.c
index 40876bcdd79a4..7702359c90c22 100644
--- a/drivers/gpu/drm/udl/udl_modeset.c
+++ b/drivers/gpu/drm/udl/udl_modeset.c
@@ -21,7 +21,6 @@
 #include
 #include
 #include
-#include <drm/drm_plane_helper.h>
 #include
 #include
@@ -261,6 +260,22 @@ static const uint64_t udl_primary_plane_fmtmods[] = {
 	DRM_FORMAT_MOD_INVALID
 };

+static int udl_primary_plane_helper_atomic_check(struct drm_plane *plane,
+						 struct drm_atomic_state *state)
+{
+	struct drm_plane_state *new_plane_state = drm_atomic_get_new_plane_state(state, plane);
+	struct drm_crtc *new_crtc = new_plane_state->crtc;
+	struct drm_crtc_state *new_crtc_state = NULL;
+
+	if (new_crtc)
+		new_crtc_state = drm_atomic_get_new_crtc_state(state, new_crtc);
+
+	return drm_atomic_helper_check_plane_state(new_plane_state, new_crtc_state,
+						   DRM_PLANE_NO_SCALING,
+						   DRM_PLANE_NO_SCALING,
+						   false, false);
+}
+
 static void udl_primary_plane_helper_atomic_update(struct drm_plane *plane,
 						   struct drm_atomic_state *state)
 {
@@ -296,7 +311,7 @@ static void udl_primary_plane_helper_atomic_update(struct drm_plane *plane,

 static const struct drm_plane_helper_funcs udl_primary_plane_helper_funcs = {
 	DRM_GEM_SHADOW_PLANE_HELPER_FUNCS,
-	.atomic_check = drm_plane_helper_atomic_check,
+	.atomic_check = udl_primary_plane_helper_atomic_check,
 	.atomic_update = udl_primary_plane_helper_atomic_update,
 };

diff --git a/include/drm/drm_plane_helper.h b/include/drm/drm_plane_helper.h
index 3a574e8cd22f4..75f9c4830564a 100644
--- a/include/drm/drm_plane_helper.h
+++ b/include/drm/drm_plane_helper.h
@@ -26,7 +26,6 @@
 #include

-struct drm_atomic_state;
 struct drm_crtc;
 struct drm_framebuffer;
 struct drm_modeset_acquire_ctx;
@@ -42,7 +41,6 @@ int drm_plane_helper_update_primary(struct drm_plane *plane, struct drm_crtc *cr
 int drm_plane_helper_disable_primary(struct drm_plane *plane,
 				     struct drm_modeset_acquire_ctx *ctx);
 void drm_plane_helper_destroy(struct drm_plane *plane);
-int drm_plane_helper_atomic_check(struct drm_plane *plane, struct drm_atomic_state *state);

 /**
  * DRM_PLANE_NON_ATOMIC_FUNCS - Default plane functions for non-atomic drivers
--
2.43.0
[PATCH 5/8] drm/solomon: Do not include <drm/drm_plane_helper.h>
Remove unnecessary include statements for <drm/drm_plane_helper.h>. The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes.

Signed-off-by: Thomas Zimmermann
---
 drivers/gpu/drm/solomon/ssd130x.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/solomon/ssd130x.h b/drivers/gpu/drm/solomon/ssd130x.h
index acf7cedf0c1ab..075c5c3ee75ac 100644
--- a/drivers/gpu/drm/solomon/ssd130x.h
+++ b/drivers/gpu/drm/solomon/ssd130x.h
@@ -17,7 +17,6 @@
 #include
 #include
 #include
-#include <drm/drm_plane_helper.h>
 #include
--
2.43.0
[PATCH 8/8] drm/xlnx: Do not include <drm/drm_plane_helper.h>
Remove unnecessary include statements for <drm/drm_plane_helper.h>. The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes.

Signed-off-by: Thomas Zimmermann
---
 drivers/gpu/drm/xlnx/zynqmp_kms.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/xlnx/zynqmp_kms.c b/drivers/gpu/drm/xlnx/zynqmp_kms.c
index a7f8611be6f42..db3bb4afbfc46 100644
--- a/drivers/gpu/drm/xlnx/zynqmp_kms.c
+++ b/drivers/gpu/drm/xlnx/zynqmp_kms.c
@@ -27,7 +27,6 @@
 #include
 #include
 #include
-#include <drm/drm_plane_helper.h>
 #include
 #include
 #include
--
2.43.0
[PATCH 7/8] drm/simpledrm: Do not include <drm/drm_plane_helper.h>
Remove unnecessary include statements for <drm/drm_plane_helper.h>. The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes.

Signed-off-by: Thomas Zimmermann
---
 drivers/gpu/drm/tiny/simpledrm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/tiny/simpledrm.c b/drivers/gpu/drm/tiny/simpledrm.c
index 34bbbd7b53dd9..7ce1c46176750 100644
--- a/drivers/gpu/drm/tiny/simpledrm.c
+++ b/drivers/gpu/drm/tiny/simpledrm.c
@@ -25,7 +25,6 @@
 #include
 #include
 #include
-#include <drm/drm_plane_helper.h>
 #include

 #define DRIVER_NAME "simpledrm"
--
2.43.0
Re: [PATCH 8/8] drm/xlnx: Do not include <drm/drm_plane_helper.h>
Hi Thomas,

Thank you for the patch.

On Tue, Nov 28, 2023 at 11:45:24AM +0100, Thomas Zimmermann wrote:
> Remove unnecessary include statements for <drm/drm_plane_helper.h>.
> The file contains helpers for non-atomic code and should not be
> required by most drivers. No functional changes.
>
> Signed-off-by: Thomas Zimmermann

Assuming you've compile-tested the driver,

Reviewed-by: Laurent Pinchart

Please get this merged through drm-misc as part of the series if possible.

> ---
>  drivers/gpu/drm/xlnx/zynqmp_kms.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xlnx/zynqmp_kms.c b/drivers/gpu/drm/xlnx/zynqmp_kms.c
> index a7f8611be6f42..db3bb4afbfc46 100644
> --- a/drivers/gpu/drm/xlnx/zynqmp_kms.c
> +++ b/drivers/gpu/drm/xlnx/zynqmp_kms.c
> @@ -27,7 +27,6 @@
>  #include
>  #include
>  #include
> -#include <drm/drm_plane_helper.h>
>  #include
>  #include
>  #include

--
Regards,
Laurent Pinchart
[PATCH] drm/amdgpu: Use another offset for GC 9.4.3 remap
The legacy region at 0x7F000 maps to valid registers in GC 9.4.3 SOCs. Use 0x1A000 offset instead as MMIO register remap region.

Signed-off-by: Lijo Lazar
---
 drivers/gpu/drm/amd/amdgpu/soc15.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index e3d41e8aac9d..9ad4d6d3122b 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -1162,6 +1162,11 @@ static int soc15_common_early_init(void *handle)
 			AMD_PG_SUPPORT_VCN_DPG |
 			AMD_PG_SUPPORT_JPEG;
 		adev->external_rev_id = adev->rev_id + 0x46;
+		/* GC 9.4.3 uses MMIO register region hole at a different offset */
+		if (!amdgpu_sriov_vf(adev)) {
+			adev->rmmio_remap.reg_offset = 0x1A000;
+			adev->rmmio_remap.bus_addr = adev->rmmio_base + 0x1A000;
+		}
 		break;
 	default:
 		/* FIXME: not supported yet */
--
2.25.1
Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices
Am 28.11.23 um 13:50 schrieb Weixi Zhu:

The problem:

Accelerator driver developers are forced to reinvent external MM subsystems case by case, because Linux core MM only considers host memory resources. These reinvented MM subsystems have similar orders of magnitude of LoC as Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has 30K. Meanwhile, more and more vendors are implementing their own accelerators, e.g. Microsoft's Maia 100. At the same time, application-level developers suffer from poor programmability -- they must consider parallel address spaces and be careful about the limited device DRAM capacity. This can be alleviated if a malloc()-ed virtual address can be shared by the accelerator, or the abundant host DRAM can further transparently back up the device local memory.

These external MM systems share similar mechanisms except for the hardware-dependent part, so reinventing them is effectively introducing redundant code (14K~70K for each case). Such developing/maintaining is not cheap. Furthermore, to share a malloc()-ed virtual address, device drivers need to deeply interact with Linux MM via low-level MM APIs, e.g. MMU notifiers/HMM. This raises the bar for driver development, since developers must understand how Linux MM works. Further, it creates code maintenance problems -- any changes to Linux MM potentially require coordinated changes to accelerator drivers using low-level MM APIs.

Putting a cache-coherent bus between host and device will not make these external MM subsystems disappear. For example, a throughput-oriented accelerator will not tolerate executing heavy memory access workloads with a host MMU/IOMMU via a remote bus. Therefore, devices will still have their own MMU and pick a simpler page table format for lower address translation overhead, requiring external MM subsystems.

What GMEM (Generalized Memory Management [1]) does:

GMEM extends Linux MM to share its machine-independent MM code. Only a high-level interface is provided for device drivers. This prevents accelerator drivers from reinventing the wheel, but relies on drivers to implement their hardware-dependent functions declared by GMEM. GMEM's key interfaces include gm_dev_create(), gm_as_create(), gm_as_attach() and gm_dev_register_physmem(). Briefly, a device driver utilizes them as follows:

1. At boot time, call gm_dev_create() and register the implementation of hardware-dependent functions as declared in struct gm_mmu.
   - If the device has local DRAM, call gm_dev_register_physmem() to register available physical addresses.
2. When a device context is initialized (e.g. triggered by ioctl), check if the current CPU process has been attached to a gmem address space (struct gm_as). If not, call gm_as_create() and point current->mm->gm_as to it.
3. Call gm_as_attach() to attach the device context to a gmem address space.
4. Invoke gm_dev_fault() to resolve a page fault or prepare data before device computation happens.

GMEM has changed the following assumptions in Linux MM:

1. An mm_struct not only handles a single CPU context, but may also handle external memory contexts encapsulated as gm_context listed in mm->gm_as. An external memory context can include a few or all of the following parts: an external MMU (that requires TLB invalidation), an external page table (that requires PTE manipulation) and external DRAM (that requires physical memory management).

Well that is pretty much exactly what AMD has already proposed with KFD and was rejected for rather good reasons.

2. Faulting a MAP_PRIVATE VMA with no CPU PTE found does not necessarily mean that a zero-filled physical page should be mapped. The virtual page may have been mapped to an external memory device.

3. Unmapping a page may include sending device TLB invalidations (even if its MMU shares the CPU page table) and manipulating device PTEs.

Semantics of new syscalls:

1. mmap(..., MAP_PRIVATE | MAP_PEER_SHARED)
   Allocate a virtual address range that is shared between the CPU and all attached devices. Data is guaranteed to be coherent whenever the address is accessed by either the CPU or any attached device. If the device does not support page faults, then the device driver is responsible for faulting memory before data gets accessed. By default, the CPU DRAM can be used as a swap backup for the device local memory.

2. hmadvise(NUMA_id, va_start, size, memory_hint)
   Issue memory hints for a given VMA. This extends the traditional madvise() syscall with an extra argument so that programmers have better control with heterogeneous devices registered as NUMA nodes. One useful memory hint could be MADV_PREFETCH, which guarantees that the physical data of the given VMA [VA, VA+size) is migrated to NUMA node #id. Another useful memory hint is MADV_DONTNEED. This is
Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices
Adding a few missing important people to the explicit To list.

Am 28.11.23 um 13:50 schrieb Weixi Zhu:

The problem:

Accelerator driver developers are forced to reinvent external MM subsystems case by case, because Linux core MM only considers host memory resources. These reinvented MM subsystems have similar orders of magnitude of LoC as Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has 30K. Meanwhile, more and more vendors are implementing their own accelerators, e.g. Microsoft's Maia 100. At the same time, application-level developers suffer from poor programmability -- they must consider parallel address spaces and be careful about the limited device DRAM capacity. This can be alleviated if a malloc()-ed virtual address can be shared by the accelerator, or the abundant host DRAM can further transparently back up the device local memory.

These external MM systems share similar mechanisms except for the hardware-dependent part, so reinventing them is effectively introducing redundant code (14K~70K for each case). Such developing/maintaining is not cheap. Furthermore, to share a malloc()-ed virtual address, device drivers need to deeply interact with Linux MM via low-level MM APIs, e.g. MMU notifiers/HMM. This raises the bar for driver development, since developers must understand how Linux MM works. Further, it creates code maintenance problems -- any changes to Linux MM potentially require coordinated changes to accelerator drivers using low-level MM APIs.

Putting a cache-coherent bus between host and device will not make these external MM subsystems disappear. For example, a throughput-oriented accelerator will not tolerate executing heavy memory access workloads with a host MMU/IOMMU via a remote bus. Therefore, devices will still have their own MMU and pick a simpler page table format for lower address translation overhead, requiring external MM subsystems.

What GMEM (Generalized Memory Management [1]) does:

GMEM extends Linux MM to share its machine-independent MM code. Only a high-level interface is provided for device drivers. This prevents accelerator drivers from reinventing the wheel, but relies on drivers to implement their hardware-dependent functions declared by GMEM. GMEM's key interfaces include gm_dev_create(), gm_as_create(), gm_as_attach() and gm_dev_register_physmem(). Briefly, a device driver utilizes them as follows:

1. At boot time, call gm_dev_create() and register the implementation of hardware-dependent functions as declared in struct gm_mmu.
   - If the device has local DRAM, call gm_dev_register_physmem() to register available physical addresses.
2. When a device context is initialized (e.g. triggered by ioctl), check if the current CPU process has been attached to a gmem address space (struct gm_as). If not, call gm_as_create() and point current->mm->gm_as to it.
3. Call gm_as_attach() to attach the device context to a gmem address space.
4. Invoke gm_dev_fault() to resolve a page fault or prepare data before device computation happens.

GMEM has changed the following assumptions in Linux MM:

1. An mm_struct not only handles a single CPU context, but may also handle external memory contexts encapsulated as gm_context listed in mm->gm_as. An external memory context can include a few or all of the following parts: an external MMU (that requires TLB invalidation), an external page table (that requires PTE manipulation) and external DRAM (that requires physical memory management).

2. Faulting a MAP_PRIVATE VMA with no CPU PTE found does not necessarily mean that a zero-filled physical page should be mapped. The virtual page may have been mapped to an external memory device.

3. Unmapping a page may include sending device TLB invalidations (even if its MMU shares the CPU page table) and manipulating device PTEs.

Semantics of new syscalls:

1. mmap(..., MAP_PRIVATE | MAP_PEER_SHARED)
   Allocate a virtual address range that is shared between the CPU and all attached devices. Data is guaranteed to be coherent whenever the address is accessed by either the CPU or any attached device. If the device does not support page faults, then the device driver is responsible for faulting memory before data gets accessed. By default, the CPU DRAM can be used as a swap backup for the device local memory.

2. hmadvise(NUMA_id, va_start, size, memory_hint)
   Issue memory hints for a given VMA. This extends the traditional madvise() syscall with an extra argument so that programmers have better control with heterogeneous devices registered as NUMA nodes. One useful memory hint could be MADV_PREFETCH, which guarantees that the physical data of the given VMA [VA, VA+size) is migrated to NUMA node #id. Another useful memory hint is MADV_DONTNEED. This is helpful to increase device memory utilization. It
Re: [PATCH] drm/amdgpu: Fix uninitialized return value
Am 28.11.23 um 10:49 schrieb Lazar, Lijo:

On 11/28/2023 3:07 PM, Christian König wrote:

Am 27.11.23 um 22:55 schrieb Alex Deucher:

On Mon, Nov 27, 2023 at 2:22 PM Christian König wrote:

Am 27.11.23 um 19:29 schrieb Lijo Lazar:

The return value is uninitialized if ras context is NULL.

Fixes: 0f4c8faa043c ("drm/amdgpu: Move mca debug mode decision to ras")
Signed-off-by: Lijo Lazar
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 1a8668a63e67..f6b47ebce9d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -3410,7 +3410,7 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device *adev)
 int amdgpu_ras_set_mca_debug_mode(struct amdgpu_device *adev, bool enable)
 {
 	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
-	int ret;
+	int ret = 0;

That's usually considered very bad coding style and complained about by automated checkers. Instead explicitly set the return value in the code paths not actually setting it.

In this case, the function is so short, I think it makes things less readable to do that.

Yeah, indeed but that doesn't help us with the coding style checkers.

Are these checkers for real? I see many instances of variable initialization, including in core kernel code (e.g. the workqueue code).

Yes, I've got multiple complaints about that already. What people basically seem to do is to search for patterns like "int ret = 0; ... ret = whatever(); ... return ret;" with cocci. This then results in a note that an initialization isn't necessary and should be avoided.

Same for things like return after else, e.g. when you have something like "if (...) { ret = whatever(); if (ret) return ret; } else { ... ret = 0; } return ret;".

Regards,
Christian.

Thanks,
Lijo

We could do something like this instead:

	if (!con)
		return 0;

	ret = amdgpu_mca_smu_set_debug_mode(adev, enable);
	...

Regards,
Christian.

Reviewed-by: Alex Deucher

Regards,
Christian.

 	if (con) {
 		ret = amdgpu_mca_smu_set_debug_mode(adev, enable);
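To make the style point in this thread concrete, here is a standalone sketch of the early-return variant Christian suggests, with stub types standing in for the amdgpu structures (struct ras_ctx and fake_smu_set_debug_mode() are ours, not real driver code). The behavior matches the `int ret = 0;` version, but no dead initializer remains for checkers to flag, because every path assigns the return value explicitly:

```c
#include <stdbool.h>
#include <stddef.h>

/* Stub standing in for struct amdgpu_ras; not real driver code. */
struct ras_ctx { int dummy; };

/* Stub standing in for amdgpu_mca_smu_set_debug_mode(). */
static int fake_smu_set_debug_mode(bool enable)
{
	(void)enable;
	return 0;	/* pretend the firmware call succeeded */
}

/* Early-return style: with no RAS context, report success immediately;
 * otherwise every remaining path assigns ret before returning it. */
int set_mca_debug_mode(struct ras_ctx *con, bool enable)
{
	int ret;

	if (!con)
		return 0;

	ret = fake_smu_set_debug_mode(enable);
	/* ... further work that may overwrite ret ... */
	return ret;
}
```

Compared with initializing `ret = 0` and wrapping the call in `if (con) { ... }`, the result is identical for the caller; the difference is purely that static checkers no longer see a potentially unnecessary initialization.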
Re: [PATCH] drm/amdgpu: add shared fdinfo stats
Am 17.11.23 um 20:56 schrieb Alex Deucher:

Add shared stats. Useful for seeing shared memory.

Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c |  4
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  6 ++
 3 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
index 5706b282a0c7..c7df7fa3459f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
@@ -97,6 +97,10 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct drm_file *file)
 		   stats.requested_visible_vram/1024UL);
 	drm_printf(p, "amd-requested-gtt:\t%llu KiB\n",
 		   stats.requested_gtt/1024UL);
+	drm_printf(p, "drm-shared-vram:\t%llu KiB\n", stats.vram_shared/1024UL);
+	drm_printf(p, "drm-shared-gtt:\t%llu KiB\n", stats.gtt_shared/1024UL);
+	drm_printf(p, "drm-shared-cpu:\t%llu KiB\n", stats.cpu_shared/1024UL);
+
 	for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) {
 		if (!usage[hw_ip])
 			continue;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index d79b4ca1ecfc..c24f7b2c04c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1287,25 +1287,36 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo, struct amdgpu_mem_stats *stats)
 {
 	uint64_t size = amdgpu_bo_size(bo);
+	struct drm_gem_object *obj;
 	unsigned int domain;
+	bool shared;

 	/* Abort if the BO doesn't currently have a backing store */
 	if (!bo->tbo.resource)
 		return;

+	obj = &bo->tbo.base;
+	shared = obj->handle_count > 1;

Interesting approach but I don't think that this is correct.

The handle_count is basically how many GEM handles there are for the BO, so for example it doesn't catch sharing things with V4L.

What we should probably rather do is to take a look at whether bo->tbo.base.dma_buf is NULL or not.

Regards,
Christian.

+
 	domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
 	switch (domain) {
 	case AMDGPU_GEM_DOMAIN_VRAM:
 		stats->vram += size;
 		if (amdgpu_bo_in_cpu_visible_vram(bo))
 			stats->visible_vram += size;
+		if (shared)
+			stats->vram_shared += size;
 		break;
 	case AMDGPU_GEM_DOMAIN_GTT:
 		stats->gtt += size;
+		if (shared)
+			stats->gtt_shared += size;
 		break;
 	case AMDGPU_GEM_DOMAIN_CPU:
 	default:
 		stats->cpu += size;
+		if (shared)
+			stats->cpu_shared += size;
 		break;
 	}

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index d28e21baef16..0503af75dc26 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -138,12 +138,18 @@ struct amdgpu_bo_vm {
 struct amdgpu_mem_stats {
 	/* current VRAM usage, includes visible VRAM */
 	uint64_t vram;
+	/* current shared VRAM usage, includes visible VRAM */
+	uint64_t vram_shared;
 	/* current visible VRAM usage */
 	uint64_t visible_vram;
 	/* current GTT usage */
 	uint64_t gtt;
+	/* current shared GTT usage */
+	uint64_t gtt_shared;
 	/* current system memory usage */
 	uint64_t cpu;
+	/* current shared system memory usage */
+	uint64_t cpu_shared;
 	/* sum of evicted buffers, includes visible VRAM */
 	uint64_t evicted_vram;
 	/* sum of evicted buffers due to CPU access */
Re: [PATCH 8/8] drm/xlnx: Do not include <drm/drm_plane_helper.h>
Hi

Am 28.11.23 um 12:02 schrieb Laurent Pinchart:

Hi Thomas,

Thank you for the patch.

On Tue, Nov 28, 2023 at 11:45:24AM +0100, Thomas Zimmermann wrote:

Remove unnecessary include statements for <drm/drm_plane_helper.h>. The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes.

Signed-off-by: Thomas Zimmermann

Assuming you've compile-tested the driver,

I've COMPILE_TEST'ed the driver on aarch64.

Reviewed-by: Laurent Pinchart

Please get this merged through drm-misc as part of the series if possible.

Sure.

Best regards
Thomas

---
 drivers/gpu/drm/xlnx/zynqmp_kms.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/xlnx/zynqmp_kms.c b/drivers/gpu/drm/xlnx/zynqmp_kms.c
index a7f8611be6f42..db3bb4afbfc46 100644
--- a/drivers/gpu/drm/xlnx/zynqmp_kms.c
+++ b/drivers/gpu/drm/xlnx/zynqmp_kms.c
@@ -27,7 +27,6 @@
 #include
 #include
 #include
-#include <drm/drm_plane_helper.h>
 #include
 #include
 #include

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
Re: [PATCH] drm/amdgpu: add shared fdinfo stats
On Tue, Nov 28, 2023 at 9:17 AM Christian König wrote: > > Am 17.11.23 um 20:56 schrieb Alex Deucher: > > Add shared stats. Useful for seeing shared memory. > > > > Signed-off-by: Alex Deucher > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 4 > > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 +++ > > drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 6 ++ > > 3 files changed, 21 insertions(+) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > > index 5706b282a0c7..c7df7fa3459f 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > > @@ -97,6 +97,10 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct > > drm_file *file) > > stats.requested_visible_vram/1024UL); > > drm_printf(p, "amd-requested-gtt:\t%llu KiB\n", > > stats.requested_gtt/1024UL); > > + drm_printf(p, "drm-shared-vram:\t%llu KiB\n", > > stats.vram_shared/1024UL); > > + drm_printf(p, "drm-shared-gtt:\t%llu KiB\n", stats.gtt_shared/1024UL); > > + drm_printf(p, "drm-shared-cpu:\t%llu KiB\n", stats.cpu_shared/1024UL); > > + > > for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) { > > if (!usage[hw_ip]) > > continue; > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > > index d79b4ca1ecfc..c24f7b2c04c1 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > > @@ -1287,25 +1287,36 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo, > > struct amdgpu_mem_stats *stats) > > { > > uint64_t size = amdgpu_bo_size(bo); > > + struct drm_gem_object *obj; > > unsigned int domain; > > + bool shared; > > > > /* Abort if the BO doesn't currently have a backing store */ > > if (!bo->tbo.resource) > > return; > > > > + obj = &bo->tbo.base; > > + shared = obj->handle_count > 1; > > Interesting approach but I don't think that this is correct. 
> > The handle_count is basically how many GEM handles are there for BO, so > for example it doesn't catch sharing things with V4L. > > What we should probably rather do is to take a look if > bo->tbo.base.dma_buf is NULL or not. +Rob, dri-devel This is what the generic drm helper code does. See drm_show_memory_stats(). If that is not correct that code should probably be fixed too. Alex > > Regards, > Christian. > > > > + > > domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type); > > switch (domain) { > > case AMDGPU_GEM_DOMAIN_VRAM: > > stats->vram += size; > > if (amdgpu_bo_in_cpu_visible_vram(bo)) > > stats->visible_vram += size; > > + if (shared) > > + stats->vram_shared += size; > > break; > > case AMDGPU_GEM_DOMAIN_GTT: > > stats->gtt += size; > > + if (shared) > > + stats->gtt_shared += size; > > break; > > case AMDGPU_GEM_DOMAIN_CPU: > > default: > > stats->cpu += size; > > + if (shared) > > + stats->cpu_shared += size; > > break; > > } > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > > index d28e21baef16..0503af75dc26 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > > @@ -138,12 +138,18 @@ struct amdgpu_bo_vm { > > struct amdgpu_mem_stats { > > /* current VRAM usage, includes visible VRAM */ > > uint64_t vram; > > + /* current shared VRAM usage, includes visible VRAM */ > > + uint64_t vram_shared; > > /* current visible VRAM usage */ > > uint64_t visible_vram; > > /* current GTT usage */ > > uint64_t gtt; > > + /* current shared GTT usage */ > > + uint64_t gtt_shared; > > /* current system memory usage */ > > uint64_t cpu; > > + /* current shared system memory usage */ > > + uint64_t cpu_shared; > > /* sum of evicted buffers, includes visible VRAM */ > > uint64_t evicted_vram; > > /* sum of evicted buffers due to CPU access */ >
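The trade-off between the two checks discussed above can be modelled in a few lines of userspace C. This is an illustration only: `toy_gem_object` and both helper functions are hypothetical stand-ins for the relevant `drm_gem_object` fields, not the actual DRM/TTM layout or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the two drm_gem_object fields the review debates. */
struct toy_gem_object {
	unsigned int handle_count; /* GEM handles referring to this BO */
	void *dma_buf;             /* non-NULL once exported as a dma-buf */
};

/* The patch's heuristic: shared if more than one GEM handle exists.
 * As noted in the review, this misses sharing that leaves the GEM
 * world entirely (e.g. a buffer handed to V4L). */
static bool shared_by_handle_count(const struct toy_gem_object *obj)
{
	return obj->handle_count > 1;
}

/* The suggested alternative: shared once a dma-buf export exists,
 * which also covers cross-subsystem sharing. */
static bool shared_by_dma_buf(const struct toy_gem_object *obj)
{
	return obj->dma_buf != NULL;
}
```

A buffer exported to another subsystem has one GEM handle but a live dma-buf, so the two heuristics genuinely disagree on it.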
Re: [7/8] drm/simpledrm: Do not include
Hi, On 2023/11/28 18:45, Thomas Zimmermann wrote: Remove unnecessary include statements for . The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes. Signed-off-by: Thomas Zimmermann Reviewed-by: Sui Jingfeng --- drivers/gpu/drm/tiny/simpledrm.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/tiny/simpledrm.c b/drivers/gpu/drm/tiny/simpledrm.c index 34bbbd7b53dd9..7ce1c46176750 100644 --- a/drivers/gpu/drm/tiny/simpledrm.c +++ b/drivers/gpu/drm/tiny/simpledrm.c @@ -25,7 +25,6 @@ #include #include #include -#include #include #define DRIVER_NAME "simpledrm"
[RFC PATCH 5/6] mm/gmem: resolve VMA conflicts for attached peer devices
This patch resolves potential VMA conflicts when mmap(MAP_PRIVATE | MAP_PEER_SHARED) is invoked. Note that the semantics of mmap(MAP_PRIVATE | MAP_PEER_SHARED) are to provide a coherent view of memory through the allocated virtual addresses between the CPU and all attached devices. However, an attached device may create its own computing context that does not necessarily share the same address space layout with the CPU process. Therefore, the mmap() syscall must return virtual addresses that are guaranteed to be valid across all attached peer devices. In the current implementation, if a candidate VMA is detected to be conflicting, it is temporarily blacklisted. The mmap_region() function will retry other VMA candidates for a predefined number of iterations. Signed-off-by: Weixi Zhu --- fs/proc/task_mmu.c | 3 ++ include/linux/gmem.h | 26 +++- include/linux/mm.h | 8 + include/uapi/asm-generic/mman-common.h | 1 + kernel/fork.c | 4 +++ mm/gmem.c | 38 mm/mempolicy.c | 4 +++ mm/mmap.c | 38 ++-- mm/vm_object.c | 41 ++ 9 files changed, 159 insertions(+), 4 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index ef2eb12906da..5af03d8f0319 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -701,6 +701,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ #ifdef CONFIG_X86_USER_SHADOW_STACK [ilog2(VM_SHADOW_STACK)] = "ss", +#endif +#ifdef CONFIG_GMEM + [ilog2(VM_PEER_SHARED)] = "ps", #endif }; size_t i; diff --git a/include/linux/gmem.h b/include/linux/gmem.h index 97186f29638d..82d88df5ce44 100644 --- a/include/linux/gmem.h +++ b/include/linux/gmem.h @@ -24,7 +24,10 @@ static inline bool gmem_is_enabled(void) static inline bool vma_is_peer_shared(struct vm_area_struct *vma) { - return false; + if (!gmem_is_enabled()) + return false; + + return !!(vma->vm_flags & VM_PEER_SHARED); } struct gm_dev { @@ -130,6 +133,8 @@ void unmap_gm_mappings_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end); void munmap_in_peer_devices(struct mm_struct *mm, unsigned long start, unsigned long end); +void gm_reserve_vma(struct vm_area_struct *value, struct list_head *head); +void gm_release_vma(struct mm_struct *mm, struct list_head *head); /* core gmem */ enum gm_ret { @@ -283,6 +288,10 @@ int gm_as_create(unsigned long begin, unsigned long end, struct gm_as **new_as); int gm_as_destroy(struct gm_as *as); int gm_as_attach(struct gm_as *as, struct gm_dev *dev, enum gm_mmu_mode mode, bool activate, struct gm_context **out_ctx); + +int gm_alloc_va_in_peer_devices(struct mm_struct *mm, + struct vm_area_struct *vma, unsigned long addr, + unsigned long len, vm_flags_t vm_flags); #else static inline bool gmem_is_enabled(void) { return false; } static inline bool vma_is_peer_shared(struct vm_area_struct *vma) @@ -339,6 +348,21 @@ int gm_as_attach(struct gm_as *as, struct gm_dev *dev, enum gm_mmu_mode mode, { return 0; } +static inline void gm_reserve_vma(struct vm_area_struct *value, + struct list_head *head) +{ +} +static inline void gm_release_vma(struct mm_struct *mm, struct list_head *head) +{ +} +static inline int gm_alloc_va_in_peer_devices(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long addr, + unsigned long len, + vm_flags_t vm_flags) +{ + return 0; +} #endif #endif /* _GMEM_H */ diff --git a/include/linux/mm.h b/include/linux/mm.h index 418d26608ece..8837624e4c66 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -320,14 +320,22 @@ extern unsigned int kobjsize(const void *objp); #define VM_HIGH_ARCH_BIT_3 35 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_4 36 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_5 37 /* bit only usable on 64-bit architectures */ +#define VM_HIGH_ARCH_BIT_6 38 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0) #define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1) #define VM_HIGH_ARCH_2 
BIT(VM_HIGH_ARCH_BIT_2) #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3) #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4) #define VM_HIGH_ARCH_5 BIT(VM_HIGH_ARCH_BIT_5) +#define VM_HIGH_ARCH_6 BIT(VM_HIGH_A
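The retry policy described in the commit message above (try a candidate VA, blacklist it on conflict, give up after a fixed number of attempts) can be sketched as a userspace model. Everything below except the GMEM_MMAP_RETRY_TIMES constant is hypothetical: the propose/accept callbacks stand in for mmap_region()'s candidate search and the per-device address-space validity checks.

```c
#include <assert.h>
#include <stdbool.h>

#define GMEM_MMAP_RETRY_TIMES 10 /* retry budget, as in the patch */

/* Hypothetical callbacks: the host proposes a candidate VA, and the
 * attached peer devices collectively accept or veto it. */
typedef unsigned long (*propose_fn)(int attempt);
typedef bool (*devices_accept_fn)(unsigned long addr);

/* Sketch of the negotiation loop: each rejected candidate is
 * implicitly "blacklisted" by moving on; 0 means exhaustion. */
static unsigned long gmem_pick_va(propose_fn propose, devices_accept_fn ok)
{
	for (int attempt = 0; attempt < GMEM_MMAP_RETRY_TIMES; attempt++) {
		unsigned long addr = propose(attempt);

		if (ok(addr))
			return addr; /* valid on CPU and all peers */
	}
	return 0; /* no VA acceptable to every attached device */
}

/* Toy stubs for a self-contained demo. */
static unsigned long demo_propose(int attempt)
{
	return 0x1000UL * (attempt + 1);
}
static bool demo_accept(unsigned long addr) { return addr >= 0x3000UL; }
static bool demo_reject_all(unsigned long addr) { (void)addr; return false; }
```

The real code must also release the reserved VMA candidates it accumulated (gm_release_vma() in the diff above); the model omits that bookkeeping.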
Re: [PATCH 5/8] drm/solomon: Do not include
On Tue, Nov 28, 2023 at 11:47 AM Thomas Zimmermann wrote: > Remove unnecessary include statements for . > The file contains helpers for non-atomic code and should not be > required by most drivers. No functional changes. > > Signed-off-by: Thomas Zimmermann Reviewed-by: Geert Uytterhoeven Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
[RFC PATCH 3/6] mm/gmem: add GMEM (Generalized Memory Management) interface for external accelerators
Accelerator driver developers are forced to reinvent external MM subsystems case by case, introducing redundant code (14K~70K for each case). This is because Linux core MM only considers host memory resources. At the same time, application-level developers suffer from poor programmability -- they must consider parallel address spaces and be careful about the limited device DRAM capacity. This patch adds the GMEM interface to help accelerator drivers directly reuse Linux core MM, preventing them from reinventing the wheel. Drivers that utilize the GMEM interface can directly support unified virtual address spaces for application users -- memory allocated with malloc()/mmap() can be directly used by both the CPU and accelerators, providing a coherent view of memory. The GMEM device interface prefixed with "gm_dev" is used to decouple accelerator-specific operations. Device drivers should invoke gm_dev_create() to register a device instance at device boot time. A device-specific implementation of "struct gm_mmu" must be provided, so Linux can invoke hardware-related functions at the right time. If the driver wants Linux to take charge of the local DRAM of the accelerator, then it should register a range of physical addresses to be managed by gm_dev_register_physmem(). The GMEM address space interface prefixed with "gm_as" is used to connect a device context with a CPU context, i.e. an mm_struct. Struct gm_as is created as a unified address space that not only includes a CPU context, but may also include one or more device contexts. Device drivers should utilize gm_as_attach() to add a device context to a created struct gm_as. gm_dev_fault() can then serve as a generic device page fault handler. It is important that a device driver invokes gm_as_attach() at the beginning of a CPU program. This invocation can happen inside an ioctl() call when a device context is initialized.
Signed-off-by: Weixi Zhu --- include/linux/gmem.h | 196 +++ include/linux/mm_types.h | 1 + mm/gmem.c| 408 +++ 3 files changed, 605 insertions(+) diff --git a/include/linux/gmem.h b/include/linux/gmem.h index 529ff6755a99..f424225daa03 100644 --- a/include/linux/gmem.h +++ b/include/linux/gmem.h @@ -13,6 +13,35 @@ #ifdef CONFIG_GMEM +#define GMEM_MMAP_RETRY_TIMES 10 /* gmem retry times before OOM */ + +DECLARE_STATIC_KEY_FALSE(gmem_status); + +static inline bool gmem_is_enabled(void) +{ + return static_branch_likely(&gmem_status); +} + +struct gm_dev { + int id; + + /* +* TODO: define more device capabilities and consider different device +* base page sizes +*/ + unsigned long capability; + struct gm_mmu *mmu; + void *dev_data; + /* A device may support time-sliced context switch. */ + struct gm_context *current_ctx; + + struct list_head gm_ctx_list; + + /* Add tracking of registered device local physical memory. */ + nodemask_t registered_hnodes; + struct device *dma_dev; +}; + #define GM_PAGE_CPU0x10 /* Determines whether page is a pointer or a pfn number. */ #define GM_PAGE_DEVICE 0x20 #define GM_PAGE_NOMAP 0x40 @@ -96,7 +125,161 @@ void unmap_gm_mappings_range(struct vm_area_struct *vma, unsigned long start, unsigned long end); void munmap_in_peer_devices(struct mm_struct *mm, unsigned long start, unsigned long end); + +/* core gmem */ +enum gm_ret { + GM_RET_SUCCESS = 0, + GM_RET_NOMEM, + GM_RET_PAGE_EXIST, + GM_RET_MIGRATING, + GM_RET_FAILURE_UNKNOWN, +}; + +/** + * enum gm_mmu_mode - defines the method to share a physical page table. + * + * @GM_MMU_MODE_SHARE: Share a physical page table with another attached + * device's MMU, requiring one of the attached MMUs to be compatible. For + * example, the IOMMU is compatible with the CPU MMU on most modern machines. + * This mode requires the device physical memory to be cache-coherent. + * TODO: add MMU cookie to detect compatible MMUs. 
+ * + * @GM_MMU_MODE_COHERENT_EXCLUSIVE: Maintain a coherent page table that holds + * exclusive mapping entries, so that device memory accesses can trigger + * fault-driven migration for automatic data locality optimizations. + * This mode does not require a cache-coherent link between the CPU and device. + * + * @GM_MMU_MODE_REPLICATE: Maintain a coherent page table that replicates + * physical mapping entries whenever a physical mapping is installed inside the + * address space, so that it may minimize the page faults to be triggered by + * this device. + * This mode requires the device physical memory to be cache-coherent. + */ +enum gm_mmu_mode { + GM_MMU_MODE_SHARE, + GM_MMU_MODE_COHERENT_EXCLUSIVE, + GM_MMU_MODE_REPLICATE, +}; + +enum gm_fault_hint { + GM_FAULT_HINT_MARK_HOT, + /* +* TODO: introduce other fault hints, e.g. read-on
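The boot-time registration flow the commit message describes (provide a struct gm_mmu ops table, then call gm_dev_create()) can be modelled in userspace. All names below (toy_*) are illustrative stand-ins for the GMEM API, not the actual signatures from mm/gmem.c.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct gm_mmu: hardware-dependent callbacks the core
 * invokes at the right time. */
struct toy_mmu_ops {
	int (*peer_map)(unsigned long va, unsigned long pa);
	int (*peer_unmap)(unsigned long va);
};

struct toy_gm_dev {
	const struct toy_mmu_ops *mmu;
	int id;
};

#define TOY_MAX_DEVS 4
static struct toy_gm_dev toy_devs[TOY_MAX_DEVS];
static int toy_nr_devs;

/* Models gm_dev_create(): reject incomplete ops tables and record the
 * device instance so the core can call back into the driver later. */
static struct toy_gm_dev *toy_gm_dev_create(const struct toy_mmu_ops *mmu)
{
	struct toy_gm_dev *dev;

	if (!mmu || !mmu->peer_map || !mmu->peer_unmap ||
	    toy_nr_devs >= TOY_MAX_DEVS)
		return NULL;
	dev = &toy_devs[toy_nr_devs];
	dev->mmu = mmu;
	dev->id = toy_nr_devs++;
	return dev;
}

/* Demo driver callbacks for a self-contained example. */
static int demo_map(unsigned long va, unsigned long pa)
{
	(void)va; (void)pa;
	return 0;
}
static int demo_unmap(unsigned long va) { (void)va; return 0; }
static const struct toy_mmu_ops demo_ops = { demo_map, demo_unmap };
```

The real interface additionally takes device capabilities and, optionally, a physical-memory range via gm_dev_register_physmem(); the model keeps only the ops-table validation idea.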
[RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices
The problem: Accelerator driver developers are forced to reinvent external MM subsystems case by case, because Linux core MM only considers host memory resources. These reinvented MM subsystems have LoC counts of the same order of magnitude as Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has 30K. Meanwhile, more and more vendors are implementing their own accelerators, e.g. Microsoft's Maia 100. At the same time, application-level developers suffer from poor programmability -- they must consider parallel address spaces and be careful about the limited device DRAM capacity. This can be alleviated if a malloc()-ed virtual address can be shared by the accelerator, or the abundant host DRAM can further transparently back up the device local memory. These external MM systems share similar mechanisms except for the hardware-dependent part, so reinventing them is effectively introducing redundant code (14K~70K for each case). Such development and maintenance is not cheap. Furthermore, to share a malloc()-ed virtual address, device drivers need to deeply interact with Linux MM via low-level MM APIs, e.g. MMU notifiers/HMM. This raises the bar for driver development, since developers must understand how Linux MM works. Further, it creates code maintenance problems -- any changes to Linux MM potentially require coordinated changes to accelerator drivers using low-level MM APIs. Putting a cache-coherent bus between host and device will not make these external MM subsystems disappear. For example, a throughput-oriented accelerator will not tolerate executing a heavy memory-access workload with a host MMU/IOMMU via a remote bus. Therefore, devices will still have their own MMU and pick a simpler page table format for lower address translation overhead, requiring external MM subsystems. What GMEM (Generalized Memory Management [1]) does: GMEM extends Linux MM to share its machine-independent MM code. Only a high-level interface is provided for device drivers.
This prevents accelerator drivers from reinventing the wheel, but relies on drivers to implement their hardware-dependent functions declared by GMEM. GMEM's key interfaces include gm_dev_create(), gm_as_create(), gm_as_attach() and gm_dev_register_physmem(). Here is a brief description of how a device driver utilizes them: 1. At boot time, call gm_dev_create() and register the implementation of hardware-dependent functions as declared in struct gm_mmu. - If the device has local DRAM, call gm_dev_register_physmem() to register available physical addresses. 2. When a device context is initialized (e.g. triggered by ioctl), check if the current CPU process has been attached to a gmem address space (struct gm_as). If not, call gm_as_create() and point current->mm->gm_as to it. 3. Call gm_as_attach() to attach the device context to a gmem address space. 4. Invoke gm_dev_fault() to resolve a page fault or prepare data before device computation happens. GMEM has changed the following assumptions in Linux MM: 1. An mm_struct not only handles a single CPU context, but may also handle external memory contexts encapsulated as gm_context listed in mm->gm_as. An external memory context can include a few or all of the following parts: an external MMU (that requires TLB invalidation), an external page table (that requires PTE manipulation) and external DRAM (that requires physical memory management). 2. Faulting a MAP_PRIVATE VMA with no CPU PTE found does not necessarily mean that a zero-filled physical page should be mapped. The virtual page may have been mapped to an external memory device. 3. Unmapping a page may include sending device TLB invalidation (even if its MMU shares the CPU page table) and manipulating device PTEs. Semantics of new syscalls: 1. mmap(..., MAP_PRIVATE | MAP_PEER_SHARED) Allocate a virtual address range that is shared between the CPU and all attached devices. Data is guaranteed to be coherent whenever the address is accessed by either the CPU or any attached device.
If the device does not support page fault, then the device driver is responsible for faulting memory before data gets accessed. By default, the CPU DRAM can be used as a swap backup for the device local memory. 2. hmadvise(NUMA_id, va_start, size, memory_hint) Issue a memory hint for a given VMA. This extends the traditional madvise() syscall with an extra argument so that programmers have better control over heterogeneous devices registered as NUMA nodes. One useful memory hint could be MADV_PREFETCH, which guarantees that the physical data of the given VMA [VA, VA+size) is migrated to NUMA node #id. Another useful memory hint is MADV_DONTNEED. This is helpful to increase device memory utilization. It is worth considering extending the existing madvise() syscall with one additional argument. Implementation det
[RFC PATCH 4/6] mm/gmem: add new syscall hmadvise() to issue memory hints for heterogeneous NUMA nodes
This patch adds a new syscall, hmadvise(), to issue memory hints for heterogeneous NUMA nodes. The new syscall effectively extends madvise() with one additional argument that indicates the NUMA id of a heterogeneous device, which is not necessarily accessible by the CPU. The implemented memory hint is MADV_PREFETCH, which guarantees that the physical data of the given VMA [VA, VA+size) is migrated to a designated NUMA id, so subsequent accesses from the corresponding device can obtain local memory access speed. This prefetch hint is internally parallelized with multiple workqueue threads, allowing the page table management to be overlapped. In a test with Huawei's Ascend NPU card, MADV_PREFETCH is able to saturate the host-device bandwidth if the given VMA size is larger than 16MB. Signed-off-by: Weixi Zhu --- arch/arm64/include/asm/unistd.h | 2 +- arch/arm64/include/asm/unistd32.h | 2 + include/linux/gmem.h| 9 + include/uapi/asm-generic/mman-common.h | 3 + include/uapi/asm-generic/unistd.h | 5 +- kernel/sys_ni.c | 2 + mm/gmem.c | 222 tools/include/uapi/asm-generic/unistd.h | 5 +- 8 files changed, 247 insertions(+), 3 deletions(-) diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h index 531effca5f1f..298313d2e0af 100644 --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -39,7 +39,7 @@ #define __ARM_NR_compat_set_tls(__ARM_NR_COMPAT_BASE + 5) #define __ARM_NR_COMPAT_END(__ARM_NR_COMPAT_BASE + 0x800) -#define __NR_compat_syscalls 457 +#define __NR_compat_syscalls 458 #endif #define __ARCH_WANT_SYS_CLONE diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index 9f7c1bf99526..0d44383b98be 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -919,6 +919,8 @@ __SYSCALL(__NR_futex_wake, sys_futex_wake) __SYSCALL(__NR_futex_wait, sys_futex_wait) #define __NR_futex_requeue 456 __SYSCALL(__NR_futex_requeue, sys_futex_requeue) +#define __NR_hmadvise
457 +__SYSCALL(__NR_hmadvise, sys_hmadvise) /* * Please add new compat syscalls above this comment and update diff --git a/include/linux/gmem.h b/include/linux/gmem.h index f424225daa03..97186f29638d 100644 --- a/include/linux/gmem.h +++ b/include/linux/gmem.h @@ -22,6 +22,11 @@ static inline bool gmem_is_enabled(void) return static_branch_likely(&gmem_status); } +static inline bool vma_is_peer_shared(struct vm_area_struct *vma) +{ + return false; +} + struct gm_dev { int id; @@ -280,6 +285,10 @@ int gm_as_attach(struct gm_as *as, struct gm_dev *dev, enum gm_mmu_mode mode, bool activate, struct gm_context **out_ctx); #else static inline bool gmem_is_enabled(void) { return false; } +static inline bool vma_is_peer_shared(struct vm_area_struct *vma) +{ + return false; +} static inline void hnuma_init(void) {} static inline void __init vm_object_init(void) { diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index 6ce1f1ceb432..49b22a497c5d 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -79,6 +79,9 @@ #define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */ +/* for hmadvise */ +#define MADV_PREFETCH 26 /* prefetch pages for hNUMA node */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 756b013fb832..a0773d4f7fa5 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -829,8 +829,11 @@ __SYSCALL(__NR_futex_wait, sys_futex_wait) #define __NR_futex_requeue 456 __SYSCALL(__NR_futex_requeue, sys_futex_requeue) +#define __NR_hmadvise 453 +__SYSCALL(__NR_hmadvise, sys_hmadvise) + #undef __NR_syscalls -#define __NR_syscalls 457 +#define __NR_syscalls 458 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index e1a6e3c675c0..73bc1b35b8c6 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -374,3 +374,5 @@ 
COND_SYSCALL(setuid16); /* restartable sequence */ COND_SYSCALL(rseq); + +COND_SYSCALL(hmadvise); diff --git a/mm/gmem.c b/mm/gmem.c index b95b6b42ed6d..4eb522026a0d 100644 --- a/mm/gmem.c +++ b/mm/gmem.c @@ -9,6 +9,8 @@ #include #include #include +#include +#include DEFINE_STATIC_KEY_FALSE(gmem_status); EXPORT_SYMBOL_GPL(gmem_status); @@ -484,3 +486,223 @@ int gm_as_attach(struct gm_as *as, struct gm_dev *dev, enum gm_mmu_mode mode, return GM_RET_SUCCESS; } EXPORT_SYMBOL_GPL(gm_as_attach); + +struct prefetch_data { + struct mm_struct *mm; + struct gm_dev *dev; + unsigned long addr; + size_t size; + struct work_struct work;
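The commit message says a prefetch request is parallelized across workqueue threads. A userspace model of the request-splitting step makes the idea concrete. The 2 MiB chunk granularity is an assumption for illustration only; the real work-item size lives in mm/gmem.c.

```c
#include <stddef.h>

#define TOY_PREFETCH_CHUNK (2UL << 20) /* 2 MiB, illustrative */

/* Split one MADV_PREFETCH range [start, start+size) into per-chunk
 * work items, one per workqueue thread submission. Returns the number
 * of chunks written into chunk_starts. */
static unsigned long prefetch_split(unsigned long start, unsigned long size,
				    unsigned long *chunk_starts,
				    unsigned long max_chunks)
{
	unsigned long n = 0;
	unsigned long off;

	for (off = 0; off < size && n < max_chunks; off += TOY_PREFETCH_CHUNK)
		chunk_starts[n++] = start + off; /* one work item per chunk */
	return n;
}
```

With chunks this size, a 16 MiB VMA (the saturation point quoted above) yields eight independent work items whose page-table manipulation can overlap with DMA.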
Re: [3/8] drm/loongson: Do not include
Hi, Thank you for the patch. On 2023/11/28 18:45, Thomas Zimmermann wrote: Remove unnecessary include statements for . The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes. Signed-off-by: Thomas Zimmermann Reviewed-by: Sui Jingfeng Tested-by: Sui Jingfeng --- drivers/gpu/drm/loongson/lsdc_plane.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/loongson/lsdc_plane.c b/drivers/gpu/drm/loongson/lsdc_plane.c index 0d50946332229..d227a2c1dcf16 100644 --- a/drivers/gpu/drm/loongson/lsdc_plane.c +++ b/drivers/gpu/drm/loongson/lsdc_plane.c @@ -9,7 +9,6 @@ #include #include #include -#include #include "lsdc_drv.h" #include "lsdc_regs.h"
[RFC PATCH 6/6] mm/gmem: extending Linux core MM to support unified virtual address space
This patch extends Linux core MM to support a unified virtual address space. A unified virtual address space provides a coherent view of memory for the CPU and devices. This is achieved by maintaining coherent page tables for the CPU and any attached devices for each process, without assuming that the underlying interconnect between the CPU and peripheral device is cache-coherent. Specifically, for each mm_struct that is attached with one or more device computing contexts, a per-process logical page table is utilized to track the mapping status of anonymous memory allocated via mmap(MAP_PRIVATE | MAP_PEER_SHARED). The CPU page fault handling path is modified to examine whether a faulted virtual page has already been faulted elsewhere, e.g. on a device, by looking up the logical page table in vm_object. If so, a page migration operation should be orchestrated by the core MM to prepare the CPU physical page, instead of zero-filling. This is achieved by invoking gm_host_fault_locked(). The logical page table must also be updated once the CPU page table gets modified. Ideally, the logical page table should always be looked up or modified first if the CPU page table is changed, but the current implementation is the reverse. Also, the current implementation only considers anonymous memory, while a device may want to operate on a disk-file directly via mmap(fd). In the future, the logical page table is planned to play a more generic role for anonymous memory, folios/huge pages and file-backed memory, as well as to provide a clean abstraction for CPU page table functions (including these stage-2 functions). Moreover, the page-fault handling path will be enhanced to deal with cache-coherent buses as well, since it might be desirable for devices to operate on sparse data remotely instead of migrating data at page granularity.
Signed-off-by: Weixi Zhu --- kernel/fork.c| 1 + mm/huge_memory.c | 85 +++- mm/memory.c | 42 +--- mm/mmap.c| 2 ++ mm/oom_kill.c| 2 ++ mm/vm_object.c | 84 +++ 6 files changed, 203 insertions(+), 13 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index eab96cdb25a6..06130c73bf2e 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -543,6 +543,7 @@ static void vm_area_free_rcu_cb(struct rcu_head *head) void vm_area_free(struct vm_area_struct *vma) { + free_gm_mappings(vma); #ifdef CONFIG_PER_VMA_LOCK call_rcu(&vma->vm_rcu, vm_area_free_rcu_cb); #else diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 4f542444a91f..59f63f04 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -37,6 +37,7 @@ #include #include #include +#include #include #include @@ -684,6 +685,10 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, pgtable_t pgtable; unsigned long haddr = vmf->address & HPAGE_PMD_MASK; vm_fault_t ret = 0; + struct gm_mapping *gm_mapping = NULL; + + if (vma_is_peer_shared(vma)) + gm_mapping = vm_object_lookup(vma->vm_mm->vm_obj, haddr); VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); @@ -691,7 +696,8 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, folio_put(folio); count_vm_event(THP_FAULT_FALLBACK); count_vm_event(THP_FAULT_FALLBACK_CHARGE); - return VM_FAULT_FALLBACK; + ret = VM_FAULT_FALLBACK; + goto gm_mapping_release; } folio_throttle_swaprate(folio, gfp); @@ -701,7 +707,14 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, goto release; } - clear_huge_page(page, vmf->address, HPAGE_PMD_NR); + /* +* Skip zero-filling page if the logical mapping indicates +* that page contains valid data of the virtual address. This +* could happen if the page was a victim of device memory +* oversubscription. 
+*/ + if (!(vma_is_peer_shared(vma) && gm_mapping_cpu(gm_mapping))) + clear_huge_page(page, vmf->address, HPAGE_PMD_NR); /* * The memory barrier inside __folio_mark_uptodate makes sure that * clear_huge_page writes become visible before the set_pmd_at() @@ -726,7 +739,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, pte_free(vma->vm_mm, pgtable); ret = handle_userfault(vmf, VM_UFFD_MISSING); VM_BUG_ON(ret & VM_FAULT_FALLBACK); - return ret; + goto gm_mapping_release; } entry = mk_huge_pmd(page, vma->vm_page_prot); @@ -734,6 +747,13 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, folio_add_new_anon_rmap(folio, vma, haddr); folio_add_lru_vma(folio, vma);
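The fault-path decision in the hunk above (skip clear_huge_page() when the logical mapping says the VA already holds valid CPU data) reduces to a small predicate. The model below mirrors the patch's `!(vma_is_peer_shared(vma) && gm_mapping_cpu(gm_mapping))` condition; the flag values are copied from the GM_PAGE_* definitions in the series, everything else is a stand-in.

```c
#include <stdbool.h>

#define GM_PAGE_CPU    0x10 /* flag bits as defined in include/linux/gmem.h */
#define GM_PAGE_DEVICE 0x20

/* Decide whether the anonymous-fault path should zero-fill the new
 * page. Non-GMEM VMAs keep the normal behaviour; a peer-shared VMA
 * whose logical mapping already records a valid CPU copy (e.g. a
 * victim of device memory oversubscription brought back to host)
 * must not be wiped. */
static bool need_zero_fill(bool peer_shared, unsigned int mapping_flags)
{
	if (!peer_shared)
		return true;
	if (mapping_flags & GM_PAGE_CPU)
		return false;
	return true;
}
```

Note that a device-resident mapping (GM_PAGE_DEVICE) still zero-fills under this predicate, matching the patch; the migration of device data is handled separately via gm_host_fault_locked().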
[RFC PATCH 2/6] mm/gmem: add arch-independent abstraction to track address mapping status
This patch adds an abstraction layer, struct vm_object, that maintains per-process virtual-to-physical mapping status stored in struct gm_mapping. For example, a virtual page may be mapped to a CPU physical page or to a device physical page. Struct vm_object effectively maintains an arch-independent page table, which is defined as a "logical page table", while the arch-dependent page table used by a real MMU is called a "physical page table". The logical page table is useful if Linux core MM is extended to handle a unified virtual address space with external accelerators using customized MMUs. In this patch, struct vm_object utilizes a radix tree (xarray) to track where a virtual page is mapped. This adds extra memory consumption from the xarray, but provides a nice abstraction to isolate mapping status from the machine-dependent layer (PTEs). Besides supporting accelerators with external MMUs, struct vm_object is planned to further union with i_pages in struct address_space for file-backed memory. The idea of struct vm_object originated from the FreeBSD VM design, which provides a unified abstraction for anonymous memory, file-backed memory, page cache, etc. [1]. Currently, Linux utilizes a set of hierarchical page walk functions to abstract page table manipulations of different CPU architectures. The problem happens when a device wants to reuse Linux MM code to manage its page table -- the device page table may not be accessible to the CPU. Existing solutions like Linux HMM utilize the MMU notifier mechanisms to invoke device-specific MMU functions, but rely on encoding the mapping status on the CPU page table entries. This entangles machine-independent code with machine-dependent code, and also brings unnecessary restrictions. The PTE size and format vary from arch to arch, which harms extensibility.
[1] https://docs.freebsd.org/en/articles/vm-design/ Signed-off-by: Weixi Zhu --- include/linux/gmem.h | 120 + include/linux/mm_types.h | 4 + mm/Makefile | 2 +- mm/vm_object.c | 184 +++ 4 files changed, 309 insertions(+), 1 deletion(-) create mode 100644 mm/vm_object.c diff --git a/include/linux/gmem.h b/include/linux/gmem.h index fff877873557..529ff6755a99 100644 --- a/include/linux/gmem.h +++ b/include/linux/gmem.h @@ -9,11 +9,131 @@ #ifndef _GMEM_H #define _GMEM_H +#include + #ifdef CONFIG_GMEM + +#define GM_PAGE_CPU0x10 /* Determines whether page is a pointer or a pfn number. */ +#define GM_PAGE_DEVICE 0x20 +#define GM_PAGE_NOMAP 0x40 +#define GM_PAGE_WILLNEED 0x80 + +#define GM_PAGE_TYPE_MASK (GM_PAGE_CPU | GM_PAGE_DEVICE | GM_PAGE_NOMAP) + +struct gm_mapping { + unsigned int flag; + + union { + struct page *page; /* CPU node */ + struct gm_dev *dev; /* hetero-node. TODO: support multiple devices */ + unsigned long pfn; + }; + + struct mutex lock; +}; + +static inline void gm_mapping_flags_set(struct gm_mapping *gm_mapping, int flags) +{ + if (flags & GM_PAGE_TYPE_MASK) + gm_mapping->flag &= ~GM_PAGE_TYPE_MASK; + + gm_mapping->flag |= flags; +} + +static inline void gm_mapping_flags_clear(struct gm_mapping *gm_mapping, int flags) +{ + gm_mapping->flag &= ~flags; +} + +static inline bool gm_mapping_cpu(struct gm_mapping *gm_mapping) +{ + return !!(gm_mapping->flag & GM_PAGE_CPU); +} + +static inline bool gm_mapping_device(struct gm_mapping *gm_mapping) +{ + return !!(gm_mapping->flag & GM_PAGE_DEVICE); +} + +static inline bool gm_mapping_nomap(struct gm_mapping *gm_mapping) +{ + return !!(gm_mapping->flag & GM_PAGE_NOMAP); +} + +static inline bool gm_mapping_willneed(struct gm_mapping *gm_mapping) +{ + return !!(gm_mapping->flag & GM_PAGE_WILLNEED); +} + /* h-NUMA topology */ void __init hnuma_init(void); + +/* vm object */ +/* + * Each per-process vm_object tracks the mapping status of virtual pages from + * all VMAs mmap()-ed with MAP_PRIVATE | 
MAP_PEER_SHARED. + */ +struct vm_object { + spinlock_t lock; + + /* +* The logical_page_table is a container that holds the mapping +* information between a VA and a struct page. +*/ + struct xarray *logical_page_table; + atomic_t nr_pages; +}; + +int __init vm_object_init(void); +struct vm_object *vm_object_create(struct mm_struct *mm); +void vm_object_drop_locked(struct mm_struct *mm); + +struct gm_mapping *alloc_gm_mapping(void); +void free_gm_mappings(struct vm_area_struct *vma); +struct gm_mapping *vm_object_lookup(struct vm_object *obj, unsigned long va); +void vm_object_mapping_create(struct vm_object *obj, unsigned long start); +void unmap_gm_mappings_range(struct vm_area_struct *vma, unsigned long start, +unsigned long end); +void munmap_in_peer_devices(struct mm_struct *mm, unsigned long start, + u
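The logical-page-table idea (a per-process map from virtual page index to a gm_mapping) can be modelled without the kernel xarray. The toy below indexes a flat slot array by `va >> PAGE_SHIFT`, the same key the patch uses for its xarray; slot count, names, and the use of a bare flag word instead of a struct gm_mapping pointer are all simplifications.

```c
#include <stddef.h>

#define TOY_PAGE_SHIFT 12
#define TOY_NR_SLOTS   256

/* Toy stand-in for the xarray-backed logical page table: one slot per
 * virtual page, holding the mapping's flag word (0 = untracked). */
struct toy_vm_object {
	unsigned int slots[TOY_NR_SLOTS];
};

/* Models vm_object_mapping_create(): record mapping status for a VA. */
static void toy_mapping_create(struct toy_vm_object *obj, unsigned long va,
			       unsigned int flags)
{
	obj->slots[(va >> TOY_PAGE_SHIFT) % TOY_NR_SLOTS] = flags;
}

/* Models vm_object_lookup(): fetch mapping status by VA, keyed by
 * page index exactly as the kernel code keys its xarray. */
static unsigned int toy_vm_object_lookup(struct toy_vm_object *obj,
					 unsigned long va)
{
	return obj->slots[(va >> TOY_PAGE_SHIFT) % TOY_NR_SLOTS];
}
```

The real structure uses an xarray so the index space is sparse and unbounded, and it stores pointers to locked struct gm_mapping objects rather than raw flags.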
Re: [6/8] drm/ofdrm: Do not include
Hi, On 2023/11/28 18:45, Thomas Zimmermann wrote: Remove unnecessary include statements for . The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes. Signed-off-by: Thomas Zimmermann Reviewed-by: Sui Jingfeng --- drivers/gpu/drm/tiny/ofdrm.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c index 05a72473cfc65..ab89b7fc7bf61 100644 --- a/drivers/gpu/drm/tiny/ofdrm.c +++ b/drivers/gpu/drm/tiny/ofdrm.c @@ -19,7 +19,6 @@ #include #include #include -#include #include #include
Re: [PATCH 4/8] drm/shmobile: Do not include
On Tue, Nov 28, 2023 at 11:47 AM Thomas Zimmermann wrote: > Remove unnecessary include statements for . > The file contains helpers for non-atomic code and should not be > required by most drivers. No functional changes. > > Signed-off-by: Thomas Zimmermann Reviewed-by: Geert Uytterhoeven Acked-by: Geert Uytterhoeven Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
[RFC PATCH 1/6] mm/gmem: add heterogeneous NUMA node
This patch adds a new NUMA node state, named N_HETEROGENEOUS. It is utilized to identify heterogeneous NUMA (hNUMA) node. Note that hNUMA node may not be directly accessible by the CPU. Each hNUMA node can be identified with a NUMA id. This can be extended to provide NUMA topology including device local DRAM, where a cache-coherent bus does not need to exist between the CPU and device local DRAM. Furthermore, this allows an application user to issue memory hints that bind with specific hNUMA nodes. Signed-off-by: Weixi Zhu --- drivers/base/node.c | 6 include/linux/gmem.h | 19 ++ include/linux/nodemask.h | 10 ++ init/main.c | 2 ++ mm/Kconfig | 14 mm/Makefile | 1 + mm/gmem.c| 78 mm/page_alloc.c | 3 ++ 8 files changed, 133 insertions(+) create mode 100644 include/linux/gmem.h create mode 100644 mm/gmem.c diff --git a/drivers/base/node.c b/drivers/base/node.c index 493d533f8375..aa4d2ca266aa 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -928,6 +928,9 @@ static struct node_attr node_state_attr[] = { [N_CPU] = _NODE_ATTR(has_cpu, N_CPU), [N_GENERIC_INITIATOR] = _NODE_ATTR(has_generic_initiator, N_GENERIC_INITIATOR), +#ifdef CONFIG_GMEM + [N_HETEROGENEOUS] = _NODE_ATTR(has_hetero_memory, N_HETEROGENEOUS), +#endif }; static struct attribute *node_state_attrs[] = { @@ -940,6 +943,9 @@ static struct attribute *node_state_attrs[] = { &node_state_attr[N_MEMORY].attr.attr, &node_state_attr[N_CPU].attr.attr, &node_state_attr[N_GENERIC_INITIATOR].attr.attr, +#ifdef CONFIG_GMEM + &node_state_attr[N_HETEROGENEOUS].attr.attr, +#endif NULL }; diff --git a/include/linux/gmem.h b/include/linux/gmem.h new file mode 100644 index ..fff877873557 --- /dev/null +++ b/include/linux/gmem.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Generalized Memory Management. + * + * Copyright (C) 2023- Huawei, Inc. 
+ * Author: Weixi Zhu + * + */ +#ifndef _GMEM_H +#define _GMEM_H + +#ifdef CONFIG_GMEM +/* h-NUMA topology */ +void __init hnuma_init(void); +#else +static inline void hnuma_init(void) {} +#endif + +#endif /* _GMEM_H */ diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h index 8d07116caaf1..66e4640a52ba 100644 --- a/include/linux/nodemask.h +++ b/include/linux/nodemask.h @@ -407,6 +407,9 @@ enum node_states { N_MEMORY, /* The node has memory(regular, high, movable) */ N_CPU, /* The node has one or more cpus */ N_GENERIC_INITIATOR,/* The node has one or more Generic Initiators */ +#ifdef CONFIG_GMEM + N_HETEROGENEOUS,/* The node has heterogeneous memory */ +#endif NR_NODE_STATES }; @@ -536,6 +539,13 @@ static inline int node_random(const nodemask_t *maskp) #define for_each_node(node) for_each_node_state(node, N_POSSIBLE) #define for_each_online_node(node) for_each_node_state(node, N_ONLINE) +#ifdef CONFIG_GMEM +/* For h-NUMA topology */ +#define hnode_map node_states[N_HETEROGENEOUS] +#define num_hnodes() num_node_state(N_HETEROGENEOUS) +#define for_each_hnode(node) for_each_node_state(node, N_HETEROGENEOUS) +#endif + /* * For nodemask scratch area. 
* NODEMASK_ALLOC(type, name) allocates an object with a specified type and diff --git a/init/main.c b/init/main.c index e24b0780fdff..12dfb5b63d51 100644 --- a/init/main.c +++ b/init/main.c @@ -100,6 +100,7 @@ #include #include #include +#include #include #include @@ -901,6 +902,7 @@ void start_kernel(void) setup_per_cpu_areas(); smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */ boot_cpu_hotplug_init(); + hnuma_init(); pr_notice("Kernel command line: %s\n", saved_command_line); /* parameters may set static keys */ diff --git a/mm/Kconfig b/mm/Kconfig index 89971a894b60..1a7d8194513c 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1270,6 +1270,20 @@ config LOCK_MM_AND_FIND_VMA bool depends on !STACK_GROWSUP +config GMEM + bool "generalized memory management for external memory devices" + depends on (ARM64 || X86_64) && MMU && TRANSPARENT_HUGEPAGE + select ARCH_USES_HIGH_VMA_FLAGS + default y + help + Supporting GMEM (generalized memory management) for external memory + devices + + GMEM extends Linux MM to share its machine-independent MM code. Only + high-level interface is provided for device drivers. This prevents + accelerator drivers from reinventing the wheel, but relies on drivers to + implement their hardware-dependent functions declared by GMEM. + source "mm/damon/Kconfig" endmenu diff --git a/mm/Makefile b/mm/Makefile index 33873c8aedb3..f48ea2eb4a44 100644 --- a/mm/Makefi
Re: [PATCH] drm/amdgpu: Fix uninitialized return value
On 2023-11-28 8:18, Christian König wrote: Am 28.11.23 um 10:49 schrieb Lazar, Lijo: On 11/28/2023 3:07 PM, Christian König wrote: Am 27.11.23 um 22:55 schrieb Alex Deucher: On Mon, Nov 27, 2023 at 2:22 PM Christian König wrote: Am 27.11.23 um 19:29 schrieb Lijo Lazar: The return value is uniinitialized if ras context is NULL. Fixes: 0f4c8faa043c (drm/amdgpu: Move mca debug mode decision to ras) Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 1a8668a63e67..f6b47ebce9d6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -3410,7 +3410,7 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device *adev) int amdgpu_ras_set_mca_debug_mode(struct amdgpu_device *adev, bool enable) { struct amdgpu_ras *con = amdgpu_ras_get_context(adev); - int ret; + int ret = 0; That's usually considered very bad coding style and complained about by automated checkers. Instead explicitly set the return value in the code paths not actually setting it. In this case, the function is so short, I think it makes things less readable to do that. Yeah, indeed but that doesn't help us with the coding style checkers. Are these checkers for real? I see many instances of variable initialization including in core kernel code (ex: workqueue) code. Yes, I've got multiple complains about that already. What people basically seem to do is to search for patterns like "int ret = 0;... ret = whatever();.. return ret;" with cocci. This then results in a note that an initialization isn't necessary and should be avoided. Isn't that the opposite of defensive programming? If you write your kernel code in Rust, all your local variables will be implicitly initialized. In C you have to do it yourself. And the compiler is notoriously inconsistent warning about uninitialized variables. 
Regards, Felix Same for things like return after else, e.g. when you have something like this "if (...) { ret = whatever(); if (ret) return ret; } else { ... ret = 0;} return ret;". Regards, Christian. Thanks Lijo We could do something like this instead: if (!con) return 0; ret = amdgpu_mca_smu_set_debug_mode(adev, enable); ... Regards, Christian. Reviewed-by: Alex Deucher Regards, Christian. if (con) { ret = amdgpu_mca_smu_set_debug_mode(adev, enable);
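Both shapes under discussion behave identically for a NULL RAS context; a minimal standalone sketch (all type and function names here are hypothetical stand-ins, not the real driver symbols) shows the two options side by side:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the RAS context and the SMU call. */
struct ras_ctx { int unused; };

static int smu_set_debug_mode_stub(bool enable)
{
	(void)enable;
	return 0; /* pretend the firmware call succeeded */
}

/* Style A (the patch as posted): default-initialize ret, branch on con. */
static int set_mca_debug_mode_init(struct ras_ctx *con, bool enable)
{
	int ret = 0;

	if (con)
		ret = smu_set_debug_mode_stub(enable);
	return ret;
}

/* Style B (the early-return variant suggested in the thread):
 * no initializer needed, every path sets the return value explicitly. */
static int set_mca_debug_mode_early(struct ras_ctx *con, bool enable)
{
	if (!con)
		return 0;

	return smu_set_debug_mode_stub(enable);
}
```

The early-return form keeps static checkers quiet because no path ever reads an uninitialized `ret`.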
PSP_VMBX_POLLING_LIMIT too big
Hi, In amd-staging-drm-next 46fe6312082c ("drm/amdgpu: update retry times for psp BL wait") and upstream a11156ff6f41 ("drm/amdgpu: update retry times for psp BL wait") the number of loops for psp_v13_0_wait_for_bootloader() to try again increased significantly. It went from 10 loops to 20k loops. Essentially this means that the function can "effectively" no longer fail. I've got an issue I'm looking at where runtime resume for a dGPU fails, and because of this change the system gets stuck in a never ending busy loop instead of cleanly returning an error code to the caller. The outcome is the system appears hung while the 20k loops run instead of just the dGPU failing to resume. Is this 20k value really required? Or can we reduce it back to something more manageable? Thanks,
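The complaint is about the shape of the wait loop: with a bound that is effectively infinite, the error path is unreachable. A bounded poll that still propagates failure to the caller can be sketched like this (the names and retry count are illustrative, not the driver's):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Poll a readiness condition at most max_retries times; return 0 on
 * success or a negative errno so the caller can clean up instead of
 * appearing hung. */
static int wait_for_bootloader_stub(int max_retries, bool (*ready)(void))
{
	for (int i = 0; i < max_retries; i++) {
		if (ready())
			return 0;
		/* the real code would udelay()/msleep() between polls */
	}
	return -ETIMEDOUT;
}

static int calls;
static bool never_ready(void) { calls++; return false; }
static bool ready_third_try(void) { return ++calls >= 3; }
```

The total wait is then `max_retries` times the per-iteration delay, so the bound directly controls how long a dead device can stall resume.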
Re: [PATCH 1/3] Revert "drm/prime: Unexport helpers for fd/handle conversion"
On Thu, Nov 23, 2023 at 6:12 PM Felix Kuehling wrote: > > [+Alex] > > On 2023-11-17 16:44, Felix Kuehling wrote: > > > This reverts commit 71a7974ac7019afeec105a54447ae1dc7216cbb3. > > > > These helper functions are needed for KFD to export and import DMABufs > > the right way without duplicating the tracking of DMABufs associated with > > GEM objects while ensuring that move notifier callbacks are working as > > intended. > > > > CC: Christian König > > CC: Thomas Zimmermann > > Signed-off-by: Felix Kuehling > > Re: our discussion about v2 of this patch: If this version is > acceptable, can I get an R-b or A-b? > > I would like to get this patch into drm-next as a prerequisite for > patches 2 and 3. I cannot submit it to the current amd-staging-drm-next > because the patch I'm reverting doesn't exist there yet. > > Patch 2 and 3 could go into drm-next as well, or go through Alex's > amd-staging-drm-next branch once patch 1 is in drm-next. Alex, how do > you prefer to coordinate this? I guess ideally this would go through my drm-next tree since your other patches depend on it unless others feel strongly that it should go through drm-misc. Alex > > Regards, >Felix > > > > --- > > drivers/gpu/drm/drm_prime.c | 33 ++--- > > include/drm/drm_prime.h | 7 +++ > > 2 files changed, 25 insertions(+), 15 deletions(-) > > > > diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c > > index 63b709a67471..834a5e28abbe 100644 > > --- a/drivers/gpu/drm/drm_prime.c > > +++ b/drivers/gpu/drm/drm_prime.c > > @@ -278,7 +278,7 @@ void drm_gem_dmabuf_release(struct dma_buf *dma_buf) > > } > > EXPORT_SYMBOL(drm_gem_dmabuf_release); > > > > -/* > > +/** > >* drm_gem_prime_fd_to_handle - PRIME import function for GEM drivers > >* @dev: drm_device to import into > >* @file_priv: drm file-private structure > > @@ -292,9 +292,9 @@ EXPORT_SYMBOL(drm_gem_dmabuf_release); > >* > >* Returns 0 on success or a negative error code on failure. 
> >*/ > > -static int drm_gem_prime_fd_to_handle(struct drm_device *dev, > > - struct drm_file *file_priv, int > > prime_fd, > > - uint32_t *handle) > > +int drm_gem_prime_fd_to_handle(struct drm_device *dev, > > +struct drm_file *file_priv, int prime_fd, > > +uint32_t *handle) > > { > > struct dma_buf *dma_buf; > > struct drm_gem_object *obj; > > @@ -360,6 +360,7 @@ static int drm_gem_prime_fd_to_handle(struct drm_device > > *dev, > > dma_buf_put(dma_buf); > > return ret; > > } > > +EXPORT_SYMBOL(drm_gem_prime_fd_to_handle); > > > > int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data, > >struct drm_file *file_priv) > > @@ -408,7 +409,7 @@ static struct dma_buf > > *export_and_register_object(struct drm_device *dev, > > return dmabuf; > > } > > > > -/* > > +/** > >* drm_gem_prime_handle_to_fd - PRIME export function for GEM drivers > >* @dev: dev to export the buffer from > >* @file_priv: drm file-private structure > > @@ -421,10 +422,10 @@ static struct dma_buf > > *export_and_register_object(struct drm_device *dev, > >* The actual exporting from GEM object to a dma-buf is done through the > >* &drm_gem_object_funcs.export callback. 
> >*/ > > -static int drm_gem_prime_handle_to_fd(struct drm_device *dev, > > - struct drm_file *file_priv, uint32_t > > handle, > > - uint32_t flags, > > - int *prime_fd) > > +int drm_gem_prime_handle_to_fd(struct drm_device *dev, > > +struct drm_file *file_priv, uint32_t handle, > > +uint32_t flags, > > +int *prime_fd) > > { > > struct drm_gem_object *obj; > > int ret = 0; > > @@ -506,6 +507,7 @@ static int drm_gem_prime_handle_to_fd(struct drm_device > > *dev, > > > > return ret; > > } > > +EXPORT_SYMBOL(drm_gem_prime_handle_to_fd); > > > > int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data, > >struct drm_file *file_priv) > > @@ -864,9 +866,9 @@ EXPORT_SYMBOL(drm_prime_get_contiguous_size); > >* @obj: GEM object to export > >* @flags: flags like DRM_CLOEXEC and DRM_RDWR > >* > > - * This is the implementation of the &drm_gem_object_funcs.export functions > > - * for GEM drivers using the PRIME helpers. It is used as the default for > > - * drivers that do not set their own. > > + * This is the implementation of the &drm_gem_object_funcs.export > > functions for GEM drivers > > + * using the PRIME helpers. It is used as the default in > > + * drm_gem_prime_handle_to_fd(). > >*/ > > struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj, > >
Re: [PATCH] drm/amdgpu: add shared fdinfo stats
On Tue, Nov 28, 2023 at 6:28 AM Alex Deucher wrote: > > On Tue, Nov 28, 2023 at 9:17 AM Christian König > wrote: > > > > Am 17.11.23 um 20:56 schrieb Alex Deucher: > > > Add shared stats. Useful for seeing shared memory. > > > > > > Signed-off-by: Alex Deucher > > > --- > > > drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 4 > > > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 +++ > > > drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 6 ++ > > > 3 files changed, 21 insertions(+) > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > > > index 5706b282a0c7..c7df7fa3459f 100644 > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > > > @@ -97,6 +97,10 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct > > > drm_file *file) > > > stats.requested_visible_vram/1024UL); > > > drm_printf(p, "amd-requested-gtt:\t%llu KiB\n", > > > stats.requested_gtt/1024UL); > > > + drm_printf(p, "drm-shared-vram:\t%llu KiB\n", > > > stats.vram_shared/1024UL); > > > + drm_printf(p, "drm-shared-gtt:\t%llu KiB\n", > > > stats.gtt_shared/1024UL); > > > + drm_printf(p, "drm-shared-cpu:\t%llu KiB\n", > > > stats.cpu_shared/1024UL); > > > + > > > for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) { > > > if (!usage[hw_ip]) > > > continue; > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > > > index d79b4ca1ecfc..c24f7b2c04c1 100644 > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > > > @@ -1287,25 +1287,36 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo, > > > struct amdgpu_mem_stats *stats) > > > { > > > uint64_t size = amdgpu_bo_size(bo); > > > + struct drm_gem_object *obj; > > > unsigned int domain; > > > + bool shared; > > > > > > /* Abort if the BO doesn't currently have a backing store */ > > > if (!bo->tbo.resource) > > > return; > > > > > > 
+ obj = &bo->tbo.base; > > > + shared = obj->handle_count > 1; > > > > Interesting approach but I don't think that this is correct. > > > > The handle_count is basically how many GEM handles are there for BO, so > > for example it doesn't catch sharing things with V4L. > > > > What we should probably rather do is to take a look if > > bo->tbo.base.dma_buf is NULL or not. > > +Rob, dri-devel > > This is what the generic drm helper code does. See > drm_show_memory_stats(). If that is not correct that code should > probably be fixed too. OTOH, v4l doesn't expose fdinfo. What "shared" is telling you is whether the BO is counted multiple times when you look at all processes fdinfo. But I guess it would be ok to look for obj->handle_count > 1 || obj->dma_buf BR, -R > > Alex > > > > > Regards, > > Christian. > > > > > > > + > > > domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type); > > > switch (domain) { > > > case AMDGPU_GEM_DOMAIN_VRAM: > > > stats->vram += size; > > > if (amdgpu_bo_in_cpu_visible_vram(bo)) > > > stats->visible_vram += size; > > > + if (shared) > > > + stats->vram_shared += size; > > > break; > > > case AMDGPU_GEM_DOMAIN_GTT: > > > stats->gtt += size; > > > + if (shared) > > > + stats->gtt_shared += size; > > > break; > > > case AMDGPU_GEM_DOMAIN_CPU: > > > default: > > > stats->cpu += size; > > > + if (shared) > > > + stats->cpu_shared += size; > > > break; > > > } > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > > > index d28e21baef16..0503af75dc26 100644 > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > > > @@ -138,12 +138,18 @@ struct amdgpu_bo_vm { > > > struct amdgpu_mem_stats { > > > /* current VRAM usage, includes visible VRAM */ > > > uint64_t vram; > > > + /* current shared VRAM usage, includes visible VRAM */ > > > + uint64_t vram_shared; > > > /* current visible VRAM usage */ > > > uint64_t 
visible_vram; > > > /* current GTT usage */ > > > uint64_t gtt; > > > + /* current shared GTT usage */ > > > + uint64_t gtt_shared; > > > /* current system memory usage */ > > > uint64_t cpu; > > > + /* current shared system memory usage */ > > > + uint64_t cpu_shared; > > > /* sum of evicted buffers, includes visible VRAM */ > > > uint64_t evicted_vram; > > > /* sum of evicted buffers due to CPU access */ > >
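The check the thread converges on can be modeled in a few lines (the struct below is a stand-in, not the real `drm_gem_object`): a BO counts as shared if it has more than one GEM handle, or if it has been exported as a dma-buf, which also covers sharing with non-DRM users such as V4L.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal model of a GEM object's sharing state. */
struct gem_object_stub {
	unsigned int handle_count; /* number of GEM handles referencing it */
	void *dma_buf;             /* non-NULL once exported as a dma-buf */
};

static bool bo_is_shared(const struct gem_object_stub *obj)
{
	return obj->handle_count > 1 || obj->dma_buf != NULL;
}
```

`handle_count > 1` alone catches multi-process fdinfo double counting; the `dma_buf` term catches exports to subsystems that never take a second GEM handle.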
[PATCH v3 0/9] drm/amd/display: improve DTN color state log
This series updates the color state section of the AMD DTN log to match the color resource differences between DCN versions. Currently, the DTN log assumes the DCN10 color pipeline, which is useless for newer DCN versions because of all the color capability differences. In addition to the new color blocks and features, some semantic differences meant that the DCN10 output was no longer suitable for newer families. This version addresses comments from Siqueira and Harry [1]. It also contains some improvements: DPP and MPC gamut remap matrix data in 31.32 fixed point format and coverage of DCN2+ and DCN3+.

- The first patch decouples the DCN10 color state from the HW log, in preparation for color logs specific to each DCN family.
- Harry kindly provided the second patch with a way to read Gamut Remap Matrix data in 31.32 fixed point format instead of raw HW values.
- With this, I changed the DCN10 gamut remap output to display values in the new format (third patch).
- Patches 4 and 6 fill up the color state of the DPP and MPC blocks for DCN3 from the right registers.
- As DCN3+ has a new MPC color block for the post-blending Gamut Remap matrix, in patch 5 I reuse Harry's approach for reading the DPP gamut remap matrix and create a helper to read the MPC gamut remap matrix data.
- Patches 7 and 9 create the new color state logs specific to DCN2+ and DCN3+. I didn't extend this to DCN32 (or DCN35) because I observed some differences in the shaper and 3D LUT registers of those versions.
- Patch 8 adds descriptions of the DPP and MPC color blocks for better interpretation of the data.

This new approach works well together with the driver-specific color properties [2] and steamdeck/gamescope [3], where we can see the color state changing from default values. I also tested with steamdeck/KDE and DCN21/GNOME.
Please find some `before vs after` results below: === DCN301 - Before: --- DPP:IGAM format IGAM modeDGAM modeRGAM mode GAMUT mode C11 C12 C13 C14 C21 C22 C23 C24 C31 C32 C33 C34 [ 0]:0h BypassFixed Bypass Bypass0h h h h h h [ 1]:0h BypassFixed Bypass Bypass0h h h h h h [ 2]:0h BypassFixed Bypass Bypass0h h h h h h [ 3]:0h BypassFixed Bypass Bypass0h h h h h h MPCC: OPP DPP MPCCBOT MODE ALPHA_MODE PREMULT OVERLAP_ONLY IDLE [ 0]: 0h 0h 2h 3 01 0 0 [ 1]: 0h 1h fh 2 20 0 0 [ 2]: 0h 2h 3h 3 01 0 0 [ 3]: 0h 3h 1h 3 20 0 0 DCN301 - After (Gamescope): --- DPP: DGAM ROM DGAM ROM type DGAM LUT SHAPER mode 3DLUT mode 3DLUT bit depth 3DLUT size RGAM mode GAMUT adjust C11C12C13 C14C21C22C23C24C31C32 C33C34 [ 0]:1 sRGBBypassRAM B RAM A 12-bit17x17x17 RAM ABypass 00 00 00 00 00 00 00 00 00 00 00 00 [ 1]:1 sRGBBypassRAM B RAM A 12-bit17x17x17 RAM ABypass 00 00 00 00 00 00 00 00 00 00 00 00 [ 2]:1 sRGBBypassRAM B RAM A 12-bit17x17x17 RAM ABypass 00 00 00 00 00 00 00 00 00 00 00 00 [ 3]:1 sRGBBypassRAM B RAM A 12-bit17x17x17 RAM ABypass 00 00 00 00 00 00 00 00 00 00 00 00 DPP Color Caps: input_lut_shared:0 icsc:1 dgam_ram:0 dgam_rom: srgb:1,bt2020:1,gamma2_2:1,pq:1,hlg:1 post_csc:1 gamcor:1 dgam_rom_for_yuv:0 3d_lut:1 blnd_lut:1 oscs:0 MPCC: OPP DPP MPCCBOT MODE ALPHA_MODE PREMULT OVERLAP_ONLY IDLE SHAPER mode 3DLUT mode 3DLUT bit-depth 3DLUT size OGAM mode OGAM LUT GAMUT adjust C11C12C13C14C21C22C23 C24C31C32C33C34 [ 0]: 0h 0h 2h 3 01 0 0 Bypass Bypass 12-bit17x17x17RAM A Bypass 00 00 00 00 00 00 00 00 00 0
[PATCH v3 1/9] drm/amd/display: decouple color state from hw state log
Prepare to hook up color state log according to the DCN version. v3: - put functions in single line (Siqueira) Signed-off-by: Melissa Wen --- .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 26 +-- 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c index 2b8b8366538e..9b801488eb9d 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c @@ -282,19 +282,13 @@ static void dcn10_log_hubp_states(struct dc *dc, void *log_ctx) DTN_INFO("\n"); } -void dcn10_log_hw_state(struct dc *dc, - struct dc_log_buffer_ctx *log_ctx) +static void dcn10_log_color_state(struct dc *dc, + struct dc_log_buffer_ctx *log_ctx) { struct dc_context *dc_ctx = dc->ctx; struct resource_pool *pool = dc->res_pool; int i; - DTN_INFO_BEGIN(); - - dcn10_log_hubbub_state(dc, log_ctx); - - dcn10_log_hubp_states(dc, log_ctx); - DTN_INFO("DPP:IGAM format IGAM modeDGAM modeRGAM mode" " GAMUT mode C11 C12 C13 C14 C21 C22 C23 C24 " "C31 C32 C33 C34\n"); @@ -351,6 +345,22 @@ void dcn10_log_hw_state(struct dc *dc, s.idle); } DTN_INFO("\n"); +} + +void dcn10_log_hw_state(struct dc *dc, + struct dc_log_buffer_ctx *log_ctx) +{ + struct dc_context *dc_ctx = dc->ctx; + struct resource_pool *pool = dc->res_pool; + int i; + + DTN_INFO_BEGIN(); + + dcn10_log_hubbub_state(dc, log_ctx); + + dcn10_log_hubp_states(dc, log_ctx); + + dcn10_log_color_state(dc, log_ctx); DTN_INFO("OTG: v_bs v_be v_ss v_se vpol vmax vmin vmax_sel vmin_sel h_bs h_be h_ss h_se hpol htot vtot underflow blank_en\n"); -- 2.42.0
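The direction of the refactor — each DCN generation eventually plugging its own color-state dumper into a hook instead of every version sharing the DCN10 layout — can be sketched as follows (struct and names are illustrative only, not the real hwseq interface):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Per-generation hook for dumping color state. */
struct color_logger {
	const char *(*log_color_state)(void);
};

static const char *dcn10_color_state_stub(void) { return "DCN10 color state"; }
static const char *dcn30_color_state_stub(void) { return "DCN30 color state"; }

static const char *dump_color_state(const struct color_logger *hw)
{
	/* fall back to the DCN10 dump when a generation sets no hook */
	return hw->log_color_state ? hw->log_color_state()
				   : dcn10_color_state_stub();
}
```

Extracting `dcn10_log_color_state()` into its own function is the prerequisite for this kind of dispatch in the later patches of the series.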
[PATCH v3 4/9] drm/amd/display: fill up DCN3 DPP color state
DCN3 DPP color state was uncollected and some state elements from DCN1 doesn't fit DCN3. Create new elements according to DCN3 color caps and fill them up for DTN log output. rfc-v2: - fix reading of gamcor and blnd gamma states - remove gamut remap register in favor of gamut remap matrix reading Signed-off-by: Melissa Wen --- .../gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c | 37 ++- drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h | 8 2 files changed, 43 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c index 7c18f31bb56c..a3a769aad042 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c @@ -44,12 +44,45 @@ void dpp30_read_state(struct dpp *dpp_base, struct dcn_dpp_state *s) { struct dcn3_dpp *dpp = TO_DCN30_DPP(dpp_base); + uint32_t gamcor_lut_mode, rgam_lut_mode; REG_GET(DPP_CONTROL, - DPP_CLOCK_ENABLE, &s->is_enabled); + DPP_CLOCK_ENABLE, &s->is_enabled); - // TODO: Implement for DCN3 + // Pre-degamma (ROM) + REG_GET_2(PRE_DEGAM, + PRE_DEGAM_MODE, &s->pre_dgam_mode, + PRE_DEGAM_SELECT, &s->pre_dgam_select); + + // Gamma Correction (RAM) + REG_GET(CM_GAMCOR_CONTROL, + CM_GAMCOR_MODE_CURRENT, &s->gamcor_mode); + if (s->gamcor_mode) { + REG_GET(CM_GAMCOR_CONTROL, CM_GAMCOR_SELECT_CURRENT, &gamcor_lut_mode); + if (!gamcor_lut_mode) + s->gamcor_mode = LUT_RAM_A; // Otherwise, LUT_RAM_B + } + + // Shaper LUT (RAM), 3D LUT (mode, bit-depth, size) + REG_GET(CM_SHAPER_CONTROL, + CM_SHAPER_LUT_MODE, &s->shaper_lut_mode); + REG_GET(CM_3DLUT_MODE, + CM_3DLUT_MODE_CURRENT, &s->lut3d_mode); + REG_GET(CM_3DLUT_READ_WRITE_CONTROL, + CM_3DLUT_30BIT_EN, &s->lut3d_bit_depth); + REG_GET(CM_3DLUT_MODE, + CM_3DLUT_SIZE, &s->lut3d_size); + + // Blend/Out Gamma (RAM) + REG_GET(CM_BLNDGAM_CONTROL, + CM_BLNDGAM_MODE_CURRENT, &s->rgam_lut_mode); + if (s->rgam_lut_mode){ + REG_GET(CM_BLNDGAM_CONTROL, CM_BLNDGAM_SELECT_CURRENT, &rgam_lut_mode); 
+ if (!rgam_lut_mode) + s->rgam_lut_mode = LUT_RAM_A; // Otherwise, LUT_RAM_B + } } + /*program post scaler scs block in dpp CM*/ void dpp3_program_post_csc( struct dpp *dpp_base, diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h b/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h index b6acfd86642a..4e604bf24f51 100644 --- a/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h +++ b/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h @@ -151,6 +151,14 @@ struct dcn_dpp_state { uint32_t gamut_remap_c33_c34; // gamut_remap data for dcn*_log_color_state() struct dpp_grph_csc_adjustment gamut_remap; + uint32_t shaper_lut_mode; + uint32_t lut3d_mode; + uint32_t lut3d_bit_depth; + uint32_t lut3d_size; + uint32_t blnd_lut_mode; + uint32_t pre_dgam_mode; + uint32_t pre_dgam_select; + uint32_t gamcor_mode; }; struct CM_bias_params { -- 2.42.0
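The register reads in the patch above follow a two-step pattern: a `*_MODE_CURRENT` field says whether the block is bypassed, and a `*_SELECT_CURRENT` field says which of the two shadow RAMs is active. A simplified decoder of that pattern (names are illustrative; the real code stores DC enum values into the state struct):

```c
#include <assert.h>
#include <stdint.h>

enum lut_state { LUT_BYPASS = 0, LUT_RAM_A, LUT_RAM_B };

/* mode_current == 0 means the LUT block is bypassed; otherwise the
 * select bit picks RAM A (0) or RAM B (1). */
static enum lut_state decode_lut_state(uint32_t mode_current,
				       uint32_t select_current)
{
	if (!mode_current)
		return LUT_BYPASS;
	return select_current ? LUT_RAM_B : LUT_RAM_A;
}
```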
[PATCH v3 2/9] drm/amd/display: Add dpp_get_gamut_remap functions
From: Harry Wentland We want to be able to read the DPP's gamut remap matrix. v2: - code-style and doc comments clean-up (Melissa) Signed-off-by: Harry Wentland Signed-off-by: Melissa Wen --- .../drm/amd/display/dc/basics/conversion.c| 34 + .../drm/amd/display/dc/basics/conversion.h| 4 ++ .../amd/display/dc/dcn10/dcn10_cm_common.c| 20 ++ .../amd/display/dc/dcn10/dcn10_cm_common.h| 4 +- .../gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c | 3 +- .../gpu/drm/amd/display/dc/dcn10/dcn10_dpp.h | 3 + .../drm/amd/display/dc/dcn10/dcn10_dpp_cm.c | 70 ++- .../gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c | 1 + .../gpu/drm/amd/display/dc/dcn20/dcn20_dpp.h | 3 + .../drm/amd/display/dc/dcn20/dcn20_dpp_cm.c | 55 +++ .../drm/amd/display/dc/dcn201/dcn201_dpp.c| 1 + .../gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c | 1 + .../gpu/drm/amd/display/dc/dcn30/dcn30_dpp.h | 2 + .../drm/amd/display/dc/dcn30/dcn30_dpp_cm.c | 54 ++ .../gpu/drm/amd/display/dc/dcn32/dcn32_dpp.c | 1 + drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h | 3 + 16 files changed, 256 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/basics/conversion.c b/drivers/gpu/drm/amd/display/dc/basics/conversion.c index e295a839ab47..f0065a51beb9 100644 --- a/drivers/gpu/drm/amd/display/dc/basics/conversion.c +++ b/drivers/gpu/drm/amd/display/dc/basics/conversion.c @@ -101,6 +101,40 @@ void convert_float_matrix( } } +static struct fixed31_32 int_frac_to_fixed_point(uint16_t arg, +uint8_t integer_bits, +uint8_t fractional_bits) +{ + struct fixed31_32 result; + uint16_t sign_mask = 1 << (fractional_bits + integer_bits); + uint16_t value_mask = sign_mask - 1; + + result.value = (long long)(arg & value_mask) << + (FIXED31_32_BITS_PER_FRACTIONAL_PART - fractional_bits); + + if (arg & sign_mask) + result = dc_fixpt_neg(result); + + return result; +} + +/** + * convert_hw_matrix - converts HW values into fixed31_32 matrix. 
+ * @matrix: fixed point 31.32 matrix + * @reg: array of register values + * @buffer_size: size of the array of register values + * + * Converts HW register spec defined format S2D13 into a fixed-point 31.32 + * matrix. + */ +void convert_hw_matrix(struct fixed31_32 *matrix, + uint16_t *reg, + uint32_t buffer_size) +{ + for (int i = 0; i < buffer_size; ++i) + matrix[i] = int_frac_to_fixed_point(reg[i], 2, 13); +} + static uint32_t find_gcd(uint32_t a, uint32_t b) { uint32_t remainder = 0; diff --git a/drivers/gpu/drm/amd/display/dc/basics/conversion.h b/drivers/gpu/drm/amd/display/dc/basics/conversion.h index 81da4e6f7a1a..a433cef78496 100644 --- a/drivers/gpu/drm/amd/display/dc/basics/conversion.h +++ b/drivers/gpu/drm/amd/display/dc/basics/conversion.h @@ -41,6 +41,10 @@ void convert_float_matrix( void reduce_fraction(uint32_t num, uint32_t den, uint32_t *out_num, uint32_t *out_den); +void convert_hw_matrix(struct fixed31_32 *matrix, + uint16_t *reg, + uint32_t buffer_size); + static inline unsigned int log_2(unsigned int num) { return ilog2(num); diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.c index 3538973bd0c6..b7e57aa27361 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.c +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.c @@ -62,6 +62,26 @@ void cm_helper_program_color_matrices( } +void cm_helper_read_color_matrices(struct dc_context *ctx, + uint16_t *regval, + const struct color_matrices_reg *reg) +{ + uint32_t cur_csc_reg, regval0, regval1; + unsigned int i = 0; + + for (cur_csc_reg = reg->csc_c11_c12; +cur_csc_reg <= reg->csc_c33_c34; cur_csc_reg++) { + REG_GET_2(cur_csc_reg, + csc_c11, ®val0, + csc_c12, ®val1); + + regval[2 * i] = regval0; + regval[(2 * i) + 1] = regval1; + + i++; + } +} + void cm_helper_program_xfer_func( struct dc_context *ctx, const struct pwl_params *params, diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.h 
b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.h index 0a68b63d6126..decc50b1ac53 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.h +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.h @@ -114,5 +114,7 @@ bool cm_helper_translate_curve_to_degamma_hw_format( const struct dc_transfer_func *output_tf, struct pwl_params
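The S2D13 decode in `int_frac_to_fixed_point()` above (1 sign bit, 2 integer bits, 13 fractional bits, with the sign handled as written in the patch) can be checked against a small standalone model:

```c
#include <assert.h>
#include <stdint.h>

/* Standalone model of the S2D13 decode: mask off the sign bit, scale
 * the remaining 15 bits by 2^-13, and negate when the sign bit is set,
 * mirroring the sign handling in int_frac_to_fixed_point(). */
static double s2d13_to_double(uint16_t reg)
{
	const uint16_t sign_mask = 1u << 15; /* 2 integer + 13 fractional bits below */
	double v = (double)(reg & (sign_mask - 1)) / (double)(1 << 13);

	return (reg & sign_mask) ? -v : v;
}
```

So a register value of `0x2000` decodes to exactly 1.0, which is why `convert_hw_matrix()` passes `integer_bits = 2` and `fractional_bits = 13`.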
[PATCH v3 3/9] drm/amd/display: read gamut remap matrix in fixed-point 31.32 format
Instead of read gamut remap data from hw values, convert HW register values (S2D13) into a fixed-point 31.32 matrix for color state log. Change DCN10 log to print data in the format of the gamut remap matrix. Signed-off-by: Melissa Wen --- .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 38 +-- drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h | 3 ++ 2 files changed, 29 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c index 9b801488eb9d..f0a9f8818909 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c @@ -289,20 +289,26 @@ static void dcn10_log_color_state(struct dc *dc, struct resource_pool *pool = dc->res_pool; int i; - DTN_INFO("DPP:IGAM format IGAM modeDGAM modeRGAM mode" - " GAMUT mode C11 C12 C13 C14 C21 C22 C23 C24 " - "C31 C32 C33 C34\n"); + DTN_INFO("DPP:IGAM formatIGAM modeDGAM modeRGAM mode" +" GAMUT adjust " +"C11C12C13C14" +"C21C22C23C24" +"C31C32C33C34\n"); for (i = 0; i < pool->pipe_count; i++) { struct dpp *dpp = pool->dpps[i]; struct dcn_dpp_state s = {0}; dpp->funcs->dpp_read_state(dpp, &s); + dpp->funcs->dpp_get_gamut_remap(dpp, &s.gamut_remap); if (!s.is_enabled) continue; - DTN_INFO("[%2d]: %11xh %-11s %-11s %-11s" - "%8x%08xh %08xh %08xh %08xh %08xh %08xh", + DTN_INFO("[%2d]: %11xh %11s%9s%9s" +" %12s " +"%010lld %010lld %010lld %010lld " +"%010lld %010lld %010lld %010lld " +"%010lld %010lld %010lld %010lld", dpp->inst, s.igam_input_format, (s.igam_lut_mode == 0) ? "BypassFixed" : @@ -322,13 +328,21 @@ static void dcn10_log_color_state(struct dc *dc, ((s.rgam_lut_mode == 3) ? "RAM" : ((s.rgam_lut_mode == 4) ? "RAM" : "Unknown", - s.gamut_remap_mode, - s.gamut_remap_c11_c12, - s.gamut_remap_c13_c14, - s.gamut_remap_c21_c22, - s.gamut_remap_c23_c24, - s.gamut_remap_c31_c32, - s.gamut_remap_c33_c34); + (s.gamut_remap.gamut_adjust_type == 0) ? 
"Bypass" : + ((s.gamut_remap.gamut_adjust_type == 1) ? "HW" : + "SW"), + s.gamut_remap.temperature_matrix[0].value, + s.gamut_remap.temperature_matrix[1].value, + s.gamut_remap.temperature_matrix[2].value, + s.gamut_remap.temperature_matrix[3].value, + s.gamut_remap.temperature_matrix[4].value, + s.gamut_remap.temperature_matrix[5].value, + s.gamut_remap.temperature_matrix[6].value, + s.gamut_remap.temperature_matrix[7].value, + s.gamut_remap.temperature_matrix[8].value, + s.gamut_remap.temperature_matrix[9].value, + s.gamut_remap.temperature_matrix[10].value, + s.gamut_remap.temperature_matrix[11].value); DTN_INFO("\n"); } DTN_INFO("\n"); diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h b/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h index 597ebdb4da4c..b6acfd86642a 100644 --- a/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h +++ b/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h @@ -141,6 +141,7 @@ struct dcn_dpp_state { uint32_t igam_input_format; uint32_t dgam_lut_mode; uint32_t rgam_lut_mode; + // gamut_remap data for dcn10_get_cm_states() uint32_t gamut_remap_mode; uint32_t gamut_remap_c11_c12; uint32_t gamut_remap_c13_c14; @@ -148,6 +149,8 @@ struct dcn_dpp_state { uint32_t gamut_remap_c23_c24; uint32_t gamut_remap_c31_c32; uint32_t gamut_remap_c33_c34; + // gamut_remap data for dcn*_log_color_state() + struct dpp_grph
[PATCH v3 5/9] drm/amd/display: add get_gamut_remap helper for MPC3
We want to be able to read the MPC's gamut remap matrix similar to what we do with .dpp_get_gamut_remap functions. On the other hand, we don't need a hook here because only DCN3+ has the MPC gamut remap block, being absent in previous families. Signed-off-by: Melissa Wen --- .../gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c | 58 +++ .../gpu/drm/amd/display/dc/dcn30/dcn30_mpc.h | 4 ++ 2 files changed, 62 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c index d1500b223858..a6a4c3413f89 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c @@ -1129,6 +1129,64 @@ void mpc3_set_gamut_remap( } } +static void read_gamut_remap(struct dcn30_mpc *mpc30, +int mpcc_id, +uint16_t *regval, +uint32_t *select) +{ + struct color_matrices_reg gam_regs; + + //current coefficient set in use + REG_GET(MPCC_GAMUT_REMAP_MODE[mpcc_id], MPCC_GAMUT_REMAP_MODE_CURRENT, select); + + gam_regs.shifts.csc_c11 = mpc30->mpc_shift->MPCC_GAMUT_REMAP_C11_A; + gam_regs.masks.csc_c11 = mpc30->mpc_mask->MPCC_GAMUT_REMAP_C11_A; + gam_regs.shifts.csc_c12 = mpc30->mpc_shift->MPCC_GAMUT_REMAP_C12_A; + gam_regs.masks.csc_c12 = mpc30->mpc_mask->MPCC_GAMUT_REMAP_C12_A; + + if (*select == GAMUT_REMAP_COEFF) { + gam_regs.csc_c11_c12 = REG(MPC_GAMUT_REMAP_C11_C12_A[mpcc_id]); + gam_regs.csc_c33_c34 = REG(MPC_GAMUT_REMAP_C33_C34_A[mpcc_id]); + + cm_helper_read_color_matrices( + mpc30->base.ctx, + regval, + &gam_regs); + + } else if (*select == GAMUT_REMAP_COMA_COEFF) { + + gam_regs.csc_c11_c12 = REG(MPC_GAMUT_REMAP_C11_C12_B[mpcc_id]); + gam_regs.csc_c33_c34 = REG(MPC_GAMUT_REMAP_C33_C34_B[mpcc_id]); + + cm_helper_read_color_matrices( + mpc30->base.ctx, + regval, + &gam_regs); + + } + +} + +void mpc3_get_gamut_remap(struct mpc *mpc, + int mpcc_id, + struct mpc_grph_gamut_adjustment *adjust) +{ + struct dcn30_mpc *mpc30 = TO_DCN30_MPC(mpc); + uint16_t arr_reg_val[12]; + int select; 
+ + read_gamut_remap(mpc30, mpcc_id, arr_reg_val, &select); + + if (select == GAMUT_REMAP_BYPASS) { + adjust->gamut_adjust_type = GRAPHICS_GAMUT_ADJUST_TYPE_BYPASS; + return; + } + + adjust->gamut_adjust_type = GRAPHICS_GAMUT_ADJUST_TYPE_SW; + convert_hw_matrix(adjust->temperature_matrix, + arr_reg_val, ARRAY_SIZE(arr_reg_val)); +} + bool mpc3_program_3dlut( struct mpc *mpc, const struct tetrahedral_params *params, diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.h b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.h index 5198f2167c7c..9cb96ae95a2f 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.h +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.h @@ -1056,6 +1056,10 @@ void mpc3_set_gamut_remap( int mpcc_id, const struct mpc_grph_gamut_adjustment *adjust); +void mpc3_get_gamut_remap(struct mpc *mpc, + int mpcc_id, + struct mpc_grph_gamut_adjustment *adjust); + void mpc3_set_rmu_mux( struct mpc *mpc, int rmu_idx, -- 2.42.0
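The readback above has to cope with double buffering: a status field reports whether the remap is bypassed or which of the two coefficient sets (A or B) the hardware is currently using, and only that bank is read back. A toy model of the pattern — the names, enum values, and layout here are hypothetical, not the real MPC registers:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of double-buffered gamut remap readback. */
enum remap_select { REMAP_BYPASS = 0, REMAP_COEFF_A = 1, REMAP_COEFF_B = 2 };

struct fake_mpcc {
	enum remap_select current;	/* analogue of *_MODE_CURRENT */
	uint16_t set_a[12];		/* coefficient set A */
	uint16_t set_b[12];		/* coefficient set B */
};

/* Returns 0 for bypass (caller reports "Bypass"), 1 if the live
 * coefficient bank was copied out for conversion to 31.32. */
static int read_remap(const struct fake_mpcc *m, uint16_t out[12])
{
	const uint16_t *bank;
	int i;

	if (m->current == REMAP_BYPASS)
		return 0;
	bank = (m->current == REMAP_COEFF_A) ? m->set_a : m->set_b;
	for (i = 0; i < 12; i++)
		out[i] = bank[i];
	return 1;
}
```

This mirrors why read_gamut_remap() in the patch selects between the _A and _B register sets based on MPCC_GAMUT_REMAP_MODE_CURRENT before calling cm_helper_read_color_matrices().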
[PATCH v3 7/9] drm/amd/display: hook up DCN30 color blocks data to DTN log
Color caps changed between HW versions, which caused the DCN10 color state sections in the DTN log to no longer match DCN3+ state. Create a color state log specific to DCN3.0 and hook it up to DCN3.0+ and DCN3.1+ drivers. rfc-v2: - detail RAM mode for gamcor and blnd gamma blocks - add MPC gamut remap matrix log v3: - read MPC gamut remap matrix in fixed 31.32 format - extend to DCN3.0+ and DCN3.1+ drivers (Harry) Signed-off-by: Melissa Wen --- .../gpu/drm/amd/display/dc/dcn30/dcn30_init.c | 1 + .../drm/amd/display/dc/dcn301/dcn301_init.c | 1 + .../gpu/drm/amd/display/dc/dcn31/dcn31_init.c | 1 + .../drm/amd/display/dc/dcn314/dcn314_init.c | 1 + .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 5 +- .../amd/display/dc/hwss/dcn30/dcn30_hwseq.c | 126 ++ .../amd/display/dc/hwss/dcn30/dcn30_hwseq.h | 3 + .../drm/amd/display/dc/hwss/hw_sequencer.h| 2 + 8 files changed, 139 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_init.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_init.c index 9894caedffed..4064e6b7f599 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_init.c +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_init.c @@ -68,6 +68,7 @@ static const struct hw_sequencer_funcs dcn30_funcs = { .setup_stereo = dcn10_setup_stereo, .set_avmute = dcn30_set_avmute, .log_hw_state = dcn10_log_hw_state, + .log_color_state = dcn30_log_color_state, .get_hw_state = dcn10_get_hw_state, .clear_status_bits = dcn10_clear_status_bits, .wait_for_mpcc_disconnect = dcn10_wait_for_mpcc_disconnect, diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c index 6477009ce065..1a9122252702 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c +++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c @@ -72,6 +72,7 @@ static const struct hw_sequencer_funcs dcn301_funcs = { .setup_stereo = dcn10_setup_stereo, .set_avmute = dcn30_set_avmute, .log_hw_state = dcn10_log_hw_state, + 
.log_color_state = dcn30_log_color_state, .get_hw_state = dcn10_get_hw_state, .clear_status_bits = dcn10_clear_status_bits, .wait_for_mpcc_disconnect = dcn10_wait_for_mpcc_disconnect, diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c index 669f524bd064..61577a3678a0 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c +++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c @@ -71,6 +71,7 @@ static const struct hw_sequencer_funcs dcn31_funcs = { .setup_stereo = dcn10_setup_stereo, .set_avmute = dcn30_set_avmute, .log_hw_state = dcn10_log_hw_state, + .log_color_state = dcn30_log_color_state, .get_hw_state = dcn10_get_hw_state, .clear_status_bits = dcn10_clear_status_bits, .wait_for_mpcc_disconnect = dcn10_wait_for_mpcc_disconnect, diff --git a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_init.c b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_init.c index ccb7e317e86a..094b912832c1 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_init.c +++ b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_init.c @@ -73,6 +73,7 @@ static const struct hw_sequencer_funcs dcn314_funcs = { .setup_stereo = dcn10_setup_stereo, .set_avmute = dcn30_set_avmute, .log_hw_state = dcn10_log_hw_state, + .log_color_state = dcn30_log_color_state, .get_hw_state = dcn10_get_hw_state, .clear_status_bits = dcn10_clear_status_bits, .wait_for_mpcc_disconnect = dcn10_wait_for_mpcc_disconnect, diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c index f0a9f8818909..f7d9bcdbc6c6 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c @@ -374,7 +374,10 @@ void dcn10_log_hw_state(struct dc *dc, dcn10_log_hubp_states(dc, log_ctx); - dcn10_log_color_state(dc, log_ctx); + if (dc->hwss.log_color_state) + dc->hwss.log_color_state(dc, log_ctx); + else + dcn10_log_color_state(dc, log_ctx); 
DTN_INFO("OTG: v_bs v_be v_ss v_se vpol vmax vmin vmax_sel vmin_sel h_bs h_be h_ss h_se hpol htot vtot underflow blank_en\n"); diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c index d71faf2ecd41..1e07f0a6be1f 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c @@ -69,6 +69,132 @@ #define FN(reg_name, field_name) \ hws->shifts->field_name, hws->masks->field_name +void dcn30_log_color_state(struct dc *dc, + struct dc_log_buffer_ctx *log_ctx) +{ +
[PATCH v3 8/9] drm/amd/display: add DPP and MPC color caps to DTN log
Add color caps information for DPP and MPC block to show HW color caps. Signed-off-by: Melissa Wen --- .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 23 +++ .../amd/display/dc/hwss/dcn30/dcn30_hwseq.c | 23 +++ 2 files changed, 46 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c index f7d9bcdbc6c6..d3cab6fb270b 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c @@ -346,6 +346,24 @@ static void dcn10_log_color_state(struct dc *dc, DTN_INFO("\n"); } DTN_INFO("\n"); + DTN_INFO("DPP Color Caps: input_lut_shared:%d icsc:%d" +" dgam_ram:%d dgam_rom: srgb:%d,bt2020:%d,gamma2_2:%d,pq:%d,hlg:%d" +" post_csc:%d gamcor:%d dgam_rom_for_yuv:%d 3d_lut:%d" +" blnd_lut:%d oscs:%d\n\n", +dc->caps.color.dpp.input_lut_shared, +dc->caps.color.dpp.icsc, +dc->caps.color.dpp.dgam_ram, +dc->caps.color.dpp.dgam_rom_caps.srgb, +dc->caps.color.dpp.dgam_rom_caps.bt2020, +dc->caps.color.dpp.dgam_rom_caps.gamma2_2, +dc->caps.color.dpp.dgam_rom_caps.pq, +dc->caps.color.dpp.dgam_rom_caps.hlg, +dc->caps.color.dpp.post_csc, +dc->caps.color.dpp.gamma_corr, +dc->caps.color.dpp.dgam_rom_for_yuv, +dc->caps.color.dpp.hw_3d_lut, +dc->caps.color.dpp.ogam_ram, +dc->caps.color.dpp.ocsc); DTN_INFO("MPCC: OPP DPP MPCCBOT MODE ALPHA_MODE PREMULT OVERLAP_ONLY IDLE\n"); for (i = 0; i < pool->pipe_count; i++) { @@ -359,6 +377,11 @@ static void dcn10_log_color_state(struct dc *dc, s.idle); } DTN_INFO("\n"); + DTN_INFO("MPC Color Caps: gamut_remap:%d, 3dlut:%d, ogam_ram:%d, ocsc:%d\n\n", +dc->caps.color.mpc.gamut_remap, +dc->caps.color.mpc.num_3dluts, +dc->caps.color.mpc.ogam_ram, +dc->caps.color.mpc.ocsc); } void dcn10_log_hw_state(struct dc *dc, diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c index 1e07f0a6be1f..3b38af592101 100644 --- 
a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c @@ -140,6 +140,24 @@ void dcn30_log_color_state(struct dc *dc, DTN_INFO("\n"); } DTN_INFO("\n"); + DTN_INFO("DPP Color Caps: input_lut_shared:%d icsc:%d" +" dgam_ram:%d dgam_rom: srgb:%d,bt2020:%d,gamma2_2:%d,pq:%d,hlg:%d" +" post_csc:%d gamcor:%d dgam_rom_for_yuv:%d 3d_lut:%d" +" blnd_lut:%d oscs:%d\n\n", +dc->caps.color.dpp.input_lut_shared, +dc->caps.color.dpp.icsc, +dc->caps.color.dpp.dgam_ram, +dc->caps.color.dpp.dgam_rom_caps.srgb, +dc->caps.color.dpp.dgam_rom_caps.bt2020, +dc->caps.color.dpp.dgam_rom_caps.gamma2_2, +dc->caps.color.dpp.dgam_rom_caps.pq, +dc->caps.color.dpp.dgam_rom_caps.hlg, +dc->caps.color.dpp.post_csc, +dc->caps.color.dpp.gamma_corr, +dc->caps.color.dpp.dgam_rom_for_yuv, +dc->caps.color.dpp.hw_3d_lut, +dc->caps.color.dpp.ogam_ram, +dc->caps.color.dpp.ocsc); DTN_INFO("MPCC: OPP DPP MPCCBOT MODE ALPHA_MODE PREMULT OVERLAP_ONLY IDLE" " SHAPER mode 3DLUT mode 3DLUT bit-depth 3DLUT size OGAM mode OGAM LUT" @@ -193,6 +211,11 @@ void dcn30_log_color_state(struct dc *dc, } DTN_INFO("\n"); + DTN_INFO("MPC Color Caps: gamut_remap:%d, 3dlut:%d, ogam_ram:%d, ocsc:%d\n\n", +dc->caps.color.mpc.gamut_remap, +dc->caps.color.mpc.num_3dluts, +dc->caps.color.mpc.ogam_ram, +dc->caps.color.mpc.ocsc); } bool dcn30_set_blend_lut( -- 2.42.0
[PATCH v3 9/9] drm/amd/display: hook up DCN20 color blocks data to DTN log
Color caps changed between HW versions, which caused the DCN10 color state sections in the DTN log to no longer match DCN2+ state. Create a color state log specific to DCN2.0 and hook it up to DCN2 family drivers. Instead of reading gamut remap reg values, display gamut remap matrix data in fixed 31.32. Signed-off-by: Melissa Wen --- .../gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c | 30 ++--- .../gpu/drm/amd/display/dc/dcn20/dcn20_init.c | 1 + .../gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c | 24 +++- .../gpu/drm/amd/display/dc/dcn21/dcn21_init.c | 1 + .../amd/display/dc/hwss/dcn20/dcn20_hwseq.c | 106 ++ .../amd/display/dc/hwss/dcn20/dcn20_hwseq.h | 2 + 6 files changed, 149 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c index dedc2dcf2691..1516c0a48726 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c @@ -55,21 +55,23 @@ void dpp20_read_state(struct dpp *dpp_base, REG_GET(DPP_CONTROL, DPP_CLOCK_ENABLE, &s->is_enabled); + + // Degamma LUT (RAM) REG_GET(CM_DGAM_CONTROL, - CM_DGAM_LUT_MODE, &s->dgam_lut_mode); - // BGAM has no ROM, and definition is different, can't reuse same dump - //REG_GET(CM_BLNDGAM_CONTROL, - // CM_BLNDGAM_LUT_MODE, &s->rgam_lut_mode); - REG_GET(CM_GAMUT_REMAP_CONTROL, - CM_GAMUT_REMAP_MODE, &s->gamut_remap_mode); - if (s->gamut_remap_mode) { - s->gamut_remap_c11_c12 = REG_READ(CM_GAMUT_REMAP_C11_C12); - s->gamut_remap_c13_c14 = REG_READ(CM_GAMUT_REMAP_C13_C14); - s->gamut_remap_c21_c22 = REG_READ(CM_GAMUT_REMAP_C21_C22); - s->gamut_remap_c23_c24 = REG_READ(CM_GAMUT_REMAP_C23_C24); - s->gamut_remap_c31_c32 = REG_READ(CM_GAMUT_REMAP_C31_C32); - s->gamut_remap_c33_c34 = REG_READ(CM_GAMUT_REMAP_C33_C34); - } + CM_DGAM_LUT_MODE, &s->dgam_lut_mode); + + // Shaper LUT (RAM), 3D LUT (mode, bit-depth, size) + REG_GET(CM_SHAPER_CONTROL, + CM_SHAPER_LUT_MODE, &s->shaper_lut_mode); + 
REG_GET_2(CM_3DLUT_READ_WRITE_CONTROL, + CM_3DLUT_CONFIG_STATUS, &s->lut3d_mode, + CM_3DLUT_30BIT_EN, &s->lut3d_bit_depth); + REG_GET(CM_3DLUT_MODE, + CM_3DLUT_SIZE, &s->lut3d_size); + + // Blend/Out Gamma (RAM) + REG_GET(CM_BLNDGAM_LUT_WRITE_EN_MASK, + CM_BLNDGAM_CONFIG_STATUS, &s->rgam_lut_mode); } void dpp2_power_on_obuf( diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_init.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_init.c index 884e3e323338..ef6488165b8f 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_init.c +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_init.c @@ -67,6 +67,7 @@ static const struct hw_sequencer_funcs dcn20_funcs = { .setup_stereo = dcn10_setup_stereo, .set_avmute = dce110_set_avmute, .log_hw_state = dcn10_log_hw_state, + .log_color_state = dcn20_log_color_state, .get_hw_state = dcn10_get_hw_state, .clear_status_bits = dcn10_clear_status_bits, .wait_for_mpcc_disconnect = dcn10_wait_for_mpcc_disconnect, diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c index 5da6e44f284a..16b5ff208d14 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c @@ -542,8 +542,30 @@ static struct mpcc *mpc2_get_mpcc_for_dpp(struct mpc_tree *tree, int dpp_id) return NULL; } +static void mpc2_read_mpcc_state( + struct mpc *mpc, + int mpcc_inst, + struct mpcc_state *s) +{ + struct dcn20_mpc *mpc20 = TO_DCN20_MPC(mpc); + + REG_GET(MPCC_OPP_ID[mpcc_inst], MPCC_OPP_ID, &s->opp_id); + REG_GET(MPCC_TOP_SEL[mpcc_inst], MPCC_TOP_SEL, &s->dpp_id); + REG_GET(MPCC_BOT_SEL[mpcc_inst], MPCC_BOT_SEL, &s->bot_mpcc_id); + REG_GET_4(MPCC_CONTROL[mpcc_inst], MPCC_MODE, &s->mode, + MPCC_ALPHA_BLND_MODE, &s->alpha_mode, + MPCC_ALPHA_MULTIPLIED_MODE, &s->pre_multiplied_alpha, + MPCC_BLND_ACTIVE_OVERLAP_ONLY, &s->overlap_only); + REG_GET_2(MPCC_STATUS[mpcc_inst], MPCC_IDLE, &s->idle, + MPCC_BUSY, &s->busy); + + /* Gamma block state */ + 
REG_GET(MPCC_OGAM_LUT_RAM_CONTROL[mpcc_inst], + MPCC_OGAM_CONFIG_STATUS, &s->rgam_mode); +} + static const struct mpc_funcs dcn20_mpc_funcs = { - .read_mpcc_state = mpc1_read_mpcc_state, + .read_mpcc_state = mpc2_read_mpcc_state, .insert_plane = mpc1_insert_plane, .remove_mpcc = mpc1_remove_mpcc, .mpc_init = mpc1_mpc_init, diff --git a/dri
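The REG_GET/REG_GET_2/REG_GET_4 calls throughout these hunks all reduce to the same mechanic: read a 32-bit register once and extract one or more named fields via a mask and shift. A simplified stand-in (the real macros also resolve per-ASIC shift/mask tables, which this sketch omits; the field layout below is invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the driver's REG_GET field extraction:
 * isolate a field with its mask, then right-align it with its shift. */
static uint32_t reg_get_field(uint32_t regval, uint32_t mask, uint32_t shift)
{
	return (regval & mask) >> shift;
}
```

Variants like REG_GET_4 just repeat this extraction for several fields of a single register read, which is why one MPCC_CONTROL read can fill mode, alpha_mode, pre_multiplied_alpha, and overlap_only at once.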
[PATCH v3 6/9] drm/amd/display: create DCN3-specific log for MPC state
Logging DCN3 MPC state was following DCN1 implementation that doesn't consider new DCN3 MPC color blocks. Create new elements according to DCN3 MPC color caps and a new DCN3-specific function for reading MPC data. v3: - remove gamut remap reg reading in favor of fixed31_32 matrix data Signed-off-by: Melissa Wen --- .../gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c | 48 ++- drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h | 7 +++ 2 files changed, 54 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c index a6a4c3413f89..bf3386cd444d 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c @@ -1440,8 +1440,54 @@ static void mpc3_set_mpc_mem_lp_mode(struct mpc *mpc) } } +static void mpc3_read_mpcc_state( + struct mpc *mpc, + int mpcc_inst, + struct mpcc_state *s) +{ + struct dcn30_mpc *mpc30 = TO_DCN30_MPC(mpc); + uint32_t rmu_status = 0xf; + + REG_GET(MPCC_OPP_ID[mpcc_inst], MPCC_OPP_ID, &s->opp_id); + REG_GET(MPCC_TOP_SEL[mpcc_inst], MPCC_TOP_SEL, &s->dpp_id); + REG_GET(MPCC_BOT_SEL[mpcc_inst], MPCC_BOT_SEL, &s->bot_mpcc_id); + REG_GET_4(MPCC_CONTROL[mpcc_inst], MPCC_MODE, &s->mode, + MPCC_ALPHA_BLND_MODE, &s->alpha_mode, + MPCC_ALPHA_MULTIPLIED_MODE, &s->pre_multiplied_alpha, + MPCC_BLND_ACTIVE_OVERLAP_ONLY, &s->overlap_only); + REG_GET_2(MPCC_STATUS[mpcc_inst], MPCC_IDLE, &s->idle, + MPCC_BUSY, &s->busy); + + /* Color blocks state */ + REG_GET(MPC_RMU_CONTROL, MPC_RMU0_MUX_STATUS, &rmu_status); + + if (rmu_status == mpcc_inst) { + REG_GET(SHAPER_CONTROL[0], + MPC_RMU_SHAPER_LUT_MODE_CURRENT, &s->shaper_lut_mode); + REG_GET(RMU_3DLUT_MODE[0], + MPC_RMU_3DLUT_MODE_CURRENT, &s->lut3d_mode); + REG_GET(RMU_3DLUT_READ_WRITE_CONTROL[0], + MPC_RMU_3DLUT_30BIT_EN, &s->lut3d_bit_depth); + REG_GET(RMU_3DLUT_MODE[0], + MPC_RMU_3DLUT_SIZE, &s->lut3d_size); + } else { + REG_GET(SHAPER_CONTROL[1], + MPC_RMU_SHAPER_LUT_MODE_CURRENT, 
&s->shaper_lut_mode); + REG_GET(RMU_3DLUT_MODE[1], + MPC_RMU_3DLUT_MODE_CURRENT, &s->lut3d_mode); + REG_GET(RMU_3DLUT_READ_WRITE_CONTROL[1], + MPC_RMU_3DLUT_30BIT_EN, &s->lut3d_bit_depth); + REG_GET(RMU_3DLUT_MODE[1], + MPC_RMU_3DLUT_SIZE, &s->lut3d_size); + } + +REG_GET_2(MPCC_OGAM_CONTROL[mpcc_inst], + MPCC_OGAM_MODE_CURRENT, &s->rgam_mode, + MPCC_OGAM_SELECT_CURRENT, &s->rgam_lut); +} + static const struct mpc_funcs dcn30_mpc_funcs = { - .read_mpcc_state = mpc1_read_mpcc_state, + .read_mpcc_state = mpc3_read_mpcc_state, .insert_plane = mpc1_insert_plane, .remove_mpcc = mpc1_remove_mpcc, .mpc_init = mpc1_mpc_init, diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h b/drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h index 61a2406dcc53..a11e40fddc44 100644 --- a/drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h +++ b/drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h @@ -199,6 +199,13 @@ struct mpcc_state { uint32_t overlap_only; uint32_t idle; uint32_t busy; + uint32_t shaper_lut_mode; + uint32_t lut3d_mode; + uint32_t lut3d_bit_depth; + uint32_t lut3d_size; + uint32_t rgam_mode; + uint32_t rgam_lut; + struct mpc_grph_gamut_adjustment gamut_remap; }; /** -- 2.42.0
Re: [PATCH 1/3] Revert "drm/prime: Unexport helpers for fd/handle conversion"
On 2023-11-28 12:22, Alex Deucher wrote: On Thu, Nov 23, 2023 at 6:12 PM Felix Kuehling wrote: [+Alex] On 2023-11-17 16:44, Felix Kuehling wrote: This reverts commit 71a7974ac7019afeec105a54447ae1dc7216cbb3. These helper functions are needed for KFD to export and import DMABufs the right way without duplicating the tracking of DMABufs associated with GEM objects while ensuring that move notifier callbacks are working as intended. CC: Christian König CC: Thomas Zimmermann Signed-off-by: Felix Kuehling Re: our discussion about v2 of this patch: If this version is acceptable, can I get an R-b or A-b? I would like to get this patch into drm-next as a prerequisite for patches 2 and 3. I cannot submit it to the current amd-staging-drm-next because the patch I'm reverting doesn't exist there yet. Patch 2 and 3 could go into drm-next as well, or go through Alex's amd-staging-drm-next branch once patch 1 is in drm-next. Alex, how do you prefer to coordinate this? I guess ideally this would go through my drm-next tree since your other patches depend on it unless others feel strongly that it should go through drm-misc. Yes, drm-next would work best for applying this patch and the two patches that depend on it. I can send you the rebased patches from my drm-next based branch that I used for testing this. 
Regards, Felix Alex Regards, Felix --- drivers/gpu/drm/drm_prime.c | 33 ++--- include/drm/drm_prime.h | 7 +++ 2 files changed, 25 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c index 63b709a67471..834a5e28abbe 100644 --- a/drivers/gpu/drm/drm_prime.c +++ b/drivers/gpu/drm/drm_prime.c @@ -278,7 +278,7 @@ void drm_gem_dmabuf_release(struct dma_buf *dma_buf) } EXPORT_SYMBOL(drm_gem_dmabuf_release); -/* +/** * drm_gem_prime_fd_to_handle - PRIME import function for GEM drivers * @dev: drm_device to import into * @file_priv: drm file-private structure @@ -292,9 +292,9 @@ EXPORT_SYMBOL(drm_gem_dmabuf_release); * * Returns 0 on success or a negative error code on failure. */ -static int drm_gem_prime_fd_to_handle(struct drm_device *dev, - struct drm_file *file_priv, int prime_fd, - uint32_t *handle) +int drm_gem_prime_fd_to_handle(struct drm_device *dev, +struct drm_file *file_priv, int prime_fd, +uint32_t *handle) { struct dma_buf *dma_buf; struct drm_gem_object *obj; @@ -360,6 +360,7 @@ static int drm_gem_prime_fd_to_handle(struct drm_device *dev, dma_buf_put(dma_buf); return ret; } +EXPORT_SYMBOL(drm_gem_prime_fd_to_handle); int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) @@ -408,7 +409,7 @@ static struct dma_buf *export_and_register_object(struct drm_device *dev, return dmabuf; } -/* +/** * drm_gem_prime_handle_to_fd - PRIME export function for GEM drivers * @dev: dev to export the buffer from * @file_priv: drm file-private structure @@ -421,10 +422,10 @@ static struct dma_buf *export_and_register_object(struct drm_device *dev, * The actual exporting from GEM object to a dma-buf is done through the * &drm_gem_object_funcs.export callback. 
*/ -static int drm_gem_prime_handle_to_fd(struct drm_device *dev, - struct drm_file *file_priv, uint32_t handle, - uint32_t flags, - int *prime_fd) +int drm_gem_prime_handle_to_fd(struct drm_device *dev, +struct drm_file *file_priv, uint32_t handle, +uint32_t flags, +int *prime_fd) { struct drm_gem_object *obj; int ret = 0; @@ -506,6 +507,7 @@ static int drm_gem_prime_handle_to_fd(struct drm_device *dev, return ret; } +EXPORT_SYMBOL(drm_gem_prime_handle_to_fd); int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) @@ -864,9 +866,9 @@ EXPORT_SYMBOL(drm_prime_get_contiguous_size); * @obj: GEM object to export * @flags: flags like DRM_CLOEXEC and DRM_RDWR * - * This is the implementation of the &drm_gem_object_funcs.export functions - * for GEM drivers using the PRIME helpers. It is used as the default for - * drivers that do not set their own. + * This is the implementation of the &drm_gem_object_funcs.export functions for GEM drivers + * using the PRIME helpers. It is used as the default in + * drm_gem_prime_handle_to_fd(). */ struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj, int flags) @@ -962,9 +964,10 @@ EXPORT_SYMBOL(drm_gem_prime_import_dev); * @dev: drm_device to import into
Re: [PATCH v3] drm/amdkfd: Run restore_workers on freezable WQs
On 2023-11-24 8:40, Lazar, Lijo wrote: On 11/24/2023 4:25 AM, Felix Kuehling wrote: Make restore workers freezable so we don't have to explicitly flush them in suspend and GPU reset code paths, and we don't accidentally try to restore BOs while the GPU is suspended. Not having to flush restore_work also helps avoid lock/fence dependencies in the GPU reset case where we're not allowed to wait for fences. A side effect of this is that we can now have multiple concurrent threads trying to signal the same eviction fence. Rework eviction fence signaling and replacement to account for that. The GPU reset path can no longer rely on restore_process_worker to resume queues because evict/restore workers can run independently of it. Instead call a new restore_process_helper directly. Not familiar with this code. For clarity, does this mean eviction/restore may happen while a reset is in progress? I'm not sure if anything would be able to cause an eviction while a GPU reset is in progress. I don't think it's likely. Any actual migration of BOs would need to wait until after the reset is done anyway. In case of suspend/resume, evictions happen because all the VRAM gets saved to system memory. Suspend/resume and evictions depend on the same helper or worker to restore BOs before reenabling user mode queues. GPU reset needs the same helper to stop user mode queues to ensure that applications don't continue running with potentially corrupted data. This patch removes some synchronization between different users of these common code paths to fix deadlocks in the suspend/resume and GPU reset scenarios. Regards, Felix Thanks, Lijo This is an RFC and request for testing. 
v2: - Reworked eviction fence signaling - Introduced restore_process_helper v3: - Handle unsignaled eviction fences in restore_process_bos Signed-off-by: Felix Kuehling Acked-by: Christian König --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 68 +++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 87 +++ drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 +- 3 files changed, 104 insertions(+), 55 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 2e302956a279..bdec88713a09 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1431,7 +1431,6 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info, amdgpu_amdkfd_restore_userptr_worker); *process_info = info; - *ef = dma_fence_get(&info->eviction_fence->base); } vm->process_info = *process_info; @@ -1462,6 +1461,8 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info, list_add_tail(&vm->vm_list_node, &(vm->process_info->vm_list_head)); vm->process_info->n_vms++; + + *ef = dma_fence_get(&vm->process_info->eviction_fence->base); mutex_unlock(&vm->process_info->lock); return 0; @@ -1473,10 +1474,7 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info, reserve_pd_fail: vm->process_info = NULL; if (info) { - /* Two fence references: one in info and one in *ef */ dma_fence_put(&info->eviction_fence->base); - dma_fence_put(*ef); - *ef = NULL; *process_info = NULL; put_pid(info->pid); create_evict_fence_fail: @@ -1670,7 +1668,8 @@ int amdgpu_amdkfd_criu_resume(void *p) goto out_unlock; } WRITE_ONCE(pinfo->block_mmu_notifications, false); - schedule_delayed_work(&pinfo->restore_userptr_work, 0); + queue_delayed_work(system_freezable_wq, + &pinfo->restore_userptr_work, 0); out_unlock: mutex_unlock(&pinfo->lock); @@ -2475,7 +2474,8 @@ int amdgpu_amdkfd_evict_userptr(struct mmu_interval_notifier *mni, KFD_QUEUE_EVICTION_TRIGGER_USERPTR); if (r) pr_err("Failed 
to quiesce KFD\n"); - schedule_delayed_work(&process_info->restore_userptr_work, + queue_delayed_work(system_freezable_wq, + &process_info->restore_userptr_work, msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS)); } mutex_unlock(&process_info->notifier_lock); @@ -2810,7 +2810,8 @@ static void amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work) /* If validation failed, reschedule another attempt */ if (evicted_bos) { - schedule_delayed_work(&process_info->restore_userptr_work, + queue_delayed_work(system_freezable_wq, + &process_info->restore_userptr_work, msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS)); kfd_smi_event_queue_restore_rescheduled(mm); @@ -2819,6 +2820,23 @@ static void amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work) put_task_struct(usertask); } +static void replace_eviction_fence(struct dma_fence **ef, +
[PATCH] drm/amd/amdgpu: Clean up VCN fw_shared and set flag bits during hw_init
- After whole GPU reset, the VCN fw_shared data is cleaned up. This seems like a new behavior and it is considered safer since we do not want to keep the residual data - However, the existing driver code still assumes the residual data. For example, in sw_init, we will set the UNIFIED_QUEUE flag bit. This flag bit is never set anywhere else. Then if a WGR happens, sw_init will not be called and therefore, we will lose this bit and it leads to a crash in VCN FW. - A proper fix is that we explicitly set the flag bit in hw_init every time and the data is repopulated after a WGR instead of assuming the data can survive the WGR. Signed-off-by: Bokun Zhang Change-Id: I783ff826f12e12eaf48d046ff9f1cef65558c7b9 --- drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 45 +-- 1 file changed, 28 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c index bf07aa200030..0cf3e639c480 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c @@ -111,6 +111,7 @@ static int vcn_v4_0_sw_init(void *handle) { struct amdgpu_ring *ring; struct amdgpu_device *adev = (struct amdgpu_device *)handle; + volatile struct amdgpu_vcn4_fw_shared *fw_shared; int i, r; r = amdgpu_vcn_sw_init(adev); @@ -124,11 +125,12 @@ static int vcn_v4_0_sw_init(void *handle) return r; for (i = 0; i < adev->vcn.num_vcn_inst; i++) { - volatile struct amdgpu_vcn4_fw_shared *fw_shared; - if (adev->vcn.harvest_config & (1 << i)) continue; + fw_shared = adev->vcn.inst[i].fw_shared.cpu_addr; + memset(fw_shared, 0, sizeof(struct amdgpu_vcn4_fw_shared)); + /* Init instance 0 sched_score to 1, so it's scheduled after other instances */ if (i == 0) atomic_set(&adev->vcn.inst[i].sched_score, 1); @@ -161,21 +163,6 @@ static int vcn_v4_0_sw_init(void *handle) if (r) return r; - fw_shared = adev->vcn.inst[i].fw_shared.cpu_addr; - fw_shared->present_flag_0 = cpu_to_le32(AMDGPU_FW_SHARED_FLAG_0_UNIFIED_QUEUE); - fw_shared->sq.is_enabled = 
1; - - fw_shared->present_flag_0 |= cpu_to_le32(AMDGPU_VCN_SMU_DPM_INTERFACE_FLAG); - fw_shared->smu_dpm_interface.smu_interface_type = (adev->flags & AMD_IS_APU) ? - AMDGPU_VCN_SMU_DPM_INTERFACE_APU : AMDGPU_VCN_SMU_DPM_INTERFACE_DGPU; - - if (amdgpu_ip_version(adev, VCN_HWIP, 0) == - IP_VERSION(4, 0, 2)) { - fw_shared->present_flag_0 |= AMDGPU_FW_SHARED_FLAG_0_DRM_KEY_INJECT; - fw_shared->drm_key_wa.method = - AMDGPU_DRM_KEY_INJECT_WORKAROUND_VCNFW_ASD_HANDSHAKING; - } - if (amdgpu_vcnfw_log) amdgpu_vcn_fwlog_init(&adev->vcn.inst[i]); } @@ -245,9 +232,33 @@ static int vcn_v4_0_sw_fini(void *handle) static int vcn_v4_0_hw_init(void *handle) { struct amdgpu_device *adev = (struct amdgpu_device *)handle; + volatile struct amdgpu_vcn4_fw_shared *fw_shared; struct amdgpu_ring *ring; int i, r; + for (i = 0; i < adev->vcn.num_vcn_inst; i++) { + if (adev->vcn.harvest_config & (1 << i)) + continue; + + fw_shared = adev->vcn.inst[i].fw_shared.cpu_addr; + fw_shared->present_flag_0 |= cpu_to_le32(AMDGPU_FW_SHARED_FLAG_0_UNIFIED_QUEUE); + fw_shared->sq.is_enabled = 1; + + fw_shared->present_flag_0 |= cpu_to_le32(AMDGPU_VCN_SMU_DPM_INTERFACE_FLAG); + fw_shared->smu_dpm_interface.smu_interface_type = (adev->flags & AMD_IS_APU) ? + AMDGPU_VCN_SMU_DPM_INTERFACE_APU : AMDGPU_VCN_SMU_DPM_INTERFACE_DGPU; + + if (amdgpu_ip_version(adev, VCN_HWIP, 0) == + IP_VERSION(4, 0, 2)) { + fw_shared->present_flag_0 |= AMDGPU_FW_SHARED_FLAG_0_DRM_KEY_INJECT; + fw_shared->drm_key_wa.method = + AMDGPU_DRM_KEY_INJECT_WORKAROUND_VCNFW_ASD_HANDSHAKING; + } + + if (amdgpu_vcnfw_log) + fw_shared->present_flag_0 |= cpu_to_le32(AMDGPU_VCN_FW_LOGGING_FLAG); + } + if (amdgpu_sriov_vf(adev)) { r = vcn_v4_0_start_sriov(adev); if (r) -- 2.34.1
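The core idea of the fix reduces to a small model: a whole GPU reset (WGR) wipes the fw_shared region and re-runs hw_init() but not sw_init(), so any flag written only in sw_init() is lost after the first reset. Writing the flags in hw_init() repopulates them every time. Illustrative only — hypothetical names and a plain-C stand-in for the VRAM-backed region, not the driver code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UNIFIED_QUEUE_FLAG 0x1u	/* hypothetical stand-in flag bit */

struct fw_shared { uint32_t present_flag_0; };

static struct fw_shared shared;	/* models the VRAM-backed fw_shared */

/* Runs once at driver load; after the fix it no longer owns the flags. */
static void sw_init(void) { }

/* Runs at load AND after every reset; flags are (re)written here. */
static void hw_init(void)
{
	shared.present_flag_0 |= UNIFIED_QUEUE_FLAG;
}

static void whole_gpu_reset(void)
{
	memset(&shared, 0, sizeof(shared));	/* WGR clears fw_shared... */
	hw_init();				/* ...but re-runs hw_init */
}
```

Had hw_init() not set the flag, the assertion after the reset would fail — which corresponds to the VCN firmware crash described in the commit message.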
Re: [PATCH v5 00/32] drm/amd/display: add AMD driver-specific properties for color mgmt
On 2023-11-16 14:57, Melissa Wen wrote:
> Hello,
>
> This series extends the current KMS color management API with AMD
> driver-specific properties to enhance the color management support on
> AMD Steam Deck. The key additions to the color pipeline include:

[snip]

> Melissa Wen (18):
>   drm/drm_mode_object: increase max objects to accommodate new color props
>   drm/drm_property: make replace_property_blob_from_id a DRM helper
>   drm/drm_plane: track color mgmt changes per plane

If all patches are merged through amd-staging-drm-next, I worry that conflicts creep in if any code around replace_property_blob_from_id changes in DRM.

My plan is to merge the DRM patches through drm-misc-next, as well as include them in the amd-staging-drm-next merge. They should then fall out at the next amd-staging-drm-next pull and (hopefully) ensure that there is no conflict.

If there are no objections, I'll go ahead with that later this week.

Harry

>   drm/amd/display: add driver-specific property for plane degamma LUT
>   drm/amd/display: explicitly define EOTF and inverse EOTF
>   drm/amd/display: document AMDGPU pre-defined transfer functions
>   drm/amd/display: add plane 3D LUT driver-specific properties
>   drm/amd/display: add plane shaper LUT and TF driver-specific properties
>   drm/amd/display: add CRTC gamma TF driver-specific property
>   drm/amd/display: add comments to describe DM crtc color mgmt behavior
>   drm/amd/display: encapsulate atomic regamma operation
>   drm/amd/display: decouple steps for mapping CRTC degamma to DC plane
>   drm/amd/display: reject atomic commit if setting both plane and CRTC degamma
>   drm/amd/display: add plane shaper LUT support
>   drm/amd/display: add plane shaper TF support
>   drm/amd/display: add plane 3D LUT support
>   drm/amd/display: add plane CTM driver-specific property
>   drm/amd/display: add plane CTM support
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h      |  91 ++
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  34 +-
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 108 +++
>  .../amd/display/amdgpu_dm/amdgpu_dm_color.c   | 818 --
>  .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c    |  72 ++
>  .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   | 232 -
>  .../gpu/drm/amd/display/include/fixed31_32.h  |  12 +
>  drivers/gpu/drm/arm/malidp_crtc.c             |   2 +-
>  drivers/gpu/drm/drm_atomic.c                  |   1 +
>  drivers/gpu/drm/drm_atomic_state_helper.c     |   1 +
>  drivers/gpu/drm/drm_atomic_uapi.c             |  43 +-
>  drivers/gpu/drm/drm_property.c                |  49 ++
>  include/drm/drm_mode_object.h                 |   2 +-
>  include/drm/drm_plane.h                       |   7 +
>  include/drm/drm_property.h                    |   6 +
>  include/uapi/drm/drm_mode.h                   |   8 +
>  16 files changed, 1377 insertions(+), 109 deletions(-)
Re: Radeon regression in 6.6 kernel
On Mon, Nov 27, 2023 at 6:24 PM Phillip Susi wrote:
>
> Alex Deucher writes:
>
>>> In that case those are the already known problems with the scheduler
>>> changes, aren't they?
>>
>> Yes. Those changes went into 6.7 though, not 6.6 AFAIK. Maybe I'm
>> misunderstanding what the original report was actually testing. If it
>> was 6.7, then try reverting:
>> 56e449603f0ac580700621a356d35d5716a62ce5
>> b70438004a14f4d0f9890b3297cd66248728546c
>
> At some point it was suggested that I file a gitlab issue, but I took
> this to mean it was already known and being worked on. -rc3 came out
> today and still has the problem. Is there a known issue I could track?

At this point, unless there are any objections, I think we should just revert the two patches.

Alex
Re: Radeon regression in 6.6 kernel
On 2023-11-28 17:13, Alex Deucher wrote:
> On Mon, Nov 27, 2023 at 6:24 PM Phillip Susi wrote:
>>
>> Alex Deucher writes:
>>>> In that case those are the already known problems with the scheduler
>>>> changes, aren't they?
>>>
>>> Yes. Those changes went into 6.7 though, not 6.6 AFAIK. Maybe I'm
>>> misunderstanding what the original report was actually testing. If it
>>> was 6.7, then try reverting:
>>> 56e449603f0ac580700621a356d35d5716a62ce5
>>> b70438004a14f4d0f9890b3297cd66248728546c
>>
>> At some point it was suggested that I file a gitlab issue, but I took
>> this to mean it was already known and being worked on. -rc3 came out
>> today and still has the problem. Is there a known issue I could track?
>
> At this point, unless there are any objections, I think we should just
> revert the two patches.

Uhm, no. Why "the two" patches?

This email, part of this thread, https://lore.kernel.org/all/87r0kircdo@vps.thesusis.net/ clearly states that reverting *only* this commit,
  56e449603f0ac5 drm/sched: Convert the GPU scheduler to variable number of run-queues
*does not* mitigate the failed suspend. (Furthermore, this commit doesn't really change anything operational, other than using an allocated array, instead of a static one, in DRM, while the 2nd patch is solely contained within the amdgpu driver code.)

That leaves only this change,
  b70438004a14f4 drm/amdgpu: move buffer funcs setting up a level
to be at fault, as the kernel log attached in the linked email above shows.

The conclusion is that only b70438004a14f4 needs reverting.
--
Regards,
Luben
Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices
On Tue, 28 Nov 2023 at 23:07, Christian König wrote: > > Am 28.11.23 um 13:50 schrieb Weixi Zhu: > > The problem: > > > > Accelerator driver developers are forced to reinvent external MM subsystems > > case by case, because Linux core MM only considers host memory resources. > > These reinvented MM subsystems have similar orders of magnitude of LoC as > > Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has > > 30K. Meanwhile, more and more vendors are implementing their own > > accelerators, e.g. Microsoft's Maia 100. At the same time, > > application-level developers suffer from poor programmability -- they must > > consider parallel address spaces and be careful about the limited device > > DRAM capacity. This can be alleviated if a malloc()-ed virtual address can > > be shared by the accelerator, or the abundant host DRAM can further > > transparently backup the device local memory. > > > > These external MM systems share similar mechanisms except for the > > hardware-dependent part, so reinventing them is effectively introducing > > redundant code (14K~70K for each case). Such developing/maintaining is not > > cheap. Furthermore, to share a malloc()-ed virtual address, device drivers > > need to deeply interact with Linux MM via low-level MM APIs, e.g. MMU > > notifiers/HMM. This raises the bar for driver development, since developers > > must understand how Linux MM works. Further, it creates code maintenance > > problems -- any changes to Linux MM potentially require coordinated changes > > to accelerator drivers using low-level MM APIs. > > > > Putting a cache-coherent bus between host and device will not make these > > external MM subsystems disappear. For example, a throughput-oriented > > accelerator will not tolerate executing heavy memory access workload with > > a host MMU/IOMMU via a remote bus. 
Therefore, devices will still have > > their own MMU and pick a simpler page table format for lower address > > translation overhead, requiring external MM subsystems. > > > > > > > > What GMEM (Generalized Memory Management [1]) does: > > > > GMEM extends Linux MM to share its machine-independent MM code. Only > > high-level interface is provided for device drivers. This prevents > > accelerator drivers from reinventing the wheel, but relies on drivers to > > implement their hardware-dependent functions declared by GMEM. GMEM's key > > interface include gm_dev_create(), gm_as_create(), gm_as_attach() and > > gm_dev_register_physmem(). Here briefly describe how a device driver > > utilizes them: > > 1. At boot time, call gm_dev_create() and registers the implementation of > > hardware-dependent functions as declared in struct gm_mmu. > > - If the device has local DRAM, call gm_dev_register_physmem() to > > register available physical addresses. > > 2. When a device context is initialized (e.g. triggered by ioctl), check if > > the current CPU process has been attached to a gmem address space > > (struct gm_as). If not, call gm_as_create() and point current->mm->gm_as > > to it. > > 3. Call gm_as_attach() to attach the device context to a gmem address space. > > 4. Invoke gm_dev_fault() to resolve a page fault or prepare data before > > device computation happens. > > > > GMEM has changed the following assumptions in Linux MM: > >1. An mm_struct not only handle a single CPU context, but may also handle > > external memory contexts encapsulated as gm_context listed in > > mm->gm_as. An external memory context can include a few or all of the > > following parts: an external MMU (that requires TLB invalidation), an > > external page table (that requires PTE manipulation) and external DRAM > > (that requires physical memory management). > > Well that is pretty much exactly what AMD has already proposed with KFD > and was rejected for rather good reasons. 
> > > > MMU functions > > The MMU functions peer_map() and peer_unmap() overlap other functions, > > leaving a question if the MMU functions should be decoupled as more basic > > operations. Decoupling them could potentially prevent device drivers > > coalescing these basic steps within a single host-device communication > > operation, while coupling them makes it more difficult for device drivers > > to utilize GMEM interface. > > Well to be honest all of this sounds like history to me. We have already > seen the same basic approach in KFD, HMM and to some extend in TTM as well. > > And all of them more or less failed. Why should this here be different? Any info we have on why this has failed to work in the past would be useful to provide. This is one of those cases where we may not have documented the bad ideas to stop future developers from thinking they are bad. I do think we would want more common code in this area, but I would think we'd have it more on the driver infrastructure side, than in the core mm. Dave.
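For readers skimming the thread, the four-step driver flow described in the cover letter can be sketched in plain C. To be clear: the `gm_dev_create()`, `gm_dev_register_physmem()`, `gm_as_create()` and `gm_as_attach()` names come from the cover letter, but every body below is a made-up stub standing in for the proposed GMEM core, not the actual RFC code:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-ins for the proposed GMEM structures. */
struct gm_mmu {                 /* hardware-dependent callbacks a driver registers */
    int (*peer_map)(unsigned long va, unsigned long pa, size_t len);
    int (*peer_unmap)(unsigned long va, size_t len);
};
struct gm_dev { struct gm_mmu *mmu; };
struct gm_as  { struct gm_dev *dev; };

/* --- stubs standing in for the GMEM core --- */
static struct gm_dev *gm_dev_create(struct gm_mmu *mmu)
{
    static struct gm_dev dev;
    dev.mmu = mmu;
    return &dev;
}

static int gm_dev_register_physmem(struct gm_dev *dev,
                                   unsigned long start, unsigned long end)
{
    (void)dev;
    return start < end ? 0 : -1;   /* register device-local DRAM range */
}

static struct gm_as *gm_as_create(void)
{
    static struct gm_as as;
    return &as;
}

static int gm_as_attach(struct gm_as *as, struct gm_dev *dev)
{
    as->dev = dev;                 /* device context joins the address space */
    return 0;
}

/* --- driver-provided, hardware-dependent MMU callbacks --- */
static int my_peer_map(unsigned long va, unsigned long pa, size_t len)
{
    printf("map va=%#lx -> pa=%#lx (%zu bytes)\n", va, pa, len);
    return 0;
}

static int my_peer_unmap(unsigned long va, size_t len)
{
    (void)va; (void)len;
    return 0;
}

/* Steps 1-4 from the cover letter, in order. */
int gmem_driver_init(void)
{
    static struct gm_mmu mmu = { my_peer_map, my_peer_unmap };

    struct gm_dev *dev = gm_dev_create(&mmu);               /* 1: at boot time */
    if (gm_dev_register_physmem(dev, 0x0, 0x40000000UL))    /* local DRAM range */
        return -1;

    struct gm_as *as = gm_as_create();                      /* 2: first device ctx */
    if (gm_as_attach(as, dev))                              /* 3: attach the ctx */
        return -1;

    /* 4: a device page fault would now be resolved via the callbacks */
    return as->dev->mmu->peer_map(0x1000, 0x2000, 4096);
}
```

The sketch also makes the thread's open question concrete: `peer_map()` here bundles PTE setup, TLB handling and data movement into one callback, which is exactly the coupling the cover letter asks whether to decompose.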
Re: PSP_VMBX_POLLING_LIMIT too big
On 11/28/2023 9:51 PM, Mario Limonciello wrote:
> Hi,
>
> In amd-staging-drm-next 46fe6312082c ("drm/amdgpu: update retry times
> for psp BL wait") and upstream a11156ff6f41 ("drm/amdgpu: update retry
> times for psp BL wait") the number of loops for
> psp_v13_0_wait_for_bootloader() to try again increased significantly.
> It went from 10 loops to 20k loops. Essentially this means that the
> function can "effectively" no longer fail.

The PSP_VMBX_POLLING_LIMIT of 20k was introduced by f2328c2ba0e84 ("drm/amdgpu: update retry times for psp vmbx wait"). 20k is too much even for PSP 13.0.6. Will reduce it to 3000 (~5mins) for 13.0.6 and keep the default 10 for the others.

Thanks,
Lijo

> I've got an issue I'm looking at where runtime resume for a dGPU fails,
> and because of this change the system gets stuck in a never ending busy
> loop instead of cleanly returning an error code to the caller. The
> outcome is the system appears hung while the 20k loops run instead of
> just the dGPU failing to resume.
>
> Is this 20k value really required? Or can we reduce it back to
> something more manageable?
>
> Thanks,
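The trade-off under discussion is the classic bounded-poll pattern: the retry count caps how long a failed resume can spin before the caller gets an error back. A minimal sketch of that pattern follows; `check()` and the helpers are stand-ins for the real PSP register poll, not the actual psp_v13_0 code, and the in-kernel delay between polls is omitted:

```c
/* Bounded poll: retry at most `retries` times, then fail cleanly. */
static int wait_for_ready(int (*check)(void), int retries)
{
    for (int i = 0; i < retries; i++) {
        if (check())
            return 0;   /* device came back */
        /* the kernel would udelay()/msleep() here between polls */
    }
    return -1;          /* bounded failure instead of an endless spin */
}

/* Demonstration helpers (hypothetical). */
static int never_ready(void) { return 0; }

static int calls;
static int ready_on_third(void) { return ++calls >= 3; }
```

With retries set to 20k and a per-iteration delay, a dead device keeps the caller busy for the whole budget, which is why the complaint above is about the limit rather than the loop itself.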
RE: [PATCH] drm/amdgpu: Use another offset for GC 9.4.3 remap
[AMD Official Use Only - General]

Reviewed-by: Hawking Zhang

Regards,
Hawking

-----Original Message-----
From: Lazar, Lijo
Sent: Tuesday, November 28, 2023 19:26
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Deucher, Alexander ; Kuehling, Felix ; Ma, Le ; Gadre, Mangesh
Subject: [PATCH] drm/amdgpu: Use another offset for GC 9.4.3 remap

The legacy region at 0x7F000 maps to valid registers in GC 9.4.3 SOCs. Use 0x1A000 offset instead as MMIO register remap region.

Signed-off-by: Lijo Lazar
---
 drivers/gpu/drm/amd/amdgpu/soc15.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index e3d41e8aac9d..9ad4d6d3122b 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -1162,6 +1162,11 @@ static int soc15_common_early_init(void *handle)
 			AMD_PG_SUPPORT_VCN_DPG |
 			AMD_PG_SUPPORT_JPEG;
 		adev->external_rev_id = adev->rev_id + 0x46;
+		/* GC 9.4.3 uses MMIO register region hole at a different offset */
+		if (!amdgpu_sriov_vf(adev)) {
+			adev->rmmio_remap.reg_offset = 0x1A000;
+			adev->rmmio_remap.bus_addr = adev->rmmio_base + 0x1A000;
+		}
 		break;
 	default:
 		/* FIXME: not supported yet */
--
2.25.1
[PATCH 1/3] drm/amd: Fix handling of amdgpu.runpm on systems with BOCO
On products that support both BOCO and BACO it should be possible to override the BOCO detection and force BACO by amdgpu.runpm=1 but this doesn't work today. Adjust the logic used in amdgpu_driver_load_kms() to make sure that module parameters are looked at first and only use automatic policies in the -1 or -2 cases. Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 80 +++-- 1 file changed, 48 insertions(+), 32 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index b5ebafd4a3ad..29381da08fd5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -121,6 +121,53 @@ void amdgpu_register_gpu_instance(struct amdgpu_device *adev) mutex_unlock(&mgpu_info.mutex); } +static void amdgpu_driver_set_runtime_pm_mode(struct amdgpu_device *adev) +{ + struct drm_device *dev = adev_to_drm(adev); + + adev->pm.rpm_mode = AMDGPU_RUNPM_NONE; + + switch (amdgpu_runtime_pm) { + case -1: + case -2: + break; + case 0: + default: + return; + case 1: + if (amdgpu_device_supports_baco(dev)) + adev->pm.rpm_mode = AMDGPU_RUNPM_BACO; + else + dev_err(adev->dev, "BACO is not supported on this ASIC\n"); + return; + case 2: + // TODO: adjust plumbing to be able to pull PP table to check MACO support as well + if (amdgpu_device_supports_baco(dev)) + adev->pm.rpm_mode = AMDGPU_RUNPM_BACO; + else + dev_err(adev->dev, "BAMACO is not supported on this ASIC\n"); + return; + } + + if (amdgpu_device_supports_px(dev)) { + adev->pm.rpm_mode = AMDGPU_RUNPM_PX; + dev_info(adev->dev, "Using ATPX for runtime pm\n"); + } else if (amdgpu_device_supports_boco(dev)) { + adev->pm.rpm_mode = AMDGPU_RUNPM_BOCO; + dev_info(adev->dev, "Using BOCO for runtime pm\n"); + } else if (amdgpu_device_supports_baco(dev)) { + if (adev->asic_type == CHIP_VEGA10) { + /* enable BACO as runpm mode if noretry=0 */ + if (!adev->gmc.noretry) + adev->pm.rpm_mode = AMDGPU_RUNPM_BACO; + } else { + 
adev->pm.rpm_mode = AMDGPU_RUNPM_BACO; + } + if (adev->pm.rpm_mode == AMDGPU_RUNPM_BACO) + dev_info(adev->dev, "Using BACO for runtime pm\n"); + } +} + /** * amdgpu_driver_load_kms - Main load function for KMS. * @@ -149,38 +196,7 @@ int amdgpu_driver_load_kms(struct amdgpu_device *adev, unsigned long flags) goto out; } - adev->pm.rpm_mode = AMDGPU_RUNPM_NONE; - if (amdgpu_device_supports_px(dev) && - (amdgpu_runtime_pm != 0)) { /* enable PX as runtime mode */ - adev->pm.rpm_mode = AMDGPU_RUNPM_PX; - dev_info(adev->dev, "Using ATPX for runtime pm\n"); - } else if (amdgpu_device_supports_boco(dev) && - (amdgpu_runtime_pm != 0)) { /* enable boco as runtime mode */ - adev->pm.rpm_mode = AMDGPU_RUNPM_BOCO; - dev_info(adev->dev, "Using BOCO for runtime pm\n"); - } else if (amdgpu_device_supports_baco(dev) && - (amdgpu_runtime_pm != 0)) { - switch (adev->asic_type) { - case CHIP_VEGA20: - case CHIP_ARCTURUS: - /* enable BACO as runpm mode if runpm=1 */ - if (amdgpu_runtime_pm > 0) - adev->pm.rpm_mode = AMDGPU_RUNPM_BACO; - break; - case CHIP_VEGA10: - /* enable BACO as runpm mode if noretry=0 */ - if (!adev->gmc.noretry) - adev->pm.rpm_mode = AMDGPU_RUNPM_BACO; - break; - default: - /* enable BACO as runpm mode on CI+ */ - adev->pm.rpm_mode = AMDGPU_RUNPM_BACO; - break; - } - - if (adev->pm.rpm_mode == AMDGPU_RUNPM_BACO) - dev_info(adev->dev, "Using BACO for runtime pm\n"); - } + amdgpu_driver_set_runtime_pm_mode(adev); /* Call ACPI methods: require modeset init * but failure is not fatal -- 2.34.1
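The selection order the patch establishes can be modelled as a small standalone function: explicit amdgpu.runpm values (0/1/2) are honoured first, and automatic detection (PX, then BOCO, then BACO) only runs for -1/-2. This is a sketch, not driver code: the bool flags replace the real probe helpers (amdgpu_device_supports_px()/boco()/baco()), and the VEGA10 noretry special case is omitted:

```c
#include <stdbool.h>

enum rpm_mode { RPM_NONE, RPM_PX, RPM_BOCO, RPM_BACO };

static enum rpm_mode pick_rpm_mode(int runpm, bool px, bool boco, bool baco)
{
    switch (runpm) {
    case -1:
    case -2:
        break;                  /* fall through to automatic detection */
    case 0:
    default:
        return RPM_NONE;        /* runtime pm explicitly disabled */
    case 1:
    case 2:
        /* forced BACO (BAMACO for 2, per patch 2/3): BOCO support is
         * deliberately ignored, which is the bug being fixed */
        return baco ? RPM_BACO : RPM_NONE;
    }

    /* automatic policy: PX > BOCO > BACO */
    if (px)
        return RPM_PX;
    if (boco)
        return RPM_BOCO;
    if (baco)
        return RPM_BACO;
    return RPM_NONE;
}
```

The key behavioral change is visible in the first test case below: on a system supporting both BOCO and BACO, runpm=1 now yields BACO instead of falling into the BOCO branch.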
[PATCH 0/3] Obey amdgpu.runpm even on BOCO systems
I've found that on a system that supports BOCO, when I try to override it, BOCO is still used. This is because module parameter use is intermingled with automatic detection. This series moves automatic detection after module parameter use and makes all callers obey the result.

Mario Limonciello (3):
  drm/amd: Fix handling of amdgpu.runpm on systems with BOCO
  drm/amd: Introduce new enum for BAMACO
  drm/amd: Drop calls for checking "support" for BACO/BOCO/PX

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  7 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       | 34
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       | 80 +++
 drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h       |  1 +
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c     |  3 +-
 .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c    |  2 +-
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c    |  5 +-
 7 files changed, 77 insertions(+), 55 deletions(-)

--
2.34.1
[PATCH 2/3] drm/amd: Introduce new enum for BAMACO
Rather than plumbing module parameter deep into IP declare BAMACO runpm mode at amdgpu_driver_set_runtime_pm_mode() and then detect this mode in consumers. Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 2 +- drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h| 1 + drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 2 +- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 5 +++-- 4 files changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index 29381da08fd5..c6c87ab71d94 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -143,7 +143,7 @@ static void amdgpu_driver_set_runtime_pm_mode(struct amdgpu_device *adev) case 2: // TODO: adjust plumbing to be able to pull PP table to check MACO support as well if (amdgpu_device_supports_baco(dev)) - adev->pm.rpm_mode = AMDGPU_RUNPM_BACO; + adev->pm.rpm_mode = AMDGPU_RUNPM_BAMACO; else dev_err(adev->dev, "BAMACO is not supported on this ASIC\n"); return; diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h index d76b0a60db44..3434c31b434b 100644 --- a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h +++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h @@ -50,6 +50,7 @@ enum amdgpu_runpm_mode { AMDGPU_RUNPM_PX, AMDGPU_RUNPM_BOCO, AMDGPU_RUNPM_BACO, + AMDGPU_RUNPM_BAMACO, }; struct amdgpu_ps { diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c index 5a314d0316c1..64c8783b4118 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c @@ -1597,7 +1597,7 @@ int smu_v11_0_baco_set_state(struct smu_context *smu, enum smu_baco_state state) case IP_VERSION(11, 0, 11): case IP_VERSION(11, 0, 12): case IP_VERSION(11, 0, 13): - if (amdgpu_runtime_pm == 2) + if (adev->pm.rpm_mode == AMDGPU_RUNPM_BAMACO) ret = smu_cmn_send_smc_msg_with_param(smu, 
SMU_MSG_EnterBaco, D3HOT_BAMACO_SEQUENCE, diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c index 3c595ac897d6..b77763d6c72f 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c @@ -2259,7 +2259,8 @@ int smu_v13_0_baco_set_state(struct smu_context *smu, if (state == SMU_BACO_STATE_ENTER) { ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_EnterBaco, - (smu_baco->maco_support && amdgpu_runtime_pm != 1) ? + (smu_baco->maco_support && + adev->pm.rpm_mode == AMDGPU_RUNPM_BAMACO) ? BACO_SEQ_BAMACO : BACO_SEQ_BACO, NULL); } else { @@ -2288,7 +2289,7 @@ int smu_v13_0_baco_enter(struct smu_context *smu) if (adev->in_runpm && smu_cmn_is_audio_func_enabled(adev)) { return smu_v13_0_baco_set_armd3_sequence(smu, - (smu_baco->maco_support && amdgpu_runtime_pm != 1) ? + (smu_baco->maco_support && adev->pm.rpm_mode == AMDGPU_RUNPM_BAMACO) ? BACO_SEQ_BAMACO : BACO_SEQ_BACO); } else { ret = smu_v13_0_baco_set_state(smu, SMU_BACO_STATE_ENTER); -- 2.34.1
[PATCH 3/3] drm/amd: Drop calls for checking "support" for BACO/BOCO/PX
As the module parameter can be used to control behavior, all parts of the driver should obey what has been programmed by user or detected by auto mode rather than what "can" be supported. Drop calls to all functions that check for BACO/BOCO/PX runpm modes and instead use the variable that is programmed when device is probed. Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 34 -- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 3 +- 3 files changed, 24 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 1181fe4baf0f..8f7377b37f2f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -1822,9 +1822,10 @@ static void amdgpu_switcheroo_set_state(struct pci_dev *pdev, enum vga_switcheroo_state state) { struct drm_device *dev = pci_get_drvdata(pdev); + struct amdgpu_device *adev = drm_to_adev(dev); int r; - if (amdgpu_device_supports_px(dev) && state == VGA_SWITCHEROO_OFF) + if (adev->pm.rpm_mode == AMDGPU_RUNPM_PX && state == VGA_SWITCHEROO_OFF) return; if (state == VGA_SWITCHEROO_ON) { @@ -4244,7 +4245,7 @@ int amdgpu_device_init(struct amdgpu_device *adev, if ((adev->pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA) vga_client_register(adev->pdev, amdgpu_device_vga_set_decode); - px = amdgpu_device_supports_px(ddev); + px = (adev->pm.rpm_mode == AMDGPU_RUNPM_PX); if (px || (!dev_is_removable(&adev->pdev->dev) && apple_gmux_detect(NULL, NULL))) @@ -4392,7 +4393,7 @@ void amdgpu_device_fini_sw(struct amdgpu_device *adev) kfree(adev->fru_info); adev->fru_info = NULL; - px = amdgpu_device_supports_px(adev_to_drm(adev)); + px = (adev->pm.rpm_mode == AMDGPU_RUNPM_PX); if (px || (!dev_is_removable(&adev->pdev->dev) && apple_gmux_detect(NULL, NULL))) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 
e39f3a334c9d..12fb8398fb45 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -2248,10 +2248,10 @@ static int amdgpu_pci_probe(struct pci_dev *pdev, if (adev->pm.rpm_mode != AMDGPU_RUNPM_NONE) { /* only need to skip on ATPX */ - if (amdgpu_device_supports_px(ddev)) + if (adev->pm.rpm_mode == AMDGPU_RUNPM_PX) dev_pm_set_driver_flags(ddev->dev, DPM_FLAG_NO_DIRECT_COMPLETE); /* we want direct complete for BOCO */ - if (amdgpu_device_supports_boco(ddev)) + if (adev->pm.rpm_mode == AMDGPU_RUNPM_BOCO) dev_pm_set_driver_flags(ddev->dev, DPM_FLAG_SMART_PREPARE | DPM_FLAG_SMART_SUSPEND | DPM_FLAG_MAY_SKIP_RESUME); @@ -2284,7 +2284,8 @@ static int amdgpu_pci_probe(struct pci_dev *pdev, * into D0 state. Then there will be a PMFW-aware D-state * transition(D0->D3) on runpm suspend. */ - if (amdgpu_device_supports_baco(ddev) && + if ((adev->pm.rpm_mode == AMDGPU_RUNPM_BACO || +adev->pm.rpm_mode == AMDGPU_RUNPM_BAMACO) && !(adev->flags & AMD_IS_APU) && (adev->asic_type >= CHIP_NAVI10)) amdgpu_get_secondary_funcs(adev); @@ -2466,7 +2467,7 @@ static int amdgpu_pmops_prepare(struct device *dev) /* Return a positive number here so * DPM_FLAG_SMART_SUSPEND works properly */ - if (amdgpu_device_supports_boco(drm_dev) && + if (adev->pm.rpm_mode == AMDGPU_RUNPM_BOCO && pm_runtime_suspended(dev)) return 1; @@ -2664,7 +2665,7 @@ static int amdgpu_pmops_runtime_suspend(struct device *dev) } adev->in_runpm = true; - if (amdgpu_device_supports_px(drm_dev)) + if (adev->pm.rpm_mode == AMDGPU_RUNPM_PX) drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING; /* @@ -2674,7 +2675,7 @@ static int amdgpu_pmops_runtime_suspend(struct device *dev) * platforms. * TODO: this may be also needed for PX capable platform. 
*/ - if (amdgpu_device_supports_boco(drm_dev)) + if (adev->pm.rpm_mode == AMDGPU_RUNPM_BOCO) adev->mp1_state = PP_MP1_STATE_UNLOAD; ret = amdgpu_device_prepare(drm_dev); @@ -2683,15 +2684,15 @@ static int amdgpu_pmops_runtime_suspend(struct device *dev) ret = amdgpu_device_suspend(drm_dev, false); if (ret) { adev->in_runpm = false; -
RE: [PATCH] drm/amdgpu: Skip access gfx11 golden registers under SRIOV
[AMD Official Use Only - General]

Reviewed-By: Horace Chen

-----Original Message-----
From: Yin, ZhenGuo (Chris)
Sent: Monday, November 27, 2023 10:43 AM
To: Yin, ZhenGuo (Chris) ; amd-gfx@lists.freedesktop.org; Zha, YiFan(Even) ; Chen, Horace
Subject: Re: [PATCH] drm/amdgpu: Skip access gfx11 golden registers under SRIOV

Add potential reviewers.

On 11/23/2023 4:57 PM, ZhenGuo Yin wrote:
> [Why]
> Golden registers are PF-only registers on gfx11.
> RLCG interface will return "out-of-range" under SRIOV VF.
>
> [How]
> Skip access gfx11 golden registers under SRIOV.
>
> Signed-off-by: ZhenGuo Yin
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index 8ed4a6fb147a..288e926e5cd4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -293,6 +293,9 @@ static void gfx_v11_0_set_kiq_pm4_funcs(struct amdgpu_device *adev)
>
>  static void gfx_v11_0_init_golden_registers(struct amdgpu_device *adev)
>  {
> +	if (amdgpu_sriov_vf(adev))
> +		return;
> +
>  	switch (amdgpu_ip_version(adev, GC_HWIP, 0)) {
>  	case IP_VERSION(11, 0, 1):
>  	case IP_VERSION(11, 0, 4):