Re: [PATCH] drm/amdgpu: Fix uninitialized return value
Am 27.11.23 um 22:55 schrieb Alex Deucher:

On Mon, Nov 27, 2023 at 2:22 PM Christian König wrote:

Am 27.11.23 um 19:29 schrieb Lijo Lazar:

The return value is uninitialized if ras context is NULL.

Fixes: 0f4c8faa043c ("drm/amdgpu: Move mca debug mode decision to ras")
Signed-off-by: Lijo Lazar
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 1a8668a63e67..f6b47ebce9d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -3410,7 +3410,7 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device *adev)
 int amdgpu_ras_set_mca_debug_mode(struct amdgpu_device *adev, bool enable)
 {
 	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
-	int ret;
+	int ret = 0;

That's usually considered very bad coding style and complained about by automated checkers. Instead explicitly set the return value in the code paths not actually setting it.

In this case, the function is so short, I think it makes things less readable to do that.

Yeah, indeed but that doesn't help us with the coding style checkers.

We could do something like this instead:

	if (!con)
		return 0;

	ret = amdgpu_mca_smu_set_debug_mode(adev, enable);
	...

Regards,
Christian.

Reviewed-by: Alex Deucher

Regards,
Christian.

 	if (con) {
 		ret = amdgpu_mca_smu_set_debug_mode(adev, enable);
Re: [PATCH] drm/amdgpu: Fix uninitialized return value
On 11/28/2023 3:07 PM, Christian König wrote:

Am 27.11.23 um 22:55 schrieb Alex Deucher:

On Mon, Nov 27, 2023 at 2:22 PM Christian König wrote:

Am 27.11.23 um 19:29 schrieb Lijo Lazar:

The return value is uninitialized if ras context is NULL.

Fixes: 0f4c8faa043c ("drm/amdgpu: Move mca debug mode decision to ras")
Signed-off-by: Lijo Lazar
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 1a8668a63e67..f6b47ebce9d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -3410,7 +3410,7 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device *adev)
 int amdgpu_ras_set_mca_debug_mode(struct amdgpu_device *adev, bool enable)
 {
 	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
-	int ret;
+	int ret = 0;

That's usually considered very bad coding style and complained about by automated checkers. Instead explicitly set the return value in the code paths not actually setting it.

In this case, the function is so short, I think it makes things less readable to do that.

Yeah, indeed but that doesn't help us with the coding style checkers.

Are these checkers for real? I see many instances of variable initialization, including in core kernel code (e.g. the workqueue code).

Thanks,
Lijo

We could do something like this instead:

	if (!con)
		return 0;

	ret = amdgpu_mca_smu_set_debug_mode(adev, enable);
	...

Regards,
Christian.

Reviewed-by: Alex Deucher

Regards,
Christian.

 	if (con) {
 		ret = amdgpu_mca_smu_set_debug_mode(adev, enable);
[PATCH 0/8] drm/plane-helpers: Minor clean ups
Move drm_plane_helper_atomic_check() into udl, which is the only driver using it. Remove several unnecessary include statements for <drm/drm_plane_helper.h>.

Thomas Zimmermann (8):
  drm/plane-helper: Move drm_plane_helper_atomic_check() into udl
  drm/amdgpu: Do not include <drm/drm_plane_helper.h>
  drm/loongson: Do not include <drm/drm_plane_helper.h>
  drm/shmobile: Do not include <drm/drm_plane_helper.h>
  drm/solomon: Do not include <drm/drm_plane_helper.h>
  drm/ofdrm: Do not include <drm/drm_plane_helper.h>
  drm/simpledrm: Do not include <drm/drm_plane_helper.h>
  drm/xlnx: Do not include <drm/drm_plane_helper.h>

 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  1 -
 drivers/gpu/drm/drm_plane_helper.c            | 32 ---
 drivers/gpu/drm/loongson/lsdc_plane.c         |  1 -
 .../drm/renesas/shmobile/shmob_drm_plane.c    |  1 -
 drivers/gpu/drm/solomon/ssd130x.h             |  1 -
 drivers/gpu/drm/tiny/ofdrm.c                  |  1 -
 drivers/gpu/drm/tiny/simpledrm.c              |  1 -
 drivers/gpu/drm/udl/udl_modeset.c             | 19 +-
 drivers/gpu/drm/xlnx/zynqmp_kms.c             |  1 -
 include/drm/drm_plane_helper.h                |  2 --
 10 files changed, 17 insertions(+), 43 deletions(-)

--
2.43.0
[PATCH 2/8] drm/amdgpu: Do not include <drm/drm_plane_helper.h>
Remove unnecessary include statements for <drm/drm_plane_helper.h>. The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes.

Signed-off-by: Thomas Zimmermann
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index aa43f1761acd3..b8c3a9b104a41 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -92,7 +92,6 @@
 #include
 #include
 #include
-#include <drm/drm_plane_helper.h>
 #include
--
2.43.0
[PATCH 6/8] drm/ofdrm: Do not include <drm/drm_plane_helper.h>
Remove unnecessary include statements for <drm/drm_plane_helper.h>. The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes.

Signed-off-by: Thomas Zimmermann
---
 drivers/gpu/drm/tiny/ofdrm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
index 05a72473cfc65..ab89b7fc7bf61 100644
--- a/drivers/gpu/drm/tiny/ofdrm.c
+++ b/drivers/gpu/drm/tiny/ofdrm.c
@@ -19,7 +19,6 @@
 #include
 #include
 #include
-#include <drm/drm_plane_helper.h>
 #include
 #include
--
2.43.0
[PATCH 4/8] drm/shmobile: Do not include <drm/drm_plane_helper.h>
Remove unnecessary include statements for <drm/drm_plane_helper.h>. The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes.

Signed-off-by: Thomas Zimmermann
---
 drivers/gpu/drm/renesas/shmobile/shmob_drm_plane.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/renesas/shmobile/shmob_drm_plane.c b/drivers/gpu/drm/renesas/shmobile/shmob_drm_plane.c
index 8f9a728affde8..07ad17d24294d 100644
--- a/drivers/gpu/drm/renesas/shmobile/shmob_drm_plane.c
+++ b/drivers/gpu/drm/renesas/shmobile/shmob_drm_plane.c
@@ -14,7 +14,6 @@
 #include
 #include
 #include
-#include <drm/drm_plane_helper.h>

 #include "shmob_drm_drv.h"
 #include "shmob_drm_kms.h"
--
2.43.0
[PATCH 3/8] drm/loongson: Do not include <drm/drm_plane_helper.h>
Remove unnecessary include statements for <drm/drm_plane_helper.h>. The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes.

Signed-off-by: Thomas Zimmermann
---
 drivers/gpu/drm/loongson/lsdc_plane.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/loongson/lsdc_plane.c b/drivers/gpu/drm/loongson/lsdc_plane.c
index 0d50946332229..d227a2c1dcf16 100644
--- a/drivers/gpu/drm/loongson/lsdc_plane.c
+++ b/drivers/gpu/drm/loongson/lsdc_plane.c
@@ -9,7 +9,6 @@
 #include
 #include
 #include
-#include <drm/drm_plane_helper.h>

 #include "lsdc_drv.h"
 #include "lsdc_regs.h"
--
2.43.0
[PATCH 1/8] drm/plane-helper: Move drm_plane_helper_atomic_check() into udl
The udl driver is the only caller of drm_plane_helper_atomic_check(). Move the function into the driver. No functional changes.

Signed-off-by: Thomas Zimmermann
---
 drivers/gpu/drm/drm_plane_helper.c | 32 --
 drivers/gpu/drm/udl/udl_modeset.c  | 19 --
 include/drm/drm_plane_helper.h     |  2 --
 3 files changed, 17 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/drm_plane_helper.c b/drivers/gpu/drm/drm_plane_helper.c
index 5e95089676ff8..7982be4b0306d 100644
--- a/drivers/gpu/drm/drm_plane_helper.c
+++ b/drivers/gpu/drm/drm_plane_helper.c
@@ -279,35 +279,3 @@ void drm_plane_helper_destroy(struct drm_plane *plane)
 	kfree(plane);
 }
 EXPORT_SYMBOL(drm_plane_helper_destroy);
-
-/**
- * drm_plane_helper_atomic_check() - Helper to check plane atomic-state
- * @plane: plane to check
- * @state: atomic state object
- *
- * Provides a default plane-state check handler for planes whose atomic-state
- * scale and positioning are not expected to change since the plane is always
- * a fullscreen scanout buffer.
- *
- * This is often the case for the primary plane of simple framebuffers. See
- * also drm_crtc_helper_atomic_check() for the respective CRTC-state check
- * helper function.
- *
- * RETURNS:
- * Zero on success, or an errno code otherwise.
- */
-int drm_plane_helper_atomic_check(struct drm_plane *plane, struct drm_atomic_state *state)
-{
-	struct drm_plane_state *new_plane_state = drm_atomic_get_new_plane_state(state, plane);
-	struct drm_crtc *new_crtc = new_plane_state->crtc;
-	struct drm_crtc_state *new_crtc_state = NULL;
-
-	if (new_crtc)
-		new_crtc_state = drm_atomic_get_new_crtc_state(state, new_crtc);
-
-	return drm_atomic_helper_check_plane_state(new_plane_state, new_crtc_state,
-						   DRM_PLANE_NO_SCALING,
-						   DRM_PLANE_NO_SCALING,
-						   false, false);
-}
-EXPORT_SYMBOL(drm_plane_helper_atomic_check);
diff --git a/drivers/gpu/drm/udl/udl_modeset.c b/drivers/gpu/drm/udl/udl_modeset.c
index 40876bcdd79a4..7702359c90c22 100644
--- a/drivers/gpu/drm/udl/udl_modeset.c
+++ b/drivers/gpu/drm/udl/udl_modeset.c
@@ -21,7 +21,6 @@
 #include
 #include
 #include
-#include <drm/drm_plane_helper.h>
 #include
 #include
@@ -261,6 +260,22 @@ static const uint64_t udl_primary_plane_fmtmods[] = {
 	DRM_FORMAT_MOD_INVALID
 };

+static int udl_primary_plane_helper_atomic_check(struct drm_plane *plane,
+						 struct drm_atomic_state *state)
+{
+	struct drm_plane_state *new_plane_state = drm_atomic_get_new_plane_state(state, plane);
+	struct drm_crtc *new_crtc = new_plane_state->crtc;
+	struct drm_crtc_state *new_crtc_state = NULL;
+
+	if (new_crtc)
+		new_crtc_state = drm_atomic_get_new_crtc_state(state, new_crtc);
+
+	return drm_atomic_helper_check_plane_state(new_plane_state, new_crtc_state,
+						   DRM_PLANE_NO_SCALING,
+						   DRM_PLANE_NO_SCALING,
+						   false, false);
+}
+
 static void udl_primary_plane_helper_atomic_update(struct drm_plane *plane,
 						   struct drm_atomic_state *state)
 {
@@ -296,7 +311,7 @@ static void udl_primary_plane_helper_atomic_update(struct drm_plane *plane,

 static const struct drm_plane_helper_funcs udl_primary_plane_helper_funcs = {
 	DRM_GEM_SHADOW_PLANE_HELPER_FUNCS,
-	.atomic_check = drm_plane_helper_atomic_check,
+	.atomic_check = udl_primary_plane_helper_atomic_check,
 	.atomic_update = udl_primary_plane_helper_atomic_update,
 };

diff --git a/include/drm/drm_plane_helper.h b/include/drm/drm_plane_helper.h
index 3a574e8cd22f4..75f9c4830564a 100644
--- a/include/drm/drm_plane_helper.h
+++ b/include/drm/drm_plane_helper.h
@@ -26,7 +26,6 @@
 #include

-struct drm_atomic_state;
 struct drm_crtc;
 struct drm_framebuffer;
 struct drm_modeset_acquire_ctx;
@@ -42,7 +41,6 @@ int drm_plane_helper_update_primary(struct drm_plane *plane, struct drm_crtc *cr
 int drm_plane_helper_disable_primary(struct drm_plane *plane,
 				     struct drm_modeset_acquire_ctx *ctx);
 void drm_plane_helper_destroy(struct drm_plane *plane);
-int drm_plane_helper_atomic_check(struct drm_plane *plane, struct drm_atomic_state *state);

 /**
  * DRM_PLANE_NON_ATOMIC_FUNCS - Default plane functions for non-atomic drivers
--
2.43.0
[PATCH 5/8] drm/solomon: Do not include <drm/drm_plane_helper.h>
Remove unnecessary include statements for <drm/drm_plane_helper.h>. The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes.

Signed-off-by: Thomas Zimmermann
---
 drivers/gpu/drm/solomon/ssd130x.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/solomon/ssd130x.h b/drivers/gpu/drm/solomon/ssd130x.h
index acf7cedf0c1ab..075c5c3ee75ac 100644
--- a/drivers/gpu/drm/solomon/ssd130x.h
+++ b/drivers/gpu/drm/solomon/ssd130x.h
@@ -17,7 +17,6 @@
 #include
 #include
 #include
-#include <drm/drm_plane_helper.h>
 #include
--
2.43.0
[PATCH 8/8] drm/xlnx: Do not include <drm/drm_plane_helper.h>
Remove unnecessary include statements for <drm/drm_plane_helper.h>. The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes.

Signed-off-by: Thomas Zimmermann
---
 drivers/gpu/drm/xlnx/zynqmp_kms.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/xlnx/zynqmp_kms.c b/drivers/gpu/drm/xlnx/zynqmp_kms.c
index a7f8611be6f42..db3bb4afbfc46 100644
--- a/drivers/gpu/drm/xlnx/zynqmp_kms.c
+++ b/drivers/gpu/drm/xlnx/zynqmp_kms.c
@@ -27,7 +27,6 @@
 #include
 #include
 #include
-#include <drm/drm_plane_helper.h>
 #include
 #include
 #include
--
2.43.0
[PATCH 7/8] drm/simpledrm: Do not include <drm/drm_plane_helper.h>
Remove unnecessary include statements for <drm/drm_plane_helper.h>. The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes.

Signed-off-by: Thomas Zimmermann
---
 drivers/gpu/drm/tiny/simpledrm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/tiny/simpledrm.c b/drivers/gpu/drm/tiny/simpledrm.c
index 34bbbd7b53dd9..7ce1c46176750 100644
--- a/drivers/gpu/drm/tiny/simpledrm.c
+++ b/drivers/gpu/drm/tiny/simpledrm.c
@@ -25,7 +25,6 @@
 #include
 #include
 #include
-#include <drm/drm_plane_helper.h>
 #include

 #define DRIVER_NAME "simpledrm"
--
2.43.0
Re: [PATCH 8/8] drm/xlnx: Do not include <drm/drm_plane_helper.h>
Hi Thomas,

Thank you for the patch.

On Tue, Nov 28, 2023 at 11:45:24AM +0100, Thomas Zimmermann wrote:
> Remove unnecessary include statements for <drm/drm_plane_helper.h>.
> The file contains helpers for non-atomic code and should not be
> required by most drivers. No functional changes.
>
> Signed-off-by: Thomas Zimmermann

Assuming you've compile-tested the driver,

Reviewed-by: Laurent Pinchart

Please get this merged through drm-misc as part of the series if possible.

> ---
>  drivers/gpu/drm/xlnx/zynqmp_kms.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xlnx/zynqmp_kms.c b/drivers/gpu/drm/xlnx/zynqmp_kms.c
> index a7f8611be6f42..db3bb4afbfc46 100644
> --- a/drivers/gpu/drm/xlnx/zynqmp_kms.c
> +++ b/drivers/gpu/drm/xlnx/zynqmp_kms.c
> @@ -27,7 +27,6 @@
>  #include
>  #include
>  #include
> -#include <drm/drm_plane_helper.h>
>  #include
>  #include
>  #include

--
Regards,
Laurent Pinchart
[PATCH] drm/amdgpu: Use another offset for GC 9.4.3 remap
The legacy region at 0x7F000 maps to valid registers in GC 9.4.3 SOCs. Use 0x1A000 offset instead as MMIO register remap region.

Signed-off-by: Lijo Lazar
---
 drivers/gpu/drm/amd/amdgpu/soc15.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index e3d41e8aac9d..9ad4d6d3122b 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -1162,6 +1162,11 @@ static int soc15_common_early_init(void *handle)
 			AMD_PG_SUPPORT_VCN_DPG |
 			AMD_PG_SUPPORT_JPEG;
 		adev->external_rev_id = adev->rev_id + 0x46;
+		/* GC 9.4.3 uses MMIO register region hole at a different offset */
+		if (!amdgpu_sriov_vf(adev)) {
+			adev->rmmio_remap.reg_offset = 0x1A000;
+			adev->rmmio_remap.bus_addr = adev->rmmio_base + 0x1A000;
+		}
 		break;
 	default:
 		/* FIXME: not supported yet */
--
2.25.1
Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices
Am 28.11.23 um 13:50 schrieb Weixi Zhu:

The problem:

Accelerator driver developers are forced to reinvent external MM subsystems case by case, because Linux core MM only considers host memory resources. These reinvented MM subsystems have similar orders of magnitude of LoC as Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has 30K. Meanwhile, more and more vendors are implementing their own accelerators, e.g. Microsoft's Maia 100. At the same time, application-level developers suffer from poor programmability -- they must consider parallel address spaces and be careful about the limited device DRAM capacity. This can be alleviated if a malloc()-ed virtual address can be shared by the accelerator, or the abundant host DRAM can further transparently back up the device local memory.

These external MM systems share similar mechanisms except for the hardware-dependent part, so reinventing them is effectively introducing redundant code (14K~70K for each case). Such developing/maintaining is not cheap. Furthermore, to share a malloc()-ed virtual address, device drivers need to deeply interact with Linux MM via low-level MM APIs, e.g. MMU notifiers/HMM. This raises the bar for driver development, since developers must understand how Linux MM works. Further, it creates code maintenance problems -- any changes to Linux MM potentially require coordinated changes to accelerator drivers using low-level MM APIs.

Putting a cache-coherent bus between host and device will not make these external MM subsystems disappear. For example, a throughput-oriented accelerator will not tolerate executing heavy memory access workloads with a host MMU/IOMMU via a remote bus. Therefore, devices will still have their own MMU and pick a simpler page table format for lower address translation overhead, requiring external MM subsystems.

What GMEM (Generalized Memory Management [1]) does:

GMEM extends Linux MM to share its machine-independent MM code. Only a high-level interface is provided for device drivers. This prevents accelerator drivers from reinventing the wheel, but relies on drivers to implement their hardware-dependent functions declared by GMEM. GMEM's key interfaces include gm_dev_create(), gm_as_create(), gm_as_attach() and gm_dev_register_physmem(). Briefly, a device driver utilizes them as follows:

1. At boot time, call gm_dev_create() and register the implementation of hardware-dependent functions as declared in struct gm_mmu.
   - If the device has local DRAM, call gm_dev_register_physmem() to register available physical addresses.
2. When a device context is initialized (e.g. triggered by ioctl), check if the current CPU process has been attached to a gmem address space (struct gm_as). If not, call gm_as_create() and point current->mm->gm_as to it.
3. Call gm_as_attach() to attach the device context to a gmem address space.
4. Invoke gm_dev_fault() to resolve a page fault or prepare data before device computation happens.

GMEM has changed the following assumptions in Linux MM:

1. An mm_struct not only handles a single CPU context, but may also handle external memory contexts encapsulated as gm_context listed in mm->gm_as. An external memory context can include a few or all of the following parts: an external MMU (that requires TLB invalidation), an external page table (that requires PTE manipulation) and external DRAM (that requires physical memory management).

Well that is pretty much exactly what AMD has already proposed with KFD and was rejected for rather good reasons.

2. Faulting a MAP_PRIVATE VMA with no CPU PTE found does not necessarily mean that a zero-filled physical page should be mapped. The virtual page may have been mapped to an external memory device.

3. Unmapping a page may include sending device TLB invalidations (even if its MMU shares the CPU page table) and manipulating device PTEs.

Semantics of new syscalls:

1. mmap(..., MAP_PRIVATE | MAP_PEER_SHARED)
   Allocate a virtual address range that is shared between the CPU and all attached devices. Data is guaranteed to be coherent whenever the address is accessed by either the CPU or any attached device. If the device does not support page faults, then the device driver is responsible for faulting memory before data gets accessed. By default, the CPU DRAM can be used as a swap backup for the device local memory.

2. hmadvise(NUMA_id, va_start, size, memory_hint)
   Issue memory hints for a given VMA. This extends the traditional madvise() syscall with an extra argument so that programmers have better control with heterogeneous devices registered as NUMA nodes. One useful memory hint could be MADV_PREFETCH, which guarantees that the physical data of the given VMA [VA, VA+size) is migrated to NUMA node #id. Another useful memory hint is MADV_DONTNEED. This is
Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices
Adding a few missing important people to the explicit To list.

Am 28.11.23 um 13:50 schrieb Weixi Zhu:

The problem:

Accelerator driver developers are forced to reinvent external MM subsystems case by case, because Linux core MM only considers host memory resources. These reinvented MM subsystems have similar orders of magnitude of LoC as Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has 30K. Meanwhile, more and more vendors are implementing their own accelerators, e.g. Microsoft's Maia 100. At the same time, application-level developers suffer from poor programmability -- they must consider parallel address spaces and be careful about the limited device DRAM capacity. This can be alleviated if a malloc()-ed virtual address can be shared by the accelerator, or the abundant host DRAM can further transparently back up the device local memory.

These external MM systems share similar mechanisms except for the hardware-dependent part, so reinventing them is effectively introducing redundant code (14K~70K for each case). Such developing/maintaining is not cheap. Furthermore, to share a malloc()-ed virtual address, device drivers need to deeply interact with Linux MM via low-level MM APIs, e.g. MMU notifiers/HMM. This raises the bar for driver development, since developers must understand how Linux MM works. Further, it creates code maintenance problems -- any changes to Linux MM potentially require coordinated changes to accelerator drivers using low-level MM APIs.

Putting a cache-coherent bus between host and device will not make these external MM subsystems disappear. For example, a throughput-oriented accelerator will not tolerate executing heavy memory access workloads with a host MMU/IOMMU via a remote bus. Therefore, devices will still have their own MMU and pick a simpler page table format for lower address translation overhead, requiring external MM subsystems.

What GMEM (Generalized Memory Management [1]) does:

GMEM extends Linux MM to share its machine-independent MM code. Only a high-level interface is provided for device drivers. This prevents accelerator drivers from reinventing the wheel, but relies on drivers to implement their hardware-dependent functions declared by GMEM. GMEM's key interfaces include gm_dev_create(), gm_as_create(), gm_as_attach() and gm_dev_register_physmem(). Briefly, a device driver utilizes them as follows:

1. At boot time, call gm_dev_create() and register the implementation of hardware-dependent functions as declared in struct gm_mmu.
   - If the device has local DRAM, call gm_dev_register_physmem() to register available physical addresses.
2. When a device context is initialized (e.g. triggered by ioctl), check if the current CPU process has been attached to a gmem address space (struct gm_as). If not, call gm_as_create() and point current->mm->gm_as to it.
3. Call gm_as_attach() to attach the device context to a gmem address space.
4. Invoke gm_dev_fault() to resolve a page fault or prepare data before device computation happens.

GMEM has changed the following assumptions in Linux MM:

1. An mm_struct not only handles a single CPU context, but may also handle external memory contexts encapsulated as gm_context listed in mm->gm_as. An external memory context can include a few or all of the following parts: an external MMU (that requires TLB invalidation), an external page table (that requires PTE manipulation) and external DRAM (that requires physical memory management).

2. Faulting a MAP_PRIVATE VMA with no CPU PTE found does not necessarily mean that a zero-filled physical page should be mapped. The virtual page may have been mapped to an external memory device.

3. Unmapping a page may include sending device TLB invalidations (even if its MMU shares the CPU page table) and manipulating device PTEs.

Semantics of new syscalls:

1. mmap(..., MAP_PRIVATE | MAP_PEER_SHARED)
   Allocate a virtual address range that is shared between the CPU and all attached devices. Data is guaranteed to be coherent whenever the address is accessed by either the CPU or any attached device. If the device does not support page faults, then the device driver is responsible for faulting memory before data gets accessed. By default, the CPU DRAM can be used as a swap backup for the device local memory.

2. hmadvise(NUMA_id, va_start, size, memory_hint)
   Issue memory hints for a given VMA. This extends the traditional madvise() syscall with an extra argument so that programmers have better control with heterogeneous devices registered as NUMA nodes. One useful memory hint could be MADV_PREFETCH, which guarantees that the physical data of the given VMA [VA, VA+size) is migrated to NUMA node #id. Another useful memory hint is MADV_DONTNEED. This is helpful to increase device memory utilization. It
Re: [PATCH] drm/amdgpu: Fix uninitialized return value
Am 28.11.23 um 10:49 schrieb Lazar, Lijo:

On 11/28/2023 3:07 PM, Christian König wrote:

Am 27.11.23 um 22:55 schrieb Alex Deucher:

On Mon, Nov 27, 2023 at 2:22 PM Christian König wrote:

Am 27.11.23 um 19:29 schrieb Lijo Lazar:

The return value is uninitialized if ras context is NULL.

Fixes: 0f4c8faa043c ("drm/amdgpu: Move mca debug mode decision to ras")
Signed-off-by: Lijo Lazar
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 1a8668a63e67..f6b47ebce9d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -3410,7 +3410,7 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device *adev)
 int amdgpu_ras_set_mca_debug_mode(struct amdgpu_device *adev, bool enable)
 {
 	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
-	int ret;
+	int ret = 0;

That's usually considered very bad coding style and complained about by automated checkers. Instead explicitly set the return value in the code paths not actually setting it.

In this case, the function is so short, I think it makes things less readable to do that.

Yeah, indeed but that doesn't help us with the coding style checkers.

Are these checkers for real? I see many instances of variable initialization, including in core kernel code (e.g. the workqueue code).

Yes, I've got multiple complaints about that already. What people basically seem to do is to search for patterns like "int ret = 0; ... ret = whatever(); ... return ret;" with cocci. This then results in a note that an initialization isn't necessary and should be avoided.

Same for things like return after else, e.g. when you have something like "if (...) { ret = whatever(); if (ret) return ret; } else { ... ret = 0; } return ret;".

Regards,
Christian.

Thanks,
Lijo

We could do something like this instead:

	if (!con)
		return 0;

	ret = amdgpu_mca_smu_set_debug_mode(adev, enable);
	...

Regards,
Christian.

Reviewed-by: Alex Deucher

Regards,
Christian.

 	if (con) {
 		ret = amdgpu_mca_smu_set_debug_mode(adev, enable);
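To make the style point in this thread concrete, here is a standalone sketch of the early-return variant Christian suggests, with stub types standing in for the amdgpu structures (struct ras_ctx and fake_smu_set_debug_mode() are ours, not real driver code). The behavior matches the `int ret = 0;` version, but no dead initializer remains for checkers to flag, because every path assigns the return value explicitly:

```c
#include <stdbool.h>
#include <stddef.h>

/* Stub standing in for struct amdgpu_ras; not real driver code. */
struct ras_ctx { int dummy; };

/* Stub standing in for amdgpu_mca_smu_set_debug_mode(). */
static int fake_smu_set_debug_mode(bool enable)
{
	(void)enable;
	return 0;	/* pretend the firmware call succeeded */
}

/* Early-return style: with no RAS context, report success immediately;
 * otherwise every remaining path assigns ret before returning it. */
int set_mca_debug_mode(struct ras_ctx *con, bool enable)
{
	int ret;

	if (!con)
		return 0;

	ret = fake_smu_set_debug_mode(enable);
	/* ... further work that may overwrite ret ... */
	return ret;
}
```

Compared with initializing `ret = 0` and wrapping the call in `if (con) { ... }`, the result is identical for the caller; the difference is purely that static checkers no longer see a potentially unnecessary initialization.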
Re: [PATCH] drm/amdgpu: add shared fdinfo stats
Am 17.11.23 um 20:56 schrieb Alex Deucher:

Add shared stats. Useful for seeing shared memory.

Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c |  4
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  6 ++
 3 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
index 5706b282a0c7..c7df7fa3459f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
@@ -97,6 +97,10 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct drm_file *file)
 		   stats.requested_visible_vram/1024UL);
 	drm_printf(p, "amd-requested-gtt:\t%llu KiB\n",
 		   stats.requested_gtt/1024UL);
+	drm_printf(p, "drm-shared-vram:\t%llu KiB\n", stats.vram_shared/1024UL);
+	drm_printf(p, "drm-shared-gtt:\t%llu KiB\n", stats.gtt_shared/1024UL);
+	drm_printf(p, "drm-shared-cpu:\t%llu KiB\n", stats.cpu_shared/1024UL);
+
 	for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) {
 		if (!usage[hw_ip])
 			continue;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index d79b4ca1ecfc..c24f7b2c04c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1287,25 +1287,36 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo, struct amdgpu_mem_stats *stats)
 {
 	uint64_t size = amdgpu_bo_size(bo);
+	struct drm_gem_object *obj;
 	unsigned int domain;
+	bool shared;

 	/* Abort if the BO doesn't currently have a backing store */
 	if (!bo->tbo.resource)
 		return;

+	obj = &bo->tbo.base;
+	shared = obj->handle_count > 1;

Interesting approach but I don't think that this is correct.

The handle_count is basically how many GEM handles there are for the BO, so for example it doesn't catch sharing things with V4L.

What we should probably rather do is to take a look at whether bo->tbo.base.dma_buf is NULL or not.

Regards,
Christian.

+
 	domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
 	switch (domain) {
 	case AMDGPU_GEM_DOMAIN_VRAM:
 		stats->vram += size;
 		if (amdgpu_bo_in_cpu_visible_vram(bo))
 			stats->visible_vram += size;
+		if (shared)
+			stats->vram_shared += size;
 		break;
 	case AMDGPU_GEM_DOMAIN_GTT:
 		stats->gtt += size;
+		if (shared)
+			stats->gtt_shared += size;
 		break;
 	case AMDGPU_GEM_DOMAIN_CPU:
 	default:
 		stats->cpu += size;
+		if (shared)
+			stats->cpu_shared += size;
 		break;
 	}

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index d28e21baef16..0503af75dc26 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -138,12 +138,18 @@ struct amdgpu_bo_vm {
 struct amdgpu_mem_stats {
 	/* current VRAM usage, includes visible VRAM */
 	uint64_t vram;
+	/* current shared VRAM usage, includes visible VRAM */
+	uint64_t vram_shared;
 	/* current visible VRAM usage */
 	uint64_t visible_vram;
 	/* current GTT usage */
 	uint64_t gtt;
+	/* current shared GTT usage */
+	uint64_t gtt_shared;
 	/* current system memory usage */
 	uint64_t cpu;
+	/* current shared system memory usage */
+	uint64_t cpu_shared;
 	/* sum of evicted buffers, includes visible VRAM */
 	uint64_t evicted_vram;
 	/* sum of evicted buffers due to CPU access */
Re: [PATCH 8/8] drm/xlnx: Do not include <drm/drm_plane_helper.h>
Hi

Am 28.11.23 um 12:02 schrieb Laurent Pinchart:

Hi Thomas,

Thank you for the patch.

On Tue, Nov 28, 2023 at 11:45:24AM +0100, Thomas Zimmermann wrote:

Remove unnecessary include statements for <drm/drm_plane_helper.h>. The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes.

Signed-off-by: Thomas Zimmermann

Assuming you've compile-tested the driver,

I've COMPILE_TEST'ed the driver on aarch64.

Reviewed-by: Laurent Pinchart

Please get this merged through drm-misc as part of the series if possible.

Sure.

Best regards
Thomas

---
 drivers/gpu/drm/xlnx/zynqmp_kms.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/xlnx/zynqmp_kms.c b/drivers/gpu/drm/xlnx/zynqmp_kms.c
index a7f8611be6f42..db3bb4afbfc46 100644
--- a/drivers/gpu/drm/xlnx/zynqmp_kms.c
+++ b/drivers/gpu/drm/xlnx/zynqmp_kms.c
@@ -27,7 +27,6 @@
 #include
 #include
 #include
-#include <drm/drm_plane_helper.h>
 #include
 #include
 #include

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
Re: [PATCH] drm/amdgpu: add shared fdinfo stats
On Tue, Nov 28, 2023 at 9:17 AM Christian König wrote: > > Am 17.11.23 um 20:56 schrieb Alex Deucher: > > Add shared stats. Useful for seeing shared memory. > > > > Signed-off-by: Alex Deucher > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 4 > > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 +++ > > drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 6 ++ > > 3 files changed, 21 insertions(+) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > > index 5706b282a0c7..c7df7fa3459f 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > > @@ -97,6 +97,10 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct > > drm_file *file) > > stats.requested_visible_vram/1024UL); > > drm_printf(p, "amd-requested-gtt:\t%llu KiB\n", > > stats.requested_gtt/1024UL); > > + drm_printf(p, "drm-shared-vram:\t%llu KiB\n", > > stats.vram_shared/1024UL); > > + drm_printf(p, "drm-shared-gtt:\t%llu KiB\n", stats.gtt_shared/1024UL); > > + drm_printf(p, "drm-shared-cpu:\t%llu KiB\n", stats.cpu_shared/1024UL); > > + > > for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) { > > if (!usage[hw_ip]) > > continue; > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > > index d79b4ca1ecfc..c24f7b2c04c1 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > > @@ -1287,25 +1287,36 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo, > > struct amdgpu_mem_stats *stats) > > { > > uint64_t size = amdgpu_bo_size(bo); > > + struct drm_gem_object *obj; > > unsigned int domain; > > + bool shared; > > > > /* Abort if the BO doesn't currently have a backing store */ > > if (!bo->tbo.resource) > > return; > > > > + obj = &bo->tbo.base; > > + shared = obj->handle_count > 1; > > Interesting approach but I don't think that this is correct. 
> > The handle_count is basically how many GEM handles are there for BO, so > for example it doesn't catch sharing things with V4L. > > What we should probably rather do is to take a look if > bo->tbo.base.dma_buf is NULL or not. +Rob, dri-devel This is what the generic drm helper code does. See drm_show_memory_stats(). If that is not correct that code should probably be fixed too. Alex > > Regards, > Christian. > > > > + > > domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type); > > switch (domain) { > > case AMDGPU_GEM_DOMAIN_VRAM: > > stats->vram += size; > > if (amdgpu_bo_in_cpu_visible_vram(bo)) > > stats->visible_vram += size; > > + if (shared) > > + stats->vram_shared += size; > > break; > > case AMDGPU_GEM_DOMAIN_GTT: > > stats->gtt += size; > > + if (shared) > > + stats->gtt_shared += size; > > break; > > case AMDGPU_GEM_DOMAIN_CPU: > > default: > > stats->cpu += size; > > + if (shared) > > + stats->cpu_shared += size; > > break; > > } > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > > index d28e21baef16..0503af75dc26 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > > @@ -138,12 +138,18 @@ struct amdgpu_bo_vm { > > struct amdgpu_mem_stats { > > /* current VRAM usage, includes visible VRAM */ > > uint64_t vram; > > + /* current shared VRAM usage, includes visible VRAM */ > > + uint64_t vram_shared; > > /* current visible VRAM usage */ > > uint64_t visible_vram; > > /* current GTT usage */ > > uint64_t gtt; > > + /* current shared GTT usage */ > > + uint64_t gtt_shared; > > /* current system memory usage */ > > uint64_t cpu; > > + /* current shared system memory usage */ > > + uint64_t cpu_shared; > > /* sum of evicted buffers, includes visible VRAM */ > > uint64_t evicted_vram; > > /* sum of evicted buffers due to CPU access */ >
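The trade-off between the two checks discussed above can be modelled in a few lines of userspace C. This is an illustration only: `toy_gem_object` and both helper functions are hypothetical stand-ins for the relevant `drm_gem_object` fields, not the actual DRM/TTM layout or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the two drm_gem_object fields the review debates. */
struct toy_gem_object {
	unsigned int handle_count; /* GEM handles referring to this BO */
	void *dma_buf;             /* non-NULL once exported as a dma-buf */
};

/* The patch's heuristic: shared if more than one GEM handle exists.
 * As noted in the review, this misses sharing that leaves the GEM
 * world entirely (e.g. a buffer handed to V4L). */
static bool shared_by_handle_count(const struct toy_gem_object *obj)
{
	return obj->handle_count > 1;
}

/* The suggested alternative: shared once a dma-buf export exists,
 * which also covers cross-subsystem sharing. */
static bool shared_by_dma_buf(const struct toy_gem_object *obj)
{
	return obj->dma_buf != NULL;
}
```

A buffer exported to another subsystem has one GEM handle but a live dma-buf, so the two heuristics genuinely disagree on it.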
Re: [7/8] drm/simpledrm: Do not include
Hi, On 2023/11/28 18:45, Thomas Zimmermann wrote: Remove unnecessary include statements for . The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes. Signed-off-by: Thomas Zimmermann Reviewed-by: Sui Jingfeng --- drivers/gpu/drm/tiny/simpledrm.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/tiny/simpledrm.c b/drivers/gpu/drm/tiny/simpledrm.c index 34bbbd7b53dd9..7ce1c46176750 100644 --- a/drivers/gpu/drm/tiny/simpledrm.c +++ b/drivers/gpu/drm/tiny/simpledrm.c @@ -25,7 +25,6 @@ #include #include #include -#include #include #define DRIVER_NAME "simpledrm"
[RFC PATCH 5/6] mm/gmem: resolve VMA conflicts for attached peer devices
This patch resolves potential VMA conflicts when mmap(MAP_PRIVATE | MAP_PEER_SHARED) is invoked. Note that the semantics of mmap(MAP_PRIVATE | MAP_PEER_SHARED) are to provide a coherent view of memory through the allocated virtual addresses between the CPU and all attached devices. However, an attached device may create its own computing context that does not necessarily share the same address space layout with the CPU process. Therefore, the mmap() syscall must return virtual addresses that are guaranteed to be valid across all attached peer devices. In the current implementation, if a candidate VMA is detected to be conflicting, it is temporarily blacklisted. The mmap_region() function will retry other VMA candidates for a predefined number of iterations. Signed-off-by: Weixi Zhu --- fs/proc/task_mmu.c | 3 ++ include/linux/gmem.h | 26 +++- include/linux/mm.h | 8 + include/uapi/asm-generic/mman-common.h | 1 + kernel/fork.c | 4 +++ mm/gmem.c | 38 mm/mempolicy.c | 4 +++ mm/mmap.c | 38 ++-- mm/vm_object.c | 41 ++ 9 files changed, 159 insertions(+), 4 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index ef2eb12906da..5af03d8f0319 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -701,6 +701,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ #ifdef CONFIG_X86_USER_SHADOW_STACK [ilog2(VM_SHADOW_STACK)] = "ss", +#endif +#ifdef CONFIG_GMEM + [ilog2(VM_PEER_SHARED)] = "ps", #endif }; size_t i; diff --git a/include/linux/gmem.h b/include/linux/gmem.h index 97186f29638d..82d88df5ce44 100644 --- a/include/linux/gmem.h +++ b/include/linux/gmem.h @@ -24,7 +24,10 @@ static inline bool gmem_is_enabled(void) static inline bool vma_is_peer_shared(struct vm_area_struct *vma) { - return false; + if (!gmem_is_enabled()) + return false; + + return !!(vma->vm_flags & VM_PEER_SHARED); } struct gm_dev { @@ -130,6 +133,8 @@ void unmap_gm_mappings_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end); void munmap_in_peer_devices(struct mm_struct *mm, unsigned long start, unsigned long end); +void gm_reserve_vma(struct vm_area_struct *value, struct list_head *head); +void gm_release_vma(struct mm_struct *mm, struct list_head *head); /* core gmem */ enum gm_ret { @@ -283,6 +288,10 @@ int gm_as_create(unsigned long begin, unsigned long end, struct gm_as **new_as); int gm_as_destroy(struct gm_as *as); int gm_as_attach(struct gm_as *as, struct gm_dev *dev, enum gm_mmu_mode mode, bool activate, struct gm_context **out_ctx); + +int gm_alloc_va_in_peer_devices(struct mm_struct *mm, + struct vm_area_struct *vma, unsigned long addr, + unsigned long len, vm_flags_t vm_flags); #else static inline bool gmem_is_enabled(void) { return false; } static inline bool vma_is_peer_shared(struct vm_area_struct *vma) @@ -339,6 +348,21 @@ int gm_as_attach(struct gm_as *as, struct gm_dev *dev, enum gm_mmu_mode mode, { return 0; } +static inline void gm_reserve_vma(struct vm_area_struct *value, + struct list_head *head) +{ +} +static inline void gm_release_vma(struct mm_struct *mm, struct list_head *head) +{ +} +static inline int gm_alloc_va_in_peer_devices(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long addr, + unsigned long len, + vm_flags_t vm_flags) +{ + return 0; +} #endif #endif /* _GMEM_H */ diff --git a/include/linux/mm.h b/include/linux/mm.h index 418d26608ece..8837624e4c66 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -320,14 +320,22 @@ extern unsigned int kobjsize(const void *objp); #define VM_HIGH_ARCH_BIT_3 35 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_4 36 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_5 37 /* bit only usable on 64-bit architectures */ +#define VM_HIGH_ARCH_BIT_6 38 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0) #define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1) #define VM_HIGH_ARCH_2 
BIT(VM_HIGH_ARCH_BIT_2) #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3) #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4) #define VM_HIGH_ARCH_5 BIT(VM_HIGH_ARCH_BIT_5) +#define VM_HIGH_ARCH_6 BIT(VM_HIGH_A
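The retry policy described in the commit message above (try a candidate VA, blacklist it on conflict, give up after a fixed number of attempts) can be sketched as a userspace model. Everything below except the GMEM_MMAP_RETRY_TIMES constant is hypothetical: the propose/accept callbacks stand in for mmap_region()'s candidate search and the per-device address-space validity checks.

```c
#include <assert.h>
#include <stdbool.h>

#define GMEM_MMAP_RETRY_TIMES 10 /* retry budget, as in the patch */

/* Hypothetical callbacks: the host proposes a candidate VA, and the
 * attached peer devices collectively accept or veto it. */
typedef unsigned long (*propose_fn)(int attempt);
typedef bool (*devices_accept_fn)(unsigned long addr);

/* Sketch of the negotiation loop: each rejected candidate is
 * implicitly "blacklisted" by moving on; 0 means exhaustion. */
static unsigned long gmem_pick_va(propose_fn propose, devices_accept_fn ok)
{
	for (int attempt = 0; attempt < GMEM_MMAP_RETRY_TIMES; attempt++) {
		unsigned long addr = propose(attempt);

		if (ok(addr))
			return addr; /* valid on CPU and all peers */
	}
	return 0; /* no VA acceptable to every attached device */
}

/* Toy stubs for a self-contained demo. */
static unsigned long demo_propose(int attempt)
{
	return 0x1000UL * (attempt + 1);
}
static bool demo_accept(unsigned long addr) { return addr >= 0x3000UL; }
static bool demo_reject_all(unsigned long addr) { (void)addr; return false; }
```

The real code must also release the reserved VMA candidates it accumulated (gm_release_vma() in the diff above); the model omits that bookkeeping.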
Re: [PATCH 5/8] drm/solomon: Do not include
On Tue, Nov 28, 2023 at 11:47 AM Thomas Zimmermann wrote: > Remove unnecessary include statements for . > The file contains helpers for non-atomic code and should not be > required by most drivers. No functional changes. > > Signed-off-by: Thomas Zimmermann Reviewed-by: Geert Uytterhoeven Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
[RFC PATCH 3/6] mm/gmem: add GMEM (Generalized Memory Management) interface for external accelerators
Accelerator driver developers are forced to reinvent external MM subsystems case by case, introducing redundant code (14K~70K for each case). This is because Linux core MM only considers host memory resources. At the same time, application-level developers suffer from poor programmability -- they must consider parallel address spaces and be careful about the limited device DRAM capacity. This patch adds the GMEM interface to help accelerator drivers directly reuse Linux core MM, preventing them from reinventing the wheel. Drivers that utilize the GMEM interface can directly support unified virtual address spaces for application users -- memory allocated with malloc()/mmap() can be directly used by both the CPU and accelerators, providing a coherent view of memory. The GMEM device interface prefixed with "gm_dev" is used to decouple accelerator-specific operations. Device drivers should invoke gm_dev_create() to register a device instance at device boot time. A device-specific implementation of "struct gm_mmu" must be provided, so Linux can invoke hardware-related functions at the right time. If the driver wants Linux to take charge of the local DRAM of the accelerator, then it should register a range of physical addresses to be managed by gm_dev_register_physmem(). The GMEM address space interface prefixed with "gm_as" is used to connect a device context with a CPU context, i.e. an mm_struct. Struct gm_as is created as a unified address space that not only includes a CPU context, but may also include one or more device contexts. Device drivers should utilize gm_as_attach() to add a device context to a created struct gm_as. gm_dev_fault() can then serve as a generic device page fault handler. It is important that a device driver invokes gm_as_attach() at the beginning of a CPU program. This invocation can happen inside an ioctl() call when a device context is initialized.
Signed-off-by: Weixi Zhu --- include/linux/gmem.h | 196 +++ include/linux/mm_types.h | 1 + mm/gmem.c| 408 +++ 3 files changed, 605 insertions(+) diff --git a/include/linux/gmem.h b/include/linux/gmem.h index 529ff6755a99..f424225daa03 100644 --- a/include/linux/gmem.h +++ b/include/linux/gmem.h @@ -13,6 +13,35 @@ #ifdef CONFIG_GMEM +#define GMEM_MMAP_RETRY_TIMES 10 /* gmem retry times before OOM */ + +DECLARE_STATIC_KEY_FALSE(gmem_status); + +static inline bool gmem_is_enabled(void) +{ + return static_branch_likely(&gmem_status); +} + +struct gm_dev { + int id; + + /* +* TODO: define more device capabilities and consider different device +* base page sizes +*/ + unsigned long capability; + struct gm_mmu *mmu; + void *dev_data; + /* A device may support time-sliced context switch. */ + struct gm_context *current_ctx; + + struct list_head gm_ctx_list; + + /* Add tracking of registered device local physical memory. */ + nodemask_t registered_hnodes; + struct device *dma_dev; +}; + #define GM_PAGE_CPU0x10 /* Determines whether page is a pointer or a pfn number. */ #define GM_PAGE_DEVICE 0x20 #define GM_PAGE_NOMAP 0x40 @@ -96,7 +125,161 @@ void unmap_gm_mappings_range(struct vm_area_struct *vma, unsigned long start, unsigned long end); void munmap_in_peer_devices(struct mm_struct *mm, unsigned long start, unsigned long end); + +/* core gmem */ +enum gm_ret { + GM_RET_SUCCESS = 0, + GM_RET_NOMEM, + GM_RET_PAGE_EXIST, + GM_RET_MIGRATING, + GM_RET_FAILURE_UNKNOWN, +}; + +/** + * enum gm_mmu_mode - defines the method to share a physical page table. + * + * @GM_MMU_MODE_SHARE: Share a physical page table with another attached + * device's MMU, requiring one of the attached MMUs to be compatible. For + * example, the IOMMU is compatible with the CPU MMU on most modern machines. + * This mode requires the device physical memory to be cache-coherent. + * TODO: add MMU cookie to detect compatible MMUs. 
+ * + * @GM_MMU_MODE_COHERENT_EXCLUSIVE: Maintain a coherent page table that holds + * exclusive mapping entries, so that device memory accesses can trigger + * fault-driven migration for automatic data locality optimizations. + * This mode does not require a cache-coherent link between the CPU and device. + * + * @GM_MMU_MODE_REPLICATE: Maintain a coherent page table that replicates + * physical mapping entries whenever a physical mapping is installed inside the + * address space, so that it may minimize the page faults to be triggered by + * this device. + * This mode requires the device physical memory to be cache-coherent. + */ +enum gm_mmu_mode { + GM_MMU_MODE_SHARE, + GM_MMU_MODE_COHERENT_EXCLUSIVE, + GM_MMU_MODE_REPLICATE, +}; + +enum gm_fault_hint { + GM_FAULT_HINT_MARK_HOT, + /* +* TODO: introduce other fault hints, e.g. read-on
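The boot-time registration flow the commit message describes (provide a struct gm_mmu ops table, then call gm_dev_create()) can be modelled in userspace. All names below (toy_*) are illustrative stand-ins for the GMEM API, not the actual signatures from mm/gmem.c.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct gm_mmu: hardware-dependent callbacks the core
 * invokes at the right time. */
struct toy_mmu_ops {
	int (*peer_map)(unsigned long va, unsigned long pa);
	int (*peer_unmap)(unsigned long va);
};

struct toy_gm_dev {
	const struct toy_mmu_ops *mmu;
	int id;
};

#define TOY_MAX_DEVS 4
static struct toy_gm_dev toy_devs[TOY_MAX_DEVS];
static int toy_nr_devs;

/* Models gm_dev_create(): reject incomplete ops tables and record the
 * device instance so the core can call back into the driver later. */
static struct toy_gm_dev *toy_gm_dev_create(const struct toy_mmu_ops *mmu)
{
	struct toy_gm_dev *dev;

	if (!mmu || !mmu->peer_map || !mmu->peer_unmap ||
	    toy_nr_devs >= TOY_MAX_DEVS)
		return NULL;
	dev = &toy_devs[toy_nr_devs];
	dev->mmu = mmu;
	dev->id = toy_nr_devs++;
	return dev;
}

/* Demo driver callbacks for a self-contained example. */
static int demo_map(unsigned long va, unsigned long pa)
{
	(void)va; (void)pa;
	return 0;
}
static int demo_unmap(unsigned long va) { (void)va; return 0; }
static const struct toy_mmu_ops demo_ops = { demo_map, demo_unmap };
```

The real interface additionally takes device capabilities and, optionally, a physical-memory range via gm_dev_register_physmem(); the model keeps only the ops-table validation idea.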
[RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices
The problem: Accelerator driver developers are forced to reinvent external MM subsystems case by case, because Linux core MM only considers host memory resources. These reinvented MM subsystems have LoC counts of the same order of magnitude as Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has 30K. Meanwhile, more and more vendors are implementing their own accelerators, e.g. Microsoft's Maia 100. At the same time, application-level developers suffer from poor programmability -- they must consider parallel address spaces and be careful about the limited device DRAM capacity. This can be alleviated if a malloc()-ed virtual address can be shared by the accelerator, or the abundant host DRAM can further transparently back up the device local memory. These external MM systems share similar mechanisms except for the hardware-dependent part, so reinventing them is effectively introducing redundant code (14K~70K for each case). Such development and maintenance is not cheap. Furthermore, to share a malloc()-ed virtual address, device drivers need to deeply interact with Linux MM via low-level MM APIs, e.g. MMU notifiers/HMM. This raises the bar for driver development, since developers must understand how Linux MM works. Further, it creates code maintenance problems -- any changes to Linux MM potentially require coordinated changes to accelerator drivers using low-level MM APIs. Putting a cache-coherent bus between host and device will not make these external MM subsystems disappear. For example, a throughput-oriented accelerator will not tolerate executing a heavy memory-access workload with a host MMU/IOMMU via a remote bus. Therefore, devices will still have their own MMU and pick a simpler page table format for lower address translation overhead, requiring external MM subsystems. What GMEM (Generalized Memory Management [1]) does: GMEM extends Linux MM to share its machine-independent MM code. Only a high-level interface is provided for device drivers.
This prevents accelerator drivers from reinventing the wheel, but relies on drivers to implement their hardware-dependent functions declared by GMEM. GMEM's key interfaces include gm_dev_create(), gm_as_create(), gm_as_attach() and gm_dev_register_physmem(). Here is a brief description of how a device driver utilizes them: 1. At boot time, call gm_dev_create() and register the implementation of hardware-dependent functions as declared in struct gm_mmu. - If the device has local DRAM, call gm_dev_register_physmem() to register available physical addresses. 2. When a device context is initialized (e.g. triggered by ioctl), check if the current CPU process has been attached to a gmem address space (struct gm_as). If not, call gm_as_create() and point current->mm->gm_as to it. 3. Call gm_as_attach() to attach the device context to a gmem address space. 4. Invoke gm_dev_fault() to resolve a page fault or prepare data before device computation happens. GMEM has changed the following assumptions in Linux MM: 1. An mm_struct not only handles a single CPU context, but may also handle external memory contexts encapsulated as gm_context listed in mm->gm_as. An external memory context can include a few or all of the following parts: an external MMU (that requires TLB invalidation), an external page table (that requires PTE manipulation) and external DRAM (that requires physical memory management). 2. Faulting a MAP_PRIVATE VMA with no CPU PTE found does not necessarily mean that a zero-filled physical page should be mapped. The virtual page may have been mapped to an external memory device. 3. Unmapping a page may include sending device TLB invalidation (even if its MMU shares the CPU page table) and manipulating device PTEs. Semantics of new syscalls: 1. mmap(..., MAP_PRIVATE | MAP_PEER_SHARED) Allocate a virtual address range that is shared between the CPU and all attached devices. Data is guaranteed to be coherent whenever the address is accessed by either the CPU or any attached device.
If the device does not support page fault, then the device driver is responsible for faulting memory before data gets accessed. By default, the CPU DRAM can be used as a swap backup for the device local memory. 2. hmadvise(NUMA_id, va_start, size, memory_hint) Issue a memory hint for a given VMA. This extends the traditional madvise() syscall with an extra argument so that programmers have better control over heterogeneous devices registered as NUMA nodes. One useful memory hint could be MADV_PREFETCH, which guarantees that the physical data of the given VMA [VA, VA+size) is migrated to NUMA node #id. Another useful memory hint is MADV_DONTNEED. This is helpful to increase device memory utilization. It is worth considering extending the existing madvise() syscall with one additional argument. Implementation det
[RFC PATCH 4/6] mm/gmem: add new syscall hmadvise() to issue memory hints for heterogeneous NUMA nodes
This patch adds a new syscall, hmadvise(), to issue memory hints for heterogeneous NUMA nodes. The new syscall effectively extends madvise() with one additional argument that indicates the NUMA id of a heterogeneous device, which is not necessarily accessible by the CPU. The implemented memory hint is MADV_PREFETCH, which guarantees that the physical data of the given VMA [VA, VA+size) is migrated to a designated NUMA id, so subsequent accesses from the corresponding device can obtain local memory access speed. This prefetch hint is internally parallelized with multiple workqueue threads, allowing the page table management to be overlapped. In a test with Huawei's Ascend NPU card, MADV_PREFETCH is able to saturate the host-device bandwidth if the given VMA size is larger than 16MB. Signed-off-by: Weixi Zhu --- arch/arm64/include/asm/unistd.h | 2 +- arch/arm64/include/asm/unistd32.h | 2 + include/linux/gmem.h| 9 + include/uapi/asm-generic/mman-common.h | 3 + include/uapi/asm-generic/unistd.h | 5 +- kernel/sys_ni.c | 2 + mm/gmem.c | 222 tools/include/uapi/asm-generic/unistd.h | 5 +- 8 files changed, 247 insertions(+), 3 deletions(-) diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h index 531effca5f1f..298313d2e0af 100644 --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -39,7 +39,7 @@ #define __ARM_NR_compat_set_tls(__ARM_NR_COMPAT_BASE + 5) #define __ARM_NR_COMPAT_END(__ARM_NR_COMPAT_BASE + 0x800) -#define __NR_compat_syscalls 457 +#define __NR_compat_syscalls 458 #endif #define __ARCH_WANT_SYS_CLONE diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index 9f7c1bf99526..0d44383b98be 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -919,6 +919,8 @@ __SYSCALL(__NR_futex_wake, sys_futex_wake) __SYSCALL(__NR_futex_wait, sys_futex_wait) #define __NR_futex_requeue 456 __SYSCALL(__NR_futex_requeue, sys_futex_requeue) +#define __NR_hmadvise
457 +__SYSCALL(__NR_hmadvise, sys_hmadvise) /* * Please add new compat syscalls above this comment and update diff --git a/include/linux/gmem.h b/include/linux/gmem.h index f424225daa03..97186f29638d 100644 --- a/include/linux/gmem.h +++ b/include/linux/gmem.h @@ -22,6 +22,11 @@ static inline bool gmem_is_enabled(void) return static_branch_likely(&gmem_status); } +static inline bool vma_is_peer_shared(struct vm_area_struct *vma) +{ + return false; +} + struct gm_dev { int id; @@ -280,6 +285,10 @@ int gm_as_attach(struct gm_as *as, struct gm_dev *dev, enum gm_mmu_mode mode, bool activate, struct gm_context **out_ctx); #else static inline bool gmem_is_enabled(void) { return false; } +static inline bool vma_is_peer_shared(struct vm_area_struct *vma) +{ + return false; +} static inline void hnuma_init(void) {} static inline void __init vm_object_init(void) { diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index 6ce1f1ceb432..49b22a497c5d 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -79,6 +79,9 @@ #define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */ +/* for hmadvise */ +#define MADV_PREFETCH 26 /* prefetch pages for hNUMA node */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 756b013fb832..a0773d4f7fa5 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -829,8 +829,11 @@ __SYSCALL(__NR_futex_wait, sys_futex_wait) #define __NR_futex_requeue 456 __SYSCALL(__NR_futex_requeue, sys_futex_requeue) +#define __NR_hmadvise 453 +__SYSCALL(__NR_hmadvise, sys_hmadvise) + #undef __NR_syscalls -#define __NR_syscalls 457 +#define __NR_syscalls 458 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index e1a6e3c675c0..73bc1b35b8c6 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -374,3 +374,5 @@ 
COND_SYSCALL(setuid16); /* restartable sequence */ COND_SYSCALL(rseq); + +COND_SYSCALL(hmadvise); diff --git a/mm/gmem.c b/mm/gmem.c index b95b6b42ed6d..4eb522026a0d 100644 --- a/mm/gmem.c +++ b/mm/gmem.c @@ -9,6 +9,8 @@ #include #include #include +#include +#include DEFINE_STATIC_KEY_FALSE(gmem_status); EXPORT_SYMBOL_GPL(gmem_status); @@ -484,3 +486,223 @@ int gm_as_attach(struct gm_as *as, struct gm_dev *dev, enum gm_mmu_mode mode, return GM_RET_SUCCESS; } EXPORT_SYMBOL_GPL(gm_as_attach); + +struct prefetch_data { + struct mm_struct *mm; + struct gm_dev *dev; + unsigned long addr; + size_t size; + struct work_struct work;
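The commit message says a prefetch request is parallelized across workqueue threads. A userspace model of the request-splitting step makes the idea concrete. The 2 MiB chunk granularity is an assumption for illustration only; the real work-item size lives in mm/gmem.c.

```c
#include <stddef.h>

#define TOY_PREFETCH_CHUNK (2UL << 20) /* 2 MiB, illustrative */

/* Split one MADV_PREFETCH range [start, start+size) into per-chunk
 * work items, one per workqueue thread submission. Returns the number
 * of chunks written into chunk_starts. */
static unsigned long prefetch_split(unsigned long start, unsigned long size,
				    unsigned long *chunk_starts,
				    unsigned long max_chunks)
{
	unsigned long n = 0;
	unsigned long off;

	for (off = 0; off < size && n < max_chunks; off += TOY_PREFETCH_CHUNK)
		chunk_starts[n++] = start + off; /* one work item per chunk */
	return n;
}
```

With chunks this size, a 16 MiB VMA (the saturation point quoted above) yields eight independent work items whose page-table manipulation can overlap with DMA.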
Re: [3/8] drm/loongson: Do not include
Hi, Thank you for the patch. On 2023/11/28 18:45, Thomas Zimmermann wrote: Remove unnecessary include statements for . The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes. Signed-off-by: Thomas Zimmermann Reviewed-by: Sui Jingfeng Tested-by: Sui Jingfeng --- drivers/gpu/drm/loongson/lsdc_plane.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/loongson/lsdc_plane.c b/drivers/gpu/drm/loongson/lsdc_plane.c index 0d50946332229..d227a2c1dcf16 100644 --- a/drivers/gpu/drm/loongson/lsdc_plane.c +++ b/drivers/gpu/drm/loongson/lsdc_plane.c @@ -9,7 +9,6 @@ #include #include #include -#include #include "lsdc_drv.h" #include "lsdc_regs.h"
[RFC PATCH 6/6] mm/gmem: extending Linux core MM to support unified virtual address space
This patch extends Linux core MM to support a unified virtual address space. A unified virtual address space provides a coherent view of memory for the CPU and devices. This is achieved by maintaining coherent page tables for the CPU and any attached devices for each process, without assuming that the underlying interconnect between the CPU and peripheral device is cache-coherent. Specifically, for each mm_struct that is attached with one or more device computing contexts, a per-process logical page table is utilized to track the mapping status of anonymous memory allocated via mmap(MAP_PRIVATE | MAP_PEER_SHARED). The CPU page fault handling path is modified to examine whether a faulted virtual page has already been faulted elsewhere, e.g. on a device, by looking up the logical page table in vm_object. If so, a page migration operation should be orchestrated by the core MM to prepare the CPU physical page, instead of zero-filling. This is achieved by invoking gm_host_fault_locked(). The logical page table must also be updated once the CPU page table gets modified. Ideally, the logical page table should always be looked up or modified first if the CPU page table is changed, but the current implementation is the reverse. Also, the current implementation only considers anonymous memory, while a device may want to operate on a disk-file directly via mmap(fd). In the future, the logical page table is planned to play a more generic role for anonymous memory, folios/huge pages and file-backed memory, as well as to provide a clean abstraction for CPU page table functions (including these stage-2 functions). Moreover, the page-fault handling path will be enhanced to deal with cache-coherent buses as well, since it might be desirable for devices to operate on sparse data remotely instead of migrating data at page granularity.
Signed-off-by: Weixi Zhu --- kernel/fork.c| 1 + mm/huge_memory.c | 85 +++- mm/memory.c | 42 +--- mm/mmap.c| 2 ++ mm/oom_kill.c| 2 ++ mm/vm_object.c | 84 +++ 6 files changed, 203 insertions(+), 13 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index eab96cdb25a6..06130c73bf2e 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -543,6 +543,7 @@ static void vm_area_free_rcu_cb(struct rcu_head *head) void vm_area_free(struct vm_area_struct *vma) { + free_gm_mappings(vma); #ifdef CONFIG_PER_VMA_LOCK call_rcu(&vma->vm_rcu, vm_area_free_rcu_cb); #else diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 4f542444a91f..59f63f04 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -37,6 +37,7 @@ #include #include #include +#include #include #include @@ -684,6 +685,10 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, pgtable_t pgtable; unsigned long haddr = vmf->address & HPAGE_PMD_MASK; vm_fault_t ret = 0; + struct gm_mapping *gm_mapping = NULL; + + if (vma_is_peer_shared(vma)) + gm_mapping = vm_object_lookup(vma->vm_mm->vm_obj, haddr); VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); @@ -691,7 +696,8 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, folio_put(folio); count_vm_event(THP_FAULT_FALLBACK); count_vm_event(THP_FAULT_FALLBACK_CHARGE); - return VM_FAULT_FALLBACK; + ret = VM_FAULT_FALLBACK; + goto gm_mapping_release; } folio_throttle_swaprate(folio, gfp); @@ -701,7 +707,14 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, goto release; } - clear_huge_page(page, vmf->address, HPAGE_PMD_NR); + /* +* Skip zero-filling page if the logical mapping indicates +* that page contains valid data of the virtual address. This +* could happen if the page was a victim of device memory +* oversubscription. 
+*/ + if (!(vma_is_peer_shared(vma) && gm_mapping_cpu(gm_mapping))) + clear_huge_page(page, vmf->address, HPAGE_PMD_NR); /* * The memory barrier inside __folio_mark_uptodate makes sure that * clear_huge_page writes become visible before the set_pmd_at() @@ -726,7 +739,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, pte_free(vma->vm_mm, pgtable); ret = handle_userfault(vmf, VM_UFFD_MISSING); VM_BUG_ON(ret & VM_FAULT_FALLBACK); - return ret; + goto gm_mapping_release; } entry = mk_huge_pmd(page, vma->vm_page_prot); @@ -734,6 +747,13 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, folio_add_new_anon_rmap(folio, vma, haddr); folio_add_lru_vma(folio, vma);
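The fault-path decision in the hunk above (skip clear_huge_page() when the logical mapping says the VA already holds valid CPU data) reduces to a small predicate. The model below mirrors the patch's `!(vma_is_peer_shared(vma) && gm_mapping_cpu(gm_mapping))` condition; the flag values are copied from the GM_PAGE_* definitions in the series, everything else is a stand-in.

```c
#include <stdbool.h>

#define GM_PAGE_CPU    0x10 /* flag bits as defined in include/linux/gmem.h */
#define GM_PAGE_DEVICE 0x20

/* Decide whether the anonymous-fault path should zero-fill the new
 * page. Non-GMEM VMAs keep the normal behaviour; a peer-shared VMA
 * whose logical mapping already records a valid CPU copy (e.g. a
 * victim of device memory oversubscription brought back to host)
 * must not be wiped. */
static bool need_zero_fill(bool peer_shared, unsigned int mapping_flags)
{
	if (!peer_shared)
		return true;
	if (mapping_flags & GM_PAGE_CPU)
		return false;
	return true;
}
```

Note that a device-resident mapping (GM_PAGE_DEVICE) still zero-fills under this predicate, matching the patch; the migration of device data is handled separately via gm_host_fault_locked().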
[RFC PATCH 2/6] mm/gmem: add arch-independent abstraction to track address mapping status
This patch adds an abstraction layer, struct vm_object, that maintains per-process virtual-to-physical mapping status stored in struct gm_mapping. For example, a virtual page may be mapped to a CPU physical page or to a device physical page. Struct vm_object effectively maintains an arch-independent page table, which is defined as a "logical page table", while the arch-dependent page table used by a real MMU is called a "physical page table". The logical page table is useful if Linux core MM is extended to handle a unified virtual address space with external accelerators using customized MMUs. In this patch, struct vm_object utilizes a radix tree (xarray) to track where a virtual page is mapped. This adds extra memory consumption from the xarray, but provides a nice abstraction to isolate mapping status from the machine-dependent layer (PTEs). Besides supporting accelerators with external MMUs, struct vm_object is planned to further union with i_pages in struct address_space for file-backed memory. The idea of struct vm_object originated from the FreeBSD VM design, which provides a unified abstraction for anonymous memory, file-backed memory, page cache, etc. [1]. Currently, Linux utilizes a set of hierarchical page walk functions to abstract page table manipulations of different CPU architectures. The problem happens when a device wants to reuse Linux MM code to manage its page table -- the device page table may not be accessible to the CPU. Existing solutions like Linux HMM utilize the MMU notifier mechanisms to invoke device-specific MMU functions, but rely on encoding the mapping status on the CPU page table entries. This entangles machine-independent code with machine-dependent code, and also brings unnecessary restrictions. The PTE size and format vary from arch to arch, which harms extensibility.
[1] https://docs.freebsd.org/en/articles/vm-design/ Signed-off-by: Weixi Zhu --- include/linux/gmem.h | 120 + include/linux/mm_types.h | 4 + mm/Makefile | 2 +- mm/vm_object.c | 184 +++ 4 files changed, 309 insertions(+), 1 deletion(-) create mode 100644 mm/vm_object.c diff --git a/include/linux/gmem.h b/include/linux/gmem.h index fff877873557..529ff6755a99 100644 --- a/include/linux/gmem.h +++ b/include/linux/gmem.h @@ -9,11 +9,131 @@ #ifndef _GMEM_H #define _GMEM_H +#include + #ifdef CONFIG_GMEM + +#define GM_PAGE_CPU0x10 /* Determines whether page is a pointer or a pfn number. */ +#define GM_PAGE_DEVICE 0x20 +#define GM_PAGE_NOMAP 0x40 +#define GM_PAGE_WILLNEED 0x80 + +#define GM_PAGE_TYPE_MASK (GM_PAGE_CPU | GM_PAGE_DEVICE | GM_PAGE_NOMAP) + +struct gm_mapping { + unsigned int flag; + + union { + struct page *page; /* CPU node */ + struct gm_dev *dev; /* hetero-node. TODO: support multiple devices */ + unsigned long pfn; + }; + + struct mutex lock; +}; + +static inline void gm_mapping_flags_set(struct gm_mapping *gm_mapping, int flags) +{ + if (flags & GM_PAGE_TYPE_MASK) + gm_mapping->flag &= ~GM_PAGE_TYPE_MASK; + + gm_mapping->flag |= flags; +} + +static inline void gm_mapping_flags_clear(struct gm_mapping *gm_mapping, int flags) +{ + gm_mapping->flag &= ~flags; +} + +static inline bool gm_mapping_cpu(struct gm_mapping *gm_mapping) +{ + return !!(gm_mapping->flag & GM_PAGE_CPU); +} + +static inline bool gm_mapping_device(struct gm_mapping *gm_mapping) +{ + return !!(gm_mapping->flag & GM_PAGE_DEVICE); +} + +static inline bool gm_mapping_nomap(struct gm_mapping *gm_mapping) +{ + return !!(gm_mapping->flag & GM_PAGE_NOMAP); +} + +static inline bool gm_mapping_willneed(struct gm_mapping *gm_mapping) +{ + return !!(gm_mapping->flag & GM_PAGE_WILLNEED); +} + /* h-NUMA topology */ void __init hnuma_init(void); + +/* vm object */ +/* + * Each per-process vm_object tracks the mapping status of virtual pages from + * all VMAs mmap()-ed with MAP_PRIVATE | 
MAP_PEER_SHARED. + */ +struct vm_object { + spinlock_t lock; + + /* +* The logical_page_table is a container that holds the mapping +* information between a VA and a struct page. +*/ + struct xarray *logical_page_table; + atomic_t nr_pages; +}; + +int __init vm_object_init(void); +struct vm_object *vm_object_create(struct mm_struct *mm); +void vm_object_drop_locked(struct mm_struct *mm); + +struct gm_mapping *alloc_gm_mapping(void); +void free_gm_mappings(struct vm_area_struct *vma); +struct gm_mapping *vm_object_lookup(struct vm_object *obj, unsigned long va); +void vm_object_mapping_create(struct vm_object *obj, unsigned long start); +void unmap_gm_mappings_range(struct vm_area_struct *vma, unsigned long start, +unsigned long end); +void munmap_in_peer_devices(struct mm_struct *mm, unsigned long start, + u
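The logical-page-table idea (a per-process map from virtual page index to a gm_mapping) can be modelled without the kernel xarray. The toy below indexes a flat slot array by `va >> PAGE_SHIFT`, the same key the patch uses for its xarray; slot count, names, and the use of a bare flag word instead of a struct gm_mapping pointer are all simplifications.

```c
#include <stddef.h>

#define TOY_PAGE_SHIFT 12
#define TOY_NR_SLOTS   256

/* Toy stand-in for the xarray-backed logical page table: one slot per
 * virtual page, holding the mapping's flag word (0 = untracked). */
struct toy_vm_object {
	unsigned int slots[TOY_NR_SLOTS];
};

/* Models vm_object_mapping_create(): record mapping status for a VA. */
static void toy_mapping_create(struct toy_vm_object *obj, unsigned long va,
			       unsigned int flags)
{
	obj->slots[(va >> TOY_PAGE_SHIFT) % TOY_NR_SLOTS] = flags;
}

/* Models vm_object_lookup(): fetch mapping status by VA, keyed by
 * page index exactly as the kernel code keys its xarray. */
static unsigned int toy_vm_object_lookup(struct toy_vm_object *obj,
					 unsigned long va)
{
	return obj->slots[(va >> TOY_PAGE_SHIFT) % TOY_NR_SLOTS];
}
```

The real structure uses an xarray so the index space is sparse and unbounded, and it stores pointers to locked struct gm_mapping objects rather than raw flags.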
Re: [6/8] drm/ofdrm: Do not include
Hi, On 2023/11/28 18:45, Thomas Zimmermann wrote: Remove unnecessary include statements for . The file contains helpers for non-atomic code and should not be required by most drivers. No functional changes. Signed-off-by: Thomas Zimmermann Reviewed-by: Sui Jingfeng --- drivers/gpu/drm/tiny/ofdrm.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c index 05a72473cfc65..ab89b7fc7bf61 100644 --- a/drivers/gpu/drm/tiny/ofdrm.c +++ b/drivers/gpu/drm/tiny/ofdrm.c @@ -19,7 +19,6 @@ #include #include #include -#include #include #include
Re: [PATCH 4/8] drm/shmobile: Do not include
On Tue, Nov 28, 2023 at 11:47 AM Thomas Zimmermann wrote: > Remove unnecessary include statements for . > The file contains helpers for non-atomic code and should not be > required by most drivers. No functional changes. > > Signed-off-by: Thomas Zimmermann Reviewed-by: Geert Uytterhoeven Acked-by: Geert Uytterhoeven Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
[RFC PATCH 1/6] mm/gmem: add heterogeneous NUMA node
This patch adds a new NUMA node state, named N_HETEROGENEOUS. It is utilized to identify heterogeneous NUMA (hNUMA) node. Note that hNUMA node may not be directly accessible by the CPU. Each hNUMA node can be identified with a NUMA id. This can be extended to provide NUMA topology including device local DRAM, where a cache-coherent bus does not need to exist between the CPU and device local DRAM. Furthermore, this allows an application user to issue memory hints that bind with specific hNUMA nodes. Signed-off-by: Weixi Zhu --- drivers/base/node.c | 6 include/linux/gmem.h | 19 ++ include/linux/nodemask.h | 10 ++ init/main.c | 2 ++ mm/Kconfig | 14 mm/Makefile | 1 + mm/gmem.c| 78 mm/page_alloc.c | 3 ++ 8 files changed, 133 insertions(+) create mode 100644 include/linux/gmem.h create mode 100644 mm/gmem.c diff --git a/drivers/base/node.c b/drivers/base/node.c index 493d533f8375..aa4d2ca266aa 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -928,6 +928,9 @@ static struct node_attr node_state_attr[] = { [N_CPU] = _NODE_ATTR(has_cpu, N_CPU), [N_GENERIC_INITIATOR] = _NODE_ATTR(has_generic_initiator, N_GENERIC_INITIATOR), +#ifdef CONFIG_GMEM + [N_HETEROGENEOUS] = _NODE_ATTR(has_hetero_memory, N_HETEROGENEOUS), +#endif }; static struct attribute *node_state_attrs[] = { @@ -940,6 +943,9 @@ static struct attribute *node_state_attrs[] = { &node_state_attr[N_MEMORY].attr.attr, &node_state_attr[N_CPU].attr.attr, &node_state_attr[N_GENERIC_INITIATOR].attr.attr, +#ifdef CONFIG_GMEM + &node_state_attr[N_HETEROGENEOUS].attr.attr, +#endif NULL }; diff --git a/include/linux/gmem.h b/include/linux/gmem.h new file mode 100644 index ..fff877873557 --- /dev/null +++ b/include/linux/gmem.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Generalized Memory Management. + * + * Copyright (C) 2023- Huawei, Inc. 
+ * Author: Weixi Zhu + * + */ +#ifndef _GMEM_H +#define _GMEM_H + +#ifdef CONFIG_GMEM +/* h-NUMA topology */ +void __init hnuma_init(void); +#else +static inline void hnuma_init(void) {} +#endif + +#endif /* _GMEM_H */ diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h index 8d07116caaf1..66e4640a52ba 100644 --- a/include/linux/nodemask.h +++ b/include/linux/nodemask.h @@ -407,6 +407,9 @@ enum node_states { N_MEMORY, /* The node has memory(regular, high, movable) */ N_CPU, /* The node has one or more cpus */ N_GENERIC_INITIATOR,/* The node has one or more Generic Initiators */ +#ifdef CONFIG_GMEM + N_HETEROGENEOUS,/* The node has heterogeneous memory */ +#endif NR_NODE_STATES }; @@ -536,6 +539,13 @@ static inline int node_random(const nodemask_t *maskp) #define for_each_node(node) for_each_node_state(node, N_POSSIBLE) #define for_each_online_node(node) for_each_node_state(node, N_ONLINE) +#ifdef CONFIG_GMEM +/* For h-NUMA topology */ +#define hnode_map node_states[N_HETEROGENEOUS] +#define num_hnodes() num_node_state(N_HETEROGENEOUS) +#define for_each_hnode(node) for_each_node_state(node, N_HETEROGENEOUS) +#endif + /* * For nodemask scratch area. 
* NODEMASK_ALLOC(type, name) allocates an object with a specified type and diff --git a/init/main.c b/init/main.c index e24b0780fdff..12dfb5b63d51 100644 --- a/init/main.c +++ b/init/main.c @@ -100,6 +100,7 @@ #include #include #include +#include #include #include @@ -901,6 +902,7 @@ void start_kernel(void) setup_per_cpu_areas(); smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */ boot_cpu_hotplug_init(); + hnuma_init(); pr_notice("Kernel command line: %s\n", saved_command_line); /* parameters may set static keys */ diff --git a/mm/Kconfig b/mm/Kconfig index 89971a894b60..1a7d8194513c 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1270,6 +1270,20 @@ config LOCK_MM_AND_FIND_VMA bool depends on !STACK_GROWSUP +config GMEM + bool "generalized memory management for external memory devices" + depends on (ARM64 || X86_64) && MMU && TRANSPARENT_HUGEPAGE + select ARCH_USES_HIGH_VMA_FLAGS + default y + help + Supporting GMEM (generalized memory management) for external memory + devices + + GMEM extends Linux MM to share its machine-independent MM code. Only + high-level interface is provided for device drivers. This prevents + accelerator drivers from reinventing the wheel, but relies on drivers to + implement their hardware-dependent functions declared by GMEM. + source "mm/damon/Kconfig" endmenu diff --git a/mm/Makefile b/mm/Makefile index 33873c8aedb3..f48ea2eb4a44 100644 --- a/mm/Makefi
Re: [PATCH] drm/amdgpu: Fix uninitialized return value
On 2023-11-28 8:18, Christian König wrote: Am 28.11.23 um 10:49 schrieb Lazar, Lijo: On 11/28/2023 3:07 PM, Christian König wrote: Am 27.11.23 um 22:55 schrieb Alex Deucher: On Mon, Nov 27, 2023 at 2:22 PM Christian König wrote: Am 27.11.23 um 19:29 schrieb Lijo Lazar: The return value is uniinitialized if ras context is NULL. Fixes: 0f4c8faa043c (drm/amdgpu: Move mca debug mode decision to ras) Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 1a8668a63e67..f6b47ebce9d6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -3410,7 +3410,7 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device *adev) int amdgpu_ras_set_mca_debug_mode(struct amdgpu_device *adev, bool enable) { struct amdgpu_ras *con = amdgpu_ras_get_context(adev); - int ret; + int ret = 0; That's usually considered very bad coding style and complained about by automated checkers. Instead explicitly set the return value in the code paths not actually setting it. In this case, the function is so short, I think it makes things less readable to do that. Yeah, indeed but that doesn't help us with the coding style checkers. Are these checkers for real? I see many instances of variable initialization including in core kernel code (ex: workqueue) code. Yes, I've got multiple complains about that already. What people basically seem to do is to search for patterns like "int ret = 0;... ret = whatever();.. return ret;" with cocci. This then results in a note that an initialization isn't necessary and should be avoided. Isn't that the opposite of defensive programming? If you write your kernel code in Rust, all your local variables will be implicitly initialized. In C you have to do it yourself. And the compiler is notoriously inconsistent warning about uninitialized variables. 
Regards, Felix Same for things like return after else, e.g. when you have something like this "if (...) { ret = whatever(); if (ret) return ret; } else { ... ret = 0;} return ret;". Regards, Christian. Thanks Lijo We could do something like this instead: if (!con) return 0; ret = amdgpu_mca_smu_set_debug_mode(adev, enable); ... Regards, Christian. Reviewed-by: Alex Deucher Regards, Christian. if (con) { ret = amdgpu_mca_smu_set_debug_mode(adev, enable);
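Both shapes under discussion behave identically for a NULL RAS context; a minimal standalone sketch (all type and function names here are hypothetical stand-ins, not the real driver symbols) shows the two options side by side:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the RAS context and the SMU call. */
struct ras_ctx { int unused; };

static int smu_set_debug_mode_stub(bool enable)
{
	(void)enable;
	return 0; /* pretend the firmware call succeeded */
}

/* Style A (the patch as posted): default-initialize ret, branch on con. */
static int set_mca_debug_mode_init(struct ras_ctx *con, bool enable)
{
	int ret = 0;

	if (con)
		ret = smu_set_debug_mode_stub(enable);
	return ret;
}

/* Style B (the early-return variant suggested in the thread):
 * no initializer needed, every path sets the return value explicitly. */
static int set_mca_debug_mode_early(struct ras_ctx *con, bool enable)
{
	if (!con)
		return 0;

	return smu_set_debug_mode_stub(enable);
}
```

The early-return form keeps static checkers quiet because no path ever reads an uninitialized `ret`.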
PSP_VMBX_POLLING_LIMIT too big
Hi, In amd-staging-drm-next 46fe6312082c ("drm/amdgpu: update retry times for psp BL wait") and upstream a11156ff6f41 ("drm/amdgpu: update retry times for psp BL wait") the number of loops for psp_v13_0_wait_for_bootloader() to try again increased significantly. It went from 10 loops to 20k loops. Essentially this means that the function can "effectively" no longer fail. I've got an issue I'm looking at where runtime resume for a dGPU fails, and because of this change the system gets stuck in a never ending busy loop instead of cleanly returning an error code to the caller. The outcome is the system appears hung while the 20k loops run instead of just the dGPU failing to resume. Is this 20k value really required? Or can we reduce it back to something more manageable? Thanks,
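The complaint is about the shape of the wait loop: with a bound that is effectively infinite, the error path is unreachable. A bounded poll that still propagates failure to the caller can be sketched like this (the names and retry count are illustrative, not the driver's):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Poll a readiness condition at most max_retries times; return 0 on
 * success or a negative errno so the caller can clean up instead of
 * appearing hung. */
static int wait_for_bootloader_stub(int max_retries, bool (*ready)(void))
{
	for (int i = 0; i < max_retries; i++) {
		if (ready())
			return 0;
		/* the real code would udelay()/msleep() between polls */
	}
	return -ETIMEDOUT;
}

static int calls;
static bool never_ready(void) { calls++; return false; }
static bool ready_third_try(void) { return ++calls >= 3; }
```

The total wait is then `max_retries` times the per-iteration delay, so the bound directly controls how long a dead device can stall resume.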
Re: [PATCH 1/3] Revert "drm/prime: Unexport helpers for fd/handle conversion"
On Thu, Nov 23, 2023 at 6:12 PM Felix Kuehling wrote: > > [+Alex] > > On 2023-11-17 16:44, Felix Kuehling wrote: > > > This reverts commit 71a7974ac7019afeec105a54447ae1dc7216cbb3. > > > > These helper functions are needed for KFD to export and import DMABufs > > the right way without duplicating the tracking of DMABufs associated with > > GEM objects while ensuring that move notifier callbacks are working as > > intended. > > > > CC: Christian König > > CC: Thomas Zimmermann > > Signed-off-by: Felix Kuehling > > Re: our discussion about v2 of this patch: If this version is > acceptable, can I get an R-b or A-b? > > I would like to get this patch into drm-next as a prerequisite for > patches 2 and 3. I cannot submit it to the current amd-staging-drm-next > because the patch I'm reverting doesn't exist there yet. > > Patch 2 and 3 could go into drm-next as well, or go through Alex's > amd-staging-drm-next branch once patch 1 is in drm-next. Alex, how do > you prefer to coordinate this? I guess ideally this would go through my drm-next tree since your other patches depend on it unless others feel strongly that it should go through drm-misc. Alex > > Regards, >Felix > > > > --- > > drivers/gpu/drm/drm_prime.c | 33 ++--- > > include/drm/drm_prime.h | 7 +++ > > 2 files changed, 25 insertions(+), 15 deletions(-) > > > > diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c > > index 63b709a67471..834a5e28abbe 100644 > > --- a/drivers/gpu/drm/drm_prime.c > > +++ b/drivers/gpu/drm/drm_prime.c > > @@ -278,7 +278,7 @@ void drm_gem_dmabuf_release(struct dma_buf *dma_buf) > > } > > EXPORT_SYMBOL(drm_gem_dmabuf_release); > > > > -/* > > +/** > >* drm_gem_prime_fd_to_handle - PRIME import function for GEM drivers > >* @dev: drm_device to import into > >* @file_priv: drm file-private structure > > @@ -292,9 +292,9 @@ EXPORT_SYMBOL(drm_gem_dmabuf_release); > >* > >* Returns 0 on success or a negative error code on failure. 
> >*/ > > -static int drm_gem_prime_fd_to_handle(struct drm_device *dev, > > - struct drm_file *file_priv, int > > prime_fd, > > - uint32_t *handle) > > +int drm_gem_prime_fd_to_handle(struct drm_device *dev, > > +struct drm_file *file_priv, int prime_fd, > > +uint32_t *handle) > > { > > struct dma_buf *dma_buf; > > struct drm_gem_object *obj; > > @@ -360,6 +360,7 @@ static int drm_gem_prime_fd_to_handle(struct drm_device > > *dev, > > dma_buf_put(dma_buf); > > return ret; > > } > > +EXPORT_SYMBOL(drm_gem_prime_fd_to_handle); > > > > int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data, > >struct drm_file *file_priv) > > @@ -408,7 +409,7 @@ static struct dma_buf > > *export_and_register_object(struct drm_device *dev, > > return dmabuf; > > } > > > > -/* > > +/** > >* drm_gem_prime_handle_to_fd - PRIME export function for GEM drivers > >* @dev: dev to export the buffer from > >* @file_priv: drm file-private structure > > @@ -421,10 +422,10 @@ static struct dma_buf > > *export_and_register_object(struct drm_device *dev, > >* The actual exporting from GEM object to a dma-buf is done through the > >* &drm_gem_object_funcs.export callback. 
> >*/ > > -static int drm_gem_prime_handle_to_fd(struct drm_device *dev, > > - struct drm_file *file_priv, uint32_t > > handle, > > - uint32_t flags, > > - int *prime_fd) > > +int drm_gem_prime_handle_to_fd(struct drm_device *dev, > > +struct drm_file *file_priv, uint32_t handle, > > +uint32_t flags, > > +int *prime_fd) > > { > > struct drm_gem_object *obj; > > int ret = 0; > > @@ -506,6 +507,7 @@ static int drm_gem_prime_handle_to_fd(struct drm_device > > *dev, > > > > return ret; > > } > > +EXPORT_SYMBOL(drm_gem_prime_handle_to_fd); > > > > int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data, > >struct drm_file *file_priv) > > @@ -864,9 +866,9 @@ EXPORT_SYMBOL(drm_prime_get_contiguous_size); > >* @obj: GEM object to export > >* @flags: flags like DRM_CLOEXEC and DRM_RDWR > >* > > - * This is the implementation of the &drm_gem_object_funcs.export functions > > - * for GEM drivers using the PRIME helpers. It is used as the default for > > - * drivers that do not set their own. > > + * This is the implementation of the &drm_gem_object_funcs.export > > functions for GEM drivers > > + * using the PRIME helpers. It is used as the default in > > + * drm_gem_prime_handle_to_fd(). > >*/ > > struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj, > >
Re: [PATCH] drm/amdgpu: add shared fdinfo stats
On Tue, Nov 28, 2023 at 6:28 AM Alex Deucher wrote: > > On Tue, Nov 28, 2023 at 9:17 AM Christian König > wrote: > > > > Am 17.11.23 um 20:56 schrieb Alex Deucher: > > > Add shared stats. Useful for seeing shared memory. > > > > > > Signed-off-by: Alex Deucher > > > --- > > > drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 4 > > > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 +++ > > > drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 6 ++ > > > 3 files changed, 21 insertions(+) > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > > > index 5706b282a0c7..c7df7fa3459f 100644 > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > > > @@ -97,6 +97,10 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct > > > drm_file *file) > > > stats.requested_visible_vram/1024UL); > > > drm_printf(p, "amd-requested-gtt:\t%llu KiB\n", > > > stats.requested_gtt/1024UL); > > > + drm_printf(p, "drm-shared-vram:\t%llu KiB\n", > > > stats.vram_shared/1024UL); > > > + drm_printf(p, "drm-shared-gtt:\t%llu KiB\n", > > > stats.gtt_shared/1024UL); > > > + drm_printf(p, "drm-shared-cpu:\t%llu KiB\n", > > > stats.cpu_shared/1024UL); > > > + > > > for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) { > > > if (!usage[hw_ip]) > > > continue; > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > > > index d79b4ca1ecfc..c24f7b2c04c1 100644 > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > > > @@ -1287,25 +1287,36 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo, > > > struct amdgpu_mem_stats *stats) > > > { > > > uint64_t size = amdgpu_bo_size(bo); > > > + struct drm_gem_object *obj; > > > unsigned int domain; > > > + bool shared; > > > > > > /* Abort if the BO doesn't currently have a backing store */ > > > if (!bo->tbo.resource) > > > return; > > > > > > 
+ obj = &bo->tbo.base; > > > + shared = obj->handle_count > 1; > > > > Interesting approach but I don't think that this is correct. > > > > The handle_count is basically how many GEM handles are there for BO, so > > for example it doesn't catch sharing things with V4L. > > > > What we should probably rather do is to take a look if > > bo->tbo.base.dma_buf is NULL or not. > > +Rob, dri-devel > > This is what the generic drm helper code does. See > drm_show_memory_stats(). If that is not correct that code should > probably be fixed too. OTOH, v4l doesn't expose fdinfo. What "shared" is telling you is whether the BO is counted multiple times when you look at all processes fdinfo. But I guess it would be ok to look for obj->handle_count > 1 || obj->dma_buf BR, -R > > Alex > > > > > Regards, > > Christian. > > > > > > > + > > > domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type); > > > switch (domain) { > > > case AMDGPU_GEM_DOMAIN_VRAM: > > > stats->vram += size; > > > if (amdgpu_bo_in_cpu_visible_vram(bo)) > > > stats->visible_vram += size; > > > + if (shared) > > > + stats->vram_shared += size; > > > break; > > > case AMDGPU_GEM_DOMAIN_GTT: > > > stats->gtt += size; > > > + if (shared) > > > + stats->gtt_shared += size; > > > break; > > > case AMDGPU_GEM_DOMAIN_CPU: > > > default: > > > stats->cpu += size; > > > + if (shared) > > > + stats->cpu_shared += size; > > > break; > > > } > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > > > index d28e21baef16..0503af75dc26 100644 > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > > > @@ -138,12 +138,18 @@ struct amdgpu_bo_vm { > > > struct amdgpu_mem_stats { > > > /* current VRAM usage, includes visible VRAM */ > > > uint64_t vram; > > > + /* current shared VRAM usage, includes visible VRAM */ > > > + uint64_t vram_shared; > > > /* current visible VRAM usage */ > > > uint64_t 
visible_vram; > > > /* current GTT usage */ > > > uint64_t gtt; > > > + /* current shared GTT usage */ > > > + uint64_t gtt_shared; > > > /* current system memory usage */ > > > uint64_t cpu; > > > + /* current shared system memory usage */ > > > + uint64_t cpu_shared; > > > /* sum of evicted buffers, includes visible VRAM */ > > > uint64_t evicted_vram; > > > /* sum of evicted buffers due to CPU access */ > >
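The check the thread converges on can be modeled in a few lines (the struct below is a stand-in, not the real `drm_gem_object`): a BO counts as shared if it has more than one GEM handle, or if it has been exported as a dma-buf, which also covers sharing with non-DRM users such as V4L.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal model of a GEM object's sharing state. */
struct gem_object_stub {
	unsigned int handle_count; /* number of GEM handles referencing it */
	void *dma_buf;             /* non-NULL once exported as a dma-buf */
};

static bool bo_is_shared(const struct gem_object_stub *obj)
{
	return obj->handle_count > 1 || obj->dma_buf != NULL;
}
```

`handle_count > 1` alone catches multi-process fdinfo double counting; the `dma_buf` term catches exports to subsystems that never take a second GEM handle.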
[PATCH v3 0/9] drm/amd/display: improve DTN color state log
This series updates the color state section of the AMD DTN log to match the color resource differences between DCN versions. Currently, the DTN log assumes the DCN10 color pipeline, which is useless for newer DCN versions because of all the color capability differences. In addition to the new color blocks and features, some semantic differences meant that the DCN10 output was no longer suitable for newer families. This version addresses comments from Siqueira and Harry [1]. It also contains some improvements: DPP and MPC gamut remap matrix data in 31.32 fixed point format and coverage of DCN2+ and DCN3+.

- The first patch decouples the DCN10 color state from the HW log, in preparation for color logs specific to each DCN family.
- Harry kindly provided the second patch with a way to read Gamut Remap Matrix data in 31.32 fixed point format instead of raw HW values.
- With this, I changed the DCN10 gamut remap output to display values in the new format (third patch).
- Patches 4 and 6 fill up the color state of the DPP and MPC blocks for DCN3 from the right registers.
- As DCN3+ has a new MPC color block for the post-blending Gamut Remap matrix, in patch 5 I reuse Harry's approach for reading the DPP gamut remap matrix and create a helper to read the MPC gamut remap matrix data.
- Patches 7 and 9 create the new color state logs specific to DCN2+ and DCN3+. I didn't extend this to DCN32 (or DCN35) because I observed some differences in the shaper and 3D LUT registers of those versions.
- Patch 8 adds descriptions of the DPP and MPC color blocks for better interpretation of the data.

This new approach works well together with the driver-specific color properties [2] and steamdeck/gamescope [3], where we can see the color state changing from default values. I also tested with steamdeck/KDE and DCN21/GNOME.
Please find some `before vs after` results below: === DCN301 - Before: --- DPP:IGAM format IGAM modeDGAM modeRGAM mode GAMUT mode C11 C12 C13 C14 C21 C22 C23 C24 C31 C32 C33 C34 [ 0]:0h BypassFixed Bypass Bypass0h h h h h h [ 1]:0h BypassFixed Bypass Bypass0h h h h h h [ 2]:0h BypassFixed Bypass Bypass0h h h h h h [ 3]:0h BypassFixed Bypass Bypass0h h h h h h MPCC: OPP DPP MPCCBOT MODE ALPHA_MODE PREMULT OVERLAP_ONLY IDLE [ 0]: 0h 0h 2h 3 01 0 0 [ 1]: 0h 1h fh 2 20 0 0 [ 2]: 0h 2h 3h 3 01 0 0 [ 3]: 0h 3h 1h 3 20 0 0 DCN301 - After (Gamescope): --- DPP: DGAM ROM DGAM ROM type DGAM LUT SHAPER mode 3DLUT mode 3DLUT bit depth 3DLUT size RGAM mode GAMUT adjust C11C12C13 C14C21C22C23C24C31C32 C33C34 [ 0]:1 sRGBBypassRAM B RAM A 12-bit17x17x17 RAM ABypass 00 00 00 00 00 00 00 00 00 00 00 00 [ 1]:1 sRGBBypassRAM B RAM A 12-bit17x17x17 RAM ABypass 00 00 00 00 00 00 00 00 00 00 00 00 [ 2]:1 sRGBBypassRAM B RAM A 12-bit17x17x17 RAM ABypass 00 00 00 00 00 00 00 00 00 00 00 00 [ 3]:1 sRGBBypassRAM B RAM A 12-bit17x17x17 RAM ABypass 00 00 00 00 00 00 00 00 00 00 00 00 DPP Color Caps: input_lut_shared:0 icsc:1 dgam_ram:0 dgam_rom: srgb:1,bt2020:1,gamma2_2:1,pq:1,hlg:1 post_csc:1 gamcor:1 dgam_rom_for_yuv:0 3d_lut:1 blnd_lut:1 oscs:0 MPCC: OPP DPP MPCCBOT MODE ALPHA_MODE PREMULT OVERLAP_ONLY IDLE SHAPER mode 3DLUT mode 3DLUT bit-depth 3DLUT size OGAM mode OGAM LUT GAMUT adjust C11C12C13C14C21C22C23 C24C31C32C33C34 [ 0]: 0h 0h 2h 3 01 0 0 Bypass Bypass 12-bit17x17x17RAM A Bypass 00 00 00 00 00 00 00 00 00 0
[PATCH v3 1/9] drm/amd/display: decouple color state from hw state log
Prepare to hook up color state log according to the DCN version. v3: - put functions in single line (Siqueira) Signed-off-by: Melissa Wen --- .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 26 +-- 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c index 2b8b8366538e..9b801488eb9d 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c @@ -282,19 +282,13 @@ static void dcn10_log_hubp_states(struct dc *dc, void *log_ctx) DTN_INFO("\n"); } -void dcn10_log_hw_state(struct dc *dc, - struct dc_log_buffer_ctx *log_ctx) +static void dcn10_log_color_state(struct dc *dc, + struct dc_log_buffer_ctx *log_ctx) { struct dc_context *dc_ctx = dc->ctx; struct resource_pool *pool = dc->res_pool; int i; - DTN_INFO_BEGIN(); - - dcn10_log_hubbub_state(dc, log_ctx); - - dcn10_log_hubp_states(dc, log_ctx); - DTN_INFO("DPP:IGAM format IGAM modeDGAM modeRGAM mode" " GAMUT mode C11 C12 C13 C14 C21 C22 C23 C24 " "C31 C32 C33 C34\n"); @@ -351,6 +345,22 @@ void dcn10_log_hw_state(struct dc *dc, s.idle); } DTN_INFO("\n"); +} + +void dcn10_log_hw_state(struct dc *dc, + struct dc_log_buffer_ctx *log_ctx) +{ + struct dc_context *dc_ctx = dc->ctx; + struct resource_pool *pool = dc->res_pool; + int i; + + DTN_INFO_BEGIN(); + + dcn10_log_hubbub_state(dc, log_ctx); + + dcn10_log_hubp_states(dc, log_ctx); + + dcn10_log_color_state(dc, log_ctx); DTN_INFO("OTG: v_bs v_be v_ss v_se vpol vmax vmin vmax_sel vmin_sel h_bs h_be h_ss h_se hpol htot vtot underflow blank_en\n"); -- 2.42.0
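The direction of the refactor — each DCN generation eventually plugging its own color-state dumper into a hook instead of every version sharing the DCN10 layout — can be sketched as follows (struct and names are illustrative only, not the real hwseq interface):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Per-generation hook for dumping color state. */
struct color_logger {
	const char *(*log_color_state)(void);
};

static const char *dcn10_color_state_stub(void) { return "DCN10 color state"; }
static const char *dcn30_color_state_stub(void) { return "DCN30 color state"; }

static const char *dump_color_state(const struct color_logger *hw)
{
	/* fall back to the DCN10 dump when a generation sets no hook */
	return hw->log_color_state ? hw->log_color_state()
				   : dcn10_color_state_stub();
}
```

Extracting `dcn10_log_color_state()` into its own function is the prerequisite for this kind of dispatch in the later patches of the series.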
[PATCH v3 4/9] drm/amd/display: fill up DCN3 DPP color state
DCN3 DPP color state was uncollected and some state elements from DCN1 doesn't fit DCN3. Create new elements according to DCN3 color caps and fill them up for DTN log output. rfc-v2: - fix reading of gamcor and blnd gamma states - remove gamut remap register in favor of gamut remap matrix reading Signed-off-by: Melissa Wen --- .../gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c | 37 ++- drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h | 8 2 files changed, 43 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c index 7c18f31bb56c..a3a769aad042 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c @@ -44,12 +44,45 @@ void dpp30_read_state(struct dpp *dpp_base, struct dcn_dpp_state *s) { struct dcn3_dpp *dpp = TO_DCN30_DPP(dpp_base); + uint32_t gamcor_lut_mode, rgam_lut_mode; REG_GET(DPP_CONTROL, - DPP_CLOCK_ENABLE, &s->is_enabled); + DPP_CLOCK_ENABLE, &s->is_enabled); - // TODO: Implement for DCN3 + // Pre-degamma (ROM) + REG_GET_2(PRE_DEGAM, + PRE_DEGAM_MODE, &s->pre_dgam_mode, + PRE_DEGAM_SELECT, &s->pre_dgam_select); + + // Gamma Correction (RAM) + REG_GET(CM_GAMCOR_CONTROL, + CM_GAMCOR_MODE_CURRENT, &s->gamcor_mode); + if (s->gamcor_mode) { + REG_GET(CM_GAMCOR_CONTROL, CM_GAMCOR_SELECT_CURRENT, &gamcor_lut_mode); + if (!gamcor_lut_mode) + s->gamcor_mode = LUT_RAM_A; // Otherwise, LUT_RAM_B + } + + // Shaper LUT (RAM), 3D LUT (mode, bit-depth, size) + REG_GET(CM_SHAPER_CONTROL, + CM_SHAPER_LUT_MODE, &s->shaper_lut_mode); + REG_GET(CM_3DLUT_MODE, + CM_3DLUT_MODE_CURRENT, &s->lut3d_mode); + REG_GET(CM_3DLUT_READ_WRITE_CONTROL, + CM_3DLUT_30BIT_EN, &s->lut3d_bit_depth); + REG_GET(CM_3DLUT_MODE, + CM_3DLUT_SIZE, &s->lut3d_size); + + // Blend/Out Gamma (RAM) + REG_GET(CM_BLNDGAM_CONTROL, + CM_BLNDGAM_MODE_CURRENT, &s->rgam_lut_mode); + if (s->rgam_lut_mode){ + REG_GET(CM_BLNDGAM_CONTROL, CM_BLNDGAM_SELECT_CURRENT, &rgam_lut_mode); 
+ if (!rgam_lut_mode) + s->rgam_lut_mode = LUT_RAM_A; // Otherwise, LUT_RAM_B + } } + /*program post scaler scs block in dpp CM*/ void dpp3_program_post_csc( struct dpp *dpp_base, diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h b/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h index b6acfd86642a..4e604bf24f51 100644 --- a/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h +++ b/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h @@ -151,6 +151,14 @@ struct dcn_dpp_state { uint32_t gamut_remap_c33_c34; // gamut_remap data for dcn*_log_color_state() struct dpp_grph_csc_adjustment gamut_remap; + uint32_t shaper_lut_mode; + uint32_t lut3d_mode; + uint32_t lut3d_bit_depth; + uint32_t lut3d_size; + uint32_t blnd_lut_mode; + uint32_t pre_dgam_mode; + uint32_t pre_dgam_select; + uint32_t gamcor_mode; }; struct CM_bias_params { -- 2.42.0
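The register reads in the patch above follow a two-step pattern: a `*_MODE_CURRENT` field says whether the block is bypassed, and a `*_SELECT_CURRENT` field says which of the two shadow RAMs is active. A simplified decoder of that pattern (names are illustrative; the real code stores DC enum values into the state struct):

```c
#include <assert.h>
#include <stdint.h>

enum lut_state { LUT_BYPASS = 0, LUT_RAM_A, LUT_RAM_B };

/* mode_current == 0 means the LUT block is bypassed; otherwise the
 * select bit picks RAM A (0) or RAM B (1). */
static enum lut_state decode_lut_state(uint32_t mode_current,
				       uint32_t select_current)
{
	if (!mode_current)
		return LUT_BYPASS;
	return select_current ? LUT_RAM_B : LUT_RAM_A;
}
```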
[PATCH v3 2/9] drm/amd/display: Add dpp_get_gamut_remap functions
From: Harry Wentland We want to be able to read the DPP's gamut remap matrix. v2: - code-style and doc comments clean-up (Melissa) Signed-off-by: Harry Wentland Signed-off-by: Melissa Wen --- .../drm/amd/display/dc/basics/conversion.c| 34 + .../drm/amd/display/dc/basics/conversion.h| 4 ++ .../amd/display/dc/dcn10/dcn10_cm_common.c| 20 ++ .../amd/display/dc/dcn10/dcn10_cm_common.h| 4 +- .../gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c | 3 +- .../gpu/drm/amd/display/dc/dcn10/dcn10_dpp.h | 3 + .../drm/amd/display/dc/dcn10/dcn10_dpp_cm.c | 70 ++- .../gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c | 1 + .../gpu/drm/amd/display/dc/dcn20/dcn20_dpp.h | 3 + .../drm/amd/display/dc/dcn20/dcn20_dpp_cm.c | 55 +++ .../drm/amd/display/dc/dcn201/dcn201_dpp.c| 1 + .../gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c | 1 + .../gpu/drm/amd/display/dc/dcn30/dcn30_dpp.h | 2 + .../drm/amd/display/dc/dcn30/dcn30_dpp_cm.c | 54 ++ .../gpu/drm/amd/display/dc/dcn32/dcn32_dpp.c | 1 + drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h | 3 + 16 files changed, 256 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/basics/conversion.c b/drivers/gpu/drm/amd/display/dc/basics/conversion.c index e295a839ab47..f0065a51beb9 100644 --- a/drivers/gpu/drm/amd/display/dc/basics/conversion.c +++ b/drivers/gpu/drm/amd/display/dc/basics/conversion.c @@ -101,6 +101,40 @@ void convert_float_matrix( } } +static struct fixed31_32 int_frac_to_fixed_point(uint16_t arg, +uint8_t integer_bits, +uint8_t fractional_bits) +{ + struct fixed31_32 result; + uint16_t sign_mask = 1 << (fractional_bits + integer_bits); + uint16_t value_mask = sign_mask - 1; + + result.value = (long long)(arg & value_mask) << + (FIXED31_32_BITS_PER_FRACTIONAL_PART - fractional_bits); + + if (arg & sign_mask) + result = dc_fixpt_neg(result); + + return result; +} + +/** + * convert_hw_matrix - converts HW values into fixed31_32 matrix. 
+ * @matrix: fixed point 31.32 matrix + * @reg: array of register values + * @buffer_size: size of the array of register values + * + * Converts HW register spec defined format S2D13 into a fixed-point 31.32 + * matrix. + */ +void convert_hw_matrix(struct fixed31_32 *matrix, + uint16_t *reg, + uint32_t buffer_size) +{ + for (int i = 0; i < buffer_size; ++i) + matrix[i] = int_frac_to_fixed_point(reg[i], 2, 13); +} + static uint32_t find_gcd(uint32_t a, uint32_t b) { uint32_t remainder = 0; diff --git a/drivers/gpu/drm/amd/display/dc/basics/conversion.h b/drivers/gpu/drm/amd/display/dc/basics/conversion.h index 81da4e6f7a1a..a433cef78496 100644 --- a/drivers/gpu/drm/amd/display/dc/basics/conversion.h +++ b/drivers/gpu/drm/amd/display/dc/basics/conversion.h @@ -41,6 +41,10 @@ void convert_float_matrix( void reduce_fraction(uint32_t num, uint32_t den, uint32_t *out_num, uint32_t *out_den); +void convert_hw_matrix(struct fixed31_32 *matrix, + uint16_t *reg, + uint32_t buffer_size); + static inline unsigned int log_2(unsigned int num) { return ilog2(num); diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.c index 3538973bd0c6..b7e57aa27361 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.c +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.c @@ -62,6 +62,26 @@ void cm_helper_program_color_matrices( } +void cm_helper_read_color_matrices(struct dc_context *ctx, + uint16_t *regval, + const struct color_matrices_reg *reg) +{ + uint32_t cur_csc_reg, regval0, regval1; + unsigned int i = 0; + + for (cur_csc_reg = reg->csc_c11_c12; +cur_csc_reg <= reg->csc_c33_c34; cur_csc_reg++) { + REG_GET_2(cur_csc_reg, + csc_c11, ®val0, + csc_c12, ®val1); + + regval[2 * i] = regval0; + regval[(2 * i) + 1] = regval1; + + i++; + } +} + void cm_helper_program_xfer_func( struct dc_context *ctx, const struct pwl_params *params, diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.h 
b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.h index 0a68b63d6126..decc50b1ac53 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.h +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.h @@ -114,5 +114,7 @@ bool cm_helper_translate_curve_to_degamma_hw_format( const struct dc_transfer_func *output_tf, struct pwl_params
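The S2D13 decode in `int_frac_to_fixed_point()` above (1 sign bit, 2 integer bits, 13 fractional bits, with the sign handled as written in the patch) can be checked against a small standalone model:

```c
#include <assert.h>
#include <stdint.h>

/* Standalone model of the S2D13 decode: mask off the sign bit, scale
 * the remaining 15 bits by 2^-13, and negate when the sign bit is set,
 * mirroring the sign handling in int_frac_to_fixed_point(). */
static double s2d13_to_double(uint16_t reg)
{
	const uint16_t sign_mask = 1u << 15; /* 2 integer + 13 fractional bits below */
	double v = (double)(reg & (sign_mask - 1)) / (double)(1 << 13);

	return (reg & sign_mask) ? -v : v;
}
```

So a register value of `0x2000` decodes to exactly 1.0, which is why `convert_hw_matrix()` passes `integer_bits = 2` and `fractional_bits = 13`.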
[PATCH v3 3/9] drm/amd/display: read gamut remap matrix in fixed-point 31.32 format
Instead of read gamut remap data from hw values, convert HW register values (S2D13) into a fixed-point 31.32 matrix for color state log. Change DCN10 log to print data in the format of the gamut remap matrix. Signed-off-by: Melissa Wen --- .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 38 +-- drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h | 3 ++ 2 files changed, 29 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c index 9b801488eb9d..f0a9f8818909 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c @@ -289,20 +289,26 @@ static void dcn10_log_color_state(struct dc *dc, struct resource_pool *pool = dc->res_pool; int i; - DTN_INFO("DPP:IGAM format IGAM modeDGAM modeRGAM mode" - " GAMUT mode C11 C12 C13 C14 C21 C22 C23 C24 " - "C31 C32 C33 C34\n"); + DTN_INFO("DPP:IGAM formatIGAM modeDGAM modeRGAM mode" +" GAMUT adjust " +"C11C12C13C14" +"C21C22C23C24" +"C31C32C33C34\n"); for (i = 0; i < pool->pipe_count; i++) { struct dpp *dpp = pool->dpps[i]; struct dcn_dpp_state s = {0}; dpp->funcs->dpp_read_state(dpp, &s); + dpp->funcs->dpp_get_gamut_remap(dpp, &s.gamut_remap); if (!s.is_enabled) continue; - DTN_INFO("[%2d]: %11xh %-11s %-11s %-11s" - "%8x%08xh %08xh %08xh %08xh %08xh %08xh", + DTN_INFO("[%2d]: %11xh %11s%9s%9s" +" %12s " +"%010lld %010lld %010lld %010lld " +"%010lld %010lld %010lld %010lld " +"%010lld %010lld %010lld %010lld", dpp->inst, s.igam_input_format, (s.igam_lut_mode == 0) ? "BypassFixed" : @@ -322,13 +328,21 @@ static void dcn10_log_color_state(struct dc *dc, ((s.rgam_lut_mode == 3) ? "RAM" : ((s.rgam_lut_mode == 4) ? "RAM" : "Unknown", - s.gamut_remap_mode, - s.gamut_remap_c11_c12, - s.gamut_remap_c13_c14, - s.gamut_remap_c21_c22, - s.gamut_remap_c23_c24, - s.gamut_remap_c31_c32, - s.gamut_remap_c33_c34); + (s.gamut_remap.gamut_adjust_type == 0) ? 
"Bypass" : + ((s.gamut_remap.gamut_adjust_type == 1) ? "HW" : + "SW"), + s.gamut_remap.temperature_matrix[0].value, + s.gamut_remap.temperature_matrix[1].value, + s.gamut_remap.temperature_matrix[2].value, + s.gamut_remap.temperature_matrix[3].value, + s.gamut_remap.temperature_matrix[4].value, + s.gamut_remap.temperature_matrix[5].value, + s.gamut_remap.temperature_matrix[6].value, + s.gamut_remap.temperature_matrix[7].value, + s.gamut_remap.temperature_matrix[8].value, + s.gamut_remap.temperature_matrix[9].value, + s.gamut_remap.temperature_matrix[10].value, + s.gamut_remap.temperature_matrix[11].value); DTN_INFO("\n"); } DTN_INFO("\n"); diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h b/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h index 597ebdb4da4c..b6acfd86642a 100644 --- a/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h +++ b/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h @@ -141,6 +141,7 @@ struct dcn_dpp_state { uint32_t igam_input_format; uint32_t dgam_lut_mode; uint32_t rgam_lut_mode; + // gamut_remap data for dcn10_get_cm_states() uint32_t gamut_remap_mode; uint32_t gamut_remap_c11_c12; uint32_t gamut_remap_c13_c14; @@ -148,6 +149,8 @@ struct dcn_dpp_state { uint32_t gamut_remap_c23_c24; uint32_t gamut_remap_c31_c32; uint32_t gamut_remap_c33_c34; + // gamut_remap data for dcn*_log_color_state() + struct dpp_grph
[PATCH v3 5/9] drm/amd/display: add get_gamut_remap helper for MPC3
We want to be able to read the MPC's gamut remap matrix similar to what we do with .dpp_get_gamut_remap functions. On the other hand, we don't need a hook here because only DCN3+ has the MPC gamut remap block, being absent in previous families. Signed-off-by: Melissa Wen --- .../gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c | 58 +++ .../gpu/drm/amd/display/dc/dcn30/dcn30_mpc.h | 4 ++ 2 files changed, 62 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c index d1500b223858..a6a4c3413f89 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c @@ -1129,6 +1129,64 @@ void mpc3_set_gamut_remap( } } +static void read_gamut_remap(struct dcn30_mpc *mpc30, +int mpcc_id, +uint16_t *regval, +uint32_t *select) +{ + struct color_matrices_reg gam_regs; + + //current coefficient set in use + REG_GET(MPCC_GAMUT_REMAP_MODE[mpcc_id], MPCC_GAMUT_REMAP_MODE_CURRENT, select); + + gam_regs.shifts.csc_c11 = mpc30->mpc_shift->MPCC_GAMUT_REMAP_C11_A; + gam_regs.masks.csc_c11 = mpc30->mpc_mask->MPCC_GAMUT_REMAP_C11_A; + gam_regs.shifts.csc_c12 = mpc30->mpc_shift->MPCC_GAMUT_REMAP_C12_A; + gam_regs.masks.csc_c12 = mpc30->mpc_mask->MPCC_GAMUT_REMAP_C12_A; + + if (*select == GAMUT_REMAP_COEFF) { + gam_regs.csc_c11_c12 = REG(MPC_GAMUT_REMAP_C11_C12_A[mpcc_id]); + gam_regs.csc_c33_c34 = REG(MPC_GAMUT_REMAP_C33_C34_A[mpcc_id]); + + cm_helper_read_color_matrices( + mpc30->base.ctx, + regval, + &gam_regs); + + } else if (*select == GAMUT_REMAP_COMA_COEFF) { + + gam_regs.csc_c11_c12 = REG(MPC_GAMUT_REMAP_C11_C12_B[mpcc_id]); + gam_regs.csc_c33_c34 = REG(MPC_GAMUT_REMAP_C33_C34_B[mpcc_id]); + + cm_helper_read_color_matrices( + mpc30->base.ctx, + regval, + &gam_regs); + + } + +} + +void mpc3_get_gamut_remap(struct mpc *mpc, + int mpcc_id, + struct mpc_grph_gamut_adjustment *adjust) +{ + struct dcn30_mpc *mpc30 = TO_DCN30_MPC(mpc); + uint16_t arr_reg_val[12]; + int select; 
+ + read_gamut_remap(mpc30, mpcc_id, arr_reg_val, &select); + + if (select == GAMUT_REMAP_BYPASS) { + adjust->gamut_adjust_type = GRAPHICS_GAMUT_ADJUST_TYPE_BYPASS; + return; + } + + adjust->gamut_adjust_type = GRAPHICS_GAMUT_ADJUST_TYPE_SW; + convert_hw_matrix(adjust->temperature_matrix, + arr_reg_val, ARRAY_SIZE(arr_reg_val)); +} + bool mpc3_program_3dlut( struct mpc *mpc, const struct tetrahedral_params *params, diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.h b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.h index 5198f2167c7c..9cb96ae95a2f 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.h +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.h @@ -1056,6 +1056,10 @@ void mpc3_set_gamut_remap( int mpcc_id, const struct mpc_grph_gamut_adjustment *adjust); +void mpc3_get_gamut_remap(struct mpc *mpc, + int mpcc_id, + struct mpc_grph_gamut_adjustment *adjust); + void mpc3_set_rmu_mux( struct mpc *mpc, int rmu_idx, -- 2.42.0
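The readback above has to cope with double buffering: a status field reports whether the remap is bypassed or which of the two coefficient sets (A or B) the hardware is currently using, and only that bank is read back. A toy model of the pattern — the names, enum values, and layout here are hypothetical, not the real MPC registers:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of double-buffered gamut remap readback. */
enum remap_select { REMAP_BYPASS = 0, REMAP_COEFF_A = 1, REMAP_COEFF_B = 2 };

struct fake_mpcc {
	enum remap_select current;	/* analogue of *_MODE_CURRENT */
	uint16_t set_a[12];		/* coefficient set A */
	uint16_t set_b[12];		/* coefficient set B */
};

/* Returns 0 for bypass (caller reports "Bypass"), 1 if the live
 * coefficient bank was copied out for conversion to 31.32. */
static int read_remap(const struct fake_mpcc *m, uint16_t out[12])
{
	const uint16_t *bank;
	int i;

	if (m->current == REMAP_BYPASS)
		return 0;
	bank = (m->current == REMAP_COEFF_A) ? m->set_a : m->set_b;
	for (i = 0; i < 12; i++)
		out[i] = bank[i];
	return 1;
}
```

This mirrors why read_gamut_remap() in the patch selects between the _A and _B register sets based on MPCC_GAMUT_REMAP_MODE_CURRENT before calling cm_helper_read_color_matrices().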
[PATCH v3 7/9] drm/amd/display: hook up DCN30 color blocks data to DTN log
Color caps changed between HW versions, which caused the DCN10 color state sections in the DTN log to no longer match DCN3+ state. Create a color state log specific to DCN3.0 and hook it up to DCN3.0+ and DCN3.1+ drivers. rfc-v2: - detail RAM mode for gamcor and blnd gamma blocks - add MPC gamut remap matrix log v3: - read MPC gamut remap matrix in fixed 31.32 format - extend to DCN3.0+ and DCN3.1+ drivers (Harry) Signed-off-by: Melissa Wen --- .../gpu/drm/amd/display/dc/dcn30/dcn30_init.c | 1 + .../drm/amd/display/dc/dcn301/dcn301_init.c | 1 + .../gpu/drm/amd/display/dc/dcn31/dcn31_init.c | 1 + .../drm/amd/display/dc/dcn314/dcn314_init.c | 1 + .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 5 +- .../amd/display/dc/hwss/dcn30/dcn30_hwseq.c | 126 ++ .../amd/display/dc/hwss/dcn30/dcn30_hwseq.h | 3 + .../drm/amd/display/dc/hwss/hw_sequencer.h| 2 + 8 files changed, 139 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_init.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_init.c index 9894caedffed..4064e6b7f599 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_init.c +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_init.c @@ -68,6 +68,7 @@ static const struct hw_sequencer_funcs dcn30_funcs = { .setup_stereo = dcn10_setup_stereo, .set_avmute = dcn30_set_avmute, .log_hw_state = dcn10_log_hw_state, + .log_color_state = dcn30_log_color_state, .get_hw_state = dcn10_get_hw_state, .clear_status_bits = dcn10_clear_status_bits, .wait_for_mpcc_disconnect = dcn10_wait_for_mpcc_disconnect, diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c index 6477009ce065..1a9122252702 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c +++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c @@ -72,6 +72,7 @@ static const struct hw_sequencer_funcs dcn301_funcs = { .setup_stereo = dcn10_setup_stereo, .set_avmute = dcn30_set_avmute, .log_hw_state = dcn10_log_hw_state, + 
.log_color_state = dcn30_log_color_state, .get_hw_state = dcn10_get_hw_state, .clear_status_bits = dcn10_clear_status_bits, .wait_for_mpcc_disconnect = dcn10_wait_for_mpcc_disconnect, diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c index 669f524bd064..61577a3678a0 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c +++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c @@ -71,6 +71,7 @@ static const struct hw_sequencer_funcs dcn31_funcs = { .setup_stereo = dcn10_setup_stereo, .set_avmute = dcn30_set_avmute, .log_hw_state = dcn10_log_hw_state, + .log_color_state = dcn30_log_color_state, .get_hw_state = dcn10_get_hw_state, .clear_status_bits = dcn10_clear_status_bits, .wait_for_mpcc_disconnect = dcn10_wait_for_mpcc_disconnect, diff --git a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_init.c b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_init.c index ccb7e317e86a..094b912832c1 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_init.c +++ b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_init.c @@ -73,6 +73,7 @@ static const struct hw_sequencer_funcs dcn314_funcs = { .setup_stereo = dcn10_setup_stereo, .set_avmute = dcn30_set_avmute, .log_hw_state = dcn10_log_hw_state, + .log_color_state = dcn30_log_color_state, .get_hw_state = dcn10_get_hw_state, .clear_status_bits = dcn10_clear_status_bits, .wait_for_mpcc_disconnect = dcn10_wait_for_mpcc_disconnect, diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c index f0a9f8818909..f7d9bcdbc6c6 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c @@ -374,7 +374,10 @@ void dcn10_log_hw_state(struct dc *dc, dcn10_log_hubp_states(dc, log_ctx); - dcn10_log_color_state(dc, log_ctx); + if (dc->hwss.log_color_state) + dc->hwss.log_color_state(dc, log_ctx); + else + dcn10_log_color_state(dc, log_ctx); 
DTN_INFO("OTG: v_bs v_be v_ss v_se vpol vmax vmin vmax_sel vmin_sel h_bs h_be h_ss h_se hpol htot vtot underflow blank_en\n"); diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c index d71faf2ecd41..1e07f0a6be1f 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c @@ -69,6 +69,132 @@ #define FN(reg_name, field_name) \ hws->shifts->field_name, hws->masks->field_name +void dcn30_log_color_state(struct dc *dc, + struct dc_log_buffer_ctx *log_ctx) +{ +
[PATCH v3 8/9] drm/amd/display: add DPP and MPC color caps to DTN log
Add color caps information for DPP and MPC block to show HW color caps. Signed-off-by: Melissa Wen --- .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 23 +++ .../amd/display/dc/hwss/dcn30/dcn30_hwseq.c | 23 +++ 2 files changed, 46 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c index f7d9bcdbc6c6..d3cab6fb270b 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c @@ -346,6 +346,24 @@ static void dcn10_log_color_state(struct dc *dc, DTN_INFO("\n"); } DTN_INFO("\n"); + DTN_INFO("DPP Color Caps: input_lut_shared:%d icsc:%d" +" dgam_ram:%d dgam_rom: srgb:%d,bt2020:%d,gamma2_2:%d,pq:%d,hlg:%d" +" post_csc:%d gamcor:%d dgam_rom_for_yuv:%d 3d_lut:%d" +" blnd_lut:%d oscs:%d\n\n", +dc->caps.color.dpp.input_lut_shared, +dc->caps.color.dpp.icsc, +dc->caps.color.dpp.dgam_ram, +dc->caps.color.dpp.dgam_rom_caps.srgb, +dc->caps.color.dpp.dgam_rom_caps.bt2020, +dc->caps.color.dpp.dgam_rom_caps.gamma2_2, +dc->caps.color.dpp.dgam_rom_caps.pq, +dc->caps.color.dpp.dgam_rom_caps.hlg, +dc->caps.color.dpp.post_csc, +dc->caps.color.dpp.gamma_corr, +dc->caps.color.dpp.dgam_rom_for_yuv, +dc->caps.color.dpp.hw_3d_lut, +dc->caps.color.dpp.ogam_ram, +dc->caps.color.dpp.ocsc); DTN_INFO("MPCC: OPP DPP MPCCBOT MODE ALPHA_MODE PREMULT OVERLAP_ONLY IDLE\n"); for (i = 0; i < pool->pipe_count; i++) { @@ -359,6 +377,11 @@ static void dcn10_log_color_state(struct dc *dc, s.idle); } DTN_INFO("\n"); + DTN_INFO("MPC Color Caps: gamut_remap:%d, 3dlut:%d, ogam_ram:%d, ocsc:%d\n\n", +dc->caps.color.mpc.gamut_remap, +dc->caps.color.mpc.num_3dluts, +dc->caps.color.mpc.ogam_ram, +dc->caps.color.mpc.ocsc); } void dcn10_log_hw_state(struct dc *dc, diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c index 1e07f0a6be1f..3b38af592101 100644 --- 
a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c @@ -140,6 +140,24 @@ void dcn30_log_color_state(struct dc *dc, DTN_INFO("\n"); } DTN_INFO("\n"); + DTN_INFO("DPP Color Caps: input_lut_shared:%d icsc:%d" +" dgam_ram:%d dgam_rom: srgb:%d,bt2020:%d,gamma2_2:%d,pq:%d,hlg:%d" +" post_csc:%d gamcor:%d dgam_rom_for_yuv:%d 3d_lut:%d" +" blnd_lut:%d oscs:%d\n\n", +dc->caps.color.dpp.input_lut_shared, +dc->caps.color.dpp.icsc, +dc->caps.color.dpp.dgam_ram, +dc->caps.color.dpp.dgam_rom_caps.srgb, +dc->caps.color.dpp.dgam_rom_caps.bt2020, +dc->caps.color.dpp.dgam_rom_caps.gamma2_2, +dc->caps.color.dpp.dgam_rom_caps.pq, +dc->caps.color.dpp.dgam_rom_caps.hlg, +dc->caps.color.dpp.post_csc, +dc->caps.color.dpp.gamma_corr, +dc->caps.color.dpp.dgam_rom_for_yuv, +dc->caps.color.dpp.hw_3d_lut, +dc->caps.color.dpp.ogam_ram, +dc->caps.color.dpp.ocsc); DTN_INFO("MPCC: OPP DPP MPCCBOT MODE ALPHA_MODE PREMULT OVERLAP_ONLY IDLE" " SHAPER mode 3DLUT mode 3DLUT bit-depth 3DLUT size OGAM mode OGAM LUT" @@ -193,6 +211,11 @@ void dcn30_log_color_state(struct dc *dc, } DTN_INFO("\n"); + DTN_INFO("MPC Color Caps: gamut_remap:%d, 3dlut:%d, ogam_ram:%d, ocsc:%d\n\n", +dc->caps.color.mpc.gamut_remap, +dc->caps.color.mpc.num_3dluts, +dc->caps.color.mpc.ogam_ram, +dc->caps.color.mpc.ocsc); } bool dcn30_set_blend_lut( -- 2.42.0
[PATCH v3 9/9] drm/amd/display: hook up DCN20 color blocks data to DTN log
Color caps changed between HW versions, which caused the DCN10 color state sections in the DTN log to no longer match DCN2+ state. Create a color state log specific to DCN2.0 and hook it up to DCN2 family drivers. Instead of reading gamut remap reg values, display gamut remap matrix data in fixed 31.32. Signed-off-by: Melissa Wen --- .../gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c | 30 ++--- .../gpu/drm/amd/display/dc/dcn20/dcn20_init.c | 1 + .../gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c | 24 +++- .../gpu/drm/amd/display/dc/dcn21/dcn21_init.c | 1 + .../amd/display/dc/hwss/dcn20/dcn20_hwseq.c | 106 ++ .../amd/display/dc/hwss/dcn20/dcn20_hwseq.h | 2 + 6 files changed, 149 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c index dedc2dcf2691..1516c0a48726 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c @@ -55,21 +55,23 @@ void dpp20_read_state(struct dpp *dpp_base, REG_GET(DPP_CONTROL, DPP_CLOCK_ENABLE, &s->is_enabled); + + // Degamma LUT (RAM) REG_GET(CM_DGAM_CONTROL, - CM_DGAM_LUT_MODE, &s->dgam_lut_mode); - // BGAM has no ROM, and definition is different, can't reuse same dump - //REG_GET(CM_BLNDGAM_CONTROL, - // CM_BLNDGAM_LUT_MODE, &s->rgam_lut_mode); - REG_GET(CM_GAMUT_REMAP_CONTROL, - CM_GAMUT_REMAP_MODE, &s->gamut_remap_mode); - if (s->gamut_remap_mode) { - s->gamut_remap_c11_c12 = REG_READ(CM_GAMUT_REMAP_C11_C12); - s->gamut_remap_c13_c14 = REG_READ(CM_GAMUT_REMAP_C13_C14); - s->gamut_remap_c21_c22 = REG_READ(CM_GAMUT_REMAP_C21_C22); - s->gamut_remap_c23_c24 = REG_READ(CM_GAMUT_REMAP_C23_C24); - s->gamut_remap_c31_c32 = REG_READ(CM_GAMUT_REMAP_C31_C32); - s->gamut_remap_c33_c34 = REG_READ(CM_GAMUT_REMAP_C33_C34); - } + CM_DGAM_LUT_MODE, &s->dgam_lut_mode); + + // Shaper LUT (RAM), 3D LUT (mode, bit-depth, size) + REG_GET(CM_SHAPER_CONTROL, + CM_SHAPER_LUT_MODE, &s->shaper_lut_mode); + 
REG_GET_2(CM_3DLUT_READ_WRITE_CONTROL, + CM_3DLUT_CONFIG_STATUS, &s->lut3d_mode, + CM_3DLUT_30BIT_EN, &s->lut3d_bit_depth); + REG_GET(CM_3DLUT_MODE, + CM_3DLUT_SIZE, &s->lut3d_size); + + // Blend/Out Gamma (RAM) + REG_GET(CM_BLNDGAM_LUT_WRITE_EN_MASK, + CM_BLNDGAM_CONFIG_STATUS, &s->rgam_lut_mode); } void dpp2_power_on_obuf( diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_init.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_init.c index 884e3e323338..ef6488165b8f 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_init.c +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_init.c @@ -67,6 +67,7 @@ static const struct hw_sequencer_funcs dcn20_funcs = { .setup_stereo = dcn10_setup_stereo, .set_avmute = dce110_set_avmute, .log_hw_state = dcn10_log_hw_state, + .log_color_state = dcn20_log_color_state, .get_hw_state = dcn10_get_hw_state, .clear_status_bits = dcn10_clear_status_bits, .wait_for_mpcc_disconnect = dcn10_wait_for_mpcc_disconnect, diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c index 5da6e44f284a..16b5ff208d14 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c @@ -542,8 +542,30 @@ static struct mpcc *mpc2_get_mpcc_for_dpp(struct mpc_tree *tree, int dpp_id) return NULL; } +static void mpc2_read_mpcc_state( + struct mpc *mpc, + int mpcc_inst, + struct mpcc_state *s) +{ + struct dcn20_mpc *mpc20 = TO_DCN20_MPC(mpc); + + REG_GET(MPCC_OPP_ID[mpcc_inst], MPCC_OPP_ID, &s->opp_id); + REG_GET(MPCC_TOP_SEL[mpcc_inst], MPCC_TOP_SEL, &s->dpp_id); + REG_GET(MPCC_BOT_SEL[mpcc_inst], MPCC_BOT_SEL, &s->bot_mpcc_id); + REG_GET_4(MPCC_CONTROL[mpcc_inst], MPCC_MODE, &s->mode, + MPCC_ALPHA_BLND_MODE, &s->alpha_mode, + MPCC_ALPHA_MULTIPLIED_MODE, &s->pre_multiplied_alpha, + MPCC_BLND_ACTIVE_OVERLAP_ONLY, &s->overlap_only); + REG_GET_2(MPCC_STATUS[mpcc_inst], MPCC_IDLE, &s->idle, + MPCC_BUSY, &s->busy); + + /* Gamma block state */ + 
REG_GET(MPCC_OGAM_LUT_RAM_CONTROL[mpcc_inst], + MPCC_OGAM_CONFIG_STATUS, &s->rgam_mode); +} + static const struct mpc_funcs dcn20_mpc_funcs = { - .read_mpcc_state = mpc1_read_mpcc_state, + .read_mpcc_state = mpc2_read_mpcc_state, .insert_plane = mpc1_insert_plane, .remove_mpcc = mpc1_remove_mpcc, .mpc_init = mpc1_mpc_init, diff --git a/dri
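The REG_GET/REG_GET_2/REG_GET_4 calls throughout these hunks all reduce to the same mechanic: read a 32-bit register once and extract one or more named fields via a mask and shift. A simplified stand-in (the real macros also resolve per-ASIC shift/mask tables, which this sketch omits; the field layout below is invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the driver's REG_GET field extraction:
 * isolate a field with its mask, then right-align it with its shift. */
static uint32_t reg_get_field(uint32_t regval, uint32_t mask, uint32_t shift)
{
	return (regval & mask) >> shift;
}
```

Variants like REG_GET_4 just repeat this extraction for several fields of a single register read, which is why one MPCC_CONTROL read can fill mode, alpha_mode, pre_multiplied_alpha, and overlap_only at once.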
[PATCH v3 6/9] drm/amd/display: create DCN3-specific log for MPC state
Logging DCN3 MPC state was following DCN1 implementation that doesn't consider new DCN3 MPC color blocks. Create new elements according to DCN3 MPC color caps and a new DCN3-specific function for reading MPC data. v3: - remove gamut remap reg reading in favor of fixed31_32 matrix data Signed-off-by: Melissa Wen --- .../gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c | 48 ++- drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h | 7 +++ 2 files changed, 54 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c index a6a4c3413f89..bf3386cd444d 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c @@ -1440,8 +1440,54 @@ static void mpc3_set_mpc_mem_lp_mode(struct mpc *mpc) } } +static void mpc3_read_mpcc_state( + struct mpc *mpc, + int mpcc_inst, + struct mpcc_state *s) +{ + struct dcn30_mpc *mpc30 = TO_DCN30_MPC(mpc); + uint32_t rmu_status = 0xf; + + REG_GET(MPCC_OPP_ID[mpcc_inst], MPCC_OPP_ID, &s->opp_id); + REG_GET(MPCC_TOP_SEL[mpcc_inst], MPCC_TOP_SEL, &s->dpp_id); + REG_GET(MPCC_BOT_SEL[mpcc_inst], MPCC_BOT_SEL, &s->bot_mpcc_id); + REG_GET_4(MPCC_CONTROL[mpcc_inst], MPCC_MODE, &s->mode, + MPCC_ALPHA_BLND_MODE, &s->alpha_mode, + MPCC_ALPHA_MULTIPLIED_MODE, &s->pre_multiplied_alpha, + MPCC_BLND_ACTIVE_OVERLAP_ONLY, &s->overlap_only); + REG_GET_2(MPCC_STATUS[mpcc_inst], MPCC_IDLE, &s->idle, + MPCC_BUSY, &s->busy); + + /* Color blocks state */ + REG_GET(MPC_RMU_CONTROL, MPC_RMU0_MUX_STATUS, &rmu_status); + + if (rmu_status == mpcc_inst) { + REG_GET(SHAPER_CONTROL[0], + MPC_RMU_SHAPER_LUT_MODE_CURRENT, &s->shaper_lut_mode); + REG_GET(RMU_3DLUT_MODE[0], + MPC_RMU_3DLUT_MODE_CURRENT, &s->lut3d_mode); + REG_GET(RMU_3DLUT_READ_WRITE_CONTROL[0], + MPC_RMU_3DLUT_30BIT_EN, &s->lut3d_bit_depth); + REG_GET(RMU_3DLUT_MODE[0], + MPC_RMU_3DLUT_SIZE, &s->lut3d_size); + } else { + REG_GET(SHAPER_CONTROL[1], + MPC_RMU_SHAPER_LUT_MODE_CURRENT, 
&s->shaper_lut_mode); + REG_GET(RMU_3DLUT_MODE[1], + MPC_RMU_3DLUT_MODE_CURRENT, &s->lut3d_mode); + REG_GET(RMU_3DLUT_READ_WRITE_CONTROL[1], + MPC_RMU_3DLUT_30BIT_EN, &s->lut3d_bit_depth); + REG_GET(RMU_3DLUT_MODE[1], + MPC_RMU_3DLUT_SIZE, &s->lut3d_size); + } + +REG_GET_2(MPCC_OGAM_CONTROL[mpcc_inst], + MPCC_OGAM_MODE_CURRENT, &s->rgam_mode, + MPCC_OGAM_SELECT_CURRENT, &s->rgam_lut); +} + static const struct mpc_funcs dcn30_mpc_funcs = { - .read_mpcc_state = mpc1_read_mpcc_state, + .read_mpcc_state = mpc3_read_mpcc_state, .insert_plane = mpc1_insert_plane, .remove_mpcc = mpc1_remove_mpcc, .mpc_init = mpc1_mpc_init, diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h b/drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h index 61a2406dcc53..a11e40fddc44 100644 --- a/drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h +++ b/drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h @@ -199,6 +199,13 @@ struct mpcc_state { uint32_t overlap_only; uint32_t idle; uint32_t busy; + uint32_t shaper_lut_mode; + uint32_t lut3d_mode; + uint32_t lut3d_bit_depth; + uint32_t lut3d_size; + uint32_t rgam_mode; + uint32_t rgam_lut; + struct mpc_grph_gamut_adjustment gamut_remap; }; /** -- 2.42.0
Re: [PATCH 1/3] Revert "drm/prime: Unexport helpers for fd/handle conversion"
On 2023-11-28 12:22, Alex Deucher wrote: On Thu, Nov 23, 2023 at 6:12 PM Felix Kuehling wrote: [+Alex] On 2023-11-17 16:44, Felix Kuehling wrote: This reverts commit 71a7974ac7019afeec105a54447ae1dc7216cbb3. These helper functions are needed for KFD to export and import DMABufs the right way without duplicating the tracking of DMABufs associated with GEM objects while ensuring that move notifier callbacks are working as intended. CC: Christian König CC: Thomas Zimmermann Signed-off-by: Felix Kuehling Re: our discussion about v2 of this patch: If this version is acceptable, can I get an R-b or A-b? I would like to get this patch into drm-next as a prerequisite for patches 2 and 3. I cannot submit it to the current amd-staging-drm-next because the patch I'm reverting doesn't exist there yet. Patch 2 and 3 could go into drm-next as well, or go through Alex's amd-staging-drm-next branch once patch 1 is in drm-next. Alex, how do you prefer to coordinate this? I guess ideally this would go through my drm-next tree since your other patches depend on it unless others feel strongly that it should go through drm-misc. Yes, drm-next would work best for applying this patch and the two patches that depend on it. I can send you the rebased patches from my drm-next based branch that I used for testing this. 
Regards, Felix Alex Regards, Felix --- drivers/gpu/drm/drm_prime.c | 33 ++--- include/drm/drm_prime.h | 7 +++ 2 files changed, 25 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c index 63b709a67471..834a5e28abbe 100644 --- a/drivers/gpu/drm/drm_prime.c +++ b/drivers/gpu/drm/drm_prime.c @@ -278,7 +278,7 @@ void drm_gem_dmabuf_release(struct dma_buf *dma_buf) } EXPORT_SYMBOL(drm_gem_dmabuf_release); -/* +/** * drm_gem_prime_fd_to_handle - PRIME import function for GEM drivers * @dev: drm_device to import into * @file_priv: drm file-private structure @@ -292,9 +292,9 @@ EXPORT_SYMBOL(drm_gem_dmabuf_release); * * Returns 0 on success or a negative error code on failure. */ -static int drm_gem_prime_fd_to_handle(struct drm_device *dev, - struct drm_file *file_priv, int prime_fd, - uint32_t *handle) +int drm_gem_prime_fd_to_handle(struct drm_device *dev, +struct drm_file *file_priv, int prime_fd, +uint32_t *handle) { struct dma_buf *dma_buf; struct drm_gem_object *obj; @@ -360,6 +360,7 @@ static int drm_gem_prime_fd_to_handle(struct drm_device *dev, dma_buf_put(dma_buf); return ret; } +EXPORT_SYMBOL(drm_gem_prime_fd_to_handle); int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) @@ -408,7 +409,7 @@ static struct dma_buf *export_and_register_object(struct drm_device *dev, return dmabuf; } -/* +/** * drm_gem_prime_handle_to_fd - PRIME export function for GEM drivers * @dev: dev to export the buffer from * @file_priv: drm file-private structure @@ -421,10 +422,10 @@ static struct dma_buf *export_and_register_object(struct drm_device *dev, * The actual exporting from GEM object to a dma-buf is done through the * &drm_gem_object_funcs.export callback. 
*/ -static int drm_gem_prime_handle_to_fd(struct drm_device *dev, - struct drm_file *file_priv, uint32_t handle, - uint32_t flags, - int *prime_fd) +int drm_gem_prime_handle_to_fd(struct drm_device *dev, +struct drm_file *file_priv, uint32_t handle, +uint32_t flags, +int *prime_fd) { struct drm_gem_object *obj; int ret = 0; @@ -506,6 +507,7 @@ static int drm_gem_prime_handle_to_fd(struct drm_device *dev, return ret; } +EXPORT_SYMBOL(drm_gem_prime_handle_to_fd); int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) @@ -864,9 +866,9 @@ EXPORT_SYMBOL(drm_prime_get_contiguous_size); * @obj: GEM object to export * @flags: flags like DRM_CLOEXEC and DRM_RDWR * - * This is the implementation of the &drm_gem_object_funcs.export functions - * for GEM drivers using the PRIME helpers. It is used as the default for - * drivers that do not set their own. + * This is the implementation of the &drm_gem_object_funcs.export functions for GEM drivers + * using the PRIME helpers. It is used as the default in + * drm_gem_prime_handle_to_fd(). */ struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj, int flags) @@ -962,9 +964,10 @@ EXPORT_SYMBOL(drm_gem_prime_import_dev); * @dev: drm_device to import into
Re: [PATCH v3] drm/amdkfd: Run restore_workers on freezable WQs
On 2023-11-24 8:40, Lazar, Lijo wrote: On 11/24/2023 4:25 AM, Felix Kuehling wrote: Make restore workers freezable so we don't have to explicitly flush them in suspend and GPU reset code paths, and we don't accidentally try to restore BOs while the GPU is suspended. Not having to flush restore_work also helps avoid lock/fence dependencies in the GPU reset case where we're not allowed to wait for fences. A side effect of this is that we can now have multiple concurrent threads trying to signal the same eviction fence. Rework eviction fence signaling and replacement to account for that. The GPU reset path can no longer rely on restore_process_worker to resume queues because evict/restore workers can run independently of it. Instead call a new restore_process_helper directly. Not familiar with this code. For clarity, does this mean eviction/restore may happen while a reset is in progress? I'm not sure if anything would be able to cause an eviction while a GPU reset is in progress. I don't think it's likely. Any actual migration of BOs would need to wait until after the reset is done anyway. In case of suspend/resume, evictions happen because all the VRAM gets saved to system memory. Suspend/resume and evictions depend on the same helper or worker to restore BOs before reenabling user mode queues. GPU reset needs the same helper to stop user mode queues to ensure that applications don't continue running with potentially corrupted data. This patch removes some synchronization between different users of these common code paths to fix deadlocks in the suspend/resume and GPU reset scenarios. Regards, Felix Thanks, Lijo This is an RFC and request for testing. 
v2: - Reworked eviction fence signaling - Introduced restore_process_helper v3: - Handle unsignaled eviction fences in restore_process_bos Signed-off-by: Felix Kuehling Acked-by: Christian König --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 68 +++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 87 +++ drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 +- 3 files changed, 104 insertions(+), 55 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 2e302956a279..bdec88713a09 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1431,7 +1431,6 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info, amdgpu_amdkfd_restore_userptr_worker); *process_info = info; - *ef = dma_fence_get(&info->eviction_fence->base); } vm->process_info = *process_info; @@ -1462,6 +1461,8 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info, list_add_tail(&vm->vm_list_node, &(vm->process_info->vm_list_head)); vm->process_info->n_vms++; + + *ef = dma_fence_get(&vm->process_info->eviction_fence->base); mutex_unlock(&vm->process_info->lock); return 0; @@ -1473,10 +1474,7 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info, reserve_pd_fail: vm->process_info = NULL; if (info) { - /* Two fence references: one in info and one in *ef */ dma_fence_put(&info->eviction_fence->base); - dma_fence_put(*ef); - *ef = NULL; *process_info = NULL; put_pid(info->pid); create_evict_fence_fail: @@ -1670,7 +1668,8 @@ int amdgpu_amdkfd_criu_resume(void *p) goto out_unlock; } WRITE_ONCE(pinfo->block_mmu_notifications, false); - schedule_delayed_work(&pinfo->restore_userptr_work, 0); + queue_delayed_work(system_freezable_wq, + &pinfo->restore_userptr_work, 0); out_unlock: mutex_unlock(&pinfo->lock); @@ -2475,7 +2474,8 @@ int amdgpu_amdkfd_evict_userptr(struct mmu_interval_notifier *mni, KFD_QUEUE_EVICTION_TRIGGER_USERPTR); if (r) pr_err("Failed 
to quiesce KFD\n"); - schedule_delayed_work(&process_info->restore_userptr_work, + queue_delayed_work(system_freezable_wq, + &process_info->restore_userptr_work, msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS)); } mutex_unlock(&process_info->notifier_lock); @@ -2810,7 +2810,8 @@ static void amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work) /* If validation failed, reschedule another attempt */ if (evicted_bos) { - schedule_delayed_work(&process_info->restore_userptr_work, + queue_delayed_work(system_freezable_wq, + &process_info->restore_userptr_work, msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS)); kfd_smi_event_queue_restore_rescheduled(mm); @@ -2819,6 +2820,23 @@ static void amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work) put_task_struct(usertask); } +static void replace_eviction_fence(struct dma_fence **ef, +
[PATCH] drm/amd/amdgpu: Clean up VCN fw_shared and set flag bits during hw_init
- After whole GPU reset, the VCN fw_shared data is cleaned up. This seems like a new behavior and it is considered safer since we do not want to keep the residual data - However, the existing driver code still assumes the residual data. For example, in sw_init, we will set the UNIFIED_QUEUE flag bit. This flag bit is never set anywhere else. Then if a WGR happens, sw_init will not be called and therefore, we will lose this bit and it leads to a crash in VCN FW. - A proper fix is that we explicitly set the flag bit in hw_init every time and the data is repopulated after a WGR instead of assuming the data can survive the WGR. Signed-off-by: Bokun Zhang Change-Id: I783ff826f12e12eaf48d046ff9f1cef65558c7b9 --- drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 45 +-- 1 file changed, 28 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c index bf07aa200030..0cf3e639c480 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c @@ -111,6 +111,7 @@ static int vcn_v4_0_sw_init(void *handle) { struct amdgpu_ring *ring; struct amdgpu_device *adev = (struct amdgpu_device *)handle; + volatile struct amdgpu_vcn4_fw_shared *fw_shared; int i, r; r = amdgpu_vcn_sw_init(adev); @@ -124,11 +125,12 @@ static int vcn_v4_0_sw_init(void *handle) return r; for (i = 0; i < adev->vcn.num_vcn_inst; i++) { - volatile struct amdgpu_vcn4_fw_shared *fw_shared; - if (adev->vcn.harvest_config & (1 << i)) continue; + fw_shared = adev->vcn.inst[i].fw_shared.cpu_addr; + memset(fw_shared, 0, sizeof(struct amdgpu_vcn4_fw_shared)); + /* Init instance 0 sched_score to 1, so it's scheduled after other instances */ if (i == 0) atomic_set(&adev->vcn.inst[i].sched_score, 1); @@ -161,21 +163,6 @@ static int vcn_v4_0_sw_init(void *handle) if (r) return r; - fw_shared = adev->vcn.inst[i].fw_shared.cpu_addr; - fw_shared->present_flag_0 = cpu_to_le32(AMDGPU_FW_SHARED_FLAG_0_UNIFIED_QUEUE); - fw_shared->sq.is_enabled = 
1; - - fw_shared->present_flag_0 |= cpu_to_le32(AMDGPU_VCN_SMU_DPM_INTERFACE_FLAG); - fw_shared->smu_dpm_interface.smu_interface_type = (adev->flags & AMD_IS_APU) ? - AMDGPU_VCN_SMU_DPM_INTERFACE_APU : AMDGPU_VCN_SMU_DPM_INTERFACE_DGPU; - - if (amdgpu_ip_version(adev, VCN_HWIP, 0) == - IP_VERSION(4, 0, 2)) { - fw_shared->present_flag_0 |= AMDGPU_FW_SHARED_FLAG_0_DRM_KEY_INJECT; - fw_shared->drm_key_wa.method = - AMDGPU_DRM_KEY_INJECT_WORKAROUND_VCNFW_ASD_HANDSHAKING; - } - if (amdgpu_vcnfw_log) amdgpu_vcn_fwlog_init(&adev->vcn.inst[i]); } @@ -245,9 +232,33 @@ static int vcn_v4_0_sw_fini(void *handle) static int vcn_v4_0_hw_init(void *handle) { struct amdgpu_device *adev = (struct amdgpu_device *)handle; + volatile struct amdgpu_vcn4_fw_shared *fw_shared; struct amdgpu_ring *ring; int i, r; + for (i = 0; i < adev->vcn.num_vcn_inst; i++) { + if (adev->vcn.harvest_config & (1 << i)) + continue; + + fw_shared = adev->vcn.inst[i].fw_shared.cpu_addr; + fw_shared->present_flag_0 |= cpu_to_le32(AMDGPU_FW_SHARED_FLAG_0_UNIFIED_QUEUE); + fw_shared->sq.is_enabled = 1; + + fw_shared->present_flag_0 |= cpu_to_le32(AMDGPU_VCN_SMU_DPM_INTERFACE_FLAG); + fw_shared->smu_dpm_interface.smu_interface_type = (adev->flags & AMD_IS_APU) ? + AMDGPU_VCN_SMU_DPM_INTERFACE_APU : AMDGPU_VCN_SMU_DPM_INTERFACE_DGPU; + + if (amdgpu_ip_version(adev, VCN_HWIP, 0) == + IP_VERSION(4, 0, 2)) { + fw_shared->present_flag_0 |= AMDGPU_FW_SHARED_FLAG_0_DRM_KEY_INJECT; + fw_shared->drm_key_wa.method = + AMDGPU_DRM_KEY_INJECT_WORKAROUND_VCNFW_ASD_HANDSHAKING; + } + + if (amdgpu_vcnfw_log) + fw_shared->present_flag_0 |= cpu_to_le32(AMDGPU_VCN_FW_LOGGING_FLAG); + } + if (amdgpu_sriov_vf(adev)) { r = vcn_v4_0_start_sriov(adev); if (r) -- 2.34.1
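The core idea of the fix reduces to a small model: a whole GPU reset (WGR) wipes the fw_shared region and re-runs hw_init() but not sw_init(), so any flag written only in sw_init() is lost after the first reset. Writing the flags in hw_init() repopulates them every time. Illustrative only — hypothetical names and a plain-C stand-in for the VRAM-backed region, not the driver code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UNIFIED_QUEUE_FLAG 0x1u	/* hypothetical stand-in flag bit */

struct fw_shared { uint32_t present_flag_0; };

static struct fw_shared shared;	/* models the VRAM-backed fw_shared */

/* Runs once at driver load; after the fix it no longer owns the flags. */
static void sw_init(void) { }

/* Runs at load AND after every reset; flags are (re)written here. */
static void hw_init(void)
{
	shared.present_flag_0 |= UNIFIED_QUEUE_FLAG;
}

static void whole_gpu_reset(void)
{
	memset(&shared, 0, sizeof(shared));	/* WGR clears fw_shared... */
	hw_init();				/* ...but re-runs hw_init */
}
```

Had hw_init() not set the flag, the assertion after the reset would fail — which corresponds to the VCN firmware crash described in the commit message.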
Re: [PATCH v5 00/32] drm/amd/display: add AMD driver-specific properties for color mgmt
On 2023-11-16 14:57, Melissa Wen wrote:
> Hello,
>
> This series extends the current KMS color management API with AMD
> driver-specific properties to enhance the color management support on
> AMD Steam Deck. The key additions to the color pipeline include:

[snip]

> Melissa Wen (18):
>   drm/drm_mode_object: increase max objects to accommodate new color props
>   drm/drm_property: make replace_property_blob_from_id a DRM helper
>   drm/drm_plane: track color mgmt changes per plane

If all patches are merged through amd-staging-drm-next, I worry that conflicts creep in if any code around replace_property_blob_from_id changes in DRM.

My plan is to merge the DRM patches through drm-misc-next, as well as include them in the amd-staging-drm-next merge. They should then fall out at the next amd-staging-drm-next pull and (hopefully) ensure that there is no conflict.

If there are no objections, I'll go ahead with that later this week.

Harry

>   drm/amd/display: add driver-specific property for plane degamma LUT
>   drm/amd/display: explicitly define EOTF and inverse EOTF
>   drm/amd/display: document AMDGPU pre-defined transfer functions
>   drm/amd/display: add plane 3D LUT driver-specific properties
>   drm/amd/display: add plane shaper LUT and TF driver-specific properties
>   drm/amd/display: add CRTC gamma TF driver-specific property
>   drm/amd/display: add comments to describe DM crtc color mgmt behavior
>   drm/amd/display: encapsulate atomic regamma operation
>   drm/amd/display: decouple steps for mapping CRTC degamma to DC plane
>   drm/amd/display: reject atomic commit if setting both plane and CRTC degamma
>   drm/amd/display: add plane shaper LUT support
>   drm/amd/display: add plane shaper TF support
>   drm/amd/display: add plane 3D LUT support
>   drm/amd/display: add plane CTM driver-specific property
>   drm/amd/display: add plane CTM support
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h      |  91 ++
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  34 +-
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 108 +++
>  .../amd/display/amdgpu_dm/amdgpu_dm_color.c   | 818 --
>  .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c    |  72 ++
>  .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   | 232 -
>  .../gpu/drm/amd/display/include/fixed31_32.h  |  12 +
>  drivers/gpu/drm/arm/malidp_crtc.c             |   2 +-
>  drivers/gpu/drm/drm_atomic.c                  |   1 +
>  drivers/gpu/drm/drm_atomic_state_helper.c     |   1 +
>  drivers/gpu/drm/drm_atomic_uapi.c             |  43 +-
>  drivers/gpu/drm/drm_property.c                |  49 ++
>  include/drm/drm_mode_object.h                 |   2 +-
>  include/drm/drm_plane.h                       |   7 +
>  include/drm/drm_property.h                    |   6 +
>  include/uapi/drm/drm_mode.h                   |   8 +
>  16 files changed, 1377 insertions(+), 109 deletions(-)
Re: Radeon regression in 6.6 kernel
On Mon, Nov 27, 2023 at 6:24 PM Phillip Susi wrote:
>
> Alex Deucher writes:
>
>>> In that case those are the already known problems with the scheduler
>>> changes, aren't they?
>>
>> Yes. Those changes went into 6.7 though, not 6.6 AFAIK. Maybe I'm
>> misunderstanding what the original report was actually testing. If it
>> was 6.7, then try reverting:
>> 56e449603f0ac580700621a356d35d5716a62ce5
>> b70438004a14f4d0f9890b3297cd66248728546c
>
> At some point it was suggested that I file a gitlab issue, but I took
> this to mean it was already known and being worked on. -rc3 came out
> today and still has the problem. Is there a known issue I could track?

At this point, unless there are any objections, I think we should just revert the two patches.

Alex
Re: Radeon regression in 6.6 kernel
On 2023-11-28 17:13, Alex Deucher wrote:
> On Mon, Nov 27, 2023 at 6:24 PM Phillip Susi wrote:
>>
>> Alex Deucher writes:
>>>> In that case those are the already known problems with the scheduler
>>>> changes, aren't they?
>>>
>>> Yes. Those changes went into 6.7 though, not 6.6 AFAIK. Maybe I'm
>>> misunderstanding what the original report was actually testing. If it
>>> was 6.7, then try reverting:
>>> 56e449603f0ac580700621a356d35d5716a62ce5
>>> b70438004a14f4d0f9890b3297cd66248728546c
>>
>> At some point it was suggested that I file a gitlab issue, but I took
>> this to mean it was already known and being worked on. -rc3 came out
>> today and still has the problem. Is there a known issue I could track?
>
> At this point, unless there are any objections, I think we should just
> revert the two patches.

Uhm, no. Why "the two" patches?

This email, part of this thread, https://lore.kernel.org/all/87r0kircdo@vps.thesusis.net/ clearly states that reverting *only* this commit,
  56e449603f0ac5 drm/sched: Convert the GPU scheduler to variable number of run-queues
*does not* mitigate the failed suspend. (Furthermore, this commit doesn't really change anything operational, other than using an allocated array, instead of a static one, in DRM, while the 2nd patch is solely contained within the amdgpu driver code.)

That leaves only this change,
  b70438004a14f4 drm/amdgpu: move buffer funcs setting up a level
to be at fault, as the kernel log attached in the linked email above shows.

The conclusion is that only b70438004a14f4 needs reverting.
--
Regards,
Luben
Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices
On Tue, 28 Nov 2023 at 23:07, Christian König wrote: > > Am 28.11.23 um 13:50 schrieb Weixi Zhu: > > The problem: > > > > Accelerator driver developers are forced to reinvent external MM subsystems > > case by case, because Linux core MM only considers host memory resources. > > These reinvented MM subsystems have similar orders of magnitude of LoC as > > Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has > > 30K. Meanwhile, more and more vendors are implementing their own > > accelerators, e.g. Microsoft's Maia 100. At the same time, > > application-level developers suffer from poor programmability -- they must > > consider parallel address spaces and be careful about the limited device > > DRAM capacity. This can be alleviated if a malloc()-ed virtual address can > > be shared by the accelerator, or the abundant host DRAM can further > > transparently backup the device local memory. > > > > These external MM systems share similar mechanisms except for the > > hardware-dependent part, so reinventing them is effectively introducing > > redundant code (14K~70K for each case). Such developing/maintaining is not > > cheap. Furthermore, to share a malloc()-ed virtual address, device drivers > > need to deeply interact with Linux MM via low-level MM APIs, e.g. MMU > > notifiers/HMM. This raises the bar for driver development, since developers > > must understand how Linux MM works. Further, it creates code maintenance > > problems -- any changes to Linux MM potentially require coordinated changes > > to accelerator drivers using low-level MM APIs. > > > > Putting a cache-coherent bus between host and device will not make these > > external MM subsystems disappear. For example, a throughput-oriented > > accelerator will not tolerate executing heavy memory access workload with > > a host MMU/IOMMU via a remote bus. 
Therefore, devices will still have > > their own MMU and pick a simpler page table format for lower address > > translation overhead, requiring external MM subsystems. > > > > > > > > What GMEM (Generalized Memory Management [1]) does: > > > > GMEM extends Linux MM to share its machine-independent MM code. Only > > high-level interface is provided for device drivers. This prevents > > accelerator drivers from reinventing the wheel, but relies on drivers to > > implement their hardware-dependent functions declared by GMEM. GMEM's key > > interface include gm_dev_create(), gm_as_create(), gm_as_attach() and > > gm_dev_register_physmem(). Here briefly describe how a device driver > > utilizes them: > > 1. At boot time, call gm_dev_create() and registers the implementation of > > hardware-dependent functions as declared in struct gm_mmu. > > - If the device has local DRAM, call gm_dev_register_physmem() to > > register available physical addresses. > > 2. When a device context is initialized (e.g. triggered by ioctl), check if > > the current CPU process has been attached to a gmem address space > > (struct gm_as). If not, call gm_as_create() and point current->mm->gm_as > > to it. > > 3. Call gm_as_attach() to attach the device context to a gmem address space. > > 4. Invoke gm_dev_fault() to resolve a page fault or prepare data before > > device computation happens. > > > > GMEM has changed the following assumptions in Linux MM: > >1. An mm_struct not only handle a single CPU context, but may also handle > > external memory contexts encapsulated as gm_context listed in > > mm->gm_as. An external memory context can include a few or all of the > > following parts: an external MMU (that requires TLB invalidation), an > > external page table (that requires PTE manipulation) and external DRAM > > (that requires physical memory management). > > Well that is pretty much exactly what AMD has already proposed with KFD > and was rejected for rather good reasons. 
> > > > MMU functions > > The MMU functions peer_map() and peer_unmap() overlap other functions, > > leaving a question if the MMU functions should be decoupled as more basic > > operations. Decoupling them could potentially prevent device drivers > > coalescing these basic steps within a single host-device communication > > operation, while coupling them makes it more difficult for device drivers > > to utilize GMEM interface. > > Well to be honest all of this sounds like history to me. We have already > seen the same basic approach in KFD, HMM and to some extend in TTM as well. > > And all of them more or less failed. Why should this here be different? Any info we have on why this has failed to work in the past would be useful to provide. This is one of those cases where we may not have documented the bad ideas to stop future developers from thinking they are bad. I do think we would want more common code in this area, but I would think we'd have it more on the driver infrastructure side, than in the core mm. Dave.
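For readers skimming the thread, the four-step driver flow described in the cover letter can be sketched in plain C. To be clear: the `gm_dev_create()`, `gm_dev_register_physmem()`, `gm_as_create()` and `gm_as_attach()` names come from the cover letter, but every body below is a made-up stub standing in for the proposed GMEM core, not the actual RFC code:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-ins for the proposed GMEM structures. */
struct gm_mmu {                 /* hardware-dependent callbacks a driver registers */
    int (*peer_map)(unsigned long va, unsigned long pa, size_t len);
    int (*peer_unmap)(unsigned long va, size_t len);
};
struct gm_dev { struct gm_mmu *mmu; };
struct gm_as  { struct gm_dev *dev; };

/* --- stubs standing in for the GMEM core --- */
static struct gm_dev *gm_dev_create(struct gm_mmu *mmu)
{
    static struct gm_dev dev;
    dev.mmu = mmu;
    return &dev;
}

static int gm_dev_register_physmem(struct gm_dev *dev,
                                   unsigned long start, unsigned long end)
{
    (void)dev;
    return start < end ? 0 : -1;   /* register device-local DRAM range */
}

static struct gm_as *gm_as_create(void)
{
    static struct gm_as as;
    return &as;
}

static int gm_as_attach(struct gm_as *as, struct gm_dev *dev)
{
    as->dev = dev;                 /* device context joins the address space */
    return 0;
}

/* --- driver-provided, hardware-dependent MMU callbacks --- */
static int my_peer_map(unsigned long va, unsigned long pa, size_t len)
{
    printf("map va=%#lx -> pa=%#lx (%zu bytes)\n", va, pa, len);
    return 0;
}

static int my_peer_unmap(unsigned long va, size_t len)
{
    (void)va; (void)len;
    return 0;
}

/* Steps 1-4 from the cover letter, in order. */
int gmem_driver_init(void)
{
    static struct gm_mmu mmu = { my_peer_map, my_peer_unmap };

    struct gm_dev *dev = gm_dev_create(&mmu);               /* 1: at boot time */
    if (gm_dev_register_physmem(dev, 0x0, 0x40000000UL))    /* local DRAM range */
        return -1;

    struct gm_as *as = gm_as_create();                      /* 2: first device ctx */
    if (gm_as_attach(as, dev))                              /* 3: attach the ctx */
        return -1;

    /* 4: a device page fault would now be resolved via the callbacks */
    return as->dev->mmu->peer_map(0x1000, 0x2000, 4096);
}
```

The sketch also makes the thread's open question concrete: `peer_map()` here bundles PTE setup, TLB handling and data movement into one callback, which is exactly the coupling the cover letter asks whether to decompose.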
Re: PSP_VMBX_POLLING_LIMIT too big
On 11/28/2023 9:51 PM, Mario Limonciello wrote:
> Hi,
>
> In amd-staging-drm-next 46fe6312082c ("drm/amdgpu: update retry times
> for psp BL wait") and upstream a11156ff6f41 ("drm/amdgpu: update retry
> times for psp BL wait") the number of loops for
> psp_v13_0_wait_for_bootloader() to try again increased significantly.
> It went from 10 loops to 20k loops. Essentially this means that the
> function can "effectively" no longer fail.

The PSP_VMBX_POLLING_LIMIT of 20k was introduced by f2328c2ba0e84 ("drm/amdgpu: update retry times for psp vmbx wait"). 20k is too much even for PSP 13.0.6. Will reduce it to 3000 (~5mins) for 13.0.6 and keep the default 10 for the others.

Thanks,
Lijo

> I've got an issue I'm looking at where runtime resume for a dGPU fails,
> and because of this change the system gets stuck in a never ending busy
> loop instead of cleanly returning an error code to the caller. The
> outcome is the system appears hung while the 20k loops run instead of
> just the dGPU failing to resume.
>
> Is this 20k value really required? Or can we reduce it back to
> something more manageable?
>
> Thanks,
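The trade-off under discussion is the classic bounded-poll pattern: the retry count caps how long a failed resume can spin before the caller gets an error back. A minimal sketch of that pattern follows; `check()` and the helpers are stand-ins for the real PSP register poll, not the actual psp_v13_0 code, and the in-kernel delay between polls is omitted:

```c
/* Bounded poll: retry at most `retries` times, then fail cleanly. */
static int wait_for_ready(int (*check)(void), int retries)
{
    for (int i = 0; i < retries; i++) {
        if (check())
            return 0;   /* device came back */
        /* the kernel would udelay()/msleep() here between polls */
    }
    return -1;          /* bounded failure instead of an endless spin */
}

/* Demonstration helpers (hypothetical). */
static int never_ready(void) { return 0; }

static int calls;
static int ready_on_third(void) { return ++calls >= 3; }
```

With retries set to 20k and a per-iteration delay, a dead device keeps the caller busy for the whole budget, which is why the complaint above is about the limit rather than the loop itself.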
RE: [PATCH] drm/amdgpu: Use another offset for GC 9.4.3 remap
[AMD Official Use Only - General]

Reviewed-by: Hawking Zhang

Regards,
Hawking

-----Original Message-----
From: Lazar, Lijo
Sent: Tuesday, November 28, 2023 19:26
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Deucher, Alexander ; Kuehling, Felix ; Ma, Le ; Gadre, Mangesh
Subject: [PATCH] drm/amdgpu: Use another offset for GC 9.4.3 remap

The legacy region at 0x7F000 maps to valid registers in GC 9.4.3 SOCs. Use 0x1A000 offset instead as MMIO register remap region.

Signed-off-by: Lijo Lazar
---
 drivers/gpu/drm/amd/amdgpu/soc15.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index e3d41e8aac9d..9ad4d6d3122b 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -1162,6 +1162,11 @@ static int soc15_common_early_init(void *handle)
 			AMD_PG_SUPPORT_VCN_DPG |
 			AMD_PG_SUPPORT_JPEG;
 		adev->external_rev_id = adev->rev_id + 0x46;
+		/* GC 9.4.3 uses MMIO register region hole at a different offset */
+		if (!amdgpu_sriov_vf(adev)) {
+			adev->rmmio_remap.reg_offset = 0x1A000;
+			adev->rmmio_remap.bus_addr = adev->rmmio_base + 0x1A000;
+		}
 		break;
 	default:
 		/* FIXME: not supported yet */
--
2.25.1
[PATCH 1/3] drm/amd: Fix handling of amdgpu.runpm on systems with BOCO
On products that support both BOCO and BACO it should be possible to override the BOCO detection and force BACO by amdgpu.runpm=1 but this doesn't work today. Adjust the logic used in amdgpu_driver_load_kms() to make sure that module parameters are looked at first and only use automatic policies in the -1 or -2 cases. Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 80 +++-- 1 file changed, 48 insertions(+), 32 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index b5ebafd4a3ad..29381da08fd5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -121,6 +121,53 @@ void amdgpu_register_gpu_instance(struct amdgpu_device *adev) mutex_unlock(&mgpu_info.mutex); } +static void amdgpu_driver_set_runtime_pm_mode(struct amdgpu_device *adev) +{ + struct drm_device *dev = adev_to_drm(adev); + + adev->pm.rpm_mode = AMDGPU_RUNPM_NONE; + + switch (amdgpu_runtime_pm) { + case -1: + case -2: + break; + case 0: + default: + return; + case 1: + if (amdgpu_device_supports_baco(dev)) + adev->pm.rpm_mode = AMDGPU_RUNPM_BACO; + else + dev_err(adev->dev, "BACO is not supported on this ASIC\n"); + return; + case 2: + // TODO: adjust plumbing to be able to pull PP table to check MACO support as well + if (amdgpu_device_supports_baco(dev)) + adev->pm.rpm_mode = AMDGPU_RUNPM_BACO; + else + dev_err(adev->dev, "BAMACO is not supported on this ASIC\n"); + return; + } + + if (amdgpu_device_supports_px(dev)) { + adev->pm.rpm_mode = AMDGPU_RUNPM_PX; + dev_info(adev->dev, "Using ATPX for runtime pm\n"); + } else if (amdgpu_device_supports_boco(dev)) { + adev->pm.rpm_mode = AMDGPU_RUNPM_BOCO; + dev_info(adev->dev, "Using BOCO for runtime pm\n"); + } else if (amdgpu_device_supports_baco(dev)) { + if (adev->asic_type == CHIP_VEGA10) { + /* enable BACO as runpm mode if noretry=0 */ + if (!adev->gmc.noretry) + adev->pm.rpm_mode = AMDGPU_RUNPM_BACO; + } else { + 
adev->pm.rpm_mode = AMDGPU_RUNPM_BACO; + } + if (adev->pm.rpm_mode == AMDGPU_RUNPM_BACO) + dev_info(adev->dev, "Using BACO for runtime pm\n"); + } +} + /** * amdgpu_driver_load_kms - Main load function for KMS. * @@ -149,38 +196,7 @@ int amdgpu_driver_load_kms(struct amdgpu_device *adev, unsigned long flags) goto out; } - adev->pm.rpm_mode = AMDGPU_RUNPM_NONE; - if (amdgpu_device_supports_px(dev) && - (amdgpu_runtime_pm != 0)) { /* enable PX as runtime mode */ - adev->pm.rpm_mode = AMDGPU_RUNPM_PX; - dev_info(adev->dev, "Using ATPX for runtime pm\n"); - } else if (amdgpu_device_supports_boco(dev) && - (amdgpu_runtime_pm != 0)) { /* enable boco as runtime mode */ - adev->pm.rpm_mode = AMDGPU_RUNPM_BOCO; - dev_info(adev->dev, "Using BOCO for runtime pm\n"); - } else if (amdgpu_device_supports_baco(dev) && - (amdgpu_runtime_pm != 0)) { - switch (adev->asic_type) { - case CHIP_VEGA20: - case CHIP_ARCTURUS: - /* enable BACO as runpm mode if runpm=1 */ - if (amdgpu_runtime_pm > 0) - adev->pm.rpm_mode = AMDGPU_RUNPM_BACO; - break; - case CHIP_VEGA10: - /* enable BACO as runpm mode if noretry=0 */ - if (!adev->gmc.noretry) - adev->pm.rpm_mode = AMDGPU_RUNPM_BACO; - break; - default: - /* enable BACO as runpm mode on CI+ */ - adev->pm.rpm_mode = AMDGPU_RUNPM_BACO; - break; - } - - if (adev->pm.rpm_mode == AMDGPU_RUNPM_BACO) - dev_info(adev->dev, "Using BACO for runtime pm\n"); - } + amdgpu_driver_set_runtime_pm_mode(adev); /* Call ACPI methods: require modeset init * but failure is not fatal -- 2.34.1
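The selection order the patch establishes can be modelled as a small standalone function: explicit amdgpu.runpm values (0/1/2) are honoured first, and automatic detection (PX, then BOCO, then BACO) only runs for -1/-2. This is a sketch, not driver code: the bool flags replace the real probe helpers (amdgpu_device_supports_px()/boco()/baco()), and the VEGA10 noretry special case is omitted:

```c
#include <stdbool.h>

enum rpm_mode { RPM_NONE, RPM_PX, RPM_BOCO, RPM_BACO };

static enum rpm_mode pick_rpm_mode(int runpm, bool px, bool boco, bool baco)
{
    switch (runpm) {
    case -1:
    case -2:
        break;                  /* fall through to automatic detection */
    case 0:
    default:
        return RPM_NONE;        /* runtime pm explicitly disabled */
    case 1:
    case 2:
        /* forced BACO (BAMACO for 2, per patch 2/3): BOCO support is
         * deliberately ignored, which is the bug being fixed */
        return baco ? RPM_BACO : RPM_NONE;
    }

    /* automatic policy: PX > BOCO > BACO */
    if (px)
        return RPM_PX;
    if (boco)
        return RPM_BOCO;
    if (baco)
        return RPM_BACO;
    return RPM_NONE;
}
```

The key behavioral change is visible in the first test case below: on a system supporting both BOCO and BACO, runpm=1 now yields BACO instead of falling into the BOCO branch.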
[PATCH 0/3] Obey amdgpu.runpm even on BOCO systems
I've found that on a system that supports BOCO, when I try to override it, BOCO is still used. This is because module parameter use is intermingled with automatic detection. This series moves automatic detection after module parameter use and makes all callers obey the result.

Mario Limonciello (3):
  drm/amd: Fix handling of amdgpu.runpm on systems with BOCO
  drm/amd: Introduce new enum for BAMACO
  drm/amd: Drop calls for checking "support" for BACO/BOCO/PX

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  7 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       | 34
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       | 80 +++
 drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h       |  1 +
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c     |  3 +-
 .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c    |  2 +-
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c    |  5 +-
 7 files changed, 77 insertions(+), 55 deletions(-)

--
2.34.1
[PATCH 2/3] drm/amd: Introduce new enum for BAMACO
Rather than plumbing module parameter deep into IP declare BAMACO runpm mode at amdgpu_driver_set_runtime_pm_mode() and then detect this mode in consumers. Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 2 +- drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h| 1 + drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 2 +- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 5 +++-- 4 files changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index 29381da08fd5..c6c87ab71d94 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -143,7 +143,7 @@ static void amdgpu_driver_set_runtime_pm_mode(struct amdgpu_device *adev) case 2: // TODO: adjust plumbing to be able to pull PP table to check MACO support as well if (amdgpu_device_supports_baco(dev)) - adev->pm.rpm_mode = AMDGPU_RUNPM_BACO; + adev->pm.rpm_mode = AMDGPU_RUNPM_BAMACO; else dev_err(adev->dev, "BAMACO is not supported on this ASIC\n"); return; diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h index d76b0a60db44..3434c31b434b 100644 --- a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h +++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h @@ -50,6 +50,7 @@ enum amdgpu_runpm_mode { AMDGPU_RUNPM_PX, AMDGPU_RUNPM_BOCO, AMDGPU_RUNPM_BACO, + AMDGPU_RUNPM_BAMACO, }; struct amdgpu_ps { diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c index 5a314d0316c1..64c8783b4118 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c @@ -1597,7 +1597,7 @@ int smu_v11_0_baco_set_state(struct smu_context *smu, enum smu_baco_state state) case IP_VERSION(11, 0, 11): case IP_VERSION(11, 0, 12): case IP_VERSION(11, 0, 13): - if (amdgpu_runtime_pm == 2) + if (adev->pm.rpm_mode == AMDGPU_RUNPM_BAMACO) ret = smu_cmn_send_smc_msg_with_param(smu, 
SMU_MSG_EnterBaco, D3HOT_BAMACO_SEQUENCE, diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c index 3c595ac897d6..b77763d6c72f 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c @@ -2259,7 +2259,8 @@ int smu_v13_0_baco_set_state(struct smu_context *smu, if (state == SMU_BACO_STATE_ENTER) { ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_EnterBaco, - (smu_baco->maco_support && amdgpu_runtime_pm != 1) ? + (smu_baco->maco_support && + adev->pm.rpm_mode == AMDGPU_RUNPM_BAMACO) ? BACO_SEQ_BAMACO : BACO_SEQ_BACO, NULL); } else { @@ -2288,7 +2289,7 @@ int smu_v13_0_baco_enter(struct smu_context *smu) if (adev->in_runpm && smu_cmn_is_audio_func_enabled(adev)) { return smu_v13_0_baco_set_armd3_sequence(smu, - (smu_baco->maco_support && amdgpu_runtime_pm != 1) ? + (smu_baco->maco_support && adev->pm.rpm_mode == AMDGPU_RUNPM_BAMACO) ? BACO_SEQ_BAMACO : BACO_SEQ_BACO); } else { ret = smu_v13_0_baco_set_state(smu, SMU_BACO_STATE_ENTER); -- 2.34.1
[PATCH 3/3] drm/amd: Drop calls for checking "support" for BACO/BOCO/PX
As the module parameter can be used to control behavior, all parts of the driver should obey what has been programmed by user or detected by auto mode rather than what "can" be supported. Drop calls to all functions that check for BACO/BOCO/PX runpm modes and instead use the variable that is programmed when device is probed. Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 34 -- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 3 +- 3 files changed, 24 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 1181fe4baf0f..8f7377b37f2f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -1822,9 +1822,10 @@ static void amdgpu_switcheroo_set_state(struct pci_dev *pdev, enum vga_switcheroo_state state) { struct drm_device *dev = pci_get_drvdata(pdev); + struct amdgpu_device *adev = drm_to_adev(dev); int r; - if (amdgpu_device_supports_px(dev) && state == VGA_SWITCHEROO_OFF) + if (adev->pm.rpm_mode == AMDGPU_RUNPM_PX && state == VGA_SWITCHEROO_OFF) return; if (state == VGA_SWITCHEROO_ON) { @@ -4244,7 +4245,7 @@ int amdgpu_device_init(struct amdgpu_device *adev, if ((adev->pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA) vga_client_register(adev->pdev, amdgpu_device_vga_set_decode); - px = amdgpu_device_supports_px(ddev); + px = (adev->pm.rpm_mode == AMDGPU_RUNPM_PX); if (px || (!dev_is_removable(&adev->pdev->dev) && apple_gmux_detect(NULL, NULL))) @@ -4392,7 +4393,7 @@ void amdgpu_device_fini_sw(struct amdgpu_device *adev) kfree(adev->fru_info); adev->fru_info = NULL; - px = amdgpu_device_supports_px(adev_to_drm(adev)); + px = (adev->pm.rpm_mode == AMDGPU_RUNPM_PX); if (px || (!dev_is_removable(&adev->pdev->dev) && apple_gmux_detect(NULL, NULL))) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 
e39f3a334c9d..12fb8398fb45 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -2248,10 +2248,10 @@ static int amdgpu_pci_probe(struct pci_dev *pdev, if (adev->pm.rpm_mode != AMDGPU_RUNPM_NONE) { /* only need to skip on ATPX */ - if (amdgpu_device_supports_px(ddev)) + if (adev->pm.rpm_mode == AMDGPU_RUNPM_PX) dev_pm_set_driver_flags(ddev->dev, DPM_FLAG_NO_DIRECT_COMPLETE); /* we want direct complete for BOCO */ - if (amdgpu_device_supports_boco(ddev)) + if (adev->pm.rpm_mode == AMDGPU_RUNPM_BOCO) dev_pm_set_driver_flags(ddev->dev, DPM_FLAG_SMART_PREPARE | DPM_FLAG_SMART_SUSPEND | DPM_FLAG_MAY_SKIP_RESUME); @@ -2284,7 +2284,8 @@ static int amdgpu_pci_probe(struct pci_dev *pdev, * into D0 state. Then there will be a PMFW-aware D-state * transition(D0->D3) on runpm suspend. */ - if (amdgpu_device_supports_baco(ddev) && + if ((adev->pm.rpm_mode == AMDGPU_RUNPM_BACO || +adev->pm.rpm_mode == AMDGPU_RUNPM_BAMACO) && !(adev->flags & AMD_IS_APU) && (adev->asic_type >= CHIP_NAVI10)) amdgpu_get_secondary_funcs(adev); @@ -2466,7 +2467,7 @@ static int amdgpu_pmops_prepare(struct device *dev) /* Return a positive number here so * DPM_FLAG_SMART_SUSPEND works properly */ - if (amdgpu_device_supports_boco(drm_dev) && + if (adev->pm.rpm_mode == AMDGPU_RUNPM_BOCO && pm_runtime_suspended(dev)) return 1; @@ -2664,7 +2665,7 @@ static int amdgpu_pmops_runtime_suspend(struct device *dev) } adev->in_runpm = true; - if (amdgpu_device_supports_px(drm_dev)) + if (adev->pm.rpm_mode == AMDGPU_RUNPM_PX) drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING; /* @@ -2674,7 +2675,7 @@ static int amdgpu_pmops_runtime_suspend(struct device *dev) * platforms. * TODO: this may be also needed for PX capable platform. 
*/ - if (amdgpu_device_supports_boco(drm_dev)) + if (adev->pm.rpm_mode == AMDGPU_RUNPM_BOCO) adev->mp1_state = PP_MP1_STATE_UNLOAD; ret = amdgpu_device_prepare(drm_dev); @@ -2683,15 +2684,15 @@ static int amdgpu_pmops_runtime_suspend(struct device *dev) ret = amdgpu_device_suspend(drm_dev, false); if (ret) { adev->in_runpm = false; -
RE: [PATCH] drm/amdgpu: Skip access gfx11 golden registers under SRIOV
[AMD Official Use Only - General]

Reviewed-By: Horace Chen

-----Original Message-----
From: Yin, ZhenGuo (Chris)
Sent: Monday, November 27, 2023 10:43 AM
To: Yin, ZhenGuo (Chris) ; amd-gfx@lists.freedesktop.org; Zha, YiFan(Even) ; Chen, Horace
Subject: Re: [PATCH] drm/amdgpu: Skip access gfx11 golden registers under SRIOV

Add potential reviewers.

On 11/23/2023 4:57 PM, ZhenGuo Yin wrote:
> [Why]
> Golden registers are PF-only registers on gfx11.
> RLCG interface will return "out-of-range" under SRIOV VF.
>
> [How]
> Skip access gfx11 golden registers under SRIOV.
>
> Signed-off-by: ZhenGuo Yin
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index 8ed4a6fb147a..288e926e5cd4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -293,6 +293,9 @@ static void gfx_v11_0_set_kiq_pm4_funcs(struct amdgpu_device *adev)
>
>  static void gfx_v11_0_init_golden_registers(struct amdgpu_device *adev)
>  {
> +	if (amdgpu_sriov_vf(adev))
> +		return;
> +
>  	switch (amdgpu_ip_version(adev, GC_HWIP, 0)) {
>  	case IP_VERSION(11, 0, 1):
>  	case IP_VERSION(11, 0, 4):