[PATCH 7/8] Documentation/gpu: Add an explanation about the DC weekly patches

2023-10-20 Thread Rodrigo Siqueira
Sharing code with other OSes is confusing and raises some questions.
This patch introduces some explanation about our upstream process with
the shared code.

Cc: Mario Limonciello 
Cc: Alex Deucher 
Cc: Harry Wentland 
Cc: Hamza Mahfooz 
Signed-off-by: Rodrigo Siqueira 
---
 Documentation/gpu/amdgpu/display/index.rst | 111 -
 1 file changed, 109 insertions(+), 2 deletions(-)

diff --git a/Documentation/gpu/amdgpu/display/index.rst b/Documentation/gpu/amdgpu/display/index.rst
index b09d1434754d..9d53a42c5339 100644
--- a/Documentation/gpu/amdgpu/display/index.rst
+++ b/Documentation/gpu/amdgpu/display/index.rst
@@ -10,7 +10,114 @@ reason, our Display Core Driver is divided into two pieces:
 1. **Display Core (DC)** contains the OS-agnostic components. Things like
hardware programming and resource management are handled here.
 2. **Display Manager (DM)** contains the OS-dependent components. Hooks to the
-   amdgpu base driver and DRM are implemented here.
+   amdgpu base driver and DRM are implemented here. For example, see the
+   display/amdgpu_dm/ folder.
+
+
+How does AMD share code?
+========================
+
+Maintaining the same code base across multiple OSes requires a lot of
+synchronization effort between repositories. In the DC case, we maintain a
+central repository where developers working on any of the supported OSes
+submit their changes. Put simply, this shared repository is identical to the
+code that you can see in the display folder. The shared repo has integration
+tests with our Linux CI farm, and we run an exhaustive set of IGT tests on
+various AMD GPUs/APUs. Our CI also checks ARM64/32, PPC64/32, and x86_64/32
+compilation with DCN enabled and disabled. After all tests pass and the
+change has been reviewed by another developer, it gets merged into the
+shared repository.
+
+To maintain this shared code working properly, we run two activities every
+week:
+
+1. **Weekly backport**: We bring changes from Linux to the other shared
+   repositories. This work gets massive support from our CI tools, which can
+   detect new changes and send them to internal maintainers.
+2. **Weekly promotion**: Every week, we get changes from other teams in the
+   shared repo that have yet to be made public. For this reason, at the
+   beginning of each week, a developer will review that internal repo and
+   prepare a series of patches that can be sent to the public upstream
+   (promotion).
+
+For the purposes of this documentation, promotion is the essential activity,
+so it is the one described in more detail.
+
+Weekly promotion
+----------------
+
+As described in the previous sections, the display folder has its equivalent as
+an internal repository shared with multiple teams. The promotion activity is
+the task of 'promoting' those internal changes to the upstream; this is
+possible thanks to numerous tools that help us manage the code-sharing
+challenges. The weekly promotion usually takes one week, sliced like this:
+
+1. Extract all merged patches from the previous week that can be sent to the
+   upstream. In other words, we check the week's time frame.
+2. Evaluate whether any potential new patches make sense for upstream.
+3. Create a branch candidate with the latest amd-staging-drm-next code together
+   with the new patches. At this step, we must ensure that every patch compiles
+   and the entire series passes our set of IGT tests on different hardware
+   (i.e., it has to pass our CI).
+4. Send the new candidate branch for an internal quality test and extra CI
+   validation.
+5. Send patches to amd-gfx for review. We wait a few days for community
+   feedback after sending a series to the public mailing list.
+6. If there is an error, we debug as fast as possible; usually, a simple bisect
+   in the weekly promotion patches points to a bad change, and we can take one
+   of two actions: fix the issue or drop the patch. If we cannot identify the
+   problem within the week, we drop the promotion and start over the following
+   week; in this case, the following promotion will have the previous patches
+   plus the new ones.
+
+We usually rotate the above process among many display developers to keep the
+workload manageable for everybody. It is worth highlighting that we take the
+test phase extremely seriously, and we never merge anything that fails our
+validation. As an overview, the tests include:
+
+1. Manual test
+ - Multiple Hotplugs with DP and HDMI.
+ - Stress test with multiple display configuration changes via the user
+   interface.
+ - Validate VRR behaviour.
+ - Check PSR.
+ - Validate MPO when playing video.
+ - Test more than two displays connected at the same time.
+ - Check suspend/resume.
+2. Automated test
+ - IGT tests in a farm with GPUs and APUs that support DCN and DCE.
+ - Compilation validation with the latest GCC and Clang from LTS distro.
+ - Cross-compilation for PowerPC 64/32, 

[PATCH 5/8] Documentation/gpu: Add entry for OPP in the kernel doc

2023-10-20 Thread Rodrigo Siqueira
Introduce OPP as part of the kernel documentation.

Cc: Mario Limonciello 
Cc: Alex Deucher 
Cc: Harry Wentland 
Cc: Hamza Mahfooz 
Signed-off-by: Rodrigo Siqueira 
---
 Documentation/gpu/amdgpu/display/dcn-blocks.rst | 12 
 drivers/gpu/drm/amd/display/dc/inc/hw/opp.h | 16 
 2 files changed, 28 insertions(+)

diff --git a/Documentation/gpu/amdgpu/display/dcn-blocks.rst b/Documentation/gpu/amdgpu/display/dcn-blocks.rst
index 1a223f33202e..5ba3c04c1db0 100644
--- a/Documentation/gpu/amdgpu/display/dcn-blocks.rst
+++ b/Documentation/gpu/amdgpu/display/dcn-blocks.rst
@@ -52,3 +52,15 @@ MPC
 
 .. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h
:internal:
+
+OPP
+---
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/opp.h
+   :doc: overview
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/opp.h
+   :export:
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/opp.h
+   :internal:
diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/opp.h b/drivers/gpu/drm/amd/display/dc/inc/hw/opp.h
index 7617fabbd16e..aee5372e292c 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/hw/opp.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/hw/opp.h
@@ -23,6 +23,22 @@
  *
  */
 
+/**
+ * DOC: overview
+ *
+ * The Output Plane Processor (OPP) block groups functions that format pixel
+ * streams so that they are suitable for display at the display device.
+ * The key functions contained in the OPP are:
+ *
+ * - Adaptive Backlight Modulation (ABM)
+ * - Formatter (FMT), which provides pixel-by-pixel operations to format the
+ *   incoming pixel stream.
+ * - Output Buffer, which provides pixel replication and overlapping.
+ * - Interface between MPC and OPTC.
+ * - Clock and reset generation.
+ * - CRC generation.
+ */
+
 #ifndef __DAL_OPP_H__
 #define __DAL_OPP_H__
 
-- 
2.42.0



[PATCH 8/8] Documentation/gpu: Introduce a simple contribution list for display code

2023-10-20 Thread Rodrigo Siqueira
This commit adds a contribution list for display under the kernel
documentation with some first suggestions. It also drops an old TODO
list from the display folder.
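
One of the suggestions in the new list is adding a file-name-based prefix to
DC functions so that ftrace filters are easy to write. The naming idea can be
sketched as follows. This is only an illustration: the file name
dc_example_block.c and both function names are hypothetical and do not exist
in the DC code.

```c
/*
 * Hypothetical file: dc_example_block.c
 * Every function carries the file name as its prefix, so that one glob
 * pattern selects all of them in an ftrace filter.
 */

/* Without a prefix this might have been called just validate(). */
static int dc_example_block_validate(int width)
{
	/* Reject non-positive widths; 0 means "valid". */
	return width > 0 ? 0 : -1;
}

/* Without a prefix this might have been called just program(). */
static int dc_example_block_program(int width)
{
	if (dc_example_block_validate(width))
		return -1;

	/* Stand-in for real hardware programming. */
	return width * 2;
}
```

With names like these, writing the single glob dc_example_block_* to the
tracefs set_ftrace_filter file would select every function of the file.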

Cc: Mario Limonciello 
Cc: Alex Deucher 
Cc: Harry Wentland 
Cc: Hamza Mahfooz 
Signed-off-by: Rodrigo Siqueira 
---
 .../amdgpu/display/display-contributing.rst   |  88 ++
 Documentation/gpu/amdgpu/display/index.rst|  12 +-
 drivers/gpu/drm/amd/display/TODO  | 110 --
 3 files changed, 95 insertions(+), 115 deletions(-)
 create mode 100644 Documentation/gpu/amdgpu/display/display-contributing.rst
 delete mode 100644 drivers/gpu/drm/amd/display/TODO

diff --git a/Documentation/gpu/amdgpu/display/display-contributing.rst b/Documentation/gpu/amdgpu/display/display-contributing.rst
new file mode 100644
index 000000000000..0247e0579fd4
--- /dev/null
+++ b/Documentation/gpu/amdgpu/display/display-contributing.rst
@@ -0,0 +1,88 @@
+.. _display_todos:
+
+==============================
+AMDGPU - Display Contributions
+==============================
+
+First of all, if you are here, you probably want to give some technical
+contribution to the display code, and for that, we say thank you :)
+
+This page summarizes some of the issues you can help with. This page follows
+the DRM way of creating a TODO list; for more information, check
+'Documentation/gpu/todo.rst'.
+
+Gitlab issues
+=============
+
+Users can report issues associated with AMD GPUs at:
+
+- https://gitlab.freedesktop.org/drm/amd
+
+Usually, we try to add a proper label to all new tickets to make it easy to
+filter issues. If you can reproduce any problem, you could help by adding more
+information or fixing the issue.
+
+Level: diverse
+
+IGT
+===
+
+`IGT`_ provides many integration tests that can be run on your GPU. We always
+want to pass a large set of tests to increase the test coverage in our CI. If
+you wish to contribute to the display code but are unsure where to start, we
+recommend you run all IGT tests and try to fix any failure you see on your
+hardware. Keep in mind that a failure can be an IGT problem or a kernel
+issue; it is necessary to analyze each case individually.
+
+Level: diverse
+
+.. _IGT: https://gitlab.freedesktop.org/drm/igt-gpu-tools
+
+Compilation
+===
+
+Fix compilation warnings
+------------------------
+
+Enable the W1 or W2 warning level in the kernel compilation and try to fix the
+issues on the display side.
+
+Level: Starter
+
+Code Refactor
+=============
+
+Add prefix to DC functions to improve the debug with ftrace
+-----------------------------------------------------------
+
+The Ftrace debug feature (check 'Documentation/trace/ftrace.rst') is a
+fantastic way to check the code path when developers are trying to make sense
+of a bug. Ftrace provides a filter mechanism that can be useful when the
+developer has some hunch about which part of the code can cause the issue; for
+this reason, if a set of functions has a proper prefix, it becomes easy to
+create a good filter. The DC code does not follow consistent prefix rules,
+which makes the Ftrace filter more complicated. If you want something simple to
+start contributing to the display, you can make patches for adding prefixes to
+DC functions. To create those prefixes, use part of the file name as a prefix
+for all functions in the target file. Check `amdgpu_dm_crtc.c` and
+`amdgpu_dm_plane.c` for some references.
+
+Level: Starter
+
+
+Try to move some of the sink handling code to DRM
+-------------------------------------------------
+
+When amdgpu was upstreamed for the first time, developers discussed how the AMD
+display code handles sinks. From that conversation, developers concluded that
+we should move some of that code to the DRM helpers/core in the long term.
+
+Level: Advanced
+
+Simplify DDC
+------------
+
+We can probably remove vector.c from dc/basics. It's used in DDC code which can
+probably be simplified enough to no longer need a vector implementation.
+
+Level: Advanced
diff --git a/Documentation/gpu/amdgpu/display/index.rst b/Documentation/gpu/amdgpu/display/index.rst
index 9d53a42c5339..25445a50121e 100644
--- a/Documentation/gpu/amdgpu/display/index.rst
+++ b/Documentation/gpu/amdgpu/display/index.rst
@@ -109,11 +109,12 @@ if possible.
 DC Workflow for a new feature
 -----------------------------
 
-When we enable a new feature in the DC, the entire development workflow happens
-on the amd-gfx mailing list. For example, when we enabled the PSR or the Replay
-feature, all the development happened on amd-gfx. When enabling a new feature,
-we just use promotion for extra validation in the latest patches by asking the
-quality team to test the current promotion together with the new patches.
+When an AMD developer enables a new feature in the DC, the entire development
+workflow happens on the amd-gfx mailing list. For example, when we enabled the
+PSR or the Replay feature, all the development happened on amd-gfx. When
+enabling a new 

[PATCH 4/8] Documentation/gpu: Add kernel doc entry for MPC

2023-10-20 Thread Rodrigo Siqueira
This commit adds a kernel-doc entry for the MPC block. Since it enabled
the kernel-doc to parse some of the documentation in the mpc.h file,
fixing some of the comments was required.
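
For reference, the comment shape that kernel-doc parses can be sketched as
below: a block opened with a slash and exactly two asterisks, a "struct name -
summary" line, and one "@member:" comment per field. The structure and function
here are hypothetical and exist only for this illustration; they are not part
of mpc.h.

```c
#include <stdbool.h>

/**
 * struct example_blend_cfg - Hypothetical blending configuration
 *
 * Illustrative only; this is not a structure from the DC code.
 */
struct example_blend_cfg {
	/**
	 * @alpha_mode: Selects how alpha is applied.
	 */
	int alpha_mode;

	/**
	 * @pre_multiplied_alpha:
	 *
	 * A longer member description goes on the lines that follow the
	 * member name.
	 */
	bool pre_multiplied_alpha;
};

/**
 * example_blend_cfg_init() - Fill a configuration with default values
 * @cfg: Configuration to initialize.
 *
 * Return: 0 on success, -1 if @cfg is NULL.
 */
static int example_blend_cfg_init(struct example_blend_cfg *cfg)
{
	if (!cfg)
		return -1;

	cfg->alpha_mode = 0;
	cfg->pre_multiplied_alpha = false;
	return 0;
}
```

A comment opened with three asterisks, or a member tagged with a doubled "@",
does not match this shape.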

Cc: Mario Limonciello 
Cc: Alex Deucher 
Cc: Harry Wentland 
Cc: Hamza Mahfooz 
Signed-off-by: Rodrigo Siqueira 
---
 .../gpu/amdgpu/display/dcn-blocks.rst |  12 +
 drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h   | 248 --
 2 files changed, 184 insertions(+), 76 deletions(-)

diff --git a/Documentation/gpu/amdgpu/display/dcn-blocks.rst b/Documentation/gpu/amdgpu/display/dcn-blocks.rst
index 83fc4a03113e..1a223f33202e 100644
--- a/Documentation/gpu/amdgpu/display/dcn-blocks.rst
+++ b/Documentation/gpu/amdgpu/display/dcn-blocks.rst
@@ -40,3 +40,15 @@ DPP
 
 .. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h
:internal:
+
+MPC
+---
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h
+   :doc: overview
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h
+   :export:
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h
+   :internal:
diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h b/drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h
index 61a2406dcc53..ae336c066f13 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h
@@ -23,13 +23,28 @@
  */
 
 /**
- * DOC: mpc-overview
+ * DOC: overview
  *
  * Multiple Pipe/Plane Combined (MPC) is a component in the hardware pipeline
  * that performs blending of multiple planes, using global and per-pixel alpha.
  * It also performs post-blending color correction operations according to the
  * hardware capabilities, such as color transformation matrix and gamma 1D and
  * 3D LUT.
+ *
+ * MPC receives output from all DPP pipes and combines them to multiple outputs
+ * supporting "M MPC inputs -> N MPC outputs" flexible composition
+ * architecture. It features:
+ *
+ * - Programmable blending structure to allow software controlled blending and
+ *   cascading;
+ * - Programmable window location of each DPP in active region of display;
+ * - Combining multiple DPP pipes in one active region when a single DPP pipe
+ *   cannot process a very large surface;
+ * - Combining multiple DPP from different SLS with blending;
+ * - Stereo formats from single DPP in top-bottom or side-by-side modes;
+ * - Stereo formats from 2 DPPs;
+ * - Alpha blending of multiple layers from different DPP pipes;
+ * - Programmable background color;
  */
 
 #ifndef __DC_MPCC_H__
@@ -83,34 +98,66 @@ enum mpcc_alpha_blend_mode {
 
 /**
  * struct mpcc_blnd_cfg - MPCC blending configuration
- *
- * @black_color: background color
- * @alpha_mode: alpha blend mode (MPCC_ALPHA_BLND_MODE)
- * @pre_multiplied_alpha: whether pixel color values were pre-multiplied by the
- * alpha channel (MPCC_ALPHA_MULTIPLIED_MODE)
- * @global_gain: used when blend mode considers both pixel alpha and plane
- * alpha value and assumes the global alpha value.
- * @global_alpha: plane alpha value
- * @overlap_only: whether overlapping of different planes is allowed
- * @bottom_gain_mode: blend mode for bottom gain setting
- * @background_color_bpc: background color for bpc
- * @top_gain: top gain setting
- * @bottom_inside_gain: blend mode for bottom inside
- * @bottom_outside_gain:  blend mode for bottom outside
  */
 struct mpcc_blnd_cfg {
-   struct tg_color black_color;/* background color */
-   enum mpcc_alpha_blend_mode alpha_mode;  /* alpha blend mode */
-   bool pre_multiplied_alpha;  /* alpha pre-multiplied mode flag */
+   /**
+* @black_color: background color.
+*/
+   struct tg_color black_color;
+
+   /**
+* @alpha_mode: alpha blend mode (MPCC_ALPHA_BLND_MODE).
+*/
+   enum mpcc_alpha_blend_mode alpha_mode;
+
+   /**
+* @pre_multiplied_alpha:
+*
+* Whether pixel color values were pre-multiplied by the alpha channel
+* (MPCC_ALPHA_MULTIPLIED_MODE).
+*/
+   bool pre_multiplied_alpha;
+
+   /**
+* @global_gain: Used when blend mode considers both pixel alpha and plane.
+*/
int global_gain;
+
+   /**
+* @global_alpha: Plane alpha value.
+*/
int global_alpha;
+
+   /**
+* @overlap_only: Whether overlapping of different planes is allowed.
+*/
bool overlap_only;
 
/* MPCC top/bottom gain settings */
+
+   /**
+* @bottom_gain_mode: Blend mode for bottom gain setting.
+*/
int bottom_gain_mode;
+
+   /**
+* @background_color_bpc: Background color for bpc.
+*/
int background_color_bpc;
+
+   /**
+* @top_gain: Top gain setting.
+*/
int top_gain;
+
+   /**
+* @bottom_inside_gain: Blend mode for bottom inside.
+*/
int bottom_inside_gain;
+
+   /**
+* @bottom_outside_gain: Blend mode for bottom 

[PATCH 6/8] Documentation/gpu: Add entry for the DIO component

2023-10-20 Thread Rodrigo Siqueira
Cc: Mario Limonciello 
Cc: Alex Deucher 
Cc: Harry Wentland 
Cc: Hamza Mahfooz 
Signed-off-by: Rodrigo Siqueira 
---
 Documentation/gpu/amdgpu/display/dcn-blocks.rst  | 12 
 .../gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h | 10 ++
 2 files changed, 22 insertions(+)

diff --git a/Documentation/gpu/amdgpu/display/dcn-blocks.rst b/Documentation/gpu/amdgpu/display/dcn-blocks.rst
index 5ba3c04c1db0..a3fbd3ea028b 100644
--- a/Documentation/gpu/amdgpu/display/dcn-blocks.rst
+++ b/Documentation/gpu/amdgpu/display/dcn-blocks.rst
@@ -64,3 +64,15 @@ OPP
 
 .. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/opp.h
:internal:
+
+DIO
+---
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h
+   :doc: overview
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h
+   :export:
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h
+   :internal:
diff --git a/drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h b/drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h
index f4633d3cf9b9..a1f72fe378ee 100644
--- a/drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h
+++ b/drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h
@@ -22,6 +22,16 @@
  * Authors: AMD
  *
  */
+
+/**
+ * DOC: overview
+ *
+ * Display Input Output (DIO) is the display input and output unit in DCN. It
+ * includes output encoders to support different display outputs, such as
+ * DisplayPort, HDMI, DVI, and others. It also includes the control
+ * and status channels for these interfaces.
+ */
+
 #ifndef __LINK_HWSS_DIO_H__
 #define __LINK_HWSS_DIO_H__
 
-- 
2.42.0



[PATCH 1/8] Documentation/gpu: Add basic page for HUBP

2023-10-20 Thread Rodrigo Siqueira
Create the HUBP documentation page and add the doc references to extract
the HUBP code documentation.

Cc: Mario Limonciello 
Cc: Alex Deucher 
Cc: Harry Wentland 
Cc: Hamza Mahfooz 
Signed-off-by: Rodrigo Siqueira 
---
 .../gpu/amdgpu/display/dcn-blocks.rst  | 18 ++
 Documentation/gpu/amdgpu/display/index.rst |  1 +
 drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h   | 13 -
 3 files changed, 31 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/gpu/amdgpu/display/dcn-blocks.rst

diff --git a/Documentation/gpu/amdgpu/display/dcn-blocks.rst b/Documentation/gpu/amdgpu/display/dcn-blocks.rst
new file mode 100644
index 000000000000..5da34d5b73d8
--- /dev/null
+++ b/Documentation/gpu/amdgpu/display/dcn-blocks.rst
@@ -0,0 +1,18 @@
+==========
+DCN Blocks
+==========
+
+In this section, you will find some extra details about some of the DCN blocks
+and the code documentation when it is automatically generated.
+
+HUBP
+----
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h
+   :doc: overview
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h
+   :export:
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h
+   :internal:
diff --git a/Documentation/gpu/amdgpu/display/index.rst b/Documentation/gpu/amdgpu/display/index.rst
index f8a4f53d70d8..b09d1434754d 100644
--- a/Documentation/gpu/amdgpu/display/index.rst
+++ b/Documentation/gpu/amdgpu/display/index.rst
@@ -28,5 +28,6 @@ table of content:
display-manager.rst
dc-debug.rst
dcn-overview.rst
+   dcn-blocks.rst
mpo-overview.rst
dc-glossary.rst
diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h b/drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h
index 7f3f9b69e903..dedc5370023e 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h
@@ -26,13 +26,24 @@
 #ifndef __DAL_HUBP_H__
 #define __DAL_HUBP_H__
 
+/**
+ * DOC: overview
+ *
+ * Display Controller Hub (DCHUB) is the gateway between the Scalable Data Port
+ * (SDP) and DCN. This component has multiple features, such as memory
+ * arbitration, rotation, and cursor manipulation.
+ *
+ * There is one HUBP allocated per pipe, which fetches data and converts
+ * different pixel formats (e.g., ARGB, NV12) into linear, interleaved
+ * and fixed-depth streams of pixel data.
+ */
+
 #include "mem_input.h"
 #include "cursor_reg_cache.h"
 
 #define OPP_ID_INVALID 0xf
 #define MAX_TTU 0xff
 
-
 enum cursor_pitch {
CURSOR_PITCH_64_PIXELS = 0,
CURSOR_PITCH_128_PIXELS,
-- 
2.42.0



[PATCH 3/8] Documentation/gpu: Add kernel doc entry for DPP

2023-10-20 Thread Rodrigo Siqueira
This commit introduces basic DPP information and the struct scan for
code documentation.

Cc: Mario Limonciello 
Cc: Alex Deucher 
Cc: Harry Wentland 
Cc: Hamza Mahfooz 
Signed-off-by: Rodrigo Siqueira 
---
 .../gpu/amdgpu/display/dcn-blocks.rst | 12 +
 drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h   | 26 +++
 2 files changed, 38 insertions(+)

diff --git a/Documentation/gpu/amdgpu/display/dcn-blocks.rst b/Documentation/gpu/amdgpu/display/dcn-blocks.rst
index e4e0a4ddca4e..83fc4a03113e 100644
--- a/Documentation/gpu/amdgpu/display/dcn-blocks.rst
+++ b/Documentation/gpu/amdgpu/display/dcn-blocks.rst
@@ -28,3 +28,15 @@ HUBP
 
 .. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h
:internal:
+
+DPP
+---
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h
+   :doc: overview
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h
+   :export:
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h
+   :internal:
diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h b/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h
index f4aa76e02518..2c40e253b14e 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h
@@ -27,6 +27,32 @@
 #ifndef __DAL_DPP_H__
 #define __DAL_DPP_H__
 
+/**
+ * DOC: overview
+ *
+ * The DPP (Display Pipe and Plane) block is the unified display data
+ * processing engine in DCN for processing graphic or video data on a per-DPP
+ * rectangle basis. This rectangle can be a part of an SLS (Single Large
+ * Surface), a layer to be blended with other DPPs, or a rectangle associated
+ * with a display tile.
+ *
+ * It provides various functions including:
+ * - graphic color keyer
+ * - graphic cursor compositing
+ * - graphic or video image source to destination scaling
+ * - image sharpening
+ * - video format conversion from 4:2:0 or 4:2:2 to 4:4:4
+ * - Color Space Conversion
+ * - Host LUT gamma adjustment
+ * - Color Gamut Remap
+ * - brightness and contrast adjustment.
+ *
+ * The DPP pipe consists of the Converter and Cursor (CNVC), Scaler (DSCL),
+ * Color Management (CM), Output Buffer (OBUF) and Digital Bypass (DPB) modules
+ * connected in a video/graphics pipeline.
+ */
+
+
 #include "transform.h"
 #include "cursor_reg_cache.h"
 
-- 
2.42.0



[PATCH 0/8] Expand and improve AMDGPU documentation

2023-10-20 Thread Rodrigo Siqueira
This patchset improves how the AMDGPU display documentation is
organized, expands the kernel-doc to extract information from the source,
and adds more context about the DC workflow. Finally, at the end of this
series, we also introduce a contribution section for those interested in
contributing to the display code.

Thanks
Siqueira

Rodrigo Siqueira (8):
  Documentation/gpu: Add basic page for HUBP
  Documentation/gpu: Add simple doc page for DCHUBBUB
  Documentation/gpu: Add kernel doc entry for DPP
  Documentation/gpu: Add kernel doc entry for MPC
  Documentation/gpu: Add entry for OPP in the kernel doc
  Documentation/gpu: Add entry for the DIO component
  Documentation/gpu: Add an explanation about the DC weekly patches
  Documentation/gpu: Introduce a simple contribution list for display
code

 .../gpu/amdgpu/display/dcn-blocks.rst |  78 ++
 .../amdgpu/display/display-contributing.rst   |  88 +++
 Documentation/gpu/amdgpu/display/index.rst| 114 +++-
 drivers/gpu/drm/amd/display/TODO  | 110 
 .../gpu/drm/amd/display/dc/inc/hw/dchubbub.h  |   6 +
 drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h   |  26 ++
 drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h  |  13 +-
 drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h   | 248 --
 drivers/gpu/drm/amd/display/dc/inc/hw/opp.h   |  16 ++
 .../amd/display/dc/link/hwss/link_hwss_dio.h  |  10 +
 10 files changed, 520 insertions(+), 189 deletions(-)
 create mode 100644 Documentation/gpu/amdgpu/display/dcn-blocks.rst
 create mode 100644 Documentation/gpu/amdgpu/display/display-contributing.rst
 delete mode 100644 drivers/gpu/drm/amd/display/TODO

-- 
2.42.0



[PATCH 2/8] Documentation/gpu: Add simple doc page for DCHUBBUB

2023-10-20 Thread Rodrigo Siqueira
Enable the documentation to extract code documentation from the dchubbub.h
file.

Cc: Mario Limonciello 
Cc: Alex Deucher 
Cc: Harry Wentland 
Cc: Hamza Mahfooz 
Signed-off-by: Rodrigo Siqueira 
---
 Documentation/gpu/amdgpu/display/dcn-blocks.rst  | 12 
 drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h |  6 ++
 2 files changed, 18 insertions(+)

diff --git a/Documentation/gpu/amdgpu/display/dcn-blocks.rst b/Documentation/gpu/amdgpu/display/dcn-blocks.rst
index 5da34d5b73d8..e4e0a4ddca4e 100644
--- a/Documentation/gpu/amdgpu/display/dcn-blocks.rst
+++ b/Documentation/gpu/amdgpu/display/dcn-blocks.rst
@@ -5,6 +5,18 @@ DCN Blocks
 In this section, you will find some extra details about some of the DCN blocks
 and the code documentation when it is automatically generated.
 
+DCHUBBUB
+--------
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h
+   :doc: overview
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h
+   :export:
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h
+   :internal:
+
 HUBP
 ----
 
diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h b/drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h
index cea05843990c..580a9f3f07c0 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h
@@ -26,6 +26,12 @@
 #ifndef __DAL_DCHUBBUB_H__
 #define __DAL_DCHUBBUB_H__
 
+/**
+ * DOC: overview
+ *
+ * There is only one common DCHUBBUB. It contains the common request and return
+ * blocks for the Data Fabric Interface that are not clock/power gated.
+ */
 
 enum dcc_control {
dcc_control__256_256_xxx,
-- 
2.42.0



Re: [PATCH v3] drm/amdkfd: Use partial mapping in GPU page faults

2023-10-20 Thread Philip Yang

  


On 2023-10-20 17:53, Xiaogang.Chen wrote:


  From: Xiaogang Chen 

After a partial migration to recover from a GPU page fault, map only the
page range that was migrated into the GPU VM space, instead of mapping all
pages of the SVM range in which the page fault happened.

Signed-off-by: Xiaogang Chen

Reviewed-by: Philip Yang

  
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 29 
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 54af7a2b29f8..3a71d04779b1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1619,6 +1619,7 @@ static void *kfd_svm_page_owner(struct kfd_process *p, int32_t gpuidx)
  * 5. Release page table (and SVM BO) reservation
  */
 static int svm_range_validate_and_map(struct mm_struct *mm,
+  unsigned long map_start, unsigned long map_last,
   struct svm_range *prange, int32_t gpuidx,
   bool intr, bool wait, bool flush_tlb)
 {
@@ -1699,6 +1700,8 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
 	end = (prange->last + 1) << PAGE_SHIFT;
 	for (addr = start; !r && addr < end; ) {
 		struct hmm_range *hmm_range;
+		unsigned long map_start_vma;
+		unsigned long map_last_vma;
 		struct vm_area_struct *vma;
 		uint64_t vram_pages_vma;
 		unsigned long next = 0;
@@ -1747,9 +1750,16 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
 			r = -EAGAIN;
 		}
 
-		if (!r)
-			r = svm_range_map_to_gpus(prange, offset, npages, readonly,
-		  ctx->bitmap, wait, flush_tlb);
+		if (!r) {
+			map_start_vma = max(map_start, prange->start + offset);
+			map_last_vma = min(map_last, prange->start + offset + npages - 1);
+			if (map_start_vma <= map_last_vma) {
+				offset = map_start_vma - prange->start;
+				npages = map_last_vma - map_start_vma + 1;
+				r = svm_range_map_to_gpus(prange, offset, npages, readonly,
+							  ctx->bitmap, wait, flush_tlb);
+			}
+		}
 
 		if (!r && next == end)
 			prange->mapped_to_gpu = true;
@@ -1855,8 +1865,8 @@ static void svm_range_restore_work(struct work_struct *work)
 		 */
 		mutex_lock(&prange->migrate_mutex);
 
-		r = svm_range_validate_and_map(mm, prange, MAX_GPU_INSTANCE,
-	   false, true, false);
+		r = svm_range_validate_and_map(mm, prange->start, prange->last, prange,
+	   MAX_GPU_INSTANCE, false, true, false);
 		if (r)
 			pr_debug("failed %d to map 0x%lx to gpus\n", r,
  prange->start);
@@ -3069,6 +3079,8 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
 	kfd_smi_event_page_fault_start(node, p->lead_thread->pid, addr,
    write_fault, timestamp);
 
+	start = prange->start;
+	last = prange->last;
 	if (prange->actual_loc != 0 || best_loc != 0) {
 		migration = true;
 		/* Align migration range start and size to granularity size */
@@ -3102,10 +3114,11 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
 		}
 	}
 
-	r = svm_range_validate_and_map(mm, prange, gpuidx, false, false, false);
+	r = svm_range_validate_and_map(mm, start, last, prange, gpuidx, false,
+   false, false);
 	if (r)
 		pr_debug("failed %d to map svms 0x%p [0x%lx 0x%lx] to gpus\n",
-			 r, svms, prange->start, prange->last);
+			 r, svms, start, last);
 
 	kfd_smi_event_page_fault_end(node, p->lead_thread->pid, addr,
  migration);
@@ -3650,7 +3663,7 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
 
 		flush_tlb = !migrated && update_mapping && prange->mapped_to_gpu;
 
-		r = svm_range_validate_and_map(mm, prange, MAX_GPU_INSTANCE,
+		r = svm_range_validate_and_map(mm, prange->start, prange->last, prange, MAX_GPU_INSTANCE,
 	   true, true, flush_tlb);
 		if (r)
 			pr_debug("failed %d to map svm range\n", r);


  



[PATCH v3] drm/amdkfd: Use partial mapping in GPU page faults

2023-10-20 Thread Xiaogang . Chen
From: Xiaogang Chen 

After a partial migration to recover from a GPU page fault, map only the
page range that was migrated into the GPU VM space, instead of mapping all
pages of the SVM range in which the page fault happened.

Signed-off-by: Xiaogang Chen
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 29 
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 54af7a2b29f8..3a71d04779b1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1619,6 +1619,7 @@ static void *kfd_svm_page_owner(struct kfd_process *p, 
int32_t gpuidx)
  * 5. Release page table (and SVM BO) reservation
  */
 static int svm_range_validate_and_map(struct mm_struct *mm,
+ unsigned long map_start, unsigned long 
map_last,
  struct svm_range *prange, int32_t gpuidx,
  bool intr, bool wait, bool flush_tlb)
 {
@@ -1699,6 +1700,8 @@ static int svm_range_validate_and_map(struct mm_struct 
*mm,
end = (prange->last + 1) << PAGE_SHIFT;
for (addr = start; !r && addr < end; ) {
struct hmm_range *hmm_range;
+   unsigned long map_start_vma;
+   unsigned long map_last_vma;
struct vm_area_struct *vma;
uint64_t vram_pages_vma;
unsigned long next = 0;
@@ -1747,9 +1750,16 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
 			r = -EAGAIN;
 		}
 
-		if (!r)
-			r = svm_range_map_to_gpus(prange, offset, npages, readonly,
-						  ctx->bitmap, wait, flush_tlb);
+		if (!r) {
+			map_start_vma = max(map_start, prange->start + offset);
+			map_last_vma = min(map_last, prange->start + offset + npages - 1);
+			if (map_start_vma <= map_last_vma) {
+				offset = map_start_vma - prange->start;
+				npages = map_last_vma - map_start_vma + 1;
+				r = svm_range_map_to_gpus(prange, offset, npages, readonly,
+							  ctx->bitmap, wait, flush_tlb);
+			}
+		}
 
 		if (!r && next == end)
 			prange->mapped_to_gpu = true;
@@ -1855,8 +1865,8 @@ static void svm_range_restore_work(struct work_struct *work)
 		 */
 		mutex_lock(&prange->migrate_mutex);
 
-		r = svm_range_validate_and_map(mm, prange, MAX_GPU_INSTANCE,
-					       false, true, false);
+		r = svm_range_validate_and_map(mm, prange->start, prange->last, prange,
+					       MAX_GPU_INSTANCE, false, true, false);
 		if (r)
 			pr_debug("failed %d to map 0x%lx to gpus\n", r,
 				 prange->start);
@@ -3069,6 +3079,8 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
 	kfd_smi_event_page_fault_start(node, p->lead_thread->pid, addr,
 				       write_fault, timestamp);
 
+	start = prange->start;
+	last = prange->last;
 	if (prange->actual_loc != 0 || best_loc != 0) {
 		migration = true;
 		/* Align migration range start and size to granularity size */
@@ -3102,10 +3114,11 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
 		}
 	}
 
-	r = svm_range_validate_and_map(mm, prange, gpuidx, false, false, false);
+	r = svm_range_validate_and_map(mm, start, last, prange, gpuidx, false,
+				       false, false);
 	if (r)
 		pr_debug("failed %d to map svms 0x%p [0x%lx 0x%lx] to gpus\n",
-			 r, svms, prange->start, prange->last);
+			 r, svms, start, last);
 
 	kfd_smi_event_page_fault_end(node, p->lead_thread->pid, addr,
 				     migration);
@@ -3650,7 +3663,7 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
 
 		flush_tlb = !migrated && update_mapping && prange->mapped_to_gpu;
 
-		r = svm_range_validate_and_map(mm, prange, MAX_GPU_INSTANCE,
+		r = svm_range_validate_and_map(mm, prange->start, prange->last, prange, MAX_GPU_INSTANCE,
 					       true, true, flush_tlb);
 		if (r)
 			pr_debug("failed %d to map svm range\n", r);
-- 
2.25.1
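
The range clamping done in the patch above can be sketched on its own as
below. This is only an illustrative userspace model, not driver code: the
helper name clamp_chunk_to_map_range and its parameter names are
hypothetical, and it assumes npages is at least 1. It intersects the
requested page range [map_start, map_last] with one chunk of the svm range
and rewrites offset/npages to cover just the overlap, which is what the new
max()/min() lines do before calling svm_range_map_to_gpus.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not part of the patch): intersect the requested
 * page range [map_start, map_last] with the chunk that begins at page
 * prange_start + *offset and spans *npages pages (assumed >= 1).
 *
 * On overlap, *offset and *npages are rewritten to describe only the
 * overlapping pages and true is returned. With no overlap, false is
 * returned and both values are left untouched.
 */
static bool clamp_chunk_to_map_range(unsigned long prange_start,
				     unsigned long map_start,
				     unsigned long map_last,
				     unsigned long *offset,
				     unsigned long *npages)
{
	unsigned long chunk_start = prange_start + *offset;
	unsigned long chunk_last = chunk_start + *npages - 1;
	unsigned long start = map_start > chunk_start ? map_start : chunk_start;
	unsigned long last = map_last < chunk_last ? map_last : chunk_last;

	if (start > last)
		return false;	/* chunk lies entirely outside the request */

	*offset = start - prange_start;
	*npages = last - start + 1;
	return true;
}
```

Callers that pass prange->start and prange->last as the requested range get
the old behaviour, since the intersection is then the whole chunk.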



Re: [PATCH v2 0/2] Fix issues caused by DML2 in ASICs older than DCN35

2023-10-20 Thread Rodrigo Siqueira Jordao




On 10/20/23 15:38, Harry Wentland wrote:
> On 2023-10-20 17:26, Rodrigo Siqueira wrote:
>> The first commit of this series just sets the variable using_dml2 to
>> false for all ASICs that do not require it. The second commit adds a fix
>> to the DC sequence that calls a DML2 operation in ASICs that do not
>> use it.
>>
>> Cc: Vitaly Prosyak
>> Cc: Roman Li
>> Cc: Qingqing Zhuo
>> Cc: Daniel Wheeler
>> Cc: Alex Deucher
>>
>
> Didn't realize this would make the change much bigger but I think
> it'll be more consistent in the long-term.
>
> Reviewed-by: Harry Wentland

Thanks Harry

> Harry
>
>> Rodrigo Siqueira (2):
>>   drm/amd/display: Set the DML2 attribute to false in all DCNs older
>>     than version 3.5
>>   drm/amd/display: Fix DMUB errors introduced by DML2
>>
>>  drivers/gpu/drm/amd/display/dc/core/dc_resource.c   | 9 +
>>  drivers/gpu/drm/amd/display/dc/dcn10/dcn10_resource.c   | 1 +
>>  drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c   | 1 +
>>  drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c | 1 +
>>  drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c   | 1 +
>>  drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c   | 1 +
>>  drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c | 3 ++-
>>  drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c | 1 +
>>  drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c | 1 +
>>  drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c   | 1 +
>>  drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c | 3 ++-
>>  drivers/gpu/drm/amd/display/dc/dcn315/dcn315_resource.c | 1 +
>>  drivers/gpu/drm/amd/display/dc/dcn316/dcn316_resource.c | 1 +
>>  drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c | 1 +
>>  14 files changed, 20 insertions(+), 6 deletions(-)







Re: [PATCH v2 0/2] Fix issues caused by DML2 in ASICs older than DCN35

2023-10-20 Thread Harry Wentland
On 2023-10-20 17:26, Rodrigo Siqueira wrote:
> The first commit of this series just sets the variable using_dml2 to
> false for all ASICs that do not require it. The second commit adds a fix
> to the DC sequence that calls a DML2 operation in ASICs that do not
> use it.
> 
> Cc: Vitaly Prosyak 
> Cc: Roman Li 
> Cc: Qingqing Zhuo 
> Cc: Daniel Wheeler 
> Cc: Alex Deucher 
> 

Didn't realize this would make the change much bigger but I think
it'll be more consistent in the long-term.

Reviewed-by: Harry Wentland 

Harry

> Rodrigo Siqueira (2):
>   drm/amd/display: Set the DML2 attribute to false in all DCNs older
> than version 3.5
>   drm/amd/display: Fix DMUB errors introduced by DML2
> 
>  drivers/gpu/drm/amd/display/dc/core/dc_resource.c   | 9 +
>  drivers/gpu/drm/amd/display/dc/dcn10/dcn10_resource.c   | 1 +
>  drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c   | 1 +
>  drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c | 1 +
>  drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c   | 1 +
>  drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c   | 1 +
>  drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c | 3 ++-
>  drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c | 1 +
>  drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c | 1 +
>  drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c   | 1 +
>  drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c | 3 ++-
>  drivers/gpu/drm/amd/display/dc/dcn315/dcn315_resource.c | 1 +
>  drivers/gpu/drm/amd/display/dc/dcn316/dcn316_resource.c | 1 +
>  drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c | 1 +
>  14 files changed, 20 insertions(+), 6 deletions(-)
> 



[PATCH v2 1/2] drm/amd/display: Set the DML2 attribute to false in all DCNs older than version 3.5

2023-10-20 Thread Rodrigo Siqueira
When DML2 was introduced, it targeted only new DCN versions. To control
which ASICs use this new version of DML, the using_dml2 attribute was
introduced. To avoid ambiguities, this commit explicitly sets using_dml2
to false in all ASICs that do not support DML2.

Cc: Vitaly Prosyak 
Cc: Roman Li 
Cc: Qingqing Zhuo 
Cc: Daniel Wheeler 
Cc: Alex Deucher 
Signed-off-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_resource.c   | 1 +
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c   | 1 +
 drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c | 1 +
 drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c   | 1 +
 drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c   | 1 +
 drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c | 3 ++-
 drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c | 1 +
 drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c | 1 +
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c   | 1 +
 drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c | 3 ++-
 drivers/gpu/drm/amd/display/dc/dcn315/dcn315_resource.c | 1 +
 drivers/gpu/drm/amd/display/dc/dcn316/dcn316_resource.c | 1 +
 drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c | 1 +
 13 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_resource.c
index d1d8e904346e..b94c5c97eee7 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_resource.c
@@ -554,6 +554,7 @@ static const struct dc_debug_options debug_defaults_drv = {
.max_downscale_src_width = 3840,
.underflow_assert_delay_us = 0xFFFFFFFF,
.enable_legacy_fast_update = true,
+   .using_dml2 = false,
 };
 
 static const struct dc_debug_options debug_defaults_diags = {
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index 7eda4bbcd8ac..0a422fbb14bc 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -723,6 +723,7 @@ static const struct dc_debug_options debug_defaults_drv = {
.sanity_checks = false,
.underflow_assert_delay_us = 0xFFFFFFFF,
.enable_legacy_fast_update = true,
+   .using_dml2 = false,
 };
 
 void dcn20_dpp_destroy(struct dpp **dpp)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c
index a11b2f6afe4a..bca22d867696 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c
@@ -614,6 +614,7 @@ static const struct dc_debug_options debug_defaults_drv = {
.underflow_assert_delay_us = 0xFFFFFFFF,
.enable_tri_buf = false,
.enable_legacy_fast_update = true,
+   .using_dml2 = false,
 };
 
 static void dcn201_dpp_destroy(struct dpp **dpp)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
index 58a0d37e9523..42277b280586 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
@@ -654,6 +654,7 @@ static const struct dc_debug_options debug_defaults_drv = {
.dmub_command_table = true,
.use_max_lb = true,
.enable_legacy_fast_update = true,
+   .using_dml2 = false,
 };
 
 static const struct dc_panel_config panel_config_defaults = {
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
index 473581cff06b..7b259cb5f418 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
@@ -729,6 +729,7 @@ static const struct dc_debug_options debug_defaults_drv = {
.use_max_lb = true,
.exit_idle_opt_for_cursor_updates = true,
.enable_legacy_fast_update = false,
+   .using_dml2 = false,
 };
 
 static const struct dc_panel_config panel_config_defaults = {
diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
index b4b3b52990b9..f3b75f283aa2 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
@@ -701,7 +701,8 @@ static const struct dc_debug_options debug_defaults_drv = {
.dwb_fi_phase = -1, // -1 = disable
.dmub_command_table = true,
.use_max_lb = false,
-   .exit_idle_opt_for_cursor_updates = true
+   .exit_idle_opt_for_cursor_updates = true,
+   .using_dml2 = false,
 };
 
 static void dcn301_dpp_destroy(struct dpp **dpp)
diff --git 
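
As background for the "avoid ambiguities" wording in the commit message
above: in standard C, members left out of a designated initializer are
zero-initialized, so an omitted bool already reads as false. The small
sketch below illustrates this with a made-up struct; debug_options_example
and both variable names are hypothetical and only stand in for
dc_debug_options. Under that reading, the explicit ".using_dml2 = false"
lines document intent rather than change the value.

```c
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for struct dc_debug_options. */
struct debug_options_example {
	bool enable_legacy_fast_update;
	bool using_dml2;
};

/* using_dml2 is omitted here, so C zero-initializes it to false. */
static const struct debug_options_example implicit_defaults = {
	.enable_legacy_fast_update = true,
};

/* Same value, but the intent is spelled out for the reader. */
static const struct debug_options_example explicit_defaults = {
	.enable_legacy_fast_update = true,
	.using_dml2 = false,
};
```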

[PATCH v2 2/2] drm/amd/display: Fix DMUB errors introduced by DML2

2023-10-20 Thread Rodrigo Siqueira
When DML 2 was introduced, it changed part of the generic sequence of
DC, which caused issues on previous DCNs with DMUB support. This commit
ensures the new sequence only works for new DCNs from 3.5 and above.

Changes since V1:
- Harry: Use the attribute using_dml2 instead of checking the DCN version.

Cc: Vitaly Prosyak 
Cc: Roman Li 
Cc: Qingqing Zhuo 
Cc: Daniel Wheeler 
Cc: Alex Deucher 
Fixes: 7966f319c66d ("drm/amd/display: Introduce DML2")
Signed-off-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 97f402123fbb..f9e472f08e21 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -321,10 +321,11 @@ struct resource_pool *dc_create_resource_pool(struct dc  *dc,
 				res_pool->ref_clocks.xtalin_clock_inKhz;
 			res_pool->ref_clocks.dchub_ref_clock_inKhz =
 				res_pool->ref_clocks.xtalin_clock_inKhz;
-			if (res_pool->hubbub && res_pool->hubbub->funcs->get_dchub_ref_freq)
-				res_pool->hubbub->funcs->get_dchub_ref_freq(res_pool->hubbub,
-					res_pool->ref_clocks.dccg_ref_clock_inKhz,
-					&res_pool->ref_clocks.dchub_ref_clock_inKhz);
+			if (dc->debug.using_dml2)
+				if (res_pool->hubbub && res_pool->hubbub->funcs->get_dchub_ref_freq)
+					res_pool->hubbub->funcs->get_dchub_ref_freq(res_pool->hubbub,
+						res_pool->ref_clocks.dccg_ref_clock_inKhz,
+						&res_pool->ref_clocks.dchub_ref_clock_inKhz);
 		} else
 			ASSERT_CRITICAL(false);
 	}
-- 
2.42.0
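
The gating introduced by the patch above can be modelled outside the driver
roughly as follows. This is a sketch under stated assumptions, not the DC
code: every type and function here (hubbub_example, ref_clocks_example,
init_ref_clocks, example_get_dchub_ref_freq) is hypothetical and simplified.
It shows the pattern only: defaults are taken from the crystal clock first,
and the optional hubbub callback is consulted only when the using_dml2 flag
is set.

```c
#include <stdbool.h>
#include <stddef.h>

struct hubbub_example;

/* Hypothetical callback table; the callback may be NULL. */
struct hubbub_funcs_example {
	void (*get_dchub_ref_freq)(struct hubbub_example *hubbub,
				   unsigned int dccg_khz,
				   unsigned int *dchub_khz);
};

struct hubbub_example {
	const struct hubbub_funcs_example *funcs;
};

struct ref_clocks_example {
	unsigned int xtalin_khz;
	unsigned int dccg_khz;
	unsigned int dchub_khz;
};

/* Made-up callback for illustration: reports half the DCCG clock. */
static void example_get_dchub_ref_freq(struct hubbub_example *hubbub,
				       unsigned int dccg_khz,
				       unsigned int *dchub_khz)
{
	(void)hubbub;
	*dchub_khz = dccg_khz / 2;
}

static void init_ref_clocks(struct ref_clocks_example *clk,
			    struct hubbub_example *hubbub,
			    bool using_dml2)
{
	/* Start from the crystal clock, as the driver does. */
	clk->dccg_khz = clk->xtalin_khz;
	clk->dchub_khz = clk->xtalin_khz;

	/* Only configurations that use DML2 query the hardware here. */
	if (using_dml2 && hubbub && hubbub->funcs->get_dchub_ref_freq)
		hubbub->funcs->get_dchub_ref_freq(hubbub, clk->dccg_khz,
						  &clk->dchub_khz);
}
```

Keying the check on a capability flag rather than on a version number keeps
the condition next to the feature it protects, which matches the review
feedback noted in the changelog.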



[PATCH v2 0/2] Fix issues caused by DML2 in ASICs older than DCN35

2023-10-20 Thread Rodrigo Siqueira
The first commit of this series just sets the variable using_dml2 to
false for all ASICs that do not require it. The second commit adds a fix
to the DC sequence that calls a DML2 operation in ASICs that do not
use it.

Cc: Vitaly Prosyak 
Cc: Roman Li 
Cc: Qingqing Zhuo 
Cc: Daniel Wheeler 
Cc: Alex Deucher 

Rodrigo Siqueira (2):
  drm/amd/display: Set the DML2 attribute to false in all DCNs older
than version 3.5
  drm/amd/display: Fix DMUB errors introduced by DML2

 drivers/gpu/drm/amd/display/dc/core/dc_resource.c   | 9 +
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_resource.c   | 1 +
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c   | 1 +
 drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c | 1 +
 drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c   | 1 +
 drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c   | 1 +
 drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c | 3 ++-
 drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c | 1 +
 drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c | 1 +
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c   | 1 +
 drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c | 3 ++-
 drivers/gpu/drm/amd/display/dc/dcn315/dcn315_resource.c | 1 +
 drivers/gpu/drm/amd/display/dc/dcn316/dcn316_resource.c | 1 +
 drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c | 1 +
 14 files changed, 20 insertions(+), 6 deletions(-)

-- 
2.42.0



[pull] amdgpu, amdkfd drm-next-6.7

2023-10-20 Thread Alex Deucher
Hi Dave, Sima,

More updates for 6.7.  Mostly bug fixes.

The following changes since commit 27442758e9b4e083bef3f164a1739475c01f3202:

  Merge tag 'amd-drm-next-6.7-2023-10-13' of https://gitlab.freedesktop.org/agd5f/linux into drm-next (2023-10-18 16:08:07 +1000)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-next-6.7-2023-10-20

for you to fetch changes up to 5b2c54e0d0ea09f7a3b500510731878326e1117e:

  drm/amd/display: Fix stack size issue on DML2 (2023-10-20 15:11:29 -0400)


amd-drm-next-6.7-2023-10-20:

amdgpu:
- SMU 13 updates
- UMSCH updates
- DC MPO fixes
- RAS updates
- MES 11 fixes
- Fix possible memory leaks in error paths
- GC 11.5 fixes
- Kernel doc updates
- PSP updates
- APU IMU fixes
- Misc code cleanups
- SMU 11 fixes
- OD fix
- Frame size warning fixes
- SR-IOV fixes
- NBIO 7.11 updates
- NBIO 7.7 updates
- XGMI fixes
- devcoredump updates

amdkfd:
- Misc code cleanups
- SVM fixes


Alex Deucher (3):
  drm/amdgpu/pm: update SMU 13.0.0 PMFW version check
  drm/amdgpu/mes11: remove aggregated doorbell code
  drm/amdgpu: update to the latest GC 11.5 headers

Alex Sierra (1):
  drm/amdkfd: remap unaligned svm ranges that have split

André Almeida (3):
  drm/amdgpu: Encapsulate all device reset info
  drm/amdgpu: Move coredump code to amdgpu_reset file
  drm/amdgpu: Create version number for coredumps

Asad Kamal (2):
  drm/amdgpu : Add hive ras recovery check
  drm/amdgpu: update retry times for psp BL wait

Bas Nieuwenhuizen (1):
  drm/amd/pm: Handle non-terminated overdrive commands.

Bokun Zhang (4):
  drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P1
  drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P2
  drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P3
  drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P4

Candice Li (1):
  drm/amdgpu: Log UE corrected by replay as correctable error

Colin Ian King (1):
  drm/amd/display: Fix a handful of spelling mistakes in dml_print output

Felix Kuehling (2):
  drm/amdgpu: Fix possible null pointer dereference
  drm/amdgpu: Reserve fences for VM update

Hawking Zhang (2):
  drm/amdgpu: Enable software RAS in vcn v4_0_3
  drm/amdgpu: Add UVD_VCPU_INT_EN2 to dpg sram

Jesse Zhang (1):
  drm/amdkfd:remove unused code

Jiapeng Chong (2):
  drm/amdkfd: clean up some inconsistent indenting
  drm/amd/display: clean up some inconsistent indenting

Kunwu.Chan (1):
  drm/amd/pm: Fix a memory leak on an error path

Lang Yu (1):
  drm/amdgpu/umsch: add suspend and resume callback

Li Ma (2):
  drm/amdgpu: fix missing stuff in NBIO v7.11
  drm/amdgpu: add clockgating support for NBIO v7.7.1

Ma Jun (1):
  drm/amd/pm: Support for getting power1_cap_min value

Mangesh Gadre (1):
  Revert "drm/amdgpu: Program xcp_ctl registers as needed"

Mario Limonciello (4):
  drm/amd: Add missing kernel doc for prepare_suspend()
  drm/amd: Move microcode init step to early_init()
  drm/amd: Don't parse IMU ucode version if it won't be loaded
  drm/amd: Read IMU FW version from scratch register during hw_init

Nathan Chancellor (1):
  drm/amd/display: Respect CONFIG_FRAME_WARN=0 in DML2

Rodrigo Siqueira (2):
  drm/amd/display: Reduce stack size by splitting function
  drm/amd/display: Fix stack size issue on DML2

Shiwu Zhang (3):
  drm/amdgpu: update the xgmi ta interface header
  drm/amdgpu: prepare the output buffer for GET_PEER_LINKS command
  drm/amdgpu: support the port num info based on the capability flag

Stanley.Yang (4):
  drm/amdgpu: Workaround to skip kiq ring test during ras gpu recovery
  drm/amdgpu: Enable mca debug mode mode when ras enabled
  drm/amdgpu: Fix delete nodes that have been relesed
  drm/amdgpu: Enable RAS feature by default for APU

Stylon Wang (2):
  drm/amd/display: Add missing lines of code in dc.c
  drm/amd/display: Remove brackets in macro to conform to coding style

Tao Zhou (4):
  drm/amdgpu: define ras_reset_error_count function
  drm/amdgpu: replace reset_error_count with amdgpu_ras_reset_error_count
  drm/amdgpu: add set/get mca debug mode operations
  drm/amdgpu: drop status query/reset for GCEA 9.4.3 and MMEA 1.8

Yang Li (4):
  drm/amd/display: clean up some inconsistent indentings
  drm/amd/display: Remove duplicated include in dce110_hwseq.c
  drm/amd/display: Remove unneeded semicolon
  drm/amd/display: Simplify bool conversion

Yang Wang (1):
  drm/amdgpu: fix typo for amdgpu ras error data print

 drivers/gpu/drm/amd/amdgpu/amdgpu.h|   25 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c|   10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   96 +-
 

Re: [PATCH 1/2] drm/amdgpu: Add timeout for sync wait

2023-10-20 Thread Felix Kuehling



On 2023-10-20 09:10, Christian König wrote:
> No, the wait forever is what is expected and perfectly valid user
> experience.
>
> Waiting with a timeout on the other hand sounds like a really bad idea
> to me.
>
> Every wait with a timeout needs a justification, for example that
> userspace explicitly specified it. And I absolutely don't see that here.

In this case the wait is in a kernel worker thread, and the wait is not
interruptible. Not having a timeout means you can have a kernel worker
stuck forever. The restore worker also has retry logic already, so it
can handle a timeout perfectly well. But maybe this shouldn't be done
automatically for all callers of amdgpu_sync_wait, but only for this
particular caller in the restore_process_worker. So we'd need to add a
timeout parameter to amdgpu_sync_wait.


Regards,
  Felix




Regards,
Christian.

On 20.10.23 at 10:52, Deng, Emily wrote:

[AMD Official Use Only - General]

Hi Christian,
  The issue shows up when running a compute hang test with quark, which
triggers a compute job timeout. For compute, the timeout setting is 60s,
but for gfx and sdma, it is 10s.

So taking the timeout from the scheduler is reasonable, since it differs
between schedulers. And if the wait times out, an error is printed, so
real issues won't be hidden. Even when there is a real issue, waiting
forever is a bad user experience, and the driver can't work anymore.


Emily Deng
Best Wishes




-Original Message-
From: Christian König 
Sent: Friday, October 20, 2023 3:29 PM
To: Deng, Emily; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/2] drm/amdgpu: Add timeout for sync wait

On 20.10.23 at 08:13, Emily Deng wrote:

Issue: A hang happens during GPU recovery; the call sequence is as below:

amdgpu_device_gpu_recover->amdgpu_amdkfd_pre_reset->flush_delayed_work
-> amdgpu_amdkfd_gpuvm_restore_process_bos->amdgpu_sync_wait

This is because amdgpu_sync_wait is waiting for the bad job's fence and
never returns, so the recovery cannot continue.


Signed-off-by: Emily Deng 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 11 +--
   1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index dcd8c066bc1f..6253d6aab7f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -406,8 +406,15 @@ int amdgpu_sync_wait(struct amdgpu_sync *sync, bool intr)
  	int i, r;

  	hash_for_each_safe(sync->fences, i, tmp, e, node) {
-		r = dma_fence_wait(e->fence, intr);
-		if (r)
+		struct drm_sched_fence *s_fence = to_drm_sched_fence(e->fence);
+		long timeout = msecs_to_jiffies(1);

That handling doesn't make much sense. If you need a timeout then you need
a timeout for the whole function.

In addition to that, timeouts often just hide real problems which need
fixing.

So this here needs a much better justification, otherwise it's a pretty
clear NAK.

Regards,
Christian.

+
+		if (s_fence)
+			timeout = s_fence->sched->timeout;
+
+		if (r == 0)
+			r = -ETIMEDOUT;
+		if (r < 0)
  			return r;

  		amdgpu_sync_entry_free(e);
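
The review point about needing "a timeout for the whole function" can be
illustrated with a small userspace model. Everything below is hypothetical
(wait_one_example, wait_all_example and the tick-based "cost" of each
fence); it is not the amdgpu implementation. The one convention borrowed
from the kernel is that of dma_fence_wait_timeout, which returns 0 when the
wait timed out and the remaining timeout otherwise. Feeding that remainder
into the next wait gives one shared budget across all fences instead of a
fresh timeout per fence.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a timed wait on one fence: the fence needs
 * 'cost' ticks to signal. Returns the ticks left over if it signals
 * before the timeout expires, or 0 if the wait timed out.
 */
static long wait_one_example(long cost, long timeout)
{
	return cost < timeout ? timeout - cost : 0;
}

/*
 * Wait for all fences under a single overall budget. The remaining
 * time after each wait is carried into the next one, so the total
 * time spent can never exceed 'timeout'.
 */
static int wait_all_example(const long *costs, size_t n, long timeout)
{
	size_t i;

	for (i = 0; i < n; i++) {
		timeout = wait_one_example(costs[i], timeout);
		if (timeout == 0)
			return -ETIMEDOUT;
	}
	return 0;
}
```

With two fences that each need 6 ticks and a budget of 10, a per-fence
timeout of 10 would let both waits succeed after 12 ticks in total, while
the shared budget above reports -ETIMEDOUT on the second fence.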




[linux-next:master] BUILD REGRESSION 2030579113a1b1b5bfd7ff24c0852847836d8fd1

2023-10-20 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 2030579113a1b1b5bfd7ff24c0852847836d8fd1  Add linux-next specific files for 20231020

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/202309212121.cul1ptra-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202309212339.hxhbu2f1-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202310171905.azfrkoid-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202310201911.qt2yaa39-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202310210234.arlqneke-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202310210303.l1agutr9-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

aarch64-linux-ld: s4-pll.c:(.text+0x168): undefined reference to `meson_clk_hw_get'
arch/x86/include/asm/string_32.h:150:25: warning: '__builtin_memcpy' writing 3 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
drivers/firmware/qcom_scm.c:1621:34: warning: 'qcom_scm_qseecom_allowlist' defined but not used [-Wunused-const-variable=]
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/smu_v13_0_6_ppt.c:286:52: warning: '%s' directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=]
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu14/smu_v14_0.c:72:52: warning: '%s' directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=]
kernel/bpf/helpers.c:1909:19: warning: no previous declaration for 'bpf_percpu_obj_new_impl' [-Wmissing-declarations]
kernel/bpf/helpers.c:1945:18: warning: no previous declaration for 'bpf_percpu_obj_drop_impl' [-Wmissing-declarations]
kernel/bpf/helpers.c:2485:18: warning: no previous declaration for 'bpf_throw' [-Wmissing-declarations]
s4-pll.c:(.text+0x164): undefined reference to `meson_clk_hw_get'
security/apparmor/lsm.c:651:5: warning: no previous declaration for 'apparmor_uring_override_creds' [-Wmissing-declarations]
security/apparmor/lsm.c:675:5: warning: no previous declaration for 'apparmor_uring_sqpoll' [-Wmissing-declarations]

Unverified Error/Warning (likely false positive, please contact us if interested):

Documentation/devicetree/bindings/mfd/qcom,tcsr.yaml:
Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml:
drivers/staging/octeon/ethernet.c:204:37: error: storage size of 'rx_status' isn't known
drivers/staging/octeon/ethernet.c:205:37: error: storage size of 'tx_status' isn't known
drivers/staging/octeon/ethernet.c:801:49: error: storage size of 'imode' isn't known
drivers/staging/octeon/ethernet.c:802:21: error: variable 'imode' has initializer but incomplete type

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   |-- drivers-gpu-drm-amd-amdgpu-..-pm-swsmu-smu13-smu_v13_0_6_ppt.c:warning:s-directive-output-may-be-truncated-writing-up-to-bytes-into-a-region-of-size
|   `-- drivers-gpu-drm-amd-amdgpu-..-pm-swsmu-smu14-smu_v14_0.c:warning:s-directive-output-may-be-truncated-writing-up-to-bytes-into-a-region-of-size
|-- arm-allmodconfig
|   |-- drivers-gpu-drm-amd-amdgpu-..-pm-swsmu-smu13-smu_v13_0_6_ppt.c:warning:s-directive-output-may-be-truncated-writing-up-to-bytes-into-a-region-of-size
|   `-- drivers-gpu-drm-amd-amdgpu-..-pm-swsmu-smu14-smu_v14_0.c:warning:s-directive-output-may-be-truncated-writing-up-to-bytes-into-a-region-of-size
|-- arm-allyesconfig
|   |-- drivers-gpu-drm-amd-amdgpu-..-pm-swsmu-smu13-smu_v13_0_6_ppt.c:warning:s-directive-output-may-be-truncated-writing-up-to-bytes-into-a-region-of-size
|   `-- drivers-gpu-drm-amd-amdgpu-..-pm-swsmu-smu14-smu_v14_0.c:warning:s-directive-output-may-be-truncated-writing-up-to-bytes-into-a-region-of-size
|-- arm-randconfig-004-20231020
|   |-- drivers-gpu-drm-amd-amdgpu-..-pm-swsmu-smu13-smu_v13_0_6_ppt.c:warning:s-directive-output-may-be-truncated-writing-up-to-bytes-into-a-region-of-size
|   `-- drivers-gpu-drm-amd-amdgpu-..-pm-swsmu-smu14-smu_v14_0.c:warning:s-directive-output-may-be-truncated-writing-up-to-bytes-into-a-region-of-size
|-- arm64-allmodconfig
|   |-- drivers-gpu-drm-amd-amdgpu-..-pm-swsmu-smu13-smu_v13_0_6_ppt.c:warning:s-directive-output-may-be-truncated-writing-up-to-bytes-into-a-region-of-size
|   `-- drivers-gpu-drm-amd-amdgpu-..-pm-swsmu-smu14-smu_v14_0.c:warning:s-directive-output-may-be-truncated-writing-up-to-bytes-into-a-region-of-size
|-- arm64-allyesconfig
|   |-- drivers-gpu-drm-amd-amdgpu-..-pm-swsmu-smu13-smu_v13_0_6_ppt.c:warning:s-directive-output-may-be-truncated-writing-up-to-bytes-into-a-region-of-size
|   `-- drivers-gpu-drm-amd-amdgpu-..-pm-swsmu-smu14-smu_v14_0.c:warning:s-directive-output-may-be-truncated-writing-up-to-bytes-into-a-region-of-size
|-- arm64-buildonly-randconfig-r005-2025
|   |-- aarch64-linux-ld:s4-pll.c:(.text):undefined-reference-to-meson_clk_hw_get
|   `-- s4-pll.c:(.text):undefined-reference-to-meson_clk_hw_get
|-- csky-allmodconfig
|   |-- drivers-gpu-drm-amd-amdgpu

Re: [PATCH] drm/amdkfd: reserve a fence slot while locking the BO

2023-10-20 Thread Felix Kuehling

On 2023-10-20 08:33, Christian König wrote:
> Looks like the KFD still needs this.
>
> Signed-off-by: Christian König
> Fixes: 8abc1eb2987a ("drm/amdkfd: switch over to using drm_exec v3")


To fix the immediate problem, this patch is

Acked-by: Felix Kuehling 

As I understand it, this reserves a fence slot for adding an eviction 
fence. I'm not convinced that this is the right place to do this. Not 
all callers of reserve_bo_and_vm add eviction fences. In another patch 
series I added the fence slot reservation in a new helper function 
amdgpu_amdkfd_bo_validate_and_fence.


Taking another step back, as I understand it, the pre-reservation of 
fence slots is there to avoid late failures after submitting commands to 
the HW. This isn't really a problem for KFD because eviction fences 
aren't directly linked to commands submitted to the HW. It's more like a 
place holder for future user mode submissions. So I think it's OK to 
reserve the fence slot just before attaching the fence to a BO resv. We 
don't have to do the pre-reservation here.


Regards,
  Felix



> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 7d6daf8d2bfa..e036011137aa 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1103,7 +1103,7 @@ static int reserve_bo_and_vm(struct kgd_mem *mem,
>   		if (unlikely(ret))
>   			goto error;
>   
> -		ret = drm_exec_lock_obj(&ctx->exec, &bo->tbo.base);
> +		ret = drm_exec_prepare_obj(&ctx->exec, &bo->tbo.base, 1);
>   		drm_exec_retry_on_contention(&ctx->exec);
>   		if (unlikely(ret))
>   			goto error;


RE: [PATCH] drm/amd/display: Fix DMUB errors introduced by DML2

2023-10-20 Thread Wheeler, Daniel
[Public]

Hi all,

I verified that this fix solved both a GPU init error and a 
hubbub2_get_dchub_ref_freq warning when re-enabling the amdgpu module.

Tested-by: Daniel Wheeler 

Thank you,

Dan Wheeler
Sr. Technologist  |  AMD
SW Display
--
1 Commerce Valley Dr E, Thornhill, ON L3T 7X6
Facebook |  Twitter |  amd.com


-Original Message-
From: Siqueira, Rodrigo 
Sent: Friday, October 20, 2023 12:42 PM
To: amd-gfx@lists.freedesktop.org
Cc: Siqueira, Rodrigo; Prosyak, Vitaly; Li, Roman; Zhuo, Lillian;
Wheeler, Daniel; Deucher, Alexander
Subject: [PATCH] drm/amd/display: Fix DMUB errors introduced by DML2

When DML 2 was introduced, it changed part of the generic sequence of DC, which 
caused issues on previous DCNs with DMUB support. This commit ensures the new 
sequence only works for new DCNs from 3.5 and above.

Cc: Vitaly Prosyak 
Cc: Roman Li 
Cc: Qingqing Zhuo 
Cc: Daniel Wheeler 
Cc: Alex Deucher 
Fixes: 7966f319c66d ("drm/amd/display: Introduce DML2")
Signed-off-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 97f402123fbb..73cc6e1b0e65 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -321,10 +321,11 @@ struct resource_pool *dc_create_resource_pool(struct dc  *dc,
 				res_pool->ref_clocks.xtalin_clock_inKhz;
 			res_pool->ref_clocks.dchub_ref_clock_inKhz =
 				res_pool->ref_clocks.xtalin_clock_inKhz;
-			if (res_pool->hubbub && res_pool->hubbub->funcs->get_dchub_ref_freq)
-				res_pool->hubbub->funcs->get_dchub_ref_freq(res_pool->hubbub,
-					res_pool->ref_clocks.dccg_ref_clock_inKhz,
-					&res_pool->ref_clocks.dchub_ref_clock_inKhz);
+			if (dc_version >= DCN_VERSION_3_5)
+				if (res_pool->hubbub && res_pool->hubbub->funcs->get_dchub_ref_freq)
+					res_pool->hubbub->funcs->get_dchub_ref_freq(res_pool->hubbub,
+						res_pool->ref_clocks.dccg_ref_clock_inKhz,
+						&res_pool->ref_clocks.dchub_ref_clock_inKhz);
 		} else
 			ASSERT_CRITICAL(false);
 	}
--
2.42.0



Re: [PATCH] drm/amd/display: Fix DMUB errors introduced by DML2

2023-10-20 Thread Harry Wentland



On 2023-10-20 12:42, Rodrigo Siqueira wrote:
> When DML 2 was introduced, it changed part of the generic sequence of
> DC, which caused issues on previous DCNs with DMUB support. This commit
> ensures the new sequence only works for new DCNs from 3.5 and above.
> 
> Cc: Vitaly Prosyak 
> Cc: Roman Li 
> Cc: Qingqing Zhuo 
> Cc: Daniel Wheeler 
> Cc: Alex Deucher 
> Fixes: 7966f319c66d ("drm/amd/display: Introduce DML2")
> Signed-off-by: Rodrigo Siqueira 
> ---
>  drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
> b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
> index 97f402123fbb..73cc6e1b0e65 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
> @@ -321,10 +321,11 @@ struct resource_pool *dc_create_resource_pool(struct dc 
>  *dc,
>   res_pool->ref_clocks.xtalin_clock_inKhz;
>   res_pool->ref_clocks.dchub_ref_clock_inKhz =
>   res_pool->ref_clocks.xtalin_clock_inKhz;
> - if (res_pool->hubbub && res_pool->hubbub->funcs->get_dchub_ref_freq)
> -     res_pool->hubbub->funcs->get_dchub_ref_freq(res_pool->hubbub,
> -         res_pool->ref_clocks.dccg_ref_clock_inKhz,
> -         &res_pool->ref_clocks.dchub_ref_clock_inKhz);
> + if (dc_version >= DCN_VERSION_3_5)

A better check would be dc->debug.using_dml2

Harry

> +     if (res_pool->hubbub && res_pool->hubbub->funcs->get_dchub_ref_freq)
> +         res_pool->hubbub->funcs->get_dchub_ref_freq(res_pool->hubbub,
> +             res_pool->ref_clocks.dccg_ref_clock_inKhz,
> +             &res_pool->ref_clocks.dchub_ref_clock_inKhz);
>   } else
>   ASSERT_CRITICAL(false);
>   }
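
The difference between the posted gate and the gate suggested in the review can be sketched in isolation. This is a minimal userspace model, not the DC code: `struct dc_model`, the enum values, and both helper names are hypothetical stand-ins, and it assumes `using_dml2` behaves as a plain boolean as the review comment implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the DC version enum and debug options. */
enum dcn_version_model { DCN_MODEL_3_2 = 32, DCN_MODEL_3_5 = 35 };

struct dc_model {
    enum dcn_version_model version;
    bool using_dml2;             /* models dc->debug.using_dml2 */
    bool has_get_dchub_ref_freq; /* models the hubbub callback being set */
};

/* Gate as posted in the patch: keyed on the hardware version. */
static bool query_ref_freq_by_version(const struct dc_model *dc)
{
    assert(dc != NULL);
    return dc->version >= DCN_MODEL_3_5 && dc->has_get_dchub_ref_freq;
}

/* Gate as suggested in review: keyed on whether DML2 is actually in use. */
static bool query_ref_freq_by_feature(const struct dc_model *dc)
{
    assert(dc != NULL);
    return dc->using_dml2 && dc->has_get_dchub_ref_freq;
}
```

The two gates only disagree when a DCN 3.5 or newer part runs without DML2 (or an older part runs with it), which is the case a feature-based check is meant to cover.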



Re: [PATCH 0/3] Reduce code delta with copyright notice

2023-10-20 Thread Harry Wentland



On 2023-10-20 14:06, Stylon Wang wrote:
> Many of the DC files have either incomplete or incorrect copyright
> notice. This patchset aims to address this and also make lives
> less difficult for those doing backport/upstream activities.
> 

Series is
Reviewed-by: Harry Wentland 

Harry

> Stylon Wang (3):
>   drm/amd/display: Add missing copyright notice in DMUB
>   drm/amd/display: Fix copyright notice in DML2 code
>   drm/amd/display: Fix copyright notice in DC code
> 
>  .../drm/amd/display/dc/dcn303/dcn303_dccg.h   | 18 ++
>  .../drm/amd/display/dc/dcn303/dcn303_init.c   | 18 ++
>  .../drm/amd/display/dc/dcn303/dcn303_init.h   | 18 ++
>  .../amd/display/dc/dcn303/dcn303_resource.c   | 18 ++
>  .../amd/display/dc/dcn303/dcn303_resource.h   | 18 ++
>  drivers/gpu/drm/amd/display/dc/dcn31/Makefile |  2 +-
>  .../gpu/drm/amd/display/dc/dcn35/dcn35_dpp.c  |  2 ++
>  .../gpu/drm/amd/display/dc/dcn35/dcn35_dpp.h  |  2 ++
>  .../gpu/drm/amd/display/dc/dcn35/dcn35_dsc.c  |  2 ++
>  .../gpu/drm/amd/display/dc/dcn35/dcn35_dsc.h  |  2 ++
>  .../gpu/drm/amd/display/dc/dcn35/dcn35_dwb.h  |  2 ++
>  .../drm/amd/display/dc/dcn35/dcn35_hubbub.c   |  2 ++
>  .../drm/amd/display/dc/dcn35/dcn35_hubbub.h   |  2 ++
>  .../gpu/drm/amd/display/dc/dcn35/dcn35_hubp.c |  2 ++
>  .../gpu/drm/amd/display/dc/dcn35/dcn35_hubp.h |  2 ++
>  .../gpu/drm/amd/display/dc/dcn35/dcn35_init.c |  2 ++
>  .../gpu/drm/amd/display/dc/dcn35/dcn35_init.h |  2 ++
>  .../drm/amd/display/dc/dcn35/dcn35_mmhubbub.c |  2 ++
>  .../drm/amd/display/dc/dcn35/dcn35_mmhubbub.h |  2 ++
>  .../gpu/drm/amd/display/dc/dcn35/dcn35_opp.c  |  2 ++
>  .../gpu/drm/amd/display/dc/dcn35/dcn35_opp.h  |  2 ++
>  .../gpu/drm/amd/display/dc/dcn35/dcn35_optc.c |  2 ++
>  .../gpu/drm/amd/display/dc/dcn35/dcn35_optc.h |  2 ++
>  .../drm/amd/display/dc/dcn35/dcn35_pg_cntl.c  |  2 ++
>  .../drm/amd/display/dc/dcn35/dcn35_pg_cntl.h  |  2 ++
>  .../drm/amd/display/dc/dcn35/dcn35_resource.c |  2 ++
>  .../drm/amd/display/dc/dcn35/dcn35_resource.h |  2 ++
>  drivers/gpu/drm/amd/display/dc/dml2/Makefile  |  4 +++-
>  .../gpu/drm/amd/display/dc/dml2/cmntypes.h|  2 ++
>  .../amd/display/dc/dml2/display_mode_core.c   |  2 ++
>  .../dc/dml2/display_mode_core_structs.h   |  2 ++
>  .../dc/dml2/display_mode_lib_defines.h|  2 ++
>  .../amd/display/dc/dml2/display_mode_util.c   |  2 ++
>  .../amd/display/dc/dml2/display_mode_util.h   |  2 ++
>  .../display/dc/dml2/dml2_dc_resource_mgmt.c   |  2 ++
>  .../display/dc/dml2/dml2_dc_resource_mgmt.h   |  2 ++
>  .../drm/amd/display/dc/dml2/dml2_dc_types.h   |  2 ++
>  .../amd/display/dc/dml2/dml2_internal_types.h |  2 ++
>  .../amd/display/dc/dml2/dml2_mall_phantom.c   |  2 ++
>  .../amd/display/dc/dml2/dml2_mall_phantom.h   |  2 ++
>  .../gpu/drm/amd/display/dc/dml2/dml2_policy.c |  2 ++
>  .../display/dc/dml2/dml2_translation_helper.c |  2 ++
>  .../display/dc/dml2/dml2_translation_helper.h |  2 ++
>  .../gpu/drm/amd/display/dc/dml2/dml2_utils.c  |  2 ++
>  .../drm/amd/display/dc/dml2/dml2_wrapper.c|  2 ++
>  .../drm/amd/display/dc/dml2/dml2_wrapper.h|  2 ++
>  .../gpu/drm/amd/display/dc/dml2/dml_assert.h  |  2 ++
>  .../drm/amd/display/dc/dml2/dml_depedencies.h |  2 ++
>  .../gpu/drm/amd/display/dc/dml2/dml_logging.h |  2 ++
>  drivers/gpu/drm/amd/display/dc/hdcp/Makefile  |  2 +-
>  .../amd/display/dc/hwss/dcn303/dcn303_hwseq.c | 19 +++
>  .../amd/display/dc/hwss/dcn303/dcn303_hwseq.h | 19 +++
>  .../amd/display/dc/hwss/dcn35/dcn35_hwseq.c   |  2 ++
>  .../amd/display/dc/hwss/dcn35/dcn35_hwseq.h   |  2 ++
>  .../dc/irq/dcn201/irq_service_dcn201.c|  2 +-
>  .../dc/irq/dcn303/irq_service_dcn303.c| 19 +++
>  .../dc/irq/dcn303/irq_service_dcn303.h| 19 +++
>  .../drm/amd/display/dmub/src/dmub_dcn303.c| 19 +++
>  .../drm/amd/display/dmub/src/dmub_dcn303.h| 19 +++
>  59 files changed, 298 insertions(+), 4 deletions(-)
> 



[PATCH 3/3] drm/amd/display: Fix copyright notice in DC code

2023-10-20 Thread Stylon Wang
[Why & How]
Fix incomplete copyright notice in DC code.

Signed-off-by: Stylon Wang 
---
 .../drm/amd/display/dc/dcn303/dcn303_dccg.h   | 18 ++
 .../drm/amd/display/dc/dcn303/dcn303_init.c   | 18 ++
 .../drm/amd/display/dc/dcn303/dcn303_init.h   | 18 ++
 .../amd/display/dc/dcn303/dcn303_resource.c   | 18 ++
 .../amd/display/dc/dcn303/dcn303_resource.h   | 18 ++
 drivers/gpu/drm/amd/display/dc/dcn31/Makefile |  2 +-
 .../gpu/drm/amd/display/dc/dcn35/dcn35_dpp.c  |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_dpp.h  |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_dsc.c  |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_dsc.h  |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_dwb.h  |  2 ++
 .../drm/amd/display/dc/dcn35/dcn35_hubbub.c   |  2 ++
 .../drm/amd/display/dc/dcn35/dcn35_hubbub.h   |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_hubp.c |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_hubp.h |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_init.c |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_init.h |  2 ++
 .../drm/amd/display/dc/dcn35/dcn35_mmhubbub.c |  2 ++
 .../drm/amd/display/dc/dcn35/dcn35_mmhubbub.h |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_opp.c  |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_opp.h  |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_optc.c |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_optc.h |  2 ++
 .../drm/amd/display/dc/dcn35/dcn35_pg_cntl.c  |  2 ++
 .../drm/amd/display/dc/dcn35/dcn35_pg_cntl.h  |  2 ++
 .../drm/amd/display/dc/dcn35/dcn35_resource.c |  2 ++
 .../drm/amd/display/dc/dcn35/dcn35_resource.h |  2 ++
 drivers/gpu/drm/amd/display/dc/hdcp/Makefile  |  2 +-
 .../amd/display/dc/hwss/dcn303/dcn303_hwseq.c | 19 +++
 .../amd/display/dc/hwss/dcn303/dcn303_hwseq.h | 19 +++
 .../amd/display/dc/hwss/dcn35/dcn35_hwseq.c   |  2 ++
 .../amd/display/dc/hwss/dcn35/dcn35_hwseq.h   |  2 ++
 .../dc/irq/dcn201/irq_service_dcn201.c|  2 +-
 .../dc/irq/dcn303/irq_service_dcn303.c| 19 +++
 .../dc/irq/dcn303/irq_service_dcn303.h| 19 +++
 35 files changed, 215 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_dccg.h 
b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_dccg.h
index 294bd757bcb5..2e12fb643005 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_dccg.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_dccg.h
@@ -2,6 +2,24 @@
 /*
  * Copyright (C) 2021 Advanced Micro Devices, Inc.
  *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
  * Authors: AMD
  */
 
diff --git a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_init.c 
b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_init.c
index 39cf7a50bd26..edb4d68b8187 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_init.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_init.c
@@ -2,6 +2,24 @@
 /*
  * Copyright (C) 2021 Advanced Micro Devices, Inc.
  *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF 

[PATCH 2/3] drm/amd/display: Fix copyright notice in DML2 code

2023-10-20 Thread Stylon Wang
[Why & How]
Fix incomplete copyright notice in DML2 code.

Signed-off-by: Stylon Wang 
---
 drivers/gpu/drm/amd/display/dc/dml2/Makefile  | 4 +++-
 drivers/gpu/drm/amd/display/dc/dml2/cmntypes.h| 2 ++
 drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c   | 2 ++
 .../gpu/drm/amd/display/dc/dml2/display_mode_core_structs.h   | 2 ++
 .../gpu/drm/amd/display/dc/dml2/display_mode_lib_defines.h| 2 ++
 drivers/gpu/drm/amd/display/dc/dml2/display_mode_util.c   | 2 ++
 drivers/gpu/drm/amd/display/dc/dml2/display_mode_util.h   | 2 ++
 drivers/gpu/drm/amd/display/dc/dml2/dml2_dc_resource_mgmt.c   | 2 ++
 drivers/gpu/drm/amd/display/dc/dml2/dml2_dc_resource_mgmt.h   | 2 ++
 drivers/gpu/drm/amd/display/dc/dml2/dml2_dc_types.h   | 2 ++
 drivers/gpu/drm/amd/display/dc/dml2/dml2_internal_types.h | 2 ++
 drivers/gpu/drm/amd/display/dc/dml2/dml2_mall_phantom.c   | 2 ++
 drivers/gpu/drm/amd/display/dc/dml2/dml2_mall_phantom.h   | 2 ++
 drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c | 2 ++
 drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c | 2 ++
 drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.h | 2 ++
 drivers/gpu/drm/amd/display/dc/dml2/dml2_utils.c  | 2 ++
 drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c| 2 ++
 drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.h| 2 ++
 drivers/gpu/drm/amd/display/dc/dml2/dml_assert.h  | 2 ++
 drivers/gpu/drm/amd/display/dc/dml2/dml_depedencies.h | 2 ++
 drivers/gpu/drm/amd/display/dc/dml2/dml_logging.h | 2 ++
 22 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile 
b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
index f35ed8de260d..d9137675d8b4 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
@@ -20,7 +20,9 @@
 # ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 # OTHER DEALINGS IN THE SOFTWARE.
 #
-# makefile for dml2
+# Authors: AMD
+#
+# Makefile for dml2.
 
 ifdef CONFIG_X86
 dml2_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/cmntypes.h 
b/drivers/gpu/drm/amd/display/dc/dml2/cmntypes.h
index 5450aa5295f7..e450445bc05d 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/cmntypes.h
+++ b/drivers/gpu/drm/amd/display/dc/dml2/cmntypes.h
@@ -20,6 +20,8 @@
  * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
  * OTHER DEALINGS IN THE SOFTWARE.
  *
+ * Authors: AMD
+ *
  */
 
 #ifndef __CMNTYPES_H__
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c 
b/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c
index cff53d28d3c5..4f906fcd83d0 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c
+++ b/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c
@@ -20,6 +20,8 @@
  * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
  * OTHER DEALINGS IN THE SOFTWARE.
  *
+ * Authors: AMD
+ *
  */
 
 #include "display_mode_core.h"
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core_structs.h 
b/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core_structs.h
index c2fa28ff57ab..b274bfb4225f 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core_structs.h
+++ b/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core_structs.h
@@ -20,6 +20,8 @@
  * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
  * OTHER DEALINGS IN THE SOFTWARE.
  *
+ * Authors: AMD
+ *
  */
 
 #ifndef __DISPLAY_MODE_CORE_STRUCT_H__
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/display_mode_lib_defines.h 
b/drivers/gpu/drm/amd/display/dc/dml2/display_mode_lib_defines.h
index 99bdb2ddd8ab..de63364be01d 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/display_mode_lib_defines.h
+++ b/drivers/gpu/drm/amd/display/dc/dml2/display_mode_lib_defines.h
@@ -20,6 +20,8 @@
  * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
  * OTHER DEALINGS IN THE SOFTWARE.
  *
+ * Authors: AMD
+ *
  */
 
 #ifndef __DISPLAY_MODE_LIB_DEFINES_H__
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/display_mode_util.c 
b/drivers/gpu/drm/amd/display/dc/dml2/display_mode_util.c
index 7dd1f8a12582..c247aee89caf 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/display_mode_util.c
+++ b/drivers/gpu/drm/amd/display/dc/dml2/display_mode_util.c
@@ -20,6 +20,8 @@
  * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
  * OTHER DEALINGS IN THE SOFTWARE.
  *
+ * Authors: AMD
+ *
  */
 
 #include "display_mode_util.h"
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/display_mode_util.h 
b/drivers/gpu/drm/amd/display/dc/dml2/display_mode_util.h
index fb74385e1060..113b0265e1d1 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/display_mode_util.h
+++ b/drivers/gpu/drm/amd/display/dc/dml2/display_mode_util.h
@@ -20,6 +20,8 @@
  * ARISING FROM, OUT OF OR IN CONNECTION 

[PATCH 1/3] drm/amd/display: Add missing copyright notice in DMUB

2023-10-20 Thread Stylon Wang
[Why & How]
Add missing/incomplete copyright notice in DMUB files

Signed-off-by: Stylon Wang 
---
 .../drm/amd/display/dmub/src/dmub_dcn303.c| 19 +++
 .../drm/amd/display/dmub/src/dmub_dcn303.h| 19 +++
 2 files changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dmub/src/dmub_dcn303.c 
b/drivers/gpu/drm/amd/display/dmub/src/dmub_dcn303.c
index b42369984473..878700160fa9 100644
--- a/drivers/gpu/drm/amd/display/dmub/src/dmub_dcn303.c
+++ b/drivers/gpu/drm/amd/display/dmub/src/dmub_dcn303.c
@@ -2,7 +2,26 @@
 /*
  * Copyright (C) 2021 Advanced Micro Devices, Inc.
  *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
  * Authors: AMD
+ *
  */
 
 #include "../dmub_srv.h"
diff --git a/drivers/gpu/drm/amd/display/dmub/src/dmub_dcn303.h 
b/drivers/gpu/drm/amd/display/dmub/src/dmub_dcn303.h
index 84141d450256..abe087251cc1 100644
--- a/drivers/gpu/drm/amd/display/dmub/src/dmub_dcn303.h
+++ b/drivers/gpu/drm/amd/display/dmub/src/dmub_dcn303.h
@@ -2,7 +2,26 @@
 /*
  * Copyright (C) 2021 Advanced Micro Devices, Inc.
  *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
  * Authors: AMD
+ *
  */
 
 #ifndef _DMUB_DCN303_H_
-- 
2.42.0



[PATCH 0/3] Reduce code delta with copyright notice

2023-10-20 Thread Stylon Wang
Many of the DC files have either incomplete or incorrect copyright
notice. This patchset aims to address this and also make lives
less difficult for those doing backport/upstream activities.

Stylon Wang (3):
  drm/amd/display: Add missing copyright notice in DMUB
  drm/amd/display: Fix copyright notice in DML2 code
  drm/amd/display: Fix copyright notice in DC code

 .../drm/amd/display/dc/dcn303/dcn303_dccg.h   | 18 ++
 .../drm/amd/display/dc/dcn303/dcn303_init.c   | 18 ++
 .../drm/amd/display/dc/dcn303/dcn303_init.h   | 18 ++
 .../amd/display/dc/dcn303/dcn303_resource.c   | 18 ++
 .../amd/display/dc/dcn303/dcn303_resource.h   | 18 ++
 drivers/gpu/drm/amd/display/dc/dcn31/Makefile |  2 +-
 .../gpu/drm/amd/display/dc/dcn35/dcn35_dpp.c  |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_dpp.h  |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_dsc.c  |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_dsc.h  |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_dwb.h  |  2 ++
 .../drm/amd/display/dc/dcn35/dcn35_hubbub.c   |  2 ++
 .../drm/amd/display/dc/dcn35/dcn35_hubbub.h   |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_hubp.c |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_hubp.h |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_init.c |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_init.h |  2 ++
 .../drm/amd/display/dc/dcn35/dcn35_mmhubbub.c |  2 ++
 .../drm/amd/display/dc/dcn35/dcn35_mmhubbub.h |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_opp.c  |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_opp.h  |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_optc.c |  2 ++
 .../gpu/drm/amd/display/dc/dcn35/dcn35_optc.h |  2 ++
 .../drm/amd/display/dc/dcn35/dcn35_pg_cntl.c  |  2 ++
 .../drm/amd/display/dc/dcn35/dcn35_pg_cntl.h  |  2 ++
 .../drm/amd/display/dc/dcn35/dcn35_resource.c |  2 ++
 .../drm/amd/display/dc/dcn35/dcn35_resource.h |  2 ++
 drivers/gpu/drm/amd/display/dc/dml2/Makefile  |  4 +++-
 .../gpu/drm/amd/display/dc/dml2/cmntypes.h|  2 ++
 .../amd/display/dc/dml2/display_mode_core.c   |  2 ++
 .../dc/dml2/display_mode_core_structs.h   |  2 ++
 .../dc/dml2/display_mode_lib_defines.h|  2 ++
 .../amd/display/dc/dml2/display_mode_util.c   |  2 ++
 .../amd/display/dc/dml2/display_mode_util.h   |  2 ++
 .../display/dc/dml2/dml2_dc_resource_mgmt.c   |  2 ++
 .../display/dc/dml2/dml2_dc_resource_mgmt.h   |  2 ++
 .../drm/amd/display/dc/dml2/dml2_dc_types.h   |  2 ++
 .../amd/display/dc/dml2/dml2_internal_types.h |  2 ++
 .../amd/display/dc/dml2/dml2_mall_phantom.c   |  2 ++
 .../amd/display/dc/dml2/dml2_mall_phantom.h   |  2 ++
 .../gpu/drm/amd/display/dc/dml2/dml2_policy.c |  2 ++
 .../display/dc/dml2/dml2_translation_helper.c |  2 ++
 .../display/dc/dml2/dml2_translation_helper.h |  2 ++
 .../gpu/drm/amd/display/dc/dml2/dml2_utils.c  |  2 ++
 .../drm/amd/display/dc/dml2/dml2_wrapper.c|  2 ++
 .../drm/amd/display/dc/dml2/dml2_wrapper.h|  2 ++
 .../gpu/drm/amd/display/dc/dml2/dml_assert.h  |  2 ++
 .../drm/amd/display/dc/dml2/dml_depedencies.h |  2 ++
 .../gpu/drm/amd/display/dc/dml2/dml_logging.h |  2 ++
 drivers/gpu/drm/amd/display/dc/hdcp/Makefile  |  2 +-
 .../amd/display/dc/hwss/dcn303/dcn303_hwseq.c | 19 +++
 .../amd/display/dc/hwss/dcn303/dcn303_hwseq.h | 19 +++
 .../amd/display/dc/hwss/dcn35/dcn35_hwseq.c   |  2 ++
 .../amd/display/dc/hwss/dcn35/dcn35_hwseq.h   |  2 ++
 .../dc/irq/dcn201/irq_service_dcn201.c|  2 +-
 .../dc/irq/dcn303/irq_service_dcn303.c| 19 +++
 .../dc/irq/dcn303/irq_service_dcn303.h| 19 +++
 .../drm/amd/display/dmub/src/dmub_dcn303.c| 19 +++
 .../drm/amd/display/dmub/src/dmub_dcn303.h| 19 +++
 59 files changed, 298 insertions(+), 4 deletions(-)

-- 
2.42.0



[PATCH] drm/amd/display: Fix DMUB errors introduced by DML2

2023-10-20 Thread Rodrigo Siqueira
When DML 2 was introduced, it changed part of the generic sequence of
DC, which caused issues on previous DCNs with DMUB support. This commit
ensures the new sequence only works for new DCNs from 3.5 and above.

Cc: Vitaly Prosyak 
Cc: Roman Li 
Cc: Qingqing Zhuo 
Cc: Daniel Wheeler 
Cc: Alex Deucher 
Fixes: 7966f319c66d ("drm/amd/display: Introduce DML2")
Signed-off-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 97f402123fbb..73cc6e1b0e65 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -321,10 +321,11 @@ struct resource_pool *dc_create_resource_pool(struct dc  
*dc,
res_pool->ref_clocks.xtalin_clock_inKhz;
res_pool->ref_clocks.dchub_ref_clock_inKhz =
res_pool->ref_clocks.xtalin_clock_inKhz;
-   if (res_pool->hubbub && res_pool->hubbub->funcs->get_dchub_ref_freq)
-       res_pool->hubbub->funcs->get_dchub_ref_freq(res_pool->hubbub,
-           res_pool->ref_clocks.dccg_ref_clock_inKhz,
-           &res_pool->ref_clocks.dchub_ref_clock_inKhz);
+   if (dc_version >= DCN_VERSION_3_5)
+       if (res_pool->hubbub && res_pool->hubbub->funcs->get_dchub_ref_freq)
+           res_pool->hubbub->funcs->get_dchub_ref_freq(res_pool->hubbub,
+               res_pool->ref_clocks.dccg_ref_clock_inKhz,
+               &res_pool->ref_clocks.dchub_ref_clock_inKhz);
} else
ASSERT_CRITICAL(false);
}
-- 
2.42.0



Re: [PATCH] drm/amdgpu: Remove redundant call to priority_is_valid()

2023-10-20 Thread Alex Deucher
On Tue, Oct 17, 2023 at 9:22 PM Luben Tuikov  wrote:
>
> Remove a redundant call to amdgpu_ctx_priority_is_valid() from
> amdgpu_ctx_priority_permit(), which is called from amdgpu_ctx_init() which is
> called from amdgpu_ctx_alloc() which is called from amdgpu_ctx_ioctl(), where
> we've called amdgpu_ctx_priority_is_valid() already first thing in the
> function.
>
> Cc: Alex Deucher 
> Cc: Christian König 
> Signed-off-by: Luben Tuikov 

Please push this to drm-misc since it depends on your previous patches.

Alex

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 68db924161ef66..4c6ffca97c4512 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -56,6 +56,10 @@ bool amdgpu_ctx_priority_is_valid(int32_t ctx_prio)
> return true;
> default:
> case AMDGPU_CTX_PRIORITY_UNSET:
> +   /* UNSET priority is not valid and we don't carry that
> +* around, but set it to NORMAL in the only place this
> +* function is called, amdgpu_ctx_ioctl().
> +*/
> return false;
> }
>  }
> @@ -96,9 +100,6 @@ amdgpu_ctx_to_drm_sched_prio(int32_t ctx_prio)
>  static int amdgpu_ctx_priority_permit(struct drm_file *filp,
>   int32_t priority)
>  {
> -   if (!amdgpu_ctx_priority_is_valid(priority))
> -   return -EINVAL;
> -
> /* NORMAL and below are accessible by everyone */
> if (priority <= AMDGPU_CTX_PRIORITY_NORMAL)
> return 0;
> @@ -625,8 +626,6 @@ static int amdgpu_ctx_query2(struct amdgpu_device *adev,
> return 0;
>  }
>
> -
> -
>  static int amdgpu_ctx_stable_pstate(struct amdgpu_device *adev,
> struct amdgpu_fpriv *fpriv, uint32_t id,
> bool set, u32 *stable_pstate)
> @@ -669,8 +668,10 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
> id = args->in.ctx_id;
> priority = args->in.priority;
>
> -   /* For backwards compatibility reasons, we need to accept
> -* ioctls with garbage in the priority field */
> +   /* For backwards compatibility, we need to accept ioctls with garbage
> +* in the priority field. Garbage values in the priority field, result
> +* in the priority being set to NORMAL.
> +*/
> if (!amdgpu_ctx_priority_is_valid(priority))
> priority = AMDGPU_CTX_PRIORITY_NORMAL;
>
>
> base-commit: 915718484b8fa1eede4499a939e2e4fc0d85caa4
> prerequisite-patch-id: a36f628997d923f66da5342e760e8b45ff959fb8
> prerequisite-patch-id: f15148c302329c0c60d86040571c61d367bd05e7
> --
> 2.42.0
>
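
The call chain described in the commit message (validate once in the ioctl, then pass the sanitized value down) can be modelled in a few lines. This is a sketch under stated assumptions: the `PRIO_*` values and the three function names are illustrative stand-ins, and the single `privileged` flag replaces whatever permission checks the real `amdgpu_ctx_priority_permit()` performs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative priority values; stand-ins for the UAPI constants. */
enum {
    PRIO_UNSET     = -2048,
    PRIO_VERY_LOW  = -1023,
    PRIO_LOW       = -512,
    PRIO_NORMAL    = 0,
    PRIO_HIGH      = 512,
    PRIO_VERY_HIGH = 1023,
};

static bool prio_is_valid(int32_t prio)
{
    switch (prio) {
    case PRIO_VERY_LOW:
    case PRIO_LOW:
    case PRIO_NORMAL:
    case PRIO_HIGH:
    case PRIO_VERY_HIGH:
        return true;
    default: /* covers PRIO_UNSET and any garbage value */
        return false;
    }
}

/* Models the permit step after the patch: no second validity check. */
static int prio_permit(int32_t prio, bool privileged)
{
    if (prio <= PRIO_NORMAL)
        return 0; /* NORMAL and below are open to everyone */
    return privileged ? 0 : -EACCES;
}

/* Models the ioctl entry point: sanitize once, then pass the value down. */
static int ctx_ioctl_model(int32_t user_prio, bool privileged,
                           int32_t *effective)
{
    int32_t prio = user_prio;

    assert(effective != NULL);
    if (!prio_is_valid(prio))
        prio = PRIO_NORMAL; /* garbage is accepted and mapped to NORMAL */
    *effective = prio;
    return prio_permit(prio, privileged);
}
```

Because the entry point has already replaced every invalid value with NORMAL, a validity check further down the chain could never fail, which is why the patch can drop it.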


Re: [PATCH] drm/amdkfd: reserve a fence slot while locking the BO

2023-10-20 Thread Deucher, Alexander
[Public]

Acked-by: Alex Deucher 

From: amd-gfx  on behalf of Christian 
König 
Sent: Friday, October 20, 2023 8:33 AM
To: Shi, Leslie ; Kuehling, Felix 
; amd-gfx@lists.freedesktop.org 

Cc: Koenig, Christian 
Subject: [PATCH] drm/amdkfd: reserve a fence slot while locking the BO

Looks like the KFD still needs this.

Signed-off-by: Christian König 
Fixes: 8abc1eb2987a ("drm/amdkfd: switch over to using drm_exec v3")
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 7d6daf8d2bfa..e036011137aa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1103,7 +1103,7 @@ static int reserve_bo_and_vm(struct kgd_mem *mem,
 if (unlikely(ret))
 goto error;

-   ret = drm_exec_lock_obj(&ctx->exec, &bo->tbo.base);
+   ret = drm_exec_prepare_obj(&ctx->exec, &bo->tbo.base, 1);
 drm_exec_retry_on_contention(&ctx->exec);
 if (unlikely(ret))
 goto error;
--
2.34.1
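
Why "lock" alone is not enough here can be shown with a toy model: preparing an object locks it and also reserves fence slots, so a later fence add has somewhere to go. Everything below (`struct toy_resv` and the `toy_*` helpers) is hypothetical userspace code, not the drm_exec or dma_resv API, and it returns an error where the kernel would treat the missing slot as a caller bug.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy reservation object: a lock plus a count of pre-reserved fence slots. */
struct toy_resv {
    bool locked;
    unsigned int free_slots;
};

/* Models "lock only": the object is locked but no slot is reserved. */
static int toy_lock_obj(struct toy_resv *r)
{
    assert(r != NULL);
    if (r->locked)
        return -EBUSY;
    r->locked = true;
    return 0;
}

/* Models "prepare": lock, then reserve num_fences slots under the lock. */
static int toy_prepare_obj(struct toy_resv *r, unsigned int num_fences)
{
    int ret = toy_lock_obj(r);

    if (ret)
        return ret;
    r->free_slots += num_fences;
    return 0;
}

/* Adding a fence consumes a slot that must have been reserved earlier. */
static int toy_add_fence(struct toy_resv *r)
{
    assert(r != NULL);
    if (!r->locked || r->free_slots == 0)
        return -ENOSPC;
    r->free_slots--;
    return 0;
}
```

In this model, switching the caller from `toy_lock_obj()` to `toy_prepare_obj(r, 1)` is the same one-line change the patch makes.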



[PATCH] drm/amdgpu: Use pcie domain of xcc acpi objects

2023-10-20 Thread Lijo Lazar
PCI domain/segment information of xccs is available through ACPI DSM
methods. Consider that also while looking for devices.

Signed-off-by: Lijo Lazar 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 40 +---
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
index 2bca37044ad0..d62e49758635 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
@@ -68,7 +68,7 @@ struct amdgpu_acpi_xcc_info {
 struct amdgpu_acpi_dev_info {
struct list_head list;
struct list_head xcc_list;
-   uint16_t bdf;
+   uint32_t sbdf;
uint16_t supp_xcp_mode;
uint16_t xcp_mode;
uint16_t mem_mode;
@@ -927,7 +927,7 @@ static acpi_status amdgpu_acpi_get_node_id(acpi_handle 
handle,
 #endif
 }
 
-static struct amdgpu_acpi_dev_info *amdgpu_acpi_get_dev(u16 bdf)
+static struct amdgpu_acpi_dev_info *amdgpu_acpi_get_dev(u32 sbdf)
 {
struct amdgpu_acpi_dev_info *acpi_dev;
 
@@ -935,14 +935,14 @@ static struct amdgpu_acpi_dev_info 
*amdgpu_acpi_get_dev(u16 bdf)
return NULL;
 
list_for_each_entry(acpi_dev, &amdgpu_acpi_dev_list, list)
-   if (acpi_dev->bdf == bdf)
+   if (acpi_dev->sbdf == sbdf)
return acpi_dev;
 
return NULL;
 }
 
 static int amdgpu_acpi_dev_init(struct amdgpu_acpi_dev_info **dev_info,
-   struct amdgpu_acpi_xcc_info *xcc_info, u16 bdf)
+   struct amdgpu_acpi_xcc_info *xcc_info, u32 sbdf)
 {
struct amdgpu_acpi_dev_info *tmp;
union acpi_object *obj;
@@ -955,7 +955,7 @@ static int amdgpu_acpi_dev_init(struct amdgpu_acpi_dev_info 
**dev_info,
 
INIT_LIST_HEAD(&tmp->xcc_list);
INIT_LIST_HEAD(&tmp->list);
-   tmp->bdf = bdf;
+   tmp->sbdf = sbdf;
 
obj = acpi_evaluate_dsm_typed(xcc_info->handle, &amd_xcc_dsm_guid, 0,
  AMD_XCC_DSM_GET_SUPP_MODE, NULL,
@@ -1007,7 +1007,7 @@ static int amdgpu_acpi_dev_init(struct 
amdgpu_acpi_dev_info **dev_info,
 
DRM_DEBUG_DRIVER(
"New dev(%x): Supported xcp mode: %x curr xcp_mode : %x mem 
mode : %x, tmr base: %llx tmr size: %llx  ",
-   tmp->bdf, tmp->supp_xcp_mode, tmp->xcp_mode, tmp->mem_mode,
+   tmp->sbdf, tmp->supp_xcp_mode, tmp->xcp_mode, tmp->mem_mode,
tmp->tmr_base, tmp->tmr_size);
list_add_tail(&tmp->list, &amdgpu_acpi_dev_list);
*dev_info = tmp;
@@ -1023,7 +1023,7 @@ static int amdgpu_acpi_dev_init(struct 
amdgpu_acpi_dev_info **dev_info,
 }
 
 static int amdgpu_acpi_get_xcc_info(struct amdgpu_acpi_xcc_info *xcc_info,
-   u16 *bdf)
+   u32 *sbdf)
 {
union acpi_object *obj;
acpi_status status;
@@ -1054,8 +1054,10 @@ static int amdgpu_acpi_get_xcc_info(struct 
amdgpu_acpi_xcc_info *xcc_info,
xcc_info->phy_id = (obj->integer.value >> 32) & 0xFF;
/* xcp node of this xcc [47:40] */
xcc_info->xcp_node = (obj->integer.value >> 40) & 0xFF;
+   /* PF domain of this xcc [31:16] */
+   *sbdf = (obj->integer.value) & 0xFFFF0000;
/* PF bus/dev/fn of this xcc [63:48] */
-   *bdf = (obj->integer.value >> 48) & 0xFFFF;
+   *sbdf |= (obj->integer.value >> 48) & 0xFFFF;
ACPI_FREE(obj);
obj = NULL;
 
@@ -1079,7 +1081,7 @@ static int amdgpu_acpi_enumerate_xcc(void)
struct acpi_device *acpi_dev;
char hid[ACPI_ID_LEN];
int ret, id;
-   u16 bdf;
+   u32 sbdf;
 
INIT_LIST_HEAD(&amdgpu_acpi_dev_list);
xa_init(&numa_info_xa);
@@ -1107,16 +1109,16 @@ static int amdgpu_acpi_enumerate_xcc(void)
xcc_info->handle = acpi_device_handle(acpi_dev);
acpi_dev_put(acpi_dev);
 
-   ret = amdgpu_acpi_get_xcc_info(xcc_info, &bdf);
+   ret = amdgpu_acpi_get_xcc_info(xcc_info, &sbdf);
if (ret) {
kfree(xcc_info);
continue;
}
 
-   dev_info = amdgpu_acpi_get_dev(bdf);
+   dev_info = amdgpu_acpi_get_dev(sbdf);
 
if (!dev_info)
-   ret = amdgpu_acpi_dev_init(&dev_info, xcc_info, bdf);
+   ret = amdgpu_acpi_dev_init(&dev_info, xcc_info, sbdf);
 
if (ret == -ENOMEM)
return ret;
@@ -1136,13 +1138,14 @@ int amdgpu_acpi_get_tmr_info(struct amdgpu_device 
*adev, u64 *tmr_offset,
 u64 *tmr_size)
 {
struct amdgpu_acpi_dev_info *dev_info;
-   u16 bdf;
+   u32 sbdf;
 
if (!tmr_offset || !tmr_size)
return -EINVAL;
 
-   bdf = pci_dev_id(adev->pdev);
-   dev_info = amdgpu_acpi_get_dev(bdf);
+   sbdf = (pci_domain_nr(adev->pdev->bus) << 16);
+   sbdf |= 
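
The 32-bit key this patch switches to can be illustrated on its own. The sketch below assumes only the bit layout stated in the patch comments (domain in bits [31:16], bus/dev/fn in bits [63:48] of the ACPI integer); the helper names `make_sbdf` and `sbdf_from_dsm_value` are hypothetical and do not exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Combine a 16-bit PCI domain (segment) and a 16-bit bus/dev/fn into one key. */
static uint32_t make_sbdf(uint16_t domain, uint16_t bdf)
{
    return ((uint32_t)domain << 16) | bdf;
}

/*
 * Decode the same key from a 64-bit value laid out as the patch comments
 * describe: domain in bits [31:16], bus/dev/fn in bits [63:48].
 */
static uint32_t sbdf_from_dsm_value(uint64_t value)
{
    uint32_t sbdf = (uint32_t)(value & 0xFFFF0000u); /* domain stays in place */

    sbdf |= (uint32_t)((value >> 48) & 0xFFFF);      /* bdf moves to the low half */
    return sbdf;
}
```

Both sides of the lookup have to build the key the same way; with a 16-bit key, two devices sharing a bus/dev/fn in different PCI domains would be indistinguishable.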

[PATCH 2/3] drm/amdgpu: prepare the output buffer for GET_PEER_LINKS command

2023-10-20 Thread Shiwu Zhang
Per the XGMI TA implementation, KGD needs to write the node_ids of
interest into the shared command output buffer rather than the
command input buffer.

Input buffer is not used for GET_PEER_LINKS command execution.

In this way, xgmi ta can reuse the node info in the output buffer
just filled in and populate the same buffer with link info directly.

Signed-off-by: Shiwu Zhang 
Reviewed-by: Le Ma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 0c7900f0d906..cea17ce9ac99 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1431,14 +1431,22 @@ int psp_xgmi_get_topology_info(struct psp_context *psp,
amdgpu_ip_version(psp->adev, MP0_HWIP, 0) ==
IP_VERSION(13, 0, 6);
 
-   xgmi_cmd->cmd_id = TA_COMMAND_XGMI__GET_PEER_LINKS;
+   link_info_output = &xgmi_cmd->xgmi_out_message.get_link_info;
+   /* popluate the shared output buffer rather than the cmd input 
buffer
+* with node_ids as the input for GET_PEER_LINKS command 
execution.
+* This is required for GET_PEER_LINKS only per xgmi ta 
implementation
+*/
+   for (i = 0; i < topology->num_nodes; i++) {
+   link_info_output->nodes[i].node_id = 
topology->nodes[i].node_id;
+   }
+   link_info_output->num_nodes = topology->num_nodes;
 
+   xgmi_cmd->cmd_id = TA_COMMAND_XGMI__GET_PEER_LINKS;
ret = psp_xgmi_invoke(psp, TA_COMMAND_XGMI__GET_PEER_LINKS);
 
if (ret)
return ret;
 
-   link_info_output = &xgmi_cmd->xgmi_out_message.get_link_info;
for (i = 0; i < topology->num_nodes; i++) {
/* accumulate num_links on extended data */
topology->nodes[i].num_links = get_extended_data ?
-- 
2.17.1



[PATCH 3/3] drm/amdgpu: support the port num info based on the capability flag

2023-10-20 Thread Shiwu Zhang
XGMI TA will set the capability flag to indicate whether the port_num
info is supported or not. KGD checks the flag and accordingly picks up
the right buffer format and sends the right command to TA to retrieve
the info.

v2: simplify the code by reusing the same statement (lijo)

Signed-off-by: Shiwu Zhang 
Acked-by: Lijo Lazar 
Reviewed-by: Le Ma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 45 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h |  1 +
 2 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index cea17ce9ac99..7eede4747fe2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1267,6 +1267,8 @@ int psp_xgmi_initialize(struct psp_context *psp, bool 
set_extended_data, bool lo
xgmi_cmd->cmd_id = TA_COMMAND_XGMI__INITIALIZE;
 
ret = psp_xgmi_invoke(psp, xgmi_cmd->cmd_id);
+   /* note down the capbility flag for XGMI TA */
+   psp->xgmi_context.xgmi_ta_caps = xgmi_cmd->caps_flag;
 
return ret;
 }
@@ -1425,35 +1427,52 @@ int psp_xgmi_get_topology_info(struct psp_context *psp,
/* Invoke xgmi ta again to get the link information */
if (psp_xgmi_peer_link_info_supported(psp)) {
struct ta_xgmi_cmd_get_peer_link_info *link_info_output;
+   struct ta_xgmi_cmd_get_extend_peer_link_info 
*link_extend_info_output;
bool requires_reflection =
(psp->xgmi_context.supports_extended_data &&
 get_extended_data) ||
amdgpu_ip_version(psp->adev, MP0_HWIP, 0) ==
IP_VERSION(13, 0, 6);
+   bool ta_port_num_support = psp->xgmi_context.xgmi_ta_caps &
+   EXTEND_PEER_LINK_INFO_CMD_FLAG;
 
-   link_info_output = &xgmi_cmd->xgmi_out_message.get_link_info;
/* popluate the shared output buffer rather than the cmd input 
buffer
 * with node_ids as the input for GET_PEER_LINKS command 
execution.
-* This is required for GET_PEER_LINKS only per xgmi ta 
implementation
+* This is required for GET_PEER_LINKS per xgmi ta 
implementation.
+* The same requirement for GET_EXTEND_PEER_LINKS command.
 */
-   for (i = 0; i < topology->num_nodes; i++) {
-   link_info_output->nodes[i].node_id = 
topology->nodes[i].node_id;
-   }
-   link_info_output->num_nodes = topology->num_nodes;
+   if (ta_port_num_support) {
+   link_extend_info_output = 
&xgmi_cmd->xgmi_out_message.get_extend_link_info;
+
+   for (i = 0; i < topology->num_nodes; i++)
+   link_extend_info_output->nodes[i].node_id = 
topology->nodes[i].node_id;
+
+   link_extend_info_output->num_nodes = 
topology->num_nodes;
+   xgmi_cmd->cmd_id = 
TA_COMMAND_XGMI__GET_EXTEND_PEER_LINKS;
+   } else {
+   link_info_output = 
&xgmi_cmd->xgmi_out_message.get_link_info;
 
-   xgmi_cmd->cmd_id = TA_COMMAND_XGMI__GET_PEER_LINKS;
-   ret = psp_xgmi_invoke(psp, TA_COMMAND_XGMI__GET_PEER_LINKS);
+   for (i = 0; i < topology->num_nodes; i++)
+   link_info_output->nodes[i].node_id = 
topology->nodes[i].node_id;
 
+   link_info_output->num_nodes = topology->num_nodes;
+   xgmi_cmd->cmd_id = TA_COMMAND_XGMI__GET_PEER_LINKS;
+   }
+
+   ret = psp_xgmi_invoke(psp, xgmi_cmd->cmd_id);
if (ret)
return ret;
 
for (i = 0; i < topology->num_nodes; i++) {
+   uint8_t node_num_links = ta_port_num_support ?
+   link_extend_info_output->nodes[i].num_links : 
link_info_output->nodes[i].num_links;
/* accumulate num_links on extended data */
-   topology->nodes[i].num_links = get_extended_data ?
-   topology->nodes[i].num_links +
-   
link_info_output->nodes[i].num_links :
-   ((requires_reflection && 
topology->nodes[i].num_links) ? topology->nodes[i].num_links :
-link_info_output->nodes[i].num_links);
+   if (get_extended_data) {
+   topology->nodes[i].num_links = 
topology->nodes[i].num_links + node_num_links;
+   } else {
+   topology->nodes[i].num_links = 
(requires_reflection && topology->nodes[i].num_links) ?
+  

[PATCH 1/3] drm/amdgpu: update the xgmi ta interface header

2023-10-20 Thread Shiwu Zhang
Update the header file to v20.00.00.13

v1: rename TA_COMMAND_XGMI__GET_GET_TOPOLOGY_INFO to
TA_COMMAND_XGMI__GET_TOPOLOGY_INFO

And also rename struct ta_xgmi_cmd_get_peer_link_info_output to
ta_xgmi_cmd_get_peer_link_info accordingly

v2: add structs to support xgmi GET_EXTEND_PEER_LINK command

Signed-off-by: Shiwu Zhang 
Reviewed-by: Le Ma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c |  6 +--
 drivers/gpu/drm/amd/amdgpu/ta_xgmi_if.h | 62 +++--
 2 files changed, 51 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 7158d478eeea..0c7900f0d906 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1388,7 +1388,7 @@ int psp_xgmi_get_topology_info(struct psp_context *psp,
 
/* Fill in the shared memory with topology information as input */
topology_info_input = &xgmi_cmd->xgmi_in_message.get_topology_info;
-   xgmi_cmd->cmd_id = TA_COMMAND_XGMI__GET_GET_TOPOLOGY_INFO;
+   xgmi_cmd->cmd_id = TA_COMMAND_XGMI__GET_TOPOLOGY_INFO;
topology_info_input->num_nodes = number_devices;
 
for (i = 0; i < topology_info_input->num_nodes; i++) {
@@ -1399,7 +1399,7 @@ int psp_xgmi_get_topology_info(struct psp_context *psp,
}
 
/* Invoke xgmi ta to get the topology information */
-   ret = psp_xgmi_invoke(psp, TA_COMMAND_XGMI__GET_GET_TOPOLOGY_INFO);
+   ret = psp_xgmi_invoke(psp, TA_COMMAND_XGMI__GET_TOPOLOGY_INFO);
if (ret)
return ret;
 
@@ -1424,7 +1424,7 @@ int psp_xgmi_get_topology_info(struct psp_context *psp,
 
/* Invoke xgmi ta again to get the link information */
if (psp_xgmi_peer_link_info_supported(psp)) {
-   struct ta_xgmi_cmd_get_peer_link_info_output *link_info_output;
+   struct ta_xgmi_cmd_get_peer_link_info *link_info_output;
bool requires_reflection =
(psp->xgmi_context.supports_extended_data &&
 get_extended_data) ||
diff --git a/drivers/gpu/drm/amd/amdgpu/ta_xgmi_if.h 
b/drivers/gpu/drm/amd/amdgpu/ta_xgmi_if.h
index da815a93d46e..d5748032674e 100644
--- a/drivers/gpu/drm/amd/amdgpu/ta_xgmi_if.h
+++ b/drivers/gpu/drm/amd/amdgpu/ta_xgmi_if.h
@@ -1,5 +1,5 @@
 /*
- * Copyright 2018 Advanced Micro Devices, Inc.
+ * Copyright 2018-2022 Advanced Micro Devices, Inc.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
  * copy of this software and associated documentation files (the "Software"),
@@ -20,7 +20,6 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  *
  */
-
 #ifndef _TA_XGMI_IF_H
 #define _TA_XGMI_IF_H
 
@@ -28,20 +27,31 @@
 #define RSP_ID_MASK (1U << 31)
 #define RSP_ID(cmdId) (((uint32_t)(cmdId)) | RSP_ID_MASK)
 
+#define EXTEND_PEER_LINK_INFO_CMD_FLAG 1
+
 enum ta_command_xgmi {
+   /* Initialize the Context and Session Topology */
TA_COMMAND_XGMI__INITIALIZE = 0x00,
+   /* Gets the current GPU's node ID */
TA_COMMAND_XGMI__GET_NODE_ID= 0x01,
+   /* Gets the current GPU's hive ID */
TA_COMMAND_XGMI__GET_HIVE_ID= 0x02,
-   TA_COMMAND_XGMI__GET_GET_TOPOLOGY_INFO  = 0x03,
+   /* Gets the Peer's topology Information */
+   TA_COMMAND_XGMI__GET_TOPOLOGY_INFO  = 0x03,
+   /* Sets the Peer's topology Information */
TA_COMMAND_XGMI__SET_TOPOLOGY_INFO  = 0x04,
-   TA_COMMAND_XGMI__GET_PEER_LINKS = 0x0B
+   /* Gets the total links between adjacent peer dies in hive */
+   TA_COMMAND_XGMI__GET_PEER_LINKS = 0x0B,
+   /* Gets the total links and connected port numbers between adjacent 
peer dies in hive */
+   TA_COMMAND_XGMI__GET_EXTEND_PEER_LINKS  = 0x0C
 };
 
 /* XGMI related enumerations */
 /**/;
-enum ta_xgmi_connected_nodes {
-   TA_XGMI__MAX_CONNECTED_NODES= 64
-};
+enum { TA_XGMI__MAX_CONNECTED_NODES = 64 };
+enum { TA_XGMI__MAX_INTERNAL_STATE = 32 };
+enum { TA_XGMI__MAX_INTERNAL_STATE_BUFFER = 128 };
+enum { TA_XGMI__MAX_PORT_NUM = 8 };
 
 enum ta_xgmi_status {
TA_XGMI_STATUS__SUCCESS = 0x00,
@@ -81,6 +91,18 @@ struct ta_xgmi_peer_link_info {
uint8_t num_links;
 };
 
+struct xgmi_connected_port_num {
+   uint8_t dst_xgmi_port_num;
+   uint8_t src_xgmi_port_num;
+};
+
+/* support both the port num and num_links */
+struct ta_xgmi_extend_peer_link_info {
+   uint64_tnode_id;
+   uint8_t num_links;
+   struct xgmi_connected_port_num  port_num[TA_XGMI__MAX_PORT_NUM];
+};
+
 struct ta_xgmi_cmd_initialize_output {
uint32_tstatus;
 };
@@ -103,16 +125,21 @@ struct 

[PATCH v2] drm/amdkfd: Use partial mapping in GPU page fault recovery

2023-10-20 Thread Xiaogang . Chen
From: Xiaogang Chen 

After a partial migration to recover from a GPU page fault, this patch maps
into the GPU VM space only the page range that was migrated, instead of
mapping all pages of the SVM range in which the page fault happened.

Signed-off-by: Xiaogang Chen
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 54af7a2b29f8..58f0506d5221 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1619,6 +1619,7 @@ static void *kfd_svm_page_owner(struct kfd_process *p, 
int32_t gpuidx)
  * 5. Release page table (and SVM BO) reservation
  */
 static int svm_range_validate_and_map(struct mm_struct *mm,
+ unsigned long map_start, unsigned long 
map_last,
  struct svm_range *prange, int32_t gpuidx,
  bool intr, bool wait, bool flush_tlb)
 {
@@ -1747,9 +1748,16 @@ static int svm_range_validate_and_map(struct mm_struct 
*mm,
r = -EAGAIN;
}
 
-   if (!r)
-   r = svm_range_map_to_gpus(prange, offset, npages, 
readonly,
- ctx->bitmap, wait, flush_tlb);
+   if (!r) {
+   map_start = max(map_start, prange->start + offset);
+   map_last = min(map_last, prange->start + offset + 
npages - 1);
+   if (map_start <= map_last) {
+   offset = map_start - prange->start;
+   npages = map_last - map_start + 1;
+   r = svm_range_map_to_gpus(prange, offset, 
npages, readonly,
+ ctx->bitmap, wait, 
flush_tlb);
+   }
+   }
 
if (!r && next == end)
prange->mapped_to_gpu = true;
@@ -1855,8 +1863,8 @@ static void svm_range_restore_work(struct work_struct 
*work)
 */
mutex_lock(>migrate_mutex);
 
-   r = svm_range_validate_and_map(mm, prange, MAX_GPU_INSTANCE,
-  false, true, false);
+   r = svm_range_validate_and_map(mm, prange->start, prange->last, 
prange,
+  MAX_GPU_INSTANCE, false, true, 
false);
if (r)
pr_debug("failed %d to map 0x%lx to gpus\n", r,
 prange->start);
@@ -3069,6 +3077,8 @@ svm_range_restore_pages(struct amdgpu_device *adev, 
unsigned int pasid,
kfd_smi_event_page_fault_start(node, p->lead_thread->pid, addr,
   write_fault, timestamp);
 
+   start = prange->start;
+   last = prange->last;
if (prange->actual_loc != 0 || best_loc != 0) {
migration = true;
/* Align migration range start and size to granularity size */
@@ -3102,10 +3112,11 @@ svm_range_restore_pages(struct amdgpu_device *adev, 
unsigned int pasid,
}
}
 
-   r = svm_range_validate_and_map(mm, prange, gpuidx, false, false, false);
+   r = svm_range_validate_and_map(mm, start, last, prange, gpuidx, false,
+  false, false);
if (r)
pr_debug("failed %d to map svms 0x%p [0x%lx 0x%lx] to gpus\n",
-r, svms, prange->start, prange->last);
+r, svms, start, last);
 
kfd_smi_event_page_fault_end(node, p->lead_thread->pid, addr,
 migration);
@@ -3650,7 +3661,7 @@ svm_range_set_attr(struct kfd_process *p, struct 
mm_struct *mm,
 
flush_tlb = !migrated && update_mapping && 
prange->mapped_to_gpu;
 
-   r = svm_range_validate_and_map(mm, prange, MAX_GPU_INSTANCE,
+   r = svm_range_validate_and_map(mm, prange->start, prange->last, 
prange, MAX_GPU_INSTANCE,
   true, true, flush_tlb);
if (r)
pr_debug("failed %d to map svm range\n", r);
-- 
2.25.1
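
The clamping done in the svm_range_validate_and_map() hunk above can be looked
at in isolation. What follows is a minimal user-space sketch, not the kernel
code: clamp_map_range() and struct page_span are hypothetical names introduced
only for illustration, and page numbers are plain unsigned long values. It
intersects the requested window [map_start, map_last] with the chunk that was
just validated and reports whether anything is left to map.

```c
#include <stdbool.h>

/* Hypothetical helper type: an inclusive range of page numbers. */
struct page_span {
	unsigned long start;
	unsigned long last;
};

/*
 * Intersect the requested mapping window [map_start, map_last] with the
 * chunk [chunk_start, chunk_start + npages - 1] that was just validated.
 * Returns false when the two do not overlap, in which case nothing
 * should be mapped for this chunk and *out is left untouched.
 */
static bool clamp_map_range(unsigned long map_start, unsigned long map_last,
			    unsigned long chunk_start, unsigned long npages,
			    struct page_span *out)
{
	unsigned long chunk_last = chunk_start + npages - 1;
	unsigned long s = map_start > chunk_start ? map_start : chunk_start;
	unsigned long l = map_last < chunk_last ? map_last : chunk_last;

	if (s > l)
		return false;

	out->start = s;
	out->last = l;
	return true;
}
```

With a chunk covering pages 0x1000..0x10ff and a requested window of
0x1080..0x117f, the helper yields 0x1080..0x10ff; a window that lies entirely
outside the chunk yields no mapping at all, which corresponds to the
"map_start <= map_last" check in the patch.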



[bug report] drm/amd/display: Introduce DML2

2023-10-20 Thread Dan Carpenter
Hello Qingqing Zhuo,

The patch 7966f319c66d: "drm/amd/display: Introduce DML2" from Jul
28, 2023 (linux-next), leads to the following Smatch static checker
warning:

drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml2_wrapper.c:77 
map_hw_resources()
error: buffer overflow 
'dml2->v20.scratch.dml_to_dc_pipe_mapping.disp_cfg_to_stream_id' 6 <= 7

drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml2_wrapper.c:79 
map_hw_resources()
error: buffer overflow 
'dml2->v20.scratch.dml_to_dc_pipe_mapping.disp_cfg_to_plane_id' 6 <= 7

drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml2_wrapper.c
59 static void map_hw_resources(struct dml2_context *dml2,
60 struct dml_display_cfg_st *in_out_display_cfg, struct 
dml_mode_support_info_st *mode_support_info)
61 {
62 unsigned int num_pipes = 0;
63 int i, j;
64 
65 for (i = 0; i < __DML_NUM_PLANES__; i++) {
   ^^

__DML_NUM_PLANES__ is 8.  This loops 0-7.

66 in_out_display_cfg->hw.ODMMode[i] = 
mode_support_info->ODMMode[i];
67 in_out_display_cfg->hw.DPPPerSurface[i] = 
mode_support_info->DPPPerSurface[i];
68 in_out_display_cfg->hw.DSCEnabled[i] = 
mode_support_info->DSCEnabled[i];
69 in_out_display_cfg->hw.NumberOfDSCSlices[i] = 
mode_support_info->NumberOfDSCSlices[i];
70 in_out_display_cfg->hw.DLGRefClkFreqMHz = 24;
71 if (dml2->v20.dml_core_ctx.project != dml_project_dcn35 
&&
72 dml2->v20.dml_core_ctx.project != 
dml_project_dcn351) {
73 /*dGPU default as 50Mhz*/
74 in_out_display_cfg->hw.DLGRefClkFreqMHz = 50;
75 }
76 for (j = 0; j < mode_support_info->DPPPerSurface[i]; 
j++) {
--> 77 
dml2->v20.scratch.dml_to_dc_pipe_mapping.dml_pipe_idx_to_stream_id[num_pipes] = 
dml2->v20.scratch.dml_to_dc_pipe_mapping.disp_cfg_to_stream_id[i];

Both of these arrays have 6 elements.

78 
dml2->v20.scratch.dml_to_dc_pipe_mapping.dml_pipe_idx_to_stream_id_valid[num_pipes]
 = true;
79 
dml2->v20.scratch.dml_to_dc_pipe_mapping.dml_pipe_idx_to_plane_id[num_pipes] = 
dml2->v20.scratch.dml_to_dc_pipe_mapping.disp_cfg_to_plane_id[i];
80 
dml2->v20.scratch.dml_to_dc_pipe_mapping.dml_pipe_idx_to_plane_id_valid[num_pipes]
 = true;
81 num_pipes++;
82 }
83 }
84 }

regards,
dan carpenter
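
The mismatch reported above (a loop that runs over 8 planes while indexing
arrays of 6 elements) can be reproduced in miniature. The following is a
sketch only: NUM_PLANES, MAX_PIPES and copy_stream_ids() are hypothetical
stand-ins, not DML2 identifiers, and the sizes 8 and 6 are taken from the
report. Stopping at the array bound is just one possible remedy and is not
necessarily the fix the driver should adopt.

```c
#include <stddef.h>

#define NUM_PLANES 8 /* stand-in for the loop bound; 8 per the report */
#define MAX_PIPES  6 /* stand-in for the array size; 6 per the report */

/*
 * Copy per-plane stream ids into a per-pipe table without indexing
 * past either array. Returns the number of pipes written.
 */
static size_t copy_stream_ids(const unsigned int src[MAX_PIPES],
			      const unsigned int dpp_per_surface[NUM_PLANES],
			      unsigned int dst[MAX_PIPES])
{
	size_t num_pipes = 0;

	for (size_t i = 0; i < NUM_PLANES; i++) {
		/* src has only MAX_PIPES entries, so stop before reading src[6]. */
		if (i >= MAX_PIPES)
			break;

		for (unsigned int j = 0; j < dpp_per_surface[i]; j++) {
			/* dst is also MAX_PIPES long; refuse to write beyond it. */
			if (num_pipes >= MAX_PIPES)
				return num_pipes;
			dst[num_pipes++] = src[i];
		}
	}
	return num_pipes;
}
```

Without the two bound checks, the same loop would read src[6] and src[7] and
could write past the end of dst, which is the pattern the static checker
flags.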


[bug report] drm/amd/display: Introduce DML2

2023-10-20 Thread Dan Carpenter
Hello Qingqing Zhuo,

The patch 7966f319c66d: "drm/amd/display: Introduce DML2" from Jul
28, 2023 (linux-next), leads to the following Smatch static checker
warning:

drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:2748 
TruncToValidBPP()
warn: inconsistent indenting

drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c
2736 MinDSCBPP = 8;
2737 MaxDSCBPP = 3 * DSCInputBitPerComponent - 1.0 / 16;
2738 } else {
2739 if (Output == dml_hdmi) {
2740 NonDSCBPP0 = 24;
2741 NonDSCBPP1 = 24;
2742 NonDSCBPP2 = 24;
2743 } else {
2744 NonDSCBPP0 = 16;
2745 NonDSCBPP1 = 20;
2746 NonDSCBPP2 = 24;
2747 }
--> 2748 if (Format == dml_n422) {

This code should be indented another tab.

2749 MinDSCBPP = 7;
2750 MaxDSCBPP = 2 * DSCInputBitPerComponent - 1.0 
/ 16.0;
2751 } else {
2752 MinDSCBPP = 8;
2753 MaxDSCBPP = 3 * DSCInputBitPerComponent - 1.0 
/ 16.0;
2754 }
2755 }
2756 
2757 if (Output == dml_dp2p0) {
2758 MaxLinkBPP = LinkBitRate * Lanes / PixelClock * 128.0 
/ 132.0 * 383.0 / 384.0 * 65536.0 / 65540.0;

There are a bunch of other warnings as well.  Too many to review.

regards,
dan carpenter

drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:2903 dm_resume() 
warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/basics/dce_calcs.c:914 
calculate_bandwidth() error: uninitialized symbol 'max_chunks_fbc_mode'.
drivers/gpu/drm/amd/amdgpu/../display/dc/basics/dce_calcs.c:917 
calculate_bandwidth() error: uninitialized symbol 'max_chunks_fbc_mode'.
drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c:702 
dcn35_clk_mgr_helper_populate_bw_params() warn: if statement not indented
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:1207 
CalculatePrefetchSchedule() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:1288 
CalculatePrefetchSchedule() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:1455 
CalculatePrefetchSchedule() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:1558 
CalculatePrefetchSchedule() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:2645 
CalculateVMAndRowBytes() warn: previously used '*PixelPTEReqWidth' as divisor 
(see line 2634)
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:2748 
TruncToValidBPP() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:2750 
TruncToValidBPP() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:2812 
TruncToValidBPP() warn: ignoring unreachable code.
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:3171 
CalculateDCFCLKDeepSleep() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:3749 
CalculateVMGroupAndRequestTimes() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:4051 
CalculateStutterEfficiency() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:4201 
CalculateSwathAndDETConfiguration() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:4247 
CalculateSwathAndDETConfiguration() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:4784 
CalculateSurfaceSizeInMall() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:4789 
CalculateSurfaceSizeInMall() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:5196 
CalculateVMRowAndSwath() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:6013 
CalculatePrefetchBandwithSupport() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:6195 
CalculateMaxVStartup() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:6916 
dml_core_mode_support() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:7206 
dml_core_mode_support() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:7455 
dml_core_mode_support() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/display_mode_core.c:7869 
dml_core_mode_support() warn: inconsistent 

Re: [PATCH] drm/amd/pm: fix the high voltage and temperature issue on smu 13

2023-10-20 Thread Alex Deucher
On Fri, Oct 20, 2023 at 4:32 AM Kenneth Feng  wrote:
>
> fix the high voltage and temperature issue after the driver is unloaded on 
> smu 13.0.0,
> smu 13.0.7 and smu 13.0.10
>
> Signed-off-by: Kenneth Feng 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 36 +++
>  drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c|  4 +--
>  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 27 --
>  drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  1 +
>  drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h  |  2 ++
>  .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c| 13 +++
>  .../drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c  |  8 -
>  .../drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c  |  8 -
>  8 files changed, 86 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 31f8c3ead161..c5c892a8b3f9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3986,13 +3986,23 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> }
> }
> } else {
> -   tmp = amdgpu_reset_method;
> -   /* It should do a default reset when loading or 
> reloading the driver,
> -* regardless of the module parameter reset_method.
> -*/
> -   amdgpu_reset_method = AMD_RESET_METHOD_NONE;
> -   r = amdgpu_asic_reset(adev);
> -   amdgpu_reset_method = tmp;
> +   switch (amdgpu_ip_version(adev, MP1_HWIP, 0)) {
> +   case IP_VERSION(13, 0, 0):
> +   case IP_VERSION(13, 0, 7):
> +   case IP_VERSION(13, 0, 10):
> +   r = psp_gpu_reset(adev);
> +   break;
> +   default:
> +   tmp = amdgpu_reset_method;
> +   /* It should do a default reset when loading 
> or reloading the driver,
> +* regardless of the module parameter 
> reset_method.
> +*/
> +   amdgpu_reset_method = AMD_RESET_METHOD_NONE;
> +   r = amdgpu_asic_reset(adev);
> +   amdgpu_reset_method = tmp;
> +   break;
> +   }
> +
> if (r) {
> dev_err(adev->dev, "asic reset on init 
> failed\n");
> goto failed;
> @@ -5945,6 +5955,18 @@ int amdgpu_device_baco_exit(struct drm_device *dev)
> return -ENOTSUPP;
>
> ret = amdgpu_dpm_baco_exit(adev);
> +
> +   if (!ret)
> +   switch (amdgpu_ip_version(adev, MP1_HWIP, 0)) {
> +   case IP_VERSION(13, 0, 0):
> +   case IP_VERSION(13, 0, 7):
> +   case IP_VERSION(13, 0, 10):
> +   adev->gfx.is_poweron = false;
> +   break;
> +   default:
> +   break;
> +   }

Maybe better to move this into smu_v13_0_0_baco_exit() so we keep the
asic specific details out of the common files?

> +
> if (ret)
> return ret;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> index 80ca2c05b0b8..3ad38e42773b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> @@ -73,7 +73,7 @@ gmc_v11_0_vm_fault_interrupt_state(struct amdgpu_device 
> *adev,
>  * fini/suspend, so the overall state doesn't
>  * change over the course of suspend/resume.
>  */
> -   if (!adev->in_s0ix)
> +   if (!adev->in_s0ix && adev->gfx.is_poweron)
> amdgpu_gmc_set_vm_fault_masks(adev, AMDGPU_GFXHUB(0), 
> false);
> break;
> case AMDGPU_IRQ_STATE_ENABLE:
> @@ -85,7 +85,7 @@ gmc_v11_0_vm_fault_interrupt_state(struct amdgpu_device 
> *adev,
>  * fini/suspend, so the overall state doesn't
>  * change over the course of suspend/resume.
>  */
> -   if (!adev->in_s0ix)
> +   if (!adev->in_s0ix && adev->gfx.is_poweron)
> amdgpu_gmc_set_vm_fault_masks(adev, AMDGPU_GFXHUB(0), 
> true);
> break;
> default:


These changes are probably a valid bug fix on their own.

> diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
> b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> index 7c3356d6da5e..30e5f7161737 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> @@ -733,7 +733,7 @@ static int 

Re: [PATCH 1/2] drm/amdgpu: Add timeout for sync wait

2023-10-20 Thread Christian König
No, waiting forever is what is expected and a perfectly valid user
experience.


Waiting with a timeout on the other hand sounds like a really bad idea 
to me.


Every wait with a timeout needs a justification, for example that
userspace explicitly specified it. And I absolutely don't see that here.


Regards,
Christian.

On 20.10.23 at 10:52, Deng, Emily wrote:

[AMD Official Use Only - General]

Hi Christian,
  The issue is reproduced by running a compute hang test with quark, which
triggers a compute job timeout. For compute, the timeout setting is 60s, but
for gfx and sdma it is 10s.
So getting the timeout from the sched is reasonable, since it differs per sched.
 And if the wait times out, it will print an error, so it won't hide real
issues. And even if there is a real issue, waiting forever is a bad user
experience, and the driver can't work anymore.

Emily Deng
Best Wishes




-Original Message-
From: Christian König 
Sent: Friday, October 20, 2023 3:29 PM
To: Deng, Emily ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/2] drm/amdgpu: Add timeout for sync wait

On 20.10.23 at 08:13, Emily Deng wrote:

Issue: A deadlock happens during GPU recovery; the call sequence is as below:

amdgpu_device_gpu_recover->amdgpu_amdkfd_pre_reset->flush_delayed_work->
amdgpu_amdkfd_gpuvm_restore_process_bos->amdgpu_sync_wait

It is because amdgpu_sync_wait is waiting for the bad job's fence
and never returns, so the recovery can't continue.


Signed-off-by: Emily Deng 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 11 +--
   1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index dcd8c066bc1f..6253d6aab7f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -406,8 +406,15 @@ int amdgpu_sync_wait(struct amdgpu_sync *sync,

bool intr)

  int i, r;

  hash_for_each_safe(sync->fences, i, tmp, e, node) {
-r = dma_fence_wait(e->fence, intr);
-if (r)
+struct drm_sched_fence *s_fence = to_drm_sched_fence(e->fence);
+long timeout = msecs_to_jiffies(1);

That handling doesn't make much sense. If you need a timeout then you need
a timeout for the whole function.

In addition to that, timeouts often just hide real problems which need fixing.

So this needs a much better justification, otherwise it's a pretty clear
NAK.

Regards,
Christian.


+
+if (s_fence)
+timeout = s_fence->sched->timeout;
+
+if (r == 0)
+r = -ETIMEDOUT;
+if (r < 0)
  return r;

  amdgpu_sync_entry_free(e);
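
The remark above that a timeout, if needed at all, has to cover the whole
function can be sketched in user-space C. This is not amdgpu code and, given
the objection raised in this thread, not a proposed fix: struct fake_fence,
wait_one() and wait_all() are hypothetical. They only mimic the convention,
used by dma_fence_wait_timeout(), of returning the remaining time on success
and 0 on timeout.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical fence stand-in: signals after 'latency' time units. */
struct fake_fence {
	long latency;
};

/*
 * Hypothetical single wait. Returns the time left if the fence signals
 * before 'timeout' expires, or 0 if the wait timed out.
 */
static long wait_one(const struct fake_fence *f, long timeout)
{
	if (f->latency >= timeout)
		return 0;
	return timeout - f->latency;
}

/*
 * Wait for all fences under ONE shared budget: whatever a wait leaves
 * over becomes the timeout of the next wait, so the total is bounded by
 * 'budget' instead of by count * per-fence timeout.
 */
static int wait_all(const struct fake_fence *fences, size_t count, long budget)
{
	long left = budget;

	for (size_t i = 0; i < count; i++) {
		left = wait_one(&fences[i], left);
		if (left == 0)
			return -ETIMEDOUT;
	}
	return 0;
}
```

With a per-fence timeout, three slow fences could each stay just under their
own limit while the function as a whole blocks three times as long; carrying
the remainder forward makes the bound apply to the entire loop.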




[PATCH] drm/amdkfd: reserve a fence slot while locking the BO

2023-10-20 Thread Christian König
Looks like the KFD still needs this.

Signed-off-by: Christian König 
Fixes: 8abc1eb2987a ("drm/amdkfd: switch over to using drm_exec v3")
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 7d6daf8d2bfa..e036011137aa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1103,7 +1103,7 @@ static int reserve_bo_and_vm(struct kgd_mem *mem,
if (unlikely(ret))
goto error;
 
-   ret = drm_exec_lock_obj(>exec, >tbo.base);
+   ret = drm_exec_prepare_obj(>exec, >tbo.base, 1);
drm_exec_retry_on_contention(>exec);
if (unlikely(ret))
goto error;
-- 
2.34.1



Re: [PATCH Review 1/1] drm/amdgpu: Workaround to skip kiq ring test during ras gpu recovery

2023-10-20 Thread Christian König

On 17.10.23 at 16:36, Stanley.Yang wrote:

This is a workaround: the kiq ring test fails in the suspend stage when
doing ras recovery for gfx v9_4_3.


Any idea why that failed? Problems like this usually point to an 
incorrect init or in this case re-init procedure and are actually what 
the ring test should uncover.


Christian.



Change-Id: I8de9900aa76706f59bc029d4e9e8438c6e1db8e0
Signed-off-by: Stanley.Yang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 21 +
  1 file changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 9a158018ae16..902e60203809 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -29,6 +29,7 @@
  #include "amdgpu_rlc.h"
  #include "amdgpu_ras.h"
  #include "amdgpu_xcp.h"
+#include "amdgpu_xgmi.h"
  
  /* delay 0.1 second to enable gfx off feature */

  #define GFX_OFF_DELAY_ENABLE msecs_to_jiffies(100)
@@ -501,6 +502,9 @@ int amdgpu_gfx_disable_kcq(struct amdgpu_device *adev, int 
xcc_id)
  {
struct amdgpu_kiq *kiq = >gfx.kiq[xcc_id];
struct amdgpu_ring *kiq_ring = >ring;
+   struct amdgpu_hive_info *hive;
+   struct amdgpu_ras *ras;
+   int hive_ras_recovery;
int i, r = 0;
int j;
  
@@ -521,6 +525,23 @@ int amdgpu_gfx_disable_kcq(struct amdgpu_device *adev, int xcc_id)

   RESET_QUEUES, 0, 0);
}
  
+	/**

+* This is workaround: only skip kiq_ring test
+* during ras recovery in suspend stage for gfx v9_4_3
+*/
+   hive = amdgpu_get_xgmi_hive(adev);
+   if (hive) {
+   hive_ras_recovery = atomic_read(>ras_recovery);
+   amdgpu_put_xgmi_hive(hive);
+   }
+
+   ras = amdgpu_ras_get_context(adev);
+   if ((amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3)) &&
+   ras && (atomic_read(>in_recovery) || hive_ras_recovery)) {
+   spin_unlock(>ring_lock);
+   return 0;
+   }
+
if (kiq_ring->sched.ready && !adev->job_hang)
r = amdgpu_ring_test_helper(kiq_ring);
spin_unlock(>ring_lock);




RE: [PATCH] drm/amdgpu: enable RAS poison mode for APU

2023-10-20 Thread Yang, Stanley
[AMD Official Use Only - General]

Reviewed-by: Stanley.Yang 

Regards,
Stanley
> -Original Message-
> From: amd-gfx  On Behalf Of Tao
> Zhou
> Sent: Friday, October 20, 2023 6:26 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhou1, Tao 
> Subject: [PATCH] drm/amdgpu: enable RAS poison mode for APU
>
> Enable it by default on APU platform.
>
> Signed-off-by: Tao Zhou 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 95c181cd1fea..a41cab0a2f9c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2710,7 +2710,8 @@ static void
> amdgpu_ras_query_poison_mode(struct amdgpu_device *adev)
>   return;
>
>   /* Init poison supported flag, the default value is false */
> - if (adev->gmc.xgmi.connected_to_cpu) {
> + if (adev->gmc.xgmi.connected_to_cpu ||
> + adev->gmc.is_app_apu) {
>   /* enabled by default when GPU is connected to CPU */
>   con->poison_supported = true;
>   } else if (adev->df.funcs &&
> --
> 2.35.1



[PATCH] drm/amdgpu: enable RAS poison mode for APU

2023-10-20 Thread Tao Zhou
Enable it by default on APU platform.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 95c181cd1fea..a41cab0a2f9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2710,7 +2710,8 @@ static void amdgpu_ras_query_poison_mode(struct 
amdgpu_device *adev)
return;
 
/* Init poison supported flag, the default value is false */
-   if (adev->gmc.xgmi.connected_to_cpu) {
+   if (adev->gmc.xgmi.connected_to_cpu ||
+   adev->gmc.is_app_apu) {
/* enabled by default when GPU is connected to CPU */
con->poison_supported = true;
} else if (adev->df.funcs &&
-- 
2.35.1



[PATCH 2/2] drm/amdgpu: Add timeout for sync wait

2023-10-20 Thread Emily Deng
Issue: A deadlock happens during GPU recovery; the call sequence is as below:

amdgpu_device_gpu_recover->amdgpu_amdkfd_pre_reset->flush_delayed_work->
amdgpu_amdkfd_gpuvm_restore_process_bos->amdgpu_sync_wait

It is because amdgpu_sync_wait is waiting for the bad job's fence and
never returns, so the recovery can't continue.

Signed-off-by: Emily Deng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index dcd8c066bc1f..9d4f122a7bf0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -406,8 +406,15 @@ int amdgpu_sync_wait(struct amdgpu_sync *sync, bool intr)
int i, r;
 
hash_for_each_safe(sync->fences, i, tmp, e, node) {
-   r = dma_fence_wait(e->fence, intr);
-   if (r)
+   struct drm_sched_fence *s_fence = to_drm_sched_fence(e->fence);
+   long timeout = msecs_to_jiffies(1);
+
+   if (s_fence)
+   timeout = s_fence->sched->timeout;
+   r = dma_fence_wait_timeout(e->fence, intr, timeout);
+   if (r == 0)
+   r = -ETIMEDOUT;
+   if (r < 0)
return r;
 
amdgpu_sync_entry_free(e);
-- 
2.36.1
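
For readers following the hunk above: dma_fence_wait_timeout() returns a
positive remaining timeout when the fence signals, 0 when the timeout elapses,
and a negative errno on error, which is why the 0 case is turned into
-ETIMEDOUT. A minimal user-space sketch of that mapping follows.
map_wait_result() is a hypothetical helper for illustration only, not part of
amdgpu or the patch.

```c
#include <errno.h>

/*
 * Hypothetical helper (not kernel API): normalise a
 * dma_fence_wait_timeout()-style result into 0 or a negative errno.
 *   r > 0  : fence signaled, r is the remaining timeout
 *   r == 0 : timeout elapsed before the fence signaled
 *   r < 0  : error, e.g. an interrupted wait
 */
static long map_wait_result(long r)
{
	if (r == 0)
		return -ETIMEDOUT;
	if (r < 0)
		return r;
	return 0;
}
```

In the patch itself a positive result simply lets the loop continue to the
next fence; the sketch collapses that case to 0 for clarity.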



[PATCH 1/2] drm/amdgpu: handle the return for sync wait

2023-10-20 Thread Emily Deng
Add error handling for amdgpu_sync_wait.

Signed-off-by: Emily Deng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c  | 6 +-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 54f31a420229..3011c191d7dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2668,7 +2668,7 @@ static int validate_invalid_user_pages(struct 
amdkfd_process_info *process_info)
 
 unreserve_out:
ttm_eu_backoff_reservation(, _list);
-   amdgpu_sync_wait(, false);
+   ret = amdgpu_sync_wait(, false);
amdgpu_sync_free();
 out_free:
kfree(pd_bo_list_entries);
@@ -2939,8 +2939,11 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, 
struct dma_fence **ef)
}
 
/* Wait for validate and PT updates to finish */
-   amdgpu_sync_wait(_obj, false);
-
+   ret = amdgpu_sync_wait(_obj, false);
+   if (ret) {
+   pr_err("Failed to wait for validate and PT updates to 
finish\n");
+   goto validate_map_fail;
+   }
/* Release old eviction fence and create new one, because fence only
 * goes from unsignaled to signaled, fence cannot be reused.
 * Use context and mm from the old fence.
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index 70fe3b39c004..a63139277583 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -1153,7 +1153,11 @@ int amdgpu_mes_ctx_map_meta_data(struct amdgpu_device 
*adev,
}
amdgpu_sync_fence(, vm->last_update);
 
-   amdgpu_sync_wait(, false);
+   r = amdgpu_sync_wait(, false);
+   if (r) {
+   DRM_ERROR("failed to wait sync\n");
+   goto error;
+   }
ttm_eu_backoff_reservation(, );
 
amdgpu_sync_free();
-- 
2.36.1
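
The pattern both hunks apply (check the wait result, then jump to a shared
exit label) can be sketched in user space as below. fake_sync_wait() and
map_with_wait() are hypothetical stand-ins, and the "released" flag only
models the backoff/free calls; none of this is driver code.

```c
#include <errno.h>

/* Hypothetical stand-in for amdgpu_sync_wait(): fails on request. */
static int fake_sync_wait(int should_fail)
{
	return should_fail ? -ETIMEDOUT : 0;
}

/*
 * Check the wait result and unwind through one exit label so that
 * cleanup runs on both the success and the failure path.
 */
static int map_with_wait(int should_fail, int *released)
{
	int r;

	*released = 0;
	r = fake_sync_wait(should_fail);
	if (r)
		goto error;

	/* ... work that depends on the wait having completed ... */

error:
	*released = 1;	/* models the backoff/free calls */
	return r;
}
```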



RE: [PATCH] drm/amdgpu: Add a read to GFX v9.4.3 ring test

2023-10-20 Thread Kamal, Asad
[AMD Official Use Only - General]

Reviewed-by: Asad Kamal 

Thanks & Regards
Asad

-Original Message-
From: Zhang, Hawking 
Sent: Friday, October 20, 2023 12:36 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Kamal, Asad 
; Ma, Le 
Subject: RE: [PATCH] drm/amdgpu: Add a read to GFX v9.4.3 ring test

[AMD Official Use Only - General]

Acked-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: Lazar, Lijo 
Sent: Friday, October 20, 2023 15:02
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Deucher, Alexander 
; Kamal, Asad ; Ma, Le 

Subject: [PATCH] drm/amdgpu: Add a read to GFX v9.4.3 ring test

Issue a read to confirm the register write before ringing the doorbell. With
multiple XCCs there is a chance of a race condition.

Signed-off-by: Lijo Lazar 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index a1c2c952d882..5861e4d0eda9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -256,6 +256,7 @@ static int gfx_v9_4_3_ring_test_ring(struct amdgpu_ring 
*ring)
xcc_offset = SOC15_REG_OFFSET(GC, 0, regSCRATCH_REG0);
scratch_reg0_offset = SOC15_REG_OFFSET(GC, GET_INST(GC, ring->xcc_id), 
regSCRATCH_REG0);
WREG32(scratch_reg0_offset, 0xCAFEDEAD);
+   tmp = RREG32(scratch_reg0_offset);

r = amdgpu_ring_alloc(ring, 3);
if (r)
--
2.25.1




RE: [PATCH 2/2] drm/amdgpu: handle the return for sync wait

2023-10-20 Thread Deng, Emily
[AMD Official Use Only - General]

Ok, will send this as the first.

Emily Deng
Best Wishes

>-Original Message-
>From: Christian König 
>Sent: Friday, October 20, 2023 3:30 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH 2/2] drm/amdgpu: handle the return for sync wait
>
>Am 20.10.23 um 08:13 schrieb Emily Deng:
>
>You need a patch description and this patch here needs to come first and not
>second.
>
>Christian.
>
>> Signed-off-by: Emily Deng 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 ++---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c  | 6 +-
>>   2 files changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index 54f31a420229..3011c191d7dd 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -2668,7 +2668,7 @@ static int validate_invalid_user_pages(struct
>> amdkfd_process_info *process_info)
>>
>>   unreserve_out:
>>  ttm_eu_backoff_reservation(, _list);
>> -amdgpu_sync_wait(, false);
>> +ret = amdgpu_sync_wait(, false);
>>  amdgpu_sync_free();
>>   out_free:
>>  kfree(pd_bo_list_entries);
>> @@ -2939,8 +2939,11 @@ int
>amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence
>**ef)
>>  }
>>
>>  /* Wait for validate and PT updates to finish */
>> -amdgpu_sync_wait(_obj, false);
>> -
>> +ret = amdgpu_sync_wait(_obj, false);
>> +if (ret) {
>> +pr_err("Failed to wait for validate and PT updates to 
>> finish\n");
>> +goto validate_map_fail;
>> +}
>>  /* Release old eviction fence and create new one, because fence only
>>   * goes from unsignaled to signaled, fence cannot be reused.
>>   * Use context and mm from the old fence.
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
>> index 70fe3b39c004..a63139277583 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
>> @@ -1153,7 +1153,11 @@ int amdgpu_mes_ctx_map_meta_data(struct
>amdgpu_device *adev,
>>  }
>>  amdgpu_sync_fence(, vm->last_update);
>>
>> -amdgpu_sync_wait(, false);
>> +r = amdgpu_sync_wait(, false);
>> +if (r) {
>> +DRM_ERROR("failed to wait sync\n");
>> +goto error;
>> +}
>>  ttm_eu_backoff_reservation(, );
>>
>>  amdgpu_sync_free();



RE: [PATCH 1/2] drm/amdgpu: Add timeout for sync wait

2023-10-20 Thread Deng, Emily
[AMD Official Use Only - General]

Hi Christian,
 The issue occurs when a compute hang test is run with quark and triggers a
compute job timeout. For compute, the timeout setting is 60s, but for gfx and
sdma it is 10s.
So taking the timeout from the scheduler is reasonable, since it differs per
scheduler.
If the wait times out, an error is printed, so this won't hide real issues.
And even if there is a real issue, waiting forever is a bad user experience,
and the driver can no longer work.

Emily Deng
Best Wishes



>-Original Message-
>From: Christian König 
>Sent: Friday, October 20, 2023 3:29 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH 1/2] drm/amdgpu: Add timeout for sync wait
>
>Am 20.10.23 um 08:13 schrieb Emily Deng:
>> Issue: A deadlock happens during GPU recovery. The call sequence is as below:
>>
>> amdgpu_device_gpu_recover->amdgpu_amdkfd_pre_reset-
>>flush_delayed_work
>> -> amdgpu_amdkfd_gpuvm_restore_process_bos->amdgpu_sync_wait
>>
>> This is because amdgpu_sync_wait is waiting for the bad job's fence and
>> never returns, so the recovery cannot continue.
>>
>>
>> Signed-off-by: Emily Deng 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 11 +--
>>   1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> index dcd8c066bc1f..6253d6aab7f8 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> @@ -406,8 +406,15 @@ int amdgpu_sync_wait(struct amdgpu_sync *sync,
>bool intr)
>>  int i, r;
>>
>>  hash_for_each_safe(sync->fences, i, tmp, e, node) {
>> -r = dma_fence_wait(e->fence, intr);
>> -if (r)
>> +struct drm_sched_fence *s_fence = to_drm_sched_fence(e-
>>fence);
>> +long timeout = msecs_to_jiffies(1);
>
>That handling doesn't make much sense. If you need a timeout then you need
>a timeout for the whole function.
>
>Additional to that timeouts often just hide real problems which needs fixing.
>
>So this here needs a much better justification otherwise it's a pretty clear 
>NAK.
>
>Regards,
>Christian.
>
>> +
>> +if (s_fence)
>> +timeout = s_fence->sched->timeout;
>> +
>> +if (r == 0)
>> +r = -ETIMEDOUT;
>> +if (r < 0)
>>  return r;
>>
>>  amdgpu_sync_entry_free(e);



[PATCH] drm/amd/pm: fix the high voltage and temperature issue on smu 13

2023-10-20 Thread Kenneth Feng
Fix the high voltage and temperature issue after the driver is unloaded on
SMU 13.0.0, SMU 13.0.7 and SMU 13.0.10.

Signed-off-by: Kenneth Feng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 36 +++
 drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c|  4 +--
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 27 --
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  1 +
 drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h  |  2 ++
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c| 13 +++
 .../drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c  |  8 -
 .../drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c  |  8 -
 8 files changed, 86 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 31f8c3ead161..c5c892a8b3f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3986,13 +3986,23 @@ int amdgpu_device_init(struct amdgpu_device *adev,
}
}
} else {
-   tmp = amdgpu_reset_method;
-   /* It should do a default reset when loading or 
reloading the driver,
-* regardless of the module parameter reset_method.
-*/
-   amdgpu_reset_method = AMD_RESET_METHOD_NONE;
-   r = amdgpu_asic_reset(adev);
-   amdgpu_reset_method = tmp;
+   switch (amdgpu_ip_version(adev, MP1_HWIP, 0)) {
+   case IP_VERSION(13, 0, 0):
+   case IP_VERSION(13, 0, 7):
+   case IP_VERSION(13, 0, 10):
+   r = psp_gpu_reset(adev);
+   break;
+   default:
+   tmp = amdgpu_reset_method;
+   /* It should do a default reset when loading or 
reloading the driver,
+* regardless of the module parameter 
reset_method.
+*/
+   amdgpu_reset_method = AMD_RESET_METHOD_NONE;
+   r = amdgpu_asic_reset(adev);
+   amdgpu_reset_method = tmp;
+   break;
+   }
+
if (r) {
dev_err(adev->dev, "asic reset on init 
failed\n");
goto failed;
@@ -5945,6 +5955,18 @@ int amdgpu_device_baco_exit(struct drm_device *dev)
return -ENOTSUPP;
 
ret = amdgpu_dpm_baco_exit(adev);
+
+   if (!ret)
+   switch (amdgpu_ip_version(adev, MP1_HWIP, 0)) {
+   case IP_VERSION(13, 0, 0):
+   case IP_VERSION(13, 0, 7):
+   case IP_VERSION(13, 0, 10):
+   adev->gfx.is_poweron = false;
+   break;
+   default:
+   break;
+   }
+
if (ret)
return ret;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index 80ca2c05b0b8..3ad38e42773b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
@@ -73,7 +73,7 @@ gmc_v11_0_vm_fault_interrupt_state(struct amdgpu_device *adev,
 * fini/suspend, so the overall state doesn't
 * change over the course of suspend/resume.
 */
-   if (!adev->in_s0ix)
+   if (!adev->in_s0ix && adev->gfx.is_poweron)
amdgpu_gmc_set_vm_fault_masks(adev, AMDGPU_GFXHUB(0), 
false);
break;
case AMDGPU_IRQ_STATE_ENABLE:
@@ -85,7 +85,7 @@ gmc_v11_0_vm_fault_interrupt_state(struct amdgpu_device *adev,
 * fini/suspend, so the overall state doesn't
 * change over the course of suspend/resume.
 */
-   if (!adev->in_s0ix)
+   if (!adev->in_s0ix && adev->gfx.is_poweron)
amdgpu_gmc_set_vm_fault_masks(adev, AMDGPU_GFXHUB(0), 
true);
break;
default:
diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 7c3356d6da5e..30e5f7161737 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -733,7 +733,7 @@ static int smu_early_init(void *handle)
smu->adev = adev;
smu->pm_enabled = !!amdgpu_dpm;
smu->is_apu = false;
-   smu->smu_baco.state = SMU_BACO_STATE_EXIT;
+   smu->smu_baco.state = SMU_BACO_STATE_NONE;
smu->smu_baco.platform_support = false;
smu->user_dpm_profile.fan_mode = -1;
 
@@ -1740,10 +1740,25 @@ static int smu_smc_hw_cleanup(struct smu_context *smu)
return 0;
 }
 
+static int 

Re: [PATCH 2/2] drm/amdgpu: handle the return for sync wait

2023-10-20 Thread Christian König

Am 20.10.23 um 08:13 schrieb Emily Deng:

You need a patch description and this patch here needs to come first and 
not second.


Christian.


Signed-off-by: Emily Deng 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c  | 6 +-
  2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 54f31a420229..3011c191d7dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2668,7 +2668,7 @@ static int validate_invalid_user_pages(struct 
amdkfd_process_info *process_info)
  
  unreserve_out:

ttm_eu_backoff_reservation(, _list);
-   amdgpu_sync_wait(, false);
+   ret = amdgpu_sync_wait(, false);
amdgpu_sync_free();
  out_free:
kfree(pd_bo_list_entries);
@@ -2939,8 +2939,11 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, 
struct dma_fence **ef)
}
  
  	/* Wait for validate and PT updates to finish */

-   amdgpu_sync_wait(_obj, false);
-
+   ret = amdgpu_sync_wait(_obj, false);
+   if (ret) {
+   pr_err("Failed to wait for validate and PT updates to 
finish\n");
+   goto validate_map_fail;
+   }
/* Release old eviction fence and create new one, because fence only
 * goes from unsignaled to signaled, fence cannot be reused.
 * Use context and mm from the old fence.
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index 70fe3b39c004..a63139277583 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -1153,7 +1153,11 @@ int amdgpu_mes_ctx_map_meta_data(struct amdgpu_device 
*adev,
}
amdgpu_sync_fence(, vm->last_update);
  
-	amdgpu_sync_wait(, false);

+   r = amdgpu_sync_wait(, false);
+   if (r) {
+   DRM_ERROR("failed to wait sync\n");
+   goto error;
+   }
ttm_eu_backoff_reservation(, );
  
  	amdgpu_sync_free();




Re: [PATCH 1/2] drm/amdgpu: Add timeout for sync wait

2023-10-20 Thread Christian König

Am 20.10.23 um 08:13 schrieb Emily Deng:

Issue: A deadlock happens during GPU recovery. The call sequence is as below:

amdgpu_device_gpu_recover->amdgpu_amdkfd_pre_reset->flush_delayed_work->
amdgpu_amdkfd_gpuvm_restore_process_bos->amdgpu_sync_wait

This is because amdgpu_sync_wait is waiting for the bad job's fence and never
returns, so the recovery cannot continue.


Signed-off-by: Emily Deng 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 11 +--
  1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index dcd8c066bc1f..6253d6aab7f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -406,8 +406,15 @@ int amdgpu_sync_wait(struct amdgpu_sync *sync, bool intr)
int i, r;
  
  	hash_for_each_safe(sync->fences, i, tmp, e, node) {

-   r = dma_fence_wait(e->fence, intr);
-   if (r)
+   struct drm_sched_fence *s_fence = to_drm_sched_fence(e->fence);
+   long timeout = msecs_to_jiffies(1);


That handling doesn't make much sense. If you need a timeout then you 
need a timeout for the whole function.


Additional to that timeouts often just hide real problems which needs 
fixing.


So this here needs a much better justification otherwise it's a pretty 
clear NAK.


Regards,
Christian.


+
+   if (s_fence)
+   timeout = s_fence->sched->timeout;
+
+   if (r == 0)
+   r = -ETIMEDOUT;
+   if (r < 0)
return r;
  
  		amdgpu_sync_entry_free(e);
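
One possible reading of the review comment about "a timeout for the whole
function" is a single budget shared by all waits, instead of a fresh timeout
per fence. The user-space sketch below illustrates that idea only; it is not
the reviewer's proposal or driver code. struct fake_fence,
fake_wait_timeout() and wait_all_with_budget() are hypothetical names, and the
fake wait merely imitates the "remaining timeout, or 0 on timeout" return
convention.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a fence: signals after `delay` ticks. */
struct fake_fence {
	long delay;
};

/*
 * Imitates the dma_fence_wait_timeout() convention: returns the
 * remaining budget (> 0) if the fence signals in time, 0 on timeout.
 */
static long fake_wait_timeout(const struct fake_fence *f, long timeout)
{
	if (f->delay >= timeout)
		return 0;
	return timeout - f->delay;
}

/*
 * Wait for all fences under ONE shared budget: each wait consumes
 * part of it, so the total wait is bounded by `timeout` rather than
 * by n * timeout.
 */
static int wait_all_with_budget(const struct fake_fence *fences, size_t n,
				long timeout)
{
	size_t i;

	for (i = 0; i < n; i++) {
		timeout = fake_wait_timeout(&fences[i], timeout);
		if (timeout == 0)
			return -ETIMEDOUT;
	}
	return 0;
}
```

With a budget of 10 ticks, two fences needing 6 ticks each would both pass a
per-fence timeout of 10 but fail the shared budget, which is the difference
the comment points at.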




RE: [PATCH] drm/amdgpu: Add a read to GFX v9.4.3 ring test

2023-10-20 Thread Zhang, Hawking
[AMD Official Use Only - General]

Acked-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: Lazar, Lijo 
Sent: Friday, October 20, 2023 15:02
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Deucher, Alexander 
; Kamal, Asad ; Ma, Le 

Subject: [PATCH] drm/amdgpu: Add a read to GFX v9.4.3 ring test

Issue a read to confirm the register write before ringing the doorbell. With
multiple XCCs there is a chance of a race condition.

Signed-off-by: Lijo Lazar 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index a1c2c952d882..5861e4d0eda9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -256,6 +256,7 @@ static int gfx_v9_4_3_ring_test_ring(struct amdgpu_ring 
*ring)
xcc_offset = SOC15_REG_OFFSET(GC, 0, regSCRATCH_REG0);
scratch_reg0_offset = SOC15_REG_OFFSET(GC, GET_INST(GC, ring->xcc_id), 
regSCRATCH_REG0);
WREG32(scratch_reg0_offset, 0xCAFEDEAD);
+   tmp = RREG32(scratch_reg0_offset);

r = amdgpu_ring_alloc(ring, 3);
if (r)
--
2.25.1



RE: [PATCH 2/2] drm/amdgpu: refine ras error kernel log print

2023-10-20 Thread Zhang, Hawking
[AMD Official Use Only - General]

Series is

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: Wang, Yang(Kevin) 
Sent: Friday, October 20, 2023 15:00
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Zhou1, Tao ; Li, 
Candice ; Chai, Thomas ; Wang, 
Yang(Kevin) 
Subject: [PATCH 2/2] drm/amdgpu: refine ras error kernel log print

Refine the RAS error kernel log to avoid ambiguity for users.

Signed-off-by: Yang Wang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 116 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h |   5 +-
 2 files changed, 82 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d5bcfcf4ced2..0cb60f71c14d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -635,8 +635,11 @@ static ssize_t amdgpu_ras_sysfs_read(struct device *dev,

 static inline void put_obj(struct ras_manager *obj)
 {
-   if (obj && (--obj->use == 0))
+   if (obj && (--obj->use == 0)) {
list_del(>node);
+   amdgpu_ras_error_data_fini(>err_data);
+   }
+
if (obj && (obj->use < 0))
DRM_ERROR("RAS ERROR: Unbalance obj(%s) use\n", 
get_ras_block_str(>head));
 }
@@ -666,6 +669,9 @@ static struct ras_manager *amdgpu_ras_create_obj(struct 
amdgpu_device *adev,
if (alive_obj(obj))
return NULL;

+   if(amdgpu_ras_error_data_init(>err_data))
+   return NULL;
+
obj->head = *head;
obj->adev = adev;
list_add(>node, >head);
@@ -1023,44 +1029,68 @@ static void amdgpu_ras_get_ecc_info(struct 
amdgpu_device *adev, struct ras_err_d
 }

 static void amdgpu_ras_error_print_error_data(struct amdgpu_device *adev,
- struct ras_query_if *query_if,
+ struct ras_manager *ras_mgr,
  struct ras_err_data *err_data,
+ const char *blk_name,
  bool is_ue)
 {
-   struct ras_manager *ras_mgr = amdgpu_ras_find_obj(adev, 
_if->head);
-   const char *blk_name = get_ras_block_str(_if->head);
struct amdgpu_smuio_mcm_config_info *mcm_info;
struct ras_err_node *err_node;
struct ras_err_info *err_info;

-   if (is_ue)
-   dev_info(adev->dev, "%ld uncorrectable hardware errors detected 
in %s block\n",
-ras_mgr->err_data.ue_count, blk_name);
-   else
-   dev_info(adev->dev, "%ld correctable hardware errors detected 
in %s block\n",
-ras_mgr->err_data.ce_count, blk_name);
+   if (is_ue) {
+   for_each_ras_error(err_node, err_data) {
+   err_info = _node->err_info;
+   mcm_info = _info->mcm_info;
+   if (err_info->ue_count) {
+   dev_info(adev->dev, "socket: %d, die: %d, "
+"%lld new uncorrectable hardware 
errors detected in %s block\n",
+mcm_info->socket_id,
+mcm_info->die_id,
+err_info->ue_count,
+blk_name);
+   }
+   }

-   for_each_ras_error(err_node, err_data) {
-   err_info = _node->err_info;
-   mcm_info = _info->mcm_info;
-   if (is_ue && err_info->ue_count) {
-   dev_info(adev->dev, "socket: %d, die: %d "
-"%lld uncorrectable hardware errors detected 
in %s block\n",
-mcm_info->socket_id,
-mcm_info->die_id,
-err_info->ue_count,
-blk_name);
-   } else if (!is_ue && err_info->ce_count) {
-   dev_info(adev->dev, "socket: %d, die: %d "
-"%lld correctable hardware errors detected in 
%s block\n",
-mcm_info->socket_id,
-mcm_info->die_id,
-err_info->ce_count,
-blk_name);
+   for_each_ras_error(err_node, _mgr->err_data) {
+   err_info = _node->err_info;
+   mcm_info = _info->mcm_info;
+   dev_info(adev->dev, "socket: %d, die: %d, "
+"%lld uncorrectable hardware errors detected 
in total in %s block\n",
+mcm_info->socket_id, mcm_info->die_id, 
err_info->ue_count, blk_name);
+   }
+
+   } else {
+   for_each_ras_error(err_node, err_data) {
+ 

[PATCH] drm/amdgpu: Add a read to GFX v9.4.3 ring test

2023-10-20 Thread Lijo Lazar
Issue a read to confirm the register write before ringing the doorbell. With
multiple XCCs there is a chance of a race condition.

Signed-off-by: Lijo Lazar 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index a1c2c952d882..5861e4d0eda9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -256,6 +256,7 @@ static int gfx_v9_4_3_ring_test_ring(struct amdgpu_ring 
*ring)
xcc_offset = SOC15_REG_OFFSET(GC, 0, regSCRATCH_REG0);
scratch_reg0_offset = SOC15_REG_OFFSET(GC, GET_INST(GC, ring->xcc_id), 
regSCRATCH_REG0);
WREG32(scratch_reg0_offset, 0xCAFEDEAD);
+   tmp = RREG32(scratch_reg0_offset);
 
r = amdgpu_ring_alloc(ring, 3);
if (r)
-- 
2.25.1
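
The write-then-read-back idiom used in the hunk above can be sketched in user
space as follows. This is only an illustration: the volatile accesses stand in
for WREG32()/RREG32(), write_and_confirm() is a hypothetical name, and a plain
memory location cannot reproduce the hardware race the patch addresses.

```c
#include <stdint.h>

/*
 * Hypothetical helper: write a value to a register-like location and
 * read it back before continuing, so later steps only run once the
 * write is observable.
 */
static uint32_t write_and_confirm(volatile uint32_t *reg, uint32_t val)
{
	*reg = val;	/* models WREG32(offset, val) */
	return *reg;	/* models tmp = RREG32(offset) */
}
```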



[PATCH 2/2] drm/amdgpu: refine ras error kernel log print

2023-10-20 Thread Yang Wang
Refine the RAS error kernel log to avoid ambiguity for users.

Signed-off-by: Yang Wang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 116 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h |   5 +-
 2 files changed, 82 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d5bcfcf4ced2..0cb60f71c14d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -635,8 +635,11 @@ static ssize_t amdgpu_ras_sysfs_read(struct device *dev,
 
 static inline void put_obj(struct ras_manager *obj)
 {
-   if (obj && (--obj->use == 0))
+   if (obj && (--obj->use == 0)) {
list_del(>node);
+   amdgpu_ras_error_data_fini(>err_data);
+   }
+
if (obj && (obj->use < 0))
DRM_ERROR("RAS ERROR: Unbalance obj(%s) use\n", 
get_ras_block_str(>head));
 }
@@ -666,6 +669,9 @@ static struct ras_manager *amdgpu_ras_create_obj(struct 
amdgpu_device *adev,
if (alive_obj(obj))
return NULL;
 
+   if(amdgpu_ras_error_data_init(>err_data))
+   return NULL;
+
obj->head = *head;
obj->adev = adev;
list_add(>node, >head);
@@ -1023,44 +1029,68 @@ static void amdgpu_ras_get_ecc_info(struct 
amdgpu_device *adev, struct ras_err_d
 }
 
 static void amdgpu_ras_error_print_error_data(struct amdgpu_device *adev,
- struct ras_query_if *query_if,
+ struct ras_manager *ras_mgr,
  struct ras_err_data *err_data,
+ const char *blk_name,
  bool is_ue)
 {
-   struct ras_manager *ras_mgr = amdgpu_ras_find_obj(adev, 
_if->head);
-   const char *blk_name = get_ras_block_str(_if->head);
struct amdgpu_smuio_mcm_config_info *mcm_info;
struct ras_err_node *err_node;
struct ras_err_info *err_info;
 
-   if (is_ue)
-   dev_info(adev->dev, "%ld uncorrectable hardware errors detected 
in %s block\n",
-ras_mgr->err_data.ue_count, blk_name);
-   else
-   dev_info(adev->dev, "%ld correctable hardware errors detected 
in %s block\n",
-ras_mgr->err_data.ce_count, blk_name);
+   if (is_ue) {
+   for_each_ras_error(err_node, err_data) {
+   err_info = _node->err_info;
+   mcm_info = _info->mcm_info;
+   if (err_info->ue_count) {
+   dev_info(adev->dev, "socket: %d, die: %d, "
+"%lld new uncorrectable hardware 
errors detected in %s block\n",
+mcm_info->socket_id,
+mcm_info->die_id,
+err_info->ue_count,
+blk_name);
+   }
+   }
 
-   for_each_ras_error(err_node, err_data) {
-   err_info = _node->err_info;
-   mcm_info = _info->mcm_info;
-   if (is_ue && err_info->ue_count) {
-   dev_info(adev->dev, "socket: %d, die: %d "
-"%lld uncorrectable hardware errors detected 
in %s block\n",
-mcm_info->socket_id,
-mcm_info->die_id,
-err_info->ue_count,
-blk_name);
-   } else if (!is_ue && err_info->ce_count) {
-   dev_info(adev->dev, "socket: %d, die: %d "
-"%lld correctable hardware errors detected in 
%s block\n",
-mcm_info->socket_id,
-mcm_info->die_id,
-err_info->ce_count,
-blk_name);
+   for_each_ras_error(err_node, _mgr->err_data) {
+   err_info = _node->err_info;
+   mcm_info = _info->mcm_info;
+   dev_info(adev->dev, "socket: %d, die: %d, "
+"%lld uncorrectable hardware errors detected 
in total in %s block\n",
+mcm_info->socket_id, mcm_info->die_id, 
err_info->ue_count, blk_name);
+   }
+
+   } else {
+   for_each_ras_error(err_node, err_data) {
+   err_info = _node->err_info;
+   mcm_info = _info->mcm_info;
+   if (err_info->ce_count) {
+   dev_info(adev->dev, "socket: %d, die: %d, "
+"%lld new correctable hardware errors 
detected in %s block, "
+ 

[PATCH 1/2] drm/amdgpu: fix find ras error node error

2023-10-20 Thread Yang Wang
The original function might return the wrong node.

Fixes: d479ef0d5fbd ("drm/amdgpu: add ras_err_info to identify RAS error source")

Signed-off-by: Yang Wang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 70dd249f2ba7..d5bcfcf4ced2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -3518,11 +3518,10 @@ static struct ras_err_node 
*amdgpu_ras_error_find_node_by_id(struct ras_err_data
 
for_each_ras_error(err_node, err_data) {
ref_id = _node->err_info.mcm_info;
-   if ((mcm_info->socket_id >= 0 && mcm_info->socket_id != 
ref_id->socket_id) ||
-   (mcm_info->die_id >= 0 && mcm_info->die_id != 
ref_id->die_id))
-   continue;
 
-   return err_node;
+   if (mcm_info->socket_id == ref_id->socket_id &&
+   mcm_info->die_id == ref_id->die_id)
+   return err_node;
}
 
return NULL;
-- 
2.34.1
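
To illustrate the difference between the removed and the added condition, here
is a small user-space sketch. struct mcm_id is a simplified stand-in whose
field types are assumed, the helpers return an index instead of a node
pointer, and the scenario (a lookup key with negative ids) is an assumption
chosen for demonstration, not something stated in the commit message.

```c
#include <stddef.h>

/* Simplified stand-in for the MCM info; field types are assumed. */
struct mcm_id {
	int socket_id;
	int die_id;
};

/* Old matching: a negative id in the key acts as a wildcard. */
static int find_node_old(const struct mcm_id *nodes, size_t n,
			 const struct mcm_id *key)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if ((key->socket_id >= 0 &&
		     key->socket_id != nodes[i].socket_id) ||
		    (key->die_id >= 0 && key->die_id != nodes[i].die_id))
			continue;
		return (int)i;
	}
	return -1;
}

/* New matching: both ids must be exactly equal. */
static int find_node_new(const struct mcm_id *nodes, size_t n,
			 const struct mcm_id *key)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (key->socket_id == nodes[i].socket_id &&
		    key->die_id == nodes[i].die_id)
			return (int)i;
	}
	return -1;
}
```

With nodes {0, 0} and {-1, -1} and a key of {-1, -1}, the old test accepts the
first node because both comparisons are skipped, while the exact comparison
only accepts the second.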



[PATCH 2/2] drm/amdgpu: handle the return for sync wait

2023-10-20 Thread Emily Deng
Signed-off-by: Emily Deng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c  | 6 +-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 54f31a420229..3011c191d7dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2668,7 +2668,7 @@ static int validate_invalid_user_pages(struct 
amdkfd_process_info *process_info)
 
 unreserve_out:
ttm_eu_backoff_reservation(, _list);
-   amdgpu_sync_wait(, false);
+   ret = amdgpu_sync_wait(, false);
amdgpu_sync_free();
 out_free:
kfree(pd_bo_list_entries);
@@ -2939,8 +2939,11 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, 
struct dma_fence **ef)
}
 
/* Wait for validate and PT updates to finish */
-   amdgpu_sync_wait(_obj, false);
-
+   ret = amdgpu_sync_wait(_obj, false);
+   if (ret) {
+   pr_err("Failed to wait for validate and PT updates to 
finish\n");
+   goto validate_map_fail;
+   }
/* Release old eviction fence and create new one, because fence only
 * goes from unsignaled to signaled, fence cannot be reused.
 * Use context and mm from the old fence.
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index 70fe3b39c004..a63139277583 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -1153,7 +1153,11 @@ int amdgpu_mes_ctx_map_meta_data(struct amdgpu_device 
*adev,
}
amdgpu_sync_fence(, vm->last_update);
 
-   amdgpu_sync_wait(, false);
+   r = amdgpu_sync_wait(, false);
+   if (r) {
+   DRM_ERROR("failed to wait sync\n");
+   goto error;
+   }
ttm_eu_backoff_reservation(, );
 
amdgpu_sync_free();
-- 
2.36.1



[PATCH 1/2] drm/amdgpu: Add timeout for sync wait

2023-10-20 Thread Emily Deng
Issue: A deadlock happens during GPU recovery. The call sequence is as below:

amdgpu_device_gpu_recover->amdgpu_amdkfd_pre_reset->flush_delayed_work->
amdgpu_amdkfd_gpuvm_restore_process_bos->amdgpu_sync_wait

This is because amdgpu_sync_wait is waiting for the bad job's fence and never
returns, so the recovery cannot continue.


Signed-off-by: Emily Deng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index dcd8c066bc1f..6253d6aab7f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -406,8 +406,15 @@ int amdgpu_sync_wait(struct amdgpu_sync *sync, bool intr)
int i, r;
 
hash_for_each_safe(sync->fences, i, tmp, e, node) {
-   r = dma_fence_wait(e->fence, intr);
-   if (r)
+   struct drm_sched_fence *s_fence = to_drm_sched_fence(e->fence);
+   long timeout = msecs_to_jiffies(1);
+
+   if (s_fence)
+   timeout = s_fence->sched->timeout;
+
+   if (r == 0)
+   r = -ETIMEDOUT;
+   if (r < 0)
return r;
 
amdgpu_sync_entry_free(e);
-- 
2.36.1