Re: [PATCH 7/8] Documentation/gpu: Add an explanation about the DC weekly patches

2023-10-23 Thread Jani Nikula
On Fri, 20 Oct 2023, Rodrigo Siqueira  wrote:
> Sharing code with other OSes is confusing and raises some questions.
> This patch introduces some explanation about our upstream process with
> the shared code.

Thanks for writing this! It does help with the transparency.

Please find a comment inline.

>
> Cc: Mario Limonciello 
> Cc: Alex Deucher 
> Cc: Harry Wentland 
> Cc: Hamza Mahfooz 
> Signed-off-by: Rodrigo Siqueira 
> ---
>  Documentation/gpu/amdgpu/display/index.rst | 111 -
>  1 file changed, 109 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/gpu/amdgpu/display/index.rst 
> b/Documentation/gpu/amdgpu/display/index.rst
> index b09d1434754d..9d53a42c5339 100644
> --- a/Documentation/gpu/amdgpu/display/index.rst
> +++ b/Documentation/gpu/amdgpu/display/index.rst
> @@ -10,7 +10,114 @@ reason, our Display Core Driver is divided into two 
> pieces:
>  1. **Display Core (DC)** contains the OS-agnostic components. Things like
> hardware programming and resource management are handled here.
>  2. **Display Manager (DM)** contains the OS-dependent components. Hooks to 
> the
> -   amdgpu base driver and DRM are implemented here.
> +   amdgpu base driver and DRM are implemented here. For example, you can 
> check
> +   display/amdgpu_dm/ folder.
> +
> +
> +How AMD shares code?
> +
> +
> +Maintaining the same code-base across multiple OSes requires a lot of
> +synchronization effort between repositories. In the DC case, we maintain a
> +central repository where everyone who works from other OSes can put their
> +change in this centralized repository. In a simple way, this shared 
> repository
> +is identical to all code that you can see in the display folder. The shared
> +repo has integration tests with our Linux CI farm, and we run an exhaustive 
> set
> +of IGT tests in various AMD GPUs/APUs. Our CI also checks ARM64/32, PPC64/32,
> +and x86_64/32 compilation with DCN enabled and disabled. After all tests pass
> +and the developer gets reviewed by someone else, the change gets merged into
> +the shared repository.
> +
> +To maintain this shared code working properly, we run two activities every
> +week:
> +
> +1. **Weekly backport**: We bring changes from Linux to the other shared
> +   repositories. This work gets massive support from our CI tools, which can
> +   detect new changes and send them to internal maintainers.
> +2. **Weekly promotion**: Every week, we get changes from other teams in the
> +   shared repo that have yet to be made public. For this reason, at the
> +   beginning of each week, a developer will review that internal repo and
> +   prepare a series of patches that can be sent to the public upstream
> +   (promotion).
> +
> +For the context of this documentation, promotion is the essential part that
> +deserves a good elaboration here.
> +
> +Weekly promotion
> +
> +
> +As described in the previous sections, the display folder has its equivalent 
> as
> +an internal repository shared with multiple teams. The promotion activity is
> +the task of 'promoting' those internal changes to the upstream; this is
> +possible thanks to numerous tools that help us manage the code-sharing
> +challenges. The weekly promotion usually takes one week, sliced like this:
> +
> +1. Extract all merged patches from the previous week that can be sent to the
> +   upstream. In other words, we check the week's time frame.
> +2. Evaluate if any potential new patches make sense to the upstream.
> +3. Create a branch candidate with the latest amd-staging-drm-next code 
> together
> +   with the new patches. At this step, we must ensure that every patch 
> compiles
> +   and the entire series pass our set of IGT test in different hardware 
> (i.e.,
> +   it has to pass to our CI).
> +4. Send the new candidate branch for an internal quality test and extra CI
> +   validation.
> +5. Send patches to amd-gfx for reviews. We wait a few days for community
> +   feedback after sending a series to the public mailing list.

So we've debated this one before. :)

Again, I applaud the transparency in writing the document, but I can't
help feeling the weekly promotions are code drops that will generally be
merged unchanged, with no comments. They have all been reviewed
internally, get posted with Reviewed-by tags pre-filled, we have no
visibility to the review. Since the code has already been merged
internally and the batch has passed CI, feels like the bar for changing
anything at this point is pretty high.

Just my two cents.


BR,
Jani.


(Side note, there should be a \n before 6.)

6. If there is
> +   an error, we debug as fast as possible; usually, a simple bisect in the
> +   weekly promotion patches points to a bad change, and we can take two
> +   possible actions: fix the issue or drop the patch. If we cannot identify 
> the
> +   problem in the week interval, we drop the promotion and start over the
> +   following week; i

[PATCH 7/8] Documentation/gpu: Add an explanation about the DC weekly patches

2023-10-20 Thread Rodrigo Siqueira
Sharing code with other OSes is confusing and raises some questions.
This patch introduces some explanation about our upstream process with
the shared code.

Cc: Mario Limonciello 
Cc: Alex Deucher 
Cc: Harry Wentland 
Cc: Hamza Mahfooz 
Signed-off-by: Rodrigo Siqueira 
---
 Documentation/gpu/amdgpu/display/index.rst | 111 -
 1 file changed, 109 insertions(+), 2 deletions(-)

diff --git a/Documentation/gpu/amdgpu/display/index.rst 
b/Documentation/gpu/amdgpu/display/index.rst
index b09d1434754d..9d53a42c5339 100644
--- a/Documentation/gpu/amdgpu/display/index.rst
+++ b/Documentation/gpu/amdgpu/display/index.rst
@@ -10,7 +10,114 @@ reason, our Display Core Driver is divided into two pieces:
 1. **Display Core (DC)** contains the OS-agnostic components. Things like
hardware programming and resource management are handled here.
 2. **Display Manager (DM)** contains the OS-dependent components. Hooks to the
-   amdgpu base driver and DRM are implemented here.
+   amdgpu base driver and DRM are implemented here. For example, you can check
+   display/amdgpu_dm/ folder.
+
+
+How AMD shares code?
+
+
+Maintaining the same code-base across multiple OSes requires a lot of
+synchronization effort between repositories. In the DC case, we maintain a
+central repository where everyone who works from other OSes can put their
+change in this centralized repository. In a simple way, this shared repository
+is identical to all code that you can see in the display folder. The shared
+repo has integration tests with our Linux CI farm, and we run an exhaustive set
+of IGT tests in various AMD GPUs/APUs. Our CI also checks ARM64/32, PPC64/32,
+and x86_64/32 compilation with DCN enabled and disabled. After all tests pass
+and the developer gets reviewed by someone else, the change gets merged into
+the shared repository.
+
+To maintain this shared code working properly, we run two activities every
+week:
+
+1. **Weekly backport**: We bring changes from Linux to the other shared
+   repositories. This work gets massive support from our CI tools, which can
+   detect new changes and send them to internal maintainers.
+2. **Weekly promotion**: Every week, we get changes from other teams in the
+   shared repo that have yet to be made public. For this reason, at the
+   beginning of each week, a developer will review that internal repo and
+   prepare a series of patches that can be sent to the public upstream
+   (promotion).
+
+For the context of this documentation, promotion is the essential part that
+deserves a good elaboration here.
+
+Weekly promotion
+
+
+As described in the previous sections, the display folder has its equivalent as
+an internal repository shared with multiple teams. The promotion activity is
+the task of 'promoting' those internal changes to the upstream; this is
+possible thanks to numerous tools that help us manage the code-sharing
+challenges. The weekly promotion usually takes one week, sliced like this:
+
+1. Extract all merged patches from the previous week that can be sent to the
+   upstream. In other words, we check the week's time frame.
+2. Evaluate if any potential new patches make sense to the upstream.
+3. Create a branch candidate with the latest amd-staging-drm-next code together
+   with the new patches. At this step, we must ensure that every patch compiles
+   and the entire series pass our set of IGT test in different hardware (i.e.,
+   it has to pass to our CI).
+4. Send the new candidate branch for an internal quality test and extra CI
+   validation.
+5. Send patches to amd-gfx for reviews. We wait a few days for community
+   feedback after sending a series to the public mailing list.  6. If there is
+   an error, we debug as fast as possible; usually, a simple bisect in the
+   weekly promotion patches points to a bad change, and we can take two
+   possible actions: fix the issue or drop the patch. If we cannot identify the
+   problem in the week interval, we drop the promotion and start over the
+   following week; in this case, the following promotion will have the previous
+   patches plus the new ones.
+
+We usually rotate the above process with many display developers to keep the
+workload manageable for everybody. It is good to highlight that the test phase
+is something that we take extremely seriously, and we never merge anything that
+fails our validation. Just to give an overview:
+
+1. Manual test
+ - Multiple Hotplugs with DP and HDMI.
+ - Stress test with multiple display configuration changes via the user
+   interface.
+ - Validate VRR behaviour.
+ - Check PSR.
+ - Validate MPO when playing video.
+ - Test more than two displays connected at the same time.
+ - Check suspend/resume.
+2. Automated test
+ - IGT tests in a farm with GPUs and APUs that support DCN and DCE.
+ - Compilation validation with the latest GCC and Clang from LTS distro.
+ - Cross-compilation for PowerPC 64/32, ARM