[PATCH] Add vkpreemption for gfx9 mcbp

2022-11-29 Thread jiadong.zhu
From: jiadozhu 

This is a standalone test case used for software MCBP (mid-command buffer preemption) on gfx9.
Build and open two consoles to run:
build/bin/vkpreemption s gfx=draws:100,priority:high,delay:0
build/bin/vkpreemption c gfx=draws:100,priority:low,delay:0

The result is printed on the console of the server side.
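For readers unfamiliar with how a `priority:high`/`priority:low` option typically reaches the GPU scheduler on Linux: Vulkan apps usually express queue priority through the VK_EXT_global_priority extension at device-queue creation time. The sketch below is illustrative only; the enum values mirror `vulkan_core.h`, but `parse_priority` is a hypothetical helper, not part of this patch, and the actual test may use a different mechanism.

```c
#include <assert.h>
#include <string.h>

/* Minimal mirror of the VK_EXT_global_priority enum from vulkan_core.h,
 * reproduced here only so this sketch compiles without the Vulkan SDK. */
typedef enum VkQueueGlobalPriorityEXT {
	VK_QUEUE_GLOBAL_PRIORITY_LOW_EXT      = 128,
	VK_QUEUE_GLOBAL_PRIORITY_MEDIUM_EXT   = 256,
	VK_QUEUE_GLOBAL_PRIORITY_HIGH_EXT     = 512,
	VK_QUEUE_GLOBAL_PRIORITY_REALTIME_EXT = 1024
} VkQueueGlobalPriorityEXT;

/* Hypothetical helper: map the tool's "priority:high" / "priority:low"
 * option string to a Vulkan global queue priority. */
static VkQueueGlobalPriorityEXT parse_priority(const char *opt)
{
	return (opt && strcmp(opt, "high") == 0)
		? VK_QUEUE_GLOBAL_PRIORITY_HIGH_EXT
		: VK_QUEUE_GLOBAL_PRIORITY_LOW_EXT;
}
```

In a real app the chosen value would be chained into `VkDeviceQueueCreateInfo::pNext` via `VkDeviceQueueGlobalPriorityCreateInfoEXT` so the kernel can preempt the low-priority queue in favor of the high-priority one.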

Signed-off-by: jiadozhu 
---
 vkpreemption/CMakeLists.txt   |  17 +
 vkpreemption/VulkanInitializers.hpp   | 591 
 vkpreemption/VulkanTools.cpp  | 361 
 vkpreemption/VulkanTools.h| 118 
 vkpreemption/base.hpp | 269 +
 vkpreemption/build_lnx.sh |  11 +
 vkpreemption/computework.hpp  | 429 ++
 vkpreemption/graphicwork.hpp  | 777 ++
 vkpreemption/headless.comp|  34 ++
 vkpreemption/headless.comp.inc|  33 ++
 vkpreemption/main.cpp | 385 +
 vkpreemption/triangle.frag|  10 +
 vkpreemption/triangle.frag.glsl   |  10 +
 vkpreemption/triangle.frag.inc|  17 +
 vkpreemption/triangle.vert|  20 +
 vkpreemption/triangle.vert.glsl   |  20 +
 vkpreemption/triangle.vert.inc|  34 ++
 vkpreemption/vk_amd_dispatch_tunnel.h |  34 ++
 vkpreemption/vk_internal_ext_helper.h |  33 ++
 19 files changed, 3203 insertions(+)
 create mode 100644 vkpreemption/CMakeLists.txt
 create mode 100644 vkpreemption/VulkanInitializers.hpp
 create mode 100644 vkpreemption/VulkanTools.cpp
 create mode 100644 vkpreemption/VulkanTools.h
 create mode 100644 vkpreemption/base.hpp
 create mode 100644 vkpreemption/build_lnx.sh
 create mode 100644 vkpreemption/computework.hpp
 create mode 100644 vkpreemption/graphicwork.hpp
 create mode 100644 vkpreemption/headless.comp
 create mode 100644 vkpreemption/headless.comp.inc
 create mode 100644 vkpreemption/main.cpp
 create mode 100644 vkpreemption/triangle.frag
 create mode 100644 vkpreemption/triangle.frag.glsl
 create mode 100644 vkpreemption/triangle.frag.inc
 create mode 100644 vkpreemption/triangle.vert
 create mode 100644 vkpreemption/triangle.vert.glsl
 create mode 100644 vkpreemption/triangle.vert.inc
 create mode 100644 vkpreemption/vk_amd_dispatch_tunnel.h
 create mode 100644 vkpreemption/vk_internal_ext_helper.h

diff --git a/vkpreemption/CMakeLists.txt b/vkpreemption/CMakeLists.txt
new file mode 100644
index ..0c54ddab
--- /dev/null
+++ b/vkpreemption/CMakeLists.txt
@@ -0,0 +1,17 @@
+cmake_minimum_required(VERSION 2.8 FATAL_ERROR)
+cmake_policy(VERSION 2.8)
+project(vkpreemption)
+
+message("CMAKE_SYSTEM_NAME: ${CMAKE_SYSTEM_NAME}")
+
+include_directories(glm)
+
+set(CMAKE_RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/bin/")
+
+file(GLOB EXAMPLE_SRC "*.cpp" "*.hpp")
+add_executable(vkpreemption ${EXAMPLE_SRC})
+
+target_link_libraries(
+vkpreemption
+libvulkan.so
+)
diff --git a/vkpreemption/VulkanInitializers.hpp 
b/vkpreemption/VulkanInitializers.hpp
new file mode 100644
index ..806ab513
--- /dev/null
+++ b/vkpreemption/VulkanInitializers.hpp
@@ -0,0 +1,591 @@
+/*
+* Initializers for Vulkan structures and objects used by the examples
+* Saves lot of VK_STRUCTURE_TYPE assignments
+* Some initializers are parameterized for convenience
+*
+* Copyright (C) 2016 by Sascha Willems - www.saschawillems.de
+*
+* This code is licensed under the MIT license (MIT) 
(http://opensource.org/licenses/MIT)
+*/
+
+#pragma once
+
+#include <vector>
+#include "vulkan/vulkan.h"
+
+namespace vks
+{
+   namespace initializers
+   {
+
+   inline VkMemoryAllocateInfo memoryAllocateInfo()
+   {
+   VkMemoryAllocateInfo memAllocInfo {};
+   memAllocInfo.sType = 
VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
+   return memAllocInfo;
+   }
+
+   inline VkMappedMemoryRange mappedMemoryRange()
+   {
+   VkMappedMemoryRange mappedMemoryRange {};
+   mappedMemoryRange.sType = 
VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE;
+   return mappedMemoryRange;
+   }
+
+   inline VkCommandBufferAllocateInfo commandBufferAllocateInfo(
+   VkCommandPool commandPool,
+   VkCommandBufferLevel level,
+   uint32_t bufferCount)
+   {
+   VkCommandBufferAllocateInfo commandBufferAllocateInfo 
{};
+   commandBufferAllocateInfo.sType = 
VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
+   commandBufferAllocateInfo.commandPool = commandPool;
+   commandBufferAllocateInfo.level = level;
+   commandBufferAllocateInfo.commandBufferCount = 
bufferCount;
+   return commandBufferAllocateInfo;
+   }
+
+   inline VkCommandPoolCreateInfo commandPoolCreateInfo()
+   {
+  

[PATCH 14/14] drm/amd/display: 3.2.215

2022-11-29 Thread Stylon Wang
From: Aric Cyr 

Acked-by: Stylon Wang 
Signed-off-by: Aric Cyr 
---
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc.h 
b/drivers/gpu/drm/amd/display/dc/dc.h
index 3cb8cf065204..85ebeaa2de18 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -47,7 +47,7 @@ struct aux_payload;
 struct set_config_cmd_payload;
 struct dmub_notification;
 
-#define DC_VER "3.2.214"
+#define DC_VER "3.2.215"
 
 #define MAX_SURFACES 3
 #define MAX_PLANES 6
-- 
2.25.1



[PATCH 13/14] drm/amd/display: set optimized required for comp buf changes

2022-11-29 Thread Stylon Wang
From: Dillon Varone 

[Description]
When the compressed buffer allocation changes, the optimized-required flag
should be set to trigger an update in optimize_bandwidth.

Reviewed-by: Aric Cyr 
Acked-by: Stylon Wang 
Signed-off-by: Dillon Varone 
---
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
index bc4a303cd864..6291a241158a 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
@@ -2011,10 +2011,13 @@ void dcn20_prepare_bandwidth(
 
/* decrease compbuf size */
if (hubbub->funcs->program_compbuf_size) {
-   if (context->bw_ctx.dml.ip.min_comp_buffer_size_kbytes)
+   if (context->bw_ctx.dml.ip.min_comp_buffer_size_kbytes) {
compbuf_size_kb = 
context->bw_ctx.dml.ip.min_comp_buffer_size_kbytes;
-   else
+   dc->wm_optimized_required |= (compbuf_size_kb != 
dc->current_state->bw_ctx.dml.ip.min_comp_buffer_size_kbytes);
+   } else {
compbuf_size_kb = 
context->bw_ctx.bw.dcn.compbuf_size_kb;
+   dc->wm_optimized_required |= (compbuf_size_kb != 
dc->current_state->bw_ctx.bw.dcn.compbuf_size_kb);
+   }
 
hubbub->funcs->program_compbuf_size(hubbub, compbuf_size_kb, 
false);
}
-- 
2.25.1
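The pattern the patch above adds to dcn20_prepare_bandwidth() can be reduced to a small sketch: pick the compressed-buffer size for the new state, and raise the optimized-required flag whenever it differs from the current state's value so the later optimize pass knows it still has work to do. Names here are simplified stand-ins, not the driver structures themselves.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the relevant bw_ctx fields. */
struct comp_buf_state {
	int min_comp_buffer_size_kbytes; /* DML-imposed minimum, 0 if unused */
	int compbuf_size_kb;             /* computed allocation */
};

/* Choose the compbuf size for the new state and flag when it changed
 * relative to the current state (mirrors the patch's two branches). */
static int pick_compbuf_size(const struct comp_buf_state *next,
			     const struct comp_buf_state *cur,
			     bool *optimized_required)
{
	int size_kb;

	if (next->min_comp_buffer_size_kbytes) {
		size_kb = next->min_comp_buffer_size_kbytes;
		*optimized_required |=
			(size_kb != cur->min_comp_buffer_size_kbytes);
	} else {
		size_kb = next->compbuf_size_kb;
		*optimized_required |= (size_kb != cur->compbuf_size_kb);
	}
	return size_kb;
}
```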



[PATCH 12/14] drm/amd/display: Add debug option to skip PSR CRTC disable

2022-11-29 Thread Stylon Wang
From: Nicholas Kazlauskas 

[Why]
It's currently tied to Z10 support, and is required for Z10, but
we can still support Z10 display off without PSR.

We currently need to skip the PSR CRTC disable to prevent stuttering
and underflow from occurring during PSR-SU.

[How]
Add a debug option to allow specifying this separately.

Reviewed-by: Robin Chen 
Acked-by: Stylon Wang 
Signed-off-by: Nicholas Kazlauskas 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c   | 2 +-
 drivers/gpu/drm/amd/display/dc/dc.h | 1 +
 drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index 5304e9daf90a..342e906ae26e 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -3378,7 +3378,7 @@ bool dc_link_setup_psr(struct dc_link *link,
case FAMILY_YELLOW_CARP:
case AMDGPU_FAMILY_GC_10_3_6:
case AMDGPU_FAMILY_GC_11_0_1:
-   if (dc->debug.disable_z10)
+   if (dc->debug.disable_z10 || 
dc->debug.psr_skip_crtc_disable)
psr_context->psr_level.bits.SKIP_CRTC_DISABLE = 
true;
break;
default:
diff --git a/drivers/gpu/drm/amd/display/dc/dc.h 
b/drivers/gpu/drm/amd/display/dc/dc.h
index 4a7c0356d9c7..3cb8cf065204 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -844,6 +844,7 @@ struct dc_debug_options {
int crb_alloc_policy_min_disp_count;
bool disable_z10;
bool enable_z9_disable_interface;
+   bool psr_skip_crtc_disable;
union dpia_debug_options dpia_debug;
bool disable_fixed_vs_aux_timeout_wa;
bool force_disable_subvp;
diff --git a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c
index 4fffc7bb8088..f9ea1e86707f 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c
@@ -886,6 +886,7 @@ static const struct dc_plane_cap plane_cap = {
 static const struct dc_debug_options debug_defaults_drv = {
.disable_z10 = false,
.enable_z9_disable_interface = true,
+   .psr_skip_crtc_disable = true,
.disable_dmcu = true,
.force_abm_enable = false,
.timing_trace = false,
-- 
2.25.1



[PATCH 11/14] drm/amd/display: correct DML calc error of UrgentLatency

2022-11-29 Thread Stylon Wang
From: Zhongwei 

[Why]
The UrgentLatency input to CalculateUrgentBurstFactor
in the prefetch check is wrong.

[How]
Use the correct per-state value to keep it the same as the HW formula.

Reviewed-by: Charlene Liu 
Acked-by: Stylon Wang 
Signed-off-by: Zhongwei 
---
 drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c  | 2 +-
 drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c  | 2 +-
 .../gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
index 479e2c1a1301..379729b02847 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
@@ -4851,7 +4851,7 @@ void dml30_ModeSupportAndSystemConfigurationFull(struct 
display_mode_lib *mode_l

v->SwathHeightYThisState[k],

v->SwathHeightCThisState[k],
v->HTotal[k] / 
v->PixelClock[k],
-   v->UrgentLatency,
+   v->UrgLatency[i],
v->CursorBufferSize,
v->CursorWidth[k][0],
v->CursorBPP[k][0],
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c
index 4e45c6d9ecdc..ec351c8418cb 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c
@@ -5082,7 +5082,7 @@ void dml31_ModeSupportAndSystemConfigurationFull(struct 
display_mode_lib *mode_l

v->SwathHeightYThisState[k],

v->SwathHeightCThisState[k],
v->HTotal[k] / 
v->PixelClock[k],
-   v->UrgentLatency,
+   v->UrgLatency[i],
v->CursorBufferSize,
v->CursorWidth[k][0],
v->CursorBPP[k][0],
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
index 41f0b4c1c72f..950669f2c10d 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
@@ -5179,7 +5179,7 @@ void dml314_ModeSupportAndSystemConfigurationFull(struct 
display_mode_lib *mode_

v->SwathHeightYThisState[k],

v->SwathHeightCThisState[k],
v->HTotal[k] / 
v->PixelClock[k],
-   v->UrgentLatency,
+   v->UrgLatency[i],
v->CursorBufferSize,
v->CursorWidth[k][0],
v->CursorBPP[k][0],
-- 
2.25.1
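All three DML files above make the same one-token fix: inside the per-voltage-state mode-support loop, the urgent burst factor must be computed with that state's own latency (`UrgLatency[i]`), not the single flattened `UrgentLatency`. A reduced sketch of the difference, with invented numbers rather than real DML data:

```c
#include <assert.h>

#define NUM_STATES 3

/* Per-voltage-state urgent latency, as DML's UrgLatency[] holds it.
 * Values are illustrative only. */
static const double urg_latency_us[NUM_STATES] = { 12.0, 6.0, 4.0 };

/* Buggy pattern: every state is evaluated with one flattened latency,
 * so low-voltage states are checked against the wrong number. */
static double burst_factor_flat(double flat_latency_us)
{
	return flat_latency_us * 2.0; /* stand-in for the real formula */
}

/* Fixed pattern: each state indexes its own latency. */
static double burst_factor_per_state(int state)
{
	return urg_latency_us[state] * 2.0;
}
```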



[PATCH 10/14] drm/amd/display: correct static_screen_event_mask

2022-11-29 Thread Stylon Wang
From: Charlene Liu 

[why]
The HW register bit definitions changed.

Reviewed-by: Zhan Liu 
Reviewed-by: Dmytro Laktyushkin 
Acked-by: Stylon Wang 
Signed-off-by: Charlene Liu 
---
 .../drm/amd/display/dc/dcn31/dcn31_hwseq.c| 40 +++
 .../drm/amd/display/dc/dcn31/dcn31_hwseq.h|  4 ++
 .../gpu/drm/amd/display/dc/dcn31/dcn31_init.c |  4 +-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.c | 29 +-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.h |  5 ++-
 .../drm/amd/display/dc/dcn314/dcn314_init.c   |  4 +-
 .../drm/amd/display/dc/dcn314/dcn314_optc.c   |  2 +-
 7 files changed, 81 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c
index 165c920ca776..4226a051df41 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c
@@ -623,3 +623,43 @@ void dcn31_setup_hpo_hw_control(const struct dce_hwseq 
*hws, bool enable)
if (hws->ctx->dc->debug.hpo_optimization)
REG_UPDATE(HPO_TOP_HW_CONTROL, HPO_IO_EN, !!enable);
 }
+void dcn31_set_drr(struct pipe_ctx **pipe_ctx,
+   int num_pipes, struct dc_crtc_timing_adjust adjust)
+{
+   int i = 0;
+   struct drr_params params = {0};
+   unsigned int event_triggers = 0x2;/*Bit[1]: OTG_TRIG_A*/
+   unsigned int num_frames = 2;
+   params.vertical_total_max = adjust.v_total_max;
+   params.vertical_total_min = adjust.v_total_min;
+   params.vertical_total_mid = adjust.v_total_mid;
+   params.vertical_total_mid_frame_num = adjust.v_total_mid_frame_num;
+   for (i = 0; i < num_pipes; i++) {
+   if ((pipe_ctx[i]->stream_res.tg != NULL) && 
pipe_ctx[i]->stream_res.tg->funcs) {
+   if (pipe_ctx[i]->stream_res.tg->funcs->set_drr)
+   pipe_ctx[i]->stream_res.tg->funcs->set_drr(
+   pipe_ctx[i]->stream_res.tg, ¶ms);
+   if (adjust.v_total_max != 0 && adjust.v_total_min != 0)
+   if 
(pipe_ctx[i]->stream_res.tg->funcs->set_static_screen_control)
+   
pipe_ctx[i]->stream_res.tg->funcs->set_static_screen_control(
+   pipe_ctx[i]->stream_res.tg,
+   event_triggers, num_frames);
+   }
+   }
+}
+void dcn31_set_static_screen_control(struct pipe_ctx **pipe_ctx,
+   int num_pipes, const struct dc_static_screen_params *params)
+{
+   unsigned int i;
+   unsigned int triggers = 0;
+   if (params->triggers.surface_update)
+   triggers |= 0x600;/*bit 9 and bit10 : 110  */
+   if (params->triggers.cursor_update)
+   triggers |= 0x10;/*bit4*/
+   if (params->triggers.force_trigger)
+   triggers |= 0x1;
+   for (i = 0; i < num_pipes; i++)
+   pipe_ctx[i]->stream_res.tg->funcs->
+   set_static_screen_control(pipe_ctx[i]->stream_res.tg,
+   triggers, params->num_frames);
+}
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.h 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.h
index edfc01d6ad73..e7e03a8722e0 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.h
@@ -56,4 +56,8 @@ bool dcn31_is_abm_supported(struct dc *dc,
 void dcn31_init_pipes(struct dc *dc, struct dc_state *context);
 void dcn31_setup_hpo_hw_control(const struct dce_hwseq *hws, bool enable);
 
+void dcn31_set_static_screen_control(struct pipe_ctx **pipe_ctx,
+   int num_pipes, const struct dc_static_screen_params *params);
+void dcn31_set_drr(struct pipe_ctx **pipe_ctx,
+   int num_pipes, struct dc_crtc_timing_adjust adjust);
 #endif /* __DC_HWSS_DCN31_H__ */
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c
index 3a32810bbe38..7c2da70ffe21 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c
@@ -64,9 +64,9 @@ static const struct hw_sequencer_funcs dcn31_funcs = {
.prepare_bandwidth = dcn20_prepare_bandwidth,
.optimize_bandwidth = dcn20_optimize_bandwidth,
.update_bandwidth = dcn20_update_bandwidth,
-   .set_drr = dcn10_set_drr,
+   .set_drr = dcn31_set_drr,
.get_position = dcn10_get_position,
-   .set_static_screen_control = dcn10_set_static_screen_control,
+   .set_static_screen_control = dcn31_set_static_screen_control,
.setup_stereo = dcn10_setup_stereo,
.set_avmute = dcn30_set_avmute,
.log_hw_state = dcn10_log_hw_state,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
index 63a677c8ee27..f
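The new dcn31_set_static_screen_control() above encodes the corrected DCN3.1 bit positions: surface updates map to bits 9-10 (0x600), cursor updates to bit 4 (0x10), and the force trigger to bit 0. A self-contained sketch of that encoding, with simplified names rather than the driver's structures:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for dc_static_screen_params.triggers. */
struct static_screen_triggers {
	bool surface_update;
	bool cursor_update;
	bool force_trigger;
};

/* Compose the DCN3.1 static-screen event mask from the trigger flags,
 * using the bit positions from the patch above. */
static unsigned int encode_triggers(const struct static_screen_triggers *t)
{
	unsigned int triggers = 0;

	if (t->surface_update)
		triggers |= 0x600; /* bits 9 and 10 */
	if (t->cursor_update)
		triggers |= 0x10;  /* bit 4 */
	if (t->force_trigger)
		triggers |= 0x1;   /* bit 0 */
	return triggers;
}
```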

[PATCH 09/14] drm/amd/display: Ensure commit_streams returns the DC return code

2022-11-29 Thread Stylon Wang
From: Alvin Lee 

[Description]
- Ensure dc_commit_streams returns the correct return code so any
  failures can be handled properly in DM layer
- If set timings fail and we have to remove MPO planes, do so
  unconditionally but make sure to mark for removal so we report
  the VSYNC and prevent timeout
- Failure to remove MPO plane results in set timings failure due
  to lack of resources

Reviewed-by: Aric Cyr 
Acked-by: Stylon Wang 
Signed-off-by: Alvin Lee 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 486d18290b9f..0cb8d1f934d1 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1988,7 +1988,7 @@ enum dc_status dc_commit_streams(struct dc *dc,
 
DC_LOG_DC("%s Finished.\n", __func__);
 
-   return (res == DC_OK);
+   return res;
 }
 
 /* TODO: When the transition to the new commit sequence is done, remove this
-- 
2.25.1
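The one-line fix above matters because `return (res == DC_OK);` collapses every distinct failure code to 0 before the DM layer can inspect it. A minimal sketch of the bug (the enum here is an illustrative subset, not the full `enum dc_status` from dc_types.h):

```c
#include <assert.h>

/* Illustrative subset of dc_status; real values live in dc_types.h. */
enum dc_status {
	DC_OK = 1,
	DC_FAIL_BANDWIDTH_VALIDATE = 2,
	DC_FAIL_TIMING_VALIDATE = 3
};

/* Buggy pattern: the boolean comparison erases which error occurred. */
static enum dc_status commit_buggy(enum dc_status res)
{
	return (res == DC_OK);
}

/* Fixed pattern: propagate the DC return code unchanged. */
static enum dc_status commit_fixed(enum dc_status res)
{
	return res;
}
```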



[PATCH 08/14] drm/amd/display: read invalid ddc pin status cause engine busy

2022-11-29 Thread Stylon Wang
From: Paul Hsieh 

[Why]
There is no DDC_6 pin on the new ASIC, which makes the mapping table
incorrect. When an app tries to access the DDC_VGA port, the driver reads
an invalid DDC pin status and reports engine busy.

[How]
Add a dummy DDC_6 pin to align the GPIO structure.

Reviewed-by: Alvin Lee 
Acked-by: Stylon Wang 
Signed-off-by: Paul Hsieh 
---
 drivers/gpu/drm/amd/display/dc/gpio/dcn32/hw_factory_dcn32.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/gpio/dcn32/hw_factory_dcn32.c 
b/drivers/gpu/drm/amd/display/dc/gpio/dcn32/hw_factory_dcn32.c
index 0ea52ba5ac82..9fd8b269dd79 100644
--- a/drivers/gpu/drm/amd/display/dc/gpio/dcn32/hw_factory_dcn32.c
+++ b/drivers/gpu/drm/amd/display/dc/gpio/dcn32/hw_factory_dcn32.c
@@ -256,8 +256,8 @@ static const struct hw_factory_funcs funcs = {
  */
 void dal_hw_factory_dcn32_init(struct hw_factory *factory)
 {
-   factory->number_of_pins[GPIO_ID_DDC_DATA] = 6;
-   factory->number_of_pins[GPIO_ID_DDC_CLOCK] = 6;
+   factory->number_of_pins[GPIO_ID_DDC_DATA] = 8;
+   factory->number_of_pins[GPIO_ID_DDC_CLOCK] = 8;
factory->number_of_pins[GPIO_ID_GENERIC] = 4;
factory->number_of_pins[GPIO_ID_HPD] = 5;
factory->number_of_pins[GPIO_ID_GPIO_PAD] = 28;
-- 
2.25.1



[PATCH 07/14] drm/amd/display: Bypass DET swath fill check for max clocks

2022-11-29 Thread Stylon Wang
From: Dillon Varone 

[Description]
If validating for the max voltage level (and therefore max clocks), always
pass over the DET swath fill latency hiding check.

Reviewed-by: Alvin Lee 
Acked-by: Stylon Wang 
Signed-off-by: Dillon Varone 
---
 drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c
index 820042f6aaca..4b8f5fa0f0ad 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c
@@ -1683,8 +1683,9 @@ static void mode_support_configuration(struct vba_vars_st 
*v,
&& mode_lib->vba.PTEBufferSizeNotExceeded[i][j] 
== true
&& 
mode_lib->vba.DCCMetaBufferSizeNotExceeded[i][j] == true
&& mode_lib->vba.NonsupportedDSCInputBPC == 
false
-   && 
mode_lib->vba.NotEnoughDETSwathFillLatencyHidingPerState[i][j] == false
&& !mode_lib->vba.ExceededMALLSize
+   && 
(mode_lib->vba.NotEnoughDETSwathFillLatencyHidingPerState[i][j] == false
+   || i == v->soc.num_states - 1)
&& ((mode_lib->vba.HostVMEnable == false
&& !mode_lib->vba.ImmediateFlipRequiredFinal)
|| 
mode_lib->vba.ImmediateFlipSupportedForState[i][j])
-- 
2.25.1
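The restructured condition above is easy to misread in diff form: the latency-hiding check now only gates states below the top one. Reduced to a sketch (names simplified from the vba_vars_st fields):

```c
#include <assert.h>
#include <stdbool.h>

/* A state passes the DET swath fill gate if latency hiding is sufficient,
 * OR if it is the last (max-clock) voltage state, which is always allowed
 * through per the patch above. */
static bool det_gate_passes(bool not_enough_latency_hiding,
			    int state, int num_states)
{
	return !not_enough_latency_hiding || state == num_states - 1;
}
```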



[PATCH 06/14] drm/amd/display: Disable uclk pstate for subvp pipes

2022-11-29 Thread Stylon Wang
From: Dillon Varone 

[Description]
When subvp is in use, main pipes should block unintended natural uclk pstate
changes to prevent disruption to the state machine.

Reviewed-by: Alvin Lee 
Acked-by: Stylon Wang 
Signed-off-by: Dillon Varone 
---
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
index c9b2343947be..b8767be1e4c5 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
@@ -703,11 +703,7 @@ void dcn32_subvp_update_force_pstate(struct dc *dc, struct 
dc_state *context)
for (i = 0; i < dc->res_pool->pipe_count; i++) {
struct pipe_ctx *pipe = &context->res_ctx.pipe_ctx[i];
 
-   // For SubVP + DRR, also force disallow on the DRR pipe
-   // (We will force allow in the DMUB sequence -- some DRR 
timings by default won't allow P-State so we have
-   // to force once the vblank is stretched).
-   if (pipe->stream && pipe->plane_state && 
(pipe->stream->mall_stream_config.type == SUBVP_MAIN ||
-   (pipe->stream->mall_stream_config.type == 
SUBVP_NONE && pipe->stream->ignore_msa_timing_param))) {
+   if (pipe->stream && pipe->plane_state && 
(pipe->stream->mall_stream_config.type == SUBVP_MAIN)) {
struct hubp *hubp = pipe->plane_res.hubp;
 
if (hubp && 
hubp->funcs->hubp_update_force_pstate_disallow)
@@ -785,6 +781,10 @@ void dcn32_program_mall_pipe_config(struct dc *dc, struct 
dc_state *context)
if (hws && hws->funcs.update_mall_sel)
hws->funcs.update_mall_sel(dc, context);
 
+   //update subvp force pstate
+   if (hws && hws->funcs.subvp_update_force_pstate)
+   dc->hwseq->funcs.subvp_update_force_pstate(dc, context);
+
// Program FORCE_ONE_ROW_FOR_FRAME and CURSOR_REQ_MODE for main subvp 
pipes
for (i = 0; i < dc->res_pool->pipe_count; i++) {
struct pipe_ctx *pipe = &context->res_ctx.pipe_ctx[i];
-- 
2.25.1



[PATCH 05/14] drm/amd/display: Fix DCN2.1 default DSC clocks

2022-11-29 Thread Stylon Wang
From: Michael Strauss 

[WHY]
A low dscclk at high voltage levels blocks some DSC modes.

[HOW]
Update dscclk to 1/3 of dispclk.

Reviewed-by: Charlene Liu 
Acked-by: Stylon Wang 
Signed-off-by: Michael Strauss 
---
 drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
index c4eca10587a6..c26da3bb2892 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
@@ -565,7 +565,7 @@ struct _vcs_dpi_soc_bounding_box_st dcn2_1_soc = {
.dppclk_mhz = 847.06,
.phyclk_mhz = 810.0,
.socclk_mhz = 953.0,
-   .dscclk_mhz = 489.0,
+   .dscclk_mhz = 300.0,
.dram_speed_mts = 2400.0,
},
{
@@ -576,7 +576,7 @@ struct _vcs_dpi_soc_bounding_box_st dcn2_1_soc = {
.dppclk_mhz = 960.00,
.phyclk_mhz = 810.0,
.socclk_mhz = 278.0,
-   .dscclk_mhz = 287.67,
+   .dscclk_mhz = 342.86,
.dram_speed_mts = 2666.0,
},
{
@@ -587,7 +587,7 @@ struct _vcs_dpi_soc_bounding_box_st dcn2_1_soc = {
.dppclk_mhz = 1028.57,
.phyclk_mhz = 810.0,
.socclk_mhz = 715.0,
-   .dscclk_mhz = 318.334,
+   .dscclk_mhz = 369.23,
.dram_speed_mts = 3200.0,
},
{
-- 
2.25.1



[PATCH 04/14] drm/amd/display: Enable dp_hdmi21_pcon support

2022-11-29 Thread Stylon Wang
From: David Galiffi 

[Why]
It is not enabled for DCN3.0.1, 3.0.2, 3.0.3.

[How]
Add `dc->caps.dp_hdmi21_pcon_support = true` to these DCN versions.

Reviewed-by: Martin Leung 
Acked-by: Stylon Wang 
Signed-off-by: David Galiffi 
---
 drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c | 2 ++
 drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c | 2 ++
 drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
index 480145f09246..8cf10351f271 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
@@ -1493,6 +1493,8 @@ static bool dcn301_resource_construct(
dc->caps.color.mpc.ogam_rom_caps.hlg = 0;
dc->caps.color.mpc.ocsc = 1;
 
+   dc->caps.dp_hdmi21_pcon_support = true;
+
/* read VBIOS LTTPR caps */
if (ctx->dc_bios->funcs->get_lttpr_caps) {
enum bp_result bp_query_result;
diff --git a/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
index 7d11c2a43cbe..47cffd0e6830 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
@@ -1281,6 +1281,8 @@ static bool dcn302_resource_construct(
dc->caps.color.mpc.ogam_rom_caps.hlg = 0;
dc->caps.color.mpc.ocsc = 1;
 
+   dc->caps.dp_hdmi21_pcon_support = true;
+
/* read VBIOS LTTPR caps */
if (ctx->dc_bios->funcs->get_lttpr_caps) {
enum bp_result bp_query_result;
diff --git a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c
index 92393b04cc44..c14d35894b2e 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c
@@ -1212,6 +1212,8 @@ static bool dcn303_resource_construct(
dc->caps.color.mpc.ogam_rom_caps.hlg = 0;
dc->caps.color.mpc.ocsc = 1;
 
+   dc->caps.dp_hdmi21_pcon_support = true;
+
/* read VBIOS LTTPR caps */
if (ctx->dc_bios->funcs->get_lttpr_caps) {
enum bp_result bp_query_result;
-- 
2.25.1



[PATCH 03/14] drm/amd/display: prevent seamless boot on displays that don't have the preferred dig

2022-11-29 Thread Stylon Wang
From: Dmytro Laktyushkin 

Seamless boot requires VBIOS to select a DIG matching the link order. A
significant amount of DAL logic assumes we are using the preferred DIG for
eDP, and if this isn't the case then seamless boot is not supported.

Reviewed-by: Martin Leung 
Acked-by: Stylon Wang 
Signed-off-by: Dmytro Laktyushkin 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 87994ae0a397..486d18290b9f 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1556,6 +1556,9 @@ bool dc_validate_boot_timing(const struct dc *dc,
if (tg_inst >= dc->res_pool->timing_generator_count)
return false;
 
+   if (tg_inst != link->link_enc->preferred_engine)
+   return false;
+
tg = dc->res_pool->timing_generators[tg_inst];
 
if (!tg->funcs->get_hw_timing)
-- 
2.25.1



[PATCH 02/14] drm/amd/display: trigger timing sync only if TG is running

2022-11-29 Thread Stylon Wang
From: Aurabindo Pillai 

[Why&How]
If the timing generator isn't running, it does not make sense to trigger
a sync on the corresponding OTG. Check this condition before starting.
Otherwise, this will cause errors like:

*ERROR* GSL: Timeout on reset trigger!

Fixes: 8c7924bdb0fe ("drm/amd/display: Disable phantom OTG after enable for 
plane disable")
Reviewed-by: Rodrigo Siqueira 
Reviewed-by: Alvin Lee 
Acked-by: Stylon Wang 
Signed-off-by: Aurabindo Pillai 
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c 
b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
index 355ffed7380b..c8ec11839b4d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
@@ -2216,6 +2216,12 @@ void dcn10_enable_vblanks_synchronization(
opp = grouped_pipes[i]->stream_res.opp;
tg = grouped_pipes[i]->stream_res.tg;
tg->funcs->get_otg_active_size(tg, &width, &height);
+
+   if (!tg->funcs->is_tg_enabled(tg)) {
+   DC_SYNC_INFO("Skipping timing sync on disabled OTG\n");
+   return;
+   }
+
if (opp->funcs->opp_program_dpg_dimensions)
opp->funcs->opp_program_dpg_dimensions(opp, width, 
2*(height) + 1);
}
-- 
2.25.1



[PATCH 01/14] drm/amd/display: Remove DTB DTO on CLK update

2022-11-29 Thread Stylon Wang
From: Chris Park 

[Why]
DTB DTO is programmed more correctly during link enable. Programming it
on CLK updates, which may arrive frequently and sporadically per flip,
throws off the DTB DTO.

[How]
Remove DTB DTO programming on clock update.

Reviewed-by: Alvin Lee 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Chris Park 
---
 .../display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c  | 37 ---
 .../amd/display/dc/dcn321/dcn321_resource.c   |  2 +-
 2 files changed, 1 insertion(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c 
b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c
index 9eb9fe5b8d2c..200fcec19186 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c
@@ -233,41 +233,6 @@ void dcn32_init_clocks(struct clk_mgr *clk_mgr_base)
DC_FP_END();
 }
 
-static void dcn32_update_clocks_update_dtb_dto(struct clk_mgr_internal 
*clk_mgr,
-   struct dc_state *context,
-   int ref_dtbclk_khz)
-{
-   struct dccg *dccg = clk_mgr->dccg;
-   uint32_t tg_mask = 0;
-   int i;
-
-   for (i = 0; i < clk_mgr->base.ctx->dc->res_pool->pipe_count; i++) {
-   struct pipe_ctx *pipe_ctx = &context->res_ctx.pipe_ctx[i];
-   struct dtbclk_dto_params dto_params = {0};
-
-   /* use mask to program DTO once per tg */
-   if (pipe_ctx->stream_res.tg &&
-   !(tg_mask & (1 << 
pipe_ctx->stream_res.tg->inst))) {
-   tg_mask |= (1 << pipe_ctx->stream_res.tg->inst);
-
-   dto_params.otg_inst = pipe_ctx->stream_res.tg->inst;
-   dto_params.ref_dtbclk_khz = ref_dtbclk_khz;
-
-   if (is_dp_128b_132b_signal(pipe_ctx)) {
-   dto_params.pixclk_khz = 
pipe_ctx->stream->phy_pix_clk;
-
-   if (pipe_ctx->stream_res.audio != NULL)
-   dto_params.req_audio_dtbclk_khz = 24000;
-   }
-   if (dc_is_hdmi_signal(pipe_ctx->stream->signal))
-   dto_params.is_hdmi = true;
-
-   dccg->funcs->set_dtbclk_dto(clk_mgr->dccg, &dto_params);
-   //dccg->funcs->set_audio_dtbclk_dto(clk_mgr->dccg, 
&dto_params);
-   }
-   }
-}
-
 /* Since DPPCLK request to PMFW needs to be exact (due to DPP DTO programming),
  * update DPPCLK to be the exact frequency that will be set after the DPPCLK
  * divider is updated. This will prevent rounding issues that could cause DPP
@@ -447,8 +412,6 @@ static void dcn32_update_clocks(struct clk_mgr 
*clk_mgr_base,
/* DCCG requires KHz precision for DTBCLK */
clk_mgr_base->clks.ref_dtbclk_khz =
dcn32_smu_set_hard_min_by_freq(clk_mgr, 
PPCLK_DTBCLK, khz_to_mhz_ceil(new_clocks->ref_dtbclk_khz));
-
-   dcn32_update_clocks_update_dtb_dto(clk_mgr, context, 
clk_mgr_base->clks.ref_dtbclk_khz);
}
 
if (dc->config.forced_clocks == false || (force_reset && 
safe_to_lower)) {
diff --git a/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c
index 3406e7735357..d1f36df03c2e 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c
@@ -743,7 +743,7 @@ static const struct dc_debug_options debug_defaults_diags = 
{
.dmub_command_table = true,
.enable_tri_buf = true,
.use_max_lb = true,
-   .force_disable_subvp = true
+   .force_disable_subvp = true,
 };
 
 
-- 
2.25.1



[PATCH 00/14] DC Patches December 5 2022

2022-11-29 Thread Stylon Wang
This DC patchset brings improvements in multiple areas. In summary, we have:

* Improvements on PSR-SU
* Improvements and fixes on DML calculation
* Fix on Dynamic Refresh Rate for DCN 3.1.4
* Fix on DC commit streams
* Fix on DDC GPIO pin
* Fix on Sub-ViewPort
* Fix on DSC
* Enable DP HDMI 2.1 PCON
* Fix on seamless boot
* Fix on OTG programming
* Fix on DTB clock


Cc: Daniel Wheeler 


Alvin Lee (1):
  drm/amd/display: Ensure commit_streams returns the DC return code

Aric Cyr (1):
  drm/amd/display: 3.2.215

Aurabindo Pillai (1):
  drm/amd/display: trigger timing sync only if TG is running

Charlene Liu (1):
  drm/amd/display: correct static_screen_event_mask

Chris Park (1):
  drm/amd/display: Remove DTB DTO on CLK update

David Galiffi (1):
  drm/amd/display: Enable dp_hdmi21_pcon support

Dillon Varone (3):
  drm/amd/display: Disable uclk pstate for subvp pipes
  drm/amd/display: Bypass DET swath fill check for max clocks
  drm/amd/display: set optimized required for comp buf changes

Dmytro Laktyushkin (1):
  drm/amd/display: prevent seamless boot on displays that don't have the
preferred dig

Michael Strauss (1):
  drm/amd/display: Fix DCN2.1 default DSC clocks

Nicholas Kazlauskas (1):
  drm/amd/display: Add debug option to skip PSR CRTC disable

Paul Hsieh (1):
  drm/amd/display: read invalid ddc pin status cause engine busy

Zhongwei (1):
  drm/amd/display: correct DML calc error of UrgentLatency

 .../display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c  | 37 -
 drivers/gpu/drm/amd/display/dc/core/dc.c  |  5 ++-
 drivers/gpu/drm/amd/display/dc/core/dc_link.c |  2 +-
 drivers/gpu/drm/amd/display/dc/dc.h   |  3 +-
 .../amd/display/dc/dcn10/dcn10_hw_sequencer.c |  6 +++
 .../drm/amd/display/dc/dcn20/dcn20_hwseq.c|  7 +++-
 .../amd/display/dc/dcn301/dcn301_resource.c   |  2 +
 .../amd/display/dc/dcn302/dcn302_resource.c   |  2 +
 .../amd/display/dc/dcn303/dcn303_resource.c   |  2 +
 .../drm/amd/display/dc/dcn31/dcn31_hwseq.c| 40 +++
 .../drm/amd/display/dc/dcn31/dcn31_hwseq.h|  4 ++
 .../gpu/drm/amd/display/dc/dcn31/dcn31_init.c |  4 +-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.c | 29 +-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.h |  5 ++-
 .../drm/amd/display/dc/dcn314/dcn314_init.c   |  4 +-
 .../drm/amd/display/dc/dcn314/dcn314_optc.c   |  2 +-
 .../amd/display/dc/dcn314/dcn314_resource.c   |  1 +
 .../drm/amd/display/dc/dcn32/dcn32_hwseq.c| 10 ++---
 .../amd/display/dc/dcn321/dcn321_resource.c   |  2 +-
 .../drm/amd/display/dc/dml/dcn20/dcn20_fpu.c  |  6 +--
 .../dc/dml/dcn30/display_mode_vba_30.c|  2 +-
 .../dc/dml/dcn31/display_mode_vba_31.c|  2 +-
 .../dc/dml/dcn314/display_mode_vba_314.c  |  2 +-
 .../dc/dml/dcn32/display_mode_vba_32.c|  3 +-
 .../display/dc/gpio/dcn32/hw_factory_dcn32.c  |  4 +-
 25 files changed, 122 insertions(+), 64 deletions(-)

-- 
2.25.1



[linux-next:master] BUILD REGRESSION 13ee7ef407cfcf63f4f047460ac5bb6ba5a3447d

2022-11-29 Thread kernel test robot
tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 13ee7ef407cfcf63f4f047460ac5bb6ba5a3447d  Add linux-next specific 
files for 20221129

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/202211041320.coq8eelj-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202211090634.ryfkk0ws-...@intel.com
https://lore.kernel.org/oe-kbuild-all/20220149.0etifpy6-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202211242021.fdzrfna8-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202211242120.mzzvguln-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202211282102.qur7hhrw-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202211300947.ley0l6at-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

./include/media/dvbdev.h:207: warning: expecting prototype for 
dvb_device_get(). Prototype was for dvb_device_put() instead
arch/arm/mach-s3c/devs.c:32:10: fatal error: 
'linux/platform_data/dma-s3c24xx.h' file not found
arch/arm/mach-s3c/devs.c:32:10: fatal error: linux/platform_data/dma-s3c24xx.h: 
No such file or directory
arch/powerpc/kernel/kvm_emul.o: warning: objtool: kvm_template_end(): can't 
find starting instruction
arch/powerpc/kernel/optprobes_head.o: warning: objtool: 
optprobe_template_end(): can't find starting instruction
drivers/gpu/drm/amd/amdgpu/../display/dc/irq/dcn201/irq_service_dcn201.c:40:20: 
warning: no previous prototype for 'to_dal_irq_source_dcn201' 
[-Wmissing-prototypes]
drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c:451:1: warning: no previous 
prototype for 'gf100_fifo_nonstall_block' [-Wmissing-prototypes]
drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c:451:1: warning: no previous 
prototype for function 'gf100_fifo_nonstall_block' [-Wmissing-prototypes]
drivers/gpu/drm/nouveau/nvkm/engine/fifo/runl.c:34:1: warning: no previous 
prototype for 'nvkm_engn_cgrp_get' [-Wmissing-prototypes]
drivers/gpu/drm/nouveau/nvkm/engine/fifo/runl.c:34:1: warning: no previous 
prototype for function 'nvkm_engn_cgrp_get' [-Wmissing-prototypes]
drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c:210:1: warning: no previous 
prototype for 'tu102_gr_load' [-Wmissing-prototypes]
drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c:210:1: warning: no previous 
prototype for function 'tu102_gr_load' [-Wmissing-prototypes]
drivers/gpu/drm/nouveau/nvkm/nvfw/acr.c:49:1: warning: no previous prototype 
for 'wpr_generic_header_dump' [-Wmissing-prototypes]
drivers/gpu/drm/nouveau/nvkm/nvfw/acr.c:49:1: warning: no previous prototype 
for function 'wpr_generic_header_dump' [-Wmissing-prototypes]
drivers/gpu/drm/nouveau/nvkm/subdev/acr/lsfw.c:221:21: warning: variable 'loc' 
set but not used [-Wunused-but-set-variable]
drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:1849:38: warning: unused 
variable 'mt8173_jpeg_drvdata' [-Wunused-const-variable]
drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:1864:38: warning: unused 
variable 'mtk_jpeg_drvdata' [-Wunused-const-variable]
drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c:1890:38: warning: unused 
variable 'mtk8195_jpegdec_drvdata' [-Wunused-const-variable]
net/netfilter/nf_conntrack_netlink.c:2674:6: warning: unused variable 'mark' 
[-Wunused-variable]
vmlinux.o: warning: objtool: __btrfs_map_block+0x21ad: unreachable instruction

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-irq-dcn201-irq_service_dcn201.c:warning:no-previous-prototype-for-to_dal_irq_source_dcn201
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-fifo-gf100.c:warning:no-previous-prototype-for-gf100_fifo_nonstall_block
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-fifo-runl.c:warning:no-previous-prototype-for-nvkm_engn_cgrp_get
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-gr-tu102.c:warning:no-previous-prototype-for-tu102_gr_load
|   |-- 
drivers-gpu-drm-nouveau-nvkm-nvfw-acr.c:warning:no-previous-prototype-for-wpr_generic_header_dump
|   `-- 
drivers-gpu-drm-nouveau-nvkm-subdev-acr-lsfw.c:warning:variable-loc-set-but-not-used
|-- arc-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-irq-dcn201-irq_service_dcn201.c:warning:no-previous-prototype-for-to_dal_irq_source_dcn201
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-fifo-gf100.c:warning:no-previous-prototype-for-gf100_fifo_nonstall_block
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-fifo-runl.c:warning:no-previous-prototype-for-nvkm_engn_cgrp_get
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-gr-tu102.c:warning:no-previous-prototype-for-tu102_gr_load
|   |-- 
drivers-gpu-drm-nouveau-nvkm-nvfw-acr.c:warning:no-previous-prototype-for-wpr_generic_header_dump
|   `-- 
drivers-gpu-drm-nouveau-nvkm-subdev-acr-lsfw.c:warning:variable-loc-set-but-not-used
|-- arm-allyesconfig
|   |-- 
arch-arm-mach-s3c-devs.c:fata

RE: [PATCH 2/2] drm/amdgpu/mes11: enable reg active poll

2022-11-29 Thread Zhang, Hawking
[AMD Official Use Only - General]

For the series, Reviewed-by: Hawking Zhang 

Regards,
Hawking

-Original Message-
From: Zhang, Hawking
Sent: Wednesday, November 30, 2022 09:28
To: 'Jack Xiao' ; amd-gfx@lists.freedesktop.org
Cc: Xiao, Jack 
Subject: RE: [PATCH 2/2] drm/amdgpu/mes11: enable reg active poll

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of Jack Xiao
Sent: Wednesday, November 30, 2022 08:44
To: amd-gfx@lists.freedesktop.org
Cc: Xiao, Jack 
Subject: [PATCH 2/2] drm/amdgpu/mes11: enable reg active poll

Enable reg active poll in mes11.

Signed-off-by: Jack Xiao 
---
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 02ad84a1526a..a3e7062b7f77 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -383,6 +383,7 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes 
*mes)
mes_set_hw_res_pkt.disable_reset = 1;
mes_set_hw_res_pkt.disable_mes_log = 1;
mes_set_hw_res_pkt.use_different_vmid_compute = 1;
+   mes_set_hw_res_pkt.enable_reg_active_poll = 1;
mes_set_hw_res_pkt.oversubscription_timer = 50;

return mes_v11_0_submit_pkt_and_poll_completion(mes,
--
2.37.3


RE: [PATCH 2/2] drm/amdgpu/mes11: enable reg active poll

2022-11-29 Thread Zhang, Hawking
[AMD Official Use Only - General]

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of Jack Xiao
Sent: Wednesday, November 30, 2022 08:44
To: amd-gfx@lists.freedesktop.org
Cc: Xiao, Jack 
Subject: [PATCH 2/2] drm/amdgpu/mes11: enable reg active poll

Enable reg active poll in mes11.

Signed-off-by: Jack Xiao 
---
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 02ad84a1526a..a3e7062b7f77 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -383,6 +383,7 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes 
*mes)
mes_set_hw_res_pkt.disable_reset = 1;
mes_set_hw_res_pkt.disable_mes_log = 1;
mes_set_hw_res_pkt.use_different_vmid_compute = 1;
+   mes_set_hw_res_pkt.enable_reg_active_poll = 1;
mes_set_hw_res_pkt.oversubscription_timer = 50;

return mes_v11_0_submit_pkt_and_poll_completion(mes,
--
2.37.3


Re: [PATCH 26/29] drm/amdkfd: add debug query exception info operation

2022-11-29 Thread Felix Kuehling

On 2022-10-31 12:23, Jonathan Kim wrote:

Allow the debugger to query additional info based on an exception code.
For device exceptions, it's currently only memory violation information.
For process exceptions, it's currently only runtime information.
Queue exceptions only report the queue exception status.

The debugger has the option of clearing the target exception on query.

Signed-off-by: Jonathan Kim 


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |   7 ++
  drivers/gpu/drm/amd/amdkfd/kfd_debug.c   | 120 +++
  drivers/gpu/drm/amd/amdkfd/kfd_debug.h   |   6 ++
  3 files changed, 133 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index b918213a0087..2c8f107237ee 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2953,6 +2953,13 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
&args->query_debug_event.exception_mask);
break;
case KFD_IOC_DBG_TRAP_QUERY_EXCEPTION_INFO:
+   r = kfd_dbg_trap_query_exception_info(target,
+   args->query_exception_info.source_id,
+   args->query_exception_info.exception_code,
+   args->query_exception_info.clear_exception,
+   (void __user 
*)args->query_exception_info.info_ptr,
+   &args->query_exception_info.info_size);
+   break;
case KFD_IOC_DBG_TRAP_GET_QUEUE_SNAPSHOT:
case KFD_IOC_DBG_TRAP_GET_DEVICE_SNAPSHOT:
pr_warn("Debug op %i not supported yet\n", args->op);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 6985a53b83e9..a05fe32eac0e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -768,6 +768,126 @@ int kfd_dbg_trap_set_wave_launch_mode(struct kfd_process 
*target,
return r;
  }
  
+int kfd_dbg_trap_query_exception_info(struct kfd_process *target,

+   uint32_t source_id,
+   uint32_t exception_code,
+   bool clear_exception,
+   void __user *info,
+   uint32_t *info_size)
+{
+   bool found = false;
+   int r = 0;
+   uint32_t copy_size, actual_info_size = 0;
+   uint64_t *exception_status_ptr = NULL;
+
+   if (!target)
+   return -EINVAL;
+
+   if (!info || !info_size)
+   return -EINVAL;
+
+   mutex_lock(&target->event_mutex);
+
+   if (KFD_DBG_EC_TYPE_IS_QUEUE(exception_code)) {
+   /* Per queue exceptions */
+   struct queue *queue = NULL;
+   int i;
+   
+   for (i = 0; i < target->n_pdds; i++) {
+   struct kfd_process_device *pdd = target->pdds[i];
+   struct qcm_process_device *qpd = &pdd->qpd;
+
+   list_for_each_entry(queue, &qpd->queues_list, list) {
+   if (!found && queue->properties.queue_id == 
source_id) {
+   found = true;
+   break;
+   }
+   }
+   if (found)
+   break;
+   }
+
+   if (!found) {
+   r = -EINVAL;
+   goto out;
+   }
+
+   if (!(queue->properties.exception_status & 
KFD_EC_MASK(exception_code))) {
+   r = -ENODATA;
+   goto out;
+   }
+   exception_status_ptr = &queue->properties.exception_status;
+   } else if (KFD_DBG_EC_TYPE_IS_DEVICE(exception_code)) {
+   /* Per device exceptions */
+   struct kfd_process_device *pdd = NULL;
+   int i;
+
+   for (i = 0; i < target->n_pdds; i++) {
+   pdd = target->pdds[i];
+   if (pdd->dev->id == source_id) {
+   found = true;
+   break;
+   }
+   }
+
+   if (!found) {
+   r = -EINVAL;
+   goto out;
+   }
+
+   if (!(pdd->exception_status & KFD_EC_MASK(exception_code))) {
+   r = -ENODATA;
+   goto out;
+   }
+
+   if (exception_code == EC_DEVICE_MEMORY_VIOLATION) {
+   copy_size = min((size_t)(*info_size), 
pdd->vm_fault_exc_data_size);
+
+   if (copy_to_user(info, pdd->vm_fault_exc_data, 
copy_size)) {
+   r = -EFAULT;
+   goto out;
+ 

Re: [PATCH 25/29] drm/amdkfd: add debug query event operation

2022-11-29 Thread Felix Kuehling



On 2022-10-31 12:23, Jonathan Kim wrote:

Allow the debugger to query a single queue, device or process exception
in a FIFO manner.


The implementation is not really FIFO because the order in which events 
are returned is independent of the order in which they were raised. Just 
remove the FIFO statement.


Other than that, this patch is

Reviewed-by: Felix Kuehling 



The KFD should also return the GPU or Queue id of the exception.
The debugger also has the option of clearing exceptions after
being queried.

Signed-off-by: Jonathan Kim 
---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  6 +++
  drivers/gpu/drm/amd/amdkfd/kfd_debug.c   | 64 
  drivers/gpu/drm/amd/amdkfd/kfd_debug.h   |  5 ++
  3 files changed, 75 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 200e11f02382..b918213a0087 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2946,6 +2946,12 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
r = kfd_dbg_trap_set_flags(target, &args->set_flags.flags);
break;
case KFD_IOC_DBG_TRAP_QUERY_DEBUG_EVENT:
+   r = kfd_dbg_ev_query_debug_event(target,
+   &args->query_debug_event.queue_id,
+   &args->query_debug_event.gpu_id,
+   args->query_debug_event.exception_mask,
+   &args->query_debug_event.exception_mask);
+   break;
case KFD_IOC_DBG_TRAP_QUERY_EXCEPTION_INFO:
case KFD_IOC_DBG_TRAP_GET_QUEUE_SNAPSHOT:
case KFD_IOC_DBG_TRAP_GET_DEVICE_SNAPSHOT:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 1f4d3fa0278e..6985a53b83e9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -33,6 +33,70 @@
  #define MAX_WATCH_ADDRESSES   4
  static DEFINE_SPINLOCK(watch_points_lock);
  
+int kfd_dbg_ev_query_debug_event(struct kfd_process *process,

+ unsigned int *queue_id,
+ unsigned int *gpu_id,
+ uint64_t exception_clear_mask,
+ uint64_t *event_status)
+{
+   struct process_queue_manager *pqm;
+   struct process_queue_node *pqn;
+   int i;
+
+   if (!(process && process->debug_trap_enabled))
+   return -ENODATA;
+
+   mutex_lock(&process->event_mutex);
+   *event_status = 0;
+   *queue_id = 0;
+   *gpu_id = 0;
+
+   /* find and report queue events */
+   pqm = &process->pqm;
+   list_for_each_entry(pqn, &pqm->queues, process_queue_list) {
+   uint64_t tmp = process->exception_enable_mask;
+
+   if (!pqn->q)
+   continue;
+
+   tmp &= pqn->q->properties.exception_status;
+
+   if (!tmp)
+   continue;
+
+   *event_status = pqn->q->properties.exception_status;
+   *queue_id = pqn->q->properties.queue_id;
+   *gpu_id = pqn->q->device->id;
+   pqn->q->properties.exception_status &= ~exception_clear_mask;
+   goto out;
+   }
+
+   /* find and report device events */
+   for (i = 0; i < process->n_pdds; i++) {
+   struct kfd_process_device *pdd = process->pdds[i];
+   uint64_t tmp = process->exception_enable_mask
+   & pdd->exception_status;
+
+   if (!tmp)
+   continue;
+
+   *event_status = pdd->exception_status;
+   *gpu_id = pdd->dev->id;
+   pdd->exception_status &= ~exception_clear_mask;
+   goto out;
+   }
+
+   /* report process events */
+   if (process->exception_enable_mask & process->exception_status) {
+   *event_status = process->exception_status;
+   process->exception_status &= ~exception_clear_mask;
+   }
+
+out:
+   mutex_unlock(&process->event_mutex);
+   return *event_status ? 0 : -EAGAIN;
+}
+
  void debug_event_write_work_handler(struct work_struct *work)
  {
struct kfd_process *process;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
index 12b80b6c96d0..c64ffd3efc46 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
@@ -27,6 +27,11 @@
  
  void kfd_dbg_trap_deactivate(struct kfd_process *target, bool unwind, int unwind_count);

  int kfd_dbg_trap_activate(struct kfd_process *target);
+int kfd_dbg_ev_query_debug_event(struct kfd_process *process,
+   unsigned int *queue_id,
+   unsigned int *gpu_id,
+   uint64_t exception_clear_mask,
+   uint64_t *event_status);
 

[PATCH 1/2] drm/amd/amdgpu: update mes11 api def

2022-11-29 Thread Jack Xiao
Update the api def of mes11.

Signed-off-by: Jack Xiao 
---
 drivers/gpu/drm/amd/include/mes_v11_api_def.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/include/mes_v11_api_def.h 
b/drivers/gpu/drm/amd/include/mes_v11_api_def.h
index 7e85cdc5bd34..dc694cb246d9 100644
--- a/drivers/gpu/drm/amd/include/mes_v11_api_def.h
+++ b/drivers/gpu/drm/amd/include/mes_v11_api_def.h
@@ -222,7 +222,11 @@ union MESAPI_SET_HW_RESOURCES {
uint32_t 
apply_grbm_remote_register_dummy_read_wa : 1;
uint32_t second_gfx_pipe_enabled : 1;
uint32_t enable_level_process_quantum_check : 1;
-   uint32_t reserved   : 25;
+   uint32_t legacy_sch_mode : 1;
+   uint32_t disable_add_queue_wptr_mc_addr : 1;
+   uint32_t enable_mes_event_int_logging : 1;
+   uint32_t enable_reg_active_poll : 1;
+   uint32_t reserved   : 21;
};
uint32_tuint32_t_all;
};
-- 
2.37.3



[PATCH 2/2] drm/amdgpu/mes11: enable reg active poll

2022-11-29 Thread Jack Xiao
Enable reg active poll in mes11.

Signed-off-by: Jack Xiao 
---
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 02ad84a1526a..a3e7062b7f77 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -383,6 +383,7 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes 
*mes)
mes_set_hw_res_pkt.disable_reset = 1;
mes_set_hw_res_pkt.disable_mes_log = 1;
mes_set_hw_res_pkt.use_different_vmid_compute = 1;
+   mes_set_hw_res_pkt.enable_reg_active_poll = 1;
mes_set_hw_res_pkt.oversubscription_timer = 50;
 
return mes_v11_0_submit_pkt_and_poll_completion(mes,
-- 
2.37.3



Re: [PATCH 24/29] drm/amdkfd: add debug set flags operation

2022-11-29 Thread Felix Kuehling



On 2022-10-31 12:23, Jonathan Kim wrote:

Allow the debugger to set single memory and single ALU operations.

Some exceptions are imprecise (memory violations, address watch) in the
sense that a trap occurs only when the exception interrupt occurs and
not at the non-halting faulty instruction.  Trap temporaries 0 & 1 save
the program counter address, which means that these values will not point
to the faulty instruction address but to whenever the interrupt was
raised.

Setting the Single Memory Operations flag will inject an automatic wait
on every memory operation instruction forcing imprecise memory exceptions
to become precise at the cost of performance.  This setting is not
permitted on debug devices that support only a global setting of this
option.

Likewise, Single ALU Operations will force in-order ALU operations.
Although this is available on current hardware, it's not required so it
will be treated as a NOP.


Having a flag in the API that is just ignored is misleading. I think we 
should either remove it from the API for now, or at least make the 
function return an error if a debugger attempts to set the precise-ALU 
flag. This would be consistent with attempting to set a flag that is not 
supported on the HW.


Regards,
  Felix




Return the previous set flags to the debugger as well.

Signed-off-by: Jonathan Kim 
---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  2 ++
  drivers/gpu/drm/amd/amdkfd/kfd_debug.c   | 35 
  drivers/gpu/drm/amd/amdkfd/kfd_debug.h   |  1 +
  3 files changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 9b2ea6e9e078..200e11f02382 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2943,6 +2943,8 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
args->clear_node_address_watch.id);
break;
case KFD_IOC_DBG_TRAP_SET_FLAGS:
+   r = kfd_dbg_trap_set_flags(target, &args->set_flags.flags);
+   break;
case KFD_IOC_DBG_TRAP_QUERY_DEBUG_EVENT:
case KFD_IOC_DBG_TRAP_QUERY_EXCEPTION_INFO:
case KFD_IOC_DBG_TRAP_GET_QUEUE_SNAPSHOT:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 68bc1d5bfd05..1f4d3fa0278e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -23,6 +23,7 @@
  #include "kfd_debug.h"
  #include "kfd_device_queue_manager.h"
  #include 
+#include 
  
  /*

   * The spinlock protects the per device dev->alloc_watch_ids for 
multi-process access.
@@ -355,6 +356,37 @@ static void kfd_dbg_clear_process_address_watch(struct 
kfd_process *target)
kfd_dbg_trap_clear_dev_address_watch(target->pdds[i], 
j);
  }
  
+int kfd_dbg_trap_set_flags(struct kfd_process *target, uint32_t *flags)

+{
+   uint32_t prev_flags = target->dbg_flags;
+   int i, r = 0;
+
+   for (i = 0; i < target->n_pdds; i++) {
+   if (!kfd_dbg_is_per_vmid_supported(target->pdds[i]->dev) &&
+   (*flags & KFD_DBG_TRAP_FLAG_SINGLE_MEM_OP)) {
+   *flags = prev_flags;
+   return -EACCES;
+   }
+   }
+
+   target->dbg_flags = *flags;
+   *flags = prev_flags;
+   for (i = 0; i < target->n_pdds; i++) {
+   struct kfd_process_device *pdd = target->pdds[i];
+
+   if (!kfd_dbg_is_per_vmid_supported(pdd->dev))
+   continue;
+
+   r = debug_refresh_runlist(target->pdds[i]->dev->dqm);
+   if (r) {
+   target->dbg_flags = prev_flags;
+   break;
+   }
+   }
+
+   return r;
+}
+
  
  /* kfd_dbg_trap_deactivate:

   *target: target process
@@ -369,9 +401,12 @@ void kfd_dbg_trap_deactivate(struct kfd_process *target, 
bool unwind, int unwind
int i, count = 0;
  
  	if (!unwind) {

+   uint32_t flags = 0;
cancel_work_sync(&target->debug_event_workarea);
kfd_dbg_clear_process_address_watch(target);
kfd_dbg_trap_set_wave_launch_mode(target, 0);
+
+   kfd_dbg_trap_set_flags(target, &flags);
}
  
  	for (i = 0; i < target->n_pdds; i++) {

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
index ad677e67e7eb..12b80b6c96d0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
@@ -57,6 +57,7 @@ int kfd_dbg_trap_set_dev_address_watch(struct 
kfd_process_device *pdd,
uint32_t watch_address_mask,
uint32_t *watch_id,
uint32_t watch_mode);
+int kfd_dbg_trap_set_flags(struct kfd_process *target, uint32_t 

Re: [PATCH 23/29] drm/amdkfd: add debug set and clear address watch points operation

2022-11-29 Thread Felix Kuehling



On 2022-10-31 12:23, Jonathan Kim wrote:

Shader read, write and atomic memory operations can be alerted to the
debugger as an address watch exception.

Allow the debugger to pass in a watch point to a particular memory
address per device.

Note that only 4 watch points exist per device to date, so have the KFD
keep track of which watch points are allocated or not.

v2: change dev_id arg to gpu_id for consistency

Signed-off-by: Jonathan Kim 


Nit-picks inline.



---
  .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |   2 +
  .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |   2 +
  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c|  78 +++
  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h|   8 ++
  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10_3.c  |   5 +-
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 128 +
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |   8 ++
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  24 
  drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 130 ++
  drivers/gpu/drm/amd/amdkfd/kfd_debug.h|   8 +-
  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c  |   7 +
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   9 +-
  12 files changed, 405 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index 91c7fdee883e..8f9b613e3152 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -138,6 +138,8 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.validate_trap_override_request = 
kgd_aldebaran_validate_trap_override_request,
.set_wave_launch_trap_override = 
kgd_aldebaran_set_wave_launch_trap_override,
.set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
+   .set_address_watch = kgd_gfx_v9_set_address_watch,
+   .clear_address_watch = kgd_gfx_v9_clear_address_watch,
.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
.build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
.program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
index 10470f4a4eaf..5d6bd23a8cc1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -400,6 +400,8 @@ const struct kfd2kgd_calls arcturus_kfd2kgd = {
.validate_trap_override_request = 
kgd_gfx_v9_validate_trap_override_request,
.set_wave_launch_trap_override = 
kgd_gfx_v9_set_wave_launch_trap_override,
.set_wave_launch_mode = kgd_gfx_v9_set_wave_launch_mode,
+   .set_address_watch = kgd_gfx_v9_set_address_watch,
+   .clear_address_watch = kgd_gfx_v9_clear_address_watch,
.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
.build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
.get_cu_occupancy = kgd_gfx_v9_get_cu_occupancy,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
index 66a83e6fb9e5..ec4862f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
@@ -880,6 +880,82 @@ uint32_t kgd_gfx_v10_set_wave_launch_mode(struct 
amdgpu_device *adev,
return 0;
  }
  
+#define TCP_WATCH_STRIDE (mmTCP_WATCH1_ADDR_H - mmTCP_WATCH0_ADDR_H)

+uint32_t kgd_gfx_v10_set_address_watch(struct amdgpu_device *adev,
+   uint64_t watch_address,
+   uint32_t watch_address_mask,
+   uint32_t watch_id,
+   uint32_t watch_mode,
+   uint32_t debug_vmid)
+{
+   uint32_t watch_address_high;
+   uint32_t watch_address_low;
+   uint32_t watch_address_cntl;
+
+   watch_address_cntl = 0;
+
+   watch_address_low = lower_32_bits(watch_address);
+   watch_address_high = upper_32_bits(watch_address) & 0x;
+
+   watch_address_cntl = REG_SET_FIELD(watch_address_cntl,
+   TCP_WATCH0_CNTL,
+   VMID,
+   debug_vmid);
+   watch_address_cntl = REG_SET_FIELD(watch_address_cntl,
+   TCP_WATCH0_CNTL,
+   MODE,
+   watch_mode);
+   watch_address_cntl = REG_SET_FIELD(watch_address_cntl,
+   TCP_WATCH0_CNTL,
+   MASK,
+   watch_address_mask >> 7);
+
+   /* Turning off this watch point until we set all the registers */
+   watch_address_cntl = REG_SET_FIELD(watch_address_cntl,
+   TCP_WATCH0_CNTL,
+   VALID,
+   0);
+
+   

Re: [PATCH] drm/amdgpu: enable Vangogh VCN indirect sram mode

2022-11-29 Thread James Zhu

This patch is Reviewed-by: James Zhu

On 2022-11-29 19:02, Leo Liu wrote:

So that it uses PSP to initialize the HW.

Fixes: 0c2c02b6 (drm/amdgpu/vcn: add firmware support for dimgrey_cavefish)

Signed-off-by: Leo Liu
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index c448c1bdf84d..72fa14ff862f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -156,6 +156,9 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
break;
case IP_VERSION(3, 0, 2):
fw_name = FIRMWARE_VANGOGH;
+   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
+   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
+   adev->vcn.indirect_sram = true;
break;
case IP_VERSION(3, 0, 16):
fw_name = FIRMWARE_DIMGREY_CAVEFISH;

[PATCH] drm/amdgpu: enable Vangogh VCN indirect sram mode

2022-11-29 Thread Leo Liu
So that it uses PSP to initialize the HW.

Fixes: 0c2c02b6 (drm/amdgpu/vcn: add firmware support for dimgrey_cavefish)

Signed-off-by: Leo Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index c448c1bdf84d..72fa14ff862f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -156,6 +156,9 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
break;
case IP_VERSION(3, 0, 2):
fw_name = FIRMWARE_VANGOGH;
+   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
+   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
+   adev->vcn.indirect_sram = true;
break;
case IP_VERSION(3, 0, 16):
fw_name = FIRMWARE_DIMGREY_CAVEFISH;
-- 
2.25.1



Re: [PATCH 22/29] drm/amdkfd: add debug suspend and resume process queues operation

2022-11-29 Thread Felix Kuehling



On 2022-10-31 12:23, Jonathan Kim wrote:

In order to inspect waves from the saved context at any point during a
debug session, the debugger must be able to preempt queues to trigger
context save by suspending them.

On queue suspend, the KFD will copy the context save header information
so that the debugger can correctly crawl the appropriate size of the saved
context. The debugger must then also be allowed to resume suspended queues.

A queue that is newly created cannot be suspended because queue ids are
recycled after destruction, so the debugger needs to know when this has
occurred.  Query functions that clear a given queue's new-queue status
will be added later.

A queue cannot be destroyed while it is suspended to preserve its saved
context during debugger inspection.  Have queue destruction block while
a queue is suspended and unblocked when it is resumed.  Likewise, if a
queue is about to be destroyed, it cannot be suspended.

Return the number of queues successfully suspended or resumed along with
a per queue status array where the upper bits per queue status show that
the request was invalid (new/destroyed queue suspend request, missing
queue) or an error occurred (HWS in a fatal state so it can't suspend or
resume queues).

Signed-off-by: Jonathan Kim 


Some nit-picks inline. Other than that, this patch looks good to me.



---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  12 +
  drivers/gpu/drm/amd/amdkfd/kfd_debug.c|   7 +
  .../drm/amd/amdkfd/kfd_device_queue_manager.c | 401 +-
  .../drm/amd/amdkfd/kfd_device_queue_manager.h |  11 +
  .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  |  10 +
  .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   |  14 +-
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   5 +-
  7 files changed, 454 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 63665279ce4d..ec26c51177f9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -410,6 +410,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
pr_debug("Write ptr address   == 0x%016llX\n",
args->write_pointer_address);
  
+	kfd_dbg_ev_raise(KFD_EC_MASK(EC_QUEUE_NEW), p, dev, queue_id, false, NULL, 0);

return 0;
  
  err_create_queue:

@@ -2903,7 +2904,18 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
args->launch_mode.launch_mode);
break;
case KFD_IOC_DBG_TRAP_SUSPEND_QUEUES:
+   r = suspend_queues(target,
+   args->suspend_queues.num_queues,
+   args->suspend_queues.grace_period,
+   args->suspend_queues.exception_mask,
(uint32_t *)args->suspend_queues.queue_array_ptr);
+
+   break;
case KFD_IOC_DBG_TRAP_RESUME_QUEUES:
+   r = resume_queues(target, false,
+   args->resume_queues.num_queues,
(uint32_t *)args->resume_queues.queue_array_ptr);
+   break;
case KFD_IOC_DBG_TRAP_SET_NODE_ADDRESS_WATCH:
case KFD_IOC_DBG_TRAP_CLEAR_NODE_ADDRESS_WATCH:
case KFD_IOC_DBG_TRAP_SET_FLAGS:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 210851f2cdb3..afa56aad316b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -274,6 +274,13 @@ void kfd_dbg_trap_deactivate(struct kfd_process *target, 
bool unwind, int unwind
  
  		count++;

}
+
+   if (!unwind) {
+   int resume_count = resume_queues(target, true, 0, NULL);
+
+   if (resume_count)
+   pr_debug("Resumed %d queues\n", resume_count);
+   }
  }
  
  static void kfd_dbg_clean_exception_status(struct kfd_process *target)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index bf4787b4dc6c..589efbefc8dc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -921,6 +921,79 @@ static int update_queue(struct device_queue_manager *dqm, 
struct queue *q,
return retval;
  }
  
+/* suspend_single_queue does not lock the dqm like the

+ * evict_process_queues_cpsch or evict_process_queues_nocpsch. You should
+ * lock the dqm before calling, and unlock after calling.
+ *
+ * The reason we don't lock the dqm is because this function may be
+ * called on multiple queues in a loop, so rather than locking/unlocking
+ * multiple times, we will just keep the dqm locked for all of the calls.
+ */
+static int suspend_single_queue(struct device_queue_manager *dqm,
+ str
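The locking convention spelled out in the comment above — the caller takes the dqm lock once around the whole loop of per-queue calls — can be sketched with a toy userspace model (names and structure here are illustrative, not the kernel's):

```c
#include <assert.h>

struct dqm_model {
	int locked;
	int suspended[8];
};

static void dqm_lock(struct dqm_model *d)   { assert(!d->locked); d->locked = 1; }
static void dqm_unlock(struct dqm_model *d) { assert(d->locked);  d->locked = 0; }

/* Per-queue helper: does not take the lock itself;
 * the caller must already hold it. */
static void suspend_single_queue_model(struct dqm_model *d, int qid)
{
	assert(d->locked);
	d->suspended[qid] = 1;
}

/* Caller locks once around all the per-queue calls instead of
 * locking/unlocking on every iteration. */
static void suspend_queues_model(struct dqm_model *d, const int *qids, int n)
{
	dqm_lock(d);
	for (int i = 0; i < n; i++)
		suspend_single_queue_model(d, qids[i]);
	dqm_unlock(d);
}
```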

Re: [PATCH 20/29] drm/amdkfd: add debug wave launch override operation

2022-11-29 Thread Felix Kuehling

On 2022-10-31 12:23, Jonathan Kim wrote:

This operation allows the debugger to override the enabled HW
exceptions on the device.

On debug devices that only support the debugging of a single process,
the HW exceptions are global and set through the SPI_GDBG_TRAP_MASK
register.
Because they are global, only address watch exceptions are allowed to
be enabled.  In other words, the debugger must preserve all non-address
watch exception states in normal mode operation by barring a full
replacement override or a non-address watch override request.

For multi-process debugging, all HW exception overrides are per-VMID so
all exceptions can be overridden or fully replaced.

In order for the debugger to know what is permissible, return the
supported override mask back to the debugger along with the previously
enabled overrides.

v2: switch unsupported override mode return from EPERM to EINVAL to
support unique EPERM on PTRACE failure.
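The per-VMID override in this patch merges the requested bits into the previous exception mask; the merge rule can be modeled in plain C as follows (an illustrative userspace sketch of the logic, not the kernel code itself):

```c
#include <stdint.h>

/* Only the exception bits named in trap_mask_request are taken from
 * trap_mask_bits; every other bit keeps its previous value. */
static uint32_t merge_trap_mask(uint32_t prev, uint32_t bits, uint32_t request)
{
	return (bits & request) | (prev & ~request);
}
```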

Signed-off-by: Jonathan Kim 


Reviewed-by: Felix Kuehling 



---
  .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  | 47 ++
  .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |  2 +
  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 55 
  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h| 10 +++
  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10_3.c  |  5 +-
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 55 
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 10 +++
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  7 ++
  drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 65 +++
  drivers/gpu/drm/amd/amdkfd/kfd_debug.h|  6 ++
  10 files changed, 261 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index c9629fc5460c..a5003f6f05bf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -25,6 +25,7 @@
  #include "amdgpu_amdkfd_gfx_v9.h"
  #include "gc/gc_9_4_2_offset.h"
  #include "gc/gc_9_4_2_sh_mask.h"
+#include 
  
  /* returns TRAP_EN, EXCP_EN and EXCP_REPLACE. */

  static uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev,
@@ -54,6 +55,50 @@ static uint32_t kgd_aldebaran_disable_debug_trap(struct 
amdgpu_device *adev,
return data;
  }
  
+static int kgd_aldebaran_validate_trap_override_request(struct amdgpu_device *adev,

+   uint32_t trap_override,
+   uint32_t *trap_mask_supported)
+{
+   *trap_mask_supported &= KFD_DBG_TRAP_MASK_FP_INVALID |
+   KFD_DBG_TRAP_MASK_FP_INPUT_DENORMAL |
+   KFD_DBG_TRAP_MASK_FP_DIVIDE_BY_ZERO |
+   KFD_DBG_TRAP_MASK_FP_OVERFLOW |
+   KFD_DBG_TRAP_MASK_FP_UNDERFLOW |
+   KFD_DBG_TRAP_MASK_FP_INEXACT |
+   KFD_DBG_TRAP_MASK_INT_DIVIDE_BY_ZERO |
+   KFD_DBG_TRAP_MASK_DBG_ADDRESS_WATCH |
+   KFD_DBG_TRAP_MASK_DBG_MEMORY_VIOLATION;
+
+   if (trap_override != KFD_DBG_TRAP_OVERRIDE_OR &&
+   trap_override != KFD_DBG_TRAP_OVERRIDE_REPLACE)
+   return -EPERM;
+
+   return 0;
+}
+
+/* returns TRAP_EN, EXCP_EN and EXCP_REPLACE. */
+static uint32_t kgd_aldebaran_set_wave_launch_trap_override(struct 
amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t trap_override,
+   uint32_t trap_mask_bits,
+   uint32_t trap_mask_request,
+   uint32_t *trap_mask_prev,
+   uint32_t kfd_dbg_trap_cntl_prev)
+
+{
+   uint32_t data = 0;
+
+   *trap_mask_prev = REG_GET_FIELD(kfd_dbg_trap_cntl_prev, 
SPI_GDBG_PER_VMID_CNTL, EXCP_EN);
+   trap_mask_bits = (trap_mask_bits & trap_mask_request) |
+   (*trap_mask_prev & ~trap_mask_request);
+
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, 1);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_EN, 
trap_mask_bits);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_REPLACE, 
trap_override);
+
+   return data;
+}
+
  const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -73,6 +118,8 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.set_vm_context_page_table_base = 
kgd_gfx_v9_set_vm_context_page_table_base,
.enable_debug_trap = kgd_aldebaran_enable_debug_trap,
.disable_debug_trap = kgd_aldebaran_disable_debug_trap,
+   .validate_trap_override_request = 
kgd_aldebaran_vali

Re: [PATCH 3/9] drm/ttm: use per BO cleanup workers

2022-11-29 Thread Felix Kuehling

On 2022-11-25 05:21, Christian König wrote:

Instead of a single worker going over the list of delete BOs in regular
intervals use a per BO worker which blocks for the resv object and
locking of the BO.

This not only simplifies the handling massively, but also results in
much better response time when cleaning up buffers.

Signed-off-by: Christian König 


Just thinking out loud: If I understand it correctly, this can cause a 
lot of sleeping worker threads when 
AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE is used and many BOs are freed at 
the same time. This happens e.g. when a KFD process terminates or 
crashes. I guess with a concurrency-managed workqueue this isn't going 
to be excessive. And since it's on a per device workqueue, it doesn't 
stall work items on the system work queue or from other devices.


I'm trying to understand why you set WQ_MEM_RECLAIM. This work queue is 
not about freeing ttm_resources but about freeing the BOs. But it 
affects freeing of ghost_objs that are holding the ttm_resources being 
freed.


If those assumptions all make sense, patches 1-3 are

Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   2 +-
  drivers/gpu/drm/i915/i915_gem.c|   2 +-
  drivers/gpu/drm/i915/intel_region_ttm.c|   2 +-
  drivers/gpu/drm/ttm/ttm_bo.c   | 112 -
  drivers/gpu/drm/ttm/ttm_bo_util.c  |   1 -
  drivers/gpu/drm/ttm/ttm_device.c   |  24 ++---
  include/drm/ttm/ttm_bo_api.h   |  18 +---
  include/drm/ttm/ttm_device.h   |   7 +-
  8 files changed, 57 insertions(+), 111 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2b1db37e25c1..74ccbd566777 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3984,7 +3984,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
amdgpu_fence_driver_hw_fini(adev);
  
  	if (adev->mman.initialized)

-   flush_delayed_work(&adev->mman.bdev.wq);
+   drain_workqueue(adev->mman.bdev.wq);
  
  	if (adev->pm_sysfs_en)

amdgpu_pm_sysfs_fini(adev);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8468ca9885fd..c38306f156d6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1099,7 +1099,7 @@ void i915_gem_drain_freed_objects(struct drm_i915_private 
*i915)
  {
while (atomic_read(&i915->mm.free_count)) {
flush_work(&i915->mm.free_work);
-   flush_delayed_work(&i915->bdev.wq);
+   drain_workqueue(i915->bdev.wq);
rcu_barrier();
}
  }
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index cf89d0c2a2d9..657bbc16a48a 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -132,7 +132,7 @@ int intel_region_ttm_fini(struct intel_memory_region *mem)
break;
  
  		msleep(20);

-   flush_delayed_work(&mem->i915->bdev.wq);
+   drain_workqueue(mem->i915->bdev.wq);
}
  
  	/* If we leaked objects, Don't free the region causing use after free */

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index b77262a623e0..4749b65bedc4 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -280,14 +280,13 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object 
*bo,
ret = 0;
}
  
-	if (ret || unlikely(list_empty(&bo->ddestroy))) {

+   if (ret) {
if (unlock_resv)
dma_resv_unlock(bo->base.resv);
spin_unlock(&bo->bdev->lru_lock);
return ret;
}
  
-	list_del_init(&bo->ddestroy);

spin_unlock(&bo->bdev->lru_lock);
ttm_bo_cleanup_memtype_use(bo);
  
@@ -300,47 +299,21 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo,

  }
  
  /*

- * Traverse the delayed list, and call ttm_bo_cleanup_refs on all
- * encountered buffers.
+ * Block for the dma_resv object to become idle, lock the buffer and clean up
+ * the resource and tt object.
   */
-bool ttm_bo_delayed_delete(struct ttm_device *bdev, bool remove_all)
+static void ttm_bo_delayed_delete(struct work_struct *work)
  {
-   struct list_head removed;
-   bool empty;
-
-   INIT_LIST_HEAD(&removed);
-
-   spin_lock(&bdev->lru_lock);
-   while (!list_empty(&bdev->ddestroy)) {
-   struct ttm_buffer_object *bo;
-
-   bo = list_first_entry(&bdev->ddestroy, struct ttm_buffer_object,
- ddestroy);
-   list_move_tail(&bo->ddestroy, &removed);
-   if (!ttm_bo_get_unless_zero(bo))
-   continue;
-
-   if (remove_all || bo->base.resv != &bo->base._resv) {
- 

[PATCH] drm/amd/display: use the proper fb offset for DM

2022-11-29 Thread Alex Deucher
This fixes DMCU initialization in APU GPU passthrough.  The
DMCU needs the GPU physical address, not the CPU physical
address.  This ends up working out on bare metal because
we always use the physical address, but doesn't work in
passthrough because the addresses are different.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 3792a181253b..850432e220a8 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1096,7 +1096,7 @@ static int dm_dmub_hw_init(struct amdgpu_device *adev)
/* Initialize hardware. */
memset(&hw_params, 0, sizeof(hw_params));
hw_params.fb_base = adev->gmc.fb_start;
-   hw_params.fb_offset = adev->gmc.aper_base;
+   hw_params.fb_offset = adev->vm_manager.vram_base_offset;
 
/* backdoor load firmware and trigger dmub running */
if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP)
@@ -1218,7 +1218,7 @@ static void mmhub_read_system_context(struct 
amdgpu_device *adev, struct dc_phy_
pa_config->system_aperture.agp_top = (uint64_t)agp_top << 24;
 
pa_config->system_aperture.fb_base = adev->gmc.fb_start;
-   pa_config->system_aperture.fb_offset = adev->gmc.aper_base;
+   pa_config->system_aperture.fb_offset = 
adev->vm_manager.vram_base_offset;
pa_config->system_aperture.fb_top = adev->gmc.fb_end;
 
pa_config->gart_config.page_table_start_addr = 
page_table_start.quad_part << 12;
-- 
2.38.1



Re: [Intel-gfx] [PATCH 7/9] drm/i915: stop using ttm_bo_wait

2022-11-29 Thread Matthew Auld
On Fri, 25 Nov 2022 at 11:14, Tvrtko Ursulin
 wrote:
>
>
> + Matt
>
> On 25/11/2022 10:21, Christian König wrote:
> > TTM is just wrapping core DMA functionality here, remove the mid-layer.
> > No functional change.
> >
> > Signed-off-by: Christian König 
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 ++---
> >   1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > index 5247d88b3c13..d409a77449a3 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > @@ -599,13 +599,16 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object 
> > *obj,
> >   static int i915_ttm_truncate(struct drm_i915_gem_object *obj)
> >   {
> >   struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
> > - int err;
> > + long err;
> >
> >   WARN_ON_ONCE(obj->mm.madv == I915_MADV_WILLNEED);
> >
> > - err = ttm_bo_wait(bo, true, false);
> > - if (err)
> > + err = dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP,
> > + true, 15 * HZ);
>
> This 15 second stuck out a bit for me and then on a slightly deeper look
> it seems this timeout will "leak" into a few of i915 code paths. If we
> look at the difference between the legacy shmem and ttm backend I am not
> sure if the legacy one is blocking or not - but if it can block I don't
> think it would have an arbitrary timeout like this. Matt your thoughts?

Not sure what is meant by leak here, but the legacy shmem must also
wait/block when unbinding each VMA, before calling truncate. It's the
same story for the ttm backend, except slightly more complicated in
that there might be no currently bound VMA, and yet the GPU could
still be accessing the pages due to async unbinds, kernel moves etc,
which the wait here (and in i915_ttm_shrink) is meant to protect
against. If the wait times out it should just fail gracefully. I guess
we could just use MAX_SCHEDULE_TIMEOUT here? Not sure if it really
matters though.
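For reference, dma_resv_wait_timeout() returns a negative error code, 0 when the wait timed out, or the remaining timeout in jiffies when the fences signaled. The caller-side mapping used in the patch can be modeled as (userspace sketch, function name illustrative):

```c
#include <errno.h>

/* Model of how the patch handles dma_resv_wait_timeout()'s return value:
 * <0 is an error to propagate, 0 means the wait timed out, and >0 means
 * the fences signaled with time to spare. */
static long map_wait_result(long ret)
{
	if (ret < 0)
		return ret;      /* propagate the error */
	if (ret == 0)
		return -EBUSY;   /* timed out: fail gracefully */
	return 0;            /* success */
}
```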

>
> Regards,
>
> Tvrtko
>
> > + if (err < 0)
> >   return err;
> > + if (err == 0)
> > + return -EBUSY;
> >
> >   err = i915_ttm_move_notify(bo);
> >   if (err)


Re: [PATCH 2/3] drm/amd/pm/smu11: poll BACO status after RPM BACO exits

2022-11-29 Thread Alex Deucher
Once these patches land, can we revert these changes?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0c85c067c9d9d7a1b2cc2e01a236d5d0d4a872b5
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=192039f12233c9063d040266e7c98188c7c89dec

Alex

On Tue, Nov 22, 2022 at 8:44 PM Guchun Chen  wrote:
>
> After executing BACO exit, driver needs to poll the status
> to ensure FW has completed BACO exit sequence to prevent
> timing issue.
>
> v2: use usleep_range to replace msleep to fix checkpatch.pl warnings
>
> Signed-off-by: Guchun Chen 
> ---
>  .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c| 24 ++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> index ad5f6a15a1d7..ad66d57aa102 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> @@ -79,6 +79,17 @@ MODULE_FIRMWARE("amdgpu/beige_goby_smc.bin");
>  #define mmTHM_BACO_CNTL_ARCT   0xA7
>  #define mmTHM_BACO_CNTL_ARCT_BASE_IDX  0
>
> +static void smu_v11_0_poll_baco_exit(struct smu_context *smu)
> +{
> +   struct amdgpu_device *adev = smu->adev;
> +   uint32_t data, loop = 0;
> +
> +   do {
> +   usleep_range(1000, 1100);
> +   data = RREG32_SOC15(THM, 0, mmTHM_BACO_CNTL);
> +   } while ((data & 0x100) && (++loop < 100));
> +}
> +
>  int smu_v11_0_init_microcode(struct smu_context *smu)
>  {
> struct amdgpu_device *adev = smu->adev;
> @@ -1689,7 +1700,18 @@ int smu_v11_0_baco_enter(struct smu_context *smu)
>
>  int smu_v11_0_baco_exit(struct smu_context *smu)
>  {
> -   return smu_v11_0_baco_set_state(smu, SMU_BACO_STATE_EXIT);
> +   int ret;
> +
> +   ret = smu_v11_0_baco_set_state(smu, SMU_BACO_STATE_EXIT);
> +   if (!ret) {
> +   /*
> +* Poll BACO exit status to ensure FW has completed
> +* BACO exit process to avoid timing issues.
> +*/
> +   smu_v11_0_poll_baco_exit(smu);
> +   }
> +
> +   return ret;
>  }
>
>  int smu_v11_0_mode1_reset(struct smu_context *smu)
> --
> 2.25.1
>


Re: Screen corruption using radeon kernel driver

2022-11-29 Thread Mikhail Krylov
On Tue, Nov 29, 2022 at 11:05:28AM -0500, Alex Deucher wrote:
> On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov  wrote:
> >
> > On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:
> > > On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  wrote:
> > > >
> > > > On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:
> > > >
> > > > >>> [excessive quoting removed]
> > > >
> > > > >> So, is there any progress on this issue? I do understand it's not a 
> > > > >> high
> > > > >> priority one, and today I've checked it on 6.0 kernel, and
> > > > >> unfortunately, it still persists...
> > > > >>
> > > > >> I'm considering writing a patch that will allow user to override
> > > > >> need_dma32/dma_bits setting with a module parameter. I'll have some 
> > > > >> time
> > > > >> after the New Year for that.
> > > > >>
> > > > >> Is it at all possible that such a patch will be merged into kernel?
> > > > >>
> > > > > On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  
> > > > > wrote:
> > > > > Unless someone familiar with HIMEM can figure out what is going wrong
> > > > > we should just revert the patch.
> > > > >
> > > > > Alex
> > > >
> > > >
> > > > Okay, I was suggesting that mostly because
> > > >
> > > > a) it works for me with dma_bits = 40 (I understand that's what it is
> > > > without the original patch applied);
> > > >
> > > > >> b) there's a hint of uncertainty on this line
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
> > > > saying that for AGP dma_bits = 32 is the safest option, so apparently 
> > > > there are
> > > > setups, unlike mine, where dma_bits = 32 is better than 40.
> > > >
> > > > But I'm in no position to argue, just wanted to make myself clear.
> > > > I'm okay with rebuilding the kernel for my machine until the original
> > > > patch is reverted or any other fix is applied.
> > >
> > > What GPU do you have and is it AGP?  If it is AGP, does setting
> > > radeon.agpmode=-1 also fix it?
> > >
> > > Alex
> >
> > That is ATI Radeon X1950, and, unfortunately, radeon.agpmode=-1 doesn't
> > help, it just makes 3D acceleration in games such as OpenArena stop
> > working.
> 
> Just to confirm, is the board AGP or PCIe?
> 
> Alex

It is AGP. That's an old machine.




Re: Screen corruption using radeon kernel driver

2022-11-29 Thread Mikhail Krylov
On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:
> On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  wrote:
> >
> > On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:
> >
> > >>> [excessive quoting removed]
> >
> > >> So, is there any progress on this issue? I do understand it's not a high
> > >> priority one, and today I've checked it on 6.0 kernel, and
> > >> unfortunately, it still persists...
> > >>
> > >> I'm considering writing a patch that will allow user to override
> > >> need_dma32/dma_bits setting with a module parameter. I'll have some time
> > >> after the New Year for that.
> > >>
> > >> Is it at all possible that such a patch will be merged into kernel?
> > >>
> > > On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  wrote:
> > > Unless someone familiar with HIMEM can figure out what is going wrong
> > > we should just revert the patch.
> > >
> > > Alex
> >
> >
> > Okay, I was suggesting that mostly because
> >
> > a) it works for me with dma_bits = 40 (I understand that's what it is
> > without the original patch applied);
> >
> > >> b) there's a hint of uncertainty on this line
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
> > saying that for AGP dma_bits = 32 is the safest option, so apparently there 
> > are
> > setups, unlike mine, where dma_bits = 32 is better than 40.
> >
> > But I'm in no position to argue, just wanted to make myself clear.
> > I'm okay with rebuilding the kernel for my machine until the original
> > patch is reverted or any other fix is applied.
> 
> What GPU do you have and is it AGP?  If it is AGP, does setting
> radeon.agpmode=-1 also fix it?
> 
> Alex

That is ATI Radeon X1950, and, unfortunately, radeon.agpmode=-1 doesn't
help, it just makes 3D acceleration in games such as OpenArena stop
working.




Re: Screen corruption using radeon kernel driver

2022-11-29 Thread Alex Deucher
On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov  wrote:
>
> On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:
> > On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  wrote:
> > >
> > > On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:
> > >
> > > >>> [excessive quoting removed]
> > >
> > > >> So, is there any progress on this issue? I do understand it's not a 
> > > >> high
> > > >> priority one, and today I've checked it on 6.0 kernel, and
> > > >> unfortunately, it still persists...
> > > >>
> > > >> I'm considering writing a patch that will allow user to override
> > > >> need_dma32/dma_bits setting with a module parameter. I'll have some 
> > > >> time
> > > >> after the New Year for that.
> > > >>
> > > >> Is it at all possible that such a patch will be merged into kernel?
> > > >>
> > > > On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  
> > > > wrote:
> > > > Unless someone familiar with HIMEM can figure out what is going wrong
> > > > we should just revert the patch.
> > > >
> > > > Alex
> > >
> > >
> > > Okay, I was suggesting that mostly because
> > >
> > > a) it works for me with dma_bits = 40 (I understand that's what it is
> > > without the original patch applied);
> > >
> > > b) there's a hint of uncertainty on this line
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
> > > saying that for AGP dma_bits = 32 is the safest option, so apparently 
> > > there are
> > > setups, unlike mine, where dma_bits = 32 is better than 40.
> > >
> > > But I'm in no position to argue, just wanted to make myself clear.
> > > I'm okay with rebuilding the kernel for my machine until the original
> > > patch is reverted or any other fix is applied.
> >
> > What GPU do you have and is it AGP?  If it is AGP, does setting
> > radeon.agpmode=-1 also fix it?
> >
> > Alex
>
> That is ATI Radeon X1950, and, unfortunately, radeon.agpmode=-1 doesn't
> help, it just makes 3D acceleration in games such as OpenArena stop
> working.

Just to confirm, is the board AGP or PCIe?

Alex


Re: [PATCH] drm: amdgpu: Fix logic error

2022-11-29 Thread Alex Deucher
Applied.  Thanks!

Alex

On Tue, Nov 29, 2022 at 2:49 AM Konstantin Meskhidze
 wrote:
>
> This commit fixes logic error in function 'amdgpu_hw_ip_info':
>- value 'uvd' might be 'vcn'.
>
> Signed-off-by: Konstantin Meskhidze 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index fe23e09eec98..28752a6a92c4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -424,7 +424,7 @@ static int amdgpu_hw_ip_info(struct amdgpu_device *adev,
> case AMDGPU_HW_IP_VCN_DEC:
> type = AMD_IP_BLOCK_TYPE_VCN;
> for (i = 0; i < adev->vcn.num_vcn_inst; i++) {
> -   if (adev->uvd.harvest_config & (1 << i))
> +   if (adev->vcn.harvest_config & (1 << i))
> continue;
>
> if (adev->vcn.inst[i].ring_dec.sched.ready)
> @@ -436,7 +436,7 @@ static int amdgpu_hw_ip_info(struct amdgpu_device *adev,
> case AMDGPU_HW_IP_VCN_ENC:
> type = AMD_IP_BLOCK_TYPE_VCN;
> for (i = 0; i < adev->vcn.num_vcn_inst; i++) {
> -   if (adev->uvd.harvest_config & (1 << i))
> +   if (adev->vcn.harvest_config & (1 << i))
> continue;
>
> for (j = 0; j < adev->vcn.num_enc_rings; j++)
> --
> 2.25.1
>


Re: AMD GPU problems under Xen

2022-11-29 Thread Alex Deucher
On Tue, Nov 29, 2022 at 10:15 AM Marek Marczykowski-Górecki
 wrote:
>
> On Tue, Nov 29, 2022 at 09:32:54AM -0500, Alex Deucher wrote:
> > On Mon, Nov 28, 2022 at 8:59 PM Demi Marie Obenour
> >  wrote:
> > >
> > > On Mon, Nov 28, 2022 at 11:18:00AM -0500, Alex Deucher wrote:
> > > > On Mon, Nov 28, 2022 at 2:18 AM Demi Marie Obenour
> > > >  wrote:
> > > > >
> > > > > Dear Christian:
> > > > >
> > > > > What is the status of the AMDGPU work for Xen dom0?  That was 
> > > > > mentioned in
> > > > > https://lore.kernel.org/dri-devel/b2dec9b3-03a7-e7ac-306e-1da024af8...@amd.com/
> > > > > and there have been bug reports to Qubes OS about problems with AMDGPU
> > > > > under Xen (such as 
> > > > > https://github.com/QubesOS/qubes-issues/issues/7648).
> > > >
> > > > I would say it's a work in progress.  It depends what GPU  you have
> > > > and what type of xen setup you are using (PV vs PVH, etc.).
> > >
> > > The current situation is:
> > >
> > > - dom0 is PV.
> > > - VMs with assigned PCI devices are HVM and use a Linux-based stubdomain
> > >   QEMU does not run in dom0.
> > > - Everything else is PVH.
> > >
> > > In the future, I believe the goal is to move away from PV and HVM in
> > > favor of PVH, though HVM support will remain for compatibility with
> > > guests (such as Windows) that need emulated devices.
> > >
> > > > In general, your best bet currently is dGPU add in boards because they
> > > > are largely self contained.
> > >
> > > The main problem is that for the trusted GUI to work, there needs to
> > > be at least one GPU attached to a trusted VM, such as the host or a
> > > dedicated GUI VM.  That VM will typically not be running graphics-
> > > intensive workloads, so the compute power of a dGPU is largely wasted.
> > > SR-IOV support would help with that, but the only GPU vendor with open
> > > source SR-IOV support is Intel and it is still not upstream.  I am also
> > > not certain if the support extends to Arc dGPUs.
> >
> > Can you elaborate on this?  Why wouldn't you just want to pass-through
> > a dGPU to a domU to use directly in the guest?
>
> You can do that, but if that's your only GPU in the system, you'll lose
> graphical interface for other guests.
> But yes, simply pass-through of a dGPU is enough in some setups.
>
> > Are you sure?  I didn't think intel's GVT solution was actually
> > SR-IOV.  I think GVT is just a paravirtualized solution.
>
> Yes, it's a paravirtualized solution, with device emulation done in the dom0
> kernel. Besides being a rather unusual approach in the Xen world
> (emulators, aka IOREQ servers, usually live in userspace), this puts a rather
> complex piece of code that interacts with untrusted data (instructions
> from guests) in almost the most privileged system component, with no
> ability to sandbox it in any way. We consider it too risky for Qubes OS,
> especially since the kernel patches were never accepted upstream and the
> Xen support is no longer maintained.
>
> The SR-IOV approach Demi is talking about is newer development,
> supported since Adler Lake (technically, IGD in Tiger Lake presents
> SR-IOV capability too, but officially it's supported since ADL). The driver
> for managing it is in the process of upstreaming. Some links here:
> https://github.com/intel/linux-intel-lts/issues/33
> (I have not tried it, yet)
>
> >  That aside,
> > we are working on enabling virtio gpu with our GPUs on xen in addition
> > to domU passthrough.
>
> That's interesting development. Please note, Linux recently (part of
> 6.1) gained support to use grant tables with virtio. This allows having
> backends without full access to guest's memory. The work is done in
> generic way, so a driver using proper APIs (including DMA) should work
> out in such setup out of the box. Please try to not break it :)
>
> > >
> > > > APUs and platforms with integrated dGPUs
> > > > are a bit more complicated as they tend to have more platform
> > > > dependencies like ACPI tables and methods in order for the driver to
> > > > be able to initialize the hardware properly.
> > >
> > > Is Xen dom0/domU support for such GPUs being worked on?  Is there an
> > > estimate as to when the needed support will be available upstream?  This
> > > is mostly directed at Christian and other people who work for hardware
> > > vendors.
> >
> > Yes, there are some minor fixes in the driver required which we'll be
> > sending out soon and we had to add some ACPI tables to the whitelist
> > in xen, but unfortunately the ACPI tables are AMD platform specific so
> > there has been pushback from the xen maintainers on accepting them
> > because they are not an official part of the ACPI spec.
>
> Can the driver work without them? Such dependency, as you noted above,
> make things rather complicated for pass-through (specific ACPI tables
> can probably be made available to the guest, but usually guest wouldn't
> see all the resources they talk about anyway).

Not really for APUs and dGPUs that are integrated into a 

Re: AMD GPU problems under Xen

2022-11-29 Thread Marek Marczykowski-Górecki
On Tue, Nov 29, 2022 at 09:32:54AM -0500, Alex Deucher wrote:
> On Mon, Nov 28, 2022 at 8:59 PM Demi Marie Obenour
>  wrote:
> >
> > On Mon, Nov 28, 2022 at 11:18:00AM -0500, Alex Deucher wrote:
> > > On Mon, Nov 28, 2022 at 2:18 AM Demi Marie Obenour
> > >  wrote:
> > > >
> > > > Dear Christian:
> > > >
> > > > What is the status of the AMDGPU work for Xen dom0?  That was mentioned 
> > > > in
> > > > https://lore.kernel.org/dri-devel/b2dec9b3-03a7-e7ac-306e-1da024af8...@amd.com/
> > > > and there have been bug reports to Qubes OS about problems with AMDGPU
> > > > under Xen (such as https://github.com/QubesOS/qubes-issues/issues/7648).
> > >
> > > I would say it's a work in progress.  It depends what GPU  you have
> > > and what type of xen setup you are using (PV vs PVH, etc.).
> >
> > The current situation is:
> >
> > - dom0 is PV.
> > - VMs with assigned PCI devices are HVM and use a Linux-based stubdomain;
> >   QEMU does not run in dom0.
> > - Everything else is PVH.
> >
> > In the future, I believe the goal is to move away from PV and HVM in
> > favor of PVH, though HVM support will remain for compatibility with
> > guests (such as Windows) that need emulated devices.
> >
> > > In general, your best bet currently is dGPU add in boards because they
> > > are largely self contained.
> >
> > The main problem is that for the trusted GUI to work, there needs to
> > be at least one GPU attached to a trusted VM, such as the host or a
> > dedicated GUI VM.  That VM will typically not be running graphics-
> > intensive workloads, so the compute power of a dGPU is largely wasted.
> > SR-IOV support would help with that, but the only GPU vendor with open
> > source SR-IOV support is Intel and it is still not upstream.  I am also
> > not certain if the support extends to Arc dGPUs.
> 
> Can you elaborate on this?  Why wouldn't you just want to pass-through
> a dGPU to a domU to use directly in the guest?

You can do that, but if that's your only GPU in the system, you'll lose
the graphical interface for the other guests.
But yes, simply pass-through of a dGPU is enough in some setups.

> Are you sure?  I didn't think intel's GVT solution was actually
> SR-IOV.  I think GVT is just a paravirtualized solution.

Yes, it's a paravirtualized solution, with device emulation done in the dom0
kernel. This, besides being a rather unusual approach in the Xen world
(emulators, a.k.a. IOREQ servers, usually live in userspace), puts a rather
complex piece of code that interacts with untrusted data (instructions
from guests) into almost the most privileged system component, without any
ability to sandbox it. We consider it too risky for Qubes OS,
especially since the kernel patches were never accepted upstream and the
Xen support is no longer maintained.

The SR-IOV approach Demi is talking about is a newer development,
supported since Alder Lake (technically, the IGD in Tiger Lake presents the
SR-IOV capability too, but it is officially supported only since ADL). The
driver for managing it is in the process of being upstreamed. Some links here:
https://github.com/intel/linux-intel-lts/issues/33
(I have not tried it yet.)

>  That aside,
> we are working on enabling virtio gpu with our GPUs on xen in addition
> to domU passthrough.

That's an interesting development. Please note that Linux recently (as part
of 6.1) gained support for using grant tables with virtio. This allows having
backends without full access to the guest's memory. The work is done in a
generic way, so a driver using the proper APIs (including DMA) should work
in such a setup out of the box. Please try not to break it :)

> >
> > > APUs and platforms with integrated dGPUs
> > > are a bit more complicated as they tend to have more platform
> > > dependencies like ACPI tables and methods in order for the driver to
> > > be able to initialize the hardware properly.
> >
> > Is Xen dom0/domU support for such GPUs being worked on?  Is there an
> > estimate as to when the needed support will be available upstream?  This
> > is mostly directed at Christian and other people who work for hardware
> > vendors.
> 
> Yes, there are some minor fixes in the driver required which we'll be
> sending out soon and we had to add some ACPI tables to the whitelist
> in xen, but unfortunately the ACPI tables are AMD platform specific so
> there has been pushback from the xen maintainers on accepting them
> because they are not an official part of the ACPI spec.

Can the driver work without them? Such a dependency, as you noted above,
makes things rather complicated for pass-through (specific ACPI tables
can probably be made available to the guest, but usually the guest wouldn't
see all the resources they talk about anyway).

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


signature.asc
Description: PGP signature


Re: Screen corruption using radeon kernel driver

2022-11-29 Thread Alex Deucher
On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  wrote:
>
> On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:
>
> >>> [excessive quoting removed]
>
> >> So, is there any progress on this issue? I do understand it's not a high
> >> priority one, and today I've checked it on 6.0 kernel, and
> >> unfortunately, it still persists...
> >>
> >> I'm considering writing a patch that will allow user to override
> >> need_dma32/dma_bits setting with a module parameter. I'll have some time
> >> after the New Year for that.
> >>
> >> Is it at all possible that such a patch will be merged into kernel?
> >>
> > On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  wrote:
> > Unless someone familiar with HIMEM can figure out what is going wrong
> > we should just revert the patch.
> >
> > Alex
>
>
> Okay, I was suggesting that mostly because
>
> a) it works for me with dma_bits = 40 (I understand that's what it is
> without the original patch applied);
>
> b) there's a hint of uncertainty on this line
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
> saying that for AGP dma_bits = 32 is the safest option, so apparently
> there are setups, unlike mine, where dma_bits = 32 is better than 40.
>
> But I'm in no position to argue, just wanted to make myself clear.
> I'm okay with rebuilding the kernel for my machine until the original
> patch is reverted or any other fix is applied.

What GPU do you have and is it AGP?  If it is AGP, does setting
radeon.agpmode=-1 also fix it?

Alex


Re: AMD GPU problems under Xen

2022-11-29 Thread Alex Deucher
On Mon, Nov 28, 2022 at 8:59 PM Demi Marie Obenour
 wrote:
>
> On Mon, Nov 28, 2022 at 11:18:00AM -0500, Alex Deucher wrote:
> > On Mon, Nov 28, 2022 at 2:18 AM Demi Marie Obenour
> >  wrote:
> > >
> > > Dear Christian:
> > >
> > > What is the status of the AMDGPU work for Xen dom0?  That was mentioned in
> > > https://lore.kernel.org/dri-devel/b2dec9b3-03a7-e7ac-306e-1da024af8...@amd.com/
> > > and there have been bug reports to Qubes OS about problems with AMDGPU
> > > under Xen (such as https://github.com/QubesOS/qubes-issues/issues/7648).
> >
> > I would say it's a work in progress.  It depends on what GPU you have
> > and what type of Xen setup you are using (PV vs. PVH, etc.).
>
> The current situation is:
>
> - dom0 is PV.
> - VMs with assigned PCI devices are HVM and use a Linux-based stubdomain;
>   QEMU does not run in dom0.
> - Everything else is PVH.
>
> In the future, I believe the goal is to move away from PV and HVM in
> favor of PVH, though HVM support will remain for compatibility with
> guests (such as Windows) that need emulated devices.
>
> > In general, your best bet currently is dGPU add in boards because they
> > are largely self contained.
>
> The main problem is that for the trusted GUI to work, there needs to
> be at least one GPU attached to a trusted VM, such as the host or a
> dedicated GUI VM.  That VM will typically not be running graphics-
> intensive workloads, so the compute power of a dGPU is largely wasted.
> SR-IOV support would help with that, but the only GPU vendor with open
> source SR-IOV support is Intel and it is still not upstream.  I am also
> not certain if the support extends to Arc dGPUs.

Can you elaborate on this?  Why wouldn't you just want to pass-through
a dGPU to a domU to use directly in the guest?
Are you sure?  I didn't think intel's GVT solution was actually
SR-IOV.  I think GVT is just a paravirtualized solution.  That aside,
we are working on enabling virtio gpu with our GPUs on xen in addition
to domU passthrough.

>
> > APUs and platforms with integrated dGPUs
> > are a bit more complicated as they tend to have more platform
> > dependencies like ACPI tables and methods in order for the driver to
> > be able to initialize the hardware properly.
>
> Is Xen dom0/domU support for such GPUs being worked on?  Is there an
> estimate as to when the needed support will be available upstream?  This
> is mostly directed at Christian and other people who work for hardware
> vendors.

Yes, there are some minor fixes in the driver required which we'll be
sending out soon and we had to add some ACPI tables to the whitelist
in xen, but unfortunately the ACPI tables are AMD platform specific so
there has been pushback from the xen maintainers on accepting them
because they are not an official part of the ACPI spec.

Alex

>
> > Additionally, GPUs map a
> > lot of system memory, so bounce buffers aren't really viable.  You'll
> > really need an IOMMU,
>
> Qubes OS already needs an IOMMU so that is not a concern.
> --
> Sincerely,
> Demi Marie Obenour (she/her/hers)
> Invisible Things Lab


Re: [PATCH v3 1/2] drm: Add GPU reset sysfs event

2022-11-29 Thread Alex Deucher
On Fri, Nov 25, 2022 at 12:52 PM André Almeida  wrote:
>
> From: Shashank Sharma 
>
> Add a sysfs event to notify userspace about GPU resets providing:
> - PID that triggered the GPU reset, if any. Resets can happen from
>   kernel threads as well, in that case no PID is provided
> - Information about the reset (e.g. was VRAM lost?)
>
> Co-developed-by: André Almeida 
> Signed-off-by: André Almeida 
> Signed-off-by: Shashank Sharma 
> ---
>
> V3:
>- Reduce information to just PID and flags
>- Use pid pointer instead of just pid number
>- BUG() if no reset info is provided
>
> V2:
>- Addressed review comments from Christian and Amar
>- move the reset information structure to DRM layer
>- drop _ctx from struct name
>- make pid 32 bit(than 64)
>- set flag when VRAM invalid (than valid)
>- add process name as well (Amar)
> ---
>  drivers/gpu/drm/drm_sysfs.c | 26 ++
>  include/drm/drm_sysfs.h | 13 +
>  2 files changed, 39 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
> index 430e00b16eec..85777abf4194 100644
> --- a/drivers/gpu/drm/drm_sysfs.c
> +++ b/drivers/gpu/drm/drm_sysfs.c
> @@ -409,6 +409,32 @@ void drm_sysfs_hotplug_event(struct drm_device *dev)
>  }
>  EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>
> +/**
> + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU reset
> + * @dev: DRM device
> + * @reset_info: The contextual information about the reset (like PID, flags)
> + *
> + * Send a uevent for the DRM device specified by @dev. This informs the
> + * user that a GPU reset has occurred, so that an interested client
> + * can take any recovery or profiling measures.
> + */
> +void drm_sysfs_reset_event(struct drm_device *dev, struct 
> drm_reset_event_info *reset_info)
> +{
> +   unsigned char pid_str[13];
> +   unsigned char flags_str[18];
> +   unsigned char reset_str[] = "RESET=1";
> +   char *envp[] = { reset_str, pid_str, flags_str, NULL };
> +
> +   DRM_DEBUG("generating reset event\n");
> +
> +   BUG_ON(!reset_info);
> +
> +   snprintf(pid_str, sizeof(pid_str), "PID=%u", 
> pid_vnr(reset_info->pid));
> +   snprintf(flags_str, sizeof(flags_str), "FLAGS=0x%llx", 
> reset_info->flags);
> +   kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> +}
> +EXPORT_SYMBOL(drm_sysfs_reset_event);
> +
>  /**
>   * drm_sysfs_connector_hotplug_event - generate a DRM uevent for any 
> connector
>   * change
> diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
> index 6273cac44e47..dbb0ac6230b8 100644
> --- a/include/drm/drm_sysfs.h
> +++ b/include/drm/drm_sysfs.h
> @@ -2,15 +2,28 @@
>  #ifndef _DRM_SYSFS_H_
>  #define _DRM_SYSFS_H_
>
> +#define DRM_RESET_EVENT_VRAM_LOST (1 << 0)

I was thinking about this a bit more last night, and I think we should add:
DRM_RESET_EVENT_APP_ROBUSTNESS
When an application that supports robustness extensions starts, the
UMD can set a flag when it creates the context with the KMD.  That way,
if the app causes a GPU hang, the reset daemon would see this flag if
the guilty app supports robustness and adjust its behavior as
appropriate.  E.g., rather than killing the app, it might let it run
or set some grace period, etc.

Alex
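For illustration only (this is not part of the patch, and the payload shape is an assumption based on the envp array built in drm_sysfs_reset_event() above): a reset daemon would receive the uevent environment as NUL-separated KEY=VALUE strings and could decode it roughly like this.

```python
# Sketch of decoding a uevent environment payload of the shape produced by
# the proposed drm_sysfs_reset_event() (RESET=1, PID=..., FLAGS=...).
DRM_RESET_EVENT_VRAM_LOST = 1 << 0  # mirrors the flag defined in the patch

def parse_reset_uevent(payload: bytes) -> dict:
    """Split a NUL-separated payload like b'RESET=1\\0PID=1234\\0FLAGS=0x1'."""
    env = {}
    for entry in payload.split(b"\x00"):
        key, sep, value = entry.partition(b"=")
        if sep:  # skip entries without '='
            env[key.decode()] = value.decode()
    return env

event = parse_reset_uevent(b"RESET=1\x00PID=1234\x00FLAGS=0x1")
if event.get("RESET") == "1":
    flags = int(event["FLAGS"], 16)
    vram_lost = bool(flags & DRM_RESET_EVENT_VRAM_LOST)
    print(event["PID"], vram_lost)  # prints: 1234 True
```

A real daemon would get this payload from a kobject-uevent netlink socket or via libudev rather than from raw bytes; the entry point here is only for illustration.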


> +
>  struct drm_device;
>  struct device;
>  struct drm_connector;
>  struct drm_property;
>
> +/**
> + * struct drm_reset_event_info - Information about a GPU reset event
> + * @pid: Process that triggered the reset, if any
> + * @flags: Extra information around the reset event (e.g. is VRAM lost?)
> + */
> +struct drm_reset_event_info {
> +   struct pid *pid;
> +   uint64_t flags;
> +};
> +
>  int drm_class_device_register(struct device *dev);
>  void drm_class_device_unregister(struct device *dev);
>
>  void drm_sysfs_hotplug_event(struct drm_device *dev);
> +void drm_sysfs_reset_event(struct drm_device *dev, struct 
> drm_reset_event_info *reset_info);
>  void drm_sysfs_connector_hotplug_event(struct drm_connector *connector);
>  void drm_sysfs_connector_status_event(struct drm_connector *connector,
>   struct drm_property *property);
> --
> 2.38.1
>


Re: Re: Re: [PATCH v4] drm: Optimise for continuous memory allocation

2022-11-29 Thread Christian König

On 29.11.22 at 14:14, Pan, Xinhui wrote:

[AMD Official Use Only - General]

comments inline.


From: Koenig, Christian 
Sent: November 29, 2022, 20:07
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: dan...@ffwll.ch; matthew.a...@intel.com; dri-de...@lists.freedesktop.org; 
linux-ker...@vger.kernel.org; Paneer Selvam, Arunpravin; 
intel-...@lists.freedesktop.org
Subject: Re: Re: [PATCH v4] drm: Optimise for continuous memory allocation

On 29.11.22 at 12:54, Pan, Xinhui wrote:

[AMD Official Use Only - General]

comments inline.


From: Koenig, Christian 
Sent: November 29, 2022, 19:32
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: dan...@ffwll.ch; matthew.a...@intel.com; dri-de...@lists.freedesktop.org; 
linux-ker...@vger.kernel.org; Paneer Selvam, Arunpravin; 
intel-...@lists.freedesktop.org
Subject: Re: [PATCH v4] drm: Optimise for continuous memory allocation

On 29.11.22 at 11:56, xinhui pan wrote:

Currently drm-buddy does not have full knowledge of continuous memory.

Lets consider scenario below.
order 1:L R
order 0: LL   LR  RL  RR
for order 1 allocation, it can offer L or R or LR+RL.

For now, we only implement L or R case for continuous memory allocation.
So this patch aims to implement the rest cases.

Adding a new member leaf_link which links all leaf blocks in ascending
order. Now we can find more than two sub-order blocks more easily.
Say, order 4 can be combined with corresponding order 4, 2+2, 1+2+1,
0+1+2+0, 0+2+1+0.

Well, that description is a bit confusing and doesn't make too much sense
to me.

When you have two adjacent free order 0 blocks then those should be
automatically combined into an order 1. This is a fundamental property
of the buddy allocator, otherwise the whole algorithm won't work.
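The merge property described above can be sketched with a toy model (illustrative only, not the drm_buddy code): when a block is freed and its buddy is already free, the two coalesce into one block of the next order, repeating upward.

```python
# Toy buddy allocator free path: adjacent free buddies of the same order are
# automatically merged into a block of the next order.
def buddy_offset(offset, order):
    return offset ^ (1 << order)  # flip the bit separating the two buddies

def free_block(free_sets, offset, order, max_order):
    """free_sets: dict mapping order -> set of free block offsets."""
    while order < max_order and buddy_offset(offset, order) in free_sets[order]:
        free_sets[order].remove(buddy_offset(offset, order))
        offset &= ~(1 << order)  # merged block starts at the lower buddy
        order += 1
    free_sets[order].add(offset)

sets = {0: {1}, 1: set(), 2: set()}
free_block(sets, 0, 0, 2)  # freeing offset 0 merges with its free buddy at 1
print(sets)  # prints: {0: set(), 1: {0}, 2: set()}
```

This is exactly why two adjacent free order 0 blocks from the *same* parent can never coexist in the free lists; the fragmented cases discussed below involve blocks from different subtrees, which this merge step cannot combine.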

[xh] Sorry, the order above is not 4, it should be 3.
Order 3 can be combined with corresponding order 3, 2+2, 1+2+1, 0+1+2+0, or
0+2+1+0.
The 0+1+2+0 case does not have two adjacent free order 0 blocks in the same
tree; they are in different trees. It looks like below:

order 3:          L3                        R3
order 2:      L2      (R2)*          L2*
order 1:    L1  (R1)              L1
order 0:   L0  (R0)             (L0)

R0 + R1 + R2 + L0, marked with (), combine into an order 3 allocation.
R2 + L2, marked with *, combine into an order 3 allocation.
etc.

When you have the case of a free order 1 block with two adjacent free
order 0 blocks, then we have a fragmented address space. In this case the
best approach is to fail the allocation and start to swap things out.

[xh] Eviction is expensive.

No, it isn't. Eviction is part of the algorithm to clean this up.

When we can't find any free room then evicting and moving things back in
is the best we can do to de-fragment the address space.

This is expected behavior.

[xh] I believe eviction is the best approach to clean up memory.
But since its cost is not cheap, it should be the final step.
As long as we can find any room to satisfy the request, there is no need
to trigger eviction.
Just a test in theory:
two threads run in parallel.
total memory is 128.
while true {
alloc 32
alloc 32
free 32
free 32
alloc 64
free 64
}

when thread 0 wants to alloc 64, the memory layout might be
((32) means allocated, _32_ means free):
case 1: (32) _32_ _32_ (32)
case 2: (32) _32_ (32) _32_
case 3: (32) (32)  _64_
case 4: (32) _32_ _64_
case 5: _128_
case 6: (64) _64_

Without this patch, it would trigger eviction in case 1 and case 2.
With this patch, it would trigger eviction only in case 2.
Obviously, the two threads together consume at most 128 memory at any
time; there is no overcommit.
The less eviction, the better.
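The trade-off argued here can be made concrete with a toy model (a sketch built on the thread's own example, not the kernel code): walk the blocks in address order and check whether adjacent free blocks add up to the request before falling back to eviction — the core idea of find_continuous_blocks().

```python
# Toy model of scanning address-ordered blocks for a contiguous free run
# large enough for the request, before resorting to eviction.
def can_satisfy(layout, request):
    """layout: list of (size, is_free) tuples in ascending address order."""
    run = 0
    for size, is_free in layout:
        if is_free:
            run += size  # extend the current contiguous free run
            if run >= request:
                return True
        else:
            run = 0  # an allocated block breaks contiguity
    return False

case1 = [(32, False), (32, True), (32, True), (32, False)]  # (32) _32_ _32_ (32)
case2 = [(32, False), (32, True), (32, False), (32, True)]  # (32) _32_ (32) _32_
print(can_satisfy(case1, 64), can_satisfy(case2, 64))  # prints: True False
```

This matches the claim above: case 1 is satisfiable without eviction (two adjacent free 32s form 64), while case 2 still requires eviction.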


No, once more: Eviction is part of why this works as it should.

In other words, eviction is expected here and de-fragments the address
space into larger blocks.


This patch here breaks the general approach of the buddy allocator and 
is a no-go as far as I can see.


If looking at adjacent blocks came without extra cost, then we could
consider it, but this approach means extra overhead and complexity.


Regards,
Christian.




Regards,
Christian.


And if it still fails to find the continuous memory with this approach, then 
let's evict.

So what exactly is the goal here?

Regards,
Christian.


Signed-off-by: xinhui pan 
---
change from v3:
reworked totally. adding leaf_link.

change from v2:
search continuous block in nearby root if needed

change from v1:
implement top-down continuous allocation
---
drivers/gpu/drm/drm_buddy.c | 108 +---
include/drm/drm_buddy.h |   1 +
2 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 11bb59399471..8edafb99b02c 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -80,6 +80,7 @@ int dr

Re: Re: [PATCH v4] drm: Optimise for continuous memory allocation

2022-11-29 Thread Pan, Xinhui
[AMD Official Use Only - General]

comments inline.


From: Koenig, Christian 
Sent: November 29, 2022, 20:07
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: dan...@ffwll.ch; matthew.a...@intel.com; dri-de...@lists.freedesktop.org; 
linux-ker...@vger.kernel.org; Paneer Selvam, Arunpravin; 
intel-...@lists.freedesktop.org
Subject: Re: Re: [PATCH v4] drm: Optimise for continuous memory allocation

On 29.11.22 at 12:54, Pan, Xinhui wrote:
> [AMD Official Use Only - General]
>
> comments inline.
>
> 
> From: Koenig, Christian 
> Sent: November 29, 2022, 19:32
> To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
> Cc: dan...@ffwll.ch; matthew.a...@intel.com; dri-de...@lists.freedesktop.org; 
> linux-ker...@vger.kernel.org; Paneer Selvam, Arunpravin; 
> intel-...@lists.freedesktop.org
> Subject: Re: [PATCH v4] drm: Optimise for continuous memory allocation
>
On 29.11.22 at 11:56, xinhui pan wrote:
>> Currently drm-buddy does not have full knowledge of continuous memory.
>>
>> Lets consider scenario below.
>> order 1:L R
>> order 0: LL   LR  RL  RR
>> for order 1 allocation, it can offer L or R or LR+RL.
>>
>> For now, we only implement L or R case for continuous memory allocation.
>> So this patch aims to implement the rest cases.
>>
>> Adding a new member leaf_link which links all leaf blocks in ascending
>> order. Now we can find more than two sub-order blocks more easily.
>> Say, order 4 can be combined with corresponding order 4, 2+2, 1+2+1,
>> 0+1+2+0, 0+2+1+0.
> Well, that description is a bit confusing and doesn't make too much sense
> to me.
>
> When you have two adjacent free order 0 blocks then those should be
> automatically combined into an order 1. This is a fundamental property
> of the buddy allocator, otherwise the whole algorithm won't work.
>
> [xh] Sorry, the order above is not 4, it should be 3.
> Order 3 can be combined with corresponding order 3, 2+2, 1+2+1, 0+1+2+0, or
> 0+2+1+0.
> The 0+1+2+0 case does not have two adjacent free order 0 blocks in the same
> tree; they are in different trees. It looks like below:
>
> order 3:          L3                        R3
> order 2:      L2      (R2)*          L2*
> order 1:    L1  (R1)              L1
> order 0:   L0  (R0)             (L0)
>
> R0 + R1 + R2 + L0, marked with (), combine into an order 3 allocation.
> R2 + L2, marked with *, combine into an order 3 allocation.
> etc.
>
> When you have the case of a free order 1 block with two adjacent free
> order 0 blocks, then we have a fragmented address space. In this case the
> best approach is to fail the allocation and start to swap things out.
>
> [xh] Eviction is expensive.

No, it isn't. Eviction is part of the algorithm to clean this up.

When we can't find any free room then evicting and moving things back in
is the best we can do to de-fragment the address space.

This is expected behavior.

[xh] I believe eviction is the best approach to clean up memory.
But since its cost is not cheap, it should be the final step.
As long as we can find any room to satisfy the request, there is no need
to trigger eviction.

Just a test in theory:
two threads run in parallel.
total memory is 128.
while true {
alloc 32
alloc 32
free 32
free 32
alloc 64
free 64
}

when thread 0 wants to alloc 64, the memory layout might be
((32) means allocated, _32_ means free):
case 1: (32) _32_ _32_ (32)
case 2: (32) _32_ (32) _32_
case 3: (32) (32)  _64_
case 4: (32) _32_ _64_
case 5: _128_
case 6: (64) _64_

Without this patch, it would trigger eviction in case 1 and case 2.
With this patch, it would trigger eviction only in case 2.
Obviously, the two threads together consume at most 128 memory at any
time; there is no overcommit.
The less eviction, the better.


Regards,
Christian.

> And if it still fails to find the continuous memory with this approach, then 
> let's evict.
>
> So what exactly is the goal here?
>
> Regards,
> Christian.
>
>> Signed-off-by: xinhui pan 
>> ---
>> change from v3:
>> reworked totally. adding leaf_link.
>>
>> change from v2:
>> search continuous block in nearby root if needed
>>
>> change from v1:
>> implement top-down continuous allocation
>> ---
>>drivers/gpu/drm/drm_buddy.c | 108 +---
>>include/drm/drm_buddy.h |   1 +
>>2 files changed, 102 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
>> index 11bb59399471..8edafb99b02c 100644
>> --- a/drivers/gpu/drm/drm_buddy.c
>> +++ b/drivers/gpu/drm/drm_buddy.c
>> @@ -80,6 +80,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
>> chunk_size)
>>{
>>unsigned int i;
>>u64 offset;
>> + LIST_HEAD(leaf);
>>
>>if (size < chunk_size)
>>return -EINVAL;
>> @@ -136,6 +137,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
>> 

[PATCH v5] drm: Optimise for continuous memory allocation

2022-11-29 Thread xinhui pan
Currently drm-buddy does not have full knowledge of continuous memory.

Adding a new member leaf_link which links all leaf blocks in ascending
order. Finding continuous memory within this leaf_link list is easier.

Say, memory of order 3 can be combined with corresponding memory of
order 3 or 2+2 or 1+2+1 or 0+1+2+0 or 0+2+1+0.
Without this patch, eviction is the final step to clean up memory.
Now there is a chance to delay the eviction and thereby reduce the total
count of evictions.

Signed-off-by: xinhui pan 
---
change from v4:
Fix offset check by using <= instead of <
Change patch description.

change from v3:
reworked totally. adding leaf_link.

change from v2:
search continuous block in nearby root if needed

change from v1:
implement top-down continuous allocation
---
 drivers/gpu/drm/drm_buddy.c | 108 +---
 include/drm/drm_buddy.h |   1 +
 2 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 11bb59399471..00dd6da1e948 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -80,6 +80,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
chunk_size)
 {
unsigned int i;
u64 offset;
+   LIST_HEAD(leaf);
 
if (size < chunk_size)
return -EINVAL;
@@ -136,6 +137,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
chunk_size)
goto out_free_roots;
 
mark_free(mm, root);
+   list_add_tail(&root->leaf_link, &leaf);
 
BUG_ON(i > mm->max_order);
BUG_ON(drm_buddy_block_size(mm, root) < chunk_size);
@@ -147,6 +149,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
chunk_size)
i++;
} while (size);
 
+   list_del(&leaf);
return 0;
 
 out_free_roots:
@@ -205,6 +208,9 @@ static int split_block(struct drm_buddy *mm,
mark_free(mm, block->left);
mark_free(mm, block->right);
 
+   list_add(&block->right->leaf_link, &block->leaf_link);
+   list_add(&block->left->leaf_link, &block->leaf_link);
+   list_del(&block->leaf_link);
mark_split(block);
 
return 0;
@@ -256,6 +262,9 @@ static void __drm_buddy_free(struct drm_buddy *mm,
break;
 
list_del(&buddy->link);
+   list_add(&parent->leaf_link, &block->leaf_link);
+   list_del(&buddy->leaf_link);
+   list_del(&block->leaf_link);
 
drm_block_free(mm, block);
drm_block_free(mm, buddy);
@@ -386,6 +395,78 @@ alloc_range_bias(struct drm_buddy *mm,
return ERR_PTR(err);
 }
 
+static struct drm_buddy_block *
+find_continuous_blocks(struct drm_buddy *mm,
+  int order,
+  unsigned long flags,
+  struct drm_buddy_block **rblock)
+{
+   struct list_head *head = &mm->free_list[order];
+   struct drm_buddy_block *free_block, *max_block = NULL, *end, *begin;
+   u64 pages = BIT(order + 1);
+   u64 cur_pages;
+
+   list_for_each_entry(free_block, head, link) {
+   if (max_block) {
+   if (!(flags & DRM_BUDDY_TOPDOWN_ALLOCATION))
+   break;
+
+   if (drm_buddy_block_offset(free_block) <
+   drm_buddy_block_offset(max_block))
+   continue;
+   }
+
+   cur_pages = BIT(order);
+   begin = end = free_block;
+   while (true) {
+   struct drm_buddy_block *prev, *next;
+   int prev_order, next_order;
+
+   prev = list_prev_entry(begin, leaf_link);
+   if (!drm_buddy_block_is_free(prev) ||
+   drm_buddy_block_offset(prev) >=
+   drm_buddy_block_offset(begin)) {
+   prev = NULL;
+   }
+   next = list_next_entry(end, leaf_link);
+   if (!drm_buddy_block_is_free(next) ||
+   drm_buddy_block_offset(next) <=
+   drm_buddy_block_offset(end)) {
+   next = NULL;
+   }
+   if (!prev && !next)
+   break;
+
+   prev_order = prev ? drm_buddy_block_order(prev) : -1;
+   next_order = next ? drm_buddy_block_order(next) : -1;
+   if (next_order >= prev_order) {
+   BUG_ON(drm_buddy_block_offset(end) +
+  drm_buddy_block_size(mm, end) !=
+  drm_buddy_block_offset(next));
+   end = next;
+   cur_pages += BIT(drm_buddy_block_order(next));

Re: Re: [PATCH v4] drm: Optimise for continuous memory allocation

2022-11-29 Thread Christian König

On 29.11.22 at 12:54, Pan, Xinhui wrote:

[AMD Official Use Only - General]

comments inline.


From: Koenig, Christian 
Sent: November 29, 2022, 19:32
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: dan...@ffwll.ch; matthew.a...@intel.com; dri-de...@lists.freedesktop.org; 
linux-ker...@vger.kernel.org; Paneer Selvam, Arunpravin; 
intel-...@lists.freedesktop.org
Subject: Re: [PATCH v4] drm: Optimise for continuous memory allocation

On 29.11.22 at 11:56, xinhui pan wrote:

Currently drm-buddy does not have full knowledge of continuous memory.

Lets consider scenario below.
order 1:L R
order 0: LL   LR  RL  RR
for order 1 allocation, it can offer L or R or LR+RL.

For now, we only implement L or R case for continuous memory allocation.
So this patch aims to implement the rest cases.

Adding a new member leaf_link which links all leaf blocks in ascending
order. Now we can find more than two sub-order blocks more easily.
Say, order 4 can be combined with corresponding order 4, 2+2, 1+2+1,
0+1+2+0, 0+2+1+0.

Well, that description is a bit confusing and doesn't make too much sense
to me.

When you have two adjacent free order 0 blocks then those should be
automatically combined into an order 1. This is a fundamental property
of the buddy allocator, otherwise the whole algorithm won't work.

[xh] Sorry, the order above is not 4, it should be 3.
Order 3 can be combined with corresponding order 3, 2+2, 1+2+1, 0+1+2+0, or
0+2+1+0.
The 0+1+2+0 case does not have two adjacent free order 0 blocks in the same
tree; they are in different trees. It looks like below:

order 3:          L3                        R3
order 2:      L2      (R2)*          L2*
order 1:    L1  (R1)              L1
order 0:   L0  (R0)             (L0)

R0 + R1 + R2 + L0, marked with (), combine into an order 3 allocation.
R2 + L2, marked with *, combine into an order 3 allocation.
etc.

When you have the case of a free order 1 block with two adjacent free
order 0 blocks, then we have a fragmented address space. In this case the
best approach is to fail the allocation and start to swap things out.

[xh] Eviction is expensive.


No, it isn't. Eviction is part of the algorithm to clean this up.

When we can't find any free room then evicting and moving things back in 
is the best we can do to de-fragment the address space.


This is expected behavior.

Regards,
Christian.


And if it still fails to find the continuous memory with this approach, then 
let's evict.

So what exactly is the goal here?

Regards,
Christian.


Signed-off-by: xinhui pan 
---
change from v3:
reworked totally. adding leaf_link.

change from v2:
search continuous block in nearby root if needed

change from v1:
implement top-down continuous allocation
---
   drivers/gpu/drm/drm_buddy.c | 108 +---
   include/drm/drm_buddy.h |   1 +
   2 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 11bb59399471..8edafb99b02c 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -80,6 +80,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
chunk_size)
   {
   unsigned int i;
   u64 offset;
+ LIST_HEAD(leaf);

   if (size < chunk_size)
   return -EINVAL;
@@ -136,6 +137,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
chunk_size)
   goto out_free_roots;

   mark_free(mm, root);
+ list_add_tail(&root->leaf_link, &leaf);

   BUG_ON(i > mm->max_order);
   BUG_ON(drm_buddy_block_size(mm, root) < chunk_size);
@@ -147,6 +149,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
chunk_size)
   i++;
   } while (size);

+ list_del(&leaf);
   return 0;

   out_free_roots:
@@ -205,6 +208,9 @@ static int split_block(struct drm_buddy *mm,
   mark_free(mm, block->left);
   mark_free(mm, block->right);

+ list_add(&block->right->leaf_link, &block->leaf_link);
+ list_add(&block->left->leaf_link, &block->leaf_link);
+ list_del(&block->leaf_link);
   mark_split(block);

   return 0;
@@ -256,6 +262,9 @@ static void __drm_buddy_free(struct drm_buddy *mm,
   break;

   list_del(&buddy->link);
+ list_add(&parent->leaf_link, &block->leaf_link);
+ list_del(&buddy->leaf_link);
+ list_del(&block->leaf_link);

   drm_block_free(mm, block);
   drm_block_free(mm, buddy);
@@ -386,6 +395,78 @@ alloc_range_bias(struct drm_buddy *mm,
   return ERR_PTR(err);
   }

+static struct drm_buddy_block *
+find_continuous_blocks(struct drm_buddy *mm,
+int order,
+unsigned long flags,
+struct drm_buddy_block **rblock)
+{
+ struct

Re: [PATCH v4] drm: Optimise for continuous memory allocation

2022-11-29 Thread Pan, Xinhui
[AMD Official Use Only - General]

comments inline.


From: Koenig, Christian 
Sent: November 29, 2022, 19:32
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: dan...@ffwll.ch; matthew.a...@intel.com; dri-de...@lists.freedesktop.org; 
linux-ker...@vger.kernel.org; Paneer Selvam, Arunpravin; 
intel-...@lists.freedesktop.org
Subject: Re: [PATCH v4] drm: Optimise for continuous memory allocation

On 29.11.22 at 11:56, xinhui pan wrote:
> Currently drm-buddy does not have full knowledge of continuous memory.
>
> Lets consider scenario below.
> order 1:L R
> order 0: LL   LR  RL  RR
> for order 1 allocation, it can offer L or R or LR+RL.
>
> For now, we only implement L or R case for continuous memory allocation.
> So this patch aims to implement the rest cases.
>
> Adding a new member leaf_link which links all leaf blocks in ascending
> order. Now we can find more than two sub-order blocks more easily.
> Say, order 4 can be combined with corresponding order 4, 2+2, 1+2+1,
> 0+1+2+0, 0+2+1+0.

Well, that description is a bit confusing and doesn't make too much sense
to me.

When you have two adjacent free order 0 blocks then those should be
automatically combined into an order 1. This is a fundamental property
of the buddy allocator, otherwise the whole algorithm won't work.

[xh] Sorry, the order above is not 4, it should be 3.
Order 3 can be combined with corresponding order 3, 2+2, 1+2+1, 0+1+2+0, or
0+2+1+0.
The 0+1+2+0 case does not have two adjacent free order 0 blocks in the same
tree; they are in different trees. It looks like below:

order 3:          L3                        R3
order 2:      L2      (R2)*          L2*
order 1:    L1  (R1)              L1
order 0:   L0  (R0)             (L0)

R0 + R1 + R2 + L0, marked with (), combine into an order 3 allocation.
R2 + L2, marked with *, combine into an order 3 allocation.
etc.

When you have the case of a free order 1 block with two adjacent free
order 0 blocks then we have a fragmented address space. In this case the best
approach is to fail the allocation and start to swap things out.

[xh] Eviction is expensive.
And if we still fail to find continuous memory with this approach, then
let's evict.

So what exactly is the goal here?

Regards,
Christian.

>
> Signed-off-by: xinhui pan 
> ---
> change from v3:
> reworked totally. adding leaf_link.
>
> change from v2:
> search continuous block in nearby root if needed
>
> change from v1:
> implement top-down continuous allocation
> ---
>   drivers/gpu/drm/drm_buddy.c | 108 +---
>   include/drm/drm_buddy.h |   1 +
>   2 files changed, 102 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
> index 11bb59399471..8edafb99b02c 100644
> --- a/drivers/gpu/drm/drm_buddy.c
> +++ b/drivers/gpu/drm/drm_buddy.c
> @@ -80,6 +80,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
> chunk_size)
>   {
>   unsigned int i;
>   u64 offset;
> + LIST_HEAD(leaf);
>
>   if (size < chunk_size)
>   return -EINVAL;
> @@ -136,6 +137,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
> chunk_size)
>   goto out_free_roots;
>
>   mark_free(mm, root);
> + list_add_tail(&root->leaf_link, &leaf);
>
>   BUG_ON(i > mm->max_order);
>   BUG_ON(drm_buddy_block_size(mm, root) < chunk_size);
> @@ -147,6 +149,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
> chunk_size)
>   i++;
>   } while (size);
>
> + list_del(&leaf);
>   return 0;
>
>   out_free_roots:
> @@ -205,6 +208,9 @@ static int split_block(struct drm_buddy *mm,
>   mark_free(mm, block->left);
>   mark_free(mm, block->right);
>
> + list_add(&block->right->leaf_link, &block->leaf_link);
> + list_add(&block->left->leaf_link, &block->leaf_link);
> + list_del(&block->leaf_link);
>   mark_split(block);
>
>   return 0;
> @@ -256,6 +262,9 @@ static void __drm_buddy_free(struct drm_buddy *mm,
>   break;
>
>   list_del(&buddy->link);
> + list_add(&parent->leaf_link, &block->leaf_link);
> + list_del(&buddy->leaf_link);
> + list_del(&block->leaf_link);
>
>   drm_block_free(mm, block);
>   drm_block_free(mm, buddy);
> @@ -386,6 +395,78 @@ alloc_range_bias(struct drm_buddy *mm,
>   return ERR_PTR(err);
>   }
>
> +static struct drm_buddy_block *
> +find_continuous_blocks(struct drm_buddy *mm,
> +int order,
> +unsigned long flags,
> +struct drm_buddy_block **rblock)
> +{
> + struct list_head *head = &mm->free_list[order];
> + struct drm_buddy_block *free_block, *max_block = NULL, *end, *begin;
> + u64 pages = BIT(order + 1);

Re: [PATCH 1/4] drm/amdgpu: Introduce gfx software ring (v9)

2022-11-29 Thread Christian König

On 2022-11-29 08:10, jiadong@amd.com wrote:

From: "Jiadong.Zhu" 

The software ring is created to support priority contexts while there is only
one hardware queue for gfx.

Every software ring has its own fence driver and can be used as an ordinary ring
for the GPU scheduler.
Multiple software rings are bound to a real ring with the ring muxer. The
packets committed on the software ring are copied to the real ring.

v2: Use array to store software ring entry.
v3: Remove unnecessary prints.
v4: Remove amdgpu_ring_sw_init/fini functions,
using gtt for sw ring buffer for later dma copy
optimization.
v5: Allocate ring entry dynamically in the muxer.
v6: Update comments for the ring muxer.
v7: Modify for function naming.
v8: Combine software ring functions into amdgpu_ring_mux.c
v9: Use kernel-doc comment on the get_rptr function.

Cc: Christian Koenig 
Cc: Luben Tuikov 
Cc: Andrey Grodzovsky  
Cc: Michel Dänzer 
Signed-off-by: Jiadong.Zhu 
Acked-by: Huang Rui 
Acked-by: Luben Tuikov 


Acked-by: Christian König  for the series.


---
  drivers/gpu/drm/amd/amdgpu/Makefile  |   3 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  |   3 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   4 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 221 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  76 +++
  5 files changed, 306 insertions(+), 1 deletion(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 74f109a56d90..f58aa5d2e83e 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -62,7 +62,8 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
amdgpu_vm_sdma.o amdgpu_discovery.o amdgpu_ras_eeprom.o amdgpu_nbio.o \
amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
amdgpu_fw_attestation.o amdgpu_securedisplay.o \
-   amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o
+   amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
+   amdgpu_ring_mux.o
  
  amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h

index 1e6e35ff3f13..7c2692f29311 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -33,6 +33,7 @@
  #include "amdgpu_imu.h"
  #include "soc15.h"
  #include "amdgpu_ras.h"
+#include "amdgpu_ring_mux.h"
  
  /* GFX current status */

  #define AMDGPU_GFX_NORMAL_MODE0xL
@@ -363,6 +364,8 @@ struct amdgpu_gfx {
struct amdgpu_gfx_ras   *ras;
  
  	boolis_poweron;

+
+   struct amdgpu_ring_mux  muxer;
  };
  
  #define amdgpu_gfx_get_gpu_clock_counter(adev) (adev)->gfx.funcs->get_gpu_clock_counter((adev))

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 82c178a9033a..8be51ebfedd5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -279,6 +279,10 @@ struct amdgpu_ring {
boolis_mes_queue;
uint32_thw_queue_id;
struct amdgpu_mes_ctx_data *mes_ctx;
+
+   boolis_sw_ring;
+   unsigned intentry_index;
+
  };
  
  #define amdgpu_ring_parse_cs(r, p, job, ib) ((r)->funcs->parse_cs((p), (job), (ib)))

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
new file mode 100644
index ..6fbf71451e29
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
@@ -0,0 +1,221 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include 
+#include 
+
+#include "amdgpu_ring_mux.h"
+#includ

Re: [PATCH v4] drm: Optimise for continuous memory allocation

2022-11-29 Thread Christian König

On 2022-11-29 11:56, xinhui pan wrote:

Currently drm-buddy does not have full knowledge of continuous memory.

Lets consider scenario below.
order 1:  L       R
order 0:  LL  LR  RL  RR
for order 1 allocation, it can offer L or R or LR+RL.

For now, we only implement L or R case for continuous memory allocation.
So this patch aims to implement the rest cases.

Adding a new member leaf_link which links all leaf blocks in ascending
order. Now we can find more than 2 sub-order blocks more easily.
Say, order 4 can be combined with corresponding order 4, 2+2, 1+2+1,
0+1+2+0, 0+2+1+0.


Well, that description is a bit confusing and doesn't make too much sense
to me.


When you have two adjacent free order 0 blocks then those should be 
automatically combined into an order 1. This is a fundamental property 
of the buddy allocator, otherwise the whole algorithm won't work.


When you have the case of a free order 1 block with two adjacent free
order 0 blocks then we have a fragmented address space. In this case the best
approach is to fail the allocation and start to swap things out.


So what exactly is the goal here?

Regards,
Christian.



Signed-off-by: xinhui pan 
---
change from v3:
reworked totally. adding leaf_link.

change from v2:
search continuous block in nearby root if needed

change from v1:
implement top-down continuous allocation
---
  drivers/gpu/drm/drm_buddy.c | 108 +---
  include/drm/drm_buddy.h |   1 +
  2 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 11bb59399471..8edafb99b02c 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -80,6 +80,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
chunk_size)
  {
unsigned int i;
u64 offset;
+   LIST_HEAD(leaf);
  
  	if (size < chunk_size)

return -EINVAL;
@@ -136,6 +137,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
chunk_size)
goto out_free_roots;
  
  		mark_free(mm, root);

+   list_add_tail(&root->leaf_link, &leaf);
  
  		BUG_ON(i > mm->max_order);

BUG_ON(drm_buddy_block_size(mm, root) < chunk_size);
@@ -147,6 +149,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
chunk_size)
i++;
} while (size);
  
+	list_del(&leaf);

return 0;
  
  out_free_roots:

@@ -205,6 +208,9 @@ static int split_block(struct drm_buddy *mm,
mark_free(mm, block->left);
mark_free(mm, block->right);
  
+	list_add(&block->right->leaf_link, &block->leaf_link);

+   list_add(&block->left->leaf_link, &block->leaf_link);
+   list_del(&block->leaf_link);
mark_split(block);
  
  	return 0;

@@ -256,6 +262,9 @@ static void __drm_buddy_free(struct drm_buddy *mm,
break;
  
  		list_del(&buddy->link);

+   list_add(&parent->leaf_link, &block->leaf_link);
+   list_del(&buddy->leaf_link);
+   list_del(&block->leaf_link);
  
  		drm_block_free(mm, block);

drm_block_free(mm, buddy);
@@ -386,6 +395,78 @@ alloc_range_bias(struct drm_buddy *mm,
return ERR_PTR(err);
  }
  
+static struct drm_buddy_block *

+find_continuous_blocks(struct drm_buddy *mm,
+  int order,
+  unsigned long flags,
+  struct drm_buddy_block **rblock)
+{
+   struct list_head *head = &mm->free_list[order];
+   struct drm_buddy_block *free_block, *max_block = NULL, *end, *begin;
+   u64 pages = BIT(order + 1);
+   u64 cur_pages;
+
+   list_for_each_entry(free_block, head, link) {
+   if (max_block) {
+   if (!(flags & DRM_BUDDY_TOPDOWN_ALLOCATION))
+   break;
+
+   if (drm_buddy_block_offset(free_block) <
+   drm_buddy_block_offset(max_block))
+   continue;
+   }
+
+   cur_pages = BIT(order);
+   begin = end = free_block;
+   while (true) {
+   struct drm_buddy_block *prev, *next;
+   int prev_order, next_order;
+
+   prev = list_prev_entry(begin, leaf_link);
+   if (!drm_buddy_block_is_free(prev) ||
+   drm_buddy_block_offset(prev) >
+   drm_buddy_block_offset(begin)) {
+   prev = NULL;
+   }
+   next = list_next_entry(end, leaf_link);
+   if (!drm_buddy_block_is_free(next) ||
+   drm_buddy_block_offset(next) <
+   drm_buddy_block_offset(end)) {
+   next = NULL;
+   }
+   if (!prev && !next)
+

Re: [PATCH v4] drm: Optimise for continuous memory allocation

2022-11-29 Thread Pan, Xinhui
[AMD Official Use Only - General]

In one ROCm + gdm restart test,
find_continuous_blocks() succeeds with a ratio of 35%.
The code coverage report is below.

 772  3998 :     if (order-- == min_order) {
 773   352 :             if (!(flags & DRM_BUDDY_RANGE_ALLOCATION) &&
 774   352 :                 min_order != 0 && pages == BIT(order + 1)) {
 775    79 :                     block = find_continuous_blocks(mm,
 776       :                                                    order,
 777       :                                                    flags,
 778       :                                                    &rblock);
 779    79 :                     if (block)
 780       :                             break;
 781       :             }
 782   300 :     err = -ENOSPC;
 783   300 :     goto err_free;

thanks
xinhui

From: Pan, Xinhui 
Sent: November 29, 2022, 18:56
To: amd-gfx@lists.freedesktop.org
Cc: dan...@ffwll.ch; matthew.a...@intel.com; Koenig, Christian; 
dri-de...@lists.freedesktop.org; linux-ker...@vger.kernel.org; Paneer Selvam, 
Arunpravin; intel-...@lists.freedesktop.org; Pan, Xinhui
Subject: [PATCH v4] drm: Optimise for continuous memory allocation

Currently drm-buddy does not have full knowledge of continuous memory.

Lets consider scenario below.
order 1:  L       R
order 0:  LL  LR  RL  RR
for order 1 allocation, it can offer L or R or LR+RL.

For now, we only implement L or R case for continuous memory allocation.
So this patch aims to implement the rest cases.

Adding a new member leaf_link which links all leaf blocks in ascending
order. Now we can find more than 2 sub-order blocks more easily.
Say, order 4 can be combined with corresponding order 4, 2+2, 1+2+1,
0+1+2+0, 0+2+1+0.

Signed-off-by: xinhui pan 
---
change from v3:
reworked totally. adding leaf_link.

change from v2:
search continuous block in nearby root if needed

change from v1:
implement top-down continuous allocation
---
 drivers/gpu/drm/drm_buddy.c | 108 +---
 include/drm/drm_buddy.h |   1 +
 2 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 11bb59399471..8edafb99b02c 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -80,6 +80,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
chunk_size)
 {
unsigned int i;
u64 offset;
+   LIST_HEAD(leaf);

if (size < chunk_size)
return -EINVAL;
@@ -136,6 +137,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
chunk_size)
goto out_free_roots;

mark_free(mm, root);
+   list_add_tail(&root->leaf_link, &leaf);

BUG_ON(i > mm->max_order);
BUG_ON(drm_buddy_block_size(mm, root) < chunk_size);
@@ -147,6 +149,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
chunk_size)
i++;
} while (size);

+   list_del(&leaf);
return 0;

 out_free_roots:
@@ -205,6 +208,9 @@ static int split_block(struct drm_buddy *mm,
mark_free(mm, block->left);
mark_free(mm, block->right);

+   list_add(&block->right->leaf_link, &block->leaf_link);
+   list_add(&block->left->leaf_link, &block->leaf_link);
+   list_del(&block->leaf_link);
mark_split(block);

return 0;
@@ -256,6 +262,9 @@ static void __drm_buddy_free(struct drm_buddy *mm,
break;

list_del(&buddy->link);
+   list_add(&parent->leaf_link, &block->leaf_link);
+   list_del(&buddy->leaf_link);
+   list_del(&block->leaf_link);

drm_block_free(mm, block);
drm_block_free(mm, buddy);
@@ -386,6 +395,78 @@ alloc_range_bias(struct drm_buddy *mm,
return ERR_PTR(err);
 }

+static struct drm_buddy_block *
+find_continuous_blocks(struct drm_buddy *mm,
+  int order,
+  unsigned long flags,
+  struct drm_buddy_block **rblock)
+{
+   struct list_head *head = &mm->free_list[order];
+   struct drm_buddy_block *free_block, *max_block = NULL, *end, *begin;
+   u64 pages = BIT(order + 1);
+   u64 cur_pages;
+
+   list_for_each_entry(free_block, head, link) {
+   if (max_block) {
+   if (!(flags & DRM_BUDDY_TOPDOWN_ALLOCATION))
+   break;
+
+   if (drm_buddy_block_offset(free_block) <
+   drm_buddy_block_

[PATCH v4] drm: Optimise for continuous memory allocation

2022-11-29 Thread xinhui pan
Currently drm-buddy does not have full knowledge of continuous memory.

Lets consider scenario below.
order 1:  L       R
order 0:  LL  LR  RL  RR
for order 1 allocation, it can offer L or R or LR+RL.

For now, we only implement L or R case for continuous memory allocation.
So this patch aims to implement the rest cases.

Adding a new member leaf_link which links all leaf blocks in ascending
order. Now we can find more than 2 sub-order blocks more easily.
Say, order 4 can be combined with corresponding order 4, 2+2, 1+2+1,
0+1+2+0, 0+2+1+0.

Signed-off-by: xinhui pan 
---
change from v3:
reworked totally. adding leaf_link.

change from v2:
search continuous block in nearby root if needed

change from v1:
implement top-down continuous allocation
---
 drivers/gpu/drm/drm_buddy.c | 108 +---
 include/drm/drm_buddy.h |   1 +
 2 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 11bb59399471..8edafb99b02c 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -80,6 +80,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
chunk_size)
 {
unsigned int i;
u64 offset;
+   LIST_HEAD(leaf);
 
if (size < chunk_size)
return -EINVAL;
@@ -136,6 +137,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
chunk_size)
goto out_free_roots;
 
mark_free(mm, root);
+   list_add_tail(&root->leaf_link, &leaf);
 
BUG_ON(i > mm->max_order);
BUG_ON(drm_buddy_block_size(mm, root) < chunk_size);
@@ -147,6 +149,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 
chunk_size)
i++;
} while (size);
 
+   list_del(&leaf);
return 0;
 
 out_free_roots:
@@ -205,6 +208,9 @@ static int split_block(struct drm_buddy *mm,
mark_free(mm, block->left);
mark_free(mm, block->right);
 
+   list_add(&block->right->leaf_link, &block->leaf_link);
+   list_add(&block->left->leaf_link, &block->leaf_link);
+   list_del(&block->leaf_link);
mark_split(block);
 
return 0;
@@ -256,6 +262,9 @@ static void __drm_buddy_free(struct drm_buddy *mm,
break;
 
list_del(&buddy->link);
+   list_add(&parent->leaf_link, &block->leaf_link);
+   list_del(&buddy->leaf_link);
+   list_del(&block->leaf_link);
 
drm_block_free(mm, block);
drm_block_free(mm, buddy);
@@ -386,6 +395,78 @@ alloc_range_bias(struct drm_buddy *mm,
return ERR_PTR(err);
 }
 
+static struct drm_buddy_block *
+find_continuous_blocks(struct drm_buddy *mm,
+  int order,
+  unsigned long flags,
+  struct drm_buddy_block **rblock)
+{
+   struct list_head *head = &mm->free_list[order];
+   struct drm_buddy_block *free_block, *max_block = NULL, *end, *begin;
+   u64 pages = BIT(order + 1);
+   u64 cur_pages;
+
+   list_for_each_entry(free_block, head, link) {
+   if (max_block) {
+   if (!(flags & DRM_BUDDY_TOPDOWN_ALLOCATION))
+   break;
+
+   if (drm_buddy_block_offset(free_block) <
+   drm_buddy_block_offset(max_block))
+   continue;
+   }
+
+   cur_pages = BIT(order);
+   begin = end = free_block;
+   while (true) {
+   struct drm_buddy_block *prev, *next;
+   int prev_order, next_order;
+
+   prev = list_prev_entry(begin, leaf_link);
+   if (!drm_buddy_block_is_free(prev) ||
+   drm_buddy_block_offset(prev) >
+   drm_buddy_block_offset(begin)) {
+   prev = NULL;
+   }
+   next = list_next_entry(end, leaf_link);
+   if (!drm_buddy_block_is_free(next) ||
+   drm_buddy_block_offset(next) <
+   drm_buddy_block_offset(end)) {
+   next = NULL;
+   }
+   if (!prev && !next)
+   break;
+
+   prev_order = prev ? drm_buddy_block_order(prev) : -1;
+   next_order = next ? drm_buddy_block_order(next) : -1;
+   if (next_order >= prev_order) {
+   BUG_ON(drm_buddy_block_offset(end) +
+  drm_buddy_block_size(mm, end) !=
+  drm_buddy_block_offset(next));
+   end = next;
+   cur_pages += BIT(drm_buddy_block_order(next));
+  

RE: [PATCH] drm/amdgpu: enable VCN RAS poison for VCN v4.0

2022-11-29 Thread Zhang, Hawking
[AMD Official Use Only - General]

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: Zhou1, Tao 
Sent: Tuesday, November 29, 2022 15:56
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking 
Cc: Zhou1, Tao 
Subject: [PATCH] drm/amdgpu: enable VCN RAS poison for VCN v4.0

Configure related registers.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
index 66b3f42764df..0bc65b2f0ce8 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
@@ -862,6 +862,25 @@ static void vcn_v4_0_enable_clock_gating(struct 
amdgpu_device *adev, int inst)
return;
 }

+static void vcn_v4_0_enable_ras(struct amdgpu_device *adev, int inst_idx,
+   bool indirect)
+{
+   uint32_t tmp;
+
+   tmp = VCN_RAS_CNTL__VCPU_VCODEC_REARM_MASK |
+ VCN_RAS_CNTL__VCPU_VCODEC_IH_EN_MASK |
+ VCN_RAS_CNTL__VCPU_VCODEC_PMI_EN_MASK |
+ VCN_RAS_CNTL__VCPU_VCODEC_STALL_EN_MASK;
+   WREG32_SOC15_DPG_MODE(inst_idx,
+ SOC15_DPG_MODE_OFFSET(VCN, 0, regVCN_RAS_CNTL),
+ tmp, 0, indirect);
+
+   tmp = UVD_SYS_INT_EN__RASCNTL_VCPU_VCODEC_EN_MASK;
+   WREG32_SOC15_DPG_MODE(inst_idx,
+ SOC15_DPG_MODE_OFFSET(VCN, 0, regUVD_SYS_INT_EN),
+ tmp, 0, indirect);
+}
+
 /**
  * vcn_v4_0_start_dpg_mode - VCN start with dpg mode
  *
@@ -950,6 +969,8 @@ static int vcn_v4_0_start_dpg_mode(struct amdgpu_device 
*adev, int inst_idx, boo
WREG32_SOC15_DPG_MODE(inst_idx, SOC15_DPG_MODE_OFFSET(
VCN, inst_idx, regUVD_LMI_CTRL2), tmp, 0, indirect);

+   vcn_v4_0_enable_ras(adev, inst_idx, indirect);
+
/* enable master interrupt */
WREG32_SOC15_DPG_MODE(inst_idx, SOC15_DPG_MODE_OFFSET(
VCN, inst_idx, regUVD_MASTINT_EN),
--
2.35.1


Re: [PATCH] drm/amdgpu: New method to check block continuous

2022-11-29 Thread Christian König

Hi Xinhui,

On 2022-11-29 03:11, Pan, Xinhui wrote:

[AMD Official Use Only - General]

Hi Chris,

What I am thinking is this: for continuous memory allocation, of course the
blocks are in ascending order.

For non-continuous memory allocation, however, the allocated memory might be
continuous while the blocks are not in ascending order.

Anyway, could we just re-sort these blocks into ascending order if the memory
is indeed continuous?


Well, that the blocks are in continuous order by coincidence is just
extremely unlikely.


So this doesn't make much sense.

Regards,
Christian.



thanks
xinhui


From: Christian König 
Sent: November 29, 2022, 1:11
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: Re: [PATCH] drm/amdgpu: New method to check block continuous

On 2022-11-27 06:39, xinhui pan wrote:

Blocks are not guaranteed to be in ascending order.

Well certainly a NAK. Blocks are required to be in ascending order to be
contiguous.

Regards,
Christian.


Signed-off-by: xinhui pan 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 21 
   1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 27159f1d112e..17175d284869 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -59,22 +59,17 @@ amdgpu_vram_mgr_first_block(struct list_head *list)
   static inline bool amdgpu_is_vram_mgr_blocks_contiguous(struct list_head 
*head)
   {
   struct drm_buddy_block *block;
- u64 start, size;
+ u64 start = LONG_MAX, end = 0, size = 0;

- block = amdgpu_vram_mgr_first_block(head);
- if (!block)
- return false;
+ list_for_each_entry(block, head, link) {
+ u64 bstart = amdgpu_vram_mgr_block_start(block);
+ u64 bsize = amdgpu_vram_mgr_block_size(block);

- while (head != block->link.next) {
- start = amdgpu_vram_mgr_block_start(block);
- size = amdgpu_vram_mgr_block_size(block);
-
- block = list_entry(block->link.next, struct drm_buddy_block, 
link);
- if (start + size != amdgpu_vram_mgr_block_start(block))
- return false;
+ start = min(bstart, start);
+ end = max(bstart + bsize, end);
+ size += bsize;
   }
-
- return true;
+ return end == start + size;
   }






[PATCH] drm/amdkfd: Fix memory leakage

2022-11-29 Thread Konstantin Meskhidze
This patch fixes a potential memory leak and segfault
in the amdgpu_amdkfd_gpuvm_import_dmabuf() function.

Signed-off-by: Konstantin Meskhidze 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 978d3970b5cc..e0084f712e02 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2257,7 +2257,7 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct 
amdgpu_device *adev,
 
ret = drm_vma_node_allow(&obj->vma_node, drm_priv);
if (ret) {
-   kfree(mem);
+   kfree(*mem);
return ret;
}
 
-- 
2.25.1



[PATCH] drm: amdgpu: Fix logic error

2022-11-29 Thread Konstantin Meskhidze
This commit fixes a logic error in the function 'amdgpu_hw_ip_info':
the VCN code paths checked 'uvd.harvest_config' where 'vcn.harvest_config'
was intended.

Signed-off-by: Konstantin Meskhidze 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index fe23e09eec98..28752a6a92c4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -424,7 +424,7 @@ static int amdgpu_hw_ip_info(struct amdgpu_device *adev,
case AMDGPU_HW_IP_VCN_DEC:
type = AMD_IP_BLOCK_TYPE_VCN;
for (i = 0; i < adev->vcn.num_vcn_inst; i++) {
-   if (adev->uvd.harvest_config & (1 << i))
+   if (adev->vcn.harvest_config & (1 << i))
continue;
 
if (adev->vcn.inst[i].ring_dec.sched.ready)
@@ -436,7 +436,7 @@ static int amdgpu_hw_ip_info(struct amdgpu_device *adev,
case AMDGPU_HW_IP_VCN_ENC:
type = AMD_IP_BLOCK_TYPE_VCN;
for (i = 0; i < adev->vcn.num_vcn_inst; i++) {
-   if (adev->uvd.harvest_config & (1 << i))
+   if (adev->vcn.harvest_config & (1 << i))
continue;
 
for (j = 0; j < adev->vcn.num_enc_rings; j++)
-- 
2.25.1



Re: AMD GPU problems under Xen

2022-11-29 Thread Demi Marie Obenour
On Mon, Nov 28, 2022 at 11:18:00AM -0500, Alex Deucher wrote:
> On Mon, Nov 28, 2022 at 2:18 AM Demi Marie Obenour
>  wrote:
> >
> > Dear Christian:
> >
> > What is the status of the AMDGPU work for Xen dom0?  That was mentioned in
> > https://lore.kernel.org/dri-devel/b2dec9b3-03a7-e7ac-306e-1da024af8...@amd.com/
> > and there have been bug reports to Qubes OS about problems with AMDGPU
> > under Xen (such as https://github.com/QubesOS/qubes-issues/issues/7648).
> 
> I would say it's a work in progress.  It depends on what GPU you have
> and what type of Xen setup you are using (PV vs PVH, etc.).

The current situation is:

- dom0 is PV.
- VMs with assigned PCI devices are HVM and use a Linux-based stubdomain;
  QEMU does not run in dom0.
- Everything else is PVH.

In the future, I believe the goal is to move away from PV and HVM in
favor of PVH, though HVM support will remain for compatibility with
guests (such as Windows) that need emulated devices.

> In general, your best bet currently is dGPU add in boards because they
> are largely self contained.

The main problem is that for the trusted GUI to work, there needs to
be at least one GPU attached to a trusted VM, such as the host or a
dedicated GUI VM.  That VM will typically not be running graphics-
intensive workloads, so the compute power of a dGPU is largely wasted.
SR-IOV support would help with that, but the only GPU vendor with open
source SR-IOV support is Intel and it is still not upstream.  I am also
not certain if the support extends to Arc dGPUs.

> APUs and platforms with integrated dGPUs
> are a bit more complicated as they tend to have more platform
> dependencies like ACPI tables and methods in order for the driver to
> be able to initialize the hardware properly.

Is Xen dom0/domU support for such GPUs being worked on?  Is there an
estimate as to when the needed support will be available upstream?  This
is mostly directed at Christian and other people who work for hardware
vendors.

> Additionally, GPUs map a
> lot of system memory so bounce buffers aren't really viable.  You'll
> really need IOMMU,

Qubes OS already needs an IOMMU, so that is not a concern.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

