RE: [PATCH] drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics
[AMD Official Use Only - General] Any comments? > -Original Message- > From: Wenyou Yang > Sent: Thursday, June 1, 2023 9:38 AM > To: Deucher, Alexander ; Limonciello, Mario > ; Koenig, Christian ; > Pan, Xinhui ; Quan, Evan > Cc: Yuan, Perry ; Liang, Richard qi > ; amd-gfx@lists.freedesktop.org; dri- > de...@lists.freedesktop.org; linux-ker...@vger.kernel.org; Yang, WenYou > > Subject: [PATCH] drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire > gpu_metrics > > To acquire the voltage and current info from gpu_metrics interface, but > gpu_metrics_v2_3 doesn't contain them, and to be backward compatible, add > new gpu_metrics_v2_4 structure. > > Acked-by: Evan Quan > Signed-off-by: Wenyou Yang > --- > .../gpu/drm/amd/include/kgd_pp_interface.h| 69 +++ > .../gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 109 - > - > drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c| 3 + > 3 files changed, 172 insertions(+), 9 deletions(-) > > diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h > b/drivers/gpu/drm/amd/include/kgd_pp_interface.h > index 9f542f6e19ed..0f37dafafcf9 100644 > --- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h > +++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h > @@ -892,4 +892,73 @@ struct gpu_metrics_v2_3 { > uint16_taverage_temperature_core[8]; // > average CPU core temperature on APUs > uint16_taverage_temperature_l3[2]; > }; > + > +struct gpu_metrics_v2_4 { > + struct metrics_table_header common_header; > + > + /* Temperature */ > + uint16_ttemperature_gfx; > + uint16_ttemperature_soc; > + uint16_ttemperature_core[8]; > + uint16_ttemperature_l3[2]; > + > + /* Utilization */ > + uint16_taverage_gfx_activity; > + uint16_taverage_mm_activity; > + > + /* Driver attached timestamp (in ns) */ > + uint64_tsystem_clock_counter; > + > + /* Power/Energy */ > + uint16_taverage_socket_power; > + uint16_taverage_cpu_power; > + uint16_taverage_soc_power; > + uint16_taverage_gfx_power; > + uint16_taverage_core_power[8]; > + > + /* Average clocks */ > + uint16_taverage_gfxclk_frequency; > + uint16_taverage_socclk_frequency; > + uint16_taverage_uclk_frequency; > + uint16_taverage_fclk_frequency; > + uint16_taverage_vclk_frequency; > + uint16_taverage_dclk_frequency; > + > + /* Current clocks */ > + uint16_tcurrent_gfxclk; > + uint16_tcurrent_socclk; > + uint16_tcurrent_uclk; > + uint16_tcurrent_fclk; > + uint16_tcurrent_vclk; > + uint16_tcurrent_dclk; > + uint16_tcurrent_coreclk[8]; > + uint16_tcurrent_l3clk[2]; > + > + /* Throttle status (ASIC dependent) */ > + uint32_tthrottle_status; > + > + /* Fans */ > + uint16_tfan_pwm; > + > + uint16_tpadding[3]; > + > + /* Throttle status (ASIC independent) */ > + uint64_tindep_throttle_status; > + > + /* Average Temperature */ > + uint16_taverage_temperature_gfx; > + uint16_taverage_temperature_soc; > + uint16_taverage_temperature_core[8]; > + uint16_taverage_temperature_l3[2]; > + > + /* Power/Voltage */ > + uint16_taverage_cpu_voltage; > + uint16_taverage_soc_voltage; > + uint16_taverage_gfx_voltage; > + > + /* Power/Current */ > + uint16_taverage_cpu_current; > + uint16_taverage_soc_current; > + uint16_taverage_gfx_current; > +}; > #endif > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c > b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c > index 067b4e0b026c..185d0b50ee8e 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c > @@ -1854,6 +1854,86 @@ static ssize_t vangogh_get_gpu_metrics_v2_3(struct > smu_context *smu, > return sizeof(struct gpu_metrics_v2_3); } > > +static ssize_t vangogh_get_gpu_metrics_v2_4(struct smu_context *smu, > + void **table) > +{ > +
RE: [PATCH] drm/amdgpu: Fix usage of UMC fill record in RAS
[AMD Official Use Only - General] Reviewed-by: Hawking Zhang Regards, Hawking -Original Message- From: Tuikov, Luben Sent: Saturday, June 10, 2023 19:36 To: AMD Graphics Cc: Tuikov, Luben ; Zhou1, Tao ; Zhang, Hawking ; Deucher, Alexander Subject: [PATCH] drm/amdgpu: Fix usage of UMC fill record in RAS The fixed commit listed in the Fixes tag below, introduced a bug in amdgpu_ras.c::amdgpu_reserve_page_direct(), in that when introducing the new amdgpu_umc_fill_error_record() and internally in that new function the physical address (argument "uint64_t retired_page"--wrong name) is right-shifted by AMDGPU_GPU_PAGE_SHIFT. Thus, in amdgpu_reserve_page_direct() when we pass "address" to that new function, we should NOT right-shift it, since this results, erroneously, in the page address to be 0 for first 2^(2*AMDGPU_GPU_PAGE_SHIFT) memory addresses. This commit fixes this bug. Cc: Tao Zhou Cc: Hawking Zhang Cc: Alex Deucher Fixes: 400013b268cb66 ("drm/amdgpu: add umc_fill_error_record to make code more simple") Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 27a32933cbee3b..700eb180ea60fa 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -171,8 +171,7 @@ static int amdgpu_reserve_page_direct(struct amdgpu_device *adev, uint64_t addre memset(_rec, 0x0, sizeof(struct eeprom_table_record)); err_data.err_addr = _rec; - amdgpu_umc_fill_error_record(_data, address, - (address >> AMDGPU_GPU_PAGE_SHIFT), 0, 0); + amdgpu_umc_fill_error_record(_data, address, address, 0, 0); if (amdgpu_bad_page_threshold != 0) { amdgpu_ras_add_bad_pages(adev, err_data.err_addr, base-commit: 7eda4451177abbc8d2fab24f816a3243dd1808d8 prerequisite-patch-id: f2a3eadc5d7074225109701f1bb43b38bd6024fd -- 2.41.0
[PATCH] drm/amdgpu: Fix usage of UMC fill record in RAS
The fixed commit listed in the Fixes tag below, introduced a bug in amdgpu_ras.c::amdgpu_reserve_page_direct(), in that when introducing the new amdgpu_umc_fill_error_record() and internally in that new function the physical address (argument "uint64_t retired_page"--wrong name) is right-shifted by AMDGPU_GPU_PAGE_SHIFT. Thus, in amdgpu_reserve_page_direct() when we pass "address" to that new function, we should NOT right-shift it, since this results, erroneously, in the page address to be 0 for first 2^(2*AMDGPU_GPU_PAGE_SHIFT) memory addresses. This commit fixes this bug. Cc: Tao Zhou Cc: Hawking Zhang Cc: Alex Deucher Fixes: 400013b268cb66 ("drm/amdgpu: add umc_fill_error_record to make code more simple") Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 27a32933cbee3b..700eb180ea60fa 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -171,8 +171,7 @@ static int amdgpu_reserve_page_direct(struct amdgpu_device *adev, uint64_t addre memset(_rec, 0x0, sizeof(struct eeprom_table_record)); err_data.err_addr = _rec; - amdgpu_umc_fill_error_record(_data, address, - (address >> AMDGPU_GPU_PAGE_SHIFT), 0, 0); + amdgpu_umc_fill_error_record(_data, address, address, 0, 0); if (amdgpu_bad_page_threshold != 0) { amdgpu_ras_add_bad_pages(adev, err_data.err_addr, base-commit: 7eda4451177abbc8d2fab24f816a3243dd1808d8 prerequisite-patch-id: f2a3eadc5d7074225109701f1bb43b38bd6024fd -- 2.41.0
Re: [PATCH] drm/amdgpu: Wrap -Wunused-but-set-variable in cc-option
On Thu, 08 Jun 2023, Nathan Chancellor wrote: > -Wunused-but-set-variable was only supported in clang starting with > 13.0.0, so earlier versions will emit a warning, which is turned into a > hard error for the kernel to mirror GCC: > > error: unknown warning option '-Wunused-but-set-variable'; did you mean > '-Wunused-const-variable'? [-Werror,-Wunknown-warning-option] > > The minimum supported version of clang for building the kernel is > 11.0.0, so match the rest of the kernel and wrap > -Wunused-but-set-variable in a cc-option call, so that it is only used > when supported by the compiler. I wonder if there's a table somewhere listing all the warning options, which GCC and Clang versions support them, and which versions have them in -Wall and -Wextra. Would be really useful. If there isn't one, it would be really helpful. *wink*. BR, Jani. > > Closes: https://github.com/ClangBuiltLinux/linux/issues/1869 > Fixes: a0fd5a5f676c ("drm/amd/amdgpu: introduce DRM_AMDGPU_WERROR") > Signed-off-by: Nathan Chancellor > --- > drivers/gpu/drm/amd/amdgpu/Makefile | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile > b/drivers/gpu/drm/amd/amdgpu/Makefile > index 7ee68b1bbfed..86b833085f19 100644 > --- a/drivers/gpu/drm/amd/amdgpu/Makefile > +++ b/drivers/gpu/drm/amd/amdgpu/Makefile > @@ -40,7 +40,7 @@ ccflags-y := -I$(FULL_AMD_PATH)/include/asic_reg \ > -I$(FULL_AMD_PATH)/amdkfd > > subdir-ccflags-y := -Wextra > -subdir-ccflags-y += -Wunused-but-set-variable > +subdir-ccflags-y += $(call cc-option, -Wunused-but-set-variable) > subdir-ccflags-y += -Wno-unused-parameter > subdir-ccflags-y += -Wno-type-limits > subdir-ccflags-y += -Wno-sign-compare > > --- > base-commit: 6bd4b01e8938779b0d959bdf33949a9aa258a363 > change-id: > 20230608-amdgpu-wrap-wunused-but-set-variable-in-cc-option-0be9528ac5c8 > > Best regards, -- Jani Nikula, Intel Open Source Graphics Center
Re: [PATCH] drm/amd/amdgpu: enable W=1 for amdgpu
On Sat, 10 Jun 2023, Masahiro Yamada wrote: > On Sat, Jun 10, 2023 at 5:17 AM Nathan Chancellor wrote: >> >> + Masahiro and linux-kbuild >> >> On Fri, Jun 09, 2023 at 12:42:06PM -0400, Hamza Mahfooz wrote: >> > We have a clean build with W=1 as of >> > commit 12a15dd589ac ("drm/amd/display/amdgpu_dm/amdgpu_dm_helpers: Move >> > SYNAPTICS_DEVICE_ID into CONFIG_DRM_AMD_DC_DCN ifdef"). So, let's enable >> > these checks unconditionally for the entire module to catch these errors >> > during development. >> > >> > Cc: Alex Deucher >> > Cc: Nathan Chancellor >> > Signed-off-by: Hamza Mahfooz >> >> I think this is fine, especially since it will help catch issues in >> amdgpu quickly and hopefully encourage developers to fix their problems >> before they make it to a tree with wider impact lika -next. >> >> However, this is now the third place that W=1 has been effectively >> enabled (i915 and btrfs are the other two I know of) and it would be >> nice if this was a little more unified, especially since it is not >> uncommon for the warnings under W=1 to shift around and keeping them >> unified will make maintainence over the longer term a little easier. I >> am not sure if this has been brought up in the past and I don't want to >> hold up this change but I suspect this sentiment of wanting to enable >> W=1 on a per-subsystem basis is going to continue to grow. > > > > I believe this patch is the right way because > we will be able to add a new warning option to > scripts/Makefile.extrawarn without fixing any code. > > I remember somebody argued that drivers should be > able to do > subdir-ccflags-y += $(W1_FLAGS) Personally, I think this would be the viable way to make the kernel free of W=1 warnings. Make it clean driver and subsystem at a time, with constant progress. Currently, there's haphazard fixing of issues, with new ones creeping back in, because kernel-wide W=1 is too verbose for most developers. It's whac-a-mole. > However, if a new flag, -Wfoo, emits warnings > for drivers/gpu/drm/{i915,amd}, > you cannot add it to W=1 until fixing the code. Or adding -Wno-foo where it breaks, until fixed. > If many drivers start to do likewise, > W=1 warning will not be W=1 any more. I don't know, is the goal to fix the warnings, or keep adding stuff to W=1 so that it'll always emit warnings? :p BR, Jani. > Another good thing for hard-coding warning options > is you can lift up a warning flag one by one. > > > Let's say you fixed the entire DRM subsystem so > it is -Wunused free now. > > Then, you can move -Wunused to drivers/gpu/drm/Makefile, > while other warning options stay in drivers Makefiles. > > > > > > > >> >> Regardless, for clang 11.1.0 to 16.0.5, I see no warnings when building >> drivers/gpu/drm/amd/amdgpu/ with Arch Linux's configuration or >> allmodconfig. >> >> Reviewed-by: Nathan Chancellor >> Tested-by: Nathan Chancellor >> >> > --- >> > drivers/gpu/drm/amd/amdgpu/Makefile | 13 - >> > 1 file changed, 12 insertions(+), 1 deletion(-) >> > >> > diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile >> > b/drivers/gpu/drm/amd/amdgpu/Makefile >> > index 86b833085f19..8d16f280b695 100644 >> > --- a/drivers/gpu/drm/amd/amdgpu/Makefile >> > +++ b/drivers/gpu/drm/amd/amdgpu/Makefile >> > @@ -40,7 +40,18 @@ ccflags-y := -I$(FULL_AMD_PATH)/include/asic_reg \ >> > -I$(FULL_AMD_PATH)/amdkfd >> > >> > subdir-ccflags-y := -Wextra >> > -subdir-ccflags-y += $(call cc-option, -Wunused-but-set-variable) >> > +subdir-ccflags-y += -Wunused >> > +subdir-ccflags-y += -Wmissing-prototypes >> > +subdir-ccflags-y += -Wmissing-declarations >> > +subdir-ccflags-y += -Wmissing-include-dirs >> > +subdir-ccflags-y += -Wold-style-definition >> > +subdir-ccflags-y += -Wmissing-format-attribute >> > +# Need this to avoid recursive variable evaluation issues >> > +cond-flags := $(call cc-option, -Wunused-but-set-variable) \ >> > + $(call cc-option, -Wunused-const-variable) \ >> > + $(call cc-option, -Wstringop-truncation) \ >> > + $(call cc-option, -Wpacked-not-aligned) >> > +subdir-ccflags-y += $(cond-flags) >> > subdir-ccflags-y += -Wno-unused-parameter >> > subdir-ccflags-y += -Wno-type-limits >> > subdir-ccflags-y += -Wno-sign-compare >> > -- >> > 2.40.1 >> > -- Jani Nikula, Intel Open Source Graphics Center