Re: [PATCH] drm/amdgpu: Skip some registers config for SRIOV
Reviewed-by: Luben Tuikov On 2020-08-07 04:48, Liu ChengZhe wrote: > Some registers are not accessible to virtual function setup, so > skip their initialization when in VF-SRIOV mode. > > v2: move SRIOV VF check into specify functions; > modify commit description and comment. > > Signed-off-by: Liu ChengZhe > --- > drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c | 19 +++ > drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c | 19 +++ > 2 files changed, 38 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c > b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c > index 1f6112b7fa49..80c906a0383f 100644 > --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c > +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c > @@ -182,6 +182,12 @@ static void gfxhub_v2_1_init_cache_regs(struct > amdgpu_device *adev) > { > uint32_t tmp; > > + /* These registers are not accessible to VF-SRIOV. > + * The PF will program them instead. > + */ > + if (amdgpu_sriov_vf(adev)) > + return; > + > /* Setup L2 cache */ > tmp = RREG32_SOC15(GC, 0, mmGCVM_L2_CNTL); > tmp = REG_SET_FIELD(tmp, GCVM_L2_CNTL, ENABLE_L2_CACHE, 1); > @@ -237,6 +243,12 @@ static void gfxhub_v2_1_enable_system_domain(struct > amdgpu_device *adev) > > static void gfxhub_v2_1_disable_identity_aperture(struct amdgpu_device *adev) > { > + /* These registers are not accessible to VF-SRIOV. > + * The PF will program them instead. > + */ > + if (amdgpu_sriov_vf(adev)) > + return; > + > WREG32_SOC15(GC, 0, mmGCVM_L2_CONTEXT1_IDENTITY_APERTURE_LOW_ADDR_LO32, >0x); > WREG32_SOC15(GC, 0, mmGCVM_L2_CONTEXT1_IDENTITY_APERTURE_LOW_ADDR_HI32, > @@ -373,6 +385,13 @@ void gfxhub_v2_1_set_fault_enable_default(struct > amdgpu_device *adev, > bool value) > { > u32 tmp; > + > + /* These registers are not accessible to VF-SRIOV. > + * The PF will program them instead. 
> + */ > + if (amdgpu_sriov_vf(adev)) > + return; > + > tmp = RREG32_SOC15(GC, 0, mmGCVM_L2_PROTECTION_FAULT_CNTL); > tmp = REG_SET_FIELD(tmp, GCVM_L2_PROTECTION_FAULT_CNTL, > RANGE_PROTECTION_FAULT_ENABLE_DEFAULT, value); > diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c > b/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c > index d83912901f73..8acb3b625afe 100644 > --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c > @@ -181,6 +181,12 @@ static void mmhub_v2_0_init_cache_regs(struct > amdgpu_device *adev) > { > uint32_t tmp; > > + /* These registers are not accessible to VF-SRIOV. > + * The PF will program them instead. > + */ > + if (amdgpu_sriov_vf(adev)) > + return; > + > /* Setup L2 cache */ > tmp = RREG32_SOC15(MMHUB, 0, mmMMVM_L2_CNTL); > tmp = REG_SET_FIELD(tmp, MMVM_L2_CNTL, ENABLE_L2_CACHE, 1); > @@ -236,6 +242,12 @@ static void mmhub_v2_0_enable_system_domain(struct > amdgpu_device *adev) > > static void mmhub_v2_0_disable_identity_aperture(struct amdgpu_device *adev) > { > + /* These registers are not accessible to VF-SRIOV. > + * The PF will program them instead. > + */ > + if (amdgpu_sriov_vf(adev)) > + return; > + > WREG32_SOC15(MMHUB, 0, >mmMMVM_L2_CONTEXT1_IDENTITY_APERTURE_LOW_ADDR_LO32, >0x); > @@ -365,6 +377,13 @@ void mmhub_v2_0_gart_disable(struct amdgpu_device *adev) > void mmhub_v2_0_set_fault_enable_default(struct amdgpu_device *adev, bool > value) > { > u32 tmp; > + > + /* These registers are not accessible to VF-SRIOV. > + * The PF will program them instead. > + */ > + if (amdgpu_sriov_vf(adev)) > + return; > + > tmp = RREG32_SOC15(MMHUB, 0, mmMMVM_L2_PROTECTION_FAULT_CNTL); > tmp = REG_SET_FIELD(tmp, MMVM_L2_PROTECTION_FAULT_CNTL, > RANGE_PROTECTION_FAULT_ENABLE_DEFAULT, value); > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[pull] amdgpu drm-fixes-5.9
Hi Dave, Daniel, Fixes for 5.9. The following changes since commit dc100bc8fae59aafd2ea2e1a1a43ef1f65f8a8bc: Merge tag 'drm-msm-next-2020-07-30' of https://gitlab.freedesktop.org/drm/msm into drm-next (2020-08-05 08:05:31 +1000) are available in the Git repository at: git://people.freedesktop.org/~agd5f/linux tags/amd-drm-fixes-5.9-2020-08-07 for you to fetch changes up to f87812284172a9809820d10143b573d833cd3f75: drm/amdgpu: Fix bug where DPM is not enabled after hibernate and resume (2020-08-07 17:52:15 -0400) amd-drm-fixes-5.9-2020-08-07: amdgpu: - Re-add spelling typo fix - Sienna Cichlid fixes - Navy Flounder fixes - DC fixes - SMU i2c fix - Power fixes Alex Deucher (1): drm/amdgpu/smu: rework i2c adpater registration Alvin Lee (1): drm/amd/display: Don't compare dppclk before updating DTO Aric Cyr (2): drm/amd/display: Fix DP Compliance tests 4.3.2.1 and 4.3.2.2 drm/amd/display: AMD OUI (DPCD 0x00300) skipped on some sink Bhawanpreet Lakha (2): drm/amd/display: Use seperate dmcub firmware for navy_flounder drm/amd/display: Use proper abm/backlight functions for DCN3 Boyuan Zhang (1): drm/amdgpu: update dec ring test for VCN 3.0 Changfeng (2): Revert "drm/amd/powerplay: drop unnecessary message support check" drm/amd/powerplay: drop unnecessary message support check(v2) Colin Ian King (1): drm/amdgpu: fix spelling mistake "Falied" -> "Failed" Dan Carpenter (1): drm/amd/powerplay: off by one bugs in smu_cmn_to_asic_specific_index() Dmytro Laktyushkin (2): drm/amd/display: Clean up global sync param retrieval drm/amd/display: populate new dml variable Eric Bernstein (1): drm/amd/display: Use parameter for call to set output mux Eryk Brol (2): drm/amd/display: Rename bytes_pp to the correct bits_pp drm/amd/display: Fix naming of DSC Debugfs entry Evan Quan (2): drm/amd/powerplay: update swSMU VCN/JPEG PG logics drm/amd/powerplay: put VCN/JPEG into PG ungate state before dpm table setup(V3) George Shen (1): drm/amd/display: Change null plane state swizzle mode to 
4kb_s Guchun Chen (1): drm/amdgpu: add printing after executing page reservation to eeprom Harry Wentland (1): drm/amd/display: Fix logger context Huang Rui (1): drm/amdgpu: skip crit temperature values on APU (v2) Igor Kravchenko (2): drm/amd/display: Read VBIOS Golden Settings Tbl drm/amd/display: Display goes blank after inst James Zhu (1): drm/amdgpu/jpeg3.0: remove extra asic type check Jiansong Chen (3): drm/amd/powerplay: update driver if version for navy_flounder drm/amdgpu: update GC golden setting for navy_flounder drm/amdgpu: enable GFXOFF for navy_flounder JinZe.Xu (1): drm/amd/display: Use helper function to check for HDMI signal John Clements (1): drm/amdgpu: expand sienna chichlid reg access support Jun Lei (1): drm/amd/display: Disable idle optimizations before programming DCN Kenneth Feng (1): drm/amd/powerplay: remove the dpm checking in the boot sequence Kevin Wang (1): drm/amd/swsmu: allow asic to handle sensor type by itself Likun Gao (6): drm/amd/powerplay: skip invalid msg when smu set mp1 state drm/amd/powerplay: add msg map for mode1 reset drm/amd/powerplay: correct smu message for vf mode drm/amdgpu: update golden setting for sienna_cichlid drm/amd/powerplay: update driver if file for sienna_cichlid drm/amdgpu: use mode1 reset by default for sienna_cichlid Liu ChengZhe (2): drm/amdgpu: fix PSP autoload twice in FLR drm amdgpu: Skip tmr load for SRIOV Martin Tsai (1): drm/amd/display: Check lane status again after link training done Reza Amini (1): drm/amd/display: Allow asic specific FSFT timing optimization Sandeep Raghuraman (1): drm/amdgpu: Fix bug where DPM is not enabled after hibernate and resume Stylon Wang (1): drm/amd/display: Fix dmesg warning from setting abm level Wyatt Wood (1): drm/amd/display: Use hw lock mgr hersen wu (1): drm/amd/display: dchubbub p-state warning during surface planes switch drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 6 + 
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c| 37 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 5 +- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 7 +- drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c | 9 +- drivers/gpu/drm/amd/amdgpu/nv.c| 56 +++- drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 2 +- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 30 - .../drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c | 11 +- drivers/gpu/drm/amd/display/dc/bios/bios_parser
Re: [PATCH 8/8] drm/amd/display: Expose modifiers.
On Tue, Aug 4, 2020 at 5:32 PM Bas Nieuwenhuizen wrote: > This expose modifier support on GFX9+. > > Only modifiers that can be rendered on the current GPU are > added. This is to reduce the number of modifiers exposed. > > The HW could expose more, but the best mechanism to decide > what to expose without an explosion in modifiers is still > to be decided, and in the meantime this should not regress > things from pre-modifiers and does not risk regressions as > we make up our mind in the future. > > Signed-off-by: Bas Nieuwenhuizen > --- > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 343 +- > 1 file changed, 342 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > index c38257081868..6594cbe625f9 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > @@ -3891,6 +3891,340 @@ fill_gfx9_tiling_info_from_modifier(const struct > amdgpu_device *adev, > } > } > > +enum dm_micro_swizzle { > + MICRO_SWIZZLE_Z = 0, > + MICRO_SWIZZLE_S = 1, > + MICRO_SWIZZLE_D = 2, > + MICRO_SWIZZLE_R = 3 > +}; > + > +static bool dm_plane_format_mod_supported(struct drm_plane *plane, > + uint32_t format, > + uint64_t modifier) > +{ > + struct amdgpu_device *adev = plane->dev->dev_private; > + const struct drm_format_info *info = drm_format_info(format); > + > + enum dm_micro_swizzle microtile = > modifier_gfx9_swizzle_mode(modifier) & 3; > + > + if (!info) > + return false; > + > + /* > +* We always have to allow this modifier, because core DRM still > +* checks LINEAR support if userspace does not provide modifers. > +*/ > + if (modifier == DRM_FORMAT_MOD_LINEAR) > + return true; > + > + /* > +* The arbitrary tiling support for multiplane formats has not > been hooked > +* up. 
> +*/ > + if (info->num_planes > 1) > + return false; > + > + /* > +* For D swizzle the canonical modifier depends on the bpp, so > check > +* it here. > +*/ > + if (AMD_FMT_MOD_GET(TILE_VERSION, modifier) == > AMD_FMT_MOD_TILE_VER_GFX9 && > + adev->family >= AMDGPU_FAMILY_NV) { > + if (microtile == MICRO_SWIZZLE_D && info->cpp[0] == 4) > + return false; > + } > + > + if (adev->family >= AMDGPU_FAMILY_RV && microtile == > MICRO_SWIZZLE_D && > + info->cpp[0] < 8) > + return false; > + > + if (modifier_has_dcc(modifier)) { > + /* Per radeonsi comments 16/64 bpp are more complicated. */ > + if (info->cpp[0] != 4) > + return false; > + } > + > + return true; > +} > + > +static void > +add_modifier(uint64_t **mods, uint64_t *size, uint64_t *cap, uint64_t mod) > +{ > + if (!*mods) > + return; > + > + if (*cap - *size < 1) { > + uint64_t new_cap = *cap * 2; > + uint64_t *new_mods = kmalloc(new_cap * sizeof(uint64_t), > GFP_KERNEL); > + > + if (!new_mods) { > + kfree(*mods); > + *mods = NULL; > + return; > + } > + > + memcpy(new_mods, *mods, sizeof(uint64_t) * *size); > + kfree(*mods); > + *mods = new_mods; > + *cap = new_cap; > + } > + > + (*mods)[*size] = mod; > + *size += 1; > +} > + > +static void > +add_gfx9_modifiers(const struct amdgpu_device *adev, > + uint64_t **mods, uint64_t *size, uint64_t *capacity) > +{ > + int pipes = > ilog2(adev->gfx.config.gb_addr_config_fields.num_pipes); > + int pipe_xor_bits = min(8, pipes + > + > ilog2(adev->gfx.config.gb_addr_config_fields.num_se)); > + int bank_xor_bits = min(8 - pipe_xor_bits, > + > ilog2(adev->gfx.config.gb_addr_config_fields.num_banks)); > + int rb = ilog2(adev->gfx.config.gb_addr_config_fields.num_se) + > + > ilog2(adev->gfx.config.gb_addr_config_fields.num_rb_per_se); > + > + > + if (adev->family == AMDGPU_FAMILY_RV) { > + /* > +* No _D DCC swizzles yet because we only allow 32bpp, > which > +* doesn't support _D on DCN > +*/ > + > + /* > +* Always enable constant encoding, because the only unit > that > +* 
didn't support it was CB. But on texture/display we can > +* always interpret it. > +*/ > + add_modifier(mods, size, capacity, AMD_FMT_MOD | > + AMD_FMT_MOD_SET(TILE, > AMD_FMT_MOD_TILE_GFX9_64K_S_X) | > + AMD_FMT_MOD
RE: [PATCH] drm/amdgpu: annotate a false positive recursive locking
[AMD Official Use Only - Internal Distribution Only] On Fri, Aug 7, 2020 at 5:32 PM Li, Dennis wrote: > > [AMD Public Use] > > On Fri, Aug 7, 2020 at 1:59 PM Li, Dennis wrote: > > > > [AMD Public Use] > > > > Hi, Daniel, > > Thanks your review. And I also understand your concern. I guess you > > missed the description of this issue, so I paste it again in the below, and > > explain why this issue happens. > > > > For example, in a XGMI system with 2 GPU devices whose device entity > > is named adev. And each adev has a separated reset_sem. > > // device init function > > void device_init(adev) { > > ... > > init_rwsem(&adev->reset_sem); > > ... > > } > > > > XGMI system have two GPU devices, so system will call device_init > > twice. However the definition of init_rwsem macro has a limitation which > > use a local static lock_class_key to initialize rw_sem, which cause each > > adev->reset_sem share the same lock_class_key. > > > > #define init_rwsem(sem) > > \ > > do { \ > > static struct lock_class_key __key;\ > > \ > > __init_rwsem((sem), #sem, &__key); \ > > } while (0) > > > > And when GPU hang, we should do gpu recovery for all devices in the > > hive. Therefore we should lock each adev->reset_sem to protect GPU from be > > accessed by other threads during recovery. The detailed recovery sequence > > as the following: > > // Step1: lock all GPU firstly: > > for each adev of GPU0 or GPU1 { > >down_write(adev->reset_sem); > >do_suspend(adev); > > } > > > > // Step2: > > do_hive_recovery(hive); > > > > // Step3: resume and unlock GPU > > for each adev of GPU0 or GPU1 { > > do_resume(adev) > > up_write(adev->reset_sem); > > } > > > > Each adev->reset_sem has the same lock_class_key, and lockdep will take > > them as the same rw_semaphore object. Therefore in step1, when lock GPU1, > > lockdep will pop the below warning. > > > > I have considered your proposal (using _nest_lock() ) before. 
Just as > > you said, _nest_lock() will hide true positive recursive locking. So I gave > > up it in the end. > > > > After reviewing codes of lockdep, I found the lockdep support > > dynamic_key, so using separated lock_class_key has no any side effect. In > > fact, init_rwsem also use dynamic_key. Please see the call sequence of > > init_rwsem and lockdep_set_class as the below: > >1) init_rwsem -> __init_rwsem -> lockdep_init_map; > >2) lockdep_set_class -> lockdep_init_map; > > > > Finally we go back to your concern, you maybe worry this change will > > cause the below dead-lock can't be detected. In fact, lockdep still is able > > to detect the issue as circular locking dependency, but there is no warning > > "recursive locking " in this case. > > Thread A: down_write(adev->reset_sem) for GPU 0 -> > > down_write(adev->reset_sem) for GPU 1 -> ... -> up_write(adev->reset_sem) > > for GPU 1 -> up_write(adev->reset_sem) for GPU 0 > > Thread B: down_write(adev->reset_sem) for GPU 1 -> > > down_write(adev->reset_sem) for GPU 0 -> ... -> > > up_write(adev->reset_sem) for GPU 0 -> up_write(adev->reset_sem) for > > GPU 1 > > > > But lockdep still can detect recursive locking for this case: > > Thread A: down_write(adev->reset_sem) for GPU 0 -> ...-> ...-> > > down_write(adev->reset_sem) for GPU 0 > > Yeah, I guessed correctly what you're doing. My recommendation to use > down_write_nest_lock still stands. This is for reset only, kernel-wide reset > lock really shouldn't hurt. Or make it a lock per xgmi hive, I'm assuming > that information is known somewhere. > > The problem with manual lockdep annotations is that they increase complexity. > You have to keep all the annotations in mind, including justifcations and > which parts they still catch and which parts they don't catch. And there's > zero performance justification for a gpu reset path to create some fancy > lockdep special cases. > > Locking needs to be > 1. maintainable, i.e. 
every time you need to write a multi-paragraph > explanation like the above it's probably not. This obviously includes > correctness, but it's even more important that people can easily understand > what you're doing. > 2. fast enough, where it matters. gpu reset just doesn't. > > [Dennis Li] Yeah. I strongly agree that manual lockdep annotation will > increase complexity. However my patch isn't for lockdep annotation, and in > fact it is to fix a bug. According to design of lockdep, every lock object > should has a separated lock_class_key which is an unified ID of lock object > in lockdep. Nope that 's not how lockdep work
Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking
On Fri, Aug 7, 2020 at 5:32 PM Li, Dennis wrote: > > [AMD Public Use] > > On Fri, Aug 7, 2020 at 1:59 PM Li, Dennis wrote: > > > > [AMD Public Use] > > > > Hi, Daniel, > > Thanks your review. And I also understand your concern. I guess you > > missed the description of this issue, so I paste it again in the below, and > > explain why this issue happens. > > > > For example, in a XGMI system with 2 GPU devices whose device entity > > is named adev. And each adev has a separated reset_sem. > > // device init function > > void device_init(adev) { > > ... > > init_rwsem(&adev->reset_sem); > > ... > > } > > > > XGMI system have two GPU devices, so system will call device_init > > twice. However the definition of init_rwsem macro has a limitation which > > use a local static lock_class_key to initialize rw_sem, which cause each > > adev->reset_sem share the same lock_class_key. > > > > #define init_rwsem(sem) > > \ > > do { \ > > static struct lock_class_key __key;\ > > \ > > __init_rwsem((sem), #sem, &__key); \ > > } while (0) > > > > And when GPU hang, we should do gpu recovery for all devices in the > > hive. Therefore we should lock each adev->reset_sem to protect GPU from be > > accessed by other threads during recovery. The detailed recovery sequence > > as the following: > > // Step1: lock all GPU firstly: > > for each adev of GPU0 or GPU1 { > >down_write(adev->reset_sem); > >do_suspend(adev); > > } > > > > // Step2: > > do_hive_recovery(hive); > > > > // Step3: resume and unlock GPU > > for each adev of GPU0 or GPU1 { > > do_resume(adev) > > up_write(adev->reset_sem); > > } > > > > Each adev->reset_sem has the same lock_class_key, and lockdep will take > > them as the same rw_semaphore object. Therefore in step1, when lock GPU1, > > lockdep will pop the below warning. > > > > I have considered your proposal (using _nest_lock() ) before. Just as > > you said, _nest_lock() will hide true positive recursive locking. So I gave > > up it in the end. 
> > > > After reviewing codes of lockdep, I found the lockdep support > > dynamic_key, so using separated lock_class_key has no any side effect. In > > fact, init_rwsem also use dynamic_key. Please see the call sequence of > > init_rwsem and lockdep_set_class as the below: > >1) init_rwsem -> __init_rwsem -> lockdep_init_map; > >2) lockdep_set_class -> lockdep_init_map; > > > > Finally we go back to your concern, you maybe worry this change will > > cause the below dead-lock can't be detected. In fact, lockdep still is able > > to detect the issue as circular locking dependency, but there is no warning > > "recursive locking " in this case. > > Thread A: down_write(adev->reset_sem) for GPU 0 -> > > down_write(adev->reset_sem) for GPU 1 -> ... -> up_write(adev->reset_sem) > > for GPU 1 -> up_write(adev->reset_sem) for GPU 0 > > Thread B: down_write(adev->reset_sem) for GPU 1 -> > > down_write(adev->reset_sem) for GPU 0 -> ... -> > > up_write(adev->reset_sem) for GPU 0 -> up_write(adev->reset_sem) for > > GPU 1 > > > > But lockdep still can detect recursive locking for this case: > > Thread A: down_write(adev->reset_sem) for GPU 0 -> ...-> ...-> > > down_write(adev->reset_sem) for GPU 0 > > Yeah, I guessed correctly what you're doing. My recommendation to use > down_write_nest_lock still stands. This is for reset only, kernel-wide reset > lock really shouldn't hurt. Or make it a lock per xgmi hive, I'm assuming > that information is known somewhere. > > The problem with manual lockdep annotations is that they increase complexity. > You have to keep all the annotations in mind, including justifcations and > which parts they still catch and which parts they don't catch. And there's > zero performance justification for a gpu reset path to create some fancy > lockdep special cases. > > Locking needs to be > 1. maintainable, i.e. every time you need to write a multi-paragraph > explanation like the above it's probably not. 
This obviously includes > correctness, but it's even more important that people can easily understand > what you're doing. > 2. fast enough, where it matters. gpu reset just doesn't. > > [Dennis Li] Yeah. I strongly agree that manual lockdep annotation will > increase complexity. However my patch isn't for lockdep annotation, and in > fact it is to fix a bug. According to design of lockdep, every lock object > should has a separated lock_class_key which is an unified ID of lock object > in lockdep. Nope that 's not how lockdep works. It intentionally shares the lockdep class. If every
RE: [PATCH] drm/amdgpu: annotate a false positive recursive locking
[AMD Public Use] On Fri, Aug 7, 2020 at 1:59 PM Li, Dennis wrote: > > [AMD Public Use] > > Hi, Daniel, > Thanks your review. And I also understand your concern. I guess you > missed the description of this issue, so I paste it again in the below, and > explain why this issue happens. > > For example, in a XGMI system with 2 GPU devices whose device entity is > named adev. And each adev has a separated reset_sem. > // device init function > void device_init(adev) { > ... > init_rwsem(&adev->reset_sem); > ... > } > > XGMI system have two GPU devices, so system will call device_init twice. > However the definition of init_rwsem macro has a limitation which use a local > static lock_class_key to initialize rw_sem, which cause each adev->reset_sem > share the same lock_class_key. > > #define init_rwsem(sem) \ > do { \ > static struct lock_class_key __key;\ > \ > __init_rwsem((sem), #sem, &__key); \ > } while (0) > > And when GPU hang, we should do gpu recovery for all devices in the > hive. Therefore we should lock each adev->reset_sem to protect GPU from be > accessed by other threads during recovery. The detailed recovery sequence as > the following: > // Step1: lock all GPU firstly: > for each adev of GPU0 or GPU1 { >down_write(adev->reset_sem); >do_suspend(adev); > } > > // Step2: > do_hive_recovery(hive); > > // Step3: resume and unlock GPU > for each adev of GPU0 or GPU1 { > do_resume(adev) > up_write(adev->reset_sem); > } > > Each adev->reset_sem has the same lock_class_key, and lockdep will take > them as the same rw_semaphore object. Therefore in step1, when lock GPU1, > lockdep will pop the below warning. > > I have considered your proposal (using _nest_lock() ) before. Just as > you said, _nest_lock() will hide true positive recursive locking. So I gave > up it in the end. > > After reviewing codes of lockdep, I found the lockdep support > dynamic_key, so using separated lock_class_key has no any side effect. In > fact, init_rwsem also use dynamic_key. 
Please see the call sequence of > init_rwsem and lockdep_set_class as the below: >1) init_rwsem -> __init_rwsem -> lockdep_init_map; >2) lockdep_set_class -> lockdep_init_map; > > Finally we go back to your concern, you maybe worry this change will > cause the below dead-lock can't be detected. In fact, lockdep still is able > to detect the issue as circular locking dependency, but there is no warning > "recursive locking " in this case. > Thread A: down_write(adev->reset_sem) for GPU 0 -> > down_write(adev->reset_sem) for GPU 1 -> ... -> up_write(adev->reset_sem) for > GPU 1 -> up_write(adev->reset_sem) for GPU 0 > Thread B: down_write(adev->reset_sem) for GPU 1 -> > down_write(adev->reset_sem) for GPU 0 -> ... -> > up_write(adev->reset_sem) for GPU 0 -> up_write(adev->reset_sem) for > GPU 1 > > But lockdep still can detect recursive locking for this case: > Thread A: down_write(adev->reset_sem) for GPU 0 -> ...-> ...-> > down_write(adev->reset_sem) for GPU 0 Yeah, I guessed correctly what you're doing. My recommendation to use down_write_nest_lock still stands. This is for reset only, kernel-wide reset lock really shouldn't hurt. Or make it a lock per xgmi hive, I'm assuming that information is known somewhere. The problem with manual lockdep annotations is that they increase complexity. You have to keep all the annotations in mind, including justifcations and which parts they still catch and which parts they don't catch. And there's zero performance justification for a gpu reset path to create some fancy lockdep special cases. Locking needs to be 1. maintainable, i.e. every time you need to write a multi-paragraph explanation like the above it's probably not. This obviously includes correctness, but it's even more important that people can easily understand what you're doing. 2. fast enough, where it matters. gpu reset just doesn't. [Dennis Li] Yeah. I strongly agree that manual lockdep annotation will increase complexity. 
However my patch isn't for lockdep annotation; in fact it is to fix a bug. According to the design of lockdep, every lock object should have a separate lock_class_key, which is a unified ID of the lock object in lockdep. Take the previous sample code as an example: we define two rw_sems, a and b, and use case1 and case2 to init them. In my opinion, case1 and case2 should have the same behavior, but in fact they are different, because case2 uses the init_rwsem macro wrongly. Therefore we should use lockdep_set_class to fix this issue, as in case3. Case 1: init_
Re: [PATCH] drm/amd/powerplay: correct Vega20 cached smu feature state
[AMD Official Use Only - Internal Distribution Only] Acked-by: Alex Deucher From: Quan, Evan Sent: Friday, August 7, 2020 5:30 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Kuehling, Felix ; Quan, Evan Subject: [PATCH] drm/amd/powerplay: correct Vega20 cached smu feature state Correct the cached smu feature state on pp_features sysfs setting. Change-Id: Icc4c3ce764876a0ffdc86ad4c8a8b9c9f0ed0e97 Signed-off-by: Evan Quan --- .../drm/amd/powerplay/hwmgr/vega20_hwmgr.c| 38 +-- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c index 90c78f127f7e..acc926c20c55 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c @@ -984,10 +984,7 @@ static int vega20_disable_all_smu_features(struct pp_hwmgr *hwmgr) { struct vega20_hwmgr *data = (struct vega20_hwmgr *)(hwmgr->backend); - uint64_t features_enabled; - int i; - bool enabled; - int ret = 0; + int i, ret = 0; PP_ASSERT_WITH_CODE((ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_DisableAllSmuFeatures, @@ -995,17 +992,8 @@ static int vega20_disable_all_smu_features(struct pp_hwmgr *hwmgr) "[DisableAllSMUFeatures] Failed to disable all smu features!", return ret); - ret = vega20_get_enabled_smc_features(hwmgr, &features_enabled); - PP_ASSERT_WITH_CODE(!ret, - "[DisableAllSMUFeatures] Failed to get enabled smc features!", - return ret); - - for (i = 0; i < GNLD_FEATURES_MAX; i++) { - enabled = (features_enabled & data->smu_features[i].smu_feature_bitmap) ? 
- true : false; - data->smu_features[i].enabled = enabled; - data->smu_features[i].supported = enabled; - } + for (i = 0; i < GNLD_FEATURES_MAX; i++) + data->smu_features[i].enabled = 0; return 0; } @@ -3242,10 +3230,11 @@ static int vega20_get_ppfeature_status(struct pp_hwmgr *hwmgr, char *buf) static int vega20_set_ppfeature_status(struct pp_hwmgr *hwmgr, uint64_t new_ppfeature_masks) { - uint64_t features_enabled; - uint64_t features_to_enable; - uint64_t features_to_disable; - int ret = 0; + struct vega20_hwmgr *data = + (struct vega20_hwmgr *)(hwmgr->backend); + uint64_t features_enabled, features_to_enable, features_to_disable; + int i, ret = 0; + bool enabled; if (new_ppfeature_masks >= (1ULL << GNLD_FEATURES_MAX)) return -EINVAL; @@ -3274,6 +3263,17 @@ static int vega20_set_ppfeature_status(struct pp_hwmgr *hwmgr, uint64_t new_ppfe return ret; } + /* Update the cached feature enablement state */ + ret = vega20_get_enabled_smc_features(hwmgr, &features_enabled); + if (ret) + return ret; + + for (i = 0; i < GNLD_FEATURES_MAX; i++) { + enabled = (features_enabled & data->smu_features[i].smu_feature_bitmap) ? + true : false; + data->smu_features[i].enabled = enabled; + } + return 0; } -- 2.28.0
Re: [PATCH] drm/amd/powerplay: correct UVD/VCE PG state on custom pptable uploading
[AMD Official Use Only - Internal Distribution Only] Acked-by: Alex Deucher From: Quan, Evan Sent: Friday, August 7, 2020 5:31 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Quan, Evan Subject: [PATCH] drm/amd/powerplay: correct UVD/VCE PG state on custom pptable uploading The UVD/VCE PG state is managed by UVD and VCE IP. It's error-prone to assume the bootup state in SMU based on the dpm status. Change-Id: Ib88298ab9812d7d242592bcd55eea140bef6696a Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c index acc926c20c55..da84012b7fd5 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c @@ -1645,12 +1645,6 @@ static void vega20_init_powergate_state(struct pp_hwmgr *hwmgr) data->uvd_power_gated = true; data->vce_power_gated = true; - - if (data->smu_features[GNLD_DPM_UVD].enabled) - data->uvd_power_gated = false; - - if (data->smu_features[GNLD_DPM_VCE].enabled) - data->vce_power_gated = false; } static int vega20_enable_dpm_tasks(struct pp_hwmgr *hwmgr) -- 2.28.0
Re: [PATCH] drm/amdgpu: stop touching bo->tbo.ttm
On 2020-08-07 at 8:33 a.m., Christian König wrote: > This is not allocated any more for SG BOs. Can you point me at the relevant TTM changes that require this change? We'd need to test that the SG BO is still working as expected. A doorbell self-ring test or a GPU HDP flush test in KFDTest should do the trick. Regards, Felix > > Signed-off-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > index 9015c7b76d60..2470b913038b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > @@ -1232,10 +1232,8 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( > domain_string(alloc_domain), ret); > goto err_bo_create; > } > - if (bo_type == ttm_bo_type_sg) { > + if (bo_type == ttm_bo_type_sg) > bo->tbo.sg = sg; > - bo->tbo.ttm->sg = sg; > - } > bo->kfd_bo = *mem; > (*mem)->bo = bo; > if (user_addr)
Re: [PATCH] drm/amdkfd: force raven as "dgpu" path
On 2020-08-07 at 4:25 a.m., Huang Rui wrote: > We still have a few iommu issues which need to address, so force raven > as "dgpu" path for the moment. > > Will enable IOMMUv2 since the issues are fixed. Do you mean "_when_ the issues are fixed"? The current iommuv2 troubles aside, I think this change breaks existing user mode. The existing Thunk is not prepared to see Raven as a dGPU. So this is not something we want to do in an upstream Linux kernel change. I suggest using the ignore_crat module parameter for the workaround instead. You may need to duplicate the raven_device_info and pick the right one depending on whether it is a dGPU or an APU. The only difference would be the need_iommu_device option. If ignore_crat is set, you can support Raven as a dGPU and require a corresponding Thunk change that conditionally supports Raven as a dGPU. I think such a change would also be the right direction for supporting Raven more universally in the future. It can be extended to conditionally treat Raven as a dGPU automatically in some situations: * broken or missing CRAT table * IOMMUv2 disabled Those are all situations where the current driver is broken anyway (and always has been), so it would not be a kernel change that breaks existing user mode. In addition the Thunk could be changed to downgrade a Raven APU to dGPU (by splitting the APU node into a separate CPU and dGPU node) if other dGPUs are present in the system to disable all the APU-specific code paths and allow all the GPUs to work together seamlessly with SVM. 
Regards, Felix > Signed-off-by: Huang Rui > --- > drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 6 ++ > drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4 ++-- > 2 files changed, 8 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c > b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c > index 6a250f8fcfb8..66d9f7280fe8 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c > @@ -22,6 +22,7 @@ > > #include > #include > +#include > #include "kfd_crat.h" > #include "kfd_priv.h" > #include "kfd_topology.h" > @@ -781,6 +782,11 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t > *size) > return -ENODATA; > } > > + /* Raven series will go with the fallback path to use virtual CRAT */ > + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD && > + boot_cpu_data.x86 == 0x17) > + return -ENODATA; > + > pcrat_image = kmemdup(crat_table, crat_table->length, GFP_KERNEL); > if (!pcrat_image) > return -ENOMEM; > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c > b/drivers/gpu/drm/amd/amdkfd/kfd_device.c > index d5e790f046b4..93179c928371 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c > @@ -116,6 +116,7 @@ static const struct kfd_device_info carrizo_device_info = > { > .num_xgmi_sdma_engines = 0, > .num_sdma_queues_per_engine = 2, > }; > +#endif > > static const struct kfd_device_info raven_device_info = { > .asic_family = CHIP_RAVEN, > @@ -128,13 +129,12 @@ static const struct kfd_device_info raven_device_info = > { > .num_of_watch_points = 4, > .mqd_size_aligned = MQD_SIZE_ALIGNED, > .supports_cwsr = true, > - .needs_iommu_device = true, > + .needs_iommu_device = false, > .needs_pci_atomics = true, > .num_sdma_engines = 1, > .num_xgmi_sdma_engines = 0, > .num_sdma_queues_per_engine = 2, > }; > -#endif > > static const struct kfd_device_info hawaii_device_info = { > .asic_family = CHIP_HAWAII, ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org 
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 7/7] drm/amd/display: Replace DRM private objects with subclassed DRM atomic state
On 2020-08-07 4:52 a.m., dan...@ffwll.ch wrote: On Thu, Jul 30, 2020 at 04:36:42PM -0400, Nicholas Kazlauskas wrote: @@ -440,7 +431,7 @@ struct dm_crtc_state { #define to_dm_crtc_state(x) container_of(x, struct dm_crtc_state, base) struct dm_atomic_state { - struct drm_private_state base; + struct drm_atomic_state base; struct dc_state *context; Also curiosity: Can't we just embed dc_state here, instead of a pointer? Then it would become a lot more obvious that mostly this is a state object container like drm_atomic_state, but for the DC specific state structures. And we could look into moving the actual DC states into drm private states perhaps (if that helps with the code structure and overall flow). Maybe as next steps. -Daniel It's the refcounting that's the problem with this stuff. I'd like to move DC to a model where we have no memory allocation/ownership but that might be a bit of a more long term plan at this point. Same with dc_plane_state and dc_stream_state as well - these could exist on the DRM objects as long as they're not refcounted. Regards, Nicholas Kazlauskas ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 3/7] drm/amd/display: Avoid using unvalidated tiling_flags and tmz_surface in prepare_planes
On 2020-08-07 4:30 a.m., dan...@ffwll.ch wrote: On Thu, Jul 30, 2020 at 04:36:38PM -0400, Nicholas Kazlauskas wrote: [Why] We're racing with userspace as the flags could potentially change from when we acquired and validated them in commit_check. Uh ... I didn't know these could change. I think my comments on Bas' series are even more relevant now. I think long term it would be best to bake these flags in at addfb time when modifiers aren't set. And otherwise always use the modifiers flag, and completely ignore the legacy flags here. -Daniel There's a set tiling/mod flags IOCTL that can be called after addfb happens, so unless there's some sort of driver magic preventing this from working when it's already been pinned for scanout, I don't see anything stopping this from happening. I still need to review the modifiers series in a little more detail, but that looks like a good approach to fixing these kinds of issues. Regards, Nicholas Kazlauskas [How] We unfortunately can't drop this function in its entirety from prepare_planes since we don't know the afb->address at commit_check time yet. So instead of querying new tiling_flags and tmz_surface, use the ones from the plane_state directly. While we're at it, also update the force_disable_dcc option based on the state from atomic check. 
Cc: Bhawanpreet Lakha Cc: Rodrigo Siqueira Signed-off-by: Nicholas Kazlauskas --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 36 ++- 1 file changed, 19 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index bf1881bd492c..f78c09c9585e 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -5794,14 +5794,8 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane, struct list_head list; struct ttm_validate_buffer tv; struct ww_acquire_ctx ticket; - uint64_t tiling_flags; uint32_t domain; int r; - bool tmz_surface = false; - bool force_disable_dcc = false; - - dm_plane_state_old = to_dm_plane_state(plane->state); - dm_plane_state_new = to_dm_plane_state(new_state); if (!new_state->fb) { DRM_DEBUG_DRIVER("No FB bound\n"); @@ -5845,27 +5839,35 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane, return r; } - amdgpu_bo_get_tiling_flags(rbo, &tiling_flags); - - tmz_surface = amdgpu_bo_encrypted(rbo); - ttm_eu_backoff_reservation(&ticket, &list); afb->address = amdgpu_bo_gpu_offset(rbo); amdgpu_bo_ref(rbo); + /** +* We don't do surface updates on planes that have been newly created, +* but we also don't have the afb->address during atomic check. +* +* Fill in buffer attributes depending on the address here, but only on +* newly created planes since they're not being used by DC yet and this +* won't modify global state. 
+*/ + dm_plane_state_old = to_dm_plane_state(plane->state); + dm_plane_state_new = to_dm_plane_state(new_state); + if (dm_plane_state_new->dc_state && - dm_plane_state_old->dc_state != dm_plane_state_new->dc_state) { - struct dc_plane_state *plane_state = dm_plane_state_new->dc_state; + dm_plane_state_old->dc_state != dm_plane_state_new->dc_state) { + struct dc_plane_state *plane_state = + dm_plane_state_new->dc_state; + bool force_disable_dcc = !plane_state->dcc.enable; - force_disable_dcc = adev->asic_type == CHIP_RAVEN && adev->in_suspend; fill_plane_buffer_attributes( adev, afb, plane_state->format, plane_state->rotation, - tiling_flags, &plane_state->tiling_info, - &plane_state->plane_size, &plane_state->dcc, - &plane_state->address, tmz_surface, - force_disable_dcc); + dm_plane_state_new->tiling_flags, + &plane_state->tiling_info, &plane_state->plane_size, + &plane_state->dcc, &plane_state->address, + dm_plane_state_new->tmz_surface, force_disable_dcc); } return 0; -- 2.25.1 ___ dri-devel mailing list dri-de...@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 5/7] drm/amd/display: Reset plane for anything that's not a FAST update
On 2020-08-07 4:34 a.m., dan...@ffwll.ch wrote: On Thu, Jul 30, 2020 at 04:36:40PM -0400, Nicholas Kazlauskas wrote: [Why] MEDIUM or FULL updates can require global validation or affect bandwidth. By treating these all simply as surface updates we aren't actually passing this through DC global validation. [How] There's currently no way to pass surface updates through DC global validation, nor do I think it's a good idea to change the interface to accept these. DC global validation itself is currently stateless, and we can move our update type checking to be stateless as well by duplicating DC surface checks in DM based on DRM properties. We wanted to rely on DC automatically determining this since DC knows best, but DM is ultimately what fills in everything into DC plane state so it does need to know as well. There are basically only three paths that we exercise in DM today: 1) Cursor (async update) 2) Pageflip (fast update) 3) Full pipe programming (medium/full updates) Which means that anything that's more than a pageflip really needs to go down path #3. So this change duplicates all the surface update checks based on DRM state instead inside of should_reset_plane(). Next step is dropping dm_determine_update_type_for_commit and we no longer require the old DC state at all for global validation. I think we do something similar in i915, where we have a "nothing except base address changed" fast path, but for anything else we fully compute a new state. Obviously you should try to keep global state synchronization to a minimum for this step, so it's not entirely only 2 options. Once we have the states, we compare them and figure out whether we can get away with a fast modeset operation (maybe what you guys call medium update). Anyway I think being slightly more aggressive with computing full state, and then falling back to more optimized update again is a good approach. Only risk is if we you have too much synchronization in your locking (e.g. 
modern compositors do like to change tiling and stuff, especially once you have modifiers enabled, so this shouldn't cause a sync across crtc except when absolutely needed). -Daniel Sounds like the right approach then. We can support tiling changes in the fast path, but the more optimized version of that last check is really linear <-> tiled. That requires global validation with DC to revalidate bandwidth and calculate requestor parameters for HW. So we'll have to stall for some of these changes unfortunately since we need the full HW state for validation. Regards, Nicholas Kazlauskas Optimization can come later so we don't reset DC planes at all for MEDIUM updates and avoid validation, but we might require some extra checks in DM to achieve this. Cc: Bhawanpreet Lakha Cc: Hersen Wu Signed-off-by: Nicholas Kazlauskas --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 25 +++ 1 file changed, 25 insertions(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 0d5f45742bb5..2cbb29199e61 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -8336,6 +8336,31 @@ static bool should_reset_plane(struct drm_atomic_state *state, if (old_other_state->crtc != new_other_state->crtc) return true; + /* Src/dst size and scaling updates. */ + if (old_other_state->src_w != new_other_state->src_w || + old_other_state->src_h != new_other_state->src_h || + old_other_state->crtc_w != new_other_state->crtc_w || + old_other_state->crtc_h != new_other_state->crtc_h) + return true; + + /* Rotation / mirroring updates. */ + if (old_other_state->rotation != new_other_state->rotation) + return true; + + /* Blending updates. */ + if (old_other_state->pixel_blend_mode != + new_other_state->pixel_blend_mode) + return true; + + /* Alpha updates. */ + if (old_other_state->alpha != new_other_state->alpha) + return true; + + /* Colorspace changes. 
*/ + if (old_other_state->color_range != new_other_state->color_range || + old_other_state->color_encoding != new_other_state->color_encoding) + return true; + /* Framebuffer checks fall at the end. */ if (!old_other_state->fb || !new_other_state->fb) continue; -- 2.25.1 ___ dri-devel mailing list dri-de...@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesk
RFC: How to adjust the trace pid?
Hi everybody, in amdgpu we have the following issue, and I'm seeking advice on how to handle it cleanly. We have a bunch of trace points which are related to the VM subsystem and are executed in either a work item, a kthread, or a foreign process context. Tracing the pid of the context we are executing in is not really that useful, so I'm wondering if we could just overwrite the pid recorded in the trace entry? The following patch does exactly that for the vm_grab_id() trace point, but I'm not 100% sure whether that is legal or not. Any ideas? Comments? Thanks, Christian. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu: adjust the pid in the grab_id trace point
Trace something useful instead of the pid of a kernel thread here. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h index 5da20fc166d9..07f99ef69d91 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h @@ -228,6 +228,7 @@ TRACE_EVENT(amdgpu_vm_grab_id, ), TP_fast_assign( + __entry->ent.pid = vm->task_info.pid; __entry->pasid = vm->pasid; __assign_str(ring, ring->name) __entry->vmid = job->vmid; -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu: stop touching bo->tbo.ttm
This is not allocated any more for SG BOs. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 9015c7b76d60..2470b913038b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1232,10 +1232,8 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( domain_string(alloc_domain), ret); goto err_bo_create; } - if (bo_type == ttm_bo_type_sg) { + if (bo_type == ttm_bo_type_sg) bo->tbo.sg = sg; - bo->tbo.ttm->sg = sg; - } bo->kfd_bo = *mem; (*mem)->bo = bo; if (user_addr) -- 2.25.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking
On Fri, Aug 7, 2020 at 1:59 PM Li, Dennis wrote: > > [AMD Public Use] > > Hi, Daniel, > Thanks for your review. And I also understand your concern. I guess you > missed the description of this issue, so I paste it again below, and > explain why this issue happens. > > For example, take an XGMI system with 2 GPU devices whose device entity is > named adev. Each adev has a separate reset_sem. > // device init function > void device_init(adev) { > ... > init_rwsem(&adev->reset_sem); > ... > } > > The XGMI system has two GPU devices, so the system will call device_init twice. > However the definition of the init_rwsem macro has a limitation: it uses a local > static lock_class_key to initialize the rw_sem, which causes each adev->reset_sem > to share the same lock_class_key. > > #define init_rwsem(sem) \ > do { \ > static struct lock_class_key __key;\ > \ > __init_rwsem((sem), #sem, &__key); \ > } while (0) > > And when a GPU hangs, we should do GPU recovery for all devices in the > hive. Therefore we should lock each adev->reset_sem to protect the GPU from being > accessed by other threads during recovery. The detailed recovery sequence is as > follows: > // Step1: lock all GPUs first: > for each adev of GPU0 or GPU1 { >down_write(adev->reset_sem); >do_suspend(adev); > } > > // Step2: > do_hive_recovery(hive); > > // Step3: resume and unlock GPUs > for each adev of GPU0 or GPU1 { > do_resume(adev) > up_write(adev->reset_sem); > } > > Each adev->reset_sem has the same lock_class_key, and lockdep will treat > them as the same rw_semaphore object. Therefore in step1, when locking GPU1, > lockdep will emit the warning below. > > I considered your proposal (using _nest_lock()) before. Just as > you said, _nest_lock() will hide true positive recursive locking. So I gave > up on it in the end. > > After reviewing the lockdep code, I found that lockdep supports > dynamic keys, so using separate lock_class_keys has no side effect. In > fact, init_rwsem also uses a dynamic key. 
Please see the call sequence of > init_rwsem and lockdep_set_class below: >1) init_rwsem -> __init_rwsem -> lockdep_init_map; >2) lockdep_set_class -> lockdep_init_map; > > Finally, back to your concern: you may worry that this change will > make the deadlock below undetectable. In fact, lockdep is still able > to detect the issue as a circular locking dependency, but there is no > "recursive locking" warning in this case. > Thread A: down_write(adev->reset_sem) for GPU 0 -> > down_write(adev->reset_sem) for GPU 1 -> ... -> up_write(adev->reset_sem) for > GPU 1 -> up_write(adev->reset_sem) for GPU 0 > Thread B: down_write(adev->reset_sem) for GPU 1 -> > down_write(adev->reset_sem) for GPU 0 -> ... -> up_write(adev->reset_sem) for > GPU 0 -> up_write(adev->reset_sem) for GPU 1 > > But lockdep can still detect recursive locking for this case: > Thread A: down_write(adev->reset_sem) for GPU 0 -> ...-> ...-> > down_write(adev->reset_sem) for GPU 0 Yeah, I guessed correctly what you're doing. My recommendation to use down_write_nest_lock still stands. This is for reset only, so a kernel-wide reset lock really shouldn't hurt. Or make it a lock per xgmi hive; I'm assuming that information is known somewhere. The problem with manual lockdep annotations is that they increase complexity. You have to keep all the annotations in mind, including justifications and which parts they still catch and which parts they don't catch. And there's zero performance justification for a gpu reset path to create some fancy lockdep special cases. Locking needs to be 1. maintainable, i.e. every time you need to write a multi-paragraph explanation like the above it's probably not. This obviously includes correctness, but it's even more important that people can easily understand what you're doing. 2. fast enough, where it matters. gpu reset just doesn't. 
> [ 584.110304] > [ 584.110590] WARNING: possible recursive locking detected > [ 584.110876] 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 Tainted: G OE > [ 584.64] > [ 584.111456] kworker/38:1/553 is trying to acquire lock: > [ 584.111721] 9b15ff0a47a0 (&adev->reset_sem){}, at: > amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] > [ 584.112112] >but task is already holding lock: > [ 584.112673] 9b1603d247a0 (&adev->reset_sem){}, at: > amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] > [ 584.113068] >other info that might help us debu
RE: [PATCH] drm/amdgpu: annotate a false positive recursive locking
[AMD Public Use] Hi, Daniel, Thanks for your review. And I also understand your concern. I guess you missed the description of this issue, so I paste it again below, and explain why this issue happens. For example, take an XGMI system with 2 GPU devices whose device entity is named adev. Each adev has a separate reset_sem. // device init function void device_init(adev) { ... init_rwsem(&adev->reset_sem); ... } The XGMI system has two GPU devices, so the system will call device_init twice. However the definition of the init_rwsem macro has a limitation: it uses a local static lock_class_key to initialize the rw_sem, which causes each adev->reset_sem to share the same lock_class_key. #define init_rwsem(sem) \ do { \ static struct lock_class_key __key;\ \ __init_rwsem((sem), #sem, &__key); \ } while (0) And when a GPU hangs, we should do GPU recovery for all devices in the hive. Therefore we should lock each adev->reset_sem to protect the GPU from being accessed by other threads during recovery. The detailed recovery sequence is as follows: // Step1: lock all GPUs first: for each adev of GPU0 or GPU1 { down_write(adev->reset_sem); do_suspend(adev); } // Step2: do_hive_recovery(hive); // Step3: resume and unlock GPUs for each adev of GPU0 or GPU1 { do_resume(adev) up_write(adev->reset_sem); } Each adev->reset_sem has the same lock_class_key, and lockdep will treat them as the same rw_semaphore object. Therefore in step1, when locking GPU1, lockdep will emit the warning below. I considered your proposal (using _nest_lock()) before. Just as you said, _nest_lock() will hide true positive recursive locking. So I gave up on it in the end. After reviewing the lockdep code, I found that lockdep supports dynamic keys, so using separate lock_class_keys has no side effect. In fact, init_rwsem also uses a dynamic key. 
Please see the call sequence of init_rwsem and lockdep_set_class below: 1) init_rwsem -> __init_rwsem -> lockdep_init_map; 2) lockdep_set_class -> lockdep_init_map; Finally, back to your concern: you may worry that this change will make the deadlock below undetectable. In fact, lockdep is still able to detect the issue as a circular locking dependency, but there is no "recursive locking" warning in this case. Thread A: down_write(adev->reset_sem) for GPU 0 -> down_write(adev->reset_sem) for GPU 1 -> ... -> up_write(adev->reset_sem) for GPU 1 -> up_write(adev->reset_sem) for GPU 0 Thread B: down_write(adev->reset_sem) for GPU 1 -> down_write(adev->reset_sem) for GPU 0 -> ... -> up_write(adev->reset_sem) for GPU 0 -> up_write(adev->reset_sem) for GPU 1 But lockdep can still detect recursive locking for this case: Thread A: down_write(adev->reset_sem) for GPU 0 -> ...-> ...-> down_write(adev->reset_sem) for GPU 0 [ 584.110304] [ 584.110590] WARNING: possible recursive locking detected [ 584.110876] 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 Tainted: G OE [ 584.64] [ 584.111456] kworker/38:1/553 is trying to acquire lock: [ 584.111721] 9b15ff0a47a0 (&adev->reset_sem){}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.112112] but task is already holding lock: [ 584.112673] 9b1603d247a0 (&adev->reset_sem){}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.113068] other info that might help us debug this: [ 584.113689] Possible unsafe locking scenario: [ 584.114350]CPU0 [ 584.114685] [ 584.115014] lock(&adev->reset_sem); [ 584.115349] lock(&adev->reset_sem); [ 584.115678] *** DEADLOCK *** [ 584.116624] May be due to missing lock nesting notation [ 584.117284] 4 locks held by kworker/38:1/553: [ 584.117616] #0: 9ad635c1d348 ((wq_completion)events){+.+.}, at: process_one_work+0x21f/0x630 [ 584.117967] #1: ac708e1c3e58 ((work_completion)(&con->recovery_work)){+.+.}, at: process_one_work+0x21f/0x630 [ 584.118358] #2: c1c2a5d0 
(&tmp->hive_lock){+.+.}, at: amdgpu_device_gpu_recover+0xae/0x1030 [amdgpu] [ 584.118786] #3: 9b1603d247a0 (&adev->reset_sem){}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] Best Regards Dennis Li -Original Message- From: Daniel Vetter Sent: Friday, August 7, 2020 5:45 PM To: Koenig, Christian Cc: Li, Dennis ; amd-gfx@lists.freedesktop.org; Deucher, Alexander ; Kuehling, Felix ; Zhang, Hawking Subject: Re: [PATCH] drm/amdgpu: annotate a false positive r
Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking
On Fri, Aug 7, 2020 at 11:34 AM Christian König wrote: > > On 07.08.20 at 11:23, Daniel Vetter wrote: > > On Fri, Aug 7, 2020 at 11:20 AM Daniel Vetter wrote: > >> On Fri, Aug 7, 2020 at 11:08 AM Christian König > >> wrote: > >>> [SNIP] > >>> What we should do instead is to make sure we have only a single lock > >>> for the complete hive. > >>> [Dennis Li] If we use a single lock, users must wait for all > >>> devices to resume successfully, but in fact their tasks are only > >>> running on device A. It is not friendly to users. > >> Well that is intentional I would say. We can only restart submissions > >> after all devices are resumed successfully, because otherwise it can > >> happen that a job on device A depends on memory on device B which > >> isn't accessible. > >> [Dennis Li] Yes, you are right. The driver must make sure that the shared > >> resources (such as the shared memory) are ready before unlocking the lock > >> of each adev one by one. But each device still has private hardware > >> resources such as video and display. > > Yeah, but those are negligible and we have a team working on display > > support for XGMI. So this will sooner or later become a problem as well. > > > > I still think that a single rwsem for the whole hive is still the best > > option here. > > > > [Dennis Li] For your above concern, we can open a new thread to discuss > > it. If we make a decision to use a single lock after discussing, we can > > create another patch for it. > That's a really good argument, but I still hesitate to merge this patch. > How severe is the lockdep splat? > > At bare minimum we need a "/* TODO: " comment explaining why we do this and how > to remove the workaround again. > [Dennis Li] It is not a workaround. According to the design of lockdep, each > rw_semaphore should have a separate lock_class_key. I have explained > that init_rwsem has the above limitation, so we must correct it. > >>> Yeah, but this explanation is not really sufficient. 
Again this is not > >>> a limitation of init_rwsem, but intentional behavior. > >>> > >>> In other words, in most cases lockdep should splat if you use rw > >>> semaphores like this. > >>> > >>> That we avoid this by locking multiple amdgpu devices always in the same > >>> order is rather questionable driver design. > >> Yeah don't give multiple locking classes like that, that's fairly > >> terrible. > > Ok, that is exactly the answer I feared we would get. > > >> The right fix is mutex_lock_nest_lock, which we're using in > >> ww_mutex and which is also used in mm_take_all_locks. > > Ah! Enlightenment! So in this case we should use down_write_nested() in > this special space and all is well? Nope. There are two kinds of nesting that lockdep supports: 1. _nested() variants, where you supply an integer parameter specifying the lockdep subclass. That's like setting a locking subclass at lock creation time, except much worse because you change the lockdep class at runtime. This can lead to stuff like: struct mutex A; mutex_init(&A); mutex_lock(&A); mutex_lock_nested(&A, SINGLE_DEPTH_NESTING); which is a very clear deadlock, but lockdep will think it's all fine. Don't use that. 2. _nest_lock() variants, where you specify a super-lock, which makes sure that all sub-locks cannot be acquired concurrently (or you promise there's some other trick to avoid deadlocks). Lockdep checks that you've acquired the super-lock, and then allows arbitrary nesting. This is what ww_mutex uses, and what mm_take_all_locks uses. If the super-lock is an actual mutex, then this is actually deadlock-free. With ww_mutex it's a fake lockdep_map in the ww_acquire_ctx, but we guarantee deadlock-freeness through the ww algorithm. The fake super-lock for ww_mutex is acquired when you get the ticket. No 2 is the thing you want, not no 1. 
> >> > >> If you solve the locking ordering by sorting all the locks you need > >> (ugh), then you can also just use a fake lockdep_map for your special > >> function only. > >> > >> Also in general the idea of an rwsem in conjunction with xgmi cross > >> device memory handling just plain freaks me out :-) Please make sure > >> you prime this lock against dma_resv and dma_fence (and maybe also > >> drm_modeset_lock if this lock is outside of dma_resv), and ideally > >> this should only ever be used for setup stuff when loading drivers. I > >> guess that's why you went with a per-device rwsem. If it's really only > >> for setup then just do the simple thing of having an > >> xgmi_hive_reconfigure lock, which all the rwsems nest within, and no > >> fancy lock ordering or other tricks. > > Well this is for the group reset of all devices in a hive, and you need > to reboot to change the order the devices are in the list. Ok, that sounds safe enough. I'd go with an xgmi_hive_lock then, and down_write_nest_lock(&adev->xgmi_rwsem, &xgm
Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking
On 07.08.20 at 11:23, Daniel Vetter wrote: On Fri, Aug 7, 2020 at 11:20 AM Daniel Vetter wrote: On Fri, Aug 7, 2020 at 11:08 AM Christian König wrote: [SNIP] What we should do instead is to make sure we have only a single lock for the complete hive. [Dennis Li] If we use a single lock, users must wait for all devices to resume successfully, but in fact their tasks are only running on device A. It is not friendly to users. Well that is intentional I would say. We can only restart submissions after all devices are resumed successfully, because otherwise it can happen that a job on device A depends on memory on device B which isn't accessible. [Dennis Li] Yes, you are right. The driver must make sure that the shared resources (such as the shared memory) are ready before unlocking the lock of each adev one by one. But each device still has private hardware resources such as video and display. Yeah, but those are negligible and we have a team working on display support for XGMI. So this will sooner or later become a problem as well. I still think that a single rwsem for the whole hive is still the best option here. [Dennis Li] For your above concern, we can open a new thread to discuss it. If we make a decision to use a single lock after discussing, we can create another patch for it. That's a really good argument, but I still hesitate to merge this patch. How severe is the lockdep splat? At bare minimum we need a "/* TODO: " comment explaining why we do this and how to remove the workaround again. [Dennis Li] It is not a workaround. According to the design of lockdep, each rw_semaphore should have a separate lock_class_key. I have explained that init_rwsem has the above limitation, so we must correct it. Yeah, but this explanation is not really sufficient. Again this is not a limitation of init_rwsem, but intentional behavior. In other words, in most cases lockdep should splat if you use rw semaphores like this. 
That we avoid this by locking multiple amdgpu devices always in the same order is rather questionable driver design. Yeah don't give multiple locking classes like that, that's fairly terrible. Ok, that is exactly the answer I feared we would get. The right fix is mutex_lock_nest_lock, which we're using in ww_mutex and which is also used in mm_take_all_locks. Ah! Enlightenment! So in this case we should use down_write_nested() in this special space and all is well? If you solve the locking ordering by sorting all the locks you need (ugh), then you can also just use a fake lockdep_map for your special function only. Also in general the idea of an rwsem in conjunction with xgmi cross device memory handling just plain freaks me out :-) Please make sure you prime this lock against dma_resv and dma_fence (and maybe also drm_modeset_lock if this lock is outside of dma_resv), and ideally this should only ever be used for setup stuff when loading drivers. I guess that's why you went with a per-device rwsem. If it's really only for setup then just do the simple thing of having an xgmi_hive_reconfigure lock, which all the rwsems nest within, and no fancy lock ordering or other tricks. Well this is for the group reset of all devices in a hive, and you need to reboot to change the order the devices are in the list. Regards, Christian. The core network driver (net/core/dev.c) has a similar use case to ours. Just took a look at that, but this is a completely different use case. The lockdep classes are grouped by driver type to note the difference in the network stack. A quick search showed that no other device driver is doing this in the kernel, so I'm not sure such a behavior wouldn't be rejected. Adding Daniel to comment on this as well. Felix had some really good arguments to make that an urgent fix, so either we come up with some real rework of this or at least add a big "/* TODO*/". 
Aside from "I have questions" I don't think there's any reason not to fix this correctly with mutex_lock_nest_lock. It really shouldn't be a semantic change from what I understand here. Also one more with my drm maintainer hat on, as a heads-up: Dave and I looked at i915 gem code a bit too much, and we're appalled at the massive over-use of lockdep class hacks and dubious nesting annotations. Expect a crackdown on anyone who tries to play clever tricks here, we have a few too many already :-) So no mutex_lock_nested (totally different beast from mutex_lock_nest_lock, and really unsafe since it's a runtime change of the lockdep class) and also no lockdep_set_class trickery or anything like that. We need locking hierarchies which mere humans can comprehend and review. Cheers, Daniel Regards, Christian. -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch/
[PATCH] drm/amd/powerplay: correct UVD/VCE PG state on custom pptable uploading
The UVD/VCE PG state is managed by UVD and VCE IP. It's error-prone to assume the bootup state in SMU based on the dpm status. Change-Id: Ib88298ab9812d7d242592bcd55eea140bef6696a Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c index acc926c20c55..da84012b7fd5 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c @@ -1645,12 +1645,6 @@ static void vega20_init_powergate_state(struct pp_hwmgr *hwmgr) data->uvd_power_gated = true; data->vce_power_gated = true; - - if (data->smu_features[GNLD_DPM_UVD].enabled) - data->uvd_power_gated = false; - - if (data->smu_features[GNLD_DPM_VCE].enabled) - data->vce_power_gated = false; } static int vega20_enable_dpm_tasks(struct pp_hwmgr *hwmgr) -- 2.28.0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amd/powerplay: correct Vega20 cached smu feature state
Correct the cached smu feature state on pp_features sysfs setting. Change-Id: Icc4c3ce764876a0ffdc86ad4c8a8b9c9f0ed0e97 Signed-off-by: Evan Quan --- .../drm/amd/powerplay/hwmgr/vega20_hwmgr.c| 38 +-- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c index 90c78f127f7e..acc926c20c55 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c @@ -984,10 +984,7 @@ static int vega20_disable_all_smu_features(struct pp_hwmgr *hwmgr) { struct vega20_hwmgr *data = (struct vega20_hwmgr *)(hwmgr->backend); - uint64_t features_enabled; - int i; - bool enabled; - int ret = 0; + int i, ret = 0; PP_ASSERT_WITH_CODE((ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_DisableAllSmuFeatures, @@ -995,17 +992,8 @@ static int vega20_disable_all_smu_features(struct pp_hwmgr *hwmgr) "[DisableAllSMUFeatures] Failed to disable all smu features!", return ret); - ret = vega20_get_enabled_smc_features(hwmgr, &features_enabled); - PP_ASSERT_WITH_CODE(!ret, - "[DisableAllSMUFeatures] Failed to get enabled smc features!", - return ret); - - for (i = 0; i < GNLD_FEATURES_MAX; i++) { - enabled = (features_enabled & data->smu_features[i].smu_feature_bitmap) ? 
- true : false; - data->smu_features[i].enabled = enabled; - data->smu_features[i].supported = enabled; - } + for (i = 0; i < GNLD_FEATURES_MAX; i++) + data->smu_features[i].enabled = 0; return 0; } @@ -3242,10 +3230,11 @@ static int vega20_get_ppfeature_status(struct pp_hwmgr *hwmgr, char *buf) static int vega20_set_ppfeature_status(struct pp_hwmgr *hwmgr, uint64_t new_ppfeature_masks) { - uint64_t features_enabled; - uint64_t features_to_enable; - uint64_t features_to_disable; - int ret = 0; + struct vega20_hwmgr *data = + (struct vega20_hwmgr *)(hwmgr->backend); + uint64_t features_enabled, features_to_enable, features_to_disable; + int i, ret = 0; + bool enabled; if (new_ppfeature_masks >= (1ULL << GNLD_FEATURES_MAX)) return -EINVAL; @@ -3274,6 +3263,17 @@ static int vega20_set_ppfeature_status(struct pp_hwmgr *hwmgr, uint64_t new_ppfe return ret; } + /* Update the cached feature enablement state */ + ret = vega20_get_enabled_smc_features(hwmgr, &features_enabled); + if (ret) + return ret; + + for (i = 0; i < GNLD_FEATURES_MAX; i++) { + enabled = (features_enabled & data->smu_features[i].smu_feature_bitmap) ? + true : false; + data->smu_features[i].enabled = enabled; + } + return 0; } -- 2.28.0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
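The cached-state refresh the patch moves into vega20_set_ppfeature_status() is just a bitmask-to-flags sweep. A minimal userspace sketch of that loop, with a simplified stand-in for the driver's feature struct (the real code lives in vega20_hwmgr.c and uses `smu_features[i].smu_feature_bitmap`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the vega20_hwmgr feature table entry. */
struct smu_feature {
    uint64_t smu_feature_bitmap; /* bit this feature occupies in the SMC mask */
    bool enabled;                /* cached enablement state */
};

/* Mirrors the loop the patch adds after applying a new ppfeature mask:
 * refresh every cached 'enabled' flag from the mask the firmware reports. */
static void update_cached_feature_state(struct smu_feature *features,
                                        int count, uint64_t features_enabled)
{
    for (int i = 0; i < count; i++)
        features[i].enabled =
            (features_enabled & features[i].smu_feature_bitmap) != 0;
}
```

The point of the patch is that this refresh runs against what the SMC actually reports, rather than assuming the request succeeded verbatim.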
Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking
On Fri, Aug 7, 2020 at 11:20 AM Daniel Vetter wrote: > > On Fri, Aug 7, 2020 at 11:08 AM Christian König > wrote: > > > > [SNIP] > > What we should do instead is to make sure we have only a single lock > > for the complete hive instead. > > [Dennis Li] If we use a single lock, users will must wait for all > > devices resuming successfully, but in fact their tasks are only > > running in device a. It is not friendly to users. > > >>> Well that is intentional I would say. We can only restart submissions > > >>> after all devices are resumed successfully, cause otherwise it can > > >>> happen that a job on device A depends on memory on device B which isn't > > >>> accessible. > > >>> [Dennis Li] Yes, you are right. Driver have make sure that the shared > > >>> resources(such as the shard memory) are ready before unlock the lock of > > >>> adev one by one. But each device still has private hardware resources > > >>> such as video and display. > > >> Yeah, but those are negligible and we have a team working on display > > >> support for XGMI. So this will sooner or later become a problem as well. > > >> > > >> I still think that a single rwsem for the whole hive is still the best > > >> option here. > > >> > > >> [Dennis Li] For your above concern, we can open a new thread to discuss > > >> it. If we make a decision to use a single after discussing, we can > > >> create another patch for it. > > > That's a really good argument, but I still hesitate to merge this patch. > > > How severe is the lockdep splat? > > > > > > At bare minimum we need a "/* TODO: " comment why we do this and how > > > to remove the workaround again. > > > [Dennis Li] It is not a workaround. According to design of lockdep, each > > > rw_semaphore should has a separated lock_class_key. I have explained that > > > init_rwsem has the above limitation, so we must correct it. > > > > Yeah, but this explanation is not really sufficient. 
Again this is not > > an limitation of init_rwsem, but intentional behavior. > > > > In other words in most cases lockdep should splat if you use rw > > semaphores like this. > > > > That we avoid this by locking multiple amdgpu device always in the same > > order is rather questionable driver design. > > Yeah don't give multiple locking classes like that, that's fairly > terrible. The right fix is mutex_lock_nest_lock, which we're using in > ww_mutex or in which is also used in mm_take_all_locks. > > If you solve the locking ordering by sorting all the locks you need > (ugh), then you can also just use a fake lockdep_map for your special > function only. > > Also in general the idea of an rwsem in conjunction with xgmi cross > device memory handling just plain freaks me out :-) Please make sure > you prime this lock against dma_resv and dma_fence (and maybe also > drm_modeset_lock if this lock is outside of dma_resv), and ideally > this should only ever be used for setup stuff when loading drivers. I > guess that's why you went with a per-device rwsem. If it's really only > for setup then just do the simple thing of having an > xgmi_hive_reconfigure lock, which all the rwsems nest within, and no > fancy lock ordering or other tricks. > > > > Core network driver (net/core/dev.c) has the similar use case with us. > > > > Just took a look at that, but this is completely different use case. The > > lockdep classes are grouped by driver type to note the difference in the > > network stack. > > > > A quick search showed that no other device driver is doing this in the > > kernel, so I'm not sure if such a behavior wouldn't be rejected. Adding > > Daniel to comment on this as well. > > > > Felix had some really good arguments to make that an urgent fix, so > > either we come up with some real rework of this or at least add a big > > "/* TODO*/". > > Aside from "I have questions" I don't think there's any reason to no > fix this correctly with mutex_lock_nest_lock. 
Really shouldn't be a > semantic change from what I understand here. Also one more with my drm maintainer hat on, as a heads-up: Dave&me looked at i915 gem code a bit too much, and we're appalled at the massive over-use of lockdep class hacks and dubious nesting annotations. Expect crack down on anyone who tries to play clever tricks here, we have a few too many already :-) So no mutex_lock_nested (totally different beast from mutex_lock_nest_lock, and really unsafe since it's a runtime change of the lockdep class) and also not any lockdep_set_class trickery or anything like that. We need locking hierarchies which mere humans can comprehend and review. Cheers, Daniel > > Cheers, Daniel > > > > > Regards, > > Christian. > > > > > > > > Regards, > > > Christian. > > > > > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking
On Fri, Aug 7, 2020 at 11:08 AM Christian König wrote: > > [SNIP] > What we should do instead is to make sure we have only a single lock for > the complete hive instead. > [Dennis Li] If we use a single lock, users will must wait for all > devices resuming successfully, but in fact their tasks are only running > in device a. It is not friendly to users. > >>> Well that is intentional I would say. We can only restart submissions > >>> after all devices are resumed successfully, cause otherwise it can happen > >>> that a job on device A depends on memory on device B which isn't > >>> accessible. > >>> [Dennis Li] Yes, you are right. Driver have make sure that the shared > >>> resources(such as the shard memory) are ready before unlock the lock of > >>> adev one by one. But each device still has private hardware resources > >>> such as video and display. > >> Yeah, but those are negligible and we have a team working on display > >> support for XGMI. So this will sooner or later become a problem as well. > >> > >> I still think that a single rwsem for the whole hive is still the best > >> option here. > >> > >> [Dennis Li] For your above concern, we can open a new thread to discuss > >> it. If we make a decision to use a single after discussing, we can create > >> another patch for it. > > That's a really good argument, but I still hesitate to merge this patch. > > How severe is the lockdep splat? > > > > At bare minimum we need a "/* TODO: " comment why we do this and how to > > remove the workaround again. > > [Dennis Li] It is not a workaround. According to design of lockdep, each > > rw_semaphore should has a separated lock_class_key. I have explained that > > init_rwsem has the above limitation, so we must correct it. > > Yeah, but this explanation is not really sufficient. Again this is not > an limitation of init_rwsem, but intentional behavior. > > In other words in most cases lockdep should splat if you use rw > semaphores like this. 
> > That we avoid this by locking multiple amdgpu device always in the same > order is rather questionable driver design. Yeah don't give multiple locking classes like that, that's fairly terrible. The right fix is mutex_lock_nest_lock, which we're using in ww_mutex or in which is also used in mm_take_all_locks. If you solve the locking ordering by sorting all the locks you need (ugh), then you can also just use a fake lockdep_map for your special function only. Also in general the idea of an rwsem in conjunction with xgmi cross device memory handling just plain freaks me out :-) Please make sure you prime this lock against dma_resv and dma_fence (and maybe also drm_modeset_lock if this lock is outside of dma_resv), and ideally this should only ever be used for setup stuff when loading drivers. I guess that's why you went with a per-device rwsem. If it's really only for setup then just do the simple thing of having an xgmi_hive_reconfigure lock, which all the rwsems nest within, and no fancy lock ordering or other tricks. > > Core network driver (net/core/dev.c) has the similar use case with us. > > Just took a look at that, but this is completely different use case. The > lockdep classes are grouped by driver type to note the difference in the > network stack. > > A quick search showed that no other device driver is doing this in the > kernel, so I'm not sure if such a behavior wouldn't be rejected. Adding > Daniel to comment on this as well. > > Felix had some really good arguments to make that an urgent fix, so > either we come up with some real rework of this or at least add a big > "/* TODO*/". Aside from "I have questions" I don't think there's any reason to no fix this correctly with mutex_lock_nest_lock. Really shouldn't be a semantic change from what I'm understand here. Cheers, Daniel > > Regards, > Christian. > > > > > Regards, > > Christian. 
> > > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
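The "sorting all the locks you need" option Daniel mentions can be shown with a small userspace analogue (pthreads here, not driver code — the real implementation would use per-device rwsems and lockdep annotations): a hive-wide operation takes every per-device lock in one fixed order, here ascending pointer address, so two concurrent hive-wide operations can never acquire the same pair of locks in opposite orders and deadlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a device in an XGMI hive. */
struct fake_device {
    pthread_mutex_t lock;
};

/* Sort the lock list into a canonical (address) order. */
static void sort_by_address(struct fake_device **devs, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (devs[j] < devs[i]) {
                struct fake_device *tmp = devs[i];
                devs[i] = devs[j];
                devs[j] = tmp;
            }
}

/* Take every per-device lock in ascending address order, regardless of the
 * order the caller listed the devices in. */
static void hive_lock_all(struct fake_device **devs, int n)
{
    sort_by_address(devs, n);
    for (int i = 0; i < n; i++)
        pthread_mutex_lock(&devs[i]->lock);
}

/* Release in reverse acquisition order. */
static void hive_unlock_all(struct fake_device **devs, int n)
{
    for (int i = n - 1; i >= 0; i--)
        pthread_mutex_unlock(&devs[i]->lock);
}
```

As the thread notes, lockdep cannot see that the order is globally consistent, which is exactly why mutex_lock_nest_lock (acquisition under one outer lock) is the preferred annotation over per-device lock classes.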
Re: [PATCH] drm/amdgpu: make sure userptr ttm is allocated
On 2020-08-06 2:56 p.m., Christian König wrote: > We need to allocate that manually now. > > Signed-off-by: Christian König > Fixes: 2ddef17678bc (HEAD) drm/ttm: make TT creation purely optional v3 > --- > .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c| 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 17 +++-- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 4 ++-- > 4 files changed, 15 insertions(+), 10 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > index 9015c7b76d60..55d2e870fdda 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > @@ -564,7 +564,7 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t > user_addr) > > mutex_lock(&process_info->lock); > > - ret = amdgpu_ttm_tt_set_userptr(bo->tbo.ttm, user_addr, 0); > + ret = amdgpu_ttm_tt_set_userptr(&bo->tbo, user_addr, 0); > if (ret) { > pr_err("%s: Failed to set userptr: %d\n", __func__, ret); > goto out; I suspect more needs to be done for KFD, e.g. amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu can dereference bo->tbo.ttm before calling init_user_pages, and there are more usages of bo->tbo.ttm that need to be audited. -- Earthling Michel Dänzer | https://redhat.com Libre software enthusiast | Mesa and X developer ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking
[SNIP] What we should do instead is to make sure we have only a single lock for the complete hive instead. [Dennis Li] If we use a single lock, users will must wait for all devices resuming successfully, but in fact their tasks are only running in device a. It is not friendly to users. Well that is intentional I would say. We can only restart submissions after all devices are resumed successfully, cause otherwise it can happen that a job on device A depends on memory on device B which isn't accessible. [Dennis Li] Yes, you are right. Driver have make sure that the shared resources(such as the shard memory) are ready before unlock the lock of adev one by one. But each device still has private hardware resources such as video and display. Yeah, but those are negligible and we have a team working on display support for XGMI. So this will sooner or later become a problem as well. I still think that a single rwsem for the whole hive is still the best option here. [Dennis Li] For your above concern, we can open a new thread to discuss it. If we make a decision to use a single after discussing, we can create another patch for it. That's a really good argument, but I still hesitate to merge this patch. How severe is the lockdep splat? At bare minimum we need a "/* TODO: " comment why we do this and how to remove the workaround again. [Dennis Li] It is not a workaround. According to design of lockdep, each rw_semaphore should has a separated lock_class_key. I have explained that init_rwsem has the above limitation, so we must correct it. Yeah, but this explanation is not really sufficient. Again this is not an limitation of init_rwsem, but intentional behavior. In other words in most cases lockdep should splat if you use rw semaphores like this. That we avoid this by locking multiple amdgpu device always in the same order is rather questionable driver design. Core network driver (net/core/dev.c) has the similar use case with us. 
Just took a look at that, but this is completely different use case. The lockdep classes are grouped by driver type to note the difference in the network stack. A quick search showed that no other device driver is doing this in the kernel, so I'm not sure if such a behavior wouldn't be rejected. Adding Daniel to comment on this as well. Felix had some really good arguments to make that an urgent fix, so either we come up with some real rework of this or at least add a big "/* TODO*/". Regards, Christian. Regards, Christian.
Re: [PATCH 7/7] drm/amd/display: Replace DRM private objects with subclassed DRM atomic state
On Thu, Jul 30, 2020 at 04:36:42PM -0400, Nicholas Kazlauskas wrote: > @@ -440,7 +431,7 @@ struct dm_crtc_state { > #define to_dm_crtc_state(x) container_of(x, struct dm_crtc_state, base) > > struct dm_atomic_state { > - struct drm_private_state base; > + struct drm_atomic_state base; > > struct dc_state *context; Also curiosity: Can't we just embed dc_state here, instead of a pointer? Then it would become a lot more obvious that mostly this is a state object container like drm_atomic_state, but for the DC specific state structures. And we could look into moving the actual DC states into drm private states perhaps (if that helps with the code structure and overall flow). Maybe as next steps. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
[PATCH] drm/amdgpu: Skip some registers config for SRIOV
Some registers are not accessible to virtual function setup, so skip their initialization when in VF-SRIOV mode. v2: move SRIOV VF check into specify functions; modify commit description and comment. Signed-off-by: Liu ChengZhe --- drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c | 19 +++ drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c | 19 +++ 2 files changed, 38 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c index 1f6112b7fa49..80c906a0383f 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c @@ -182,6 +182,12 @@ static void gfxhub_v2_1_init_cache_regs(struct amdgpu_device *adev) { uint32_t tmp; + /* These registers are not accessible to VF-SRIOV. +* The PF will program them instead. +*/ + if (amdgpu_sriov_vf(adev)) + return; + /* Setup L2 cache */ tmp = RREG32_SOC15(GC, 0, mmGCVM_L2_CNTL); tmp = REG_SET_FIELD(tmp, GCVM_L2_CNTL, ENABLE_L2_CACHE, 1); @@ -237,6 +243,12 @@ static void gfxhub_v2_1_enable_system_domain(struct amdgpu_device *adev) static void gfxhub_v2_1_disable_identity_aperture(struct amdgpu_device *adev) { + /* These registers are not accessible to VF-SRIOV. +* The PF will program them instead. +*/ + if (amdgpu_sriov_vf(adev)) + return; + WREG32_SOC15(GC, 0, mmGCVM_L2_CONTEXT1_IDENTITY_APERTURE_LOW_ADDR_LO32, 0x); WREG32_SOC15(GC, 0, mmGCVM_L2_CONTEXT1_IDENTITY_APERTURE_LOW_ADDR_HI32, @@ -373,6 +385,13 @@ void gfxhub_v2_1_set_fault_enable_default(struct amdgpu_device *adev, bool value) { u32 tmp; + + /* These registers are not accessible to VF-SRIOV. +* The PF will program them instead. 
+*/ + if (amdgpu_sriov_vf(adev)) + return; + tmp = RREG32_SOC15(GC, 0, mmGCVM_L2_PROTECTION_FAULT_CNTL); tmp = REG_SET_FIELD(tmp, GCVM_L2_PROTECTION_FAULT_CNTL, RANGE_PROTECTION_FAULT_ENABLE_DEFAULT, value); diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c index d83912901f73..8acb3b625afe 100644 --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c @@ -181,6 +181,12 @@ static void mmhub_v2_0_init_cache_regs(struct amdgpu_device *adev) { uint32_t tmp; + /* These registers are not accessible to VF-SRIOV. +* The PF will program them instead. +*/ + if (amdgpu_sriov_vf(adev)) + return; + /* Setup L2 cache */ tmp = RREG32_SOC15(MMHUB, 0, mmMMVM_L2_CNTL); tmp = REG_SET_FIELD(tmp, MMVM_L2_CNTL, ENABLE_L2_CACHE, 1); @@ -236,6 +242,12 @@ static void mmhub_v2_0_enable_system_domain(struct amdgpu_device *adev) static void mmhub_v2_0_disable_identity_aperture(struct amdgpu_device *adev) { + /* These registers are not accessible to VF-SRIOV. +* The PF will program them instead. +*/ + if (amdgpu_sriov_vf(adev)) + return; + WREG32_SOC15(MMHUB, 0, mmMMVM_L2_CONTEXT1_IDENTITY_APERTURE_LOW_ADDR_LO32, 0x); @@ -365,6 +377,13 @@ void mmhub_v2_0_gart_disable(struct amdgpu_device *adev) void mmhub_v2_0_set_fault_enable_default(struct amdgpu_device *adev, bool value) { u32 tmp; + + /* These registers are not accessible to VF-SRIOV. +* The PF will program them instead. +*/ + if (amdgpu_sriov_vf(adev)) + return; + tmp = RREG32_SOC15(MMHUB, 0, mmMMVM_L2_PROTECTION_FAULT_CNTL); tmp = REG_SET_FIELD(tmp, MMVM_L2_PROTECTION_FAULT_CNTL, RANGE_PROTECTION_FAULT_ENABLE_DEFAULT, value); -- 2.25.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
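The patch above combines two common patterns: a read-modify-write of a register field (the RREG32_SOC15()/REG_SET_FIELD()/WREG32_SOC15() sequences) and a guard clause that skips the programming entirely on a virtual function. A generic sketch of both — the names and the bit layout here are illustrative, not the real amdgpu macros:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Update one field of a register value, preserving all other bits.
 * This is what REG_SET_FIELD()-style macros expand to, conceptually. */
static inline uint32_t set_field(uint32_t reg, uint32_t mask,
                                 unsigned int shift, uint32_t value)
{
    return (reg & ~mask) | ((value << shift) & mask);
}

/* Guard-clause pattern from gfxhub_v2_1_init_cache_regs(): bail out first
 * when running as an SR-IOV virtual function, because the PF owns these
 * registers. */
static void init_cache_regs(bool is_sriov_vf, uint32_t *l2_cntl)
{
    if (is_sriov_vf)
        return; /* PF programs these registers instead */

    /* e.g. an ENABLE_L2_CACHE field in bit 0 (illustrative layout) */
    *l2_cntl = set_field(*l2_cntl, 0x1u, 0, 1);
}
```

Putting the VF check inside each function (rather than at the callers) keeps the "registers the PF owns" knowledge in one place per register block, which is what v2 of the patch does.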
Re: [PATCH 7/7] drm/amd/display: Replace DRM private objects with subclassed DRM atomic state
On Thu, Jul 30, 2020 at 04:36:42PM -0400, Nicholas Kazlauskas wrote: > [Why] > DM atomic check was structured in a way that we required old DC state > in order to dynamically add and remove planes and streams from the > context to build the DC state context for validation. > > DRM private objects were used to carry over the last DC state and > were added to the context on nearly every commit - regardless of fast > or full so we could check whether or not the new state could affect > bandwidth. > > The problem with this model is that DRM private objects do not > implicitly stall out other commits. > > So if you have two commits touching separate DRM objects they could > run concurrently and potentially execute out of order - leading to a > use-after-free. > > If we want this to be safe we have two options: > 1. Stall out concurrent commits since they touch the same private object > 2. Refactor DM to not require old DC state and drop private object usage > > [How] > This implements approach #2 since it still allows for judder free > updates in multi-display scenarios. > > I'll list the big changes in order at a high level: > > 1. Subclass DRM atomic state instead of using DRM private objects. > > DC relied on the old state to determine which changes cause bandwidth > updates but now we have DM perform similar checks based on DRM state > instead - dropping the requirement for old state to exist at all. > > This means that we can now build a new DC context from scratch whenever > we have something that DM thinks could affect bandwidth. > > Whenever we need to rebuild bandwidth we now add all CRTCs and planes > to the DRM state in order to get the absolute set of DC streams and > DC planes. > > This introduces a stall on other commits, but this stall already > exists because of the lock_and_validation logic and it's necessary > since updates may move around pipes and require full reprogramming. 
Hm I think long term would be neat if you can first just add the dc streams for the current planes (if you convert them individually to private state objects), and ask DC to recompute specifics just for that. If DC then says "yes I need an even bigger recompute" only then do you grab all the streams and everything else. The idea is that essentially you treat individual stream objects as read-locks on that part of the overall global state, and only when you need to do a write do you grab all the "read locks" to do the update. But might not actually help for your hw. Just highlighting this as a pattern we've done. -Daniel > > 2. Drop workarounds to add planes to maintain z-order early in atomic > check. This is no longer needed because of the changes for (1). > > This also involves fixing up should_plane_reset checks since we can just > avoid resetting streams and planes when they haven't actually changed. > > 3. Rework dm_update_crtc_state and dm_update_plane_state to be single > pass instead of two pass. > > This is necessary since we no longer have the dc_state to add and > remove planes to the context in and we want to defer creation to the > end of commit_check. > > It also makes the logic a lot simpler to follow since as an added bonus. 
> > Cc: Bhawanpreet Lakha > Cc: Harry Wentland > Cc: Leo Li > Cc: Daniel Vetter > Signed-off-by: Nicholas Kazlauskas > --- > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 720 +++--- > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 11 +- > 2 files changed, 280 insertions(+), 451 deletions(-) > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > index 59829ec81694..97a7dfc620e8 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > @@ -1839,7 +1839,6 @@ static int dm_resume(void *handle) > struct drm_plane *plane; > struct drm_plane_state *new_plane_state; > struct dm_plane_state *dm_new_plane_state; > - struct dm_atomic_state *dm_state = > to_dm_atomic_state(dm->atomic_obj.state); > enum dc_connection_type new_connection_type = dc_connection_none; > struct dc_state *dc_state; > int i, r, j; > @@ -1879,11 +1878,6 @@ static int dm_resume(void *handle) > > return 0; > } > - /* Recreate dc_state - DC invalidates it when setting power state to > S3. */ > - dc_release_state(dm_state->context); > - dm_state->context = dc_create_state(dm->dc); > - /* TODO: Remove dc_state->dccg, use dc->dccg directly. */ > - dc_resource_state_construct(dm->dc, dm_state->context); > > /* Before powering on DC we need to re-initialize DMUB. */ > r = dm_dmub_hw_init(adev); > @@ -2019,11 +2013,51 @@ const struct amdgpu_ip_block_version dm_ip_block = > * *WIP* > */ > > +struct drm_atomic_state *dm_atomic_state_alloc(struct drm_device *dev) > +{ > + struct dm_atomic_state *dm_state; > + > + dm_state = kzalloc(sizeof(*dm_st
Re: [PATCH 5/7] drm/amd/display: Reset plane for anything that's not a FAST update
On Thu, Jul 30, 2020 at 04:36:40PM -0400, Nicholas Kazlauskas wrote: > [Why] > MEDIUM or FULL updates can require global validation or affect > bandwidth. By treating these all simply as surface updates we aren't > actually passing this through DC global validation. > > [How] > There's currently no way to pass surface updates through DC global > validation, nor do I think it's a good idea to change the interface > to accept these. > > DC global validation itself is currently stateless, and we can move > our update type checking to be stateless as well by duplicating DC > surface checks in DM based on DRM properties. > > We wanted to rely on DC automatically determining this since DC knows > best, but DM is ultimately what fills in everything into DC plane > state so it does need to know as well. > > There are basically only three paths that we exercise in DM today: > > 1) Cursor (async update) > 2) Pageflip (fast update) > 3) Full pipe programming (medium/full updates) > > Which means that anything that's more than a pageflip really needs to > go down path #3. > > So this change duplicates all the surface update checks based on DRM > state instead inside of should_reset_plane(). > > Next step is dropping dm_determine_update_type_for_commit and we no > longer require the old DC state at all for global validation. I think we do something similar in i915, where we have a "nothing except base address changed" fast path, but for anything else we fully compute a new state. Obviously you should try to keep global state synchronization to a minimum for this step, so it's not entirely only 2 options. Once we have the states, we compare them and figure out whether we can get away with a fast modeset operation (maybe what you guys call medium update). Anyway I think being slightly more aggressive with computing full state, and then falling back to more optimized update again is a good approach. Only risk is if we you have too much synchronization in your locking (e.g. 
modern compositors do like to change tiling and stuff, especially once you have modifiers enabled, so this shouldn't cause a sync across crtc except when absolutely needed). -Daniel > > Optimization can come later so we don't reset DC planes at all for > MEDIUM udpates and avoid validation, but we might require some extra > checks in DM to achieve this. > > Cc: Bhawanpreet Lakha > Cc: Hersen Wu > Signed-off-by: Nicholas Kazlauskas > --- > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 25 +++ > 1 file changed, 25 insertions(+) > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > index 0d5f45742bb5..2cbb29199e61 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > @@ -8336,6 +8336,31 @@ static bool should_reset_plane(struct drm_atomic_state > *state, > if (old_other_state->crtc != new_other_state->crtc) > return true; > > + /* Src/dst size and scaling updates. */ > + if (old_other_state->src_w != new_other_state->src_w || > + old_other_state->src_h != new_other_state->src_h || > + old_other_state->crtc_w != new_other_state->crtc_w || > + old_other_state->crtc_h != new_other_state->crtc_h) > + return true; > + > + /* Rotation / mirroring updates. */ > + if (old_other_state->rotation != new_other_state->rotation) > + return true; > + > + /* Blending updates. */ > + if (old_other_state->pixel_blend_mode != > + new_other_state->pixel_blend_mode) > + return true; > + > + /* Alpha updates. */ > + if (old_other_state->alpha != new_other_state->alpha) > + return true; > + > + /* Colorspace changes. */ > + if (old_other_state->color_range != > new_other_state->color_range || > + old_other_state->color_encoding != > new_other_state->color_encoding) > + return true; > + > /* Framebuffer checks fall at the end. 
*/ > if (!old_other_state->fb || !new_other_state->fb) > continue; > -- > 2.25.1 > > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
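The checks the patch adds to should_reset_plane() amount to a field-by-field comparison of old and new plane state: anything beyond a pure base-address flip forces the full-programming path. A cut-down, hypothetical version (the real code walks drm_atomic_state and compares drm_plane_state members):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the drm_plane_state fields the patch compares. */
struct plane_state {
    uint32_t src_w, src_h, crtc_w, crtc_h;
    unsigned int rotation;
    int pixel_blend_mode;
    uint16_t alpha;
    int color_range, color_encoding;
};

static bool plane_needs_reset(const struct plane_state *o,
                              const struct plane_state *n)
{
    if (o->src_w != n->src_w || o->src_h != n->src_h ||
        o->crtc_w != n->crtc_w || o->crtc_h != n->crtc_h)
        return true;            /* src/dst size or scaling change */
    if (o->rotation != n->rotation)
        return true;            /* rotation / mirroring change */
    if (o->pixel_blend_mode != n->pixel_blend_mode || o->alpha != n->alpha)
        return true;            /* blending or alpha change */
    if (o->color_range != n->color_range ||
        o->color_encoding != n->color_encoding)
        return true;            /* colorspace change */
    return false;               /* fast (pageflip-style) update */
}
```

This matches the three-path model from the commit message: anything that trips one of these checks goes down the full pipe-programming path rather than the pageflip fast path.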
Re: [PATCH 3/7] drm/amd/display: Avoid using unvalidated tiling_flags and tmz_surface in prepare_planes
On Thu, Jul 30, 2020 at 04:36:38PM -0400, Nicholas Kazlauskas wrote: > [Why] > We're racing with userspace as the flags could potentially change > from when we acquired and validated them in commit_check. Uh ... I didn't know these could change. I think my comments on Bas' series are even more relevant now. I think long term would be best to bake these flags in at addfb time when modifiers aren't set. And otherwise always use the modifiers flag, and completely ignore the legacy flags here. -Daniel > > [How] > We unfortunately can't drop this function in its entirety from > prepare_planes since we don't know the afb->address at commit_check > time yet. > > So instead of querying new tiling_flags and tmz_surface use the ones > from the plane_state directly. > > While we're at it, also update the force_disable_dcc option based > on the state from atomic check. > > Cc: Bhawanpreet Lakha > Cc: Rodrigo Siqueira > Signed-off-by: Nicholas Kazlauskas > --- > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 36 ++- > 1 file changed, 19 insertions(+), 17 deletions(-) > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > index bf1881bd492c..f78c09c9585e 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > @@ -5794,14 +5794,8 @@ static int dm_plane_helper_prepare_fb(struct drm_plane > *plane, > struct list_head list; > struct ttm_validate_buffer tv; > struct ww_acquire_ctx ticket; > - uint64_t tiling_flags; > uint32_t domain; > int r; > - bool tmz_surface = false; > - bool force_disable_dcc = false; > - > - dm_plane_state_old = to_dm_plane_state(plane->state); > - dm_plane_state_new = to_dm_plane_state(new_state); > > if (!new_state->fb) { > DRM_DEBUG_DRIVER("No FB bound\n"); > @@ -5845,27 +5839,35 @@ static int dm_plane_helper_prepare_fb(struct > drm_plane *plane, > return r; > } > > - amdgpu_bo_get_tiling_flags(rbo, &tiling_flags); > - > - 
tmz_surface = amdgpu_bo_encrypted(rbo); > - > ttm_eu_backoff_reservation(&ticket, &list); > > afb->address = amdgpu_bo_gpu_offset(rbo); > > amdgpu_bo_ref(rbo); > > + /** > + * We don't do surface updates on planes that have been newly created, > + * but we also don't have the afb->address during atomic check. > + * > + * Fill in buffer attributes depending on the address here, but only on > + * newly created planes since they're not being used by DC yet and this > + * won't modify global state. > + */ > + dm_plane_state_old = to_dm_plane_state(plane->state); > + dm_plane_state_new = to_dm_plane_state(new_state); > + > if (dm_plane_state_new->dc_state && > - dm_plane_state_old->dc_state != > dm_plane_state_new->dc_state) { > - struct dc_plane_state *plane_state = > dm_plane_state_new->dc_state; > + dm_plane_state_old->dc_state != dm_plane_state_new->dc_state) { > + struct dc_plane_state *plane_state = > + dm_plane_state_new->dc_state; > + bool force_disable_dcc = !plane_state->dcc.enable; > > - force_disable_dcc = adev->asic_type == CHIP_RAVEN && > adev->in_suspend; > fill_plane_buffer_attributes( > adev, afb, plane_state->format, plane_state->rotation, > - tiling_flags, &plane_state->tiling_info, > - &plane_state->plane_size, &plane_state->dcc, > - &plane_state->address, tmz_surface, > - force_disable_dcc); > + dm_plane_state_new->tiling_flags, > + &plane_state->tiling_info, &plane_state->plane_size, > + &plane_state->dcc, &plane_state->address, > + dm_plane_state_new->tmz_surface, force_disable_dcc); > } > > return 0; > -- > 2.25.1 > > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 0/7] drm/amd/display: Drop DRM private objects from amdgpu_dm
On Thu, Jul 30, 2020 at 04:36:35PM -0400, Nicholas Kazlauskas wrote: > Based on the analysis of the bug from [1] the best course of action seems > to be swapping off of DRM private objects back to subclassing DRM atomic > state instead. > > This patch series implements this change, but not yet the other changes > suggested in the threads from that bug - these will come later. > > CCing dri-devel per Daniel's suggestion since this issue brought > some interesting misuse of private objects. I ended up reading around a bit, and it feels like the sub-objects might make a reasonable private state structure perhaps. Like dc_stream_state, at least when reading around in e.g. dc_remove_stream_from_ctx. But would need to come up with a plan how to integrate this on the other os side of DC I guess :-) Anyway I'd say more evidence that dc_state needs to subclass drm_atomic_state. Another thing I wondered is whether we should rename drm_atomic_state to drm_atomic_state_update, so it's clear it's the container with the updated states, not a real state object thing itself. 
-Daniel > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=207383 > > Nicholas Kazlauskas (7): > drm/amd/display: Store tiling_flags and tmz_surface on dm_plane_state > drm/amd/display: Reset plane when tiling flags change > drm/amd/display: Avoid using unvalidated tiling_flags and tmz_surface > in prepare_planes > drm/amd/display: Use validated tiling_flags and tmz_surface in > commit_tail > drm/amd/display: Reset plane for anything that's not a FAST update > drm/amd/display: Drop dm_determine_update_type_for_commit > drm/amd/display: Replace DRM private objects with subclassed DRM > atomic state > > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 967 ++ > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 13 +- > 2 files changed, 343 insertions(+), 637 deletions(-) > > -- > 2.25.1 > > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdkfd: force raven as "dgpu" path
We still have a few IOMMU issues which need to be addressed, so force Raven down the "dgpu" path for the moment. IOMMUv2 will be enabled once the issues are fixed. Signed-off-by: Huang Rui --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 6 ++ drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4 ++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c index 6a250f8fcfb8..66d9f7280fe8 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c @@ -22,6 +22,7 @@ #include #include +#include #include "kfd_crat.h" #include "kfd_priv.h" #include "kfd_topology.h" @@ -781,6 +782,11 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size) return -ENODATA; } + /* Raven series will go with the fallback path to use virtual CRAT */ + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD && + boot_cpu_data.x86 == 0x17) + return -ENODATA; + pcrat_image = kmemdup(crat_table, crat_table->length, GFP_KERNEL); if (!pcrat_image) return -ENOMEM; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index d5e790f046b4..93179c928371 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -116,6 +116,7 @@ static const struct kfd_device_info carrizo_device_info = { .num_xgmi_sdma_engines = 0, .num_sdma_queues_per_engine = 2, }; +#endif static const struct kfd_device_info raven_device_info = { .asic_family = CHIP_RAVEN, @@ -128,13 +129,12 @@ static const struct kfd_device_info raven_device_info = { .num_of_watch_points = 4, .mqd_size_aligned = MQD_SIZE_ALIGNED, .supports_cwsr = true, - .needs_iommu_device = true, + .needs_iommu_device = false, .needs_pci_atomics = true, .num_sdma_engines = 1, .num_xgmi_sdma_engines = 0, .num_sdma_queues_per_engine = 2, }; -#endif static const struct kfd_device_info hawaii_device_info = { .asic_family = CHIP_HAWAII, -- 2.25.1
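The CPU-family check the patch adds to kfd_create_crat_image_acpi() can be read as a small predicate. The sketch below mirrors its logic in isolation; the helper name is hypothetical, and the vendor constant is reproduced here only for illustration (in the kernel it comes from asm/processor.h):

```c
#include <stdbool.h>

#define X86_VENDOR_AMD 2  /* illustrative stand-in for the kernel's constant */

/* Hypothetical stand-alone version of the patch's check: on AMD family
 * 0x17 (which includes the Raven series), the ACPI CRAT image is
 * rejected with -ENODATA so the driver falls back to a virtual CRAT,
 * i.e. the "dgpu" topology path. */
static bool raven_uses_virtual_crat(unsigned int vendor, unsigned int family)
{
    return vendor == X86_VENDOR_AMD && family == 0x17;
}
```

Note the check is per CPU family, not per GPU ASIC: any APU on a family-17h host takes the fallback path, which is exactly the coarse hammer the commit message describes as temporary.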
Re: [PATCH 1/7] drm/amd/display: Store tiling_flags and tmz_surface on dm_plane_state
On Thu, Jul 30, 2020 at 04:36:36PM -0400, Nicholas Kazlauskas wrote: > [Why] > Store these in advance so we can reuse them later in commit_tail without > having to reserve the fbo again. > > These will also be used for checking for tiling changes when deciding > to reset the plane or not. I've also dropped some comments on Bas' series for adding modifiers which might be relevant for shuffling all this. But yeah stuff this into plane state sounds like a good idea. -Daniel > > [How] > This change should mostly be a refactor. Only commit check is affected > for now and I'll drop the get_fb_info calls in prepare_planes and > commit_tail after. > > This runs a prepass loop once we think that all planes have been added > to the context and replaces the get_fb_info calls with accessing the > dm_plane_state instead. > > Cc: Bhawanpreet Lakha > Cc: Rodrigo Siqueira > Signed-off-by: Nicholas Kazlauskas > --- > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 60 +++ > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 2 + > 2 files changed, 37 insertions(+), 25 deletions(-) > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > index 8d64f5fde7e2..7cc5ab90ce13 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > @@ -3700,8 +3700,17 @@ static int fill_dc_scaling_info(const struct > drm_plane_state *state, > static int get_fb_info(const struct amdgpu_framebuffer *amdgpu_fb, > uint64_t *tiling_flags, bool *tmz_surface) > { > - struct amdgpu_bo *rbo = gem_to_amdgpu_bo(amdgpu_fb->base.obj[0]); > - int r = amdgpu_bo_reserve(rbo, false); > + struct amdgpu_bo *rbo; > + int r; > + > + if (!amdgpu_fb) { > + *tiling_flags = 0; > + *tmz_surface = false; > + return 0; > + } > + > + rbo = gem_to_amdgpu_bo(amdgpu_fb->base.obj[0]); > + r = amdgpu_bo_reserve(rbo, false); > > if (unlikely(r)) { > /* Don't show error message when returning -ERESTARTSYS */ > @@ 
-4124,13 +4133,10 @@ static int fill_dc_plane_attributes(struct > amdgpu_device *adev, > struct drm_crtc_state *crtc_state) > { > struct dm_crtc_state *dm_crtc_state = to_dm_crtc_state(crtc_state); > - const struct amdgpu_framebuffer *amdgpu_fb = > - to_amdgpu_framebuffer(plane_state->fb); > + struct dm_plane_state *dm_plane_state = to_dm_plane_state(plane_state); > struct dc_scaling_info scaling_info; > struct dc_plane_info plane_info; > - uint64_t tiling_flags; > int ret; > - bool tmz_surface = false; > bool force_disable_dcc = false; > > ret = fill_dc_scaling_info(plane_state, &scaling_info); > @@ -4142,15 +4148,12 @@ static int fill_dc_plane_attributes(struct > amdgpu_device *adev, > dc_plane_state->clip_rect = scaling_info.clip_rect; > dc_plane_state->scaling_quality = scaling_info.scaling_quality; > > - ret = get_fb_info(amdgpu_fb, &tiling_flags, &tmz_surface); > - if (ret) > - return ret; > - > force_disable_dcc = adev->asic_type == CHIP_RAVEN && adev->in_suspend; > - ret = fill_dc_plane_info_and_addr(adev, plane_state, tiling_flags, > + ret = fill_dc_plane_info_and_addr(adev, plane_state, > + dm_plane_state->tiling_flags, > &plane_info, > &dc_plane_state->address, > - tmz_surface, > + dm_plane_state->tmz_surface, > force_disable_dcc); > if (ret) > return ret; > @@ -5753,6 +5756,10 @@ dm_drm_plane_duplicate_state(struct drm_plane *plane) > dc_plane_state_retain(dm_plane_state->dc_state); > } > > + /* Framebuffer hasn't been updated yet, so retain old flags. 
*/ > + dm_plane_state->tiling_flags = old_dm_plane_state->tiling_flags; > + dm_plane_state->tmz_surface = old_dm_plane_state->tmz_surface; > + > return &dm_plane_state->base; > } > > @@ -8557,13 +8564,9 @@ dm_determine_update_type_for_commit(struct > amdgpu_display_manager *dm, > continue; > > for_each_oldnew_plane_in_state(state, plane, old_plane_state, > new_plane_state, j) { > - const struct amdgpu_framebuffer *amdgpu_fb = > - to_amdgpu_framebuffer(new_plane_state->fb); > struct dc_plane_info *plane_info = > &bundle->plane_infos[num_plane]; > struct dc_flip_addrs *flip_addr = > &bundle->flip_addrs[num_plane]; > struct dc_scaling_info *scaling_info = > &bundle->scaling_infos[num_plane]; > - uint64_t tiling_flags; > -
[PATCH v2] drm/amdgpu: annotate a false positive recursive locking
[ 584.110304] [ 584.110590] WARNING: possible recursive locking detected [ 584.110876] 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 Tainted: G OE [ 584.64] [ 584.111456] kworker/38:1/553 is trying to acquire lock: [ 584.111721] 9b15ff0a47a0 (&adev->reset_sem){}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.112112] but task is already holding lock: [ 584.112673] 9b1603d247a0 (&adev->reset_sem){}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.113068] other info that might help us debug this: [ 584.113689] Possible unsafe locking scenario: [ 584.114350]CPU0 [ 584.114685] [ 584.115014] lock(&adev->reset_sem); [ 584.115349] lock(&adev->reset_sem); [ 584.115678] *** DEADLOCK *** [ 584.116624] May be due to missing lock nesting notation [ 584.117284] 4 locks held by kworker/38:1/553: [ 584.117616] #0: 9ad635c1d348 ((wq_completion)events){+.+.}, at: process_one_work+0x21f/0x630 [ 584.117967] #1: ac708e1c3e58 ((work_completion)(&con->recovery_work)){+.+.}, at: process_one_work+0x21f/0x630 [ 584.118358] #2: c1c2a5d0 (&tmp->hive_lock){+.+.}, at: amdgpu_device_gpu_recover+0xae/0x1030 [amdgpu] [ 584.118786] #3: 9b1603d247a0 (&adev->reset_sem){}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.119222] stack backtrace: [ 584.119990] CPU: 38 PID: 553 Comm: kworker/38:1 Kdump: loaded Tainted: G OE 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 [ 584.120782] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019 [ 584.121223] Workqueue: events amdgpu_ras_do_recovery [amdgpu] [ 584.121638] Call Trace: [ 584.122050] dump_stack+0x98/0xd5 [ 584.122499] __lock_acquire+0x1139/0x16e0 [ 584.122931] ? trace_hardirqs_on+0x3b/0xf0 [ 584.123358] ? cancel_delayed_work+0xa6/0xc0 [ 584.123771] lock_acquire+0xb8/0x1c0 [ 584.124197] ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.124599] down_write+0x49/0x120 [ 584.125032] ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.125472] amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.125910] ? 
amdgpu_ras_error_query+0x1b8/0x2a0 [amdgpu] [ 584.126367] amdgpu_ras_do_recovery+0x159/0x190 [amdgpu] [ 584.126789] process_one_work+0x29e/0x630 [ 584.127208] worker_thread+0x3c/0x3f0 [ 584.127621] ? __kthread_parkme+0x61/0x90 [ 584.128014] kthread+0x12f/0x150 [ 584.128402] ? process_one_work+0x630/0x630 [ 584.128790] ? kthread_park+0x90/0x90 [ 584.129174] ret_from_fork+0x3a/0x50 Give each adev its own lock_class_key to avoid false positive recursive locking. v2: 1. register adev->lock_key with lockdep, otherwise lockdep will report the warning below [ 1216.705820] BUG: key 890183b647d0 has not been registered! [ 1216.705924] [ cut here ] [ 1216.705972] DEBUG_LOCKS_WARN_ON(1) [ 1216.705997] WARNING: CPU: 20 PID: 541 at kernel/locking/lockdep.c:3743 lockdep_init_map+0x150/0x210 Signed-off-by: Dennis Li Change-Id: I7571efeccbf15483982031d00504a353031a854a diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index e97c088d03b3..766dc8f8c8a0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -967,6 +967,7 @@ struct amdgpu_device { atomic_tin_gpu_reset; enum pp_mp1_state mp1_state; struct rw_semaphore reset_sem; + struct lock_class_key lock_key; struct amdgpu_doorbell_index doorbell_index; struct mutexnotifier_lock; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 62ecac97fbd2..ae0a576f9895 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3037,6 +3037,8 @@ int amdgpu_device_init(struct amdgpu_device *adev, mutex_init(&adev->virt.vf_errors.lock); hash_init(adev->mn_hash); init_rwsem(&adev->reset_sem); + lockdep_register_key(&adev->lock_key); + lockdep_set_class(&adev->reset_sem, &adev->lock_key); atomic_set(&adev->in_gpu_reset, 0); mutex_init(&adev->psp.mutex); mutex_init(&adev->notifier_lock); @@ -3411,6 +3413,8 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
amdgpu_pmu_fini(adev); if (adev->discovery_bin) amdgpu_discovery_fini(adev); + + lockdep_unregister_key(&adev->lock_key); } -- 2.17.1
RE: [PATCH] drm/amdgpu: annotate a false positive recursive locking
[AMD Public Use] > [AMD Public Use] > >> [SNIP] I think it is a limitation of init_rwsem. >>> And exactly that's wrong, this is intentional and perfectly correct. >>> >>> [Dennis Li] I don't understand. Why is it perfectly correct? >>> For example, we define two rw_sems: a and b. If we don't check the init_rwsem >>> definition, we may think case #1 and case #2 have the same behavior, but in >>> fact they are different. >>> >>> Case 1: >>> init_rwsem(&a); >>> init_rwsem(&b); >>> >>> Case 2: >>> void init_rwsem_ext(rw_sem *sem) >>> { >>> init_rwsem(sem); >>> } >>> init_rwsem_ext(&a); >>> init_rwsem_ext(&b); >>> >>> As far as I know it is perfectly possible that the locks in the hive are >>> not always grabbed in the same order. And that's why lockdep is complaining >>> about this. >>> [Dennis Li] No. Let me take a hive with two devices (a and b) to explain why >>> lockdep complains. >>> >>> // First the driver locks the reset_sem of all devices >>> down_write(&a->reset_sem); do_suspend(a); >>> down_write(&b->reset_sem); // Because b shares the same lock_class_key >>> with a, lockdep will treat a and b as the same rw_sem and complain here. >>> do_suspend(b); >>> >>> // do recovery >>> do_hive_recovery() >>> >>> // unlock the reset_sem of all devices do_resume(a); >>> up_write(&a->reset_sem); do_resume(b); up_write(&b->reset_sem); >> Ah! Now I understand what you are working around. So the problem is the >> static lock_class_key in the macro? >> [Dennis Li] Yes. The author of init_rwsem might not have considered a use >> case like ours. Well, this is also really not the intended use case. When you lock the same rwsem class recursively you can easily run into deadlocks if you don't keep the order the same all the time. And abstracting init functions behind your own layer is a no-go in the Linux kernel as well. >>> What we should do instead is to make sure we have only a single lock for >>> the complete hive.
>>> [Dennis Li] If we use a single lock, users must wait for all devices to >>> resume successfully, even when their tasks are only running on device >>> a. It is not friendly to users. >> Well, that is intentional I would say. We can only restart submissions after >> all devices are resumed successfully, because otherwise it can happen that a >> job on device A depends on memory on device B which isn't accessible. >> [Dennis Li] Yes, you are right. The driver must make sure that the shared >> resources (such as the shared memory) are ready before unlocking each adev's lock >> one by one. But each device still has private hardware resources such as >> video and display. > Yeah, but those are negligible and we have a team working on display support > for XGMI. So this will sooner or later become a problem as well. > > I still think that a single rwsem for the whole hive is the best option > here. > > [Dennis Li] For your above concern, we can open a new thread to discuss it. > If we decide to use a single lock after discussing, we can create another > patch for it. That's a really good argument, but I still hesitate to merge this patch. How severe is the lockdep splat? At bare minimum we need a "/* TODO: " comment explaining why we do this and how to remove the workaround again. [Dennis Li] It is not a workaround. According to the design of lockdep, each rw_semaphore should have a separate lock_class_key. I have explained that init_rwsem has the above limitation, so we must correct it. The core network driver (net/core/dev.c) has a similar use case to ours. Regards, Christian. > > Best Regards > Dennis Li >> Regards, >> Christian. >> >>> Regards, >>> Christian. >>> >>> Am 06.08.20 um 11:15 schrieb Li, Dennis: [AMD Official Use Only - Internal Distribution Only] Hi, Christian, For this case, it is safe to use separate lock keys. Please see the definition of init_rwsem below.
Every init_rwsem call uses a new static key, but the devices of the hive share the same initialization code, so their lock_class_keys are the same. I think it is a limitation of init_rwsem. In our case, it should be correct that each adev's reset_sem has a different lock_class_key. BTW, this change doesn't affect deadlock detection; it just corrects it. #define init_rwsem(sem)\ do { \ static struct lock_class_key __key; \ \ __init_rwsem((sem), #sem, &__key); \ } while (0) Best Regards Dennis Li -Original Message- From: Koenig, Christian Sent: Thursday, August 6, 2020 4:13 PM To: Li, Dennis ; amd-gfx@lists.freedesktop.org; Deucher, Alexander ; Kuehling, Felix ; Zhang, Hawking Subject: Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking
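The two cases Dennis Li contrasts can be demonstrated outside the kernel. The sketch below is plain C (using GCC/Clang statement expressions; all names are illustrative, not kernel code) that mimics init_rwsem's `static struct lock_class_key __key`: each textual expansion of the macro creates its own static object, while a shared init helper contains only one expansion, so every caller funnels through a single key — which is why all devices of a hive end up in one lockdep class:

```c
/* Mimics init_rwsem()'s per-call-site static key: the static inside
 * the macro is one object per textual expansion, not per invocation. */
#define LOCK_KEY() ({ static char __key; (void *)&__key; })

/* Case 1: two direct expansions -> two distinct keys. */
static void *key_a(void) { return LOCK_KEY(); }
static void *key_b(void) { return LOCK_KEY(); }

/* Case 2: a common init helper has a single expansion, so every
 * "device" initialized through it shares the same key -- the situation
 * lockdep flags as recursive locking on the hive. */
static void *init_common(void) { return LOCK_KEY(); }
static void *hive_dev_a(void) { return init_common(); }
static void *hive_dev_b(void) { return init_common(); }
```

Comparing the returned addresses shows that `key_a` and `key_b` differ while `hive_dev_a` and `hive_dev_b` coincide, matching the case 1 / case 2 distinction in the thread.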
Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking
Am 2020-08-07 um 2:57 a.m. schrieb Christian König: [snip] > That's a really good argument, but I still hesitate to merge this > patch. How severe is the lockdep splat? I argued before that any lockdep splat is bad, because it disables further lockdep checking and can hide other lockdep problems down the road. The longer you live with it, the more potential locking problems you can introduce unknowingly, that you'll need to work through one at a time, as you uncover them later. Regards, Felix > > At bare minimum we need a "/* TODO: " comment why we do this and > how to remove the workaround again. > > Regards, > Christian. > >> >> Best Regards >> Dennis lI >>> Regards, >>> Christian. >>> Regards, Christian. Am 06.08.20 um 11:15 schrieb Li, Dennis: > [AMD Official Use Only - Internal Distribution Only] > > Hi, Christian, > For this case, it is safe to use separated lock key. > Please see the definition of init_rwsem as the below. Every > init_rwsem calling will use a new static key, but devices of the > hive share the same code to do initialization, so their > lock_class_key are the same. I think it is a limitation of > init_rwsem. In our case, it should be correct that reset_sem of > each adev has different lock_class_key. BTW, this change doesn't > effect dead-lock detection and just correct it. > > #define init_rwsem(sem) \ > do { \ > static struct lock_class_key __key; \ > \ > __init_rwsem((sem), #sem, &__key); \ > } while (0) > > Best Regards > Dennis Li > -Original Message- > From: Koenig, Christian > Sent: Thursday, August 6, 2020 4:13 PM > To: Li, Dennis ; amd-gfx@lists.freedesktop.org; > Deucher, Alexander ; Kuehling, Felix > ; Zhang, Hawking > Subject: Re: [PATCH] drm/amdgpu: annotate a false positive recursive > locking > > Preventing locking problems during implementation is obviously a > good approach, but lockdep has proven to be massively useful for > finding and fixing problems. 
> > Disabling lockdep splat by annotating lock with separate classes > is usually a no-go and only allowed if there is no other potential > approach. > > In this case here we should really clean things up instead. > > Christian. > > Am 06.08.20 um 09:44 schrieb Li, Dennis: >> [AMD Official Use Only - Internal Distribution Only] >> >> Hi, Christian, >> I agree with your concern. However we shouldn't rely >> on system to detect dead-lock, and should consider this when >> doing code implementation. In fact, dead-lock detection isn't >> enabled by default. >> For your proposal to remove reset_sem into the hive >> structure, we can open a new topic to discuss it. Currently we >> couldn't make sure which is the best solution. For example, with >> your proposal, we must wait for all devices resuming successfully >> before resubmit an old task in one device, which will effect >> performance. >> >> Best Regards >> Dennis Li >> -Original Message- >> From: amd-gfx On Behalf Of >> Christian König >> Sent: Thursday, August 6, 2020 3:08 PM >> To: Li, Dennis ; amd-gfx@lists.freedesktop.org; >> Deucher, Alexander ; Kuehling, Felix >> ; Zhang, Hawking >> Subject: Re: [PATCH] drm/amdgpu: annotate a false positive >> recursive locking >> >> Am 06.08.20 um 09:02 schrieb Dennis Li: >>> [ 584.110304] >>> [ 584.110590] WARNING: possible recursive locking detected >>> [ 584.110876] 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 Tainted: >>> G OE >>> [ 584.64] >>> [ 584.111456] kworker/38:1/553 is trying to acquire lock: >>> [ 584.111721] 9b15ff0a47a0 (&adev->reset_sem){}, at: >>> amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.112112] >>> but task is already holding lock: >>> [ 584.112673] 9b1603d247a0 (&adev->reset_sem){}, at: >>> amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.113068] >>> other info that might help us debug this: >>> [ 584.113689] Possible unsafe locking scenario: >>> >>> [ 584.114350] CPU0 >>> [ 584.114685] >>> [ 584.115014] lock(&adev->reset_sem); >>> [ 584.115349] 
lock(&adev->reset_sem); >>> [ 584.115678] >>> *** DEADLOCK *** >>> >>> [ 584.116624] May be due to missing lock nesting notation >>> >>> [ 584.117284] 4 locks held by kworker/38:1/553: >>> [ 584.117616] #0: 9ad635c1d348 >>> ((wq_completion)events){+.+.},