Re: [PATCH Review 1/1] drm/amdgpu: fix smu not match warning
On 11/16/2021 1:13 PM, Stanley.Yang wrote: update smu driver if version to avoid mismatch log Change-Id: I97f2bc4ed9a9cba313b744e2ff6812c90b244935 Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/pm/inc/smu_v13_0.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/inc/smu_v13_0.h b/drivers/gpu/drm/amd/pm/inc/smu_v13_0.h index e5d3b0d1a032..2e35885c7287 100644 --- a/drivers/gpu/drm/amd/pm/inc/smu_v13_0.h +++ b/drivers/gpu/drm/amd/pm/inc/smu_v13_0.h @@ -27,7 +27,7 @@ #define SMU13_DRIVER_IF_VERSION_INV 0x #define SMU13_DRIVER_IF_VERSION_YELLOW_CARP 0x04 -#define SMU13_DRIVER_IF_VERSION_ALDE 0x07 +#define SMU13_DRIVER_IF_VERSION_ALDE 0x08 This is not an independent change, it should go along with a change in interface file. Please post the changes in smu13_driver_if_aldebaran.h along with this as one patch. Thanks, Lijo /* MP Apertures */ #define MP0_Public0x0380
Re: [PATCH] drm/amdgpu: support new mode-1 reset interface
On 11/16/2021 12:53 PM, Tao Zhou wrote: If gpu reset is triggered by ras fatal error, tell it to smu in mode-1 reset message. Signed-off-by: Tao Zhou --- .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c| 21 --- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c index 35145db6eedf..6f3d064a8232 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c @@ -1426,16 +1426,31 @@ int smu_v13_0_set_azalia_d3_pme(struct smu_context *smu) int smu_v13_0_mode1_reset(struct smu_context *smu) { - u32 smu_version; + u32 smu_version, fatal_err, param; int ret = 0; + struct amdgpu_device *adev = smu->adev; + struct amdgpu_ras *ras = amdgpu_ras_get_context(adev); + + fatal_err = 0; + param = SMU_RESET_MODE_1; + /* * PM FW support SMU_MSG_GfxDeviceDriverReset from 68.07 */ smu_cmn_get_smc_version(smu, NULL, _version); if (smu_version < 0x00440700) ret = smu_cmn_send_smc_msg(smu, SMU_MSG_Mode1Reset, NULL); - else - ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_GfxDeviceDriverReset, SMU_RESET_MODE_1, NULL); + else { + /* fatal error triggered by ras, PMFW supports the flag + from 68.44.0 */ + if ((smu_version >= 0x00442c00) && ras && + atomic_read(>in_recovery)) + fatal_err = 1; + From PMFW version, this looks specific to aldebaran. Since there is version check as well, the implementation needs to be moved to aldebaran_ppt.c Thanks, Lijo + param |= (fatal_err << 16); + ret = smu_cmn_send_smc_msg_with_param(smu, + SMU_MSG_GfxDeviceDriverReset, param, NULL); + } if (!ret) msleep(SMU13_MODE1_RESET_WAIT_TIME_IN_MS);
[PATCH Review 1/1] drm/amdgpu: fix smu not match warning
update smu driver if version to avoid mismatch log Change-Id: I97f2bc4ed9a9cba313b744e2ff6812c90b244935 Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/pm/inc/smu_v13_0.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/inc/smu_v13_0.h b/drivers/gpu/drm/amd/pm/inc/smu_v13_0.h index e5d3b0d1a032..2e35885c7287 100644 --- a/drivers/gpu/drm/amd/pm/inc/smu_v13_0.h +++ b/drivers/gpu/drm/amd/pm/inc/smu_v13_0.h @@ -27,7 +27,7 @@ #define SMU13_DRIVER_IF_VERSION_INV 0x #define SMU13_DRIVER_IF_VERSION_YELLOW_CARP 0x04 -#define SMU13_DRIVER_IF_VERSION_ALDE 0x07 +#define SMU13_DRIVER_IF_VERSION_ALDE 0x08 /* MP Apertures */ #define MP0_Public 0x0380 -- 2.17.1
RE: [PATCH] drm/amdgpu: support new mode-1 reset interface
[AMD Official Use Only] Reviewed-by: Hawking Zhang Regards, Hawking -Original Message- From: Zhou1, Tao Sent: Tuesday, November 16, 2021 15:24 To: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; Clements, John ; Yang, Stanley ; Quan, Evan Cc: Zhou1, Tao Subject: [PATCH] drm/amdgpu: support new mode-1 reset interface If gpu reset is triggered by ras fatal error, tell it to smu in mode-1 reset message. Signed-off-by: Tao Zhou --- .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c| 21 --- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c index 35145db6eedf..6f3d064a8232 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c @@ -1426,16 +1426,31 @@ int smu_v13_0_set_azalia_d3_pme(struct smu_context *smu) int smu_v13_0_mode1_reset(struct smu_context *smu) { - u32 smu_version; + u32 smu_version, fatal_err, param; int ret = 0; + struct amdgpu_device *adev = smu->adev; + struct amdgpu_ras *ras = amdgpu_ras_get_context(adev); + + fatal_err = 0; + param = SMU_RESET_MODE_1; + /* * PM FW support SMU_MSG_GfxDeviceDriverReset from 68.07 */ smu_cmn_get_smc_version(smu, NULL, _version); if (smu_version < 0x00440700) ret = smu_cmn_send_smc_msg(smu, SMU_MSG_Mode1Reset, NULL); - else - ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_GfxDeviceDriverReset, SMU_RESET_MODE_1, NULL); + else { + /* fatal error triggered by ras, PMFW supports the flag + from 68.44.0 */ + if ((smu_version >= 0x00442c00) && ras && + atomic_read(>in_recovery)) + fatal_err = 1; + + param |= (fatal_err << 16); + ret = smu_cmn_send_smc_msg_with_param(smu, + SMU_MSG_GfxDeviceDriverReset, param, NULL); + } if (!ret) msleep(SMU13_MODE1_RESET_WAIT_TIME_IN_MS); -- 2.17.1
[PATCH] drm/amdgpu: support new mode-1 reset interface
If gpu reset is triggered by ras fatal error, tell it to smu in mode-1 reset message. Signed-off-by: Tao Zhou --- .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c| 21 --- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c index 35145db6eedf..6f3d064a8232 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c @@ -1426,16 +1426,31 @@ int smu_v13_0_set_azalia_d3_pme(struct smu_context *smu) int smu_v13_0_mode1_reset(struct smu_context *smu) { - u32 smu_version; + u32 smu_version, fatal_err, param; int ret = 0; + struct amdgpu_device *adev = smu->adev; + struct amdgpu_ras *ras = amdgpu_ras_get_context(adev); + + fatal_err = 0; + param = SMU_RESET_MODE_1; + /* * PM FW support SMU_MSG_GfxDeviceDriverReset from 68.07 */ smu_cmn_get_smc_version(smu, NULL, _version); if (smu_version < 0x00440700) ret = smu_cmn_send_smc_msg(smu, SMU_MSG_Mode1Reset, NULL); - else - ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_GfxDeviceDriverReset, SMU_RESET_MODE_1, NULL); + else { + /* fatal error triggered by ras, PMFW supports the flag + from 68.44.0 */ + if ((smu_version >= 0x00442c00) && ras && + atomic_read(>in_recovery)) + fatal_err = 1; + + param |= (fatal_err << 16); + ret = smu_cmn_send_smc_msg_with_param(smu, + SMU_MSG_GfxDeviceDriverReset, param, NULL); + } if (!ret) msleep(SMU13_MODE1_RESET_WAIT_TIME_IN_MS); -- 2.17.1
Re: [PATCH] drm/radeon:WARNING opportunity for max()
Am 16.11.21 um 06:50 schrieb zhaoxiao: Fix following coccicheck warning: drivers/gpu/drm/radeon/r100.c:3450:26-27: WARNING opportunity for max() drivers/gpu/drm/radeon/r100.c:2812:23-24: WARNING opportunity for max() Signed-off-by: zhaoxiao In general feel free to add my Acked-by, but I'm not sure if we want such cleanups in a driver which is only in maintenance mode. If you do this as part of a general and automated cleanup then it is probably ok. Regards, Christian. --- drivers/gpu/drm/radeon/r100.c | 10 ++ 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index 2dd85ba1faa2..c65ee6f44af2 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -2809,10 +2809,7 @@ void r100_vram_init_sizes(struct radeon_device *rdev) if (rdev->mc.aper_size > config_aper_size) config_aper_size = rdev->mc.aper_size; - if (config_aper_size > rdev->mc.real_vram_size) - rdev->mc.mc_vram_size = config_aper_size; - else - rdev->mc.mc_vram_size = rdev->mc.real_vram_size; + rdev->mc.mc_vram_size = max(config_aper_size, rdev->mc.real_vram_size); } } @@ -3447,10 +3444,7 @@ void r100_bandwidth_update(struct radeon_device *rdev) mc_latency_mclk.full += disp_latency_overhead.full + cur_latency_mclk.full; mc_latency_sclk.full += disp_latency_overhead.full + cur_latency_sclk.full; - if (mc_latency_mclk.full > mc_latency_sclk.full) - disp_latency.full = mc_latency_mclk.full; - else - disp_latency.full = mc_latency_sclk.full; + disp_latency.full = max(mc_latency_mclk.full, mc_latency_sclk.full); /* setup Max GRPH_STOP_REQ default value */ if (ASIC_IS_RV100(rdev))
Re: Questions about KMS flip
Am 16.11.21 um 04:27 schrieb Lang Yu: On Mon, Nov 15, 2021 at 01:04:15PM +0100, Michel DDDnzer wrote: [SNIP] Though a single call to dce_v*_0_crtc_do_set_base() will only pin the BO, I found it will be unpinned in next call to dce_v*_0_crtc_do_set_base(). Yeah, that's the normal case when the new BO is different from the old one. To catch the case I described, try something like diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c index 18a7b3bd633b..5726bd87a355 100644 --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c @@ -1926,6 +1926,7 @@ static int dce_v11_0_crtc_do_set_base(struct drm_crtc *crtc, return r; if (!atomic) { + WARN_ON_ONCE(target_fb == fb); r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM); if (unlikely(r != 0)) { amdgpu_bo_unreserve(abo); I did some tests, the warning can be triggered. pin/unpin operations in *_crtc_do_set_base() and amdgpu_display_crtc_page_flip_target() are mixed. Ok sounds like we narrowed down the root cause pretty well. Question is now how can we fix this? Just not pin the BO when target_fb == fb? Thanks, Christian. Regards, Lang
RE: [PATCH] drm/amdgpu: always reset the asic in suspend (v2)
[AMD Official Use Only] Acked-by: Evan Quan > -Original Message- > From: amd-gfx On Behalf Of Alex > Deucher > Sent: Saturday, November 13, 2021 12:26 AM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander > Subject: [PATCH] drm/amdgpu: always reset the asic in suspend (v2) > > If the platform suspend happens to fail and the power rail > is not turned off, the GPU will be in an unknown state on > resume, so reset the asic so that it will be in a known > good state on resume even if the platform suspend failed. > > v2: handle s0ix > > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > index 1db76429a673..b4591f6e82dd 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > @@ -2165,7 +2165,10 @@ static int amdgpu_pmops_suspend(struct device > *dev) > adev->in_s3 = true; > r = amdgpu_device_suspend(drm_dev, true); > adev->in_s3 = false; > - > + if (r) > + return r; > + if (!adev->in_s0ix) > + r = amdgpu_asic_reset(adev); > return r; > } > > -- > 2.31.1
RE: [PATCH] drm/amd/pm: Remove artificial freq level on Navi1x
[AMD Official Use Only] Reviewed-by: Evan Quan > -Original Message- > From: Lazar, Lijo > Sent: Monday, November 15, 2021 3:42 PM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Zhang, Hawking > ; Wang, Yang(Kevin) > ; Quan, Evan > Subject: [PATCH] drm/amd/pm: Remove artificial freq level on Navi1x > > Print Navi1x fine grained clocks in a consistent manner with other SOCs. > Don't show aritificial DPM level when the current clock equals min or max. > > Signed-off-by: Lijo Lazar > --- > drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 13 - > 1 file changed, 8 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c > b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c > index 71161f6b78fe..60a557068ea4 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c > @@ -1265,7 +1265,7 @@ static int navi10_print_clk_levels(struct > smu_context *smu, > enum smu_clk_type clk_type, char *buf) { > uint16_t *curve_settings; > - int i, size = 0, ret = 0; > + int i, levels, size = 0, ret = 0; > uint32_t cur_value = 0, value = 0, count = 0; > uint32_t freq_values[3] = {0}; > uint32_t mark_index = 0; > @@ -1319,14 +1319,17 @@ static int navi10_print_clk_levels(struct > smu_context *smu, > freq_values[1] = cur_value; > mark_index = cur_value == freq_values[0] ? 0 : >cur_value == freq_values[2] ? 2 : 1; > - if (mark_index != 1) > - freq_values[1] = (freq_values[0] + > freq_values[2]) / 2; > > - for (i = 0; i < 3; i++) { > + levels = 3; > + if (mark_index != 1) { > + levels = 2; > + freq_values[1] = freq_values[2]; > + } > + > + for (i = 0; i < levels; i++) { > size += sysfs_emit_at(buf, size, > "%d: %uMhz %s\n", i, freq_values[i], > i == mark_index ? "*" : ""); > } > - > } > break; > case SMU_PCIE: > -- > 2.17.1
Re: [PATCH v2 2/8] drm: improve drm_buddy_alloc function
Hi Matthew, I am preparing version3 of the buddy allocator, Please find the updated comments. SNIP >>> -int drm_buddy_alloc_range(struct drm_buddy_mm *mm, >>> - struct list_head *blocks, >>> - u64 start, u64 size) >>> +static struct drm_buddy_block * >>> +alloc_range(struct drm_buddy_mm *mm, >>> + u64 start, u64 end, >>> + unsigned int order) >>> { >>> struct drm_buddy_block *block; >>> struct drm_buddy_block *buddy; >>> - LIST_HEAD(allocated); >>> LIST_HEAD(dfs); >>> - u64 end; >>> int err; >>> int i; >>> >> >>> if (!block) >>> break; >>> >>> list_del(>tmp_link); >>> >>> + if (drm_buddy_block_order(block) < order) >>> + continue; >>> + >>> block_start = drm_buddy_block_offset(block); >>> block_end = block_start + drm_buddy_block_size(mm, block) - 1; >>> >>> if (!overlaps(start, end, block_start, block_end)) >>> continue; >>> >>> - if (drm_buddy_block_is_allocated(block)) { >>> - err = -ENOSPC; >>> - goto err_free; >>> - } >>> + if (drm_buddy_block_is_allocated(block)) >>> + continue; >>> >>> - if (contains(start, end, block_start, block_end)) { >>> - if (!drm_buddy_block_is_free(block)) { >>> - err = -ENOSPC; >>> - goto err_free; >>> - } >>> + if (contains(start, end, block_start, block_end) >>> + && order == drm_buddy_block_order(block)) { >> >> Alignment looks off, also && should be on the line above. > > [Arun] ok >> >>> + /* >>> +* Find the free block within the range. >>> +*/ >>> + if (drm_buddy_block_is_free(block)) >>> + return block; >> >> Would it make sense to keep searching here, rather than restarting the >> search from scratch every time? Would it work if we pass in the total >> size and min order? > [Arun] yes, I will rewrite this function I tried to rewrite the function, AFAIK, in case of end bias allocation, we have to restart the search on every new order computed value from the requested total size since we have to find a free node in the required order level traversing from left to right, here continuing the search for the subsequent order value would skip the free nodes present in the beginning of the tree. In case of actual range allocation, as handled at i915_buddy_alloc_range, we can continue the search from where the previous allocation happened since we allocate all the blocks progressively within the start and end address values. alloc_range() handles both the cases, having a penalty of restarting the search in case of actual range allocation. Please let me know if any suggestions? >>> +int drm_buddy_alloc(struct drm_buddy_mm *mm, >>> + u64 start, u64 end, u64 size, >>> + u64 min_page_size, >>> + struct list_head *blocks, >>> + unsigned long flags) >> >> Do we need to validate the flags somewhere? > [Arun] I will move 'unsigned long flags' to enum type declaration I tried to move 'unsigned long flags' to enum type declaration, it creates an ambiguity in i915 driver as both DRM_BUDDY_ALLOC_TOPDOWN and DRM_BUDDY_ALLOC_RANGE are mutually non-exclusive. So I think its better to have 'unsigned long flags'. AFAIK, we don't need to validate the flags since we check flags using 'flags & DRM_BUDDY_RANGE_ALLOCATION' >> >>> + BUG_ON(order > mm->max_order); >>> + BUG_ON(order < min_order); >>> + >>> + do { >>> + if (flags & DRM_BUDDY_RANGE_ALLOCATION) >>> + /* Allocate traversing within the range */ >>> + block = alloc_range(mm, start, end, order); >> >> Ok, so blocks might be in a random order, which is a slight concern for >> actual range allocations(not the bias thing). Can we somehow make >> alloc_range just do the old behaviour when end - start == size? Not the >> end of the world though if not. > [Arun] I will change the alloc_range() block allocations to bottom-up, > so both actual range allocation and end bias allocation blocks will > start from lowest address. And, since we are traversing the tree from > left to right, blocks will be in order. > > And, alloc_range() handles actual range allocation demands the same way > as in the old i915_buddy_alloc_range() function except alloc_range() > make use of order value to find the blocks within the actual range > allocation. Correction - I will change alloc_range() block allocations to bottom-up, so actual range allocation blocks will start from lowest address (not the bias thing), and since we are traversing the tree from left to
Re: [PATCH] drm/amd/pm: Add sysfs interface for retrieving gpu metrics(V2)
On Mon, Nov 15, 2021 at 3:49 PM Kakarya, Surbhi wrote: > > [AMD Official Use Only] > > Hi Alex, > > I am porting the patches > (http://gerrit-git.amd.com/c/brahma/ec/linux/+/396997 and > http://gerrit-git.amd.com/c/brahma/ec/linux/+/528745) to provide the > necessary SMU utils (basic and system_status) support in this branch. > If you just want to populate the new fields in the metrics table, that is all you need to do. No need for any of the other changes. I think the last hunk of your patch is all you need. Alex > Thanks > Surbhi > > -Original Message- > From: Alex Deucher > Sent: Friday, November 12, 2021 2:41 PM > To: Kakarya, Surbhi > Cc: amd-gfx list ; Zhang, Bokun > ; Chang, HaiJun ; Liu, Monk > ; Deucher, Alexander > Subject: Re: [PATCH] drm/amd/pm: Add sysfs interface for retrieving gpu > metrics(V2) > > On Fri, Nov 12, 2021 at 12:46 PM Surbhi Kakarya > wrote: > > > > A new interface for UMD to retrieve gpu metrics data. This patch is > > based on an existing patch If7f3523915505c0ece0a56dfd476d2b8473440d4. > > > > It's not clear what you are trying to do here. > > > Signed-off-by: Surbhi Kakarya > > Change-Id: I701110d78a85c092f5dda167a52350cc6dda7557 > > --- > > drivers/gpu/drm/amd/pm/amdgpu_pm.c | 6 +- > > drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h| 2 +- > > drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 4 +--- > > .../gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c| 10 ++ > > 4 files changed, 17 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c > > b/drivers/gpu/drm/amd/pm/amdgpu_pm.c > > index 01cca08a774f..d60426daddae 100644 > > --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c > > +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c > > @@ -1800,8 +1800,12 @@ static ssize_t amdgpu_get_gpu_metrics(struct device > > *dev, > > return ret; > > } > > > > - if (adev->powerplay.pp_funcs->get_gpu_metrics) > > + down_read(>reset_sem); > > + if (is_support_sw_smu(adev)) > > + size = smu_sys_get_gpu_metrics(>smu, _metrics); > > + else if (adev->powerplay.pp_funcs->get_gpu_metrics) > > size = amdgpu_dpm_get_gpu_metrics(adev, _metrics); > > + up_read(>reset_sem); > > > > Why are you changing this code? > adev->powerplay.pp_funcs->get_gpu_metrics already points to > smu_sys_get_gpu_metrics(). Also why do you need to add the semaphore locking? > > > if (size <= 0) > > goto out; > > diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h > > b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h > > index 3557f4e7fc30..5ffe7e3bf1aa 100644 > > --- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h > > +++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h > > @@ -1397,6 +1397,6 @@ int smu_set_light_sbr(struct smu_context *smu, > > bool enable); > > > > int smu_wait_for_event(struct amdgpu_device *adev, enum smu_event_type > > event, > >uint64_t event_arg); > > - > > +ssize_t smu_sys_get_gpu_metrics(struct smu_context *smu, void > > +**table); > > #endif > > #endif > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > > b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > > index b06c59dcc1b4..ec81abe385e3 100644 > > --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > > +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > > @@ -3005,9 +3005,8 @@ static int smu_get_dpm_clock_table(void *handle, > > return ret; > > } > > > > -static ssize_t smu_sys_get_gpu_metrics(void *handle, void **table) > > +ssize_t smu_sys_get_gpu_metrics(struct smu_context *smu, void > > +**table) > > { > > - struct smu_context *smu = handle; > > ssize_t size; > > > > if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled) @@ -3135,7 > > +3134,6 @@ static const struct amd_pm_funcs swsmu_pm_funcs = { > > .asic_reset_mode_2= smu_mode2_reset, > > .set_df_cstate= smu_set_df_cstate, > > .set_xgmi_pstate = smu_set_xgmi_pstate, > > - .get_gpu_metrics = smu_sys_get_gpu_metrics, > > Why are you removing this? > > > .set_watermarks_for_clock_ranges = > > smu_set_watermarks_for_clock_ranges, > > .display_disable_memory_clock_switch = > > smu_display_disable_memory_clock_switch, > > .get_max_sustainable_clocks_by_dc= > > smu_get_max_sustainable_clocks_by_dc, > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c > > b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c > > index 3b1bf270ebc6..97d18e764665 100644 > > --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c > > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c > > @@ -3619,6 +3619,16 @@ static ssize_t sienna_cichlid_get_gpu_metrics(struct > > smu_context *smu, > > gpu_metrics->energy_accumulator = > > use_metrics_v2 ? metrics_v2->EnergyAccumulator : > >
Re: [PATCH 4/5] drm/amdkfd: restore pages race with process termination
On 2021-11-09 11:16 p.m., Felix Kuehling wrote: On 2021-11-09 6:04 p.m., Philip Yang wrote: restore pages work can not find kfd process or mm struct if process is destroyed before drain retry fault work schedule to run, this is not failure, return 0 to avoid dump GPU vm fault kernel log. I wonder if this could also be solved by draining page fault interrupts in kfd_process_notifier_release before we remove the process from the hash table. Because that function runs while holding the mmap lock, we'd need to detect the draining condition for the process in the page fault handler before it tries to take the mmap lock. Maybe that's even a helpful optimization that speeds up interrupt draining by just ignoring all retry faults during that time. Call svm_range_drain_retry_fault in kfd_process_notifier_release before removing the process from hash table solve this race condition. That would also allow draining faults in svm_range_unmap_from_cpu instead of the delayed worker. And I believe that would also elegantly fix the vma removal race. vma maybe removed from rbtree before unmap call back (refer to below __do_munmap), draining fault in svm_range_unmap_from_cpu can not fix vma removal race, also cause mmap write and read lock deadlock issue, will change to check if svms->drain_pagefaults is set, restore_pages returns true if no VMA is found. int __do_munmap /* Detach vmas from rbtree */ if (!detach_vmas_to_be_unmapped(mm, vma, prev, end)) downgrade = false; if (downgrade) mmap_write_downgrade(mm); unmap_region(mm, vma, prev, start, end); /* Fix up all other VM information */ remove_vma_list(mm, vma); Regards, Philip Regards, Felix Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 8f77d5746b2c..2083a10b35c2 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -2596,7 +2596,7 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid, p = kfd_lookup_process_by_pasid(pasid); if (!p) { pr_debug("kfd process not founded pasid 0x%x\n", pasid); - return -ESRCH; + return 0; } if (!p->xnack_enabled) { pr_debug("XNACK not enabled for pasid 0x%x\n", pasid); @@ -2610,7 +2610,7 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid, mm = get_task_mm(p->lead_thread); if (!mm) { pr_debug("svms 0x%p failed to get mm\n", svms); - r = -ESRCH; + r = 0; goto out; }
RE: [PATCH] drm/amd/pm: Add sysfs interface for retrieving gpu metrics(V2)
[AMD Official Use Only] Hi Alex, I am porting the patches (http://gerrit-git.amd.com/c/brahma/ec/linux/+/396997 and http://gerrit-git.amd.com/c/brahma/ec/linux/+/528745) to provide the necessary SMU utils (basic and system_status) support in this branch. Thanks Surbhi -Original Message- From: Alex Deucher Sent: Friday, November 12, 2021 2:41 PM To: Kakarya, Surbhi Cc: amd-gfx list ; Zhang, Bokun ; Chang, HaiJun ; Liu, Monk ; Deucher, Alexander Subject: Re: [PATCH] drm/amd/pm: Add sysfs interface for retrieving gpu metrics(V2) On Fri, Nov 12, 2021 at 12:46 PM Surbhi Kakarya wrote: > > A new interface for UMD to retrieve gpu metrics data. This patch is > based on an existing patch If7f3523915505c0ece0a56dfd476d2b8473440d4. > It's not clear what you are trying to do here. > Signed-off-by: Surbhi Kakarya > Change-Id: I701110d78a85c092f5dda167a52350cc6dda7557 > --- > drivers/gpu/drm/amd/pm/amdgpu_pm.c | 6 +- > drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h| 2 +- > drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 4 +--- > .../gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c| 10 ++ > 4 files changed, 17 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c > b/drivers/gpu/drm/amd/pm/amdgpu_pm.c > index 01cca08a774f..d60426daddae 100644 > --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c > +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c > @@ -1800,8 +1800,12 @@ static ssize_t amdgpu_get_gpu_metrics(struct device > *dev, > return ret; > } > > - if (adev->powerplay.pp_funcs->get_gpu_metrics) > + down_read(>reset_sem); > + if (is_support_sw_smu(adev)) > + size = smu_sys_get_gpu_metrics(>smu, _metrics); > + else if (adev->powerplay.pp_funcs->get_gpu_metrics) > size = amdgpu_dpm_get_gpu_metrics(adev, _metrics); > + up_read(>reset_sem); > Why are you changing this code? adev->powerplay.pp_funcs->get_gpu_metrics already points to smu_sys_get_gpu_metrics(). Also why do you need to add the semaphore locking? > if (size <= 0) > goto out; > diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h > b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h > index 3557f4e7fc30..5ffe7e3bf1aa 100644 > --- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h > +++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h > @@ -1397,6 +1397,6 @@ int smu_set_light_sbr(struct smu_context *smu, > bool enable); > > int smu_wait_for_event(struct amdgpu_device *adev, enum smu_event_type event, >uint64_t event_arg); > - > +ssize_t smu_sys_get_gpu_metrics(struct smu_context *smu, void > +**table); > #endif > #endif > diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > index b06c59dcc1b4..ec81abe385e3 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > @@ -3005,9 +3005,8 @@ static int smu_get_dpm_clock_table(void *handle, > return ret; > } > > -static ssize_t smu_sys_get_gpu_metrics(void *handle, void **table) > +ssize_t smu_sys_get_gpu_metrics(struct smu_context *smu, void > +**table) > { > - struct smu_context *smu = handle; > ssize_t size; > > if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled) @@ -3135,7 > +3134,6 @@ static const struct amd_pm_funcs swsmu_pm_funcs = { > .asic_reset_mode_2= smu_mode2_reset, > .set_df_cstate= smu_set_df_cstate, > .set_xgmi_pstate = smu_set_xgmi_pstate, > - .get_gpu_metrics = smu_sys_get_gpu_metrics, Why are you removing this? > .set_watermarks_for_clock_ranges = > smu_set_watermarks_for_clock_ranges, > .display_disable_memory_clock_switch = > smu_display_disable_memory_clock_switch, > .get_max_sustainable_clocks_by_dc= > smu_get_max_sustainable_clocks_by_dc, > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c > b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c > index 3b1bf270ebc6..97d18e764665 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c > @@ -3619,6 +3619,16 @@ static ssize_t sienna_cichlid_get_gpu_metrics(struct > smu_context *smu, > gpu_metrics->energy_accumulator = > use_metrics_v2 ? metrics_v2->EnergyAccumulator : > metrics->EnergyAccumulator; > > + if (metrics->CurrGfxVoltageOffset) > + gpu_metrics->voltage_gfx = > + (155000 - 625 * metrics->CurrGfxVoltageOffset) / 100; > + if (metrics->CurrMemVidOffset) > + gpu_metrics->voltage_mem = > + (155000 - 625 * metrics->CurrMemVidOffset) / 100; > + if (metrics->CurrSocVoltageOffset) > + gpu_metrics->voltage_soc = > + (155000 - 625 *
Re: [PATCH] drm/amd/display: Fix LTTPR not Enabled on HP ZBook G8 laptop
[AMD Official Use Only] Reviewed-by: Bhawanpreet Lakha From: Aurabindo Pillai Sent: November 15, 2021 2:59 PM To: amd-gfx@lists.freedesktop.org Cc: Lakha, Bhawanpreet ; Siqueira, Rodrigo ; Pillai, Aurabindo ; Wang, Angus ; Chalmers, Wesley ; Leung, Martin ; Zhuo, Qingqing Subject: [PATCH] drm/amd/display: Fix LTTPR not Enabled on HP ZBook G8 laptop From: Angus Wang [WHY] Previous LTTPR change has caused a regression that led to an issue where LTTPR is disabled [HOW] Extended changes from previous fix to DCN30X Reviewed-by: Wesley Chalmers Reviewed-by: Martin Leung Acked-by: Qingqing Zhuo Signed-off-by: Angus Wang --- .../amd/display/dc/dcn301/dcn301_resource.c| 18 ++ .../amd/display/dc/dcn302/dcn302_resource.c| 18 ++ .../amd/display/dc/dcn303/dcn303_resource.c| 17 + 3 files changed, 53 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c index 2650d3bd50ec..9cc1610360bd 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c @@ -1485,6 +1485,24 @@ static bool dcn301_resource_construct( dc->caps.color.mpc.ogam_rom_caps.hlg = 0; dc->caps.color.mpc.ocsc = 1; + /* read VBIOS LTTPR caps */ + + if (ctx->dc_bios->funcs->get_lttpr_caps) { + enum bp_result bp_query_result; + uint8_t is_vbios_lttpr_enable = 0; + + bp_query_result = ctx->dc_bios->funcs->get_lttpr_caps(ctx->dc_bios, _vbios_lttpr_enable); + dc->caps.vbios_lttpr_enable = (bp_query_result == BP_RESULT_OK) && !!is_vbios_lttpr_enable; + } + + if (ctx->dc_bios->funcs->get_lttpr_interop) { + enum bp_result bp_query_result; + uint8_t is_vbios_interop_enabled = 0; + + bp_query_result = ctx->dc_bios->funcs->get_lttpr_interop(ctx->dc_bios, _vbios_interop_enabled); + dc->caps.vbios_lttpr_aware = (bp_query_result == BP_RESULT_OK) && !!is_vbios_interop_enabled; + } + if (dc->ctx->dce_environment == DCE_ENV_PRODUCTION_DRV) dc->debug = debug_defaults_drv; else if (dc->ctx->dce_environment == DCE_ENV_FPGA_MAXIMUS) { diff --git a/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c b/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c index fcf96cf08c76..058f5d71e037 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c @@ -1557,6 +1557,24 @@ static bool dcn302_resource_construct( dc->caps.color.mpc.ogam_rom_caps.hlg = 0; dc->caps.color.mpc.ocsc = 1; + /* read VBIOS LTTPR caps */ + if (ctx->dc_bios->funcs->get_lttpr_caps) { + enum bp_result bp_query_result; + uint8_t is_vbios_lttpr_enable = 0; + + bp_query_result = ctx->dc_bios->funcs->get_lttpr_caps(ctx->dc_bios, _vbios_lttpr_enable); + dc->caps.vbios_lttpr_enable = (bp_query_result == BP_RESULT_OK) && !!is_vbios_lttpr_enable; + } + + if (ctx->dc_bios->funcs->get_lttpr_interop) { + enum bp_result bp_query_result; + uint8_t is_vbios_interop_enabled = 0; + + bp_query_result = ctx->dc_bios->funcs->get_lttpr_interop(ctx->dc_bios, + _vbios_interop_enabled); + dc->caps.vbios_lttpr_aware = (bp_query_result == BP_RESULT_OK) && !!is_vbios_interop_enabled; + } + if (dc->ctx->dce_environment == DCE_ENV_PRODUCTION_DRV) dc->debug = debug_defaults_drv; else diff --git a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c index 4a9b64023675..7024aeb0884c 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c @@ -1500,6 +1500,23 @@ static bool dcn303_resource_construct( dc->caps.color.mpc.ogam_rom_caps.hlg = 0; dc->caps.color.mpc.ocsc = 1; + /* read VBIOS LTTPR caps */ + if (ctx->dc_bios->funcs->get_lttpr_caps) { + enum bp_result bp_query_result; + uint8_t is_vbios_lttpr_enable = 0; + + bp_query_result = ctx->dc_bios->funcs->get_lttpr_caps(ctx->dc_bios, _vbios_lttpr_enable); + dc->caps.vbios_lttpr_enable = (bp_query_result == BP_RESULT_OK) && !!is_vbios_lttpr_enable; + } + + if (ctx->dc_bios->funcs->get_lttpr_interop) { + enum bp_result bp_query_result; + uint8_t is_vbios_interop_enabled = 0; + + bp_query_result = ctx->dc_bios->funcs->get_lttpr_interop(ctx->dc_bios, _vbios_interop_enabled); + dc->caps.vbios_lttpr_aware = (bp_query_result == BP_RESULT_OK) &&
[PATCH] drm/amd/display: Fix LTTPR not Enabled on HP ZBook G8 laptop
From: Angus Wang [WHY] Previous LTTPR change has caused a regression that led to an issue where LTTPR is disabled [HOW] Extended changes from previous fix to DCN30X Reviewed-by: Wesley Chalmers Reviewed-by: Martin Leung Acked-by: Qingqing Zhuo Signed-off-by: Angus Wang --- .../amd/display/dc/dcn301/dcn301_resource.c| 18 ++ .../amd/display/dc/dcn302/dcn302_resource.c| 18 ++ .../amd/display/dc/dcn303/dcn303_resource.c| 17 + 3 files changed, 53 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c index 2650d3bd50ec..9cc1610360bd 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c @@ -1485,6 +1485,24 @@ static bool dcn301_resource_construct( dc->caps.color.mpc.ogam_rom_caps.hlg = 0; dc->caps.color.mpc.ocsc = 1; + /* read VBIOS LTTPR caps */ + + if (ctx->dc_bios->funcs->get_lttpr_caps) { + enum bp_result bp_query_result; + uint8_t is_vbios_lttpr_enable = 0; + + bp_query_result = ctx->dc_bios->funcs->get_lttpr_caps(ctx->dc_bios, _vbios_lttpr_enable); + dc->caps.vbios_lttpr_enable = (bp_query_result == BP_RESULT_OK) && !!is_vbios_lttpr_enable; + } + + if (ctx->dc_bios->funcs->get_lttpr_interop) { + enum bp_result bp_query_result; + uint8_t is_vbios_interop_enabled = 0; + + bp_query_result = ctx->dc_bios->funcs->get_lttpr_interop(ctx->dc_bios, _vbios_interop_enabled); + dc->caps.vbios_lttpr_aware = (bp_query_result == BP_RESULT_OK) && !!is_vbios_interop_enabled; + } + if (dc->ctx->dce_environment == DCE_ENV_PRODUCTION_DRV) dc->debug = debug_defaults_drv; else if (dc->ctx->dce_environment == DCE_ENV_FPGA_MAXIMUS) { diff --git a/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c b/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c index fcf96cf08c76..058f5d71e037 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c @@ -1557,6 +1557,24 @@ static bool dcn302_resource_construct( dc->caps.color.mpc.ogam_rom_caps.hlg = 0; dc->caps.color.mpc.ocsc = 1; + /* read VBIOS LTTPR caps */ + if (ctx->dc_bios->funcs->get_lttpr_caps) { + enum bp_result bp_query_result; + uint8_t is_vbios_lttpr_enable = 0; + + bp_query_result = ctx->dc_bios->funcs->get_lttpr_caps(ctx->dc_bios, _vbios_lttpr_enable); + dc->caps.vbios_lttpr_enable = (bp_query_result == BP_RESULT_OK) && !!is_vbios_lttpr_enable; + } + + if (ctx->dc_bios->funcs->get_lttpr_interop) { + enum bp_result bp_query_result; + uint8_t is_vbios_interop_enabled = 0; + + bp_query_result = ctx->dc_bios->funcs->get_lttpr_interop(ctx->dc_bios, + _vbios_interop_enabled); + dc->caps.vbios_lttpr_aware = (bp_query_result == BP_RESULT_OK) && !!is_vbios_interop_enabled; + } + if (dc->ctx->dce_environment == DCE_ENV_PRODUCTION_DRV) dc->debug = debug_defaults_drv; else diff --git a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c index 4a9b64023675..7024aeb0884c 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c @@ -1500,6 +1500,23 @@ static bool dcn303_resource_construct( dc->caps.color.mpc.ogam_rom_caps.hlg = 0; dc->caps.color.mpc.ocsc = 1; + /* read VBIOS LTTPR caps */ + if (ctx->dc_bios->funcs->get_lttpr_caps) { + enum bp_result bp_query_result; + uint8_t is_vbios_lttpr_enable = 0; + + bp_query_result = ctx->dc_bios->funcs->get_lttpr_caps(ctx->dc_bios, _vbios_lttpr_enable); + dc->caps.vbios_lttpr_enable = (bp_query_result == BP_RESULT_OK) && !!is_vbios_lttpr_enable; + } + + if (ctx->dc_bios->funcs->get_lttpr_interop) { + enum bp_result bp_query_result; + uint8_t is_vbios_interop_enabled = 0; + + bp_query_result = ctx->dc_bios->funcs->get_lttpr_interop(ctx->dc_bios, _vbios_interop_enabled); + dc->caps.vbios_lttpr_aware = (bp_query_result == BP_RESULT_OK) && !!is_vbios_interop_enabled; + } + if (dc->ctx->dce_environment == DCE_ENV_PRODUCTION_DRV) dc->debug = debug_defaults_drv; else -- 2.30.2
[PATCH v1 1/2] ext4/xfs: add page refcount helper
From: Ralph Campbell There are several places where ZONE_DEVICE struct pages assume a reference count == 1 means the page is idle and free. Instead of open coding this, add a helper function to hide this detail. Signed-off-by: Ralph Campbell Signed-off-by: Alex Sierra Reviewed-by: Christoph Hellwig Acked-by: Theodore Ts'o Acked-by: Darrick J. Wong --- v3: [AS]: rename dax_layout_is_idle_page func to dax_page_unused v4: [AS]: This ref count functionality was missing on fuse/dax.c. --- fs/dax.c| 4 ++-- fs/ext4/inode.c | 5 + fs/fuse/dax.c | 4 +--- fs/xfs/xfs_file.c | 4 +--- include/linux/dax.h | 10 ++ 5 files changed, 15 insertions(+), 12 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 4e3e5a283a91..84803c835650 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -369,7 +369,7 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping, for_each_mapped_pfn(entry, pfn) { struct page *page = pfn_to_page(pfn); - WARN_ON_ONCE(trunc && page_ref_count(page) > 1); + WARN_ON_ONCE(trunc && !dax_page_unused(page)); WARN_ON_ONCE(page->mapping && page->mapping != mapping); page->mapping = NULL; page->index = 0; @@ -383,7 +383,7 @@ static struct page *dax_busy_page(void *entry) for_each_mapped_pfn(entry, pfn) { struct page *page = pfn_to_page(pfn); - if (page_ref_count(page) > 1) + if (!dax_page_unused(page)) return page; } return NULL; diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 0f06305167d5..068e8f78ec02 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3913,10 +3913,7 @@ int ext4_break_layouts(struct inode *inode) if (!page) return 0; - error = ___wait_var_event(>_refcount, - atomic_read(>_refcount) == 1, - TASK_INTERRUPTIBLE, 0, 0, - ext4_wait_dax_page(inode)); + error = dax_wait_page(inode, page, ext4_wait_dax_page); } while (error == 0); return error; diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c index 281d79f8b3d3..f6d2a36e56e2 100644 --- a/fs/fuse/dax.c +++ b/fs/fuse/dax.c @@ -676,9 +676,7 @@ static int __fuse_dax_break_layouts(struct inode *inode, bool *retry, return 0; *retry = true; - return ___wait_var_event(>_refcount, - atomic_read(>_refcount) == 1, TASK_INTERRUPTIBLE, - 0, 0, fuse_wait_dax_page(inode)); + return dax_wait_page(inode, page, fuse_wait_dax_page); } /* dmap_end == 0 leads to unmapping of whole file */ diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 7aa943edfc02..fb13b12fd032 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -860,9 +860,7 @@ xfs_break_dax_layouts( return 0; *retry = true; - return ___wait_var_event(>_refcount, - atomic_read(>_refcount) == 1, TASK_INTERRUPTIBLE, - 0, 0, xfs_wait_dax_page(inode)); + return dax_wait_page(inode, page, xfs_wait_dax_page); } int diff --git a/include/linux/dax.h b/include/linux/dax.h index 2619d94c308d..c9c27fbf0b98 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -216,6 +216,16 @@ static inline bool dax_mapping(struct address_space *mapping) return mapping->host && IS_DAX(mapping->host); } +static inline bool dax_page_unused(struct page *page) +{ + return page_ref_count(page) == 1; +} + +#define dax_wait_page(_inode, _page, _wait_cb) \ + ___wait_var_event(&(_page)->_refcount, \ + dax_page_unused(_page), \ + TASK_INTERRUPTIBLE, 0, 0, _wait_cb(_inode)) + #ifdef CONFIG_DEV_DAX_HMEM_DEVICES void hmem_register_device(int target_nid, struct resource *r); #else -- 2.32.0
[PATCH v1 2/2] mm: remove extra ZONE_DEVICE struct page refcount
From: Ralph Campbell ZONE_DEVICE struct pages have an extra reference count that complicates the code for put_page() and several places in the kernel that need to check the reference count to see that a page is not being used (gup, compaction, migration, etc.). Clean up the code so the reference count doesn't need to be treated specially for ZONE_DEVICE. Signed-off-by: Ralph Campbell Signed-off-by: Alex Sierra Reviewed-by: Christoph Hellwig --- arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 +- drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- fs/dax.c | 4 +- include/linux/dax.h | 2 +- include/linux/memremap.h | 7 ++- include/linux/mm.h | 11 lib/test_hmm.c | 2 +- mm/internal.h| 7 +++ mm/memcontrol.c | 6 +- mm/memremap.c| 72 +++- mm/migrate.c | 5 -- mm/page_alloc.c | 3 + mm/swap.c| 45 ++- 14 files changed, 48 insertions(+), 123 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c index a7061ee3b157..4de7c77ea17b 100644 --- a/arch/powerpc/kvm/book3s_hv_uvmem.c +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c @@ -712,7 +712,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) dpage = pfn_to_page(uvmem_pfn); dpage->zone_device_data = pvt; - get_page(dpage); + init_page_count(dpage); lock_page(dpage); return dpage; out_clear: diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 3e405f078ade..c1b41c301a67 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -224,7 +224,8 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) page = pfn_to_page(pfn); svm_range_bo_ref(prange->svm_bo); page->zone_device_data = prange->svm_bo; - get_page(page); + VM_BUG_ON_PAGE(page_ref_count(page), page); + init_page_count(page); lock_page(page); } diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index 92987daa5e17..8bc7120e1216 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -324,7 +324,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm) return NULL; } - get_page(page); + init_page_count(page); lock_page(page); return page; } diff --git a/fs/dax.c b/fs/dax.c index 84803c835650..72bb6f1e155c 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -572,14 +572,14 @@ static void *grab_mapping_entry(struct xa_state *xas, /** * dax_layout_busy_page_range - find first pinned page in @mapping - * @mapping: address space to scan for a page with ref count > 1 + * @mapping: address space to scan for a page with ref count > 0 * @start: Starting offset. Page containing 'start' is included. * @end: End offset. Page containing 'end' is included. If 'end' is LLONG_MAX, * pages from 'start' till the end of file are included. * * DAX requires ZONE_DEVICE mapped pages. These pages are never * 'onlined' to the page allocator so they are considered idle when - * page->count == 1. A filesystem uses this interface to determine if + * page->count == 0. A filesystem uses this interface to determine if * any page in the mapping is busy, i.e. for DMA, or other * get_user_pages() usages. * diff --git a/include/linux/dax.h b/include/linux/dax.h index c9c27fbf0b98..1d3e25d51fc6 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -218,7 +218,7 @@ static inline bool dax_mapping(struct address_space *mapping) static inline bool dax_page_unused(struct page *page) { - return page_ref_count(page) == 1; + return page_ref_count(page) == 0; } #define dax_wait_page(_inode, _page, _wait_cb) \ diff --git a/include/linux/memremap.h b/include/linux/memremap.h index ff4d398edf35..8340e39241fc 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -74,9 +74,10 @@ enum memory_type { struct dev_pagemap_ops { /* -* Called once the page refcount reaches 1. (ZONE_DEVICE pages never -* reach 0 refcount unless there is a refcount bug. This allows the -* device driver to implement its own memory management.) +* Called once the page refcount reaches 0. The reference count +* should be reset to one with init_page_count(page) before reusing +* the page. This allows the device driver to implement its own +* memory management. */ void (*page_free)(struct page *page); diff --git
[PATCH v1 0/2] Remove extra ZONE_DEVICE page refcount
This patch includes Ralph Campbell’s ZONE_DEVICE refcount cleanup and additionally the changes necessary to avoid breaking the separately submitted MEMORY_DEVICE_COHERENT page migration code. Ralph’s original description: ZONE_DEVICE struct pages have an extra reference count that complicates the code for put_page() and several places in the kernel that need to check the reference count to see that a page is not being used (gup, compaction, migration, etc.). Clean up the code so the reference count doesn't need to be treated specially for ZONE_DEVICE. Following a suggestion by Christoph, we attempted to combine this cleanup with the device patch migration patch series, however this caused xftests 413 to fail with a warning, the root cause of which has large kernel footprint than just device memory. Fixing this issue properly will require cooperation between multiple development groups working across multiple kernel subsystems, as is apparent from the discussion under the earlier, combined patch submission. We therefore propose to break this work out separately as its own patch, so it can receive the cooperative development work it needs. The deep problem arises from the get_user_pages API, which has proved troublesome for many years. It is possible that a concerted effort to repair this particular refcount issue properly will be a step in the direction of rationalizing this popular and problematic API. In the larger picture, this API rationalization work probably deserves an agenda item at the upcoming Filesystem, MM & BPF Summit: https://events.linuxfoundation.org/lsfmm/ The wide ranging discussion following previous iterations of the migration patchset focused almost exclusively on the refcount cleanup patch. The thread is here: https://lore.kernel.org/linux-mm/20211014153928.16805-3-alex.sie...@amd.com/ and links a number of previous threads. It is apparent that there is a lot of work in progress by a number of developer groups in parallel, and that there are issues with the order in which this work should be attempted and merged. Jason provided his list of “balls in the air”: - Joao's compound page support for device_dax and more - Alex's DEVICE_COHERENT - The refcount normalization - Removing the pgmap test from GUP - Removing the need for the PUD/PMD/PTE special bit - Removing the need for the PUD/PMD/PTE devmap bit - Remove PUD/PMD vma_is_special - folios for fsdax - shootdown for fsdax It is not clear that the refcount cleanup in this patch should be the first item on the list to be merged, however it has proved to be a good starting point for a cooperative effort to address the underlying issues. Ralph, if you would prefer to take back “ownership” of this patch, it’s yours, otherwise we will be happy to keep it in play and get it merged when some other pieces of the puzzle fall into place. Ralph Campbell (2): ext4/xfs: add page refcount helper mm: remove extra ZONE_DEVICE struct page refcount arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 +- drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- fs/dax.c | 8 +-- fs/ext4/inode.c | 5 +- fs/fuse/dax.c| 4 +- fs/xfs/xfs_file.c| 4 +- include/linux/dax.h | 10 include/linux/memremap.h | 7 ++- include/linux/mm.h | 11 lib/test_hmm.c | 2 +- mm/internal.h| 7 +++ mm/memcontrol.c | 6 +- mm/memremap.c| 72 +++- mm/migrate.c | 5 -- mm/page_alloc.c | 3 + mm/swap.c| 45 ++- 17 files changed, 62 insertions(+), 134 deletions(-) -- 2.32.0
[PATCH v1 8/9] tools: update hmm-test to support device coherent type
Test cases such as migrate_fault and migrate_multiple, were modified to explicit migrate from device to sys memory without the need of page faults, when using device coherent type. Snapshot test case updated to read memory device type first and based on that, get the proper returned results migrate_ping_pong test case added to test explicit migration from device to sys memory for both private and coherent zone types. Helpers to migrate from device to sys memory and vicerversa were also added. Signed-off-by: Alex Sierra --- tools/testing/selftests/vm/hmm-tests.c | 156 ++--- 1 file changed, 138 insertions(+), 18 deletions(-) diff --git a/tools/testing/selftests/vm/hmm-tests.c b/tools/testing/selftests/vm/hmm-tests.c index 864f126ffd78..6091e30636d5 100644 --- a/tools/testing/selftests/vm/hmm-tests.c +++ b/tools/testing/selftests/vm/hmm-tests.c @@ -44,6 +44,7 @@ struct hmm_buffer { int fd; uint64_tcpages; uint64_tfaults; + int zone_device_type; }; #define TWOMEG (1 << 21) @@ -144,6 +145,7 @@ static int hmm_dmirror_cmd(int fd, } buffer->cpages = cmd.cpages; buffer->faults = cmd.faults; + buffer->zone_device_type = cmd.zone_device_type; return 0; } @@ -211,6 +213,32 @@ static void hmm_nanosleep(unsigned int n) nanosleep(, NULL); } +static int hmm_migrate_sys_to_dev(int fd, + struct hmm_buffer *buffer, + unsigned long npages) +{ + return hmm_dmirror_cmd(fd, HMM_DMIRROR_MIGRATE_TO_DEV, buffer, npages); +} + +static int hmm_migrate_dev_to_sys(int fd, + struct hmm_buffer *buffer, + unsigned long npages) +{ + return hmm_dmirror_cmd(fd, HMM_DMIRROR_MIGRATE_TO_SYS, buffer, npages); +} + +static int hmm_is_private_device(int fd, bool *res) +{ + struct hmm_buffer buffer; + int ret; + + buffer.ptr = 0; + ret = hmm_dmirror_cmd(fd, HMM_DMIRROR_GET_MEM_DEV_TYPE, , 1); + *res = (buffer.zone_device_type == HMM_DMIRROR_MEMORY_DEVICE_PRIVATE); + + return ret; +} + /* * Simple NULL test of device open/close. */ @@ -875,7 +903,7 @@ TEST_F(hmm, migrate) ptr[i] = i; /* Migrate memory to device. */ - ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages); + ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages); ASSERT_EQ(ret, 0); ASSERT_EQ(buffer->cpages, npages); @@ -923,7 +951,7 @@ TEST_F(hmm, migrate_fault) ptr[i] = i; /* Migrate memory to device. */ - ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages); + ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages); ASSERT_EQ(ret, 0); ASSERT_EQ(buffer->cpages, npages); @@ -936,7 +964,7 @@ TEST_F(hmm, migrate_fault) ASSERT_EQ(ptr[i], i); /* Migrate memory to the device again. */ - ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages); + ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages); ASSERT_EQ(ret, 0); ASSERT_EQ(buffer->cpages, npages); @@ -976,7 +1004,7 @@ TEST_F(hmm, migrate_shared) ASSERT_NE(buffer->ptr, MAP_FAILED); /* Migrate memory to device. */ - ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages); + ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages); ASSERT_EQ(ret, -ENOENT); hmm_buffer_free(buffer); @@ -1015,7 +1043,7 @@ TEST_F(hmm2, migrate_mixed) p = buffer->ptr; /* Migrating a protected area should be an error. */ - ret = hmm_dmirror_cmd(self->fd1, HMM_DMIRROR_MIGRATE, buffer, npages); + ret = hmm_migrate_sys_to_dev(self->fd1, buffer, npages); ASSERT_EQ(ret, -EINVAL); /* Punch a hole after the first page address. */ @@ -1023,7 +1051,7 @@ TEST_F(hmm2, migrate_mixed) ASSERT_EQ(ret, 0); /* We expect an error if the vma doesn't cover the range. */ - ret = hmm_dmirror_cmd(self->fd1, HMM_DMIRROR_MIGRATE, buffer, 3); + ret = hmm_migrate_sys_to_dev(self->fd1, buffer, 3); ASSERT_EQ(ret, -EINVAL); /* Page 2 will be a read-only zero page. */ @@ -1055,13 +1083,13 @@ TEST_F(hmm2, migrate_mixed) /* Now try to migrate pages 2-5 to device 1. */ buffer->ptr = p + 2 * self->page_size; - ret = hmm_dmirror_cmd(self->fd1, HMM_DMIRROR_MIGRATE, buffer, 4); + ret = hmm_migrate_sys_to_dev(self->fd1, buffer, 4); ASSERT_EQ(ret, 0); ASSERT_EQ(buffer->cpages, 4); /* Page 5 won't be migrated to device 0 because it's on device 1. */ buffer->ptr = p + 5 * self->page_size; - ret = hmm_dmirror_cmd(self->fd0, HMM_DMIRROR_MIGRATE, buffer, 1); + ret = hmm_migrate_sys_to_dev(self->fd0, buffer, 1);
[PATCH v1 0/9] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping
This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory owned by a device that can be mapped into CPU page tables like MEMORY_DEVICE_GENERIC and can also be migrated like MEMORY_DEVICE_PRIVATE. Christoph, the suggestion to incorporate Ralph Campbell’s refcount cleanup patch into our hardware page migration patchset originally came from you, but it proved impractical to do things in that order because the refcount cleanup introduced a bug with wide ranging structural implications. Instead, we amended Ralph’s patch so that it could be applied after merging the migration work. As we saw from the recent discussion, merging the refcount work is going to take some time and cooperation between multiple development groups, while the migration work is ready now and is needed now. So we propose to merge this patchset first and continue to work with Ralph and others to merge the refcount cleanup separately, when it is ready. This patch series is mostly self-contained except for a few places where it needs to update other subsystems to handle the new memory type. System stability and performance are not affected according to our ongoing testing, including xfstests. How it works: The system BIOS advertises the GPU device memory (aka VRAM) as SPM (special purpose memory) in the UEFI system address map. The amdgpu driver registers the memory with devmap as MEMORY_DEVICE_COHERENT using devm_memremap_pages. The initial user for this hardware page migration capability is the Frontier supercomputer project. This functionality is not AMD-specific. We expect other GPU vendors to find this functionality useful, and possibly other hardware types in the future. Our test nodes in the lab are similar to the Frontier configuration, with .5 TB of system memory plus 256 GB of device memory split across 4 GPUs, all in a single coherent address space. Page migration is expected to improve application efficiency significantly. We will report empirical results as they become available. We extended hmm_test to cover migration of MEMORY_DEVICE_COHERENT. This patch set builds on HMM and our SVM memory manager already merged in 5.15. Alex Sierra (9): mm: add zone device coherent type memory support mm: add device coherent vma selection for memory migration drm/amdkfd: add SPM support for SVM drm/amdkfd: coherent type as sys mem on migration to ram lib: test_hmm add ioctl to get zone device type lib: test_hmm add module param for zone device type lib: add support for device coherent type in test_hmm tools: update hmm-test to support device coherent type tools: update test_hmm script to support SP config drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 34 ++- include/linux/memremap.h | 8 + include/linux/migrate.h | 1 + include/linux/mm.h | 9 + lib/test_hmm.c | 270 +-- lib/test_hmm_uapi.h | 21 +- mm/memcontrol.c | 6 +- mm/memory-failure.c | 6 +- mm/memremap.c| 5 +- mm/migrate.c | 30 ++- tools/testing/selftests/vm/hmm-tests.c | 156 +++-- tools/testing/selftests/vm/test_hmm.sh | 20 +- 12 files changed, 446 insertions(+), 120 deletions(-) -- 2.32.0
[PATCH v1 7/9] lib: add support for device coherent type in test_hmm
Device Coherent type uses device memory that is coherently accesible by the CPU. This could be shown as SP (special purpose) memory range at the BIOS-e820 memory enumeration. If no SP memory is supported in system, this could be faked by setting CONFIG_EFI_FAKE_MEMMAP. Currently, test_hmm only supports two different SP ranges of at least 256MB size. This could be specified in the kernel parameter variable efi_fake_mem. Ex. Two SP ranges of 1GB starting at 0x1 & 0x14000 physical address. Ex. efi_fake_mem=1G@0x1:0x4,1G@0x14000:0x4 Signed-off-by: Alex Sierra --- lib/test_hmm.c | 191 +--- lib/test_hmm_uapi.h | 15 ++-- 2 files changed, 153 insertions(+), 53 deletions(-) diff --git a/lib/test_hmm.c b/lib/test_hmm.c index 45334df28d7b..9e2cc0cd4412 100644 --- a/lib/test_hmm.c +++ b/lib/test_hmm.c @@ -471,6 +471,7 @@ static int dmirror_allocate_chunk(struct dmirror_device *mdevice, unsigned long pfn_first; unsigned long pfn_last; void *ptr; + int ret = -ENOMEM; devmem = kzalloc(sizeof(*devmem), GFP_KERNEL); if (!devmem) @@ -553,7 +554,7 @@ static int dmirror_allocate_chunk(struct dmirror_device *mdevice, } spin_unlock(>lock); - return true; + return 0; err_release: mutex_unlock(>devmem_lock); @@ -562,7 +563,7 @@ static int dmirror_allocate_chunk(struct dmirror_device *mdevice, err_devmem: kfree(devmem); - return false; + return ret; } static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice) @@ -571,8 +572,10 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice) struct page *rpage; /* -* This is a fake device so we alloc real system memory to store -* our device memory. +* For ZONE_DEVICE private type, this is a fake device so we alloc real +* system memory to store our device memory. +* For ZONE_DEVICE coherent type we use the actual dpage to store the data +* and ignore rpage. */ rpage = alloc_page(GFP_HIGHUSER); if (!rpage) @@ -623,12 +626,17 @@ static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args, * unallocated pte_none() or read-only zero page. */ spage = migrate_pfn_to_page(*src); + if (spage && is_zone_device_page(spage)) + pr_err("page already in device spage pfn: 0x%lx\n", + page_to_pfn(spage)); + WARN_ON(spage && is_zone_device_page(spage)); dpage = dmirror_devmem_alloc_page(mdevice); if (!dpage) continue; - rpage = dpage->zone_device_data; + rpage = is_device_private_page(dpage) ? dpage->zone_device_data : + dpage; if (spage) copy_highpage(rpage, spage); else @@ -640,8 +648,11 @@ static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args, * the simulated device memory and that page holds the pointer * to the mirror. */ + rpage = dpage->zone_device_data; rpage->zone_device_data = dmirror; + pr_debug("migrating from sys to dev pfn src: 0x%lx pfn dst: 0x%lx\n", +page_to_pfn(spage), page_to_pfn(dpage)); *dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED; if ((*src & MIGRATE_PFN_WRITE) || @@ -721,10 +732,13 @@ static int dmirror_migrate_finalize_and_map(struct migrate_vma *args, continue; /* -* Store the page that holds the data so the page table -* doesn't have to deal with ZONE_DEVICE private pages. +* For ZONE_DEVICE private pages we store the page that +* holds the data so the page table doesn't have to deal it. +* For ZONE_DEVICE coherent pages we store the actual page, since +* the CPU has coherent access to the page. */ - entry = dpage->zone_device_data; + entry = is_device_private_page(dpage) ? dpage->zone_device_data : + dpage; if (*dst & MIGRATE_PFN_WRITE) entry = xa_tag_pointer(entry, DPT_XA_TAG_WRITE); entry = xa_store(>pt, pfn, entry, GFP_ATOMIC); @@ -804,8 +818,110 @@ static int dmirror_exclusive(struct dmirror *dmirror, return ret; } -static int dmirror_migrate(struct dmirror *dmirror, - struct hmm_dmirror_cmd *cmd) +static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma
[PATCH v1 6/9] lib: test_hmm add module param for zone device type
In order to configure device coherent in test_hmm, two module parameters should be passed, which correspond to the SP start address of each device (2) spm_addr_dev0 & spm_addr_dev1. If no parameters are passed, private device type is configured. Signed-off-by: Alex Sierra --- lib/test_hmm.c | 66 +++-- lib/test_hmm_uapi.h | 1 + 2 files changed, 47 insertions(+), 20 deletions(-) diff --git a/lib/test_hmm.c b/lib/test_hmm.c index 2daaa7b3710b..45334df28d7b 100644 --- a/lib/test_hmm.c +++ b/lib/test_hmm.c @@ -34,6 +34,16 @@ #define DEVMEM_CHUNK_SIZE (256 * 1024 * 1024U) #define DEVMEM_CHUNKS_RESERVE 16 +static unsigned long spm_addr_dev0; +module_param(spm_addr_dev0, long, 0644); +MODULE_PARM_DESC(spm_addr_dev0, + "Specify start address for SPM (special purpose memory) used for device 0. By setting this Coherent device type will be used. Make sure spm_addr_dev1 is set too"); + +static unsigned long spm_addr_dev1; +module_param(spm_addr_dev1, long, 0644); +MODULE_PARM_DESC(spm_addr_dev1, + "Specify start address for SPM (special purpose memory) used for device 1. By setting this Coherent device type will be used. Make sure spm_addr_dev0 is set too"); + static const struct dev_pagemap_ops dmirror_devmem_ops; static const struct mmu_interval_notifier_ops dmirror_min_ops; static dev_t dmirror_dev; @@ -452,11 +462,11 @@ static int dmirror_write(struct dmirror *dmirror, struct hmm_dmirror_cmd *cmd) return ret; } -static bool dmirror_allocate_chunk(struct dmirror_device *mdevice, +static int dmirror_allocate_chunk(struct dmirror_device *mdevice, struct page **ppage) { struct dmirror_chunk *devmem; - struct resource *res; + struct resource *res = NULL; unsigned long pfn; unsigned long pfn_first; unsigned long pfn_last; @@ -464,17 +474,29 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice, devmem = kzalloc(sizeof(*devmem), GFP_KERNEL); if (!devmem) - return false; + return -ENOMEM; - res = request_free_mem_region(_resource, DEVMEM_CHUNK_SIZE, - "hmm_dmirror"); - if (IS_ERR(res)) - goto err_devmem; + if (!spm_addr_dev0 && !spm_addr_dev1) { + res = request_free_mem_region(_resource, DEVMEM_CHUNK_SIZE, + "hmm_dmirror"); + if (IS_ERR_OR_NULL(res)) + goto err_devmem; + devmem->pagemap.range.start = res->start; + devmem->pagemap.range.end = res->end; + devmem->pagemap.type = MEMORY_DEVICE_PRIVATE; + mdevice->zone_device_type = HMM_DMIRROR_MEMORY_DEVICE_PRIVATE; + } else if (spm_addr_dev0 && spm_addr_dev1) { + devmem->pagemap.range.start = MINOR(mdevice->cdevice.dev) ? + spm_addr_dev0 : + spm_addr_dev1; + devmem->pagemap.range.end = devmem->pagemap.range.start + + DEVMEM_CHUNK_SIZE - 1; + devmem->pagemap.type = MEMORY_DEVICE_COHERENT; + mdevice->zone_device_type = HMM_DMIRROR_MEMORY_DEVICE_COHERENT; + } else { + pr_err("Both spm_addr_dev parameters should be set\n"); + } - mdevice->zone_device_type = HMM_DMIRROR_MEMORY_DEVICE_PRIVATE; - devmem->pagemap.type = MEMORY_DEVICE_PRIVATE; - devmem->pagemap.range.start = res->start; - devmem->pagemap.range.end = res->end; devmem->pagemap.nr_range = 1; devmem->pagemap.ops = _devmem_ops; devmem->pagemap.owner = mdevice; @@ -495,10 +517,14 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice, mdevice->devmem_capacity = new_capacity; mdevice->devmem_chunks = new_chunks; } - ptr = memremap_pages(>pagemap, numa_node_id()); - if (IS_ERR(ptr)) + if (IS_ERR_OR_NULL(ptr)) { + if (ptr) + ret = PTR_ERR(ptr); + else + ret = -EFAULT; goto err_release; + } devmem->mdevice = mdevice; pfn_first = devmem->pagemap.range.start >> PAGE_SHIFT; @@ -531,7 +557,8 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice, err_release: mutex_unlock(>devmem_lock); - release_mem_region(devmem->pagemap.range.start, range_len(>pagemap.range)); + if (res) + release_mem_region(devmem->pagemap.range.start, range_len(>pagemap.range)); err_devmem: kfree(devmem); @@ -1219,10 +1246,8 @@ static int dmirror_device_init(struct dmirror_device *mdevice, int id) if (ret) return ret; -
[PATCH v1 2/9] mm: add device coherent vma selection for memory migration
This case is used to migrate pages from device memory, back to system memory. Device coherent type memory is cache coherent from device and CPU point of view. Signed-off-by: Alex Sierra --- v2: condition added when migrations from device coherent pages. --- include/linux/migrate.h | 1 + mm/migrate.c| 9 +++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/include/linux/migrate.h b/include/linux/migrate.h index c8077e936691..e74bb0978f6f 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -138,6 +138,7 @@ static inline unsigned long migrate_pfn(unsigned long pfn) enum migrate_vma_direction { MIGRATE_VMA_SELECT_SYSTEM = 1 << 0, MIGRATE_VMA_SELECT_DEVICE_PRIVATE = 1 << 1, + MIGRATE_VMA_SELECT_DEVICE_COHERENT = 1 << 2, }; struct migrate_vma { diff --git a/mm/migrate.c b/mm/migrate.c index f74422a42192..166bfec7d85e 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -2340,8 +2340,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, if (is_writable_device_private_entry(entry)) mpfn |= MIGRATE_PFN_WRITE; } else { - if (!(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM)) - goto next; pfn = pte_pfn(pte); if (is_zero_pfn(pfn)) { mpfn = MIGRATE_PFN_MIGRATE; @@ -2349,6 +2347,13 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, goto next; } page = vm_normal_page(migrate->vma, addr, pte); + if (!is_zone_device_page(page) && + !(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM)) + goto next; + if (is_zone_device_page(page) && + (!(migrate->flags & MIGRATE_VMA_SELECT_DEVICE_COHERENT) || +page->pgmap->owner != migrate->pgmap_owner)) + goto next; mpfn = migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE; mpfn |= pte_write(pte) ? MIGRATE_PFN_WRITE : 0; } -- 2.32.0
[PATCH v1 9/9] tools: update test_hmm script to support SP config
Add two more parameters to set spm_addr_dev0 & spm_addr_dev1 addresses. These two parameters configure the start SP addresses for each device in test_hmm driver. Consequently, this configures zone device type as coherent. Signed-off-by: Alex Sierra --- tools/testing/selftests/vm/test_hmm.sh | 20 +--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/vm/test_hmm.sh b/tools/testing/selftests/vm/test_hmm.sh index 0647b525a625..3eeabe94399f 100755 --- a/tools/testing/selftests/vm/test_hmm.sh +++ b/tools/testing/selftests/vm/test_hmm.sh @@ -40,7 +40,18 @@ check_test_requirements() load_driver() { - modprobe $DRIVER > /dev/null 2>&1 + if [ $# -eq 0 ]; then + modprobe $DRIVER > /dev/null 2>&1 + else + if [ $# -eq 2 ]; then + modprobe $DRIVER spm_addr_dev0=$1 spm_addr_dev1=$2 + > /dev/null 2>&1 + else + echo "Missing module parameters. Make sure pass"\ + "spm_addr_dev0 and spm_addr_dev1" + usage + fi + fi if [ $? == 0 ]; then major=$(awk "\$2==\"HMM_DMIRROR\" {print \$1}" /proc/devices) mknod /dev/hmm_dmirror0 c $major 0 @@ -58,7 +69,7 @@ run_smoke() { echo "Running smoke test. Note, this test provides basic coverage." - load_driver + load_driver $1 $2 $(dirname "${BASH_SOURCE[0]}")/hmm-tests unload_driver } @@ -75,6 +86,9 @@ usage() echo "# Smoke testing" echo "./${TEST_NAME}.sh smoke" echo + echo "# Smoke testing with SPM enabled" + echo "./${TEST_NAME}.sh smoke " + echo exit 0 } @@ -84,7 +98,7 @@ function run_test() usage else if [ "$1" = "smoke" ]; then - run_smoke + run_smoke $2 $3 else usage fi -- 2.32.0
[PATCH v1 4/9] drm/amdkfd: coherent type as sys mem on migration to ram
Coherent device type memory on VRAM to RAM migration, has similar access as System RAM from the CPU. This flag sets the source from the sender. Which in Coherent type case, should be set as MIGRATE_VMA_SELECT_DEVICE_COHERENT. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 9e36fe8aea0f..3e405f078ade 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -661,9 +661,12 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct svm_range *prange, migrate.vma = vma; migrate.start = start; migrate.end = end; - migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE; migrate.pgmap_owner = SVM_ADEV_PGMAP_OWNER(adev); + if (adev->gmc.xgmi.connected_to_cpu) + migrate.flags = MIGRATE_VMA_SELECT_DEVICE_COHERENT; + else + migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE; size = 2 * sizeof(*migrate.src) + sizeof(uint64_t) + sizeof(dma_addr_t); size *= npages; buf = kvmalloc(size, GFP_KERNEL | __GFP_ZERO); -- 2.32.0
[PATCH v1 5/9] lib: test_hmm add ioctl to get zone device type
new ioctl cmd added to query zone device type. This will be used once the test_hmm adds zone device coherent type. Signed-off-by: Alex Sierra --- lib/test_hmm.c | 15 ++- lib/test_hmm_uapi.h | 7 +++ 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/lib/test_hmm.c b/lib/test_hmm.c index c259842f6d44..2daaa7b3710b 100644 --- a/lib/test_hmm.c +++ b/lib/test_hmm.c @@ -84,6 +84,7 @@ struct dmirror_chunk { struct dmirror_device { struct cdev cdevice; struct hmm_devmem *devmem; + unsigned intzone_device_type; unsigned intdevmem_capacity; unsigned intdevmem_count; @@ -470,6 +471,7 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice, if (IS_ERR(res)) goto err_devmem; + mdevice->zone_device_type = HMM_DMIRROR_MEMORY_DEVICE_PRIVATE; devmem->pagemap.type = MEMORY_DEVICE_PRIVATE; devmem->pagemap.range.start = res->start; devmem->pagemap.range.end = res->end; @@ -1025,6 +1027,15 @@ static int dmirror_snapshot(struct dmirror *dmirror, return ret; } +static int dmirror_get_device_type(struct dmirror *dmirror, + struct hmm_dmirror_cmd *cmd) +{ + mutex_lock(>mutex); + cmd->zone_device_type = dmirror->mdevice->zone_device_type; + mutex_unlock(>mutex); + + return 0; +} static long dmirror_fops_unlocked_ioctl(struct file *filp, unsigned int command, unsigned long arg) @@ -1074,7 +1085,9 @@ static long dmirror_fops_unlocked_ioctl(struct file *filp, case HMM_DMIRROR_SNAPSHOT: ret = dmirror_snapshot(dmirror, ); break; - + case HMM_DMIRROR_GET_MEM_DEV_TYPE: + ret = dmirror_get_device_type(dmirror, ); + break; default: return -EINVAL; } diff --git a/lib/test_hmm_uapi.h b/lib/test_hmm_uapi.h index f14dea5dcd06..c42e57a6a71e 100644 --- a/lib/test_hmm_uapi.h +++ b/lib/test_hmm_uapi.h @@ -26,6 +26,7 @@ struct hmm_dmirror_cmd { __u64 npages; __u64 cpages; __u64 faults; + __u64 zone_device_type; }; /* Expose the address space of the calling process through hmm device file */ @@ -35,6 +36,7 @@ struct hmm_dmirror_cmd { #define HMM_DMIRROR_SNAPSHOT _IOWR('H', 0x03, struct hmm_dmirror_cmd) #define HMM_DMIRROR_EXCLUSIVE _IOWR('H', 0x04, struct hmm_dmirror_cmd) #define HMM_DMIRROR_CHECK_EXCLUSIVE_IOWR('H', 0x05, struct hmm_dmirror_cmd) +#define HMM_DMIRROR_GET_MEM_DEV_TYPE _IOWR('H', 0x06, struct hmm_dmirror_cmd) /* * Values returned in hmm_dmirror_cmd.ptr for HMM_DMIRROR_SNAPSHOT. @@ -62,4 +64,9 @@ enum { HMM_DMIRROR_PROT_DEV_PRIVATE_REMOTE = 0x30, }; +enum { + /* 0 is reserved to catch uninitialized type fields */ + HMM_DMIRROR_MEMORY_DEVICE_PRIVATE = 1, +}; + #endif /* _LIB_TEST_HMM_UAPI_H */ -- 2.32.0
[PATCH v1 3/9] drm/amdkfd: add SPM support for SVM
When CPU is connected throug XGMI, it has coherent access to VRAM resource. In this case that resource is taken from a table in the device gmc aperture base. This resource is used along with the device type, which could be DEVICE_PRIVATE or DEVICE_COHERENT to create the device page map region. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling --- v7: Remove lookup_resource call, so export symbol for this function is not longer required. Patch dropped "kernel: resource: lookup_resource as exported symbol" --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 29 +++- 1 file changed, 18 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index aeade32ec298..9e36fe8aea0f 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -935,7 +935,7 @@ int svm_migrate_init(struct amdgpu_device *adev) { struct kfd_dev *kfddev = adev->kfd.dev; struct dev_pagemap *pgmap; - struct resource *res; + struct resource *res = NULL; unsigned long size; void *r; @@ -950,28 +950,34 @@ int svm_migrate_init(struct amdgpu_device *adev) * should remove reserved size */ size = ALIGN(adev->gmc.real_vram_size, 2ULL << 20); - res = devm_request_free_mem_region(adev->dev, _resource, size); - if (IS_ERR(res)) - return -ENOMEM; + if (adev->gmc.xgmi.connected_to_cpu) { + pgmap->range.start = adev->gmc.aper_base; + pgmap->range.end = adev->gmc.aper_base + adev->gmc.aper_size - 1; + pgmap->type = MEMORY_DEVICE_COHERENT; + } else { + res = devm_request_free_mem_region(adev->dev, _resource, size); + if (IS_ERR(res)) + return -ENOMEM; + pgmap->range.start = res->start; + pgmap->range.end = res->end; + pgmap->type = MEMORY_DEVICE_PRIVATE; + } - pgmap->type = MEMORY_DEVICE_PRIVATE; pgmap->nr_range = 1; - pgmap->range.start = res->start; - pgmap->range.end = res->end; pgmap->ops = _migrate_pgmap_ops; pgmap->owner = SVM_ADEV_PGMAP_OWNER(adev); - pgmap->flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE; - + pgmap->flags = 0; /* Device manager releases device-specific resources, memory region and * pgmap when driver disconnects from device. */ r = devm_memremap_pages(adev->dev, pgmap); if (IS_ERR(r)) { pr_err("failed to register HMM device memory\n"); - /* Disable SVM support capability */ pgmap->type = 0; - devm_release_mem_region(adev->dev, res->start, resource_size(res)); + if (pgmap->type == MEMORY_DEVICE_PRIVATE) + devm_release_mem_region(adev->dev, res->start, + res->end - res->start + 1); return PTR_ERR(r); } @@ -984,3 +990,4 @@ int svm_migrate_init(struct amdgpu_device *adev) return 0; } + -- 2.32.0
[PATCH v1 1/9] mm: add zone device coherent type memory support
Device memory that is cache coherent from device and CPU point of view. This is used on platforms that have an advanced system bus (like CAPI or CXL). Any page of a process can be migrated to such memory. However, no one should be allowed to pin such memory so that it can always be evicted. Signed-off-by: Alex Sierra --- include/linux/memremap.h | 8 include/linux/mm.h | 9 + mm/memcontrol.c | 6 +++--- mm/memory-failure.c | 6 +- mm/memremap.c| 5 - mm/migrate.c | 21 + 6 files changed, 42 insertions(+), 13 deletions(-) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index c0e9d35889e8..ff4d398edf35 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -39,6 +39,13 @@ struct vmem_altmap { * A more complete discussion of unaddressable memory may be found in * include/linux/hmm.h and Documentation/vm/hmm.rst. * + * MEMORY_DEVICE_COHERENT: + * Device memory that is cache coherent from device and CPU point of view. This + * is used on platforms that have an advanced system bus (like CAPI or CXL). A + * driver can hotplug the device memory using ZONE_DEVICE and with that memory + * type. Any page of a process can be migrated to such memory. However no one + * should be allowed to pin such memory so that it can always be evicted. + * * MEMORY_DEVICE_FS_DAX: * Host memory that has similar access semantics as System RAM i.e. DMA * coherent and supports page pinning. In support of coordinating page @@ -59,6 +66,7 @@ struct vmem_altmap { enum memory_type { /* 0 is reserved to catch uninitialized type fields */ MEMORY_DEVICE_PRIVATE = 1, + MEMORY_DEVICE_COHERENT, MEMORY_DEVICE_FS_DAX, MEMORY_DEVICE_GENERIC, MEMORY_DEVICE_PCI_P2PDMA, diff --git a/include/linux/mm.h b/include/linux/mm.h index 73a52aba448f..d23b6ebd2177 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1162,6 +1162,7 @@ static inline bool page_is_devmap_managed(struct page *page) return false; switch (page->pgmap->type) { case MEMORY_DEVICE_PRIVATE: + case MEMORY_DEVICE_COHERENT: case MEMORY_DEVICE_FS_DAX: return true; default: @@ -1191,6 +1192,14 @@ static inline bool is_device_private_page(const struct page *page) page->pgmap->type == MEMORY_DEVICE_PRIVATE; } +static inline bool is_device_page(const struct page *page) +{ + return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) && + is_zone_device_page(page) && + (page->pgmap->type == MEMORY_DEVICE_PRIVATE || + page->pgmap->type == MEMORY_DEVICE_COHERENT); +} + static inline bool is_pci_p2pdma_page(const struct page *page) { return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) && diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 6da5020a8656..e0275ebe1198 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5695,8 +5695,8 @@ static int mem_cgroup_move_account(struct page *page, * 2(MC_TARGET_SWAP): if the swap entry corresponding to this pte is a * target for charge migration. if @target is not NULL, the entry is stored * in target->ent. - * 3(MC_TARGET_DEVICE): like MC_TARGET_PAGE but page is MEMORY_DEVICE_PRIVATE - * (so ZONE_DEVICE page and thus not on the lru). + * 3(MC_TARGET_DEVICE): like MC_TARGET_PAGE but page is MEMORY_DEVICE_COHERENT + * or MEMORY_DEVICE_PRIVATE (so ZONE_DEVICE page and thus not on the lru). * For now we such page is charge like a regular page would be as for all * intent and purposes it is just special memory taking the place of a * regular page. @@ -5730,7 +5730,7 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, */ if (page_memcg(page) == mc.from) { ret = MC_TARGET_PAGE; - if (is_device_private_page(page)) + if (is_device_page(page)) ret = MC_TARGET_DEVICE; if (target) target->page = page; diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 3e6449f2102a..51b55fc5172c 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1554,12 +1554,16 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, goto unlock; } - if (pgmap->type == MEMORY_DEVICE_PRIVATE) { + switch (pgmap->type) { + case MEMORY_DEVICE_PRIVATE: + case MEMORY_DEVICE_COHERENT: /* * TODO: Handle HMM pages which may need coordination * with device-side memory. */ goto unlock; + default: + break; } /* diff --git a/mm/memremap.c b/mm/memremap.c index ed593bf87109..94d6a1e01d42 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@
[PATCH 5.15 350/917] drm/amd/display: Pass display_pipe_params_st as const in DML
From: Harry Wentland [ Upstream commit 22667e6ec6b2ce9ca706e9061660b059725d009c ] [Why] This neither needs to be on the stack nor passed by value to each function call. In fact, when building with clang it seems to break the Linux's default 1024 byte stack frame limit. [How] We can simply pass this as a const pointer. This patch fixes these Coverity IDs Addresses-Coverity-ID: 1424031: ("Big parameter passed by value") Addresses-Coverity-ID: 1423970: ("Big parameter passed by value") Addresses-Coverity-ID: 1423941: ("Big parameter passed by value") Addresses-Coverity-ID: 1451742: ("Big parameter passed by value") Addresses-Coverity-ID: 1451887: ("Big parameter passed by value") Addresses-Coverity-ID: 1454146: ("Big parameter passed by value") Addresses-Coverity-ID: 1454152: ("Big parameter passed by value") Addresses-Coverity-ID: 1454413: ("Big parameter passed by value") Addresses-Coverity-ID: 1466144: ("Big parameter passed by value") Addresses-Coverity-ID: 1487237: ("Big parameter passed by value") Signed-off-by: Harry Wentland Fixes: 3fe617ccafd6 ("Enable '-Werror' by default for all kernel builds") Cc: Nick Desaulniers Cc: Linus Torvalds Cc: amd-gfx@lists.freedesktop.org Cc: Linux Kernel Mailing List Cc: Arnd Bergmann Cc: Leo Li Cc: Alex Deucher Cc: Christian König Cc: Xinhui Pan Cc: Nathan Chancellor Cc: Guenter Roeck Cc: l...@lists.linux.dev Acked-by: Christian König Build-tested-by: Nathan Chancellor Reviewed-by: Leo Li Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../drm/amd/display/dc/dcn20/dcn20_resource.c | 2 +- .../dc/dml/dcn20/display_rq_dlg_calc_20.c | 6 +- .../dc/dml/dcn20/display_rq_dlg_calc_20.h | 4 +- .../dc/dml/dcn20/display_rq_dlg_calc_20v2.c | 6 +- .../dc/dml/dcn20/display_rq_dlg_calc_20v2.h | 4 +- .../dc/dml/dcn21/display_rq_dlg_calc_21.c | 62 .../dc/dml/dcn21/display_rq_dlg_calc_21.h | 4 +- .../dc/dml/dcn30/display_rq_dlg_calc_30.c | 72 +-- .../dc/dml/dcn30/display_rq_dlg_calc_30.h | 4 +- .../dc/dml/dcn31/display_rq_dlg_calc_31.c | 68 +- .../dc/dml/dcn31/display_rq_dlg_calc_31.h | 4 +- .../drm/amd/display/dc/dml/display_mode_lib.h | 4 +- 12 files changed, 120 insertions(+), 120 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c index f2f258e70f9da..34a126816133e 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c @@ -3152,7 +3152,7 @@ void dcn20_calculate_dlg_params( context->bw_ctx.dml.funcs.rq_dlg_get_rq_reg(>bw_ctx.dml, >res_ctx.pipe_ctx[i].rq_regs, - pipes[pipe_idx].pipe); + [pipe_idx].pipe); pipe_idx++; } } diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c index 2091dd8c252da..8c168f348a27f 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c @@ -768,12 +768,12 @@ static void dml20_rq_dlg_get_rq_params(struct display_mode_lib *mode_lib, void dml20_rq_dlg_get_rq_reg(struct display_mode_lib *mode_lib, display_rq_regs_st *rq_regs, - const display_pipe_params_st pipe_param) + const display_pipe_params_st *pipe_param) { display_rq_params_st rq_param = {0}; memset(rq_regs, 0, sizeof(*rq_regs)); - dml20_rq_dlg_get_rq_params(mode_lib, _param, pipe_param.src); + dml20_rq_dlg_get_rq_params(mode_lib, _param, pipe_param->src); extract_rq_regs(mode_lib, rq_regs, rq_param); print__rq_regs_st(mode_lib, *rq_regs); @@ -1549,7 +1549,7 @@ static void dml20_rq_dlg_get_dlg_params(struct display_mode_lib *mode_lib, void dml20_rq_dlg_get_dlg_reg(struct display_mode_lib *mode_lib, display_dlg_regs_st *dlg_regs, display_ttu_regs_st *ttu_regs, - display_e2e_pipe_params_st *e2e_pipe_param, + const display_e2e_pipe_params_st *e2e_pipe_param, const unsigned int num_pipes, const unsigned int pipe_idx, const bool cstate_en, diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.h b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.h index d0b90947f5409..8b23867e97c18 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.h +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.h @@ -43,7 +43,7 @@ struct display_mode_lib; void dml20_rq_dlg_get_rq_reg( struct display_mode_lib *mode_lib, display_rq_regs_st *rq_regs, - const display_pipe_params_st pipe_param); +
Re: [PATCH] drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
Am 2021-11-15 um 11:20 a.m. schrieb shaoyunl: > In SRIOV configuration, the reset may failed to bring asic back to normal but > stop cpsch > already been called, the start_cpsch will not be called since there is no > resume in this > case. When reset been triggered again, driver should avoid to do > uninitialization again. > > Signed-off-by: shaoyunl Reviewed-by: Felix Kuehling > --- > drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > index 42b2cc999434..62fe28244a80 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > @@ -1225,6 +1225,11 @@ static int stop_cpsch(struct device_queue_manager *dqm) > bool hanging; > > dqm_lock(dqm); > + if (!dqm->sched_running) { > + dqm_unlock(dqm); > + return 0; > + } > + > if (!dqm->is_hws_hang) > unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0); > hanging = dqm->is_hws_hang || dqm->is_resetting;
[PATCH] drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
In SRIOV configuration, the reset may failed to bring asic back to normal but stop cpsch already been called, the start_cpsch will not be called since there is no resume in this case. When reset been triggered again, driver should avoid to do uninitialization again. Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index 42b2cc999434..62fe28244a80 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1225,6 +1225,11 @@ static int stop_cpsch(struct device_queue_manager *dqm) bool hanging; dqm_lock(dqm); + if (!dqm->sched_running) { + dqm_unlock(dqm); + return 0; + } + if (!dqm->is_hws_hang) unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0); hanging = dqm->is_hws_hang || dqm->is_resetting; -- 2.17.1
RE: [PATCH] drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
[AMD Official Use Only] Om, sounds reasonable Thanks Shaoyun.liu -Original Message- From: Kuehling, Felix Sent: Monday, November 15, 2021 11:07 AM To: amd-gfx@lists.freedesktop.org; Liu, Shaoyun Subject: Re: [PATCH] drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again Am 2021-11-14 um 12:55 p.m. schrieb shaoyunl: > In SRIOV configuration, the reset may failed to bring asic back to > normal but stop cpsch already been called, the start_cpsch will not be > called since there is no resume in this case. When reset been triggered > again, driver should avoid to do uninitialization again. > > Signed-off-by: shaoyunl If there is a possibility that stop_cpsch is called multiple times, I think the check for that should be at the start of the function. Something like: if (!dqm->sched_running) return 0; Regards, Felix > --- > drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > index 42b2cc999434..bcc8980d77e0 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > @@ -1228,12 +1228,14 @@ static int stop_cpsch(struct device_queue_manager > *dqm) > if (!dqm->is_hws_hang) > unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0); > hanging = dqm->is_hws_hang || dqm->is_resetting; > - dqm->sched_running = false; > > - pm_release_ib(>packet_mgr); > + if (dqm->sched_running) { > + dqm->sched_running = false; > + pm_release_ib(>packet_mgr); > + kfd_gtt_sa_free(dqm->dev, dqm->fence_mem); > + pm_uninit(>packet_mgr, hanging); > + } > > - kfd_gtt_sa_free(dqm->dev, dqm->fence_mem); > - pm_uninit(>packet_mgr, hanging); > dqm_unlock(dqm); > > return 0;
Re: [PATCH] drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
Am 2021-11-14 um 12:55 p.m. schrieb shaoyunl: > In SRIOV configuration, the reset may failed to bring asic back to normal but > stop cpsch > already been called, the start_cpsch will not be called since there is no > resume in this > case. When reset been triggered again, driver should avoid to do > uninitialization again. > > Signed-off-by: shaoyunl If there is a possibility that stop_cpsch is called multiple times, I think the check for that should be at the start of the function. Something like: if (!dqm->sched_running) return 0; Regards, Felix > --- > drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > index 42b2cc999434..bcc8980d77e0 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > @@ -1228,12 +1228,14 @@ static int stop_cpsch(struct device_queue_manager > *dqm) > if (!dqm->is_hws_hang) > unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0); > hanging = dqm->is_hws_hang || dqm->is_resetting; > - dqm->sched_running = false; > > - pm_release_ib(>packet_mgr); > + if (dqm->sched_running) { > + dqm->sched_running = false; > + pm_release_ib(>packet_mgr); > + kfd_gtt_sa_free(dqm->dev, dqm->fence_mem); > + pm_uninit(>packet_mgr, hanging); > + } > > - kfd_gtt_sa_free(dqm->dev, dqm->fence_mem); > - pm_uninit(>packet_mgr, hanging); > dqm_unlock(dqm); > > return 0;
Re: [PATCH] drm/amd/amdgpu: fix potential memleak
Am 2021-11-14 um 9:58 p.m. schrieb Bernard Zhao: > In function amdgpu_get_xgmi_hive, when kobject_init_and_add failed > There is a potential memleak if not call kobject_put. > > Signed-off-by: Bernard Zhao Reviewed-by: Felix Kuehling > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > index 0fad2bf854ae..567df2db23ac 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > @@ -386,6 +386,7 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct > amdgpu_device *adev) > "%s", "xgmi_hive_info"); > if (ret) { > dev_err(adev->dev, "XGMI: failed initializing kobject for xgmi > hive\n"); > + kobject_put(>kobj); > kfree(hive); > hive = NULL; > goto pro_end;
Re: Backlight control broken on UM325 (OLED) on 5.15 (bisected)
On Mon, Nov 15, 2021 at 2:35 AM Samuel Čavoj wrote: > > Hello, > > the backlight control no longer works on my ASUS UM325 (Ryzen 5700U) > OLED laptop. I have bisected the breakage to commit 7fd13baeb7a3a48. > > commit 7fd13baeb7a3a48cae12c36c52f06bf4e9e7d728 (HEAD, refs/bisect/bad) > Author: Alex Deucher > Date: Thu Jul 8 16:31:10 2021 -0400 > > drm/amdgpu/display: add support for multiple backlights > > On platforms that support multiple backlights, register > each one separately. This lets us manage them independently > rather than registering a single backlight and applying the > same settings to both. > > v2: fix typo: > Reported-by: kernel test robot > > Reviewed-by: Roman Li > Signed-off-by: Alex Deucher > > I have encountered another user with the same issue on reddit[0]. > > The node in /sys/class/backlight exists, writing to it just does > nothing. I would be glad to help debugging the issue. Thank you very > much. That patch adds support for systems with multiple backlights. Do you have multiple backlight devices now? If so, does the other one work? Can you also try this patch? diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c index 4811b0faafd9..67163c9d49e6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c @@ -854,8 +854,8 @@ int amdgpu_acpi_init(struct amdgpu_device *adev) if (amdgpu_device_has_dc_support(adev)) { #if defined(CONFIG_DRM_AMD_DC) struct amdgpu_display_manager *dm = >dm; - if (dm->backlight_dev[0]) - atif->bd = dm->backlight_dev[0]; + if (dm->backlight_dev[1]) + atif->bd = dm->backlight_dev[1]; #endif } else { struct drm_encoder *tmp; Alex > > Regards, > Samuel Čavoj > > [0]: > https://www.reddit.com/r/AMDLaptops/comments/qst0fm/after_updating_to_linux_515_my_brightness/
Re: [PATCH v1 1/3] string: Consolidate yesno() helpers under string.h hood
On Mon, Nov 15, 2021 at 01:43:02PM +0200, Jani Nikula wrote: > On Mon, 15 Nov 2021, Andy Shevchenko > wrote: > > On Tue, Oct 05, 2021 at 02:34:23PM -0700, Lucas De Marchi wrote: > >> On Mon, Feb 15, 2021 at 04:21:35PM +0200, Andy Shevchenko wrote: > >> > We have already few similar implementation and a lot of code that can > >> > benefit > >> > of the yesno() helper. Consolidate yesno() helpers under string.h hood. > >> > >> I was taking a look on i915_utils.h to reduce it and move some of it > >> elsewhere to be shared with others. I was starting with these helpers > >> and had [1] done, then Jani pointed me to this thread and also his > >> previous tentative. I thought the natural place for this would be > >> include/linux/string_helpers.h, but I will leave it up to you. > > > > Seems reasonable to use string_helpers (headers and/or C-file). > > > >> After reading the threads, I don't see real opposition to it. > >> Is there a tree you plan to take this through? > > > > I rest my series in favour of Jani's approach, so I suppose there is no go > > for _this_ series. > > If you want to make it happen, please pick it up and drive it. I'm > thoroughly had enough of this. My point is still the same, so it's more to Lucas. I'm not going to drive this activity due to lack of time. > >> [1] > >> https://lore.kernel.org/lkml/20211005212634.3223113-1-lucas.demar...@intel.com/T/#u -- With Best Regards, Andy Shevchenko
答复: [PATCH] drm/amd/amdgpu: cleanup the code style a bit
-邮件原件- 发件人: bern...@vivo.com 代表 Christian K?nig 发送时间: 2021年11月15日 19:49 收件人: 赵军奎 ; Alex Deucher ; Christian König ; Pan, Xinhui ; David Airlie ; Daniel Vetter ; Jingwen Chen ; Candice Li ; John Clements ; Monk liu ; Peng Ju Zhou ; Jiawei Gu ; Bokun Zhang ; Zhigang Luo ; Lee Jones ; amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org; linux-ker...@vger.kernel.org 主题: Re: [PATCH] drm/amd/amdgpu: cleanup the code style a bit Am 15.11.21 um 08:07 schrieb Bernard Zhao: > This change is to cleanup the code style a bit. >To be honest I think the old style looked better. It took me a moment to >validate this now. >What you could to instead is to have goto style error handling which would >make this a bit more cleaner I think. Hi Looks like a good idea, thank you for your comments! I will resubmit a version! BR//Bernard >Christian. > > Signed-off-by: Bernard Zhao > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c > index 04cf9b207e62..90070b41136a 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c > @@ -286,12 +286,14 @@ static int amdgpu_virt_init_ras_err_handler_data(struct > amdgpu_device *adev) > return -ENOMEM; > > bps = kmalloc_array(align_space, sizeof((*data)->bps), GFP_KERNEL); > + if (!bps) { > + kfree(*data); > + return -ENOMEM; > + } > bps_bo = kmalloc_array(align_space, sizeof((*data)->bps_bo), > GFP_KERNEL); > - > - if (!bps || !bps_bo) { > - kfree(bps); > - kfree(bps_bo); > + if (!bps_bo) { > kfree(*data); > + kfree(bps); > return -ENOMEM; > } >
Re: [PATCH v1 1/3] string: Consolidate yesno() helpers under string.h hood
On Tue, Oct 05, 2021 at 02:34:23PM -0700, Lucas De Marchi wrote: > On Mon, Feb 15, 2021 at 04:21:35PM +0200, Andy Shevchenko wrote: > > We have already few similar implementation and a lot of code that can > > benefit > > of the yesno() helper. Consolidate yesno() helpers under string.h hood. > > I was taking a look on i915_utils.h to reduce it and move some of it > elsewhere to be shared with others. I was starting with these helpers > and had [1] done, then Jani pointed me to this thread and also his > previous tentative. I thought the natural place for this would be > include/linux/string_helpers.h, but I will leave it up to you. Seems reasonable to use string_helpers (headers and/or C-file). > After reading the threads, I don't see real opposition to it. > Is there a tree you plan to take this through? I rest my series in favour of Jani's approach, so I suppose there is no go for _this_ series. > [1] > https://lore.kernel.org/lkml/20211005212634.3223113-1-lucas.demar...@intel.com/T/#u -- With Best Regards, Andy Shevchenko
[PATCH v2] drm/amd/amdgpu: cleanup the code style a bit
This change is to cleanup the code style a bit. Signed-off-by: Bernard Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 21 + 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c index 04cf9b207e62..3fc49823f527 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c @@ -283,17 +283,15 @@ static int amdgpu_virt_init_ras_err_handler_data(struct amdgpu_device *adev) *data = kmalloc(sizeof(struct amdgpu_virt_ras_err_handler_data), GFP_KERNEL); if (!*data) - return -ENOMEM; + goto data_failure; bps = kmalloc_array(align_space, sizeof((*data)->bps), GFP_KERNEL); - bps_bo = kmalloc_array(align_space, sizeof((*data)->bps_bo), GFP_KERNEL); + if (!bps) + goto bps_failure; - if (!bps || !bps_bo) { - kfree(bps); - kfree(bps_bo); - kfree(*data); - return -ENOMEM; - } + bps_bo = kmalloc_array(align_space, sizeof((*data)->bps_bo), GFP_KERNEL); + if (!bps_bo) + goto bps_bo_failure; (*data)->bps = bps; (*data)->bps_bo = bps_bo; @@ -303,6 +301,13 @@ static int amdgpu_virt_init_ras_err_handler_data(struct amdgpu_device *adev) virt->ras_init_done = true; return 0; + +bps_bo_failure: + kfree(bps); +bps_failure: + kfree(*data); +data_failure: + return -ENOMEM; } static void amdgpu_virt_ras_release_bp(struct amdgpu_device *adev) -- 2.33.1
Re: [PATCH] drm/amdgpu: always reset the asic in suspend
[AMD Official Use Only] Well, that handles the case of the GPU needing to be reset on driver (e.g., virtualization), but doesn't handle the interrupted suspend case (e.g., when suspend is unwound before the power rail was turned off). We already so something similar for hibernate to deal with the multiple freeze and thaw cycles. Alex From: Christian König Sent: Monday, November 15, 2021 8:41 AM To: Alex Deucher ; Deucher, Alexander Cc: amd-gfx list Subject: Re: [PATCH] drm/amdgpu: always reset the asic in suspend I was just about to write up my concern as well. IIRC we used to have that and it didn't really worked that well and we switched to resetting the GPU on driver load instead if initializing it doesn't work of hand. Christian. Am 12.11.21 um 17:19 schrieb Alex Deucher: > Actually, ignore this for now. This will likely cause problems with S0ix. > > Alex > > On Fri, Nov 12, 2021 at 11:18 AM Alex Deucher > wrote: >> If the platform suspend happens to fail and the power rail >> is not turned off, the GPU will be in an unknown state on >> resume, so reset the asic so that it will be in a known >> good state on resume even if the platform suspend failed. >> >> Signed-off-by: Alex Deucher >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >> index 1db76429a673..42af3d88e0ba 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >> @@ -2165,8 +2165,9 @@ static int amdgpu_pmops_suspend(struct device *dev) >> adev->in_s3 = true; >> r = amdgpu_device_suspend(drm_dev, true); >> adev->in_s3 = false; >> - >> - return r; >> + if (r) >> + return r; >> + return amdgpu_asic_reset(adev); >> } >> >> static int amdgpu_pmops_resume(struct device *dev) >> -- >> 2.31.1 >>
Re: [PATCH] drm/amdgpu: always reset the asic in suspend (v2)
On Mon, Nov 15, 2021 at 2:50 AM Lazar, Lijo wrote: > > > > On 11/12/2021 9:55 PM, Alex Deucher wrote: > > If the platform suspend happens to fail and the power rail > > is not turned off, the GPU will be in an unknown state on > > resume, so reset the asic so that it will be in a known > > good state on resume even if the platform suspend failed. > > > > Any more background info on the issue? Is there a need to trigger BACO > or D3cold entry similar to how it's done for runtime suspend? Basically something like the following, user requests S3, drivers start to do their suspend thing, but then something interrupts it (e.g., user plugs/unplugs a usb device or S3 gets interrupted for something). At that point, the power rail has not been turned off. The kernel then starts calling the resume functions for each device because the suspend was aborted. However, since the power rail was not turned off, the GPU is still initialized so the driver can't properly re-init it without a reset. Alex > > Thanks, > Lijo > > > v2: handle s0ix > > > > Signed-off-by: Alex Deucher > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 - > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > > index 1db76429a673..b4591f6e82dd 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > > @@ -2165,7 +2165,10 @@ static int amdgpu_pmops_suspend(struct device *dev) > > adev->in_s3 = true; > > r = amdgpu_device_suspend(drm_dev, true); > > adev->in_s3 = false; > > - > > + if (r) > > + return r; > > + if (!adev->in_s0ix) > > + r = amdgpu_asic_reset(adev); > > return r; > > } > > > >
Re: [PATCH] drm/amd/pm: Remove artificial freq level on Navi1x
[AMD Official Use Only] Acked-by: Alex Deucher From: Lazar, Lijo Sent: Monday, November 15, 2021 2:42 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Zhang, Hawking ; Wang, Yang(Kevin) ; Quan, Evan Subject: [PATCH] drm/amd/pm: Remove artificial freq level on Navi1x Print Navi1x fine grained clocks in a consistent manner with other SOCs. Don't show aritificial DPM level when the current clock equals min or max. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c index 71161f6b78fe..60a557068ea4 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c @@ -1265,7 +1265,7 @@ static int navi10_print_clk_levels(struct smu_context *smu, enum smu_clk_type clk_type, char *buf) { uint16_t *curve_settings; - int i, size = 0, ret = 0; + int i, levels, size = 0, ret = 0; uint32_t cur_value = 0, value = 0, count = 0; uint32_t freq_values[3] = {0}; uint32_t mark_index = 0; @@ -1319,14 +1319,17 @@ static int navi10_print_clk_levels(struct smu_context *smu, freq_values[1] = cur_value; mark_index = cur_value == freq_values[0] ? 0 : cur_value == freq_values[2] ? 2 : 1; - if (mark_index != 1) - freq_values[1] = (freq_values[0] + freq_values[2]) / 2; - for (i = 0; i < 3; i++) { + levels = 3; + if (mark_index != 1) { + levels = 2; + freq_values[1] = freq_values[2]; + } + + for (i = 0; i < levels; i++) { size += sysfs_emit_at(buf, size, "%d: %uMhz %s\n", i, freq_values[i], i == mark_index ? "*" : ""); } - } break; case SMU_PCIE: -- 2.17.1
RE: [PATCH 00/14] DC Patches November 12, 2021
[Public] Hi all, This week this patchset was tested on the following systems: HP Envy 360, with Ryzen 5 4500U, with the following display types: eDP 1080p 60hz, 4k 60hz (via USB-C to DP/HDMI), 1440p 144hz (via USB-C to DP/HDMI), 1680*1050 60hz (via USB-C to DP and then DP to DVI/VGA) AMD Ryzen 9 5900H, with the following display types: eDP 1080p 60hz, 4k 60hz (via USB-C to DP/HDMI), 1440p 144hz (via USB-C to DP/HDMI), 1680*1050 60hz (via USB-C to DP and then DP to DVI/VGA) Sapphire Pulse RX5700XT with the following display types: 4k 60hz (via DP/HDMI), 1440p 144hz (via DP/HDMI), 1680*1050 60hz (via DP to DVI/VGA) Reference AMD RX6800 with the following display types: 4k 60hz (via DP/HDMI and USB-C to DP/HDMI), 1440p 144hz (via USB-C to DP/HDMI and USB-C to DP/HDMI), 1680*1050 60hz (via DP to DVI/VGA) Included testing using a Startech DP 1.4 MST hub at 2x 4k 60hz, and 3x 1080p 60hz on all systems. Also tested DSC via USB-C to DP DSC Hub with 3x 4k 60hz on Ryzen 9 5900h and Ryzen 5 4500u. Tested on Ubuntu 20.04.3 with Kernel Version 5.13 and ChromeOS Tested-by: Daniel Wheeler Thank you, Dan Wheeler Technologist | AMD SW Display -- 1 Commerce Valley Dr E, Thornhill, ON L3T 7X6 Facebook | Twitter | amd.com -Original Message- From: amd-gfx On Behalf Of Wayne Lin Sent: November 11, 2021 7:54 PM To: amd-gfx@lists.freedesktop.org Cc: Wang, Chao-kai (Stylon) ; Li, Sun peng (Leo) ; Wentland, Harry ; Zhuo, Qingqing ; Siqueira, Rodrigo ; Li, Roman ; Chiu, Solomon ; Pillai, Aurabindo ; Lin, Wayne ; Lipski, Mikita ; Lakha, Bhawanpreet ; Gutierrez, Agustin ; Kotarac, Pavle Subject: [PATCH 00/14] DC Patches November 12, 2021 This DC patchset brings improvements in multiple areas. In summary, we highlight: - Fix issue that secondary display goes blank on Non DCN31. - Adjust flushing data in DMCUB - Revert patches which cause regression in hadnling MPO/Link encoder assignment - Correct the setting within MSA of DP2.0 - Adjustment for DML isolation - Fix FIFO erro in fast boot sequence - Enable DSC over eDP - Adjust the DSC power off sequence --- Ahmad Othman (1): drm/amd/display: Secondary display goes blank on Non DCN31 Angus Wang (1): drm/amd/display: Revert changes for MPO underflow Anthony Koo (2): drm/amd/display: [FW Promotion] Release 0.0.92 drm/amd/display: [FW Promotion] Release 0.0.93 Aric Cyr (1): drm/amd/display: 3.2.162 Brandon Syu (1): drm/amd/display: Fix eDP will flash when boot to OS Jun Lei (1): drm/amd/display: Code change for DML isolation Mikita Lipski (1): drm/amd/display: Enable DSC over eDP Nicholas Kazlauskas (1): drm/amd/display: Only flush delta from last command execution Sung Joon Kim (1): drm/amd/display: Revert "retain/release stream pointer in link enc table" Wenjing Liu (1): drm/amd/display: set MSA vsp/hsp to 0 for positive polarity for DP 128b/132b Xu, Jinze (1): drm/amd/display: Reset fifo after enable otg Yi-Ling Chen (1): drm/amd/display: fixed the DSC power off sequence during Driver PnP hvanzyll (1): drm/amd/display: Visual Confirm Bar Height Adjust .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 73 +- .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c |2 +- drivers/gpu/drm/amd/display/dc/core/dc_link.c |5 +- .../gpu/drm/amd/display/dc/core/dc_link_dp.c | 167 +- .../drm/amd/display/dc/core/dc_link_enc_cfg.c |2 - drivers/gpu/drm/amd/display/dc/dc.h |7 +- drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c |1 + .../display/dc/dce110/dce110_hw_sequencer.c |7 +- .../drm/amd/display/dc/dcn10/dcn10_dpp_dscl.c | 14 +- .../amd/display/dc/dcn10/dcn10_hw_sequencer.c | 37 + .../display/dc/dcn10/dcn10_stream_encoder.c | 15 + .../display/dc/dcn10/dcn10_stream_encoder.h |3 + .../gpu/drm/amd/display/dc/dcn20/dcn20_dsc.c |2 + .../gpu/drm/amd/display/dc/dcn20/dcn20_optc.c | 14 + .../gpu/drm/amd/display/dc/dcn20/dcn20_optc.h |3 + .../drm/amd/display/dc/dcn20/dcn20_resource.c |2 +- .../display/dc/dcn20/dcn20_stream_encoder.c |2 + .../dc/dcn30/dcn30_dio_stream_encoder.c |2 + .../gpu/drm/amd/display/dc/dcn30/dcn30_optc.c |1 + .../drm/amd/display/dc/dcn30/dcn30_resource.c |2 +- .../amd/display/dc/dcn302/dcn302_resource.c |2 +- .../amd/display/dc/dcn303/dcn303_resource.c |2 +- .../dc/dcn31/dcn31_hpo_dp_stream_encoder.c|4 +- .../drm/amd/display/dc/dcn31/dcn31_hwseq.c|5 - .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.c |1 + .../drm/amd/display/dc/dcn31/dcn31_resource.c |1 + .../drm/amd/display/dc/dml/display_mode_lib.h |1 + .../gpu/drm/amd/display/dc/dml/dml_wrapper.c | 1889 + .../display/dc/dml/dml_wrapper_translation.c | 284 +++ drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c |8 +
Re: [PATCH 1/2] drm/amdgpu: IH process reset count when restart
Am 13.11.21 um 01:05 schrieb Philip Yang: Otherwise when IH process restart, count is zero, the loop will not exit to wake_up_all after processing AMDGPU_IH_MAX_NUM_IVS interrupts. Signed-off-by: Philip Yang Reviewed-by: Christian König Maybe even CC: stable? Regards, Christian. --- drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c index f3d62e196901..0c7963dfacad 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c @@ -223,7 +223,7 @@ int amdgpu_ih_wait_on_checkpoint_process(struct amdgpu_device *adev, */ int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih) { - unsigned int count = AMDGPU_IH_MAX_NUM_IVS; + unsigned int count; u32 wptr; if (!ih->enabled || adev->shutdown) @@ -232,6 +232,7 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih) wptr = amdgpu_ih_get_wptr(adev, ih); restart_ih: + count = AMDGPU_IH_MAX_NUM_IVS; DRM_DEBUG("%s: rptr %d, wptr %d\n", __func__, ih->rptr, wptr); /* Order reading of wptr vs. reading of IH ring data */
Re: [PATCH] drm/amdgpu: always reset the asic in suspend
I was just about to write up my concern as well. IIRC we used to have that and it didn't really worked that well and we switched to resetting the GPU on driver load instead if initializing it doesn't work of hand. Christian. Am 12.11.21 um 17:19 schrieb Alex Deucher: Actually, ignore this for now. This will likely cause problems with S0ix. Alex On Fri, Nov 12, 2021 at 11:18 AM Alex Deucher wrote: If the platform suspend happens to fail and the power rail is not turned off, the GPU will be in an unknown state on resume, so reset the asic so that it will be in a known good state on resume even if the platform suspend failed. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 1db76429a673..42af3d88e0ba 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -2165,8 +2165,9 @@ static int amdgpu_pmops_suspend(struct device *dev) adev->in_s3 = true; r = amdgpu_device_suspend(drm_dev, true); adev->in_s3 = false; - - return r; + if (r) + return r; + return amdgpu_asic_reset(adev); } static int amdgpu_pmops_resume(struct device *dev) -- 2.31.1
Re: Questions about KMS flip
On 2021-11-15 12:31, Lang Yu wrote: > On Mon, Nov 15, 2021 at 10:49:39AM +0100, Michel DDDnzer wrote: >> On 2021-11-15 10:04, Lang Yu wrote: >>> On Mon, Nov 15, 2021 at 09:38:47AM +0100, Michel DDDnzer wrote: On 2021-11-15 07:41, Lang Yu wrote: > On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote: >> On 2021-11-12 16:03, Christian König wrote: >>> Am 12.11.21 um 15:30 schrieb Michel Dänzer: On 2021-11-12 15:29, Michel Dänzer wrote: > On 2021-11-12 13:47, Christian König wrote: >> Anyway this unfortunately turned out to be work for Harray and >> Nicholas. In detail it's about this bug report here: >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621data=04%7C01%7CLang.Yu%40amd.com%7Cee54c4d055d040ef9f8b08d9a81d3eb9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725665833112900%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=7nwIYd1um420XHVpOzeIvz37%2FLQqHF%2F6aRKfzgxUTnM%3Dreserved=0 >> >> Lang was able to reproduce the issue and narrow it down to the pin >> in amdgpu_display_crtc_page_flip_target(). >> >> In other words we somehow have an unbalanced pinning of the scanout >> buffer in DC. > DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The > corresponding pin with DC would be in dm_plane_helper_prepare_fb, > paired with the unpin in > dm_plane_helper_cleanup_fb. > > > With non-DC, the pin in amdgpu_display_crtc_page_flip_target is > paired with the unpin in dm_plane_helper_cleanup_fb This should say amdgpu_display_unpin_work_func. >>> >>> Ah! So that is the classic (e.g. non atomic) path? >> >> Presumably. >> >> > & dce_v*_crtc_disable. One thing I notice is that the pin is guarded > by if (!adev->enable_virtual_display), but the unpins seem > unconditional. So could this be about virtual display, and the > problem is actually trying to unpin a BO that was never pinned? >>> >>> Nope, my educated guess is rather that we free up the BO before >>> amdgpu_display_unpin_work_func is called. >>> >>> E.g. not pin unbalance, but rather use after free. >> >> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), >> and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(>old_abo) >> only after amdgpu_bo_unpin. So what you describe could only happen if >> there's an imbalance elsewhere such that amdgpu_bo_unref is called more >> often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in >> amdgpu_display_unpin_work_func (in which case the "failed to reserve >> buffer after flip" error message should appear in dmesg). > > > Actually, each call to amdgpu_display_crtc_page_flip_target() will > > 1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer >(crtc->primary->fb), the work will be queued in > dce_vX_0_pageflip_irq(). > > 2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it? >Next call. > > The problem is the pinned buffer of last call to > amdgpu_display_crtc_page_flip_target() is not unpinned. It's unpinned in dce_v*_0_crtc_disable. >>> >>> I just found crtc->primary->fb is NULL when came in dce_v*_0_crtc_disable(). >>> So it's not unpinned... >> >> __drm_helper_disable_unused_functions sets crtc->primary->fb = NULL only >> after calling crtc_funcs->disable. Maybe this path can get hit for a CRTC >> which was already disabled, in which case crtc->primary->fb == NULL in >> dce_v*_0_crtc_disable is harmless. >> >> Have you checked for the issue I described below? Should be pretty easy to >> catch. >> >> I think I've found the problem though: dce_v*_0_crtc_do_set_base pin the BO from target_fb unconditionally, but unpin the BO from the fb parameter only if it's different from the former. So if they're the same, the BO's pin count is incremented by 1. > > Form my observations, amdgpu_bo_unpin() in dce_v*_0_crtc_disable() is > never called. It would be expected to happen when the screen turns off, e.g. due to DPMS. > Though a single call to dce_v*_0_crtc_do_set_base() will > only pin the BO, I found it will be unpinned in next call to > dce_v*_0_crtc_do_set_base(). Yeah, that's the normal case when the new BO is different from the old one. To catch the case I described, try something like diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c index 18a7b3bd633b..5726bd87a355 100644 --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c @@ -1926,6 +1926,7 @@ static int dce_v11_0_crtc_do_set_base(struct drm_crtc *crtc,
Re: [PATCH] drm/amd/amdgpu: cleanup the code style a bit
Am 15.11.21 um 08:07 schrieb Bernard Zhao: This change is to cleanup the code style a bit. To be honest I think the old style looked better. It took me a moment to validate this now. What you could to instead is to have goto style error handling which would make this a bit more cleaner I think. Christian. Signed-off-by: Bernard Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c index 04cf9b207e62..90070b41136a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c @@ -286,12 +286,14 @@ static int amdgpu_virt_init_ras_err_handler_data(struct amdgpu_device *adev) return -ENOMEM; bps = kmalloc_array(align_space, sizeof((*data)->bps), GFP_KERNEL); + if (!bps) { + kfree(*data); + return -ENOMEM; + } bps_bo = kmalloc_array(align_space, sizeof((*data)->bps_bo), GFP_KERNEL); - - if (!bps || !bps_bo) { - kfree(bps); - kfree(bps_bo); + if (!bps_bo) { kfree(*data); + kfree(bps); return -ENOMEM; }
Re: [PATCH v1 1/3] string: Consolidate yesno() helpers under string.h hood
On Mon, 15 Nov 2021, Andy Shevchenko wrote: > On Tue, Oct 05, 2021 at 02:34:23PM -0700, Lucas De Marchi wrote: >> On Mon, Feb 15, 2021 at 04:21:35PM +0200, Andy Shevchenko wrote: >> > We have already few similar implementation and a lot of code that can >> > benefit >> > of the yesno() helper. Consolidate yesno() helpers under string.h hood. >> >> I was taking a look on i915_utils.h to reduce it and move some of it >> elsewhere to be shared with others. I was starting with these helpers >> and had [1] done, then Jani pointed me to this thread and also his >> previous tentative. I thought the natural place for this would be >> include/linux/string_helpers.h, but I will leave it up to you. > > Seems reasonable to use string_helpers (headers and/or C-file). > >> After reading the threads, I don't see real opposition to it. >> Is there a tree you plan to take this through? > > I rest my series in favour of Jani's approach, so I suppose there is no go > for _this_ series. If you want to make it happen, please pick it up and drive it. I'm thoroughly had enough of this. BR, Jani. > >> [1] >> https://lore.kernel.org/lkml/20211005212634.3223113-1-lucas.demar...@intel.com/T/#u -- Jani Nikula, Intel Open Source Graphics Center
[PATCH] drm/amd/amdgpu: remove useless break after return
This change is to remove useless break after return. Signed-off-by: Bernard Zhao --- drivers/gpu/drm/amd/amdgpu/dce_v8_0.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c index b200b9e722d9..8318ee8339f1 100644 --- a/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c +++ b/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c @@ -2092,22 +2092,18 @@ static int dce_v8_0_pick_dig_encoder(struct drm_encoder *encoder) return 1; else return 0; - break; case ENCODER_OBJECT_ID_INTERNAL_UNIPHY1: if (dig->linkb) return 3; else return 2; - break; case ENCODER_OBJECT_ID_INTERNAL_UNIPHY2: if (dig->linkb) return 5; else return 4; - break; case ENCODER_OBJECT_ID_INTERNAL_UNIPHY3: return 6; - break; default: DRM_ERROR("invalid encoder_id: 0x%x\n", amdgpu_encoder->encoder_id); return 0; -- 2.33.1
[PATCH] drm/amd/amdgpu: cleanup the code style a bit
This change is to cleanup the code style a bit. Signed-off-by: Bernard Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c index 04cf9b207e62..90070b41136a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c @@ -286,12 +286,14 @@ static int amdgpu_virt_init_ras_err_handler_data(struct amdgpu_device *adev) return -ENOMEM; bps = kmalloc_array(align_space, sizeof((*data)->bps), GFP_KERNEL); + if (!bps) { + kfree(*data); + return -ENOMEM; + } bps_bo = kmalloc_array(align_space, sizeof((*data)->bps_bo), GFP_KERNEL); - - if (!bps || !bps_bo) { - kfree(bps); - kfree(bps_bo); + if (!bps_bo) { kfree(*data); + kfree(bps); return -ENOMEM; } -- 2.33.1
[PATCH] drm/amd/amdgpu: fix potential memleak
In function amdgpu_get_xgmi_hive, when kobject_init_and_add failed There is a potential memleak if not call kobject_put. Signed-off-by: Bernard Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c index 0fad2bf854ae..567df2db23ac 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c @@ -386,6 +386,7 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev) "%s", "xgmi_hive_info"); if (ret) { dev_err(adev->dev, "XGMI: failed initializing kobject for xgmi hive\n"); + kobject_put(>kobj); kfree(hive); hive = NULL; goto pro_end; -- 2.33.1
[PATCH -next] drm/amd/display: check top_pipe_to_program pointer
Clang static analysis reports this error drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:2870:7: warning: Dereference of null pointer [clang-analyzer-core.NullDereference] if (top_pipe_to_program->stream_res.tg->funcs->lock_doublebuffer_enable) { ^ top_pipe_to_program being NULL is caught as an error But then it is used to report the error. So add a check before using it. Reported-by: Abaci Robot Signed-off-by: Yang Li --- drivers/gpu/drm/amd/display/dc/core/dc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index 39ad385..34382d0 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -2867,7 +2867,8 @@ static void commit_planes_for_stream(struct dc *dc, #endif if ((update_type != UPDATE_TYPE_FAST) && stream->update_flags.bits.dsc_changed) - if (top_pipe_to_program->stream_res.tg->funcs->lock_doublebuffer_enable) { + if (top_pipe_to_program && + top_pipe_to_program->stream_res.tg->funcs->lock_doublebuffer_enable) { if (should_use_dmub_lock(stream->link)) { union dmub_hw_lock_flags hw_locks = { 0 }; struct dmub_hw_lock_inst_flags inst_flags = { 0 }; -- 1.8.3.1
Backlight control broken on UM325 (OLED) on 5.15 (bisected)
Hello, the backlight control no longer works on my ASUS UM325 (Ryzen 5700U) OLED laptop. I have bisected the breakage to commit 7fd13baeb7a3a48. commit 7fd13baeb7a3a48cae12c36c52f06bf4e9e7d728 (HEAD, refs/bisect/bad) Author: Alex Deucher Date: Thu Jul 8 16:31:10 2021 -0400 drm/amdgpu/display: add support for multiple backlights On platforms that support multiple backlights, register each one separately. This lets us manage them independently rather than registering a single backlight and applying the same settings to both. v2: fix typo: Reported-by: kernel test robot Reviewed-by: Roman Li Signed-off-by: Alex Deucher I have encountered another user with the same issue on reddit[0]. The node in /sys/class/backlight exists, writing to it just does nothing. I would be glad to help debugging the issue. Thank you very much. Regards, Samuel Čavoj [0]: https://www.reddit.com/r/AMDLaptops/comments/qst0fm/after_updating_to_linux_515_my_brightness/
Re: [PATCH] drm/amd/amdgpu: remove useless break after return
On Sun, 2021-11-14 at 23:14 -0800, Bernard Zhao wrote: > This change is to remove useless break after return. [] > diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c > b/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c [] > @@ -2092,22 +2092,18 @@ static int dce_v8_0_pick_dig_encoder(struct > drm_encoder *encoder) > return 1; > else > return 0; > - break; > case ENCODER_OBJECT_ID_INTERNAL_UNIPHY1: > if (dig->linkb) > return 3; > else > return 2; > - break; > case ENCODER_OBJECT_ID_INTERNAL_UNIPHY2: > if (dig->linkb) > return 5; > else > return 4; > - break; > case ENCODER_OBJECT_ID_INTERNAL_UNIPHY3: > return 6; > - break; > default: > DRM_ERROR("invalid encoder_id: 0x%x\n", > amdgpu_encoder->encoder_id); > return 0; Perhaps rewrite to make the sequential numbering more obvious. --- drivers/gpu/drm/amd/amdgpu/dce_v8_0.c | 19 +++ 1 file changed, 3 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c index b200b9e722d97..7307524b706b4 100644 --- a/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c +++ b/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c @@ -2088,26 +2088,13 @@ static int dce_v8_0_pick_dig_encoder(struct drm_encoder *encoder) switch (amdgpu_encoder->encoder_id) { case ENCODER_OBJECT_ID_INTERNAL_UNIPHY: - if (dig->linkb) - return 1; - else - return 0; - break; + return !dig->linkb ? 0 : 1; case ENCODER_OBJECT_ID_INTERNAL_UNIPHY1: - if (dig->linkb) - return 3; - else - return 2; - break; + return !dig->linkb ? 2 : 3; case ENCODER_OBJECT_ID_INTERNAL_UNIPHY2: - if (dig->linkb) - return 5; - else - return 4; - break; + return !dig->linkb ? 4 : 5; case ENCODER_OBJECT_ID_INTERNAL_UNIPHY3: return 6; - break; default: DRM_ERROR("invalid encoder_id: 0x%x\n", amdgpu_encoder->encoder_id); return 0;
Re: Questions about KMS flip
On 2021-11-15 10:04, Lang Yu wrote: > On Mon, Nov 15, 2021 at 09:38:47AM +0100, Michel DDDnzer wrote: >> On 2021-11-15 07:41, Lang Yu wrote: >>> On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote: On 2021-11-12 16:03, Christian König wrote: > Am 12.11.21 um 15:30 schrieb Michel Dänzer: >> On 2021-11-12 15:29, Michel Dänzer wrote: >>> On 2021-11-12 13:47, Christian König wrote: Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621data=04%7C01%7CLang.Yu%40amd.com%7C766e41a1c3052b6b08d9a81358b0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725623316543180%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=od%2BNksWOff%2FtsuAYZLX7lIJFQJMY2OScVqhLclPYWAQ%3Dreserved=0 Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target(). In other words we somehow have an unbalanced pinning of the scanout buffer in DC. >>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The >>> corresponding pin with DC would be in dm_plane_helper_prepare_fb, >>> paired with the unpin in >>> dm_plane_helper_cleanup_fb. >>> >>> >>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired >>> with the unpin in dm_plane_helper_cleanup_fb >> This should say amdgpu_display_unpin_work_func. > > Ah! So that is the classic (e.g. non atomic) path? Presumably. >>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by >>> if (!adev->enable_virtual_display), but the unpins seem unconditional. >>> So could this be about virtual display, and the problem is actually >>> trying to unpin a BO that was never pinned? > > Nope, my educated guess is rather that we free up the BO before > amdgpu_display_unpin_work_func is called. > > E.g. not pin unbalance, but rather use after free. amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(>old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg). >>> >>> >>> Actually, each call to amdgpu_display_crtc_page_flip_target() will >>> >>> 1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer >>>(crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq(). >>> >>> 2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it? >>>Next call. >>> >>> The problem is the pinned buffer of last call to >>> amdgpu_display_crtc_page_flip_target() is not unpinned. >> >> It's unpinned in dce_v*_0_crtc_disable. > > I just found crtc->primary->fb is NULL when came in dce_v*_0_crtc_disable(). > So it's not unpinned... __drm_helper_disable_unused_functions sets crtc->primary->fb = NULL only after calling crtc_funcs->disable. Maybe this path can get hit for a CRTC which was already disabled, in which case crtc->primary->fb == NULL in dce_v*_0_crtc_disable is harmless. Have you checked for the issue I described below? Should be pretty easy to catch. >> I think I've found the problem though: dce_v*_0_crtc_do_set_base pin the BO >> from target_fb unconditionally, but unpin the BO from the fb parameter only >> if it's different from the former. So if they're the same, the BO's pin >> count is incremented by 1. -- Earthling Michel Dänzer| https://redhat.com Libre software enthusiast | Mesa and Xwayland developer
Re: Questions about KMS flip
On 2021-11-15 07:41, Lang Yu wrote: > On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote: >> On 2021-11-12 16:03, Christian König wrote: >>> Am 12.11.21 um 15:30 schrieb Michel Dänzer: On 2021-11-12 15:29, Michel Dänzer wrote: > On 2021-11-12 13:47, Christian König wrote: >> Anyway this unfortunately turned out to be work for Harray and Nicholas. >> In detail it's about this bug report here: >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621data=04%7C01%7CLang.Yu%40amd.com%7Cca557eab16864ab544a108d9a5f6f288%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637723302338811983%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=9oG6k22BVp%2BkSvRX6uMlGXc6G%2BA5UL0Qy3W5iDTpvzs%3Dreserved=0 >> >> Lang was able to reproduce the issue and narrow it down to the pin in >> amdgpu_display_crtc_page_flip_target(). >> >> In other words we somehow have an unbalanced pinning of the scanout >> buffer in DC. > DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The > corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired > with the unpin in > dm_plane_helper_cleanup_fb. > > > With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired > with the unpin in dm_plane_helper_cleanup_fb This should say amdgpu_display_unpin_work_func. >>> >>> Ah! So that is the classic (e.g. non atomic) path? >> >> Presumably. >> >> > & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by > if (!adev->enable_virtual_display), but the unpins seem unconditional. So > could this be about virtual display, and the problem is actually trying > to unpin a BO that was never pinned? >>> >>> Nope, my educated guess is rather that we free up the BO before >>> amdgpu_display_unpin_work_func is called. >>> >>> E.g. not pin unbalance, but rather use after free. >> >> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and >> amdgpu_display_unpin_work_func calls amdgpu_bo_unref(>old_abo) only >> after amdgpu_bo_unpin. So what you describe could only happen if there's an >> imbalance elsewhere such that amdgpu_bo_unref is called more often than >> amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in >> amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer >> after flip" error message should appear in dmesg). > > > Actually, each call to amdgpu_display_crtc_page_flip_target() will > > 1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer >(crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq(). > > 2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it? >Next call. > > The problem is the pinned buffer of last call to > amdgpu_display_crtc_page_flip_target() is not unpinned. It's unpinned in dce_v*_0_crtc_disable. I think I've found the problem though: dce_v*_0_crtc_do_set_base pin the BO from target_fb unconditionally, but unpin the BO from the fb parameter only if it's different from the former. So if they're the same, the BO's pin count is incremented by 1. -- Earthling Michel Dänzer| https://redhat.com Libre software enthusiast | Mesa and Xwayland developer