Re: [PATCH 2/2] drm: add tdr support for embedded hw_fence
Am 22.07.21 um 18:47 schrieb Jingwen Chen:
On Thu Jul 22, 2021 at 06:24:28PM +0200, Christian König wrote:
Am 22.07.21 um 16:45 schrieb Andrey Grodzovsky:
On 2021-07-22 6:45 a.m., Jingwen Chen wrote:
On Wed Jul 21, 2021 at 12:53:51PM -0400, Andrey Grodzovsky wrote:
On 2021-07-20 11:13 p.m., Jingwen Chen wrote:

[Why]
After embedding the hw_fence into amdgpu_job, we need to add tdr support for this feature.

[How]
1. Add a resubmit_flag for resubmit jobs.
2. Clear job fences from the RCU array and force complete vm flush fences in pre_asic_reset.
3. Skip dma_fence_get for resubmit jobs and add a dma_fence_put for guilty jobs.

Signed-off-by: Jack Zhang
Signed-off-by: Jingwen Chen
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 16 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  4 +++-
 drivers/gpu/drm/scheduler/sched_main.c     |  1 +
 include/drm/gpu_scheduler.h                |  1 +
 5 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 40461547701a..fe0237f72a09 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4382,7 +4382,7 @@ int amdgpu_device_mode1_reset(struct amdgpu_device *adev)
 int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
				 struct amdgpu_reset_context *reset_context)
 {
-	int i, r = 0;
+	int i, j, r = 0;
	struct amdgpu_job *job = NULL;
	bool need_full_reset =
		test_bit(AMDGPU_NEED_FULL_RESET, &reset_context->flags);
@@ -4406,6 +4406,16 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
		if (!ring || !ring->sched.thread)
			continue;

+		/* Clear job fences from the fence drv to avoid force_completion;
+		 * leave NULL and vm flush fences in the fence drv. */
+		for (j = 0; j <= ring->fence_drv.num_fences_mask; j++) {
+			struct dma_fence *old, **ptr;
+
+			ptr = &ring->fence_drv.fences[j];
+			old = rcu_dereference_protected(*ptr, 1);
+			if (old && test_bit(DMA_FENCE_FLAG_USER_BITS, &old->flags)) {
+				RCU_INIT_POINTER(*ptr, NULL);
+			}

Is this to avoid premature job free because of dma_fence_put inside amdgpu_fence_process? I can't currently remember why, but we probably want all the HW fences currently in the ring to be force signaled. Maybe better to test for DMA_FENCE_FLAG_USER_BITS inside amdgpu_fence_process and still do the signaling, but not the dma_fence_put part.

Andrey

Hi Andrey,

This is to avoid signaling the same fence twice. If we still do the signaling, then the job in the pending list will be signaled first in force_completion, and later be signaled again on resubmit. This will hit the BUG() in amdgpu_fence_process.

Oh, I see. How about just adding a 'skip' flag to amdgpu_ring and setting it before calling amdgpu_fence_driver_force_completion, then resetting it after. Inside amdgpu_fence_driver_force_completion you can then skip the signaling part with this flag for fences with DMA_FENCE_FLAG_USER_BITS set. Fewer lines of code, at least.

Still sounds quite a bit hacky. I would rather suggest to completely drop the approach with amdgpu_fence_driver_force_completion(). I could never see why we would want that in the first place.

Regards,
Christian.

Hi Christian,

I keep the amdgpu_fence_driver_force_completion here to make sure the vm flush fence is signaled and put. So the key question is whether the fence of the ib test should be the first fence to be signaled after reset. If it should be, it means not only fences with DMA_FENCE_FLAG_USER_BITS but also vm flush fences should be removed from the RCU fence array before ib_test. Then we must do the force_completion here for the vm flush fence, otherwise there will be a memory leak, as no one will signal and put it after we remove it from the RCU array. If not, then the first fence to signal could be the vm flush fence, and I'm OK with dropping amdgpu_fence_driver_force_completion here.

The key problem is that amdgpu_fence_driver_force_completion() goes over the RCU array and signals everything there. What we should do instead is to go over the jobs, clean up the ones we don't want to re-submit and keep the rest. And please drop any usage of DMA_FENCE_FLAG_USER_BITS. That is not something which should be abused for reset handling.

Regards,
Christian.

Best Regards,
JingWen Chen

Andrey

+		}

		/* after all hw jobs are reset, hw fence is meaningless, so force_completion */
		amdgpu_fence_driver_force_completion(ring);
	}

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index eecf21d8ec33..815776c9a013 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -156,11 +156,17 @@ int amdgpu_fe
Re: [PATCH v4 10/13] lib: test_hmm add module param for zone device type
On Thu, Jul 22, 2021 at 11:59:17AM -0500, Sierra Guiza, Alejandro (Alex) wrote:
> On 7/22/2021 7:23 AM, Jason Gunthorpe wrote:
> > On Sat, Jul 17, 2021 at 02:21:32PM -0500, Alex Sierra wrote:
> > > In order to configure device generic in test_hmm, two module
> > > parameters should be passed, which correspond to the SP start address
> > > of each device (2): spm_addr_dev0 & spm_addr_dev1. If no parameters
> > > are passed, private device type is configured.
> >
> > I don't think tests should need configuration like this, is it really
> > necessary? How can people with normal HW run this test?
>
> Hi Jason,
> The idea was to add an easy way to validate the codepaths touched by this
> patch series, which make modifications to the migration helpers for
> device generic type pages. We're using CONFIG_EFI_FAKE_MEMMAP to create
> fake SPM devices inside system memory. No special HW needed. And passing
> the kernel parameter efi_fake_mem. Ex. efi_fake_mem=1G@0x1:0x4. I should
> probably include a small example of how to set this in the test_hmm.sh
> usage().

I don't think anything about hmm is sensitive to how the pages are
acquired - you can't create device generic pages without relying on
FAKE_MEMMAP?

Jason

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
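For reference, a hedged sketch of the setup Alex describes. The region address and size below are placeholders that must match the target machine's memory map, and 0x40000 is assumed to be the EFI_MEMORY_SP attribute bit (verify against your kernel's EFI headers before use):

```shell
# 1) Boot with a fake special-purpose memory region carved out of system RAM
#    (requires CONFIG_EFI_FAKE_MEMMAP); appended to the kernel command line:
#      efi_fake_mem=1G@0x100000000:0x40000
#
# 2) Load test_hmm with the SP start address of each fake device:
modprobe test_hmm spm_addr_dev0=0x100000000 spm_addr_dev1=0x140000000

# With no spm_addr_dev* parameters, test_hmm falls back to the
# device-private page type.
```

This only illustrates the parameter plumbing discussed above; it does not answer Jason's question about whether the test should need such configuration at all.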
Re: [PATCH 2/2] drm: add tdr support for embedded hw_fence
On 2021-07-22 1:27 p.m., Jingwen Chen wrote:
On Thu Jul 22, 2021 at 01:17:13PM -0400, Andrey Grodzovsky wrote:
On 2021-07-22 12:47 p.m., Jingwen Chen wrote:
On Thu Jul 22, 2021 at 06:24:28PM +0200, Christian König wrote:
Am 22.07.21 um 16:45 schrieb Andrey Grodzovsky:
On 2021-07-22 6:45 a.m., Jingwen Chen wrote:
On Wed Jul 21, 2021 at 12:53:51PM -0400, Andrey Grodzovsky wrote:
On 2021-07-20 11:13 p.m., Jingwen Chen wrote:

[Why]
After embedding the hw_fence into amdgpu_job, we need to add tdr support for this feature.

[How]
1. Add a resubmit_flag for resubmit jobs.
2. Clear job fences from the RCU array and force complete vm flush fences in pre_asic_reset.
3. Skip dma_fence_get for resubmit jobs and add a dma_fence_put for guilty jobs.

Signed-off-by: Jack Zhang
Signed-off-by: Jingwen Chen
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 16 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  4 +++-
 drivers/gpu/drm/scheduler/sched_main.c     |  1 +
 include/drm/gpu_scheduler.h                |  1 +
 5 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 40461547701a..fe0237f72a09 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4382,7 +4382,7 @@ int amdgpu_device_mode1_reset(struct amdgpu_device *adev)
 int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
				 struct amdgpu_reset_context *reset_context)
 {
-	int i, r = 0;
+	int i, j, r = 0;
	struct amdgpu_job *job = NULL;
	bool need_full_reset =
		test_bit(AMDGPU_NEED_FULL_RESET, &reset_context->flags);
@@ -4406,6 +4406,16 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
		if (!ring || !ring->sched.thread)
			continue;

+		/* Clear job fences from the fence drv to avoid force_completion;
+		 * leave NULL and vm flush fences in the fence drv. */
+		for (j = 0; j <= ring->fence_drv.num_fences_mask; j++) {
+			struct dma_fence *old, **ptr;
+
+			ptr = &ring->fence_drv.fences[j];
+			old = rcu_dereference_protected(*ptr, 1);
+			if (old && test_bit(DMA_FENCE_FLAG_USER_BITS, &old->flags)) {
+				RCU_INIT_POINTER(*ptr, NULL);
+			}

Is this to avoid premature job free because of dma_fence_put inside amdgpu_fence_process? I can't currently remember why, but we probably want all the HW fences currently in the ring to be force signaled. Maybe better to test for DMA_FENCE_FLAG_USER_BITS inside amdgpu_fence_process and still do the signaling, but not the dma_fence_put part.

Andrey

Hi Andrey,

This is to avoid signaling the same fence twice. If we still do the signaling, then the job in the pending list will be signaled first in force_completion, and later be signaled again on resubmit. This will hit the BUG() in amdgpu_fence_process.

Oh, I see. How about just adding a 'skip' flag to amdgpu_ring and setting it before calling amdgpu_fence_driver_force_completion, then resetting it after. Inside amdgpu_fence_driver_force_completion you can then skip the signaling part with this flag for fences with DMA_FENCE_FLAG_USER_BITS set. Fewer lines of code, at least.

Still sounds quite a bit hacky. I would rather suggest to completely drop the approach with amdgpu_fence_driver_force_completion(). I could never see why we would want that in the first place.

Regards,
Christian.

Hi Christian,

I keep the amdgpu_fence_driver_force_completion here to make sure the vm flush fence is signaled and put.

Right, so we need to do force completion for the sake of all the fences without parent jobs, since there is code which waits directly on them.

So the key question is whether the fence of the ib test should be the first fence to be signaled after reset.

What do you mean by 'after reset'? At this point in the code there was no ASIC reset yet; we just stopped the schedulers and detached the HW fences from their parent jobs (sched_fence).

I mean the ASIC reset. There will be an ib_test for each ring after the ASIC reset.

Then why wouldn't they be the first? They will emit new fences into the ring, which will be signaled right away because the ASIC went through reset and is not hung anymore. All the previous fences, including the VM flush ones from before the reset, will already be signaled by this time from amdgpu_fence_driver_force_completion.

If it should be, it means not only fences with DMA_FENCE_FLAG_USER_BITS but also vm flush fences should be removed from the RCU fence array before ib_test.

As I mentioned above, it is not clear to me why VM fences should get special treatment here.

Andrey

Which ib_test is it, the one after ASIC reset?

Yes

Best Regards,
JingWen Chen

Andrey

Then we must do the force_completion here for vm flush fence, otherwise the
Re: [PATCH] drm/amdgpu: Add msix restore for pass-through mode
[Public]

Acked-by: Alex Deucher

From: Chengzhe Liu
Sent: Thursday, July 22, 2021 1:49 AM
To: amd-gfx@lists.freedesktop.org
Cc: Tuikov, Luben; Koenig, Christian; Deucher, Alexander; Xiao, Jack; Zhang, Hawking; Xu, Feifei; Wang, Kevin(Yang); Liu, Cheng Zhe
Subject: [PATCH] drm/amdgpu: Add msix restore for pass-through mode

In pass-through mode, after a mode 1 reset, the msix enablement status would be lost and the device never receives interrupts again. So we should restore the msix status after a mode 1 reset.

Signed-off-by: Chengzhe Liu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 83af307e97cd..e1aa4a5e6a98 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -584,7 +584,7 @@ void amdgpu_irq_gpu_reset_resume_helper(struct amdgpu_device *adev)
 {
	int i, j, k;

-	if (amdgpu_sriov_vf(adev))
+	if (amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev))
		amdgpu_restore_msix(adev);

	for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
--
2.25.1
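A quick way to observe the symptom the patch describes from the pass-through host or guest (the BDF below is a hypothetical placeholder) is to inspect the device's MSI-X capability state:

```shell
# Illustrative only: substitute the GPU's actual bus/device/function.
# Before the fix, after a mode 1 reset, MSI-X typically reads "Enable-"
# and the driver stops receiving interrupts; after restore it reads
# "Enable+".
lspci -vvv -s 0000:03:00.0 | grep 'MSI-X'
```

This is a diagnostic aid, not part of the patch; the fix itself is the one-line condition change above, which extends the existing SR-IOV restore path to bare pass-through.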
Re: [PATCH] drm/amdgpu: Check pmops for desired suspend state
[AMD Official Use Only]

I sent a similar patch out a while ago, but never had a chance to follow up on it. The problem is that users might change the default.
https://www.spinics.net/lists/amd-gfx/msg60578.html

Alex

From: Vishwakarma, Pratik
Sent: Thursday, July 22, 2021 1:27 AM
To: Deucher, Alexander; amd-gfx@lists.freedesktop.org
Cc: Vishwakarma, Pratik
Subject: [PATCH] drm/amdgpu: Check pmops for desired suspend state

[Why]
The user might set mem_sleep to deep, which will result in amdgpu resume errors.

[How]
Check with pm for the default suspend state.

Signed-off-by: Pratik Vishwakarma
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index af1710971ff3..d92196429741 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1468,7 +1468,8 @@ static int amdgpu_pmops_suspend(struct device *dev)
	struct amdgpu_device *adev = drm_to_adev(drm_dev);
	int r;

-	if (amdgpu_acpi_is_s0ix_supported(adev))
+	if (amdgpu_acpi_is_s0ix_supported(adev)
+	    && pm_suspend_default_s2idle())
		adev->in_s0ix = true;

	adev->in_s3 = true;
	r = amdgpu_device_suspend(drm_dev, true);
--
2.25.1
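For context, the suspend mode this check depends on is user-visible (and user-writable) through sysfs; the bracketed entry is the currently selected one. The exact semantics of pm_suspend_default_s2idle() relative to this file depend on the kernel version, which is the crux of Alex's concern that users can change the setting:

```shell
# Show the available mem-sleep modes; the bracketed one is selected,
# e.g. "[s2idle] deep":
cat /sys/power/mem_sleep

# A user can switch it at runtime (root required), which is why the
# driver cannot assume the s0ix path will be taken:
echo deep > /sys/power/mem_sleep
```

If the user has selected deep (S3) sleep, entering the s0ix-specific suspend path in amdgpu leads to the resume errors the patch describes.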
Re: [PATCH 2/2] drm: add tdr support for embeded hw_fence
On Thu Jul 22, 2021 at 06:24:28PM +0200, Christian König wrote: > Am 22.07.21 um 16:45 schrieb Andrey Grodzovsky: > > > > On 2021-07-22 6:45 a.m., Jingwen Chen wrote: > > > On Wed Jul 21, 2021 at 12:53:51PM -0400, Andrey Grodzovsky wrote: > > > > On 2021-07-20 11:13 p.m., Jingwen Chen wrote: > > > > > [Why] > > > > > After embeded hw_fence to amdgpu_job, we need to add tdr support > > > > > for this feature. > > > > > > > > > > [How] > > > > > 1. Add a resubmit_flag for resubmit jobs. > > > > > 2. Clear job fence from RCU and force complete vm flush fences in > > > > > pre_asic_reset > > > > > 3. skip dma_fence_get for resubmit jobs and add a dma_fence_put > > > > > for guilty jobs. > > > > > > > > > > Signed-off-by: Jack Zhang > > > > > Signed-off-by: Jingwen Chen > > > > > --- > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++- > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 16 +++- > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 4 +++- > > > > > drivers/gpu/drm/scheduler/sched_main.c | 1 + > > > > > include/drm/gpu_scheduler.h | 1 + > > > > > 5 files changed, 27 insertions(+), 7 deletions(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > > > > index 40461547701a..fe0237f72a09 100644 > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > > > > @@ -4382,7 +4382,7 @@ int amdgpu_device_mode1_reset(struct > > > > > amdgpu_device *adev) > > > > > int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, > > > > > struct amdgpu_reset_context *reset_context) > > > > > { > > > > > - int i, r = 0; > > > > > + int i, j, r = 0; > > > > > struct amdgpu_job *job = NULL; > > > > > bool need_full_reset = > > > > > test_bit(AMDGPU_NEED_FULL_RESET, &reset_context->flags); > > > > > @@ -4406,6 +4406,16 @@ int > > > > > amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, > > > > > if (!ring || 
!ring->sched.thread) > > > > > continue; > > > > > + /*clear job fence from fence drv to avoid force_completion > > > > > + *leave NULL and vm flush fence in fence drv */ > > > > > + for (j = 0; j <= ring->fence_drv.num_fences_mask; j ++) { > > > > > + struct dma_fence *old,**ptr; > > > > > + ptr = &ring->fence_drv.fences[j]; > > > > > + old = rcu_dereference_protected(*ptr, 1); > > > > > + if (old && test_bit(DMA_FENCE_FLAG_USER_BITS, > > > > > &old->flags))) { > > > > > + RCU_INIT_POINTER(*ptr, NULL); > > > > > + } > > > > > > > > Is this to avoid premature job free because of dma_fence_put inside > > > > amdgpu_fence_process ? > > > > I can't currently remember why but we probably want all the HW fences > > > > currently in the ring to > > > > be forced signaled - maybe better to test for DMA_FENCE_FLAG_USER_BITS > > > > inside amdgpu_fence_process > > > > and still do the signaling but not the dma_fence_put part > > > > > > > > Andrey > > > Hi Andrey, > > > > > > This is to avoid signaling the same fence twice. If we still do the > > > signaling, then the job in the pending list will be signaled first in > > > force_completion, and later be signaled in resubmit. This will go to > > > BUG() in amdgpu_fence_process. > > > > > > Oh, i see, how about just adding 'skip' flag to amdgpu_ring and setting > > it before calling > > amdgpu_fence_driver_force_completion and resetting it after, then inside > > amdgpu_fence_driver_force_completion > > you can just skip the signaling part with this flag for fences with > > DMA_FENCE_FLAG_USER_BITS set > > Less lines of code at least. > > Still sounds quite a bit hacky. > > I would rather suggest to completely drop the approach with > amdgpu_fence_driver_force_completion(). I could never see why we would want > that in the first place. > > Regards, > Christian. > Hi Christian, I keep the amdgpu_fence_driver_force_completion here to make sure the vm flush fence is signaled and put. 
So the key question is whether the fence of the ib test should be the first fence to be signaled after reset. If it should be, it means not only fences with DMA_FENCE_FLAG_USER_BITS but also vm flush fences should be removed from the RCU fence array before ib_test. Then we must do the force_completion here for the vm flush fence, otherwise there will be a memory leak as no one will signal and put it after we remove it from RCU. If not, then the first fence to signal could be the vm flush fence. And I'm OK with dropping amdgpu_fence_driver_force_completion here. Best Regards, JingWen Chen > > > > Andrey > > > > > > > > > > > > + } > > > > > /* after all hw jobs are reset, hw fence is > > > > > meaningless, so force_completion */ > > > > > amdgpu_fence_driver_
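The behavior being debated in this thread — clear job fences (those with DMA_FENCE_FLAG_USER_BITS set) out of the ring's fence array first, then force-complete only what remains (the vm flush fences) — can be sketched as a userspace toy model. All types, names, and flags below are illustrative stand-ins, not the real amdgpu/dma_fence structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_FENCES 8
#define FLAG_USER_BITS (1u << 0)   /* stand-in for DMA_FENCE_FLAG_USER_BITS */

struct toy_fence {
	unsigned int flags;
	bool signaled;
};

/* Drop job-owned fences (USER_BITS set) from the ring's fence array so a
 * later force-completion pass only touches what is left (vm flush fences).
 * Models the RCU_INIT_POINTER(*ptr, NULL) loop from the patch. */
static int prune_job_fences(struct toy_fence *fences[], size_t n)
{
	int cleared = 0;
	size_t j;

	for (j = 0; j < n; j++) {
		struct toy_fence *old = fences[j];

		if (old && (old->flags & FLAG_USER_BITS)) {
			fences[j] = NULL;
			cleared++;
		}
	}
	return cleared;
}

/* Force-signal whatever remains in the array; returns how many fences
 * actually transitioned to signaled. Signaling an already-signaled fence
 * a second time is exactly the double-signal problem the pruning avoids. */
static int force_completion(struct toy_fence *fences[], size_t n)
{
	int signaled = 0;
	size_t j;

	for (j = 0; j < n; j++) {
		if (fences[j] && !fences[j]->signaled) {
			fences[j]->signaled = true;
			signaled++;
		}
	}
	return signaled;
}
```

In this model the pruned job fences are never touched by force_completion, leaving them for the resubmit path to signal exactly once.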
Re: [PATCH 2/2] drm: add tdr support for embedded hw_fence
On Thu Jul 22, 2021 at 10:45:40AM -0400, Andrey Grodzovsky wrote: > > On 2021-07-22 6:45 a.m., Jingwen Chen wrote: > > On Wed Jul 21, 2021 at 12:53:51PM -0400, Andrey Grodzovsky wrote: > > > On 2021-07-20 11:13 p.m., Jingwen Chen wrote: > > > > [Why] > > > > After embeded hw_fence to amdgpu_job, we need to add tdr support > > > > for this feature. > > > > > > > > [How] > > > > 1. Add a resubmit_flag for resubmit jobs. > > > > 2. Clear job fence from RCU and force complete vm flush fences in > > > > pre_asic_reset > > > > 3. skip dma_fence_get for resubmit jobs and add a dma_fence_put > > > > for guilty jobs. > > > > > > > > Signed-off-by: Jack Zhang > > > > Signed-off-by: Jingwen Chen > > > > --- > > > >drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++- > > > >drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 16 +++- > > > >drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 4 +++- > > > >drivers/gpu/drm/scheduler/sched_main.c | 1 + > > > >include/drm/gpu_scheduler.h| 1 + > > > >5 files changed, 27 insertions(+), 7 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > > > index 40461547701a..fe0237f72a09 100644 > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > > > @@ -4382,7 +4382,7 @@ int amdgpu_device_mode1_reset(struct > > > > amdgpu_device *adev) > > > >int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, > > > > struct amdgpu_reset_context > > > > *reset_context) > > > >{ > > > > - int i, r = 0; > > > > + int i, j, r = 0; > > > > struct amdgpu_job *job = NULL; > > > > bool need_full_reset = > > > > test_bit(AMDGPU_NEED_FULL_RESET, &reset_context->flags); > > > > @@ -4406,6 +4406,16 @@ int amdgpu_device_pre_asic_reset(struct > > > > amdgpu_device *adev, > > > > if (!ring || !ring->sched.thread) > > > > continue; > > > > + /*clear job fence from fence drv to avoid > > > > force_completion > > > > +*leave NULL and vm 
flush fence in fence drv */ > > > > + for (j = 0; j <= ring->fence_drv.num_fences_mask; j ++) > > > > { > > > > + struct dma_fence *old,**ptr; > > > > + ptr = &ring->fence_drv.fences[j]; > > > > + old = rcu_dereference_protected(*ptr, 1); > > > > + if (old && test_bit(DMA_FENCE_FLAG_USER_BITS, > > > > &old->flags))) { > > > > + RCU_INIT_POINTER(*ptr, NULL); > > > > + } > > > > > > Is this to avoid premature job free because of dma_fence_put inside > > > amdgpu_fence_process ? > > > I can't currently remember why but we probably want all the HW fences > > > currently in the ring to > > > be forced signaled - maybe better to test for DMA_FENCE_FLAG_USER_BITS > > > inside amdgpu_fence_process > > > and still do the signaling but not the dma_fence_put part > > > > > > Andrey > > Hi Andrey, > > > > This is to avoid signaling the same fence twice. If we still do the > > signaling, then the job in the pending list will be signaled first in > > force_completion, and later be signaled in resubmit. This will go to > > BUG() in amdgpu_fence_process. > > > Oh, i see, how about just adding 'skip' flag to amdgpu_ring and setting it > before calling > amdgpu_fence_driver_force_completion and resetting it after, then inside > amdgpu_fence_driver_force_completion > you can just skip the signaling part with this flag for fences with > DMA_FENCE_FLAG_USER_BITS set > Less lines of code at least. > > Andrey Hi Andrey, In this way, this issue still exists. If we just skip it in the force_completion, these fences are still in the RCU fence array. So when the FLR finishes, the eop interrupt function of ib_test will call amdgpu_fence_process, the skipped fences will be signaled along with ib_test fences and signaled again during resubmission. 
Best Regards, JingWen Chen > > > > > > > + } > > > > /* after all hw jobs are reset, hw fence is > > > > meaningless, so force_completion */ > > > > amdgpu_fence_driver_force_completion(ring); > > > > } > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > > > > index eecf21d8ec33..815776c9a013 100644 > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > > > > @@ -156,11 +156,17 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, > > > > struct dma_fence **f, struct amd > > > > job->ring = ring; > > > > } > > > > - seq = ++ring->fence_drv.sync_seq; > > > > - dma_fence_init(fence, &amdgpu_fence_ops, > > > > -
Re: [PATCH 2/2] drm: add tdr support for embedded hw_fence
On 2021-07-22 6:45 a.m., Jingwen Chen wrote: On Wed Jul 21, 2021 at 12:53:51PM -0400, Andrey Grodzovsky wrote: On 2021-07-20 11:13 p.m., Jingwen Chen wrote: [Why] After embeded hw_fence to amdgpu_job, we need to add tdr support for this feature. [How] 1. Add a resubmit_flag for resubmit jobs. 2. Clear job fence from RCU and force complete vm flush fences in pre_asic_reset 3. skip dma_fence_get for resubmit jobs and add a dma_fence_put for guilty jobs. Signed-off-by: Jack Zhang Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 16 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 4 +++- drivers/gpu/drm/scheduler/sched_main.c | 1 + include/drm/gpu_scheduler.h| 1 + 5 files changed, 27 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 40461547701a..fe0237f72a09 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4382,7 +4382,7 @@ int amdgpu_device_mode1_reset(struct amdgpu_device *adev) int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, struct amdgpu_reset_context *reset_context) { - int i, r = 0; + int i, j, r = 0; struct amdgpu_job *job = NULL; bool need_full_reset = test_bit(AMDGPU_NEED_FULL_RESET, &reset_context->flags); @@ -4406,6 +4406,16 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, if (!ring || !ring->sched.thread) continue; + /*clear job fence from fence drv to avoid force_completion +*leave NULL and vm flush fence in fence drv */ + for (j = 0; j <= ring->fence_drv.num_fences_mask; j ++) { + struct dma_fence *old,**ptr; + ptr = &ring->fence_drv.fences[j]; + old = rcu_dereference_protected(*ptr, 1); + if (old && test_bit(DMA_FENCE_FLAG_USER_BITS, &old->flags))) { + RCU_INIT_POINTER(*ptr, NULL); + } Is this to avoid premature job free because of dma_fence_put inside amdgpu_fence_process ? 
I can't currently remember why but we probably want all the HW fences currently in the ring to be forced signaled - maybe better to test for DMA_FENCE_FLAG_USER_BITS inside amdgpu_fence_process and still do the signaling but not the dma_fence_put part Andrey Hi Andrey, This is to avoid signaling the same fence twice. If we still do the signaling, then the job in the pending list will be signaled first in force_completion, and later be signaled in resubmit. This will go to BUG() in amdgpu_fence_process. Oh, i see, how about just adding 'skip' flag to amdgpu_ring and setting it before calling amdgpu_fence_driver_force_completion and resetting it after, then inside amdgpu_fence_driver_force_completion you can just skip the signaling part with this flag for fences with DMA_FENCE_FLAG_USER_BITS set Less lines of code at least. Andrey + } /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ amdgpu_fence_driver_force_completion(ring); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index eecf21d8ec33..815776c9a013 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -156,11 +156,17 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, struct amd job->ring = ring; } - seq = ++ring->fence_drv.sync_seq; - dma_fence_init(fence, &amdgpu_fence_ops, - &ring->fence_drv.lock, - adev->fence_context + ring->idx, - seq); + if (job != NULL && job->base.resubmit_flag == 1) { + /* reinit seq for resubmitted jobs */ + seq = ++ring->fence_drv.sync_seq; + fence->seqno = seq; + } else { + seq = ++ring->fence_drv.sync_seq; Seems like you could do the above line only once above if-else as it was before Sure, I will modify this. 
Best Regards, JingWen Chen + dma_fence_init(fence, &amdgpu_fence_ops, + &ring->fence_drv.lock, + adev->fence_context + ring->idx, + seq); + } if (job != NULL) { /* mark this fence has a parent job */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index 7c426e225b24..d6f848adc3f4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -241,6 +241,7 @@ static struct dma_fence *
Re: [PATCH] drm/amdgpu: fix build error
Please push your patch now that it's been reviewed. Regards, Luben On 2021-07-22 10:22 a.m., Chen, Guchun wrote: > [Public] > > I guess your branch does not have below patch: > > 7afefb81b72c drm/amdgpu: Rename flag which prevents HW access > > Regards, > Guchun > > -Original Message- > From: Tuikov, Luben > Sent: Thursday, July 22, 2021 10:14 PM > To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; Zhang, > Hawking ; Deucher, Alexander > > Subject: Re: [PATCH] drm/amdgpu: fix build error > > On 2021-07-22 5:25 a.m., Guchun Chen wrote: >> drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c: In function >> 'smu_cmn_send_msg_without_waiting': >> drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c:259:15: error: 'struct >> amdgpu_device' has no member named 'in_pci_err_recovery' >>if (smu->adev->in_pci_err_recovery) >>^~ >> CC [M] drivers/staging/rtl8192u/ieee80211/ieee80211_tx.o >> scripts/Makefile.build:272: recipe for target >> 'drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.o' failed >> make[7]: *** [drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.o] Error 1 >> scripts/Makefile.build:515: recipe for target 'drivers/gpu/drm/amd/amdgpu' >> failed >> make[6]: *** [drivers/gpu/drm/amd/amdgpu] Error 2 >> make[6]: *** Waiting for unfinished jobs >> >> Fixes: e070ba49f3a7 drm/amd/pm: Fix a bug communicating with the SMU (v5) >> >> Signed-off-by: Guchun Chen > Generally, there is no empty lines between tags. > > Not sure what happened here? Rename? This compiled and worked fine on my > machine. 
> > Reviewed-by: Luben Tuikov > > Regards, > Luben > >> --- >> drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c >> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c >> index a48f18932357..a0e2111eb783 100644 >> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c >> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c >> @@ -256,7 +256,7 @@ int smu_cmn_send_msg_without_waiting(struct smu_context >> *smu, >> u32 reg; >> int res; >> >> -if (smu->adev->in_pci_err_recovery) >> +if (smu->adev->no_hw_access) >> return 0; >> >> mutex_lock(&smu->message_lock); ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH] drm/amdgpu: fix build error
[Public] I guess your branch does not have below patch: 7afefb81b72c drm/amdgpu: Rename flag which prevents HW access Regards, Guchun -Original Message- From: Tuikov, Luben Sent: Thursday, July 22, 2021 10:14 PM To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; Zhang, Hawking ; Deucher, Alexander Subject: Re: [PATCH] drm/amdgpu: fix build error On 2021-07-22 5:25 a.m., Guchun Chen wrote: > drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c: In function > 'smu_cmn_send_msg_without_waiting': > drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c:259:15: error: 'struct > amdgpu_device' has no member named 'in_pci_err_recovery' >if (smu->adev->in_pci_err_recovery) >^~ > CC [M] drivers/staging/rtl8192u/ieee80211/ieee80211_tx.o > scripts/Makefile.build:272: recipe for target > 'drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.o' failed > make[7]: *** [drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.o] Error 1 > scripts/Makefile.build:515: recipe for target 'drivers/gpu/drm/amd/amdgpu' > failed > make[6]: *** [drivers/gpu/drm/amd/amdgpu] Error 2 > make[6]: *** Waiting for unfinished jobs > > Fixes: e070ba49f3a7 drm/amd/pm: Fix a bug communicating with the SMU (v5) > > Signed-off-by: Guchun Chen Generally, there is no empty lines between tags. Not sure what happened here? Rename? This compiled and worked fine on my machine. 
Reviewed-by: Luben Tuikov Regards, Luben > --- > drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > index a48f18932357..a0e2111eb783 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > @@ -256,7 +256,7 @@ int smu_cmn_send_msg_without_waiting(struct smu_context > *smu, > u32 reg; > int res; > > - if (smu->adev->in_pci_err_recovery) > + if (smu->adev->no_hw_access) > return 0; > > mutex_lock(&smu->message_lock); ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: fix build error
On 2021-07-22 5:25 a.m., Guchun Chen wrote: > drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c: In function > 'smu_cmn_send_msg_without_waiting': > drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c:259:15: error: 'struct > amdgpu_device' has no member named 'in_pci_err_recovery' >if (smu->adev->in_pci_err_recovery) >^~ > CC [M] drivers/staging/rtl8192u/ieee80211/ieee80211_tx.o > scripts/Makefile.build:272: recipe for target > 'drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.o' failed > make[7]: *** [drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.o] Error 1 > scripts/Makefile.build:515: recipe for target 'drivers/gpu/drm/amd/amdgpu' > failed > make[6]: *** [drivers/gpu/drm/amd/amdgpu] Error 2 > make[6]: *** Waiting for unfinished jobs > > Fixes: e070ba49f3a7 drm/amd/pm: Fix a bug communicating with the SMU (v5) > > Signed-off-by: Guchun Chen Generally, there is no empty lines between tags. Not sure what happened here? Rename? This compiled and worked fine on my machine. Reviewed-by: Luben Tuikov Regards, Luben > --- > drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > index a48f18932357..a0e2111eb783 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > @@ -256,7 +256,7 @@ int smu_cmn_send_msg_without_waiting(struct smu_context > *smu, > u32 reg; > int res; > > - if (smu->adev->in_pci_err_recovery) > + if (smu->adev->no_hw_access) > return 0; > > mutex_lock(&smu->message_lock); ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v4 10/13] lib: test_hmm add module param for zone device type
On Sat, Jul 17, 2021 at 02:21:32PM -0500, Alex Sierra wrote: > In order to configure device generic in test_hmm, two > module parameters should be passed, which correspond to the > SP start address of each device (2) spm_addr_dev0 & > spm_addr_dev1. If no parameters are passed, private device > type is configured. I don't think tests should need configuration like this, is it really necessary? How can people with normal HW run this test? Jason
Re: ThinkPad T14 Gen 1 AMD switching to lowest brightness during boot process with 5.14-rc2
On Thu, Jul 22, 2021 at 10:04 AM Martin Steigerwald wrote: > > Not subscribed, so please Cc on answer. > > Hi! > > Just compiled 5.14-rc2 for my T14 Gen 1 AMD with "AMD Ryzen 7 PRO 4750U > with Radeon Graphics" with Low Power Display (400 Nits). > > Using that kernel, the display starts with the usual high brightness. > However, during boot the brightness goes down to what appears to be the > minimum. In KDE Plasma however the brightness of the display is shown as > maximum. > > If I move around the brightness slider just a few percent down from 100 > to maybe 98% or something like that the brightness is immediately reset > to its actual value display by the slider. > > This did not happen with 5.13. Please file a ticket here: https://gitlab.freedesktop.org/drm/amd/-/issues Can you bisect? Alex > > I am still on X11 as some KDE Plasma features are not yet available for > Wayland. > > This is on Devuan Ceres – basically Debian Sid without Systemd – with > Runit as init system, so there no automatic Systemd brightness logic > should be involved. > > If attached kernel configuration is stripped, please let me know. > > Best, > -- > Martin___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 2/2] drm: add tdr support for embedded hw_fence
Am 22.07.21 um 16:45 schrieb Andrey Grodzovsky: On 2021-07-22 6:45 a.m., Jingwen Chen wrote: On Wed Jul 21, 2021 at 12:53:51PM -0400, Andrey Grodzovsky wrote: On 2021-07-20 11:13 p.m., Jingwen Chen wrote: [Why] After embeded hw_fence to amdgpu_job, we need to add tdr support for this feature. [How] 1. Add a resubmit_flag for resubmit jobs. 2. Clear job fence from RCU and force complete vm flush fences in pre_asic_reset 3. skip dma_fence_get for resubmit jobs and add a dma_fence_put for guilty jobs. Signed-off-by: Jack Zhang Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 16 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 4 +++- drivers/gpu/drm/scheduler/sched_main.c | 1 + include/drm/gpu_scheduler.h | 1 + 5 files changed, 27 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 40461547701a..fe0237f72a09 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4382,7 +4382,7 @@ int amdgpu_device_mode1_reset(struct amdgpu_device *adev) int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, struct amdgpu_reset_context *reset_context) { - int i, r = 0; + int i, j, r = 0; struct amdgpu_job *job = NULL; bool need_full_reset = test_bit(AMDGPU_NEED_FULL_RESET, &reset_context->flags); @@ -4406,6 +4406,16 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, if (!ring || !ring->sched.thread) continue; + /*clear job fence from fence drv to avoid force_completion + *leave NULL and vm flush fence in fence drv */ + for (j = 0; j <= ring->fence_drv.num_fences_mask; j ++) { + struct dma_fence *old,**ptr; + ptr = &ring->fence_drv.fences[j]; + old = rcu_dereference_protected(*ptr, 1); + if (old && test_bit(DMA_FENCE_FLAG_USER_BITS, &old->flags))) { + RCU_INIT_POINTER(*ptr, NULL); + } Is this to avoid premature job free because of dma_fence_put inside 
amdgpu_fence_process ? I can't currently remember why but we probably want all the HW fences currently in the ring to be forced signaled - maybe better to test for DMA_FENCE_FLAG_USER_BITS inside amdgpu_fence_process and still do the signaling but not the dma_fence_put part Andrey Hi Andrey, This is to avoid signaling the same fence twice. If we still do the signaling, then the job in the pending list will be signaled first in force_completion, and later be signaled in resubmit. This will go to BUG() in amdgpu_fence_process. Oh, i see, how about just adding 'skip' flag to amdgpu_ring and setting it before calling amdgpu_fence_driver_force_completion and resetting it after, then inside amdgpu_fence_driver_force_completion you can just skip the signaling part with this flag for fences with DMA_FENCE_FLAG_USER_BITS set Less lines of code at least. Still sounds quite a bit hacky. I would rather suggest to completely drop the approach with amdgpu_fence_driver_force_completion(). I could never see why we would want that in the first place. Regards, Christian. 
Andrey + } /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ amdgpu_fence_driver_force_completion(ring); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index eecf21d8ec33..815776c9a013 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -156,11 +156,17 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, struct amd job->ring = ring; } - seq = ++ring->fence_drv.sync_seq; - dma_fence_init(fence, &amdgpu_fence_ops, - &ring->fence_drv.lock, - adev->fence_context + ring->idx, - seq); + if (job != NULL && job->base.resubmit_flag == 1) { + /* reinit seq for resubmitted jobs */ + seq = ++ring->fence_drv.sync_seq; + fence->seqno = seq; + } else { + seq = ++ring->fence_drv.sync_seq; Seems like you could do the above line only once above if-else as it was before Sure, I will modify this. Best Regards, JingWen Chen + dma_fence_init(fence, &amdgpu_fence_ops, + &ring->fence_drv.lock, + adev->fence_context + ring->idx, + seq); + } if (job != NULL) { /* mark this fence has a parent job */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index 7c426e225b24..d6f848adc3f4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -241,6 +241,7 @@ static struct dma_fen
Re: [PATCH 2/2] drm: add tdr support for embedded hw_fence
On Wed Jul 21, 2021 at 12:53:51PM -0400, Andrey Grodzovsky wrote: > > On 2021-07-20 11:13 p.m., Jingwen Chen wrote: > > [Why] > > After embeded hw_fence to amdgpu_job, we need to add tdr support > > for this feature. > > > > [How] > > 1. Add a resubmit_flag for resubmit jobs. > > 2. Clear job fence from RCU and force complete vm flush fences in > > pre_asic_reset > > 3. skip dma_fence_get for resubmit jobs and add a dma_fence_put > > for guilty jobs. > > > > Signed-off-by: Jack Zhang > > Signed-off-by: Jingwen Chen > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++- > > drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 16 +++- > > drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 4 +++- > > drivers/gpu/drm/scheduler/sched_main.c | 1 + > > include/drm/gpu_scheduler.h| 1 + > > 5 files changed, 27 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > index 40461547701a..fe0237f72a09 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > @@ -4382,7 +4382,7 @@ int amdgpu_device_mode1_reset(struct amdgpu_device > > *adev) > > int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, > > struct amdgpu_reset_context *reset_context) > > { > > - int i, r = 0; > > + int i, j, r = 0; > > struct amdgpu_job *job = NULL; > > bool need_full_reset = > > test_bit(AMDGPU_NEED_FULL_RESET, &reset_context->flags); > > @@ -4406,6 +4406,16 @@ int amdgpu_device_pre_asic_reset(struct > > amdgpu_device *adev, > > if (!ring || !ring->sched.thread) > > continue; > > + /*clear job fence from fence drv to avoid force_completion > > +*leave NULL and vm flush fence in fence drv */ > > + for (j = 0; j <= ring->fence_drv.num_fences_mask; j ++) { > > + struct dma_fence *old,**ptr; > > + ptr = &ring->fence_drv.fences[j]; > > + old = rcu_dereference_protected(*ptr, 1); > > + if (old && test_bit(DMA_FENCE_FLAG_USER_BITS, > > &old->flags))) { > > + 
RCU_INIT_POINTER(*ptr, NULL); > > + } > > > Is this to avoid premature job free because of dma_fence_put inside > amdgpu_fence_process ? > I can't currently remember why but we probably want all the HW fences > currently in the ring to > be forced signaled - maybe better to test for DMA_FENCE_FLAG_USER_BITS > inside amdgpu_fence_process > and still do the signaling but not the dma_fence_put part > > Andrey Hi Andrey, This is to avoid signaling the same fence twice. If we still do the signaling, then the job in the pending list will be signaled first in force_completion, and later be signaled in resubmit. This will go to BUG() in amdgpu_fence_process. > > > + } > > /* after all hw jobs are reset, hw fence is meaningless, so > > force_completion */ > > amdgpu_fence_driver_force_completion(ring); > > } > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > > index eecf21d8ec33..815776c9a013 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > > @@ -156,11 +156,17 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, > > struct dma_fence **f, struct amd > > job->ring = ring; > > } > > - seq = ++ring->fence_drv.sync_seq; > > - dma_fence_init(fence, &amdgpu_fence_ops, > > - &ring->fence_drv.lock, > > - adev->fence_context + ring->idx, > > - seq); > > + if (job != NULL && job->base.resubmit_flag == 1) { > > + /* reinit seq for resubmitted jobs */ > > + seq = ++ring->fence_drv.sync_seq; > > + fence->seqno = seq; > > + } else { > > + seq = ++ring->fence_drv.sync_seq; > > > Seems like you could do the above line only once above if-else as it was > before Sure, I will modify this. 
Best Regards, JingWen Chen > > > + dma_fence_init(fence, &amdgpu_fence_ops, > > + &ring->fence_drv.lock, > > + adev->fence_context + ring->idx, > > + seq); > > + } > > if (job != NULL) { > > /* mark this fence has a parent job */ > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > > index 7c426e225b24..d6f848adc3f4 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > > @@ -241,6 +241,7 @@ static struct dma_fence *amdgpu_job_run(struct > > drm_sched_job *sched_job) > > dma_fence_set_error(finished, -ECANCELED);/* skip IB as well if > > VRAM lost */ > > if (finished->erro
Re: drm/amdgpu: Check pmops for desired suspend state
On Thu, Jul 22, 2021 at 10:57:14AM +0530, Pratik Vishwakarma wrote: > [Why] > User might set mem_sleep as deep and it will result > in amdgpu resume errors. > > [How] > Check with pm for default suspend state > > Signed-off-by: Pratik Vishwakarma > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > index af1710971ff3..d92196429741 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > @@ -1468,7 +1468,8 @@ static int amdgpu_pmops_suspend(struct device *dev) > struct amdgpu_device *adev = drm_to_adev(drm_dev); > int r; > > - if (amdgpu_acpi_is_s0ix_supported(adev)) > + if (amdgpu_acpi_is_s0ix_supported(adev) > + && pm_suspend_default_s2idle()) Why pm_suspend_default_s2idle instead of pm_suspend_via_firmware? > adev->in_s0ix = true; > adev->in_s3 = true; > r = amdgpu_device_suspend(drm_dev, true); ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
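The shape of the check Pratik is proposing — gate s0ix entry on both hardware support and the platform's chosen mem_sleep state — reduces to a small decision, modeled here in userspace with toy enums and fields (the real driver uses amdgpu_acpi_is_s0ix_supported() and pm_suspend_default_s2idle()):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the kernel's mem_sleep choices; not the real pm API. */
enum toy_mem_sleep { TOY_S2IDLE, TOY_DEEP };

struct toy_adev {
	bool s0ix_supported;   /* models amdgpu_acpi_is_s0ix_supported() */
	bool in_s0ix;
	bool in_s3;
};

/* Shape of the patched amdgpu_pmops_suspend(): enter s0ix only when the
 * hardware supports it AND the default suspend state is s2idle; a
 * user-selected "deep" mem_sleep falls back to a plain S3 suspend. */
static void toy_pmops_suspend(struct toy_adev *adev, enum toy_mem_sleep state)
{
	adev->in_s0ix = adev->s0ix_supported && state == TOY_S2IDLE;
	adev->in_s3 = true;
}
```

With this shape, a user writing "deep" to /sys/power/mem_sleep no longer leaves the driver in the s0ix path it cannot resume from.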
ThinkPad T14 Gen 1 AMD switching to lowest brightness during boot process with 5.14-rc2
Not subscribed, so please Cc on answer. Hi! Just compiled 5.14-rc2 for my T14 Gen 1 AMD with "AMD Ryzen 7 PRO 4750U with Radeon Graphics" with Low Power Display (400 Nits). Using that kernel, the display starts with the usual high brightness. However, during boot the brightness goes down to what appears to be the minimum. In KDE Plasma however the brightness of the display is shown as maximum. If I move around the brightness slider just a few percent down from 100 to maybe 98% or something like that the brightness is immediately reset to its actual value display by the slider. This did not happen with 5.13. I am still on X11 as some KDE Plasma features are not yet available for Wayland. This is on Devuan Ceres – basically Debian Sid without Systemd – with Runit as init system, so there no automatic Systemd brightness logic should be involved. If attached kernel configuration is stripped, please let me know. Best, -- Martin config-5.14.0-rc2-t14.xz Description: application/xz ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu: fix build error
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c: In function 'smu_cmn_send_msg_without_waiting': drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c:259:15: error: 'struct amdgpu_device' has no member named 'in_pci_err_recovery' if (smu->adev->in_pci_err_recovery) ^~ CC [M] drivers/staging/rtl8192u/ieee80211/ieee80211_tx.o scripts/Makefile.build:272: recipe for target 'drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.o' failed make[7]: *** [drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.o] Error 1 scripts/Makefile.build:515: recipe for target 'drivers/gpu/drm/amd/amdgpu' failed make[6]: *** [drivers/gpu/drm/amd/amdgpu] Error 2 make[6]: *** Waiting for unfinished jobs Fixes: e070ba49f3a7 drm/amd/pm: Fix a bug communicating with the SMU (v5) Signed-off-by: Guchun Chen --- drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c index a48f18932357..a0e2111eb783 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c @@ -256,7 +256,7 @@ int smu_cmn_send_msg_without_waiting(struct smu_context *smu, u32 reg; int res; - if (smu->adev->in_pci_err_recovery) + if (smu->adev->no_hw_access) return 0; mutex_lock(&smu->message_lock); -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 1/2] drm/amd/pm: restore user customized OD settings properly for NV1x
On 7/22/2021 2:03 PM, Quan, Evan wrote: [AMD Official Use Only] -Original Message- From: Lazar, Lijo Sent: Thursday, July 22, 2021 4:10 PM To: Quan, Evan ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: Re: [PATCH 1/2] drm/amd/pm: restore user customized OD settings properly for NV1x On 7/22/2021 8:50 AM, Evan Quan wrote: The customized OD settings can be divided into two parts: those committed ones and non-committed ones. - For those changes which had been fed to SMU before S3/S4/Runpm suspend kicked, they are committed changes. They should be properly restored and fed to SMU on S3/S4/Runpm resume. - For those non-committed changes, they are restored only without feeding to SMU. Change-Id: Iea7cf7908dfcd919a4d0205e10bff91b1149a440 Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h | 8 +++ drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 9 .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 52 ++- .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c| 12 + 4 files changed, 69 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h index 3e89852e4820..8ba53f16d2d9 100644 --- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h +++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h @@ -231,6 +231,7 @@ struct smu_user_dpm_profile { uint32_t power_limit; uint32_t fan_speed_percent; uint32_t flags; + uint32_t committed_od; /* user clock state information */ uint32_t clk_mask[SMU_CLK_COUNT]; @@ -352,6 +353,7 @@ struct smu_table_context void*overdrive_table; void*boot_overdrive_table; + void*committed_overdrive_table; May be rename it to user_overdrive_table - OD table with user settings? [Quan, Evan] Actually "overdrive_table " is the user_overdrive_table. It contains all the modification including those not committed( the commit is performed by echo "c" > /sys/class/drm/card1/device/pp_od_clk_voltage). The new member added refers to those user customized but committed only settings. 
That's why it was named "committed_overdrive_table". Any good suggestion for the naming? As far as I understand, the problem is that overdrive_table can have intermediary settings, as the edit/commit is a two-step process. At any point we can have user_od_table keep the latest user mode settings. That is also true when the user restores the default settings; at that point we can just copy the boot settings back to the user_od table and keep the flag false, indicating that there are no custom settings. uint32_tgpu_metrics_table_size; void*gpu_metrics_table; @@ -623,6 +625,12 @@ struct pptable_funcs { enum PP_OD_DPM_TABLE_COMMAND type, long *input, uint32_t size); + /** +* @restore_committed_od_settings: Restore the customized and committed +* OD settings on S3/S4/Runpm resume. +*/ + int (*restore_committed_od_settings)(struct smu_context *smu); + /** * @get_clock_by_type_with_latency: Get the speed and latency of a clock * domain. diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c index ebe672142808..5f7d98e99f76 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c @@ -416,6 +416,15 @@ static void smu_restore_dpm_user_profile(struct smu_context *smu) } } + /* Restore customized and committed OD settings */ + if (smu->user_dpm_profile.committed_od) { + if (smu->ppt_funcs->restore_committed_od_settings) { + ret = smu->ppt_funcs->restore_committed_od_settings(smu); + if (ret) + dev_err(smu->adev->dev, "Failed to upload customized OD settings\n"); + } + } + Just thinking if there is a need to handle it ASIC specific. The flags and table pointer are maintained in a common structure. So can't we do the restore of user OD settings directly in this common flow instead of having each ASIC implement the callback? [Quan, Evan] The OD settings restoring is ASIC specific, as it is performed on the OD table and that (OD table) is ASIC specific. I was thinking in terms of the final logic that is being done.
The below structures are available in the common table context and we have common logic to update the table. If there are no custom OD settings available, we could handle it with the flag. + struct smu_table_context *table_context = &smu->smu_table; + void *od_table = table_context->committed_overdrive_table;
RE: [PATCH] drm/amdgpu: Change the imprecise output
[AMD Official Use Only] Reviewed-by: Peng Ju Zhou > -Original Message- > From: amd-gfx On Behalf Of Roy Sun > Sent: Thursday, July 22, 2021 3:14 PM > To: amd-gfx@lists.freedesktop.org > Cc: Sun, Roy > Subject: [PATCH] drm/amdgpu: Change the imprecise output > > The fail reason is that the vfgate is disabled > > Signed-off-by: Roy Sun > --- > drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c > b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c > index 9f3d82dfb79c..f94ef15b3166 100644 > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c > @@ -56,7 +56,7 @@ > #define GFX10_NUM_GFX_RINGS_Sienna_Cichlid 1 > #define GFX10_MEC_HPD_SIZE 2048 > > -#define RLCG_INTERFACE_NOT_ENABLED 0x400 > +#define RLCG_VFGATE_DISABLED 0x400 > #define RLCG_WRONG_OPERATION_TYPE0x200 > #define RLCG_NOT_IN_RANGE0x100 > > @@ -1571,8 +1571,8 @@ static u32 gfx_v10_rlcg_rw(struct amdgpu_device > *adev, u32 offset, u32 v, uint32 > > if (i >= retries) { > if (RLCG_ERROR_REPORT_ENABLED(adev)) { > - if (tmp & RLCG_INTERFACE_NOT_ENABLED) > - pr_err("The interface is not enabled, > program reg:0x%05x failed!\n", offset); > + if (tmp & RLCG_VFGATE_DISABLED) > + pr_err("The vfgate is disabled, > program reg:0x%05x failed!\n", offset); > else if (tmp & > RLCG_WRONG_OPERATION_TYPE) > pr_err("Wrong operation type, > program reg:0x%05x failed!\n", offset); > else if (tmp & RLCG_NOT_IN_RANGE) > -- > 2.32.0 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH 1/2] drm/amd/pm: restore user customized OD settings properly for NV1x
[AMD Official Use Only] > -Original Message- > From: Lazar, Lijo > Sent: Thursday, July 22, 2021 4:10 PM > To: Quan, Evan ; amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander > Subject: Re: [PATCH 1/2] drm/amd/pm: restore user customized OD settings > properly for NV1x > > > > On 7/22/2021 8:50 AM, Evan Quan wrote: > > The customized OD settings can be divided into two parts: those > > committed ones and non-committed ones. > >- For those changes which had been fed to SMU before S3/S4/Runpm > > suspend kicked, they are committed changes. They should be properly > > restored and fed to SMU on S3/S4/Runpm resume. > >- For those non-committed changes, they are restored only without > feeding > > to SMU. > > > > Change-Id: Iea7cf7908dfcd919a4d0205e10bff91b1149a440 > > Signed-off-by: Evan Quan > > --- > > drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h | 8 +++ > > drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 9 > > .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 52 > ++- > > .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c| 12 + > > 4 files changed, 69 insertions(+), 12 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h > > b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h > > index 3e89852e4820..8ba53f16d2d9 100644 > > --- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h > > +++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h > > @@ -231,6 +231,7 @@ struct smu_user_dpm_profile { > > uint32_t power_limit; > > uint32_t fan_speed_percent; > > uint32_t flags; > > + uint32_t committed_od; > > > > /* user clock state information */ > > uint32_t clk_mask[SMU_CLK_COUNT]; > > @@ -352,6 +353,7 @@ struct smu_table_context > > > > void*overdrive_table; > > void*boot_overdrive_table; > > + void*committed_overdrive_table; > > May be rename it to user_overdrive_table - OD table with user settings? [Quan, Evan] Actually "overdrive_table " is the user_overdrive_table. 
It contains all the modification including those not committed( the commit is performed by echo "c" > /sys/class/drm/card1/device/pp_od_clk_voltage). The new member added refers to those user customized but committed only settings. That's why it was named as " committed_overdrive_table". Any good suggestion for the naming? > > > uint32_tgpu_metrics_table_size; > > void*gpu_metrics_table; > > @@ -623,6 +625,12 @@ struct pptable_funcs { > > enum PP_OD_DPM_TABLE_COMMAND > type, > > long *input, uint32_t size); > > > > + /** > > +* @restore_committed_od_settings: Restore the customized and > committed > > +* OD settings on S3/S4/Runpm resume. > > +*/ > > + int (*restore_committed_od_settings)(struct smu_context *smu); > > + > > /** > > * @get_clock_by_type_with_latency: Get the speed and latency of > a clock > > * domain. > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > > b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > > index ebe672142808..5f7d98e99f76 100644 > > --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > > +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > > @@ -416,6 +416,15 @@ static void smu_restore_dpm_user_profile(struct > smu_context *smu) > > } > > } > > > > + /* Restore customized and committed OD settings */ > > + if (smu->user_dpm_profile.committed_od) { > > + if (smu->ppt_funcs->restore_committed_od_settings) { > > + ret = smu->ppt_funcs- > >restore_committed_od_settings(smu); > > + if (ret) > > + dev_err(smu->adev->dev, "Failed to upload > customized OD settings\n"); > > + } > > + } > > + > > Just thinking if there is a need to handle it ASIC specific. The flags and > table > pointer are maintained in common structure. So can't we do the restore of > user OD settings directly in this common flow instead of having each ASIC to > implement the callback? [Quan, Evan] The OD settings restoring is ASIC specific as it performed on OD table and that(OD table) is ASIC specific. 
> > > /* Disable restore flag */ > > smu->user_dpm_profile.flags &= > ~SMU_DPM_USER_PROFILE_RESTORE; > > } > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c > > b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c > > index 59ea59acfb00..4752299d7f91 100644 > > --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c > > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c > > @@ -2296,39 +2296,45 @@ static int > navi10_set_default_od_settings(struct smu_context *smu) > > (OverDriveTable_t *)smu->smu_table.boot_overdrive_table; > > int ret = 0; > > > > - ret = smu_cmn_update_table(smu, SMU_TABLE_OVERDRIVE, 0, > (void *)
Re: [PATCH 1/2] drm/amd/pm: restore user customized OD settings properly for NV1x
On 7/22/2021 8:50 AM, Evan Quan wrote: The customized OD settings can be divided into two parts: those committed ones and non-committed ones. - For those changes which had been fed to SMU before S3/S4/Runpm suspend kicked, they are committed changes. They should be properly restored and fed to SMU on S3/S4/Runpm resume. - For those non-committed changes, they are restored only without feeding to SMU. Change-Id: Iea7cf7908dfcd919a4d0205e10bff91b1149a440 Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h | 8 +++ drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 9 .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 52 ++- .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c| 12 + 4 files changed, 69 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h index 3e89852e4820..8ba53f16d2d9 100644 --- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h +++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h @@ -231,6 +231,7 @@ struct smu_user_dpm_profile { uint32_t power_limit; uint32_t fan_speed_percent; uint32_t flags; + uint32_t committed_od; /* user clock state information */ uint32_t clk_mask[SMU_CLK_COUNT]; @@ -352,6 +353,7 @@ struct smu_table_context void*overdrive_table; void*boot_overdrive_table; + void*committed_overdrive_table; May be rename it to user_overdrive_table - OD table with user settings? uint32_tgpu_metrics_table_size; void*gpu_metrics_table; @@ -623,6 +625,12 @@ struct pptable_funcs { enum PP_OD_DPM_TABLE_COMMAND type, long *input, uint32_t size); + /** +* @restore_committed_od_settings: Restore the customized and committed +* OD settings on S3/S4/Runpm resume. +*/ + int (*restore_committed_od_settings)(struct smu_context *smu); + /** * @get_clock_by_type_with_latency: Get the speed and latency of a clock * domain. 
diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c index ebe672142808..5f7d98e99f76 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c @@ -416,6 +416,15 @@ static void smu_restore_dpm_user_profile(struct smu_context *smu) } } + /* Restore customized and committed OD settings */ + if (smu->user_dpm_profile.committed_od) { + if (smu->ppt_funcs->restore_committed_od_settings) { + ret = smu->ppt_funcs->restore_committed_od_settings(smu); + if (ret) + dev_err(smu->adev->dev, "Failed to upload customized OD settings\n"); + } + } + Just thinking if there is a need to handle it ASIC specific. The flags and table pointer are maintained in common structure. So can't we do the restore of user OD settings directly in this common flow instead of having each ASIC to implement the callback? /* Disable restore flag */ smu->user_dpm_profile.flags &= ~SMU_DPM_USER_PROFILE_RESTORE; } diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c index 59ea59acfb00..4752299d7f91 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c @@ -2296,39 +2296,45 @@ static int navi10_set_default_od_settings(struct smu_context *smu) (OverDriveTable_t *)smu->smu_table.boot_overdrive_table; int ret = 0; - ret = smu_cmn_update_table(smu, SMU_TABLE_OVERDRIVE, 0, (void *)od_table, false); + ret = smu_cmn_update_table(smu, SMU_TABLE_OVERDRIVE, 0, (void *)boot_od_table, false); if (ret) { dev_err(smu->adev->dev, "Failed to get overdrive table!\n"); return ret; } - if (!od_table->GfxclkVolt1) { + if (!boot_od_table->GfxclkVolt1) { ret = navi10_overdrive_get_gfx_clk_base_voltage(smu, - &od_table->GfxclkVolt1, - od_table->GfxclkFreq1); + &boot_od_table->GfxclkVolt1, + boot_od_table->GfxclkFreq1); if (ret) return ret; } - if (!od_table->GfxclkVolt2) { + if (!boot_od_table->GfxclkVolt2) { ret = 
navi10_overdrive_get_gfx_clk_base_voltage(smu, - &od_tabl
[PATCH] drm/amdgpu: Change the imprecise output
The failure reason is that the vfgate is disabled. Signed-off-by: Roy Sun --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index 9f3d82dfb79c..f94ef15b3166 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -56,7 +56,7 @@ #define GFX10_NUM_GFX_RINGS_Sienna_Cichlid 1 #define GFX10_MEC_HPD_SIZE 2048 -#define RLCG_INTERFACE_NOT_ENABLED 0x400 +#define RLCG_VFGATE_DISABLED 0x400 #define RLCG_WRONG_OPERATION_TYPE 0x200 #define RLCG_NOT_IN_RANGE 0x100 @@ -1571,8 +1571,8 @@ static u32 gfx_v10_rlcg_rw(struct amdgpu_device *adev, u32 offset, u32 v, uint32 if (i >= retries) { if (RLCG_ERROR_REPORT_ENABLED(adev)) { - if (tmp & RLCG_INTERFACE_NOT_ENABLED) - pr_err("The interface is not enabled, program reg:0x%05x failed!\n", offset); + if (tmp & RLCG_VFGATE_DISABLED) + pr_err("The vfgate is disabled, program reg:0x%05x failed!\n", offset); else if (tmp & RLCG_WRONG_OPERATION_TYPE) pr_err("Wrong operation type, program reg:0x%05x failed!\n", offset); else if (tmp & RLCG_NOT_IN_RANGE) -- 2.32.0
[PATCH] drm/amdgpu: Add msix restore for pass-through mode
In pass-through mode, after a mode 1 reset, the msix enablement status is lost and the device never receives interrupts again. So, we should restore the msix status after a mode 1 reset. Signed-off-by: Chengzhe Liu --- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c index 83af307e97cd..e1aa4a5e6a98 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c @@ -584,7 +584,7 @@ void amdgpu_irq_gpu_reset_resume_helper(struct amdgpu_device *adev) { int i, j, k; - if (amdgpu_sriov_vf(adev)) + if (amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev)) amdgpu_restore_msix(adev); for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) { -- 2.25.1
[PATCH] drm/amdgpu: Check pmops for desired suspend state
[Why] The user might set mem_sleep to deep, which results in amdgpu resume errors. [How] Check with pm for the default suspend state. Signed-off-by: Pratik Vishwakarma --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index af1710971ff3..d92196429741 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -1468,7 +1468,8 @@ static int amdgpu_pmops_suspend(struct device *dev) struct amdgpu_device *adev = drm_to_adev(drm_dev); int r; - if (amdgpu_acpi_is_s0ix_supported(adev)) + if (amdgpu_acpi_is_s0ix_supported(adev) + && pm_suspend_default_s2idle()) adev->in_s0ix = true; adev->in_s3 = true; r = amdgpu_device_suspend(drm_dev, true); -- 2.25.1
[PATCH 1/2] drm/amd/pm: restore user customized OD settings properly for NV1x
The customized OD settings can be divided into two parts: the committed ones and the non-committed ones. - For those changes which had been fed to the SMU before S3/S4/Runpm suspend kicked in, they are committed changes. They should be properly restored and fed to the SMU on S3/S4/Runpm resume. - For those non-committed changes, they are only restored without being fed to the SMU. Change-Id: Iea7cf7908dfcd919a4d0205e10bff91b1149a440 Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h | 8 +++ drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 9 .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 52 ++- .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c| 12 + 4 files changed, 69 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h index 3e89852e4820..8ba53f16d2d9 100644 --- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h +++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h @@ -231,6 +231,7 @@ struct smu_user_dpm_profile { uint32_t power_limit; uint32_t fan_speed_percent; uint32_t flags; + uint32_t committed_od; /* user clock state information */ uint32_t clk_mask[SMU_CLK_COUNT]; @@ -352,6 +353,7 @@ struct smu_table_context void*overdrive_table; void*boot_overdrive_table; + void*committed_overdrive_table; uint32_tgpu_metrics_table_size; void*gpu_metrics_table; @@ -623,6 +625,12 @@ struct pptable_funcs { enum PP_OD_DPM_TABLE_COMMAND type, long *input, uint32_t size); + /** +* @restore_committed_od_settings: Restore the customized and committed +* OD settings on S3/S4/Runpm resume. +*/ + int (*restore_committed_od_settings)(struct smu_context *smu); + /** * @get_clock_by_type_with_latency: Get the speed and latency of a clock * domain.
diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c index ebe672142808..5f7d98e99f76 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c @@ -416,6 +416,15 @@ static void smu_restore_dpm_user_profile(struct smu_context *smu) } } + /* Restore customized and committed OD settings */ + if (smu->user_dpm_profile.committed_od) { + if (smu->ppt_funcs->restore_committed_od_settings) { + ret = smu->ppt_funcs->restore_committed_od_settings(smu); + if (ret) + dev_err(smu->adev->dev, "Failed to upload customized OD settings\n"); + } + } + /* Disable restore flag */ smu->user_dpm_profile.flags &= ~SMU_DPM_USER_PROFILE_RESTORE; } diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c index 59ea59acfb00..4752299d7f91 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c @@ -2296,39 +2296,45 @@ static int navi10_set_default_od_settings(struct smu_context *smu) (OverDriveTable_t *)smu->smu_table.boot_overdrive_table; int ret = 0; - ret = smu_cmn_update_table(smu, SMU_TABLE_OVERDRIVE, 0, (void *)od_table, false); + ret = smu_cmn_update_table(smu, SMU_TABLE_OVERDRIVE, 0, (void *)boot_od_table, false); if (ret) { dev_err(smu->adev->dev, "Failed to get overdrive table!\n"); return ret; } - if (!od_table->GfxclkVolt1) { + if (!boot_od_table->GfxclkVolt1) { ret = navi10_overdrive_get_gfx_clk_base_voltage(smu, - &od_table->GfxclkVolt1, - od_table->GfxclkFreq1); + &boot_od_table->GfxclkVolt1, + boot_od_table->GfxclkFreq1); if (ret) return ret; } - if (!od_table->GfxclkVolt2) { + if (!boot_od_table->GfxclkVolt2) { ret = navi10_overdrive_get_gfx_clk_base_voltage(smu, - &od_table->GfxclkVolt2, - od_table->GfxclkFreq2); + &boot_od_table->GfxclkVolt2, + boot_od_table->GfxclkFreq2); if (ret)
[PATCH 2/2] drm/amd/pm: restore user customized OD settings properly for Sienna Cichlid
Properly restore those committed and non-committed user customized OD settings. Change-Id: I25396df0b3ecdd7a0d9fc77ed220b0abf1fde020 Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/pm/inc/smu_v11_0.h | 2 ++ drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 15 +-- .../drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 16 +--- drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 13 + 4 files changed, 29 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h b/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h index 385b2ea5379c..7bf25efc3936 100644 --- a/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h +++ b/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h @@ -302,5 +302,7 @@ void smu_v11_0_interrupt_work(struct smu_context *smu); int smu_v11_0_set_light_sbr(struct smu_context *smu, bool enable); +int smu_v11_0_restore_committed_od_settings(struct smu_context *smu); + #endif #endif diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c index 4752299d7f91..fbd29129550a 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c @@ -2513,19 +2513,6 @@ static int navi10_od_edit_dpm_table(struct smu_context *smu, enum PP_OD_DPM_TABL return ret; } -static int navi10_restore_committed_od_settings(struct smu_context *smu) -{ - struct smu_table_context *table_context = &smu->smu_table; - void *od_table = table_context->committed_overdrive_table; - int ret = 0; - - ret = smu_cmn_update_table(smu, SMU_TABLE_OVERDRIVE, 0, (void *)od_table, true); - if (ret) - dev_err(smu->adev->dev, "Failed to import overdrive table!\n"); - - return ret; -} - static int navi10_run_btc(struct smu_context *smu) { int ret = 0; @@ -3289,7 +3276,7 @@ static const struct pptable_funcs navi10_ppt_funcs = { .set_soft_freq_limited_range = smu_v11_0_set_soft_freq_limited_range, .set_default_od_settings = navi10_set_default_od_settings, .od_edit_dpm_table = navi10_od_edit_dpm_table, - 
.restore_committed_od_settings = navi10_restore_committed_od_settings, + .restore_committed_od_settings = smu_v11_0_restore_committed_od_settings, .run_btc = navi10_run_btc, .set_power_source = smu_v11_0_set_power_source, .get_pp_feature_mask = smu_cmn_get_pp_feature_mask, diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c index e0638dc3f617..f0a7dc1d1640 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c @@ -1957,15 +1957,16 @@ static int sienna_cichlid_set_default_od_settings(struct smu_context *smu) int ret = 0; ret = smu_cmn_update_table(smu, SMU_TABLE_OVERDRIVE, - 0, (void *)od_table, false); + 0, (void *)boot_od_table, false); if (ret) { dev_err(smu->adev->dev, "Failed to get overdrive table!\n"); return ret; } - memcpy(boot_od_table, od_table, sizeof(OverDriveTable_t)); + sienna_cichlid_dump_od_table(smu, boot_od_table); - sienna_cichlid_dump_od_table(smu, od_table); + if (!smu->adev->in_suspend) + memcpy(od_table, boot_od_table, sizeof(OverDriveTable_t)); return 0; } @@ -2136,6 +2137,14 @@ static int sienna_cichlid_od_edit_dpm_table(struct smu_context *smu, dev_err(smu->adev->dev, "Failed to import overdrive table!\n"); return ret; } + if (memcmp(table_context->overdrive_table, table_context->boot_overdrive_table, + sizeof(OverDriveTable_t))) { + smu->user_dpm_profile.committed_od = true; + memcpy(table_context->committed_overdrive_table, table_context->overdrive_table, + sizeof(OverDriveTable_t)); + } else { + smu->user_dpm_profile.committed_od = false; + } break; case PP_OD_EDIT_VDDGFX_OFFSET: @@ -3902,6 +3911,7 @@ static const struct pptable_funcs sienna_cichlid_ppt_funcs = { .set_soft_freq_limited_range = smu_v11_0_set_soft_freq_limited_range, .set_default_od_settings = sienna_cichlid_set_default_od_settings, .od_edit_dpm_table = sienna_cichlid_od_edit_dpm_table, + .restore_committed_od_settings = 
smu_v11_0_restore_committed_od_settings, .run_btc = sienna_cichlid_run_btc, .set_power_source = smu_v11_0_set_power_source, .get_pp_feature_mask = smu_cmn_get_pp_feature_mask, diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c index 28fc3f17c1b1..323126a7ca49