Re: [PATCH 1/7] mm/memory.c: Fix race when faulting a device private page

2022-09-28 Thread Michael Ellerman
Alistair Popple  writes:
> When the CPU tries to access a device private page the migrate_to_ram()
> callback associated with the pgmap for the page is called. However no
> reference is taken on the faulting page. Therefore a concurrent
> migration of the device private page can free the page and possibly the
> underlying pgmap. This results in a race which can crash the kernel due
> to the migrate_to_ram() function pointer becoming invalid. It also means
> drivers can't reliably read the zone_device_data field because the page
> may have been freed with memunmap_pages().
>
> Close the race by getting a reference on the page while holding the ptl
> to ensure it has not been freed. Unfortunately the elevated reference
> count will cause the migration required to handle the fault to fail. To
> avoid this failure pass the faulting page into the migrate_vma functions
> so that if an elevated reference count is found it can be checked to see
> if it's expected or not.
>
> Signed-off-by: Alistair Popple 
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c   | 15 ++-
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 17 +++--
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |  2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 11 +---
>  include/linux/migrate.h  |  8 ++-
>  lib/test_hmm.c   |  7 ++---
>  mm/memory.c  | 16 +++-
>  mm/migrate.c | 34 ++---
>  mm/migrate_device.c  | 18 +
>  9 files changed, 87 insertions(+), 41 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 5980063..d4eacf4 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -508,10 +508,10 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>  static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
>   unsigned long start,
>   unsigned long end, unsigned long page_shift,
> - struct kvm *kvm, unsigned long gpa)
> + struct kvm *kvm, unsigned long gpa, struct page *fault_page)
>  {
>   unsigned long src_pfn, dst_pfn = 0;
> - struct migrate_vma mig;
> + struct migrate_vma mig = { 0 };
>   struct page *dpage, *spage;
>   struct kvmppc_uvmem_page_pvt *pvt;
>   unsigned long pfn;
> @@ -525,6 +525,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
>   mig.dst = &dst_pfn;
>   mig.pgmap_owner = &kvmppc_uvmem_pgmap;
>   mig.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
> + mig.fault_page = fault_page;
>  
>   /* The requested page is already paged-out, nothing to do */
>   if (!kvmppc_gfn_is_uvmem_pfn(gpa >> page_shift, kvm, NULL))
> @@ -580,12 +581,14 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
>  static inline int kvmppc_svm_page_out(struct vm_area_struct *vma,
> unsigned long start, unsigned long end,
> unsigned long page_shift,
> -   struct kvm *kvm, unsigned long gpa)
> +   struct kvm *kvm, unsigned long gpa,
> +   struct page *fault_page)
>  {
>   int ret;
>  
>   mutex_lock(&kvm->arch.uvmem_lock);
> - ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa);
> + ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa,
> + fault_page);
>   mutex_unlock(&kvm->arch.uvmem_lock);
>  
>   return ret;
> @@ -736,7 +739,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
>   bool pagein)
>  {
>   unsigned long src_pfn, dst_pfn = 0;
> - struct migrate_vma mig;
> + struct migrate_vma mig = { 0 };
>   struct page *spage;
>   unsigned long pfn;
>   struct page *dpage;
> @@ -994,7 +997,7 @@ static vm_fault_t kvmppc_uvmem_migrate_to_ram(struct vm_fault *vmf)
>  
>   if (kvmppc_svm_page_out(vmf->vma, vmf->address,
>   vmf->address + PAGE_SIZE, PAGE_SHIFT,
> - pvt->kvm, pvt->gpa))
> + pvt->kvm, pvt->gpa, vmf->page))
>   return VM_FAULT_SIGBUS;
>   else
>   return 0;

I don't have a UV test system, but as-is it doesn't even compile :)

kvmppc_svm_page_out() is called via some paths other than the
migrate_to_ram callback.

I think it's correct to just pass fault_page = NULL when it's not called
from the migrate_to_ram callback?

Incremental diff below.

cheers


diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index d4eacf410956..965c9e9e500b 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -637,7 +637,7 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot,
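
For context, the two pieces under discussion, sketched in rough form (this is
illustrative only, not the exact upstream code; the truncated incremental diff
above is the authoritative change). The fault path takes a reference on the
page while holding the page table lock before calling migrate_to_ram(), and
callers of kvmppc_svm_page_out() that are not the fault handler pass
fault_page = NULL:

    /* mm/memory.c, do_swap_page(), sketched: pin the device private page
     * under the PTL so a concurrent migration cannot free the page (and
     * with it the pgmap) before migrate_to_ram() runs.
     */
    get_page(vmf->page);
    pte_unmap_unlock(vmf->pte, vmf->ptl);
    ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
    put_page(vmf->page);

    /* book3s_hv_uvmem.c, non-fault paths such as kvmppc_uvmem_drop_pages(),
     * sketched: there is no faulting page on these paths, so pass NULL.
     */
    ret = __kvmppc_svm_page_out(vma, addr, addr + PAGE_SIZE, PAGE_SHIFT,
                                pvt->kvm, pvt->gpa, NULL);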

Re: [PATCH 1/7] mm/memory.c: Fix race when faulting a device private page

2022-09-28 Thread Alistair Popple


Michael Ellerman  writes:

> Alistair Popple  writes:
>> When the CPU tries to access a device private page the migrate_to_ram()
>> callback associated with the pgmap for the page is called. However no
>> reference is taken on the faulting page. Therefore a concurrent
>> migration of the device private page can free the page and possibly the
>> underlying pgmap. This results in a race which can crash the kernel due
>> to the migrate_to_ram() function pointer becoming invalid. It also means
>> drivers can't reliably read the zone_device_data field because the page
>> may have been freed with memunmap_pages().
>>
>> Close the race by getting a reference on the page while holding the ptl
>> to ensure it has not been freed. Unfortunately the elevated reference
>> count will cause the migration required to handle the fault to fail. To
>> avoid this failure pass the faulting page into the migrate_vma functions
>> so that if an elevated reference count is found it can be checked to see
>> if it's expected or not.
>>
>> Signed-off-by: Alistair Popple 
>> ---
>>  arch/powerpc/kvm/book3s_hv_uvmem.c   | 15 ++-
>>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 17 +++--
>>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |  2 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 11 +---
>>  include/linux/migrate.h  |  8 ++-
>>  lib/test_hmm.c   |  7 ++---
>>  mm/memory.c  | 16 +++-
>>  mm/migrate.c | 34 ++---
>>  mm/migrate_device.c  | 18 +
>>  9 files changed, 87 insertions(+), 41 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
>> index 5980063..d4eacf4 100644
>> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
>> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
>> @@ -508,10 +508,10 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>>  static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
>>  unsigned long start,
>>  unsigned long end, unsigned long page_shift,
>> -struct kvm *kvm, unsigned long gpa)
>> +struct kvm *kvm, unsigned long gpa, struct page *fault_page)
>>  {
>>  unsigned long src_pfn, dst_pfn = 0;
>> -struct migrate_vma mig;
>> +struct migrate_vma mig = { 0 };
>>  struct page *dpage, *spage;
>>  struct kvmppc_uvmem_page_pvt *pvt;
>>  unsigned long pfn;
>> @@ -525,6 +525,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
>>  mig.dst = &dst_pfn;
>>  mig.pgmap_owner = &kvmppc_uvmem_pgmap;
>>  mig.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
>> +mig.fault_page = fault_page;
>>
>>  /* The requested page is already paged-out, nothing to do */
>>  if (!kvmppc_gfn_is_uvmem_pfn(gpa >> page_shift, kvm, NULL))
>> @@ -580,12 +581,14 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
>>  static inline int kvmppc_svm_page_out(struct vm_area_struct *vma,
>>unsigned long start, unsigned long end,
>>unsigned long page_shift,
>> -  struct kvm *kvm, unsigned long gpa)
>> +  struct kvm *kvm, unsigned long gpa,
>> +  struct page *fault_page)
>>  {
>>  int ret;
>>
>>  mutex_lock(&kvm->arch.uvmem_lock);
>> -ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa);
>> +ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa,
>> +fault_page);
>>  mutex_unlock(&kvm->arch.uvmem_lock);
>>
>>  return ret;
>> @@ -736,7 +739,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
>>  bool pagein)
>>  {
>>  unsigned long src_pfn, dst_pfn = 0;
>> -struct migrate_vma mig;
>> +struct migrate_vma mig = { 0 };
>>  struct page *spage;
>>  unsigned long pfn;
>>  struct page *dpage;
>> @@ -994,7 +997,7 @@ static vm_fault_t kvmppc_uvmem_migrate_to_ram(struct vm_fault *vmf)
>>
>>  if (kvmppc_svm_page_out(vmf->vma, vmf->address,
>>  vmf->address + PAGE_SIZE, PAGE_SHIFT,
>> -pvt->kvm, pvt->gpa))
>> +pvt->kvm, pvt->gpa, vmf->page))
>>  return VM_FAULT_SIGBUS;
>>  else
>>  return 0;
>
> I don't have a UV test system, but as-is it doesn't even compile :)

Ugh, thanks. I did get as far as installing a PPC cross-compiler and
building a kernel. Apparently I did not get as far as enabling
CONFIG_PPC_UV :)

> kvmppc_svm_page_out() is called via some paths other than the
> migrate_to_ram callback.
>
> I think it's correct to just pass fault_page = NULL when it's not called
> from the migrate_to_ram callback?
>
> Incremental diff below.
>
> cheers
>
>
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> 

RE: [PATCH] drm/amdgpu/gfx10: ignore rlc ucode validation

2022-09-28 Thread Chen, Guchun
Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Thursday, September 29, 2022 3:31 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: [PATCH] drm/amdgpu/gfx10: ignore rlc ucode validation

There are apparently ucode versions in the wild with incorrect sizes specified 
in the header.  We never checked this before, so don't start now.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2170
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 18809c3da178..af94ac580d3e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4061,9 +4061,14 @@ static int gfx_v10_0_init_microcode(struct amdgpu_device *adev)
err = request_firmware(>gfx.rlc_fw, fw_name, adev->dev);
if (err)
goto out;
+   /* don't check this.  There are apparently firmwares in the wild with
+* incorrect size in the header
+*/
err = amdgpu_ucode_validate(adev->gfx.rlc_fw);
if (err)
-   goto out;
+   dev_dbg(adev->dev,
+   "gfx10: amdgpu_ucode_validate() failed 
\"%s\"\n",
+   fw_name);
rlc_hdr = (const struct rlc_firmware_header_v2_0 *)adev->gfx.rlc_fw->data;
version_major = le16_to_cpu(rlc_hdr->header.header_version_major);
version_minor = le16_to_cpu(rlc_hdr->header.header_version_minor);
--
2.37.3



Re: [PATCH] drm/amdgpu: Skip put_reset_domain if it doesnt exist

2022-09-28 Thread Liu, Shaoyun
Looks OK to me.
Reviewed-by: shaoyun.liu 



From: Chander, Vignesh 
Sent: September 28, 2022 3:03 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Liu, Shaoyun ; Chander, Vignesh 

Subject: [PATCH] drm/amdgpu: Skip put_reset_domain if it doesnt exist

For xgmi sriov, the reset is handled by the host driver and hive->reset_domain
is not initialized, so we need to check that it exists before doing a put.

Signed-off-by: Vignesh Chander 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
index dc43fcb93eac..f5318fedf2f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
@@ -113,7 +113,8 @@ static inline bool amdgpu_reset_get_reset_domain(struct amdgpu_reset_domain *dom

 static inline void amdgpu_reset_put_reset_domain(struct amdgpu_reset_domain *domain)
 {
-   kref_put(&domain->refcount, amdgpu_reset_destroy_reset_domain);
+   if (domain)
+   kref_put(&domain->refcount, amdgpu_reset_destroy_reset_domain);
 }

 static inline bool amdgpu_reset_domain_schedule(struct amdgpu_reset_domain *domain,
--
2.25.1
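
For illustration, with the NULL check folded into the helper, a teardown path
can now call it unconditionally (a sketch only, not the exact amdgpu code):

    /* hive->reset_domain is left NULL for xgmi sriov, where the host
     * driver owns the reset; the put is a safe no-op in that case.
     */
    amdgpu_reset_put_reset_domain(hive->reset_domain);
    hive->reset_domain = NULL;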



Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute

2022-09-28 Thread Alex Deucher
On Wed, Sep 28, 2022 at 4:57 AM Sharma, Shashank
 wrote:
>
>
>
> On 9/27/2022 10:40 PM, Alex Deucher wrote:
> > On Tue, Sep 27, 2022 at 11:38 AM Sharma, Shashank
> >  wrote:
> >>
> >>
> >>
> >> On 9/27/2022 5:23 PM, Felix Kuehling wrote:
> >>> Am 2022-09-27 um 10:58 schrieb Sharma, Shashank:
>  Hello Felix,
> 
>  Thank for the review comments.
> 
>  On 9/27/2022 4:48 PM, Felix Kuehling wrote:
> > Am 2022-09-27 um 02:12 schrieb Christian König:
> >> Am 26.09.22 um 23:40 schrieb Shashank Sharma:
> >>> This patch switches the GPU workload mode to/from
> >>> compute mode, while submitting compute workload.
> >>>
> >>> Signed-off-by: Alex Deucher 
> >>> Signed-off-by: Shashank Sharma 
> >>
> >> Feel free to add my acked-by, but Felix should probably take a look
> >> as well.
> >
> > This look OK purely from a compute perspective. But I'm concerned
> > about the interaction of compute with graphics or multiple graphics
> > contexts submitting work concurrently. They would constantly override
> > or disable each other's workload hints.
> >
> > For example, you have an amdgpu_ctx with
> > AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD
> > process that also wants the compute profile. Those could be different
> > processes belonging to different users. Say, KFD enables the compute
> > profile first. Then the graphics context submits a job. At the start
> > of the job, the compute profile is enabled. That's a no-op because
> > KFD already enabled the compute profile. When the job finishes, it
> > disables the compute profile for everyone, including KFD. That's
> > unexpected.
> >
> 
>  In this case, it will not disable the compute profile, as the
>  reference counter will not be zero. The reset_profile() will only act
>  if the reference counter is 0.
> >>>
> >>> OK, I missed the reference counter.
> >>>
> >>>
> 
>  But I would be happy to get any inputs about a policy which can be
>  more sustainable and gets better outputs, for example:
>  - should we not allow a profile change, if a PP mode is already
>  applied and keep it Early bird basis ?
> 
>  For example: Policy A
>  - Job A sets the profile to compute
>  - Job B tries to set profile to 3D, but we do not allow it as job A has
>  not finished yet.
> 
>  Or Policy B: Current one
>  - Job A sets the profile to compute
>  - Job B tries to set profile to 3D, and we allow it. Job A also runs
>  in PP 3D
>  - Job B finishes, but does not reset PP as reference count is not zero
>  due to compute
>  - Job  A finishes, profile reset to NONE
> >>>
> >>> I think this won't work. As I understand it, the
> >>> amdgpu_dpm_switch_power_profile enables and disables individual
> >>> profiles. Disabling the 3D profile doesn't disable the compute profile
> >>> at the same time. I think you'll need one refcount per profile.
> >>>
> >>> Regards,
> >>> Felix
> >>
> >> Thanks, This is exactly what I was looking for, I think Alex's initial
> >> idea was around it, but I was under the assumption that there is only
> >> one HW profile in SMU which keeps on getting overwritten. This can solve
> >> our problems, as I can create an array of reference counters, and will
> >> disable only the profile whose reference counter goes 0.
> >
> > It's been a while since I paged any of this code into my head, but I
> > believe the actual workload message in the SMU is a mask where you can
> > specify multiple workload types at the same time and the SMU will
> > arbitrate between them internally.  E.g., the most aggressive one will
> > be selected out of the ones specified.  I think in the driver we just
> > set one bit at a time using the current interface.  It might be better
> > to change the interface and just ref count the hint types and then
> > when we call the set function look at the ref counts for each hint
> > type and set the mask as appropriate.
> >
> > Alex
> >
>
> Hey Alex,
> Thanks for your comment, if that is the case, this current patch series
> works straight forward, and no changes would be required. Please let me
> know if my understanding is correct:
>
> Assumption: Order of aggression: 3D > Media > Compute
>
> - Job 1: Requests mode compute: PP changed to compute, ref count 1
> - Job 2: Requests mode media: PP changed to media, ref count 2
> - Job 3: requests mode 3D: PP changed to 3D, ref count 3
> - Job 1 finishes, downs ref count to 2, doesn't reset the PP as ref > 0,
> PP still 3D
> - Job 3 finishes, downs ref count to 1, doesn't reset the PP as ref > 0,
> PP still 3D
> - Job 2 finishes, downs ref count to 0, PP changed to NONE,
>
> In this way, every job will be operating in the Power profile of desired
> aggression or higher, and this API guarantees the execution at-least in
> the desired power profile.

I'm not entirely sure on the 
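
The ref-count-per-hint-type scheme described above could look roughly like
this (a sketch only: the refcount array and the smu_set_workload_mask()
setter are hypothetical stand-ins, not the real amdgpu/SMU interface):

    /* One refcount per workload hint type; the SMU message is a bitmask
     * and the SMU arbitrates internally, so the most aggressive profile
     * among the set bits wins.
     */
    static atomic_t workload_refcount[PP_SMC_POWER_PROFILE_COUNT];

    static void workload_update_mask(struct smu_context *smu)
    {
            uint32_t mask = 0;
            int i;

            for (i = 0; i < PP_SMC_POWER_PROFILE_COUNT; i++)
                    if (atomic_read(&workload_refcount[i]) > 0)
                            mask |= BIT(i);

            smu_set_workload_mask(smu, mask); /* hypothetical setter */
    }

    void workload_hint_get(struct smu_context *smu, int profile)
    {
            atomic_inc(&workload_refcount[profile]);
            workload_update_mask(smu);
    }

    void workload_hint_put(struct smu_context *smu, int profile)
    {
            /* Only the transition to zero can change the mask. */
            if (atomic_dec_and_test(&workload_refcount[profile]))
                    workload_update_mask(smu);
    }

With this, each job holds its own hint refcount and the SMU always sees the
union of everyone's requests, which matches the Job 1/2/3 walkthrough above.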

Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release

2022-09-28 Thread Lyude Paul
Re comments about infinite retry: gotcha, makes sense to me.

On Tue, 2022-09-27 at 09:45 +1000, Alistair Popple wrote:
> John Hubbard  writes:
> 
> > On 9/26/22 14:35, Lyude Paul wrote:
> > > > +   for (i = 0; i < npages; i++) {
> > > > +   if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
> > > > +   struct page *dpage;
> > > > +
> > > > +   /*
> > > > +* __GFP_NOFAIL because the GPU is going away and there
> > > > +* is nothing sensible we can do if we can't copy the
> > > > +* data back.
> > > > +*/
> > > 
> > > You'll have to excuse me for a moment since this area of nouveau isn't one of
> > > my strongpoints, but are we sure about this? IIRC __GFP_NOFAIL means infinite
> > > retry, in the case of a GPU hotplug event I would assume we would rather just
> > > stop trying to migrate things to the GPU and just drop the data instead of
> > > hanging on infinite retries.
> > > 
> 
> No problem, thanks for taking a look!
> 
> > Hi Lyude!
> > 
> > Actually, I really think it's better in this case to keep trying
> > (presumably not necessarily infinitely, but only until memory becomes
> > available), rather than failing out and corrupting data.
> > 
> > That's because I'm not sure it's completely clear that this memory is
> > discardable. And at some point, we're going to make this all work with
> > file-backed memory, which will *definitely* not be discardable--I
> > realize that we're not there yet, of course.
> > 
> > But here, it's reasonable to commit to just retrying indefinitely,
> > really. Memory should eventually show up. And if it doesn't, then
> > restarting the machine is better than corrupting data, generally.
> 
> The memory is definitely not discardable here if the migration failed
> because that implies it is still mapped into some userspace process.
> 
> We could avoid restarting the machine by doing something similar to what
> happens during memory failure and killing every process that maps the
> page(s). But overall I think it's better to retry until memory is
> available, because that allows things like reclaim to work and in the
> worst case allows the OOM killer to select an appropriate task to kill.
> It also won't cause data corruption if/when we have file-backed memory.
> 
> > thanks,
> 

-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat



Re: [PATCH v2 7/8] nouveau/dmem: Evict device private memory during release

2022-09-28 Thread Lyude Paul
Reviewed-by: Lyude Paul 

On Wed, 2022-09-28 at 22:01 +1000, Alistair Popple wrote:
> When the module is unloaded or a GPU is unbound from the module it is
> possible for device private pages to still be mapped in currently
> running processes. This can lead to hangs and RCU stall warnings when
> unbinding the device as memunmap_pages() will wait in an uninterruptible
> state until all device pages have been freed which may never happen.
> 
> Fix this by migrating device mappings back to normal CPU memory prior to
> freeing the GPU memory chunks and associated device private pages.
> 
> Signed-off-by: Alistair Popple 
> Cc: Lyude Paul 
> Cc: Ben Skeggs 
> Cc: Ralph Campbell 
> Cc: John Hubbard 
> ---
>  drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++-
>  1 file changed, 48 insertions(+)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> index 65f51fb..5fe2091 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> @@ -367,6 +367,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm)
>   mutex_unlock(&drm->dmem->mutex);
>  }
>  
> +/*
> + * Evict all pages mapping a chunk.
> + */
> +static void
> +nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
> +{
> + unsigned long i, npages = range_len(&chunk->pagemap.range) >> PAGE_SHIFT;
> + unsigned long *src_pfns, *dst_pfns;
> + dma_addr_t *dma_addrs;
> + struct nouveau_fence *fence;
> +
> + src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL);
> + dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL);
> + dma_addrs = kcalloc(npages, sizeof(*dma_addrs), GFP_KERNEL);
> +
> + migrate_device_range(src_pfns, chunk->pagemap.range.start >> PAGE_SHIFT,
> + npages);
> +
> + for (i = 0; i < npages; i++) {
> + if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
> + struct page *dpage;
> +
> + /*
> +  * __GFP_NOFAIL because the GPU is going away and there
> +  * is nothing sensible we can do if we can't copy the
> +  * data back.
> +  */
> + dpage = alloc_page(GFP_HIGHUSER | __GFP_NOFAIL);
> + dst_pfns[i] = migrate_pfn(page_to_pfn(dpage));
> + nouveau_dmem_copy_one(chunk->drm,
> + migrate_pfn_to_page(src_pfns[i]), dpage,
> + &dma_addrs[i]);
> + }
> + }
> +
> + nouveau_fence_new(chunk->drm->dmem->migrate.chan, false, &fence);
> + migrate_device_pages(src_pfns, dst_pfns, npages);
> + nouveau_dmem_fence_done(&fence);
> + migrate_device_finalize(src_pfns, dst_pfns, npages);
> + kfree(src_pfns);
> + kfree(dst_pfns);
> + for (i = 0; i < npages; i++)
> + dma_unmap_page(chunk->drm->dev->dev, dma_addrs[i], PAGE_SIZE, DMA_BIDIRECTIONAL);
> + kfree(dma_addrs);
> +}
> +
>  void
>  nouveau_dmem_fini(struct nouveau_drm *drm)
>  {
> @@ -378,8 +424,10 @@ nouveau_dmem_fini(struct nouveau_drm *drm)
>   mutex_lock(&drm->dmem->mutex);
>  
>   list_for_each_entry_safe(chunk, tmp, &drm->dmem->chunks, list) {
> + nouveau_dmem_evict_chunk(chunk);
>   nouveau_bo_unpin(chunk->bo);
>   nouveau_bo_ref(NULL, &chunk->bo);
> + WARN_ON(chunk->callocated);
>   list_del(&chunk->list);
>   memunmap_pages(&chunk->pagemap);
>   release_mem_region(chunk->pagemap.range.start,

-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat



Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release

2022-09-28 Thread Lyude Paul
On Tue, 2022-09-27 at 11:39 +1000, Alistair Popple wrote:
> Felix Kuehling  writes:
> 
> > On 2022-09-26 17:35, Lyude Paul wrote:
> > > On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote:
> > > > When the module is unloaded or a GPU is unbound from the module it is
> > > > possible for device private pages to be left mapped in currently running
> > > > processes. This leads to a kernel crash when the pages are either freed
> > > > or accessed from the CPU because the GPU and associated data structures
> > > > and callbacks have all been freed.
> > > > 
> > > > Fix this by migrating any mappings back to normal CPU memory prior to
> > > > freeing the GPU memory chunks and associated device private pages.
> > > > 
> > > > Signed-off-by: Alistair Popple 
> > > > 
> > > > ---
> > > > 
> > > > I assume the AMD driver might have a similar issue. However I can't see
> > > > where device private (or coherent) pages actually get unmapped/freed
> > > > during teardown as I couldn't find any relevant calls to
> > > > devm_memunmap(), memunmap(), devm_release_mem_region() or
> > > > release_mem_region(). So it appears that ZONE_DEVICE pages are not being
> > > > properly freed during module unload, unless I'm missing something?
> > > I've got no idea, will poke Ben to see if they know the answer to this
> > 
> > I guess we're relying on devm to release the region. Isn't the whole point of
> > using devm_request_free_mem_region that we don't have to remember to explicitly
> > release it when the device gets destroyed? I believe we had an explicit free
> > call at some point by mistake, and that caused a double-free during module
> > unload. See this commit for reference:
> 
> Argh, thanks for that pointer. I was not so familiar with
> devm_request_free_mem_region()/devm_memremap_pages() as currently
> Nouveau explicitly manages that itself.

Mhm, TBH I feel like this was going to happen eventually anyway but there's
another reason for nouveau to start using the managed versions of these
functions at some point

> 
> > commit 22f4f4faf337d5fb2d2750aff13215726814273e
> > Author: Philip Yang 
> > Date:   Mon Sep 20 17:25:52 2021 -0400
> > 
> > drm/amdkfd: fix svm_migrate_fini warning
> >  Device manager releases device-specific resources when a driver
> > disconnects from a device, devm_memunmap_pages and
> > devm_release_mem_region calls in svm_migrate_fini are redundant.
> >  It causes below warning trace after patch "drm/amdgpu: Split
> > amdgpu_device_fini into early and late", so remove function
> > svm_migrate_fini.
> >  BUG: https://gitlab.freedesktop.org/drm/amd/-/issues/1718
> >  WARNING: CPU: 1 PID: 3646 at drivers/base/devres.c:795
> > devm_release_action+0x51/0x60
> > Call Trace:
> > ? memunmap_pages+0x360/0x360
> > svm_migrate_fini+0x2d/0x60 [amdgpu]
> > kgd2kfd_device_exit+0x23/0xa0 [amdgpu]
> > amdgpu_amdkfd_device_fini_sw+0x1d/0x30 [amdgpu]
> > amdgpu_device_fini_sw+0x45/0x290 [amdgpu]
> > amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
> > drm_dev_release+0x20/0x40 [drm]
> > release_nodes+0x196/0x1e0
> > device_release_driver_internal+0x104/0x1d0
> > driver_detach+0x47/0x90
> > bus_remove_driver+0x7a/0xd0
> > pci_unregister_driver+0x3d/0x90
> > amdgpu_exit+0x11/0x20 [amdgpu]
> >  Signed-off-by: Philip Yang 
> > Reviewed-by: Felix Kuehling 
> > Signed-off-by: Alex Deucher 
> > 
> > Furthermore, I guess we are assuming that nobody is using the GPU when the
> > module is unloaded. As long as any processes have /dev/kfd open, you won't be
> > able to unload the module (except by force-unload). I suppose with ZONE_DEVICE
> > memory, we can have references to device memory pages even when user mode has
> > closed /dev/kfd. We do have a cleanup handler that runs in an MMU-free-notifier.
> > In theory that should run after all the pages in the mm_struct have been freed.
> > It releases all sorts of other device resources and needs the driver to still be
> > there. I'm not sure if there is anything preventing a module unload before the
> > free-notifier runs. I'll look into that.
> 
> Right - module unload (or device unbind) is one of the other ways we can
> hit this issue in Nouveau at least. You can end up with ZONE_DEVICE
> pages mapped in a running process after the module has unloaded.
> Although now you mention it that seems a bit wrong - the pgmap refcount
> should provide some protection against that. Will have to look into
> that too.
> 
> > Regards,
> >   Felix
> > 
> > 
> > > 
> > > > ---
> > > >   drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++-
> > > >   1 file changed, 48 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> > > > index 66ebbd4..3b247b8 100644
> > 
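
For reference, the devm-managed ZONE_DEVICE setup Felix refers to looks
roughly like this (a sketch under typical assumptions; svm_migrate_pgmap_ops
stands in for the driver's dev_pagemap_ops with its migrate_to_ram hook):

    /* Both the region and the pages are device-managed: they are released
     * automatically when the driver detaches, which is why an explicit
     * memunmap_pages()/release_mem_region() in the fini path became a
     * double free.
     */
    res = devm_request_free_mem_region(dev, &iomem_resource, size);
    if (IS_ERR(res))
            return PTR_ERR(res);

    pgmap->type = MEMORY_DEVICE_PRIVATE;
    pgmap->range.start = res->start;
    pgmap->range.end = res->end;
    pgmap->nr_range = 1;
    pgmap->ops = &svm_migrate_pgmap_ops;
    pgmap->owner = dev;

    addr = devm_memremap_pages(dev, pgmap);
    if (IS_ERR(addr))
            return PTR_ERR(addr);

Nouveau, by contrast, currently manages the region and pagemap explicitly,
which is why it needs the eviction pass above before memunmap_pages().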

[PATCH 36/36] drm/amd/display: Minor code style change

2022-09-28 Thread Hamza Mahfooz
From: Rodrigo Siqueira 

This commit adds some minor code style changes just to reduce the merge
conflicts we have when we upstream some of the VBA code.

Reviewed-by: Aurabindo Pillai 
Signed-off-by: Rodrigo Siqueira 
---
 .../drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c   | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c
index 8316b1b914c6..11d5750e15af 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c
@@ -2476,8 +2476,6 @@ void dml32_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_l
mode_lib->vba.PixelClock[k], 
mode_lib->vba.PixelClockBackEnd[k]);
}
 
-   m = 0;
-
	for (k = 0; k <= mode_lib->vba.NumberOfActiveSurfaces - 1; k++) {
		for (m = 0; m <= mode_lib->vba.NumberOfActiveSurfaces - 1; m++) {
			for (j = 0; j <= mode_lib->vba.NumberOfActiveSurfaces - 1; j++) {
@@ -2854,8 +2852,6 @@ void dml32_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_l
}
}
 
-   m = 0;
-
//Calculate Return BW
for (i = 0; i < (int) v->soc.num_states; ++i) {
for (j = 0; j <= 1; ++j) {
@@ -3616,11 +3612,10 @@ void dml32_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_l
	mode_lib->vba.ModeIsSupported = mode_lib->vba.ModeSupport[i][0] == true
			|| mode_lib->vba.ModeSupport[i][1] == true;
 
-   if (mode_lib->vba.ModeSupport[i][0] == true) {
+   if (mode_lib->vba.ModeSupport[i][0] == true)
MaximumMPCCombine = 0;
-   } else {
+   else
MaximumMPCCombine = 1;
-   }
}
}
 
-- 
2.37.2



[PATCH 08/36] Revert "drm/amd/display: correct hostvm flag"

2022-09-28 Thread Hamza Mahfooz
From: Aric Cyr 

This reverts commit bbd46fc8daae7c2a1c79b63854621d8446e9794a.

4K144 resolution isn't available on DCN31.

Reviewed-by: Sherry Wang 
Acked-by: Hamza Mahfooz 
Signed-off-by: Aric Cyr 
---
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
index ce993e8bdd24..fddc21a5a04c 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
@@ -889,7 +889,7 @@ static const struct dc_debug_options debug_defaults_drv = {
},
.disable_z10 = true,
.enable_z9_disable_interface = true, /* Allow support for the PMFW interface for disable Z9*/
-   .dml_hostvm_override = DML_HOSTVM_NO_OVERRIDE,
+   .dml_hostvm_override = DML_HOSTVM_OVERRIDE_FALSE,
 };
 
 static const struct dc_debug_options debug_defaults_diags = {
-- 
2.37.2



[PATCH 29/36] drm/amd/display: Adding missing HDMI ACP SEND register

2022-09-28 Thread Hamza Mahfooz
From: Rodrigo Siqueira 

Add HDMI ACP bit field definition for DCN32.

Reviewed-by: Aurabindo Pillai 
Signed-off-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.h b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.h
index df7e36576ac0..20e5f016a45a 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.h
@@ -108,6 +108,7 @@
SE_SF(DIG0_HDMI_VBI_PACKET_CONTROL, HDMI_GC_CONT, mask_sh),\
SE_SF(DIG0_HDMI_VBI_PACKET_CONTROL, HDMI_GC_SEND, mask_sh),\
SE_SF(DIG0_HDMI_VBI_PACKET_CONTROL, HDMI_NULL_SEND, mask_sh),\
+   SE_SF(DIG0_HDMI_VBI_PACKET_CONTROL, HDMI_ACP_SEND, mask_sh),\
SE_SF(DIG0_HDMI_INFOFRAME_CONTROL0, HDMI_AUDIO_INFO_SEND, mask_sh),\
SE_SF(DIG0_HDMI_INFOFRAME_CONTROL1, HDMI_AUDIO_INFO_LINE, mask_sh),\
SE_SF(DIG0_HDMI_GC, HDMI_GC_AVMUTE, mask_sh),\
-- 
2.37.2



[PATCH 27/36] drm/amd/display: Fix disable DSC logic in the DIO code

2022-09-28 Thread Hamza Mahfooz
From: Eric Bernstein 

[Why]
In DIO stream encoder, definition of DP_DSC_MODE is changed (only
enable/disable) In OPTC, OTG_SET_V_TOTAL_MIN_MASK_EN is removed (same as
DCN3.1)

[How]
In DIO stream encoder, update enc32_dp_set_dsc_config(). In OPTC, use
DCN3.1 version for function interfaces .set_vrr_m_const and .set_drr

Reviewed-by: Rodrigo Siqueira 
Signed-off-by: Eric Bernstein 
---
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.c b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.c
index 40e713c4e172..d19fc93dbc75 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.c
@@ -388,7 +388,7 @@ static void enc32_dp_set_dsc_config(struct stream_encoder *enc,
 {
struct dcn10_stream_encoder *enc1 = DCN10STRENC_FROM_STRENC(enc);
 
-   REG_UPDATE(DP_DSC_CNTL, DP_DSC_MODE, dsc_mode);
+   REG_UPDATE(DP_DSC_CNTL, DP_DSC_MODE, dsc_mode == OPTC_DSC_DISABLED ? 0 : 1);
 }
 
 /* this function read dsc related register fields to be logged later in dcn10_log_hw_state
-- 
2.37.2



[PATCH 28/36] drm/amd/display: Add missing SDP registers to DCN32 reglist

2022-09-28 Thread Hamza Mahfooz
From: George Shen 

[Why]
Certain features require the additional DP SDP configuration registers
DP_SEC_CNTL1 and DP_SEC_CNTL5 in order to function correctly.

The DCN32 DIO stream encoder reglist is currently missing these two
registers.

[How]
Add the missing registers to the DCN32 DIO stream encoder reglist.

Reviewed-by: Rodrigo Siqueira 
Signed-off-by: George Shen 
---
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.h b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.h
index 250d9a341cf6..df7e36576ac0 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.h
@@ -71,7 +71,9 @@
SRI(DP_MSE_RATE_UPDATE, DP, id), \
SRI(DP_PIXEL_FORMAT, DP, id), \
SRI(DP_SEC_CNTL, DP, id), \
+   SRI(DP_SEC_CNTL1, DP, id), \
SRI(DP_SEC_CNTL2, DP, id), \
+   SRI(DP_SEC_CNTL5, DP, id), \
SRI(DP_SEC_CNTL6, DP, id), \
SRI(DP_STEER_FIFO, DP, id), \
SRI(DP_VID_M, DP, id), \
-- 
2.37.2



[PATCH 23/36] drm/amd/display: Drop unused code for DCN32/321

2022-09-28 Thread Hamza Mahfooz
From: Rodrigo Siqueira 

Under DCN32/321 we identified some code paths that DC never executes.
This commit removes that unused code to avoid distractions when
debugging issues.

Reviewed-by: Aurabindo Pillai 
Signed-off-by: Rodrigo Siqueira 
---
 .../display/dc/dcn32/dcn32_dio_link_encoder.c |  7 ---
 .../display/dc/dcn32/dcn32_dio_link_encoder.h |  4 
 .../dc/dcn32/dcn32_dio_stream_encoder.c   | 20 ---
 .../gpu/drm/amd/display/dc/dcn32/dcn32_hubp.c |  3 +--
 .../dc/dcn321/dcn321_dio_link_encoder.c   |  1 -
 .../amd/display/dc/dcn321/dcn321_resource.c   |  2 --
 6 files changed, 1 insertion(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_link_encoder.c b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_link_encoder.c
index fdae6aa89908..076969d928af 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_link_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_link_encoder.c
@@ -150,12 +150,6 @@ static void dcn32_link_encoder_get_max_link_cap(struct link_encoder *enc,
 
 }
 
-void enc32_set_dig_output_mode(struct link_encoder *enc, uint8_t pix_per_container)
-{
-   struct dcn10_link_encoder *enc10 = TO_DCN10_LINK_ENC(enc);
-   REG_UPDATE(DIG_FIFO_CTRL0, DIG_FIFO_OUTPUT_PIXEL_MODE, 
pix_per_container);
-}
- 
 static const struct link_encoder_funcs dcn32_link_enc_funcs = {
.read_state = link_enc2_read_state,
.validate_output_with_stream =
@@ -186,7 +180,6 @@ static const struct link_encoder_funcs dcn32_link_enc_funcs = {
.is_in_alt_mode = dcn32_link_encoder_is_in_alt_mode,
.get_max_link_cap = dcn32_link_encoder_get_max_link_cap,
.set_dio_phy_mux = dcn31_link_encoder_set_dio_phy_mux,
-   .set_dig_output_mode = enc32_set_dig_output_mode,
 };
 
 void dcn32_link_encoder_construct(
diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_link_encoder.h b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_link_encoder.h
index 749a1e8cb811..bbcfce06bec0 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_link_encoder.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_link_encoder.h
@@ -53,8 +53,4 @@ void dcn32_link_encoder_enable_dp_output(
const struct dc_link_settings *link_settings,
enum clock_source_id clock_source);
 
-void enc32_set_dig_output_mode(
-   struct link_encoder *enc,
-   uint8_t pix_per_container);
-
 #endif /* __DC_LINK_ENCODER__DCN32_H__ */
diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.c b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.c
index 3195be9d38f5..40e713c4e172 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.c
@@ -411,24 +411,6 @@ static void enc32_read_state(struct stream_encoder *enc, struct enc_state *s)
}
 }
 
-static void enc32_stream_encoder_reset_fifo(struct stream_encoder *enc)
-{
-   struct dcn10_stream_encoder *enc1 = DCN10STRENC_FROM_STRENC(enc);
-   uint32_t fifo_enabled;
-
-   REG_GET(DIG_FIFO_CTRL0, DIG_FIFO_ENABLE, &fifo_enabled);
-
-   if (fifo_enabled == 0) {
-   /* reset DIG resync FIFO */
-   REG_UPDATE(DIG_FIFO_CTRL0, DIG_FIFO_RESET, 1);
-   /* TODO: fix timeout when wait for DIG_FIFO_RESET_DONE */
-   //REG_WAIT(DIG_FIFO_CTRL0, DIG_FIFO_RESET_DONE, 1, 1, 100);
-   udelay(1);
-   REG_UPDATE(DIG_FIFO_CTRL0, DIG_FIFO_RESET, 0);
-   REG_WAIT(DIG_FIFO_CTRL0, DIG_FIFO_RESET_DONE, 0, 1, 100);
-   }
-}
-
 static void enc32_set_dig_input_mode(struct stream_encoder *enc, unsigned int pix_per_container)
 {
struct dcn10_stream_encoder *enc1 = DCN10STRENC_FROM_STRENC(enc);
@@ -458,8 +440,6 @@ static const struct stream_encoder_funcs dcn32_str_enc_funcs = {
enc3_stream_encoder_update_dp_info_packets,
.stop_dp_info_packets =
enc1_stream_encoder_stop_dp_info_packets,
-   .reset_fifo =
-   enc32_stream_encoder_reset_fifo,
.dp_blank =
enc1_stream_encoder_dp_blank,
.dp_unblank =
diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hubp.c b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hubp.c
index 830562f4139d..f4b901d393eb 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hubp.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hubp.c
@@ -185,8 +185,7 @@ static struct hubp_funcs dcn32_hubp_funcs = {
.hubp_update_force_pstate_disallow = 
hubp32_update_force_pstate_disallow,
.phantom_hubp_post_enable = hubp32_phantom_hubp_post_enable,
.hubp_update_mall_sel = hubp32_update_mall_sel,
-   .hubp_prepare_subvp_buffering = hubp32_prepare_subvp_buffering,
-   .hubp_set_flip_int = hubp1_set_flip_int
+   .hubp_prepare_subvp_buffering = hubp32_prepare_subvp_buffering
 };
 
 bool hubp32_construct(
diff --git 

[PATCH 35/36] drm/amd/display: update DSC for DCN32

2022-09-28 Thread Hamza Mahfooz
From: Rodrigo Siqueira 

Update DSC checks in the DCN32 VBA.

Reviewed-by: Aurabindo Pillai 
Signed-off-by: Rodrigo Siqueira 
---
 .../drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c
index 75be1e1ce543..8316b1b914c6 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c
@@ -2252,9 +2252,8 @@ void dml32_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_l
for (k = 0; k <= mode_lib->vba.NumberOfActiveSurfaces - 1; k++) {
 		if (!(mode_lib->vba.DSCInputBitPerComponent[k] == 12.0
 				|| mode_lib->vba.DSCInputBitPerComponent[k] == 10.0
-				|| mode_lib->vba.DSCInputBitPerComponent[k] == 8.0
-				|| mode_lib->vba.DSCInputBitPerComponent[k] > mode_lib->vba.MaximumDSCBitsPerComponent)) {
+				|| mode_lib->vba.DSCInputBitPerComponent[k] == 8.0)
+				|| mode_lib->vba.DSCInputBitPerComponent[k] > mode_lib->vba.MaximumDSCBitsPerComponent) {
mode_lib->vba.NonsupportedDSCInputBPC = true;
}
}
@@ -2330,16 +2329,15 @@ void dml32_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_l
 		if (mode_lib->vba.OutputMultistreamId[k] == k && mode_lib->vba.ForcedOutputLinkBPP[k] == 0)
 			mode_lib->vba.BPPForMultistreamNotIndicated = true;
 		for (j = 0; j < mode_lib->vba.NumberOfActiveSurfaces; ++j) {
-			if (mode_lib->vba.OutputMultistreamId[k] == j && mode_lib->vba.OutputMultistreamEn[k]
+			if (mode_lib->vba.OutputMultistreamId[k] == j
 					&& mode_lib->vba.ForcedOutputLinkBPP[k] == 0)
 				mode_lib->vba.BPPForMultistreamNotIndicated = true;
 		}
 	}
 
 		if ((mode_lib->vba.Output[k] == dm_edp || mode_lib->vba.Output[k] == dm_hdmi)) {
-			if (mode_lib->vba.OutputMultistreamId[k] == k && mode_lib->vba.OutputMultistreamEn[k])
+			if (mode_lib->vba.OutputMultistreamEn[k] == true && mode_lib->vba.OutputMultistreamId[k] == k)
 				mode_lib->vba.MultistreamWithHDMIOreDP = true;
-
 			for (j = 0; j < mode_lib->vba.NumberOfActiveSurfaces; ++j) {
 				if (mode_lib->vba.OutputMultistreamEn[k] == true && mode_lib->vba.OutputMultistreamId[k] == j)
 					mode_lib->vba.MultistreamWithHDMIOreDP = true;
-- 
2.37.2



[PATCH 19/36] drm/amd/display: Fix vupdate and vline position calculation

2022-09-28 Thread Hamza Mahfooz
From: Aric Cyr 

[Why]
Large deltas for periodic interrupts could result in the interrupt not
being programmed properly and thus not firing.

[How]
Add proper wrap-around support for calculating VUPDATE and VLINE
positions.

Reviewed-by: Jun Lei 
Acked-by: Hamza Mahfooz 
Signed-off-by: Aric Cyr 
---
 .../amd/display/dc/dcn10/dcn10_hw_sequencer.c | 60 ---
 1 file changed, 25 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
index 72521749c01d..f5427a979b5d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
@@ -3810,28 +3810,14 @@ void dcn10_calc_vupdate_position(
uint32_t *start_line,
uint32_t *end_line)
 {
-   const struct dc_crtc_timing *dc_crtc_timing = &pipe_ctx->stream->timing;
-   int vline_int_offset_from_vupdate =
-   pipe_ctx->stream->periodic_interrupt.lines_offset;
-   int vupdate_offset_from_vsync = dc->hwss.get_vupdate_offset_from_vsync(pipe_ctx);
-   int start_position;
-
-   if (vline_int_offset_from_vupdate > 0)
-   vline_int_offset_from_vupdate--;
-   else if (vline_int_offset_from_vupdate < 0)
-   vline_int_offset_from_vupdate++;
+   const struct dc_crtc_timing *timing = &pipe_ctx->stream->timing;
+   int vupdate_pos = dc->hwss.get_vupdate_offset_from_vsync(pipe_ctx);
 
-   start_position = vline_int_offset_from_vupdate + vupdate_offset_from_vsync;
-
-   if (start_position >= 0)
-   *start_line = start_position;
+   if (vupdate_pos >= 0)
+   *start_line = vupdate_pos - ((vupdate_pos / timing->v_total) * timing->v_total);
 	else
-   *start_line = dc_crtc_timing->v_total + start_position - 1;
-
-   *end_line = *start_line + 2;
-
-   if (*end_line >= dc_crtc_timing->v_total)
-   *end_line = 2;
+   *start_line = vupdate_pos + ((-vupdate_pos / timing->v_total) + 1) * timing->v_total - 1;
+   *end_line = (*start_line + 2) % timing->v_total;
 }
 
 static void dcn10_cal_vline_position(
@@ -3840,23 +3826,27 @@ static void dcn10_cal_vline_position(
uint32_t *start_line,
uint32_t *end_line)
 {
-   switch (pipe_ctx->stream->periodic_interrupt.ref_point) {
-   case START_V_UPDATE:
-   dcn10_calc_vupdate_position(
-   dc,
-   pipe_ctx,
-   start_line,
-   end_line);
-   break;
-   case START_V_SYNC:
+   const struct dc_crtc_timing *timing = &pipe_ctx->stream->timing;
+   int vline_pos = pipe_ctx->stream->periodic_interrupt.lines_offset;
+
+   if (pipe_ctx->stream->periodic_interrupt.ref_point == START_V_UPDATE) {
+   if (vline_pos > 0)
+   vline_pos--;
+   else if (vline_pos < 0)
+   vline_pos++;
+
+   vline_pos += dc->hwss.get_vupdate_offset_from_vsync(pipe_ctx);
+   if (vline_pos >= 0)
+   *start_line = vline_pos - ((vline_pos / timing->v_total) * timing->v_total);
+   else
+   *start_line = vline_pos + ((-vline_pos / timing->v_total) + 1) * timing->v_total - 1;
+   *end_line = (*start_line + 2) % timing->v_total;
+   } else if (pipe_ctx->stream->periodic_interrupt.ref_point == START_V_SYNC) {
// vsync is line 0 so start_line is just the requested line offset
-   *start_line = pipe_ctx->stream->periodic_interrupt.lines_offset;
-   *end_line = *start_line + 2;
-   break;
-   default:
+   *start_line = vline_pos;
+   *end_line = (*start_line + 2) % timing->v_total;
+   } else
ASSERT(0);
-   break;
-   }
 }
 
 void dcn10_setup_periodic_interrupt(
-- 
2.37.2
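
A quick worked example of the wrap-around arithmetic above (a standalone
sketch; v_total = 1125 is an illustrative 1080p-style total line count, not
a value taken from the patch):

    #include <assert.h>

    /* Mirrors the patch's formula for folding a possibly negative or
     * larger-than-one-frame offset back into [0, v_total).
     */
    static int wrap_start_line(int pos, int v_total)
    {
            if (pos >= 0)
                    return pos - (pos / v_total) * v_total; /* pos % v_total */
            return pos + ((-pos / v_total) + 1) * v_total - 1;
    }

    int main(void)
    {
            assert(wrap_start_line(2250, 1125) == 0);   /* large delta wraps to frame start */
            assert(wrap_start_line(-3, 1125) == 1121);  /* negative offset wraps to frame end */
            assert((wrap_start_line(-3, 1125) + 2) % 1125 == 1123); /* end_line stays in range */
            return 0;
    }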



[PATCH 34/36] drm/amd/display: Disconnect DSC for unused pipes during ODM transition

2022-09-28 Thread Hamza Mahfooz
From: Rodrigo Siqueira 

[Why]
During the transition from ODM combine to ODM bypass, if DSC is enabled
we need to disconnect the DSC mux for pipes that are no longer in use.

[How]
During the ODM update, detect pipes with DSC that are no longer being
used in the new state and call the new DSC interface to disconnect them.

Add a new DSC interface to disconnect DSC from a pipe.

Reviewed-by: Alvin Lee 
Signed-off-by: Rodrigo Siqueira 
---
 .../drm/amd/display/dc/dcn32/dcn32_hwseq.c| 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
index 33bdf56b2b3a..955ca273cfe1 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
@@ -1148,16 +1148,19 @@ void dcn32_update_odm(struct dc *dc, struct dc_state *context, struct pipe_ctx *
true);
}
 
-   // Don't program pixel clock after link is already enabled
-/* if (false == pipe_ctx->clock_source->funcs->program_pix_clk(
-   pipe_ctx->clock_source,
-   &pipe_ctx->stream_res.pix_clk_params,
-   &pipe_ctx->pll_settings)) {
-   BREAK_TO_DEBUGGER();
-   }*/
+   if (pipe_ctx->stream_res.dsc) {
+   struct pipe_ctx *current_pipe_ctx = &dc->current_state->res_ctx.pipe_ctx[pipe_ctx->pipe_idx];
 
-   if (pipe_ctx->stream_res.dsc)
update_dsc_on_stream(pipe_ctx, 
pipe_ctx->stream->timing.flags.DSC);
+
+   /* Check if no longer using pipe for ODM, then need to 
disconnect DSC for that pipe */
+   if (!pipe_ctx->next_odm_pipe && current_pipe_ctx->next_odm_pipe 
&&
+   
current_pipe_ctx->next_odm_pipe->stream_res.dsc) {
+   struct display_stream_compressor *dsc = 
current_pipe_ctx->next_odm_pipe->stream_res.dsc;
+   /* disconnect DSC block from stream */
+   dsc->funcs->dsc_disconnect(dsc);
+   }
+   }
 }
 
 unsigned int dcn32_calculate_dccg_k1_k2_values(struct pipe_ctx *pipe_ctx, unsigned int *k1_div, unsigned int *k2_div)
-- 
2.37.2



[PATCH 11/36] drm/amd/display: AUX tracing cleanup

2022-09-28 Thread Hamza Mahfooz
From: "Leo (Hanghong) Ma" 

[Why & How]
Remove the unnecessary AUX trace and use one trace for AUX failure.

Reviewed-by: Martin Leung 
Acked-by: Hamza Mahfooz 
Signed-off-by: Leo (Hanghong) Ma 
---
 drivers/gpu/drm/amd/display/dc/dce/dce_aux.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c b/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c
index 32782ef9ef77..140297c8ff55 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c
+++ b/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c
@@ -942,10 +942,6 @@ bool dce_aux_transfer_with_retries(struct ddc_service *ddc,
case AUX_RET_ERROR_ENGINE_ACQUIRE:
case AUX_RET_ERROR_UNKNOWN:
default:
-   DC_TRACE_LEVEL_MESSAGE(DAL_TRACE_LEVEL_INFORMATION,
-   LOG_FLAG_I2cAux_DceAux,
-   "dce_aux_transfer_with_retries: 
Failure: operation_result=%d",
-   (int)operation_result);
goto fail;
}
}
@@ -953,14 +949,11 @@ bool dce_aux_transfer_with_retries(struct ddc_service *ddc,
 fail:
DC_TRACE_LEVEL_MESSAGE(DAL_TRACE_LEVEL_ERROR,
LOG_FLAG_Error_I2cAux,
-   "dce_aux_transfer_with_retries: FAILURE");
+   "%s: Failure: operation_result=%d",
+   __func__,
+   (int)operation_result);
if (!payload_reply)
payload->reply = NULL;
 
-   DC_TRACE_LEVEL_MESSAGE(DAL_TRACE_LEVEL_ERROR,
-   WPP_BIT_FLAG_DC_ERROR,
-   "AUX transaction failed. Result: %d",
-   operation_result);
-
return false;
 }
-- 
2.37.2



[PATCH 26/36] drm/amd/display: Remove OPTC lock check

2022-09-28 Thread Hamza Mahfooz
From: Rodrigo Siqueira 

At some point, we decided to blank HUBP during pixel data blank, and to
handle that, we added some OPTC lock checks. Later, we realized that
this change caused multiple regressions, and we removed it. Nevertheless,
we still have some leftovers that might affect some ASIC behavior, and
this commit drops those changes to keep the code consistent.

Reviewed-by: Aurabindo Pillai 
Signed-off-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c | 11 ---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.h |  1 -
 drivers/gpu/drm/amd/display/dc/dcn30/dcn30_optc.c |  1 -
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c |  1 -
 drivers/gpu/drm/amd/display/dc/inc/core_types.h   |  1 -
 .../gpu/drm/amd/display/dc/inc/hw/timing_generator.h  |  1 -
 6 files changed, 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c
index ea7739255119..143a900d4d3d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c
@@ -679,16 +679,6 @@ void optc1_unlock(struct timing_generator *optc)
OTG_MASTER_UPDATE_LOCK, 0);
 }
 
-bool optc1_is_locked(struct timing_generator *optc)
-{
-   struct optc *optc1 = DCN10TG_FROM_TG(optc);
-   uint32_t locked;
-
-   REG_GET(OTG_MASTER_UPDATE_LOCK, UPDATE_LOCK_STATUS, &locked);
-
-   return (locked == 1);
-}
-
 void optc1_get_position(struct timing_generator *optc,
struct crtc_position *position)
 {
@@ -1583,7 +1573,6 @@ static const struct timing_generator_funcs dcn10_tg_funcs = {
.enable_crtc_reset = optc1_enable_crtc_reset,
.disable_reset_trigger = optc1_disable_reset_trigger,
.lock = optc1_lock,
-   .is_locked = optc1_is_locked,
.unlock = optc1_unlock,
.enable_optc_clock = optc1_enable_optc_clock,
.set_drr = optc1_set_drr,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.h b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.h
index 6323ca6dc3b3..88ac5f6f4c96 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.h
@@ -654,7 +654,6 @@ void optc1_set_blank(struct timing_generator *optc,
bool enable_blanking);
 
 bool optc1_is_blanked(struct timing_generator *optc);
-bool optc1_is_locked(struct timing_generator *optc);
 
 void optc1_program_blank_color(
struct timing_generator *optc,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_optc.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_optc.c
index 1782b9c26cf4..02459a64ee21 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_optc.c
@@ -319,7 +319,6 @@ static struct timing_generator_funcs dcn30_tg_funcs = {
.enable_crtc_reset = optc1_enable_crtc_reset,
.disable_reset_trigger = optc1_disable_reset_trigger,
.lock = optc3_lock,
-   .is_locked = optc1_is_locked,
.unlock = optc1_unlock,
.lock_doublebuffer_enable = optc3_lock_doublebuffer_enable,
.lock_doublebuffer_disable = optc3_lock_doublebuffer_disable,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
index 2f7404a97479..d873def1a8f9 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
@@ -260,7 +260,6 @@ static struct timing_generator_funcs dcn31_tg_funcs = {
.enable_crtc_reset = optc1_enable_crtc_reset,
.disable_reset_trigger = optc1_disable_reset_trigger,
.lock = optc3_lock,
-   .is_locked = optc1_is_locked,
.unlock = optc1_unlock,
.lock_doublebuffer_enable = optc3_lock_doublebuffer_enable,
.lock_doublebuffer_disable = optc3_lock_doublebuffer_disable,
diff --git a/drivers/gpu/drm/amd/display/dc/inc/core_types.h b/drivers/gpu/drm/amd/display/dc/inc/core_types.h
index 4ff1392633a7..1fd7ad853210 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/core_types.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/core_types.h
@@ -439,7 +439,6 @@ struct pipe_ctx {
union pipe_update_flags update_flags;
struct dwbc *dwbc;
struct mcif_wb *mcif_wb;
-   bool vtp_locked;
 };
 
 /* Data used for dynamic link encoder assignment.
diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/timing_generator.h b/drivers/gpu/drm/amd/display/dc/inc/hw/timing_generator.h
index 72eef7a5ed83..25a1df45b264 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/hw/timing_generator.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/hw/timing_generator.h
@@ -209,7 +209,6 @@ struct timing_generator_funcs {
void (*set_blank)(struct timing_generator 

[PATCH 32/36] drm/amd/display: unblock mcm_luts

2022-09-28 Thread Hamza Mahfooz
From: "Leung, Martin" 

[Why & How]
Needed to fix a bad assumption made when enabling mcm_luts.

Reviewed-by: Rodrigo Siqueira 
Signed-off-by: Martin Leung 
---
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
index 8012a48859b5..5213f4443531 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
@@ -630,10 +630,9 @@ bool dcn32_set_input_transfer_func(struct dc *dc,
params = &dpp_base->degamma_params;
}
 
-   result = dpp_base->funcs->dpp_program_gamcor_lut(dpp_base, params);
+   dpp_base->funcs->dpp_program_gamcor_lut(dpp_base, params);
 
-   if (result &&
-   pipe_ctx->stream_res.opp &&
+   if (pipe_ctx->stream_res.opp &&
pipe_ctx->stream_res.opp->ctx &&
hws->funcs.set_mcm_luts)
result = hws->funcs.set_mcm_luts(pipe_ctx, plane_state);
-- 
2.37.2



[PATCH 09/36] drm/amd/display: prevent S4 test from failing

2022-09-28 Thread Hamza Mahfooz
From: Charlene Liu 

[why]
limit the vm prefetch check for now, until the feature is fully
verified.

Reviewed-by: Hansen Dsouza 
Acked-by: Hamza Mahfooz 
Signed-off-by: Charlene Liu 
---
 drivers/gpu/drm/amd/display/dc/dcn21/dcn21_hubbub.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_hubbub.c b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_hubbub.c
index 5752271f22df..c5e200d09038 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_hubbub.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_hubbub.c
@@ -67,15 +67,9 @@ static uint32_t convert_and_clamp(
 void dcn21_dchvm_init(struct hubbub *hubbub)
 {
struct dcn20_hubbub *hubbub1 = TO_DCN20_HUBBUB(hubbub);
-   uint32_t riommu_active, prefetch_done;
+   uint32_t riommu_active;
int i;
 
-   REG_GET(DCHVM_RIOMMU_STAT0, HOSTVM_PREFETCH_DONE, &prefetch_done);
-
-   if (prefetch_done) {
-   hubbub->riommu_active = true;
-   return;
-   }
//Init DCHVM block
REG_UPDATE(DCHVM_CTRL0, HOSTVM_INIT_REQ, 1);
 
-- 
2.37.2



[PATCH 33/36] drm/amd/display: Enable 2 to 1 ODM policy if supported

2022-09-28 Thread Hamza Mahfooz
From: Rodrigo Siqueira 

If the current configuration supports 2 to 1 ODM policy, let's also
enable the windowed MPO feature.

Reviewed-by: Aurabindo Pillai 
Signed-off-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
index 5213f4443531..33bdf56b2b3a 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
@@ -990,6 +990,10 @@ void dcn32_init_hw(struct dc *dc)
dc_dmub_srv_query_caps_cmd(dc->ctx->dmub_srv->dmub);
dc->caps.dmub_caps.psr = 
dc->ctx->dmub_srv->dmub->feature_caps.psr;
}
+
+   /* Enable support for ODM and windowed MPO if policy flag is set */
+   if (dc->debug.enable_single_display_2to1_odm_policy)
+   dc->config.enable_windowed_mpo_odm = true;
 }
 
 static int calc_mpc_flow_ctrl_cnt(const struct dc_stream_state *stream,
-- 
2.37.2



[PATCH 15/36] drm/amd/display: Keep OTG on when Z10 is disable

2022-09-28 Thread Hamza Mahfooz
From: Lewis Huang 

[Why]
The OTG is disabled when using PSR-SU with Z10 even if Z10 is disabled.

[How]
Reverse the condition to keep the OTG on when Z10 is disabled.

Reviewed-by: Robin Chen 
Acked-by: Hamza Mahfooz 
Signed-off-by: Lewis Huang 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index cd14ec5a5c25..71cf6776998e 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -3378,8 +3378,8 @@ bool dc_link_setup_psr(struct dc_link *link,
case FAMILY_YELLOW_CARP:
case AMDGPU_FAMILY_GC_10_3_6:
case AMDGPU_FAMILY_GC_11_0_1:
-   if(!dc->debug.disable_z10)
-   psr_context->psr_level.bits.SKIP_CRTC_DISABLE = 
false;
+   if(dc->debug.disable_z10)
+   psr_context->psr_level.bits.SKIP_CRTC_DISABLE = 
true;
break;
default:
psr_context->psr_level.bits.SKIP_CRTC_DISABLE = true;
-- 
2.37.2



[PATCH 14/36] drm/amd/display: skip commit minimal transition state

2022-09-28 Thread Hamza Mahfooz
From: Zhikai Zhai 

[WHY]
Dynamic ODM is now disabled when MPO is required, using safe transitions
to avoid underflow, but we trigger the minimal-transition path too often.
Committing a dc state with no check does a full pipeline setup, which may
needlessly re-initialize components such as audio.

[HOW]
Only do the minimal transition when all pipes are in use; otherwise
return true to skip it.

Reviewed-by: Dillon Varone 
Acked-by: Hamza Mahfooz 
Signed-off-by: Zhikai Zhai 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 9b7c6bac4760..1508679873d9 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -3635,10 +3635,30 @@ static bool commit_minimal_transition_state(struct dc 
*dc,
bool temp_subvp_policy;
enum dc_status ret = DC_ERROR_UNEXPECTED;
unsigned int i, j;
+   unsigned int pipe_in_use = 0;
 
if (!transition_context)
return false;
 
+   /* check current pipes in use*/
+   for (i = 0; i < dc->res_pool->pipe_count; i++) {
+   struct pipe_ctx *pipe = &transition_base_context->res_ctx.pipe_ctx[i];
+
+   if (pipe->plane_state)
+   pipe_in_use++;
+   }
+
+   /* When the OS adds a new surface and all pipes are already in use
+    * for ODM combine and MPC split, commit_minimal_transition_state is
+    * needed to transition safely. After the OS exits MPO, we go back to
+    * using ODM and MPC split with all pipes, so it must be called again.
+    * Otherwise return true to skip.
+    *
+    * This reduces how often dc_commit_state_no_check is used during
+    * flips, especially when entering/exiting MPO while DCN still has
+    * enough resources.
+    */
+   if (pipe_in_use != dc->res_pool->pipe_count)
+   return true;
+
if (!dc->config.is_vmin_only_asic) {
tmp_mpc_policy = dc->debug.pipe_split_policy;
dc->debug.pipe_split_policy = MPC_SPLIT_AVOID;
-- 
2.37.2



[PATCH 13/36] drm/amd/display: Add log for LTTPR

2022-09-28 Thread Hamza Mahfooz
From: Leo Chen 

[Why & How]
Add logging for LTTPR to facilitate debugging.

Reviewed-by: Charlene Liu 
Acked-by: Hamza Mahfooz 
Signed-off-by: Leo Chen 
---
 .../gpu/drm/amd/display/dc/core/dc_link_dp.c  | 29 +++
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index 2093120867eb..4ea8acb16161 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -5090,6 +5090,7 @@ bool dp_retrieve_lttpr_cap(struct dc_link *link)

(dp_convert_to_count(link->dpcd_caps.lttpr_caps.phy_repeater_cnt) == 0)) {
ASSERT(0);
link->dpcd_caps.lttpr_caps.phy_repeater_cnt = 0x80;
+   DC_LOG_DC("lttpr_caps forced phy_repeater_cnt = %d\n", 
link->dpcd_caps.lttpr_caps.phy_repeater_cnt);
}
 
/* Attempt to train in LTTPR transparent mode if repeater count exceeds 
8. */
@@ -5098,6 +5099,7 @@ bool dp_retrieve_lttpr_cap(struct dc_link *link)
if (is_lttpr_present)
CONN_DATA_DETECT(link, lttpr_dpcd_data, 
sizeof(lttpr_dpcd_data), "LTTPR Caps: ");
 
+   DC_LOG_DC("is_lttpr_present = %d\n", is_lttpr_present);
return is_lttpr_present;
 }
 
@@ -5134,6 +5136,7 @@ void dp_get_lttpr_mode_override(struct dc_link *link, 
enum lttpr_mode *override)
} else if (link->dc->debug.lttpr_mode_override == LTTPR_MODE_NON_LTTPR) 
{
*override = LTTPR_MODE_NON_LTTPR;
}
+   DC_LOG_DC("lttpr_mode_override chose LTTPR_MODE = %d\n", 
(uint8_t)(*override));
 }
 
 enum lttpr_mode dp_decide_8b_10b_lttpr_mode(struct dc_link *link)
@@ -5146,22 +5149,34 @@ enum lttpr_mode dp_decide_8b_10b_lttpr_mode(struct 
dc_link *link)
return LTTPR_MODE_NON_LTTPR;
 
if (vbios_lttpr_aware) {
-   if (vbios_lttpr_force_non_transparent)
+   if (vbios_lttpr_force_non_transparent) {
+   DC_LOG_DC("chose LTTPR_MODE_NON_TRANSPARENT due to 
VBIOS DCE_INFO_CAPS_LTTPR_SUPPORT_ENABLE set to 1.\n");
return LTTPR_MODE_NON_TRANSPARENT;
-   else
+   } else {
+   DC_LOG_DC("chose LTTPR_MODE_NON_TRANSPARENT by default 
due to VBIOS not set DCE_INFO_CAPS_LTTPR_SUPPORT_ENABLE set to 1.\n");
return LTTPR_MODE_TRANSPARENT;
+   }
}
 
if (link->dc->config.allow_lttpr_non_transparent_mode.bits.DP1_4A &&
-   link->dc->caps.extended_aux_timeout_support)
+   link->dc->caps.extended_aux_timeout_support) {
+   DC_LOG_DC("chose LTTPR_MODE_NON_TRANSPARENT by default and 
dc->config.allow_lttpr_non_transparent_mode.bits.DP1_4A set to 1.\n");
return LTTPR_MODE_NON_TRANSPARENT;
+   }
 
+   DC_LOG_DC("chose LTTPR_MODE_NON_LTTPR.\n");
return LTTPR_MODE_NON_LTTPR;
 }
 
 enum lttpr_mode dp_decide_128b_132b_lttpr_mode(struct dc_link *link)
 {
-   return dp_is_lttpr_present(link) ? LTTPR_MODE_NON_TRANSPARENT : 
LTTPR_MODE_NON_LTTPR;
+   enum lttpr_mode mode = LTTPR_MODE_NON_LTTPR;
+
+   if (dp_is_lttpr_present(link))
+   mode = LTTPR_MODE_NON_TRANSPARENT;
+
+   DC_LOG_DC("128b_132b chose LTTPR_MODE %d.\n", mode);
+   return mode;
 }
 
 static bool get_usbc_cable_id(struct dc_link *link, union dp_cable_id 
*cable_id)
@@ -5179,9 +5194,10 @@ static bool get_usbc_cable_id(struct dc_link *link, 
union dp_cable_id *cable_id)
cmd.cable_id.data.input.phy_inst = resource_transmitter_to_phy_idx(
link->dc, link->link_enc->transmitter);
if (dc_dmub_srv_cmd_with_reply_data(link->ctx->dmub_srv, &cmd) &&
-   cmd.cable_id.header.ret_status == 1)
+   cmd.cable_id.header.ret_status == 1) {
cable_id->raw = cmd.cable_id.data.output_raw;
-
+   DC_LOG_DC("usbc_cable_id = %d.\n", cable_id->raw);
+   }
return cmd.cable_id.header.ret_status == 1;
 }
 
@@ -5228,6 +5244,7 @@ static enum dc_status wa_try_to_wake_dprx(struct dc_link 
*link, uint64_t timeout
 
lttpr_present = dp_is_lttpr_present(link) ||
(!vbios_lttpr_interop || 
!link->dc->caps.extended_aux_timeout_support);
+   DC_LOG_DC("lttpr_present = %d.\n", lttpr_present ? 1 : 0);
 
/* Issue an AUX read to test DPRX responsiveness. If LTTPR is supported 
the first read is expected to
 * be to determine LTTPR capabilities. Otherwise trying to read power 
state should be an innocuous AUX read.
-- 
2.37.2



[PATCH 30/36] drm/amd/display: Add missing mask sh for SYM32_TP_SQ_PULSE register

2022-09-28 Thread Hamza Mahfooz
From: Wenjing Liu 

A register mask is missing in DCN32, so the hardware programming is not
executed when programming the SQ_num test pattern for DP2.
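
For context: DC programs register fields through mask/shift tables, so a
field whose table entry was never added has a mask of 0 and the write
silently does nothing. The following is an illustrative standalone C
sketch of that failure mode, not DC's actual REG_UPDATE machinery:

#include <stdint.h>
#include <stdio.h>

struct field { uint32_t mask; uint32_t shift; };

/* Read-modify-write of one register field. With mask == 0 (the entry
 * was never added to the mask/shift list) nothing is cleared and
 * nothing is set: the programming silently becomes a no-op. */
static uint32_t reg_update(uint32_t reg, struct field f, uint32_t val)
{
	return (reg & ~f.mask) | ((val << f.shift) & f.mask);
}

int main(void)
{
	struct field missing = { 0, 0 };     /* entry absent from the list */
	struct field present = { 0x3f0, 4 }; /* hypothetical mask/shift */

	printf("0x%08x\n", reg_update(0, missing, 0xa)); /* 0x00000000 */
	printf("0x%08x\n", reg_update(0, present, 0xa)); /* 0x000000a0 */
	return 0;
}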

Reviewed-by: Rodrigo Siqueira 
Signed-off-by: Wenjing Liu 
---
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hpo_dp_link_encoder.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hpo_dp_link_encoder.h 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hpo_dp_link_encoder.h
index 9db1323e1933..176b1537d2a1 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hpo_dp_link_encoder.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hpo_dp_link_encoder.h
@@ -47,6 +47,7 @@
SE_SF(DP_DPHY_SYM320_DP_DPHY_SYM32_TP_CONFIG, TP_PRBS_SEL1, mask_sh),\
SE_SF(DP_DPHY_SYM320_DP_DPHY_SYM32_TP_CONFIG, TP_PRBS_SEL2, mask_sh),\
SE_SF(DP_DPHY_SYM320_DP_DPHY_SYM32_TP_CONFIG, TP_PRBS_SEL3, mask_sh),\
+   SE_SF(DP_DPHY_SYM320_DP_DPHY_SYM32_TP_SQ_PULSE, TP_SQ_PULSE_WIDTH, 
mask_sh),\
SE_SF(DP_DPHY_SYM320_DP_DPHY_SYM32_SAT_VC0, SAT_STREAM_SOURCE, 
mask_sh),\
SE_SF(DP_DPHY_SYM320_DP_DPHY_SYM32_SAT_VC0, SAT_SLOT_COUNT, mask_sh),\
SE_SF(DP_DPHY_SYM320_DP_DPHY_SYM32_VC_RATE_CNTL0, STREAM_VC_RATE_X, 
mask_sh),\
-- 
2.37.2



[PATCH 07/36] drm/amd/display: Refactor edp ILR caps codes

2022-09-28 Thread Hamza Mahfooz
From: Ian Chen 

We split the ILR config out of the "global" debug settings into
per-panel config settings.
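
The per-ASIC resource hunks are truncated below; as a rough sketch of
the new hook (dc_panel_config and the ilr field come from the diff, the
defaults-table shape and function name are assumptions), it can look
like:

static const struct dc_panel_config panel_config_defaults = {
	.ilr = {
		.optimize_edp_link_rate = true,
	},
};

static void get_panel_config_defaults(struct dc_panel_config *panel_config)
{
	/* Seed the per-link panel config from the ASIC defaults; DM
	 * settings and system overrides are then applied on top. */
	*panel_config = panel_config_defaults;
}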

Reviewed-by: Anthony Koo 
Acked-by: Hamza Mahfooz 
Signed-off-by: Ian Chen 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c   |  5 -
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c|  4 ++--
 drivers/gpu/drm/amd/display/dc/dc.h |  1 -
 drivers/gpu/drm/amd/display/dc/dc_link.h|  4 
 .../gpu/drm/amd/display/dc/dcn21/dcn21_resource.c   | 13 -
 .../gpu/drm/amd/display/dc/dcn31/dcn31_resource.c   | 13 -
 .../gpu/drm/amd/display/dc/dcn314/dcn314_resource.c | 13 -
 .../gpu/drm/amd/display/dc/dcn315/dcn315_resource.c | 13 -
 .../gpu/drm/amd/display/dc/dcn316/dcn316_resource.c | 13 -
 drivers/gpu/drm/amd/display/dc/inc/core_types.h |  1 +
 10 files changed, 71 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index f9479c3ace97..cd14ec5a5c25 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -1307,7 +1307,10 @@ static bool detect_link_and_local_sink(struct dc_link 
*link,
}
 
if (link->connector_signal == SIGNAL_TYPE_EDP) {
-   // Init dc_panel_config
+   /* Init dc_panel_config by HW config */
+		if (dc_ctx->dc->res_pool->funcs->get_panel_config_defaults)
+			dc_ctx->dc->res_pool->funcs->get_panel_config_defaults(&link->panel_config);
+   /* Pickup base DM settings */
dm_helpers_init_panel_settings(dc_ctx, &link->panel_config, sink);
// Override dc_panel_config if system has specific 
settings
dm_helpers_override_panel_settings(dc_ctx, &link->panel_config);
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index d9cdce8f695d..2093120867eb 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -5795,7 +5795,7 @@ void detect_edp_sink_caps(struct dc_link *link)
 * Per VESA eDP spec, "The DPCD revision for eDP v1.4 is 13h"
 */
if (link->dpcd_caps.dpcd_rev.raw >= DPCD_REV_13 &&
-   (link->dc->debug.optimize_edp_link_rate ||
+   (link->panel_config.ilr.optimize_edp_link_rate ||
link->reported_link_cap.link_rate == 
LINK_RATE_UNKNOWN)) {
// Read DPCD 00010h - 0001Fh 16 bytes at one shot
core_link_read_dpcd(link, DP_SUPPORTED_LINK_RATES,
@@ -6744,7 +6744,7 @@ bool is_edp_ilr_optimization_required(struct dc_link 
*link, struct dc_crtc_timin
ASSERT(link || crtc_timing); // invalid input
 
if (link->dpcd_caps.edp_supported_link_rates_count == 0 ||
-   !link->dc->debug.optimize_edp_link_rate)
+   !link->panel_config.ilr.optimize_edp_link_rate)
return false;
 
 
diff --git a/drivers/gpu/drm/amd/display/dc/dc.h 
b/drivers/gpu/drm/amd/display/dc/dc.h
index 2ecf36e6329b..458a4f431ac6 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -821,7 +821,6 @@ struct dc_debug_options {
/* Enable dmub aux for legacy ddc */
bool enable_dmub_aux_for_legacy_ddc;
bool disable_fams;
-   bool optimize_edp_link_rate; /* eDP ILR */
/* FEC/PSR1 sequence enable delay in 100us */
uint8_t fec_enable_delay_in100us;
bool enable_driver_sequence_debug;
diff --git a/drivers/gpu/drm/amd/display/dc/dc_link.h 
b/drivers/gpu/drm/amd/display/dc/dc_link.h
index bf5f9e2773bc..caf0c7af2d0b 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_link.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_link.h
@@ -138,6 +138,10 @@ struct dc_panel_config {
bool disable_dsc_edp;
unsigned int force_dsc_edp_policy;
} dsc;
+   /* eDP ILR */
+   struct ilr {
+   bool optimize_edp_link_rate; /* eDP ILR */
+   } ilr;
 };
 /*
  * A link contains one or more sinks and their connected status.
diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
index 7cb35bb1c0f1..887081472c0d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
@@ -657,7 +657,6 @@ static const struct dc_debug_options debug_defaults_drv = {
.usbc_combo_phy_reset_wa = true,
.dmub_command_table = true,
.use_max_lb = true,
-   .optimize_edp_link_rate = true
 };
 
 static const struct dc_debug_options debug_defaults_diags = {
@@ -677,6 +676,12 @@ static const struct dc_debug_options debug_defaults_diags 
= {
  

[PATCH 17/36] drm/amd/display: fix integer overflow during MSA V_Freq calculation

2022-09-28 Thread Hamza Mahfooz
From: Wenjing Liu 

[why]
The analyzer shows an incorrect V freq in the MSA for some large timings.

[how]
Cast the 32-bit integer to uint64_t before the multiplication to avoid
integer overflow for very large timings.
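
A minimal standalone illustration of the failure mode (the 5 GHz pixel
clock is hypothetical, chosen only to be large enough to overflow 32
bits):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t pix_clk_100hz = 50000000; /* hypothetical 5 GHz clock in 100 Hz units */

	/* The multiply happens in 32 bits and wraps before widening. */
	uint64_t bad = pix_clk_100hz * 100;
	/* Widen first, then multiply: the full product survives. */
	uint64_t good = (uint64_t)pix_clk_100hz * 100;

	printf("%llu vs %llu\n", (unsigned long long)bad,
	       (unsigned long long)good); /* 705032704 vs 5000000000 */
	return 0;
}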

Reviewed-by: Ariel Bernstein 
Acked-by: Hamza Mahfooz 
Signed-off-by: Wenjing Liu 
---
 .../drm/amd/display/dc/dcn31/dcn31_hpo_dp_stream_encoder.c| 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_stream_encoder.c 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_stream_encoder.c
index 52fb2bf3d578..d71d89268a07 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_stream_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_stream_encoder.c
@@ -197,7 +197,7 @@ static void dcn31_hpo_dp_stream_enc_set_stream_attribute(
uint32_t h_back_porch;
uint32_t h_width;
uint32_t v_height;
-   unsigned long long v_freq;
+   uint64_t v_freq;
uint8_t misc0 = 0;
uint8_t misc1 = 0;
uint8_t hsp;
@@ -360,7 +360,7 @@ static void dcn31_hpo_dp_stream_enc_set_stream_attribute(
v_height = hw_crtc_timing.v_border_top + hw_crtc_timing.v_addressable + 
hw_crtc_timing.v_border_bottom;
hsp = hw_crtc_timing.flags.HSYNC_POSITIVE_POLARITY ? 0 : 0x80;
vsp = hw_crtc_timing.flags.VSYNC_POSITIVE_POLARITY ? 0 : 0x80;
-   v_freq = hw_crtc_timing.pix_clk_100hz * 100;
+   v_freq = (uint64_t)hw_crtc_timing.pix_clk_100hz * 100;
 
/*   MSA Packet Mapping to 32-bit Link Symbols - DP2 spec, section 
2.7.4.1
 *
-- 
2.37.2



[PATCH 20/36] drm/amd/display: Fix merging dynamic ODM+MPO configs on DCN32

2022-09-28 Thread Hamza Mahfooz
From: Dillon Varone 

[WHY?]
When merging ODM pipes that are using MPO, we must copy the stream_res
from the new top pipe to the bottom pipe so that the overlaid plane does
not point to the wrong stream assets.

Reviewed-by: Martin Leung 
Reviewed-by: Jun Lei 
Acked-by: Hamza Mahfooz 
Signed-off-by: Dillon Varone 
---
 drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
index a56ee04f7df9..f3f98e9a0ce6 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
@@ -1598,6 +1598,9 @@ bool dcn32_internal_validate_bw(struct dc *dc,
/*MPC split rules will handle this 
case*/
pipe->bottom_pipe->top_pipe = NULL;
} else {
+				/* when merging ODM pipes, the bottom MPC pipe
+				 * must now point to the previous ODM pipe and
+				 * its associated stream assets
+				 */
if (pipe->prev_odm_pipe->bottom_pipe) {
/* 3 plane MPO*/
pipe->bottom_pipe->top_pipe = 
pipe->prev_odm_pipe->bottom_pipe;
@@ -1607,6 +1610,8 @@ bool dcn32_internal_validate_bw(struct dc *dc,
pipe->bottom_pipe->top_pipe = 
pipe->prev_odm_pipe;

pipe->prev_odm_pipe->bottom_pipe = pipe->bottom_pipe;
}
+
+			memcpy(&pipe->bottom_pipe->stream_res, &pipe->bottom_pipe->top_pipe->stream_res, sizeof(struct stream_resource));
}
}
 
-- 
2.37.2



[PATCH 22/36] drm/amd/display: 3.2.206

2022-09-28 Thread Hamza Mahfooz
From: Aric Cyr 

This version brings along the following:
- ILR improvements
- PSR fixes
- DCN315 fixes
- DCN32 fixes
- ODM fixes
- DSC fixes
- SubVP fixes

Acked-by: Hamza Mahfooz 
Signed-off-by: Aric Cyr 
---
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc.h 
b/drivers/gpu/drm/amd/display/dc/dc.h
index 458a4f431ac6..66b7482d2e72 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -47,7 +47,7 @@ struct aux_payload;
 struct set_config_cmd_payload;
 struct dmub_notification;
 
-#define DC_VER "3.2.205"
+#define DC_VER "3.2.206"
 
 #define MAX_SURFACES 3
 #define MAX_PLANES 6
-- 
2.37.2



[PATCH 31/36] drm/amd/display: Add PState change high hook for DCN32

2022-09-28 Thread Hamza Mahfooz
From: Rodrigo Siqueira 

For some reason, we missed the PState check for DCN32, which may cause
issues during clock transitions. This commit adds the required hook.

Reviewed-by: Aurabindo Pillai 
Signed-off-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hubbub.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hubbub.c 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hubbub.c
index f6d3da475835..9fbb72369c10 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hubbub.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hubbub.c
@@ -936,6 +936,7 @@ static const struct hubbub_funcs hubbub32_funcs = {
.program_watermarks = hubbub32_program_watermarks,
.allow_self_refresh_control = hubbub1_allow_self_refresh_control,
.is_allow_self_refresh_enabled = hubbub1_is_allow_self_refresh_enabled,
+   .verify_allow_pstate_change_high = 
hubbub1_verify_allow_pstate_change_high,
.force_wm_propagate_to_pipes = hubbub32_force_wm_propagate_to_pipes,
.force_pstate_change_control = hubbub3_force_pstate_change_control,
.init_watermarks = hubbub32_init_watermarks,
-- 
2.37.2



[PATCH 24/36] drm/amd/display: Update DCN321 hook that deals with pipe acquire

2022-09-28 Thread Hamza Mahfooz
From: Rodrigo Siqueira 

DCN provides a hook to check if we can have a new pipe allocation based
on some DC constraints. If the current configuration supports the new
pipe request, DC updates its context; otherwise, it will keep the same
configuration. This behavior is similar across multiple ASICs, and for
this reason, we reused the DCN20 hook on DCN321. However, DCN32x has
some peculiarities which require its own function to avoid weird
pipe-split issues. This commit addresses that by using
dcn32_acquire_idle_pipe_for_head_pipe_in_layer instead of
dcn20_acquire_idle_pipe_for_layer.

Reviewed-by: Aurabindo Pillai 
Signed-off-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c
index 910b63d874d5..6658849d5b4e 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c
@@ -1604,7 +1604,7 @@ static struct resource_funcs dcn321_res_pool_funcs = {
.validate_bandwidth = dcn32_validate_bandwidth,
.calculate_wm_and_dlg = dcn32_calculate_wm_and_dlg,
.populate_dml_pipes = dcn32_populate_dml_pipes_from_context,
-   .acquire_idle_pipe_for_layer = dcn20_acquire_idle_pipe_for_layer,
+   .acquire_idle_pipe_for_head_pipe_in_layer = 
dcn32_acquire_idle_pipe_for_head_pipe_in_layer,
.add_stream_to_ctx = dcn30_add_stream_to_ctx,
.add_dsc_to_stream_resource = dcn20_add_dsc_to_stream_resource,
.remove_stream_from_ctx = dcn20_remove_stream_from_ctx,
-- 
2.37.2



[PATCH 25/36] drm/amd/display: Fix SubVP control flow in the MPO context

2022-09-28 Thread Hamza Mahfooz
From: Rodrigo Siqueira 

SubVP has some issues related to how we allocate and enable it. This
commit fixes that behavior by adding the proper checks and configuration
to the SubVP code path.

Reviewed-by: Aurabindo Pillai 
Signed-off-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c   | 16 ++--
 .../gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 18 --
 .../drm/amd/display/dc/dcn32/dcn32_resource.c  |  6 ++
 3 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 1508679873d9..3a4f2d58f2e8 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -2946,6 +2946,12 @@ static bool update_planes_and_stream_state(struct dc *dc,
dc_resource_state_copy_construct(
dc->current_state, context);
 
+   /* For each full update, remove all existing phantom pipes 
first.
+* Ensures that we have enough pipes for newly added MPO planes
+*/
+   if (dc->res_pool->funcs->remove_phantom_pipes)
+   dc->res_pool->funcs->remove_phantom_pipes(dc, context);
+
/*remove old surfaces from context */
if (!dc_rem_all_planes_for_stream(dc, stream, context)) {
 
@@ -3353,8 +3359,14 @@ static void commit_planes_for_stream(struct dc *dc,
/* Since phantom pipe programming is moved to 
post_unlock_program_front_end,
 * move the SubVP lock to after the phantom pipes have been 
setup
 */
-   if (dc->hwss.subvp_pipe_control_lock)
-   dc->hwss.subvp_pipe_control_lock(dc, context, false, 
should_lock_all_pipes, NULL, subvp_prev_use);
+   if (should_lock_all_pipes && 
dc->hwss.interdependent_update_lock) {
+   if (dc->hwss.subvp_pipe_control_lock)
+   dc->hwss.subvp_pipe_control_lock(dc, context, 
false, should_lock_all_pipes, NULL, subvp_prev_use);
+   } else {
+   if (dc->hwss.subvp_pipe_control_lock)
+   dc->hwss.subvp_pipe_control_lock(dc, context, 
false, should_lock_all_pipes, NULL, subvp_prev_use);
+   }
+
return;
}
 
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
index 7de511fd004b..d732b6f031a1 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
@@ -1860,24 +1860,6 @@ void dcn20_post_unlock_program_front_end(
}
}
 
-   for (i = 0; i < dc->res_pool->pipe_count; i++) {
-		struct pipe_ctx *pipe = &context->res_ctx.pipe_ctx[i];
-   struct pipe_ctx *mpcc_pipe;
-
-   if (pipe->vtp_locked) {
-   
dc->hwseq->funcs.wait_for_blank_complete(pipe->stream_res.opp);
-   
pipe->plane_res.hubp->funcs->set_blank(pipe->plane_res.hubp, true);
-   pipe->vtp_locked = false;
-
-   for (mpcc_pipe = pipe->bottom_pipe; mpcc_pipe; 
mpcc_pipe = mpcc_pipe->bottom_pipe)
-   
mpcc_pipe->plane_res.hubp->funcs->set_blank(mpcc_pipe->plane_res.hubp, true);
-
-   for (i = 0; i < dc->res_pool->pipe_count; i++)
-   if 
(context->res_ctx.pipe_ctx[i].update_flags.bits.disable)
-   dc->hwss.disable_plane(dc, 
&dc->current_state->res_ctx.pipe_ctx[i]);
-   }
-   }
-
for (i = 0; i < dc->res_pool->pipe_count; i++) {
struct pipe_ctx *pipe = &context->res_ctx.pipe_ctx[i];
struct pipe_ctx *old_pipe = 
&dc->current_state->res_ctx.pipe_ctx[i];
diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
index 752a4accb116..9585b25f10e5 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
@@ -1680,6 +1680,8 @@ static void dcn32_enable_phantom_plane(struct dc *dc,
phantom_plane->clip_rect.y = 0;
phantom_plane->clip_rect.height = 
phantom_stream->timing.v_addressable;
 
+   phantom_plane->is_phantom = true;
+
dc_add_plane_to_context(dc, phantom_stream, phantom_plane, 
context);
 
curr_pipe = curr_pipe->bottom_pipe;
@@ -1749,6 +1751,10 @@ bool dcn32_remove_phantom_pipes(struct dc *dc, struct 
dc_state *context)
pipe->stream->mall_stream_config.type = SUBVP_NONE;
pipe->stream->mall_stream_config.paired_stream = NULL;
}
+
+   if (pipe->plane_state) {
+   pipe->plane_state->is_phantom = false;
+ 

[PATCH 18/36] drm/amd/display: write all 4 bytes of FFE_PRESET dpcd value

2022-09-28 Thread Hamza Mahfooz
From: Wenjing Liu 

[why]
According to the spec, we are expected to write all 4 bytes even if the
current lane count is less than 4.

Reviewed-by: George Shen 
Acked-by: Hamza Mahfooz 
Signed-off-by: Wenjing Liu 
---
 .../gpu/drm/amd/display/dc/core/dc_link_dp.c  | 37 +--
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index 4ea8acb16161..c4acadba78d6 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -944,6 +944,23 @@ enum dc_status dp_get_lane_status_and_lane_adjust(
return status;
 }
 
+static enum dc_status dpcd_128b_132b_set_lane_settings(
+   struct dc_link *link,
+   const struct link_training_settings *link_training_setting)
+{
+   enum dc_status status = core_link_write_dpcd(link,
+   DP_TRAINING_LANE0_SET,
+   (uint8_t *)(link_training_setting->dpcd_lane_settings),
+   sizeof(link_training_setting->dpcd_lane_settings));
+
+   DC_LOG_HW_LINK_TRAINING("%s:\n 0x%X TX_FFE_PRESET_VALUE = %x\n",
+   __func__,
+   DP_TRAINING_LANE0_SET,
+   
link_training_setting->dpcd_lane_settings[0].tx_ffe.PRESET_VALUE);
+   return status;
+}
+
+
 enum dc_status dpcd_set_lane_settings(
struct dc_link *link,
const struct link_training_settings *link_training_setting,
@@ -964,16 +981,6 @@ enum dc_status dpcd_set_lane_settings(
link_training_setting->link_settings.lane_count);
 
if (is_repeater(link_training_setting, offset)) {
-   if 
(dp_get_link_encoding_format(&link_training_setting->link_settings) ==
-   DP_128b_132b_ENCODING)
-   DC_LOG_HW_LINK_TRAINING("%s:\n LTTPR Repeater ID: %d\n"
-   " 0x%X TX_FFE_PRESET_VALUE = %x\n",
-   __func__,
-   offset,
-   lane0_set_address,
-   
link_training_setting->dpcd_lane_settings[0].tx_ffe.PRESET_VALUE);
-   else if 
(dp_get_link_encoding_format(&link_training_setting->link_settings) ==
-   DP_8b_10b_ENCODING)
DC_LOG_HW_LINK_TRAINING("%s\n LTTPR Repeater ID: %d\n"
" 0x%X VS set = %x  PE set = %x max VS Reached 
= %x  max PE Reached = %x\n",
__func__,
@@ -985,14 +992,6 @@ enum dc_status dpcd_set_lane_settings(

link_training_setting->dpcd_lane_settings[0].bits.MAX_PRE_EMPHASIS_REACHED);
 
} else {
-   if 
(dp_get_link_encoding_format(&link_training_setting->link_settings) ==
-   DP_128b_132b_ENCODING)
-   DC_LOG_HW_LINK_TRAINING("%s:\n 0x%X TX_FFE_PRESET_VALUE 
= %x\n",
-   __func__,
-   lane0_set_address,
-   
link_training_setting->dpcd_lane_settings[0].tx_ffe.PRESET_VALUE);
-   else if 
(dp_get_link_encoding_format(&link_training_setting->link_settings) ==
-   DP_8b_10b_ENCODING)
DC_LOG_HW_LINK_TRAINING("%s\n 0x%X VS set = %x  PE set = %x max 
VS Reached = %x  max PE Reached = %x\n",
__func__,
lane0_set_address,
@@ -2023,7 +2022,7 @@ static enum link_training_result 
dp_perform_128b_132b_channel_eq_done_sequence(
result = DP_128b_132b_LT_FAILED;
} else {
dp_set_hw_lane_settings(link, link_res, lt_settings, 
DPRX);
-   dpcd_set_lane_settings(link, lt_settings, DPRX);
+   dpcd_128b_132b_set_lane_settings(link, lt_settings);
}
loop_count++;
}
-- 
2.37.2



[PATCH 21/36] drm/amd/display: block odd h_total timings from halving pixel rate

2022-09-28 Thread Hamza Mahfooz
From: Martin Leung 

why:
when dynamic ODM was turned on, there was also logic to halve the pixel
clock; this still kicked in when we avoided ODM in the case of odd
h_total timings

how:
also block the pixel clock halving mechanism in the case of odd h_total
timings
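
A standalone copy of the divisibility check from the diff below, with
worked numbers (CEA 1080p60: h_total 2200, front porch 88, addressable
1920, sync width 44; the odd-total variant is hypothetical):

#include <stdbool.h>
#include <stdio.h>

static bool h_timing_div_by_2(unsigned int h_total, unsigned int h_front_porch,
			      unsigned int h_addressable, unsigned int h_sync_width)
{
	unsigned int h_blank_start = h_total - h_front_porch;     /* 2112 */
	unsigned int h_blank_end = h_blank_start - h_addressable; /* 192 */

	return (h_total % 2 == 0) && (h_blank_start % 2 == 0) &&
	       (h_blank_end % 2 == 0) && (h_sync_width % 2 == 0);
}

int main(void)
{
	printf("%d\n", h_timing_div_by_2(2200, 88, 1920, 44)); /* 1: halving allowed */
	printf("%d\n", h_timing_div_by_2(2201, 88, 1920, 44)); /* 0: block halving */
	return 0;
}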

Reviewed-by: Jun Lei 
Acked-by: Hamza Mahfooz 
Signed-off-by: Martin Leung 
---
 .../dc/dcn32/dcn32_dio_stream_encoder.c   | 35 ++-
 .../drm/amd/display/dc/dcn32/dcn32_hwseq.c|  9 ++---
 2 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.c 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.c
index 0e9dce414641..3195be9d38f5 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_stream_encoder.c
@@ -243,6 +243,39 @@ static bool is_two_pixels_per_containter(const struct 
dc_crtc_timing *timing)
return two_pix;
 }
 
+static bool is_h_timing_divisible_by_2(const struct dc_crtc_timing *timing)
+{
+   /* math borrowed from function of same name in inc/resource
+* checks if h_timing is divisible by 2
+*/
+
+   bool divisible = false;
+   uint16_t h_blank_start = 0;
+   uint16_t h_blank_end = 0;
+
+   if (timing) {
+   h_blank_start = timing->h_total - timing->h_front_porch;
+   h_blank_end = h_blank_start - timing->h_addressable;
+
+   /* HTOTAL, Hblank start/end, and Hsync start/end all must be
+* divisible by 2 in order for the horizontal timing params
+* to be considered divisible by 2. Hsync start is always 0.
+*/
+   divisible = (timing->h_total % 2 == 0) &&
+   (h_blank_start % 2 == 0) &&
+   (h_blank_end % 2 == 0) &&
+   (timing->h_sync_width % 2 == 0);
+   }
+   return divisible;
+}
+
+static bool is_dp_dig_pixel_rate_div_policy(struct dc *dc, const struct 
dc_crtc_timing *timing)
+{
+   /* should be functionally the same as 
dcn32_is_dp_dig_pixel_rate_div_policy for DP encoders*/
+   return is_h_timing_divisible_by_2(timing) &&
+   dc->debug.enable_dp_dig_pixel_rate_div_policy;
+}
+
 static void enc32_stream_encoder_dp_unblank(
 struct dc_link *link,
struct stream_encoder *enc,
@@ -259,7 +292,7 @@ static void enc32_stream_encoder_dp_unblank(
 
/* YCbCr 4:2:0 : Computed VID_M will be 2X the input rate */
if (is_two_pixels_per_containter(&param->timing) || 
param->opp_cnt > 1
-   || dc->debug.enable_dp_dig_pixel_rate_div_policy) {
+		|| is_dp_dig_pixel_rate_div_policy(dc, &param->timing)) {
/*this logic should be the same in 
get_pixel_clock_parameters() */
n_multiply = 1;
}
diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
index a750343ca521..8012a48859b5 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
@@ -1161,7 +1161,6 @@ unsigned int dcn32_calculate_dccg_k1_k2_values(struct 
pipe_ctx *pipe_ctx, unsign
 {
struct dc_stream_state *stream = pipe_ctx->stream;
unsigned int odm_combine_factor = 0;
-   struct dc *dc = pipe_ctx->stream->ctx->dc;
bool two_pix_per_container = false;
 
// For phantom pipes, use the same programming as the main pipes
@@ -1189,7 +1188,7 @@ unsigned int dcn32_calculate_dccg_k1_k2_values(struct 
pipe_ctx *pipe_ctx, unsign
} else {
*k1_div = PIXEL_RATE_DIV_BY_1;
*k2_div = PIXEL_RATE_DIV_BY_4;
-   if ((odm_combine_factor == 2) || 
dc->debug.enable_dp_dig_pixel_rate_div_policy)
+   if ((odm_combine_factor == 2) || 
dcn32_is_dp_dig_pixel_rate_div_policy(pipe_ctx))
*k2_div = PIXEL_RATE_DIV_BY_2;
}
}
@@ -1226,7 +1225,6 @@ void dcn32_unblank_stream(struct pipe_ctx *pipe_ctx,
struct dc_link *link = stream->link;
struct dce_hwseq *hws = link->dc->hwseq;
struct pipe_ctx *odm_pipe;
-   struct dc *dc = pipe_ctx->stream->ctx->dc;
uint32_t pix_per_cycle = 1;
 
params.opp_cnt = 1;
@@ -1245,7 +1243,7 @@ void dcn32_unblank_stream(struct pipe_ctx *pipe_ctx,
pipe_ctx->stream_res.tg->inst);
} else if (dc_is_dp_signal(pipe_ctx->stream->signal)) {
if (optc2_is_two_pixels_per_containter(&stream->timing) || 
params.opp_cnt > 1
-   || dc->debug.enable_dp_dig_pixel_rate_div_policy) {
+   || dcn32_is_dp_dig_pixel_rate_div_policy(pipe_ctx)) {
params.timing.pix_clk_100hz /= 2;

[PATCH 12/36] drm/amd/display: For SubVP pipe split case use min transition into MPO

2022-09-28 Thread Hamza Mahfooz
From: Alvin Lee 

[Description]
- For the SubVP pipe split case we need to use a minimal transition
  when opening MPO video, since we are transitioning from 4 pipes
  to 3 pipes, where an OPP for a previous MPCC will change
- Also save and restore the MALL config when doing fast_validate, in
  case there was a shallow copy of the dc->current_state (see the
  sketch after this list)
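
The save/restore helpers live in the truncated resource-helper hunk; a
minimal sketch of the idea (mall_temp_config comes from the diff, but
its members and the loop body here are assumptions):

/* Sketch only: snapshot per-pipe MALL/phantom state before fast
 * validation tears SubVP down, so it can be put back afterwards. */
static void save_mall_state(struct dc *dc, struct dc_state *context,
			    struct mall_temp_config *temp_config)
{
	uint32_t i;

	for (i = 0; i < dc->res_pool->pipe_count; i++) {
		struct pipe_ctx *pipe = &context->res_ctx.pipe_ctx[i];

		if (pipe->stream)
			temp_config->mall_stream_config[i] =
				pipe->stream->mall_stream_config;
		if (pipe->plane_state)
			temp_config->is_phantom_plane[i] =
				pipe->plane_state->is_phantom;
	}
}

/* restore_mall_state() mirrors this loop, writing the saved values back. */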

Reviewed-by: Jun Lei 
Acked-by: Hamza Mahfooz 
Signed-off-by: Alvin Lee 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c  | 36 ++
 .../drm/amd/display/dc/dcn32/dcn32_resource.c | 18 +
 .../drm/amd/display/dc/dcn32/dcn32_resource.h | 20 ++
 .../display/dc/dcn32/dcn32_resource_helpers.c | 71 +++
 4 files changed, 145 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 2584cb8f44e2..9b7c6bac4760 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -3561,6 +3561,7 @@ static bool 
could_mpcc_tree_change_for_active_pipes(struct dc *dc,
 
struct dc_stream_status *cur_stream_status = 
stream_get_status(dc->current_state, stream);
bool force_minimal_pipe_splitting = false;
+   uint32_t i;
 
*is_plane_addition = false;
 
@@ -3592,6 +3593,36 @@ static bool 
could_mpcc_tree_change_for_active_pipes(struct dc *dc,
}
}
 
+   /* For SubVP pipe split case when adding MPO video
+* we need to add a minimal transition. In this case
+* there will be 2 streams (1 main stream, 1 phantom
+* stream).
+*/
+   if (cur_stream_status &&
+   dc->current_state->stream_count == 2 &&
+   stream->mall_stream_config.type == SUBVP_MAIN) {
+   bool is_pipe_split = false;
+
+   for (i = 0; i < dc->res_pool->pipe_count; i++) {
+   if (dc->current_state->res_ctx.pipe_ctx[i].stream == 
stream &&
+   
(dc->current_state->res_ctx.pipe_ctx[i].bottom_pipe ||
+   
dc->current_state->res_ctx.pipe_ctx[i].next_odm_pipe)) {
+   is_pipe_split = true;
+   break;
+   }
+   }
+
+   /* determine if minimal transition is required due to SubVP*/
+   if (surface_count > 0 && is_pipe_split) {
+   if (cur_stream_status->plane_count > surface_count) {
+   force_minimal_pipe_splitting = true;
+   } else if (cur_stream_status->plane_count < 
surface_count) {
+   force_minimal_pipe_splitting = true;
+   *is_plane_addition = true;
+   }
+   }
+   }
+
return force_minimal_pipe_splitting;
 }
 
@@ -3601,6 +3632,7 @@ static bool commit_minimal_transition_state(struct dc *dc,
struct dc_state *transition_context = dc_create_state(dc);
enum pipe_split_policy tmp_mpc_policy;
bool temp_dynamic_odm_policy;
+   bool temp_subvp_policy;
enum dc_status ret = DC_ERROR_UNEXPECTED;
unsigned int i, j;
 
@@ -3615,6 +3647,9 @@ static bool commit_minimal_transition_state(struct dc *dc,
temp_dynamic_odm_policy = 
dc->debug.enable_single_display_2to1_odm_policy;
dc->debug.enable_single_display_2to1_odm_policy = false;
 
+   temp_subvp_policy = dc->debug.force_disable_subvp;
+   dc->debug.force_disable_subvp = true;
+
dc_resource_state_copy_construct(transition_base_context, 
transition_context);
 
//commit minimal state
@@ -3643,6 +3678,7 @@ static bool commit_minimal_transition_state(struct dc *dc,
dc->debug.pipe_split_policy = tmp_mpc_policy;
 
dc->debug.enable_single_display_2to1_odm_policy = 
temp_dynamic_odm_policy;
+   dc->debug.force_disable_subvp = temp_subvp_policy;
 
if (ret != DC_OK) {
/*this should never happen*/
diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
index 05de97ea855f..752a4accb116 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
@@ -1798,14 +1798,32 @@ bool dcn32_validate_bandwidth(struct dc *dc,
int vlevel = 0;
int pipe_cnt = 0;
display_e2e_pipe_params_st *pipes = kzalloc(dc->res_pool->pipe_count * 
sizeof(display_e2e_pipe_params_st), GFP_KERNEL);
+   struct mall_temp_config mall_temp_config;
DC_LOGGER_INIT(dc->ctx->logger);
 
+   /* For fast validation, there are situations where a shallow copy of
+    * the dc->current_state is created for the validation. In this case
+* we want to save and restore the mall config because we always
+* teardown subvp at the beginning of validation (and don't attempt
+* to add 

[PATCH 16/36] drm/amd/display: Increase compbuf size prior to updating clocks

2022-09-28 Thread Hamza Mahfooz
From: Dillon Varone 

[WHY?]
Clocks are updated based on the incoming context's support; however, the
new compbuf size is not programmed prior to updating clocks, which can
result in P-State hangs.

[HOW?]
Increase compbuf size prior to updating clocks.

Reviewed-by: Alvin Lee 
Reviewed-by: Martin Leung 
Acked-by: Hamza Mahfooz 
Signed-off-by: Dillon Varone 
---
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
index e1d271fe9e64..7de511fd004b 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
@@ -2018,6 +2018,10 @@ void dcn20_optimize_bandwidth(
context->bw_ctx.bw.dcn.clk.dramclk_khz <= 
dc->clk_mgr->bw_params->dc_mode_softmax_memclk * 1000)
dc->clk_mgr->funcs->set_max_memclk(dc->clk_mgr, 
dc->clk_mgr->bw_params->dc_mode_softmax_memclk);
 
+   /* increase compbuf size */
+   if (hubbub->funcs->program_compbuf_size)
+   hubbub->funcs->program_compbuf_size(hubbub, 
context->bw_ctx.bw.dcn.compbuf_size_kb, true);
+
dc->clk_mgr->funcs->update_clocks(
dc->clk_mgr,
context,
@@ -2033,9 +2037,6 @@ void dcn20_optimize_bandwidth(

pipe_ctx->dlg_regs.optimized_min_dst_y_next_start);
}
}
-   /* increase compbuf size */
-   if (hubbub->funcs->program_compbuf_size)
-   hubbub->funcs->program_compbuf_size(hubbub, 
context->bw_ctx.bw.dcn.compbuf_size_kb, true);
 }
 
 bool dcn20_update_bandwidth(
-- 
2.37.2



[PATCH 04/36] drm/amd/display: fix dcn315 dml detile overestimation

2022-09-28 Thread Hamza Mahfooz
From: Dmytro Laktyushkin 

DML does not account for the fact that DCN315 does not have enough
detile buffer to max out all pipes. This change adds a workaround that
applies the same logic DC uses when calculating the detile buffer size
in DML.
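
A worked example of the workaround's arithmetic, standalone, with
hypothetical numbers (a 1024 KB return buffer and three single-DPP
planes):

#include <stdio.h>

#define MIN_COMPBUF_SIZE_KB 128
#define MAX_DET_SIZE_KB     384

/* Mirror of the DML workaround below: reserve the minimum compbuf,
 * split the rest across pipes in 64 KB segments, clamp to the max. */
static unsigned int det_per_pipe_kb(unsigned int return_buf_kb, unsigned int pipes)
{
	unsigned int det = ((return_buf_kb - MIN_COMPBUF_SIZE_KB) / 64 / pipes) * 64;

	return det > MAX_DET_SIZE_KB ? MAX_DET_SIZE_KB : det;
}

int main(void)
{
	/* (1024 - 128) / 64 / 3 = 4 segments -> 4 * 64 = 256 KB per pipe */
	printf("%u KB\n", det_per_pipe_kb(1024, 3));
	return 0;
}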

Reviewed-by: Charlene Liu 
Acked-by: Hamza Mahfooz 
Signed-off-by: Dmytro Laktyushkin 
---
 .../gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c  |  2 +-
 .../display/dc/dml/dcn31/display_mode_vba_31.c| 15 +++
 .../gpu/drm/amd/display/dc/dml/display_mode_lib.c |  1 +
 .../gpu/drm/amd/display/dc/dml/display_mode_lib.h |  1 +
 4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c
index 5dbd363b275b..87bfc42bdaaf 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c
@@ -692,7 +692,7 @@ void dcn315_update_bw_bounding_box(struct dc *dc, struct 
clk_bw_params *bw_param
}
 
if (!IS_FPGA_MAXIMUS_DC(dc->ctx->dce_environment))
-   dml_init_instance(&dc->dml, &dcn3_15_soc, &dcn3_15_ip, DML_PROJECT_DCN31);
+   dml_init_instance(&dc->dml, &dcn3_15_soc, &dcn3_15_ip, DML_PROJECT_DCN315);
	else
		dml_init_instance(&dc->dml, &dcn3_15_soc, &dcn3_15_ip, DML_PROJECT_DCN31_FPGA);
 }
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c
index 8dfe639b6508..b612edb14417 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c
@@ -43,6 +43,8 @@
 #define BPP_BLENDED_PIPE 0x
 #define DCN31_MAX_DSC_IMAGE_WIDTH 5184
 #define DCN31_MAX_FMT_420_BUFFER_WIDTH 4096
+#define DCN3_15_MIN_COMPBUF_SIZE_KB 128
+#define DCN3_15_MAX_DET_SIZE 384
 
 // For DML-C changes that hasn't been propagated to VBA yet
 //#define __DML_VBA_ALLOW_DELTA__
@@ -3775,6 +3777,17 @@ static noinline void CalculatePrefetchSchedulePerPlane(
&v->VReadyOffsetPix[k]);
 }
 
+static void PatchDETBufferSizeInKByte(unsigned int NumberOfActivePlanes, int 
NoOfDPPThisState[], unsigned int config_return_buffer_size_in_kbytes, unsigned 
int *DETBufferSizeInKByte)
+{
+   int i, total_pipes = 0;
+   for (i = 0; i < NumberOfActivePlanes; i++)
+   total_pipes += NoOfDPPThisState[i];
+   *DETBufferSizeInKByte = ((config_return_buffer_size_in_kbytes - 
DCN3_15_MIN_COMPBUF_SIZE_KB) / 64 / total_pipes) * 64;
+   if (*DETBufferSizeInKByte > DCN3_15_MAX_DET_SIZE)
+   *DETBufferSizeInKByte = DCN3_15_MAX_DET_SIZE;
+}
+
+
 void dml31_ModeSupportAndSystemConfigurationFull(struct display_mode_lib 
*mode_lib)
 {
struct vba_vars_st *v = &mode_lib->vba;
@@ -4533,6 +4546,8 @@ void dml31_ModeSupportAndSystemConfigurationFull(struct 
display_mode_lib *mode_l
v->ODMCombineEnableThisState[k] = 
v->ODMCombineEnablePerState[i][k];
}
 
+   if (v->NumberOfActivePlanes > 1 && mode_lib->project == 
DML_PROJECT_DCN315)
+			PatchDETBufferSizeInKByte(v->NumberOfActivePlanes, v->NoOfDPPThisState, v->ip.config_return_buffer_size_in_kbytes, &v->DETBufferSizeInKByte[0]);
CalculateSwathAndDETConfiguration(
false,
v->NumberOfActivePlanes,
diff --git a/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.c 
b/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.c
index f5400eda07a5..4125d3d111d1 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.c
@@ -114,6 +114,7 @@ void dml_init_instance(struct display_mode_lib *lib,
break;
case DML_PROJECT_DCN31:
case DML_PROJECT_DCN31_FPGA:
+   case DML_PROJECT_DCN315:
lib->funcs = dml31_funcs;
break;
case DML_PROJECT_DCN314:
diff --git a/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.h 
b/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.h
index b1878a1440e2..3d643d50c3eb 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.h
+++ b/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.h
@@ -40,6 +40,7 @@ enum dml_project {
DML_PROJECT_DCN21,
DML_PROJECT_DCN30,
DML_PROJECT_DCN31,
+   DML_PROJECT_DCN315,
DML_PROJECT_DCN31_FPGA,
DML_PROJECT_DCN314,
DML_PROJECT_DCN32,
-- 
2.37.2



[PATCH 10/36] drm/amd/display: Disable GSL when enabling phantom pipe

2022-09-28 Thread Hamza Mahfooz
From: Alvin Lee 

[Description]
When enabling a phantom pipe on a pipe that was previously using
immediate flip, we have to disable GSL; otherwise the update will not
take place right away on the phantom pipe when we enable it in FW.

Reviewed-by: Jun Lei 
Acked-by: Hamza Mahfooz 
Signed-off-by: Alvin Lee 
---
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hubp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hubp.c 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hubp.c
index 2038cbda33f7..830562f4139d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hubp.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hubp.c
@@ -79,6 +79,8 @@ void hubp32_phantom_hubp_post_enable(struct hubp *hubp)
uint32_t reg_val;
struct dcn20_hubp *hubp2 = TO_DCN20_HUBP(hubp);
 
+   /* For phantom pipe enable, disable GSL */
+   REG_UPDATE(DCSURF_FLIP_CONTROL2, SURFACE_GSL_ENABLE, 0);
REG_UPDATE(DCHUBP_CNTL, HUBP_BLANK_EN, 1);
reg_val = REG_READ(DCHUBP_CNTL);
if (reg_val) {
-- 
2.37.2



[PATCH 05/36] drm/amd/display: Block SubVP if rotation being used

2022-09-28 Thread Hamza Mahfooz
From: Alvin Lee 

[Description]
- SubVP rotation support is not explicitly implemented,
  so block SubVP in rotation cases to avoid unexpected
  behaviors

Reviewed-by: Nevenko Stupar 
Reviewed-by: Jun Lei 
Acked-by: Hamza Mahfooz 
Signed-off-by: Alvin Lee 
---
 .../drm/amd/display/dc/dcn32/dcn32_resource.h   |  2 ++
 .../display/dc/dcn32/dcn32_resource_helpers.c   | 17 +
 .../drm/amd/display/dc/dml/dcn32/dcn32_fpu.c|  3 ++-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.h 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.h
index 55945cca2260..a24f538bdc4c 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.h
@@ -108,6 +108,8 @@ bool dcn32_subvp_in_use(struct dc *dc,
 
 bool dcn32_mpo_in_use(struct dc_state *context);
 
+bool dcn32_any_surfaces_rotated(struct dc *dc, struct dc_state *context);
+
 struct pipe_ctx *dcn32_acquire_idle_pipe_for_head_pipe_in_layer(
struct dc_state *state,
const struct resource_pool *pool,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c
index a2a70a1572b7..7f318ced5dee 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c
@@ -233,6 +233,23 @@ bool dcn32_mpo_in_use(struct dc_state *context)
return false;
 }
 
+
+bool dcn32_any_surfaces_rotated(struct dc *dc, struct dc_state *context)
+{
+   uint32_t i;
+
+   for (i = 0; i < dc->res_pool->pipe_count; i++) {
+		struct pipe_ctx *pipe = &context->res_ctx.pipe_ctx[i];
+
+   if (!pipe->stream)
+   continue;
+
+   if (pipe->plane_state && pipe->plane_state->rotation != 
ROTATION_ANGLE_0)
+   return true;
+   }
+   return false;
+}
+
 /**
  * 
***
  * dcn32_determine_det_override: Determine DET allocation for each pipe
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
index 0571700f53f9..a56ee04f7df9 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
@@ -1115,7 +1115,8 @@ static void dcn32_full_validate_bw_helper(struct dc *dc,
 * 5. (Config doesn't support MCLK in VACTIVE/VBLANK || 
dc->debug.force_subvp_mclk_switch)
 */
if (!dc->debug.force_disable_subvp && 
dcn32_all_pipes_have_stream_and_plane(dc, context) &&
-   !dcn32_mpo_in_use(context) && (*vlevel == 
context->bw_ctx.dml.soc.num_states ||
+   !dcn32_mpo_in_use(context) && !dcn32_any_surfaces_rotated(dc, 
context) &&
+   (*vlevel == context->bw_ctx.dml.soc.num_states ||
vba->DRAMClockChangeSupport[*vlevel][vba->maxMpcComb] == 
dm_dram_clock_change_unsupported ||
dc->debug.force_subvp_mclk_switch)) {
 
-- 
2.37.2



[PATCH 06/36] drm/amd/display: Allow PSR exit when panel is disconnected

2022-09-28 Thread Hamza Mahfooz
From: Iswara Nagulendran 

[HOW]
Fixed the check to only avoid PSR entry when the panel is disconnected.
PSR exit can be permitted to restore the HW to its non-PSR state.

Reviewed-by: Jayendran Ramani 
Acked-by: Hamza Mahfooz 
Signed-off-by: Iswara Nagulendran 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index 3529be5888c8..f9479c3ace97 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -3143,7 +3143,7 @@ bool dc_link_set_psr_allow_active(struct dc_link *link, 
const bool *allow_active
if (!dc_get_edp_link_panel_inst(dc, link, &panel_inst))
return false;
 
-   if (allow_active && link->type == dc_connection_none) {
+   if ((allow_active != NULL) && (*allow_active == true) && (link->type == 
dc_connection_none)) {
// Don't enter PSR if panel is not connected
return false;
}
-- 
2.37.2



[PATCH 03/36] drm/amd/display: add dummy pstate workaround to dcn315

2022-09-28 Thread Hamza Mahfooz
From: Dmytro Laktyushkin 

DCN315 has to always allow pstate change or SMU will hang. This
workaround achieves this by applying a low pstate change latency
to be used when pstate is calculated to be unsupported. This lower
latency only accounts for memory retraining; a previous change
handles locking in the highest available pstate allowing us to minimize
required latency hiding to only account for memory retraining.

Reviewed-by: Charlene Liu 
Acked-by: Hamza Mahfooz 
Signed-off-by: Dmytro Laktyushkin 
---
 .../drm/amd/display/dc/dcn30/dcn30_resource.c |  4 +
 .../amd/display/dc/dcn315/dcn315_resource.c   |  2 +-
 .../drm/amd/display/dc/dml/dcn31/dcn31_fpu.c  | 89 +--
 .../drm/amd/display/dc/dml/dcn31/dcn31_fpu.h  |  1 +
 4 files changed, 27 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
index 3a3b2ac791c7..020f512e9690 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
@@ -1655,6 +1655,9 @@ noinline bool dcn30_internal_validate_bw(
if (!pipes)
return false;
 
+   context->bw_ctx.dml.vba.maxMpcComb = 0;
+   context->bw_ctx.dml.vba.VoltageLevel = 0;
+   context->bw_ctx.dml.vba.DRAMClockChangeSupport[0][0] = 
dm_dram_clock_change_vactive;
dc->res_pool->funcs->update_soc_for_wm_a(dc, context);
pipe_cnt = dc->res_pool->funcs->populate_dml_pipes(dc, context, pipes, 
fast_validate);
 
@@ -1873,6 +1876,7 @@ noinline bool dcn30_internal_validate_bw(
 
if (repopulate_pipes)
pipe_cnt = dc->res_pool->funcs->populate_dml_pipes(dc, context, 
pipes, fast_validate);
+   context->bw_ctx.dml.vba.VoltageLevel = vlevel;
*vlevel_out = vlevel;
*pipe_cnt_out = pipe_cnt;
 
diff --git a/drivers/gpu/drm/amd/display/dc/dcn315/dcn315_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn315/dcn315_resource.c
index eebb42c9ddd6..07c59f8eefce 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn315/dcn315_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn315/dcn315_resource.c
@@ -1721,7 +1721,7 @@ static struct resource_funcs dcn315_res_pool_funcs = {
.panel_cntl_create = dcn31_panel_cntl_create,
.validate_bandwidth = dcn31_validate_bandwidth,
.calculate_wm_and_dlg = dcn31_calculate_wm_and_dlg,
-   .update_soc_for_wm_a = dcn31_update_soc_for_wm_a,
+   .update_soc_for_wm_a = dcn315_update_soc_for_wm_a,
.populate_dml_pipes = dcn315_populate_dml_pipes_from_context,
.acquire_idle_pipe_for_layer = dcn20_acquire_idle_pipe_for_layer,
.add_stream_to_ctx = dcn30_add_stream_to_ctx,
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c
index b6e99eefe869..5dbd363b275b 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c
@@ -292,6 +292,7 @@ static struct _vcs_dpi_soc_bounding_box_st dcn3_15_soc = {
.urgent_latency_adjustment_fabric_clock_component_us = 0,
.urgent_latency_adjustment_fabric_clock_reference_mhz = 0,
.num_chans = 4,
+   .dummy_pstate_latency_us = 10.0
 };
 
 struct _vcs_dpi_ip_params_st dcn3_16_ip = {
@@ -459,6 +460,23 @@ void dcn31_update_soc_for_wm_a(struct dc *dc, struct 
dc_state *context)
}
 }
 
+void dcn315_update_soc_for_wm_a(struct dc *dc, struct dc_state *context)
+{
+   dc_assert_fp_enabled();
+
+   if (dc->clk_mgr->bw_params->wm_table.entries[WM_A].valid) {
+   /* For 315 pstate change is only supported if possible in 
vactive */
+   if 
(context->bw_ctx.dml.vba.DRAMClockChangeSupport[context->bw_ctx.dml.vba.VoltageLevel][context->bw_ctx.dml.vba.maxMpcComb]
 != dm_dram_clock_change_vactive)
+   context->bw_ctx.dml.soc.dram_clock_change_latency_us = 
context->bw_ctx.dml.soc.dummy_pstate_latency_us;
+   else
+   context->bw_ctx.dml.soc.dram_clock_change_latency_us = 
dc->clk_mgr->bw_params->wm_table.entries[WM_A].pstate_latency_us;
+   context->bw_ctx.dml.soc.sr_enter_plus_exit_time_us =
+   
dc->clk_mgr->bw_params->wm_table.entries[WM_A].sr_enter_plus_exit_time_us;
+   context->bw_ctx.dml.soc.sr_exit_time_us =
+   
dc->clk_mgr->bw_params->wm_table.entries[WM_A].sr_exit_time_us;
+   }
+}
+
 void dcn31_calculate_wm_and_dlg_fp(
struct dc *dc, struct dc_state *context,
display_e2e_pipe_params_st *pipes,
@@ -486,72 +504,6 @@ void dcn31_calculate_wm_and_dlg_fp(
pipes[0].clks_cfg.dcfclk_mhz = dcfclk;
pipes[0].clks_cfg.socclk_mhz = 
context->bw_ctx.dml.soc.clock_limits[vlevel].socclk_mhz;
 
-#if 0 // TODO
-   /* Set B:
-* TODO
-*/
-   if 

[PATCH 02/36] drm/amd/display: Reorder FCLK P-state switch sequence for DCN32

2022-09-28 Thread Hamza Mahfooz
From: Dillon Varone 

[WHY?]
In some cases, DCFCLK hardmin requests are not acknowledged by SMU, as
the requested clock does not have a compatible ratio with the current
FCLK, and the FCLK cannot be changed because an FCLK P-state change is
not allowed.

[HOW?]
Allow the FCLK P-state change prior to changing the DCFCLK hardmin.

Reviewed-by: Alvin Lee 
Acked-by: Hamza Mahfooz 
Signed-off-by: Dillon Varone 
---
 .../display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c  | 44 ++-
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c 
b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c
index f0f3f66629cc..96d5e0d5b3ce 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c
@@ -333,6 +333,21 @@ static void dcn32_update_clocks(struct clk_mgr 
*clk_mgr_base,
if (enter_display_off == safe_to_lower)
dcn30_smu_set_num_of_displays(clk_mgr, display_count);
 
+   clk_mgr_base->clks.fclk_prev_p_state_change_support = 
clk_mgr_base->clks.fclk_p_state_change_support;
+
+   total_plane_count = clk_mgr_helper_get_active_plane_cnt(dc, 
context);
+   fclk_p_state_change_support = 
new_clocks->fclk_p_state_change_support || (total_plane_count == 0);
+
+   if (should_update_pstate_support(safe_to_lower, 
fclk_p_state_change_support, clk_mgr_base->clks.fclk_p_state_change_support)) {
+   clk_mgr_base->clks.fclk_p_state_change_support = 
fclk_p_state_change_support;
+
+			/* To enable FCLK P-state switching, send FCLK_PSTATE_SUPPORTED message to PMFW */
+   if (clk_mgr_base->ctx->dce_version != DCN_VERSION_3_21 
&& clk_mgr_base->clks.fclk_p_state_change_support && update_fclk) {
+   /* Handle the code for sending a message to 
PMFW that FCLK P-state change is supported */
+   dcn32_smu_send_fclk_pstate_message(clk_mgr, 
FCLK_PSTATE_SUPPORTED);
+   }
+   }
+
if (dc->debug.force_min_dcfclk_mhz > 0)
new_clocks->dcfclk_khz = (new_clocks->dcfclk_khz > 
(dc->debug.force_min_dcfclk_mhz * 1000)) ?
new_clocks->dcfclk_khz : 
(dc->debug.force_min_dcfclk_mhz * 1000);
@@ -352,7 +367,6 @@ static void dcn32_update_clocks(struct clk_mgr 
*clk_mgr_base,
clk_mgr_base->clks.socclk_khz = new_clocks->socclk_khz;
 
clk_mgr_base->clks.prev_p_state_change_support = 
clk_mgr_base->clks.p_state_change_support;
-   clk_mgr_base->clks.fclk_prev_p_state_change_support = 
clk_mgr_base->clks.fclk_p_state_change_support;
clk_mgr_base->clks.prev_num_ways = clk_mgr_base->clks.num_ways;
 
if (clk_mgr_base->clks.num_ways != new_clocks->num_ways &&
@@ -361,9 +375,8 @@ static void dcn32_update_clocks(struct clk_mgr 
*clk_mgr_base,
dcn32_smu_send_cab_for_uclk_message(clk_mgr, 
clk_mgr_base->clks.num_ways);
}
 
-   total_plane_count = clk_mgr_helper_get_active_plane_cnt(dc, 
context);
+
p_state_change_support = new_clocks->p_state_change_support || 
(total_plane_count == 0);
-   fclk_p_state_change_support = 
new_clocks->fclk_p_state_change_support || (total_plane_count == 0);
if (should_update_pstate_support(safe_to_lower, 
p_state_change_support, clk_mgr_base->clks.p_state_change_support)) {
clk_mgr_base->clks.p_state_change_support = 
p_state_change_support;
 
@@ -373,15 +386,14 @@ static void dcn32_update_clocks(struct clk_mgr 
*clk_mgr_base,

clk_mgr_base->bw_params->clk_table.entries[clk_mgr_base->bw_params->clk_table.num_entries
 - 1].memclk_mhz);
}
 
-   if (should_update_pstate_support(safe_to_lower, 
fclk_p_state_change_support, clk_mgr_base->clks.fclk_p_state_change_support) &&
-   clk_mgr_base->ctx->dce_version != 
DCN_VERSION_3_21) {
-   clk_mgr_base->clks.fclk_p_state_change_support = 
fclk_p_state_change_support;
+   /* Always update saved value, even if new value not set due to 
P-State switching unsupported. Also check safe_to_lower for FCLK */
+   if (safe_to_lower && 
(clk_mgr_base->clks.fclk_p_state_change_support != 
clk_mgr_base->clks.fclk_prev_p_state_change_support)) {
+   update_fclk = true;
+   }
 
-   /* To disable FCLK P-state switching, send 
FCLK_PSTATE_NOTSUPPORTED message to PMFW */
-   if (clk_mgr_base->ctx->dce_version != DCN_VERSION_3_21 
&& !clk_mgr_base->clks.fclk_p_state_change_support) {
-   /* Handle code for sending a message to PMFW 

[PATCH 01/36] drm/amd/display: Program SubVP in dc_commit_state_no_check

2022-09-28 Thread Hamza Mahfooz
From: Dillon Varone 

[Why?]
Currently SubVP programming is only done in commit_planes_for_stream, as
it was expected only this call would add/remove planes from a
display.

[How?]
Add SubVP programming to dc_commit_state_no_check.

Reviewed-by: Alvin Lee 
Acked-by: Hamza Mahfooz 
Signed-off-by: Dillon Varone 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 258ba5a872b1..2584cb8f44e2 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1734,10 +1734,20 @@ static enum dc_status dc_commit_state_no_check(struct 
dc *dc, struct dc_state *c
int i, k, l;
struct dc_stream_state *dc_streams[MAX_STREAMS] = {0};
struct dc_state *old_state;
+   bool subvp_prev_use = false;
 
dc_z10_restore(dc);
dc_allow_idle_optimizations(dc, false);
 
+   for (i = 0; i < dc->res_pool->pipe_count; i++) {
+		struct pipe_ctx *old_pipe = &dc->current_state->res_ctx.pipe_ctx[i];
+
+   /* Check old context for SubVP */
+   subvp_prev_use |= (old_pipe->stream && 
old_pipe->stream->mall_stream_config.type == SUBVP_PHANTOM);
+   if (subvp_prev_use)
+   break;
+   }
+
for (i = 0; i < context->stream_count; i++)
dc_streams[i] =  context->streams[i];
 
@@ -1777,6 +1787,9 @@ static enum dc_status dc_commit_state_no_check(struct dc 
*dc, struct dc_state *c
dc->hwss.wait_for_mpcc_disconnect(dc, dc->res_pool, pipe);
}
 
+   if (dc->hwss.subvp_pipe_control_lock)
+   dc->hwss.subvp_pipe_control_lock(dc, context, true, true, NULL, 
subvp_prev_use);
+
result = dc->hwss.apply_ctx_to_hw(dc, context);
 
if (result != DC_OK) {
@@ -1794,6 +1807,12 @@ static enum dc_status dc_commit_state_no_check(struct dc 
*dc, struct dc_state *c
dc->hwss.interdependent_update_lock(dc, context, false);
dc->hwss.post_unlock_program_front_end(dc, context);
}
+
+   if (dc->hwss.commit_subvp_config)
+   dc->hwss.commit_subvp_config(dc, context);
+   if (dc->hwss.subvp_pipe_control_lock)
+   dc->hwss.subvp_pipe_control_lock(dc, context, false, true, 
NULL, subvp_prev_use);
+
for (i = 0; i < context->stream_count; i++) {
const struct dc_link *link = context->streams[i]->link;
 
-- 
2.37.2



[PATCH 00/36] DC Patches September 26, 2022

2022-09-28 Thread Hamza Mahfooz
This DC patch-set brings improvements in multiple areas. In summary, we
highlight:

* ILR improvements;
* PSR fixes;
* DCN315 fixes;
* DCN32 fixes;
* ODM fixes;
* DSC fixes;
* SubVP fixes.

Cc: Daniel Wheeler 

Alvin Lee (3):
  drm/amd/display: Block SubVP if rotation being used
  drm/amd/display: Disable GSL when enabling phantom pipe
  drm/amd/display: For SubVP pipe split case use min transition into MPO

Aric Cyr (3):
  Revert "drm/amd/display: correct hostvm flag"
  drm/amd/display: Fix vupdate and vline position calculation
  drm/amd/display: 3.2.206

Charlene Liu (1):
  drm/amd/display: prevent S4 test from failing

Dillon Varone (4):
  drm/amd/display: Program SubVP in dc_commit_state_no_check
  drm/amd/display: Reorder FCLK P-state switch sequence for DCN32
  drm/amd/display: Increase compbuf size prior to updating clocks
  drm/amd/display: Fix merging dynamic ODM+MPO configs on DCN32

Dmytro Laktyushkin (2):
  drm/amd/display: add dummy pstate workaround to dcn315
  drm/amd/display: fix dcn315 dml detile overestimation

Eric Bernstein (1):
  drm/amd/display: Fix disable DSC logic in the DIO code

George Shen (1):
  drm/amd/display: Add missing SDP registers to DCN32 reglist

Ian Chen (1):
  drm/amd/display: Refactor edp ILR caps codes

Iswara Nagulendran (1):
  drm/amd/display: Allow PSR exit when panel is disconnected

Leo (Hanghong) Ma (1):
  drm/amd/display: AUX tracing cleanup

Leo Chen (1):
  drm/amd/display: Add log for LTTPR

Leung, Martin (1):
  drm/amd/display: unblock mcm_luts

Lewis Huang (1):
  drm/amd/display: Keep OTG on when Z10 is disabled

Martin Leung (1):
  drm/amd/display: block odd h_total timings from halving pixel rate

Rodrigo Siqueira (10):
  drm/amd/display: Drop unused code for DCN32/321
  drm/amd/display: Update DCN321 hook that deals with pipe acquire
  drm/amd/display: Fix SubVP control flow in the MPO context
  drm/amd/display: Remove OPTC lock check
  drm/amd/display: Adding missing HDMI ACP SEND register
  drm/amd/display: Add PState change high hook for DCN32
  drm/amd/display: Enable 2 to 1 ODM policy if supported
  drm/amd/display: Disconnect DSC for unused pipes during ODM transition
  drm/amd/display: update DSC for DCN32
  drm/amd/display: Minor code style change

Wenjing Liu (3):
  drm/amd/display: fix integer overflow during MSA V_Freq calculation
  drm/amd/display: write all 4 bytes of FFE_PRESET dpcd value
  drm/amd/display: Add missing mask sh for SYM32_TP_SQ_PULSE register

Zhikai Zhai (1):
  drm/amd/display: skip commit minimal transition state

 .../display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c  | 44 -
 drivers/gpu/drm/amd/display/dc/core/dc.c  | 91 ++-
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 11 ++-
 .../gpu/drm/amd/display/dc/core/dc_link_dp.c  | 70 --
 drivers/gpu/drm/amd/display/dc/dc.h   |  3 +-
 drivers/gpu/drm/amd/display/dc/dc_link.h  |  4 +
 drivers/gpu/drm/amd/display/dc/dce/dce_aux.c  | 13 +--
 .../amd/display/dc/dcn10/dcn10_hw_sequencer.c | 60 +---
 .../gpu/drm/amd/display/dc/dcn10/dcn10_optc.c | 11 ---
 .../gpu/drm/amd/display/dc/dcn10/dcn10_optc.h |  1 -
 .../drm/amd/display/dc/dcn20/dcn20_hwseq.c| 25 +
 .../drm/amd/display/dc/dcn21/dcn21_hubbub.c   |  8 +-
 .../drm/amd/display/dc/dcn21/dcn21_resource.c | 13 ++-
 .../gpu/drm/amd/display/dc/dcn30/dcn30_optc.c |  1 -
 .../drm/amd/display/dc/dcn30/dcn30_resource.c |  4 +
 .../dc/dcn31/dcn31_hpo_dp_stream_encoder.c|  4 +-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.c |  1 -
 .../drm/amd/display/dc/dcn31/dcn31_resource.c | 15 ++-
 .../amd/display/dc/dcn314/dcn314_resource.c   | 13 ++-
 .../amd/display/dc/dcn315/dcn315_resource.c   | 15 ++-
 .../amd/display/dc/dcn316/dcn316_resource.c   | 13 ++-
 .../display/dc/dcn32/dcn32_dio_link_encoder.c |  7 --
 .../display/dc/dcn32/dcn32_dio_link_encoder.h |  4 -
 .../dc/dcn32/dcn32_dio_stream_encoder.c   | 57 +++-
 .../dc/dcn32/dcn32_dio_stream_encoder.h   |  3 +
 .../dc/dcn32/dcn32_hpo_dp_link_encoder.h  |  1 +
 .../drm/amd/display/dc/dcn32/dcn32_hubbub.c   |  1 +
 .../gpu/drm/amd/display/dc/dcn32/dcn32_hubp.c |  5 +-
 .../drm/amd/display/dc/dcn32/dcn32_hwseq.c| 37 +---
 .../drm/amd/display/dc/dcn32/dcn32_resource.c | 24 +
 .../drm/amd/display/dc/dcn32/dcn32_resource.h | 22 +
 .../display/dc/dcn32/dcn32_resource_helpers.c | 88 ++
 .../dc/dcn321/dcn321_dio_link_encoder.c   |  1 -
 .../amd/display/dc/dcn321/dcn321_resource.c   |  4 +-
 .../drm/amd/display/dc/dml/dcn31/dcn31_fpu.c  | 91 +--
 .../drm/amd/display/dc/dml/dcn31/dcn31_fpu.h  |  1 +
 .../dc/dml/dcn31/display_mode_vba_31.c| 15 +++
 .../drm/amd/display/dc/dml/dcn32/dcn32_fpu.c  |  8 +-
 .../dc/dml/dcn32/display_mode_vba_32.c| 19 ++--
 .../drm/amd/display/dc/dml/display_mode_lib.c |  1 +
 .../drm/amd/display/dc/dml/display_mode_lib.h |  1 +
 .../gpu/drm/amd/display/dc/inc/core_types.h   |  2 +-
 

Re: [PATCH 09/10] drm/amdgpu/gfx10: switch to amdgpu_gfx_rlc_init_microcode

2022-09-28 Thread Alex Deucher
On Wed, Sep 28, 2022 at 3:24 PM Dmitry Osipenko
 wrote:
>
> On 9/28/22 20:47, Dmitry Osipenko wrote:
> > On 9/28/22 20:44, Deucher, Alexander wrote:
> >> [AMD Official Use Only - General]
> >>
> >> This should be fixed in a backwards compatible way with this patch:
> >> https://patchwork.freedesktop.org/patch/504869/
> >
> > Good to know that it's already addressed, thank you very much for the
> > quick reply.
>
> Unfortunately, that patch doesn't help beige goby. Please fix.

https://patchwork.freedesktop.org/patch/505248/

Alex

>
> --
> Best regards,
> Dmitry
>


[PATCH] drm/amdgpu/gfx10: ignore rlc ucode validation

2022-09-28 Thread Alex Deucher
There are apparently ucode versions in the wild with incorrect
sizes specified in the header.  We never checked this before,
so don't start now.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2170
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 18809c3da178..af94ac580d3e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4061,9 +4061,14 @@ static int gfx_v10_0_init_microcode(struct amdgpu_device 
*adev)
err = request_firmware(>gfx.rlc_fw, fw_name, adev->dev);
if (err)
goto out;
+   /* don't check this.  There are apparently firmwares in the 
wild with
+* incorrect size in the header
+*/
err = amdgpu_ucode_validate(adev->gfx.rlc_fw);
if (err)
-   goto out;
+   dev_dbg(adev->dev,
+   "gfx10: amdgpu_ucode_validate() failed 
\"%s\"\n",
+   fw_name);
rlc_hdr = (const struct rlc_firmware_header_v2_0 
*)adev->gfx.rlc_fw->data;
version_major = 
le16_to_cpu(rlc_hdr->header.header_version_major);
version_minor = 
le16_to_cpu(rlc_hdr->header.header_version_minor);
-- 
2.37.3



[PATCH v6 07/21] drm/omapdrm: Prepare to dynamic dma-buf locking specification

2022-09-28 Thread Dmitry Osipenko
Prepare the OMAP DRM driver for the common dynamic dma-buf locking convention
by starting to use the unlocked versions of dma-buf API functions.

Acked-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c 
b/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c
index 393f82e26927..8e194dbc9506 100644
--- a/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c
+++ b/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c
@@ -125,7 +125,7 @@ struct drm_gem_object *omap_gem_prime_import(struct 
drm_device *dev,
 
get_dma_buf(dma_buf);
 
-   sgt = dma_buf_map_attachment(attach, DMA_TO_DEVICE);
+   sgt = dma_buf_map_attachment_unlocked(attach, DMA_TO_DEVICE);
if (IS_ERR(sgt)) {
ret = PTR_ERR(sgt);
goto fail_detach;
@@ -142,7 +142,7 @@ struct drm_gem_object *omap_gem_prime_import(struct 
drm_device *dev,
return obj;
 
 fail_unmap:
-   dma_buf_unmap_attachment(attach, sgt, DMA_TO_DEVICE);
+   dma_buf_unmap_attachment_unlocked(attach, sgt, DMA_TO_DEVICE);
 fail_detach:
dma_buf_detach(dma_buf, attach);
dma_buf_put(dma_buf);
-- 
2.37.3



[PATCH v6 05/21] drm/armada: Prepare to dynamic dma-buf locking specification

2022-09-28 Thread Dmitry Osipenko
Prepare the Armada driver for the common dynamic dma-buf locking convention
by starting to use the unlocked versions of dma-buf API functions.

Acked-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
 drivers/gpu/drm/armada/armada_gem.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/armada/armada_gem.c 
b/drivers/gpu/drm/armada/armada_gem.c
index 5430265ad458..26d10065d534 100644
--- a/drivers/gpu/drm/armada/armada_gem.c
+++ b/drivers/gpu/drm/armada/armada_gem.c
@@ -66,8 +66,8 @@ void armada_gem_free_object(struct drm_gem_object *obj)
if (dobj->obj.import_attach) {
/* We only ever display imported data */
if (dobj->sgt)
-   dma_buf_unmap_attachment(dobj->obj.import_attach,
-dobj->sgt, DMA_TO_DEVICE);
+   
dma_buf_unmap_attachment_unlocked(dobj->obj.import_attach,
+ dobj->sgt, 
DMA_TO_DEVICE);
drm_prime_gem_destroy(&dobj->obj, NULL);
}
 
@@ -539,8 +539,8 @@ int armada_gem_map_import(struct armada_gem_object *dobj)
 {
int ret;
 
-   dobj->sgt = dma_buf_map_attachment(dobj->obj.import_attach,
-  DMA_TO_DEVICE);
+   dobj->sgt = dma_buf_map_attachment_unlocked(dobj->obj.import_attach,
+   DMA_TO_DEVICE);
if (IS_ERR(dobj->sgt)) {
ret = PTR_ERR(dobj->sgt);
dobj->sgt = NULL;
-- 
2.37.3



[PATCH v6 19/21] dma-buf: Document dynamic locking convention

2022-09-28 Thread Dmitry Osipenko
Add documentation for the dynamic locking convention. The documentation
tells dma-buf API users when they should take the reservation lock and
when not.
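
For illustration, an importer that follows the documented rules would look
roughly like the sketch below. This is hypothetical example code, not part
of the patch; the example_attach_and_map() name is made up.

	#include <linux/dma-buf.h>
	#include <linux/dma-resv.h>
	#include <linux/err.h>

	/* Attach/detach are called unlocked (importer rule 2), while the
	 * mapping is done with the reservation held (importer rule 1).
	 */
	static struct sg_table *example_attach_and_map(struct dma_buf *dmabuf,
						       struct device *dev)
	{
		struct dma_buf_attachment *attach;
		struct sg_table *sgt;

		attach = dma_buf_attach(dmabuf, dev);
		if (IS_ERR(attach))
			return ERR_CAST(attach);

		dma_resv_lock(dmabuf->resv, NULL);
		sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
		dma_resv_unlock(dmabuf->resv);

		if (IS_ERR(sgt))
			dma_buf_detach(dmabuf, attach);	/* unlocked again */

		return sgt;
	}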

Acked-by: Sumit Semwal 
Reviewed-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
 Documentation/driver-api/dma-buf.rst |  6 +++
 drivers/dma-buf/dma-buf.c| 64 
 2 files changed, 70 insertions(+)

diff --git a/Documentation/driver-api/dma-buf.rst 
b/Documentation/driver-api/dma-buf.rst
index 36a76cbe9095..622b8156d212 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -119,6 +119,12 @@ DMA Buffer ioctls
 
 .. kernel-doc:: include/uapi/linux/dma-buf.h
 
+DMA-BUF locking convention
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: drivers/dma-buf/dma-buf.c
+   :doc: locking convention
+
 Kernel Functions and Structures Reference
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 2452b4c82584..e04d504441a5 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -795,6 +795,70 @@ static struct sg_table * __map_dma_buf(struct 
dma_buf_attachment *attach,
return sg_table;
 }
 
+/**
+ * DOC: locking convention
+ *
+ * In order to avoid deadlock situations between dma-buf exports and importers,
+ * all dma-buf API users must follow the common dma-buf locking convention.
+ *
+ * Convention for importers
+ *
+ * 1. Importers must hold the dma-buf reservation lock when calling these
+ *    functions:
+ *
+ * - dma_buf_pin()
+ * - dma_buf_unpin()
+ * - dma_buf_map_attachment()
+ * - dma_buf_unmap_attachment()
+ * - dma_buf_vmap()
+ * - dma_buf_vunmap()
+ *
+ * 2. Importers must not hold the dma-buf reservation lock when calling these
+ *    functions:
+ *
+ * - dma_buf_attach()
+ * - dma_buf_dynamic_attach()
+ * - dma_buf_detach()
+ * - dma_buf_export()
+ * - dma_buf_fd()
+ * - dma_buf_get()
+ * - dma_buf_put()
+ * - dma_buf_mmap()
+ * - dma_buf_begin_cpu_access()
+ * - dma_buf_end_cpu_access()
+ * - dma_buf_map_attachment_unlocked()
+ * - dma_buf_unmap_attachment_unlocked()
+ * - dma_buf_vmap_unlocked()
+ * - dma_buf_vunmap_unlocked()
+ *
+ * Convention for exporters
+ *
+ * 1. These &dma_buf_ops callbacks are invoked with unlocked dma-buf
+ *    reservation and exporter can take the lock:
+ *
+ * - &dma_buf_ops.attach()
+ * - &dma_buf_ops.detach()
+ * - &dma_buf_ops.release()
+ * - &dma_buf_ops.begin_cpu_access()
+ * - &dma_buf_ops.end_cpu_access()
+ *
+ * 2. These &dma_buf_ops callbacks are invoked with locked dma-buf
+ *    reservation and exporter can't take the lock:
+ *
+ * - &dma_buf_ops.pin()
+ * - &dma_buf_ops.unpin()
+ * - &dma_buf_ops.map_dma_buf()
+ * - &dma_buf_ops.unmap_dma_buf()
+ * - &dma_buf_ops.mmap()
+ * - &dma_buf_ops.vmap()
+ * - &dma_buf_ops.vunmap()
+ *
+ * 3. Exporters must hold the dma-buf reservation lock when calling these
+ *    functions:
+ *
+ * - dma_buf_move_notify()
+ */
+
 /**
  * dma_buf_dynamic_attach - Add the device to dma_buf's attachments list
  * @dmabuf:[in]buffer to attach device to.
-- 
2.37.3



[PATCH v6 08/21] drm/tegra: Prepare to dynamic dma-buf locking specification

2022-09-28 Thread Dmitry Osipenko
Prepare the Tegra DRM driver for the common dynamic dma-buf locking convention
by starting to use the unlocked versions of dma-buf API functions.

Acked-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
 drivers/gpu/drm/tegra/gem.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
index 81991090adcc..b09b8ab40ae4 100644
--- a/drivers/gpu/drm/tegra/gem.c
+++ b/drivers/gpu/drm/tegra/gem.c
@@ -84,7 +84,7 @@ static struct host1x_bo_mapping *tegra_bo_pin(struct device 
*dev, struct host1x_
goto free;
}
 
-   map->sgt = dma_buf_map_attachment(map->attach, direction);
+   map->sgt = dma_buf_map_attachment_unlocked(map->attach, 
direction);
if (IS_ERR(map->sgt)) {
dma_buf_detach(buf, map->attach);
err = PTR_ERR(map->sgt);
@@ -160,7 +160,8 @@ static struct host1x_bo_mapping *tegra_bo_pin(struct device 
*dev, struct host1x_
 static void tegra_bo_unpin(struct host1x_bo_mapping *map)
 {
if (map->attach) {
-   dma_buf_unmap_attachment(map->attach, map->sgt, map->direction);
+   dma_buf_unmap_attachment_unlocked(map->attach, map->sgt,
+ map->direction);
dma_buf_detach(map->attach->dmabuf, map->attach);
} else {
dma_unmap_sgtable(map->dev, map->sgt, map->direction, 0);
@@ -181,7 +182,7 @@ static void *tegra_bo_mmap(struct host1x_bo *bo)
if (obj->vaddr) {
return obj->vaddr;
} else if (obj->gem.import_attach) {
-   ret = dma_buf_vmap(obj->gem.import_attach->dmabuf, &map);
+   ret = dma_buf_vmap_unlocked(obj->gem.import_attach->dmabuf, &map);
return ret ? NULL : map.vaddr;
} else {
return vmap(obj->pages, obj->num_pages, VM_MAP,
@@ -197,7 +198,7 @@ static void tegra_bo_munmap(struct host1x_bo *bo, void 
*addr)
if (obj->vaddr)
return;
else if (obj->gem.import_attach)
-   dma_buf_vunmap(obj->gem.import_attach->dmabuf, &map);
+   dma_buf_vunmap_unlocked(obj->gem.import_attach->dmabuf, &map);
else
vunmap(addr);
 }
@@ -461,7 +462,7 @@ static struct tegra_bo *tegra_bo_import(struct drm_device 
*drm,
 
get_dma_buf(buf);
 
-   bo->sgt = dma_buf_map_attachment(attach, DMA_TO_DEVICE);
+   bo->sgt = dma_buf_map_attachment_unlocked(attach, DMA_TO_DEVICE);
if (IS_ERR(bo->sgt)) {
err = PTR_ERR(bo->sgt);
goto detach;
@@ -479,7 +480,7 @@ static struct tegra_bo *tegra_bo_import(struct drm_device 
*drm,
 
 detach:
if (!IS_ERR_OR_NULL(bo->sgt))
-   dma_buf_unmap_attachment(attach, bo->sgt, DMA_TO_DEVICE);
+   dma_buf_unmap_attachment_unlocked(attach, bo->sgt, 
DMA_TO_DEVICE);
 
dma_buf_detach(buf, attach);
dma_buf_put(buf);
@@ -508,8 +509,8 @@ void tegra_bo_free_object(struct drm_gem_object *gem)
tegra_bo_iommu_unmap(tegra, bo);
 
if (gem->import_attach) {
-   dma_buf_unmap_attachment(gem->import_attach, bo->sgt,
-DMA_TO_DEVICE);
+   dma_buf_unmap_attachment_unlocked(gem->import_attach, bo->sgt,
+ DMA_TO_DEVICE);
drm_prime_gem_destroy(gem, NULL);
} else {
tegra_bo_free(gem->dev, bo);
-- 
2.37.3



[PATCH v6 20/21] media: videobuf2: Stop using internal dma-buf lock

2022-09-28 Thread Dmitry Osipenko
All drivers that use dma-bufs have been moved to the updated locking
specification and now dma-buf reservation is guaranteed to be locked
by importers during the mapping operations. There is no need to take
the internal dma-buf lock anymore. Remove locking from the videobuf2
memory allocators.
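
With the convention enforced, an exporter's map callback can simply rely
on the importer holding the reservation. A minimal hypothetical sketch
(the example_ops_map() name is made up, the body is a placeholder):

	#include <linux/dma-buf.h>
	#include <linux/dma-resv.h>
	#include <linux/err.h>

	static struct sg_table *
	example_ops_map(struct dma_buf_attachment *db_attach,
			enum dma_data_direction dma_dir)
	{
		/* held by the importer; no private mutex needed anymore */
		dma_resv_assert_held(db_attach->dmabuf->resv);

		/* look up or create the cached sg_table here */
		return ERR_PTR(-EOPNOTSUPP);
	}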

Acked-by: Tomasz Figa 
Acked-by: Hans Verkuil 
Acked-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
 drivers/media/common/videobuf2/videobuf2-dma-contig.c | 11 +--
 drivers/media/common/videobuf2/videobuf2-dma-sg.c | 11 +--
 drivers/media/common/videobuf2/videobuf2-vmalloc.c| 11 +--
 3 files changed, 3 insertions(+), 30 deletions(-)

diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c 
b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index 79f4d8301fbb..555bd40fa472 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -382,18 +382,12 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
struct dma_buf_attachment *db_attach, enum dma_data_direction dma_dir)
 {
struct vb2_dc_attachment *attach = db_attach->priv;
-   /* stealing dmabuf mutex to serialize map/unmap operations */
-   struct mutex *lock = &db_attach->dmabuf->lock;
struct sg_table *sgt;
 
-   mutex_lock(lock);
-
sgt = &attach->sgt;
/* return previously mapped sg table */
-   if (attach->dma_dir == dma_dir) {
-   mutex_unlock(lock);
+   if (attach->dma_dir == dma_dir)
return sgt;
-   }
 
/* release any previous cache */
if (attach->dma_dir != DMA_NONE) {
@@ -409,14 +403,11 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
if (dma_map_sgtable(db_attach->dev, sgt, dma_dir,
DMA_ATTR_SKIP_CPU_SYNC)) {
pr_err("failed to map scatterlist\n");
-   mutex_unlock(lock);
return ERR_PTR(-EIO);
}
 
attach->dma_dir = dma_dir;
 
-   mutex_unlock(lock);
-
return sgt;
 }
 
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c 
b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
index 36ecdea8d707..36981a5b5c53 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
@@ -424,18 +424,12 @@ static struct sg_table *vb2_dma_sg_dmabuf_ops_map(
struct dma_buf_attachment *db_attach, enum dma_data_direction dma_dir)
 {
struct vb2_dma_sg_attachment *attach = db_attach->priv;
-   /* stealing dmabuf mutex to serialize map/unmap operations */
-   struct mutex *lock = &db_attach->dmabuf->lock;
struct sg_table *sgt;
 
-   mutex_lock(lock);
-
sgt = &attach->sgt;
/* return previously mapped sg table */
-   if (attach->dma_dir == dma_dir) {
-   mutex_unlock(lock);
+   if (attach->dma_dir == dma_dir)
return sgt;
-   }
 
/* release any previous cache */
if (attach->dma_dir != DMA_NONE) {
@@ -446,14 +440,11 @@ static struct sg_table *vb2_dma_sg_dmabuf_ops_map(
/* mapping to the client with new direction */
if (dma_map_sgtable(db_attach->dev, sgt, dma_dir, 0)) {
pr_err("failed to map scatterlist\n");
-   mutex_unlock(lock);
return ERR_PTR(-EIO);
}
 
attach->dma_dir = dma_dir;
 
-   mutex_unlock(lock);
-
return sgt;
 }
 
diff --git a/drivers/media/common/videobuf2/videobuf2-vmalloc.c 
b/drivers/media/common/videobuf2/videobuf2-vmalloc.c
index 7831bf545874..41db707e43a4 100644
--- a/drivers/media/common/videobuf2/videobuf2-vmalloc.c
+++ b/drivers/media/common/videobuf2/videobuf2-vmalloc.c
@@ -267,18 +267,12 @@ static struct sg_table *vb2_vmalloc_dmabuf_ops_map(
struct dma_buf_attachment *db_attach, enum dma_data_direction dma_dir)
 {
struct vb2_vmalloc_attachment *attach = db_attach->priv;
-   /* stealing dmabuf mutex to serialize map/unmap operations */
-   struct mutex *lock = &db_attach->dmabuf->lock;
struct sg_table *sgt;
 
-   mutex_lock(lock);
-
sgt = &attach->sgt;
/* return previously mapped sg table */
-   if (attach->dma_dir == dma_dir) {
-   mutex_unlock(lock);
+   if (attach->dma_dir == dma_dir)
return sgt;
-   }
 
/* release any previous cache */
if (attach->dma_dir != DMA_NONE) {
@@ -289,14 +283,11 @@ static struct sg_table *vb2_vmalloc_dmabuf_ops_map(
/* mapping to the client with new direction */
if (dma_map_sgtable(db_attach->dev, sgt, dma_dir, 0)) {
pr_err("failed to map scatterlist\n");
-   mutex_unlock(lock);
return ERR_PTR(-EIO);
}
 
attach->dma_dir = dma_dir;
 
-   mutex_unlock(lock);
-
return sgt;
 }
 
-- 
2.37.3



[PATCH v6 11/21] misc: fastrpc: Prepare to dynamic dma-buf locking specification

2022-09-28 Thread Dmitry Osipenko
Prepare fastrpc for the common dynamic dma-buf locking convention by
starting to use the unlocked versions of dma-buf API functions.

Acked-by: Christian König 
Acked-by: Srinivas Kandagatla 
Signed-off-by: Dmitry Osipenko 
---
 drivers/misc/fastrpc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index 7ff0b63c25e3..1ad580865525 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -310,8 +310,8 @@ static void fastrpc_free_map(struct kref *ref)
return;
}
}
-   dma_buf_unmap_attachment(map->attach, map->table,
-DMA_BIDIRECTIONAL);
+   dma_buf_unmap_attachment_unlocked(map->attach, map->table,
+ DMA_BIDIRECTIONAL);
dma_buf_detach(map->buf, map->attach);
dma_buf_put(map->buf);
}
@@ -726,7 +726,7 @@ static int fastrpc_map_create(struct fastrpc_user *fl, int 
fd,
goto attach_err;
}
 
-   map->table = dma_buf_map_attachment(map->attach, DMA_BIDIRECTIONAL);
+   map->table = dma_buf_map_attachment_unlocked(map->attach, 
DMA_BIDIRECTIONAL);
if (IS_ERR(map->table)) {
err = PTR_ERR(map->table);
goto map_err;
-- 
2.37.3



[PATCH v6 18/21] dma-buf: Move dma_buf_mmap() to dynamic locking specification

2022-09-28 Thread Dmitry Osipenko
Move dma_buf_mmap() function to the dynamic locking specification by
taking the reservation lock. None of today's drivers take the
reservation lock within the mmap() callback, hence it's safe to enforce
the locking.
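
For exporters this means the ->mmap() callback is now entered with the
reservation already held by dma_buf_mmap(). A hypothetical exporter
sketch (example_exporter_mmap() is a made-up name):

	#include <linux/dma-buf.h>
	#include <linux/dma-resv.h>
	#include <linux/mm.h>

	static int example_exporter_mmap(struct dma_buf *dmabuf,
					 struct vm_area_struct *vma)
	{
		/* taken by the dma_buf_mmap() core, must not be re-taken */
		dma_resv_assert_held(dmabuf->resv);

		/* set up vma->vm_ops / insert pages without locking resv */
		return 0;
	}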

Acked-by: Sumit Semwal 
Acked-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
 drivers/dma-buf/dma-buf.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index bff5a70b8735..2452b4c82584 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1390,6 +1390,8 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_end_cpu_access, DMA_BUF);
 int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
 unsigned long pgoff)
 {
+   int ret;
+
if (WARN_ON(!dmabuf || !vma))
return -EINVAL;
 
@@ -1410,7 +1412,11 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct 
vm_area_struct *vma,
vma_set_file(vma, dmabuf->file);
vma->vm_pgoff = pgoff;
 
-   return dmabuf->ops->mmap(dmabuf, vma);
+   dma_resv_lock(dmabuf->resv, NULL);
+   ret = dmabuf->ops->mmap(dmabuf, vma);
+   dma_resv_unlock(dmabuf->resv);
+
+   return ret;
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_mmap, DMA_BUF);
 
-- 
2.37.3



Re: [PATCH 09/10] drm/amdgpu/gfx10: switch to amdgpu_gfx_rlc_init_microcode

2022-09-28 Thread Dmitry Osipenko
On 9/28/22 20:47, Dmitry Osipenko wrote:
> On 9/28/22 20:44, Deucher, Alexander wrote:
>> [AMD Official Use Only - General]
>>
>> This should be fixed in a backwards compatible way with this patch:
>> https://patchwork.freedesktop.org/patch/504869/
> 
> Good to know that it's already addressed, thank you very much for the
> quick reply.

Unfortunately, that patch doesn't help beige goby. Please fix.

-- 
Best regards,
Dmitry



[PATCH v6 00/21] Move all drivers to a common dma-buf locking convention

2022-09-28 Thread Dmitry Osipenko
Hello,

This series moves all drivers to a dynamic dma-buf locking specification.
From now on, all dma-buf importers are responsible for holding the
dma-buf reservation lock around all operations performed on dma-bufs, in
accordance with the locking specification. This allows us to use the
reservation lock more broadly around the kernel without fear of potential
deadlocks.
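
As a rough sketch, the importer-side pattern now looks like this
(hypothetical example code, not taken from the series):

	#include <linux/dma-buf.h>
	#include <linux/dma-resv.h>

	static struct sg_table *example_map(struct dma_buf_attachment *attach)
	{
		struct sg_table *sgt;

		/* the importer takes the reservation lock itself... */
		dma_resv_lock(attach->dmabuf->resv, NULL);
		sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
		dma_resv_unlock(attach->dmabuf->resv);

		return sgt;
	}

	static struct sg_table *
	example_map_simple(struct dma_buf_attachment *attach)
	{
		/* ...or uses the _unlocked wrappers added by this series */
		return dma_buf_map_attachment_unlocked(attach,
						       DMA_BIDIRECTIONAL);
	}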

This patchset passes all i915 selftests. It was also tested using VirtIO,
Panfrost, Lima, Tegra, udmabuf, AMDGPU and Nouveau drivers. I tested cases
of display+GPU, display+V4L and GPU+V4L dma-buf sharing (where appropriate),
which covers the majority of kernel drivers, since the rest share the
same or similar code paths.

Changelog:

v6: - Added r-b from Michael Ruhl to the i915 patch.

- Added acks from Sumit Semwal and updated the commit message of the
  "Move dma_buf_vmap() to dynamic locking specification" patch, as
  suggested by Sumit.

- Added "!dmabuf" check to dma_buf_vmap_unlocked() to match the locked
  variant of the function, for consistency.

v5: - Added acks and r-bs that were given to v4.

- Changed the i915 preparation patch as suggested by Michael Ruhl.
  The scope of reservation locking is smaller now.

v4: - Added dma_buf_mmap() to the "locking convention" documentation,
  which was missed by accident in v3.

- Added acks from Christian König, Tomasz Figa and Hans Verkuil that
  they gave to couple v3 patches.

- Dropped the "_unlocked" postfix from function names that don't have
  the locked variant, as was requested by Christian König.

- Factored out the per-driver preparations into separate patches
  to ease reviewing of the changes, which is now doable without the
  global dma-buf functions renaming.

- Factored out the dynamic locking convention enforcements into separate
  patches which add the final dma_resv_assert_held(dmabuf->resv) to the
  dma-buf API functions.

v3: - Factored out dma_buf_mmap_unlocked() and attachment functions
  into separate patches, as suggested by Christian König.

- Corrected and factored out dma-buf locking documentation into
  a separate patch, as suggested by Christian König.

- The Intel driver dropped the reservation locking a few days ago from
  its BO-release code path, but we need that locking for the imported
  GEMs because in the end that code path unmaps the imported GEM.
  So I added back the locking needed by the imported GEMs, updating
  the "dma-buf attachment locking specification" patch appropriately.

- Tested Nouveau+Intel dma-buf import/export combo.

- Tested udmabuf import to i915/Nouveau/AMDGPU.

- Fixed a few places in the Etnaviv, Panfrost and Lima drivers that I missed
  when switching to locked dma-buf vmapping in the "drm/gem: Take reservation
  lock for vmap/vunmap operations" patch. As a result, the r-b that Christian
  gave to v2 was invalidated.

- Added locked dma-buf vmap/vunmap functions that are needed for fixing
  vmapping of the Etnaviv, Panfrost and Lima drivers mentioned above.
  I actually had this change stashed for the drm-shmem shrinker patchset,
  but then realized that it's already needed by the dma-buf patches.
  Also improved my tests to better cover these code paths.

v2: - Changed the locking specification to avoid problems with cross-driver
  ww locking, as suggested by Christian König. Now the attach/detach
  callbacks are invoked without the lock held, and the exporter should take
  the lock.

- Added "locking convention" documentation that explains which dma-buf
  functions and callbacks are locked/unlocked for importers and exporters,
  which was requested by Christian König.

- Added the ack that Tomasz Figa gave to v1 to the V4L patches.

Dmitry Osipenko (21):
  dma-buf: Add unlocked variant of vmapping functions
  dma-buf: Add unlocked variant of attachment-mapping functions
  drm/gem: Take reservation lock for vmap/vunmap operations
  drm/prime: Prepare to dynamic dma-buf locking specification
  drm/armada: Prepare to dynamic dma-buf locking specification
  drm/i915: Prepare to dynamic dma-buf locking specification
  drm/omapdrm: Prepare to dynamic dma-buf locking specification
  drm/tegra: Prepare to dynamic dma-buf locking specification
  drm/etnaviv: Prepare to dynamic dma-buf locking specification
  RDMA/umem: Prepare to dynamic dma-buf locking specification
  misc: fastrpc: Prepare to dynamic dma-buf locking specification
  xen/gntdev: Prepare to dynamic dma-buf locking specification
  media: videobuf2: Prepare to dynamic dma-buf locking specification
  media: tegra-vde: Prepare to dynamic dma-buf locking specification
  dma-buf: Move dma_buf_vmap() to dynamic locking specification
  dma-buf: Move dma_buf_attach() to dynamic locking specification
  dma-buf: Move dma_buf_map_attachment() to dynamic locking
specification
  dma-buf: Move 

[PATCH v6 15/21] dma-buf: Move dma_buf_vmap() to dynamic locking specification

2022-09-28 Thread Dmitry Osipenko
Move dma_buf_vmap/vunmap() functions to the dynamic locking
specification by asserting that the reservation lock is held.
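
A caller therefore has to take the reservation before vmapping; roughly
(hypothetical sketch, example_vmap() is a made-up name):

	#include <linux/dma-buf.h>
	#include <linux/dma-resv.h>
	#include <linux/iosys-map.h>

	static void *example_vmap(struct dma_buf *dmabuf)
	{
		struct iosys_map map;
		int ret;

		dma_resv_lock(dmabuf->resv, NULL);
		ret = dma_buf_vmap(dmabuf, &map);	/* assertion holds */
		dma_resv_unlock(dmabuf->resv);

		/* assumes a system-memory mapping, not I/O memory */
		return ret ? NULL : map.vaddr;
	}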

Acked-by: Sumit Semwal 
Acked-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
 drivers/dma-buf/dma-buf.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index f349613790a6..23656f334735 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1450,6 +1450,8 @@ int dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map 
*map)
if (WARN_ON(!dmabuf))
return -EINVAL;
 
+   dma_resv_assert_held(dmabuf->resv);
+
if (!dmabuf->ops->vmap)
return -EINVAL;
 
@@ -1513,6 +1515,8 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct 
iosys_map *map)
if (WARN_ON(!dmabuf))
return;
 
+   dma_resv_assert_held(dmabuf->resv);
+
BUG_ON(iosys_map_is_null(&dmabuf->vmap_ptr));
BUG_ON(dmabuf->vmapping_counter == 0);
BUG_ON(!iosys_map_is_equal(&dmabuf->vmap_ptr, map));
-- 
2.37.3



[PATCH v6 21/21] dma-buf: Remove obsoleted internal lock

2022-09-28 Thread Dmitry Osipenko
The internal dma-buf lock isn't needed anymore because the updated
locking specification requires that the dma-buf reservation be locked
by importers, and thus the internal data is already protected by the
reservation lock. Remove the now-obsolete internal lock.

Acked-by: Sumit Semwal 
Acked-by: Christian König 
Reviewed-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
 drivers/dma-buf/dma-buf.c | 14 --
 include/linux/dma-buf.h   |  9 -
 2 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index e04d504441a5..82f72b5647f8 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -657,7 +657,6 @@ struct dma_buf *dma_buf_export(const struct 
dma_buf_export_info *exp_info)
 
dmabuf->file = file;
 
-   mutex_init(&dmabuf->lock);
INIT_LIST_HEAD(&dmabuf->attachments);
 
mutex_lock(&db_list.lock);
@@ -1503,7 +1502,7 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_mmap, DMA_BUF);
 int dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map)
 {
struct iosys_map ptr;
-   int ret = 0;
+   int ret;
 
iosys_map_clear(map);
 
@@ -1515,28 +1514,25 @@ int dma_buf_vmap(struct dma_buf *dmabuf, struct 
iosys_map *map)
if (!dmabuf->ops->vmap)
return -EINVAL;
 
-   mutex_lock(>lock);
if (dmabuf->vmapping_counter) {
dmabuf->vmapping_counter++;
BUG_ON(iosys_map_is_null(&dmabuf->vmap_ptr));
*map = dmabuf->vmap_ptr;
-   goto out_unlock;
+   return 0;
}
 
BUG_ON(iosys_map_is_set(&dmabuf->vmap_ptr));
 
ret = dmabuf->ops->vmap(dmabuf, &ptr);
if (WARN_ON_ONCE(ret))
-   goto out_unlock;
+   return ret;
 
dmabuf->vmap_ptr = ptr;
dmabuf->vmapping_counter = 1;
 
*map = dmabuf->vmap_ptr;
 
-out_unlock:
-   mutex_unlock(&dmabuf->lock);
-   return ret;
+   return 0;
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_vmap, DMA_BUF);
 
@@ -1581,13 +1577,11 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct 
iosys_map *map)
BUG_ON(dmabuf->vmapping_counter == 0);
BUG_ON(!iosys_map_is_equal(&dmabuf->vmap_ptr, map));
 
-   mutex_lock(&dmabuf->lock);
if (--dmabuf->vmapping_counter == 0) {
if (dmabuf->ops->vunmap)
dmabuf->ops->vunmap(dmabuf, map);
iosys_map_clear(&dmabuf->vmap_ptr);
}
-   mutex_unlock(&dmabuf->lock);
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap, DMA_BUF);
 
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index f11b5bbc2f37..6fa8d4e29719 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -326,15 +326,6 @@ struct dma_buf {
/** @ops: dma_buf_ops associated with this buffer object. */
const struct dma_buf_ops *ops;
 
-   /**
-* @lock:
-*
-* Used internally to serialize list manipulation, attach/detach and
-* vmap/unmap. Note that in many cases this is superseeded by
-* dma_resv_lock() on @resv.
-*/
-   struct mutex lock;
-
/**
 * @vmapping_counter:
 *
-- 
2.37.3



[PATCH v6 12/21] xen/gntdev: Prepare to dynamic dma-buf locking specification

2022-09-28 Thread Dmitry Osipenko
Prepare the gntdev driver for the common dynamic dma-buf locking convention
by starting to use the unlocked versions of dma-buf API functions.

Acked-by: Juergen Gross 
Acked-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
 drivers/xen/gntdev-dmabuf.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/gntdev-dmabuf.c b/drivers/xen/gntdev-dmabuf.c
index 940e5e9e8a54..4440e626b797 100644
--- a/drivers/xen/gntdev-dmabuf.c
+++ b/drivers/xen/gntdev-dmabuf.c
@@ -600,7 +600,7 @@ dmabuf_imp_to_refs(struct gntdev_dmabuf_priv *priv, struct 
device *dev,
 
gntdev_dmabuf->u.imp.attach = attach;
 
-   sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
+   sgt = dma_buf_map_attachment_unlocked(attach, DMA_BIDIRECTIONAL);
if (IS_ERR(sgt)) {
ret = ERR_CAST(sgt);
goto fail_detach;
@@ -658,7 +658,7 @@ dmabuf_imp_to_refs(struct gntdev_dmabuf_priv *priv, struct 
device *dev,
 fail_end_access:
dmabuf_imp_end_foreign_access(gntdev_dmabuf->u.imp.refs, count);
 fail_unmap:
-   dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
+   dma_buf_unmap_attachment_unlocked(attach, sgt, DMA_BIDIRECTIONAL);
 fail_detach:
dma_buf_detach(dma_buf, attach);
 fail_free_obj:
@@ -708,8 +708,8 @@ static int dmabuf_imp_release(struct gntdev_dmabuf_priv 
*priv, u32 fd)
attach = gntdev_dmabuf->u.imp.attach;
 
if (gntdev_dmabuf->u.imp.sgt)
-   dma_buf_unmap_attachment(attach, gntdev_dmabuf->u.imp.sgt,
-DMA_BIDIRECTIONAL);
+   dma_buf_unmap_attachment_unlocked(attach, 
gntdev_dmabuf->u.imp.sgt,
+ DMA_BIDIRECTIONAL);
dma_buf = attach->dmabuf;
dma_buf_detach(attach->dmabuf, attach);
dma_buf_put(dma_buf);
-- 
2.37.3



[PATCH v6 10/21] RDMA/umem: Prepare to dynamic dma-buf locking specification

2022-09-28 Thread Dmitry Osipenko
Prepare the InfiniBand drivers for the common dynamic dma-buf locking
convention by starting to use the unlocked versions of dma-buf API
functions.

Acked-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
 drivers/infiniband/core/umem_dmabuf.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/umem_dmabuf.c 
b/drivers/infiniband/core/umem_dmabuf.c
index 04c04e6d24c3..43b26bc12288 100644
--- a/drivers/infiniband/core/umem_dmabuf.c
+++ b/drivers/infiniband/core/umem_dmabuf.c
@@ -26,7 +26,8 @@ int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf 
*umem_dmabuf)
if (umem_dmabuf->sgt)
goto wait_fence;
 
-   sgt = dma_buf_map_attachment(umem_dmabuf->attach, DMA_BIDIRECTIONAL);
+   sgt = dma_buf_map_attachment_unlocked(umem_dmabuf->attach,
+ DMA_BIDIRECTIONAL);
if (IS_ERR(sgt))
return PTR_ERR(sgt);
 
@@ -102,8 +103,8 @@ void ib_umem_dmabuf_unmap_pages(struct ib_umem_dmabuf 
*umem_dmabuf)
umem_dmabuf->last_sg_trim = 0;
}
 
-   dma_buf_unmap_attachment(umem_dmabuf->attach, umem_dmabuf->sgt,
-DMA_BIDIRECTIONAL);
+   dma_buf_unmap_attachment_unlocked(umem_dmabuf->attach, umem_dmabuf->sgt,
+ DMA_BIDIRECTIONAL);
 
umem_dmabuf->sgt = NULL;
 }
-- 
2.37.3



[PATCH v6 13/21] media: videobuf2: Prepare to dynamic dma-buf locking specification

2022-09-28 Thread Dmitry Osipenko
Prepare the V4L2 memory allocators for the common dynamic dma-buf locking
convention by starting to use the unlocked versions of dma-buf API
functions.

Acked-by: Tomasz Figa 
Acked-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
 drivers/media/common/videobuf2/videobuf2-dma-contig.c | 11 ++-
 drivers/media/common/videobuf2/videobuf2-dma-sg.c |  8 
 drivers/media/common/videobuf2/videobuf2-vmalloc.c|  6 +++---
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c 
b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index 678b359717c4..79f4d8301fbb 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -101,7 +101,7 @@ static void *vb2_dc_vaddr(struct vb2_buffer *vb, void 
*buf_priv)
if (buf->db_attach) {
struct iosys_map map;
 
-   if (!dma_buf_vmap(buf->db_attach->dmabuf, &map))
+   if (!dma_buf_vmap_unlocked(buf->db_attach->dmabuf, &map))
buf->vaddr = map.vaddr;
 
return buf->vaddr;
@@ -711,7 +711,7 @@ static int vb2_dc_map_dmabuf(void *mem_priv)
}
 
/* get the associated scatterlist for this buffer */
-   sgt = dma_buf_map_attachment(buf->db_attach, buf->dma_dir);
+   sgt = dma_buf_map_attachment_unlocked(buf->db_attach, buf->dma_dir);
if (IS_ERR(sgt)) {
pr_err("Error getting dmabuf scatterlist\n");
return -EINVAL;
@@ -722,7 +722,8 @@ static int vb2_dc_map_dmabuf(void *mem_priv)
if (contig_size < buf->size) {
pr_err("contiguous chunk is too small %lu/%lu\n",
   contig_size, buf->size);
-   dma_buf_unmap_attachment(buf->db_attach, sgt, buf->dma_dir);
+   dma_buf_unmap_attachment_unlocked(buf->db_attach, sgt,
+ buf->dma_dir);
return -EFAULT;
}
 
@@ -750,10 +751,10 @@ static void vb2_dc_unmap_dmabuf(void *mem_priv)
}
 
if (buf->vaddr) {
-   dma_buf_vunmap(buf->db_attach->dmabuf, &map);
+   dma_buf_vunmap_unlocked(buf->db_attach->dmabuf, &map);
buf->vaddr = NULL;
}
-   dma_buf_unmap_attachment(buf->db_attach, sgt, buf->dma_dir);
+   dma_buf_unmap_attachment_unlocked(buf->db_attach, sgt, buf->dma_dir);
 
buf->dma_addr = 0;
buf->dma_sgt = NULL;
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c 
b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
index fa69158a65b1..36ecdea8d707 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
@@ -309,7 +309,7 @@ static void *vb2_dma_sg_vaddr(struct vb2_buffer *vb, void 
*buf_priv)
 
if (!buf->vaddr) {
if (buf->db_attach) {
-   ret = dma_buf_vmap(buf->db_attach->dmabuf, &map);
+   ret = dma_buf_vmap_unlocked(buf->db_attach->dmabuf, &map);
buf->vaddr = ret ? NULL : map.vaddr;
} else {
buf->vaddr = vm_map_ram(buf->pages, buf->num_pages, -1);
@@ -565,7 +565,7 @@ static int vb2_dma_sg_map_dmabuf(void *mem_priv)
}
 
/* get the associated scatterlist for this buffer */
-   sgt = dma_buf_map_attachment(buf->db_attach, buf->dma_dir);
+   sgt = dma_buf_map_attachment_unlocked(buf->db_attach, buf->dma_dir);
if (IS_ERR(sgt)) {
pr_err("Error getting dmabuf scatterlist\n");
return -EINVAL;
@@ -594,10 +594,10 @@ static void vb2_dma_sg_unmap_dmabuf(void *mem_priv)
}
 
if (buf->vaddr) {
-   dma_buf_vunmap(buf->db_attach->dmabuf, &map);
+   dma_buf_vunmap_unlocked(buf->db_attach->dmabuf, &map);
buf->vaddr = NULL;
}
-   dma_buf_unmap_attachment(buf->db_attach, sgt, buf->dma_dir);
+   dma_buf_unmap_attachment_unlocked(buf->db_attach, sgt, buf->dma_dir);
 
buf->dma_sgt = NULL;
 }
diff --git a/drivers/media/common/videobuf2/videobuf2-vmalloc.c 
b/drivers/media/common/videobuf2/videobuf2-vmalloc.c
index 948152f1596b..7831bf545874 100644
--- a/drivers/media/common/videobuf2/videobuf2-vmalloc.c
+++ b/drivers/media/common/videobuf2/videobuf2-vmalloc.c
@@ -376,7 +376,7 @@ static int vb2_vmalloc_map_dmabuf(void *mem_priv)
struct iosys_map map;
int ret;
 
-   ret = dma_buf_vmap(buf->dbuf, &map);
+   ret = dma_buf_vmap_unlocked(buf->dbuf, &map);
if (ret)
return -EFAULT;
buf->vaddr = map.vaddr;
@@ -389,7 +389,7 @@ static void vb2_vmalloc_unmap_dmabuf(void *mem_priv)
struct vb2_vmalloc_buf *buf = mem_priv;
struct iosys_map map = IOSYS_MAP_INIT_VADDR(buf->vaddr);
 
-   dma_buf_vunmap(buf->dbuf, &map);
+   dma_buf_vunmap_unlocked(buf->dbuf, &map);
buf->vaddr = NULL;
 }
 
@@ 

[PATCH v6 16/21] dma-buf: Move dma_buf_attach() to dynamic locking specification

2022-09-28 Thread Dmitry Osipenko
Move dma-buf attachment API functions to the dynamic locking specification
by taking the reservation lock around the mapping operations. The strict
locking convention prevents deadlock situations for dma-buf importers and
exporters.

Acked-by: Sumit Semwal 
Reviewed-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
 drivers/dma-buf/dma-buf.c | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 23656f334735..d60585bbb529 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -859,8 +859,8 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct 
device *dev,
dma_buf_is_dynamic(dmabuf)) {
struct sg_table *sgt;
 
+   dma_resv_lock(attach->dmabuf->resv, NULL);
if (dma_buf_is_dynamic(attach->dmabuf)) {
-   dma_resv_lock(attach->dmabuf->resv, NULL);
ret = dmabuf->ops->pin(attach);
if (ret)
goto err_unlock;
@@ -873,8 +873,7 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct 
device *dev,
ret = PTR_ERR(sgt);
goto err_unpin;
}
-   if (dma_buf_is_dynamic(attach->dmabuf))
-   dma_resv_unlock(attach->dmabuf->resv);
+   dma_resv_unlock(attach->dmabuf->resv);
attach->sgt = sgt;
attach->dir = DMA_BIDIRECTIONAL;
}
@@ -890,8 +889,7 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct 
device *dev,
dmabuf->ops->unpin(attach);
 
 err_unlock:
-   if (dma_buf_is_dynamic(attach->dmabuf))
-   dma_resv_unlock(attach->dmabuf->resv);
+   dma_resv_unlock(attach->dmabuf->resv);
 
dma_buf_detach(dmabuf, attach);
return ERR_PTR(ret);
@@ -937,21 +935,19 @@ void dma_buf_detach(struct dma_buf *dmabuf, struct 
dma_buf_attachment *attach)
if (WARN_ON(!dmabuf || !attach))
return;
 
+   dma_resv_lock(attach->dmabuf->resv, NULL);
+
if (attach->sgt) {
-   if (dma_buf_is_dynamic(attach->dmabuf))
-   dma_resv_lock(attach->dmabuf->resv, NULL);
 
__unmap_dma_buf(attach, attach->sgt, attach->dir);
 
-   if (dma_buf_is_dynamic(attach->dmabuf)) {
+   if (dma_buf_is_dynamic(attach->dmabuf))
dmabuf->ops->unpin(attach);
-   dma_resv_unlock(attach->dmabuf->resv);
-   }
}
-
-   dma_resv_lock(dmabuf->resv, NULL);
list_del(&attach->node);
+
dma_resv_unlock(dmabuf->resv);
+
if (dmabuf->ops->detach)
dmabuf->ops->detach(dmabuf, attach);
 
-- 
2.37.3



[PATCH v6 02/21] dma-buf: Add unlocked variant of attachment-mapping functions

2022-09-28 Thread Dmitry Osipenko
Add unlocked variant of dma_buf_map/unmap_attachment() that will
be used by drivers that don't take the reservation lock explicitly.
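
Usage of the wrappers is then simply (hypothetical example, the
example_dma_to_device() name is made up):

	#include <linux/dma-buf.h>
	#include <linux/err.h>

	static int example_dma_to_device(struct dma_buf_attachment *attach)
	{
		struct sg_table *sgt;

		/* the wrappers take and drop dmabuf->resv internally */
		sgt = dma_buf_map_attachment_unlocked(attach, DMA_TO_DEVICE);
		if (IS_ERR(sgt))
			return PTR_ERR(sgt);

		/* ... program the device with sgt ... */

		dma_buf_unmap_attachment_unlocked(attach, sgt, DMA_TO_DEVICE);
		return 0;
	}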

Acked-by: Sumit Semwal 
Acked-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
 drivers/dma-buf/dma-buf.c | 53 +++
 include/linux/dma-buf.h   |  6 +
 2 files changed, 59 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 0427f3b88170..f349613790a6 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1100,6 +1100,34 @@ struct sg_table *dma_buf_map_attachment(struct 
dma_buf_attachment *attach,
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_map_attachment, DMA_BUF);
 
+/**
+ * dma_buf_map_attachment_unlocked - Returns the scatterlist table of the 
attachment;
+ * mapped into _device_ address space. Is a wrapper for map_dma_buf() of the
+ * dma_buf_ops.
+ * @attach:[in]attachment whose scatterlist is to be returned
+ * @direction: [in]direction of DMA transfer
+ *
+ * Unlocked variant of dma_buf_map_attachment().
+ */
+struct sg_table *
+dma_buf_map_attachment_unlocked(struct dma_buf_attachment *attach,
+   enum dma_data_direction direction)
+{
+   struct sg_table *sg_table;
+
+   might_sleep();
+
+   if (WARN_ON(!attach || !attach->dmabuf))
+   return ERR_PTR(-EINVAL);
+
+   dma_resv_lock(attach->dmabuf->resv, NULL);
+   sg_table = dma_buf_map_attachment(attach, direction);
+   dma_resv_unlock(attach->dmabuf->resv);
+
+   return sg_table;
+}
+EXPORT_SYMBOL_NS_GPL(dma_buf_map_attachment_unlocked, DMA_BUF);
+
 /**
  * dma_buf_unmap_attachment - unmaps and decreases usecount of the buffer;might
  * deallocate the scatterlist associated. Is a wrapper for unmap_dma_buf() of
@@ -1136,6 +1164,31 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment 
*attach,
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_unmap_attachment, DMA_BUF);
 
+/**
+ * dma_buf_unmap_attachment_unlocked - unmaps and decreases usecount of the 
buffer;might
+ * deallocate the scatterlist associated. Is a wrapper for unmap_dma_buf() of
+ * dma_buf_ops.
+ * @attach:[in]attachment to unmap buffer from
+ * @sg_table:  [in]scatterlist info of the buffer to unmap
+ * @direction: [in]direction of DMA transfer
+ *
+ * Unlocked variant of dma_buf_unmap_attachment().
+ */
+void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
+  struct sg_table *sg_table,
+  enum dma_data_direction direction)
+{
+   might_sleep();
+
+   if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
+   return;
+
+   dma_resv_lock(attach->dmabuf->resv, NULL);
+   dma_buf_unmap_attachment(attach, sg_table, direction);
+   dma_resv_unlock(attach->dmabuf->resv);
+}
+EXPORT_SYMBOL_NS_GPL(dma_buf_unmap_attachment_unlocked, DMA_BUF);
+
 /**
  * dma_buf_move_notify - notify attachments that DMA-buf is moving
  *
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 8daa054dd7fe..f11b5bbc2f37 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -627,6 +627,12 @@ int dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
 enum dma_data_direction dir);
 int dma_buf_end_cpu_access(struct dma_buf *dma_buf,
   enum dma_data_direction dir);
+struct sg_table *
+dma_buf_map_attachment_unlocked(struct dma_buf_attachment *attach,
+   enum dma_data_direction direction);
+void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
+  struct sg_table *sg_table,
+  enum dma_data_direction direction);
 
 int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *,
 unsigned long);
-- 
2.37.3



[PATCH v6 17/21] dma-buf: Move dma_buf_map_attachment() to dynamic locking specification

2022-09-28 Thread Dmitry Osipenko
Move dma-buf attachment mapping functions to the dynamic locking
specification by asserting that the reservation lock is held.

Acked-by: Sumit Semwal 
Reviewed-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
 drivers/dma-buf/dma-buf.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index d60585bbb529..bff5a70b8735 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1038,8 +1038,7 @@ struct sg_table *dma_buf_map_attachment(struct 
dma_buf_attachment *attach,
if (WARN_ON(!attach || !attach->dmabuf))
return ERR_PTR(-EINVAL);
 
-   if (dma_buf_attachment_is_dynamic(attach))
-   dma_resv_assert_held(attach->dmabuf->resv);
+   dma_resv_assert_held(attach->dmabuf->resv);
 
if (attach->sgt) {
/*
@@ -1054,7 +1053,6 @@ struct sg_table *dma_buf_map_attachment(struct 
dma_buf_attachment *attach,
}
 
if (dma_buf_is_dynamic(attach->dmabuf)) {
-   dma_resv_assert_held(attach->dmabuf->resv);
if (!IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY)) {
r = attach->dmabuf->ops->pin(attach);
if (r)
@@ -1143,15 +1141,11 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment 
*attach,
if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
return;
 
-   if (dma_buf_attachment_is_dynamic(attach))
-   dma_resv_assert_held(attach->dmabuf->resv);
+   dma_resv_assert_held(attach->dmabuf->resv);
 
if (attach->sgt == sg_table)
return;
 
-   if (dma_buf_is_dynamic(attach->dmabuf))
-   dma_resv_assert_held(attach->dmabuf->resv);
-
__unmap_dma_buf(attach, sg_table, direction);
 
if (dma_buf_is_dynamic(attach->dmabuf) &&
-- 
2.37.3



[PATCH v6 09/21] drm/etnaviv: Prepare to dynamic dma-buf locking specification

2022-09-28 Thread Dmitry Osipenko
Prepare the Etnaviv driver for the common dynamic dma-buf locking convention
by starting to use the unlocked versions of dma-buf API functions.

Acked-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
 drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c 
b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
index 3fa2da149639..7031db145a77 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
@@ -65,7 +65,7 @@ static void etnaviv_gem_prime_release(struct 
etnaviv_gem_object *etnaviv_obj)
struct iosys_map map = IOSYS_MAP_INIT_VADDR(etnaviv_obj->vaddr);
 
if (etnaviv_obj->vaddr)
-   dma_buf_vunmap(etnaviv_obj->base.import_attach->dmabuf, &map);
+   dma_buf_vunmap_unlocked(etnaviv_obj->base.import_attach->dmabuf, &map);
 
/* Don't drop the pages for imported dmabuf, as they are not
 * ours, just free the array we allocated:
-- 
2.37.3



[PATCH v6 14/21] media: tegra-vde: Prepare to dynamic dma-buf locking specification

2022-09-28 Thread Dmitry Osipenko
Prepare the Tegra video decoder driver for the common dynamic dma-buf
locking convention by starting to use the unlocked versions of dma-buf
API functions.

Acked-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
 drivers/media/platform/nvidia/tegra-vde/dmabuf-cache.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/media/platform/nvidia/tegra-vde/dmabuf-cache.c 
b/drivers/media/platform/nvidia/tegra-vde/dmabuf-cache.c
index 69c346148070..1c5b94989aec 100644
--- a/drivers/media/platform/nvidia/tegra-vde/dmabuf-cache.c
+++ b/drivers/media/platform/nvidia/tegra-vde/dmabuf-cache.c
@@ -38,7 +38,7 @@ static void tegra_vde_release_entry(struct 
tegra_vde_cache_entry *entry)
if (entry->vde->domain)
tegra_vde_iommu_unmap(entry->vde, entry->iova);
 
-   dma_buf_unmap_attachment(entry->a, entry->sgt, entry->dma_dir);
+   dma_buf_unmap_attachment_unlocked(entry->a, entry->sgt, entry->dma_dir);
dma_buf_detach(dmabuf, entry->a);
dma_buf_put(dmabuf);
 
@@ -102,7 +102,7 @@ int tegra_vde_dmabuf_cache_map(struct tegra_vde *vde,
goto err_unlock;
}
 
-   sgt = dma_buf_map_attachment(attachment, dma_dir);
+   sgt = dma_buf_map_attachment_unlocked(attachment, dma_dir);
if (IS_ERR(sgt)) {
dev_err(dev, "Failed to get dmabufs sg_table\n");
err = PTR_ERR(sgt);
@@ -152,7 +152,7 @@ int tegra_vde_dmabuf_cache_map(struct tegra_vde *vde,
 err_free:
kfree(entry);
 err_unmap:
-   dma_buf_unmap_attachment(attachment, sgt, dma_dir);
+   dma_buf_unmap_attachment_unlocked(attachment, sgt, dma_dir);
 err_detach:
dma_buf_detach(dmabuf, attachment);
 err_unlock:
-- 
2.37.3



[PATCH v6 01/21] dma-buf: Add unlocked variant of vmapping functions

2022-09-28 Thread Dmitry Osipenko
Add unlocked variant of dma_buf_vmap/vunmap() that will be utilized
by drivers that don't take the reservation lock explicitly.
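
A driver that previously called dma_buf_vmap() directly can switch to the
wrapper like this (hypothetical sketch, example_clear_buffer() is a
made-up name):

	#include <linux/dma-buf.h>
	#include <linux/iosys-map.h>

	static int example_clear_buffer(struct dma_buf *dmabuf)
	{
		struct iosys_map map;
		int ret;

		/* takes and drops dmabuf->resv around the vmap internally */
		ret = dma_buf_vmap_unlocked(dmabuf, &map);
		if (ret)
			return ret;

		iosys_map_memset(&map, 0, 0, dmabuf->size);
		dma_buf_vunmap_unlocked(dmabuf, &map);
		return 0;
	}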

Acked-by: Sumit Semwal 
Acked-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
 drivers/dma-buf/dma-buf.c | 41 +++
 include/linux/dma-buf.h   |  2 ++
 2 files changed, 43 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index dd0f83ee505b..0427f3b88170 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1425,6 +1425,31 @@ int dma_buf_vmap(struct dma_buf *dmabuf, struct 
iosys_map *map)
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_vmap, DMA_BUF);
 
+/**
+ * dma_buf_vmap_unlocked - Create virtual mapping for the buffer object into 
kernel
+ * address space. Same restrictions as for vmap and friends apply.
+ * @dmabuf:[in]buffer to vmap
+ * @map:   [out]   returns the vmap pointer
+ *
+ * Unlocked version of dma_buf_vmap()
+ *
+ * Returns 0 on success, or a negative errno code otherwise.
+ */
+int dma_buf_vmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map)
+{
+   int ret;
+
+   if (WARN_ON(!dmabuf))
+   return -EINVAL;
+
+   dma_resv_lock(dmabuf->resv, NULL);
+   ret = dma_buf_vmap(dmabuf, map);
+   dma_resv_unlock(dmabuf->resv);
+
+   return ret;
+}
+EXPORT_SYMBOL_NS_GPL(dma_buf_vmap_unlocked, DMA_BUF);
+
 /**
  * dma_buf_vunmap - Unmap a vmap obtained by dma_buf_vmap.
  * @dmabuf:[in]buffer to vunmap
@@ -1449,6 +1474,22 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct 
iosys_map *map)
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap, DMA_BUF);
 
+/**
+ * dma_buf_vunmap_unlocked - Unmap a vmap obtained by dma_buf_vmap.
+ * @dmabuf:[in]buffer to vunmap
+ * @map:   [in]vmap pointer to vunmap
+ */
+void dma_buf_vunmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map)
+{
+   if (WARN_ON(!dmabuf))
+   return;
+
+   dma_resv_lock(dmabuf->resv, NULL);
+   dma_buf_vunmap(dmabuf, map);
+   dma_resv_unlock(dmabuf->resv);
+}
+EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap_unlocked, DMA_BUF);
+
 #ifdef CONFIG_DEBUG_FS
 static int dma_buf_debug_show(struct seq_file *s, void *unused)
 {
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 71731796c8c3..8daa054dd7fe 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -632,4 +632,6 @@ int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *,
 unsigned long);
 int dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map);
 void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map);
+int dma_buf_vmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map);
+void dma_buf_vunmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map);
 #endif /* __DMA_BUF_H__ */
-- 
2.37.3



[PATCH v6 06/21] drm/i915: Prepare to dynamic dma-buf locking specification

2022-09-28 Thread Dmitry Osipenko
Prepare the i915 driver for the common dynamic dma-buf locking convention
by starting to use the unlocked versions of dma-buf API functions
and handling cases where importer now holds the reservation lock.

Acked-by: Christian König 
Reviewed-by: Michael J. Ruhl 
Signed-off-by: Dmitry Osipenko 
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c   |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.c   | 14 ++
 .../gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c | 16 
 3 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index f5062d0c6333..07eee1c09aaf 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -72,7 +72,7 @@ static int i915_gem_dmabuf_vmap(struct dma_buf *dma_buf,
struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
void *vaddr;
 
-   vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB);
+   vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
if (IS_ERR(vaddr))
return PTR_ERR(vaddr);
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 7ff9c7877bec..3e3f63f86629 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -290,7 +290,21 @@ void __i915_gem_object_pages_fini(struct 
drm_i915_gem_object *obj)
__i915_gem_object_free_mmaps(obj);
 
atomic_set(&obj->mm.pages_pin_count, 0);
+
+   /*
+* dma_buf_unmap_attachment() requires reservation to be
+* locked. The imported GEM shouldn't share reservation lock
+* and ttm_bo_cleanup_memtype_use() shouldn't be invoked for
+* dma-buf, so it's safe to take the lock.
+*/
+   if (obj->base.import_attach)
+   i915_gem_object_lock(obj, NULL);
+
__i915_gem_object_put_pages(obj);
+
+   if (obj->base.import_attach)
+   i915_gem_object_unlock(obj);
+
GEM_BUG_ON(i915_gem_object_has_pages(obj));
 }
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
index 51ed824b020c..f2f3cfad807b 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
@@ -213,7 +213,7 @@ static int igt_dmabuf_import_same_driver(struct 
drm_i915_private *i915,
goto out_import;
}
 
-   st = dma_buf_map_attachment(import_attach, DMA_BIDIRECTIONAL);
+   st = dma_buf_map_attachment_unlocked(import_attach, DMA_BIDIRECTIONAL);
if (IS_ERR(st)) {
err = PTR_ERR(st);
goto out_detach;
@@ -226,7 +226,7 @@ static int igt_dmabuf_import_same_driver(struct 
drm_i915_private *i915,
timeout = -ETIME;
}
err = timeout > 0 ? 0 : timeout;
-   dma_buf_unmap_attachment(import_attach, st, DMA_BIDIRECTIONAL);
+   dma_buf_unmap_attachment_unlocked(import_attach, st, DMA_BIDIRECTIONAL);
 out_detach:
dma_buf_detach(dmabuf, import_attach);
 out_import:
@@ -296,7 +296,7 @@ static int igt_dmabuf_import(void *arg)
goto out_obj;
}
 
-   err = dma_buf_vmap(dmabuf, &map);
+   err = dma_buf_vmap_unlocked(dmabuf, &map);
dma_map = err ? NULL : map.vaddr;
if (!dma_map) {
pr_err("dma_buf_vmap failed\n");
@@ -337,7 +337,7 @@ static int igt_dmabuf_import(void *arg)
 
err = 0;
 out_dma_map:
-   dma_buf_vunmap(dmabuf, &map);
+   dma_buf_vunmap_unlocked(dmabuf, &map);
 out_obj:
i915_gem_object_put(obj);
 out_dmabuf:
@@ -358,7 +358,7 @@ static int igt_dmabuf_import_ownership(void *arg)
if (IS_ERR(dmabuf))
return PTR_ERR(dmabuf);
 
-   err = dma_buf_vmap(dmabuf, &map);
+   err = dma_buf_vmap_unlocked(dmabuf, &map);
ptr = err ? NULL : map.vaddr;
if (!ptr) {
pr_err("dma_buf_vmap failed\n");
@@ -367,7 +367,7 @@ static int igt_dmabuf_import_ownership(void *arg)
}
 
memset(ptr, 0xc5, PAGE_SIZE);
-   dma_buf_vunmap(dmabuf, &map);
+   dma_buf_vunmap_unlocked(dmabuf, &map);
 
obj = to_intel_bo(i915_gem_prime_import(&i915->drm, dmabuf));
if (IS_ERR(obj)) {
@@ -418,7 +418,7 @@ static int igt_dmabuf_export_vmap(void *arg)
}
i915_gem_object_put(obj);
 
-   err = dma_buf_vmap(dmabuf, &map);
+   err = dma_buf_vmap_unlocked(dmabuf, &map);
ptr = err ? NULL : map.vaddr;
if (!ptr) {
pr_err("dma_buf_vmap failed\n");
@@ -435,7 +435,7 @@ static int igt_dmabuf_export_vmap(void *arg)
memset(ptr, 0xc5, dmabuf->size);
 
err = 0;
-   dma_buf_vunmap(dmabuf, &map);
+   dma_buf_vunmap_unlocked(dmabuf, &map);
 out:
dma_buf_put(dmabuf);
return err;
-- 
2.37.3



[PATCH v6 04/21] drm/prime: Prepare to dynamic dma-buf locking specification

2022-09-28 Thread Dmitry Osipenko
Prepare the DRM PRIME core for the common dynamic dma-buf locking convention
by starting to use the unlocked versions of the dma-buf API functions.

Reviewed-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
 drivers/gpu/drm/drm_prime.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index eb09e86044c6..20e109a802ae 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -940,7 +940,7 @@ struct drm_gem_object *drm_gem_prime_import_dev(struct 
drm_device *dev,
 
get_dma_buf(dma_buf);
 
-   sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
+   sgt = dma_buf_map_attachment_unlocked(attach, DMA_BIDIRECTIONAL);
if (IS_ERR(sgt)) {
ret = PTR_ERR(sgt);
goto fail_detach;
@@ -958,7 +958,7 @@ struct drm_gem_object *drm_gem_prime_import_dev(struct 
drm_device *dev,
return obj;
 
 fail_unmap:
-   dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
+   dma_buf_unmap_attachment_unlocked(attach, sgt, DMA_BIDIRECTIONAL);
 fail_detach:
dma_buf_detach(dma_buf, attach);
dma_buf_put(dma_buf);
@@ -1056,7 +1056,7 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, 
struct sg_table *sg)
 
attach = obj->import_attach;
if (sg)
-   dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
+   dma_buf_unmap_attachment_unlocked(attach, sg, 
DMA_BIDIRECTIONAL);
dma_buf = attach->dmabuf;
dma_buf_detach(attach->dmabuf, attach);
/* remove the reference */
-- 
2.37.3



[PATCH v6 03/21] drm/gem: Take reservation lock for vmap/vunmap operations

2022-09-28 Thread Dmitry Osipenko
The new common dma-buf locking convention will require buffer importers
to hold the reservation lock around mapping operations. Make the DRM GEM
core take the lock around the vmapping operations and update DRM drivers
to use the locked functions where the DRM core now holds the lock. This
patch prepares the DRM core and drivers for the common dynamic dma-buf
locking convention.
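
For reviewers, a minimal usage sketch of the convention this patch
establishes (the calling contexts below are illustrative, not part of
the patch):

	struct iosys_map map;
	int ret;

	/* Callers that do not hold the reservation lock use the new
	 * wrappers, which take and drop obj->resv internally: */
	ret = drm_gem_vmap_unlocked(obj, &map);
	...
	drm_gem_vunmap_unlocked(obj, &map);

	/* Callers that already hold obj->resv (e.g. dma-buf importers
	 * under the new convention) use the locked variants, which now
	 * assert that the lock is held: */
	dma_resv_lock(obj->resv, NULL);
	ret = drm_gem_vmap(obj, &map);
	...
	drm_gem_vunmap(obj, &map);
	dma_resv_unlock(obj->resv);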

Acked-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
 drivers/gpu/drm/drm_client.c |  4 ++--
 drivers/gpu/drm/drm_gem.c| 24 
 drivers/gpu/drm/drm_gem_dma_helper.c |  6 ++---
 drivers/gpu/drm/drm_gem_framebuffer_helper.c |  6 ++---
 drivers/gpu/drm/drm_gem_ttm_helper.c |  9 +---
 drivers/gpu/drm/lima/lima_sched.c|  4 ++--
 drivers/gpu/drm/panfrost/panfrost_dump.c |  4 ++--
 drivers/gpu/drm/panfrost/panfrost_perfcnt.c  |  6 ++---
 drivers/gpu/drm/qxl/qxl_object.c | 17 +++---
 drivers/gpu/drm/qxl/qxl_prime.c  |  4 ++--
 include/drm/drm_gem.h|  3 +++
 11 files changed, 54 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/drm_client.c b/drivers/gpu/drm/drm_client.c
index 2b230b4d6942..fbcb1e995384 100644
--- a/drivers/gpu/drm/drm_client.c
+++ b/drivers/gpu/drm/drm_client.c
@@ -323,7 +323,7 @@ drm_client_buffer_vmap(struct drm_client_buffer *buffer,
 * fd_install step out of the driver backend hooks, to make that
 * final step optional for internal users.
 */
-   ret = drm_gem_vmap(buffer->gem, map);
+   ret = drm_gem_vmap_unlocked(buffer->gem, map);
if (ret)
return ret;
 
@@ -345,7 +345,7 @@ void drm_client_buffer_vunmap(struct drm_client_buffer 
*buffer)
 {
struct iosys_map *map = &buffer->map;
 
-   drm_gem_vunmap(buffer->gem, map);
+   drm_gem_vunmap_unlocked(buffer->gem, map);
 }
 EXPORT_SYMBOL(drm_client_buffer_vunmap);
 
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 8b68a3c1e6ab..b8db675e7fb5 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1158,6 +1158,8 @@ int drm_gem_vmap(struct drm_gem_object *obj, struct 
iosys_map *map)
 {
int ret;
 
+   dma_resv_assert_held(obj->resv);
+
if (!obj->funcs->vmap)
return -EOPNOTSUPP;
 
@@ -1173,6 +1175,8 @@ EXPORT_SYMBOL(drm_gem_vmap);
 
 void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
 {
+   dma_resv_assert_held(obj->resv);
+
if (iosys_map_is_null(map))
return;
 
@@ -1184,6 +1188,26 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct 
iosys_map *map)
 }
 EXPORT_SYMBOL(drm_gem_vunmap);
 
+int drm_gem_vmap_unlocked(struct drm_gem_object *obj, struct iosys_map *map)
+{
+   int ret;
+
+   dma_resv_lock(obj->resv, NULL);
+   ret = drm_gem_vmap(obj, map);
+   dma_resv_unlock(obj->resv);
+
+   return ret;
+}
+EXPORT_SYMBOL(drm_gem_vmap_unlocked);
+
+void drm_gem_vunmap_unlocked(struct drm_gem_object *obj, struct iosys_map *map)
+{
+   dma_resv_lock(obj->resv, NULL);
+   drm_gem_vunmap(obj, map);
+   dma_resv_unlock(obj->resv);
+}
+EXPORT_SYMBOL(drm_gem_vunmap_unlocked);
+
 /**
  * drm_gem_lock_reservations - Sets up the ww context and acquires
  * the lock on an array of GEM objects.
diff --git a/drivers/gpu/drm/drm_gem_dma_helper.c 
b/drivers/gpu/drm/drm_gem_dma_helper.c
index f6901ff97bbb..1e658c448366 100644
--- a/drivers/gpu/drm/drm_gem_dma_helper.c
+++ b/drivers/gpu/drm/drm_gem_dma_helper.c
@@ -230,7 +230,7 @@ void drm_gem_dma_free(struct drm_gem_dma_object *dma_obj)
 
if (gem_obj->import_attach) {
if (dma_obj->vaddr)
-   dma_buf_vunmap(gem_obj->import_attach->dmabuf, &map);
+   dma_buf_vunmap_unlocked(gem_obj->import_attach->dmabuf,
&map);
drm_prime_gem_destroy(gem_obj, dma_obj->sgt);
} else if (dma_obj->vaddr) {
if (dma_obj->map_noncoherent)
@@ -581,7 +581,7 @@ drm_gem_dma_prime_import_sg_table_vmap(struct drm_device 
*dev,
struct iosys_map map;
int ret;
 
-   ret = dma_buf_vmap(attach->dmabuf, &map);
+   ret = dma_buf_vmap_unlocked(attach->dmabuf, &map);
if (ret) {
DRM_ERROR("Failed to vmap PRIME buffer\n");
return ERR_PTR(ret);
@@ -589,7 +589,7 @@ drm_gem_dma_prime_import_sg_table_vmap(struct drm_device 
*dev,
 
obj = drm_gem_dma_prime_import_sg_table(dev, attach, sgt);
if (IS_ERR(obj)) {
-   dma_buf_vunmap(attach->dmabuf, &map);
+   dma_buf_vunmap_unlocked(attach->dmabuf, &map);
return obj;
}
 
diff --git a/drivers/gpu/drm/drm_gem_framebuffer_helper.c 
b/drivers/gpu/drm/drm_gem_framebuffer_helper.c
index 880a4975507f..e35e224e6303 100644
--- a/drivers/gpu/drm/drm_gem_framebuffer_helper.c
+++ b/drivers/gpu/drm/drm_gem_framebuffer_helper.c
@@ -354,7 +354,7 @@ int 

Re: [PATCH] drm/amdgpu: remove switch from amdgpu_gmc_noretry_set

2022-09-28 Thread Felix Kuehling

Am 2022-09-28 um 11:21 schrieb Graham Sider:

Simplify the logic in amdgpu_gmc_noretry_set by getting rid of the
switch. Also set noretry=1 as default for GFX 10.3.0 and greater since
retry faults are not supported.

Signed-off-by: Graham Sider 


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 48 +
  1 file changed, 9 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index aebc384531ac..34233a74248c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -572,45 +572,15 @@ void amdgpu_gmc_tmz_set(struct amdgpu_device *adev)
  void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
  {
	struct amdgpu_gmc *gmc = &adev->gmc;
-
-   switch (adev->ip_versions[GC_HWIP][0]) {
-   case IP_VERSION(9, 0, 1):
-   case IP_VERSION(9, 3, 0):
-   case IP_VERSION(9, 4, 0):
-   case IP_VERSION(9, 4, 1):
-   case IP_VERSION(9, 4, 2):
-   case IP_VERSION(10, 3, 3):
-   case IP_VERSION(10, 3, 4):
-   case IP_VERSION(10, 3, 5):
-   case IP_VERSION(10, 3, 6):
-   case IP_VERSION(10, 3, 7):
-   /*
-* noretry = 0 will cause kfd page fault tests fail
-* for some ASICs, so set default to 1 for these ASICs.
-*/
-   if (amdgpu_noretry == -1)
-   gmc->noretry = 1;
-   else
-   gmc->noretry = amdgpu_noretry;
-   break;
-   default:
-   /* Raven currently has issues with noretry
-* regardless of what we decide for other
-* asics, we should leave raven with
-* noretry = 0 until we root cause the
-* issues.
-*
-* default this to 0 for now, but we may want
-* to change this in the future for certain
-* GPUs as it can increase performance in
-* certain cases.
-*/
-   if (amdgpu_noretry == -1)
-   gmc->noretry = 0;
-   else
-   gmc->noretry = amdgpu_noretry;
-   break;
-   }
+   uint32_t gc_ver = adev->ip_versions[GC_HWIP][0];
+   bool noretry_default = (gc_ver == IP_VERSION(9, 0, 1) ||
+   gc_ver == IP_VERSION(9, 3, 0) ||
+   gc_ver == IP_VERSION(9, 4, 0) ||
+   gc_ver == IP_VERSION(9, 4, 1) ||
+   gc_ver == IP_VERSION(9, 4, 2) ||
+   gc_ver >= IP_VERSION(10, 3, 0));
+
+   gmc->noretry = (amdgpu_noretry == -1) ? noretry_default : 
amdgpu_noretry;
  }
  
  void amdgpu_gmc_set_vm_fault_masks(struct amdgpu_device *adev, int hub_type,


Re: [PATCH v5] drm/sched: Add FIFO sched policy to run queue

2022-09-28 Thread Luben Tuikov
Inlined:

On 2022-09-28 14:49, Andrey Grodzovsky wrote:
> When many entities are competing for the same run queue
> on the same scheduler, we observe unusually long wait
> times and some jobs get starved. This has been observed on GPUVis.
> 
> The issue is due to the Round Robin policy used by schedulers
> to pick the next entity's job queue for execution. Under the stress
> of many entities and long job queues within an entity, some
> jobs could be stuck for a very long time in their entity's
> queue before being popped from the queue and executed,
> while for other entities with smaller job queues a job
> might execute earlier even though that job arrived later
> than the job in the long queue.
> 
> Fix:
> Add a FIFO selection policy to entities in the run queue; choose the
> next entity on the run queue in such an order that if a job on one
> entity arrived earlier than a job on another entity, the first job
> starts executing earlier, regardless of the length of the entity's
> job queue.
>    
> v2:
> Switch to rb tree structure for entities based on TS of
> oldest job waiting in the job queue of an entity. Improves next
> entity extraction to O(1). Entity TS update
> O(log N) where N is the number of entities in the run-queue
>    
> Drop default option in module control parameter.
> 
> v3:
> Various cosmetical fixes and minor refactoring of fifo update function. 
> (Luben)

"Cosmetic"

> 
> v4:
> Switch drm_sched_rq_select_entity_fifo to in order search (Luben)
> 
> v5: Fix up drm_sched_rq_select_entity_fifo loop

v5 is actually a major update of the whole patch and of how it works.
It changes the entity selection from ready-first, timestamp-second to
oldest-first, ready-second, i.e. the oldest-ready entity, as I
suggested. The v5 blurb should read:

v5: Fix the drm_sched_rq_select_entity_fifo loop: instead of reordering
and timestamping on every selection and picking ready-first,
timestamp-second, pick the oldest-ready entity. No reinsertion. (Luben)

>    
> Signed-off-by: Andrey Grodzovsky 
> Tested-by: Li Yunxiang (Teddy) 
> ---
>  drivers/gpu/drm/scheduler/sched_entity.c | 26 ++-
>  drivers/gpu/drm/scheduler/sched_main.c   | 99 +++-
>  include/drm/gpu_scheduler.h  | 32 
>  3 files changed, 151 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
> b/drivers/gpu/drm/scheduler/sched_entity.c
> index 6b25b2f4f5a3..f3ffce3c9304 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -73,6 +73,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>   entity->priority = priority;
>   entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
>   entity->last_scheduled = NULL;
> + RB_CLEAR_NODE(&entity->rb_tree_node);
>  
>   if(num_sched_list)
>   entity->rq = &sched_list[0]->sched_rq[entity->priority];
> @@ -417,14 +418,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
> drm_sched_entity *entity)
>  
>   sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
>   if (!sched_job)
> - return NULL;
> + goto skip;
>  
>   while ((entity->dependency =
>   drm_sched_job_dependency(sched_job, entity))) {
>   trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
>  
> - if (drm_sched_entity_add_dependency_cb(entity))
> - return NULL;
> + if (drm_sched_entity_add_dependency_cb(entity)) {
> + sched_job = NULL;
> + goto skip;
> + }
>   }
>  
>   /* skip jobs from entity that marked guilty */
> @@ -443,6 +446,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
> drm_sched_entity *entity)
>   smp_wmb();
>  
>   spsc_queue_pop(&entity->job_queue);
> +
> + /*
> +  * It's when head job is extracted we can access the next job (or empty)
> +  * queue and update the entity location in the min heap accordingly.
> +  */
> +skip:
> + if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> + drm_sched_rq_update_fifo(entity,
> +  (sched_job ? sched_job->submit_ts : 
> ktime_get()));
> +
>   return sched_job;
>  }
>  
> @@ -502,11 +515,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
> *sched_job)
>  {
>   struct drm_sched_entity *entity = sched_job->entity;
>   bool first;
> + ktime_t ts =  ktime_get();
>  
>   trace_drm_sched_job(sched_job, entity);
>   atomic_inc(&entity->rq->sched->score);
>   WRITE_ONCE(entity->last_user, current->group_leader);
>   first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
> + sched_job->submit_ts = ts;
>  
>   /* first job wakes up scheduler */
>   if (first) {
> @@ -518,8 +533,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
> *sched_job)
>   DRM_ERROR("Trying to push to a killed entity\n");

[PATCH] drm/amdgpu: Skip put_reset_domain if it doesn't exist

2022-09-28 Thread Vignesh Chander
For XGMI SRIOV, the reset is handled by the host driver and hive->reset_domain
is not initialized, so we need to check that it exists before doing a put.
Signed-off-by: Vignesh Chander 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
index dc43fcb93eac..f5318fedf2f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
@@ -113,7 +113,8 @@ static inline bool amdgpu_reset_get_reset_domain(struct 
amdgpu_reset_domain *dom
 
 static inline void amdgpu_reset_put_reset_domain(struct amdgpu_reset_domain 
*domain)
 {
-   kref_put(&domain->refcount, amdgpu_reset_destroy_reset_domain);
+   if (domain)
+   kref_put(&domain->refcount, amdgpu_reset_destroy_reset_domain);
 }
 
 static inline bool amdgpu_reset_domain_schedule(struct amdgpu_reset_domain 
*domain,
-- 
2.25.1



Re: [PATCH] drm/amdgpu: remove switch from amdgpu_gmc_noretry_set

2022-09-28 Thread Alex Deucher
On Wed, Sep 28, 2022 at 11:22 AM Graham Sider  wrote:
>
> Simplify the logic in amdgpu_gmc_noretry_set by getting rid of the
> switch. Also set noretry=1 as default for GFX 10.3.0 and greater since
> retry faults are not supported.
>
> Signed-off-by: Graham Sider 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 48 +
>  1 file changed, 9 insertions(+), 39 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index aebc384531ac..34233a74248c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -572,45 +572,15 @@ void amdgpu_gmc_tmz_set(struct amdgpu_device *adev)
>  void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
>  {
> struct amdgpu_gmc *gmc = &adev->gmc;
> -
> -   switch (adev->ip_versions[GC_HWIP][0]) {
> -   case IP_VERSION(9, 0, 1):
> -   case IP_VERSION(9, 3, 0):
> -   case IP_VERSION(9, 4, 0):
> -   case IP_VERSION(9, 4, 1):
> -   case IP_VERSION(9, 4, 2):
> -   case IP_VERSION(10, 3, 3):
> -   case IP_VERSION(10, 3, 4):
> -   case IP_VERSION(10, 3, 5):
> -   case IP_VERSION(10, 3, 6):
> -   case IP_VERSION(10, 3, 7):
> -   /*
> -* noretry = 0 will cause kfd page fault tests fail
> -* for some ASICs, so set default to 1 for these ASICs.
> -*/
> -   if (amdgpu_noretry == -1)
> -   gmc->noretry = 1;
> -   else
> -   gmc->noretry = amdgpu_noretry;
> -   break;
> -   default:
> -   /* Raven currently has issues with noretry
> -* regardless of what we decide for other
> -* asics, we should leave raven with
> -* noretry = 0 until we root cause the
> -* issues.
> -*
> -* default this to 0 for now, but we may want
> -* to change this in the future for certain
> -* GPUs as it can increase performance in
> -* certain cases.
> -*/
> -   if (amdgpu_noretry == -1)
> -   gmc->noretry = 0;
> -   else
> -   gmc->noretry = amdgpu_noretry;
> -   break;
> -   }
> +   uint32_t gc_ver = adev->ip_versions[GC_HWIP][0];
> +   bool noretry_default = (gc_ver == IP_VERSION(9, 0, 1) ||
> +   gc_ver == IP_VERSION(9, 3, 0) ||
> +   gc_ver == IP_VERSION(9, 4, 0) ||
> +   gc_ver == IP_VERSION(9, 4, 1) ||
> +   gc_ver == IP_VERSION(9, 4, 2) ||
> +   gc_ver >= IP_VERSION(10, 3, 0));
> +
> +   gmc->noretry = (amdgpu_noretry == -1) ? noretry_default : 
> amdgpu_noretry;
>  }
>
>  void amdgpu_gmc_set_vm_fault_masks(struct amdgpu_device *adev, int hub_type,
> --
> 2.25.1
>


Re: [PATCH v5 1/1] drm/amdkfd: Track unified memory when switching xnack mode

2022-09-28 Thread Felix Kuehling



Am 2022-09-28 um 12:11 schrieb Philip Yang:

Unified memory usage with xnack off is tracked to avoid oversubscribing
system memory; with xnack on, we don't track unified memory usage, to
allow memory oversubscription. When switching xnack mode from off to on,
subsequently freed ranges allocated with xnack off will not unreserve
memory. When switching xnack mode from on to off, subsequently freed
ranges allocated with xnack on will unreserve memory. Both cases leave
the memory accounting unbalanced.

When switching xnack mode from on to off, we need to reserve memory for
the already allocated svm ranges. When switching xnack mode from off to
on, we need to unreserve memory for the already allocated svm ranges.

v5: Handle prange child ranges
v4: Handle reservation memory failure
v3: Handle switching xnack mode race with svm_range_deferred_list_work
v2: Handle both switching xnack from on to off and from off to on cases

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 26 ---
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 56 +++-
  drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  1 +
  3 files changed, 76 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 56f7307c21d2..5feaba6a77de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1584,6 +1584,8 @@ static int kfd_ioctl_smi_events(struct file *filep,
return kfd_smi_event_open(pdd->dev, &args->anon_fd);
  }
  
+#if IS_ENABLED(CONFIG_HSA_AMD_SVM)

+
  static int kfd_ioctl_set_xnack_mode(struct file *filep,
struct kfd_process *p, void *data)
  {
@@ -1594,22 +1596,29 @@ static int kfd_ioctl_set_xnack_mode(struct file *filep,
if (args->xnack_enabled >= 0) {
if (!list_empty(&p->pqm.queues)) {
pr_debug("Process has user queues running\n");
-   mutex_unlock(&p->mutex);
-   return -EBUSY;
+   r = -EBUSY;
+   goto out_unlock;
}
-   if (args->xnack_enabled && !kfd_process_xnack_mode(p, true))
+
+   if (p->xnack_enabled == args->xnack_enabled)
+   goto out_unlock;
+
+   if (args->xnack_enabled && !kfd_process_xnack_mode(p, true)) {
r = -EPERM;
-   else
-   p->xnack_enabled = args->xnack_enabled;
+   goto out_unlock;
+   }
+
+   r = svm_range_switch_xnack_reserve_mem(p, args->xnack_enabled);
} else {
args->xnack_enabled = p->xnack_enabled;
}
+
+out_unlock:
mutex_unlock(&p->mutex);
  
  	return r;

  }
  
-#if IS_ENABLED(CONFIG_HSA_AMD_SVM)

  static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void 
*data)
  {
struct kfd_ioctl_svm_args *args = data;
@@ -1629,6 +1638,11 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
return r;
  }
  #else
+static int kfd_ioctl_set_xnack_mode(struct file *filep,
+   struct kfd_process *p, void *data)
+{
+   return -EPERM;
+}
  static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void 
*data)
  {
return -EPERM;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index cf5b4005534c..ff47ac836bd4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -278,7 +278,7 @@ static void svm_range_free(struct svm_range *prange, bool 
update_mem_usage)
svm_range_free_dma_mappings(prange);
  
  	if (update_mem_usage && !p->xnack_enabled) {

-   pr_debug("unreserve mem limit: %lld\n", size);
+   pr_debug("unreserve prange 0x%p size: 0x%llx\n", prange, size);
amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
}
@@ -2956,6 +2956,60 @@ svm_range_restore_pages(struct amdgpu_device *adev, 
unsigned int pasid,
return r;
  }
  
+int

+svm_range_switch_xnack_reserve_mem(struct kfd_process *p, bool xnack_enabled)
+{
+   struct svm_range *prange, *pchild;
+   uint64_t reserved_size = 0;
+   uint64_t size;
+   int r = 0;
+
+   pr_debug("switching xnack from %d to %d\n", p->xnack_enabled, 
xnack_enabled);
+
+   mutex_lock(&p->svms.lock);
+
+   list_for_each_entry(prange, &p->svms.list, list) {
+   list_for_each_entry(pchild, &prange->child_list, child_list) {


I believe the child_list is not protected by the svms.lock because we 
update it in MMU notifiers. It is protected by the prange->lock of the 
parent range.


Regards,
  Felix
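
Something along these lines should work (sketch only; the prange->lock
naming follows the comment above and would need checking against the
actual field):

	list_for_each_entry(prange, &p->svms.list, list) {
		/* child_list is updated in MMU notifiers, so take the
		 * parent range's lock around the child iteration: */
		mutex_lock(&prange->lock);
		list_for_each_entry(pchild, &prange->child_list, child_list) {
			...
		}
		mutex_unlock(&prange->lock);
		...
	}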



+   size = (pchild->last - pchild->start + 1) << PAGE_SHIFT;
+   if (xnack_enabled) {
+   

[PATCH v5] drm/sched: Add FIFO sched policy to run queue

2022-09-28 Thread Andrey Grodzovsky
When many entities are competing for the same run queue
on the same scheduler, we observe unusually long wait
times and some jobs get starved. This has been observed on GPUVis.

The issue is due to the Round Robin policy used by schedulers
to pick the next entity's job queue for execution. Under the stress
of many entities and long job queues within an entity, some
jobs could be stuck for a very long time in their entity's
queue before being popped from the queue and executed,
while for other entities with smaller job queues a job
might execute earlier even though that job arrived later
than the job in the long queue.

Fix:
Add a FIFO selection policy to entities in the run queue; choose the
next entity on the run queue in such an order that if a job on one
entity arrived earlier than a job on another entity, the first job
starts executing earlier, regardless of the length of the entity's
job queue.
   
v2:
Switch to rb tree structure for entities based on TS of
oldest job waiting in the job queue of an entity. Improves next
entity extraction to O(1). Entity TS update
O(log N) where N is the number of entities in the run-queue
   
Drop default option in module control parameter.

v3:
Various cosmetical fixes and minor refactoring of fifo update function. (Luben)

v4:
Switch drm_sched_rq_select_entity_fifo to in order search (Luben)

v5: Fix up drm_sched_rq_select_entity_fifo loop
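
In rough outline, the v5 selection works like this (an illustrative
sketch; the field and helper names are assumptions, see the diff below
for the actual implementation):

	/* Entities are kept in an rb-tree keyed by the submission
	 * timestamp of the oldest job in each entity's queue.  Walk
	 * from the leftmost (oldest) node and return the first entity
	 * that is ready to run. */
	struct rb_node *rb;

	for (rb = rb_first_cached(&rq->rb_tree_root); rb; rb = rb_next(rb)) {
		struct drm_sched_entity *entity =
			rb_entry(rb, struct drm_sched_entity, rb_tree_node);

		if (drm_sched_entity_is_ready(entity))
			return entity;	/* oldest ready entity wins */
	}

	return NULL;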
   
Signed-off-by: Andrey Grodzovsky 
Tested-by: Li Yunxiang (Teddy) 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 26 ++-
 drivers/gpu/drm/scheduler/sched_main.c   | 99 +++-
 include/drm/gpu_scheduler.h  | 32 
 3 files changed, 151 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a3..f3ffce3c9304 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -73,6 +73,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
entity->priority = priority;
entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
entity->last_scheduled = NULL;
+   RB_CLEAR_NODE(&entity->rb_tree_node);
 
if(num_sched_list)
entity->rq = &sched_list[0]->sched_rq[entity->priority];
@@ -417,14 +418,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
 
sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
if (!sched_job)
-   return NULL;
+   goto skip;
 
while ((entity->dependency =
drm_sched_job_dependency(sched_job, entity))) {
trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
 
-   if (drm_sched_entity_add_dependency_cb(entity))
-   return NULL;
+   if (drm_sched_entity_add_dependency_cb(entity)) {
+   sched_job = NULL;
+   goto skip;
+   }
}
 
/* skip jobs from entity that marked guilty */
@@ -443,6 +446,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
smp_wmb();
 
spsc_queue_pop(&entity->job_queue);
+
+   /*
+* It's when head job is extracted we can access the next job (or empty)
+* queue and update the entity location in the min heap accordingly.
+*/
+skip:
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+   drm_sched_rq_update_fifo(entity,
+(sched_job ? sched_job->submit_ts : 
ktime_get()));
+
return sched_job;
 }
 
@@ -502,11 +515,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
 {
struct drm_sched_entity *entity = sched_job->entity;
bool first;
+   ktime_t ts =  ktime_get();
 
trace_drm_sched_job(sched_job, entity);
atomic_inc(&entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+   sched_job->submit_ts = ts;
 
/* first job wakes up scheduler */
if (first) {
@@ -518,8 +533,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
DRM_ERROR("Trying to push to a killed entity\n");
return;
}
+
drm_sched_rq_add_entity(entity->rq, entity);
spin_unlock(&entity->rq_lock);
+
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+   drm_sched_rq_update_fifo(entity, ts);
+
drm_sched_wakeup(entity->rq->sched);
}
 }
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 4f2395d1a791..5349fc049384 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -62,6 +62,58 @@
 #define to_drm_sched_job(sched_job)\
container_of((sched_job), 

Re: [PATCH] drm/amdgpu: Skip put_reset_domain if it doesn't exist

2022-09-28 Thread Christian König

Am 28.09.22 um 20:22 schrieb Vignesh Chander:

For XGMI SRIOV, the reset is handled by the host driver and hive->reset_domain
is not initialized, so we need to check that it exists before doing a put.


Once more: general practice is to make the *_put_*() functions NULL
tolerant.

So rather do that than this patch here.

Regards,
Christian.
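
I.e. the usual pattern is (sketch; this matches the reworked patch that
changes amdgpu_reset.h instead):

	static inline void
	amdgpu_reset_put_reset_domain(struct amdgpu_reset_domain *domain)
	{
		if (!domain)
			return;

		kref_put(&domain->refcount, amdgpu_reset_destroy_reset_domain);
	}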


Signed-off-by: Vignesh Chander 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 47159e9a0884..80fb6ef929e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -217,7 +217,8 @@ static void amdgpu_xgmi_hive_release(struct kobject *kobj)
struct amdgpu_hive_info *hive = container_of(
kobj, struct amdgpu_hive_info, kobj);
  
-	amdgpu_reset_put_reset_domain(hive->reset_domain);

+   if (hive->reset_domain)
+   amdgpu_reset_put_reset_domain(hive->reset_domain);
hive->reset_domain = NULL;
  
  	mutex_destroy(&hive->hive_lock);




[PATCH] drm/amdgpu: Skip put_reset_domain if it doesn't exist

2022-09-28 Thread Vignesh Chander
For XGMI SRIOV, the reset is handled by the host driver and hive->reset_domain
is not initialized, so we need to check that it exists before doing a put.
Signed-off-by: Vignesh Chander 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 47159e9a0884..80fb6ef929e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -217,7 +217,8 @@ static void amdgpu_xgmi_hive_release(struct kobject *kobj)
struct amdgpu_hive_info *hive = container_of(
kobj, struct amdgpu_hive_info, kobj);
 
-   amdgpu_reset_put_reset_domain(hive->reset_domain);
+   if (hive->reset_domain)
+   amdgpu_reset_put_reset_domain(hive->reset_domain);
hive->reset_domain = NULL;
 
mutex_destroy(&hive->hive_lock);
-- 
2.25.1



RE: [PATCH] drm/amdgpu: Skip put_reset_domain if it doesn't exist

2022-09-28 Thread Chander, Vignesh
[AMD Official Use Only - General]

Hi Christian, 

It is because the host driver handles the reset in the XGMI SRIOV case. I will
update the commit body as:

For XGMI SRIOV, the reset is handled by the host driver and hive->reset_domain
is not initialized, so we need to check that it exists before doing a put.

Regards, 

VIGNESH CHANDER 
Senior Software Development Engineer  |  AMD
SW GPU Virtualization
--
1 Commerce Valley Dr E, Markham, ON L3T 7X6, Canada
  

-Original Message-
From: Christian König  
Sent: Wednesday, September 28, 2022 1:45 PM
To: Chander, Vignesh ; amd-gfx@lists.freedesktop.org
Cc: Liu, Shaoyun 
Subject: Re: [PATCH] drm/amdgpu: Skip put_reset_domain if it doesn't exist

Am 28.09.22 um 19:43 schrieb Vignesh Chander:
> For SRIOV, the reset domain is no longer created, so we need to check
> that it exists before doing a put.

Why is the reset domain no longer created for SRIOV?

Regards,
Christian.

> Signed-off-by: Vignesh Chander 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> index 47159e9a0884..80fb6ef929e5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> @@ -217,7 +217,8 @@ static void amdgpu_xgmi_hive_release(struct kobject *kobj)
>   struct amdgpu_hive_info *hive = container_of(
>   kobj, struct amdgpu_hive_info, kobj);
>   
> - amdgpu_reset_put_reset_domain(hive->reset_domain);
> + if (hive->reset_domain)
> + amdgpu_reset_put_reset_domain(hive->reset_domain);
>   hive->reset_domain = NULL;
>   
>   mutex_destroy(&hive->hive_lock);


Re: [PATCH 09/10] drm/amdgpu/gfx10: switch to amdgpu_gfx_rlc_init_microcode

2022-09-28 Thread Dmitry Osipenko
On 9/28/22 20:44, Deucher, Alexander wrote:
> [AMD Official Use Only - General]
> 
> This should be fixed in a backwards compatible way with this patch:
> https://patchwork.freedesktop.org/patch/504869/

Good to know that it's already addressed, thank you very much for the
quick reply.

-- 
Best regards,
Dmitry



Re: RE: [PATCH 09/10] drm/amdgpu/gfx10: switch to amdgpu_gfx_rlc_init_microcode

2022-09-28 Thread Dmitry Osipenko
On 9/24/22 10:14, Zhang, Hawking wrote:
> [Public]
> 
> Hi Alex,
> 
> The Sienna Cichlid RLC firmware binary needs to be updated to match the
> kernel change.
> 
> I shall add this comment to commit description.
> 
> Regards,
> Hawking
> 
> From: Deucher, Alexander 
> Sent: Saturday, September 24, 2022 05:36
> To: Zhang, Hawking ; amd-gfx@lists.freedesktop.org; 
> Gao, Likun 
> Subject: Re: [PATCH 09/10] drm/amdgpu/gfx10: switch to 
> amdgpu_gfx_rlc_init_microcode
> 
> 
> [Public]
> 
> This set seems to break RLC firmware loading on sienna cichlid.

This also broke Beige Goby. Please fix or revert. Asking users to upgrade
firmware is unacceptable.

-- 
Best regards,
Dmitry



Re: [PATCH] drm/amdgpu: Skip put_reset_domain if it doesn't exist

2022-09-28 Thread Christian König

Am 28.09.22 um 19:43 schrieb Vignesh Chander:

For SRIOV, the reset domain is no longer created, so we need to check that it
exists before doing a put.


Why is the reset domain no longer created for SRIOV?

Regards,
Christian.


Signed-off-by: Vignesh Chander 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 47159e9a0884..80fb6ef929e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -217,7 +217,8 @@ static void amdgpu_xgmi_hive_release(struct kobject *kobj)
struct amdgpu_hive_info *hive = container_of(
kobj, struct amdgpu_hive_info, kobj);
  
-	amdgpu_reset_put_reset_domain(hive->reset_domain);

+   if (hive->reset_domain)
+   amdgpu_reset_put_reset_domain(hive->reset_domain);
hive->reset_domain = NULL;
  
  	mutex_destroy(&hive->hive_lock);




RE: [PATCH] drm/amdgpu: Skip put_reset_domain if it doesn't exist

2022-09-28 Thread Liu, Shaoyun
[AMD Official Use Only - General]

Please add a description, e.g.: under the SRIOV XGMI configuration, the hive
reset is handled by the host driver; hive->reset_domain is not initialized,
so we need to skip it.

Regards
Shaoyun.liu


-Original Message-
From: Chander, Vignesh 
Sent: Wednesday, September 28, 2022 1:38 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Shaoyun ; Chander, Vignesh 

Subject: [PATCH] drm/amdgpu: Skip put_reset_domain if it doesn't exist

Change-Id: Ifd6121fb94db3fadaa1dee61d35699abe1259409
Signed-off-by: Vignesh Chander 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 47159e9a0884..80fb6ef929e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -217,7 +217,8 @@ static void amdgpu_xgmi_hive_release(struct kobject *kobj)
struct amdgpu_hive_info *hive = container_of(
kobj, struct amdgpu_hive_info, kobj);

-   amdgpu_reset_put_reset_domain(hive->reset_domain);
+   if (hive->reset_domain)
+   amdgpu_reset_put_reset_domain(hive->reset_domain);
hive->reset_domain = NULL;

mutex_destroy(&hive->hive_lock);
--
2.25.1



Re: [PATCH] drm/amdgpu: Skip put_reset_domain if it doesn't exist

2022-09-28 Thread Christian König




Am 28.09.22 um 19:37 schrieb Vignesh Chander:

Change-Id: Ifd6121fb94db3fadaa1dee61d35699abe1259409
Signed-off-by: Vignesh Chander 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 47159e9a0884..80fb6ef929e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -217,7 +217,8 @@ static void amdgpu_xgmi_hive_release(struct kobject *kobj)
struct amdgpu_hive_info *hive = container_of(
kobj, struct amdgpu_hive_info, kobj);
  
-	amdgpu_reset_put_reset_domain(hive->reset_domain);

+   if (hive->reset_domain)
+   amdgpu_reset_put_reset_domain(hive->reset_domain);


As a general coding style rule, _put functions are usually NULL-pointer
tolerant. So this shouldn't be necessary.

If that really is necessary, then please adjust the
amdgpu_reset_put_reset_domain() function instead.


Christian.


hive->reset_domain = NULL;
  
  	mutex_destroy(&hive->hive_lock);




Re: RE: [PATCH 09/10] drm/amdgpu/gfx10: switch to amdgpu_gfx_rlc_init_microcode

2022-09-28 Thread Deucher, Alexander
[AMD Official Use Only - General]

This should be fixed in a backwards compatible way with this patch:
https://patchwork.freedesktop.org/patch/504869/

Alex

From: Dmitry Osipenko 
Sent: Wednesday, September 28, 2022 1:40 PM
To: Zhang, Hawking ; Deucher, Alexander 
; amd-gfx@lists.freedesktop.org 
; Gao, Likun 
Subject: Re: RE: [PATCH 09/10] drm/amdgpu/gfx10: switch to 
amdgpu_gfx_rlc_init_microcode

On 9/24/22 10:14, Zhang, Hawking wrote:
> [Public]
>
> Hi Alex,
>
> The Sienna Cichlid RLC firmware binary needs to be updated to match the
> kernel change.
>
> I shall add this comment to commit description.
>
> Regards,
> Hawking
>
> From: Deucher, Alexander 
> Sent: Saturday, September 24, 2022 05:36
> To: Zhang, Hawking ; amd-gfx@lists.freedesktop.org; 
> Gao, Likun 
> Subject: Re: [PATCH 09/10] drm/amdgpu/gfx10: switch to 
> amdgpu_gfx_rlc_init_microcode
>
>
> [Public]
>
> This set seems to break RLC firmware loading on sienna cichlid.

This also broke Beige Goby. Please fix or revert. Asking users to upgrade
firmware is unacceptable.

--
Best regards,
Dmitry



[PATCH] drm/amdgpu: Skip put_reset_domain if it doesn't exist

2022-09-28 Thread Vignesh Chander
For SRIOV, the reset domain is no longer created, so we need to check that it
exists before doing a put.
Signed-off-by: Vignesh Chander 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 47159e9a0884..80fb6ef929e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -217,7 +217,8 @@ static void amdgpu_xgmi_hive_release(struct kobject *kobj)
struct amdgpu_hive_info *hive = container_of(
kobj, struct amdgpu_hive_info, kobj);
 
-   amdgpu_reset_put_reset_domain(hive->reset_domain);
+   if (hive->reset_domain)
+   amdgpu_reset_put_reset_domain(hive->reset_domain);
hive->reset_domain = NULL;
 
mutex_destroy(&hive->hive_lock);
-- 
2.25.1



Re: [PATCH] drm/amdgpu: Skip put_reset_domain if it doesn't exist

2022-09-28 Thread Hamza Mahfooz

On 2022-09-28 13:37, Vignesh Chander wrote:

Change-Id: Ifd6121fb94db3fadaa1dee61d35699abe1259409
Signed-off-by: Vignesh Chander 


Please remove the Change-Id and provide a commit message body.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 47159e9a0884..80fb6ef929e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -217,7 +217,8 @@ static void amdgpu_xgmi_hive_release(struct kobject *kobj)
struct amdgpu_hive_info *hive = container_of(
kobj, struct amdgpu_hive_info, kobj);
  
-	amdgpu_reset_put_reset_domain(hive->reset_domain);

+   if (hive->reset_domain)
+   amdgpu_reset_put_reset_domain(hive->reset_domain);
hive->reset_domain = NULL;
  
  	mutex_destroy(&hive->hive_lock);


--
Hamza



[PATCH] drm/amdgpu: Skip put_reset_domain if it doesn't exist

2022-09-28 Thread Vignesh Chander
Change-Id: Ifd6121fb94db3fadaa1dee61d35699abe1259409
Signed-off-by: Vignesh Chander 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 47159e9a0884..80fb6ef929e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -217,7 +217,8 @@ static void amdgpu_xgmi_hive_release(struct kobject *kobj)
struct amdgpu_hive_info *hive = container_of(
kobj, struct amdgpu_hive_info, kobj);
 
-   amdgpu_reset_put_reset_domain(hive->reset_domain);
+   if (hive->reset_domain)
+   amdgpu_reset_put_reset_domain(hive->reset_domain);
hive->reset_domain = NULL;
 
mutex_destroy(&hive->hive_lock);
-- 
2.25.1



Re: [PATCH] drm/amdgpu: Fix mc_umc_status used uninitialized warning

2022-09-28 Thread Leo Li




On 2022-09-28 10:53, Hamza Mahfooz wrote:

On 2022-09-28 10:49, sunpeng...@amd.com wrote:

From: Leo Li 

On a ChromeOS clang build, the following warning is seen:

/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:463:6:
 error: variable 'mc_umc_status' is used uninitialized whenever 'if' condition 
is false [-Werror,-Wsometimes-uninitialized]
 if (mca_addr == UMC_INVALID_ADDR) {
 ^~~~
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:485:21:
 note: uninitialized use occurs here
 if ((REG_GET_FIELD(mc_umc_status, 
MCA_UMC_UMC0_MCUMC_STATUST0, Val) == 1 &&

    ^
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu.h:1208:5:
 note: expanded from macro 'REG_GET_FIELD'
 (((value) & REG_FIELD_MASK(reg, field)) >> 
REG_FIELD_SHIFT(reg, field))

    ^
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:463:2:
 note: remove the 'if' if its condition is always true
 if (mca_addr == UMC_INVALID_ADDR) {
 ^~
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:460:24:
 note: initialize the variable 'mc_umc_status' to silence this warning
 uint64_t mc_umc_status, mc_umc_addrt0;
   ^
    = 0
1 error generated.
make[5]: *** 
[/mnt/host/source/src/third_party/kernel/v5.15/scripts/Makefile.build:289: drivers/gpu/drm/amd/amdgpu/umc_v6_7.o] Error 1


Fix by initializing mc_umc_status = 0.

Fixes: d8e19e32945e ("drm/amdgpu: support to convert dedicated umc mca 
address")

Signed-off-by: Leo Li 


Reviewed-by: Hamza Mahfooz 


Thanks! Will merge shortly.
Leo


[PATCH v2] drm/amd/display: Prevent OTG shutdown during PSR SU

2022-09-28 Thread sunpeng.li
From: Leo Li 

[Why]

Enabling Z10 optimizations allows DMUB to disable the OTG during PSR
link-off. This theoretically saves power by putting more of the display
hardware to sleep. However, we observe that with PSR SU, it causes
visual artifacts, higher power usage, and potential system hang.

This is partly due to an odd behavior with the VStartup interrupt used
to signal DRM vblank events. If the OTG is toggled on/off during a PSR
link on/off cycle, the vstartup interrupt fires twice in quick
succession. This generates incorrectly timed vblank events.
Additionally, it can cause cursor updates to generate visual artifacts.

Note that this is not observed with PSR1 since PSR is fully disabled
when there are vblank event requestors. Cursor updates are also
artifact-free, likely because there are no selectively-updated (SU)
frames that can generate artifacts.

[How]

A potential solution is to disable z10 idle optimizations only when fast
updates (flips & cursor updates) are committed. A mechanism to do so
would require some thoughtful design. Let's just disable idle
optimizations for PSR2 for now.

Fixes: 7cc191ee7621 ("drm/amd/display: Implement MPO PSR SU")
Reported-by: August Wikerfors 
Link: 
https://lore.kernel.org/r/c1f8886a-5624-8f49-31b1-e42b6d20d...@augustwikerfors.se/
Tested-by: August Wikerfors 
Reviewed-by: Harry Wentland 
Signed-off-by: Leo Li 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
index c8da18e45b0e..8ca10ab3dfc1 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
@@ -170,7 +170,13 @@ bool amdgpu_dm_psr_enable(struct dc_stream_state *stream)
   &stream, 1,
   &params);
 
-   power_opt |= psr_power_opt_z10_static_screen;
+   /*
+* Only enable static-screen optimizations for PSR1. For PSR SU, this
+* causes vstartup interrupt issues, used by amdgpu_dm to send vblank
+* events.
+*/
+   if (link->psr_settings.psr_version < DC_PSR_VERSION_SU_1)
+   power_opt |= psr_power_opt_z10_static_screen;
 
return dc_link_set_psr_allow_active(link, &psr_enable, false, false,
&power_opt);
 }
-- 
2.37.3



Re: [PATCH] drm/amd/display: Prevent OTG shutdown during PSR SU

2022-09-28 Thread Leo Li




On 2022-09-28 09:52, Harry Wentland wrote:



On 2022-09-27 19:13, sunpeng...@amd.com wrote:

From: Leo Li 

[Why]

Enabling Z10 optimizations allows DMUB to disable the OTG during PSR
link-off. This theoretically saves power by putting more of the display
hardware to sleep. However, we observe that with PSR SU, it causes
visual artifacts, higher power usage, and potential system hang.

This is partly due to an odd behavior with the VStartup interrupt used
to signal DRM vblank events. If the OTG is toggled on/off during a PSR
link on/off cycle, the vstartup interrupt fires twice in quick
succession. This generates incorrectly timed vblank events.
Additionally, it can cause cursor updates to generate visual artifacts.

Note that this is not observed with PSR1 since PSR is fully disabled
when there are vblank event requestors. Cursor updates are also
artifact-free, likely because there are no selectively-updated (SU)
frames that can generate artifacts.

[How]

A potential solution is to disable z10 idle optimizations only when fast
updates (flips & cursor updates) are committed. A mechanism to do so
would require some thoughtful design. Let's just disable idle
optimizations for PSR2 for now.



Great writeup. Wish every bugfix came with details like this.


Fixes: 7cc191ee7621 ("drm/amd/display: Implement MPO PSR SU")
Signed-off-by: Leo Li 


With Thorsten's comments addressed this is
Reviewed-by: Harry Wentland 

Harry



Thanks all for the bug report, comments, and review. Will send out v2 
and merge shortly.


Leo


---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
index c8da18e45b0e..8ca10ab3dfc1 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
@@ -170,7 +170,13 @@ bool amdgpu_dm_psr_enable(struct dc_stream_state *stream)
   &stream, 1,
   &params);
  
-	power_opt |= psr_power_opt_z10_static_screen;

+   /*
+* Only enable static-screen optimizations for PSR1. For PSR SU, this
+* causes vstartup interrupt issues, used by amdgpu_dm to send vblank
+* events.
+*/
+   if (link->psr_settings.psr_version < DC_PSR_VERSION_SU_1)
+   power_opt |= psr_power_opt_z10_static_screen;
  
  	return dc_link_set_psr_allow_active(link, &psr_enable, false, false, &power_opt);

  }




Re: [PATCH v2 8/8] hmm-tests: Add test for migrate_device_range()

2022-09-28 Thread Andrew Morton
On Wed, 28 Sep 2022 22:01:22 +1000 Alistair Popple  wrote:

> @@ -1401,22 +1494,7 @@ static int dmirror_device_init(struct dmirror_device 
> *mdevice, int id)
>  
>  static void dmirror_device_remove(struct dmirror_device *mdevice)
>  {
> - unsigned int i;
> -
> - if (mdevice->devmem_chunks) {
> - for (i = 0; i < mdevice->devmem_count; i++) {
> - struct dmirror_chunk *devmem =
> - mdevice->devmem_chunks[i];
> -
> - memunmap_pages(&devmem->pagemap);
> - if (devmem->pagemap.type == MEMORY_DEVICE_PRIVATE)
> - release_mem_region(devmem->pagemap.range.start,
> -
> range_len(&devmem->pagemap.range));
> - kfree(devmem);
> - }
> - kfree(mdevice->devmem_chunks);
> - }
> -
> + dmirror_device_remove_chunks(mdevice);
>   cdev_del(&mdevice->cdevice);
>  }

Needed a bit of rework due to
https://lkml.kernel.org/r/20220826050631.25771-1-mpent...@redhat.com. 
Please check my resolution.


--- a/lib/test_hmm.c~hmm-tests-add-test-for-migrate_device_range
+++ a/lib/test_hmm.c
@@ -100,6 +100,7 @@ struct dmirror {
 struct dmirror_chunk {
struct dev_pagemap  pagemap;
struct dmirror_device   *mdevice;
+   bool remove;
 };
 
 /*
@@ -192,11 +193,15 @@ static int dmirror_fops_release(struct i
return 0;
 }
 
+static struct dmirror_chunk *dmirror_page_to_chunk(struct page *page)
+{
+   return container_of(page->pgmap, struct dmirror_chunk, pagemap);
+}
+
 static struct dmirror_device *dmirror_page_to_device(struct page *page)
 
 {
-   return container_of(page->pgmap, struct dmirror_chunk,
-   pagemap)->mdevice;
+   return dmirror_page_to_chunk(page)->mdevice;
 }
 
 static int dmirror_do_fault(struct dmirror *dmirror, struct hmm_range *range)
@@ -1218,6 +1223,85 @@ static int dmirror_snapshot(struct dmirr
return ret;
 }
 
+static void dmirror_device_evict_chunk(struct dmirror_chunk *chunk)
+{
+   unsigned long start_pfn = chunk->pagemap.range.start >> PAGE_SHIFT;
+   unsigned long end_pfn = chunk->pagemap.range.end >> PAGE_SHIFT;
+   unsigned long npages = end_pfn - start_pfn + 1;
+   unsigned long i;
+   unsigned long *src_pfns;
+   unsigned long *dst_pfns;
+
+   src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL);
+   dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL);
+
+   migrate_device_range(src_pfns, start_pfn, npages);
+   for (i = 0; i < npages; i++) {
+   struct page *dpage, *spage;
+
+   spage = migrate_pfn_to_page(src_pfns[i]);
+   if (!spage || !(src_pfns[i] & MIGRATE_PFN_MIGRATE))
+   continue;
+
+   if (WARN_ON(!is_device_private_page(spage) &&
+   !is_device_coherent_page(spage)))
+   continue;
+   spage = BACKING_PAGE(spage);
+   dpage = alloc_page(GFP_HIGHUSER_MOVABLE | __GFP_NOFAIL);
+   lock_page(dpage);
+   copy_highpage(dpage, spage);
+   dst_pfns[i] = migrate_pfn(page_to_pfn(dpage));
+   if (src_pfns[i] & MIGRATE_PFN_WRITE)
+   dst_pfns[i] |= MIGRATE_PFN_WRITE;
+   }
+   migrate_device_pages(src_pfns, dst_pfns, npages);
+   migrate_device_finalize(src_pfns, dst_pfns, npages);
+   kfree(src_pfns);
+   kfree(dst_pfns);
+}
+
+/* Removes free pages from the free list so they can't be re-allocated */
+static void dmirror_remove_free_pages(struct dmirror_chunk *devmem)
+{
+   struct dmirror_device *mdevice = devmem->mdevice;
+   struct page *page;
+
+   for (page = mdevice->free_pages; page; page = page->zone_device_data)
+   if (dmirror_page_to_chunk(page) == devmem)
+   mdevice->free_pages = page->zone_device_data;
+}
+
+static void dmirror_device_remove_chunks(struct dmirror_device *mdevice)
+{
+   unsigned int i;
+
+   mutex_lock(&mdevice->devmem_lock);
+   if (mdevice->devmem_chunks) {
+   for (i = 0; i < mdevice->devmem_count; i++) {
+   struct dmirror_chunk *devmem =
+   mdevice->devmem_chunks[i];
+
+   spin_lock(&mdevice->lock);
+   devmem->remove = true;
+   dmirror_remove_free_pages(devmem);
+   spin_unlock(&mdevice->lock);
+
+   dmirror_device_evict_chunk(devmem);
+   memunmap_pages(&devmem->pagemap);
+   if (devmem->pagemap.type == MEMORY_DEVICE_PRIVATE)
+   release_mem_region(devmem->pagemap.range.start,
+  range_len(&devmem->pagemap.range));
+   kfree(devmem);
+   }
+   mdevice->devmem_count = 0;
+   

[PATCH v5 1/1] drm/amdkfd: Track unified memory when switching xnack mode

2022-09-28 Thread Philip Yang
Unified memory usage with xnack off is tracked to avoid oversubscribing
system memory; with xnack on, we don't track unified memory usage, to
allow memory oversubscription. When switching xnack mode from off to on,
subsequently freed ranges allocated with xnack off will not unreserve
memory. When switching xnack mode from on to off, subsequently freed
ranges allocated with xnack on will unreserve memory. Both cases leave
the memory accounting unbalanced.

When switching xnack mode from on to off, we need to reserve memory for
the already allocated svm ranges. When switching xnack mode from off to
on, we need to unreserve memory for the already allocated svm ranges.

v5: Handle prange child ranges
v4: Handle reservation memory failure
v3: Handle switching xnack mode race with svm_range_deferred_list_work
v2: Handle both switching xnack from on to off and from off to on cases

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 26 ---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 56 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  1 +
 3 files changed, 76 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 56f7307c21d2..5feaba6a77de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1584,6 +1584,8 @@ static int kfd_ioctl_smi_events(struct file *filep,
return kfd_smi_event_open(pdd->dev, &args->anon_fd);
 }
 
+#if IS_ENABLED(CONFIG_HSA_AMD_SVM)
+
 static int kfd_ioctl_set_xnack_mode(struct file *filep,
struct kfd_process *p, void *data)
 {
@@ -1594,22 +1596,29 @@ static int kfd_ioctl_set_xnack_mode(struct file *filep,
if (args->xnack_enabled >= 0) {
if (!list_empty(&p->pqm.queues)) {
pr_debug("Process has user queues running\n");
-   mutex_unlock(&p->mutex);
-   return -EBUSY;
+   r = -EBUSY;
+   goto out_unlock;
}
-   if (args->xnack_enabled && !kfd_process_xnack_mode(p, true))
+
+   if (p->xnack_enabled == args->xnack_enabled)
+   goto out_unlock;
+
+   if (args->xnack_enabled && !kfd_process_xnack_mode(p, true)) {
r = -EPERM;
-   else
-   p->xnack_enabled = args->xnack_enabled;
+   goto out_unlock;
+   }
+
+   r = svm_range_switch_xnack_reserve_mem(p, args->xnack_enabled);
} else {
args->xnack_enabled = p->xnack_enabled;
}
+
+out_unlock:
mutex_unlock(&p->mutex);
 
return r;
 }
 
-#if IS_ENABLED(CONFIG_HSA_AMD_SVM)
 static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data)
 {
struct kfd_ioctl_svm_args *args = data;
@@ -1629,6 +1638,11 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
return r;
 }
 #else
+static int kfd_ioctl_set_xnack_mode(struct file *filep,
+   struct kfd_process *p, void *data)
+{
+   return -EPERM;
+}
 static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data)
 {
return -EPERM;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index cf5b4005534c..ff47ac836bd4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -278,7 +278,7 @@ static void svm_range_free(struct svm_range *prange, bool 
update_mem_usage)
svm_range_free_dma_mappings(prange);
 
if (update_mem_usage && !p->xnack_enabled) {
-   pr_debug("unreserve mem limit: %lld\n", size);
+   pr_debug("unreserve prange 0x%p size: 0x%llx\n", prange, size);
amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
}
@@ -2956,6 +2956,60 @@ svm_range_restore_pages(struct amdgpu_device *adev, 
unsigned int pasid,
return r;
 }
 
+int
+svm_range_switch_xnack_reserve_mem(struct kfd_process *p, bool xnack_enabled)
+{
+   struct svm_range *prange, *pchild;
+   uint64_t reserved_size = 0;
+   uint64_t size;
+   int r = 0;
+
+   pr_debug("switching xnack from %d to %d\n", p->xnack_enabled, 
xnack_enabled);
+
+   mutex_lock(&p->svms.lock);
+
+   list_for_each_entry(prange, &p->svms.list, list) {
+   list_for_each_entry(pchild, &prange->child_list, child_list) {
+   size = (pchild->last - pchild->start + 1) << PAGE_SHIFT;
+   if (xnack_enabled) {
+   amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
+   
KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
+   } else {
+   r = 

Re: [PATCH v4 1/1] drm/amdkfd: Track unified memory when switching xnack mode

2022-09-28 Thread Felix Kuehling

Am 2022-09-28 um 11:55 schrieb Philip Yang:


On 2022-09-27 14:58, Felix Kuehling wrote:

Am 2022-09-27 um 11:43 schrieb Philip Yang:

Unified memory usage with xnack off is tracked to avoid oversubscribing
system memory; with xnack on, we don't track unified memory usage, to
allow memory oversubscription. When switching xnack mode from off to on,
subsequently freed ranges allocated with xnack off will not unreserve
memory. When switching xnack mode from on to off, subsequently freed
ranges allocated with xnack on will unreserve memory. Both cases leave
the memory accounting unbalanced.

When switching xnack mode from on to off, we need to reserve memory for
the already allocated svm ranges. When switching xnack mode from off to
on, we need to unreserve memory for the already allocated svm ranges.

v4: Handle reservation memory failure
v3: Handle switching xnack mode race with svm_range_deferred_list_work
v2: Handle both switching xnack from on to off and from off to on cases

Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 26 ++
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 44 
+++-

  drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  2 +-
  3 files changed, 64 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c

index 56f7307c21d2..5feaba6a77de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1584,6 +1584,8 @@ static int kfd_ioctl_smi_events(struct file 
*filep,

  return kfd_smi_event_open(pdd->dev, &args->anon_fd);
  }
  +#if IS_ENABLED(CONFIG_HSA_AMD_SVM)
+
  static int kfd_ioctl_set_xnack_mode(struct file *filep,
  struct kfd_process *p, void *data)
  {
@@ -1594,22 +1596,29 @@ static int kfd_ioctl_set_xnack_mode(struct 
file *filep,

  if (args->xnack_enabled >= 0) {
  if (!list_empty(&p->pqm.queues)) {
  pr_debug("Process has user queues running\n");
-    mutex_unlock(&p->mutex);
-    return -EBUSY;
+    r = -EBUSY;
+    goto out_unlock;
  }
-    if (args->xnack_enabled && !kfd_process_xnack_mode(p, true))
+
+    if (p->xnack_enabled == args->xnack_enabled)
+    goto out_unlock;
+
+    if (args->xnack_enabled && !kfd_process_xnack_mode(p, true)) {
  r = -EPERM;
-    else
-    p->xnack_enabled = args->xnack_enabled;
+    goto out_unlock;
+    }
+
+    r = svm_range_switch_xnack_reserve_mem(p, 
args->xnack_enabled);

  } else {
  args->xnack_enabled = p->xnack_enabled;
  }
+
+out_unlock:
  mutex_unlock(&p->mutex);
    return r;
  }
  -#if IS_ENABLED(CONFIG_HSA_AMD_SVM)
  static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data)
  {
  struct kfd_ioctl_svm_args *args = data;
@@ -1629,6 +1638,11 @@ static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data)
  return r;
  }
  #else
+static int kfd_ioctl_set_xnack_mode(struct file *filep,
+    struct kfd_process *p, void *data)
+{
+    return -EPERM;
+}
  static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data)
  {
  return -EPERM;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index cf5b4005534c..13d2867faa0c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -278,7 +278,7 @@ static void svm_range_free(struct svm_range *prange, bool update_mem_usage)
  svm_range_free_dma_mappings(prange);
    if (update_mem_usage && !p->xnack_enabled) {
-    pr_debug("unreserve mem limit: %lld\n", size);
+    pr_debug("unreserve prange 0x%p size: 0x%llx\n", prange, 
size);

  amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
  KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
  }
@@ -2956,6 +2956,48 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
  return r;
  }
  +int
+svm_range_switch_xnack_reserve_mem(struct kfd_process *p, bool xnack_enabled)
+{
+    struct svm_range *prange;
+    uint64_t reserved_size = 0;
+    uint64_t size;
+    int r = 0;
+
+    pr_debug("switching xnack from %d to %d\n", p->xnack_enabled, 
xnack_enabled);

+
+    mutex_lock(&p->svms.lock);
+
+    /* The xnack mode change must be made while holding the svms lock,
+ * to avoid racing with svm_range_deferred_list_work unreserving
+ * memory in parallel.
+ */
+    p->xnack_enabled = xnack_enabled;


This should only be set on a successful return.



+
+    list_for_each_entry(prange, &p->svms.list, list) {
+    size = (prange->last - prange->start + 1) << PAGE_SHIFT;
+    pr_debug("svms 0x%p %s prange 0x%p size 0x%llx\n", >svms,
+ xnack_enabled ? "unreserve" : "reserve", prange, size);
+
+    if (xnack_enabled) {
+    amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
+    KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
+    } else {
+

Re: [PATCH v4 1/1] drm/amdkfd: Track unified memory when switching xnack mode

2022-09-28 Thread Philip Yang



On 2022-09-27 14:58, Felix Kuehling wrote:

Am 2022-09-27 um 11:43 schrieb Philip Yang:

Unified memory usage with xnack off is tracked to avoid oversubscribing
system memory; with xnack on, unified memory usage is not tracked, so
that memory can be oversubscribed. When switching xnack mode from off to
on, subsequently freed ranges that were allocated with xnack off will not
unreserve memory. When switching xnack mode from on to off, subsequently
freed ranges that were allocated with xnack on will unreserve memory.
Both cases leave the memory accounting unbalanced.

When switching xnack mode from on to off, the already allocated svm range
memory must be reserved. When switching xnack mode from off to on, the
already allocated svm range memory must be unreserved.

v4: Handle memory reservation failure
v3: Handle switching xnack mode race with svm_range_deferred_list_work
v2: Handle both switching xnack from on to off and from off to on cases

Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 26 ++
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 44 +++-
  drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  2 +-
  3 files changed, 64 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 56f7307c21d2..5feaba6a77de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1584,6 +1584,8 @@ static int kfd_ioctl_smi_events(struct file *filep,
  return kfd_smi_event_open(pdd->dev, &args->anon_fd);
  }
  +#if IS_ENABLED(CONFIG_HSA_AMD_SVM)
+
  static int kfd_ioctl_set_xnack_mode(struct file *filep,
  struct kfd_process *p, void *data)
  {
@@ -1594,22 +1596,29 @@ static int kfd_ioctl_set_xnack_mode(struct file *filep,
  if (args->xnack_enabled >= 0) {
  if (!list_empty(&p->pqm.queues)) {
  pr_debug("Process has user queues running\n");
-    mutex_unlock(&p->mutex);
-    return -EBUSY;
+    r = -EBUSY;
+    goto out_unlock;
  }
-    if (args->xnack_enabled && !kfd_process_xnack_mode(p, true))
+
+    if (p->xnack_enabled == args->xnack_enabled)
+    goto out_unlock;
+
+    if (args->xnack_enabled && !kfd_process_xnack_mode(p, true)) {
  r = -EPERM;
-    else
-    p->xnack_enabled = args->xnack_enabled;
+    goto out_unlock;
+    }
+
+    r = svm_range_switch_xnack_reserve_mem(p, args->xnack_enabled);
  } else {
  args->xnack_enabled = p->xnack_enabled;
  }
+
+out_unlock:
  mutex_unlock(&p->mutex);
    return r;
  }
  -#if IS_ENABLED(CONFIG_HSA_AMD_SVM)
  static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data)
  {
  struct kfd_ioctl_svm_args *args = data;
@@ -1629,6 +1638,11 @@ static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data)
  return r;
  }
  #else
+static int kfd_ioctl_set_xnack_mode(struct file *filep,
+    struct kfd_process *p, void *data)
+{
+    return -EPERM;
+}
  static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data)
  {
  return -EPERM;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index cf5b4005534c..13d2867faa0c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -278,7 +278,7 @@ static void svm_range_free(struct svm_range *prange, bool update_mem_usage)
  svm_range_free_dma_mappings(prange);
    if (update_mem_usage && !p->xnack_enabled) {
-    pr_debug("unreserve mem limit: %lld\n", size);
+    pr_debug("unreserve prange 0x%p size: 0x%llx\n", prange, size);
  amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
  KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
  }
@@ -2956,6 +2956,48 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
  return r;
  }
  +int
+svm_range_switch_xnack_reserve_mem(struct kfd_process *p, bool xnack_enabled)
+{
+    struct svm_range *prange;
+    uint64_t reserved_size = 0;
+    uint64_t size;
+    int r = 0;
+
+    pr_debug("switching xnack from %d to %d\n", p->xnack_enabled, 
xnack_enabled);

+
+    mutex_lock(&p->svms.lock);
+
+    /* The xnack mode change must be made while holding the svms lock,
+ * to avoid racing with svm_range_deferred_list_work unreserving
+ * memory in parallel.
+ */
+    p->xnack_enabled = xnack_enabled;


This should only be set on a successful return.



+
+    list_for_each_entry(prange, &p->svms.list, list) {
+    size = (prange->last - prange->start + 1) << PAGE_SHIFT;
+    pr_debug("svms 0x%p %s prange 0x%p size 0x%llx\n", >svms,
+ xnack_enabled ? "unreserve" : "reserve", prange, size);
+
+    if (xnack_enabled) {
+    amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
+    KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
+    } else {
+    r = amdgpu_amdkfd_reserve_mem_limit(NULL, size,
+ 
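
A minimal sketch of the tail of that loop with Felix's point applied --
committing p->xnack_enabled only after every reservation has succeeded,
and rolling back on failure (illustrative only; the exact rollback
details here are an assumption, not necessarily the final revision):

    list_for_each_entry(prange, &p->svms.list, list) {
        size = (prange->last - prange->start + 1) << PAGE_SHIFT;
        if (xnack_enabled) {
            amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
                    KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
        } else {
            r = amdgpu_amdkfd_reserve_mem_limit(NULL, size,
                    KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
            if (r)
                break;
            /* Track how much to give back if a later range fails. */
            reserved_size += size;
        }
    }

    if (r)
        /* Undo the partial reservation. */
        amdgpu_amdkfd_unreserve_mem_limit(NULL, reserved_size,
                KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
    else
        /* Commit the mode change only on full success. */
        p->xnack_enabled = xnack_enabled;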

[PATCH] drm/amdgpu: remove switch from amdgpu_gmc_noretry_set

2022-09-28 Thread Graham Sider
Simplify the logic in amdgpu_gmc_noretry_set by getting rid of the
switch. Also set noretry=1 as default for GFX 10.3.0 and greater since
retry faults are not supported.

Signed-off-by: Graham Sider 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 48 +
 1 file changed, 9 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index aebc384531ac..34233a74248c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -572,45 +572,15 @@ void amdgpu_gmc_tmz_set(struct amdgpu_device *adev)
 void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
 {
	struct amdgpu_gmc *gmc = &adev->gmc;
-
-   switch (adev->ip_versions[GC_HWIP][0]) {
-   case IP_VERSION(9, 0, 1):
-   case IP_VERSION(9, 3, 0):
-   case IP_VERSION(9, 4, 0):
-   case IP_VERSION(9, 4, 1):
-   case IP_VERSION(9, 4, 2):
-   case IP_VERSION(10, 3, 3):
-   case IP_VERSION(10, 3, 4):
-   case IP_VERSION(10, 3, 5):
-   case IP_VERSION(10, 3, 6):
-   case IP_VERSION(10, 3, 7):
-   /*
-* noretry = 0 will cause kfd page fault tests fail
-* for some ASICs, so set default to 1 for these ASICs.
-*/
-   if (amdgpu_noretry == -1)
-   gmc->noretry = 1;
-   else
-   gmc->noretry = amdgpu_noretry;
-   break;
-   default:
-   /* Raven currently has issues with noretry
-* regardless of what we decide for other
-* asics, we should leave raven with
-* noretry = 0 until we root cause the
-* issues.
-*
-* default this to 0 for now, but we may want
-* to change this in the future for certain
-* GPUs as it can increase performance in
-* certain cases.
-*/
-   if (amdgpu_noretry == -1)
-   gmc->noretry = 0;
-   else
-   gmc->noretry = amdgpu_noretry;
-   break;
-   }
+   uint32_t gc_ver = adev->ip_versions[GC_HWIP][0];
+   bool noretry_default = (gc_ver == IP_VERSION(9, 0, 1) ||
+   gc_ver == IP_VERSION(9, 3, 0) ||
+   gc_ver == IP_VERSION(9, 4, 0) ||
+   gc_ver == IP_VERSION(9, 4, 1) ||
+   gc_ver == IP_VERSION(9, 4, 2) ||
+   gc_ver >= IP_VERSION(10, 3, 0));
+
+   gmc->noretry = (amdgpu_noretry == -1) ? noretry_default : amdgpu_noretry;
 }
 
 void amdgpu_gmc_set_vm_fault_masks(struct amdgpu_device *adev, int hub_type,
-- 
2.25.1
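
For readers wondering why a plain >= comparison is valid here: IP_VERSION
packs major/minor/revision into a single integer with the major number in
the highest bits, so ordered comparisons follow version order. A
standalone illustration (the macro body mirrors amdgpu's encoding as I
understand it; treat it as a sketch, not a copy of the driver header):

    #include <stdint.h>
    #include <stdio.h>

    /* Pack major, minor, revision into one comparable integer. */
    #define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

    int main(void)
    {
        uint32_t gc_ver = IP_VERSION(10, 3, 7);

        /* 10.3.7 compares greater than 10.3.0, so this GPU gets
         * noretry = 1 by default under the new logic. */
        printf("noretry default: %d\n",
               gc_ver >= IP_VERSION(10, 3, 0) ? 1 : 0);
        return 0;
    }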



Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

2022-09-28 Thread Michel Dänzer
On 2022-09-28 16:46, Christian König wrote:
> Am 28.09.22 um 15:52 schrieb Michel Dänzer:
>> On 2022-09-28 03:01, Zhu, Jiadong wrote:>
>>> Please make sure umd is calling the libdrm function to create context with 
>>> different priorities,
>>> amdgpu_cs_ctx_create2(device_handle, AMDGPU_CTX_PRIORITY_HIGH, &context_handle).
>> Yes, I double-checked that, and that it returns success.
>>
>>
>>> Here is the behavior we could see:
>>> 1. After modprobe amdgpu, two software rings named gfx_high/gfx_low (named 
>>> gfx_sw in a previous patch) are visible in UMR. We could check the wptr/ptr 
>>> to see if they are being used.
>>> 2. MCBP happens while the two different priority ibs are submitted at the 
>>> same time. We could check fence info as below:
>>> "Last signaled trailing fence" increases when the MCBP is triggered by KMD. 
>>> "Last preempted" may not increase, as the MCBP is not triggered from the CP.
>>>
>>> --- ring 0 (gfx) ---
>>> Last signaled fence  0x0001
>>> Last emitted 0x0001
>>> Last signaled trailing fence 0x0022eb84
>>> Last emitted 0x0022eb84
>>> Last preempted   0x
>>> Last reset   0x
>> I've now tested on this Picasso (GFX9) laptop as well. The "Last signaled 
>> trailing fence" line is changing here (seems to always match the "Last 
>> emitted" line).
>>
>> However, mutter's frame rate still cannot exceed that of GPU-limited 
>> clients. BTW, you can test with a GNOME Wayland session, even without my MR 
>> referenced below. Preemption will just be less effective without that MR. 
>> mutter has used a high priority context when possible for a long time.
>>
>> I'm also seeing intermittent freezes, where not even the mouse cursor moves 
>> for up to around one second, e.g. when interacting with the GNOME top bar. 
>> I'm not sure yet if these are related to this patch series, but I never 
>> noticed it before. I wonder if the freezes might occur when GPU preemption 
>> is attempted.
> 
> Keep in mind that this doesn't have the same fine granularity as the separate 
> hw ring buffer found on gfx10.
> 
> With MCBP we can only preempt on draw command boundary, while the separate hw 
> ring solution can preempt as soon as a CU is available.

Right, but so far I haven't noticed any positive effect. That and the 
intermittent freezes indicate the MCBP based preemption isn't actually working 
as intended yet.


>>> From: Koenig, Christian 
>>>
 This work is solely for gfx9 (e.g. Vega) and older.

 Navi has a completely separate high priority gfx queue we can use for this.
>> Right, but 4c7631800e6b ("drm/amd/amdgpu: add pipe1 hardware support") was 
>> for Sienna Cichlid only, and turned out to be unstable, so it had to be reverted.
>>
>> It would be nice to make the SW ring solution take effect by default 
>> whenever there is no separate high priority HW gfx queue available (and any 
>> other requirements are met).
> 
> I don't think that this will be a good idea. The hw ring buffer or even hw 
> scheduler have much nicer properties and we should focus on getting that 
> working when available.

Of course, the HW features should have priority. I mean as a fallback when the 
HW features effectively aren't available (which is currently always the case 
with amdgpu, even when the GPU has the HW features).


>>> Am 27.09.22 um 19:49 schrieb Michel Dänzer:
 On 2022-09-27 08:06, Christian König wrote:
> Hey Michel,
>
> JIadong is working on exposing high/low priority gfx queues for gfx9 and 
> older hw generations by using mid command buffer preemption.
 Yeah, I've been keeping an eye on these patches. I'm looking forward to 
 this working.


> I know that you have been working on Gnome Mutter to make use from 
> userspace for this. Do you have time to run some tests with that?
 I just tested the v8 series (first without amdgpu.mcbp=1 on the kernel 
 command line, then with it, since I wasn't sure if it's needed) with 
 https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 on Navi 14.

 Unfortunately, I'm not seeing any change in behaviour. Even though mutter 
 uses a high priority context via the EGL_IMG_context_priority extension, 
 it's unable to reach a higher frame rate than a GPU-limited client[0]. The 
 "Last preempted" line of /sys/kernel/debug/dri/0/amdgpu_fence_info remains 
 at 0x.

 Did I miss a step?


 [0] I used the GpuTest pixmark piano & plot3d benchmarks. With an Intel 
 iGPU, mutter can achieve a higher frame rate than plot3d, though not than pixmark piano (presumably due to limited GPU preemption granularity).

Re: [PATCH] drm/amdgpu: Fix mc_umc_status used uninitialized warning

2022-09-28 Thread Hamza Mahfooz

On 2022-09-28 10:49, sunpeng...@amd.com wrote:

From: Leo Li 

On ChromeOS clang build, the following warning is seen:

/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:463:6:
 error: variable 'mc_umc_status' is used uninitialized whenever 'if' condition 
is false [-Werror,-Wsometimes-uninitialized]
 if (mca_addr == UMC_INVALID_ADDR) {
 ^~~~
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:485:21:
 note: uninitialized use occurs here
 if ((REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, Val) == 1 
&&
^
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu.h:1208:5:
 note: expanded from macro 'REG_GET_FIELD'
 (((value) & REG_FIELD_MASK(reg, field)) >> REG_FIELD_SHIFT(reg, field))
^
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:463:2:
 note: remove the 'if' if its condition is always true
 if (mca_addr == UMC_INVALID_ADDR) {
 ^~
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:460:24:
 note: initialize the variable 'mc_umc_status' to silence this warning
 uint64_t mc_umc_status, mc_umc_addrt0;
   ^
= 0
1 error generated.
make[5]: *** 
[/mnt/host/source/src/third_party/kernel/v5.15/scripts/Makefile.build:289: 
drivers/gpu/drm/amd/amdgpu/umc_v6_7.o] Error 1

Fix by initializing mc_umc_status = 0.

Fixes: d8e19e32945e ("drm/amdgpu: support to convert dedicated umc mca address")
Signed-off-by: Leo Li 


Reviewed-by: Hamza Mahfooz 


---
  drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c b/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
index 2cc961534542..a0d19b768346 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
@@ -457,7 +457,7 @@ static void umc_v6_7_query_error_address(struct amdgpu_device *adev,
  {
uint32_t mc_umc_status_addr;
uint32_t channel_index;
-   uint64_t mc_umc_status, mc_umc_addrt0;
+   uint64_t mc_umc_status = 0, mc_umc_addrt0;
uint64_t err_addr, soc_pa, retired_page, column;
  
  	if (mca_addr == UMC_INVALID_ADDR) {


--
Hamza



[PATCH] drm/amdgpu: Fix mc_umc_status used uninitialized warning

2022-09-28 Thread sunpeng.li
From: Leo Li 

On ChromeOS clang build, the following warning is seen:

/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:463:6:
 error: variable 'mc_umc_status' is used uninitialized whenever 'if' condition 
is false [-Werror,-Wsometimes-uninitialized]
if (mca_addr == UMC_INVALID_ADDR) {
^~~~
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:485:21:
 note: uninitialized use occurs here
if ((REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, Val) == 
1 &&
   ^
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu.h:1208:5:
 note: expanded from macro 'REG_GET_FIELD'
(((value) & REG_FIELD_MASK(reg, field)) >> REG_FIELD_SHIFT(reg, field))
   ^
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:463:2:
 note: remove the 'if' if its condition is always true
if (mca_addr == UMC_INVALID_ADDR) {
^~
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:460:24:
 note: initialize the variable 'mc_umc_status' to silence this warning
uint64_t mc_umc_status, mc_umc_addrt0;
  ^
   = 0
1 error generated.
make[5]: *** 
[/mnt/host/source/src/third_party/kernel/v5.15/scripts/Makefile.build:289: 
drivers/gpu/drm/amd/amdgpu/umc_v6_7.o] Error 1

Fix by initializing mc_umc_status = 0.

Fixes: d8e19e32945e ("drm/amdgpu: support to convert dedicated umc mca address")
Signed-off-by: Leo Li 
---
 drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c b/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
index 2cc961534542..a0d19b768346 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
@@ -457,7 +457,7 @@ static void umc_v6_7_query_error_address(struct amdgpu_device *adev,
 {
uint32_t mc_umc_status_addr;
uint32_t channel_index;
-   uint64_t mc_umc_status, mc_umc_addrt0;
+   uint64_t mc_umc_status = 0, mc_umc_addrt0;
uint64_t err_addr, soc_pa, retired_page, column;
 
if (mca_addr == UMC_INVALID_ADDR) {
-- 
2.37.3
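
The warning class is easy to reproduce outside the kernel. A minimal
standalone example (not driver code) that trips clang's
-Wsometimes-uninitialized the same way:

    #include <stdint.h>

    uint64_t hw_read_status(void);

    uint64_t query_status(int have_addr)
    {
        uint64_t status;        /* "= 0" is what silences clang */

        if (have_addr)          /* only this branch assigns... */
            status = hw_read_status();

        return status & 1;      /* ...but the read is unconditional */
    }

Compiling with "clang -Wsometimes-uninitialized -c" produces the same
diagnostic shape as above.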



Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

2022-09-28 Thread Christian König

Am 28.09.22 um 15:52 schrieb Michel Dänzer:

On 2022-09-28 03:01, Zhu, Jiadong wrote:>

Please make sure umd is calling the libdrm function to create context with 
different priorities,
amdgpu_cs_ctx_create2(device_handle, AMDGPU_CTX_PRIORITY_HIGH, &context_handle).
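
For reference, a minimal sketch of that call sequence against libdrm's
amdgpu API (error handling mostly elided; assumes a device handle that
was already initialized with amdgpu_device_initialize):

    #include <amdgpu.h>
    #include <amdgpu_drm.h>

    /* Create one high- and one low-priority context, so work submitted
     * on the first can preempt work queued on the second. */
    static int create_prio_contexts(amdgpu_device_handle dev,
                                    amdgpu_context_handle *high,
                                    amdgpu_context_handle *low)
    {
        int r = amdgpu_cs_ctx_create2(dev, AMDGPU_CTX_PRIORITY_HIGH, high);

        if (r)
            return r;
        return amdgpu_cs_ctx_create2(dev, AMDGPU_CTX_PRIORITY_LOW, low);
    }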

Yes, I double-checked that, and that it returns success.



Here is the behavior we could see:
1. After modprobe amdgpu, two software rings named gfx_high/gfx_low (named 
gfx_sw in a previous patch) are visible in UMR. We could check the wptr/ptr to 
see if they are being used.
2. MCBP happens while the two different priority ibs are submitted at the same 
time. We could check fence info as below:
"Last signaled trailing fence" increases when the MCBP is triggered by KMD. 
"Last preempted" may not increase, as the MCBP is not triggered from the CP.

--- ring 0 (gfx) ---
Last signaled fence  0x0001
Last emitted 0x0001
Last signaled trailing fence 0x0022eb84
Last emitted 0x0022eb84
Last preempted   0x
Last reset   0x
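
A trivial way to watch those counters from userspace (assuming debugfs is
mounted and readable, typically as root, and that the GPU is card 0):

    #include <stdio.h>

    int main(void)
    {
        char line[256];
        FILE *f = fopen("/sys/kernel/debug/dri/0/amdgpu_fence_info", "r");

        if (!f) {
            perror("amdgpu_fence_info");
            return 1;
        }
        /* Print the counters exactly as the kernel reports them. */
        while (fgets(line, sizeof(line), f))
            fputs(line, stdout);
        fclose(f);
        return 0;
    }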

I've now tested on this Picasso (GFX9) laptop as well. The "Last signaled trailing fence" 
line is changing here (seems to always match the "Last emitted" line).

However, mutter's frame rate still cannot exceed that of GPU-limited clients. 
BTW, you can test with a GNOME Wayland session, even without my MR referenced 
below. Preemption will just be less effective without that MR. mutter has used 
a high priority context when possible for a long time.

I'm also seeing intermittent freezes, where not even the mouse cursor moves for 
up to around one second, e.g. when interacting with the GNOME top bar. I'm not 
sure yet if these are related to this patch series, but I never noticed it 
before. I wonder if the freezes might occur when GPU preemption is attempted.


Keep in mind that this doesn't have the same fine granularity as the 
separate hw ring buffer found on gfx10.


With MCBP we can only preempt on draw command boundary, while the 
separate hw ring solution can preempt as soon as a CU is available.



From: Koenig, Christian 


This work is solely for gfx9 (e.g. Vega) and older.

Navi has a completely separate high priority gfx queue we can use for this.

Right, but 4c7631800e6b ("drm/amd/amdgpu: add pipe1 hardware support") was for 
Sienna Cichlid only, and turned out to be unstable, so it had to be reverted.

It would be nice to make the SW ring solution take effect by default whenever 
there is no separate high priority HW gfx queue available (and any other 
requirements are met).


I don't think that this will be a good idea. The hw ring buffer or even 
hw scheduler have much nicer properties and we should focus on getting 
that working when available.


Regards,
Christian.





Am 27.09.22 um 19:49 schrieb Michel Dänzer:

On 2022-09-27 08:06, Christian König wrote:

Hey Michel,

JIadong is working on exposing high/low priority gfx queues for gfx9 and older 
hw generations by using mid command buffer preemption.

Yeah, I've been keeping an eye on these patches. I'm looking forward to this 
working.



I know that you have been working on Gnome Mutter to make use from userspace 
for this. Do you have time to run some tests with that?

I just tested the v8 series (first without amdgpu.mcbp=1 on the kernel command line, then 
with it, since I wasn't sure if it's needed) with 
https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 on Navi 14.

Unfortunately, I'm not seeing any change in behaviour. Even though mutter uses a high 
priority context via the EGL_IMG_context_priority extension, it's unable to reach a 
higher frame rate than a GPU-limited client[0]. The "Last preempted" line of 
/sys/kernel/debug/dri/0/amdgpu_fence_info remains at 0x.

Did I miss a step?


[0] I used the GpuTest pixmark piano & plot3d benchmarks. With an Intel iGPU, 
mutter can achieve a higher frame rate than plot3d, though not than pixmark piano 
(presumably due to limited GPU preemption granularity).





