[PATCH] drm/amdgpu: loose check for umc poison mode

2022-02-10 Thread Tao Zhou
There is no need to check the poison setting for each channel; checking
umc0 channel0 is enough.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 18 +-
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c b/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
index 47452b61b615..e613511e07e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
@@ -451,21 +451,13 @@ static uint32_t umc_v6_7_query_ras_poison_mode_per_channel(
 
 static bool umc_v6_7_query_ras_poison_mode(struct amdgpu_device *adev)
 {
-	uint32_t umc_inst        = 0;
-	uint32_t ch_inst         = 0;
 	uint32_t umc_reg_offset  = 0;
 
-	LOOP_UMC_INST_AND_CH(umc_inst, ch_inst) {
-		umc_reg_offset = get_umc_v6_7_reg_offset(adev,
-							umc_inst,
-							ch_inst);
-		/* Enabling fatal error in one channel will be considered
-		   as fatal error mode */
-		if (umc_v6_7_query_ras_poison_mode_per_channel(adev, umc_reg_offset))
-			return false;
-	}
-
-	return true;
+	/* Enabling fatal error in umc instance0 channel0 will be
+	 * considered as fatal error mode
+	 */
+	umc_reg_offset = get_umc_v6_7_reg_offset(adev, 0, 0);
+	return !umc_v6_7_query_ras_poison_mode_per_channel(adev, umc_reg_offset);
 }
 
 const struct amdgpu_ras_block_hw_ops umc_v6_7_ras_hw_ops = {
-- 
2.17.1
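
For readers following the change, here is a minimal userspace sketch of the difference between the old and new checks. It is not driver code: the fatal_mode table, the instance and channel counts, and both function names are hypothetical stand-ins for the per-channel register query. The two variants only disagree when channels are configured inconsistently, which the patch description treats as not worth checking for.

```c
#include <stdbool.h>
#include <stdint.h>

#define HYP_UMC_INSTANCES 4 /* hypothetical count, for illustration only */
#define HYP_UMC_CHANNELS  8 /* hypothetical count, for illustration only */

/* Mock register state: nonzero means fatal-error mode is enabled on that channel. */
static uint32_t fatal_mode[HYP_UMC_INSTANCES][HYP_UMC_CHANNELS];

/* Old behaviour: poison mode is reported only if no channel is in fatal-error mode. */
static bool poison_mode_all_channels(void)
{
	for (int inst = 0; inst < HYP_UMC_INSTANCES; inst++)
		for (int ch = 0; ch < HYP_UMC_CHANNELS; ch++)
			if (fatal_mode[inst][ch])
				return false;
	return true;
}

/* New behaviour: only instance 0, channel 0 is consulted. */
static bool poison_mode_first_channel(void)
{
	return !fatal_mode[0][0];
}
```

With a uniform setting across channels both functions return the same answer, and the second one needs a single register read instead of one per channel.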



Re: [PATCH] drm/amdgpu: Fix compile error.

2022-02-10 Thread Christian König

On 10.02.22 at 08:06, Christian König wrote:

On 10.02.22 at 04:17, Andrey Grodzovsky wrote:

Seems I forgot to add this to the relevant commit
when submitting.


Rebase/merge issue? Looks like it.



Signed-off-by: Andrey Grodzovsky 
Reported-by: kernel test robot 


Reviewed-by: Christian König 


BTW: I've gone ahead and pushed this to drm-misc-next because I just 
broke basically every build :)


Christian.




---
  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
index 92de3b7965a1..1949dbe28a86 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
@@ -118,8 +118,7 @@ static inline bool amdgpu_reset_domain_schedule(struct amdgpu_reset_domain *doma
 	return queue_work(domain->wq, work);
 }

-void amdgpu_device_lock_reset_domain(struct amdgpu_reset_domain *reset_domain,
-				     struct amdgpu_hive_info *hive);
+void amdgpu_device_lock_reset_domain(struct amdgpu_reset_domain *reset_domain);

 void amdgpu_device_unlock_reset_domain(struct amdgpu_reset_domain *reset_domain);






Re: [PATCH 03/27] mm: remove pointless includes from

2022-02-10 Thread Muchun Song
On Thu, Feb 10, 2022 at 3:28 PM Christoph Hellwig  wrote:
>
> hmm.h pulls in the world for no good reason at all.  Remove the
> includes and push a few ones into the users instead.
>
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Logan Gunthorpe 
> Reviewed-by: Jason Gunthorpe 
> Reviewed-by: Chaitanya Kulkarni 

Reviewed-by: Muchun Song 



RE: [PATCH] drm/amdgpu: loose check for umc poison mode

2022-02-10 Thread Zhang, Hawking
[AMD Official Use Only]

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: Zhou1, Tao  
Sent: Thursday, February 10, 2022 16:23
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; 
Yang, Stanley ; Chai, Thomas ; 
Clements, John 
Cc: Zhou1, Tao 
Subject: [PATCH] drm/amdgpu: loose check for umc poison mode

There is no need to check the poison setting for each channel; checking
umc0 channel0 is enough.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 18 +-
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c b/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
index 47452b61b615..e613511e07e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
@@ -451,21 +451,13 @@ static uint32_t umc_v6_7_query_ras_poison_mode_per_channel(
 
 static bool umc_v6_7_query_ras_poison_mode(struct amdgpu_device *adev)
 {
-	uint32_t umc_inst        = 0;
-	uint32_t ch_inst         = 0;
 	uint32_t umc_reg_offset  = 0;
 
-	LOOP_UMC_INST_AND_CH(umc_inst, ch_inst) {
-		umc_reg_offset = get_umc_v6_7_reg_offset(adev,
-							umc_inst,
-							ch_inst);
-		/* Enabling fatal error in one channel will be considered
-		   as fatal error mode */
-		if (umc_v6_7_query_ras_poison_mode_per_channel(adev, umc_reg_offset))
-			return false;
-	}
-
-	return true;
+	/* Enabling fatal error in umc instance0 channel0 will be
+	 * considered as fatal error mode
+	 */
+	umc_reg_offset = get_umc_v6_7_reg_offset(adev, 0, 0);
+	return !umc_v6_7_query_ras_poison_mode_per_channel(adev, umc_reg_offset);
 }
 
 const struct amdgpu_ras_block_hw_ops umc_v6_7_ras_hw_ops = {
--
2.17.1


RE: [PATCH] drm/amdgpu: no rlcg read access in SRIOV case for gfx v9

2022-02-10 Thread Zhang, Hawking
[AMD Official Use Only]

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: Chen, Guchun  
Sent: Thursday, February 10, 2022 14:40
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; 
Zhou, Peng Ju ; Koenig, Christian 
; Deucher, Alexander 
Cc: Chen, Guchun 
Subject: [PATCH] drm/amdgpu: no rlcg read access in SRIOV case for gfx v9

Fall back to MMIO to read registers, as rlcg read is not available for gfx v9 in
SRIOV configuration. Otherwise, gmc_v9_0_flush_gpu_tlb will always complain about
a timeout, which finally breaks driver load.

Fixes: 0dc4a7e75581 ("drm/amdgpu: switch to get_rlcg_reg_access_flag for gfx9")
Signed-off-by: Guchun Chen 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index e1288901beb6..a3274fa1c7e4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -37,6 +37,16 @@
 		vf2pf_info->ucode_info[ucode].version = ver; \
 	} while (0)
 
+static bool amdgpu_virt_is_rlcg_read_supported(struct amdgpu_device *adev)
+{
+	/* rlcg read is not supported in SRIOV with gfx v9 */
+	if ((adev->ip_versions[MP0_HWIP][0] == IP_VERSION(9, 0, 0)) ||
+	    (adev->ip_versions[GC_HWIP][0] == IP_VERSION(9, 4, 1)))
+		return false;
+
+	return true;
+}
+
 bool amdgpu_virt_mmio_blocked(struct amdgpu_device *adev)
 {
 	/* By now all MMIO pages except mailbox are blocked */
@@ -957,7 +967,8 @@ u32 amdgpu_sriov_rreg(struct amdgpu_device *adev,
 	u32 rlcg_flag;
 
 	if (!amdgpu_sriov_runtime(adev) &&
-	    amdgpu_virt_get_rlcg_reg_access_flag(adev, acc_flags, hwip, false, &rlcg_flag))
+	    amdgpu_virt_is_rlcg_read_supported(adev) &&
+	    amdgpu_virt_get_rlcg_reg_access_flag(adev, acc_flags, hwip, false, &rlcg_flag))
 		return amdgpu_virt_rlcg_reg_rw(adev, offset, 0, rlcg_flag);
 
 	if (acc_flags & AMDGPU_REGS_NO_KIQ)
--
2.17.1


Re: [PATCH 13/27] mm: move the migrate_vma_* device migration code into it's own file

2022-02-10 Thread Alistair Popple
I got the following build error:

/data/source/linux/mm/migrate_device.c: In function ‘migrate_vma_collect_pmd’:
/data/source/linux/mm/migrate_device.c:242:3: error: implicit declaration of function ‘flush_tlb_range’; did you mean ‘flush_pmd_tlb_range’? [-Werror=implicit-function-declaration]
  242 |   flush_tlb_range(walk->vma, start, end);
      |   ^~~~~~~~~~~~~~~
      |   flush_pmd_tlb_range
Including asm/tlbflush.h in migrate_device.c fixed it for me.

On Thursday, 10 February 2022 6:28:14 PM AEDT Christoph Hellwig wrote:
> Split the code used to migrate to and from ZONE_DEVICE memory from
> migrate.c into a new file.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  mm/Kconfig  |   3 +
>  mm/Makefile |   1 +
>  mm/migrate.c| 753 ---
>  mm/migrate_device.c | 765 
>  4 files changed, 769 insertions(+), 753 deletions(-)
>  create mode 100644 mm/migrate_device.c
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index a1901ae6d06293..6391d8d3a616f3 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -249,6 +249,9 @@ config MIGRATION
> pages as migration can relocate pages to satisfy a huge page
> allocation instead of reclaiming.
>  
> +config DEVICE_MIGRATION
> + def_bool MIGRATION && DEVICE_PRIVATE
> +
>  config ARCH_ENABLE_HUGEPAGE_MIGRATION
>   bool
>  
> diff --git a/mm/Makefile b/mm/Makefile
> index 70d4309c9ce338..4cc13f3179a518 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -92,6 +92,7 @@ obj-$(CONFIG_KFENCE) += kfence/
>  obj-$(CONFIG_FAILSLAB) += failslab.o
>  obj-$(CONFIG_MEMTEST)+= memtest.o
>  obj-$(CONFIG_MIGRATION) += migrate.o
> +obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o
>  obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
>  obj-$(CONFIG_PAGE_COUNTER) += page_counter.o
>  obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 746e1230886ddb..c31d04b46a5e17 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -38,12 +38,10 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -2125,757 +2123,6 @@ int migrate_misplaced_page(struct page *page, struct 
> vm_area_struct *vma,
>  #endif /* CONFIG_NUMA_BALANCING */
>  #endif /* CONFIG_NUMA */
>  
> -#ifdef CONFIG_DEVICE_PRIVATE
> -static int migrate_vma_collect_skip(unsigned long start,
> - unsigned long end,
> - struct mm_walk *walk)
> -{
> - struct migrate_vma *migrate = walk->private;
> - unsigned long addr;
> -
> - for (addr = start; addr < end; addr += PAGE_SIZE) {
> - migrate->dst[migrate->npages] = 0;
> - migrate->src[migrate->npages++] = 0;
> - }
> -
> - return 0;
> -}
> -
> -static int migrate_vma_collect_hole(unsigned long start,
> - unsigned long end,
> - __always_unused int depth,
> - struct mm_walk *walk)
> -{
> - struct migrate_vma *migrate = walk->private;
> - unsigned long addr;
> -
> - /* Only allow populating anonymous memory. */
> - if (!vma_is_anonymous(walk->vma))
> - return migrate_vma_collect_skip(start, end, walk);
> -
> - for (addr = start; addr < end; addr += PAGE_SIZE) {
> - migrate->src[migrate->npages] = MIGRATE_PFN_MIGRATE;
> - migrate->dst[migrate->npages] = 0;
> - migrate->npages++;
> - migrate->cpages++;
> - }
> -
> - return 0;
> -}
> -
> -static int migrate_vma_collect_pmd(pmd_t *pmdp,
> -unsigned long start,
> -unsigned long end,
> -struct mm_walk *walk)
> -{
> - struct migrate_vma *migrate = walk->private;
> - struct vm_area_struct *vma = walk->vma;
> - struct mm_struct *mm = vma->vm_mm;
> - unsigned long addr = start, unmapped = 0;
> - spinlock_t *ptl;
> - pte_t *ptep;
> -
> -again:
> - if (pmd_none(*pmdp))
> - return migrate_vma_collect_hole(start, end, -1, walk);
> -
> - if (pmd_trans_huge(*pmdp)) {
> - struct page *page;
> -
> - ptl = pmd_lock(mm, pmdp);
> - if (unlikely(!pmd_trans_huge(*pmdp))) {
> - spin_unlock(ptl);
> - goto again;
> - }
> -
> - page = pmd_page(*pmdp);
> - if (is_huge_zero_page(page)) {
> - spin_unlock(ptl);
> - split_huge_pmd(vma, pmdp, addr);
> - if (pmd_trans_unstable(pmdp))
> - return migrate_vma_collect_skip(start, end,
> - walk);
> - } else {
> - int

Re: [PATCH 14/27] mm: build migrate_vma_* for all configs with ZONE_DEVICE support

2022-02-10 Thread Alistair Popple
Thanks, it's also better than more stubbed functions.

Reviewed-by: Alistair Popple 

On Thursday, 10 February 2022 6:28:15 PM AEDT Christoph Hellwig wrote:
> This code will be used for device coherent memory as well in a bit,
> so relax the ifdef a bit.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  mm/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 6391d8d3a616f3..95d4aa3acaefe0 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -250,7 +250,7 @@ config MIGRATION
> allocation instead of reclaiming.
>  
>  config DEVICE_MIGRATION
> - def_bool MIGRATION && DEVICE_PRIVATE
> + def_bool MIGRATION && ZONE_DEVICE
>  
>  config ARCH_ENABLE_HUGEPAGE_MIGRATION
>   bool
> 







Re: [PATCH 12/27] mm: refactor the ZONE_DEVICE handling in migrate_vma_pages

2022-02-10 Thread Alistair Popple
Reviewed-by: Alistair Popple 

On Thursday, 10 February 2022 6:28:13 PM AEDT Christoph Hellwig wrote:
> Make the flow a little more clear and prepare for adding a new
> ZONE_DEVICE memory type.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  mm/migrate.c | 27 ---
>  1 file changed, 12 insertions(+), 15 deletions(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 30ecd7223656c1..746e1230886ddb 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2788,24 +2788,21 @@ void migrate_vma_pages(struct migrate_vma *migrate)
>  
>   mapping = page_mapping(page);
>  
> - if (is_zone_device_page(newpage)) {
> - if (is_device_private_page(newpage)) {
> - /*
> -  * For now only support private anonymous when
> -  * migrating to un-addressable device memory.
> -  */
> - if (mapping) {
> - migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
> - continue;
> - }
> - } else {
> - /*
> -  * Other types of ZONE_DEVICE page are not
> -  * supported.
> -  */
> + if (is_device_private_page(newpage)) {
> + /*
> +  * For now only support private anonymous when migrating
> +  * to un-addressable device memory.
> +  */
> + if (mapping) {
>   migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
>   continue;
>   }
> + } else if (is_zone_device_page(newpage)) {
> + /*
> +  * Other types of ZONE_DEVICE page are not supported.
> +  */
> + migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
> + continue;
>   }
>  
>   r = migrate_page(mapping, newpage, page, MIGRATE_SYNC_NO_COPY);
> 
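
The claim that this hunk only flattens the control flow can be checked with a small userspace model. This is a sketch under stated assumptions: struct page_kind and both function names are hypothetical, and the model relies on every device-private page also being a ZONE_DEVICE page, which is what the original nesting implies. A return value of true stands for "clear MIGRATE_PFN_MIGRATE and continue".

```c
#include <stdbool.h>

/* Hypothetical page descriptor; device_private implies zone_device. */
struct page_kind {
	bool zone_device;
	bool device_private;
};

/* Nested form, as in the removed lines. */
static bool skip_old(struct page_kind p, bool has_mapping)
{
	if (p.zone_device) {
		if (p.device_private) {
			/* Only private anonymous memory is supported. */
			if (has_mapping)
				return true;
		} else {
			/* Other ZONE_DEVICE page types are not supported. */
			return true;
		}
	}
	return false;
}

/* Flattened form, as in the added lines. */
static bool skip_new(struct page_kind p, bool has_mapping)
{
	if (p.device_private) {
		if (has_mapping)
			return true;
	} else if (p.zone_device) {
		return true;
	}
	return false;
}
```

For every valid combination of inputs the two functions agree, so the flat form leaves room for a further "else if" branch without another level of nesting.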







Re: [PATCH 11/27] mm: refactor the ZONE_DEVICE handling in migrate_vma_insert_page

2022-02-10 Thread Alistair Popple
Reviewed-by: Alistair Popple 

On Thursday, 10 February 2022 6:28:12 PM AEDT Christoph Hellwig wrote:
> Make the flow a little more clear and prepare for adding a new
> ZONE_DEVICE memory type.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  mm/migrate.c | 31 +++
>  1 file changed, 15 insertions(+), 16 deletions(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 8e0370a73f8a43..30ecd7223656c1 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2670,26 +2670,25 @@ static void migrate_vma_insert_page(struct 
> migrate_vma *migrate,
>*/
>   __SetPageUptodate(page);
>  
> - if (is_zone_device_page(page)) {
> - if (is_device_private_page(page)) {
> - swp_entry_t swp_entry;
> + if (is_device_private_page(page)) {
> + swp_entry_t swp_entry;
>  
> - if (vma->vm_flags & VM_WRITE)
> - swp_entry = make_writable_device_private_entry(
> - page_to_pfn(page));
> - else
> - swp_entry = make_readable_device_private_entry(
> - page_to_pfn(page));
> - entry = swp_entry_to_pte(swp_entry);
> - } else {
> - /*
> -  * For now we only support migrating to un-addressable
> -  * device memory.
> -  */
> + if (vma->vm_flags & VM_WRITE)
> + swp_entry = make_writable_device_private_entry(
> + page_to_pfn(page));
> + else
> + swp_entry = make_readable_device_private_entry(
> + page_to_pfn(page));
> + entry = swp_entry_to_pte(swp_entry);
> + } else {
> + /*
> +  * For now we only support migrating to un-addressable device
> +  * memory.
> +  */
> + if (is_zone_device_page(page)) {
>   pr_warn_once("Unsupported ZONE_DEVICE page type.\n");
>   goto abort;
>   }
> - } else {
>   entry = mk_pte(page, vma->vm_page_prot);
>   if (vma->vm_flags & VM_WRITE)
>   entry = pte_mkwrite(pte_mkdirty(entry));
> 







RE: [PATCH] drm/amdgpu: no rlcg read access in SRIOV case for gfx v9

2022-02-10 Thread Skvortsov, Victor
[AMD Official Use Only]

Hi Guchun,

RLCG read is available on Aldebaran if amdgpu_sriov_reg_indirect_gc() flag is 
set. Instead of adding a new function, I think we should simply add a check 
inside amdgpu_virt_get_rlcg_reg_access_flag():


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index e1288901beb6..1ee600e90312 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -836,7 +836,7 @@ static bool amdgpu_virt_get_rlcg_reg_access_flag(struct amdgpu_device *adev,
 			/* only in new version, AMDGPU_REGS_NO_KIQ and
 			 * AMDGPU_REGS_RLC are enabled simultaneously */
 		} else if ((acc_flags & AMDGPU_REGS_RLC) &&
-			   !(acc_flags & AMDGPU_REGS_NO_KIQ)) {
+			   !(acc_flags & AMDGPU_REGS_NO_KIQ) && write) {
 			*rlcg_flag = AMDGPU_RLCG_GC_WRITE_LEGACY;
 			ret = true;
 		}

Thanks,
Victor
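
A simplified userspace model of the selection logic with the extra "write" condition may help when comparing the two proposals. Everything here is a hypothetical stand-in: the enum, the HYP_* flag values and pick_rlcg_path() are not driver symbols, and the shape of the first branch (indirect GC access serving both reads and writes) is an assumption based on the statement above that RLCG read is available when the indirect GC flag is set.

```c
#include <stdbool.h>

/* Hypothetical outcome of the access-flag lookup. */
enum rlcg_path {
	PATH_NONE,           /* caller falls back to another access method */
	PATH_INDIRECT_READ,
	PATH_INDIRECT_WRITE,
	PATH_LEGACY_WRITE
};

#define HYP_REGS_NO_KIQ (1u << 1) /* illustrative value only */
#define HYP_REGS_RLC    (1u << 2) /* illustrative value only */

static enum rlcg_path pick_rlcg_path(bool indirect_gc, unsigned int acc_flags,
				     bool write)
{
	/* Assumed: indirect GC access covers both directions. */
	if (indirect_gc)
		return write ? PATH_INDIRECT_WRITE : PATH_INDIRECT_READ;

	/* Legacy path: with the proposed change it is taken for writes only. */
	if ((acc_flags & HYP_REGS_RLC) && !(acc_flags & HYP_REGS_NO_KIQ) && write)
		return PATH_LEGACY_WRITE;

	return PATH_NONE;
}
```

In this model a legacy read yields PATH_NONE, so the reader falls through to the non-RLCG path without a separate per-ASIC helper.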

-Original Message-
From: amd-gfx  On Behalf Of Zhang, 
Hawking
Sent: Thursday, February 10, 2022 5:02 AM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; Zhou, 
Peng Ju ; Koenig, Christian ; 
Deucher, Alexander 
Subject: RE: [PATCH] drm/amdgpu: no rlcg read access in SRIOV case for gfx v9

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: Chen, Guchun 
Sent: Thursday, February 10, 2022 14:40
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; 
Zhou, Peng Ju ; Koenig, Christian 
; Deucher, Alexander 
Cc: Chen, Guchun 
Subject: [PATCH] drm/amdgpu: no rlcg read access in SRIOV case for gfx v9

Fall back to MMIO to read registers, as rlcg read is not available for gfx v9 in
SRIOV configuration. Otherwise, gmc_v9_0_flush_gpu_tlb will always complain about
a timeout, which finally breaks driver load.

Fixes: 0dc4a7e75581 ("drm/amdgpu: switch to get_rlcg_reg_access_flag for gfx9")
Signed-off-by: Guchun Chen 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index e1288901beb6..a3274fa1c7e4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -37,6 +37,16 @@
 		vf2pf_info->ucode_info[ucode].version = ver; \
 	} while (0)
 
+static bool amdgpu_virt_is_rlcg_read_supported(struct amdgpu_device *adev)
+{
+	/* rlcg read is not supported in SRIOV with gfx v9 */
+	if ((adev->ip_versions[MP0_HWIP][0] == IP_VERSION(9, 0, 0)) ||
+	    (adev->ip_versions[GC_HWIP][0] == IP_VERSION(9, 4, 1)))
+		return false;
+
+	return true;
+}
+
 bool amdgpu_virt_mmio_blocked(struct amdgpu_device *adev)
 {
 	/* By now all MMIO pages except mailbox are blocked */
@@ -957,7 +967,8 @@ u32 amdgpu_sriov_rreg(struct amdgpu_device *adev,
 	u32 rlcg_flag;
 
 	if (!amdgpu_sriov_runtime(adev) &&
-	    amdgpu_virt_get_rlcg_reg_access_flag(adev, acc_flags, hwip, false, &rlcg_flag))
+	    amdgpu_virt_is_rlcg_read_supported(adev) &&
+	    amdgpu_virt_get_rlcg_reg_access_flag(adev, acc_flags, hwip, false, &rlcg_flag))
 		return amdgpu_virt_rlcg_reg_rw(adev, offset, 0, rlcg_flag);
 
 	if (acc_flags & AMDGPU_REGS_NO_KIQ)
--
2.17.1


Re: start sorting out the ZONE_DEVICE refcount mess v2

2022-02-10 Thread Alistair Popple
On Thursday, 10 February 2022 6:28:01 PM AEDT Christoph Hellwig wrote:

[...]

> Changes since v1:
>  - add a missing memremap.h include in memcontrol.c
>  - include rebased versions of the device coherent support and
>device coherent migration support series as well as additional
>cleanup patches

Thanks for the rebase. I will take a closer look at it tomorrow but I just
ran the hmm-tests and they are all still passing for me with this series.

> Diffstat:
>  arch/arm64/mm/mmu.c                      |    1
>  arch/powerpc/kvm/book3s_hv_uvmem.c       |    1
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |   35
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |    1
>  drivers/gpu/drm/drm_cache.c              |    2
>  drivers/gpu/drm/nouveau/nouveau_dmem.c   |    3
>  drivers/gpu/drm/nouveau/nouveau_svm.c    |    1
>  drivers/infiniband/core/rw.c             |    1
>  drivers/nvdimm/pmem.h                    |    1
>  drivers/nvme/host/pci.c                  |    1
>  drivers/nvme/target/io-cmd-bdev.c        |    1
>  fs/Kconfig                               |    2
>  fs/fuse/virtio_fs.c                      |    1
>  include/linux/hmm.h                      |    9
>  include/linux/memremap.h                 |   36
>  include/linux/migrate.h                  |    1
>  include/linux/mm.h                       |   59
>  lib/test_hmm.c                           |  353
>  lib/test_hmm_uapi.h                      |   22
>  mm/Kconfig                               |    7
>  mm/Makefile                              |    1
>  mm/gup.c                                 |  127
>  mm/internal.h                            |    3
>  mm/memcontrol.c                          |   19
>  mm/memory-failure.c                      |    8
>  mm/memremap.c                            |   75
>  mm/migrate.c                             |  763
>  mm/migrate_device.c                      |  822
>  mm/rmap.c                                |    5
>  mm/swap.c                                |   49
>  tools/testing/selftests/vm/Makefile      |    2
>  tools/testing/selftests/vm/hmm-tests.c   |  204
>  tools/testing/selftests/vm/test_hmm.sh   |   24
>  33 files changed, 1552 insertions(+), 1088 deletions(-)
> 







Re: [PATCH 2/2] drm/amdgpu: add reset register trace function on GPU reset

2022-02-10 Thread Sharma, Shashank




On 2/10/2022 6:29 AM, Somalapuram, Amaranath wrote:


On 2/9/2022 1:17 PM, Christian König wrote:

On 08.02.22 at 16:28, Alex Deucher wrote:

On Tue, Feb 8, 2022 at 3:17 AM Somalapuram Amaranath
 wrote:

Dump the list of register values to trace event on GPU reset.

Signed-off-by: Somalapuram Amaranath 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h  | 19 +++
  2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 1e651b959141..057922fb7e37 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4534,6 +4534,23 @@ int amdgpu_device_pre_asic_reset(struct 
amdgpu_device *adev,

 return r;
  }

+static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev)
+{
+   int i;
+   uint32_t reg_value[128];
+
+   for (i = 0; adev->reset_dump_reg_list[i] != 0; i++) {
+   if (adev->asic_type >= CHIP_NAVI10)

This check should be against CHIP_VEGA10.  Also, this only allows for
GC registers.  If we wanted to dump other registers, we'd need a
different macro.  Might be better to just use RREG32 here for
everything and then encode the full offset using
SOC15_REG_ENTRY_OFFSET() or a similar macro.  Also, we need to think
about how to handle gfxoff in this case.  gfxoff needs to be disabled
or we'll hang the chip if we try and read GC or SDMA registers via
MMIO which will adversely affect the hang signature.


Well this should execute right before a GPU reset, so I think it 
shouldn't matter if we hang the chip or not as long as the read comes 
back correctly (I remember a very long UVD debug session because of 
this).


But in general I agree, we should just use RREG32() here and always 
encode the full register offset.


Regards,
Christian.


Can I use something like this:

+   reg_value[i] = RREG32((adev->reg_offset[adev->reset_dump_reg_list[i][0]]
+                                          [adev->reset_dump_reg_list[i][1]]
+                                          [adev->reset_dump_reg_list[i][2]])
+                         + adev->reset_dump_reg_list[i][3]);

ip --> adev->reset_dump_reg_list[i][0]

inst --> adev->reset_dump_reg_list[i][1]

BASE_IDX--> adev->reset_dump_reg_list[i][2]

reg --> adev->reset_dump_reg_list[i][3]

which requires 4 values in user space for each register.

using any existing macro like RREG32_SOC15** will not be able to pass 
proper argument from user space (like ip##_HWIP or reg##_BASE_IDX)





Why can't we just use a simple array
adev->reset_dump_reg_list[10] for both ip and reg offsets?

Userspace can provide the IP engine enum in the first entry of the array,
reset_dump_reg_list[0], and register offsets in the other entries starting
from 1. We can convert that into the desired engine substring using an
array of char *, something like:


const char *ip_engine_name_subs[] = {
/* Same order as enum amd_hw_ip_block_type */
"GC", "HDP", ..
};

u32 ip = adev->reset_dump_reg_list[0];
const char *ip_name = ip_engine_name_subs[ip];

for (i = 0; i < 9; i++) {
reg_val = RREG_SOC15_IP(ip_name, reset_dump_reg_list[i+1]);
}

- Shashank





Alex

+   reg_value[i] = RREG32_SOC15_IP(GC, 
adev->reset_dump_reg_list[i]);

+   else
+   reg_value[i] = 
RREG32(adev->reset_dump_reg_list[i]);

+   }
+
+ trace_amdgpu_reset_reg_dumps(adev->reset_dump_reg_list, reg_value, 
i);

+
+   return 0;
+}
+
  int amdgpu_do_asic_reset(struct list_head *device_list_handle,
  struct amdgpu_reset_context *reset_context)
  {
@@ -4567,8 +4584,10 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,

tmp_adev->gmc.xgmi.pending_reset = false;
 if (!queue_work(system_unbound_wq, 
&tmp_adev->xgmi_reset_work))

 r = -EALREADY;
-   } else
+   } else {
+ amdgpu_reset_reg_dumps(tmp_adev);
 r = amdgpu_asic_reset(tmp_adev);
+   }

 if (r) {
 dev_err(tmp_adev->dev, "ASIC reset 
failed with error, %d for drm dev, %s",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h

index d855cb53c7e0..3fe33de3564a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -537,6 +537,25 @@ TRACE_EVENT(amdgpu_ib_pipe_sync,
   __entry->seqno)
  );

+TRACE_EVENT(amdgpu_reset_reg_dumps,
+   TP_PROTO(long *address, uint32_t *value, int length),
+   TP_ARGS(address, value, length),
+   TP_STRUCT__entry(
+    __array(long, address, 128)
+    __array(uint32_t, value, 128)
+    __fi

Re: [PATCH 2/2] drm/amdgpu: add reset register trace function on GPU reset

2022-02-10 Thread Christian König

On 10.02.22 at 12:59, Sharma, Shashank wrote:



On 2/10/2022 6:29 AM, Somalapuram, Amaranath wrote:


On 2/9/2022 1:17 PM, Christian König wrote:

On 08.02.22 at 16:28, Alex Deucher wrote:

On Tue, Feb 8, 2022 at 3:17 AM Somalapuram Amaranath
 wrote:

Dump the list of register values to trace event on GPU reset.

Signed-off-by: Somalapuram Amaranath 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h  | 19 +++
  2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 1e651b959141..057922fb7e37 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4534,6 +4534,23 @@ int amdgpu_device_pre_asic_reset(struct 
amdgpu_device *adev,

 return r;
  }

+static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev)
+{
+   int i;
+   uint32_t reg_value[128];
+
+   for (i = 0; adev->reset_dump_reg_list[i] != 0; i++) {
+   if (adev->asic_type >= CHIP_NAVI10)

This check should be against CHIP_VEGA10.  Also, this only allows for
GC registers.  If we wanted to dump other registers, we'd need a
different macro.  Might be better to just use RREG32 here for
everything and then encode the full offset using
SOC15_REG_ENTRY_OFFSET() or a similar macro.  Also, we need to think
about how to handle gfxoff in this case.  gfxoff needs to be disabled
or we'll hang the chip if we try and read GC or SDMA registers via
MMIO which will adversely affect the hang signature.


Well this should execute right before a GPU reset, so I think it 
shouldn't matter if we hang the chip or not as long as the read 
comes back correctly (I remember a very long UVD debug session 
because of this).


But in general I agree, we should just use RREG32() here and always 
encode the full register offset.


Regards,
Christian.


Can I use something like this:

+   reg_value[i] = RREG32((adev->reg_offset[adev->reset_dump_reg_list[i][0]]
+                                          [adev->reset_dump_reg_list[i][1]]
+                                          [adev->reset_dump_reg_list[i][2]])
+                         + adev->reset_dump_reg_list[i][3]);

ip --> adev->reset_dump_reg_list[i][0]

inst --> adev->reset_dump_reg_list[i][1]

BASE_IDX--> adev->reset_dump_reg_list[i][2]

reg --> adev->reset_dump_reg_list[i][3]

which requires 4 values in user space for each register.

using any existing macro like RREG32_SOC15** will not be able to pass 
proper argument from user space (like ip##_HWIP or reg##_BASE_IDX)





Why can't we just use a simple array
adev->reset_dump_reg_list[10] for both ip and reg offsets?


That won't work. The IPs are separated into several base registers, see 
how the SOC15 functions work.


But that's also not necessary. Userspace should have the same 
information as the kernel about which IP is mapped where.


So all we need here is the already resolved 32bit value of which 
register to read and are basically done.


Regards,
Christian.
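
As a rough illustration of the "already resolved 32-bit value" idea, the dump loop reduces to reading each offset from a list. This is a userspace sketch, not the proposed driver code: mock_rreg32() is a hypothetical stand-in for an MMIO read, and the explicit RESET_DUMP_MAX bound is an addition of this sketch (the posted loop stops only at a zero entry while filling a 128-element array).

```c
#include <stddef.h>
#include <stdint.h>

#define RESET_DUMP_MAX 128 /* matches the size of reg_value[] in the patch */

/* Hypothetical stand-in for an MMIO read; returns a fake, deterministic value. */
static uint32_t mock_rreg32(uint32_t offset)
{
	return offset ^ 0xA5A5A5A5u;
}

/*
 * Read every offset in a zero-terminated list into values[] and return the
 * number of registers read. At most RESET_DUMP_MAX entries are consumed, so
 * values[] must hold at least RESET_DUMP_MAX elements.
 */
static size_t dump_reset_regs(const uint32_t *offsets, uint32_t *values)
{
	size_t i;

	for (i = 0; i < RESET_DUMP_MAX && offsets[i] != 0; i++)
		values[i] = mock_rreg32(offsets[i]);

	return i;
}
```

Because the list already holds final offsets, the loop needs no knowledge of IP blocks, instances or base indices.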



Userspace can provide the IP engine enum in the first entry of the array,
reset_dump_reg_list[0], and register offsets in the other entries starting
from 1. We can convert that into the desired engine substring using an
array of char *, something like:


const char *ip_engine_name_subs[] = {
/* Same order as enum amd_hw_ip_block_type */
"GC", "HDP", ..
};

u32 ip = adev->reset_dump_reg_list[0];
const char *ip_name = ip_engine_name_subs[ip];

for (i = 0; i < 9; i++) {
reg_val = RREG_SOC15_IP(ip_name, reset_dump_reg_list[i+1]);
}

- Shashank





Alex

+   reg_value[i] = RREG32_SOC15_IP(GC, 
adev->reset_dump_reg_list[i]);

+   else
+   reg_value[i] = 
RREG32(adev->reset_dump_reg_list[i]);

+   }
+
+ trace_amdgpu_reset_reg_dumps(adev->reset_dump_reg_list, 
reg_value, i);

+
+   return 0;
+}
+
  int amdgpu_do_asic_reset(struct list_head *device_list_handle,
  struct amdgpu_reset_context *reset_context)
  {
@@ -4567,8 +4584,10 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,

tmp_adev->gmc.xgmi.pending_reset = false;
 if 
(!queue_work(system_unbound_wq, &tmp_adev->xgmi_reset_work))

 r = -EALREADY;
-   } else
+   } else {
+ amdgpu_reset_reg_dumps(tmp_adev);
 r = amdgpu_asic_reset(tmp_adev);
+   }

 if (r) {
 dev_err(tmp_adev->dev, "ASIC 
reset failed with error, %d for drm dev, %s",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h

index d855cb53c7e0..3fe33de3564a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -537,6 +537,25 @@ TRACE_E

Re: [PATCH 2/2] drm/amdgpu: add reset register trace function on GPU reset

2022-02-10 Thread Sharma, Shashank




On 2/10/2022 8:38 AM, Christian König wrote:

On 10.02.22 at 08:34, Somalapuram, Amaranath wrote:


On 2/10/2022 12:39 PM, Christian König wrote:

On 10.02.22 at 06:29, Somalapuram, Amaranath wrote:


On 2/9/2022 1:17 PM, Christian König wrote:

On 08.02.22 at 16:28, Alex Deucher wrote:

On Tue, Feb 8, 2022 at 3:17 AM Somalapuram Amaranath
 wrote:

Dump the list of register values to trace event on GPU reset.

Signed-off-by: Somalapuram Amaranath 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h  | 19 +++

  2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 1e651b959141..057922fb7e37 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4534,6 +4534,23 @@ int amdgpu_device_pre_asic_reset(struct 
amdgpu_device *adev,

 return r;
  }

+static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev)
+{
+   int i;
+   uint32_t reg_value[128];
+
+   for (i = 0; adev->reset_dump_reg_list[i] != 0; i++) {
+   if (adev->asic_type >= CHIP_NAVI10)

This check should be against CHIP_VEGA10.  Also, this only allows for
GC registers.  If we wanted to dump other registers, we'd need a
different macro.  Might be better to just use RREG32 here for
everything and then encode the full offset using
SOC15_REG_ENTRY_OFFSET() or a similar macro.  Also, we need to think
about how to handle gfxoff in this case.  gfxoff needs to be disabled
or we'll hang the chip if we try and read GC or SDMA registers via
MMIO which will adversely affect the hang signature.


Well this should execute right before a GPU reset, so I think it 
shouldn't matter if we hang the chip or not as long as the read 
comes back correctly (I remember a very long UVD debug session 
because of this).


But in general I agree, we should just use RREG32() here and always 
encode the full register offset.


Regards,
Christian.


Can I use something like this:

+   reg_value[i] = RREG32((adev->reg_offset[adev->reset_dump_reg_list[i][0]]
+                                          [adev->reset_dump_reg_list[i][1]]
+                                          [adev->reset_dump_reg_list[i][2]])
+                         + adev->reset_dump_reg_list[i][3]);

ip --> adev->reset_dump_reg_list[i][0]

inst --> adev->reset_dump_reg_list[i][1]

BASE_IDX--> adev->reset_dump_reg_list[i][2]

reg --> adev->reset_dump_reg_list[i][3]


No, that won't work.

What you need to do is to use the full 32bit address of the register. 
Userspace can worry about figuring out which ip, instance, base and 
reg to resolve into that address.
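
A minimal sketch of the userspace side of that split, assuming a lookup table keyed by ip, instance and base index. Everything in it is hypothetical: the table contents, the IP indices and the resolve_reg() name are made up for illustration and do not come from any real ASIC or from the amdgpu headers. The only point is that the four values collapse into one flat offset before anything is handed to the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits and base table; real values are ASIC specific. */
#define MAX_IP   2
#define MAX_INST 2
#define MAX_SEG  3

enum { IP_GC = 0, IP_SDMA = 1 };    /* illustrative IP indices only */

static const uint32_t reg_base[MAX_IP][MAX_INST][MAX_SEG] = {
    [IP_GC]   = { { 0x2000, 0xA000, 0 }, { 0 } },
    [IP_SDMA] = { { 0x1260, 0xA000, 0 }, { 0x1860, 0xA000, 0 } },
};

/*
 * Resolve (ip, inst, base_idx, reg) into one flat register offset.
 * Returns 0 on success and -1 if any index is out of range.
 */
int resolve_reg(unsigned int ip, unsigned int inst, unsigned int base_idx,
                uint32_t reg, uint32_t *out)
{
    assert(out != NULL);
    if (ip >= MAX_IP || inst >= MAX_INST || base_idx >= MAX_SEG)
        return -1;
    *out = reg_base[ip][inst][base_idx] + reg;
    return 0;
}
```

With a helper like this in the tool, the list written to the kernel would hold only the resolved offsets, and the kernel side would need nothing more than RREG32() on each entry.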


Regards,
Christian.


Thanks Christian.

Should I consider disabling gfxoff as in the code below, or is that not required?
amdgpu_gfx_off_ctrl(adev, false);
amdgpu_gfx_off_ctrl(adev, true);


That's a really good question I can't fully answer.

I think we don't want that because the GPU is stuck when the dump is 
made, but better let Alex comment as well.


Regards,
Christian.



I had a quick look at the function amdgpu_gfx_off_ctrl, and it locks 
this mutex internally:

mutex_lock(&adev->gfx.gfx_off_mutex);

and the reference state is tracked in:
adev->gfx.gfx_off_state

We can do something like this maybe:
if (adev->gfx.gfx_off_state == 0) {
        if (mutex_trylock(&adev->gfx.gfx_off_mutex)) {
                read_regs_now();
                mutex_unlock(&adev->gfx.gfx_off_mutex);
        }
}

How does that sound?

- Shashank





Regards,

S.Amarnath


which requires 4 values in user space for each register.

using any existing macro like RREG32_SOC15** will not be able to 
pass proper argument from user space (like ip##_HWIP or reg##_BASE_IDX)






Alex


+ reg_value[i] = RREG32_SOC15_IP(GC, adev->reset_dump_reg_list[i]);
+   else
+   reg_value[i] = 
RREG32(adev->reset_dump_reg_list[i]);

+   }
+
+ trace_amdgpu_reset_reg_dumps(adev->reset_dump_reg_list, 
reg_value, i);

+
+   return 0;
+}
+
  int amdgpu_do_asic_reset(struct list_head *device_list_handle,
  struct amdgpu_reset_context 
*reset_context)

  {
@@ -4567,8 +4584,10 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,

tmp_adev->gmc.xgmi.pending_reset = false;
 if 
(!queue_work(system_unbound_wq, &tmp_adev->xgmi_reset_work))

 r = -EALREADY;
-   } else
+   } else {
+ amdgpu_reset_reg_dumps(tmp_adev);
 r = amdgpu_asic_reset(tmp_adev);
+   }

 if (r) {
dev_err(tmp_adev->dev, "ASIC reset failed with error, %d for drm 
dev, %s",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h

index d855cb53c7e0..3fe33de3564a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -537,6 +537,25 @@ TRACE_EVENT(amdgpu_ib_pipe_sync,
 

[PATCH v2] drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.

2022-02-10 Thread Rajib Mahapatra
[Why]
SDMA ring buffer test failed if suspend is aborted during
S0i3 resume.

[How]
If suspend is aborted for some reason during the S0i3 resume
cycle, the SDMA ring test fails and amdgpu resume reports
errors. For RN/CZN/Picasso, the SMU saves and restores the SDMA
registers during the S0ix cycle, so skipping SDMA suspend and
resume in the driver solves the issue. With this change, the
system is able to resume gracefully even if the suspend is aborted.

v2: add changes on sdma_v4, skipping SDMA hw_init and hw_fini.
Signed-off-by: Rajib Mahapatra 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index 06a7ceda4c87..02115d63b071 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -2058,6 +2058,10 @@ static int sdma_v4_0_suspend(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
+   /* SMU saves SDMA state for us */
+   if (adev->in_s0ix)
+   return 0;
+
return sdma_v4_0_hw_fini(adev);
 }
 
@@ -2065,6 +2069,10 @@ static int sdma_v4_0_resume(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
+   /* SMU restores SDMA state for us */
+   if (adev->in_s0ix)
+   return 0;
+
return sdma_v4_0_hw_init(adev);
 }
 
-- 
2.25.1



RE: [PATCH v2] drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.

2022-02-10 Thread Limonciello, Mario
[Public]



> -Original Message-
> From: Mahapatra, Rajib 
> Sent: Thursday, February 10, 2022 07:35
> To: Liang, Prike ; Limonciello, Mario
> ; Deucher, Alexander
> 
> Cc: amd-gfx@lists.freedesktop.org; S, Shirish ;
> Mahapatra, Rajib 
> Subject: [PATCH v2] drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.
> 
> [Why]
> SDMA ring buffer test failed if suspend is aborted during
> S0i3 resume.
> 
> [How]
> If suspend is aborted for some reason during the S0i3 resume
> cycle, the SDMA ring test fails and amdgpu resume reports
> errors. For RN/CZN/Picasso, the SMU saves and restores the SDMA
> registers during the S0ix cycle, so skipping SDMA suspend and
> resume in the driver solves the issue. With this change, the
> system is able to resume gracefully even if the suspend is aborted.
> 
> v2: add changes on sdma_v4, skipping SDMA hw_init and hw_fini.

This line in the commit message should be "below" the ---

Besides that the code is better.

Reviewed-by: Mario Limonciello 

> Signed-off-by: Rajib Mahapatra 
> ---
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index 06a7ceda4c87..02115d63b071 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -2058,6 +2058,10 @@ static int sdma_v4_0_suspend(void *handle)
>  {
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> 
> + /* SMU saves SDMA state for us */
> + if (adev->in_s0ix)
> + return 0;
> +
>   return sdma_v4_0_hw_fini(adev);
>  }
> 
> @@ -2065,6 +2069,10 @@ static int sdma_v4_0_resume(void *handle)
>  {
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> 
> + /* SMU restores SDMA state for us */
> + if (adev->in_s0ix)
> + return 0;
> +
>   return sdma_v4_0_hw_init(adev);
>  }
> 
> --
> 2.25.1


Re: [PATCH 2/2] drm/amdgpu: add reset register trace function on GPU reset

2022-02-10 Thread Christian König

Am 10.02.22 um 14:18 schrieb Sharma, Shashank:



On 2/10/2022 8:38 AM, Christian König wrote:

Am 10.02.22 um 08:34 schrieb Somalapuram, Amaranath:


On 2/10/2022 12:39 PM, Christian König wrote:

Am 10.02.22 um 06:29 schrieb Somalapuram, Amaranath:


On 2/9/2022 1:17 PM, Christian König wrote:

Am 08.02.22 um 16:28 schrieb Alex Deucher:

On Tue, Feb 8, 2022 at 3:17 AM Somalapuram Amaranath
 wrote:

Dump the list of register values to trace event on GPU reset.

Signed-off-by: Somalapuram Amaranath 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h  | 19 +++
  2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 1e651b959141..057922fb7e37 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4534,6 +4534,23 @@ int amdgpu_device_pre_asic_reset(struct 
amdgpu_device *adev,

 return r;
  }

+static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev)
+{
+   int i;
+   uint32_t reg_value[128];
+
+   for (i = 0; adev->reset_dump_reg_list[i] != 0; i++) {
+   if (adev->asic_type >= CHIP_NAVI10)
This check should be against CHIP_VEGA10.  Also, this only 
allows for

GC registers.  If we wanted to dump other registers, we'd need a
different macro.  Might be better to just use RREG32 here for
everything and then encode the full offset using
SOC15_REG_ENTRY_OFFSET() or a similar macro.  Also, we need to 
think
about how to handle gfxoff in this case.  gfxoff needs to be 
disabled

or we'll hang the chip if we try and read GC or SDMA registers via
MMIO which will adversely affect the hang signature.


Well this should execute right before a GPU reset, so I think it 
shouldn't matter if we hang the chip or not as long as the read 
comes back correctly (I remember a very long UVD debug session 
because of this).


But in general I agree, we should just use RREG32() here and 
always encode the full register offset.


Regards,
Christian.


Can I use something like this:

+   reg_value[i] = 
RREG32((adev->reg_offset[adev->reset_dump_reg_list[i][0]]

+ [adev->reset_dump_reg_list[i][1]]
+ [adev->reset_dump_reg_list[i][2]])
+                                 + adev->reset_dump_reg_list[i][3]);

ip --> adev->reset_dump_reg_list[i][0]

inst --> adev->reset_dump_reg_list[i][1]

BASE_IDX--> adev->reset_dump_reg_list[i][2]

reg --> adev->reset_dump_reg_list[i][3]


No, that won't work.

What you need to do is to use the full 32bit address of the 
register. Userspace can worry about figuring out which ip, 
instance, base and reg to resolve into that address.


Regards,
Christian.


Thanks Christian.

Should I consider disabling gfxoff as in the code below, or is that not required?
amdgpu_gfx_off_ctrl(adev, false);
amdgpu_gfx_off_ctrl(adev, true);


That's a really good question I can't fully answer.

I think we don't want that because the GPU is stuck when the dump is 
made, but better let Alex comment as well.


Regards,
Christian.



I had a quick look at the function amdgpu_gfx_off_ctrl, and it locks 
this mutex internally:

mutex_lock(&adev->gfx.gfx_off_mutex);

and the reference state is tracked in:
adev->gfx.gfx_off_state

We can do something like this maybe:
if (adev->gfx.gfx_off_state == 0) {
        if (mutex_trylock(&adev->gfx.gfx_off_mutex)) {
                read_regs_now();
                mutex_unlock(&adev->gfx.gfx_off_mutex);
        }
}

How does that sound?


As far as I know that won't work. GFX_off is only disabled intentionally 
in very few places.


So we will probably never get a register trace with that.

Regards,
Christian.



- Shashank





Regards,

S.Amarnath


which requires 4 values in user space for each register.

using any existing macro like RREG32_SOC15** will not be able to 
pass proper argument from user space (like ip##_HWIP or 
reg##_BASE_IDX)






Alex

+ reg_value[i] = RREG32_SOC15_IP(GC, 
adev->reset_dump_reg_list[i]);

+   else
+   reg_value[i] = 
RREG32(adev->reset_dump_reg_list[i]);

+   }
+
+ trace_amdgpu_reset_reg_dumps(adev->reset_dump_reg_list, 
reg_value, i);

+
+   return 0;
+}
+
  int amdgpu_do_asic_reset(struct list_head *device_list_handle,
  struct amdgpu_reset_context 
*reset_context)

  {
@@ -4567,8 +4584,10 @@ int amdgpu_do_asic_reset(struct 
list_head *device_list_handle,

tmp_adev->gmc.xgmi.pending_reset = false;
 if 
(!queue_work(system_unbound_wq, &tmp_adev->xgmi_reset_work))

 r = -EALREADY;
-   } else
+   } else {
+ amdgpu_reset_reg_dumps(tmp_adev);
 r = amdgpu_asic_reset(tmp_adev);
+   }

 if (r) {
dev_err(tmp_adev->dev, "ASIC reset failed with error, %d for 
drm dev, %s",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
b/

Re: [PATCH 2/2] drm/amdgpu: add reset register trace function on GPU reset

2022-02-10 Thread Sharma, Shashank




On 2/10/2022 3:05 PM, Christian König wrote:

Am 10.02.22 um 14:18 schrieb Sharma, Shashank:



On 2/10/2022 8:38 AM, Christian König wrote:

Am 10.02.22 um 08:34 schrieb Somalapuram, Amaranath:


On 2/10/2022 12:39 PM, Christian König wrote:

Am 10.02.22 um 06:29 schrieb Somalapuram, Amaranath:


On 2/9/2022 1:17 PM, Christian König wrote:

Am 08.02.22 um 16:28 schrieb Alex Deucher:

On Tue, Feb 8, 2022 at 3:17 AM Somalapuram Amaranath
 wrote:

Dump the list of register values to trace event on GPU reset.

Signed-off-by: Somalapuram Amaranath 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h  | 19 +++
  2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 1e651b959141..057922fb7e37 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4534,6 +4534,23 @@ int amdgpu_device_pre_asic_reset(struct 
amdgpu_device *adev,

 return r;
  }

+static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev)
+{
+   int i;
+   uint32_t reg_value[128];
+
+   for (i = 0; adev->reset_dump_reg_list[i] != 0; i++) {
+   if (adev->asic_type >= CHIP_NAVI10)
This check should be against CHIP_VEGA10.  Also, this only 
allows for

GC registers.  If we wanted to dump other registers, we'd need a
different macro.  Might be better to just use RREG32 here for
everything and then encode the full offset using
SOC15_REG_ENTRY_OFFSET() or a similar macro.  Also, we need to 
think
about how to handle gfxoff in this case.  gfxoff needs to be 
disabled

or we'll hang the chip if we try and read GC or SDMA registers via
MMIO which will adversely affect the hang signature.


Well this should execute right before a GPU reset, so I think it 
shouldn't matter if we hang the chip or not as long as the read 
comes back correctly (I remember a very long UVD debug session 
because of this).


But in general I agree, we should just use RREG32() here and 
always encode the full register offset.


Regards,
Christian.


Can I use something like this:

+   reg_value[i] = 
RREG32((adev->reg_offset[adev->reset_dump_reg_list[i][0]]

+ [adev->reset_dump_reg_list[i][1]]
+ [adev->reset_dump_reg_list[i][2]])
+                                 + adev->reset_dump_reg_list[i][3]);

ip --> adev->reset_dump_reg_list[i][0]

inst --> adev->reset_dump_reg_list[i][1]

BASE_IDX--> adev->reset_dump_reg_list[i][2]

reg --> adev->reset_dump_reg_list[i][3]


No, that won't work.

What you need to do is to use the full 32bit address of the 
register. Userspace can worry about figuring out which ip, 
instance, base and reg to resolve into that address.


Regards,
Christian.


Thanks Christian.

Should I consider disabling gfxoff as in the code below, or is that not required?
amdgpu_gfx_off_ctrl(adev, false);
amdgpu_gfx_off_ctrl(adev, true);


That's a really good question I can't fully answer.

I think we don't want that because the GPU is stuck when the dump is 
made, but better let Alex comment as well.


Regards,
Christian.



I had a quick look at the function amdgpu_gfx_off_ctrl, and it locks 
this mutex internally:

mutex_lock(&adev->gfx.gfx_off_mutex);

and the reference state is tracked in:
adev->gfx.gfx_off_state

We can do something like this maybe:
if (adev->gfx.gfx_off_state == 0) {
        if (mutex_trylock(&adev->gfx.gfx_off_mutex)) {
                read_regs_now();
                mutex_unlock(&adev->gfx.gfx_off_mutex);
        }
}

How does that sound?


As far as I know that won't work. GFX_off is only disabled intentionally 
in very few places.


So we will probably never get a register trace with that.



Ok, I don't know much about this feature, but due to the name I was 
under the impression that gfx_off will be mostly disabled. But if it is 
hardly ever disabled, we need more information about it first, like 
when is it disabled and why?


Alex ?

- Shashank


Regards,
Christian.



- Shashank





Regards,

S.Amarnath


which requires 4 values in user space for each register.

using any existing macro like RREG32_SOC15** will not be able to 
pass proper argument from user space (like ip##_HWIP or 
reg##_BASE_IDX)






Alex

+ reg_value[i] = RREG32_SOC15_IP(GC, 
adev->reset_dump_reg_list[i]);

+   else
+   reg_value[i] = 
RREG32(adev->reset_dump_reg_list[i]);

+   }
+
+ trace_amdgpu_reset_reg_dumps(adev->reset_dump_reg_list, 
reg_value, i);

+
+   return 0;
+}
+
  int amdgpu_do_asic_reset(struct list_head *device_list_handle,
  struct amdgpu_reset_context 
*reset_context)

  {
@@ -4567,8 +4584,10 @@ int amdgpu_do_asic_reset(struct 
list_head *device_list_handle,

tmp_adev->gmc.xgmi.pending_reset = false;
 if 
(!queue_work(system_unbound_wq, &tmp_adev->xgmi_reset_work))

 r = -EALREADY;
-   } else
+   

Re: [PATCH v2 2/3] mm/gup.c: Migrate device coherent pages when pinning instead of failing

2022-02-10 Thread David Hildenbrand
On 10.02.22 12:39, Alistair Popple wrote:
> On Thursday, 10 February 2022 9:53:38 PM AEDT David Hildenbrand wrote:
>> On 07.02.22 05:26, Alistair Popple wrote:
>>> Currently any attempts to pin a device coherent page will fail. This is
>>> because device coherent pages need to be managed by a device driver, and
>>> pinning them would prevent a driver from migrating them off the device.
>>>
>>> However this is no reason to fail pinning of these pages. These are
>>> coherent and accessible from the CPU so can be migrated just like
>>> pinning ZONE_MOVABLE pages. So instead of failing all attempts to pin
>>> them first try migrating them out of ZONE_DEVICE.
>>>
>>> Signed-off-by: Alistair Popple 
>>> Acked-by: Felix Kuehling 
>>> ---
>>>
>>> Changes for v2:
>>>
>>>  - Added Felix's Acked-by
>>>  - Fixed missing check for dpage == NULL
>>>
>>>  mm/gup.c | 105 ++--
>>>  1 file changed, 95 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 56d9577..5e826db 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -1861,6 +1861,60 @@ struct page *get_dump_page(unsigned long addr)
>>>  
>>>  #ifdef CONFIG_MIGRATION
>>>  /*
>>> + * Migrates a device coherent page back to normal memory. Caller should 
>>> have a
>>> + * reference on page which will be copied to the new page if migration is
>>> + * successful or dropped on failure.
>>> + */
>>> +static struct page *migrate_device_page(struct page *page,
>>> +   unsigned int gup_flags)
>>> +{
>>> +   struct page *dpage;
>>> +   struct migrate_vma args;
>>> +   unsigned long src_pfn, dst_pfn = 0;
>>> +
>>> +   lock_page(page);
>>> +   src_pfn = migrate_pfn(page_to_pfn(page)) | MIGRATE_PFN_MIGRATE;
>>> +   args.src = &src_pfn;
>>> +   args.dst = &dst_pfn;
>>> +   args.cpages = 1;
>>> +   args.npages = 1;
>>> +   args.vma = NULL;
>>> +   migrate_vma_setup(&args);
>>> +   if (!(src_pfn & MIGRATE_PFN_MIGRATE))
>>> +   return NULL;
>>> +
>>> +   dpage = alloc_pages(GFP_USER | __GFP_NOWARN, 0);
>>> +
>>> +   /*
>>> +* get/pin the new page now so we don't have to retry gup after
>>> +* migrating. We already have a reference so this should never fail.
>>> +*/
>>> +   if (dpage && WARN_ON_ONCE(!try_grab_page(dpage, gup_flags))) {
>>> +   __free_pages(dpage, 0);
>>> +   dpage = NULL;
>>> +   }
>>> +
>>> +   if (dpage) {
>>> +   lock_page(dpage);
>>> +   dst_pfn = migrate_pfn(page_to_pfn(dpage));
>>> +   }
>>> +
>>> +   migrate_vma_pages(&args);
>>> +   if (src_pfn & MIGRATE_PFN_MIGRATE)
>>> +   copy_highpage(dpage, page);
>>> +   migrate_vma_finalize(&args);
>>> +   if (dpage && !(src_pfn & MIGRATE_PFN_MIGRATE)) {
>>> +   if (gup_flags & FOLL_PIN)
>>> +   unpin_user_page(dpage);
>>> +   else
>>> +   put_page(dpage);
>>> +   dpage = NULL;
>>> +   }
>>> +
>>> +   return dpage;
>>> +}
>>> +
>>> +/*
>>>   * Check whether all pages are pinnable, if so return number of pages.  If 
>>> some
>>>   * pages are not pinnable, migrate them, and unpin all pages. Return zero 
>>> if
>>>   * pages were migrated, or if some pages were not successfully isolated.
>>> @@ -1888,15 +1942,40 @@ static long 
>>> check_and_migrate_movable_pages(unsigned long nr_pages,
>>> continue;
>>> prev_head = head;
>>> /*
>>> -* If we get a movable page, since we are going to be pinning
>>> -* these entries, try to move them out if possible.
>>> +* Device coherent pages are managed by a driver and should not
>>> +* be pinned indefinitely as it prevents the driver moving the
>>> +* page. So when trying to pin with FOLL_LONGTERM instead try
>>> +* migrating page out of device memory.
>>>  */
>>> if (is_dev_private_or_coherent_page(head)) {
>>> +   /*
>>> +* device private pages will get faulted in during gup
>>> +* so it shouldn't be possible to see one here.
>>> +*/
>>> WARN_ON_ONCE(is_device_private_page(head));
>>> -   ret = -EFAULT;
>>> -   goto unpin_pages;
>>> +   WARN_ON_ONCE(PageCompound(head));
>>> +
>>> +   /*
>>> +* migration will fail if the page is pinned, so convert
>>> +* the pin on the source page to a normal reference.
>>> +*/
>>> +   if (gup_flags & FOLL_PIN) {
>>> +   get_page(head);
>>> +   unpin_user_page(head);
>>> +   }
>>> +
>>> +   pages[i] = migrate_device_page(head, gup_flags);
>>
>> For ordinary migrate_pages(), we'll unpin all pages and return 0 so the
>> caller will retry pinning by walking the page tables again.
>>
>>

[PATCH] gpu: drm: radeon: use time_after_eq() instead of jiffies judgment

2022-02-10 Thread Qing Wang
From: Wang Qing 

It is better to use the time_xxx() helpers than to compare jiffies
directly; the intent is easier to understand.

Signed-off-by: Wang Qing 
---
 drivers/gpu/drm/radeon/radeon_pm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_pm.c 
b/drivers/gpu/drm/radeon/radeon_pm.c
index c67b6dd..53d536a
--- a/drivers/gpu/drm/radeon/radeon_pm.c
+++ b/drivers/gpu/drm/radeon/radeon_pm.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -1899,7 +1900,7 @@ static void radeon_dynpm_idle_work_handler(struct 
work_struct *work)
 * to false since we want to wait for vbl to avoid flicker.
 */
if (rdev->pm.dynpm_planned_action != DYNPM_ACTION_NONE &&
-   jiffies > rdev->pm.dynpm_action_timeout) {
+   time_after(jiffies, rdev->pm.dynpm_action_timeout)) {
radeon_pm_get_dynpm_state(rdev);
radeon_pm_set_clocks(rdev);
}
-- 
2.7.4
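
A small userspace sketch of why the helper form is preferable to the plain comparison the diff replaces. It assumes the usual signed-difference idiom behind time_after(); the sketch_time_after() and naive_after() names are hypothetical stand-ins, not the kernel macros themselves, and the cast relies on the two's-complement conversion that common compilers provide.

```c
#include <assert.h>
#include <limits.h>

/*
 * Stand-in for time_after(a, b): true when a is later than b. The
 * comparison is done on the signed difference, so it stays correct
 * when the counter wraps past zero.
 */
int sketch_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* The plain comparison that the patch replaces. */
int naive_after(unsigned long now, unsigned long timeout)
{
    return now > timeout;
}

/* Self-check for a timeout armed just before the counter wraps. */
void sketch_wrap_demo(void)
{
    unsigned long timeout = ULONG_MAX - 5;
    unsigned long now = 10;    /* the counter has wrapped past zero */

    assert(!naive_after(now, timeout));       /* plain compare misses it */
    assert(sketch_time_after(now, timeout));  /* wrap-safe compare fires */
}
```

Away from the wrap point the two forms agree, which is why the change in the diff is behaviour-preserving in the common case.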



Re: [PATCH v2 2/3] mm/gup.c: Migrate device coherent pages when pinning instead of failing

2022-02-10 Thread Alistair Popple
On Thursday, 10 February 2022 9:53:38 PM AEDT David Hildenbrand wrote:
> On 07.02.22 05:26, Alistair Popple wrote:
> > Currently any attempts to pin a device coherent page will fail. This is
> > because device coherent pages need to be managed by a device driver, and
> > pinning them would prevent a driver from migrating them off the device.
> > 
> > However this is no reason to fail pinning of these pages. These are
> > coherent and accessible from the CPU so can be migrated just like
> > pinning ZONE_MOVABLE pages. So instead of failing all attempts to pin
> > them first try migrating them out of ZONE_DEVICE.
> > 
> > Signed-off-by: Alistair Popple 
> > Acked-by: Felix Kuehling 
> > ---
> > 
> > Changes for v2:
> > 
> >  - Added Felix's Acked-by
> >  - Fixed missing check for dpage == NULL
> > 
> >  mm/gup.c | 105 ++--
> >  1 file changed, 95 insertions(+), 10 deletions(-)
> > 
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 56d9577..5e826db 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -1861,6 +1861,60 @@ struct page *get_dump_page(unsigned long addr)
> >  
> >  #ifdef CONFIG_MIGRATION
> >  /*
> > + * Migrates a device coherent page back to normal memory. Caller should 
> > have a
> > + * reference on page which will be copied to the new page if migration is
> > + * successful or dropped on failure.
> > + */
> > +static struct page *migrate_device_page(struct page *page,
> > +   unsigned int gup_flags)
> > +{
> > +   struct page *dpage;
> > +   struct migrate_vma args;
> > +   unsigned long src_pfn, dst_pfn = 0;
> > +
> > +   lock_page(page);
> > +   src_pfn = migrate_pfn(page_to_pfn(page)) | MIGRATE_PFN_MIGRATE;
> > +   args.src = &src_pfn;
> > +   args.dst = &dst_pfn;
> > +   args.cpages = 1;
> > +   args.npages = 1;
> > +   args.vma = NULL;
> > +   migrate_vma_setup(&args);
> > +   if (!(src_pfn & MIGRATE_PFN_MIGRATE))
> > +   return NULL;
> > +
> > +   dpage = alloc_pages(GFP_USER | __GFP_NOWARN, 0);
> > +
> > +   /*
> > +* get/pin the new page now so we don't have to retry gup after
> > +* migrating. We already have a reference so this should never fail.
> > +*/
> > +   if (dpage && WARN_ON_ONCE(!try_grab_page(dpage, gup_flags))) {
> > +   __free_pages(dpage, 0);
> > +   dpage = NULL;
> > +   }
> > +
> > +   if (dpage) {
> > +   lock_page(dpage);
> > +   dst_pfn = migrate_pfn(page_to_pfn(dpage));
> > +   }
> > +
> > +   migrate_vma_pages(&args);
> > +   if (src_pfn & MIGRATE_PFN_MIGRATE)
> > +   copy_highpage(dpage, page);
> > +   migrate_vma_finalize(&args);
> > +   if (dpage && !(src_pfn & MIGRATE_PFN_MIGRATE)) {
> > +   if (gup_flags & FOLL_PIN)
> > +   unpin_user_page(dpage);
> > +   else
> > +   put_page(dpage);
> > +   dpage = NULL;
> > +   }
> > +
> > +   return dpage;
> > +}
> > +
> > +/*
> >   * Check whether all pages are pinnable, if so return number of pages.  If 
> > some
> >   * pages are not pinnable, migrate them, and unpin all pages. Return zero 
> > if
> >   * pages were migrated, or if some pages were not successfully isolated.
> > @@ -1888,15 +1942,40 @@ static long 
> > check_and_migrate_movable_pages(unsigned long nr_pages,
> > continue;
> > prev_head = head;
> > /*
> > -* If we get a movable page, since we are going to be pinning
> > -* these entries, try to move them out if possible.
> > +* Device coherent pages are managed by a driver and should not
> > +* be pinned indefinitely as it prevents the driver moving the
> > +* page. So when trying to pin with FOLL_LONGTERM instead try
> > +* migrating page out of device memory.
> >  */
> > if (is_dev_private_or_coherent_page(head)) {
> > +   /*
> > +* device private pages will get faulted in during gup
> > +* so it shouldn't be possible to see one here.
> > +*/
> > WARN_ON_ONCE(is_device_private_page(head));
> > -   ret = -EFAULT;
> > -   goto unpin_pages;
> > +   WARN_ON_ONCE(PageCompound(head));
> > +
> > +   /*
> > +* migration will fail if the page is pinned, so convert
> > +* the pin on the source page to a normal reference.
> > +*/
> > +   if (gup_flags & FOLL_PIN) {
> > +   get_page(head);
> > +   unpin_user_page(head);
> > +   }
> > +
> > +   pages[i] = migrate_device_page(head, gup_flags);
> 
> For ordinary migrate_pages(), we'll unpin all pages and return 0 so the
> caller will retry pinning by walking the page tables again.
> 
> Why can't we apply the same mechanism her

Re: [PATCH 05/23] drm/amd/display: Fix color encoding mismatch

2022-02-10 Thread Maxime Ripard
Hi Harry,

On Mon, Feb 07, 2022 at 01:59:38PM -0500, Harry Wentland wrote:
> On 2022-02-07 13:57, Harry Wentland wrote:
> > On 2022-02-07 11:34, Maxime Ripard wrote:
> >> The amdgpu KMS driver calls drm_plane_create_color_properties() with a
> >> default encoding set to BT709.
> >>
> >> However, the core will ignore it and the driver doesn't force it in its
> >> plane state reset hook, so the initial value will be 0, which represents
> >> BT601.
> >>
> > 
> > Isn't this a core issue? Should __drm_atomic_helper_plane_state_reset
> > reset all plane_state members to their properties' default values?
> > 
> 
> Ah, looks like that's exactly what you do in the later patches, which is
> perfect. With that, I don't think you'll need this patch anymore.

Ok, I'll squash it into the patch that removes the reset code.

Thanks!
Maxime
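
The mismatch described in the quoted patch description can be modelled in a few lines. This is only a toy sketch under stated assumptions: the toy_* types and functions are hypothetical and are not the DRM structures or helpers; the one detail taken from the thread is that BT601 is the value 0, so a state that is merely zeroed ends up as BT601 regardless of the default the property advertises.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Ordering as discussed in the thread: BT601 is value 0. */
enum toy_color_encoding { TOY_BT601 = 0, TOY_BT709 = 1 };

struct toy_plane_state {
    enum toy_color_encoding color_encoding;
};

struct toy_plane {
    enum toy_color_encoding prop_default;  /* default the property advertises */
    struct toy_plane_state state;
};

/* Reset that only zeroes the state; the advertised default is ignored. */
void toy_reset_zero(struct toy_plane *p)
{
    assert(p != NULL);
    memset(&p->state, 0, sizeof(p->state));
}

/* Reset that also copies the property default into the state. */
void toy_reset_with_default(struct toy_plane *p)
{
    assert(p != NULL);
    memset(&p->state, 0, sizeof(p->state));
    p->state.color_encoding = p->prop_default;
}
```

The second variant corresponds to the idea raised in the quoted question, where the reset path is what applies the property defaults.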




Re: [PATCH v2 2/3] mm/gup.c: Migrate device coherent pages when pinning instead of failing

2022-02-10 Thread David Hildenbrand
On 07.02.22 05:26, Alistair Popple wrote:
> Currently any attempts to pin a device coherent page will fail. This is
> because device coherent pages need to be managed by a device driver, and
> pinning them would prevent a driver from migrating them off the device.
> 
> However this is no reason to fail pinning of these pages. These are
> coherent and accessible from the CPU so can be migrated just like
> pinning ZONE_MOVABLE pages. So instead of failing all attempts to pin
> them first try migrating them out of ZONE_DEVICE.
> 
> Signed-off-by: Alistair Popple 
> Acked-by: Felix Kuehling 
> ---
> 
> Changes for v2:
> 
>  - Added Felix's Acked-by
>  - Fixed missing check for dpage == NULL
> 
>  mm/gup.c | 105 ++--
>  1 file changed, 95 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 56d9577..5e826db 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1861,6 +1861,60 @@ struct page *get_dump_page(unsigned long addr)
>  
>  #ifdef CONFIG_MIGRATION
>  /*
> + * Migrates a device coherent page back to normal memory. Caller should have 
> a
> + * reference on page which will be copied to the new page if migration is
> + * successful or dropped on failure.
> + */
> +static struct page *migrate_device_page(struct page *page,
> + unsigned int gup_flags)
> +{
> + struct page *dpage;
> + struct migrate_vma args;
> + unsigned long src_pfn, dst_pfn = 0;
> +
> + lock_page(page);
> + src_pfn = migrate_pfn(page_to_pfn(page)) | MIGRATE_PFN_MIGRATE;
> + args.src = &src_pfn;
> + args.dst = &dst_pfn;
> + args.cpages = 1;
> + args.npages = 1;
> + args.vma = NULL;
> + migrate_vma_setup(&args);
> + if (!(src_pfn & MIGRATE_PFN_MIGRATE))
> + return NULL;
> +
> + dpage = alloc_pages(GFP_USER | __GFP_NOWARN, 0);
> +
> + /*
> +  * get/pin the new page now so we don't have to retry gup after
> +  * migrating. We already have a reference so this should never fail.
> +  */
> + if (dpage && WARN_ON_ONCE(!try_grab_page(dpage, gup_flags))) {
> + __free_pages(dpage, 0);
> + dpage = NULL;
> + }
> +
> + if (dpage) {
> + lock_page(dpage);
> + dst_pfn = migrate_pfn(page_to_pfn(dpage));
> + }
> +
> + migrate_vma_pages(&args);
> + if (src_pfn & MIGRATE_PFN_MIGRATE)
> + copy_highpage(dpage, page);
> + migrate_vma_finalize(&args);
> + if (dpage && !(src_pfn & MIGRATE_PFN_MIGRATE)) {
> + if (gup_flags & FOLL_PIN)
> + unpin_user_page(dpage);
> + else
> + put_page(dpage);
> + dpage = NULL;
> + }
> +
> + return dpage;
> +}
> +
> +/*
>   * Check whether all pages are pinnable, if so return number of pages.  If 
> some
>   * pages are not pinnable, migrate them, and unpin all pages. Return zero if
>   * pages were migrated, or if some pages were not successfully isolated.
> @@ -1888,15 +1942,40 @@ static long check_and_migrate_movable_pages(unsigned 
> long nr_pages,
>   continue;
>   prev_head = head;
>   /*
> -  * If we get a movable page, since we are going to be pinning
> -  * these entries, try to move them out if possible.
> +  * Device coherent pages are managed by a driver and should not
> +  * be pinned indefinitely as it prevents the driver moving the
> +  * page. So when trying to pin with FOLL_LONGTERM instead try
> +  * migrating page out of device memory.
>*/
>   if (is_dev_private_or_coherent_page(head)) {
> + /*
> +  * device private pages will get faulted in during gup
> +  * so it shouldn't be possible to see one here.
> +  */
>   WARN_ON_ONCE(is_device_private_page(head));
> - ret = -EFAULT;
> - goto unpin_pages;
> + WARN_ON_ONCE(PageCompound(head));
> +
> + /*
> +  * migration will fail if the page is pinned, so convert
> +  * the pin on the source page to a normal reference.
> +  */
> + if (gup_flags & FOLL_PIN) {
> + get_page(head);
> + unpin_user_page(head);
> + }
> +
> + pages[i] = migrate_device_page(head, gup_flags);



For ordinary migrate_pages(), we'll unpin all pages and return 0 so the
caller will retry pinning by walking the page tables again.

Why can't we apply the same mechanism here? This "let's avoid another
walk" looks unnecessarily complicated to me, but I might be wrong.

-- 
Thanks,

David / dhildenb
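
The retry scheme described above for ordinary migrate_pages() can be sketched as a toy model: pin everything, check, and if anything had to be migrated, drop all pins and return 0 so the caller walks again. All names here (toy_page, toy_pin_all, toy_check_and_migrate, toy_pin_longterm) are hypothetical and none of them is a kernel API; migration is reduced to clearing a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not a kernel API. */
struct toy_page {
    int in_device;  /* 1 while the page lives in device memory */
    int pins;       /* outstanding pins */
};

/* "Page table walk": pin whatever page currently backs each slot. */
void toy_pin_all(struct toy_page **pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pages[i]->pins++;
}

/*
 * Return n if every page may stay pinned. Otherwise drop all pins,
 * migrate the device pages, and return 0 so the caller retries.
 */
long toy_check_and_migrate(struct toy_page **pages, size_t n)
{
    int need_migrate = 0;

    for (size_t i = 0; i < n; i++)
        need_migrate |= pages[i]->in_device;
    if (!need_migrate)
        return (long)n;

    for (size_t i = 0; i < n; i++) {
        assert(pages[i]->pins > 0);
        pages[i]->pins--;             /* unpin everything */
    }
    for (size_t i = 0; i < n; i++)
        pages[i]->in_device = 0;      /* stand-in for migration */
    return 0;
}

/* Caller loop: pin, check, and walk again whenever pages were migrated. */
long toy_pin_longterm(struct toy_page **pages, size_t n, int *walks)
{
    long ret;

    *walks = 0;
    if (n == 0)
        return 0;
    do {
        toy_pin_all(pages, n);
        (*walks)++;
        ret = toy_check_and_migrate(pages, n);
    } while (ret == 0);
    return ret;
}
```

In this model the cost of the simpler scheme is one extra walk whenever a device page is encountered, which is the trade-off the patch tries to avoid by grabbing the destination page directly.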



Re: [PATCH] drm/amdgpu: Fix compile error.

2022-02-10 Thread Andrey Grodzovsky




On 2022-02-10 02:06, Christian König wrote:

Am 10.02.22 um 04:17 schrieb Andrey Grodzovsky:

Seems I forgot to add this to the relevant commit
when submitting.


Rebase/merge issue? Looks like it.


It looks more like I forgot to add the header file
change to the commit after updating with your comments.

Thanks for pushing it.

Andrey





Signed-off-by: Andrey Grodzovsky 
Reported-by: kernel test robot 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h

index 92de3b7965a1..1949dbe28a86 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
@@ -118,8 +118,7 @@ static inline bool 
amdgpu_reset_domain_schedule(struct amdgpu_reset_domain *doma

  return queue_work(domain->wq, work);
  }
-void amdgpu_device_lock_reset_domain(struct amdgpu_reset_domain 
*reset_domain,

- struct amdgpu_hive_info *hive);
+void amdgpu_device_lock_reset_domain(struct amdgpu_reset_domain 
*reset_domain);
  void amdgpu_device_unlock_reset_domain(struct amdgpu_reset_domain 
*reset_domain);




Re: [PATCH 05/23] drm/amd/display: Fix color encoding mismatch

2022-02-10 Thread Harry Wentland



On 2022-02-10 03:42, Maxime Ripard wrote:
> Hi Harry,
> 
> On Mon, Feb 07, 2022 at 01:59:38PM -0500, Harry Wentland wrote:
>> On 2022-02-07 13:57, Harry Wentland wrote:
>>> On 2022-02-07 11:34, Maxime Ripard wrote:
>>>> The amdgpu KMS driver calls drm_plane_create_color_properties() with a
>>>> default encoding set to BT709.
>>>>
>>>> However, the core will ignore it and the driver doesn't force it in its
>>>> plane state reset hook, so the initial value will be 0, which represents
>>>> BT601.
>>>>
>>>
>>> Isn't this a core issue? Should __drm_atomic_helper_plane_state_reset
>>> reset all plane_state members to their properties' default values?
>>>
>>
>> Ah, looks like that's exactly what you do in the later patches, which is
>> perfect. With that, I don't think you'll need this patch anymore.
> 
> Ok, I'll squash it into the patch that removes the reset code.
> 

I don't think that's right. I think we can just drop this patch.
The amdgpu display driver is not doing BT601 by default.

Harry

> Thanks!
> Maxime



Re: [PATCH] drm/amdgpu: add support for GC 10.1.4

2022-02-10 Thread Deucher, Alexander
[Public]

Reviewed-by: Alex Deucher 

From: Yu, Lang 
Sent: Thursday, February 10, 2022 1:20 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander ; Huang, Ray 
; Yu, Lang 
Subject: [PATCH] drm/amdgpu: add support for GC 10.1.4

Add basic support for GC 10.1.4,
which uses the same IP blocks as GC 10.1.3.

Signed-off-by: Lang Yu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c   | 3 ++-
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 9 +
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c| 4 +++-
 drivers/gpu/drm/amd/amdgpu/nv.c   | 1 +
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c| 3 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 1 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 2 ++
 8 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index eb4b7059633d..cd7e8522c130 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -674,6 +674,7 @@ static int amdgpu_discovery_set_common_ip_blocks(struct 
amdgpu_device *adev)
 case IP_VERSION(10, 1, 1):
 case IP_VERSION(10, 1, 2):
 case IP_VERSION(10, 1, 3):
+   case IP_VERSION(10, 1, 4):
 case IP_VERSION(10, 3, 0):
 case IP_VERSION(10, 3, 1):
 case IP_VERSION(10, 3, 2):
@@ -709,6 +710,7 @@ static int amdgpu_discovery_set_gmc_ip_blocks(struct 
amdgpu_device *adev)
 case IP_VERSION(10, 1, 1):
 case IP_VERSION(10, 1, 2):
 case IP_VERSION(10, 1, 3):
+   case IP_VERSION(10, 1, 4):
 case IP_VERSION(10, 3, 0):
 case IP_VERSION(10, 3, 1):
 case IP_VERSION(10, 3, 2):
@@ -910,6 +912,7 @@ static int amdgpu_discovery_set_gc_ip_blocks(struct 
amdgpu_device *adev)
 case IP_VERSION(10, 1, 2):
 case IP_VERSION(10, 1, 1):
 case IP_VERSION(10, 1, 3):
+   case IP_VERSION(10, 1, 4):
 case IP_VERSION(10, 3, 0):
 case IP_VERSION(10, 3, 2):
 case IP_VERSION(10, 3, 1):
@@ -1044,6 +1047,7 @@ static int amdgpu_discovery_set_mes_ip_blocks(struct 
amdgpu_device *adev)
 case IP_VERSION(10, 1, 1):
 case IP_VERSION(10, 1, 2):
 case IP_VERSION(10, 1, 3):
+   case IP_VERSION(10, 1, 4):
 case IP_VERSION(10, 3, 0):
 case IP_VERSION(10, 3, 1):
 case IP_VERSION(10, 3, 2):
@@ -1243,6 +1247,7 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device 
*adev)
 case IP_VERSION(10, 1, 1):
 case IP_VERSION(10, 1, 2):
 case IP_VERSION(10, 1, 3):
+   case IP_VERSION(10, 1, 4):
 case IP_VERSION(10, 3, 0):
 case IP_VERSION(10, 3, 2):
 case IP_VERSION(10, 3, 4):
@@ -1264,6 +1269,7 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device 
*adev)
 case IP_VERSION(9, 2, 2):
 case IP_VERSION(9, 3, 0):
 case IP_VERSION(10, 1, 3):
+   case IP_VERSION(10, 1, 4):
 case IP_VERSION(10, 3, 1):
 case IP_VERSION(10, 3, 3):
 adev->flags |= AMD_IS_APU;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index f2806959736a..9bc9155cbf06 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -137,7 +137,8 @@ static int psp_early_init(void *handle)
 psp->autoload_supported = true;
 break;
 case IP_VERSION(11, 0, 8):
-   if (adev->apu_flags & AMD_APU_IS_CYAN_SKILLFISH2) {
+   if (adev->apu_flags & AMD_APU_IS_CYAN_SKILLFISH2 ||
+   adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 1, 4)) {
 psp_v11_0_8_set_psp_funcs(psp);
 psp->autoload_supported = false;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 3d8c5fea572e..8fb4528c741f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -3641,6 +3641,7 @@ static void gfx_v10_0_init_golden_registers(struct 
amdgpu_device *adev)
 (const 
u32)ARRAY_SIZE(golden_settings_gc_10_3_5));
 break;
 case IP_VERSION(10, 1, 3):
+   case IP_VERSION(10, 1, 4):
 soc15_program_register_sequence(adev,
 
golden_settings_gc_10_0_cyan_skillfish,
 (const 
u32)ARRAY_SIZE(golden_settings_gc_10_0_cyan_skillfish));
@@ -3819,6 +3820,7 @@ static void gfx_v10_0_check_fw_write_wait(struct 
amdgpu_device *adev)
 case IP_VERSION(10, 1, 2):
 case IP_VERSION(10, 1, 1):
 case IP_VERSION(10, 1, 3):
+   case IP_VERSION(10, 1, 4):
 if ((adev->gfx.me_fw_version >= 0x0046)

Re: [PATCH] drm/amdgpu: no rlcg read access in SRIOV case for gfx v9

2022-02-10 Thread Deucher, Alexander
[AMD Official Use Only]

For better future proofing maybe adjust the check to look for pre-gfx10 rather 
than checking for specific IP versions?  E.g.,

adev->ip_versions[MP0_HWIP][0] < IP_VERSION(10, 0, 0)

From: Chen, Guchun 
Sent: Thursday, February 10, 2022 1:40 AM
To: amd-gfx@lists.freedesktop.org ; Zhang, 
Hawking ; Zhou, Peng Ju ; Koenig, 
Christian ; Deucher, Alexander 

Cc: Chen, Guchun 
Subject: [PATCH] drm/amdgpu: no rlcg read access in SRIOV case for gfx v9

Fall back to MMIO to read registers, as rlcg read is not
available for gfx v9 in SRIOV configuration. Otherwise,
gmc_v9_0_flush_gpu_tlb will always report a timeout and
eventually break driver load.

Fixes: 0dc4a7e75581("drm/amdgpu: switch to get_rlcg_reg_access_flag for gfx9")
Signed-off-by: Guchun Chen 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index e1288901beb6..a3274fa1c7e4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -37,6 +37,16 @@
 vf2pf_info->ucode_info[ucode].version = ver; \
 } while (0)

+static bool amdgpu_virt_is_rlcg_read_supported(struct amdgpu_device *adev)
+{
+   /* rlcg read is not support in SRIOV with gfx v9 */
+   if ((adev->ip_versions[MP0_HWIP][0] == IP_VERSION(9, 0, 0)) ||
+   (adev->ip_versions[GC_HWIP][0] == IP_VERSION(9, 4, 1)))
+   return false;
+
+   return true;
+}
+
 bool amdgpu_virt_mmio_blocked(struct amdgpu_device *adev)
 {
 /* By now all MMIO pages except mailbox are blocked */
@@ -957,7 +967,8 @@ u32 amdgpu_sriov_rreg(struct amdgpu_device *adev,
 u32 rlcg_flag;

 if (!amdgpu_sriov_runtime(adev) &&
-   amdgpu_virt_get_rlcg_reg_access_flag(adev, acc_flags, hwip, false, 
&rlcg_flag))
+   amdgpu_virt_is_rlcg_read_supported(adev) &&
+   amdgpu_virt_get_rlcg_reg_access_flag(adev, acc_flags, hwip, 
false, &rlcg_flag))
 return amdgpu_virt_rlcg_reg_rw(adev, offset, 0, rlcg_flag);

 if (acc_flags & AMDGPU_REGS_NO_KIQ)
--
2.17.1
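
The review comment at the top of this message suggests comparing against a
version threshold rather than listing individual IP versions. A minimal
sketch of that idea follows. It assumes IP_VERSION packs major, minor and
revision into one integer (major in the upper bits), so plain integer
comparison orders versions; the macro body and the helper name
rlcg_read_supported() are illustrative assumptions, not the driver's code.
Which hardware IP's version to pass in is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed packing: revision in bits 0-7, minor in bits 8-15, major above. */
#define IP_VERSION(mj, mn, rv) \
	(((uint32_t)(mj) << 16) | ((uint32_t)(mn) << 8) | (uint32_t)(rv))

/*
 * Hypothetical helper: treat RLCG register reads as unavailable on any
 * version older than 10.0.0, instead of enumerating 9.x versions.
 */
bool rlcg_read_supported(uint32_t ip_version)
{
	return ip_version >= IP_VERSION(10, 0, 0);
}
```

With this shape, a new 9.x variant is covered without touching the check,
which is the future-proofing the comment asks for.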



[PATCH] drm/amdgpu: Add unique_id support for sienna cichlid

2022-02-10 Thread Kent Russell
This is being added to SMU Metrics, so add the required tie-ins in the
kernel

Signed-off-by: Kent Russell 
---
 .../pmfw_if/smu11_driver_if_sienna_cichlid.h  | 12 +--
 .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 33 +++
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git 
a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h
index b253be602cc2..c09dec2c4e1e 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h
@@ -1419,8 +1419,12 @@ typedef struct {
   uint8_t  PcieRate   ;
   uint8_t  PcieWidth  ;
   uint16_t AverageGfxclkFrequencyTarget;
-  uint16_t Padding16_2;
 
+  //PMFW-8711
+  uint32_t PublicSerialNumLower32;
+  uint32_t PublicSerialNumUpper32;
+
+  uint16_t Padding16_2;
 } SmuMetrics_t;
 
 typedef struct {
@@ -1476,8 +1480,12 @@ typedef struct {
   uint8_t  PcieRate   ;
   uint8_t  PcieWidth  ;
   uint16_t AverageGfxclkFrequencyTarget;
-  uint16_t Padding16_2;
 
+  //PMFW-8711
+  uint32_t PublicSerialNumLower32;
+  uint32_t PublicSerialNumUpper32;
+
+  uint16_t Padding16_2;
 } SmuMetrics_V2_t;
 
 typedef struct {
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
index 2a7da2bad96a..048014f05b35 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
@@ -451,6 +451,38 @@ static int sienna_cichlid_setup_pptable(struct smu_context 
*smu)
return ret;
 }
 
+static void sienna_cichlid_get_unique_id(struct smu_context *smu)
+{
+   struct amdgpu_device *adev = smu->adev;
+   struct smu_table_context *smu_table = &smu->smu_table;
+   SmuMetrics_t *metrics =
+   &(((SmuMetricsExternal_t 
*)(smu_table->metrics_table))->SmuMetrics);
+   SmuMetrics_V2_t *metrics_v2 =
+   &(((SmuMetricsExternal_t 
*)(smu_table->metrics_table))->SmuMetrics_V2);
+   uint32_t upper32 = 0, lower32 = 0;
+   int ret;
+
+   mutex_lock(&smu->metrics_lock);
+   ret = smu_cmn_get_metrics_table_locked(smu, NULL, false);
+   if (ret)
+   goto out_unlock;
+
+   bool use_metrics_v2 = ((smu->adev->ip_versions[MP1_HWIP][0] == 
IP_VERSION(11, 0, 7)) &&
+   (smu->smc_fw_version >= 0x3A4300)) ? true : false;
+
+   upper32 = use_metrics_v2 ? metrics_v2->PublicSerialNumUpper32 :
+  metrics->PublicSerialNumUpper32;
+   lower32 = use_metrics_v2 ? metrics_v2->PublicSerialNumLower32 :
+  metrics->PublicSerialNumLower32;
+
+out_unlock:
+   mutex_unlock(&smu->metrics_lock);
+
+   adev->unique_id = ((uint64_t)upper32 << 32) | lower32;
+   if (adev->serial[0] == '\0')
+   sprintf(adev->serial, "%016llx", adev->unique_id);
+}
+
 static int sienna_cichlid_tables_init(struct smu_context *smu)
 {
struct smu_table_context *smu_table = &smu->smu_table;
@@ -4012,6 +4044,7 @@ static const struct pptable_funcs 
sienna_cichlid_ppt_funcs = {
.set_mp1_state = sienna_cichlid_set_mp1_state,
.stb_collect_info = sienna_cichlid_stb_get_data_direct,
.get_ecc_info = sienna_cichlid_get_ecc_info,
+   .get_unique_id = sienna_cichlid_get_unique_id,
 };
 
 void sienna_cichlid_set_ppt_funcs(struct smu_context *smu)
-- 
2.25.1
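
The patch above builds a 64-bit unique ID from two 32-bit metrics fields and
prints it as 16 hex digits. A minimal standalone sketch of just that
arithmetic is shown here; make_unique_id() and format_unique_id() are
hypothetical names, and snprintf() is used in place of the patch's sprintf()
purely as a choice for this sketch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Join the two halves; the widening cast must come before the shift. */
uint64_t make_unique_id(uint32_t upper32, uint32_t lower32)
{
	return ((uint64_t)upper32 << 32) | lower32;
}

/*
 * Write the ID as 16 zero-padded lowercase hex digits.
 * buf needs room for 17 bytes (16 digits plus the terminator).
 * Returns the number of characters snprintf() wanted to write.
 */
int format_unique_id(char *buf, size_t len, uint64_t id)
{
	return snprintf(buf, len, "%016" PRIx64, id);
}
```

Without the cast to uint64_t, shifting a 32-bit value by 32 would be
undefined behaviour in C, so the order of cast and shift matters.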



Re: [PATCH 13/27] mm: move the migrate_vma_* device migration code into it's own file

2022-02-10 Thread Christoph Hellwig
On Thu, Feb 10, 2022 at 09:35:10PM +1100, Alistair Popple wrote:
> I got the following build error:
> 
> /data/source/linux/mm/migrate_device.c: In function ‘migrate_vma_collect_pmd’:
> /data/source/linux/mm/migrate_device.c:242:3: error: implicit declaration of 
> function ‘flush_tlb_range’; did you mean ‘flush_pmd_tlb_range’? 
> [-Werror=implicit-function-declaration]
>   242 |   flush_tlb_range(walk->vma, start, end);
>   |   ^~~
>   |   flush_pmd_tlb_range
> 
> Including asm/tlbflush.h in migrate_device.c fixed it for me.

Yes, the buildbot also complained about this, but somehow in my test
configs it got pulled in implicitly.



Re: [PATCH v2] drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.

2022-02-10 Thread Alex Deucher
On Thu, Feb 10, 2022 at 9:04 AM Limonciello, Mario
 wrote:
>
> [Public]
>
>
>
> > -Original Message-
> > From: Mahapatra, Rajib 
> > Sent: Thursday, February 10, 2022 07:35
> > To: Liang, Prike ; Limonciello, Mario
> > ; Deucher, Alexander
> > 
> > Cc: amd-gfx@lists.freedesktop.org; S, Shirish ;
> > Mahapatra, Rajib 
> > Subject: [PATCH v2] drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.
> >
> > [Why]
> > SDMA ring buffer test failed if suspend is aborted during
> > S0i3 resume.
> >
> > [How]
> > If suspend is aborted for some reason during S0i3 resume
> > cycle, it follows SDMA ring test failing and errors in amdgpu
> > resume. For RN/CZN/Picasso, SMU saves and restores SDMA
> > registers during S0ix cycle. So, skipping SDMA suspend and
> > resume from driver solves the issue. This time, the system
> > is able to resume gracefully even the suspend is aborted.
> >
> > v2: add changes on sdma_v4, skipping SDMA hw_init and hw_fini.
>
> This line in the commit message should be "below" the ---
>
> Besides that the code is better.
>
> Reviewed-by: Mario Limonciello 

Reviewed-by: Alex Deucher 

I presume sdma_v5.2.c needs a similar fix?

Alex


>
> > Signed-off-by: Rajib Mahapatra 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8 
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> > b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> > index 06a7ceda4c87..02115d63b071 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> > @@ -2058,6 +2058,10 @@ static int sdma_v4_0_suspend(void *handle)
> >  {
> >   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> >
> > + /* SMU saves SDMA state for us */
> > + if (adev->in_s0ix)
> > + return 0;
> > +
> >   return sdma_v4_0_hw_fini(adev);
> >  }
> >
> > @@ -2065,6 +2069,10 @@ static int sdma_v4_0_resume(void *handle)
> >  {
> >   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> >
> > + /* SMU restores SDMA state for us */
> > + if (adev->in_s0ix)
> > + return 0;
> > +
> >   return sdma_v4_0_hw_init(adev);
> >  }
> >
> > --
> > 2.25.1
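
The control flow of the quoted patch can be sketched outside the driver as
follows. Everything here (struct mock_device, the mock_* functions, the call
counters) is a hypothetical stand-in; the only thing taken from the patch is
the early return when the in_s0ix flag is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device state the callbacks consult. */
struct mock_device {
	bool in_s0ix;      /* true while an S0ix suspend/resume cycle runs */
	int hw_fini_calls; /* how often the engine was torn down */
	int hw_init_calls; /* how often the engine was reprogrammed */
};

int mock_hw_fini(struct mock_device *dev)
{
	dev->hw_fini_calls++;
	return 0;
}

int mock_hw_init(struct mock_device *dev)
{
	dev->hw_init_calls++;
	return 0;
}

/* Skip the teardown when firmware is assumed to save the state itself. */
int mock_suspend(struct mock_device *dev)
{
	if (dev->in_s0ix)
		return 0;
	return mock_hw_fini(dev);
}

/* Skip the re-init when firmware is assumed to restore the state itself. */
int mock_resume(struct mock_device *dev)
{
	if (dev->in_s0ix)
		return 0;
	return mock_hw_init(dev);
}
```

Keeping the two checks symmetric means an aborted suspend never leaves the
engine torn down without a matching re-init.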


Re: [Patch V2] drm/amdgpu: Handle the GPU recovery failure in SRIOV environment.

2022-02-10 Thread Andrey Grodzovsky

Reviewed-by: Andrey Grodzovsky 

Andrey

On 2022-02-03 21:45, Surbhi Kakarya wrote:

This patch handles GPU recovery failure in an SRIOV environment by
retrying the reset if the first reset fails. To determine whether to
retry, a new macro AMDGPU_RETRY_SRIOV_RESET is added, which returns
true if the failure is due to ETIMEDOUT, EINVAL or EBUSY and false
otherwise. A new macro AMDGPU_MAX_RETRY_LIMIT limits the retries to 2.

It also handles the return status in Post Asic Reset by updating the return
code with asic_reset_res and eventually return the return code in
amdgpu_job_timedout().

Signed-off-by: Surbhi Kakarya 
---
Changes in V2:
  * Added the macro AMDGPU_RETRY_SRIOV_RESET to determine the retry condition.
  * Moved the reset retry in amdgpu_device_reset_sriov() to avoid duplication.
  * Added the AMDGPU_ prefix in new defines.
  * Verified the coding style with checkpatch.pl
  * Added the retry limit as 2

  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|  6 +-
  2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 53af2623c58f..59310ca398f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -89,6 +89,8 @@ MODULE_FIRMWARE("amdgpu/vangogh_gpu_info.bin");
  MODULE_FIRMWARE("amdgpu/yellow_carp_gpu_info.bin");
  
  #define AMDGPU_RESUME_MS		2000

+#define AMDGPU_MAX_RETRY_LIMIT 2
+#define AMDGPU_RETRY_SRIOV_RESET(r) ((r) == -EBUSY || (r) == -ETIMEDOUT || (r) 
== -EINVAL)
  
  const char *amdgpu_asic_name[] = {

"TAHITI",
@@ -4456,7 +4458,9 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device 
*adev,
  {
int r;
struct amdgpu_hive_info *hive = NULL;
+   int retry_limit = 0;
  
+retry:

amdgpu_amdkfd_pre_reset(adev);
  
  	if (from_hypervisor)

@@ -4503,6 +4507,14 @@ static int amdgpu_device_reset_sriov(struct 
amdgpu_device *adev,
}
amdgpu_virt_release_full_gpu(adev, true);
  
+	if (AMDGPU_RETRY_SRIOV_RESET(r)) {

+   if (retry_limit < AMDGPU_MAX_RETRY_LIMIT) {
+   retry_limit++;
+   goto retry;
+   } else
+   DRM_ERROR("GPU reset retry is beyond the retry 
limit\n");
+   }
+
return r;
  }
  
@@ -5341,6 +5353,9 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,

drm_helper_resume_force_mode(adev_to_drm(tmp_adev));
}
  
+		if (tmp_adev->asic_reset_res)

+   r = tmp_adev->asic_reset_res;
+
tmp_adev->asic_reset_res = 0;
  
  		if (r) {

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index e0730ea56a8c..4b9d62f375ac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -37,6 +37,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct 
drm_sched_job *s_job)
struct amdgpu_task_info ti;
struct amdgpu_device *adev = ring->adev;
int idx;
+   int r;
  
  	if (!drm_dev_enter(adev_to_drm(adev), &idx)) {

DRM_INFO("%s - device unplugged skipping recovery on 
scheduler:%s",
@@ -63,7 +64,10 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct 
drm_sched_job *s_job)
  ti.process_name, ti.tgid, ti.task_name, ti.pid);
  
  	if (amdgpu_device_should_recover_gpu(ring->adev)) {

-   amdgpu_device_gpu_recover(ring->adev, job);
+   r = amdgpu_device_gpu_recover(ring->adev, job);
+   if (r)
+   DRM_ERROR("GPU Recovery Failed: %d\n", r);
+
} else {
drm_sched_suspend_timeout(&ring->sched);
if (amdgpu_sriov_vf(adev))
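
The retry counting in the patch above (one initial attempt, then at most two
further attempts on a retryable error) can be sketched as a plain loop. This
is an illustration under assumptions, not the driver's code: the names
MAX_RETRY_LIMIT, RETRYABLE, reset_fn, reset_with_retry() and flaky_reset()
are hypothetical, and a loop is used here where the patch uses a goto label.

```c
#include <assert.h>
#include <errno.h>

#define MAX_RETRY_LIMIT 2
#define RETRYABLE(r) ((r) == -EBUSY || (r) == -ETIMEDOUT || (r) == -EINVAL)

/* Hypothetical reset callback: returns 0 or a negative errno value. */
typedef int (*reset_fn)(void *ctx);

/*
 * Run reset() once, then repeat it up to MAX_RETRY_LIMIT more times while
 * it keeps failing with a retryable error. Returns the last result.
 */
int reset_with_retry(reset_fn reset, void *ctx)
{
	int retries = 0;
	int r;

	for (;;) {
		r = reset(ctx);
		if (!RETRYABLE(r) || retries >= MAX_RETRY_LIMIT)
			return r;
		retries++;
	}
}

/* Example callback: fails with -ETIMEDOUT until the counter hits zero. */
int flaky_reset(void *ctx)
{
	int *failures_left = ctx;

	if (*failures_left > 0) {
		(*failures_left)--;
		return -ETIMEDOUT;
	}
	return 0;
}
```

A reset that fails twice and then succeeds is absorbed by the retries; one
that keeps failing is given three attempts in total before the error is
handed back to the caller.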


RE: [PATCH v2] drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.

2022-02-10 Thread Limonciello, Mario
[Public]



> -Original Message-
> From: Alex Deucher 
> Sent: Thursday, February 10, 2022 09:28
> To: Limonciello, Mario 
> Cc: Mahapatra, Rajib ; Liang, Prike
> ; Deucher, Alexander ;
> amd-gfx@lists.freedesktop.org; S, Shirish 
> Subject: Re: [PATCH v2] drm/amdgpu: skipping SDMA hw_init and hw_fini for
> S0ix.
> 
> On Thu, Feb 10, 2022 at 9:04 AM Limonciello, Mario
>  wrote:
> >
> > [Public]
> >
> >
> >
> > > -Original Message-
> > > From: Mahapatra, Rajib 
> > > Sent: Thursday, February 10, 2022 07:35
> > > To: Liang, Prike ; Limonciello, Mario
> > > ; Deucher, Alexander
> > > 
> > > Cc: amd-gfx@lists.freedesktop.org; S, Shirish ;
> > > Mahapatra, Rajib 
> > > Subject: [PATCH v2] drm/amdgpu: skipping SDMA hw_init and hw_fini for
> S0ix.
> > >
> > > [Why]
> > > SDMA ring buffer test failed if suspend is aborted during
> > > S0i3 resume.
> > >
> > > [How]
> > > If suspend is aborted for some reason during S0i3 resume
> > > cycle, it follows SDMA ring test failing and errors in amdgpu
> > > resume. For RN/CZN/Picasso, SMU saves and restores SDMA
> > > registers during S0ix cycle. So, skipping SDMA suspend and
> > > resume from driver solves the issue. This time, the system
> > > is able to resume gracefully even the suspend is aborted.
> > >
> > > v2: add changes on sdma_v4, skipping SDMA hw_init and hw_fini.
> >
> > This line in the commit message should be "below" the ---
> >
> > Besides that the code is better.
> >
> > Reviewed-by: Mario Limonciello 
> 
> Reviewed-by: Alex Deucher 
> 
> I presume sdma_v5.2.c needs a similar fix?

VG doesn't do s0i3 right?
No, YC should not take a similar fix. YC had an architectural change and to 
avoid a "similar" problem takes 26db706a6d77b9e184feb11725e97e53b7a89519.

> 
> Alex
> 
> 
> >
> > > Signed-off-by: Rajib Mahapatra 
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8 
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> > > b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> > > index 06a7ceda4c87..02115d63b071 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> > > @@ -2058,6 +2058,10 @@ static int sdma_v4_0_suspend(void *handle)
> > >  {
> > >   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > >
> > > + /* SMU saves SDMA state for us */
> > > + if (adev->in_s0ix)
> > > + return 0;
> > > +
> > >   return sdma_v4_0_hw_fini(adev);
> > >  }
> > >
> > > @@ -2065,6 +2069,10 @@ static int sdma_v4_0_resume(void *handle)
> > >  {
> > >   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > >
> > > + /* SMU restores SDMA state for us */
> > > + if (adev->in_s0ix)
> > > + return 0;
> > > +
> > >   return sdma_v4_0_hw_init(adev);
> > >  }
> > >
> > > --
> > > 2.25.1


[PATCH] drm/amdgpu: Add unique_id support for sienna cichlid

2022-02-10 Thread Kent Russell
This is being added to SMU Metrics, so add the required tie-ins in the
kernel. Also create the corresponding unique_id sysfs file.

Signed-off-by: Kent Russell 
---
 drivers/gpu/drm/amd/pm/amdgpu_pm.c|  3 +-
 .../pmfw_if/smu11_driver_if_sienna_cichlid.h  | 12 +--
 .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 33 +++
 3 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index ad5da252228b..f638bcfc3faa 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -1969,7 +1969,8 @@ static int default_attr_update(struct amdgpu_device 
*adev, struct amdgpu_device_
if (asic_type != CHIP_VEGA10 &&
asic_type != CHIP_VEGA20 &&
asic_type != CHIP_ARCTURUS &&
-   asic_type != CHIP_ALDEBARAN)
+   asic_type != CHIP_ALDEBARAN &&
+   asic_type != CHIP_SIENNA_CICHLID)
*states = ATTR_STATE_UNSUPPORTED;
} else if (DEVICE_ATTR_IS(pp_features)) {
if (adev->flags & AMD_IS_APU || asic_type < CHIP_VEGA10)
diff --git 
a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h
index b253be602cc2..c09dec2c4e1e 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h
@@ -1419,8 +1419,12 @@ typedef struct {
   uint8_t  PcieRate   ;
   uint8_t  PcieWidth  ;
   uint16_t AverageGfxclkFrequencyTarget;
-  uint16_t Padding16_2;
 
+  //PMFW-8711
+  uint32_t PublicSerialNumLower32;
+  uint32_t PublicSerialNumUpper32;
+
+  uint16_t Padding16_2;
 } SmuMetrics_t;
 
 typedef struct {
@@ -1476,8 +1480,12 @@ typedef struct {
   uint8_t  PcieRate   ;
   uint8_t  PcieWidth  ;
   uint16_t AverageGfxclkFrequencyTarget;
-  uint16_t Padding16_2;
 
+  //PMFW-8711
+  uint32_t PublicSerialNumLower32;
+  uint32_t PublicSerialNumUpper32;
+
+  uint16_t Padding16_2;
 } SmuMetrics_V2_t;
 
 typedef struct {
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
index 2a7da2bad96a..048014f05b35 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
@@ -451,6 +451,38 @@ static int sienna_cichlid_setup_pptable(struct smu_context 
*smu)
return ret;
 }
 
+static void sienna_cichlid_get_unique_id(struct smu_context *smu)
+{
+   struct amdgpu_device *adev = smu->adev;
+   struct smu_table_context *smu_table = &smu->smu_table;
+   SmuMetrics_t *metrics =
+   &(((SmuMetricsExternal_t 
*)(smu_table->metrics_table))->SmuMetrics);
+   SmuMetrics_V2_t *metrics_v2 =
+   &(((SmuMetricsExternal_t 
*)(smu_table->metrics_table))->SmuMetrics_V2);
+   uint32_t upper32 = 0, lower32 = 0;
+   int ret;
+
+   mutex_lock(&smu->metrics_lock);
+   ret = smu_cmn_get_metrics_table_locked(smu, NULL, false);
+   if (ret)
+   goto out_unlock;
+
+   bool use_metrics_v2 = ((smu->adev->ip_versions[MP1_HWIP][0] == 
IP_VERSION(11, 0, 7)) &&
+   (smu->smc_fw_version >= 0x3A4300)) ? true : false;
+
+   upper32 = use_metrics_v2 ? metrics_v2->PublicSerialNumUpper32 :
+  metrics->PublicSerialNumUpper32;
+   lower32 = use_metrics_v2 ? metrics_v2->PublicSerialNumLower32 :
+  metrics->PublicSerialNumLower32;
+
+out_unlock:
+   mutex_unlock(&smu->metrics_lock);
+
+   adev->unique_id = ((uint64_t)upper32 << 32) | lower32;
+   if (adev->serial[0] == '\0')
+   sprintf(adev->serial, "%016llx", adev->unique_id);
+}
+
 static int sienna_cichlid_tables_init(struct smu_context *smu)
 {
struct smu_table_context *smu_table = &smu->smu_table;
@@ -4012,6 +4044,7 @@ static const struct pptable_funcs 
sienna_cichlid_ppt_funcs = {
.set_mp1_state = sienna_cichlid_set_mp1_state,
.stb_collect_info = sienna_cichlid_stb_get_data_direct,
.get_ecc_info = sienna_cichlid_get_ecc_info,
+   .get_unique_id = sienna_cichlid_get_unique_id,
 };
 
 void sienna_cichlid_set_ppt_funcs(struct smu_context *smu)
-- 
2.25.1



Re: [PATCH v2] drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.

2022-02-10 Thread Alex Deucher
On Thu, Feb 10, 2022 at 10:42 AM Limonciello, Mario
 wrote:
>
> [Public]
>
>
>
> > -Original Message-
> > From: Alex Deucher 
> > Sent: Thursday, February 10, 2022 09:28
> > To: Limonciello, Mario 
> > Cc: Mahapatra, Rajib ; Liang, Prike
> > ; Deucher, Alexander ;
> > amd-gfx@lists.freedesktop.org; S, Shirish 
> > Subject: Re: [PATCH v2] drm/amdgpu: skipping SDMA hw_init and hw_fini for
> > S0ix.
> >
> > On Thu, Feb 10, 2022 at 9:04 AM Limonciello, Mario
> >  wrote:
> > >
> > > [Public]
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Mahapatra, Rajib 
> > > > Sent: Thursday, February 10, 2022 07:35
> > > > To: Liang, Prike ; Limonciello, Mario
> > > > ; Deucher, Alexander
> > > > 
> > > > Cc: amd-gfx@lists.freedesktop.org; S, Shirish ;
> > > > Mahapatra, Rajib 
> > > > Subject: [PATCH v2] drm/amdgpu: skipping SDMA hw_init and hw_fini for
> > S0ix.
> > > >
> > > > [Why]
> > > > SDMA ring buffer test failed if suspend is aborted during
> > > > S0i3 resume.
> > > >
> > > > [How]
> > > > If suspend is aborted for some reason during S0i3 resume
> > > > cycle, it follows SDMA ring test failing and errors in amdgpu
> > > > resume. For RN/CZN/Picasso, SMU saves and restores SDMA
> > > > registers during S0ix cycle. So, skipping SDMA suspend and
> > > > resume from driver solves the issue. This time, the system
> > > > is able to resume gracefully even the suspend is aborted.
> > > >
> > > > v2: add changes on sdma_v4, skipping SDMA hw_init and hw_fini.
> > >
> > > This line in the commit message should be "below" the ---
> > >
> > > Besides that the code is better.
> > >
> > > Reviewed-by: Mario Limonciello 
> >
> > Reviewed-by: Alex Deucher 
> >
> > I presume sdma_v5.2.c needs a similar fix?
>
> VG doesn't do s0i3 right?

Right.

> No, YC should not take a similar fix. YC had an architectural change and to
> avoid a "similar" problem takes 26db706a6d77b9e184feb11725e97e53b7a89519.

Isn't that likely just a workaround for the same issue?  This seems cleaner.

Alex

>
> >
> > Alex
> >
> >
> > >
> > > > Signed-off-by: Rajib Mahapatra 
> > > > ---
> > > >  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8 
> > > >  1 file changed, 8 insertions(+)
> > > >
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> > > > b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> > > > index 06a7ceda4c87..02115d63b071 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> > > > @@ -2058,6 +2058,10 @@ static int sdma_v4_0_suspend(void *handle)
> > > >  {
> > > >   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > > >
> > > > + /* SMU saves SDMA state for us */
> > > > + if (adev->in_s0ix)
> > > > + return 0;
> > > > +
> > > >   return sdma_v4_0_hw_fini(adev);
> > > >  }
> > > >
> > > > @@ -2065,6 +2069,10 @@ static int sdma_v4_0_resume(void *handle)
> > > >  {
> > > >   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > > >
> > > > + /* SMU restores SDMA state for us */
> > > > + if (adev->in_s0ix)
> > > > + return 0;
> > > > +
> > > >   return sdma_v4_0_hw_init(adev);
> > > >  }
> > > >
> > > > --
> > > > 2.25.1


Re: [PATCH 2/2] drm/amdgpu: add reset register trace function on GPU reset

2022-02-10 Thread Alex Deucher
On Thu, Feb 10, 2022 at 9:11 AM Sharma, Shashank
 wrote:
>
>
>
> On 2/10/2022 3:05 PM, Christian König wrote:
> > Am 10.02.22 um 14:18 schrieb Sharma, Shashank:
> >>
> >>
> >> On 2/10/2022 8:38 AM, Christian König wrote:
> >>> Am 10.02.22 um 08:34 schrieb Somalapuram, Amaranath:
> 
>  On 2/10/2022 12:39 PM, Christian König wrote:
> > Am 10.02.22 um 06:29 schrieb Somalapuram, Amaranath:
> >>
> >> On 2/9/2022 1:17 PM, Christian König wrote:
> >>> Am 08.02.22 um 16:28 schrieb Alex Deucher:
>  On Tue, Feb 8, 2022 at 3:17 AM Somalapuram Amaranath
>   wrote:
> > Dump the list of register values to trace event on GPU reset.
> >
> > Signed-off-by: Somalapuram Amaranath
> > 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21
> > -
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h  | 19
> > +++
> >   2 files changed, 39 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 1e651b959141..057922fb7e37 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -4534,6 +4534,23 @@ int amdgpu_device_pre_asic_reset(struct
> > amdgpu_device *adev,
> >  return r;
> >   }
> >
> > +static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev)
> > +{
> > +   int i;
> > +   uint32_t reg_value[128];
> > +
> > +   for (i = 0; adev->reset_dump_reg_list[i] != 0; i++) {
> > +   if (adev->asic_type >= CHIP_NAVI10)
>  This check should be against CHIP_VEGA10.  Also, this only
>  allows for
>  GC registers.  If we wanted to dump other registers, we'd need a
>  different macro.  Might be better to just use RREG32 here for
>  everything and then encode the full offset using
>  SOC15_REG_ENTRY_OFFSET() or a similar macro.  Also, we need to
>  think
>  about how to handle gfxoff in this case.  gfxoff needs to be
>  disabled
>  or we'll hang the chip if we try and read GC or SDMA registers via
>  MMIO which will adversely affect the hang signature.
> >>>
> >>> Well this should execute right before a GPU reset, so I think it
> >>> shouldn't matter if we hang the chip or not as long as the read
> >>> comes back correctly (I remember a very long UVD debug session
> >>> because of this).
> >>>
> >>> But in general I agree, we should just use RREG32() here and
> >>> always encode the full register offset.
> >>>
> >>> Regards,
> >>> Christian.
> >>>
> >> Can I use something like this:
> >>
> >> +   reg_value[i] =
> >> RREG32((adev->reg_offset[adev->reset_dump_reg_list[i][0]]
> >> + [adev->reset_dump_reg_list[i][1]]
> >> + [adev->reset_dump_reg_list[i][2]])
> >> + + adev->reset_dump_reg_list[i][3]);
> >>
> >> ip --> adev->reset_dump_reg_list[i][0]
> >>
> >> inst --> adev->reset_dump_reg_list[i][1]
> >>
> >> BASE_IDX--> adev->reset_dump_reg_list[i][2]
> >>
> >> reg --> adev->reset_dump_reg_list[i][3]
> >
> > No, that won't work.
> >
> > What you need to do is to use the full 32bit address of the
> > register. Userspace can worry about figuring out which ip,
> > instance, base and reg to resolve into that address.
> >
> > Regards,
> > Christian.
> >
>  Thanks Christian.
> 
>  should I consider using gfxoff like below code or not required:
>  amdgpu_gfx_off_ctrl(adev, false);
>  amdgpu_gfx_off_ctrl(adev, true);
> >>>
> >>> That's a really good question I can't fully answer.
> >>>
> >>> I think we don't want that because the GPU is stuck when the dump is
> >>> made, but better let Alex comment as well.
> >>>
> >>> Regards,
> >>> Christian.
> >>
> >>
> >> I had a quick look at the function amdgpu_gfx_off_ctrl, and it locks
> >> this mutex internally:
> >> mutex_lock(&adev->gfx.gfx_off_mutex);
> >>
> >> and the reference state is tracked in:
> >> adev->gfx.gfx_off_state
> >>
> >> We can do something like this maybe:
> >> - If (adev->gfx_off_state == 0) {
> >>   trylock(gfx_off_mutex)
> >>   read_regs_now;
> >>   unlock_mutex();
> >> }
> >>
> >> How does it sound?
> >
> > As far as I know that won't work. GFX_off is only disabled intentionally
> > in very few places.
> >
> > So we will probably never get a register trace with that.
> >
>
> Ok, I don't know much about this feature, but due to the name I was
> under the impression that gfx_off will be mostly disabled. But if it is
> hardly ever disabled, we need more information about it first, like
> when is t

RE: [PATCH v2] drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.

2022-02-10 Thread Limonciello, Mario
[Public]

> > VG doesn't do s0i3 right?
> 
> Right.
> 
> > No, YC should not take a similar fix. YC had an architectural change and,
> > to avoid a "similar" problem, takes
> > 26db706a6d77b9e184feb11725e97e53b7a89519.
> 
> Isn't that likely just a workaround for the same issue?  This seems cleaner.
> 

The SMU doesn't handle the restore of the SDMA registers for YC though, this
explicitly changed.  So I don't believe we can do an identical fix there.

@Liang, Prike comments?


Re: [PATCH v11 1/5] drm: improve drm_buddy_alloc function

2022-02-10 Thread Matthew Auld

On 27/01/2022 14:11, Arunpravin wrote:

- Make drm_buddy_alloc a single function to handle
   range allocation and non-range allocation demands

- Implemented a new function alloc_range() which allocates
   the requested power-of-two block comply with range limitations

- Moved order computation and memory alignment logic from
   i915 driver to drm buddy

v2:
   merged below changes to keep the build unbroken
- drm_buddy_alloc_range() becomes obsolete and may be removed
- enable ttm range allocation (fpfn / lpfn) support in i915 driver
- apply enhanced drm_buddy_alloc() function to i915 driver

v3(Matthew Auld):
   - Fix alignment issues and remove unnecessary list_empty check
   - add more validation checks for input arguments
   - make alloc_range() block allocations as bottom-up
   - optimize order computation logic
   - replace uint64_t with u64, which is preferred in the kernel

v4(Matthew Auld):
   - keep drm_buddy_alloc_range() function implementation for generic
 actual range allocations
   - keep alloc_range() implementation for end bias allocations

v5(Matthew Auld):
   - modify drm_buddy_alloc() passing argument place->lpfn to lpfn
 as place->lpfn will currently always be zero for i915

v6(Matthew Auld):
   - fixup potential uaf - If we are unlucky and can't allocate
 enough memory when splitting blocks, where we temporarily
 end up with the given block and its buddy on the respective
 free list, then we need to ensure we delete both blocks,
 and not just the buddy, before potentially freeing them

   - fix warnings reported by kernel test robot 

v7(Matthew Auld):
   - revert fixup potential uaf
   - keep __alloc_range() add node to the list logic same as
 drm_buddy_alloc_blocks() by having a temporary list variable
   - at drm_buddy_alloc_blocks() keep i915 range_overflows macro
 and add a new check for end variable

v8:
   - fix warnings reported by kernel test robot 

Signed-off-by: Arunpravin 
---
  drivers/gpu/drm/drm_buddy.c   | 315 +-
  drivers/gpu/drm/i915/i915_ttm_buddy_manager.c |  67 ++--
  drivers/gpu/drm/i915/i915_ttm_buddy_manager.h |   2 +
  include/drm/drm_buddy.h   |  13 +-
  4 files changed, 280 insertions(+), 117 deletions(-)
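
A note on the range helpers used by alloc_range_bias() in the diff below:
both helpers compare inclusive ranges, which is why the patch does
"end = end - 1" and computes block_end as "block_start + size - 1". The
following is a minimal user-space sketch of that arithmetic, not the kernel
code itself. The names range_overlaps, range_contains, block_size and
block_end are made up for illustration, and uint64_t stands in for the
kernel's u64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Both ranges are inclusive: [s1, e1] and [s2, e2]. */
static inline bool range_overlaps(uint64_t s1, uint64_t e1,
				  uint64_t s2, uint64_t e2)
{
	return s1 <= e2 && e1 >= s2;
}

/* True when [s2, e2] lies entirely inside [s1, e1]. */
static inline bool range_contains(uint64_t s1, uint64_t e1,
				  uint64_t s2, uint64_t e2)
{
	return s1 <= s2 && e1 >= e2;
}

/* Size in bytes of a block of the given order: 2^order * chunk_size. */
static inline uint64_t block_size(uint64_t chunk_size, unsigned int order)
{
	assert(order < 64);
	return chunk_size << order;
}

/* Inclusive end offset of a block: block_start + size - 1. */
static inline uint64_t block_end(uint64_t block_start, uint64_t chunk_size,
				 unsigned int order)
{
	return block_start + block_size(chunk_size, order) - 1;
}
```

With a 4 KiB chunk size, an order-0 block at offset 0 ends at 4095, so it
does not overlap a range starting at 4096. With exclusive ends the same
comparison would report a false overlap at the boundary.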

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index d60878bc9c20..cfc160a1ef1a 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -282,23 +282,97 @@ void drm_buddy_free_list(struct drm_buddy *mm, struct 
list_head *objects)
  }
  EXPORT_SYMBOL(drm_buddy_free_list);
  
-/**

- * drm_buddy_alloc_blocks - allocate power-of-two blocks
- *
- * @mm: DRM buddy manager to allocate from
- * @order: size of the allocation
- *
- * The order value here translates to:
- *
- * 0 = 2^0 * mm->chunk_size
- * 1 = 2^1 * mm->chunk_size
- * 2 = 2^2 * mm->chunk_size
- *
- * Returns:
- * allocated ptr to the &drm_buddy_block on success
- */
-struct drm_buddy_block *
-drm_buddy_alloc_blocks(struct drm_buddy *mm, unsigned int order)
+static inline bool overlaps(u64 s1, u64 e1, u64 s2, u64 e2)
+{
+   return s1 <= e2 && e1 >= s2;
+}
+
+static inline bool contains(u64 s1, u64 e1, u64 s2, u64 e2)
+{
+   return s1 <= s2 && e1 >= e2;
+}
+
+static struct drm_buddy_block *
+alloc_range_bias(struct drm_buddy *mm,
+u64 start, u64 end,
+unsigned int order)
+{
+   struct drm_buddy_block *block;
+   struct drm_buddy_block *buddy;
+   LIST_HEAD(dfs);
+   int err;
+   int i;
+
+   end = end - 1;
+
+   for (i = 0; i < mm->n_roots; ++i)
+   list_add_tail(&mm->roots[i]->tmp_link, &dfs);
+
+   do {
+   u64 block_start;
+   u64 block_end;
+
+   block = list_first_entry_or_null(&dfs,
+struct drm_buddy_block,
+tmp_link);
+   if (!block)
+   break;
+
+   list_del(&block->tmp_link);
+
+   if (drm_buddy_block_order(block) < order)
+   continue;
+
+   block_start = drm_buddy_block_offset(block);
+   block_end = block_start + drm_buddy_block_size(mm, block) - 1;
+
+   if (!overlaps(start, end, block_start, block_end))
+   continue;
+
+   if (drm_buddy_block_is_allocated(block))
+   continue;
+
+   if (contains(start, end, block_start, block_end) &&
+   order == drm_buddy_block_order(block)) {
+   /*
+* Find the free block within the range.
+*/
+   if (drm_buddy_block_is_free(block))
+   return block;
+
+   continue;
+   }
+
+   if (!drm_buddy_block_is_split(block)) {
+   err = split

Re: [PATCH v2] drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.

2022-02-10 Thread Alex Deucher
On Thu, Feb 10, 2022 at 11:08 AM Limonciello, Mario
 wrote:
>
> [Public]
>
> > > VG doesn't do s0i3 right?
> >
> > Right.
> >
> > > No, YC should not take a similar fix. YC had an architectural change
> > > and, to avoid a "similar" problem, takes
> > > 26db706a6d77b9e184feb11725e97e53b7a89519.
> >
> > Isn't that likely just a workaround for the same issue?  This seems cleaner.
> >
>
> The SMU doesn't handle the restore of the SDMA registers for YC though, this
> explicitly changed.  So I don't believe we can do an identical fix there.
>
> @Liang, Prike comments?

Ah, ok, in that case, it's probably correct as is.  I assumed the smu
would probably behave the same.

Alex


Upstream This Patch

2022-02-10 Thread Logush, Oliver
[AMD Official Use Only]

From 488cc792021a60300df3659de204ebef954ba2bb Mon Sep 17 00:00:00 2001
From: Oliver Logush ollog...@amd.com
Date: Wed, 9 Feb 2022 14:25:13 -0500
Subject: [PATCH] drm/amd/display: extend dcn201 support

Signed-off-by: Oliver Logush ollog...@amd.com
Reviewed By: alexander.deuc...@amd.com
   charlene@amd.com
---
drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 2 +-
drivers/gpu/drm/amd/display/include/dal_asic_id.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index b36bae4b5bc9..71b393194c55 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -135,7 +135,7 @@ enum dce_version resource_parse_asic_id(struct hw_asic_id 
asic_id)

case FAMILY_NV:
   dc_version = DCN_VERSION_2_0;
-  if (asic_id.chip_id == DEVICE_ID_NV_13FE) {
+ if (asic_id.chip_id == DEVICE_ID_NV_13FE || 
asic_id.chip_id == DEVICE_ID_NV_143F) {
   dc_version = DCN_VERSION_2_01;
   break;
   }
diff --git a/drivers/gpu/drm/amd/display/include/dal_asic_id.h 
b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
index e4a2dfacab4c..e672be6327cb 100644
--- a/drivers/gpu/drm/amd/display/include/dal_asic_id.h
+++ b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
@@ -212,6 +212,7 @@ enum {
#define ASICREV_IS_GREEN_SARDINE(eChipRev) ((eChipRev >= GREEN_SARDINE_A0) && 
(eChipRev < 0xFF))
#endif
#define DEVICE_ID_NV_13FE 0x13FE  // CYAN_SKILLFISH
+#define DEVICE_ID_NV_143F 0x143F
#define FAMILY_VGH 144
#define DEVICE_ID_VGH_163F 0x163F
#define VANGOGH_A0 0x01
--
2.25.1



Re: Upstream This Patch

2022-02-10 Thread Alex Deucher
Subject is wrong.  Should be:
drm/amd/display: extend dcn201 support

On Thu, Feb 10, 2022 at 11:53 AM Logush, Oliver  wrote:
>
> [AMD Official Use Only]
>
>
> From 488cc792021a60300df3659de204ebef954ba2bb Mon Sep 17 00:00:00 2001
>
> From: Oliver Logush ollog...@amd.com
>
> Date: Wed, 9 Feb 2022 14:25:13 -0500
>
> Subject: [PATCH] drm/amd/display: extend dcn201 support
>
>
>
> Signed-off-by: Oliver Logush ollog...@amd.com
>
> Reviewed By: alexander.deuc...@amd.com
>
>charlene@amd.com

Fix the RB lines, they should look like:

Reviewed-by: Alex Deucher 
Reviewed-by: Charlene Liu 

>
> ---
>
> drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 2 +-
>
> drivers/gpu/drm/amd/display/include/dal_asic_id.h | 1 +
>
> 2 files changed, 2 insertions(+), 1 deletion(-)
>
>
>
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
> b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
>
> index b36bae4b5bc9..71b393194c55 100644
>
> --- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
>
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
>
> @@ -135,7 +135,7 @@ enum dce_version resource_parse_asic_id(struct hw_asic_id 
> asic_id)
>
>
>
> case FAMILY_NV:
>
>dc_version = DCN_VERSION_2_0;
>
> -  if (asic_id.chip_id == DEVICE_ID_NV_13FE) {
>
> + if (asic_id.chip_id == DEVICE_ID_NV_13FE || 
> asic_id.chip_id == DEVICE_ID_NV_143F) {
>
>dc_version = DCN_VERSION_2_01;
>
>break;
>
>}
>
> diff --git a/drivers/gpu/drm/amd/display/include/dal_asic_id.h 
> b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
>
> index e4a2dfacab4c..e672be6327cb 100644
>
> --- a/drivers/gpu/drm/amd/display/include/dal_asic_id.h
>
> +++ b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
>
> @@ -212,6 +212,7 @@ enum {
>
> #define ASICREV_IS_GREEN_SARDINE(eChipRev) ((eChipRev >= GREEN_SARDINE_A0) && 
> (eChipRev < 0xFF))
>
> #endif
>
> #define DEVICE_ID_NV_13FE 0x13FE  // CYAN_SKILLFISH
>
> +#define DEVICE_ID_NV_143F 0x143F
>
> #define FAMILY_VGH 144
>
> #define DEVICE_ID_VGH_163F 0x163F
>
> #define VANGOGH_A0 0x01
>
> --
>
> 2.25.1
>
>


[PATCH] drm/amdkfd: fix loop error handling

2022-02-10 Thread trix
From: Tom Rix 

Clang static analysis reports this problem
kfd_chardev.c:2594:16: warning: The expression is an uninitialized value.
  The computed value will also be garbage
while (ret && i--) {
  ^~~

i is a loop variable and this block unwinds a problem in the loop.
When the error happens before the loop, this value is garbage.
Move the initialization of i to its declaration.

Fixes: be072b06c739 ("drm/amdkfd: CRIU export BOs as prime dmabuf objects")
Signed-off-by: Tom Rix 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
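
For readers unfamiliar with the pattern, here is a small user-space sketch
of why the initialization matters. It is only an illustration under stated
assumptions: struct item_table, acquire_all() and its parameters are
hypothetical and do not exist in kfd_chardev.c. The point is that an error
path taken before the loop still reaches the "while (ret && i--)" unwind,
so i must already hold a defined value.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ITEMS 8

/* Hypothetical stand-in for per-item state; not a kernel structure. */
struct item_table {
	int acquired[MAX_ITEMS];
};

/*
 * Acquire num items, failing at index fail_at (pass num or larger for no
 * failure). When early_error is nonzero the function bails out before the
 * loop ever runs, which is the case the static analyzer complained about.
 */
static inline int acquire_all(struct item_table *t, uint32_t num,
			      uint32_t fail_at, int early_error)
{
	uint32_t i = 0;	/* initialized at declaration so the unwind is safe */
	int ret = 0;

	assert(num <= MAX_ITEMS);

	if (early_error) {
		ret = -1;
		goto exit;
	}

	for (; i < num; i++) {
		if (i == fail_at) {
			ret = -1;
			break;
		}
		t->acquired[i] = 1;
	}

exit:
	/* Release only what was acquired; with i == 0 this does nothing. */
	while (ret && i--)
		t->acquired[i] = 0;

	return ret;
}
```

If i were left uninitialized, the early_error path would walk the unwind
loop with an indeterminate count and touch entries that were never set up.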

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 636391c61cafb..4310ca07af130 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2374,7 +2374,7 @@ static int criu_restore_bos(struct kfd_process *p,
const bool criu_resume = true;
bool flush_tlbs = false;
int ret = 0, j = 0;
-   uint32_t i;
+   uint32_t i = 0;
 
if (*priv_offset + (args->num_bos * sizeof(*bo_privs)) > 
max_priv_data_size)
return -EINVAL;
@@ -2410,7 +2410,7 @@ static int criu_restore_bos(struct kfd_process *p,
*priv_offset += args->num_bos * sizeof(*bo_privs);
 
/* Create and map new BOs */
-   for (i = 0; i < args->num_bos; i++) {
+   for (; i < args->num_bos; i++) {
struct kfd_criu_bo_bucket *bo_bucket;
struct kfd_criu_bo_priv_data *bo_priv;
struct kfd_dev *dev;
-- 
2.26.3



[PATCH] amdgpu/pm: Disable managing hwmon sysfs attributes for ONEVF mode

2022-02-10 Thread Danijel Slivka
This patch prevents set commands on all hwmon attributes through sysfs
in ONEVF mode by clearing their write permission.

Signed-off-by: Danijel Slivka 
---
 drivers/gpu/drm/amd/pm/amdgpu_pm.c | 4 
 1 file changed, 4 insertions(+)
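
As a rough illustration of the effect of the hunk below, the following
user-space sketch clears the owner-write bit of an attribute mode while
leaving the read bits alone. visible_mode() and its one_vf parameter are
hypothetical names used only here; one_vf stands in for the driver's
amdgpu_sriov_is_pp_one_vf() check, and mode_t stands in for umode_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: return the mode a sysfs attribute should be exposed
 * with. one_vf stands in for the driver's "pp one VF mode" check.
 */
static inline mode_t visible_mode(mode_t attr_mode, bool one_vf)
{
	mode_t effective_mode = attr_mode;

	/* Reads stay allowed; only the owner-write bit is cleared. */
	if (one_vf)
		effective_mode &= ~S_IWUSR;

	return effective_mode;
}
```

An attribute that is 0644 outside ONEVF mode would be exposed as 0444 in
ONEVF mode, so userspace can still read the sensor but cannot write it.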

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index ad5da252228b..3cec023a7b06 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -3161,6 +3161,10 @@ static umode_t hwmon_attributes_visible(struct kobject 
*kobj,
if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev))
return 0;
 
+   /* under pp one vf mode manage of hwmon attributes is not supported */
+   if (amdgpu_sriov_is_pp_one_vf(adev))
+   effective_mode &= ~S_IWUSR;
+
/* Skip fan attributes if fan is not present */
if (adev->pm.no_fan && (attr == &sensor_dev_attr_pwm1.dev_attr.attr ||
attr == &sensor_dev_attr_pwm1_enable.dev_attr.attr ||
-- 
2.25.1



Re: start sorting out the ZONE_DEVICE refcount mess v2

2022-02-10 Thread Sierra Guiza, Alejandro (Alex)

Christoph,
Thanks a lot for rebasing our patches. I just ran our amdgpu hmm-tests
with this series and all passed.


Regards,
Alex Sierra

On 2/10/2022 1:28 AM, Christoph Hellwig wrote:

Hi all,

this series removes the offset by one refcount for ZONE_DEVICE pages
that are freed back to the driver owning them, which is just device
private ones for now, but also the planned device coherent pages
and the enhanced p2p ones pending.

It does not address the fsdax pages yet, which will be attacked in a
follow on series.

Note that if we want to get the p2p series rebased on top of this
we'll need a git branch for this series.  I could offer to host one.

A git tree is available here:

 git://git.infradead.org/users/hch/misc.git pgmap-refcount

Gitweb:

 
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/pgmap-refcount

Changes since v1:
  - add a missing memremap.h include in memcontrol.c
  - include rebased versions of the device coherent support and
device coherent migration support series as well as additional
cleanup patches

Diffstat:
  arch/arm64/mm/mmu.c  |1
  arch/powerpc/kvm/book3s_hv_uvmem.c   |1
  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |   35 -
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h|1
  drivers/gpu/drm/drm_cache.c  |2
  drivers/gpu/drm/nouveau/nouveau_dmem.c   |3
  drivers/gpu/drm/nouveau/nouveau_svm.c|1
  drivers/infiniband/core/rw.c |1
  drivers/nvdimm/pmem.h|1
  drivers/nvme/host/pci.c  |1
  drivers/nvme/target/io-cmd-bdev.c|1
  fs/Kconfig   |2
  fs/fuse/virtio_fs.c  |1
  include/linux/hmm.h  |9
  include/linux/memremap.h |   36 +
  include/linux/migrate.h  |1
  include/linux/mm.h   |   59 --
  lib/test_hmm.c   |  353 ++---
  lib/test_hmm_uapi.h  |   22
  mm/Kconfig   |7
  mm/Makefile  |1
  mm/gup.c |  127 +++-
  mm/internal.h|3
  mm/memcontrol.c  |   19
  mm/memory-failure.c  |8
  mm/memremap.c|   75 +-
  mm/migrate.c |  763 
  mm/migrate_device.c  |  822 +++
  mm/rmap.c|5
  mm/swap.c|   49 -
  tools/testing/selftests/vm/Makefile  |2
  tools/testing/selftests/vm/hmm-tests.c   |  204 ++-
  tools/testing/selftests/vm/test_hmm.sh   |   24
  33 files changed, 1552 insertions(+), 1088 deletions(-)




Re: [PATCH v11 5/5] drm/amdgpu: add drm buddy support to amdgpu

2022-02-10 Thread Matthew Auld

On 08/02/2022 11:20, Arunpravin wrote:



On 04/02/22 6:53 pm, Christian König wrote:

Am 04.02.22 um 12:22 schrieb Arunpravin:

On 28/01/22 7:48 pm, Matthew Auld wrote:

On Thu, 27 Jan 2022 at 14:11, Arunpravin
 wrote:

- Remove drm_mm references and replace with drm buddy functionalities
- Add res cursor support for drm buddy

v2(Matthew Auld):
- replace spinlock with mutex as we call kmem_cache_zalloc
  (..., GFP_KERNEL) in drm_buddy_alloc() function

- lock drm_buddy_block_trim() function as it calls
  mark_free/mark_split, which are all globally visible

v3(Matthew Auld):
- remove trim method error handling as we address the failure case
  at drm_buddy_block_trim() function

v4:
- fix warnings reported by kernel test robot 

v5:
- fix merge conflict issue

v6:
- fix warnings reported by kernel test robot 

Signed-off-by: Arunpravin 
---
   drivers/gpu/drm/Kconfig   |   1 +
   .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h|  97 +--
   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |   7 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  | 259 ++
   4 files changed, 231 insertions(+), 133 deletions(-)




-/**
- * amdgpu_vram_mgr_virt_start - update virtual start address
- *
- * @mem: ttm_resource to update
- * @node: just allocated node
- *
- * Calculate a virtual BO start address to easily check if everything is CPU
- * accessible.
- */
-static void amdgpu_vram_mgr_virt_start(struct ttm_resource *mem,
-  struct drm_mm_node *node)
-{
-   unsigned long start;
-
-   start = node->start + node->size;
-   if (start > mem->num_pages)
-   start -= mem->num_pages;
-   else
-   start = 0;
-   mem->start = max(mem->start, start);
-}
-
   /**
* amdgpu_vram_mgr_new - allocate new ranges
*
@@ -366,13 +357,13 @@ static int amdgpu_vram_mgr_new(struct 
ttm_resource_manager *man,
 const struct ttm_place *place,
 struct ttm_resource **res)
   {
-   unsigned long lpfn, num_nodes, pages_per_node, pages_left, pages;
+   unsigned long lpfn, pages_per_node, pages_left, pages, n_pages;
+   u64 vis_usage = 0, mem_bytes, max_bytes, min_page_size;
  struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
  struct amdgpu_device *adev = to_amdgpu_device(mgr);
-   uint64_t vis_usage = 0, mem_bytes, max_bytes;
-   struct ttm_range_mgr_node *node;
-   struct drm_mm *mm = &mgr->mm;
-   enum drm_mm_insert_mode mode;
+   struct amdgpu_vram_mgr_node *node;
+   struct drm_buddy *mm = &mgr->mm;
+   struct drm_buddy_block *block;
  unsigned i;
  int r;

@@ -391,10 +382,9 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
  goto error_sub;
  }

-   if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
+   if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
  pages_per_node = ~0ul;
-   num_nodes = 1;
-   } else {
+   else {
   #ifdef CONFIG_TRANSPARENT_HUGEPAGE
  pages_per_node = HPAGE_PMD_NR;
   #else
@@ -403,11 +393,9 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
   #endif
  pages_per_node = max_t(uint32_t, pages_per_node,
 tbo->page_alignment);
-   num_nodes = DIV_ROUND_UP_ULL(PFN_UP(mem_bytes), pages_per_node);
  }

-   node = kvmalloc(struct_size(node, mm_nodes, num_nodes),
-   GFP_KERNEL | __GFP_ZERO);
+   node = kzalloc(sizeof(*node), GFP_KERNEL);
  if (!node) {
  r = -ENOMEM;
  goto error_sub;
@@ -415,9 +403,17 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,

  ttm_resource_init(tbo, place, &node->base);

-   mode = DRM_MM_INSERT_BEST;
+   INIT_LIST_HEAD(&node->blocks);
+
  if (place->flags & TTM_PL_FLAG_TOPDOWN)
-   mode = DRM_MM_INSERT_HIGH;
+   node->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
+
+   if (place->fpfn || lpfn != man->size)
+   /* Allocate blocks in desired range */
+   node->flags |= DRM_BUDDY_RANGE_ALLOCATION;
+
+   min_page_size = mgr->default_page_size;
+   BUG_ON(min_page_size < mm->chunk_size);

  pages_left = node->base.num_pages;

@@ -425,36 +421,61 @@ static int amdgpu_vram_mgr_new(struct 
ttm_resource_manager *man,
  pages = min(pages_left, 2UL << (30 - PAGE_SHIFT));

  i = 0;
-   spin_lock(&mgr->lock);
  while (pages_left) {
-   uint32_t alignment = tbo->page_alignment;
-
  if (pages >= pages_per_node)
-   alignment = pages_per_node;
-
-   r = drm_mm_insert_node_in_range(mm, &node->mm_nodes[i], pages,
-   alignment, 0, place->fpfn,
-

[PATCH] drm/amd/display: extend dcn201 support

2022-02-10 Thread Logush, Oliver
[AMD Official Use Only]

From 488cc792021a60300df3659de204ebef954ba2bb Mon Sep 17 00:00:00 2001
From: Oliver Logush ollog...@amd.com
Date: Wed, 9 Feb 2022 14:25:13 -0500
Subject: [PATCH] drm/amd/display: extend dcn201 support

Signed-off-by: Oliver Logush ollog...@amd.com
Reviewed By: alexander.deuc...@amd.com
   charlene@amd.com
---
drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 2 +-
drivers/gpu/drm/amd/display/include/dal_asic_id.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index b36bae4b5bc9..71b393194c55 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -135,7 +135,7 @@ enum dce_version resource_parse_asic_id(struct hw_asic_id 
asic_id)

case FAMILY_NV:
   dc_version = DCN_VERSION_2_0;
-  if (asic_id.chip_id == DEVICE_ID_NV_13FE) {
+ if (asic_id.chip_id == DEVICE_ID_NV_13FE || 
asic_id.chip_id == DEVICE_ID_NV_143F) {
   dc_version = DCN_VERSION_2_01;
   break;
   }
diff --git a/drivers/gpu/drm/amd/display/include/dal_asic_id.h 
b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
index e4a2dfacab4c..e672be6327cb 100644
--- a/drivers/gpu/drm/amd/display/include/dal_asic_id.h
+++ b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
@@ -212,6 +212,7 @@ enum {
#define ASICREV_IS_GREEN_SARDINE(eChipRev) ((eChipRev >= GREEN_SARDINE_A0) && 
(eChipRev < 0xFF))
#endif
#define DEVICE_ID_NV_13FE 0x13FE  // CYAN_SKILLFISH
+#define DEVICE_ID_NV_143F 0x143F
#define FAMILY_VGH 144
#define DEVICE_ID_VGH_163F 0x163F
#define VANGOGH_A0 0x01
--
2.25.1



Re: [PATCH] drm/amdkfd: fix loop error handling

2022-02-10 Thread Felix Kuehling

Am 2022-02-10 um 12:04 schrieb t...@redhat.com:

From: Tom Rix 

Clang static analysis reports this problem
kfd_chardev.c:2594:16: warning: The expression is an uninitialized value.
   The computed value will also be garbage
 while (ret && i--) {
   ^~~

i is a loop variable and this block unwinds a problem in the loop.
When the error happens before the loop, this value is garbage.
Move the initialization of i to its declaration.

Fixes: be072b06c739 ("drm/amdkfd: CRIU export BOs as prime dmabuf objects")
Signed-off-by: Tom Rix 


Thank you. I applied the patch to amd-staging-drm-next.

Regards,
  Felix



---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 636391c61cafb..4310ca07af130 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2374,7 +2374,7 @@ static int criu_restore_bos(struct kfd_process *p,
const bool criu_resume = true;
bool flush_tlbs = false;
int ret = 0, j = 0;
-   uint32_t i;
+   uint32_t i = 0;
  
  	if (*priv_offset + (args->num_bos * sizeof(*bo_privs)) > max_priv_data_size)

return -EINVAL;
@@ -2410,7 +2410,7 @@ static int criu_restore_bos(struct kfd_process *p,
*priv_offset += args->num_bos * sizeof(*bo_privs);
  
  	/* Create and map new BOs */

-   for (i = 0; i < args->num_bos; i++) {
+   for (; i < args->num_bos; i++) {
struct kfd_criu_bo_bucket *bo_bucket;
struct kfd_criu_bo_priv_data *bo_priv;
struct kfd_dev *dev;


Re: [PATCH 6/8] mm: don't include <linux/memremap.h> in <linux/mm.h>

2022-02-10 Thread Felix Kuehling



Am 2022-02-09 um 12:48 schrieb Christoph Hellwig:

On Mon, Feb 07, 2022 at 04:19:29PM -0500, Felix Kuehling wrote:

Am 2022-02-07 um 01:32 schrieb Christoph Hellwig:

Move the check for the actual pgmap types that need the free at refcount
one behavior into the out of line helper, and thus avoid the need to
pull memremap.h into mm.h.

Signed-off-by: Christoph Hellwig 

The amdkfd part looks good to me.

It looks like this patch is not based on Alex Sierra's coherent memory
series. He added two new helpers is_device_coherent_page and
is_dev_private_or_coherent_page that would need to be moved along with
is_device_private_page and is_pci_p2pdma_page.

FYI, here is a branch that contains a rebase of the coherent memory
related patches on top of this series:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/pgmap-refcount

I don't have a good way to test this, but I'll at least let the build bot
finish before sending it out (probably tomorrow).


Thank you for taking care of this rebase! Alex tested it on one of our 
coherent memory systems and it passed our tests.


I see you also included these rebased patches in your latest 27-patch 
series. I'll try to review the changes in more detail over the weekend.


Regards,
  Felix





[PATCH v2 3/9] PCI: drop `is_thunderbolt` attribute

2022-02-10 Thread Mario Limonciello
The `is_thunderbolt` attribute is currently a dumping ground for a
variety of things.

Instead, use the driver core removable attribute to indicate that a
device is attached to a Thunderbolt or USB4 chain.

Signed-off-by: Mario Limonciello 
---
 drivers/pci/pci.c |  2 +-
 drivers/pci/probe.c   | 20 +++-
 drivers/platform/x86/apple-gmux.c |  2 +-
 include/linux/pci.h   |  5 ++---
 4 files changed, 11 insertions(+), 18 deletions(-)
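
To make the lookup in pci_is_thunderbolt_attached() easier to follow, here
is a simplified user-space sketch of the same walk: check the device, then
every bridge above it, for the removable mark. struct toy_dev and
toy_is_externally_attached() are hypothetical names for illustration only;
the real code uses struct pci_dev, pci_upstream_bridge() and
dev_is_removable().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device node; not the kernel's struct pci_dev. */
struct toy_dev {
	struct toy_dev *upstream;	/* NULL at the root */
	bool removable;
};

/* True if the device itself or any bridge above it is marked removable. */
static inline bool toy_is_externally_attached(const struct toy_dev *dev)
{
	for (; dev; dev = dev->upstream)
		if (dev->removable)
			return true;

	return false;
}
```

A GPU behind a removable port is reported as attached even though the GPU
node itself carries no mark, which is the behavior the drivers rely on.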

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 9ecce435fb3f..1264984d5e6d 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2955,7 +2955,7 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
return true;
 
/* Even the oldest 2010 Thunderbolt controller supports D3. */
-   if (bridge->is_thunderbolt)
+   if (dev_is_removable(&bridge->dev))
return true;
 
/* Platform might know better if the bridge supports D3 */
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 17a969942d37..e41656cdd8f0 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1577,16 +1577,6 @@ void set_pcie_hotplug_bridge(struct pci_dev *pdev)
pdev->is_hotplug_bridge = 1;
 }
 
-static void set_pcie_thunderbolt(struct pci_dev *dev)
-{
-   u16 vsec;
-
-   /* Is the device part of a Thunderbolt controller? */
-   vsec = pci_find_vsec_capability(dev, PCI_VENDOR_ID_INTEL, 
PCI_VSEC_ID_INTEL_TBT);
-   if (vsec)
-   dev->is_thunderbolt = 1;
-}
-
 static void set_pcie_untrusted(struct pci_dev *dev)
 {
struct pci_dev *parent;
@@ -1603,6 +1593,10 @@ static void set_pcie_untrusted(struct pci_dev *dev)
 static void pci_set_removable(struct pci_dev *dev)
 {
struct pci_dev *parent = pci_upstream_bridge(dev);
+   u16 vsec;
+
+   /* Is the device a Thunderbolt controller? */
+   vsec = pci_find_vsec_capability(dev, PCI_VENDOR_ID_INTEL, 
PCI_VSEC_ID_INTEL_TBT);
 
/*
 * We (only) consider everything downstream from an external_facing
@@ -1615,8 +1609,9 @@ static void pci_set_removable(struct pci_dev *dev)
 * accessible to user / may not be removed by end user, and thus not
 * exposed as "removable" to userspace.
 */
-   if (parent &&
-   (parent->external_facing || dev_is_removable(&parent->dev)))
+   if (vsec ||
+   (parent &&
+   (parent->external_facing || dev_is_removable(&parent->dev))))
dev_set_removable(&dev->dev, DEVICE_REMOVABLE);
 }
 
@@ -1860,7 +1855,6 @@ int pci_setup_device(struct pci_dev *dev)
dev->cfg_size = pci_cfg_space_size(dev);
 
/* Need to have dev->cfg_size ready */
-   set_pcie_thunderbolt(dev);
 
set_pcie_untrusted(dev);
 
diff --git a/drivers/platform/x86/apple-gmux.c 
b/drivers/platform/x86/apple-gmux.c
index 57553f9b4d1d..04232fbc7d56 100644
--- a/drivers/platform/x86/apple-gmux.c
+++ b/drivers/platform/x86/apple-gmux.c
@@ -596,7 +596,7 @@ static int gmux_resume(struct device *dev)
 
 static int is_thunderbolt(struct device *dev, void *data)
 {
-   return to_pci_dev(dev)->is_thunderbolt;
+   return pci_is_thunderbolt_attached(to_pci_dev(dev));
 }
 
 static int gmux_probe(struct pnp_dev *pnp, const struct pnp_device_id *id)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1e5b769e42fc..d9719eb14654 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -442,7 +442,6 @@ struct pci_dev {
unsigned intis_virtfn:1;
unsigned intis_hotplug_bridge:1;
unsigned intshpc_managed:1; /* SHPC owned by shpchp */
-   unsigned intis_thunderbolt:1;   /* Thunderbolt controller */
unsigned intno_cmd_complete:1;  /* Lies about command completed 
events */
 
/*
@@ -2447,11 +2446,11 @@ static inline bool pci_is_thunderbolt_attached(struct 
pci_dev *pdev)
 {
struct pci_dev *parent = pdev;
 
-   if (pdev->is_thunderbolt)
+   if (dev_is_removable(&pdev->dev))
return true;
 
while ((parent = pci_upstream_bridge(parent)))
-   if (parent->is_thunderbolt)
+   if (dev_is_removable(&parent->dev))
return true;
 
return false;
-- 
2.34.1



[PATCH v2 0/9] Overhaul is_thunderbolt

2022-02-10 Thread Mario Limonciello
Various drivers in the kernel use `is_thunderbolt` or
`pci_is_thunderbolt_attached` to treat a device differently from one
that is internal to the machine. This relies upon checks for a specific
capability only set on Intel controllers.

Non-Intel USB4 designs should also match this designation so that they
can be treated the same regardless of the host they're connected to.

As part of adding the generic USB4 controller code, it was realized that
`is_thunderbolt` and `pci_is_thunderbolt_attached` have been overloaded.

Instead, migrate to using the removable attribute from the device core.

Changes from v1->v2:
 - Add Alex's tag to first patch
 - Move lack of command completion into a quirk (Lukas)
 - Drop `is_thunderbolt` attribute and `pci_is_thunderbolt_attached` and
   use device core removable attribute instead
 - Adjust all consumers of old attribute to use removable

Mario Limonciello (9):
  thunderbolt: move definition of PCI_CLASS_SERIAL_USB_USB4
  PCI: Move `is_thunderbolt` check for lack of command completed to a
quirk
  PCI: drop `is_thunderbolt` attribute
  PCI: mark USB4 devices as removable
  drm/amd: drop the use of `pci_is_thunderbolt_attached`
  drm/nouveau: drop the use of `pci_is_thunderbolt_attached`
  drm/radeon: drop the use of `pci_is_thunderbolt_attached`
  platform/x86: amd-gmux: drop the use of `pci_is_thunderbolt_attached`
  PCI: drop `pci_is_thunderbolt_attached`

 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c  |  2 +-
 drivers/gpu/drm/nouveau/nouveau_vga.c   |  4 ++--
 drivers/gpu/drm/radeon/radeon_device.c  |  4 ++--
 drivers/gpu/drm/radeon/radeon_kms.c |  2 +-
 drivers/pci/hotplug/pciehp_hpc.c|  6 +-
 drivers/pci/pci.c   |  2 +-
 drivers/pci/probe.c | 21 -
 drivers/pci/quirks.c| 17 +
 drivers/platform/x86/apple-gmux.c   |  2 +-
 drivers/thunderbolt/nhi.h   |  2 --
 include/linux/pci.h | 25 ++---
 include/linux/pci_ids.h |  1 +
 13 files changed, 38 insertions(+), 52 deletions(-)

-- 
2.34.1



[PATCH v2 2/9] PCI: Move `is_thunderbolt` check for lack of command completed to a quirk

2022-02-10 Thread Mario Limonciello
The `is_thunderbolt` check is currently used to indicate the lack of
command completed support for a number of older Thunderbolt devices.

This, however, is heavy-handed and should have been done via a quirk.  Move
the affected devices outlined in commit 493fb50e958c ("PCI: pciehp: Assume
NoCompl+ for Thunderbolt ports") into pci quirks.

Suggested-by: Lukas Wunner 
Signed-off-by: Mario Limonciello 
---
 drivers/pci/hotplug/pciehp_hpc.c |  6 +-
 drivers/pci/quirks.c | 17 +
 include/linux/pci.h  |  2 ++
 3 files changed, 20 insertions(+), 5 deletions(-)
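
The idea behind the quirk can be sketched in user space as a table lookup
that sets a per-device flag, instead of inferring the flag from an unrelated
attribute. This is only an illustration: struct toy_pci_dev,
toy_apply_quirks() and the IDs below are hypothetical placeholders, not real
hardware IDs and not the kernel's DECLARE_PCI_FIXUP_FINAL() machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device record; the real flag lives in struct pci_dev. */
struct toy_pci_dev {
	uint16_t vendor;
	uint16_t device;
	bool no_cmd_complete;
};

/* Placeholder IDs for illustration only; they are not real hardware IDs. */
#define TOY_VENDOR	0x1234
static const uint16_t toy_quirky_devices[] = { 0x0001, 0x0002, 0x0003 };

/* Set the flag only for devices listed in the quirk table. */
static inline void toy_apply_quirks(struct toy_pci_dev *dev)
{
	size_t n = sizeof(toy_quirky_devices) / sizeof(toy_quirky_devices[0]);
	size_t i;

	assert(dev);
	if (dev->vendor != TOY_VENDOR)
		return;

	for (i = 0; i < n; i++) {
		if (dev->device == toy_quirky_devices[i]) {
			dev->no_cmd_complete = true;
			return;
		}
	}
}
```

The consumer then tests only the flag, as pcie_init() does with
pdev->no_cmd_complete in the hunk below, so devices outside the table are
no longer affected.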

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 1c1ebf3dad43..e4c42b24aba8 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -996,11 +996,7 @@ struct controller *pcie_init(struct pcie_device *dev)
if (pdev->hotplug_user_indicators)
slot_cap &= ~(PCI_EXP_SLTCAP_AIP | PCI_EXP_SLTCAP_PIP);
 
-   /*
-* We assume no Thunderbolt controllers support Command Complete events,
-* but some controllers falsely claim they do.
-*/
-   if (pdev->is_thunderbolt)
+   if (pdev->no_cmd_complete)
slot_cap |= PCI_EXP_SLTCAP_NCCS;
 
ctrl->slot_cap = slot_cap;
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index d2dd6a6cda60..6d3c88edde00 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3675,6 +3675,23 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 
PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PORT_RIDGE,
quirk_thunderbolt_hotplug_msi);
 
+static void quirk_thunderbolt_command_completed(struct pci_dev *pdev)
+{
+   pdev->no_cmd_complete = 1;
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_LIGHT_RIDGE,
+   quirk_thunderbolt_command_completed);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_EAGLE_RIDGE,
+   quirk_thunderbolt_command_completed);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_LIGHT_PEAK,
+   quirk_thunderbolt_command_completed);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 
PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
+   quirk_thunderbolt_command_completed);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 
PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_2C,
+   quirk_thunderbolt_command_completed);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PORT_RIDGE,
+   quirk_thunderbolt_command_completed);
+
 #ifdef CONFIG_ACPI
 /*
  * Apple: Shutdown Cactus Ridge Thunderbolt controller.
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8253a5413d7c..1e5b769e42fc 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -443,6 +443,8 @@ struct pci_dev {
unsigned intis_hotplug_bridge:1;
unsigned intshpc_managed:1; /* SHPC owned by shpchp */
unsigned intis_thunderbolt:1;   /* Thunderbolt controller */
+   unsigned intno_cmd_complete:1;  /* Lies about command completed 
events */
+
/*
 * Devices marked being untrusted are the ones that can potentially
 * execute DMA attacks and similar. They are typically connected
-- 
2.34.1
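
The effect of the new flag on the slot capabilities can be sketched in userspace as follows. This is a minimal model, not the kernel code: `struct mock_pci_dev` and `effective_slot_cap()` are hypothetical names introduced for illustration, and only the `PCI_EXP_SLTCAP_NCCS` bit value is taken from the kernel's register definitions.

```c
#include <stdint.h>

/* Slot Capabilities bit 18: No Command Completed Support. */
#define PCI_EXP_SLTCAP_NCCS 0x00040000

/* Hypothetical stand-in for the two struct pci_dev bits involved. */
struct mock_pci_dev {
	unsigned int is_thunderbolt:1;
	unsigned int no_cmd_complete:1;	/* set by the quirk in this patch */
};

/*
 * Mirrors the pcie_init() hunk: if the device is known to lie about
 * Command Completed events, force the NCCS bit on in the cached value.
 */
static uint32_t effective_slot_cap(const struct mock_pci_dev *pdev,
				   uint32_t slot_cap)
{
	if (pdev->no_cmd_complete)
		slot_cap |= PCI_EXP_SLTCAP_NCCS;
	return slot_cap;
}
```

With this shape, a Thunderbolt controller that is not in the quirk list keeps whatever the hardware reports, which is the behavioural difference from the old `is_thunderbolt` test.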



[PATCH v2 4/9] PCI: mark USB4 devices as removable

2022-02-10 Thread Mario Limonciello
USB4 class devices are also removable like Intel Thunderbolt devices.

Drivers of downstream devices use this information to declare functional
differences in how the drivers perform by knowing that they are connected
to an upstream TBT/USB4 port.

Signed-off-by: Mario Limonciello 
---
 drivers/pci/probe.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index e41656cdd8f0..73673a83eb5e 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1610,6 +1610,7 @@ static void pci_set_removable(struct pci_dev *dev)
 * exposed as "removable" to userspace.
 */
if (vsec ||
+   dev->class == PCI_CLASS_SERIAL_USB_USB4 ||
(parent &&
(parent->external_facing || dev_is_removable(&parent->dev))))
dev_set_removable(&dev->dev, DEVICE_REMOVABLE);
-- 
2.34.1



[PATCH v2 5/9] drm/amd: drop the use of `pci_is_thunderbolt_attached`

2022-02-10 Thread Mario Limonciello
Currently `pci_is_thunderbolt_attached` is used to indicate a device
is connected externally.

The PCI core now marks such devices as removable and downstream drivers
can use this instead.

Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 1ebb91db2274..6dbf5753b5be 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -161,7 +161,7 @@ int amdgpu_driver_load_kms(struct amdgpu_device *adev, 
unsigned long flags)
(amdgpu_is_atpx_hybrid() ||
 amdgpu_has_atpx_dgpu_power_cntl()) &&
((flags & AMD_IS_APU) == 0) &&
-   !pci_is_thunderbolt_attached(to_pci_dev(dev->dev)))
+   !dev_is_removable(&adev->pdev->dev))
flags |= AMD_IS_PX;
 
parent = pci_upstream_bridge(adev->pdev);
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c 
b/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
index ee7cab37dfd5..2c5d74d836f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
@@ -382,7 +382,7 @@ static void nbio_v2_3_enable_aspm(struct amdgpu_device 
*adev,
 
data |= NAVI10_PCIE__LC_L0S_INACTIVITY_DEFAULT << 
PCIE_LC_CNTL__LC_L0S_INACTIVITY__SHIFT;
 
-   if (pci_is_thunderbolt_attached(adev->pdev))
+   if (dev_is_removable(&adev->pdev->dev))
data |= NAVI10_PCIE__LC_L1_INACTIVITY_TBT_DEFAULT  << 
PCIE_LC_CNTL__LC_L1_INACTIVITY__SHIFT;
else
data |= NAVI10_PCIE__LC_L1_INACTIVITY_DEFAULT << 
PCIE_LC_CNTL__LC_L1_INACTIVITY__SHIFT;
-- 
2.34.1



[PATCH v2 1/9] thunderbolt: move definition of PCI_CLASS_SERIAL_USB_USB4

2022-02-10 Thread Mario Limonciello
This PCI class definition of the USB4 device is currently located only in
the thunderbolt driver.

It will be needed by a few other drivers for upcoming changes. Move it into
the common include file.

Acked-by: Alex Deucher 
Signed-off-by: Mario Limonciello 
---
 drivers/thunderbolt/nhi.h | 2 --
 include/linux/pci_ids.h   | 1 +
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/thunderbolt/nhi.h b/drivers/thunderbolt/nhi.h
index 69083aab2736..79e980b51f94 100644
--- a/drivers/thunderbolt/nhi.h
+++ b/drivers/thunderbolt/nhi.h
@@ -81,6 +81,4 @@ extern const struct tb_nhi_ops icl_nhi_ops;
 #define PCI_DEVICE_ID_INTEL_TGL_H_NHI0 0x9a1f
 #define PCI_DEVICE_ID_INTEL_TGL_H_NHI1 0x9a21
 
-#define PCI_CLASS_SERIAL_USB_USB4  0x0c0340
-
 #endif
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index aad54c666407..61b161d914f0 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -116,6 +116,7 @@
 #define PCI_CLASS_SERIAL_USB_OHCI  0x0c0310
 #define PCI_CLASS_SERIAL_USB_EHCI  0x0c0320
 #define PCI_CLASS_SERIAL_USB_XHCI  0x0c0330
+#define PCI_CLASS_SERIAL_USB_USB4  0x0c0340
 #define PCI_CLASS_SERIAL_USB_DEVICE0x0c03fe
 #define PCI_CLASS_SERIAL_FIBER 0x0c04
 #define PCI_CLASS_SERIAL_SMBUS 0x0c05
-- 
2.34.1
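
For readers unfamiliar with the encoding, the 24-bit class code 0x0c0340 packs the base class, subclass, and programming interface into one value. A minimal userspace sketch of that decomposition follows; the helper names are hypothetical and are not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value taken from the patch above. */
#define PCI_CLASS_SERIAL_USB_USB4 0x0c0340

/* Hypothetical helpers for illustration only. */
static inline uint8_t class_base(uint32_t class_code)
{
	return (class_code >> 16) & 0xff;	/* 0x0c: serial bus controller */
}

static inline uint8_t class_sub(uint32_t class_code)
{
	return (class_code >> 8) & 0xff;	/* 0x03: USB */
}

static inline uint8_t class_prog_if(uint32_t class_code)
{
	return class_code & 0xff;		/* 0x40: USB4 host interface */
}

/* Full 24-bit comparison, as a consumer of the shared define might do. */
static inline bool is_usb4_class(uint32_t class_code)
{
	return class_code == PCI_CLASS_SERIAL_USB_USB4;
}
```

Comparing the full 24-bit value distinguishes USB4 from the OHCI, EHCI, and XHCI entries listed next to it, which share the same base class and subclass.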



[PATCH v2 8/9] platform/x86: apple-gmux: drop the use of `pci_is_thunderbolt_attached`

2022-02-10 Thread Mario Limonciello
Currently `pci_is_thunderbolt_attached` is used to indicate a device
is connected externally.

The PCI core now marks such devices as removable and downstream drivers
can use this instead.

Signed-off-by: Mario Limonciello 
---
 drivers/platform/x86/apple-gmux.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/platform/x86/apple-gmux.c 
b/drivers/platform/x86/apple-gmux.c
index 04232fbc7d56..ffac15b9befd 100644
--- a/drivers/platform/x86/apple-gmux.c
+++ b/drivers/platform/x86/apple-gmux.c
@@ -596,7 +596,7 @@ static int gmux_resume(struct device *dev)
 
 static int is_thunderbolt(struct device *dev, void *data)
 {
-   return pci_is_thunderbolt_attached(to_pci_dev(dev));
+   return dev_is_removable(dev);
 }
 
 static int gmux_probe(struct pnp_dev *pnp, const struct pnp_device_id *id)
-- 
2.34.1



[PATCH v2 9/9] PCI: drop `pci_is_thunderbolt_attached`

2022-02-10 Thread Mario Limonciello
Currently `pci_is_thunderbolt_attached` is used to indicate a device
is connected externally.

As all drivers now look at the removable attribute, drop this function.

Signed-off-by: Mario Limonciello 
---
 include/linux/pci.h | 22 --
 1 file changed, 22 deletions(-)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index d9719eb14654..089e7e36a0d9 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2434,28 +2434,6 @@ static inline bool pci_ari_enabled(struct pci_bus *bus)
return bus->self && bus->self->ari_enabled;
 }
 
-/**
- * pci_is_thunderbolt_attached - whether device is on a Thunderbolt daisy chain
- * @pdev: PCI device to check
- *
- * Walk upwards from @pdev and check for each encountered bridge if it's part
- * of a Thunderbolt controller.  Reaching the host bridge means @pdev is not
- * Thunderbolt-attached.  (But rather soldered to the mainboard usually.)
- */
-static inline bool pci_is_thunderbolt_attached(struct pci_dev *pdev)
-{
-   struct pci_dev *parent = pdev;
-
-   if (dev_is_removable(&pdev->dev))
-   return true;
-
-   while ((parent = pci_upstream_bridge(parent)))
-   if (dev_is_removable(&parent->dev))
-   return true;
-
-   return false;
-}
-
 #if defined(CONFIG_PCIEPORTBUS) || defined(CONFIG_EEH)
 void pci_uevent_ers(struct pci_dev *pdev, enum  pci_ers_result err_type);
 #endif
-- 
2.34.1
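
Why a single `dev_is_removable()` check can stand in for the removed upward walk can be sketched with a small model. This assumes, as the `pci_set_removable()` hunk earlier in the series suggests, that a device is marked removable when its parent is external-facing or already removable, and that parents are enumerated before children. `struct mock_dev` and both functions are hypothetical, and the VSEC and USB4 class conditions are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a PCI device; field names are illustrative. */
struct mock_dev {
	struct mock_dev *parent;	/* upstream bridge, NULL at the root */
	bool external_facing;
	bool removable;
};

/* Models the parent-based propagation rule, applied parent-first. */
static void mock_set_removable(struct mock_dev *dev)
{
	struct mock_dev *parent = dev->parent;

	if (parent && (parent->external_facing || parent->removable))
		dev->removable = true;
}

/* Models the removed helper: walk upwards until something is removable. */
static bool mock_is_attached_by_walk(const struct mock_dev *dev)
{
	for (; dev; dev = dev->parent)
		if (dev->removable)
			return true;
	return false;
}
```

Under those assumptions the removable state is inherited all the way down the chain, so the walk and the direct check agree for any device below an external-facing port.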



[PATCH v2 6/9] drm/nouveau: drop the use of `pci_is_thunderbolt_attached`

2022-02-10 Thread Mario Limonciello
Currently `pci_is_thunderbolt_attached` is used to indicate a device
is connected externally.

The PCI core now marks such devices as removable and downstream drivers
can use this instead.

Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/nouveau/nouveau_vga.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_vga.c 
b/drivers/gpu/drm/nouveau/nouveau_vga.c
index 60cd8c0463df..2c8008cb38e0 100644
--- a/drivers/gpu/drm/nouveau/nouveau_vga.c
+++ b/drivers/gpu/drm/nouveau/nouveau_vga.c
@@ -97,7 +97,7 @@ nouveau_vga_init(struct nouveau_drm *drm)
vga_client_register(pdev, nouveau_vga_set_decode);
 
/* don't register Thunderbolt eGPU with vga_switcheroo */
-   if (pci_is_thunderbolt_attached(pdev))
+   if (dev_is_removable(&pdev->dev))
return;
 
vga_switcheroo_register_client(pdev, &nouveau_switcheroo_ops, runtime);
@@ -120,7 +120,7 @@ nouveau_vga_fini(struct nouveau_drm *drm)
 
vga_client_unregister(pdev);
 
-   if (pci_is_thunderbolt_attached(pdev))
+   if (dev_is_removable(&pdev->dev))
return;
 
vga_switcheroo_unregister_client(pdev);
-- 
2.34.1



[PATCH v2 7/9] drm/radeon: drop the use of `pci_is_thunderbolt_attached`

2022-02-10 Thread Mario Limonciello
Currently `pci_is_thunderbolt_attached` is used to indicate a device
is connected externally.

The PCI core now marks such devices as removable and downstream drivers
can use this instead.

Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/radeon/radeon_device.c | 4 ++--
 drivers/gpu/drm/radeon/radeon_kms.c| 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_device.c 
b/drivers/gpu/drm/radeon/radeon_device.c
index 4f0fbf667431..5117fce23b3f 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1439,7 +1439,7 @@ int radeon_device_init(struct radeon_device *rdev,
 
if (rdev->flags & RADEON_IS_PX)
runtime = true;
-   if (!pci_is_thunderbolt_attached(rdev->pdev))
+   if (!dev_is_removable(&rdev->pdev->dev))
vga_switcheroo_register_client(rdev->pdev,
   &radeon_switcheroo_ops, runtime);
if (runtime)
@@ -1527,7 +1527,7 @@ void radeon_device_fini(struct radeon_device *rdev)
/* evict vram memory */
radeon_bo_evict_vram(rdev);
radeon_fini(rdev);
-   if (!pci_is_thunderbolt_attached(rdev->pdev))
+   if (!dev_is_removable(&rdev->pdev->dev))
vga_switcheroo_unregister_client(rdev->pdev);
if (rdev->flags & RADEON_IS_PX)
vga_switcheroo_fini_domain_pm_ops(rdev->dev);
diff --git a/drivers/gpu/drm/radeon/radeon_kms.c 
b/drivers/gpu/drm/radeon/radeon_kms.c
index 11ad210919c8..e01ee7a5cf5d 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -139,7 +139,7 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned 
long flags)
if ((radeon_runtime_pm != 0) &&
radeon_has_atpx() &&
((flags & RADEON_IS_IGP) == 0) &&
-   !pci_is_thunderbolt_attached(pdev))
+   !dev_is_removable(&pdev->dev))
flags |= RADEON_IS_PX;
 
/* radeon_device_init should report only fatal error
-- 
2.34.1



RE: [PATCH] drm/amdgpu: no rlcg read access in SRIOV case for gfx v9

2022-02-10 Thread Chen, Guchun
[Public]

Hi Victor,

I thought about that before sending the patch, as there is indeed a hack in 
the old gfx9 code. My idea was to avoid impacting ASICs with gfx10, so I added 
the check in the caller.

Anyway, I will double-check your suggestion on ASICs with gfx10. If it's fine, 
I will submit a v2 patch.

Regards,
Guchun

-----Original Message-----
From: Skvortsov, Victor  
Sent: Thursday, February 10, 2022 6:50 PM
To: Zhang, Hawking ; Chen, Guchun ; 
amd-gfx@lists.freedesktop.org; Zhou, Peng Ju ; Koenig, 
Christian ; Deucher, Alexander 

Subject: RE: [PATCH] drm/amdgpu: no rlcg read access in SRIOV case for gfx v9

[AMD Official Use Only]

Hi Guchun,

RLCG read is available on Aldebaran if the amdgpu_sriov_reg_indirect_gc() flag 
is set. Instead of adding a new function, I think we should simply add a check 
inside amdgpu_virt_get_rlcg_reg_access_flag():


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index e1288901beb6..1ee600e90312 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -836,7 +836,7 @@ static bool amdgpu_virt_get_rlcg_reg_access_flag(struct 
amdgpu_device *adev,
/* only in new version, AMDGPU_REGS_NO_KIQ and
 * AMDGPU_REGS_RLC are enabled simultaneously */
} else if ((acc_flags & AMDGPU_REGS_RLC) &&
-  !(acc_flags & AMDGPU_REGS_NO_KIQ)) {
+  !(acc_flags & AMDGPU_REGS_NO_KIQ) && write) {
*rlcg_flag = AMDGPU_RLCG_GC_WRITE_LEGACY;
ret = true;
}

Thanks,
Victor

-----Original Message-----
From: amd-gfx  On Behalf Of Zhang, 
Hawking
Sent: Thursday, February 10, 2022 5:02 AM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; Zhou, 
Peng Ju ; Koenig, Christian ; 
Deucher, Alexander 
Subject: RE: [PATCH] drm/amdgpu: no rlcg read access in SRIOV case for gfx v9

[AMD Official Use Only]

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-----Original Message-----
From: Chen, Guchun 
Sent: Thursday, February 10, 2022 14:40
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; 
Zhou, Peng Ju ; Koenig, Christian 
; Deucher, Alexander 
Cc: Chen, Guchun 
Subject: [PATCH] drm/amdgpu: no rlcg read access in SRIOV case for gfx v9

Fall back to MMIO to read registers, as rlcg read is not available for gfx v9 in 
SRIOV configuration. Otherwise, gmc_v9_0_flush_gpu_tlb always reports a 
timeout, which eventually breaks driver load.

Fixes: 0dc4a7e75581("drm/amdgpu: switch to get_rlcg_reg_access_flag for gfx9")
Signed-off-by: Guchun Chen 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index e1288901beb6..a3274fa1c7e4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -37,6 +37,16 @@
vf2pf_info->ucode_info[ucode].version = ver; \
} while (0)

+static bool amdgpu_virt_is_rlcg_read_supported(struct amdgpu_device
+*adev) {
+   /* rlcg read is not support in SRIOV with gfx v9 */
+   if ((adev->ip_versions[MP0_HWIP][0] == IP_VERSION(9, 0, 0)) ||
+   (adev->ip_versions[GC_HWIP][0] == IP_VERSION(9, 4, 1)))
+   return false;
+
+   return true;
+}
+
 bool amdgpu_virt_mmio_blocked(struct amdgpu_device *adev)  {
/* By now all MMIO pages except mailbox are blocked */ @@ -957,7 +967,8 
@@ u32 amdgpu_sriov_rreg(struct amdgpu_device *adev,
u32 rlcg_flag;

if (!amdgpu_sriov_runtime(adev) &&
-   amdgpu_virt_get_rlcg_reg_access_flag(adev, acc_flags, hwip, false, 
&rlcg_flag))
+   amdgpu_virt_is_rlcg_read_supported(adev) &&
+   amdgpu_virt_get_rlcg_reg_access_flag(adev, acc_flags, hwip, 
false,
+&rlcg_flag))
return amdgpu_virt_rlcg_reg_rw(adev, offset, 0, rlcg_flag);

if (acc_flags & AMDGPU_REGS_NO_KIQ)
--
2.17.1


RE: [PATCH] drm/amdgpu: add support for GC 10.1.4

2022-02-10 Thread Huang, Ray
[Public]

Reviewed-by: Huang Rui 

From: Deucher, Alexander 
Sent: Thursday, February 10, 2022 10:57 PM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Cc: Huang, Ray 
Subject: Re: [PATCH] drm/amdgpu: add support for GC 10.1.4


[Public]

Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>

From: Yu, Lang <lang...@amd.com>
Sent: Thursday, February 10, 2022 1:20 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <alexander.deuc...@amd.com>; Huang, Ray <ray.hu...@amd.com>; Yu, Lang <lang...@amd.com>
Subject: [PATCH] drm/amdgpu: add support for GC 10.1.4

Add basic support for GC 10.1.4.
It uses the same IP blocks as GC 10.1.3.

Signed-off-by: Lang Yu <lang...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c   | 3 ++-
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 9 +
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c| 4 +++-
 drivers/gpu/drm/amd/amdgpu/nv.c   | 1 +
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c| 3 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 1 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 2 ++
 8 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index eb4b7059633d..cd7e8522c130 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -674,6 +674,7 @@ static int amdgpu_discovery_set_common_ip_blocks(struct 
amdgpu_device *adev)
 case IP_VERSION(10, 1, 1):
 case IP_VERSION(10, 1, 2):
 case IP_VERSION(10, 1, 3):
+   case IP_VERSION(10, 1, 4):
 case IP_VERSION(10, 3, 0):
 case IP_VERSION(10, 3, 1):
 case IP_VERSION(10, 3, 2):
@@ -709,6 +710,7 @@ static int amdgpu_discovery_set_gmc_ip_blocks(struct 
amdgpu_device *adev)
 case IP_VERSION(10, 1, 1):
 case IP_VERSION(10, 1, 2):
 case IP_VERSION(10, 1, 3):
+   case IP_VERSION(10, 1, 4):
 case IP_VERSION(10, 3, 0):
 case IP_VERSION(10, 3, 1):
 case IP_VERSION(10, 3, 2):
@@ -910,6 +912,7 @@ static int amdgpu_discovery_set_gc_ip_blocks(struct 
amdgpu_device *adev)
 case IP_VERSION(10, 1, 2):
 case IP_VERSION(10, 1, 1):
 case IP_VERSION(10, 1, 3):
+   case IP_VERSION(10, 1, 4):
 case IP_VERSION(10, 3, 0):
 case IP_VERSION(10, 3, 2):
 case IP_VERSION(10, 3, 1):
@@ -1044,6 +1047,7 @@ static int amdgpu_discovery_set_mes_ip_blocks(struct 
amdgpu_device *adev)
 case IP_VERSION(10, 1, 1):
 case IP_VERSION(10, 1, 2):
 case IP_VERSION(10, 1, 3):
+   case IP_VERSION(10, 1, 4):
 case IP_VERSION(10, 3, 0):
 case IP_VERSION(10, 3, 1):
 case IP_VERSION(10, 3, 2):
@@ -1243,6 +1247,7 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device 
*adev)
 case IP_VERSION(10, 1, 1):
 case IP_VERSION(10, 1, 2):
 case IP_VERSION(10, 1, 3):
+   case IP_VERSION(10, 1, 4):
 case IP_VERSION(10, 3, 0):
 case IP_VERSION(10, 3, 2):
 case IP_VERSION(10, 3, 4):
@@ -1264,6 +1269,7 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device 
*adev)
 case IP_VERSION(9, 2, 2):
 case IP_VERSION(9, 3, 0):
 case IP_VERSION(10, 1, 3):
+   case IP_VERSION(10, 1, 4):
 case IP_VERSION(10, 3, 1):
 case IP_VERSION(10, 3, 3):
 adev->flags |= AMD_IS_APU;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index f2806959736a..9bc9155cbf06 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -137,7 +137,8 @@ static int psp_early_init(void *handle)
 psp->autoload_supported = true;
 break;
 case IP_VERSION(11, 0, 8):
-   if (adev->apu_flags & AMD_APU_IS_CYAN_SKILLFISH2) {
+   if (adev->apu_flags & AMD_APU_IS_CYAN_SKILLFISH2 ||
+   adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 1, 4)) {
 psp_v11_0_8_set_psp_funcs(psp);
 psp->autoload_supported = false;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 3d8c5fea572e..8fb4528c741f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -3641,6 +3641,7 @@ static void gfx_v10_0_init_golden_registers(struct 
amdgpu_device *adev)
 (const 
u32)ARRAY_SIZE(golden_settings_gc_10_3_5));
 break;
 case IP_VERSION(10, 1, 3):
+   case IP_VERSION(10, 1, 4):
 soc15_program_register_sequence(adev,

[PATCH] drm/amdkfd: CRIU fix extra whitespace and block comment warnings

2022-02-10 Thread Rajneesh Bhardwaj
Fix the checkpatch-reported warnings for extra whitespace in a quoted
string and for a block comment.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 783826640da9..b71d47afd243 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -3514,7 +3514,7 @@ int kfd_criu_resume_svm(struct kfd_process *p)
 i, criu_svm_md->data.start_addr, 
criu_svm_md->data.size);
 
for (j = 0; j < num_attrs; j++) {
-   pr_debug("\ncriu_svm_md[%d]->attrs[%d].type : 0x%x 
\ncriu_svm_md[%d]->attrs[%d].value : 0x%x\n",
+   pr_debug("\ncriu_svm_md[%d]->attrs[%d].type : 
0x%x\ncriu_svm_md[%d]->attrs[%d].value : 0x%x\n",
 i, j, criu_svm_md->data.attrs[j].type,
 i, j, criu_svm_md->data.attrs[j].value);
switch (criu_svm_md->data.attrs[j].type) {
@@ -3601,7 +3601,8 @@ int kfd_criu_restore_svm(struct kfd_process *p,
num_devices = p->n_pdds;
/* Handle one SVM range object at a time, also the number of gpus are
 * assumed to be same on the restore node, checking must be done while
-* evaluating the topology earlier */
+* evaluating the topology earlier
+*/
 
svm_attrs_size = sizeof(struct kfd_ioctl_svm_attribute) *
(nattr_common + nattr_accessibility * num_devices);
-- 
2.17.1



[PATCH] drm/amdkfd: Fix prototype warning for get_process_num_bos

2022-02-10 Thread Rajneesh Bhardwaj
Fix the warning: no previous prototype for 'get_process_num_bos'
[-Wmissing-prototypes]

Reported-by: kernel test robot 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index d5699aa79578..54d997f304b5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1648,7 +1648,7 @@ static int criu_checkpoint_devices(struct kfd_process *p,
return ret;
 }
 
-uint32_t get_process_num_bos(struct kfd_process *p)
+static uint32_t get_process_num_bos(struct kfd_process *p)
 {
uint32_t num_of_bos = 0;
int i;
-- 
2.17.1



RE: [PATCH] drm/amdgpu: Add unique_id support for sienna cichlid

2022-02-10 Thread Quan, Evan
[AMD Official Use Only]

If this is only available with the latest pmfw, you might need to add a 
version guard there.
Otherwise, the latest driver combined with an old pmfw might return garbage data.

Also, "metrics_lock" was already dropped from the latest drm-next, so it seems 
you were working with an outdated kernel.

BR
Evan
> -----Original Message-----
> From: amd-gfx  On Behalf Of Kent
> Russell
> Sent: Thursday, February 10, 2022 11:43 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Russell, Kent 
> Subject: [PATCH] drm/amdgpu: Add unique_id support for sienna cichlid
> 
> This is being added to SMU Metrics, so add the required tie-ins in the kernel.
> Also create the corresponding unique_id sysfs file.
> 
> Signed-off-by: Kent Russell 
> ---
>  drivers/gpu/drm/amd/pm/amdgpu_pm.c|  3 +-
>  .../pmfw_if/smu11_driver_if_sienna_cichlid.h  | 12 +--
>  .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 33
> +++
>  3 files changed, 45 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> index ad5da252228b..f638bcfc3faa 100644
> --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> @@ -1969,7 +1969,8 @@ static int default_attr_update(struct
> amdgpu_device *adev, struct amdgpu_device_
>   if (asic_type != CHIP_VEGA10 &&
>   asic_type != CHIP_VEGA20 &&
>   asic_type != CHIP_ARCTURUS &&
> - asic_type != CHIP_ALDEBARAN)
> + asic_type != CHIP_ALDEBARAN &&
> + asic_type != CHIP_SIENNA_CICHLID)
>   *states = ATTR_STATE_UNSUPPORTED;
>   } else if (DEVICE_ATTR_IS(pp_features)) {
>   if (adev->flags & AMD_IS_APU || asic_type < CHIP_VEGA10)
> diff --git
> a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_
> cichlid.h
> b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_
> cichlid.h
> index b253be602cc2..c09dec2c4e1e 100644
> ---
> a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_
> cichlid.h
> +++
> b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_
> ci
> +++ chlid.h
> @@ -1419,8 +1419,12 @@ typedef struct {
>uint8_t  PcieRate   ;
>uint8_t  PcieWidth  ;
>uint16_t AverageGfxclkFrequencyTarget;
> -  uint16_t Padding16_2;
> 
> +  //PMFW-8711
> +  uint32_t PublicSerialNumLower32;
> +  uint32_t PublicSerialNumUpper32;
> +
> +  uint16_t Padding16_2;
>  } SmuMetrics_t;
> 
>  typedef struct {
> @@ -1476,8 +1480,12 @@ typedef struct {
>uint8_t  PcieRate   ;
>uint8_t  PcieWidth  ;
>uint16_t AverageGfxclkFrequencyTarget;
> -  uint16_t Padding16_2;
> 
> +  //PMFW-8711
> +  uint32_t PublicSerialNumLower32;
> +  uint32_t PublicSerialNumUpper32;
> +
> +  uint16_t Padding16_2;
>  } SmuMetrics_V2_t;
> 
>  typedef struct {
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
> b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
> index 2a7da2bad96a..048014f05b35 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
> @@ -451,6 +451,38 @@ static int sienna_cichlid_setup_pptable(struct
> smu_context *smu)
>   return ret;
>  }
> 
> +static void sienna_cichlid_get_unique_id(struct smu_context *smu) {
> + struct amdgpu_device *adev = smu->adev;
> + struct smu_table_context *smu_table = &smu->smu_table;
> + SmuMetrics_t *metrics =
> + &(((SmuMetricsExternal_t *)(smu_table->metrics_table))-
> >SmuMetrics);
> + SmuMetrics_V2_t *metrics_v2 =
> + &(((SmuMetricsExternal_t *)(smu_table->metrics_table))-
> >SmuMetrics_V2);
> + uint32_t upper32 = 0, lower32 = 0;
> + int ret;
> +
> + mutex_lock(&smu->metrics_lock);
> + ret = smu_cmn_get_metrics_table_locked(smu, NULL, false);
> + if (ret)
> + goto out_unlock;
> +
> + bool use_metrics_v2 = ((smu->adev->ip_versions[MP1_HWIP][0] ==
> IP_VERSION(11, 0, 7)) &&
> + (smu->smc_fw_version >= 0x3A4300)) ? true : false;
> +
> + upper32 = use_metrics_v2 ? metrics_v2->PublicSerialNumUpper32 :
> +metrics->PublicSerialNumUpper32;
> + lower32 = use_metrics_v2 ? metrics_v2->PublicSerialNumLower32 :
> +metrics->PublicSerialNumLower32;
> +
> +out_unlock:
> + mutex_unlock(&smu->metrics_lock);
> +
> + adev->unique_id = ((uint64_t)upper32 << 32) | lower32;
> + if (adev->serial[0] == '\0')
> + sprintf(adev->serial, "%016llx", adev->unique_id); }
> +
>  static int sienna_cichlid_tables_init(struct smu_context *smu)  {
>   struct smu_table_context *smu_table = &smu->smu_table; @@ -
> 4012,6 +4044,7 @@ static const struct pptable_funcs
> sienna_cichlid_ppt_funcs = {
>   .set_mp1_state = sienna_cichlid_set_mp1_state,
>   .stb_collect_info = sienna_cichli
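
The ID assembly in the quoted patch, two 32-bit halves combined into one 64-bit value and printed as 16 hex digits, can be sketched standalone as below. The function names are hypothetical. The sketch uses `snprintf` with an explicit buffer length rather than the `sprintf` in the quoted hunk, which is a substitution made here for safety in userspace and not something the patch does.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Combine the two halves reported in the metrics table. */
static uint64_t make_unique_id(uint32_t upper32, uint32_t lower32)
{
	/* Widen before shifting so the upper half is not lost. */
	return ((uint64_t)upper32 << 32) | lower32;
}

/*
 * Format as 16 zero-padded hex digits. buf must hold at least 17 bytes
 * (16 digits plus the terminating NUL). Returns snprintf's result.
 */
static int format_unique_id(char *buf, size_t len, uint64_t id)
{
	return snprintf(buf, len, "%016" PRIx64, id);
}
```

A version guard of the kind suggested in the reply would sit in front of `make_unique_id()`, skipping the read when the firmware is too old to populate the two fields.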

[PATCH] drm/amdgpu: no rlcg legacy read in SRIOV case

2022-02-10 Thread Guchun Chen
rlcg legacy read is not available in SRIOV configuration.
Without this change, gmc_v9_0_flush_gpu_tlb always reports
a timeout, which eventually breaks driver load.

v2: bypass read in amdgpu_virt_get_rlcg_reg_access_flag (from Victor)

Fixes: 0dc4a7e75581("drm/amdgpu: switch to get_rlcg_reg_access_flag for gfx9")
Signed-off-by: Guchun Chen 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index e1288901beb6..6668d7fa89e4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -836,7 +836,7 @@ static bool amdgpu_virt_get_rlcg_reg_access_flag(struct 
amdgpu_device *adev,
/* only in new version, AMDGPU_REGS_NO_KIQ and
 * AMDGPU_REGS_RLC are enabled simultaneously */
} else if ((acc_flags & AMDGPU_REGS_RLC) &&
-  !(acc_flags & AMDGPU_REGS_NO_KIQ)) {
+   !(acc_flags & AMDGPU_REGS_NO_KIQ) && write) {
*rlcg_flag = AMDGPU_RLCG_GC_WRITE_LEGACY;
ret = true;
}
@@ -940,7 +940,7 @@ void amdgpu_sriov_wreg(struct amdgpu_device *adev,
u32 rlcg_flag;
 
if (!amdgpu_sriov_runtime(adev) &&
-   amdgpu_virt_get_rlcg_reg_access_flag(adev, acc_flags, hwip, true, 
&rlcg_flag)) {
+   amdgpu_virt_get_rlcg_reg_access_flag(adev, acc_flags, hwip, 
true, &rlcg_flag)) {
amdgpu_virt_rlcg_reg_rw(adev, offset, value, rlcg_flag);
return;
}
@@ -957,7 +957,7 @@ u32 amdgpu_sriov_rreg(struct amdgpu_device *adev,
u32 rlcg_flag;
 
if (!amdgpu_sriov_runtime(adev) &&
-   amdgpu_virt_get_rlcg_reg_access_flag(adev, acc_flags, hwip, false, 
&rlcg_flag))
+   amdgpu_virt_get_rlcg_reg_access_flag(adev, acc_flags, hwip, 
false, &rlcg_flag))
return amdgpu_virt_rlcg_reg_rw(adev, offset, 0, rlcg_flag);
 
if (acc_flags & AMDGPU_REGS_NO_KIQ)
-- 
2.17.1
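
The decision this v2 patch changes can be modelled in a few lines. This is a sketch under stated assumptions: the flag bit values and all `mock_`/`MOCK_` names are invented for illustration, and the `indirect_gc` branch is inferred from the review comment about amdgpu_sriov_reg_indirect_gc() rather than from the hunk itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; the real definitions live in the driver. */
#define MOCK_REGS_NO_KIQ	(1u << 1)
#define MOCK_REGS_RLC		(1u << 2)

enum mock_rlcg_flag {
	MOCK_RLCG_NONE,
	MOCK_RLCG_GC_WRITE,
	MOCK_RLCG_GC_READ,
	MOCK_RLCG_GC_WRITE_LEGACY,
};

/* Returns true when the access should be routed through RLCG. */
static bool mock_get_rlcg_flag(bool indirect_gc, uint32_t acc_flags,
			       bool write, enum mock_rlcg_flag *flag)
{
	*flag = MOCK_RLCG_NONE;

	/* Assumed branch: indirect GC access supports reads and writes. */
	if (indirect_gc) {
		*flag = write ? MOCK_RLCG_GC_WRITE : MOCK_RLCG_GC_READ;
		return true;
	}

	/* Legacy path: the v2 patch adds the "&& write" condition. */
	if ((acc_flags & MOCK_REGS_RLC) &&
	    !(acc_flags & MOCK_REGS_NO_KIQ) && write) {
		*flag = MOCK_RLCG_GC_WRITE_LEGACY;
		return true;
	}

	return false;	/* caller uses another access path */
}
```

In this model a legacy read returns false, so the read path in the caller never reaches the RLCG helper, which matches the intent stated in the commit message.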



RE: [PATCH v2] drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.

2022-02-10 Thread Liang, Prike


> -----Original Message-----
> From: Limonciello, Mario 
> Sent: Friday, February 11, 2022 12:09 AM
> To: Alex Deucher ; Liang, Prike
> 
> Cc: Mahapatra, Rajib ; Deucher, Alexander
> ; amd-gfx@lists.freedesktop.org; S, Shirish
> 
> Subject: RE: [PATCH v2] drm/amdgpu: skipping SDMA hw_init and hw_fini for
> S0ix.
> 
> [Public]
> 
> > > VG doesn't do s0i3 right?
> >
> > Right.
> >
> > > No, YC should not take a similar fix.  YC had an architectural change and to
> > > avoid a "similar" problem takes 26db706a6d77b9e184feb11725e97e53b7a89519.
> >
> > Isn't that likely just a workaround for the same issue?  This seems cleaner.
> >
> 
> The SMU doesn't handle the restore of the SDMA registers for YC though,
> this explicitly changed.  So I don't believe we can do an identical fix there.
> 
> @Liang, Prike comments?

Yeah, in the gfx10 series it looks like the SMU no longer handles SDMA save and 
restore in the PMFW.


RE: [PATCH] drm/amdgpu: no rlcg legacy read in SRIOV case

2022-02-10 Thread Zhang, Hawking
[AMD Official Use Only]

Reviewed-by: Hawking Zhang 

Regards,
Hawking

-----Original Message-----
From: Chen, Guchun  
Sent: Friday, February 11, 2022 13:39
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; 
Zhou, Peng Ju ; Koenig, Christian 
; Deucher, Alexander ; 
Skvortsov, Victor 
Cc: Chen, Guchun 
Subject: [PATCH] drm/amdgpu: no rlcg legacy read in SRIOV case

rlcg legacy read is not available in SRIOV configuration.
Without this change, gmc_v9_0_flush_gpu_tlb always reports a timeout, which 
eventually breaks driver load.

v2: bypass read in amdgpu_virt_get_rlcg_reg_access_flag (from Victor)

Fixes: 0dc4a7e75581("drm/amdgpu: switch to get_rlcg_reg_access_flag for gfx9")
Signed-off-by: Guchun Chen 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index e1288901beb6..6668d7fa89e4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -836,7 +836,7 @@ static bool amdgpu_virt_get_rlcg_reg_access_flag(struct 
amdgpu_device *adev,
/* only in new version, AMDGPU_REGS_NO_KIQ and
 * AMDGPU_REGS_RLC are enabled simultaneously */
} else if ((acc_flags & AMDGPU_REGS_RLC) &&
-  !(acc_flags & AMDGPU_REGS_NO_KIQ)) {
+   !(acc_flags & AMDGPU_REGS_NO_KIQ) && write) {
*rlcg_flag = AMDGPU_RLCG_GC_WRITE_LEGACY;
ret = true;
}
@@ -940,7 +940,7 @@ void amdgpu_sriov_wreg(struct amdgpu_device *adev,
u32 rlcg_flag;
 
if (!amdgpu_sriov_runtime(adev) &&
-   amdgpu_virt_get_rlcg_reg_access_flag(adev, acc_flags, hwip, true, 
&rlcg_flag)) {
+   amdgpu_virt_get_rlcg_reg_access_flag(adev, acc_flags, hwip, 
true, 
+&rlcg_flag)) {
amdgpu_virt_rlcg_reg_rw(adev, offset, value, rlcg_flag);
return;
}
@@ -957,7 +957,7 @@ u32 amdgpu_sriov_rreg(struct amdgpu_device *adev,
u32 rlcg_flag;
 
if (!amdgpu_sriov_runtime(adev) &&
-   amdgpu_virt_get_rlcg_reg_access_flag(adev, acc_flags, hwip, false, 
&rlcg_flag))
+   amdgpu_virt_get_rlcg_reg_access_flag(adev, acc_flags, hwip, 
false, 
+&rlcg_flag))
return amdgpu_virt_rlcg_reg_rw(adev, offset, 0, rlcg_flag);
 
if (acc_flags & AMDGPU_REGS_NO_KIQ)
--
2.17.1


[PATCH Review 1/1] drm/amdgpu: Reset OOB table error count info

2022-02-10 Thread Stanley . Yang
The OOB table error count info should be reset after resetting
the EEPROM table.

Change-Id: I2a39e0e44b7b1a5ab7d6b4d4b73ebe48264396b7
Signed-off-by: Stanley.Yang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index c09d047272b2..2b844a5aafdb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -263,6 +263,7 @@ static int amdgpu_ras_eeprom_correct_header_tag(
  */
 int amdgpu_ras_eeprom_reset_table(struct amdgpu_ras_eeprom_control *control)
 {
+   struct amdgpu_device *adev = to_amdgpu_device(control);
struct amdgpu_ras_eeprom_table_header *hdr = &control->tbl_hdr;
u8 csum;
int res;
@@ -282,6 +283,8 @@ int amdgpu_ras_eeprom_reset_table(struct 
amdgpu_ras_eeprom_control *control)
control->ras_num_recs = 0;
control->ras_fri = 0;
 
+   amdgpu_dpm_send_hbm_bad_pages_num(adev, control->ras_num_recs);
+
amdgpu_ras_debugfs_set_ret_size(control);
 
mutex_unlock(&control->ras_tbl_mutex);
-- 
2.17.1



RE: [PATCH Review 1/1] drm/amdgpu: Reset OOB table error count info

2022-02-10 Thread Zhou1, Tao
[AMD Official Use Only]

Reviewed-by: Tao Zhou 

> -Original Message-
> From: Stanley.Yang 
> Sent: Friday, February 11, 2022 3:04 PM
> To: amd-gfx@lists.freedesktop.org; Zhang, Hawking
> ; Clements, John ;
> Zhou1, Tao 
> Cc: Yang, Stanley 
> Subject: [PATCH Review 1/1] drm/amdgpu: Reset OOB table error count info
> 
> The OOB table error count info should be reset after reset eeprom table
> 
> Change-Id: I2a39e0e44b7b1a5ab7d6b4d4b73ebe48264396b7
> Signed-off-by: Stanley.Yang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> index c09d047272b2..2b844a5aafdb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> @@ -263,6 +263,7 @@ static int amdgpu_ras_eeprom_correct_header_tag(
>   */
>  int amdgpu_ras_eeprom_reset_table(struct amdgpu_ras_eeprom_control
> *control)  {
> + struct amdgpu_device *adev = to_amdgpu_device(control);
>   struct amdgpu_ras_eeprom_table_header *hdr = &control->tbl_hdr;
>   u8 csum;
>   int res;
> @@ -282,6 +283,8 @@ int amdgpu_ras_eeprom_reset_table(struct
> amdgpu_ras_eeprom_control *control)
>   control->ras_num_recs = 0;
>   control->ras_fri = 0;
> 
> + amdgpu_dpm_send_hbm_bad_pages_num(adev, control->ras_num_recs);
> +
>   amdgpu_ras_debugfs_set_ret_size(control);
> 
>   mutex_unlock(&control->ras_tbl_mutex);
> --
> 2.17.1


[PATCH 01/12] drm/amd/pm: drop unused structure members

2022-02-10 Thread Evan Quan
Drop the structure members that are never used.

Signed-off-by: Evan Quan 
Change-Id: Iec70ad1dfe2059be26843f378588e6c894e9cae8
---
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
index fbef3ab8d487..fb32846a2d0e 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
@@ -373,8 +373,6 @@ struct smu_dpm_context {
 };
 
 struct smu_power_gate {
-   bool uvd_gated;
-   bool vce_gated;
atomic_t vcn_gated;
atomic_t jpeg_gated;
 };
-- 
2.29.0



[PATCH 02/12] drm/amd/pm: drop unused interfaces

2022-02-10 Thread Evan Quan
Drop those interfaces which never get used.

Signed-off-by: Evan Quan 
Change-Id: Ia22d395145a1003faca5ac792dca6a30ef2cae54
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 13 -
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  5 
 drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h  |  4 ---
 .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c|  6 -
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c| 27 ---
 5 files changed, 55 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 6535cf336fa5..1c3a5ccd100c 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -2686,19 +2686,6 @@ bool smu_mode1_reset_is_support(struct smu_context *smu)
return ret;
 }
 
-bool smu_mode2_reset_is_support(struct smu_context *smu)
-{
-   bool ret = false;
-
-   if (!smu->pm_enabled)
-   return false;
-
-   if (smu->ppt_funcs && smu->ppt_funcs->mode2_reset_is_support)
-   ret = smu->ppt_funcs->mode2_reset_is_support(smu);
-
-   return ret;
-}
-
 int smu_mode1_reset(struct smu_context *smu)
 {
int ret = 0;
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
index fb32846a2d0e..39d169440d15 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
@@ -1143,10 +1143,6 @@ struct pptable_funcs {
 * @mode1_reset_is_support: Check if GPU supports mode1 reset.
 */
bool (*mode1_reset_is_support)(struct smu_context *smu);
-   /**
-* @mode2_reset_is_support: Check if GPU supports mode2 reset.
-*/
-   bool (*mode2_reset_is_support)(struct smu_context *smu);
 
/**
 * @mode1_reset: Perform mode1 reset.
@@ -1397,7 +1393,6 @@ int smu_get_power_limit(void *handle,
enum pp_power_type pp_power_type);
 
 bool smu_mode1_reset_is_support(struct smu_context *smu);
-bool smu_mode2_reset_is_support(struct smu_context *smu);
 int smu_mode1_reset(struct smu_context *smu);
 
 extern const struct amd_ip_funcs smu_ip_funcs;
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h
index 44af23ae059e..10f41cab796e 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h
@@ -156,12 +156,8 @@ int smu_v13_0_notify_memory_pool_location(struct 
smu_context *smu);
 int smu_v13_0_system_features_control(struct smu_context *smu,
  bool en);
 
-int smu_v13_0_init_display_count(struct smu_context *smu, uint32_t count);
-
 int smu_v13_0_set_allowed_mask(struct smu_context *smu);
 
-int smu_v13_0_notify_display_change(struct smu_context *smu);
-
 int smu_v13_0_get_current_power_limit(struct smu_context *smu,
  uint32_t *power_limit);
 
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index 890acc4e2cb8..d7e619728e60 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -1967,11 +1967,6 @@ static bool aldebaran_is_mode1_reset_supported(struct 
smu_context *smu)
return true;
 }
 
-static bool aldebaran_is_mode2_reset_supported(struct smu_context *smu)
-{
-   return true;
-}
-
 static int aldebaran_set_mp1_state(struct smu_context *smu,
   enum pp_mp1_state mp1_state)
 {
@@ -2052,7 +2047,6 @@ static const struct pptable_funcs aldebaran_ppt_funcs = {
.set_pp_feature_mask = smu_cmn_set_pp_feature_mask,
.get_gpu_metrics = aldebaran_get_gpu_metrics,
.mode1_reset_is_support = aldebaran_is_mode1_reset_supported,
-   .mode2_reset_is_support = aldebaran_is_mode2_reset_supported,
.smu_handle_passthrough_sbr = aldebaran_smu_handle_passthrough_sbr,
.mode1_reset = aldebaran_mode1_reset,
.set_mp1_state = aldebaran_set_mp1_state,
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
index f0ab1dc3ca59..b4fd148754ac 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
@@ -703,19 +703,6 @@ int smu_v13_0_set_tool_table_location(struct smu_context 
*smu)
return ret;
 }
 
-int smu_v13_0_init_display_count(struct smu_context *smu, uint32_t count)
-{
-   int ret = 0;
-
-   if (!smu->pm_enabled)
-   return ret;
-
-   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_NumOfDisplays, 
count, NULL);
-
-   return ret;
-}
-
-
 int smu_v13_0_set_allowed_mask(struct smu_context *smu)
 {
struct smu_feature *feature = &smu->smu_feature;
@@ -768,20 +755,6 @@ int smu_v13_0_system_features_control(struct smu_context 
*smu,
  S

[PATCH 03/12] drm/amd/pm: drop unneeded !smu->pm_enabled check

2022-02-10 Thread Evan Quan
smu->pm_enabled is a prerequisite for adev->pm.dpm_enabled, so
whenever adev->pm.dpm_enabled is set, smu->pm_enabled is guaranteed
to be set as well. The extra "!smu->pm_enabled" check is therefore
unnecessary.

Signed-off-by: Evan Quan 
Change-Id: I6ff67137d447e6a3d8cc627b397428fed22753f3
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 84 +++
 1 file changed, 42 insertions(+), 42 deletions(-)
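
The reasoning in the commit message can be checked with a minimal sketch.
It assumes the invariant stated above (dpm_enabled is only ever set when
pm_enabled is set). The struct and function names are hypothetical
stand-ins, not the real driver structures.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; not the real structures. */
struct mock_pm_state {
	bool pm_enabled;   /* models smu->pm_enabled */
	bool dpm_enabled;  /* models adev->pm.dpm_enabled */
};

/* Old form of the guard: both flags are tested. */
bool old_guard_rejects(const struct mock_pm_state *s)
{
	return !s->pm_enabled || !s->dpm_enabled;
}

/* New form of the guard: only dpm_enabled is tested. */
bool new_guard_rejects(const struct mock_pm_state *s)
{
	return !s->dpm_enabled;
}

/*
 * Returns true when both guards agree for every state that respects the
 * assumed invariant "dpm_enabled implies pm_enabled".
 */
bool guards_agree_under_invariant(void)
{
	for (int pm = 0; pm <= 1; pm++) {
		for (int dpm = 0; dpm <= 1; dpm++) {
			struct mock_pm_state s = { pm != 0, dpm != 0 };

			if (s.dpm_enabled && !s.pm_enabled)
				continue; /* excluded by the invariant */
			if (old_guard_rejects(&s) != new_guard_rejects(&s))
				return false;
		}
	}
	return true;
}
```

The two guards differ only in the state where dpm_enabled is set while
pm_enabled is clear, so the simplification is safe only as long as that
state cannot occur.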

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 1c3a5ccd100c..96a3388c2cb7 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -68,7 +68,7 @@ static int smu_sys_get_pp_feature_mask(void *handle,
 {
struct smu_context *smu = handle;
 
-   if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled)
+   if (!smu->adev->pm.dpm_enabled)
return -EOPNOTSUPP;
 
return smu_get_pp_feature_mask(smu, buf);
@@ -79,7 +79,7 @@ static int smu_sys_set_pp_feature_mask(void *handle,
 {
struct smu_context *smu = handle;
 
-   if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled)
+   if (!smu->adev->pm.dpm_enabled)
return -EOPNOTSUPP;
 
return smu_set_pp_feature_mask(smu, new_mask);
@@ -219,7 +219,7 @@ static int smu_dpm_set_power_gate(void *handle,
struct smu_context *smu = handle;
int ret = 0;
 
-   if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled) {
+   if (!smu->adev->pm.dpm_enabled) {
dev_WARN(smu->adev->dev,
 "SMU uninitialized but power %s requested for %u!\n",
 gate ? "gate" : "ungate", block_type);
@@ -315,7 +315,7 @@ static void smu_restore_dpm_user_profile(struct smu_context 
*smu)
if (!smu->adev->in_suspend)
return;
 
-   if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled)
+   if (!smu->adev->pm.dpm_enabled)
return;
 
/* Enable restore flag */
@@ -428,7 +428,7 @@ static int smu_sys_get_pp_table(void *handle,
struct smu_context *smu = handle;
struct smu_table_context *smu_table = &smu->smu_table;
 
-   if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled)
+   if (!smu->adev->pm.dpm_enabled)
return -EOPNOTSUPP;
 
if (!smu_table->power_play_table && !smu_table->hardcode_pptable)
@@ -451,7 +451,7 @@ static int smu_sys_set_pp_table(void *handle,
ATOM_COMMON_TABLE_HEADER *header = (ATOM_COMMON_TABLE_HEADER *)buf;
int ret = 0;
 
-   if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled)
+   if (!smu->adev->pm.dpm_enabled)
return -EOPNOTSUPP;
 
if (header->usStructureSize != size) {
@@ -1564,7 +1564,7 @@ static int smu_display_configuration_change(void *handle,
int index = 0;
int num_of_active_display = 0;
 
-   if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled)
+   if (!smu->adev->pm.dpm_enabled)
return -EOPNOTSUPP;
 
if (!display_config)
@@ -1704,7 +1704,7 @@ static int smu_handle_task(struct smu_context *smu,
 {
int ret = 0;
 
-   if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled)
+   if (!smu->adev->pm.dpm_enabled)
return -EOPNOTSUPP;
 
switch (task_id) {
@@ -1745,7 +1745,7 @@ static int smu_switch_power_profile(void *handle,
long workload;
uint32_t index;
 
-   if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled)
+   if (!smu->adev->pm.dpm_enabled)
return -EOPNOTSUPP;
 
if (!(type < PP_SMC_POWER_PROFILE_CUSTOM))
@@ -1775,7 +1775,7 @@ static enum amd_dpm_forced_level 
smu_get_performance_level(void *handle)
struct smu_context *smu = handle;
struct smu_dpm_context *smu_dpm_ctx = &(smu->smu_dpm);
 
-   if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled)
+   if (!smu->adev->pm.dpm_enabled)
return -EOPNOTSUPP;
 
if (!smu->is_apu && !smu_dpm_ctx->dpm_context)
@@ -1791,7 +1791,7 @@ static int smu_force_performance_level(void *handle,
struct smu_dpm_context *smu_dpm_ctx = &(smu->smu_dpm);
int ret = 0;
 
-   if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled)
+   if (!smu->adev->pm.dpm_enabled)
return -EOPNOTSUPP;
 
if (!smu->is_apu && !smu_dpm_ctx->dpm_context)
@@ -1817,7 +1817,7 @@ static int smu_set_display_count(void *handle, uint32_t 
count)
 {
struct smu_context *smu = handle;
 
-   if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled)
+   if (!smu->adev->pm.dpm_enabled)
return -EOPNOTSUPP;
 
return smu_init_display_count(smu, count);
@@ -1830,7 +1830,7 @@ static int smu_force_smuclk_levels(struct smu_context 
*smu,
struct smu_dpm_context *smu_dpm_ctx = &(smu->smu_dpm);
int ret = 0;
 
-   if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled)
+   if (!smu->adev->pm.dpm_enabled)
return -EOPN

[PATCH 04/12] drm/amd/pm: use adev->pm.dpm_enabled for dpm enablement check

2022-02-10 Thread Evan Quan
Use adev->pm.dpm_enabled instead of hwmgr->pm_en, since it better
reflects whether the dpm features are actually enabled.

Signed-off-by: Evan Quan 
Change-Id: I6896dcee19bb473d26115cdcb12b6efd554b30f9
---
 drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c|  39 +++---
 drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c|  39 +++---
 .../gpu/drm/amd/pm/powerplay/amd_powerplay.c  | 116 +-
 .../gpu/drm/amd/pm/powerplay/hwmgr/hwmgr.c|   6 +
 4 files changed, 104 insertions(+), 96 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c 
b/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c
index 8b23cc9f098a..19e75a3c8bb1 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c
@@ -3079,8 +3079,9 @@ static int kv_dpm_hw_fini(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
-   if (adev->pm.dpm_enabled)
-   kv_dpm_disable(adev);
+   adev->pm.dpm_enabled = false;
+
+   kv_dpm_disable(adev);
 
return 0;
 }
@@ -3089,12 +3090,13 @@ static int kv_dpm_suspend(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
-   if (adev->pm.dpm_enabled) {
-   /* disable dpm */
-   kv_dpm_disable(adev);
-   /* reset the power state */
-   adev->pm.dpm.current_ps = adev->pm.dpm.requested_ps = 
adev->pm.dpm.boot_ps;
-   }
+   adev->pm.dpm_enabled = false;
+
+   /* disable dpm */
+   kv_dpm_disable(adev);
+   /* reset the power state */
+   adev->pm.dpm.current_ps = adev->pm.dpm.requested_ps = 
adev->pm.dpm.boot_ps;
+
return 0;
 }
 
@@ -3103,17 +3105,16 @@ static int kv_dpm_resume(void *handle)
int ret;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
-   if (adev->pm.dpm_enabled) {
-   /* asic init will reset to the boot state */
-   kv_dpm_setup_asic(adev);
-   ret = kv_dpm_enable(adev);
-   if (ret)
-   adev->pm.dpm_enabled = false;
-   else
-   adev->pm.dpm_enabled = true;
-   if (adev->pm.dpm_enabled)
-   amdgpu_legacy_dpm_compute_clocks(adev);
-   }
+   /* asic init will reset to the boot state */
+   kv_dpm_setup_asic(adev);
+   ret = kv_dpm_enable(adev);
+   if (ret)
+   adev->pm.dpm_enabled = false;
+   else
+   adev->pm.dpm_enabled = true;
+   if (adev->pm.dpm_enabled)
+   amdgpu_legacy_dpm_compute_clocks(adev);
+
return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c 
b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
index caae54487f9c..c6a294af8de8 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
@@ -7847,8 +7847,9 @@ static int si_dpm_hw_fini(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
-   if (adev->pm.dpm_enabled)
-   si_dpm_disable(adev);
+   adev->pm.dpm_enabled = false;
+
+   si_dpm_disable(adev);
 
return 0;
 }
@@ -7857,12 +7858,13 @@ static int si_dpm_suspend(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
-   if (adev->pm.dpm_enabled) {
-   /* disable dpm */
-   si_dpm_disable(adev);
-   /* reset the power state */
-   adev->pm.dpm.current_ps = adev->pm.dpm.requested_ps = 
adev->pm.dpm.boot_ps;
-   }
+   adev->pm.dpm_enabled = false;
+
+   /* disable dpm */
+   si_dpm_disable(adev);
+   /* reset the power state */
+   adev->pm.dpm.current_ps = adev->pm.dpm.requested_ps = 
adev->pm.dpm.boot_ps;
+
return 0;
 }
 
@@ -7871,17 +7873,16 @@ static int si_dpm_resume(void *handle)
int ret;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
-   if (adev->pm.dpm_enabled) {
-   /* asic init will reset to the boot state */
-   si_dpm_setup_asic(adev);
-   ret = si_dpm_enable(adev);
-   if (ret)
-   adev->pm.dpm_enabled = false;
-   else
-   adev->pm.dpm_enabled = true;
-   if (adev->pm.dpm_enabled)
-   amdgpu_legacy_dpm_compute_clocks(adev);
-   }
+   /* asic init will reset to the boot state */
+   si_dpm_setup_asic(adev);
+   ret = si_dpm_enable(adev);
+   if (ret)
+   adev->pm.dpm_enabled = false;
+   else
+   adev->pm.dpm_enabled = true;
+   if (adev->pm.dpm_enabled)
+   amdgpu_legacy_dpm_compute_clocks(adev);
+
return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c 
b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
index a2da46bf3985..991ac4adb263 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/

[PATCH 05/12] drm/amd/pm: move the check for dpm enablement to amdgpu_dpm.c

2022-02-10 Thread Evan Quan
Instead of checking this in each power-management framework, move the
check to amdgpu_dpm.c, which is the more appropriate place. This also
keeps the code clean and tidy.

Signed-off-by: Evan Quan 
Change-Id: I2f83a3b860e8aa12cc86f119011f520fbe21a301
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   |   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  16 +-
 drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 277 --
 drivers/gpu/drm/amd/pm/amdgpu_pm.c|  25 +-
 drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |  12 +-
 .../gpu/drm/amd/pm/legacy-dpm/legacy_dpm.c|   4 -
 .../gpu/drm/amd/pm/powerplay/amd_powerplay.c  | 117 
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 135 +
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |   1 -
 9 files changed, 352 insertions(+), 240 deletions(-)
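
One consequence of centralizing the check is that a getter which used to
return its value directly now returns a status code and passes the value
through an out-parameter. A minimal sketch of that pattern follows. The
types and names are hypothetical reductions, not the real amdgpu
interfaces.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced models of the driver types. */
enum mock_forced_level {
	MOCK_LEVEL_AUTO,
	MOCK_LEVEL_PROFILE_STANDARD,
};

struct mock_device {
	bool dpm_enabled;                          /* models adev->pm.dpm_enabled */
	enum mock_forced_level (*get_level)(void); /* models a pp_funcs callback */
};

/* Example backend callback that reports a fixed level. */
enum mock_forced_level mock_backend_level(void)
{
	return MOCK_LEVEL_PROFILE_STANDARD;
}

/*
 * Central wrapper: the enablement check lives here once, so the backends
 * do not have to repeat it. The status goes through the return value and
 * the level through *level, so an error cannot be mistaken for a level.
 */
int mock_dpm_get_performance_level(const struct mock_device *dev,
				   enum mock_forced_level *level)
{
	if (!dev || !level)
		return -EINVAL;
	if (!dev->dpm_enabled)
		return -EOPNOTSUPP;
	if (!dev->get_level)
		return -EOPNOTSUPP;

	*level = dev->get_level();
	return 0;
}
```

A caller tests the return value first and reads the level only on
success, which is the shape the amdgpu_ctx.c hunk below adopts.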

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 2c929fa40379..fff0e6a3882e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -261,11 +261,14 @@ static int amdgpu_ctx_get_stable_pstate(struct amdgpu_ctx 
*ctx,
 {
struct amdgpu_device *adev = ctx->adev;
enum amd_dpm_forced_level current_level;
+   int ret = 0;
 
if (!ctx)
return -EINVAL;
 
-   current_level = amdgpu_dpm_get_performance_level(adev);
+   ret = amdgpu_dpm_get_performance_level(adev, &current_level);
+   if (ret)
+   return ret;
 
switch (current_level) {
case AMD_DPM_FORCED_LEVEL_PROFILE_STANDARD:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 9f985bd463be..56144f25b720 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -813,15 +813,17 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
unsigned i;
struct drm_amdgpu_info_vce_clock_table vce_clk_table = {};
struct amd_vce_state *vce_state;
+   int ret = 0;
 
for (i = 0; i < AMDGPU_VCE_CLOCK_TABLE_ENTRIES; i++) {
-   vce_state = amdgpu_dpm_get_vce_clock_state(adev, i);
-   if (vce_state) {
-   vce_clk_table.entries[i].sclk = vce_state->sclk;
-   vce_clk_table.entries[i].mclk = vce_state->mclk;
-   vce_clk_table.entries[i].eclk = 
vce_state->evclk;
-   vce_clk_table.num_valid_entries++;
-   }
+   ret = amdgpu_dpm_get_vce_clock_state(adev, i, 
vce_state);
+   if (ret)
+   return ret;
+
+   vce_clk_table.entries[i].sclk = vce_state->sclk;
+   vce_clk_table.entries[i].mclk = vce_state->mclk;
+   vce_clk_table.entries[i].eclk = vce_state->evclk;
+   vce_clk_table.num_valid_entries++;
}
 
return copy_to_user(out, &vce_clk_table,
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index 1d63f1e8884c..b46ae0063047 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -41,6 +41,9 @@ int amdgpu_dpm_get_sclk(struct amdgpu_device *adev, bool low)
const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
int ret = 0;
 
+   if (!adev->pm.dpm_enabled)
+   return 0;
+
if (!pp_funcs->get_sclk)
return 0;
 
@@ -57,6 +60,9 @@ int amdgpu_dpm_get_mclk(struct amdgpu_device *adev, bool low)
const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
int ret = 0;
 
+   if (!adev->pm.dpm_enabled)
+   return 0;
+
if (!pp_funcs->get_mclk)
return 0;
 
@@ -74,6 +80,13 @@ int amdgpu_dpm_set_powergating_by_smu(struct amdgpu_device 
*adev, uint32_t block
const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
enum ip_power_state pwr_state = gate ? POWER_STATE_OFF : POWER_STATE_ON;
 
+   if (!adev->pm.dpm_enabled) {
+   dev_WARN(adev->dev,
+"SMU uninitialized but power %s requested for %u!\n",
+gate ? "gate" : "ungate", block_type);
+   return -EOPNOTSUPP;
+   }
+
if (atomic_read(&adev->pm.pwr_state[block_type]) == pwr_state) {
dev_dbg(adev->dev, "IP block%d already in the target %s state!",
block_type, gate ? "gate" : "ungate");
@@ -261,6 +274,9 @@ int amdgpu_dpm_switch_power_profile(struct amdgpu_device 
*adev,
const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
int ret = 0;
 
+   if (!adev->pm.dpm_enabled)
+   return -EOPNOTSUPP;
+
if (amdgpu_sriov_vf(adev))
return 0;
 
@@ -

[PATCH 08/12] drm/amd/pm: add proper check for amdgpu_dpm before granting pp_dpm_load_fw

2022-02-10 Thread Evan Quan
Make sure the interface is granted only when amdgpu_dpm is enabled.

Signed-off-by: Evan Quan 
Change-Id: Ia1d1123470fab89b41b24ea80dcb319570aa7438
---
 drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c | 6 ++
 drivers/gpu/drm/amd/pm/powerplay/hwmgr/hwmgr.c   | 3 ---
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c 
b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
index 4c709f7bcd51..e95893556147 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
@@ -49,6 +49,9 @@ static int amd_powerplay_create(struct amdgpu_device *adev)
 
hwmgr->adev = adev;
hwmgr->not_vf = !amdgpu_sriov_vf(adev);
+   hwmgr->pp_one_vf = amdgpu_sriov_is_pp_one_vf(adev);
+   hwmgr->pm_en = (amdgpu_dpm && (hwmgr->not_vf || hwmgr->pp_one_vf))
+   ? true : false;
hwmgr->device = amdgpu_cgs_create_device(adev);
mutex_init(&hwmgr->msg_lock);
hwmgr->chip_family = adev->family;
@@ -275,6 +278,9 @@ static int pp_dpm_load_fw(void *handle)
 {
struct pp_hwmgr *hwmgr = handle;
 
+   if (!hwmgr->pm_en)
+   return -EOPNOTSUPP;
+
if (!hwmgr || !hwmgr->smumgr_funcs || !hwmgr->smumgr_funcs->start_smu)
return -EINVAL;
 
diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/hwmgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/hwmgr.c
index 4fd61d7f6c70..c0c2f36094fa 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/hwmgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/hwmgr.c
@@ -217,9 +217,6 @@ int hwmgr_hw_init(struct pp_hwmgr *hwmgr)
 {
int ret = 0;
 
-   hwmgr->pp_one_vf = amdgpu_sriov_is_pp_one_vf((struct amdgpu_device 
*)hwmgr->adev);
-   hwmgr->pm_en = (amdgpu_dpm && (hwmgr->not_vf || hwmgr->pp_one_vf))
-   ? true : false;
if (!hwmgr->pm_en)
return 0;
 
-- 
2.29.0



[PATCH 06/12] drm/amd/pm: correct the checks for sriov(pp_one_vf)

2022-02-10 Thread Evan Quan
By setting pm_enabled to false for the non-pp_one_vf SR-IOV case,
we can avoid the check for (amdgpu_sriov_vf(adev) &&
!amdgpu_sriov_is_pp_one_vf(adev)) in every routine.

Signed-off-by: Evan Quan 
Change-Id: I3859529183cd26dce98c57dc87eab5273ecc949b
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 21 -
 1 file changed, 4 insertions(+), 17 deletions(-)
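
The claim above can be illustrated with a minimal sketch that enumerates
all input combinations. The struct and helper names are hypothetical. The
booleans only model the amdgpu_dpm parameter and the two SR-IOV helpers
named in the commit message.

```c
#include <stdbool.h>

/* Hypothetical inputs standing in for amdgpu_dpm and the SR-IOV helpers. */
struct mock_config {
	bool dpm_param;  /* models the amdgpu_dpm parameter being non-zero */
	bool sriov_vf;   /* models amdgpu_sriov_vf(adev) */
	bool pp_one_vf;  /* models amdgpu_sriov_is_pp_one_vf(adev) */
};

/* Computed once at early init, mirroring the expression in the patch. */
bool mock_pm_enabled(const struct mock_config *c)
{
	return c->dpm_param && (!c->sriov_vf || c->pp_one_vf);
}

/* The per-routine test that the patch removes. */
bool mock_is_unsupported_vf(const struct mock_config *c)
{
	return c->sriov_vf && !c->pp_one_vf;
}

/*
 * Checks all eight input combinations: an unsupported VF never ends up
 * with pm_enabled set, so testing the flag alone is sufficient.
 */
bool mock_flag_covers_vf_check(void)
{
	for (int bits = 0; bits < 8; bits++) {
		struct mock_config c = {
			.dpm_param = (bits & 1) != 0,
			.sriov_vf  = (bits & 2) != 0,
			.pp_one_vf = (bits & 4) != 0,
		};

		if (mock_is_unsupported_vf(&c) && mock_pm_enabled(&c))
			return false;
	}
	return true;
}
```

Because the flag already encodes the SR-IOV condition, a single
"!pm_enabled" early return can replace the repeated VF test in each
routine.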

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 97c57a6cf314..8b8feaf7aa0e 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -543,7 +543,8 @@ static int smu_early_init(void *handle)
return -ENOMEM;
 
smu->adev = adev;
-   smu->pm_enabled = !!amdgpu_dpm;
+   smu->pm_enabled = amdgpu_dpm &&
+ (!amdgpu_sriov_vf(adev) || 
amdgpu_sriov_is_pp_one_vf(adev));
smu->is_apu = false;
smu->smu_baco.state = SMU_BACO_STATE_EXIT;
smu->smu_baco.platform_support = false;
@@ -1257,10 +1258,8 @@ static int smu_hw_init(void *handle)
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
struct smu_context *smu = adev->powerplay.pp_handle;
 
-   if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev)) {
-   smu->pm_enabled = false;
+   if (!smu->pm_enabled)
return 0;
-   }
 
ret = smu_start_smc_engine(smu);
if (ret) {
@@ -1274,9 +1273,6 @@ static int smu_hw_init(void *handle)
smu_set_gfx_cgpg(smu, true);
}
 
-   if (!smu->pm_enabled)
-   return 0;
-
/* get boot_values from vbios to set revision, gfxclk, and etc. */
ret = smu_get_vbios_bootup_values(smu);
if (ret) {
@@ -1428,7 +1424,7 @@ static int smu_hw_fini(void *handle)
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
struct smu_context *smu = adev->powerplay.pp_handle;
 
-   if (amdgpu_sriov_vf(adev)&& !amdgpu_sriov_is_pp_one_vf(adev))
+   if (!smu->pm_enabled)
return 0;
 
smu_dpm_set_vcn_enable(smu, false);
@@ -1437,9 +1433,6 @@ static int smu_hw_fini(void *handle)
adev->vcn.cur_state = AMD_PG_STATE_GATE;
adev->jpeg.cur_state = AMD_PG_STATE_GATE;
 
-   if (!smu->pm_enabled)
-   return 0;
-
adev->pm.dpm_enabled = false;
 
return smu_smc_hw_cleanup(smu);
@@ -1479,9 +1472,6 @@ static int smu_suspend(void *handle)
struct smu_context *smu = adev->powerplay.pp_handle;
int ret;
 
-   if (amdgpu_sriov_vf(adev)&& !amdgpu_sriov_is_pp_one_vf(adev))
-   return 0;
-
if (!smu->pm_enabled)
return 0;
 
@@ -1504,9 +1494,6 @@ static int smu_resume(void *handle)
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
struct smu_context *smu = adev->powerplay.pp_handle;
 
-   if (amdgpu_sriov_vf(adev)&& !amdgpu_sriov_is_pp_one_vf(adev))
-   return 0;
-
if (!smu->pm_enabled)
return 0;
 
-- 
2.29.0



[PATCH 07/12] drm/amd/pm: correct the checks for granting gpu reset APIs

2022-02-10 Thread Evan Quan
Those gpu reset APIs can be granted when:
  - The system is up and dpm features are enabled.
  - The system is resuming and dpm features are not yet enabled.
    In this scenario, the PMFW is already alive and can support
    the gpu reset functionality.

Signed-off-by: Evan Quan 
Change-Id: I8c2f07138921eb53a2bd7fb94f9b3622af0eacf8
---
 .../gpu/drm/amd/include/kgd_pp_interface.h|  1 +
 drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 34 +++
 .../gpu/drm/amd/pm/powerplay/amd_powerplay.c  | 42 +++
 .../drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c   |  1 +
 .../drm/amd/pm/powerplay/hwmgr/smu8_hwmgr.c   | 17 
 drivers/gpu/drm/amd/pm/powerplay/inc/hwmgr.h  |  1 +
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 32 +++---
 7 files changed, 101 insertions(+), 27 deletions(-)
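
The gating described above can be sketched as follows, assuming a
callback table with an is_smc_alive hook like the one this patch adds to
amd_pm_funcs. Everything prefixed with mock_ is hypothetical and only
models the pattern.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of a callback table such as amd_pm_funcs. */
struct mock_pm_funcs {
	bool (*is_smc_alive)(void *handle);
};

struct mock_device {
	const struct mock_pm_funcs *funcs;
	void *handle;
};

/* Example backend: the handle points at a bool holding the firmware state. */
bool mock_backend_is_smc_alive(void *handle)
{
	return handle && *(bool *)handle;
}

/*
 * The callback has to be invoked with the handle. Returning the pointer
 * itself would convert to true whenever the callback is installed.
 */
bool mock_is_smc_alive(const struct mock_device *dev)
{
	if (!dev || !dev->funcs || !dev->funcs->is_smc_alive)
		return false;

	return dev->funcs->is_smc_alive(dev->handle);
}

/* A reset entry point gated on firmware liveness instead of dpm_enabled. */
int mock_mode2_reset(const struct mock_device *dev)
{
	if (!mock_is_smc_alive(dev))
		return -EOPNOTSUPP;

	return 0; /* the real reset request would be issued here */
}
```

Gating on firmware liveness rather than on dpm_enabled is what lets the
reset paths work during resume, before the dpm features are re-enabled.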

diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h 
b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
index a4c267f15959..892648a4a353 100644
--- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
@@ -409,6 +409,7 @@ struct amd_pm_funcs {
   struct dpm_clocks *clock_table);
int (*get_smu_prv_buf_details)(void *handle, void **addr, size_t *size);
void (*pm_compute_clocks)(void *handle);
+   bool (*is_smc_alive)(void *handle);
 };
 
 struct metrics_table_header {
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index b46ae0063047..5f1d3342f87b 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -120,12 +120,25 @@ int amdgpu_dpm_set_powergating_by_smu(struct 
amdgpu_device *adev, uint32_t block
return ret;
 }
 
+static bool amdgpu_dpm_is_smc_alive(struct amdgpu_device *adev)
+{
+   const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
+
+   if (!pp_funcs || !pp_funcs->is_smc_alive)
+   return false;
+
+   return pp_funcs->is_smc_alive(adev->powerplay.pp_handle);
+}
+
 int amdgpu_dpm_baco_enter(struct amdgpu_device *adev)
 {
const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
void *pp_handle = adev->powerplay.pp_handle;
int ret = 0;
 
+   if (!amdgpu_dpm_is_smc_alive(adev))
+   return -EOPNOTSUPP;
+
if (!pp_funcs || !pp_funcs->set_asic_baco_state)
return -ENOENT;
 
@@ -145,6 +158,9 @@ int amdgpu_dpm_baco_exit(struct amdgpu_device *adev)
void *pp_handle = adev->powerplay.pp_handle;
int ret = 0;
 
+   if (!amdgpu_dpm_is_smc_alive(adev))
+   return -EOPNOTSUPP;
+
if (!pp_funcs || !pp_funcs->set_asic_baco_state)
return -ENOENT;
 
@@ -164,6 +180,9 @@ int amdgpu_dpm_set_mp1_state(struct amdgpu_device *adev,
int ret = 0;
const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
 
+   if (!amdgpu_dpm_is_smc_alive(adev))
+   return -EOPNOTSUPP;
+
if (pp_funcs && pp_funcs->set_mp1_state) {
mutex_lock(&adev->pm.mutex);
 
@@ -184,6 +203,9 @@ bool amdgpu_dpm_is_baco_supported(struct amdgpu_device 
*adev)
bool baco_cap;
int ret = 0;
 
+   if (!amdgpu_dpm_is_smc_alive(adev))
+   return false;
+
if (!pp_funcs || !pp_funcs->get_asic_baco_capability)
return false;
 
@@ -203,6 +225,9 @@ int amdgpu_dpm_mode2_reset(struct amdgpu_device *adev)
void *pp_handle = adev->powerplay.pp_handle;
int ret = 0;
 
+   if (!amdgpu_dpm_is_smc_alive(adev))
+   return -EOPNOTSUPP;
+
if (!pp_funcs || !pp_funcs->asic_reset_mode_2)
return -ENOENT;
 
@@ -221,6 +246,9 @@ int amdgpu_dpm_baco_reset(struct amdgpu_device *adev)
void *pp_handle = adev->powerplay.pp_handle;
int ret = 0;
 
+   if (!amdgpu_dpm_is_smc_alive(adev))
+   return -EOPNOTSUPP;
+
if (!pp_funcs || !pp_funcs->set_asic_baco_state)
return -ENOENT;
 
@@ -244,6 +272,9 @@ bool amdgpu_dpm_is_mode1_reset_supported(struct 
amdgpu_device *adev)
struct smu_context *smu = adev->powerplay.pp_handle;
bool support_mode1_reset = false;
 
+   if (!amdgpu_dpm_is_smc_alive(adev))
+   return false;
+
if (is_support_sw_smu(adev)) {
mutex_lock(&adev->pm.mutex);
support_mode1_reset = smu_mode1_reset_is_support(smu);
@@ -258,6 +289,9 @@ int amdgpu_dpm_mode1_reset(struct amdgpu_device *adev)
struct smu_context *smu = adev->powerplay.pp_handle;
int ret = -EOPNOTSUPP;
 
+   if (!amdgpu_dpm_is_smc_alive(adev))
+   return -EOPNOTSUPP;
+
if (is_support_sw_smu(adev)) {
mutex_lock(&adev->pm.mutex);
ret = smu_mode1_reset(smu);
diff --git a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c 
b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
index bba923cfe08c..4c709f7bcd51 100644
--- a/drivers/gpu/drm/amd/pm/

[PATCH 09/12] drm/amd/pm: drop redundant !pp_funcs check

2022-02-10 Thread Evan Quan
This is covered by the "!adev->pm.dpm_enabled" check: whenever
adev->pm.dpm_enabled is true, pp_funcs is guaranteed to be non-NULL.

Signed-off-by: Evan Quan 
Change-Id: Iec801f18a0069ad5fd384c4133016977fb2b67e8
---
 drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index 5f1d3342f87b..f237dd3a3f66 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -104,7 +104,7 @@ int amdgpu_dpm_set_powergating_by_smu(struct amdgpu_device 
*adev, uint32_t block
case AMD_IP_BLOCK_TYPE_JPEG:
case AMD_IP_BLOCK_TYPE_GMC:
case AMD_IP_BLOCK_TYPE_ACP:
-   if (pp_funcs && pp_funcs->set_powergating_by_smu)
+   if (pp_funcs->set_powergating_by_smu)
ret = (pp_funcs->set_powergating_by_smu(
(adev)->powerplay.pp_handle, block_type, gate));
break;
@@ -314,7 +314,7 @@ int amdgpu_dpm_switch_power_profile(struct amdgpu_device 
*adev,
if (amdgpu_sriov_vf(adev))
return 0;
 
-   if (pp_funcs && pp_funcs->switch_power_profile) {
+   if (pp_funcs->switch_power_profile) {
mutex_lock(&adev->pm.mutex);
ret = pp_funcs->switch_power_profile(
adev->powerplay.pp_handle, type, en);
@@ -333,7 +333,7 @@ int amdgpu_dpm_set_xgmi_pstate(struct amdgpu_device *adev,
if (!adev->pm.dpm_enabled)
return -EOPNOTSUPP;
 
-   if (pp_funcs && pp_funcs->set_xgmi_pstate) {
+   if (pp_funcs->set_xgmi_pstate) {
mutex_lock(&adev->pm.mutex);
ret = pp_funcs->set_xgmi_pstate(adev->powerplay.pp_handle,
pstate);
@@ -353,7 +353,7 @@ int amdgpu_dpm_set_df_cstate(struct amdgpu_device *adev,
if (!adev->pm.dpm_enabled)
return -EOPNOTSUPP;
 
-   if (pp_funcs && pp_funcs->set_df_cstate) {
+   if (pp_funcs->set_df_cstate) {
mutex_lock(&adev->pm.mutex);
ret = pp_funcs->set_df_cstate(pp_handle, cstate);
mutex_unlock(&adev->pm.mutex);
@@ -389,7 +389,7 @@ int amdgpu_dpm_enable_mgpu_fan_boost(struct amdgpu_device 
*adev)
if (!adev->pm.dpm_enabled)
return -EOPNOTSUPP;
 
-   if (pp_funcs && pp_funcs->enable_mgpu_fan_boost) {
+   if (pp_funcs->enable_mgpu_fan_boost) {
mutex_lock(&adev->pm.mutex);
ret = pp_funcs->enable_mgpu_fan_boost(pp_handle);
mutex_unlock(&adev->pm.mutex);
@@ -409,7 +409,7 @@ int amdgpu_dpm_set_clockgating_by_smu(struct amdgpu_device 
*adev,
if (!adev->pm.dpm_enabled)
return -EOPNOTSUPP;
 
-   if (pp_funcs && pp_funcs->set_clockgating_by_smu) {
+   if (pp_funcs->set_clockgating_by_smu) {
mutex_lock(&adev->pm.mutex);
ret = pp_funcs->set_clockgating_by_smu(pp_handle,
   msg_id);
@@ -430,7 +430,7 @@ int amdgpu_dpm_smu_i2c_bus_access(struct amdgpu_device 
*adev,
if (!adev->pm.dpm_enabled)
return -EOPNOTSUPP;
 
-   if (pp_funcs && pp_funcs->smu_i2c_bus_access) {
+   if (pp_funcs->smu_i2c_bus_access) {
mutex_lock(&adev->pm.mutex);
ret = pp_funcs->smu_i2c_bus_access(pp_handle,
   acquire);
@@ -449,8 +449,7 @@ void amdgpu_pm_acpi_event_handler(struct amdgpu_device 
*adev)
else
adev->pm.ac_power = false;
 
-   if (adev->powerplay.pp_funcs &&
-   adev->powerplay.pp_funcs->enable_bapm)
+   if (adev->powerplay.pp_funcs->enable_bapm)
amdgpu_dpm_enable_bapm(adev, adev->pm.ac_power);
 
if (is_support_sw_smu(adev))
@@ -472,7 +471,7 @@ int amdgpu_dpm_read_sensor(struct amdgpu_device *adev, enum amd_pp_sensors senso
if (!data || !size)
return -EINVAL;
 
-   if (pp_funcs && pp_funcs->read_sensor) {
+   if (pp_funcs->read_sensor) {
mutex_lock(&adev->pm.mutex);
ret = pp_funcs->read_sensor(adev->powerplay.pp_handle,
sensor,
@@ -719,8 +718,7 @@ void amdgpu_dpm_gfx_state_change(struct amdgpu_device *adev,
return;
 
mutex_lock(&adev->pm.mutex);
-   if (adev->powerplay.pp_funcs &&
-   adev->powerplay.pp_funcs->gfx_state_change_set)
+   if (adev->powerplay.pp_funcs->gfx_state_change_set)
((adev)->powerplay.pp_funcs->gfx_state_change_set(
(adev)->powerplay.pp_handle, state));
mutex_unlock(&adev->pm.mutex);
-- 
2.29.0



[PATCH 10/12] drm/amd/pm: drop nonsense !smu->ppt_funcs check

2022-02-10 Thread Evan Quan
Since "smu->ppt_funcs" is already installed during the early_init phase,
the later checks for it are pointless.

Signed-off-by: Evan Quan 
Change-Id: I07a945035a87b23032e4911bba768edacbd5e65a
---
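Note on the pattern: the sketch below illustrates why a single NULL check
on the callback is enough once the function table itself is installed at
init time. All names here (demo_ctx, demo_funcs, demo_call, and so on) are
hypothetical and are not part of the driver; the macro only mirrors the
shape of the simplified smu_ppt_funcs() and uses a fixed argument count to
stay within standard C.

```c
#include <assert.h>
#include <stddef.h>

struct demo_ctx;

/* Hypothetical per-ASIC callback table; any entry may be left NULL. */
struct demo_funcs {
	int (*set_level)(struct demo_ctx *ctx, int level);
	int (*get_temp)(struct demo_ctx *ctx, int sensor);
};

struct demo_ctx {
	const struct demo_funcs *funcs;	/* installed once in demo_early_init() */
	int level;
};

/*
 * Only the individual callback is checked. The table pointer is assumed
 * to be valid because demo_early_init() runs before any caller.
 */
#define demo_call(intf, ret, ctx, arg) \
	((ctx)->funcs->intf ? (ctx)->funcs->intf((ctx), (arg)) : (ret))

static int demo_set_level(struct demo_ctx *ctx, int level)
{
	ctx->level = level;
	return 0;
}

static const struct demo_funcs demo_table = {
	.set_level = demo_set_level,
	/* .get_temp intentionally left NULL (optional callback) */
};

/* The one place where the table pointer is installed and validated. */
void demo_early_init(struct demo_ctx *ctx)
{
	ctx->funcs = &demo_table;
	assert(ctx->funcs != NULL);
	ctx->level = 0;
}

/* Implemented callback: dispatches to demo_set_level(). */
int demo_apply_level(struct demo_ctx *ctx, int level)
{
	return demo_call(set_level, -1, ctx, level);
}

/* Missing callback: falls back to the default value 0. */
int demo_read_temp(struct demo_ctx *ctx, int sensor)
{
	return demo_call(get_temp, 0, ctx, sensor);
}
```

The design choice is to pay for validation once at init rather than on
every call; the per-callback check remains because individual entries are
still optional.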
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c   | 20 +---
 drivers/gpu/drm/amd/pm/swsmu/smu_internal.h |  2 +-
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 27a453fb4db7..3773e95a18bf 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -934,7 +934,7 @@ static void smu_interrupt_work_fn(struct work_struct *work)
struct smu_context *smu = container_of(work, struct smu_context,
   interrupt_work);
 
-   if (smu->ppt_funcs && smu->ppt_funcs->interrupt_work)
+   if (smu->ppt_funcs->interrupt_work)
smu->ppt_funcs->interrupt_work(smu);
 }
 
@@ -1782,7 +1782,7 @@ static int smu_force_smuclk_levels(struct smu_context *smu,
return -EINVAL;
}
 
-   if (smu->ppt_funcs && smu->ppt_funcs->force_clk_levels) {
+   if (smu->ppt_funcs->force_clk_levels) {
ret = smu->ppt_funcs->force_clk_levels(smu, clk_type, mask);
if (!ret && !(smu->user_dpm_profile.flags & SMU_DPM_USER_PROFILE_RESTORE)) {
smu->user_dpm_profile.clk_mask[clk_type] = mask;
@@ -1845,8 +1845,7 @@ static int smu_set_mp1_state(void *handle,
struct smu_context *smu = handle;
int ret = 0;
 
-   if (smu->ppt_funcs &&
-   smu->ppt_funcs->set_mp1_state)
+   if (smu->ppt_funcs->set_mp1_state)
ret = smu->ppt_funcs->set_mp1_state(smu, mp1_state);
 
return ret;
@@ -1858,7 +1857,7 @@ static int smu_set_df_cstate(void *handle,
struct smu_context *smu = handle;
int ret = 0;
 
-   if (!smu->ppt_funcs || !smu->ppt_funcs->set_df_cstate)
+   if (!smu->ppt_funcs->set_df_cstate)
return 0;
 
ret = smu->ppt_funcs->set_df_cstate(smu, state);
@@ -1872,7 +1871,7 @@ int smu_allow_xgmi_power_down(struct smu_context *smu, bool en)
 {
int ret = 0;
 
-   if (!smu->ppt_funcs || !smu->ppt_funcs->allow_xgmi_power_down)
+   if (!smu->ppt_funcs->allow_xgmi_power_down)
return 0;
 
ret = smu->ppt_funcs->allow_xgmi_power_down(smu, en);
@@ -2510,7 +2509,7 @@ static int smu_get_baco_capability(void *handle, bool *cap)
 
*cap = false;
 
-   if (smu->ppt_funcs && smu->ppt_funcs->baco_is_support)
+   if (smu->ppt_funcs->baco_is_support)
*cap = smu->ppt_funcs->baco_is_support(smu);
 
return 0;
@@ -2542,7 +2541,7 @@ bool smu_mode1_reset_is_support(struct smu_context *smu)
 {
bool ret = false;
 
-   if (smu->ppt_funcs && smu->ppt_funcs->mode1_reset_is_support)
+   if (smu->ppt_funcs->mode1_reset_is_support)
ret = smu->ppt_funcs->mode1_reset_is_support(smu);
 
return ret;
@@ -2667,8 +2666,7 @@ int smu_get_ecc_info(struct smu_context *smu, void *umc_ecc)
 {
int ret = -EOPNOTSUPP;
 
-   if (smu->ppt_funcs &&
-   smu->ppt_funcs->get_ecc_info)
+   if (smu->ppt_funcs->get_ecc_info)
ret = smu->ppt_funcs->get_ecc_info(smu, umc_ecc);
 
return ret;
@@ -2881,7 +2879,7 @@ int smu_send_hbm_bad_pages_num(struct smu_context *smu, uint32_t size)
 {
int ret = 0;
 
-   if (smu->ppt_funcs && smu->ppt_funcs->send_hbm_bad_pages_num)
+   if (smu->ppt_funcs->send_hbm_bad_pages_num)
ret = smu->ppt_funcs->send_hbm_bad_pages_num(smu, size);
 
return ret;
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_internal.h b/drivers/gpu/drm/amd/pm/swsmu/smu_internal.h
index 5f21ead860f9..a91967b31eeb 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_internal.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_internal.h
@@ -28,7 +28,7 @@
 #if defined(SWSMU_CODE_LAYER_L1)
 
 #define smu_ppt_funcs(intf, ret, smu, args...) \
-   ((smu)->ppt_funcs ? ((smu)->ppt_funcs->intf ? (smu)->ppt_funcs->intf(smu, ##args) : ret) : -EINVAL)
+   ((smu)->ppt_funcs->intf ? (smu)->ppt_funcs->intf(smu, ##args) : ret)
 
 #define smu_init_microcode(smu) smu_ppt_funcs(init_microcode, 0, smu)
 #define smu_fini_microcode(smu) smu_ppt_funcs(fini_microcode, 0, smu)
-- 
2.29.0



[PATCH 11/12] drm/amd/pm: drop extra non-necessary null pointers checks

2022-02-10 Thread Evan Quan
They are totally redundant. The checks performed before them already
guarantee that these pointers cannot be NULL.

Signed-off-by: Evan Quan 
Change-Id: I9f31734f49a8093582fc321ef3d93233946006e3
---
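Note on the pattern: a minimal sketch of the reasoning, assuming the
handle is validated once at the entry point before the inner helper is
reached. The names (demo_hwmgr, demo_get_level, demo_dpm_get_level) are
hypothetical and do not correspond to functions in this series.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the hardware manager handle. */
struct demo_hwmgr {
	int pm_en;
	int dpm_level;
};

/*
 * Inner helper: no NULL check here. It is only reachable through
 * demo_dpm_get_level(), which has already validated the pointer.
 */
static int demo_get_level(struct demo_hwmgr *hwmgr)
{
	return hwmgr->dpm_level;
}

/* Entry point: the single place where the handle is validated. */
int demo_dpm_get_level(struct demo_hwmgr *hwmgr)
{
	if (!hwmgr || !hwmgr->pm_en)
		return -EOPNOTSUPP;

	return demo_get_level(hwmgr);
}
```

A second "if (!hwmgr)" inside demo_get_level() could never fire, which is
the kind of check this patch removes.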
 .../gpu/drm/amd/pm/powerplay/amd_powerplay.c  | 182 ++
 .../amd/pm/powerplay/hwmgr/hardwaremanager.c  |  42 
 .../gpu/drm/amd/pm/powerplay/hwmgr/hwmgr.c|  17 +-
 .../amd/pm/powerplay/hwmgr/processpptables.c  |   2 +-
 .../gpu/drm/amd/pm/powerplay/smumgr/smumgr.c  |   6 +-
 5 files changed, 22 insertions(+), 227 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
index e95893556147..81ec5464b679 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
@@ -40,9 +40,6 @@ static int amd_powerplay_create(struct amdgpu_device *adev)
 {
struct pp_hwmgr *hwmgr;
 
-   if (adev == NULL)
-   return -EINVAL;
-
hwmgr = kzalloc(sizeof(struct pp_hwmgr), GFP_KERNEL);
if (hwmgr == NULL)
return -ENOMEM;
@@ -281,7 +278,7 @@ static int pp_dpm_load_fw(void *handle)
if (!hwmgr->pm_en)
return -EOPNOTSUPP;
 
-   if (!hwmgr || !hwmgr->smumgr_funcs || !hwmgr->smumgr_funcs->start_smu)
+   if (!hwmgr->smumgr_funcs->start_smu)
return -EINVAL;
 
if (hwmgr->smumgr_funcs->start_smu(hwmgr)) {
@@ -301,9 +298,6 @@ static int pp_set_clockgating_by_smu(void *handle, uint32_t msg_id)
 {
struct pp_hwmgr *hwmgr = handle;
 
-   if (!hwmgr)
-   return -EINVAL;
-
if (hwmgr->hwmgr_func->update_clock_gatings == NULL) {
pr_info_ratelimited("%s was not implemented.\n", __func__);
return 0;
@@ -341,9 +335,6 @@ static int pp_dpm_force_performance_level(void *handle,
 {
struct pp_hwmgr *hwmgr = handle;
 
-   if (!hwmgr)
-   return -EINVAL;
-
if (level == hwmgr->dpm_level)
return 0;
 
@@ -359,9 +350,6 @@ static enum amd_dpm_forced_level pp_dpm_get_performance_level(
 {
struct pp_hwmgr *hwmgr = handle;
 
-   if (!hwmgr)
-   return -EINVAL;
-
return hwmgr->dpm_level;
 }
 
@@ -369,9 +357,6 @@ static uint32_t pp_dpm_get_sclk(void *handle, bool low)
 {
struct pp_hwmgr *hwmgr = handle;
 
-   if (!hwmgr)
-   return 0;
-
if (hwmgr->hwmgr_func->get_sclk == NULL) {
pr_info_ratelimited("%s was not implemented.\n", __func__);
return 0;
@@ -383,9 +368,6 @@ static uint32_t pp_dpm_get_mclk(void *handle, bool low)
 {
struct pp_hwmgr *hwmgr = handle;
 
-   if (!hwmgr)
-   return 0;
-
if (hwmgr->hwmgr_func->get_mclk == NULL) {
pr_info_ratelimited("%s was not implemented.\n", __func__);
return 0;
@@ -397,9 +379,6 @@ static void pp_dpm_powergate_vce(void *handle, bool gate)
 {
struct pp_hwmgr *hwmgr = handle;
 
-   if (!hwmgr)
-   return;
-
if (hwmgr->hwmgr_func->powergate_vce == NULL) {
pr_info_ratelimited("%s was not implemented.\n", __func__);
return;
@@ -411,9 +390,6 @@ static void pp_dpm_powergate_uvd(void *handle, bool gate)
 {
struct pp_hwmgr *hwmgr = handle;
 
-   if (!hwmgr)
-   return;
-
if (hwmgr->hwmgr_func->powergate_uvd == NULL) {
pr_info_ratelimited("%s was not implemented.\n", __func__);
return;
@@ -426,9 +402,6 @@ static int pp_dpm_dispatch_tasks(void *handle, enum amd_pp_task task_id,
 {
struct pp_hwmgr *hwmgr = handle;
 
-   if (!hwmgr)
-   return -EINVAL;
-
return hwmgr_handle_task(hwmgr, task_id, user_state);
 }
 
@@ -438,7 +411,7 @@ static enum amd_pm_state_type pp_dpm_get_current_power_state(void *handle)
struct pp_power_state *state;
enum amd_pm_state_type pm_type;
 
-   if (!hwmgr || !hwmgr->current_ps)
+   if (!hwmgr->current_ps)
return -EINVAL;
 
state = hwmgr->current_ps;
@@ -468,9 +441,6 @@ static int pp_dpm_set_fan_control_mode(void *handle, uint32_t mode)
 {
struct pp_hwmgr *hwmgr = handle;
 
-   if (!hwmgr)
-   return -EOPNOTSUPP;
-
if (hwmgr->hwmgr_func->set_fan_control_mode == NULL)
return -EOPNOTSUPP;
 
@@ -486,9 +456,6 @@ static int pp_dpm_get_fan_control_mode(void *handle, uint32_t *fan_mode)
 {
struct pp_hwmgr *hwmgr = handle;
 
-   if (!hwmgr)
-   return -EOPNOTSUPP;
-
if (hwmgr->hwmgr_func->get_fan_control_mode == NULL)
return -EOPNOTSUPP;
 
@@ -503,9 +470,6 @@ static int pp_dpm_set_fan_speed_pwm(void *handle, uint32_t speed)
 {
struct pp_hwmgr *hwmgr = handle;
 
-   if (!hwmgr)
-   return -EOPNOTSUPP;
-
if (hwmgr->hwmgr_func->set_fan_speed_pwm == NULL)
return -EOPNOTSUPP;
 
@@ -519,9 +4

[PATCH 12/12] drm/amd/pm: revise the implementations for asic reset

2022-02-10 Thread Evan Quan
Instead of having an interface for every reset method, we replace them
with a new interface which can support all reset methods.

Signed-off-by: Evan Quan 
Change-Id: I4c8a7121dd65c2671085673dd7c13cf7e4286f3d
---
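Note on the pattern: the sketch below shows the general shape of folding
per-method reset entry points into one call keyed by a method enum. It is
an illustration only: the bodies of amdgpu_dpm_asic_reset() and
amdgpu_dpm_is_asic_reset_supported() are not visible in this excerpt, and
every name below (demo_dev, demo_reset_method, demo_asic_reset, and so on)
is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical reset methods, standing in for the AMD_RESET_METHOD_* values. */
enum demo_reset_method {
	DEMO_RESET_MODE1,
	DEMO_RESET_MODE2,
	DEMO_RESET_BACO,
};

struct demo_dev {
	bool baco_capable;	/* assumed capability flag */
	int last_reset;		/* records which method ran, for illustration */
};

/* One capability query for all methods instead of one query per method. */
bool demo_is_asic_reset_supported(const struct demo_dev *dev,
				  enum demo_reset_method method)
{
	switch (method) {
	case DEMO_RESET_MODE1:
	case DEMO_RESET_MODE2:
		return true;
	case DEMO_RESET_BACO:
		return dev->baco_capable;
	default:
		return false;
	}
}

/* One reset entry point; callers select the method by argument. */
int demo_asic_reset(struct demo_dev *dev, enum demo_reset_method method)
{
	if (!demo_is_asic_reset_supported(dev, method))
		return -EOPNOTSUPP;

	dev->last_reset = (int)method;
	return 0;
}
```

With this shape, adding a new reset method means adding an enum value and
a switch case rather than a new pair of exported functions.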
 drivers/gpu/drm/amd/amdgpu/aldebaran.c|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|   4 +-
 drivers/gpu/drm/amd/amdgpu/cik.c  |   4 +-
 drivers/gpu/drm/amd/amdgpu/nv.c   |  13 +-
 drivers/gpu/drm/amd/amdgpu/soc15.c|  12 +-
 drivers/gpu/drm/amd/amdgpu/vi.c   |   6 +-
 .../gpu/drm/amd/include/kgd_pp_interface.h|   7 +-
 drivers/gpu/drm/amd/pm/amdgpu_dpm.c   |  89 ++---
 drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |  13 +-
 .../gpu/drm/amd/pm/powerplay/amd_powerplay.c  |  86 
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 126 +++---
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |   3 -
 12 files changed, 180 insertions(+), 185 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/aldebaran.c b/drivers/gpu/drm/amd/amdgpu/aldebaran.c
index a545df4efce1..22b787de313a 100644
--- a/drivers/gpu/drm/amd/amdgpu/aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/aldebaran.c
@@ -128,7 +128,7 @@ static int aldebaran_mode2_reset(struct amdgpu_device *adev)
 {
/* disable BM */
pci_clear_master(adev->pdev);
-   adev->asic_reset_res = amdgpu_dpm_mode2_reset(adev);
+   adev->asic_reset_res = amdgpu_dpm_asic_reset(adev, AMD_RESET_METHOD_MODE2);
return adev->asic_reset_res;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7931132ce6e3..b19bfdf81500 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4504,9 +4504,9 @@ int amdgpu_device_mode1_reset(struct amdgpu_device *adev)
 
 amdgpu_device_cache_pci_state(adev->pdev);
 
-if (amdgpu_dpm_is_mode1_reset_supported(adev)) {
+if (amdgpu_dpm_is_asic_reset_supported(adev, AMD_RESET_METHOD_MODE1)) {
 dev_info(adev->dev, "GPU smu mode1 reset\n");
-ret = amdgpu_dpm_mode1_reset(adev);
+ret = amdgpu_dpm_asic_reset(adev, AMD_RESET_METHOD_MODE1);
 } else {
 dev_info(adev->dev, "GPU psp mode1 reset\n");
 ret = psp_gpu_reset(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/cik.c b/drivers/gpu/drm/amd/amdgpu/cik.c
index f10ce740a29c..786975716eb9 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik.c
@@ -1380,7 +1380,7 @@ static bool cik_asic_supports_baco(struct amdgpu_device *adev)
switch (adev->asic_type) {
case CHIP_BONAIRE:
case CHIP_HAWAII:
-   return amdgpu_dpm_is_baco_supported(adev);
+   return amdgpu_dpm_is_asic_reset_supported(adev, AMD_RESET_METHOD_BACO);
default:
return false;
}
@@ -1434,7 +1434,7 @@ static int cik_asic_reset(struct amdgpu_device *adev)
 
if (cik_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) {
dev_info(adev->dev, "BACO reset\n");
-   r = amdgpu_dpm_baco_reset(adev);
+   r = amdgpu_dpm_asic_reset(adev, AMD_RESET_METHOD_BACO);
} else {
dev_info(adev->dev, "PCI CONFIG reset\n");
r = cik_asic_pci_config_reset(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 494e17f65fc3..2e590008d3ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -414,7 +414,7 @@ static int nv_asic_mode2_reset(struct amdgpu_device *adev)
 
amdgpu_device_cache_pci_state(adev->pdev);
 
-   ret = amdgpu_dpm_mode2_reset(adev);
+   ret = amdgpu_dpm_asic_reset(adev, AMD_RESET_METHOD_MODE2);
if (ret)
dev_err(adev->dev, "GPU mode2 reset failed\n");
 
@@ -458,7 +458,7 @@ nv_asic_reset_method(struct amdgpu_device *adev)
case IP_VERSION(11, 0, 13):
return AMD_RESET_METHOD_MODE1;
default:
-   if (amdgpu_dpm_is_baco_supported(adev))
+   if (amdgpu_dpm_is_asic_reset_supported(adev, AMD_RESET_METHOD_BACO))
return AMD_RESET_METHOD_BACO;
else
return AMD_RESET_METHOD_MODE1;
@@ -476,7 +476,7 @@ static int nv_asic_reset(struct amdgpu_device *adev)
break;
case AMD_RESET_METHOD_BACO:
dev_info(adev->dev, "BACO reset\n");
-   ret = amdgpu_dpm_baco_reset(adev);
+   ret = amdgpu_dpm_asic_reset(adev, AMD_RESET_METHOD_BACO);
break;
case AMD_RESET_METHOD_MODE2:
dev_info(adev->dev, "MODE2 reset\n");
@@ -641,6 +641,11 @@ static int nv_update_umd_stable_pstate(struct amdgpu_device *adev,
return 0;
 }
 
+static bool nv_asic_supports_baco(struct amdgpu_device *adev)
+{
+   return amdgpu_dpm_is_asic_rese

Re: [PATCH 01/12] drm/amd/pm: drop unused structure members

2022-02-10 Thread Christian König

Nice cleanup.

Can't say much about the rest, but this patch and patch #2 are 
Reviewed-by: Christian König 


Regards,
Christian.

On 11.02.22 at 08:51, Evan Quan wrote:

Drop those members which are never used.

Signed-off-by: Evan Quan 
Change-Id: Iec70ad1dfe2059be26843f378588e6c894e9cae8
---
  drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 2 --
  1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
index fbef3ab8d487..fb32846a2d0e 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
@@ -373,8 +373,6 @@ struct smu_dpm_context {
  };
  
  struct smu_power_gate {

-   bool uvd_gated;
-   bool vce_gated;
atomic_t vcn_gated;
atomic_t jpeg_gated;
  };