date:20211217

RE: [PATCH 4/4] drm/amdgpu: Access the FRU on Aldebaran

2021-12-17 Thread Chen, Guchun

[Public]

Hi Kent,

+
+   if (adev->asic_type == CHIP_ALDEBARAN)
+   offset = 0;
 
if (!is_fru_eeprom_supported(adev))

I prefer to put the 'adev->asic_type == CHIP_ALDEBARAN' check after the call 
to is_fru_eeprom_supported to make the code logic cleaner. Without FRU 
support, we should do nothing.
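
A minimal standalone sketch of the ordering suggested above. The enum values and helper names are hypothetical stand-ins, not the driver's definitions, and the support check is simplified:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's ASIC types. */
enum asic_type { CHIP_VEGA20, CHIP_ALDEBARAN, CHIP_UNSUPPORTED };

/* Simplified support check; the real function inspects more state. */
static bool fru_supported(enum asic_type t)
{
    return t == CHIP_VEGA20 || t == CHIP_ALDEBARAN;
}

/*
 * Returns the payload offset into the read buffer, or -1 when the FRU
 * is not supported and nothing should be done.
 */
static int fru_payload_offset(enum asic_type t)
{
    if (!fru_supported(t))
        return -1;  /* bail out before any ASIC-specific setup */

    /* Only reached with FRU support: Aldebaran starts at 0, others at 2. */
    return (t == CHIP_ALDEBARAN) ? 0 : 2;
}
```

With this ordering the ASIC-specific offset is only computed once support has been confirmed.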

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Kent Russell
Sent: Friday, December 17, 2021 11:32 PM
To: amd-gfx@lists.freedesktop.org
Cc: Russell, Kent 
Subject: [PATCH 4/4] drm/amdgpu: Access the FRU on Aldebaran

This is supported, although the offset is different from VG20, so fix that with 
a variable and enable getting the product name and serial number from the FRU. 
Do this for all SKUs since all SKUs have the FRU.

Signed-off-by: Kent Russell 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
index 5ed24701f9cf..80f43e69e659 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
@@ -56,6 +56,9 @@ static bool is_fru_eeprom_supported(struct amdgpu_device 
*adev)
return true;
else
return false;
+   case CHIP_ALDEBARAN:
+   /* All Aldebaran SKUs have the FRU */
+   return true;
default:
return false;
}
@@ -91,6 +94,10 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
unsigned char buff[PRODUCT_NAME_LEN+2];
u32 addrptr;
int size, len;
+   int offset = 2;
+
+   if (adev->asic_type == CHIP_ALDEBARAN)
+   offset = 0;
 
if (!is_fru_eeprom_supported(adev))
return 0;
@@ -137,7 +144,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
len = PRODUCT_NAME_LEN - 1;
}
/* Start at 2 due to buff using fields 0 and 1 for the address */
-   memcpy(adev->product_name, &buff[2], len);
+   memcpy(adev->product_name, &buff[offset], len);
adev->product_name[len] = '\0';
 
addrptr += size + 1;
@@ -155,7 +162,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
DRM_WARN("FRU Product Number is larger than 16 characters. This 
is likely a mistake");
len = sizeof(adev->product_number) - 1;
}
-   memcpy(adev->product_number, &buff[2], len);
+   memcpy(adev->product_number, &buff[offset], len);
adev->product_number[len] = '\0';
 
addrptr += size + 1;
@@ -182,7 +189,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
DRM_WARN("FRU Serial Number is larger than 16 characters. This 
is likely a mistake");
len = sizeof(adev->serial) - 1;
}
-   memcpy(adev->serial, &buff[2], len);
+   memcpy(adev->serial, &buff[offset], len);
adev->serial[len] = '\0';
 
return 0;
--
2.25.1

RE: [PATCH] drm/amdkfd: correct sdma queue number in kfd device init

2021-12-17 Thread Chen, Guchun

[Public]

Hi Graham,

My general thought is that, from what I have observed, IP versions do not 
increase linearly from one ASIC to the next, so moving to a switch statement 
may make this easier for readers to decode. Also, I want to keep the code 
aligned with the IP parsing code in amdgpu_discovery.c.
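
To illustrate the point, here is a standalone sketch rather than driver code. The IP_VERSION packing is assumed to match the usual amdgpu definition, the range-based helper is a deliberately simplified hypothetical, and the switch lists only a few of the versions from the patch:

```c
#include <stdint.h>

/* Assumed packing: major, minor, revision in descending byte order. */
#define IP_VERSION(mj, mn, rv) \
    (((uint32_t)(mj) << 16) | ((uint32_t)(mn) << 8) | (uint32_t)(rv))

/* Hypothetical range check: "<=" on 4.2.0 puts VEGA20 in the 2-queue group. */
static int queues_by_range(uint32_t sdma_version)
{
    if (sdma_version <= IP_VERSION(4, 2, 0))
        return 2;
    return 8;
}

/* Switch variant: each known version is listed explicitly (abbreviated). */
static int queues_by_switch(uint32_t sdma_version)
{
    switch (sdma_version) {
    case IP_VERSION(4, 0, 0): /* VEGA10 */
    case IP_VERSION(4, 1, 0): /* RAVEN */
    case IP_VERSION(5, 2, 1): /* VANGOGH */
        return 2;
    case IP_VERSION(4, 2, 0): /* VEGA20 */
    case IP_VERSION(5, 0, 0): /* NAVI10 */
        return 8;
    default:
        return -1; /* unknown IP version */
    }
}
```

A single comparison cannot express that 5.2.1 needs 2 queues while 4.2.0 and 5.0.0 need 8, which is why an explicit list reads more safely here.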

Please correct me if I am wrong.

Regards,
Guchun

-Original Message-
From: Kim, Jonathan  
Sent: Friday, December 17, 2021 11:35 PM
To: Sider, Graham ; Chen, Guchun ; 
amd-gfx@lists.freedesktop.org; Deucher, Alexander ; 
Kuehling, Felix 
Subject: RE: [PATCH] drm/amdkfd: correct sdma queue number in kfd device init



> -Original Message-
> From: Sider, Graham 
> Sent: December 17, 2021 10:06 AM
> To: Chen, Guchun ; amd- 
> g...@lists.freedesktop.org; Deucher, Alexander 
> ; Kuehling, Felix ; 
> Kim, Jonathan 
> Subject: RE: [PATCH] drm/amdkfd: correct sdma queue number in kfd 
> device init
> 
> [Public]
> 
> > -Original Message-
> > From: Chen, Guchun 
> > Sent: Friday, December 17, 2021 9:31 AM
> > To: amd-gfx@lists.freedesktop.org; Deucher, Alexander 
> > ; Sider, Graham
> ;
> > Kuehling, Felix ; Kim, Jonathan 
> > 
> > Cc: Chen, Guchun 
> > Subject: [PATCH] drm/amdkfd: correct sdma queue number in kfd device 
> > init
> >
> > The sdma queue number is not correct on some ASICs, such as vega20; this 
> > patch ensures the setting stays the same after the code refactor.
> > Additionally, improve code to use switch case to list IP version to 
> > complete kfd device_info structure filling.
> > This keeps consistency with the IP parse code in amdgpu_discovery.c.
> >
> > Fixes: a9e2c4dc6cc4("drm/amdkfd: add kfd_device_info_init function")
> > Signed-off-by: Guchun Chen 
> > ---
> >  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 74
> > ++---
> >  1 file changed, 65 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > index facc28f58c1f..e50bf992f298 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > @@ -59,11 +59,72 @@ static void kfd_gtt_sa_fini(struct kfd_dev 
> > *kfd);
> >
> >  static int kfd_resume(struct kfd_dev *kfd);
> >
> > +static void kfd_device_info_set_sdma_queue_num(struct kfd_dev *kfd)
> {
> > +   uint32_t sdma_version = kfd->adev-
> >ip_versions[SDMA0_HWIP][0];
> > +
> > +   switch (sdma_version) {
> > +   case IP_VERSION(4, 0, 0):/* VEGA10 */
> > +   case IP_VERSION(4, 0, 1):/* VEGA12 */
> > +   case IP_VERSION(4, 1, 0):/* RAVEN */
> > +   case IP_VERSION(4, 1, 1):/* RAVEN */
> > +   case IP_VERSION(4, 1, 2):/* RENOIR */
> > +   case IP_VERSION(5, 2, 1):/* VANGOGH */
> > +   case IP_VERSION(5, 2, 3):/* YELLOW_CARP */
> > +   kfd->device_info.num_sdma_queues_per_engine =
> > 2;
> > +   break;
> > +   case IP_VERSION(4, 2, 0):/* VEGA20 */
> 
> Thanks for spotting this Guchun. My previous patch should have used a "<"
> instead of a "<=" on IP_VERSION(4, 2, 0).
> 
> > +   case IP_VERSION(4, 2, 2):/* ARCTURUS */
> > +   case IP_VERSION(4, 4, 0):/* ALDEBARAN */
> > +   case IP_VERSION(5, 0, 0):/* NAVI10 */
> > +   case IP_VERSION(5, 0, 1):/* CYAN_SKILLFISH */
> > +   case IP_VERSION(5, 0, 2):/* NAVI14 */
> > +   case IP_VERSION(5, 0, 5):/* NAVI12 */
> > +   case IP_VERSION(5, 2, 0):/* SIENNA_CICHLID */
> > +   case IP_VERSION(5, 2, 2):/* NAVY_FLOUNDER */
> > +   case IP_VERSION(5, 2, 4):/* DIMGREY_CAVEFISH */
> > +   kfd->device_info.num_sdma_queues_per_engine =
> > 8;
> > +   break;
> > +   default:
> > +   dev_err(kfd_device,
> > +   "Failed to find sdma ip
> > blocks(SDMA_HWIP:0x%x) in %s\n",
> > +sdma_version, __func__);
> > +   }
> > +}
> > +
> > +static void kfd_device_info_set_event_interrupt_class(struct 
> > +kfd_dev
> > +*kfd) {
> > +   uint32_t gc_version = KFD_GC_VERSION(kfd);
> > +
> > +   switch (gc_version) {
> > +   case IP_VERSION(9, 0, 1): /* VEGA10 */
> > +   case IP_VERSION(9, 2, 1): /* VEGA12 */
> > +   case IP_VERSION(9, 3, 0): /* RENOIR */
> > +   case IP_VERSION(9, 4, 0): /* VEGA20 */
> > +   case IP_VERSION(9, 4, 1): /* ARCTURUS */
> > +   case IP_VERSION(9, 4, 2): /* ALDEBARAN */
> > +   case IP_VERSION(10, 3, 1): /* VANGOGH */
> > +   case IP_VERSION(10, 3, 3): /* YELLOW_CARP */
> > +   case IP_VERSION(10, 1, 3): /* CYAN_SKILLFISH */
> > +   case IP_VERSION(10, 1, 10): /* NAVI10 */
> > +   case IP_VERSION(10, 1, 2): /* NAVI12 */
> > +   case IP_VERSION(10, 1, 1): /* NAVI14 */
> > +   case IP_VERSION(10, 3, 0): /* SIENNA_CICHLID */
> > +   case IP_VERSION(10, 3, 2): /* NAVY_FLOUNDER */
> > +   case IP_VERSION(10, 3, 4): /* DIMGREY_CAVEFISH */
> > +   case IP_VERSION(10, 3, 5): /* BEIGE_GOBY */
> > +   kfd->device_info.event_interrupt_class =
> >

[RFC 5/6] drm/amdgpu: Drop hive->in_reset

2021-12-17 Thread Andrey Grodzovsky

Since we serialize all resets, there is no need to protect against
concurrent resets.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |  1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  1 -
 3 files changed, 1 insertion(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 55cd67b9ede2..d2701e4d0622 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5013,25 +5013,9 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device 
*adev,
dev_info(adev->dev, "GPU %s begin!\n",
need_emergency_restart ? "jobs stop":"reset");
 
-   /*
-* Here we trylock to avoid chain of resets executing from
-* either trigger by jobs on different adevs in XGMI hive or jobs on
-* different schedulers for same device while this TO handler is 
running.
-* We always reset all schedulers for device and all devices for XGMI
-* hive so that should take care of them too.
-*/
hive = amdgpu_get_xgmi_hive(adev);
-   if (hive) {
-   if (atomic_cmpxchg(&hive->in_reset, 0, 1) != 0) {
-   DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as 
another already in progress",
-   job ? job->base.id : -1, hive->hive_id);
-   amdgpu_put_xgmi_hive(hive);
-   if (job && job->vm)
-   drm_sched_increase_karma(&job->base);
-   return 0;
-   }
+   if (hive)
mutex_lock(&hive->hive_lock);
-   }
 
reset_context.method = AMD_RESET_METHOD_NONE;
reset_context.reset_req_dev = adev;
@@ -5226,7 +5210,6 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device 
*adev,
 
 skip_recovery:
if (hive) {
-   atomic_set(&hive->in_reset, 0);
mutex_unlock(&hive->hive_lock);
amdgpu_put_xgmi_hive(hive);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 8b116f398101..0d54bef5c494 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -403,7 +403,6 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct 
amdgpu_device *adev)
INIT_LIST_HEAD(&hive->device_list);
INIT_LIST_HEAD(&hive->node);
mutex_init(&hive->hive_lock);
-   atomic_set(&hive->in_reset, 0);
atomic_set(&hive->number_devices, 0);
task_barrier_init(&hive->tb);
hive->pstate = AMDGPU_XGMI_PSTATE_UNKNOWN;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index 6121aaa292cb..2f2ce53645a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -33,7 +33,6 @@ struct amdgpu_hive_info {
struct list_head node;
atomic_t number_devices;
struct mutex hive_lock;
-   atomic_t in_reset;
int hi_req_count;
struct amdgpu_device *hi_req_gpu;
struct task_barrier tb;
-- 
2.25.1

[RFC 1/6] drm/amdgpu: Init GPU reset single threaded wq

2021-12-17 Thread Andrey Grodzovsky

Do it for both single device and XGMI hive cases.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  7 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |  9 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  2 ++
 4 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 9f017663ac50..b5ff76aae7e0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -812,6 +812,11 @@ struct amd_powerplay {
 
 #define AMDGPU_RESET_MAGIC_NUM 64
 #define AMDGPU_MAX_DF_PERFMONS 4
+
+struct amdgpu_reset_domain {
+   struct workqueue_struct *wq;
+};
+
 struct amdgpu_device {
struct device   *dev;
struct pci_dev  *pdev;
@@ -1096,6 +1101,8 @@ struct amdgpu_device {
 
struct amdgpu_reset_control *reset_cntl;
uint32_t
ip_versions[HW_ID_MAX][HWIP_MAX_INSTANCE];
+
+   struct amdgpu_reset_domain  reset_domain;
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5625f7736e37..5f13195d23d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2391,9 +2391,27 @@ static int amdgpu_device_ip_init(struct amdgpu_device 
*adev)
if (r)
goto init_failed;
 
-   if (adev->gmc.xgmi.num_physical_nodes > 1)
+   if (adev->gmc.xgmi.num_physical_nodes > 1) {
+   struct amdgpu_hive_info *hive;
+
amdgpu_xgmi_add_device(adev);
 
+   hive = amdgpu_get_xgmi_hive(adev);
+   if (!hive || !hive->reset_domain.wq) {
+   DRM_ERROR("Failed to obtain reset domain info for XGMI 
hive:%llx", hive->hive_id);
+   r = -EINVAL;
+   goto init_failed;
+   }
+
+   adev->reset_domain.wq = hive->reset_domain.wq;
+   } else {
+   adev->reset_domain.wq = 
alloc_ordered_workqueue("amdgpu-reset-dev", 0);
+   if (!adev->reset_domain.wq) {
+   r = -ENOMEM;
+   goto init_failed;
+   }
+   }
+
/* Don't init kfd if whole hive need to be reset during init */
if (!adev->gmc.xgmi.pending_reset)
amdgpu_amdkfd_device_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 0fad2bf854ae..8b116f398101 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -391,6 +391,14 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct 
amdgpu_device *adev)
goto pro_end;
}
 
+   hive->reset_domain.wq = alloc_ordered_workqueue("amdgpu-reset-hive", 0);
+   if (!hive->reset_domain.wq) {
+   dev_err(adev->dev, "XGMI: failed allocating wq for reset 
domain!\n");
+   kfree(hive);
+   hive = NULL;
+   goto pro_end;
+   }
+
hive->hive_id = adev->gmc.xgmi.hive_id;
INIT_LIST_HEAD(&hive->device_list);
INIT_LIST_HEAD(&hive->node);
@@ -400,6 +408,7 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct 
amdgpu_device *adev)
task_barrier_init(&hive->tb);
hive->pstate = AMDGPU_XGMI_PSTATE_UNKNOWN;
hive->hi_req_gpu = NULL;
+
/*
 * hive pstate on boot is high in vega20 so we have to go to low
 * pstate on after boot.
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index d2189bf7d428..6121aaa292cb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -42,6 +42,8 @@ struct amdgpu_hive_info {
AMDGPU_XGMI_PSTATE_MAX_VEGA20,
AMDGPU_XGMI_PSTATE_UNKNOWN
} pstate;
+
+   struct amdgpu_reset_domain reset_domain;
 };
 
 struct amdgpu_pcs_ras_field {
-- 
2.25.1

[RFC 6/6] drm/amdgpu: Drop concurrent GPU reset protection for device

2021-12-17 Thread Andrey Grodzovsky

Since all GPU resets are now serialized, there is no need for this.

This patch also reverts 'drm/amdgpu: race issue when jobs on 2 ring timeout'

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 89 ++
 1 file changed, 7 insertions(+), 82 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index d2701e4d0622..311e0b9e1e4f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4763,11 +4763,10 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,
return r;
 }
 
-static bool amdgpu_device_lock_adev(struct amdgpu_device *adev,
+static void amdgpu_device_lock_adev(struct amdgpu_device *adev,
struct amdgpu_hive_info *hive)
 {
-   if (atomic_cmpxchg(&adev->in_gpu_reset, 0, 1) != 0)
-   return false;
+   atomic_set(&adev->in_gpu_reset, 1);
 
if (hive) {
down_write_nest_lock(&adev->reset_sem, &hive->hive_lock);
@@ -4786,8 +4785,6 @@ static bool amdgpu_device_lock_adev(struct amdgpu_device 
*adev,
adev->mp1_state = PP_MP1_STATE_NONE;
break;
}
-
-   return true;
 }
 
 static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
@@ -4798,46 +4795,6 @@ static void amdgpu_device_unlock_adev(struct 
amdgpu_device *adev)
up_write(&adev->reset_sem);
 }
 
-/*
- * to lockup a list of amdgpu devices in a hive safely, if not a hive
- * with multiple nodes, it will be similar as amdgpu_device_lock_adev.
- *
- * unlock won't require roll back.
- */
-static int amdgpu_device_lock_hive_adev(struct amdgpu_device *adev, struct 
amdgpu_hive_info *hive)
-{
-   struct amdgpu_device *tmp_adev = NULL;
-
-   if (adev->gmc.xgmi.num_physical_nodes > 1) {
-   if (!hive) {
-   dev_err(adev->dev, "Hive is NULL while device has 
multiple xgmi nodes");
-   return -ENODEV;
-   }
-   list_for_each_entry(tmp_adev, &hive->device_list, 
gmc.xgmi.head) {
-   if (!amdgpu_device_lock_adev(tmp_adev, hive))
-   goto roll_back;
-   }
-   } else if (!amdgpu_device_lock_adev(adev, hive))
-   return -EAGAIN;
-
-   return 0;
-roll_back:
-   if (!list_is_first(&tmp_adev->gmc.xgmi.head, &hive->device_list)) {
-   /*
-* if the lockup iteration break in the middle of a hive,
-* it may means there may has a race issue,
-* or a hive device locked up independently.
-* we may be in trouble and may not, so will try to roll back
-* the lock and give out a warnning.
-*/
-   dev_warn(tmp_adev->dev, "Hive lock iteration broke in the 
middle. Rolling back to unlock");
-   list_for_each_entry_continue_reverse(tmp_adev, 
&hive->device_list, gmc.xgmi.head) {
-   amdgpu_device_unlock_adev(tmp_adev);
-   }
-   }
-   return -EAGAIN;
-}
-
 static void amdgpu_device_resume_display_audio(struct amdgpu_device *adev)
 {
struct pci_dev *p = NULL;
@@ -5023,22 +4980,6 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device 
*adev,
reset_context.hive = hive;
clear_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags);
 
-   /*
-* lock the device before we try to operate the linked list
-* if didn't get the device lock, don't touch the linked list since
-* others may iterating it.
-*/
-   r = amdgpu_device_lock_hive_adev(adev, hive);
-   if (r) {
-   dev_info(adev->dev, "Bailing on TDR for s_job:%llx, as another 
already in progress",
-   job ? job->base.id : -1);
-
-   /* even we skipped this reset, still need to set the job to 
guilty */
-   if (job && job->vm)
-   drm_sched_increase_karma(&job->base);
-   goto skip_recovery;
-   }
-
/*
 * Build list of devices to reset.
 * In case we are in XGMI hive mode, resort the device list
@@ -5058,6 +4999,9 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device 
*adev,
 
/* block all schedulers and reset given job's ring */
list_for_each_entry(tmp_adev, device_list_handle, reset_list) {
+
+   amdgpu_device_lock_adev(tmp_adev, hive);
+
/*
 * Try to put the audio codec into suspend state
 * before gpu reset started.
@@ -5208,13 +5152,12 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device 
*adev,
amdgpu_device_unlock_adev(tmp_adev);
}
 
-skip_recovery:
if (hive) {
mutex_unlock(&hive->hive_lock);
amdgpu_put_xgmi_hive(hive);
}
 
-   if (r && r != -EAGAIN)
+   if (r)
dev_info(adev->dev, "GPU

[RFC 4/6] drm/amdgpu: Serialize non TDR gpu recovery with TDRs

2021-12-17 Thread Andrey Grodzovsky

Use the reset domain wq also for non-TDR GPU recovery triggers
such as sysfs and RAS. We must serialize all possible
GPU recoveries to guarantee no concurrency there.
For TDR, call the original recovery function directly since
it's already executed from within the wq. For others, just
use a wrapper to queue the work and wait for it to finish.
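
As a rough userspace illustration of the wrapper pattern (not the driver code): the work item is embedded in a larger struct that carries the arguments in and the result out, and the handler recovers the outer struct from the embedded member. The struct work_item type, do_recover() and the inline call standing in for queueing and waiting are all assumptions made for this sketch.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical minimal work item; the kernel's work_struct is richer. */
struct work_item {
    void (*func)(struct work_item *work);
};

/* Wrapper carrying the argument in and the result out. */
struct recover_work {
    struct work_item base;
    int device_id;  /* stands in for the device pointer */
    int ret;
};

/* Placeholder recovery routine: fails with an arbitrary code for bad ids. */
static int do_recover(int device_id)
{
    return device_id < 0 ? -1 : 0;
}

static void recover_work_fn(struct work_item *work)
{
    struct recover_work *rw = container_of(work, struct recover_work, base);

    rw->ret = do_recover(rw->device_id);
}

/* Runs the item inline, standing in for "queue the work, then wait on it". */
static int recover_sync(int device_id)
{
    struct recover_work rw = {
        .base = { .func = recover_work_fn },
        .device_id = device_id,
        .ret = 0,
    };

    rw.base.func(&rw.base);
    return rw.ret;
}
```

Because the caller waits for completion before returning, the wrapper can live on the caller's stack and the result can be read back from it directly.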

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 33 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|  2 +-
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index b5ff76aae7e0..8e96b9a14452 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1296,6 +1296,8 @@ bool amdgpu_device_has_job_running(struct amdgpu_device 
*adev);
 bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev);
 int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
  struct amdgpu_job* job);
+int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
+ struct amdgpu_job *job);
 void amdgpu_device_pci_config_reset(struct amdgpu_device *adev);
 int amdgpu_device_pci_reset(struct amdgpu_device *adev);
 bool amdgpu_device_need_post(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b595e6d699b5..55cd67b9ede2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4979,7 +4979,7 @@ static void amdgpu_device_recheck_guilty_jobs(
  * Returns 0 for success or an error on failure.
  */
 
-int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
+int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
  struct amdgpu_job *job)
 {
struct list_head device_list, *device_list_handle =  NULL;
@@ -5236,6 +5236,37 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
return r;
 }
 
+struct recover_work_struct {
+   struct work_struct base;
+   struct amdgpu_device *adev;
+   struct amdgpu_job *job;
+   int ret;
+};
+
+static void amdgpu_device_queue_gpu_recover_work(struct work_struct *work)
+{
+   struct recover_work_struct *recover_work = container_of(work, struct 
recover_work_struct, base);
+
+   recover_work->ret = amdgpu_device_gpu_recover_imp(recover_work->adev, 
recover_work->job);
+}
+/*
+ * Serialize gpu recover into reset domain single threaded wq
+ */
+int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
+   struct amdgpu_job *job)
+{
+   struct recover_work_struct work = {.adev = adev, .job = job};
+
+   INIT_WORK(&work.base, amdgpu_device_queue_gpu_recover_work);
+
+   if (!queue_work(adev->reset_domain.wq, &work.base))
+   return -EAGAIN;
+
+   flush_work(&work.base);
+
+   return work.ret;
+}
+
 /**
  * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index bfc47bea23db..38c9fd7b7ad4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -63,7 +63,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct 
drm_sched_job *s_job)
  ti.process_name, ti.tgid, ti.task_name, ti.pid);
 
if (amdgpu_device_should_recover_gpu(ring->adev)) {
-   amdgpu_device_gpu_recover(ring->adev, job);
+   amdgpu_device_gpu_recover_imp(ring->adev, job);
} else {
drm_sched_suspend_timeout(&ring->sched);
if (amdgpu_sriov_vf(adev))
-- 
2.25.1

[RFC 3/6] drm/amdgpu: Fix crash on modprobe

2021-12-17 Thread Andrey Grodzovsky

Restrict job resubmission to the suspend case
only, since the schedulers are not initialised yet on
probe.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 5527c68c51de..8ebd954e06c6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -582,7 +582,7 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev)
if (!ring || !ring->fence_drv.initialized)
continue;
 
-   if (!ring->no_scheduler) {
+   if (adev->in_suspend && !ring->no_scheduler) {
drm_sched_resubmit_jobs(&ring->sched);
drm_sched_start(&ring->sched, true);
}
-- 
2.25.1

[RFC 2/6] drm/amdgpu: Move scheduler init to after XGMI is ready

2021-12-17 Thread Andrey Grodzovsky

Before we initialize the schedulers we must know which reset
domain we are in. For a single device there is a single
domain per device and so a single wq per device. For XGMI
the reset domain spans the entire XGMI hive and so the
reset wq is per hive.
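
A simplified model of that choice, as a standalone sketch. The struct layouts and the pick_reset_domain() helper are hypothetical and only mirror the idea: hive members share the hive's wq, while a standalone device uses its own.

```c
#include <stddef.h>

/* Hypothetical stand-in for a workqueue handle. */
struct wq { const char *name; };

struct reset_domain { struct wq *wq; };

struct hive { struct reset_domain reset_domain; };

struct device {
    int num_physical_nodes;  /* > 1 means the device is part of a hive */
    struct hive *hive;       /* NULL for a standalone device */
    struct wq own_wq;        /* used only when standalone */
    struct reset_domain reset_domain;
};

/* Returns 0 on success, -1 if a hive member has no usable hive domain. */
static int pick_reset_domain(struct device *dev)
{
    if (dev->num_physical_nodes > 1) {
        if (!dev->hive || !dev->hive->reset_domain.wq)
            return -1;
        /* One wq shared by every device in the hive. */
        dev->reset_domain.wq = dev->hive->reset_domain.wq;
    } else {
        /* One wq per standalone device. */
        dev->reset_domain.wq = &dev->own_wq;
    }
    return 0;
}
```

Since the schedulers are handed this wq for their timeout work, the domain has to be known before they are created.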

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 45 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 34 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  2 +
 3 files changed, 51 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5f13195d23d1..b595e6d699b5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2284,6 +2284,47 @@ static int amdgpu_device_fw_loading(struct amdgpu_device 
*adev)
return r;
 }
 
+static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
+{
+   long timeout;
+   int r, i;
+
+   for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+   struct amdgpu_ring *ring = adev->rings[i];
+
+   /* No need to setup the GPU scheduler for rings that don't need 
it */
+   if (!ring || ring->no_scheduler)
+   continue;
+
+   switch (ring->funcs->type) {
+   case AMDGPU_RING_TYPE_GFX:
+   timeout = adev->gfx_timeout;
+   break;
+   case AMDGPU_RING_TYPE_COMPUTE:
+   timeout = adev->compute_timeout;
+   break;
+   case AMDGPU_RING_TYPE_SDMA:
+   timeout = adev->sdma_timeout;
+   break;
+   default:
+   timeout = adev->video_timeout;
+   break;
+   }
+
+   r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
+  ring->num_hw_submission, 
amdgpu_job_hang_limit,
+  timeout, adev->reset_domain.wq, 
ring->sched_score, ring->name);
+   if (r) {
+   DRM_ERROR("Failed to create scheduler on ring %s.\n",
+ ring->name);
+   return r;
+   }
+   }
+
+   return 0;
+}
+
+
 /**
  * amdgpu_device_ip_init - run init for hardware IPs
  *
@@ -2412,6 +2453,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device 
*adev)
}
}
 
+   r = amdgpu_device_init_schedulers(adev);
+   if (r)
+   goto init_failed;
+
/* Don't init kfd if whole hive need to be reset during init */
if (!adev->gmc.xgmi.pending_reset)
amdgpu_amdkfd_device_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 3b7e86ea7167..5527c68c51de 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -456,8 +456,6 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
  atomic_t *sched_score)
 {
struct amdgpu_device *adev = ring->adev;
-   long timeout;
-   int r;
 
if (!adev)
return -EINVAL;
@@ -477,36 +475,12 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring 
*ring,
spin_lock_init(&ring->fence_drv.lock);
ring->fence_drv.fences = kcalloc(num_hw_submission * 2, sizeof(void *),
 GFP_KERNEL);
-   if (!ring->fence_drv.fences)
-   return -ENOMEM;
 
-   /* No need to setup the GPU scheduler for rings that don't need it */
-   if (ring->no_scheduler)
-   return 0;
+   ring->num_hw_submission = num_hw_submission;
+   ring->sched_score = sched_score;
 
-   switch (ring->funcs->type) {
-   case AMDGPU_RING_TYPE_GFX:
-   timeout = adev->gfx_timeout;
-   break;
-   case AMDGPU_RING_TYPE_COMPUTE:
-   timeout = adev->compute_timeout;
-   break;
-   case AMDGPU_RING_TYPE_SDMA:
-   timeout = adev->sdma_timeout;
-   break;
-   default:
-   timeout = adev->video_timeout;
-   break;
-   }
-
-   r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
-  num_hw_submission, amdgpu_job_hang_limit,
-  timeout, NULL, sched_score, ring->name);
-   if (r) {
-   DRM_ERROR("Failed to create scheduler on ring %s.\n",
- ring->name);
-   return r;
-   }
+   if (!ring->fence_drv.fences)
+   return -ENOMEM;
 
return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 4d380e79752c..a4b8279e3011 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -253,6 +253,8 @@ struct

[RFC 0/6] Define and use reset domain for GPU recovery in amdgpu

2021-12-17 Thread Andrey Grodzovsky

This patchset is based on earlier work by Boris[1] that made it possible to
have an ordered workqueue at the driver level, used by the different
schedulers to queue their timeout work. On top of that I also serialized
any GPU reset we trigger from within amdgpu code to go through the same
ordered wq. This somewhat simplifies our GPU reset code, since we no longer
need to protect against concurrency between multiple GPU reset triggers, such
as TDR on one hand and a sysfs or RAS trigger on the other.

As advised by Christian and Daniel I defined a reset_domain struct such that
all the entities that go through reset together will be serialized one against
another. 

TDRs triggered by multiple entities within the same domain for the same
reason will not be triggered again, as the first such reset will cancel all
the pending resets. This is relevant only to TDR timers and not to triggered
resets coming from RAS or sysfs; those will still happen after the in-flight
reset finishes.

[1] 
https://patchwork.kernel.org/project/dri-devel/patch/20210629073510.2764391-3-boris.brezil...@collabora.com/

P.S. Going through drm-misc-next and not amd-staging-drm-next, as Boris's work 
hasn't landed there yet.

Andrey Grodzovsky (6):
  drm/amdgpu: Init GPU reset single threaded wq
  drm/amdgpu: Move scheduler init to after XGMI is ready
  drm/amdgpu: Fix crash on modprobe
  drm/amdgpu: Serialize non TDR gpu recovery with TDRs
  drm/amdgpu: Drop hive->in_reset
  drm/amdgpu: Drop concurrent GPU reset protection for device

 drivers/gpu/drm/amd/amdgpu/amdgpu.h|   9 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 206 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  36 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |   3 +-
 7 files changed, 132 insertions(+), 136 deletions(-)

-- 
2.25.1

Re: [PATCH 10/19] drm/amd/display: Changed pipe split policy to allow for multi-display pipe split

2021-12-17 Thread Rodrigo Siqueira Jordao





On 2021-12-17 4:36 p.m., Deucher, Alexander wrote:

[AMD Official Use Only]


Maybe add Bug links for:
https://gitlab.freedesktop.org/drm/amd/-/issues/1522 

https://gitlab.freedesktop.org/drm/amd/-/issues/1709 

https://gitlab.freedesktop.org/drm/amd/-/issues/1655 

https://gitlab.freedesktop.org/drm/amd/-/issues/1403 



Sure, I'll update the commit message before applying this patch.

Thanks.






*From:* amd-gfx  on behalf of 
Rodrigo Siqueira 

*Sent:* Friday, December 17, 2021 4:23 PM
*To:* amd-gfx@lists.freedesktop.org 
*Cc:* Wang, Chao-kai (Stylon) ; Cyr, Aric 
; Li, Sun peng (Leo) ; Wentland, 
Harry ; Zhuo, Qingqing (Lillian) 
; Siqueira, Rodrigo ; 
Li, Roman ; Chiu, Solomon ; 
Pillai, Aurabindo ; Wang, Angus 
; Lin, Wayne ; Lipski, Mikita 
; Lakha, Bhawanpreet ; 
Gutierrez, Agustin ; Kotarac, Pavle 

*Subject:* [PATCH 10/19] drm/amd/display: Changed pipe split policy to 
allow for multi-display pipe split

From: Angus Wang 

[WHY]
Current implementation of pipe split policy prevents pipe split with
multiple displays connected, which caused the MCLK speed to be stuck at
max

[HOW]
Changed the pipe split policies so that pipe split is allowed for
multi-display configurations

Reviewed-by: Aric Cyr 
Acked-by: Rodrigo Siqueira 
Signed-off-by: Angus Wang 
---
  drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c   | 2 +-
  drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c | 2 +-
  drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c   | 2 +-
  drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c   | 2 +-
  drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c | 2 +-
  drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c | 2 +-
  drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c | 2 +-
  drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c   | 2 +-
  8 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c

index 2a72517e2b28..2bc93df023ad 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -1069,7 +1069,7 @@ static const struct dc_debug_options 
debug_defaults_drv = {

  .timing_trace = false,
  .clock_trace = true,
  .disable_pplib_clock_request = true,
-   .pipe_split_policy = MPC_SPLIT_AVOID_MULT_DISP,
+   .pipe_split_policy = MPC_SPLIT_DYNAMIC,
  .force_single_disp_pipe_split = false,
  .disable_dcc = DCC_ENABLE,
  .vsr_support = true,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c

index d6acf9a8590a..0bb7d3dd53fa 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c
@@ -603,7 +603,7 @@ static const struct dc_debug_options 
debug_defaults_drv = {

  .timing_trace = false,
  .clock_trace = true,
  .disable_pplib_clock_request = true,
-   .pipe_split_policy = MPC_SPLIT_AVOID,
+   .pipe_split_policy = MPC_SPLIT_DYNAMIC,
  .force_single_disp_pipe_split = false,
  .disable_dcc = DCC_ENABLE,
  .vsr_support = true,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c

index ca1bbc942fd4..e5cc6bf45743 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
@@ -873,7 +873,7 @@ static const struct dc_debug_options 
debug_defaults_drv = {

  .clock_trace = true,
  .disable_pplib_clock_request = true,
  .min_disp_clk_khz = 10,
-   .pipe_split_policy = MPC_SPLIT_AVOID_MULT_DISP,
+   .pipe_split_policy = MPC_SPLIT_DYNAMIC,
  .force_single_disp_pipe_split = false,
  .disable_dcc = DCC_ENABLE,
  .vsr_support = true,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c

index 369ceeeddc7e..e12660c609ee 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
@@ -840,7 +840,7 @@ static const struct dc_debug_options 
debug_defaults_drv = {

  .timing_trace = false,
  .clock_trace = true,
  .disable_pplib_clock_request = true,
-   .pipe_split_policy = MPC_SPLIT_AVOID_MULT_DISP,
+

RE: [PATCH 00/19] DC Patches December 17, 2021

2021-12-17 Thread Wheeler, Daniel

[AMD Official Use Only]

Hi all,
 
This week this patchset was tested on the following systems:
 
Lenovo Thinkpad T14s Gen2 with AMD Ryzen 5 5650U, with the following display 
types: eDP 1080p 60hz, 4k 60hz  (via USB-C to DP/HDMI), 1440p 144hz (via USB-C 
to DP/HDMI), 1680*1050 60hz (via USB-C to DP and then DP to DVI/VGA)
 
Sapphire Pulse RX5700XT with the following display types:
4k 60hz  (via DP/HDMI), 1440p 144hz (via DP/HDMI), 1680*1050 60hz (via DP to 
DVI/VGA)
 
Reference AMD RX6800 with the following display types:
4k 60hz  (via DP/HDMI and USB-C to DP/HDMI), 1440p 144hz (via USB-C to DP/HDMI 
and USB-C to DP/HDMI), 1680*1050 60hz (via DP to DVI/VGA)
 
Included testing using a Startech DP 1.4 MST hub at 2x 4k 60hz, and 3x 1080p 
60hz on all systems. Also tested DSC via USB-C to DP DSC Hub with 3x 4k 60hz on 
Ryzen 9 5900h and Ryzen 5 4500u.
 
Tested on Ubuntu 20.04.3 with Kernel Version 5.13 
 
Tested-by: Daniel Wheeler 
 
 
Thank you,
 
Dan Wheeler
Technologist  |  AMD
SW Display
--
1 Commerce Valley Dr E, Thornhill, ON L3T 7X6

-Original Message-
From: Siqueira, Rodrigo  
Sent: December 17, 2021 4:24 PM
To: amd-gfx@lists.freedesktop.org
Cc: Wentland, Harry ; Li, Sun peng (Leo) 
; Lakha, Bhawanpreet ; Siqueira, 
Rodrigo ; Pillai, Aurabindo 
; Zhuo, Qingqing (Lillian) ; 
Lipski, Mikita ; Li, Roman ; Lin, 
Wayne ; Wang, Chao-kai (Stylon) ; Chiu, 
Solomon ; Kotarac, Pavle ; 
Gutierrez, Agustin ; Wheeler, Daniel 

Subject: [PATCH 00/19] DC Patches December 17, 2021

This DC patchset brings improvements in multiple areas. In summary, we
highlight:

- Fixes and improvements in the LTTPR code
- Improve z-state
- Fix null pointer check
- Improve communication with s0i2
- Update multiple-display split policy
- Add missing registers

Cc: Daniel Wheeler 

Thanks
Siqueira

Alvin Lee (1):
  drm/amd/display: Fix check for null function ptr

Angus Wang (1):
  drm/amd/display: Changed pipe split policy to allow for multi-display
pipe split

Anthony Koo (1):
  drm/amd/display: [FW Promotion] Release 0.0.98

Aric Cyr (1):
  drm/amd/display: 3.2.167

Charlene Liu (1):
  drm/amd/display: fix B0 TMDS deepcolor no dislay issue

George Shen (2):
  drm/amd/display: Limit max link cap with LTTPR caps
  drm/amd/display: Remove CR AUX RD Interval limit for LTTPR

Lai, Derek (1):
  drm/amd/display: Added power down for DCN10

Martin Leung (1):
  drm/amd/display: Undo ODM combine

Nicholas Kazlauskas (3):
  drm/amd/display: Block z-states when stutter period exceeds criteria
  drm/amd/display: Send s0i2_rdy in stream_count == 0 optimization
  drm/amd/display: Set optimize_pwr_state for DCN31

Shen, George (1):
  drm/amd/display: Refactor vendor specific link training sequence

Wenjing Liu (5):
  drm/amd/display: define link res and make it accessible to all link
interfaces
  drm/amd/display: populate link res in both detection and validation
  drm/amd/display: access hpo dp link encoder only through link resource
  drm/amd/display: support dynamic HPO DP link encoder allocation
  drm/amd/display: get and restore link res map

Wesley Chalmers (1):
  drm/amd/display: Add reg defs for DCN303

 .../display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c  |   1 +
 drivers/gpu/drm/amd/display/dc/core/dc.c  |  18 -
 .../gpu/drm/amd/display/dc/core/dc_debug.c|   2 +
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 234 +---
 .../gpu/drm/amd/display/dc/core/dc_link_dp.c  | 501 +++---
 .../drm/amd/display/dc/core/dc_link_dpia.c|  48 +-
 .../drm/amd/display/dc/core/dc_link_hwss.c|  63 ++-
 .../gpu/drm/amd/display/dc/core/dc_resource.c | 199 ---
 drivers/gpu/drm/amd/display/dc/dc.h   |   3 +-
 drivers/gpu/drm/amd/display/dc/dc_link.h  |  15 +-
 .../amd/display/dc/dcn10/dcn10_hw_sequencer.c |  14 +-
 .../gpu/drm/amd/display/dc/dcn10/dcn10_init.c |   1 +
 .../drm/amd/display/dc/dcn20/dcn20_hwseq.c|   2 +-
 .../drm/amd/display/dc/dcn20/dcn20_resource.c |   5 +-
 .../amd/display/dc/dcn201/dcn201_resource.c   |   2 +-
 .../drm/amd/display/dc/dcn21/dcn21_resource.c |   2 +-
 .../drm/amd/display/dc/dcn30/dcn30_resource.c |  13 +-
 .../amd/display/dc/dcn301/dcn301_resource.c   |   2 +-
 .../amd/display/dc/dcn302/dcn302_resource.c   |   2 +-
 .../drm/amd/display/dc/dcn303/dcn303_dccg.h   |  20 +-
 .../amd/display/dc/dcn303/dcn303_resource.c   |   2 +-
 .../dc/dcn31/dcn31_hpo_dp_link_encoder.c  |   6 +-
 .../dc/dcn31/dcn31_hpo_dp_link_encoder.h  |   3 +-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_init.c |   1 +
 .../drm/amd/display/dc/dcn31/dcn31_resource.c |  27 +-
 .../drm/amd/display/dc/dcn31/dcn31_resource.h |  31 ++
 .../gpu/drm/amd/display/dc/dml/dml_wrapper.c  |   2 +-
 .../gpu/drm/amd/display/dc/inc/core_status.h  |   2 +
 .../gpu/drm/amd/display/dc/inc/core_types.h   |  17 +
 .../gpu/drm/amd/display/dc/inc/dc_link_dp.h   |  15 +-

Re: [PATCH 10/19] drm/amd/display: Changed pipe split policy to allow for multi-display pipe split

2021-12-17 Thread Deucher, Alexander

[AMD Official Use Only]

Maybe add Bug links for:
https://gitlab.freedesktop.org/drm/amd/-/issues/1522
https://gitlab.freedesktop.org/drm/amd/-/issues/1709
https://gitlab.freedesktop.org/drm/amd/-/issues/1655
https://gitlab.freedesktop.org/drm/amd/-/issues/1403





From: amd-gfx  on behalf of Rodrigo 
Siqueira 
Sent: Friday, December 17, 2021 4:23 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Wang, Chao-kai (Stylon) ; Cyr, Aric 
; Li, Sun peng (Leo) ; Wentland, Harry 
; Zhuo, Qingqing (Lillian) ; 
Siqueira, Rodrigo ; Li, Roman ; 
Chiu, Solomon ; Pillai, Aurabindo 
; Wang, Angus ; Lin, Wayne 
; Lipski, Mikita ; Lakha, Bhawanpreet 
; Gutierrez, Agustin ; 
Kotarac, Pavle 
Subject: [PATCH 10/19] drm/amd/display: Changed pipe split policy to allow for 
multi-display pipe split

From: Angus Wang 

[WHY]
Current implementation of pipe split policy prevents pipe split with
multiple displays connected, which caused the MCLK speed to be stuck at
max

[HOW]
Changed the pipe split policies so that pipe split is allowed for
multi-display configurations

Reviewed-by: Aric Cyr 
Acked-by: Rodrigo Siqueira 
Signed-off-by: Angus Wang 
---
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c   | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c   | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c   | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c   | 2 +-
 8 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index 2a72517e2b28..2bc93df023ad 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -1069,7 +1069,7 @@ static const struct dc_debug_options debug_defaults_drv = 
{
 .timing_trace = false,
 .clock_trace = true,
 .disable_pplib_clock_request = true,
-   .pipe_split_policy = MPC_SPLIT_AVOID_MULT_DISP,
+   .pipe_split_policy = MPC_SPLIT_DYNAMIC,
 .force_single_disp_pipe_split = false,
 .disable_dcc = DCC_ENABLE,
 .vsr_support = true,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c
index d6acf9a8590a..0bb7d3dd53fa 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c
@@ -603,7 +603,7 @@ static const struct dc_debug_options debug_defaults_drv = {
 .timing_trace = false,
 .clock_trace = true,
 .disable_pplib_clock_request = true,
-   .pipe_split_policy = MPC_SPLIT_AVOID,
+   .pipe_split_policy = MPC_SPLIT_DYNAMIC,
 .force_single_disp_pipe_split = false,
 .disable_dcc = DCC_ENABLE,
 .vsr_support = true,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
index ca1bbc942fd4..e5cc6bf45743 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
@@ -873,7 +873,7 @@ static const struct dc_debug_options debug_defaults_drv = {
 .clock_trace = true,
 .disable_pplib_clock_request = true,
 .min_disp_clk_khz = 10,
-   .pipe_split_policy = MPC_SPLIT_AVOID_MULT_DISP,
+   .pipe_split_policy = MPC_SPLIT_DYNAMIC,
 .force_single_disp_pipe_split = false,
 .disable_dcc = DCC_ENABLE,
 .vsr_support = true,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
index 369ceeeddc7e..e12660c609ee 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
@@ -840,7 +840,7 @@ static const struct dc_debug_options debug_defaults_drv = {
 .timing_trace = false,
 .clock_trace = true,
 .disable_pplib_clock_request = true,
-   .pipe_split_policy = MPC_SPLIT_AVOID_MULT_DISP,
+   .pipe_split_policy = MPC_SPLIT_DYNAMIC,
 .force_single_disp_pipe_split = false,
 .disable_dcc = DCC_ENABLE,
 .vsr_support = true,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
index b4001233867c..c1c6e602b06c 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
@@ -686,7 +686,7

[PATCH 19/19] drm/amd/display: get and restore link res map

2021-12-17 Thread Rodrigo Siqueira

From: Wenjing Liu 

[why]
The link res map should persist across a reboot. During boot-up, the
driver looks at the map to determine which link should take priority
for a given link res. This ensures that link res allocation remains
unshuffled after a reboot.
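
For readers following along, the snapshot/restore idea can be sketched
outside the driver as a plain bitmask round trip. This is only an
illustration under stated assumptions: every toy_* name, the shift/mask
values and the struct fields are hypothetical stand-ins, not the
driver's definitions, and only the first restore loop visible in this
patch is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the driver's real definitions. */
#define TOY_MAX_LINKS      6
#define TOY_REC_MAP_SHIFT  0
#define TOY_REC_MAP_MASK   0xFFFFu

struct toy_link {
	bool active;           /* link currently enabled */
	bool rx_supports_128b; /* sink reports 128b/132b capability */
	bool using_128b;       /* current link settings use 128b/132b */
	bool connected;        /* something is plugged in */
	bool verified_128b;    /* verified capability includes 128b/132b */
};

/* Snapshot: set bit i when link i could use an encoder but does not. */
uint32_t toy_get_link_res_map(const struct toy_link *links, unsigned int n)
{
	uint32_t recycle_map = 0;
	unsigned int i;

	for (i = 0; i < n && i < TOY_MAX_LINKS; i++) {
		if (links[i].active && links[i].rx_supports_128b &&
		    !links[i].using_128b)
			recycle_map |= (1u << i);
	}
	return (recycle_map << TOY_REC_MAP_SHIFT) & TOY_REC_MAP_MASK;
}

/*
 * Restore: links whose bit is clear keep 128b/132b support only while
 * encoders remain; the rest are downgraded. Links whose bit is set are
 * skipped here (their handling is not part of this sketch).
 */
void toy_restore_link_res_map(struct toy_link *links, unsigned int n,
			      uint32_t map, unsigned int encoders)
{
	uint32_t recycle_map = (map & TOY_REC_MAP_MASK) >> TOY_REC_MAP_SHIFT;
	unsigned int i;

	for (i = 0; i < n && i < TOY_MAX_LINKS; i++) {
		if (recycle_map & (1u << i))
			continue;
		if (!links[i].connected || !links[i].verified_128b)
			continue;
		if (encoders > 0)
			encoders--;
		else
			links[i].verified_128b = false;
	}
}
```

The point of the bitmask is that it is small enough for DM to keep in
persistent storage, while still recording which links lost out under
the first come, first served policy.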

Reviewed-by: Jun Lei 
Acked-by: Rodrigo Siqueira 
Signed-off-by: Wenjing Liu 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 103 ++
 drivers/gpu/drm/amd/display/dc/dc_link.h  |   4 +
 .../gpu/drm/amd/display/dc/inc/core_types.h   |   5 +
 3 files changed, 112 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index c5d3e2417ef6..ee3c1c9eac4a 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -4890,3 +4890,106 @@ const struct link_resource 
*dc_link_get_cur_link_res(const struct dc_link *link)
 
return link_res;
 }
+
+/**
+ * dc_get_cur_link_res_map() - take a snapshot of current link resource allocation state
+ * @dc: pointer to dc of the dm calling this
+ * @map: a dc link resource snapshot defined internally to dc.
+ *
+ * DM needs to capture a snapshot of current link resource allocation mapping
+ * and store it in its persistent storage.
+ *
+ * Some of the link resource is using first come first serve policy.
+ * The allocation mapping depends on original hotplug order. This information
+ * is lost after driver is loaded next time. The snapshot is used in order to
+ * restore link resource to its previous state so user will get consistent
+ * link capability allocation across reboot.
+ *
+ * Return: none (void function)
+ *
+ */
+void dc_get_cur_link_res_map(const struct dc *dc, uint32_t *map)
+{
+#if defined(CONFIG_DRM_AMD_DC_DCN)
+   struct dc_link *link;
+   uint8_t i;
+   uint32_t hpo_dp_recycle_map = 0;
+
+   *map = 0;
+
+   if (dc->caps.dp_hpo) {
+   for (i = 0; i < dc->caps.max_links; i++) {
+   link = dc->links[i];
+   if (link->link_status.link_active &&
+   dp_get_link_encoding_format(&link->reported_link_cap) == DP_128b_132b_ENCODING &&
+   dp_get_link_encoding_format(&link->cur_link_settings) != DP_128b_132b_ENCODING)
+   /* hpo dp link encoder is considered as recycled, when RX reports 128b/132b encoding capability
+    * but current link doesn't use it.
+    */
+   hpo_dp_recycle_map |= (1 << i);
+   }
+   *map |= (hpo_dp_recycle_map << LINK_RES_HPO_DP_REC_MAP__SHIFT);
+   }
+#endif
+}
+
+/**
+ * dc_restore_link_res_map() - restore link resource allocation state from a snapshot
+ * @dc: pointer to dc of the dm calling this
+ * @map: a dc link resource snapshot defined internally to dc.
+ *
+ * DM needs to call this function after initial link detection on boot and
+ * before first commit streams to restore link resource allocation state
+ * from previous boot session.
+ *
+ * Some of the link resource is using first come first serve policy.
+ * The allocation mapping depends on original hotplug order. This information
+ * is lost after driver is loaded next time. The snapshot is used in order to
+ * restore link resource to its previous state so user will get consistent
+ * link capability allocation across reboot.
+ *
+ * Return: none (void function)
+ *
+ */
+void dc_restore_link_res_map(const struct dc *dc, uint32_t *map)
+{
+#if defined(CONFIG_DRM_AMD_DC_DCN)
+   struct dc_link *link;
+   uint8_t i;
+   unsigned int available_hpo_dp_count;
+   uint32_t hpo_dp_recycle_map = (*map & LINK_RES_HPO_DP_REC_MAP__MASK)
+   >> LINK_RES_HPO_DP_REC_MAP__SHIFT;
+
+   if (dc->caps.dp_hpo) {
+   available_hpo_dp_count = dc->res_pool->hpo_dp_link_enc_count;
+   /* remove excess 128b/132b encoding support for not recycled 
links */
+   for (i = 0; i < dc->caps.max_links; i++) {
+   if ((hpo_dp_recycle_map & (1 << i)) == 0) {
+   link = dc->links[i];
+   if (link->type != dc_connection_none &&
+   dp_get_link_encoding_format(&link->verified_link_cap) == DP_128b_132b_ENCODING) {
+   if (available_hpo_dp_count > 0)
+   available_hpo_dp_count--;
+   else
+   /* remove 128b/132b encoding capability by limiting verified link rate to HBR3 */
+   link->verified_link_cap.link_rate = LINK_RATE_HIGH3;
+   }
+   }
+   }
+   /* remove excess 128b/132b

[PATCH 18/19] drm/amd/display: support dynamic HPO DP link encoder allocation

2021-12-17 Thread Rodrigo Siqueira

From: Wenjing Liu 

[why]
When there are more DP2.0 RXs connected than the number of HPO DP link
encoders we have, we need to dynamically allocate an HPO DP link encoder
to the port that needs it.

[how]
Only allocate HPO DP link encoder when it is needed.
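
As an illustration only, on-demand allocation from a small fixed pool
can be sketched as below. The toy_* names, the pool size and the
ownership array are hypothetical and chosen for the example; the
driver's own resource helpers are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical pool size and sentinel; not taken from the driver. */
#define TOY_NUM_HPO_ENC 2
#define TOY_NO_ENC      (-1)

struct toy_enc_pool {
	int owner[TOY_NUM_HPO_ENC]; /* owning link index, or TOY_NO_ENC */
};

void toy_pool_init(struct toy_enc_pool *pool)
{
	int i;

	for (i = 0; i < TOY_NUM_HPO_ENC; i++)
		pool->owner[i] = TOY_NO_ENC;
}

/*
 * Return the encoder index assigned to link_idx, or TOY_NO_ENC.
 * Nothing is allocated unless the link actually needs 128b/132b.
 */
int toy_acquire_enc(struct toy_enc_pool *pool, int link_idx, bool needs_128b)
{
	int i, free_idx = TOY_NO_ENC;

	if (!needs_128b)
		return TOY_NO_ENC;

	for (i = 0; i < TOY_NUM_HPO_ENC; i++) {
		if (pool->owner[i] == link_idx)
			return i; /* already assigned */
		if (pool->owner[i] == TOY_NO_ENC && free_idx == TOY_NO_ENC)
			free_idx = i;
	}
	if (free_idx != TOY_NO_ENC)
		pool->owner[free_idx] = link_idx;
	return free_idx;
}

/* Give back whatever encoder link_idx holds, if any. */
void toy_release_enc(struct toy_enc_pool *pool, int link_idx)
{
	int i;

	for (i = 0; i < TOY_NUM_HPO_ENC; i++) {
		if (pool->owner[i] == link_idx)
			pool->owner[i] = TOY_NO_ENC;
	}
}
```

With two encoders and four links, the third link that asks gets
TOY_NO_ENC until an earlier owner releases its encoder, which is the
situation the new DC_NO_LINK_ENC_RESOURCE status appears intended to
report.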

Reviewed-by: Jun Lei 
Acked-by: Rodrigo Siqueira 
Signed-off-by: Wenjing Liu 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c  |  18 ---
 .../gpu/drm/amd/display/dc/core/dc_debug.c|   2 +
 drivers/gpu/drm/amd/display/dc/core/dc_link.c |  43 ++-
 .../drm/amd/display/dc/core/dc_link_hwss.c|   3 +-
 .../gpu/drm/amd/display/dc/core/dc_resource.c | 119 --
 drivers/gpu/drm/amd/display/dc/dc_link.h  |   3 -
 .../dc/dcn31/dcn31_hpo_dp_link_encoder.c  |   6 +-
 .../dc/dcn31/dcn31_hpo_dp_link_encoder.h  |   3 +-
 .../gpu/drm/amd/display/dc/inc/core_status.h  |   2 +
 .../gpu/drm/amd/display/dc/inc/core_types.h   |   2 +
 .../drm/amd/display/dc/inc/hw/link_encoder.h  |   3 +-
 drivers/gpu/drm/amd/display/dc/inc/resource.h |   6 +-
 12 files changed, 134 insertions(+), 76 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index c250f6de5136..91c4874473d6 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -274,24 +274,6 @@ static bool create_links(
goto failed_alloc;
}
 
-#if defined(CONFIG_DRM_AMD_DC_DCN)
-   if (IS_FPGA_MAXIMUS_DC(dc->ctx->dce_environment) &&
-   dc->caps.dp_hpo &&
-   
link->dc->res_pool->res_cap->num_hpo_dp_link_encoder > 0) {
-   /* FPGA case - Allocate HPO DP link encoder */
-   if (i < 
link->dc->res_pool->res_cap->num_hpo_dp_link_encoder) {
-   link->hpo_dp_link_enc = 
link->dc->res_pool->hpo_dp_link_enc[i];
-
-   if (link->hpo_dp_link_enc == NULL) {
-   BREAK_TO_DEBUGGER();
-   goto failed_alloc;
-   }
-   link->hpo_dp_link_enc->hpd_source = 
link->link_enc->hpd_source;
-   link->hpo_dp_link_enc->transmitter = 
link->link_enc->transmitter;
-   }
-   }
-#endif
-
link->link_status.dpcd_caps = &link->dpcd_caps;
 
enc_init.ctx = dc->ctx;
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_debug.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_debug.c
index 21be2a684393..643762542e4d 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_debug.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_debug.c
@@ -422,6 +422,8 @@ char *dc_status_to_str(enum dc_status status)
return "The operation is not supported.";
case DC_UNSUPPORTED_VALUE:
return "The value specified is not supported.";
+   case DC_NO_LINK_ENC_RESOURCE:
+   return "No link encoder resource";
case DC_ERROR_UNEXPECTED:
return "Unexpected error";
}
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index 9197dd73c6d2..c5d3e2417ef6 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -66,31 +66,6 @@
 
/***
  * Private functions
  
**/
-#if defined(CONFIG_DRM_AMD_DC_DCN)
-static bool add_dp_hpo_link_encoder_to_link(struct dc_link *link)
-{
-   struct hpo_dp_link_encoder *enc = 
resource_get_unused_hpo_dp_link_encoder(
-   link->dc->res_pool);
-
-   if (!link->hpo_dp_link_enc && enc) {
-   link->hpo_dp_link_enc = enc;
-   link->hpo_dp_link_enc->transmitter = 
link->link_enc->transmitter;
-   link->hpo_dp_link_enc->hpd_source = link->link_enc->hpd_source;
-   }
-
-   return (link->hpo_dp_link_enc != NULL);
-}
-
-static void remove_dp_hpo_link_encoder_from_link(struct dc_link *link)
-{
-   if (link->hpo_dp_link_enc) {
-   link->hpo_dp_link_enc->hpd_source = HPD_SOURCEID_UNKNOWN;
-   link->hpo_dp_link_enc->transmitter = TRANSMITTER_UNKNOWN;
-   link->hpo_dp_link_enc = NULL;
-   }
-}
-#endif
-
 static void dc_link_destruct(struct dc_link *link)
 {
int i;
@@ -118,12 +93,6 @@ static void dc_link_destruct(struct dc_link *link)
link->link_enc->funcs->destroy(&link->link_enc);
}
 
-#if defined(CONFIG_DRM_AMD_DC_DCN)
-   if (link->hpo_dp_link_enc) {
-   remove_dp_hpo_link_encoder_from_link(link);
-   }
-#endif
-
if (link->local_sink)
dc_sink_release(link->local_sink);
 
@@ -975,10 +944,11 @@ static bool

[PATCH 16/19] drm/amd/display: populate link res in both detection and validation

2021-12-17 Thread Rodrigo Siqueira

From: Wenjing Liu 

[why]
This commit populates link res in preparation for the next commit,
which will replace all existing code to use link res instead.

Reviewed-by: Jun Lei 
Acked-by: Rodrigo Siqueira 
Signed-off-by: Wenjing Liu 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 9 ++---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 1 +
 drivers/gpu/drm/amd/display/dc/inc/core_types.h   | 4 
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index 4130cd98f1ce..a394946ef513 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -881,6 +881,7 @@ static bool dc_link_detect_helper(struct dc_link *link,
enum dc_connection_type pre_connection_type = dc_connection_none;
bool perform_dp_seamless_boot = false;
const uint32_t post_oui_delay = 30; // 30ms
+   struct link_resource link_res = { 0 };
 
DC_LOGGER_INIT(link->ctx->logger);
 
@@ -974,8 +975,10 @@ static bool dc_link_detect_helper(struct dc_link *link,
}
 
 #if defined(CONFIG_DRM_AMD_DC_DCN)
-   if (dp_get_link_encoding_format(&link->reported_link_cap) == DP_128b_132b_ENCODING)
+   if (dp_get_link_encoding_format(&link->reported_link_cap) == DP_128b_132b_ENCODING) {
add_dp_hpo_link_encoder_to_link(link);
+   link_res.hpo_dp_link_enc = link->hpo_dp_link_enc;
+   }
 #endif
 
if (link->type == dc_connection_mst_branch) {
@@ -986,7 +989,7 @@ static bool dc_link_detect_helper(struct dc_link *link,
 * empty which leads to allocate_mst_payload() 
has "0"
 * pbn_per_slot value leading to exception on 
dc_fixpt_div()
 */
-   dp_verify_mst_link_cap(link, NULL);
+   dp_verify_mst_link_cap(link, &link_res);
 
/*
 * This call will initiate MST topology 
discovery. Which
@@ -1150,7 +1153,7 @@ static bool dc_link_detect_helper(struct dc_link *link,
// verify link cap for SST non-seamless boot
if (!perform_dp_seamless_boot)
dp_verify_link_cap_with_retries(link,
-   NULL,
+   &link_res,

&link->reported_link_cap,

LINK_TRAINING_MAX_VERIFY_RETRY);
} else {
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 0da692c9a543..60a9eb6e521f 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -2161,6 +2161,7 @@ enum dc_status resource_map_pool_resources(
>res_ctx, pool,
pipe_ctx->stream_res.hpo_dp_stream_enc,
true);
+   pipe_ctx->link_res.hpo_dp_link_enc = 
stream->link->hpo_dp_link_enc;
}
}
 #endif
diff --git a/drivers/gpu/drm/amd/display/dc/inc/core_types.h 
b/drivers/gpu/drm/amd/display/dc/inc/core_types.h
index 9381ea0549d8..0bd28a332fcb 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/core_types.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/core_types.h
@@ -336,7 +336,11 @@ struct plane_resource {
 
 /* all mappable hardware resources used to enable a link */
 struct link_resource {
+#if defined(CONFIG_DRM_AMD_DC_DCN)
+   struct hpo_dp_link_encoder *hpo_dp_link_enc;
+#else
void *dummy;
+#endif
 };
 
 union pipe_update_flags {
-- 
2.25.1

[PATCH 17/19] drm/amd/display: access hpo dp link encoder only through link resource

2021-12-17 Thread Rodrigo Siqueira

From: Wenjing Liu 

[why]
Update all accesses to use hpo dp link encoder through link resource
only.

Reviewed-by: Jun Lei 
Acked-by: Rodrigo Siqueira 
Signed-off-by: Wenjing Liu 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 22 +++---
 .../gpu/drm/amd/display/dc/core/dc_link_dp.c  | 12 +---
 .../drm/amd/display/dc/core/dc_link_hwss.c| 30 +--
 drivers/gpu/drm/amd/display/dc/dc.h   |  1 +
 .../amd/display/dc/dcn10/dcn10_hw_sequencer.c | 14 +++--
 .../drm/amd/display/dc/dcn20/dcn20_hwseq.c|  2 +-
 .../gpu/drm/amd/display/dc/dml/dml_wrapper.c  |  2 +-
 7 files changed, 40 insertions(+), 43 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index a394946ef513..9197dd73c6d2 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -3426,7 +3426,7 @@ static enum dc_status dc_link_update_sst_payload(struct 
pipe_ctx *pipe_ctx,
 {
struct dc_stream_state *stream = pipe_ctx->stream;
struct dc_link *link = stream->link;
-   struct hpo_dp_link_encoder *hpo_dp_link_encoder = link->hpo_dp_link_enc;
+   struct hpo_dp_link_encoder *hpo_dp_link_encoder = 
pipe_ctx->link_res.hpo_dp_link_enc;
struct hpo_dp_stream_encoder *hpo_dp_stream_encoder = 
pipe_ctx->stream_res.hpo_dp_stream_enc;
struct link_mst_stream_allocation_table proposed_table = {0};
struct fixed31_32 avg_time_slots_per_mtp;
@@ -3508,7 +3508,7 @@ enum dc_status dc_link_allocate_mst_payload(struct 
pipe_ctx *pipe_ctx)
struct link_encoder *link_encoder = NULL;
struct stream_encoder *stream_encoder = pipe_ctx->stream_res.stream_enc;
 #if defined(CONFIG_DRM_AMD_DC_DCN)
-   struct hpo_dp_link_encoder *hpo_dp_link_encoder = link->hpo_dp_link_enc;
+   struct hpo_dp_link_encoder *hpo_dp_link_encoder = 
pipe_ctx->link_res.hpo_dp_link_enc;
struct hpo_dp_stream_encoder *hpo_dp_stream_encoder = 
pipe_ctx->stream_res.hpo_dp_stream_enc;
 #endif
struct dp_mst_stream_allocation_table proposed_table = {0};
@@ -3838,7 +3838,7 @@ static enum dc_status deallocate_mst_payload(struct 
pipe_ctx *pipe_ctx)
struct link_encoder *link_encoder = NULL;
struct stream_encoder *stream_encoder = pipe_ctx->stream_res.stream_enc;
 #if defined(CONFIG_DRM_AMD_DC_DCN)
-   struct hpo_dp_link_encoder *hpo_dp_link_encoder = link->hpo_dp_link_enc;
+   struct hpo_dp_link_encoder *hpo_dp_link_encoder = 
pipe_ctx->link_res.hpo_dp_link_enc;
struct hpo_dp_stream_encoder *hpo_dp_stream_encoder = 
pipe_ctx->stream_res.hpo_dp_stream_enc;
 #endif
struct dp_mst_stream_allocation_table proposed_table = {0};
@@ -4164,12 +4164,12 @@ static void fpga_dp_hpo_enable_link_and_stream(struct 
dc_state *state, struct pi
proposed_table.stream_allocations[0].hpo_dp_stream_enc = 
pipe_ctx->stream_res.hpo_dp_stream_enc;
}
 
-   stream->link->hpo_dp_link_enc->funcs->update_stream_allocation_table(
-   stream->link->hpo_dp_link_enc,
+   pipe_ctx->link_res.hpo_dp_link_enc->funcs->update_stream_allocation_table(
+   pipe_ctx->link_res.hpo_dp_link_enc,
&proposed_table);
 
-   stream->link->hpo_dp_link_enc->funcs->set_throttled_vcp_size(
-   stream->link->hpo_dp_link_enc,
+   pipe_ctx->link_res.hpo_dp_link_enc->funcs->set_throttled_vcp_size(
+   pipe_ctx->link_res.hpo_dp_link_enc,
pipe_ctx->stream_res.hpo_dp_stream_enc->inst,
avg_time_slots_per_mtp);
 
@@ -4674,11 +4674,9 @@ void dc_link_set_preferred_training_settings(struct dc 
*dc,
if (link_setting != NULL) {
link->preferred_link_setting = *link_setting;
 #if defined(CONFIG_DRM_AMD_DC_DCN)
-   if (dp_get_link_encoding_format(link_setting) ==
-   DP_128b_132b_ENCODING && 
!link->hpo_dp_link_enc) {
-   if (!add_dp_hpo_link_encoder_to_link(link))
-   memset(&link->preferred_link_setting, 0, sizeof(link->preferred_link_setting));
-   }
+   if (dp_get_link_encoding_format(link_setting) == 
DP_128b_132b_ENCODING)
+   /* TODO: add dc update for acquiring link res  */
+   skip_immediate_retrain = true;
 #endif
} else {
link->preferred_link_setting.lane_count = LANE_COUNT_UNKNOWN;
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index 98835d6c9036..05e216524370 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -3211,9 +3211,11 @@ static struct dc_link_settings get_max_link_cap(struct 
dc_link *link,
if (link_enc)
link_enc->funcs->get_max_link_cap(link_enc,

[PATCH 10/19] drm/amd/display: Changed pipe split policy to allow for multi-display pipe split

2021-12-17 Thread Rodrigo Siqueira

From: Angus Wang 

[WHY]
Current implementation of pipe split policy prevents pipe split with
multiple displays connected, which caused the MCLK speed to be stuck at
max

[HOW]
Changed the pipe split policies so that pipe split is allowed for
multi-display configurations
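
A rough way to picture the effect of the policy change is the toy
predicate below. It is a simplified sketch, not the driver's validation
logic: the toy_* names are hypothetical, and the real decision also
depends on checks (pipe availability, bandwidth, workarounds) that are
not modelled here.

```c
#include <stdbool.h>

/* Hypothetical mirror of the three policies named in the diff. */
enum toy_split_policy {
	TOY_SPLIT_DYNAMIC,         /* let validation decide per mode */
	TOY_SPLIT_AVOID,           /* never split */
	TOY_SPLIT_AVOID_MULT_DISP, /* split only with a single display */
};

/* Is pipe split permitted at all for this many active displays? */
bool toy_may_split(enum toy_split_policy policy, unsigned int displays)
{
	switch (policy) {
	case TOY_SPLIT_AVOID:
		return false;
	case TOY_SPLIT_AVOID_MULT_DISP:
		return displays <= 1;
	case TOY_SPLIT_DYNAMIC:
	default:
		return true;
	}
}
```

Under this simplified model, moving the defaults to the dynamic policy
is what makes toy_may_split() return true with two or more displays.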

Reviewed-by: Aric Cyr 
Acked-by: Rodrigo Siqueira 
Signed-off-by: Angus Wang 
---
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c   | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c   | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c   | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c   | 2 +-
 8 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index 2a72517e2b28..2bc93df023ad 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -1069,7 +1069,7 @@ static const struct dc_debug_options debug_defaults_drv = 
{
.timing_trace = false,
.clock_trace = true,
.disable_pplib_clock_request = true,
-   .pipe_split_policy = MPC_SPLIT_AVOID_MULT_DISP,
+   .pipe_split_policy = MPC_SPLIT_DYNAMIC,
.force_single_disp_pipe_split = false,
.disable_dcc = DCC_ENABLE,
.vsr_support = true,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c
index d6acf9a8590a..0bb7d3dd53fa 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_resource.c
@@ -603,7 +603,7 @@ static const struct dc_debug_options debug_defaults_drv = {
.timing_trace = false,
.clock_trace = true,
.disable_pplib_clock_request = true,
-   .pipe_split_policy = MPC_SPLIT_AVOID,
+   .pipe_split_policy = MPC_SPLIT_DYNAMIC,
.force_single_disp_pipe_split = false,
.disable_dcc = DCC_ENABLE,
.vsr_support = true,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
index ca1bbc942fd4..e5cc6bf45743 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
@@ -873,7 +873,7 @@ static const struct dc_debug_options debug_defaults_drv = {
.clock_trace = true,
.disable_pplib_clock_request = true,
.min_disp_clk_khz = 10,
-   .pipe_split_policy = MPC_SPLIT_AVOID_MULT_DISP,
+   .pipe_split_policy = MPC_SPLIT_DYNAMIC,
.force_single_disp_pipe_split = false,
.disable_dcc = DCC_ENABLE,
.vsr_support = true,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
index 369ceeeddc7e..e12660c609ee 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
@@ -840,7 +840,7 @@ static const struct dc_debug_options debug_defaults_drv = {
.timing_trace = false,
.clock_trace = true,
.disable_pplib_clock_request = true,
-   .pipe_split_policy = MPC_SPLIT_AVOID_MULT_DISP,
+   .pipe_split_policy = MPC_SPLIT_DYNAMIC,
.force_single_disp_pipe_split = false,
.disable_dcc = DCC_ENABLE,
.vsr_support = true,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
index b4001233867c..c1c6e602b06c 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
@@ -686,7 +686,7 @@ static const struct dc_debug_options debug_defaults_drv = {
.disable_clock_gate = true,
.disable_pplib_clock_request = true,
.disable_pplib_wm_range = true,
-   .pipe_split_policy = MPC_SPLIT_AVOID,
+   .pipe_split_policy = MPC_SPLIT_DYNAMIC,
.force_single_disp_pipe_split = false,
.disable_dcc = DCC_ENABLE,
.vsr_support = true,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
index 003e95368672..2e9cbfa7663b 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
@@ -211,7 +211,7 @@ static const struct dc_debug_options debug_defaults_drv = {
.timing_trace = false,
.clock_trace = true,

[PATCH 14/19] drm/amd/display: 3.2.167

2021-12-17 Thread Rodrigo Siqueira

From: Aric Cyr 

This version brings along the following:

- Fixes and improvements in the LTTPR code
- Improve z-state
- Fix null pointer check
- Improve communication with s0i2
- Update multiple-display split policy
- Add missing registers

Acked-by: Rodrigo Siqueira 
Signed-off-by: Aric Cyr 
---
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc.h 
b/drivers/gpu/drm/amd/display/dc/dc.h
index 18e59d635ca2..1be74d6223df 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -47,7 +47,7 @@ struct aux_payload;
 struct set_config_cmd_payload;
 struct dmub_notification;
 
-#define DC_VER "3.2.166"
+#define DC_VER "3.2.167"
 
 #define MAX_SURFACES 3
 #define MAX_PLANES 6
-- 
2.25.1

[PATCH 15/19] drm/amd/display: define link res and make it accessible to all link interfaces

2021-12-17 Thread Rodrigo Siqueira

From: Wenjing Liu 

[why]
There will be a series of re-arch changes in Link Resource Management.
There are more and more muxable link resource objects, and the resources
are insufficient for a one-to-one allocation to all links created.
Therefore a link resource sharing logic is required to determine which
link should use a certain link resource.

This commit is the first one in this series. It starts by defining a
link resource struct, which will be available to all interfaces
that need to perform a link programming sequence.

In later commits, we will gradually decouple link resource objects out
of dc link. Instead of accessing a link resource from dc link, the
current link's resource will be accessible through pipe_ctx->link_res
during commit, or by calling the dc_link_get_cur_link_res function with
the current link passed in after commit.

Reviewed-by: Jun Lei 
Acked-by: Rodrigo Siqueira 
Signed-off-by: Wenjing Liu 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c |  69 +---
 .../gpu/drm/amd/display/dc/core/dc_link_dp.c  | 152 +++---
 .../drm/amd/display/dc/core/dc_link_dpia.c|  48 --
 .../drm/amd/display/dc/core/dc_link_hwss.c|  30 ++--
 drivers/gpu/drm/amd/display/dc/dc_link.h  |   8 +
 .../gpu/drm/amd/display/dc/inc/core_types.h   |   6 +
 .../gpu/drm/amd/display/dc/inc/dc_link_dp.h   |  15 +-
 .../gpu/drm/amd/display/dc/inc/dc_link_dpia.h |   5 +-
 .../gpu/drm/amd/display/dc/inc/link_hwss.h|  10 +-
 9 files changed, 229 insertions(+), 114 deletions(-)
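
The calling convention described in the commit message can be sketched in
isolation. This is only an illustration under stated assumptions: every
demo_* name is hypothetical, and the hpo_encoder_id member is invented for
the example (the contents of the real link resource struct are not shown
in this message). What it mirrors is the shape of the change: link
interfaces take an extra const link-resource pointer, commit-time callers
pass the one stored in the pipe context, and detection-time callers pass
NULL.

```c
#include <stddef.h>

/* Hypothetical link resource; the member is invented for illustration. */
struct demo_link_resource {
	int hpo_encoder_id;
};

struct demo_link {
	int link_index;
};

/* During commit, the pipe context carries the link's resource. */
struct demo_pipe_ctx {
	struct demo_link *link;
	struct demo_link_resource link_res;
};

/*
 * A link interface that receives the resource explicitly instead of
 * reaching into the link object. Returns the encoder id to program,
 * or -1 when the caller has no resource to offer (link_res == NULL).
 */
int demo_encoder_for(struct demo_link *link,
		     const struct demo_link_resource *link_res)
{
	(void)link;
	return link_res ? link_res->hpo_encoder_id : -1;
}
```

Passing the resource as a parameter keeps the interface usable before a
resource has been assigned, which is why the detection paths in the diff
can pass NULL.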

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index 857941d83f1f..4130cd98f1ce 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -986,7 +986,7 @@ static bool dc_link_detect_helper(struct dc_link *link,
 * empty which leads to allocate_mst_payload() 
has "0"
 * pbn_per_slot value leading to exception on 
dc_fixpt_div()
 */
-   dp_verify_mst_link_cap(link);
+   dp_verify_mst_link_cap(link, NULL);
 
/*
 * This call will initiate MST topology 
discovery. Which
@@ -1150,6 +1150,7 @@ static bool dc_link_detect_helper(struct dc_link *link,
// verify link cap for SST non-seamless boot
if (!perform_dp_seamless_boot)
dp_verify_link_cap_with_retries(link,
+   NULL,

&link->reported_link_cap,

LINK_TRAINING_MAX_VERIFY_RETRY);
} else {
@@ -2503,7 +2504,8 @@ static void write_i2c_redriver_setting(
DC_LOG_DEBUG("Set redriver failed");
 }
 
-static void disable_link(struct dc_link *link, enum signal_type signal)
+static void disable_link(struct dc_link *link, const struct link_resource 
*link_res,
+   enum signal_type signal)
 {
/*
 * TODO: implement call for dp_set_hw_test_pattern
@@ -2522,20 +2524,20 @@ static void disable_link(struct dc_link *link, enum 
signal_type signal)
struct dc_link_settings link_settings = link->cur_link_settings;
 #endif
if (dc_is_dp_sst_signal(signal))
-   dp_disable_link_phy(link, signal);
+   dp_disable_link_phy(link, link_res, signal);
else
-   dp_disable_link_phy_mst(link, signal);
+   dp_disable_link_phy_mst(link, link_res, signal);
 
if (dc_is_dp_sst_signal(signal) ||
link->mst_stream_alloc_table.stream_count == 0) 
{
 #if defined(CONFIG_DRM_AMD_DC_DCN)
if (dp_get_link_encoding_format(&link_settings) == 
DP_8b_10b_ENCODING) {
dp_set_fec_enable(link, false);
-   dp_set_fec_ready(link, false);
+   dp_set_fec_ready(link, link_res, false);
}
 #else
dp_set_fec_enable(link, false);
-   dp_set_fec_ready(link, false);
+   dp_set_fec_ready(link, link_res, false);
 #endif
}
} else {
@@ -2646,7 +2648,7 @@ static enum dc_status enable_link(
 * new link settings.
 */
if (link->link_status.link_active) {
-   disable_link(link, pipe_ctx->stream->signal);
+   disable_link(link, &pipe_ctx->link_res, 
pipe_ctx->stream->signal);
}
 
switch (pipe_ctx->stream->signal) {
@@ -4109,7 +4111,7 @@ static void fpga_dp_hpo_enable_link_and_stream(struct 
dc_state *state, struct pi
stream->link->cur_link_settings = link_settings;

[PATCH 11/19] drm/amd/display: Add reg defs for DCN303

2021-12-17 Thread Rodrigo Siqueira

From: Wesley Chalmers 

[WHY]
These registers are currently missing from the DCN303 header files

Reviewed-by: George Shen 
Acked-by: Rodrigo Siqueira 
Signed-off-by: Wesley Chalmers 
---
 .../drm/amd/display/dc/dcn303/dcn303_dccg.h   | 20 +--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_dccg.h 
b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_dccg.h
index a79c54bbc899..294bd757bcb5 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_dccg.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_dccg.h
@@ -15,7 +15,11 @@
SR(DPPCLK_DTO_CTRL),\
DCCG_SRII(DTO_PARAM, DPPCLK, 0),\
DCCG_SRII(DTO_PARAM, DPPCLK, 1),\
-   SR(REFCLK_CNTL)
+   SR(REFCLK_CNTL),\
+   SR(DISPCLK_FREQ_CHANGE_CNTL),\
+   DCCG_SRII(PIXEL_RATE_CNTL, OTG, 0),\
+   DCCG_SRII(PIXEL_RATE_CNTL, OTG, 1)
+
 
 #define DCCG_MASK_SH_LIST_DCN3_03(mask_sh) \
DCCG_SFI(DPPCLK_DTO_CTRL, DTO_ENABLE, DPPCLK, 0, mask_sh),\
@@ -25,6 +29,18 @@
DCCG_SF(DPPCLK0_DTO_PARAM, DPPCLK0_DTO_PHASE, mask_sh),\
DCCG_SF(DPPCLK0_DTO_PARAM, DPPCLK0_DTO_MODULO, mask_sh),\
DCCG_SF(REFCLK_CNTL, REFCLK_CLOCK_EN, mask_sh),\
-   DCCG_SF(REFCLK_CNTL, REFCLK_SRC_SEL, mask_sh)
+   DCCG_SF(REFCLK_CNTL, REFCLK_SRC_SEL, mask_sh),\
+   DCCG_SF(DISPCLK_FREQ_CHANGE_CNTL, DISPCLK_STEP_DELAY, mask_sh),\
+   DCCG_SF(DISPCLK_FREQ_CHANGE_CNTL, DISPCLK_STEP_SIZE, mask_sh),\
+   DCCG_SF(DISPCLK_FREQ_CHANGE_CNTL, DISPCLK_FREQ_RAMP_DONE, 
mask_sh),\
+   DCCG_SF(DISPCLK_FREQ_CHANGE_CNTL, DISPCLK_MAX_ERRDET_CYCLES, 
mask_sh),\
+   DCCG_SF(DISPCLK_FREQ_CHANGE_CNTL, DCCG_FIFO_ERRDET_RESET, 
mask_sh),\
+   DCCG_SF(DISPCLK_FREQ_CHANGE_CNTL, DCCG_FIFO_ERRDET_STATE, 
mask_sh),\
+   DCCG_SF(DISPCLK_FREQ_CHANGE_CNTL, DCCG_FIFO_ERRDET_OVR_EN, 
mask_sh),\
+   DCCG_SF(DISPCLK_FREQ_CHANGE_CNTL, DISPCLK_CHG_FWD_CORR_DISABLE, 
mask_sh),\
+   DCCG_SFII(OTG, PIXEL_RATE_CNTL, OTG, ADD_PIXEL, 0, mask_sh),\
+   DCCG_SFII(OTG, PIXEL_RATE_CNTL, OTG, ADD_PIXEL, 1, mask_sh),\
+   DCCG_SFII(OTG, PIXEL_RATE_CNTL, OTG, DROP_PIXEL, 0, mask_sh),\
+   DCCG_SFII(OTG, PIXEL_RATE_CNTL, OTG, DROP_PIXEL, 1, mask_sh)
 
 #endif //__DCN303_DCCG_H__
-- 
2.25.1

[PATCH 13/19] drm/amd/display: [FW Promotion] Release 0.0.98

2021-12-17 Thread Rodrigo Siqueira

From: Anthony Koo 

Reviewed-by: Aric Cyr 
Acked-by: Rodrigo Siqueira 
Signed-off-by: Anthony Koo 
---
 drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h 
b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
index a4fd61609190..d18762e02509 100644
--- a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
+++ b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
@@ -47,10 +47,10 @@
 
 /* Firmware versioning. */
 #ifdef DMUB_EXPOSE_VERSION
-#define DMUB_FW_VERSION_GIT_HASH 0xc99a4517
+#define DMUB_FW_VERSION_GIT_HASH 0xbaf06b95
 #define DMUB_FW_VERSION_MAJOR 0
 #define DMUB_FW_VERSION_MINOR 0
-#define DMUB_FW_VERSION_REVISION 97
+#define DMUB_FW_VERSION_REVISION 98
 #define DMUB_FW_VERSION_TEST 0
 #define DMUB_FW_VERSION_VBIOS 0
 #define DMUB_FW_VERSION_HOTFIX 0
-- 
2.25.1

[PATCH 12/19] drm/amd/display: Undo ODM combine

2021-12-17 Thread Rodrigo Siqueira

From: Martin Leung 

Undo ODM Combine regression causing pipe allocation issues.

Reviewed-by: Aric Cyr 
Acked-by: Rodrigo Siqueira 
Signed-off-by: Martin Leung 
---
 .../gpu/drm/amd/display/dc/core/dc_resource.c | 81 +--
 .../drm/amd/display/dc/dcn30/dcn30_resource.c | 11 ---
 2 files changed, 21 insertions(+), 71 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 8b6b035bfa9c..0da692c9a543 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -734,10 +734,6 @@ static void calculate_split_count_and_index(struct 
pipe_ctx *pipe_ctx, int *spli
(*split_idx)++;
split_pipe = split_pipe->top_pipe;
}
-
-   /* MPO window on right side of ODM split */
-   if (split_pipe && split_pipe->prev_odm_pipe && 
!pipe_ctx->prev_odm_pipe)
-   (*split_idx)++;
} else {
/*Get odm split index*/
struct pipe_ctx *split_pipe = pipe_ctx->prev_odm_pipe;
@@ -784,11 +780,7 @@ static void calculate_recout(struct pipe_ctx *pipe_ctx)
/*
 * Only the leftmost ODM pipe should be offset by a nonzero distance
 */
-   if (pipe_ctx->top_pipe && pipe_ctx->top_pipe->prev_odm_pipe && 
!pipe_ctx->prev_odm_pipe) {
-   /* MPO window on right side of ODM split */
-   data->recout.x = stream->dst.x + (surf_clip.x - 
stream->dst.width/2) *
-   stream->dst.width / stream->src.width;
-   } else if (!pipe_ctx->prev_odm_pipe || split_idx == split_count) {
+   if (!pipe_ctx->prev_odm_pipe || split_idx == split_count) {
data->recout.x = stream->dst.x;
if (stream->src.x < surf_clip.x)
data->recout.x += (surf_clip.x - stream->src.x) * 
stream->dst.width
@@ -986,8 +978,6 @@ static void calculate_inits_and_viewports(struct pipe_ctx 
*pipe_ctx)
* stream->dst.height / stream->src.height;
if (pipe_ctx->prev_odm_pipe && split_idx)
ro_lb = data->h_active * split_idx - recout_full_x;
-   else if (pipe_ctx->top_pipe && pipe_ctx->top_pipe->prev_odm_pipe)
-   ro_lb = data->h_active * split_idx - recout_full_x + 
data->recout.x;
else
ro_lb = data->recout.x - recout_full_x;
ro_tb = data->recout.y - recout_full_y;
@@ -1086,9 +1076,6 @@ bool resource_build_scaling_params(struct pipe_ctx 
*pipe_ctx)
timing->v_border_top + timing->v_border_bottom;
if (pipe_ctx->next_odm_pipe || pipe_ctx->prev_odm_pipe)
pipe_ctx->plane_res.scl_data.h_active /= 
get_num_odm_splits(pipe_ctx) + 1;
-   /* ODM + windows MPO, where window is on either right or left ODM half 
*/
-   else if (pipe_ctx->top_pipe && (pipe_ctx->top_pipe->next_odm_pipe || 
pipe_ctx->top_pipe->prev_odm_pipe))
-   pipe_ctx->plane_res.scl_data.h_active /= 
get_num_odm_splits(pipe_ctx->top_pipe) + 1;
 
/* depends on h_active */
calculate_recout(pipe_ctx);
@@ -1097,6 +1084,11 @@ bool resource_build_scaling_params(struct pipe_ctx 
*pipe_ctx)
/* depends on scaling ratios and recout, does not calculate offset yet 
*/
calculate_viewport_size(pipe_ctx);
 
+   /* Stopgap for validation of ODM + MPO on one side of screen case */
+   if (pipe_ctx->plane_res.scl_data.viewport.height < 1 ||
+   pipe_ctx->plane_res.scl_data.viewport.width < 1)
+   return false;
+
/*
 * LB calculations depend on vp size, h/v_active and scaling ratios
 * Setting line buffer pixel depth to 24bpp yields banding
@@ -1445,54 +1437,23 @@ bool dc_add_plane_to_context(
if (head_pipe != free_pipe) {
tail_pipe = resource_get_tail_pipe(>res_ctx, 
head_pipe);
ASSERT(tail_pipe);
-
-   /* ODM + window MPO, where MPO window is on right half 
only */
-   if (free_pipe->plane_state &&
-   (free_pipe->plane_state->clip_rect.x >= 
free_pipe->stream->src.width/2) &&
-   tail_pipe->next_odm_pipe) {
-   free_pipe->stream_res.tg = 
tail_pipe->next_odm_pipe->stream_res.tg;
-   free_pipe->stream_res.abm = 
tail_pipe->next_odm_pipe->stream_res.abm;
-   free_pipe->stream_res.opp = 
tail_pipe->next_odm_pipe->stream_res.opp;
-   free_pipe->stream_res.stream_enc = 
tail_pipe->next_odm_pipe->stream_res.stream_enc;
-   free_pipe->stream_res.audio = 
tail_pipe->next_odm_pipe->stream_res.audio;
-   free_pipe->clock_source =

[PATCH 05/19] drm/amd/display: Added power down for DCN10

2021-12-17 Thread Rodrigo Siqueira

From: "Lai, Derek" 

[Why]
The change that sets a 10-second timer callback on boot still works; it
only lacked a power-down hook for DCN10.

[How]
Added power down for DCN10.

Reviewed-by: Anthony Koo 
Acked-by: Rodrigo Siqueira 
Signed-off-by: Derek Lai 
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_init.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_init.c 
b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_init.c
index 34001a30d449..10e613ec7d24 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_init.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_init.c
@@ -78,6 +78,7 @@ static const struct hw_sequencer_funcs dcn10_funcs = {
.get_clock = dcn10_get_clock,
.get_vupdate_offset_from_vsync = dcn10_get_vupdate_offset_from_vsync,
.calc_vupdate_position = dcn10_calc_vupdate_position,
+   .power_down = dce110_power_down,
.set_backlight_level = dce110_set_backlight_level,
.set_abm_immediate_disable = dce110_set_abm_immediate_disable,
.set_pipe = dce110_set_pipe,
-- 
2.25.1

[PATCH 09/19] drm/amd/display: Set optimize_pwr_state for DCN31

2021-12-17 Thread Rodrigo Siqueira

From: Nicholas Kazlauskas 

[Why]
We'll exit optimized power state to do link detection but we won't enter
back into the optimized power state.

This could potentially block s2idle entry depending on the sequencing,
but it also means we're losing some power during the transition period.

[How]
Hook up the handler as on DCN21. It was missed, just as the
exit_optimized_pwr_state callback had been.

Fixes: 64b1d0e8d500 ("drm/amd/display: Add DCN3.1 HWSEQ")

Reviewed-by: Eric Yang 
Acked-by: Rodrigo Siqueira 
Signed-off-by: Nicholas Kazlauskas 
---
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c
index 7a7a8c5edabd..d7559e5a99ce 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c
@@ -103,6 +103,7 @@ static const struct hw_sequencer_funcs dcn31_funcs = {
.z10_restore = dcn31_z10_restore,
.z10_save_init = dcn31_z10_save_init,
.set_disp_pattern_generator = dcn30_set_disp_pattern_generator,
+   .optimize_pwr_state = dcn21_optimize_pwr_state,
.exit_optimized_pwr_state = dcn21_exit_optimized_pwr_state,
.update_visual_confirm_color = dcn20_update_visual_confirm_color,
 };
-- 
2.25.1

[PATCH 08/19] drm/amd/display: Remove CR AUX RD Interval limit for LTTPR

2021-12-17 Thread Rodrigo Siqueira

From: George Shen 

[Why]
DP spec specifies that DPRX shall use the read interval in the
TRAINING_AUX_RD_INTERVAL_PHY_REPEATER LTTPR DPCD register. This
register's bit definition is the same as the AUX read interval register
for DPRX.

[How]
Remove logic which forces AUX read interval to 100us for repeaters when
in LTTPR non-transparent mode.

Reviewed-by: Wesley Chalmers 
Acked-by: Rodrigo Siqueira 
Signed-off-by: George Shen 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index 04878817e622..9dc99929b0cd 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -1544,9 +1544,6 @@ static enum link_training_result 
perform_clock_recovery_sequence(
/* 3. wait receiver to lock-on*/
wait_time_microsec = lt_settings->cr_pattern_time;
 
-   if (link->lttpr_mode == LTTPR_MODE_NON_TRANSPARENT)
-   wait_time_microsec = TRAINING_AUX_RD_INTERVAL;
-
if (link->dc->debug.apply_vendor_specific_lttpr_wa &&
(link->chip_caps & 
EXT_DISPLAY_PATH_CAPS__DP_FIXED_VS_EN)) {
wait_time_microsec = 16000;
-- 
2.25.1

[PATCH 04/19] drm/amd/display: Block z-states when stutter period exceeds criteria

2021-12-17 Thread Rodrigo Siqueira

From: Nicholas Kazlauskas 

[Why]
Stutter period won't be less than 5000.0, but if PSR is enabled then we
can potentially enter Z9 when MPO is enabled.

SMU will try to enter Z9 too early in these cases (before PSR is
enabled) and we'll see underflow.

[How]
Block z-states (z9, z10) until we can add a new interface to SMU to
signal when we can support z10 but not z9.

We can revert this once the interface change is in.

Reviewed-by: Eric Yang 
Acked-by: Rodrigo Siqueira 
Signed-off-by: Nicholas Kazlauskas 
---
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index 40b122a708ef..2a72517e2b28 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -3093,8 +3093,7 @@ static enum dcn_zstate_support_state  
decide_zstate_support(struct dc *dc, struc
else if (context->stream_count == 1 &&  context->streams[0]->signal == 
SIGNAL_TYPE_EDP) {
struct dc_link *link = context->streams[0]->sink->link;
 
-   if ((link->link_index == 0 && 
link->psr_settings.psr_feature_enabled)
-   || context->bw_ctx.dml.vba.StutterPeriod > 
5000.0)
+   if (link->link_index == 0 && 
context->bw_ctx.dml.vba.StutterPeriod > 5000.0)
return DCN_ZSTATE_SUPPORT_ALLOW;
else
return DCN_ZSTATE_SUPPORT_DISALLOW;
-- 
2.25.1
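
The effect of the condition change in the diff above can be compared side
by side in a standalone sketch. The demo_* names are hypothetical and the
functions model only the single-eDP branch shown in the hunk; the rest of
decide_zstate_support is not represented.

```c
#include <stdbool.h>

enum demo_zstate {
	DEMO_ZSTATE_DISALLOW,
	DEMO_ZSTATE_ALLOW
};

/* Condition before the patch: PSR on link 0 OR a long stutter period. */
enum demo_zstate demo_zstate_old(int link_index, bool psr_enabled,
				 double stutter_period)
{
	if ((link_index == 0 && psr_enabled) || stutter_period > 5000.0)
		return DEMO_ZSTATE_ALLOW;
	return DEMO_ZSTATE_DISALLOW;
}

/* Condition after the patch: link 0 AND a long stutter period. */
enum demo_zstate demo_zstate_new(int link_index, double stutter_period)
{
	if (link_index == 0 && stutter_period > 5000.0)
		return DEMO_ZSTATE_ALLOW;
	return DEMO_ZSTATE_DISALLOW;
}
```

With the old condition, PSR on link 0 allowed z-states even with a short
stutter period; with the new one, the stutter period must exceed 5000.0 in
every case.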

[PATCH 06/19] drm/amd/display: Fix check for null function ptr

2021-12-17 Thread Rodrigo Siqueira

From: Alvin Lee 

[Why]
Bug fix for null function ptr (should check for NULL instead of not
NULL)

[How]
Fix if condition

Reviewed-by: Samson Tam 
Acked-by: Rodrigo Siqueira 
Signed-off-by: Alvin Lee 
---
 drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c 
b/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
index f673a1c1777a..9280f2abd973 100644
--- a/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
+++ b/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
@@ -852,7 +852,7 @@ bool dmub_srv_should_detect(struct dmub_srv *dmub)
 
 enum dmub_status dmub_srv_clear_inbox0_ack(struct dmub_srv *dmub)
 {
-   if (!dmub->hw_init || dmub->hw_funcs.clear_inbox0_ack_register)
+   if (!dmub->hw_init || !dmub->hw_funcs.clear_inbox0_ack_register)
return DMUB_STATUS_INVALID;
 
dmub->hw_funcs.clear_inbox0_ack_register(dmub);
@@ -878,7 +878,7 @@ enum dmub_status dmub_srv_wait_for_inbox0_ack(struct 
dmub_srv *dmub, uint32_t ti
 enum dmub_status dmub_srv_send_inbox0_cmd(struct dmub_srv *dmub,
union dmub_inbox0_data_register data)
 {
-   if (!dmub->hw_init || dmub->hw_funcs.send_inbox0_cmd)
+   if (!dmub->hw_init || !dmub->hw_funcs.send_inbox0_cmd)
return DMUB_STATUS_INVALID;
 
dmub->hw_funcs.send_inbox0_cmd(dmub, data);
-- 
2.25.1
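
The corrected guard can be illustrated outside the driver. The demo_* names
below are hypothetical stand-ins, not the DMUB service API; only the shape
of the check follows the patch: return an error when the optional hook is
NULL, and call it otherwise.

```c
#include <stdbool.h>
#include <stddef.h>

enum demo_status {
	DEMO_STATUS_OK,
	DEMO_STATUS_INVALID
};

struct demo_srv;

/* Optional hardware hooks; a hook left NULL means "not implemented". */
struct demo_hw_funcs {
	void (*clear_ack)(struct demo_srv *srv);
};

struct demo_srv {
	bool hw_init;
	int ack_register;
	struct demo_hw_funcs hw_funcs;
};

void demo_hw_clear_ack(struct demo_srv *srv)
{
	srv->ack_register = 0;
}

enum demo_status demo_clear_ack(struct demo_srv *srv)
{
	/* Bail out when the hook is absent (NULL), not when it is present. */
	if (!srv->hw_init || !srv->hw_funcs.clear_ack)
		return DEMO_STATUS_INVALID;

	srv->hw_funcs.clear_ack(srv);
	return DEMO_STATUS_OK;
}
```

With the inverted test from before the fix, the function would have
rejected every ASIC that implements the hook and dereferenced NULL on
every ASIC that does not.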

[PATCH 07/19] drm/amd/display: Send s0i2_rdy in stream_count == 0 optimization

2021-12-17 Thread Rodrigo Siqueira

From: Nicholas Kazlauskas 

[Why]
Otherwise SMU won't mark Display as idle when trying to perform s2idle.

[How]
Mark the bit in the dcn31 codepath; this doesn't apply to older ASICs.

It needed to be split from phy refclk off to prevent entering s2idle
when PSR was engaged but driver was not ready.

Fixes: 118a33151658 ("drm/amd/display: Add DCN3.1 clock manager support")

Reviewed-by: Eric Yang 
Acked-by: Rodrigo Siqueira 
Signed-off-by: Nicholas Kazlauskas 
---
 drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c 
b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c
index 412cc6a716f7..4162ce40089b 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c
@@ -158,6 +158,7 @@ void dcn31_update_clocks(struct clk_mgr *clk_mgr_base,
union display_idle_optimization_u idle_info = { 
0 };
idle_info.idle_info.df_request_disabled = 1;
idle_info.idle_info.phy_ref_clk_off = 1;
+   idle_info.idle_info.s0i2_rdy = 1;

dcn31_smu_set_display_idle_optimization(clk_mgr, idle_info.data);
/* update power state */
clk_mgr_base->clks.pwr_state = 
DCN_PWR_STATE_LOW_POWER;
-- 
2.25.1
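
The reason for keeping s0i2_rdy as its own bit can be sketched with plain
masks. The bit positions and demo_* names below are assumptions made for
the example; the real layout is defined by the driver's
display_idle_optimization_u union, which is not shown in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this illustration. */
#define DEMO_IDLE_DF_REQUEST_DISABLED (1u << 0)
#define DEMO_IDLE_PHY_REF_CLK_OFF     (1u << 1)
#define DEMO_IDLE_S0I2_RDY            (1u << 2)

/*
 * Build the idle-optimization word sent to the SMU. The "ready for
 * s2idle" bit is independent of the refclk bit, so a caller can turn
 * the PHY refclk off without also declaring the display idle.
 */
uint32_t demo_idle_info(bool s0i2_ready)
{
	uint32_t data = DEMO_IDLE_DF_REQUEST_DISABLED |
			DEMO_IDLE_PHY_REF_CLK_OFF;

	if (s0i2_ready)
		data |= DEMO_IDLE_S0I2_RDY;

	return data;
}
```

In the stream_count == 0 path touched by the patch, all three bits are
set, which corresponds to demo_idle_info(true) here.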

[PATCH 01/19] drm/amd/display: fix B0 TMDS deepcolor no dislay issue

2021-12-17 Thread Rodrigo Siqueira

From: Charlene Liu 

[why]
On B0, PHY C maps to F and D maps to G. The driver uses the logical
instance and dmub does the remap, but the driver still needs to use the
right PHY instance to access the right HW.

[how]
Use the physical instance when programming PHY registers.

[note]
resync_control programming could move to dmub next.

Reviewed-by: Dmytro Laktyushkin 
Reviewed-by: Jun Lei 
Acked-by: Rodrigo Siqueira 
Signed-off-by: Charlene Liu 
---
 .../drm/amd/display/dc/dcn31/dcn31_resource.c | 25 +--
 .../drm/amd/display/dc/dcn31/dcn31_resource.h | 31 +++
 2 files changed, 54 insertions(+), 2 deletions(-)
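
The logical-to-physical mapping described in [why] can be written as a
small lookup. The demo_phy_for_pll helper is hypothetical; the two tables
follow the instance letters used by clk_src_regs[] and clk_src_regs_b0[]
in the diff below.

```c
#include <stdbool.h>

#define DEMO_NUM_PLLS 5

/*
 * Return the physical PHY instance letter for a logical PLL index,
 * or '\0' if the index is out of range.
 */
char demo_phy_for_pll(int logical_idx, bool is_b0)
{
	static const char default_map[DEMO_NUM_PLLS] = { 'A', 'B', 'C', 'D', 'E' };
	static const char b0_map[DEMO_NUM_PLLS]      = { 'A', 'B', 'F', 'G', 'E' };

	if (logical_idx < 0 || logical_idx >= DEMO_NUM_PLLS)
		return '\0';

	return is_b0 ? b0_map[logical_idx] : default_map[logical_idx];
}
```

Only logical indices 2 and 3 differ between the two tables, which matches
the patch selecting the B0 table for PLL2 and PLL3 only.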

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
index 9a9ca70f8fe1..6d07dcecc953 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
@@ -355,6 +355,14 @@ static const struct dce110_clk_src_regs clk_src_regs[] = {
clk_src_regs(3, D),
clk_src_regs(4, E)
 };
+/*pll_id being rempped in dmub, in driver it is logical instance*/
+static const struct dce110_clk_src_regs clk_src_regs_b0[] = {
+   clk_src_regs(0, A),
+   clk_src_regs(1, B),
+   clk_src_regs(2, F),
+   clk_src_regs(3, G),
+   clk_src_regs(4, E)
+};
 
 static const struct dce110_clk_src_shift cs_shift = {
CS_COMMON_MASK_SH_LIST_DCN2_0(__SHIFT)
@@ -2288,14 +2296,27 @@ static bool dcn31_resource_construct(
dcn30_clock_source_create(ctx, ctx->dc_bios,
CLOCK_SOURCE_COMBO_PHY_PLL1,
&clk_src_regs[1], false);
-   pool->base.clock_sources[DCN31_CLK_SRC_PLL2] =
+   /*move phypllx_pixclk_resync to dmub next*/
+   if (dc->ctx->asic_id.hw_internal_rev == YELLOW_CARP_B0) {
+   pool->base.clock_sources[DCN31_CLK_SRC_PLL2] =
+   dcn30_clock_source_create(ctx, ctx->dc_bios,
+   CLOCK_SOURCE_COMBO_PHY_PLL2,
+   &clk_src_regs_b0[2], false);
+   pool->base.clock_sources[DCN31_CLK_SRC_PLL3] =
+   dcn30_clock_source_create(ctx, ctx->dc_bios,
+   CLOCK_SOURCE_COMBO_PHY_PLL3,
+   &clk_src_regs_b0[3], false);
+   } else {
+   pool->base.clock_sources[DCN31_CLK_SRC_PLL2] =
dcn30_clock_source_create(ctx, ctx->dc_bios,
CLOCK_SOURCE_COMBO_PHY_PLL2,
&clk_src_regs[2], false);
-   pool->base.clock_sources[DCN31_CLK_SRC_PLL3] =
+   pool->base.clock_sources[DCN31_CLK_SRC_PLL3] =
dcn30_clock_source_create(ctx, ctx->dc_bios,
CLOCK_SOURCE_COMBO_PHY_PLL3,
&clk_src_regs[3], false);
+   }
+
pool->base.clock_sources[DCN31_CLK_SRC_PLL4] =
dcn30_clock_source_create(ctx, ctx->dc_bios,
CLOCK_SOURCE_COMBO_PHY_PLL4,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.h 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.h
index 416fe7a721d8..a513363b3326 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.h
@@ -49,4 +49,35 @@ struct resource_pool *dcn31_create_resource_pool(
const struct dc_init_data *init_data,
struct dc *dc);
 
+/*temp: B0 specific before switch to dcn313 headers*/
+#ifndef regPHYPLLF_PIXCLK_RESYNC_CNTL
+#define regPHYPLLF_PIXCLK_RESYNC_CNTL 0x007e
+#define regPHYPLLF_PIXCLK_RESYNC_CNTL_BASE_IDX 1
+#define regPHYPLLG_PIXCLK_RESYNC_CNTL 0x005f
+#define regPHYPLLG_PIXCLK_RESYNC_CNTL_BASE_IDX 1
+
+//PHYPLLF_PIXCLK_RESYNC_CNTL
+#define PHYPLLF_PIXCLK_RESYNC_CNTL__PHYPLLF_PIXCLK_RESYNC_ENABLE__SHIFT 0x0
+#define 
PHYPLLF_PIXCLK_RESYNC_CNTL__PHYPLLF_DEEP_COLOR_DTO_ENABLE_STATUS__SHIFT 0x1
+#define PHYPLLF_PIXCLK_RESYNC_CNTL__PHYPLLF_DCCG_DEEP_COLOR_CNTL__SHIFT 0x4
+#define PHYPLLF_PIXCLK_RESYNC_CNTL__PHYPLLF_PIXCLK_ENABLE__SHIFT 0x8
+#define PHYPLLF_PIXCLK_RESYNC_CNTL__PHYPLLF_PIXCLK_DOUBLE_RATE_ENABLE__SHIFT 
0x9
+#define PHYPLLF_PIXCLK_RESYNC_CNTL__PHYPLLF_PIXCLK_RESYNC_ENABLE_MASK 
0x0001L
+#define PHYPLLF_PIXCLK_RESYNC_CNTL__PHYPLLF_DEEP_COLOR_DTO_ENABLE_STATUS_MASK 
0x0002L
+#define PHYPLLF_PIXCLK_RESYNC_CNTL__PHYPLLF_DCCG_DEEP_COLOR_CNTL_MASK 
0x0030L
+#define PHYPLLF_PIXCLK_RESYNC_CNTL__PHYPLLF_PIXCLK_ENABLE_MASK 0x0100L
+#define PHYPLLF_PIXCLK_RESYNC_CNTL__PHYPLLF_PIXCLK_DOUBLE_RATE_ENABLE_MASK 
0x0200L
+
+//PHYPLLG_PIXCLK_RESYNC_CNTL
+#define PHYPLLG_PIXCLK_RESYNC_CNTL__PHYPLLG_PIXCLK_RESYNC_ENABLE__SHIFT 0x0
+#define 
PHYPLLG_PIXCLK_RESYNC_CNTL__PHYPLLG_DEEP_COLOR_DTO_ENABLE_STATUS__SHIFT 0x1
+#define PHYPLLG_PIXCLK_RESYNC_CNTL__PHYPLLG_DCCG_DEEP_COLOR_CNTL__SHIFT 0x4
+#define

[PATCH 02/19] drm/amd/display: Limit max link cap with LTTPR caps

2021-12-17 Thread Rodrigo Siqueira

From: George Shen 

[Why]
Max link rate should be limited to the maximum link rate supported by any
connected LTTPR, including when operating in transparent mode.

[How]
Include transparent mode when factoring in LTTPR max supported link
rate.

Reviewed-by: Wesley Chalmers 
Acked-by: Rodrigo Siqueira 
Signed-off-by: George Shen 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index 8a35370da867..6f552f7ee1db 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -2873,7 +2873,7 @@ static struct dc_link_settings get_max_link_cap(struct 
dc_link *link)
 * account for lttpr repeaters cap
 * notes: repeaters do not snoop in the DPRX Capabilities addresses 
(3.6.3).
 */
-   if (link->lttpr_mode == LTTPR_MODE_NON_TRANSPARENT) {
+   if (link->lttpr_mode != LTTPR_MODE_NON_LTTPR) {
if (link->dpcd_caps.lttpr_caps.max_lane_count < 
max_link_cap.lane_count)
max_link_cap.lane_count = 
link->dpcd_caps.lttpr_caps.max_lane_count;
 
-- 
2.25.1
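
The clamping rule from the commit message can be sketched as a standalone
function. The demo_* types are hypothetical simplifications of the link
settings; link rates are represented as plain integers where a larger
value means a faster rate (the test values 0x14 and 0x1E are the DPCD
codes for HBR2 and HBR3).

```c
enum demo_lttpr_mode {
	DEMO_LTTPR_MODE_NON_LTTPR,
	DEMO_LTTPR_MODE_TRANSPARENT,
	DEMO_LTTPR_MODE_NON_TRANSPARENT
};

struct demo_link_cap {
	unsigned int lane_count;
	unsigned int link_rate;
};

/*
 * Clamp the sink's capability to what the repeaters support. After the
 * patch the clamp applies whenever any LTTPR is present, in transparent
 * as well as non-transparent mode.
 */
struct demo_link_cap demo_max_link_cap(struct demo_link_cap sink,
				       struct demo_link_cap lttpr,
				       enum demo_lttpr_mode mode)
{
	struct demo_link_cap cap = sink;

	if (mode != DEMO_LTTPR_MODE_NON_LTTPR) {
		if (lttpr.lane_count < cap.lane_count)
			cap.lane_count = lttpr.lane_count;
		if (lttpr.link_rate < cap.link_rate)
			cap.link_rate = lttpr.link_rate;
	}

	return cap;
}
```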

[PATCH 03/19] drm/amd/display: Refactor vendor specific link training sequence

2021-12-17 Thread Rodrigo Siqueira

From: "Shen, George" 

[Why]
Current implementation is not scalable and retrofits the existing
standard link training code for purposes outside of its original design.

[How]
Refactor vendor specific link training sequence into its own separate
function to be called instead of the standard link training function.

Reviewed-by: Wenjing Liu 
Acked-by: Rodrigo Siqueira 
Signed-off-by: George Shen 
---
 .../gpu/drm/amd/display/dc/core/dc_link_dp.c  | 338 +-
 1 file changed, 337 insertions(+), 1 deletion(-)
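
One step of the refactored sequence, the "Toggle link rate" write in the
diff below, can be isolated as a small sketch. The demo_link_bw_writes
helper is hypothetical: instead of issuing DPCD writes it records the
values that would be written to DP_LINK_BW_SET, so the ordering can be
checked. The toggle values 0x6 and 0xA are taken from the diff.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Record the DP_LINK_BW_SET values that would be written for one
 * training attempt. *last_rate plays the role of the per-link field
 * that remembers the previously programmed rate. Returns the number
 * of writes stored in out[] (1 or 2).
 */
size_t demo_link_bw_writes(uint8_t *last_rate, uint8_t rate, uint8_t out[2])
{
	size_t n = 0;
	uint8_t toggle_rate = (rate == 0x6) ? 0xA : 0x6;

	/* Same rate as last time: write a different rate first. */
	if (*last_rate == rate)
		out[n++] = toggle_rate;

	*last_rate = rate;
	out[n++] = rate;

	return n;
}
```

The sketch makes the intent of the workaround visible: the register
always sees a change of value before the requested rate is programmed.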

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index 6f552f7ee1db..04878817e622 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -2427,6 +2427,338 @@ static enum link_training_result 
dp_perform_128b_132b_link_training(
 }
 #endif
 
+static enum link_training_result 
dc_link_dp_perform_fixed_vs_pe_training_sequence(
+   struct dc_link *link,
+   struct link_training_settings *lt_settings)
+{
+   const uint8_t vendor_lttpr_write_data_reset[4] = {0x1, 0x50, 0x63, 
0xFF};
+   const uint8_t offset = dp_convert_to_count(
+   link->dpcd_caps.lttpr_caps.phy_repeater_cnt);
+   const uint8_t vendor_lttpr_write_data_intercept_en[4] = {0x1, 0x55, 
0x63, 0x0};
+   const uint8_t vendor_lttpr_write_data_intercept_dis[4] = {0x1, 0x55, 
0x63, 0x68};
+   uint8_t vendor_lttpr_write_data_vs[4] = {0x1, 0x51, 0x63, 0x0};
+   uint8_t vendor_lttpr_write_data_pe[4] = {0x1, 0x52, 0x63, 0x0};
+   uint32_t vendor_lttpr_write_address = 0xF004F;
+   enum link_training_result status = LINK_TRAINING_SUCCESS;
+   uint8_t lane = 0;
+   union down_spread_ctrl downspread = {0};
+   union lane_count_set lane_count_set = {0};
+   uint8_t toggle_rate;
+   uint8_t rate;
+
+   /* Only 8b/10b is supported */
+   ASSERT(dp_get_link_encoding_format(&lt_settings->link_settings) ==
+   DP_8b_10b_ENCODING);
+
+   if (offset != 0xFF) {
+   vendor_lttpr_write_address +=
+   ((DP_REPEATER_CONFIGURATION_AND_STATUS_SIZE) * 
(offset - 1));
+   }
+
+   /* Vendor specific: Reset lane settings */
+   core_link_write_dpcd(
+   link,
+   vendor_lttpr_write_address,
+   &vendor_lttpr_write_data_reset[0],
+   sizeof(vendor_lttpr_write_data_reset));
+   core_link_write_dpcd(
+   link,
+   vendor_lttpr_write_address,
+   &vendor_lttpr_write_data_vs[0],
+   sizeof(vendor_lttpr_write_data_vs));
+   core_link_write_dpcd(
+   link,
+   vendor_lttpr_write_address,
+   &vendor_lttpr_write_data_pe[0],
+   sizeof(vendor_lttpr_write_data_pe));
+
+   /* Vendor specific: Enable intercept */
+   core_link_write_dpcd(
+   link,
+   vendor_lttpr_write_address,
+   &vendor_lttpr_write_data_intercept_en[0],
+   sizeof(vendor_lttpr_write_data_intercept_en));
+
+   /* 1. set link rate, lane count and spread. */
+
+   downspread.raw = (uint8_t)(lt_settings->link_settings.link_spread);
+
+   lane_count_set.bits.LANE_COUNT_SET =
+   lt_settings->link_settings.lane_count;
+
+   lane_count_set.bits.ENHANCED_FRAMING = lt_settings->enhanced_framing;
+   lane_count_set.bits.POST_LT_ADJ_REQ_GRANTED = 0;
+
+
+   if (lt_settings->pattern_for_eq < DP_TRAINING_PATTERN_SEQUENCE_4) {
+   lane_count_set.bits.POST_LT_ADJ_REQ_GRANTED =
+   
link->dpcd_caps.max_ln_count.bits.POST_LT_ADJ_REQ_SUPPORTED;
+   }
+
+   core_link_write_dpcd(link, DP_DOWNSPREAD_CTRL,
+   &downspread.raw, sizeof(downspread));
+
+   core_link_write_dpcd(link, DP_LANE_COUNT_SET,
+   &lane_count_set.raw, 1);
+
+#if defined(CONFIG_DRM_AMD_DC_DCN)
+   rate = get_dpcd_link_rate(&lt_settings->link_settings);
+#else
+   rate = (uint8_t) (lt_settings->link_settings.link_rate);
+#endif
+
+   /* Vendor specific: Toggle link rate */
+   toggle_rate = (rate == 0x6) ? 0xA : 0x6;
+
+   if (link->vendor_specific_lttpr_link_rate_wa == rate) {
+   core_link_write_dpcd(
+   link,
+   DP_LINK_BW_SET,
+   &toggle_rate,
+   1);
+   }
+
+   link->vendor_specific_lttpr_link_rate_wa = rate;
+
+   core_link_write_dpcd(link, DP_LINK_BW_SET, &rate, 1);
+
+   DC_LOG_HW_LINK_TRAINING("%s\n %x rate = %x\n %x lane = %x framing = 
%x\n %x spread = %x\n",
+   __func__,
+   DP_LINK_BW_SET,
+   lt_settings->link_settings.link_rate,
+   DP_LANE_COUNT_SET,
+

[PATCH 00/19] DC Patches December 17, 2021

2021-12-17 Thread Rodrigo Siqueira

This DC patchset brings improvements in multiple areas. In summary, we
highlight:

- Fixes and improvements in the LTTPR code
- Improve z-state
- Fix null pointer check
- Improve communication with s0i2
- Update multiple-display split policy
- Add missing registers

Cc: Daniel Wheeler 

Thanks
Siqueira

Alvin Lee (1):
  drm/amd/display: Fix check for null function ptr

Angus Wang (1):
  drm/amd/display: Changed pipe split policy to allow for multi-display
pipe split

Anthony Koo (1):
  drm/amd/display: [FW Promotion] Release 0.0.98

Aric Cyr (1):
  drm/amd/display: 3.2.167

Charlene Liu (1):
  drm/amd/display: fix B0 TMDS deepcolor no dislay issue

George Shen (2):
  drm/amd/display: Limit max link cap with LTTPR caps
  drm/amd/display: Remove CR AUX RD Interval limit for LTTPR

Lai, Derek (1):
  drm/amd/display: Added power down for DCN10

Martin Leung (1):
  drm/amd/display: Undo ODM combine

Nicholas Kazlauskas (3):
  drm/amd/display: Block z-states when stutter period exceeds criteria
  drm/amd/display: Send s0i2_rdy in stream_count == 0 optimization
  drm/amd/display: Set optimize_pwr_state for DCN31

Shen, George (1):
  drm/amd/display: Refactor vendor specific link training sequence

Wenjing Liu (5):
  drm/amd/display: define link res and make it accessible to all link
interfaces
  drm/amd/display: populate link res in both detection and validation
  drm/amd/display: access hpo dp link encoder only through link resource
  drm/amd/display: support dynamic HPO DP link encoder allocation
  drm/amd/display: get and restore link res map

Wesley Chalmers (1):
  drm/amd/display: Add reg defs for DCN303

 .../display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c  |   1 +
 drivers/gpu/drm/amd/display/dc/core/dc.c  |  18 -
 .../gpu/drm/amd/display/dc/core/dc_debug.c|   2 +
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 234 +---
 .../gpu/drm/amd/display/dc/core/dc_link_dp.c  | 501 +++---
 .../drm/amd/display/dc/core/dc_link_dpia.c|  48 +-
 .../drm/amd/display/dc/core/dc_link_hwss.c|  63 ++-
 .../gpu/drm/amd/display/dc/core/dc_resource.c | 199 ---
 drivers/gpu/drm/amd/display/dc/dc.h   |   3 +-
 drivers/gpu/drm/amd/display/dc/dc_link.h  |  15 +-
 .../amd/display/dc/dcn10/dcn10_hw_sequencer.c |  14 +-
 .../gpu/drm/amd/display/dc/dcn10/dcn10_init.c |   1 +
 .../drm/amd/display/dc/dcn20/dcn20_hwseq.c|   2 +-
 .../drm/amd/display/dc/dcn20/dcn20_resource.c |   5 +-
 .../amd/display/dc/dcn201/dcn201_resource.c   |   2 +-
 .../drm/amd/display/dc/dcn21/dcn21_resource.c |   2 +-
 .../drm/amd/display/dc/dcn30/dcn30_resource.c |  13 +-
 .../amd/display/dc/dcn301/dcn301_resource.c   |   2 +-
 .../amd/display/dc/dcn302/dcn302_resource.c   |   2 +-
 .../drm/amd/display/dc/dcn303/dcn303_dccg.h   |  20 +-
 .../amd/display/dc/dcn303/dcn303_resource.c   |   2 +-
 .../dc/dcn31/dcn31_hpo_dp_link_encoder.c  |   6 +-
 .../dc/dcn31/dcn31_hpo_dp_link_encoder.h  |   3 +-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_init.c |   1 +
 .../drm/amd/display/dc/dcn31/dcn31_resource.c |  27 +-
 .../drm/amd/display/dc/dcn31/dcn31_resource.h |  31 ++
 .../gpu/drm/amd/display/dc/dml/dml_wrapper.c  |   2 +-
 .../gpu/drm/amd/display/dc/inc/core_status.h  |   2 +
 .../gpu/drm/amd/display/dc/inc/core_types.h   |  17 +
 .../gpu/drm/amd/display/dc/inc/dc_link_dp.h   |  15 +-
 .../gpu/drm/amd/display/dc/inc/dc_link_dpia.h |   5 +-
 .../drm/amd/display/dc/inc/hw/link_encoder.h  |   3 +-
 .../gpu/drm/amd/display/dc/inc/link_hwss.h|  10 +-
 drivers/gpu/drm/amd/display/dc/inc/resource.h |   6 +-
 .../gpu/drm/amd/display/dmub/inc/dmub_cmd.h   |   4 +-
 .../gpu/drm/amd/display/dmub/src/dmub_srv.c   |   4 +-
 36 files changed, 964 insertions(+), 321 deletions(-)

-- 
2.25.1

Re: Re: Re: Re: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8

2021-12-17 Thread Alex Deucher

If you could get me a copy of the vbios image from a problematic board,
that would be helpful.  In the meantime, I've applied the patch.

Alex


On Thu, Dec 16, 2021 at 9:38 PM 周宗敏  wrote:

> Dear Alex:
>
>
> >Is the issue reproducible with the same board in bare metal on x86? Or
> does it only happen with passthrough on ARM?
>
>
> Unfortunately, my current environment makes it inconvenient to test this
> GPU board on an x86 platform.
>
> But I can tell you that the problem still occurs on ARM without passthrough
> to a virtual machine.
>
>
> In addition, at the end of 2020, my colleagues also found similar problems
> on MIPS platforms with Radeon R7 340 graphics chips.
>
> So I think it can happen regardless of whether the platform is x86, ARM,
> or MIPS.
>
>
> I hope the above information is helpful to you, and I also think it would
> be better for users if we can root cause this issue.
>
>
> Best regards.
>
>
>
>
> 
>
>
>
>
>
>
> *Subject:* Re: Re: Re: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
>
> *Date:* 2021-12-16 23:28
> *From:* Alex Deucher
> *To:* 周宗敏
>
>
> Is the issue reproducible with the same board in bare metal on x86?  Or
> does it only happen with passthrough on ARM?  Looking through the archives,
> the SI patch I made was for an x86 laptop.  It would be nice to root
> cause this, but there weren't any gfx8 boards with more than 64G of vram,
> so I think it's safe.  That said, if you see similar issues with newer gfx
> IPs then we have an issue since the upper bit will be meaningful, so it
> would be nice to root cause this.
>
> Alex
>
>
> On Thu, Dec 16, 2021 at 4:36 AM 周宗敏  wrote:
>
>> Hi  Christian,
>>
>>
>> I'm testing the GPU passthrough feature, so I pass this GPU through to a
>> virtual machine. It is based on an arm64 system.
>>
>> As far as I know, Alex dealt with a similar problem in
>> drm/radeon/si.c. Maybe they have the same cause?
>>
>> The earlier commit is below:
>>
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0ca223b029a261e82fb2f50c52eb85d510f4260e
>>
>> [image: image.png]
>>
>>
>> Thanks very much.
>>
>>
>>
>> 
>>
>>
>>
>> *Subject:* Re: Re: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
>>
>> *Date:* 2021-12-16 16:15
>> *From:* Christian König
>> *To:* 周宗敏, Alex Deucher
>>
>>
>>
>>
>> Hi Zongmin,
>>
>>that strongly sounds like the ASIC is not correctly initialized when
>>  trying to read the register.
>>
>>What board and environment are you using this GPU with? Is that a
>>  normal x86 system?
>>
>>Regards,
>>Christian.
>>
>>
>>
>> Am 16.12.21 um 04:11 schrieb 周宗敏:
>>
>>
>>
>>1. The problematic board that I have tested is [AMD/ATI] Lexa
>> PRO [Radeon RX 550/550X], and the vbios version is
>> 113-RXF9310-C09-BT.
>>
>>2. When the exception occurs, I can see the following changes in
>> the vram size values read from RREG32(mmCONFIG_MEMSIZE);
>> it seems to have garbage in the upper 16 bits:
>>
>>[image: image.png]
>>
>>3. I can also see dmesg output like the following. When the vram
>> size register has garbage, we may see an error message like this:
>>
>>amdgpu :09:00.0: VRAM: 4286582784M 0x00F4 -
>> 0x000FF8F4 (4286582784M used)
>>
>>The correct message should look like this:
>>
>>amdgpu :09:00.0: VRAM: 4096M 0x00F4 -
>> 0x00F4 (4096M used)
>>
>>If you have any questions, please send me mail.
>>
>>Thanks very much.
>>
>>
>>
>>
>> 
>>
>> *Subject:* Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
>>
>>*Date:* 2021-12-16 04:23
>>*From:* Alex Deucher
>>*To:* Zongmin Zhou
>>
>>
>>
>>
>> On Wed, Dec 15, 2021 at 10:31 AM Zongmin Zhou wrote:
>>  >
>>  > Some boards(like RX550) seem to have garbage in the upper
>>  > 16 bits of the vram size register.  Check for
>>  > this and clamp the size properly.  Fixes
>>  > boards reporting bogus amounts of vram.
>>  >
>>  > With this patch applied, the maximum GPU VRAM size is 64GB;
>>  > no more than 64GB of VRAM will be used.
>>
>>  Can you provide some examples of problematic boards and possibly a
>>  vbios image from the problematic board?  What values are you seeing?
>>  It would be nice to see what the boards are reporting and whether the
>>  lower 16 bits are actually correct or if it is some other issue.  This
>>  register is undefined until the asic has been initialized.  The vbios
>>  programs it as part of its asic init sequence (either via vesa/gop or
>>  the OS driver).
>>
>>  Alex
>>
>>
>>  >
>>  > Signed-off-by: Zongmin Zhou
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 13
>>  ++---
>>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>>
>>
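
The clamp discussed in this thread can be sketched as follows. This is a
minimal illustration, not the gmc_v8_0.c patch itself (its diff is not shown
above): sanitize_vram_size_mb() is a hypothetical helper name, and it assumes
the register value is a size in megabytes, as the quoted dmesg lines suggest.
The bogus value 4286582784 is 0xFF801000 in hex, so dropping the upper 16 bits
leaves 0x1000, which is the expected 4096M.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the actual amdgpu code: sanitize a VRAM size
 * (assumed to be in megabytes) read from the memory-size register.
 */
static uint32_t sanitize_vram_size_mb(uint32_t reg)
{
	/*
	 * Treat any set bit in the upper 16 bits as garbage, but only
	 * mask it off when the lower 16 bits hold a non-zero size.
	 */
	if ((reg & 0xffff0000u) && (reg & 0x0000ffffu))
		reg &= 0x0000ffffu;

	return reg;
}
```

With a 16-bit mask the largest size that survives is 65535 MB, which is
presumably where the 64GB limit in the commit message comes from.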

Re: [PATCH] drm/amd/display: Fix USB4 null pointer dereference in update_psp_stream_config

2021-12-17 Thread Harry Wentland

On 2021-12-17 14:25, Nicholas Kazlauskas wrote:
> [Why]
> A porting error on a previous patch left the block of code that
> causes the crash from a NULL pointer dereference.
> 
> More specifically, we try to access link_enc before it's assigned in
> the USB4 case in the following assignment:
> 
> config.dio_output_idx = link_enc->transmitter - TRANSMITTER_UNIPHY_A;
> 
> [How]
> That assignment occurs later depending on the ASIC version. It's only
> needed on DCN31+ and only after link_enc is already assigned.
> 
> Fixes: 35b6fe499be7 ("drm/amd/display: fix a crash on USB4 over C20 PHY")
> Cc: Rodrigo Siqueira 
> Cc: Harry Wentland 
> Signed-off-by: Nicholas Kazlauskas 

Reviewed-by: Harry Wentland 

Harry

> ---
>  drivers/gpu/drm/amd/display/dc/core/dc_link.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c 
> b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
> index 3d75f56a939c..857941d83f1f 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
> @@ -4009,12 +4009,9 @@ static void update_psp_stream_config(struct pipe_ctx 
> *pipe_ctx, bool dpms_off)
>   config.dig_be = pipe_ctx->stream->link->link_enc_hw_inst;
>  #if defined(CONFIG_DRM_AMD_DC_DCN)
>   config.stream_enc_idx = pipe_ctx->stream_res.stream_enc->id - 
> ENGINE_ID_DIGA;
> - 
> +
>   if (pipe_ctx->stream->link->ep_type == DISPLAY_ENDPOINT_PHY ||
>   pipe_ctx->stream->link->ep_type == 
> DISPLAY_ENDPOINT_USB4_DPIA) {
> - link_enc = pipe_ctx->stream->link->link_enc;
> - config.dio_output_type = 
> pipe_ctx->stream->link->ep_type;
> - config.dio_output_idx = link_enc->transmitter - 
> TRANSMITTER_UNIPHY_A;
>   if (pipe_ctx->stream->link->ep_type == 
> DISPLAY_ENDPOINT_PHY)
>   link_enc = pipe_ctx->stream->link->link_enc;
>   else if (pipe_ctx->stream->link->ep_type == 
> DISPLAY_ENDPOINT_USB4_DPIA)
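
The pattern described under [Why] and [How] can be illustrated with a small
standalone sketch. The types and the dpia_link_enc field below are
hypothetical simplifications, not the real DC structures; the point is only
that the encoder pointer is selected per endpoint type first and dereferenced
afterwards, instead of being dereferenced before the USB4 case has assigned it.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the DC types. */
enum endpoint_type { ENDPOINT_PHY, ENDPOINT_USB4_DPIA };

struct link_encoder {
	int transmitter;
};

struct link {
	enum endpoint_type ep_type;
	struct link_encoder *link_enc;      /* may be NULL for USB4 links */
	struct link_encoder *dpia_link_enc; /* assumed: assigned dynamically */
};

/*
 * Return the output index relative to first_transmitter, or -1 when no
 * encoder is assigned. The encoder is picked before it is dereferenced.
 */
static int output_index(const struct link *link, int first_transmitter)
{
	const struct link_encoder *enc = NULL;

	if (link->ep_type == ENDPOINT_PHY)
		enc = link->link_enc;
	else if (link->ep_type == ENDPOINT_USB4_DPIA)
		enc = link->dpia_link_enc;

	if (!enc)
		return -1;

	return enc->transmitter - first_transmitter;
}
```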

RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread Liu, Shaoyun

[AMD Official Use Only]

Reviewed by: Shaoyun.liu 

-Original Message-
From: amd-gfx  On Behalf Of sashank saye
Sent: Friday, December 17, 2021 1:56 PM
To: amd-gfx@lists.freedesktop.org
Cc: Saye, Sashank 
Subject: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for 
sbr handling

For the Aldebaran chip passthrough case, we need to inform the SMU
about special handling for SBR. On older chips we send
LightSBR to the SMU; enable the same for Aldebaran. One slight
difference compared to previous chips is that on Aldebaran the SMU
does a heavy reset on SBR. Hence the word Heavy
instead of Light SBR is used for the SMU to differentiate.

Signed-off-by: sashank saye 
Change-Id: I79420e7352bb670d6f9696df97d7546f131b18fc
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  9 -
 drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h   |  4 +++-
 drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h|  6 +++---
 drivers/gpu/drm/amd/pm/inc/smu_types.h |  3 ++-
 drivers/gpu/drm/amd/pm/inc/smu_v11_0.h |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  |  6 +++---
 drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c  |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 10 ++
 9 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f31caec669e7..e4c93d373224 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2618,11 +2618,10 @@ static int amdgpu_device_ip_late_init(struct 
amdgpu_device *adev)
if (r)
DRM_ERROR("enable mgpu fan boost failed (%d).\n", r);
 
-   /* For XGMI + passthrough configuration on arcturus, enable light SBR */
-   if (adev->asic_type == CHIP_ARCTURUS &&
-   amdgpu_passthrough(adev) &&
-   adev->gmc.xgmi.num_physical_nodes > 1)
-   smu_set_light_sbr(&adev->smu, true);
+   /* For passthrough configuration on arcturus and aldebaran, enable 
special handling SBR */
+   if (amdgpu_passthrough(adev) && ((adev->asic_type == CHIP_ARCTURUS && 
adev->gmc.xgmi.num_physical_nodes > 1)||
+  adev->asic_type == CHIP_ALDEBARAN ))
+   smu_handle_passthrough_sbr(&adev->smu, true);
 
if (adev->gmc.xgmi.num_physical_nodes > 1) {
mutex_lock(&mgpu_info.mutex);
diff --git a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h 
b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
index 35fa0d8e92dd..ab66a4b9e438 100644
--- a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
+++ b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
@@ -102,7 +102,9 @@
 
 #define PPSMC_MSG_GfxDriverResetRecovery   0x42
 #define PPSMC_MSG_BoardPowerCalibration0x43
-#define PPSMC_Message_Count0x44
+#define PPSMC_MSG_HeavySBR  0x45
+#define PPSMC_Message_Count0x46
+
 
 //PPSMC Reset Types
 #define PPSMC_RESET_TYPE_WARM_RESET  0x00
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
index 2b9b9a7ba97a..ba7565bc8104 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
@@ -1257,9 +1257,9 @@ struct pptable_funcs {
int (*set_fine_grain_gfx_freq_parameters)(struct smu_context *smu);
 
/**
-* @set_light_sbr:  Set light sbr mode for the SMU.
+* @smu_handle_passthrough_sbr:  Send message to SMU about special 
handling for SBR.
 */
-   int (*set_light_sbr)(struct smu_context *smu, bool enable);
+   int (*smu_handle_passthrough_sbr)(struct smu_context *smu, bool enable);
 
/**
 * @wait_for_event:  Wait for events from SMU.
@@ -1415,7 +1415,7 @@ int smu_allow_xgmi_power_down(struct smu_context *smu, 
bool en);
 
 int smu_get_status_gfxoff(struct amdgpu_device *adev, uint32_t *value);
 
-int smu_set_light_sbr(struct smu_context *smu, bool enable);
+int smu_handle_passthrough_sbr(struct smu_context *smu, bool enable);
 
 int smu_wait_for_event(struct amdgpu_device *adev, enum smu_event_type event,
   uint64_t event_arg);
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/inc/smu_types.h
index 18b862a90fbe..ff8a0bcbd290 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_types.h
@@ -229,7 +229,8 @@
__SMU_DUMMY_MAP(BoardPowerCalibration),   \
__SMU_DUMMY_MAP(RequestGfxclk),   \
__SMU_DUMMY_MAP(ForceGfxVid), \
-   __SMU_DUMMY_MAP(UnforceGfxVid),
+   __SMU_DUMMY_MAP(UnforceGfxVid),   \
+   __SMU_DUMMY_MAP(HeavySBR),
 
 #undef __SMU_DUMMY_MAP
 #define __SMU_DUMMY_MAP(type)  SMU_MSG_##type
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h 
b/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
index 2d422e6a9feb..acb3be292096 100644

[PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread sashank saye

For the Aldebaran chip passthrough case, we need to inform the SMU
about special handling for SBR. On older chips we send
LightSBR to the SMU; enable the same for Aldebaran. One slight
difference compared to previous chips is that on Aldebaran the SMU
does a heavy reset on SBR. Hence the word Heavy
instead of Light SBR is used for the SMU to differentiate.

Signed-off-by: sashank saye 
Change-Id: I79420e7352bb670d6f9696df97d7546f131b18fc
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  9 -
 drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h   |  4 +++-
 drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h|  6 +++---
 drivers/gpu/drm/amd/pm/inc/smu_types.h |  3 ++-
 drivers/gpu/drm/amd/pm/inc/smu_v11_0.h |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  |  6 +++---
 drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c  |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 10 ++
 9 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f31caec669e7..e4c93d373224 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2618,11 +2618,10 @@ static int amdgpu_device_ip_late_init(struct 
amdgpu_device *adev)
if (r)
DRM_ERROR("enable mgpu fan boost failed (%d).\n", r);
 
-   /* For XGMI + passthrough configuration on arcturus, enable light SBR */
-   if (adev->asic_type == CHIP_ARCTURUS &&
-   amdgpu_passthrough(adev) &&
-   adev->gmc.xgmi.num_physical_nodes > 1)
-   smu_set_light_sbr(&adev->smu, true);
+   /* For passthrough configuration on arcturus and aldebaran, enable 
special handling SBR */
+   if (amdgpu_passthrough(adev) && ((adev->asic_type == CHIP_ARCTURUS && 
adev->gmc.xgmi.num_physical_nodes > 1)||
+  adev->asic_type == CHIP_ALDEBARAN ))
+   smu_handle_passthrough_sbr(&adev->smu, true);
 
if (adev->gmc.xgmi.num_physical_nodes > 1) {
mutex_lock(&mgpu_info.mutex);
diff --git a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h 
b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
index 35fa0d8e92dd..ab66a4b9e438 100644
--- a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
+++ b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
@@ -102,7 +102,9 @@
 
 #define PPSMC_MSG_GfxDriverResetRecovery   0x42
 #define PPSMC_MSG_BoardPowerCalibration0x43
-#define PPSMC_Message_Count0x44
+#define PPSMC_MSG_HeavySBR  0x45
+#define PPSMC_Message_Count0x46
+
 
 //PPSMC Reset Types
 #define PPSMC_RESET_TYPE_WARM_RESET  0x00
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
index 2b9b9a7ba97a..ba7565bc8104 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
@@ -1257,9 +1257,9 @@ struct pptable_funcs {
int (*set_fine_grain_gfx_freq_parameters)(struct smu_context *smu);
 
/**
-* @set_light_sbr:  Set light sbr mode for the SMU.
+* @smu_handle_passthrough_sbr:  Send message to SMU about special 
handling for SBR.
 */
-   int (*set_light_sbr)(struct smu_context *smu, bool enable);
+   int (*smu_handle_passthrough_sbr)(struct smu_context *smu, bool enable);
 
/**
 * @wait_for_event:  Wait for events from SMU.
@@ -1415,7 +1415,7 @@ int smu_allow_xgmi_power_down(struct smu_context *smu, 
bool en);
 
 int smu_get_status_gfxoff(struct amdgpu_device *adev, uint32_t *value);
 
-int smu_set_light_sbr(struct smu_context *smu, bool enable);
+int smu_handle_passthrough_sbr(struct smu_context *smu, bool enable);
 
 int smu_wait_for_event(struct amdgpu_device *adev, enum smu_event_type event,
   uint64_t event_arg);
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/inc/smu_types.h
index 18b862a90fbe..ff8a0bcbd290 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_types.h
@@ -229,7 +229,8 @@
__SMU_DUMMY_MAP(BoardPowerCalibration),   \
__SMU_DUMMY_MAP(RequestGfxclk),   \
__SMU_DUMMY_MAP(ForceGfxVid), \
-   __SMU_DUMMY_MAP(UnforceGfxVid),
+   __SMU_DUMMY_MAP(UnforceGfxVid),   \
+   __SMU_DUMMY_MAP(HeavySBR),
 
 #undef __SMU_DUMMY_MAP
 #define __SMU_DUMMY_MAP(type)  SMU_MSG_##type
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h 
b/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
index 2d422e6a9feb..acb3be292096 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
@@ -312,7 +312,7 @@ int smu_v11_0_deep_sleep_control(struct smu_context *smu,
 
 void smu_v11_0_interrupt_work(struct smu_context *smu);
 
-int smu_v11_0_set_light_sbr(struct smu_context *smu, bool enable);
+int

RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread Liu, Shaoyun

[AMD Official Use Only]

Comment inline.

-Original Message-
From: amd-gfx  On Behalf Of sashank saye
Sent: Friday, December 17, 2021 1:19 PM
To: amd-gfx@lists.freedesktop.org
Cc: Saye, Sashank 
Subject: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for 
sbr handling

For the Aldebaran chip passthrough case, we need to inform the SMU about
special handling for SBR. On older chips we send LightSBR to the SMU; enable
the same for Aldebaran. One slight difference compared to previous chips is
that on Aldebaran the SMU does a heavy reset on SBR. Hence the word Heavy
instead of Light SBR is used for the SMU to differentiate.

Signed-off-by: sashank saye <sashank.s...@amd.com>
Change-Id: I79420e7352bb670d6f9696df97d7546f131b18fc
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  9 -
 drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h   |  4 +++-
 drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h|  6 +++---
 drivers/gpu/drm/amd/pm/inc/smu_types.h |  3 ++-
 drivers/gpu/drm/amd/pm/inc/smu_v11_0.h |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  |  6 +++---
 drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c  |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 10 ++
 9 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f31caec669e7..0c292e119f7c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2618,11 +2618,10 @@ static int amdgpu_device_ip_late_init(struct 
amdgpu_device *adev)
if (r)
DRM_ERROR("enable mgpu fan boost failed (%d).\n", r);

-   /* For XGMI + passthrough configuration on arcturus, enable light SBR */
-   if (adev->asic_type == CHIP_ARCTURUS &&
-   amdgpu_passthrough(adev) &&
-   adev->gmc.xgmi.num_physical_nodes > 1)
-   smu_set_light_sbr(&adev->smu, true);
+   /* For passthrough configuration on arcturus and aldebaran, enable 
special handling SBR */
[shaoyunl] This will change the behavior for ARCTURUS, which only needs to set 
light SBR for the XGMI configuration. You still need to check XGMI for 
ARCTURUS, but don't do that check for ALDEBARAN.
+   if ((adev->asic_type == CHIP_ARCTURUS || adev->asic_type == 
CHIP_ALDEBARAN ) &&
+   amdgpu_passthrough(adev))
+   smu_handle_passthrough_sbr(&adev->smu, true);

if (adev->gmc.xgmi.num_physical_nodes > 1) {
mutex_lock(&mgpu_info.mutex);
diff --git a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h 
b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
index 35fa0d8e92dd..ab66a4b9e438 100644
--- a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
+++ b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
@@ -102,7 +102,9 @@

 #define PPSMC_MSG_GfxDriverResetRecovery   0x42
 #define PPSMC_MSG_BoardPowerCalibration0x43
-#define PPSMC_Message_Count0x44
+#define PPSMC_MSG_HeavySBR  0x45
+#define PPSMC_Message_Count0x46
+

 //PPSMC Reset Types
 #define PPSMC_RESET_TYPE_WARM_RESET  0x00
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
index 2b9b9a7ba97a..ba7565bc8104 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
@@ -1257,9 +1257,9 @@ struct pptable_funcs {
int (*set_fine_grain_gfx_freq_parameters)(struct smu_context *smu);

/**
-* @set_light_sbr:  Set light sbr mode for the SMU.
+* @smu_handle_passthrough_sbr:  Send message to SMU about special 
handling for SBR.
 */
-   int (*set_light_sbr)(struct smu_context *smu, bool enable);
+   int (*smu_handle_passthrough_sbr)(struct smu_context *smu, bool
+enable);

/**
 * @wait_for_event:  Wait for events from SMU.
@@ -1415,7 +1415,7 @@ int smu_allow_xgmi_power_down(struct smu_context *smu, 
bool en);

 int smu_get_status_gfxoff(struct amdgpu_device *adev, uint32_t *value);

-int smu_set_light_sbr(struct smu_context *smu, bool enable);
+int smu_handle_passthrough_sbr(struct smu_context *smu, bool enable);

 int smu_wait_for_event(struct amdgpu_device *adev, enum smu_event_type event,
   uint64_t event_arg);
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/inc/smu_types.h
index 18b862a90fbe..ff8a0bcbd290 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_types.h
@@ -229,7 +229,8 @@
__SMU_DUMMY_MAP(BoardPowerCalibration),   \
__SMU_DUMMY_MAP(RequestGfxclk),   \
__SMU_DUMMY_MAP(ForceGfxVid), \
-   __SMU_DUMMY_MAP(UnforceGfxVid),
+   __SMU_DUMMY_MAP(UnforceGfxVid),   \
+   __SMU_DUMMY_MAP(HeavySBR),

 #undef __SMU_DUMMY_MAP
 #define __SMU_DUMMY_MAP(type)
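
The condition requested in the [shaoyunl] comment above can be written out as
a small standalone predicate. This is only a sketch: the enum and the function
name needs_passthrough_sbr() are hypothetical and stand in for the
adev->asic_type, amdgpu_passthrough(adev) and
adev->gmc.xgmi.num_physical_nodes checks in the patch. Arcturus keeps its XGMI
requirement, while Aldebaran only requires passthrough.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant asic_type values. */
enum asic { ASIC_ARCTURUS, ASIC_ALDEBARAN, ASIC_OTHER };

/*
 * Decide whether the SMU should be told about special SBR handling.
 * xgmi_nodes models the number of physical XGMI nodes.
 */
static bool needs_passthrough_sbr(enum asic type, bool passthrough,
				  unsigned int xgmi_nodes)
{
	if (!passthrough)
		return false;

	/* Arcturus: only in an XGMI configuration (more than one node). */
	if (type == ASIC_ARCTURUS)
		return xgmi_nodes > 1;

	/* Aldebaran: always in passthrough, regardless of XGMI. */
	return type == ASIC_ALDEBARAN;
}
```

Spelling the two cases out separately makes it harder to drop the Arcturus
XGMI check by accident when another ASIC is added to the condition.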

[PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread sashank saye

For the Aldebaran chip passthrough case, we need to inform the SMU
about special handling for SBR. On older chips we send
LightSBR to the SMU; enable the same for Aldebaran. One slight
difference compared to previous chips is that on Aldebaran the SMU
does a heavy reset on SBR. Hence the word Heavy
instead of Light SBR is used for the SMU to differentiate.

Signed-off-by: sashank saye 
Change-Id: I79420e7352bb670d6f9696df97d7546f131b18fc
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  9 -
 drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h   |  4 +++-
 drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h|  6 +++---
 drivers/gpu/drm/amd/pm/inc/smu_types.h |  3 ++-
 drivers/gpu/drm/amd/pm/inc/smu_v11_0.h |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  |  6 +++---
 drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c  |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 10 ++
 9 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f31caec669e7..0c292e119f7c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2618,11 +2618,10 @@ static int amdgpu_device_ip_late_init(struct 
amdgpu_device *adev)
if (r)
DRM_ERROR("enable mgpu fan boost failed (%d).\n", r);
 
-   /* For XGMI + passthrough configuration on arcturus, enable light SBR */
-   if (adev->asic_type == CHIP_ARCTURUS &&
-   amdgpu_passthrough(adev) &&
-   adev->gmc.xgmi.num_physical_nodes > 1)
-   smu_set_light_sbr(&adev->smu, true);
+   /* For passthrough configuration on arcturus and aldebaran, enable 
special handling SBR */
+   if ((adev->asic_type == CHIP_ARCTURUS || adev->asic_type == 
CHIP_ALDEBARAN ) &&
+   amdgpu_passthrough(adev))
+   smu_handle_passthrough_sbr(&adev->smu, true);
 
if (adev->gmc.xgmi.num_physical_nodes > 1) {
mutex_lock(&mgpu_info.mutex);
diff --git a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h 
b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
index 35fa0d8e92dd..ab66a4b9e438 100644
--- a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
+++ b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
@@ -102,7 +102,9 @@
 
 #define PPSMC_MSG_GfxDriverResetRecovery   0x42
 #define PPSMC_MSG_BoardPowerCalibration0x43
-#define PPSMC_Message_Count0x44
+#define PPSMC_MSG_HeavySBR  0x45
+#define PPSMC_Message_Count0x46
+
 
 //PPSMC Reset Types
 #define PPSMC_RESET_TYPE_WARM_RESET  0x00
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
index 2b9b9a7ba97a..ba7565bc8104 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
@@ -1257,9 +1257,9 @@ struct pptable_funcs {
int (*set_fine_grain_gfx_freq_parameters)(struct smu_context *smu);
 
/**
-* @set_light_sbr:  Set light sbr mode for the SMU.
+* @smu_handle_passthrough_sbr:  Send message to SMU about special 
handling for SBR.
 */
-   int (*set_light_sbr)(struct smu_context *smu, bool enable);
+   int (*smu_handle_passthrough_sbr)(struct smu_context *smu, bool enable);
 
/**
 * @wait_for_event:  Wait for events from SMU.
@@ -1415,7 +1415,7 @@ int smu_allow_xgmi_power_down(struct smu_context *smu, 
bool en);
 
 int smu_get_status_gfxoff(struct amdgpu_device *adev, uint32_t *value);
 
-int smu_set_light_sbr(struct smu_context *smu, bool enable);
+int smu_handle_passthrough_sbr(struct smu_context *smu, bool enable);
 
 int smu_wait_for_event(struct amdgpu_device *adev, enum smu_event_type event,
   uint64_t event_arg);
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/inc/smu_types.h
index 18b862a90fbe..ff8a0bcbd290 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_types.h
@@ -229,7 +229,8 @@
__SMU_DUMMY_MAP(BoardPowerCalibration),   \
__SMU_DUMMY_MAP(RequestGfxclk),   \
__SMU_DUMMY_MAP(ForceGfxVid), \
-   __SMU_DUMMY_MAP(UnforceGfxVid),
+   __SMU_DUMMY_MAP(UnforceGfxVid),   \
+   __SMU_DUMMY_MAP(HeavySBR),
 
 #undef __SMU_DUMMY_MAP
 #define __SMU_DUMMY_MAP(type)  SMU_MSG_##type
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h 
b/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
index 2d422e6a9feb..acb3be292096 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
@@ -312,7 +312,7 @@ int smu_v11_0_deep_sleep_control(struct smu_context *smu,
 
 void smu_v11_0_interrupt_work(struct smu_context *smu);
 
-int smu_v11_0_set_light_sbr(struct smu_context *smu, bool enable);
+int smu_v11_0_handle_passthrough_sbr(struct smu_context *smu, bool

[PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread sashank saye

For the Aldebaran chip passthrough case, we need to inform the SMU
about special handling for SBR. On older chips we send
LightSBR to the SMU; enable the same for Aldebaran. One slight
difference compared to previous chips is that on Aldebaran the SMU
does a heavy reset on SBR. Hence the word Heavy
instead of Light SBR is used for the SMU to differentiate.

Signed-off-by: sashank saye 
Change-Id: I79420e7352bb670d6f9696df97d7546f131b18fc
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  9 -
 drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h   |  4 +++-
 drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h|  6 +++---
 drivers/gpu/drm/amd/pm/inc/smu_types.h |  3 ++-
 drivers/gpu/drm/amd/pm/inc/smu_v11_0.h |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  |  6 +++---
 drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c  |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 10 ++
 9 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f31caec669e7..a493b4747a72 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2618,11 +2618,10 @@ static int amdgpu_device_ip_late_init(struct 
amdgpu_device *adev)
if (r)
DRM_ERROR("enable mgpu fan boost failed (%d).\n", r);
 
-   /* For XGMI + passthrough configuration on arcturus, enable light SBR */
-   if (adev->asic_type == CHIP_ARCTURUS &&
-   amdgpu_passthrough(adev) &&
-   adev->gmc.xgmi.num_physical_nodes > 1)
-   smu_set_light_sbr(&adev->smu, true);
+   /* For XGMI + passthrough configuration on arcturus and aldebaran, 
enable light SBR */
+   if ((adev->asic_type == CHIP_ARCTURUS || adev->asic_type == 
CHIP_ALDEBARAN ) &&
+   amdgpu_passthrough(adev))
+   smu_handle_passthrough_sbr(&adev->smu, true);
 
if (adev->gmc.xgmi.num_physical_nodes > 1) {
mutex_lock(&mgpu_info.mutex);
diff --git a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h 
b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
index 35fa0d8e92dd..ab66a4b9e438 100644
--- a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
+++ b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
@@ -102,7 +102,9 @@
 
 #define PPSMC_MSG_GfxDriverResetRecovery   0x42
 #define PPSMC_MSG_BoardPowerCalibration0x43
-#define PPSMC_Message_Count0x44
+#define PPSMC_MSG_HeavySBR  0x45
+#define PPSMC_Message_Count0x46
+
 
 //PPSMC Reset Types
 #define PPSMC_RESET_TYPE_WARM_RESET  0x00
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
index 2b9b9a7ba97a..ba7565bc8104 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
@@ -1257,9 +1257,9 @@ struct pptable_funcs {
int (*set_fine_grain_gfx_freq_parameters)(struct smu_context *smu);
 
/**
-* @set_light_sbr:  Set light sbr mode for the SMU.
+* @smu_handle_passthrough_sbr:  Send message to SMU about special 
handling for SBR.
 */
-   int (*set_light_sbr)(struct smu_context *smu, bool enable);
+   int (*smu_handle_passthrough_sbr)(struct smu_context *smu, bool enable);
 
/**
 * @wait_for_event:  Wait for events from SMU.
@@ -1415,7 +1415,7 @@ int smu_allow_xgmi_power_down(struct smu_context *smu, 
bool en);
 
 int smu_get_status_gfxoff(struct amdgpu_device *adev, uint32_t *value);
 
-int smu_set_light_sbr(struct smu_context *smu, bool enable);
+int smu_handle_passthrough_sbr(struct smu_context *smu, bool enable);
 
 int smu_wait_for_event(struct amdgpu_device *adev, enum smu_event_type event,
   uint64_t event_arg);
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/inc/smu_types.h
index 18b862a90fbe..ff8a0bcbd290 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_types.h
@@ -229,7 +229,8 @@
__SMU_DUMMY_MAP(BoardPowerCalibration),   \
__SMU_DUMMY_MAP(RequestGfxclk),   \
__SMU_DUMMY_MAP(ForceGfxVid), \
-   __SMU_DUMMY_MAP(UnforceGfxVid),
+   __SMU_DUMMY_MAP(UnforceGfxVid),   \
+   __SMU_DUMMY_MAP(HeavySBR),
 
 #undef __SMU_DUMMY_MAP
 #define __SMU_DUMMY_MAP(type)  SMU_MSG_##type
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h 
b/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
index 2d422e6a9feb..acb3be292096 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
@@ -312,7 +312,7 @@ int smu_v11_0_deep_sleep_control(struct smu_context *smu,
 
 void smu_v11_0_interrupt_work(struct smu_context *smu);
 
-int smu_v11_0_set_light_sbr(struct smu_context *smu, bool enable);
+int smu_v11_0_handle_passthrough_sbr(struct smu_context *smu, bool

RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread Liu, Shaoyun

[AMD Official Use Only]

From your explanation, it seems the SMU always needs this special handling for 
SBR in passthrough mode, but in the code it only applies to the XGMI 
configuration. Should you change that as well? Two comments inline.

Regards
Shaoyun.liu



-Original Message-
From: amd-gfx  On Behalf Of sashank saye
Sent: Friday, December 17, 2021 12:39 PM
To: amd-gfx@lists.freedesktop.org
Cc: Saye, Sashank 
Subject: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for 
sbr handling

For the Aldebaran chip passthrough case, we need to inform the SMU about special 
handling for SBR. On older chips we send LightSBR to the SMU; enable the same for 
Aldebaran. The slight difference compared to previous chips is that on Aldebaran 
the SMU does a heavy reset on SBR. Hence the word Heavy is used instead of Light 
SBR so the SMU can differentiate.

Signed-off-by: sashank saye <sashank.s...@amd.com>
Change-Id: I79420e7352bb670d6f9696df97d7546f131b18fc
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  6 +++---
 drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h   |  4 +++-
 drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h|  6 +++---
 drivers/gpu/drm/amd/pm/inc/smu_types.h |  3 ++-
 drivers/gpu/drm/amd/pm/inc/smu_v11_0.h |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  |  6 +++---
 drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c  |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 11 +++
 9 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f31caec669e7..01b02701121e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2618,11 +2618,11 @@ static int amdgpu_device_ip_late_init(struct 
amdgpu_device *adev)
if (r)
DRM_ERROR("enable mgpu fan boost failed (%d).\n", r);

-   /* For XGMI + passthrough configuration on arcturus, enable light SBR */
-   if (adev->asic_type == CHIP_ARCTURUS &&
+   /* For XGMI + passthrough configuration on arcturus and aldebaran, 
enable light SBR */
+   if ((adev->asic_type == CHIP_ARCTURUS || adev->asic_type ==
+CHIP_ALDEBARAN ) &&
amdgpu_passthrough(adev) &&
adev->gmc.xgmi.num_physical_nodes > 1)

[shaoyunl] Should this apply to non-XGMI configurations as well?

-   smu_set_light_sbr(&adev->smu, true);
+   smu_handle_passthrough_sbr(&adev->smu, true);

if (adev->gmc.xgmi.num_physical_nodes > 1) {
mutex_lock(&mgpu_info.mutex);
diff --git a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h 
b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
index 35fa0d8e92dd..ab66a4b9e438 100644
--- a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
+++ b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
@@ -102,7 +102,9 @@

 #define PPSMC_MSG_GfxDriverResetRecovery   0x42
 #define PPSMC_MSG_BoardPowerCalibration0x43
-#define PPSMC_Message_Count0x44
+#define PPSMC_MSG_HeavySBR  0x45
+#define PPSMC_Message_Count0x46
+

 //PPSMC Reset Types
 #define PPSMC_RESET_TYPE_WARM_RESET  0x00
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
index 2b9b9a7ba97a..ba7565bc8104 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
@@ -1257,9 +1257,9 @@ struct pptable_funcs {
int (*set_fine_grain_gfx_freq_parameters)(struct smu_context *smu);

/**
-* @set_light_sbr:  Set light sbr mode for the SMU.
+* @smu_handle_passthrough_sbr:  Send message to SMU about special 
handling for SBR.
 */
-   int (*set_light_sbr)(struct smu_context *smu, bool enable);
+   int (*smu_handle_passthrough_sbr)(struct smu_context *smu, bool
+enable);

/**
 * @wait_for_event:  Wait for events from SMU.
@@ -1415,7 +1415,7 @@ int smu_allow_xgmi_power_down(struct smu_context *smu, 
bool en);

 int smu_get_status_gfxoff(struct amdgpu_device *adev, uint32_t *value);

-int smu_set_light_sbr(struct smu_context *smu, bool enable);
+int smu_handle_passthrough_sbr(struct smu_context *smu, bool enable);

 int smu_wait_for_event(struct amdgpu_device *adev, enum smu_event_type event,
   uint64_t event_arg);
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/inc/smu_types.h
index 18b862a90fbe..ff8a0bcbd290 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_types.h
@@ -229,7 +229,8 @@
__SMU_DUMMY_MAP(BoardPowerCalibration),   \
__SMU_DUMMY_MAP(RequestGfxclk),   \
__SMU_DUMMY_MAP(ForceGfxVid), \
-   __SMU_DUMMY_MAP(UnforceGfxVid),
+   __SMU_DUMMY_MAP(UnforceGfxVid),   \
+   __SMU_DUMMY_MAP(HeavySBR),

Re: [PATCH 4/4] drm/amdgpu: Access the FRU on Aldebaran

2021-12-17 Thread Deucher, Alexander

[AMD Official Use Only]

Series is:
Reviewed-by: Alex Deucher 

From: amd-gfx  on behalf of Kent Russell 

Sent: Friday, December 17, 2021 10:31 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Russell, Kent 
Subject: [PATCH 4/4] drm/amdgpu: Access the FRU on Aldebaran

This is supported, although the offset is different from VG20, so fix
that with a variable and enable getting the product name and serial
number from the FRU. Do this for all SKUs since all SKUs have the FRU

Signed-off-by: Kent Russell 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
index 5ed24701f9cf..80f43e69e659 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
@@ -56,6 +56,9 @@ static bool is_fru_eeprom_supported(struct amdgpu_device 
*adev)
 return true;
 else
 return false;
+   case CHIP_ALDEBARAN:
+   /* All Aldebaran SKUs have the FRU */
+   return true;
 default:
 return false;
 }
@@ -91,6 +94,10 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
 unsigned char buff[PRODUCT_NAME_LEN+2];
 u32 addrptr;
 int size, len;
+   int offset = 2;
+
+   if (adev->asic_type == CHIP_ALDEBARAN)
+   offset = 0;

 if (!is_fru_eeprom_supported(adev))
 return 0;
@@ -137,7 +144,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
 len = PRODUCT_NAME_LEN - 1;
 }
 /* Start at 2 due to buff using fields 0 and 1 for the address */
-   memcpy(adev->product_name, &buff[2], len);
+   memcpy(adev->product_name, &buff[offset], len);
 adev->product_name[len] = '\0';

 addrptr += size + 1;
@@ -155,7 +162,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
 DRM_WARN("FRU Product Number is larger than 16 characters. 
This is likely a mistake");
 len = sizeof(adev->product_number) - 1;
 }
-   memcpy(adev->product_number, &buff[2], len);
+   memcpy(adev->product_number, &buff[offset], len);
 adev->product_number[len] = '\0';

 addrptr += size + 1;
@@ -182,7 +189,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
 DRM_WARN("FRU Serial Number is larger than 16 characters. This 
is likely a mistake");
 len = sizeof(adev->serial) - 1;
 }
-   memcpy(adev->serial, &buff[2], len);
+   memcpy(adev->serial, &buff[offset], len);
 adev->serial[len] = '\0';

 return 0;
--
2.25.1
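
For illustration only, here is a minimal user-space sketch of the offset
selection the commit message describes: the payload starts at byte 2 by
default (bytes 0 and 1 hold the address) and at byte 0 on Aldebaran. All
names prefixed demo_ or DEMO_ are hypothetical stand-ins rather than the
driver's own definitions, and the sample bytes are made up.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's ASIC identifiers. */
enum demo_chip { DEMO_CHIP_VEGA20, DEMO_CHIP_ALDEBARAN };

#define DEMO_NAME_LEN 16

/* Where the payload starts in the read buffer for a given ASIC. */
static int demo_fru_offset(enum demo_chip chip)
{
	return chip == DEMO_CHIP_ALDEBARAN ? 0 : 2;
}

/* Copy at most DEMO_NAME_LEN - 1 payload bytes and NUL-terminate. */
static void demo_fru_copy(char *dst, const unsigned char *buff,
			  size_t len, enum demo_chip chip)
{
	if (len > DEMO_NAME_LEN - 1)
		len = DEMO_NAME_LEN - 1;
	memcpy(dst, &buff[demo_fru_offset(chip)], len);
	dst[len] = '\0';
}

/* Usage: the same field is recovered from both buffer layouts. */
static void demo_fru_self_check(void)
{
	const unsigned char vg20[] = { 0x00, 0x1b, 'A', 'B', 'C', 'D' };
	const unsigned char aldebaran[] = { 'W', 'X', 'Y', 'Z' };
	char name[DEMO_NAME_LEN];

	demo_fru_copy(name, vg20, 4, DEMO_CHIP_VEGA20);
	assert(strcmp(name, "ABCD") == 0);
	demo_fru_copy(name, aldebaran, 4, DEMO_CHIP_ALDEBARAN);
	assert(strcmp(name, "WXYZ") == 0);
}
```

Keeping the offset in one helper means the three memcpy() call sites in
the patch cannot drift apart if another ASIC needs a different layout.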

[PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread sashank saye

For the Aldebaran chip passthrough case, we need to inform the
SMU about special handling for SBR. On older chips we send
LightSBR to the SMU; enable the same for Aldebaran. The slight
difference compared to previous chips is that on Aldebaran the
SMU does a heavy reset on SBR. Hence the word Heavy is used
instead of Light SBR so the SMU can differentiate.

Signed-off-by: sashank saye 
Change-Id: I79420e7352bb670d6f9696df97d7546f131b18fc
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  6 +++---
 drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h   |  4 +++-
 drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h|  6 +++---
 drivers/gpu/drm/amd/pm/inc/smu_types.h |  3 ++-
 drivers/gpu/drm/amd/pm/inc/smu_v11_0.h |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  |  6 +++---
 drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c  |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 11 +++
 9 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f31caec669e7..01b02701121e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2618,11 +2618,11 @@ static int amdgpu_device_ip_late_init(struct 
amdgpu_device *adev)
if (r)
DRM_ERROR("enable mgpu fan boost failed (%d).\n", r);
 
-   /* For XGMI + passthrough configuration on arcturus, enable light SBR */
-   if (adev->asic_type == CHIP_ARCTURUS &&
+   /* For XGMI + passthrough configuration on arcturus and aldebaran, 
enable light SBR */
+   if ((adev->asic_type == CHIP_ARCTURUS || adev->asic_type == 
CHIP_ALDEBARAN ) &&
amdgpu_passthrough(adev) &&
adev->gmc.xgmi.num_physical_nodes > 1)
-   smu_set_light_sbr(&adev->smu, true);
+   smu_handle_passthrough_sbr(&adev->smu, true);
 
if (adev->gmc.xgmi.num_physical_nodes > 1) {
mutex_lock(&mgpu_info.mutex);
diff --git a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h 
b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
index 35fa0d8e92dd..ab66a4b9e438 100644
--- a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
+++ b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
@@ -102,7 +102,9 @@
 
 #define PPSMC_MSG_GfxDriverResetRecovery   0x42
 #define PPSMC_MSG_BoardPowerCalibration0x43
-#define PPSMC_Message_Count0x44
+#define PPSMC_MSG_HeavySBR  0x45
+#define PPSMC_Message_Count0x46
+
 
 //PPSMC Reset Types
 #define PPSMC_RESET_TYPE_WARM_RESET  0x00
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
index 2b9b9a7ba97a..ba7565bc8104 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
@@ -1257,9 +1257,9 @@ struct pptable_funcs {
int (*set_fine_grain_gfx_freq_parameters)(struct smu_context *smu);
 
/**
-* @set_light_sbr:  Set light sbr mode for the SMU.
+* @smu_handle_passthrough_sbr:  Send message to SMU about special 
handling for SBR.
 */
-   int (*set_light_sbr)(struct smu_context *smu, bool enable);
+   int (*smu_handle_passthrough_sbr)(struct smu_context *smu, bool enable);
 
/**
 * @wait_for_event:  Wait for events from SMU.
@@ -1415,7 +1415,7 @@ int smu_allow_xgmi_power_down(struct smu_context *smu, 
bool en);
 
 int smu_get_status_gfxoff(struct amdgpu_device *adev, uint32_t *value);
 
-int smu_set_light_sbr(struct smu_context *smu, bool enable);
+int smu_handle_passthrough_sbr(struct smu_context *smu, bool enable);
 
 int smu_wait_for_event(struct amdgpu_device *adev, enum smu_event_type event,
   uint64_t event_arg);
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/inc/smu_types.h
index 18b862a90fbe..ff8a0bcbd290 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_types.h
@@ -229,7 +229,8 @@
__SMU_DUMMY_MAP(BoardPowerCalibration),   \
__SMU_DUMMY_MAP(RequestGfxclk),   \
__SMU_DUMMY_MAP(ForceGfxVid), \
-   __SMU_DUMMY_MAP(UnforceGfxVid),
+   __SMU_DUMMY_MAP(UnforceGfxVid),   \
+   __SMU_DUMMY_MAP(HeavySBR),
 
 #undef __SMU_DUMMY_MAP
 #define __SMU_DUMMY_MAP(type)  SMU_MSG_##type
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h 
b/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
index 2d422e6a9feb..acb3be292096 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
@@ -312,7 +312,7 @@ int smu_v11_0_deep_sleep_control(struct smu_context *smu,
 
 void smu_v11_0_interrupt_work(struct smu_context *smu);
 
-int smu_v11_0_set_light_sbr(struct smu_context *smu, bool enable);
+int smu_v11_0_handle_passthrough_sbr(struct smu_context *smu, bool enable);
 
 int
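
The smu_types.h hunk above adds HeavySBR to a list that is expanded
through __SMU_DUMMY_MAP. As a rough sketch of how that kind of list
macro works, the example below expands one list twice: once into enum
constants and once into a matching name table. The DEMO_ names and the
three-entry list are hypothetical; the real list is much longer and the
second expansion is shown only to illustrate the pattern.

```c
#include <assert.h>
#include <string.h>

/* Shortened, hypothetical message list. */
#define DEMO_MESSAGE_TYPES		\
	DEMO_MAP(ForceGfxVid),		\
	DEMO_MAP(UnforceGfxVid),	\
	DEMO_MAP(HeavySBR),

/* First expansion: one enum constant per list entry. */
#define DEMO_MAP(type) DEMO_MSG_##type
enum demo_message_type {
	DEMO_MESSAGE_TYPES
	DEMO_MSG_MAX_COUNT,
};
#undef DEMO_MAP

/* Second expansion: a name table in the same order as the enum. */
#define DEMO_MAP(type) #type
static const char *const demo_message_names[] = {
	DEMO_MESSAGE_TYPES
};
#undef DEMO_MAP

/* Look up the printable name of a message, or "unknown" if out of range. */
static const char *demo_message_name(enum demo_message_type msg)
{
	/* The table and the enum come from one list, so sizes must agree. */
	assert(sizeof(demo_message_names) / sizeof(demo_message_names[0]) ==
	       DEMO_MSG_MAX_COUNT);
	if (msg >= DEMO_MSG_MAX_COUNT)
		return "unknown";
	return demo_message_names[msg];
}
```

With this pattern, adding a message is a one-line change to the list,
which is why the patch only needs to append one entry and move the
trailing line continuation.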

Re: [Bug Report] Desktop monitor sleep regression

2021-12-17 Thread Thorsten Leemhuis




On 17.12.21 15:52, Imre Deak wrote:
> On Fri, Dec 17, 2021 at 03:46:21PM +0100, Thorsten Leemhuis wrote:
>> added some CCs Geert added in his reply
>>
>> On 07.12.21 08:20, Thorsten Leemhuis wrote:
>>>
>>> [TLDR: adding this regression to regzbot; most of this mail is compiled
>>> from a few template paragraphs some of you might have seen already.]
>>>
>>> Hi, this is your Linux kernel regression tracker speaking.
>>
>> /me again
>>
>> What's up here? We are getting close to rc6, but there afaics wasn't any
>> reply of substance since the report ten days ago. Hence:
>>
>> Could anybody please comment on this? Imre Deak, the commit Brandon
>> found in the bisection contains a patch of yours, do you maybe have an
>> idea what's up here?
> 
> Yes,
> https://bugzilla.kernel.org/show_bug.cgi?id=215203
> 
> based on which the problem is somewhere in the AMD driver.

Ha, sorry for the noise then, I really feel stupid: I have no idea why I
didn't check the bug report for an update, as I normally do. It must
have slipped through. Oh well, hopefully we will one day have a central
place to handle these things.

Ciao, Thorsten

>> Ciao, Thorsten
>>
>> #regzbot poke
>>
>>> Adding the regression mailing list to the list of recipients, as it
>>> should be in the loop for all regressions, as explained here:
>>> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
>>>
>>> Also adding the authors and reviewers of the culprit and two appropriate
>>> mailing lists.
>>>
>>> On 07.12.21 01:21, Brandon Nielsen wrote:
>>>> Monitors no longer sleep properly on my system (dual monitor connected
>>>> via DP->DVI, amdgpu, x86_64). The monitors slept properly on 5.14, but
>>>> stopped during the 5.15 series. I have also filed this bug on the kernel
>>>> bugzilla[0] and downstream[1].
>>>>
>>>> I have performed a bisect, first "bad" commit to master is
>>>> 55285e21f04517939480966164a33898c34b2af2[1], the same change made it
>>>> into the 5.15 branch as e3b39825ed0813f787cb3ebdc5ecaa5131623647.
>>>
>>> TWIMC: That was for 5.15.3
>>>
>>>> I have
>>>> verified the issue exists in latest master
>>>> (a51e3ac43ddbad891c2b1a4f3aa52371d6939570).
>>>>
>>>> Steps to reproduce:
>>>>
>>>>   1. Boot system (Fedora Workstation 35 in this case)
>>>>   2. Log in
>>>>   3. Lock screen (after a few seconds, monitors will enter power save
>>>> "sleep" state with backlight off)
>>>>   4. Wait (usually no more than 30 seconds, sometimes up to a few minutes)
>>>>   5. Observe monitor leaving "sleep" state (backlight comes back on),
>>>> but nothing is displayed
>>>>
>>>> [0] - https://bugzilla.kernel.org/show_bug.cgi?id=215203
>>>> [1] - https://bugzilla.redhat.com/show_bug.cgi?id=2028613
>>>
>>> To be sure this issue doesn't fall through the cracks unnoticed, I'm
>>> adding it to regzbot, my Linux kernel regression tracking bot:
>>>
>>> #regzbot ^introduced 55285e21f04517939480966164a33898c34b2af2
>>> #regzbot title fbdev/efifb: Monitors no longer sleep (amdgpu dual
>>> monitor setup)
>>> #regzbot ignore-activity
>>>
>>> Reminder: when fixing the issue, please add a 'Link:' tag with the URL
>>> to the report (the parent of this mail), then regzbot will automatically
>>> mark the regression as resolved once the fix lands in the appropriate
>>> tree. For more details about regzbot see footer.
>>>
>>> Sending this to everyone that got the initial report, to make all aware
>>> of the tracking. I also hope that messages like this motivate people to
>>> directly get at least the regression mailing list and ideally even
>>> regzbot involved when dealing with regressions, as messages like this
>>> wouldn't be needed then.
>>>
>>> Don't worry, I'll send further messages wrt to this regression just to
>>> the lists (with a tag in the subject so people can filter them away), as
>>> long as they are intended just for regzbot. With a bit of luck no such
>>> messages will be needed anyway.
>>>
>>> Ciao, Thorsten, your Linux kernel regression tracker.
>>>
>>> P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
>>> on my table. I can only look briefly into most of them. Unfortunately
>>> therefore I sometimes will get things wrong or miss something important.
>>> I hope that's not the case here; if you think it is, don't hesitate to
>>> tell me about it in a public reply. That's in everyone's interest, as
>>> what I wrote above might be misleading to everyone reading this; any
>>> suggestion I gave thus might send someone reading this down the
>>> wrong rabbit hole, which none of us wants.
>>>
>>> BTW, I have no personal interest in this issue, which is tracked using
>>> regzbot, my Linux kernel regression tracking bot
>>> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
>>> this mail to get things rolling again and hence don't need to be CC on
>>> all further activities wrt to this regression.
>>>
>

RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread Saye, Sashank

[AMD Official Use Only]

Yes, after the SMU does the mode 1 reset, the clock is cleared. Hence, when the 
driver boots after that, it will look like a regular cold boot.

Regards
Sashank

-Original Message-
From: Liu, Shaoyun  
Sent: Friday, December 17, 2021 12:07 PM
To: Saye, Sashank ; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough 
for sbr handling

[AMD Official Use Only]

OK, sounds reasonable. I'm OK with the function name change.
Another concern: on the driver side, before it starts the IP init, it checks 
the SMU clock to determine whether the ASIC needs a reset from the driver 
side. In your case, the hypervisor triggers the SBR on VM on/off and the 
SMU handles the reset. Can you check whether the SMU is still alive after 
this reset? If it is alive, the driver will trigger the reset again.

Regards
Shaoyun.liu

-Original Message-
From: Saye, Sashank 
Sent: Friday, December 17, 2021 11:53 AM
To: Liu, Shaoyun ; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough 
for sbr handling

[AMD Official Use Only]

Hi Shaoyun,
Yes, from the SMU FW point of view there is a difference between the bare-metal 
and passthrough cases for SBR. On bare metal the FW sees it as a PCI reset, 
whereas in the passthrough case it sees it as a BIF reset. Within the BIF reset 
the FW then needs to differentiate between older ASICs (where we do BACO) and 
newer ones (where we do a mode 1 reset). Hence, in order for the SMU to 
differentiate these scenarios, we are adding a new message.

I think I will rename the function from the current smu_set_light_sbr to 
smu_handle_passthrough_sbr.

Regards
Sashank

-Original Message-
From: Liu, Shaoyun 
Sent: Friday, December 17, 2021 11:45 AM
To: Saye, Sashank ; amd-gfx@lists.freedesktop.org
Cc: Saye, Sashank 
Subject: RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough 
for sbr handling

[AMD Official Use Only]

First, the name heavy SBR is confusing when you need to go through the light 
SBR code path.
Second, we originally introduced the light SBR because on older ASICs the 
FW cannot synchronize the reset across the devices within the hive, so it 
depends on the driver to sync the reset. From what I have heard, on Arcturus 
the FW can actually sync the reset itself. I don't see a need to introduce 
the heavy SBR message; it seems the SMU will do a full reset when it gets an 
SBR request. Is there a different code path for the SMU to handle the reset 
for XGMI in passthrough mode?

Regards
Shaoyun.liu

-Original Message-
From: amd-gfx  On Behalf Of sashank saye
Sent: Friday, December 17, 2021 10:33 AM
To: amd-gfx@lists.freedesktop.org
Cc: Saye, Sashank 
Subject: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for 
sbr handling

For the Aldebaran chip passthrough case, we need to inform the SMU about special 
handling for SBR. On older chips we send LightSBR to the SMU; enable the same for 
Aldebaran. The slight difference compared to previous chips is that on Aldebaran 
the SMU does a heavy reset on SBR. Hence the word Heavy is used instead of Light 
SBR so the SMU can differentiate.

Signed-off-by: sashank saye 
Change-Id: I79420e7352bb670d6f9696df97d7546f131b18fc
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 ++--
 drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h   |  4 +++-
 drivers/gpu/drm/amd/pm/inc/smu_types.h |  3 ++-
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 11 +++
 4 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f31caec669e7..06aee23505b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2618,8 +2618,8 @@ static int amdgpu_device_ip_late_init(struct 
amdgpu_device *adev)
if (r)
DRM_ERROR("enable mgpu fan boost failed (%d).\n", r);
 
-   /* For XGMI + passthrough configuration on arcturus, enable light SBR */
-   if (adev->asic_type == CHIP_ARCTURUS &&
+   /* For XGMI + passthrough configuration on arcturus and aldebaran, 
enable light SBR */
+   if ((adev->asic_type == CHIP_ARCTURUS || adev->asic_type == 
+CHIP_ALDEBARAN ) &&
amdgpu_passthrough(adev) &&
adev->gmc.xgmi.num_physical_nodes > 1)
smu_set_light_sbr(&adev->smu, true);
diff --git a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h 
b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
index 35fa0d8e92dd..ab66a4b9e438 100644
--- a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
+++ b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
@@ -102,7 +102,9 @@
 
 #define PPSMC_MSG_GfxDriverResetRecovery   0x42
 #define PPSMC_MSG_BoardPowerCalibration0x43
-#define PPSMC_Message_Count0x44
+#define PPSMC_MSG_HeavySBR  0x45
+#define

RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread Liu, Shaoyun

[AMD Official Use Only]

OK, sounds reasonable. I'm OK with the function name change.
Another concern: on the driver side, before it starts the IP init, it checks 
the SMU clock to determine whether the ASIC needs a reset from the driver 
side. In your case, the hypervisor triggers the SBR on VM on/off and the 
SMU handles the reset. Can you check whether the SMU is still alive after 
this reset? If it is alive, the driver will trigger the reset again.

Regards
Shaoyun.liu

-Original Message-
From: Saye, Sashank  
Sent: Friday, December 17, 2021 11:53 AM
To: Liu, Shaoyun ; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough 
for sbr handling

[AMD Official Use Only]

Hi Shaoyun,
Yes, from the SMU FW point of view there is a difference between the bare-metal 
and passthrough cases for SBR. On bare metal the FW sees it as a PCI reset, 
whereas in the passthrough case it sees it as a BIF reset. Within the BIF reset 
the FW then needs to differentiate between older ASICs (where we do BACO) and 
newer ones (where we do a mode 1 reset). Hence, in order for the SMU to 
differentiate these scenarios, we are adding a new message.

I think I will rename the function from the current smu_set_light_sbr to 
smu_handle_passthrough_sbr.

Regards
Sashank

-Original Message-
From: Liu, Shaoyun 
Sent: Friday, December 17, 2021 11:45 AM
To: Saye, Sashank ; amd-gfx@lists.freedesktop.org
Cc: Saye, Sashank 
Subject: RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough 
for sbr handling

[AMD Official Use Only]

First, the name heavy SBR is confusing when you need to go through the light 
SBR code path.
Second, we originally introduced the light SBR because on older ASICs the 
FW cannot synchronize the reset across the devices within the hive, so it 
depends on the driver to sync the reset. From what I have heard, on Arcturus 
the FW can actually sync the reset itself. I don't see a need to introduce 
the heavy SBR message; it seems the SMU will do a full reset when it gets an 
SBR request. Is there a different code path for the SMU to handle the reset 
for XGMI in passthrough mode?

Regards
Shaoyun.liu

-Original Message-
From: amd-gfx  On Behalf Of sashank saye
Sent: Friday, December 17, 2021 10:33 AM
To: amd-gfx@lists.freedesktop.org
Cc: Saye, Sashank 
Subject: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for 
sbr handling

For the Aldebaran chip passthrough case, we need to inform the SMU about special 
handling for SBR. On older chips we send LightSBR to the SMU; enable the same for 
Aldebaran. The slight difference compared to previous chips is that on Aldebaran 
the SMU does a heavy reset on SBR. Hence the word Heavy is used instead of Light 
SBR so the SMU can differentiate.

Signed-off-by: sashank saye 
Change-Id: I79420e7352bb670d6f9696df97d7546f131b18fc
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 ++--
 drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h   |  4 +++-
 drivers/gpu/drm/amd/pm/inc/smu_types.h |  3 ++-
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 11 +++
 4 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f31caec669e7..06aee23505b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2618,8 +2618,8 @@ static int amdgpu_device_ip_late_init(struct 
amdgpu_device *adev)
if (r)
DRM_ERROR("enable mgpu fan boost failed (%d).\n", r);
 
-   /* For XGMI + passthrough configuration on arcturus, enable light SBR */
-   if (adev->asic_type == CHIP_ARCTURUS &&
+   /* For XGMI + passthrough configuration on arcturus and aldebaran, 
enable light SBR */
+   if ((adev->asic_type == CHIP_ARCTURUS || adev->asic_type == 
+CHIP_ALDEBARAN ) &&
amdgpu_passthrough(adev) &&
adev->gmc.xgmi.num_physical_nodes > 1)
smu_set_light_sbr(&adev->smu, true);
diff --git a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h 
b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
index 35fa0d8e92dd..ab66a4b9e438 100644
--- a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
+++ b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
@@ -102,7 +102,9 @@
 
 #define PPSMC_MSG_GfxDriverResetRecovery   0x42
 #define PPSMC_MSG_BoardPowerCalibration0x43
-#define PPSMC_Message_Count0x44
+#define PPSMC_MSG_HeavySBR  0x45
+#define PPSMC_Message_Count0x46
+
 
 //PPSMC Reset Types
 #define PPSMC_RESET_TYPE_WARM_RESET  0x00
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/inc/smu_types.h
index 18b862a90fbe..ff8a0bcbd290 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_types.h
@@ -229,7 +229,8 @@
__SMU_DUMMY_MAP(BoardPowerCalibration),   \

RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread Saye, Sashank

[AMD Official Use Only]

Hi Shaoyun,
Yes, from the SMU FW point of view there is a difference between the bare-metal 
and passthrough cases for SBR. On bare metal the FW sees it as a PCI reset, 
whereas in the passthrough case it sees it as a BIF reset. Within the BIF reset 
the FW then needs to differentiate between older ASICs (where we do BACO) and 
newer ones (where we do a mode 1 reset). Hence, in order for the SMU to 
differentiate these scenarios, we are adding a new message.

I think I will rename the function from the current smu_set_light_sbr to 
smu_handle_passthrough_sbr.

Regards
Sashank
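
A minimal sketch of the selection discussed in this message, assuming
the gating from the posted patch (passthrough plus more than one XGMI
node) and collapsing the per-ASIC callbacks into one function for
brevity. Every demo_ and DEMO_ name is hypothetical; the driver itself
routes this through smu_handle_passthrough_sbr and the per-ASIC tables.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for ASIC identifiers and SMU messages. */
enum demo_chip { DEMO_CHIP_VEGA20, DEMO_CHIP_ARCTURUS, DEMO_CHIP_ALDEBARAN };
enum demo_sbr_msg { DEMO_SBR_NONE, DEMO_SBR_LIGHT, DEMO_SBR_HEAVY };

/*
 * Decide which SBR message, if any, the driver would send.
 * Bare metal sends nothing; in passthrough the older ASIC gets the
 * light message and Aldebaran gets the heavy one.
 */
static enum demo_sbr_msg demo_pick_sbr_msg(enum demo_chip chip,
					   bool passthrough,
					   unsigned int xgmi_nodes)
{
	/* Gating as in the posted patch: passthrough and an XGMI hive. */
	if (!passthrough || xgmi_nodes <= 1)
		return DEMO_SBR_NONE;

	switch (chip) {
	case DEMO_CHIP_ARCTURUS:
		return DEMO_SBR_LIGHT;
	case DEMO_CHIP_ALDEBARAN:
		return DEMO_SBR_HEAVY;
	default:
		return DEMO_SBR_NONE;
	}
}

/* Usage: one check per scenario named in the thread. */
static void demo_sbr_self_check(void)
{
	assert(demo_pick_sbr_msg(DEMO_CHIP_ARCTURUS, true, 4) == DEMO_SBR_LIGHT);
	assert(demo_pick_sbr_msg(DEMO_CHIP_ALDEBARAN, true, 2) == DEMO_SBR_HEAVY);
	assert(demo_pick_sbr_msg(DEMO_CHIP_ALDEBARAN, false, 2) == DEMO_SBR_NONE);
	assert(demo_pick_sbr_msg(DEMO_CHIP_ALDEBARAN, true, 1) == DEMO_SBR_NONE);
	assert(demo_pick_sbr_msg(DEMO_CHIP_VEGA20, true, 2) == DEMO_SBR_NONE);
}
```

The single-node Aldebaran case returning DEMO_SBR_NONE reflects only the
condition in the posted patch; whether non-XGMI passthrough should also
send the message is the open question raised in review.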

-Original Message-
From: Liu, Shaoyun  
Sent: Friday, December 17, 2021 11:45 AM
To: Saye, Sashank ; amd-gfx@lists.freedesktop.org
Cc: Saye, Sashank 
Subject: RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough 
for sbr handling

[AMD Official Use Only]

First, the name heavy SBR is confusing when you need to go through the light 
SBR code path.
Second, we originally introduced the light SBR because on older ASICs the 
FW cannot synchronize the reset across the devices within the hive, so it 
depends on the driver to sync the reset. From what I have heard, on Arcturus 
the FW can actually sync the reset itself. I don't see a need to introduce 
the heavy SBR message; it seems the SMU will do a full reset when it gets an 
SBR request. Is there a different code path for the SMU to handle the reset 
for XGMI in passthrough mode?

Regards
Shaoyun.liu

-Original Message-
From: amd-gfx  On Behalf Of sashank saye
Sent: Friday, December 17, 2021 10:33 AM
To: amd-gfx@lists.freedesktop.org
Cc: Saye, Sashank 
Subject: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for 
sbr handling

For the Aldebaran chip passthrough case, we need to inform the SMU about special 
handling for SBR. On older chips we send LightSBR to the SMU; enable the same for 
Aldebaran. The slight difference compared to previous chips is that on Aldebaran 
the SMU does a heavy reset on SBR. Hence the word Heavy is used instead of Light 
SBR so the SMU can differentiate.

Signed-off-by: sashank saye 
Change-Id: I79420e7352bb670d6f9696df97d7546f131b18fc
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 ++--
 drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h   |  4 +++-
 drivers/gpu/drm/amd/pm/inc/smu_types.h |  3 ++-
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 11 +++
 4 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f31caec669e7..06aee23505b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2618,8 +2618,8 @@ static int amdgpu_device_ip_late_init(struct 
amdgpu_device *adev)
if (r)
DRM_ERROR("enable mgpu fan boost failed (%d).\n", r);
 
-   /* For XGMI + passthrough configuration on arcturus, enable light SBR */
-   if (adev->asic_type == CHIP_ARCTURUS &&
+   /* For XGMI + passthrough configuration on arcturus and aldebaran, 
enable light SBR */
+   if ((adev->asic_type == CHIP_ARCTURUS || adev->asic_type == 
+CHIP_ALDEBARAN ) &&
amdgpu_passthrough(adev) &&
adev->gmc.xgmi.num_physical_nodes > 1)
smu_set_light_sbr(&adev->smu, true);
diff --git a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h 
b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
index 35fa0d8e92dd..ab66a4b9e438 100644
--- a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
+++ b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
@@ -102,7 +102,9 @@
 
 #define PPSMC_MSG_GfxDriverResetRecovery   0x42
 #define PPSMC_MSG_BoardPowerCalibration0x43
-#define PPSMC_Message_Count0x44
+#define PPSMC_MSG_HeavySBR  0x45
+#define PPSMC_Message_Count0x46
+
 
 //PPSMC Reset Types
 #define PPSMC_RESET_TYPE_WARM_RESET  0x00
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/inc/smu_types.h
index 18b862a90fbe..ff8a0bcbd290 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_types.h
@@ -229,7 +229,8 @@
__SMU_DUMMY_MAP(BoardPowerCalibration),   \
__SMU_DUMMY_MAP(RequestGfxclk),   \
__SMU_DUMMY_MAP(ForceGfxVid), \
-   __SMU_DUMMY_MAP(UnforceGfxVid),
+   __SMU_DUMMY_MAP(UnforceGfxVid),   \
+   __SMU_DUMMY_MAP(HeavySBR),
 
 #undef __SMU_DUMMY_MAP
 #define __SMU_DUMMY_MAP(type)  SMU_MSG_##type
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index 7433a051e795..f442950e9676 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -141,6 +141,7 @@ static const struct cmn2asic_msg_mapping 
aldebaran_message_map[SMU_MSG_MAX_COUNT
MSG_MAP(SetUclkDpmMode,

RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread Liu, Shaoyun

[AMD Official Use Only]

First, the name heavy SBR is confusing when you need to go through the light 
SBR code path.
Second, we originally introduced the light SBR because on older ASICs the 
FW cannot synchronize the reset across the devices within the hive, so it 
depends on the driver to sync the reset. From what I have heard, on Arcturus 
the FW can actually sync the reset itself. I don't see a need to introduce 
the heavy SBR message; it seems the SMU will do a full reset when it gets an 
SBR request. Is there a different code path for the SMU to handle the reset 
for XGMI in passthrough mode?

Regards
Shaoyun.liu

-Original Message-
From: amd-gfx  On Behalf Of sashank saye
Sent: Friday, December 17, 2021 10:33 AM
To: amd-gfx@lists.freedesktop.org
Cc: Saye, Sashank 
Subject: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for 
sbr handling

For the Aldebaran chip passthrough case we need to inform the SMU about special 
handling for SBR. On older chips we send LightSBR to the SMU; enable the same 
for Aldebaran. The slight difference compared to previous chips is that on 
Aldebaran the SMU does a heavy reset on SBR. Hence the word Heavy instead of 
Light SBR is used for the SMU to differentiate.

Signed-off-by: sashank saye 
Change-Id: I79420e7352bb670d6f9696df97d7546f131b18fc
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 ++--
 drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h   |  4 +++-
 drivers/gpu/drm/amd/pm/inc/smu_types.h |  3 ++-
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 11 +++
 4 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f31caec669e7..06aee23505b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2618,8 +2618,8 @@ static int amdgpu_device_ip_late_init(struct 
amdgpu_device *adev)
if (r)
DRM_ERROR("enable mgpu fan boost failed (%d).\n", r);
 
-   /* For XGMI + passthrough configuration on arcturus, enable light SBR */
-   if (adev->asic_type == CHIP_ARCTURUS &&
+   /* For XGMI + passthrough configuration on arcturus and aldebaran, 
enable light SBR */
+   if ((adev->asic_type == CHIP_ARCTURUS || adev->asic_type == 
+CHIP_ALDEBARAN ) &&
amdgpu_passthrough(adev) &&
adev->gmc.xgmi.num_physical_nodes > 1)
smu_set_light_sbr(&adev->smu, true);
diff --git a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h 
b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
index 35fa0d8e92dd..ab66a4b9e438 100644
--- a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
+++ b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
@@ -102,7 +102,9 @@
 
 #define PPSMC_MSG_GfxDriverResetRecovery   0x42
 #define PPSMC_MSG_BoardPowerCalibration0x43
-#define PPSMC_Message_Count0x44
+#define PPSMC_MSG_HeavySBR  0x45
+#define PPSMC_Message_Count0x46
+
 
 //PPSMC Reset Types
 #define PPSMC_RESET_TYPE_WARM_RESET  0x00
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/inc/smu_types.h
index 18b862a90fbe..ff8a0bcbd290 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_types.h
@@ -229,7 +229,8 @@
__SMU_DUMMY_MAP(BoardPowerCalibration),   \
__SMU_DUMMY_MAP(RequestGfxclk),   \
__SMU_DUMMY_MAP(ForceGfxVid), \
-   __SMU_DUMMY_MAP(UnforceGfxVid),
+   __SMU_DUMMY_MAP(UnforceGfxVid),   \
+   __SMU_DUMMY_MAP(HeavySBR),
 
 #undef __SMU_DUMMY_MAP
 #define __SMU_DUMMY_MAP(type)  SMU_MSG_##type
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index 7433a051e795..f442950e9676 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -141,6 +141,7 @@ static const struct cmn2asic_msg_mapping 
aldebaran_message_map[SMU_MSG_MAX_COUNT
MSG_MAP(SetUclkDpmMode,  PPSMC_MSG_SetUclkDpmMode,  
0),
MSG_MAP(GfxDriverResetRecovery,  
PPSMC_MSG_GfxDriverResetRecovery,  0),
MSG_MAP(BoardPowerCalibration,   
PPSMC_MSG_BoardPowerCalibration,   0),
+   MSG_MAP(HeavySBR,PPSMC_MSG_HeavySBR,
0),
 };
 
 static const struct cmn2asic_mapping aldebaran_clk_map[SMU_CLK_COUNT] = { @@ 
-1912,6 +1913,15 @@ static int aldebaran_mode2_reset(struct smu_context *smu)
return ret;
 }
 
+static int aldebaran_set_light_sbr(struct smu_context *smu, bool 
+enable) {
+   int ret = 0;
+   //For alderbarn chip, SMU would do a mode 1 reset as part of SBR hence 
we call it HeavySBR instead of light
+   ret =  smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_HeavySBR, enable ? 
+1 : 0, NULL);
+

Re: [PATCH] drm/amdgpu: Try To using WARN() instead BUG() avoid kernel panic

2021-12-17 Thread Deucher, Alexander

[Public]

I think these are pretty fundamental errors.  You should never hit them in 
practice and if you do, I think a BUG is fine.

Alex
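The trade-off discussed in this thread can be illustrated outside the kernel. The following is a minimal user-space sketch, assuming a hypothetical `struct fake_device` and `fake_rreg8()` that stand in for the driver's register helpers; it shows the "report, flag the device, and return a dummy value" policy proposed in the patch, as opposed to stopping outright as BUG() does. It is not driver code and does not settle which policy is appropriate.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the parts of struct amdgpu_device used here. */
struct fake_device {
	const uint8_t *rmmio;   /* mapped register window */
	uint32_t rmmio_size;    /* size of the window in bytes */
	bool accel_working;     /* cleared once an invalid access is seen */
};

/*
 * Read one byte from the register window. An out-of-range offset is
 * reported and the device is flagged, but the caller keeps running;
 * a BUG()-style policy would stop here instead.
 */
static uint8_t fake_rreg8(struct fake_device *dev, uint32_t offset)
{
	if (offset < dev->rmmio_size)
		return dev->rmmio[offset];

	fprintf(stderr, "invalid register offset 0x%x\n", (unsigned int)offset);
	dev->accel_working = false;
	return 0;
}
```

Note that the caller cannot tell a returned 0 from a real register value unless it also checks the flag, which is part of why such errors are treated as fundamental in the reply above.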


From: ZhiJie.Zhang 
Sent: Thursday, December 16, 2021 9:38 PM
To: Koenig, Christian ; Deucher, Alexander 
; amd-gfx@lists.freedesktop.org 

Cc: zhangzhi...@loongson.cn ; botton_zh...@163.com 
; airl...@linux.ie ; dan...@ffwll.ch 
; jack.zha...@amd.com 
Subject: [PATCH] drm/amdgpu: Try To using WARN() instead BUG() avoid kernel 
panic

Signed-off-by: ZhiJie.Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c|  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 +++---
 drivers/gpu/drm/amd/amdgpu/cik_sdma.c  |  5 +++-
 drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c  | 18 +
 4 files changed, 41 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c
index f1a050379190..edf2de4cec8c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c
@@ -76,7 +76,7 @@ static uint32_t amdgpu_cgs_read_ind_register(struct 
cgs_device *cgs_device,
 DRM_ERROR("audio endpt register access not implemented.\n");
 return 0;
 default:
-   BUG();
+   adev->accel_working = false;
 }
 WARN(1, "Invalid indirect register space");
 return 0;
@@ -104,9 +104,9 @@ static void amdgpu_cgs_write_ind_register(struct cgs_device 
*cgs_device,
 DRM_ERROR("audio endpt register access not implemented.\n");
 return;
 default:
-   BUG();
 }
 WARN(1, "Invalid indirect register space");
+   adev->accel_working = false;
 }

 static uint32_t fw_type_convert(struct cgs_device *cgs_device, uint32_t 
fw_type)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 188accb71249..b9ecf7f70409 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -488,7 +488,11 @@ uint8_t amdgpu_mm_rreg8(struct amdgpu_device *adev, 
uint32_t offset)

 if (offset < adev->rmmio_size)
 return (readb(adev->rmmio + offset));
-   BUG();
+
+   WARN(1, "Invalid indirect register space");
+   adev->accel_working = false;
+
+   return 0;
 }

 /*
@@ -513,8 +517,10 @@ void amdgpu_mm_wreg8(struct amdgpu_device *adev, uint32_t 
offset, uint8_t value)

 if (offset < adev->rmmio_size)
 writeb(value, adev->rmmio + offset);
-   else
-   BUG();
+   else {
+   WARN(1, "Invalid indirect register space");
+   adev->accel_working = false;
+   }
 }

 /**
@@ -803,7 +809,8 @@ void amdgpu_device_indirect_wreg64(struct amdgpu_device 
*adev,
 static uint32_t amdgpu_invalid_rreg(struct amdgpu_device *adev, uint32_t reg)
 {
 DRM_ERROR("Invalid callback to read register 0x%04X\n", reg);
-   BUG();
+
+   adev->accel_working = false;
 return 0;
 }

@@ -821,7 +828,8 @@ static void amdgpu_invalid_wreg(struct amdgpu_device *adev, 
uint32_t reg, uint32
 {
 DRM_ERROR("Invalid callback to write register 0x%04X with 0x%08X\n",
   reg, v);
-   BUG();
+
+   adev->accel_working = false;
 }

 /**
@@ -837,7 +845,8 @@ static void amdgpu_invalid_wreg(struct amdgpu_device *adev, 
uint32_t reg, uint32
 static uint64_t amdgpu_invalid_rreg64(struct amdgpu_device *adev, uint32_t reg)
 {
 DRM_ERROR("Invalid callback to read 64 bit register 0x%04X\n", reg);
-   BUG();
+
+   adev->accel_working = false;
 return 0;
 }

@@ -855,7 +864,8 @@ static void amdgpu_invalid_wreg64(struct amdgpu_device 
*adev, uint32_t reg, uint
 {
 DRM_ERROR("Invalid callback to write 64 bit register 0x%04X with 
0x%08llX\n",
   reg, v);
-   BUG();
+
+   adev->accel_working = false;
 }

 /**
@@ -874,7 +884,9 @@ static uint32_t amdgpu_block_invalid_rreg(struct 
amdgpu_device *adev,
 {
 DRM_ERROR("Invalid callback to read register 0x%04X in block 0x%04X\n",
   reg, block);
-   BUG();
+
+   adev->accel_working = false;
+
 return 0;
 }

@@ -895,7 +907,8 @@ static void amdgpu_block_invalid_wreg(struct amdgpu_device 
*adev,
 {
 DRM_ERROR("Invalid block callback to write register 0x%04X in block 
0x%04X with 0x%08X\n",
   reg, block, v);
-   BUG();
+
+   adev->accel_working = false;
 }

 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c 
b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
index c8ebd108548d..957169142e57 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
@@ -129,7 +129,10 @@ static int cik_sdma_init_microcode(struct amdgpu_device 
*adev)
 case CHIP_MULLINS:
 chip_name = "mullins";
 break;
-   default:

Re: [PATCH] drm/amdgpu: Check the memory can be accesssed by ttm_device_clear_dma_mappings.

2021-12-17 Thread Alex Deucher

On Thu, Dec 16, 2021 at 10:25 AM Surbhi Kakarya  wrote:
>
> If the event guard is enabled and the VF doesn't receive an ack from the PF
> for full access, the guest driver load crashes.
> This is caused by the call to ttm_device_clear_dma_mappings with a
> non-initialized mman during driver teardown.
>
> This patch adds the necessary check for whether the mman initialization
> passed and takes the path based on the result.
>
> Signed-off-by: skakarya 

Acked-by: Alex Deucher 

> Change-Id: I1c18c7eb3500687c8b6e7fc414503dcf2a20b94c
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 598250a380f5..226110be7a2f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3984,7 +3984,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
>
> amdgpu_irq_fini_hw(adev);
>
> -   ttm_device_clear_dma_mappings(&adev->mman.bdev);
> +   if (adev->mman.initialized)
> +   ttm_device_clear_dma_mappings(&adev->mman.bdev);
>
> amdgpu_gart_dummy_page_fini(adev);
>
> --
> 2.25.1
>
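The guard added by this patch follows a common teardown pattern: skip a cleanup step when the matching init step never completed. Below is a small user-space sketch of that pattern. The names `fake_mman`, `fake_clear_dma_mappings` and `fake_fini_hw` are hypothetical stand-ins, not the TTM or amdgpu API.

```c
#include <stdbool.h>

/* Hypothetical memory-manager state; only the flag matters here. */
struct fake_mman {
	bool initialized;
	int dma_mappings;   /* number of live mappings, for illustration */
};

/* Stand-in for ttm_device_clear_dma_mappings(): drops all mappings. */
static void fake_clear_dma_mappings(struct fake_mman *mman)
{
	mman->dma_mappings = 0;
}

/*
 * Teardown step: returns true if mappings were cleared, false if the step
 * was skipped because the memory manager never finished initializing.
 */
static bool fake_fini_hw(struct fake_mman *mman)
{
	if (!mman->initialized)
		return false;

	fake_clear_dma_mappings(mman);
	return true;
}
```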

Re: [Bug Report] Desktop monitor sleep regression

2021-12-17 Thread Thorsten Leemhuis

added some CCs Geert added in his reply

On 07.12.21 08:20, Thorsten Leemhuis wrote:
> 
> [TLDR: adding this regression to regzbot; most of this mail is compiled
> from a few templates paragraphs some of you might have seen already.]
> 
> Hi, this is your Linux kernel regression tracker speaking.

/me again

What's up here? We are getting close to rc6, but there afaics wasn't any
reply of substance since the report ten days ago. Hence:

Could anybody please comment on this? Imre Deak, the commit Brandon
found in the bisection contains a patch of yours, do you maybe have an
idea what's up here?

Ciao, Thorsten

#regzbot poke

> Adding the regression mailing list to the list of recipients, as it
> should be in the loop for all regressions, as explained here:
> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
> 
> Also adding the authors and reviewers of the culprit and two appropriate
> mailing lists.
> 
> On 07.12.21 01:21, Brandon Nielsen wrote:
>> Monitors no longer sleep properly on my system (dual monitor connected
>> via DP->DVI, amdgpu, x86_64). The monitors slept properly on 5.14, but
>> stopped during the 5.15 series. I have also filed this bug on the kernel
>> bugzilla[0] and downstream[1].
>>
>> I have performed a bisect, first "bad" commit to master is
>> 55285e21f04517939480966164a33898c34b2af2[1], the same change made it
>> into the 5.15 branch as e3b39825ed0813f787cb3ebdc5ecaa5131623647.
> 
> TWIMC: That was for 5.15.3
> 
>> I have
>> verified the issue exists in latest master
>> (a51e3ac43ddbad891c2b1a4f3aa52371d6939570).
>>
>> Steps to reproduce:
>>
>>   1. Boot system (Fedora Workstation 35 in this case)
>>   2. Log in
>>   3. Lock screen (after a few seconds, monitors will enter power save
>> "sleep" state with backlight off)
>>   4. Wait (usually no more than 30 seconds, sometimes up to a few minutes)
>>   5. Observe monitor leaving "sleep" state (backlight comes back on),
>> but nothing is displayed
>>
>> [0] - https://bugzilla.kernel.org/show_bug.cgi?id=215203
>> [1] - https://bugzilla.redhat.com/show_bug.cgi?id=2028613
> 
> To be sure this issue doesn't fall through the cracks unnoticed, I'm
> adding it to regzbot, my Linux kernel regression tracking bot:
> 
> #regzbot ^introduced 55285e21f04517939480966164a33898c34b2af2
> #regzbot title fbdev/efifb: Monitors no longer sleep (amdgpu dual
> monitor setup)
> #regzbot ignore-activity
> 
> Reminder: when fixing the issue, please add a 'Link:' tag with the URL
> to the report (the parent of this mail), then regzbot will automatically
> mark the regression as resolved once the fix lands in the appropriate
> tree. For more details about regzbot see footer.
> 
> Sending this to everyone that got the initial report, to make all aware
> of the tracking. I also hope that messages like this motivate people to
> directly get at least the regression mailing list and ideally even
> regzbot involved when dealing with regressions, as messages like this
> wouldn't be needed then.
> 
> Don't worry, I'll send further messages wrt to this regression just to
> the lists (with a tag in the subject so people can filter them away), as
> long as they are intended just for regzbot. With a bit of luck no such
> messages will be needed anyway.
> 
> Ciao, Thorsten, your Linux kernel regression tracker.
> 
> P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
> on my table. I can only look briefly into most of them. Unfortunately
> therefore I sometimes will get things wrong or miss something important.
> I hope that's not the case here; if you think it is, don't hesitate to
> tell me about it in a public reply. That's in everyone's interest, as
> what I wrote above might be misleading to everyone reading this; any
> suggestion I gave thus might send someone reading this down the
> wrong rabbit hole, which none of us wants.
> 
> BTW, I have no personal interest in this issue, which is tracked using
> regzbot, my Linux kernel regression tracking bot
> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
> this mail to get things rolling again and hence don't need to be CC on
> all further activities wrt to this regression.
>

RE: [PATCH] drm/amdkfd: correct sdma queue number in kfd device init

2021-12-17 Thread Kim, Jonathan




> -Original Message-
> From: Sider, Graham 
> Sent: December 17, 2021 10:06 AM
> To: Chen, Guchun ; amd-
> g...@lists.freedesktop.org; Deucher, Alexander
> ; Kuehling, Felix
> ; Kim, Jonathan 
> Subject: RE: [PATCH] drm/amdkfd: correct sdma queue number in kfd
> device init
> 
> [Public]
> 
> > -Original Message-
> > From: Chen, Guchun 
> > Sent: Friday, December 17, 2021 9:31 AM
> > To: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> > ; Sider, Graham
> ;
> > Kuehling, Felix ; Kim, Jonathan
> > 
> > Cc: Chen, Guchun 
> > Subject: [PATCH] drm/amdkfd: correct sdma queue number in kfd device
> > init
> >
> > The sdma queue number is not correct on e.g. vega20; this patch ensures
> > the setting stays the same after the code refactor.
> > Additionally, improve the code to use a switch case listing the IP
> > versions to complete the kfd device_info structure filling.
> > This keeps consistency with the IP parse code in amdgpu_discovery.c.
> >
> > Fixes: a9e2c4dc6cc4("drm/amdkfd: add kfd_device_info_init function")
> > Signed-off-by: Guchun Chen 
> > ---
> >  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 74
> > ++---
> >  1 file changed, 65 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > index facc28f58c1f..e50bf992f298 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > @@ -59,11 +59,72 @@ static void kfd_gtt_sa_fini(struct kfd_dev *kfd);
> >
> >  static int kfd_resume(struct kfd_dev *kfd);
> >
> > +static void kfd_device_info_set_sdma_queue_num(struct kfd_dev *kfd)
> {
> > +   uint32_t sdma_version = kfd->adev-
> >ip_versions[SDMA0_HWIP][0];
> > +
> > +   switch (sdma_version) {
> > +   case IP_VERSION(4, 0, 0):/* VEGA10 */
> > +   case IP_VERSION(4, 0, 1):/* VEGA12 */
> > +   case IP_VERSION(4, 1, 0):/* RAVEN */
> > +   case IP_VERSION(4, 1, 1):/* RAVEN */
> > +   case IP_VERSION(4, 1, 2):/* RENIOR */
> > +   case IP_VERSION(5, 2, 1):/* VANGOGH */
> > +   case IP_VERSION(5, 2, 3):/* YELLOW_CARP */
> > +   kfd->device_info.num_sdma_queues_per_engine =
> > 2;
> > +   break;
> > +   case IP_VERSION(4, 2, 0):/* VEGA20 */
> 
> Thanks for spotting this Guchun. My previous patch should have used a "<"
> instead of a "<=" on IP_VERSION(4, 2, 0).
> 
> > +   case IP_VERSION(4, 2, 2):/* ARCTUTUS */
> > +   case IP_VERSION(4, 4, 0):/* ALDEBARAN */
> > +   case IP_VERSION(5, 0, 0):/* NAVI10 */
> > +   case IP_VERSION(5, 0, 1):/* CYAN_SKILLFISH */
> > +   case IP_VERSION(5, 0, 2):/* NAVI14 */
> > +   case IP_VERSION(5, 0, 5):/* NAVI12 */
> > +   case IP_VERSION(5, 2, 0):/* SIENNA_CICHLID */
> > +   case IP_VERSION(5, 2, 2):/* NAVY_FLOUDER */
> > +   case IP_VERSION(5, 2, 4):/* DIMGREY_CAVEFISH */
> > +   kfd->device_info.num_sdma_queues_per_engine =
> > 8;
> > +   break;
> > +   default:
> > +   dev_err(kfd_device,
> > +   "Failed to find sdma ip
> > blocks(SDMA_HWIP:0x%x) in %s\n",
> > +sdma_version, __func__);
> > +   }
> > +}
> > +
> > +static void kfd_device_info_set_event_interrupt_class(struct kfd_dev
> > +*kfd) {
> > +   uint32_t gc_version = KFD_GC_VERSION(kfd);
> > +
> > +   switch (gc_version) {
> > +   case IP_VERSION(9, 0, 1): /* VEGA10 */
> > +   case IP_VERSION(9, 2, 1): /* VEGA12 */
> > +   case IP_VERSION(9, 3, 0): /* RENOIR */
> > +   case IP_VERSION(9, 4, 0): /* VEGA20 */
> > +   case IP_VERSION(9, 4, 1): /* ARCTURUS */
> > +   case IP_VERSION(9, 4, 2): /* ALDEBARAN */
> > +   case IP_VERSION(10, 3, 1): /* VANGOGH */
> > +   case IP_VERSION(10, 3, 3): /* YELLOW_CARP */
> > +   case IP_VERSION(10, 1, 3): /* CYAN_SKILLFISH */
> > +   case IP_VERSION(10, 1, 10): /* NAVI10 */
> > +   case IP_VERSION(10, 1, 2): /* NAVI12 */
> > +   case IP_VERSION(10, 1, 1): /* NAVI14 */
> > +   case IP_VERSION(10, 3, 0): /* SIENNA_CICHLID */
> > +   case IP_VERSION(10, 3, 2): /* NAVY_FLOUNDER */
> > +   case IP_VERSION(10, 3, 4): /* DIMGREY_CAVEFISH */
> > +   case IP_VERSION(10, 3, 5): /* BEIGE_GOBY */
> > +   kfd->device_info.event_interrupt_class =
> > &event_interrupt_class_v9;
> > +   break;
> > +   default:
> > +   dev_err(kfd_device, "Failed to find gc ip
> > blocks(GC_HWIP:0x%x) in %s\n",
> > +   gc_version, __func__);
> > +   }
> > +}
> 
> I understand the appeal of moving to a switch for the SDMA queue num
> above since its setting isn't very linear w.r.t. the SDMA versioning. That
> said, I don't know if I understand moving to a switch for the event interrupt
> class here. To clarify, originally when I set all SOC15 to
> event_interrupt_class_v9 it was to follow what was in the device_info structs
> in drm-staging-next at that
[PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread sashank saye

For the Aldebaran chip passthrough case we need to inform the
SMU about special handling for SBR. On older chips we send
LightSBR to the SMU; enable the same for Aldebaran. The slight
difference compared to previous chips is that on Aldebaran the
SMU does a heavy reset on SBR. Hence the word Heavy instead of
Light SBR is used for the SMU to differentiate.

Signed-off-by: sashank saye 
Change-Id: I79420e7352bb670d6f9696df97d7546f131b18fc
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 ++--
 drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h   |  4 +++-
 drivers/gpu/drm/amd/pm/inc/smu_types.h |  3 ++-
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 11 +++
 4 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f31caec669e7..06aee23505b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2618,8 +2618,8 @@ static int amdgpu_device_ip_late_init(struct 
amdgpu_device *adev)
if (r)
DRM_ERROR("enable mgpu fan boost failed (%d).\n", r);
 
-   /* For XGMI + passthrough configuration on arcturus, enable light SBR */
-   if (adev->asic_type == CHIP_ARCTURUS &&
+   /* For XGMI + passthrough configuration on arcturus and aldebaran, 
enable light SBR */
+   if ((adev->asic_type == CHIP_ARCTURUS || adev->asic_type == 
CHIP_ALDEBARAN ) &&
amdgpu_passthrough(adev) &&
adev->gmc.xgmi.num_physical_nodes > 1)
smu_set_light_sbr(&adev->smu, true);
diff --git a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h 
b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
index 35fa0d8e92dd..ab66a4b9e438 100644
--- a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
+++ b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
@@ -102,7 +102,9 @@
 
 #define PPSMC_MSG_GfxDriverResetRecovery   0x42
 #define PPSMC_MSG_BoardPowerCalibration0x43
-#define PPSMC_Message_Count0x44
+#define PPSMC_MSG_HeavySBR  0x45
+#define PPSMC_Message_Count0x46
+
 
 //PPSMC Reset Types
 #define PPSMC_RESET_TYPE_WARM_RESET  0x00
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/inc/smu_types.h
index 18b862a90fbe..ff8a0bcbd290 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_types.h
@@ -229,7 +229,8 @@
__SMU_DUMMY_MAP(BoardPowerCalibration),   \
__SMU_DUMMY_MAP(RequestGfxclk),   \
__SMU_DUMMY_MAP(ForceGfxVid), \
-   __SMU_DUMMY_MAP(UnforceGfxVid),
+   __SMU_DUMMY_MAP(UnforceGfxVid),   \
+   __SMU_DUMMY_MAP(HeavySBR),
 
 #undef __SMU_DUMMY_MAP
 #define __SMU_DUMMY_MAP(type)  SMU_MSG_##type
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index 7433a051e795..f442950e9676 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -141,6 +141,7 @@ static const struct cmn2asic_msg_mapping 
aldebaran_message_map[SMU_MSG_MAX_COUNT
MSG_MAP(SetUclkDpmMode,  PPSMC_MSG_SetUclkDpmMode,  
0),
MSG_MAP(GfxDriverResetRecovery,  
PPSMC_MSG_GfxDriverResetRecovery,  0),
MSG_MAP(BoardPowerCalibration,   
PPSMC_MSG_BoardPowerCalibration,   0),
+   MSG_MAP(HeavySBR,PPSMC_MSG_HeavySBR,
0),
 };
 
 static const struct cmn2asic_mapping aldebaran_clk_map[SMU_CLK_COUNT] = {
@@ -1912,6 +1913,15 @@ static int aldebaran_mode2_reset(struct smu_context *smu)
return ret;
 }
 
+static int aldebaran_set_light_sbr(struct smu_context *smu, bool enable)
+{
+   int ret = 0;
+   //For alderbarn chip, SMU would do a mode 1 reset as part of SBR hence 
we call it HeavySBR instead of light
+   ret =  smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_HeavySBR, enable ? 
1 : 0, NULL);
+
+   return ret;
+}
+
 static bool aldebaran_is_mode1_reset_supported(struct smu_context *smu)
 {
 #if 0
@@ -2021,6 +2031,7 @@ static const struct pptable_funcs aldebaran_ppt_funcs = {
.get_gpu_metrics = aldebaran_get_gpu_metrics,
.mode1_reset_is_support = aldebaran_is_mode1_reset_supported,
.mode2_reset_is_support = aldebaran_is_mode2_reset_supported,
+   .set_light_sbr = aldebaran_set_light_sbr,
.mode1_reset = aldebaran_mode1_reset,
.set_mp1_state = aldebaran_set_mp1_state,
.mode2_reset = aldebaran_mode2_reset,
-- 
2.25.1
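For readers unfamiliar with the message-map mechanism this patch extends, here is a simplified user-space sketch. It assumes a hypothetical table type (`fake_msg_mapping`) and lookup helper, which are much simpler than the driver's `cmn2asic_msg_mapping`. The only values taken from the patch are the firmware ID 0x45 for HeavySBR and the `enable ? 1 : 0` parameter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Driver-side message indices (hypothetical subset). */
enum fake_smu_msg { FAKE_MSG_LIGHT_SBR, FAKE_MSG_HEAVY_SBR, FAKE_MSG_COUNT };

/* One map entry: whether the ASIC supports the message, and its firmware ID. */
struct fake_msg_mapping {
	bool valid;
	uint16_t fw_id;
};

/* Aldebaran-like table: only HeavySBR is mapped (0x45, as in the patch). */
static const struct fake_msg_mapping fake_aldebaran_map[FAKE_MSG_COUNT] = {
	[FAKE_MSG_HEAVY_SBR] = { true, 0x45 },
};

/* Translate a driver index to a firmware ID; -1 if unmapped. */
static int fake_msg_to_fw_id(const struct fake_msg_mapping *map,
			     unsigned int msg)
{
	if (msg >= FAKE_MSG_COUNT || !map[msg].valid)
		return -1;
	return map[msg].fw_id;
}

/* The bool-to-parameter conversion used by the new callback. */
static uint32_t fake_sbr_param(bool enable)
{
	return enable ? 1 : 0;
}
```

In this sketch an ASIC that lacks a message simply has no valid entry, so the generic `set_light_sbr` hook can stay the same while each ASIC table decides which firmware message it maps to.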

[PATCH 4/4] drm/amdgpu: Access the FRU on Aldebaran

2021-12-17 Thread Kent Russell

This is supported, although the offset is different from VG20, so fix
that with a variable and enable getting the product name and serial
number from the FRU. Do this for all SKUs since all SKUs have the FRU

Signed-off-by: Kent Russell 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
index 5ed24701f9cf..80f43e69e659 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
@@ -56,6 +56,9 @@ static bool is_fru_eeprom_supported(struct amdgpu_device 
*adev)
return true;
else
return false;
+   case CHIP_ALDEBARAN:
+   /* All Aldebaran SKUs have the FRU */
+   return true;
default:
return false;
}
@@ -91,6 +94,10 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
unsigned char buff[PRODUCT_NAME_LEN+2];
u32 addrptr;
int size, len;
+   int offset = 2;
+
+   if (adev->asic_type == CHIP_ALDEBARAN)
+   offset = 0;
 
if (!is_fru_eeprom_supported(adev))
return 0;
@@ -137,7 +144,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
len = PRODUCT_NAME_LEN - 1;
}
/* Start at 2 due to buff using fields 0 and 1 for the address */
-   memcpy(adev->product_name, &buff[2], len);
+   memcpy(adev->product_name, &buff[offset], len);
adev->product_name[len] = '\0';
 
addrptr += size + 1;
@@ -155,7 +162,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
DRM_WARN("FRU Product Number is larger than 16 characters. This 
is likely a mistake");
len = sizeof(adev->product_number) - 1;
}
-   memcpy(adev->product_number, &buff[2], len);
+   memcpy(adev->product_number, &buff[offset], len);
adev->product_number[len] = '\0';
 
addrptr += size + 1;
@@ -182,7 +189,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
DRM_WARN("FRU Serial Number is larger than 16 characters. This 
is likely a mistake");
len = sizeof(adev->serial) - 1;
}
-   memcpy(adev->serial, &buff[2], len);
+   memcpy(adev->serial, &buff[offset], len);
adev->serial[len] = '\0';
 
return 0;
-- 
2.25.1
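A review comment on this patch preferred choosing the offset only after the FRU support check has passed. The user-space sketch below shows that ordering. It assumes hypothetical names (`fake_asic`, `fake_fru_supported`, `fake_fru_copy_field`) and a hypothetical -1 return for the unsupported case; the real function returns 0 there and reads the buffer over I2C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum fake_asic { FAKE_VEGA20, FAKE_ALDEBARAN, FAKE_OTHER };

/* Stand-in for is_fru_eeprom_supported(); the real check also looks at SKUs. */
static bool fake_fru_supported(enum fake_asic asic)
{
	return asic == FAKE_VEGA20 || asic == FAKE_ALDEBARAN;
}

/*
 * Copy a field out of a raw FRU read buffer into dst (dst_size bytes).
 * Returns the number of characters copied, or -1 when the ASIC has no FRU
 * or dst has no room. The data offset is only chosen after the support
 * check has passed.
 */
static int fake_fru_copy_field(enum fake_asic asic, const unsigned char *buff,
			       size_t len, char *dst, size_t dst_size)
{
	size_t offset;

	if (!fake_fru_supported(asic) || dst_size == 0)
		return -1;

	/* Per the patch: data starts at 0 on Aldebaran, at 2 elsewhere. */
	offset = (asic == FAKE_ALDEBARAN) ? 0 : 2;

	/* Leave room for the terminating NUL. */
	if (len >= dst_size)
		len = dst_size - 1;

	memcpy(dst, &buff[offset], len);
	dst[len] = '\0';
	return (int)len;
}
```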

[PATCH 3/4] drm/amdgpu: Only overwrite serial if field is empty

2021-12-17 Thread Kent Russell

On Aldebaran, the serial may be obtained from the FRU. Only overwrite
the serial with the unique_id if the serial is empty. This will support
printing serial numbers for mGPU devices where there are 2 unique_ids
for the 2 GPUs, but only one serial number for the board

Signed-off-by: Kent Russell 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index 7433a051e795..5b9868011f4c 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -1605,7 +1605,8 @@ static void aldebaran_get_unique_id(struct smu_context 
*smu)
mutex_unlock(&smu->metrics_lock);
 
adev->unique_id = ((uint64_t)upper32 << 32) | lower32;
-   sprintf(adev->serial, "%016llx", adev->unique_id);
+   if (adev->serial[0] == '\0')
+   sprintf(adev->serial, "%016llx", adev->unique_id);
 }
 
 static bool aldebaran_is_baco_supported(struct smu_context *smu)
-- 
2.25.1
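The "only fill in when empty" rule from this patch can be sketched in user space as follows. `fake_set_serial_if_empty` and `FAKE_SERIAL_LEN` are hypothetical names; the buffer size of 20 is taken from the `serial[20]` field shown in patch 1/4. The sketch uses `snprintf` rather than `sprintf` so the write is bounded, which is a choice made here and not part of the patch. It also assumes the FRU serial, if any, was stored before this function runs.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define FAKE_SERIAL_LEN 20   /* matches char serial[20] in the 1/4 diff */

/*
 * Fill serial from unique_id only when no serial has been stored yet.
 * Returns 1 if the serial was written, 0 if an existing one was kept.
 */
static int fake_set_serial_if_empty(char serial[FAKE_SERIAL_LEN],
				    uint64_t unique_id)
{
	if (serial[0] != '\0')
		return 0;

	/* 16 hex digits plus the NUL fit in the 20-byte buffer. */
	snprintf(serial, FAKE_SERIAL_LEN, "%016" PRIx64, unique_id);
	return 1;
}
```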

[PATCH 2/4] drm/amdgpu: Enable unique_id for Aldebaran

2021-12-17 Thread Kent Russell

It's supported, so support the unique_id sysfs file

Signed-off-by: Kent Russell 
---
 drivers/gpu/drm/amd/pm/amdgpu_pm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index 082539c70fd4..dfefb147ac2c 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -2090,7 +2090,8 @@ static int default_attr_update(struct amdgpu_device 
*adev, struct amdgpu_device_
} else if (DEVICE_ATTR_IS(unique_id)) {
if (asic_type != CHIP_VEGA10 &&
asic_type != CHIP_VEGA20 &&
-   asic_type != CHIP_ARCTURUS)
+   asic_type != CHIP_ARCTURUS &&
+   asic_type != CHIP_ALDEBARAN)
*states = ATTR_STATE_UNSUPPORTED;
} else if (DEVICE_ATTR_IS(pp_features)) {
if (adev->flags & AMD_IS_APU || asic_type < CHIP_VEGA10)
-- 
2.25.1

[PATCH 1/4] drm/amdgpu: Increase potential product_name to 64 characters

2021-12-17 Thread Kent Russell

Having seen at least 1 42-character product_name, bump the number up to
64, and put that definition into amdgpu.h to make future adjustments
simpler.

Signed-off-by: Kent Russell 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c | 12 +---
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index e701dedce344..4e6737e4c36c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -813,6 +813,7 @@ struct amd_powerplay {
 
 #define AMDGPU_RESET_MAGIC_NUM 64
 #define AMDGPU_MAX_DF_PERFMONS 4
+#define PRODUCT_NAME_LEN 64
 struct amdgpu_device {
struct device   *dev;
struct pci_dev  *pdev;
@@ -1083,7 +1084,7 @@ struct amdgpu_device {
 
/* Chip product information */
charproduct_number[16];
-   charproduct_name[32];
+   charproduct_name[PRODUCT_NAME_LEN];
charserial[20];
 
atomic_tthrottling_logging_enabled;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
index 7709caeb233d..5ed24701f9cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
@@ -88,7 +88,7 @@ static int amdgpu_fru_read_eeprom(struct amdgpu_device *adev, 
uint32_t addrptr,
 
 int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
 {
-   unsigned char buff[34];
+   unsigned char buff[PRODUCT_NAME_LEN+2];
u32 addrptr;
int size, len;
 
@@ -131,12 +131,10 @@ int amdgpu_fru_get_product_info(struct amdgpu_device 
*adev)
}
 
len = size;
-   /* Product name should only be 32 characters. Any more,
-* and something could be wrong. Cap it at 32 to be safe
-*/
-   if (len >= sizeof(adev->product_name)) {
-   DRM_WARN("FRU Product Number is larger than 32 characters. This 
is likely a mistake");
-   len = sizeof(adev->product_name) - 1;
+   if (len >= PRODUCT_NAME_LEN) {
+   DRM_WARN("FRU Product Name is larger than %d characters. This 
is likely a mistake",
+   PRODUCT_NAME_LEN);
+   len = PRODUCT_NAME_LEN - 1;
}
/* Start at 2 due to buff using fields 0 and 1 for the address */
memcpy(adev->product_name, &buff[2], len);
-- 
2.25.1
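The length cap in this patch can be read as a small pure function. The sketch below assumes a hypothetical helper `fake_clamp_name_len` and a local copy of the constant. It reports oversized names on stderr where the driver uses DRM_WARN, and keeps one byte free for the terminating NUL.

```c
#include <stddef.h>
#include <stdio.h>

#define FAKE_PRODUCT_NAME_LEN 64   /* value introduced by the patch */

/*
 * Clamp a length read from the FRU so that it, plus a terminating NUL,
 * fits in a FAKE_PRODUCT_NAME_LEN-byte destination.
 */
static size_t fake_clamp_name_len(size_t fru_len)
{
	if (fru_len >= FAKE_PRODUCT_NAME_LEN) {
		fprintf(stderr, "FRU product name does not fit in %d bytes\n",
			FAKE_PRODUCT_NAME_LEN);
		return FAKE_PRODUCT_NAME_LEN - 1;
	}
	return fru_len;
}
```

With this cap, the 42-character name mentioned in the commit message passes through unchanged, while a name of 64 characters or more is cut to 63.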

RE: [PATCH] drm/amdkfd: correct sdma queue number in kfd device init

2021-12-17 Thread Kim, Jonathan

[AMD Official Use Only]

Are safeguards required for KFD interrupt initialization to fail gracefully in 
the event of a non-assignment?
Same would apply when KGD forwards interrupts to the KFD (although the KFD 
device reference might not exist at this point if the above comment is handled 
well so a check may not apply in this case).

Also should the dev_errs mention what it's failing to do rather than just 
reporting that it could not find the HW IP block?
In the case of non-assignment of sdma queues per engine, it still seems like 
the KFD could move forward but the user wouldn't know what the context of the 
dev_err was.

Thanks,

Jon
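One way to make the non-assignment case visible to callers, as asked above, is to have the lookup return an error code instead of only logging. The sketch below is a user-space illustration with hypothetical names (`FAKE_IP_VERSION`, `fake_sdma_queues_per_engine`). The version packing is assumed to match the driver's IP_VERSION macro, and only a few versions from the patch are listed.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed to mirror the driver's packing of major/minor/revision. */
#define FAKE_IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/*
 * Look up SDMA queues per engine for a few versions from the patch.
 * Returns 0 and stores the count on success, -EINVAL for an unknown
 * version so the caller can stop initialization instead of continuing
 * with an unset field.
 */
static int fake_sdma_queues_per_engine(uint32_t sdma_version, unsigned int *out)
{
	switch (sdma_version) {
	case FAKE_IP_VERSION(4, 0, 0):	/* VEGA10 */
	case FAKE_IP_VERSION(4, 1, 0):	/* RAVEN */
		*out = 2;
		return 0;
	case FAKE_IP_VERSION(4, 2, 0):	/* VEGA20 */
	case FAKE_IP_VERSION(4, 4, 0):	/* ALDEBARAN */
		*out = 8;
		return 0;
	default:
		return -EINVAL;
	}
}
```

A caller could then print a message that says what failed (for example, that the SDMA queue count could not be set) and decide whether to abort or continue with a default.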

> -Original Message-
> From: Chen, Guchun 
> Sent: December 17, 2021 9:31 AM
> To: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> ; Sider, Graham
> ; Kuehling, Felix ;
> Kim, Jonathan 
> Cc: Chen, Guchun 
> Subject: [PATCH] drm/amdkfd: correct sdma queue number in kfd device
> init
>
> The sdma queue number is not correct on e.g. vega20; this patch ensures the
> setting stays the same after the code refactor.
> Additionally, improve the code to use a switch case listing the IP versions
> to complete the kfd device_info structure filling.
> This keeps consistency with the IP parse code in amdgpu_discovery.c.
>
> Fixes: a9e2c4dc6cc4("drm/amdkfd: add kfd_device_info_init function")
> Signed-off-by: Guchun Chen 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 74
> ++---
>  1 file changed, 65 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index facc28f58c1f..e50bf992f298 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -59,11 +59,72 @@ static void kfd_gtt_sa_fini(struct kfd_dev *kfd);
>
>  static int kfd_resume(struct kfd_dev *kfd);
>
> +static void kfd_device_info_set_sdma_queue_num(struct kfd_dev *kfd)
> +{
> + uint32_t sdma_version = kfd->adev->ip_versions[SDMA0_HWIP][0];
> +
> + switch (sdma_version) {
> + case IP_VERSION(4, 0, 0):/* VEGA10 */
> + case IP_VERSION(4, 0, 1):/* VEGA12 */
> + case IP_VERSION(4, 1, 0):/* RAVEN */
> + case IP_VERSION(4, 1, 1):/* RAVEN */
> + case IP_VERSION(4, 1, 2):/* RENOIR */
> + case IP_VERSION(5, 2, 1):/* VANGOGH */
> + case IP_VERSION(5, 2, 3):/* YELLOW_CARP */
> + kfd->device_info.num_sdma_queues_per_engine = 2;
> + break;
> + case IP_VERSION(4, 2, 0):/* VEGA20 */
> + case IP_VERSION(4, 2, 2):/* ARCTURUS */
> + case IP_VERSION(4, 4, 0):/* ALDEBARAN */
> + case IP_VERSION(5, 0, 0):/* NAVI10 */
> + case IP_VERSION(5, 0, 1):/* CYAN_SKILLFISH */
> + case IP_VERSION(5, 0, 2):/* NAVI14 */
> + case IP_VERSION(5, 0, 5):/* NAVI12 */
> + case IP_VERSION(5, 2, 0):/* SIENNA_CICHLID */
> + case IP_VERSION(5, 2, 2):/* NAVY_FLOUNDER */
> + case IP_VERSION(5, 2, 4):/* DIMGREY_CAVEFISH */
> + kfd->device_info.num_sdma_queues_per_engine = 8;
> + break;
> + default:
> + dev_err(kfd_device,
> + "Failed to find sdma ip blocks(SDMA_HWIP:0x%x) in %s\n",
> + sdma_version, __func__);
> + }
> +}
> +
> +static void kfd_device_info_set_event_interrupt_class(struct kfd_dev *kfd)
> +{
> + uint32_t gc_version = KFD_GC_VERSION(kfd);
> +
> + switch (gc_version) {
> + case IP_VERSION(9, 0, 1): /* VEGA10 */
> + case IP_VERSION(9, 2, 1): /* VEGA12 */
> + case IP_VERSION(9, 3, 0): /* RENOIR */
> + case IP_VERSION(9, 4, 0): /* VEGA20 */
> + case IP_VERSION(9, 4, 1): /* ARCTURUS */
> + case IP_VERSION(9, 4, 2): /* ALDEBARAN */
> + case IP_VERSION(10, 3, 1): /* VANGOGH */
> + case IP_VERSION(10, 3, 3): /* YELLOW_CARP */
> + case IP_VERSION(10, 1, 3): /* CYAN_SKILLFISH */
> + case IP_VERSION(10, 1, 10): /* NAVI10 */
> + case IP_VERSION(10, 1, 2): /* NAVI12 */
> + case IP_VERSION(10, 1, 1): /* NAVI14 */
> + case IP_VERSION(10, 3, 0): /* SIENNA_CICHLID */
> + case IP_VERSION(10, 3, 2): /* NAVY_FLOUNDER */
> + case IP_VERSION(10, 3, 4): /* DIMGREY_CAVEFISH */
> + case IP_VERSION(10, 3, 5): /* BEIGE_GOBY */
> + kfd->device_info.event_interrupt_class = &event_interrupt_class_v9;
> + break;
> + default:
> + dev_err(kfd_device, "Failed to find gc ip blocks(GC_HWIP:0x%x) in %s\n",
> + gc_version, __func__);
> + }
> +}
> +
>  static void kfd_device_info_init(struct kfd_dev *kfd,
>bool vf, uint32_t gfx_target_version)  {
>   uint32_t gc_version = KFD_GC_VERSION(kfd);
> - uint32_t sdma_version = kfd->adev->ip_versions[SDMA0_HWIP][0];
>   uint32_t asic_type =

RE: [PATCH] drm/amdkfd: correct sdma queue number in kfd device init

2021-12-17 Thread Sider, Graham

[Public]

> -----Original Message-----
> From: Chen, Guchun 
> Sent: Friday, December 17, 2021 9:31 AM
> To: amd-gfx@lists.freedesktop.org; Deucher, Alexander; Sider, Graham;
> Kuehling, Felix; Kim, Jonathan
> Cc: Chen, Guchun 
> Subject: [PATCH] drm/amdkfd: correct sdma queue number in kfd device init
> 
> The SDMA queue number is not correct on some ASICs, such as Vega20; this
> patch ensures the setting stays the same as it was before the code refactor.
> Additionally, use a switch statement over the IP versions to fill in the
> kfd device_info structure.
> This keeps consistency with the IP parsing code in amdgpu_discovery.c.
> 
> Fixes: a9e2c4dc6cc4 ("drm/amdkfd: add kfd_device_info_init function")
> Signed-off-by: Guchun Chen 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 74 ++---
>  1 file changed, 65 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index facc28f58c1f..e50bf992f298 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -59,11 +59,72 @@ static void kfd_gtt_sa_fini(struct kfd_dev *kfd);
> 
>  static int kfd_resume(struct kfd_dev *kfd);
> 
> +static void kfd_device_info_set_sdma_queue_num(struct kfd_dev *kfd)
> +{
> + uint32_t sdma_version = kfd->adev->ip_versions[SDMA0_HWIP][0];
> +
> + switch (sdma_version) {
> + case IP_VERSION(4, 0, 0):/* VEGA10 */
> + case IP_VERSION(4, 0, 1):/* VEGA12 */
> + case IP_VERSION(4, 1, 0):/* RAVEN */
> + case IP_VERSION(4, 1, 1):/* RAVEN */
> + case IP_VERSION(4, 1, 2):/* RENOIR */
> + case IP_VERSION(5, 2, 1):/* VANGOGH */
> + case IP_VERSION(5, 2, 3):/* YELLOW_CARP */
> + kfd->device_info.num_sdma_queues_per_engine = 2;
> + break;
> + case IP_VERSION(4, 2, 0):/* VEGA20 */

Thanks for spotting this Guchun. My previous patch should have used a "<" 
instead of a "<=" on IP_VERSION(4, 2, 0).
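
For illustration, a small standalone sketch of the off-by-one described above. It assumes IP_VERSION packs major/minor/revision the way amdgpu's macro does; the two function names are hypothetical and exist only to compare the inclusive and exclusive upper bounds.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror amdgpu's packing of major/minor/revision. */
#define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/* The packing keeps versions ordered, which range checks rely on. */
static_assert(IP_VERSION(4, 1, 2) < IP_VERSION(4, 2, 0),
	      "RENOIR must sort below VEGA20");

/* Range check with the inclusive upper bound ("<="), as removed by the patch. */
static int sdma_queues_inclusive(uint32_t v)
{
	if ((v >= IP_VERSION(4, 0, 0) && v <= IP_VERSION(4, 2, 0)) ||
	    v == IP_VERSION(5, 2, 1) || v == IP_VERSION(5, 2, 3))
		return 2;
	return 8;
}

/* Same check with an exclusive upper bound ("<"), so VEGA20 falls through to 8. */
static int sdma_queues_exclusive(uint32_t v)
{
	if ((v >= IP_VERSION(4, 0, 0) && v < IP_VERSION(4, 2, 0)) ||
	    v == IP_VERSION(5, 2, 1) || v == IP_VERSION(5, 2, 3))
		return 2;
	return 8;
}
```

With the inclusive bound, IP_VERSION(4, 2, 0) (VEGA20) lands in the 2-queue group; with the exclusive bound it gets 8, matching the switch in the patch.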

> + case IP_VERSION(4, 2, 2):/* ARCTURUS */
> + case IP_VERSION(4, 4, 0):/* ALDEBARAN */
> + case IP_VERSION(5, 0, 0):/* NAVI10 */
> + case IP_VERSION(5, 0, 1):/* CYAN_SKILLFISH */
> + case IP_VERSION(5, 0, 2):/* NAVI14 */
> + case IP_VERSION(5, 0, 5):/* NAVI12 */
> + case IP_VERSION(5, 2, 0):/* SIENNA_CICHLID */
> + case IP_VERSION(5, 2, 2):/* NAVY_FLOUNDER */
> + case IP_VERSION(5, 2, 4):/* DIMGREY_CAVEFISH */
> + kfd->device_info.num_sdma_queues_per_engine = 8;
> + break;
> + default:
> + dev_err(kfd_device,
> + "Failed to find sdma ip blocks(SDMA_HWIP:0x%x) in %s\n",
> + sdma_version, __func__);
> + }
> +}
> +
> +static void kfd_device_info_set_event_interrupt_class(struct kfd_dev *kfd)
> +{
> + uint32_t gc_version = KFD_GC_VERSION(kfd);
> +
> + switch (gc_version) {
> + case IP_VERSION(9, 0, 1): /* VEGA10 */
> + case IP_VERSION(9, 2, 1): /* VEGA12 */
> + case IP_VERSION(9, 3, 0): /* RENOIR */
> + case IP_VERSION(9, 4, 0): /* VEGA20 */
> + case IP_VERSION(9, 4, 1): /* ARCTURUS */
> + case IP_VERSION(9, 4, 2): /* ALDEBARAN */
> + case IP_VERSION(10, 3, 1): /* VANGOGH */
> + case IP_VERSION(10, 3, 3): /* YELLOW_CARP */
> + case IP_VERSION(10, 1, 3): /* CYAN_SKILLFISH */
> + case IP_VERSION(10, 1, 10): /* NAVI10 */
> + case IP_VERSION(10, 1, 2): /* NAVI12 */
> + case IP_VERSION(10, 1, 1): /* NAVI14 */
> + case IP_VERSION(10, 3, 0): /* SIENNA_CICHLID */
> + case IP_VERSION(10, 3, 2): /* NAVY_FLOUNDER */
> + case IP_VERSION(10, 3, 4): /* DIMGREY_CAVEFISH */
> + case IP_VERSION(10, 3, 5): /* BEIGE_GOBY */
> + kfd->device_info.event_interrupt_class = &event_interrupt_class_v9;
> + break;
> + default:
> + dev_err(kfd_device, "Failed to find gc ip blocks(GC_HWIP:0x%x) in %s\n",
> + gc_version, __func__);
> + }
> +}

I understand the appeal of moving to a switch for the SDMA queue num above, 
since its setting isn't very linear w.r.t. the SDMA versioning. That said, I'm 
not sure I understand moving to a switch for the event interrupt class here. 
To clarify, when I originally set all SOC15 to event_interrupt_class_v9, it was 
to follow what was in the device_info structs in drm-staging-next at that time. 
If the plan is for a following patch to set gfx10 to 
event_interrupt_class_v10, what's the reasoning not to format it along the 
lines of:

if (gc_version >= IP_VERSION(10, 1, 1))
	kfd->device_info.event_interrupt_class = &event_interrupt_class_v10;
else
	kfd->device_info.event_interrupt_class = &event_interrupt_class_v9;

(Assuming this is done in the SOC15
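
As a rough check of the range-based form suggested above, the following standalone sketch picks a class from the GC version with a single comparison. The enum and function names are hypothetical stand-ins (the thread only discusses a v10 class as a possible follow-up), and IP_VERSION is assumed to pack major/minor/revision as amdgpu's macro does.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror amdgpu's packing of major/minor/revision. */
#define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/* Every gfx9 version in the patch sorts below the lowest gfx10 one. */
static_assert(IP_VERSION(9, 4, 2) < IP_VERSION(10, 1, 1),
	      "gfx9 must sort below gfx10");

/* Hypothetical stand-ins for the two event interrupt classes. */
enum event_class_sketch {
	EVENT_CLASS_V9,
	EVENT_CLASS_V10,
};

/* One comparison splits gfx9 from gfx10, as in the snippet above. */
static enum event_class_sketch pick_event_class(uint32_t gc_version)
{
	if (gc_version >= IP_VERSION(10, 1, 1))
		return EVENT_CLASS_V10;
	return EVENT_CLASS_V9;
}
```

Unlike the switch, this form has no default branch, so an unknown GC version would silently get one of the two classes rather than an error message.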

Re: [PATCH v3] drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure

2021-12-17 Thread Andrey Grodzovsky


Reviewed-by: Andrey Grodzovsky 

Andrey

On 2021-12-17 3:49 a.m., Christian König wrote:

Am 17.12.21 um 03:26 schrieb Leslie Shi:

[Why]
In amdgpu_driver_load_kms, when amdgpu_device_init returns an error during
driver modprobe, the error handling path starts immediately and also calls
amdgpu_device_unmap_mmio to release mapped VRAM. However, the release callback
that follows still accesses the unmapped memory, such as
vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini, so a kernel crash occurs.

[How]
Call amdgpu_device_unmap_mmio() only if the device is unplugged, to prevent an
invalid memory access in vcn_v3_0_sw_fini() when GPU initialization fails.

Signed-off-by: Leslie Shi 


Looks sane to me, but Andrey should probably nod as well.

Acked-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f31caec669e7..f6a47b927cfd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3899,7 +3899,9 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 
 	amdgpu_gart_dummy_page_fini(adev);
 
-	amdgpu_device_unmap_mmio(adev);
+	if (drm_dev_is_unplugged(adev_to_drm(adev)))
+		amdgpu_device_unmap_mmio(adev);
+
 }
 
 void amdgpu_device_fini_sw(struct amdgpu_device *adev)

Re: [Bug Report] Desktop monitor sleep regression

2021-12-17 Thread Imre Deak

On Fri, Dec 17, 2021 at 03:46:21PM +0100, Thorsten Leemhuis wrote:
> added some CCs Geert added in his reply
> 
> On 07.12.21 08:20, Thorsten Leemhuis wrote:
> > 
> > [TLDR: adding this regression to regzbot; most of this mail is compiled
> > from a few templates paragraphs some of you might have seen already.]
> > 
> > Hi, this is your Linux kernel regression tracker speaking.
> 
> /me again
> 
> What's up here? We are getting close to rc6, but there afaics wasn't any
> reply of substance since the report ten days ago. Hence:
>
> Could anybody please comment on this? Imre Deak, the commit Brandon
> found in the bisection contains a patch of yours, do you maybe have an
> idea what's up here?

Yes,
https://bugzilla.kernel.org/show_bug.cgi?id=215203

based on which the problem is somewhere in the AMD driver.

> Ciao, Thorsten
> 
> #regzbot poke
> 
> > Adding the regression mailing list to the list of recipients, as it
> > should be in the loop for all regressions, as explained here:
> > https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
> > 
> > Also adding the authors and reviewers of the culprit and two appropriate
> > mailing lists.
> > 
> > On 07.12.21 01:21, Brandon Nielsen wrote:
> >> Monitors no longer sleep properly on my system (dual monitor connected
> >> via DP->DVI, amdgpu, x86_64). The monitors slept properly on 5.14, but
> >> stopped during the 5.15 series. I have also filed this bug on the kernel
> >> bugzilla[0] and downstream[1].
> >>
> >> I have performed a bisect, first "bad" commit to master is
> >> 55285e21f04517939480966164a33898c34b2af2[1], the same change made it
> >> into the 5.15 branch as e3b39825ed0813f787cb3ebdc5ecaa5131623647.
> > 
> > TWIMC: That was for 5.15.3
> > 
> >> I have
> >> verified the issue exists in latest master
> >> (a51e3ac43ddbad891c2b1a4f3aa52371d6939570).
> >>
> >> Steps to reproduce:
> >>
> >>   1. Boot system (Fedora Workstation 35 in this case)
> >>   2. Log in
> >>   3. Lock screen (after a few seconds, monitors will enter power save
> >> "sleep" state with backlight off)
> >>   4. Wait (usually no more than 30 seconds, sometimes up to a few minutes)
> >>   5. Observe monitor leaving "sleep" state (backlight comes back on),
> >> but nothing is displayed
> >>
> >> [0] - https://bugzilla.kernel.org/show_bug.cgi?id=215203
> >> [1] - https://bugzilla.redhat.com/show_bug.cgi?id=2028613
> > 
> > To be sure this issue doesn't fall through the cracks unnoticed, I'm
> > adding it to regzbot, my Linux kernel regression tracking bot:
> > 
> > #regzbot ^introduced 55285e21f04517939480966164a33898c34b2af2
> > #regzbot title fbdev/efifb: Monitors no longer sleep (amdgpu dual
> > monitor setup)
> > #regzbot ignore-activity
> > 
> > Reminder: when fixing the issue, please add a 'Link:' tag with the URL
> > to the report (the parent of this mail), then regzbot will automatically
> > mark the regression as resolved once the fix lands in the appropriate
> > tree. For more details about regzbot see footer.
> > 
> > Sending this to everyone that got the initial report, to make all aware
> > of the tracking. I also hope that messages like this motivate people to
> > directly get at least the regression mailing list and ideally even
> > regzbot involved when dealing with regressions, as messages like this
> > wouldn't be needed then.
> > 
> > Don't worry, I'll send further messages wrt to this regression just to
> > the lists (with a tag in the subject so people can filter them away), as
> > long as they are intended just for regzbot. With a bit of luck no such
> > messages will be needed anyway.
> > 
> > Ciao, Thorsten, your Linux kernel regression tracker.
> > 
> > P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
> > on my table. I can only look briefly into most of them. Unfortunately
> > therefore I sometimes will get things wrong or miss something important.
> > I hope that's not the case here; if you think it is, don't hesitate to
> > tell me about it in a public reply. That's in everyone's interest, as
> > what I wrote above might be misleading to everyone reading this; any
> > suggestion I gave might thus send someone reading this down the
> > wrong rabbit hole, which none of us wants.
> > 
> > BTW, I have no personal interest in this issue, which is tracked using
> > regzbot, my Linux kernel regression tracking bot
> > (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
> > this mail to get things rolling again and hence don't need to be CC on
> > all further activities wrt to this regression.
> >

[PATCH] drm/amdkfd: correct sdma queue number in kfd device init

2021-12-17 Thread Guchun Chen

The SDMA queue number is not correct on some ASICs, such as Vega20; this
patch ensures the setting stays the same as it was before the code refactor.
Additionally, use a switch statement over the IP versions to fill in the
kfd device_info structure.
This keeps consistency with the IP parsing code in amdgpu_discovery.c.

Fixes: a9e2c4dc6cc4 ("drm/amdkfd: add kfd_device_info_init function")
Signed-off-by: Guchun Chen 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 74 ++---
 1 file changed, 65 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index facc28f58c1f..e50bf992f298 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -59,11 +59,72 @@ static void kfd_gtt_sa_fini(struct kfd_dev *kfd);
 
 static int kfd_resume(struct kfd_dev *kfd);
 
+static void kfd_device_info_set_sdma_queue_num(struct kfd_dev *kfd)
+{
+   uint32_t sdma_version = kfd->adev->ip_versions[SDMA0_HWIP][0];
+
+   switch (sdma_version) {
+   case IP_VERSION(4, 0, 0):/* VEGA10 */
+   case IP_VERSION(4, 0, 1):/* VEGA12 */
+   case IP_VERSION(4, 1, 0):/* RAVEN */
+   case IP_VERSION(4, 1, 1):/* RAVEN */
+   case IP_VERSION(4, 1, 2):/* RENOIR */
+   case IP_VERSION(5, 2, 1):/* VANGOGH */
+   case IP_VERSION(5, 2, 3):/* YELLOW_CARP */
+   kfd->device_info.num_sdma_queues_per_engine = 2;
+   break;
+   case IP_VERSION(4, 2, 0):/* VEGA20 */
+   case IP_VERSION(4, 2, 2):/* ARCTURUS */
+   case IP_VERSION(4, 4, 0):/* ALDEBARAN */
+   case IP_VERSION(5, 0, 0):/* NAVI10 */
+   case IP_VERSION(5, 0, 1):/* CYAN_SKILLFISH */
+   case IP_VERSION(5, 0, 2):/* NAVI14 */
+   case IP_VERSION(5, 0, 5):/* NAVI12 */
+   case IP_VERSION(5, 2, 0):/* SIENNA_CICHLID */
+   case IP_VERSION(5, 2, 2):/* NAVY_FLOUNDER */
+   case IP_VERSION(5, 2, 4):/* DIMGREY_CAVEFISH */
+   kfd->device_info.num_sdma_queues_per_engine = 8;
+   break;
+   default:
+   dev_err(kfd_device,
+   "Failed to find sdma ip blocks(SDMA_HWIP:0x%x) in %s\n",
+   sdma_version, __func__);
+   }
+}
+
+static void kfd_device_info_set_event_interrupt_class(struct kfd_dev *kfd)
+{
+   uint32_t gc_version = KFD_GC_VERSION(kfd);
+
+   switch (gc_version) {
+   case IP_VERSION(9, 0, 1): /* VEGA10 */
+   case IP_VERSION(9, 2, 1): /* VEGA12 */
+   case IP_VERSION(9, 3, 0): /* RENOIR */
+   case IP_VERSION(9, 4, 0): /* VEGA20 */
+   case IP_VERSION(9, 4, 1): /* ARCTURUS */
+   case IP_VERSION(9, 4, 2): /* ALDEBARAN */
+   case IP_VERSION(10, 3, 1): /* VANGOGH */
+   case IP_VERSION(10, 3, 3): /* YELLOW_CARP */
+   case IP_VERSION(10, 1, 3): /* CYAN_SKILLFISH */
+   case IP_VERSION(10, 1, 10): /* NAVI10 */
+   case IP_VERSION(10, 1, 2): /* NAVI12 */
+   case IP_VERSION(10, 1, 1): /* NAVI14 */
+   case IP_VERSION(10, 3, 0): /* SIENNA_CICHLID */
+   case IP_VERSION(10, 3, 2): /* NAVY_FLOUNDER */
+   case IP_VERSION(10, 3, 4): /* DIMGREY_CAVEFISH */
+   case IP_VERSION(10, 3, 5): /* BEIGE_GOBY */
+   kfd->device_info.event_interrupt_class = &event_interrupt_class_v9;
+   break;
+   default:
+   dev_err(kfd_device, "Failed to find gc ip blocks(GC_HWIP:0x%x) in %s\n",
+   gc_version, __func__);
+   }
+}
+
 static void kfd_device_info_init(struct kfd_dev *kfd,
 bool vf, uint32_t gfx_target_version)
 {
uint32_t gc_version = KFD_GC_VERSION(kfd);
-   uint32_t sdma_version = kfd->adev->ip_versions[SDMA0_HWIP][0];
uint32_t asic_type = kfd->adev->asic_type;
 
kfd->device_info.max_pasid_bits = 16;
@@ -75,16 +136,11 @@ static void kfd_device_info_init(struct kfd_dev *kfd,
if (KFD_IS_SOC15(kfd)) {
kfd->device_info.doorbell_size = 8;
kfd->device_info.ih_ring_entry_size = 8 * sizeof(uint32_t);
-   kfd->device_info.event_interrupt_class = &event_interrupt_class_v9;
kfd->device_info.supports_cwsr = true;
 
-   if ((sdma_version >= IP_VERSION(4, 0, 0)  &&
-sdma_version <= IP_VERSION(4, 2, 0)) ||
-sdma_version == IP_VERSION(5, 2, 1)  ||
-sdma_version == IP_VERSION(5, 2, 3))
-   kfd->device_info.num_sdma_queues_per_engine = 2;
-   else
-   kfd->device_info.num_sdma_queues_per_engine = 8;
+   kfd_device_info_set_sdma_queue_num(kfd);
+
+   kfd_device_info_set_event_interrupt_class(kfd);
 
/* Raven */

Re: [PATCH v3] drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure

2021-12-17 Thread Christian König


Am 17.12.21 um 03:26 schrieb Leslie Shi:

[Why]
In amdgpu_driver_load_kms, when amdgpu_device_init returns an error during
driver modprobe, the error handling path starts immediately and also calls
amdgpu_device_unmap_mmio to release mapped VRAM. However, the release callback
that follows still accesses the unmapped memory, such as
vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini, so a kernel crash occurs.

[How]
Call amdgpu_device_unmap_mmio() only if the device is unplugged, to prevent an
invalid memory access in vcn_v3_0_sw_fini() when GPU initialization fails.

Signed-off-by: Leslie Shi 


Looks sane to me, but Andrey should probably nod as well.

Acked-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f31caec669e7..f6a47b927cfd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3899,7 +3899,9 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
  
  	amdgpu_gart_dummy_page_fini(adev);
  
-	amdgpu_device_unmap_mmio(adev);

+   if (drm_dev_is_unplugged(adev_to_drm(adev)))
+   amdgpu_device_unmap_mmio(adev);
+
  }
  
  void amdgpu_device_fini_sw(struct amdgpu_device *adev)
