Re: [PATCH v3 2/2] drm/amdkfd: get doorbell's absolute offset based on the db size

2023-10-04 Thread Yadav, Arvind



On 10/4/2023 10:29 PM, Felix Kuehling wrote:


On 2023-10-04 12:16, Arvind Yadav wrote:

This patch aligns the absolute doorbell offset
based on the doorbell size, so that the doorbell
offset is correct for both 32-bit and 64-bit doorbells.

v2:
- Addressed the review comment from Felix.
v3:
- Adding doorbell_size as a parameter to get the absolute doorbell offset.

Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
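
For context, the intended effect of passing the doorbell size down is
roughly the following (an illustrative sketch only, assuming a 32-bit
doorbell occupies one dword slot and a 64-bit doorbell two; the actual
calculation is done by amdgpu_doorbell_index_on_bar() and may differ):

/* Sketch only, not part of the patch: absolute doorbell index for a queue. */
static uint32_t example_doorbell_index(uint32_t first_db_index,
                                       uint32_t doorbell_id,
                                       uint32_t doorbell_size /* in bytes */)
{
        /* 64-bit doorbells take two dword slots, 32-bit doorbells take one. */
        return first_db_index + doorbell_id * (doorbell_size / sizeof(uint32_t));
}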


The final result looks good to me. But please squash the two patches 
into one. The first patch on its own breaks the build, and that's 
something we don't want to commit to the branch history as it makes 
tracking regressions (e.g. with git bisect) very hard or impossible.


More nit-picks inline.

Sure, we can have one patch.




---
  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c   |  6 +-
  drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c   | 13 +++--
  .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c  |  4 +++-
  3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c

index 0d3d538b64eb..690ff131fe4b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -346,6 +346,7 @@ static int allocate_doorbell(struct 
qcm_process_device *qpd,

   uint32_t const *restore_id)
  {
  struct kfd_node *dev = qpd->dqm->dev;
+    uint32_t doorbell_size;
    if (!KFD_IS_SOC15(dev)) {
  /* On pre-SOC15 chips we need to use the queue ID to
@@ -405,9 +406,12 @@ static int allocate_doorbell(struct 
qcm_process_device *qpd,

  }
  }
  +    doorbell_size = dev->kfd->device_info.doorbell_size;
+
  q->properties.doorbell_off = amdgpu_doorbell_index_on_bar(dev->adev,
                                                            qpd->proc_doorbells,
-                                                           q->doorbell_id);
+                                                           q->doorbell_id,
+                                                           doorbell_size);


You don't need a local variable for doorbell size that's only used 
once. Just pass dev->kfd->device_info.doorbell_size directly.


I used a local variable to make the code cleaner, but I will remove it.



  return 0;
  }
  diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c

index 7b38537c7c99..59dd76c4b138 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -161,7 +161,10 @@ void __iomem *kfd_get_kernel_doorbell(struct 
kfd_dev *kfd,

  if (inx >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
  return NULL;
-    *doorbell_off = amdgpu_doorbell_index_on_bar(kfd->adev, kfd->doorbells, inx);
+    *doorbell_off = amdgpu_doorbell_index_on_bar(kfd->adev,
+                                                 kfd->doorbells,
+                                                 inx,
+                                                 kfd->device_info.doorbell_size);
  inx *= 2;
    pr_debug("Get kernel queue doorbell\n"
@@ -233,6 +236,7 @@ phys_addr_t kfd_get_process_doorbells(struct 
kfd_process_device *pdd)

  {
  struct amdgpu_device *adev = pdd->dev->adev;
  uint32_t first_db_index;
+    uint32_t doorbell_size;
    if (!pdd->qpd.proc_doorbells) {
  if (kfd_alloc_process_doorbells(pdd->dev->kfd, pdd))
@@ -240,7 +244,12 @@ phys_addr_t kfd_get_process_doorbells(struct 
kfd_process_device *pdd)

  return 0;
  }
-    first_db_index = amdgpu_doorbell_index_on_bar(adev, pdd->qpd.proc_doorbells, 0);
+    doorbell_size = pdd->dev->kfd->device_info.doorbell_size;
+
+    first_db_index = amdgpu_doorbell_index_on_bar(adev,
+  pdd->qpd.proc_doorbells,
+  0,
+  doorbell_size);


Same as above, no local variable needed.

Noted,




  return adev->doorbell.base + first_db_index * sizeof(uint32_t);
  }
  diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c

index adb5e4bdc0b2..010cd8e8e6a1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -375,9 +375,11 @@ int pqm_create_queue(struct 
process_queue_manager *pqm,

   * relative doorbell index = Absolute doorbell index -
   * absolute index of first doorbell in the page.
   */
+    uint32_t doorbell_size = pdd->dev->kfd->device_info.doorbell_size;
   uint32_t first_db_index = amdgpu_doorbell_index_on_bar(pdd->dev->adev,
                                                          pdd->qpd.proc_doorbells,
-                                                         0);
+                                                         0,
+                                                         doorbell_size);


No local variable needed.


Noted,

Thanks
~Arvind


Regards,
  Felix



    *p_doorbell_offset_in_process = (q->properties.doorbell_off
  - first_db_index);

Re: [PATCH 1/1] drm/amdkfd: Fix unaligned doorbell absolute offset for gfx8

2023-09-27 Thread Yadav, Arvind

Adding felix.kuehl...@amd.com for review.

Thanks
~Arvind

On 9/27/2023 9:46 PM, Arvind Yadav wrote:

This patch adjusts the absolute doorbell offset
against the doorbell ID, taking the 32/64-bit
doorbell size into account.

Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 0d3d538b64eb..c327f7f604d7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -407,7 +407,16 @@ static int allocate_doorbell(struct qcm_process_device *qpd,
 
 	q->properties.doorbell_off = amdgpu_doorbell_index_on_bar(dev->adev,
                                                                    qpd->proc_doorbells,
-                                                                   q->doorbell_id);
+                                                                   0);
+
+	/* Adjust the absolute doorbell offset against the doorbell id considering
+	 * the doorbell size of 32/64 bit.
+	 */
+	if (!KFD_IS_SOC15(dev))
+		q->properties.doorbell_off += q->doorbell_id;
+	else
+		q->properties.doorbell_off += q->doorbell_id * 2;
+
return 0;
  }
  


Re: [PATCH 0/1] drm/amdkfd: Fix unaligned doorbell absolute offset for gfx8

2023-09-27 Thread Yadav, Arvind

Adding felix.kuehl...@amd.com for review.

Thanks
~Arvind

On 9/27/2023 9:46 PM, Arvind Yadav wrote:

On older chips, the absolute doorbell offset within
the doorbell page is based on the queue ID.
KFD uses the queue ID and the doorbell size to compute
the absolute doorbell offset in userspace.

This patch adjusts the absolute doorbell offset
against the doorbell ID, taking the 32/64-bit
doorbell size into account.

Arvind Yadav (1):
   drm/amdkfd: Fix unaligned doorbell absolute offset for gfx8

  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)



Re: [PATCH v3 0/7] GPU workload hints for better performance

2023-08-28 Thread Yadav, Arvind



On 8/28/2023 9:13 PM, Helen Mae Koike Fornazier wrote:

On Monday, August 28, 2023 09:26 -03, Arvind Yadav  wrote:


AMDGPU SoCs support dynamic, workload-based power profiles, which can
provide fine-tuned performance for a particular type of workload.
This patch series adds an interface to set/reset these power profiles
based on the submitted job. The driver can dynamically switch
power profiles based on the submitted job, which can optimize power and
performance while a particular workload is active.

Hi Arvind,

Would you mind testing your patchset with drm-ci? There is an amdgpu
test there and I would love to get your feedback on the CI.

You basically just need to apply the ci patch which is available on
https://cgit.freedesktop.org/drm/drm/log/?h=topic/drm-ci

There are instructions on the docs, but in short: to configure it, you push
your branch to gitlab.freedesktop.org, go to the settings and change the
CI/CD configuration file from .gitlab-ci.yml to 
drivers/gpu/drm/ci/gitlab-ci.yml,
and you can trigger a pipeline on your branch to get tests running.

(by the time of this writing, gitlab.fdo is under maintenance but should
be up soonish).


Hi Helen,

I tried the steps you mentioned, but it looks like something is
missing and the build itself is failing.


https://gitlab.freedesktop.org/ArvindYadav/drm-next/-/commits/smu_workload

Regards,
~Arvind


Thank you!
Helen


v2:
- Splitting workload_profile_set and workload_profile_put
   into two separate patches.
- Addressed review comments.
- Added new suspend function.
- Added patch to switch the GPU workload mode for KFD.

v3:
- Addressed all review comments.
- Changed the function name from *_set() to *_get().
- Now clearing all the profiles in the work handler.
- Added a *_clear_all function to clear all the power profiles.


Arvind Yadav (7):
   drm/amdgpu: Added init/fini functions for workload
   drm/amdgpu: Add new function to set GPU power profile
   drm/amdgpu: Add new function to put GPU power profile
   drm/amdgpu: Add suspend function to clear the GPU power profile.
   drm/amdgpu: Set/Reset GPU workload profile
   drm/amdgpu: switch workload context to/from compute
   Revert "drm/amd/amdgpu: switch on/off vcn power profile mode"

  drivers/gpu/drm/amd/amdgpu/Makefile   |   2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   3 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c|   8 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|   6 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c   |   5 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   |  14 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c  | 226 ++
  drivers/gpu/drm/amd/include/amdgpu_workload.h |  61 +
  8 files changed, 309 insertions(+), 16 deletions(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
  create mode 100644 drivers/gpu/drm/amd/include/amdgpu_workload.h

--
2.34.1



Re: [PATCH v2 3/7] drm/amdgpu: Add new function to put GPU power profile

2023-08-25 Thread Yadav, Arvind



On 8/22/2023 6:16 PM, Lazar, Lijo wrote:



On 8/22/2023 5:41 PM, Yadav, Arvind wrote:

Hi Lijo,

The *_set function will set the GPU power profile and the *_put
function will schedule the smu_delayed_work task after a 100 ms delay.
This smu_delayed_work task will clear a GPU power profile if no new
jobs are scheduled within 100 ms. But if any new job comes within
100 ms, then the *_workload_profile_set function will cancel this work
and set the GPU power profile based on preferences.

Please see the cases below.

Case 1 - Only jobs of the same profile run. It will take 100 ms to clear
the profile once all jobs complete.

  [timing diagram: the VIDEO workload profile is set when the first VIDEO
   job starts and cleared 100 ms after the last VIDEO job completes]

Case 2 - Two jobs with two different profiles. The job1 profile will be
set, but when job2 arrives the workload will be moved to the higher
profile.

  [timing diagram: the workload profile moves from VIDEO to COMPUTE when
   the COMPUTE job arrives, and is cleared 100 ms after the last job
   completes]

Case 3 - Two jobs with two different profiles. The job1 profile will be
set, but when job2 arrives the workload will not be moved to the lower
profile. Only when the COMPUTE job completes will it move to the lower
profile.

  [timing diagram: the workload profile stays at COMPUTE while COMPUTE
   jobs run, moves to VIDEO once they complete, and is cleared 100 ms
   after the last VIDEO job completes]




The swsmu layer maintains a workload mask based on priority. So once you
have set the mask, until you unset it (i.e. when refcount = 0), the
mask will be set in the lower layer. The swsmu layer will take care of
requesting the highest priority from the FW. I don't think that needs to
be repeated at this level.


At this layer, all you need is to refcount the requests and make the 
request.


When refcount of a profile becomes non-zero (only one-time), place one 
request for that profile. As swsmu layer maintains the workload mask, 
it will take the new profile also into consideration while requesting 
for the one  with the highest priority.


When refcount of a profile becomes zero, place a request to clear it. 
This is controlled by your idle work. As I see, it keeps an additional 
100ms tolerance before placing a clear request. In that way, there is 
no need to cancel that work.


Inside idle work handler -
Loop through the profiles that are set and clear those profiles whose 
refcount is zero.


Thus if a job starts during the 100ms delay, idle work won't see the 
ref count as zero and then it won't place a request to clear out that 
profile.
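
A minimal sketch of that idle-work shape (field names taken from this
series; illustrative only, not the actual patch):

/* Sketch: clear only the profiles whose refcount has dropped to zero; no
 * rescheduling needed, since every put() arms this work again.
 */
static void example_idle_work_handler(struct work_struct *work)
{
    struct amdgpu_smu_workload *workload = container_of(work,
                                    struct amdgpu_smu_workload,
                                    smu_delayed_work.work);
    struct amdgpu_device *adev = workload->adev;
    int index;

    mutex_lock(&workload->workload_lock);
    for (index = fls(workload->submit_workload_status); index > 0; index--) {
        if (!(workload->submit_workload_status & (1 << index)))
            continue;
        /* Skip profiles that still have active jobs. */
        if (atomic_read(&workload->power_profile_ref[index]))
            continue;
        if (!amdgpu_dpm_switch_power_profile(adev, index, false))
            workload->submit_workload_status &= ~(1 << index);
    }
    mutex_unlock(&workload->workload_lock);
}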



Hi Lijo,

Thank you for your comment. We will consider it, but we would like to
retain the same design.


~Arvind.


On 8/22/2023 10:21 AM, Lazar, Lijo wrote:



On 8/21/2023 12:17 PM, Arvind Yadav wrote:

This patch adds a function which will clear the GPU
power profile after the job is finished.

This is how it works:
- The scheduler will set the GPU power profile based on ring_type.
- The scheduler will clear the GPU power profile once the job is finished.
- Here, the *_workload_profile_set function will set the GPU
   power profile and the *_workload_profile_put function will
   schedule the smu_delayed_work task after a 100 ms delay. This
   smu_delayed_work task will clear a GPU power profile if no
   new jobs are scheduled within 100 ms. But if any new job
   comes within 100 ms, then the *_workload_profile_set function
   will cancel this work and set the GPU power profile based on
   preferences.

v2:
- Splitting workload_profile_set and workload_profile_put
   into two separate patches.
- Addressed review comment.

Cc: Shashank Sharma 
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c  | 97 
+++

  drivers/gpu/drm/amd/include/amdgpu_workload.h |  3 +
  2 files changed, 100 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c

index e661cc5b3d92..6367eb88a44d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
@@ -24,6 +24,9 @@
    #include "amdgpu.h"
  +/* 100 millisecond timeout */
+#define SMU_IDLE_TIMEOUT    msecs_to_jiffies(100)
+
  static enum PP_SMC_POWER_PROFILE
  ring_to_power_profile(uint32_t ring_type)
  {
@@ -59,6 +62,80 @@ amdgpu_power_profile_set(struct amdgpu_device 
*adev,

  return ret;
  }
  +static int
+amdgpu_power_profile_clear(struct amdgpu_device *adev,
+   enum PP_SMC_POWER_PROFILE profile)
+{
+    int ret = amdgpu_dpm_switch_power_profile(adev, profile, false);

Re: [PATCH v2 4/7] drm/amdgpu: Add suspend function to clear the GPU power profile.

2023-08-22 Thread Yadav, Arvind



On 8/22/2023 6:24 PM, Lazar, Lijo wrote:



On 8/22/2023 5:52 PM, Yadav, Arvind wrote:


On 8/22/2023 12:01 PM, Lazar, Lijo wrote:



On 8/21/2023 12:17 PM, Arvind Yadav wrote:

This patch adds a suspend function that will clear the GPU
power profile before going into suspend state.

v2:
- Add the new suspend function based on review comment.

Cc: Shashank Sharma 
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c  | 23 
+++

  drivers/gpu/drm/amd/include/amdgpu_workload.h |  2 ++
  3 files changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index cd3bf641b630..3b70e657b439 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4212,6 +4212,8 @@ int amdgpu_device_suspend(struct drm_device 
*dev, bool fbcon)

    amdgpu_ras_suspend(adev);
  +    amdgpu_workload_profile_suspend(adev);
+
  amdgpu_device_ip_suspend_phase1(adev);
    if (!adev->in_s0ix)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c

index 6367eb88a44d..44ca8e986984 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
@@ -174,6 +174,29 @@ void amdgpu_workload_profile_set(struct 
amdgpu_device *adev,

  mutex_unlock(&workload->workload_lock);
  }
  +void amdgpu_workload_profile_suspend(struct amdgpu_device *adev)
+{
+    struct amdgpu_smu_workload *workload = &adev->smu_workload;
+    int ret;
+
+    mutex_lock(&workload->workload_lock);
+ cancel_delayed_work_sync(&workload->smu_delayed_work);


Another deadlock candidate. Between fini() and suspend(), the only 
difference probably could be initialization status. If so, just use 
a helper that is used during fini() and suspend().


Before going to suspend(), we need to cancel the work and clear all the
profiles, but in fini() we are destroying the mutex; also, fini() will
be called when we are unloading everything.




What I meant is for both suspend/fini, you need to cancel any work 
scheduled, clear refcounts and set the profile back to default 
profile. Keep this in a helper and reuse.
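
For instance (a sketch of such a shared helper, reusing the field names
from this series; illustrative only, not the actual patch):

/* Sketch: common teardown used by both suspend() and fini(). Cancelling
 * the delayed work before taking workload_lock avoids deadlocking with
 * the work handler, which takes the same lock.
 */
static void example_workload_profile_cleanup(struct amdgpu_device *adev)
{
    struct amdgpu_smu_workload *workload = &adev->smu_workload;
    int index;

    cancel_delayed_work_sync(&workload->smu_delayed_work);

    mutex_lock(&workload->workload_lock);
    for (index = fls(workload->submit_workload_status); index > 0; index--) {
        if (!(workload->submit_workload_status & (1 << index)))
            continue;
        atomic_set(&workload->power_profile_ref[index], 0);
        if (amdgpu_dpm_switch_power_profile(adev, index, false))
            DRM_WARN("Failed to clear power profile %s\n",
                     amdgpu_workload_mode_name[index]);
    }
    workload->submit_workload_status = 0;
    mutex_unlock(&workload->workload_lock);
}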



Noted.

Thank you,
~Arvind


Thanks,
Lijo


~Arvind


Thanks,
Lijo


+
+    /* Clear all the set GPU power profile*/
+    for (int index = fls(workload->submit_workload_status);
+ index > 0; index--) {
+    if (workload->submit_workload_status & (1 << index)) {
+ atomic_set(&workload->power_profile_ref[index], 0);
+    ret = amdgpu_power_profile_clear(adev, index);
+    if (ret)
+    DRM_WARN("Failed to clear power profile %s, err = 
%d\n",

+ amdgpu_workload_mode_name[index], ret);
+    }
+    }
+    workload->submit_workload_status = 0;
+    mutex_unlock(&workload->workload_lock);
+}
+
  void amdgpu_workload_profile_init(struct amdgpu_device *adev)
  {
  adev->smu_workload.adev = adev;
diff --git a/drivers/gpu/drm/amd/include/amdgpu_workload.h 
b/drivers/gpu/drm/amd/include/amdgpu_workload.h

index ee1f87257f2d..0acd8769ec52 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_workload.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_workload.h
@@ -52,6 +52,8 @@ void amdgpu_workload_profile_put(struct 
amdgpu_device *adev,

  void amdgpu_workload_profile_set(struct amdgpu_device *adev,
   uint32_t ring_type);
  +void amdgpu_workload_profile_suspend(struct amdgpu_device *adev);
+
  void amdgpu_workload_profile_init(struct amdgpu_device *adev);
    void amdgpu_workload_profile_fini(struct amdgpu_device *adev);


Re: [PATCH v2 2/7] drm/amdgpu: Add new function to set GPU power profile

2023-08-22 Thread Yadav, Arvind



On 8/22/2023 11:55 AM, Lazar, Lijo wrote:



On 8/21/2023 12:17 PM, Arvind Yadav wrote:

This patch adds a function which will change the GPU
power profile based on a submitted job. This can optimize
the power performance when the workload is on.

v2:
- Splitting workload_profile_set and workload_profile_put
   into two separate patches.
- Addressed review comment.

Cc: Shashank Sharma 
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c  | 56 +++
  drivers/gpu/drm/amd/include/amdgpu_workload.h |  3 +
  2 files changed, 59 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c

index 32166f482f77..e661cc5b3d92 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
@@ -24,6 +24,62 @@
    #include "amdgpu.h"
  +static enum PP_SMC_POWER_PROFILE
+ring_to_power_profile(uint32_t ring_type)
+{
+    switch (ring_type) {
+    case AMDGPU_RING_TYPE_GFX:
+    return PP_SMC_POWER_PROFILE_FULLSCREEN3D;
+    case AMDGPU_RING_TYPE_COMPUTE:
+    return PP_SMC_POWER_PROFILE_COMPUTE;
+    case AMDGPU_RING_TYPE_UVD:
+    case AMDGPU_RING_TYPE_VCE:
+    case AMDGPU_RING_TYPE_UVD_ENC:
+    case AMDGPU_RING_TYPE_VCN_DEC:
+    case AMDGPU_RING_TYPE_VCN_ENC:
+    case AMDGPU_RING_TYPE_VCN_JPEG:
+    return PP_SMC_POWER_PROFILE_VIDEO;
+    default:
+    return PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT;
+    }
+}
+
+static int
+amdgpu_power_profile_set(struct amdgpu_device *adev,
+ enum PP_SMC_POWER_PROFILE profile)
+{
+    int ret = amdgpu_dpm_switch_power_profile(adev, profile, true);
+


You don't need to interact with FW for every set() call. Only send the 
message if workload_status doesn't have the profile set or refcount is 
zero. Otherwise, only need to increment the refcount.
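
i.e. something along these lines (sketch only, not the actual patch;
the caller is assumed to hold workload_lock):

static int example_power_profile_set(struct amdgpu_device *adev,
                                     enum PP_SMC_POWER_PROFILE profile)
{
    struct amdgpu_smu_workload *workload = &adev->smu_workload;
    int ret;

    /* Only talk to the FW when the profile is not already active. */
    if (!(workload->submit_workload_status & (1 << profile))) {
        ret = amdgpu_dpm_switch_power_profile(adev, profile, true);
        if (ret)
            return ret;
        workload->submit_workload_status |= (1 << profile);
    }
    atomic_inc(&workload->power_profile_ref[profile]);

    return 0;
}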

Noted.
Thank You,
~Arvind


Thanks,
Lijo


+    if (!ret) {
+    /* Set the bit for the submitted workload profile */
+    adev->smu_workload.submit_workload_status |= (1 << profile);
+ atomic_inc(&adev->smu_workload.power_profile_ref[profile]);
+    }
+
+    return ret;
+}
+
+void amdgpu_workload_profile_set(struct amdgpu_device *adev,
+ uint32_t ring_type)
+{
+    struct amdgpu_smu_workload *workload = &adev->smu_workload;
+    enum PP_SMC_POWER_PROFILE profile = 
ring_to_power_profile(ring_type);

+    int ret;
+
+    if (profile == PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT)
+    return;
+
+    mutex_lock(&workload->workload_lock);
+
+    ret = amdgpu_power_profile_set(adev, profile);
+    if (ret) {
+    DRM_WARN("Failed to set workload profile to %s, error = %d\n",
+ amdgpu_workload_mode_name[profile], ret);
+    }
+
+    mutex_unlock(&workload->workload_lock);
+}
+
  void amdgpu_workload_profile_init(struct amdgpu_device *adev)
  {
  adev->smu_workload.adev = adev;
diff --git a/drivers/gpu/drm/amd/include/amdgpu_workload.h 
b/drivers/gpu/drm/amd/include/amdgpu_workload.h

index 5d0f068422d4..5022f28fc2f9 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_workload.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_workload.h
@@ -46,6 +46,9 @@ static const char * const 
amdgpu_workload_mode_name[] = {

  "Window3D"
  };
  +void amdgpu_workload_profile_set(struct amdgpu_device *adev,
+ uint32_t ring_type);
+
  void amdgpu_workload_profile_init(struct amdgpu_device *adev);
    void amdgpu_workload_profile_fini(struct amdgpu_device *adev);


Re: [PATCH v2 4/7] drm/amdgpu: Add suspend function to clear the GPU power profile.

2023-08-22 Thread Yadav, Arvind



On 8/22/2023 12:01 PM, Lazar, Lijo wrote:



On 8/21/2023 12:17 PM, Arvind Yadav wrote:

This patch adds a suspend function that will clear the GPU
power profile before going into suspend state.

v2:
- Add the new suspend function based on review comment.

Cc: Shashank Sharma 
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c  | 23 +++
  drivers/gpu/drm/amd/include/amdgpu_workload.h |  2 ++
  3 files changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index cd3bf641b630..3b70e657b439 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4212,6 +4212,8 @@ int amdgpu_device_suspend(struct drm_device 
*dev, bool fbcon)

    amdgpu_ras_suspend(adev);
  +    amdgpu_workload_profile_suspend(adev);
+
  amdgpu_device_ip_suspend_phase1(adev);
    if (!adev->in_s0ix)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c

index 6367eb88a44d..44ca8e986984 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
@@ -174,6 +174,29 @@ void amdgpu_workload_profile_set(struct 
amdgpu_device *adev,

  mutex_unlock(&workload->workload_lock);
  }
  +void amdgpu_workload_profile_suspend(struct amdgpu_device *adev)
+{
+    struct amdgpu_smu_workload *workload = &adev->smu_workload;
+    int ret;
+
+    mutex_lock(&workload->workload_lock);
+ cancel_delayed_work_sync(&workload->smu_delayed_work);


Another deadlock candidate. Between fini() and suspend(), the only 
difference probably could be initialization status. If so, just use a 
helper that is used during fini() and suspend().


Before going to suspend(), we need to cancel the work and clear all the
profiles, but in fini() we are destroying the mutex; also, fini() will
be called when we are unloading everything.


~Arvind


Thanks,
Lijo


+
+    /* Clear all the set GPU power profile*/
+    for (int index = fls(workload->submit_workload_status);
+ index > 0; index--) {
+    if (workload->submit_workload_status & (1 << index)) {
+ atomic_set(&workload->power_profile_ref[index], 0);
+    ret = amdgpu_power_profile_clear(adev, index);
+    if (ret)
+    DRM_WARN("Failed to clear power profile %s, err = 
%d\n",

+ amdgpu_workload_mode_name[index], ret);
+    }
+    }
+    workload->submit_workload_status = 0;
+    mutex_unlock(&workload->workload_lock);
+}
+
  void amdgpu_workload_profile_init(struct amdgpu_device *adev)
  {
  adev->smu_workload.adev = adev;
diff --git a/drivers/gpu/drm/amd/include/amdgpu_workload.h 
b/drivers/gpu/drm/amd/include/amdgpu_workload.h

index ee1f87257f2d..0acd8769ec52 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_workload.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_workload.h
@@ -52,6 +52,8 @@ void amdgpu_workload_profile_put(struct 
amdgpu_device *adev,

  void amdgpu_workload_profile_set(struct amdgpu_device *adev,
   uint32_t ring_type);
  +void amdgpu_workload_profile_suspend(struct amdgpu_device *adev);
+
  void amdgpu_workload_profile_init(struct amdgpu_device *adev);
    void amdgpu_workload_profile_fini(struct amdgpu_device *adev);


Re: [PATCH v2 3/7] drm/amdgpu: Add new function to put GPU power profile

2023-08-22 Thread Yadav, Arvind

Hi Lijo,

The *_set function will set the GPU power profile and the *_put
function will schedule the smu_delayed_work task after a 100 ms delay.
This smu_delayed_work task will clear a GPU power profile if no new
jobs are scheduled within 100 ms. But if any new job comes within
100 ms, then the *_workload_profile_set function will cancel this work
and set the GPU power profile based on preferences.

Please see the cases below.

Case 1 - Only jobs of the same profile run. It will take 100 ms to clear
the profile once all jobs complete.

  [timing diagram: the VIDEO workload profile is set when the first VIDEO
   job starts and cleared 100 ms after the last VIDEO job completes]

Case 2 - Two jobs with two different profiles. The job1 profile will be
set, but when job2 arrives the workload will be moved to the higher
profile.

  [timing diagram: the workload profile moves from VIDEO to COMPUTE when
   the COMPUTE job arrives, and is cleared 100 ms after the last job
   completes]

Case 3 - Two jobs with two different profiles. The job1 profile will be
set, but when job2 arrives the workload will not be moved to the lower
profile. Only when the COMPUTE job completes will it move to the lower
profile.

  [timing diagram: the workload profile stays at COMPUTE while COMPUTE
   jobs run, moves to VIDEO once they complete, and is cleared 100 ms
   after the last VIDEO job completes]

On 8/22/2023 10:21 AM, Lazar, Lijo wrote:



On 8/21/2023 12:17 PM, Arvind Yadav wrote:

This patch adds a function which will clear the GPU
power profile after the job is finished.

This is how it works:
- The scheduler will set the GPU power profile based on ring_type.
- The scheduler will clear the GPU power profile once the job is finished.
- Here, the *_workload_profile_set function will set the GPU
   power profile and the *_workload_profile_put function will
   schedule the smu_delayed_work task after a 100 ms delay. This
   smu_delayed_work task will clear a GPU power profile if no
   new jobs are scheduled within 100 ms. But if any new job
   comes within 100 ms, then the *_workload_profile_set function
   will cancel this work and set the GPU power profile based on
   preferences.

v2:
- Splitting workload_profile_set and workload_profile_put
   into two separate patches.
- Addressed review comment.

Cc: Shashank Sharma 
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c  | 97 +++
  drivers/gpu/drm/amd/include/amdgpu_workload.h |  3 +
  2 files changed, 100 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c

index e661cc5b3d92..6367eb88a44d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
@@ -24,6 +24,9 @@
    #include "amdgpu.h"
+/* 100 millisecond timeout */
+#define SMU_IDLE_TIMEOUT    msecs_to_jiffies(100)
+
  static enum PP_SMC_POWER_PROFILE
  ring_to_power_profile(uint32_t ring_type)
  {
@@ -59,6 +62,80 @@ amdgpu_power_profile_set(struct amdgpu_device *adev,
  return ret;
  }
  +static int
+amdgpu_power_profile_clear(struct amdgpu_device *adev,
+   enum PP_SMC_POWER_PROFILE profile)
+{
+    int ret = amdgpu_dpm_switch_power_profile(adev, profile, false);
+
+    if (!ret) {
+    /* Clear the bit for the submitted workload profile */
+    adev->smu_workload.submit_workload_status &= ~(1 << profile);
+    }
+
+    return ret;
+}
+
+static void
+amdgpu_power_profile_idle_work_handler(struct work_struct *work)
+{
+
+    struct amdgpu_smu_workload *workload = container_of(work,
+  struct amdgpu_smu_workload,
+  smu_delayed_work.work);
+    struct amdgpu_device *adev = workload->adev;
+    bool reschedule = false;
+    int index  = fls(workload->submit_workload_status);
+    int ret;
+
+    mutex_lock(&workload->workload_lock);
+    for (; index > 0; index--) {


Why not use for_each_set_bit?


We are clearing only what we have set. We will clear the higher profile
first, then the lower ones.






+    int val = atomic_read(&workload->power_profile_ref[index]);
+
+    if (val) {
+    reschedule = true;


Why do you need to do reschedule? For each put(), a schedule is 
called. If refcount is not zero, that means some other job has already 
set the profile. It is supposed to call put() and at that time, this 
job will be run to clear it anyway, right?



Yes, I have already got a comment for this; I am going to remove it.
Noted.


+    } else {
+    if (workload->submit_workload_status &
+    (1 << index)) {
+    ret = amdgpu_power_profile_clear(adev, index);

Re: [PATCH v2 2/7] drm/amdgpu: Add new function to set GPU power profile

2023-08-21 Thread Yadav, Arvind



On 8/21/2023 11:40 PM, Alex Deucher wrote:

On Mon, Aug 21, 2023 at 1:54 PM Yadav, Arvind  wrote:


On 8/21/2023 9:52 PM, Alex Deucher wrote:

On Mon, Aug 21, 2023 at 2:55 AM Arvind Yadav  wrote:

This patch adds a function which will change the GPU
power profile based on a submitted job. This can optimize
the power performance when the workload is on.

v2:
- Splitting workload_profile_set and workload_profile_put
into two separate patches.
- Addressed review comment.

Cc: Shashank Sharma 
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Arvind Yadav 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c  | 56 +++
   drivers/gpu/drm/amd/include/amdgpu_workload.h |  3 +
   2 files changed, 59 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
index 32166f482f77..e661cc5b3d92 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
@@ -24,6 +24,62 @@

   #include "amdgpu.h"

+static enum PP_SMC_POWER_PROFILE
+ring_to_power_profile(uint32_t ring_type)
+{
+   switch (ring_type) {
+   case AMDGPU_RING_TYPE_GFX:
+   return PP_SMC_POWER_PROFILE_FULLSCREEN3D;
+   case AMDGPU_RING_TYPE_COMPUTE:
+   return PP_SMC_POWER_PROFILE_COMPUTE;
+   case AMDGPU_RING_TYPE_UVD:
+   case AMDGPU_RING_TYPE_VCE:
+   case AMDGPU_RING_TYPE_UVD_ENC:
+   case AMDGPU_RING_TYPE_VCN_DEC:
+   case AMDGPU_RING_TYPE_VCN_ENC:
+   case AMDGPU_RING_TYPE_VCN_JPEG:
+   return PP_SMC_POWER_PROFILE_VIDEO;
+   default:
+   return PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT;
+   }
+}
+
+static int
+amdgpu_power_profile_set(struct amdgpu_device *adev,
+enum PP_SMC_POWER_PROFILE profile)
+{
+   int ret = amdgpu_dpm_switch_power_profile(adev, profile, true);
+
+   if (!ret) {
+   /* Set the bit for the submitted workload profile */
+   adev->smu_workload.submit_workload_status |= (1 << profile);
+   atomic_inc(&adev->smu_workload.power_profile_ref[profile]);
+   }
+
+   return ret;
+}
+
+void amdgpu_workload_profile_set(struct amdgpu_device *adev,
+uint32_t ring_type)
+{
+   struct amdgpu_smu_workload *workload = &adev->smu_workload;
+   enum PP_SMC_POWER_PROFILE profile = ring_to_power_profile(ring_type);
+   int ret;
+
+   if (profile == PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT)
+   return;

Why is this one skipped?  How do we get back to the boot up profile?

Hi Alex,

enum PP_SMC_POWER_PROFILE {
  PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT = 0x0,
  PP_SMC_POWER_PROFILE_FULLSCREEN3D = 0x1,
  PP_SMC_POWER_PROFILE_POWERSAVING  = 0x2,
  PP_SMC_POWER_PROFILE_VIDEO= 0x3,
  PP_SMC_POWER_PROFILE_VR   = 0x4,
  PP_SMC_POWER_PROFILE_COMPUTE  = 0x5,
  PP_SMC_POWER_PROFILE_CUSTOM   = 0x6,
  PP_SMC_POWER_PROFILE_WINDOW3D = 0x7,
  PP_SMC_POWER_PROFILE_COUNT,
};

These are all the profiles. We use the ones that are greater than
PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT.
Now suppose the profile was DEFAULT and we set it to VIDEO; the SMU will
move the profile to the higher level.
When we reset the VIDEO profile, the SMU will move back to the DEFAULT one.

Our job is to set the profile and reset it after the job is done.
SMU will take care to move to a higher profile and after reset, it will
move back to DEFAULT.

I guess that is the part I'm missing.  How does the call to the SMU to
set the profile back to DEFAULT actually happen?  It seems that both
the put and get functions return early in this case.
The SMU calculates a workload for the given profile and sets it when
we call the *get and *put functions.
When we call the *set function for VIDEO, the SMU will calculate a
workload for VIDEO and set it. When we then call the *put function for
the same profile, the SMU calculates a workload which will be lower or
DEFAULT (0) and then sets it.

Suppose we have called the *set function for the VIDEO profile; then the
SMU will calculate the workload (= 4) and set it. Now, when we call the
*put function for the same profile, the SMU will calculate the workload
(= 0) and set it.


Please see the SMU code below, where the index will be DEFAULT (0) or
lower for the *put function.


if (!en) { // put function
    smu->workload_mask &= ~(1 << smu->workload_prority[type]);
    index = fls(smu->workload_mask);
    index = index > 0 && index <= WORKLOAD_POLICY_MAX ? index - 1 : 0;
    workload = smu->workload_setting[index];
} else { // set function.
    smu->workload_mask |= (1 << smu->workload_prority[type]);
    index = fls(smu->workload_mask);
    index = index <= WORKLOAD_POLICY_MAX ? index - 1 : 0;
    workload = smu->workload_setting[index];
}
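
As a worked example (priority values assumed for illustration only, and
assuming both indices are within WORKLOAD_POLICY_MAX): suppose
workload_prority[VIDEO] = 3 and workload_prority[COMPUTE] = 5.

    set(VIDEO):   workload_mask = 0x08, fls() = 4, index = 3 -> VIDEO setting
    set(COMPUTE): workload_mask = 0x28, fls() = 6, index = 5 -> COMPUTE setting
    put(COMPUTE): workload_mask = 0x08, fls() = 4, index = 3 -> back to VIDEO
    put(VIDEO):   workload_mask = 0x00, fls() = 0, index = 0 -> DEFAULT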

In our case the *set function will set

Re: [PATCH v2 2/7] drm/amdgpu: Add new function to set GPU power profile

2023-08-21 Thread Yadav, Arvind



On 8/21/2023 11:36 PM, Alex Deucher wrote:

On Mon, Aug 21, 2023 at 2:55 AM Arvind Yadav  wrote:

This patch adds a function which will change the GPU
power profile based on a submitted job. This can optimize
the power performance when the workload is on.

v2:
- Splitting workload_profile_set and workload_profile_put
   into two separate patches.
- Addressed review comment.

Cc: Shashank Sharma 
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c  | 56 +++
  drivers/gpu/drm/amd/include/amdgpu_workload.h |  3 +
  2 files changed, 59 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
index 32166f482f77..e661cc5b3d92 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
@@ -24,6 +24,62 @@

  #include "amdgpu.h"

+static enum PP_SMC_POWER_PROFILE
+ring_to_power_profile(uint32_t ring_type)
+{
+   switch (ring_type) {
+   case AMDGPU_RING_TYPE_GFX:
+   return PP_SMC_POWER_PROFILE_FULLSCREEN3D;
+   case AMDGPU_RING_TYPE_COMPUTE:
+   return PP_SMC_POWER_PROFILE_COMPUTE;
+   case AMDGPU_RING_TYPE_UVD:
+   case AMDGPU_RING_TYPE_VCE:
+   case AMDGPU_RING_TYPE_UVD_ENC:
+   case AMDGPU_RING_TYPE_VCN_DEC:
+   case AMDGPU_RING_TYPE_VCN_ENC:
+   case AMDGPU_RING_TYPE_VCN_JPEG:
+   return PP_SMC_POWER_PROFILE_VIDEO;
+   default:
+   return PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT;
+   }
+}
+
+static int
+amdgpu_power_profile_set(struct amdgpu_device *adev,
+enum PP_SMC_POWER_PROFILE profile)
+{
+   int ret = amdgpu_dpm_switch_power_profile(adev, profile, true);
+
+   if (!ret) {
+   /* Set the bit for the submitted workload profile */
+   adev->smu_workload.submit_workload_status |= (1 << profile);
+   atomic_inc(&adev->smu_workload.power_profile_ref[profile]);
+   }
+
+   return ret;
+}
+
+void amdgpu_workload_profile_set(struct amdgpu_device *adev,
+uint32_t ring_type)

Maybe rename this amdgpu_workload_profile_get() to align with put/get
naming semantics?

Noted.


Alex


+{
+   struct amdgpu_smu_workload *workload = &adev->smu_workload;
+   enum PP_SMC_POWER_PROFILE profile = ring_to_power_profile(ring_type);
+   int ret;
+
+   if (profile == PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT)
+   return;
+
+   mutex_lock(&workload->workload_lock);
+
+   ret = amdgpu_power_profile_set(adev, profile);
+   if (ret) {
+   DRM_WARN("Failed to set workload profile to %s, error = %d\n",
+amdgpu_workload_mode_name[profile], ret);
+   }
+
+   mutex_unlock(&workload->workload_lock);
+}
+
  void amdgpu_workload_profile_init(struct amdgpu_device *adev)
  {
 adev->smu_workload.adev = adev;
diff --git a/drivers/gpu/drm/amd/include/amdgpu_workload.h 
b/drivers/gpu/drm/amd/include/amdgpu_workload.h
index 5d0f068422d4..5022f28fc2f9 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_workload.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_workload.h
@@ -46,6 +46,9 @@ static const char * const amdgpu_workload_mode_name[] = {
 "Window3D"
  };

+void amdgpu_workload_profile_set(struct amdgpu_device *adev,
+uint32_t ring_type);
+
  void amdgpu_workload_profile_init(struct amdgpu_device *adev);

  void amdgpu_workload_profile_fini(struct amdgpu_device *adev);
--
2.34.1



Re: [PATCH v2 2/7] drm/amdgpu: Add new function to set GPU power profile

2023-08-21 Thread Yadav, Arvind



On 8/21/2023 9:52 PM, Alex Deucher wrote:

On Mon, Aug 21, 2023 at 2:55 AM Arvind Yadav  wrote:

This patch adds a function which will change the GPU
power profile based on a submitted job. This can optimize
the power performance when the workload is on.

v2:
- Splitting workload_profile_set and workload_profile_put
   into two separate patches.
- Addressed review comment.

Cc: Shashank Sharma 
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c  | 56 +++
  drivers/gpu/drm/amd/include/amdgpu_workload.h |  3 +
  2 files changed, 59 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
index 32166f482f77..e661cc5b3d92 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
@@ -24,6 +24,62 @@

  #include "amdgpu.h"

+static enum PP_SMC_POWER_PROFILE
+ring_to_power_profile(uint32_t ring_type)
+{
+   switch (ring_type) {
+   case AMDGPU_RING_TYPE_GFX:
+   return PP_SMC_POWER_PROFILE_FULLSCREEN3D;
+   case AMDGPU_RING_TYPE_COMPUTE:
+   return PP_SMC_POWER_PROFILE_COMPUTE;
+   case AMDGPU_RING_TYPE_UVD:
+   case AMDGPU_RING_TYPE_VCE:
+   case AMDGPU_RING_TYPE_UVD_ENC:
+   case AMDGPU_RING_TYPE_VCN_DEC:
+   case AMDGPU_RING_TYPE_VCN_ENC:
+   case AMDGPU_RING_TYPE_VCN_JPEG:
+   return PP_SMC_POWER_PROFILE_VIDEO;
+   default:
+   return PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT;
+   }
+}
+
+static int
+amdgpu_power_profile_set(struct amdgpu_device *adev,
+enum PP_SMC_POWER_PROFILE profile)
+{
+   int ret = amdgpu_dpm_switch_power_profile(adev, profile, true);
+
+   if (!ret) {
+   /* Set the bit for the submitted workload profile */
+   adev->smu_workload.submit_workload_status |= (1 << profile);
+   atomic_inc(&adev->smu_workload.power_profile_ref[profile]);
+   }
+
+   return ret;
+}
+
+void amdgpu_workload_profile_set(struct amdgpu_device *adev,
+uint32_t ring_type)
+{
+   struct amdgpu_smu_workload *workload = &adev->smu_workload;
+   enum PP_SMC_POWER_PROFILE profile = ring_to_power_profile(ring_type);
+   int ret;
+
+   if (profile == PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT)
+   return;

Why is this one skipped?  How do we get back to the boot up profile?


Hi Alex,

enum PP_SMC_POWER_PROFILE {
    PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT = 0x0,
    PP_SMC_POWER_PROFILE_FULLSCREEN3D = 0x1,
    PP_SMC_POWER_PROFILE_POWERSAVING  = 0x2,
    PP_SMC_POWER_PROFILE_VIDEO    = 0x3,
    PP_SMC_POWER_PROFILE_VR   = 0x4,
    PP_SMC_POWER_PROFILE_COMPUTE  = 0x5,
    PP_SMC_POWER_PROFILE_CUSTOM   = 0x6,
    PP_SMC_POWER_PROFILE_WINDOW3D = 0x7,
    PP_SMC_POWER_PROFILE_COUNT,
};

These are all the profiles. We use the ones that are greater than
PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT.
Now suppose the profile was DEFAULT and we set it to VIDEO; the SMU will
move the profile to the higher level.

When we reset the VIDEO profile, the SMU will move back to the DEFAULT one.

Our job is to set the profile and reset it after the job is done.
SMU will take care to move to a higher profile and after reset, it will 
move back to DEFAULT.


Thank you,
~Arvind



Alex


+
+   mutex_lock(&workload->workload_lock);
+
+   ret = amdgpu_power_profile_set(adev, profile);
+   if (ret) {
+   DRM_WARN("Failed to set workload profile to %s, error = %d\n",
+amdgpu_workload_mode_name[profile], ret);
+   }
+
+   mutex_unlock(&workload->workload_lock);
+}
+
  void amdgpu_workload_profile_init(struct amdgpu_device *adev)
  {
 adev->smu_workload.adev = adev;
diff --git a/drivers/gpu/drm/amd/include/amdgpu_workload.h 
b/drivers/gpu/drm/amd/include/amdgpu_workload.h
index 5d0f068422d4..5022f28fc2f9 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_workload.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_workload.h
@@ -46,6 +46,9 @@ static const char * const amdgpu_workload_mode_name[] = {
 "Window3D"
  };

+void amdgpu_workload_profile_set(struct amdgpu_device *adev,
+uint32_t ring_type);
+
  void amdgpu_workload_profile_init(struct amdgpu_device *adev);

  void amdgpu_workload_profile_fini(struct amdgpu_device *adev);
--
2.34.1



Re: [PATCH v2 3/7] drm/amdgpu: Add new function to put GPU power profile

2023-08-21 Thread Yadav, Arvind



On 8/21/2023 7:09 PM, Shashank Sharma wrote:


On 21/08/2023 08:47, Arvind Yadav wrote:

This patch adds a function which will clear the GPU
power profile after the job is finished.

This is how it works:
- The scheduler will set the GPU power profile based on ring_type.
- The scheduler will clear the GPU power profile once the job is finished.
- Here, the *_workload_profile_set function will set the GPU
   power profile and the *_workload_profile_put function will
   schedule the smu_delayed_work task after a 100 ms delay. This
   smu_delayed_work task will clear a GPU power profile if no
   new jobs are scheduled within 100 ms. But if any new job
   comes within 100 ms, then the *_workload_profile_set function
   will cancel this work and set the GPU power profile based on
   preferences.

v2:
- Splitting workload_profile_set and workload_profile_put
   into two separate patches.
- Addressed review comment.

Cc: Shashank Sharma 
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c  | 97 +++
  drivers/gpu/drm/amd/include/amdgpu_workload.h |  3 +
  2 files changed, 100 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c

index e661cc5b3d92..6367eb88a44d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
@@ -24,6 +24,9 @@
    #include "amdgpu.h"
+/* 100 millisecond timeout */
+#define SMU_IDLE_TIMEOUT    msecs_to_jiffies(100)
+
  static enum PP_SMC_POWER_PROFILE
  ring_to_power_profile(uint32_t ring_type)
  {
@@ -59,6 +62,80 @@ amdgpu_power_profile_set(struct amdgpu_device *adev,
  return ret;
  }
  +static int
+amdgpu_power_profile_clear(struct amdgpu_device *adev,
+   enum PP_SMC_POWER_PROFILE profile)
+{
+    int ret = amdgpu_dpm_switch_power_profile(adev, profile, false);
+
+    if (!ret) {
+    /* Clear the bit for the submitted workload profile */
+    adev->smu_workload.submit_workload_status &= ~(1 << profile);
+    }
+
+    return ret;
+}
+
+static void
+amdgpu_power_profile_idle_work_handler(struct work_struct *work)
+{
+
+    struct amdgpu_smu_workload *workload = container_of(work,
+  struct amdgpu_smu_workload,
+  smu_delayed_work.work);
+    struct amdgpu_device *adev = workload->adev;
+    bool reschedule = false;
+    int index  = fls(workload->submit_workload_status);
+    int ret;
+
We should check the validity and range of index here before using it
below.

Noted.



+ mutex_lock(&workload->workload_lock);
+    for (; index > 0; index--) {
+    int val = atomic_read(&workload->power_profile_ref[index]);
+
+    if (val) {
+    reschedule = true;
+    } else {
+    if (workload->submit_workload_status &
+    (1 << index)) {
+    ret = amdgpu_power_profile_clear(adev, index);
+    if (ret) {
+    DRM_WARN("Failed to clear workload %s,error = 
%d\n",

+ amdgpu_workload_mode_name[index], ret);
+    goto exit;
Instead of exiting, we might want to continue the loop here, just to
check whether we are able to reset another profile in the next attempt.

Noted.

+    }
+    }
+    }
+    }

A blank line is recommended here.

Noted.

+    if (reschedule)
+ schedule_delayed_work(&workload->smu_delayed_work,
+  SMU_IDLE_TIMEOUT);
+exit:
+    mutex_unlock(&workload->workload_lock);
+}
+
+void amdgpu_workload_profile_put(struct amdgpu_device *adev,
+ uint32_t ring_type)
+{
+    struct amdgpu_smu_workload *workload = &adev->smu_workload;
+    enum PP_SMC_POWER_PROFILE profile = 
ring_to_power_profile(ring_type);

+
+    if (profile == PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT)
+    return;
+
+    mutex_lock(&workload->workload_lock);
+
+    if (!atomic_read(&workload->power_profile_ref[profile])) {
+    DRM_WARN("Power profile %s ref. count error\n",
+ amdgpu_workload_mode_name[profile]);
+    } else {
+ atomic_dec(&workload->power_profile_ref[profile]);
+ schedule_delayed_work(&workload->smu_delayed_work,
+  SMU_IDLE_TIMEOUT);
We don't want to schedule this work every time a power profile is put;
we want to do that only when a power profile's ref count reaches '0'.
So you might want to check the ref_count, and schedule the work under
an if (!ref_count) condition.
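
i.e. roughly (a sketch of the suggested change, not the actual patch):

void example_workload_profile_put(struct amdgpu_device *adev,
                                  uint32_t ring_type)
{
    struct amdgpu_smu_workload *workload = &adev->smu_workload;
    enum PP_SMC_POWER_PROFILE profile = ring_to_power_profile(ring_type);

    if (profile == PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT)
        return;

    mutex_lock(&workload->workload_lock);
    if (!atomic_read(&workload->power_profile_ref[profile])) {
        DRM_WARN("Power profile %s ref. count error\n",
                 amdgpu_workload_mode_name[profile]);
    } else if (atomic_dec_return(&workload->power_profile_ref[profile]) == 0) {
        /* Last user of this profile: arm the idle work to clear it. */
        schedule_delayed_work(&workload->smu_delayed_work,
                              SMU_IDLE_TIMEOUT);
    }
    mutex_unlock(&workload->workload_lock);
}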

Noted.



+    }
+
+    mutex_unlock(&workload->workload_lock);
+}
+
  void amdgpu_workload_profile_set(struct amdgpu_device *adev,
   uint32_t ring_type)
  {
@@ -70,13 +147,30 @@ void amdgpu_workload_profile_set(struct 
amdgpu_device *adev,

  return;
    mutex_lock(&workload->workload_lock);
+ cancel_delayed_work_sync(&workload->smu_delayed_work);
    ret = amdgpu_power_profile_set(adev, profile);
  if (ret) {
  DRM_WARN("Fa

Re: [PATCH v2 1/7] drm/amdgpu: Added init/fini functions for workload

2023-08-21 Thread Yadav, Arvind



On 8/21/2023 7:24 PM, Shashank Sharma wrote:


On 21/08/2023 15:35, Yadav, Arvind wrote:


On 8/21/2023 6:36 PM, Shashank Sharma wrote:

Hey Arvind,

On 21/08/2023 08:47, Arvind Yadav wrote:

This patch adds the 'struct amdgpu_smu_workload'
initialization/cleanup functions.

v2:
- Splitting big patch into separate patches.
- Added new fini function.

Cc: Shashank Sharma 
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/Makefile   |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  3 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  4 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c  | 44 +++
  drivers/gpu/drm/amd/include/amdgpu_workload.h | 53 
+++

  5 files changed, 105 insertions(+), 1 deletion(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
  create mode 100644 drivers/gpu/drm/amd/include/amdgpu_workload.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile

index 415a7fa395c4..6a9e187d61e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -60,7 +60,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
  amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
  amdgpu_fw_attestation.o amdgpu_securedisplay.o \
  amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
-    amdgpu_ring_mux.o
+    amdgpu_ring_mux.o amdgpu_workload.o
    amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h

index 02b827785e39..1939fa1af8a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -107,6 +107,7 @@
  #include "amdgpu_fdinfo.h"
  #include "amdgpu_mca.h"
  #include "amdgpu_ras.h"
+#include "amdgpu_workload.h"
    #define MAX_GPU_INSTANCE    16
  @@ -1050,6 +1051,8 @@ struct amdgpu_device {
    bool    job_hang;
  bool    dc_enabled;
+
+    struct amdgpu_smu_workload    smu_workload;
  };
    static inline struct amdgpu_device *drm_to_adev(struct 
drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 5c7d40873ee2..cd3bf641b630 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2243,6 +2243,8 @@ static int amdgpu_device_ip_early_init(struct 
amdgpu_device *adev)

  adev->cg_flags &= amdgpu_cg_mask;
  adev->pg_flags &= amdgpu_pg_mask;
  +    amdgpu_workload_profile_init(adev);
+
  return 0;
  }
  @@ -2890,6 +2892,8 @@ static int amdgpu_device_ip_fini(struct 
amdgpu_device *adev)

  {
  int i, r;
  +    amdgpu_workload_profile_fini(adev);
+
  if (amdgpu_sriov_vf(adev) && adev->virt.ras_init_done)
  amdgpu_virt_release_ras_err_handler_data(adev);
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c

new file mode 100644
index ..32166f482f77
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person 
obtaining a
+ * copy of this software and associated documentation files (the 
"Software"),
+ * to deal in the Software without restriction, including without 
limitation
+ * the rights to use, copy, modify, merge, publish, distribute, 
sublicense,
+ * and/or sell copies of the Software, and to permit persons to 
whom the
+ * Software is furnished to do so, subject to the following 
conditions:

+ *
+ * The above copyright notice and this permission notice shall be 
included in

+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO 
EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, 
DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 
USE OR

+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "amdgpu.h"
+
+void amdgpu_workload_profile_init(struct amdgpu_device *adev)
+{
+    adev->smu_workload.adev = adev;
+    adev->smu_workload.submit_workload_status = 0;
+    adev->smu_workload.initialized = true;

why do we need this variable ?


Hi Shashank,

If any error comes up while the device is booting, then amdgpu will start
unloading everything. So I am using 'initialized' to unload the driver
successfully. This variable is to identify that the drive
Re: [PATCH v2 5/7] drm/amdgpu: Switch on/off GPU workload profile

2023-08-21 Thread Yadav, Arvind



On 8/21/2023 7:16 PM, Shashank Sharma wrote:


On 21/08/2023 08:47, Arvind Yadav wrote:

This patch switches the GPU workload profile based
on the submitted job. The workload profile is reset to
default when the job is done.

Cc: Shashank Sharma 
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c

index c3d9d75143f4..c2b0fda6ba26 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -176,6 +176,9 @@ void amdgpu_job_free_resources(struct amdgpu_job 
*job)

  static void amdgpu_job_free_cb(struct drm_sched_job *s_job)
  {
  struct amdgpu_job *job = to_amdgpu_job(s_job);
+    struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
+
+    amdgpu_workload_profile_put(ring->adev, ring->funcs->type);
    drm_sched_job_cleanup(s_job);
  @@ -295,6 +298,8 @@ static struct dma_fence *amdgpu_job_run(struct 
drm_sched_job *sched_job)

  DRM_ERROR("Error scheduling IBs (%d)\n", r);
  }
  +    amdgpu_workload_profile_set(adev, ring->funcs->type);
+
  job->job_run_counter++;
  amdgpu_job_free_resources(job);


Instead of calling it switch on/off in the title, may we call it
set/reset GPU workload profile?


With that minor nitpick handled, please feel free to use:


Noted.

Thank You
~Arvind

Reviewed-by: Shashank Sharma 

- Shashank



Re: [PATCH v2 4/7] drm/amdgpu: Add suspend function to clear the GPU power profile.

2023-08-21 Thread Yadav, Arvind



On 8/21/2023 7:13 PM, Shashank Sharma wrote:


On 21/08/2023 08:47, Arvind Yadav wrote:

This patch adds a suspend function that will clear the GPU
power profile before going into suspend state.

v2:
- Add the new suspend function based on review comment.

Cc: Shashank Sharma 
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c  | 23 +++
  drivers/gpu/drm/amd/include/amdgpu_workload.h |  2 ++
  3 files changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index cd3bf641b630..3b70e657b439 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4212,6 +4212,8 @@ int amdgpu_device_suspend(struct drm_device 
*dev, bool fbcon)

    amdgpu_ras_suspend(adev);
  +    amdgpu_workload_profile_suspend(adev);
+
  amdgpu_device_ip_suspend_phase1(adev);
    if (!adev->in_s0ix)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c

index 6367eb88a44d..44ca8e986984 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
@@ -174,6 +174,29 @@ void amdgpu_workload_profile_set(struct 
amdgpu_device *adev,

  mutex_unlock(&workload->workload_lock);
  }
  +void amdgpu_workload_profile_suspend(struct amdgpu_device *adev)
+{
+    struct amdgpu_smu_workload *workload = &adev->smu_workload;
+    int ret;
+
+    mutex_lock(&workload->workload_lock);
+ cancel_delayed_work_sync(&workload->smu_delayed_work);
+
+    /* Clear all the set GPU power profile*/
+    for (int index = fls(workload->submit_workload_status);
+ index > 0; index--) {
+    if (workload->submit_workload_status & (1 << index)) {
+ atomic_set(&workload->power_profile_ref[index], 0);
+    ret = amdgpu_power_profile_clear(adev, index);


Why do we need the checks here? Can't we simply call
power_profile_clear() for all profiles?


Hi Shashank,

If we use only one profile, then why clear the others? But I can remove
the check and clear all the profiles as per your suggestion.


Thank you,
~Arvind


- Shashank


+    if (ret)
+    DRM_WARN("Failed to clear power profile %s, err = 
%d\n",

+ amdgpu_workload_mode_name[index], ret);
+    }
+    }




+    workload->submit_workload_status = 0;
+    mutex_unlock(&workload->workload_lock);
+}
+
  void amdgpu_workload_profile_init(struct amdgpu_device *adev)
  {
  adev->smu_workload.adev = adev;
diff --git a/drivers/gpu/drm/amd/include/amdgpu_workload.h 
b/drivers/gpu/drm/amd/include/amdgpu_workload.h

index ee1f87257f2d..0acd8769ec52 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_workload.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_workload.h
@@ -52,6 +52,8 @@ void amdgpu_workload_profile_put(struct 
amdgpu_device *adev,

  void amdgpu_workload_profile_set(struct amdgpu_device *adev,
   uint32_t ring_type);
  +void amdgpu_workload_profile_suspend(struct amdgpu_device *adev);
+
  void amdgpu_workload_profile_init(struct amdgpu_device *adev);
    void amdgpu_workload_profile_fini(struct amdgpu_device *adev);


Re: [PATCH v2 1/7] drm/amdgpu: Added init/fini functions for workload

2023-08-21 Thread Yadav, Arvind



On 8/21/2023 6:36 PM, Shashank Sharma wrote:

Hey Arvind,

On 21/08/2023 08:47, Arvind Yadav wrote:

This patch adds the 'struct amdgpu_smu_workload'
initialization/cleanup functions.

v2:
- Splitting big patch into separate patches.
- Added new fini function.

Cc: Shashank Sharma 
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/Makefile   |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  3 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  4 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c  | 44 +++
  drivers/gpu/drm/amd/include/amdgpu_workload.h | 53 +++
  5 files changed, 105 insertions(+), 1 deletion(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
  create mode 100644 drivers/gpu/drm/amd/include/amdgpu_workload.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile

index 415a7fa395c4..6a9e187d61e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -60,7 +60,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
  amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
  amdgpu_fw_attestation.o amdgpu_securedisplay.o \
  amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
-    amdgpu_ring_mux.o
+    amdgpu_ring_mux.o amdgpu_workload.o
    amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h

index 02b827785e39..1939fa1af8a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -107,6 +107,7 @@
  #include "amdgpu_fdinfo.h"
  #include "amdgpu_mca.h"
  #include "amdgpu_ras.h"
+#include "amdgpu_workload.h"
    #define MAX_GPU_INSTANCE    16
  @@ -1050,6 +1051,8 @@ struct amdgpu_device {
    bool    job_hang;
  bool    dc_enabled;
+
+    struct amdgpu_smu_workload    smu_workload;
  };
    static inline struct amdgpu_device *drm_to_adev(struct drm_device 
*ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 5c7d40873ee2..cd3bf641b630 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2243,6 +2243,8 @@ static int amdgpu_device_ip_early_init(struct 
amdgpu_device *adev)

  adev->cg_flags &= amdgpu_cg_mask;
  adev->pg_flags &= amdgpu_pg_mask;
  +    amdgpu_workload_profile_init(adev);
+
  return 0;
  }
  @@ -2890,6 +2892,8 @@ static int amdgpu_device_ip_fini(struct 
amdgpu_device *adev)

  {
  int i, r;
  +    amdgpu_workload_profile_fini(adev);
+
  if (amdgpu_sriov_vf(adev) && adev->virt.ras_init_done)
  amdgpu_virt_release_ras_err_handler_data(adev);
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c

new file mode 100644
index ..32166f482f77
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person 
obtaining a
+ * copy of this software and associated documentation files (the 
"Software"),
+ * to deal in the Software without restriction, including without 
limitation
+ * the rights to use, copy, modify, merge, publish, distribute, 
sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom 
the

+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be 
included in

+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO 
EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, 
DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 
USE OR

+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "amdgpu.h"
+
+void amdgpu_workload_profile_init(struct amdgpu_device *adev)
+{
+    adev->smu_workload.adev = adev;
+    adev->smu_workload.submit_workload_status = 0;
+    adev->smu_workload.initialized = true;

Why do we need this variable?


Hi Shashank,

If any error occurs while the device is booting, amdgpu will start unloading 
everything. So I am using 'initialized' so that the driver can be unloaded 
cleanly. This variable identifies whether the driver is loaded or not.
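
For illustration only, a minimal sketch of how the 'initialized' guard might
be used on the teardown path (the body of fini here is an assumption, not
taken from this series):

void amdgpu_workload_profile_fini(struct amdgpu_device *adev)
{
	struct amdgpu_smu_workload *workload = &adev->smu_workload;

	/* Bail out if init never completed, e.g. on an early probe failure. */
	if (!workload->initialized)
		return;

	workload->initialized = false;
	workload->submit_workload_status = 0;
}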


Below is the error for which the amdgpu driver unloads when it cannot get the 
firmware:


[   12.421609] amdgpu :08:00.0: Direct firmware load for 
amdgpu/renoir_ta.bin failed with e

Re: [PATCH 1/3] drm/amdgpu: Add new api to switch on/off power profile mode

2023-08-17 Thread Yadav, Arvind



On 8/14/2023 8:28 PM, Shashank Sharma wrote:

Hey Arvind,

On 14/08/2023 09:34, Arvind Yadav wrote:

This patch adds a function which will allow changing the
GPU power profile based on a submitted job. This can
optimize power and performance while the workload is active.

Cc: Shashank Sharma 
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/Makefile   |   2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   3 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |   2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c  | 156 ++
  drivers/gpu/drm/amd/include/amdgpu_workload.h |  44 +
  5 files changed, 206 insertions(+), 1 deletion(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
  create mode 100644 drivers/gpu/drm/amd/include/amdgpu_workload.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile

index 415a7fa395c4..6a9e187d61e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -60,7 +60,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
  amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
  amdgpu_fw_attestation.o amdgpu_securedisplay.o \
  amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
-    amdgpu_ring_mux.o
+    amdgpu_ring_mux.o amdgpu_workload.o
    amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h

index 02b827785e39..1939fa1af8a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -107,6 +107,7 @@
  #include "amdgpu_fdinfo.h"
  #include "amdgpu_mca.h"
  #include "amdgpu_ras.h"
+#include "amdgpu_workload.h"
    #define MAX_GPU_INSTANCE    16
  @@ -1050,6 +1051,8 @@ struct amdgpu_device {
    bool    job_hang;
  bool    dc_enabled;
+
+    struct amdgpu_smu_workload    smu_workload;
  };
    static inline struct amdgpu_device *drm_to_adev(struct drm_device 
*ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 5c7d40873ee2..0ec18b8fe29f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3672,6 +3672,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
    INIT_WORK(&adev->xgmi_reset_work, 
amdgpu_device_xgmi_reset_func);

  +    amdgpu_smu_workload_init(adev);
+
  adev->gfx.gfx_off_req_count = 1;
  adev->gfx.gfx_off_residency = 0;
  adev->gfx.gfx_off_entrycount = 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c

new file mode 100644
index ..ce0339d75c12
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
@@ -0,0 +1,156 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person 
obtaining a
+ * copy of this software and associated documentation files (the 
"Software"),
+ * to deal in the Software without restriction, including without 
limitation
+ * the rights to use, copy, modify, merge, publish, distribute, 
sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom 
the

+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be 
included in

+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO 
EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, 
DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 
USE OR

+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "amdgpu.h"
+
+/* 100 millisecond timeout */
+#define SMU_IDLE_TIMEOUT    msecs_to_jiffies(100)
+
+static enum PP_SMC_POWER_PROFILE
+ring_to_power_profile(uint32_t ring_type)
+{
+    switch (ring_type) {
+    case AMDGPU_RING_TYPE_GFX:
+    return PP_SMC_POWER_PROFILE_FULLSCREEN3D;
+    case AMDGPU_RING_TYPE_COMPUTE:
+    return PP_SMC_POWER_PROFILE_COMPUTE;
+    case AMDGPU_RING_TYPE_UVD:
+    case AMDGPU_RING_TYPE_VCE:
+    case AMDGPU_RING_TYPE_UVD_ENC:
+    case AMDGPU_RING_TYPE_VCN_DEC:
+    case AMDGPU_RING_TYPE_VCN_ENC:
+    case AMDGPU_RING_TYPE_VCN_JPEG:
+    return PP_SMC_POWER_PROFILE_VIDEO;
+    default:
+    return PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT;
+    }
+}
+
+static void
+amdgpu_power_profile_set(struct amdgpu_device *adev,
+ enum PP_SMC_POWER_PROFILE profile)
This function expects the caller to hold the smu_workload_mutex, may 
be 
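
The review point here is that amdgpu_power_profile_set() expects its caller to
already hold the workload mutex. A hedged sketch of making that contract
explicit with lockdep (assuming the lock is the workload_lock used elsewhere
in this series; the body is illustrative, not the posted patch):

static void
amdgpu_power_profile_set(struct amdgpu_device *adev,
			 enum PP_SMC_POWER_PROFILE profile)
{
	int ret;

	/* Caller must hold the workload lock; document and enforce it. */
	lockdep_assert_held(&adev->smu_workload.workload_lock);

	ret = amdgpu_dpm_switch_power_profile(adev, profile, true);
	if (ret)
		DRM_WARN("Failed to switch power profile to %d, err = %d\n",
			 profile, ret);
}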

Re: [PATCH 3/3] Revert "drm/amd/amdgpu: switch on/off vcn power profile mode"

2023-08-17 Thread Yadav, Arvind



On 8/14/2023 9:35 PM, Shashank Sharma wrote:

Ah, thanks for pointing that out, Alex.

@Arvind, please refer to the patch 
(https://patchwork.freedesktop.org/patch/504854/?series=109060&rev=4) 
in the previous series of SMU workload hints with UAPI (here: 
https://patchwork.freedesktop.org/series/109060/).



Thank you Shashank and Alex. I will handle the KFD.

Regards,

Arvind
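
For context, the KFD path Alex mentions below currently toggles the compute
profile directly when queues become active or idle, roughly as in the sketch
here (simplified from amdgpu_amdkfd; treat the exact shape as an assumption):

void amdgpu_amdkfd_set_compute_idle(struct amdgpu_device *adev, bool idle)
{
	/*
	 * KFD flips the COMPUTE profile on queue activity; the new
	 * workload-hint API has to account for this reference as well.
	 */
	amdgpu_dpm_switch_power_profile(adev,
					PP_SMC_POWER_PROFILE_COMPUTE,
					!idle);
}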


Regards

Shashank

On 14/08/2023 17:20, Alex Deucher wrote:

KFD also changes the profile when queues are active.  Please make sure
that is properly taken into account as well.

Alex

On Mon, Aug 14, 2023 at 3:36 AM Arvind Yadav  
wrote:

This reverts commit 5ce71f59bb9bd3d8a09b96afdbc92975cb6dc303.

Reason for revert: a new amdgpu_smu* API is added to switch
the profile mode on/off. The new API will allow changing the
GPU power profile based on a submitted job.

Cc: Shashank Sharma 
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 14 ++
  1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c

index 2d94f1b63bd6..70777fcfa626 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -363,7 +363,6 @@ static void amdgpu_vcn_idle_work_handler(struct 
work_struct *work)
 container_of(work, struct amdgpu_device, 
vcn.idle_work.work);
 unsigned int fences = 0, fence[AMDGPU_MAX_VCN_INSTANCES] = 
{0};

 unsigned int i, j;
-   int r = 0;

 for (j = 0; j < adev->vcn.num_vcn_inst; ++j) {
 if (adev->vcn.harvest_config & (1 << j))
@@ -392,10 +391,6 @@ static void amdgpu_vcn_idle_work_handler(struct 
work_struct *work)
 if (!fences && 
!atomic_read(&adev->vcn.total_submission_cnt)) {
 amdgpu_device_ip_set_powergating_state(adev, 
AMD_IP_BLOCK_TYPE_VCN,

    AMD_PG_STATE_GATE);
-   r = amdgpu_dpm_switch_power_profile(adev, 
PP_SMC_POWER_PROFILE_VIDEO,

-   false);
-   if (r)
-   dev_warn(adev->dev, "(%d) failed to disable 
video power profile mode\n", r);

 } else {
schedule_delayed_work(&adev->vcn.idle_work, VCN_IDLE_TIMEOUT);
 }
@@ -404,16 +399,11 @@ static void 
amdgpu_vcn_idle_work_handler(struct work_struct *work)

  void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
  {
 struct amdgpu_device *adev = ring->adev;
-   int r = 0;

 atomic_inc(&adev->vcn.total_submission_cnt);

-   if (!cancel_delayed_work_sync(&adev->vcn.idle_work)) {
-   r = amdgpu_dpm_switch_power_profile(adev, 
PP_SMC_POWER_PROFILE_VIDEO,

-   true);
-   if (r)
-   dev_warn(adev->dev, "(%d) failed to switch 
to video power profile mode\n", r);

-   }
+   if (!cancel_delayed_work_sync(&adev->vcn.idle_work))
+   amdgpu_gfx_off_ctrl(adev, false);

 mutex_lock(&adev->vcn.vcn_pg_lock);
 amdgpu_device_ip_set_powergating_state(adev, 
AMD_IP_BLOCK_TYPE_VCN,

--
2.34.1



Re: [PATCH 1/3] drm/amdgpu: Add new api to switch on/off power profile mode

2023-08-17 Thread Yadav, Arvind



On 8/14/2023 8:03 PM, Alex Deucher wrote:

On Mon, Aug 14, 2023 at 3:35 AM Arvind Yadav  wrote:

This patch adds a function which will allow changing the
GPU power profile based on a submitted job. This can
optimize power and performance while the workload is active.

A few minor comments inline below.  One thing to double check is that
we properly cancel this work before a suspend or driver unload.  We
need to make sure this is taken care of before we take down the SMU.

Alex


Noted,


Cc: Shashank Sharma 
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/Makefile   |   2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   3 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|   2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c  | 156 ++
  drivers/gpu/drm/amd/include/amdgpu_workload.h |  44 +
  5 files changed, 206 insertions(+), 1 deletion(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
  create mode 100644 drivers/gpu/drm/amd/include/amdgpu_workload.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 415a7fa395c4..6a9e187d61e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -60,7 +60,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
 amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
 amdgpu_fw_attestation.o amdgpu_securedisplay.o \
 amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
-   amdgpu_ring_mux.o
+   amdgpu_ring_mux.o amdgpu_workload.o

  amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 02b827785e39..1939fa1af8a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -107,6 +107,7 @@
  #include "amdgpu_fdinfo.h"
  #include "amdgpu_mca.h"
  #include "amdgpu_ras.h"
+#include "amdgpu_workload.h"

  #define MAX_GPU_INSTANCE   16

@@ -1050,6 +1051,8 @@ struct amdgpu_device {

 booljob_hang;
 booldc_enabled;
+
+   struct amdgpu_smu_workload  smu_workload;
  };

  static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5c7d40873ee2..0ec18b8fe29f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3672,6 +3672,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,

 INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);

+   amdgpu_smu_workload_init(adev);
+
 adev->gfx.gfx_off_req_count = 1;
 adev->gfx.gfx_off_residency = 0;
 adev->gfx.gfx_off_entrycount = 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
new file mode 100644
index ..ce0339d75c12
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
@@ -0,0 +1,156 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "amdgpu.h"
+
+/* 100 millisecond timeout */
+#define SMU_IDLE_TIMEOUT   msecs_to_jiffies(100)
+
+static enum PP_SMC_POWER_PROFILE
+ring_to_power_profile(uint32_t ring_type)
+{
+   switch (ring_type) {
+   case AMDGPU_RING_TYPE_GFX:
+   return PP_SMC_POWER_PROFILE_FULLSCREEN3D;
+   case AMDGPU_RING_TYPE_COMPUTE:
+   return PP_SMC_POWER_PROFILE_COMPUTE;
+   case AMDGPU_RING_TYPE_UVD:
+   case AMDGPU_RING_TYPE_VCE:
+   case AMDGPU_RING_TYPE_UVD_ENC:
+   case AMDGPU_RING_TYPE_VCN_DEC:
+   case AMDGPU_RING_TYPE_VCN_ENC:
+   case AMDGPU_RING_TYPE_VCN_JPEG:
+   return PP_SMC_

Re: [PATCH 1/3] drm/amdgpu: Add new api to switch on/off power profile mode

2023-08-14 Thread Yadav, Arvind



On 8/14/2023 5:35 PM, Christian König wrote:

Am 14.08.23 um 09:34 schrieb Arvind Yadav:

This patch adds a function which will allow changing the
GPU power profile based on a submitted job. This can
optimize power and performance while the workload is active.

Cc: Shashank Sharma 
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/Makefile   |   2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   3 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |   2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c  | 156 ++
  drivers/gpu/drm/amd/include/amdgpu_workload.h |  44 +
  5 files changed, 206 insertions(+), 1 deletion(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
  create mode 100644 drivers/gpu/drm/amd/include/amdgpu_workload.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile

index 415a7fa395c4..6a9e187d61e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -60,7 +60,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
  amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
  amdgpu_fw_attestation.o amdgpu_securedisplay.o \
  amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
-    amdgpu_ring_mux.o
+    amdgpu_ring_mux.o amdgpu_workload.o
    amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h

index 02b827785e39..1939fa1af8a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -107,6 +107,7 @@
  #include "amdgpu_fdinfo.h"
  #include "amdgpu_mca.h"
  #include "amdgpu_ras.h"
+#include "amdgpu_workload.h"
    #define MAX_GPU_INSTANCE    16
  @@ -1050,6 +1051,8 @@ struct amdgpu_device {
    bool    job_hang;
  bool    dc_enabled;
+
+    struct amdgpu_smu_workload    smu_workload;
  };
    static inline struct amdgpu_device *drm_to_adev(struct drm_device 
*ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 5c7d40873ee2..0ec18b8fe29f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3672,6 +3672,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
    INIT_WORK(&adev->xgmi_reset_work, 
amdgpu_device_xgmi_reset_func);

  +    amdgpu_smu_workload_init(adev);
+
  adev->gfx.gfx_off_req_count = 1;
  adev->gfx.gfx_off_residency = 0;
  adev->gfx.gfx_off_entrycount = 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c

new file mode 100644
index ..ce0339d75c12
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
@@ -0,0 +1,156 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person 
obtaining a
+ * copy of this software and associated documentation files (the 
"Software"),
+ * to deal in the Software without restriction, including without 
limitation
+ * the rights to use, copy, modify, merge, publish, distribute, 
sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom 
the

+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be 
included in

+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO 
EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, 
DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 
USE OR

+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "amdgpu.h"
+
+/* 100 millisecond timeout */
+#define SMU_IDLE_TIMEOUT    msecs_to_jiffies(100)
+
+static enum PP_SMC_POWER_PROFILE
+ring_to_power_profile(uint32_t ring_type)
+{
+    switch (ring_type) {
+    case AMDGPU_RING_TYPE_GFX:
+    return PP_SMC_POWER_PROFILE_FULLSCREEN3D;
+    case AMDGPU_RING_TYPE_COMPUTE:
+    return PP_SMC_POWER_PROFILE_COMPUTE;
+    case AMDGPU_RING_TYPE_UVD:
+    case AMDGPU_RING_TYPE_VCE:
+    case AMDGPU_RING_TYPE_UVD_ENC:
+    case AMDGPU_RING_TYPE_VCN_DEC:
+    case AMDGPU_RING_TYPE_VCN_ENC:
+    case AMDGPU_RING_TYPE_VCN_JPEG:
+    return PP_SMC_POWER_PROFILE_VIDEO;
+    default:
+    return PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT;
+    }
+}
+
+static void
+amdgpu_power_profile_set(struct amdgpu_device *adev,
+ enum PP_SMC_POWER_PROFILE profile)
+{
+    int ret = amdgpu_dpm_switch_power_profile(adev, profile, true);
+
+    if (r

Re: [PATCH v3] drm/sched: Fix kernel NULL pointer dereference error

2022-10-18 Thread Yadav, Arvind



On 10/17/2022 8:20 PM, Christian König wrote:

Am 17.10.22 um 16:30 schrieb Arvind Yadav:

- This is purely a timing issue: sometimes the job free happens
before the job is done. To fix this, move the 'dma_fence_cb'
callback from the job (struct drm_sched_job) to the scheduler
fence (struct drm_sched_fence).

- Added a drm_sched_fence_set_parent() function (and other *_parent_cb
helpers) in sched_fence.c. Moved parent fence initialization and callback
installation into it (this is just cleanup).


BUG: kernel NULL pointer dereference, address: 0088
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x) - not-present page
  PGD 0 P4D 0
  Oops:  [#1] PREEMPT SMP NOPTI
  CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.0.0-rc2-custom #1
  Arvind : [dma_fence_default_wait _START] timeout = -1
  Hardware name: AMD Dibbler/Dibbler, BIOS RDB1107CC 09/26/2018
  RIP: 0010:drm_sched_job_done.isra.0+0x11/0x140 [gpu_sched]
  Code: 8b fe ff ff be 03 00 00 00 e8 7b da b7 e3 e9 d4 fe ff ff 66 
0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 49 89 fc 53 
<48> 8b 9f 88 00 00 00 f0 ff 8b f0 00 00 00 48 8b 83 80 01 00 00 f0

  RSP: 0018:b1b1801d4d38 EFLAGS: 00010087
  RAX: c0aa48b0 RBX: b1b1801d4d70 RCX: 0018
  RDX: 36c70afb7c1d RSI: 8a45ca413c60 RDI: 
  RBP: b1b1801d4d50 R08: 00b5 R09: 
  R10:  R11:  R12: 
  R13: b1b1801d4d70 R14: 8a45c416 R15: 8a45c416a708
  FS:  () GS:8a48a0a8() 
knlGS:

  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 0088 CR3: 00014ad5 CR4: 003506e0
  Call Trace:
   
   drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
   dma_fence_signal_timestamp_locked+0x7e/0x110
   dma_fence_signal+0x31/0x60
   amdgpu_fence_process+0xc4/0x140 [amdgpu]
   gfx_v9_0_eop_irq+0x9d/0xd0 [amdgpu]
   amdgpu_irq_dispatch+0xb7/0x210 [amdgpu]
   amdgpu_ih_process+0x86/0x100 [amdgpu]
   amdgpu_irq_handler+0x24/0x60 [amdgpu]
   __handle_irq_event_percpu+0x4b/0x190
   handle_irq_event_percpu+0x15/0x50
   handle_irq_event+0x39/0x60
   handle_edge_irq+0xaf/0x210
   __common_interrupt+0x6e/0x110
   common_interrupt+0xc1/0xe0
   
   

Signed-off-by: Arvind Yadav 
---

Changes in v2: Moved the 'dma_fence_cb' callback from the job
(struct drm_sched_job) to the scheduler fence (struct drm_sched_fence)
instead of adding a NULL check for s_fence.

Changes in v3: Added a drm_sched_fence_set_parent() function (and other
*_parent_cb helpers) in sched_fence.c. Moved parent fence initialization
and callback installation into it (this is just cleanup).

---
  drivers/gpu/drm/scheduler/sched_fence.c | 53 +
  drivers/gpu/drm/scheduler/sched_main.c  | 38 +-
  include/drm/gpu_scheduler.h | 12 +-
  3 files changed, 72 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_fence.c 
b/drivers/gpu/drm/scheduler/sched_fence.c

index 7fd869520ef2..f6808f363261 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -77,6 +77,59 @@ static void drm_sched_fence_free_rcu(struct 
rcu_head *rcu)

  if (!WARN_ON_ONCE(!fence))
  kmem_cache_free(sched_fence_slab, fence);
  }


Please add an empty line here.

I will fix in the next version of patch.



+/**
+ * drm_sched_job_done_cb - the callback for a done job
+ * @f: fence
+ * @cb: fence callbacks
+ */
+static void drm_sched_job_done_cb(struct dma_fence *f, struct 
dma_fence_cb *cb)


Probably best to rename this to something like 
drm_sched_fence_parent_cb().



I will rename in the next version of patch.

+{
+    struct drm_sched_fence *s_fence = container_of(cb, struct 
drm_sched_fence,

+   cb);
+    struct drm_gpu_scheduler *sched = s_fence->sched;
+
+    atomic_dec(&sched->hw_rq_count);
+    atomic_dec(sched->score);
+
+    dma_fence_get(&s_fence->finished);


We should probably make sure that this reference is taken before 
installing the callback.


Here we are signaling the finished fence, and dma_fence_signal() checks the 
reference, so we do not need to check it here.




+    drm_sched_fence_finished(s_fence);
+    dma_fence_put(&s_fence->finished);
+    wake_up_interruptible(&sched->wake_up_worker);
+}
+
+int drm_sched_fence_add_parent_cb(struct dma_fence *fence,
+  struct drm_sched_fence *s_fence)
+{
+    return dma_fence_add_callback(fence, &s_fence->cb,
+  drm_sched_job_done_cb);
+}
+
+bool drm_sched_fence_remove_parent_cb(struct drm_sched_fence *s_fence)
+{
+    return dma_fence_remove_callback(s_fence->parent,
+ &s_fence->cb);
+}


Do we really need separate functions for that?

We can use 'drm_sched_fence_set_parent', but we need to add an extra NULL 
check before installing the callback; otherwise add_callback will throw a 
warning message every time.
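
For illustration, a hedged sketch of a single drm_sched_fence_set_parent()
helper with that NULL check (names follow the v3 patch above; the exact shape
is an assumption, not the final code):

void drm_sched_fence_set_parent(struct drm_sched_fence *s_fence,
				struct dma_fence *fence)
{
	int ret;

	s_fence->parent = dma_fence_get(fence);

	/*
	 * Only install the callback when there is a real parent fence;
	 * otherwise dma_fence_add_callback() would warn every time.
	 */
	if (!s_fence->parent)
		return;

	ret = dma_fence_add_callback(s_fence->parent, &s_fence->cb,
				     drm_sched_job_done_cb);
	if (ret == -ENOENT)
		drm_sched_job_done_cb(s_fence->parent, &s_fence->cb);
	else if (ret)
		DRM_ERROR("fence add callback failed (%d)\n", ret);
}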



Re: [PATCH v2] drm/sched: Fix kernel NULL pointer dereference error

2022-10-13 Thread Yadav, Arvind



On 10/12/2022 7:05 PM, Christian König wrote:
That essentially looks like the right approach, but I would go a few 
steps further.


I think we should add a drm_sched_fence_set_parent() function to 
sched_fence.c and move a good part of the handling into that C file. 
Just a simple signal function which tells the scheduler that it should 
decrement its counter and wake up the main thread.


I have added these two functions in sched_fence.c. Here, I have used the 
drm_sched_job_done() function as-is.

Is that fine?

void drm_sched_fence_add_parent_cb(struct dma_fence *fence,
   struct drm_sched_fence *s_fence)
{
    int r = dma_fence_add_callback(fence, &s_fence->cb,
   drm_sched_job_done_cb);
    if (r == -ENOENT)
    drm_sched_job_done(s_fence);
    else if (r)
    DRM_ERROR("fence add callback failed (%d)\n", r);
}

bool drm_sched_fence_remove_parent_cb(struct drm_sched_fence *s_fence) {
    return dma_fence_remove_callback(s_fence->parent,
 &s_fence->cb);
}

Thanks

~Arvind


Regards,
Christian.

Am 12.10.22 um 15:22 schrieb Arvind Yadav:

This is purely a timing issue: sometimes the job free happens
before the job is done. To fix this, move the 'dma_fence_cb'
callback from the job (struct drm_sched_job) to the scheduler
fence (struct drm_sched_fence).

BUG: kernel NULL pointer dereference, address: 0088
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x) - not-present page
  PGD 0 P4D 0
  Oops:  [#1] PREEMPT SMP NOPTI
  CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.0.0-rc2-custom #1
  Arvind : [dma_fence_default_wait _START] timeout = -1
  Hardware name: AMD Dibbler/Dibbler, BIOS RDB1107CC 09/26/2018
  RIP: 0010:drm_sched_job_done.isra.0+0x11/0x140 [gpu_sched]
  Code: 8b fe ff ff be 03 00 00 00 e8 7b da b7 e3 e9 d4 fe ff ff 66 
0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 49 89 fc 53 
<48> 8b 9f 88 00 00 00 f0 ff 8b f0 00 00 00 48 8b 83 80 01 00 00 f0

  RSP: 0018:b1b1801d4d38 EFLAGS: 00010087
  RAX: c0aa48b0 RBX: b1b1801d4d70 RCX: 0018
  RDX: 36c70afb7c1d RSI: 8a45ca413c60 RDI: 
  RBP: b1b1801d4d50 R08: 00b5 R09: 
  R10:  R11:  R12: 
  R13: b1b1801d4d70 R14: 8a45c416 R15: 8a45c416a708
  FS:  () GS:8a48a0a8() 
knlGS:

  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 0088 CR3: 00014ad5 CR4: 003506e0
  Call Trace:
   
   drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
   dma_fence_signal_timestamp_locked+0x7e/0x110
   dma_fence_signal+0x31/0x60
   amdgpu_fence_process+0xc4/0x140 [amdgpu]
   gfx_v9_0_eop_irq+0x9d/0xd0 [amdgpu]
   amdgpu_irq_dispatch+0xb7/0x210 [amdgpu]
   amdgpu_ih_process+0x86/0x100 [amdgpu]
   amdgpu_irq_handler+0x24/0x60 [amdgpu]
   __handle_irq_event_percpu+0x4b/0x190
   handle_irq_event_percpu+0x15/0x50
   handle_irq_event+0x39/0x60
   handle_edge_irq+0xaf/0x210
   __common_interrupt+0x6e/0x110
   common_interrupt+0xc1/0xe0
   
   

Signed-off-by: Arvind Yadav 
---

Changes in v2: Moved the 'dma_fence_cb' callback from the job
(struct drm_sched_job) to the scheduler fence (struct drm_sched_fence)
instead of adding a NULL check for s_fence.

---
  drivers/gpu/drm/scheduler/sched_main.c | 23 +++
  include/drm/gpu_scheduler.h    |  6 --
  2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c

index 4cc59bae38dd..62d8eca05b92 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -253,13 +253,12 @@ drm_sched_rq_select_entity_fifo(struct 
drm_sched_rq *rq)

    /**
   * drm_sched_job_done - complete a job
- * @s_job: pointer to the job which is done
+ * @s_fence: pointer to the fence of a done job
   *
   * Finish the job's fence and wake up the worker thread.
   */
-static void drm_sched_job_done(struct drm_sched_job *s_job)
+static void drm_sched_job_done(struct drm_sched_fence *s_fence)
  {
-    struct drm_sched_fence *s_fence = s_job->s_fence;
  struct drm_gpu_scheduler *sched = s_fence->sched;
    atomic_dec(&sched->hw_rq_count);
@@ -280,9 +279,9 @@ static void drm_sched_job_done(struct 
drm_sched_job *s_job)

   */
  static void drm_sched_job_done_cb(struct dma_fence *f, struct 
dma_fence_cb *cb)

  {
-    struct drm_sched_job *s_job = container_of(cb, struct 
drm_sched_job, cb);
+    struct drm_sched_fence *s_fence = container_of(cb, struct 
drm_sched_fence, cb);

  -    drm_sched_job_done(s_job);
+    drm_sched_job_done(s_fence);
  }
    /**
@@ -506,7 +505,7 @@ void drm_sched_stop(struct drm_gpu_scheduler 
*sched, struct drm_sched_job *bad)

   list) {
  if (s_job->s_fence->pa

Re: [PATCH] drm/sched: Fix kernel NULL pointer dereference error

2022-09-30 Thread Yadav, Arvind



On 9/30/2022 4:56 PM, Christian König wrote:

Am 30.09.22 um 10:48 schrieb Arvind Yadav:

BUG: kernel NULL pointer dereference, address: 0088
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x) - not-present page
  PGD 0 P4D 0
  Oops:  [#1] PREEMPT SMP NOPTI
  CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.0.0-rc2-custom #1
  Arvind : [dma_fence_default_wait _START] timeout = -1
  Hardware name: AMD Dibbler/Dibbler, BIOS RDB1107CC 09/26/2018
  RIP: 0010:drm_sched_job_done.isra.0+0x11/0x140 [gpu_sched]
  Code: 8b fe ff ff be 03 00 00 00 e8 7b da b7 e3 e9 d4 fe ff ff 66 
0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 49 89 fc 53 
<48> 8b 9f 88 00 00 00 f0 ff 8b f0 00 00 00 48 8b 83 80 01 00 00 f0

  RSP: 0018:b1b1801d4d38 EFLAGS: 00010087
  RAX: c0aa48b0 RBX: b1b1801d4d70 RCX: 0018
  RDX: 36c70afb7c1d RSI: 8a45ca413c60 RDI: 
  RBP: b1b1801d4d50 R08: 00b5 R09: 
  R10:  R11:  R12: 
  R13: b1b1801d4d70 R14: 8a45c416 R15: 8a45c416a708
  FS:  () GS:8a48a0a8() 
knlGS:

  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 0088 CR3: 00014ad5 CR4: 003506e0
  Call Trace:
   
   drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
   dma_fence_signal_timestamp_locked+0x7e/0x110
   dma_fence_signal+0x31/0x60
   amdgpu_fence_process+0xc4/0x140 [amdgpu]
   gfx_v9_0_eop_irq+0x9d/0xd0 [amdgpu]
   amdgpu_irq_dispatch+0xb7/0x210 [amdgpu]
   amdgpu_ih_process+0x86/0x100 [amdgpu]
   amdgpu_irq_handler+0x24/0x60 [amdgpu]
   __handle_irq_event_percpu+0x4b/0x190
   handle_irq_event_percpu+0x15/0x50
   handle_irq_event+0x39/0x60
   handle_edge_irq+0xaf/0x210
   __common_interrupt+0x6e/0x110
   common_interrupt+0xc1/0xe0
   
   


How is this triggered any why haven't we seen it before?


IGT has a few 'amdgpu'-specific test cases which are not related to fences.

While running those test cases I hit this crash, but it is not always 
reproducible.


~Arvind


Christian


Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/scheduler/sched_main.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c

index 6684d88463b4..390272f6b126 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -172,7 +172,12 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
  static void drm_sched_job_done(struct drm_sched_job *s_job)
  {
  struct drm_sched_fence *s_fence = s_job->s_fence;
-    struct drm_gpu_scheduler *sched = s_fence->sched;
+    struct drm_gpu_scheduler *sched;
+
+    if (!s_fence)
+    return;
+
+    sched = s_fence->sched;
    atomic_dec(&sched->hw_rq_count);
  atomic_dec(sched->score);




Re: [PATCH 3/3] dma-buf: Check status of enable-signaling bit on debug

2022-09-30 Thread Yadav, Arvind



On 9/30/2022 12:02 AM, Christian König wrote:

Am 29.09.22 um 20:30 schrieb Yadav, Arvind:


On 9/29/2022 11:48 PM, Christian König wrote:

Am 27.09.22 um 19:24 schrieb Arvind Yadav:

Fence signaling must be enabled to make sure that
the dma_fence_is_signaled_locked() function ever returns true.
Since drivers and implementations sometimes mess this up,
this ensures correct behaviour when DMABUF_DEBUG_ENABLE_SIGNALING
is used during debugging.
This should make any implementation bugs resulting in not
signaled fences much more obvious.


Are all IGT tests now passing with this? That would be a bit 
unfortunate because it means we still have missed the bug in 
drm_syncobj.


IGT has these test cases related to syncobj (syncobj_basic, 
syncobj_timeline and syncobj_wait), and all are passing.


I will check syncobj and let you know.


Maybe CC the Intel list and let their CI systems take a look. That's 
usually rather valuable.


There is one IGT subtest failing which is related to syncobj. I will 
fix it and submit a patch.


 igt_subtest("host-signal-points")
    test_host_signal_points(fd);



Thanks,
Christian.



~Arvind


Christian.



Signed-off-by: Arvind Yadav 
---
  include/linux/dma-fence.h | 5 +
  1 file changed, 5 insertions(+)

diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 775cdc0b4f24..5156dc6be0a6 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -398,6 +398,11 @@ void dma_fence_enable_sw_signaling(struct 
dma_fence *fence);

  static inline bool
  dma_fence_is_signaled_locked(struct dma_fence *fence)
  {
+#ifdef CONFIG_DMABUF_DEBUG_ENABLE_SIGNALING
+    if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
+    return false;
+#endif
+
  if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
  return true;






Re: [PATCH 3/3] dma-buf: Check status of enable-signaling bit on debug

2022-09-29 Thread Yadav, Arvind



On 9/29/2022 11:48 PM, Christian König wrote:

Am 27.09.22 um 19:24 schrieb Arvind Yadav:

Fence signaling must be enabled to make sure that
the dma_fence_is_signaled_locked() function ever returns true.
Since drivers and implementations sometimes mess this up,
this ensures correct behaviour when DMABUF_DEBUG_ENABLE_SIGNALING
is used during debugging.
This should make any implementation bugs resulting in not
signaled fences much more obvious.


Are all IGT tests now passing with this? That would be a bit 
unfortunate because it means we still have missed the bug in drm_syncobj.


IGT has these test cases related to syncobj (syncobj_basic, 
syncobj_timeline and syncobj_wait), and all are passing.


I will check syncobj and let you know.

~Arvind


Christian.



Signed-off-by: Arvind Yadav 
---
  include/linux/dma-fence.h | 5 +
  1 file changed, 5 insertions(+)

diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 775cdc0b4f24..5156dc6be0a6 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -398,6 +398,11 @@ void dma_fence_enable_sw_signaling(struct 
dma_fence *fence);

  static inline bool
  dma_fence_is_signaled_locked(struct dma_fence *fence)
  {
+#ifdef CONFIG_DMABUF_DEBUG_ENABLE_SIGNALING
+    if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
+    return false;
+#endif
+
  if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
  return true;




Re: [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug

2022-09-15 Thread Yadav, Arvind



On 9/15/2022 5:37 PM, Christian König wrote:
Is that sufficient to allow running a desktop on amdgpu with the extra 
check enabled? If yes that would be quite a milestone.



Yes, it is running on amdgpu with the extra config enabled.
What's left is checking the userspace IGT tests, especially the 
sync_file and drm_syncobj tests, which I would expect to have problems with 
this extra check.



Yes, IGT test cases are failing.

~Arvind


Thanks,
Christian.

Am 14.09.22 um 18:43 schrieb Arvind Yadav:

Fence signaling must be enabled to make sure that
the dma_fence_is_signaled() function ever returns true.
Since drivers and implementations sometimes mess this up,
this ensures correct behaviour when DEBUG_WW_MUTEX_SLOWPATH
is used during debugging.
This should make any implementation bugs resulting in not
signaled fences much more obvious.

Arvind Yadav (6):
   [PATCH v4 1/6] dma-buf: Remove the signaled bit status check
   [PATCH v4 2/6] dma-buf: set signaling bit for the stub fence
   [PATCH v4 3/6] dma-buf: Enable signaling on fence for selftests
   [PATCH v4 4/6] dma-buf: dma_fence_wait must enable signaling
   [PATCH v4 5/6] drm/sched: Use parent fence instead of finished
   [PATCH v4 6/6] dma-buf: Check status of enable-signaling bit on debug

  drivers/dma-buf/Kconfig    |  7 +++
  drivers/dma-buf/dma-fence.c    | 16 ++--
  drivers/dma-buf/st-dma-fence-chain.c   |  4 
  drivers/dma-buf/st-dma-fence-unwrap.c  | 22 ++
  drivers/dma-buf/st-dma-fence.c | 16 
  drivers/dma-buf/st-dma-resv.c  | 10 ++
  drivers/gpu/drm/scheduler/sched_main.c |  4 ++--
  include/linux/dma-fence.h  |  5 +
  8 files changed, 76 insertions(+), 8 deletions(-)





Re: [PATCH v3 5/6] drm/sched: Use parent fence instead of finished

2022-09-09 Thread Yadav, Arvind



On 9/9/2022 11:02 PM, Andrey Grodzovsky wrote:
What exactly is the scenario which this patch fixes, in more detail please?



The GPU reset issue started after adding [PATCH 6/6].

Root cause -> In drm_sched_get_cleanup_job() we use the finished fence's 
status bit to check the job status via dma_fence_is_signaled(). If a job is 
signaled (DMA_FENCE_FLAG_SIGNALED_BIT is set), then we cancel the reset 
worker thread.

After applying [patch 6], dma_fence_is_signaled() now also checks that 
signaling is enabled via the DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT bit, but 
signaling is not enabled for the finished fence. As a result, 
dma_fence_is_signaled() always returns false, drm_sched_get_cleanup_job() 
will not cancel the reset worker thread, and the GPU gets reset.

To fix the above issue, Christian suggested that we use the parent (hardware) 
fence instead of the finished fence, because signaling is enabled for the 
parent fence by the call to dma_fence_add_callback(). As a result, 
dma_fence_is_signaled() will return the correct fence status and the reset 
worker thread can be cancelled in drm_sched_get_cleanup_job().
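
In short, the parent fence had dma_fence_add_callback() called on it in
drm_sched_main(), which enables signaling, so the debug-time check passes.
A simplified illustration (a sketch, not the actual scheduler code):

/* Simplified sketch of the check done in drm_sched_get_cleanup_job() */
static bool job_is_done(struct drm_sched_job *job)
{
	/*
	 * The parent (hardware) fence has signaling enabled because a
	 * callback was installed on it, so dma_fence_is_signaled() reports
	 * the real status even with the debug check. The finished fence
	 * never had signaling enabled, so the same call would always
	 * return false for it under CONFIG_DEBUG_WW_MUTEX_SLOWPATH.
	 */
	return dma_fence_is_signaled(job->s_fence->parent);
}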


~arvind


Andrey

On 2022-09-09 13:08, Arvind Yadav wrote:

Use the parent fence instead of the finished fence
to get the job status. This change avoids a GPU
scheduler timeout error which can cause a GPU reset.

Signed-off-by: Arvind Yadav 
---

Changes in v1, v2 - Enabling signaling for the finished fence in sched_main()
is removed.

---
  drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c

index e0ab14e0fb6b..2ac28ad11432 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -829,7 +829,7 @@ drm_sched_get_cleanup_job(struct 
drm_gpu_scheduler *sched)

  job = list_first_entry_or_null(&sched->pending_list,
 struct drm_sched_job, list);
  -    if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
+    if (job && dma_fence_is_signaled(job->s_fence->parent)) {
  /* remove job from pending_list */
  list_del_init(&job->list);
  @@ -841,7 +841,7 @@ drm_sched_get_cleanup_job(struct 
drm_gpu_scheduler *sched)

    if (next) {
  next->s_fence->scheduled.timestamp =
-    job->s_fence->finished.timestamp;
+    job->s_fence->parent->timestamp;
  /* start TO timer for next job */
  drm_sched_start_timeout(sched);
  }


Re: [PATCH v2 2/4] dma-buf: enable signaling for the stub fence on debug

2022-09-09 Thread Yadav, Arvind



On 9/6/2022 12:39 PM, Christian König wrote:



Am 05.09.22 um 18:35 schrieb Arvind Yadav:

On debug builds, enable software signaling for the stub fence,
which is always signaled. This fence should have software
signaling enabled; otherwise the AMD GPU scheduler will cause a
GPU reset due to a GPU scheduler cleanup activity timeout.

Signed-off-by: Arvind Yadav 
---

Changes in v1:
1- Addressed Christian's comment to remove the unnecessary callback.
2- Replaced CONFIG_DEBUG_FS with CONFIG_DEBUG_WW_MUTEX_SLOWPATH.
3- The version of this patch has also changed; previously it was [PATCH 3/4].

---
  drivers/dma-buf/dma-fence.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 066400ed8841..2378b12538c4 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -27,6 +27,10 @@ EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
  static DEFINE_SPINLOCK(dma_fence_stub_lock);
  static struct dma_fence dma_fence_stub;
  +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+static bool __dma_fence_enable_signaling(struct dma_fence *fence);
+#endif
+


I would rename the function to something like 
dma_fence_enable_signaling_locked().


And please don't add any #ifdef if it isn't absolutely necessary. This 
makes the code pretty fragile.



  /*
   * fence context counter: each execution context should have its own
   * fence context, this allows checking if fences belong to the same
@@ -136,6 +140,9 @@ struct dma_fence *dma_fence_get_stub(void)
 &dma_fence_stub_ops,
 &dma_fence_stub_lock,
 0, 0);
+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+    __dma_fence_enable_signaling(&dma_fence_stub);
+#endif


Alternatively in this particular case you could just set the bit 
manually here since this is part of the dma_fence code anyway.


Christian.


As per the review comment, I will set the bit manually.

~arvind
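
Setting the bit manually inside dma_fence_get_stub() would look roughly like
this (a sketch of the agreed direction; the stub fence lives in dma-fence.c,
so no extra helper is needed):

	spin_lock(&dma_fence_stub_lock);
	if (!dma_fence_stub.ops) {
		dma_fence_init(&dma_fence_stub,
			       &dma_fence_stub_ops,
			       &dma_fence_stub_lock,
			       0, 0);
		/*
		 * The stub fence is signaled immediately; mark signaling as
		 * enabled by hand instead of going through a callback.
		 */
		set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
			&dma_fence_stub.flags);
		dma_fence_signal_locked(&dma_fence_stub);
	}
	spin_unlock(&dma_fence_stub_lock);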


dma_fence_signal_locked(&dma_fence_stub);
  }
  spin_unlock(&dma_fence_stub_lock);




Re: [PATCH 2/4] drm/sched: Add callback and enable signaling on debug

2022-09-05 Thread Yadav, Arvind



On 9/5/2022 7:16 PM, Yadav, Arvind wrote:


On 9/5/2022 4:55 PM, Christian König wrote:



Am 05.09.22 um 12:56 schrieb Arvind Yadav:

On debug builds, add an enable_signaling callback for finished
fences and enable software signaling for the finished fence.

Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/scheduler/sched_fence.c | 12 
  drivers/gpu/drm/scheduler/sched_main.c  |  4 +++-
  2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_fence.c 
b/drivers/gpu/drm/scheduler/sched_fence.c

index 7fd869520ef2..ebd26cdb79a0 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -122,16 +122,28 @@ static void 
drm_sched_fence_release_finished(struct dma_fence *f)

    dma_fence_put(&fence->scheduled);
  }
+#ifdef CONFIG_DEBUG_FS
+static bool drm_sched_enable_signaling(struct dma_fence *f)
+{
+    return true;
+}
+#endif
    static const struct dma_fence_ops drm_sched_fence_ops_scheduled = {
  .get_driver_name = drm_sched_fence_get_driver_name,
  .get_timeline_name = drm_sched_fence_get_timeline_name,
+#ifdef CONFIG_DEBUG_FS
+    .enable_signaling = drm_sched_enable_signaling,
+#endif
  .release = drm_sched_fence_release_scheduled,
  };
    static const struct dma_fence_ops drm_sched_fence_ops_finished = {
  .get_driver_name = drm_sched_fence_get_driver_name,
  .get_timeline_name = drm_sched_fence_get_timeline_name,
+#ifdef CONFIG_DEBUG_FS
+    .enable_signaling = drm_sched_enable_signaling,
+#endif


Adding the callback should not be necessary.

sure, I will remove this change.



  .release = drm_sched_fence_release_finished,
  };
  diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c

index e0ab14e0fb6b..140e3d8646e2 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -961,7 +961,9 @@ static int drm_sched_main(void *param)
  s_fence->parent = dma_fence_get(fence);
  /* Drop for original kref_init of the fence */
  dma_fence_put(fence);


Uff, not related to your patch but that looks wrong to me. The 
reference can only be dropped after the call to 
dma_fence_add_callback().



Shall I take care of it in this patch, or should I submit a separate one?


This change was recently added by Andrey Grodzovsky (commit 45ecaea73) to 
fix a negative refcount.



-
+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(&s_fence->finished);
+#endif


This should always be called, independent of the config options set.

Christian.


sure, I will remove the Config check.

~arvind
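
The agreed direction would then make the hunk in drm_sched_main() look roughly
like this (a sketch, not the final patch):

		s_fence->parent = dma_fence_get(fence);

		/*
		 * Always enable signaling on the finished fence, independent
		 * of any debug config option.
		 */
		dma_fence_enable_sw_signaling(&s_fence->finished);

		r = dma_fence_add_callback(fence, &sched_job->cb,
					   drm_sched_job_done_cb);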




  r = dma_fence_add_callback(fence, &sched_job->cb,
 drm_sched_job_done_cb);
  if (r == -ENOENT)




Re: [PATCH 3/4] dma-buf: Add callback and enable signaling on debug

2022-09-05 Thread Yadav, Arvind



On 9/5/2022 4:56 PM, Christian König wrote:



Am 05.09.22 um 12:56 schrieb Arvind Yadav:

On debug builds, add an enable_signaling callback for the stub
fence and enable software signaling for the stub fence, which
is always signaled. This fence should have software signaling
enabled; otherwise the AMD GPU scheduler will cause a GPU reset
due to a GPU scheduler cleanup activity timeout.

Signed-off-by: Arvind Yadav 
---
  drivers/dma-buf/dma-fence.c | 17 +
  1 file changed, 17 insertions(+)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 066400ed8841..0a67af945ef8 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -27,6 +27,10 @@ EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
  static DEFINE_SPINLOCK(dma_fence_stub_lock);
  static struct dma_fence dma_fence_stub;
  +#ifdef CONFIG_DEBUG_FS
+static bool __dma_fence_enable_signaling(struct dma_fence *fence);
+#endif
+
  /*
   * fence context counter: each execution context should have its own
   * fence context, this allows checking if fences belong to the same
@@ -116,9 +120,19 @@ static const char 
*dma_fence_stub_get_name(struct dma_fence *fence)

  return "stub";
  }
  +#ifdef CONFIG_DEBUG_FS
+static bool dma_fence_stub_enable_signaling(struct dma_fence *f)
+{
+    return true;
+}
+#endif


Again, adding the callback is unnecessary.


Sure, I will remove the callback from here and from the other patch also.

~arvind




+
  static const struct dma_fence_ops dma_fence_stub_ops = {
  .get_driver_name = dma_fence_stub_get_name,
  .get_timeline_name = dma_fence_stub_get_name,
+#ifdef CONFIG_DEBUG_FS
+    .enable_signaling =  dma_fence_stub_enable_signaling,
+#endif
  };
    /**
@@ -136,6 +150,9 @@ struct dma_fence *dma_fence_get_stub(void)
 &dma_fence_stub_ops,
 &dma_fence_stub_lock,
 0, 0);
+#ifdef CONFIG_DEBUG_FS
+    __dma_fence_enable_signaling(&dma_fence_stub);
+#endif
  dma_fence_signal_locked(&dma_fence_stub);
  }
  spin_unlock(&dma_fence_stub_lock);




Re: [PATCH 2/4] drm/sched: Add callback and enable signaling on debug

2022-09-05 Thread Yadav, Arvind



On 9/5/2022 4:55 PM, Christian König wrote:



Am 05.09.22 um 12:56 schrieb Arvind Yadav:

On debug builds, add an enable_signaling callback for finished
fences and enable software signaling for the finished fence.

Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/scheduler/sched_fence.c | 12 
  drivers/gpu/drm/scheduler/sched_main.c  |  4 +++-
  2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_fence.c 
b/drivers/gpu/drm/scheduler/sched_fence.c

index 7fd869520ef2..ebd26cdb79a0 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -122,16 +122,28 @@ static void 
drm_sched_fence_release_finished(struct dma_fence *f)

    dma_fence_put(&fence->scheduled);
  }
+#ifdef CONFIG_DEBUG_FS
+static bool drm_sched_enable_signaling(struct dma_fence *f)
+{
+    return true;
+}
+#endif
    static const struct dma_fence_ops drm_sched_fence_ops_scheduled = {
  .get_driver_name = drm_sched_fence_get_driver_name,
  .get_timeline_name = drm_sched_fence_get_timeline_name,
+#ifdef CONFIG_DEBUG_FS
+    .enable_signaling = drm_sched_enable_signaling,
+#endif
  .release = drm_sched_fence_release_scheduled,
  };
    static const struct dma_fence_ops drm_sched_fence_ops_finished = {
  .get_driver_name = drm_sched_fence_get_driver_name,
  .get_timeline_name = drm_sched_fence_get_timeline_name,
+#ifdef CONFIG_DEBUG_FS
+    .enable_signaling = drm_sched_enable_signaling,
+#endif


Adding the callback should not be necessary.

sure, I will remove this change.



  .release = drm_sched_fence_release_finished,
  };
  diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c

index e0ab14e0fb6b..140e3d8646e2 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -961,7 +961,9 @@ static int drm_sched_main(void *param)
  s_fence->parent = dma_fence_get(fence);
  /* Drop for original kref_init of the fence */
  dma_fence_put(fence);


Uff, not related to your patch but that looks wrong to me. The 
reference can only be dropped after the call to dma_fence_add_callback().



Shall I take care of it in this patch, or should I submit a separate one?

-
+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(&s_fence->finished);
+#endif


This should always be called, independent of the config options set.

Christian.


sure, I will remove the Config check.

~arvind




  r = dma_fence_add_callback(fence, &sched_job->cb,
 drm_sched_job_done_cb);
  if (r == -ENOENT)




Re: [PATCH 1/4] dma-buf: Check status of enable-signaling bit on debug

2022-09-05 Thread Yadav, Arvind



On 9/5/2022 4:51 PM, Christian König wrote:

Am 05.09.22 um 12:56 schrieb Arvind Yadav:

The core DMA-buf framework needs to enable signaling before a
fence is signaled, but callers sometimes forget to do so.
To catch this on a debug kernel, check the
DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT status bit before checking
the signaled bit, to confirm that enable_signaling has been
called.


You might want to put this patch at the end of the series to avoid 
breaking the kernel in between.



Sure, I will move this patch to the end.


Signed-off-by: Arvind Yadav 
---
  include/linux/dma-fence.h | 5 +
  1 file changed, 5 insertions(+)

diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 775cdc0b4f24..60c0e935c0b5 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -428,6 +428,11 @@ dma_fence_is_signaled_locked(struct dma_fence 
*fence)

  static inline bool
  dma_fence_is_signaled(struct dma_fence *fence)
  {
+#ifdef CONFIG_DEBUG_FS


CONFIG_DEBUG_FS is certainly wrong, probably better to check for 
CONFIG_DEBUG_WW_MUTEX_SLOWPATH here.


Apart from that looks good to me,
Christian.


I will use CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS.

~arvind


+    if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
+    return false;
+#endif
+
  if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
  return true;
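
Putting the two review comments together (keep the check, but gate it on
CONFIG_DEBUG_WW_MUTEX_SLOWPATH rather than CONFIG_DEBUG_FS), the helper would
end up roughly as follows; a sketch of the direction discussed above, not the
committed code:

static inline bool
dma_fence_is_signaled(struct dma_fence *fence)
{
#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
	/*
	 * On debug builds, a fence that never had signaling enabled must
	 * not report itself as signaled.
	 */
	if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
		return false;
#endif

	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
		return true;

	if (fence->ops->signaled && fence->ops->signaled(fence)) {
		dma_fence_signal(fence);
		return true;
	}

	return false;
}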