Re: [PATCH v3 1/1] drm/amdgpu: rework sched_list generation

2020-04-01 Thread Nirmoy


On 4/1/20 5:02 PM, Luben Tuikov wrote:

On 2020-03-31 08:46, Nirmoy wrote:

On 3/31/20 3:01 AM, Luben Tuikov wrote:

This patch seems to be using DOS line-endings.


Strange, I don't see that in my local patch file.


Not sure why "git am" complained about DOS endings
the first time I downloaded it. Second time was fine.

[snip]

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h

index 29f0a410091b..27abbdc603dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -721,6 +721,11 @@ struct amd_powerplay {
const struct amd_pm_funcs *pp_funcs;
   };

+struct amdgpu_sched {
+   uint32_t num_scheds;
+   struct drm_gpu_scheduler *sched[HWIP_MAX_INSTANCE];
+};
+
   #define AMDGPU_RESET_MAGIC_NUM 64
   #define AMDGPU_MAX_DF_PERFMONS 4
   struct amdgpu_device {
@@ -858,6 +863,8 @@ struct amdgpu_device {
struct amdgpu_ring  *rings[AMDGPU_MAX_RINGS];
bool ib_pool_ready;
struct amdgpu_sa_manager ring_tmp_bo[AMDGPU_IB_POOL_MAX];
+   /* drm scheduler list */
+   struct amdgpu_sched 
gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX];

That's a 2-dimensional array of "struct amdgpu_sched".
I think that the comment should be removed, or at least
not say "drm scheduler list". (I can see the structure
definition above.)


Yes I should remove it.



If this is the path you want to go, consider removing
"num_scheds" and creating a three dimensional array,
which would really essentialize the direction you want
to go:

struct drm_gpu_scheduler 
*gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX][HWIP_MAX_INSTANCE];

Now that this architecture is stripped down to its essentials,
perhaps we can see some optimizations...?


If you mean whether we should see any performance improvement, then IMO we
may not see much difference, as we use pretty much the same number of memory
accesses, plus we now have the extra cost of array_index_nospec().

Also, this is not a hot code path. We only do one amdgpu_ctx_init_entity()
per HW IP per context.

No, this has nothing to do with "performance".
It's all about architecture and design.

You seem to have array-array-struct with array and int,
and it seems much cleaner to just have array-array-array.
I think you don't need to break the chain with
struct of int and array, just as I described
in my comment below which you snipped without addressing it.

If you're not going to address a comment, don't delete it, leave it
for others to see that it hasn't been addressed. Only
snip previously addressed and resolved comments and previously
seen patch info, like diffstat/etc.



I wanted to understand "running pointer" before I could comment there.





Also consider that since you're creating an array of pointers,
you don't necessarily need to know their count. Your hot-path
algorithms should not need to know it. If you need to print
their count, say in sysfs, then you can count them up on
behalf of the user-space process cat-ing the sysfs entry.
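For what it's worth, a rough sketch of such on-demand counting (purely
illustrative; "amdgpu_sched_count" is a made-up helper name and it assumes
a NULL-terminated slot array):

/* Illustrative only: count schedulers in a NULL-terminated slot array
 * instead of storing num_scheds alongside it. */
static unsigned int amdgpu_sched_count(struct drm_gpu_scheduler * const *scheds)
{
        unsigned int n = 0;

        while (scheds[n])
                n++;
        return n;
}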


[snip]


+
+   /* set to default prio if sched_list is NULL */
+   if (!adev->gpu_sched[hw_ip][hw_prio].num_scheds)
+   hw_prio = AMDGPU_RING_PRIO_DEFAULT;

That comment is a bit confusing--it talks about a list
not being NULL, while the conditional implicitly checks
against 0.


Yes, this is wrong, will remove it.




I wish you hadn't snipped my comment here, but address it
one way or the other. It is:

I'd much rather that integer comparison be performed against
integers, as opposed to using logical NOT operator (which is
implicit in comparing against 0), i.e.,

if (adev->gpu_sched[hw_ip][hw_prio].num_scheds == 0)

Also, architecturally, there seems to be informational
redundancy, in keeping an integer count and list of
objects at the same time, as the above if-conditional
exposes: the comment talks about a list and NULL but
the if-conditional implicitly checks for 0.



The number of valid drm schedulers in adev->gpu_sched[hw_ip][hw_prio].sched
will vary depending on priority and hw_ip.


We need to pass that scheduler array and num_scheds to
drm_sched_entity_init(). I think we often use an array and an integer count
together when the number of valid items in the array is dynamic.
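For context, a rough sketch of how that pair would be consumed in
amdgpu_ctx_init_entity() (only a sketch; the surrounding variables are
assumed to exist in that function):

/* Sketch: hand the per-(hw_ip, hw_prio) array and its count straight to
 * drm_sched_entity_init(); "entity", "priority" and "ctx" are assumed to
 * come from the surrounding amdgpu_ctx_init_entity() code. */
scheds = adev->gpu_sched[hw_ip][hw_prio].sched;
num_scheds = adev->gpu_sched[hw_ip][hw_prio].num_scheds;
r = drm_sched_entity_init(&entity->entity, priority,
                          scheds, num_scheds, &ctx->guilty);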





Perhaps we don't need "num_scheds" and you can just
check if the index is NULL and assign PRIO_DEFAULT.


@@ -258,6 +272,12 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
amdgpu_ring *ring,
ring->priority = DRM_SCHED_PRIORITY_NORMAL;
mutex_init(&ring->priority_mutex);

+   if (ring->funcs->type != AMDGPU_RING_TYPE_KIQ) {
+           hw_ip = amdgpu_ring_type_to_drm_hw_ip[ring->funcs->type];
+           num_sched = &adev->gpu_sched[hw_ip][hw_prio].num_scheds;
+           adev->gpu_sched[hw_ip][hw_prio].sched[(*num_sched)++] = &ring->sched;
+   }
This seems unnecessarily complicated. Perhaps we can remove
"num_scheds", as recommended above, and keep a running pointer
while initialization is being done...?

Re: [PATCH v3 1/1] drm/amdgpu: rework sched_list generation

2020-04-01 Thread Luben Tuikov
On 2020-03-31 08:46, Nirmoy wrote:
> 
> On 3/31/20 3:01 AM, Luben Tuikov wrote:
>> This patch seems to be using DOS line-endings.
> 
> 
> Strange, I don't see that in my local patch file.
> 

Not sure why "git am" complained about DOS endings
the first time I downloaded it. Second time was fine.

[snip]

>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index 29f0a410091b..27abbdc603dd 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -721,6 +721,11 @@ struct amd_powerplay {
>>> const struct amd_pm_funcs *pp_funcs;
>>>   };
>>>
>>> +struct amdgpu_sched {
>>> +   uint32_t num_scheds;
>>> +   struct drm_gpu_scheduler *sched[HWIP_MAX_INSTANCE];
>>> +};
>>> +
>>>   #define AMDGPU_RESET_MAGIC_NUM 64
>>>   #define AMDGPU_MAX_DF_PERFMONS 4
>>>   struct amdgpu_device {
>>> @@ -858,6 +863,8 @@ struct amdgpu_device {
>>> struct amdgpu_ring  *rings[AMDGPU_MAX_RINGS];
>>> bool ib_pool_ready;
>>> struct amdgpu_sa_manager ring_tmp_bo[AMDGPU_IB_POOL_MAX];
>>> +   /* drm scheduler list */
>>> +   struct amdgpu_sched 
>>> gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX];
>> That's a 2-dimensional array of "struct amdgpu_sched".
>> I think that the comment should be removed, or at least
>> not say "drm scheduler list". (I can see the structure
>> definition above.)
> 
> 
> Yes I should remove it.
> 
> 
>> If this is the path you want to go, consider removing
>> "num_scheds" and creating a three dimensional array,
>> which would really essentialize the direction you want
>> to go:
>>
>> struct drm_gpu_scheduler 
>> *gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX][HWIP_MAX_INSTANCE];
>>
>> Now that this architecture is stripped down to its essentials,
>> perhaps we can see some optimizations...?
> 
> 
> If you mean whether we should see any performance improvement, then IMO
> we may not see much difference, as we use pretty much the same number of
> memory accesses, plus we now have the extra cost of array_index_nospec().
> 
> Also, this is not a hot code path. We only do one
> amdgpu_ctx_init_entity() per HW IP per context.

No, this has nothing to do with "performance".
It's all about architecture and design.

You seem to have array-array-struct with array and int,
and it seems much cleaner to just have array-array-array.
I think you don't need to break the chain with
struct of int and array, just as I described
in my comment below which you snipped without addressing it.

If you're not going to address a comment, don't delete it, leave it
for others to see that it hasn't been addressed. Only
snip previously addressed and resolved comments and previously
seen patch info, like diffstat/etc.

>> Also consider that since you're creating an array of pointers,
>> you don't necessarily need to know their count. Your hot-path
>> algorithms should not need to know it. If you need to print
>> their count, say in sysfs, then you can count them up on
>> behalf of the user-space process cat-ing the sysfs entry.
>>

[snip]

>>> +
>>> +   /* set to default prio if sched_list is NULL */
>>> +   if (!adev->gpu_sched[hw_ip][hw_prio].num_scheds)
>>> +   hw_prio = AMDGPU_RING_PRIO_DEFAULT;
>> That comment is a bit confusing--it talks about a list
>> not being NULL, while the conditional implicitly checks
>> against 0.
> 
> 
> Yes, this is wrong, will remove it.
> 
> 
> 

I wish you hadn't snipped my comment here, but address it
one way or the other. It is:

I'd much rather that integer comparison be performed against
integers, as opposed to using logical NOT operator (which is
implicit in comparing against 0), i.e.,

if (adev->gpu_sched[hw_ip][hw_prio].num_scheds == 0)

Also, architecturally, there seems to be informational
redundancy, in keeping an integer count and list of
objects at the same time, as the above if-conditional
exposes: the comment talks about a list and NULL but
the if-conditional implicitly checks for 0.

Perhaps we don't need "num_scheds" and you can just
check if the index is NULL and assign PRIO_DEFAULT.
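For illustration, a minimal sketch of that NULL check, assuming the plain
three-dimensional pointer array proposed above and a zero-initialized adev:

/* Sketch: without num_scheds, an unpopulated priority level is recognized
 * by its first slot still being NULL. */
if (!adev->gpu_sched[hw_ip][hw_prio][0])
        hw_prio = AMDGPU_RING_PRIO_DEFAULT;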

>> @@ -258,6 +272,12 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
>> amdgpu_ring *ring,
>>  ring->priority = DRM_SCHED_PRIORITY_NORMAL;
>>  mutex_init(&ring->priority_mutex);
>>
>> +    if (ring->funcs->type != AMDGPU_RING_TYPE_KIQ) {
>> +            hw_ip = amdgpu_ring_type_to_drm_hw_ip[ring->funcs->type];
>> +            num_sched = &adev->gpu_sched[hw_ip][hw_prio].num_scheds;
>> +            adev->gpu_sched[hw_ip][hw_prio].sched[(*num_sched)++] = &ring->sched;
>> +    }
>> This seems unnecessarily complicated. Perhaps we can remove
>> "num_scheds", as recommended above, and keep a running pointer
>> while initialization is being done...?
> 
> 
> What do you mean by running pointer ?

A "running pointer" is a local pointer you're using temporarily
to traverse memory. If you remove the "num_
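For illustration, a rough sketch of that idea at ring-init time, assuming
the zero-initialized three-dimensional array proposed earlier:

/* Sketch: a local "running pointer" walks to the first free slot for this
 * (hw_ip, hw_prio) pair, so no count needs to be stored. */
struct drm_gpu_scheduler **slot = adev->gpu_sched[hw_ip][hw_prio];

while (*slot)
        slot++;
*slot = &ring->sched;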

Re: [PATCH v3 1/1] drm/amdgpu: rework sched_list generation

2020-03-31 Thread Nirmoy


On 3/31/20 3:01 AM, Luben Tuikov wrote:

This patch seems to be using DOS line-endings.



Strange, I don't see that in my local patch file.



  After converting
it to UNIX line-endings, the output of "git am" using "scripts/checkpatch.pl"
via the pre-commit hook is appended last to my thoughts below.

On 2020-03-30 11:49 a.m., Nirmoy Das wrote:

Generate HW IP's sched_list in amdgpu_ring_init() instead of
amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(),
ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary.
This patch also stores sched_list for all HW IPs in one big
array in struct amdgpu_device which makes amdgpu_ctx_init_entity()
much leaner.

v2:
fix a coding style issue
do not use drm hw_ip const to populate amdgpu_ring_type enum

v3:
remove ctx reference and move sched array and num_sched to a struct
use num_scheds to detect uninitialized scheduler list

Signed-off-by: Nirmoy Das
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h|   8 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c| 156 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h|   3 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   2 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c|   3 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h|   5 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.h   |   2 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   |  24 +++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  12 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h   |   2 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h|   4 -
  drivers/gpu/drm/amd/amdgpu/cik_sdma.c  |   3 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c |  13 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c  |   5 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c  |   5 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  |  11 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  |  13 +-
  drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c |   3 +-
  drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c |   3 +-
  drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c |   3 +-
  drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c |   3 +-
  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c |   3 +-
  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |   6 +-
  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c |   3 +-
  drivers/gpu/drm/amd/amdgpu/si_dma.c|   3 +-
  drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c  |   3 +-
  drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c  |   3 +-
  drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c  |   7 +-
  drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c  |   6 +-
  drivers/gpu/drm/amd/amdgpu/vce_v2_0.c  |   2 +-
  drivers/gpu/drm/amd/amdgpu/vce_v3_0.c  |   3 +-
  drivers/gpu/drm/amd/amdgpu/vce_v4_0.c  |   3 +-
  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c  |   6 +-
  drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c  |   6 +-
  drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c  |   6 +-
  35 files changed, 145 insertions(+), 198 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 29f0a410091b..27abbdc603dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -721,6 +721,11 @@ struct amd_powerplay {
const struct amd_pm_funcs *pp_funcs;
  };

+struct amdgpu_sched {
+   uint32_t num_scheds;
+   struct drm_gpu_scheduler *sched[HWIP_MAX_INSTANCE];
+};
+
  #define AMDGPU_RESET_MAGIC_NUM 64
  #define AMDGPU_MAX_DF_PERFMONS 4
  struct amdgpu_device {
@@ -858,6 +863,8 @@ struct amdgpu_device {
struct amdgpu_ring  *rings[AMDGPU_MAX_RINGS];
bool ib_pool_ready;
struct amdgpu_sa_manager ring_tmp_bo[AMDGPU_IB_POOL_MAX];
+   /* drm scheduler list */
+   struct amdgpu_sched 
gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX];

That's a 2-dimensional array of "struct amdgpu_sched".
I think that the comment should be removed, or at least
not say "drm scheduler list". (I can see the structure
definition above.)



Yes I should remove it.



If this is the path you want to go, consider removing
"num_scheds" and creating a three dimensional array,
which would really essentialize the direction you want
to go:

struct drm_gpu_scheduler 
*gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX][HWIP_MAX_INSTANCE];

Now that this architecture is stripped down to its essentials,
perhaps we can see some optimizations...?



If you mean whether we should see any performance improvement, then IMO we
may not see much difference, as we use pretty much the same number of memory
accesses, plus we now have the extra cost of array_index_nospec().


Also, this is not a hot code path. We only do one amdgpu_ctx_init_entity()
per HW IP per context.




Also consider that since you're creating an array of pointers,
you don't necessarily need to know their count. Your hot-path
algorithms should not need to know it. If you need to print
their count, say in sysfs, then you can count them up on
behalf of the user-space process cat-ing the sysfs entry

Re: [PATCH v3 1/1] drm/amdgpu: rework sched_list generation

2020-03-31 Thread Christian König

On 31.03.20 at 10:24, Nirmoy wrote:

[SNIP]


+const unsigned int amdgpu_ring_type_to_drm_hw_ip[AMDGPU_HW_IP_NUM] = {
+    [AMDGPU_RING_TYPE_GFX]  = AMDGPU_HW_IP_GFX,
+    [AMDGPU_RING_TYPE_COMPUTE]  = AMDGPU_HW_IP_COMPUTE,
+    [AMDGPU_RING_TYPE_SDMA] = AMDGPU_HW_IP_DMA,
+    [AMDGPU_RING_TYPE_UVD]  = AMDGPU_HW_IP_UVD,
+    [AMDGPU_RING_TYPE_VCE]  = AMDGPU_HW_IP_VCE,
+    [AMDGPU_RING_TYPE_UVD_ENC]  = AMDGPU_HW_IP_UVD_ENC,
+    [AMDGPU_RING_TYPE_VCN_DEC]  = AMDGPU_HW_IP_VCN_DEC,
+    [AMDGPU_RING_TYPE_VCN_ENC]  = AMDGPU_HW_IP_VCN_ENC,
+    [AMDGPU_RING_TYPE_VCN_JPEG] = AMDGPU_HW_IP_VCN_JPEG,
+};
+


Please don't use a map for this; it is a 1-to-1 mapping as far as I
can see, which the compiler can eliminate.



So I should just use adev->gpu_sched[ring->funcs->type]?


Yes, the AMDGPU_HW_IP_* definitions are UAPI anyway and shouldn't leak 
into the backend if possible.
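For illustration, what that could look like in amdgpu_ring_init() (only a
sketch; it relies on AMDGPU_RING_TYPE_KIQ being moved to the end of the enum
so the remaining ring types line up with the AMDGPU_HW_IP_* values):

/* Sketch: index the scheduler table by the ring type directly instead of
 * translating through the lookup table. */
if (ring->funcs->type != AMDGPU_RING_TYPE_KIQ) {
        num_sched = &adev->gpu_sched[ring->funcs->type][hw_prio].num_scheds;
        adev->gpu_sched[ring->funcs->type][hw_prio].sched[(*num_sched)++] =
                &ring->sched;
}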


Christian.






The change to the amdgpu_ring_type definition looked better to me, 
but please make that a separate patch.



Okay.


Regards,

Nirmoy




Christian.


  /**
   * amdgpu_ring_alloc - allocate space on the ring buffer
   *
@@ -162,11 +174,13 @@ void amdgpu_ring_undo(struct amdgpu_ring *ring)
   * Returns 0 on success, error on failure.
   */
  int amdgpu_ring_init(struct amdgpu_device *adev, struct 
amdgpu_ring *ring,

- unsigned max_dw, struct amdgpu_irq_src *irq_src,
- unsigned irq_type)
+ unsigned int max_dw, struct amdgpu_irq_src *irq_src,
+ unsigned int irq_type, unsigned int hw_prio)
  {
  int r, i;
  int sched_hw_submission = amdgpu_sched_hw_submission;
+    unsigned int *num_sched;
+    unsigned int hw_ip;

  /* Set the hw submission limit higher for KIQ because
   * it's used for a number of gfx/compute tasks by both
@@ -258,6 +272,12 @@ int amdgpu_ring_init(struct amdgpu_device 
*adev, struct amdgpu_ring *ring,

  ring->priority = DRM_SCHED_PRIORITY_NORMAL;
  mutex_init(&ring->priority_mutex);

+    if (ring->funcs->type != AMDGPU_RING_TYPE_KIQ) {
+            hw_ip = amdgpu_ring_type_to_drm_hw_ip[ring->funcs->type];
+            num_sched = &adev->gpu_sched[hw_ip][hw_prio].num_scheds;
+            adev->gpu_sched[hw_ip][hw_prio].sched[(*num_sched)++] = &ring->sched;
+    }
+
  for (i = 0; i < DRM_SCHED_PRIORITY_MAX; ++i)
  atomic_set(&ring->num_jobs[i], 0);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h

index 080024d21d3e..219fb1a07b12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -35,6 +35,9 @@
  #define AMDGPU_MAX_VCE_RINGS    3
  #define AMDGPU_MAX_UVD_ENC_RINGS    2

+#define AMDGPU_RING_PRIO_DEFAULT    1
+#define AMDGPU_RING_PRIO_MAX    AMDGPU_GFX_PIPE_PRIO_MAX
+
  /* some special values for the owner field */
  #define AMDGPU_FENCE_OWNER_UNDEFINED    ((void *)0ul)
  #define AMDGPU_FENCE_OWNER_VM    ((void *)1ul)
@@ -52,11 +55,11 @@ enum amdgpu_ring_type {
  AMDGPU_RING_TYPE_SDMA,
  AMDGPU_RING_TYPE_UVD,
  AMDGPU_RING_TYPE_VCE,
-    AMDGPU_RING_TYPE_KIQ,
  AMDGPU_RING_TYPE_UVD_ENC,
  AMDGPU_RING_TYPE_VCN_DEC,
  AMDGPU_RING_TYPE_VCN_ENC,
-    AMDGPU_RING_TYPE_VCN_JPEG
+    AMDGPU_RING_TYPE_VCN_JPEG,
+    AMDGPU_RING_TYPE_KIQ
  };

  struct amdgpu_device;
@@ -221,7 +224,6 @@ struct amdgpu_ring {
  struct mutex    priority_mutex;
  /* protected by priority_mutex */
  int    priority;
-    bool    has_high_prio;

  #if defined(CONFIG_DEBUG_FS)
  struct dentry *ent;
@@ -259,8 +261,8 @@ void amdgpu_ring_generic_pad_ib(struct 
amdgpu_ring *ring, struct amdgpu_ib *ib);

  void amdgpu_ring_commit(struct amdgpu_ring *ring);
  void amdgpu_ring_undo(struct amdgpu_ring *ring);
  int amdgpu_ring_init(struct amdgpu_device *adev, struct 
amdgpu_ring *ring,

- unsigned ring_size, struct amdgpu_irq_src *irq_src,
- unsigned irq_type);
+ unsigned int ring_size, struct amdgpu_irq_src *irq_src,
+ unsigned int irq_type, unsigned int prio);
  void amdgpu_ring_fini(struct amdgpu_ring *ring);
  void amdgpu_ring_emit_reg_write_reg_wait_helper(struct amdgpu_ring 
*ring,

  uint32_t reg0, uint32_t val0,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h

index 2f4412e030a4..e5b8fb8e75c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
@@ -61,8 +61,6 @@ struct amdgpu_sdma_ras_funcs {

  struct amdgpu_sdma {
  struct amdgpu_sdma_instance instance[AMDGPU_MAX_SDMA_INSTANCES];
-    struct drm_gpu_scheduler *sdma_sched[AMDGPU_MAX_SDMA_INSTANCES];
-    uint32_t    num_sdma_sched;
  struct amdgpu_irq_src    trap_irq;
  struct amdgpu_irq_src    illegal_inst_irq;
  struct amdgpu_irq_src    ecc_irq;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h

in

Re: [PATCH v3 1/1] drm/amdgpu: rework sched_list generation

2020-03-31 Thread Nirmoy


On 3/31/20 9:56 AM, Christian König wrote:

On 30.03.20 at 17:49, Nirmoy Das wrote:

Generate HW IP's sched_list in amdgpu_ring_init() instead of
amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(),
ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary.
This patch also stores sched_list for all HW IPs in one big
array in struct amdgpu_device which makes amdgpu_ctx_init_entity()
much leaner.

v2:
fix a coding style issue
do not use drm hw_ip const to populate amdgpu_ring_type enum

v3:
remove ctx reference and move sched array and num_sched to a struct
use num_scheds to detect uninitialized scheduler list

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h    |   8 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c    | 156 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h    |   3 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   2 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    |   3 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h    |   5 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.h   |   2 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   |  24 +++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  12 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h   |   2 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h    |   4 -
  drivers/gpu/drm/amd/amdgpu/cik_sdma.c  |   3 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c |  13 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c  |   5 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c  |   5 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  |  11 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  |  13 +-
  drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c |   3 +-
  drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c |   3 +-
  drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c |   3 +-
  drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c |   3 +-
  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c |   3 +-
  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |   6 +-
  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c |   3 +-
  drivers/gpu/drm/amd/amdgpu/si_dma.c    |   3 +-
  drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c  |   3 +-
  drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c  |   3 +-
  drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c  |   7 +-
  drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c  |   6 +-
  drivers/gpu/drm/amd/amdgpu/vce_v2_0.c  |   2 +-
  drivers/gpu/drm/amd/amdgpu/vce_v3_0.c  |   3 +-
  drivers/gpu/drm/amd/amdgpu/vce_v4_0.c  |   3 +-
  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c  |   6 +-
  drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c  |   6 +-
  drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c  |   6 +-
  35 files changed, 145 insertions(+), 198 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h

index 29f0a410091b..27abbdc603dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -721,6 +721,11 @@ struct amd_powerplay {
  const struct amd_pm_funcs *pp_funcs;
  };

+struct amdgpu_sched {
+    uint32_t    num_scheds;
+    struct drm_gpu_scheduler    *sched[HWIP_MAX_INSTANCE];
+};
+


That can probably go into amdgpu_ring.h.


  #define AMDGPU_RESET_MAGIC_NUM 64
  #define AMDGPU_MAX_DF_PERFMONS 4
  struct amdgpu_device {
@@ -858,6 +863,8 @@ struct amdgpu_device {
  struct amdgpu_ring    *rings[AMDGPU_MAX_RINGS];
  bool    ib_pool_ready;
  struct amdgpu_sa_manager ring_tmp_bo[AMDGPU_IB_POOL_MAX];
+    /* drm scheduler list */
+    struct amdgpu_sched 
gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX];


BTW: You need to be careful when indexing into this by a user-specified
value, e.g. use array_index_nospec().



Thanks, didn't know about this. I will modify accordingly.
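For reference, a rough sketch of such a guard, assuming hw_ip comes in from
userspace (e.g. as an ioctl argument):

/* Sketch: bounds-check the user-supplied index and clamp its speculative
 * range with array_index_nospec() (from <linux/nospec.h>) before using it. */
if (hw_ip >= AMDGPU_HW_IP_NUM)
        return -EINVAL;
hw_ip = array_index_nospec(hw_ip, AMDGPU_HW_IP_NUM);
sched = &adev->gpu_sched[hw_ip][hw_prio];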






+const unsigned int amdgpu_ring_type_to_drm_hw_ip[AMDGPU_HW_IP_NUM] = {
+    [AMDGPU_RING_TYPE_GFX]  = AMDGPU_HW_IP_GFX,
+    [AMDGPU_RING_TYPE_COMPUTE]  = AMDGPU_HW_IP_COMPUTE,
+    [AMDGPU_RING_TYPE_SDMA] = AMDGPU_HW_IP_DMA,
+    [AMDGPU_RING_TYPE_UVD]  = AMDGPU_HW_IP_UVD,
+    [AMDGPU_RING_TYPE_VCE]  = AMDGPU_HW_IP_VCE,
+    [AMDGPU_RING_TYPE_UVD_ENC]  = AMDGPU_HW_IP_UVD_ENC,
+    [AMDGPU_RING_TYPE_VCN_DEC]  = AMDGPU_HW_IP_VCN_DEC,
+    [AMDGPU_RING_TYPE_VCN_ENC]  = AMDGPU_HW_IP_VCN_ENC,
+    [AMDGPU_RING_TYPE_VCN_JPEG] = AMDGPU_HW_IP_VCN_JPEG,
+};
+


Please don't use a map for this; it is a 1-to-1 mapping as far as I
can see, which the compiler can eliminate.



So I should just use adev->gpu_sched[ring->funcs->type]?




The change to the amdgpu_ring_type definition looked better to me, but 
please make that a separate patch.



Okay.


Regards,

Nirmoy




Christian.


  /**
   * amdgpu_ring_alloc - allocate space on the ring buffer
   *
@@ -162,11 +174,13 @@ void amdgpu_ring_undo(struct amdgpu_ring *ring)
   * Returns 0 on success, error on failure.
   */
  int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring 
*ring,

- unsigned max_dw, struct amdgpu_irq_src *irq_src,
- unsigned irq_type)
+ unsigned int max_dw, stru

Re: [PATCH v3 1/1] drm/amdgpu: rework sched_list generation

2020-03-31 Thread Christian König

On 30.03.20 at 17:49, Nirmoy Das wrote:

Generate HW IP's sched_list in amdgpu_ring_init() instead of
amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(),
ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary.
This patch also stores sched_list for all HW IPs in one big
array in struct amdgpu_device which makes amdgpu_ctx_init_entity()
much leaner.

v2:
fix a coding style issue
do not use drm hw_ip const to populate amdgpu_ring_type enum

v3:
remove ctx reference and move sched array and num_sched to a struct
use num_scheds to detect uninitialized scheduler list

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h|   8 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c| 156 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h|   3 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   2 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c|   3 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h|   5 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.h   |   2 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   |  24 +++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  12 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h   |   2 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h|   4 -
  drivers/gpu/drm/amd/amdgpu/cik_sdma.c  |   3 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c |  13 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c  |   5 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c  |   5 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  |  11 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  |  13 +-
  drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c |   3 +-
  drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c |   3 +-
  drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c |   3 +-
  drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c |   3 +-
  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c |   3 +-
  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |   6 +-
  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c |   3 +-
  drivers/gpu/drm/amd/amdgpu/si_dma.c|   3 +-
  drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c  |   3 +-
  drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c  |   3 +-
  drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c  |   7 +-
  drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c  |   6 +-
  drivers/gpu/drm/amd/amdgpu/vce_v2_0.c  |   2 +-
  drivers/gpu/drm/amd/amdgpu/vce_v3_0.c  |   3 +-
  drivers/gpu/drm/amd/amdgpu/vce_v4_0.c  |   3 +-
  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c  |   6 +-
  drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c  |   6 +-
  drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c  |   6 +-
  35 files changed, 145 insertions(+), 198 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 29f0a410091b..27abbdc603dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -721,6 +721,11 @@ struct amd_powerplay {
const struct amd_pm_funcs *pp_funcs;
  };

+struct amdgpu_sched {
+   uint32_t num_scheds;
+   struct drm_gpu_scheduler *sched[HWIP_MAX_INSTANCE];
+};
+


That can probably go into amdgpu_ring.h.


  #define AMDGPU_RESET_MAGIC_NUM 64
  #define AMDGPU_MAX_DF_PERFMONS 4
  struct amdgpu_device {
@@ -858,6 +863,8 @@ struct amdgpu_device {
struct amdgpu_ring  *rings[AMDGPU_MAX_RINGS];
bool ib_pool_ready;
struct amdgpu_sa_manager ring_tmp_bo[AMDGPU_IB_POOL_MAX];
+   /* drm scheduler list */
+   struct amdgpu_sched 
gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX];


BTW: You need to be careful when indexing into this by a user-specified
value, e.g. use array_index_nospec().




/* interrupts */
struct amdgpu_irq   irq;
@@ -993,6 +1000,7 @@ struct amdgpu_device {
char product_number[16];
char product_name[32];
char serial[16];
+
  };

  static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device 
*bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 6ed36a2c5f73..331646d472e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -72,13 +72,30 @@ static enum gfx_pipe_priority 
amdgpu_ctx_sched_prio_to_compute_prio(enum drm_sch
}
  }

+static unsigned int amdgpu_ctx_sched_prio_to_hw_prio(struct amdgpu_device 
*adev,
+enum drm_sched_priority 
prio,
+const int hw_ip)
+{
+   unsigned int hw_prio;
+
+   hw_prio = (hw_ip == AMDGPU_HW_IP_COMPUTE) ?
+   amdgpu_ctx_sched_prio_to_compute_prio(prio) :
+   AMDGPU_RING_PRIO_DEFAULT;
+
+   /* set to default prio if sched_list is NULL */
+   if (!adev->gpu_sched[hw_ip][hw_prio].num_scheds)
+   hw_prio = AMDGPU_RING_PRIO_DEFAULT;
+
+   ret

Re: [PATCH v3 1/1] drm/amdgpu: rework sched_list generation

2020-03-30 Thread Luben Tuikov
This patch seems to be using DOS line-endings. After converting
it to UNIX line-endings, the output of "git am" using "scripts/checkpatch.pl"
via the pre-commit hook is appended last to my thoughts below.

On 2020-03-30 11:49 a.m., Nirmoy Das wrote:
> Generate HW IP's sched_list in amdgpu_ring_init() instead of
> amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(),
> ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary.
> This patch also stores sched_list for all HW IPs in one big
> array in struct amdgpu_device which makes amdgpu_ctx_init_entity()
> much leaner.
> 
> v2:
> fix a coding style issue
> do not use drm hw_ip const to populate amdgpu_ring_type enum
> 
> v3:
> remove ctx reference and move sched array and num_sched to a struct
> use num_scheds to detect uninitialized scheduler list
> 
> Signed-off-by: Nirmoy Das 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h|   8 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c| 156 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h|   3 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   2 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c|   3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h|   5 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.h   |   2 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   |  24 +++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  12 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h   |   2 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h|   4 -
>  drivers/gpu/drm/amd/amdgpu/cik_sdma.c  |   3 +-
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c |  13 +-
>  drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c  |   5 +-
>  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c  |   5 +-
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  |  11 +-
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  |  13 +-
>  drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c |   3 +-
>  drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c |   3 +-
>  drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c |   3 +-
>  drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c |   3 +-
>  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c |   3 +-
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |   6 +-
>  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c |   3 +-
>  drivers/gpu/drm/amd/amdgpu/si_dma.c|   3 +-
>  drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c  |   3 +-
>  drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c  |   3 +-
>  drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c  |   7 +-
>  drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c  |   6 +-
>  drivers/gpu/drm/amd/amdgpu/vce_v2_0.c  |   2 +-
>  drivers/gpu/drm/amd/amdgpu/vce_v3_0.c  |   3 +-
>  drivers/gpu/drm/amd/amdgpu/vce_v4_0.c  |   3 +-
>  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c  |   6 +-
>  drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c  |   6 +-
>  drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c  |   6 +-
>  35 files changed, 145 insertions(+), 198 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 29f0a410091b..27abbdc603dd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -721,6 +721,11 @@ struct amd_powerplay {
>   const struct amd_pm_funcs *pp_funcs;
>  };
> 
> +struct amdgpu_sched {
> + uint32_t num_scheds;
> + struct drm_gpu_scheduler *sched[HWIP_MAX_INSTANCE];
> +};
> +
>  #define AMDGPU_RESET_MAGIC_NUM 64
>  #define AMDGPU_MAX_DF_PERFMONS 4
>  struct amdgpu_device {
> @@ -858,6 +863,8 @@ struct amdgpu_device {
>   struct amdgpu_ring  *rings[AMDGPU_MAX_RINGS];
>   bool ib_pool_ready;
>   struct amdgpu_sa_manager ring_tmp_bo[AMDGPU_IB_POOL_MAX];
> + /* drm scheduler list */
> + struct amdgpu_sched 
> gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX];

That's a 2-dimensional array of "struct amdgpu_sched".
I think that the comment should be removed, or at least
not say "drm scheduler list". (I can see the structure
definition above.)

If this is the path you want to go, consider removing
"num_scheds" and creating a three dimensional array,
which would really essentialize the direction you want
to go:

struct drm_gpu_scheduler 
*gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX][HWIP_MAX_INSTANCE];

Now that this architecture is stripped down to its essentials,
perhaps we can see some optimizations...?

Also consider that since you're creating an array of pointers,
you don't necessarily need to know their count. Your hot-path
algorithms should not need to know it. If you need to print
their count, say in sysfs, then you can count them up on
behalf of the user-space process cat-ing the sysfs entry.

> 
>   /* interrupts */
>   struct amdgpu_irq   irq;
> @@ -993,6 +1000,7 @@ struct amdgpu_device {
>   char product_number[16];
>   char product_name[32];
>   char serial[16];
> +
>  };

Unnecessary empty line above.

> 
>  s