Re: [PATCH v3 1/1] drm/amdgpu: rework sched_list generation
On 4/1/20 5:02 PM, Luben Tuikov wrote:
> On 2020-03-31 08:46, Nirmoy wrote:
>> On 3/31/20 3:01 AM, Luben Tuikov wrote:
>>> This patch seems to be using DOS line-endings.
>>
>> Strange, I don't see that in my local patch file.
>
> Not sure why "git am" complained about DOS endings the first time
> I downloaded it. Second time was fine.
>
> [snip]
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> index 29f0a410091b..27abbdc603dd 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> @@ -721,6 +721,11 @@ struct amd_powerplay {
>>>>  	const struct amd_pm_funcs *pp_funcs;
>>>>  };
>>>>
>>>> +struct amdgpu_sched {
>>>> +	uint32_t			num_scheds;
>>>> +	struct drm_gpu_scheduler	*sched[HWIP_MAX_INSTANCE];
>>>> +};
>>>> +
>>>>  #define AMDGPU_RESET_MAGIC_NUM 64
>>>>  #define AMDGPU_MAX_DF_PERFMONS 4
>>>>  struct amdgpu_device {
>>>> @@ -858,6 +863,8 @@ struct amdgpu_device {
>>>>  	struct amdgpu_ring		*rings[AMDGPU_MAX_RINGS];
>>>>  	bool				ib_pool_ready;
>>>>  	struct amdgpu_sa_manager	ring_tmp_bo[AMDGPU_IB_POOL_MAX];
>>>> +	/* drm scheduler list */
>>>> +	struct amdgpu_sched		gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX];
>>> That's a 2-dimensional array of "struct amdgpu_sched".
>>> I think that the comment should be removed, or at least
>>> not say "drm scheduler list". (I can see the structure
>>> definition above.)
>>
>> Yes, I should remove it.
>>
>>> If this is the path you want to go, consider removing
>>> "num_scheds" and creating a three-dimensional array,
>>> which would really essentialize the direction you want
>>> to go:
>>>
>>> struct drm_gpu_scheduler
>>> *gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX][HWIP_MAX_INSTANCE];
>>>
>>> Now that this architecture is stripped down to its essentials,
>>> perhaps we can see some optimizations...?
>>
>> If you mean whether we should see any performance improvement, then
>> imo we may not see much difference, as we are using pretty much the
>> same number of memory accesses, plus now we have the extra cost of
>> array_index_nospec(). Also, this is not a hot code path: we do only
>> one amdgpu_ctx_init_entity() per HW IP per context.
> No, this has nothing to do with "performance". It's all about
> architecture and design. You seem to have array-array-struct with
> array and int, and it seems much cleaner to just have
> array-array-array. I think you don't need to break the chain with
> struct of int and array, just as I described in my comment below,
> which you snipped without addressing it.
>
> If you're not going to address a comment, don't delete it; leave it
> for others to see that it hasn't been addressed. Only snip previously
> addressed and resolved comments and previously seen patch info, like
> the diffstat, etc.

I wanted to understand "running pointer" before I could comment in there.

>>> Also consider that since you're creating an array of pointers,
>>> you don't necessarily need to know their count. Your hot-path
>>> algorithms should not need to know it. If you need to print
>>> their count, say in sysfs, then you can count them up on
>>> behalf of the user-space process cat-ing the sysfs entry.
>>
>> [snip]
>>>> +
>>>> +	/* set to default prio if sched_list is NULL */
>>>> +	if (!adev->gpu_sched[hw_ip][hw_prio].num_scheds)
>>>> +		hw_prio = AMDGPU_RING_PRIO_DEFAULT;
>>> That comment is a bit confusing--it talks about a list
>>> not being NULL, while the conditional implicitly checks
>>> against 0.
>>
>> Yes, this is wrong, will remove it.
>
> I wish you hadn't snipped my comment here, but address it one way or
> the other. It is: I'd much rather that integer comparison be performed
> against integers, as opposed to using the logical NOT operator (which
> is implicit in comparing against 0), i.e.,
>
> 	if (adev->gpu_sched[hw_ip][hw_prio].num_scheds == 0)
>
> Also, architecturally, there seems to be informational redundancy in
> keeping an integer count and a list of objects at the same time, as
> the above if-conditional exposes: the comment talks about a list and
> NULL, but the if-conditional implicitly checks for 0.

The number of valid drm schedulers in adev->gpu_sched[hw_ip][hw_prio].sched
will vary depending on priority and hw_ip.
We need to pass that scheduler array and num_scheds to
drm_sched_entity_init(). I think we often use an array and an integer
count together when the number of valid items in the array is dynamic.

> Perhaps we don't need "num_scheds" and you can just check if the
> index is NULL and assign PRIO_DEFAULT.
>
>>>> @@ -258,6 +272,12 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
>>>>  	ring->priority = DRM_SCHED_PRIORITY_NORMAL;
>>>>  	mutex_init(&ring->priority_mutex);
>>>>
>>>> +	if (ring->funcs->type != AMDGPU_RING_TYPE_KIQ) {
>>>> +		hw_ip = amdgpu_ring_type_to_drm_hw_ip[ring->funcs->type];
>>>> +		num_sched = &adev->gpu_sched[hw_ip][hw_prio].num_scheds;
>>>> +		adev->gpu_sched[hw_ip][hw_prio].sched[(*num_sched)++] = &ring->sched;
>>>> +	}
>>> This seems unnecessarily complicated. Perhaps we can remove
>>> "num_scheds", as recommended above, and keep a running pointer
>>> while initialization is being done...?
Re: [PATCH v3 1/1] drm/amdgpu: rework sched_list generation
On 2020-03-31 08:46, Nirmoy wrote:
>
> On 3/31/20 3:01 AM, Luben Tuikov wrote:
>> This patch seems to be using DOS line-endings.
>
> Strange, I don't see that in my local patch file.

Not sure why "git am" complained about DOS endings the first time
I downloaded it. Second time was fine.

[snip]
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index 29f0a410091b..27abbdc603dd 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -721,6 +721,11 @@ struct amd_powerplay {
>>>  	const struct amd_pm_funcs *pp_funcs;
>>>  };
>>>
>>> +struct amdgpu_sched {
>>> +	uint32_t			num_scheds;
>>> +	struct drm_gpu_scheduler	*sched[HWIP_MAX_INSTANCE];
>>> +};
>>> +
>>>  #define AMDGPU_RESET_MAGIC_NUM 64
>>>  #define AMDGPU_MAX_DF_PERFMONS 4
>>>  struct amdgpu_device {
>>> @@ -858,6 +863,8 @@ struct amdgpu_device {
>>>  	struct amdgpu_ring		*rings[AMDGPU_MAX_RINGS];
>>>  	bool				ib_pool_ready;
>>>  	struct amdgpu_sa_manager	ring_tmp_bo[AMDGPU_IB_POOL_MAX];
>>> +	/* drm scheduler list */
>>> +	struct amdgpu_sched		gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX];
>> That's a 2-dimensional array of "struct amdgpu_sched".
>> I think that the comment should be removed, or at least
>> not say "drm scheduler list". (I can see the structure
>> definition above.)
>
> Yes, I should remove it.
>
>> If this is the path you want to go, consider removing
>> "num_scheds" and creating a three-dimensional array,
>> which would really essentialize the direction you want
>> to go:
>>
>> struct drm_gpu_scheduler
>> *gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX][HWIP_MAX_INSTANCE];
>>
>> Now that this architecture is stripped down to its essentials,
>> perhaps we can see some optimizations...?
>
> If you mean whether we should see any performance improvement, then
> imo we may not see much difference, as we are using pretty much the
> same number of memory accesses, plus now we have the extra cost of
> array_index_nospec().
> Also, this is not a hot code path: we do only one
> amdgpu_ctx_init_entity() per HW IP per context.

No, this has nothing to do with "performance". It's all about
architecture and design. You seem to have array-array-struct with
array and int, and it seems much cleaner to just have
array-array-array. I think you don't need to break the chain with
struct of int and array, just as I described in my comment below,
which you snipped without addressing it.

If you're not going to address a comment, don't delete it; leave it
for others to see that it hasn't been addressed. Only snip previously
addressed and resolved comments and previously seen patch info, like
the diffstat, etc.

>> Also consider that since you're creating an array of pointers,
>> you don't necessarily need to know their count. Your hot-path
>> algorithms should not need to know it. If you need to print
>> their count, say in sysfs, then you can count them up on
>> behalf of the user-space process cat-ing the sysfs entry.
>> [snip]
>>> +
>>> +	/* set to default prio if sched_list is NULL */
>>> +	if (!adev->gpu_sched[hw_ip][hw_prio].num_scheds)
>>> +		hw_prio = AMDGPU_RING_PRIO_DEFAULT;
>> That comment is a bit confusing--it talks about a list
>> not being NULL, while the conditional implicitly checks
>> against 0.
>
> Yes, this is wrong, will remove it.

I wish you hadn't snipped my comment here, but address it one way or
the other. It is: I'd much rather that integer comparison be performed
against integers, as opposed to using the logical NOT operator (which
is implicit in comparing against 0), i.e.,

	if (adev->gpu_sched[hw_ip][hw_prio].num_scheds == 0)

Also, architecturally, there seems to be informational redundancy in
keeping an integer count and a list of objects at the same time, as
the above if-conditional exposes: the comment talks about a list and
NULL, but the if-conditional implicitly checks for 0. Perhaps we don't
need "num_scheds" and you can just check if the index is NULL and
assign PRIO_DEFAULT.
>> @@ -258,6 +272,12 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
>>  	ring->priority = DRM_SCHED_PRIORITY_NORMAL;
>>  	mutex_init(&ring->priority_mutex);
>>
>> +	if (ring->funcs->type != AMDGPU_RING_TYPE_KIQ) {
>> +		hw_ip = amdgpu_ring_type_to_drm_hw_ip[ring->funcs->type];
>> +		num_sched = &adev->gpu_sched[hw_ip][hw_prio].num_scheds;
>> +		adev->gpu_sched[hw_ip][hw_prio].sched[(*num_sched)++] = &ring->sched;
>> +	}
>>
>> This seems unnecessarily complicated. Perhaps we can remove
>> "num_scheds", as recommended above, and keep a running pointer
>> while initialization is being done...?
>
> What do you mean by running pointer?

A "running pointer" is a local pointer you're using temporarily to
traverse memory. If you remove the "num_
Re: [PATCH v3 1/1] drm/amdgpu: rework sched_list generation
On 3/31/20 3:01 AM, Luben Tuikov wrote:
> This patch seems to be using DOS line-endings.

Strange, I don't see that in my local patch file.

> After converting it to UNIX line-endings, the output of "git am"
> using "scripts/checkpatch.pl" via the pre-commit hook is appended
> last to my thoughts below.
>
> On 2020-03-30 11:49 a.m., Nirmoy Das wrote:
>> Generate HW IP's sched_list in amdgpu_ring_init() instead of
>> amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(),
>> ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary.
>> This patch also stores the sched_list for all HW IPs in one big
>> array in struct amdgpu_device, which makes amdgpu_ctx_init_entity()
>> much leaner.
>>
>> v2:
>> fix a coding style issue
>> do not use drm hw_ip const to populate amdgpu_ring_type enum
>>
>> v3:
>> remove ctx reference and move sched array and num_sched to a struct
>> use num_scheds to detect uninitialized scheduler list
>>
>> Signed-off-by: Nirmoy Das
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   8 ++
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c    | 156 -
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h    |   3 -
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   2 -
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    |   3 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h    |   5 -
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.h   |   2 -
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   |  24 +++-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  12 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h   |   2 -
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h    |   4 -
>>  drivers/gpu/drm/amd/amdgpu/cik_sdma.c      |   3 +-
>>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c     |  13 +-
>>  drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c      |   5 +-
>>  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c      |   5 +-
>>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c      |  11 +-
>>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c      |  13 +-
>>  drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c     |   3 +-
>>  drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c     |   3 +-
>>  drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c     |   3 +-
>>  drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c     |   3 +-
>>  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c     |   3 +-
>>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c     |   6 +-
>>  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c     |   3 +-
>>  drivers/gpu/drm/amd/amdgpu/si_dma.c        |   3 +-
>>  drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c      |   3 +-
>>  drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c      |   3 +-
>>  drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c      |   7 +-
>>  drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c      |   6 +-
>>  drivers/gpu/drm/amd/amdgpu/vce_v2_0.c      |   2 +-
>>  drivers/gpu/drm/amd/amdgpu/vce_v3_0.c      |   3 +-
>>  drivers/gpu/drm/amd/amdgpu/vce_v4_0.c      |   3 +-
>>  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c      |   6 +-
>>  drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c      |   6 +-
>>  drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c      |   6 +-
>>  35 files changed, 145 insertions(+), 198 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 29f0a410091b..27abbdc603dd 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -721,6 +721,11 @@ struct amd_powerplay {
>>  	const struct amd_pm_funcs *pp_funcs;
>>  };
>>
>> +struct amdgpu_sched {
>> +	uint32_t			num_scheds;
>> +	struct drm_gpu_scheduler	*sched[HWIP_MAX_INSTANCE];
>> +};
>> +
>>  #define AMDGPU_RESET_MAGIC_NUM 64
>>  #define AMDGPU_MAX_DF_PERFMONS 4
>>  struct amdgpu_device {
>> @@ -858,6 +863,8 @@ struct amdgpu_device {
>>  	struct amdgpu_ring		*rings[AMDGPU_MAX_RINGS];
>>  	bool				ib_pool_ready;
>>  	struct amdgpu_sa_manager	ring_tmp_bo[AMDGPU_IB_POOL_MAX];
>> +	/* drm scheduler list */
>> +	struct amdgpu_sched		gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX];
> That's a 2-dimensional array of "struct amdgpu_sched".
> I think that the comment should be removed, or at least
> not say "drm scheduler list". (I can see the structure
> definition above.)

Yes, I should remove it.

> If this is the path you want to go, consider removing
> "num_scheds" and creating a three-dimensional array,
> which would really essentialize the direction you want
> to go:
>
> struct drm_gpu_scheduler
> *gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX][HWIP_MAX_INSTANCE];
>
> Now that this architecture is stripped down to its essentials,
> perhaps we can see some optimizations...?
If you mean whether we should see any performance improvement, then
imo we may not see much difference, as we are using pretty much the
same number of memory accesses, plus now we have the extra cost of
array_index_nospec(). Also, this is not a hot code path: we do only
one amdgpu_ctx_init_entity() per HW IP per context.

> Also consider that since you're creating an array of pointers,
> you don't necessarily need to know their count. Your hot-path
> algorithms should not need to know it. If you need to print
> their count, say in sysfs, then you can count them up on
> behalf of the user-space process cat-ing the sysfs entry.
Re: [PATCH v3 1/1] drm/amdgpu: rework sched_list generation
On 31.03.20 at 10:24, Nirmoy wrote:
> [SNIP]
>>> +const unsigned int amdgpu_ring_type_to_drm_hw_ip[AMDGPU_HW_IP_NUM] = {
>>> +	[AMDGPU_RING_TYPE_GFX]      = AMDGPU_HW_IP_GFX,
>>> +	[AMDGPU_RING_TYPE_COMPUTE]  = AMDGPU_HW_IP_COMPUTE,
>>> +	[AMDGPU_RING_TYPE_SDMA]     = AMDGPU_HW_IP_DMA,
>>> +	[AMDGPU_RING_TYPE_UVD]      = AMDGPU_HW_IP_UVD,
>>> +	[AMDGPU_RING_TYPE_VCE]      = AMDGPU_HW_IP_VCE,
>>> +	[AMDGPU_RING_TYPE_UVD_ENC]  = AMDGPU_HW_IP_UVD_ENC,
>>> +	[AMDGPU_RING_TYPE_VCN_DEC]  = AMDGPU_HW_IP_VCN_DEC,
>>> +	[AMDGPU_RING_TYPE_VCN_ENC]  = AMDGPU_HW_IP_VCN_ENC,
>>> +	[AMDGPU_RING_TYPE_VCN_JPEG] = AMDGPU_HW_IP_VCN_JPEG,
>>> +};
>>> +
>> Please don't use a map for this, it is a 1-to-1 mapping as far as
>> I can see, which the compiler can eliminate.
>
> So I should just use adev->gpu_sched[ring->funcs->type]?

Yes, the AMDGPU_HW_IP_* definitions are UAPI anyway and shouldn't
leak into the backend if possible.

Christian.

>> The change to the amdgpu_ring_type definition looked better to me,
>> but please make that a separate patch.
>
> Okay.
>
> Regards,
> Nirmoy
>
>> Christian.
>>
>>>  /**
>>>   * amdgpu_ring_alloc - allocate space on the ring buffer
>>>   *
>>> @@ -162,11 +174,13 @@ void amdgpu_ring_undo(struct amdgpu_ring *ring)
>>>   * Returns 0 on success, error on failure.
>>>   */
>>>  int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
>>> -		     unsigned max_dw, struct amdgpu_irq_src *irq_src,
>>> -		     unsigned irq_type)
>>> +		     unsigned int max_dw, struct amdgpu_irq_src *irq_src,
>>> +		     unsigned int irq_type, unsigned int hw_prio)
>>>  {
>>>  	int r, i;
>>>  	int sched_hw_submission = amdgpu_sched_hw_submission;
>>> +	unsigned int *num_sched;
>>> +	unsigned int hw_ip;
>>>
>>>  	/* Set the hw submission limit higher for KIQ because
>>>  	 * it's used for a number of gfx/compute tasks by both
>>> @@ -258,6 +272,12 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
>>>  	ring->priority = DRM_SCHED_PRIORITY_NORMAL;
>>>  	mutex_init(&ring->priority_mutex);
>>>
>>> +	if (ring->funcs->type != AMDGPU_RING_TYPE_KIQ) {
>>> +		hw_ip = amdgpu_ring_type_to_drm_hw_ip[ring->funcs->type];
>>> +		num_sched = &adev->gpu_sched[hw_ip][hw_prio].num_scheds;
>>> +		adev->gpu_sched[hw_ip][hw_prio].sched[(*num_sched)++] = &ring->sched;
>>> +	}
>>> +
>>>  	for (i = 0; i < DRM_SCHED_PRIORITY_MAX; ++i)
>>>  		atomic_set(&ring->num_jobs[i], 0);
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> index 080024d21d3e..219fb1a07b12 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> @@ -35,6 +35,9 @@
>>>  #define AMDGPU_MAX_VCE_RINGS		3
>>>  #define AMDGPU_MAX_UVD_ENC_RINGS	2
>>>
>>> +#define AMDGPU_RING_PRIO_DEFAULT	1
>>> +#define AMDGPU_RING_PRIO_MAX		AMDGPU_GFX_PIPE_PRIO_MAX
>>> +
>>>  /* some special values for the owner field */
>>>  #define AMDGPU_FENCE_OWNER_UNDEFINED	((void *)0ul)
>>>  #define AMDGPU_FENCE_OWNER_VM		((void *)1ul)
>>> @@ -52,11 +55,11 @@ enum amdgpu_ring_type {
>>>  	AMDGPU_RING_TYPE_SDMA,
>>>  	AMDGPU_RING_TYPE_UVD,
>>>  	AMDGPU_RING_TYPE_VCE,
>>> -	AMDGPU_RING_TYPE_KIQ,
>>>  	AMDGPU_RING_TYPE_UVD_ENC,
>>>  	AMDGPU_RING_TYPE_VCN_DEC,
>>>  	AMDGPU_RING_TYPE_VCN_ENC,
>>> -	AMDGPU_RING_TYPE_VCN_JPEG
>>> +	AMDGPU_RING_TYPE_VCN_JPEG,
>>> +	AMDGPU_RING_TYPE_KIQ
>>>  };
>>>
>>>  struct amdgpu_device;
>>> @@ -221,7 +224,6 @@ struct amdgpu_ring {
>>>  	struct mutex		priority_mutex;
>>>  	/* protected by priority_mutex */
>>>  	int			priority;
>>> -	bool			has_high_prio;
>>>
>>>  #if defined(CONFIG_DEBUG_FS)
>>>  	struct dentry *ent;
>>> @@ -259,8 +261,8 @@ void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib);
>>>  void amdgpu_ring_commit(struct amdgpu_ring *ring);
>>>  void amdgpu_ring_undo(struct amdgpu_ring *ring);
>>>  int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
>>> -		     unsigned ring_size, struct amdgpu_irq_src *irq_src,
>>> -		     unsigned irq_type);
>>> +		     unsigned int ring_size, struct amdgpu_irq_src *irq_src,
>>> +		     unsigned int irq_type, unsigned int prio);
>>>  void amdgpu_ring_fini(struct amdgpu_ring *ring);
>>>  void amdgpu_ring_emit_reg_write_reg_wait_helper(struct amdgpu_ring *ring,
>>>  						uint32_t reg0, uint32_t val0,
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
>>> index 2f4412e030a4..e5b8fb8e75c5 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
>>> @@ -61,8 +61,6 @@ struct amdgpu_sdma_ras_funcs {
>>>  struct amdgpu_sdma {
>>>  	struct amdgpu_sdma_instance instance[AMDGPU_MAX_SDMA_INSTANCES];
>>> -	struct drm_gpu_scheduler    *sdma_sched[AMDGPU_MAX_SDMA_INSTANCES];
>>> -	uint32_t                    num_sdma_sched;
>>>  	struct amdgpu_irq_src	trap_irq;
>>>  	struct amdgpu_irq_src	illegal_inst_irq;
>>>  	struct amdgpu_irq_src	ecc_irq;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h in
Re: [PATCH v3 1/1] drm/amdgpu: rework sched_list generation
On 3/31/20 9:56 AM, Christian König wrote:
> On 30.03.20 at 17:49, Nirmoy Das wrote:
>> Generate HW IP's sched_list in amdgpu_ring_init() instead of
>> amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(),
>> ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary.
>> This patch also stores the sched_list for all HW IPs in one big
>> array in struct amdgpu_device, which makes amdgpu_ctx_init_entity()
>> much leaner.
>>
>> v2:
>> fix a coding style issue
>> do not use drm hw_ip const to populate amdgpu_ring_type enum
>>
>> v3:
>> remove ctx reference and move sched array and num_sched to a struct
>> use num_scheds to detect uninitialized scheduler list
>>
>> Signed-off-by: Nirmoy Das
>> ---
>> [snip diffstat: 35 files changed, 145 insertions(+), 198 deletions(-)]
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 29f0a410091b..27abbdc603dd 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -721,6 +721,11 @@ struct amd_powerplay {
>>  	const struct amd_pm_funcs *pp_funcs;
>>  };
>>
>> +struct amdgpu_sched {
>> +	uint32_t			num_scheds;
>> +	struct drm_gpu_scheduler	*sched[HWIP_MAX_INSTANCE];
>> +};
>> +
> That can probably go into amdgpu_ring.h.
>
>>  #define AMDGPU_RESET_MAGIC_NUM 64
>>  #define AMDGPU_MAX_DF_PERFMONS 4
>>  struct amdgpu_device {
>> @@ -858,6 +863,8 @@ struct amdgpu_device {
>>  	struct amdgpu_ring		*rings[AMDGPU_MAX_RINGS];
>>  	bool				ib_pool_ready;
>>  	struct amdgpu_sa_manager	ring_tmp_bo[AMDGPU_IB_POOL_MAX];
>> +	/* drm scheduler list */
>> +	struct amdgpu_sched		gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX];
> BTW: You need to be careful when indexing into this by a
> user-specified value, e.g. use array_index_nospec().

Thanks, didn't know about this. I will modify accordingly.

>> +const unsigned int amdgpu_ring_type_to_drm_hw_ip[AMDGPU_HW_IP_NUM] = {
>> +	[AMDGPU_RING_TYPE_GFX]      = AMDGPU_HW_IP_GFX,
>> +	[AMDGPU_RING_TYPE_COMPUTE]  = AMDGPU_HW_IP_COMPUTE,
>> +	[AMDGPU_RING_TYPE_SDMA]     = AMDGPU_HW_IP_DMA,
>> +	[AMDGPU_RING_TYPE_UVD]      = AMDGPU_HW_IP_UVD,
>> +	[AMDGPU_RING_TYPE_VCE]      = AMDGPU_HW_IP_VCE,
>> +	[AMDGPU_RING_TYPE_UVD_ENC]  = AMDGPU_HW_IP_UVD_ENC,
>> +	[AMDGPU_RING_TYPE_VCN_DEC]  = AMDGPU_HW_IP_VCN_DEC,
>> +	[AMDGPU_RING_TYPE_VCN_ENC]  = AMDGPU_HW_IP_VCN_ENC,
>> +	[AMDGPU_RING_TYPE_VCN_JPEG] = AMDGPU_HW_IP_VCN_JPEG,
>> +};
>> +
> Please don't use a map for this, it is a 1-to-1 mapping as far as
> I can see, which the compiler can eliminate.

So I should just use adev->gpu_sched[ring->funcs->type]?
> The change to the amdgpu_ring_type definition looked better to me,
> but please make that a separate patch.

Okay.

Regards,
Nirmoy

> Christian.
>
>>  /**
>>   * amdgpu_ring_alloc - allocate space on the ring buffer
>>   *
>> @@ -162,11 +174,13 @@ void amdgpu_ring_undo(struct amdgpu_ring *ring)
>>   * Returns 0 on success, error on failure.
>>   */
>>  int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
>> -		     unsigned max_dw, struct amdgpu_irq_src *irq_src,
>> +		     unsigned int max_dw, stru
Re: [PATCH v3 1/1] drm/amdgpu: rework sched_list generation
On 30.03.20 at 17:49, Nirmoy Das wrote:
> Generate HW IP's sched_list in amdgpu_ring_init() instead of
> amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(),
> ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary.
> This patch also stores the sched_list for all HW IPs in one big
> array in struct amdgpu_device, which makes amdgpu_ctx_init_entity()
> much leaner.
>
> v2:
> fix a coding style issue
> do not use drm hw_ip const to populate amdgpu_ring_type enum
>
> v3:
> remove ctx reference and move sched array and num_sched to a struct
> use num_scheds to detect uninitialized scheduler list
>
> Signed-off-by: Nirmoy Das
> ---
> [snip diffstat: 35 files changed, 145 insertions(+), 198 deletions(-)]
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 29f0a410091b..27abbdc603dd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -721,6 +721,11 @@ struct amd_powerplay {
>  	const struct amd_pm_funcs *pp_funcs;
>  };
>
> +struct amdgpu_sched {
> +	uint32_t			num_scheds;
> +	struct drm_gpu_scheduler	*sched[HWIP_MAX_INSTANCE];
> +};
> +

That can probably go into amdgpu_ring.h.

>  #define AMDGPU_RESET_MAGIC_NUM 64
>  #define AMDGPU_MAX_DF_PERFMONS 4
>  struct amdgpu_device {
> @@ -858,6 +863,8 @@ struct amdgpu_device {
>  	struct amdgpu_ring		*rings[AMDGPU_MAX_RINGS];
>  	bool				ib_pool_ready;
>  	struct amdgpu_sa_manager	ring_tmp_bo[AMDGPU_IB_POOL_MAX];
> +	/* drm scheduler list */
> +	struct amdgpu_sched		gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX];

BTW: You need to be careful when indexing into this by a
user-specified value, e.g. use array_index_nospec().

>
>  	/* interrupts */
>  	struct amdgpu_irq		irq;
> @@ -993,6 +1000,7 @@ struct amdgpu_device {
>  	char				product_number[16];
>  	char				product_name[32];
>  	char				serial[16];
> +
>  };
>
>  static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 6ed36a2c5f73..331646d472e2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -72,13 +72,30 @@ static enum gfx_pipe_priority amdgpu_ctx_sched_prio_to_compute_prio(enum drm_sch
>  	}
>  }
>
> +static unsigned int amdgpu_ctx_sched_prio_to_hw_prio(struct amdgpu_device *adev,
> +						     enum drm_sched_priority prio,
> +						     const int hw_ip)
> +{
> +	unsigned int hw_prio;
> +
> +	hw_prio = (hw_ip == AMDGPU_HW_IP_COMPUTE) ?
> +			amdgpu_ctx_sched_prio_to_compute_prio(prio) :
> +			AMDGPU_RING_PRIO_DEFAULT;
> +
> +	/* set to default prio if sched_list is NULL */
> +	if (!adev->gpu_sched[hw_ip][hw_prio].num_scheds)
> +		hw_prio = AMDGPU_RING_PRIO_DEFAULT;
> +
> +	ret
Re: [PATCH v3 1/1] drm/amdgpu: rework sched_list generation
This patch seems to be using DOS line-endings.

After converting it to UNIX line-endings, the output of "git am"
using "scripts/checkpatch.pl" via the pre-commit hook is appended
last to my thoughts below.

On 2020-03-30 11:49 a.m., Nirmoy Das wrote:
> Generate HW IP's sched_list in amdgpu_ring_init() instead of
> amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(),
> ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary.
> This patch also stores the sched_list for all HW IPs in one big
> array in struct amdgpu_device, which makes amdgpu_ctx_init_entity()
> much leaner.
>
> v2:
> fix a coding style issue
> do not use drm hw_ip const to populate amdgpu_ring_type enum
>
> v3:
> remove ctx reference and move sched array and num_sched to a struct
> use num_scheds to detect uninitialized scheduler list
>
> Signed-off-by: Nirmoy Das
> ---
> [snip diffstat: 35 files changed, 145 insertions(+), 198 deletions(-)]
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 29f0a410091b..27abbdc603dd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -721,6 +721,11 @@ struct amd_powerplay {
>  	const struct amd_pm_funcs *pp_funcs;
>  };
>
> +struct amdgpu_sched {
> +	uint32_t			num_scheds;
> +	struct drm_gpu_scheduler	*sched[HWIP_MAX_INSTANCE];
> +};
> +
>  #define AMDGPU_RESET_MAGIC_NUM 64
>  #define AMDGPU_MAX_DF_PERFMONS 4
>  struct amdgpu_device {
> @@ -858,6 +863,8 @@ struct amdgpu_device {
>  	struct amdgpu_ring		*rings[AMDGPU_MAX_RINGS];
>  	bool				ib_pool_ready;
>  	struct amdgpu_sa_manager	ring_tmp_bo[AMDGPU_IB_POOL_MAX];
> +	/* drm scheduler list */
> +	struct amdgpu_sched		gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX];

That's a 2-dimensional array of "struct amdgpu_sched".
I think that the comment should be removed, or at least
not say "drm scheduler list". (I can see the structure
definition above.)

If this is the path you want to go, consider removing
"num_scheds" and creating a three-dimensional array,
which would really essentialize the direction you want
to go:

struct drm_gpu_scheduler
*gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX][HWIP_MAX_INSTANCE];

Now that this architecture is stripped down to its essentials,
perhaps we can see some optimizations...?

Also consider that since you're creating an array of pointers,
you don't necessarily need to know their count. Your hot-path
algorithms should not need to know it. If you need to print
their count, say in sysfs, then you can count them up on
behalf of the user-space process cat-ing the sysfs entry.

>
>  	/* interrupts */
>  	struct amdgpu_irq		irq;
> @@ -993,6 +1000,7 @@ struct amdgpu_device {
>  	char				product_number[16];
>  	char				product_name[32];
>  	char				serial[16];
> +
>  };

Unnecessary empty line above.