Re: [PATCH] drm/amdgpu/display: drop the reduction loop when setting the sync groups
On 2020-05-29 2:04 p.m., Alex Deucher wrote:
> On Fri, May 29, 2020 at 9:56 AM Kazlauskas, Nicholas wrote:
>> On 2020-05-28 10:06 a.m., Alex Deucher wrote:
>>> The logic for blanked is not the same as having a plane_state.
>>> Technically you can drive an OTG without anything connected in the
>>> front end and it'll just draw out the back color, which is distinct
>>> from having the OTG be blanked. If we add planes or unblank the OTG
>>> later then we'll still want the synchronization.
>>>
>>> Bug: https://gitlab.freedesktop.org/drm/amd/issues/781
>>> Fixes: 5fc0cbfad45648 ("drm/amd/display: determine if a pipe is synced by plane state")
>>> Cc: nicholas.kazlaus...@amd.com
>>> Signed-off-by: Alex Deucher
>>> ---
>>>  drivers/gpu/drm/amd/display/dc/core/dc.c | 8 --------
>>>  1 file changed, 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
>>> index 04c3d9f7e323..6279520f7873 100644
>>> --- a/drivers/gpu/drm/amd/display/dc/core/dc.c
>>> +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
>>> @@ -1040,14 +1040,6 @@ static void program_timing_sync(
>>>  			status->timing_sync_info.master = false;
>>>
>>>  		}
>>> -		/* remove any other pipes with plane as they have already been synced */
>>> -		for (j = j + 1; j < group_size; j++) {
>>> -			if (pipe_set[j]->plane_state) {
>>> -				group_size--;
>>> -				pipe_set[j] = pipe_set[group_size];
>>> -				j--;
>>> -			}
>>> -		}
>>
>> Looking at this again, I think I may understand the issue this was
>> trying to work around.
>>
>> If we try to force timing synchronization on displays that are
>> currently active then this is going to force reset the vertical
>> position, resulting in screen corruption.
>>
>> So what this logic was attempting to do was ensure that timing
>> synchronization only happens when committing two streams at a time
>> without any image on the screen.
>> Maybe it'd be best to just blank these streams out first, but for now,
>> let's actually go back to fixing this by applying the actual dpg/tg
>> check that Wenjing suggests, something like:
>>
>> if (pool->opps[i]->funcs->dpg_is_blanked)
>> 	s.blank_enabled = pool->opps[i]->funcs->dpg_is_blanked(pool->opps[i]);
>> else
>> 	s.blank_enabled = tg->funcs->is_blanked(tg);
>
> Hmm, it's not clear to me where this code needs to go. Can you point me
> in the right direction or provide a quick patch?
>
> Thanks,
>
> Alex

The old code used to check !tg->funcs->is_blanked(tg), i.e. to drop the
pipe from the group if it's currently active. The issue was that on newer
ASICs it's now the DPG that's the indicator from the hardware side, so we
should replace the !plane_state check with a check first for
!dpg_is_blanked and then !is_blanked if the DPG doesn't exist.

Regards,
Nicholas Kazlauskas

>> The reason why we have this issue in the first place is because
>> amdgpu_dm usually commits a dc_state with the planes already in it
>> instead of committing them later, so plane_state not being NULL is
>> typically true.
>>
>> Regards,
>> Nicholas Kazlauskas

>>> 	if (group_size > 1) {
>>> 		dc->hwss.enable_timing_synchronization(

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu/display: drop the reduction loop when setting the sync groups
On Fri, May 29, 2020 at 9:56 AM Kazlauskas, Nicholas wrote:
>
> On 2020-05-28 10:06 a.m., Alex Deucher wrote:
> > The logic for blanked is not the same as having a plane_state.
> > Technically you can drive an OTG without anything connected in the
> > front end and it'll just draw out the back color which is distinct
> > from having the OTG be blanked.
> > If we add planes or unblank the OTG later then we'll still want the
> > synchronization.
> >
> > Bug: https://gitlab.freedesktop.org/drm/amd/issues/781
> > Fixes: 5fc0cbfad45648 ("drm/amd/display: determine if a pipe is synced by plane state")
> > Cc: nicholas.kazlaus...@amd.com
> > Signed-off-by: Alex Deucher
> > ---
> >  drivers/gpu/drm/amd/display/dc/core/dc.c | 8 --------
> >  1 file changed, 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
> > index 04c3d9f7e323..6279520f7873 100644
> > --- a/drivers/gpu/drm/amd/display/dc/core/dc.c
> > +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
> > @@ -1040,14 +1040,6 @@ static void program_timing_sync(
> >  			status->timing_sync_info.master = false;
> >
> >  		}
> > -		/* remove any other pipes with plane as they have already been synced */
> > -		for (j = j + 1; j < group_size; j++) {
> > -			if (pipe_set[j]->plane_state) {
> > -				group_size--;
> > -				pipe_set[j] = pipe_set[group_size];
> > -				j--;
> > -			}
> > -		}
>
> Looking at this again, I think I may understand the issue this was
> trying to work around.
>
> If we try to force timing synchronization on displays that are currently
> active then this is going to force reset the vertical position,
> resulting in screen corruption.
>
> So what this logic was attempting to do was ensure that timing
> synchronization only happens when committing two streams at a time
> without any image on the screen.
>
> Maybe it'd be best to just blank these streams out first, but for now,
> let's actually go back to fixing this by applying the actual dpg/tg
> check that Wenjing suggests, something like:
>
> if (pool->opps[i]->funcs->dpg_is_blanked)
> 	s.blank_enabled = pool->opps[i]->funcs->dpg_is_blanked(pool->opps[i]);
> else
> 	s.blank_enabled = tg->funcs->is_blanked(tg);
>

Hmm, it's not clear to me where this code needs to go. Can you point me
in the right direction or provide a quick patch?

Thanks,

Alex

>
> The reason why we have this issue in the first place is because
> amdgpu_dm usually commits a dc_state with the planes already in it
> instead of committing them later, so plane_state not being NULL is
> typically true.
>
> Regards,
> Nicholas Kazlauskas
>
> >
> > 	if (group_size > 1) {
> > 		dc->hwss.enable_timing_synchronization(
> >
Re: [PATCH] drm/amdgpu/display: drop the reduction loop when setting the sync groups
On 2020-05-28 10:06 a.m., Alex Deucher wrote:
> The logic for blanked is not the same as having a plane_state.
> Technically you can drive an OTG without anything connected in the front
> end and it'll just draw out the back color which is distinct from having
> the OTG be blanked.
> If we add planes or unblank the OTG later then we'll still want the
> synchronization.
>
> Bug: https://gitlab.freedesktop.org/drm/amd/issues/781
> Fixes: 5fc0cbfad45648 ("drm/amd/display: determine if a pipe is synced by plane state")
> Cc: nicholas.kazlaus...@amd.com
> Signed-off-by: Alex Deucher
> ---
>  drivers/gpu/drm/amd/display/dc/core/dc.c | 8 --------
>  1 file changed, 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
> index 04c3d9f7e323..6279520f7873 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
> @@ -1040,14 +1040,6 @@ static void program_timing_sync(
>  			status->timing_sync_info.master = false;
>
>  		}
> -		/* remove any other pipes with plane as they have already been synced */
> -		for (j = j + 1; j < group_size; j++) {
> -			if (pipe_set[j]->plane_state) {
> -				group_size--;
> -				pipe_set[j] = pipe_set[group_size];
> -				j--;
> -			}
> -		}

Looking at this again, I think I may understand the issue this was trying
to work around.

If we try to force timing synchronization on displays that are currently
active then this is going to force reset the vertical position, resulting
in screen corruption.

So what this logic was attempting to do was ensure that timing
synchronization only happens when committing two streams at a time
without any image on the screen.
Maybe it'd be best to just blank these streams out first, but for now,
let's actually go back to fixing this by applying the actual dpg/tg check
that Wenjing suggests, something like:

if (pool->opps[i]->funcs->dpg_is_blanked)
	s.blank_enabled = pool->opps[i]->funcs->dpg_is_blanked(pool->opps[i]);
else
	s.blank_enabled = tg->funcs->is_blanked(tg);

The reason why we have this issue in the first place is because amdgpu_dm
usually commits a dc_state with the planes already in it instead of
committing them later, so plane_state not being NULL is typically true.

Regards,
Nicholas Kazlauskas

> 	if (group_size > 1) {
> 		dc->hwss.enable_timing_synchronization(
Re: [PATCH] drm/radeon: Convert get_user_pages() --> pin_user_pages()
On Thu, May 28, 2020 at 12:37 AM John Hubbard wrote:
>
> On 2020-05-27 01:51, Daniel Vetter wrote:
> > On Wed, May 27, 2020 at 10:48:52AM +0200, Daniel Vetter wrote:
> >> On Tue, May 26, 2020 at 03:57:45PM -0700, John Hubbard wrote:
> >>> On 2020-05-26 14:00, Souptick Joarder wrote:
> >>>> This code was using get_user_pages(), in a "Case 2" scenario
> >>>> (DMA/RDMA), using the categorization from [1]. That means that it's
> >>>> time to convert the get_user_pages() + release_pages() calls to
> >>>> pin_user_pages() + unpin_user_pages() calls.
> >>>>
> >>>> There is some helpful background in [2]: basically, this is a small
> >>>> part of fixing a long-standing disconnect between pinning pages,
> >>>> and file systems' use of those pages.
> >>>>
> >>>> [1] Documentation/core-api/pin_user_pages.rst
> >>>> [2] "Explicit pinning of user-space pages":
> >>>>     https://lwn.net/Articles/807108/
> >>
> >> I don't think this is a case 2 here, nor is it any of the others.
> >> Feels like not covered at all by the doc.
> >>
> >> radeon has a mmu notifier (might be a bit broken, but hey whatever
> >> there's other drivers which have the same concept, but less broken).
> >> So when you do an munmap, radeon will release the page refcount.
>
> Aha, thanks Daniel. I withdraw my misinformed ACK, then.
>
> > I forgot to add: It's also not case 3, since there's no hw page fault
> > support. It's all faked in software, and explicitly synchronizes
> > against pending io (or preempts it, that depends a bit upon the jobs
> > running).
>
> This is what case 3 was *intended* to cover, but it looks like case 3
> needs to be written a little better. I'll attempt that, and Cc you on
> the actual patch to -mm. (I think we also need a case 5 for an
> unrelated scenario, too, so it's time.)

There was no *case 5* in the other patch posted in -mm. Do we need to
add it?

> thanks,
> --
> John Hubbard
> NVIDIA
>
> >> Which case is that?
> >>
> >> Note that currently only amdgpu doesn't work like that for gpu dma
> >> directly to userspace ranges, it uses hmm and afaiui doesn't hold a
> >> full page pin refcount.
> >>
> >> Cheers, Daniel
> >>
> >>>> Signed-off-by: Souptick Joarder
> >>>> Cc: John Hubbard
> >>>>
> >>>> Hi,
> >>>>
> >>>> I've compile tested this, but unable to run-time test, so any
> >>>> testing help is much appreciated.
> >>>> ---
> >>>>  drivers/gpu/drm/radeon/radeon_ttm.c | 6 +++---
> >>>>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> >>>> index 5d50c9e..e927de2 100644
> >>>> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> >>>> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> >>>> @@ -506,7 +506,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
> >>>>  		uint64_t userptr = gtt->userptr + pinned * PAGE_SIZE;
> >>>>  		struct page **pages = ttm->pages + pinned;
> >>>>
> >>>> -		r = get_user_pages(userptr, num_pages, write ? FOLL_WRITE : 0,
> >>>> +		r = pin_user_pages(userptr, num_pages, write ? FOLL_WRITE : 0,
> >>>>  				   pages, NULL);
> >>>>  		if (r < 0)
> >>>>  			goto release_pages;
> >>>>
> >>>> @@ -535,7 +535,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
> >>>>  	kfree(ttm->sg);
> >>>>
> >>>>  release_pages:
> >>>> -	release_pages(ttm->pages, pinned);
> >>>> +	unpin_user_pages(ttm->pages, pinned);
> >>>>  	return r;
> >>>>  }
> >>>>
> >>>> @@ -562,7 +562,7 @@ static void radeon_ttm_tt_unpin_userptr(struct ttm_tt *ttm)
> >>>>  			set_page_dirty(page);
> >>>
> >>> Maybe we also need a preceding patch, to fix the above? It should be
> >>> set_page_dirty_lock(), rather than set_page_dirty(), unless I'm
> >>> overlooking something (which is very possible!).
> >>>
> >>> Either way, from a tunnel vision perspective of changing gup to pup,
> >>> this looks good to me, so
> >>>
> >>> Acked-by: John Hubbard
> >>>
> >>> thanks,
> >>> --
> >>> John Hubbard
> >>> NVIDIA
> >>>
> >>>>  			mark_page_accessed(page);
> >>>> -			put_page(page);
> >>>> +			unpin_user_page(page);
> >>>>  		}
> >>>>
> >>>>  	sg_free_table(ttm->sg);
> >>
> >> --
> >> Daniel Vetter
> >> Software Engineer, Intel Corporation
> >> http://blog.ffwll.ch
Re: [PATCH] drm/radeon: Convert get_user_pages() --> pin_user_pages()
On 2020-05-28 23:49, Souptick Joarder wrote:
...
>> This is what case 3 was *intended* to cover, but it looks like case 3
>> needs to be written a little better. I'll attempt that, and Cc you on
>> the actual patch to -mm. (I think we also need a case 5 for an
>> unrelated scenario, too, so it's time.)
>
> There was no *case 5* in the other patch posted in -mm. Do we need to
> add it?

Working on figuring that out [1], but it's not directly relevant to this
thread. Maybe I shouldn't have brought it up here. :)

[1] https://lore.kernel.org/r/20200529070343.gl14...@quack2.suse.cz

thanks,
John Hubbard
NVIDIA
Re: [PATCH] drm/amdgpu/pm: don't bail for in_suspend
On 2020-05-28 12:51 a.m., Alex Deucher wrote:
> Otherwise we disable sysfs/debugfs access with runtime pm.
>
> Fixes: f7c8d853b029df ("drm/amdgpu/pm: return an error during GPU reset or suspend")
> Signed-off-by: Alex Deucher
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 114 -
>  1 file changed, 57 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> index 808884aaf36d..775e389c9a13 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> @@ -163,7 +163,7 @@ static ssize_t amdgpu_get_power_dpm_state(struct device *dev,
>  	enum amd_pm_state_type pm;
>  	int ret;
>
> -	if (adev->in_gpu_reset || adev->in_suspend)
> +	if (adev->in_gpu_reset)
>  		return -EPERM;
>
>  	ret = pm_runtime_get_sync(ddev->dev);
> @@ -199,7 +199,7 @@ static ssize_t amdgpu_set_power_dpm_state(struct device *dev,
>  	enum amd_pm_state_type state;
>  	int ret;
>
> -	if (adev->in_gpu_reset || adev->in_suspend)
> +	if (adev->in_gpu_reset)
>  		return -EPERM;
>
>  	if (strncmp("battery", buf, strlen("battery")) == 0)

Might be worth moving this check into a helper function; that would make
similar changes easier in the future. :) Can be a separate change, of
course.

--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and X developer
Re: amdgpu doesn't do implicit sync, requires drivers to do it in IBs
Am 28.05.20 um 21:35 schrieb Marek Olšák:

On Thu, May 28, 2020 at 2:12 PM Christian König <christian.koe...@amd.com> wrote:

Am 28.05.20 um 18:06 schrieb Marek Olšák:

On Thu, May 28, 2020 at 10:40 AM Christian König <christian.koe...@amd.com> wrote:

Am 28.05.20 um 12:06 schrieb Michel Dänzer:
> On 2020-05-28 11:11 a.m., Christian König wrote:
>> Well we still need implicit sync [...]
> Yeah, this isn't about "we don't want implicit sync", it's about "amdgpu
> doesn't ensure later jobs fully see the effects of previous implicitly
> synced jobs", requiring userspace to do pessimistic flushing.

Yes, exactly that.

For the background: We also do this flushing for explicit syncs. And when
this was implemented 2-3 years ago we first did the flushing for implicit
sync as well. That was immediately reverted and then implemented
differently because it caused severe performance problems in some use
cases.

I'm not sure of the root cause of these performance problems. My
assumption was always that we then insert too many pipeline syncs, but
Marek doesn't seem to think it could be that.

On the one hand I'm rather keen to remove the extra handling and just
always use the explicit handling for everything because it simplifies the
kernel code quite a bit. On the other hand I don't want to run into this
performance problem again.

In addition to that, what the kernel does is a "full" pipeline sync, e.g.
we busy wait for the full hardware pipeline to drain. That might be
overkill if you just want to do some flushing so that the next shader
sees the stuff written, but I'm not an expert on that.

Do we busy-wait on the CPU or in WAIT_REG_MEM? WAIT_REG_MEM is what UMDs
do and should be faster.

We use WAIT_REG_MEM to wait for an EOP fence value to reach memory. We
use this for a couple of things, especially to make sure that the
hardware is idle before changing VMID to page table associations.
What about your idea of having an extra dw in the shared BOs indicating
that they are flushed?

As far as I understand it an EOS or other event might be sufficient for
the caches as well. And you could insert the WAIT_REG_MEM directly before
the first draw using the texture and not before the whole IB.

Could be that we can optimize this even more than what we do in the
kernel.

Christian.

Adding fences into BOs would be bad, because all UMDs would have to
handle them.

Yeah, already assumed that this is the biggest blocker.

Is it possible to do this in the ring buffer:

if (fence_signalled) {
	indirect_buffer(dependent_IB);
	indirect_buffer(other_IB);
} else {
	indirect_buffer(other_IB);
	wait_reg_mem(fence);
	indirect_buffer(dependent_IB);
}

That's maybe possible, but at least not easily implementable. Or we might
have to wait for a hw scheduler.

I'm still fine doing the pipeline sync for implicit sync as well, I just
need somebody to confirm me that this doesn't backfire in some case.

Does the kernel sync when the driver fd is different, or when the context
is different?

Only when the driver fd is different.

Christian.

Marek
[bug report] drm/amdkfd: Track SDMA utilization per process
Hello Mukul Joshi,

This is a semi-automatic email about new static checker warnings.

The patch 522b89c63370: "drm/amdkfd: Track SDMA utilization per process"
from May 26, 2020, leads to the following Smatch complaint:

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:109 kfd_sdma_activity_worker()
	warn: variable dereferenced before check 'pdd' (see line 106)

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:109 kfd_sdma_activity_worker()
	warn: address of 'pdd->qpd' is non-NULL

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c
   105		pdd = workarea->pdd;
   106		dqm = pdd->dev->dqm;
			^^^^^^^^
			Dereference.

   107		qpd = &pdd->qpd;
   108
   109		if (!pdd || !dqm || !qpd)
		    ^^^^		^^^^
pdd is checked too late and qpd can't possibly be NULL.

   110			return;
   111

regards,
dan carpenter

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx