Re: [PATCH] drm/amdgpu/display: drop the reduction loop when setting the sync groups

2020-05-29 Thread Kazlauskas, Nicholas

On 2020-05-29 2:04 p.m., Alex Deucher wrote:

On Fri, May 29, 2020 at 9:56 AM Kazlauskas, Nicholas
 wrote:


On 2020-05-28 10:06 a.m., Alex Deucher wrote:

The logic for blanked is not the same as having a plane_state. Technically
you can drive an OTG without anything connected in the front end and it'll
just draw out the back color which is distinct from having the OTG be blanked.
If we add planes or unblank the OTG later then we'll still want the
synchronization.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/781
Fixes: 5fc0cbfad45648 ("drm/amd/display: determine if a pipe is synced by plane state")
Cc: nicholas.kazlaus...@amd.com
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/display/dc/core/dc.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 04c3d9f7e323..6279520f7873 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1040,14 +1040,6 @@ static void program_timing_sync(
   status->timing_sync_info.master = false;

   }
-		/* remove any other pipes with plane as they have already been synced */
-		for (j = j + 1; j < group_size; j++) {
-			if (pipe_set[j]->plane_state) {
-				group_size--;
-				pipe_set[j] = pipe_set[group_size];
-				j--;
-			}
-		}



Looking at this again, I think I may understand the issue this was
trying to work around.

If we try to force timing synchronization on displays that are currently
active then this is going to force reset the vertical position,
resulting in screen corruption.

So what this logic was attempting to do was ensure that timing
synchronization only happens when committing two streams at a time
without any image on the screen.

Maybe it'd be best to just blank these streams out first, but for now,
let's actually go back to fixing this by applying the actual dpg/tg
check that Wenjing suggests, something like:

	if (pool->opps[i]->funcs->dpg_is_blanked)
		s.blank_enabled = pool->opps[i]->funcs->dpg_is_blanked(pool->opps[i]);
	else
		s.blank_enabled = tg->funcs->is_blanked(tg);



Hmm, it's not clear to me where this code needs to go.  Can you point
me in the right direction or provide a quick patch?

Thanks,

Alex


The old code used to check !tg->funcs->is_blanked(tg), i.e. it dropped the
pipe from the group if it's currently active.


The issue is that on newer ASICs it's now the DPG that indicates blanking
from the hardware side, so we should replace the !plane_state check with a
check for !dpg_is_blanked first, falling back to !is_blanked if the DPG
doesn't exist.
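
To make that concrete, here's a rough, untested sketch of how the dropped
reduction loop could be rewritten with that check (the stream_res.opp/tg
access paths are my assumption of where the OPP and TG hang off the pipe
context):

	/* Sketch only: drop pipes that are currently unblanked (actively
	 * scanning out) instead of pipes that merely have a plane_state.
	 */
	for (j = j + 1; j < group_size; j++) {
		bool is_blanked;
		struct output_pixel_processor *opp = pipe_set[j]->stream_res.opp;
		struct timing_generator *tg = pipe_set[j]->stream_res.tg;

		/* On newer ASICs the DPG is the blanking indicator. */
		if (opp->funcs->dpg_is_blanked)
			is_blanked = opp->funcs->dpg_is_blanked(opp);
		else
			is_blanked = tg->funcs->is_blanked(tg);

		if (!is_blanked) {
			group_size--;
			pipe_set[j] = pipe_set[group_size];
			j--;
		}
	}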


Regards,
Nicholas Kazlauskas






The reason we have this issue in the first place is that amdgpu_dm
usually commits a dc_state with the planes already in it instead of
committing them later, so plane_state is typically non-NULL.

Regards,
Nicholas Kazlauskas



		if (group_size > 1) {
			dc->hwss.enable_timing_synchronization(







Re: [PATCH] drm/amdgpu/display: drop the reduction loop when setting the sync groups

2020-05-29 Thread Alex Deucher
On Fri, May 29, 2020 at 9:56 AM Kazlauskas, Nicholas
 wrote:
>
> On 2020-05-28 10:06 a.m., Alex Deucher wrote:
> > The logic for blanked is not the same as having a plane_state. Technically
> > you can drive an OTG without anything connected in the front end and it'll
> > just draw out the back color which is distinct from having the OTG be blanked.
> > If we add planes or unblank the OTG later then we'll still want the
> > synchronization.
> >
> > Bug: https://gitlab.freedesktop.org/drm/amd/issues/781
> > Fixes: 5fc0cbfad45648 ("drm/amd/display: determine if a pipe is synced by plane state")
> > Cc: nicholas.kazlaus...@amd.com
> > Signed-off-by: Alex Deucher
> > ---
> >   drivers/gpu/drm/amd/display/dc/core/dc.c | 8 --------
> >   1 file changed, 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
> > index 04c3d9f7e323..6279520f7873 100644
> > --- a/drivers/gpu/drm/amd/display/dc/core/dc.c
> > +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
> > @@ -1040,14 +1040,6 @@ static void program_timing_sync(
> >   status->timing_sync_info.master = false;
> >
> >   }
> > -		/* remove any other pipes with plane as they have already been synced */
> > -		for (j = j + 1; j < group_size; j++) {
> > -			if (pipe_set[j]->plane_state) {
> > -				group_size--;
> > -				pipe_set[j] = pipe_set[group_size];
> > -				j--;
> > -			}
> > -		}
>
>
> Looking at this again, I think I may understand the issue this was
> trying to work around.
>
> If we try to force timing synchronization on displays that are currently
> active then this is going to force reset the vertical position,
> resulting in screen corruption.
>
> So what this logic was attempting to do was ensure that timing
> synchronization only happens when committing two streams at a time
> without any image on the screen.
>
> Maybe it'd be best to just blank these streams out first, but for now,
> let's actually go back to fixing this by applying the actual dpg/tg
> check that Wenjing suggests, something like:
>
> 	if (pool->opps[i]->funcs->dpg_is_blanked)
> 		s.blank_enabled = pool->opps[i]->funcs->dpg_is_blanked(pool->opps[i]);
> 	else
> 		s.blank_enabled = tg->funcs->is_blanked(tg);
>

Hmm, it's not clear to me where this code needs to go.  Can you point
me in the right direction or provide a quick patch?

Thanks,

Alex

>
>
> The reason why we have this issue in the first place is because
> amdgpu_dm usually commits a dc_state with the planes already in it
> instead of committing them later, so plane_state not being NULL is
> typically true.
>
> Regards,
> Nicholas Kazlauskas
>
> >
> >   if (group_size > 1) {
> >   dc->hwss.enable_timing_synchronization(
> >
>


Re: [PATCH] drm/amdgpu/display: drop the reduction loop when setting the sync groups

2020-05-29 Thread Kazlauskas, Nicholas

On 2020-05-28 10:06 a.m., Alex Deucher wrote:

The logic for blanked is not the same as having a plane_state. Technically
you can drive an OTG without anything connected in the front end and it'll
just draw out the back color which is distinct from having the OTG be blanked.
If we add planes or unblank the OTG later then we'll still want the
synchronization.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/781
Fixes: 5fc0cbfad45648 ("drm/amd/display: determine if a pipe is synced by plane state")
Cc: nicholas.kazlaus...@amd.com
Signed-off-by: Alex Deucher
---
  drivers/gpu/drm/amd/display/dc/core/dc.c | 8 --------
  1 file changed, 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 04c3d9f7e323..6279520f7873 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1040,14 +1040,6 @@ static void program_timing_sync(
 			status->timing_sync_info.master = false;
 
 		}
-		/* remove any other pipes with plane as they have already been synced */
-		for (j = j + 1; j < group_size; j++) {
-			if (pipe_set[j]->plane_state) {
-				group_size--;
-				pipe_set[j] = pipe_set[group_size];
-				j--;
-			}
-		}



Looking at this again, I think I may understand the issue this was 
trying to work around.


If we try to force timing synchronization on displays that are currently 
active then this is going to force reset the vertical position, 
resulting in screen corruption.


So what this logic was attempting to do was ensure that timing 
synchronization only happens when committing two streams at a time 
without any image on the screen.


Maybe it'd be best to just blank these streams out first, but for now, 
let's actually go back to fixing this by applying the actual dpg/tg 
check that Wenjing suggests, something like:


	if (pool->opps[i]->funcs->dpg_is_blanked)
		s.blank_enabled = pool->opps[i]->funcs->dpg_is_blanked(pool->opps[i]);
	else
		s.blank_enabled = tg->funcs->is_blanked(tg);



The reason we have this issue in the first place is that amdgpu_dm
usually commits a dc_state with the planes already in it instead of
committing them later, so plane_state is typically non-NULL.


Regards,
Nicholas Kazlauskas

  
		if (group_size > 1) {
			dc->hwss.enable_timing_synchronization(





Re: [PATCH] drm/radeon: Convert get_user_pages() --> pin_user_pages()

2020-05-29 Thread Souptick Joarder
On Thu, May 28, 2020 at 12:37 AM John Hubbard  wrote:
>
> On 2020-05-27 01:51, Daniel Vetter wrote:
> > On Wed, May 27, 2020 at 10:48:52AM +0200, Daniel Vetter wrote:
> >> On Tue, May 26, 2020 at 03:57:45PM -0700, John Hubbard wrote:
> >>> On 2020-05-26 14:00, Souptick Joarder wrote:
>  This code was using get_user_pages(), in a "Case 2" scenario
>  (DMA/RDMA), using the categorization from [1]. That means that it's
>  time to convert the get_user_pages() + release_pages() calls to
>  pin_user_pages() + unpin_user_pages() calls.
> 
>  There is some helpful background in [2]: basically, this is a small
>  part of fixing a long-standing disconnect between pinning pages, and
>  file systems' use of those pages.
> 
>  [1] Documentation/core-api/pin_user_pages.rst
> 
>  [2] "Explicit pinning of user-space pages":
>    https://lwn.net/Articles/807108/
> >>
> >> I don't think this is a case 2 here, nor is it any of the others. Feels
> >> like it's not covered at all by the doc.
> >>
> >> radeon has an mmu notifier (might be a bit broken, but hey, whatever, there
> >> are other drivers which have the same concept, but less broken). So when you
> >> do an munmap, radeon will release the page refcount.
> >
>
> Aha, thanks Daniel. I withdraw my misinformed ACK, then.
>
> > I forgot to add: It's also not case 3, since there's no hw page fault
> > support. It's all faked in software, and explicitly synchronizes against
> > pending io (or preempts it, that depends a bit upon the jobs running).
> >
>
> This is what case 3 was *intended* to cover, but it looks like case 3 needs to
> be written a little better. I'll attempt that, and Cc you on the actual patch
> to -mm. (I think we also need a case 5 for an unrelated scenario, too, so
> it's time.)

There was no *case 5* in the other patch posted in -mm. Do we need to add it?

>
>
> thanks,
> --
> John Hubbard
> NVIDIA
>
>
> >> Which case is that?
> >>
> >> Note that currently only amdgpu doesn't work like that for gpu dma
> >> directly to userspace ranges; it uses hmm and afaiui doesn't hold a full
> >> page pin refcount.
> >>
> >> Cheers, Daniel
> >>
> >>
> 
>  Signed-off-by: Souptick Joarder 
>  Cc: John Hubbard 
> 
>  Hi,
> 
>  I've compile tested this, but am unable to run-time test it, so any
>  testing help is much appreciated.
>  ---
> drivers/gpu/drm/radeon/radeon_ttm.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
> 
>  diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
>  index 5d50c9e..e927de2 100644
>  --- a/drivers/gpu/drm/radeon/radeon_ttm.c
>  +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
>  @@ -506,7 +506,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
> uint64_t userptr = gtt->userptr + pinned * PAGE_SIZE;
> struct page **pages = ttm->pages + pinned;
>  -  r = get_user_pages(userptr, num_pages, write ? FOLL_WRITE : 0,
>  +  r = pin_user_pages(userptr, num_pages, write ? FOLL_WRITE : 0,
>    pages, NULL);
> if (r < 0)
> goto release_pages;
>  @@ -535,7 +535,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
> kfree(ttm->sg);
> release_pages:
>  -  release_pages(ttm->pages, pinned);
>  +  unpin_user_pages(ttm->pages, pinned);
> return r;
> }
>  @@ -562,7 +562,7 @@ static void radeon_ttm_tt_unpin_userptr(struct ttm_tt *ttm)
> set_page_dirty(page);
> >>>
> >>>
> >>> Maybe we also need a preceding patch, to fix the above? It should be
> >>> set_page_dirty_lock(), rather than set_page_dirty(), unless I'm
> >>> overlooking something (which is very possible!).
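
For reference, a minimal untested sketch of the unpin path with that
suggestion applied, assuming the surrounding radeon_ttm_tt_unpin_userptr()
sg-iteration loop:

	for_each_sg_page(ttm->sg->sgl, &sg_iter, ttm->sg->nents, 0) {
		struct page *page = sg_page_iter_page(&sg_iter);

		/* set_page_dirty() assumes the caller holds the page lock;
		 * set_page_dirty_lock() takes it itself.
		 */
		if (!(gtt->userflags & RADEON_GEM_USERPTR_READONLY))
			set_page_dirty_lock(page);	/* was: set_page_dirty(page) */

		mark_page_accessed(page);
		unpin_user_page(page);			/* was: put_page(page) */
	}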
> >>>
> >>> Either way, from a tunnel vision perspective of changing gup to pup, this
> >>> looks good to me, so
> >>>
> >>>  Acked-by: John Hubbard 
> >>>
> >>>
> >>> thanks,
> >>> --
> >>> John Hubbard
> >>> NVIDIA
> >>>
> mark_page_accessed(page);
>  -  put_page(page);
>  +  unpin_user_page(page);
> }
> sg_free_table(ttm->sg);
> 
> >>>
> >>
> >> --
> >> Daniel Vetter
> >> Software Engineer, Intel Corporation
> >> http://blog.ffwll.ch
> >
>


Re: [PATCH] drm/radeon: Convert get_user_pages() --> pin_user_pages()

2020-05-29 Thread John Hubbard

On 2020-05-28 23:49, Souptick Joarder wrote:
...

This is what case 3 was *intended* to cover, but it looks like case 3 needs to
be written a little better. I'll attempt that, and Cc you on the actual patch
to -mm. (I think we also need a case 5 for an unrelated scenario, too, so
it's time.)


There was no *case 5* in the other patch posted in -mm. Do we need to add it?



Working on figuring that out [1], but it's not directly relevant to this thread.
Maybe I shouldn't have brought it up here. :)


[1] https://lore.kernel.org/r/20200529070343.gl14...@quack2.suse.cz

thanks,
John Hubbard
NVIDIA





Re: [PATCH] drm/amdgpu/pm: don't bail for in_suspend

2020-05-29 Thread Michel Dänzer
On 2020-05-28 12:51 a.m., Alex Deucher wrote:
> Otherwise we disable sysfs/debugfs access with runtime pm.
> 
> Fixes: f7c8d853b029df ("drm/amdgpu/pm: return an error during GPU reset or suspend")
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 114 -
>  1 file changed, 57 insertions(+), 57 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> index 808884aaf36d..775e389c9a13 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> @@ -163,7 +163,7 @@ static ssize_t amdgpu_get_power_dpm_state(struct device *dev,
>   enum amd_pm_state_type pm;
>   int ret;
>  
> - if (adev->in_gpu_reset || adev->in_suspend)
> + if (adev->in_gpu_reset)
>   return -EPERM;
>  
>   ret = pm_runtime_get_sync(ddev->dev);
> @@ -199,7 +199,7 @@ static ssize_t amdgpu_set_power_dpm_state(struct device *dev,
>   enum amd_pm_state_type  state;
>   int ret;
>  
> - if (adev->in_gpu_reset || adev->in_suspend)
> + if (adev->in_gpu_reset)
>   return -EPERM;
>  
>   if (strncmp("battery", buf, strlen("battery")) == 0)

Might be worth moving this check into a helper function; that would make
similar changes easier in the future. :) Can be a separate change, of
course.
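
Something along these lines, perhaps (untested; the helper name is made
up for illustration):

	/* Hypothetical helper: one place to gate sysfs/debugfs access and
	 * take the runtime PM reference, so future policy changes only
	 * touch one spot.
	 */
	static int amdgpu_pm_get_access(struct amdgpu_device *adev,
					struct device *dev)
	{
		if (adev->in_gpu_reset)
			return -EPERM;

		return pm_runtime_get_sync(dev);
	}

Each sysfs handler would then open with something like
"ret = amdgpu_pm_get_access(adev, ddev->dev); if (ret < 0) return ret;"
instead of duplicating the check.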


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


Re: amdgpu doesn't do implicit sync, requires drivers to do it in IBs

2020-05-29 Thread Christian König

On 28.05.20 at 21:35, Marek Olšák wrote:
On Thu, May 28, 2020 at 2:12 PM Christian König
<christian.koe...@amd.com> wrote:


On 28.05.20 at 18:06, Marek Olšák wrote:

On Thu, May 28, 2020 at 10:40 AM Christian König
<christian.koe...@amd.com> wrote:

On 28.05.20 at 12:06, Michel Dänzer wrote:
> On 2020-05-28 11:11 a.m., Christian König wrote:
>> Well we still need implicit sync [...]
> Yeah, this isn't about "we don't want implicit sync", it's about "amdgpu
> doesn't ensure later jobs fully see the effects of previous implicitly
> synced jobs", requiring userspace to do pessimistic flushing.

Yes, exactly that.

For the background: We also do this flushing for explicit syncs. And
when this was implemented 2-3 years ago we first did the flushing for
implicit sync as well.

That was immediately reverted and then implemented differently because
it caused severe performance problems in some use cases.

I'm not sure of the root cause of these performance problems. My
assumption was always that we insert too many pipeline syncs, but
Marek doesn't seem to think it could be that.

On the one hand I'm rather keen to remove the extra handling and just
always use the explicit handling for everything because it simplifies
the kernel code quite a bit. On the other hand I don't want to run into
this performance problem again.

Additionally, what the kernel does is a "full" pipeline sync, e.g. we
busy wait for the full hardware pipeline to drain. That might be
overkill if you just want to do some flushing so that the next shader
sees the stuff written, but I'm not an expert on that.


Do we busy-wait on the CPU or in WAIT_REG_MEM?

WAIT_REG_MEM is what UMDs do and should be faster.


We use WAIT_REG_MEM to wait for an EOP fence value to reach memory.

We use this for a couple of things, especially to make sure that
the hardware is idle before changing VMID to page table associations.
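
Conceptually, the pattern being discussed looks roughly like this; the
emit helpers are illustrative names, not the real amdgpu ring functions:

	/* Producer: EOP event writes the fence sequence number to memory
	 * once the whole pipeline has drained past this point.
	 */
	emit_fence_eop(ring, fence_gpu_addr, seq);

	/* Consumer: the CP polls the fence location and stalls the ring
	 * until the value written by the EOP event shows up.
	 */
	emit_wait_reg_mem(ring,
			  fence_gpu_addr,	/* memory location to poll */
			  seq,			/* reference value */
			  0xffffffff,		/* compare mask */
			  WAIT_MEM_GEQUAL);	/* wait until *addr >= seq */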

What about your idea of having an extra dw in the shared BOs
indicating that they are flushed?

As far as I understand it an EOS or other event might be sufficient
for the caches as well. And you could insert the WAIT_REG_MEM directly
before the first draw using the texture and not before the whole IB.

Could be that we can optimize this even more than what we do in the
kernel.

Christian.


Adding fences into BOs would be bad, because all UMDs would have to 
handle them.


Yeah, already assumed that this is the biggest blocker.


Is it possible to do this in the ring buffer:
if (fence_signalled) {
   indirect_buffer(dependent_IB);
   indirect_buffer(other_IB);
} else {
   indirect_buffer(other_IB);
   wait_reg_mem(fence);
   indirect_buffer(dependent_IB);
}


That's maybe possible, but at least not easily implementable.


Or we might have to wait for a hw scheduler.


I'm still fine doing the pipeline sync for implicit sync as well, I just
need somebody to confirm for me that this doesn't backfire in some case.




Does the kernel sync when the driver fd is different, or when the 
context is different?


Only when the driver fd is different.

Christian.



Marek




[bug report] drm/amdkfd: Track SDMA utilization per process

2020-05-29 Thread Dan Carpenter
Hello Mukul Joshi,

This is a semi-automatic email about new static checker warnings.

The patch 522b89c63370: "drm/amdkfd: Track SDMA utilization per process"
from May 26, 2020, leads to the following Smatch complaint:

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:109 kfd_sdma_activity_worker()
	warn: variable dereferenced before check 'pdd' (see line 106)

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:109 kfd_sdma_activity_worker()
	warn: address of 'pdd->qpd' is non-NULL

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c
   105		pdd = workarea->pdd;
   106		dqm = pdd->dev->dqm;
		      ^^^^^^^^
		      Dereference.

   107		qpd = &pdd->qpd;
   108
   109		if (!pdd || !dqm || !qpd)
		     ^^^^           ^^^^
pdd is checked too late and qpd can't possibly be NULL.

   110			return;
   111
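
One way to resolve both warnings (an untested sketch): validate pdd
before the first dereference, and drop the qpd check since the address
of a struct member can't be NULL:

	pdd = workarea->pdd;
	if (!pdd)
		return;

	dqm = pdd->dev->dqm;
	qpd = &pdd->qpd;

	if (!dqm)
		return;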

regards,
dan carpenter