Re: [PATCH] drm/amd: fix potential memleak in err branch

2020-06-19 Thread Felix Kuehling
Hi Bernard,

I just applied a patch from another contributor for the kfd_topology
part of this. See
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=amd-staging-drm-next&id=fc0fe8138309fee303bd12991f12b23f01bbf58c

Please rebase your patch on that. I believe that should only leave the
fixes in kfd_process.c.

Thanks,
  Felix

Am 2020-06-19 um 7:45 a.m. schrieb Bernard Zhao:
> The function kobject_init_and_add alloc memory like:
> kobject_init_and_add->kobject_add_varg->kobject_set_name_vargs
> ->kvasprintf_const->kstrdup_const->kstrdup->kmalloc_track_caller
> ->kmalloc_slab, in err branch this memory not free. If use
> kmemleak, this path maybe catched.
> These changes are to add kobject_put in kobject_init_and_add
> failed branch, fix potential memleak.
>
> Signed-off-by: Bernard Zhao 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  2 ++
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 20 +++-
>  2 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index d27221ddcdeb..5ee4d6cfb16d 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -124,6 +124,7 @@ void kfd_procfs_init(void)
>   if (ret) {
>   pr_warn("Could not create procfs proc folder");
>   /* If we fail to create the procfs, clean up */
> + kobject_put(procfs.kobj);
>   kfd_procfs_shutdown();
>   }
>  }
> @@ -428,6 +429,7 @@ struct kfd_process *kfd_create_process(struct file *filep)
>  (int)process->lead_thread->pid);
>   if (ret) {
>   pr_warn("Creating procfs pid directory failed");
> + kobject_put(process->kobj);
>   goto out;
>   }
>  
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index bb77f7af2b6d..dc3c4149f860 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -632,8 +632,10 @@ static int kfd_build_sysfs_node_entry(struct 
> kfd_topology_device *dev,
>  
>   ret = kobject_init_and_add(dev->kobj_node, &node_type,
>   sys_props.kobj_nodes, "%d", id);
> - if (ret < 0)
> + if (ret < 0) {
> + kobject_put(dev->kobj_node);
>   return ret;
> + }
>  
>   dev->kobj_mem = kobject_create_and_add("mem_banks", dev->kobj_node);
>   if (!dev->kobj_mem)
> @@ -680,8 +682,10 @@ static int kfd_build_sysfs_node_entry(struct 
> kfd_topology_device *dev,
>   return -ENOMEM;
>   ret = kobject_init_and_add(mem->kobj, &mem_type,
>   dev->kobj_mem, "%d", i);
> - if (ret < 0)
> + if (ret < 0) {
> + kobject_put(mem->kobj);
>   return ret;
> + }
>  
>   mem->attr.name = "properties";
>   mem->attr.mode = KFD_SYSFS_FILE_MODE;
> @@ -699,8 +703,10 @@ static int kfd_build_sysfs_node_entry(struct 
> kfd_topology_device *dev,
>   return -ENOMEM;
>   ret = kobject_init_and_add(cache->kobj, &cache_type,
>   dev->kobj_cache, "%d", i);
> - if (ret < 0)
> + if (ret < 0) {
> + kobject_put(cache->kobj);
>   return ret;
> + }
>  
>   cache->attr.name = "properties";
>   cache->attr.mode = KFD_SYSFS_FILE_MODE;
> @@ -718,8 +724,10 @@ static int kfd_build_sysfs_node_entry(struct 
> kfd_topology_device *dev,
>   return -ENOMEM;
>   ret = kobject_init_and_add(iolink->kobj, &iolink_type,
>   dev->kobj_iolink, "%d", i);
> - if (ret < 0)
> + if (ret < 0) {
> + kobject_put(iolink->kobj);
>   return ret;
> + }
>  
>   iolink->attr.name = "properties";
>   iolink->attr.mode = KFD_SYSFS_FILE_MODE;
> @@ -798,8 +806,10 @@ static int kfd_topology_update_sysfs(void)
>   ret = kobject_init_and_add(sys_props.kobj_topology,
>   &sysprops_type,  &kfd_device->kobj,
>   "topology");
> - if (ret < 0)
> + if (ret < 0) {
> + kobject_put(sys_props.kobj_topology);
>   return ret;
> + }
>  
>   sys_props.kobj_nodes = kobject_create_and_add("nodes",
>   sys_props.kobj_topology);
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Jerome Glisse
On Fri, Jun 19, 2020 at 10:43:20PM +0200, Daniel Vetter wrote:
> On Fri, Jun 19, 2020 at 10:10 PM Jerome Glisse  wrote:
> >
> > On Fri, Jun 19, 2020 at 03:18:49PM -0300, Jason Gunthorpe wrote:
> > > On Fri, Jun 19, 2020 at 02:09:35PM -0400, Jerome Glisse wrote:
> > > > On Fri, Jun 19, 2020 at 02:23:08PM -0300, Jason Gunthorpe wrote:
> > > > > On Fri, Jun 19, 2020 at 06:19:41PM +0200, Daniel Vetter wrote:
> > > > >
> > > > > > The madness is only that device B's mmu notifier might need to wait
> > > > > > for fence_B so that the dma operation finishes. Which in turn has to
> > > > > > wait for device A to finish first.
> > > > >
> > > > > So, it sound, fundamentally you've got this graph of operations across
> > > > > an unknown set of drivers and the kernel cannot insert itself in
> > > > > dma_fence hand offs to re-validate any of the buffers involved?
> > > > > Buffers which by definition cannot be touched by the hardware yet.
> > > > >
> > > > > That really is a pretty horrible place to end up..
> > > > >
> > > > > Pinning really is right answer for this kind of work flow. I think
> > > > > converting pinning to notifers should not be done unless notifier
> > > > > invalidation is relatively bounded.
> > > > >
> > > > > I know people like notifiers because they give a bit nicer performance
> > > > > in some happy cases, but this cripples all the bad cases..
> > > > >
> > > > > If pinning doesn't work for some reason maybe we should address that?
> > > >
> > > > Note that the dma fence is only true for user ptr buffer which predate
> > > > any HMM work and thus were using mmu notifier already. You need the
> > > > mmu notifier there because of fork and other corner cases.
> > >
> > > I wonder if we should try to fix the fork case more directly - RDMA
> > > has this same problem and added MADV_DONTFORK a long time ago as a
> > > hacky way to deal with it.
> > >
> > > Some crazy page pin that resolved COW in a way that always kept the
> > > physical memory with the mm that initiated the pin?
> >
> > Just no way to deal with it easily, i thought about forcing the
> > anon_vma (page->mapping for anonymous page) to the anon_vma that
> > belongs to the vma against which the GUP was done but it would
> > break things if page is already in other branch of a fork tree.
> > Also this forbid fast GUP.
> >
> > Quite frankly the fork was not the main motivating factor. GPU
> > can pin potentialy GBytes of memory thus we wanted to be able
> > to release it but since Michal changes to reclaim code this is
> > no longer effective.
> 
> What where how? My patch to annote reclaim paths with mmu notifier
> possibility just landed in -mm, so if direct reclaim can't reclaim mmu
> notifier'ed stuff anymore we need to know.
> 
> Also this would resolve the entire pain we're discussing in this
> thread about dma_fence_wait deadlocking against anything that's not
> GFP_ATOMIC ...

Sorry my bad, reclaim still works, only oom skip. It was couple
years ago and i thought that some of the things discuss while
back did make it upstream.

It is probably a good time to also point out that what i wanted
to do is have all the mmu notifier callback provide some kind
of fence (not dma fence) so that we can split the notification
into step:
A- schedule notification on all devices/system get fences
   this step should minimize lock dependency and should
   not have to wait for anything also best if you can avoid
   memory allocation for instance by pre-allocating what
   you need for notification.
B- mm can do things like unmap but can not map new page
   so write special swap pte to cpu page table
C- wait on each fences from A
... resume old code ie replace pte or finish unmap ...

The idea here is that at step C the core mm can decide to back
off if any fence returned from A have to wait. This means that
every device is invalidating for nothing but if we get there
then it might still be a good thing as next time around maybe
the kernel would be successfull without a wait.

This would allow things like reclaim to make forward progress
and skip over or limit wait time to given timeout.

Also I thought to extend this even to multi-cpu tlb flush so
that device and CPUs follow same pattern and we can make //
progress on each.


Getting to such scheme is a lot of work. My plan was to first
get the fence as part of the notifier user API and hide it from
mm inside notifier common code. Then update each core mm path to
new model and see if there is any benefit from it. Reclaim would
be first candidate.

Cheers,
Jérôme

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Daniel Vetter
On Fri, Jun 19, 2020 at 10:10 PM Jerome Glisse  wrote:
>
> On Fri, Jun 19, 2020 at 03:18:49PM -0300, Jason Gunthorpe wrote:
> > On Fri, Jun 19, 2020 at 02:09:35PM -0400, Jerome Glisse wrote:
> > > On Fri, Jun 19, 2020 at 02:23:08PM -0300, Jason Gunthorpe wrote:
> > > > On Fri, Jun 19, 2020 at 06:19:41PM +0200, Daniel Vetter wrote:
> > > >
> > > > > The madness is only that device B's mmu notifier might need to wait
> > > > > for fence_B so that the dma operation finishes. Which in turn has to
> > > > > wait for device A to finish first.
> > > >
> > > > So, it sound, fundamentally you've got this graph of operations across
> > > > an unknown set of drivers and the kernel cannot insert itself in
> > > > dma_fence hand offs to re-validate any of the buffers involved?
> > > > Buffers which by definition cannot be touched by the hardware yet.
> > > >
> > > > That really is a pretty horrible place to end up..
> > > >
> > > > Pinning really is right answer for this kind of work flow. I think
> > > > converting pinning to notifers should not be done unless notifier
> > > > invalidation is relatively bounded.
> > > >
> > > > I know people like notifiers because they give a bit nicer performance
> > > > in some happy cases, but this cripples all the bad cases..
> > > >
> > > > If pinning doesn't work for some reason maybe we should address that?
> > >
> > > Note that the dma fence is only true for user ptr buffer which predate
> > > any HMM work and thus were using mmu notifier already. You need the
> > > mmu notifier there because of fork and other corner cases.
> >
> > I wonder if we should try to fix the fork case more directly - RDMA
> > has this same problem and added MADV_DONTFORK a long time ago as a
> > hacky way to deal with it.
> >
> > Some crazy page pin that resolved COW in a way that always kept the
> > physical memory with the mm that initiated the pin?
>
> Just no way to deal with it easily, i thought about forcing the
> anon_vma (page->mapping for anonymous page) to the anon_vma that
> belongs to the vma against which the GUP was done but it would
> break things if page is already in other branch of a fork tree.
> Also this forbid fast GUP.
>
> Quite frankly the fork was not the main motivating factor. GPU
> can pin potentialy GBytes of memory thus we wanted to be able
> to release it but since Michal changes to reclaim code this is
> no longer effective.

What where how? My patch to annote reclaim paths with mmu notifier
possibility just landed in -mm, so if direct reclaim can't reclaim mmu
notifier'ed stuff anymore we need to know.

Also this would resolve the entire pain we're discussing in this
thread about dma_fence_wait deadlocking against anything that's not
GFP_ATOMIC ...
-Daniel

>
> User buffer should never end up in those weird corner case, iirc
> the first usage was for xorg exa texture upload, then generalize
> to texture upload in mesa and latter on to more upload cases
> (vertices, ...). At least this is what i remember today. So in
> those cases we do not expect fork, splice, mremap, mprotect, ...
>
> Maybe we can audit how user ptr buffer are use today and see if
> we can define a usage pattern that would allow to cut corner in
> kernel. For instance we could use mmu notifier just to block CPU
> pte update while we do GUP and thus never wait on dma fence.
>
> Then GPU driver just keep the GUP pin around until they are done
> with the page. They can also use the mmu notifier to keep a flag
> so that the driver know if it needs to redo a GUP ie:
>
> The notifier path:
>GPU_mmu_notifier_start_callback(range)
> gpu_lock_cpu_pagetable(range)
> for_each_bo_in(bo, range) {
> bo->need_gup = true;
> }
> gpu_unlock_cpu_pagetable(range)
>
>GPU_validate_buffer_pages(bo)
> if (!bo->need_gup)
> return;
> put_pages(bo->pages);
> range = bo_vaddr_range(bo)
> gpu_lock_cpu_pagetable(range)
> GUP(bo->pages, range)
> gpu_unlock_cpu_pagetable(range)
>
>
> Depending on how user_ptr are use today this could work.
>
>
> > (isn't this broken for O_DIRECT as well anyhow?)
>
> Yes it can in theory, if you have an application that does O_DIRECT
> and fork concurrently (ie O_DIRECT in one thread and fork in another).
> Note that O_DIRECT after fork is fine, it is an issue only if GUP_fast
> was able to lookup a page with write permission before fork had the
> chance to update it to read only for COW.
>
> But doing O_DIRECT (or anything that use GUP fast) in one thread and
> fork in another is inherently broken ie there is no way to fix it.
>
> See 17839856fd588f4ab6b789f482ed3ffd7c403e1f
>
> >
> > How does mmu_notifiers help the fork case anyhow? Block fork from
> > progressing?
>
> It enforce ordering between fork and GUP, if fork is first it blocks
> GUP and if forks is last then fork waits on GUP and then user buffer
> get invalidated.
>
> >
> > > I probably need to warn AMD folks again that using H

Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Jerome Glisse
On Fri, Jun 19, 2020 at 04:55:38PM -0300, Jason Gunthorpe wrote:
> On Fri, Jun 19, 2020 at 03:48:49PM -0400, Felix Kuehling wrote:
> > Am 2020-06-19 um 2:18 p.m. schrieb Jason Gunthorpe:
> > > On Fri, Jun 19, 2020 at 02:09:35PM -0400, Jerome Glisse wrote:
> > >> On Fri, Jun 19, 2020 at 02:23:08PM -0300, Jason Gunthorpe wrote:
> > >>> On Fri, Jun 19, 2020 at 06:19:41PM +0200, Daniel Vetter wrote:
> > >>>
> >  The madness is only that device B's mmu notifier might need to wait
> >  for fence_B so that the dma operation finishes. Which in turn has to
> >  wait for device A to finish first.
> > >>> So, it sound, fundamentally you've got this graph of operations across
> > >>> an unknown set of drivers and the kernel cannot insert itself in
> > >>> dma_fence hand offs to re-validate any of the buffers involved?
> > >>> Buffers which by definition cannot be touched by the hardware yet.
> > >>>
> > >>> That really is a pretty horrible place to end up..
> > >>>
> > >>> Pinning really is right answer for this kind of work flow. I think
> > >>> converting pinning to notifers should not be done unless notifier
> > >>> invalidation is relatively bounded. 
> > >>>
> > >>> I know people like notifiers because they give a bit nicer performance
> > >>> in some happy cases, but this cripples all the bad cases..
> > >>>
> > >>> If pinning doesn't work for some reason maybe we should address that?
> > >> Note that the dma fence is only true for user ptr buffer which predate
> > >> any HMM work and thus were using mmu notifier already. You need the
> > >> mmu notifier there because of fork and other corner cases.
> > > I wonder if we should try to fix the fork case more directly - RDMA
> > > has this same problem and added MADV_DONTFORK a long time ago as a
> > > hacky way to deal with it.
> > >
> > > Some crazy page pin that resolved COW in a way that always kept the
> > > physical memory with the mm that initiated the pin?
> > >
> > > (isn't this broken for O_DIRECT as well anyhow?)
> > >
> > > How does mmu_notifiers help the fork case anyhow? Block fork from
> > > progressing?
> > 
> > How much the mmu_notifier blocks fork progress depends, on quickly we
> > can preempt GPU jobs accessing affected memory. If we don't have
> > fine-grained preemption capability (graphics), the best we can do is
> > wait for the GPU jobs to complete. We can also delay submission of new
> > GPU jobs to the same memory until the MMU notifier is done. Future jobs
> > would use the new page addresses.
> > 
> > With fine-grained preemption (ROCm compute), we can preempt GPU work on
> > the affected adders space to minimize the delay seen by fork.
> > 
> > With recoverable device page faults, we can invalidate GPU page table
> > entries, so device access to the affected pages stops immediately.
> > 
> > In all cases, the end result is, that the device page table gets updated
> > with the address of the copied pages before the GPU accesses the COW
> > memory again.Without the MMU notifier, we'd end up with the GPU
> > corrupting memory of the other process.
> 
> The model here in fork has been wrong for a long time, and I do wonder
> how O_DIRECT manages to not be broken too.. I guess the time windows
> there are too small to get unlucky.

This was discuss extensively in the GUP works John have been doing.
Yes O_DIRECT can potentialy break but only if you are writting to
COW pages and you initiated the O_DIRECT right before the fork and
GUP happen before fork was able to write protect the pages.

If you O_DIRECT but use memory as input ie you are writting the
memory to the file not reading from the file. Then fork is harmless
as you are just reading memory. You can still face the COW uncertainty
(the process against which you did the O_DIRECT get "new" pages but your
O_DIRECT goes on with the "old" pages) but doing O_DIRECT and fork
concurently is asking for trouble.

> 
> If you have a write pin on a page then it should not be COW'd into the
> fork'd process but copied with the originating page remaining with the
> original mm.
> 
> I wonder if there is some easy way to achive that - if that is the
> main reason to use notifiers then it would be a better solution.

Not doable as page refcount can change for things unrelated to GUP, with
John changes we can identify GUP and we could potentialy copy GUPed page
instead of COW but this can potentialy slow down fork() and i am not sure
how acceptable this would be. Also this does not solve GUP against page
that are already in fork tree ie page P0 is in process A which forks,
we now have page P0 in process A and B. Now we have process A which forks
again and we have page P0 in A, B, and C. Here B and C are two branches
with root in A. B and/or C can keep forking and grow the fork tree.

Now if read only GUP on P0 happens in C (or B everything is symetrical in
respect to root A) then P0 might not be the page that is in C after the
GUP ie if something in C write to the virtual address correspo

[PATCH 30/30] drm/amd/display: 3.2.91

2020-06-19 Thread Rodrigo Siqueira
From: Aric Cyr 

Signed-off-by: Aric Cyr 
Reviewed-by: Aric Cyr 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc.h 
b/drivers/gpu/drm/amd/display/dc/dc.h
index ceba626bda2f..f7cb1354a635 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -42,7 +42,7 @@
 #include "inc/hw/dmcu.h"
 #include "dml/display_mode_lib.h"
 
-#define DC_VER "3.2.90"
+#define DC_VER "3.2.91"
 
 #define MAX_SURFACES 3
 #define MAX_PLANES 6
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 20/30] drm/amd/display: Fill in dmub_srv fw_version from firmware metadata

2020-06-19 Thread Rodrigo Siqueira
From: Nicholas Kazlauskas 

[Why]
DMCUB firmware version is now available from firmware metadata block.

We should be passing this into dmub_srv so we can know when to apply
firmware version specific functionality like using CW4 only instead
of the REGION4.

[How]
We don't have the helpers for DM to actually extract out firmware
metadata block themselves.

We could add that and add helpers in DM to grab this, but not every
creation sequence has firmware instruction before dmub_srv_create.

Easiest way to handle this is to fill this in automatically per DM in
the place we do have it - when calculating the region parameters. But
only fill it in if DM already hasn't in case we need to override with
a specific version.

We aren't do anything firmware version specific in dmub_srv_create
today that does require fw_version, so while it's a little unituitive
to do it when calculating region parameters it works for now.

Signed-off-by: Nicholas Kazlauskas 
Reviewed-by: Tony Cheng 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c 
b/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
index 9c924994a2c3..aa41dfa23020 100644
--- a/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
+++ b/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
@@ -281,6 +281,16 @@ dmub_srv_calc_region_info(struct dmub_srv *dmub,
if (fw_info) {
fw_state_size = fw_info->fw_region_size;
trace_buffer_size = fw_info->trace_buffer_size;
+
+   /**
+* If DM didn't fill in a version, then fill it in based on
+* the firmware meta now that we have it.
+*
+* TODO: Make it easier for driver to extract this out to
+* pass during creation.
+*/
+   if (dmub->fw_version == 0)
+   dmub->fw_version = fw_info->fw_version;
}
 
trace_buff->base = dmub_align(mail->top, 256);
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 24/30] drm/amd/display: enable seamless boot for dcn30

2020-06-19 Thread Rodrigo Siqueira
From: Martin Leung 

why:
seamless boots requires split of init_hw into hw and pipes to work. This
was implemented in dcn10_init_hw but did not apply yet to dcn30.

how:
Copy over dcn10_init_hw and adapt it to dcn30 using recent changes to
dcn3.  Behavior will be different in init sequence.

Signed-off-by: Martin Leung 
Reviewed-by: Anthony Koo 
Acked-by: Rodrigo Siqueira 
---
 .../drm/amd/display/dc/dcn30/dcn30_hwseq.c| 245 +-
 .../gpu/drm/amd/display/dc/dcn30/dcn30_init.c |   8 +-
 2 files changed, 126 insertions(+), 127 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
index 37c310dbb366..a5d750ed569e 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
@@ -47,6 +47,8 @@
 #include "mpc.h"
 #include "mcif_wb.h"
 #include "dc_dmub_srv.h"
+#include "link_hwss.h"
+#include "dpcd_defs.h"
 
 
 
@@ -427,7 +429,6 @@ void dcn30_init_hw(struct dc *dc)
struct dce_hwseq *hws = dc->hwseq;
struct dc_bios *dcb = dc->ctx->dc_bios;
struct resource_pool *res_pool = dc->res_pool;
-   struct dc_state  *context = dc->current_state;
uint32_t backlight = MAX_BACKLIGHT_LEVEL;
 
if (dc->clk_mgr && dc->clk_mgr->funcs->init_clocks)
@@ -437,153 +438,155 @@ void dcn30_init_hw(struct dc *dc)
if (res_pool->dccg->funcs->dccg_init)
res_pool->dccg->funcs->dccg_init(res_pool->dccg);
 
-   //Enable ability to power gate / don't force power on permanently
-   hws->funcs.enable_power_gating_plane(dc->hwseq, true);
-
if (IS_FPGA_MAXIMUS_DC(dc->ctx->dce_environment)) {
-   REG_WRITE(RBBMIF_TIMEOUT_DIS, 0x);
-   REG_WRITE(RBBMIF_TIMEOUT_DIS_2, 0x);
-
-   hws->funcs.dccg_init(hws);
 
-   REG_UPDATE(DCHUBBUB_GLOBAL_TIMER_CNTL, 
DCHUBBUB_GLOBAL_TIMER_REFDIV, 2);
-   REG_UPDATE(DCHUBBUB_GLOBAL_TIMER_CNTL, 
DCHUBBUB_GLOBAL_TIMER_ENABLE, 1);
REG_WRITE(REFCLK_CNTL, 0);
-   } else {
-   if (!dcb->funcs->is_accelerated_mode(dcb)) {
-   hws->funcs.bios_golden_init(dc);
-   hws->funcs.disable_vga(dc->hwseq);
-   }
+   REG_UPDATE(DCHUBBUB_GLOBAL_TIMER_CNTL, 
DCHUBBUB_GLOBAL_TIMER_ENABLE, 1);
+   REG_WRITE(DIO_MEM_PWR_CTRL, 0);
 
-   if (dc->ctx->dc_bios->fw_info_valid) {
-   res_pool->ref_clocks.xtalin_clock_inKhz =
-   
dc->ctx->dc_bios->fw_info.pll_info.crystal_frequency;
-
-   if (!IS_FPGA_MAXIMUS_DC(dc->ctx->dce_environment)) {
-   if (res_pool->dccg && res_pool->hubbub) {
-
-   
(res_pool->dccg->funcs->get_dccg_ref_freq)(res_pool->dccg,
-   
dc->ctx->dc_bios->fw_info.pll_info.crystal_frequency,
-   
&res_pool->ref_clocks.dccg_ref_clock_inKhz);
-
-   
(res_pool->hubbub->funcs->get_dchub_ref_freq)(res_pool->hubbub,
-   
res_pool->ref_clocks.dccg_ref_clock_inKhz,
-   
&res_pool->ref_clocks.dchub_ref_clock_inKhz);
-   } else {
-   // Not all ASICs have DCCG sw component
-   
res_pool->ref_clocks.dccg_ref_clock_inKhz =
-   
res_pool->ref_clocks.xtalin_clock_inKhz;
-   
res_pool->ref_clocks.dchub_ref_clock_inKhz =
-   
res_pool->ref_clocks.xtalin_clock_inKhz;
-   }
-   }
-   } else
-   ASSERT_CRITICAL(false);
+   if (!dc->debug.disable_clock_gate) {
+   /* enable all DCN clock gating */
+   REG_WRITE(DCCG_GATE_DISABLE_CNTL, 0);
 
-   for (i = 0; i < dc->link_count; i++) {
-   /* Power up AND update implementation according to the
-* required signal (which may be different from the
-* default signal on connector).
-*/
-   struct dc_link *link = dc->links[i];
+   REG_WRITE(DCCG_GATE_DISABLE_CNTL2, 0);
 
-   link->link_enc->funcs->hw_init(link->link_enc);
+   REG_UPDATE(DCFCLK_CNTL, DCFCLK_GATE_DIS, 0);
}
-   }
 
-   /* Power gate DSCs */
-   for (i = 0; i < res_pool->res_cap->num_dsc; i++)
-   hws->funcs.dsc_pg_control(hws, res_pool->dscs[i]->inst, false);

[PATCH 09/30] drm/amd/display: Correctly respond in psr enablement interface

2020-06-19 Thread Rodrigo Siqueira
From: Camille Cho 

[Why]
dc_link_set_psr_allow_active() always returns true, even in the case
that PSR is not supported.

[How]
Hook up the return value of dc_link_set_psr_allow_active().

Signed-off-by: Camille Cho 
Reviewed-by: Josip Pavic 
Acked-by: Anthony Koo 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index 114ee29132fa..f020b3b67f0d 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -2557,12 +2557,14 @@ bool dc_link_set_psr_allow_active(struct dc_link *link, 
bool allow_active, bool
struct dmcu *dmcu = dc->res_pool->dmcu;
struct dmub_psr *psr = dc->res_pool->psr;
 
+   link->psr_settings.psr_allow_active = allow_active;
+
if (psr != NULL && link->psr_settings.psr_feature_enabled)
psr->funcs->psr_enable(psr, allow_active);
else if ((dmcu != NULL && dmcu->funcs->is_dmcu_initialized(dmcu)) && 
link->psr_settings.psr_feature_enabled)
dmcu->funcs->set_psr_enable(dmcu, allow_active, wait);
-
-   link->psr_settings.psr_allow_active = allow_active;
+   else
+   return false;
 
return true;
 }
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 25/30] drm/amd/display: Compare v_front_porch when checking if streams are synchronizable

2020-06-19 Thread Rodrigo Siqueira
From: David Galiffi 

[Why]
If the front porch of the two timings differ, then there may not be
enough time while both streams are in vertical blank to perform a memory
clock change. This can hang the system.

[How]
Check the each streams timing.v_front_porch when determining if the two
streams are synchronizable.

Signed-off-by: David Galiffi 
Reviewed-by: Jun Lei 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 3d0003c69373..1000dc6daf72 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -399,6 +399,10 @@ bool resource_are_streams_timing_synchronizable(
!= stream2->timing.v_addressable)
return false;
 
+   if (stream1->timing.v_front_porch
+   != stream2->timing.v_front_porch)
+   return false;
+
if (stream1->timing.pix_clk_100hz
!= stream2->timing.pix_clk_100hz)
return false;
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 14/30] drm/amd/display: fine tune logic of edid max TMDS clock check

2020-06-19 Thread Rodrigo Siqueira
From: Dale Zhao 

[WHY]
Check max_tmds_clk_mhz firstly will restrict pixel clock under HDMI
1.4, thus HDMI2.0 port can't correctly support 4K 60Hz.

[HOW]
Fine tune the logic to check max_forum_tmds_clk_mhz firstly.

Signed-off-by: Dale Zhao 
Reviewed-by: Chris Park 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/dc_types.h | 5 ++---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_link_encoder.c | 3 ---
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc_types.h 
b/drivers/gpu/drm/amd/display/dc/dc_types.h
index f51e5766d8f5..d64241433548 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_types.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_types.h
@@ -255,15 +255,14 @@ struct dc_edid_caps {
uint8_t qs_bit;
uint8_t qy_bit;
 
+   uint32_t max_tmds_clk_mhz;
+
/*HDMI 2.0 caps*/
bool lte_340mcsc_scramble;
 
bool edid_hdmi;
bool hdr_supported;
 
-   uint32_t max_tmds_clk_mhz;
-   uint32_t max_forum_tmds_clk_mhz;
-
struct dc_panel_patch panel_patch;
 };
 
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_link_encoder.c 
b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_link_encoder.c
index a9af3f6fd8ec..81db0179f7ea 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_link_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_link_encoder.c
@@ -629,9 +629,6 @@ static bool dcn10_link_encoder_validate_hdmi_output(
if (edid_caps->max_tmds_clk_mhz != 0 &&
adjusted_pix_clk_100hz > edid_caps->max_tmds_clk_mhz * 
1)
return false;
-   if (edid_caps->max_forum_tmds_clk_mhz != 0 &&
-   adjusted_pix_clk_100hz > 
edid_caps->max_forum_tmds_clk_mhz * 1)
-   return false;
 
if (max_deep_color < crtc_timing->display_color_depth)
return false;
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 26/30] drm/amd/display: allow query ddc data over aux to be read only operation

2020-06-19 Thread Rodrigo Siqueira
From: Wenjing Liu 

[why]
Two issues:
1. Add read only operation support for query ddc data over aux.
2. Fix a bug where if read size is multiple of 16,
mot of the last read transaction will not be set to 0.

Signed-off-by: Wenjing Liu 
Reviewed-by: Jun Lei 
Acked-by: Rodrigo Siqueira 
---
 .../gpu/drm/amd/display/dc/core/dc_link_ddc.c | 29 ---
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c
index aefd29a440b5..be8f265976b0 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c
@@ -503,7 +503,7 @@ bool dal_ddc_service_query_ddc_data(
uint8_t *read_buf,
uint32_t read_size)
 {
-   bool ret = false;
+   bool success = true;
uint32_t payload_size =
dal_ddc_service_is_in_aux_transaction_mode(ddc) ?
DEFAULT_AUX_MAX_DATA_SIZE : EDID_SEGMENT_SIZE;
@@ -527,7 +527,6 @@ bool dal_ddc_service_query_ddc_data(
 *  but we want to read 256 over i2c*/
if (dal_ddc_service_is_in_aux_transaction_mode(ddc)) {
struct aux_payload payload;
-   bool read_available = true;
 
payload.i2c_over_aux = true;
payload.address = address;
@@ -536,21 +535,26 @@ bool dal_ddc_service_query_ddc_data(
 
if (write_size != 0) {
payload.write = true;
-   payload.mot = false;
+   /* should not set mot (middle of transaction) to 0
+* if there are pending read payloads
+*/
+   payload.mot = read_size == 0 ? false : true;
payload.length = write_size;
payload.data = write_buf;
 
-   ret = dal_ddc_submit_aux_command(ddc, &payload);
-   read_available = ret;
+   success = dal_ddc_submit_aux_command(ddc, &payload);
}
 
-   if (read_size != 0 && read_available) {
+   if (read_size != 0 && success) {
payload.write = false;
+   /* should set mot (middle of transaction) to 0
+* since it is the last payload to send
+*/
payload.mot = false;
payload.length = read_size;
payload.data = read_buf;
 
-   ret = dal_ddc_submit_aux_command(ddc, &payload);
+   success = dal_ddc_submit_aux_command(ddc, &payload);
}
} else {
struct i2c_command command = {0};
@@ -573,7 +577,7 @@ bool dal_ddc_service_query_ddc_data(
command.number_of_payloads =
dal_ddc_i2c_payloads_get_count(&payloads);
 
-   ret = dm_helpers_submit_i2c(
+   success = dm_helpers_submit_i2c(
ddc->ctx,
ddc->link,
&command);
@@ -581,7 +585,7 @@ bool dal_ddc_service_query_ddc_data(
dal_ddc_i2c_payloads_destroy(&payloads);
}
 
-   return ret;
+   return success;
 }
 
 bool dal_ddc_submit_aux_command(struct ddc_service *ddc,
@@ -598,7 +602,7 @@ bool dal_ddc_submit_aux_command(struct ddc_service *ddc,
 
do {
struct aux_payload current_payload;
-   bool is_end_of_payload = (retrieved + 
DEFAULT_AUX_MAX_DATA_SIZE) >
+   bool is_end_of_payload = (retrieved + 
DEFAULT_AUX_MAX_DATA_SIZE) >=
payload->length;
 
current_payload.address = payload->address;
@@ -607,7 +611,10 @@ bool dal_ddc_submit_aux_command(struct ddc_service *ddc,
current_payload.i2c_over_aux = payload->i2c_over_aux;
current_payload.length = is_end_of_payload ?
payload->length - retrieved : DEFAULT_AUX_MAX_DATA_SIZE;
-   current_payload.mot = !is_end_of_payload;
+   /* set mot (middle of transaction) to false
+* if it is the last payload
+*/
+   current_payload.mot = is_end_of_payload ? payload->mot:true;
current_payload.reply = payload->reply;
current_payload.write = payload->write;
 
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 10/30] drm/amd/display: [FW Promotion] Release 1.0.18

2020-06-19 Thread Rodrigo Siqueira
From: Anthony Koo 

[Header Changes]
- Update scratch information for boot status

Signed-off-by: Anthony Koo 
Reviewed-by: Anthony Koo 
Acked-by: Rodrigo Siqueira 
---
 .../gpu/drm/amd/display/dmub/inc/dmub_cmd.h   | 45 ---
 1 file changed, 39 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h 
b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
index 2b399b836aa6..96e1379c4cf8 100644
--- a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
+++ b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
@@ -36,10 +36,10 @@
 
 /* Firmware versioning. */
 #ifdef DMUB_EXPOSE_VERSION
-#define DMUB_FW_VERSION_GIT_HASH 0x6d5deb31c
+#define DMUB_FW_VERSION_GIT_HASH 0x67e8928df
 #define DMUB_FW_VERSION_MAJOR 1
 #define DMUB_FW_VERSION_MINOR 0
-#define DMUB_FW_VERSION_REVISION 17
+#define DMUB_FW_VERSION_REVISION 18
 #define DMUB_FW_VERSION_UCODE ((DMUB_FW_VERSION_MAJOR << 24) | 
(DMUB_FW_VERSION_MINOR << 16) | DMUB_FW_VERSION_REVISION)
 #endif
 
@@ -146,10 +146,8 @@ union dmub_fw_meta {
  * DMCUB scratch registers can be used to determine firmware status.
  * Current scratch register usage is as follows:
  *
- * SCRATCH0: Legacy status register
- * SCRATCH1: Firmware version
- * SCRATCH2: Firmware status bits defined by dmub_fw_status_bit
- * SCRATCH3: Reserved firmware status bits
+ * SCRATCH0: FW Boot Status register
+ * SCRATCH15: FW Boot Options register
  */
 
 /**
@@ -160,6 +158,41 @@ enum dmub_fw_status_bit {
DMUB_FW_STATUS_BIT_COMMAND_TABLE_READY = (1 << 1),
 };
 
+
+/* Register bit definition for SCRATCH0 */
+union dmub_fw_boot_status {
+   struct {
+   uint32_t dal_fw : 1;
+   uint32_t mailbox_rdy : 1;
+   uint32_t optimized_init_done : 1;
+   uint32_t reserved : 29;
+   } bits;
+   uint32_t all;
+};
+
+enum dmub_fw_boot_status_bit {
+   DMUB_FW_BOOT_STATUS_BIT_DAL_FIRMWARE = (1 << 0),
+   DMUB_FW_BOOT_STATUS_BIT_MAILBOX_READY = (1 << 1),
+   DMUB_FW_BOOT_STATUS_BIT_OPTIMIZED_INIT_DONE = (1 << 2),
+};
+
+/* Register bit definition for SCRATCH15 */
+union dmub_fw_boot_options {
+   struct {
+   uint32_t pemu_env : 1;
+   uint32_t fpga_env : 1;
+   uint32_t optimized_init : 1;
+   uint32_t reserved : 29;
+   } bits;
+   uint32_t all;
+};
+
+enum dmub_fw_boot_options_bit {
+   DMUB_FW_BOOT_OPTION_BIT_PEMU_ENV = (1 << 0),
+   DMUB_FW_BOOT_OPTION_BIT_FPGA_ENV = (1 << 1),
+   DMUB_FW_BOOT_OPTION_BIT_OPTIMIZED_INIT_DONE = (1 << 2),
+};
+
 
//==
 
//
 
//==
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 15/30] drm/amd/display: add mechanism to skip DCN init

2020-06-19 Thread Rodrigo Siqueira
From: Eric Yang 

[Why]
If optimized init is done in FW. DCN init be skipped in driver. This
need to be communicated between driver and fw and maintain backwards
compatibility.

[How]
Use DMUB scratch 0 bit 2 to indicate optimized init done in fw and
use DMUB scatch 4 bit 0 to indicate drive supports the optimized flow
so FW will perform it.

Signed-off-by: Eric Yang 
Reviewed-by: Tony Cheng 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c  |  4 +-
 drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c  | 28 -
 drivers/gpu/drm/amd/display/dc/dc_dmub_srv.h  |  2 +
 .../amd/display/dc/dcn10/dcn10_hw_sequencer.c |  4 +-
 drivers/gpu/drm/amd/display/dmub/dmub_srv.h   |  5 ++-
 .../gpu/drm/amd/display/dmub/src/dmub_dcn20.c | 15 +++
 .../gpu/drm/amd/display/dmub/src/dmub_dcn20.h |  4 ++
 .../gpu/drm/amd/display/dmub/src/dmub_dcn21.c | 10 -
 .../gpu/drm/amd/display/dmub/src/dmub_dcn21.h |  6 ---
 .../gpu/drm/amd/display/dmub/src/dmub_srv.c   | 40 +--
 10 files changed, 46 insertions(+), 72 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index db5feb89d4af..67402d75e67e 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -2681,6 +2681,7 @@ void dc_interrupt_ack(struct dc *dc, enum dc_irq_source 
src)
dal_irq_service_ack(dc->res_pool->irqs, src);
 }
 
+
 void dc_set_power_state(
struct dc *dc,
enum dc_acpi_cm_power_state power_state)
@@ -2692,9 +2693,6 @@ void dc_set_power_state(
case DC_ACPI_CM_POWER_STATE_D0:
dc_resource_state_construct(dc, dc->current_state);
 
-   if (dc->ctx->dmub_srv)
-   dc_dmub_srv_wait_phy_init(dc->ctx->dmub_srv);
-
dc->hwss.init_hw(dc);
 
if (dc->hwss.init_sys_ctx != NULL &&
diff --git a/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c 
b/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c
index eea2429ac67d..96532f7ba480 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c
+++ b/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c
@@ -106,29 +106,17 @@ void dc_dmub_srv_wait_idle(struct dc_dmub_srv 
*dc_dmub_srv)
DC_ERROR("Error waiting for DMUB idle: status=%d\n", status);
 }
 
-void dc_dmub_srv_wait_phy_init(struct dc_dmub_srv *dc_dmub_srv)
+bool dc_dmub_srv_optimized_init_done(struct dc_dmub_srv *dc_dmub_srv)
 {
-   struct dmub_srv *dmub = dc_dmub_srv->dmub;
-   struct dc_context *dc_ctx = dc_dmub_srv->ctx;
-   enum dmub_status status;
+   struct dmub_srv *dmub;
+   union dmub_fw_boot_status status;
 
-   for (;;) {
-   /* Wait up to a second for PHY init. */
-   status = dmub_srv_wait_for_phy_init(dmub, 100);
-   if (status == DMUB_STATUS_OK)
-   /* Initialization OK */
-   break;
+   if (!dc_dmub_srv || !dc_dmub_srv->dmub)
+   return false;
 
-   DC_ERROR("DMCUB PHY init failed: status=%d\n", status);
-   ASSERT(0);
+   dmub = dc_dmub_srv->dmub;
 
-   if (status != DMUB_STATUS_TIMEOUT)
-   /*
-* Server likely initialized or we don't have
-* DMCUB HW support - this won't end.
-*/
-   break;
+   status = dmub->hw_funcs.get_fw_status(dmub);
 
-   /* Continue spinning so we don't hang the ASIC. */
-   }
+   return status.bits.optimized_init_done;
 }
diff --git a/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.h 
b/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.h
index a3a09ccb6d26..8bd20d0d7689 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.h
@@ -56,4 +56,6 @@ void dc_dmub_srv_wait_idle(struct dc_dmub_srv *dc_dmub_srv);
 
 void dc_dmub_srv_wait_phy_init(struct dc_dmub_srv *dc_dmub_srv);
 
+bool dc_dmub_srv_optimized_init_done(struct dc_dmub_srv *dc_dmub_srv);
+
 #endif /* _DMUB_DC_SRV_H_ */
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c 
b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
index abb160b5c395..cb45f05a0319 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
@@ -1288,7 +1288,9 @@ void dcn10_init_hw(struct dc *dc)
if (!dcb->funcs->is_accelerated_mode(dcb))
hws->funcs.disable_vga(dc->hwseq);
 
-   hws->funcs.bios_golden_init(dc);
+   if (!dc_dmub_srv_optimized_init_done(dc->ctx->dmub_srv))
+   hws->funcs.bios_golden_init(dc);
+
if (dc->ctx->dc_bios->fw_info_valid) {
res_pool->ref_clocks.xtalin_clock_inKhz =

dc->ctx->dc_bios->fw_info.pll_info.crystal_frequency;
diff --git a/drivers/gpu/drm/amd/display/dmub/dmub_srv.h 
b/drivers/gpu/drm/amd/dis

[PATCH 22/30] drm/amd/display: Allow 4 split on 10K 420 modes

2020-06-19 Thread Rodrigo Siqueira
From: Chris Park 

[Why]
10K YCbCr420 does not need ODM 4:1, but it requires MPC 4 split
indicated on the flags.

[How]
Make pixel encoding and resolution size specific workaround to enable
ODM combine on YCbCr420 high resolution modes.

Signed-off-by: Chris Park 
Reviewed-by: Charlene Liu 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c| 5 +
 .../gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c   | 5 -
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index e8357d7af4ee..1371f4fb168f 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -2745,6 +2745,11 @@ int dcn20_validate_apply_pipe_split_flags(
split[i] = 4;
v->ODMCombineEnablePerState[vlevel][pipe_plane] = 
dm_odm_combine_mode_4to1;
}
+   /*420 format workaround*/
+   if (pipe->stream->timing.h_addressable > 7680 &&
+   pipe->stream->timing.pixel_encoding == 
PIXEL_ENCODING_YCBCR420) {
+   split[i] = 4;
+   }
 #endif
v->ODMCombineEnabled[pipe_plane] =
v->ODMCombineEnablePerState[vlevel][pipe_plane];
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
index 5909af0a25fb..75dc4fe41731 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
@@ -3986,11 +3986,6 @@ void dml30_ModeSupportAndSystemConfigurationFull(struct 
display_mode_lib *mode_l
} else if 
(v->PlaneRequiredDISPCLKWithoutODMCombine > 
v->MaxDispclkRoundedDownToDFSGranularity) {
v->ODMCombineEnablePerState[i][k] = 
dm_odm_combine_mode_2to1;
v->PlaneRequiredDISPCLK = 
v->PlaneRequiredDISPCLKWithODMCombine2To1;
-   /*420 format workaround*/
-   if (v->HActive[k] > 7680 && 
v->OutputFormat[k] == dm_420) {
-   
v->ODMCombineEnablePerState[i][k] = dm_odm_combine_mode_2to1;
-   v->PlaneRequiredDISPCLK = 
v->PlaneRequiredDISPCLKWithODMCombine2To1;
-   }
} else {
v->ODMCombineEnablePerState[i][k] = 
dm_odm_combine_mode_disabled;
v->PlaneRequiredDISPCLK = 
v->PlaneRequiredDISPCLKWithoutODMCombine;
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 12/30] drm/amd/display: clip plane rects in DM before passing into DC

2020-06-19 Thread Rodrigo Siqueira
From: Aurabindo Pillai 

[Why]
DC global validation can fail when userspace requests to draw large
plane without performing the clipping themselves.

This is observed in the IGT kms_plane panning tests for 4K displays
where they draw an 8K plane without any clipping while expecting only
the top 4K to be drawn.

[How]
DRM already has helpers to take care of the clipping necessary and to
mark whether a plane is visible or not, so make use of these helpers
in DM before passing the plane to DC.

Signed-off-by: Aurabindo Pillai 
Reviewed-by: Nicholas Kazlauskas 
Acked-by: Rodrigo Siqueira 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 25 +++
 1 file changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 6be4913a0239..2dc419194817 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -5779,6 +5779,17 @@ static void dm_plane_helper_cleanup_fb(struct drm_plane 
*plane,
amdgpu_bo_unref(&rbo);
 }
 
+static int dm_plane_helper_check_state(struct drm_plane_state *state,
+  struct drm_crtc_state *new_crtc_state)
+{
+   int max_downscale = 0;
+   int max_upscale = INT_MAX;
+
+   /* TODO: These should be checked against DC plane caps */
+   return drm_atomic_helper_check_plane_state(
+   state, new_crtc_state, max_downscale, max_upscale, true, true);
+}
+
 static int dm_plane_atomic_check(struct drm_plane *plane,
 struct drm_plane_state *state)
 {
@@ -5786,6 +5797,7 @@ static int dm_plane_atomic_check(struct drm_plane *plane,
struct dc *dc = adev->dm.dc;
struct dm_plane_state *dm_plane_state;
struct dc_scaling_info scaling_info;
+   struct drm_crtc_state *new_crtc_state;
int ret;
 
dm_plane_state = to_dm_plane_state(state);
@@ -5793,6 +5805,15 @@ static int dm_plane_atomic_check(struct drm_plane *plane,
if (!dm_plane_state->dc_state)
return 0;
 
+   new_crtc_state =
+   drm_atomic_get_new_crtc_state(state->state, state->crtc);
+   if (!new_crtc_state)
+   return -EINVAL;
+
+   ret = dm_plane_helper_check_state(state, new_crtc_state);
+   if (ret)
+   return ret;
+
ret = fill_dc_scaling_info(state, &scaling_info);
if (ret)
return ret;
@@ -8312,6 +8333,10 @@ static int dm_update_plane_state(struct dc *dc,
if (!needs_reset)
return 0;
 
+   ret = dm_plane_helper_check_state(new_plane_state, 
new_crtc_state);
+   if (ret)
+   return ret;
+
WARN_ON(dm_new_plane_state->dc_state);
 
dc_new_plane_state = dc_create_plane_state(dc);
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 28/30] drm/amd/display: [FW Promotion] Release 1.0.19

2020-06-19 Thread Rodrigo Siqueira
From: Anthony Koo 

[Header Changes]
- Add debug flag for psr to use hw locking mgr state machine

Signed-off-by: Anthony Koo 
Reviewed-by: Anthony Koo 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h 
b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
index 96e1379c4cf8..68b5fd811d26 100644
--- a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
+++ b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
@@ -36,10 +36,10 @@
 
 /* Firmware versioning. */
 #ifdef DMUB_EXPOSE_VERSION
-#define DMUB_FW_VERSION_GIT_HASH 0x67e8928df
+#define DMUB_FW_VERSION_GIT_HASH 0xf87bb940b
 #define DMUB_FW_VERSION_MAJOR 1
 #define DMUB_FW_VERSION_MINOR 0
-#define DMUB_FW_VERSION_REVISION 18
+#define DMUB_FW_VERSION_REVISION 19
 #define DMUB_FW_VERSION_UCODE ((DMUB_FW_VERSION_MAJOR << 24) | 
(DMUB_FW_VERSION_MINOR << 16) | DMUB_FW_VERSION_REVISION)
 #endif
 
@@ -86,10 +86,11 @@ union dmub_addr {
 
 union dmub_psr_debug_flags {
struct {
-   uint8_t visual_confirm : 1;
+   uint32_t visual_confirm : 1;
+   uint32_t use_hw_lock_mgr : 1;
} bitfields;
 
-   unsigned int u32All;
+   uint32_t u32All;
 };
 
 #if defined(__cplusplus)
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 23/30] drm/amd/display: Red screen observed on startup

2020-06-19 Thread Rodrigo Siqueira
From: Peikang Zhang 

[Why]
We try to to change new_clocks->dppclk_khz to 10 when
new_clocks->dppclk_khz is 0

[How]
Don't change new_clocks->dppclk_khz value when new_clocks->dppclk_khz is
0

Signed-off-by: Peikang Zhang 
Reviewed-by: Yongqiang Sun 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c 
b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c
index 24c5765890fa..39788a7bd003 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c
@@ -153,8 +153,9 @@ void rn_update_clocks(struct clk_mgr *clk_mgr_base,
}
 
// workaround: Limit dppclk to 100Mhz to avoid lower eDP panel switch 
to plus 4K monitor underflow.
+   // Do not adjust dppclk if dppclk is 0 to avoid unexpected result
if (!IS_DIAG_DC(dc->ctx->dce_environment)) {
-   if (new_clocks->dppclk_khz < 10)
+   if (new_clocks->dppclk_khz < 10 && new_clocks->dppclk_khz > 
0)
new_clocks->dppclk_khz = 10;
}
 
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 27/30] drm/amd/display: DP link layer test 4.2.1.1 fix due to specs update

2020-06-19 Thread Rodrigo Siqueira
From: Wenjing Liu 

[why]
DP link layer CTS specs updated to change the test parameters in test
4.2.1.1.
Before it requires source to delay 400us on aux no reply.
With the specs updates Errata5, it requires source to delay 3.2ms
(based on LTTPR aux timeout)
This causes our test to fail after updating with the latest test
equipment firmware.

[how]
the change is to allow LTTPR 3.2ms aux timeout delay by default.
And set back to 400us if LTTPR feature is not enabled.
We will set 3.2ms and always enable LTTPR non transparent mode
if LTTPR feature is enabled and LTTPR is present.

Signed-off-by: Wenjing Liu 
Reviewed-by: Jun Lei 
Acked-by: Krunoslav Kovac 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c |  8 +-
 .../gpu/drm/amd/display/dc/core/dc_link_ddc.c | 13 ++--
 .../gpu/drm/amd/display/dc/core/dc_link_dp.c  | 76 ++-
 .../drm/amd/display/dc/core/dc_link_hwss.c|  2 +-
 drivers/gpu/drm/amd/display/dc/dc.h   |  2 +-
 drivers/gpu/drm/amd/display/dc/dc_link.h  |  2 +-
 .../drm/amd/display/dc/dcn21/dcn21_resource.c |  2 +-
 .../gpu/drm/amd/display/dc/inc/dc_link_ddc.h  |  2 +-
 .../gpu/drm/amd/display/dc/inc/dc_link_dp.h   |  2 +-
 9 files changed, 59 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index f020b3b67f0d..02742cca4d84 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -690,7 +690,6 @@ static bool detect_dp(struct dc_link *link,
 
if (sink_caps->transaction_type == DDC_TRANSACTION_TYPE_I2C_OVER_AUX) {
sink_caps->signal = SIGNAL_TYPE_DISPLAY_PORT;
-   dpcd_set_source_specific_data(link);
if (!detect_dp_sink_caps(link))
return false;
if (is_mst_supported(link)) {
@@ -855,6 +854,7 @@ static bool dc_link_detect_helper(struct dc_link *link,
bool same_dpcd = true;
enum dc_connection_type new_connection_type = dc_connection_none;
bool perform_dp_seamless_boot = false;
+   const uint32_t post_oui_delay = 30; // 30ms
 
DC_LOGGER_INIT(link->ctx->logger);
 
@@ -867,6 +867,7 @@ static bool dc_link_detect_helper(struct dc_link *link,
// need to re-write OUI and brightness in resume case
if (link->connector_signal == SIGNAL_TYPE_EDP) {
dpcd_set_source_specific_data(link);
+   msleep(post_oui_delay);
dc_link_set_default_brightness_aux(link);
//TODO: use cached
}
@@ -922,8 +923,6 @@ static bool dc_link_detect_helper(struct dc_link *link,
case SIGNAL_TYPE_EDP: {
read_current_link_settings_on_detect(link);
 
-   dpcd_set_source_specific_data(link);
-
detect_edp_sink_caps(link);
read_current_link_settings_on_detect(link);
sink_caps.transaction_type = 
DDC_TRANSACTION_TYPE_I2C_OVER_AUX;
@@ -1633,6 +1632,7 @@ static enum dc_status enable_link_dp(struct dc_state 
*state,
int i;
bool apply_seamless_boot_optimization = false;
uint32_t bl_oled_enable_delay = 50; // in ms
+   const uint32_t post_oui_delay = 30; // 30ms
 
// check for seamless boot
for (i = 0; i < state->stream_count; i++) {
@@ -1659,6 +1659,8 @@ static enum dc_status enable_link_dp(struct dc_state 
*state,
 
// during mode switch we do DP_SET_POWER off then on, and OUI is lost
dpcd_set_source_specific_data(link);
+   if (link->dpcd_sink_ext_caps.raw != 0)
+   msleep(post_oui_delay);
 
skip_video_pattern = true;
 
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c
index be8f265976b0..b984eecca58b 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c
@@ -655,16 +655,17 @@ bool dc_link_aux_transfer_with_retries(struct ddc_service 
*ddc,
 }
 
 
-uint32_t dc_link_aux_configure_timeout(struct ddc_service *ddc,
+bool dc_link_aux_try_to_configure_timeout(struct ddc_service *ddc,
uint32_t timeout)
 {
-   uint32_t prev_timeout = 0;
+   bool result = false;
struct ddc *ddc_pin = ddc->ddc_pin;
 
-   if 
(ddc->ctx->dc->res_pool->engines[ddc_pin->pin_data->en]->funcs->configure_timeout)
-   prev_timeout =
-   
ddc->ctx->dc->res_pool->engines[ddc_pin->pin_data->en]->funcs->configure_timeout(ddc,
 timeout);
-   return prev_timeout;
+   if 
(ddc->ctx->dc->res_pool->engines[ddc_pin->pin_data->en]->funcs->configure_timeout)
 {
+   
ddc->ctx->dc->res_pool->engines[ddc_pin->pin_data->en]->funcs->configure_timeout(ddc,
 timeout);
+   result = true;
+   }

[PATCH 29/30] drm/amd/display: Fix ineffective setting of max bpc property

2020-06-19 Thread Rodrigo Siqueira
From: Stylon Wang 

[Why]
Regression was introduced where setting max bpc property has no effect
on the atomic check and final commit. It has the same effect as max bpc
being stuck at 8.

[How]
Correctly propagate max bpc with the new connector state.

Signed-off-by: Stylon Wang 
Reviewed-by: Nicholas Kazlauskas 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index e35fd2225972..b3ceba072643 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -5135,7 +5135,8 @@ create_validate_stream_for_sink(struct 
amdgpu_dm_connector *aconnector,
struct drm_connector *connector = &aconnector->base;
struct amdgpu_device *adev = connector->dev->dev_private;
struct dc_stream_state *stream;
-   int requested_bpc = connector->state ? 
connector->state->max_requested_bpc : 8;
+   const struct drm_connector_state *drm_state = dm_state ? 
&dm_state->base : NULL;
+   int requested_bpc = drm_state ? drm_state->max_requested_bpc : 8;
enum dc_status dc_result = DC_OK;
 
do {
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 13/30] drm/amd/display: Added local_sink null check before access

2020-06-19 Thread Rodrigo Siqueira
From: Jake Wang 

[Why & How]
Need to check if local_sink is NULL before accessing.

Signed-off-by: Jake Wang 
Reviewed-by: Anthony Koo 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link_hwss.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_hwss.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_hwss.c
index 6590f51caefa..93e28231a9d0 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_hwss.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_hwss.c
@@ -167,7 +167,8 @@ bool edp_receiver_ready_T9(struct dc_link *link)
} while (++tries < 50);
}
 
-   if (link->local_sink->edid_caps.panel_patch.extra_delay_backlight_off > 
0)
+   if (link->local_sink &&
+   
link->local_sink->edid_caps.panel_patch.extra_delay_backlight_off > 0)

udelay(link->local_sink->edid_caps.panel_patch.extra_delay_backlight_off * 
1000);
 
return result;
@@ -201,7 +202,8 @@ bool edp_receiver_ready_T7(struct dc_link *link)
} while (time_taken_in_ns < 50 * 100); //MAx T7 is 50ms
}
 
-   if (link->local_sink->edid_caps.panel_patch.extra_t7_ms > 0)
+   if (link->local_sink &&
+   link->local_sink->edid_caps.panel_patch.extra_t7_ms > 0)
udelay(link->local_sink->edid_caps.panel_patch.extra_t7_ms * 
1000);
 
return result;
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 02/30] drm/amd/display: [FW Promotion] Release 1.0.16

2020-06-19 Thread Rodrigo Siqueira
From: Anthony Koo 

Signed-off-by: Anthony Koo 
Reviewed-by: Anthony Koo 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h 
b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
index 44f74047050a..ef9c116b790f 100644
--- a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
+++ b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
@@ -36,10 +36,10 @@
 
 /* Firmware versioning. */
 #ifdef DMUB_EXPOSE_VERSION
-#define DMUB_FW_VERSION_GIT_HASH 0xee850bb2f
+#define DMUB_FW_VERSION_GIT_HASH 0x703682cd7
 #define DMUB_FW_VERSION_MAJOR 1
 #define DMUB_FW_VERSION_MINOR 0
-#define DMUB_FW_VERSION_REVISION 15
+#define DMUB_FW_VERSION_REVISION 16
 #define DMUB_FW_VERSION_UCODE ((DMUB_FW_VERSION_MAJOR << 24) | 
(DMUB_FW_VERSION_MINOR << 16) | DMUB_FW_VERSION_REVISION)
 #endif
 
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 04/30] drm/amd/display: Fixed using wrong eDP power sequence function pointer

2020-06-19 Thread Rodrigo Siqueira
From: Yi-Ling Chen 

[why]
dc->hwss->edp_backlight_control is null, it would casue it only be off
main-link of eDP.  It is not worng behavior for eDP power sequence off.

[how]
Must use hwseq->funcs.edp_backlight_control finction pointer for edp
backlight.

Signed-off-by: Yi-Ling Chen 
Reviewed-by: Sung Lee 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c 
b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
index 845e7f823a3d..abb160b5c395 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
@@ -1394,10 +1394,10 @@ void dcn10_init_hw(struct dc *dc)
if (edp_link &&
edp_link->link_enc->funcs->is_dig_enabled &&

edp_link->link_enc->funcs->is_dig_enabled(edp_link->link_enc) &&
-   dc->hwss.edp_backlight_control &&
+   dc->hwseq->funcs.edp_backlight_control &&
dc->hwss.power_down &&
dc->hwss.edp_power_control) {
-   dc->hwss.edp_backlight_control(edp_link, false);
+   dc->hwseq->funcs.edp_backlight_control(edp_link, false);
dc->hwss.power_down(dc);
dc->hwss.edp_power_control(edp_link, false);
} else {
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 03/30] drm/amd/display: Fix calculation of virtual channel payload

2020-06-19 Thread Rodrigo Siqueira
From: Mikita Lipski 

[why]
The calculation of virtual channel payload would not take link settings
in account. As we calculate VCPI slots needed both PBN for stream and
also PBN per time slot.  Before we would use generic PBN per time slot,
which would not change with link settings causing wrong Payload
allocation.

[how]
Provide PBN per time slot for each Virtual channel payload calculation.

Signed-off-by: Mikita Lipski 
Reviewed-by: Nicholas Kazlauskas 
Acked-by: Rodrigo Siqueira 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  2 +-
 .../display/amdgpu_dm/amdgpu_dm_mst_types.c   | 19 +--
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 9ab0d8521576..6be4913a0239 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -5538,7 +5538,7 @@ static int dm_encoder_helper_atomic_check(struct 
drm_encoder *encoder,
   
mst_mgr,
   
mst_port,
   
dm_new_connector_state->pbn,
-  0);
+  
dm_mst_get_pbn_divider(aconnector->dc_link));
if (dm_new_connector_state->vcpi_slots < 0) {
DRM_DEBUG_ATOMIC("failed finding vcpi slots: %d\n", 
(int)dm_new_connector_state->vcpi_slots);
return dm_new_connector_state->vcpi_slots;
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 69056660672d..6b98d420f9e2 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -543,6 +543,8 @@ static void increase_dsc_bpp(struct drm_atomic_state *state,
int link_timeslots_used;
int fair_pbn_alloc;
 
+   pbn_per_timeslot = dm_mst_get_pbn_divider(dc_link);
+
for (i = 0; i < count; i++) {
if (vars[i].dsc_enabled) {
initial_slack[i] = 
kbps_to_peak_pbn(params[i].bw_range.max_kbps) - vars[i].pbn;
@@ -554,9 +556,6 @@ static void increase_dsc_bpp(struct drm_atomic_state *state,
}
}
 
-   pbn_per_timeslot = dc_link_bandwidth_kbps(dc_link,
-   dc_link_get_link_cap(dc_link)) / (8 * 1000 * 54);
-
while (remaining_to_increase) {
next_index = -1;
min_initial_slack = -1;
@@ -585,7 +584,7 @@ static void increase_dsc_bpp(struct drm_atomic_state *state,
  
params[next_index].port->mgr,
  
params[next_index].port,
  vars[next_index].pbn,
- 
dm_mst_get_pbn_divider(dc_link)) < 0)
+ pbn_per_timeslot) < 0)
return;
if (!drm_dp_mst_atomic_check(state)) {
vars[next_index].bpp_x16 = 
bpp_x16_from_pbn(params[next_index], vars[next_index].pbn);
@@ -595,7 +594,7 @@ static void increase_dsc_bpp(struct drm_atomic_state *state,
  
params[next_index].port->mgr,
  
params[next_index].port,
  
vars[next_index].pbn,
- 
dm_mst_get_pbn_divider(dc_link)) < 0)
+ 
pbn_per_timeslot) < 0)
return;
}
} else {
@@ -604,7 +603,7 @@ static void increase_dsc_bpp(struct drm_atomic_state *state,
  
params[next_index].port->mgr,
  
params[next_index].port,
  vars[next_index].pbn,
- 
dm_mst_get_pbn_divider(dc_link)) < 0)
+ pbn_per_timeslot) < 0)
return;
if (!drm_dp_mst_atomic_check(state)) {
vars[next_index].bpp_x16 = 
params[next_index].bw_range.max_target_bpp_x16;
@@ -614,7 +613,7 @@ static void increase_dsc_bpp(struct drm_atomic_state *state,
   

[PATCH 06/30] drm/amd/display: implement edid max TMDS clock check in DC

2020-06-19 Thread Rodrigo Siqueira
From: Michael Strauss 

[WHY]
Currently DC doesn't check requested pixel clock against an EDID
specified TMDS max clock if it exists, passing modes that should fail

[HOW]
Add max TMDS clk to edid caps and perform check during validation

Signed-off-by: Michael Strauss 
Reviewed-by: Eric Yang 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/dc_types.h  |  3 +++
 .../gpu/drm/amd/display/dc/dcn10/dcn10_link_encoder.c  | 10 ++
 2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dc_types.h 
b/drivers/gpu/drm/amd/display/dc/dc_types.h
index d7b9d311d9e0..f51e5766d8f5 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_types.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_types.h
@@ -261,6 +261,9 @@ struct dc_edid_caps {
bool edid_hdmi;
bool hdr_supported;
 
+   uint32_t max_tmds_clk_mhz;
+   uint32_t max_forum_tmds_clk_mhz;
+
struct dc_panel_patch panel_patch;
 };
 
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_link_encoder.c 
b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_link_encoder.c
index 7fd385be3f3d..a9af3f6fd8ec 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_link_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_link_encoder.c
@@ -619,11 +619,20 @@ bool dcn10_link_encoder_validate_dvi_output(
 static bool dcn10_link_encoder_validate_hdmi_output(
const struct dcn10_link_encoder *enc10,
const struct dc_crtc_timing *crtc_timing,
+   const struct dc_edid_caps *edid_caps,
int adjusted_pix_clk_100hz)
 {
enum dc_color_depth max_deep_color =
enc10->base.features.max_hdmi_deep_color;
 
+   // check pixel clock against edid specified max TMDS clk
+   if (edid_caps->max_tmds_clk_mhz != 0 &&
+   adjusted_pix_clk_100hz > edid_caps->max_tmds_clk_mhz * 
1)
+   return false;
+   if (edid_caps->max_forum_tmds_clk_mhz != 0 &&
+   adjusted_pix_clk_100hz > 
edid_caps->max_forum_tmds_clk_mhz * 1)
+   return false;
+
if (max_deep_color < crtc_timing->display_color_depth)
return false;
 
@@ -801,6 +810,7 @@ bool dcn10_link_encoder_validate_output_with_stream(
is_valid = dcn10_link_encoder_validate_hdmi_output(
enc10,
&stream->timing,
+   &stream->sink->edid_caps,
stream->phy_pix_clk * 10);
break;
case SIGNAL_TYPE_DISPLAY_PORT:
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 00/30] DC Patches Jun 19, 2020

2020-06-19 Thread Rodrigo Siqueira
This DC patchset brings improvements in multiple areas. In summary, we
highlight:

* DCN3 improvements
* Fw updates
* A series of improvements and fixes

Anthony Koo (4):
  drm/amd/display: [FW Promotion] Release 1.0.16
  drm/amd/display: [FW Promotion] Release 1.0.17
  drm/amd/display: [FW Promotion] Release 1.0.18
  drm/amd/display: [FW Promotion] Release 1.0.19

Aric Cyr (2):
  drm/amd/display: 3.2.90
  drm/amd/display: 3.2.91

Aurabindo Pillai (1):
  drm/amd/display: clip plane rects in DM before passing into DC

Bhawanpreet Lakha (1):
  drm/amd/display: enable assr

Brandon Syu (1):
  drm/amd/display: use dispclk AVFS for dppclk

Camille Cho (1):
  drm/amd/display: Correctly respond in psr enablement interface

Chris Park (2):
  drm/amd/display: Force ODM combine on 5K+ 420 modes
  drm/amd/display: Allow 4 split on 10K 420 modes

Dale Zhao (1):
  drm/amd/display: fine tune logic of edid max TMDS clock check

David Galiffi (1):
  drm/amd/display: Compare v_front_porch when checking if streams are
synchronizable

Derek Lai (1):
  drm/amd/display: VSC SDP supported for SST

Dmytro Laktyushkin (1):
  drm/amd/display: fix 4to1 odm MPC_OUT_FLOW_CONTROL_COUNT

Eric Yang (1):
  drm/amd/display: add mechanism to skip DCN init

Jake Wang (1):
  drm/amd/display: Added local_sink null check before access

Martin Leung (1):
  drm/amd/display: enable seamless boot for dcn30

Michael Strauss (1):
  drm/amd/display: implement edid max TMDS clock check in DC

Mikita Lipski (1):
  drm/amd/display: Fix calculation of virtual channel payload

Nicholas Kazlauskas (2):
  drm/amd/display: Fix DML failures caused by doubled stereo viewport
  drm/amd/display: Fill in dmub_srv fw_version from firmware metadata

Peikang Zhang (1):
  drm/amd/display: Red screen observed on startup

Stylon Wang (2):
  drm/amd/display: Enable output_bpc property on all outputs
  drm/amd/display: Fix ineffective setting of max bpc property

Wenjing Liu (2):
  drm/amd/display: allow query ddc data over aux to be read only
operation
  drm/amd/display: DP link layer test 4.2.1.1 fix due to specs update

Wyatt Wood (1):
  drm/amd/display: Use dmub fw to lock pipe, cursor, dig

Yi-Ling Chen (1):
  drm/amd/display: Fixed using wrong eDP power sequence function pointer

 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  34 ++-
 .../amd/display/amdgpu_dm/amdgpu_dm_debugfs.c |   4 +-
 .../amd/display/amdgpu_dm/amdgpu_dm_hdcp.c|  37 +++
 .../display/amdgpu_dm/amdgpu_dm_mst_types.c   |  19 +-
 .../gpu/drm/amd/display/dc/calcs/dcn_calcs.c  |  11 +-
 .../display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c  |  18 +-
 .../amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c |   3 +-
 drivers/gpu/drm/amd/display/dc/core/dc.c  |  40 ++-
 drivers/gpu/drm/amd/display/dc/core/dc_link.c |  14 +-
 .../gpu/drm/amd/display/dc/core/dc_link_ddc.c |  42 +--
 .../gpu/drm/amd/display/dc/core/dc_link_dp.c  | 121 ++---
 .../drm/amd/display/dc/core/dc_link_hwss.c|   8 +-
 .../gpu/drm/amd/display/dc/core/dc_resource.c |   4 +
 drivers/gpu/drm/amd/display/dc/dc.h   |   5 +-
 drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c  |  28 +-
 drivers/gpu/drm/amd/display/dc/dc_dmub_srv.h  |   2 +
 drivers/gpu/drm/amd/display/dc/dc_link.h  |   2 +-
 drivers/gpu/drm/amd/display/dc/dc_types.h |   2 +
 drivers/gpu/drm/amd/display/dc/dce/Makefile   |   3 +-
 .../drm/amd/display/dc/dce/dmub_hw_lock_mgr.c |  57 
 .../drm/amd/display/dc/dce/dmub_hw_lock_mgr.h |  39 +++
 .../amd/display/dc/dcn10/dcn10_hw_sequencer.c |  25 +-
 .../amd/display/dc/dcn10/dcn10_link_encoder.c |   7 +
 .../drm/amd/display/dc/dcn20/dcn20_hwseq.c|  20 +-
 .../drm/amd/display/dc/dcn20/dcn20_resource.c |  13 +
 .../drm/amd/display/dc/dcn21/dcn21_resource.c |   2 +-
 .../drm/amd/display/dc/dcn30/dcn30_hwseq.c| 245 +-
 .../gpu/drm/amd/display/dc/dcn30/dcn30_init.c |   8 +-
 drivers/gpu/drm/amd/display/dc/dm_cp_psp.h|   1 +
 .../dc/dml/dcn30/display_mode_vba_30.c|   5 +
 .../gpu/drm/amd/display/dc/inc/dc_link_ddc.h  |   2 +-
 .../gpu/drm/amd/display/dc/inc/dc_link_dp.h   |   2 +-
 drivers/gpu/drm/amd/display/dmub/dmub_srv.h   |   5 +-
 .../gpu/drm/amd/display/dmub/inc/dmub_cmd.h   |  51 +++-
 .../gpu/drm/amd/display/dmub/src/dmub_dcn20.c |  15 ++
 .../gpu/drm/amd/display/dmub/src/dmub_dcn20.h |   4 +
 .../gpu/drm/amd/display/dmub/src/dmub_dcn21.c |  10 -
 .../gpu/drm/amd/display/dmub/src/dmub_dcn21.h |   6 -
 .../gpu/drm/amd/display/dmub/src/dmub_srv.c   |  50 ++--
 39 files changed, 649 insertions(+), 315 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/display/dc/dce/dmub_hw_lock_mgr.c
 create mode 100644 drivers/gpu/drm/amd/display/dc/dce/dmub_hw_lock_mgr.h

-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 08/30] drm/amd/display: Fix DML failures caused by doubled stereo viewport

2020-06-19 Thread Rodrigo Siqueira
From: Nicholas Kazlauskas 

[Why]
Side-by-side and Top-and-bottom stereo configurations fail DML mode
validation due to Viewport exceeded.

This is because we consider the planes as being pipe split in pipe
population so we end up doubling the viewport width, eg. from 4k to 8k.

[How]
These pipes technically aren't hsplit, so add a check for determining
whether is_hsplit should be set.

Signed-off-by: Nicholas Kazlauskas 
Reviewed-by: Dmytro Laktyushkin 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c  | 11 +--
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c |  8 
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c 
b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c
index 555af29565aa..51397b565ddf 100644
--- a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c
+++ b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c
@@ -302,10 +302,17 @@ static void pipe_ctx_to_e2e_pipe_params (
struct _vcs_dpi_display_pipe_params_st *input)
 {
input->src.is_hsplit = false;
-   if (pipe->top_pipe != NULL && pipe->top_pipe->plane_state == 
pipe->plane_state)
+
+   /* stereo can never be split */
+   if (pipe->plane_state->stereo_format == 
PLANE_STEREO_FORMAT_SIDE_BY_SIDE ||
+   pipe->plane_state->stereo_format == 
PLANE_STEREO_FORMAT_TOP_AND_BOTTOM) {
+   /* reset the split group if it was already considered split. */
+   input->src.hsplit_grp = pipe->pipe_idx;
+   } else if (pipe->top_pipe != NULL && pipe->top_pipe->plane_state == 
pipe->plane_state) {
input->src.is_hsplit = true;
-   else if (pipe->bottom_pipe != NULL && pipe->bottom_pipe->plane_state == 
pipe->plane_state)
+   } else if (pipe->bottom_pipe != NULL && pipe->bottom_pipe->plane_state 
== pipe->plane_state) {
input->src.is_hsplit = true;
+   }
 
if (pipe->plane_res.dpp->ctx->dc->debug.optimized_watermark) {
/*
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index fb167393b8fe..e8357d7af4ee 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -2257,6 +2257,14 @@ int dcn20_populate_dml_pipes_from_context(
pipes[pipe_cnt].pipe.src.is_hsplit = 
(res_ctx->pipe_ctx[i].bottom_pipe && 
res_ctx->pipe_ctx[i].bottom_pipe->plane_state == pln)
|| (res_ctx->pipe_ctx[i].top_pipe && 
res_ctx->pipe_ctx[i].top_pipe->plane_state == pln)
|| 
pipes[pipe_cnt].pipe.dest.odm_combine != dm_odm_combine_mode_disabled;
+
+   /* stereo is never split, nor odm combine */
+   if (pln->stereo_format == 
PLANE_STEREO_FORMAT_SIDE_BY_SIDE ||
+   pln->stereo_format == 
PLANE_STEREO_FORMAT_TOP_AND_BOTTOM) {
+   pipes[pipe_cnt].pipe.src.is_hsplit = false;
+   pipes[pipe_cnt].pipe.src.hsplit_grp = 
res_ctx->pipe_ctx[i].pipe_idx;
+   }
+
pipes[pipe_cnt].pipe.src.source_scan = pln->rotation == 
ROTATION_ANGLE_90
|| pln->rotation == ROTATION_ANGLE_270 
? dm_vert : dm_horz;
pipes[pipe_cnt].pipe.src.viewport_y_y = 
scl->viewport_unadjusted.y;
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 18/30] drm/amd/display: Force ODM combine on 5K+ 420 modes

2020-06-19 Thread Rodrigo Siqueira
From: Chris Park 

[Why]
All YCbCr420 resolutions 5K and above have tiling and discoloration
issues.  The issue can be remedied by forcing ODM combine from 5K to 8K.
10K resolution requires ODM 4:1.  The mechanism of what the real problem
is, that is inherent in ODM combine programming, doesn't seem to be
pointed at singular register programming (CLK, MPC, DCSURF, etc.), and
needs more in-depth programming sequence review for these new use case
scenarios.  Until then, workaround to enable ODM combine is proposed.
While it is not our policy, HW spreadsheet also recommends turning on
ODM for these scenario to lower the voltage.

[How]
Make pixel encoding and resolution size specific workaround to enable
ODM combine on YCbCr420 high resolution modes.

Signed-off-by: Chris Park 
Reviewed-by: Charlene Liu 
Acked-by: Rodrigo Siqueira 
---
 .../drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
index 2a32ed6682fc..5909af0a25fb 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c
@@ -3986,9 +3986,19 @@ void dml30_ModeSupportAndSystemConfigurationFull(struct 
display_mode_lib *mode_l
} else if 
(v->PlaneRequiredDISPCLKWithoutODMCombine > 
v->MaxDispclkRoundedDownToDFSGranularity) {
v->ODMCombineEnablePerState[i][k] = 
dm_odm_combine_mode_2to1;
v->PlaneRequiredDISPCLK = 
v->PlaneRequiredDISPCLKWithODMCombine2To1;
+   /*420 format workaround*/
+   if (v->HActive[k] > 7680 && 
v->OutputFormat[k] == dm_420) {
+   
v->ODMCombineEnablePerState[i][k] = dm_odm_combine_mode_2to1;
+   v->PlaneRequiredDISPCLK = 
v->PlaneRequiredDISPCLKWithODMCombine2To1;
+   }
} else {
v->ODMCombineEnablePerState[i][k] = 
dm_odm_combine_mode_disabled;
v->PlaneRequiredDISPCLK = 
v->PlaneRequiredDISPCLKWithoutODMCombine;
+   /*420 format workaround*/
+   if (v->HActive[k] > 4096 && 
v->OutputFormat[k] == dm_420) {
+   
v->ODMCombineEnablePerState[i][k] = dm_odm_combine_mode_2to1;
+   v->PlaneRequiredDISPCLK = 
v->PlaneRequiredDISPCLKWithODMCombine2To1;
+   }
}
 
if (v->ODMCombineEnablePerState[i][k] == 
dm_odm_combine_mode_4to1) {
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 21/30] drm/amd/display: VSC SDP supported for SST

2020-06-19 Thread Rodrigo Siqueira
From: Derek Lai 

[why]
If a typeC to HDMI dongle supports YCbCr420 pass through and VSC
colorimetry and pixel encoding formats in the Extended Receiver
Capability, we shall allow VSC SDP to be used.

[How]
The Extended Receiver Capability field shall check the
EXTENDED_RECEIVER_CAPABILITY_FIELD_PRESENT bit in the
TRAINING_AUX_RD_INTERVAL register.  Removed DPCD rev checking for VSC
SDP.

Signed-off-by: Derek Lai 
Reviewed-by: Wenjing Liu 
Acked-by: Rodrigo Siqueira 
Acked-by: Tony Cheng 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 4 +---
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c  | 2 +-
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 2dc419194817..e35fd2225972 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4646,10 +4646,8 @@ create_stream_for_sink(struct amdgpu_dm_connector 
*aconnector,
stream->use_vsc_sdp_for_colorimetry =

aconnector->dc_sink->is_vsc_sdp_colorimetry_supported;
} else {
-   if (stream->link->dpcd_caps.dpcd_rev.raw >= 
0x14 &&
-   
stream->link->dpcd_caps.dprx_feature.bits.VSC_SDP_COLORIMETRY_SUPPORTED) {
+   if 
(stream->link->dpcd_caps.dprx_feature.bits.VSC_SDP_COLORIMETRY_SUPPORTED)
stream->use_vsc_sdp_for_colorimetry = 
true;
-   }
}
mod_build_vsc_infopacket(stream, 
&stream->vsc_infopacket);
}
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index 3da5d76ee8b6..d9b53654c35a 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -3376,7 +3376,7 @@ static bool retrieve_link_cap(struct dc_link *link)
link->dpcd_caps.dpcd_rev.raw =
dpcd_data[DP_DPCD_REV - DP_DPCD_REV];
 
-   if (link->dpcd_caps.dpcd_rev.raw >= 0x14) {
+   if (link->dpcd_caps.ext_receiver_cap_field_present) {
for (i = 0; i < read_dpcd_retry_cnt; i++) {
status = core_link_read_dpcd(
link,
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 01/30] drm/amd/display: Use dmub fw to lock pipe, cursor, dig

2020-06-19 Thread Rodrigo Siqueira
From: Wyatt Wood 

[Why]
Hw lock manager adds the ability to lock pipe, cursor, and dig in fw.

[How]
Send hw lock command to fw to lock pipe, cursor, and dig.

Signed-off-by: Wyatt Wood 
Reviewed-by: Anthony Koo 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c  | 36 ++--
 .../gpu/drm/amd/display/dc/core/dc_link_dp.c  | 43 --
 drivers/gpu/drm/amd/display/dc/dce/Makefile   |  3 +-
 .../drm/amd/display/dc/dce/dmub_hw_lock_mgr.c | 57 +++
 .../drm/amd/display/dc/dce/dmub_hw_lock_mgr.h | 39 +
 .../amd/display/dc/dcn10/dcn10_hw_sequencer.c | 17 +-
 .../drm/amd/display/dc/dcn20/dcn20_hwseq.c| 18 +-
 .../gpu/drm/amd/display/dmub/inc/dmub_cmd.h   |  1 +
 8 files changed, 200 insertions(+), 14 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/display/dc/dce/dmub_hw_lock_mgr.c
 create mode 100644 drivers/gpu/drm/amd/display/dc/dce/dmub_hw_lock_mgr.h

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 49dd310ed588..db5feb89d4af 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -68,6 +68,8 @@
 
 #include "dmub/dmub_srv.h"
 
+#include "dce/dmub_hw_lock_mgr.h"
+
 #define CTX \
dc->ctx
 
@@ -2321,9 +2323,22 @@ static void commit_planes_for_stream(struct dc *dc,
}
 
if ((update_type != UPDATE_TYPE_FAST) && 
stream->update_flags.bits.dsc_changed)
-   if 
(top_pipe_to_program->stream_res.tg->funcs->lock_doublebuffer_enable)
-   
top_pipe_to_program->stream_res.tg->funcs->lock_doublebuffer_enable(
-   top_pipe_to_program->stream_res.tg);
+   if 
(top_pipe_to_program->stream_res.tg->funcs->lock_doublebuffer_enable) {
+   if (stream && should_use_dmub_lock(stream->link)) {
+   union dmub_hw_lock_flags hw_locks = { 0 };
+   struct dmub_hw_lock_inst_flags inst_flags = { 0 
};
+
+   hw_locks.bits.lock_dig = 1;
+   inst_flags.dig_inst = 
top_pipe_to_program->stream_res.tg->inst;
+
+   dmub_hw_lock_mgr_cmd(dc->ctx->dmub_srv,
+   true,
+   &hw_locks,
+   &inst_flags);
+   } else
+   
top_pipe_to_program->stream_res.tg->funcs->lock_doublebuffer_enable(
+   
top_pipe_to_program->stream_res.tg);
+   }
 
if ((update_type != UPDATE_TYPE_FAST) && 
dc->hwss.interdependent_update_lock)
dc->hwss.interdependent_update_lock(dc, context, true);
@@ -2493,7 +2508,20 @@ static void commit_planes_for_stream(struct dc *dc,

top_pipe_to_program->stream_res.tg->funcs->wait_for_state(
top_pipe_to_program->stream_res.tg,
CRTC_STATE_VACTIVE);
-   
top_pipe_to_program->stream_res.tg->funcs->lock_doublebuffer_disable(
+
+   if (stream && should_use_dmub_lock(stream->link)) {
+   union dmub_hw_lock_flags hw_locks = { 0 };
+   struct dmub_hw_lock_inst_flags inst_flags = { 0 
};
+
+   hw_locks.bits.lock_dig = 1;
+   inst_flags.dig_inst = 
top_pipe_to_program->stream_res.tg->inst;
+
+   dmub_hw_lock_mgr_cmd(dc->ctx->dmub_srv,
+   false,
+   &hw_locks,
+   &inst_flags);
+   } else
+   
top_pipe_to_program->stream_res.tg->funcs->lock_doublebuffer_disable(
top_pipe_to_program->stream_res.tg);
}
 
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index 484a6849f3de..3da5d76ee8b6 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -12,6 +12,8 @@
 #include "dc_link_ddc.h"
 #include "core_status.h"
 #include "dpcd_defs.h"
+#include "dc_dmub_srv.h"
+#include "dce/dmub_hw_lock_mgr.h"
 
 #define DC_LOGGER \
link->ctx->logger
@@ -4030,9 +4032,23 @@ bool dc_link_dp_set_test_pattern(
break;
}
 
-   if (pipe_ctx->stream_res.tg->funcs->lock_doublebuffer_enable)
-   
pipe_ctx->stream_res.tg->funcs->lock_doublebuffer_enable(
-   pipe_ctx->stream_res.tg);
+

[PATCH 11/30] drm/amd/display: 3.2.90

2020-06-19 Thread Rodrigo Siqueira
From: Aric Cyr 

Signed-off-by: Aric Cyr 
Reviewed-by: Aric Cyr 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc.h 
b/drivers/gpu/drm/amd/display/dc/dc.h
index a45b5ea98918..f9eb8c37d7c3 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -42,7 +42,7 @@
 #include "inc/hw/dmcu.h"
 #include "dml/display_mode_lib.h"
 
-#define DC_VER "3.2.89"
+#define DC_VER "3.2.90"
 
 #define MAX_SURFACES 3
 #define MAX_PLANES 6
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 19/30] drm/amd/display: Enable output_bpc property on all outputs

2020-06-19 Thread Rodrigo Siqueira
From: Stylon Wang 

[Why]
Connector property output_bpc is available on DP/eDP only. New IGT tests
would benifit if this property works on HDMI.

[How]
Enable this read-only property on all types of connectors.

Signed-off-by: Stylon Wang 
Reviewed-by: Nicholas Kazlauskas 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
index 7b8968baaeb9..db4fab10a0c4 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
@@ -1058,7 +1058,6 @@ static const struct {
{"link_settings", &dp_link_settings_debugfs_fops},
{"phy_settings", &dp_phy_settings_debugfs_fop},
{"test_pattern", &dp_phy_test_pattern_fops},
-   {"output_bpc", &output_bpc_fops},
{"vrr_range", &vrr_range_fops},
 #ifdef CONFIG_DRM_AMD_DC_HDCP
{"hdcp_sink_capability", &hdcp_sink_capability_fops},
@@ -1142,6 +1141,9 @@ void connector_debugfs_init(struct amdgpu_dm_connector 
*connector)
debugfs_create_file_unsafe("force_yuv420_output", 0644, dir, connector,
   &force_yuv420_output_fops);
 
+   debugfs_create_file("output_bpc", 0644, dir, connector,
+   &output_bpc_fops);
+
connector->debugfs_dpcd_address = 0;
connector->debugfs_dpcd_size = 0;
 
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 16/30] drm/amd/display: use dispclk AVFS for dppclk

2020-06-19 Thread Rodrigo Siqueira
From: Brandon Syu 

[Why]
There is using pixelclk AVFS for dppclk, that would cause issue.

[How]
To use dispclk AVFS for both dispclk and dppclk.  There would choose
dppclk for request voltage when dispclk wouldn't be updated case.  If
dispclk need to be updated, then it'll choose the bigger one from dppclk
and dispclk for request voltage.

Signed-off-by: Brandon Syu 
Reviewed-by: Tony Cheng 
Acked-by: Rodrigo Siqueira 
---
 .../display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c   | 18 +-
 drivers/gpu/drm/amd/display/dc/dc.h|  1 +
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c 
b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c
index 55d09adbf0d9..c63ec960e116 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c
@@ -234,20 +234,26 @@ void dcn2_update_clocks(struct clk_mgr *clk_mgr_base,
dpp_clock_lowered = true;
clk_mgr->base.clks.dppclk_khz = new_clocks->dppclk_khz;
 
-   if (pp_smu && pp_smu->set_voltage_by_freq)
-   pp_smu->set_voltage_by_freq(&pp_smu->pp_smu, 
PP_SMU_NV_PIXELCLK, clk_mgr_base->clks.dppclk_khz / 1000);
-
update_dppclk = true;
}
 
if (should_set_clock(safe_to_lower, new_clocks->dispclk_khz, 
clk_mgr_base->clks.dispclk_khz)) {
clk_mgr_base->clks.dispclk_khz = new_clocks->dispclk_khz;
-   if (pp_smu && pp_smu->set_voltage_by_freq)
-   pp_smu->set_voltage_by_freq(&pp_smu->pp_smu, 
PP_SMU_NV_DISPCLK, clk_mgr_base->clks.dispclk_khz / 1000);
 
update_dispclk = true;
}
 
+   if (update_dppclk || update_dispclk) {
+   new_clocks->disp_dpp_voltage_level_khz = new_clocks->dppclk_khz;
+
+   if (update_dispclk)
+   new_clocks->disp_dpp_voltage_level_khz = 
new_clocks->dispclk_khz > new_clocks->dppclk_khz ? new_clocks->dispclk_khz : 
new_clocks->dppclk_khz;
+
+   clk_mgr_base->clks.disp_dpp_voltage_level_khz = 
new_clocks->disp_dpp_voltage_level_khz;
+   if (pp_smu && pp_smu->set_voltage_by_freq)
+   pp_smu->set_voltage_by_freq(&pp_smu->pp_smu, 
PP_SMU_NV_DISPCLK, clk_mgr_base->clks.disp_dpp_voltage_level_khz / 1000);
+   }
+
if (dc->config.forced_clocks == false || (force_reset && 
safe_to_lower)) {
if (dpp_clock_lowered) {
// if clock is being lowered, increase DTO before 
lowering refclk
@@ -403,6 +409,8 @@ static bool dcn2_are_clock_states_equal(struct dc_clocks *a,
return false;
else if (a->dppclk_khz != b->dppclk_khz)
return false;
+   else if (a->disp_dpp_voltage_level_khz != b->disp_dpp_voltage_level_khz)
+   return false;
else if (a->dcfclk_khz != b->dcfclk_khz)
return false;
else if (a->socclk_khz != b->socclk_khz)
diff --git a/drivers/gpu/drm/amd/display/dc/dc.h 
b/drivers/gpu/drm/amd/display/dc/dc.h
index f9eb8c37d7c3..83de4c2e045e 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -337,6 +337,7 @@ enum dcn_pwr_state {
 struct dc_clocks {
int dispclk_khz;
int dppclk_khz;
+   int disp_dpp_voltage_level_khz;
int dcfclk_khz;
int socclk_khz;
int dcfclk_deep_sleep_khz;
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 05/30] drm/amd/display: [FW Promotion] Release 1.0.17

2020-06-19 Thread Rodrigo Siqueira
From: Anthony Koo 

Signed-off-by: Anthony Koo 
Reviewed-by: Anthony Koo 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h 
b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
index ef9c116b790f..2b399b836aa6 100644
--- a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
+++ b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
@@ -36,10 +36,10 @@
 
 /* Firmware versioning. */
 #ifdef DMUB_EXPOSE_VERSION
-#define DMUB_FW_VERSION_GIT_HASH 0x703682cd7
+#define DMUB_FW_VERSION_GIT_HASH 0x6d5deb31c
 #define DMUB_FW_VERSION_MAJOR 1
 #define DMUB_FW_VERSION_MINOR 0
-#define DMUB_FW_VERSION_REVISION 16
+#define DMUB_FW_VERSION_REVISION 17
 #define DMUB_FW_VERSION_UCODE ((DMUB_FW_VERSION_MAJOR << 24) | 
(DMUB_FW_VERSION_MINOR << 16) | DMUB_FW_VERSION_REVISION)
 #endif
 
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 07/30] drm/amd/display: enable assr

2020-06-19 Thread Rodrigo Siqueira
From: Bhawanpreet Lakha 

[Why]
assr is content protection for eDP, in order to use it we need to call
psp ta (dtm)

[How]
We have a enable_assr callback, hook into this and call the correct psp
cmd id to enable assr.

Signed-off-by: Bhawanpreet Lakha 
Reviewed-by: Hersen Wu 
Acked-by: Rodrigo Siqueira 
---
 .../amd/display/amdgpu_dm/amdgpu_dm_hdcp.c| 37 +++
 drivers/gpu/drm/amd/display/dc/dm_cp_psp.h|  1 +
 2 files changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c
index dcf84a61de37..a8ee42d30911 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c
@@ -390,6 +390,42 @@ void hdcp_destroy(struct hdcp_workqueue *hdcp_work)
kfree(hdcp_work);
 }
 
+
+static bool enable_assr(void *handle, struct dc_link *link)
+{
+
+   struct hdcp_workqueue *hdcp_work = handle;
+   struct mod_hdcp hdcp = hdcp_work->hdcp;
+   struct psp_context *psp = hdcp.config.psp.handle;
+   struct ta_dtm_shared_memory *dtm_cmd;
+   bool res = true;
+
+   if (!psp->dtm_context.dtm_initialized) {
+   DRM_INFO("Failed to enable ASSR, DTM TA is not initialized.");
+   return false;
+   }
+
+   dtm_cmd = (struct ta_dtm_shared_memory 
*)psp->dtm_context.dtm_shared_buf;
+
+   mutex_lock(&psp->dtm_context.mutex);
+   memset(dtm_cmd, 0, sizeof(struct ta_dtm_shared_memory));
+
+   dtm_cmd->cmd_id = TA_DTM_COMMAND__TOPOLOGY_ASSR_ENABLE;
+   
dtm_cmd->dtm_in_message.topology_assr_enable.display_topology_dig_be_index = 
link->link_enc_hw_inst;
+   dtm_cmd->dtm_status = TA_DTM_STATUS__GENERIC_FAILURE;
+
+   psp_dtm_invoke(psp, dtm_cmd->cmd_id);
+
+   if (dtm_cmd->dtm_status != TA_DTM_STATUS__SUCCESS) {
+   DRM_INFO("Failed to enable ASSR");
+   res = false;
+   }
+
+   mutex_unlock(&psp->dtm_context.mutex);
+
+   return res;
+}
+
 static void update_config(void *handle, struct cp_psp_stream_config *config)
 {
struct hdcp_workqueue *hdcp_work = handle;
@@ -599,6 +635,7 @@ struct hdcp_workqueue *hdcp_create_workqueue(struct 
amdgpu_device *adev, struct
}
 
cp_psp->funcs.update_stream_config = update_config;
+   cp_psp->funcs.enable_assr = enable_assr;
cp_psp->handle = hdcp_work;
 
/* File created at /sys/class/drm/card0/device/hdcp_srm*/
diff --git a/drivers/gpu/drm/amd/display/dc/dm_cp_psp.h 
b/drivers/gpu/drm/amd/display/dc/dm_cp_psp.h
index 968c46dfb506..5da7677627a1 100644
--- a/drivers/gpu/drm/amd/display/dc/dm_cp_psp.h
+++ b/drivers/gpu/drm/amd/display/dc/dm_cp_psp.h
@@ -38,6 +38,7 @@ struct cp_psp_stream_config {
 };
 
 struct cp_psp_funcs {
+   bool (*enable_assr)(void *handle, struct dc_link *link);
void (*update_stream_config)(void *handle, struct cp_psp_stream_config 
*config);
 };
 
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 17/30] drm/amd/display: fix 4to1 odm MPC_OUT_FLOW_CONTROL_COUNT

2020-06-19 Thread Rodrigo Siqueira
From: Dmytro Laktyushkin 

Signed-off-by: Dmytro Laktyushkin 
Reviewed-by: Chris Park 
Acked-by: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
index 789e33fae016..5621c95177d2 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
@@ -618,7 +618,7 @@ static int calc_mpc_flow_ctrl_cnt(const struct 
dc_stream_state *stream,
bool hblank_halved = 
optc2_is_two_pixels_per_containter(&stream->timing);
int flow_ctrl_cnt;
 
-   if (opp_cnt == 2)
+   if (opp_cnt >= 2)
hblank_halved = true;
 
flow_ctrl_cnt = stream->timing.h_total - stream->timing.h_addressable -
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Jerome Glisse
On Fri, Jun 19, 2020 at 03:18:49PM -0300, Jason Gunthorpe wrote:
> On Fri, Jun 19, 2020 at 02:09:35PM -0400, Jerome Glisse wrote:
> > On Fri, Jun 19, 2020 at 02:23:08PM -0300, Jason Gunthorpe wrote:
> > > On Fri, Jun 19, 2020 at 06:19:41PM +0200, Daniel Vetter wrote:
> > > 
> > > > The madness is only that device B's mmu notifier might need to wait
> > > > for fence_B so that the dma operation finishes. Which in turn has to
> > > > wait for device A to finish first.
> > > 
> > > So, it sound, fundamentally you've got this graph of operations across
> > > an unknown set of drivers and the kernel cannot insert itself in
> > > dma_fence hand offs to re-validate any of the buffers involved?
> > > Buffers which by definition cannot be touched by the hardware yet.
> > > 
> > > That really is a pretty horrible place to end up..
> > > 
> > > Pinning really is right answer for this kind of work flow. I think
> > > converting pinning to notifers should not be done unless notifier
> > > invalidation is relatively bounded. 
> > > 
> > > I know people like notifiers because they give a bit nicer performance
> > > in some happy cases, but this cripples all the bad cases..
> > > 
> > > If pinning doesn't work for some reason maybe we should address that?
> > 
> > Note that the dma fence is only true for user ptr buffer which predate
> > any HMM work and thus were using mmu notifier already. You need the
> > mmu notifier there because of fork and other corner cases.
> 
> I wonder if we should try to fix the fork case more directly - RDMA
> has this same problem and added MADV_DONTFORK a long time ago as a
> hacky way to deal with it.
>
> Some crazy page pin that resolved COW in a way that always kept the
> physical memory with the mm that initiated the pin?

Just no way to deal with it easily, i thought about forcing the
anon_vma (page->mapping for anonymous page) to the anon_vma that
belongs to the vma against which the GUP was done but it would
break things if page is already in other branch of a fork tree.
Also this forbid fast GUP.

Quite frankly the fork was not the main motivating factor. GPU
can pin potentialy GBytes of memory thus we wanted to be able
to release it but since Michal changes to reclaim code this is
no longer effective.

User buffer should never end up in those weird corner case, iirc
the first usage was for xorg exa texture upload, then generalize
to texture upload in mesa and latter on to more upload cases
(vertices, ...). At least this is what i remember today. So in
those cases we do not expect fork, splice, mremap, mprotect, ...

Maybe we can audit how user ptr buffer are use today and see if
we can define a usage pattern that would allow to cut corner in
kernel. For instance we could use mmu notifier just to block CPU
pte update while we do GUP and thus never wait on dma fence.

Then GPU driver just keep the GUP pin around until they are done
with the page. They can also use the mmu notifier to keep a flag
so that the driver know if it needs to redo a GUP ie:

The notifier path:
   GPU_mmu_notifier_start_callback(range)
gpu_lock_cpu_pagetable(range)
for_each_bo_in(bo, range) {
bo->need_gup = true;
}
gpu_unlock_cpu_pagetable(range)

   GPU_validate_buffer_pages(bo)
if (!bo->need_gup)
return;
put_pages(bo->pages);
range = bo_vaddr_range(bo)
gpu_lock_cpu_pagetable(range)
GUP(bo->pages, range)
gpu_unlock_cpu_pagetable(range)


Depending on how user_ptr are use today this could work.


> (isn't this broken for O_DIRECT as well anyhow?)

Yes it can in theory, if you have an application that does O_DIRECT
and fork concurrently (ie O_DIRECT in one thread and fork in another).
Note that O_DIRECT after fork is fine, it is an issue only if GUP_fast
was able to lookup a page with write permission before fork had the
chance to update it to read only for COW.

But doing O_DIRECT (or anything that use GUP fast) in one thread and
fork in another is inherently broken ie there is no way to fix it.

See 17839856fd588f4ab6b789f482ed3ffd7c403e1f

> 
> How does mmu_notifiers help the fork case anyhow? Block fork from
> progressing?

It enforce ordering between fork and GUP, if fork is first it blocks
GUP and if forks is last then fork waits on GUP and then user buffer
get invalidated.

> 
> > I probably need to warn AMD folks again that using HMM means that you
> > must be able to update the GPU page table asynchronously without
> > fence wait.
> 
> It is kind of unrelated to HMM, it just shouldn't be using mmu
> notifiers to replace page pinning..

Well my POV is that if you abide by rules HMM defined then you do
not need to pin pages. The rule is asynchronous device page table
update.

Pinning pages is problematic it blocks many core mm features and
it is just bad all around. Also it is inherently broken in front
of fork/mremap/splice/...

Cheers,
Jérôme

___
amd-gfx m

Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Felix Kuehling

Am 2020-06-19 um 3:55 p.m. schrieb Jason Gunthorpe:
> On Fri, Jun 19, 2020 at 03:48:49PM -0400, Felix Kuehling wrote:
>> Am 2020-06-19 um 2:18 p.m. schrieb Jason Gunthorpe:
>>> On Fri, Jun 19, 2020 at 02:09:35PM -0400, Jerome Glisse wrote:
 On Fri, Jun 19, 2020 at 02:23:08PM -0300, Jason Gunthorpe wrote:
> On Fri, Jun 19, 2020 at 06:19:41PM +0200, Daniel Vetter wrote:
>
>> The madness is only that device B's mmu notifier might need to wait
>> for fence_B so that the dma operation finishes. Which in turn has to
>> wait for device A to finish first.
> So, it sound, fundamentally you've got this graph of operations across
> an unknown set of drivers and the kernel cannot insert itself in
> dma_fence hand offs to re-validate any of the buffers involved?
> Buffers which by definition cannot be touched by the hardware yet.
>
> That really is a pretty horrible place to end up..
>
> Pinning really is right answer for this kind of work flow. I think
> converting pinning to notifers should not be done unless notifier
> invalidation is relatively bounded. 
>
> I know people like notifiers because they give a bit nicer performance
> in some happy cases, but this cripples all the bad cases..
>
> If pinning doesn't work for some reason maybe we should address that?
 Note that the dma fence is only true for user ptr buffer which predate
 any HMM work and thus were using mmu notifier already. You need the
 mmu notifier there because of fork and other corner cases.
>>> I wonder if we should try to fix the fork case more directly - RDMA
>>> has this same problem and added MADV_DONTFORK a long time ago as a
>>> hacky way to deal with it.
>>>
>>> Some crazy page pin that resolved COW in a way that always kept the
>>> physical memory with the mm that initiated the pin?
>>>
>>> (isn't this broken for O_DIRECT as well anyhow?)
>>>
>>> How does mmu_notifiers help the fork case anyhow? Block fork from
>>> progressing?
>> How much the mmu_notifier blocks fork progress depends, on quickly we
>> can preempt GPU jobs accessing affected memory. If we don't have
>> fine-grained preemption capability (graphics), the best we can do is
>> wait for the GPU jobs to complete. We can also delay submission of new
>> GPU jobs to the same memory until the MMU notifier is done. Future jobs
>> would use the new page addresses.
>>
>> With fine-grained preemption (ROCm compute), we can preempt GPU work on
>> the affected adders space to minimize the delay seen by fork.
>>
>> With recoverable device page faults, we can invalidate GPU page table
>> entries, so device access to the affected pages stops immediately.
>>
>> In all cases, the end result is, that the device page table gets updated
>> with the address of the copied pages before the GPU accesses the COW
>> memory again.Without the MMU notifier, we'd end up with the GPU
>> corrupting memory of the other process.
> The model here in fork has been wrong for a long time, and I do wonder
> how O_DIRECT manages to not be broken too.. I guess the time windows
> there are too small to get unlucky.
>
> If you have a write pin on a page then it should not be COW'd into the
> fork'd process but copied with the originating page remaining with the
> original mm. 
>
> I wonder if there is some easy way to achive that - if that is the
> main reason to use notifiers then it would be a better solution.

Other than the application changing its own virtual address mappings
(mprotect, munmap, etc.), triggering MMU notifiers, we also get MMU
notifiers from THP worker threads, and NUMA balancing.

When we start doing migration to DEVICE_PRIVATE memory with HMM, we also
get MMU notifiers during those driver-initiated migrations.

Regards,
  Felix


>
> Jason
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Jason Gunthorpe
On Fri, Jun 19, 2020 at 03:30:32PM -0400, Felix Kuehling wrote:
> We have a potential problem with CPU updating page tables while the GPU
> is retrying on page table entries because 64 bit CPU transactions don't
> arrive in device memory atomically.

Except for 32 bit platforms atomicity is guarenteed if you use uncached
writeq() to aligned addresses..

The linux driver model breaks of the writeX() stuff is not atomic.

Jason
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Jason Gunthorpe
On Fri, Jun 19, 2020 at 03:48:49PM -0400, Felix Kuehling wrote:
> Am 2020-06-19 um 2:18 p.m. schrieb Jason Gunthorpe:
> > On Fri, Jun 19, 2020 at 02:09:35PM -0400, Jerome Glisse wrote:
> >> On Fri, Jun 19, 2020 at 02:23:08PM -0300, Jason Gunthorpe wrote:
> >>> On Fri, Jun 19, 2020 at 06:19:41PM +0200, Daniel Vetter wrote:
> >>>
>  The madness is only that device B's mmu notifier might need to wait
>  for fence_B so that the dma operation finishes. Which in turn has to
>  wait for device A to finish first.
> >>> So, it sound, fundamentally you've got this graph of operations across
> >>> an unknown set of drivers and the kernel cannot insert itself in
> >>> dma_fence hand offs to re-validate any of the buffers involved?
> >>> Buffers which by definition cannot be touched by the hardware yet.
> >>>
> >>> That really is a pretty horrible place to end up..
> >>>
> >>> Pinning really is right answer for this kind of work flow. I think
> >>> converting pinning to notifers should not be done unless notifier
> >>> invalidation is relatively bounded. 
> >>>
> >>> I know people like notifiers because they give a bit nicer performance
> >>> in some happy cases, but this cripples all the bad cases..
> >>>
> >>> If pinning doesn't work for some reason maybe we should address that?
> >> Note that the dma fence is only true for user ptr buffer which predate
> >> any HMM work and thus were using mmu notifier already. You need the
> >> mmu notifier there because of fork and other corner cases.
> > I wonder if we should try to fix the fork case more directly - RDMA
> > has this same problem and added MADV_DONTFORK a long time ago as a
> > hacky way to deal with it.
> >
> > Some crazy page pin that resolved COW in a way that always kept the
> > physical memory with the mm that initiated the pin?
> >
> > (isn't this broken for O_DIRECT as well anyhow?)
> >
> > How does mmu_notifiers help the fork case anyhow? Block fork from
> > progressing?
> 
> How much the mmu_notifier blocks fork progress depends, on quickly we
> can preempt GPU jobs accessing affected memory. If we don't have
> fine-grained preemption capability (graphics), the best we can do is
> wait for the GPU jobs to complete. We can also delay submission of new
> GPU jobs to the same memory until the MMU notifier is done. Future jobs
> would use the new page addresses.
> 
> With fine-grained preemption (ROCm compute), we can preempt GPU work on
> the affected adders space to minimize the delay seen by fork.
> 
> With recoverable device page faults, we can invalidate GPU page table
> entries, so device access to the affected pages stops immediately.
> 
> In all cases, the end result is, that the device page table gets updated
> with the address of the copied pages before the GPU accesses the COW
> memory again.Without the MMU notifier, we'd end up with the GPU
> corrupting memory of the other process.

The model here in fork has been wrong for a long time, and I do wonder
how O_DIRECT manages to not be broken too.. I guess the time windows
there are too small to get unlucky.

If you have a write pin on a page then it should not be COW'd into the
fork'd process but copied with the originating page remaining with the
original mm. 

I wonder if there is some easy way to achive that - if that is the
main reason to use notifiers then it would be a better solution.

Jason
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Felix Kuehling
Am 2020-06-19 um 2:18 p.m. schrieb Jason Gunthorpe:
> On Fri, Jun 19, 2020 at 02:09:35PM -0400, Jerome Glisse wrote:
>> On Fri, Jun 19, 2020 at 02:23:08PM -0300, Jason Gunthorpe wrote:
>>> On Fri, Jun 19, 2020 at 06:19:41PM +0200, Daniel Vetter wrote:
>>>
 The madness is only that device B's mmu notifier might need to wait
 for fence_B so that the dma operation finishes. Which in turn has to
 wait for device A to finish first.
>>> So, it sound, fundamentally you've got this graph of operations across
>>> an unknown set of drivers and the kernel cannot insert itself in
>>> dma_fence hand offs to re-validate any of the buffers involved?
>>> Buffers which by definition cannot be touched by the hardware yet.
>>>
>>> That really is a pretty horrible place to end up..
>>>
>>> Pinning really is right answer for this kind of work flow. I think
>>> converting pinning to notifers should not be done unless notifier
>>> invalidation is relatively bounded. 
>>>
>>> I know people like notifiers because they give a bit nicer performance
>>> in some happy cases, but this cripples all the bad cases..
>>>
>>> If pinning doesn't work for some reason maybe we should address that?
>> Note that the dma fence is only true for user ptr buffer which predate
>> any HMM work and thus were using mmu notifier already. You need the
>> mmu notifier there because of fork and other corner cases.
> I wonder if we should try to fix the fork case more directly - RDMA
> has this same problem and added MADV_DONTFORK a long time ago as a
> hacky way to deal with it.
>
> Some crazy page pin that resolved COW in a way that always kept the
> physical memory with the mm that initiated the pin?
>
> (isn't this broken for O_DIRECT as well anyhow?)
>
> How does mmu_notifiers help the fork case anyhow? Block fork from
> progressing?

How much the mmu_notifier blocks fork progress depends, on quickly we
can preempt GPU jobs accessing affected memory. If we don't have
fine-grained preemption capability (graphics), the best we can do is
wait for the GPU jobs to complete. We can also delay submission of new
GPU jobs to the same memory until the MMU notifier is done. Future jobs
would use the new page addresses.

With fine-grained preemption (ROCm compute), we can preempt GPU work on
the affected adders space to minimize the delay seen by fork.

With recoverable device page faults, we can invalidate GPU page table
entries, so device access to the affected pages stops immediately.

In all cases, the end result is, that the device page table gets updated
with the address of the copied pages before the GPU accesses the COW
memory again.Without the MMU notifier, we'd end up with the GPU
corrupting memory of the other process.

Regards,
  Felix


>
>> I probably need to warn AMD folks again that using HMM means that you
>> must be able to update the GPU page table asynchronously without
>> fence wait.
> It is kind of unrelated to HMM, it just shouldn't be using mmu
> notifiers to replace page pinning..
>
>> The issue for AMD is that they already update their GPU page table
>> using DMA engine. I believe this is still doable if they use a
>> kernel only DMA engine context, where only kernel can queue up jobs
>> so that you do not need to wait for unrelated things and you can
>> prioritize GPU page table update which should translate in fast GPU
>> page table update without DMA fence.
> Make sense
>
> I'm not sure I saw this in the AMD hmm stuff - it would be good if
> someone would look at that. Every time I do it looks like the locking
> is wrong.
>
> Jason
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Jerome Glisse
On Fri, Jun 19, 2020 at 03:30:32PM -0400, Felix Kuehling wrote:
> 
> Am 2020-06-19 um 3:11 p.m. schrieb Alex Deucher:
> > On Fri, Jun 19, 2020 at 2:09 PM Jerome Glisse  wrote:
> >> On Fri, Jun 19, 2020 at 02:23:08PM -0300, Jason Gunthorpe wrote:
> >>> On Fri, Jun 19, 2020 at 06:19:41PM +0200, Daniel Vetter wrote:
> >>>
>  The madness is only that device B's mmu notifier might need to wait
>  for fence_B so that the dma operation finishes. Which in turn has to
>  wait for device A to finish first.
> >>> So, it sound, fundamentally you've got this graph of operations across
> >>> an unknown set of drivers and the kernel cannot insert itself in
> >>> dma_fence hand offs to re-validate any of the buffers involved?
> >>> Buffers which by definition cannot be touched by the hardware yet.
> >>>
> >>> That really is a pretty horrible place to end up..
> >>>
> >>> Pinning really is right answer for this kind of work flow. I think
> >>> converting pinning to notifers should not be done unless notifier
> >>> invalidation is relatively bounded.
> >>>
> >>> I know people like notifiers because they give a bit nicer performance
> >>> in some happy cases, but this cripples all the bad cases..
> >>>
> >>> If pinning doesn't work for some reason maybe we should address that?
> >> Note that the dma fence is only true for user ptr buffer which predate
> >> any HMM work and thus were using mmu notifier already. You need the
> >> mmu notifier there because of fork and other corner cases.
> >>
> >> For nouveau the notifier do not need to wait for anything it can update
> >> the GPU page table right away. Modulo needing to write to GPU memory
> >> using dma engine if the GPU page table is in GPU memory that is not
> >> accessible from the CPU but that's never the case for nouveau so far
> >> (but i expect it will be at one point).
> >>
> >>
> >> So i see this as 2 different cases, the user ptr case, which does pin
> >> pages by the way, where things are synchronous. Versus the HMM cases
> >> where everything is asynchronous.
> >>
> >>
> >> I probably need to warn AMD folks again that using HMM means that you
> >> must be able to update the GPU page table asynchronously without
> >> fence wait. The issue for AMD is that they already update their GPU
> >> page table using DMA engine. I believe this is still doable if they
> >> use a kernel only DMA engine context, where only kernel can queue up
> >> jobs so that you do not need to wait for unrelated things and you can
> >> prioritize GPU page table update which should translate in fast GPU
> >> page table update without DMA fence.
> > All devices which support recoverable page faults also have a
> > dedicated paging engine for the kernel driver which the driver already
> > makes use of.  We can also update the GPU page tables with the CPU.
> 
> We have a potential problem with CPU updating page tables while the GPU
> is retrying on page table entries because 64 bit CPU transactions don't
> arrive in device memory atomically.
> 
> We are using SDMA for page table updates. This currently goes through a
> the DRM GPU scheduler to a special SDMA queue that's used by kernel-mode
> only. But since it's based on the DRM GPU scheduler, we do use dma-fence
> to wait for completion.

Yeah my worry is mostly that some cross dma fence leak into it but
it should never happen realy, maybe there is a way to catch if it
does and print a warning.

So yes you can use dma fence, as long as they do not have cross-dep.
Another expectation is that they complete quickly and usualy page
table update do.

Cheers,
Jérôme

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Felix Kuehling

Am 2020-06-19 um 3:11 p.m. schrieb Alex Deucher:
> On Fri, Jun 19, 2020 at 2:09 PM Jerome Glisse  wrote:
>> On Fri, Jun 19, 2020 at 02:23:08PM -0300, Jason Gunthorpe wrote:
>>> On Fri, Jun 19, 2020 at 06:19:41PM +0200, Daniel Vetter wrote:
>>>
 The madness is only that device B's mmu notifier might need to wait
 for fence_B so that the dma operation finishes. Which in turn has to
 wait for device A to finish first.
>>> So, it sound, fundamentally you've got this graph of operations across
>>> an unknown set of drivers and the kernel cannot insert itself in
>>> dma_fence hand offs to re-validate any of the buffers involved?
>>> Buffers which by definition cannot be touched by the hardware yet.
>>>
>>> That really is a pretty horrible place to end up..
>>>
>>> Pinning really is right answer for this kind of work flow. I think
>>> converting pinning to notifers should not be done unless notifier
>>> invalidation is relatively bounded.
>>>
>>> I know people like notifiers because they give a bit nicer performance
>>> in some happy cases, but this cripples all the bad cases..
>>>
>>> If pinning doesn't work for some reason maybe we should address that?
>> Note that the dma fence is only true for user ptr buffer which predate
>> any HMM work and thus were using mmu notifier already. You need the
>> mmu notifier there because of fork and other corner cases.
>>
>> For nouveau the notifier do not need to wait for anything it can update
>> the GPU page table right away. Modulo needing to write to GPU memory
>> using dma engine if the GPU page table is in GPU memory that is not
>> accessible from the CPU but that's never the case for nouveau so far
>> (but i expect it will be at one point).
>>
>>
>> So i see this as 2 different cases, the user ptr case, which does pin
>> pages by the way, where things are synchronous. Versus the HMM cases
>> where everything is asynchronous.
>>
>>
>> I probably need to warn AMD folks again that using HMM means that you
>> must be able to update the GPU page table asynchronously without
>> fence wait. The issue for AMD is that they already update their GPU
>> page table using DMA engine. I believe this is still doable if they
>> use a kernel only DMA engine context, where only kernel can queue up
>> jobs so that you do not need to wait for unrelated things and you can
>> prioritize GPU page table update which should translate in fast GPU
>> page table update without DMA fence.
> All devices which support recoverable page faults also have a
> dedicated paging engine for the kernel driver which the driver already
> makes use of.  We can also update the GPU page tables with the CPU.

We have a potential problem with CPU updating page tables while the GPU
is retrying on page table entries because 64 bit CPU transactions don't
arrive in device memory atomically.

We are using SDMA for page table updates. This currently goes through a
the DRM GPU scheduler to a special SDMA queue that's used by kernel-mode
only. But since it's based on the DRM GPU scheduler, we do use dma-fence
to wait for completion.

Regards,
  Felix


>
> Alex
>
>>
 Full disclosure: We are aware that we've designed ourselves into an
 impressive corner here, and there's lots of talks going on about
 untangling the dma synchronization from the memory management
 completely. But
>>> I think the documenting is really important: only GPU should be using
>>> this stuff and driving notifiers this way. Complete NO for any
>>> totally-not-a-GPU things in drivers/accel for sure.
>> Yes for user that expect HMM they need to be asynchronous. But it is
>> hard to revert user ptr has it was done a long time ago.
>>
>> Cheers,
>> Jérôme
>>
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Alex Deucher
On Fri, Jun 19, 2020 at 2:09 PM Jerome Glisse  wrote:
>
> On Fri, Jun 19, 2020 at 02:23:08PM -0300, Jason Gunthorpe wrote:
> > On Fri, Jun 19, 2020 at 06:19:41PM +0200, Daniel Vetter wrote:
> >
> > > The madness is only that device B's mmu notifier might need to wait
> > > for fence_B so that the dma operation finishes. Which in turn has to
> > > wait for device A to finish first.
> >
> > So, it sound, fundamentally you've got this graph of operations across
> > an unknown set of drivers and the kernel cannot insert itself in
> > dma_fence hand offs to re-validate any of the buffers involved?
> > Buffers which by definition cannot be touched by the hardware yet.
> >
> > That really is a pretty horrible place to end up..
> >
> > Pinning really is right answer for this kind of work flow. I think
> > converting pinning to notifers should not be done unless notifier
> > invalidation is relatively bounded.
> >
> > I know people like notifiers because they give a bit nicer performance
> > in some happy cases, but this cripples all the bad cases..
> >
> > If pinning doesn't work for some reason maybe we should address that?
>
> Note that the dma fence is only true for user ptr buffer which predate
> any HMM work and thus were using mmu notifier already. You need the
> mmu notifier there because of fork and other corner cases.
>
> For nouveau the notifier do not need to wait for anything it can update
> the GPU page table right away. Modulo needing to write to GPU memory
> using dma engine if the GPU page table is in GPU memory that is not
> accessible from the CPU but that's never the case for nouveau so far
> (but i expect it will be at one point).
>
>
> So i see this as 2 different cases, the user ptr case, which does pin
> pages by the way, where things are synchronous. Versus the HMM cases
> where everything is asynchronous.
>
>
> I probably need to warn AMD folks again that using HMM means that you
> must be able to update the GPU page table asynchronously without
> fence wait. The issue for AMD is that they already update their GPU
> page table using DMA engine. I believe this is still doable if they
> use a kernel only DMA engine context, where only kernel can queue up
> jobs so that you do not need to wait for unrelated things and you can
> prioritize GPU page table update which should translate in fast GPU
> page table update without DMA fence.

All devices which support recoverable page faults also have a
dedicated paging engine for the kernel driver which the driver already
makes use of.  We can also update the GPU page tables with the CPU.

Alex

>
>
> > > Full disclosure: We are aware that we've designed ourselves into an
> > > impressive corner here, and there's lots of talks going on about
> > > untangling the dma synchronization from the memory management
> > > completely. But
> >
> > I think the documenting is really important: only GPU should be using
> > this stuff and driving notifiers this way. Complete NO for any
> > totally-not-a-GPU things in drivers/accel for sure.
>
> Yes for user that expect HMM they need to be asynchronous. But it is
> hard to revert user ptr has it was done a long time ago.
>
> Cheers,
> Jérôme
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2 3/3] drm/amdgpu: Warn about disabled DPM

2020-06-19 Thread Paul Menzel
Currently, besides there is no explicit message, that DPM is disabled.
The user would need to know, that the missing success line indicates
that.

[drm] amdgpu: dpm initialized

So, add an explicit message, and make it log level warning, as disabling
dpm is not the default, and device performance will most likely suffer.

Resolves: https://gitlab.freedesktop.org/drm/amd/-/issues/1173
Cc: amd-gfx@lists.freedesktop.org
Signed-off-by: Paul Menzel 
---
v2: Use new print helpers, and inform user about effects.

 drivers/gpu/drm/amd/amdgpu/kv_dpm.c | 4 +++-
 drivers/gpu/drm/amd/amdgpu/si_dpm.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/kv_dpm.c 
b/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
index f054ded902f2..c601587c6d59 100644
--- a/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
+++ b/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
@@ -3014,8 +3014,10 @@ static int kv_dpm_sw_init(void *handle)
adev->pm.current_mclk = adev->clock.default_mclk;
adev->pm.int_thermal_type = THERMAL_TYPE_NONE;
 
-   if (amdgpu_dpm == 0)
+   if (amdgpu_dpm == 0) {
+   drm_warn(adev, "amdgpu: dpm disabled per parameter. Your 
graphics device will run with lower clocks impacting graphics performance.\n");
return 0;
+   }
 
INIT_WORK(&adev->pm.dpm.thermal.work, amdgpu_dpm_thermal_work_handler);
mutex_lock(&adev->pm.mutex);
diff --git a/drivers/gpu/drm/amd/amdgpu/si_dpm.c 
b/drivers/gpu/drm/amd/amdgpu/si_dpm.c
index f7edc1d50df4..1f35d5a36300 100644
--- a/drivers/gpu/drm/amd/amdgpu/si_dpm.c
+++ b/drivers/gpu/drm/amd/amdgpu/si_dpm.c
@@ -7687,8 +7687,10 @@ static int si_dpm_sw_init(void *handle)
adev->pm.current_mclk = adev->clock.default_mclk;
adev->pm.int_thermal_type = THERMAL_TYPE_NONE;
 
-   if (amdgpu_dpm == 0)
+   if (amdgpu_dpm == 0) {
+   drm_warn(adev, "amdgpu: dpm disabled per parameter. Your 
graphics device will run with lower clocks impacting graphics performance.\n");
return 0;
+   }
 
ret = si_dpm_init_microcode(adev);
if (ret)
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2 1/3] drm/amdgpu/kv,si: DPM: Use new print helpers

2020-06-19 Thread Paul Menzel
---
Untested.

 drivers/gpu/drm/amd/amdgpu/kv_dpm.c | 58 ++--
 drivers/gpu/drm/amd/amdgpu/si_dpm.c | 82 ++---
 2 files changed, 70 insertions(+), 70 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/kv_dpm.c 
b/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
index 4b3faaccecb9..b179bdc17cdc 100644
--- a/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
+++ b/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
@@ -1252,7 +1252,7 @@ static void kv_dpm_enable_bapm(void *handle, bool enable)
if (pi->bapm_enable) {
ret = amdgpu_kv_smc_bapm_enable(adev, enable);
if (ret)
-   DRM_ERROR("amdgpu_kv_smc_bapm_enable failed\n");
+   drm_err(adev, "amdgpu_kv_smc_bapm_enable failed\n");
}
 }
 
@@ -1263,40 +1263,40 @@ static int kv_dpm_enable(struct amdgpu_device *adev)
 
ret = kv_process_firmware_header(adev);
if (ret) {
-   DRM_ERROR("kv_process_firmware_header failed\n");
+   drm_err(adev, "kv_process_firmware_header failed\n");
return ret;
}
kv_init_fps_limits(adev);
kv_init_graphics_levels(adev);
ret = kv_program_bootup_state(adev);
if (ret) {
-   DRM_ERROR("kv_program_bootup_state failed\n");
+   dev_err(adev, "kv_program_bootup_state failed\n");
return ret;
}
kv_calculate_dfs_bypass_settings(adev);
ret = kv_upload_dpm_settings(adev);
if (ret) {
-   DRM_ERROR("kv_upload_dpm_settings failed\n");
+   dev_err(adev, "kv_upload_dpm_settings failed\n");
return ret;
}
ret = kv_populate_uvd_table(adev);
if (ret) {
-   DRM_ERROR("kv_populate_uvd_table failed\n");
+   drm_err(adev, "kv_populate_uvd_table failed\n");
return ret;
}
ret = kv_populate_vce_table(adev);
if (ret) {
-   DRM_ERROR("kv_populate_vce_table failed\n");
+   drm_err(adev, "kv_populate_vce_table failed\n");
return ret;
}
ret = kv_populate_samu_table(adev);
if (ret) {
-   DRM_ERROR("kv_populate_samu_table failed\n");
+   drm_err(adev, "kv_populate_samu_table failed\n");
return ret;
}
ret = kv_populate_acp_table(adev);
if (ret) {
-   DRM_ERROR("kv_populate_acp_table failed\n");
+   drm_err(adev, "kv_populate_acp_table failed\n");
return ret;
}
kv_program_vc(adev);
@@ -1307,39 +1307,39 @@ static int kv_dpm_enable(struct amdgpu_device *adev)
if (pi->enable_auto_thermal_throttling) {
ret = kv_enable_auto_thermal_throttling(adev);
if (ret) {
-   DRM_ERROR("kv_enable_auto_thermal_throttling failed\n");
+   drm_err(adev, "kv_enable_auto_thermal_throttling 
failed\n");
return ret;
}
}
ret = kv_enable_dpm_voltage_scaling(adev);
if (ret) {
-   DRM_ERROR("kv_enable_dpm_voltage_scaling failed\n");
+   drm_err(adev, "kv_enable_dpm_voltage_scaling failed\n");
return ret;
}
ret = kv_set_dpm_interval(adev);
if (ret) {
-   DRM_ERROR("kv_set_dpm_interval failed\n");
+   drm_err(adev, "kv_set_dpm_interval failed\n");
return ret;
}
ret = kv_set_dpm_boot_state(adev);
if (ret) {
-   DRM_ERROR("kv_set_dpm_boot_state failed\n");
+   drm_err(adev, "kv_set_dpm_boot_state failed\n");
return ret;
}
ret = kv_enable_ulv(adev, true);
if (ret) {
-   DRM_ERROR("kv_enable_ulv failed\n");
+   drm_err(adev, "kv_enable_ulv failed\n");
return ret;
}
kv_start_dpm(adev);
ret = kv_enable_didt(adev, true);
if (ret) {
-   DRM_ERROR("kv_enable_didt failed\n");
+   drm_err(adev, "kv_enable_didt failed\n");
return ret;
}
ret = kv_enable_smc_cac(adev, true);
if (ret) {
-   DRM_ERROR("kv_enable_smc_cac failed\n");
+   drm_err(adev, "kv_enable_smc_cac failed\n");
return ret;
}
 
@@ -1347,7 +1347,7 @@ static int kv_dpm_enable(struct amdgpu_device *adev)
 
ret = amdgpu_kv_smc_bapm_enable(adev, false);
if (ret) {
-   DRM_ERROR("amdgpu_kv_smc_bapm_enable failed\n");
+   drm_err(adev, "amdgpu_kv_smc_bapm_enable failed\n");
return ret;
}
 
@@ -1355,7 +1355,7 @@ static int kv_dpm_enable(struct amdgpu_device *adev)
amdgpu_is_internal_thermal_sensor(adev->pm.int_thermal_type)) {
ret = kv_set_thermal_temperature_range(adev, KV_TEMP_RANGE_MIN, 
KV_TEMP_RANGE

Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Jason Gunthorpe
On Fri, Jun 19, 2020 at 02:09:35PM -0400, Jerome Glisse wrote:
> On Fri, Jun 19, 2020 at 02:23:08PM -0300, Jason Gunthorpe wrote:
> > On Fri, Jun 19, 2020 at 06:19:41PM +0200, Daniel Vetter wrote:
> > 
> > > The madness is only that device B's mmu notifier might need to wait
> > > for fence_B so that the dma operation finishes. Which in turn has to
> > > wait for device A to finish first.
> > 
> > So, it sound, fundamentally you've got this graph of operations across
> > an unknown set of drivers and the kernel cannot insert itself in
> > dma_fence hand offs to re-validate any of the buffers involved?
> > Buffers which by definition cannot be touched by the hardware yet.
> > 
> > That really is a pretty horrible place to end up..
> > 
> > Pinning really is right answer for this kind of work flow. I think
> > converting pinning to notifers should not be done unless notifier
> > invalidation is relatively bounded. 
> > 
> > I know people like notifiers because they give a bit nicer performance
> > in some happy cases, but this cripples all the bad cases..
> > 
> > If pinning doesn't work for some reason maybe we should address that?
> 
> Note that the dma fence is only true for user ptr buffer which predate
> any HMM work and thus were using mmu notifier already. You need the
> mmu notifier there because of fork and other corner cases.

I wonder if we should try to fix the fork case more directly - RDMA
has this same problem and added MADV_DONTFORK a long time ago as a
hacky way to deal with it.

Some crazy page pin that resolved COW in a way that always kept the
physical memory with the mm that initiated the pin?

(isn't this broken for O_DIRECT as well anyhow?)

How does mmu_notifiers help the fork case anyhow? Block fork from
progressing?

> I probably need to warn AMD folks again that using HMM means that you
> must be able to update the GPU page table asynchronously without
> fence wait.

It is kind of unrelated to HMM, it just shouldn't be using mmu
notifiers to replace page pinning..

> The issue for AMD is that they already update their GPU page table
> using DMA engine. I believe this is still doable if they use a
> kernel only DMA engine context, where only kernel can queue up jobs
> so that you do not need to wait for unrelated things and you can
> prioritize GPU page table update which should translate in fast GPU
> page table update without DMA fence.

Make sense

I'm not sure I saw this in the AMD hmm stuff - it would be good if
someone would look at that. Every time I do it looks like the locking
is wrong.

Jason
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2 2/3] drm/amdgpu: Inform user about effect of running without DPM

2020-06-19 Thread Paul Menzel
Currently, most users have no idea about the effects of failed DPM
initialization. So add it to the log message.

Signed-off-by: Paul Menzel 
---
 drivers/gpu/drm/amd/amdgpu/kv_dpm.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/si_dpm.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/kv_dpm.c 
b/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
index b179bdc17cdc..f054ded902f2 100644
--- a/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
+++ b/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
@@ -3033,7 +3033,7 @@ static int kv_dpm_sw_init(void *handle)
 dpm_failed:
kv_dpm_fini(adev);
mutex_unlock(&adev->pm.mutex);
-   drm_err(adev, "amdgpu: dpm initialization failed\n");
+   drm_err(adev, "amdgpu: dpm initialization failed. Clocks set up by 
firmware will be used. Most likely they are low, so performance might 
suffer.\n");
return ret;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/si_dpm.c 
b/drivers/gpu/drm/amd/amdgpu/si_dpm.c
index 8ba673ca2f5e..f7edc1d50df4 100644
--- a/drivers/gpu/drm/amd/amdgpu/si_dpm.c
+++ b/drivers/gpu/drm/amd/amdgpu/si_dpm.c
@@ -7710,7 +7710,7 @@ static int si_dpm_sw_init(void *handle)
 dpm_failed:
si_dpm_fini(adev);
mutex_unlock(&adev->pm.mutex);
-   drm_err(adev, "amdgpu: dpm initialization failed\n");
+   drm_err(adev, "amdgpu: dpm initialization failed. Clocks set up by 
firmware will be used. Most likely they are low, so performance might 
suffer.\n");
return ret;
 }
 
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Jerome Glisse
On Thu, Jun 11, 2020 at 07:35:35PM -0400, Felix Kuehling wrote:
> Am 2020-06-11 um 10:15 a.m. schrieb Jason Gunthorpe:
> > On Thu, Jun 11, 2020 at 10:34:30AM +0200, Daniel Vetter wrote:
> >>> I still have my doubts about allowing fence waiting from within shrinkers.
> >>> IMO ideally they should use a trywait approach, in order to allow memory
> >>> allocation during command submission for drivers that
> >>> publish fences before command submission. (Since early reservation object
> >>> release requires that).
> >> Yeah it is a bit annoying, e.g. for drm/scheduler I think we'll end up
> >> with a mempool to make sure it can handle it's allocations.
> >>
> >>> But since drivers are already waiting from within shrinkers and I take 
> >>> your
> >>> word for HMM requiring this,
> >> Yeah the big trouble is HMM and mmu notifiers. That's the really awkward
> >> one, the shrinker one is a lot less established.
> > I really question if HW that needs something like DMA fence should
> > even be using mmu notifiers - the best use is HW that can fence the
> > DMA directly without having to get involved with some command stream
> > processing.
> >
> > Or at the very least it should not be a generic DMA fence but a
> > narrowed completion tied only into the same GPU driver's command
> > completion processing which should be able to progress without
> > blocking.
> >
> > The intent of notifiers was never to endlessly block while vast
> > amounts of SW does work.
> >
> > Going around and switching everything in a GPU to GFP_ATOMIC seems
> > like bad idea.
> >
> >> I've pinged a bunch of armsoc gpu driver people and ask them how much this
> >> hurts, so that we have a clear answer. On x86 I don't think we have much
> >> of a choice on this, with userptr in amd and i915 and hmm work in nouveau
> >> (but nouveau I think doesn't use dma_fence in there). 
> 
> Soon nouveau will get company. We're working on a recoverable page fault
> implementation for HMM in amdgpu where we'll need to update page tables
> using the GPUs SDMA engine and wait for corresponding fences in MMU
> notifiers.

Note that HMM mandate, and i stressed that several time in the past,
that all GPU page table update are asynchronous and do not have to
wait on _anything_.

I understand that you use DMA engine for GPU page table update but
if you want to do so with HMM then you need a GPU page table update
only DMA context where all GPU page table update goes through and
where user space can not queue up job.

It can be for HMM only but if you want to mix HMM with non HMM then
everything need to be on that queue and other command queue will have
to depends on it.

Cheers,
Jérôme

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Jerome Glisse
On Fri, Jun 19, 2020 at 02:23:08PM -0300, Jason Gunthorpe wrote:
> On Fri, Jun 19, 2020 at 06:19:41PM +0200, Daniel Vetter wrote:
> 
> > The madness is only that device B's mmu notifier might need to wait
> > for fence_B so that the dma operation finishes. Which in turn has to
> > wait for device A to finish first.
> 
> So, it sound, fundamentally you've got this graph of operations across
> an unknown set of drivers and the kernel cannot insert itself in
> dma_fence hand offs to re-validate any of the buffers involved?
> Buffers which by definition cannot be touched by the hardware yet.
> 
> That really is a pretty horrible place to end up..
> 
> Pinning really is right answer for this kind of work flow. I think
> converting pinning to notifers should not be done unless notifier
> invalidation is relatively bounded. 
> 
> I know people like notifiers because they give a bit nicer performance
> in some happy cases, but this cripples all the bad cases..
> 
> If pinning doesn't work for some reason maybe we should address that?

Note that the dma fence is only true for user ptr buffer which predate
any HMM work and thus were using mmu notifier already. You need the
mmu notifier there because of fork and other corner cases.

For nouveau the notifier do not need to wait for anything it can update
the GPU page table right away. Modulo needing to write to GPU memory
using dma engine if the GPU page table is in GPU memory that is not
accessible from the CPU but that's never the case for nouveau so far
(but i expect it will be at one point).


So i see this as 2 different cases, the user ptr case, which does pin
pages by the way, where things are synchronous. Versus the HMM cases
where everything is asynchronous.


I probably need to warn AMD folks again that using HMM means that you
must be able to update the GPU page table asynchronously without
fence wait. The issue for AMD is that they already update their GPU
page table using DMA engine. I believe this is still doable if they
use a kernel only DMA engine context, where only kernel can queue up
jobs so that you do not need to wait for unrelated things and you can
prioritize GPU page table update which should translate in fast GPU
page table update without DMA fence.


> > Full disclosure: We are aware that we've designed ourselves into an
> > impressive corner here, and there's lots of talks going on about
> > untangling the dma synchronization from the memory management
> > completely. But
> 
> I think the documenting is really important: only GPU should be using
> this stuff and driving notifiers this way. Complete NO for any
> totally-not-a-GPU things in drivers/accel for sure.

Yes for user that expect HMM they need to be asynchronous. But it is
hard to revert user ptr has it was done a long time ago.

Cheers,
Jérôme

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Jason Gunthorpe
On Fri, Jun 19, 2020 at 06:19:41PM +0200, Daniel Vetter wrote:

> The madness is only that device B's mmu notifier might need to wait
> for fence_B so that the dma operation finishes. Which in turn has to
> wait for device A to finish first.

So, it sound, fundamentally you've got this graph of operations across
an unknown set of drivers and the kernel cannot insert itself in
dma_fence hand offs to re-validate any of the buffers involved?
Buffers which by definition cannot be touched by the hardware yet.

That really is a pretty horrible place to end up..

Pinning really is right answer for this kind of work flow. I think
converting pinning to notifers should not be done unless notifier
invalidation is relatively bounded. 

I know people like notifiers because they give a bit nicer performance
in some happy cases, but this cripples all the bad cases..

If pinning doesn't work for some reason maybe we should address that?

> Full disclosure: We are aware that we've designed ourselves into an
> impressive corner here, and there's lots of talks going on about
> untangling the dma synchronization from the memory management
> completely. But

I think the documenting is really important: only GPU should be using
this stuff and driving notifiers this way. Complete NO for any
totally-not-a-GPU things in drivers/accel for sure.

Jason
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/1] drm/amd/display: fix compilation error on allmodconfig

2020-06-19 Thread Kazlauskas, Nicholas

On 2020-06-19 11:27 a.m., Alex Deucher wrote:

On Thu, Jun 18, 2020 at 3:49 PM Qingqing Zhuo  wrote:


when compiled with allmodconfig option, there are error
messages as below:

ERROR: modpost:
"mod_color_is_table_init"
[drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
ERROR: modpost:
"mod_color_get_table"
[drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
ERROR: modpost:
"mod_color_set_table_init_state"
[drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!

To fix the issue, this commits removes
CONFIG_DRM_AMD_DC_DCN guard in color/makefile.

Signed-off-by: Qingqing Zhuo 
CC: Lewis Huang 
CC: Aric Cyr 
CC: Alexander Deucher 
CC: Harry Wentland 
CC: Nicholas Kazlauskas 
CC: Bhawanpreet Lakha 
CC: Stephen Rothwell 


Acked-by: Alex Deucher 


Reviewed-by: Nicholas Kazlauskas 




---
  drivers/gpu/drm/amd/display/modules/color/Makefile | 6 +-
  1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/color/Makefile 
b/drivers/gpu/drm/amd/display/modules/color/Makefile
index 3ee7f27ff93b..e66c19a840c2 100644
--- a/drivers/gpu/drm/amd/display/modules/color/Makefile
+++ b/drivers/gpu/drm/amd/display/modules/color/Makefile
@@ -23,11 +23,7 @@
  # Makefile for the color sub-module of DAL.
  #

-MOD_COLOR = color_gamma.o
-
-ifdef CONFIG_DRM_AMD_DC_DCN
-MOD_COLOR += color_table.o
-endif
+MOD_COLOR = color_gamma.o color_table.o

  AMD_DAL_MOD_COLOR = $(addprefix $(AMDDALPATH)/modules/color/,$(MOD_COLOR))
  #$(info   DAL COLOR MODULE MAKEFILE )
--
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Daniel Vetter
On Fri, Jun 19, 2020 at 5:15 PM Jason Gunthorpe  wrote:
>
> On Fri, Jun 19, 2020 at 05:06:04PM +0200, Daniel Vetter wrote:
> > On Fri, Jun 19, 2020 at 1:39 PM Jason Gunthorpe  wrote:
> > >
> > > On Fri, Jun 19, 2020 at 09:22:09AM +0200, Daniel Vetter wrote:
> > > > > As I've understood GPU that means you need to show that the commands
> > > > > associated with the buffer have completed. This is all local stuff
> > > > > within the driver, right? Why use fence (other than it already exists)
> > > >
> > > > Because that's the end-of-dma thing. And it's cross-driver for the
> > > > above reasons, e.g.
> > > > - device A renders some stuff. Userspace gets dma_fence A out of that
> > > > (well sync_file or one of the other uapi interfaces, but you get the
> > > > idea)
> > > > - userspace (across process or just different driver) issues more
> > > > rendering for device B, which depends upon the rendering done on
> > > > device A. So dma_fence A is an dependency and will block this dma
> > > > operation. Userspace (and the kernel) gets dma_fence B out of this
> > > > - because unfortunate reasons, the same rendering on device B also
> > > > needs a userptr buffer, which means that dma_fence B is also the one
> > > > that the mmu_range_notifier needs to wait on before it can tell core
> > > > mm that it can go ahead and release those pages
> > >
> > > I was afraid you'd say this - this is complete madness for other DMA
> > > devices to borrow the notifier hook of the first device!
> >
> > The first device might not even have a notifier. This is the 2nd
> > device, waiting on a dma_fence of its own, but which happens to be
> > queued up as a dma operation behind something else.
> >
> > > What if the first device is a page faulting device and doesn't call
> > > dma_fence??
> >
> > Not sure what you mean with this ... even if it does page-faulting for
> > some other reasons, it'll emit a dma_fence which the 2nd device can
> > consume as a dependency.
>
> At some point the pages under the buffer have to be either pinned
> or protected by mmu notifier. So each and every single device doing
> DMA to these pages must either pin, or use mmu notifier.
>
> Driver A should never 'borrow' a notifier from B

It doesn't. I guess this would be great topic for lpc with a seriously
big white-board, but I guess we don't have that this year again, so
let me try again. Simplified example ofc, but should be the gist.

Ingredients:
Device A and Device B
A dma-buf, shared between device A and device B, let's call that shared_buf
A userptr buffer, which userspace created on device B to hopefully
somewhat track a virtual memory range, let's call that userptr_buf.
A pile of other buffers, but we pretend they don't exist (because they
kinda don't matter.

Sequence of events as userspace issues them to the kernel.
1. dma operation on device A, which fills some interesting stuff into
shared_buf. Userspace gets back a handle to dma_fence fence_A. No mmu
notifier anywhere to be seen in the driver for device A.

2. userspace passes fence_A around to some other place

3. other places takes the handle for shared_buf and fence_A and
userptr_buf and starts a dma operation on device B. It's one dma
operation, maybe device B is taking the data from shared_buf and
compresses it into userptr_buf, so that userspace can then send it
over the network or to disk or whatever. device B has a mmu_notifier.
Userspace gets back fence_B, which represents this dma operation. The
kernel also stuffs this fence_B into the mmu_range_notifier for
userptr_buf.

-> at this point device A might still be crunching the numbers

4. device A is finally done doing whatever it was supposed to do, and
fence_A completes

5. device B wakes up (this might or might not involve the kernel,
usually it does) since fence_A has completed, and now starts doing its
own crunching.

6. once device B is also done, it signals fence_B

In all this device A has never borrowed the mmu notifier or even
accessd the memory in userptr_buf or had access to that buffer handle.

The madness is only that device B's mmu notifier might need to wait
for fence_B so that the dma operation finishes. Which in turn has to
wait for device A to finish first.

> If each driver controls its own lifetime of the buffers, why can't the
> driver locally wait for its device to finish?
>
> Can't the GPUs cancel work that is waiting on a DMA fence? Ie if
> Driver A detects that work completed and wants to trigger a DMA fence,
> but it now knows the buffer is invalidated, can't it tell driver B to
> give up?

We can (usually, the shitty hw where we can't has generally
disappeared) with gpu reset. Users make really sad faces when that
happens though, and generally they're only ok with that if it's indeed
a nasty gpu program that resulted in the crash (there's some webgl
shaders that run too long for quick&easy testing of how good the gpu
reset is, don't do that if you care about the data in your desktop
session ...).

The 

Re: [PATCH 1/1] drm/amd/display: fix compilation error on allmodconfig

2020-06-19 Thread Alex Deucher
On Thu, Jun 18, 2020 at 3:49 PM Qingqing Zhuo  wrote:
>
> when compiled with allmodconfig option, there are error
> messages as below:
>
> ERROR: modpost:
> "mod_color_is_table_init"
> [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
> ERROR: modpost:
> "mod_color_get_table"
> [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
> ERROR: modpost:
> "mod_color_set_table_init_state"
> [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>
> To fix the issue, this commits removes
> CONFIG_DRM_AMD_DC_DCN guard in color/makefile.
>
> Signed-off-by: Qingqing Zhuo 
> CC: Lewis Huang 
> CC: Aric Cyr 
> CC: Alexander Deucher 
> CC: Harry Wentland 
> CC: Nicholas Kazlauskas 
> CC: Bhawanpreet Lakha 
> CC: Stephen Rothwell 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/display/modules/color/Makefile | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/modules/color/Makefile 
> b/drivers/gpu/drm/amd/display/modules/color/Makefile
> index 3ee7f27ff93b..e66c19a840c2 100644
> --- a/drivers/gpu/drm/amd/display/modules/color/Makefile
> +++ b/drivers/gpu/drm/amd/display/modules/color/Makefile
> @@ -23,11 +23,7 @@
>  # Makefile for the color sub-module of DAL.
>  #
>
> -MOD_COLOR = color_gamma.o
> -
> -ifdef CONFIG_DRM_AMD_DC_DCN
> -MOD_COLOR += color_table.o
> -endif
> +MOD_COLOR = color_gamma.o color_table.o
>
>  AMD_DAL_MOD_COLOR = $(addprefix $(AMDDALPATH)/modules/color/,$(MOD_COLOR))
>  #$(info   DAL COLOR MODULE MAKEFILE )
> --
> 2.17.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Jason Gunthorpe
On Fri, Jun 19, 2020 at 05:06:04PM +0200, Daniel Vetter wrote:
> On Fri, Jun 19, 2020 at 1:39 PM Jason Gunthorpe  wrote:
> >
> > On Fri, Jun 19, 2020 at 09:22:09AM +0200, Daniel Vetter wrote:
> > > > As I've understood GPU that means you need to show that the commands
> > > > associated with the buffer have completed. This is all local stuff
> > > > within the driver, right? Why use fence (other than it already exists)
> > >
> > > Because that's the end-of-dma thing. And it's cross-driver for the
> > > above reasons, e.g.
> > > - device A renders some stuff. Userspace gets dma_fence A out of that
> > > (well sync_file or one of the other uapi interfaces, but you get the
> > > idea)
> > > - userspace (across process or just different driver) issues more
> > > rendering for device B, which depends upon the rendering done on
> > > device A. So dma_fence A is an dependency and will block this dma
> > > operation. Userspace (and the kernel) gets dma_fence B out of this
> > > - because unfortunate reasons, the same rendering on device B also
> > > needs a userptr buffer, which means that dma_fence B is also the one
> > > that the mmu_range_notifier needs to wait on before it can tell core
> > > mm that it can go ahead and release those pages
> >
> > I was afraid you'd say this - this is complete madness for other DMA
> > devices to borrow the notifier hook of the first device!
> 
> The first device might not even have a notifier. This is the 2nd
> device, waiting on a dma_fence of its own, but which happens to be
> queued up as a dma operation behind something else.
> 
> > What if the first device is a page faulting device and doesn't call
> > dma_fence??
> 
> Not sure what you mean with this ... even if it does page-faulting for
> some other reasons, it'll emit a dma_fence which the 2nd device can
> consume as a dependency.

At some point the pages under the buffer have to be either pinned
or protected by mmu notifier. So each and every single device doing
DMA to these pages must either pin, or use mmu notifier.

Driver A should never 'borrow' a notifier from B

If each driver controls its own lifetime of the buffers, why can't the
driver locally wait for its device to finish?

Can't the GPUs cancel work that is waiting on a DMA fence? Ie if
Driver A detects that work completed and wants to trigger a DMA fence,
but it now knows the buffer is invalidated, can't it tell driver B to
give up?

> The problem is that there's piles of other dependencies for a dma job.
> GPU doesn't just consume a single buffer each time, it consumes entire
> lists of buffers and mixes them all up in funny ways. Some of these
> buffers are userptr, entirely local to the device. Other buffers are
> just normal device driver allocations (and managed with some shrinker
> to keep them in check). And then there's the actually shared dma-buf
> with other devices. The trouble is that they're all bundled up
> together.

But why does this matter? Does the GPU itself consume some work and
then stall internally waiting for an external DMA fence?

Otherwise I would expect this dependency chain should be breakable by
aborting work waiting on fences upon invalidation (without stalling)

> > Do not need to wait on dma_fence in notifiers.
> 
> Maybe :-) The goal of this series is more to document current rules
> and make them more consistent. Fixing them if we don't like them might
> be a follow-up task, but that would likely be a pile more work. First
> we need to know what the exact shape of the problem even is.

Fair enough

Jason
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amd: fix potential memleak in err branch

2020-06-19 Thread Bernard Zhao
The function kobject_init_and_add alloc memory like:
kobject_init_and_add->kobject_add_varg->kobject_set_name_vargs
->kvasprintf_const->kstrdup_const->kstrdup->kmalloc_track_caller
->kmalloc_slab, in err branch this memory not free. If use
kmemleak, this path maybe catched.
These changes are to add kobject_put in kobject_init_and_add
failed branch, fix potential memleak.

Signed-off-by: Bernard Zhao 
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 20 +++-
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index d27221ddcdeb..5ee4d6cfb16d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -124,6 +124,7 @@ void kfd_procfs_init(void)
if (ret) {
pr_warn("Could not create procfs proc folder");
/* If we fail to create the procfs, clean up */
+   kobject_put(procfs.kobj);
kfd_procfs_shutdown();
}
 }
@@ -428,6 +429,7 @@ struct kfd_process *kfd_create_process(struct file *filep)
   (int)process->lead_thread->pid);
if (ret) {
pr_warn("Creating procfs pid directory failed");
+   kobject_put(process->kobj);
goto out;
}
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index bb77f7af2b6d..dc3c4149f860 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -632,8 +632,10 @@ static int kfd_build_sysfs_node_entry(struct 
kfd_topology_device *dev,
 
ret = kobject_init_and_add(dev->kobj_node, &node_type,
sys_props.kobj_nodes, "%d", id);
-   if (ret < 0)
+   if (ret < 0) {
+   kobject_put(dev->kobj_node);
return ret;
+   }
 
dev->kobj_mem = kobject_create_and_add("mem_banks", dev->kobj_node);
if (!dev->kobj_mem)
@@ -680,8 +682,10 @@ static int kfd_build_sysfs_node_entry(struct 
kfd_topology_device *dev,
return -ENOMEM;
ret = kobject_init_and_add(mem->kobj, &mem_type,
dev->kobj_mem, "%d", i);
-   if (ret < 0)
+   if (ret < 0) {
+   kobject_put(mem->kobj);
return ret;
+   }
 
mem->attr.name = "properties";
mem->attr.mode = KFD_SYSFS_FILE_MODE;
@@ -699,8 +703,10 @@ static int kfd_build_sysfs_node_entry(struct 
kfd_topology_device *dev,
return -ENOMEM;
ret = kobject_init_and_add(cache->kobj, &cache_type,
dev->kobj_cache, "%d", i);
-   if (ret < 0)
+   if (ret < 0) {
+   kobject_put(cache->kobj);
return ret;
+   }
 
cache->attr.name = "properties";
cache->attr.mode = KFD_SYSFS_FILE_MODE;
@@ -718,8 +724,10 @@ static int kfd_build_sysfs_node_entry(struct 
kfd_topology_device *dev,
return -ENOMEM;
ret = kobject_init_and_add(iolink->kobj, &iolink_type,
dev->kobj_iolink, "%d", i);
-   if (ret < 0)
+   if (ret < 0) {
+   kobject_put(iolink->kobj);
return ret;
+   }
 
iolink->attr.name = "properties";
iolink->attr.mode = KFD_SYSFS_FILE_MODE;
@@ -798,8 +806,10 @@ static int kfd_topology_update_sysfs(void)
ret = kobject_init_and_add(sys_props.kobj_topology,
&sysprops_type,  &kfd_device->kobj,
"topology");
-   if (ret < 0)
+   if (ret < 0) {
+   kobject_put(sys_props.kobj_topology);
return ret;
+   }
 
sys_props.kobj_nodes = kobject_create_and_add("nodes",
sys_props.kobj_topology);
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Jason Gunthorpe
On Fri, Jun 19, 2020 at 09:22:09AM +0200, Daniel Vetter wrote:
> > As I've understood GPU that means you need to show that the commands
> > associated with the buffer have completed. This is all local stuff
> > within the driver, right? Why use fence (other than it already exists)
> 
> Because that's the end-of-dma thing. And it's cross-driver for the
> above reasons, e.g.
> - device A renders some stuff. Userspace gets dma_fence A out of that
> (well sync_file or one of the other uapi interfaces, but you get the
> idea)
> - userspace (across process or just different driver) issues more
> rendering for device B, which depends upon the rendering done on
> device A. So dma_fence A is an dependency and will block this dma
> operation. Userspace (and the kernel) gets dma_fence B out of this
> - because unfortunate reasons, the same rendering on device B also
> needs a userptr buffer, which means that dma_fence B is also the one
> that the mmu_range_notifier needs to wait on before it can tell core
> mm that it can go ahead and release those pages

I was afraid you'd say this - this is complete madness for other DMA
devices to borrow the notifier hook of the first device!

What if the first device is a page faulting device and doesn't call
dma_fence??

It you are going to treat things this way then the mmu notifier really
needs to be part of the some core DMA buf, and not randomly sprinkled
in drivers

But really this is what page pinning is supposed to be used for, the
MM behavior when it blocks on a pinned page is less invasive than if
it stalls inside a mmu notifier.

You can mix it, use mmu notififers to keep track if the buffer is
still live, but when you want to trigger DMA then pin the pages and
keep them pinned until DMA is done. The pin protects things (well,
fork is still a problem)

Do not need to wait on dma_fence in notifiers.

Jason
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdkfd: Fix memory leaks according to error branches

2020-06-19 Thread Markus Elfring
> The function kobject_init_and_add alloc memory like:
> kobject_init_and_add->kobject_add_varg->kobject_set_name_vargs
> ->kvasprintf_const->kstrdup_const->kstrdup->kmalloc_track_caller
> ->kmalloc_slab, in err branch this memory not free. If use
> kmemleak, this path maybe catched.
> These changes are to add kobject_put in kobject_init_and_add
> failed branch, fix potential memleak.

I suggest to improve this change description.

* Can an other wording variant be nicer?

* Will the tag “Fixes” become helpful?

Regards,
Markus
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Daniel Vetter
On Fri, Jun 19, 2020 at 1:39 PM Jason Gunthorpe  wrote:
>
> On Fri, Jun 19, 2020 at 09:22:09AM +0200, Daniel Vetter wrote:
> > > As I've understood GPU that means you need to show that the commands
> > > associated with the buffer have completed. This is all local stuff
> > > within the driver, right? Why use fence (other than it already exists)
> >
> > Because that's the end-of-dma thing. And it's cross-driver for the
> > above reasons, e.g.
> > - device A renders some stuff. Userspace gets dma_fence A out of that
> > (well sync_file or one of the other uapi interfaces, but you get the
> > idea)
> > - userspace (across process or just different driver) issues more
> > rendering for device B, which depends upon the rendering done on
> > device A. So dma_fence A is an dependency and will block this dma
> > operation. Userspace (and the kernel) gets dma_fence B out of this
> > - because unfortunate reasons, the same rendering on device B also
> > needs a userptr buffer, which means that dma_fence B is also the one
> > that the mmu_range_notifier needs to wait on before it can tell core
> > mm that it can go ahead and release those pages
>
> I was afraid you'd say this - this is complete madness for other DMA
> devices to borrow the notifier hook of the first device!

The first device might not even have a notifier. This is the 2nd
device, waiting on a dma_fence of its own, but which happens to be
queued up as a dma operation behind something else.

> What if the first device is a page faulting device and doesn't call
> dma_fence??

Not sure what you mean with this ... even if it does page-faulting for
some other reasons, it'll emit a dma_fence which the 2nd device can
consume as a dependency.

> It you are going to treat things this way then the mmu notifier really
> needs to be part of the some core DMA buf, and not randomly sprinkled
> in drivers

So maybe again unclear, we don't allow such userptr dma-buf to even be
shared. They're just for slurping in stuff in the local device
(general from file io or something the cpu has done or similar). There
have been attempts to use it as the general backing storage, but that
didn't go down too well because way too many complications.

Generally most memory the gpu operates on isn't stuff that's
mmu_notifier'ed. And also, the device with userptr support only waits
for its own dma_fence (because well you can't share this stuff, we
disallow that).

The problem is that there's piles of other dependencies for a dma job.
GPU doesn't just consume a single buffer each time, it consumes entire
lists of buffers and mixes them all up in funny ways. Some of these
buffers are userptr, entirely local to the device. Other buffers are
just normal device driver allocations (and managed with some shrinker
to keep them in check). And then there's the actually shared dma-buf
with other devices. The trouble is that they're all bundled up
together.

Now we probably should have some helper code for userptr so that all
drivers do this roughly the same, but that's just not there yet. But
it can't be a dma-buf exporter behind the dma-buf interfaces, because
even just pinned get_user_pages would be too different semantics
compared to normal shared dma-buf objects, that's all very tightly
tied into the specific driver.

> But really this is what page pinning is supposed to be used for, the
> MM behavior when it blocks on a pinned page is less invasive than if
> it stalls inside a mmu notifier.
>
> You can mix it, use mmu notififers to keep track if the buffer is
> still live, but when you want to trigger DMA then pin the pages and
> keep them pinned until DMA is done. The pin protects things (well,
> fork is still a problem)

Hm I thought amdgpu had that (or drm/radeon as the previous
incarnation of that stack), and was unhappy about the issues. Would
need Christian König to chime in.

> Do not need to wait on dma_fence in notifiers.

Maybe :-) The goal of this series is more to document current rules
and make them more consistent. Fixing them if we don't like them might
be a follow-up task, but that would likely be a pile more work. First
we need to know what the exact shape of the problem even is.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Intel-gfx] [PATCH 03/18] dma-fence: basic lockdep annotations

2020-06-19 Thread Chris Wilson
Quoting Daniel Vetter (2020-06-19 10:43:09)
> On Fri, Jun 19, 2020 at 10:13:35AM +0100, Chris Wilson wrote:
> > Quoting Daniel Vetter (2020-06-19 09:51:59)
> > > On Fri, Jun 19, 2020 at 10:25 AM Chris Wilson  
> > > wrote:
> > > > Forcing a generic primitive to always be part of the same global map is
> > > > horrible.
> > > 
> > > And  no concrete example or reason for why that's not possible.
> > > Because frankly it's not horrible, this is what upstream is all about:
> > > Shared concepts, shared contracts, shared code.
> > > 
> > > The proposed patches might very well encode the wrong contract, that's
> > > all up for discussion. But fundamentally questioning that we need one
> > > is missing what upstream is all about.
> > 
> > Then I have not clearly communicated, as my opinion is not that
> > validation is worthless, but that the implementation is enshrining a
> > global property on a low level primitive that prevents it from being
> > used elsewhere. And I want to replace completion [chains] with fences, and
> > bio with fences, and closures with fences, and what other equivalencies
> > there are in the kernel. The fence is as central a locking construct as
> > struct completion and deserves to be a foundational primitive provided
> > by kernel/ used throughout all drivers for discrete problem domains.
> > 
> > This is narrowing dma_fence whereby adding
> >   struct lockdep_map *dma_fence::wait_map
> > and annotating linkage, allows you to continue to specify that all
> > dma_fence used for a particular purpose must follow common rules,
> > without restricting the primitive for uses outside of this scope.
> 
> Somewhere else in this thread I had discussions with Jason Gunthorpe about
> this topic. It might maybe change somewhat depending upon exact rules, but
> his take is very much "I don't want dma_fence in rdma". Or pretty close to
> that at least.
> 
> Similar discussions with habanalabs, they're using dma_fence internally
> without any of the uapi. Discussion there has also now concluded that it's
> best if they remove them, and simply switch over to a wait_queue or
> completion like every other driver does.
> 
> The next round of the patches already have a paragraph to at least
> somewhat limit how non-gpu drivers use dma_fence. And I guess actual
> consensus might be pointing even more strongly at dma_fence being solely
> something for gpus and closely related subsystem (maybe media) for syncing
> dma-buf access.
> 
> So dma_fence as general replacement for completion chains I think just
> wont happen.

That is sad. I cannot comprehend going back to pure completions after a
taste of fence scheduling. And we are not even close to fully utilising
them, as not all the async cpu [allocation!] tasks are fully tracked by
fences yet and are still stuck in a FIFO workqueue.

> What might make sense is if e.g. the lockdep annotations could be reused,
> at least in design, for wait_queue or completion or anything else
> really. I do think that has a fair chance compared to the automagic
> cross-release annotations approach, which relied way too heavily on
> guessing where barriers are. My experience from just a bit of playing
> around with these patches here and discussing them with other driver
> maintainers is that accurately deciding where critical sections start and
> end is a job for humans only. And if you get it wrong, you will have a
> false positive.
> 
> And you're indeed correct that if we'd do annotations for completions and
> wait queues, then that would need to have a class per semantically
> equivalent user, like we have lockdep classes for mutexes, not just one
> overall.
> 
> But dma_fence otoh is something very specific, which comes with very
> specific rules attached - it's not a generic wait_queue at all. Originally
> it did start out as one even, but it is a very specialized wait_queue.
> 
> So there's imo two cases:
> 
> - Your completion is entirely orthogonal of dma_fences, and can never ever
>   block a dma_fence. Don't use dma_fence for this, and no problem. It's
>   just another wait_queue somewhere.
> 
> - Your completion can eventually, maybe through lots of convolutions and
>   depdencies, block a dma_fence. In that case full dma_fence rules apply,
>   and the only thing you can do with a custom annotation is make the rules
>   even stricter. E.g. if a sub-timeline in the scheduler isn't allowed to
>   take certain scheduler locks. But the userspace visible/published fence
>   do take them, maybe as part of command submission or retirement.
>   Entirely hypotethical, no idea any driver actually needs this.

I think we are faced with this very real problem.

The papering we have today over userptr is so very thin, and if you
squint you can already see it is coupled into the completion signal. Just
it happens to be on the other side of the fence.

The next batch of priority inversions involve integrating the async cpu
tasks into the scheduler, and have full dependency t

[PATCH v7 04/36] drm: amdgpu: fix common struct sg_table related issues

2020-06-19 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, lets use a common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.

Signed-off-by: Marek Szyprowski 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c  | 6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c  | 9 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 8 
 3 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index 43d8ed7dbd00..519ce4427fce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -307,8 +307,8 @@ static struct sg_table *amdgpu_dma_buf_map(struct 
dma_buf_attachment *attach,
if (IS_ERR(sgt))
return sgt;
 
-   if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
- DMA_ATTR_SKIP_CPU_SYNC))
+   if (dma_map_sgtable(attach->dev, sgt, dir,
+   DMA_ATTR_SKIP_CPU_SYNC))
goto error_free;
break;
 
@@ -349,7 +349,7 @@ static void amdgpu_dma_buf_unmap(struct dma_buf_attachment 
*attach,
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
 
if (sgt->sgl->page_link) {
-   dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, dir);
+   dma_unmap_sgtable(attach->dev, sgt, dir, 0);
sg_free_table(sgt);
kfree(sgt);
} else {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 5129a996e941..97fb73e5a6ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1025,7 +1025,6 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm)
 {
struct amdgpu_device *adev = amdgpu_ttm_adev(ttm->bdev);
struct amdgpu_ttm_tt *gtt = (void *)ttm;
-   unsigned nents;
int r;
 
int write = !(gtt->userflags & AMDGPU_GEM_USERPTR_READONLY);
@@ -1040,9 +1039,8 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm)
goto release_sg;
 
/* Map SG to device */
-   r = -ENOMEM;
-   nents = dma_map_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, direction);
-   if (nents == 0)
+   r = dma_map_sgtable(adev->dev, ttm->sg, direction, 0);
+   if (r)
goto release_sg;
 
/* convert SG to linear array of pages and dma addresses */
@@ -1073,8 +1071,7 @@ static void amdgpu_ttm_tt_unpin_userptr(struct ttm_tt 
*ttm)
return;
 
/* unmap the pages mapped to the device */
-   dma_unmap_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, direction);
-
+   dma_unmap_sgtable(adev->dev, ttm->sg, direction, 0);
sg_free_table(ttm->sg);
 
 #if IS_ENABLED(CONFIG_DRM_AMDGPU_USERPTR)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index d399e5893170..c281aa13f5ec 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -477,11 +477,11 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
if (r)
goto error_free;
 
-   for_each_sg((*sgt)->sgl, sg, num_entries, i)
+   for_each_sgtable_sg((*sgt), sg, i)
sg->length = 0;
 
node = mem->mm_node;
-   for_each_sg((*sgt)->sgl, sg, num_entries, i) {
+   for_each_sgtable_sg((*sgt), sg, i) {
phys_addr_t phys = (node->start << PAGE_SHIFT) +
adev->gmc.aper_base;
size_t size = node->size << PAGE_SHIFT;
@@ -501,7 +501,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
return 0;
 
 error_unmap:
-   for_each_sg((*sgt)->sgl, sg, num_entries, i) {
+   for_each_sgtable_sg((*sgt), sg, i) {
i

[PATCH v7 17/36] drm: radeon: fix common struct sg_table related issues

2020-06-19 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, lets use a common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.

Signed-off-by: Marek Szyprowski 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 5d50c9edbe80..0e3eb0d22831 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -481,7 +481,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
 {
struct radeon_device *rdev = radeon_get_rdev(ttm->bdev);
struct radeon_ttm_tt *gtt = (void *)ttm;
-   unsigned pinned = 0, nents;
+   unsigned pinned = 0;
int r;
 
int write = !(gtt->userflags & RADEON_GEM_USERPTR_READONLY);
@@ -521,9 +521,8 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
if (r)
goto release_sg;
 
-   r = -ENOMEM;
-   nents = dma_map_sg(rdev->dev, ttm->sg->sgl, ttm->sg->nents, direction);
-   if (nents == 0)
+   r = dma_map_sgtable(rdev->dev, ttm->sg, direction, 0);
+   if (r)
goto release_sg;
 
drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
@@ -554,9 +553,9 @@ static void radeon_ttm_tt_unpin_userptr(struct ttm_tt *ttm)
return;
 
/* free the sg table and pages again */
-   dma_unmap_sg(rdev->dev, ttm->sg->sgl, ttm->sg->nents, direction);
+   dma_unmap_sgtable(rdev->dev, ttm->sg, direction, 0);
 
-   for_each_sg_page(ttm->sg->sgl, &sg_iter, ttm->sg->nents, 0) {
+   for_each_sgtable_page(ttm->sg, &sg_iter, 0) {
struct page *page = sg_page_iter_page(&sg_iter);
if (!(gtt->userflags & RADEON_GEM_USERPTR_READONLY))
set_page_dirty(page);
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Intel-gfx] [PATCH 03/18] dma-fence: basic lockdep annotations

2020-06-19 Thread Daniel Vetter
On Fri, Jun 19, 2020 at 10:13:35AM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2020-06-19 09:51:59)
> > On Fri, Jun 19, 2020 at 10:25 AM Chris Wilson  
> > wrote:
> > > Forcing a generic primitive to always be part of the same global map is
> > > horrible.
> > 
> > And  no concrete example or reason for why that's not possible.
> > Because frankly it's not horrible, this is what upstream is all about:
> > Shared concepts, shared contracts, shared code.
> > 
> > The proposed patches might very well encode the wrong contract, that's
> > all up for discussion. But fundamentally questioning that we need one
> > is missing what upstream is all about.
> 
> Then I have not clearly communicated, as my opinion is not that
> validation is worthless, but that the implementation is enshrining a
> global property on a low level primitive that prevents it from being
> used elsewhere. And I want to replace completion [chains] with fences, and
> bio with fences, and closures with fences, and what other equivalencies
> there are in the kernel. The fence is as central a locking construct as
> struct completion and deserves to be a foundational primitive provided
> by kernel/ used throughout all drivers for discrete problem domains.
> 
> This is narrowing dma_fence whereby adding
>   struct lockdep_map *dma_fence::wait_map
> and annotating linkage, allows you to continue to specify that all
> dma_fence used for a particular purpose must follow common rules,
> without restricting the primitive for uses outside of this scope.

Somewhere else in this thread I had discussions with Jason Gunthorpe about
this topic. It might maybe change somewhat depending upon exact rules, but
his take is very much "I don't want dma_fence in rdma". Or pretty close to
that at least.

Similar discussions with habanalabs, they're using dma_fence internally
without any of the uapi. Discussion there has also now concluded that it's
best if they remove them, and simply switch over to a wait_queue or
completion like every other driver does.

The next round of the patches already have a paragraph to at least
somewhat limit how non-gpu drivers use dma_fence. And I guess actual
consensus might be pointing even more strongly at dma_fence being solely
something for gpus and closely related subsystem (maybe media) for syncing
dma-buf access.

So dma_fence as general replacement for completion chains I think just
wont happen.

What might make sense is if e.g. the lockdep annotations could be reused,
at least in design, for wait_queue or completion or anything else
really. I do think that has a fair chance compared to the automagic
cross-release annotations approach, which relied way too heavily on
guessing where barriers are. My experience from just a bit of playing
around with these patches here and discussing them with other driver
maintainers is that accurately deciding where critical sections start and
end is a job for humans only. And if you get it wrong, you will have a
false positive.

And you're indeed correct that if we'd do annotations for completions and
wait queues, then that would need to have a class per semantically
equivalent user, like we have lockdep classes for mutexes, not just one
overall.

But dma_fence otoh is something very specific, which comes with very
specific rules attached - it's not a generic wait_queue at all. Originally
it did start out as one even, but it is a very specialized wait_queue.

So there's imo two cases:

- Your completion is entirely orthogonal of dma_fences, and can never ever
  block a dma_fence. Don't use dma_fence for this, and no problem. It's
  just another wait_queue somewhere.

- Your completion can eventually, maybe through lots of convolutions and
  depdencies, block a dma_fence. In that case full dma_fence rules apply,
  and the only thing you can do with a custom annotation is make the rules
  even stricter. E.g. if a sub-timeline in the scheduler isn't allowed to
  take certain scheduler locks. But the userspace visible/published fence
  do take them, maybe as part of command submission or retirement.
  Entirely hypotethical, no idea any driver actually needs this.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Intel-gfx] [PATCH 03/18] dma-fence: basic lockdep annotations

2020-06-19 Thread Chris Wilson
Quoting Daniel Vetter (2020-06-19 09:51:59)
> On Fri, Jun 19, 2020 at 10:25 AM Chris Wilson  
> wrote:
> > Forcing a generic primitive to always be part of the same global map is
> > horrible.
> 
> And  no concrete example or reason for why that's not possible.
> Because frankly it's not horrible, this is what upstream is all about:
> Shared concepts, shared contracts, shared code.
> 
> The proposed patches might very well encode the wrong contract, that's
> all up for discussion. But fundamentally questioning that we need one
> is missing what upstream is all about.

Then I have not clearly communicated, as my opinion is not that
validation is worthless, but that the implementation is enshrining a
global property on a low level primitive that prevents it from being
used elsewhere. And I want to replace completion [chains] with fences, and
bio with fences, and closures with fences, and what other equivalencies
there are in the kernel. The fence is as central a locking construct as
struct completion and deserves to be a foundational primitive provided
by kernel/ used throughout all drivers for discrete problem domains.

This is narrowing dma_fence whereby adding
struct lockdep_map *dma_fence::wait_map
and annotating linkage, allows you to continue to specify that all
dma_fence used for a particular purpose must follow common rules,
without restricting the primitive for uses outside of this scope.
-Chris
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Intel-gfx] [PATCH 03/18] dma-fence: basic lockdep annotations

2020-06-19 Thread Daniel Vetter
On Fri, Jun 19, 2020 at 10:25 AM Chris Wilson  wrote:
>
> Quoting Daniel Stone (2020-06-11 10:01:46)
> > Hi,
> >
> > On Thu, 11 Jun 2020 at 09:44, Dave Airlie  wrote:
> > > On Thu, 11 Jun 2020 at 18:01, Chris Wilson  
> > > wrote:
> > > > Introducing a global lockmap that cannot capture the rules correctly,
> > >
> > > Can you document the rules all drivers should be following then,
> > > because from here it looks to get refactored every version of i915,
> > > and it would be nice if we could all aim for the same set of things
> > > roughly. We've already had enough problems with amdgpu vs i915 vs
> > > everyone else with fences, if this stops that in the future then I'd
> > > rather we have that than just some unwritten rules per driver and
> > > untestable.
> >
> > As someone who has sunk a bunch of work into explicit-fencing
> > awareness in my compositor so I can never be blocked, I'd be
> > disappointed if the infrastructure was ultimately pointless because
> > the documented fencing rules were \_o_/ or thereabouts. Lockdep
> > definitely isn't my area of expertise so I can't comment on the patch
> > per se, but having something to ensure we don't hit deadlocks sure
> > seems a lot better than nothing.
>
> This is doing dependency analysis on execution contexts which is a far
> cry from doing the fence dependency analysis, and so has to actively
> ignore the cycles that must exist on the dma side, and also the cycles
> that prevent entering execution contexts on the CPU. It has to actively
> ignore scheduler execution contexts, for lockdep cries, and so we do not
> get analysis of the locking contexts along that path. This would be
> solvable along the lines of extending lockdep ala lockdep_dma_enter().

drm/scheduler is annotated, found some rather improbably to hit issues
in practice. But from the quick chat I've had with König and others I
think he agrees that it's real at least in the theoretical sense.
Probably should consider playing lottery if you hit it in practice
though :-)

> Had i915's execution flow been marked up, it should have found the
> dubious wait for external fences inside the dead GPU recovery, and
> probably found a few more things to complain about with the reset locking.
> [Note we already do the same annotations for wait-vs-reset, but not
> reset-vs-execution.]

I know it splats, that's why the tdr annotation patch comes with a
spec proposal for lifting the wait busting we do in i915 to the
dma_fence level. I included that because amdgpu has the same problem
on modern hw. Apparently their planned fix (because they've hit this
bug in testing) was to push some shared lock down into their
atomic_comit_tail function and use that in gpu reset, so don't seem
that interested in extending dma_fence.

For i915 it's just gen2/3 display, and cross-driver dma-buf/fence
usage for those is nil and won't change. Pragmatic solution imo would
be to just not annotate gpu reset on these platforms, and relying on
our wait busting plus igt tests to make sure it keeps working as-is.
The point of the explicit annotations for the signalling side is very
much that it can be rolled out gradually, and entirely left out for
old legacy paths that aren't worth fixing.

> Determination of which waits are legal and which are not is entirely ad
> hoc, for there is no status change tracking in the dependency analysis
> [that is once an execution context is linked to a published fence, again
> integral to lockdep.] Consider if the completion chain in atomic is
> swapped out for the morally equivalent fences along intertwined timelines,
> and so it does a bunch of dma_fence_wait() instead. Why are those waits
> legal despite them being after we have committed to fulfilling the out
> fence? [Why are the waits on and for the GPU legal, since they equally
> block execution flow?]

No need to consider, it's already real and resulted in some pretty
splats until I got the recursion handling right.

> Forcing a generic primitive to always be part of the same global map is
> horrible. You forgo being able to use the primitive for unrelated tasks,
> lose the ability to name particular contexts to gain more informative
> dependency cycle reports from having the explicit linkage. You can add
> wait_map tracking without loss of generality [in less than 10 lines],
> and you can still enforce that all fences used for a common purpose
> follow the same rules [the simplest way being to default to the singular
> wait_map]. But it's the explicitly named execution contexts that are the
> biggest boon to reading the code and reading the lockdep warns.

So one thing that's maybe not clear here: This doesn't track the DAG
of dependencies. Doesn't even try, I'm still faithfully assuming
drivers get that part right. Which is a gap and maybe we should fix
this, but not the goal here.

All this does is validate fences against anything else that might be
going on in the system. E.g. your recursion example for atomic is
handled by just a

RE: [PATCH 2/2] drm/amdgpu: add fw release for sdma v5_0

2020-06-19 Thread Zhang, Hawking
[AMD Public Use]

Series is:

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of Wenhui Sheng
Sent: Thursday, June 18, 2020 15:53
To: amd-gfx@lists.freedesktop.org
Cc: Sheng, Wenhui 
Subject: [PATCH 2/2] drm/amdgpu: add fw release for sdma v5_0

sdma v5_0 fw isn't released when module exit

Signed-off-by: Wenhui Sheng 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
index 58d2a80af450..6751ad69ed90 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
@@ -1299,8 +1299,12 @@ static int sdma_v5_0_sw_fini(void *handle)
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
int i;
 
-   for (i = 0; i < adev->sdma.num_instances; i++)
+   for (i = 0; i < adev->sdma.num_instances; i++) {
+   if (adev->sdma.instance[i].fw != NULL)
+   release_firmware(adev->sdma.instance[i].fw);
+
amdgpu_ring_fini(&adev->sdma.instance[i].ring);
+   }
 
return 0;
 }
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7Chawking.zhang%40amd.com%7Cd0b8b72e0be443a7343d08d8135cb093%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637280636689938431&sdata=2g%2BOLo5Hg6gxO2pF%2B2uDnshodSAmLaSh9RZQsKrnFUI%3D&reserved=0
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Intel-gfx] [PATCH 03/18] dma-fence: basic lockdep annotations

2020-06-19 Thread Chris Wilson
Quoting Daniel Stone (2020-06-11 10:01:46)
> Hi,
> 
> On Thu, 11 Jun 2020 at 09:44, Dave Airlie  wrote:
> > On Thu, 11 Jun 2020 at 18:01, Chris Wilson  wrote:
> > > Introducing a global lockmap that cannot capture the rules correctly,
> >
> > Can you document the rules all drivers should be following then,
> > because from here it looks to get refactored every version of i915,
> > and it would be nice if we could all aim for the same set of things
> > roughly. We've already had enough problems with amdgpu vs i915 vs
> > everyone else with fences, if this stops that in the future then I'd
> > rather we have that than just some unwritten rules per driver and
> > untestable.
> 
> As someone who has sunk a bunch of work into explicit-fencing
> awareness in my compositor so I can never be blocked, I'd be
> disappointed if the infrastructure was ultimately pointless because
> the documented fencing rules were \_o_/ or thereabouts. Lockdep
> definitely isn't my area of expertise so I can't comment on the patch
> per se, but having something to ensure we don't hit deadlocks sure
> seems a lot better than nothing.

This is doing dependency analysis on execution contexts which is a far
cry from doing the fence dependency analysis, and so has to actively
ignore the cycles that must exist on the dma side, and also the cycles
that prevent entering execution contexts on the CPU. It has to actively
ignore scheduler execution contexts, for lockdep cries, and so we do not
get analysis of the locking contexts along that path. This would be
solvable along the lines of extending lockdep ala lockdep_dma_enter().

Had i915's execution flow been marked up, it should have found the
dubious wait for external fences inside the dead GPU recovery, and
probably found a few more things to complain about with the reset locking.
[Note we already do the same annotations for wait-vs-reset, but not
reset-vs-execution.]

Determination of which waits are legal and which are not is entirely ad
hoc, for there is no status change tracking in the dependency analysis
[that is once an execution context is linked to a published fence, again
integral to lockdep.] Consider if the completion chain in atomic is
swapped out for the morally equivalent fences along intertwined timelines,
and so it does a bunch of dma_fence_wait() instead. Why are those waits
legal despite them being after we have committed to fulfilling the out
fence? [Why are the waits on and for the GPU legal, since they equally
block execution flow?]

Forcing a generic primitive to always be part of the same global map is
horrible. You forgo being able to use the primitive for unrelated tasks,
lose the ability to name particular contexts to gain more informative
dependency cycle reports from having the explicit linkage. You can add
wait_map tracking without loss of generality [in less than 10 lines],
and you can still enforce that all fences used for a common purpose
follow the same rules [the simplest way being to default to the singular
wait_map]. But it's the explicitly named execution contexts that are the
biggest boon to reading the code and reading the lockdep warns.

This is a bunch of ad hoc tracking for a very narrow purpose applied
globally, with loss of information.
-Chris
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-19 Thread Daniel Vetter
On Fri, Jun 19, 2020 at 8:58 AM Jason Gunthorpe  wrote:
>
> On Thu, Jun 18, 2020 at 05:00:51PM +0200, Daniel Vetter wrote:
> > On Wed, Jun 17, 2020 at 12:28:35PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Jun 17, 2020 at 08:48:50AM +0200, Daniel Vetter wrote:
> > >
> > > > Now my understanding for rdma is that if you don't have hw page fault
> > > > support,
> > >
> > > The RDMA ODP feature is restartable HW page faulting just like nouveau
> > > has. The classical MR feature doesn't have this. Only mlx5 HW supports
> > > ODP today.
> > >
> > > > It's only gpus (I think) which are in this awkward in-between spot
> > > > where dynamic memory management really is much wanted, but the hw
> > > > kinda sucks. Aside, about 10+ years ago we had a similar problem with
> > > > gpu hw, but for security: Many gpu didn't have any kinds of page
> > > > tables to isolate different clients from each another. drivers/gpu
> > > > fixed this by parsing&validating what userspace submitted to make sure
> > > > it's only every accessing its own buffers. Most gpus have become
> > > > reasonable nowadays and do have proper per-process pagetables (gpu
> > > > process, not the pasid stuff), but even today there's still some of
> > > > the old model left in some of the smallest SoC.
> > >
> > > But I still don't understand why a dma fence is needed inside the GPU
> > > driver itself in the notifier.
> > >
> > > Surely the GPU driver can block and release the notifier directly from
> > > its own command processing channel?
> > >
> > > Why does this fence and all it entails need to leak out across
> > > drivers?
> >
> > So 10 years ago we had this world of every gpu driver is its own bucket,
> > nothing leaks out to the world. But the world had a different idea how
> > gpus where supposed to work, with stuff like:
>
> Sure, I understand DMA fence, but why does a *notifier* need it?
>
> The job of the notifier is to guarentee that the device it is
> connected to is not doing DMA before it returns.
>
> That just means you need to prove that device is done with the buffer.
>
> As I've understood GPU that means you need to show that the commands
> associated with the buffer have completed. This is all local stuff
> within the driver, right? Why use fence (other than it already exists)

Because that's the end-of-dma thing. And it's cross-driver for the
above reasons, e.g.
- device A renders some stuff. Userspace gets dma_fence A out of that
(well sync_file or one of the other uapi interfaces, but you get the
idea)
- userspace (across process or just different driver) issues more
rendering for device B, which depends upon the rendering done on
device A. So dma_fence A is an dependency and will block this dma
operation. Userspace (and the kernel) gets dma_fence B out of this
- because unfortunate reasons, the same rendering on device B also
needs a userptr buffer, which means that dma_fence B is also the one
that the mmu_range_notifier needs to wait on before it can tell core
mm that it can go ahead and release those pages
- unhappiness ensues, because now the mmu notifier from device B can
get hung up on the dma operation device A is doing

If you want to avoid this either a) have less shitty hw (not an
option, gpus are gpus, it is slowly getting better though) or b) force
userspace to stall before handing over to next device (about as
uncool) or c) just pin all the memory always, who cares (also rather
unpopular, gpus tend to use all the memory they can get).

I guess the thing with gpus is that dma operations aren't like
read/writes for pretty much everything else, but essentially compute
contexts (usually implemented as ringbuffers where you stream stuff
into) with cross everything dependencies. This even holds within a
single gpu, since pretty much all modern gpus have multiple different
engines special on different things. And yup that's directly exposed
to userspace, for vulkan and other low-level gpu apis even directly to
applications. So dma operation for gpu isn't just "done when the
read/write finishes", but pulls in an entire chain of dependencies and
ordering that needs to happen before it can even start.

-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx