Re: [resend PATCH] drm/ttm: Fix a deadlock if the target BO is not idle during swap

2021-09-06 Thread Christian König

Added a Fixes tag and pushed this to drm-misc-fixes.

It will take a while until it cycles back into the development branches, 
so feel free to push some version to amd-staging-drm-next as well. Just 
ping Alex when you do this.


Thanks,
Christian.

On 07.09.21 at 06:08, xinhui pan wrote:

The ret value might be -EBUSY, in which case the caller will think the lru
lock is still locked, but actually it is NOT. So return -ENOSPC instead.
Otherwise we hit list corruption.

ttm_bo_cleanup_refs might fail too if the BO is not idle. If we return 0, the
caller (ttm_tt_populate -> ttm_global_swapout -> ttm_device_swapout) will be
stuck as we did not actually free any BO memory. This usually happens
when the fence is not signaled for a long time.

Signed-off-by: xinhui pan 
Reviewed-by: Christian König 
---
  drivers/gpu/drm/ttm/ttm_bo.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 8d7fd65ccced..23f906941ac9 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1152,9 +1152,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct 
ttm_operation_ctx *ctx,
}
  
  	if (bo->deleted) {

-   ttm_bo_cleanup_refs(bo, false, false, locked);
+   ret = ttm_bo_cleanup_refs(bo, false, false, locked);
ttm_bo_put(bo);
-   return 0;
+   return ret == -EBUSY ? -ENOSPC : ret;
}
  
  	ttm_bo_del_from_lru(bo);

@@ -1208,7 +1208,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct 
ttm_operation_ctx *ctx,
if (locked)
dma_resv_unlock(bo->base.resv);
ttm_bo_put(bo);
-   return ret;
+   return ret == -EBUSY ? -ENOSPC : ret;
  }
  
  void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)




[resend PATCH] drm/ttm: Fix a deadlock if the target BO is not idle during swap

2021-09-06 Thread xinhui pan
The ret value might be -EBUSY, in which case the caller will think the lru
lock is still locked, but actually it is NOT. So return -ENOSPC instead.
Otherwise we hit list corruption.

ttm_bo_cleanup_refs might fail too if the BO is not idle. If we return 0, the
caller (ttm_tt_populate -> ttm_global_swapout -> ttm_device_swapout) will be
stuck as we did not actually free any BO memory. This usually happens
when the fence is not signaled for a long time.

Signed-off-by: xinhui pan 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 8d7fd65ccced..23f906941ac9 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1152,9 +1152,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct 
ttm_operation_ctx *ctx,
}
 
if (bo->deleted) {
-   ttm_bo_cleanup_refs(bo, false, false, locked);
+   ret = ttm_bo_cleanup_refs(bo, false, false, locked);
ttm_bo_put(bo);
-   return 0;
+   return ret == -EBUSY ? -ENOSPC : ret;
}
 
ttm_bo_del_from_lru(bo);
@@ -1208,7 +1208,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct 
ttm_operation_ctx *ctx,
if (locked)
dma_resv_unlock(bo->base.resv);
ttm_bo_put(bo);
-   return ret;
+   return ret == -EBUSY ? -ENOSPC : ret;
 }
 
 void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
-- 
2.25.1



RE: [PATCH v2 1/2] drm/ttm: Fix a deadlock if the target BO is not idle during swap

2021-09-06 Thread Pan, Xinhui
[AMD Official Use Only]

It is the internal staging drm-next.

-Original Message-
From: Koenig, Christian 
Sent: September 6, 2021 19:26
To: Pan, Xinhui ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; che...@uniontech.com; 
dri-de...@lists.freedesktop.org
Subject: Re: [PATCH v2 1/2] drm/ttm: Fix a deadlock if the target BO is not 
idle during swap

Which branch is this patch based on? Please rebase on top of drm-misc-fixes
and resend.

Thanks,
Christian.

On 06.09.21 at 03:12, xinhui pan wrote:
> The ret value might be -EBUSY, in which case the caller will think the lru
> lock is still locked, but actually it is NOT. So return -ENOSPC instead.
> Otherwise we hit list corruption.
>
> ttm_bo_cleanup_refs might fail too if the BO is not idle. If we return 0,
> the caller (ttm_tt_populate -> ttm_global_swapout -> ttm_device_swapout)
> will be stuck as we did not actually free any BO memory. This usually
> happens when the fence is not signaled for a long time.
>
> Signed-off-by: xinhui pan 
> Reviewed-by: Christian König 
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
> b/drivers/gpu/drm/ttm/ttm_bo.c index 1fedd0eb67ba..f1367107925b 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1159,9 +1159,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct 
> ttm_operation_ctx *ctx,
>   }
>
>   if (bo->deleted) {
> - ttm_bo_cleanup_refs(bo, false, false, locked);
> + ret = ttm_bo_cleanup_refs(bo, false, false, locked);
>   ttm_bo_put(bo);
> - return 0;
> + return ret == -EBUSY ? -ENOSPC : ret;
>   }
>
>   ttm_bo_move_to_pinned(bo);
> @@ -1215,7 +1215,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct 
> ttm_operation_ctx *ctx,
>   if (locked)
>   dma_resv_unlock(bo->base.resv);
>   ttm_bo_put(bo);
> - return ret;
> + return ret == -EBUSY ? -ENOSPC : ret;
>   }
>
>   void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)



RE: [RFC PATCH v3 1/6] drm/doc: Color Management and HDR10 RFC

2021-09-06 Thread Shankar, Uma



> -Original Message-
> From: sebast...@sebastianwick.net 
> Sent: Monday, August 16, 2021 7:07 PM
> To: Harry Wentland 
> Cc: Brian Starkey ; Sharma, Shashank
> ; amd-gfx@lists.freedesktop.org; dri-
> de...@lists.freedesktop.org; ppaala...@gmail.com; mca...@google.com;
> jsha...@google.com; deepak.sha...@amd.com; shiris...@amd.com;
> vitaly.pros...@amd.com; aric@amd.com; bhawanpreet.la...@amd.com;
> krunoslav.ko...@amd.com; hersenxs...@amd.com;
> nicholas.kazlaus...@amd.com; laurentiu.pa...@oss.nxp.com;
> ville.syrj...@linux.intel.com; n...@arm.com; Shankar, Uma
> 
> Subject: Re: [RFC PATCH v3 1/6] drm/doc: Color Management and HDR10 RFC
> 
> On 2021-08-16 14:40, Harry Wentland wrote:
> > On 2021-08-16 7:10 a.m., Brian Starkey wrote:
> >> On Fri, Aug 13, 2021 at 10:42:12AM +0530, Sharma, Shashank wrote:
> >>> Hello Brian,
> >>> (+Uma in cc)
> >>>

Thanks, Shashank, for cc'ing me. Apologies for being late here. Now it seems
all stakeholders are back, so we can resume the UAPI discussion on color.

> >>> Thanks for your comments, Let me try to fill-in for Harry to keep
> >>> the design discussion going. Please find my comments inline.
> >>>
> >
> > Thanks, Shashank. I'm back at work now. Had to cut my trip short due
> > to rising Covid cases and concern for my kids.
> >
> >>> On 8/2/2021 10:00 PM, Brian Starkey wrote:
> 
> >>
> >> -- snip --
> >>
> 
>  Android doesn't blend in linear space, so any API shouldn't be
>  built around an assumption of linear blending.
> 
> >
> > This seems incorrect but I guess ultimately the OS is in control of
> > this. If we want to allow blending in non-linear space with the new
> > API we would either need to describe the blending space or the
> > pre/post-blending gamma/de-gamma.
> >
> > Any idea if this blending behavior in Android might get changed in the
> > future?
> 
> There is lots of software which blends in sRGB space and designers adjusted 
> to the
> incorrect blending in a way that the result looks right.
> Blending in linear space would result in incorrectly looking images.
> 

I feel we should just leave it to userspace to decide rather than forcing
linear or non-linear blending in the driver.

> >>>
> >>> If I am not wrong, we still need linear buffers for accurate Gamut
> >>> transformation (SRGB -> BT2020 or other way around) isn't it ?
> >>
> >> Yeah, you need to transform the buffer to linear for color gamut
> >> conversions, but then back to non-linear (probably sRGB or gamma 2.2)
> >> for actual blending.
> >>
> >> This is why I'd like to have the per-plane "OETF/GAMMA" separate from
> >> tone-mapping, so that the composition transfer function is
> >> independent.
> >>
> >>>
> >>
> >> ...
> >>
> > +
> > +Tonemapping in this case could be a simple nits value or `EDR`_
> > +to
> > describe
> > +how to scale the :ref:`SDR luminance`.
> > +
> > +Tonemapping could also include the ability to use a 3D LUT which
> > might be
> > +accompanied by a 1D shaper LUT. The shaper LUT is required in
> > order to
> > +ensure a 3D LUT with limited entries (e.g. 9x9x9, or 17x17x17)
> > operates
> > +in perceptual (non-linear) space, so as to evenly spread the
> > limited
> > +entries evenly across the perceived space.
> 
>  Some terminology care may be needed here - up until this point, I
>  think you've been talking about "tonemapping" being luminance
>  adjustment, whereas I'd expect 3D LUTs to be used for gamut
>  adjustment.
> 
> >>>
> >>> IMO, what harry wants to say here is that, which HW block gets
> >>> picked and how tone mapping is achieved can be a very driver/HW
> >>> specific thing, where one driver can use a 1D/Fixed function block,
> >>> whereas another one can choose more complex HW like a 3D LUT for the
> >>> same.
> >>>
> >>> DRM layer needs to define only the property to hook the API with
> >>> core driver, and the driver can decide which HW to pick and
> >>> configure for the activity. So when we have a tonemapping property,
> >>> we might not have a separate 3D-LUT property, or the driver may fail
> >>> the atomic_check() if both of them are programmed for different
> >>> usages.
> >>
> >> I still think that directly exposing the HW blocks and their
> >> capabilities is the right approach, rather than a "magic" tonemapping
> >> property.
> >>
> >> Yes, userspace would need to have a good understanding of how to use
> >> that hardware, but if the pipeline model is standardised that's the
> >> kind of thing a cross-vendor library could handle.
> >>
> >
> > One problem with cross-vendor libraries is that they might struggle to
> > really be cross-vendor when it comes to unique HW behavior. Or they
> > might pick sub-optimal configurations as they're not aware of the
> > power impact of a configuration. What's an optimal configuration might
> > differ greatly between different HW.
> >
> > We're seeing this problem with "universal" planes as well.
>

Re: [Intel-gfx] [PATCH v7 5/8] drm_print: add choice to use dynamic debug in drm-debug

2021-09-06 Thread jim . cromie
> I'll try to extract the "executive summary" from this, you tell me if I
> got it right.
>
> So using or not using dynamic debug for DRM debug ends up being about
> shifting the cost between kernel binary size (data section grows by each
> pr_debug call site) and runtime conditionals?

Yes.

> Since the table sizes you mention seem significant enough, I think that
> justifies existence of DRM_USE_DYNAMIC_DEBUG. It would probably be a
> good idea to put some commentary on that there. Ideally including some
> rough estimates both including space cost per call site and space cost
> for a typical distro kernel build?

Yeah, agreed. I presume you mean in the Kconfig entry, since the commits
have some size info now - I have i915, amdgpu, nouveau. I can see some
prose improvements for 5/8.




> Regards,
>
> Tvrtko

thanks
Jim


Re: [Intel-gfx] [PATCH v7 3/8] i915/gvt: use DEFINE_DYNAMIC_DEBUG_CATEGORIES to create "gvt:core:" etc categories

2021-09-06 Thread jim . cromie
On Mon, Sep 6, 2021 at 6:26 AM Tvrtko Ursulin <
tvrtko.ursu...@linux.intel.com> wrote:
>
>
> On 03/09/2021 20:22, jim.cro...@gmail.com wrote:
> > On Fri, Sep 3, 2021 at 5:07 AM Tvrtko Ursulin
> >  wrote:
> >>
> >>
> >> On 31/08/2021 21:21, Jim Cromie wrote:
> >>> The gvt component of this driver has ~120 pr_debugs, in 9 categories
> >>> quite similar to those in DRM.  Following the interface model of
> >>> drm.debug, add a parameter to map bits to these categorizations.
> >>>
> >>> DEFINE_DYNAMIC_DEBUG_CATEGORIES(debug_gvt, __gvt_debug,
> >>>"dyndbg bitmap desc",
> >>>{ "gvt:cmd: ",  "command processing" },

> >>> v7:
> >>> . move ccflags addition up to i915/Makefile from i915/gvt
> >>> ---
> >>>drivers/gpu/drm/i915/Makefile  |  4 
> >>>    drivers/gpu/drm/i915/i915_params.c | 35 ++
> >>
> >> Can this work if put under gvt/ or at least intel_gvt.h|c?

I tried this.
I moved the code block into gvt/debug.c (new file)
and added it to GVT_SOURCES in the Makefile,
but I don't know why it won't build.
It's a frustrating basic error I'm not seeing.
It does seem like the proper placement, will resolve...


> >>
> >
> > I thought it belonged here more, at least according to the name of the
> > config.var
>
> Hmm bear with me please - the categories this patch creates are intended
> to be used explicitly from the GVT "sub-module", or they somehow even
> get automatically used with no further intervention to callers required?
>

From 2009 to v5.9.0 the only users were admins reading/echoing
/proc/dynamic_debug/control,
presumably because they wanted more info in the logs, episodically.
v5.9.0 exported dynamic_debug_exec_queries() for in-kernel use,
reusing the stringy `echo $query_command > control` idiom.
My intention was to let in-kernel users roll their own drm.debug-type
interface, or whatever else they needed.  Nobody is using it yet.
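
For context, a minimal sketch of what such an in-kernel user might look like,
assuming the dynamic_debug_exec_queries() export from v5.9 (illustrative only;
check the exact signature and query syntax in your tree):

#include <linux/dynamic_debug.h>

/* Illustrative only: flip all pr_debugs whose format starts with
 * "drm:core: " in module drm, using the same query strings as the
 * control-file idiom mentioned above. */
static void example_set_core_debug(bool enable)
{
	dynamic_debug_exec_queries(enable ? "format \"^drm:core: \" +p"
					  : "format \"^drm:core: \" -p",
				   "drm");
}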

Patch 1/8 implements that drm.debug interface.
5/8 is the primary use case.
3/8 (this patch) & 4/8 are patches of opportunity, test cases, proof of
function/utility; their value as such is easier control of those pr_debugs
than is given by echo > control.

Sean Paul  seanp...@chromium.org worked up a patchset to do runtime
steering of the drm-debug stream,
in particular watching for drm:atomic:fail: type activity (a subcategory
which doesn't exist yet).
5/8 conflicts with his patchset, I have an RFC approach to that, so his
concerns are mine too.


Note: if this patchset goes in, we don't *really* need the export anymore,
since the main use case is covered.  We could un-export, and re-add later
if it's needed for a different use case.  Further, it seems likely that the
callbacks (refactored) would be a better basis for new in-kernel users.
If not that, then full exposure of struct ddebug_query to in-kernel use.


Not quite sure how we got 2 chunks, but there's 1 more question below.

On Mon, Sep 6, 2021 at 6:26 AM Tvrtko Ursulin <
tvrtko.ursu...@linux.intel.com> wrote:

>
> On 03/09/2021 20:22, jim.cro...@gmail.com wrote:
> > On Fri, Sep 3, 2021 at 5:07 AM Tvrtko Ursulin
> >  wrote:
> >>
> >>
> >> On 31/08/2021 21:21, Jim Cromie wrote:
> >>> The gvt component of this driver has ~120 pr_debugs, in 9 categories
> >>> quite similar to those in DRM.  Following the interface model of
> >>> drm.debug, add a parameter to map bits to these categorizations.
> >>>
> >>> DEFINE_DYNAMIC_DEBUG_CATEGORIES(debug_gvt, __gvt_debug,
> >>>"dyndbg bitmap desc",
> >>>{ "gvt:cmd: ",  "command processing" },
> >>>{ "gvt:core: ", "core help" },
> >>>{ "gvt:dpy: ",  "display help" },
> >>>{ "gvt:el: ",   "help" },
> >>>{ "gvt:irq: ",  "help" },
> >>>{ "gvt:mm: ",   "help" },
> >>>{ "gvt:mmio: ", "help" },
> >>>{ "gvt:render: ", "help" },
> >>>{ "gvt:sched: " "help" });
> >>>
> >
> > BTW, I've dropped the help field; it's already handled, no need to
> > clutter.
> >
> >
> >>> The actual patch has a few details different, cmd_help() macro emits
> >>> the initialization construct.
> >>>
> >>> if CONFIG_DRM_USE_DYNAMIC_DEBUG, then -DDYNAMIC_DEBUG_MODULE is added
> >>> cflags, by gvt/Makefile.
> >>>
> >>> Signed-off-by: Jim Cromie 
> >>> ---
> >>> v5:
> >>> . static decl of vector of bit->class descriptors - Emil.V
> >>> . relocate gvt-makefile chunk from elsewhere
> >>> v7:
> >>> . move ccflags addition up to i915/Makefile from i915/gvt
> >>> ---
> >>>drivers/gpu/drm/i915/Makefile  |  4 
> >>>    drivers/gpu/drm/i915/i915_params.c | 35 ++
> >>
> >> Can this work if put under gvt/ or at least intel_gvt.h|c?
> >>
> >
> > I thought it belonged here more, at least according to the name of the
> > config.var
>
> Hmm bear with me please - the categories this patch creates are intended
> to be used explicitly from the GVT "sub-module", or they somehow even
> get automatically used with no further intervention to callers required?
>
> > CONFIG_DRM_USE_DYNAMIC_DEBUG.
> >
> > I suppose its not a great name, its narrow purpos

Re: [PATCH 2/2] drm/amdgpu: cleanup debugfs for amdgpu rings

2021-09-06 Thread Sharma, Shashank




On 9/5/2021 5:01 PM, Das, Nirmoy wrote:


On 9/5/2021 10:03 AM, Sharma, Shashank wrote:



On 9/3/2021 9:44 PM, Das, Nirmoy wrote:

Hi Shashank,

On 9/3/2021 5:51 PM, Das, Nirmoy wrote:


On 9/3/2021 5:26 PM, Sharma, Shashank wrote:



On 9/3/2021 1:39 PM, Das, Nirmoy wrote:


On 9/3/2021 8:36 AM, Sharma, Shashank wrote:



On 9/2/2021 5:14 PM, Nirmoy Das wrote:

Use debugfs_create_file_size API for creating ring debugfs
file, also cleanup surrounding code.

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  4 +---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c    | 16 +---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h    |  8 +---
  3 files changed, 7 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c

index 077f9baf74fe..dee56ab19a8f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1734,9 +1734,7 @@ int amdgpu_debugfs_init(struct 
amdgpu_device *adev)

  if (!ring)
  continue;
  -    if (amdgpu_debugfs_ring_init(adev, ring)) {
-    DRM_ERROR("Failed to register debugfs file for 
rings !\n");

-    }
+    amdgpu_debugfs_ring_init(adev, ring);
  }
    amdgpu_ras_debugfs_create_all(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c

index f40753e1a60d..968521d80514 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -415,26 +415,20 @@ static const struct file_operations 
amdgpu_debugfs_ring_fops = {

    #endif
  -int amdgpu_debugfs_ring_init(struct amdgpu_device *adev,
+void amdgpu_debugfs_ring_init(struct amdgpu_device *adev,
   struct amdgpu_ring *ring)
  {
  #if defined(CONFIG_DEBUG_FS)
  struct drm_minor *minor = adev_to_drm(adev)->primary;
-    struct dentry *ent, *root = minor->debugfs_root;
+    struct dentry *root = minor->debugfs_root;
  char name[32];
    sprintf(name, "amdgpu_ring_%s", ring->name);
  -    ent = debugfs_create_file(name,
-  S_IFREG | S_IRUGO, root,
-  ring, &amdgpu_debugfs_ring_fops);
-    if (IS_ERR(ent))
-    return -ENOMEM;


Why are we doing this ? Why to make it void from int ?



We tend to ignore debugfs return values as those are not serious
errors. This is to sync with the rest of our debugfs calls.


Regards,

Nirmoy




I am not sure if completely removing the provision of a return value
is a good way of doing it; we can always ignore it at the caller
side, can't we?




I just realized while making the change that debugfs_create_file_size()
returns void, so we don't have anything useful to return from
amdgpu_debugfs_ring_init().





Ah, it makes better sense now. Probably just a mention in the body of 
the message that we are moving from debugfs_create_file() to 
debugfs_create_file_size(), will make this change of return type more 
logical.



Yes, I have that "Use debugfs_create_file_size API for creating ring 
debugfs file,..."





My bad, I was too focused (and a bit confused due to the usage of "clean-up")
around the code change.


Suggestion for the message: Use the debugfs_create_file_size API for creating
the ring debugfs file, and as it is a void-returning API, change the return
type of the amdgpu_debugfs_ring_init API as well.


With (or even without) this change, please feel free to use:

Reviewed-by: Shashank Sharma 

- Shashank


Nirmoy



- Shashank


Regards,

Nirmoy





Yes, we are currently throwing an error msg and ignoring it. I don't 
have a strong opinion regarding this, I will send a v2 restoring 
previous behavior.



Thanks,

Nirmoy




- Shashank



- Shashank



-
-    i_size_write(ent->d_inode, ring->ring_size + 12);
-    ring->ent = ent;
+    debugfs_create_file_size(name, S_IFREG | S_IRUGO, root, ring,
+ &amdgpu_debugfs_ring_fops,
+ ring->ring_size + 12);
  #endif
-    return 0;
  }
    /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h

index 88d80eb3fea1..c29fbce0a5b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -253,10 +253,6 @@ struct amdgpu_ring {
  bool    has_compute_vm_bug;
  bool    no_scheduler;
  int    hw_prio;
-
-#if defined(CONFIG_DEBUG_FS)
-    struct dentry *ent;
-#endif
  };
    #define amdgpu_ring_parse_cs(r, p, ib) 
((r)->funcs->parse_cs((p), (ib)))
@@ -356,8 +352,6 @@ static inline void 
amdgpu_ring_write_multiple(struct amdgpu_ring *ring,

    int amdgpu_ring_test_helper(struct amdgpu_ring *ring);
  -int amdgpu_debugfs_ring_init(struct amdgpu_device *adev,
+void amdgpu_debugfs_ring_init(struct amdgpu_device *adev,
   struct amdgpu_ring *ring);
-void amdgpu_debugfs_ring_fini(struct amdgpu_ring *ring);
-
  #endif



Re: [diagnostic TDR mode patches] unify our solution opinions/suggestions in one thread

2021-09-06 Thread Jingwen Chen
Hi Christian/Andrey/Daniel,

I read Boris's patch about ordered workqueue and I think maybe we can
leverage this change.
https://lore.kernel.org/dri-devel/20210625133327.2598825-2-boris.brezil...@collabora.com/

The TDR race condition we are talking about is caused by a bailing
job being deleted from the pending list. If we use an ordered
workqueue for the timeout work in the driver, there will be no bailing job.
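
For reference, a minimal sketch of that ordered-workqueue idea (hypothetical
names; the real change would live in the scheduler/driver timeout path):

#include <linux/errno.h>
#include <linux/workqueue.h>

/* All timeout (TDR) handlers are funnelled through one ordered workqueue,
 * so only one handler runs at a time and none of them can race over the
 * pending list -- hence no bailing job. */
static struct workqueue_struct *example_tdr_wq;

static int example_tdr_init(void)
{
	example_tdr_wq = alloc_ordered_workqueue("example-tdr", 0);
	return example_tdr_wq ? 0 : -ENOMEM;
}

static void example_queue_timeout(struct work_struct *timeout_work)
{
	/* queue the timeout work here instead of on a concurrent queue */
	queue_work(example_tdr_wq, timeout_work);
}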

Do you have any suggestions?

Best Regards,
JingWen Chen

On Mon Sep 06, 2021 at 02:36:52PM +0800, Liu, Monk wrote:
> [AMD Official Use Only]
> 
> > I'm fearing that just repeating what Alex said, but to make it clear 
> > once more: That is *not* necessary!
> >
> > The shared repository is owned by upstream maintainers and they are 
> > usually free to do restructuring work without getting acknowledge from 
> > every single driver maintainer.
> 
> Hi Daniel
> 
> Anyway, thanks for officially confirming the working model & policy in the
> community to me. I won't put my opinion here, since it's not my call to
> change, no matter what.
> I only want this diagnostic TDR scheme to come to a good end for AMD, or
> even for all DRM vendors.
> 
> How about this: we still have a final patch not yet landed in the DRM scheduler,
> and I would like Jingwen to present it to you and AlexD/Christian/Andrey.  I
> believe you will have concerns or objections regarding this patch, but that's
> fine; let us figure out together how to make it acceptable to you and the
> other vendors working with the DRM scheduler.
> 
> P.S.:  I have to repeat myself again: we are not popping up a new idea suddenly,
> it is a disconnection issue. We didn't have changes (or plan to have changes)
> in the DRM scheduler before, but eventually we found we must make job_timeout
> and sched_main work serialized, otherwise it won't work based on the current
> scheduler's code structure.
> 
> Thanks 
> 
> --
> Monk Liu | Cloud-GPU Core team
> --
> 
> -Original Message-
> From: Daniel Vetter  
> Sent: Friday, September 3, 2021 12:11 AM
> To: Koenig, Christian 
> Cc: Liu, Monk ; Dave Airlie ; Alex 
> Deucher ; Grodzovsky, Andrey 
> ; Chen, JingWen ; DRI 
> Development ; amd-gfx@lists.freedesktop.org
> Subject: Re: [diagnostic TDR mode patches] unify our solution 
> opinions/suggestions in one thread
> 
> On Thu, Sep 2, 2021 at 1:00 PM Christian König  
> wrote:
> >
> > Hi Monk,
> >
> > On 02.09.21 at 07:52, Liu, Monk wrote:
> > > [AMD Official Use Only]
> > >
> > > I'm not sure I can add much to help this along, I'm sure Alex has 
> > > some internal training, Once your driver is upstream, it belongs to 
> > > upstream, you can maintain it, but you no longer control it 100%, it's a 
> > > tradeoff, it's not one companies always understand.
> > > Usually people are fine developing away internally, but once interaction 
> > > with other parts of the kernel/subsystem is required they have the 
> > > realisation that they needed to work upstream 6 months earlier.
> > > The best time to interact with upstream was 6 months ago, the second best 
> > > time is now.
> > > <<<
> > >
> > > Daniel/AlexD
> > >
> > > I didn't mean that your changes on the AMD driver need my personal approval
> > > or review ... and I'm already quite used to the fact that our driver is not
> > > 100% under AMD's control, but supposedly anyone from the community
> > > (including you) who intends to change AMD's driver needs at least to get
> > > approval from someone at AMD, e.g. AlexD or Christian; isn't that
> > > reasonable?
> >
> > I'm fearing that just repeating what Alex said, but to make it clear 
> > once more: That is *not* necessary!
> >
> > The shared repository is owned by upstream maintainers and they are 
> > usually free to do restructuring work without getting acknowledge from 
> > every single driver maintainer.
> >
> > Anybody can of course technically object to upstream design decisions, 
> > but that means that you need to pay attention to the mailing lists in 
> > the first place.
> >
> > > just like we need your approve if we try to modify DRM-sched, or need 
> > > panfrost's approval if we need to change panfrost code ...
> > >
> > > Only CC'ing AMD's engineers does not look quite proper; how do you know if
> > > your changes (on the AMD code part) are conflicting with AMD's ongoing
> > > internal features/refactoring or not?
> >
> > Well because AMD is supposed to work in public as much as possible and 
> > ask upstream before doing changes to the code base.
> >
> > Additional to that design decisions are supposed to be discussed on 
> > the mailing list and *not* internally.
> 
> Yeah I'm honestly really surprised about the course of this discussion here. 
> With Alex, Christian and others amd has a lot of folks with years/decades of 
> experience in how to collaborate in upstream, when to pull in others 
> proactively and when that's not needed, and in general how to 

[PATCH] drm/amd/amdgpu: Enable some sysnodes for guest smi

2021-09-06 Thread Roy Sun
Enable sysnode vclk and dclk on Navi21 asic for guest smi

Signed-off-by: Roy Sun 
---
 drivers/gpu/drm/amd/pm/amdgpu_pm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index 249cb0aeb5ae..c255b4b8e685 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -2087,10 +2087,10 @@ static int default_attr_update(struct amdgpu_device 
*adev, struct amdgpu_device_
if (asic_type < CHIP_VEGA12)
*states = ATTR_STATE_UNSUPPORTED;
} else if (DEVICE_ATTR_IS(pp_dpm_vclk)) {
-   if (!(asic_type == CHIP_VANGOGH))
+   if (!(asic_type == CHIP_VANGOGH || asic_type == 
CHIP_SIENNA_CICHLID))
*states = ATTR_STATE_UNSUPPORTED;
} else if (DEVICE_ATTR_IS(pp_dpm_dclk)) {
-   if (!(asic_type == CHIP_VANGOGH))
+   if (!(asic_type == CHIP_VANGOGH || asic_type == 
CHIP_SIENNA_CICHLID))
*states = ATTR_STATE_UNSUPPORTED;
}
 
-- 
2.32.0



Re: [Intel-gfx] [PATCH v7 3/8] i915/gvt: use DEFINE_DYNAMIC_DEBUG_CATEGORIES to create "gvt:core:" etc categories

2021-09-06 Thread Tvrtko Ursulin



On 03/09/2021 20:22, jim.cro...@gmail.com wrote:

On Fri, Sep 3, 2021 at 5:07 AM Tvrtko Ursulin
 wrote:



On 31/08/2021 21:21, Jim Cromie wrote:

The gvt component of this driver has ~120 pr_debugs, in 9 categories
quite similar to those in DRM.  Following the interface model of
drm.debug, add a parameter to map bits to these categorizations.

DEFINE_DYNAMIC_DEBUG_CATEGORIES(debug_gvt, __gvt_debug,
   "dyndbg bitmap desc",
   { "gvt:cmd: ",  "command processing" },
   { "gvt:core: ", "core help" },
   { "gvt:dpy: ",  "display help" },
   { "gvt:el: ",   "help" },
   { "gvt:irq: ",  "help" },
   { "gvt:mm: ",   "help" },
   { "gvt:mmio: ", "help" },
   { "gvt:render: ", "help" },
   { "gvt:sched: " "help" });



BTW, I've dropped the help field; it's already handled, no need to clutter.



The actual patch has a few details different, cmd_help() macro emits
the initialization construct.

if CONFIG_DRM_USE_DYNAMIC_DEBUG, then -DDYNAMIC_DEBUG_MODULE is added
cflags, by gvt/Makefile.

Signed-off-by: Jim Cromie 
---
v5:
. static decl of vector of bit->class descriptors - Emil.V
. relocate gvt-makefile chunk from elsewhere
v7:
. move ccflags addition up to i915/Makefile from i915/gvt
---
   drivers/gpu/drm/i915/Makefile  |  4 
   drivers/gpu/drm/i915/i915_params.c | 35 ++


Can this work if put under gvt/ or at least intel_gvt.h|c?



I thought it belonged here more, at least according to the name of the
config.var


Hmm bear with me please - the categories this patch creates are intended 
to be used explicitly from the GVT "sub-module", or they somehow even 
get automatically used with no further intervention to callers required?



CONFIG_DRM_USE_DYNAMIC_DEBUG.

I suppose it's not a great name; its narrow purpose is to swap the
drm-debug API to use dyndbg.   drm-everything already "uses"
dyndbg if CONFIG_DYNAMIC_DEBUG=y, those gvt pr_debugs in particular.

There's also CONFIG_DYNAMIC_DEBUG_CORE=y,
which drm basically ignores currently.

So with the name CONFIG_DRM_USE_DYNAMIC_DEBUG
it seemed proper to arrange for that to be true on DD-CORE=y builds,
by adding -DDYNAMIC_DEBUG_MODULE.

Does that make some sense?
How best to resolve the frictions?
New CONFIG names?


   2 files changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 4f22cac1c49b..5a4e371a3ec2 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -30,6 +30,10 @@ CFLAGS_display/intel_fbdev.o = $(call cc-disable-warning, 
override-init)

   subdir-ccflags-y += -I$(srctree)/$(src)

+#ifdef CONFIG_DRM_USE_DYNAMIC_DEBUG
+ccflags-y += -DDYNAMIC_DEBUG_MODULE
+#endif


Ignores whether CONFIG_DRM_I915_GVT is enabled or not?



Not intentionally.
I think there are 2 things you're noting:

1 - the make fragment into gvt/Makefile
I had it there earlier, not sure why I moved it up;
maybe some confusion on the proper scope of the flag.


2 - move the new declaration code in i915_params.c inside the gvt ifdef

I'm good with that.
I'll probably copy the ifdef wrapper down rather than move the decl up.
ie:

#if __and(IS_ENABLED(CONFIG_DRM_I915_GVT),
   IS_ENABLED(CONFIG_DRM_USE_DYNAMIC_DEBUG))

unsigned long __gvt_debug;
EXPORT_SYMBOL(__gvt_debug);



+
   # Please keep these build lists sorted!

   # core driver code
diff --git a/drivers/gpu/drm/i915/i915_params.c 
b/drivers/gpu/drm/i915/i915_params.c
index e07f4cfea63a..e645e149485e 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -265,3 +265,38 @@ void i915_params_free(struct i915_params *params)
   I915_PARAMS_FOR_EACH(FREE);
   #undef FREE
   }
+
+#ifdef CONFIG_DRM_USE_DYNAMIC_DEBUG
+/* todo: needs DYNAMIC_DEBUG_MODULE in some cases */
+
+unsigned long __gvt_debug;
+EXPORT_SYMBOL(__gvt_debug);
+
+#define _help(key)   "\t\"" key "\"\t: help for " key "\n"
+
+#define I915_GVT_CATEGORIES(name) \
+ " Enable debug output via /sys/module/i915/parameters/" #name   \
+ ", where each bit enables a debug category.\n"  \
+ _help("gvt:cmd:")   \
+ _help("gvt:core:")  \
+ _help("gvt:dpy:")   \
+ _help("gvt:el:")\
+ _help("gvt:irq:")   \
+ _help("gvt:mm:")\
+ _help("gvt:mmio:")  \
+ _help("gvt:render:")\
+ _help("gvt:sched:")
+
+DEFINE_DYNAMIC_DEBUG_CATEGORIES(debug_gvt, __gvt_debug,
+ I915_GVT_CATEGORIES(debug_gvt),
+ _DD_cat_("gvt:cmd:"),
+ _DD_cat_("gvt:core:"),
+ _DD_cat_("gvt:dpy:"),
+ _

Re: [PATCH v2 1/2] drm/ttm: Fix a deadlock if the target BO is not idle during swap

2021-09-06 Thread Christian König
Which branch is this patch based on? Please rebase on top of drm-misc-fixes
and resend.


Thanks,
Christian.

On 06.09.21 at 03:12, xinhui pan wrote:

The ret value might be -EBUSY, in which case the caller will think the lru
lock is still locked, but actually it is NOT. So return -ENOSPC instead.
Otherwise we hit list corruption.

ttm_bo_cleanup_refs might fail too if the BO is not idle. If we return 0, the
caller (ttm_tt_populate -> ttm_global_swapout -> ttm_device_swapout) will be
stuck as we did not actually free any BO memory. This usually happens
when the fence is not signaled for a long time.

Signed-off-by: xinhui pan 
Reviewed-by: Christian König 
---
  drivers/gpu/drm/ttm/ttm_bo.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 1fedd0eb67ba..f1367107925b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1159,9 +1159,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct 
ttm_operation_ctx *ctx,
}
  
  	if (bo->deleted) {

-   ttm_bo_cleanup_refs(bo, false, false, locked);
+   ret = ttm_bo_cleanup_refs(bo, false, false, locked);
ttm_bo_put(bo);
-   return 0;
+   return ret == -EBUSY ? -ENOSPC : ret;
}
  
  	ttm_bo_move_to_pinned(bo);

@@ -1215,7 +1215,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct 
ttm_operation_ctx *ctx,
if (locked)
dma_resv_unlock(bo->base.resv);
ttm_bo_put(bo);
-   return ret;
+   return ret == -EBUSY ? -ENOSPC : ret;
  }
  
  void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)




Re: [PATCH v2 0/2] Fix a hung during memory pressure test

2021-09-06 Thread Christian König

On 06.09.21 at 12:16, Pan, Xinhui wrote:

On September 6, 2021 at 17:04, Christian König wrote:



On 06.09.21 at 03:12, xinhui pan wrote:

A long time ago, someone reports system got hung during memory test.
In recent days, I am trying to look for or understand the potential
deadlock in ttm/amdgpu code.

This patchset aims to fix the deadlock during ttm populate.

TTM has a parameter called pages_limit, when allocated GTT memory
reaches this limit, swapout would be triggered. As ttm_bo_swapout does
not return the correct retval, populate might get hung.

UVD ib test uses GTT which might be insufficient. So a gpu recovery
would hung if populate hung.

Ah, now I understand what you are trying to do.

Problem is that won't work either. Allocating VRAM can easily land you inside 
the same deadlock.

We need to avoid the allocation altogether for this to work correctly.

Looks like we need to reserve some pages at sw init.


Yeah, something like that should do it.

But keep in mind that you then need a lock or similar when using the 
resource to prevent concurrent use.
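
For illustration, a minimal sketch of that "reserve at sw init, guard with a
lock" idea (hypothetical names; not a proposal of the actual amdgpu change):

#include <linux/mutex.h>
#include <linux/types.h>

/* Reserved once at sw init so the recovery/IB-test path never has to
 * allocate -- and therefore never has to swap anything out -- under
 * memory pressure. */
struct example_reserved_buf {
	struct mutex lock;	/* serializes users of the reserved buffer */
	void *cpu_addr;
	u64 gpu_addr;
};

static int example_sw_init(struct example_reserved_buf *buf)
{
	mutex_init(&buf->lock);
	/* ...allocate and pin the buffer once here, e.g. with the usual
	 * BO-create helper, and keep it for the driver's lifetime... */
	return 0;
}

static void *example_acquire_reserved(struct example_reserved_buf *buf)
{
	mutex_lock(&buf->lock);	/* prevent concurrent use, as noted above */
	return buf->cpu_addr;
}

static void example_release_reserved(struct example_reserved_buf *buf)
{
	mutex_unlock(&buf->lock);
}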





I have made one drm test which alloc two GTT BOs, submit gfx copy
commands and free these BOs without waiting fence. What's more, these
gfx copy commands will cause gfx ring hang. So gpu recovery would be
triggered.

Mhm, that should never be possible. It is perfectly valid for an application to
terminate without waiting for the GFX submission to be completed.

The gfx ring hangs because the command is illegal.
The packet is COMMAND [30:21] | BYTE_COUNT [20:0];
I use 0xFF << 20 to hang the ring on purpose.


Ok that makes more sense.

Thanks,
Christian.




Going to push patch #1 to drm-misc-fixes or drm-misc-next-fixes in a moment.

Thanks,
Christian.


Now here is one possible deadlock case.
gpu_recovery
  -> stop drm scheduler
  -> asic reset
-> ib test
   -> tt populate (uvd ib test)
->  ttm_bo_swapout (BO A) // this always fails as the fence of
BO A would not be signaled by schedluer or HW. Hit deadlock.

I paste the drm test patch below.
#modprobe ttm pages_limit=65536
#amdgpu_test -s 1 -t 4
---
  tests/amdgpu/basic_tests.c | 32 ++--
  1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/tests/amdgpu/basic_tests.c b/tests/amdgpu/basic_tests.c
index dbf02fee..f85ed340 100644
--- a/tests/amdgpu/basic_tests.c
+++ b/tests/amdgpu/basic_tests.c
@@ -65,13 +65,16 @@ static void amdgpu_direct_gma_test(void);
  static void amdgpu_command_submission_write_linear_helper(unsigned ip_type);
  static void amdgpu_command_submission_const_fill_helper(unsigned ip_type);
  static void amdgpu_command_submission_copy_linear_helper(unsigned ip_type);
-static void amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
+static void _amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
   unsigned ip_type,
   int instance, int pm4_dw, uint32_t 
*pm4_src,
   int res_cnt, amdgpu_bo_handle *resources,
   struct amdgpu_cs_ib_info *ib_info,
-  struct amdgpu_cs_request *ibs_request);
+  struct amdgpu_cs_request *ibs_request, 
int sync, int repeat);
   +#define amdgpu_test_exec_cs_helper(...) \
+   _amdgpu_test_exec_cs_helper(__VA_ARGS__, 1, 1)
+
  CU_TestInfo basic_tests[] = {
{ "Query Info Test",  amdgpu_query_info_test },
{ "Userptr Test",  amdgpu_userptr_test },
@@ -1341,12 +1344,12 @@ static void amdgpu_command_submission_compute(void)
   * pm4_src, resources, ib_info, and ibs_request
   * submit command stream described in ibs_request and wait for this IB 
accomplished
   */
-static void amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
+static void _amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
   unsigned ip_type,
   int instance, int pm4_dw, uint32_t 
*pm4_src,
   int res_cnt, amdgpu_bo_handle *resources,
   struct amdgpu_cs_ib_info *ib_info,
-  struct amdgpu_cs_request *ibs_request)
+  struct amdgpu_cs_request *ibs_request, 
int sync, int repeat)
  {
int r;
uint32_t expired;
@@ -1395,12 +1398,15 @@ static void 
amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
CU_ASSERT_NOT_EQUAL(ibs_request, NULL);
/* submit CS */
-   r = amdgpu_cs_submit(context_handle, 0, ibs_request, 1);
+   while (repeat--)
+   r = amdgpu_cs_submit(context_handle, 0, ibs_request, 1);
CU_ASSERT_EQUAL(r, 0);
r = amdgpu_bo_list_destroy(ibs_request->resources);
CU_ASSERT_EQUAL(r, 0);
  + if (!sync)
+   return;
fence_status.ip_type 

Re: [Intel-gfx] [PATCH v7 5/8] drm_print: add choice to use dynamic debug in drm-debug

2021-09-06 Thread Tvrtko Ursulin



On 03/09/2021 22:57, jim.cro...@gmail.com wrote:

On Fri, Sep 3, 2021 at 5:15 AM Tvrtko Ursulin
 wrote:



On 31/08/2021 21:21, Jim Cromie wrote:

drm's debug system writes 10 distinct categories of messages to syslog
using a small API[1]: drm_dbg*(10 names), DRM_DEV_DEBUG*(3 names),
DRM_DEBUG*(8 names).  There are thousands of these callsites, each
categorized in this systematized way.

These callsites can be enabled at runtime by their category, each
controlled by a bit in drm.debug (/sys/modules/drm/parameter/debug).
In the current "basic" implementation, drm_debug_enabled() tests these
bits in __drm_debug each time an API[1] call is executed; while cheap
individually, the costs accumulate with uptime.

This patch uses dynamic-debug with jump-label to patch enabled calls
onto their respective NOOP slots, avoiding all runtime bit-checks of
__drm_debug by drm_debug_enabled().

Dynamic debug has no concept of category, but we can emulate one by
replacing enum categories with a set of prefix-strings; "drm:core:",
"drm:kms:" "drm:driver:" etc, and prepend them (at compile time) to
the given formats.

Then we can use:
`echo module drm format "^drm:core: " +p > control`

to enable the whole category with one query.
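
As a minimal illustration of that scheme (assumed macro names, not the actual
drm_print.h changes), the category becomes a compile-time prefix of the format
string, which dyndbg can then match on:

#include <linux/printk.h>

#define EXAMPLE_DBG_PREFIX_CORE "drm:core: "

/* The category prefix is glued onto the format at compile time, so every
 * "core" callsite shares a common, matchable start of its format string. */
#define example_dbg_core(fmt, ...) \
	pr_debug(EXAMPLE_DBG_PREFIX_CORE fmt, ##__VA_ARGS__)

/* All such callsites can then be flipped on together with one query:
 *   echo 'module drm format "^drm:core: " +p' > /proc/dynamic_debug/control */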


Probably stupid question - enabling stuff at boot time still works as
described in Documentation/admin-guide/dynamic-debug-howto.rst?



Yes.  It's turned on in early init, and cmdline args are processed then,
and when modules are added.



Second question, which perhaps has been covered in the past so apologies
if redundant - what is the advantage of allowing this to be
configurable, versus perhaps always enabling it? Like what would be the
reasons someone wouldn't just want to have CONFIG_DYNAMIC_DEBUG compiled
in? Kernel binary size?



I'm unaware of anything on this topic, but I can opine :-)
It's been configurable since I saw it and thought "jump-labels are cool!"

code is small
[jimc@frodo local-i915m]$ size lib/dynamic_debug.o
textdata bss dec hex filename
   240168041  64   321217d79 lib/dynamic_debug.o

Its data tables are big, particularly the __dyndbg section
builtins:
dyndbg: 108 debug prints in module mptcp
dyndbg:   2 debug prints in module i386
dyndbg:   2 debug prints in module xen
dyndbg:   2 debug prints in module fixup
dyndbg:   7 debug prints in module irq
dyndbg: 3039 prdebugs in 283 modules, 11 KiB in ddebug tables, 166 kiB
in __dyndbg section

bash-5.1#
bash-5.1# for m in i915 amdgpu ; do modprobe $m dyndbg=+_ ; done
dyndbg: 384 debug prints in module drm
dyndbg: 211 debug prints in module drm_kms_helper
dyndbg:   2 debug prints in module ttm
dyndbg:   8 debug prints in module video
dyndbg: 1727 debug prints in module i915
dyndbg: processed 1 queries, with 3852 matches, 0 errs
dyndbg: 3852 debug prints in module amdgpu
[drm] amdgpu kernel modesetting enabled.
amdgpu: CRAT table disabled by module option
amdgpu: Virtual CRAT table created for CPU
amdgpu: Topology: Add CPU node
bash-5.1#

At 56 bytes / callsite, it adds up.
And teaching DRM to use it enlarges its use dramatically,
not just in drm itself, but in many drivers.

amdgpu has 3852 callsites (vs 3039 in my kernel), so it has ~240kb.
It has extra (large chunks generated by macros) to trim,
but i915 has ~1700, and drm has ~380.

I have WIP to reduce the table space, by splitting it into 2 separate ones:
guts and decorations (module, function, file pointers).
The decoration recs are redundant; 45% are copies of the previous one
(the function changes fastest).
It needs much rework, but it should save ~20% overall;
decorations are 24/56 of the footprint.


I'll try to extract the "executive summary" from this, you tell me if I 
got it right.


So using or not using dynamic debug for DRM debug ends up being about 
shifting the cost between kernel binary size (data section grows by each 
pr_debug call site) and runtime conditionals?


Since the table sizes you mention seem significant enough, I think that 
justifies existence of DRM_USE_DYNAMIC_DEBUG. It would probably be a 
good idea to put some commentary on that there. Ideally including some 
rough estimates both including space cost per call site and space cost 
for a typical distro kernel build?


Regards,

Tvrtko


Re: [PATCH v2 0/2] Fix a hung during memory pressure test

2021-09-06 Thread Pan, Xinhui


> On September 6, 2021 at 17:04, Christian König wrote:
> 
> 
> 
> On 06.09.21 at 03:12, xinhui pan wrote:
>> A long time ago, someone reports system got hung during memory test.
>> In recent days, I am trying to look for or understand the potential
>> deadlock in ttm/amdgpu code.
>> 
>> This patchset aims to fix the deadlock during ttm populate.
>> 
>> TTM has a parameter called pages_limit, when allocated GTT memory
>> reaches this limit, swapout would be triggered. As ttm_bo_swapout does
>> not return the correct retval, populate might get hung.
>> 
>> UVD ib test uses GTT which might be insufficient. So a gpu recovery
>> would hung if populate hung.
> 
> Ah, now I understand what you are trying to do.
> 
> Problem is that won't work either. Allocating VRAM can easily land you inside 
> the same deadlock.
> 
> We need to avoid the allocation altogether for this to work correctly.

Looks like we need to reserve some pages at sw init.

> 
>> 
>> I have made one drm test which alloc two GTT BOs, submit gfx copy
>> commands and free these BOs without waiting fence. What's more, these
>> gfx copy commands will cause gfx ring hang. So gpu recovery would be
>> triggered.
> 
> Mhm, that should never be possible. It is perfectly valid for an application
> to terminate without waiting for the GFX submission to be completed.

The gfx ring hangs because the command is illegal.
The packet is COMMAND [30:21] | BYTE_COUNT [20:0];
I use 0xFF << 20 to hang the ring on purpose.

> 
> Going to push patch #1 to drm-misc-fixes or drm-misc-next-fixes in a moment.
> 
> Thanks,
> Christian.
> 
>> 
>> Now here is one possible deadlock case.
>> gpu_recovery
>>  -> stop drm scheduler
>>  -> asic reset
>>-> ib test
>>   -> tt populate (uvd ib test)
>>  ->  ttm_bo_swapout (BO A) // this always fails as the fence of
>>  BO A would not be signaled by schedluer or HW. Hit deadlock.
>> 
>> I paste the drm test patch below.
>> #modprobe ttm pages_limit=65536
>> #amdgpu_test -s 1 -t 4
>> ---
>>  tests/amdgpu/basic_tests.c | 32 ++--
>>  1 file changed, 14 insertions(+), 18 deletions(-)
>> 
>> diff --git a/tests/amdgpu/basic_tests.c b/tests/amdgpu/basic_tests.c
>> index dbf02fee..f85ed340 100644
>> --- a/tests/amdgpu/basic_tests.c
>> +++ b/tests/amdgpu/basic_tests.c
>> @@ -65,13 +65,16 @@ static void amdgpu_direct_gma_test(void);
>>  static void amdgpu_command_submission_write_linear_helper(unsigned ip_type);
>>  static void amdgpu_command_submission_const_fill_helper(unsigned ip_type);
>>  static void amdgpu_command_submission_copy_linear_helper(unsigned ip_type);
>> -static void amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
>> +static void _amdgpu_test_exec_cs_helper(amdgpu_context_handle 
>> context_handle,
>> unsigned ip_type,
>> int instance, int pm4_dw, uint32_t 
>> *pm4_src,
>> int res_cnt, amdgpu_bo_handle *resources,
>> struct amdgpu_cs_ib_info *ib_info,
>> -   struct amdgpu_cs_request *ibs_request);
>> +   struct amdgpu_cs_request *ibs_request, 
>> int sync, int repeat);
>>   +#define amdgpu_test_exec_cs_helper(...) \
>> +_amdgpu_test_exec_cs_helper(__VA_ARGS__, 1, 1)
>> +
>>  CU_TestInfo basic_tests[] = {
>>  { "Query Info Test",  amdgpu_query_info_test },
>>  { "Userptr Test",  amdgpu_userptr_test },
>> @@ -1341,12 +1344,12 @@ static void amdgpu_command_submission_compute(void)
>>   * pm4_src, resources, ib_info, and ibs_request
>>   * submit command stream described in ibs_request and wait for this IB 
>> accomplished
>>   */
>> -static void amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
>> +static void _amdgpu_test_exec_cs_helper(amdgpu_context_handle 
>> context_handle,
>> unsigned ip_type,
>> int instance, int pm4_dw, uint32_t 
>> *pm4_src,
>> int res_cnt, amdgpu_bo_handle *resources,
>> struct amdgpu_cs_ib_info *ib_info,
>> -   struct amdgpu_cs_request *ibs_request)
>> +   struct amdgpu_cs_request *ibs_request, 
>> int sync, int repeat)
>>  {
>>  int r;
>>  uint32_t expired;
>> @@ -1395,12 +1398,15 @@ static void 
>> amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
>>  CU_ASSERT_NOT_EQUAL(ibs_request, NULL);
>>  /* submit CS */
>> -r = amdgpu_cs_submit(context_handle, 0, ibs_request, 1);
>> +while (repeat--)
>> +r = amdgpu_cs_submit(context_handle, 0, ibs_request, 1);
>>  CU_ASSERT_EQUAL(r, 0);
>>  r = amdgpu_bo_list_destroy(ibs_request->resources);
>>  CU_ASSERT_EQUAL(r, 0);
>>  +   if (!sync)
>> +return;
>>  fence_status.ip_type = ip_type;
>>  fence_statu

RE: [PATCH] drm/amdgpu: Create common PSP TA load function

2021-09-06 Thread Clements, John
[AMD Official Use Only]

Reviewed-by: John Clements 

-Original Message-
From: Li, Candice  
Sent: Monday, September 6, 2021 4:34 PM
To: amd-gfx@lists.freedesktop.org
Cc: Clements, John ; Li, Candice 
Subject: [PATCH] drm/amdgpu: Create common PSP TA load function

Create common PSP TA load function and update PSP ta_mem_context with size 
information.

Signed-off-by: Candice Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 280 +++-  
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h |  17 +-
 2 files changed, 93 insertions(+), 204 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 54c26432c65b3d..75eed18370eb12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -901,22 +901,20 @@ int psp_reg_program(struct psp_context *psp, enum 
psp_reg_prog_id reg,  static void psp_prep_ta_load_cmd_buf(struct 
psp_gfx_cmd_resp *cmd,
 uint64_t ta_bin_mc,
 uint32_t ta_bin_size,
-uint64_t ta_shared_mc,
-uint32_t ta_shared_size)
+struct ta_mem_context *mem_ctx)
 {
cmd->cmd_id = GFX_CMD_ID_LOAD_TA;
cmd->cmd.cmd_load_ta.app_phy_addr_lo= lower_32_bits(ta_bin_mc);
cmd->cmd.cmd_load_ta.app_phy_addr_hi= upper_32_bits(ta_bin_mc);
cmd->cmd.cmd_load_ta.app_len= ta_bin_size;
 
-   cmd->cmd.cmd_load_ta.cmd_buf_phy_addr_lo = lower_32_bits(ta_shared_mc);
-   cmd->cmd.cmd_load_ta.cmd_buf_phy_addr_hi = upper_32_bits(ta_shared_mc);
-   cmd->cmd.cmd_load_ta.cmd_buf_len = ta_shared_size;
+   cmd->cmd.cmd_load_ta.cmd_buf_phy_addr_lo = 
lower_32_bits(mem_ctx->shared_mc_addr);
+   cmd->cmd.cmd_load_ta.cmd_buf_phy_addr_hi = 
upper_32_bits(mem_ctx->shared_mc_addr);
+   cmd->cmd.cmd_load_ta.cmd_buf_len = mem_ctx->shared_mem_size;
 }
 
 static int psp_ta_init_shared_buf(struct psp_context *psp,
- struct ta_mem_context *mem_ctx,
- uint32_t shared_mem_size)
+ struct ta_mem_context *mem_ctx)
 {
int ret;
 
@@ -924,8 +922,8 @@ static int psp_ta_init_shared_buf(struct psp_context *psp,
* Allocate 16k memory aligned to 4k from Frame Buffer (local
* physical) for ta to host memory
*/
-   ret = amdgpu_bo_create_kernel(psp->adev, shared_mem_size, PAGE_SIZE,
- AMDGPU_GEM_DOMAIN_VRAM,
+   ret = amdgpu_bo_create_kernel(psp->adev, mem_ctx->shared_mem_size,
+ PAGE_SIZE, AMDGPU_GEM_DOMAIN_VRAM,
  &mem_ctx->shared_bo,
  &mem_ctx->shared_mc_addr,
  &mem_ctx->shared_buf);
@@ -941,8 +939,7 @@ static void psp_ta_free_shared_buf(struct ta_mem_context 
*mem_ctx)
 
 static int psp_xgmi_init_shared_buf(struct psp_context *psp)  {
-   return psp_ta_init_shared_buf(psp, 
&psp->xgmi_context.context.mem_context,
- PSP_XGMI_SHARED_MEM_SIZE);
+   return psp_ta_init_shared_buf(psp, 
+&psp->xgmi_context.context.mem_context);
 }
 
 static void psp_prep_ta_invoke_cmd_buf(struct psp_gfx_cmd_resp *cmd, @@ 
-971,31 +968,27 @@ static int psp_ta_invoke(struct psp_context *psp,
return ret;
 }
 
-static int psp_xgmi_load(struct psp_context *psp)
+static int psp_ta_load(struct psp_context *psp,
+  struct psp_bin_desc *bin_desc,
+  struct ta_context *context)
 {
int ret;
struct psp_gfx_cmd_resp *cmd;
 
-   /*
-* TODO: bypass the loading in sriov for now
-*/
-
cmd = acquire_psp_cmd_buf(psp);
 
-   psp_copy_fw(psp, psp->xgmi.start_addr, psp->xgmi.size_bytes);
+   psp_copy_fw(psp, bin_desc->start_addr, bin_desc->size_bytes);
 
psp_prep_ta_load_cmd_buf(cmd,
 psp->fw_pri_mc_addr,
-psp->xgmi.size_bytes,
-
psp->xgmi_context.context.mem_context.shared_mc_addr,
-PSP_XGMI_SHARED_MEM_SIZE);
+bin_desc->size_bytes,
+&context->mem_context);
 
ret = psp_cmd_submit_buf(psp, NULL, cmd,
 psp->fence_buf_mc_addr);
 
if (!ret) {
-   psp->xgmi_context.context.initialized = true;
-   psp->xgmi_context.context.session_id = cmd->resp.session_id;
+   context->session_id = cmd->resp.session_id;
}
 
release_psp_cmd_buf(psp);
@@ -1003,6 +996,11 @@ static int psp_xgmi_load(struct psp_context *psp)
return ret;
 }
 
+static int psp_xgm

Re: [PATCH v2 0/2] Fix a hung during memory pressure test

2021-09-06 Thread Christian König




On 06.09.21 at 03:12, xinhui pan wrote:

A long time ago, someone reports system got hung during memory test.
In recent days, I am trying to look for or understand the potential
deadlock in ttm/amdgpu code.

This patchset aims to fix the deadlock during ttm populate.

TTM has a parameter called pages_limit, when allocated GTT memory
reaches this limit, swapout would be triggered. As ttm_bo_swapout does
not return the correct retval, populate might get hung.

UVD ib test uses GTT which might be insufficient. So a gpu recovery
would hung if populate hung.


Ah, now I understand what you are trying to do.

Problem is that won't work either. Allocating VRAM can easily land you 
inside the same deadlock.


We need to avoid the allocation altogether for this to work correctly.



I have made one drm test which alloc two GTT BOs, submit gfx copy
commands and free these BOs without waiting fence. What's more, these
gfx copy commands will cause gfx ring hang. So gpu recovery would be
triggered.


Mhm, that should never be possible. It is perfectly valid for an
application to terminate without waiting for the GFX submission to be
completed.


Going to push patch #1 to drm-misc-fixes or drm-misc-next-fixes in a moment.

Thanks,
Christian.



Now here is one possible deadlock case.
gpu_recovery
  -> stop drm scheduler
  -> asic reset
-> ib test
   -> tt populate (uvd ib test)
->  ttm_bo_swapout (BO A) // this always fails as the fence of
BO A would not be signaled by schedluer or HW. Hit deadlock.

I paste the drm test patch below.
#modprobe ttm pages_limit=65536
#amdgpu_test -s 1 -t 4
---
  tests/amdgpu/basic_tests.c | 32 ++--
  1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/tests/amdgpu/basic_tests.c b/tests/amdgpu/basic_tests.c
index dbf02fee..f85ed340 100644
--- a/tests/amdgpu/basic_tests.c
+++ b/tests/amdgpu/basic_tests.c
@@ -65,13 +65,16 @@ static void amdgpu_direct_gma_test(void);
  static void amdgpu_command_submission_write_linear_helper(unsigned ip_type);
  static void amdgpu_command_submission_const_fill_helper(unsigned ip_type);
  static void amdgpu_command_submission_copy_linear_helper(unsigned ip_type);
-static void amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
+static void _amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
   unsigned ip_type,
   int instance, int pm4_dw, uint32_t 
*pm4_src,
   int res_cnt, amdgpu_bo_handle *resources,
   struct amdgpu_cs_ib_info *ib_info,
-  struct amdgpu_cs_request *ibs_request);
+  struct amdgpu_cs_request *ibs_request, 
int sync, int repeat);
   
+#define amdgpu_test_exec_cs_helper(...) \

+   _amdgpu_test_exec_cs_helper(__VA_ARGS__, 1, 1)
+
  CU_TestInfo basic_tests[] = {
{ "Query Info Test",  amdgpu_query_info_test },
{ "Userptr Test",  amdgpu_userptr_test },
@@ -1341,12 +1344,12 @@ static void amdgpu_command_submission_compute(void)
   * pm4_src, resources, ib_info, and ibs_request
   * submit command stream described in ibs_request and wait for this IB 
accomplished
   */
-static void amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
+static void _amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
   unsigned ip_type,
   int instance, int pm4_dw, uint32_t 
*pm4_src,
   int res_cnt, amdgpu_bo_handle *resources,
   struct amdgpu_cs_ib_info *ib_info,
-  struct amdgpu_cs_request *ibs_request)
+  struct amdgpu_cs_request *ibs_request, 
int sync, int repeat)
  {
int r;
uint32_t expired;
@@ -1395,12 +1398,15 @@ static void 
amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
CU_ASSERT_NOT_EQUAL(ibs_request, NULL);
  
  	/* submit CS */

-   r = amdgpu_cs_submit(context_handle, 0, ibs_request, 1);
+   while (repeat--)
+   r = amdgpu_cs_submit(context_handle, 0, ibs_request, 1);
CU_ASSERT_EQUAL(r, 0);
  
  	r = amdgpu_bo_list_destroy(ibs_request->resources);

CU_ASSERT_EQUAL(r, 0);
  
+	if (!sync)

+   return;
fence_status.ip_type = ip_type;
fence_status.ip_instance = 0;
fence_status.ring = ibs_request->ring;
@@ -1667,7 +1673,7 @@ static void 
amdgpu_command_submission_sdma_const_fill(void)
  
  static void amdgpu_command_submission_copy_linear_helper(unsigned ip_type)

  {
-   const int sdma_write_length = 1024;
+   const int sdma_write_length = (255) << 20;
const int pm4_dw = 256;
amdgpu_context_handle context_handle;
amdgpu_bo_handle bo1, bo2;
@@ -

Re: [PATCH v2 2/2] drm/amdpgu: Use VRAM domain in UVD IB test

2021-09-06 Thread Christian König

On 06.09.21 at 03:12, xinhui pan wrote:

Like vce/vcn do, visible VRAM is OK for the ib test.
However, commit a11d9ff3ebe0 ("drm/amdgpu: use GTT for
uvd_get_create/destory_msg") says VRAM is not mapped correctly on his
platform, which is likely an arm64.

So let's change back to using VRAM on the x86_64 platform.


That's still a rather clear NAK. This issue is not related to ARM at all 
and you are trying to fix a problem which is independent of the platform.


Christian.



Signed-off-by: xinhui pan 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index d451c359606a..e4b75f33ccc8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -1178,7 +1178,11 @@ int amdgpu_uvd_get_create_msg(struct amdgpu_ring *ring, 
uint32_t handle,
int r, i;
  
  	r = amdgpu_bo_create_reserved(adev, 1024, PAGE_SIZE,

+#ifdef CONFIG_X86_64
+ AMDGPU_GEM_DOMAIN_VRAM,
+#else
  AMDGPU_GEM_DOMAIN_GTT,
+#endif
  &bo, NULL, (void **)&msg);
if (r)
return r;
@@ -1210,7 +1214,11 @@ int amdgpu_uvd_get_destroy_msg(struct amdgpu_ring *ring, 
uint32_t handle,
int r, i;
  
  	r = amdgpu_bo_create_reserved(adev, 1024, PAGE_SIZE,

+#ifdef CONFIG_X86_64
+ AMDGPU_GEM_DOMAIN_VRAM,
+#else
  AMDGPU_GEM_DOMAIN_GTT,
+#endif
  &bo, NULL, (void **)&msg);
if (r)
return r;




[PATCH] drm/amdgpu: Create common PSP TA load function

2021-09-06 Thread Candice Li
Create common PSP TA load function and update PSP ta_mem_context
with size information.

Signed-off-by: Candice Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 280 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h |  17 +-
 2 files changed, 93 insertions(+), 204 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 54c26432c65b3d..75eed18370eb12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -901,22 +901,20 @@ int psp_reg_program(struct psp_context *psp, enum psp_reg_prog_id reg,
 static void psp_prep_ta_load_cmd_buf(struct psp_gfx_cmd_resp *cmd,
 uint64_t ta_bin_mc,
 uint32_t ta_bin_size,
-uint64_t ta_shared_mc,
-uint32_t ta_shared_size)
+struct ta_mem_context *mem_ctx)
 {
cmd->cmd_id = GFX_CMD_ID_LOAD_TA;
cmd->cmd.cmd_load_ta.app_phy_addr_lo= lower_32_bits(ta_bin_mc);
cmd->cmd.cmd_load_ta.app_phy_addr_hi= upper_32_bits(ta_bin_mc);
cmd->cmd.cmd_load_ta.app_len= ta_bin_size;
 
-   cmd->cmd.cmd_load_ta.cmd_buf_phy_addr_lo = lower_32_bits(ta_shared_mc);
-   cmd->cmd.cmd_load_ta.cmd_buf_phy_addr_hi = upper_32_bits(ta_shared_mc);
-   cmd->cmd.cmd_load_ta.cmd_buf_len = ta_shared_size;
+   cmd->cmd.cmd_load_ta.cmd_buf_phy_addr_lo = lower_32_bits(mem_ctx->shared_mc_addr);
+   cmd->cmd.cmd_load_ta.cmd_buf_phy_addr_hi = upper_32_bits(mem_ctx->shared_mc_addr);
+   cmd->cmd.cmd_load_ta.cmd_buf_len = mem_ctx->shared_mem_size;
 }
 
 static int psp_ta_init_shared_buf(struct psp_context *psp,
- struct ta_mem_context *mem_ctx,
- uint32_t shared_mem_size)
+ struct ta_mem_context *mem_ctx)
 {
int ret;
 
@@ -924,8 +922,8 @@ static int psp_ta_init_shared_buf(struct psp_context *psp,
* Allocate 16k memory aligned to 4k from Frame Buffer (local
* physical) for ta to host memory
*/
-   ret = amdgpu_bo_create_kernel(psp->adev, shared_mem_size, PAGE_SIZE,
- AMDGPU_GEM_DOMAIN_VRAM,
+   ret = amdgpu_bo_create_kernel(psp->adev, mem_ctx->shared_mem_size,
+ PAGE_SIZE, AMDGPU_GEM_DOMAIN_VRAM,
  &mem_ctx->shared_bo,
  &mem_ctx->shared_mc_addr,
  &mem_ctx->shared_buf);
@@ -941,8 +939,7 @@ static void psp_ta_free_shared_buf(struct ta_mem_context *mem_ctx)
 
 static int psp_xgmi_init_shared_buf(struct psp_context *psp)
 {
-   return psp_ta_init_shared_buf(psp, &psp->xgmi_context.context.mem_context,
- PSP_XGMI_SHARED_MEM_SIZE);
+   return psp_ta_init_shared_buf(psp, &psp->xgmi_context.context.mem_context);
 }
 
 static void psp_prep_ta_invoke_cmd_buf(struct psp_gfx_cmd_resp *cmd,
@@ -971,31 +968,27 @@ static int psp_ta_invoke(struct psp_context *psp,
return ret;
 }
 
-static int psp_xgmi_load(struct psp_context *psp)
+static int psp_ta_load(struct psp_context *psp,
+  struct psp_bin_desc *bin_desc,
+  struct ta_context *context)
 {
int ret;
struct psp_gfx_cmd_resp *cmd;
 
-   /*
-* TODO: bypass the loading in sriov for now
-*/
-
cmd = acquire_psp_cmd_buf(psp);
 
-   psp_copy_fw(psp, psp->xgmi.start_addr, psp->xgmi.size_bytes);
+   psp_copy_fw(psp, bin_desc->start_addr, bin_desc->size_bytes);
 
psp_prep_ta_load_cmd_buf(cmd,
 psp->fw_pri_mc_addr,
-psp->xgmi.size_bytes,
- psp->xgmi_context.context.mem_context.shared_mc_addr,
-PSP_XGMI_SHARED_MEM_SIZE);
+bin_desc->size_bytes,
+&context->mem_context);
 
ret = psp_cmd_submit_buf(psp, NULL, cmd,
 psp->fence_buf_mc_addr);
 
if (!ret) {
-   psp->xgmi_context.context.initialized = true;
-   psp->xgmi_context.context.session_id = cmd->resp.session_id;
+   context->session_id = cmd->resp.session_id;
}
 
release_psp_cmd_buf(psp);
@@ -1003,6 +996,11 @@ static int psp_xgmi_load(struct psp_context *psp)
return ret;
 }
 
+static int psp_xgmi_load(struct psp_context *psp)
+{
+   return psp_ta_load(psp, &psp->xgmi, &psp->xgmi_context.context);
+}
+
 static int psp_xgmi_unload(struct psp_context *psp)
 {
return psp_ta_unload(psp, psp->xgmi_context.context.session_id);
@@ -1051,6 +1049,8 @@ int psp_xgmi_in
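
The shape of the refactor is easiest to see from the psp_xgmi_load() wrapper
above: each TA keeps a thin wrapper that hands its own firmware descriptor
and ta_context to the shared psp_ta_load(). A sketch of what a wrapper for
another TA would presumably look like under this scheme (the ras field names
are assumptions for illustration and may not match the rest of the series):

/* Hypothetical per-TA wrapper following the psp_xgmi_load() pattern above.
 * psp->ras and psp->ras_context are assumed names, not taken from this
 * patch. */
static int psp_ras_load(struct psp_context *psp)
{
	return psp_ta_load(psp, &psp->ras, &psp->ras_context.context);
}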

[PATCH] drm/radeon: Prefer kcalloc over open coded arithmetic

2021-09-06 Thread Len Baker
As noted in the "Deprecated Interfaces, Language Features, Attributes,
and Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead
to values wrapping around and a smaller allocation being made than the
caller was expecting. Using those allocations could lead to linear
overflows of heap memory and other misbehaviors.

So, refactor the code a bit to use the purpose-specific kcalloc()
function instead of the calculated size argument in the kzalloc()
function.

[1] https://www.kernel.org/doc/html/v5.14/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
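
For readers less familiar with this class of bug: when the element count can
be influenced by firmware or userspace, an open-coded count * size expression
can wrap around, the allocator then happily returns a buffer that is too
small, and the loop that fills the table writes past the end of it. kcalloc()
performs the multiplication with an overflow check and fails the allocation
instead. A small userspace sketch of the general pattern (names and numbers
are made up for illustration; calloc_checked() stands in for what kcalloc()
does internally):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct entry { uint64_t clk; uint64_t mv; };	/* 16-byte stand-in record */

/* Conceptually what kcalloc() does: refuse the request when n * size cannot
 * be represented, instead of silently allocating a short buffer. */
static void *calloc_checked(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return calloc(n, size);
}

int main(void)
{
	uint32_t num_entries = 0x20000001u;	/* hostile/corrupted count */

	/* Open-coded arithmetic done in 32-bit unsigned math wraps to 16 here,
	 * so a kzalloc(size)-style call would return a 16-byte buffer that a
	 * loop over num_entries entries would immediately overrun. */
	uint32_t bad_size = num_entries * (uint32_t)sizeof(struct entry);
	printf("open-coded size: %u bytes for %u entries\n", bad_size, num_entries);

	/* The checked variant fails loudly when the product overflows. */
	void *p = calloc_checked(SIZE_MAX / 4, sizeof(struct entry));
	printf("checked allocation: %s\n", p ? "ok" : "refused");
	free(p);
	return 0;
}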

Signed-off-by: Len Baker 
---
 drivers/gpu/drm/radeon/r600_dpm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600_dpm.c b/drivers/gpu/drm/radeon/r600_dpm.c
index 35b77c944701..fd4226b99862 100644
--- a/drivers/gpu/drm/radeon/r600_dpm.c
+++ b/drivers/gpu/drm/radeon/r600_dpm.c
@@ -820,12 +820,12 @@ union fan_info {
 static int r600_parse_clk_voltage_dep_table(struct radeon_clock_voltage_dependency_table *radeon_table,
 					     ATOM_PPLIB_Clock_Voltage_Dependency_Table *atom_table)
 {
-   u32 size = atom_table->ucNumEntries *
-   sizeof(struct radeon_clock_voltage_dependency_entry);
int i;
ATOM_PPLIB_Clock_Voltage_Dependency_Record *entry;

-   radeon_table->entries = kzalloc(size, GFP_KERNEL);
+   radeon_table->entries = kcalloc(atom_table->ucNumEntries,
+   sizeof(struct radeon_clock_voltage_dependency_entry),
+   GFP_KERNEL);
if (!radeon_table->entries)
return -ENOMEM;

--
2.25.1



[QUESTION] [amd-gfx] iMac 5K resolution

2021-09-06 Thread Максим Мосейчук
Hello. I'm trying to install Linux (Gentoo) on an iMac 27'' with a
5700 XT Pro, and I want to get the 5k resolution.
5k uses two DP 1.2 links, but Linux detects only one connection, so the
maximum available resolution is 4k.
Kernels 5.6 and 5.14.
Latest linux-firmware from git.

The display sends an EDID with a maximum resolution of 4k, which is the
standard behavior for a single-link connection.

I got 3 EDIDs from macOS:
1) 2.5k from the eDP-1 connector, with a tile section
2) 2.5k from the DP-1 connector, with a tile section
3) 5K (I think it's a fake EDID; it does not work)

I force-enabled the ports (video=eDP-1:e video=DP-1:e
drm.edid_firmware=eDP-1:edid/edid1.bin,DP-1:edid/edid2.bin).

But I only get the 2.5k resolutions from those EDIDs; Xorg detects the
second connector and uses it, but the monitor only shows a scaled image
from eDP-1.

I know that the intel driver supports the LG Ultrafine display at native
5k since kernel 5.9. It's the same display as in the iMac, but connected
over a single TB3 link (also two DP 1.2 streams).
I also checked the latest Ubuntu 21.04 and Fedora 34; 4k max there too.

I don't know how to check where the problem is: in the DRM core or in
amdgpu? Or does Apple use a non-typical controller that needs the second
connection to be activated over i2c or another bus?

edid1.bin base64
AP///wAGEDKuFsR5ABYdAQS1PCJ4IghBrlJBsyYOUFQBAQEBAQEBAQEBAQEBAQEBxbwAoKBAUrAwIDoAK1ARAAAaj3oAoIAAQpAwIDoAK1ARAAAa/ABpTWFjCiAgICAgICAg/wAwMjA2MDkzMTZDNDM5ASdwE3kDABIAFoIQAAD/CT8LAABBUFAzrhbEea0DADxM0AAE/w6fAC+AHwBvCD0AAgAEAMyRAAR/DJ8AL4AfAAcHMwACAAQAVV4ABP8JnwAvgB8AnwUoAAIABAB/gRj6EAABAQASdjH8ePv/AhCIYtP6+Pj+//8AAABykA==

edid2.bin base64
AP///wAGEDKuFsR5ABYdAQS1PCJ4IghBrlJBsyYOUFQBAQEBAQEBAQEBAQEBAQEBxbwAoKBAUrAwIDoAK1ARAAAaj3oAoIAAQpAwIDoAK1ARAAAa/ABpTWFjCiAgICAgICAg/wAwMjA2MDkzMTZDNDM5ASdwE3kDABIAFoAQEAD/CT8LAABBUFAzrhbEea0AAACVkA==