date:20240503

Re: [PATCH v1 10/12] sfc: falcon: Make I2C terminology more inclusive

2024-05-03 Thread Jakub Kicinski

On Tue, 30 Apr 2024 17:38:09 + Easwar Hariharan wrote:
> I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave"
> with more appropriate terms. Inspired by and following on to Wolfram's
> series to fix drivers/i2c/[1], fix the terminology for users of
> I2C_ALGOBIT bitbanging interface, now that the approved verbiage exists
> in the specification.
> 
> Compile tested, no functionality changes intended

FWIW we're assuming someone (Wolfram?) will take all of these,
instead of area maintainers picking them individually.
Please let us know if that's incorrect.

[PATCH] drm/amdkfd: Ensure gpu_id is unique

2024-05-03 Thread Harish Kasiviswanathan

gpu_id needs to be unique for user space to identify GPUs via KFD
interface. In the current implementation there is a very small
probability of having non unique gpu_ids.

v2: Add check to confirm if gpu_id is unique. If not unique, find one
Changed commit header to reflect the above

Signed-off-by: Harish Kasiviswanathan 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 26 ++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index b93913934b03..01d4c2e10c6d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1095,6 +1095,8 @@ static uint32_t kfd_generate_gpu_id(struct kfd_node *gpu)
uint32_t hashout;
uint32_t buf[8];
uint64_t local_mem_size;
+   struct kfd_topology_device *dev;
+   bool is_unique;
int i;
 
if (!gpu)
@@ -1115,6 +1117,28 @@ static uint32_t kfd_generate_gpu_id(struct kfd_node *gpu)
for (i = 0, hashout = 0; i < 8; i++)
hashout ^= hash_32(buf[i], KFD_GPU_ID_HASH_WIDTH);
 
+   /* hash generated could be non-unique. Check if it is unique.
+* If not unique increment till unique one is found. In case
+* of overflow, restart from 1
+   */
+   down_read(&topology_lock);
+   do {
+   is_unique = true;
+   list_for_each_entry(dev, &topology_device_list, list) {
+   if (dev->gpu && dev->gpu_id == hashout) {
+   is_unique = false;
+   break;
+   }
+   }
+   if (unlikely(!is_unique)) {
+   hashout = (hashout + 1) &
+ ((1 << KFD_GPU_ID_HASH_WIDTH) - 1);
+   if (!hashout)
+   hashout = 1;
+   }
+   } while (!is_unique);
+   up_read(&topology_lock);
+
return hashout;
 }
 /* kfd_assign_gpu - Attach @gpu to the correct kfd topology device. If
@@ -1946,7 +1970,6 @@ int kfd_topology_add_device(struct kfd_node *gpu)
struct amdgpu_gfx_config *gfx_info = &gpu->adev->gfx.config;
struct amdgpu_cu_info *cu_info = &gpu->adev->gfx.cu_info;
 
-   gpu_id = kfd_generate_gpu_id(gpu);
if (gpu->xcp && !gpu->xcp->ddev) {
dev_warn(gpu->adev->dev,
 "Won't add GPU to topology since it has no drm node 
assigned.");
@@ -1969,6 +1992,7 @@ int kfd_topology_add_device(struct kfd_node *gpu)
if (res)
return res;
 
+   gpu_id = kfd_generate_gpu_id(gpu);
dev->gpu_id = gpu_id;
gpu->id = gpu_id;
 
-- 
2.34.1

Re: [PATCH v2 03/12] drm/i915: Make I2C terminology more inclusive

2024-05-03 Thread Rodrigo Vivi

On Fri, May 03, 2024 at 02:04:15PM -0700, Easwar Hariharan wrote:
> On 5/3/2024 12:34 PM, Rodrigo Vivi wrote:
> > On Fri, May 03, 2024 at 06:13:24PM +, Easwar Hariharan wrote:
> >> I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced 
> >> "master/slave"
> >> with more appropriate terms. Inspired by and following on to Wolfram's
> >> series to fix drivers/i2c/[1], fix the terminology for users of
> >> I2C_ALGOBIT bitbanging interface, now that the approved verbiage exists
> >> in the specification.
> >>
> >> Compile tested, no functionality changes intended
> >>
> >> [1]: 
> >> https://lore.kernel.org/all/20240322132619.6389-1-wsa+rene...@sang-engineering.com/
> >>
> >> Reviewed-by: Rodrigo Vivi 
> >> Acked-by: Rodrigo Vivi 
> > 
> > It looks like the ack is not needed since we are merging this through
> > drm-intel-next. But I'm planing to merge this only after seeing the
> > main drivers/i2c accepting the new terminology. So we don't have a
> > risk of that getting push back and new names there and we having
> > to rename it once again.
> 
> Just to be explicit, did you want me to remove the Acked-by in v3, or will 
> you when you pull
> the patch into drm-intel-next?
> 
> > 
> > (more below)
> > 
> >> Acked-by: Zhi Wang 
> >> Signed-off-by: Easwar Hariharan 
> > 
> > Cc: Jani Nikula 
> > 
> > Jani, what bits were you concerned that were not necessarily i2c?
> > I believe although not necessarily/directly i2c, I believe they
> > are related and could benefit from the massive single shot renable.
> > or do you have any better split to suggest here?
> > 
> > (more below)
> > 
> >> ---
> >>  drivers/gpu/drm/i915/display/dvo_ch7017.c | 14 -
> >>  drivers/gpu/drm/i915/display/dvo_ch7xxx.c | 18 +--
> >>  drivers/gpu/drm/i915/display/dvo_ivch.c   | 16 +-
> >>  drivers/gpu/drm/i915/display/dvo_ns2501.c | 18 +--
> >>  drivers/gpu/drm/i915/display/dvo_sil164.c | 18 +--
> >>  drivers/gpu/drm/i915/display/dvo_tfp410.c | 18 +--
> >>  drivers/gpu/drm/i915/display/intel_bios.c | 22 +++---
> >>  drivers/gpu/drm/i915/display/intel_ddi.c  |  2 +-
> >>  .../gpu/drm/i915/display/intel_display_core.h |  2 +-
> >>  drivers/gpu/drm/i915/display/intel_dsi.h  |  2 +-
> >>  drivers/gpu/drm/i915/display/intel_dsi_vbt.c  | 20 ++---
> >>  drivers/gpu/drm/i915/display/intel_dvo.c  | 14 -
> >>  drivers/gpu/drm/i915/display/intel_dvo_dev.h  |  2 +-
> >>  drivers/gpu/drm/i915/display/intel_gmbus.c|  4 +--
> >>  drivers/gpu/drm/i915/display/intel_sdvo.c | 30 +--
> >>  drivers/gpu/drm/i915/display/intel_vbt_defs.h |  4 +--
> >>  drivers/gpu/drm/i915/gvt/edid.c   | 28 -
> >>  drivers/gpu/drm/i915/gvt/edid.h   |  4 +--
> >>  drivers/gpu/drm/i915/gvt/opregion.c   |  2 +-
> >>  19 files changed, 119 insertions(+), 119 deletions(-)
> >>
> 
> 
> 
> >> diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c 
> >> b/drivers/gpu/drm/i915/display/intel_ddi.c
> >> index c17462b4c2ac..64db211148a8 100644
> >> --- a/drivers/gpu/drm/i915/display/intel_ddi.c
> >> +++ b/drivers/gpu/drm/i915/display/intel_ddi.c
> >> @@ -4332,7 +4332,7 @@ static int intel_ddi_compute_config_late(struct 
> >> intel_encoder *encoder,
> >>
> >> connector->tile_group->id);
> >>  
> >>/*
> >> -   * EDP Transcoders cannot be ensalved
> >> +   * EDP Transcoders cannot be slaves
> > 
> >  ^ here
> > perhaps you meant 'targeted' ?
> > 
> >> * make them a master always when present
> 
> 
> 
> This is not actually I2C related as far as I could tell when I was making the 
> change, so this was more of a typo fix.
> 
> If we want to improve this, a quick check with the eDP v1.5a spec suggests 
> using primary/secondary instead,
> though in a global fashion rather than specifically for eDP transcoders. 
> There is also source/sink terminology
> in the spec related to DP encoders.
> 
> Which would be a more acceptable change here?

hmmm probably better to split the patches and align with the spec naming where 
it applies.
and with i2c name where it applies.

> 
> Thanks,
> Easwar

Re: [PATCH v2 03/12] drm/i915: Make I2C terminology more inclusive

2024-05-03 Thread Rodrigo Vivi

On Fri, May 03, 2024 at 06:13:24PM +, Easwar Hariharan wrote:
> I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave"
> with more appropriate terms. Inspired by and following on to Wolfram's
> series to fix drivers/i2c/[1], fix the terminology for users of
> I2C_ALGOBIT bitbanging interface, now that the approved verbiage exists
> in the specification.
> 
> Compile tested, no functionality changes intended
> 
> [1]: 
> https://lore.kernel.org/all/20240322132619.6389-1-wsa+rene...@sang-engineering.com/
> 
> Reviewed-by: Rodrigo Vivi 
> Acked-by: Rodrigo Vivi 

It looks like the ack is not needed since we are merging this through
drm-intel-next. But I'm planing to merge this only after seeing the
main drivers/i2c accepting the new terminology. So we don't have a
risk of that getting push back and new names there and we having
to rename it once again.

(more below)

> Acked-by: Zhi Wang 
> Signed-off-by: Easwar Hariharan 

Cc: Jani Nikula 

Jani, what bits were you concerned that were not necessarily i2c?
I believe although not necessarily/directly i2c, I believe they
are related and could benefit from the massive single shot renable.
or do you have any better split to suggest here?

(more below)

> ---
>  drivers/gpu/drm/i915/display/dvo_ch7017.c | 14 -
>  drivers/gpu/drm/i915/display/dvo_ch7xxx.c | 18 +--
>  drivers/gpu/drm/i915/display/dvo_ivch.c   | 16 +-
>  drivers/gpu/drm/i915/display/dvo_ns2501.c | 18 +--
>  drivers/gpu/drm/i915/display/dvo_sil164.c | 18 +--
>  drivers/gpu/drm/i915/display/dvo_tfp410.c | 18 +--
>  drivers/gpu/drm/i915/display/intel_bios.c | 22 +++---
>  drivers/gpu/drm/i915/display/intel_ddi.c  |  2 +-
>  .../gpu/drm/i915/display/intel_display_core.h |  2 +-
>  drivers/gpu/drm/i915/display/intel_dsi.h  |  2 +-
>  drivers/gpu/drm/i915/display/intel_dsi_vbt.c  | 20 ++---
>  drivers/gpu/drm/i915/display/intel_dvo.c  | 14 -
>  drivers/gpu/drm/i915/display/intel_dvo_dev.h  |  2 +-
>  drivers/gpu/drm/i915/display/intel_gmbus.c|  4 +--
>  drivers/gpu/drm/i915/display/intel_sdvo.c | 30 +--
>  drivers/gpu/drm/i915/display/intel_vbt_defs.h |  4 +--
>  drivers/gpu/drm/i915/gvt/edid.c   | 28 -
>  drivers/gpu/drm/i915/gvt/edid.h   |  4 +--
>  drivers/gpu/drm/i915/gvt/opregion.c   |  2 +-
>  19 files changed, 119 insertions(+), 119 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/dvo_ch7017.c 
> b/drivers/gpu/drm/i915/display/dvo_ch7017.c
> index d0c3880d7f80..493e730c685b 100644
> --- a/drivers/gpu/drm/i915/display/dvo_ch7017.c
> +++ b/drivers/gpu/drm/i915/display/dvo_ch7017.c
> @@ -170,13 +170,13 @@ static bool ch7017_read(struct intel_dvo_device *dvo, 
> u8 addr, u8 *val)
>  {
>   struct i2c_msg msgs[] = {
>   {
> - .addr = dvo->slave_addr,
> + .addr = dvo->target_addr,
>   .flags = 0,
>   .len = 1,
>   .buf = &addr,
>   },
>   {
> - .addr = dvo->slave_addr,
> + .addr = dvo->target_addr,
>   .flags = I2C_M_RD,
>   .len = 1,
>   .buf = val,
> @@ -189,7 +189,7 @@ static bool ch7017_write(struct intel_dvo_device *dvo, u8 
> addr, u8 val)
>  {
>   u8 buf[2] = { addr, val };
>   struct i2c_msg msg = {
> - .addr = dvo->slave_addr,
> + .addr = dvo->target_addr,
>   .flags = 0,
>   .len = 2,
>   .buf = buf,
> @@ -197,7 +197,7 @@ static bool ch7017_write(struct intel_dvo_device *dvo, u8 
> addr, u8 val)
>   return i2c_transfer(dvo->i2c_bus, &msg, 1) == 1;
>  }
>  
> -/** Probes for a CH7017 on the given bus and slave address. */
> +/** Probes for a CH7017 on the given bus and target address. */
>  static bool ch7017_init(struct intel_dvo_device *dvo,
>   struct i2c_adapter *adapter)
>  {
> @@ -227,13 +227,13 @@ static bool ch7017_init(struct intel_dvo_device *dvo,
>   break;
>   default:
>   DRM_DEBUG_KMS("ch701x not detected, got %d: from %s "
> -   "slave %d.\n",
> -   val, adapter->name, dvo->slave_addr);
> +   "target %d.\n",
> +   val, adapter->name, dvo->target_addr);
>   goto fail;
>   }
>  
>   DRM_DEBUG_KMS("%s detected on %s, addr %d\n",
> -   str, adapter->name, dvo->slave_addr);
> +   str, adapter->name, dvo->target_addr);
>   return true;
>  
>  fail:
> diff --git a/drivers/gpu/drm/i915/display/dvo_ch7xxx.c 
> b/drivers/gpu/drm/i915/display/dvo_ch7xxx.c
> index 2e8e85da5a40..534b8544e0a4 100644
> --- a/drivers/gpu/drm/i915/display/dvo_ch7xxx.c
> +++ b/drivers/gpu/drm/i915/

Re: [PATCH v0 04/14] media: au0828: Make I2C terminology more inclusive

2024-05-03 Thread Mauro Carvalho Chehab

Em Fri, 29 Mar 2024 17:00:28 +
Easwar Hariharan  escreveu:

> I2C v7, SMBus 3.2, and I3C specifications have replaced "master/slave"
> with more appropriate terms. Inspired by and following on to Wolfram's
> series to fix drivers/i2c/[1], fix the terminology for users of
> I2C_ALGOBIT bitbanging interface, now that the approved verbiage exists
> in the specification.
> 
> Compile tested, no functionality changes intended

Current media drivers are perfectly fine with the current terminology. 
None of them implement the above new standards.

Please drop patches for current stuff under drivers/media.

Regards,
Mauro

> 
> [1]: 
> https://lore.kernel.org/all/20240322132619.6389-1-wsa+rene...@sang-engineering.com/
> 
> Signed-off-by: Easwar Hariharan 
> ---
>  drivers/media/usb/au0828/au0828-i2c.c   | 4 ++--
>  drivers/media/usb/au0828/au0828-input.c | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/media/usb/au0828/au0828-i2c.c 
> b/drivers/media/usb/au0828/au0828-i2c.c
> index 749f90d73b5b..3e66d42bf134 100644
> --- a/drivers/media/usb/au0828/au0828-i2c.c
> +++ b/drivers/media/usb/au0828/au0828-i2c.c
> @@ -23,7 +23,7 @@ MODULE_PARM_DESC(i2c_scan, "scan i2c bus at insmod time");
>  #define I2C_WAIT_DELAY 25
>  #define I2C_WAIT_RETRY 1000
>  
> -static inline int i2c_slave_did_read_ack(struct i2c_adapter *i2c_adap)
> +static inline int i2c_client_did_read_ack(struct i2c_adapter *i2c_adap)
>  {
>   struct au0828_dev *dev = i2c_adap->algo_data;
>   return au0828_read(dev, AU0828_I2C_STATUS_201) &
> @@ -35,7 +35,7 @@ static int i2c_wait_read_ack(struct i2c_adapter *i2c_adap)
>   int count;
>  
>   for (count = 0; count < I2C_WAIT_RETRY; count++) {
> - if (!i2c_slave_did_read_ack(i2c_adap))
> + if (!i2c_client_did_read_ack(i2c_adap))
>   break;
>   udelay(I2C_WAIT_DELAY);
>   }
> diff --git a/drivers/media/usb/au0828/au0828-input.c 
> b/drivers/media/usb/au0828/au0828-input.c
> index 3d3368202cd0..98a57b6e02e2 100644
> --- a/drivers/media/usb/au0828/au0828-input.c
> +++ b/drivers/media/usb/au0828/au0828-input.c
> @@ -30,7 +30,7 @@ struct au0828_rc {
>   int polling;
>   struct delayed_work work;
>  
> - /* i2c slave address of external device (if used) */
> + /* i2c client address of external device (if used) */
>   u16 i2c_dev_addr;
>  
>   int  (*get_key_i2c)(struct au0828_rc *ir);

Re: [PATCH] Documentation/gpu: Document the situation with unqualified drm-memory-

2024-05-03 Thread Alex Deucher

On Fri, May 3, 2024 at 1:06 PM Tvrtko Ursulin  wrote:
>
>
> On 03/05/2024 16:58, Alex Deucher wrote:
> > On Fri, May 3, 2024 at 11:33 AM Daniel Vetter  wrote:
> >>
> >> On Fri, May 03, 2024 at 01:58:38PM +0100, Tvrtko Ursulin wrote:
> >>>
> >>> [And I forgot dri-devel.. doing well!]
> >>>
> >>> On 03/05/2024 13:40, Tvrtko Ursulin wrote:
> 
>  [Correcting Christian's email]
> 
>  On 03/05/2024 13:36, Tvrtko Ursulin wrote:
> > From: Tvrtko Ursulin 
> >
> > Currently it is not well defined what is drm-memory- compared to other
> > categories.
> >
> > In practice the only driver which emits these keys is amdgpu and in them
> > exposes the total memory use (including shared).
> >
> > Document that drm-memory- and drm-total-memory- are aliases to
> > prevent any
> > confusion in the future.
> >
> > While at it also clarify that the reserved sub-string 'memory' refers to
> > the memory region component.
> >
> > Signed-off-by: Tvrtko Ursulin 
> > Cc: Alex Deucher 
> > Cc: Christian König 
> 
>  Mea culpa, I copied the mistake from
>  77d17c4cd0bf52eacfad88e63e8932eb45d643c5. :)
> 
>  Regards,
> 
>  Tvrtko
> 
> > Cc: Rob Clark 
> > ---
> >Documentation/gpu/drm-usage-stats.rst | 10 +-
> >1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/gpu/drm-usage-stats.rst
> > b/Documentation/gpu/drm-usage-stats.rst
> > index 6dc299343b48..ef5c0a0aa477 100644
> > --- a/Documentation/gpu/drm-usage-stats.rst
> > +++ b/Documentation/gpu/drm-usage-stats.rst
> > @@ -128,7 +128,9 @@ Memory
> >Each possible memory type which can be used to store buffer
> > objects by the
> >GPU in question shall be given a stable and unique name to be
> > returned as the
> > -string here.  The name "memory" is reserved to refer to normal
> > system memory.
> > +string here.
> > +
> > +The region name "memory" is reserved to refer to normal system memory.
> >Value shall reflect the amount of storage currently consumed by
> > the buffer
> >objects belong to this client, in the respective memory region.
> > @@ -136,6 +138,9 @@ objects belong to this client, in the respective
> > memory region.
> >Default unit shall be bytes with optional unit specifiers of 'KiB'
> > or 'MiB'
> >indicating kibi- or mebi-bytes.
> > +This is an alias for drm-total- and only one of the two
> > should be
> > +present.
> >>
> >> This feels a bit awkward and seems to needlessly complicate fdinfo uapi.
> >>
> >> - Could we just patch amdgpu to follow everyone else, and avoid the
> >>special case? If there's no tool that relies on the special amdgpu
> >>prefix then that would be a lot easier.
> >>
> >> - If that's not on the table, could we make everyone (with a suitable
> >>helper or something) just print both variants, so that we again have
> >>consisent fdinfo output? Or breaks that a different set of existing
> >>tools.
> >>
> >> - Finally maybe could we get away with fixing amd by adding the common
> >>format there, deprecating the old, fixing the tools that would break and
> >>then maybe if we're lucky, remove the old one from amdgpu in a year or
> >>so?
> >
> > I'm not really understanding what amdgpu is doing wrong.  It seems to
> > be following the documentation.  Is the idea that we would like to
> > deprecate drm-memory- in favor of drm-total-?
> > If that's the case, I think the 3rd option is probably the best.  We
> > have a lot of tools and customers using this.  It would have also been
> > nice to have "memory" in the string for the newer ones to avoid
> > conflicts with other things that might be a total or shared in the
> > future, but I guess that ship has sailed.  We should also note that
> > drm-memory- is deprecated.  While we are here, maybe we should
> > clarify the semantics of resident, purgeable, and active.  For
> > example, isn't resident just a duplicate of total?  If the memory was
> > not resident, it would be in a different region.
>
> Amdgpu isn't doing anything wrong. It just appears when the format was
> discussed no one noticed (me included) that the two keys are not clearly
> described. And it looks there also wasn't a plan to handle the uncelar
> duality in the future.
>
> For me deprecating sounds fine, the 3rd option. I understand we would
> only make amdgpu emit both sets of keys and then remove drm-memory- in
> due time.
>
> With regards to key naming, yeah, memory in the name would have been
> nice. We had a lot of discussion on this topic but ship has indeed
> sailed. It is probably workarble for anything new that might come to add
> their prefix. As long as it does not clash with the memory categories is
> should be fine.
>
> In terms of resident semantics, think of it as VIRT vs RES in top(1).

Re: [PATCH v1 3/4] drm/amdgpu: add compute registers in ip dump for gfx10

2024-05-03 Thread Khatri, Sunil




On 5/3/2024 9:52 PM, Alex Deucher wrote:

On Fri, May 3, 2024 at 12:09 PM Khatri, Sunil  wrote:


On 5/3/2024 9:18 PM, Khatri, Sunil wrote:

On 5/3/2024 8:52 PM, Alex Deucher wrote:

On Fri, May 3, 2024 at 4:45 AM Sunil Khatri 
wrote:

add compute registers in set of registers to dump
during ip dump for gfx10.

Signed-off-by: Sunil Khatri 
---
   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 42
+-
   1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 953df202953a..00c7a842ea3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -378,7 +378,47 @@ static const struct amdgpu_hwip_reg_entry
gc_reg_list_10_1[] = {
  SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE0),
  SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE1),
  SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE2),
-   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3)
+   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3),
+   /* compute registers */
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_VMID),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PERSISTENT_STATE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PIPE_PRIORITY),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUEUE_PRIORITY),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUANTUM),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_RPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_RPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_REQUEST),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_RPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_EVENTS),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_LO),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_OFFSET),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_SIZE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_WG_STATE_OFFSET),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_SIZE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_GDS_RESOURCE_STATE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_ERROR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR_MEM),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_LO),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_OFFSET),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_DW_CNT),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_WG_STATE_OFFSET),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_STATUS)

The registers in patches 3 and 4 are multi-instance, so we should
ideally print every instance of them rather than just one.  Use
nv_grbm_select() to select the pipes and queues.  Make sure to protect
access using the adev->srbm_mutex mutex.

E.g., for the compute registers (patch 3):
  mutex_lock(&adev->srbm_mutex);
  for (i = 0; i < adev->gfx.mec.num_mec; ++i) {
  for (j = 0; j < adev->gfx.mec.num_pipe_per_mec; j++) {
 for (k = 0; k <
adev->gfx.mec.num_queue_per_pipe; k++) {
   drm_printf("mec %d, pipe %d, queue %d\n", i, j, k);
  nv_grbm_select(adev, i, j, k, 0);
 for (reg = 0; reg < ARRAY_SIZE(compute_regs);
reg++)
 drm_printf(...RREG(compute_regs[reg]));
  }
  }
  }
  nv_grbm_select(adev, 0, 0, 0, 0);
  mutex_unlock(&adev->srbm_mutex);

For gfx registers (patch 4):

  mutex_lock(&adev->srbm_mutex);
  for (i = 0; i < adev->gfx.me.num_me; ++i) {
  for (j = 0; j < adev->gfx.me.num_pipe_per_me; j++) {
  for (k = 0; k < adev->gfx.me.num_queue_per_pipe;
k++) {
drm_printf("me %d, pipe %d, queue
%d\n", i, j, k);
  nv_grbm_select(adev, i, j, k, 0);
 for (reg = 0; reg < ARRAY_SIZE(gfx_regs); reg++)
 drm_printf(...RREG(gfx_regs[reg]));

I see one problem here, we dump the registers in mem

Re: Splat during driver probe

2024-05-03 Thread Paneer Selvam, Arunpravin


Hi Alex,

yes, this is related to memory clearing changes. This is a known issue.
I am working on a fix.

Thanks,
Arun.

On 5/3/2024 6:46 PM, Alex Deucher wrote:

Possibly related to Arun's new memory clearing changes.  @Arunpravin
can you take a look?

Alex

On Fri, May 3, 2024 at 9:10 AM Tvrtko Ursulin  wrote:


Hi,

I tried asking on #radeon yesterday but no takers. I was wondering if
the below splat is already known or not.

Appears to happen due amdgpu_bo_release_notify genuniely running before
amdgpu_ttm_set_buffer_funcs_status set the buffer funcs to enabled
during device probe.

I couldn't easily figure out what changed.

Regards,

Tvrtko

[   11.802030] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear
memory with ring turned off.
[   11.802519] [ cut here ]
[   11.802530] WARNING: CPU: 5 PID: 435 at
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1378
amdgpu_bo_release_notify+0x20c/0x230 [amdgpu]
[   11.802916] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6
nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
amdgpu(E+) nft_chain_nat nf_tables ip6table_nat ip6table_mangle
ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
iptable_security nfnetlink ip6table_filter ip6_tables iptable_filter
cmac algif_hash ramoops algif_skcipher reed_solomon af_alg bnep qrtr
mousedev intel_rapl_msr ath11k_pci(+) intel_rapl_common mhi
snd_acp_sof_mach snd_acp_mach snd_sof_probes ath11k snd_soc_dmic kvm_amd
cdc_mbim qmi_helpers joydev hid_multitouch cdc_wdm snd_sof_amd_vangogh
snd_sof_pci kvm mac80211 snd_hda_codec_hdmi hci_uart crct10dif_pclmul
amdxcp i2c_algo_bit snd_sof_amd_acp btqca drm_ttm_helper libarc4
crc32_pclmul btrtl snd_hda_intel snd_sof_xtensa_dsp btbcm ttm
snd_intel_dspcfg btintel polyval_clmulni snd_sof drm_exec
polyval_generic snd_hda_codec snd_sof_utils gpu_sched cfg80211 bluetooth
snd_pci_acp5x gf128mul cdc_ncm
[   11.803070]  sp5100_tco snd_hwdep ghash_clmulni_intel snd_soc_nau8821
snd_soc_max98388 drm_suballoc_helper sha512_ssse3 cdc_ether
snd_acp_config drm_buddy atkbd snd_soc_core aesni_intel snd_hda_core
video ecdh_generic crypto_simd libps2 usbnet cryptd rapl mii wdat_wdt
vivaldi_fmap pcspkr snd_compress i2c_piix4 snd_pcm drm_display_helper
ccp snd_soc_acpi cdc_acm rfkill wmi ltrf216a 8250_dw i2c_hid_acpi
snd_timer snd i2c_hid industrialio soundcore hid_steam pkcs8_key_parser
crypto_user loop fuse dm_mod ip_tables x_tables overlay ext4 crc16
mbcache jbd2 usbhid vfat fat btrfs blake2b_generic libcrc32c
crc32c_generic xor raid6_pq sdhci_pci cqhci nvme serio_raw sdhci
xhci_pci xhci_pci_renesas crc32c_intel mmc_core nvme_core i8042 serio
spi_amd
[   11.803448] CPU: 5 PID: 435 Comm: (udev-worker) Tainted: GW
E  6.9.0-rc6 #3 dd3fb65c639fd86ff89731c604b1e804996e8989
[   11.803471] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023
[   11.803486] RIP: 0010:amdgpu_bo_release_notify+0x20c/0x230 [amdgpu]
[   11.803857] Code: 0b e9 a4 fe ff ff 48 ba ff ff ff ff ff ff ff 7f 31
f6 4c 89 e7 e8 44 e5 33 e6 eb 8d e8 dd dd 33 e6 eb a7 0f 0b e9 4d fe ff
ff <0f> 0b eb 9c be 03 00 00 00 e8 f6 04 00 e6 eb 90 e8 8f 64 79 e6 66
[   11.803890] RSP: 0018:a77bc1537568 EFLAGS: 00010282
[   11.803904] RAX: ffea RBX: 8e1f0da89048 RCX:

[   11.803919] RDX:  RSI: 0027 RDI:

[   11.803934] RBP: 8e1f6ec0ffb0 R08:  R09:
0003
[   11.803948] R10: a77bc15372f0 R11: 8e223ef7ffe8 R12:
8e1f0da89000
[   11.803963] R13: 8e1f0da89180 R14: 8e1f6ec0ffb0 R15:
8e1f6ec0
[   11.803977] FS:  7fa2b600b200() GS:8e222ee8()
knlGS:
[   11.803994] CS:  0010 DS:  ES:  CR0: 80050033
[   11.804007] CR2: 55cac2aa7080 CR3: 00010f818000 CR4:
00350ef0
[   11.804021] Call Trace:
[   11.804031]  
[   11.804042]  ? __warn+0x8c/0x180
[   11.804057]  ? amdgpu_bo_release_notify+0x20c/0x230 [amdgpu
03b1dca7e09918e3f45efb1acdfc90682f73e4c1]
[   11.804456]  ? report_bug+0x191/0x1c0
[   11.804473]  ? handle_bug+0x3a/0x70
[   11.804485]  ? exc_invalid_op+0x17/0x70
[   11.804497]  ? asm_exc_invalid_op+0x1a/0x20
[   11.804517]  ? amdgpu_bo_release_notify+0x20c/0x230 [amdgpu
03b1dca7e09918e3f45efb1acdfc90682f73e4c1]
[   11.804894]  ? amdgpu_bo_release_notify+0x14a/0x230 [amdgpu
03b1dca7e09918e3f45efb1acdfc90682f73e4c1]
[   11.805277]  ttm_bo_release+0x115/0x330 [ttm
39c917822ce73b2a754d4183af7a0990991c66be]
[   11.805303]  ? srso_return_thunk+0x5/0x5f
[   11.805315]  ? __mutex_unlock_slowpath+0x3a/0x290
[   11.805332]  amdgpu_bo_free_kernel+0xd6/0x130 [amdgpu
03b1dca7e09918e3f45efb1acdfc90682f73e4c1]
[   11.805709]  dm_helpers_free_gpu_mem+0x41/0x80 [amdgpu
03b1dca7e09918e3f45efb1acdfc90682f73e4c1]
[   11.806172]  vg_clk_mgr_construct+0x2c4/0x490 [amdgpu
03b1dca7e09918e3f45efb1acdfc90682f73e4c1]
[   11.806645]  dc_clk

Re: [PATCH v1 3/4] drm/amdgpu: add compute registers in ip dump for gfx10

2024-05-03 Thread Alex Deucher

On Fri, May 3, 2024 at 12:09 PM Khatri, Sunil  wrote:
>
>
> On 5/3/2024 9:18 PM, Khatri, Sunil wrote:
> >
> > On 5/3/2024 8:52 PM, Alex Deucher wrote:
> >> On Fri, May 3, 2024 at 4:45 AM Sunil Khatri 
> >> wrote:
> >>> add compute registers in set of registers to dump
> >>> during ip dump for gfx10.
> >>>
> >>> Signed-off-by: Sunil Khatri 
> >>> ---
> >>>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 42
> >>> +-
> >>>   1 file changed, 41 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> >>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> >>> index 953df202953a..00c7a842ea3b 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> >>> @@ -378,7 +378,47 @@ static const struct amdgpu_hwip_reg_entry
> >>> gc_reg_list_10_1[] = {
> >>>  SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE0),
> >>>  SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE1),
> >>>  SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE2),
> >>> -   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3)
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3),
> >>> +   /* compute registers */
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_VMID),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PERSISTENT_STATE),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PIPE_PRIORITY),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUEUE_PRIORITY),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUANTUM),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE_HI),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_RPTR),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR_HI),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_CONTROL),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR_HI),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_RPTR),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_CONTROL),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_REQUEST),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR_HI),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_CONTROL),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_RPTR),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_EVENTS),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_LO),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_HI),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_CONTROL),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_OFFSET),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_SIZE),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_WG_STATE_OFFSET),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_SIZE),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_GDS_RESOURCE_STATE),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_ERROR),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR_MEM),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_LO),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_HI),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_OFFSET),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_DW_CNT),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_WG_STATE_OFFSET),
> >>> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_STATUS)
> >> The registers in patches 3 and 4 are multi-instance, so we should
> >> ideally print every instance of them rather than just one.  Use
> >> nv_grbm_select() to select the pipes and queues.  Make sure to protect
> >> access using the adev->srbm_mutex mutex.
> >>
> >> E.g., for the compute registers (patch 3):
> >>  mutex_lock(&adev->srbm_mutex);
> >>  for (i = 0; i < adev->gfx.mec.num_mec; ++i) {
> >>  for (j = 0; j < adev->gfx.mec.num_pipe_per_mec; j++) {
> >> for (k = 0; k <
> >> adev->gfx.mec.num_queue_per_pipe; k++) {
> >>   drm_printf("mec %d, pipe %d, queue %d\n", i, j, k);
> >>  nv_grbm_select(adev, i, j, k, 0);
> >> for (reg = 0; reg < ARRAY_SIZE(compute_regs);
> >> reg++)
> >> drm_printf(...RREG(compute_regs[reg]));
> >>  }
> >>  }
> >>  }
> >>  nv_grbm_select(adev, 0, 0, 0, 0);
> >>  mutex_unlock(&adev->srbm_mutex);
> >>
> >> For gfx registers (patch 4):
> >>
> >>  mutex_lock(&adev->srbm_mutex);
> >>  for (i = 0; i < adev->gfx.me.num_me; ++i) {
> >>  for (j = 0; j < adev->

Re: [PATCH v1 3/4] drm/amdgpu: add compute registers in ip dump for gfx10

2024-05-03 Thread Khatri, Sunil




On 5/3/2024 9:18 PM, Khatri, Sunil wrote:


On 5/3/2024 8:52 PM, Alex Deucher wrote:
On Fri, May 3, 2024 at 4:45 AM Sunil Khatri  
wrote:

add compute registers in set of registers to dump
during ip dump for gfx10.

Signed-off-by: Sunil Khatri 
---
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 42 
+-

  1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c

index 953df202953a..00c7a842ea3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -378,7 +378,47 @@ static const struct amdgpu_hwip_reg_entry 
gc_reg_list_10_1[] = {

 SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE0),
 SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE1),
 SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE2),
-   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3)
+   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3),
+   /* compute registers */
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_VMID),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PERSISTENT_STATE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PIPE_PRIORITY),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUEUE_PRIORITY),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUANTUM),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_RPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_RPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_REQUEST),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_RPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_EVENTS),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_LO),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_OFFSET),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_SIZE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_WG_STATE_OFFSET),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_SIZE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_GDS_RESOURCE_STATE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_ERROR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR_MEM),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_LO),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_OFFSET),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_DW_CNT),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_WG_STATE_OFFSET),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_STATUS)

The registers in patches 3 and 4 are multi-instance, so we should
ideally print every instance of them rather than just one.  Use
nv_grbm_select() to select the pipes and queues.  Make sure to protect
access using the adev->srbm_mutex mutex.

E.g., for the compute registers (patch 3):
 mutex_lock(&adev->srbm_mutex);
 for (i = 0; i < adev->gfx.mec.num_mec; ++i) {
 for (j = 0; j < adev->gfx.mec.num_pipe_per_mec; j++) {
    for (k = 0; k < 
adev->gfx.mec.num_queue_per_pipe; k++) {

  drm_printf("mec %d, pipe %d, queue %d\n", i, j, k);
 nv_grbm_select(adev, i, j, k, 0);
    for (reg = 0; reg < ARRAY_SIZE(compute_regs); 
reg++)

    drm_printf(...RREG(compute_regs[reg]));
 }
 }
 }
 nv_grbm_select(adev, 0, 0, 0, 0);
 mutex_unlock(&adev->srbm_mutex);

For gfx registers (patch 4):

 mutex_lock(&adev->srbm_mutex);
 for (i = 0; i < adev->gfx.me.num_me; ++i) {
 for (j = 0; j < adev->gfx.me.num_pipe_per_me; j++) {
 for (k = 0; k < adev->gfx.me.num_queue_per_pipe; 
k++) {
   drm_printf("me %d, pipe %d, queue 
%d\n", i, j, k);

 nv_grbm_select(adev, i, j, k, 0);
    for (reg = 0; reg < ARRAY_SIZE(gfx_regs); reg++)
    drm_printf(...RREG(gfx_regs[reg]));
I see one problem here, we dump the registers in memory allocated first 
and read before and store and then dump later when user read the 
devcoredump file. Here w

Re: [PATCH] Documentation/gpu: Document the situation with unqualified drm-memory-

2024-05-03 Thread Alex Deucher

On Fri, May 3, 2024 at 11:33 AM Daniel Vetter  wrote:
>
> On Fri, May 03, 2024 at 01:58:38PM +0100, Tvrtko Ursulin wrote:
> >
> > [And I forgot dri-devel.. doing well!]
> >
> > On 03/05/2024 13:40, Tvrtko Ursulin wrote:
> > >
> > > [Correcting Christian's email]
> > >
> > > On 03/05/2024 13:36, Tvrtko Ursulin wrote:
> > > > From: Tvrtko Ursulin 
> > > >
> > > > Currently it is not well defined what is drm-memory- compared to other
> > > > categories.
> > > >
> > > > In practice the only driver which emits these keys is amdgpu and in them
> > > > exposes the total memory use (including shared).
> > > >
> > > > Document that drm-memory- and drm-total-memory- are aliases to
> > > > prevent any
> > > > confusion in the future.
> > > >
> > > > While at it also clarify that the reserved sub-string 'memory' refers to
> > > > the memory region component.
> > > >
> > > > Signed-off-by: Tvrtko Ursulin 
> > > > Cc: Alex Deucher 
> > > > Cc: Christian König 
> > >
> > > Mea culpa, I copied the mistake from
> > > 77d17c4cd0bf52eacfad88e63e8932eb45d643c5. :)
> > >
> > > Regards,
> > >
> > > Tvrtko
> > >
> > > > Cc: Rob Clark 
> > > > ---
> > > >   Documentation/gpu/drm-usage-stats.rst | 10 +-
> > > >   1 file changed, 9 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/Documentation/gpu/drm-usage-stats.rst
> > > > b/Documentation/gpu/drm-usage-stats.rst
> > > > index 6dc299343b48..ef5c0a0aa477 100644
> > > > --- a/Documentation/gpu/drm-usage-stats.rst
> > > > +++ b/Documentation/gpu/drm-usage-stats.rst
> > > > @@ -128,7 +128,9 @@ Memory
> > > >   Each possible memory type which can be used to store buffer
> > > > objects by the
> > > >   GPU in question shall be given a stable and unique name to be
> > > > returned as the
> > > > -string here.  The name "memory" is reserved to refer to normal
> > > > system memory.
> > > > +string here.
> > > > +
> > > > +The region name "memory" is reserved to refer to normal system memory.
> > > >   Value shall reflect the amount of storage currently consumed by
> > > > the buffer
> > > >   objects belong to this client, in the respective memory region.
> > > > @@ -136,6 +138,9 @@ objects belong to this client, in the respective
> > > > memory region.
> > > >   Default unit shall be bytes with optional unit specifiers of 'KiB'
> > > > or 'MiB'
> > > >   indicating kibi- or mebi-bytes.
> > > > +This is an alias for drm-total- and only one of the two
> > > > should be
> > > > +present.
>
> This feels a bit awkward and seems to needlessly complicate fdinfo uapi.
>
> - Could we just patch amdgpu to follow everyone else, and avoid the
>   special case? If there's no tool that relies on the special amdgpu
>   prefix then that would be a lot easier.
>
> - If that's not on the table, could we make everyone (with a suitable
>   helper or something) just print both variants, so that we again have
>   consisent fdinfo output? Or breaks that a different set of existing
>   tools.
>
> - Finally maybe could we get away with fixing amd by adding the common
>   format there, deprecating the old, fixing the tools that would break and
>   then maybe if we're lucky, remove the old one from amdgpu in a year or
>   so?

I'm not really understanding what amdgpu is doing wrong.  It seems to
be following the documentation.  Is the idea that we would like to
deprecate drm-memory- in favor of drm-total-?
If that's the case, I think the 3rd option is probably the best.  We
have a lot of tools and customers using this.  It would have also been
nice to have "memory" in the string for the newer ones to avoid
conflicts with other things that might be a total or shared in the
future, but I guess that ship has sailed.  We should also note that
drm-memory- is deprecated.  While we are here, maybe we should
clarify the semantics of resident, purgeable, and active.  For
example, isn't resident just a duplicate of total?  If the memory was
not resident, it would be in a different region.

Alex

>
> Uapi that's "either do $foo or on this one driver, do $bar" is just
> guaranteed to fragement the ecosystem, so imo that should be the absolute
> last resort.
> -Sima
>
> > > > +
> > > >   - drm-shared-:  [KiB|MiB]
> > > >   The total size of buffers that are shared with another file (e.g.,
> > > > have more
> > > > @@ -145,6 +150,9 @@ than a single handle).
> > > >   The total size of buffers that including shared and private memory.
> > > > +This is an alias for drm-memory- and only one of the two
> > > > should be
> > > > +present.
> > > > +
> > > >   - drm-resident-:  [KiB|MiB]
> > > >   The total size of buffers that are resident in the specified region.
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

Re: [PATCH v1 3/4] drm/amdgpu: add compute registers in ip dump for gfx10

2024-05-03 Thread Khatri, Sunil




On 5/3/2024 8:52 PM, Alex Deucher wrote:

On Fri, May 3, 2024 at 4:45 AM Sunil Khatri  wrote:

add compute registers in set of registers to dump
during ip dump for gfx10.

Signed-off-by: Sunil Khatri 
---
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 42 +-
  1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 953df202953a..00c7a842ea3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -378,7 +378,47 @@ static const struct amdgpu_hwip_reg_entry 
gc_reg_list_10_1[] = {
 SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE0),
 SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE1),
 SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE2),
-   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3)
+   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3),
+   /* compute registers */
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_VMID),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PERSISTENT_STATE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PIPE_PRIORITY),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUEUE_PRIORITY),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUANTUM),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_RPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_RPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_REQUEST),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_RPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_EVENTS),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_LO),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_OFFSET),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_SIZE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_WG_STATE_OFFSET),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_SIZE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_GDS_RESOURCE_STATE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_ERROR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR_MEM),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_LO),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_OFFSET),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_DW_CNT),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_WG_STATE_OFFSET),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_STATUS)

The registers in patches 3 and 4 are multi-instance, so we should
ideally print every instance of them rather than just one.  Use
nv_grbm_select() to select the pipes and queues.  Make sure to protect
access using the adev->srbm_mutex mutex.

E.g., for the compute registers (patch 3):
 mutex_lock(&adev->srbm_mutex);
 for (i = 0; i < adev->gfx.mec.num_mec; ++i) {
 for (j = 0; j < adev->gfx.mec.num_pipe_per_mec; j++) {
for (k = 0; k < adev->gfx.mec.num_queue_per_pipe; k++) {
  drm_printf("mec %d, pipe %d, queue %d\n", i, j, k);
 nv_grbm_select(adev, i, j, k, 0);
for (reg = 0; reg < ARRAY_SIZE(compute_regs); reg++)
drm_printf(...RREG(compute_regs[reg]));
 }
 }
 }
 nv_grbm_select(adev, 0, 0, 0, 0);
 mutex_unlock(&adev->srbm_mutex);

For gfx registers (patch 4):

 mutex_lock(&adev->srbm_mutex);
 for (i = 0; i < adev->gfx.me.num_me; ++i) {
 for (j = 0; j < adev->gfx.me.num_pipe_per_me; j++) {
 for (k = 0; k < adev->gfx.me.num_queue_per_pipe; k++) {
   drm_printf("me %d, pipe %d, queue %d\n", i, j, 
k);
 nv_grbm_select(adev, i, j, k, 0);
for (reg = 0; reg < ARRAY_SIZE(gfx_regs); reg++)
drm_printf(...RREG(gfx_regs[reg]));
 }
 }
 }
 nv_grbm_select(adev, 0, 0, 0, 0);
 mutex_unlock(&adev->srbm_mutex);


Thanks for pointing that out and suggesting the sample code of how it 
should be. Will take c

Re: Allow setting a power cap that's lower than recommended

2024-05-03 Thread Mario Limonciello


On 5/3/2024 07:05, fililip wrote:

This patch allows setting a low power cap if |ignore_min_pcap| is set to 1.

Signed-off-by: fililip 


Rather than an attachment you should send the patch inline.  That would 
mean that your commit message and SoB should be at the top of the patch 
itself.


If you're not aware of it, you should read through:

https://www.kernel.org/doc/html/latest/process/submitting-patches.html

Basically a maintainer should be able to run:

'b4 shazam $URL'

And it would download the patch and apply it.

That being said my comments on the patch content:

As Alex mentioned a concern about the the maintenance of more 
parameters, maybe it's worth making the sysfs file of the limit 
"writable" instead.


Then when it's written the kernel driver can add_taint() for 
TAINT_FIRMWARE_WORKAROUND when in use so we can see it clearly in logs.

Re: Allow setting a power cap that's lower than recommended

2024-05-03 Thread Alex Deucher

On Fri, May 3, 2024 at 9:27 AM fililip  wrote:
>
> This patch allows setting a low power cap if ignore_min_pcap is set to 1.
>
> Signed-off-by: fililip 

Please generate a proper signed off git patch and use git-send-email
to send it out.  Also, I don't want to add more module parameters.

Alex

Re: [PATCH] Documentation/gpu: Document the situation with unqualified drm-memory-

2024-05-03 Thread Daniel Vetter

On Fri, May 03, 2024 at 01:58:38PM +0100, Tvrtko Ursulin wrote:
> 
> [And I forgot dri-devel.. doing well!]
> 
> On 03/05/2024 13:40, Tvrtko Ursulin wrote:
> > 
> > [Correcting Christian's email]
> > 
> > On 03/05/2024 13:36, Tvrtko Ursulin wrote:
> > > From: Tvrtko Ursulin 
> > > 
> > > Currently it is not well defined what is drm-memory- compared to other
> > > categories.
> > > 
> > > In practice the only driver which emits these keys is amdgpu and in them
> > > exposes the total memory use (including shared).
> > > 
> > > Document that drm-memory- and drm-total-memory- are aliases to
> > > prevent any
> > > confusion in the future.
> > > 
> > > While at it also clarify that the reserved sub-string 'memory' refers to
> > > the memory region component.
> > > 
> > > Signed-off-by: Tvrtko Ursulin 
> > > Cc: Alex Deucher 
> > > Cc: Christian König 
> > 
> > Mea culpa, I copied the mistake from
> > 77d17c4cd0bf52eacfad88e63e8932eb45d643c5. :)
> > 
> > Regards,
> > 
> > Tvrtko
> > 
> > > Cc: Rob Clark 
> > > ---
> > >   Documentation/gpu/drm-usage-stats.rst | 10 +-
> > >   1 file changed, 9 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/Documentation/gpu/drm-usage-stats.rst
> > > b/Documentation/gpu/drm-usage-stats.rst
> > > index 6dc299343b48..ef5c0a0aa477 100644
> > > --- a/Documentation/gpu/drm-usage-stats.rst
> > > +++ b/Documentation/gpu/drm-usage-stats.rst
> > > @@ -128,7 +128,9 @@ Memory
> > >   Each possible memory type which can be used to store buffer
> > > objects by the
> > >   GPU in question shall be given a stable and unique name to be
> > > returned as the
> > > -string here.  The name "memory" is reserved to refer to normal
> > > system memory.
> > > +string here.
> > > +
> > > +The region name "memory" is reserved to refer to normal system memory.
> > >   Value shall reflect the amount of storage currently consumed by
> > > the buffer
> > >   objects belong to this client, in the respective memory region.
> > > @@ -136,6 +138,9 @@ objects belong to this client, in the respective
> > > memory region.
> > >   Default unit shall be bytes with optional unit specifiers of 'KiB'
> > > or 'MiB'
> > >   indicating kibi- or mebi-bytes.
> > > +This is an alias for drm-total- and only one of the two
> > > should be
> > > +present.

This feels a bit awkward and seems to needlessly complicate fdinfo uapi.

- Could we just patch amdgpu to follow everyone else, and avoid the
  special case? If there's no tool that relies on the special amdgpu
  prefix then that would be a lot easier.

- If that's not on the table, could we make everyone (with a suitable
  helper or something) just print both variants, so that we again have
  consisent fdinfo output? Or breaks that a different set of existing
  tools.

- Finally maybe could we get away with fixing amd by adding the common
  format there, deprecating the old, fixing the tools that would break and
  then maybe if we're lucky, remove the old one from amdgpu in a year or
  so?

Uapi that's "either do $foo or on this one driver, do $bar" is just
guaranteed to fragement the ecosystem, so imo that should be the absolute
last resort.
-Sima

> > > +
> > >   - drm-shared-:  [KiB|MiB]
> > >   The total size of buffers that are shared with another file (e.g.,
> > > have more
> > > @@ -145,6 +150,9 @@ than a single handle).
> > >   The total size of buffers that including shared and private memory.
> > > +This is an alias for drm-memory- and only one of the two
> > > should be
> > > +present.
> > > +
> > >   - drm-resident-:  [KiB|MiB]
> > >   The total size of buffers that are resident in the specified region.

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [PATCH v1 3/4] drm/amdgpu: add compute registers in ip dump for gfx10

2024-05-03 Thread Alex Deucher

On Fri, May 3, 2024 at 4:45 AM Sunil Khatri  wrote:
>
> add compute registers in set of registers to dump
> during ip dump for gfx10.
>
> Signed-off-by: Sunil Khatri 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 42 +-
>  1 file changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 953df202953a..00c7a842ea3b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -378,7 +378,47 @@ static const struct amdgpu_hwip_reg_entry 
> gc_reg_list_10_1[] = {
> SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE0),
> SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE1),
> SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE2),
> -   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3)
> +   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3),
> +   /* compute registers */
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_VMID),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PERSISTENT_STATE),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PIPE_PRIORITY),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUEUE_PRIORITY),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUANTUM),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE_HI),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_RPTR),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR_HI),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_CONTROL),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR_HI),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_RPTR),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_CONTROL),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_REQUEST),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR_HI),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_CONTROL),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_RPTR),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_EVENTS),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_LO),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_HI),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_CONTROL),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_OFFSET),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_SIZE),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_WG_STATE_OFFSET),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_SIZE),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_GDS_RESOURCE_STATE),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_ERROR),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR_MEM),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_LO),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_HI),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_OFFSET),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_DW_CNT),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_WG_STATE_OFFSET),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_STATUS)

The registers in patches 3 and 4 are multi-instance, so we should
ideally print every instance of them rather than just one.  Use
nv_grbm_select() to select the pipes and queues.  Make sure to protect
access using the adev->srbm_mutex mutex.

E.g., for the compute registers (patch 3):
mutex_lock(&adev->srbm_mutex);
for (i = 0; i < adev->gfx.mec.num_mec; ++i) {
for (j = 0; j < adev->gfx.mec.num_pipe_per_mec; j++) {
   for (k = 0; k < adev->gfx.mec.num_queue_per_pipe; k++) {
 drm_printf("mec %d, pipe %d, queue %d\n", i, j, k);
nv_grbm_select(adev, i, j, k, 0);
   for (reg = 0; reg < ARRAY_SIZE(compute_regs); reg++)
   drm_printf(...RREG(compute_regs[reg]));
}
}
}
nv_grbm_select(adev, 0, 0, 0, 0);
mutex_unlock(&adev->srbm_mutex);

For gfx registers (patch 4):

mutex_lock(&adev->srbm_mutex);
for (i = 0; i < adev->gfx.me.num_me; ++i) {
for (j = 0; j < adev->gfx.me.num_pipe_per_me; j++) {
for (k = 0; k < adev->gfx.me.num_queue_per_pipe; k++) {
  drm_printf("me %d, pipe %d, queue %d\n", i, j, k);
nv_grbm_select(adev, i, j, k, 0);
   for (reg = 0; reg < ARRAY_SIZE(gfx_regs); reg++)
   drm_printf(...RREG(gfx_regs[reg]));
}
}
}
nv_grbm_select(adev, 0, 0, 0, 0);
mutex_unlock(&adev->srbm_mutex);

Alex

>  };
>
>  static const struct soc15_reg_golde

Re: [PATCH v1 1/4] drm/amdgpu: add CP headers registers to gfx10 dump

2024-05-03 Thread Alex Deucher

On Fri, May 3, 2024 at 4:45 AM Sunil Khatri  wrote:
>
> add registers in the ip dump for CP headers in gfx10
>
> Signed-off-by: Sunil Khatri 

Patches 1, 2 are:
Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 3171ed5e5af3..61c1e997f794 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -366,7 +366,14 @@ static const struct amdgpu_hwip_reg_entry 
> gc_reg_list_10_1[] = {
> SOC15_REG_ENTRY_STR(GC, 0, mmRLC_GPM_DEBUG_INST_A),
> SOC15_REG_ENTRY_STR(GC, 0, mmRLC_GPM_DEBUG_INST_B),
> SOC15_REG_ENTRY_STR(GC, 0, mmRLC_GPM_DEBUG_INST_ADDR),
> -   SOC15_REG_ENTRY_STR(GC, 0, mmRLC_LX6_CORE_PDEBUG_INST)
> +   SOC15_REG_ENTRY_STR(GC, 0, mmRLC_LX6_CORE_PDEBUG_INST),
> +   /* cp header registers */
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_HEADER_DUMP),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_MEC_ME1_HEADER_DUMP),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_MEC_ME2_HEADER_DUMP),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_PFP_HEADER_DUMP),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_ME_HEADER_DUMP),
> +   SOC15_REG_ENTRY_STR(GC, 0, mmCP_MES_HEADER_DUMP)
>  };
>
>  static const struct soc15_reg_golden golden_settings_gc_10_1[] = {
> --
> 2.34.1
>

Re: Proposal to add CRIU support to DRM render nodes

2024-05-03 Thread Felix Kuehling




On 2024-04-16 10:04, Tvrtko Ursulin wrote:
> 
> On 01/04/2024 18:58, Felix Kuehling wrote:
>>
>> On 2024-04-01 12:56, Tvrtko Ursulin wrote:
>>>
>>> On 01/04/2024 17:37, Felix Kuehling wrote:
 On 2024-04-01 11:09, Tvrtko Ursulin wrote:
>
> On 28/03/2024 20:42, Felix Kuehling wrote:
>>
>> On 2024-03-28 12:03, Tvrtko Ursulin wrote:
>>>
>>> Hi Felix,
>>>
>>> I had one more thought while browsing around the amdgpu CRIU plugin. It 
>>> appears it relies on the KFD support being compiled in and /dev/kfd 
>>> present, correct? AFAICT at least, it relies on that to figure out the 
>>> amdgpu DRM node.
>>>
>>> In would be probably good to consider designing things without that 
>>> dependency. So that checkpointing an application which does not use 
>>> /dev/kfd is possible. Or if the kernel does not even have the KFD 
>>> support compiled in.
>>
>> Yeah, if we want to support graphics apps that don't use KFD, we should 
>> definitely do that. Currently we get a lot of topology information from 
>> KFD, not even from the /dev/kfd device but from the sysfs nodes exposed 
>> by KFD. We'd need to get GPU device info from the render nodes instead. 
>> And if KFD is available, we may need to integrate both sources of 
>> information.
>>
>>
>>>
>>> It could perhaps mean no more than adding some GPU discovery code into 
>>> CRIU. Which shuold be flexible enough to account for things like 
>>> re-assigned minor numbers due driver reload.
>>
>> Do you mean adding GPU discovery to the core CRIU, or to the plugin. I 
>> was thinking this is still part of the plugin.
>
> Yes I agree. I was only thinking about adding some DRM device discovery 
> code in a more decoupled fashion from the current plugin, for both the 
> reason discussed above (decoupling a bit from reliance on kfd sysfs), and 
> then also if/when a new DRM driver might want to implement this the code 
> could be move to some common plugin area.
>
> I am not sure how feasible that would be though. The "gpu id" concept and 
> it's matching in the current kernel code and CRIU plugin - is that value 
> tied to the physical GPU instance or how it works?

 The concept of the GPU ID is that it's stable while the system is up, even 
 when devices get added and removed dynamically. It was baked into the API 
 early on, but I don't think we ever fully validated device hot plug. I 
 think the closest we're getting is with our latest MI GPUs and dynamic 
 partition mode change.
>>>
>>> Doesn't it read the saved gpu id from the image file while doing restore 
>>> and tries to open the render node to match it? Maybe I am misreading the 
>>> code.. But if it does, does it imply that in practice it could be stable 
>>> across reboots? Or that it is not possible to restore to a different 
>>> instance of maybe the same GPU model installed in a system?
>>
>> Ah, the idea is, that when you restore on a different system, you may get 
>> different GPU IDs. Or you may checkpoint an app running on GPU 1 but restore 
>> it on GPU 2 on the same system. That's why we need to translate GPU IDs in 
>> restored applications. User mode still uses the old GPU IDs, but the kernel 
>> mode driver translates them to the actual GPU IDs of the GPUs that the 
>> process was restored on.
> 
> I see.. I think. Normal flow is ppd->user_gpu_id set during client init, but 
> for restored clients it gets overriden during restore so that any further 
> ioctls can actually not instantly fail.
> 
> And then in amdgpu_plugin_restore_file, when it is opening the render node, 
> it relies on the kfd topology to have filled in (more or less) the 
> target_gpu_id corresponding to the render node gpu id of the target GPU - the 
> one associated with the new kfd gpu_id?

Yes.

> 
> I am digging into this because I am trying to see if some part of GPU 
> discovery could somehow be decoupled.. to offer you to work on at least that 
> until you start to tackle the main body of the feature. But it looks properly 
> tangled up.

OK. Most of the interesting plugin code should be in amdgpu_plugin_topology.c. 
It currently has some pretty complicated logic to find a set of devices that 
matches the topology in the checkpoint, including shader ISA versions, numbers 
of compute units, memory sizes, firmware versions and IO-Links between GPUs. 
This was originally done to support P2P with XGMI links. I'm not sure we ever 
updated it to properly support PCIe P2P.


> 
> Do you have any suggestions with what I could help with? Maybe developing 
> some sort of drm device enumeration library if you see a way that would be 
> useful in decoupling the device discovery from kfd. We would need to define 
> what sort of information you would need to be queryable from it.

Maybe. I think a lot of device information is available with some amd

Re: [RFC 5/5] drm/amdgpu: Only show VRAM in fdinfo if it exists

2024-05-03 Thread Alex Deucher

On Fri, May 3, 2024 at 10:01 AM Tvrtko Ursulin
 wrote:
>
>
> On 03/05/2024 14:47, Alex Deucher wrote:
> > On Fri, May 3, 2024 at 3:50 AM Tvrtko Ursulin  
> > wrote:
> >>
> >>
> >> On 02/05/2024 14:16, Christian König wrote:
> >>> Am 30.04.24 um 19:27 schrieb Tvrtko Ursulin:
>  From: Tvrtko Ursulin 
> 
>  Do not emit the key-value pairs if the VRAM does not exist ie. VRAM
>  placement is not valid and accessible.
> >>>
> >>> Yeah, that's unfortunately rather misleading.
> >>>
> >>> Even APUs have VRAM or rather stolen system memory which is managed by
> >>> the graphics driver.
> >>>
> >>> We only have a single compute model which really doesn't have VRAM at all.
> >>
> >> Hm what is misleading and how more precisely? :) Maybe in other words,
> >> if is_app_apu is not the right criteria to know when TTM_PL_VRAM is
> >> impossible, what is? Is the compute model you mentio the only thing
> >> which sets is_app_apu and uses the dummy vram manager?
> >
> > Probably better to check if adev->gmc.real_vram_size is non-0.
>
> Hmm "real VRAM" - will that handle APUs correctly?

Yes.  In the client APU case "VRAM" will be the UMA carve out region
reserved by the sbios.

>
> I am looking at this:
>
> if (!adev->gmc.is_app_apu) {
> man->func = &amdgpu_vram_mgr_func;
>
> err = drm_buddy_init(&mgr->mm, man->size, PAGE_SIZE);
> if (err)
> return err;
> } else {
> man->func = &amdgpu_dummy_vram_mgr_func;
> DRM_INFO("Setup dummy vram mgr\n");
> }
>
> And assuming that unless the dummy manager is used, TTM_PL_VRAM will be
> valid. Wrong assumption?

Today checking is_app_apu would be fine, but It's supposed to
distinguish datacenter APUs, not whether or not the device has a
"VRAM" pool or not, but its usage has gotten sort of overloaded.  Just
want to avoid overloading what it means in too many more places.

Alex

>
> Regards,
>
> Tvrtko
>
>
> > Alex
> >
> >>
> >> Regards,
> >>
> >> Tvrtko
> >>
> >>> Regards,
> >>> Christian.
> >>>
> 
>  Signed-off-by: Tvrtko Ursulin 
>  ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 29 +-
> 1 file changed, 17 insertions(+), 12 deletions(-)
> 
>  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
>  b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
>  index a09944104c41..603a5c010f5d 100644
>  --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
>  +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
>  @@ -83,25 +83,30 @@ void amdgpu_show_fdinfo(struct drm_printer *p,
>  struct drm_file *file)
>  */
> drm_printf(p, "pasid:\t%u\n", fpriv->vm.pasid);
>  -drm_printf(p, "drm-memory-vram:\t%llu KiB\n", stats.vram/1024UL);
> drm_printf(p, "drm-memory-gtt: \t%llu KiB\n", stats.gtt/1024UL);
> drm_printf(p, "drm-memory-cpu: \t%llu KiB\n", stats.cpu/1024UL);
>  -drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n",
>  -   stats.visible_vram/1024UL);
>  -drm_printf(p, "amd-evicted-vram:\t%llu KiB\n",
>  -   stats.evicted_vram/1024UL);
>  -drm_printf(p, "amd-evicted-visible-vram:\t%llu KiB\n",
>  -   stats.evicted_visible_vram/1024UL);
>  -drm_printf(p, "amd-requested-vram:\t%llu KiB\n",
>  -   stats.requested_vram/1024UL);
>  -drm_printf(p, "amd-requested-visible-vram:\t%llu KiB\n",
>  -   stats.requested_visible_vram/1024UL);
> drm_printf(p, "amd-requested-gtt:\t%llu KiB\n",
>    stats.requested_gtt/1024UL);
>  -drm_printf(p, "drm-shared-vram:\t%llu KiB\n",
>  stats.vram_shared/1024UL);
> drm_printf(p, "drm-shared-gtt:\t%llu KiB\n",
>  stats.gtt_shared/1024UL);
> drm_printf(p, "drm-shared-cpu:\t%llu KiB\n",
>  stats.cpu_shared/1024UL);
>  +if (!adev->gmc.is_app_apu) {
>  +drm_printf(p, "drm-memory-vram:\t%llu KiB\n",
>  +   stats.vram/1024UL);
>  +drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n",
>  +   stats.visible_vram/1024UL);
>  +drm_printf(p, "amd-evicted-vram:\t%llu KiB\n",
>  +   stats.evicted_vram/1024UL);
>  +drm_printf(p, "amd-evicted-visible-vram:\t%llu KiB\n",
>  +   stats.evicted_visible_vram/1024UL);
>  +drm_printf(p, "amd-requested-vram:\t%llu KiB\n",
>  +   stats.requested_vram/1024UL);
>  +drm_printf(p, "amd-requested-visible-vram:\t%llu KiB\n",
>  +   stats.requested_visible_vram/1024UL);
>  +drm_printf(p, "drm-shared-vram:\t%llu KiB\n",
>  +   stats.vram_shared/1024UL);
>  +}
>  +
> for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) {
> if (!usage[hw_ip])
> continue;
> >>>

Re: [RFC 0/5] Add capacity key to fdinfo

2024-05-03 Thread Alex Deucher

On Fri, May 3, 2024 at 7:50 AM Tvrtko Ursulin  wrote:
>
>
> On 02/05/2024 16:00, Alex Deucher wrote:
> > On Thu, May 2, 2024 at 10:43 AM Tvrtko Ursulin
> >  wrote:
> >>
> >>
> >> On 02/05/2024 14:07, Christian König wrote:
> >>> Am 01.05.24 um 15:27 schrieb Tvrtko Ursulin:
> 
>  Hi Alex,
> 
>  On 30/04/2024 19:32, Alex Deucher wrote:
> > On Tue, Apr 30, 2024 at 1:27 PM Tvrtko Ursulin 
> > wrote:
> >>
> >> From: Tvrtko Ursulin 
> >>
> >> I have noticed AMD GPUs can have more than one "engine" (ring?) of
> >> the same type
> >> but amdgpu is not reporting that in fdinfo using the capacity engine
> >> tag.
> >>
> >> This series is therefore an attempt to improve that, but only an RFC
> >> since it is
> >> quite likely I got stuff wrong on the first attempt. Or if not wrong
> >> it may not
> >> be very beneficial in AMDs case.
> >>
> >> So I tried to figure out how to count and store the number of
> >> instances of an
> >> "engine" type and spotted that could perhaps be used in more than
> >> one place in
> >> the driver. I was more than a little bit confused by the ip_instance
> >> and uapi
> >> rings, then how rings are selected to context entities internally.
> >> Anyway..
> >> hopefully it is a simple enough series to easily spot any such large
> >> misses.
> >>
> >> End result should be that, assuming two "engine" instances, one
> >> fully loaded and
> >> one idle will only report client using 50% of that engine type.
> >
> > That would only be true if there are multiple instantiations of the IP
> > on the chip which in most cases is not true.  In most cases there is
> > one instance of the IP that can be fed from multiple rings. E.g. for
> > graphics and compute, all of the rings ultimately feed into the same
> > compute units on the chip.  So if you have a gfx ring and a compute
> > rings, you can schedule work to them asynchronously, but ultimately
> > whether they execute serially or in parallel depends on the actual
> > shader code in the command buffers and the extent to which it can
> > utilize the available compute units in the shader cores.
> 
>  This is the same as with Intel/i915. Fdinfo is not intended to provide
>  utilisation of EUs and such, just how busy are the "entities" kernel
>  submits to. So doing something like in this series would make the
>  reporting more similar between the two drivers.
> 
>  I think both the 0-800% or 0-100% range (taking 8 ring compute as an
>  example) can be misleading for different workloads. Neither <800% in
>  the former means one can send more work and same for <100% in the latter.
> >>>
> >>> Yeah, I think that's what Alex tries to describe. By using 8 compute
> >>> rings your 800% load is actually incorrect and quite misleading.
> >>>
> >>> Background is that those 8 compute rings won't be active all at the same
> >>> time, but rather waiting on each other for resources.
> >>>
> >>> But this "waiting" is unfortunately considered execution time since the
> >>> used approach is actually not really capable of separating waiting and
> >>> execution time.
> >>
> >> Right, so 800% is what gputop could be suggesting today, by the virtue 8
> >> context/clients can each use 100% if they only use a subset of compute
> >> units. I was proposing to expose the capacity in fdinfo so it can be
> >> scaled down and then dicussing how both situation have pros and cons.
> >>
>  There is also a parallel with the CPU world here and hyper threading,
>  if not wider, where "What does 100% actually mean?" is also wishy-washy.
> 
>  Also note that the reporting of actual time based values in fdinfo
>  would not changing with this series.
> 
>  Of if you can guide me towards how to distinguish real vs fake
>  parallelism in HW IP blocks I could modify the series to only add
>  capacity tags where there are truly independent blocks. That would be
>  different from i915 though were I did not bother with that
>  distinction. (For reasons that assignment of for instance EUs to
>  compute "rings" (command streamers in i915) was supposed to be
>  possible to re-configure on the fly. So it did not make sense to try
>  and be super smart in fdinfo.)
> >>>
> >>> Well exactly that's the point we don't really have truly independent
> >>> blocks on AMD hardware.
> >>>
> >>> There are things like independent SDMA instances, but those a meant to
> >>> be used like the first instance for uploads and the second for downloads
> >>> etc.. When you use both instances for the same job they will pretty much
> >>> limit each other because of a single resource.
> >>
> >> So _never_ multiple instances of the same IP block? No video decode,
> >> encode, anything?
> >
> > Some chips have multiple encode/decode IP blocks that are actually
>

Re: [PATCH] dm/amd/pm: Fix problems with reboot/shutdown for some SMU 13.0.4/13.0.11 users

2024-05-03 Thread Alex Deucher

On Thu, May 2, 2024 at 3:22 PM Mario Limonciello
 wrote:
>
> Limit the workaround introduced by commit 31729e8c21ec ("drm/amd/pm: fixes
> a random hang in S4 for SMU v13.0.4/11") to only run in the s4 path.
>
> Cc: Tim Huang 
> Fixes: 31729e8c21ec ("drm/amd/pm: fixes a random hang in S4 for SMU 
> v13.0.4/11")
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3351
> Signed-off-by: Mario Limonciello 

Acked-by: Alex Deucher 

> ---
> I tested this on SMU 13.0.4 for ~85 cycles with this script, BIOS 1.1.0.2a and
> didn't observe any hangs.
>
> ```
> #!/bin/sh
> echo test_resume > /sys/power/disk
> i=1
> while [ : ]; do
>
>   echo "Starting cycle $i"
>   echo disk > /sys/power/state
>   echo "Ending cycle $i"
>   i=$((i+1))
>   sleep 5
> done
> ```
>
>  drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
> index 949131bd1ecb..4abfcd32747d 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
> @@ -226,7 +226,7 @@ static int smu_v13_0_4_system_features_control(struct 
> smu_context *smu, bool en)
> struct amdgpu_device *adev = smu->adev;
> int ret = 0;
>
> -   if (!en && !adev->in_s0ix) {
> +   if (!en && adev->in_s4) {
> /* Adds a GFX reset as workaround just before sending the
>  * MP1_UNLOAD message to prevent GC/RLC/PMFW from entering
>  * an invalid state.
> --
> 2.43.0
>

Re: [RFC 5/5] drm/amdgpu: Only show VRAM in fdinfo if it exists

2024-05-03 Thread Alex Deucher

On Fri, May 3, 2024 at 3:50 AM Tvrtko Ursulin  wrote:
>
>
> On 02/05/2024 14:16, Christian König wrote:
> > Am 30.04.24 um 19:27 schrieb Tvrtko Ursulin:
> >> From: Tvrtko Ursulin 
> >>
> >> Do not emit the key-value pairs if the VRAM does not exist ie. VRAM
> >> placement is not valid and accessible.
> >
> > Yeah, that's unfortunately rather misleading.
> >
> > Even APUs have VRAM or rather stolen system memory which is managed by
> > the graphics driver.
> >
> > We only have a single compute model which really doesn't have VRAM at all.
>
> Hm what is misleading and how more precisely? :) Maybe in other words,
> if is_app_apu is not the right criteria to know when TTM_PL_VRAM is
> impossible, what is? Is the compute model you mentio the only thing
> which sets is_app_apu and uses the dummy vram manager?

Probably better to check if adev->gmc.real_vram_size is non-0.

Alex

>
> Regards,
>
> Tvrtko
>
> > Regards,
> > Christian.
> >
> >>
> >> Signed-off-by: Tvrtko Ursulin 
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 29 +-
> >>   1 file changed, 17 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
> >> index a09944104c41..603a5c010f5d 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
> >> @@ -83,25 +83,30 @@ void amdgpu_show_fdinfo(struct drm_printer *p,
> >> struct drm_file *file)
> >>*/
> >>   drm_printf(p, "pasid:\t%u\n", fpriv->vm.pasid);
> >> -drm_printf(p, "drm-memory-vram:\t%llu KiB\n", stats.vram/1024UL);
> >>   drm_printf(p, "drm-memory-gtt: \t%llu KiB\n", stats.gtt/1024UL);
> >>   drm_printf(p, "drm-memory-cpu: \t%llu KiB\n", stats.cpu/1024UL);
> >> -drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n",
> >> -   stats.visible_vram/1024UL);
> >> -drm_printf(p, "amd-evicted-vram:\t%llu KiB\n",
> >> -   stats.evicted_vram/1024UL);
> >> -drm_printf(p, "amd-evicted-visible-vram:\t%llu KiB\n",
> >> -   stats.evicted_visible_vram/1024UL);
> >> -drm_printf(p, "amd-requested-vram:\t%llu KiB\n",
> >> -   stats.requested_vram/1024UL);
> >> -drm_printf(p, "amd-requested-visible-vram:\t%llu KiB\n",
> >> -   stats.requested_visible_vram/1024UL);
> >>   drm_printf(p, "amd-requested-gtt:\t%llu KiB\n",
> >>  stats.requested_gtt/1024UL);
> >> -drm_printf(p, "drm-shared-vram:\t%llu KiB\n",
> >> stats.vram_shared/1024UL);
> >>   drm_printf(p, "drm-shared-gtt:\t%llu KiB\n",
> >> stats.gtt_shared/1024UL);
> >>   drm_printf(p, "drm-shared-cpu:\t%llu KiB\n",
> >> stats.cpu_shared/1024UL);
> >> +if (!adev->gmc.is_app_apu) {
> >> +drm_printf(p, "drm-memory-vram:\t%llu KiB\n",
> >> +   stats.vram/1024UL);
> >> +drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n",
> >> +   stats.visible_vram/1024UL);
> >> +drm_printf(p, "amd-evicted-vram:\t%llu KiB\n",
> >> +   stats.evicted_vram/1024UL);
> >> +drm_printf(p, "amd-evicted-visible-vram:\t%llu KiB\n",
> >> +   stats.evicted_visible_vram/1024UL);
> >> +drm_printf(p, "amd-requested-vram:\t%llu KiB\n",
> >> +   stats.requested_vram/1024UL);
> >> +drm_printf(p, "amd-requested-visible-vram:\t%llu KiB\n",
> >> +   stats.requested_visible_vram/1024UL);
> >> +drm_printf(p, "drm-shared-vram:\t%llu KiB\n",
> >> +   stats.vram_shared/1024UL);
> >> +}
> >> +
> >>   for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) {
> >>   if (!usage[hw_ip])
> >>   continue;
> >

Re: [PATCH] Documentation/gpu: Document the situation with unqualified drm-memory-

2024-05-03 Thread Alex Deucher

On Fri, May 3, 2024 at 8:58 AM Tvrtko Ursulin  wrote:
>
>
> [And I forgot dri-devel.. doing well!]
>
> On 03/05/2024 13:40, Tvrtko Ursulin wrote:
> >
> > [Correcting Christian's email]
> >
> > On 03/05/2024 13:36, Tvrtko Ursulin wrote:
> >> From: Tvrtko Ursulin 
> >>
> >> Currently it is not well defined what is drm-memory- compared to other
> >> categories.
> >>
> >> In practice the only driver which emits these keys is amdgpu and in them
> >> exposes the total memory use (including shared).
> >>
> >> Document that drm-memory- and drm-total-memory- are aliases to prevent
> >> any
> >> confusion in the future.
> >>
> >> While at it also clarify that the reserved sub-string 'memory' refers to
> >> the memory region component.
> >>
> >> Signed-off-by: Tvrtko Ursulin 
> >> Cc: Alex Deucher 
> >> Cc: Christian König 
> >
> > Mea culpa, I copied the mistake from
> > 77d17c4cd0bf52eacfad88e63e8932eb45d643c5. :)
> >

I'm not following.  What is the mistake from that commit?

> > Regards,
> >
> > Tvrtko
> >
> >> Cc: Rob Clark 
> >> ---
> >>   Documentation/gpu/drm-usage-stats.rst | 10 +-
> >>   1 file changed, 9 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/Documentation/gpu/drm-usage-stats.rst
> >> b/Documentation/gpu/drm-usage-stats.rst
> >> index 6dc299343b48..ef5c0a0aa477 100644
> >> --- a/Documentation/gpu/drm-usage-stats.rst
> >> +++ b/Documentation/gpu/drm-usage-stats.rst
> >> @@ -128,7 +128,9 @@ Memory
> >>   Each possible memory type which can be used to store buffer objects
> >> by the
> >>   GPU in question shall be given a stable and unique name to be
> >> returned as the
> >> -string here.  The name "memory" is reserved to refer to normal system
> >> memory.
> >> +string here.
> >> +
> >> +The region name "memory" is reserved to refer to normal system memory.

Is this supposed to mean drm-memory-memory?  That was my impression,
but that seems sort of weird.  Maybe we should just drop that
sentence.

Alex

> >>   Value shall reflect the amount of storage currently consumed by the
> >> buffer
> >>   objects belong to this client, in the respective memory region.
> >> @@ -136,6 +138,9 @@ objects belong to this client, in the respective
> >> memory region.
> >>   Default unit shall be bytes with optional unit specifiers of 'KiB'
> >> or 'MiB'
> >>   indicating kibi- or mebi-bytes.
> >> +This is an alias for drm-total- and only one of the two
> >> should be
> >> +present.
> >> +
> >>   - drm-shared-:  [KiB|MiB]
> >>   The total size of buffers that are shared with another file (e.g.,
> >> have more
> >> @@ -145,6 +150,9 @@ than a single handle).
> >>   The total size of buffers that including shared and private memory.
> >> +This is an alias for drm-memory- and only one of the two
> >> should be
> >> +present.
> >> +
> >>   - drm-resident-:  [KiB|MiB]
> >>   The total size of buffers that are resident in the specified region.

Re: Splat during driver probe

2024-05-03 Thread Alex Deucher

Possibly related to Arun's new memory clearing changes.  @Arunpravin
can you take a look?

Alex

On Fri, May 3, 2024 at 9:10 AM Tvrtko Ursulin  wrote:
>
>
> Hi,
>
> I tried asking on #radeon yesterday but no takers. I was wondering if
> the below splat is already known or not.
>
> Appears to happen due amdgpu_bo_release_notify genuniely running before
> amdgpu_ttm_set_buffer_funcs_status set the buffer funcs to enabled
> during device probe.
>
> I couldn't easily figure out what changed.
>
> Regards,
>
> Tvrtko
>
> [   11.802030] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear
> memory with ring turned off.
> [   11.802519] [ cut here ]
> [   11.802530] WARNING: CPU: 5 PID: 435 at
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1378
> amdgpu_bo_release_notify+0x20c/0x230 [amdgpu]
> [   11.802916] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6
> nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
> amdgpu(E+) nft_chain_nat nf_tables ip6table_nat ip6table_mangle
> ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
> iptable_security nfnetlink ip6table_filter ip6_tables iptable_filter
> cmac algif_hash ramoops algif_skcipher reed_solomon af_alg bnep qrtr
> mousedev intel_rapl_msr ath11k_pci(+) intel_rapl_common mhi
> snd_acp_sof_mach snd_acp_mach snd_sof_probes ath11k snd_soc_dmic kvm_amd
> cdc_mbim qmi_helpers joydev hid_multitouch cdc_wdm snd_sof_amd_vangogh
> snd_sof_pci kvm mac80211 snd_hda_codec_hdmi hci_uart crct10dif_pclmul
> amdxcp i2c_algo_bit snd_sof_amd_acp btqca drm_ttm_helper libarc4
> crc32_pclmul btrtl snd_hda_intel snd_sof_xtensa_dsp btbcm ttm
> snd_intel_dspcfg btintel polyval_clmulni snd_sof drm_exec
> polyval_generic snd_hda_codec snd_sof_utils gpu_sched cfg80211 bluetooth
> snd_pci_acp5x gf128mul cdc_ncm
> [   11.803070]  sp5100_tco snd_hwdep ghash_clmulni_intel snd_soc_nau8821
> snd_soc_max98388 drm_suballoc_helper sha512_ssse3 cdc_ether
> snd_acp_config drm_buddy atkbd snd_soc_core aesni_intel snd_hda_core
> video ecdh_generic crypto_simd libps2 usbnet cryptd rapl mii wdat_wdt
> vivaldi_fmap pcspkr snd_compress i2c_piix4 snd_pcm drm_display_helper
> ccp snd_soc_acpi cdc_acm rfkill wmi ltrf216a 8250_dw i2c_hid_acpi
> snd_timer snd i2c_hid industrialio soundcore hid_steam pkcs8_key_parser
> crypto_user loop fuse dm_mod ip_tables x_tables overlay ext4 crc16
> mbcache jbd2 usbhid vfat fat btrfs blake2b_generic libcrc32c
> crc32c_generic xor raid6_pq sdhci_pci cqhci nvme serio_raw sdhci
> xhci_pci xhci_pci_renesas crc32c_intel mmc_core nvme_core i8042 serio
> spi_amd
> [   11.803448] CPU: 5 PID: 435 Comm: (udev-worker) Tainted: GW
> E  6.9.0-rc6 #3 dd3fb65c639fd86ff89731c604b1e804996e8989
> [   11.803471] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023
> [   11.803486] RIP: 0010:amdgpu_bo_release_notify+0x20c/0x230 [amdgpu]
> [   11.803857] Code: 0b e9 a4 fe ff ff 48 ba ff ff ff ff ff ff ff 7f 31
> f6 4c 89 e7 e8 44 e5 33 e6 eb 8d e8 dd dd 33 e6 eb a7 0f 0b e9 4d fe ff
> ff <0f> 0b eb 9c be 03 00 00 00 e8 f6 04 00 e6 eb 90 e8 8f 64 79 e6 66
> [   11.803890] RSP: 0018:a77bc1537568 EFLAGS: 00010282
> [   11.803904] RAX: ffea RBX: 8e1f0da89048 RCX:
> 
> [   11.803919] RDX:  RSI: 0027 RDI:
> 
> [   11.803934] RBP: 8e1f6ec0ffb0 R08:  R09:
> 0003
> [   11.803948] R10: a77bc15372f0 R11: 8e223ef7ffe8 R12:
> 8e1f0da89000
> [   11.803963] R13: 8e1f0da89180 R14: 8e1f6ec0ffb0 R15:
> 8e1f6ec0
> [   11.803977] FS:  7fa2b600b200() GS:8e222ee8()
> knlGS:
> [   11.803994] CS:  0010 DS:  ES:  CR0: 80050033
> [   11.804007] CR2: 55cac2aa7080 CR3: 00010f818000 CR4:
> 00350ef0
> [   11.804021] Call Trace:
> [   11.804031]  
> [   11.804042]  ? __warn+0x8c/0x180
> [   11.804057]  ? amdgpu_bo_release_notify+0x20c/0x230 [amdgpu
> 03b1dca7e09918e3f45efb1acdfc90682f73e4c1]
> [   11.804456]  ? report_bug+0x191/0x1c0
> [   11.804473]  ? handle_bug+0x3a/0x70
> [   11.804485]  ? exc_invalid_op+0x17/0x70
> [   11.804497]  ? asm_exc_invalid_op+0x1a/0x20
> [   11.804517]  ? amdgpu_bo_release_notify+0x20c/0x230 [amdgpu
> 03b1dca7e09918e3f45efb1acdfc90682f73e4c1]
> [   11.804894]  ? amdgpu_bo_release_notify+0x14a/0x230 [amdgpu
> 03b1dca7e09918e3f45efb1acdfc90682f73e4c1]
> [   11.805277]  ttm_bo_release+0x115/0x330 [ttm
> 39c917822ce73b2a754d4183af7a0990991c66be]
> [   11.805303]  ? srso_return_thunk+0x5/0x5f
> [   11.805315]  ? __mutex_unlock_slowpath+0x3a/0x290
> [   11.805332]  amdgpu_bo_free_kernel+0xd6/0x130 [amdgpu
> 03b1dca7e09918e3f45efb1acdfc90682f73e4c1]
> [   11.805709]  dm_helpers_free_gpu_mem+0x41/0x80 [amdgpu
> 03b1dca7e09918e3f45efb1acdfc90682f73e4c1]
> [   11.806172]  vg_clk_mgr_construct+0x2c4/0x490 [amdgpu
> 03b1dca7e09918e3f45efb1acdfc90682f73e4c1]

Re: [PATCH] Documentation/gpu: Document the situation with unqualified drm-memory-

2024-05-03 Thread Tvrtko Ursulin




[And I forgot dri-devel.. doing well!]

On 03/05/2024 13:40, Tvrtko Ursulin wrote:


[Correcting Christian's email]

On 03/05/2024 13:36, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Currently it is not well defined what is drm-memory- compared to other
categories.

In practice the only driver which emits these keys is amdgpu and in them
exposes the total memory use (including shared).

Document that drm-memory- and drm-total-memory- are aliases to prevent 
any

confusion in the future.

While at it also clarify that the reserved sub-string 'memory' refers to
the memory region component.

Signed-off-by: Tvrtko Ursulin 
Cc: Alex Deucher 
Cc: Christian König 


Mea culpa, I copied the mistake from 
77d17c4cd0bf52eacfad88e63e8932eb45d643c5. :)


Regards,

Tvrtko


Cc: Rob Clark 
---
  Documentation/gpu/drm-usage-stats.rst | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/gpu/drm-usage-stats.rst 
b/Documentation/gpu/drm-usage-stats.rst

index 6dc299343b48..ef5c0a0aa477 100644
--- a/Documentation/gpu/drm-usage-stats.rst
+++ b/Documentation/gpu/drm-usage-stats.rst
@@ -128,7 +128,9 @@ Memory
  Each possible memory type which can be used to store buffer objects 
by the
  GPU in question shall be given a stable and unique name to be 
returned as the
-string here.  The name "memory" is reserved to refer to normal system 
memory.

+string here.
+
+The region name "memory" is reserved to refer to normal system memory.
  Value shall reflect the amount of storage currently consumed by the 
buffer

  objects belong to this client, in the respective memory region.
@@ -136,6 +138,9 @@ objects belong to this client, in the respective 
memory region.
  Default unit shall be bytes with optional unit specifiers of 'KiB' 
or 'MiB'

  indicating kibi- or mebi-bytes.
+This is an alias for drm-total- and only one of the two 
should be

+present.
+
  - drm-shared-:  [KiB|MiB]
  The total size of buffers that are shared with another file (e.g., 
have more

@@ -145,6 +150,9 @@ than a single handle).
  The total size of buffers that including shared and private memory.
+This is an alias for drm-memory- and only one of the two 
should be

+present.
+
  - drm-resident-:  [KiB|MiB]
  The total size of buffers that are resident in the specified region.

Re: [linux-next:master] BUILD REGRESSION 9c6ecb3cb6e20c4fd7997047213ba0efcf9ada1a

2024-05-03 Thread Krishna Kurapati PSSNV





On 5/3/2024 11:58 AM, Greg KH wrote:

On Fri, May 03, 2024 at 11:30:50AM +0530, Krishna Kurapati PSSNV wrote:



On 5/3/2024 10:42 AM, Greg KH wrote:

Ok, I'm getting tired of seeing these for the USB portion of the tree,
so I went to look for:

On Fri, May 03, 2024 at 04:44:42AM +0800, kernel test robot wrote:

|-- arc-randconfig-002-20240503
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used


This warning (same for all arches), but can't seem to find it anywhere.

Any hints as to where it would be?



Hi Greg,

  I think the hw_mode was not removed in hs_phy_setup and left unused.

  Thinh reported the same when there was a merge conflict into linux next
(that the hw_mode variable was removed in ss_phy_setup and should be removed
in hs_phy_setup as well):

https://lore.kernel.org/all/20240426213923.tyeddub4xszyp...@synopsys.com/

  Perhaps that was missed ?


Must have been.  I need it in a format that it can be applied in (a
2-way diff can't apply...)



I just checked it with W=1 and it is causing the issue:

/local/mnt/workspace/krishna/linux-next/skales_test/skales/kernel/drivers/usb/dwc3/core.c: 
In function 'dwc3_hs_phy_setup':
/local/mnt/workspace/krishna/linux-next/skales_test/skales/kernel/drivers/usb/dwc3/core.c:679:15: 
warning: variable 'hw_mode' set but not used [-Wunused-but-set-variable]

  unsigned int hw_mode;
   ^

I can send a patch to fix it up. Also, just wanted to confirm if  I skip 
the fixes and CC tags as the main patch wasn't yet merged into any 
stable trees ?


Regards,
Krishna,

Re: [PATCH 1/3] drm/amdgpu: Add amdgpu_bo_is_vm_bo helper

2024-05-03 Thread Tvrtko Ursulin




On 03/05/2024 07:26, Christian König wrote:

Am 29.04.24 um 18:47 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Help code readability by replacing a bunch of:

bo->tbo.base.resv == vm->root.bo->tbo.base.resv

With:

amdgpu_vm_is_bo_always_valid(vm, bo)

No functional changes.

v2:
  * Rename helper and move to amdgpu_vm. (Christian)

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 40 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  2 ++
  3 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c

index 67c234bcf89f..e698d65e9508 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -174,7 +174,7 @@ static int amdgpu_gem_object_open(struct 
drm_gem_object *obj,

  return -EPERM;
  if (abo->flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID &&
-    abo->tbo.base.resv != vm->root.bo->tbo.base.resv)
+    !amdgpu_vm_is_bo_always_valid(vm, abo))
  return -EPERM;
  r = amdgpu_bo_reserve(abo, false);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index 8af3f0fd3073..01ca4b35b369 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -333,7 +333,7 @@ void amdgpu_vm_bo_base_init(struct 
amdgpu_vm_bo_base *base,

  base->next = bo->vm_bo;
  bo->vm_bo = base;
-    if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
+    if (!amdgpu_vm_is_bo_always_valid(vm, bo))
  return;
  dma_resv_assert_held(vm->root.bo->tbo.base.resv);
@@ -1101,13 +1101,13 @@ static void amdgpu_vm_bo_get_memory(struct 
amdgpu_bo_va *bo_va,

   * For now ignore BOs which are currently locked and potentially
   * changing their location.
   */
-    if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv &&
+    if (!amdgpu_vm_is_bo_always_valid(vm, bo) &&
  !dma_resv_trylock(bo->tbo.base.resv))
  return;
  amdgpu_bo_get_memory(bo, stats);
-    if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
-    dma_resv_unlock(bo->tbo.base.resv);
+    if (amdgpu_vm_is_bo_always_valid(vm, bo))
+    dma_resv_unlock(bo->tbo.base.resv);
  }
  void amdgpu_vm_get_memory(struct amdgpu_vm *vm,
@@ -1203,8 +1203,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device 
*adev, struct amdgpu_bo_va *bo_va,

  uncached = false;
  }
-    if (clear || (bo && bo->tbo.base.resv ==
-  vm->root.bo->tbo.base.resv))
+    if (clear || amdgpu_vm_is_bo_always_valid(vm, bo))
  last_update = &vm->last_update;
  else
  last_update = &bo_va->last_pt_update;
@@ -1246,7 +1245,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device 
*adev, struct amdgpu_bo_va *bo_va,

   * the evicted list so that it gets validated again on the
   * next command submission.
   */
-    if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) {
+    if (amdgpu_vm_is_bo_always_valid(vm, bo)) {
  uint32_t mem_type = bo->tbo.resource->mem_type;
  if (!(bo->preferred_domains &
@@ -1640,10 +1639,9 @@ static void amdgpu_vm_bo_insert_map(struct 
amdgpu_device *adev,

  if (mapping->flags & AMDGPU_PTE_PRT)
  amdgpu_vm_prt_get(adev);
-    if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
-    !bo_va->base.moved) {
+    if (amdgpu_vm_is_bo_always_valid(vm, bo) && !bo_va->base.moved)
  amdgpu_vm_bo_moved(&bo_va->base);
-    }
+
  trace_amdgpu_vm_bo_map(bo_va, mapping);
  }
@@ -1922,7 +1920,7 @@ int amdgpu_vm_bo_clear_mappings(struct 
amdgpu_device *adev,

  if (before->flags & AMDGPU_PTE_PRT)
  amdgpu_vm_prt_get(adev);
-    if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
+    if (amdgpu_vm_is_bo_always_valid(vm, bo) &&
  !before->bo_va->base.moved)
  amdgpu_vm_bo_moved(&before->bo_va->base);
  } else {
@@ -1937,7 +1935,7 @@ int amdgpu_vm_bo_clear_mappings(struct 
amdgpu_device *adev,

  if (after->flags & AMDGPU_PTE_PRT)
  amdgpu_vm_prt_get(adev);
-    if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
+    if (amdgpu_vm_is_bo_always_valid(vm, bo) &&
  !after->bo_va->base.moved)
  amdgpu_vm_bo_moved(&after->bo_va->base);
  } else {
@@ -2017,7 +2015,7 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
  if (bo) {
  dma_resv_assert_held(bo->tbo.base.resv);
-    if (bo->tbo.base.resv == vm->root.bo->tbo.base.resv)
+    if (amdgpu_vm_is_bo_always_valid(vm, bo))
  ttm_bo_set_bulk_move(&bo->tbo, NULL);
  for (base = &bo_va->base.bo->vm_bo; *base;
@@ -2111,7 +2109,7 @@ void amdgpu_vm_bo_invalidate(struct 
amdgpu_device *adev,

  for (bo_base = bo->vm_bo; bo_base; bo_base = bo_base->next) {
  struct amdgpu_vm *vm

Splat during driver probe

2024-05-03 Thread Tvrtko Ursulin




Hi,

I tried asking on #radeon yesterday but no takers. I was wondering if 
the below splat is already known or not.


Appears to happen due amdgpu_bo_release_notify genuniely running before 
amdgpu_ttm_set_buffer_funcs_status set the buffer funcs to enabled 
during device probe.


I couldn't easily figure out what changed.

Regards,

Tvrtko

[   11.802030] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear 
memory with ring turned off.

[   11.802519] [ cut here ]
[   11.802530] WARNING: CPU: 5 PID: 435 at 
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1378 
amdgpu_bo_release_notify+0x20c/0x230 [amdgpu]
[   11.802916] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 
nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct 
amdgpu(E+) nft_chain_nat nf_tables ip6table_nat ip6table_mangle 
ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack 
nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw 
iptable_security nfnetlink ip6table_filter ip6_tables iptable_filter 
cmac algif_hash ramoops algif_skcipher reed_solomon af_alg bnep qrtr 
mousedev intel_rapl_msr ath11k_pci(+) intel_rapl_common mhi 
snd_acp_sof_mach snd_acp_mach snd_sof_probes ath11k snd_soc_dmic kvm_amd 
cdc_mbim qmi_helpers joydev hid_multitouch cdc_wdm snd_sof_amd_vangogh 
snd_sof_pci kvm mac80211 snd_hda_codec_hdmi hci_uart crct10dif_pclmul 
amdxcp i2c_algo_bit snd_sof_amd_acp btqca drm_ttm_helper libarc4 
crc32_pclmul btrtl snd_hda_intel snd_sof_xtensa_dsp btbcm ttm 
snd_intel_dspcfg btintel polyval_clmulni snd_sof drm_exec 
polyval_generic snd_hda_codec snd_sof_utils gpu_sched cfg80211 bluetooth 
snd_pci_acp5x gf128mul cdc_ncm
[   11.803070]  sp5100_tco snd_hwdep ghash_clmulni_intel snd_soc_nau8821 
snd_soc_max98388 drm_suballoc_helper sha512_ssse3 cdc_ether 
snd_acp_config drm_buddy atkbd snd_soc_core aesni_intel snd_hda_core 
video ecdh_generic crypto_simd libps2 usbnet cryptd rapl mii wdat_wdt 
vivaldi_fmap pcspkr snd_compress i2c_piix4 snd_pcm drm_display_helper 
ccp snd_soc_acpi cdc_acm rfkill wmi ltrf216a 8250_dw i2c_hid_acpi 
snd_timer snd i2c_hid industrialio soundcore hid_steam pkcs8_key_parser 
crypto_user loop fuse dm_mod ip_tables x_tables overlay ext4 crc16 
mbcache jbd2 usbhid vfat fat btrfs blake2b_generic libcrc32c 
crc32c_generic xor raid6_pq sdhci_pci cqhci nvme serio_raw sdhci 
xhci_pci xhci_pci_renesas crc32c_intel mmc_core nvme_core i8042 serio 
spi_amd
[   11.803448] CPU: 5 PID: 435 Comm: (udev-worker) Tainted: GW 
E  6.9.0-rc6 #3 dd3fb65c639fd86ff89731c604b1e804996e8989

[   11.803471] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023
[   11.803486] RIP: 0010:amdgpu_bo_release_notify+0x20c/0x230 [amdgpu]
[   11.803857] Code: 0b e9 a4 fe ff ff 48 ba ff ff ff ff ff ff ff 7f 31 
f6 4c 89 e7 e8 44 e5 33 e6 eb 8d e8 dd dd 33 e6 eb a7 0f 0b e9 4d fe ff 
ff <0f> 0b eb 9c be 03 00 00 00 e8 f6 04 00 e6 eb 90 e8 8f 64 79 e6 66

[   11.803890] RSP: 0018:a77bc1537568 EFLAGS: 00010282
[   11.803904] RAX: ffea RBX: 8e1f0da89048 RCX: 

[   11.803919] RDX:  RSI: 0027 RDI: 

[   11.803934] RBP: 8e1f6ec0ffb0 R08:  R09: 
0003
[   11.803948] R10: a77bc15372f0 R11: 8e223ef7ffe8 R12: 
8e1f0da89000
[   11.803963] R13: 8e1f0da89180 R14: 8e1f6ec0ffb0 R15: 
8e1f6ec0
[   11.803977] FS:  7fa2b600b200() GS:8e222ee8() 
knlGS:

[   11.803994] CS:  0010 DS:  ES:  CR0: 80050033
[   11.804007] CR2: 55cac2aa7080 CR3: 00010f818000 CR4: 
00350ef0

[   11.804021] Call Trace:
[   11.804031]  
[   11.804042]  ? __warn+0x8c/0x180
[   11.804057]  ? amdgpu_bo_release_notify+0x20c/0x230 [amdgpu 
03b1dca7e09918e3f45efb1acdfc90682f73e4c1]

[   11.804456]  ? report_bug+0x191/0x1c0
[   11.804473]  ? handle_bug+0x3a/0x70
[   11.804485]  ? exc_invalid_op+0x17/0x70
[   11.804497]  ? asm_exc_invalid_op+0x1a/0x20
[   11.804517]  ? amdgpu_bo_release_notify+0x20c/0x230 [amdgpu 
03b1dca7e09918e3f45efb1acdfc90682f73e4c1]
[   11.804894]  ? amdgpu_bo_release_notify+0x14a/0x230 [amdgpu 
03b1dca7e09918e3f45efb1acdfc90682f73e4c1]
[   11.805277]  ttm_bo_release+0x115/0x330 [ttm 
39c917822ce73b2a754d4183af7a0990991c66be]

[   11.805303]  ? srso_return_thunk+0x5/0x5f
[   11.805315]  ? __mutex_unlock_slowpath+0x3a/0x290
[   11.805332]  amdgpu_bo_free_kernel+0xd6/0x130 [amdgpu 
03b1dca7e09918e3f45efb1acdfc90682f73e4c1]
[   11.805709]  dm_helpers_free_gpu_mem+0x41/0x80 [amdgpu 
03b1dca7e09918e3f45efb1acdfc90682f73e4c1]
[   11.806172]  vg_clk_mgr_construct+0x2c4/0x490 [amdgpu 
03b1dca7e09918e3f45efb1acdfc90682f73e4c1]
[   11.806645]  dc_clk_mgr_create+0x390/0x580 [amdgpu 
03b1dca7e09918e3f45efb1acdfc90682f73e4c1]
[   11.807112]  dc_create+0x28a/0x640 [amdgpu 
03b1dca7e09918e3f45efb1acdfc90682f73e4c1]
[   11.807567]  amdgpu_dm_init.isra.0+0x2e1/0x1fe0 [amdgpu 
03b1dca7e09918e3f45efb1acdfc9068

Re: [PATCH] Documentation/gpu: Document the situation with unqualified drm-memory-

2024-05-03 Thread Tvrtko Ursulin




[Correcting Christian's email]

On 03/05/2024 13:36, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Currently it is not well defined what is drm-memory- compared to other
categories.

In practice the only driver which emits these keys is amdgpu and in them
exposes the total memory use (including shared).

Document that drm-memory- and drm-total-memory- are aliases to prevent any
confusion in the future.

While at it also clarify that the reserved sub-string 'memory' refers to
the memory region component.

Signed-off-by: Tvrtko Ursulin 
Cc: Alex Deucher 
Cc: Christian König 


Mea culpa, I copied the mistake from 
77d17c4cd0bf52eacfad88e63e8932eb45d643c5. :)


Regards,

Tvrtko


Cc: Rob Clark 
---
  Documentation/gpu/drm-usage-stats.rst | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/gpu/drm-usage-stats.rst 
b/Documentation/gpu/drm-usage-stats.rst
index 6dc299343b48..ef5c0a0aa477 100644
--- a/Documentation/gpu/drm-usage-stats.rst
+++ b/Documentation/gpu/drm-usage-stats.rst
@@ -128,7 +128,9 @@ Memory
  
  Each possible memory type which can be used to store buffer objects by the

  GPU in question shall be given a stable and unique name to be returned as the
-string here.  The name "memory" is reserved to refer to normal system memory.
+string here.
+
+The region name "memory" is reserved to refer to normal system memory.
  
  Value shall reflect the amount of storage currently consumed by the buffer

  objects belong to this client, in the respective memory region.
@@ -136,6 +138,9 @@ objects belong to this client, in the respective memory 
region.
  Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
  indicating kibi- or mebi-bytes.
  
+This is an alias for drm-total- and only one of the two should be

+present.
+
  - drm-shared-:  [KiB|MiB]
  
  The total size of buffers that are shared with another file (e.g., have more

@@ -145,6 +150,9 @@ than a single handle).
  
  The total size of buffers that including shared and private memory.
  
+This is an alias for drm-memory- and only one of the two should be

+present.
+
  - drm-resident-:  [KiB|MiB]
  
  The total size of buffers that are resident in the specified region.

Re: [RFC 0/5] Add capacity key to fdinfo

2024-05-03 Thread Tvrtko Ursulin




On 02/05/2024 16:00, Alex Deucher wrote:

On Thu, May 2, 2024 at 10:43 AM Tvrtko Ursulin
 wrote:



On 02/05/2024 14:07, Christian König wrote:

Am 01.05.24 um 15:27 schrieb Tvrtko Ursulin:


Hi Alex,

On 30/04/2024 19:32, Alex Deucher wrote:

On Tue, Apr 30, 2024 at 1:27 PM Tvrtko Ursulin 
wrote:


From: Tvrtko Ursulin 

I have noticed AMD GPUs can have more than one "engine" (ring?) of
the same type
but amdgpu is not reporting that in fdinfo using the capacity engine
tag.

This series is therefore an attempt to improve that, but only an RFC
since it is
quite likely I got stuff wrong on the first attempt. Or if not wrong
it may not
be very beneficial in AMDs case.

So I tried to figure out how to count and store the number of
instances of an
"engine" type and spotted that could perhaps be used in more than
one place in
the driver. I was more than a little bit confused by the ip_instance
and uapi
rings, then how rings are selected to context entities internally.
Anyway..
hopefully it is a simple enough series to easily spot any such large
misses.

End result should be that, assuming two "engine" instances, one
fully loaded and
one idle will only report client using 50% of that engine type.


That would only be true if there are multiple instantiations of the IP
on the chip which in most cases is not true.  In most cases there is
one instance of the IP that can be fed from multiple rings. E.g. for
graphics and compute, all of the rings ultimately feed into the same
compute units on the chip.  So if you have a gfx ring and a compute
rings, you can schedule work to them asynchronously, but ultimately
whether they execute serially or in parallel depends on the actual
shader code in the command buffers and the extent to which it can
utilize the available compute units in the shader cores.


This is the same as with Intel/i915. Fdinfo is not intended to provide
utilisation of EUs and such, just how busy are the "entities" kernel
submits to. So doing something like in this series would make the
reporting more similar between the two drivers.

I think both the 0-800% or 0-100% range (taking 8 ring compute as an
example) can be misleading for different workloads. Neither <800% in
the former means one can send more work and same for <100% in the latter.


Yeah, I think that's what Alex tries to describe. By using 8 compute
rings your 800% load is actually incorrect and quite misleading.

Background is that those 8 compute rings won't be active all at the same
time, but rather waiting on each other for resources.

But this "waiting" is unfortunately considered execution time since the
used approach is actually not really capable of separating waiting and
execution time.


Right, so 800% is what gputop could be suggesting today, by the virtue 8
context/clients can each use 100% if they only use a subset of compute
units. I was proposing to expose the capacity in fdinfo so it can be
scaled down and then dicussing how both situation have pros and cons.


There is also a parallel with the CPU world here and hyper threading,
if not wider, where "What does 100% actually mean?" is also wishy-washy.

Also note that the reporting of actual time based values in fdinfo
would not changing with this series.

Of if you can guide me towards how to distinguish real vs fake
parallelism in HW IP blocks I could modify the series to only add
capacity tags where there are truly independent blocks. That would be
different from i915 though were I did not bother with that
distinction. (For reasons that assignment of for instance EUs to
compute "rings" (command streamers in i915) was supposed to be
possible to re-configure on the fly. So it did not make sense to try
and be super smart in fdinfo.)


Well exactly that's the point we don't really have truly independent
blocks on AMD hardware.

There are things like independent SDMA instances, but those a meant to
be used like the first instance for uploads and the second for downloads
etc.. When you use both instances for the same job they will pretty much
limit each other because of a single resource.


So _never_ multiple instances of the same IP block? No video decode,
encode, anything?


Some chips have multiple encode/decode IP blocks that are actually
separate instances, however, we load balance between them so userspace
sees just one engine.  Also in some cases they are asymmetric (e.g.,
different sets of supported CODECs on each instance).  The driver
handles this by inspecting the command buffer and scheduling on the
appropriate instance based on the requested CODEC.  SDMA also supports
multiple IP blocks that are independent.


Similar to i915 just that we don't inspect buffers but expose the 
instance capabilities and userspace is responsible to set up the load 
balancing engine with the correct physical mask.


Anyway, back to the main point - are you interested at all for me to add 
the capacity flags to at least the IP blocks which are probed to exist 
more than a singleton? An

Re: [PATCH 3/3] RDMA/hfi1: Use RMW accessors for changing LNKCTL2

2024-05-03 Thread Jason Gunthorpe

On Fri, May 03, 2024 at 01:18:35PM +0300, Ilpo Järvinen wrote:
> On Thu, 15 Feb 2024, Ilpo Järvinen wrote:
> 
> > Convert open coded RMW accesses for LNKCTL2 to use
> > pcie_capability_clear_and_set_word() which makes its easier to
> > understand what the code tries to do.
> > 
> > LNKCTL2 is not really owned by any driver because it is a collection of
> > control bits that PCI core might need to touch. RMW accessors already
> > have support for proper locking for a selected set of registers
> > (LNKCTL2 is not yet among them but likely will be in the future) to
> > avoid losing concurrent updates.
> > 
> > Suggested-by: Lukas Wunner 
> > Signed-off-by: Ilpo Järvinen 
> > Reviewed-by: Dean Luick 
> 
> I found out from Linux RDMA and InfiniBand patchwork that this patch had 
> been silently closed as "Not Applicable". Is there some reason for
> that?

It is part of a series that crosses subsystems, series like that
usually go through some other trees.

If you want single patches applied then please send single
patches.. It is hard to understand intent from mixed series.

Jason

Re: [PATCH 3/3] RDMA/hfi1: Use RMW accessors for changing LNKCTL2

2024-05-03 Thread Ilpo Järvinen

On Thu, 15 Feb 2024, Ilpo Järvinen wrote:

> Convert open coded RMW accesses for LNKCTL2 to use
> pcie_capability_clear_and_set_word() which makes its easier to
> understand what the code tries to do.
> 
> LNKCTL2 is not really owned by any driver because it is a collection of
> control bits that PCI core might need to touch. RMW accessors already
> have support for proper locking for a selected set of registers
> (LNKCTL2 is not yet among them but likely will be in the future) to
> avoid losing concurrent updates.
> 
> Suggested-by: Lukas Wunner 
> Signed-off-by: Ilpo Järvinen 
> Reviewed-by: Dean Luick 

I found out from Linux RDMA and InfiniBand patchwork that this patch had 
been silently closed as "Not Applicable". Is there some reason for that?

I was sending this change independently out (among 2 similar ones that 
already got applied) so I wouldn't need to keep carrying it within my PCIe 
bandwidth controller series. It seemed useful enough as cleanups to stand 
on its own legs w/o requiring it to be part of PCIe bw controller series.

Should I resend the patch or do RDMA/IB maintainers prefer it to remain 
as a part of PCIe BW controller series?

-- 
 i.

> ---
>  drivers/infiniband/hw/hfi1/pcie.c | 30 --
>  1 file changed, 8 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/hfi1/pcie.c 
> b/drivers/infiniband/hw/hfi1/pcie.c
> index 119ec2f1382b..7133964749f8 100644
> --- a/drivers/infiniband/hw/hfi1/pcie.c
> +++ b/drivers/infiniband/hw/hfi1/pcie.c
> @@ -1207,14 +1207,11 @@ int do_pcie_gen3_transition(struct hfi1_devdata *dd)
>   (u32)lnkctl2);
>   /* only write to parent if target is not as high as ours */
>   if ((lnkctl2 & PCI_EXP_LNKCTL2_TLS) < target_vector) {
> - lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> - lnkctl2 |= target_vector;
> - dd_dev_info(dd, "%s: ..new link control2: 0x%x\n", __func__,
> - (u32)lnkctl2);
> - ret = pcie_capability_write_word(parent,
> -  PCI_EXP_LNKCTL2, lnkctl2);
> + ret = pcie_capability_clear_and_set_word(parent, 
> PCI_EXP_LNKCTL2,
> +  PCI_EXP_LNKCTL2_TLS,
> +  target_vector);
>   if (ret) {
> - dd_dev_err(dd, "Unable to write to PCI config\n");
> + dd_dev_err(dd, "Unable to change parent PCI target 
> speed\n");
>   return_error = 1;
>   goto done;
>   }
> @@ -1223,22 +1220,11 @@ int do_pcie_gen3_transition(struct hfi1_devdata *dd)
>   }
>  
>   dd_dev_info(dd, "%s: setting target link speed\n", __func__);
> - ret = pcie_capability_read_word(dd->pcidev, PCI_EXP_LNKCTL2, &lnkctl2);
> + ret = pcie_capability_clear_and_set_word(dd->pcidev, PCI_EXP_LNKCTL2,
> +  PCI_EXP_LNKCTL2_TLS,
> +  target_vector);
>   if (ret) {
> - dd_dev_err(dd, "Unable to read from PCI config\n");
> - return_error = 1;
> - goto done;
> - }
> -
> - dd_dev_info(dd, "%s: ..old link control2: 0x%x\n", __func__,
> - (u32)lnkctl2);
> - lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> - lnkctl2 |= target_vector;
> - dd_dev_info(dd, "%s: ..new link control2: 0x%x\n", __func__,
> - (u32)lnkctl2);
> - ret = pcie_capability_write_word(dd->pcidev, PCI_EXP_LNKCTL2, lnkctl2);
> - if (ret) {
> - dd_dev_err(dd, "Unable to write to PCI config\n");
> + dd_dev_err(dd, "Unable to change device PCI target speed\n");
>   return_error = 1;
>   goto done;
>   }
>

Allow setting a power cap that's lower than recommended

2024-05-03 Thread fililip

This patch allows setting a low power cap if ignore_min_pcap is set to 1.

Signed-off-by: fililip diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 50f57d4dfd8f..b83697b75221 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -151,6 +151,7 @@ struct amdgpu_watchdog_timer
  */
 extern int amdgpu_modeset;
 extern unsigned int amdgpu_vram_limit;
+extern int amdgpu_ignore_min_pcap;
 extern int amdgpu_vis_vram_limit;
 extern int amdgpu_gart_size;
 extern int amdgpu_gtt_size;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index a7ad77ed09ca..2b7ed2e48e33 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -131,6 +131,7 @@ enum AMDGPU_DEBUG_MASK {
 };
 
 unsigned int amdgpu_vram_limit = UINT_MAX;
+int amdgpu_ignore_min_pcap = 0; /* do not ignore by default */
 int amdgpu_vis_vram_limit;
 int amdgpu_gart_size = -1; /* auto */
 int amdgpu_gtt_size = -1; /* auto */
@@ -238,6 +239,15 @@ struct amdgpu_watchdog_timer amdgpu_watchdog_timer = {
 	.period = 0x0, /* default to 0x0 (timeout disable) */
 };
 
+/**
+ * DOC: ignore_min_pcap (int)
+ * Ignore the minimum power cap.
+ * Useful on graphics cards where the minimum power cap is very high.
+ * The default is 0 (Do not ignore).
+ */
+MODULE_PARM_DESC(ignore_min_pcap, "Ignore the minimum power cap");
+module_param_named(ignore_min_pcap, amdgpu_ignore_min_pcap, int, 0600);
+
 /**
  * DOC: vramlimit (int)
  * Restrict the total amount of VRAM in MiB for testing.  The default is 0 (Use full VRAM).
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index 20c53eefd680..6d00cae9caec 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -2984,6 +2984,9 @@ static ssize_t amdgpu_hwmon_show_power_cap_min(struct device *dev,
 	 struct device_attribute *attr,
 	 char *buf)
 {
+	if (amdgpu_ignore_min_pcap)
+		return sysfs_emit(buf, "%i\n", 0);
+
 	return amdgpu_hwmon_show_power_cap_generic(dev, attr, buf, PP_PWR_LIMIT_MIN);
 }
 
diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 60dce148b2d7..72891811b2d5 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -2487,7 +2487,10 @@ int smu_get_power_limit(void *handle,
 			*limit = smu->max_power_limit;
 			break;
 		case SMU_PPT_LIMIT_MIN:
-			*limit = smu->min_power_limit;
+			if (amdgpu_ignore_min_pcap)
+*limit = 0;
+			else
+*limit = smu->min_power_limit;
 			break;
 		default:
 			return -EINVAL;
@@ -2511,7 +2514,14 @@ static int smu_set_power_limit(void *handle, uint32_t limit)
 		if (smu->ppt_funcs->set_power_limit)
 			return smu->ppt_funcs->set_power_limit(smu, limit_type, limit);
 
-	if ((limit > smu->max_power_limit) || (limit < smu->min_power_limit)) {
+	if (amdgpu_ignore_min_pcap) {
+		if ((limit > smu->max_power_limit)) {
+			dev_err(smu->adev->dev,
+"New power limit (%d) is over the max allowed %d\n",
+limit, smu->max_power_limit);
+			return -EINVAL;
+		}
+	} else if ((limit > smu->max_power_limit) || (limit < smu->min_power_limit)) {
 		dev_err(smu->adev->dev,
 			"New power limit (%d) is out of range [%d,%d]\n",
 			limit, smu->min_power_limit, smu->max_power_limit);

[PATCH] Documentation/gpu: Document the situation with unqualified drm-memory-

2024-05-03 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Currently it is not well defined what is drm-memory- compared to other
categories.

In practice the only driver which emits these keys is amdgpu and in them
exposes the total memory use (including shared).

Document that drm-memory- and drm-total-memory- are aliases to prevent any
confusion in the future.

While at it also clarify that the reserved sub-string 'memory' refers to
the memory region component.

Signed-off-by: Tvrtko Ursulin 
Cc: Alex Deucher 
Cc: Christian König 
Cc: Rob Clark 
---
 Documentation/gpu/drm-usage-stats.rst | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/gpu/drm-usage-stats.rst 
b/Documentation/gpu/drm-usage-stats.rst
index 6dc299343b48..ef5c0a0aa477 100644
--- a/Documentation/gpu/drm-usage-stats.rst
+++ b/Documentation/gpu/drm-usage-stats.rst
@@ -128,7 +128,9 @@ Memory
 
 Each possible memory type which can be used to store buffer objects by the
 GPU in question shall be given a stable and unique name to be returned as the
-string here.  The name "memory" is reserved to refer to normal system memory.
+string here.
+
+The region name "memory" is reserved to refer to normal system memory.
 
 Value shall reflect the amount of storage currently consumed by the buffer
 objects belong to this client, in the respective memory region.
@@ -136,6 +138,9 @@ objects belong to this client, in the respective memory 
region.
 Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
 indicating kibi- or mebi-bytes.
 
+This is an alias for drm-total- and only one of the two should be
+present.
+
 - drm-shared-:  [KiB|MiB]
 
 The total size of buffers that are shared with another file (e.g., have more
@@ -145,6 +150,9 @@ than a single handle).
 
 The total size of buffers that including shared and private memory.
 
+This is an alias for drm-memory- and only one of the two should be
+present.
+
 - drm-resident-:  [KiB|MiB]
 
 The total size of buffers that are resident in the specified region.
-- 
2.44.0

[PATCH 1/3] drm/amdgpu: Add amdgpu_bo_is_vm_bo helper

2024-05-03 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Help code readability by replacing a bunch of:

bo->tbo.base.resv == vm->root.bo->tbo.base.resv

With:

amdgpu_vm_is_bo_always_valid(vm, bo)

No functional changes.

v2:
 * Rename helper and move to amdgpu_vm. (Christian)

v3:
 * Use Christian's kerneldoc.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 41 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  2 ++
 3 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 67c234bcf89f..e698d65e9508 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -174,7 +174,7 @@ static int amdgpu_gem_object_open(struct drm_gem_object 
*obj,
return -EPERM;
 
if (abo->flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID &&
-   abo->tbo.base.resv != vm->root.bo->tbo.base.resv)
+   !amdgpu_vm_is_bo_always_valid(vm, abo))
return -EPERM;
 
r = amdgpu_bo_reserve(abo, false);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 4e2391c83d7c..d451ff9d5af4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -333,7 +333,7 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
base->next = bo->vm_bo;
bo->vm_bo = base;
 
-   if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
+   if (!amdgpu_vm_is_bo_always_valid(vm, bo))
return;
 
dma_resv_assert_held(vm->root.bo->tbo.base.resv);
@@ -1101,13 +1101,13 @@ static void amdgpu_vm_bo_get_memory(struct amdgpu_bo_va 
*bo_va,
 * For now ignore BOs which are currently locked and potentially
 * changing their location.
 */
-   if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv &&
+   if (!amdgpu_vm_is_bo_always_valid(vm, bo) &&
!dma_resv_trylock(bo->tbo.base.resv))
return;
 
amdgpu_bo_get_memory(bo, stats);
-   if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
-   dma_resv_unlock(bo->tbo.base.resv);
+   if (amdgpu_vm_is_bo_always_valid(vm, bo))
+   dma_resv_unlock(bo->tbo.base.resv);
 }
 
 void amdgpu_vm_get_memory(struct amdgpu_vm *vm,
@@ -1203,8 +1203,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, 
struct amdgpu_bo_va *bo_va,
uncached = false;
}
 
-   if (clear || (bo && bo->tbo.base.resv ==
- vm->root.bo->tbo.base.resv))
+   if (clear || amdgpu_vm_is_bo_always_valid(vm, bo))
last_update = &vm->last_update;
else
last_update = &bo_va->last_pt_update;
@@ -1246,7 +1245,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, 
struct amdgpu_bo_va *bo_va,
 * the evicted list so that it gets validated again on the
 * next command submission.
 */
-   if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) {
+   if (amdgpu_vm_is_bo_always_valid(vm, bo)) {
uint32_t mem_type = bo->tbo.resource->mem_type;
 
if (!(bo->preferred_domains &
@@ -1640,10 +1639,9 @@ static void amdgpu_vm_bo_insert_map(struct amdgpu_device 
*adev,
if (mapping->flags & AMDGPU_PTE_PRT)
amdgpu_vm_prt_get(adev);
 
-   if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
-   !bo_va->base.moved) {
+   if (amdgpu_vm_is_bo_always_valid(vm, bo) && !bo_va->base.moved)
amdgpu_vm_bo_moved(&bo_va->base);
-   }
+
trace_amdgpu_vm_bo_map(bo_va, mapping);
 }
 
@@ -1942,7 +1940,7 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device 
*adev,
if (before->flags & AMDGPU_PTE_PRT)
amdgpu_vm_prt_get(adev);
 
-   if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
+   if (amdgpu_vm_is_bo_always_valid(vm, bo) &&
!before->bo_va->base.moved)
amdgpu_vm_bo_moved(&before->bo_va->base);
} else {
@@ -1957,7 +1955,7 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device 
*adev,
if (after->flags & AMDGPU_PTE_PRT)
amdgpu_vm_prt_get(adev);
 
-   if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
+   if (amdgpu_vm_is_bo_always_valid(vm, bo) &&
!after->bo_va->base.moved)
amdgpu_vm_bo_moved(&after->bo_va->base);
} else {
@@ -2037,7 +2035,7 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
 
if (bo) {
dma_resv_assert_held(bo->tbo.base.resv);
-   if (bo->tbo.base.resv == vm->root.bo->tbo.base.resv)
+   if (amdgpu_vm_is_bo_always_valid(vm, bo))

[PATCH 2/3] drm/amdgpu: Reduce mem_type to domain double indirection

2024-05-03 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

All apart from AMDGPU_GEM_DOMAIN_GTT memory domains map 1:1 to TTM
placements. And the former be either AMDGPU_PL_PREEMPT or TTM_PL_TT,
depending on AMDGPU_GEM_CREATE_PREEMPTIBLE.

Simplify a few places in the code which convert the TTM placement into
a domain by checking against the current placement directly.

In the conversion AMDGPU_PL_PREEMPT either does not have to be handled
because amdgpu_mem_type_to_domain() cannot return that value anyway.

v2:
 * Remove AMDGPU_PL_PREEMPT handling.

v3:
 * Rebase.

Signed-off-by: Tvrtko Ursulin 
Reviewed-by: Christian König  # v1
Reviewed-by: Felix Kuehling  # v2
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c |  3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  | 29 +
 2 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index 055ba2ea4c12..0b3b10d21952 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -165,8 +165,7 @@ static struct sg_table *amdgpu_dma_buf_map(struct 
dma_buf_attachment *attach,
if (r)
return ERR_PTR(r);
 
-   } else if (!(amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type) &
-AMDGPU_GEM_DOMAIN_GTT)) {
+   } else if (bo->tbo.resource->mem_type != TTM_PL_TT) {
return ERR_PTR(-EBUSY);
}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 8d8c39be6129..4f9073dd19eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -983,12 +983,11 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
domain,
 
ttm_bo_pin(&bo->tbo);
 
-   domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
-   if (domain == AMDGPU_GEM_DOMAIN_VRAM) {
+   if (bo->tbo.resource->mem_type == TTM_PL_VRAM) {
atomic64_add(amdgpu_bo_size(bo), &adev->vram_pin_size);
atomic64_add(amdgpu_vram_mgr_bo_visible_size(bo),
 &adev->visible_pin_size);
-   } else if (domain == AMDGPU_GEM_DOMAIN_GTT) {
+   } else if (bo->tbo.resource->mem_type == TTM_PL_TT) {
atomic64_add(amdgpu_bo_size(bo), &adev->gart_pin_size);
}
 
@@ -1293,7 +1292,6 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
struct ttm_resource *res = bo->tbo.resource;
uint64_t size = amdgpu_bo_size(bo);
struct drm_gem_object *obj;
-   unsigned int domain;
bool shared;
 
/* Abort if the BO doesn't currently have a backing store */
@@ -1303,21 +1301,20 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
obj = &bo->tbo.base;
shared = drm_gem_object_is_shared_for_memory_stats(obj);
 
-   domain = amdgpu_mem_type_to_domain(res->mem_type);
-   switch (domain) {
-   case AMDGPU_GEM_DOMAIN_VRAM:
+   switch (res->mem_type) {
+   case TTM_PL_VRAM:
stats->vram += size;
-   if (amdgpu_res_cpu_visible(adev, bo->tbo.resource))
+   if (amdgpu_res_cpu_visible(adev, res))
stats->visible_vram += size;
if (shared)
stats->vram_shared += size;
break;
-   case AMDGPU_GEM_DOMAIN_GTT:
+   case TTM_PL_TT:
stats->gtt += size;
if (shared)
stats->gtt_shared += size;
break;
-   case AMDGPU_GEM_DOMAIN_CPU:
+   case TTM_PL_SYSTEM:
default:
stats->cpu += size;
if (shared)
@@ -1330,7 +1327,7 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
if (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)
stats->requested_visible_vram += size;
 
-   if (domain != AMDGPU_GEM_DOMAIN_VRAM) {
+   if (res->mem_type != TTM_PL_VRAM) {
stats->evicted_vram += size;
if (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)
stats->evicted_visible_vram += size;
@@ -1604,20 +1601,18 @@ u64 amdgpu_bo_print_info(int id, struct amdgpu_bo *bo, 
struct seq_file *m)
u64 size;
 
if (dma_resv_trylock(bo->tbo.base.resv)) {
-   unsigned int domain;
 
-   domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
-   switch (domain) {
-   case AMDGPU_GEM_DOMAIN_VRAM:
+   switch (bo->tbo.resource->mem_type) {
+   case TTM_PL_VRAM:
if (amdgpu_res_cpu_visible(adev, bo->tbo.resource))
placement = "VRAM VISIBLE";
else
placement = "VRAM";
break;
-   case AMDGPU_GEM_DOMAIN_GTT:
+

[PATCH 3/3] drm/amdgpu: Describe all object placements in debugfs

2024-05-03 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Accurately show all placements when describing objects in debugfs, instead
of bunching them up under the 'CPU' placement.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Felix Kuehling 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 4f9073dd19eb..fa5227a4aac2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1612,6 +1612,21 @@ u64 amdgpu_bo_print_info(int id, struct amdgpu_bo *bo, 
struct seq_file *m)
case TTM_PL_TT:
placement = "GTT";
break;
+   case AMDGPU_PL_GDS:
+   placement = "GDS";
+   break;
+   case AMDGPU_PL_GWS:
+   placement = "GWS";
+   break;
+   case AMDGPU_PL_OA:
+   placement = "OA";
+   break;
+   case AMDGPU_PL_PREEMPT:
+   placement = "PREEMPTIBLE";
+   break;
+   case AMDGPU_PL_DOORBELL:
+   placement = "DOORBELL";
+   break;
case TTM_PL_SYSTEM:
default:
placement = "CPU";
-- 
2.44.0

[PATCH v1 4/4] drm/amdgpu: add gfx queue registers for gfx10 ip dump

2024-05-03 Thread Sunil Khatri

Add gfx queue registers in the list of registers to
be dumped in ip dump for gfx10

Signed-off-by: Sunil Khatri 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 00c7a842ea3b..bef7d8ca35df 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -418,7 +418,31 @@ static const struct amdgpu_hwip_reg_entry 
gc_reg_list_10_1[] = {
SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_OFFSET),
SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_DW_CNT),
SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_WG_STATE_OFFSET),
-   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_STATUS)
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_STATUS),
+   /* gfx queue registers */
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_ACTIVE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_QUEUE_PRIORITY),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_BASE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_BASE_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_OFFSET),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CSMD_RPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_WPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_WPTR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_DEQUEUE_REQUEST),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_MAPPED),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_QUE_MGR_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_HQ_CONTROL0),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_HQ_STATUS0),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CE_WPTR_POLL_ADDR_LO),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CE_WPTR_POLL_ADDR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CE_OFFSET),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CE_CSMD_RPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CE_WPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CE_WPTR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_MQD_BASE_ADDR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_MQD_BASE_ADDR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB_WPTR_POLL_ADDR_LO),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB_WPTR_POLL_ADDR_HI)
 };
 
 static const struct soc15_reg_golden golden_settings_gc_10_1[] = {
-- 
2.34.1

[PATCH v1 2/4] drm/amdgpu: add se registers to ip dump for gfx10

2024-05-03 Thread Sunil Khatri

add the registers of SE block of gfx for ip dump
for gfx10 IP.

Signed-off-by: Sunil Khatri 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 61c1e997f794..953df202953a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -373,7 +373,12 @@ static const struct amdgpu_hwip_reg_entry 
gc_reg_list_10_1[] = {
SOC15_REG_ENTRY_STR(GC, 0, mmCP_MEC_ME2_HEADER_DUMP),
SOC15_REG_ENTRY_STR(GC, 0, mmCP_PFP_HEADER_DUMP),
SOC15_REG_ENTRY_STR(GC, 0, mmCP_ME_HEADER_DUMP),
-   SOC15_REG_ENTRY_STR(GC, 0, mmCP_MES_HEADER_DUMP)
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_MES_HEADER_DUMP),
+   /* SE status registers */
+   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE0),
+   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE1),
+   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE2),
+   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3)
 };
 
 static const struct soc15_reg_golden golden_settings_gc_10_1[] = {
-- 
2.34.1

[PATCH v1 3/4] drm/amdgpu: add compute registers in ip dump for gfx10

2024-05-03 Thread Sunil Khatri

add compute registers in set of registers to dump
during ip dump for gfx10.

Signed-off-by: Sunil Khatri 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 42 +-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 953df202953a..00c7a842ea3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -378,7 +378,47 @@ static const struct amdgpu_hwip_reg_entry 
gc_reg_list_10_1[] = {
SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE0),
SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE1),
SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE2),
-   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3)
+   SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3),
+   /* compute registers */
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_VMID),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PERSISTENT_STATE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PIPE_PRIORITY),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUEUE_PRIORITY),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUANTUM),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_RPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_RPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_REQUEST),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_RPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_EVENTS),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_LO),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_CONTROL),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_OFFSET),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_SIZE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_WG_STATE_OFFSET),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_SIZE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_GDS_RESOURCE_STATE),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_ERROR),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR_MEM),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_LO),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_HI),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_OFFSET),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_DW_CNT),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_WG_STATE_OFFSET),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_STATUS)
 };
 
 static const struct soc15_reg_golden golden_settings_gc_10_1[] = {
-- 
2.34.1

[PATCH v1 0/4] Adding differnt blocks of gfx10 registers for register dump

2024-05-03 Thread Sunil Khatri

*** BLURB HERE ***

Sunil Khatri (4):
  drm/amdgpu: add CP headers registers to gfx10 dump
  drm/amdgpu: add se registers to ip dump for gfx10
  drm/amdgpu: add compute registers in ip dump for gfx10
  drm/amdgpu: add gfx queue registers for gfx10 ip dump

 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 78 +-
 1 file changed, 77 insertions(+), 1 deletion(-)

-- 
2.34.1

[PATCH v1 1/4] drm/amdgpu: add CP headers registers to gfx10 dump

2024-05-03 Thread Sunil Khatri

add registers in the ip dump for CP headers in gfx10

Signed-off-by: Sunil Khatri 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 3171ed5e5af3..61c1e997f794 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -366,7 +366,14 @@ static const struct amdgpu_hwip_reg_entry 
gc_reg_list_10_1[] = {
SOC15_REG_ENTRY_STR(GC, 0, mmRLC_GPM_DEBUG_INST_A),
SOC15_REG_ENTRY_STR(GC, 0, mmRLC_GPM_DEBUG_INST_B),
SOC15_REG_ENTRY_STR(GC, 0, mmRLC_GPM_DEBUG_INST_ADDR),
-   SOC15_REG_ENTRY_STR(GC, 0, mmRLC_LX6_CORE_PDEBUG_INST)
+   SOC15_REG_ENTRY_STR(GC, 0, mmRLC_LX6_CORE_PDEBUG_INST),
+   /* cp header registers */
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_HEADER_DUMP),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_MEC_ME1_HEADER_DUMP),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_MEC_ME2_HEADER_DUMP),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_PFP_HEADER_DUMP),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_ME_HEADER_DUMP),
+   SOC15_REG_ENTRY_STR(GC, 0, mmCP_MES_HEADER_DUMP)
 };
 
 static const struct soc15_reg_golden golden_settings_gc_10_1[] = {
-- 
2.34.1

Re: [PATCH 1/3] drm/amdgpu: Add amdgpu_bo_is_vm_bo helper

2024-05-03 Thread Christian König


Am 03.05.24 um 10:23 schrieb Tvrtko Ursulin:


On 03/05/2024 07:26, Christian König wrote:

Am 29.04.24 um 18:47 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Help code readability by replacing a bunch of:

bo->tbo.base.resv == vm->root.bo->tbo.base.resv

With:

amdgpu_vm_is_bo_always_valid(vm, bo)

No functional changes.

v2:
  * Rename helper and move to amdgpu_vm. (Christian)

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 40 
+++--

  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  2 ++
  3 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c

index 67c234bcf89f..e698d65e9508 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -174,7 +174,7 @@ static int amdgpu_gem_object_open(struct 
drm_gem_object *obj,

  return -EPERM;
  if (abo->flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID &&
-    abo->tbo.base.resv != vm->root.bo->tbo.base.resv)
+    !amdgpu_vm_is_bo_always_valid(vm, abo))
  return -EPERM;
  r = amdgpu_bo_reserve(abo, false);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index 8af3f0fd3073..01ca4b35b369 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -333,7 +333,7 @@ void amdgpu_vm_bo_base_init(struct 
amdgpu_vm_bo_base *base,

  base->next = bo->vm_bo;
  bo->vm_bo = base;
-    if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
+    if (!amdgpu_vm_is_bo_always_valid(vm, bo))
  return;
  dma_resv_assert_held(vm->root.bo->tbo.base.resv);
@@ -1101,13 +1101,13 @@ static void amdgpu_vm_bo_get_memory(struct 
amdgpu_bo_va *bo_va,

   * For now ignore BOs which are currently locked and potentially
   * changing their location.
   */
-    if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv &&
+    if (!amdgpu_vm_is_bo_always_valid(vm, bo) &&
  !dma_resv_trylock(bo->tbo.base.resv))
  return;
  amdgpu_bo_get_memory(bo, stats);
-    if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
-    dma_resv_unlock(bo->tbo.base.resv);
+    if (amdgpu_vm_is_bo_always_valid(vm, bo))
+    dma_resv_unlock(bo->tbo.base.resv);
  }
  void amdgpu_vm_get_memory(struct amdgpu_vm *vm,
@@ -1203,8 +1203,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device 
*adev, struct amdgpu_bo_va *bo_va,

  uncached = false;
  }
-    if (clear || (bo && bo->tbo.base.resv ==
-  vm->root.bo->tbo.base.resv))
+    if (clear || amdgpu_vm_is_bo_always_valid(vm, bo))
  last_update = &vm->last_update;
  else
  last_update = &bo_va->last_pt_update;
@@ -1246,7 +1245,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device 
*adev, struct amdgpu_bo_va *bo_va,

   * the evicted list so that it gets validated again on the
   * next command submission.
   */
-    if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) {
+    if (amdgpu_vm_is_bo_always_valid(vm, bo)) {
  uint32_t mem_type = bo->tbo.resource->mem_type;
  if (!(bo->preferred_domains &
@@ -1640,10 +1639,9 @@ static void amdgpu_vm_bo_insert_map(struct 
amdgpu_device *adev,

  if (mapping->flags & AMDGPU_PTE_PRT)
  amdgpu_vm_prt_get(adev);
-    if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
-    !bo_va->base.moved) {
+    if (amdgpu_vm_is_bo_always_valid(vm, bo) && !bo_va->base.moved)
  amdgpu_vm_bo_moved(&bo_va->base);
-    }
+
  trace_amdgpu_vm_bo_map(bo_va, mapping);
  }
@@ -1922,7 +1920,7 @@ int amdgpu_vm_bo_clear_mappings(struct 
amdgpu_device *adev,

  if (before->flags & AMDGPU_PTE_PRT)
  amdgpu_vm_prt_get(adev);
-    if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
+    if (amdgpu_vm_is_bo_always_valid(vm, bo) &&
  !before->bo_va->base.moved)
amdgpu_vm_bo_moved(&before->bo_va->base);
  } else {
@@ -1937,7 +1935,7 @@ int amdgpu_vm_bo_clear_mappings(struct 
amdgpu_device *adev,

  if (after->flags & AMDGPU_PTE_PRT)
  amdgpu_vm_prt_get(adev);
-    if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
+    if (amdgpu_vm_is_bo_always_valid(vm, bo) &&
  !after->bo_va->base.moved)
amdgpu_vm_bo_moved(&after->bo_va->base);
  } else {
@@ -2017,7 +2015,7 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
  if (bo) {
  dma_resv_assert_held(bo->tbo.base.resv);
-    if (bo->tbo.base.resv == vm->root.bo->tbo.base.resv)
+    if (amdgpu_vm_is_bo_always_valid(vm, bo))
  ttm_bo_set_bulk_move(&bo->tbo, NULL);
  for (base = &bo_va->base.bo->vm_bo; *base;
@@ -2111,7 +2109,7 @@ void amdgpu_vm_bo_invalidate(struct 
amdgpu_device *adev,

  for (bo_base = bo->vm_bo; bo_base; bo_base = bo_base->next) {
  st

Re: [linux-next:master] BUILD REGRESSION 9c6ecb3cb6e20c4fd7997047213ba0efcf9ada1a

2024-05-03 Thread Greg KH

Ok, I'm getting tired of seeing these for the USB portion of the tree,
so I went to look for:

On Fri, May 03, 2024 at 04:44:42AM +0800, kernel test robot wrote:
> |-- arc-randconfig-002-20240503
> |   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used

This warning (same for all arches), but can't seem to find it anywhere.

Any hints as to where it would be?

thanks,

greg k-h

Re: [linux-next:master] BUILD REGRESSION 9c6ecb3cb6e20c4fd7997047213ba0efcf9ada1a

2024-05-03 Thread Greg KH

On Fri, May 03, 2024 at 11:30:50AM +0530, Krishna Kurapati PSSNV wrote:
> 
> 
> On 5/3/2024 10:42 AM, Greg KH wrote:
> > Ok, I'm getting tired of seeing these for the USB portion of the tree,
> > so I went to look for:
> > 
> > On Fri, May 03, 2024 at 04:44:42AM +0800, kernel test robot wrote:
> > > |-- arc-randconfig-002-20240503
> > > |   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
> > 
> > This warning (same for all arches), but can't seem to find it anywhere.
> > 
> > Any hints as to where it would be?
> > 
> 
> Hi Greg,
> 
>  I think the hw_mode was not removed in hs_phy_setup and left unused.
> 
>  Thinh reported the same when there was a merge conflict into linux next
> (that the hw_mode variable was removed in ss_phy_setup and should be removed
> in hs_phy_setup as well):
> 
> https://lore.kernel.org/all/20240426213923.tyeddub4xszyp...@synopsys.com/
> 
>  Perhaps that was missed ?

Must have been.  I need it in a format that it can be applied in (a
2-way diff can't apply...)

thanks,

greg k-h

Re: [RFC 5/5] drm/amdgpu: Only show VRAM in fdinfo if it exists

2024-05-03 Thread Tvrtko Ursulin




On 02/05/2024 14:16, Christian König wrote:

Am 30.04.24 um 19:27 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Do not emit the key-value pairs if the VRAM does not exist ie. VRAM
placement is not valid and accessible.


Yeah, that's unfortunately rather misleading.

Even APUs have VRAM or rather stolen system memory which is managed by 
the graphics driver.


We only have a single compute model which really doesn't have VRAM at all.


Hm what is misleading and how more precisely? :) Maybe in other words, 
if is_app_apu is not the right criteria to know when TTM_PL_VRAM is 
impossible, what is? Is the compute model you mentio the only thing 
which sets is_app_apu and uses the dummy vram manager?


Regards,

Tvrtko


Regards,
Christian.



Signed-off-by: Tvrtko Ursulin 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 29 +-
  1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c

index a09944104c41..603a5c010f5d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
@@ -83,25 +83,30 @@ void amdgpu_show_fdinfo(struct drm_printer *p, 
struct drm_file *file)

   */
  drm_printf(p, "pasid:\t%u\n", fpriv->vm.pasid);
-    drm_printf(p, "drm-memory-vram:\t%llu KiB\n", stats.vram/1024UL);
  drm_printf(p, "drm-memory-gtt: \t%llu KiB\n", stats.gtt/1024UL);
  drm_printf(p, "drm-memory-cpu: \t%llu KiB\n", stats.cpu/1024UL);
-    drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n",
-   stats.visible_vram/1024UL);
-    drm_printf(p, "amd-evicted-vram:\t%llu KiB\n",
-   stats.evicted_vram/1024UL);
-    drm_printf(p, "amd-evicted-visible-vram:\t%llu KiB\n",
-   stats.evicted_visible_vram/1024UL);
-    drm_printf(p, "amd-requested-vram:\t%llu KiB\n",
-   stats.requested_vram/1024UL);
-    drm_printf(p, "amd-requested-visible-vram:\t%llu KiB\n",
-   stats.requested_visible_vram/1024UL);
  drm_printf(p, "amd-requested-gtt:\t%llu KiB\n",
 stats.requested_gtt/1024UL);
-    drm_printf(p, "drm-shared-vram:\t%llu KiB\n", 
stats.vram_shared/1024UL);
  drm_printf(p, "drm-shared-gtt:\t%llu KiB\n", 
stats.gtt_shared/1024UL);
  drm_printf(p, "drm-shared-cpu:\t%llu KiB\n", 
stats.cpu_shared/1024UL);

+    if (!adev->gmc.is_app_apu) {
+    drm_printf(p, "drm-memory-vram:\t%llu KiB\n",
+   stats.vram/1024UL);
+    drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n",
+   stats.visible_vram/1024UL);
+    drm_printf(p, "amd-evicted-vram:\t%llu KiB\n",
+   stats.evicted_vram/1024UL);
+    drm_printf(p, "amd-evicted-visible-vram:\t%llu KiB\n",
+   stats.evicted_visible_vram/1024UL);
+    drm_printf(p, "amd-requested-vram:\t%llu KiB\n",
+   stats.requested_vram/1024UL);
+    drm_printf(p, "amd-requested-visible-vram:\t%llu KiB\n",
+   stats.requested_visible_vram/1024UL);
+    drm_printf(p, "drm-shared-vram:\t%llu KiB\n",
+   stats.vram_shared/1024UL);
+    }
+
  for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) {
  if (!usage[hw_ip])
  continue;

Re: [RFC 0/5] Add capacity key to fdinfo

2024-05-03 Thread Tvrtko Ursulin




On 02/05/2024 14:07, Christian König wrote:

Am 01.05.24 um 15:27 schrieb Tvrtko Ursulin:


Hi Alex,

On 30/04/2024 19:32, Alex Deucher wrote:
On Tue, Apr 30, 2024 at 1:27 PM Tvrtko Ursulin  
wrote:


From: Tvrtko Ursulin 

I have noticed AMD GPUs can have more than one "engine" (ring?) of 
the same type
but amdgpu is not reporting that in fdinfo using the capacity engine 
tag.


This series is therefore an attempt to improve that, but only an RFC 
since it is
quite likely I got stuff wrong on the first attempt. Or if not wrong 
it may not

be very beneficial in AMDs case.

So I tried to figure out how to count and store the number of 
instances of an
"engine" type and spotted that could perhaps be used in more than 
one place in
the driver. I was more than a little bit confused by the ip_instance 
and uapi
rings, then how rings are selected to context entities internally. 
Anyway..
hopefully it is a simple enough series to easily spot any such large 
misses.


End result should be that, assuming two "engine" instances, one 
fully loaded and

one idle will only report client using 50% of that engine type.


That would only be true if there are multiple instantiations of the IP
on the chip which in most cases is not true.  In most cases there is
one instance of the IP that can be fed from multiple rings. E.g. for
graphics and compute, all of the rings ultimately feed into the same
compute units on the chip.  So if you have a gfx ring and a compute
rings, you can schedule work to them asynchronously, but ultimately
whether they execute serially or in parallel depends on the actual
shader code in the command buffers and the extent to which it can
utilize the available compute units in the shader cores.


This is the same as with Intel/i915. Fdinfo is not intended to provide 
utilisation of EUs and such, just how busy are the "entities" kernel 
submits to. So doing something like in this series would make the 
reporting more similar between the two drivers.


I think both the 0-800% or 0-100% range (taking 8 ring compute as an 
example) can be misleading for different workloads. Neither <800% in 
the former means one can send more work and same for <100% in the latter.


Yeah, I think that's what Alex tries to describe. By using 8 compute 
rings your 800% load is actually incorrect and quite misleading.


Background is that those 8 compute rings won't be active all at the same 
time, but rather waiting on each other for resources.


But this "waiting" is unfortunately considered execution time since the 
used approach is actually not really capable of separating waiting and 
execution time.


Right, so 800% is what gputop could be suggesting today, by the virtue 8 
context/clients can each use 100% if they only use a subset of compute 
units. I was proposing to expose the capacity in fdinfo so it can be 
scaled down and then dicussing how both situation have pros and cons.


There is also a parallel with the CPU world here and hyper threading, 
if not wider, where "What does 100% actually mean?" is also wishy-washy.


Also note that the reporting of actual time based values in fdinfo 
would not changing with this series.


Of if you can guide me towards how to distinguish real vs fake 
parallelism in HW IP blocks I could modify the series to only add 
capacity tags where there are truly independent blocks. That would be 
different from i915 though were I did not bother with that 
distinction. (For reasons that assignment of for instance EUs to 
compute "rings" (command streamers in i915) was supposed to be 
possible to re-configure on the fly. So it did not make sense to try 
and be super smart in fdinfo.)


Well exactly that's the point we don't really have truly independent 
blocks on AMD hardware.


There are things like independent SDMA instances, but those a meant to 
be used like the first instance for uploads and the second for downloads 
etc.. When you use both instances for the same job they will pretty much 
limit each other because of a single resource.


So _never_ multiple instances of the same IP block? No video decode, 
encode, anything?



As for the UAPI portion of this, we generally expose a limited number
of rings to user space and then we use the GPU scheduler to load
balance between all of the available rings of a type to try and
extract as much parallelism as we can.


The part I do not understand is the purpose of the ring argument in 
for instance drm_amdgpu_cs_chunk_ib. It appears userspace can create 
up to N scheduling entities using different ring id's, but internally 
they can map to 1:N same scheduler instances (depending on IP type, 
can be that each userspace ring maps to same N hw rings, or for rings 
with no drm sched load balancing userspace ring also does not appear 
to have a relation to the picked drm sched instance.).


So I neither understand how this ring is useful, or how it does not 
create a problem for IP types which use drm_sched_pick_best. It 
appears even if u

Re: [PATCH v1 12/12] fbdev/viafb: Make I2C terminology more inclusive

2024-05-03 Thread Easwar Hariharan

On 5/2/2024 3:46 AM, Thomas Zimmermann wrote:
> 
> 
> Am 30.04.24 um 19:38 schrieb Easwar Hariharan:
>> I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave"
>> with more appropriate terms. Inspired by and following on to Wolfram's
>> series to fix drivers/i2c/[1], fix the terminology for users of
>> I2C_ALGOBIT bitbanging interface, now that the approved verbiage exists
>> in the specification.
>>
>> Compile tested, no functionality changes intended
>>
>> [1]: 
>> https://lore.kernel.org/all/20240322132619.6389-1-wsa+rene...@sang-engineering.com/
>>
>> Signed-off-by: Easwar Hariharan 
> 
> Acked-by: Thomas Zimmermann 
> 

Thanks for the ack! I had been addressing feedback as I got it on the v0 
series, and it seems
I missed out on updating viafb and smscufx to spec-compliant controller/target 
terminology like
the v0->v1 changelog calls out before posting v1.

For smscufx, I feel phrasing the following line (as an example)

> -/* sets up I2C Controller for 100 Kbps, std. speed, 7-bit addr, host, 
> +/* sets up I2C Controller for 100 Kbps, std. speed, 7-bit addr, 
> *controller*, 

would actually impact readability negatively, so I propose to leave smscufx as 
is.

For viafb, I propose making it compliant with the spec using the 
controller/target terminology and
posting a v2 respin (which I can send out as soon as you say) and ask you to 
review again.

What do you think?

Thanks,
Easwar

>> ---
>>   drivers/video/fbdev/via/chip.h    |  8 
>>   drivers/video/fbdev/via/dvi.c | 24 
>>   drivers/video/fbdev/via/lcd.c |  6 +++---
>>   drivers/video/fbdev/via/via_aux.h |  2 +-
>>   drivers/video/fbdev/via/via_i2c.c | 12 ++--
>>   drivers/video/fbdev/via/vt1636.c  |  6 +++---
>>   6 files changed, 29 insertions(+), 29 deletions(-)
>>

Re: [linux-next:master] BUILD REGRESSION 9c6ecb3cb6e20c4fd7997047213ba0efcf9ada1a

2024-05-03 Thread Krishna Kurapati PSSNV





On 5/3/2024 10:42 AM, Greg KH wrote:

Ok, I'm getting tired of seeing these for the USB portion of the tree,
so I went to look for:

On Fri, May 03, 2024 at 04:44:42AM +0800, kernel test robot wrote:

|-- arc-randconfig-002-20240503
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used


This warning (same for all arches), but can't seem to find it anywhere.

Any hints as to where it would be?



Hi Greg,

 I think the hw_mode was not removed in hs_phy_setup and left unused.

 Thinh reported the same when there was a merge conflict into linux 
next (that the hw_mode variable was removed in ss_phy_setup and should 
be removed in hs_phy_setup as well):


https://lore.kernel.org/all/20240426213923.tyeddub4xszyp...@synopsys.com/

 Perhaps that was missed ?

Regards,
Krishna,

Re: [PATCH v1 12/12] fbdev/viafb: Make I2C terminology more inclusive

2024-05-03 Thread Thomas Zimmermann


Hi

Am 03.05.24 um 00:26 schrieb Easwar Hariharan:

On 5/2/2024 3:46 AM, Thomas Zimmermann wrote:


Am 30.04.24 um 19:38 schrieb Easwar Hariharan:

I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave"
with more appropriate terms. Inspired by and following on to Wolfram's
series to fix drivers/i2c/[1], fix the terminology for users of
I2C_ALGOBIT bitbanging interface, now that the approved verbiage exists
in the specification.

Compile tested, no functionality changes intended

[1]: 
https://lore.kernel.org/all/20240322132619.6389-1-wsa+rene...@sang-engineering.com/

Signed-off-by: Easwar Hariharan 

Acked-by: Thomas Zimmermann 


Thanks for the ack! I had been addressing feedback as I got it on the v0 
series, and it seems
I missed out on updating viafb and smscufx to spec-compliant controller/target 
terminology like
the v0->v1 changelog calls out before posting v1.

For smscufx, I feel phrasing the following line (as an example)


-/* sets up I2C Controller for 100 Kbps, std. speed, 7-bit addr, host,
+/* sets up I2C Controller for 100 Kbps, std. speed, 7-bit addr, *controller*,

would actually impact readability negatively, so I propose to leave smscufx as 
is.


Why? I don't see much of a difference.



For viafb, I propose making it compliant with the spec using the 
controller/target terminology and
posting a v2 respin (which I can send out as soon as you say) and ask you to 
review again.

What do you think?


I think we should adopt the spec's language everywhere. That makes it 
possible to grep the spec for terms used in the source code. Using 
'host' in smscufx appears to introduce yet another term. If you are 
worried about using 'I2C controller' and 'controller' in the same 
sentence, you can replace 'I2C controller' with 'DDC channel'. That's 
even more precise about the purpose of this code.


Best regards
Thomas



Thanks,
Easwar


---
   drivers/video/fbdev/via/chip.h    |  8 
   drivers/video/fbdev/via/dvi.c | 24 
   drivers/video/fbdev/via/lcd.c |  6 +++---
   drivers/video/fbdev/via/via_aux.h |  2 +-
   drivers/video/fbdev/via/via_i2c.c | 12 ++--
   drivers/video/fbdev/via/vt1636.c  |  6 +++---
   6 files changed, 29 insertions(+), 29 deletions(-)






--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)

Re: [PATCH v1 03/12] drm/i915: Make I2C terminology more inclusive

2024-05-03 Thread Zhi Wang

On Tue, 30 Apr 2024 17:38:02 +
Easwar Hariharan  wrote:

> I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced
> "master/slave" with more appropriate terms. Inspired by and following
> on to Wolfram's series to fix drivers/i2c/[1], fix the terminology
> for users of I2C_ALGOBIT bitbanging interface, now that the approved
> verbiage exists in the specification.
> 
> Compile tested, no functionality changes intended
> 
For GVT part,

Acked-by: Zhi Wang 

Thanks,
Zhi.

> [1]:
> https://lore.kernel.org/all/20240322132619.6389-1-wsa+rene...@sang-engineering.com/
> 
> Signed-off-by: Easwar Hariharan 
> ---
>  drivers/gpu/drm/i915/display/dvo_ch7017.c | 14 -
>  drivers/gpu/drm/i915/display/dvo_ch7xxx.c | 18 +--
>  drivers/gpu/drm/i915/display/dvo_ivch.c   | 16 +-
>  drivers/gpu/drm/i915/display/dvo_ns2501.c | 18 +--
>  drivers/gpu/drm/i915/display/dvo_sil164.c | 18 +--
>  drivers/gpu/drm/i915/display/dvo_tfp410.c | 18 +--
>  drivers/gpu/drm/i915/display/intel_bios.c | 22 +++---
>  drivers/gpu/drm/i915/display/intel_ddi.c  |  2 +-
>  .../gpu/drm/i915/display/intel_display_core.h |  2 +-
>  drivers/gpu/drm/i915/display/intel_dsi.h  |  2 +-
>  drivers/gpu/drm/i915/display/intel_dsi_vbt.c  | 20 ++---
>  drivers/gpu/drm/i915/display/intel_dvo.c  | 14 -
>  drivers/gpu/drm/i915/display/intel_dvo_dev.h  |  2 +-
>  drivers/gpu/drm/i915/display/intel_gmbus.c|  4 +--
>  drivers/gpu/drm/i915/display/intel_sdvo.c | 30
> +-- drivers/gpu/drm/i915/display/intel_vbt_defs.h |
> 4 +-- drivers/gpu/drm/i915/gvt/edid.c   | 28
> - drivers/gpu/drm/i915/gvt/edid.h   |  4
> +-- drivers/gpu/drm/i915/gvt/opregion.c   |  2 +-
>  19 files changed, 119 insertions(+), 119 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/dvo_ch7017.c
> b/drivers/gpu/drm/i915/display/dvo_ch7017.c index
> d0c3880d7f80..493e730c685b 100644 ---
> a/drivers/gpu/drm/i915/display/dvo_ch7017.c +++
> b/drivers/gpu/drm/i915/display/dvo_ch7017.c @@ -170,13 +170,13 @@
> static bool ch7017_read(struct intel_dvo_device *dvo, u8 addr, u8
> *val) { struct i2c_msg msgs[] = {
>   {
> - .addr = dvo->slave_addr,
> + .addr = dvo->target_addr,
>   .flags = 0,
>   .len = 1,
>   .buf = &addr,
>   },
>   {
> - .addr = dvo->slave_addr,
> + .addr = dvo->target_addr,
>   .flags = I2C_M_RD,
>   .len = 1,
>   .buf = val,
> @@ -189,7 +189,7 @@ static bool ch7017_write(struct intel_dvo_device
> *dvo, u8 addr, u8 val) {
>   u8 buf[2] = { addr, val };
>   struct i2c_msg msg = {
> - .addr = dvo->slave_addr,
> + .addr = dvo->target_addr,
>   .flags = 0,
>   .len = 2,
>   .buf = buf,
> @@ -197,7 +197,7 @@ static bool ch7017_write(struct intel_dvo_device
> *dvo, u8 addr, u8 val) return i2c_transfer(dvo->i2c_bus, &msg, 1) ==
> 1; }
>  
> -/** Probes for a CH7017 on the given bus and slave address. */
> +/** Probes for a CH7017 on the given bus and target address. */
>  static bool ch7017_init(struct intel_dvo_device *dvo,
>   struct i2c_adapter *adapter)
>  {
> @@ -227,13 +227,13 @@ static bool ch7017_init(struct intel_dvo_device
> *dvo, break;
>   default:
>   DRM_DEBUG_KMS("ch701x not detected, got %d: from %s "
> -   "slave %d.\n",
> -   val, adapter->name, dvo->slave_addr);
> +   "target %d.\n",
> +   val, adapter->name, dvo->target_addr);
>   goto fail;
>   }
>  
>   DRM_DEBUG_KMS("%s detected on %s, addr %d\n",
> -   str, adapter->name, dvo->slave_addr);
> +   str, adapter->name, dvo->target_addr);
>   return true;
>  
>  fail:
> diff --git a/drivers/gpu/drm/i915/display/dvo_ch7xxx.c
> b/drivers/gpu/drm/i915/display/dvo_ch7xxx.c index
> 2e8e85da5a40..534b8544e0a4 100644 ---
> a/drivers/gpu/drm/i915/display/dvo_ch7xxx.c +++
> b/drivers/gpu/drm/i915/display/dvo_ch7xxx.c @@ -153,13 +153,13 @@
> static bool ch7xxx_readb(struct intel_dvo_device *dvo, int addr, u8
> *ch) struct i2c_msg msgs[] = {
>   {
> - .addr = dvo->slave_addr,
> + .addr = dvo->target_addr,
>   .flags = 0,
>   .len = 1,
>   .buf = out_buf,
>   },
>   {
> - .addr = dvo->slave_addr,
> + .addr = dvo->target_addr,
>   .flags = I2C_M_RD,
>   .len = 1,
>   .buf = in_buf,
> @@ -176,7 +176,7 @@ static bool ch7xxx_readb(

51 matches

Mail list logo