Re: [PATCH v1 10/12] sfc: falcon: Make I2C terminology more inclusive
On Tue, 30 Apr 2024 17:38:09 + Easwar Hariharan wrote: > I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave" > with more appropriate terms. Inspired by and following on to Wolfram's > series to fix drivers/i2c/[1], fix the terminology for users of > I2C_ALGOBIT bitbanging interface, now that the approved verbiage exists > in the specification. > > Compile tested, no functionality changes intended FWIW we're assuming someone (Wolfram?) will take all of these, instead of area maintainers picking them individually. Please let us know if that's incorrect.
[PATCH] drm/amdkfd: Ensure gpu_id is unique
gpu_id needs to be unique for user space to identify GPUs via the KFD interface. In the current implementation there is a very small probability of having non-unique gpu_ids.

v2: Add check to confirm if gpu_id is unique. If not unique, find one.
    Changed commit header to reflect the above.

Signed-off-by: Harish Kasiviswanathan
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 26 ++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index b93913934b03..01d4c2e10c6d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1095,6 +1095,8 @@ static uint32_t kfd_generate_gpu_id(struct kfd_node *gpu)
 	uint32_t hashout;
 	uint32_t buf[8];
 	uint64_t local_mem_size;
+	struct kfd_topology_device *dev;
+	bool is_unique;
 	int i;

 	if (!gpu)
@@ -1115,6 +1117,28 @@ static uint32_t kfd_generate_gpu_id(struct kfd_node *gpu)
 	for (i = 0, hashout = 0; i < 8; i++)
 		hashout ^= hash_32(buf[i], KFD_GPU_ID_HASH_WIDTH);

+	/* hash generated could be non-unique. Check if it is unique.
+	 * If not unique increment till unique one is found. In case
+	 * of overflow, restart from 1
+	 */
+	down_read(&topology_lock);
+	do {
+		is_unique = true;
+		list_for_each_entry(dev, &topology_device_list, list) {
+			if (dev->gpu && dev->gpu_id == hashout) {
+				is_unique = false;
+				break;
+			}
+		}
+		if (unlikely(!is_unique)) {
+			hashout = (hashout + 1) &
+				  ((1 << KFD_GPU_ID_HASH_WIDTH) - 1);
+			if (!hashout)
+				hashout = 1;
+		}
+	} while (!is_unique);
+	up_read(&topology_lock);
+
 	return hashout;
 }

 /* kfd_assign_gpu - Attach @gpu to the correct kfd topology device. If
@@ -1946,7 +1970,6 @@ int kfd_topology_add_device(struct kfd_node *gpu)
 	struct amdgpu_gfx_config *gfx_info = &gpu->adev->gfx.config;
 	struct amdgpu_cu_info *cu_info = &gpu->adev->gfx.cu_info;

-	gpu_id = kfd_generate_gpu_id(gpu);
 	if (gpu->xcp && !gpu->xcp->ddev) {
 		dev_warn(gpu->adev->dev, "Won't add GPU to topology since it has no drm node assigned.");
@@ -1969,6 +1992,7 @@ int kfd_topology_add_device(struct kfd_node *gpu)
 	if (res)
 		return res;

+	gpu_id = kfd_generate_gpu_id(gpu);
 	dev->gpu_id = gpu_id;
 	gpu->id = gpu_id;
--
2.34.1
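For illustration, the collision handling above is linear probing within the KFD_GPU_ID_HASH_WIDTH-bit ID space under the topology read lock, wrapping around to 1 on overflow (the patch never assigns 0). A minimal standalone C sketch of the same scheme, with a plain array standing in for topology_device_list and an assumed 16-bit hash width:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define GPU_ID_HASH_WIDTH 16 /* assumption; the kernel uses KFD_GPU_ID_HASH_WIDTH */

    static bool id_in_use(const uint32_t *used, int n, uint32_t id)
    {
            for (int i = 0; i < n; i++)
                    if (used[i] == id)
                            return true;
            return false;
    }

    /* Increment until unique; on overflow wrap to 1 so 0 is never handed out. */
    static uint32_t make_unique_id(uint32_t hashout, const uint32_t *used, int n)
    {
            while (id_in_use(used, n, hashout)) {
                    hashout = (hashout + 1) & ((1u << GPU_ID_HASH_WIDTH) - 1);
                    if (!hashout)
                            hashout = 1;
            }
            return hashout;
    }

    int main(void)
    {
            uint32_t used[] = { 0x1234, 0x1235 };

            printf("0x%x\n", (unsigned)make_unique_id(0x1234, used, 2));
            return 0;
    }

Built with a plain cc this prints 0x1236: 0x1234 and 0x1235 are taken, so probing stops at the first free slot.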
Re: [PATCH v2 03/12] drm/i915: Make I2C terminology more inclusive
On Fri, May 03, 2024 at 02:04:15PM -0700, Easwar Hariharan wrote: > On 5/3/2024 12:34 PM, Rodrigo Vivi wrote: > > On Fri, May 03, 2024 at 06:13:24PM +, Easwar Hariharan wrote: > >> I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced > >> "master/slave" > >> with more appropriate terms. Inspired by and following on to Wolfram's > >> series to fix drivers/i2c/[1], fix the terminology for users of > >> I2C_ALGOBIT bitbanging interface, now that the approved verbiage exists > >> in the specification. > >> > >> Compile tested, no functionality changes intended > >> > >> [1]: > >> https://lore.kernel.org/all/20240322132619.6389-1-wsa+rene...@sang-engineering.com/ > >> > >> Reviewed-by: Rodrigo Vivi > >> Acked-by: Rodrigo Vivi > > > > It looks like the ack is not needed since we are merging this through > > drm-intel-next. But I'm planing to merge this only after seeing the > > main drivers/i2c accepting the new terminology. So we don't have a > > risk of that getting push back and new names there and we having > > to rename it once again. > > Just to be explicit, did you want me to remove the Acked-by in v3, or will > you when you pull > the patch into drm-intel-next? > > > > > (more below) > > > >> Acked-by: Zhi Wang > >> Signed-off-by: Easwar Hariharan > > > > Cc: Jani Nikula > > > > Jani, what bits were you concerned that were not necessarily i2c? > > I believe although not necessarily/directly i2c, I believe they > > are related and could benefit from the massive single shot renable. > > or do you have any better split to suggest here? > > > > (more below) > > > >> --- > >> drivers/gpu/drm/i915/display/dvo_ch7017.c | 14 - > >> drivers/gpu/drm/i915/display/dvo_ch7xxx.c | 18 +-- > >> drivers/gpu/drm/i915/display/dvo_ivch.c | 16 +- > >> drivers/gpu/drm/i915/display/dvo_ns2501.c | 18 +-- > >> drivers/gpu/drm/i915/display/dvo_sil164.c | 18 +-- > >> drivers/gpu/drm/i915/display/dvo_tfp410.c | 18 +-- > >> drivers/gpu/drm/i915/display/intel_bios.c | 22 +++--- > >> drivers/gpu/drm/i915/display/intel_ddi.c | 2 +- > >> .../gpu/drm/i915/display/intel_display_core.h | 2 +- > >> drivers/gpu/drm/i915/display/intel_dsi.h | 2 +- > >> drivers/gpu/drm/i915/display/intel_dsi_vbt.c | 20 ++--- > >> drivers/gpu/drm/i915/display/intel_dvo.c | 14 - > >> drivers/gpu/drm/i915/display/intel_dvo_dev.h | 2 +- > >> drivers/gpu/drm/i915/display/intel_gmbus.c| 4 +-- > >> drivers/gpu/drm/i915/display/intel_sdvo.c | 30 +-- > >> drivers/gpu/drm/i915/display/intel_vbt_defs.h | 4 +-- > >> drivers/gpu/drm/i915/gvt/edid.c | 28 - > >> drivers/gpu/drm/i915/gvt/edid.h | 4 +-- > >> drivers/gpu/drm/i915/gvt/opregion.c | 2 +- > >> 19 files changed, 119 insertions(+), 119 deletions(-) > >> > > > > >> diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c > >> b/drivers/gpu/drm/i915/display/intel_ddi.c > >> index c17462b4c2ac..64db211148a8 100644 > >> --- a/drivers/gpu/drm/i915/display/intel_ddi.c > >> +++ b/drivers/gpu/drm/i915/display/intel_ddi.c > >> @@ -4332,7 +4332,7 @@ static int intel_ddi_compute_config_late(struct > >> intel_encoder *encoder, > >> > >> connector->tile_group->id); > >> > >>/* > >> - * EDP Transcoders cannot be ensalved > >> + * EDP Transcoders cannot be slaves > > > > ^ here > > perhaps you meant 'targeted' ? > > > >> * make them a master always when present > > > > This is not actually I2C related as far as I could tell when I was making the > change, so this was more of a typo fix. 
> > If we want to improve this, a quick check with the eDP v1.5a spec suggests > using primary/secondary instead, > though in a global fashion rather than specifically for eDP transcoders. > There is also source/sink terminology > in the spec related to DP encoders. > > Which would be a more acceptable change here? hmmm probably better to split the patches and align with the spec naming where it applies. and with i2c name where it applies. > > Thanks, > Easwar
Re: [PATCH v2 03/12] drm/i915: Make I2C terminology more inclusive
On Fri, May 03, 2024 at 06:13:24PM +, Easwar Hariharan wrote: > I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave" > with more appropriate terms. Inspired by and following on to Wolfram's > series to fix drivers/i2c/[1], fix the terminology for users of > I2C_ALGOBIT bitbanging interface, now that the approved verbiage exists > in the specification. > > Compile tested, no functionality changes intended > > [1]: > https://lore.kernel.org/all/20240322132619.6389-1-wsa+rene...@sang-engineering.com/ > > Reviewed-by: Rodrigo Vivi > Acked-by: Rodrigo Vivi It looks like the ack is not needed since we are merging this through drm-intel-next. But I'm planing to merge this only after seeing the main drivers/i2c accepting the new terminology. So we don't have a risk of that getting push back and new names there and we having to rename it once again. (more below) > Acked-by: Zhi Wang > Signed-off-by: Easwar Hariharan Cc: Jani Nikula Jani, what bits were you concerned that were not necessarily i2c? I believe although not necessarily/directly i2c, I believe they are related and could benefit from the massive single shot renable. or do you have any better split to suggest here? (more below) > --- > drivers/gpu/drm/i915/display/dvo_ch7017.c | 14 - > drivers/gpu/drm/i915/display/dvo_ch7xxx.c | 18 +-- > drivers/gpu/drm/i915/display/dvo_ivch.c | 16 +- > drivers/gpu/drm/i915/display/dvo_ns2501.c | 18 +-- > drivers/gpu/drm/i915/display/dvo_sil164.c | 18 +-- > drivers/gpu/drm/i915/display/dvo_tfp410.c | 18 +-- > drivers/gpu/drm/i915/display/intel_bios.c | 22 +++--- > drivers/gpu/drm/i915/display/intel_ddi.c | 2 +- > .../gpu/drm/i915/display/intel_display_core.h | 2 +- > drivers/gpu/drm/i915/display/intel_dsi.h | 2 +- > drivers/gpu/drm/i915/display/intel_dsi_vbt.c | 20 ++--- > drivers/gpu/drm/i915/display/intel_dvo.c | 14 - > drivers/gpu/drm/i915/display/intel_dvo_dev.h | 2 +- > drivers/gpu/drm/i915/display/intel_gmbus.c| 4 +-- > drivers/gpu/drm/i915/display/intel_sdvo.c | 30 +-- > drivers/gpu/drm/i915/display/intel_vbt_defs.h | 4 +-- > drivers/gpu/drm/i915/gvt/edid.c | 28 - > drivers/gpu/drm/i915/gvt/edid.h | 4 +-- > drivers/gpu/drm/i915/gvt/opregion.c | 2 +- > 19 files changed, 119 insertions(+), 119 deletions(-) > > diff --git a/drivers/gpu/drm/i915/display/dvo_ch7017.c > b/drivers/gpu/drm/i915/display/dvo_ch7017.c > index d0c3880d7f80..493e730c685b 100644 > --- a/drivers/gpu/drm/i915/display/dvo_ch7017.c > +++ b/drivers/gpu/drm/i915/display/dvo_ch7017.c > @@ -170,13 +170,13 @@ static bool ch7017_read(struct intel_dvo_device *dvo, > u8 addr, u8 *val) > { > struct i2c_msg msgs[] = { > { > - .addr = dvo->slave_addr, > + .addr = dvo->target_addr, > .flags = 0, > .len = 1, > .buf = &addr, > }, > { > - .addr = dvo->slave_addr, > + .addr = dvo->target_addr, > .flags = I2C_M_RD, > .len = 1, > .buf = val, > @@ -189,7 +189,7 @@ static bool ch7017_write(struct intel_dvo_device *dvo, u8 > addr, u8 val) > { > u8 buf[2] = { addr, val }; > struct i2c_msg msg = { > - .addr = dvo->slave_addr, > + .addr = dvo->target_addr, > .flags = 0, > .len = 2, > .buf = buf, > @@ -197,7 +197,7 @@ static bool ch7017_write(struct intel_dvo_device *dvo, u8 > addr, u8 val) > return i2c_transfer(dvo->i2c_bus, &msg, 1) == 1; > } > > -/** Probes for a CH7017 on the given bus and slave address. */ > +/** Probes for a CH7017 on the given bus and target address. 
*/ > static bool ch7017_init(struct intel_dvo_device *dvo, > struct i2c_adapter *adapter) > { > @@ -227,13 +227,13 @@ static bool ch7017_init(struct intel_dvo_device *dvo, > break; > default: > DRM_DEBUG_KMS("ch701x not detected, got %d: from %s " > - "slave %d.\n", > - val, adapter->name, dvo->slave_addr); > + "target %d.\n", > + val, adapter->name, dvo->target_addr); > goto fail; > } > > DRM_DEBUG_KMS("%s detected on %s, addr %d\n", > - str, adapter->name, dvo->slave_addr); > + str, adapter->name, dvo->target_addr); > return true; > > fail: > diff --git a/drivers/gpu/drm/i915/display/dvo_ch7xxx.c > b/drivers/gpu/drm/i915/display/dvo_ch7xxx.c > index 2e8e85da5a40..534b8544e0a4 100644 > --- a/drivers/gpu/drm/i915/display/dvo_ch7xxx.c > +++ b/drivers/gpu/drm/i915/
Re: [PATCH v0 04/14] media: au0828: Make I2C terminology more inclusive
Em Fri, 29 Mar 2024 17:00:28 + Easwar Hariharan escreveu: > I2C v7, SMBus 3.2, and I3C specifications have replaced "master/slave" > with more appropriate terms. Inspired by and following on to Wolfram's > series to fix drivers/i2c/[1], fix the terminology for users of > I2C_ALGOBIT bitbanging interface, now that the approved verbiage exists > in the specification. > > Compile tested, no functionality changes intended Current media drivers are perfectly fine with the current terminology. None of them implement the above new standards. Please drop patches for current stuff under drivers/media. Regards, Mauro > > [1]: > https://lore.kernel.org/all/20240322132619.6389-1-wsa+rene...@sang-engineering.com/ > > Signed-off-by: Easwar Hariharan > --- > drivers/media/usb/au0828/au0828-i2c.c | 4 ++-- > drivers/media/usb/au0828/au0828-input.c | 2 +- > 2 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/media/usb/au0828/au0828-i2c.c > b/drivers/media/usb/au0828/au0828-i2c.c > index 749f90d73b5b..3e66d42bf134 100644 > --- a/drivers/media/usb/au0828/au0828-i2c.c > +++ b/drivers/media/usb/au0828/au0828-i2c.c > @@ -23,7 +23,7 @@ MODULE_PARM_DESC(i2c_scan, "scan i2c bus at insmod time"); > #define I2C_WAIT_DELAY 25 > #define I2C_WAIT_RETRY 1000 > > -static inline int i2c_slave_did_read_ack(struct i2c_adapter *i2c_adap) > +static inline int i2c_client_did_read_ack(struct i2c_adapter *i2c_adap) > { > struct au0828_dev *dev = i2c_adap->algo_data; > return au0828_read(dev, AU0828_I2C_STATUS_201) & > @@ -35,7 +35,7 @@ static int i2c_wait_read_ack(struct i2c_adapter *i2c_adap) > int count; > > for (count = 0; count < I2C_WAIT_RETRY; count++) { > - if (!i2c_slave_did_read_ack(i2c_adap)) > + if (!i2c_client_did_read_ack(i2c_adap)) > break; > udelay(I2C_WAIT_DELAY); > } > diff --git a/drivers/media/usb/au0828/au0828-input.c > b/drivers/media/usb/au0828/au0828-input.c > index 3d3368202cd0..98a57b6e02e2 100644 > --- a/drivers/media/usb/au0828/au0828-input.c > +++ b/drivers/media/usb/au0828/au0828-input.c > @@ -30,7 +30,7 @@ struct au0828_rc { > int polling; > struct delayed_work work; > > - /* i2c slave address of external device (if used) */ > + /* i2c client address of external device (if used) */ > u16 i2c_dev_addr; > > int (*get_key_i2c)(struct au0828_rc *ir);
Re: [PATCH] Documentation/gpu: Document the situation with unqualified drm-memory-
On Fri, May 3, 2024 at 1:06 PM Tvrtko Ursulin wrote: > > > On 03/05/2024 16:58, Alex Deucher wrote: > > On Fri, May 3, 2024 at 11:33 AM Daniel Vetter wrote: > >> > >> On Fri, May 03, 2024 at 01:58:38PM +0100, Tvrtko Ursulin wrote: > >>> > >>> [And I forgot dri-devel.. doing well!] > >>> > >>> On 03/05/2024 13:40, Tvrtko Ursulin wrote: > > [Correcting Christian's email] > > On 03/05/2024 13:36, Tvrtko Ursulin wrote: > > From: Tvrtko Ursulin > > > > Currently it is not well defined what is drm-memory- compared to other > > categories. > > > > In practice the only driver which emits these keys is amdgpu and in them > > exposes the total memory use (including shared). > > > > Document that drm-memory- and drm-total-memory- are aliases to > > prevent any > > confusion in the future. > > > > While at it also clarify that the reserved sub-string 'memory' refers to > > the memory region component. > > > > Signed-off-by: Tvrtko Ursulin > > Cc: Alex Deucher > > Cc: Christian König > > Mea culpa, I copied the mistake from > 77d17c4cd0bf52eacfad88e63e8932eb45d643c5. :) > > Regards, > > Tvrtko > > > Cc: Rob Clark > > --- > >Documentation/gpu/drm-usage-stats.rst | 10 +- > >1 file changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/Documentation/gpu/drm-usage-stats.rst > > b/Documentation/gpu/drm-usage-stats.rst > > index 6dc299343b48..ef5c0a0aa477 100644 > > --- a/Documentation/gpu/drm-usage-stats.rst > > +++ b/Documentation/gpu/drm-usage-stats.rst > > @@ -128,7 +128,9 @@ Memory > >Each possible memory type which can be used to store buffer > > objects by the > >GPU in question shall be given a stable and unique name to be > > returned as the > > -string here. The name "memory" is reserved to refer to normal > > system memory. > > +string here. > > + > > +The region name "memory" is reserved to refer to normal system memory. > >Value shall reflect the amount of storage currently consumed by > > the buffer > >objects belong to this client, in the respective memory region. > > @@ -136,6 +138,9 @@ objects belong to this client, in the respective > > memory region. > >Default unit shall be bytes with optional unit specifiers of 'KiB' > > or 'MiB' > >indicating kibi- or mebi-bytes. > > +This is an alias for drm-total- and only one of the two > > should be > > +present. > >> > >> This feels a bit awkward and seems to needlessly complicate fdinfo uapi. > >> > >> - Could we just patch amdgpu to follow everyone else, and avoid the > >>special case? If there's no tool that relies on the special amdgpu > >>prefix then that would be a lot easier. > >> > >> - If that's not on the table, could we make everyone (with a suitable > >>helper or something) just print both variants, so that we again have > >>consisent fdinfo output? Or breaks that a different set of existing > >>tools. > >> > >> - Finally maybe could we get away with fixing amd by adding the common > >>format there, deprecating the old, fixing the tools that would break and > >>then maybe if we're lucky, remove the old one from amdgpu in a year or > >>so? > > > > I'm not really understanding what amdgpu is doing wrong. It seems to > > be following the documentation. Is the idea that we would like to > > deprecate drm-memory- in favor of drm-total-? > > If that's the case, I think the 3rd option is probably the best. We > > have a lot of tools and customers using this. 
It would have also been > > nice to have "memory" in the string for the newer ones to avoid > > conflicts with other things that might be a total or shared in the > > future, but I guess that ship has sailed. We should also note that > > drm-memory- is deprecated. While we are here, maybe we should > > clarify the semantics of resident, purgeable, and active. For > > example, isn't resident just a duplicate of total? If the memory was > > not resident, it would be in a different region. > > Amdgpu isn't doing anything wrong. It just appears when the format was > discussed no one noticed (me included) that the two keys are not clearly > described. And it looks like there also wasn't a plan to handle the unclear > duality in the future. > > For me deprecating sounds fine, the 3rd option. I understand we would > only make amdgpu emit both sets of keys and then remove drm-memory- in > due time. > > With regards to key naming, yeah, memory in the name would have been > nice. We had a lot of discussion on this topic but ship has indeed > sailed. It is probably workable for anything new that might come to add > their prefix. As long as it does not clash with the memory categories it > should be fine. > > In terms of resident semantics, think of it as VIRT vs RES in top(1).
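To make the VIRT vs RES analogy concrete with made-up numbers: a client that has created 8 MiB of buffers destined for region vram0, of which half is currently evicted elsewhere, could report

    drm-total-vram0:        8192 KiB
    drm-resident-vram0:     4096 KiB

so total tracks everything the client allocated against the region while resident only counts what is currently backed by it; the two diverge as soon as eviction or swap is possible, which is why resident is not simply a duplicate of total.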
Re: [PATCH v1 3/4] drm/amdgpu: add compute registers in ip dump for gfx10
On 5/3/2024 9:52 PM, Alex Deucher wrote: On Fri, May 3, 2024 at 12:09 PM Khatri, Sunil wrote: On 5/3/2024 9:18 PM, Khatri, Sunil wrote: On 5/3/2024 8:52 PM, Alex Deucher wrote: On Fri, May 3, 2024 at 4:45 AM Sunil Khatri wrote: add compute registers in set of registers to dump during ip dump for gfx10. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 42 +- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index 953df202953a..00c7a842ea3b 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -378,7 +378,47 @@ static const struct amdgpu_hwip_reg_entry gc_reg_list_10_1[] = { SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE0), SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE1), SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE2), - SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3) + SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3), + /* compute registers */ + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_VMID), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PERSISTENT_STATE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PIPE_PRIORITY), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUEUE_PRIORITY), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUANTUM), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_REQUEST), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_EVENTS), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_LO), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_SIZE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_WG_STATE_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_SIZE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_GDS_RESOURCE_STATE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_ERROR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR_MEM), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_LO), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_DW_CNT), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_WG_STATE_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_STATUS) The registers in patches 3 and 4 are multi-instance, so we should ideally print every instance of them rather than just one. Use nv_grbm_select() to select the pipes and queues. Make sure to protect access using the adev->srbm_mutex mutex. 
E.g., for the compute registers (patch 3): mutex_lock(&adev->srbm_mutex); for (i = 0; i < adev->gfx.mec.num_mec; ++i) { for (j = 0; j < adev->gfx.mec.num_pipe_per_mec; j++) { for (k = 0; k < adev->gfx.mec.num_queue_per_pipe; k++) { drm_printf("mec %d, pipe %d, queue %d\n", i, j, k); nv_grbm_select(adev, i, j, k, 0); for (reg = 0; reg < ARRAY_SIZE(compute_regs); reg++) drm_printf(...RREG(compute_regs[reg])); } } } nv_grbm_select(adev, 0, 0, 0, 0); mutex_unlock(&adev->srbm_mutex); For gfx registers (patch 4): mutex_lock(&adev->srbm_mutex); for (i = 0; i < adev->gfx.me.num_me; ++i) { for (j = 0; j < adev->gfx.me.num_pipe_per_me; j++) { for (k = 0; k < adev->gfx.me.num_queue_per_pipe; k++) { drm_printf("me %d, pipe %d, queue %d\n", i, j, k); nv_grbm_select(adev, i, j, k, 0); for (reg = 0; reg < ARRAY_SIZE(gfx_regs); reg++) drm_printf(...RREG(gfx_regs[reg])); I see one problem here, we dump the registers in mem
Re: Splat during driver probe
Hi Alex, yes, this is related to memory clearing changes. This is a known issue. I am working on a fix. Thanks, Arun. On 5/3/2024 6:46 PM, Alex Deucher wrote: Possibly related to Arun's new memory clearing changes. @Arunpravin can you take a look? Alex On Fri, May 3, 2024 at 9:10 AM Tvrtko Ursulin wrote: Hi, I tried asking on #radeon yesterday but no takers. I was wondering if the below splat is already known or not. Appears to happen due amdgpu_bo_release_notify genuniely running before amdgpu_ttm_set_buffer_funcs_status set the buffer funcs to enabled during device probe. I couldn't easily figure out what changed. Regards, Tvrtko [ 11.802030] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off. [ 11.802519] [ cut here ] [ 11.802530] WARNING: CPU: 5 PID: 435 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1378 amdgpu_bo_release_notify+0x20c/0x230 [amdgpu] [ 11.802916] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct amdgpu(E+) nft_chain_nat nf_tables ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security nfnetlink ip6table_filter ip6_tables iptable_filter cmac algif_hash ramoops algif_skcipher reed_solomon af_alg bnep qrtr mousedev intel_rapl_msr ath11k_pci(+) intel_rapl_common mhi snd_acp_sof_mach snd_acp_mach snd_sof_probes ath11k snd_soc_dmic kvm_amd cdc_mbim qmi_helpers joydev hid_multitouch cdc_wdm snd_sof_amd_vangogh snd_sof_pci kvm mac80211 snd_hda_codec_hdmi hci_uart crct10dif_pclmul amdxcp i2c_algo_bit snd_sof_amd_acp btqca drm_ttm_helper libarc4 crc32_pclmul btrtl snd_hda_intel snd_sof_xtensa_dsp btbcm ttm snd_intel_dspcfg btintel polyval_clmulni snd_sof drm_exec polyval_generic snd_hda_codec snd_sof_utils gpu_sched cfg80211 bluetooth snd_pci_acp5x gf128mul cdc_ncm [ 11.803070] sp5100_tco snd_hwdep ghash_clmulni_intel snd_soc_nau8821 snd_soc_max98388 drm_suballoc_helper sha512_ssse3 cdc_ether snd_acp_config drm_buddy atkbd snd_soc_core aesni_intel snd_hda_core video ecdh_generic crypto_simd libps2 usbnet cryptd rapl mii wdat_wdt vivaldi_fmap pcspkr snd_compress i2c_piix4 snd_pcm drm_display_helper ccp snd_soc_acpi cdc_acm rfkill wmi ltrf216a 8250_dw i2c_hid_acpi snd_timer snd i2c_hid industrialio soundcore hid_steam pkcs8_key_parser crypto_user loop fuse dm_mod ip_tables x_tables overlay ext4 crc16 mbcache jbd2 usbhid vfat fat btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq sdhci_pci cqhci nvme serio_raw sdhci xhci_pci xhci_pci_renesas crc32c_intel mmc_core nvme_core i8042 serio spi_amd [ 11.803448] CPU: 5 PID: 435 Comm: (udev-worker) Tainted: GW E 6.9.0-rc6 #3 dd3fb65c639fd86ff89731c604b1e804996e8989 [ 11.803471] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023 [ 11.803486] RIP: 0010:amdgpu_bo_release_notify+0x20c/0x230 [amdgpu] [ 11.803857] Code: 0b e9 a4 fe ff ff 48 ba ff ff ff ff ff ff ff 7f 31 f6 4c 89 e7 e8 44 e5 33 e6 eb 8d e8 dd dd 33 e6 eb a7 0f 0b e9 4d fe ff ff <0f> 0b eb 9c be 03 00 00 00 e8 f6 04 00 e6 eb 90 e8 8f 64 79 e6 66 [ 11.803890] RSP: 0018:a77bc1537568 EFLAGS: 00010282 [ 11.803904] RAX: ffea RBX: 8e1f0da89048 RCX: [ 11.803919] RDX: RSI: 0027 RDI: [ 11.803934] RBP: 8e1f6ec0ffb0 R08: R09: 0003 [ 11.803948] R10: a77bc15372f0 R11: 8e223ef7ffe8 R12: 8e1f0da89000 [ 11.803963] R13: 8e1f0da89180 R14: 8e1f6ec0ffb0 R15: 8e1f6ec0 [ 11.803977] FS: 7fa2b600b200() GS:8e222ee8() knlGS: [ 11.803994] CS: 0010 DS: ES: CR0: 
80050033 [ 11.804007] CR2: 55cac2aa7080 CR3: 00010f818000 CR4: 00350ef0 [ 11.804021] Call Trace: [ 11.804031] [ 11.804042] ? __warn+0x8c/0x180 [ 11.804057] ? amdgpu_bo_release_notify+0x20c/0x230 [amdgpu 03b1dca7e09918e3f45efb1acdfc90682f73e4c1] [ 11.804456] ? report_bug+0x191/0x1c0 [ 11.804473] ? handle_bug+0x3a/0x70 [ 11.804485] ? exc_invalid_op+0x17/0x70 [ 11.804497] ? asm_exc_invalid_op+0x1a/0x20 [ 11.804517] ? amdgpu_bo_release_notify+0x20c/0x230 [amdgpu 03b1dca7e09918e3f45efb1acdfc90682f73e4c1] [ 11.804894] ? amdgpu_bo_release_notify+0x14a/0x230 [amdgpu 03b1dca7e09918e3f45efb1acdfc90682f73e4c1] [ 11.805277] ttm_bo_release+0x115/0x330 [ttm 39c917822ce73b2a754d4183af7a0990991c66be] [ 11.805303] ? srso_return_thunk+0x5/0x5f [ 11.805315] ? __mutex_unlock_slowpath+0x3a/0x290 [ 11.805332] amdgpu_bo_free_kernel+0xd6/0x130 [amdgpu 03b1dca7e09918e3f45efb1acdfc90682f73e4c1] [ 11.805709] dm_helpers_free_gpu_mem+0x41/0x80 [amdgpu 03b1dca7e09918e3f45efb1acdfc90682f73e4c1] [ 11.806172] vg_clk_mgr_construct+0x2c4/0x490 [amdgpu 03b1dca7e09918e3f45efb1acdfc90682f73e4c1] [ 11.806645] dc_clk
Re: [PATCH v1 3/4] drm/amdgpu: add compute registers in ip dump for gfx10
On Fri, May 3, 2024 at 12:09 PM Khatri, Sunil wrote: > > > On 5/3/2024 9:18 PM, Khatri, Sunil wrote: > > > > On 5/3/2024 8:52 PM, Alex Deucher wrote: > >> On Fri, May 3, 2024 at 4:45 AM Sunil Khatri > >> wrote: > >>> add compute registers in set of registers to dump > >>> during ip dump for gfx10. > >>> > >>> Signed-off-by: Sunil Khatri > >>> --- > >>> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 42 > >>> +- > >>> 1 file changed, 41 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c > >>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c > >>> index 953df202953a..00c7a842ea3b 100644 > >>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c > >>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c > >>> @@ -378,7 +378,47 @@ static const struct amdgpu_hwip_reg_entry > >>> gc_reg_list_10_1[] = { > >>> SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE0), > >>> SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE1), > >>> SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE2), > >>> - SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3) > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3), > >>> + /* compute registers */ > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_VMID), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PERSISTENT_STATE), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PIPE_PRIORITY), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUEUE_PRIORITY), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUANTUM), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE_HI), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_RPTR), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR_HI), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_CONTROL), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR_HI), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_RPTR), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_CONTROL), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_REQUEST), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR_HI), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_CONTROL), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_RPTR), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_EVENTS), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_LO), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_HI), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_CONTROL), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_OFFSET), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_SIZE), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_WG_STATE_OFFSET), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_SIZE), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_GDS_RESOURCE_STATE), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_ERROR), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR_MEM), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_LO), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_HI), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_OFFSET), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_DW_CNT), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_WG_STATE_OFFSET), > >>> + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_STATUS) > >> The registers in patches 3 and 4 are multi-instance, so we should > 
>> ideally print every instance of them rather than just one. Use > >> nv_grbm_select() to select the pipes and queues. Make sure to protect > >> access using the adev->srbm_mutex mutex. > >> > >> E.g., for the compute registers (patch 3): > >> mutex_lock(&adev->srbm_mutex); > >> for (i = 0; i < adev->gfx.mec.num_mec; ++i) { > >> for (j = 0; j < adev->gfx.mec.num_pipe_per_mec; j++) { > >> for (k = 0; k < > >> adev->gfx.mec.num_queue_per_pipe; k++) { > >> drm_printf("mec %d, pipe %d, queue %d\n", i, j, k); > >> nv_grbm_select(adev, i, j, k, 0); > >> for (reg = 0; reg < ARRAY_SIZE(compute_regs); > >> reg++) > >> drm_printf(...RREG(compute_regs[reg])); > >> } > >> } > >> } > >> nv_grbm_select(adev, 0, 0, 0, 0); > >> mutex_unlock(&adev->srbm_mutex); > >> > >> For gfx registers (patch 4): > >> > >> mutex_lock(&adev->srbm_mutex); > >> for (i = 0; i < adev->gfx.me.num_me; ++i) { > >> for (j = 0; j < adev->
Re: [PATCH v1 3/4] drm/amdgpu: add compute registers in ip dump for gfx10
On 5/3/2024 9:18 PM, Khatri, Sunil wrote: On 5/3/2024 8:52 PM, Alex Deucher wrote: On Fri, May 3, 2024 at 4:45 AM Sunil Khatri wrote: add compute registers in set of registers to dump during ip dump for gfx10. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 42 +- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index 953df202953a..00c7a842ea3b 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -378,7 +378,47 @@ static const struct amdgpu_hwip_reg_entry gc_reg_list_10_1[] = { SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE0), SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE1), SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE2), - SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3) + SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3), + /* compute registers */ + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_VMID), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PERSISTENT_STATE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PIPE_PRIORITY), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUEUE_PRIORITY), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUANTUM), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_REQUEST), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_EVENTS), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_LO), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_SIZE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_WG_STATE_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_SIZE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_GDS_RESOURCE_STATE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_ERROR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR_MEM), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_LO), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_DW_CNT), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_WG_STATE_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_STATUS) The registers in patches 3 and 4 are multi-instance, so we should ideally print every instance of them rather than just one. Use nv_grbm_select() to select the pipes and queues. Make sure to protect access using the adev->srbm_mutex mutex. 
E.g., for the compute registers (patch 3): mutex_lock(&adev->srbm_mutex); for (i = 0; i < adev->gfx.mec.num_mec; ++i) { for (j = 0; j < adev->gfx.mec.num_pipe_per_mec; j++) { for (k = 0; k < adev->gfx.mec.num_queue_per_pipe; k++) { drm_printf("mec %d, pipe %d, queue %d\n", i, j, k); nv_grbm_select(adev, i, j, k, 0); for (reg = 0; reg < ARRAY_SIZE(compute_regs); reg++) drm_printf(...RREG(compute_regs[reg])); } } } nv_grbm_select(adev, 0, 0, 0, 0); mutex_unlock(&adev->srbm_mutex); For gfx registers (patch 4): mutex_lock(&adev->srbm_mutex); for (i = 0; i < adev->gfx.me.num_me; ++i) { for (j = 0; j < adev->gfx.me.num_pipe_per_me; j++) { for (k = 0; k < adev->gfx.me.num_queue_per_pipe; k++) { drm_printf("me %d, pipe %d, queue %d\n", i, j, k); nv_grbm_select(adev, i, j, k, 0); for (reg = 0; reg < ARRAY_SIZE(gfx_regs); reg++) drm_printf(...RREG(gfx_regs[reg])); I see one problem here, we dump the registers in memory allocated first and read before and store and then dump later when user read the devcoredump file. Here w
Re: [PATCH] Documentation/gpu: Document the situation with unqualified drm-memory-
On Fri, May 3, 2024 at 11:33 AM Daniel Vetter wrote: > > On Fri, May 03, 2024 at 01:58:38PM +0100, Tvrtko Ursulin wrote: > > > > [And I forgot dri-devel.. doing well!] > > > > On 03/05/2024 13:40, Tvrtko Ursulin wrote: > > > > > > [Correcting Christian's email] > > > > > > On 03/05/2024 13:36, Tvrtko Ursulin wrote: > > > > From: Tvrtko Ursulin > > > > > > > > Currently it is not well defined what is drm-memory- compared to other > > > > categories. > > > > > > > > In practice the only driver which emits these keys is amdgpu and in them > > > > exposes the total memory use (including shared). > > > > > > > > Document that drm-memory- and drm-total-memory- are aliases to > > > > prevent any > > > > confusion in the future. > > > > > > > > While at it also clarify that the reserved sub-string 'memory' refers to > > > > the memory region component. > > > > > > > > Signed-off-by: Tvrtko Ursulin > > > > Cc: Alex Deucher > > > > Cc: Christian König > > > > > > Mea culpa, I copied the mistake from > > > 77d17c4cd0bf52eacfad88e63e8932eb45d643c5. :) > > > > > > Regards, > > > > > > Tvrtko > > > > > > > Cc: Rob Clark > > > > --- > > > > Documentation/gpu/drm-usage-stats.rst | 10 +- > > > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/Documentation/gpu/drm-usage-stats.rst > > > > b/Documentation/gpu/drm-usage-stats.rst > > > > index 6dc299343b48..ef5c0a0aa477 100644 > > > > --- a/Documentation/gpu/drm-usage-stats.rst > > > > +++ b/Documentation/gpu/drm-usage-stats.rst > > > > @@ -128,7 +128,9 @@ Memory > > > > Each possible memory type which can be used to store buffer > > > > objects by the > > > > GPU in question shall be given a stable and unique name to be > > > > returned as the > > > > -string here. The name "memory" is reserved to refer to normal > > > > system memory. > > > > +string here. > > > > + > > > > +The region name "memory" is reserved to refer to normal system memory. > > > > Value shall reflect the amount of storage currently consumed by > > > > the buffer > > > > objects belong to this client, in the respective memory region. > > > > @@ -136,6 +138,9 @@ objects belong to this client, in the respective > > > > memory region. > > > > Default unit shall be bytes with optional unit specifiers of 'KiB' > > > > or 'MiB' > > > > indicating kibi- or mebi-bytes. > > > > +This is an alias for drm-total- and only one of the two > > > > should be > > > > +present. > > This feels a bit awkward and seems to needlessly complicate fdinfo uapi. > > - Could we just patch amdgpu to follow everyone else, and avoid the > special case? If there's no tool that relies on the special amdgpu > prefix then that would be a lot easier. > > - If that's not on the table, could we make everyone (with a suitable > helper or something) just print both variants, so that we again have > consisent fdinfo output? Or breaks that a different set of existing > tools. > > - Finally maybe could we get away with fixing amd by adding the common > format there, deprecating the old, fixing the tools that would break and > then maybe if we're lucky, remove the old one from amdgpu in a year or > so? I'm not really understanding what amdgpu is doing wrong. It seems to be following the documentation. Is the idea that we would like to deprecate drm-memory- in favor of drm-total-? If that's the case, I think the 3rd option is probably the best. We have a lot of tools and customers using this. 
It would have also been nice to have "memory" in the string for the newer ones to avoid conflicts with other things that might be a total or shared in the future, but I guess that ship has sailed. We should also note that drm-memory- is deprecated. While we are here, maybe we should clarify the semantics of resident, purgeable, and active. For example, isn't resident just a duplicate of total? If the memory was not resident, it would be in a different region. Alex > > Uapi that's "either do $foo or on this one driver, do $bar" is just > guaranteed to fragement the ecosystem, so imo that should be the absolute > last resort. > -Sima > > > > > + > > > > - drm-shared-: [KiB|MiB] > > > > The total size of buffers that are shared with another file (e.g., > > > > have more > > > > @@ -145,6 +150,9 @@ than a single handle). > > > > The total size of buffers that including shared and private memory. > > > > +This is an alias for drm-memory- and only one of the two > > > > should be > > > > +present. > > > > + > > > > - drm-resident-: [KiB|MiB] > > > > The total size of buffers that are resident in the specified region. > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
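A sketch of what the transition could look like in amdgpu's fdinfo callback, emitting both spellings per region until the deprecated key is retired (the helper name is illustrative, not an existing kernel function):

    /* Illustrative only: print the common key plus the legacy amdgpu
     * alias for one memory region during the deprecation window. */
    static void print_total(struct drm_printer *p, const char *region, u64 sz)
    {
            drm_printf(p, "drm-total-%s:\t%llu KiB\n", region, sz / 1024UL);
            drm_printf(p, "drm-memory-%s:\t%llu KiB\n", region, sz / 1024UL); /* deprecated */
    }

Existing tools keyed on drm-memory- keep working, new tools see the common format, and dropping the second drm_printf later completes the deprecation.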
Re: [PATCH v1 3/4] drm/amdgpu: add compute registers in ip dump for gfx10
On 5/3/2024 8:52 PM, Alex Deucher wrote: On Fri, May 3, 2024 at 4:45 AM Sunil Khatri wrote: add compute registers in set of registers to dump during ip dump for gfx10. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 42 +- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index 953df202953a..00c7a842ea3b 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -378,7 +378,47 @@ static const struct amdgpu_hwip_reg_entry gc_reg_list_10_1[] = { SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE0), SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE1), SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE2), - SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3) + SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3), + /* compute registers */ + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_VMID), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PERSISTENT_STATE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PIPE_PRIORITY), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUEUE_PRIORITY), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUANTUM), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_REQUEST), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_EVENTS), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_LO), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_SIZE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_WG_STATE_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_SIZE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_GDS_RESOURCE_STATE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_ERROR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR_MEM), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_LO), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_DW_CNT), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_WG_STATE_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_STATUS) The registers in patches 3 and 4 are multi-instance, so we should ideally print every instance of them rather than just one. Use nv_grbm_select() to select the pipes and queues. Make sure to protect access using the adev->srbm_mutex mutex. 
E.g., for the compute registers (patch 3): mutex_lock(&adev->srbm_mutex); for (i = 0; i < adev->gfx.mec.num_mec; ++i) { for (j = 0; j < adev->gfx.mec.num_pipe_per_mec; j++) { for (k = 0; k < adev->gfx.mec.num_queue_per_pipe; k++) { drm_printf("mec %d, pipe %d, queue %d\n", i, j, k); nv_grbm_select(adev, i, j, k, 0); for (reg = 0; reg < ARRAY_SIZE(compute_regs); reg++) drm_printf(...RREG(compute_regs[reg])); } } } nv_grbm_select(adev, 0, 0, 0, 0); mutex_unlock(&adev->srbm_mutex); For gfx registers (patch 4): mutex_lock(&adev->srbm_mutex); for (i = 0; i < adev->gfx.me.num_me; ++i) { for (j = 0; j < adev->gfx.me.num_pipe_per_me; j++) { for (k = 0; k < adev->gfx.me.num_queue_per_pipe; k++) { drm_printf("me %d, pipe %d, queue %d\n", i, j, k); nv_grbm_select(adev, i, j, k, 0); for (reg = 0; reg < ARRAY_SIZE(gfx_regs); reg++) drm_printf(...RREG(gfx_regs[reg])); } } } nv_grbm_select(adev, 0, 0, 0, 0); mutex_unlock(&adev->srbm_mutex); Thanks for pointing that out and suggesting the sample code of how it should be. Will take c
Re: Allow setting a power cap that's lower than recommended
On 5/3/2024 07:05, fililip wrote: This patch allows setting a low power cap if ignore_min_pcap is set to 1. Signed-off-by: fililip Rather than an attachment you should send the patch inline. That would mean that your commit message and SoB should be at the top of the patch itself. If you're not aware of it, you should read through: https://www.kernel.org/doc/html/latest/process/submitting-patches.html Basically a maintainer should be able to run: 'b4 shazam $URL' And it would download the patch and apply it. That being said, my comments on the patch content: As Alex mentioned a concern about the maintenance of more parameters, maybe it's worth making the sysfs file of the limit "writable" instead. Then when it's written the kernel driver can add_taint() for TAINT_FIRMWARE_WORKAROUND when in use so we can see it clearly in logs.
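A rough sketch of the add_taint() idea, with hypothetical names (min_power_limit, apply_power_cap) since the real store callback would live in amdgpu's hwmon code:

    #include <linux/device.h>
    #include <linux/kernel.h>

    static int apply_power_cap(struct device *dev, u32 value); /* hypothetical hw call */

    /* Accept a below-recommended cap, but taint so the override is visible. */
    static int store_power_cap(struct device *dev, u32 value, u32 min_power_limit)
    {
            if (value < min_power_limit) {
                    add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
                    dev_warn(dev, "power cap %u below recommended minimum %u\n",
                             value, min_power_limit);
            }
            return apply_power_cap(dev, value);
    }

add_taint() with LOCKDEP_STILL_OK is the usual form when the taint does not invalidate lockdep state, and TAINT_FIRMWARE_WORKAROUND shows up as 'I' in the taint flags of any subsequent oops or bug report.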
Re: Allow setting a power cap that's lower than recommended
On Fri, May 3, 2024 at 9:27 AM fililip wrote: > > This patch allows setting a low power cap if ignore_min_pcap is set to 1. > > Signed-off-by: fililip Please generate a proper signed-off git patch and use git-send-email to send it out. Also, I don't want to add more module parameters. Alex
Re: [PATCH] Documentation/gpu: Document the situation with unqualified drm-memory-
On Fri, May 03, 2024 at 01:58:38PM +0100, Tvrtko Ursulin wrote: > > [And I forgot dri-devel.. doing well!] > > On 03/05/2024 13:40, Tvrtko Ursulin wrote: > > > > [Correcting Christian's email] > > > > On 03/05/2024 13:36, Tvrtko Ursulin wrote: > > > From: Tvrtko Ursulin > > > > > > Currently it is not well defined what is drm-memory- compared to other > > > categories. > > > > > > In practice the only driver which emits these keys is amdgpu and in them > > > exposes the total memory use (including shared). > > > > > > Document that drm-memory- and drm-total-memory- are aliases to > > > prevent any > > > confusion in the future. > > > > > > While at it also clarify that the reserved sub-string 'memory' refers to > > > the memory region component. > > > > > > Signed-off-by: Tvrtko Ursulin > > > Cc: Alex Deucher > > > Cc: Christian König > > > > Mea culpa, I copied the mistake from > > 77d17c4cd0bf52eacfad88e63e8932eb45d643c5. :) > > > > Regards, > > > > Tvrtko > > > > > Cc: Rob Clark > > > --- > > > Documentation/gpu/drm-usage-stats.rst | 10 +- > > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > > > diff --git a/Documentation/gpu/drm-usage-stats.rst > > > b/Documentation/gpu/drm-usage-stats.rst > > > index 6dc299343b48..ef5c0a0aa477 100644 > > > --- a/Documentation/gpu/drm-usage-stats.rst > > > +++ b/Documentation/gpu/drm-usage-stats.rst > > > @@ -128,7 +128,9 @@ Memory > > > Each possible memory type which can be used to store buffer > > > objects by the > > > GPU in question shall be given a stable and unique name to be > > > returned as the > > > -string here. The name "memory" is reserved to refer to normal > > > system memory. > > > +string here. > > > + > > > +The region name "memory" is reserved to refer to normal system memory. > > > Value shall reflect the amount of storage currently consumed by > > > the buffer > > > objects belong to this client, in the respective memory region. > > > @@ -136,6 +138,9 @@ objects belong to this client, in the respective > > > memory region. > > > Default unit shall be bytes with optional unit specifiers of 'KiB' > > > or 'MiB' > > > indicating kibi- or mebi-bytes. > > > +This is an alias for drm-total- and only one of the two > > > should be > > > +present. This feels a bit awkward and seems to needlessly complicate fdinfo uapi. - Could we just patch amdgpu to follow everyone else, and avoid the special case? If there's no tool that relies on the special amdgpu prefix then that would be a lot easier. - If that's not on the table, could we make everyone (with a suitable helper or something) just print both variants, so that we again have consistent fdinfo output? Or does that break a different set of existing tools? - Finally maybe could we get away with fixing amd by adding the common format there, deprecating the old, fixing the tools that would break and then maybe if we're lucky, remove the old one from amdgpu in a year or so? Uapi that's "either do $foo or on this one driver, do $bar" is just guaranteed to fragment the ecosystem, so imo that should be the absolute last resort. -Sima > > > + > > > - drm-shared-: [KiB|MiB] > > > The total size of buffers that are shared with another file (e.g., > > > have more > > > @@ -145,6 +150,9 @@ than a single handle). > > > The total size of buffers that including shared and private memory. > > > +This is an alias for drm-memory- and only one of the two > > > should be > > > +present. > > > + > > > - drm-resident-: [KiB|MiB] > > > The total size of buffers that are resident in the specified region.
-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH v1 3/4] drm/amdgpu: add compute registers in ip dump for gfx10
On Fri, May 3, 2024 at 4:45 AM Sunil Khatri wrote: > > add compute registers in set of registers to dump > during ip dump for gfx10. > > Signed-off-by: Sunil Khatri > --- > drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 42 +- > 1 file changed, 41 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c > b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c > index 953df202953a..00c7a842ea3b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c > @@ -378,7 +378,47 @@ static const struct amdgpu_hwip_reg_entry > gc_reg_list_10_1[] = { > SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE0), > SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE1), > SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE2), > - SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3) > + SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3), > + /* compute registers */ > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_VMID), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PERSISTENT_STATE), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PIPE_PRIORITY), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUEUE_PRIORITY), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUANTUM), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE_HI), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_RPTR), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR_HI), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_CONTROL), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR_HI), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_RPTR), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_CONTROL), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_REQUEST), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR_HI), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_CONTROL), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_RPTR), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_EVENTS), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_LO), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_HI), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_CONTROL), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_OFFSET), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_SIZE), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_WG_STATE_OFFSET), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_SIZE), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_GDS_RESOURCE_STATE), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_ERROR), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR_MEM), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_LO), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_HI), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_OFFSET), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_DW_CNT), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_WG_STATE_OFFSET), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_STATUS) The registers in patches 3 and 4 are multi-instance, so we should ideally print every instance of them rather than just one. Use nv_grbm_select() to select the pipes and queues. Make sure to protect access using the adev->srbm_mutex mutex. 
E.g., for the compute registers (patch 3): mutex_lock(&adev->srbm_mutex); for (i = 0; i < adev->gfx.mec.num_mec; ++i) { for (j = 0; j < adev->gfx.mec.num_pipe_per_mec; j++) { for (k = 0; k < adev->gfx.mec.num_queue_per_pipe; k++) { drm_printf("mec %d, pipe %d, queue %d\n", i, j, k); nv_grbm_select(adev, i, j, k, 0); for (reg = 0; reg < ARRAY_SIZE(compute_regs); reg++) drm_printf(...RREG(compute_regs[reg])); } } } nv_grbm_select(adev, 0, 0, 0, 0); mutex_unlock(&adev->srbm_mutex); For gfx registers (patch 4): mutex_lock(&adev->srbm_mutex); for (i = 0; i < adev->gfx.me.num_me; ++i) { for (j = 0; j < adev->gfx.me.num_pipe_per_me; j++) { for (k = 0; k < adev->gfx.me.num_queue_per_pipe; k++) { drm_printf("me %d, pipe %d, queue %d\n", i, j, k); nv_grbm_select(adev, i, j, k, 0); for (reg = 0; reg < ARRAY_SIZE(gfx_regs); reg++) drm_printf(...RREG(gfx_regs[reg])); } } } nv_grbm_select(adev, 0, 0, 0, 0); mutex_unlock(&adev->srbm_mutex); Alex > }; > > static const struct soc15_reg_golde
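Fleshed out, the suggested walk could look roughly like the sketch below; gc_cp_reg_list_10 stands in for a compute-only register list and the drm_printf() calls for whatever buffering the devcoredump path actually uses, so treat the names as illustrative rather than the final patch:

    static void gfx_v10_0_dump_compute_queues(struct amdgpu_device *adev,
                                              struct drm_printer *p)
    {
            int i, j, k, reg;

            mutex_lock(&adev->srbm_mutex);
            for (i = 0; i < adev->gfx.mec.num_mec; i++) {
                    for (j = 0; j < adev->gfx.mec.num_pipe_per_mec; j++) {
                            for (k = 0; k < adev->gfx.mec.num_queue_per_pipe; k++) {
                                    drm_printf(p, "mec %d, pipe %d, queue %d\n", i, j, k);
                                    nv_grbm_select(adev, i, j, k, 0);
                                    for (reg = 0; reg < ARRAY_SIZE(gc_cp_reg_list_10); reg++)
                                            drm_printf(p, "%s: 0x%08x\n",
                                                       gc_cp_reg_list_10[reg].reg_name,
                                                       RREG32(SOC15_REG_ENTRY_OFFSET(gc_cp_reg_list_10[reg])));
                            }
                    }
            }
            /* Restore the default selection before releasing the mutex. */
            nv_grbm_select(adev, 0, 0, 0, 0);
            mutex_unlock(&adev->srbm_mutex);
    }

The gfx variant from patch 4 is identical with adev->gfx.me.num_me, num_pipe_per_me and num_queue_per_pipe substituted for the mec counts.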
Re: [PATCH v1 1/4] drm/amdgpu: add CP headers registers to gfx10 dump
On Fri, May 3, 2024 at 4:45 AM Sunil Khatri wrote: > > add registers in the ip dump for CP headers in gfx10 > > Signed-off-by: Sunil Khatri Patches 1, 2 are: Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c > b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c > index 3171ed5e5af3..61c1e997f794 100644 > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c > @@ -366,7 +366,14 @@ static const struct amdgpu_hwip_reg_entry > gc_reg_list_10_1[] = { > SOC15_REG_ENTRY_STR(GC, 0, mmRLC_GPM_DEBUG_INST_A), > SOC15_REG_ENTRY_STR(GC, 0, mmRLC_GPM_DEBUG_INST_B), > SOC15_REG_ENTRY_STR(GC, 0, mmRLC_GPM_DEBUG_INST_ADDR), > - SOC15_REG_ENTRY_STR(GC, 0, mmRLC_LX6_CORE_PDEBUG_INST) > + SOC15_REG_ENTRY_STR(GC, 0, mmRLC_LX6_CORE_PDEBUG_INST), > + /* cp header registers */ > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_HEADER_DUMP), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_MEC_ME1_HEADER_DUMP), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_MEC_ME2_HEADER_DUMP), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_PFP_HEADER_DUMP), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_ME_HEADER_DUMP), > + SOC15_REG_ENTRY_STR(GC, 0, mmCP_MES_HEADER_DUMP) > }; > > static const struct soc15_reg_golden golden_settings_gc_10_1[] = { > -- > 2.34.1 >
Re: Proposal to add CRIU support to DRM render nodes
On 2024-04-16 10:04, Tvrtko Ursulin wrote: > > On 01/04/2024 18:58, Felix Kuehling wrote: >> >> On 2024-04-01 12:56, Tvrtko Ursulin wrote: >>> >>> On 01/04/2024 17:37, Felix Kuehling wrote: On 2024-04-01 11:09, Tvrtko Ursulin wrote: > > On 28/03/2024 20:42, Felix Kuehling wrote: >> >> On 2024-03-28 12:03, Tvrtko Ursulin wrote: >>> >>> Hi Felix, >>> >>> I had one more thought while browsing around the amdgpu CRIU plugin. It >>> appears it relies on the KFD support being compiled in and /dev/kfd >>> present, correct? AFAICT at least, it relies on that to figure out the >>> amdgpu DRM node. >>> >>> In would be probably good to consider designing things without that >>> dependency. So that checkpointing an application which does not use >>> /dev/kfd is possible. Or if the kernel does not even have the KFD >>> support compiled in. >> >> Yeah, if we want to support graphics apps that don't use KFD, we should >> definitely do that. Currently we get a lot of topology information from >> KFD, not even from the /dev/kfd device but from the sysfs nodes exposed >> by KFD. We'd need to get GPU device info from the render nodes instead. >> And if KFD is available, we may need to integrate both sources of >> information. >> >> >>> >>> It could perhaps mean no more than adding some GPU discovery code into >>> CRIU. Which shuold be flexible enough to account for things like >>> re-assigned minor numbers due driver reload. >> >> Do you mean adding GPU discovery to the core CRIU, or to the plugin. I >> was thinking this is still part of the plugin. > > Yes I agree. I was only thinking about adding some DRM device discovery > code in a more decoupled fashion from the current plugin, for both the > reason discussed above (decoupling a bit from reliance on kfd sysfs), and > then also if/when a new DRM driver might want to implement this the code > could be move to some common plugin area. > > I am not sure how feasible that would be though. The "gpu id" concept and > it's matching in the current kernel code and CRIU plugin - is that value > tied to the physical GPU instance or how it works? The concept of the GPU ID is that it's stable while the system is up, even when devices get added and removed dynamically. It was baked into the API early on, but I don't think we ever fully validated device hot plug. I think the closest we're getting is with our latest MI GPUs and dynamic partition mode change. >>> >>> Doesn't it read the saved gpu id from the image file while doing restore >>> and tries to open the render node to match it? Maybe I am misreading the >>> code.. But if it does, does it imply that in practice it could be stable >>> across reboots? Or that it is not possible to restore to a different >>> instance of maybe the same GPU model installed in a system? >> >> Ah, the idea is, that when you restore on a different system, you may get >> different GPU IDs. Or you may checkpoint an app running on GPU 1 but restore >> it on GPU 2 on the same system. That's why we need to translate GPU IDs in >> restored applications. User mode still uses the old GPU IDs, but the kernel >> mode driver translates them to the actual GPU IDs of the GPUs that the >> process was restored on. > > I see.. I think. Normal flow is ppd->user_gpu_id set during client init, but > for restored clients it gets overriden during restore so that any further > ioctls can actually not instantly fail. 
> > And then in amdgpu_plugin_restore_file, when it is opening the render node, > it relies on the kfd topology to have filled in (more or less) the > target_gpu_id corresponding to the render node gpu id of the target GPU - the > one associated with the new kfd gpu_id? Yes. > > I am digging into this because I am trying to see if some part of GPU > discovery could somehow be decoupled, so I could offer to work on at least that > until you start to tackle the main body of the feature. But it looks properly > tangled up. OK. Most of the interesting plugin code should be in amdgpu_plugin_topology.c. It currently has some pretty complicated logic to find a set of devices that matches the topology in the checkpoint, including shader ISA versions, numbers of compute units, memory sizes, firmware versions and IO-Links between GPUs. This was originally done to support P2P with XGMI links. I'm not sure we ever updated it to properly support PCIe P2P. > > Do you have any suggestions on what I could help with? Maybe developing > some sort of drm device enumeration library if you see a way that would be > useful in decoupling the device discovery from kfd. We would need to define > what sort of information you would need to be queryable from it. Maybe. I think a lot of device information is available with some amd
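To make the gpu_id translation described in this thread concrete, here is a hypothetical sketch. The user_gpu_id field follows the ppd->user_gpu_id naming visible above; the helper name and the surrounding shape are invented for illustration, not taken from the real KFD code:

```
/*
 * Hypothetical sketch: user mode keeps using the gpu_id recorded at
 * checkpoint time, and the driver maps it to the per-process device
 * the process was actually restored on.
 */
static struct kfd_process_device *
pdd_from_user_gpu_id(struct kfd_process *p, u32 user_gpu_id)
{
	int i;

	for (i = 0; i < p->n_pdds; i++)
		if (p->pdds[i]->user_gpu_id == user_gpu_id)
			return p->pdds[i]; /* carries the real, post-restore GPU */

	return NULL; /* id from a stale or foreign checkpoint image */
}
```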
Re: [RFC 5/5] drm/amdgpu: Only show VRAM in fdinfo if it exists
On Fri, May 3, 2024 at 10:01 AM Tvrtko Ursulin wrote: > > > On 03/05/2024 14:47, Alex Deucher wrote: > > On Fri, May 3, 2024 at 3:50 AM Tvrtko Ursulin > > wrote: > >> > >> > >> On 02/05/2024 14:16, Christian König wrote: > >>> Am 30.04.24 um 19:27 schrieb Tvrtko Ursulin: > From: Tvrtko Ursulin > > Do not emit the key-value pairs if the VRAM does not exist, i.e. VRAM > placement is not valid and accessible. > >>> > >>> Yeah, that's unfortunately rather misleading. > >>> > >>> Even APUs have VRAM or rather stolen system memory which is managed by > >>> the graphics driver. > >>> > >>> We only have a single compute model which really doesn't have VRAM at all. > >> > >> Hm what is misleading and how more precisely? :) Maybe in other words, > >> if is_app_apu is not the right criterion to know when TTM_PL_VRAM is > >> impossible, what is? Is the compute model you mention the only thing > >> which sets is_app_apu and uses the dummy vram manager? > > > > Probably better to check if adev->gmc.real_vram_size is non-0. > > Hmm "real VRAM" - will that handle APUs correctly? Yes. In the client APU case "VRAM" will be the UMA carve out region reserved by the sbios. > > I am looking at this: > > if (!adev->gmc.is_app_apu) { > man->func = &amdgpu_vram_mgr_func; > > err = drm_buddy_init(&mgr->mm, man->size, PAGE_SIZE); > if (err) > return err; > } else { > man->func = &amdgpu_dummy_vram_mgr_func; > DRM_INFO("Setup dummy vram mgr\n"); > } > > And assuming that unless the dummy manager is used, TTM_PL_VRAM will be > valid. Wrong assumption? Today checking is_app_apu would be fine, but it's supposed to distinguish datacenter APUs, not whether or not the device has a "VRAM" pool, and its usage has gotten sort of overloaded. Just want to avoid overloading what it means in too many more places. Alex > > Regards, > > Tvrtko > > > > Alex > > > >> > >> Regards, > >> > >> Tvrtko > >> > >>> Regards, > >>> Christian. 
> >>> > > Signed-off-by: Tvrtko Ursulin > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 29 +- > 1 file changed, 17 insertions(+), 12 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > index a09944104c41..603a5c010f5d 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > @@ -83,25 +83,30 @@ void amdgpu_show_fdinfo(struct drm_printer *p, > struct drm_file *file) > */ > drm_printf(p, "pasid:\t%u\n", fpriv->vm.pasid); > -drm_printf(p, "drm-memory-vram:\t%llu KiB\n", stats.vram/1024UL); > drm_printf(p, "drm-memory-gtt: \t%llu KiB\n", stats.gtt/1024UL); > drm_printf(p, "drm-memory-cpu: \t%llu KiB\n", stats.cpu/1024UL); > -drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n", > - stats.visible_vram/1024UL); > -drm_printf(p, "amd-evicted-vram:\t%llu KiB\n", > - stats.evicted_vram/1024UL); > -drm_printf(p, "amd-evicted-visible-vram:\t%llu KiB\n", > - stats.evicted_visible_vram/1024UL); > -drm_printf(p, "amd-requested-vram:\t%llu KiB\n", > - stats.requested_vram/1024UL); > -drm_printf(p, "amd-requested-visible-vram:\t%llu KiB\n", > - stats.requested_visible_vram/1024UL); > drm_printf(p, "amd-requested-gtt:\t%llu KiB\n", > stats.requested_gtt/1024UL); > -drm_printf(p, "drm-shared-vram:\t%llu KiB\n", > stats.vram_shared/1024UL); > drm_printf(p, "drm-shared-gtt:\t%llu KiB\n", > stats.gtt_shared/1024UL); > drm_printf(p, "drm-shared-cpu:\t%llu KiB\n", > stats.cpu_shared/1024UL); > +if (!adev->gmc.is_app_apu) { > +drm_printf(p, "drm-memory-vram:\t%llu KiB\n", > + stats.vram/1024UL); > +drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n", > + stats.visible_vram/1024UL); > +drm_printf(p, "amd-evicted-vram:\t%llu KiB\n", > + stats.evicted_vram/1024UL); > +drm_printf(p, "amd-evicted-visible-vram:\t%llu KiB\n", > + stats.evicted_visible_vram/1024UL); > +drm_printf(p, "amd-requested-vram:\t%llu KiB\n", > + stats.requested_vram/1024UL); > +drm_printf(p, "amd-requested-visible-vram:\t%llu KiB\n", > + stats.requested_visible_vram/1024UL); > +drm_printf(p, "drm-shared-vram:\t%llu KiB\n", > + stats.vram_shared/1024UL); > +} > + > for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) { > if (!usage[hw_ip]) > continue; > >>>
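Spelling out Alex's suggestion against the hunk above, a sketch of the alternative gate with everything else in amdgpu_show_fdinfo unchanged:

```
/* Sketch: gate the VRAM keys on an actual VRAM pool existing, rather
 * than on is_app_apu, per the suggestion in this thread. */
if (adev->gmc.real_vram_size) {
	drm_printf(p, "drm-memory-vram:\t%llu KiB\n", stats.vram/1024UL);
	/* ...the remaining amd-*-vram and drm-shared-vram keys... */
}
```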
Re: [RFC 0/5] Add capacity key to fdinfo
On Fri, May 3, 2024 at 7:50 AM Tvrtko Ursulin wrote: > > > On 02/05/2024 16:00, Alex Deucher wrote: > > On Thu, May 2, 2024 at 10:43 AM Tvrtko Ursulin > > wrote: > >> > >> > >> On 02/05/2024 14:07, Christian König wrote: > >>> Am 01.05.24 um 15:27 schrieb Tvrtko Ursulin: > > Hi Alex, > > On 30/04/2024 19:32, Alex Deucher wrote: > > On Tue, Apr 30, 2024 at 1:27 PM Tvrtko Ursulin > > wrote: > >> > >> From: Tvrtko Ursulin > >> > >> I have noticed AMD GPUs can have more than one "engine" (ring?) of > >> the same type > >> but amdgpu is not reporting that in fdinfo using the capacity engine > >> tag. > >> > >> This series is therefore an attempt to improve that, but only an RFC > >> since it is > >> quite likely I got stuff wrong on the first attempt. Or if not wrong > >> it may not > >> be very beneficial in AMDs case. > >> > >> So I tried to figure out how to count and store the number of > >> instances of an > >> "engine" type and spotted that could perhaps be used in more than > >> one place in > >> the driver. I was more than a little bit confused by the ip_instance > >> and uapi > >> rings, then how rings are selected to context entities internally. > >> Anyway.. > >> hopefully it is a simple enough series to easily spot any such large > >> misses. > >> > >> End result should be that, assuming two "engine" instances, one > >> fully loaded and > >> one idle will only report client using 50% of that engine type. > > > > That would only be true if there are multiple instantiations of the IP > > on the chip which in most cases is not true. In most cases there is > > one instance of the IP that can be fed from multiple rings. E.g. for > > graphics and compute, all of the rings ultimately feed into the same > > compute units on the chip. So if you have a gfx ring and a compute > > ring, you can schedule work to them asynchronously, but ultimately > > whether they execute serially or in parallel depends on the actual > > shader code in the command buffers and the extent to which it can > > utilize the available compute units in the shader cores. > > This is the same as with Intel/i915. Fdinfo is not intended to provide > utilisation of EUs and such, just how busy are the "entities" kernel > submits to. So doing something like in this series would make the > reporting more similar between the two drivers. > > I think both the 0-800% or 0-100% range (taking 8 ring compute as an > example) can be misleading for different workloads. <800% in > the former doesn't mean one can send more work, and the same goes for <100% in the latter. > >>> > >>> Yeah, I think that's what Alex tries to describe. By using 8 compute > >>> rings your 800% load is actually incorrect and quite misleading. > >>> > >>> Background is that those 8 compute rings won't be active all at the same > >>> time, but rather waiting on each other for resources. > >>> > >>> But this "waiting" is unfortunately considered execution time since the > >>> used approach is actually not really capable of separating waiting and > >>> execution time. > >> > >> Right, so 800% is what gputop could be suggesting today, by virtue of the fact that 8 > >> contexts/clients can each use 100% if they only use a subset of compute > >> units. I was proposing to expose the capacity in fdinfo so it can be > >> scaled down and then discussing how both situations have pros and cons. > >> > There is also a parallel with the CPU world here and hyper threading, if not wider, where "What does 100% actually mean?" is also wishy-washy. 
> > Also note that the reporting of actual time based values in fdinfo > would not be changing with this series. > > Or if you can guide me towards how to distinguish real vs fake > parallelism in HW IP blocks I could modify the series to only add > capacity tags where there are truly independent blocks. That would be > different from i915 though, where I did not bother with that > distinction. (For reasons that assignment of for instance EUs to > compute "rings" (command streamers in i915) was supposed to be > possible to re-configure on the fly. So it did not make sense to try > and be super smart in fdinfo.) > >>> > >>> Well exactly that's the point we don't really have truly independent > >>> blocks on AMD hardware. > >>> > >>> There are things like independent SDMA instances, but those are meant to > >>> be used like the first instance for uploads and the second for downloads > >>> etc.. When you use both instances for the same job they will pretty much > >>> limit each other because of a single resource. > >> > >> So _never_ multiple instances of the same IP block? No video decode, > >> encode, anything? > > > > Some chips have multiple encode/decode IP blocks that are actually >
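For readers not following drm-usage-stats closely, the capacity key under discussion is the standard one, emitted alongside the per-engine busyness time. With something like two truly independent SDMA instances, a client's fdinfo would carry lines along these lines (values invented), and tools like gputop can scale the accumulated time by the capacity:

```
drm-driver:     amdgpu
drm-engine-sdma:        123456789 ns
drm-engine-capacity-sdma:       2
```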
Re: [PATCH] dm/amd/pm: Fix problems with reboot/shutdown for some SMU 13.0.4/13.0.11 users
On Thu, May 2, 2024 at 3:22 PM Mario Limonciello wrote: > > Limit the workaround introduced by commit 31729e8c21ec ("drm/amd/pm: fixes > a random hang in S4 for SMU v13.0.4/11") to only run in the s4 path. > > Cc: Tim Huang > Fixes: 31729e8c21ec ("drm/amd/pm: fixes a random hang in S4 for SMU > v13.0.4/11") > Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3351 > Signed-off-by: Mario Limonciello Acked-by: Alex Deucher > --- > I tested this on SMU 13.0.4 for ~85 cycles with this script, BIOS 1.1.0.2a and > didn't observe any hangs. > > ``` > #!/bin/sh > echo test_resume > /sys/power/disk > i=1 > while [ : ]; do > > echo "Starting cycle $i" > echo disk > /sys/power/state > echo "Ending cycle $i" > i=$((i+1)) > sleep 5 > done > ``` > > drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c > b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c > index 949131bd1ecb..4abfcd32747d 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c > @@ -226,7 +226,7 @@ static int smu_v13_0_4_system_features_control(struct > smu_context *smu, bool en) > struct amdgpu_device *adev = smu->adev; > int ret = 0; > > - if (!en && !adev->in_s0ix) { > + if (!en && adev->in_s4) { > /* Adds a GFX reset as workaround just before sending the > * MP1_UNLOAD message to prevent GC/RLC/PMFW from entering > * an invalid state. > -- > 2.43.0 >
Re: [RFC 5/5] drm/amdgpu: Only show VRAM in fdinfo if it exists
On Fri, May 3, 2024 at 3:50 AM Tvrtko Ursulin wrote: > > > On 02/05/2024 14:16, Christian König wrote: > > Am 30.04.24 um 19:27 schrieb Tvrtko Ursulin: > >> From: Tvrtko Ursulin > >> > >> Do not emit the key-value pairs if the VRAM does not exist, i.e. VRAM > >> placement is not valid and accessible. > > > > Yeah, that's unfortunately rather misleading. > > > > Even APUs have VRAM or rather stolen system memory which is managed by > > the graphics driver. > > > > We only have a single compute model which really doesn't have VRAM at all. > > Hm what is misleading and how more precisely? :) Maybe in other words, > if is_app_apu is not the right criterion to know when TTM_PL_VRAM is > impossible, what is? Is the compute model you mention the only thing > which sets is_app_apu and uses the dummy vram manager? Probably better to check if adev->gmc.real_vram_size is non-0. Alex > > Regards, > > Tvrtko > > > Regards, > > Christian. > > > >> > >> Signed-off-by: Tvrtko Ursulin > >> --- > >> drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 29 +- > >> 1 file changed, 17 insertions(+), 12 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > >> index a09944104c41..603a5c010f5d 100644 > >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c > >> @@ -83,25 +83,30 @@ void amdgpu_show_fdinfo(struct drm_printer *p, > >> struct drm_file *file) > >>*/ > >> drm_printf(p, "pasid:\t%u\n", fpriv->vm.pasid); > >> -drm_printf(p, "drm-memory-vram:\t%llu KiB\n", stats.vram/1024UL); > >> drm_printf(p, "drm-memory-gtt: \t%llu KiB\n", stats.gtt/1024UL); > >> drm_printf(p, "drm-memory-cpu: \t%llu KiB\n", stats.cpu/1024UL); > >> -drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n", > >> - stats.visible_vram/1024UL); > >> -drm_printf(p, "amd-evicted-vram:\t%llu KiB\n", > >> - stats.evicted_vram/1024UL); > >> -drm_printf(p, "amd-evicted-visible-vram:\t%llu KiB\n", > >> - stats.evicted_visible_vram/1024UL); > >> -drm_printf(p, "amd-requested-vram:\t%llu KiB\n", > >> - stats.requested_vram/1024UL); > >> -drm_printf(p, "amd-requested-visible-vram:\t%llu KiB\n", > >> - stats.requested_visible_vram/1024UL); > >> drm_printf(p, "amd-requested-gtt:\t%llu KiB\n", > >> stats.requested_gtt/1024UL); > >> -drm_printf(p, "drm-shared-vram:\t%llu KiB\n", > >> stats.vram_shared/1024UL); > >> drm_printf(p, "drm-shared-gtt:\t%llu KiB\n", > >> stats.gtt_shared/1024UL); > >> drm_printf(p, "drm-shared-cpu:\t%llu KiB\n", > >> stats.cpu_shared/1024UL); > >> +if (!adev->gmc.is_app_apu) { > >> +drm_printf(p, "drm-memory-vram:\t%llu KiB\n", > >> + stats.vram/1024UL); > >> +drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n", > >> + stats.visible_vram/1024UL); > >> +drm_printf(p, "amd-evicted-vram:\t%llu KiB\n", > >> + stats.evicted_vram/1024UL); > >> +drm_printf(p, "amd-evicted-visible-vram:\t%llu KiB\n", > >> + stats.evicted_visible_vram/1024UL); > >> +drm_printf(p, "amd-requested-vram:\t%llu KiB\n", > >> + stats.requested_vram/1024UL); > >> +drm_printf(p, "amd-requested-visible-vram:\t%llu KiB\n", > >> + stats.requested_visible_vram/1024UL); > >> +drm_printf(p, "drm-shared-vram:\t%llu KiB\n", > >> + stats.vram_shared/1024UL); > >> +} > >> + > >> for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) { > >> if (!usage[hw_ip]) > >> continue; > >
Re: [PATCH] Documentation/gpu: Document the situation with unqualified drm-memory-
On Fri, May 3, 2024 at 8:58 AM Tvrtko Ursulin wrote: > > > [And I forgot dri-devel.. doing well!] > > On 03/05/2024 13:40, Tvrtko Ursulin wrote: > > > > [Correcting Christian's email] > > > > On 03/05/2024 13:36, Tvrtko Ursulin wrote: > >> From: Tvrtko Ursulin > >> > >> Currently it is not well defined what is drm-memory- compared to other > >> categories. > >> > >> In practice the only driver which emits these keys is amdgpu and in them > >> exposes the total memory use (including shared). > >> > >> Document that drm-memory- and drm-total-memory- are aliases to prevent > >> any > >> confusion in the future. > >> > >> While at it also clarify that the reserved sub-string 'memory' refers to > >> the memory region component. > >> > >> Signed-off-by: Tvrtko Ursulin > >> Cc: Alex Deucher > >> Cc: Christian König > > > > Mea culpa, I copied the mistake from > > 77d17c4cd0bf52eacfad88e63e8932eb45d643c5. :) > > I'm not following. What is the mistake from that commit? > > Regards, > > > > Tvrtko > > > >> Cc: Rob Clark > >> --- > >> Documentation/gpu/drm-usage-stats.rst | 10 +- > >> 1 file changed, 9 insertions(+), 1 deletion(-) > >> > >> diff --git a/Documentation/gpu/drm-usage-stats.rst > >> b/Documentation/gpu/drm-usage-stats.rst > >> index 6dc299343b48..ef5c0a0aa477 100644 > >> --- a/Documentation/gpu/drm-usage-stats.rst > >> +++ b/Documentation/gpu/drm-usage-stats.rst > >> @@ -128,7 +128,9 @@ Memory > >> Each possible memory type which can be used to store buffer objects > >> by the > >> GPU in question shall be given a stable and unique name to be > >> returned as the > >> -string here. The name "memory" is reserved to refer to normal system > >> memory. > >> +string here. > >> + > >> +The region name "memory" is reserved to refer to normal system memory. Is this supposed to mean drm-memory-memory? That was my impression, but that seems sort of weird. Maybe we should just drop that sentence. Alex > >> Value shall reflect the amount of storage currently consumed by the > >> buffer > >> objects belong to this client, in the respective memory region. > >> @@ -136,6 +138,9 @@ objects belong to this client, in the respective > >> memory region. > >> Default unit shall be bytes with optional unit specifiers of 'KiB' > >> or 'MiB' > >> indicating kibi- or mebi-bytes. > >> +This is an alias for drm-total-<region> and only one of the two > >> should be +present. > >> + > >> - drm-shared-<region>: [KiB|MiB] > >> The total size of buffers that are shared with another file (e.g., > >> have more > >> @@ -145,6 +150,9 @@ than a single handle). > >> The total size of buffers that including shared and private memory. > >> +This is an alias for drm-memory-<region> and only one of the two > >> should be +present. > >> + > >> - drm-resident-<region>: [KiB|MiB] > >> The total size of buffers that are resident in the specified region.
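Illustrating the aliasing being documented, with made-up values. A driver emits only one of the two spellings per region, and both carry the same meaning for parsers:

```
drm-memory-vram:        12345 KiB
# ...is equivalent to...
drm-total-vram: 12345 KiB
```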
Re: Splat during driver probe
Possibly related to Arun's new memory clearing changes. @Arunpravin can you take a look? Alex On Fri, May 3, 2024 at 9:10 AM Tvrtko Ursulin wrote: > > > Hi, > > I tried asking on #radeon yesterday but no takers. I was wondering if > the below splat is already known or not. > > Appears to happen due to amdgpu_bo_release_notify genuinely running before > amdgpu_ttm_set_buffer_funcs_status sets the buffer funcs to enabled > during device probe. > > I couldn't easily figure out what changed. > > Regards, > > Tvrtko > > [ 11.802030] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear > memory with ring turned off. > [ 11.802519] [ cut here ] > [ 11.802530] WARNING: CPU: 5 PID: 435 at > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1378 > amdgpu_bo_release_notify+0x20c/0x230 [amdgpu] > [ 11.802916] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 > nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct > amdgpu(E+) nft_chain_nat nf_tables ip6table_nat ip6table_mangle > ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack > nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw > iptable_security nfnetlink ip6table_filter ip6_tables iptable_filter > cmac algif_hash ramoops algif_skcipher reed_solomon af_alg bnep qrtr > mousedev intel_rapl_msr ath11k_pci(+) intel_rapl_common mhi > snd_acp_sof_mach snd_acp_mach snd_sof_probes ath11k snd_soc_dmic kvm_amd > cdc_mbim qmi_helpers joydev hid_multitouch cdc_wdm snd_sof_amd_vangogh > snd_sof_pci kvm mac80211 snd_hda_codec_hdmi hci_uart crct10dif_pclmul > amdxcp i2c_algo_bit snd_sof_amd_acp btqca drm_ttm_helper libarc4 > crc32_pclmul btrtl snd_hda_intel snd_sof_xtensa_dsp btbcm ttm > snd_intel_dspcfg btintel polyval_clmulni snd_sof drm_exec > polyval_generic snd_hda_codec snd_sof_utils gpu_sched cfg80211 bluetooth > snd_pci_acp5x gf128mul cdc_ncm [ 11.803070] sp5100_tco snd_hwdep ghash_clmulni_intel snd_soc_nau8821 > snd_soc_max98388 drm_suballoc_helper sha512_ssse3 cdc_ether > snd_acp_config drm_buddy atkbd snd_soc_core aesni_intel snd_hda_core > video ecdh_generic crypto_simd libps2 usbnet cryptd rapl mii wdat_wdt > vivaldi_fmap pcspkr snd_compress i2c_piix4 snd_pcm drm_display_helper > ccp snd_soc_acpi cdc_acm rfkill wmi ltrf216a 8250_dw i2c_hid_acpi > snd_timer snd i2c_hid industrialio soundcore hid_steam pkcs8_key_parser > crypto_user loop fuse dm_mod ip_tables x_tables overlay ext4 crc16 > mbcache jbd2 usbhid vfat fat btrfs blake2b_generic libcrc32c > crc32c_generic xor raid6_pq sdhci_pci cqhci nvme serio_raw sdhci > xhci_pci xhci_pci_renesas crc32c_intel mmc_core nvme_core i8042 serio > spi_amd [ 11.803448] CPU: 5 PID: 435 Comm: (udev-worker) Tainted: GW > E 6.9.0-rc6 #3 dd3fb65c639fd86ff89731c604b1e804996e8989 [ 11.803471] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023 [ 11.803486] RIP: 0010:amdgpu_bo_release_notify+0x20c/0x230 [amdgpu] [ 11.803857] Code: 0b e9 a4 fe ff ff 48 ba ff ff ff ff ff ff ff 7f 31 > f6 4c 89 e7 e8 44 e5 33 e6 eb 8d e8 dd dd 33 e6 eb a7 0f 0b e9 4d fe ff > ff <0f> 0b eb 9c be 03 00 00 00 e8 f6 04 00 e6 eb 90 e8 8f 64 79 e6 66 [ 11.803890] RSP: 0018:a77bc1537568 EFLAGS: 00010282 [ 11.803904] RAX: ffea RBX: 8e1f0da89048 RCX: > [ 11.803919] RDX: RSI: 0027 RDI: > [ 11.803934] RBP: 8e1f6ec0ffb0 R08: R09: > 0003 [ 11.803948] R10: a77bc15372f0 R11: 8e223ef7ffe8 R12: > 8e1f0da89000 [ 11.803963] R13: 8e1f0da89180 R14: 8e1f6ec0ffb0 R15: > 8e1f6ec0 [ 11.803977] FS: 7fa2b600b200() GS:8e222ee8() > knlGS: [ 11.803994] CS: 0010 DS: ES: CR0: 80050033 [ 
11.804007] CR2: 55cac2aa7080 CR3: 00010f818000 CR4: > 00350ef0 > [ 11.804021] Call Trace: > [ 11.804031] > [ 11.804042] ? __warn+0x8c/0x180 > [ 11.804057] ? amdgpu_bo_release_notify+0x20c/0x230 [amdgpu > 03b1dca7e09918e3f45efb1acdfc90682f73e4c1] > [ 11.804456] ? report_bug+0x191/0x1c0 > [ 11.804473] ? handle_bug+0x3a/0x70 > [ 11.804485] ? exc_invalid_op+0x17/0x70 > [ 11.804497] ? asm_exc_invalid_op+0x1a/0x20 > [ 11.804517] ? amdgpu_bo_release_notify+0x20c/0x230 [amdgpu > 03b1dca7e09918e3f45efb1acdfc90682f73e4c1] > [ 11.804894] ? amdgpu_bo_release_notify+0x14a/0x230 [amdgpu > 03b1dca7e09918e3f45efb1acdfc90682f73e4c1] > [ 11.805277] ttm_bo_release+0x115/0x330 [ttm > 39c917822ce73b2a754d4183af7a0990991c66be] > [ 11.805303] ? srso_return_thunk+0x5/0x5f > [ 11.805315] ? __mutex_unlock_slowpath+0x3a/0x290 > [ 11.805332] amdgpu_bo_free_kernel+0xd6/0x130 [amdgpu > 03b1dca7e09918e3f45efb1acdfc90682f73e4c1] > [ 11.805709] dm_helpers_free_gpu_mem+0x41/0x80 [amdgpu > 03b1dca7e09918e3f45efb1acdfc90682f73e4c1] > [ 11.806172] vg_clk_mgr_construct+0x2c4/0x490 [amdgpu > 03b1dca7e09918e3f45efb1acdfc90682f73e4c1]
Re: [PATCH] Documentation/gpu: Document the situation with unqualified drm-memory-
[And I forgot dri-devel.. doing well!] On 03/05/2024 13:40, Tvrtko Ursulin wrote: [Correcting Christian's email] On 03/05/2024 13:36, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Currently it is not well defined what is drm-memory- compared to other categories. In practice the only driver which emits these keys is amdgpu and in them exposes the total memory use (including shared). Document that drm-memory- and drm-total-memory- are aliases to prevent any confusion in the future. While at it also clarify that the reserved sub-string 'memory' refers to the memory region component. Signed-off-by: Tvrtko Ursulin Cc: Alex Deucher Cc: Christian König Mea culpa, I copied the mistake from 77d17c4cd0bf52eacfad88e63e8932eb45d643c5. :) Regards, Tvrtko Cc: Rob Clark --- Documentation/gpu/drm-usage-stats.rst | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst index 6dc299343b48..ef5c0a0aa477 100644 --- a/Documentation/gpu/drm-usage-stats.rst +++ b/Documentation/gpu/drm-usage-stats.rst @@ -128,7 +128,9 @@ Memory Each possible memory type which can be used to store buffer objects by the GPU in question shall be given a stable and unique name to be returned as the -string here. The name "memory" is reserved to refer to normal system memory. +string here. + +The region name "memory" is reserved to refer to normal system memory. Value shall reflect the amount of storage currently consumed by the buffer objects belong to this client, in the respective memory region. @@ -136,6 +138,9 @@ objects belong to this client, in the respective memory region. Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB' indicating kibi- or mebi-bytes. +This is an alias for drm-total-<region> and only one of the two should be +present. + - drm-shared-<region>: [KiB|MiB] The total size of buffers that are shared with another file (e.g., have more @@ -145,6 +150,9 @@ than a single handle). The total size of buffers that including shared and private memory. +This is an alias for drm-memory-<region> and only one of the two should be +present. + - drm-resident-<region>: [KiB|MiB] The total size of buffers that are resident in the specified region.
Re: [linux-next:master] BUILD REGRESSION 9c6ecb3cb6e20c4fd7997047213ba0efcf9ada1a
On 5/3/2024 11:58 AM, Greg KH wrote: On Fri, May 03, 2024 at 11:30:50AM +0530, Krishna Kurapati PSSNV wrote: On 5/3/2024 10:42 AM, Greg KH wrote: Ok, I'm getting tired of seeing these for the USB portion of the tree, so I went to look for: On Fri, May 03, 2024 at 04:44:42AM +0800, kernel test robot wrote: |-- arc-randconfig-002-20240503 | `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used This warning (same for all arches), but can't seem to find it anywhere. Any hints as to where it would be? Hi Greg, I think hw_mode was not removed from hs_phy_setup and is left unused. Thinh reported the same when there was a merge conflict into linux next (that the hw_mode variable was removed in ss_phy_setup and should be removed in hs_phy_setup as well): https://lore.kernel.org/all/20240426213923.tyeddub4xszyp...@synopsys.com/ Perhaps that was missed? Must have been. I need it in a format that it can be applied in (a 2-way diff can't apply...) I just checked it with W=1 and it is causing the issue: /local/mnt/workspace/krishna/linux-next/skales_test/skales/kernel/drivers/usb/dwc3/core.c: In function 'dwc3_hs_phy_setup': /local/mnt/workspace/krishna/linux-next/skales_test/skales/kernel/drivers/usb/dwc3/core.c:679:15: warning: variable 'hw_mode' set but not used [-Wunused-but-set-variable] unsigned int hw_mode; ^ I can send a patch to fix it up. Also, just wanted to confirm if I should skip the Fixes and CC tags as the main patch wasn't yet merged into any stable trees? Regards, Krishna
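For what it's worth, the cleanup presumably amounts to deleting the leftover declaration and assignment in dwc3_hs_phy_setup(), mirroring what ss_phy_setup() already received. The exact context lines below are assumed, not copied from the tree:

```
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ static int dwc3_hs_phy_setup(struct dwc3 *dwc, int index)
 {
-	unsigned int hw_mode;
 	u32 reg;
 
-	hw_mode = DWC3_GHWPARAMS0_MODE(dwc->hwparams.hwparams0);
-
```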
Re: [PATCH 1/3] drm/amdgpu: Add amdgpu_bo_is_vm_bo helper
On 03/05/2024 07:26, Christian König wrote: Am 29.04.24 um 18:47 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin Help code readability by replacing a bunch of: bo->tbo.base.resv == vm->root.bo->tbo.base.resv With: amdgpu_vm_is_bo_always_valid(vm, bo) No functional changes. v2: * Rename helper and move to amdgpu_vm. (Christian) Signed-off-by: Tvrtko Ursulin Cc: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 40 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 ++ 3 files changed, 28 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index 67c234bcf89f..e698d65e9508 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -174,7 +174,7 @@ static int amdgpu_gem_object_open(struct drm_gem_object *obj, return -EPERM; if (abo->flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID && - abo->tbo.base.resv != vm->root.bo->tbo.base.resv) + !amdgpu_vm_is_bo_always_valid(vm, abo)) return -EPERM; r = amdgpu_bo_reserve(abo, false); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 8af3f0fd3073..01ca4b35b369 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -333,7 +333,7 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base, base->next = bo->vm_bo; bo->vm_bo = base; - if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv) + if (!amdgpu_vm_is_bo_always_valid(vm, bo)) return; dma_resv_assert_held(vm->root.bo->tbo.base.resv); @@ -1101,13 +1101,13 @@ static void amdgpu_vm_bo_get_memory(struct amdgpu_bo_va *bo_va, * For now ignore BOs which are currently locked and potentially * changing their location. */ - if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv && + if (!amdgpu_vm_is_bo_always_valid(vm, bo) && !dma_resv_trylock(bo->tbo.base.resv)) return; amdgpu_bo_get_memory(bo, stats); - if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv) - dma_resv_unlock(bo->tbo.base.resv); + if (amdgpu_vm_is_bo_always_valid(vm, bo)) + dma_resv_unlock(bo->tbo.base.resv); } void amdgpu_vm_get_memory(struct amdgpu_vm *vm, @@ -1203,8 +1203,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va, uncached = false; } - if (clear || (bo && bo->tbo.base.resv == - vm->root.bo->tbo.base.resv)) + if (clear || amdgpu_vm_is_bo_always_valid(vm, bo)) last_update = &vm->last_update; else last_update = &bo_va->last_pt_update; @@ -1246,7 +1245,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va, * the evicted list so that it gets validated again on the * next command submission. 
*/ - if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) { + if (amdgpu_vm_is_bo_always_valid(vm, bo)) { uint32_t mem_type = bo->tbo.resource->mem_type; if (!(bo->preferred_domains & @@ -1640,10 +1639,9 @@ static void amdgpu_vm_bo_insert_map(struct amdgpu_device *adev, if (mapping->flags & AMDGPU_PTE_PRT) amdgpu_vm_prt_get(adev); - if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv && - !bo_va->base.moved) { + if (amdgpu_vm_is_bo_always_valid(vm, bo) && !bo_va->base.moved) amdgpu_vm_bo_moved(&bo_va->base); - } + trace_amdgpu_vm_bo_map(bo_va, mapping); } @@ -1922,7 +1920,7 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev, if (before->flags & AMDGPU_PTE_PRT) amdgpu_vm_prt_get(adev); - if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv && + if (amdgpu_vm_is_bo_always_valid(vm, bo) && !before->bo_va->base.moved) amdgpu_vm_bo_moved(&before->bo_va->base); } else { @@ -1937,7 +1935,7 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev, if (after->flags & AMDGPU_PTE_PRT) amdgpu_vm_prt_get(adev); - if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv && + if (amdgpu_vm_is_bo_always_valid(vm, bo) && !after->bo_va->base.moved) amdgpu_vm_bo_moved(&after->bo_va->base); } else { @@ -2017,7 +2015,7 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev, if (bo) { dma_resv_assert_held(bo->tbo.base.resv); - if (bo->tbo.base.resv == vm->root.bo->tbo.base.resv) + if (amdgpu_vm_is_bo_always_valid(vm, bo)) ttm_bo_set_bulk_move(&bo->tbo, NULL); for (base = &bo_va->base.bo->vm_bo; *base; @@ -2111,7 +2109,7 @@ void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev, for (bo_base = bo->vm_bo; bo_base; bo_base = bo_base->next) { struct amdgpu_vm *vm
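For readers skimming the thread, the helper under review reduces to the open-coded test it replaces, plus the NULL check some call sites carried. A sketch matching the v2 hunks above (not necessarily the final merged form):

```
/* Sketch, equivalent to the open-coded checks being replaced above. */
bool amdgpu_vm_is_bo_always_valid(struct amdgpu_vm *vm, struct amdgpu_bo *bo)
{
	return bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv;
}
```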
Splat during driver probe
Hi, I tried asking on #radeon yesterday but no takers. I was wondering if the below splat is already known or not. Appears to happen due to amdgpu_bo_release_notify genuinely running before amdgpu_ttm_set_buffer_funcs_status sets the buffer funcs to enabled during device probe. I couldn't easily figure out what changed. Regards, Tvrtko [ 11.802030] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off. [ 11.802519] [ cut here ] [ 11.802530] WARNING: CPU: 5 PID: 435 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1378 amdgpu_bo_release_notify+0x20c/0x230 [amdgpu] [ 11.802916] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct amdgpu(E+) nft_chain_nat nf_tables ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security nfnetlink ip6table_filter ip6_tables iptable_filter cmac algif_hash ramoops algif_skcipher reed_solomon af_alg bnep qrtr mousedev intel_rapl_msr ath11k_pci(+) intel_rapl_common mhi snd_acp_sof_mach snd_acp_mach snd_sof_probes ath11k snd_soc_dmic kvm_amd cdc_mbim qmi_helpers joydev hid_multitouch cdc_wdm snd_sof_amd_vangogh snd_sof_pci kvm mac80211 snd_hda_codec_hdmi hci_uart crct10dif_pclmul amdxcp i2c_algo_bit snd_sof_amd_acp btqca drm_ttm_helper libarc4 crc32_pclmul btrtl snd_hda_intel snd_sof_xtensa_dsp btbcm ttm snd_intel_dspcfg btintel polyval_clmulni snd_sof drm_exec polyval_generic snd_hda_codec snd_sof_utils gpu_sched cfg80211 bluetooth snd_pci_acp5x gf128mul cdc_ncm [ 11.803070] sp5100_tco snd_hwdep ghash_clmulni_intel snd_soc_nau8821 snd_soc_max98388 drm_suballoc_helper sha512_ssse3 cdc_ether snd_acp_config drm_buddy atkbd snd_soc_core aesni_intel snd_hda_core video ecdh_generic crypto_simd libps2 usbnet cryptd rapl mii wdat_wdt vivaldi_fmap pcspkr snd_compress i2c_piix4 snd_pcm drm_display_helper ccp snd_soc_acpi cdc_acm rfkill wmi ltrf216a 8250_dw i2c_hid_acpi snd_timer snd i2c_hid industrialio soundcore hid_steam pkcs8_key_parser crypto_user loop fuse dm_mod ip_tables x_tables overlay ext4 crc16 mbcache jbd2 usbhid vfat fat btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq sdhci_pci cqhci nvme serio_raw sdhci xhci_pci xhci_pci_renesas crc32c_intel mmc_core nvme_core i8042 serio spi_amd [ 11.803448] CPU: 5 PID: 435 Comm: (udev-worker) Tainted: GW E 6.9.0-rc6 #3 dd3fb65c639fd86ff89731c604b1e804996e8989 [ 11.803471] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023 [ 11.803486] RIP: 0010:amdgpu_bo_release_notify+0x20c/0x230 [amdgpu] [ 11.803857] Code: 0b e9 a4 fe ff ff 48 ba ff ff ff ff ff ff ff 7f 31 f6 4c 89 e7 e8 44 e5 33 e6 eb 8d e8 dd dd 33 e6 eb a7 0f 0b e9 4d fe ff ff <0f> 0b eb 9c be 03 00 00 00 e8 f6 04 00 e6 eb 90 e8 8f 64 79 e6 66 [ 11.803890] RSP: 0018:a77bc1537568 EFLAGS: 00010282 [ 11.803904] RAX: ffea RBX: 8e1f0da89048 RCX: [ 11.803919] RDX: RSI: 0027 RDI: [ 11.803934] RBP: 8e1f6ec0ffb0 R08: R09: 0003 [ 11.803948] R10: a77bc15372f0 R11: 8e223ef7ffe8 R12: 8e1f0da89000 [ 11.803963] R13: 8e1f0da89180 R14: 8e1f6ec0ffb0 R15: 8e1f6ec0 [ 11.803977] FS: 7fa2b600b200() GS:8e222ee8() knlGS: [ 11.803994] CS: 0010 DS: ES: CR0: 80050033 [ 11.804007] CR2: 55cac2aa7080 CR3: 00010f818000 CR4: 00350ef0 [ 11.804021] Call Trace: [ 11.804031] [ 11.804042] ? __warn+0x8c/0x180 [ 11.804057] ? amdgpu_bo_release_notify+0x20c/0x230 [amdgpu 03b1dca7e09918e3f45efb1acdfc90682f73e4c1] [ 11.804456] ? report_bug+0x191/0x1c0 [ 11.804473] ? 
handle_bug+0x3a/0x70 [ 11.804485] ? exc_invalid_op+0x17/0x70 [ 11.804497] ? asm_exc_invalid_op+0x1a/0x20 [ 11.804517] ? amdgpu_bo_release_notify+0x20c/0x230 [amdgpu 03b1dca7e09918e3f45efb1acdfc90682f73e4c1] [ 11.804894] ? amdgpu_bo_release_notify+0x14a/0x230 [amdgpu 03b1dca7e09918e3f45efb1acdfc90682f73e4c1] [ 11.805277] ttm_bo_release+0x115/0x330 [ttm 39c917822ce73b2a754d4183af7a0990991c66be] [ 11.805303] ? srso_return_thunk+0x5/0x5f [ 11.805315] ? __mutex_unlock_slowpath+0x3a/0x290 [ 11.805332] amdgpu_bo_free_kernel+0xd6/0x130 [amdgpu 03b1dca7e09918e3f45efb1acdfc90682f73e4c1] [ 11.805709] dm_helpers_free_gpu_mem+0x41/0x80 [amdgpu 03b1dca7e09918e3f45efb1acdfc90682f73e4c1] [ 11.806172] vg_clk_mgr_construct+0x2c4/0x490 [amdgpu 03b1dca7e09918e3f45efb1acdfc90682f73e4c1] [ 11.806645] dc_clk_mgr_create+0x390/0x580 [amdgpu 03b1dca7e09918e3f45efb1acdfc90682f73e4c1] [ 11.807112] dc_create+0x28a/0x640 [amdgpu 03b1dca7e09918e3f45efb1acdfc90682f73e4c1] [ 11.807567] amdgpu_dm_init.isra.0+0x2e1/0x1fe0 [amdgpu 03b1dca7e09918e3f45efb1acdfc9068
Re: [PATCH] Documentation/gpu: Document the situation with unqualified drm-memory-
[Correcting Christian's email] On 03/05/2024 13:36, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Currently it is not well defined what is drm-memory- compared to other categories. In practice the only driver which emits these keys is amdgpu and in them exposes the total memory use (including shared). Document that drm-memory- and drm-total-memory- are aliases to prevent any confusion in the future. While at it also clarify that the reserved sub-string 'memory' refers to the memory region component. Signed-off-by: Tvrtko Ursulin Cc: Alex Deucher Cc: Christian König Mea culpa, I copied the mistake from 77d17c4cd0bf52eacfad88e63e8932eb45d643c5. :) Regards, Tvrtko Cc: Rob Clark --- Documentation/gpu/drm-usage-stats.rst | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst index 6dc299343b48..ef5c0a0aa477 100644 --- a/Documentation/gpu/drm-usage-stats.rst +++ b/Documentation/gpu/drm-usage-stats.rst @@ -128,7 +128,9 @@ Memory Each possible memory type which can be used to store buffer objects by the GPU in question shall be given a stable and unique name to be returned as the -string here. The name "memory" is reserved to refer to normal system memory. +string here. + +The region name "memory" is reserved to refer to normal system memory. Value shall reflect the amount of storage currently consumed by the buffer objects belong to this client, in the respective memory region. @@ -136,6 +138,9 @@ objects belong to this client, in the respective memory region. Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB' indicating kibi- or mebi-bytes. +This is an alias for drm-total-<region> and only one of the two should be +present. + - drm-shared-<region>: [KiB|MiB] The total size of buffers that are shared with another file (e.g., have more @@ -145,6 +150,9 @@ than a single handle). The total size of buffers that including shared and private memory. +This is an alias for drm-memory-<region> and only one of the two should be +present. + - drm-resident-<region>: [KiB|MiB] The total size of buffers that are resident in the specified region.
Re: [RFC 0/5] Add capacity key to fdinfo
On 02/05/2024 16:00, Alex Deucher wrote: On Thu, May 2, 2024 at 10:43 AM Tvrtko Ursulin wrote: On 02/05/2024 14:07, Christian König wrote: Am 01.05.24 um 15:27 schrieb Tvrtko Ursulin: Hi Alex, On 30/04/2024 19:32, Alex Deucher wrote: On Tue, Apr 30, 2024 at 1:27 PM Tvrtko Ursulin wrote: From: Tvrtko Ursulin I have noticed AMD GPUs can have more than one "engine" (ring?) of the same type but amdgpu is not reporting that in fdinfo using the capacity engine tag. This series is therefore an attempt to improve that, but only an RFC since it is quite likely I got stuff wrong on the first attempt. Or if not wrong it may not be very beneficial in AMDs case. So I tried to figure out how to count and store the number of instances of an "engine" type and spotted that could perhaps be used in more than one place in the driver. I was more than a little bit confused by the ip_instance and uapi rings, then how rings are selected to context entities internally. Anyway.. hopefully it is a simple enough series to easily spot any such large misses. End result should be that, assuming two "engine" instances, one fully loaded and one idle will only report client using 50% of that engine type. That would only be true if there are multiple instantiations of the IP on the chip which in most cases is not true. In most cases there is one instance of the IP that can be fed from multiple rings. E.g. for graphics and compute, all of the rings ultimately feed into the same compute units on the chip. So if you have a gfx ring and a compute ring, you can schedule work to them asynchronously, but ultimately whether they execute serially or in parallel depends on the actual shader code in the command buffers and the extent to which it can utilize the available compute units in the shader cores. This is the same as with Intel/i915. Fdinfo is not intended to provide utilisation of EUs and such, just how busy are the "entities" kernel submits to. So doing something like in this series would make the reporting more similar between the two drivers. I think both the 0-800% or 0-100% range (taking 8 ring compute as an example) can be misleading for different workloads. <800% in the former doesn't mean one can send more work, and the same goes for <100% in the latter. Yeah, I think that's what Alex tries to describe. By using 8 compute rings your 800% load is actually incorrect and quite misleading. Background is that those 8 compute rings won't be active all at the same time, but rather waiting on each other for resources. But this "waiting" is unfortunately considered execution time since the used approach is actually not really capable of separating waiting and execution time. Right, so 800% is what gputop could be suggesting today, by virtue of the fact that 8 contexts/clients can each use 100% if they only use a subset of compute units. I was proposing to expose the capacity in fdinfo so it can be scaled down and then discussing how both situations have pros and cons. There is also a parallel with the CPU world here and hyper threading, if not wider, where "What does 100% actually mean?" is also wishy-washy. Also note that the reporting of actual time based values in fdinfo would not be changing with this series. Or if you can guide me towards how to distinguish real vs fake parallelism in HW IP blocks I could modify the series to only add capacity tags where there are truly independent blocks. That would be different from i915 though, where I did not bother with that distinction. 
(For reasons that assignment of for instance EUs to compute "rings" (command streamers in i915) was supposed to be possible to re-configure on the fly. So it did not make sense to try and be super smart in fdinfo.) Well exactly that's the point we don't really have truly independent blocks on AMD hardware. There are things like independent SDMA instances, but those are meant to be used like the first instance for uploads and the second for downloads etc.. When you use both instances for the same job they will pretty much limit each other because of a single resource. So _never_ multiple instances of the same IP block? No video decode, encode, anything? Some chips have multiple encode/decode IP blocks that are actually separate instances, however, we load balance between them so userspace sees just one engine. Also in some cases they are asymmetric (e.g., different sets of supported CODECs on each instance). The driver handles this by inspecting the command buffer and scheduling on the appropriate instance based on the requested CODEC. SDMA also supports multiple IP blocks that are independent. Similar to i915, just that we don't inspect buffers but expose the instance capabilities and userspace is responsible to set up the load balancing engine with the correct physical mask. Anyway, back to the main point - are you interested at all in me adding the capacity flags to at least the IP blocks which are probed to exist as more than a singleton? An
Re: [PATCH 3/3] RDMA/hfi1: Use RMW accessors for changing LNKCTL2
On Fri, May 03, 2024 at 01:18:35PM +0300, Ilpo Järvinen wrote: > On Thu, 15 Feb 2024, Ilpo Järvinen wrote: > > > Convert open coded RMW accesses for LNKCTL2 to use > > pcie_capability_clear_and_set_word() which makes it easier to > > understand what the code tries to do. > > > > LNKCTL2 is not really owned by any driver because it is a collection of > > control bits that PCI core might need to touch. RMW accessors already > > have support for proper locking for a selected set of registers > > (LNKCTL2 is not yet among them but likely will be in the future) to > > avoid losing concurrent updates. > > > > Suggested-by: Lukas Wunner > > Signed-off-by: Ilpo Järvinen > > Reviewed-by: Dean Luick > > I found out from Linux RDMA and InfiniBand patchwork that this patch had > been silently closed as "Not Applicable". Is there some reason for > that? It is part of a series that crosses subsystems, series like that usually go through some other trees. If you want single patches applied then please send single patches.. It is hard to understand intent from mixed series. Jason
Re: [PATCH 3/3] RDMA/hfi1: Use RMW accessors for changing LNKCTL2
On Thu, 15 Feb 2024, Ilpo Järvinen wrote: > Convert open coded RMW accesses for LNKCTL2 to use > pcie_capability_clear_and_set_word() which makes it easier to > understand what the code tries to do. > > LNKCTL2 is not really owned by any driver because it is a collection of > control bits that PCI core might need to touch. RMW accessors already > have support for proper locking for a selected set of registers > (LNKCTL2 is not yet among them but likely will be in the future) to > avoid losing concurrent updates. > > Suggested-by: Lukas Wunner > Signed-off-by: Ilpo Järvinen > Reviewed-by: Dean Luick I found out from Linux RDMA and InfiniBand patchwork that this patch had been silently closed as "Not Applicable". Is there some reason for that? I was sending this change independently out (among 2 similar ones that already got applied) so I wouldn't need to keep carrying it within my PCIe bandwidth controller series. It seemed useful enough as a cleanup to stand on its own legs w/o requiring it to be part of PCIe bw controller series. Should I resend the patch or do RDMA/IB maintainers prefer it to remain as a part of PCIe BW controller series? -- i. > --- > drivers/infiniband/hw/hfi1/pcie.c | 30 -- > 1 file changed, 8 insertions(+), 22 deletions(-) > > diff --git a/drivers/infiniband/hw/hfi1/pcie.c > b/drivers/infiniband/hw/hfi1/pcie.c > index 119ec2f1382b..7133964749f8 100644 > --- a/drivers/infiniband/hw/hfi1/pcie.c > +++ b/drivers/infiniband/hw/hfi1/pcie.c > @@ -1207,14 +1207,11 @@ int do_pcie_gen3_transition(struct hfi1_devdata *dd) > (u32)lnkctl2); > /* only write to parent if target is not as high as ours */ > if ((lnkctl2 & PCI_EXP_LNKCTL2_TLS) < target_vector) { > - lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS; > - lnkctl2 |= target_vector; > - dd_dev_info(dd, "%s: ..new link control2: 0x%x\n", __func__, > - (u32)lnkctl2); > - ret = pcie_capability_write_word(parent, > - PCI_EXP_LNKCTL2, lnkctl2); > + ret = pcie_capability_clear_and_set_word(parent, > PCI_EXP_LNKCTL2, > + PCI_EXP_LNKCTL2_TLS, > + target_vector); > if (ret) { > - dd_dev_err(dd, "Unable to write to PCI config\n"); > + dd_dev_err(dd, "Unable to change parent PCI target > speed\n"); > return_error = 1; > goto done; > } > @@ -1223,22 +1220,11 @@ int do_pcie_gen3_transition(struct hfi1_devdata *dd) > } > > dd_dev_info(dd, "%s: setting target link speed\n", __func__); > - ret = pcie_capability_read_word(dd->pcidev, PCI_EXP_LNKCTL2, &lnkctl2); > + ret = pcie_capability_clear_and_set_word(dd->pcidev, PCI_EXP_LNKCTL2, > + PCI_EXP_LNKCTL2_TLS, > + target_vector); > if (ret) { > - dd_dev_err(dd, "Unable to read from PCI config\n"); > - return_error = 1; > - goto done; > - } > - > - dd_dev_info(dd, "%s: ..old link control2: 0x%x\n", __func__, > - (u32)lnkctl2); > - lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS; > - lnkctl2 |= target_vector; > - dd_dev_info(dd, "%s: ..new link control2: 0x%x\n", __func__, > - (u32)lnkctl2); > - ret = pcie_capability_write_word(dd->pcidev, PCI_EXP_LNKCTL2, lnkctl2); > - if (ret) { > - dd_dev_err(dd, "Unable to write to PCI config\n"); > + dd_dev_err(dd, "Unable to change device PCI target speed\n"); > return_error = 1; > goto done; > } >
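The before/after semantics side by side, distilled from the hunks above with the error handling and debug prints dropped:

```
/* Open-coded RMW being removed: */
pcie_capability_read_word(pdev, PCI_EXP_LNKCTL2, &lnkctl2);
lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
lnkctl2 |= target_vector;
pcie_capability_write_word(pdev, PCI_EXP_LNKCTL2, lnkctl2);

/* Equivalent single call, with core-side locking available once
 * LNKCTL2 joins the locked-register set: */
pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL2,
				   PCI_EXP_LNKCTL2_TLS, target_vector);
```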
Allow setting a power cap that's lower than recommended
This patch allows setting a low power cap if ignore_min_pcap is set to 1. Signed-off-by: fililip diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 50f57d4dfd8f..b83697b75221 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -151,6 +151,7 @@ struct amdgpu_watchdog_timer */ extern int amdgpu_modeset; extern unsigned int amdgpu_vram_limit; +extern int amdgpu_ignore_min_pcap; extern int amdgpu_vis_vram_limit; extern int amdgpu_gart_size; extern int amdgpu_gtt_size; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index a7ad77ed09ca..2b7ed2e48e33 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -131,6 +131,7 @@ enum AMDGPU_DEBUG_MASK { }; unsigned int amdgpu_vram_limit = UINT_MAX; +int amdgpu_ignore_min_pcap = 0; /* do not ignore by default */ int amdgpu_vis_vram_limit; int amdgpu_gart_size = -1; /* auto */ int amdgpu_gtt_size = -1; /* auto */ @@ -238,6 +239,15 @@ struct amdgpu_watchdog_timer amdgpu_watchdog_timer = { .period = 0x0, /* default to 0x0 (timeout disable) */ }; +/** + * DOC: ignore_min_pcap (int) + * Ignore the minimum power cap. + * Useful on graphics cards where the minimum power cap is very high. + * The default is 0 (Do not ignore). + */ +MODULE_PARM_DESC(ignore_min_pcap, "Ignore the minimum power cap"); +module_param_named(ignore_min_pcap, amdgpu_ignore_min_pcap, int, 0600); + /** * DOC: vramlimit (int) * Restrict the total amount of VRAM in MiB for testing. The default is 0 (Use full VRAM). diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c index 20c53eefd680..6d00cae9caec 100644 --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c @@ -2984,6 +2984,9 @@ static ssize_t amdgpu_hwmon_show_power_cap_min(struct device *dev, struct device_attribute *attr, char *buf) { + if (amdgpu_ignore_min_pcap) + return sysfs_emit(buf, "%i\n", 0); + return amdgpu_hwmon_show_power_cap_generic(dev, attr, buf, PP_PWR_LIMIT_MIN); } diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c index 60dce148b2d7..72891811b2d5 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c @@ -2487,7 +2487,10 @@ int smu_get_power_limit(void *handle, *limit = smu->max_power_limit; break; case SMU_PPT_LIMIT_MIN: - *limit = smu->min_power_limit; + if (amdgpu_ignore_min_pcap) +*limit = 0; + else +*limit = smu->min_power_limit; break; default: return -EINVAL; @@ -2511,7 +2514,14 @@ static int smu_set_power_limit(void *handle, uint32_t limit) if (smu->ppt_funcs->set_power_limit) return smu->ppt_funcs->set_power_limit(smu, limit_type, limit); - if ((limit > smu->max_power_limit) || (limit < smu->min_power_limit)) { + if (amdgpu_ignore_min_pcap) { + if ((limit > smu->max_power_limit)) { + dev_err(smu->adev->dev, +"New power limit (%d) is over the max allowed %d\n", +limit, smu->max_power_limit); + return -EINVAL; + } + } else if ((limit > smu->max_power_limit) || (limit < smu->min_power_limit)) { dev_err(smu->adev->dev, "New power limit (%d) is out of range [%d,%d]\n", limit, smu->min_power_limit, smu->max_power_limit);
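Assuming the parameter lands as posted (0600 permissions, so root-writable at runtime), usage would look something like the following. The hwmon index varies per system, and the hwmon power cap value is in microwatts:

```
# at module load time
modprobe amdgpu ignore_min_pcap=1

# or at runtime, before writing a below-minimum cap (5 W here)
echo 1 > /sys/module/amdgpu/parameters/ignore_min_pcap
echo 5000000 > /sys/class/drm/card0/device/hwmon/hwmon0/power1_cap
```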
[PATCH] Documentation/gpu: Document the situation with unqualified drm-memory-
From: Tvrtko Ursulin Currently it is not well defined what is drm-memory- compared to other categories. In practice the only driver which emits these keys is amdgpu and in them exposes the total memory use (including shared). Document that drm-memory- and drm-total-memory- are aliases to prevent any confusion in the future. While at it also clarify that the reserved sub-string 'memory' refers to the memory region component. Signed-off-by: Tvrtko Ursulin Cc: Alex Deucher Cc: Christian König Cc: Rob Clark --- Documentation/gpu/drm-usage-stats.rst | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst index 6dc299343b48..ef5c0a0aa477 100644 --- a/Documentation/gpu/drm-usage-stats.rst +++ b/Documentation/gpu/drm-usage-stats.rst @@ -128,7 +128,9 @@ Memory Each possible memory type which can be used to store buffer objects by the GPU in question shall be given a stable and unique name to be returned as the -string here. The name "memory" is reserved to refer to normal system memory. +string here. + +The region name "memory" is reserved to refer to normal system memory. Value shall reflect the amount of storage currently consumed by the buffer objects belong to this client, in the respective memory region. @@ -136,6 +138,9 @@ objects belong to this client, in the respective memory region. Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB' indicating kibi- or mebi-bytes. +This is an alias for drm-total-<region> and only one of the two should be +present. + - drm-shared-<region>: [KiB|MiB] The total size of buffers that are shared with another file (e.g., have more @@ -145,6 +150,9 @@ than a single handle). The total size of buffers that including shared and private memory. +This is an alias for drm-memory-<region> and only one of the two should be +present. + - drm-resident-<region>: [KiB|MiB] The total size of buffers that are resident in the specified region. -- 2.44.0
[PATCH 1/3] drm/amdgpu: Add amdgpu_bo_is_vm_bo helper
From: Tvrtko Ursulin Help code readability by replacing a bunch of: bo->tbo.base.resv == vm->root.bo->tbo.base.resv With: amdgpu_vm_is_bo_always_valid(vm, bo) No functional changes. v2: * Rename helper and move to amdgpu_vm. (Christian) v3: * Use Christian's kerneldoc. Signed-off-by: Tvrtko Ursulin Cc: Christian König Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 41 - drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 ++ 3 files changed, 29 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index 67c234bcf89f..e698d65e9508 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -174,7 +174,7 @@ static int amdgpu_gem_object_open(struct drm_gem_object *obj, return -EPERM; if (abo->flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID && - abo->tbo.base.resv != vm->root.bo->tbo.base.resv) + !amdgpu_vm_is_bo_always_valid(vm, abo)) return -EPERM; r = amdgpu_bo_reserve(abo, false); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 4e2391c83d7c..d451ff9d5af4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -333,7 +333,7 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base, base->next = bo->vm_bo; bo->vm_bo = base; - if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv) + if (!amdgpu_vm_is_bo_always_valid(vm, bo)) return; dma_resv_assert_held(vm->root.bo->tbo.base.resv); @@ -1101,13 +1101,13 @@ static void amdgpu_vm_bo_get_memory(struct amdgpu_bo_va *bo_va, * For now ignore BOs which are currently locked and potentially * changing their location. */ - if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv && + if (!amdgpu_vm_is_bo_always_valid(vm, bo) && !dma_resv_trylock(bo->tbo.base.resv)) return; amdgpu_bo_get_memory(bo, stats); - if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv) - dma_resv_unlock(bo->tbo.base.resv); + if (amdgpu_vm_is_bo_always_valid(vm, bo)) + dma_resv_unlock(bo->tbo.base.resv); } void amdgpu_vm_get_memory(struct amdgpu_vm *vm, @@ -1203,8 +1203,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va, uncached = false; } - if (clear || (bo && bo->tbo.base.resv == - vm->root.bo->tbo.base.resv)) + if (clear || amdgpu_vm_is_bo_always_valid(vm, bo)) last_update = &vm->last_update; else last_update = &bo_va->last_pt_update; @@ -1246,7 +1245,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va, * the evicted list so that it gets validated again on the * next command submission. 
*/ - if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) { + if (amdgpu_vm_is_bo_always_valid(vm, bo)) { uint32_t mem_type = bo->tbo.resource->mem_type; if (!(bo->preferred_domains & @@ -1640,10 +1639,9 @@ static void amdgpu_vm_bo_insert_map(struct amdgpu_device *adev, if (mapping->flags & AMDGPU_PTE_PRT) amdgpu_vm_prt_get(adev); - if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv && - !bo_va->base.moved) { + if (amdgpu_vm_is_bo_always_valid(vm, bo) && !bo_va->base.moved) amdgpu_vm_bo_moved(&bo_va->base); - } + trace_amdgpu_vm_bo_map(bo_va, mapping); } @@ -1942,7 +1940,7 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev, if (before->flags & AMDGPU_PTE_PRT) amdgpu_vm_prt_get(adev); - if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv && + if (amdgpu_vm_is_bo_always_valid(vm, bo) && !before->bo_va->base.moved) amdgpu_vm_bo_moved(&before->bo_va->base); } else { @@ -1957,7 +1955,7 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev, if (after->flags & AMDGPU_PTE_PRT) amdgpu_vm_prt_get(adev); - if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv && + if (amdgpu_vm_is_bo_always_valid(vm, bo) && !after->bo_va->base.moved) amdgpu_vm_bo_moved(&after->bo_va->base); } else { @@ -2037,7 +2035,7 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev, if (bo) { dma_resv_assert_held(bo->tbo.base.resv); - if (bo->tbo.base.resv == vm->root.bo->tbo.base.resv) + if (amdgpu_vm_is_bo_always_valid(vm, bo))
[PATCH 2/3] drm/amdgpu: Reduce mem_type to domain double indirection
From: Tvrtko Ursulin All memory domains apart from AMDGPU_GEM_DOMAIN_GTT map 1:1 to TTM placements, while AMDGPU_GEM_DOMAIN_GTT can be either AMDGPU_PL_PREEMPT or TTM_PL_TT, depending on AMDGPU_GEM_CREATE_PREEMPTIBLE. Simplify a few places in the code which convert the TTM placement into a domain by checking against the current placement directly. In the conversion AMDGPU_PL_PREEMPT does not have to be handled because amdgpu_mem_type_to_domain() cannot return that value anyway. v2: * Remove AMDGPU_PL_PREEMPT handling. v3: * Rebase. Signed-off-by: Tvrtko Ursulin Reviewed-by: Christian König # v1 Reviewed-by: Felix Kuehling # v2 --- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 3 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 29 + 2 files changed, 13 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c index 055ba2ea4c12..0b3b10d21952 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c @@ -165,8 +165,7 @@ static struct sg_table *amdgpu_dma_buf_map(struct dma_buf_attachment *attach, if (r) return ERR_PTR(r); - } else if (!(amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type) & -AMDGPU_GEM_DOMAIN_GTT)) { + } else if (bo->tbo.resource->mem_type != TTM_PL_TT) { return ERR_PTR(-EBUSY); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 8d8c39be6129..4f9073dd19eb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -983,12 +983,11 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain, ttm_bo_pin(&bo->tbo); - domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type); - if (domain == AMDGPU_GEM_DOMAIN_VRAM) { + if (bo->tbo.resource->mem_type == TTM_PL_VRAM) { atomic64_add(amdgpu_bo_size(bo), &adev->vram_pin_size); atomic64_add(amdgpu_vram_mgr_bo_visible_size(bo), &adev->visible_pin_size); - } else if (domain == AMDGPU_GEM_DOMAIN_GTT) { + } else if (bo->tbo.resource->mem_type == TTM_PL_TT) { atomic64_add(amdgpu_bo_size(bo), &adev->gart_pin_size); } @@ -1293,7 +1292,6 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo, struct ttm_resource *res = bo->tbo.resource; uint64_t size = amdgpu_bo_size(bo); struct drm_gem_object *obj; - unsigned int domain; bool shared; /* Abort if the BO doesn't currently have a backing store */ @@ -1303,21 +1301,20 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo, obj = &bo->tbo.base; shared = drm_gem_object_is_shared_for_memory_stats(obj); - domain = amdgpu_mem_type_to_domain(res->mem_type); - switch (domain) { - case AMDGPU_GEM_DOMAIN_VRAM: + switch (res->mem_type) { + case TTM_PL_VRAM: stats->vram += size; - if (amdgpu_res_cpu_visible(adev, bo->tbo.resource)) + if (amdgpu_res_cpu_visible(adev, res)) stats->visible_vram += size; if (shared) stats->vram_shared += size; break; - case AMDGPU_GEM_DOMAIN_GTT: + case TTM_PL_TT: stats->gtt += size; if (shared) stats->gtt_shared += size; break; - case AMDGPU_GEM_DOMAIN_CPU: + case TTM_PL_SYSTEM: default: stats->cpu += size; if (shared) @@ -1330,7 +1327,7 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo, if (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) stats->requested_visible_vram += size; - if (domain != AMDGPU_GEM_DOMAIN_VRAM) { + if (res->mem_type != TTM_PL_VRAM) { stats->evicted_vram += size; if (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) stats->evicted_visible_vram += size; @@ -1604,20 +1601,18 @@ u64 amdgpu_bo_print_info(int id, struct amdgpu_bo *bo, struct 
seq_file *m) u64 size; if (dma_resv_trylock(bo->tbo.base.resv)) { - unsigned int domain; - domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type); - switch (domain) { - case AMDGPU_GEM_DOMAIN_VRAM: + switch (bo->tbo.resource->mem_type) { + case TTM_PL_VRAM: if (amdgpu_res_cpu_visible(adev, bo->tbo.resource)) placement = "VRAM VISIBLE"; else placement = "VRAM"; break; - case AMDGPU_GEM_DOMAIN_GTT: +
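For reference, the indirection being removed goes through amdgpu_mem_type_to_domain(). A sketch of the relevant part of that mapping (the GDS/GWS/OA/DOORBELL cases are omitted here, and the exact upstream function may differ slightly):

uint32_t amdgpu_mem_type_to_domain(u32 mem_type)
{
	switch (mem_type) {
	case TTM_PL_VRAM:
		return AMDGPU_GEM_DOMAIN_VRAM;
	case TTM_PL_TT:
		return AMDGPU_GEM_DOMAIN_GTT;
	case TTM_PL_SYSTEM:
		return AMDGPU_GEM_DOMAIN_CPU;
	default:
		/* AMDGPU_PL_PREEMPT is deliberately not handled */
		return 0;
	}
}

Since AMDGPU_PL_PREEMPT never produced a domain, the old domain == AMDGPU_GEM_DOMAIN_GTT checks already excluded preemptible BOs, and the new direct mem_type == TTM_PL_TT comparisons keep that behaviour.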
[PATCH 3/3] drm/amdgpu: Describe all object placements in debugfs
From: Tvrtko Ursulin Accurately show all placements when describing objects in debugfs, instead of bunching them up under the 'CPU' placement. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Felix Kuehling Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 4f9073dd19eb..fa5227a4aac2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -1612,6 +1612,21 @@ u64 amdgpu_bo_print_info(int id, struct amdgpu_bo *bo, struct seq_file *m) case TTM_PL_TT: placement = "GTT"; break; + case AMDGPU_PL_GDS: + placement = "GDS"; + break; + case AMDGPU_PL_GWS: + placement = "GWS"; + break; + case AMDGPU_PL_OA: + placement = "OA"; + break; + case AMDGPU_PL_PREEMPT: + placement = "PREEMPTIBLE"; + break; + case AMDGPU_PL_DOORBELL: + placement = "DOORBELL"; + break; case TTM_PL_SYSTEM: default: placement = "CPU"; -- 2.44.0
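Combined with the previous patch, the resulting switch in amdgpu_bo_print_info() reads as follows (reconstructed from the two hunks; only the placement-string selection is shown):

	switch (bo->tbo.resource->mem_type) {
	case TTM_PL_VRAM:
		if (amdgpu_res_cpu_visible(adev, bo->tbo.resource))
			placement = "VRAM VISIBLE";
		else
			placement = "VRAM";
		break;
	case TTM_PL_TT:
		placement = "GTT";
		break;
	case AMDGPU_PL_GDS:
		placement = "GDS";
		break;
	case AMDGPU_PL_GWS:
		placement = "GWS";
		break;
	case AMDGPU_PL_OA:
		placement = "OA";
		break;
	case AMDGPU_PL_PREEMPT:
		placement = "PREEMPTIBLE";
		break;
	case AMDGPU_PL_DOORBELL:
		placement = "DOORBELL";
		break;
	case TTM_PL_SYSTEM:
	default:
		placement = "CPU";
		break;
	}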
[PATCH v1 4/4] drm/amdgpu: add gfx queue registers for gfx10 ip dump
Add gfx queue registers to the list of registers to be dumped in the ip dump for gfx10. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 26 +- 1 file changed, 25 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index 00c7a842ea3b..bef7d8ca35df 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -418,7 +418,31 @@ static const struct amdgpu_hwip_reg_entry gc_reg_list_10_1[] = { SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_OFFSET), SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_DW_CNT), SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_WG_STATE_OFFSET), - SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_STATUS) + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_STATUS), + /* gfx queue registers */ + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_ACTIVE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_QUEUE_PRIORITY), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_BASE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_BASE_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CSMD_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_WPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_WPTR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_DEQUEUE_REQUEST), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_MAPPED), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_QUE_MGR_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_HQ_CONTROL0), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_HQ_STATUS0), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CE_WPTR_POLL_ADDR_LO), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CE_WPTR_POLL_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CE_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CE_CSMD_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CE_WPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CE_WPTR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_MQD_BASE_ADDR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_MQD_BASE_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB_WPTR_POLL_ADDR_LO), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB_WPTR_POLL_ADDR_HI) }; static const struct soc15_reg_golden golden_settings_gc_10_1[] = { -- 2.34.1
[PATCH v1 2/4] drm/amdgpu: add se registers to ip dump for gfx10
Add the SE status registers of the gfx block to the ip dump for the gfx10 IP. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index 61c1e997f794..953df202953a 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -373,7 +373,12 @@ static const struct amdgpu_hwip_reg_entry gc_reg_list_10_1[] = { SOC15_REG_ENTRY_STR(GC, 0, mmCP_MEC_ME2_HEADER_DUMP), SOC15_REG_ENTRY_STR(GC, 0, mmCP_PFP_HEADER_DUMP), SOC15_REG_ENTRY_STR(GC, 0, mmCP_ME_HEADER_DUMP), - SOC15_REG_ENTRY_STR(GC, 0, mmCP_MES_HEADER_DUMP) + SOC15_REG_ENTRY_STR(GC, 0, mmCP_MES_HEADER_DUMP), + /* SE status registers */ + SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE0), + SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE1), + SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE2), + SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3) }; static const struct soc15_reg_golden golden_settings_gc_10_1[] = { -- 2.34.1
[PATCH v1 3/4] drm/amdgpu: add compute registers in ip dump for gfx10
Add compute registers to the set of registers dumped during the ip dump for gfx10. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 42 +- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index 953df202953a..00c7a842ea3b 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -378,7 +378,47 @@ static const struct amdgpu_hwip_reg_entry gc_reg_list_10_1[] = { SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE0), SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE1), SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE2), - SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3) + SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3), + /* compute registers */ + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_VMID), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PERSISTENT_STATE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PIPE_PRIORITY), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUEUE_PRIORITY), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUANTUM), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_REQUEST), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_EVENTS), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_LO), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_SIZE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_WG_STATE_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_SIZE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_GDS_RESOURCE_STATE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_ERROR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR_MEM), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_LO), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_DW_CNT), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_WG_STATE_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_STATUS) }; static const struct soc15_reg_golden golden_settings_gc_10_1[] = { -- 2.34.1
[PATCH v1 0/4] Adding different blocks of gfx10 registers for register dump
*** BLURB HERE *** Sunil Khatri (4): drm/amdgpu: add CP headers registers to gfx10 dump drm/amdgpu: add se registers to ip dump for gfx10 drm/amdgpu: add compute registers in ip dump for gfx10 drm/amdgpu: add gfx queue registers for gfx10 ip dump drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 78 +- 1 file changed, 77 insertions(+), 1 deletion(-) -- 2.34.1
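For context, a list like gc_reg_list_10_1 is consumed by the IP block's dump callback, which snapshots every named register so it can later be attached to a devcoredump. A minimal sketch of that consumer (the function and field names here are illustrative, not taken from this series):

static void gfx_v10_0_ip_dump(struct amdgpu_device *adev)
{
	uint32_t i;

	/* Snapshot each register named in gc_reg_list_10_1 */
	for (i = 0; i < ARRAY_SIZE(gc_reg_list_10_1); i++)
		adev->gfx.ip_dump[i] =
			RREG32(SOC15_REG_ENTRY_OFFSET(gc_reg_list_10_1[i]));
}

assuming a companion macro SOC15_REG_ENTRY_OFFSET() that resolves the (GC, 0, mm...) triplet recorded by SOC15_REG_ENTRY_STR to an absolute register offset.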
[PATCH v1 1/4] drm/amdgpu: add CP headers registers to gfx10 dump
Add the CP header registers to the ip dump for gfx10. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index 3171ed5e5af3..61c1e997f794 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -366,7 +366,14 @@ static const struct amdgpu_hwip_reg_entry gc_reg_list_10_1[] = { SOC15_REG_ENTRY_STR(GC, 0, mmRLC_GPM_DEBUG_INST_A), SOC15_REG_ENTRY_STR(GC, 0, mmRLC_GPM_DEBUG_INST_B), SOC15_REG_ENTRY_STR(GC, 0, mmRLC_GPM_DEBUG_INST_ADDR), - SOC15_REG_ENTRY_STR(GC, 0, mmRLC_LX6_CORE_PDEBUG_INST) + SOC15_REG_ENTRY_STR(GC, 0, mmRLC_LX6_CORE_PDEBUG_INST), + /* cp header registers */ + SOC15_REG_ENTRY_STR(GC, 0, mmCP_CE_HEADER_DUMP), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_MEC_ME1_HEADER_DUMP), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_MEC_ME2_HEADER_DUMP), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_PFP_HEADER_DUMP), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_ME_HEADER_DUMP), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_MES_HEADER_DUMP) }; static const struct soc15_reg_golden golden_settings_gc_10_1[] = { -- 2.34.1
Re: [PATCH 1/3] drm/amdgpu: Add amdgpu_bo_is_vm_bo helper
On 03.05.24 10:23, Tvrtko Ursulin wrote: On 03/05/2024 07:26, Christian König wrote: On 29.04.24 18:47, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Help code readability by replacing a bunch of: bo->tbo.base.resv == vm->root.bo->tbo.base.resv With: amdgpu_vm_is_bo_always_valid(vm, bo) No functional changes. v2: * Rename helper and move to amdgpu_vm. (Christian) Signed-off-by: Tvrtko Ursulin Cc: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 40 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 ++ 3 files changed, 28 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index 67c234bcf89f..e698d65e9508 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -174,7 +174,7 @@ static int amdgpu_gem_object_open(struct drm_gem_object *obj, return -EPERM; if (abo->flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID && - abo->tbo.base.resv != vm->root.bo->tbo.base.resv) + !amdgpu_vm_is_bo_always_valid(vm, abo)) return -EPERM; r = amdgpu_bo_reserve(abo, false); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 8af3f0fd3073..01ca4b35b369 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -333,7 +333,7 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base, base->next = bo->vm_bo; bo->vm_bo = base; - if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv) + if (!amdgpu_vm_is_bo_always_valid(vm, bo)) return; dma_resv_assert_held(vm->root.bo->tbo.base.resv); @@ -1101,13 +1101,13 @@ static void amdgpu_vm_bo_get_memory(struct amdgpu_bo_va *bo_va, * For now ignore BOs which are currently locked and potentially * changing their location. */ - if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv && + if (!amdgpu_vm_is_bo_always_valid(vm, bo) && !dma_resv_trylock(bo->tbo.base.resv)) return; amdgpu_bo_get_memory(bo, stats); - if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv) - dma_resv_unlock(bo->tbo.base.resv); + if (!amdgpu_vm_is_bo_always_valid(vm, bo)) + dma_resv_unlock(bo->tbo.base.resv); } void amdgpu_vm_get_memory(struct amdgpu_vm *vm, @@ -1203,8 +1203,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va, uncached = false; } - if (clear || (bo && bo->tbo.base.resv == - vm->root.bo->tbo.base.resv)) + if (clear || amdgpu_vm_is_bo_always_valid(vm, bo)) last_update = &vm->last_update; else last_update = &bo_va->last_pt_update; @@ -1246,7 +1245,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va, * the evicted list so that it gets validated again on the * next command submission. 
*/ - if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv) { + if (amdgpu_vm_is_bo_always_valid(vm, bo)) { uint32_t mem_type = bo->tbo.resource->mem_type; if (!(bo->preferred_domains & @@ -1640,10 +1639,9 @@ static void amdgpu_vm_bo_insert_map(struct amdgpu_device *adev, if (mapping->flags & AMDGPU_PTE_PRT) amdgpu_vm_prt_get(adev); - if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv && - !bo_va->base.moved) { + if (amdgpu_vm_is_bo_always_valid(vm, bo) && !bo_va->base.moved) amdgpu_vm_bo_moved(&bo_va->base); - } + trace_amdgpu_vm_bo_map(bo_va, mapping); } @@ -1922,7 +1920,7 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev, if (before->flags & AMDGPU_PTE_PRT) amdgpu_vm_prt_get(adev); - if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv && + if (amdgpu_vm_is_bo_always_valid(vm, bo) && !before->bo_va->base.moved) amdgpu_vm_bo_moved(&before->bo_va->base); } else { @@ -1937,7 +1935,7 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev, if (after->flags & AMDGPU_PTE_PRT) amdgpu_vm_prt_get(adev); - if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv && + if (amdgpu_vm_is_bo_always_valid(vm, bo) && !after->bo_va->base.moved) amdgpu_vm_bo_moved(&after->bo_va->base); } else { @@ -2017,7 +2015,7 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev, if (bo) { dma_resv_assert_held(bo->tbo.base.resv); - if (bo->tbo.base.resv == vm->root.bo->tbo.base.resv) + if (amdgpu_vm_is_bo_always_valid(vm, bo)) ttm_bo_set_bulk_move(&bo->tbo, NULL); for (base = &bo_va->base.bo->vm_bo; *base; @@ -2111,7 +2109,7 @@ void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev, for (bo_base = bo->vm_bo; bo_base; bo_base = bo_base->next) { st
Re: [linux-next:master] BUILD REGRESSION 9c6ecb3cb6e20c4fd7997047213ba0efcf9ada1a
Ok, I'm getting tired of seeing these for the USB portion of the tree, so I went to look for: On Fri, May 03, 2024 at 04:44:42AM +0800, kernel test robot wrote: > |-- arc-randconfig-002-20240503 > | `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used This warning (same for all arches), but can't seem to find it anywhere. Any hints as to where it would be? thanks, greg k-h
Re: [linux-next:master] BUILD REGRESSION 9c6ecb3cb6e20c4fd7997047213ba0efcf9ada1a
On Fri, May 03, 2024 at 11:30:50AM +0530, Krishna Kurapati PSSNV wrote: > > > On 5/3/2024 10:42 AM, Greg KH wrote: > > Ok, I'm getting tired of seeing these for the USB portion of the tree, > > so I went to look for: > > > > On Fri, May 03, 2024 at 04:44:42AM +0800, kernel test robot wrote: > > > |-- arc-randconfig-002-20240503 > > > | `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used > > > > This warning (same for all arches), but can't seem to find it anywhere. > > > > Any hints as to where it would be? > > > > Hi Greg, > > I think the hw_mode was not removed in hs_phy_setup and left unused. > > Thinh reported the same when there was a merge conflict into linux next > (that the hw_mode variable was removed in ss_phy_setup and should be removed > in hs_phy_setup as well): > > https://lore.kernel.org/all/20240426213923.tyeddub4xszyp...@synopsys.com/ > > Perhaps that was missed ? Must have been. I need it in a format that it can be applied in (a 2-way diff can't apply...) thanks, greg k-h
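For reference, the fix being requested would presumably mirror the ss_phy_setup cleanup Thinh linked above, i.e. dropping the now-unused local (a sketch only; the exact context in drivers/usb/dwc3/core.c may differ):

@@ static int dwc3_hs_phy_setup(struct dwc3 *dwc)
-	unsigned int hw_mode;
 	u32 reg;
 
-	hw_mode = DWC3_GHWPARAMS0_MODE(dwc->hwparams.hwparams0);
-
 	reg = dwc3_readl(dwc->regs, DWC3_GUSB2PHYCFG(0));

which silences the -Wunused-but-set-variable warning without changing behaviour.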
Re: [RFC 5/5] drm/amdgpu: Only show VRAM in fdinfo if it exists
On 02/05/2024 14:16, Christian König wrote: On 30.04.24 19:27, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Do not emit the key-value pairs if the VRAM does not exist, i.e. VRAM placement is not valid and accessible. Yeah, that's unfortunately rather misleading. Even APUs have VRAM or rather stolen system memory which is managed by the graphics driver. We only have a single compute model which really doesn't have VRAM at all. Hm what is misleading and how more precisely? :) Maybe in other words, if is_app_apu is not the right criterion to know when TTM_PL_VRAM is impossible, what is? Is the compute model you mention the only thing which sets is_app_apu and uses the dummy vram manager? Regards, Tvrtko Regards, Christian. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 29 +- 1 file changed, 17 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c index a09944104c41..603a5c010f5d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c @@ -83,25 +83,30 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct drm_file *file) */ drm_printf(p, "pasid:\t%u\n", fpriv->vm.pasid); - drm_printf(p, "drm-memory-vram:\t%llu KiB\n", stats.vram/1024UL); drm_printf(p, "drm-memory-gtt: \t%llu KiB\n", stats.gtt/1024UL); drm_printf(p, "drm-memory-cpu: \t%llu KiB\n", stats.cpu/1024UL); - drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n", - stats.visible_vram/1024UL); - drm_printf(p, "amd-evicted-vram:\t%llu KiB\n", - stats.evicted_vram/1024UL); - drm_printf(p, "amd-evicted-visible-vram:\t%llu KiB\n", - stats.evicted_visible_vram/1024UL); - drm_printf(p, "amd-requested-vram:\t%llu KiB\n", - stats.requested_vram/1024UL); - drm_printf(p, "amd-requested-visible-vram:\t%llu KiB\n", - stats.requested_visible_vram/1024UL); drm_printf(p, "amd-requested-gtt:\t%llu KiB\n", stats.requested_gtt/1024UL); - drm_printf(p, "drm-shared-vram:\t%llu KiB\n", stats.vram_shared/1024UL); drm_printf(p, "drm-shared-gtt:\t%llu KiB\n", stats.gtt_shared/1024UL); drm_printf(p, "drm-shared-cpu:\t%llu KiB\n", stats.cpu_shared/1024UL); + if (!adev->gmc.is_app_apu) { + drm_printf(p, "drm-memory-vram:\t%llu KiB\n", + stats.vram/1024UL); + drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n", + stats.visible_vram/1024UL); + drm_printf(p, "amd-evicted-vram:\t%llu KiB\n", + stats.evicted_vram/1024UL); + drm_printf(p, "amd-evicted-visible-vram:\t%llu KiB\n", + stats.evicted_visible_vram/1024UL); + drm_printf(p, "amd-requested-vram:\t%llu KiB\n", + stats.requested_vram/1024UL); + drm_printf(p, "amd-requested-visible-vram:\t%llu KiB\n", + stats.requested_visible_vram/1024UL); + drm_printf(p, "drm-shared-vram:\t%llu KiB\n", + stats.vram_shared/1024UL); + } + for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) { if (!usage[hw_ip]) continue;
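To illustrate the effect, fdinfo for a client on such a VRAM-less device would then contain only the GTT/CPU keys (hypothetical values):

	pasid:	32770
	drm-memory-gtt: 	8192 KiB
	drm-memory-cpu: 	256 KiB
	amd-requested-gtt:	8192 KiB
	drm-shared-gtt:	0 KiB
	drm-shared-cpu:	0 KiB

with all the vram keys suppressed entirely rather than printed as 0 KiB.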
Re: [RFC 0/5] Add capacity key to fdinfo
On 02/05/2024 14:07, Christian König wrote: On 01.05.24 15:27, Tvrtko Ursulin wrote: Hi Alex, On 30/04/2024 19:32, Alex Deucher wrote: On Tue, Apr 30, 2024 at 1:27 PM Tvrtko Ursulin wrote: From: Tvrtko Ursulin I have noticed AMD GPUs can have more than one "engine" (ring?) of the same type but amdgpu is not reporting that in fdinfo using the capacity engine tag. This series is therefore an attempt to improve that, but only an RFC since it is quite likely I got stuff wrong on the first attempt. Or if not wrong it may not be very beneficial in AMD's case. So I tried to figure out how to count and store the number of instances of an "engine" type and spotted that could perhaps be used in more than one place in the driver. I was more than a little bit confused by the ip_instance and uapi rings, then how rings are selected to context entities internally. Anyway.. hopefully it is a simple enough series to easily spot any such large misses. End result should be that, assuming two "engine" instances, one fully loaded and one idle will only report client using 50% of that engine type. That would only be true if there are multiple instantiations of the IP on the chip which in most cases is not true. In most cases there is one instance of the IP that can be fed from multiple rings. E.g. for graphics and compute, all of the rings ultimately feed into the same compute units on the chip. So if you have a gfx ring and a compute ring, you can schedule work to them asynchronously, but ultimately whether they execute serially or in parallel depends on the actual shader code in the command buffers and the extent to which it can utilize the available compute units in the shader cores. This is the same as with Intel/i915. Fdinfo is not intended to provide utilisation of EUs and such, just how busy are the "entities" kernel submits to. So doing something like in this series would make the reporting more similar between the two drivers. I think both the 0-800% or 0-100% range (taking 8 ring compute as an example) can be misleading for different workloads. Neither does <800% in the former mean one can send more work, and the same goes for <100% in the latter. Yeah, I think that's what Alex tries to describe. By using 8 compute rings your 800% load is actually incorrect and quite misleading. Background is that those 8 compute rings won't be active all at the same time, but rather waiting on each other for resources. But this "waiting" is unfortunately considered execution time since the used approach is actually not really capable of separating waiting and execution time. Right, so 800% is what gputop could be suggesting today, by virtue of the fact that 8 context/clients can each use 100% if they only use a subset of compute units. I was proposing to expose the capacity in fdinfo so it can be scaled down and then discussing how both situations have pros and cons. There is also a parallel with the CPU world here and hyper threading, if not wider, where "What does 100% actually mean?" is also wishy-washy. Also note that the reporting of actual time based values in fdinfo would not change with this series. Or if you can guide me towards how to distinguish real vs fake parallelism in HW IP blocks I could modify the series to only add capacity tags where there are truly independent blocks. That would be different from i915 though where I did not bother with that distinction. (For reasons that assignment of for instance EUs to compute "rings" (command streamers in i915) was supposed to be possible to re-configure on the fly. 
So it did not make sense to try and be super smart in fdinfo.) Well exactly, that's the point: we don't really have truly independent blocks on AMD hardware. There are things like independent SDMA instances, but those are meant to be used like the first instance for uploads and the second for downloads etc. When you use both instances for the same job they will pretty much limit each other because of a single resource. So _never_ multiple instances of the same IP block? No video decode, encode, anything? As for the UAPI portion of this, we generally expose a limited number of rings to user space and then we use the GPU scheduler to load balance between all of the available rings of a type to try and extract as much parallelism as we can. The part I do not understand is the purpose of the ring argument in for instance drm_amdgpu_cs_chunk_ib. It appears userspace can create up to N scheduling entities using different ring ids, but internally they can map to 1:N same scheduler instances (depending on IP type, can be that each userspace ring maps to same N hw rings, or for rings with no drm sched load balancing userspace ring also does not appear to have a relation to the picked drm sched instance.). So I neither understand how this ring is useful, nor how it does not create a problem for IP types which use drm_sched_pick_best. It appears even if u
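For readers following the thread: the capacity tag in question is the optional fdinfo key from Documentation/gpu/drm-usage-stats.rst that sits next to the busyness key, e.g. (hypothetical values):

	drm-engine-compute:	1000000000 ns
	drm-engine-capacity-compute:	2

A sampling tool computes utilisation as delta(drm-engine-compute) / (delta(wall time) * capacity), which is how a client keeping one of two instances fully busy would read as 50% rather than 100%.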
Re: [PATCH v1 12/12] fbdev/viafb: Make I2C terminology more inclusive
On 5/2/2024 3:46 AM, Thomas Zimmermann wrote: > > > On 30.04.24 19:38, Easwar Hariharan wrote: >> I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave" >> with more appropriate terms. Inspired by and following on to Wolfram's >> series to fix drivers/i2c/[1], fix the terminology for users of >> I2C_ALGOBIT bitbanging interface, now that the approved verbiage exists >> in the specification. >> >> Compile tested, no functionality changes intended >> >> [1]: >> https://lore.kernel.org/all/20240322132619.6389-1-wsa+rene...@sang-engineering.com/ >> >> Signed-off-by: Easwar Hariharan > > Acked-by: Thomas Zimmermann > Thanks for the ack! I had been addressing feedback as I got it on the v0 series, and it seems I missed out on updating viafb and smscufx to spec-compliant controller/target terminology like the v0->v1 changelog calls out before posting v1. For smscufx, I feel phrasing the following line (as an example) > -/* sets up I2C Controller for 100 Kbps, std. speed, 7-bit addr, host, > +/* sets up I2C Controller for 100 Kbps, std. speed, 7-bit addr, > *controller*, would actually impact readability negatively, so I propose to leave smscufx as is. For viafb, I propose making it compliant with the spec using the controller/target terminology and posting a v2 respin (which I can send out as soon as you say) and ask you to review again. What do you think? Thanks, Easwar >> --- >> drivers/video/fbdev/via/chip.h | 8 >> drivers/video/fbdev/via/dvi.c | 24 >> drivers/video/fbdev/via/lcd.c | 6 +++--- >> drivers/video/fbdev/via/via_aux.h | 2 +- >> drivers/video/fbdev/via/via_i2c.c | 12 ++-- >> drivers/video/fbdev/via/vt1636.c | 6 +++--- >> 6 files changed, 29 insertions(+), 29 deletions(-) >>
Re: [linux-next:master] BUILD REGRESSION 9c6ecb3cb6e20c4fd7997047213ba0efcf9ada1a
On 5/3/2024 10:42 AM, Greg KH wrote: Ok, I'm getting tired of seeing these for the USB portion of the tree, so I went to look for: On Fri, May 03, 2024 at 04:44:42AM +0800, kernel test robot wrote: |-- arc-randconfig-002-20240503 | `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used This warning (same for all arches), but can't seem to find it anywhere. Any hints as to where it would be? Hi Greg, I think the hw_mode was not removed in hs_phy_setup and left unused. Thinh reported the same when there was a merge conflict into linux next (that the hw_mode variable was removed in ss_phy_setup and should be removed in hs_phy_setup as well): https://lore.kernel.org/all/20240426213923.tyeddub4xszyp...@synopsys.com/ Perhaps that was missed ? Regards, Krishna,
Re: [PATCH v1 12/12] fbdev/viafb: Make I2C terminology more inclusive
Hi On 03.05.24 00:26, Easwar Hariharan wrote: On 5/2/2024 3:46 AM, Thomas Zimmermann wrote: On 30.04.24 19:38, Easwar Hariharan wrote: I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave" with more appropriate terms. Inspired by and following on to Wolfram's series to fix drivers/i2c/[1], fix the terminology for users of I2C_ALGOBIT bitbanging interface, now that the approved verbiage exists in the specification. Compile tested, no functionality changes intended [1]: https://lore.kernel.org/all/20240322132619.6389-1-wsa+rene...@sang-engineering.com/ Signed-off-by: Easwar Hariharan Acked-by: Thomas Zimmermann Thanks for the ack! I had been addressing feedback as I got it on the v0 series, and it seems I missed out on updating viafb and smscufx to spec-compliant controller/target terminology like the v0->v1 changelog calls out before posting v1. For smscufx, I feel phrasing the following line (as an example) -/* sets up I2C Controller for 100 Kbps, std. speed, 7-bit addr, host, +/* sets up I2C Controller for 100 Kbps, std. speed, 7-bit addr, *controller*, would actually impact readability negatively, so I propose to leave smscufx as is. Why? I don't see much of a difference. For viafb, I propose making it compliant with the spec using the controller/target terminology and posting a v2 respin (which I can send out as soon as you say) and ask you to review again. What do you think? I think we should adopt the spec's language everywhere. That makes it possible to grep the spec for terms used in the source code. Using 'host' in smscufx appears to introduce yet another term. If you are worried about using 'I2C controller' and 'controller' in the same sentence, you can replace 'I2C controller' with 'DDC channel'. That's even more precise about the purpose of this code. Best regards Thomas Thanks, Easwar --- drivers/video/fbdev/via/chip.h | 8 drivers/video/fbdev/via/dvi.c | 24 drivers/video/fbdev/via/lcd.c | 6 +++--- drivers/video/fbdev/via/via_aux.h | 2 +- drivers/video/fbdev/via/via_i2c.c | 12 ++-- drivers/video/fbdev/via/vt1636.c | 6 +++--- 6 files changed, 29 insertions(+), 29 deletions(-) -- -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Frankenstrasse 146, 90461 Nuernberg, Germany GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman HRB 36809 (AG Nuernberg)
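Taking that suggestion, the smscufx comment quoted earlier could read, for example:

/* sets up DDC channel for 100 Kbps, std. speed, 7-bit addr, controller,

which sidesteps both 'host' and the 'I2C controller'/'controller' repetition.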
Re: [PATCH v1 03/12] drm/i915: Make I2C terminology more inclusive
On Tue, 30 Apr 2024 17:38:02 + Easwar Hariharan wrote: > I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced > "master/slave" with more appropriate terms. Inspired by and following > on to Wolfram's series to fix drivers/i2c/[1], fix the terminology > for users of I2C_ALGOBIT bitbanging interface, now that the approved > verbiage exists in the specification. > > Compile tested, no functionality changes intended > For GVT part, Acked-by: Zhi Wang Thanks, Zhi. > [1]: > https://lore.kernel.org/all/20240322132619.6389-1-wsa+rene...@sang-engineering.com/ > > Signed-off-by: Easwar Hariharan > --- > drivers/gpu/drm/i915/display/dvo_ch7017.c | 14 - > drivers/gpu/drm/i915/display/dvo_ch7xxx.c | 18 +-- > drivers/gpu/drm/i915/display/dvo_ivch.c | 16 +- > drivers/gpu/drm/i915/display/dvo_ns2501.c | 18 +-- > drivers/gpu/drm/i915/display/dvo_sil164.c | 18 +-- > drivers/gpu/drm/i915/display/dvo_tfp410.c | 18 +-- > drivers/gpu/drm/i915/display/intel_bios.c | 22 +++--- > drivers/gpu/drm/i915/display/intel_ddi.c | 2 +- > .../gpu/drm/i915/display/intel_display_core.h | 2 +- > drivers/gpu/drm/i915/display/intel_dsi.h | 2 +- > drivers/gpu/drm/i915/display/intel_dsi_vbt.c | 20 ++--- > drivers/gpu/drm/i915/display/intel_dvo.c | 14 - > drivers/gpu/drm/i915/display/intel_dvo_dev.h | 2 +- > drivers/gpu/drm/i915/display/intel_gmbus.c| 4 +-- > drivers/gpu/drm/i915/display/intel_sdvo.c | 30 > +-- drivers/gpu/drm/i915/display/intel_vbt_defs.h | > 4 +-- drivers/gpu/drm/i915/gvt/edid.c | 28 > - drivers/gpu/drm/i915/gvt/edid.h | 4 > +-- drivers/gpu/drm/i915/gvt/opregion.c | 2 +- > 19 files changed, 119 insertions(+), 119 deletions(-) > > diff --git a/drivers/gpu/drm/i915/display/dvo_ch7017.c > b/drivers/gpu/drm/i915/display/dvo_ch7017.c index > d0c3880d7f80..493e730c685b 100644 --- > a/drivers/gpu/drm/i915/display/dvo_ch7017.c +++ > b/drivers/gpu/drm/i915/display/dvo_ch7017.c @@ -170,13 +170,13 @@ > static bool ch7017_read(struct intel_dvo_device *dvo, u8 addr, u8 > *val) { struct i2c_msg msgs[] = { > { > - .addr = dvo->slave_addr, > + .addr = dvo->target_addr, > .flags = 0, > .len = 1, > .buf = &addr, > }, > { > - .addr = dvo->slave_addr, > + .addr = dvo->target_addr, > .flags = I2C_M_RD, > .len = 1, > .buf = val, > @@ -189,7 +189,7 @@ static bool ch7017_write(struct intel_dvo_device > *dvo, u8 addr, u8 val) { > u8 buf[2] = { addr, val }; > struct i2c_msg msg = { > - .addr = dvo->slave_addr, > + .addr = dvo->target_addr, > .flags = 0, > .len = 2, > .buf = buf, > @@ -197,7 +197,7 @@ static bool ch7017_write(struct intel_dvo_device > *dvo, u8 addr, u8 val) return i2c_transfer(dvo->i2c_bus, &msg, 1) == > 1; } > > -/** Probes for a CH7017 on the given bus and slave address. */ > +/** Probes for a CH7017 on the given bus and target address. 
*/ > static bool ch7017_init(struct intel_dvo_device *dvo, > struct i2c_adapter *adapter) > { > @@ -227,13 +227,13 @@ static bool ch7017_init(struct intel_dvo_device > *dvo, break; > default: > DRM_DEBUG_KMS("ch701x not detected, got %d: from %s " > - "slave %d.\n", > - val, adapter->name, dvo->slave_addr); > + "target %d.\n", > + val, adapter->name, dvo->target_addr); > goto fail; > } > > DRM_DEBUG_KMS("%s detected on %s, addr %d\n", > - str, adapter->name, dvo->slave_addr); > + str, adapter->name, dvo->target_addr); > return true; > > fail: > diff --git a/drivers/gpu/drm/i915/display/dvo_ch7xxx.c > b/drivers/gpu/drm/i915/display/dvo_ch7xxx.c index > 2e8e85da5a40..534b8544e0a4 100644 --- > a/drivers/gpu/drm/i915/display/dvo_ch7xxx.c +++ > b/drivers/gpu/drm/i915/display/dvo_ch7xxx.c @@ -153,13 +153,13 @@ > static bool ch7xxx_readb(struct intel_dvo_device *dvo, int addr, u8 > *ch) struct i2c_msg msgs[] = { > { > - .addr = dvo->slave_addr, > + .addr = dvo->target_addr, > .flags = 0, > .len = 1, > .buf = out_buf, > }, > { > - .addr = dvo->slave_addr, > + .addr = dvo->target_addr, > .flags = I2C_M_RD, > .len = 1, > .buf = in_buf, > @@ -176,7 +176,7 @@ static bool ch7xxx_readb(