Re: [RFC 0/5] Add capacity key to fdinfo

Tvrtko Ursulin Fri, 03 May 2024 00:50:58 -0700


On 02/05/2024 14:07, Christian König wrote:

Am 01.05.24 um 15:27 schrieb Tvrtko Ursulin:
Hi Alex,

On 30/04/2024 19:32, Alex Deucher wrote:
On Tue, Apr 30, 2024 at 1:27 PM Tvrtko Ursulin <tursu...@igalia.com> wrote:
From: Tvrtko Ursulin <tvrtko.ursu...@igalia.com>
I have noticed AMD GPUs can have more than one "engine" (ring?) of the same type but amdgpu is not reporting that in fdinfo using the capacity engine tag.
This series is therefore an attempt to improve that, but only an RFC since it is quite likely I got stuff wrong on the first attempt. Or if not wrong it may not be very beneficial in AMD's case.
So I tried to figure out how to count and store the number of instances of an "engine" type and spotted that could perhaps be used in more than one place in the driver. I was more than a little bit confused by the ip_instance and uapi rings, then how rings are selected to context entities internally. Anyway.. hopefully it is a simple enough series to easily spot any such large misses.
End result should be that, assuming two "engine" instances, one fully loaded and one idle will only report client using 50% of that engine type.
That would only be true if there are multiple instantiations of the IP
on the chip which in most cases is not true.  In most cases there is
one instance of the IP that can be fed from multiple rings. E.g. for
graphics and compute, all of the rings ultimately feed into the same
compute units on the chip.  So if you have a gfx ring and a compute
ring, you can schedule work to them asynchronously, but ultimately
whether they execute serially or in parallel depends on the actual
shader code in the command buffers and the extent to which it can
utilize the available compute units in the shader cores.
This is the same as with Intel/i915. Fdinfo is not intended to provide utilisation of EUs and such, just how busy are the "entities" kernel submits to. So doing something like in this series would make the reporting more similar between the two drivers.
I think both the 0-800% or 0-100% range (taking 8 ring compute as an example) can be misleading for different workloads. Neither <800% in the former means one can send more work and same for <100% in the latter.
Yeah, I think that's what Alex tries to describe. By using 8 compute rings your 800% load is actually incorrect and quite misleading.
Background is that those 8 compute rings won't be active all at the same time, but rather waiting on each other for resources.
But this "waiting" is unfortunately considered execution time since the used approach is actually not really capable of separating waiting and execution time.

Right, so 800% is what gputop could be suggesting today, by virtue of the fact that 8 contexts/clients can each use 100% if they only use a subset of compute units. I was proposing to expose the capacity in fdinfo so it can be scaled down, and then discussing how both situations have pros and cons.
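
To make the scaling concrete, here is a rough sketch of what a gputop-like tool could do with two fdinfo samples. It assumes the `drm-engine-<name>: <uint> ns` and `drm-engine-capacity-<name>: <uint>` keys from the DRM client usage stats format; the function names, the "compute" engine name and the sample values are made up for illustration and are not taken from gputop or from this series.

```python
def parse_fdinfo(text):
    """Return ({engine: busy_ns}, {engine: capacity}) from fdinfo text."""
    busy, capacity = {}, {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        # Check the capacity prefix first, it also starts with "drm-engine-".
        if key.startswith("drm-engine-capacity-"):
            capacity[key[len("drm-engine-capacity-"):]] = int(value)
        elif key.startswith("drm-engine-"):
            # Value is expected to look like "<uint> ns".
            busy[key[len("drm-engine-"):]] = int(value.split()[0])
    return busy, capacity


def utilisation(before, after, elapsed_ns, scale_by_capacity=True):
    """Percent busy per engine between two samples of one client."""
    busy0, _ = parse_fdinfo(before)
    busy1, cap = parse_fdinfo(after)
    result = {}
    for engine, ns in busy1.items():
        pct = 100.0 * (ns - busy0.get(engine, 0)) / elapsed_ns
        if scale_by_capacity:
            # A missing capacity key is treated as a capacity of one.
            pct /= cap.get(engine, 1)
        result[engine] = pct
    return result


# Hypothetical samples taken one second apart: one of two instances fully busy.
sample0 = "drm-engine-compute:\t0 ns\ndrm-engine-capacity-compute:\t2\n"
sample1 = "drm-engine-compute:\t1000000000 ns\ndrm-engine-capacity-compute:\t2\n"

print(utilisation(sample0, sample1, 1_000_000_000))
print(utilisation(sample0, sample1, 1_000_000_000, scale_by_capacity=False))
```

The time based values are the same in both calls, only the presentation changes. Neither number says whether the hardware can actually take more work, which is the caveat discussed above.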

There is also a parallel with the CPU world here and hyper threading, if not wider, where "What does 100% actually mean?" is also wishy-washy.
Also note that the reporting of actual time based values in fdinfo would not change with this series.
Or if you can guide me towards how to distinguish real vs fake parallelism in HW IP blocks I could modify the series to only add capacity tags where there are truly independent blocks. That would be different from i915 though, where I did not bother with that distinction. (For reasons that assignment of for instance EUs to compute "rings" (command streamers in i915) was supposed to be possible to re-configure on the fly. So it did not make sense to try and be super smart in fdinfo.)
Well exactly that's the point, we don't really have truly independent blocks on AMD hardware.
There are things like independent SDMA instances, but those are meant to be used like the first instance for uploads and the second for downloads etc. When you use both instances for the same job they will pretty much limit each other because of a single resource.

So _never_ multiple instances of the same IP block? No video decode, encode, anything?

As for the UAPI portion of this, we generally expose a limited number
of rings to user space and then we use the GPU scheduler to load
balance between all of the available rings of a type to try and
extract as much parallelism as we can.
The part I do not understand is the purpose of the ring argument in for instance drm_amdgpu_cs_chunk_ib. It appears userspace can create up to N scheduling entities using different ring id's, but internally they can map to 1:N same scheduler instances (depending on IP type, can be that each userspace ring maps to same N hw rings, or for rings with no drm sched load balancing userspace ring also does not appear to have a relation to the picked drm sched instance.).
So I neither understand how this ring is useful, or how it does not create a problem for IP types which use drm_sched_pick_best. It appears even if userspace created two scheduling entities with different ring ids they could randomly map to same drm sched aka same hw ring, no?
Yeah, that is correct. The multimedia instances have to use a "fixed" load balancing because of lack of firmware support. That should have been fixed by now but we never found time to actually validate it.


Gotcha.

Regarding the "ring" parameter in CS, that is basically just for backward compatibility with older userspace. E.g. that we don't map all SDMA jobs to the same instance when only one context is used.

I see. In that sense "limits" for compute in amdgpu_ctx_num_entities are arbitrary, or related to some old userspace expectation?


Regards,

Tvrtko

Regards,
Christian.


Regards,

Tvrtko

Alex


Tvrtko Ursulin (5):
   drm/amdgpu: Cache number of rings per hw ip type
   drm/amdgpu: Use cached number of rings from the AMDGPU_INFO_HW_IP_INFO
     ioctl
   drm/amdgpu: Skip not present rings in amdgpu_ctx_mgr_usage
   drm/amdgpu: Show engine capacity in fdinfo
   drm/amdgpu: Only show VRAM in fdinfo if it exists

  drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c    |  3 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 +++++
  drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 39 +++++++++-----
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 62 +++-------------------
  5 files changed, 49 insertions(+), 70 deletions(-)

--
2.44.0
