Looks good to me.  It might be good to add a comment above pcie_gen in
amdgpu_drm.h to specify that this is the max common PCIe gen supported
by both the GPU and the slot.  With that fixed, the patch is:
Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>
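A sketch of what that comment might look like (the surrounding layout of
struct drm_amdgpu_info_device is abbreviated and only illustrative; the
comment wording is merely a suggestion):

```c
struct drm_amdgpu_info_device {
	...
	/* Maximum PCIe generation supported by both the GPU and the
	 * slot it is plugged into, i.e. the minimum of the two. This
	 * is not the currently negotiated link speed.
	 */
	__u32 pcie_gen;
	/* Maximum number of PCIe lanes (same min-of-both semantics). */
	__u32 pcie_num_lanes;
	...
};
```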

On Fri, Jan 13, 2023 at 6:33 PM Marek Olšák <mar...@gmail.com> wrote:
>
> There is no hole on 32-bit unfortunately. It looks like the hole on 64-bit is 
> now ABI.
>
> I moved the field to replace _pad1. The patch is attached (with your Rb).
>
> Marek
>
> On Fri, Jan 13, 2023 at 4:20 PM Alex Deucher <alexdeuc...@gmail.com> wrote:
>>
>> On Fri, Jan 13, 2023 at 4:02 PM Marek Olšák <mar...@gmail.com> wrote:
>> >
>> > I've added the comments and indeed pahole shows the hole as expected.
>>
>> What about on 32-bit?
>>
>> Alex
>>
>> >
>> > Marek
>> >
>> > On Thu, Jan 12, 2023 at 11:44 AM Alex Deucher <alexdeuc...@gmail.com> 
>> > wrote:
>> >>
>> >> On Thu, Jan 12, 2023 at 6:50 AM Christian König
>> >> <christian.koe...@amd.com> wrote:
>> >> >
>> >> > Am 11.01.23 um 21:48 schrieb Alex Deucher:
>> >> > > On Wed, Jan 4, 2023 at 3:17 PM Marek Olšák <mar...@gmail.com> wrote:
>> >> > >> Yes, it's meant to be like a spec sheet. We are not interested in 
>> >> > >> the current bandwidth utilization.
>> >> > > After chatting with Marek on IRC and thinking about this more, I think
>> >> > > this patch is fine.  It's not really meant for bandwidth per se, but
>> >> > > rather as a limit to determine what the driver should do in certain
>> >> > > cases (i.e., when does it make sense to copy to vram vs not).  It's
>> >> > > not straightforward for userspace to parse the full topology to
>> >> > > determine what links may be slow.  I guess one potential pitfall would
>> >> > > be that if you pass the device into a VM, the driver may report the
>> >> > > wrong values.  Generally in a VM the VM doesn't get the full view up
>> >> > > to the root port.  I don't know if the hypervisors report properly for
>> >> > > pcie_bandwidth_available() in a VM or if it just shows the info about
>> >> > > the endpoint in the VM.
>> >> >
>> >> > So this basically doesn't return the gen and lanes of the device, but
>> >> > rather what was negotiated between the device and the upstream root 
>> >> > port?
>> >>
>> >> Correct. It exposes the max gen and lanes of the slowest link between
>> >> the device and the root port.
>> >>
>> >> >
>> >> > If I got that correctly then we should probably document that, because
>> >> > otherwise somebody will try to "fix" it at some time.
>> >>
>> >> Good point.
>> >>
>> >> Alex
>> >>
>> >> >
>> >> > Christian.
>> >> >
>> >> > >
>> >> > > Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>
>> >> > >
>> >> > > Alex
>> >> > >
>> >> > >> Marek
>> >> > >>
>> >> > >> On Wed, Jan 4, 2023 at 10:33 AM Lazar, Lijo <lijo.la...@amd.com> 
>> >> > >> wrote:
>> >> > >>> [AMD Official Use Only - General]
>> >> > >>>
>> >> > >>>
>> >> > >>> To clarify, with DPM in place, the current bandwidth will be 
>> >> > >>> changing based on the load.
>> >> > >>>
>> >> > >>> If apps/UMDs already have a way to know the current bandwidth 
>> >> > >>> utilisation, then the possible maximum could also be part of the 
>> >> > >>> same API. Otherwise, this only looks like duplicate information; we 
>> >> > >>> have the same information in the sysfs DPM nodes.
>> >> > >>>
>> >> > >>> BTW, I don't know to what extent apps/UMDs really make use of this. 
>> >> > >>> Take the memory frequency as an example (I'm reading it as 16 GHz). 
>> >> > >>> It only looks like a spec sheet.
>> >> > >>>
>> >> > >>> Thanks,
>> >> > >>> Lijo
>> >> > >>> ________________________________
>> >> > >>> From: Marek Olšák <mar...@gmail.com>
>> >> > >>> Sent: Wednesday, January 4, 2023 8:40:00 PM
>> >> > >>> To: Lazar, Lijo <lijo.la...@amd.com>
>> >> > >>> Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
>> >> > >>> Subject: Re: [PATCH 1/2] drm/amdgpu: return the PCIe gen and lanes 
>> >> > >>> from the INFO
>> >> > >>>
>> >> > >>> On Wed, Jan 4, 2023 at 9:19 AM Lazar, Lijo <lijo.la...@amd.com> 
>> >> > >>> wrote:
>> >> > >>>
>> >> > >>>
>> >> > >>>
>> >> > >>> On 1/4/2023 7:43 PM, Marek Olšák wrote:
>> >> > >>>> On Wed, Jan 4, 2023 at 6:50 AM Lazar, Lijo <lijo.la...@amd.com
>> >> > >>>> <mailto:lijo.la...@amd.com>> wrote:
>> >> > >>>>
>> >> > >>>>
>> >> > >>>>
>> >> > >>>>      On 1/4/2023 4:11 AM, Marek Olšák wrote:
>> >> > >>>>       > I see. Well, those sysfs files are not usable, and I don't 
>> >> > >>>> think it
>> >> > >>>>       > would be important even if they were usable, but for 
>> >> > >>>> completeness:
>> >> > >>>>       >
>> >> > >>>>       > The ioctl returns:
>> >> > >>>>       >      pcie_gen = 1
>> >> > >>>>       >      pcie_num_lanes = 16
>> >> > >>>>       >
>> >> > >>>>       > Theoretical bandwidth from those values: 4.0 GB/s
>> >> > >>>>       > My DMA test shows this write bandwidth: 3.5 GB/s
>> >> > >>>>       > It matches the expectation.
>> >> > >>>>       >
>> >> > >>>>       > Let's see the devices (there is only 1 GPU Navi21 in the 
>> >> > >>>> system):
>> >> > >>>>       > $ lspci |egrep '(PCI|VGA).*Navi'
>> >> > >>>>       > 0a:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] 
>> >> > >>>> Navi
>> >> > >>>>      10 XL
>> >> > >>>>       > Upstream Port of PCI Express Switch (rev c3)
>> >> > >>>>       > 0b:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] 
>> >> > >>>> Navi
>> >> > >>>>      10 XL
>> >> > >>>>       > Downstream Port of PCI Express Switch
>> >> > >>>>       > 0c:00.0 VGA compatible controller: Advanced Micro Devices, 
>> >> > >>>> Inc.
>> >> > >>>>       > [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] (rev 
>> >> > >>>> c3)
>> >> > >>>>       >
>> >> > >>>>       > Let's read sysfs:
>> >> > >>>>       >
>> >> > >>>>       > $ cat /sys/bus/pci/devices/0000:0a:00.0/current_link_width
>> >> > >>>>       > 16
>> >> > >>>>       > $ cat /sys/bus/pci/devices/0000:0b:00.0/current_link_width
>> >> > >>>>       > 16
>> >> > >>>>       > $ cat /sys/bus/pci/devices/0000:0c:00.0/current_link_width
>> >> > >>>>       > 16
>> >> > >>>>       > $ cat /sys/bus/pci/devices/0000:0a:00.0/current_link_speed
>> >> > >>>>       > 2.5 GT/s PCIe
>> >> > >>>>       > $ cat /sys/bus/pci/devices/0000:0b:00.0/current_link_speed
>> >> > >>>>       > 16.0 GT/s PCIe
>> >> > >>>>       > $ cat /sys/bus/pci/devices/0000:0c:00.0/current_link_speed
>> >> > >>>>       > 16.0 GT/s PCIe
>> >> > >>>>       >
>> >> > >>>>       > Problem 1: None of the speed numbers match 4 GB/s.
>> >> > >>>>
>> >> > >>>>      US bridge = 2.5 GT/s means operating at PCIe Gen 1 speed. Total
>> >> > >>>>      theoretical bandwidth is then derived from the encoding and the
>> >> > >>>>      total number of lanes.
>> >> > >>>>
>> >> > >>>>       > Problem 2: Userspace doesn't know the bus index of the 
>> >> > >>>> bridges,
>> >> > >>>>      and it's
>> >> > >>>>       > not clear which bridge should be used.
>> >> > >>>>
>> >> > >>>>      In general, modern ones have this architecture: US -> DS -> EP.
>> >> > >>>>      The US is the one connected to the physical link.
>> >> > >>>>
>> >> > >>>>       > Problem 3: The PCIe gen number is missing.
>> >> > >>>>
>> >> > >>>>      Current link speed is based on whether it's Gen1/2/3/4/5.
>> >> > >>>>
>> >> > >>>>      BTW, your patch makes use of capabilities flags which gives 
>> >> > >>>> the maximum
>> >> > >>>>      supported speed/width by the device. It may not necessarily 
>> >> > >>>> reflect the
>> >> > >>>>      current speed/width negotiated. I guess in NV, this info is 
>> >> > >>>> already
>> >> > >>>>      obtained from PMFW and made available through metrics table.
>> >> > >>>>
>> >> > >>>>
>> >> > >>>> It computes the minimum of the device PCIe gen and the 
>> >> > >>>> motherboard/slot
>> >> > >>>> PCIe gen to get the final value. These 2 lines do that. The low 16 
>> >> > >>>> bits
>> >> > >>>> of the mask contain the device PCIe gen mask. The high 16 bits of 
>> >> > >>>> the
>> >> > >>>> mask contain the slot PCIe gen mask.
>> >> > >>>> + pcie_gen_mask = adev->pm.pcie_gen_mask & (adev->pm.pcie_gen_mask 
>> >> > >>>> >> 16);
>> >> > >>>> + dev_info->pcie_gen = fls(pcie_gen_mask);
>> >> > >>>>
>> >> > >>> With DPM in place on some ASICs, how much does this static info 
>> >> > >>> help upper-level apps?
>> >> > >>>
>> >> > >>>
>> >> > >>> It helps UMDs make better decisions if they know the maximum 
>> >> > >>> achievable bandwidth. UMDs also compute the maximum memory 
>> >> > >>> bandwidth and compute performance (FLOPS). Right now it's printed 
>> >> > >>> by Mesa to give users detailed information about their GPU. For 
>> >> > >>> example:
>> >> > >>>
>> >> > >>> $ AMD_DEBUG=info glxgears
>> >> > >>> Device info:
>> >> > >>>      name = NAVI21
>> >> > >>>      marketing_name = AMD Radeon RX 6800
>> >> > >>>      num_se = 3
>> >> > >>>      num_rb = 12
>> >> > >>>      num_cu = 60
>> >> > >>>      max_gpu_freq = 2475 MHz
>> >> > >>>      max_gflops = 19008 GFLOPS
>> >> > >>>      l0_cache_size = 16 KB
>> >> > >>>      l1_cache_size = 128 KB
>> >> > >>>      l2_cache_size = 4096 KB
>> >> > >>>      l3_cache_size = 128 MB
>> >> > >>>      memory_channels = 16 (TCC blocks)
>> >> > >>>      memory_size = 16 GB (16384 MB)
>> >> > >>>      memory_freq = 16 GHz
>> >> > >>>      memory_bus_width = 256 bits
>> >> > >>>      memory_bandwidth = 512 GB/s
>> >> > >>>      pcie_gen = 1
>> >> > >>>      pcie_num_lanes = 16
>> >> > >>>      pcie_bandwidth = 4.0 GB/s
>> >> > >>>
>> >> > >>> Marek
>> >> >
