Looks good to me. It might be good to add a comment above pcie_gen in amdgpu_drm.h to specify that this is the max common pcie gen supported by both the GPU and the slot. With that fixed, the patch is: Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>
On Fri, Jan 13, 2023 at 6:33 PM Marek Olšák <mar...@gmail.com> wrote: > > There is no hole on 32-bit unfortunately. It looks like the hole on 64-bit is > now ABI. > > I moved the field to replace _pad1. The patch is attached (with your Rb). > > Marek > > On Fri, Jan 13, 2023 at 4:20 PM Alex Deucher <alexdeuc...@gmail.com> wrote: >> >> On Fri, Jan 13, 2023 at 4:02 PM Marek Olšák <mar...@gmail.com> wrote: >> > >> > i've added the comments and indeed pahole shows the hole as expected. >> >> What about on 32-bit? >> >> Alex >> >> > >> > Marek >> > >> > On Thu, Jan 12, 2023 at 11:44 AM Alex Deucher <alexdeuc...@gmail.com> >> > wrote: >> >> >> >> On Thu, Jan 12, 2023 at 6:50 AM Christian König >> >> <christian.koe...@amd.com> wrote: >> >> > >> >> > Am 11.01.23 um 21:48 schrieb Alex Deucher: >> >> > > On Wed, Jan 4, 2023 at 3:17 PM Marek Olšák <mar...@gmail.com> wrote: >> >> > >> Yes, it's meant to be like a spec sheet. We are not interested in >> >> > >> the current bandwidth utilization. >> >> > > After chatting with Marek on IRC and thinking about this more, I think >> >> > > this patch is fine. It's not really meant for bandwidth per se, but >> >> > > rather as a limit to determine what the driver should do in certain >> >> > > cases (i.e., when does it make sense to copy to vram vs not). It's >> >> > > not straightforward for userspace to parse the full topology to >> >> > > determine what links may be slow. I guess one potential pitfall would >> >> > > be that if you pass the device into a VM, the driver may report the >> >> > > wrong values. Generally in a VM the VM doesn't get the full view up >> >> > > to the root port. I don't know if the hypervisors report properly for >> >> > > pcie_bandwidth_available() in a VM or if it just shows the info about >> >> > > the endpoint in the VM. >> >> > >> >> > So this basically doesn't return the gen and lanes of the device, but >> >> > rather what was negotiated between the device and the upstream root >> >> > port? >> >> >> >> Correct. It exposes the max gen and lanes of the slowest link between >> >> the device and the root port. >> >> >> >> > >> >> > If I got that correctly then we should probably document that cause >> >> > otherwise somebody will try to "fix" it at some time. >> >> >> >> Good point. >> >> >> >> Alex >> >> >> >> > >> >> > Christian. >> >> > >> >> > > >> >> > > Reviewed-by: Alex Deucher <alexander.deuc...@amd.com> >> >> > > >> >> > > Alex >> >> > > >> >> > >> Marek >> >> > >> >> >> > >> On Wed, Jan 4, 2023 at 10:33 AM Lazar, Lijo <lijo.la...@amd.com> >> >> > >> wrote: >> >> > >>> [AMD Official Use Only - General] >> >> > >>> >> >> > >>> >> >> > >>> To clarify, with DPM in place, the current bandwidth will be >> >> > >>> changing based on the load. >> >> > >>> >> >> > >>> If apps/umd already has a way to know the current bandwidth >> >> > >>> utilisation, then possible maximum also could be part of the same >> >> > >>> API. Otherwise, this only looks like duplicate information. We have >> >> > >>> the same information in sysfs DPM nodes. >> >> > >>> >> >> > >>> BTW, I don't know to what extent app/umd really makes use of this. >> >> > >>> Take that memory frequency as an example (I'm reading it as 16GHz). >> >> > >>> It only looks like a spec sheet. >> >> > >>> >> >> > >>> Thanks, >> >> > >>> Lijo >> >> > >>> ________________________________ >> >> > >>> From: Marek Olšák <mar...@gmail.com> >> >> > >>> Sent: Wednesday, January 4, 2023 8:40:00 PM >> >> > >>> To: Lazar, Lijo <lijo.la...@amd.com> >> >> > >>> Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org> >> >> > >>> Subject: Re: [PATCH 1/2] drm/amdgpu: return the PCIe gen and lanes >> >> > >>> from the INFO >> >> > >>> >> >> > >>> On Wed, Jan 4, 2023 at 9:19 AM Lazar, Lijo <lijo.la...@amd.com> >> >> > >>> wrote: >> >> > >>> >> >> > >>> >> >> > >>> >> >> > >>> On 1/4/2023 7:43 PM, Marek Olšák wrote: >> >> > >>>> On Wed, Jan 4, 2023 at 6:50 AM Lazar, Lijo <lijo.la...@amd.com >> >> > >>>> <mailto:lijo.la...@amd.com>> wrote: >> >> > >>>> >> >> > >>>> >> >> > >>>> >> >> > >>>> On 1/4/2023 4:11 AM, Marek Olšák wrote: >> >> > >>>> > I see. Well, those sysfs files are not usable, and I don't >> >> > >>>> think it >> >> > >>>> > would be important even if they were usable, but for >> >> > >>>> completeness: >> >> > >>>> > >> >> > >>>> > The ioctl returns: >> >> > >>>> > pcie_gen = 1 >> >> > >>>> > pcie_num_lanes = 16 >> >> > >>>> > >> >> > >>>> > Theoretical bandwidth from those values: 4.0 GB/s >> >> > >>>> > My DMA test shows this write bandwidth: 3.5 GB/s >> >> > >>>> > It matches the expectation. >> >> > >>>> > >> >> > >>>> > Let's see the devices (there is only 1 GPU Navi21 in the >> >> > >>>> system): >> >> > >>>> > $ lspci |egrep '(PCI|VGA).*Navi' >> >> > >>>> > 0a:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] >> >> > >>>> Navi >> >> > >>>> 10 XL >> >> > >>>> > Upstream Port of PCI Express Switch (rev c3) >> >> > >>>> > 0b:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] >> >> > >>>> Navi >> >> > >>>> 10 XL >> >> > >>>> > Downstream Port of PCI Express Switch >> >> > >>>> > 0c:00.0 VGA compatible controller: Advanced Micro Devices, >> >> > >>>> Inc. >> >> > >>>> > [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] (rev >> >> > >>>> c3) >> >> > >>>> > >> >> > >>>> > Let's read sysfs: >> >> > >>>> > >> >> > >>>> > $ cat /sys/bus/pci/devices/0000:0a:00.0/current_link_width >> >> > >>>> > 16 >> >> > >>>> > $ cat /sys/bus/pci/devices/0000:0b:00.0/current_link_width >> >> > >>>> > 16 >> >> > >>>> > $ cat /sys/bus/pci/devices/0000:0c:00.0/current_link_width >> >> > >>>> > 16 >> >> > >>>> > $ cat /sys/bus/pci/devices/0000:0a:00.0/current_link_speed >> >> > >>>> > 2.5 GT/s PCIe >> >> > >>>> > $ cat /sys/bus/pci/devices/0000:0b:00.0/current_link_speed >> >> > >>>> > 16.0 GT/s PCIe >> >> > >>>> > $ cat /sys/bus/pci/devices/0000:0c:00.0/current_link_speed >> >> > >>>> > 16.0 GT/s PCIe >> >> > >>>> > >> >> > >>>> > Problem 1: None of the speed numbers match 4 GB/s. >> >> > >>>> >> >> > >>>> US bridge = 2.5GT/s means operating at PCIe Gen 1 speed. Total >> >> > >>>> theoretical bandwidth is then derived based on encoding and >> >> > >>>> total >> >> > >>>> number >> >> > >>>> of lanes. >> >> > >>>> >> >> > >>>> > Problem 2: Userspace doesn't know the bus index of the >> >> > >>>> bridges, >> >> > >>>> and it's >> >> > >>>> > not clear which bridge should be used. >> >> > >>>> >> >> > >>>> In general, modern ones have this arch= US->DS->EP. US is the >> >> > >>>> one >> >> > >>>> connected to physical link. >> >> > >>>> >> >> > >>>> > Problem 3: The PCIe gen number is missing. >> >> > >>>> >> >> > >>>> Current link speed is based on whether it's Gen1/2/3/4/5. >> >> > >>>> >> >> > >>>> BTW, your patch makes use of capabilities flags which gives >> >> > >>>> the maximum >> >> > >>>> supported speed/width by the device. It may not necessarily >> >> > >>>> reflect the >> >> > >>>> current speed/width negotiated. I guess in NV, this info is >> >> > >>>> already >> >> > >>>> obtained from PMFW and made available through metrics table. >> >> > >>>> >> >> > >>>> >> >> > >>>> It computes the minimum of the device PCIe gen and the >> >> > >>>> motherboard/slot >> >> > >>>> PCIe gen to get the final value. These 2 lines do that. The low 16 >> >> > >>>> bits >> >> > >>>> of the mask contain the device PCIe gen mask. The high 16 bits of >> >> > >>>> the >> >> > >>>> mask contain the slot PCIe gen mask. >> >> > >>>> + pcie_gen_mask = adev->pm.pcie_gen_mask & (adev->pm.pcie_gen_mask >> >> > >>>> >> 16); >> >> > >>>> + dev_info->pcie_gen = fls(pcie_gen_mask); >> >> > >>>> >> >> > >>> With DPM in place on some ASICs, how much does this static info >> >> > >>> help for >> >> > >>> upper level apps? >> >> > >>> >> >> > >>> >> >> > >>> It helps UMDs make better decisions if they know the maximum >> >> > >>> achievable bandwidth. UMDs also compute the maximum memory >> >> > >>> bandwidth and compute performance (FLOPS). Right now it's printed >> >> > >>> by Mesa to give users detailed information about their GPU. For >> >> > >>> example: >> >> > >>> >> >> > >>> $ AMD_DEBUG=info glxgears >> >> > >>> Device info: >> >> > >>> name = NAVI21 >> >> > >>> marketing_name = AMD Radeon RX 6800 >> >> > >>> num_se = 3 >> >> > >>> num_rb = 12 >> >> > >>> num_cu = 60 >> >> > >>> max_gpu_freq = 2475 MHz >> >> > >>> max_gflops = 19008 GFLOPS >> >> > >>> l0_cache_size = 16 KB >> >> > >>> l1_cache_size = 128 KB >> >> > >>> l2_cache_size = 4096 KB >> >> > >>> l3_cache_size = 128 MB >> >> > >>> memory_channels = 16 (TCC blocks) >> >> > >>> memory_size = 16 GB (16384 MB) >> >> > >>> memory_freq = 16 GHz >> >> > >>> memory_bus_width = 256 bits >> >> > >>> memory_bandwidth = 512 GB/s >> >> > >>> pcie_gen = 1 >> >> > >>> pcie_num_lanes = 16 >> >> > >>> pcie_bandwidth = 4.0 GB/s >> >> > >>> >> >> > >>> Marek >> >> >