On Fri, 31 May 2024 12:36:35 +0200 Nam Cao <nam...@linutronix.de> wrote:

> On Fri, May 31, 2024 at 11:14:00AM +0100, Jonathan Cameron wrote:
> > On Wed, 29 May 2024 22:17:44 +0200
> > Nam Cao <nam...@linutronix.de> wrote:
> >
> > > Set link width to x1 and link speed to 2.5 Gb/s as specified by the
> > > datasheet. Without this, these fields in the link status register read
> > > zero, which is incorrect.
> > >
> > > This problem appeared since 3d67447fe7c2 ("pcie: Fill PCIESlot link fields
> > > to support higher speeds and widths"), which allows PCIe slot to set link
> > > width and link speed. However, if PCIe slot does not explicitly set these
> > > properties, they will be zero. Before this commit, the width and speed
> > > default to x1 and 2.5 Gb/s.
> > >
> > > Fixes: 3d67447fe7c2 ("pcie: Fill PCIESlot link fields to support higher
> > > speeds and widths")
> > > Signed-off-by: Nam Cao <nam...@linutronix.de>
> > Hi Nam,
> >
> > I'm feeling a bit guilty about this one as I've known it was there for a
> > while.
> >
> > I was lazy when fixing the equivalent CXL case a while back on the
> > basis no one had noticed and unlike CXL (where migration is broken for a lot
> > of reasons) fixing this may need to take into account migration from broken
> > to fixed versions. Have you tested that?
>

I've run into problems in the past around updating config space registers,
because when we migrate from a pre-patch QEMU instance to a post-patch one
the config space registers are compared. I'm not sure if LNKCAP is included
in that. LNKSTA is explicitly ruled out, I think.

For examples see all the machine version checks in hw/core/machine.c.
The one that bit me was fixed with x-pcie-err-unc-mask, when I was fixing
a register that didn't match the spec-defined values.

> I tested this patch with Linux kernel.
>
> I noticed this bug when Linux complained that the PCI link was broken.
> Linux determines whether a link is up by checking if these speed/width
> fields have valid value.
>
> Repro:
> qemu-system-x86_64 \
>     -machine pc-q35-2.10 \
>     -kernel bzImage \
>     -drive "file=img,format=raw" \
>     -m 2048 -smp 1 -enable-kvm \
>     -append "console=ttyS0 root=/dev/sda debug" \
>     -nographic \
>     -device pcie-root-port,bus=pcie.0,slot=1,id=rp1,bus-reserve=253 \
>     -device x3130-upstream,id=up1,bus=rp1 \
>     -device xio3130-downstream,id=dp1,bus=up1,chassis=1,slot=1
>
> Then after Linux has booted:
>     device_add e1000,bus=dp1,id=eth0
>
> Then Linux complains that something is wrong with the link:
>     pcieport 0000:02:00.0: pciehp: Slot(1-1): Cannot train link: status 0x2000
>
> This patch gets rid of Linux's complaint, and the hot-plug now works fine.
>
> > I did the CXL fix slightly differently. Can't remember why though - looking
> > at the fact it uses an instance_post_init, is there an issue with
> > accidentally overwriting the parameters? Or did I just over engineer the
> > fix?
>
> I would say over engineer. I think CXL does not take link speed and link
> width as parameters.

I've implemented control, but this still ends up over-engineered because
the reason I want to control this is to vary access parameters for
calculating latency and bandwidth. That is easiest done by controlling
the EP status to degrade the link. For that I just set the CAP register
on the switch DSP to allow suitably high values and let pcie_sync_bridge()
match this to the status of the EP (which I have properties to control).
There seems to be only one-way 'negotiation' of these parameters, so it
needs to be EP driven.

Jonathan

>
> Best regards,
> Nam