On Fri, 31 May 2024 12:36:35 +0200
Nam Cao <nam...@linutronix.de> wrote:

> On Fri, May 31, 2024 at 11:14:00AM +0100, Jonathan Cameron wrote:
> > On Wed, 29 May 2024 22:17:44 +0200
> > Nam Cao <nam...@linutronix.de> wrote:
> >   
> > > Set link width to x1 and link speed to 2.5 Gb/s as specified by the
> > > datasheet. Without this, these fields in the link status register read
> > > zero, which is incorrect.
> > > 
> > > This problem appeared since 3d67447fe7c2 ("pcie: Fill PCIESlot link fields
> > > to support higher speeds and widths"), which allows PCIe slot to set link
> > > width and link speed. However, if PCIe slot does not explicitly set these
> > > properties, they will be zero. Before this commit, the width and speed
> > > default to x1 and 2.5 Gb/s.
> > > 
> > > Fixes: 3d67447fe7c2 ("pcie: Fill PCIESlot link fields to support higher speeds and widths")
> > > Signed-off-by: Nam Cao <nam...@linutronix.de>  
> > Hi Nam,
> > 
> > I'm feeling a bit guilty about this one as I've known it was there for a while.
> > 
> > I was lazy when fixing the equivalent CXL case a while back on the
> > basis no one had noticed and, unlike CXL (where migration is broken for a lot
> > of reasons), fixing this may need to take into account migration from broken
> > to fixed versions.  Have you tested that?
> 

I've run into problems in the past around updating config space registers
because when we migrate from a pre-patch QEMU instance to a post-patch one, the
config space registers are compared. I'm not sure if LNKCAP is included
in that.  LNKSTA is explicitly ruled out, I think.

For examples, see all the machine version checks in
hw/core/machine.c.

The one that bit me was fixed with x-pcie-err-unc-mask,
when I was fixing a register that didn't match the spec-defined values.


> I tested this patch with Linux kernel.
> 
> I noticed this bug when Linux complained that the PCI link was broken.
> Linux determines whether a link is up by checking if these speed/width
> fields have valid values.
> 
> Repro:
>       qemu-system-x86_64 \
>       -machine pc-q35-2.10 \
>       -kernel bzImage \
>       -drive "file=img,format=raw" \
>       -m 2048 -smp 1 -enable-kvm \
>       -append "console=ttyS0 root=/dev/sda debug" \
>       -nographic \
>       -device pcie-root-port,bus=pcie.0,slot=1,id=rp1,bus-reserve=253 \
>       -device x3130-upstream,id=up1,bus=rp1 \
>       -device xio3130-downstream,id=dp1,bus=up1,chassis=1,slot=1
> 
> Then after Linux has booted:
>       device_add e1000,bus=dp1,id=eth0
> 
> Then Linux complains that something is wrong with the link:
> pcieport 0000:02:00.0: pciehp: Slot(1-1): Cannot train link: status 0x2000
>  
> This patch gets rid of Linux's complaint, and the hot-plug now works fine.
> 
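
For reference, a minimal sketch of how that 0x2000 status value decodes,
assuming the standard PCIe Link Status layout (Current Link Speed in bits 3:0,
Negotiated Link Width in bits 9:4, Data Link Layer Link Active in bit 13).
The names lnksta_decode() and lnksta_looks_trained() are made up for
illustration, and the check is only an approximation of what a guest might
do, not the actual Linux code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Link Status register fields (PCIe base specification layout). */
#define LNKSTA_SPEED_MASK   0x000fu  /* bits 3:0, Current Link Speed */
#define LNKSTA_WIDTH_MASK   0x03f0u  /* bits 9:4, Negotiated Link Width */
#define LNKSTA_WIDTH_SHIFT  4
#define LNKSTA_DLLLA        0x2000u  /* bit 13, Data Link Layer Link Active */

/* Hypothetical helper type, not a QEMU or Linux structure. */
struct lnksta_fields {
    unsigned speed;  /* encoded speed, 1 means 2.5 GT/s */
    unsigned width;  /* lane count, 1 means x1 */
    bool dllla;      /* data link layer reports the link as active */
};

/* Split a raw 16-bit Link Status value into its fields. */
static struct lnksta_fields lnksta_decode(uint16_t lnksta)
{
    struct lnksta_fields f;

    f.speed = lnksta & LNKSTA_SPEED_MASK;
    f.width = (lnksta & LNKSTA_WIDTH_MASK) >> LNKSTA_WIDTH_SHIFT;
    f.dllla = (lnksta & LNKSTA_DLLLA) != 0;
    return f;
}

/* Assumed validity rule: zero speed or zero width is not a trained link. */
static bool lnksta_looks_trained(uint16_t lnksta)
{
    struct lnksta_fields f = lnksta_decode(lnksta);

    return f.speed != 0 && f.width != 0;
}
```

With 0x2000 this gives speed 0 and width 0 with the link-active bit set,
which matches the report above that the fields read zero. A value such as
0x2011 would decode as 2.5 GT/s, x1.
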
> > I did the CXL fix slightly differently.  Can't remember why though - looking
> > at the fact it uses an instance_post_init, is there an issue with
> > accidentally overwriting the parameters?  Or did I just over engineer the fix?
> 
> I would say over engineer. I think CXL does not take link speed and link
> width as parameters.

I've implemented control, but this still ends up over engineered because
the reason I want to control this is to vary access parameters for calculating
latency and bandwidth.  That is easiest done by controlling the EP status
to degrade the link.  For that I just set the CAP register on the switch DSP
to allow suitably high values and let pcie_sync_bridge() match this to
the status of the EP (which I have properties to control).
There seems to be only one-way 'negotiation' of these parameters, so it
needs to be EP driven.
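
A rough model of the EP-driven behaviour described above, as a sketch only.
It assumes the port's status simply follows the endpoint's status, clamped
to what the port's capability allows. struct link_params and
port_status_from_ep() are hypothetical names, and this is not the
implementation of pcie_sync_bridge():

```c
#include <stdint.h>

/* Hypothetical container for the two link parameters of interest. */
struct link_params {
    uint8_t speed;  /* encoded speed, 1 means 2.5 GT/s, larger is faster */
    uint8_t width;  /* lane count: 1, 2, 4, ... */
};

/*
 * Assumed rule: the port reports the endpoint's speed and width, but never
 * more than its own capability. The endpoint is the only side that can
 * lower the result, so degrading the link means changing the EP status.
 */
static struct link_params port_status_from_ep(struct link_params port_cap,
                                              struct link_params ep_status)
{
    struct link_params sta = ep_status;

    if (sta.speed > port_cap.speed) {
        sta.speed = port_cap.speed;
    }
    if (sta.width > port_cap.width) {
        sta.width = port_cap.width;
    }
    return sta;
}
```

Under this model, a port capability that is set suitably high never limits
the result, so the reported link follows whatever the EP properties say.
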

Jonathan
> 
> Best regards,
> Nam

