Re: [PATCH v3] pci: Work around PCIe link training failures

2022-01-17 Thread Maciej W. Rozycki
On Sat, 15 Jan 2022, Tom Rini wrote:

> > Keep the 2.5GT/s speed restriction then, conservatively, if functional 
> > once applied.
> > 
> > Signed-off-by: Maciej W. Rozycki 
> > Reviewed-by: Stefan Roese 
> 
> Applied to u-boot/master, thanks!

 Great, thank you all for input and reviews!

  Maciej


Re: [PATCH v3] pci: Work around PCIe link training failures

2022-01-15 Thread Tom Rini
On Sat, Nov 20, 2021 at 11:03:30PM +, Maciej W. Rozycki wrote:

> Attempt to handle cases with a downstream port of a PCIe switch where
> link training never completes and the link continues switching between 
> speeds indefinitely with the data link layer never reaching the active 
> state.
> 
> It has been observed with a downstream port of the ASMedia ASM2824 Gen 3 
> switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 
> switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, 
> P/N 41433, wired to a SiFive HiFive Unmatched board.  In this setup the 
> switches are supposed to negotiate the link speed of preferably 5.0GT/s, 
> falling back to 2.5GT/s.
> 
> However the link continues oscillating between the two speeds, at the 
> rate of 34-35 times per second, with link training reported repeatedly 
> active ~84% of the time, e.g.:
> 
> 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> [...]
>     Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
> [...]
>     Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> [...]
>         LnkSta: Speed 5GT/s (downgraded), Width x1 (ok)
>             TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt-
> [...]
>         LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
>             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>             Compliance De-emphasis: -6dB
> [...]
> 
> Forcibly limiting the target link speed to 2.5GT/s with the upstream 
> ASM2824 device makes the two switches communicate correctly however:
> 
> 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> [...]
>     Bus: primary=02, secondary=05, subordinate=09, sec-latency=0
> [...]
>     Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> [...]
>         LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
>             TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
> [...]
>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
>             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>             Compliance De-emphasis: -6dB
> [...]
> 
> and then:
> 
> 05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode])
> [...]
>     Bus: primary=05, secondary=06, subordinate=09, sec-latency=0
> [...]
>     Capabilities: [c0] Express (v2) Upstream Port, MSI 00
> [...]
>         LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> [...]
>         LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
>             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>             Compliance De-emphasis: -6dB
> [...]
> 
> Make use of this observation then and attempt to detect the inability to 
> negotiate the link speed automatically, and then handle it by hand.  Use 
> the Data Link Layer Link Active status flag as the primary indicator of 
> successful link speed negotiation, but given that the flag is optional 
> by hardware to implement (the ASM2824 does have it though), resort to 
> checking for the mandatory Link Bandwidth Management Status flag showing 
> that the link speed or width has been changed in an attempt to correct 
> unreliable link operation (the ASM2824 does set it too).
> 
> If these checks indicate that link may not operate correctly, then poll 
> the Data Link Layer Link Active status flag along with the Link Training 
> flag for the duration of 200ms to see if the link has stabilised, that 
> is either that the Data Link Layer Link Active status flag has been set 
> or that Link Training has been inactive during at least the second half 
> of the interval.
> 
> If that has indicated failure, restrict the target speed to 2.5GT/s, 
> request a link retrain and check again if the link has stabilised.  If 
> that does not work either, then restore the original speed setting and 
> claim defeat, otherwise we are done.
> 
> NB interestingly enough with the ASM2824 vs PI7C9X2G304 configuration 
> referred above asking the ASM2824 to retrain with a higher target link 
> speed once the 2.5GT/s speed has been negotiated makes the two devices 
> successfully negotiate 5.0GT/s.  Lifting the 2.5GT/s speed restriction 
> would however prevent our workaround from working with an OS that issues 
> a reset and that is unaware of the problem.  This is because the devices 
> would then try to negotiate a higher link speed from scratch and 

Re: [PATCH v3] pci: Work around PCIe link training failures

2022-01-13 Thread Stefan Roese

On 11/21/21 00:03, Maciej W. Rozycki wrote:

Attempt to handle cases with a downstream port of a PCIe switch where
link training never completes and the link continues switching between
speeds indefinitely with the data link layer never reaching the active
state.

It has been observed with a downstream port of the ASMedia ASM2824 Gen 3
switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2
switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device,
P/N 41433, wired to a SiFive HiFive Unmatched board.  In this setup the
switches are supposed to negotiate the link speed of preferably 5.0GT/s,
falling back to 2.5GT/s.

However the link continues oscillating between the two speeds, at the
rate of 34-35 times per second, with link training reported repeatedly
active ~84% of the time, e.g.:

02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
[...]
    Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
[...]
    Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
[...]
        LnkSta: Speed 5GT/s (downgraded), Width x1 (ok)
            TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt-
[...]
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
[...]

Forcibly limiting the target link speed to 2.5GT/s with the upstream
ASM2824 device makes the two switches communicate correctly however:

02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
[...]
    Bus: primary=02, secondary=05, subordinate=09, sec-latency=0
[...]
    Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
[...]
        LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
            TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
[...]
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
[...]

and then:

05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode])
[...]
    Bus: primary=05, secondary=06, subordinate=09, sec-latency=0
[...]
    Capabilities: [c0] Express (v2) Upstream Port, MSI 00
[...]
        LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
[...]
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
[...]

Make use of this observation then and attempt to detect the inability to
negotiate the link speed automatically, and then handle it by hand.  Use
the Data Link Layer Link Active status flag as the primary indicator of
successful link speed negotiation, but given that the flag is optional
by hardware to implement (the ASM2824 does have it though), resort to
checking for the mandatory Link Bandwidth Management Status flag showing
that the link speed or width has been changed in an attempt to correct
unreliable link operation (the ASM2824 does set it too).

If these checks indicate that link may not operate correctly, then poll
the Data Link Layer Link Active status flag along with the Link Training
flag for the duration of 200ms to see if the link has stabilised, that
is either that the Data Link Layer Link Active status flag has been set
or that Link Training has been inactive during at least the second half
of the interval.

If that has indicated failure, restrict the target speed to 2.5GT/s,
request a link retrain and check again if the link has stabilised.  If
that does not work either, then restore the original speed setting and
claim defeat, otherwise we are done.

NB interestingly enough with the ASM2824 vs PI7C9X2G304 configuration
referred above asking the ASM2824 to retrain with a higher target link
speed once the 2.5GT/s speed has been negotiated makes the two devices
successfully negotiate 5.0GT/s.  Lifting the 2.5GT/s speed restriction
would however prevent our workaround from working with an OS that issues
a reset and that is unaware of the problem.  This is because the devices
would then try to negotiate a higher link speed from scratch and fail,
while the sticky property of the Target Link Speed setting will keep the
2.5GT/s speed restriction across a reset.

Keep the 2.5GT/s speed restriction then, conservatively, if functional
once applied.

S

Re: [PATCH v3] pci: Work around PCIe link training failures

2022-01-12 Thread Maciej W. Rozycki
On Wed, 12 Jan 2022, Tom Rini wrote:

> >  I believe this version has addressed all concerns raised in the review 
> > thus far.  With the nature of the problem better understood now I'm sending 
> > a corresponding update for Linux as well.
> 
> What was the feedback to your Linux change?  Is this essentially the path
> forward still?  Thanks!

 There has been no response so far, perhaps due to unfavourable timing, 
the festive season, etc.  I have rebased my original change and posted a 
regenerated version a bit more than a week ago:



and a few of my other resubmissions in the PCI area were partially 
reviewed last week.  So things have been progressing, but with Linux 5.16 
released last Sunday we're in the merge window for 5.17 now, so people are 
surely busy with that.  We shall see.

 NB as I previously noted the Linux change has to be different, because 
you cannot busy-loop polling a bit in a device register in an OS, unlike 
in firmware, and therefore the Linux version has to rely on the data link 
layer active reporting capability, which may not be there in PCIe 2.0 
devices.  Conceptually the two changes remain similar though, so I guess 
input from the Linux side will still be valuable.

 Thank you for your attention to my proposal.

  Maciej


Re: [PATCH v3] pci: Work around PCIe link training failures

2022-01-12 Thread Tom Rini
On Sat, Nov 20, 2021 at 11:03:30PM +, Maciej W. Rozycki wrote:

> Attempt to handle cases with a downstream port of a PCIe switch where
> link training never completes and the link continues switching between 
> speeds indefinitely with the data link layer never reaching the active 
> state.
> 
> It has been observed with a downstream port of the ASMedia ASM2824 Gen 3 
> switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 
> switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, 
> P/N 41433, wired to a SiFive HiFive Unmatched board.  In this setup the 
> switches are supposed to negotiate the link speed of preferably 5.0GT/s, 
> falling back to 2.5GT/s.
> 
> However the link continues oscillating between the two speeds, at the 
> rate of 34-35 times per second, with link training reported repeatedly 
> active ~84% of the time, e.g.:
> 
> 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> [...]
>     Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
> [...]
>     Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> [...]
>         LnkSta: Speed 5GT/s (downgraded), Width x1 (ok)
>             TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt-
> [...]
>         LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
>             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>             Compliance De-emphasis: -6dB
> [...]
> 
> Forcibly limiting the target link speed to 2.5GT/s with the upstream 
> ASM2824 device makes the two switches communicate correctly however:
> 
> 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> [...]
>     Bus: primary=02, secondary=05, subordinate=09, sec-latency=0
> [...]
>     Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> [...]
>         LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
>             TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
> [...]
>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
>             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>             Compliance De-emphasis: -6dB
> [...]
> 
> and then:
> 
> 05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode])
> [...]
>     Bus: primary=05, secondary=06, subordinate=09, sec-latency=0
> [...]
>     Capabilities: [c0] Express (v2) Upstream Port, MSI 00
> [...]
>         LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> [...]
>         LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
>             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>             Compliance De-emphasis: -6dB
> [...]
> 
> Make use of this observation then and attempt to detect the inability to 
> negotiate the link speed automatically, and then handle it by hand.  Use 
> the Data Link Layer Link Active status flag as the primary indicator of 
> successful link speed negotiation, but given that the flag is optional 
> by hardware to implement (the ASM2824 does have it though), resort to 
> checking for the mandatory Link Bandwidth Management Status flag showing 
> that the link speed or width has been changed in an attempt to correct 
> unreliable link operation (the ASM2824 does set it too).
> 
> If these checks indicate that link may not operate correctly, then poll 
> the Data Link Layer Link Active status flag along with the Link Training 
> flag for the duration of 200ms to see if the link has stabilised, that 
> is either that the Data Link Layer Link Active status flag has been set 
> or that Link Training has been inactive during at least the second half 
> of the interval.
> 
> If that has indicated failure, restrict the target speed to 2.5GT/s, 
> request a link retrain and check again if the link has stabilised.  If 
> that does not work either, then restore the original speed setting and 
> claim defeat, otherwise we are done.
> 
> NB interestingly enough with the ASM2824 vs PI7C9X2G304 configuration 
> referred above asking the ASM2824 to retrain with a higher target link 
> speed once the 2.5GT/s speed has been negotiated makes the two devices 
> successfully negotiate 5.0GT/s.  Lifting the 2.5GT/s speed restriction 
> would however prevent our workaround from working with an OS that issues 
> a reset and that is unaware of the problem.  This is because the devices 
> would then try to negotiate a higher link speed from scratch and 

[PING][PATCH v3] pci: Work around PCIe link training failures

2022-01-02 Thread Maciej W. Rozycki
On Sat, 20 Nov 2021, Maciej W. Rozycki wrote:

> Attempt to handle cases with a downstream port of a PCIe switch where
> link training never completes and the link continues switching between 
> speeds indefinitely with the data link layer never reaching the active 
> state.

 Ping for: 
.

  Maciej


[PATCH v3] pci: Work around PCIe link training failures

2021-11-20 Thread Maciej W. Rozycki
Attempt to handle cases with a downstream port of a PCIe switch where
link training never completes and the link continues switching between 
speeds indefinitely with the data link layer never reaching the active 
state.

It has been observed with a downstream port of the ASMedia ASM2824 Gen 3 
switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 
switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, 
P/N 41433, wired to a SiFive HiFive Unmatched board.  In this setup the 
switches are supposed to negotiate the link speed of preferably 5.0GT/s, 
falling back to 2.5GT/s.

However the link continues oscillating between the two speeds, at the 
rate of 34-35 times per second, with link training reported repeatedly 
active ~84% of the time, e.g.:

02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
[...]
    Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
[...]
    Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
[...]
        LnkSta: Speed 5GT/s (downgraded), Width x1 (ok)
            TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt-
[...]
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
[...]

Forcibly limiting the target link speed to 2.5GT/s with the upstream 
ASM2824 device makes the two switches communicate correctly however:

02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
[...]
    Bus: primary=02, secondary=05, subordinate=09, sec-latency=0
[...]
    Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
[...]
        LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
            TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
[...]
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
[...]

and then:

05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode])
[...]
    Bus: primary=05, secondary=06, subordinate=09, sec-latency=0
[...]
    Capabilities: [c0] Express (v2) Upstream Port, MSI 00
[...]
        LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
[...]
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
[...]

Make use of this observation then and attempt to detect the inability to 
negotiate the link speed automatically, and then handle it by hand.  Use 
the Data Link Layer Link Active status flag as the primary indicator of 
successful link speed negotiation, but given that the flag is optional 
for hardware to implement (the ASM2824 does have it though), fall back to 
checking for the mandatory Link Bandwidth Management Status flag, which 
shows that the link speed or width has been changed in an attempt to 
correct unreliable link operation (the ASM2824 does set it too).
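
As a minimal sketch of that check (not the code from the patch itself), 
the helper below treats a link as suspect when Data Link Layer Link 
Active is clear while Link Bandwidth Management Status is set.  The bit 
positions are those of the PCIe Link Status register; the helper name 
link_looks_suspect() and the exact combination of flags tested are 
assumptions made for illustration.  The sample values 0x5812 and 0x3011 
are Link Status words reassembled by hand from the two ASM2824 lspci 
dumps above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link Status register bits, per the PCIe specification. */
#define LNKSTA_DLLLA 0x2000 /* Data Link Layer Link Active */
#define LNKSTA_LBMS  0x4000 /* Link Bandwidth Management Status */

/*
 * Hypothetical helper: decide whether a downstream port's link needs a
 * closer look.  Assumption: "suspect" means the data link layer is not
 * active while the port reports that speed or width has been changed.
 */
static bool link_looks_suspect(uint16_t lnksta)
{
	/* DLLLA set: the data link layer is up, nothing to do. */
	if (lnksta & LNKSTA_DLLLA)
		return false;

	/* Otherwise fall back to the mandatory LBMS flag. */
	return (lnksta & LNKSTA_LBMS) != 0;
}
```

With the values above, link_looks_suspect(0x5812) is true (Train+, 
DLActive-, BWMgmt+) and link_looks_suspect(0x3011) is false (DLActive+).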

If these checks indicate that the link may not operate correctly, then 
poll the Data Link Layer Link Active status flag along with the Link 
Training flag for 200ms to see if the link has stabilised, that is, 
either the Data Link Layer Link Active status flag has been set or Link 
Training has been inactive during at least the second half of the 
interval.
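
The polling rule can be sketched as follows.  This is an illustration 
under stated assumptions rather than the patch's implementation: the 
register accessor is replaced by a hypothetical callback that returns 
the Link Status word for a given millisecond, one sample is taken per 
millisecond, and the three simulated ports are invented for the example. 
flapping_port() loosely mimics the observed behaviour (a cycle of about 
29ms with Link Training set for roughly 24ms of it).

```c
#include <stdbool.h>
#include <stdint.h>

/* Link Status register bits, per the PCIe specification. */
#define LNKSTA_LT    0x0800 /* Link Training */
#define LNKSTA_DLLLA 0x2000 /* Data Link Layer Link Active */

#define POLL_MS 200 /* observation window from the description */

/* Hypothetical accessor: Link Status as seen at millisecond `ms`. */
typedef uint16_t (*lnksta_reader)(unsigned int ms);

static bool link_stabilised(lnksta_reader read_lnksta)
{
	bool lt_in_second_half = false;

	for (unsigned int ms = 0; ms < POLL_MS; ms++) {
		uint16_t lnksta = read_lnksta(ms);

		/* Data link layer active: the link is up. */
		if (lnksta & LNKSTA_DLLLA)
			return true;

		/* Remember any training seen in the second half. */
		if ((lnksta & LNKSTA_LT) && ms >= POLL_MS / 2)
			lt_in_second_half = true;

		/* Real firmware would delay for about 1ms here. */
	}

	/* No DLLLA: accept only if training has gone quiet. */
	return !lt_in_second_half;
}

/* Simulated ports, invented for this example. */
static uint16_t flapping_port(unsigned int ms)
{
	return (ms % 29) < 24 ? LNKSTA_LT : 0;
}

static uint16_t settling_port(unsigned int ms)
{
	return ms < 50 ? LNKSTA_LT : 0; /* quiet after 50ms, no DLLLA */
}

static uint16_t active_port(unsigned int ms)
{
	return ms >= 10 ? LNKSTA_DLLLA : LNKSTA_LT;
}
```

Here link_stabilised(flapping_port) returns false, while both 
link_stabilised(settling_port) and link_stabilised(active_port) return 
true.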

If that has indicated failure, restrict the target speed to 2.5GT/s, 
request a link retrain and check again if the link has stabilised.  If 
that does not work either, then restore the original speed setting and 
claim defeat, otherwise we are done.
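
The overall sequence can be sketched against a simulated port.  This is 
only an illustration of the steps just described, not the patch's code: 
struct sim_port, its fields and all function names are hypothetical, 
and the model simply assumes that the link trains whenever the Target 
Link Speed code does not exceed max_good_tls.  The Target Link Speed 
field layout (low four bits of Link Control 2, with 1 meaning 2.5GT/s) 
follows the PCIe specification.

```c
#include <stdbool.h>
#include <stdint.h>

#define LNKCTL2_TLS       0x000f /* Target Link Speed field */
#define LNKCTL2_TLS_2_5GT 0x0001 /* 2.5GT/s */

enum fixup_result { LINK_OK, LINK_FIXED, LINK_FAILED };

/* Simulated downstream port; every field is made up for illustration. */
struct sim_port {
	uint16_t lnkctl2;          /* Link Control 2 register contents */
	unsigned int max_good_tls; /* highest TLS code the model can train at */
	unsigned int retrains;     /* number of retrain requests issued */
};

/* Stand-in for the stability check on real hardware. */
static bool sim_link_stable(const struct sim_port *p)
{
	return (p->lnkctl2 & LNKCTL2_TLS) <= p->max_good_tls;
}

/* Real code would set the Retrain Link bit in Link Control instead. */
static void sim_retrain(struct sim_port *p)
{
	p->retrains++;
}

static enum fixup_result fixup_link(struct sim_port *p)
{
	uint16_t saved = p->lnkctl2;

	if (sim_link_stable(p))
		return LINK_OK;

	/* Restrict the target speed to 2.5GT/s, keeping the other bits. */
	p->lnkctl2 = (uint16_t)((saved & ~LNKCTL2_TLS) | LNKCTL2_TLS_2_5GT);
	sim_retrain(p);
	if (sim_link_stable(p))
		return LINK_FIXED; /* the restriction stays in place */

	/* Give up: restore the original speed setting. */
	p->lnkctl2 = saved;
	return LINK_FAILED;
}
```

Starting from lnkctl2 = 0x0023 (Target Link Speed 8GT/s, SpeedDis+) and 
max_good_tls = 1, fixup_link() returns LINK_FIXED and leaves lnkctl2 at 
0x0021; with max_good_tls = 0 it returns LINK_FAILED and restores 
0x0023.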

NB interestingly enough, with the ASM2824 vs PI7C9X2G304 configuration 
referred to above, asking the ASM2824 to retrain with a higher target 
link speed once the 2.5GT/s speed has been negotiated makes the two 
devices successfully negotiate 5.0GT/s.  Lifting the 2.5GT/s speed 
restriction would however prevent our workaround from working with an OS 
that issues a reset and that is unaware of the problem.  This is because 
the devices would then try to negotiate a higher link speed from scratch 
and fail, while the sticky property of the Target Link Speed setting will 
keep the 2.5GT/s speed restriction across a reset.

Keep the 2.5GT/s speed restriction then, conservatively, if functional 
once applied.

Signed-off-by