On Thu, Apr 06, 2023 at 01:21:31AM +0100, Maciej W. Rozycki wrote:
> Attempt to handle cases such as with a downstream port of the ASMedia 
> ASM2824 PCIe switch where link training never completes and the link 
> continues switching between speeds indefinitely with the data link layer 
> never reaching the active state.

We're going to land this series this cycle, come hell or high water.

We talked about reusing pcie_retrain_link() earlier.  IIRC that didn't
work: ASPM needs to use PCI_EXP_LNKSTA_LT because not all devices
support PCI_EXP_LNKSTA_DLLLA, and you need PCI_EXP_LNKSTA_DLLLA
because the erratum makes PCI_EXP_LNKSTA_LT flap.

What if we made pcie_retrain_link() reusable by making it:

  bool pcie_retrain_link(struct pci_dev *pdev, u16 link_status_bit)

so ASPM could use pcie_retrain_link(link->pdev, PCI_EXP_LNKSTA_LT) and
you could use pcie_retrain_link(dev, PCI_EXP_LNKSTA_DLLLA)?

Maybe do it in two steps?

  1) Move pcie_retrain_link() just after pcie_wait_for_link() and make
  it take link->pdev instead of link.

  2) Add the bit parameter.
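
To illustrate why the bit has to be a parameter, here is a rough
userspace sketch (not kernel code).  struct mock_port,
mock_read_lnksta() and mock_retrain_link() are made up for
illustration; only the two PCI_EXP_LNKSTA_* values come from
pci_regs.h.  It replays a canned sequence of Link Status reads and
waits for the chosen bit: LT clear or DLLLA set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Link Status bits, values as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_LNKSTA_LT	0x0800	/* Link Training */
#define PCI_EXP_LNKSTA_DLLLA	0x2000	/* Data Link Layer Link Active */

/* Hypothetical stand-in for a port: canned LNKSTA values, one per poll */
struct mock_port {
	const uint16_t *lnksta;	/* successive Link Status reads */
	int nreads;		/* number of canned values */
	int pos;		/* next value to return */
};

static uint16_t mock_read_lnksta(struct mock_port *p)
{
	/* Keep returning the last value once the canned list runs out */
	if (p->pos < p->nreads - 1)
		return p->lnksta[p->pos++];
	return p->lnksta[p->nreads - 1];
}

/*
 * Poll until the chosen bit says the link is ready, or max_polls expires.
 * LT means "ready" when clear; DLLLA means "ready" when set.
 */
static bool mock_retrain_link(struct mock_port *p, uint16_t link_status_bit,
			      int max_polls)
{
	bool want_set = (link_status_bit == PCI_EXP_LNKSTA_DLLLA);
	int i;

	assert(p && p->nreads > 0);

	for (i = 0; i < max_polls; i++) {
		uint16_t lnksta = mock_read_lnksta(p);

		if (!!(lnksta & link_status_bit) == want_set)
			return true;
	}
	return false;
}
```

With a flapping sequence like { LT, 0, LT, 0 }, waiting on
PCI_EXP_LNKSTA_LT reports success on the second read even though DLLLA
never comes up, while waiting on PCI_EXP_LNKSTA_DLLLA times out.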

I'm OK with having pcie_retrain_link() in pci.c, but the surrounding
logic about restricting to 2.5GT/s, retraining, removing the
restriction, retraining again is stuff I'd rather have in quirks.c so
it doesn't clutter pci.c.

I think it'd be good if the pci_device_add() path made clear that this
is a workaround for a problem, e.g.,

  void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
  {
    ...
    if (pcie_link_failed(dev))
      pcie_fix_link_train(dev);

where pcie_fix_link_train() could live in quirks.c (with a stub when
CONFIG_PCI_QUIRKS isn't enabled).  It *might* even be worth adding it
and the stub first because that's a trivial patch and wouldn't clutter
the probe.c git history with all the grotty details about ASM2824 and
this topology.
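
A userspace sketch of the stub arrangement, purely for illustration:
struct mock_dev and mock_device_add() are invented stand-ins, and
CONFIG_PCI_QUIRKS is defined by hand here rather than by Kconfig.

```c
#include <assert.h>
#include <stdbool.h>

#define CONFIG_PCI_QUIRKS 1	/* delete this line to build the stub instead */

/* Hypothetical stand-in for struct pci_dev, just enough for the sketch */
struct mock_dev {
	bool link_failed;	/* what pcie_link_failed() would report */
	int fix_attempts;	/* how often the workaround ran */
};

#ifdef CONFIG_PCI_QUIRKS
/* Would live in quirks.c together with the ASM2824 details */
static void pcie_fix_link_train(struct mock_dev *dev)
{
	dev->fix_attempts++;
}
#else
/* Stub: does nothing when quirks are not built in */
static inline void pcie_fix_link_train(struct mock_dev *dev)
{
	(void)dev;
}
#endif

/* Mirrors the shape of the proposed pci_device_add() hunk */
static void mock_device_add(struct mock_dev *dev)
{
	assert(dev);

	if (dev->link_failed)
		pcie_fix_link_train(dev);
}
```

The caller looks the same either way; only the body behind
pcie_fix_link_train() changes with the config option.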

> +int pcie_downstream_link_retrain(struct pci_dev *dev)
> +{
> +     static const struct pci_device_id ids[] = {
> +             { PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
> +             {}
> +     };
> +     u16 lnksta, lnkctl2;
> +
> +     if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
> +         !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
> +             return -1;
> +
> +     pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
> +     pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
> +     if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
> +         PCI_EXP_LNKSTA_LBMS) {

You go to some trouble to make sure PCI_EXP_LNKSTA_LBMS is set, and I
can't remember what the reason is.  If you make a preparatory patch
like this, it would give a place for that background, e.g.,

  +bool pcie_link_failed(struct pci_dev *dev)
  +{
  +       u16 lnksta;
  +
  +       if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
  +           !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
  +               return false;
  +
  +       pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
  +       if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
  +                       PCI_EXP_LNKSTA_LBMS)
  +               return true;
  +
  +       return false;
  +}

If this is a generic thing and checking PCI_EXP_LNKSTA_LBMS makes
sense for everybody, it could go in pci.c; otherwise it could go in
quirks.c as well.  I guess it's not *truly* generic anyway because it
only detects link training failures for devices that have LNKCTL2 and
link_active_reporting.
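
For reference, a small userspace sketch of what that mask test accepts
and rejects.  lnksta_indicates_failure() and lnksta_self_check() are
names made up here; the two bit values are the ones in pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Link Status bits, values as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_LNKSTA_DLLLA	0x2000	/* Data Link Layer Link Active */
#define PCI_EXP_LNKSTA_LBMS	0x4000	/* Link Bandwidth Management Status */

/* True only when LBMS is set and DLLLA is clear */
static bool lnksta_indicates_failure(uint16_t lnksta)
{
	return (lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
	       PCI_EXP_LNKSTA_LBMS;
}

/* All four combinations of the two bits */
static void lnksta_self_check(void)
{
	assert(!lnksta_indicates_failure(0));
	assert(!lnksta_indicates_failure(PCI_EXP_LNKSTA_DLLLA));
	assert(lnksta_indicates_failure(PCI_EXP_LNKSTA_LBMS));
	assert(!lnksta_indicates_failure(PCI_EXP_LNKSTA_LBMS |
					 PCI_EXP_LNKSTA_DLLLA));
}
```

Only one of the four combinations counts as a failed link; other bits
in the register are ignored by the mask.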

> +             unsigned long timeout;
> +             u16 lnkctl;
> +
> +             pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
> +
> +             pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> +             lnkctl |= PCI_EXP_LNKCTL_RL;
> +             lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> +             lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
> +             pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
> +             pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
> +             /*
> +              * Due to an erratum in some devices the Retrain Link bit
> +              * needs to be cleared again manually to allow the link
> +              * training to succeed.
> +              */
> +             lnkctl &= ~PCI_EXP_LNKCTL_RL;
> +             if (dev->clear_retrain_link)
> +                     pcie_capability_write_word(dev, PCI_EXP_LNKCTL,
> +                                                lnkctl);
> +
> +             timeout = jiffies + PCIE_LINK_RETRAIN_TIMEOUT;
> +             do {
> +                     pcie_capability_read_word(dev, PCI_EXP_LNKSTA,
> +                                          &lnksta);
> +                     if (lnksta & PCI_EXP_LNKSTA_DLLLA)
> +                             break;
> +                     usleep_range(10000, 20000);
> +             } while (time_before(jiffies, timeout));
> +
> +             if (!(lnksta & PCI_EXP_LNKSTA_DLLLA)) {
> +                     pci_info(dev, "retraining failed\n");
> +                     return -1;
> +             }
> +     }

> +     if (IS_ENABLED(CONFIG_PCI_QUIRKS) && (lnksta & PCI_EXP_LNKSTA_DLLLA) &&
> +         (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
> +         pci_match_id(ids, dev)) {
> +             u32 lnkcap;
> +             u16 lnkctl;
> +
> +             pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
> +             pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> +             pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> +             lnkctl |= PCI_EXP_LNKCTL_RL;
> +             lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> +             lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS;
> +             pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
> +             pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);

This starts a retrain; should we wait for training to complete?

> +     }

If we put most of this into a pcie_fix_link_train() (separated from
detecting the *need* to fix something), could it be made to look
sort of like this?  (I suppose you'd want to return bool and rename
it to something that reads naturally, e.g., "pcie_link_forcibly_retrained()",
"pcie_link_retrained()", etc.)

  +void pcie_fix_link_train(struct pci_dev *dev)
  +{
  +       u16 lnkctl2;
  +       u32 lnkcap;
  +       bool linkup;
  +
  +       pci_info(dev, "attempting link retrain at 2.5GT/s\n");
  +       pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
  +       lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
  +       lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
  +       pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
  +
  +       linkup = pcie_retrain_link(dev, PCI_EXP_LNKSTA_DLLLA);
  +       if (!linkup) {
  +               pci_info(dev, "retraining failed\n");
  +               return;
  +       }
  +
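  +       /* Nothing more to do if the port supports only 2.5GT/s */
  +       pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
  +       if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB)
  +               return;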
  +
  +       if (!pci_match_id(ids, dev))
  +               return;

Your comment said "if we know this is *safe*"; I can't remember if
pci_match_id() is there to avoid a known problem?

  +
  +       pci_info(dev, "attempting link retrain at max supported rate\n");
  +       pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
  +       lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
  +       lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS;
  +       pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
  +
  +       linkup = pcie_retrain_link(dev, PCI_EXP_LNKSTA_DLLLA);
  +       if (!linkup)
  +               pci_info(dev, "retraining failed\n");
  +}

> +
> +     return 0;
> +}
> +
> +/* Same as above, but called for a downstream device.  */
> +static int pcie_upstream_link_retrain(struct pci_dev *dev)
> +{
> +     struct pci_dev *bridge;
> +
> +     bridge = pci_upstream_bridge(dev);
> +     if (bridge)
> +             return pcie_downstream_link_retrain(bridge);
> +     else
> +             return -1;
> +}
> +
>  static int pci_acs_enable;
>  
>  /**
> @@ -1148,8 +1274,8 @@ void pci_resume_bus(struct pci_bus *bus)
>  
>  static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>  {
> +     int retrain = 0;
>       int delay = 1;
> -     u32 id;
>  
>       /*
>        * After reset, the device should not silently discard config
> @@ -1163,21 +1289,37 @@ static int pci_dev_wait(struct pci_dev *
>        * Command register instead of Vendor ID so we don't have to
>        * contend with the CRS SV value.
>        */
> -     pci_read_config_dword(dev, PCI_COMMAND, &id);
> -     while (PCI_POSSIBLE_ERROR(id)) {
> +     for (;;) {
> +             u32 id;
> +
> +             pci_read_config_dword(dev, PCI_COMMAND, &id);
> +             if (!PCI_POSSIBLE_ERROR(id)) {
> +                     if (delay > PCI_RESET_WAIT)
> +                             pci_info(dev, "ready %dms after %s\n",
> +                                      delay - 1, reset_type);
> +                     break;
> +             }
> +
>               if (delay > timeout) {
>                       pci_warn(dev, "not ready %dms after %s; giving up\n",
>                                delay - 1, reset_type);
>                       return -ENOTTY;
>               }
>  
> -             if (delay > PCI_RESET_WAIT)
> +             if (delay > PCI_RESET_WAIT) {
> +                     if (!retrain) {
> +                             retrain = 1;
> +                             if (pcie_upstream_link_retrain(dev) == 0) {
> +                                     delay = 1;
> +                                     continue;
> +                             }
> +                     }
>                       pci_info(dev, "not ready %dms after %s; waiting\n",
>                                delay - 1, reset_type);
> +             }

Thanks for fixing this in the reset path, too.  Can we move this part
to a separate patch?  It's related to the rest of the patch, but it
looks different enough that I think it would be easier to understand
by itself.

I think I might try to fold the pcie_upstream_link_retrain() directly
in here because the "upstream link retrain" in the function name
doesn't really make sense in PCIe terms.
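
As I read the hunk, the control flow boils down to "try the retrain
once, and if it worked, restart the backoff".  A userspace sketch of
just that flow, with everything hypothetical: struct mock_dev,
mock_ready(), mock_retrain() and mock_dev_wait() are invented, and
MOCK_RESET_WAIT is assumed to match PCI_RESET_WAIT.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_RESET_WAIT	1000	/* ms; assumed to match PCI_RESET_WAIT */

/* Hypothetical device model, not struct pci_dev */
struct mock_dev {
	bool needs_retrain;	/* stays silent until the link is retrained */
	bool dead;		/* never answers, retrain or not */
	int retrains;		/* retrain attempts made so far */
};

static bool mock_ready(const struct mock_dev *d)
{
	if (d->dead)
		return false;
	return !d->needs_retrain || d->retrains > 0;
}

/* Stand-in for retraining the link at the upstream bridge; 0 on success */
static int mock_retrain(struct mock_dev *d)
{
	d->retrains++;
	return 0;
}

/* Returns 0 when the device answers, -1 on timeout (timeout in ms) */
static int mock_dev_wait(struct mock_dev *d, int timeout)
{
	bool retrained = false;
	int delay = 1;

	assert(d && timeout > 0);

	for (;;) {
		if (mock_ready(d))
			return 0;

		if (delay > timeout)
			return -1;

		/* Try the workaround once, then restart the backoff */
		if (delay > MOCK_RESET_WAIT && !retrained) {
			retrained = true;
			if (mock_retrain(d) == 0) {
				delay = 1;
				continue;
			}
		}

		delay *= 2;	/* the real loop sleeps for 'delay' ms here */
	}
}
```

A device that never answers still times out, and the retrain is
attempted at most once per wait.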

Bjorn
