As pointed out by you, PCIe link might not yet have completed the training.
Maybe checking the  link training  completion bit until a certain timeout
can be an ideal solution.

Try the diff below:
diff --git a/src/device/pci_device.c b/src/device/pci_device.c
index 4b5e73b806..c96e9f1e9d 100644
--- a/src/device/pci_device.c
+++ b/src/device/pci_device.c
@@ -1213,6 +1213,7 @@ static void pci_scan_hidden_device(struct device *dev)
  * @param min_devfn Minimum devfn to look at in the scan, usually 0x00.
  * @param max_devfn Maximum devfn to look at in the scan, usually 0xff.
  */
+#define PCIE_TRAIN_RETRY 10000
 void pci_scan_bus(struct bus *bus, unsigned int min_devfn,
                          unsigned int max_devfn)
 {
@@ -1254,6 +1255,14 @@ void pci_scan_bus(struct bus *bus, unsigned int
min_devfn,
                        continue;
                }

+               /* Wait for training to complete */
+               u16 lnk, try, cap = pci_find_capability(dev,
PCI_CAP_ID_PCIE);
+               for (try = PCIE_TRAIN_RETRY; try > 0; try--) {
+                       lnk = pci_read_config16(dev, cap + PCI_EXP_LNKSTA);
+                       if (!(lnk & PCI_EXP_LNKSTA_LT))
+                               break;
+                       udelay(100);
+               }
                /* See if a device is present and setup the device
structure. */
                dev = pci_probe_dev(dev, bus, devfn);


Regards,
Naresh

On Tue, Sep 21, 2021 at 4:48 PM Sumo <kingsu...@gmail.com> wrote:

> Hi,
>
> Anything else can be tried instead of using simple delays?
> Perhaps during the enumeration for some devices a delay is really
> required, for the NVMe case the vendor/device ID pair isn't detected at
> all... I'm not sure if the PCIe link is still training (I don't have an
> analyzer) or if the device is still booting...
>
> Kind regards,
> Sumo
>
> On Tue, Aug 17, 2021 at 8:34 AM Sumo <kingsu...@gmail.com> wrote:
>
>> Hi,
>>
>> I have managed to disable the UART console and collect the logs via cbmem
>> tool. Therefore it will not add any additional delay even with debug logs
>> enabled so we can compare the logs with and without the delay.
>>
>> Here are my findings:
>> - without the delay in dev_enumerate() the NVMe because isn't detected at
>> all (i.e. the PCIe device isn't shown in the bus):
>> PCI: 00:0b.0 scanning...
>> do_pci_scan_bridge for PCI: 00:0b.0
>> PCI: 00:0b.0: Enabled LTR
>> PCI: pci_scan_bus for bus 04
>> POST: 0x24
>>
>> *PCI: Static device PCI: 04:00.0 not found, disabling it.*POST: 0x25
>> PCI: Leftover static devices:
>> PCI: 04:00.0
>> PCI: Check your devicetree.cb.
>> POST: 0x55
>> scan_bus: bus PCI: 00:0b.0 finished in 0 msecs
>>
>> - by adding the delay the device is detected and initialized:
>> PCI: 00:0b.0 scanning...
>> do_pci_scan_bridge for PCI: 00:0b.0
>> PCI: 00:0b.0: Enabled LTR
>> PCI: pci_scan_bus for bus 04
>> POST: 0x24
>>
>> *PCI: 04:00.0 [1987/5012] enabled*POST: 0x25
>> POST: 0x55
>> Enabling Common Clock Configuration
>> PCIE CLK PM is not supported by endpoint
>> L1 Sub-State supported from root port 11
>> L1 Sub-State Support = 0xf
>> CommonModeRestoreTime = 0x28
>> Power On Value = 0x6, Power On Scale = 0x1
>> ASPM: Enabled L1
>> PCIe: Max_Payload_Size adjusted to 256
>> PCI: 04:00.0: Enabled LTR
>> PCI: 04:00.0: Programmed LTR max latencies
>> scan_bus: bus PCI: 00:0b.0 finished in 0 msecs
>>
>> Also, another device is failing if I remove the delay - it's a I211
>> gigabit ethernet controller:
>> PCI: 00:0f.0 scanning...
>> do_pci_scan_bridge for PCI: 00:0f.0
>> PCI: 00:0f.0: Enabled LTR
>> PCI: pci_scan_bus for bus 05
>> POST: 0x24
>>
>> *PCI: Static device PCI: 05:00.0 not found, disabling it.*POST: 0x25
>> PCI: Leftover static devices:
>> PCI: 05:00.0
>> PCI: Check your devicetree.cb.
>> POST: 0x55
>> scan_bus: bus PCI: 00:0f.0 finished in 0 msecs
>>
>> With the delay the I211 is detected:
>> PCI: 00:0f.0 scanning...
>> do_pci_scan_bridge for PCI: 00:0f.0
>> PCI: 00:0f.0: Enabled LTR
>> PCI: pci_scan_bus for bus 05
>> POST: 0x24
>>
>> *PCI: 05:00.0 [8086/1539] enabled*POST: 0x25
>> POST: 0x55
>> Enabling Common Clock Configuration
>> PCIE CLK PM is not supported by endpoint
>> ASPM: Enabled L1
>> PCIe: Max_Payload_Size adjusted to 256
>> PCI: 05:00.0: No LTR support
>> scan_bus: bus PCI: 00:0f.0 finished in 0 msecs
>>
>> Full logs are attached.
>>
>> Kind regards,
>> Sumo
>>
>> On Mon, Aug 16, 2021 at 8:01 PM Sumo <kingsu...@gmail.com> wrote:
>>
>>> Hi Paul,
>>>
>>> When logs are (almost) disabled the error isn't shown, so if I add the
>>> delay with logs disabled the log output will have almost no difference at
>>> all.
>>>
>>> Following are the logs, including a log with Coreboot debug enabled + no
>>> delay. For all logs FSP loglevel is set to NoDebug:
>>> - nvme-err.log : no delay; coreboot debug_level=Error; NVMe error: at
>>> the end of the log is shown the error in the UEFI FW:
>>>   ERROR: C40000002:V02010007 I0 93B80004-9FB3-11D4-9A3A-0090273FC14D
>>> 7E90A998;
>>> - nvme-ok-delay.log : 20ms delay; coreboot debug_level=Error; NVMe ok;
>>> - nvme-ok.log : no delay; coreboot debug_level=Spew; NVMe ok: the
>>> coreboot log output is enough to make NVMe work properly;
>>>
>>> The NVMe is in the root port 00:0b.0, it is shown as 04:00.0
>>>
>>> Thanks,
>>> Sumo
>>>
>>> On Mon, Aug 16, 2021 at 2:57 PM Paul Menzel <pmen...@molgen.mpg.de>
>>> wrote:
>>>
>>>> Dear Sumo,
>>>>
>>>>
>>>> Am 16.08.21 um 18:38 schrieb Sumo:
>>>>
>>>> > The NVMe is not detected when serial console logs are disabled, I
>>>> mean by
>>>> > setting both Coreboot log_level=Error (or less) and FSP
>>>> > PcdFspDebugPrintErrorLevel=NoDebug. Looks like the enumeration fails
>>>> then
>>>> > further on the device is not listed in the UEFI FW (same issue shown
>>>> in
>>>> > either CorebootPayloadPkg or UefiPayloadPkg). When Linux boots the
>>>> device
>>>> > appears normally.
>>>> >
>>>> > The problem is fixed by adding a small delay inside dev_enumerate() -
>>>> a
>>>> > 20ms delay at the very beginning of the function is enough. I'm
>>>> wondering
>>>> > if there is a better solution for this, the device is already defined
>>>> in
>>>> > the devicetree.cb (set as on). Maybe coreboot is too fast and the
>>>> NVMe is
>>>> > still booting up - or the PCIe link is still training, not sure.
>>>> Coreboot
>>>> > doesn't retry if the device is not detected right away?
>>>>
>>>> Please share the logs without and with the delay.
>>>>
>>>>
>>>> Kind regards,
>>>>
>>>> Paul
>>>>
>>> _______________________________________________
> coreboot mailing list -- coreboot@coreboot.org
> To unsubscribe send an email to coreboot-le...@coreboot.org
>


-- 
Best regards,
Naresh G. Solanki
_______________________________________________
coreboot mailing list -- coreboot@coreboot.org
To unsubscribe send an email to coreboot-le...@coreboot.org

Reply via email to