On Thu, Sep 26, 2019 at 8:51 AM Kyösti Mälkki <kyosti.mal...@gmail.com>
wrote:

> Hi Matt,
>
> thanks for bringing the topic up. Please also contact your Intel reps
> about this via commercial support channel as well. I believe Patrick
> G. once stated that he could act as a relay when it comes to disputes
> between commercial and community development 'strategies'.
>
> On Wed, Sep 25, 2019 at 8:48 PM Matt DeVillier <matt.devill...@gmail.com>
> wrote:
> >
> > Commit 903b40a [soc/intel: Replace uses of dev_find_slot()] replaced
> calls to dev_find_slot() with calls to pcidev_path_on_root() for all Intel
> common SoCs, but it seems unclear exactly what problem this was intended to
> fix, and has created problems with locating hidden PCI devices.
>
> I believe [1] commit explains dev_find_slot() with enough details.
>
> > Commit f2ac0137 [soc/intel: Fix regression with hidden PCI devices]
> fixed this partially by creating a debug version of pcidev_path_on_root()
> which falls back on dev_find_slot(), but did not apply it to all devices
> which need it. On SKL/KBL alone, there are at least 5 different function
> calls accessing a dozen PCI devices which require falling back on
> dev_find_slot().
> >
> > Kyosti has expressed a desire to eliminate the use of dev_find_slot()
> since it can potentially return false positives prior to device enumeration
> in ramstage, but as currently implemented the cure seems worse than the
> disease.
> >
> > Short term, it seems like having pcidev_path_on_root() fall back on
> dev_find_slot() with a loud warning (like pcidev_path_on_root_debug() does
> now) makes the most sense, vs having two nearly identical function calls
> used inconsistently. Long term, we need a better strategy for dealing with
> PCI devices which get hidden by FSP / are in violation of PCI spec.
> >
> > But since discussion on Gerrit seems to have died, reviving it here for
> a larger audience.
>

I just noticed these commits and have commented there. :)

>
> I will quote below what I already wrote in gerrit [2], [3]:
>
> The trouble is the entire PCI subsystem in ramstage is based on
> matching the vendor/device ID register with a PCI driver and to the
> source we want to control that device with. To allow this hiding of
> PCI devices will ultimately force us to write the control somewhere in
> the scope of SOC instead. Oh, but wait, perhaps the intention is for
> us to __not__ write that control anymore but let the FSP blob do all
> that too!
>

I think we can provide a semantic of "possibly hidden" devices and provide
operations which do the correct thing provided the platform constraints. I
do want to understand the complete problem, though.

My current understanding is that we're forcing a device to be unhidden
after a sequence we know that hides a particular device. And we're
attempting to put it back to hidden permanently because of policy enforced
by a silicon vendor. However, if an access comes after the hide sequence we
trip over the NULL check if the device was removed from the topology. What
I couldn't figure out was what caused the need for f2ac0137? p2sb was
working still? The commits don't have more information as to the
circumstances of the failure.

>
> AFAICS, hardware that does not respond to vendor/device ID register
> reads does not meet PCI compliance. I am willing to hit +2 on removing
> support for platforms that do not meet PCI compliance, specially when
> in the cases here, it is a matter of broken FSP blobs and not broken
> silicon per-se.
>

I'm not in agreement on the policies/thinking from all the hardware
vendors. Many issues have arisen in not thinking through pci dev/fun
allocation. e.g. function 0 can be hidden but functions >1 should be
around. Definitely a violation of PCI spec and many things have been worked
around that.

That said, computer architecture/design is messy, and I think it's naive to
think everything is spec compliant. There are complex fabrics and fabric
policies in all these devices that are utilized to provide a "pci". And as
firmware people we need to deal with the realities of the hardware. That's
pretty much the point of firmware: prepare the abstractions for the OS to
make those assumptions of aligning with specs. Even then, some things just
can't be fixed (check out all the quirks and layering violations in the
drivers).



>
> Also, I should not be accepting new callers for dev_find_slot() due
> the ill semantics it has. Prior to device enumeration, it can return
> false positives because all devices appear on bus 0. So please look
> for alternative solution if you want to support Intel's initiative of
> more blob less FOSS.
>

dev_find_slot() failures have nothing to do with blobs or not. Certainly,
some of the circumstances have tripped up some of the assumptions made in
that code. And I agree we should move away from using that API and/or
removing it.

>
> I suggest you post on the mailing list. That active PCI devices no
> longer respond to Vendor ID / Device ID queries does not meet PCI
> compliance, as I understand the specs. Unfortunately, someone with
> enough authority inside the org will likely decide it's just fine that
> ramstage will no longer be designed using PCI drivers and allow use of
> shim calling massive FSP blob.
>
>
> As for solutions:
>
> Preferably fix FSP and not allow this PCI hide-and-seek.
>
> or
>
> Rewrite PCI device enumeration such that devices that do not respond
> to ID queries are not permanently removed from topology links. You
>

I thought that the devices being removed was part of the original problem?

> need platform hooks to decide whether to remove a particular device
> (based on B:D.F) or not. This would avoid ill dev_find_slot() then.
>
>
I have some ideas on some solutions, but I want to understand the failure
that occurred after 903b40a.

or
>
> Revert to __SIMPLE_DEVICE__ for PCI access. But __SIMPLE_DEVICE__ is
> also on my kill list due to uglyness and high maintenance whenever we
> attempt to move sources to earlier stages. Unfortunately things found
> in soc/intel has slowed down the process to standstill and work is not
> funded.
>
> [1] https://review.coreboot.org/c/coreboot/+/26447
> [2] https://review.coreboot.org/c/coreboot/+/35087
> [3] https://review.coreboot.org/c/coreboot/+/35088
>
> Regards,
> Kyösti Mälkki
> _______________________________________________
> coreboot mailing list -- coreboot@coreboot.org
> To unsubscribe send an email to coreboot-le...@coreboot.org
>
_______________________________________________
coreboot mailing list -- coreboot@coreboot.org
To unsubscribe send an email to coreboot-le...@coreboot.org

Reply via email to