Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-19 Thread Keith Busch
On Mon, Nov 19, 2018 at 11:53:05AM -0500, Tyler Baicar wrote:
> On Thu, Nov 15, 2018 at 8:49 PM Sinan Kaya  wrote:
> >
> > On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote:
> > > I've asked around a few people at Dell and they unanimously agree that
> > > _OSC is the correct way to determine ownership of AER. In linux, we
> > > use the result of _OSC to enable AER services, but we use HEST to
> > > determine AER ownership. That's inconsistent. This series drops the
> > > use of HEST in favor of _OSC.
> > >
> > > [1]https://lkml.org/lkml/2018/11/15/62
> >
> > This change breaks the existing systems that rely on the HEST table
> > telling the operating system about firmware first presence.
> >
> > Besides, HEST table has much more granularity about which PCI component
> > needs firmware such as global/device/switch.
> >
> > You should probably circulate these ideas for wider consumption in UEFI
> > forum as UEFI owns the HEST table definition.
> 
> I agree with Sinan, this will break existing systems, and the granularity of 
> the
> HEST definition is more useful than the single bit in _OSC.

But we're not using HEST as a fine grain control. We disable native AER
handling if *any* device has FF set in HEST, and that just forces people
to use pcie_ports=native to get around that.


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-19 Thread Sinan Kaya

On 11/19/2018 11:53 AM, Keith Busch wrote:

On Mon, Nov 19, 2018 at 11:53:05AM -0500, Tyler Baicar wrote:

On Thu, Nov 15, 2018 at 8:49 PM Sinan Kaya  wrote:


On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote:

I've asked around a few people at Dell and they unanimously agree that
_OSC is the correct way to determine ownership of AER. In linux, we
use the result of _OSC to enable AER services, but we use HEST to
determine AER ownership. That's inconsistent. This series drops the
use of HEST in favor of _OSC.

[1]https://lkml.org/lkml/2018/11/15/62


This change breaks the existing systems that rely on the HEST table
telling the operating system about firmware first presence.

Besides, HEST table has much more granularity about which PCI component
needs firmware such as global/device/switch.

You should probably circulate these ideas for wider consumption in UEFI
forum as UEFI owns the HEST table definition.


I agree with Sinan, this will break existing systems, and the granularity of the
HEST definition is more useful than the single bit in _OSC.


But we're not using HEST as a fine grain control. We disable native AER
handling if *any* device has FF set in HEST, and that just forces people
to use pcie_ports=native to get around that.



I don't see *any* in the code.  aer_hest_parse() does the HEST table parsing.
It switches to firmware first mode if global flag in HEST is set. Otherwise
for each BDF in device, hest_match_pci() is used to do a cross-matching against
HEST table contents.

Am I missing something?


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-20 Thread Sinan Kaya

On 11/20/2018 4:42 PM, Keith Busch wrote:

How does that work? If the OS takes control, it sets up MSIs that FW don't
react to, and disables system errors through PCIe Root Control. Aren't
those sys errs the mechanism FW knows it has something to do, which
means the OS can effectively fence it off?


I think this is all implementation detail and doesn't necessarily apply
to all firmware-first implementation flavors.

Assumptions are:
1. both FW and OS are listening to MSI interrupts
2. FW monitors the system errors

Some FF implementation could route the AER interrupt to a higher privilege
level. Some other implementation could use INTx or a side-band channel interrupt
for firmware-interrupt too.

I have seen all 3 except MSI :) and also firmware never monitored the system
error bits. I was curious if anybody ever used those legacy bits. Now, I know
someone is using it.




Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-27 Thread Tyler Baicar
On Tue, Nov 27, 2018 at 1:32 PM Sinan Kaya  wrote:
>
> On 11/27/2018 1:22 PM, alex_gagn...@dellteam.com wrote:
> > On 11/20/2018 04:08 PM, Sinan Kaya wrote:
> >> I followed the ASWG thread yesterday. There will be a meeting next week to
> >> discuss this.
> >
> > Any updates on the meeting?
> >
> >
>
> Tyler should be able to get an update.

As per Harb, the meetings are on Thursdays so it will be discussed this
Thursday since last week was a holiday.

Thanks,
Tyler


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2019-03-05 Thread Bjorn Helgaas
On Thu, Nov 15, 2018 at 05:16:01PM -0600, Alexandru Gagniuc wrote:
> Thanks to Keith for pointing out that it doesn't make sense to disable
> AER services when only one device has a FIRMWARE_FIRST HEST.
> 
> AER ownership is an interesting issue brought in by FFS (firmware-first)
> model. In a nutshell if FFS handles AER, then OS should not touch any
> of the AER bits. FW might set things up so that it receives AER
> notifications via SMI. It's theoretically possible to receive SCIs,
> but the exact mechanism is platform-dependent. OS touching AER bits
> when firmware owns them may interfere with these notifications.
> 
> The ACPI mechanism for negotiating control of AER is _OSC, and is
> described in detail in ACPI 6.2 Ch. 6.2.11.3. _OSC is negotiated at
> the root bus level. Any root port, switch, or endpoint under the bus
> would have its AER ownership negotiated in one _OSC call.
> 
> Then there is HEST, which is part of ACPI Platform Error Interfaces
> (APEI). HEST tables describe the errors that FW may report to the OS.
> A types 6,7 and 7 HEST tables describe AER errors from PCIe devices.
> As part of this description, we're told if the error source is FFS.
> 
> Information in HEST seems to be redundant, as each error reported by
> FW will have a CPER table that describes it in detail.
> 
> Because HEST describes an error source as firmware-first or not, we've
> taken this to mean ownership of AER. Because AER ownership and error
> reporting are coupled, _OSC and HEST usually agree on the matter of
> ownership. However, that doesn't seem to be required by ACPI.
> 
> I've asked around a few people at Dell and they unanimously agree that
> _OSC is the correct way to determine ownership of AER. In linux, we
> use the result of _OSC to enable AER services, but we use HEST to
> determine AER ownership. That's inconsistent. This series drops the
> use of HEST in favor of _OSC.
> 
> [1] https://lkml.org/lkml/2018/11/15/62
> 
> Alexandru Gagniuc (2):
>   PCI/AER: Do not use APEI/HEST to disable AER services globally
>   PCI/AER: Determine AER ownership based on _OSC instead of HEST
> 
>  drivers/acpi/pci_root.c  |  9 +
>  drivers/pci/pcie/aer.c   | 82 ++--
>  include/linux/pci-acpi.h |  6 ---
>  3 files changed, 5 insertions(+), 92 deletions(-)

I'm pretty sure we do need to do something here, but there was quite a
lot of discussion that didn't seem to really get resolved, so I'm
dropping these for now.

Please repost them with any relevant updates and we'll see if we can
get a consensus that we're going the right direction.

Bjorn