>>> On 13.04.15 at 13:47, <m...@redhat.com> wrote:
> On Mon, Apr 13, 2015 at 12:34:34PM +0100, Jan Beulich wrote:
>> >>> On 13.04.15 at 13:19, <m...@redhat.com> wrote:
>> > Yes, Linux can't fix firmware-first mode, but the
>> > PCI Express spec says what firmware should do in this case:
>> > 
>> > IMPLEMENTATION NOTE Software UR Reporting Compatibility with 1.0a Devices
>> > 
>> >         With 1.0a device Functions, if the Unsupported Request
>> >         Reporting Enable bit is set, the Function when operating as
>> >         a Completer will send an uncorrectable error Message (if
>> >         enabled) when a UR error is detected. On platforms where an
>> >         uncorrectable error Message is handled as a System Error,
>> >         this will break PC-compatible Configuration Space probing,
>> >         so software/firmware on such platforms may need to avoid
>> >         setting the Unsupported Request Reporting Enable bit.
>> >
>> >         With device Functions implementing Role-Based Error
>> >         Reporting, setting the Unsupported Request Reporting Enable
>> >         bit will not interfere with PC-compatible Configuration
>> >         Space probing, assuming that the severity for UR is left at
>> >         its default of non-fatal. However, setting the Unsupported
>> >         Request Reporting Enable bit will enable the Function to
>> >         report UR errors detected with posted Requests, helping
>> >         avoid this case for potential silent data corruption.
>> >
>> >         On platforms where robust error handling and PC-compatible
>> >         Configuration Space probing is required, it is suggested
>> >         that software or firmware have the Unsupported Request
>> >         Reporting Enable bit Set for Role-Based Error Reporting
>> >         Functions, but clear for 1.0a Functions. Software or
>> >         firmware can distinguish the two classes of Functions by
>> >         examining the Role-Based Error Reporting bit in the Device
>> >         Capabilities register.
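
[Illustration, not part of the original message.] The policy suggested in the
implementation note can be sketched roughly as follows. This is a minimal
sketch that assumes the Device Capabilities and Device Control register values
have already been read from the Function's PCI Express capability; the helper
names is_role_based() and apply_ur_reporting_policy() are hypothetical and do
not come from Linux, Xen, or the spec. The bit positions (bit 15 of Device
Capabilities, bit 3 of Device Control) follow the PCI Express Base
Specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions per the PCI Express Base Specification. */
#define DEVCAP_ROLE_BASED_ERR_REPORTING (1u << 15) /* Device Capabilities */
#define DEVCTL_UR_REPORTING_ENABLE      (1u << 3)  /* Device Control */

/* Hypothetical helper: does the Function implement Role-Based Error
 * Reporting (as opposed to being a 1.0a-style Function)? */
static bool is_role_based(uint32_t devcap)
{
    return (devcap & DEVCAP_ROLE_BASED_ERR_REPORTING) != 0;
}

/* Hypothetical helper: compute the Device Control value the note suggests.
 * UR reporting is enabled for role-based Functions and cleared for 1.0a
 * Functions; all other Device Control bits are left unchanged. */
static uint16_t apply_ur_reporting_policy(uint32_t devcap, uint16_t devctl)
{
    if (is_role_based(devcap))
        return (uint16_t)(devctl | DEVCTL_UR_REPORTING_ENABLE);
    return (uint16_t)(devctl & ~DEVCTL_UR_REPORTING_ENABLE);
}
```

The caller would still have to write the returned value back to Device
Control; how that is done (and whether firmware or the OS owns the register)
is platform specific and not shown here.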
>> > 
>> > 
>> > What I think you have is a very old 1.0a system, and you set Unsupported
>> > Request Reporting Enable.
>> > 
>> > Can you confirm?
>> 
>> No. In at least one of the two cases where we got reports of the
>> original problem (which triggered the finding of this issue), the
>> system is a brand new one, only soon to become publicly available.
>> Furthermore I'm confused by the mention of PC-compatible config
>> space probing above: The URs we talk about here don't result from
>> config space accesses at all.
> 
> OK. Can you please explain why UR causes a system error then?
> It looks like a hardware bug: PCIE 1.1 seems to say it shouldn't.

Quite possible. Looking at the ITP log we were provided, the UR
severity bit is clear (non-fatal), yet the error got surfaced to the
OS as a fatal one (I would guess because it validly gets flagged as
uncorrectable at the same time).
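
[Illustration, not part of the original message.] The distinction drawn above
between "non-fatal" and "uncorrectable" can be sketched as below. This is only
a sketch under assumptions: it takes raw values of the AER Uncorrectable Error
Status and Uncorrectable Error Severity registers as inputs, and the names
classify_ur() and enum ur_class are hypothetical. Bit 20 is the Unsupported
Request Error bit in both registers per the PCI Express Base Specification. It
does not model how a given platform's firmware escalates the error.

```c
#include <stdint.h>

/* Unsupported Request Error bit in the AER Uncorrectable Error Status,
 * Mask, and Severity registers. */
#define AER_UNCOR_UNSUPPORTED_REQUEST (1u << 20)

/* Hypothetical classification of a logged UR. */
enum ur_class {
    UR_NOT_LOGGED,             /* status bit clear: no UR recorded */
    UR_UNCORRECTABLE_NONFATAL, /* status set, severity bit clear */
    UR_UNCORRECTABLE_FATAL     /* status set, severity bit set */
};

/* A UR with its severity bit clear is non-fatal, but it still lives in the
 * uncorrectable error registers, so it is "uncorrectable" either way. */
static enum ur_class classify_ur(uint32_t uncor_status, uint32_t uncor_severity)
{
    if (!(uncor_status & AER_UNCOR_UNSUPPORTED_REQUEST))
        return UR_NOT_LOGGED;
    if (uncor_severity & AER_UNCOR_UNSUPPORTED_REQUEST)
        return UR_UNCORRECTABLE_FATAL;
    return UR_UNCORRECTABLE_NONFATAL;
}
```

With the severity bit clear, as in the log described above, this yields
UR_UNCORRECTABLE_NONFATAL; any further promotion to a fatal error would have
to happen elsewhere in the platform's error handling path.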

Jan

