RE: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-10-08 Thread Hongtao Jia


> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Thursday, October 09, 2014 7:48 AM
> To: Jia Hongtao-B38951
> Cc: Guenter Roeck; Benjamin Herrenschmidt; Paul Mackerras; Michael
> Ellerman; linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org;
> Jojy G Varghese; Guenter Roeck
> Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check
> exception on E500MC / E5500
> 
> On Tue, 2014-10-07 at 22:08 -0500, Jia Hongtao-B38951 wrote:
> >
> > > -----Original Message-----
> > > From: Wood Scott-B07421
> > > Sent: Tuesday, September 30, 2014 2:36 AM
> > > To: Guenter Roeck
> > > Cc: Benjamin Herrenschmidt; Paul Mackerras; Michael Ellerman;
> > > linuxppc- d...@lists.ozlabs.org; linux-ker...@vger.kernel.org; Jojy G
> > > Varghese; Guenter Roeck; Jia Hongtao-B38951
> > > Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine
> > > check exception on E500MC / E5500
> > >
> > > On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> > > > From: Jojy G Varghese 
> > > >
> > > > For E500MC and E5500, a machine check exception in pci(e) memory
> > > > space crashes the kernel.
> > > >
> > > > > Testing shows that the MCAR(U) register is zero on a MC exception
> > > > > for the E5500 core. At the same time, DEAR register has been found
> > > > > to have the address of the faulty load address during an MC
> > > > > exception for this core.
> > > >
> > > > This fix changes the current behavior to fixup the result register
> > > > and instruction pointers in the case of a load operation on a
> > > > faulty PCI address.
> > > >
> > > > The changes are:
> > > > > - Added the hook to pci machine check handling to the e500mc
> > > > >   machine check exception handler.
> > > > > - For the E5500 core, load faulting address from SPRN_DEAR register.
> > > > >   As mentioned above, this is necessary because the E5500 core does
> > > > >   not report the fault address in the MCAR register.
> > > >
> > > > Cc: Scott Wood 
> > > > Signed-off-by: Jojy G Varghese  [Guenter Roeck:
> > > > updated description]
> > > > Signed-off-by: Guenter Roeck 
> > > > Signed-off-by: Guenter Roeck 
> > > > ---
> > > > >  arch/powerpc/kernel/traps.c   | 3 ++-
> > > > >  arch/powerpc/sysdev/fsl_pci.c | 5 +++++
> > > > >  2 files changed, 7 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > > > > index 0dc43f9..ecb709b 100644
> > > > > --- a/arch/powerpc/kernel/traps.c
> > > > > +++ b/arch/powerpc/kernel/traps.c
> > > > > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
> > > > >  	int recoverable = 1;
> > > > >
> > > > >  	if (reason & MCSR_LD) {
> > > > > -		recoverable = fsl_rio_mcheck_exception(regs);
> > > > > +		recoverable = fsl_rio_mcheck_exception(regs) ||
> > > > > +			fsl_pci_mcheck_exception(regs);
> > > > >  		if (recoverable == 1)
> > > > >  			goto silent_out;
> > > > >  	}
> > > > > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> > > > > index c507767..bdb956b 100644
> > > > > --- a/arch/powerpc/sysdev/fsl_pci.c
> > > > > +++ b/arch/powerpc/sysdev/fsl_pci.c
> > > > > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
> > > > >  #endif
> > > > >  	addr += mfspr(SPRN_MCAR);
> > > > >
> > > > > +#ifdef CONFIG_E5500_CPU
> > > > > +	if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> > > > > +		addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
> > > > > +#endif
> > >
> > > Kconfig tells you what hardware is supported, not what hardware
> > > you're actually running on.
> > >
> > > Jia Hongtao, do you know anything about this issue?  Is there an
> erratum?
> >
> > Sorry for the late response, I just returned from vacation.
> > I am not aware of this issue.
> >
> > > What chips are affected by the erratum covered by
> > > <http://patchwork.ozlabs.org/patch/240239/>?
> >
> > MPC8544, MPC8548, MPC8572 are affected by this erratum.
> 
> Wh

Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-10-08 Thread Scott Wood
On Tue, 2014-10-07 at 22:08 -0500, Jia Hongtao-B38951 wrote:
> 
> > -----Original Message-----
> > From: Wood Scott-B07421
> > Sent: Tuesday, September 30, 2014 2:36 AM
> > To: Guenter Roeck
> > Cc: Benjamin Herrenschmidt; Paul Mackerras; Michael Ellerman; linuxppc-
> > d...@lists.ozlabs.org; linux-ker...@vger.kernel.org; Jojy G Varghese;
> > Guenter Roeck; Jia Hongtao-B38951
> > Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check
> > exception on E500MC / E5500
> > 
> > On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> > > From: Jojy G Varghese 
> > >
> > > For E500MC and E5500, a machine check exception in pci(e) memory space
> > > crashes the kernel.
> > >
> > > Testing shows that the MCAR(U) register is zero on a MC exception for
> > > the
> > > E5500 core. At the same time, DEAR register has been found to have the
> > > address of the faulty load address during an MC exception for this core.
> > >
> > > This fix changes the current behavior to fixup the result register and
> > > instruction pointers in the case of a load operation on a faulty PCI
> > > address.
> > >
> > > The changes are:
> > > - Added the hook to pci machine check handling to the e500mc machine
> > >   check exception handler.
> > > - For the E5500 core, load faulting address from SPRN_DEAR register.
> > >   As mentioned above, this is necessary because the E5500 core does not
> > >   report the fault address in the MCAR register.
> > >
> > > Cc: Scott Wood 
> > > Signed-off-by: Jojy G Varghese  [Guenter Roeck:
> > > updated description]
> > > Signed-off-by: Guenter Roeck 
> > > Signed-off-by: Guenter Roeck 
> > > ---
> > >  arch/powerpc/kernel/traps.c   | 3 ++-
> > >  arch/powerpc/sysdev/fsl_pci.c | 5 +++++
> > >  2 files changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > > index 0dc43f9..ecb709b 100644
> > > --- a/arch/powerpc/kernel/traps.c
> > > +++ b/arch/powerpc/kernel/traps.c
> > > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
> > >  	int recoverable = 1;
> > >
> > >  	if (reason & MCSR_LD) {
> > > -		recoverable = fsl_rio_mcheck_exception(regs);
> > > +		recoverable = fsl_rio_mcheck_exception(regs) ||
> > > +			fsl_pci_mcheck_exception(regs);
> > >  		if (recoverable == 1)
> > >  			goto silent_out;
> > >  	}
> > > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> > > index c507767..bdb956b 100644
> > > --- a/arch/powerpc/sysdev/fsl_pci.c
> > > +++ b/arch/powerpc/sysdev/fsl_pci.c
> > > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
> > >  #endif
> > >  	addr += mfspr(SPRN_MCAR);
> > >
> > > +#ifdef CONFIG_E5500_CPU
> > > +	if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> > > +		addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
> > > +#endif
> > 
> > Kconfig tells you what hardware is supported, not what hardware you're
> > actually running on.
> > 
> > Jia Hongtao, do you know anything about this issue?  Is there an erratum?
> 
> Sorry for the late response, I just returned from vacation.
> I am not aware of this issue.
> 
> > What chips are affected by the erratum covered by
> > <http://patchwork.ozlabs.org/patch/240239/>?
> 
> MPC8544, MPC8548, MPC8572 are affected by this erratum.

What is the erratum number?

> I checked the P4080, which uses e500mc, and found no such erratum.

What is the erratum behavior, and how does it differ from the problem
that Jojy and Guenter are trying to solve?

-Scott


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-10-07 Thread Hongtao Jia


> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Wednesday, October 01, 2014 8:44 AM
> To: Guenter Roeck
> Cc: Jojy Varghese; Benjamin Herrenschmidt; Paul Mackerras; Michael
> Ellerman; linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org;
> Guenter Roeck; Jia Hongtao-B38951
> Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check
> exception on E500MC / E5500
> 
> On Tue, 2014-09-30 at 08:50 -0700, Guenter Roeck wrote:
> > On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
> > > On Mon, 2014-09-29 at 23:03 +, Jojy Varghese wrote:
> > > >
> > > > On 9/29/14 12:06 PM, "Guenter Roeck"  wrote:
> > > >
> > > > >Those are errors related to PCIe hotplug, and are seen with
> > > > >unexpected PCIe device removals (triggered, for example, by
> > > > >removing power from a PCIe adapter).
> > > > >The behavior we see on E5500 is quite similar to the same
> > > > >behavior on
> > > > >E500:
> > > > >If unhandled, the CPU keeps executing the same instruction over
> > > > >and over again if there is an error on a PCIe access and thus
> > > > >stalls. I don't know if this is considered an erratum or expected
> > > > >behavior, but it is one we have to address since we have to be
> > > > >able to handle that condition.
> > >
> > > The reason I ask is that the handling for e500 was described as an
> > > erratum workaround.  If it is an erratum it would be nice to know
> > > the erratum number and the full list of affected chips.
> > >
> > My understanding, which may be wrong, was that this is expected
> > behavior, at least for E5500. I actually thought I had seen it
> > > somewhere in the specification (response to PCIe errors), but I don't
> > > recall where exactly.
> >
> > At least for my part I am not aware of an erratum.
> 
> Jia Hongtao, can you comment here?

I did not find any related erratum either.

> 
> > > > >Ultimately, we'll want
> > > > >to
> > > > >implement PCIe error handlers for the affected drivers, but that
> > > > >will be a next step.
> > >
> > > For now can we at least print a ratelimited error message?  I don't
> > > like the idea of silently ignoring these errors.  I suppose it's a
> > > separate issue from extending the workaround to cover e500mc, though.
> > >
> > I don't really like the idea of printing an error message pretty much
> > each time when an unexpected hotplug event occurs.
> 
> Unexpected events seem like the sort of thing you'd want to log, but my
> concern is that this might not be the only cause of PCI errors.
> 
> -Scott
> 


RE: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-10-07 Thread Hongtao Jia


> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Tuesday, September 30, 2014 2:36 AM
> To: Guenter Roeck
> Cc: Benjamin Herrenschmidt; Paul Mackerras; Michael Ellerman; linuxppc-
> d...@lists.ozlabs.org; linux-ker...@vger.kernel.org; Jojy G Varghese;
> Guenter Roeck; Jia Hongtao-B38951
> Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check
> exception on E500MC / E5500
> 
> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> > From: Jojy G Varghese 
> >
> > For E500MC and E5500, a machine check exception in pci(e) memory space
> > crashes the kernel.
> >
> > Testing shows that the MCAR(U) register is zero on a MC exception for
> > the
> > E5500 core. At the same time, DEAR register has been found to have the
> > address of the faulty load address during an MC exception for this core.
> >
> > This fix changes the current behavior to fixup the result register and
> > instruction pointers in the case of a load operation on a faulty PCI
> > address.
> >
> > The changes are:
> > - Added the hook to pci machine check handling to the e500mc machine
> >   check exception handler.
> > - For the E5500 core, load faulting address from SPRN_DEAR register.
> >   As mentioned above, this is necessary because the E5500 core does not
> >   report the fault address in the MCAR register.
> >
> > Cc: Scott Wood 
> > Signed-off-by: Jojy G Varghese  [Guenter Roeck:
> > updated description]
> > Signed-off-by: Guenter Roeck 
> > Signed-off-by: Guenter Roeck 
> > ---
> >  arch/powerpc/kernel/traps.c   | 3 ++-
> >  arch/powerpc/sysdev/fsl_pci.c | 5 +++++
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > index 0dc43f9..ecb709b 100644
> > --- a/arch/powerpc/kernel/traps.c
> > +++ b/arch/powerpc/kernel/traps.c
> > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
> >  	int recoverable = 1;
> >
> >  	if (reason & MCSR_LD) {
> > -		recoverable = fsl_rio_mcheck_exception(regs);
> > +		recoverable = fsl_rio_mcheck_exception(regs) ||
> > +			fsl_pci_mcheck_exception(regs);
> >  		if (recoverable == 1)
> >  			goto silent_out;
> >  	}
> > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> > index c507767..bdb956b 100644
> > --- a/arch/powerpc/sysdev/fsl_pci.c
> > +++ b/arch/powerpc/sysdev/fsl_pci.c
> > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
> >  #endif
> >  	addr += mfspr(SPRN_MCAR);
> >
> > +#ifdef CONFIG_E5500_CPU
> > +	if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> > +		addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
> > +#endif
> 
> Kconfig tells you what hardware is supported, not what hardware you're
> actually running on.
> 
> Jia Hongtao, do you know anything about this issue?  Is there an erratum?

Sorry for the late response, I just returned from vacation.
I am not aware of this issue.

> What chips are affected by the erratum covered by
> <http://patchwork.ozlabs.org/patch/240239/>?

MPC8544, MPC8548, MPC8572 are affected by this erratum.
I checked the P4080, which uses e500mc, and found no such erratum.

> 
> Can we rely on DEAR or is this just a side effect of likely having taken
> a TLB miss for the address recently?  Perhaps we should use the
> instruction emulation to determine the effective address instead.
> 
> Guenter, is this patch intended to deal with an erratum or are you
> covering up legitimate errors?
> 
> -Scott
> 


Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-30 Thread Scott Wood
On Tue, 2014-09-30 at 08:50 -0700, Guenter Roeck wrote:
> On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
> > On Mon, 2014-09-29 at 23:03 +, Jojy Varghese wrote:
> > > 
> > > On 9/29/14 12:06 PM, "Guenter Roeck"  wrote:
> > > 
> > > >Those are errors related to PCIe hotplug, and are seen with unexpected
> > > >PCIe
> > > >device removals (triggered, for example, by removing power from a PCIe
> > > >adapter).
> > > >The behavior we see on E5500 is quite similar to the same behavior on
> > > >E500:
> > > >If unhandled, the CPU keeps executing the same instruction over and over
> > > >again
> > > >if there is an error on a PCIe access and thus stalls. I don't know if
> > > >this
> > > >is considered an erratum or expected behavior, but it is one we have to
> > > >address
> > > >since we have to be able to handle that condition. 
> > 
> > The reason I ask is that the handling for e500 was described as an
> > erratum workaround.  If it is an erratum it would be nice to know the
> > erratum number and the full list of affected chips.
> > 
> My understanding, which may be wrong, was that this is expected behavior,
> at least for E5500. I actually thought I had seen it somewhere in the
> specification (response to PCIe errors), but I don't recall where exactly.
> 
> At least for my part I am not aware of an erratum.

Jia Hongtao, can you comment here?

> > > >Ultimately, we'll want
> > > >to
> > > >implement PCIe error handlers for the affected drivers, but that will be
> > > >a next
> > > >step.
> > 
> > For now can we at least print a ratelimited error message?  I don't like
> > the idea of silently ignoring these errors.  I suppose it's a separate
> > issue from extending the workaround to cover e500mc, though.
> > 
> I don't really like the idea of printing an error message pretty much each 
> time
> when an unexpected hotplug event occurs.

Unexpected events seem like the sort of thing you'd want to log, but my
concern is that this might not be the only cause of PCI errors.

-Scott



Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-30 Thread Jojy Varghese


On 9/30/14 1:17 PM, "Scott Wood"  wrote:

>On Tue, 2014-09-30 at 20:15 +, Jojy Varghese wrote:
>> 
>> On 9/30/14 8:50 AM, "Guenter Roeck"  wrote:
>> 
>> >On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
>> >> Which specific chip and revision did you see this on?  What is the
>>value
>> >> in MCSR?
>> >> 
>> >Jojy can answer that, at least for P5020. We have seen it on P5040 as
>> >well,
>> >though, so it is not just limited to one chip/revision.
>> 
>> The specifics are:
>> PVR: 0x80240012
>> Instruction that causes the MC exception: lwbrx
>> The faulty load address is also present in RB. So we could change the
>> logic to use that instead of DEAR. What I don't know is if there are
>> other cases that also escape the current logic.
>
>Could you find out what MCSR was when that happened?  I'm most
>interested in whether MAV was set, but the other bits would be
>interesting as well.

MCSR=a000 ( Load Error Report)
>
>-Scott
>
>
Thanks
Jojy
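
A minimal sketch of how the reported MCSR value could be decoded, reading
it as 0xa000. The bit masks below are assumptions recalled from the
e500mc section of arch/powerpc/include/asm/reg_booke.h and should be
checked against the tree in use; describe_mcsr() and mcar_is_valid() are
hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed e500mc MCSR bit masks; verify against asm/reg_booke.h. */
#define MCSR_MAV 0x00080000u /* MCAR address valid */
#define MCSR_MEA 0x00040000u /* MCAR is an effective address */
#define MCSR_IF  0x00010000u /* Instruction fetch */
#define MCSR_LD  0x00008000u /* Load */
#define MCSR_ST  0x00004000u /* Store */
#define MCSR_LDG 0x00002000u /* Guarded load */

/* Hypothetical helper: MCAR is only meaningful when MAV is set. */
bool mcar_is_valid(uint32_t mcsr)
{
	return (mcsr & MCSR_MAV) != 0;
}

/*
 * Hypothetical helper: write the names of the set bits into buf,
 * separated by spaces, and return the number of characters written.
 */
size_t describe_mcsr(uint32_t mcsr, char *buf, size_t len)
{
	static const struct { uint32_t bit; const char *name; } bits[] = {
		{ MCSR_MAV, "MAV" }, { MCSR_MEA, "MEA" }, { MCSR_IF, "IF" },
		{ MCSR_LD, "LD" }, { MCSR_ST, "ST" }, { MCSR_LDG, "LDG" },
	};
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
		if (!(mcsr & bits[i].bit))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? " " : "", bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break; /* output truncated; stop appending */
		used += (size_t)n;
	}
	return used;
}
```

Under these assumed masks, 0xa000 has LD and LDG set and MAV clear, which
would be consistent with MCAR not holding a usable address for this
exception.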


Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-30 Thread Jojy Varghese


On 9/30/14 8:50 AM, "Guenter Roeck"  wrote:

>On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
>> On Mon, 2014-09-29 at 23:03 +, Jojy Varghese wrote:
>> > 
>> > On 9/29/14 12:06 PM, "Guenter Roeck"  wrote:
>> > 
>> > >On Mon, Sep 29, 2014 at 01:36:06PM -0500, Scott Wood wrote:
>> > >> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
>> > >> > From: Jojy G Varghese 
>> > >> > 
>> > >> > For E500MC and E5500, a machine check exception in pci(e) memory
>>space
>> > >> > crashes the kernel.
>> > >> > 
>> > >> > Testing shows that the MCAR(U) register is zero on a MC
>>exception for
>> > >>the
>> > >> > E5500 core. At the same time, DEAR register has been found to
>>have the
>> > >> > address of the faulty load address during an MC exception for
>>this
>> > >>core.
>> > >> > 
>> > >> > This fix changes the current behavior to fixup the result
>>register
>> > >> > and instruction pointers in the case of a load operation on a
>>faulty
>> > >> > PCI address.
>> > >> > 
>> > >> > The changes are:
>> > >> > - Added the hook to pci machine check handing to the e500mc
>>machine
>> > >>check
>> > >> >   exception handler.
>> > >> > - For the E5500 core, load faulting address from SPRN_DEAR
>>register.
>> > >> >   As mentioned above, this is necessary because the E5500 core
>>does
>> > >>not
>> > >> >   report the fault address in the MCAR register.
>> > >> > 
>> > >> > Cc: Scott Wood 
>> > >> > Signed-off-by: Jojy G Varghese 
>> > >> > [Guenter Roeck: updated description]
>> > >> > Signed-off-by: Guenter Roeck 
>> > >> > Signed-off-by: Guenter Roeck 
>> > >> > ---
>> > >> >  arch/powerpc/kernel/traps.c   | 3 ++-
>> > >> >  arch/powerpc/sysdev/fsl_pci.c | 5 +++++
>> > >> >  2 files changed, 7 insertions(+), 1 deletion(-)
>> > >> > 
>> > >> > diff --git a/arch/powerpc/kernel/traps.c
>>b/arch/powerpc/kernel/traps.c
>> > >> > index 0dc43f9..ecb709b 100644
>> > >> > --- a/arch/powerpc/kernel/traps.c
>> > >> > +++ b/arch/powerpc/kernel/traps.c
>> > >> > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs
>>*regs)
>> > >> >   int recoverable = 1;
>> > >> >  
>> > >> >   if (reason & MCSR_LD) {
>> > >> > - recoverable = fsl_rio_mcheck_exception(regs);
>> > >> > + recoverable = fsl_rio_mcheck_exception(regs) ||
>> > >> > + fsl_pci_mcheck_exception(regs);
>> > >> >   if (recoverable == 1)
>> > >> >   goto silent_out;
>> > >> >   }
>> > >> > diff --git a/arch/powerpc/sysdev/fsl_pci.c
>> > >>b/arch/powerpc/sysdev/fsl_pci.c
>> > >> > index c507767..bdb956b 100644
>> > >> > --- a/arch/powerpc/sysdev/fsl_pci.c
>> > >> > +++ b/arch/powerpc/sysdev/fsl_pci.c
>> > >> > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct
>>pt_regs
>> > >>*regs)
>> > >> >  #endif
>> > >> >   addr += mfspr(SPRN_MCAR);
>> > >> >  
>> > >> > +#ifdef CONFIG_E5500_CPU
>> > >> > + if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
>> > >> > + addr = PFN_PHYS(vmalloc_to_pfn((void 
>> > >> > *)mfspr(SPRN_DEAR)));
>> > >> > +#endif
>> > >> 
>> > >> Kconfig tells you what hardware is supported, not what hardware
>>you're
>> > >> actually running on.
>> 
>> Plus, CONFIG_E5500_CPU may not even be set when running on an e5500, as
>> it is used for selecting GCC optimization settings.  You could have
>> CONFIG_GENERIC_CPU instead.
>> 
>> And the subject says "E500MC / E5500", not just "E5500". :-)
>> 
>> > >Hi Scott,
>> > >
>> > >Good point. Jojy, guess we'll have to check if the CPU is actually an
>> > >E5500.
>> > >Can you look into that ?
>> > 
>> > 
>> > "/proc/cpuinfo" shows the cpu as "e5500". Scott, are you suggesting
>>that
>> > we use a runtime method of determining the cpu type (cpu_spec's
>>cpu_name
>> > for
>> > example).  
>> 
>> Yes, if there's a bug to be worked around, and we don't want to apply
>> the workaround unconditionally, you should use PVR to determine whether
>> you're running on an affected core.
>> 
>> > >> Can we rely on DEAR or is this just a side effect of likely having
>>taken
>> > >> a TLB miss for the address recently?  Perhaps we should use the
>> > >> instruction emulation to determine the effective address instead.
>> > >> 
>> > >> Guenter, is this patch intended to deal with an erratum or are you
>> > >> covering up legitimate errors?
>> > >> 
>> >
>> > >Those are errors related to PCIe hotplug, and are seen with
>>unexpected
>> > >PCIe
>> > >device removals (triggered, for example, by removing power from a
>>PCIe
>> > >adapter).
>> > >The behavior we see on E5500 is quite similar to the same behavior on
>> > >E500:
>> > >If unhandled, the CPU keeps executing the same instruction over and
>>over
>> > >again
>> > >if there is an error on a PCIe access and thus stalls. I don't know
>>if
>> > >this
>> > >is considered an erratum or expected behavior, but it is one we have
>>to
>> > >address
>> > >since we have to be able to handle that condition.
>> 
>> The reason I ask is that the handling for e500 was 

Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-30 Thread Scott Wood
On Tue, 2014-09-30 at 20:15 +, Jojy Varghese wrote:
> 
> On 9/30/14 8:50 AM, "Guenter Roeck"  wrote:
> 
> >On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
> >> Which specific chip and revision did you see this on?  What is the value
> >> in MCSR?
> >> 
> >Jojy can answer that, at least for P5020. We have seen it on P5040 as
> >well,
> >though, so it is not just limited to one chip/revision.
> 
> The specifics are:
> PVR: 0x80240012
> Instruction that causes the MC exception: lwbrx
> The faulty load address is also present in RB. So we could change the
> logic to use that instead of DEAR. What I don't know is if there are
> other cases that also escape the current logic.

Could you find out what MCSR was when that happened?  I'm most
interested in whether MAV was set, but the other bits would be
interesting as well.

-Scott
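
A minimal sketch of the runtime core check discussed earlier in the
thread, as an alternative to #ifdef CONFIG_E5500_CPU. It decodes the PVR
quoted above (0x80240012). The PVR_VER_* values are assumptions recalled
from arch/powerpc/include/asm/reg.h and should be verified; in the kernel
the PVR would come from mfspr(SPRN_PVR), which is replaced here by a
plain argument so the example builds in userspace. core_is_e5500() and
core_is_e500mc() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Upper 16 bits of the PVR identify the core, lower 16 bits the revision. */
#define PVR_VER(pvr) (((pvr) >> 16) & 0xFFFFu)
#define PVR_REV(pvr) ((pvr) & 0xFFFFu)

/* Assumed version fields; verify against asm/reg.h. */
#define PVR_VER_E500MC 0x8023u
#define PVR_VER_E5500  0x8024u

/* Compile-time check that the PVR from the thread decodes as e5500. */
static_assert(PVR_VER(0x80240012u) == PVR_VER_E5500,
	      "PVR 0x80240012 should carry the e5500 version field");

/* Hypothetical helper: true when running on an e5500 core. */
bool core_is_e5500(uint32_t pvr)
{
	return PVR_VER(pvr) == PVR_VER_E5500;
}

/* Hypothetical helper: true when running on an e500mc core. */
bool core_is_e500mc(uint32_t pvr)
{
	return PVR_VER(pvr) == PVR_VER_E500MC;
}
```

A check of this kind depends only on the hardware the kernel is running
on, so it would still work for a kernel built with CONFIG_GENERIC_CPU.
Which cores actually need the fallback is still an open question in this
thread.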



Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-30 Thread Guenter Roeck
On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
> On Mon, 2014-09-29 at 23:03 +, Jojy Varghese wrote:
> > 
> > On 9/29/14 12:06 PM, "Guenter Roeck"  wrote:
> > 
> > >On Mon, Sep 29, 2014 at 01:36:06PM -0500, Scott Wood wrote:
> > >> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> > >> > From: Jojy G Varghese 
> > >> > 
> > >> > For E500MC and E5500, a machine check exception in pci(e) memory space
> > >> > crashes the kernel.
> > >> > 
> > >> > Testing shows that the MCAR(U) register is zero on a MC exception for
> > >>the
> > >> > E5500 core. At the same time, DEAR register has been found to have the
> > >> > address of the faulty load address during an MC exception for this
> > >>core.
> > >> > 
> > >> > This fix changes the current behavior to fixup the result register
> > >> > and instruction pointers in the case of a load operation on a faulty
> > >> > PCI address.
> > >> > 
> > >> > The changes are:
> > >> > - Added the hook to pci machine check handing to the e500mc machine
> > >>check
> > >> >   exception handler.
> > >> > - For the E5500 core, load faulting address from SPRN_DEAR register.
> > >> >   As mentioned above, this is necessary because the E5500 core does
> > >>not
> > >> >   report the fault address in the MCAR register.
> > >> > 
> > >> > Cc: Scott Wood 
> > >> > Signed-off-by: Jojy G Varghese 
> > >> > [Guenter Roeck: updated description]
> > >> > Signed-off-by: Guenter Roeck 
> > >> > Signed-off-by: Guenter Roeck 
> > >> > ---
> > >> >  arch/powerpc/kernel/traps.c   | 3 ++-
> > >> >  arch/powerpc/sysdev/fsl_pci.c | 5 +++++
> > >> >  2 files changed, 7 insertions(+), 1 deletion(-)
> > >> > 
> > >> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > >> > index 0dc43f9..ecb709b 100644
> > >> > --- a/arch/powerpc/kernel/traps.c
> > >> > +++ b/arch/powerpc/kernel/traps.c
> > >> > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
> > >> >int recoverable = 1;
> > >> >  
> > >> >if (reason & MCSR_LD) {
> > >> > -  recoverable = fsl_rio_mcheck_exception(regs);
> > >> > +  recoverable = fsl_rio_mcheck_exception(regs) ||
> > >> > +  fsl_pci_mcheck_exception(regs);
> > >> >if (recoverable == 1)
> > >> >goto silent_out;
> > >> >}
> > >> > diff --git a/arch/powerpc/sysdev/fsl_pci.c
> > >>b/arch/powerpc/sysdev/fsl_pci.c
> > >> > index c507767..bdb956b 100644
> > >> > --- a/arch/powerpc/sysdev/fsl_pci.c
> > >> > +++ b/arch/powerpc/sysdev/fsl_pci.c
> > >> > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs
> > >>*regs)
> > >> >  #endif
> > >> >addr += mfspr(SPRN_MCAR);
> > >> >  
> > >> > +#ifdef CONFIG_E5500_CPU
> > >> > +  if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> > >> > +  addr = PFN_PHYS(vmalloc_to_pfn((void 
> > >> > *)mfspr(SPRN_DEAR)));
> > >> > +#endif
> > >> 
> > >> Kconfig tells you what hardware is supported, not what hardware you're
> > >> actually running on.
> 
> Plus, CONFIG_E5500_CPU may not even be set when running on an e5500, as
> it is used for selecting GCC optimization settings.  You could have
> CONFIG_GENERIC_CPU instead.
> 
> And the subject says "E500MC / E5500", not just "E5500". :-)
> 
> > >Hi Scott,
> > >
> > >Good point. Jojy, guess we'll have to check if the CPU is actually an
> > >E5500.
> > >Can you look into that ?
> > 
> > 
> > "/proc/cpuinfo" shows the cpu as "e5500". Scott, are you suggesting that
> > we use a runtime method of determining the cpu type (cpu_spec's cpu_name
> > for
> > example).  
> 
> Yes, if there's a bug to be worked around, and we don't want to apply
> the workaround unconditionally, you should use PVR to determine whether
> you're running on an affected core.
> 
> > >> Can we rely on DEAR or is this just a side effect of likely having taken
> > >> a TLB miss for the address recently?  Perhaps we should use the
> > >> instruction emulation to determine the effective address instead.
> > >> 
> > >> Guenter, is this patch intended to deal with an erratum or are you
> > >> covering up legitimate errors?
> > >> 
> >
> > >Those are errors related to PCIe hotplug, and are seen with unexpected
> > >PCIe
> > >device removals (triggered, for example, by removing power from a PCIe
> > >adapter).
> > >The behavior we see on E5500 is quite similar to the same behavior on
> > >E500:
> > >If unhandled, the CPU keeps executing the same instruction over and over
> > >again
> > >if there is an error on a PCIe access and thus stalls. I don't know if
> > >this
> > >is considered an erratum or expected behavior, but it is one we have to
> > >address
> > >since we have to be able to handle that condition. 
> 
> The reason I ask is that the handling for e500 was described as an
> erratum workaround.  If it is an erratum it would be nice to know the
> erratum number and the full list of affected chips.
> 
My understanding, which may be wrong, was that this

Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-29 Thread Jojy Varghese


On 9/29/14 12:06 PM, "Guenter Roeck"  wrote:

>On Mon, Sep 29, 2014 at 01:36:06PM -0500, Scott Wood wrote:
>> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
>> > From: Jojy G Varghese 
>> > 
>> > For E500MC and E5500, a machine check exception in pci(e) memory space
>> > crashes the kernel.
>> > 
>> > Testing shows that the MCAR(U) register is zero on a MC exception for
>>the
>> > E5500 core. At the same time, DEAR register has been found to have the
>> > address of the faulty load address during an MC exception for this
>>core.
>> > 
>> > This fix changes the current behavior to fixup the result register
>> > and instruction pointers in the case of a load operation on a faulty
>> > PCI address.
>> > 
>> > The changes are:
>> > - Added the hook to pci machine check handing to the e500mc machine
>>check
>> >   exception handler.
>> > - For the E5500 core, load faulting address from SPRN_DEAR register.
>> >   As mentioned above, this is necessary because the E5500 core does
>>not
>> >   report the fault address in the MCAR register.
>> > 
>> > Cc: Scott Wood 
>> > Signed-off-by: Jojy G Varghese 
>> > [Guenter Roeck: updated description]
>> > Signed-off-by: Guenter Roeck 
>> > Signed-off-by: Guenter Roeck 
>> > ---
>> >  arch/powerpc/kernel/traps.c   | 3 ++-
>> >  arch/powerpc/sysdev/fsl_pci.c | 5 +++++
>> >  2 files changed, 7 insertions(+), 1 deletion(-)
>> > 
>> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>> > index 0dc43f9..ecb709b 100644
>> > --- a/arch/powerpc/kernel/traps.c
>> > +++ b/arch/powerpc/kernel/traps.c
>> > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
>> >int recoverable = 1;
>> >  
>> >if (reason & MCSR_LD) {
>> > -  recoverable = fsl_rio_mcheck_exception(regs);
>> > +  recoverable = fsl_rio_mcheck_exception(regs) ||
>> > +  fsl_pci_mcheck_exception(regs);
>> >if (recoverable == 1)
>> >goto silent_out;
>> >}
>> > diff --git a/arch/powerpc/sysdev/fsl_pci.c
>>b/arch/powerpc/sysdev/fsl_pci.c
>> > index c507767..bdb956b 100644
>> > --- a/arch/powerpc/sysdev/fsl_pci.c
>> > +++ b/arch/powerpc/sysdev/fsl_pci.c
>> > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs
>>*regs)
>> >  #endif
>> >addr += mfspr(SPRN_MCAR);
>> >  
>> > +#ifdef CONFIG_E5500_CPU
>> > +  if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
>> > +  addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
>> > +#endif
>> 
>> Kconfig tells you what hardware is supported, not what hardware you're
>> actually running on.
>> 
>Hi Scott,
>
>Good point. Jojy, guess we'll have to check if the CPU is actually an
>E5500.
>Can you look into that ?


"/proc/cpuinfo" shows the cpu as "e5500". Scott, are you suggesting that
we use a runtime method of determining the cpu type (cpu_spec's cpu_name,
for example)?


>
>> Jia Hongtao, do you know anything about this issue?  Is there an
>> erratum?  What chips are affected by the erratum covered by
>> <http://patchwork.ozlabs.org/patch/240239/>?
>> 
>We already have and use the above patch(es) in our kernel. It works fine
>for E500 (P2020), but does not address E5500 (P5020/P5040).
>
>> Can we rely on DEAR or is this just a side effect of likely having taken
>> a TLB miss for the address recently?  Perhaps we should use the
>> instruction emulation to determine the effective address instead.
>> 
>> Guenter, is this patch intended to deal with an erratum or are you
>> covering up legitimate errors?
>> 
>Those are errors related to PCIe hotplug, and are seen with unexpected PCIe
>device removals (triggered, for example, by removing power from a PCIe adapter).
>The behavior we see on E5500 is quite similar to the same behavior on E500:
>If unhandled, the CPU keeps executing the same instruction over and over again
>if there is an error on a PCIe access and thus stalls. I don't know if this
>is considered an erratum or expected behavior, but it is one we have to address
>since we have to be able to handle that condition. Ultimately, we'll want to
>implement PCIe error handlers for the affected drivers, but that will be a next
>step.

According to the spec, MCAR is supposed to hold the faulty data address,
but for the E5500 core we found that MCAR is zero. You are right that the
DEAR entry could be the result of a TLB miss, but that's the register we
could rely on.

What do you mean by "instruction emulation"? Are you suggesting that we
examine the RD, RS registers for the instruction?



>
>Please let me know if you have a better solution to address this problem.
>
>Thanks,
>Guenter


Thanks
Jojy

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-29 Thread Scott Wood
On Mon, 2014-09-29 at 23:03 +, Jojy Varghese wrote:
> 
> On 9/29/14 12:06 PM, "Guenter Roeck"  wrote:
> 
> >On Mon, Sep 29, 2014 at 01:36:06PM -0500, Scott Wood wrote:
> >> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> >> > From: Jojy G Varghese 
> >> > 
> >> > For E500MC and E5500, a machine check exception in pci(e) memory space
> >> > crashes the kernel.
> >> > 
> >> > Testing shows that the MCAR(U) register is zero on a MC exception for the
> >> > E5500 core. At the same time, DEAR register has been found to have the
> >> > address of the faulty load address during an MC exception for this core.
> >> > 
> >> > This fix changes the current behavior to fixup the result register
> >> > and instruction pointers in the case of a load operation on a faulty
> >> > PCI address.
> >> > 
> >> > The changes are:
> >> > - Added the hook to pci machine check handling to the e500mc machine check
> >> >   exception handler.
> >> > - For the E5500 core, load faulting address from SPRN_DEAR register.
> >> >   As mentioned above, this is necessary because the E5500 core does not
> >> >   report the fault address in the MCAR register.
> >> > 
> >> > Cc: Scott Wood 
> >> > Signed-off-by: Jojy G Varghese 
> >> > [Guenter Roeck: updated description]
> >> > Signed-off-by: Guenter Roeck 
> >> > Signed-off-by: Guenter Roeck 
> >> > ---
> >> >  arch/powerpc/kernel/traps.c   | 3 ++-
> >> >  arch/powerpc/sysdev/fsl_pci.c | 5 +
> >> >  2 files changed, 7 insertions(+), 1 deletion(-)
> >> > 
> >> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> >> > index 0dc43f9..ecb709b 100644
> >> > --- a/arch/powerpc/kernel/traps.c
> >> > +++ b/arch/powerpc/kernel/traps.c
> >> > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
> >> >  int recoverable = 1;
> >> >  
> >> >  if (reason & MCSR_LD) {
> >> > -recoverable = fsl_rio_mcheck_exception(regs);
> >> > +recoverable = fsl_rio_mcheck_exception(regs) ||
> >> > +fsl_pci_mcheck_exception(regs);
> >> >  if (recoverable == 1)
> >> >  goto silent_out;
> >> >  }
> >> > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> >> > index c507767..bdb956b 100644
> >> > --- a/arch/powerpc/sysdev/fsl_pci.c
> >> > +++ b/arch/powerpc/sysdev/fsl_pci.c
> >> > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
> >> >  #endif
> >> >  addr += mfspr(SPRN_MCAR);
> >> >  
> >> > +#ifdef CONFIG_E5500_CPU
> >> > +if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> >> > +addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
> >> > +#endif
> >> 
> >> Kconfig tells you what hardware is supported, not what hardware you're
> >> actually running on.

Plus, CONFIG_E5500_CPU may not even be set when running on an e5500, as
it is used for selecting GCC optimization settings.  You could have
CONFIG_GENERIC_CPU instead.

And the subject says "E500MC / E5500", not just "E5500". :-)

> >Hi Scott,
> >
> >Good point. Jojy, guess we'll have to check if the CPU is actually an E5500.
> >Can you look into that?
> 
> 
> "/proc/cpuinfo" shows the cpu as "e5500". Scott, are you suggesting that
> we use a runtime method of determining the cpu type (cpu_spec's cpu_name,
> for example)?

Yes, if there's a bug to be worked around, and we don't want to apply
the workaround unconditionally, you should use PVR to determine whether
you're running on an affected core.
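
A minimal sketch of such a runtime check, written as plain C so that it can
be compiled outside the kernel. The version values 0x8023 (e500mc) and 0x8024
(e5500) are assumed to match the kernel's PVR_VER_* constants, and
core_needs_dear_fallback() is a hypothetical name; in the kernel the input
would come from mfspr(SPRN_PVR) rather than a function argument.

```c
#include <stdbool.h>
#include <stdint.h>

/* PVR version field values (upper 16 bits of the PVR).
 * Assumption: these mirror the kernel's PVR_VER_E500MC / PVR_VER_E5500. */
#define PVR_VER_E500MC 0x8023u
#define PVR_VER_E5500  0x8024u

/* Extract the version field from a raw PVR value. */
static inline uint32_t pvr_ver(uint32_t pvr)
{
	return (pvr >> 16) & 0xffffu;
}

/* Hypothetical helper: true only when the running core is an e5500,
 * which is the core reported in this thread to leave MCAR at zero. */
static bool core_needs_dear_fallback(uint32_t pvr)
{
	return pvr_ver(pvr) == PVR_VER_E5500;
}
```

Unlike the #ifdef, this decides per running core, so a kernel built with
CONFIG_GENERIC_CPU would still take the workaround path on an e5500.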

> >> Can we rely on DEAR or is this just a side effect of likely having taken
> >> a TLB miss for the address recently?  Perhaps we should use the
> >> instruction emulation to determine the effective address instead.
> >> 
> >> Guenter, is this patch intended to deal with an erratum or are you
> >> covering up legitimate errors?
> >> 
>
> >Those are errors related to PCIe hotplug, and are seen with unexpected PCIe
> >device removals (triggered, for example, by removing power from a PCIe adapter).
> >The behavior we see on E5500 is quite similar to the same behavior on E500:
> >If unhandled, the CPU keeps executing the same instruction over and over again
> >if there is an error on a PCIe access and thus stalls. I don't know if this
> >is considered an erratum or expected behavior, but it is one we have to address
> >since we have to be able to handle that condition.

The reason I ask is that the handling for e500 was described as an
erratum workaround.  If it is an erratum it would be nice to know the
erratum number and the full list of affected chips.

> >Ultimately, we'll want to
> >implement PCIe error handlers for the affected drivers, but that will be a next
> >step.

For now can we at least print a ratelimited error message?  I don't like
the idea of silently ignoring these errors.  I suppose it's a separate
issue from extending the workaround to cover e500mc, though.
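
In the kernel this would normally be a call to pr_err_ratelimited() or
printk_ratelimited(). To illustrate the idea, here is a small userspace model
of an interval/burst limiter. The struct and function names are hypothetical,
and this is only a sketch of the technique, not the kernel's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of an interval/burst rate limiter. */
struct ratelimit {
	uint64_t interval;  /* window length, in caller-defined ticks */
	unsigned burst;     /* messages allowed per window */
	uint64_t begin;     /* start of the current window */
	unsigned printed;   /* messages emitted in the current window */
	unsigned missed;    /* messages suppressed so far */
};

/* Return true if a message may be emitted at time 'now'. */
static bool ratelimit_ok(struct ratelimit *rl, uint64_t now)
{
	/* Start a new window once the current one has expired. */
	if (now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->missed++;
	return false;
}
```

With interval = 5 and burst = 2, the first two machine checks in a window
would be reported and the rest only counted, so a removed adapter that is
polled in a loop cannot flood the log.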

Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-29 Thread Guenter Roeck
On Mon, Sep 29, 2014 at 01:36:06PM -0500, Scott Wood wrote:
> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> > From: Jojy G Varghese 
> > 
> > For E500MC and E5500, a machine check exception in pci(e) memory space
> > crashes the kernel.
> > 
> > Testing shows that the MCAR(U) register is zero on a MC exception for the
> > E5500 core. At the same time, DEAR register has been found to have the
> > address of the faulty load address during an MC exception for this core.
> > 
> > This fix changes the current behavior to fixup the result register
> > and instruction pointers in the case of a load operation on a faulty
> > PCI address.
> > 
> > The changes are:
> > - Added the hook to pci machine check handling to the e500mc machine check
> >   exception handler.
> > - For the E5500 core, load faulting address from SPRN_DEAR register.
> >   As mentioned above, this is necessary because the E5500 core does not
> >   report the fault address in the MCAR register.
> > 
> > Cc: Scott Wood 
> > Signed-off-by: Jojy G Varghese 
> > [Guenter Roeck: updated description]
> > Signed-off-by: Guenter Roeck 
> > Signed-off-by: Guenter Roeck 
> > ---
> >  arch/powerpc/kernel/traps.c   | 3 ++-
> >  arch/powerpc/sysdev/fsl_pci.c | 5 +
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > index 0dc43f9..ecb709b 100644
> > --- a/arch/powerpc/kernel/traps.c
> > +++ b/arch/powerpc/kernel/traps.c
> > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
> > int recoverable = 1;
> >  
> > if (reason & MCSR_LD) {
> > -   recoverable = fsl_rio_mcheck_exception(regs);
> > +   recoverable = fsl_rio_mcheck_exception(regs) ||
> > +   fsl_pci_mcheck_exception(regs);
> > if (recoverable == 1)
> > goto silent_out;
> > }
> > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> > index c507767..bdb956b 100644
> > --- a/arch/powerpc/sysdev/fsl_pci.c
> > +++ b/arch/powerpc/sysdev/fsl_pci.c
> > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
> >  #endif
> > addr += mfspr(SPRN_MCAR);
> >  
> > +#ifdef CONFIG_E5500_CPU
> > +   if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> > +   addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
> > +#endif
> 
> Kconfig tells you what hardware is supported, not what hardware you're
> actually running on.
> 
Hi Scott,

Good point. Jojy, guess we'll have to check if the CPU is actually an E5500.
Can you look into that ?

> Jia Hongtao, do you know anything about this issue?  Is there an
> erratum?  What chips are affected by the erratum covered by
> ?
> 
We already have and use the above patch(es) in our kernel. It works fine
for E500 (P2020), but does not address E5500 (P5020/P5040).

> Can we rely on DEAR or is this just a side effect of likely having taken
> a TLB miss for the address recently?  Perhaps we should use the
> instruction emulation to determine the effective address instead.
> 
> Guenter, is this patch intended to deal with an erratum or are you
> covering up legitimate errors?
> 
Those are errors related to PCIe hotplug, and are seen with unexpected PCIe
device removals (triggered, for example, by removing power from a PCIe adapter).
The behavior we see on E5500 is quite similar to the same behavior on E500:
If unhandled, the CPU keeps executing the same instruction over and over again
if there is an error on a PCIe access and thus stalls. I don't know if this
is considered an erratum or expected behavior, but it is one we have to address
since we have to be able to handle that condition. Ultimately, we'll want to
implement PCIe error handlers for the affected drivers, but that will be a next
step.

Please let me know if you have a better solution to address this problem.

Thanks,
Guenter

Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-29 Thread Scott Wood
On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> From: Jojy G Varghese 
> 
> For E500MC and E5500, a machine check exception in pci(e) memory space
> crashes the kernel.
> 
> Testing shows that the MCAR(U) register is zero on a MC exception for the
> E5500 core. At the same time, DEAR register has been found to have the
> address of the faulty load address during an MC exception for this core.
> 
> This fix changes the current behavior to fixup the result register
> and instruction pointers in the case of a load operation on a faulty
> PCI address.
> 
> The changes are:
> - Added the hook to pci machine check handling to the e500mc machine check
>   exception handler.
> - For the E5500 core, load faulting address from SPRN_DEAR register.
>   As mentioned above, this is necessary because the E5500 core does not
>   report the fault address in the MCAR register.
> 
> Cc: Scott Wood 
> Signed-off-by: Jojy G Varghese 
> [Guenter Roeck: updated description]
> Signed-off-by: Guenter Roeck 
> Signed-off-by: Guenter Roeck 
> ---
>  arch/powerpc/kernel/traps.c   | 3 ++-
>  arch/powerpc/sysdev/fsl_pci.c | 5 +
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 0dc43f9..ecb709b 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
>   int recoverable = 1;
>  
>   if (reason & MCSR_LD) {
> - recoverable = fsl_rio_mcheck_exception(regs);
> + recoverable = fsl_rio_mcheck_exception(regs) ||
> + fsl_pci_mcheck_exception(regs);
>   if (recoverable == 1)
>   goto silent_out;
>   }
> diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> index c507767..bdb956b 100644
> --- a/arch/powerpc/sysdev/fsl_pci.c
> +++ b/arch/powerpc/sysdev/fsl_pci.c
> @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
>  #endif
>   addr += mfspr(SPRN_MCAR);
>  
> +#ifdef CONFIG_E5500_CPU
> + if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> + addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
> +#endif

Kconfig tells you what hardware is supported, not what hardware you're
actually running on.

Jia Hongtao, do you know anything about this issue?  Is there an
erratum?  What chips are affected by the erratum covered by
?

Can we rely on DEAR or is this just a side effect of likely having taken
a TLB miss for the address recently?  Perhaps we should use the
instruction emulation to determine the effective address instead.
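
As a rough sketch of what that could look like: decode the faulting load and
recompute its effective address from the saved registers instead of trusting
DEAR. The example below handles only a few non-update D-form loads (lwz, lbz,
lhz, lha), where EA = (RA|0) + sign-extended D. dform_load_ea() and the gpr[]
array are hypothetical stand-ins for the instruction word at NIP and
regs->gpr[]; indexed (X-form) and update-form loads are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute the effective address of a D-form load.
 * Returns false if 'insn' is not one of the handled opcodes. */
static bool dform_load_ea(uint32_t insn, const uint64_t gpr[32], uint64_t *ea)
{
	uint32_t opcode = insn >> 26;          /* primary opcode, top 6 bits */
	uint32_t ra = (insn >> 16) & 0x1fu;    /* base register field */
	/* Sign-extend the 16-bit displacement without relying on
	 * implementation-defined narrowing conversions. */
	int32_t d = (int32_t)((insn & 0xffffu) ^ 0x8000u) - 0x8000;
	uint64_t base;

	switch (opcode) {
	case 32: /* lwz */
	case 34: /* lbz */
	case 40: /* lhz */
	case 42: /* lha */
		break;
	default:
		return false;
	}

	/* RA == 0 means "use the value 0", not the contents of r0. */
	base = ra ? gpr[ra] : 0;
	*ea = base + (uint64_t)(int64_t)d;
	return true;
}
```

The result is an effective address, so it would still need translating to a
physical address before being compared against the PCI memory windows.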

Guenter, is this patch intended to deal with an erratum or are you
covering up legitimate errors?

-Scott



[PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-29 Thread Guenter Roeck
From: Jojy G Varghese 

For E500MC and E5500, a machine check exception in pci(e) memory space
crashes the kernel.

Testing shows that the MCAR(U) register is zero on a MC exception for the
E5500 core. At the same time, the DEAR register has been found to hold the
address of the faulting load during an MC exception for this core.

This fix changes the current behavior to fix up the result register
and instruction pointer in the case of a load operation on a faulty
PCI address.

The changes are:
- Added the hook to pci machine check handling to the e500mc machine check
  exception handler.
- For the E5500 core, load faulting address from SPRN_DEAR register.
  As mentioned above, this is necessary because the E5500 core does not
  report the fault address in the MCAR register.

Cc: Scott Wood 
Signed-off-by: Jojy G Varghese 
[Guenter Roeck: updated description]
Signed-off-by: Guenter Roeck 
Signed-off-by: Guenter Roeck 
---
 arch/powerpc/kernel/traps.c   | 3 ++-
 arch/powerpc/sysdev/fsl_pci.c | 5 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 0dc43f9..ecb709b 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
int recoverable = 1;
 
if (reason & MCSR_LD) {
-   recoverable = fsl_rio_mcheck_exception(regs);
+   recoverable = fsl_rio_mcheck_exception(regs) ||
+   fsl_pci_mcheck_exception(regs);
if (recoverable == 1)
goto silent_out;
}
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index c507767..bdb956b 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
 #endif
addr += mfspr(SPRN_MCAR);
 
+#ifdef CONFIG_E5500_CPU
+   if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
+   addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
+#endif
+
if (is_in_pci_mem_space(addr)) {
if (user_mode(regs)) {
pagefault_disable();
-- 
1.9.1
