Re: Regression: Linux v5.15+ does not boot on Freescale P2020
On Tue, Aug 2, 2022 at 8:47 AM Christophe Leroy wrote: > Le 26/07/2022 à 15:44, Segher Boessenkool a écrit : > > Whoops :-) We need fixes for processor implementation bugs all the > > time of course, but this is a massive *design* bug. I'm surprised this > > CPU still works as well as it does! > > "Programming Environments Manual for 32-Bit Implementations of the > PowerPC™ Architecture" §4.1.2.2.2 says: "Invalid forms result when a bit > or operand is coded incorrectly, for example, or when a reserved bit > (shown as ‘0’) is coded as ‘1’." > > > > Also people using an SMP kernel on older cores should see the problem, > > no? Or is that patched out? Or does this use case never happen :-) It doesn't get patched out, I think it's just not a combination that anyone tests on. The few defconfig files for SMP 85xx tend to be e500mc (which is incompatible with the older cores). > Maybe unlike e500, older cores ignore the EH bit and don't mind when > it's set to 1 ? Pretty sure this is the case. My interpretation is that Freescale and IBM just interpreted the spec differently at the time and were not even aware of the difference until it was too late. Arnd
Re: Regression: Linux v5.15+ does not boot on Freescale P2020
Le 26/07/2022 à 15:44, Segher Boessenkool a écrit : > On Tue, Jul 26, 2022 at 11:02:59AM +0200, Arnd Bergmann wrote: >> On Tue, Jul 26, 2022 at 10:34 AM Pali Rohár wrote: >>> On Monday 25 July 2022 16:54:16 Segher Boessenkool wrote: The EH field in larx insns is new since ISA 2.05, and some ISA 1.x cpu implementations actually raise an illegal insn exception on EH=1. It appears P2020 is one of those. >>> >>> P2020 has e500 cores. e500 cores uses ISA 2.03. So this may be reason. >>> But in official Freescale/NXP documentation for e500 is documented that >>> lwarx supports also eh=1. Maybe it is not really supported. >>> https://www.nxp.com/files-static/32bit/doc/ref_manual/EREF_RM.pdf (page 562) > > (page 6-186) > >>> At least there is NOTE: >>> Some older processors may treat EH=1 as an illegal instruction. > > And the architecture says >Programming Note >Warning: On some processors that comply with versions of the >architecture that precede Version 2.00, executing a Load And Reserve >instruction in which EH = 1 will cause the illegal instruction error >handler to be invoked. > >> In commit d6ccb1f55ddf ("powerpc/85xx: Make sure lwarx hint isn't set on >> ppc32") >> this was clarified to affect (all?) e500v1/v2, > >e500v1/v2 based chips will treat any reserved field being set in an >opcode as illegal. > > while the architecture says > >Reserved fields in instructions are ignored by the processor. > > Whoops :-) We need fixes for processor implementation bugs all the > time of course, but this is a massive *design* bug. I'm surprised this > CPU still works as well as it does! "Programming Environments Manual for 32-Bit Implementations of the PowerPC™ Architecture" §4.1.2.2.2 says: "Invalid forms result when a bit or operand is coded incorrectly, for example, or when a reserved bit (shown as ‘0’) is coded as ‘1’." > > Even the venerable PEM (last updated in 1997) shows the EH field as > reserved, always treated as 0. 
> >> this one apparently >> fixed it before, >> but Christophe's commit effectively reverted that change. >> >> I think only the simple_spinlock.h file actually uses EH=1 > > That's right afaics. > >> and this is not >> included in non-SMP kernels, so presumably the only affected machines were >> the rare dual-core e500v2 ones (p2020, MPC8572, bsc9132), which would >> explain why nobody noticed for the past 9 months. > > Also people using an SMP kernel on older cores should see the problem, > no? Or is that patched out? Or does this use case never happen :-) Maybe, unlike e500, older cores ignore the EH bit and don't mind when it's set to 1? Christophe
Re: Regression: Linux v5.15+ does not boot on Freescale P2020
Le 26/07/2022 à 10:34, Pali Rohár a écrit : > On Monday 25 July 2022 16:54:16 Segher Boessenkool wrote: >> On Mon, Jul 25, 2022 at 10:10:09PM +0200, Pali Rohár wrote: >>> On Monday 25 July 2022 16:20:49 Christophe Leroy wrote: >>> Now I did again clean test with same Debian 10 cross compiler. >>> >>> $ git clone >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git && cd >>> linux >>> $ git checkout v5.15 >>> $ make mpc85xx_smp_defconfig ARCH=powerpc >>> CROSS_COMPILE=powerpc-linux-gnuspe- >>> $ make vmlinux ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnuspe- >>> $ cp -a vmlinux vmlinux.v5.15 >>> $ git revert 9401f4e46cf6965e23738f70e149172344a01eef >>> $ make vmlinux ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnuspe- >>> $ cp -a vmlinux vmlinux.revert >>> $ powerpc-linux-gnuspe-objdump -d vmlinux.revert > vmlinux.revert.dump >>> $ powerpc-linux-gnuspe-objdump -d vmlinux.v5.15 > vmlinux.v5.15.dump >>> $ diff -Naurp vmlinux.v5.15.dump vmlinux.revert.dump >>> >>> And there are: >>> >>> -c000c304: 7d 20 f8 29 lwarx r9,0,r31,1 >>> +c000c304: 7d 20 f8 28 lwarx r9,0,r31 >>> >>> I guess it must be reproducible this issue as I'm using regular >>> toolchain from distribution. >> >> The kernel had >> >> #define PPC_RAW_LWARX(t, a, b, eh) (0x7c28 | ___PPC_RT(t) | >> ___PPC_RA(a) | ___PPC_RB(b) | __PPC_EH(eh)) >> >> and >> >> #define PPC_LWARX(t, a, b, eh) stringify_in_c(.long PPC_RAW_LWARX(t, a, b, >> eh)) >> >> and >> >> #ifdef CONFIG_PPC64 >> #define __PPC_EH(eh)(((eh) & 0x1) << 0) >> #else >> #define __PPC_EH(eh)0 >> #endif >> >> but Christophe's 9401f4e46cf6 changed >> >> -"1:" PPC_LWARX(%0,0,%2,1) "\n\ >> +"1:lwarx %0,0,%2,1\n\ >> >> no longer checking CONFIG_PPC64. That appears to be the bug. > > Nice catch! 
> > Now I have tried to apply following change on master (without reverting > anything) > > diff --git a/arch/powerpc/include/asm/simple_spinlock.h > b/arch/powerpc/include/asm/simple_spinlock.h > index 7ae6aeef8464..72d3657fd2f7 100644 > --- a/arch/powerpc/include/asm/simple_spinlock.h > +++ b/arch/powerpc/include/asm/simple_spinlock.h > @@ -51,7 +51,7 @@ static inline unsigned long > __arch_spin_trylock(arch_spinlock_t *lock) > > token = LOCK_TOKEN; > __asm__ __volatile__( > -"1: lwarx %0,0,%2,1\n\ > +"1: lwarx %0,0,%2,0\n\ > cmpwi 0,%0,0\n\ > bne-2f\n\ > stwcx. %1,0,%2\n\ > @@ -158,7 +158,7 @@ static inline long __arch_read_trylock(arch_rwlock_t *rw) > long tmp; > > __asm__ __volatile__( > -"1: lwarx %0,0,%1,1\n" > +"1: lwarx %0,0,%1,0\n" > __DO_SIGN_EXTEND > " addic. %0,%0,1\n\ > ble-2f\n" > @@ -182,7 +182,7 @@ static inline long __arch_write_trylock(arch_rwlock_t *rw) > > token = WRLOCK_TOKEN; > __asm__ __volatile__( > -"1: lwarx %0,0,%2,1\n\ > +"1: lwarx %0,0,%2,0\n\ > cmpwi 0,%0,0\n\ > bne-2f\n" > " stwcx. %1,0,%2\n\ > > and with this change, objdump showed exactly same result as if I revert > that problematic commit on top of master branch. > > I guess that simple_spinlock.h should be fixed to pass 1 to lwarx for > CONFIG_PPC64 and 0 otherwise. > > Christophe, are you going to look at it? > Yes I will, but next week at the earliest. Christophe
Re: Regression: Linux v5.15+ does not boot on Freescale P2020
On Tue, Jul 26, 2022 at 04:01:00PM +0200, Pali Rohár wrote: > On Tuesday 26 July 2022 08:44:05 Segher Boessenkool wrote: > > And the architecture says > > Programming Note > > Warning: On some processors that comply with versions of the > > architecture that precede Version 2.00 > > But e500v2 is 2.03 and not older than 2.00... Yes. And it does not implement reserved fields in instructions (*any* reserved fields in instructions, apparently!) correctly at all. > > e500v1/v2 based chips will treat any reserved field being set in an > > opcode as illegal. > > > > while the architecture says > > > > Reserved fields in instructions are ignored by the processor. > > > > Whoops :-) We need fixes for processor implementation bugs all the > > time of course, but this is a massive *design* bug. > > I looked also in e500v2 and P2020 errata documents there is nothing > mentioned about eh flag. But it looks like a bug. It is a bug if it does this for any reserved field (and apparently it does it for all of them). > > Also people using an SMP kernel on older cores should see the problem, > > no? > > Probably yes. > > But most people on these machines are using stable LTS kernels and do > not upgrade too often. Yeah. > So you need to wait longer time to see people starting reporting such > bugs. Need to wait at least when v4.14 and v4.19 LTS versions stops > receiving updates. v4.19 is used in Debian 10 (oldstable) and v5.4 is > used by current OpenWRT. Both distributions are still supported, so > users have not migrated to new v5.15 problematic kernel yet. That's not a reasonable timeline for kernel development of course. We see the same thing with the compiler... Although GCC has a much slower release cadence (one new major version every year), it often takes two or three or more years before we get bug reports that something was broken. If stuff isn't tested, we cannot really support it at all. Segher
Re: Regression: Linux v5.15+ does not boot on Freescale P2020
On Tuesday 26 July 2022 08:44:05 Segher Boessenkool wrote: > On Tue, Jul 26, 2022 at 11:02:59AM +0200, Arnd Bergmann wrote: > > On Tue, Jul 26, 2022 at 10:34 AM Pali Rohár wrote: > > > On Monday 25 July 2022 16:54:16 Segher Boessenkool wrote: > > > > The EH field in larx insns is new since ISA 2.05, and some ISA 1.x cpu > > > > implementations actually raise an illegal insn exception on EH=1. It > > > > appears P2020 is one of those. > > > > > > P2020 has e500 cores. e500 cores uses ISA 2.03. So this may be reason. > > > But in official Freescale/NXP documentation for e500 is documented that > > > lwarx supports also eh=1. Maybe it is not really supported. > > > https://www.nxp.com/files-static/32bit/doc/ref_manual/EREF_RM.pdf (page > > > 562) > > (page 6-186) > > > > At least there is NOTE: > > > Some older processors may treat EH=1 as an illegal instruction. > > And the architecture says > Programming Note > Warning: On some processors that comply with versions of the > architecture that precede Version 2.00 But e500v2 implements ISA 2.03, which is not older than 2.00... > executing a Load And Reserve > instruction in which EH = 1 will cause the illegal instruction error > handler to be invoked. > > > In commit d6ccb1f55ddf ("powerpc/85xx: Make sure lwarx hint isn't set on > > ppc32") > > this was clarified to affect (all?) e500v1/v2, > > e500v1/v2 based chips will treat any reserved field being set in an > opcode as illegal. > > while the architecture says > Reserved fields in instructions are ignored by the processor. > > Whoops :-) We need fixes for processor implementation bugs all the > time of course, but this is a massive *design* bug. I also looked in the e500v2 and P2020 errata documents, and there is nothing mentioned about the EH flag. But it looks like a bug. > I'm surprised this > CPU still works as well as it does! > > Even the venerable PEM (last updated in 1997) shows the EH field as > reserved, always treated as 0.
> > > this one apparently > > fixed it before, > > but Christophe's commit effectively reverted that change. > > > > I think only the simple_spinlock.h file actually uses EH=1 > > That's right afaics. > > > and this is not > > included in non-SMP kernels, so presumably the only affected machines were > > the rare dual-core e500v2 ones (p2020, MPC8572, bsc9132), which would > > explain why nobody noticed for the past 9 months. > > Also people using an SMP kernel on older cores should see the problem, > no? Probably yes. But most people on these machines use stable LTS kernels and do not upgrade very often, so it takes longer before people start reporting such bugs, probably not until the v4.14 and v4.19 LTS branches stop receiving updates. v4.19 is used by Debian 10 (oldstable) and v5.4 by the current OpenWRT. Both distributions are still supported, so users have not yet migrated to the problematic v5.15 kernel. > Or is that patched out? Or does this use case never happen :-) > > > Segher
Re: Regression: Linux v5.15+ does not boot on Freescale P2020
On Tue, Jul 26, 2022 at 11:02:59AM +0200, Arnd Bergmann wrote: > On Tue, Jul 26, 2022 at 10:34 AM Pali Rohár wrote: > > On Monday 25 July 2022 16:54:16 Segher Boessenkool wrote: > > > The EH field in larx insns is new since ISA 2.05, and some ISA 1.x cpu > > > implementations actually raise an illegal insn exception on EH=1. It > > > appears P2020 is one of those. > > > > P2020 has e500 cores. e500 cores uses ISA 2.03. So this may be reason. > > But in official Freescale/NXP documentation for e500 is documented that > > lwarx supports also eh=1. Maybe it is not really supported. > > https://www.nxp.com/files-static/32bit/doc/ref_manual/EREF_RM.pdf (page 562) (page 6-186) > > At least there is NOTE: > > Some older processors may treat EH=1 as an illegal instruction. And the architecture says Programming Note Warning: On some processors that comply with versions of the architecture that precede Version 2.00, executing a Load And Reserve instruction in which EH = 1 will cause the illegal instruction error handler to be invoked. > In commit d6ccb1f55ddf ("powerpc/85xx: Make sure lwarx hint isn't set on > ppc32") > this was clarified to affect (all?) e500v1/v2, e500v1/v2 based chips will treat any reserved field being set in an opcode as illegal. while the architecture says Reserved fields in instructions are ignored by the processor. Whoops :-) We need fixes for processor implementation bugs all the time of course, but this is a massive *design* bug. I'm surprised this CPU still works as well as it does! Even the venerable PEM (last updated in 1997) shows the EH field as reserved, always treated as 0. > this one apparently > fixed it before, > but Christophe's commit effectively reverted that change. > > I think only the simple_spinlock.h file actually uses EH=1 That's right afaics. 
> and this is not > included in non-SMP kernels, so presumably the only affected machines were > the rare dual-core e500v2 ones (p2020, MPC8572, bsc9132), which would > explain why nobody noticed for the past 9 months. Also people using an SMP kernel on older cores should see the problem, no? Or is that patched out? Or does this use case never happen :-) Segher
Re: Regression: Linux v5.15+ does not boot on Freescale P2020
On Tue, Jul 26, 2022 at 10:34 AM Pali Rohár wrote: > On Monday 25 July 2022 16:54:16 Segher Boessenkool wrote: > > On Mon, Jul 25, 2022 at 10:10:09PM +0200, Pali Rohár wrote: > > > On Monday 25 July 2022 16:20:49 Christophe Leroy wrote: > > > Now I did again clean test with same Debian 10 cross compiler. > > > > > > $ git clone > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git && cd > > > linux > > > $ git checkout v5.15 > > > $ make mpc85xx_smp_defconfig ARCH=powerpc > > > CROSS_COMPILE=powerpc-linux-gnuspe- > > > $ make vmlinux ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnuspe- > > > $ cp -a vmlinux vmlinux.v5.15 > > > $ git revert 9401f4e46cf6965e23738f70e149172344a01eef > > > $ make vmlinux ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnuspe- > > > $ cp -a vmlinux vmlinux.revert > > > $ powerpc-linux-gnuspe-objdump -d vmlinux.revert > vmlinux.revert.dump > > > $ powerpc-linux-gnuspe-objdump -d vmlinux.v5.15 > vmlinux.v5.15.dump > > > $ diff -Naurp vmlinux.v5.15.dump vmlinux.revert.dump > > > > > > And there are: > > > > > > -c000c304: 7d 20 f8 29 lwarx r9,0,r31,1 > > > +c000c304: 7d 20 f8 28 lwarx r9,0,r31 > > > > > > I guess it must be reproducible this issue as I'm using regular > > > toolchain from distribution. > > > > > The EH field in larx insns is new since ISA 2.05, and some ISA 1.x cpu > > implementations actually raise an illegal insn exception on EH=1. It > > appears P2020 is one of those. > > > > P2020 has e500 cores. e500 cores uses ISA 2.03. So this may be reason. > But in official Freescale/NXP documentation for e500 is documented that > lwarx supports also eh=1. Maybe it is not really supported. > https://www.nxp.com/files-static/32bit/doc/ref_manual/EREF_RM.pdf (page 562) > At least there is NOTE: > Some older processors may treat EH=1 as an illegal instruction. In commit d6ccb1f55ddf ("powerpc/85xx: Make sure lwarx hint isn't set on ppc32") this was clarified to affect (all?) 
e500v1/v2, this one apparently fixed it before, but Christophe's commit effectively reverted that change. I think only the simple_spinlock.h file actually uses EH=1 and this is not included in non-SMP kernels, so presumably the only affected machines were the rare dual-core e500v2 ones (p2020, MPC8572, bsc9132), which would explain why nobody noticed for the past 9 months. Arnd
Re: Regression: Linux v5.15+ does not boot on Freescale P2020
On Monday 25 July 2022 16:54:16 Segher Boessenkool wrote: > On Mon, Jul 25, 2022 at 10:10:09PM +0200, Pali Rohár wrote: > > On Monday 25 July 2022 16:20:49 Christophe Leroy wrote: > > Now I did again clean test with same Debian 10 cross compiler. > > > > $ git clone > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git && cd > > linux > > $ git checkout v5.15 > > $ make mpc85xx_smp_defconfig ARCH=powerpc > > CROSS_COMPILE=powerpc-linux-gnuspe- > > $ make vmlinux ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnuspe- > > $ cp -a vmlinux vmlinux.v5.15 > > $ git revert 9401f4e46cf6965e23738f70e149172344a01eef > > $ make vmlinux ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnuspe- > > $ cp -a vmlinux vmlinux.revert > > $ powerpc-linux-gnuspe-objdump -d vmlinux.revert > vmlinux.revert.dump > > $ powerpc-linux-gnuspe-objdump -d vmlinux.v5.15 > vmlinux.v5.15.dump > > $ diff -Naurp vmlinux.v5.15.dump vmlinux.revert.dump > > > > And there are: > > > > -c000c304: 7d 20 f8 29 lwarx r9,0,r31,1 > > +c000c304: 7d 20 f8 28 lwarx r9,0,r31 > > > > I guess it must be reproducible this issue as I'm using regular > > toolchain from distribution. > > The kernel had > > #define PPC_RAW_LWARX(t, a, b, eh) (0x7c28 | ___PPC_RT(t) | > ___PPC_RA(a) | ___PPC_RB(b) | __PPC_EH(eh)) > > and > > #define PPC_LWARX(t, a, b, eh) stringify_in_c(.long PPC_RAW_LWARX(t, a, b, > eh)) > > and > > #ifdef CONFIG_PPC64 > #define __PPC_EH(eh)(((eh) & 0x1) << 0) > #else > #define __PPC_EH(eh)0 > #endif > > but Christophe's 9401f4e46cf6 changed > > -"1:" PPC_LWARX(%0,0,%2,1) "\n\ > +"1:lwarx %0,0,%2,1\n\ > > no longer checking CONFIG_PPC64. That appears to be the bug. Nice catch! 
Now I have tried to apply the following change on master (without reverting anything):

diff --git a/arch/powerpc/include/asm/simple_spinlock.h b/arch/powerpc/include/asm/simple_spinlock.h
index 7ae6aeef8464..72d3657fd2f7 100644
--- a/arch/powerpc/include/asm/simple_spinlock.h
+++ b/arch/powerpc/include/asm/simple_spinlock.h
@@ -51,7 +51,7 @@ static inline unsigned long __arch_spin_trylock(arch_spinlock_t *lock)
 
 	token = LOCK_TOKEN;
 	__asm__ __volatile__(
-"1:	lwarx	%0,0,%2,1\n\
+"1:	lwarx	%0,0,%2,0\n\
 	cmpwi	0,%0,0\n\
 	bne-	2f\n\
 	stwcx.	%1,0,%2\n\
@@ -158,7 +158,7 @@ static inline long __arch_read_trylock(arch_rwlock_t *rw)
 	long tmp;
 
 	__asm__ __volatile__(
-"1:	lwarx	%0,0,%1,1\n"
+"1:	lwarx	%0,0,%1,0\n"
 	__DO_SIGN_EXTEND
 "	addic.	%0,%0,1\n\
 	ble-	2f\n"
@@ -182,7 +182,7 @@ static inline long __arch_write_trylock(arch_rwlock_t *rw)
 
 	token = WRLOCK_TOKEN;
 	__asm__ __volatile__(
-"1:	lwarx	%0,0,%2,1\n\
+"1:	lwarx	%0,0,%2,0\n\
 	cmpwi	0,%0,0\n\
 	bne-	2f\n"
 "	stwcx.	%1,0,%2\n\

With this change, objdump showed exactly the same result as when I revert the problematic commit on top of the master branch.

I guess simple_spinlock.h should be fixed to pass 1 to lwarx for CONFIG_PPC64 and 0 otherwise.

Christophe, are you going to look at it?

> The EH field in larx insns is new since ISA 2.05, and some ISA 1.x cpu
> implementations actually raise an illegal insn exception on EH=1. It
> appears P2020 is one of those.
>
>
> Segher

P2020 has e500 cores, and e500 cores use ISA 2.03, so this may be the reason. But the official Freescale/NXP documentation for e500 states that lwarx also supports EH=1. Maybe it is not really supported.
https://www.nxp.com/files-static/32bit/doc/ref_manual/EREF_RM.pdf (page 562)
At least there is a NOTE:
Some older processors may treat EH=1 as an illegal instruction.
Re: Regression: Linux v5.15+ does not boot on Freescale P2020
On Mon, Jul 25, 2022 at 10:10:09PM +0200, Pali Rohár wrote:
> On Monday 25 July 2022 16:20:49 Christophe Leroy wrote:
> Now I did again clean test with same Debian 10 cross compiler.
>
> $ git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git && cd linux
> $ git checkout v5.15
> $ make mpc85xx_smp_defconfig ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnuspe-
> $ make vmlinux ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnuspe-
> $ cp -a vmlinux vmlinux.v5.15
> $ git revert 9401f4e46cf6965e23738f70e149172344a01eef
> $ make vmlinux ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnuspe-
> $ cp -a vmlinux vmlinux.revert
> $ powerpc-linux-gnuspe-objdump -d vmlinux.revert > vmlinux.revert.dump
> $ powerpc-linux-gnuspe-objdump -d vmlinux.v5.15 > vmlinux.v5.15.dump
> $ diff -Naurp vmlinux.v5.15.dump vmlinux.revert.dump
>
> And there are:
>
> -c000c304: 7d 20 f8 29 lwarx r9,0,r31,1
> +c000c304: 7d 20 f8 28 lwarx r9,0,r31
>
> I guess it must be reproducible this issue as I'm using regular
> toolchain from distribution.

The kernel had

#define PPC_RAW_LWARX(t, a, b, eh) (0x7c000028 | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b) | __PPC_EH(eh))

and

#define PPC_LWARX(t, a, b, eh) stringify_in_c(.long PPC_RAW_LWARX(t, a, b, eh))

and

#ifdef CONFIG_PPC64
#define __PPC_EH(eh) (((eh) & 0x1) << 0)
#else
#define __PPC_EH(eh) 0
#endif

but Christophe's 9401f4e46cf6 changed

-"1:" PPC_LWARX(%0,0,%2,1) "\n\
+"1: lwarx %0,0,%2,1\n\

so EH=1 is no longer gated on CONFIG_PPC64. That appears to be the bug.

The EH field in larx insns is new since ISA 2.05, and some ISA 1.x cpu
implementations actually raise an illegal insn exception on EH=1. It
appears P2020 is one of those.

Segher
Re: Regression: Linux v5.15+ does not boot on Freescale P2020
On Monday 25 July 2022 16:20:49 Christophe Leroy wrote: > Le 25/07/2022 à 14:52, Pali Rohár a écrit : > > On Monday 25 July 2022 18:20:01 Michael Ellerman wrote: > >> Pali Rohár writes: > >>> On Saturday 23 July 2022 14:42:22 Christophe Leroy wrote: > Le 22/07/2022 à 11:09, Pali Rohár a écrit : > > Trying to boot mainline Linux kernel v5.15+, including current version > > from master branch, on Freescale P2020 does not work. Kernel does not > > print anything to serial console, seems that it does not work and after > > timeout watchdog reset the board. > > Can you provide more information ? Which defconfig or .config, which > version of gcc, etc ... ? > >>> > >>> I used default defconfig for mpc85xx with gcc 8, compilation for e500 > >>> cores. > >>> > >>> If you need exact .config content I can send it during week. > >>> > > I run git bisect and it found following commit: > > > > 9401f4e46cf6965e23738f70e149172344a01eef is the first bad commit > > commit 9401f4e46cf6965e23738f70e149172344a01eef > > Author: Christophe Leroy > > Date: Tue Mar 2 08:48:11 2021 + > > > > powerpc: Use lwarx/ldarx directly instead of PPC_LWARX/LDARX > > macros > > > > Force the eh flag at 0 on PPC32. > > > > Signed-off-by: Christophe Leroy > > Signed-off-by: Michael Ellerman > > Link: > > https://lore.kernel.org/r/1fc81f07cabebb875b963e295408cc3dd38c8d85.1614674882.git.christophe.le...@csgroup.eu > > > > :04 04 fe6747e45736dfcba74914a9445e5f70f5120600 > > 96358d08b65d3200928a973efb5b969b3d45f2b0 M arch > > > > > > If I revert this commit then kernel boots correctly. It also boots fine > > if I revert this commit on top of master branch. > > > > Freescale P2020 has two 32-bit e500 powerpc cores. > > > > Any idea why above commit is causing crash of the kernel? And why it is > > needed? Could eh flag set to 0 cause deadlock? > > Setting the eh flag to 0 is not supposed to be a change introduced by > that commit. 
Indeed that commit is not supposed to change anything at > all in the generated code. > >>> > >>> My understanding of that commit is that it changed eh flag parameter > >>> from 1 to 0 for 32-bit powerpc, including also p2020. > >> > >> Can you compare the disassembly before and after and find a place where > >> an instruction has changed? > >> > >> cheers > > > > Yes, of course. Here is diff between output from objdump -d vmlinux. > > original version --- is from git master branch and modified version +++ > > is the original version with reverted above problematic commit. > > So the +++ version is the one which is working. > > > > --- vmlinux.master.dump 2022-07-25 14:43:45.922239496 +0200 > > +++ vmlinux.revert.dump 2022-07-25 14:43:49.238259296 +0200 > > @@ -1,5 +1,5 @@ > > > > -vmlinux.master: file format elf32-powerpc > > +vmlinux.revert: file format elf32-powerpc > > > > > > Disassembly of section .head.text: > > @@ -11213,7 +11213,7 @@ c000b850: 3f a0 c1 0f lis r29,-1611 > > c000b854: 81 02 00 04 lwz r8,4(r2) > > c000b858: 3b fd 10 68 addi r31,r29,4200 > > c000b85c: 39 40 00 01 li r10,1 > > -c000b860: 7d 20 f8 29 lwarx r9,0,r31,1 > > +c000b860: 7d 20 f8 28 lwarx r9,0,r31 > > c000b864: 2c 09 00 00 cmpwi r9,0 > > c000b868: 40 82 00 10 bne c000b878 > > c000b86c: 7d 40 f9 2d stwcx. r10,0,r31 > > That's really strange. I made a try with mpc85xx_defconfig with GCC 11 > and I don't get any such difference. Yes, that is strange... > Does your version of GCC has anything special ? Nothing. It is an ordinary Debian 10 amd64 system with the cross compiler from the gcc-powerpc-linux-gnuspe package (the standard version shipped with Debian). Now I did another clean test with the same Debian 10 cross compiler.
$ git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git && cd linux
$ git checkout v5.15
$ make mpc85xx_smp_defconfig ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnuspe-
$ make vmlinux ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnuspe-
$ cp -a vmlinux vmlinux.v5.15
$ git revert 9401f4e46cf6965e23738f70e149172344a01eef
$ make vmlinux ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnuspe-
$ cp -a vmlinux vmlinux.revert
$ powerpc-linux-gnuspe-objdump -d vmlinux.revert > vmlinux.revert.dump
$ powerpc-linux-gnuspe-objdump -d vmlinux.v5.15 > vmlinux.v5.15.dump
$ diff -Naurp vmlinux.v5.15.dump vmlinux.revert.dump

And the differences are:

-c000c304: 7d 20 f8 29 lwarx r9,0,r31,1
+c000c304: 7d 20 f8 28 lwarx r9,0,r31

I guess this issue must be reproducible, as I'm using the regular toolchain from the distribution.

Just to note, I had to apply the Makefile patch for CONFIG_E500:
https://lore.kernel.org/linuxppc-dev/20220524093939.30927-1-p...@kernel.org/
But I was told that this issue is reproducible also by
Re: Regression: Linux v5.15+ does not boot on Freescale P2020
Le 25/07/2022 à 14:52, Pali Rohár a écrit : > On Monday 25 July 2022 18:20:01 Michael Ellerman wrote: >> Pali Rohár writes: >>> On Saturday 23 July 2022 14:42:22 Christophe Leroy wrote: Le 22/07/2022 à 11:09, Pali Rohár a écrit : > Trying to boot mainline Linux kernel v5.15+, including current version > from master branch, on Freescale P2020 does not work. Kernel does not > print anything to serial console, seems that it does not work and after > timeout watchdog reset the board. Can you provide more information ? Which defconfig or .config, which version of gcc, etc ... ? >>> >>> I used default defconfig for mpc85xx with gcc 8, compilation for e500 >>> cores. >>> >>> If you need exact .config content I can send it during week. >>> > I run git bisect and it found following commit: > > 9401f4e46cf6965e23738f70e149172344a01eef is the first bad commit > commit 9401f4e46cf6965e23738f70e149172344a01eef > Author: Christophe Leroy > Date: Tue Mar 2 08:48:11 2021 + > > powerpc: Use lwarx/ldarx directly instead of PPC_LWARX/LDARX macros > > Force the eh flag at 0 on PPC32. > > Signed-off-by: Christophe Leroy > Signed-off-by: Michael Ellerman > Link: > https://lore.kernel.org/r/1fc81f07cabebb875b963e295408cc3dd38c8d85.1614674882.git.christophe.le...@csgroup.eu > > :04 04 fe6747e45736dfcba74914a9445e5f70f5120600 > 96358d08b65d3200928a973efb5b969b3d45f2b0 M arch > > > If I revert this commit then kernel boots correctly. It also boots fine > if I revert this commit on top of master branch. > > Freescale P2020 has two 32-bit e500 powerpc cores. > > Any idea why above commit is causing crash of the kernel? And why it is > needed? Could eh flag set to 0 cause deadlock? Setting the eh flag to 0 is not supposed to be a change introduced by that commit. Indeed that commit is not supposed to change anything at all in the generated code. >>> >>> My understanding of that commit is that it changed eh flag parameter >>> from 1 to 0 for 32-bit powerpc, including also p2020. 
>> >> Can you compare the disassembly before and after and find a place where >> an instruction has changed? >> >> cheers > > Yes, of course. Here is diff between output from objdump -d vmlinux. > original version --- is from git master branch and modified version +++ > is the original version with reverted above problematic commit. > So the +++ version is the one which is working. > > --- vmlinux.master.dump 2022-07-25 14:43:45.922239496 +0200 > +++ vmlinux.revert.dump 2022-07-25 14:43:49.238259296 +0200 > @@ -1,5 +1,5 @@ > > -vmlinux.master: file format elf32-powerpc > +vmlinux.revert: file format elf32-powerpc > > > Disassembly of section .head.text: > @@ -11213,7 +11213,7 @@ c000b850: 3f a0 c1 0f lis r29,-1611 > c000b854: 81 02 00 04 lwz r8,4(r2) > c000b858: 3b fd 10 68 addi r31,r29,4200 > c000b85c: 39 40 00 01 li r10,1 > -c000b860: 7d 20 f8 29 lwarx r9,0,r31,1 > +c000b860: 7d 20 f8 28 lwarx r9,0,r31 > c000b864: 2c 09 00 00 cmpwi r9,0 > c000b868: 40 82 00 10 bne c000b878 > c000b86c: 7d 40 f9 2d stwcx. r10,0,r31 That's really strange. I tried mpc85xx_defconfig with GCC 11 and I don't get any such difference. Does your version of GCC have anything special? Can you send your exact .config? Thanks Christophe
Re: Regression: Linux v5.15+ does not boot on Freescale P2020
On Monday 25 July 2022 18:20:01 Michael Ellerman wrote:
> Pali Rohár writes:
> > On Saturday 23 July 2022 14:42:22 Christophe Leroy wrote:
> >> On 22/07/2022 at 11:09, Pali Rohár wrote:
> >> > Trying to boot the mainline Linux kernel v5.15+, including the
> >> > current version from the master branch, on Freescale P2020 does not
> >> > work. The kernel does not print anything to the serial console; it
> >> > seems that it does not work, and after a timeout the watchdog resets
> >> > the board.
> >>
> >> Can you provide more information? Which defconfig or .config, which
> >> version of gcc, etc.?
> >
> > I used the default mpc85xx defconfig with gcc 8, compiling for e500
> > cores.
> >
> > If you need the exact .config content, I can send it during the week.
> >
> >> > I ran git bisect and it found the following commit:
> >> >
> >> > 9401f4e46cf6965e23738f70e149172344a01eef is the first bad commit
> >> > commit 9401f4e46cf6965e23738f70e149172344a01eef
> >> > Author: Christophe Leroy
> >> > Date:   Tue Mar 2 08:48:11 2021 +
> >> >
> >> >     powerpc: Use lwarx/ldarx directly instead of PPC_LWARX/LDARX macros
> >> >
> >> >     Force the eh flag at 0 on PPC32.
> >> >
> >> >     Signed-off-by: Christophe Leroy
> >> >     Signed-off-by: Michael Ellerman
> >> >     Link: https://lore.kernel.org/r/1fc81f07cabebb875b963e295408cc3dd38c8d85.1614674882.git.christophe.le...@csgroup.eu
> >> >
> >> > :04 04 fe6747e45736dfcba74914a9445e5f70f5120600 96358d08b65d3200928a973efb5b969b3d45f2b0 M arch
> >> >
> >> >
> >> > If I revert this commit, the kernel boots correctly. It also boots
> >> > fine if I revert this commit on top of the master branch.
> >> >
> >> > Freescale P2020 has two 32-bit e500 powerpc cores.
> >> >
> >> > Any idea why the above commit is causing the kernel to crash? And
> >> > why is it needed? Could setting the eh flag to 0 cause a deadlock?
> >>
> >> Setting the eh flag to 0 is not supposed to be a change introduced by
> >> that commit. In fact, that commit is not supposed to change anything
> >> at all in the generated code.
> >
> > My understanding of that commit is that it changed the eh flag
> > parameter from 1 to 0 for 32-bit powerpc, including the P2020.
>
> Can you compare the disassembly before and after and find a place where
> an instruction has changed?
>
> cheers

Yes, of course. Here is the diff between the outputs of objdump -d
vmlinux. The original version (---) is from the git master branch, and
the modified version (+++) is the original version with the problematic
commit above reverted. So the +++ version is the one that works.

--- vmlinux.master.dump 2022-07-25 14:43:45.922239496 +0200
+++ vmlinux.revert.dump 2022-07-25 14:43:49.238259296 +0200
@@ -1,5 +1,5 @@
-vmlinux.master:     file format elf32-powerpc
+vmlinux.revert:     file format elf32-powerpc
 Disassembly of section .head.text:
@@ -11213,7 +11213,7 @@
 c000b850:  3f a0 c1 0f    lis     r29,-16113
 c000b854:  81 02 00 04    lwz     r8,4(r2)
 c000b858:  3b fd 10 68    addi    r31,r29,4200
 c000b85c:  39 40 00 01    li      r10,1
-c000b860:  7d 20 f8 29    lwarx   r9,0,r31,1
+c000b860:  7d 20 f8 28    lwarx   r9,0,r31
 c000b864:  2c 09 00 00    cmpwi   r9,0
 c000b868:  40 82 00 10    bne     c000b878
 c000b86c:  7d 40 f9 2d    stwcx.  r10,0,r31
@@ -11227,7 +11227,7 @@
 c000b888:  81 3e 00 1c    lwz     r9,28(r30)
 c000b88c:  7f 88 48 00    cmpw    cr7,r8,r9
 c000b890:  41 9e 00 38    beq     cr7,c000b8c8
 c000b894:  39 40 00 01    li      r10,1
-c000b898:  7d 20 f8 29    lwarx   r9,0,r31,1
+c000b898:  7d 20 f8 28    lwarx   r9,0,r31
 c000b89c:  2c 09 00 00    cmpwi   r9,0
 c000b8a0:  40 82 00 10    bne     c000b8b0
 c000b8a4:  7d 40 f9 2d    stwcx.  r10,0,r31
@@ -186495,7 +186495,7 @@
 c00b173c:  3b 40 00 00    li      r26,0
 c00b1740:  3a e0 00 00    li      r23,0
 c00b1744:  7e c0 00 a6    mfmsr   r22
 c00b1748:  7c 00 01 46    wrteei  0
-c00b174c:  7f a0 c0 29    lwarx   r29,0,r24,1
+c00b174c:  7f a0 c0 28    lwarx   r29,0,r24
 c00b1750:  2c 1d 00 00    cmpwi   r29,0
 c00b1754:  40 82 00 10    bne     c00b1764
 c00b1758:  7f 20 c1 2d    stwcx.  r25,0,r24
@@ -187821,7 +187821,7 @@
 c00b2b7c:  3f e0 c1 0b    lis     r31,-16117
 c00b2b80:  38 c0 00 01    li      r6,1
 c00b2b84:  3b ff c5 20    addi    r31,r31,-15072
 c00b2b88:  38 ff 02 20    addi    r7,r31,544
-c00b2b8c:  7d 00 38 29    lwarx   r8,0,r7,1
+c00b2b8c:  7d 00 38 28    lwarx   r8,0,r7
 c00b2b90:  2c 08 00 00    cmpwi   r8,0
 c00b2b94:  40 82 00 10    bne     c00b2ba4
 c00b2b98:  7c c0 39 2d    stwcx.  r6,0,r7
@@ -187947,7 +187947,7 @@
 c00b2d6c:  3f a0 c1 0b    lis     r29,-16117
 c00b2d70:  39 00 00 01    li      r8,1
 c00b2d74:  3b bd c5 20    addi    r29,r29,-15072
 c00b2d78:  39 3d 02 20    addi    r9,r29,544
-c00b2d7c:  7d 40 48 29    lwarx   r10,0,r9,1
+c00b2d7c:  7d 40 48 28
Re: Regression: Linux v5.15+ does not boot on Freescale P2020
Pali Rohár writes:
> On Saturday 23 July 2022 14:42:22 Christophe Leroy wrote:
>> On 22/07/2022 at 11:09, Pali Rohár wrote:
>> > Trying to boot the mainline Linux kernel v5.15+, including the
>> > current version from the master branch, on Freescale P2020 does not
>> > work. The kernel does not print anything to the serial console; it
>> > seems that it does not work, and after a timeout the watchdog resets
>> > the board.
>>
>> Can you provide more information? Which defconfig or .config, which
>> version of gcc, etc.?
>
> I used the default mpc85xx defconfig with gcc 8, compiling for e500
> cores.
>
> If you need the exact .config content, I can send it during the week.
>
>> > I ran git bisect and it found the following commit:
>> >
>> > 9401f4e46cf6965e23738f70e149172344a01eef is the first bad commit
>> > commit 9401f4e46cf6965e23738f70e149172344a01eef
>> > Author: Christophe Leroy
>> > Date:   Tue Mar 2 08:48:11 2021 +
>> >
>> >     powerpc: Use lwarx/ldarx directly instead of PPC_LWARX/LDARX macros
>> >
>> >     Force the eh flag at 0 on PPC32.
>> >
>> >     Signed-off-by: Christophe Leroy
>> >     Signed-off-by: Michael Ellerman
>> >     Link: https://lore.kernel.org/r/1fc81f07cabebb875b963e295408cc3dd38c8d85.1614674882.git.christophe.le...@csgroup.eu
>> >
>> > :04 04 fe6747e45736dfcba74914a9445e5f70f5120600 96358d08b65d3200928a973efb5b969b3d45f2b0 M arch
>> >
>> >
>> > If I revert this commit, the kernel boots correctly. It also boots
>> > fine if I revert this commit on top of the master branch.
>> >
>> > Freescale P2020 has two 32-bit e500 powerpc cores.
>> >
>> > Any idea why the above commit is causing the kernel to crash? And
>> > why is it needed? Could setting the eh flag to 0 cause a deadlock?
>>
>> Setting the eh flag to 0 is not supposed to be a change introduced by
>> that commit. In fact, that commit is not supposed to change anything
>> at all in the generated code.
>
> My understanding of that commit is that it changed the eh flag
> parameter from 1 to 0 for 32-bit powerpc, including the P2020.

Can you compare the disassembly before and after and find a place where
an instruction has changed?

cheers
Re: Regression: Linux v5.15+ does not boot on Freescale P2020
Hello,

On Saturday 23 July 2022 14:42:22 Christophe Leroy wrote:
> Hello,
>
> On 22/07/2022 at 11:09, Pali Rohár wrote:
> > Hello!
> >
> > Trying to boot the mainline Linux kernel v5.15+, including the
> > current version from the master branch, on Freescale P2020 does not
> > work. The kernel does not print anything to the serial console; it
> > seems that it does not work, and after a timeout the watchdog resets
> > the board.
>
> Can you provide more information? Which defconfig or .config, which
> version of gcc, etc.?

I used the default mpc85xx defconfig with gcc 8, compiling for e500
cores.

If you need the exact .config content, I can send it during the week.

> >
> > I ran git bisect and it found the following commit:
> >
> > 9401f4e46cf6965e23738f70e149172344a01eef is the first bad commit
> > commit 9401f4e46cf6965e23738f70e149172344a01eef
> > Author: Christophe Leroy
> > Date:   Tue Mar 2 08:48:11 2021 +
> >
> >     powerpc: Use lwarx/ldarx directly instead of PPC_LWARX/LDARX macros
> >
> >     Force the eh flag at 0 on PPC32.
> >
> >     Signed-off-by: Christophe Leroy
> >     Signed-off-by: Michael Ellerman
> >     Link: https://lore.kernel.org/r/1fc81f07cabebb875b963e295408cc3dd38c8d85.1614674882.git.christophe.le...@csgroup.eu
> >
> > :04 04 fe6747e45736dfcba74914a9445e5f70f5120600 96358d08b65d3200928a973efb5b969b3d45f2b0 M arch
> >
> >
> > If I revert this commit, the kernel boots correctly. It also boots
> > fine if I revert this commit on top of the master branch.
> >
> > Freescale P2020 has two 32-bit e500 powerpc cores.
> >
> > Any idea why the above commit is causing the kernel to crash? And
> > why is it needed? Could setting the eh flag to 0 cause a deadlock?
>
> Setting the eh flag to 0 is not supposed to be a change introduced by
> that commit. In fact, that commit is not supposed to change anything
> at all in the generated code.

My understanding of that commit is that it changed the eh flag
parameter from 1 to 0 for 32-bit powerpc, including the P2020.

> Christophe
>
> >
> > I have looked at the lwarx instruction in the e500 Reference Manual
> > (page 562) at
> > https://www.nxp.com/files-static/32bit/doc/ref_manual/EREF_RM.pdf and
> > both 0 and 1 values of the EH flag should be supported.
Re: Regression: Linux v5.15+ does not boot on Freescale P2020
Hello,

On 22/07/2022 at 11:09, Pali Rohár wrote:
> Hello!
>
> Trying to boot the mainline Linux kernel v5.15+, including the
> current version from the master branch, on Freescale P2020 does not
> work. The kernel does not print anything to the serial console; it
> seems that it does not work, and after a timeout the watchdog resets
> the board.

Can you provide more information? Which defconfig or .config, which
version of gcc, etc.?

>
> I ran git bisect and it found the following commit:
>
> 9401f4e46cf6965e23738f70e149172344a01eef is the first bad commit
> commit 9401f4e46cf6965e23738f70e149172344a01eef
> Author: Christophe Leroy
> Date:   Tue Mar 2 08:48:11 2021 +
>
>     powerpc: Use lwarx/ldarx directly instead of PPC_LWARX/LDARX macros
>
>     Force the eh flag at 0 on PPC32.
>
>     Signed-off-by: Christophe Leroy
>     Signed-off-by: Michael Ellerman
>     Link: https://lore.kernel.org/r/1fc81f07cabebb875b963e295408cc3dd38c8d85.1614674882.git.christophe.le...@csgroup.eu
>
> :04 04 fe6747e45736dfcba74914a9445e5f70f5120600 96358d08b65d3200928a973efb5b969b3d45f2b0 M arch
>
>
> If I revert this commit, the kernel boots correctly. It also boots
> fine if I revert this commit on top of the master branch.
>
> Freescale P2020 has two 32-bit e500 powerpc cores.
>
> Any idea why the above commit is causing the kernel to crash? And
> why is it needed? Could setting the eh flag to 0 cause a deadlock?

Setting the eh flag to 0 is not supposed to be a change introduced by
that commit. In fact, that commit is not supposed to change anything
at all in the generated code.

Christophe

>
> I have looked at the lwarx instruction in the e500 Reference Manual
> (page 562) at
> https://www.nxp.com/files-static/32bit/doc/ref_manual/EREF_RM.pdf and
> both 0 and 1 values of the EH flag should be supported.