Re: Suspected regression?
Hi Christophe,

On 26 August 2016 at 14:46, Christophe Leroy wrote:
[...]
> Can you try the patch below? I have identified that if the packet is
> smaller than a cache line, it doesn't get cache-aligned, so the result must
> not be rotated when the destination address is odd.
>
> This patch goes in addition to the previous fix (1bc8b816cb805), as it fixes
> a different case.
>
> Christophe
>
> diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
> [...]

Yeah! It fixes my problem! Thank you very much!

Ciao,
Alessio
Re: Suspected regression?
Hi Alessio,

On 26/08/2016 at 04:32, Scott Wood wrote:
> On Tue, 2016-08-23 at 13:34 +0200, Christophe Leroy wrote:
>> [...]
>> If the removal of dcbz doesn't solve the issue, I don't think it is a
>> cache-related issue.
>> As far as I understand, your init gets a SIGBUS signal, right? Then we
>> must identify the reason for that SIGBUS.
>
> My guess would be errors demand-loading a page via NFS.
>
> One approach might be to hack up the code so that both versions of
> csum_partial_copy_generic() are present, and call both each time. If the
> results differ or the copied bytes are wrong, then spit out a dump of the
> details.

Can you try the patch below? I have identified that if the packet is
smaller than a cache line, it doesn't get cache-aligned, so the result must
not be rotated when the destination address is odd.

This patch goes in addition to the previous fix (1bc8b816cb805), as it fixes
a different case.

Christophe

diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
index 68f6862..3971cfb 100644
--- a/arch/powerpc/lib/checksum_32.S
+++ b/arch/powerpc/lib/checksum_32.S
@@ -127,18 +127,19 @@ _GLOBAL(csum_partial_copy_generic)
         stw     r7,12(r1)
         stw     r8,8(r1)
 
-        rlwinm  r0,r4,3,0x8
-        rlwnm   r6,r6,r0,0,31   /* odd destination address: rotate one byte */
-        cmplwi  cr7,r0,0        /* is destination address even ? */
         addic   r12,r6,0
         addi    r6,r4,-4
         neg     r0,r4
         addi    r4,r3,-4
         andi.   r0,r0,CACHELINE_MASK    /* # bytes to start of cache line */
+        crset   4*cr7+eq
         beq     58f
 
         cmplw   0,r5,r0                 /* is this more than total to do? */
         blt     63f                     /* if not much to do */
+        rlwinm  r7,r6,3,0x8
+        rlwnm   r12,r12,r7,0,31 /* odd destination address: rotate one byte */
+        cmplwi  cr7,r7,0        /* is destination address even ? */
         andi.   r8,r0,3                 /* get it word-aligned first */
         mtctr   r8
         beq+    61f
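A small C model can illustrate the rotation that the patch moves behind the `blt 63f` check. It is a sketch under stated assumptions, not the kernel routine. The functions `csum16()`, `csum_reference()` and `csum_model()` are hypothetical, and words are treated as big-endian 16-bit values. The `shifted` flag stands in for the cache-line alignment path, where the data ends up summed at the odd destination phase. Summing at that odd phase yields the byte-swapped ones' complement sum. Rotating the incoming partial sum and then rotating the result back is therefore correct only when the data really was summed that way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value (a rotate by 8). */
static uint16_t swap16(uint16_t x)
{
    return (uint16_t)(x << 8 | x >> 8);
}

/* Ones' complement addition with end-around carry. */
static uint16_t add1c(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* Ones' complement sum of big-endian 16-bit words over p[0..len).
 * With odd == 0 the first byte is the high half of a word; with odd != 0
 * it is the low half, as when the bytes are walked at an odd address. */
static uint16_t csum16(const uint8_t *p, size_t len, int odd)
{
    uint16_t sum = 0;

    assert(p != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        int high = ((i + (odd != 0)) % 2) == 0;
        sum = add1c(sum, high ? (uint16_t)(p[i] << 8) : p[i]);
    }
    return sum;
}

/* What the caller expects: incoming partial sum plus the data's sum. */
uint16_t csum_reference(const uint8_t *p, size_t len, uint16_t sum)
{
    return add1c(sum, csum16(p, len, 0));
}

/* Hypothetical model of the copy-and-sum routine:
 * shifted: data is summed at the odd destination phase
 *          (stands in for the cache-line alignment path);
 * rotate:  partial sum rotated on entry, result rotated back on exit. */
uint16_t csum_model(const uint8_t *p, size_t len, uint16_t sum,
                    int shifted, int rotate)
{
    uint16_t acc = rotate ? swap16(sum) : sum;

    acc = add1c(acc, csum16(p, len, shifted));
    return rotate ? swap16(acc) : acc;
}
```

Take the bytes {0x01, 0x02, 0x03} and an incoming partial sum of 0x1000. `csum_reference()` returns 0x1402, and so does `csum_model()` with both flags set. Now set `rotate` but clear `shifted`, which is how the model represents the short-packet path before the fix. It then returns 0x1204, because the data's contribution (0x0402) comes out byte-swapped.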
Re: Suspected regression?
On Tue, 2016-08-23 at 13:34 +0200, Christophe Leroy wrote:
> On 23/08/2016 at 11:20, Alessio Igor Bogani wrote:
> > [...]
> > Unfortunately that patch doesn't solve the problem.
> >
> > Is there a chance that cache behavior could be set by board firmware
> > (PPCBug on the MPC7410 board and MotLoad on the MPC8641D one)?
> > In that case, what do you suggest I look for?
> If the removal of dcbz doesn't solve the issue, I don't think it is a
> cache-related issue.
> As far as I understand, your init gets a SIGBUS signal, right? Then we
> must identify the reason for that SIGBUS.

My guess would be errors demand-loading a page via NFS.

One approach might be to hack up the code so that both versions of
csum_partial_copy_generic() are present, and call both each time. If the
results differ or the copied bytes are wrong, then spit out a dump of the
details.

-Scott
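Scott's suggestion can be sketched as a small differential harness. This is a user-space C sketch, not kernel code. The `csum_copy_fn` signature, `ref_csum_copy()`, the deliberately broken `buggy_csum_copy()` and `csum_copy_compare()` are all hypothetical. In the kernel, the dump would go through printk() rather than stderr.

The destination offset is a parameter because the failure discussed here depends on an odd destination address. Results are folded to 16 bits before comparing, since two correct implementations may carry different but equivalent 32-bit partial sums.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature, loosely modelled on csum_partial_copy_generic():
 * copy len bytes from src to dst and return a 32-bit partial checksum. */
typedef uint32_t (*csum_copy_fn)(const uint8_t *src, uint8_t *dst,
                                 size_t len, uint32_t sum);

/* Fold a 32-bit partial sum to 16 bits with end-around carry. */
static uint16_t fold16(uint32_t s)
{
    s = (s & 0xffff) + (s >> 16);
    s = (s & 0xffff) + (s >> 16);
    return (uint16_t)s;
}

/* Plain C reference: memcpy() plus a big-endian 16-bit ones' complement sum. */
static uint32_t ref_csum_copy(const uint8_t *src, uint8_t *dst,
                              size_t len, uint32_t sum)
{
    uint64_t acc = sum;

    memcpy(dst, src, len);
    for (size_t i = 0; i < len; i++)
        acc += (i & 1) ? src[i] : (uint32_t)src[i] << 8;
    while (acc >> 16)
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint32_t)acc;
}

/* Deliberately broken variant, for demonstration only: byte-swaps the
 * data's contribution when dst is odd. */
static uint32_t buggy_csum_copy(const uint8_t *src, uint8_t *dst,
                                size_t len, uint32_t sum)
{
    uint16_t data = (uint16_t)ref_csum_copy(src, dst, len, 0);

    if ((uintptr_t)dst & 1)
        data = (uint16_t)(data << 8 | data >> 8);
    return (uint32_t)fold16(sum) + data;
}

static void hexdump(const char *tag, const uint8_t *p, size_t len)
{
    fprintf(stderr, "%s:", tag);
    for (size_t i = 0; i < len; i++)
        fprintf(stderr, " %02x", p[i]);
    fputc('\n', stderr);
}

/* Run both implementations on the same source, each into its own buffer
 * at offset dst_off. Returns 0 if the folded sums and the copied bytes
 * agree; otherwise dumps the details to stderr and returns -1. */
int csum_copy_compare(csum_copy_fn old_fn, csum_copy_fn new_fn,
                      const uint8_t *src, size_t len, size_t dst_off,
                      uint32_t sum)
{
    uint8_t *a = calloc(1, len + dst_off + 1);
    uint8_t *b = calloc(1, len + dst_off + 1);
    int ret = 0;

    if (!a || !b)
        abort();
    uint16_t ra = fold16(old_fn(src, a + dst_off, len, sum));
    uint16_t rb = fold16(new_fn(src, b + dst_off, len, sum));
    if (ra != rb || memcmp(a + dst_off, b + dst_off, len) != 0) {
        fprintf(stderr, "mismatch: len=%zu dst_off=%zu sum=%#x old=%#x new=%#x\n",
                len, dst_off, (unsigned)sum, (unsigned)ra, (unsigned)rb);
        hexdump("src", src, len);
        hexdump("old", a + dst_off, len);
        hexdump("new", b + dst_off, len);
        ret = -1;
    }
    free(a);
    free(b);
    return ret;
}
```

To use the harness, sweep `len` and `dst_off` over small values, including odd offsets and lengths shorter than a cache line, and stop at the first non-zero return.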
Re: Suspected regression?
On 23/08/2016 at 11:20, Alessio Igor Bogani wrote:
> Hi Christophe,
>
> Sorry for the delay in replying; I was on vacation.
>
> On 6 August 2016 at 11:29, christophe leroy wrote:
> [...]
>> Could you try the patch below to check whether the dcbz insn is causing the
>> SIGBUS?
>
> Unfortunately that patch doesn't solve the problem.
>
> Is there a chance that cache behavior could be set by board firmware
> (PPCBug on the MPC7410 board and MotLoad on the MPC8641D one)?
> In that case, what do you suggest I look for?

If the removal of dcbz doesn't solve the issue, I don't think it is a
cache-related issue.

As far as I understand, your init gets a SIGBUS signal, right? Then we
must identify the reason for that SIGBUS.

Once it has happened, do you have access to 'dmesg' at all? If not, you
should make sure the default log level on your console is high enough to
capture all messages. Then I recommend that you send us your complete
console log from startup until the init crash, so that we can get a
complete picture.

Christophe
Re: Suspected regression?
Hi Christophe,

Sorry for the delay in replying; I was on vacation.

On 6 August 2016 at 11:29, christophe leroy wrote:
> Alessio,
>
> On 05/08/2016 at 09:51, Christophe Leroy wrote:
> [...]
> Could you try the patch below to check whether the dcbz insn is causing the
> SIGBUS?

Unfortunately that patch doesn't solve the problem.

Is there a chance that cache behavior could be set by board firmware
(PPCBug on the MPC7410 board and MotLoad on the MPC8641D one)?
In that case, what do you suggest I look for?

> Christophe
>
> diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
> [...]

Thanks for your help!

Ciao,
Alessio
Re: Suspected regression?
Alessio,

On 05/08/2016 at 09:51, Christophe Leroy wrote:
> [...]
> I got the following information from Alessio:
>
> systemd[1]: Caught <BUS>, core dump failed (child 137, code=killed, status=7/BUS).
> systemd[1]: Freezing execution.
>
> What can generate SIGBUS?
> And shouldn't we also get some KERN_ERR trace, something like "unhandled
> signal 7 at ."?

As far as I can see, SIGBUS is mainly generated by alignment exceptions.
According to the 7410 Reference Manual, an alignment exception can happen in
the following cases:
* An operand of a dcbz instruction is on a page that is write-through or
  cache-inhibited for a virtual mode access.
* An attempt to execute a dcbz instruction occurs when the cache is disabled
  or locked.

Could you try the patch below to check whether the dcbz insn is causing the
SIGBUS?

Christophe

diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
index 68f6862..3ad782a 100644
--- a/arch/powerpc/lib/checksum_32.S
+++ b/arch/powerpc/lib/checksum_32.S
@@ -192,7 +192,7 @@ _GLOBAL(csum_partial_copy_generic)
         mtctr   r8
 
 53:     dcbt    r3,r4
-54:     dcbz    r11,r6
+54:     nop
 /* the main body of the cacheline loop */
         CSUM_COPY_16_BYTES_WITHEX(0)
 #if L1_CACHE_BYTES >= 32
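When a process catches SIGBUS itself, as systemd does here, the kernel passes it a siginfo_t whose si_code identifies the kind of bus error. The sketch below is a generic user-space illustration, not something taken from the thread. `sigbus_reason()` and `install_sigbus_logger()` are hypothetical helpers built on the standard sigaction()/SA_SIGINFO interface.

On Linux, two cases are generally reported differently:
* an alignment exception the kernel cannot fix up is reported as BUS_ADRALN;
* a failed page-in of a file-backed mapping (for example, a binary on an NFS root) is reported as BUS_ADRERR.

So the code can help tell the two suspected causes apart.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical helper: describe the si_code of a SIGBUS. */
const char *sigbus_reason(int code)
{
    if (code <= 0)              /* SI_USER, SI_TKILL, ...: sent by a process */
        return "sent from user space";
    switch (code) {
    case BUS_ADRALN:
        return "invalid address alignment";
    case BUS_ADRERR:            /* on Linux, typically also a failed page-in */
        return "nonexistent physical address";
    case BUS_OBJERR:
        return "object-specific hardware error";
    default:
        return "other";
    }
}

static volatile sig_atomic_t last_signo;
static volatile sig_atomic_t last_code;
static void *volatile last_addr;    /* faulting address for hardware faults */

/* SA_SIGINFO handler that only records state. For a real hardware fault,
 * returning would re-execute the faulting instruction, so a real handler
 * would write() a message and call _exit() instead. */
static void on_sigbus(int sig, siginfo_t *si, void *ctx)
{
    (void)ctx;
    last_signo = sig;
    last_code = si->si_code;
    last_addr = si->si_addr;
}

/* Install the recording handler for SIGBUS; returns 0 on success. */
int install_sigbus_logger(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

A signal sent with raise() or kill() arrives with an si_code of zero or below. A genuine fault raised by the kernel carries one of the positive BUS_* codes.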
Re: Suspected regression?
On 19/07/2016 at 23:52, Scott Wood wrote:
> On Tue, 2016-07-19 at 12:00 +0200, Alessio Igor Bogani wrote:
>> [...]
>
> I booted both my next branch and Linus's master on MPC8641HPCN and didn't
> see this -- though possibly your RFS is doing something different. Maybe
> that's the difference with P2010 as well.
>
> Is there any way you can debug the cause of the crash? Or send me a minimal
> RFS that demonstrates the problem (ideally with debug symbols on the
> userspace binaries)?

I got the following information from Alessio:

systemd[1]: Caught <BUS>, core dump failed (child 137, code=killed, status=7/BUS).
systemd[1]: Freezing execution.

What can generate SIGBUS?
And shouldn't we also get some KERN_ERR trace, something like "unhandled
signal 7 at ."?

Christophe
Re: Suspected regression?
On Tue, 2016-07-19 at 12:00 +0200, Alessio Igor Bogani wrote:
> Hi all,
>
> I have two boards, an MVME5100 (MPC7410 CPU) and an MVME7100 (MPC8641D
> CPU), for which I use the same cross-compiler (ppc7400).
>
> I tested them against kernel HEAD and found that they no longer boot
> (PID 1 crashes).
>
> Bisecting identifies the first offending commit as:
> 7aef4136566b0539a1a98391181e188905e33401
>
> Removing it from HEAD makes the boards boot properly again.
>
> A third system, based on the P2010, isn't affected at all.
>
> Is it a regression, or have I done something wrong?

I booted both my next branch and Linus's master on MPC8641HPCN and didn't
see this -- though possibly your RFS is doing something different. Maybe
that's the difference with P2010 as well.

Is there any way you can debug the cause of the crash? Or send me a minimal
RFS that demonstrates the problem (ideally with debug symbols on the
userspace binaries)?

-Scott
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Suspected regression?
Hi all,

I have two boards, an MVME5100 (MPC7410 CPU) and an MVME7100 (MPC8641D
CPU), for which I use the same cross-compiler (ppc7400).

I tested them against kernel HEAD and found that they no longer boot
(PID 1 crashes).

Bisecting identifies the first offending commit as:
7aef4136566b0539a1a98391181e188905e33401

Removing it from HEAD makes the boards boot properly again.

A third system, based on the P2010, isn't affected at all.

Is it a regression, or have I done something wrong?

Thanks!

Ciao,
Alessio