Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
Larry Fingerwrites: > On 06/23/2017 03:29 PM, Al Viro wrote: >> On Fri, Jun 23, 2017 at 01:49:16PM -0500, Larry Finger wrote: >> BTW, could you try to check what happens if you kill the if (__builtin_constant_p(n) && (n <= 8)) bits in raw_copy_{to,from}_user()? The usefulness of those (in __copy_from_user() originally) had always been dubious and the things are simpler without them. If _that_ turns out to cure breakage, I would be very surprised, though. >>> Sorry I was gone so long. Installing jessie on this box resulted in a crash >>> on boot. Lubuntu 14.04 yielded a desktop with a functioning cursor, but >>> nothing else. Finally, Ubuntu 12.04 resulted in a working system. I hate >>> Unity, but I guess I'm stuck for now. >> >> Ho-hum... Jessie is 3.16, so whatever is crashing there, it's something >> different... Ubuntu 12.04 is what, 3.2? >> >>> I know how easy it is to screw up a long bisection by booting the wrong >>> kernel. To help that problem and to work around the yaconf/yboot nonsense on >>> the MAC, my /etc/yaconf has always had generic kernel stanzas with only >>> default, old, and original kernels mentioned. From there I use a local >>> script to finish a kernel installation by moving the default links to the >>> old ones and creating the new default links pointing to the current kernel. >>> With those long-tested scripts, I'm sure that I am booting the one I want. >>> >>> With the new installation, kernel 4.12-rc6 failed, as did 3448890c with the >>> backported 46f401c4 added. >>> >>> Replacing "if (__builtin_constant_p(n) && (n <= 8))" with "if (0)" had no >>> effect. >> >> OK, that simplifies things a bit. Just to make sure we are on the same page: >> >> * f2ed8bebee69 + cherry-pick of 46f401c4 boots (Ubuntu 12.04 userland) >> * 3448890c32c3 + cherry-pick of 46f401c4 fails (Ubuntu 12.04 userland), ditto >>with removal of constant-size bits in raw_copy_..._user(). Failure >> appears >>to be on udev getting EFAULT on some syscalls. >> * straight Ubuntu 12.04 works >> * jessie crashes on boot. > > I made a break through. If I turn off inline copy to/from users for 32-bit > ppc > with the following patch, then the system boots: > > diff --git a/arch/powerpc/include/asm/uaccess.h > b/arch/powerpc/include/asm/uaccess.h > index 5c0d8a8cdae5..1e6a8723f497 100644 > --- a/arch/powerpc/include/asm/uaccess.h > +++ b/arch/powerpc/include/asm/uaccess.h > @@ -267,12 +267,7 @@ do { > \ > extern unsigned long __copy_tofrom_user(void __user *to, > const void __user *from, unsigned long size); > > -#ifndef __powerpc64__ > - > -#define INLINE_COPY_FROM_USER > -#define INLINE_COPY_TO_USE > - > -#else /* __powerpc64__ */ > +#ifdef __powerpc64__ > > static inline unsigned long > raw_copy_in_user(void __user *to, const void __user *from, unsigned long n) Thanks for debugging this. I just sent a fix based on the above. Let me know if it doesn't work for you. cheers
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
Larry Finger writes: > On 06/23/2017 03:29 PM, Al Viro wrote: >> On Fri, Jun 23, 2017 at 01:49:16PM -0500, Larry Finger wrote: >> BTW, could you try to check what happens if you kill the if (__builtin_constant_p(n) && (n <= 8)) bits in raw_copy_{to,from}_user()? The usefulness of those (in __copy_from_user() originally) had always been dubious and the things are simpler without them. If _that_ turns out to cure breakage, I would be very surprised, though. >>> Sorry I was gone so long. Installing jessie on this box resulted in a crash >>> on boot. Lubuntu 14.04 yielded a desktop with a functioning cursor, but >>> nothing else. Finally, Ubuntu 12.04 resulted in a working system. I hate >>> Unity, but I guess I'm stuck for now. >> >> Ho-hum... Jessie is 3.16, so whatever is crashing there, it's something >> different... Ubuntu 12.04 is what, 3.2? >> >>> I know how easy it is to screw up a long bisection by booting the wrong >>> kernel. To help that problem and to work around the yaconf/yboot nonsense on >>> the MAC, my /etc/yaconf has always had generic kernel stanzas with only >>> default, old, and original kernels mentioned. From there I use a local >>> script to finish a kernel installation by moving the default links to the >>> old ones and creating the new default links pointing to the current kernel. >>> With those long-tested scripts, I'm sure that I am booting the one I want. >>> >>> With the new installation, kernel 4.12-rc6 failed, as did 3448890c with the >>> backported 46f401c4 added. >>> >>> Replacing "if (__builtin_constant_p(n) && (n <= 8))" with "if (0)" had no >>> effect. >> >> OK, that simplifies things a bit. Just to make sure we are on the same page: >> >> * f2ed8bebee69 + cherry-pick of 46f401c4 boots (Ubuntu 12.04 userland) >> * 3448890c32c3 + cherry-pick of 46f401c4 fails (Ubuntu 12.04 userland), ditto >>with removal of constant-size bits in raw_copy_..._user(). Failure >> appears >>to be on udev getting EFAULT on some syscalls. >> * straight Ubuntu 12.04 works >> * jessie crashes on boot. > > I made a break through. If I turn off inline copy to/from users for 32-bit > ppc > with the following patch, then the system boots: > > diff --git a/arch/powerpc/include/asm/uaccess.h > b/arch/powerpc/include/asm/uaccess.h > index 5c0d8a8cdae5..1e6a8723f497 100644 > --- a/arch/powerpc/include/asm/uaccess.h > +++ b/arch/powerpc/include/asm/uaccess.h > @@ -267,12 +267,7 @@ do { > \ > extern unsigned long __copy_tofrom_user(void __user *to, > const void __user *from, unsigned long size); > > -#ifndef __powerpc64__ > - > -#define INLINE_COPY_FROM_USER > -#define INLINE_COPY_TO_USE > - > -#else /* __powerpc64__ */ > +#ifdef __powerpc64__ > > static inline unsigned long > raw_copy_in_user(void __user *to, const void __user *from, unsigned long n) Thanks for debugging this. I just sent a fix based on the above. Let me know if it doesn't work for you. cheers
Re: gcc 4.6.3 miscompile on ppc32 (was Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3)
Al Virowrites: > On Sun, Jun 25, 2017 at 04:44:09PM -0500, Segher Boessenkool wrote: > >> Do you have a short stand-alone testcase? 4.6 is ancient, of course, but >> the actual problem may still exist in more recent compilers (if it _is_ >> a compiler problem; if it's not, you *really* want to know :-) ) > > Enjoy. At least 6.3 doesn't step into that. Look for mtctr in the resulting > asm... > > cat <<'EOF' >a.c ... I pointed creduce at that and got the version below, which I'm pretty sure still exhibits the weird mtctr behaviour. cheers # cat input.c struct { void *iov_base; unsigned iov_len; } * c; long v; void *a; int b; unsigned bar(); foo(unsigned p1) { unsigned d, e = p1; if (p1 == 0) goto out; if (p1 > 4) goto out; if (__builtin_expect(!!(0, v && a), 1)) e = bar(); if (e) barf(e); if (e) goto out; d = 0; for (; d < p1; d++) { int f = c[d].iov_len; if (__builtin_expect(c[d].iov_base && f, 0)) b = 4; } out:; } $ cat output.s .file "input.c" # rs6000/powerpc options: -mcpu=powerpc -msdata=data -G 8 # GNU C (GCC) version 4.6.3 (powerpc64-linux) # compiled by GNU C version 4.3.2, GMP version 4.3.2, MPFR version 2.4.2, MPC version 0.8.2 # ... # Compiler executable checksum: 4b51a6b899110d06c9e3310ac66ad26c .section".text" .align 2 .globl foo .type foo, @function foo: cmpwi 0,3,0 # tmp169, p1 stwu 1,-16(1)#,, mflr 0 #, stw 0,20(1) #, beq- 0,.L9 # cmplwi 7,3,4 #, tmp170, p1 bgt- 7,.L9 # lis 9,v@ha # tmp172, lwz 0,v@l(9) # v, v cmpwi 7,0,0 #, tmp174, v beq- 7,.L3 # lis 9,a@ha # tmp176, lwz 0,a@l(9) # a, a cmpwi 7,0,0 #, tmp178, a beq- 7,.L3 # bl bar # cmpwi 0,3,0 # tmp179, e beq+ 0,.L4 # .L3: bl barf # b .L9# .L4: lis 8,0x2000 #, lis 9,c@ha # tmp181, mtctr 8 # tmp192, lwz 11,c@l(9)# c, c.3 lis 10,b@ha # tmp190, li 9,0 # ivtmp.12, li 0,4 # tmp191, .L6: lwzx 7,11,9 # MEM[base: c.3_14, index: ivtmp.12_25, offset: 0B], MEM[base: c.3_14, index: ivtmp.12_25, offset: 0B] add 8,11,9 # tmp182, c.3, ivtmp.12 lwz 8,4(8) # MEM[base: D.1310_21, offset: 4B], D.1287 cmpwi 7,7,0 #, tmp184, MEM[base: c.3_14, index: ivtmp.12_25, offset: 0B] beq+ 7,.L5 # cmpwi 7,8,0 #, tmp185, D.1287 beq+ 7,.L5 # stw 0,b@l(10)# b, tmp191 .L5: addi 9,9,8 # ivtmp.12, ivtmp.12, bdnz .L6 # .L2: .L9: lwz 0,20(1) #, addi 1,1,16 #,, mtlr 0 #, blr # .size foo,.-foo .globl b .globl a .globl v .globl c .section.sbss,"aw",@nobits .align 2 .type b, @object .size b, 4 b: .zero 4 .type a, @object .size a, 4 a: .zero 4 .type v, @object .size v, 4 v: .zero 4 .type c, @object .size c, 4 c: .zero 4 .ident "GCC: (GNU) 4.6.3" .section.note.GNU-stack,"",@progbits
Re: gcc 4.6.3 miscompile on ppc32 (was Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3)
Al Viro writes: > On Sun, Jun 25, 2017 at 04:44:09PM -0500, Segher Boessenkool wrote: > >> Do you have a short stand-alone testcase? 4.6 is ancient, of course, but >> the actual problem may still exist in more recent compilers (if it _is_ >> a compiler problem; if it's not, you *really* want to know :-) ) > > Enjoy. At least 6.3 doesn't step into that. Look for mtctr in the resulting > asm... > > cat <<'EOF' >a.c ... I pointed creduce at that and got the version below, which I'm pretty sure still exhibits the weird mtctr behaviour. cheers # cat input.c struct { void *iov_base; unsigned iov_len; } * c; long v; void *a; int b; unsigned bar(); foo(unsigned p1) { unsigned d, e = p1; if (p1 == 0) goto out; if (p1 > 4) goto out; if (__builtin_expect(!!(0, v && a), 1)) e = bar(); if (e) barf(e); if (e) goto out; d = 0; for (; d < p1; d++) { int f = c[d].iov_len; if (__builtin_expect(c[d].iov_base && f, 0)) b = 4; } out:; } $ cat output.s .file "input.c" # rs6000/powerpc options: -mcpu=powerpc -msdata=data -G 8 # GNU C (GCC) version 4.6.3 (powerpc64-linux) # compiled by GNU C version 4.3.2, GMP version 4.3.2, MPFR version 2.4.2, MPC version 0.8.2 # ... # Compiler executable checksum: 4b51a6b899110d06c9e3310ac66ad26c .section".text" .align 2 .globl foo .type foo, @function foo: cmpwi 0,3,0 # tmp169, p1 stwu 1,-16(1)#,, mflr 0 #, stw 0,20(1) #, beq- 0,.L9 # cmplwi 7,3,4 #, tmp170, p1 bgt- 7,.L9 # lis 9,v@ha # tmp172, lwz 0,v@l(9) # v, v cmpwi 7,0,0 #, tmp174, v beq- 7,.L3 # lis 9,a@ha # tmp176, lwz 0,a@l(9) # a, a cmpwi 7,0,0 #, tmp178, a beq- 7,.L3 # bl bar # cmpwi 0,3,0 # tmp179, e beq+ 0,.L4 # .L3: bl barf # b .L9# .L4: lis 8,0x2000 #, lis 9,c@ha # tmp181, mtctr 8 # tmp192, lwz 11,c@l(9)# c, c.3 lis 10,b@ha # tmp190, li 9,0 # ivtmp.12, li 0,4 # tmp191, .L6: lwzx 7,11,9 # MEM[base: c.3_14, index: ivtmp.12_25, offset: 0B], MEM[base: c.3_14, index: ivtmp.12_25, offset: 0B] add 8,11,9 # tmp182, c.3, ivtmp.12 lwz 8,4(8) # MEM[base: D.1310_21, offset: 4B], D.1287 cmpwi 7,7,0 #, tmp184, MEM[base: c.3_14, index: ivtmp.12_25, offset: 0B] beq+ 7,.L5 # cmpwi 7,8,0 #, tmp185, D.1287 beq+ 7,.L5 # stw 0,b@l(10)# b, tmp191 .L5: addi 9,9,8 # ivtmp.12, ivtmp.12, bdnz .L6 # .L2: .L9: lwz 0,20(1) #, addi 1,1,16 #,, mtlr 0 #, blr # .size foo,.-foo .globl b .globl a .globl v .globl c .section.sbss,"aw",@nobits .align 2 .type b, @object .size b, 4 b: .zero 4 .type a, @object .size a, 4 a: .zero 4 .type v, @object .size v, 4 v: .zero 4 .type c, @object .size c, 4 c: .zero 4 .ident "GCC: (GNU) 4.6.3" .section.note.GNU-stack,"",@progbits
Re: gcc 4.6.3 miscompile on ppc32 (was Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3)
On Sun, Jun 25, 2017 at 04:44:09PM -0500, Segher Boessenkool wrote: > Do you have a short stand-alone testcase? 4.6 is ancient, of course, but > the actual problem may still exist in more recent compilers (if it _is_ > a compiler problem; if it's not, you *really* want to know :-) ) Enjoy. At least 6.3 doesn't step into that. Look for mtctr in the resulting asm... cat <<'EOF' >a.c struct iovec { void *iov_base; unsigned iov_len; }; unsigned long v; extern void * barf(void *,int,unsigned); extern unsigned long bar(void *to, const void *from, unsigned long size); static inline unsigned long __bar(void *to, const void *from, unsigned long n) { unsigned long res = n; if (__builtin_expect(!!(((void)0, unsigned long)(from)) <= v) && n)) == 0) || n)) - 1) <= (v - (( unsigned long)(from, 1)) res = bar(to, from, n); if (res) barf(to + (n - res), 0, res); return res; } int foo(int type, const struct iovec * uvector, unsigned long nr_segs, unsigned long fast_segs, struct iovec *iov, struct iovec **ret_pointer) { unsigned long seg; int ret; if (nr_segs == 0) { ret = 0; goto out; } if (nr_segs > 1024) { ret = -22; goto out; } if (__bar(iov, uvector, nr_segs*sizeof(*uvector))) { ret = -14; goto out; } ret = 0; for (seg = 0; seg < nr_segs; seg++) { void *buf = iov[seg].iov_base; int len = (int)iov[seg].iov_len; if (len < 0) { ret = -22; goto out; } if (type >= 0 && __builtin_expect(!!(!((void)0, unsigned long)(buf)) <= v) && len)) == 0) || len)) - 1) <= (v - (( unsigned long)(buf, 0)) { ret = -14; goto out; } ret += len; } out: *ret_pointer = iov; return ret; } EOF powerpc-linux-gcc -m32 -fno-strict-aliasing -fno-common -std=gnu89 -fno-PIE -msoft-float -pipe -ffixed-r2 -mmultiple -mno-altivec -mno-vsx -mno-spe -mspe=no -funit-at-a-time -fno-dwarf2-cfi-asm -mno-string -mcpu=powerpc -Wa,-maltivec -mbig-endian -fno-delete-null-pointer-checks -Os -fno-stack-protector -Wno-unused-but-set-variable -fomit-frame-pointer -fno-var-tracking-assignments -femit-struct-debug-baseonly -fno-var-tracking -fno-strict-overflow -fconserve-stack -fverbose-asm -S a.c
Re: gcc 4.6.3 miscompile on ppc32 (was Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3)
On Sun, Jun 25, 2017 at 04:44:09PM -0500, Segher Boessenkool wrote: > Do you have a short stand-alone testcase? 4.6 is ancient, of course, but > the actual problem may still exist in more recent compilers (if it _is_ > a compiler problem; if it's not, you *really* want to know :-) ) Enjoy. At least 6.3 doesn't step into that. Look for mtctr in the resulting asm... cat <<'EOF' >a.c struct iovec { void *iov_base; unsigned iov_len; }; unsigned long v; extern void * barf(void *,int,unsigned); extern unsigned long bar(void *to, const void *from, unsigned long size); static inline unsigned long __bar(void *to, const void *from, unsigned long n) { unsigned long res = n; if (__builtin_expect(!!(((void)0, unsigned long)(from)) <= v) && n)) == 0) || n)) - 1) <= (v - (( unsigned long)(from, 1)) res = bar(to, from, n); if (res) barf(to + (n - res), 0, res); return res; } int foo(int type, const struct iovec * uvector, unsigned long nr_segs, unsigned long fast_segs, struct iovec *iov, struct iovec **ret_pointer) { unsigned long seg; int ret; if (nr_segs == 0) { ret = 0; goto out; } if (nr_segs > 1024) { ret = -22; goto out; } if (__bar(iov, uvector, nr_segs*sizeof(*uvector))) { ret = -14; goto out; } ret = 0; for (seg = 0; seg < nr_segs; seg++) { void *buf = iov[seg].iov_base; int len = (int)iov[seg].iov_len; if (len < 0) { ret = -22; goto out; } if (type >= 0 && __builtin_expect(!!(!((void)0, unsigned long)(buf)) <= v) && len)) == 0) || len)) - 1) <= (v - (( unsigned long)(buf, 0)) { ret = -14; goto out; } ret += len; } out: *ret_pointer = iov; return ret; } EOF powerpc-linux-gcc -m32 -fno-strict-aliasing -fno-common -std=gnu89 -fno-PIE -msoft-float -pipe -ffixed-r2 -mmultiple -mno-altivec -mno-vsx -mno-spe -mspe=no -funit-at-a-time -fno-dwarf2-cfi-asm -mno-string -mcpu=powerpc -Wa,-maltivec -mbig-endian -fno-delete-null-pointer-checks -Os -fno-stack-protector -Wno-unused-but-set-variable -fomit-frame-pointer -fno-var-tracking-assignments -femit-struct-debug-baseonly -fno-var-tracking -fno-strict-overflow -fconserve-stack -fverbose-asm -S a.c
Re: gcc 4.6.3 miscompile on ppc32 (was Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3)
On Sun, Jun 25, 2017 at 09:53:24PM +0100, Al Viro wrote: > Confirmed. It manages to bugger the loop immediately after the (successful) > copying of iovec array in rw_copy_check_uvector(); both with and without > INLINE_COPY_FROM_USER it has (just before the call of copy_from_user()) r27 > set to nr_segs * sizeof(struct iovec). The call is made, we check that it > has succeeded and that's when it hits the fan: without INLINE_COPY_FROM_USER > we have (interleaved with unrelated insns) > addi 27,27,-8 > srwi 27,27,3 > addi 27,27,1 > mtctr 27 > Weird, but manages to pass nr_segs to mtctr. This weirdosity is https://gcc.gnu.org/PR67288 . Those three instructions are not the same as just srwi 27,27,3 in case r27 is 0; GCC does not figure out this cannot happen here. > _With_ INLINE_COPY_FROM_USER we > get this: > lis 9,0x2000 > mtctr 9 > In other words, the loop will try to go through 8192 iterations. No idea > where > that number has come from, but it sure as hell is wrong. 8192*65535, even. This is as if r27 was 0 always. Do you have a short stand-alone testcase? 4.6 is ancient, of course, but the actual problem may still exist in more recent compilers (if it _is_ a compiler problem; if it's not, you *really* want to know :-) ) Segher
Re: gcc 4.6.3 miscompile on ppc32 (was Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3)
On Sun, Jun 25, 2017 at 09:53:24PM +0100, Al Viro wrote: > Confirmed. It manages to bugger the loop immediately after the (successful) > copying of iovec array in rw_copy_check_uvector(); both with and without > INLINE_COPY_FROM_USER it has (just before the call of copy_from_user()) r27 > set to nr_segs * sizeof(struct iovec). The call is made, we check that it > has succeeded and that's when it hits the fan: without INLINE_COPY_FROM_USER > we have (interleaved with unrelated insns) > addi 27,27,-8 > srwi 27,27,3 > addi 27,27,1 > mtctr 27 > Weird, but manages to pass nr_segs to mtctr. This weirdosity is https://gcc.gnu.org/PR67288 . Those three instructions are not the same as just srwi 27,27,3 in case r27 is 0; GCC does not figure out this cannot happen here. > _With_ INLINE_COPY_FROM_USER we > get this: > lis 9,0x2000 > mtctr 9 > In other words, the loop will try to go through 8192 iterations. No idea > where > that number has come from, but it sure as hell is wrong. 8192*65535, even. This is as if r27 was 0 always. Do you have a short stand-alone testcase? 4.6 is ancient, of course, but the actual problem may still exist in more recent compilers (if it _is_ a compiler problem; if it's not, you *really* want to know :-) ) Segher
gcc 4.6.3 miscompile on ppc32 (was Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3)
On Sun, Jun 25, 2017 at 12:14:04PM +0100, Al Viro wrote: > On Sun, Jun 25, 2017 at 10:53:58AM +0100, Al Viro wrote: > > On Sat, Jun 24, 2017 at 12:29:23PM -0500, Larry Finger wrote: > > > > > I made a break through. If I turn off inline copy to/from users for 32-bit > > > ppc with the following patch, then the system boots: > > > > OK... So it's 4.6.3 miscompiling something - it is hardware-independent, > > reproduced in qemu. I'd like to get more self-contained example of > > miscompile, though; should be done by tonight... > > OK, it's the call in rw_copy_check_uvector(); with INLINE_COPY_FROM_USER > it's miscompiled by 4.6.3. I hadn't looked through the generated code > yet; will do that after I grab some sleep. Confirmed. It manages to bugger the loop immediately after the (successful) copying of iovec array in rw_copy_check_uvector(); both with and without INLINE_COPY_FROM_USER it has (just before the call of copy_from_user()) r27 set to nr_segs * sizeof(struct iovec). The call is made, we check that it has succeeded and that's when it hits the fan: without INLINE_COPY_FROM_USER we have (interleaved with unrelated insns) addi 27,27,-8 srwi 27,27,3 addi 27,27,1 mtctr 27 Weird, but manages to pass nr_segs to mtctr. _With_ INLINE_COPY_FROM_USER we get this: lis 9,0x2000 mtctr 9 In other words, the loop will try to go through 8192 iterations. No idea where that number has come from, but it sure as hell is wrong. That's where those -EINVAL, etc. are coming from - we run into something negative in iov[seg].len, after having run out of on-stack iovec array. Assembler generated out of rw_copy_check_uvector() with and without INLINE_COPY_FROM_USER is attached; it's a definite miscompile. Neither 4.4.5 nor 6.3.0 use mtctr/bdnz for that loop. The bottom line is, ppc cross-toolchain on kernel.org happens to be the version that miscompiles rw_copy_check_uvector() with INLINE_COPY_FROM_USER and hell knows what else. Said that, I would rather have ppc32 drop the INLINE_COPY_{TO,FROM}_USER anyway; that won't fix any other places where the same 4.6.3 bug hits, but I seriously suspect that it will end up being faster even on non^Wless buggy gcc versions. Could powerpc folks check what does removing those two defines from arch/powerpc/include/asm/uaccess.h do to performance? If there's no slowdown, I would strongly recommend just removing those as in the patch Larry has posted upthread. Fixing whatever it is in gcc 4.6.3 that triggers that behaviour is IMO pointless - it might make sense to switch kernel.org cross-toolchain to something more recent, but that's it. .globl rw_copy_check_uvector .type rw_copy_check_uvector, @function rw_copy_check_uvector: .LFB2683: .loc 1 773 0 stwu 1,-32(1)#,, .LCFI142: mflr 0 #, .LCFI143: stmw 27,12(1)#, .LCFI144: .loc 1 783 0 mr. 27,5 # nr_segs, nr_segs .loc 1 773 0 mr 30,3 # type, type stw 0,36(1) #, .LCFI145: .loc 1 773 0 mr 31,4 # uvector, uvector mr 29,8 # ret_pointer, ret_pointer .loc 1 776 0 mr 28,7 # iov, fast_pointer .loc 1 784 0 li 0,0 # ret, .loc 1 783 0 beq- 0,.L495 # .loc 1 792 0 cmplwi 7,27,1024 #, tmp160, nr_segs .loc 1 793 0 li 0,-22 # ret, .loc 1 792 0 bgt- 7,.L495 # .loc 1 796 0 cmplw 7,27,6 # fast_segs, tmp161, nr_segs ble- 7,.L496 # .LBB1538: .LBB1539: .file 21 "./include/linux/slab.h" .loc 21 495 0 lis 4,0x140 # tmp190, slwi 3,27,3 #, nr_segs, ori 4,4,192 #,, tmp190, bl __kmalloc # .LBE1539: .LBE1538: .loc 1 799 0 li 0,-12 # ret, .loc 1 798 0 mr. 28,3 # iov, beq- 0,.L495 # .L496: .LBB1540: .LBB1541: .LBB1542: .LBB1543: .loc 19 113 0 lwz 0,1128(2)# current.192_185->thread.fs.seg, D.39493 .LBE1543: .LBE1542: .LBE1541: .LBE1540: .loc 1 803 0 slwi 27,27,3 # n, nr_segs, .LBB1549: .LBB1548: .LBB1547: .LBB1546: mr 5,27 # n, n .loc 19 113 0 cmplw 7,31,0 # D.39493, tmp165, uvector bgt- 7,.L497 # addi 9,27,-1 # tmp166, n, subf 0,31,0 # tmp167, uvector, D.39493 cmplw 7,9,0 # tmp167, tmp168, tmp166 bgt- 7,.L497 # .LBB1544: .LBB1545: .file 22 "./arch/powerpc/include/asm/uaccess.h" .loc 22 305 0 mr 3,28 #, iov mr 4,31 #, uvector bl __copy_tofrom_user# .LBE1545: .LBE1544: .loc 19 115 0 mr. 5,3 # n, beq+ 0,.L498 # .L497: .loc 19 116 0 subf 3,5,27 # tmp170, n, n li 4,0 #, add 3,28,3 #, iov, tmp170 bl memset#
gcc 4.6.3 miscompile on ppc32 (was Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3)
On Sun, Jun 25, 2017 at 12:14:04PM +0100, Al Viro wrote: > On Sun, Jun 25, 2017 at 10:53:58AM +0100, Al Viro wrote: > > On Sat, Jun 24, 2017 at 12:29:23PM -0500, Larry Finger wrote: > > > > > I made a break through. If I turn off inline copy to/from users for 32-bit > > > ppc with the following patch, then the system boots: > > > > OK... So it's 4.6.3 miscompiling something - it is hardware-independent, > > reproduced in qemu. I'd like to get more self-contained example of > > miscompile, though; should be done by tonight... > > OK, it's the call in rw_copy_check_uvector(); with INLINE_COPY_FROM_USER > it's miscompiled by 4.6.3. I hadn't looked through the generated code > yet; will do that after I grab some sleep. Confirmed. It manages to bugger the loop immediately after the (successful) copying of iovec array in rw_copy_check_uvector(); both with and without INLINE_COPY_FROM_USER it has (just before the call of copy_from_user()) r27 set to nr_segs * sizeof(struct iovec). The call is made, we check that it has succeeded and that's when it hits the fan: without INLINE_COPY_FROM_USER we have (interleaved with unrelated insns) addi 27,27,-8 srwi 27,27,3 addi 27,27,1 mtctr 27 Weird, but manages to pass nr_segs to mtctr. _With_ INLINE_COPY_FROM_USER we get this: lis 9,0x2000 mtctr 9 In other words, the loop will try to go through 8192 iterations. No idea where that number has come from, but it sure as hell is wrong. That's where those -EINVAL, etc. are coming from - we run into something negative in iov[seg].len, after having run out of on-stack iovec array. Assembler generated out of rw_copy_check_uvector() with and without INLINE_COPY_FROM_USER is attached; it's a definite miscompile. Neither 4.4.5 nor 6.3.0 use mtctr/bdnz for that loop. The bottom line is, ppc cross-toolchain on kernel.org happens to be the version that miscompiles rw_copy_check_uvector() with INLINE_COPY_FROM_USER and hell knows what else. Said that, I would rather have ppc32 drop the INLINE_COPY_{TO,FROM}_USER anyway; that won't fix any other places where the same 4.6.3 bug hits, but I seriously suspect that it will end up being faster even on non^Wless buggy gcc versions. Could powerpc folks check what does removing those two defines from arch/powerpc/include/asm/uaccess.h do to performance? If there's no slowdown, I would strongly recommend just removing those as in the patch Larry has posted upthread. Fixing whatever it is in gcc 4.6.3 that triggers that behaviour is IMO pointless - it might make sense to switch kernel.org cross-toolchain to something more recent, but that's it. .globl rw_copy_check_uvector .type rw_copy_check_uvector, @function rw_copy_check_uvector: .LFB2683: .loc 1 773 0 stwu 1,-32(1)#,, .LCFI142: mflr 0 #, .LCFI143: stmw 27,12(1)#, .LCFI144: .loc 1 783 0 mr. 27,5 # nr_segs, nr_segs .loc 1 773 0 mr 30,3 # type, type stw 0,36(1) #, .LCFI145: .loc 1 773 0 mr 31,4 # uvector, uvector mr 29,8 # ret_pointer, ret_pointer .loc 1 776 0 mr 28,7 # iov, fast_pointer .loc 1 784 0 li 0,0 # ret, .loc 1 783 0 beq- 0,.L495 # .loc 1 792 0 cmplwi 7,27,1024 #, tmp160, nr_segs .loc 1 793 0 li 0,-22 # ret, .loc 1 792 0 bgt- 7,.L495 # .loc 1 796 0 cmplw 7,27,6 # fast_segs, tmp161, nr_segs ble- 7,.L496 # .LBB1538: .LBB1539: .file 21 "./include/linux/slab.h" .loc 21 495 0 lis 4,0x140 # tmp190, slwi 3,27,3 #, nr_segs, ori 4,4,192 #,, tmp190, bl __kmalloc # .LBE1539: .LBE1538: .loc 1 799 0 li 0,-12 # ret, .loc 1 798 0 mr. 28,3 # iov, beq- 0,.L495 # .L496: .LBB1540: .LBB1541: .LBB1542: .LBB1543: .loc 19 113 0 lwz 0,1128(2)# current.192_185->thread.fs.seg, D.39493 .LBE1543: .LBE1542: .LBE1541: .LBE1540: .loc 1 803 0 slwi 27,27,3 # n, nr_segs, .LBB1549: .LBB1548: .LBB1547: .LBB1546: mr 5,27 # n, n .loc 19 113 0 cmplw 7,31,0 # D.39493, tmp165, uvector bgt- 7,.L497 # addi 9,27,-1 # tmp166, n, subf 0,31,0 # tmp167, uvector, D.39493 cmplw 7,9,0 # tmp167, tmp168, tmp166 bgt- 7,.L497 # .LBB1544: .LBB1545: .file 22 "./arch/powerpc/include/asm/uaccess.h" .loc 22 305 0 mr 3,28 #, iov mr 4,31 #, uvector bl __copy_tofrom_user# .LBE1545: .LBE1544: .loc 19 115 0 mr. 5,3 # n, beq+ 0,.L498 # .L497: .loc 19 116 0 subf 3,5,27 # tmp170, n, n li 4,0 #, add 3,28,3 #, iov, tmp170 bl memset#
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On Sun, Jun 25, 2017 at 10:53:58AM +0100, Al Viro wrote: > On Sat, Jun 24, 2017 at 12:29:23PM -0500, Larry Finger wrote: > > > I made a break through. If I turn off inline copy to/from users for 32-bit > > ppc with the following patch, then the system boots: > > OK... So it's 4.6.3 miscompiling something - it is hardware-independent, > reproduced in qemu. I'd like to get more self-contained example of > miscompile, though; should be done by tonight... OK, it's the call in rw_copy_check_uvector(); with INLINE_COPY_FROM_USER it's miscompiled by 4.6.3. I hadn't looked through the generated code yet; will do that after I grab some sleep.
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On Sun, Jun 25, 2017 at 10:53:58AM +0100, Al Viro wrote: > On Sat, Jun 24, 2017 at 12:29:23PM -0500, Larry Finger wrote: > > > I made a break through. If I turn off inline copy to/from users for 32-bit > > ppc with the following patch, then the system boots: > > OK... So it's 4.6.3 miscompiling something - it is hardware-independent, > reproduced in qemu. I'd like to get more self-contained example of > miscompile, though; should be done by tonight... OK, it's the call in rw_copy_check_uvector(); with INLINE_COPY_FROM_USER it's miscompiled by 4.6.3. I hadn't looked through the generated code yet; will do that after I grab some sleep.
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On Sat, Jun 24, 2017 at 12:29:23PM -0500, Larry Finger wrote: > I made a break through. If I turn off inline copy to/from users for 32-bit > ppc with the following patch, then the system boots: OK... So it's 4.6.3 miscompiling something - it is hardware-independent, reproduced in qemu. I'd like to get more self-contained example of miscompile, though; should be done by tonight...
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On Sat, Jun 24, 2017 at 12:29:23PM -0500, Larry Finger wrote: > I made a break through. If I turn off inline copy to/from users for 32-bit > ppc with the following patch, then the system boots: OK... So it's 4.6.3 miscompiling something - it is hardware-independent, reproduced in qemu. I'd like to get more self-contained example of miscompile, though; should be done by tonight...
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On 06/23/2017 03:29 PM, Al Viro wrote: On Fri, Jun 23, 2017 at 01:49:16PM -0500, Larry Finger wrote: BTW, could you try to check what happens if you kill the if (__builtin_constant_p(n) && (n <= 8)) bits in raw_copy_{to,from}_user()? The usefulness of those (in __copy_from_user() originally) had always been dubious and the things are simpler without them. If _that_ turns out to cure breakage, I would be very surprised, though. Sorry I was gone so long. Installing jessie on this box resulted in a crash on boot. Lubuntu 14.04 yielded a desktop with a functioning cursor, but nothing else. Finally, Ubuntu 12.04 resulted in a working system. I hate Unity, but I guess I'm stuck for now. Ho-hum... Jessie is 3.16, so whatever is crashing there, it's something different... Ubuntu 12.04 is what, 3.2? I know how easy it is to screw up a long bisection by booting the wrong kernel. To help that problem and to work around the yaconf/yboot nonsense on the MAC, my /etc/yaconf has always had generic kernel stanzas with only default, old, and original kernels mentioned. From there I use a local script to finish a kernel installation by moving the default links to the old ones and creating the new default links pointing to the current kernel. With those long-tested scripts, I'm sure that I am booting the one I want. With the new installation, kernel 4.12-rc6 failed, as did 3448890c with the backported 46f401c4 added. Replacing "if (__builtin_constant_p(n) && (n <= 8))" with "if (0)" had no effect. OK, that simplifies things a bit. Just to make sure we are on the same page: * f2ed8bebee69 + cherry-pick of 46f401c4 boots (Ubuntu 12.04 userland) * 3448890c32c3 + cherry-pick of 46f401c4 fails (Ubuntu 12.04 userland), ditto with removal of constant-size bits in raw_copy_..._user(). Failure appears to be on udev getting EFAULT on some syscalls. * straight Ubuntu 12.04 works * jessie crashes on boot. I made a break through. If I turn off inline copy to/from users for 32-bit ppc with the following patch, then the system boots: diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h index 5c0d8a8cdae5..1e6a8723f497 100644 --- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -267,12 +267,7 @@ do { \ extern unsigned long __copy_tofrom_user(void __user *to, const void __user *from, unsigned long size); -#ifndef __powerpc64__ - -#define INLINE_COPY_FROM_USER -#define INLINE_COPY_TO_USE - -#else /* __powerpc64__ */ +#ifdef __powerpc64__ static inline unsigned long raw_copy_in_user(void __user *to, const void __user *from, unsigned long n) It seems whatever problem I am seeing is in the inline version of _copy_to_user() and _copy_from_user() on the 32-bit ppc. The only other difference between the two versions is the placement of the __user macro, which looks to be wrong in the non-inlined version of _copy_to_user() in lib/usercopy.c, but that is the one that works. To me, this looks like a compiler error. On the PowerBook, 'gcc --version' reports "gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3". I will prepare a proper patch that I will send to you privately. If you agree with it, it can be send through normal channels in time for the release of 4.12. Larry
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On 06/23/2017 03:29 PM, Al Viro wrote: On Fri, Jun 23, 2017 at 01:49:16PM -0500, Larry Finger wrote: BTW, could you try to check what happens if you kill the if (__builtin_constant_p(n) && (n <= 8)) bits in raw_copy_{to,from}_user()? The usefulness of those (in __copy_from_user() originally) had always been dubious and the things are simpler without them. If _that_ turns out to cure breakage, I would be very surprised, though. Sorry I was gone so long. Installing jessie on this box resulted in a crash on boot. Lubuntu 14.04 yielded a desktop with a functioning cursor, but nothing else. Finally, Ubuntu 12.04 resulted in a working system. I hate Unity, but I guess I'm stuck for now. Ho-hum... Jessie is 3.16, so whatever is crashing there, it's something different... Ubuntu 12.04 is what, 3.2? I know how easy it is to screw up a long bisection by booting the wrong kernel. To help that problem and to work around the yaconf/yboot nonsense on the MAC, my /etc/yaconf has always had generic kernel stanzas with only default, old, and original kernels mentioned. From there I use a local script to finish a kernel installation by moving the default links to the old ones and creating the new default links pointing to the current kernel. With those long-tested scripts, I'm sure that I am booting the one I want. With the new installation, kernel 4.12-rc6 failed, as did 3448890c with the backported 46f401c4 added. Replacing "if (__builtin_constant_p(n) && (n <= 8))" with "if (0)" had no effect. OK, that simplifies things a bit. Just to make sure we are on the same page: * f2ed8bebee69 + cherry-pick of 46f401c4 boots (Ubuntu 12.04 userland) * 3448890c32c3 + cherry-pick of 46f401c4 fails (Ubuntu 12.04 userland), ditto with removal of constant-size bits in raw_copy_..._user(). Failure appears to be on udev getting EFAULT on some syscalls. * straight Ubuntu 12.04 works * jessie crashes on boot. I made a break through. If I turn off inline copy to/from users for 32-bit ppc with the following patch, then the system boots: diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h index 5c0d8a8cdae5..1e6a8723f497 100644 --- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -267,12 +267,7 @@ do { \ extern unsigned long __copy_tofrom_user(void __user *to, const void __user *from, unsigned long size); -#ifndef __powerpc64__ - -#define INLINE_COPY_FROM_USER -#define INLINE_COPY_TO_USE - -#else /* __powerpc64__ */ +#ifdef __powerpc64__ static inline unsigned long raw_copy_in_user(void __user *to, const void __user *from, unsigned long n) It seems whatever problem I am seeing is in the inline version of _copy_to_user() and _copy_from_user() on the 32-bit ppc. The only other difference between the two versions is the placement of the __user macro, which looks to be wrong in the non-inlined version of _copy_to_user() in lib/usercopy.c, but that is the one that works. To me, this looks like a compiler error. On the PowerBook, 'gcc --version' reports "gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3". I will prepare a proper patch that I will send to you privately. If you agree with it, it can be send through normal channels in time for the release of 4.12. Larry
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On Fri, Jun 23, 2017 at 01:49:16PM -0500, Larry Finger wrote: > > BTW, could you try to check what happens if you kill the > > if (__builtin_constant_p(n) && (n <= 8)) > > bits in raw_copy_{to,from}_user()? The usefulness of those (in > > __copy_from_user() > > originally) had always been dubious and the things are simpler without them. > > If _that_ turns out to cure breakage, I would be very surprised, though. > > > Sorry I was gone so long. Installing jessie on this box resulted in a crash > on boot. Lubuntu 14.04 yielded a desktop with a functioning cursor, but > nothing else. Finally, Ubuntu 12.04 resulted in a working system. I hate > Unity, but I guess I'm stuck for now. Ho-hum... Jessie is 3.16, so whatever is crashing there, it's something different... Ubuntu 12.04 is what, 3.2? > I know how easy it is to screw up a long bisection by booting the wrong > kernel. To help that problem and to work around the yaconf/yboot nonsense on > the MAC, my /etc/yaconf has always had generic kernel stanzas with only > default, old, and original kernels mentioned. From there I use a local > script to finish a kernel installation by moving the default links to the > old ones and creating the new default links pointing to the current kernel. > With those long-tested scripts, I'm sure that I am booting the one I want. > > With the new installation, kernel 4.12-rc6 failed, as did 3448890c with the > backported 46f401c4 added. > > Replacing "if (__builtin_constant_p(n) && (n <= 8))" with "if (0)" had no > effect. OK, that simplifies things a bit. Just to make sure we are on the same page: * f2ed8bebee69 + cherry-pick of 46f401c4 boots (Ubuntu 12.04 userland) * 3448890c32c3 + cherry-pick of 46f401c4 fails (Ubuntu 12.04 userland), ditto with removal of constant-size bits in raw_copy_..._user(). Failure appears to be on udev getting EFAULT on some syscalls. * straight Ubuntu 12.04 works * jessie crashes on boot. Could you post the boot logs of the first two?
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On Fri, Jun 23, 2017 at 01:49:16PM -0500, Larry Finger wrote: > > BTW, could you try to check what happens if you kill the > > if (__builtin_constant_p(n) && (n <= 8)) > > bits in raw_copy_{to,from}_user()? The usefulness of those (in > > __copy_from_user() > > originally) had always been dubious and the things are simpler without them. > > If _that_ turns out to cure breakage, I would be very surprised, though. > > > Sorry I was gone so long. Installing jessie on this box resulted in a crash > on boot. Lubuntu 14.04 yielded a desktop with a functioning cursor, but > nothing else. Finally, Ubuntu 12.04 resulted in a working system. I hate > Unity, but I guess I'm stuck for now. Ho-hum... Jessie is 3.16, so whatever is crashing there, it's something different... Ubuntu 12.04 is what, 3.2? > I know how easy it is to screw up a long bisection by booting the wrong > kernel. To help that problem and to work around the yaconf/yboot nonsense on > the MAC, my /etc/yaconf has always had generic kernel stanzas with only > default, old, and original kernels mentioned. From there I use a local > script to finish a kernel installation by moving the default links to the > old ones and creating the new default links pointing to the current kernel. > With those long-tested scripts, I'm sure that I am booting the one I want. > > With the new installation, kernel 4.12-rc6 failed, as did 3448890c with the > backported 46f401c4 added. > > Replacing "if (__builtin_constant_p(n) && (n <= 8))" with "if (0)" had no > effect. OK, that simplifies things a bit. Just to make sure we are on the same page: * f2ed8bebee69 + cherry-pick of 46f401c4 boots (Ubuntu 12.04 userland) * 3448890c32c3 + cherry-pick of 46f401c4 fails (Ubuntu 12.04 userland), ditto with removal of constant-size bits in raw_copy_..._user(). Failure appears to be on udev getting EFAULT on some syscalls. * straight Ubuntu 12.04 works * jessie crashes on boot. Could you post the boot logs of the first two?
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On 06/22/2017 02:25 PM, Al Viro wrote: On Thu, Jun 22, 2017 at 09:19:58AM -0500, Larry Finger wrote: Ugh... MintPPC appears to be dead. On KVM with Debian userland (either jessie or wheezy - no difference in result) booting the commit in question with your .config oopses as soon as pata_macio is initialized, due to the bug in "treewide: Move dma_ops from struct dev_archdata into struct device", and after cherry-picking your own fix for that (commit 46f401c4297a "powerpc/pmac: Fix crash in dma-mapping.h with NULL dma_ops") the result boots just fine. Again, that happens both for Debian 8 and Debian 7 userlands, so unless Mint had been doing something very odd there, I would question the accuracy of your bisect... Any chance that real hardware differs from KVM emulation? For that one? Bloody unlikely; udev could, theoretically, hit different codepaths due to different devices being observed, etc., but changes in that commit are not in the areas that would be easy to get wrong in emulator. All I know at this point is that commit f2ed8beb with 46f401c4 backported boots OK and commit 3448890c with the same backport fails. I will try loading jessie and see what happens. I would recheck which kernels are being booted - I had screwed that up during long bisects often enough... BTW, could you try to check what happens if you kill the if (__builtin_constant_p(n) && (n <= 8)) bits in raw_copy_{to,from}_user()? The usefulness of those (in __copy_from_user() originally) had always been dubious and the things are simpler without them. If _that_ turns out to cure breakage, I would be very surprised, though. Sorry I was gone so long. Installing jessie on this box resulted in a crash on boot. Lubuntu 14.04 yielded a desktop with a functioning cursor, but nothing else. Finally, Ubuntu 12.04 resulted in a working system. I hate Unity, but I guess I'm stuck for now. I know how easy it is to screw up a long bisection by booting the wrong kernel. To help that problem and to work around the yaconf/yboot nonsense on the MAC, my /etc/yaconf has always had generic kernel stanzas with only default, old, and original kernels mentioned. From there I use a local script to finish a kernel installation by moving the default links to the old ones and creating the new default links pointing to the current kernel. With those long-tested scripts, I'm sure that I am booting the one I want. With the new installation, kernel 4.12-rc6 failed, as did 3448890c with the backported 46f401c4 added. Replacing "if (__builtin_constant_p(n) && (n <= 8))" with "if (0)" had no effect. Larry
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On 06/22/2017 02:25 PM, Al Viro wrote: On Thu, Jun 22, 2017 at 09:19:58AM -0500, Larry Finger wrote: Ugh... MintPPC appears to be dead. On KVM with Debian userland (either jessie or wheezy - no difference in result) booting the commit in question with your .config oopses as soon as pata_macio is initialized, due to the bug in "treewide: Move dma_ops from struct dev_archdata into struct device", and after cherry-picking your own fix for that (commit 46f401c4297a "powerpc/pmac: Fix crash in dma-mapping.h with NULL dma_ops") the result boots just fine. Again, that happens both for Debian 8 and Debian 7 userlands, so unless Mint had been doing something very odd there, I would question the accuracy of your bisect... Any chance that real hardware differs from KVM emulation? For that one? Bloody unlikely; udev could, theoretically, hit different codepaths due to different devices being observed, etc., but changes in that commit are not in the areas that would be easy to get wrong in emulator. All I know at this point is that commit f2ed8beb with 46f401c4 backported boots OK and commit 3448890c with the same backport fails. I will try loading jessie and see what happens. I would recheck which kernels are being booted - I had screwed that up during long bisects often enough... BTW, could you try to check what happens if you kill the if (__builtin_constant_p(n) && (n <= 8)) bits in raw_copy_{to,from}_user()? The usefulness of those (in __copy_from_user() originally) had always been dubious and the things are simpler without them. If _that_ turns out to cure breakage, I would be very surprised, though. Sorry I was gone so long. Installing jessie on this box resulted in a crash on boot. Lubuntu 14.04 yielded a desktop with a functioning cursor, but nothing else. Finally, Ubuntu 12.04 resulted in a working system. I hate Unity, but I guess I'm stuck for now. I know how easy it is to screw up a long bisection by booting the wrong kernel. To help that problem and to work around the yaconf/yboot nonsense on the MAC, my /etc/yaconf has always had generic kernel stanzas with only default, old, and original kernels mentioned. From there I use a local script to finish a kernel installation by moving the default links to the old ones and creating the new default links pointing to the current kernel. With those long-tested scripts, I'm sure that I am booting the one I want. With the new installation, kernel 4.12-rc6 failed, as did 3448890c with the backported 46f401c4 added. Replacing "if (__builtin_constant_p(n) && (n <= 8))" with "if (0)" had no effect. Larry
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On Thu, Jun 22, 2017 at 08:25:16PM +0100, Al Viro wrote: > > All I know at this > > point is that commit f2ed8beb with 46f401c4 backported boots OK and commit > > 3448890c with the same backport fails. > > > > I will try loading jessie and see what happens. > > I would recheck which kernels are being booted - I had screwed that up during > long > bisects often enough... > > BTW, could you try to check what happens if you kill the > if (__builtin_constant_p(n) && (n <= 8)) > bits in raw_copy_{to,from}_user()? The usefulness of those (in > __copy_from_user() > originally) had always been dubious and the things are simpler without them. > If _that_ turns out to cure breakage, I would be very surprised, though. FWIW, having dug through the __copy_tofrom_user() change in 3448890c, I don't see anything that would be likely to cause that effect, be it on hardware or emulated. Moreover, had that been fucked up, I would've expected lots and lots of folks screaming by now - boot being broken since -rc1 tends to have such effect, even if nobody had noticed that in -next last cycle. What I can prove is that * __copy_tofrom_user() return value is unchanged in all cases * the only difference in its behaviour is that prior to that commit some cases when it returns non-zero used to do memset(dest + something, 0, retval) and now they do not. _All_ such cases must have stepped into a fault on load from src + something. And looking through arch/powerpc callers of all that bunch, I don't see any candidates for being buggered by disappearing memset() on partial copy with faulting read; note that copy_from_user() *will* memset() explicitly if raw_copy_from_user() returns non-zero. I wondered if it could be a weird case when copy_to_user() had been running into an unmapped area of *source* and proceeded to zero the tail of destination, but I don't see anything likely in arch/powerpc and anything in arch-independent code would've been oopsing on that all along for some architectures...
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On Thu, Jun 22, 2017 at 08:25:16PM +0100, Al Viro wrote: > > All I know at this > > point is that commit f2ed8beb with 46f401c4 backported boots OK and commit > > 3448890c with the same backport fails. > > > > I will try loading jessie and see what happens. > > I would recheck which kernels are being booted - I had screwed that up during > long > bisects often enough... > > BTW, could you try to check what happens if you kill the > if (__builtin_constant_p(n) && (n <= 8)) > bits in raw_copy_{to,from}_user()? The usefulness of those (in > __copy_from_user() > originally) had always been dubious and the things are simpler without them. > If _that_ turns out to cure breakage, I would be very surprised, though. FWIW, having dug through the __copy_tofrom_user() change in 3448890c, I don't see anything that would be likely to cause that effect, be it on hardware or emulated. Moreover, had that been fucked up, I would've expected lots and lots of folks screaming by now - boot being broken since -rc1 tends to have such effect, even if nobody had noticed that in -next last cycle. What I can prove is that * __copy_tofrom_user() return value is unchanged in all cases * the only difference in its behaviour is that prior to that commit some cases when it returns non-zero used to do memset(dest + something, 0, retval) and now they do not. _All_ such cases must have stepped into a fault on load from src + something. And looking through arch/powerpc callers of all that bunch, I don't see any candidates for being buggered by disappearing memset() on partial copy with faulting read; note that copy_from_user() *will* memset() explicitly if raw_copy_from_user() returns non-zero. I wondered if it could be a weird case when copy_to_user() had been running into an unmapped area of *source* and proceeded to zero the tail of destination, but I don't see anything likely in arch/powerpc and anything in arch-independent code would've been oopsing on that all along for some architectures...
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On Thu, Jun 22, 2017 at 09:19:58AM -0500, Larry Finger wrote: > > Ugh... MintPPC appears to be dead. On KVM with Debian userland (either > > jessie or wheezy - no difference in result) booting the commit in > > question with your .config oopses as soon as pata_macio is initialized, > > due to the bug in "treewide: Move dma_ops from struct dev_archdata into > > struct device", and after cherry-picking your own fix for that (commit > > 46f401c4297a "powerpc/pmac: Fix crash in dma-mapping.h with NULL dma_ops") > > the result boots just fine. > > > > Again, that happens both for Debian 8 and Debian 7 userlands, so unless > > Mint had been doing something very odd there, I would question the accuracy > > of your bisect... > Any chance that real hardware differs from KVM emulation? For that one? Bloody unlikely; udev could, theoretically, hit different codepaths due to different devices being observed, etc., but changes in that commit are not in the areas that would be easy to get wrong in emulator. > All I know at this > point is that commit f2ed8beb with 46f401c4 backported boots OK and commit > 3448890c with the same backport fails. > > I will try loading jessie and see what happens. I would recheck which kernels are being booted - I had screwed that up during long bisects often enough... BTW, could you try to check what happens if you kill the if (__builtin_constant_p(n) && (n <= 8)) bits in raw_copy_{to,from}_user()? The usefulness of those (in __copy_from_user() originally) had always been dubious and the things are simpler without them. If _that_ turns out to cure breakage, I would be very surprised, though.
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On Thu, Jun 22, 2017 at 09:19:58AM -0500, Larry Finger wrote: > > Ugh... MintPPC appears to be dead. On KVM with Debian userland (either > > jessie or wheezy - no difference in result) booting the commit in > > question with your .config oopses as soon as pata_macio is initialized, > > due to the bug in "treewide: Move dma_ops from struct dev_archdata into > > struct device", and after cherry-picking your own fix for that (commit > > 46f401c4297a "powerpc/pmac: Fix crash in dma-mapping.h with NULL dma_ops") > > the result boots just fine. > > > > Again, that happens both for Debian 8 and Debian 7 userlands, so unless > > Mint had been doing something very odd there, I would question the accuracy > > of your bisect... > Any chance that real hardware differs from KVM emulation? For that one? Bloody unlikely; udev could, theoretically, hit different codepaths due to different devices being observed, etc., but changes in that commit are not in the areas that would be easy to get wrong in emulator. > All I know at this > point is that commit f2ed8beb with 46f401c4 backported boots OK and commit > 3448890c with the same backport fails. > > I will try loading jessie and see what happens. I would recheck which kernels are being booted - I had screwed that up during long bisects often enough... BTW, could you try to check what happens if you kill the if (__builtin_constant_p(n) && (n <= 8)) bits in raw_copy_{to,from}_user()? The usefulness of those (in __copy_from_user() originally) had always been dubious and the things are simpler without them. If _that_ turns out to cure breakage, I would be very surprised, though.
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On 06/22/2017 09:12 AM, Al Viro wrote: On Wed, Jun 21, 2017 at 04:49:46PM -0500, Larry Finger wrote: On 06/21/2017 04:34 PM, Al Viro wrote: On Wed, Jun 21, 2017 at 04:31:40PM -0500, Larry Finger wrote: On 06/21/2017 04:22 PM, Al Viro wrote: How about the .config that works on parent of that commit? Attached. OK... am I right assuming straight jessie/powerpc userland? Actually Mint 12 with noting fancy. The machine mainly exists to test the wireless drivers work on big-endian hardware. 'cat /etc/issue' reports "Debian GNU/Linux 7". Ugh... MintPPC appears to be dead. On KVM with Debian userland (either jessie or wheezy - no difference in result) booting the commit in question with your .config oopses as soon as pata_macio is initialized, due to the bug in "treewide: Move dma_ops from struct dev_archdata into struct device", and after cherry-picking your own fix for that (commit 46f401c4297a "powerpc/pmac: Fix crash in dma-mapping.h with NULL dma_ops") the result boots just fine. Again, that happens both for Debian 8 and Debian 7 userlands, so unless Mint had been doing something very odd there, I would question the accuracy of your bisect... Any chance that real hardware differs from KVM emulation? All I know at this point is that commit f2ed8beb with 46f401c4 backported boots OK and commit 3448890c with the same backport fails. I will try loading jessie and see what happens. Thanks for investigating. Larry
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On 06/22/2017 09:12 AM, Al Viro wrote: On Wed, Jun 21, 2017 at 04:49:46PM -0500, Larry Finger wrote: On 06/21/2017 04:34 PM, Al Viro wrote: On Wed, Jun 21, 2017 at 04:31:40PM -0500, Larry Finger wrote: On 06/21/2017 04:22 PM, Al Viro wrote: How about the .config that works on parent of that commit? Attached. OK... am I right assuming straight jessie/powerpc userland? Actually Mint 12 with noting fancy. The machine mainly exists to test the wireless drivers work on big-endian hardware. 'cat /etc/issue' reports "Debian GNU/Linux 7". Ugh... MintPPC appears to be dead. On KVM with Debian userland (either jessie or wheezy - no difference in result) booting the commit in question with your .config oopses as soon as pata_macio is initialized, due to the bug in "treewide: Move dma_ops from struct dev_archdata into struct device", and after cherry-picking your own fix for that (commit 46f401c4297a "powerpc/pmac: Fix crash in dma-mapping.h with NULL dma_ops") the result boots just fine. Again, that happens both for Debian 8 and Debian 7 userlands, so unless Mint had been doing something very odd there, I would question the accuracy of your bisect... Any chance that real hardware differs from KVM emulation? All I know at this point is that commit f2ed8beb with 46f401c4 backported boots OK and commit 3448890c with the same backport fails. I will try loading jessie and see what happens. Thanks for investigating. Larry
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On Wed, Jun 21, 2017 at 04:49:46PM -0500, Larry Finger wrote: > On 06/21/2017 04:34 PM, Al Viro wrote: > > On Wed, Jun 21, 2017 at 04:31:40PM -0500, Larry Finger wrote: > > > On 06/21/2017 04:22 PM, Al Viro wrote: > > > > How about the .config that works on parent of that commit? > > > > > > Attached. > > > > OK... am I right assuming straight jessie/powerpc userland? > > > > Actually Mint 12 with noting fancy. The machine mainly exists to test the > wireless drivers work on big-endian hardware. > > 'cat /etc/issue' reports "Debian GNU/Linux 7". Ugh... MintPPC appears to be dead. On KVM with Debian userland (either jessie or wheezy - no difference in result) booting the commit in question with your .config oopses as soon as pata_macio is initialized, due to the bug in "treewide: Move dma_ops from struct dev_archdata into struct device", and after cherry-picking your own fix for that (commit 46f401c4297a "powerpc/pmac: Fix crash in dma-mapping.h with NULL dma_ops") the result boots just fine. Again, that happens both for Debian 8 and Debian 7 userlands, so unless Mint had been doing something very odd there, I would question the accuracy of your bisect...
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On Wed, Jun 21, 2017 at 04:49:46PM -0500, Larry Finger wrote: > On 06/21/2017 04:34 PM, Al Viro wrote: > > On Wed, Jun 21, 2017 at 04:31:40PM -0500, Larry Finger wrote: > > > On 06/21/2017 04:22 PM, Al Viro wrote: > > > > How about the .config that works on parent of that commit? > > > > > > Attached. > > > > OK... am I right assuming straight jessie/powerpc userland? > > > > Actually Mint 12 with noting fancy. The machine mainly exists to test the > wireless drivers work on big-endian hardware. > > 'cat /etc/issue' reports "Debian GNU/Linux 7". Ugh... MintPPC appears to be dead. On KVM with Debian userland (either jessie or wheezy - no difference in result) booting the commit in question with your .config oopses as soon as pata_macio is initialized, due to the bug in "treewide: Move dma_ops from struct dev_archdata into struct device", and after cherry-picking your own fix for that (commit 46f401c4297a "powerpc/pmac: Fix crash in dma-mapping.h with NULL dma_ops") the result boots just fine. Again, that happens both for Debian 8 and Debian 7 userlands, so unless Mint had been doing something very odd there, I would question the accuracy of your bisect...
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On 06/21/2017 04:34 PM, Al Viro wrote: On Wed, Jun 21, 2017 at 04:31:40PM -0500, Larry Finger wrote: On 06/21/2017 04:22 PM, Al Viro wrote: How about the .config that works on parent of that commit? Attached. OK... am I right assuming straight jessie/powerpc userland? Actually Mint 12 with noting fancy. The machine mainly exists to test the wireless drivers work on big-endian hardware. 'cat /etc/issue' reports "Debian GNU/Linux 7". Larry Larry
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On 06/21/2017 04:34 PM, Al Viro wrote: On Wed, Jun 21, 2017 at 04:31:40PM -0500, Larry Finger wrote: On 06/21/2017 04:22 PM, Al Viro wrote: How about the .config that works on parent of that commit? Attached. OK... am I right assuming straight jessie/powerpc userland? Actually Mint 12 with noting fancy. The machine mainly exists to test the wireless drivers work on big-endian hardware. 'cat /etc/issue' reports "Debian GNU/Linux 7". Larry Larry
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On Wed, Jun 21, 2017 at 04:31:40PM -0500, Larry Finger wrote: > On 06/21/2017 04:22 PM, Al Viro wrote: > > On Wed, Jun 21, 2017 at 10:10:57AM -0500, Larry Finger wrote: > > > > > I finally finished the bisection by patching each commit that was affected > > > by the bootstrap crash. The faulty change is commit > > > 3448890c32c32c482c3ec20baa8fdd2ab4f94cc0 ("powerpc: get rid of zeroing, > > > switch to RAW_COPY_USER"). I am very confident in the bisection. > > > > > > As I know nothing of assembly for ppc32, I have not been able to attempt > > > to > > > find the problem with these patches. > > > > How about the .config that works on parent of that commit? > > Attached. OK... am I right assuming straight jessie/powerpc userland?
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On Wed, Jun 21, 2017 at 04:31:40PM -0500, Larry Finger wrote: > On 06/21/2017 04:22 PM, Al Viro wrote: > > On Wed, Jun 21, 2017 at 10:10:57AM -0500, Larry Finger wrote: > > > > > I finally finished the bisection by patching each commit that was affected > > > by the bootstrap crash. The faulty change is commit > > > 3448890c32c32c482c3ec20baa8fdd2ab4f94cc0 ("powerpc: get rid of zeroing, > > > switch to RAW_COPY_USER"). I am very confident in the bisection. > > > > > > As I know nothing of assembly for ppc32, I have not been able to attempt > > > to > > > find the problem with these patches. > > > > How about the .config that works on parent of that commit? > > Attached. OK... am I right assuming straight jessie/powerpc userland?
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On 06/21/2017 04:22 PM, Al Viro wrote: On Wed, Jun 21, 2017 at 10:10:57AM -0500, Larry Finger wrote: I finally finished the bisection by patching each commit that was affected by the bootstrap crash. The faulty change is commit 3448890c32c32c482c3ec20baa8fdd2ab4f94cc0 ("powerpc: get rid of zeroing, switch to RAW_COPY_USER"). I am very confident in the bisection. As I know nothing of assembly for ppc32, I have not been able to attempt to find the problem with these patches. How about the .config that works on parent of that commit? Attached. Laeey # # Automatically generated file; DO NOT EDIT. # Linux/powerpc 4.11.0-rc1 Kernel Configuration # # CONFIG_PPC64 is not set # # Processor support # CONFIG_PPC_BOOK3S_32=y # CONFIG_PPC_85xx is not set # CONFIG_PPC_8xx is not set # CONFIG_40x is not set # CONFIG_44x is not set # CONFIG_E200 is not set CONFIG_PPC_BOOK3S=y CONFIG_6xx=y CONFIG_PPC_FPU=y CONFIG_ALTIVEC=y CONFIG_PPC_STD_MMU=y CONFIG_PPC_STD_MMU_32=y # CONFIG_PPC_MM_SLICES is not set CONFIG_PPC_HAVE_PMU_SUPPORT=y CONFIG_PPC_PERF_CTRS=y # CONFIG_SMP is not set # CONFIG_PPC_DOORBELL is not set CONFIG_VDSO32=y CONFIG_CPU_BIG_ENDIAN=y CONFIG_PPC32=y CONFIG_32BIT=y # CONFIG_ARCH_PHYS_ADDR_T_64BIT is not set # CONFIG_ARCH_DMA_ADDR_T_64BIT is not set CONFIG_MMU=y # CONFIG_HAVE_SETUP_PER_CPU_AREA is not set # CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK is not set CONFIG_NR_IRQS=512 CONFIG_STACKTRACE_SUPPORT=y CONFIG_TRACE_IRQFLAGS_SUPPORT=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_ARCH_HAS_ILOG2_U32=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_HAS_DMA_SET_COHERENT_MASK=y CONFIG_PPC=y # CONFIG_GENERIC_CSUM is not set CONFIG_EARLY_PRINTK=y CONFIG_PANIC_TIMEOUT=180 CONFIG_GENERIC_NVRAM=y CONFIG_SCHED_OMIT_FRAME_POINTER=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_PPC_UDBG_16550=y # CONFIG_GENERIC_TBSYNC is not set CONFIG_AUDIT_ARCH=y CONFIG_GENERIC_BUG=y # CONFIG_EPAPR_BOOT is not set # CONFIG_DEFAULT_UIMAGE is not set CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y # CONFIG_PPC_DCR_NATIVE is not set # CONFIG_PPC_DCR_MMIO is not set CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_PPC_EMULATE_SSTEP=y CONFIG_PGTABLE_LEVELS=2 CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y # # General setup # CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" # CONFIG_LOCALVERSION_AUTO is not set CONFIG_HAVE_KERNEL_GZIP=y CONFIG_KERNEL_GZIP=y CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_CROSS_MEMORY_ATTACH=y CONFIG_FHANDLE=y CONFIG_USELIB=y CONFIG_AUDIT=y CONFIG_HAVE_ARCH_AUDITSYSCALL=y CONFIG_AUDITSYSCALL=y CONFIG_AUDIT_WATCH=y CONFIG_AUDIT_TREE=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_IRQ_SHOW_LEVEL=y CONFIG_IRQ_DOMAIN=y CONFIG_GENERIC_MSI_IRQ=y # CONFIG_IRQ_DOMAIN_DEBUG is not set CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y CONFIG_GENERIC_TIME_VSYSCALL_OLD=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set CONFIG_BSD_PROCESS_ACCT=y CONFIG_BSD_PROCESS_ACCT_V3=y CONFIG_TASKSTATS=y CONFIG_TASK_DELAY_ACCT=y CONFIG_TASK_XACCT=y CONFIG_TASK_IO_ACCOUNTING=y # # RCU Subsystem # CONFIG_TINY_RCU=y # CONFIG_RCU_EXPERT is not set CONFIG_SRCU=y # CONFIG_TASKS_RCU is not set # CONFIG_RCU_STALL_COMMON is not set # CONFIG_TREE_RCU_TRACE is not set CONFIG_BUILD_BIN2C=y # CONFIG_IKCONFIG is not set CONFIG_LOG_BUF_SHIFT=20 CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13 CONFIG_CGROUPS=y # CONFIG_MEMCG is not set CONFIG_BLK_CGROUP=y # CONFIG_DEBUG_BLK_CGROUP is not set CONFIG_CGROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y # CONFIG_CFS_BANDWIDTH is not set # CONFIG_RT_GROUP_SCHED is not set # CONFIG_CGROUP_PIDS is not set # CONFIG_CGROUP_RDMA is not set CONFIG_CGROUP_FREEZER=y CONFIG_CPUSETS=y CONFIG_PROC_PID_CPUSET=y CONFIG_CGROUP_DEVICE=y CONFIG_CGROUP_CPUACCT=y CONFIG_CGROUP_PERF=y # CONFIG_CGROUP_DEBUG is not set # CONFIG_SOCK_CGROUP_DATA is not set # CONFIG_CHECKPOINT_RESTORE is not set CONFIG_NAMESPACES=y CONFIG_UTS_NS=y CONFIG_IPC_NS=y CONFIG_USER_NS=y CONFIG_PID_NS=y CONFIG_NET_NS=y CONFIG_SCHED_AUTOGROUP=y # CONFIG_SYSFS_DEPRECATED is not set CONFIG_RELAY=y CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE="" CONFIG_RD_GZIP=y CONFIG_RD_BZIP2=y CONFIG_RD_LZMA=y CONFIG_RD_XZ=y CONFIG_RD_LZO=y CONFIG_RD_LZ4=y CONFIG_INITRAMFS_COMPRESSION=".gz" # CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE is not set CONFIG_CC_OPTIMIZE_FOR_SIZE=y CONFIG_SYSCTL=y CONFIG_ANON_INODES=y CONFIG_SYSCTL_EXCEPTION_TRACE=y CONFIG_HAVE_PCSPKR_PLATFORM=y CONFIG_BPF=y # CONFIG_EXPERT is not
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On 06/21/2017 04:22 PM, Al Viro wrote: On Wed, Jun 21, 2017 at 10:10:57AM -0500, Larry Finger wrote: I finally finished the bisection by patching each commit that was affected by the bootstrap crash. The faulty change is commit 3448890c32c32c482c3ec20baa8fdd2ab4f94cc0 ("powerpc: get rid of zeroing, switch to RAW_COPY_USER"). I am very confident in the bisection. As I know nothing of assembly for ppc32, I have not been able to attempt to find the problem with these patches. How about the .config that works on parent of that commit? Attached. Laeey # # Automatically generated file; DO NOT EDIT. # Linux/powerpc 4.11.0-rc1 Kernel Configuration # # CONFIG_PPC64 is not set # # Processor support # CONFIG_PPC_BOOK3S_32=y # CONFIG_PPC_85xx is not set # CONFIG_PPC_8xx is not set # CONFIG_40x is not set # CONFIG_44x is not set # CONFIG_E200 is not set CONFIG_PPC_BOOK3S=y CONFIG_6xx=y CONFIG_PPC_FPU=y CONFIG_ALTIVEC=y CONFIG_PPC_STD_MMU=y CONFIG_PPC_STD_MMU_32=y # CONFIG_PPC_MM_SLICES is not set CONFIG_PPC_HAVE_PMU_SUPPORT=y CONFIG_PPC_PERF_CTRS=y # CONFIG_SMP is not set # CONFIG_PPC_DOORBELL is not set CONFIG_VDSO32=y CONFIG_CPU_BIG_ENDIAN=y CONFIG_PPC32=y CONFIG_32BIT=y # CONFIG_ARCH_PHYS_ADDR_T_64BIT is not set # CONFIG_ARCH_DMA_ADDR_T_64BIT is not set CONFIG_MMU=y # CONFIG_HAVE_SETUP_PER_CPU_AREA is not set # CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK is not set CONFIG_NR_IRQS=512 CONFIG_STACKTRACE_SUPPORT=y CONFIG_TRACE_IRQFLAGS_SUPPORT=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_ARCH_HAS_ILOG2_U32=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_HAS_DMA_SET_COHERENT_MASK=y CONFIG_PPC=y # CONFIG_GENERIC_CSUM is not set CONFIG_EARLY_PRINTK=y CONFIG_PANIC_TIMEOUT=180 CONFIG_GENERIC_NVRAM=y CONFIG_SCHED_OMIT_FRAME_POINTER=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_PPC_UDBG_16550=y # CONFIG_GENERIC_TBSYNC is not set CONFIG_AUDIT_ARCH=y CONFIG_GENERIC_BUG=y # CONFIG_EPAPR_BOOT is not set # CONFIG_DEFAULT_UIMAGE is not set CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y # CONFIG_PPC_DCR_NATIVE is not set # CONFIG_PPC_DCR_MMIO is not set CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_PPC_EMULATE_SSTEP=y CONFIG_PGTABLE_LEVELS=2 CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y # # General setup # CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" # CONFIG_LOCALVERSION_AUTO is not set CONFIG_HAVE_KERNEL_GZIP=y CONFIG_KERNEL_GZIP=y CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_CROSS_MEMORY_ATTACH=y CONFIG_FHANDLE=y CONFIG_USELIB=y CONFIG_AUDIT=y CONFIG_HAVE_ARCH_AUDITSYSCALL=y CONFIG_AUDITSYSCALL=y CONFIG_AUDIT_WATCH=y CONFIG_AUDIT_TREE=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_IRQ_SHOW_LEVEL=y CONFIG_IRQ_DOMAIN=y CONFIG_GENERIC_MSI_IRQ=y # CONFIG_IRQ_DOMAIN_DEBUG is not set CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y CONFIG_GENERIC_TIME_VSYSCALL_OLD=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set CONFIG_BSD_PROCESS_ACCT=y CONFIG_BSD_PROCESS_ACCT_V3=y CONFIG_TASKSTATS=y CONFIG_TASK_DELAY_ACCT=y CONFIG_TASK_XACCT=y CONFIG_TASK_IO_ACCOUNTING=y # # RCU Subsystem # CONFIG_TINY_RCU=y # CONFIG_RCU_EXPERT is not set CONFIG_SRCU=y # CONFIG_TASKS_RCU is not set # CONFIG_RCU_STALL_COMMON is not set # CONFIG_TREE_RCU_TRACE is not set CONFIG_BUILD_BIN2C=y # CONFIG_IKCONFIG is not set CONFIG_LOG_BUF_SHIFT=20 CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13 CONFIG_CGROUPS=y # CONFIG_MEMCG is not set CONFIG_BLK_CGROUP=y # CONFIG_DEBUG_BLK_CGROUP is not set CONFIG_CGROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y # CONFIG_CFS_BANDWIDTH is not set # CONFIG_RT_GROUP_SCHED is not set # CONFIG_CGROUP_PIDS is not set # CONFIG_CGROUP_RDMA is not set CONFIG_CGROUP_FREEZER=y CONFIG_CPUSETS=y CONFIG_PROC_PID_CPUSET=y CONFIG_CGROUP_DEVICE=y CONFIG_CGROUP_CPUACCT=y CONFIG_CGROUP_PERF=y # CONFIG_CGROUP_DEBUG is not set # CONFIG_SOCK_CGROUP_DATA is not set # CONFIG_CHECKPOINT_RESTORE is not set CONFIG_NAMESPACES=y CONFIG_UTS_NS=y CONFIG_IPC_NS=y CONFIG_USER_NS=y CONFIG_PID_NS=y CONFIG_NET_NS=y CONFIG_SCHED_AUTOGROUP=y # CONFIG_SYSFS_DEPRECATED is not set CONFIG_RELAY=y CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE="" CONFIG_RD_GZIP=y CONFIG_RD_BZIP2=y CONFIG_RD_LZMA=y CONFIG_RD_XZ=y CONFIG_RD_LZO=y CONFIG_RD_LZ4=y CONFIG_INITRAMFS_COMPRESSION=".gz" # CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE is not set CONFIG_CC_OPTIMIZE_FOR_SIZE=y CONFIG_SYSCTL=y CONFIG_ANON_INODES=y CONFIG_SYSCTL_EXCEPTION_TRACE=y CONFIG_HAVE_PCSPKR_PLATFORM=y CONFIG_BPF=y # CONFIG_EXPERT is not
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On Wed, Jun 21, 2017 at 10:10:57AM -0500, Larry Finger wrote: > I finally finished the bisection by patching each commit that was affected > by the bootstrap crash. The faulty change is commit > 3448890c32c32c482c3ec20baa8fdd2ab4f94cc0 ("powerpc: get rid of zeroing, > switch to RAW_COPY_USER"). I am very confident in the bisection. > > As I know nothing of assembly for ppc32, I have not been able to attempt to > find the problem with these patches. How about the .config that works on parent of that commit?
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On Wed, Jun 21, 2017 at 10:10:57AM -0500, Larry Finger wrote: > I finally finished the bisection by patching each commit that was affected > by the bootstrap crash. The faulty change is commit > 3448890c32c32c482c3ec20baa8fdd2ab4f94cc0 ("powerpc: get rid of zeroing, > switch to RAW_COPY_USER"). I am very confident in the bisection. > > As I know nothing of assembly for ppc32, I have not been able to attempt to > find the problem with these patches. How about the .config that works on parent of that commit?
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On 06/01/2017 11:39 AM, Larry Finger wrote: On my Powerbook G4 aluminum, kernel 4.12-rc1 fails to boot. I cannot save a copy of the early printk messages and I will need to summarize. The kernel finds the hard driver and recognizes the various partitions. All seems normal until after the "unused kernel memory is freed" and that "This architecture does not have kernel memory protection", which are both normal. The next batch of logged messages are Loading, please wait... udevd[64]: starting version 175 udevd[64]: Unable to receive ctrl message: Bad address. modprobe: chdir(4.12-rc1): No such file or directory udevd[64]: Unable to receive ctrl message: Bad address. Begin: Loading essential drivers ... modprobe: chdir: chdir(4.12.0-rc1): No such file or directory done. Begin: Running /scripts/init-premount ... done. udevd[64]: Begin: Waiting for root file system ... [ 11.651175] random: faast init done done. Gave up waiting for root device. Common problems: - Boot args (cat /proc/cmdline) - Check rootdelay= (did the system wait long enough?) - Check root= (Did the system wait for the right device?) - Missing modules (cat /proc/modules; ls /dev) ALERT! /dev/disk/by-uuid/ does not exist. Dropping to a shell! modprobe: chdir(4.12-rc1): No such file or directory BusyBox v1.20.2 (Debian 1:1.20.0-7) built-in shell (ash) Enter 'help' for a list of built-in commands. /bin/sh: can't access tty: Job control turned off (initramfs) At that point, the system is dead. I have tried bisecting this issue; however, I run into a second problem that crashes the bootstrap. I finally finished the bisection by patching each commit that was affected by the bootstrap crash. The faulty change is commit 3448890c32c32c482c3ec20baa8fdd2ab4f94cc0 ("powerpc: get rid of zeroing, switch to RAW_COPY_USER"). I am very confident in the bisection. As I know nothing of assembly for ppc32, I have not been able to attempt to find the problem with these patches. Larry
Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3
On 06/01/2017 11:39 AM, Larry Finger wrote: On my Powerbook G4 aluminum, kernel 4.12-rc1 fails to boot. I cannot save a copy of the early printk messages and I will need to summarize. The kernel finds the hard driver and recognizes the various partitions. All seems normal until after the "unused kernel memory is freed" and that "This architecture does not have kernel memory protection", which are both normal. The next batch of logged messages are Loading, please wait... udevd[64]: starting version 175 udevd[64]: Unable to receive ctrl message: Bad address. modprobe: chdir(4.12-rc1): No such file or directory udevd[64]: Unable to receive ctrl message: Bad address. Begin: Loading essential drivers ... modprobe: chdir: chdir(4.12.0-rc1): No such file or directory done. Begin: Running /scripts/init-premount ... done. udevd[64]: Begin: Waiting for root file system ... [ 11.651175] random: faast init done done. Gave up waiting for root device. Common problems: - Boot args (cat /proc/cmdline) - Check rootdelay= (did the system wait long enough?) - Check root= (Did the system wait for the right device?) - Missing modules (cat /proc/modules; ls /dev) ALERT! /dev/disk/by-uuid/ does not exist. Dropping to a shell! modprobe: chdir(4.12-rc1): No such file or directory BusyBox v1.20.2 (Debian 1:1.20.0-7) built-in shell (ash) Enter 'help' for a list of built-in commands. /bin/sh: can't access tty: Job control turned off (initramfs) At that point, the system is dead. I have tried bisecting this issue; however, I run into a second problem that crashes the bootstrap. I finally finished the bisection by patching each commit that was affected by the bootstrap crash. The faulty change is commit 3448890c32c32c482c3ec20baa8fdd2ab4f94cc0 ("powerpc: get rid of zeroing, switch to RAW_COPY_USER"). I am very confident in the bisection. As I know nothing of assembly for ppc32, I have not been able to attempt to find the problem with these patches. Larry