Re: rcutorture’s init segfaults in ppc64le VM
Dear Paul

On Thu, Mar 10, 2022 at 4:10 PM Paul Menzel wrote:
>
> Dear Zhouyi,
>
>
> Thank you for still looking into this.
You are very welcome ;-)
>
>
> On 10.03.22 at 03:37, Zhouyi Zhou wrote:
>
> > I try to reproduce the bug in ppc64 VM in Oregon State University
> > using the vmlinux extracted from
> > https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz
> >
> > the ppc64 VM in which I run the qemu without hardware acceleration is:
> > Linux version 5.4.0-100-generic (buildd@bos02-ppc64el-021) (gcc version
> > 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #113-Ubuntu SMP Thu Feb 3 18:43:11
> > UTC 2022 (Ubuntu 5.4.0-100.113-generic 5.4.166)
> >
> >
> > The qemu command I use to test:
> > cd
> > /tmp/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01$
> > $qemu-system-ppc64 -nographic -smp cores=2,threads=1 -net none -M
> > pseries -nodefaults -device spapr-vscsi -serial file:/tmp/console.log
> > -m 512 -kernel ./vmlinux -append "debug_boot_weak_hash panic=-1
> > console=ttyS0 rcutorture.onoff_interval=200
> > rcutorture.onoff_holdoff=30 rcutree.gp_preinit_delay=12
> > rcutree.gp_init_delay=3 rcutree.gp_cleanup_delay=3
> > rcutree.kthread_prio=2 threadirqs tree.use_softirq=0
> > rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15
> > rcutorture.shutdown_secs=1800 rcutorture.test_no_idle_hz=1
> > rcutorture.verbose=1"
> >
> > The console.log is uploaded to:
> > http://154.223.142.244/logs/20220310/console.paul.log
> > The log tells us it is an illegal instruction that causes the trouble:
> > [4.246387][T1] init[1]: illegal instruction (4) at 1002c308 nip
> > 1002c308 lr 10001684 code 1 in init[1000+d]
> > [4.251400][T1] init[1]: code: f90d88c0 f92a0008 f9480008 7c2004ac
> > 2c2d f949 386d88d0 38e8
> > [4.253416][T1] init[1]: code: 41820098 e92d8f98 75290010 4182008c
> > <4401> 2c2d 6000 8902f438
> >
> >
> > Meanwhile, the vmlinux compiled by myself runs smoothly.
>
> How did you build it? Using GCC or clang? I forgot, if the problem was
I built vmlinux(es) using both GCC and clang. The compiled vmlinux(es)
run smoothly.
> only reproducible if the host Linux kernel was built with clang or the
> VM kernel.
Yes, I also remember this. The dependence on how the host Linux kernel
is built makes things more complex.
>
> > Then I modify mkinitrd.sh to let it panic manually:
> > http://154.223.142.244/logs/20220310/mkinitrd.sh
>
> I only see the change:
>
> -
> + int *ptr = 0;
> + *ptr = 0;
>
Yes, I make the segfault happen manually.
> > The log tells us it is a segfault (instead of an illegal instruction):
> > http://154.223.142.244/logs/20220310/console.zhouyi.log
> >
> > Then I use gdb to debug the init in host:
> > ubuntu@zhouzhouyi-1:~/newkernel/linux-next$ gdb
> > tools/testing/selftests/rcutorture/initrd/init
> > (gdb) run
> > Starting program:
> > /home/ubuntu/newkernel/linux-next/tools/testing/selftests/rcutorture/initrd/init
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x1b2c in ?? ()
> > (gdb) x/10i $pc
> > => 0x1b2c: stw r9,0(r9)
> >    0x1b30: trap
> >    0x1b34: .long 0x0
> >    0x1b38: .long 0x0
> >    0x1b3c: .long 0x0
> >    0x1b40: lis r2,4110
> >    0x1b44: addi r2,r2,31488
> >    0x1b48: mr r9,r1
> >    0x1b4c: rldicr r1,r1,0,59
> >    0x1b50: li r0,0
> > (gdb) p $r9
> > $1 = 0
> > (gdb) x/30x $pc - 0x30
> > 0x1afc: 0x38840040 0x387f0040 0xf8010040 0x48026919
> > 0x1b0c: 0x6000 0xe8010040 0x7c0803a6 0x4b24
> > 0x1b1c: 0x 0x0100 0x0180 0x3920
> > 0x1b2c: 0x9129 0x7fe8 0x 0x
> > which matches the hex content of
> > http://154.223.142.244/logs/20220310/console.zhouyi.log:
> > [5.077431][T1] init[1]: segfault (11) at 0 nip 1b2c lr 10001024
> > code 1 in init[1000+d]
> > [5.087167][T1] init[1]: code: 38840040 387f0040 f8010040 48026919
> > 6000 e8010040 7c0803a6 4b24
> > [5.093987][T1] init[1]: code: 0100 0180 3920
> > <9129> 7fe8
> >
> >
> > Conclusions: there might be something wrong when packing the init into
> > vmlinux in your environment.
> >
> > I will continue to do research on this interesting problem with you.
>
> As written I think it’s a problem with LLVM/clang. Unfortunately, I
> won’t be able to retest before next week.
Roger that, no need to hurry ;-)

Kind regards
Zhouyi
> Kind regards,
>
> Paul
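Editorial sketch: the kernel's "code:" lines quoted above mark the word at the faulting address with angle brackets. A minimal shell helper, assuming only that marker convention, pulls that word out of a log line. The name faulting_word is hypothetical and not part of any kernel or rcutorture tool, and the hex words are copied as they appear in this archive, where some of them look shortened.

```shell
#!/bin/sh
# faulting_word (hypothetical helper): print the hex word that the kernel
# wrapped in <...> on an "init[1]: code:" line; print nothing if absent.
faulting_word() {
    printf '%s\n' "$1" | sed -n 's/.*<\([0-9a-fA-F]*\)>.*/\1/p'
}

# Line copied from console.paul.log as quoted in the thread.
faulting_word 'init[1]: code: 41820098 e92d8f98 75290010 4182008c <4401> 2c2d 6000 8902f438'
```

On a saved console log, something like `grep 'code:' console.log` piped line by line through the helper would list the marked word of every reported fault.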
Re: rcutorture’s init segfaults in ppc64le VM
Dear Zhouyi,


Thank you for still looking into this.


On 10.03.22 at 03:37, Zhouyi Zhou wrote:

> I try to reproduce the bug in ppc64 VM in Oregon State University
> using the vmlinux extracted from
> https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz
>
> the ppc64 VM in which I run the qemu without hardware acceleration is:
> Linux version 5.4.0-100-generic (buildd@bos02-ppc64el-021) (gcc version
> 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #113-Ubuntu SMP Thu Feb 3 18:43:11
> UTC 2022 (Ubuntu 5.4.0-100.113-generic 5.4.166)
>
>
> The qemu command I use to test:
> cd
> /tmp/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01$
> $qemu-system-ppc64 -nographic -smp cores=2,threads=1 -net none -M
> pseries -nodefaults -device spapr-vscsi -serial file:/tmp/console.log
> -m 512 -kernel ./vmlinux -append "debug_boot_weak_hash panic=-1
> console=ttyS0 rcutorture.onoff_interval=200
> rcutorture.onoff_holdoff=30 rcutree.gp_preinit_delay=12
> rcutree.gp_init_delay=3 rcutree.gp_cleanup_delay=3
> rcutree.kthread_prio=2 threadirqs tree.use_softirq=0
> rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15
> rcutorture.shutdown_secs=1800 rcutorture.test_no_idle_hz=1
> rcutorture.verbose=1"
>
> The console.log is uploaded to:
> http://154.223.142.244/logs/20220310/console.paul.log
> The log tells us it is an illegal instruction that causes the trouble:
> [4.246387][T1] init[1]: illegal instruction (4) at 1002c308 nip
> 1002c308 lr 10001684 code 1 in init[1000+d]
> [4.251400][T1] init[1]: code: f90d88c0 f92a0008 f9480008 7c2004ac
> 2c2d f949 386d88d0 38e8
> [4.253416][T1] init[1]: code: 41820098 e92d8f98 75290010 4182008c
> <4401> 2c2d 6000 8902f438
>
>
> Meanwhile, the vmlinux compiled by myself runs smoothly.

How did you build it? Using GCC or clang? I forgot whether the problem
was only reproducible if the host Linux kernel was built with clang or
the VM kernel.

> Then I modify mkinitrd.sh to let it panic manually:
> http://154.223.142.244/logs/20220310/mkinitrd.sh

I only see the change:

    -
    + int *ptr = 0;
    + *ptr = 0;

> The log tells us it is a segfault (instead of an illegal instruction):
> http://154.223.142.244/logs/20220310/console.zhouyi.log
>
> Then I use gdb to debug the init in host:
> ubuntu@zhouzhouyi-1:~/newkernel/linux-next$ gdb
> tools/testing/selftests/rcutorture/initrd/init
> (gdb) run
> Starting program:
> /home/ubuntu/newkernel/linux-next/tools/testing/selftests/rcutorture/initrd/init
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x1b2c in ?? ()
> (gdb) x/10i $pc
> => 0x1b2c: stw r9,0(r9)
>    0x1b30: trap
>    0x1b34: .long 0x0
>    0x1b38: .long 0x0
>    0x1b3c: .long 0x0
>    0x1b40: lis r2,4110
>    0x1b44: addi r2,r2,31488
>    0x1b48: mr r9,r1
>    0x1b4c: rldicr r1,r1,0,59
>    0x1b50: li r0,0
> (gdb) p $r9
> $1 = 0
> (gdb) x/30x $pc - 0x30
> 0x1afc: 0x38840040 0x387f0040 0xf8010040 0x48026919
> 0x1b0c: 0x6000 0xe8010040 0x7c0803a6 0x4b24
> 0x1b1c: 0x 0x0100 0x0180 0x3920
> 0x1b2c: 0x9129 0x7fe8 0x 0x
> which matches the hex content of
> http://154.223.142.244/logs/20220310/console.zhouyi.log:
> [5.077431][T1] init[1]: segfault (11) at 0 nip 1b2c lr 10001024
> code 1 in init[1000+d]
> [5.087167][T1] init[1]: code: 38840040 387f0040 f8010040 48026919
> 6000 e8010040 7c0803a6 4b24
> [5.093987][T1] init[1]: code: 0100 0180 3920
> <9129> 7fe8
>
>
> Conclusions: there might be something wrong when packing the init into
> vmlinux in your environment.
>
> I will continue to do research on this interesting problem with you.

As written I think it’s a problem with LLVM/clang. Unfortunately, I
won’t be able to retest before next week.


Kind regards,

Paul
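Editorial sketch: gdb reports the forced fault at `stw r9,0(r9)` with r9 equal to 0, that is, a 32-bit store to address 0. In the Power ISA D-form layout that instruction encodes as 0x91290000 (primary opcode 36 is stw); the "9129" shown in the dumps above is consistent with that word if its trailing zeros were lost in archiving, which is an assumption. The helper below is hypothetical (decode_dform is not an existing tool) and only splits a word into its D-form fields.

```shell
#!/bin/sh
# decode_dform (hypothetical helper): split a 32-bit PowerPC D-form word
# into primary opcode, RS/RT, RA and the signed 16-bit displacement.
decode_dform() {
    word=$(( $1 ))
    opcode=$(( (word >> 26) & 0x3f ))
    rs=$(( (word >> 21) & 0x1f ))
    ra=$(( (word >> 16) & 0x1f ))
    d=$(( word & 0xffff ))
    # Sign-extend the 16-bit displacement.
    if [ "$d" -ge 32768 ]; then
        d=$(( d - 65536 ))
    fi
    echo "opcode=$opcode rs=$rs ra=$ra d=$d"
}

decode_dform 0x91290000   # stw r9,0(r9): opcode 36, RS=9, RA=9, D=0
decode_dform 0x38427b00   # addi r2,r2,31488: opcode 14, RT=2, RA=2
```

The sketch covers D-form only. DS-form instructions such as stdu keep an extended opcode in the low two bits, so their displacement would need different handling.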
Re: rcutorture’s init segfaults in ppc64le VM
On Thu, Mar 10, 2022 at 10:37:12AM +0800, Zhouyi Zhou wrote:
> Dear Paul
>
> I try to reproduce the bug in ppc64 VM in Oregon State University
> using the vmlinux extracted from
> https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz
>
> the ppc64 VM in which I run the qemu without hardware acceleration is:
> Linux version 5.4.0-100-generic (buildd@bos02-ppc64el-021) (gcc
> version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #113-Ubuntu SMP Thu Feb
> 3 18:43:11 UTC 2022 (Ubuntu 5.4.0-100.113-generic 5.4.166)
>
>
> The qemu command I use to test:
> cd
> /tmp/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01$
> $qemu-system-ppc64 -nographic -smp cores=2,threads=1 -net none -M
> pseries -nodefaults -device spapr-vscsi -serial file:/tmp/console.log
> -m 512 -kernel ./vmlinux -append "debug_boot_weak_hash panic=-1
> console=ttyS0 rcutorture.onoff_interval=200
> rcutorture.onoff_holdoff=30 rcutree.gp_preinit_delay=12
> rcutree.gp_init_delay=3 rcutree.gp_cleanup_delay=3
> rcutree.kthread_prio=2 threadirqs tree.use_softirq=0
> rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15
> rcutorture.shutdown_secs=1800 rcutorture.test_no_idle_hz=1
> rcutorture.verbose=1"
>
> The console.log is uploaded to:
> http://154.223.142.244/logs/20220310/console.paul.log
> The log tells us it is an illegal instruction that causes the trouble:
> [4.246387][T1] init[1]: illegal instruction (4) at 1002c308
> nip 1002c308 lr 10001684 code 1 in init[1000+d]
> [4.251400][T1] init[1]: code: f90d88c0 f92a0008 f9480008
> 7c2004ac 2c2d f949 386d88d0 38e8
> [4.253416][T1] init[1]: code: 41820098 e92d8f98 75290010
> 4182008c <4401> 2c2d 6000 8902f438
>
>
> Meanwhile, the vmlinux compiled by myself runs smoothly.
>
> Then I modify mkinitrd.sh to let it panic manually:
> http://154.223.142.244/logs/20220310/mkinitrd.sh
> The log tells us it is a segfault (instead of an illegal instruction):
> http://154.223.142.244/logs/20220310/console.zhouyi.log
>
> Then I use gdb to debug the init in host:
> ubuntu@zhouzhouyi-1:~/newkernel/linux-next$ gdb
> tools/testing/selftests/rcutorture/initrd/init
> (gdb) run
> Starting program:
> /home/ubuntu/newkernel/linux-next/tools/testing/selftests/rcutorture/initrd/init
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x1b2c in ?? ()
> (gdb) x/10i $pc
> => 0x1b2c: stw r9,0(r9)
>    0x1b30: trap
>    0x1b34: .long 0x0
>    0x1b38: .long 0x0
>    0x1b3c: .long 0x0
>    0x1b40: lis r2,4110
>    0x1b44: addi r2,r2,31488
>    0x1b48: mr r9,r1
>    0x1b4c: rldicr r1,r1,0,59
>    0x1b50: li r0,0
> (gdb) p $r9
> $1 = 0
> (gdb) x/30x $pc - 0x30
> 0x1afc: 0x38840040 0x387f0040 0xf8010040 0x48026919
> 0x1b0c: 0x6000 0xe8010040 0x7c0803a6 0x4b24
> 0x1b1c: 0x 0x0100 0x0180 0x3920
> 0x1b2c: 0x9129 0x7fe8 0x 0x
> which matches the hex content of
> http://154.223.142.244/logs/20220310/console.zhouyi.log:
> [5.077431][T1] init[1]: segfault (11) at 0 nip 1b2c lr
> 10001024 code 1 in init[1000+d]
> [5.087167][T1] init[1]: code: 38840040 387f0040 f8010040
> 48026919 6000 e8010040 7c0803a6 4b24
> [5.093987][T1] init[1]: code: 0100 0180
> 3920 <9129> 7fe8
>
>
> Conclusions: there might be something wrong when packing the init into
> vmlinux in your environment.

Quite possibly! Or the compiler might not be being invoked properly
by the mkinitrd.sh script.

> I will continue to do research on this interesting problem with you.

Please let me know how it goes!

Thanx, Paul

> Thanks
> Kind Regards
> Zhouyi
>
>
>
> On Tue, Feb 8, 2022 at 8:12 PM Paul Menzel wrote:
> >
> > Dear Michael,
> >
> >
> > Thank you for looking into this.
> >
> >
> > On 08.02.22 at 11:09, Michael Ellerman wrote:
> > > Paul Menzel writes:
> >
> > […]
> >
> > >> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
> > >> 5.17-rc2+ with rcutorture tests
> > >
> > > I'm not sure if that's the host kernel version or the version you're
> > > using of rcutorture? Can you tell us the sha1 of your host kernel and of
> > > the tree you're running rcutorture from?
> >
> > The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
> > I am unable to find the exact sha1.
> >
> >     $ more /proc/version
> >     Linux version 5.17.0-rc1+
> > (pmen...@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu
> > clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28
> > 17:13:04 CET 2022
> >
> > The Linux tree, from where I run rcutorture from, is at commit
> > dfd42facf1e4 (Linux 5.17-rc3) with
Re: rcutorture’s init segfaults in ppc64le VM
Dear Paul

I try to reproduce the bug in ppc64 VM in Oregon State University
using the vmlinux extracted from
https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz

the ppc64 VM in which I run the qemu without hardware acceleration is:
Linux version 5.4.0-100-generic (buildd@bos02-ppc64el-021) (gcc
version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #113-Ubuntu SMP Thu Feb
3 18:43:11 UTC 2022 (Ubuntu 5.4.0-100.113-generic 5.4.166)


The qemu command I use to test:
cd
/tmp/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01$
$qemu-system-ppc64 -nographic -smp cores=2,threads=1 -net none -M
pseries -nodefaults -device spapr-vscsi -serial file:/tmp/console.log
-m 512 -kernel ./vmlinux -append "debug_boot_weak_hash panic=-1
console=ttyS0 rcutorture.onoff_interval=200
rcutorture.onoff_holdoff=30 rcutree.gp_preinit_delay=12
rcutree.gp_init_delay=3 rcutree.gp_cleanup_delay=3
rcutree.kthread_prio=2 threadirqs tree.use_softirq=0
rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15
rcutorture.shutdown_secs=1800 rcutorture.test_no_idle_hz=1
rcutorture.verbose=1"

The console.log is uploaded to:
http://154.223.142.244/logs/20220310/console.paul.log
The log tells us it is an illegal instruction that causes the trouble:
[4.246387][T1] init[1]: illegal instruction (4) at 1002c308 nip 1002c308 lr 10001684 code 1 in init[1000+d]
[4.251400][T1] init[1]: code: f90d88c0 f92a0008 f9480008 7c2004ac 2c2d f949 386d88d0 38e8
[4.253416][T1] init[1]: code: 41820098 e92d8f98 75290010 4182008c <4401> 2c2d 6000 8902f438


Meanwhile, the vmlinux compiled by myself runs smoothly.

Then I modify mkinitrd.sh to let it panic manually:
http://154.223.142.244/logs/20220310/mkinitrd.sh
The log tells us it is a segfault (instead of an illegal instruction):
http://154.223.142.244/logs/20220310/console.zhouyi.log

Then I use gdb to debug the init in host:
ubuntu@zhouzhouyi-1:~/newkernel/linux-next$ gdb tools/testing/selftests/rcutorture/initrd/init
(gdb) run
Starting program: /home/ubuntu/newkernel/linux-next/tools/testing/selftests/rcutorture/initrd/init

Program received signal SIGSEGV, Segmentation fault.
0x1b2c in ?? ()
(gdb) x/10i $pc
=> 0x1b2c: stw r9,0(r9)
   0x1b30: trap
   0x1b34: .long 0x0
   0x1b38: .long 0x0
   0x1b3c: .long 0x0
   0x1b40: lis r2,4110
   0x1b44: addi r2,r2,31488
   0x1b48: mr r9,r1
   0x1b4c: rldicr r1,r1,0,59
   0x1b50: li r0,0
(gdb) p $r9
$1 = 0
(gdb) x/30x $pc - 0x30
0x1afc: 0x38840040 0x387f0040 0xf8010040 0x48026919
0x1b0c: 0x6000 0xe8010040 0x7c0803a6 0x4b24
0x1b1c: 0x 0x0100 0x0180 0x3920
0x1b2c: 0x9129 0x7fe8 0x 0x
which matches the hex content of
http://154.223.142.244/logs/20220310/console.zhouyi.log:
[5.077431][T1] init[1]: segfault (11) at 0 nip 1b2c lr 10001024 code 1 in init[1000+d]
[5.087167][T1] init[1]: code: 38840040 387f0040 f8010040 48026919 6000 e8010040 7c0803a6 4b24
[5.093987][T1] init[1]: code: 0100 0180 3920 <9129> 7fe8


Conclusions: there might be something wrong when packing the init into
vmlinux in your environment.

I will continue to do research on this interesting problem with you.

Thanks
Kind Regards
Zhouyi



On Tue, Feb 8, 2022 at 8:12 PM Paul Menzel wrote:
>
> Dear Michael,
>
>
> Thank you for looking into this.
>
>
> On 08.02.22 at 11:09, Michael Ellerman wrote:
> > Paul Menzel writes:
>
> […]
>
> >> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
> >> 5.17-rc2+ with rcutorture tests
> >
> > I'm not sure if that's the host kernel version or the version you're
> > using of rcutorture? Can you tell us the sha1 of your host kernel and of
> > the tree you're running rcutorture from?
>
> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
> I am unable to find the exact sha1.
>
>     $ more /proc/version
>     Linux version 5.17.0-rc1+
> (pmen...@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu
> clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28
> 17:13:04 CET 2022
>
> The Linux tree, from where I run rcutorture from, is at commit
> dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:
>
>     $ git log --oneline -6
>     207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems
> with rcutorture on ppc64le: allmodconfig(2) and other failures
>     8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
>     a447541d925f ata: libata-sata: remove debounce delay by default
>     afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
>     f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
>     dfd42facf1e4
Re: rcutorture’s init segfaults in ppc64le VM
Dear Michael,


On 11.02.22 at 15:19, Paul Menzel wrote:

> On 11.02.22 at 02:48, Michael Ellerman wrote:
>> Paul Menzel writes:
>>
>>> On 08.02.22 at 11:09, Michael Ellerman wrote:
>>>> Paul Menzel writes:
>>>
>>> […]
>>>
>>>>> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
>>>>> 5.17-rc2+ with rcutorture tests
>>>>
>>>> I'm not sure if that's the host kernel version or the version you're
>>>> using of rcutorture? Can you tell us the sha1 of your host kernel and of
>>>> the tree you're running rcutorture from?
>>>
>>> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
>>> I am unable to find the exact sha1.
>>>
>>>     $ more /proc/version
>>>     Linux version 5.17.0-rc1+ (x...@eddb.molgen.mpg.de) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022
>>
>> OK. In general rc1 kernels can have issues, so it might be worth
>> rebooting the host into either v5.17-rc3 or a distro or stable kernel.
>> Just to rule out any issues on the host.
>
> Yes, that was a good test. It works with Ubuntu’s 5.13 Linux kernel.
>
>     $ more /proc/version
>     Linux version 5.13.0-28-generic (buildd@bos02-ppc64el-013) (gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.37) #31-Ubuntu SMP Thu Jan 13 17:40:19 UTC 2022
>
> I have to do more tests, but it could be LLVM/clang related.

Building commit f1baf68e1383 (Merge tag 'net-5.17-rc4' of
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net) with the ata
patches on top with GCC, I am unable to reproduce the issue.

Before, I built it with

    make -j100 LLVM=1 LLVM_IAS=0 bindeb-pkg

[…]


Kind regards,

Paul
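Editorial sketch: the comparison above rests on which compiler built the host kernel, and /proc/version states it. A small, hypothetical helper (kernel_compiler is not an existing tool) classifies such a string; the sample strings are the two quoted in this message, including the address abbreviation exactly as archived.

```shell
#!/bin/sh
# kernel_compiler (hypothetical helper): classify a /proc/version string by
# the compiler named in it. clang is tested first on purpose, as a guard in
# case a clang banner also mentions gcc.
kernel_compiler() {
    case "$1" in
        *clang*) echo clang ;;
        *gcc*)   echo gcc ;;
        *)       echo unknown ;;
    esac
}

# On a live system: kernel_compiler "$(cat /proc/version)"
kernel_compiler 'Linux version 5.17.0-rc1+ (x...@eddb.molgen.mpg.de) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022'
kernel_compiler 'Linux version 5.13.0-28-generic (buildd@bos02-ppc64el-013) (gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.37) #31-Ubuntu SMP Thu Jan 13 17:40:19 UTC 2022'
```

Recording this value next to each test run would make it easier to see whether the failure follows the clang-built host kernel.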
Re: rcutorture’s init segfaults in ppc64le VM
Dear Michael,


On 11.02.22 at 02:48, Michael Ellerman wrote:
> Paul Menzel writes:
>
>> On 08.02.22 at 11:09, Michael Ellerman wrote:
>>> Paul Menzel writes:
>>
>> […]
>>
>>>> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
>>>> 5.17-rc2+ with rcutorture tests
>>>
>>> I'm not sure if that's the host kernel version or the version you're
>>> using of rcutorture? Can you tell us the sha1 of your host kernel and of
>>> the tree you're running rcutorture from?
>>
>> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
>> I am unable to find the exact sha1.
>>
>>     $ more /proc/version
>>     Linux version 5.17.0-rc1+ (x...@eddb.molgen.mpg.de) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022
>
> OK. In general rc1 kernels can have issues, so it might be worth
> rebooting the host into either v5.17-rc3 or a distro or stable kernel.
> Just to rule out any issues on the host.

Yes, that was a good test. It works with Ubuntu’s 5.13 Linux kernel.

    $ more /proc/version
    Linux version 5.13.0-28-generic (buildd@bos02-ppc64el-013) (gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.37) #31-Ubuntu SMP Thu Jan 13 17:40:19 UTC 2022

I have to do more tests, but it could be LLVM/clang related.

>> The Linux tree, from where I run rcutorture from, is at commit
>> dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:
>>
>>     $ git log --oneline -6
>>     207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems with rcutorture on ppc64le: allmodconfig(2) and other failures
>>     8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
>>     a447541d925f ata: libata-sata: remove debounce delay by default
>>     afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
>>     f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
>>     dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3
>>
>>>>     $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
>>>>
>>>> the built init
>>>>
>>>>     $ file tools/testing/selftests/rcutorture/initrd/init
>>>>     tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped
>>>
>>> Mine looks pretty much identical:
>>>
>>>     $ file tools/testing/selftests/rcutorture/initrd/init
>>>     tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped
>>>
>>>> segfaults in QEMU. From one of the log files
>>>
>>> But mine doesn't segfault, it runs fine and the test completes.
>>>
>>> What qemu version are you using?
>>>
>>> I tried 4.2.1 and 6.2.0, both worked.
>>
>>     $ qemu-system-ppc64le --version
>>     QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
>>     Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
>
> OK, that's one difference between our setups, but I'd be surprised if it
> explains this bug, but I guess anything's possible.
>
>>>> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
>>
>> Sorry, that was the wrong path/test. The correct one for the excerpt
>> below is:
>>
>> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log
>>
>> (For TREE03, QEMU does not start the Linux kernel at all, that means no
>> output after:
>>
>>     Booting Linux via __start() @ 0x0040 ...
>
> OK yeah I see that too. Removing "threadirqs" from
> tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot seems to fix
> it.

Nice find. I have no idea what that means, though.

> I still see some preempt related warnings, we clearly have some bugs
> with preempt enabled.
>
>> You can now download the content of
>> `/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01`
>> [1, 65 MB].
>>
>> Can you reproduce the segmentation fault with the line below?
>>
>>     $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8 \
>>         -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial stdio -m 512 \
>>         -kernel /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux \
>>         -append "debug_boot_weak_hash panic=-1 console=ttyS0 \
>>         torture.disable_onoff_at_boot locktorture.onoff_interval=3 \
>>         locktorture.onoff_holdoff=30 locktorture.stat_interval=15 \
>>         locktorture.shutdown_secs=60 locktorture.verbose=1"
>
> That works fine for me, boots and runs the test, then shuts down.
>
> I assume you see the segfault on every boot, not intermittently?
>
> So the differences between our setups are the host kernel and the qemu
> version. Can you try a different host kernel easily?
>
> The other thing would be to try a different qemu version, you might need
> to build from source,
Re: rcutorture’s init segfaults in ppc64le VM
Paul Menzel writes:
> On 08.02.22 at 11:09, Michael Ellerman wrote:
>> Paul Menzel writes:
>
> […]
>
>>> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
>>> 5.17-rc2+ with rcutorture tests
>>
>> I'm not sure if that's the host kernel version or the version you're
>> using of rcutorture? Can you tell us the sha1 of your host kernel and of
>> the tree you're running rcutorture from?
>
> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
> I am unable to find the exact sha1.
>
>     $ more /proc/version
>     Linux version 5.17.0-rc1+
> (pmen...@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu
> clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28
> 17:13:04 CET 2022

OK. In general rc1 kernels can have issues, so it might be worth
rebooting the host into either v5.17-rc3 or a distro or stable kernel.
Just to rule out any issues on the host.

> The Linux tree, from where I run rcutorture from, is at commit
> dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:
>
>     $ git log --oneline -6
>     207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems
> with rcutorture on ppc64le: allmodconfig(2) and other failures
>     8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
>     a447541d925f ata: libata-sata: remove debounce delay by default
>     afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
>     f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
>     dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3
>
>>>     $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
>>>
>>> the built init
>>>
>>>     $ file tools/testing/selftests/rcutorture/initrd/init
>>>     tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB
>>> executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically
>>> linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for
>>> GNU/Linux 3.10.0, stripped
>>
>> Mine looks pretty much identical:
>>
>>    $ file tools/testing/selftests/rcutorture/initrd/init
>>    tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB
>> executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically
>> linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for
>> GNU/Linux 3.10.0, stripped
>>
>>> segfaults in QEMU. From one of the log files
>>
>> But mine doesn't segfault, it runs fine and the test completes.
>>
>> What qemu version are you using?
>>
>> I tried 4.2.1 and 6.2.0, both worked.
>
>     $ qemu-system-ppc64le --version
>     QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
>     Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

OK, that's one difference between our setups, but I'd be surprised if it
explains this bug, but I guess anything's possible.

>>> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
>
> Sorry, that was the wrong path/test. The correct one for the excerpt
> below is:
>
>
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log
>
> (For TREE03, QEMU does not start the Linux kernel at all, that means no
> output after:
>
>     Booting Linux via __start() @ 0x0040 ...

OK yeah I see that too. Removing "threadirqs" from
tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot seems to fix
it.

I still see some preempt related warnings, we clearly have some bugs
with preempt enabled.

> You can now download the content of
> `/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01`
>
> [1, 65 MB].
>
> Can you reproduce the segmentation fault with the line below?
>
>     $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8
> -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial
> stdio -m 512 -kernel
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux
>
> -append "debug_boot_weak_hash panic=-1 console=ttyS0
> torture.disable_onoff_at_boot locktorture.onoff_interval=3
> locktorture.onoff_holdoff=30 locktorture.stat_interval=15
> locktorture.shutdown_secs=60 locktorture.verbose=1"

That works fine for me, boots and runs the test, then shuts down.

I assume you see the segfault on every boot, not intermittently?

So the differences between our setups are the host kernel and the qemu
version. Can you try a different host kernel easily?

The other thing would be to try a different qemu version, you might need
to build from source, but it's not that hard :)

cheers
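Editorial sketch: the message above reports that removing "threadirqs" from the TREE03 boot arguments lets that scenario boot. The hypothetical helper below (drop_boot_arg is not part of the rcutorture scripts) removes one exact argument from a kernel command line so the same arguments can be tried with and without it. The sample arguments come from the QEMU command quoted elsewhere in this thread, not from the actual contents of TREE03.boot, which are not shown here.

```shell
#!/bin/sh
# drop_boot_arg (hypothetical helper): print a kernel command line with
# every exact occurrence of one argument removed.
# Usage: drop_boot_arg ARG "arg1 arg2 ..."
drop_boot_arg() {
    arg=$1
    shift
    out=
    # Unquoted $* is intentional: split the command line on whitespace.
    for word in $*; do
        if [ "$word" != "$arg" ]; then
            out="${out:+$out }$word"
        fi
    done
    printf '%s\n' "$out"
}

drop_boot_arg threadirqs "rcutree.kthread_prio=2 threadirqs tree.use_softirq=0"
```

Only whole-word matches are removed, so an argument such as a hypothetical `threadirqs=1` would be left untouched.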
Re: rcutorture’s init segfaults in ppc64le VM
[Correct sha1 for test for 2022.02.01-21.52.37]

On 08.02.22 at 13:12, Paul Menzel wrote:
> Dear Michael,
>
>
> Thank you for looking into this.
>
>
> On 08.02.22 at 11:09, Michael Ellerman wrote:
>> Paul Menzel writes:
>
> […]
>
>>> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
>>> 5.17-rc2+ with rcutorture tests
>>
>> I'm not sure if that's the host kernel version or the version you're
>> using of rcutorture? Can you tell us the sha1 of your host kernel and of
>> the tree you're running rcutorture from?
>
> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
> I am unable to find the exact sha1.
>
>     $ more /proc/version
>     Linux version 5.17.0-rc1+ (pmen...@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022
>
> The Linux tree, from where I run rcutorture from, is at commit
> dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:
>
>     $ git log --oneline -6
>     207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems with rcutorture on ppc64le: allmodconfig(2) and other failures
>     8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
>     a447541d925f ata: libata-sata: remove debounce delay by default
>     afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
>     f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
>     dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3

I was able to reproduce this with the above, but the report and the
attached logs at the end are from:

    $ git log --oneline -6 b37a34a8cf5a
    b37a34a8cf5a Problems with rcutorture on ppc64le: allmodconfig(2) and other failures
    9a78ddead89a ata: libata-sata: improve sata_link_debounce()
    567da2eaf099 ata: libata-sata: remove debounce delay by default
    70ae61851660 ata: libata-sata: introduce struct sata_deb_timing
    9ebb6433d9c3 ata: libata-sata: Simplify sata_link_resume() interface
    26291c54e111 (tag: v5.17-rc2) Linux 5.17-rc2

>>>     $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
>>>
>>> the built init
>>>
>>>     $ file tools/testing/selftests/rcutorture/initrd/init
>>>     tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped
>>
>> Mine looks pretty much identical:
>>
>>     $ file tools/testing/selftests/rcutorture/initrd/init
>>     tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped
>>
>>> segfaults in QEMU. From one of the log files
>>
>> But mine doesn't segfault, it runs fine and the test completes.
>>
>> What qemu version are you using?
>>
>> I tried 4.2.1 and 6.2.0, both worked.
>
>     $ qemu-system-ppc64le --version
>     QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
>     Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
>
>>> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
>
> Sorry, that was the wrong path/test. The correct one for the excerpt
> below is:
>
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log
>
> (For TREE03, QEMU does not start the Linux kernel at all, that means no
> output after:
>
>     Booting Linux via __start() @ 0x0040 ...
>
> )
>
>>>     [ 1.119803][ T1] Run /init as init process
>>>     [ 1.122011][ T1] init[1]: segfault (11) at f0656d90 nip 1a18 lr 0 code 1 in init[1000+d]
>>>     [ 1.124863][ T1] init[1]: code: 2c2903e7 f9210030 4081ff84 4b58 0100 0580 3c40100f
>>>     [ 1.128823][ T1] init[1]: code: 38427c00 7c290b78 782106e4 3800 7c0803a6 f801 e9028010
>>>
>>> The disassembly from 3c40100f is:
>>>
>>>     lis r2,4111
>>>     addi r2,r2,31744
>>>     mr r9,r1
>>>     rldicr r1,r1,0,59
>>>     li r0,0
>>>     stdu r1,-128(r1) <- fault
>>>     mtlr r0
>>>     std r0,0(r1)
>>>     ld r8,-32752(r2)
>>
>> I think you'll find that's the code at the ELF entry point. You can
>> check with:
>>
>>     $ readelf -e tools/testing/selftests/rcutorture/initrd/init | grep Entry
>>       Entry point address: 0x1c0c
>>
>>     $ objdump -d tools/testing/selftests/rcutorture/initrd/init | grep -m 1 -A 8 1c0c
>>     1c0c: 0e 10 40 3c lis r2,4110
>>     1c10: 00 7b 42 38 addi r2,r2,31488
>>     1c14: 78 0b 29 7c mr r9,r1
>>     1c18: e4 06 21 78 rldicr r1,r1,0,59
>>     1c1c: 00 00 00 38 li r0,0
>>     1c20: 81 ff 21 f8 stdu r1,-128(r1)
>>     1c24: a6 03 08 7c mtlr r0
>>     1c28: 00 00 01 f8 std r0,0(r1)
>>     1c2c: 10 80 02 e9 ld r8,-32752(r2)
>>
>> The fault you're seeing is the first store
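Editorial sketch: the entry-point code quoted above loads a base address into r2 with a `lis`/`addi` pair and rounds the stack pointer down with `rldicr r1,r1,0,59` before the first store. The arithmetic can be checked by hand: `lis` places its immediate in the upper half of the low 32 bits, `addi` adds a signed 16-bit immediate, and `rldicr` with a zero shift and mask end 59 clears the low four bits. The helper names lis_addi and align16 are hypothetical, and the stack value passed to align16 is an arbitrary illustration, not a value from the logs.

```shell
#!/bin/sh
# lis_addi and align16 are hypothetical helper names for this sketch.

# Value left in a register by "lis rX,HI" followed by "addi rX,rX,LO".
# Both immediates in the thread are positive, so no sign handling is needed.
lis_addi() {
    printf '0x%x\n' $(( ($1 << 16) + $2 ))
}

# Effect of "rldicr r1,r1,0,59": clear the low 4 bits (16-byte alignment).
align16() {
    printf '0x%x\n' $(( $1 & ~0xf ))
}

lis_addi 4110 31488    # pair from the objdump listing
lis_addi 4111 31744    # pair from the console-log disassembly
align16 0x7fffffffe8   # illustrative stack pointer value
```

The two pairs give 0x100e7b00 and 0x100f7c00, which agrees with the two init binaries having different BuildIDs. The second pair also matches the logged words 3c40100f and 38427c00.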
Re: rcutorture’s init segfaults in ppc64le VM
Dear Michael,


Thank you for looking into this.


On 08.02.22 at 11:09, Michael Ellerman wrote:
> Paul Menzel writes:

[…]

> > On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
> > 5.17-rc2+ with rcutorture tests
>
> I'm not sure if that's the host kernel version or the version you're
> using of rcutorture? Can you tell us the sha1 of your host kernel and of
> the tree you're running rcutorture from?

The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately, I
am unable to find the exact sha1.

    $ more /proc/version
    Linux version 5.17.0-rc1+ (pmen...@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022

The Linux tree from which I run rcutorture is at commit dfd42facf1e4
(Linux 5.17-rc3) with four patches on top:

    $ git log --oneline -6
    207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems with rcutorture on ppc64le: allmodconfig(2) and other failures
    8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
    a447541d925f ata: libata-sata: remove debounce delay by default
    afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
    f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
    dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3

> >     $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
> >
> > the built init
> >
> >     $ file tools/testing/selftests/rcutorture/initrd/init
> >     tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped
>
> Mine looks pretty much identical:
>
>     $ file tools/testing/selftests/rcutorture/initrd/init
>     tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped
>
> > segfaults in QEMU. From one of the log files
>
> But mine doesn't segfault, it runs fine and the test completes.
>
> What qemu version are you using?
>
> I tried 4.2.1 and 6.2.0, both worked.

    $ qemu-system-ppc64le --version
    QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
    Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

> > /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log

Sorry, that was the wrong path/test. The correct one for the excerpt
below is:

    /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log

(For TREE03, QEMU does not start the Linux kernel at all, that means no
output after:

    Booting Linux via __start() @ 0x0040 ...

)

> > [1.119803][T1] Run /init as init process
> > [1.122011][T1] init[1]: segfault (11) at f0656d90 nip 1a18 lr 0 code 1 in init[1000+d]
> > [1.124863][T1] init[1]: code: 2c2903e7 f9210030 4081ff84 4b58 0100 0580 3c40100f
> > [1.128823][T1] init[1]: code: 38427c00 7c290b78 782106e4 3800 7c0803a6 f801 e9028010
>
> The disassembly from 3c40100f is:
>
>     lis     r2,4111
>     addi    r2,r2,31744
>     mr      r9,r1
>     rldicr  r1,r1,0,59
>     li      r0,0
>     stdu    r1,-128(r1)     <- fault
>     mtlr    r0
>     std     r0,0(r1)
>     ld      r8,-32752(r2)
>
> I think you'll find that's the code at the ELF entry point.
>
> You can check with:
>
>     $ readelf -e tools/testing/selftests/rcutorture/initrd/init | grep Entry
>       Entry point address:               0x1c0c
>
>     $ objdump -d tools/testing/selftests/rcutorture/initrd/init | grep -m 1 -A 8 1c0c
>         1c0c:   0e 10 40 3c     lis     r2,4110
>         1c10:   00 7b 42 38     addi    r2,r2,31488
>         1c14:   78 0b 29 7c     mr      r9,r1
>         1c18:   e4 06 21 78     rldicr  r1,r1,0,59
>         1c1c:   00 00 00 38     li      r0,0
>         1c20:   81 ff 21 f8     stdu    r1,-128(r1)
>         1c24:   a6 03 08 7c     mtlr    r0
>         1c28:   00 00 01 f8     std     r0,0(r1)
>         1c2c:   10 80 02 e9     ld      r8,-32752(r2)
>
> The fault you're seeing is the first store using the stack pointer (r1),
> which is set up by the kernel. The fault address f0656d90 is weirdly low,
> the stack should be up near 128TB. I'm not sure how we end up with a bad
> r1.
>
> Can you dump some info about the kernel that was built, something like:
>
>     $ file /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/vmlinux
>
> And maybe paste/attach the full log, maybe there's a clue somewhere.

You can now download the content of
`/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01`
[1, 65 MB].

Can you reproduce the segmentation fault with the line
Re: rcutorture’s init segfaults in ppc64le VM
Paul Menzel writes:
> Dear Linux folks,

Hi Paul,

> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
> 5.17-rc2+ with rcutorture tests

I'm not sure if that's the host kernel version or the version you're
using of rcutorture? Can you tell us the sha1 of your host kernel and of
the tree you're running rcutorture from?

>     $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
>
> the built init
>
>     $ file tools/testing/selftests/rcutorture/initrd/init
> tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB
> executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically
> linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for
> GNU/Linux 3.10.0, stripped

Mine looks pretty much identical:

    $ file tools/testing/selftests/rcutorture/initrd/init
    tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped

> segfaults in QEMU. From one of the log files

But mine doesn't segfault, it runs fine and the test completes.

What qemu version are you using?

I tried 4.2.1 and 6.2.0, both worked.

> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
>
> [1.119803][T1] Run /init as init process
> [1.122011][T1] init[1]: segfault (11) at f0656d90 nip 1a18
> lr 0 code 1 in init[1000+d]
> [1.124863][T1] init[1]: code: 2c2903e7 f9210030 4081ff84
> 4b58 0100 0580 3c40100f
> [1.128823][T1] init[1]: code: 38427c00 7c290b78 782106e4
> 3800 7c0803a6 f801 e9028010

The disassembly from 3c40100f is:

    lis     r2,4111
    addi    r2,r2,31744
    mr      r9,r1
    rldicr  r1,r1,0,59
    li      r0,0
    stdu    r1,-128(r1)     <- fault
    mtlr    r0
    std     r0,0(r1)
    ld      r8,-32752(r2)

I think you'll find that's the code at the ELF entry point.

You can check with:

    $ readelf -e tools/testing/selftests/rcutorture/initrd/init | grep Entry
      Entry point address:               0x1c0c

    $ objdump -d tools/testing/selftests/rcutorture/initrd/init | grep -m 1 -A 8 1c0c
        1c0c:   0e 10 40 3c     lis     r2,4110
        1c10:   00 7b 42 38     addi    r2,r2,31488
        1c14:   78 0b 29 7c     mr      r9,r1
        1c18:   e4 06 21 78     rldicr  r1,r1,0,59
        1c1c:   00 00 00 38     li      r0,0
        1c20:   81 ff 21 f8     stdu    r1,-128(r1)
        1c24:   a6 03 08 7c     mtlr    r0
        1c28:   00 00 01 f8     std     r0,0(r1)
        1c2c:   10 80 02 e9     ld      r8,-32752(r2)

The fault you're seeing is the first store using the stack pointer (r1),
which is set up by the kernel. The fault address f0656d90 is weirdly low,
the stack should be up near 128TB. I'm not sure how we end up with a bad
r1.

Can you dump some info about the kernel that was built, something like:

    $ file /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/vmlinux

And maybe paste/attach the full log, maybe there's a clue somewhere.

cheers
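
The arithmetic behind the disassembly in the message above can be sketched in C. This is only an illustration under stated assumptions: the struct, function, and macro names are invented here and are not taken from the kernel or binutils, and the only inputs are the instruction words, objdump bytes, and fault address quoted in this thread. It decodes the D-form fields of the `lis`/`addi` words, reads the displacement of `stdu`, and computes the address that `stdu r1,-128(r1)` stores to once `rldicr r1,r1,0,59` has rounded r1 down to a 16-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 128TB = 2^47 bytes, the region where the thread expects the stack. */
#define USER_TOP_128TB (UINT64_C(1) << 47)

/* Fields of a PowerPC D-form instruction word (addi, addis, ...).
 * Hypothetical helper type, for illustration only. */
struct dform {
	unsigned opcode;	/* primary opcode, top 6 bits */
	unsigned rt;		/* target register */
	unsigned ra;		/* base register */
	int16_t imm;		/* signed 16-bit immediate */
};

static struct dform decode_dform(uint32_t word)
{
	struct dform d;

	d.opcode = word >> 26;
	d.rt = (word >> 21) & 0x1f;
	d.ra = (word >> 16) & 0x1f;
	d.imm = (int16_t)(word & 0xffff);
	return d;
}

/* objdump prints the bytes of a little-endian binary in memory order. */
static uint32_t word_from_le_bytes(const uint8_t *b)
{
	assert(b != NULL);
	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* DS-form (std, stdu, ld): the low two bits select the instruction,
 * the remaining 14 bits are a displacement in multiples of 4. */
static int16_t dsform_displacement(uint32_t word)
{
	return (int16_t)(word & 0xfffc);
}

/* rldicr r1,r1,0,59 keeps the upper 60 bits: round down to 16 bytes. */
static uint64_t align_down_16(uint64_t sp)
{
	return sp & ~UINT64_C(0xf);
}

/* stdu r1,-128(r1) stores the old r1 at r1 - 128 (and updates r1). */
static uint64_t stdu_store_address(uint64_t sp)
{
	return align_down_16(sp) - 128;
}
```

Under these assumptions, `decode_dform(0x3c40100f)` yields opcode 15 (`addis`, shown as `lis` when the base register is 0) with immediate 4111, matching the first line of the disassembly. Working backwards from the logged fault address f0656d90, the aligned r1 would have been f0656e10, which is far below `USER_TOP_128TB`.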
Re: rcutorture’s init segfaults in ppc64le VM
Hi,

The mailing list forwards emails to me in periodic digests, so I am very
sorry that I did not see Willy's email until I visited
https://lore.kernel.org/rcu/20220207180901.gb14...@1wt.eu/T/#u.
I am also very interested in testing Willy's proposal.

Thanks a lot
Zhouyi

On Tue, Feb 8, 2022 at 1:46 PM Zhouyi Zhou wrote:
>
> Dear Paul
>
> I am also very interested in the topic.
> The Open Source Lab of Oregon State University has lent me an 8-core
> Power ppc64el VM for 3 months, I guess I can try reproducing this bug
> in the virtual machine by executing qemu in non-hardware-accelerated
> mode (using the -no-kvm argument).
> I am currently doing research on
> https://lore.kernel.org/rcu/20220201175023.GW4285@paulmck-ThinkPad-P17-Gen-1/T/#mc7e5f8ec99e3794bec1e38fbbb130e71172e4759,
> I think I can give a preliminary short report on that previous topic
> tomorrow. And I am very interested in doing research on the new topic
> the day after tomorrow.
>
> Thank you both for providing me an opportunity to improve myself ;-)
>
> Thanks again
> Zhouyi
>
> On Tue, Feb 8, 2022 at 12:10 PM Paul E. McKenney wrote:
> >
> > On Mon, Feb 07, 2022 at 05:44:47PM +0100, Paul Menzel wrote:
> > > Dear Linux folks,
> > >
> > >
> > > On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
> > > 5.17-rc2+ with rcutorture tests
> > >
> > >     $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
> > >
> > > the built init
> > >
> > >     $ file tools/testing/selftests/rcutorture/initrd/init
> > > tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB
> > > executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically
> > > linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for
> > > GNU/Linux 3.10.0, stripped
> > >
> > > segfaults in QEMU. From one of the log files
> > >
> > >
> > > /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
> > >
> > > [1.119803][T1] Run /init as init process
> > > [1.122011][T1] init[1]: segfault (11) at f0656d90 nip 1a18
> > > lr 0 code 1 in init[1000+d]
> > > [1.124863][T1] init[1]: code: 2c2903e7 f9210030 4081ff84
> > > 4b58 0100 0580 3c40100f
> > > [1.128823][T1] init[1]: code: 38427c00 7c290b78 782106e4
> > > 3800 7c0803a6 f801 e9028010
> > >
> > > Executing the init, which just seems to be an endless loop, from userspace
> > > works:
> > >
> > >     $ strace ./tools/testing/selftests/rcutorture/initrd/init
> > > execve("./tools/testing/selftests/rcutorture/initrd/init",
> > > ["./tools/testing/selftests/rcutor"...], 0x7db9e860 /* 31 vars */) = 0
> > > brk(NULL) = 0x1001d94
> > > brk(0x1001d940b98) = 0x1001d940b98
> > > set_tid_address(0x1001d9400d0) = 2890832
> > > set_robust_list(0x1001d9400e0, 24) = 0
> > > uname({sysname="Linux",
> > > nodename="flughafenberlinbrandenburgwillybrandt.molgen.mpg.de", ...}) = 0
> > > prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024,
> > > rlim_max=RLIM64_INFINITY}) = 0
> > > readlink("/proc/self/exe", "/dev/shm/linux/tools/testing/sel"...,
> > > 4096)
> > > = 61
> > > getrandom("\xf1\x30\x4c\x9e\x82\x8d\x26\xd7", 8, GRND_NONBLOCK) = 8
> > > brk(0x1001d970b98) = 0x1001d970b98
> > > brk(0x1001d98) = 0x1001d98
> > > mprotect(0x100e, 65536, PROT_READ) = 0
> > > clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0},
> > > 0x7b22c8a8) = 0
> > > clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0},
> > > 0x7b22c8a8) = 0
> > > clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, ^C{tv_sec=0,
> > > tv_nsec=872674044}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
> > > strace: Process 2890832 detached
> >
> > Huh. In PowerPC, is there some difference between system calls
> > executed in initrd and those same system calls executed in userspace?
> >
> > And just to make sure, the above strace was from exactly the same
> > binary "init" file that is included in initrd, correct?
> >
> > Adding Willy Tarreau for his thoughts.
> >
> > Thanx, Paul
> >
> > > Any ideas, what `mkinitrd.sh` [2] should do differently?
> > >
> > > ```
> > > cat > init.c << '___EOF___'
> > > #ifndef NOLIBC
> > > #include <unistd.h>
> > > #include <sys/time.h>
> > > #endif
> > >
> > > volatile unsigned long delaycount;
> > >
> > > int main(int argc, int argv[])
> > > {
> > > 	int i;
> > > 	struct timeval tv;
> > > 	struct timeval tvb;
> > >
> > > 	for (;;) {
> > > 		sleep(1);
> > > 		/* Need some userspace time. */
> > > 		if (gettimeofday(&tvb, NULL))
> > > 			continue;
> > > 		do {
> > > 			for (i = 0; i < 1000 * 100; i++)
> > >
Re: rcutorture’s init segfaults in ppc64le VM
Dear Paul

I am also very interested in the topic.
The Open Source Lab of Oregon State University has lent me an 8-core
Power ppc64el VM for 3 months, I guess I can try reproducing this bug
in the virtual machine by executing qemu in non-hardware-accelerated
mode (using the -no-kvm argument).
I am currently doing research on
https://lore.kernel.org/rcu/20220201175023.GW4285@paulmck-ThinkPad-P17-Gen-1/T/#mc7e5f8ec99e3794bec1e38fbbb130e71172e4759,
I think I can give a preliminary short report on that previous topic
tomorrow. And I am very interested in doing research on the new topic
the day after tomorrow.

Thank you both for providing me an opportunity to improve myself ;-)

Thanks again
Zhouyi

On Tue, Feb 8, 2022 at 12:10 PM Paul E. McKenney wrote:
>
> On Mon, Feb 07, 2022 at 05:44:47PM +0100, Paul Menzel wrote:
> > Dear Linux folks,
> >
> >
> > On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
> > 5.17-rc2+ with rcutorture tests
> >
> >     $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
> >
> > the built init
> >
> >     $ file tools/testing/selftests/rcutorture/initrd/init
> > tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB
> > executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically
> > linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for
> > GNU/Linux 3.10.0, stripped
> >
> > segfaults in QEMU. From one of the log files
> >
> >
> > /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
> >
> > [1.119803][T1] Run /init as init process
> > [1.122011][T1] init[1]: segfault (11) at f0656d90 nip 1a18
> > lr 0 code 1 in init[1000+d]
> > [1.124863][T1] init[1]: code: 2c2903e7 f9210030 4081ff84
> > 4b58 0100 0580 3c40100f
> > [1.128823][T1] init[1]: code: 38427c00 7c290b78 782106e4
> > 3800 7c0803a6 f801 e9028010
> >
> > Executing the init, which just seems to be an endless loop, from userspace
> > works:
> >
> >     $ strace ./tools/testing/selftests/rcutorture/initrd/init
> > execve("./tools/testing/selftests/rcutorture/initrd/init",
> > ["./tools/testing/selftests/rcutor"...], 0x7db9e860 /* 31 vars */) = 0
> > brk(NULL) = 0x1001d94
> > brk(0x1001d940b98) = 0x1001d940b98
> > set_tid_address(0x1001d9400d0) = 2890832
> > set_robust_list(0x1001d9400e0, 24) = 0
> > uname({sysname="Linux",
> > nodename="flughafenberlinbrandenburgwillybrandt.molgen.mpg.de", ...}) = 0
> > prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024,
> > rlim_max=RLIM64_INFINITY}) = 0
> > readlink("/proc/self/exe", "/dev/shm/linux/tools/testing/sel"..., 4096)
> > = 61
> > getrandom("\xf1\x30\x4c\x9e\x82\x8d\x26\xd7", 8, GRND_NONBLOCK) = 8
> > brk(0x1001d970b98) = 0x1001d970b98
> > brk(0x1001d98) = 0x1001d98
> > mprotect(0x100e, 65536, PROT_READ) = 0
> > clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0},
> > 0x7b22c8a8) = 0
> > clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0},
> > 0x7b22c8a8) = 0
> > clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, ^C{tv_sec=0,
> > tv_nsec=872674044}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
> > strace: Process 2890832 detached
>
> Huh. In PowerPC, is there some difference between system calls
> executed in initrd and those same system calls executed in userspace?
>
> And just to make sure, the above strace was from exactly the same
> binary "init" file that is included in initrd, correct?
>
> Adding Willy Tarreau for his thoughts.
>
> Thanx, Paul
>
> > Any ideas, what `mkinitrd.sh` [2] should do differently?
> >
> > ```
> > cat > init.c << '___EOF___'
> > #ifndef NOLIBC
> > #include <unistd.h>
> > #include <sys/time.h>
> > #endif
> >
> > volatile unsigned long delaycount;
> >
> > int main(int argc, int argv[])
> > {
> > 	int i;
> > 	struct timeval tv;
> > 	struct timeval tvb;
> >
> > 	for (;;) {
> > 		sleep(1);
> > 		/* Need some userspace time. */
> > 		if (gettimeofday(&tvb, NULL))
> > 			continue;
> > 		do {
> > 			for (i = 0; i < 1000 * 100; i++)
> > 				delaycount = i * i;
> > 			if (gettimeofday(&tv, NULL))
> > 				break;
> > 			tv.tv_sec -= tvb.tv_sec;
> > 			if (tv.tv_sec > 1)
> > 				break;
> > 			tv.tv_usec += tv.tv_sec * 1000 * 1000;
> > 			tv.tv_usec -= tvb.tv_usec;
> > 		} while (tv.tv_usec < 1000);
> > 	}
> > 	return 0;
> > }
> > ___EOF___
> >
> > # build using nolibc on supported archs (smaller executable)
Re: rcutorture’s init segfaults in ppc64le VM
On Mon, Feb 07, 2022 at 05:44:47PM +0100, Paul Menzel wrote:
> Dear Linux folks,
>
>
> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
> 5.17-rc2+ with rcutorture tests
>
>     $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
>
> the built init
>
>     $ file tools/testing/selftests/rcutorture/initrd/init
> tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB
> executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically
> linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for
> GNU/Linux 3.10.0, stripped
>
> segfaults in QEMU. From one of the log files
>
>
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
>
> [1.119803][T1] Run /init as init process
> [1.122011][T1] init[1]: segfault (11) at f0656d90 nip 1a18
> lr 0 code 1 in init[1000+d]
> [1.124863][T1] init[1]: code: 2c2903e7 f9210030 4081ff84
> 4b58 0100 0580 3c40100f
> [1.128823][T1] init[1]: code: 38427c00 7c290b78 782106e4
> 3800 7c0803a6 f801 e9028010
>
> Executing the init, which just seems to be an endless loop, from userspace
> works:
>
>     $ strace ./tools/testing/selftests/rcutorture/initrd/init
> execve("./tools/testing/selftests/rcutorture/initrd/init",
> ["./tools/testing/selftests/rcutor"...], 0x7db9e860 /* 31 vars */) = 0
> brk(NULL) = 0x1001d94
> brk(0x1001d940b98) = 0x1001d940b98
> set_tid_address(0x1001d9400d0) = 2890832
> set_robust_list(0x1001d9400e0, 24) = 0
> uname({sysname="Linux",
> nodename="flughafenberlinbrandenburgwillybrandt.molgen.mpg.de", ...}) = 0
> prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024,
> rlim_max=RLIM64_INFINITY}) = 0
> readlink("/proc/self/exe", "/dev/shm/linux/tools/testing/sel"..., 4096)
> = 61
> getrandom("\xf1\x30\x4c\x9e\x82\x8d\x26\xd7", 8, GRND_NONBLOCK) = 8
> brk(0x1001d970b98) = 0x1001d970b98
> brk(0x1001d98) = 0x1001d98
> mprotect(0x100e, 65536, PROT_READ) = 0
> clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0},
> 0x7b22c8a8) = 0
> clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0},
> 0x7b22c8a8) = 0
> clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, ^C{tv_sec=0,
> tv_nsec=872674044}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
> strace: Process 2890832 detached

Huh. In PowerPC, is there some difference between system calls
executed in initrd and those same system calls executed in userspace?

And just to make sure, the above strace was from exactly the same
binary "init" file that is included in initrd, correct?

Adding Willy Tarreau for his thoughts.

Thanx, Paul

> Any ideas, what `mkinitrd.sh` [2] should do differently?
>
> ```
> cat > init.c << '___EOF___'
> #ifndef NOLIBC
> #include <unistd.h>
> #include <sys/time.h>
> #endif
>
> volatile unsigned long delaycount;
>
> int main(int argc, int argv[])
> {
> 	int i;
> 	struct timeval tv;
> 	struct timeval tvb;
>
> 	for (;;) {
> 		sleep(1);
> 		/* Need some userspace time. */
> 		if (gettimeofday(&tvb, NULL))
> 			continue;
> 		do {
> 			for (i = 0; i < 1000 * 100; i++)
> 				delaycount = i * i;
> 			if (gettimeofday(&tv, NULL))
> 				break;
> 			tv.tv_sec -= tvb.tv_sec;
> 			if (tv.tv_sec > 1)
> 				break;
> 			tv.tv_usec += tv.tv_sec * 1000 * 1000;
> 			tv.tv_usec -= tvb.tv_usec;
> 		} while (tv.tv_usec < 1000);
> 	}
> 	return 0;
> }
> ___EOF___
>
> # build using nolibc on supported archs (smaller executable) and fall
> # back to regular glibc on other ones.
> if echo -e "#if __x86_64__||__i386__||__i486__||__i586__||__i686__" \
>            "||__ARM_EABI__||__aarch64__\nyes\n#endif" \
>    | ${CROSS_COMPILE}gcc -E -nostdlib -xc - \
>    | grep -q '^yes'; then
> 	# architecture supported by nolibc
> 	${CROSS_COMPILE}gcc -fno-asynchronous-unwind-tables -fno-ident \
> 		-nostdlib -include ../../../../include/nolibc/nolibc.h \
> 		-s -static -Os -o init init.c -lgcc
> else
> 	${CROSS_COMPILE}gcc -s -static -Os -o init init.c
> fi
> ```
>
>
> Kind regards,
>
> Paul
>
>
> [1]:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/rcutorture/doc/initrd.txt
> [2]:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
rcutorture’s init segfaults in ppc64le VM
Dear Linux folks,


On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
5.17-rc2+ with rcutorture tests

    $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10

the built init

    $ file tools/testing/selftests/rcutorture/initrd/init
    tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped

segfaults in QEMU. From one of the log files

    /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log

    [1.119803][T1] Run /init as init process
    [1.122011][T1] init[1]: segfault (11) at f0656d90 nip 1a18 lr 0 code 1 in init[1000+d]
    [1.124863][T1] init[1]: code: 2c2903e7 f9210030 4081ff84 4b58 0100 0580 3c40100f
    [1.128823][T1] init[1]: code: 38427c00 7c290b78 782106e4 3800 7c0803a6 f801 e9028010

Executing the init, which just seems to be an endless loop, from
userspace works:

    $ strace ./tools/testing/selftests/rcutorture/initrd/init
    execve("./tools/testing/selftests/rcutorture/initrd/init", ["./tools/testing/selftests/rcutor"...], 0x7db9e860 /* 31 vars */) = 0
    brk(NULL) = 0x1001d94
    brk(0x1001d940b98) = 0x1001d940b98
    set_tid_address(0x1001d9400d0) = 2890832
    set_robust_list(0x1001d9400e0, 24) = 0
    uname({sysname="Linux", nodename="flughafenberlinbrandenburgwillybrandt.molgen.mpg.de", ...}) = 0
    prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
    readlink("/proc/self/exe", "/dev/shm/linux/tools/testing/sel"..., 4096) = 61
    getrandom("\xf1\x30\x4c\x9e\x82\x8d\x26\xd7", 8, GRND_NONBLOCK) = 8
    brk(0x1001d970b98) = 0x1001d970b98
    brk(0x1001d98) = 0x1001d98
    mprotect(0x100e, 65536, PROT_READ) = 0
    clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7b22c8a8) = 0
    clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7b22c8a8) = 0
    clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, ^C{tv_sec=0, tv_nsec=872674044}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
    strace: Process 2890832 detached

Any ideas, what `mkinitrd.sh` [2] should do differently?

```
cat > init.c << '___EOF___'
#ifndef NOLIBC
#include <unistd.h>
#include <sys/time.h>
#endif

volatile unsigned long delaycount;

int main(int argc, int argv[])
{
	int i;
	struct timeval tv;
	struct timeval tvb;

	for (;;) {
		sleep(1);
		/* Need some userspace time. */
		if (gettimeofday(&tvb, NULL))
			continue;
		do {
			for (i = 0; i < 1000 * 100; i++)
				delaycount = i * i;
			if (gettimeofday(&tv, NULL))
				break;
			tv.tv_sec -= tvb.tv_sec;
			if (tv.tv_sec > 1)
				break;
			tv.tv_usec += tv.tv_sec * 1000 * 1000;
			tv.tv_usec -= tvb.tv_usec;
		} while (tv.tv_usec < 1000);
	}
	return 0;
}
___EOF___

# build using nolibc on supported archs (smaller executable) and fall
# back to regular glibc on other ones.
if echo -e "#if __x86_64__||__i386__||__i486__||__i586__||__i686__" \
           "||__ARM_EABI__||__aarch64__\nyes\n#endif" \
   | ${CROSS_COMPILE}gcc -E -nostdlib -xc - \
   | grep -q '^yes'; then
	# architecture supported by nolibc
	${CROSS_COMPILE}gcc -fno-asynchronous-unwind-tables -fno-ident \
		-nostdlib -include ../../../../include/nolibc/nolibc.h \
		-s -static -Os -o init init.c -lgcc
else
	${CROSS_COMPILE}gcc -s -static -Os -o init init.c
fi
```


Kind regards,

Paul


[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/rcutorture/doc/initrd.txt
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
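
For readers trying to follow what the quoted init.c does between the `sleep(1)` calls, the delay loop can be restated as a small C sketch. It is not the code from `mkinitrd.sh`: the helper names `elapsed_usec` and `burn_userspace_time` are invented for illustration, and the loop counter is `unsigned long` here so that `i * i` is well defined, which is a change from the quoted source. The timeval arithmetic is the same (seconds difference times 1000 * 1000, plus the microseconds difference), and the loop spins until roughly 1000 microseconds of wall-clock time have passed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Microseconds from *start to *now; same arithmetic as the quoted loop. */
static long elapsed_usec(const struct timeval *start, const struct timeval *now)
{
	long sec;

	assert(start != NULL && now != NULL);
	sec = (long)(now->tv_sec - start->tv_sec);
	return sec * 1000 * 1000 + (long)now->tv_usec - (long)start->tv_usec;
}

/*
 * Spin in userspace until at least busy_usec microseconds have elapsed.
 * Returns the elapsed microseconds, or -1 if gettimeofday() fails.
 * A negative elapsed time (clock stepped backwards) also ends the loop.
 */
static long burn_userspace_time(long busy_usec)
{
	struct timeval start, now;
	volatile unsigned long sink = 0;
	unsigned long i;
	long elapsed;

	if (gettimeofday(&start, NULL))
		return -1;
	do {
		/* Busy work that the compiler may not optimize away. */
		for (i = 0; i < 1000UL * 100; i++)
			sink = i * i;
		if (gettimeofday(&now, NULL))
			return -1;
		elapsed = elapsed_usec(&start, &now);
	} while (elapsed >= 0 && elapsed < busy_usec);
	return elapsed;
}
```

Calling `burn_userspace_time(1000)` once per second after a `sleep(1)` would approximate the behaviour described in the post: mostly sleeping, with a short burst of userspace CPU time each iteration.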