Re: rcutorture’s init segfaults in ppc64le VM

2022-03-10 Thread Zhouyi Zhou
Dear Paul

On Thu, Mar 10, 2022 at 4:10 PM Paul Menzel  wrote:
>
> Dear Zhouyi,
>
>
> Thank you for still looking into this.
You are very welcome ;-)
>
>
> Am 10.03.22 um 03:37 schrieb Zhouyi Zhou:
>
> > I try to reproduce the bug in ppc64 VM in Oregon State University
> > using the vmlinux extracted from
> > https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz
> >
> > the ppc64 VM in which I run the qemu without hardware acceleration is:
> > Linux version 5.4.0-100-generic (buildd@bos02-ppc64el-021) (gcc version 
> > 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #113-Ubuntu SMP Thu Feb 3 18:43:11 
> > UTC 2022 (Ubuntu 5.4.0-100.113-generic 5.4.166)
> >
> >
> > The qemu command I use to test:
> > cd 
> > /tmp/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01$
> > $qemu-system-ppc64   -nographic -smp cores=2,threads=1 -net none -M
> > pseries -nodefaults -device spapr-vscsi -serial file:/tmp/console.log
> > -m 512 -kernel ./vmlinux -append "debug_boot_weak_hash panic=-1
> > console=ttyS0 rcutorture.onoff_interval=200
> > rcutorture.onoff_holdoff=30 rcutree.gp_preinit_delay=12
> > rcutree.gp_init_delay=3 rcutree.gp_cleanup_delay=3
> > rcutree.kthread_prio=2 threadirqs tree.use_softirq=0
> > rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15
> > rcutorture.shutdown_secs=1800 rcutorture.test_no_idle_hz=1
> > rcutorture.verbose=1"
> >
> > The console.log is uploaded to:
> > http://154.223.142.244/logs/20220310/console.paul.log
> > The log tells us it is illegal instruction that causes the trouble:
> > [4.246387][T1] init[1]: illegal instruction (4) at 1002c308 nip 
> > 1002c308 lr 10001684 code 1 in init[1000+d]
> > [4.251400][T1] init[1]: code: f90d88c0 f92a0008 f9480008 7c2004ac 
> > 2c2d f949 386d88d0 38e8
> > [4.253416][T1] init[1]: code: 41820098 e92d8f98 75290010 4182008c 
> > <4401> 2c2d 6000 8902f438
> >
> >
> > Meanwhile, the vmlinux compiled by myself runs smoothly.
>
> How did you build it? Using GCC or clang? I forgot, if the problem was
I built vmlinux with both GCC and clang. Both resulting kernels
run smoothly.
> only reproducible if the host Linux kernel was built with clang or the
> VM kernel.
Yes, I remember this too. The dependence on how the host Linux kernel
is built makes things more complex.
>
> > Then I modify mkinitrd.sh to let it panic manually:
> > http://154.223.142.244/logs/20220310/mkinitrd.sh
>
> I only see the change:
>
>  -
>  +  int *ptr = 0;
>  +  *ptr =  0;
>
Yes, I made the segfault happen manually.
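
As an aside, the gdb session quoted below reports the faulting instruction as stw r9,0(r9) with r9 = 0, which is what a store through a NULL pointer looks like. Here is a minimal Python sketch of how that instruction is encoded, assuming the standard Power ISA D-form layout (6-bit primary opcode, two 5-bit register fields, 16-bit displacement). The helper name encode_d_form is only illustrative and is not part of any tool used in this thread.

```python
STW_OPCODE = 36  # primary opcode of "stw" (store word) in the Power ISA


def encode_d_form(opcode: int, rs: int, ra: int, d: int) -> int:
    """Pack a D-form instruction (opcode, RS, RA, 16-bit displacement) into 32 bits."""
    return (opcode << 26) | (rs << 21) | (ra << 16) | (d & 0xFFFF)


# stw r9,0(r9): store the low word of r9 at address r9 + 0.
word = encode_d_form(STW_OPCODE, rs=9, ra=9, d=0)
print(f"{word:08x}")  # prints 91290000
```

The encoded word begins with 9129, which agrees with the marked word in the kernel's "code:" line as far as that line is shown here.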
> > The log tells us it is a segfault (instead of an illegal instruction):
> > http://154.223.142.244/logs/20220310/console.zhouyi.log
> >
> > Then I use gdb to debug the init in host:
> > ubuntu@zhouzhouyi-1:~/newkernel/linux-next$ gdb
> > tools/testing/selftests/rcutorture/initrd/init
> > (gdb) run
> > Starting program:
> > /home/ubuntu/newkernel/linux-next/tools/testing/selftests/rcutorture/initrd/init
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x1b2c in ?? ()
> > (gdb) x/10i $pc
> > => 0x1b2c: stw     r9,0(r9)
> >    0x1b30: trap
> >    0x1b34: .long 0x0
> >    0x1b38: .long 0x0
> >    0x1b3c: .long 0x0
> >    0x1b40: lis     r2,4110
> >    0x1b44: addi    r2,r2,31488
> >    0x1b48: mr      r9,r1
> >    0x1b4c: rldicr  r1,r1,0,59
> >    0x1b50: li      r0,0
> > (gdb) p $r9
> > $1 = 0
> > (gdb) x/30x $pc - 0x30
> > 0x1afc: 0x38840040 0x387f0040 0xf8010040 0x48026919
> > 0x1b0c: 0x6000 0xe8010040 0x7c0803a6 0x4b24
> > 0x1b1c: 0x 0x0100 0x0180 0x3920
> > 0x1b2c: 0x9129 0x7fe8 0x 0x
> > which matches the hex content of
> > http://154.223.142.244/logs/20220310/console.zhouyi.log:
> > [5.077431][T1] init[1]: segfault (11) at 0 nip 1b2c lr 10001024 
> > code 1 in init[1000+d]
> > [5.087167][T1] init[1]: code: 38840040 387f0040 f8010040 48026919 
> > 6000 e8010040 7c0803a6 4b24
> > [5.093987][T1] init[1]: code:  0100 0180 3920 
> > <9129> 7fe8  
> >
> >
> > Conclusions: there might be something wrong when packing the init into
> > vmlinux in your environment.
> >
> > I will continue to do research on this interesting problem with you.
>
> As written I think it’s a problem with LLVM/clang. Unfortunately, I
> won’t be able to retest before next week.
Roger that, no need to hurry ;-)

Kind regards
Zhouyi
> Kind regards,
>
> Paul


Re: rcutorture’s init segfaults in ppc64le VM

2022-03-10 Thread Paul Menzel

Dear Zhouyi,


Thank you for still looking into this.


Am 10.03.22 um 03:37 schrieb Zhouyi Zhou:


I try to reproduce the bug in ppc64 VM in Oregon State University
using the vmlinux extracted from
https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz

the ppc64 VM in which I run the qemu without hardware acceleration is:
Linux version 5.4.0-100-generic (buildd@bos02-ppc64el-021) (gcc version 9.3.0 
(Ubuntu 9.3.0-17ubuntu1~20.04)) #113-Ubuntu SMP Thu Feb 3 18:43:11 UTC 2022 
(Ubuntu 5.4.0-100.113-generic 5.4.166)


The qemu command I use to test:
cd 
/tmp/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01$
$qemu-system-ppc64   -nographic -smp cores=2,threads=1 -net none -M
pseries -nodefaults -device spapr-vscsi -serial file:/tmp/console.log
-m 512 -kernel ./vmlinux -append "debug_boot_weak_hash panic=-1
console=ttyS0 rcutorture.onoff_interval=200
rcutorture.onoff_holdoff=30 rcutree.gp_preinit_delay=12
rcutree.gp_init_delay=3 rcutree.gp_cleanup_delay=3
rcutree.kthread_prio=2 threadirqs tree.use_softirq=0
rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15
rcutorture.shutdown_secs=1800 rcutorture.test_no_idle_hz=1
rcutorture.verbose=1"

The console.log is uploaded to:
http://154.223.142.244/logs/20220310/console.paul.log
The log tells us it is illegal instruction that causes the trouble:
[4.246387][T1] init[1]: illegal instruction (4) at 1002c308 nip 
1002c308 lr 10001684 code 1 in init[1000+d]
[4.251400][T1] init[1]: code: f90d88c0 f92a0008 f9480008 7c2004ac 
2c2d f949 386d88d0 38e8
[4.253416][T1] init[1]: code: 41820098 e92d8f98 75290010 4182008c 
<4401> 2c2d 6000 8902f438


Meanwhile, the vmlinux compiled by myself runs smoothly.


How did you build it? Using GCC or clang? I forget whether the problem was
only reproducible when the host Linux kernel was built with clang, or the
VM kernel.



Then I modify mkinitrd.sh to let it panic manually:
http://154.223.142.244/logs/20220310/mkinitrd.sh


I only see the change:

-
+   int *ptr = 0;
+   *ptr =  0;


The log tells us it is a segfault (instead of an illegal instruction):
http://154.223.142.244/logs/20220310/console.zhouyi.log

Then I use gdb to debug the init in host:
ubuntu@zhouzhouyi-1:~/newkernel/linux-next$ gdb
tools/testing/selftests/rcutorture/initrd/init
(gdb) run
Starting program:
/home/ubuntu/newkernel/linux-next/tools/testing/selftests/rcutorture/initrd/init

Program received signal SIGSEGV, Segmentation fault.
0x1b2c in ?? ()
(gdb) x/10i $pc
=> 0x1b2c: stw     r9,0(r9)
   0x1b30: trap
   0x1b34: .long 0x0
   0x1b38: .long 0x0
   0x1b3c: .long 0x0
   0x1b40: lis     r2,4110
   0x1b44: addi    r2,r2,31488
   0x1b48: mr      r9,r1
   0x1b4c: rldicr  r1,r1,0,59
   0x1b50: li      r0,0
(gdb) p $r9
$1 = 0
(gdb) x/30x $pc - 0x30
0x1afc: 0x38840040 0x387f0040 0xf8010040 0x48026919
0x1b0c: 0x6000 0xe8010040 0x7c0803a6 0x4b24
0x1b1c: 0x 0x0100 0x0180 0x3920
0x1b2c: 0x9129 0x7fe8 0x 0x
which matches the hex content of
http://154.223.142.244/logs/20220310/console.zhouyi.log:
[5.077431][T1] init[1]: segfault (11) at 0 nip 1b2c lr 10001024 
code 1 in init[1000+d]
[5.087167][T1] init[1]: code: 38840040 387f0040 f8010040 48026919 
6000 e8010040 7c0803a6 4b24
[5.093987][T1] init[1]: code:  0100 0180 3920 
<9129> 7fe8  


Conclusions: there might be something wrong when packing the init into
vmlinux in your environment.

I will continue to do research on this interesting problem with you.


As I wrote, I think it’s a problem with LLVM/clang. Unfortunately, I
won’t be able to retest before next week.



Kind regards,

Paul


Re: rcutorture’s init segfaults in ppc64le VM

2022-03-09 Thread Paul E. McKenney
On Thu, Mar 10, 2022 at 10:37:12AM +0800, Zhouyi Zhou wrote:
> Dear Paul
> 
> I try to reproduce the bug in ppc64 VM in Oregon State University
> using the vmlinux extracted from
> https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz
> 
> the ppc64 VM in which I run the qemu without hardware acceleration is:
> Linux version 5.4.0-100-generic (buildd@bos02-ppc64el-021) (gcc
> version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #113-Ubuntu SMP Thu Feb
> 3 18:43:11 UTC 2022 (Ubuntu 5.4.0-100.113-generic 5.4.166)
> 
> 
> The qemu command I use to test:
> cd 
> /tmp/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01$
> $qemu-system-ppc64   -nographic -smp cores=2,threads=1 -net none -M
> pseries -nodefaults -device spapr-vscsi -serial file:/tmp/console.log
> -m 512 -kernel ./vmlinux -append "debug_boot_weak_hash panic=-1
> console=ttyS0 rcutorture.onoff_interval=200
> rcutorture.onoff_holdoff=30 rcutree.gp_preinit_delay=12
> rcutree.gp_init_delay=3 rcutree.gp_cleanup_delay=3
> rcutree.kthread_prio=2 threadirqs tree.use_softirq=0
> rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15
> rcutorture.shutdown_secs=1800 rcutorture.test_no_idle_hz=1
> rcutorture.verbose=1"
> 
> The console.log is uploaded to:
> http://154.223.142.244/logs/20220310/console.paul.log
> The log tells us it is illegal instruction that causes the trouble:
> [4.246387][T1] init[1]: illegal instruction (4) at 1002c308
> nip 1002c308 lr 10001684 code 1 in init[1000+d]
> [4.251400][T1] init[1]: code: f90d88c0 f92a0008 f9480008
> 7c2004ac 2c2d f949 386d88d0 38e8
> [4.253416][T1] init[1]: code: 41820098 e92d8f98 75290010
> 4182008c <4401> 2c2d 6000 8902f438
> 
> 
> Meanwhile, the vmlinux compiled by myself runs smoothly.
> 
> Then I modify mkinitrd.sh to let it panic manually:
> http://154.223.142.244/logs/20220310/mkinitrd.sh
> The log tells us it is a segfault (instead of an illegal instruction):
> http://154.223.142.244/logs/20220310/console.zhouyi.log
> 
> Then I use gdb to debug the init in host:
> ubuntu@zhouzhouyi-1:~/newkernel/linux-next$ gdb
> tools/testing/selftests/rcutorture/initrd/init
> (gdb) run
> Starting program:
> /home/ubuntu/newkernel/linux-next/tools/testing/selftests/rcutorture/initrd/init
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x1b2c in ?? ()
> (gdb) x/10i $pc
> => 0x1b2c: stw     r9,0(r9)
>    0x1b30: trap
>    0x1b34: .long 0x0
>    0x1b38: .long 0x0
>    0x1b3c: .long 0x0
>    0x1b40: lis     r2,4110
>    0x1b44: addi    r2,r2,31488
>    0x1b48: mr      r9,r1
>    0x1b4c: rldicr  r1,r1,0,59
>    0x1b50: li      r0,0
> (gdb) p $r9
> $1 = 0
> (gdb) x/30x $pc - 0x30
> 0x1afc: 0x38840040 0x387f0040 0xf8010040 0x48026919
> 0x1b0c: 0x6000 0xe8010040 0x7c0803a6 0x4b24
> 0x1b1c: 0x 0x0100 0x0180 0x3920
> 0x1b2c: 0x9129 0x7fe8 0x 0x
> which matches the hex content of
> http://154.223.142.244/logs/20220310/console.zhouyi.log:
> [5.077431][T1] init[1]: segfault (11) at 0 nip 1b2c lr
> 10001024 code 1 in init[1000+d]
> [5.087167][T1] init[1]: code: 38840040 387f0040 f8010040
> 48026919 6000 e8010040 7c0803a6 4b24
> [5.093987][T1] init[1]: code:  0100 0180
> 3920 <9129> 7fe8  
> 
> 
> Conclusions: there might be something wrong when packing the init into
> vmlinux in your environment.

Quite possibly!  Or the mkinitrd.sh script might not be invoking the
compiler properly.

> I will continue to do research on this interesting problem with you.

Please let me know how it goes!

Thanx, Paul

> Thanks
> Kind Regards
> Zhouyi
> 
> 
> 
> On Tue, Feb 8, 2022 at 8:12 PM Paul Menzel  wrote:
> >
> > Dear Michael,
> >
> >
> > Thank you for looking into this.
> >
> > Am 08.02.22 um 11:09 schrieb Michael Ellerman:
> > > Paul Menzel writes:
> >
> > […]
> >
> > >> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
> > >> 5.17-rc2+ with rcutorture tests
> > >
> > > I'm not sure if that's the host kernel version or the version you're
> > > using of rcutorture? Can you tell us the sha1 of your host kernel and of
> > > the tree you're running rcutorture from?
> >
> > The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
> > I am unable to find the exact sha1.
> >
> >  $ more /proc/version
> >  Linux version 5.17.0-rc1+
> > (pmen...@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu
> > clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28
> > 17:13:04 CET 2022
> >
> > The Linux tree from which I run rcutorture is at commit
> > dfd42facf1e4 (Linux 5.17-rc3) with 

Re: rcutorture’s init segfaults in ppc64le VM

2022-03-09 Thread Zhouyi Zhou
Dear Paul

I tried to reproduce the bug in a ppc64 VM at Oregon State University
using the vmlinux extracted from
https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz

The ppc64 VM in which I run QEMU without hardware acceleration reports:
Linux version 5.4.0-100-generic (buildd@bos02-ppc64el-021) (gcc
version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #113-Ubuntu SMP Thu Feb
3 18:43:11 UTC 2022 (Ubuntu 5.4.0-100.113-generic 5.4.166)


The QEMU command I used to test:
$ cd /tmp/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01
$ qemu-system-ppc64 -nographic -smp cores=2,threads=1 -net none -M
pseries -nodefaults -device spapr-vscsi -serial file:/tmp/console.log
-m 512 -kernel ./vmlinux -append "debug_boot_weak_hash panic=-1
console=ttyS0 rcutorture.onoff_interval=200
rcutorture.onoff_holdoff=30 rcutree.gp_preinit_delay=12
rcutree.gp_init_delay=3 rcutree.gp_cleanup_delay=3
rcutree.kthread_prio=2 threadirqs tree.use_softirq=0
rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15
rcutorture.shutdown_secs=1800 rcutorture.test_no_idle_hz=1
rcutorture.verbose=1"
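
When comparing runs it can help to look at the -append string parameter by parameter. Below is a minimal Python sketch that splits such a string into a dictionary. The function name parse_kernel_cmdline is only illustrative, and the sketch assumes there are no quoted values containing spaces, which holds for the command line above.

```python
def parse_kernel_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# A shortened copy of the -append string used above.
cmdline = (
    "debug_boot_weak_hash panic=-1 console=ttyS0 "
    "rcutorture.onoff_interval=200 rcutorture.shutdown_secs=1800 threadirqs"
)
params = parse_kernel_cmdline(cmdline)
print(params["rcutorture.shutdown_secs"])  # prints 1800
print(params["threadirqs"])                # prints None (flag without a value)
```

Two such dictionaries can then be compared key by key to see which boot parameters differ between two test runs.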

The console.log is uploaded to:
http://154.223.142.244/logs/20220310/console.paul.log
The log shows that an illegal instruction causes the trouble:
[4.246387][T1] init[1]: illegal instruction (4) at 1002c308 nip 1002c308 lr 10001684 code 1 in init[1000+d]
[4.251400][T1] init[1]: code: f90d88c0 f92a0008 f9480008 7c2004ac 2c2d f949 386d88d0 38e8
[4.253416][T1] init[1]: code: 41820098 e92d8f98 75290010 4182008c <4401> 2c2d 6000 8902f438
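
To compare fault reports from several console logs, the first line of such a report can be picked apart mechanically. Here is a minimal Python sketch using a regular expression that I wrote against the two fault lines in this mail only. The pattern and the name parse_fault are assumptions, not an official format description, and the values are used exactly as they appear above.

```python
import re

# Matches e.g. "init[1]: illegal instruction (4) at 1002c308 nip 1002c308 lr 10001684"
FAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: (?P<what>[a-z ]+) \((?P<signo>\d+)\) "
    r"at (?P<addr>[0-9a-f]+) nip (?P<nip>[0-9a-f]+) lr (?P<lr>[0-9a-f]+)"
)


def parse_fault(line: str):
    """Return the fields of a user-space fault report line, or None if it does not match."""
    match = FAULT_RE.search(line)
    return match.groupdict() if match else None


line = (
    "[4.246387][T1] init[1]: illegal instruction (4) at 1002c308 "
    "nip 1002c308 lr 10001684 code 1 in init[1000+d]"
)
fault = parse_fault(line)
print(fault["what"], fault["signo"], fault["nip"])  # prints: illegal instruction 4 1002c308
```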


Meanwhile, the vmlinux I compiled myself runs smoothly.

Then I modified mkinitrd.sh to make init crash deliberately:
http://154.223.142.244/logs/20220310/mkinitrd.sh
The log shows a segfault (instead of an illegal instruction):
http://154.223.142.244/logs/20220310/console.zhouyi.log

Then I used gdb to debug init on the host:
ubuntu@zhouzhouyi-1:~/newkernel/linux-next$ gdb
tools/testing/selftests/rcutorture/initrd/init
(gdb) run
Starting program:
/home/ubuntu/newkernel/linux-next/tools/testing/selftests/rcutorture/initrd/init

Program received signal SIGSEGV, Segmentation fault.
0x1b2c in ?? ()
(gdb) x/10i $pc
=> 0x1b2c: stw     r9,0(r9)
   0x1b30: trap
   0x1b34: .long 0x0
   0x1b38: .long 0x0
   0x1b3c: .long 0x0
   0x1b40: lis     r2,4110
   0x1b44: addi    r2,r2,31488
   0x1b48: mr      r9,r1
   0x1b4c: rldicr  r1,r1,0,59
   0x1b50: li      r0,0
(gdb) p $r9
$1 = 0
(gdb) x/30x $pc - 0x30
0x1afc: 0x38840040 0x387f0040 0xf8010040 0x48026919
0x1b0c: 0x6000 0xe8010040 0x7c0803a6 0x4b24
0x1b1c: 0x 0x0100 0x0180 0x3920
0x1b2c: 0x9129 0x7fe8 0x 0x
which matches the hex content of
http://154.223.142.244/logs/20220310/console.zhouyi.log:
[5.077431][T1] init[1]: segfault (11) at 0 nip 1b2c lr 10001024 code 1 in init[1000+d]
[5.087167][T1] init[1]: code: 38840040 387f0040 f8010040 48026919 6000 e8010040 7c0803a6 4b24
[5.093987][T1] init[1]: code:  0100 0180 3920 <9129> 7fe8
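
In these "code:" lines the kernel marks the word at the faulting address with angle brackets, which is what makes the comparison with the gdb dump possible. Here is a minimal Python sketch for pulling that word out of a log line. The name faulting_word is only illustrative, and the input is the line exactly as it appears above.

```python
def faulting_word(code_line: str):
    """Return (position, word) of the <...>-marked word in a 'code:' line, or None."""
    words = code_line.split("code:", 1)[1].split()
    for position, word in enumerate(words):
        if word.startswith("<") and word.endswith(">"):
            return position, word[1:-1]
    return None


line = "[5.093987][T1] init[1]: code:  0100 0180 3920 <9129> 7fe8"
print(faulting_word(line))  # prints (3, '9129')
```

The same helper applied to the illegal-instruction log returns the word marked there, so the two failures can be compared at the instruction level.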


Conclusion: something might be going wrong when init is packed into
vmlinux in your environment.

I will continue to investigate this interesting problem with you.

Thanks
Kind Regards
Zhouyi



On Tue, Feb 8, 2022 at 8:12 PM Paul Menzel  wrote:
>
> Dear Michael,
>
>
> Thank you for looking into this.
>
> Am 08.02.22 um 11:09 schrieb Michael Ellerman:
> > Paul Menzel writes:
>
> […]
>
> >> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
> >> 5.17-rc2+ with rcutorture tests
> >
> > I'm not sure if that's the host kernel version or the version you're
> > using of rcutorture? Can you tell us the sha1 of your host kernel and of
> > the tree you're running rcutorture from?
>
> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
> I am unable to find the exact sha1.
>
>  $ more /proc/version
>  Linux version 5.17.0-rc1+
> (pmen...@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu
> clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28
> 17:13:04 CET 2022
>
> The Linux tree from which I run rcutorture is at commit
> dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:
>
>  $ git log --oneline -6
>  207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems
> with rcutorture on ppc64le: allmodconfig(2) and other failures
>  8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
>  a447541d925f ata: libata-sata: remove debounce delay by default
>  afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
>  f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
>  dfd42facf1e4 

Re: rcutorture’s init segfaults in ppc64le VM

2022-02-11 Thread Paul Menzel

Dear Michael,


Am 11.02.22 um 15:19 schrieb Paul Menzel:


Am 11.02.22 um 02:48 schrieb Michael Ellerman:

Paul Menzel writes:

Am 08.02.22 um 11:09 schrieb Michael Ellerman:

Paul Menzel writes:


[…]


On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
5.17-rc2+ with rcutorture tests


I'm not sure if that's the host kernel version or the version you're
using of rcutorture? Can you tell us the sha1 of your host kernel 
and of the tree you're running rcutorture from?


The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
I am unable to find the exact sha1.

  $ more /proc/version
  Linux version 5.17.0-rc1+ (x...@eddb.molgen.mpg.de) (Ubuntu clang version 
13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022


OK. In general rc1 kernels can have issues, so it might be worth
rebooting the host into either v5.17-rc3 or a distro or stable kernel.
Just to rule out any issues on the host.


Yes, that was a good test. It works with Ubuntu’s 5.13 Linux kernel.

     $ more /proc/version
     Linux version 5.13.0-28-generic (buildd@bos02-ppc64el-013) (gcc (Ubuntu 
11.2.0-7ubuntu2) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.37) #31-Ubuntu SMP 
Thu Jan 13 17:40:19 UTC 2022

I have to do more tests, but it could be LLVM/clang related.


Building commit f1baf68e1383 (Merge tag 'net-5.17-rc4' of 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net) with the ata 
patches on top with GCC, I am unable to reproduce the issue. Previously, I
built it with


make -j100 LLVM=1 LLVM_IAS=0 bindeb-pkg

[…]


Kind regards,

Paul


Re: rcutorture’s init segfaults in ppc64le VM

2022-02-11 Thread Paul Menzel

Dear Michael,


Am 11.02.22 um 02:48 schrieb Michael Ellerman:

Paul Menzel writes:

Am 08.02.22 um 11:09 schrieb Michael Ellerman:

Paul Menzel writes:


[…]


On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
5.17-rc2+ with rcutorture tests


I'm not sure if that's the host kernel version or the version you're
using of rcutorture? Can you tell us the sha1 of your host kernel and of
the tree you're running rcutorture from?


The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
I am unable to find the exact sha1.

  $ more /proc/version
  Linux version 5.17.0-rc1+ (x...@eddb.molgen.mpg.de) (Ubuntu clang version 
13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022


OK. In general rc1 kernels can have issues, so it might be worth
rebooting the host into either v5.17-rc3 or a distro or stable kernel.
Just to rule out any issues on the host.


Yes, that was a good test. It works with Ubuntu’s 5.13 Linux kernel.

$ more /proc/version
Linux version 5.13.0-28-generic (buildd@bos02-ppc64el-013) (gcc 
(Ubuntu 11.2.0-7ubuntu2) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.37) 
#31-Ubuntu SMP Thu Jan 13 17:40:19 UTC 2022


I have to do more tests, but it could be LLVM/clang related.


The Linux tree from which I run rcutorture is at commit
dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:

  $ git log --oneline -6
  207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems with 
rcutorture on ppc64le: allmodconfig(2) and other failures
  8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
  a447541d925f ata: libata-sata: remove debounce delay by default
  afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
  f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
  dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3


   $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10

the built init

   $ file tools/testing/selftests/rcutorture/initrd/init
   tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB 
executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, 
BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, 
stripped


Mine looks pretty much identical:

$ file tools/testing/selftests/rcutorture/initrd/init
tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 
64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, 
BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, 
stripped


segfaults in QEMU. From one of the log files


But mine doesn't segfault, it runs fine and the test completes.

What qemu version are you using?

I tried 4.2.1 and 6.2.0, both worked.


  $ qemu-system-ppc64le --version
  QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
  Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers


OK, that's one difference between our setups, but I'd be surprised if it
explains this bug, but I guess anything's possible.


/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log


Sorry, that was the wrong path/test. The correct one for the excerpt
below is:

  
/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log


(For TREE03, QEMU does not start the Linux kernel at all, that means no
output after:

  Booting Linux via __start() @ 0x0040 ...


OK yeah I see that too.

Removing "threadirqs" from 
tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
seems to fix it.


Nice find. I have no idea what that means, though.


I still see some preempt related warnings, we clearly have some bugs
with preempt enabled.


You can now download the content of
`/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01`
[1, 65 MB].

Can you reproduce the segmentation fault with the line below?

  $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8 \
  -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi \
  -serial stdio -m 512 \
  -kernel /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux \
  -append "debug_boot_weak_hash panic=-1 console=ttyS0 \
  torture.disable_onoff_at_boot locktorture.onoff_interval=3 \
  locktorture.onoff_holdoff=30 locktorture.stat_interval=15 \
  locktorture.shutdown_secs=60 locktorture.verbose=1"


That works fine for me, boots and runs the test, then shuts down.

I assume you see the segfault on every boot, not intermittently?

So the differences between our setups are the host kernel and the qemu
version. Can you try a different host kernel easily?

The other thing would be to try a different qemu version, you might need
to build from source, 

Re: rcutorture’s init segfaults in ppc64le VM

2022-02-10 Thread Michael Ellerman
Paul Menzel  writes:
> Am 08.02.22 um 11:09 schrieb Michael Ellerman:
>> Paul Menzel writes:
>
> […]
>
>>> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
>>> 5.17-rc2+ with rcutorture tests
>> 
>> I'm not sure if that's the host kernel version or the version you're
>> using of rcutorture? Can you tell us the sha1 of your host kernel and of
>> the tree you're running rcutorture from?
>
> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately, 
> I am unable to find the exact sha1.
>
>  $ more /proc/version
>  Linux version 5.17.0-rc1+ 
> (pmen...@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu 
> clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28
> 17:13:04 CET 2022

OK. In general rc1 kernels can have issues, so it might be worth
rebooting the host into either v5.17-rc3 or a distro or stable kernel.
Just to rule out any issues on the host.

> The Linux tree from which I run rcutorture is at commit
> dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:
>
>  $ git log --oneline -6
>  207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems 
> with rcutorture on ppc64le: allmodconfig(2) and other failures
>  8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
>  a447541d925f ata: libata-sata: remove debounce delay by default
>  afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
>  f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
>  dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3
>
>>>   $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
>>>
>>> the built init
>>>
>>>   $ file tools/testing/selftests/rcutorture/initrd/init
>>>   tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB 
>>> executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically 
>>> linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for 
>>> GNU/Linux 3.10.0, stripped
>> 
>> Mine looks pretty much identical:
>> 
>>$ file tools/testing/selftests/rcutorture/initrd/init
>>tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB 
>> executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically 
>> linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for 
>> GNU/Linux 3.10.0, stripped
>> 
>>> segfaults in QEMU. From one of the log files
>> 
>> But mine doesn't segfault, it runs fine and the test completes.
>> 
>> What qemu version are you using?
>> 
>> I tried 4.2.1 and 6.2.0, both worked.
>
>  $ qemu-system-ppc64le --version
>  QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
>  Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

OK, that's one difference between our setups, but I'd be surprised if it
explains this bug, but I guess anything's possible.


>>> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
>
> Sorry, that was the wrong path/test. The correct one for the excerpt 
> below is:
>
>  
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log
>
> (For TREE03, QEMU does not start the Linux kernel at all, that means no 
> output after:
>
>  Booting Linux via __start() @ 0x0040 ...

OK yeah I see that too.

Removing "threadirqs" from 
tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
seems to fix it.

I still see some preempt related warnings, we clearly have some bugs
with preempt enabled.

> You can now download the content of 
> `/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01`
>  
> [1, 65 MB].
>
> Can you reproduce the segmentation fault with the line below?
>
>  $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8 
> -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial 
> stdio -m 512 -kernel 
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux
>  
> -append "debug_boot_weak_hash panic=-1 console=ttyS0 
> torture.disable_onoff_at_boot locktorture.onoff_interval=3 
> locktorture.onoff_holdoff=30 locktorture.stat_interval=15 
> locktorture.shutdown_secs=60 locktorture.verbose=1"

That works fine for me, boots and runs the test, then shuts down.

I assume you see the segfault on every boot, not intermittently?

So the differences between our setups are the host kernel and the qemu
version. Can you try a different host kernel easily?

The other thing would be to try a different qemu version, you might need
to build from source, but it's not that hard :)

cheers


Re: rcutorture’s init segfaults in ppc64le VM

2022-02-08 Thread Paul Menzel

[Correct sha1 for test for 2022.02.01-21.52.37]


Am 08.02.22 um 13:12 schrieb Paul Menzel:

Dear Michael,


Thank you for looking into this.

Am 08.02.22 um 11:09 schrieb Michael Ellerman:

Paul Menzel writes:


[…]


On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
5.17-rc2+ with rcutorture tests


I'm not sure if that's the host kernel version or the version you're
using of rcutorture? Can you tell us the sha1 of your host kernel and of
the tree you're running rcutorture from?


The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately, 
I am unable to find the exact sha1.


     $ more /proc/version
     Linux version 5.17.0-rc1+ 
(pmen...@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu clang 
version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022

The Linux tree from which I run rcutorture is at commit
dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:


     $ git log --oneline -6
     207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems with 
rcutorture on ppc64le: allmodconfig(2) and other failures
     8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
     a447541d925f ata: libata-sata: remove debounce delay by default
     afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
     f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
     dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3


I was able to reproduce this with the above, but the report and the 
attached logs at the end are from:


$ git log --oneline -6 b37a34a8cf5a
b37a34a8cf5a Problems with rcutorture on ppc64le: allmodconfig(2) 
and other failures

9a78ddead89a ata: libata-sata: improve sata_link_debounce()
567da2eaf099 ata: libata-sata: remove debounce delay by default
70ae61851660 ata: libata-sata: introduce struct sata_deb_timing
9ebb6433d9c3 ata: libata-sata: Simplify sata_link_resume() interface
26291c54e111 (tag: v5.17-rc2) Linux 5.17-rc2


  $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10

the built init

  $ file tools/testing/selftests/rcutorture/initrd/init
  tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB 
executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, 
BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, 
stripped


Mine looks pretty much identical:

   $ file tools/testing/selftests/rcutorture/initrd/init
   tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 
64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, 
BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, 
stripped


segfaults in QEMU. From one of the log files


But mine doesn't segfault, it runs fine and the test completes.

What qemu version are you using?

I tried 4.2.1 and 6.2.0, both worked.


     $ qemu-system-ppc64le --version
     QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
     Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project 
developers


/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log 



Sorry, that was the wrong path/test. The correct one for the excerpt 
below is:



/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log 



(For TREE03, QEMU does not start the Linux kernel at all; there is no 
output after:


     Booting Linux via __start() @ 0x0040 ...
)


  [    1.119803][    T1] Run /init as init process
  [    1.122011][    T1] init[1]: segfault (11) at f0656d90 nip 1a18 lr 
0 code 1 in init[1000+d]
  [    1.124863][    T1] init[1]: code: 2c2903e7 f9210030 4081ff84 4b58 
 0100 0580 3c40100f
  [    1.128823][    T1] init[1]: code: 38427c00 7c290b78 782106e4 3800 
 7c0803a6 f801 e9028010


The disassembly from 3c40100f is:
   lis r2,4111
   addi    r2,r2,31744
   mr  r9,r1
   rldicr  r1,r1,0,59
   li  r0,0
   stdu    r1,-128(r1)    <- fault
   mtlr    r0
   std r0,0(r1)
   ld  r8,-32752(r2)


I think you'll find that's the code at the ELF entry point. You can
check with:

  $ readelf -e tools/testing/selftests/rcutorture/initrd/init | grep Entry

    Entry point address:   0x1c0c

  $ objdump -d tools/testing/selftests/rcutorture/initrd/init | grep -m 1 -A 8 1c0c

 1c0c:   0e 10 40 3c lis r2,4110
 1c10:   00 7b 42 38 addi    r2,r2,31488
 1c14:   78 0b 29 7c mr  r9,r1
 1c18:   e4 06 21 78 rldicr  r1,r1,0,59
 1c1c:   00 00 00 38 li  r0,0
 1c20:   81 ff 21 f8 stdu    r1,-128(r1)
 1c24:   a6 03 08 7c mtlr    r0
 1c28:   00 00 01 f8 std r0,0(r1)
 1c2c:   10 80 02 e9 ld  r8,-32752(r2)

The fault you're seeing is the first store 

Re: rcutorture’s init segfaults in ppc64le VM

2022-02-08 Thread Paul Menzel

Dear Michael,


Thank you for looking into this.

On 08.02.22 at 11:09, Michael Ellerman wrote:
> Paul Menzel writes:

[…]

> > On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
> > 5.17-rc2+ with rcutorture tests
>
> I'm not sure if that's the host kernel version or the version you're
> using of rcutorture? Can you tell us the sha1 of your host kernel and of
> the tree you're running rcutorture from?


The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately, 
I am unable to find the exact sha1.


$ more /proc/version
Linux version 5.17.0-rc1+ (pmen...@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022

The Linux tree from which I run rcutorture is at commit 
dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:


$ git log --oneline -6
207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems with rcutorture on ppc64le: allmodconfig(2) and other failures
8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
a447541d925f ata: libata-sata: remove debounce delay by default
afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3


> >  $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
> >
> > the built init
> >
> >  $ file tools/testing/selftests/rcutorture/initrd/init
> >  tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB
> > executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically
> > linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for
> > GNU/Linux 3.10.0, stripped
>
> Mine looks pretty much identical:
>
>   $ file tools/testing/selftests/rcutorture/initrd/init
>   tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB
>   executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically
>   linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for
>   GNU/Linux 3.10.0, stripped
>
> > segfaults in QEMU. From one of the log files
>
> But mine doesn't segfault, it runs fine and the test completes.
>
> What qemu version are you using?
>
> I tried 4.2.1 and 6.2.0, both worked.


$ qemu-system-ppc64le --version
QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers


> > /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log


Sorry, that was the wrong path/test. The correct one for the excerpt 
below is:



/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log

(For TREE03, QEMU does not start the Linux kernel at all; there is no 
output after:


Booting Linux via __start() @ 0x0040 ...
)


> >  [1.119803][T1] Run /init as init process
> >  [1.122011][T1] init[1]: segfault (11) at f0656d90 nip 1a18 lr 0 code 1 in init[1000+d]
> >  [1.124863][T1] init[1]: code: 2c2903e7 f9210030 4081ff84 4b58  0100 0580 3c40100f
> >  [1.128823][T1] init[1]: code: 38427c00 7c290b78 782106e4 3800  7c0803a6 f801 e9028010
>
> The disassembly from 3c40100f is:
>    lis r2,4111
>    addi    r2,r2,31744
>    mr  r9,r1
>    rldicr  r1,r1,0,59
>    li  r0,0
>    stdu    r1,-128(r1)  <- fault
>    mtlr    r0
>    std r0,0(r1)
>    ld  r8,-32752(r2)
>
> I think you'll find that's the code at the ELF entry point. You can
> check with:
>
>   $ readelf -e tools/testing/selftests/rcutorture/initrd/init | grep Entry
>     Entry point address:   0x1c0c
>
>   $ objdump -d tools/testing/selftests/rcutorture/initrd/init | grep -m 1 -A 8 1c0c
>  1c0c:   0e 10 40 3c lis r2,4110
>  1c10:   00 7b 42 38 addi    r2,r2,31488
>  1c14:   78 0b 29 7c mr  r9,r1
>  1c18:   e4 06 21 78 rldicr  r1,r1,0,59
>  1c1c:   00 00 00 38 li  r0,0
>  1c20:   81 ff 21 f8 stdu    r1,-128(r1)
>  1c24:   a6 03 08 7c mtlr    r0
>  1c28:   00 00 01 f8 std r0,0(r1)
>  1c2c:   10 80 02 e9 ld  r8,-32752(r2)
>
> The fault you're seeing is the first store using the stack pointer (r1),
> which is set up by the kernel.
>
> The fault address f0656d90 is weirdly low; the stack should be up near 128TB.
>
> I'm not sure how we end up with a bad r1.
>
> Can you dump some info about the kernel that was built, something like:
>
> $ file /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/vmlinux
>
> And maybe paste/attach the full log, maybe there's a clue somewhere.


You can now download the content of 
`/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01` 
[1, 65 MB].


Can you reproduce the segmentation fault with the line 

Re: rcutorture’s init segfaults in ppc64le VM

2022-02-08 Thread Michael Ellerman
Paul Menzel  writes:
> Dear Linux folks,

Hi Paul,

> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux 
> 5.17-rc2+ with rcutorture tests

I'm not sure if that's the host kernel version or the version you're
using of rcutorture? Can you tell us the sha1 of your host kernel and of
the tree you're running rcutorture from?

>  $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
>
> the built init
>
>  $ file tools/testing/selftests/rcutorture/initrd/init
>  tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB 
> executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically 
> linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for 
> GNU/Linux 3.10.0, stripped

Mine looks pretty much identical:

  $ file tools/testing/selftests/rcutorture/initrd/init
  tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB
  executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically
  linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for
  GNU/Linux 3.10.0, stripped


> segfaults in QEMU. From one of the log files

But mine doesn't segfault, it runs fine and the test completes.

What qemu version are you using?

I tried 4.2.1 and 6.2.0, both worked.


> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
>
>  [1.119803][T1] Run /init as init process
>  [1.122011][T1] init[1]: segfault (11) at f0656d90 nip 1a18 
> lr 0 code 1 in init[1000+d]
>  [1.124863][T1] init[1]: code: 2c2903e7 f9210030 4081ff84 
> 4b58  0100 0580 3c40100f
>  [1.128823][T1] init[1]: code: 38427c00 7c290b78 782106e4 
> 3800  7c0803a6 f801 e9028010

The disassembly from 3c40100f is:
  lis r2,4111
  addi    r2,r2,31744
  mr  r9,r1
  rldicr  r1,r1,0,59
  li  r0,0
  stdu    r1,-128(r1)   <- fault
  mtlr    r0
  std r0,0(r1)
  ld  r8,-32752(r2)
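
As a cross-check of the disassembly above, the instruction words from the "code:" lines can be decoded by hand. The following is only a minimal sketch in C, assuming the standard PowerPC D-form and DS-form field layout (6-bit primary opcode, two 5-bit register fields, 16-bit immediate). The names `ppc_decode`, `ppc_ds_displacement`, `ppc_ds_variant` and `le32` are made up for this illustration and come from no existing tool. The word f821ff81 used in the notes is taken from the objdump bytes `81 ff 21 f8`, since the archived log does not show it.

```c
#include <stdint.h>

/* Fields shared by PowerPC D-form and DS-form instructions. */
struct ppc_insn_fields {
	unsigned opcode;	/* primary opcode: the top 6 bits */
	unsigned rt;		/* target (or source) register */
	unsigned ra;		/* base register */
	int imm;		/* low 16 bits, sign-extended */
};

/* Sign-extend the low 16 bits without an implementation-defined cast. */
static int sign_extend16(uint32_t v)
{
	v &= 0xffffu;
	return v >= 0x8000u ? (int)v - 0x10000 : (int)v;
}

/* Split a 32-bit instruction word into its D-form fields. */
static struct ppc_insn_fields ppc_decode(uint32_t insn)
{
	struct ppc_insn_fields f;

	f.opcode = insn >> 26;
	f.rt = (insn >> 21) & 0x1f;
	f.ra = (insn >> 16) & 0x1f;
	f.imm = sign_extend16(insn);
	return f;
}

/* DS-form (ld, std, stdu): the low two bits select the variant... */
static unsigned ppc_ds_variant(uint32_t insn)
{
	return insn & 3u;
}

/* ...and the remaining 14 bits, shifted left by two, are the displacement. */
static int ppc_ds_displacement(uint32_t insn)
{
	return sign_extend16(insn & 0xfffcu);
}

/* Build an instruction word from the little-endian bytes objdump prints. */
static uint32_t le32(const unsigned char b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

With these helpers, 3c40100f decodes to opcode 15 (addis, shown as lis when ra is 0) with rt = 2 and immediate 4111, and f821ff81 decodes to opcode 62, variant 1 (stdu) with displacement -128, which matches the listing.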


I think you'll find that's the code at the ELF entry point. You can
check with:

 $ readelf -e tools/testing/selftests/rcutorture/initrd/init | grep Entry
   Entry point address:   0x1c0c

 $ objdump -d tools/testing/selftests/rcutorture/initrd/init | grep -m 1 -A 8 1c0c
1c0c:   0e 10 40 3c lis r2,4110
1c10:   00 7b 42 38 addi    r2,r2,31488
1c14:   78 0b 29 7c mr  r9,r1
1c18:   e4 06 21 78 rldicr  r1,r1,0,59
1c1c:   00 00 00 38 li  r0,0
1c20:   81 ff 21 f8 stdu    r1,-128(r1)
1c24:   a6 03 08 7c mtlr    r0
1c28:   00 00 01 f8 std r0,0(r1)
1c2c:   10 80 02 e9 ld  r8,-32752(r2)
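
The readelf check can also be done programmatically. The sketch below is an illustration only: it assumes the generic ELF64 header layout (e_ident[4] is the class, e_ident[5] the byte order, e_entry is the 8 bytes at offset 24), and `elf64_entry` is a hypothetical helper name, not part of any existing library.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the entry point address from an ELF64 file.
 * Returns 0 and stores the address in *entry on success, -1 otherwise.
 */
static int elf64_entry(const char *path, uint64_t *entry)
{
	unsigned char h[32];
	uint64_t v = 0;
	size_t n;
	FILE *f;
	int i;

	f = fopen(path, "rb");
	if (!f)
		return -1;
	n = fread(h, 1, sizeof(h), f);
	fclose(f);

	if (n != sizeof(h) || memcmp(h, "\177ELF", 4) != 0)
		return -1;		/* short file or wrong magic */
	if (h[4] != 2)
		return -1;		/* only 64-bit objects handled here */

	if (h[5] == 1) {		/* little-endian, as on ppc64le */
		for (i = 7; i >= 0; i--)
			v = (v << 8) | h[24 + i];
	} else if (h[5] == 2) {		/* big-endian */
		for (i = 0; i < 8; i++)
			v = (v << 8) | h[24 + i];
	} else {
		return -1;
	}
	*entry = v;
	return 0;
}
```

Calling `elf64_entry()` on the `init` binary should report the same address as the "Entry point address" line from readelf, which can then be compared with the nip value in the segfault message.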


The fault you're seeing is the first store using the stack pointer (r1),
which is set up by the kernel.

The fault address f0656d90 is weirdly low; the stack should be up near 128TB.

I'm not sure how we end up with a bad r1.
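
To make the arithmetic behind that observation concrete, here is a small sketch in C. It takes the fault address exactly as printed in the log, and assumes that `rldicr r1,r1,0,59` clears the low four bits of r1 and that `stdu r1,-128(r1)` stores at r1 - 128. All helper names are hypothetical.

```c
#include <stdint.h>

/* 128 TB, the region where the initial user stack is expected to sit. */
#define STACK_TOP_128TB (128ULL << 40)

/* rldicr r1,r1,0,59 keeps bits 0..59 (IBM numbering): 16-byte alignment. */
static uint64_t align_sp(uint64_t r1)
{
	return r1 & ~0xfULL;
}

/* stdu r1,-128(r1): the store goes to the aligned r1 minus 128. */
static uint64_t first_store_address(uint64_t r1_at_entry)
{
	return align_sp(r1_at_entry) - 128;
}

/* Inverse direction: the aligned stack pointer implied by a fault address. */
static uint64_t aligned_sp_from_fault(uint64_t fault_address)
{
	return fault_address + 128;
}

/* True if the address lies below 4 GiB, far from the 128 TB region. */
static int below_4gib(uint64_t address)
{
	return address < (1ULL << 32);
}
```

Under these assumptions, a fault at f0656d90 implies that r1 was between f0656e10 and f0656e1f when `init` started. That is below 4 GiB, while 128 TB is 2^47, so the stack pointer handed to the process was off by many orders of magnitude rather than by a small offset.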

Can you dump some info about the kernel that was built, something like:

$ file /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/vmlinux

And maybe paste/attach the full log, maybe there's a clue somewhere.

cheers


Re: rcutorture’s init segfaults in ppc64le VM

2022-02-07 Thread Zhouyi Zhou
Hi,

The mailing list forwards emails to me in periodic batches, so I am very
sorry that I did not see Willy's email until I visited
https://lore.kernel.org/rcu/20220207180901.gb14...@1wt.eu/T/#u. I am
also very interested in testing Willy's proposal.

Thanks a lot
Zhouyi

On Tue, Feb 8, 2022 at 1:46 PM Zhouyi Zhou  wrote:
>
> Dear Paul
>
> I am also very interested in the topic.
> The Open Source Lab of Oregon State University has lent me an 8-core
> POWER ppc64el VM for 3 months, so I can try to reproduce this bug
> in the virtual machine by running qemu without hardware acceleration
> (using the -no-kvm argument).
> I am currently looking into
> https://lore.kernel.org/rcu/20220201175023.GW4285@paulmck-ThinkPad-P17-Gen-1/T/#mc7e5f8ec99e3794bec1e38fbbb130e71172e4759,
> and I think I can give a short preliminary report on that earlier topic
> tomorrow. I would like to start investigating the new topic
> the day after tomorrow.
>
> Thank you both for providing me an opportunity to improve myself ;-)
>
> Thanks again
> Zhouyi
>
> On Tue, Feb 8, 2022 at 12:10 PM Paul E. McKenney  wrote:
> >
> > On Mon, Feb 07, 2022 at 05:44:47PM +0100, Paul Menzel wrote:
> > > Dear Linux folks,
> > >
> > >
> > > On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
> > > 5.17-rc2+ with rcutorture tests
> > >
> > > $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
> > >
> > > the built init
> > >
> > > $ file tools/testing/selftests/rcutorture/initrd/init
> > > tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB
> > > executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically
> > > linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for
> > > GNU/Linux 3.10.0, stripped
> > >
> > > segfaults in QEMU. From one of the log files
> > >
> > >
> > > /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
> > >
> > > [1.119803][T1] Run /init as init process
> > > [1.122011][T1] init[1]: segfault (11) at f0656d90 nip 1a18
> > > lr 0 code 1 in init[1000+d]
> > > [1.124863][T1] init[1]: code: 2c2903e7 f9210030 4081ff84
> > > 4b58  0100 0580 3c40100f
> > > [1.128823][T1] init[1]: code: 38427c00 7c290b78 782106e4
> > > 3800  7c0803a6 f801 e9028010
> > >
> > > > Executing the init, which just seems to be an endless loop, from userspace
> > > > works:
> > >
> > > $ strace ./tools/testing/selftests/rcutorture/initrd/init
> > > execve("./tools/testing/selftests/rcutorture/initrd/init",
> > > ["./tools/testing/selftests/rcutor"...], 0x7db9e860 /* 31 vars */) = 0
> > > brk(NULL)   = 0x1001d94
> > > brk(0x1001d940b98)  = 0x1001d940b98
> > > set_tid_address(0x1001d9400d0)  = 2890832
> > > set_robust_list(0x1001d9400e0, 24)  = 0
> > > uname({sysname="Linux",
> > > nodename="flughafenberlinbrandenburgwillybrandt.molgen.mpg.de", ...}) = 0
> > > prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024,
> > > rlim_max=RLIM64_INFINITY}) = 0
> > > readlink("/proc/self/exe", "/dev/shm/linux/tools/testing/sel"..., 
> > > 4096)
> > > = 61
> > > getrandom("\xf1\x30\x4c\x9e\x82\x8d\x26\xd7", 8, GRND_NONBLOCK) = 8
> > > brk(0x1001d970b98)  = 0x1001d970b98
> > > brk(0x1001d98)  = 0x1001d98
> > > mprotect(0x100e, 65536, PROT_READ)  = 0
> > > clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0},
> > > 0x7b22c8a8) = 0
> > > clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0},
> > > 0x7b22c8a8) = 0
> > > clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, ^C{tv_sec=0,
> > > tv_nsec=872674044}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
> > > strace: Process 2890832 detached
> >
> > Huh.  In PowerPC, is there some difference between system calls
> > executed in initrd and those same system calls executed in userspace?
> >
> > And just to make sure, the above strace was from exactly the same
> > binary "init" file that is included in initrd, correct?
> >
> > Adding Willy Tarreau for his thoughts.
> >
> > Thanx, Paul
> >
> > > Any ideas, what `mkinitrd.sh` [2] should do differently?
> > >
> > > ```
> > > cat > init.c << '___EOF___'
> > > #ifndef NOLIBC
> > > > #include <unistd.h>
> > > > #include <sys/time.h>
> > > #endif
> > >
> > > volatile unsigned long delaycount;
> > >
> > > int main(int argc, int argv[])
> > > {
> > >   int i;
> > >   struct timeval tv;
> > >   struct timeval tvb;
> > >
> > >   for (;;) {
> > >   sleep(1);
> > >   /* Need some userspace time. */
> > > >   if (gettimeofday(&tvb, NULL))
> > >   continue;
> > >   do {
> > >   for (i = 0; i < 1000 * 100; i++)
> > >  

Re: rcutorture’s init segfaults in ppc64le VM

2022-02-07 Thread Zhouyi Zhou
Dear Paul

I am also very interested in the topic.
The Open Source Lab of Oregon State University has lent me an 8-core
POWER ppc64el VM for 3 months, so I can try to reproduce this bug
in the virtual machine by running qemu without hardware acceleration
(using the -no-kvm argument).
I am currently looking into
https://lore.kernel.org/rcu/20220201175023.GW4285@paulmck-ThinkPad-P17-Gen-1/T/#mc7e5f8ec99e3794bec1e38fbbb130e71172e4759,
and I think I can give a short preliminary report on that earlier topic
tomorrow. I would like to start investigating the new topic
the day after tomorrow.

Thank you both for providing me an opportunity to improve myself ;-)

Thanks again
Zhouyi

On Tue, Feb 8, 2022 at 12:10 PM Paul E. McKenney  wrote:
>
> On Mon, Feb 07, 2022 at 05:44:47PM +0100, Paul Menzel wrote:
> > Dear Linux folks,
> >
> >
> > On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
> > 5.17-rc2+ with rcutorture tests
> >
> > $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
> >
> > the built init
> >
> > $ file tools/testing/selftests/rcutorture/initrd/init
> > tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB
> > executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically
> > linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for
> > GNU/Linux 3.10.0, stripped
> >
> > segfaults in QEMU. From one of the log files
> >
> >
> > /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
> >
> > [1.119803][T1] Run /init as init process
> > [1.122011][T1] init[1]: segfault (11) at f0656d90 nip 1a18
> > lr 0 code 1 in init[1000+d]
> > [1.124863][T1] init[1]: code: 2c2903e7 f9210030 4081ff84
> > 4b58  0100 0580 3c40100f
> > [1.128823][T1] init[1]: code: 38427c00 7c290b78 782106e4
> > 3800  7c0803a6 f801 e9028010
> >
> > > Executing the init, which just seems to be an endless loop, from userspace
> > > works:
> >
> > $ strace ./tools/testing/selftests/rcutorture/initrd/init
> > execve("./tools/testing/selftests/rcutorture/initrd/init",
> > ["./tools/testing/selftests/rcutor"...], 0x7db9e860 /* 31 vars */) = 0
> > brk(NULL)   = 0x1001d94
> > brk(0x1001d940b98)  = 0x1001d940b98
> > set_tid_address(0x1001d9400d0)  = 2890832
> > set_robust_list(0x1001d9400e0, 24)  = 0
> > uname({sysname="Linux",
> > nodename="flughafenberlinbrandenburgwillybrandt.molgen.mpg.de", ...}) = 0
> > prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024,
> > rlim_max=RLIM64_INFINITY}) = 0
> > readlink("/proc/self/exe", "/dev/shm/linux/tools/testing/sel"..., 4096)
> > = 61
> > getrandom("\xf1\x30\x4c\x9e\x82\x8d\x26\xd7", 8, GRND_NONBLOCK) = 8
> > brk(0x1001d970b98)  = 0x1001d970b98
> > brk(0x1001d98)  = 0x1001d98
> > mprotect(0x100e, 65536, PROT_READ)  = 0
> > clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0},
> > 0x7b22c8a8) = 0
> > clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0},
> > 0x7b22c8a8) = 0
> > clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, ^C{tv_sec=0,
> > tv_nsec=872674044}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
> > strace: Process 2890832 detached
>
> Huh.  In PowerPC, is there some difference between system calls
> executed in initrd and those same system calls executed in userspace?
>
> And just to make sure, the above strace was from exactly the same
> binary "init" file that is included in initrd, correct?
>
> Adding Willy Tarreau for his thoughts.
>
> Thanx, Paul
>
> > Any ideas, what `mkinitrd.sh` [2] should do differently?
> >
> > ```
> > cat > init.c << '___EOF___'
> > #ifndef NOLIBC
> > > #include <unistd.h>
> > > #include <sys/time.h>
> > #endif
> >
> > volatile unsigned long delaycount;
> >
> > int main(int argc, int argv[])
> > {
> >   int i;
> >   struct timeval tv;
> >   struct timeval tvb;
> >
> >   for (;;) {
> >   sleep(1);
> >   /* Need some userspace time. */
> > >   if (gettimeofday(&tvb, NULL))
> >   continue;
> >   do {
> >   for (i = 0; i < 1000 * 100; i++)
> >   delaycount = i * i;
> > >   if (gettimeofday(&tv, NULL))
> >   break;
> >   tv.tv_sec -= tvb.tv_sec;
> >   if (tv.tv_sec > 1)
> >   break;
> >   tv.tv_usec += tv.tv_sec * 1000 * 1000;
> >   tv.tv_usec -= tvb.tv_usec;
> >   } while (tv.tv_usec < 1000);
> >   }
> >   return 0;
> > }
> > ___EOF___
> >
> > # build using nolibc on supported archs (smaller executable) 

Re: rcutorture’s init segfaults in ppc64le VM

2022-02-07 Thread Paul E. McKenney
On Mon, Feb 07, 2022 at 05:44:47PM +0100, Paul Menzel wrote:
> Dear Linux folks,
> 
> 
> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
> 5.17-rc2+ with rcutorture tests
> 
> $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
> 
> the built init
> 
> $ file tools/testing/selftests/rcutorture/initrd/init
> tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB
> executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically
> linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for
> GNU/Linux 3.10.0, stripped
> 
> segfaults in QEMU. From one of the log files
> 
> 
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
> 
> [1.119803][T1] Run /init as init process
> [1.122011][T1] init[1]: segfault (11) at f0656d90 nip 1a18
> lr 0 code 1 in init[1000+d]
> [1.124863][T1] init[1]: code: 2c2903e7 f9210030 4081ff84
> 4b58  0100 0580 3c40100f
> [1.128823][T1] init[1]: code: 38427c00 7c290b78 782106e4
> 3800  7c0803a6 f801 e9028010
> 
> Executing the init, which just seems to be an endless loop, from userspace
> works:
> 
> $ strace ./tools/testing/selftests/rcutorture/initrd/init
> execve("./tools/testing/selftests/rcutorture/initrd/init",
> ["./tools/testing/selftests/rcutor"...], 0x7db9e860 /* 31 vars */) = 0
> brk(NULL)   = 0x1001d94
> brk(0x1001d940b98)  = 0x1001d940b98
> set_tid_address(0x1001d9400d0)  = 2890832
> set_robust_list(0x1001d9400e0, 24)  = 0
> uname({sysname="Linux",
> nodename="flughafenberlinbrandenburgwillybrandt.molgen.mpg.de", ...}) = 0
> prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024,
> rlim_max=RLIM64_INFINITY}) = 0
> readlink("/proc/self/exe", "/dev/shm/linux/tools/testing/sel"..., 4096)
> = 61
> getrandom("\xf1\x30\x4c\x9e\x82\x8d\x26\xd7", 8, GRND_NONBLOCK) = 8
> brk(0x1001d970b98)  = 0x1001d970b98
> brk(0x1001d98)  = 0x1001d98
> mprotect(0x100e, 65536, PROT_READ)  = 0
> clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0},
> 0x7b22c8a8) = 0
> clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0},
> 0x7b22c8a8) = 0
> clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, ^C{tv_sec=0,
> tv_nsec=872674044}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
> strace: Process 2890832 detached

Huh.  In PowerPC, is there some difference between system calls
executed in initrd and those same system calls executed in userspace?

And just to make sure, the above strace was from exactly the same
binary "init" file that is included in initrd, correct?
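
One way to answer that question is a byte-for-byte comparison of the binary that was traced with a copy extracted from the initrd. The sketch below is only an illustration. `files_identical` is a hypothetical helper, and it assumes the copy from the initrd has already been extracted to a regular file.

```c
#include <stdio.h>

/*
 * Compare two files byte by byte.
 * Returns 1 if identical, 0 if they differ, -1 if either cannot be opened.
 */
static int files_identical(const char *path_a, const char *path_b)
{
	FILE *fa = fopen(path_a, "rb");
	FILE *fb = fopen(path_b, "rb");
	int ca, cb;
	int result = 1;

	if (!fa || !fb) {
		if (fa)
			fclose(fa);
		if (fb)
			fclose(fb);
		return -1;
	}

	do {
		ca = fgetc(fa);
		cb = fgetc(fb);
		if (ca != cb) {		/* covers differing bytes and differing lengths */
			result = 0;
			break;
		}
	} while (ca != EOF);

	fclose(fa);
	fclose(fb);
	return result;
}
```

Comparing the BuildID reported by `file` for both copies would serve the same purpose with less effort, since the BuildID is already shown earlier in the thread.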

Adding Willy Tarreau for his thoughts.

Thanx, Paul

> Any ideas, what `mkinitrd.sh` [2] should do differently?
> 
> ```
> cat > init.c << '___EOF___'
> #ifndef NOLIBC
> #include <unistd.h>
> #include <sys/time.h>
> #endif
> 
> volatile unsigned long delaycount;
> 
> int main(int argc, int argv[])
> {
>   int i;
>   struct timeval tv;
>   struct timeval tvb;
> 
>   for (;;) {
>   sleep(1);
>   /* Need some userspace time. */
>   if (gettimeofday(&tvb, NULL))
>   continue;
>   do {
>   for (i = 0; i < 1000 * 100; i++)
>   delaycount = i * i;
>   if (gettimeofday(&tv, NULL))
>   break;
>   tv.tv_sec -= tvb.tv_sec;
>   if (tv.tv_sec > 1)
>   break;
>   tv.tv_usec += tv.tv_sec * 1000 * 1000;
>   tv.tv_usec -= tvb.tv_usec;
>   } while (tv.tv_usec < 1000);
>   }
>   return 0;
> }
> ___EOF___
> 
> # build using nolibc on supported archs (smaller executable) and fall
> # back to regular glibc on other ones.
> if echo -e "#if __x86_64__||__i386__||__i486__||__i586__||__i686__" \
>"||__ARM_EABI__||__aarch64__\nyes\n#endif" \
>| ${CROSS_COMPILE}gcc -E -nostdlib -xc - \
>| grep -q '^yes'; then
>   # architecture supported by nolibc
> ${CROSS_COMPILE}gcc -fno-asynchronous-unwind-tables -fno-ident \
>   -nostdlib -include ../../../../include/nolibc/nolibc.h \
>   -s -static -Os -o init init.c -lgcc
> else
>   ${CROSS_COMPILE}gcc -s -static -Os -o init init.c
> fi
> ```
> 
> 
> Kind regards,
> 
> Paul
> 
> 
> [1]: 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/rcutorture/doc/initrd.txt
> [2]: 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/rcutorture/bin/mkinitrd.sh


rcutorture’s init segfaults in ppc64le VM

2022-02-07 Thread Paul Menzel

Dear Linux folks,


On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux 
5.17-rc2+ with rcutorture tests


$ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10

the built init

$ file tools/testing/selftests/rcutorture/initrd/init
tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB 
executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically 
linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for 
GNU/Linux 3.10.0, stripped


segfaults in QEMU. From one of the log files


/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log

[1.119803][T1] Run /init as init process
[1.122011][T1] init[1]: segfault (11) at f0656d90 nip 1a18 lr 0 code 1 in init[1000+d]
[1.124863][T1] init[1]: code: 2c2903e7 f9210030 4081ff84 4b58  0100 0580 3c40100f
[1.128823][T1] init[1]: code: 38427c00 7c290b78 782106e4 3800  7c0803a6 f801 e9028010


Executing the init, which just seems to be an endless loop, from 
userspace works:


$ strace ./tools/testing/selftests/rcutorture/initrd/init
execve("./tools/testing/selftests/rcutorture/initrd/init", ["./tools/testing/selftests/rcutor"...], 0x7db9e860 /* 31 vars */) = 0
brk(NULL)   = 0x1001d94
brk(0x1001d940b98)  = 0x1001d940b98
set_tid_address(0x1001d9400d0)  = 2890832
set_robust_list(0x1001d9400e0, 24)  = 0
uname({sysname="Linux", nodename="flughafenberlinbrandenburgwillybrandt.molgen.mpg.de", ...}) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
readlink("/proc/self/exe", "/dev/shm/linux/tools/testing/sel"..., 4096) = 61
getrandom("\xf1\x30\x4c\x9e\x82\x8d\x26\xd7", 8, GRND_NONBLOCK) = 8
brk(0x1001d970b98)  = 0x1001d970b98
brk(0x1001d98)  = 0x1001d98
mprotect(0x100e, 65536, PROT_READ)  = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7b22c8a8) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7b22c8a8) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, ^C{tv_sec=0, tv_nsec=872674044}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
strace: Process 2890832 detached

Any ideas what `mkinitrd.sh` [2] should do differently?

```
cat > init.c << '___EOF___'
#ifndef NOLIBC
#include <unistd.h>
#include <sys/time.h>
#endif

volatile unsigned long delaycount;

int main(int argc, int argv[])
{
	int i;
	struct timeval tv;
	struct timeval tvb;

	for (;;) {
		sleep(1);
		/* Need some userspace time. */
		if (gettimeofday(&tvb, NULL))
			continue;
		do {
			for (i = 0; i < 1000 * 100; i++)
				delaycount = i * i;
			if (gettimeofday(&tv, NULL))
				break;
			tv.tv_sec -= tvb.tv_sec;
			if (tv.tv_sec > 1)
				break;
			tv.tv_usec += tv.tv_sec * 1000 * 1000;
			tv.tv_usec -= tvb.tv_usec;
		} while (tv.tv_usec < 1000);
	}
	return 0;
}
___EOF___

# build using nolibc on supported archs (smaller executable) and fall
# back to regular glibc on other ones.
if echo -e "#if __x86_64__||__i386__||__i486__||__i586__||__i686__" \
   "||__ARM_EABI__||__aarch64__\nyes\n#endif" \
   | ${CROSS_COMPILE}gcc -E -nostdlib -xc - \
   | grep -q '^yes'; then
	# architecture supported by nolibc
	${CROSS_COMPILE}gcc -fno-asynchronous-unwind-tables -fno-ident \
		-nostdlib -include ../../../../include/nolibc/nolibc.h \
		-s -static -Os -o init init.c -lgcc
else
	${CROSS_COMPILE}gcc -s -static -Os -o init init.c
fi
```
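
The architecture test in the script above can be mirrored in C to see which branch a given compiler would take. This is a sketch only: the helper names are made up, `defined()` is used where the script relies on the macros expanding to 1, and `__powerpc64__` is assumed to be the macro a ppc64le GCC predefines.

```c
/* Returns 1 if the compiler targets one of the architectures listed in the script. */
static int nolibc_branch_taken(void)
{
#if defined(__x86_64__) || defined(__i386__) || defined(__i486__) || \
    defined(__i586__) || defined(__i686__) || defined(__ARM_EABI__) || \
    defined(__aarch64__)
	return 1;	/* script builds init with -nostdlib and nolibc.h */
#else
	return 0;	/* script falls back to a static build against the regular libc */
#endif
}

/* Returns 1 when compiled for 64-bit PowerPC. */
static int is_powerpc64(void)
{
#if defined(__powerpc64__)
	return 1;
#else
	return 0;
#endif
}
```

On ppc64le none of the listed macros is defined, so the script takes the `else` branch and links `init` statically against the regular C library. That fits the `file` output and the libc start-up system calls visible in the strace.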


Kind regards,

Paul


[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/rcutorture/doc/initrd.txt
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/rcutorture/bin/mkinitrd.sh