On Tue, Mar 16, 2021 at 12:35 PM Ben Dooks <ben.do...@codethink.co.uk> wrote:
> >>>> On 12/03/2021 16:25, Alex Ghiti wrote:
> >>>>>
> >>>>>
> >>>>> On 3/12/21 at 10:12 AM, Dmitry Vyukov wrote:
> >>>>>> On Fri, Mar 12, 2021 at 2:50 PM Ben Dooks <ben.do...@codethink.co.uk>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> On 10/03/2021 17:16, Dmitry Vyukov wrote:
> >>>>>>>> On Wed, Mar 10, 2021 at 5:46 PM syzbot
> >>>>>>>> <syzbot+e74b94fe601ab9552...@syzkaller.appspotmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hello,
> >>>>>>>>>
> >>>>>>>>> syzbot found the following issue on:
> >>>>>>>>>
> >>>>>>>>> HEAD commit: 0d7588ab riscv: process: Fix no prototype for
> >>>>>>>>> arch_dup_tas..
> >>>>>>>>> git tree:
> >>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes
> >>>>>>>>> console output:
> >>>>>>>>> https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d00000
> >>>>>>>>> kernel config:
> >>>>>>>>> https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136
> >>>>>>>>> dashboard link:
> >>>>>>>>> https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
> >>>>>>>>> userspace arch: riscv64
> >>>>>>>>>
> >>>>>>>>> Unfortunately, I don't have any reproducer for this issue yet.
> >>>>>>>>>
> >>>>>>>>> IMPORTANT: if you fix the issue, please add the following tag to
> >>>>>>>>> the commit:
> >>>>>>>>> Reported-by: syzbot+e74b94fe601ab9552...@syzkaller.appspotmail.com
> >>>>>>>>
> >>>>>>>> +riscv maintainers
> >>>>>>>>
> >>>>>>>> This is riscv64-specific.
> >>>>>>>> I've seen similar crashes in put_user in other places. It looks like
> >>>>>>>> put_user crashes if the user address is not mapped/protected (?).
> >>>>>>>
> >>>>>>> I've been having a look, and this seems to be down to an access of
> >>>>>>> the tsk->set_child_tid variable. I assume the fuzzing here is to pass
> >>>>>>> a bad address to clone?
> >>>>>>>
> >>>>>>> From looking at the code, the put_user() code should have set the
> >>>>>>> relevant SR_SUM bit (the value for this, which is 1<<18, is in the
> >>>>>>> s2 register in the crash report), and from looking at the compiler
> >>>>>>> output from my gcc-10, the code looks to be doing the relevant csrs
> >>>>>>> and then csrc around the put_user.
> >>>>>>>
> >>>>>>> So currently I do not understand how the above could have happened
> >>>>>>> other than something re-tried the code sequence and ended up retrying
> >>>>>>> the faulting instruction without the SR_SUM bit set.
> >>>>>>
> >>>>>> I would maybe blame qemu for randomly resetting SR_SUM, but it's
> >>>>>> strange that 99% of these crashes are in schedule_tail. If it were
> >>>>>> qemu, then they would be more evenly distributed...
> >>>>>>
> >>>>>> Another observation: looking at a dozen crash logs, in none of
> >>>>>> these cases was the fuzzer actually trying to fuzz clone with some
> >>>>>> insane arguments. So it looks like completely normal clones (e.g.
> >>>>>> coming from pthread_create) result in this crash.
> >>>>>>
> >>>>>> I also wonder why there is a ret_from_exception; is it normal? I see
> >>>>>> handle_exception disables SR_SUM:
> >>>>>
> >>>>> csrrc does the right thing: it clears the SR_SUM bit in status but
> >>>>> saves the previous value, which will get correctly restored.
> >>>>>
> >>>>> ("The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the
> >>>>> value of the CSR, zero-extends the value to XLEN bits, and writes it to
> >>>>> integer register rd. The initial value in integer register rs1 is
> >>>>> treated as a bit mask that specifies bit positions to be cleared in the
> >>>>> CSR. Any bit that is high in rs1 will cause the corresponding bit to be
> >>>>> cleared in the CSR, if that CSR bit is writable. Other bits in the CSR
> >>>>> are unaffected.")
> >>>>
> >>>> I think there may also be an understanding issue on what the SR_SUM
> >>>> bit does.
> >>>> I thought if it is set, M->U accesses would fault, which is
> >>>> why it gets set early on. But from reading the uaccess code it looks
> >>>> like the uaccess code sets it on entry and then clears it on exit.
> >>>>
> >>>> I am very confused. Is there a master reference for rv64?
> >>>>
> >>>> https://people.eecs.berkeley.edu/~krste/papers/riscv-privileged-v1.9.pdf
> >>>> seems to state PUM is the SR_SUM bit, and that (if set) disabled
> >>>>
> >>>> Quote:
> >>>> The PUM (Protect User Memory) bit modifies the privilege with which
> >>>> S-mode loads, stores, and instruction fetches access virtual memory.
> >>>> When PUM=0, translation and protection behave as normal. When PUM=1,
> >>>> S-mode memory accesses to pages that are accessible by U-mode (U=1 in
> >>>> Figure 4.19) will fault. PUM has no effect when executing in U-mode.
> >>>>
> >>>>
> >>>>>> https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/riscv/kernel/entry.S#L73
> >>>>>>
> >>>>>
> >>>>> Still no luck for the moment, I can't reproduce it locally. My test
> >>>>> is maybe not that good (I created threads all day long in order to
> >>>>> trigger the put_user of schedule_tail).
> >>>>
> >>>> It may of course depend on memory and other stuff. I did try to see if
> >>>> it was possible to clone() with the child_tid address being a valid but
> >>>> not mapped page...
> >>>>
> >>>>> Given that the path you mention works most of the time, and that the
> >>>>> status register in the stack trace shows the SUM bit is not set whereas
> >>>>> it is set in put_user, I'm leaning toward some race condition (maybe an
> >>>>> interrupt that arrives at the "wrong" time) or a qemu issue as you
> >>>>> mentioned.
> >>>>
> >>>> I suppose this is possible. From what I read it should get to the
> >>>> point of being there with the SUM flag cleared, so either something
> >>>> went wrong in trying to fix the instruction up or there's some other
> >>>> error we're missing.
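To make the SUM discussion concrete, the following Go program models only the sstatus bit manipulation being described. It is a sketch, not kernel code: the hart type and the putUser, trapEntry, and trapReturn helpers are hypothetical, and only the SR_SUM value (1<<18) comes from the thread. It assumes the semantics of the privileged spec v1.10 and later, where the v1.9 PUM bit quoted above was replaced by SUM with the opposite sense: with SUM=1, S-mode loads and stores to U=1 pages are permitted; with SUM=0 they fault. Under that reading, put_user sets SUM on entry and clears it on exit, and csrrc on exception entry clears SUM while saving the old value for the return path.

```go
package main

import (
	"errors"
	"fmt"
)

// SR_SUM is the sstatus bit value quoted in the thread (1<<18).
const SR_SUM uint64 = 1 << 18

// errPageFault stands in for the trap a real hart would raise.
var errPageFault = errors.New("page fault: S-mode access to U page with SUM=0")

// hart is a hypothetical model holding only the sstatus CSR.
type hart struct {
	sstatus uint64
}

// csrs sets the bits in mask, like the csrs instruction.
func (h *hart) csrs(mask uint64) { h.sstatus |= mask }

// csrc clears the bits in mask, like the csrc instruction.
func (h *hart) csrc(mask uint64) { h.sstatus &^= mask }

// csrrc returns the old value and clears the bits in mask. On hardware this
// is a single atomic instruction; the model has no concurrency.
func (h *hart) csrrc(mask uint64) uint64 {
	old := h.sstatus
	h.sstatus &^= mask
	return old
}

// accessUserPage models an S-mode load/store to a U=1 page under the
// v1.10+ rule (assumed here): it faults unless SUM is set.
func (h *hart) accessUserPage() error {
	if h.sstatus&SR_SUM == 0 {
		return errPageFault
	}
	return nil
}

// putUser models the csrs / access / csrc bracket seen in the compiler output.
func (h *hart) putUser() error {
	h.csrs(SR_SUM)
	defer h.csrc(SR_SUM)
	return h.accessUserPage()
}

// trapEntry models handle_exception: clear SUM, keep the old sstatus.
func (h *hart) trapEntry() (saved uint64) { return h.csrrc(SR_SUM) }

// trapReturn models the return path writing the saved sstatus back.
func (h *hart) trapReturn(saved uint64) { h.sstatus = saved }

func main() {
	h := &hart{}
	fmt.Println("put_user:", h.putUser())

	// An exception taken between csrs and the access: SUM is cleared on
	// entry and restored on return, so the access still succeeds.
	h.csrs(SR_SUM)
	saved := h.trapEntry()
	inTrap := h.sstatus&SR_SUM != 0
	h.trapReturn(saved)
	err := h.accessUserPage()
	h.csrc(SR_SUM)
	fmt.Println("SUM set inside trap:", inTrap, "access after return:", err)

	// Re-executing the access with SUM clear reproduces the reported fault.
	fmt.Println("access with SUM=0:", h.accessUserPage())
}
```

In this model an interrupt landing inside the csrs/csrc window is harmless as long as the saved sstatus is restored; a fault like the one reported needs the access to be re-executed with SUM clear, which is the retry scenario raised earlier in the thread.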
> >>>>
> >>>>> To eliminate qemu issues, do you have access to some HW? Or to
> >>>>> different qemu versions?
> >>>>
> >>>> I do have access to a Microchip Polarfire board. I just need the
> >>>> instructions on how to set up the test code to make it work on the
> >>>> hardware.
> >>>
> >>> For full syzkaller support, it would need to know how to reboot these
> >>> boards and get access to the console.
> >>> syzkaller has a stop-gap VM backend which just uses ssh to a physical
> >>> machine and expects the kernel to reboot on its own after any crashes.
> >>>
> >>> But I actually managed to reproduce it in an even simpler setup,
> >>> assuming you have Go 1.15 and a riscv64 gcc cross-compiler installed:
> >>>
> >>> $ go get -u -d github.com/google/syzkaller/...
> >>> $ cd $GOPATH/src/github.com/google/syzkaller
> >>> $ make stress executor TARGETARCH=riscv64
> >>> $ scp bin/linux_riscv64/syz-stress bin/linux_riscv64/syz-executor
> >>> your_machine:/
> >>>
> >>> Then run ./syz-stress on the machine.
> >>> On the first run it crashed with some other bug; on the second run
> >>> I got the crash in schedule_tail.
> >>> With qemu tcg I also added the -slowdown=10 flag to syz-stress to scale
> >>> all timeouts; if native execution is faster, you don't need it.
> >>
> >> Ok, not sure what's going on. I get a lot of errors similar to:
> >>>
> >>> 2021/03/15 21:35:20 transitively unsupported:
> >>> ioctl$SNAPSHOT_CREATE_IMAGE: no syscalls can create resource fd_snapshot,
> >>> enable some syscalls that can create it [openat$snapshot]
> >
> > This is not an error, just a notification that some syscalls are not
> > enabled in the kernel and won't be fuzzed.
> >
> >> Followed by:
> >>
> >>> 2021/03/15 21:35:48 executed 0 programs
> >>> 2021/03/15 21:35:48 failed to create execution environment: failed to
> >>> mmap shm file: invalid argument
> >>
> >> The qemu is 5.2.0 and root is Debian/unstable riscv64 (same as the
> >> chroot used to build the syz tools).
> >
> > This is an error. But this is the first time I've ever seen it.
> > It comes from here:
> > https://github.com/google/syzkaller/blob/fdb2bb2c23ee709880407f56307e2800ad27e9ae/pkg/osutil/osutil_unix.go#L119-L121
> > There should be pretty simple logic inside syscall.Mmap. Perhaps
> > you are using some older Go toolchain with incomplete riscv support?
> > I think I've used 1.14 and 1.15. But there is already 1.16. You can
> > always download a toolchain here:
> > https://golang.org/dl/
>
> Hmm, it would have been useful to print out what file it failed to map.
What do you want to do with the file name? It's not one of the pre-existing
files, so the name won't tell the user much. It's just a temp file; it won't
exist afterwards, and it's easy to create an equivalent file. It was created
in that function with:

	f, err = ioutil.TempFile("./", "syzkaller-shm")
	if err != nil {
		err = fmt.Errorf("failed to create temp file: %v", err)
		return
	}
	if err = f.Truncate(int64(size)); err != nil {
		err = fmt.Errorf("failed to truncate shm file: %v", err)
		f.Close()
		os.Remove(f.Name())
		return
	}
	f.Close()
	fname := f.Name()
	f, err = os.OpenFile(f.Name(), os.O_RDWR, DefaultFilePerm)
	if err != nil {
		err = fmt.Errorf("failed to open shm file: %v", err)
		os.Remove(fname)
		return
	}

> I've got go 1.15 from the debian/unstable riscv64 chroot.
> I'll have a look at this in a bit to see if it throws the same issue on
> a real system.
>
>
> --
> Ben Dooks				http://www.codethink.co.uk/
> Senior Engineer				Codethink - Providing Genius
>
> https://www.codethink.co.uk/privacy.html