We've been here before:

The problem is not related to KERNEL_LOCK around uvm_fault.

Jeremie Courreges-Anglas <j...@wxcvbn.org> wrote:

> On Mon, Nov 01 2021, Martin Pieuchot <m...@openbsd.org> wrote:
> > On 31/10/21(Sun) 15:57, Jeremie Courreges-Anglas wrote:
> >> On Fri, Oct 08 2021, Jeremie Courreges-Anglas <j...@wxcvbn.org> wrote:
> >> > riscv64.ports was running dpb(1) with two other members in the build
> >> > cluster.  A few minutes ago I found it in ddb(4).  The report is short,
> >> > sadly, as the machine doesn't return from the 'bt' command.
> >> >
> >> > The machine is acting both as an NFS server and an NFS client.
> >> >
> >> > OpenBSD/riscv64 (riscv64.ports.openbsd.org) (console)
> >> >
> >> > login: panic: pool_anic:t: pol_ free l: p mod fiee liat m  oxifief:c a2e 
> >> > 07ff0ff fte21ade0 00f ifem c0d
> >> > 1 07f1f0ffcf2177 010=0 c16ce6 7x090xc52c !
> >> > 0x9066d21 919 xc1521
> >> > Stopped at      panic+0xfe:     addi    a0,zero,256
> >> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> >> >   24243  43192     55         0x2          0    0  cc
> >> > *480349  52543      0        0x11          0    1  perl
> >> >  480803  72746     55         0x2          0    3  c++
> >> >  366351   3003     55         0x2          0    2K c++
> >> > panic() at panic+0xfa
> >> > panic() at pool_do_get+0x29a
> >> > pool_do_get() at pool_get+0x76
> >> > pool_get() at pmap_enter+0x128
> >> > pmap_enter() at uvm_fault_upper+0x1c2
> >> > uvm_fault_upper() at uvm_fault+0xb2
> >> > uvm_fault() at do_trap_user+0x120
> >> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> >> > reports.  Insufficient info makes it difficult to find and fix bugs.
> >> > ddb{1}> bt
> >> > panic() at panic+0xfa
> >> > panic() at pool_do_get+0x29a
> >> > pool_do_get() at pool_get+0x76
> >> > pool_get() at pmap_enter+0x128
> >> > pmap_enter() at uvm_fault_upper+0x1c2
> >> > uvm_fault_upper() at uvm_fault+0xb2
> >> > uvm_fault() at do_trap_user+0x120
> >> > do_trap_user() at cpu_exception_handler_user+0x7a
> >> > <hangs>
> >> 
> >> Another panic on riscv64-1, a new board which doesn't have RTC/I2C
> >> problems anymore and is acting as a dpb(1) cluster member/NFS client.
> >
> > Why are both traces ending in pool_do_get()?  Are CPU0 and CPU1 there at
> > the same time?
> >
> > This corruption as well as the one above arise in the top part of the
> > fault handler which already runs concurrently.  Did you try putting
> > KERNEL_LOCK/UNLOCK() dances around uvm_fault() in trap.c?  That could
> > help figure out if something is still unsafe in riscv64's pmap.
> 
> On my riscv64 I did add locking around the two uvm_fault() calls as
> suggested, rebooted, then started building libcrypto and libssl and left
> the place.  Sadly the box is now unreachable (panic?) and will stay that
> way for the next few days.  I'll get back to it on Sunday.
> 
> Since I haven't mentioned it in this thread: clang often crashes with
> SIGSEGV when building ports.  For the first two published bulk builds
> I just restarted the failed ports.
> 
> -- 
> jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE
> 
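
For anyone who has not seen the experiment discussed in the quoted text,
the "KERNEL_LOCK/UNLOCK() dance" amounts to serialising every call into
the fault handler behind one big lock.  Below is a rough user-space
sketch of that pattern only, not the actual trap.c change: the pthread
mutex, stub_uvm_fault(), locked_fault() and the two counters are all
made-up stand-ins, and the real uvm_fault() takes more arguments than
the single address used here.

```c
#include <pthread.h>

/* Stand-in for the kernel lock; the real KERNEL_LOCK() is not a pthread mutex. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
#define KERNEL_LOCK()	pthread_mutex_lock(&big_lock)
#define KERNEL_UNLOCK()	pthread_mutex_unlock(&big_lock)

static int in_flight;		/* callers currently inside the stub */
static int max_in_flight;	/* highest value of in_flight seen so far */

/* Hypothetical stub in place of uvm_fault(); it only records concurrency. */
static int
stub_uvm_fault(unsigned long va)
{
	(void)va;
	in_flight++;
	if (in_flight > max_in_flight)
		max_in_flight = in_flight;
	in_flight--;
	return 0;	/* 0 stands for "fault resolved" */
}

/* The "dance": take the lock, handle the fault, drop the lock. */
static int
locked_fault(unsigned long va)
{
	int error;

	KERNEL_LOCK();
	error = stub_uvm_fault(va);
	KERNEL_UNLOCK();
	return error;
}
```

As long as every caller goes through locked_fault(), max_in_flight can
never exceed 1, which is the point of the experiment: if the panics
persist with the fault path serialised like this, concurrency inside
uvm_fault() is unlikely to be the cause.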
