On Mon, Nov 01 2021, Jeremie Courreges-Anglas <j...@wxcvbn.org> wrote:
> On Mon, Nov 01 2021, Martin Pieuchot <m...@openbsd.org> wrote:
>> On 31/10/21(Sun) 15:57, Jeremie Courreges-Anglas wrote:
>>> On Fri, Oct 08 2021, Jeremie Courreges-Anglas <j...@wxcvbn.org> wrote:
>>> > riscv64.ports was running dpb(1) with two other members in the build
>>> > cluster.  A few minutes ago I found it in ddb(4).  The report is short,
>>> > sadly, as the machine doesn't return from the 'bt' command.
>>> >
>>> > The machine is acting both as an NFS server and an NFS client.
>>> >
>>> > OpenBSD/riscv64 (riscv64.ports.openbsd.org) (console)
>>> >
>>> > login: panic: pool_anic:t: pol_ free l: p mod fiee liat m  oxifief:c a2e 
>>> > 07ff0ff fte21ade0 00f ifem c0d
>>> > 1 07f1f0ffcf2177 010=0 c16ce6 7x090xc52c !
>>> > 0x9066d21 919 xc1521
>>> > Stopped at      panic+0xfe:     addi    a0,zero,256
>>> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
>>> >   24243  43192     55         0x2          0    0  cc
>>> > *480349  52543      0        0x11          0    1  perl
>>> >  480803  72746     55         0x2          0    3  c++
>>> >  366351   3003     55         0x2          0    2K c++
>>> > panic() at panic+0xfa
>>> > panic() at pool_do_get+0x29a
>>> > pool_do_get() at pool_get+0x76
>>> > pool_get() at pmap_enter+0x128
>>> > pmap_enter() at uvm_fault_upper+0x1c2
>>> > uvm_fault_upper() at uvm_fault+0xb2
>>> > uvm_fault() at do_trap_user+0x120
>>> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
>>> > reports.  Insufficient info makes it difficult to find and fix bugs.
>>> > ddb{1}> bt
>>> > panic() at panic+0xfa
>>> > panic() at pool_do_get+0x29a
>>> > pool_do_get() at pool_get+0x76
>>> > pool_get() at pmap_enter+0x128
>>> > pmap_enter() at uvm_fault_upper+0x1c2
>>> > uvm_fault_upper() at uvm_fault+0xb2
>>> > uvm_fault() at do_trap_user+0x120
>>> > do_trap_user() at cpu_exception_handler_user+0x7a
>>> > <hangs>
>>> 
>>> Another panic on riscv64-1, a new board which doesn't have RTC/I2C
>>> problems anymore and is acting as a dpb(1) cluster member/NFS client.
>>
>> Why are both traces ending in pool_do_get()?  Are CPU0 and CPU1 there at
>> the same time?
>>
>> This corruption as well as the one above arise in the top part of the
>> fault handler which already runs concurrently.  Did you try putting
>> KERNEL_LOCK/UNLOCK() dances around uvm_fault() in trap.c?  That could
>> help figure out if something is still unsafe in riscv64's pmap.
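
For readers following along, here is a rough userland model of that
experiment.  It is only a sketch, not the diff that was tested: the
real change would use the kernel's KERNEL_LOCK()/KERNEL_UNLOCK() around
the uvm_fault() calls in riscv64's trap.c.  Everything below
(big_lock, uvm_fault_stub(), handle_fault()) is a hypothetical stand-in
so that the example compiles on its own.

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-ins for the kernel lock.  In the kernel these
 * would be the real KERNEL_LOCK()/KERNEL_UNLOCK() macros; here a
 * simple spin lock models "only one CPU at a time".
 */
static atomic_flag big_lock = ATOMIC_FLAG_INIT;
static int big_lock_held;	/* only written while big_lock is held */

static void
kernel_lock_stub(void)
{
	while (atomic_flag_test_and_set_explicit(&big_lock,
	    memory_order_acquire))
		;	/* spin until the lock is free */
	big_lock_held = 1;
}

static void
kernel_unlock_stub(void)
{
	big_lock_held = 0;
	atomic_flag_clear_explicit(&big_lock, memory_order_release);
}

#define KERNEL_LOCK()	kernel_lock_stub()
#define KERNEL_UNLOCK()	kernel_unlock_stub()

/* Counts how many faults were handled with the lock held. */
static int faults_under_lock;

/*
 * Hypothetical stub for uvm_fault(); the real function takes a map,
 * an address, a fault type and a protection.
 */
static int
uvm_fault_stub(unsigned long va)
{
	(void)va;
	if (big_lock_held)
		faults_under_lock++;
	return 0;
}

/*
 * Model of the fault path with the lock "dance" added: the whole
 * fault handler, including pmap_enter(), now runs serialized.
 */
static int
handle_fault(unsigned long va)
{
	int error;

	KERNEL_LOCK();
	error = uvm_fault_stub(va);
	KERNEL_UNLOCK();
	return error;
}
```

If the pool corruption stops showing up with the fault path serialized
like this, that would point at something in the pmap that is not yet
safe to run concurrently.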

I'll try that on the ports bulk build machines.  After all, that's where
I hit most/all the panics and clang crashes.

> On my riscv64 I did add locking around the two uvm_fault() calls as
> suggested, rebooted, then started building libcrypto and libssl and left
> the place.  Sadly the box is now unreachable (panic?) and will stay as
> is for the next few days.  I'll get back to it on Sunday.

That was a bit premature: I finally managed to remotely connect to the
machine.  No idea why I couldn't connect to it for so long.  Either
a problem with the provider/router, or something wrong regarding
riscv64, slaacd and the router?

  slaacd[28738]: sendmsg: Can't assign requested address

seems to happen at each reboot.  Will have to investigate.

-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE
