On Mon, Jul 11, 2022 at 01:05:19PM +0200, Martin Pieuchot wrote:
> On 11/07/22(Mon) 07:50, Theo Buehler wrote:
> > On Fri, Jun 03, 2022 at 03:02:36PM +0200, Theo Buehler wrote:
> > > > Please do note that this change can introduce/expose other issues.
> > > 
> > > It seems that this diff causes occasional hangs when building snapshots
> > > on my mac M1 mini. This happened twice in 10 builds, both times in
> > > xenocara. Unfortunately, both times the machine became entirely
> > > unresponsive and as I don't have serial console, that's all the info I
> > > have...
> > > 
> > > This machine has been very reliable and built >50 snaps without any hang
> > > over the last 2.5 months. I'm now trying snap builds in a loop without
> > > the diff to make sure the machine doesn't hang due to another recent
> > > kernel change.
> > > 
> > 
> > A little bit of info on this. The first three lines were a bit garbled on
> > the screen:
> > 
> > panic: kernel diagnostic assertion "uvn->_oppa jai c:              ke r  el 
> >   d iag no   tic a  s   rt n "   map   ==    UL L  | | rw wr                
> >      k
> > ite held(amap->amap_lock)" failed: file "/ss/uvm/uvm_fault.c", line 846.
> > ernel diagnostic assertion "!_kernel_lock_held
> > Stopped at panic+0160:  cmp w21, #0x0  ailed: file "/sys/kern/
> >     TID    PID     UID     PRFLAGS     PFLAGS   CPU  COMMAND
> >  411910  44540      21    0x100001          0     3  make
> > *436444  84241      21    0x100003          0     6  sh
> >  227952  53498      21    0x100003          0     5  sh
> >  258925  15765      21    0x101005          0     0  make
> >  128459   9649      21    0x100003          0     1  tradcpp
> >  287213  64216      21    0x100003        0x8     7  make
> >  173587   4617    1000    0x100000          0     2  tmux
> >  126511  69919       0     0x14000      0x200     4  softnet
> > db_enter() at panic+0x15c
> > panic() at __assert+0x24
> > uvm_fault() at uvm_fault_upper_lookup+0x258
> > uvm_fault_upper() at uvm_fault+0xec
> > uvm_fault() at udata_abort+0x128
> > udata_abort() at do_el0_sync+0xdc
> > do_el0_sync() at handle_el0_sync+0x74
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports.  Insufficient info makes it difficult to find and fix bugs.
> > ddb{6}> show panic
> > *cpu0: kernel diagnostic assertion "uvn->u_obj.uo_refs == 0" failed: file
> > "/sys/uvm/uvm_vnode.c", line 231.
> >  cpu6: kernel diagnostic assertion "amap == NULL ||
> > rw_write_held(amap->am_lock)" failed: file "/sys/uvm/uvm_fault.c", line 846.
> >  cpu3: kernel diagnostic assertion "!_kernel_lock_held()" failed: file 
> > "/sys/kern/kern_fork.c", line 678
> > ddb{6}> mach ddbcpu 0
> > 
> > After pressing enter here, the machine locked up completely.
> 
> It's hard for me to tell what's going on.  I believe the interesting
> trace is the one from cpu0 that we don't have.  Can you easily reproduce
> this?  I'm trying on amd64 without luck.  I'd be glad if you could gather
> more info.

Sorry for the delay. I was only at home intermittently. I hit this three
times:

panic: kernel diagnostic assertion "uvn->u_obj.uo_refs == 0" failed: file 
"/sys/uvm/uvm_vnode.c", line 231
panic+0x160:    cmp     w21, #0x0
      TID     PID        UID    PRFLAGS         PFLAGS  CPU     COMMAND
    66455   64425         21    0x100001             0    7     make
  *501659   83050         21    0x101005             0    2K    make
   226254   83437         21    0x100001             0    0     sh
   325842   29705         21    0x100003             0    5     gzip
   450503   79436         21    0x100003             0    1     bdftopcf
   223429   90969         21    0x100003             0    3     make
    25518   23526       1000    0x100003             0    6     tee
   482494   33196          0     0x14000         0x200    4     reaper
db_enter() at panic+0x15c
panic() at __assert+0x24
panic() at uvn_attach+0x2ac
uvm_vnp_terminate() at vmcmd_map_pagedvn+0x58
vmcmd_map_pagedvn() at exec_process_vmcmds+0x80
exec_process_vmcmds() at sys_execve+0x5ac
sys_execve() at svc_handler+0x2bc
ddb{2}> bt
db_enter() at panic+0x15c
panic() at __assert+0x24
panic() at uvn_attach+0x2ac
uvm_vnp_terminate() at vmcmd_map_pagedvn+0x58
vmcmd_map_pagedvn() at exec_process_vmcmds+0x80
exec_process_vmcmds() at sys_execve+0x5ac
sys_execve() at svc_handler+0x2bc
svc_handler() at do_el0_sync+0xa0
do_el0_sync() at handle_el0_sync+0x74
address 0x7ffffea988 is invalid
--- trap ---
ddb{2}>

Apart from uvm_vnp_terminate(), the trace seems straightforward:

sys_execve()
 exec_process_vmcmds()
  vmcmd_map_pagedvn()
   uvn_attach()
    KASSERT(uvn->u_obj.uo_refs == 0)

I do not see where uvm_vnp_terminate() comes from, yet it showed up in
all the traces I saw. vmcmd_map_pagedvn+0x58 is the call to uvn_attach()
in exec_subr.c:190.
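
One way to cross-check that reading of the trace is to look only at the
symbol after "at" on each line and ignore the label on the left. The
following is a minimal sketch, assuming every line has the shape
"<label>() at <symbol>+0x<offset>"; call_chain() is a hypothetical helper
written for this mail, not part of ddb.

```python
import re

# Backtrace lines copied from the panic report above.
TRACE = """\
db_enter() at panic+0x15c
panic() at __assert+0x24
panic() at uvn_attach+0x2ac
uvm_vnp_terminate() at vmcmd_map_pagedvn+0x58
vmcmd_map_pagedvn() at exec_process_vmcmds+0x80
exec_process_vmcmds() at sys_execve+0x5ac
sys_execve() at svc_handler+0x2bc
"""

# Assumed line shape: "<label>() at <symbol>+0x<offset>".
LINE_RE = re.compile(r"^(\w+)\(\) at (\w+)\+0x([0-9a-f]+)$")


def call_chain(trace):
    """Return the symbols from the "at" column, outermost caller first."""
    symbols = []
    for line in trace.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            symbols.append(m.group(2))
    symbols.reverse()  # ddb prints the innermost frame first
    return symbols


chain = call_chain(TRACE)
for depth, name in enumerate(chain):
    # Indent one space per level, like the hand-written call chain above.
    print(" " * depth + name + "()")
```

Read this way, the "at" column alone gives the chain from sys_execve()
down to uvn_attach(), and uvm_vnp_terminate() only ever appears as a
left-hand label.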
