On Sat, Sep 02, 2023 at 11:52:28AM +0100, Martin Pieuchot wrote:
> On 13/08/23(Sun) 22:59, Kurt Miller wrote:
> > I’ve been hunting an intermittent jdk crash on sparc64 for some time now.
> > Since egdb has not been up to the task, I created a small C program which
> > reproduces the problem. This partially mimics the jdk startup where a number
> > of detached threads are created. When each thread is created the main thread
> > waits for it to start and change state. In my test program I then have the
> > detached thread wait for a condition that will not happen (parked waiting
> > on a condition var).
> >
> > When the intermittent crash occurs, one of two things happens: a segfault, or
> > the process has been killed by the kernel. The segfault cores are similar to
> > what I see with the jdk crashes. It looks like the stack of the thread
> > creating the threads is corrupted. In this case it is the primordial
> > thread. In the jdk it is a different thread, but it's the thread that
> > called pthread_create that has its stack wiped out.
>
> I have seen similar symptoms on x86 with go & rust when unlocking the
> fault handler. I wonder if grabbing the KERNEL_LOCK() around uvm_fault()
> in sparc64/trap.c makes the problem disappear...

It does not. I ran the test program with the diff below and I still see
both symptoms of this instability.

Index: sys/arch/sparc64/sparc64/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/sparc64/sparc64/trap.c,v
retrieving revision 1.115
diff -u -p -r1.115 trap.c
--- sys/arch/sparc64/sparc64/trap.c 11 Feb 2023 23:07:28 -0000 1.115
+++ sys/arch/sparc64/sparc64/trap.c 2 Sep 2023 12:16:09 -0000
@@ -957,7 +957,9 @@ text_access_fault(struct trapframe *tf,
 	    uvm_map_inentry_sp, p->p_vmspace->vm_map.sserial))
 		goto out;
 
+	KERNEL_LOCK();
 	error = uvm_fault(&p->p_vmspace->vm_map, va, 0, access_type);
+	KERNEL_UNLOCK();
 
 	/*
 	 * If this was a stack access we keep track of the maximum
@@ -1051,7 +1053,9 @@ text_access_error(struct trapframe *tf,
 	    uvm_map_inentry_sp, p->p_vmspace->vm_map.sserial))
 		goto out;
 
+	KERNEL_LOCK();
 	error = uvm_fault(&p->p_vmspace->vm_map, va, 0, access_type);
+	KERNEL_UNLOCK();
 
 	/*
 	 * If this was a stack access we keep track of the maximum
@@ -1261,7 +1265,9 @@ copyinsn(struct proc *p, vaddr_t uva, in
 	do {
 		if (pmap_copyinsn(map->pmap, uva, (uint32_t *)insn) == 0)
 			break;
+		KERNEL_LOCK();
 		error = uvm_fault(map, trunc_page(uva), 0, PROT_EXEC);
+		KERNEL_UNLOCK();
 	} while (error == 0);
 
 	return error;
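
For anyone following along without the reproducer at hand, the test
program described at the top of the thread has roughly the shape below.
This is only a minimal sketch reconstructed from that description, not
the actual test program: spawn_parked_threads(), thread_main() and all
of the variable names are made up for illustration, and the real
program may well differ in thread count, attributes and handshake
details.

```c
#include <pthread.h>
#include <stdbool.h>

/* Handshake: the creator waits here until each new thread has started. */
static pthread_mutex_t	start_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t	start_cv = PTHREAD_COND_INITIALIZER;
static int		started_count;

/* Condition that never becomes true; every detached thread parks on it. */
static pthread_mutex_t	park_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t	park_cv = PTHREAD_COND_INITIALIZER;
static bool		never_true = false;

static void *
thread_main(void *arg)
{
	(void)arg;

	/* Tell the creating thread that we are running. */
	pthread_mutex_lock(&start_mtx);
	started_count++;
	pthread_cond_signal(&start_cv);
	pthread_mutex_unlock(&start_mtx);

	/* Park forever on a condition that is never signalled. */
	pthread_mutex_lock(&park_mtx);
	while (!never_true)
		pthread_cond_wait(&park_cv, &park_mtx);
	pthread_mutex_unlock(&park_mtx);
	return NULL;
}

/*
 * Create n detached threads one at a time, waiting for each to check in
 * before creating the next.  Returns the number of threads started, or
 * -1 if the thread attributes could not be set up.
 */
int
spawn_parked_threads(int n)
{
	pthread_attr_t attr;
	pthread_t tid;
	int base, i;

	if (pthread_attr_init(&attr) != 0)
		return -1;
	if (pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0) {
		pthread_attr_destroy(&attr);
		return -1;
	}

	pthread_mutex_lock(&start_mtx);
	base = started_count;
	pthread_mutex_unlock(&start_mtx);

	for (i = 0; i < n; i++) {
		if (pthread_create(&tid, &attr, thread_main, NULL) != 0)
			break;
		/* Wait for thread i to start and change state. */
		pthread_mutex_lock(&start_mtx);
		while (started_count < base + i + 1)
			pthread_cond_wait(&start_cv, &start_mtx);
		pthread_mutex_unlock(&start_mtx);
	}

	pthread_attr_destroy(&attr);
	return i;
}
```

Calling spawn_parked_threads() from main() and then returning is enough
to exercise the create/handshake path; the parked threads go away when
the process exits. The thread that calls pthread_create() here is the
one whose stack would be expected to show the corruption.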