> On Aug 14, 2023, at 5:42 PM, Theo Buehler <t...@theobuehler.org> wrote:
> 
> On Mon, Aug 14, 2023 at 08:47:22PM +0000, Miod Vallat wrote:
>> For what it's worth, I couldn't get your test to fail on a dual-cpu
>> sun4u. Either it's a sun4v-specific issue or it needs many more cpus to
>> trigger.
> 
> I can reproduce the segfault, but seemingly not the killed process on
> a 16-cpu LDOM on a T4-2:
> 
> cpu0 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
> 
> Segmentation fault (core dumped)
> 93
> Segmentation fault (core dumped)
> 1616
> Segmentation fault (core dumped)
> 4185
> 
> etc.
> 
> I don't seem to be able to reproduce on a 4-cpu M3000
> 
> cpu0 at core0: FJSV,SPARC64-VII (rev 10.1) @ 2750 MHz
> cpu0: physical 64K instruction (64 b/l), 64K data (64 b/l), 5120K external (256 b/l)

While chatting with deraadt@ about this he pointed out that my
earlier statement about the stack being clobbered didn't make much
sense. Looking more closely at the core file data, it seems the
main thread's registers are not correct at the point where the
process segfaults.

In the test program each thread has its own mutex and cond_var,
and the main thread should only ever be using one of those
per-thread mutexes and cond_vars (a reduced sketch of that layout
follows the backtrace). The core files consistently show the crash
in the main thread, with a backtrace that looks like this:

Thread 1 (process 557006):
#0  0x0000005e81739078 in _rthread_mutex_timedlock (mutexp=0x5f39af5d98, 
trywait=0, abs=0x0, timed=0) at rthread_mutex.c:163
#1  0x0000005e8176efdc in _rthread_cond_timedwait (cond=<optimized out>, 
mutexp=0x5f39af5d98, abs=0xc) at rthread_cond.c:121
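
For reference, the layout of the test is roughly the following. This
is only a minimal sketch, not the actual test source; the names and
the hand-off between the threads are made up, the point is just the
one-mutex-and-cond_var-per-thread structure described above:

/*
 * Minimal sketch of the test layout: one mutex and one cond var per
 * thread, with the main thread only ever touching the per-thread
 * pairs.  Not the actual test program; the real test's element type
 * is called thread_t and its exact layout is a guess here.
 */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS	40	/* matches the threads[40] array in the core */

struct tthread {
	pthread_t	tid;
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	int		done;
};

static struct tthread threads[NTHREADS];

static void *
worker(void *arg)
{
	struct tthread *t = arg;

	/* ... do some work ..., then tell main we are done */
	pthread_mutex_lock(&t->mtx);
	t->done = 1;
	pthread_cond_signal(&t->cv);
	pthread_mutex_unlock(&t->mtx);
	return NULL;
}

int
main(void)
{
	int i;

	for (i = 0; i < NTHREADS; i++) {
		pthread_mutex_init(&threads[i].mtx, NULL);
		pthread_cond_init(&threads[i].cv, NULL);
		threads[i].done = 0;
		pthread_create(&threads[i].tid, NULL, worker, &threads[i]);
	}

	/* main only ever uses each thread's own mutex/cond_var pair */
	for (i = 0; i < NTHREADS; i++) {
		pthread_mutex_lock(&threads[i].mtx);
		while (!threads[i].done)
			pthread_cond_wait(&threads[i].cv, &threads[i].mtx);
		pthread_mutex_unlock(&threads[i].mtx);
		pthread_join(threads[i].tid, NULL);
	}

	printf("all %d threads done\n", NTHREADS);
	return 0;
}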

However, the mutexp address is not one of the per-thread
mutexes. In fact, the address is not within the threads array at
all:

(gdb) p &threads
$1 = (thread_t (*)[40]) 0x5c61c02058 <threads>
(gdb) p &threads[40]
$2 = (thread_t *) 0x5c61c02698

mutexp is passed in the %i0 register. The fact that it does not
contain a correct value suggests that the registers are not always
correct after transitioning back to userland. Perhaps there is some
sort of coherency issue?
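
For what it's worth, this is roughly the check in the core that
points at %i0 (plain gdb; output abbreviated, and the value shown is
just the bogus mutexp from the trace above):

(gdb) frame 0
#0  0x0000005e81739078 in _rthread_mutex_timedlock (mutexp=0x5f39af5d98, ...
(gdb) p/x $i0
$3 = 0x5f39af5d98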
