On Tue, Aug 16, 2016 at 12:16:26 +0100, Alex Bennée wrote:
> Emilio G. Cota <c...@braap.org> writes:
> > However, I'm finding issues that might not have to do with the
> > patch itself.

I had some time today to dig deeper -- turns out the issues *have*
to do with my patch, see below. (And sorry for hijacking this thread.)

> >   - Applying the "remove tb_lock around hot path" patch makes it
> >     easier to trigger this assert in cpu-exec.c:650 (approx.):
> >             /* Assert that the compiler does not smash local variables. */
> >             g_assert(cpu == current_cpu)
> >     I've also seen triggered the assert immediately after that one, as well
> >     as the rcu_read_unlock depth assert.
> 
> Odd - these are remnants of a dodgy compiler.

The problem is that by calling cpu_exec_step() in a loop, we don't
know what instructions we might execute. Thus, when one of those instructions
(sandwiched between an ldrex and strex) raises an exception (e.g. SVC in A64),
we take a longjmp that lands in cpu_exec's loop, which we did *not*
enter through. That explains those odd asserts being triggered.

The reason this is only triggered when pthreads are joined is that
the code there is particularly tricky, with branches and SVCs between
ldrex/strex pairs.

The good news is that this still lets me benchmark the TSX code against
cmpxchg (I just print out the results before joining); for 4 cores
(8 HW threads), qht-bench performs just as well with TSX as with cmpxchg
(but with TSX we get full correctness). For 1 thread, atomic_add is faster
with cmpxchg, but the gap narrows considerably as contention increases.
The gap comes from the fixed cost of calling _xbegin/_xend, which adds
quite a few more instructions than just emitting an atomic.

                Emilio
