On Tue, 24 Sep 2019 11:29:25 PDT (-0700), alistai...@gmail.com wrote:
On Mon, Jun 24, 2019 at 11:21 AM Joel Sing <j...@sing.id.au> wrote:

On 19-06-17 16:52:44, Richard Henderson wrote:
> On 6/16/19 12:19 PM, Joel Sing wrote:
> > +    /*
> > +     * Clear the load reservation, since an SC must fail if there is
> > +     * an SC to any address, in between an LR and SC pair.
> > +     */
> > +    tcg_gen_movi_tl(load_res, 0);
> > +
> >      gen_set_label(l2);
>
> This clear needs to be moved down below label l2.
> Otherwise, with lr / sc / sc, the second sc could succeed in error.

Indeed, thanks.

> FWIW, other targets have used -1 as the "invalid" load reservation, since the
> architecture does not require address 0 to be unmapped.  This should be quite
> visible in M-mode with paging disabled and ram at offset 0.  Often, other
> targets require alignment for the lr/sc address, though I don't see that for 
riscv.

I've switched to -1 as suggested. Regarding the alignment for reservations, the
specification does require this, although I do not recall seeing any enforcement
of this by qemu itself.

New diff follows.

From 8ef31a2ce8ef1cbeee92995a0b2994f480e9bb6d Mon Sep 17 00:00:00 2001
From: Joel Sing <j...@sing.id.au>
Date: Tue, 25 Jun 2019 02:44:24 +1000
Subject: [PATCH] Clear load reservations on qemu riscv target

This prevents a load reservation from being placed in one context/process,
then being used in another, resulting in an SC succeeding incorrectly and
breaking atomics.

Signed-off-by: Joel Sing <j...@sing.id.au>
---
 target/riscv/cpu.c                      | 1 +
 target/riscv/cpu_helper.c               | 9 +++++++++
 target/riscv/insn_trans/trans_rva.inc.c | 8 +++++++-
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index d61bce6d55..e7c8bf48fc 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -281,6 +281,7 @@ static void riscv_cpu_reset(CPUState *cs)
     env->pc = env->resetvec;
 #endif
     cs->exception_index = EXCP_NONE;
+    env->load_res = -1;
     set_default_nan_mode(1, &env->fp_status);
 }

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index b17f169681..6a07b12e65 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -113,6 +113,15 @@ void riscv_cpu_set_mode(CPURISCVState *env, target_ulong 
newpriv)
     }
     /* tlb_flush is unnecessary as mode is contained in mmu_idx */
     env->priv = newpriv;
+
+    /* Clear the load reservation - otherwise a reservation placed in one
+     * context/process can be used by another, resulting in an SC succeeding
+     * incorrectly. Version 2.2 of the ISA specification explicitly requires
+     * this behaviour, while later revisions say that the kernel "should" use
+     * an SC instruction to force the yielding of a load reservation on a
+     * preemptive context switch. As a result, do both.
+     */
+    env->load_res = -1;
 }

 /* get_physical_address - get the physical address for this virtual address
diff --git a/target/riscv/insn_trans/trans_rva.inc.c 
b/target/riscv/insn_trans/trans_rva.inc.c
index f6dbbc065e..fadd88849e 100644
--- a/target/riscv/insn_trans/trans_rva.inc.c
+++ b/target/riscv/insn_trans/trans_rva.inc.c
@@ -61,7 +61,7 @@ static inline bool gen_sc(DisasContext *ctx, arg_atomic *a, 
TCGMemOp mop)

     gen_set_label(l1);
     /*
-     * Address comparion failure.  However, we still need to
+     * Address comparison failure.  However, we still need to
      * provide the memory barrier implied by AQ/RL.
      */
     tcg_gen_mb(TCG_MO_ALL + a->aq * TCG_BAR_LDAQ + a->rl * TCG_BAR_STRL);
@@ -69,6 +69,12 @@ static inline bool gen_sc(DisasContext *ctx, arg_atomic *a, 
TCGMemOp mop)
     gen_set_gpr(a->rd, dat);

     gen_set_label(l2);
+    /*
+     * Clear the load reservation, since an SC must fail if there is
+     * an SC to any address, in between an LR and SC pair.
+     */
+    tcg_gen_movi_tl(load_res, -1);
+
     tcg_temp_free(dat);
     tcg_temp_free(src1);
     tcg_temp_free(src2);
--

This patch causes boot failures when booting systemd built with musl on RV64.

It could possibly be a musl bug, but I wanted to throw that out here
first to see what people think.

Looking at the musl port, I see at least one bug in their atomics jumping out at me:

diff --git a/arch/riscv64/atomic_arch.h b/arch/riscv64/atomic_arch.h
index c9765342..41ad4d04 100644
--- a/arch/riscv64/atomic_arch.h
+++ b/arch/riscv64/atomic_arch.h
@@ -14,7 +14,7 @@ static inline int a_cas(volatile int *p, int t, int s)
               "       sc.w.aqrl %1, %4, (%2)\n"
               "       bnez %1, 1b\n"
               "1:"
-               : "=&r"(old), "=r"(tmp)
+               : "=&r"(old), "=&r"(tmp)
               : "r"(p), "r"(t), "r"(s)
               : "memory");
       return old;
@@ -31,7 +31,7 @@ static inline void *a_cas_p(volatile void *p, void *t, void 
*s)
               "       sc.d.aqrl %1, %4, (%2)\n"
               "       bnez %1, 1b\n"
               "1:"
-               : "=&r"(old), "=r"(tmp)
+               : "=&r"(old), "=&r"(tmp)
               : "r"(p), "r"(t), "r"(s)
               : "memory");
       return old;

It's a shot in the dark as to whether that'll fix your bug, but I could at least see a mechanism for it: before we yielded load reservations on context switches then that backwards branch would never be taken, so we wouldn't notice if tmp was allocated into one of the same registers as the inputs. Even if it doesn't fix your issue it's still a bug so I'll send the patch out, just LMK so I can indicate how important the issue is.

This should manifest on hardware, but it looks like we managed to drop that SC patch. I'll go send the patch out now...


Alistair

2.21.0



Reply via email to