On Mon, 17 Jul 2023 at 19:25, Richard Henderson
<richard.hender...@linaro.org> wrote:
>
> On 7/17/23 11:12, Peter Maydell wrote:
> > On Sun, 16 Jul 2023 at 18:03, Richard Henderson
> > <richard.hender...@linaro.org> wrote:
> >>
> >> For user-only, the probe for page writability may race with another
> >> thread's mprotect.  Take the mmap_lock around the operation.  This
> >> is still faster than the start/end_exclusive fallback.
> >>
> >> Remove the write probe in load_atomic8_or_exit.  There we don't have
> >> the same machinery for testing the existance of an 8-byte cmpxchg.
> >
> > "existence"
> >
> >>
> >> Signed-off-by: Richard Henderson <richard.hender...@linaro.org>
> >> ---
> >>   accel/tcg/ldst_atomicity.c.inc | 54 +++++++++++++++-------------------
> >>   1 file changed, 24 insertions(+), 30 deletions(-)
> >>
> >> diff --git a/accel/tcg/ldst_atomicity.c.inc 
> >> b/accel/tcg/ldst_atomicity.c.inc
> >> index 4de0a80492..e7170f8ba2 100644
> >> --- a/accel/tcg/ldst_atomicity.c.inc
> >> +++ b/accel/tcg/ldst_atomicity.c.inc
> >> @@ -152,19 +152,6 @@ static uint64_t load_atomic8_or_exit(CPUArchState 
> >> *env, uintptr_t ra, void *pv)
> >>           return load_atomic8(pv);
> >>       }
> >>
> >> -#ifdef CONFIG_USER_ONLY
> >> -    /*
> >> -     * If the page is not writable, then assume the value is immutable
> >> -     * and requires no locking.  This ignores the case of MAP_SHARED with
> >> -     * another process, because the fallback start_exclusive solution
> >> -     * provides no protection across processes.
> >> -     */
> >> -    if (page_check_range(h2g(pv), 8, PAGE_WRITE_ORG)) {
> >> -        uint64_t *p = __builtin_assume_aligned(pv, 8);
> >> -        return *p;
> >> -    }
> >> -#endif
> >
> > I don't really understand the comment in the commit message:
> > why would it be wrong to wrap this "test writeability and
> > do the operation" in the mmap-lock, the same way we do for the
> > 16-byte case?
>
> It would not be wrong.  I was just thinking of the cmpxchg8 part, for which 
> we do not have
> a configure probe, and for which I *think* there's no call, because there are 
> no 32-bit
> hosts that have cmpxchg8 but not the full CONFIG_ATOMIC64.

But this piece of code the patch deletes isn't doing a
cmpxchg8, it just does a plain load. If "take the lock
and do the operation" is faster than always using the
fallback code for the atomic16 case, why don't we make
the same tradeoff choice in atomic8  ?

thanks
-- PMM

Reply via email to