Ping.

Please review.

Regards,
Surya

On 06/08/25 6:51 pm, Surya Kumari Jangala wrote:
> The PowerPC ISA has Load-And-Reserve and Store-Conditional instructions
> which can be used to construct a sequence of instructions that appears
> to perform an atomic update operation on an aligned storage location.
> 
> The larx (load-and-reserve) instruction supports an Exclusive Access
> Hint (EH). A value of 0 for this hint indicates that other programs
> might attempt to modify the storage location. A value of 1 indicates
> that other programs will not attempt to modify the location until the
> program that performed the load does a subsequent store. EH = 1 should
> be used when a program is acquiring a lock variable that it will
> release before another program attempts to modify it. When contention
> for a lock is significant, using this hint may reduce the number of
> times the cache block is transferred between processor caches.
> 
> This patch introduces a new built-in function:
>  __atomic_compare_exchange_local()
> 
> It behaves like __atomic_compare_exchange(), but uses an EH value of 1
> in the larx (load-and-reserve) instruction. The new builtin helps
> reduce the cost of lock contention on PowerPC by keeping the lock
> cacheline in the local processor's cache longer, cutting the
> performance penalty of cacheline movement.
> 
> This patch also adds a target hook that specifies whether a target
> supports load instructions with exclusive access hints. On targets
> that do not support such loads, calling the new builtin results in a
> compile-time error.
> 
> The existing infrastructure for __atomic_compare_exchange is reused,
> with some modifications, to accommodate the new builtin. In the expand
> pass, additional parameters are added to functions where necessary to
> indicate whether the builtin being processed is
> __atomic_compare_exchange_local.
> 
> Bootstrapped and regtested on powerpc64le and aarch64. Ok for trunk?
> 
> 2025-08-05  Surya Kumari Jangala  <[email protected]>
> 
> gcc:
>       * builtins.cc (expand_builtin_atomic_compare_exchange): Add a new
>       parameter 'local'. Pass new parameter 'local' to
>       expand_atomic_compare_and_swap().
>       (expand_builtin): Pass parameter 'local' to
>       expand_builtin_atomic_compare_exchange(). Expand call to
>       __atomic_compare_exchange_local().
>       * c-family/c-common.cc (get_atomic_generic_size): Add new case
>       for BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL in switch statement.
>       (resolve_overloaded_atomic_compare_exchange): Issue error message if
>       lock free size not specified for __atomic_compare_exchange_local.
>       (resolve_overloaded_builtin): Check if target supports loads with
>       exclusive access hints. Convert builtin to _N variant.
>       * config/rs6000/rs6000-protos.h (rs6000_expand_atomic_compare_and_swap):
>       Add additional parameter 'local' to the prototype.
>       * config/rs6000/rs6000.cc (rs6000_have_load_with_exclusive_access):
>       New function.
>       (emit_load_locked): Add new parameter. Pass new parameter to generate
>       load-locked instruction.
>       (rs6000_expand_atomic_compare_and_swap): Add new parameter. Call
>       emit_load_locked() with additional parameter value of EH bit.
>       (rs6000_expand_atomic_exchange): Pass EH value 0 to emit_load_locked().
>       (rs6000_expand_atomic_op): Likewise.
>       * config/rs6000/rs6000.h (TARGET_HAVE_LOAD_WITH_EXCLUSIVE_ACCESS):
>       Define.
>       * config/rs6000/sync.md (load_locked<mode>): Add new operand in RTL
>       template. Specify EH bit in the larx instruction.
>       (load_locked<QHI:mode>_si): Likewise.
>       (load_lockedpti): Likewise.
>       (load_lockedti): Add new operand in RTL template. Pass EH bit to
>       gen_load_lockedpti().
>       (atomic_compare_and_swap<mode>): Pass new parameter 'false' to
>       rs6000_expand_atomic_compare_and_swap.
>       (atomic_compare_and_swap_local<mode>): New define_expand.
>       * doc/tm.texi: Regenerate.
>       * doc/tm.texi.in (TARGET_HAVE_LOAD_WITH_EXCLUSIVE_ACCESS): New hook.
>       * optabs.cc (expand_atomic_compare_and_swap): Expand the new builtin.
>       * optabs.def (atomic_compare_and_swap_local_optab): New entry.
>       * optabs.h (expand_atomic_compare_and_swap): Add additional parameter
>       'local' with default value false.
>       * predict.cc (expr_expected_value_1): Set up predictor for the new
>       builtin.
>       * sync-builtins.def (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL): Define
>       new enum.
>       (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_N): Likewise.
>       (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_1): Likewise.
>       (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_2): Likewise.
>       (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_4): Likewise.
>       (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_8): Likewise.
>       (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_16): Likewise.
>       * target.def (have_load_with_exclusive_access): New hook.
> 
> gcc/testsuite:
>       * gcc.target/powerpc/acmp-tst.c: New test.
> ---
>  gcc/builtins.cc                             | 32 +++++++++++---
>  gcc/c-family/c-common.cc                    | 48 ++++++++++++++++++---
>  gcc/config/rs6000/rs6000-protos.h           |  2 +-
>  gcc/config/rs6000/rs6000.cc                 | 30 +++++++++----
>  gcc/config/rs6000/rs6000.h                  |  3 ++
>  gcc/config/rs6000/sync.md                   | 37 ++++++++++++----
>  gcc/doc/tm.texi                             |  8 ++++
>  gcc/doc/tm.texi.in                          |  2 +
>  gcc/optabs.cc                               | 10 ++++-
>  gcc/optabs.def                              |  1 +
>  gcc/optabs.h                                |  2 +-
>  gcc/predict.cc                              |  7 +++
>  gcc/sync-builtins.def                       | 28 ++++++++++++
>  gcc/target.def                              | 11 +++++
>  gcc/testsuite/gcc.target/powerpc/acmp-tst.c | 12 ++++++
>  15 files changed, 201 insertions(+), 32 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/acmp-tst.c
> 
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 7f580a3145f..e44b2f2db9d 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -6691,17 +6691,25 @@ expand_builtin_atomic_exchange (machine_mode mode, tree exp, rtx target)
>    return expand_atomic_exchange (target, mem, val, model);
>  }
>  
> -/* Expand the __atomic_compare_exchange intrinsic:
> +/* Expand the __atomic_compare_exchange and the
> +   __atomic_compare_exchange_local intrinsics:
>       bool __atomic_compare_exchange (TYPE *object, TYPE *expect,
>                                       TYPE desired, BOOL weak,
>                                       enum memmodel success,
>                                       enum memmodel failure)
> +     bool __atomic_compare_exchange_local (TYPE *object, TYPE *expect,
> +                                           TYPE desired, BOOL weak,
> +                                           enum memmodel success,
> +                                           enum memmodel failure)
>     EXP is the CALL_EXPR.
> -   TARGET is an optional place for us to store the results.  */
> +   TARGET is an optional place for us to store the results.
> +   LOCAL indicates which builtin is being expanded. A value of true
> +   means __atomic_compare_exchange_local is being expanded, while a
> +   value of false indicates expansion of __atomic_compare_exchange.  */
>  
>  static rtx
>  expand_builtin_atomic_compare_exchange (machine_mode mode, tree exp,
> -                                     rtx target)
> +                                     rtx target, bool local)
>  {
>    rtx expect, desired, mem, oldval;
>    rtx_code_label *label;
> @@ -6745,7 +6753,7 @@ expand_builtin_atomic_compare_exchange (machine_mode mode, tree exp,
>    oldval = NULL;
>  
>    if (!expand_atomic_compare_and_swap (&target, &oldval, mem, expect, desired,
> -                                    is_weak, success, failure))
> +                                    is_weak, success, failure, local))
>      return NULL_RTX;
>  
>    /* Conditionally store back to EXPECT, lest we create a race condition
> @@ -8711,7 +8719,7 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
>  
>       mode =
>           get_builtin_sync_mode (fcode - BUILT_IN_ATOMIC_COMPARE_EXCHANGE_1);
> -     target = expand_builtin_atomic_compare_exchange (mode, exp, target);
> +     target = expand_builtin_atomic_compare_exchange (mode, exp, target, false);
>       if (target)
>         return target;
>  
> @@ -8728,6 +8736,20 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
>       break;
>        }
>  
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_1:
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_2:
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_4:
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_8:
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_16:
> +      {
> +     mode =
> +         get_builtin_sync_mode (fcode - BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_1);
> +     target = expand_builtin_atomic_compare_exchange (mode, exp, target, true);
> +     if (target)
> +       return target;
> +     break;
> +      }
> +
>      case BUILT_IN_ATOMIC_LOAD_1:
>      case BUILT_IN_ATOMIC_LOAD_2:
>      case BUILT_IN_ATOMIC_LOAD_4:
> diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> index e7dd4602ac1..965b17947d3 100644
> --- a/gcc/c-family/c-common.cc
> +++ b/gcc/c-family/c-common.cc
> @@ -7807,6 +7807,11 @@ get_atomic_generic_size (location_t loc, tree function,
>        n_model = 2;
>        outputs = 3;
>        break;
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL:
> +      n_param = 6;
> +      n_model = 2;
> +      outputs = 3;
> +      break;
>      default:
>        gcc_unreachable ();
>      }
> @@ -8118,13 +8123,22 @@ resolve_overloaded_atomic_exchange (location_t loc, tree function,
>    return false;
>  }
>  
> -/* This will process an __atomic_compare_exchange function call, determine
> -   whether it needs to be mapped to the _N variation, or turned into a lib call.
> +/* This will process __atomic_compare_exchange and __atomic_compare_exchange_local
> +   function calls and determine whether they can be mapped to the _N variation,
> +   or in the case of __atomic_compare_exchange, turned into a lib call.
>     LOC is the location of the builtin call.
>     FUNCTION is the DECL that has been invoked;
> -   PARAMS is the argument list for the call.  The return value is non-null
> +   PARAMS is the argument list for the call.
> +   The return value depends on the builtin:
> +   For __atomic_compare_exchange:
>    TRUE is returned if it is translated into the proper format for a call to the
>     external library, and NEW_RETURN is set the tree for that function.
> +   FALSE is returned if processing for the _N variation is required.
> +   For __atomic_compare_exchange_local:
> +   TRUE is returned if a lock-free size is not specified, and NEW_RETURN is
> +   set to error_mark_node.  Library support is not provided for this builtin
> +   since its intent is to provide exclusive access hints on the machine
> +   instructions implementing it.
>     FALSE is returned if processing for the _N variation is required.  */
>  
>  static bool
> @@ -8146,6 +8160,14 @@ resolve_overloaded_atomic_compare_exchange (location_t loc, tree function,
>    /* If not a lock-free size, change to the library generic format.  */
>    if (!atomic_size_supported_p (n))
>      {
> +      enum built_in_function fn_code = DECL_FUNCTION_CODE (function);
> +      if (fn_code == BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL)
> +     {
> +       error_at (loc, "lock-free size not specified for built-in function %qE", function);
> +       *new_return = error_mark_node;
> +       return true;
> +     }
> +
>        /* The library generic format does not have the weak parameter, so
>        remove it from the param list.  Since a parameter has been removed,
>        we can be sure that there is room for the SIZE_T parameter, meaning
> @@ -8640,10 +8662,11 @@ resolve_overloaded_builtin (location_t loc, tree function,
>  
>      case BUILT_IN_ATOMIC_EXCHANGE:
>      case BUILT_IN_ATOMIC_COMPARE_EXCHANGE:
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL:
>      case BUILT_IN_ATOMIC_LOAD:
>      case BUILT_IN_ATOMIC_STORE:
>        {
> -     /* Handle these 4 together so that they can fall through to the next
> +     /* Handle these 5 together so that they can fall through to the next
>          case if the call is transformed to an _N variant.  */
>          switch (orig_code)
>         {
> @@ -8666,6 +8689,20 @@ resolve_overloaded_builtin (location_t loc, tree function,
>             orig_code = BUILT_IN_ATOMIC_COMPARE_EXCHANGE_N;
>             break;
>           }
> +       case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL:
> +         {
> +           if (!targetm.have_load_with_exclusive_access ())
> +             {
> +               error_at (loc, "unsupported built-in function %qE", function);
> +               return error_mark_node;
> +             }
> +           if (resolve_overloaded_atomic_compare_exchange (
> +                 loc, function, params, &new_return, complain))
> +             return new_return;
> +           /* Change to the _N variant.  */
> +           orig_code = BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_N;
> +           break;
> +         }
>         case BUILT_IN_ATOMIC_LOAD:
>           {
>             if (resolve_overloaded_atomic_load (loc, function, params,
> @@ -8771,7 +8808,8 @@ resolve_overloaded_builtin (location_t loc, tree function,
>       if (orig_code != BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_N
>           && orig_code != BUILT_IN_SYNC_LOCK_RELEASE_N
>           && orig_code != BUILT_IN_ATOMIC_STORE_N
> -         && orig_code != BUILT_IN_ATOMIC_COMPARE_EXCHANGE_N)
> +         && orig_code != BUILT_IN_ATOMIC_COMPARE_EXCHANGE_N
> +         && orig_code != BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_N)
>         result = sync_resolve_return (first_param, result, orig_format);
>  
>       if (fetch_op)
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index 234eb0ae2b3..f4b9b4ee922 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -127,7 +127,7 @@ extern bool rs6000_emit_set_const (rtx, rtx);
>  extern bool rs6000_emit_cmove (rtx, rtx, rtx, rtx);
>  extern bool rs6000_emit_int_cmove (rtx, rtx, rtx, rtx);
>  extern void rs6000_emit_minmax (rtx, enum rtx_code, rtx, rtx);
> -extern void rs6000_expand_atomic_compare_and_swap (rtx op[]);
> +extern void rs6000_expand_atomic_compare_and_swap (rtx op[], bool local);
>  extern rtx swap_endian_selector_for_mode (machine_mode mode);
>  
>  extern void rs6000_expand_atomic_exchange (rtx op[]);
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 764b4992fb5..0f4a5d2c4fb 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -395,6 +395,12 @@ mode_supports_dq_form (machine_mode mode)
>         != 0);
>  }
>  
> +bool
> +rs6000_have_load_with_exclusive_access ()
> +{
> +  return true;
> +}
> +
>  /* Given that there exists at least one variable that is set (produced)
>     by OUT_INSN and read (consumed) by IN_INSN, return true iff
>     IN_INSN represents one or more memory store operations and none of
> @@ -16749,12 +16755,13 @@ emit_unlikely_jump (rtx cond, rtx label)
>  
>  /* A subroutine of the atomic operation splitters.  Emit a load-locked
>     instruction in MODE.  For QI/HImode, possibly use a pattern than includes
> -   the zero_extend operation.  */
> +   the zero_extend operation.  LOCAL indicates the EH bit value for the
> +   load-locked instruction.  */
>  
>  static void
> -emit_load_locked (machine_mode mode, rtx reg, rtx mem)
> +emit_load_locked (machine_mode mode, rtx reg, rtx mem, rtx local)
>  {
> -  rtx (*fn) (rtx, rtx) = NULL;
> +  rtx (*fn) (rtx, rtx, rtx) = NULL;
>  
>    switch (mode)
>      {
> @@ -16781,7 +16788,7 @@ emit_load_locked (machine_mode mode, rtx reg, rtx mem)
>      default:
>        gcc_unreachable ();
>      }
> -  emit_insn (fn (reg, mem));
> +  emit_insn (fn (reg, mem, local));
>  }
>  
>  /* A subroutine of the atomic operation splitters.  Emit a store-conditional
> @@ -16948,10 +16955,12 @@ rs6000_finish_atomic_subword (rtx narrow, rtx wide, rtx shift)
>    emit_move_insn (narrow, gen_lowpart (GET_MODE (narrow), wide));
>  }
>  
> -/* Expand an atomic compare and swap operation.  */
> +/* Expand an atomic compare and swap operation.
> +   If LOCAL is true, the load-locked (larx) instruction should have
> +   an EH value of 1.  */
>  
>  void
> -rs6000_expand_atomic_compare_and_swap (rtx operands[])
> +rs6000_expand_atomic_compare_and_swap (rtx operands[], bool local)
>  {
>    rtx boolval, retval, mem, oldval, newval, cond;
>    rtx label1, label2, x, mask, shift;
> @@ -17014,7 +17023,10 @@ rs6000_expand_atomic_compare_and_swap (rtx operands[])
>      }
>    label2 = gen_rtx_LABEL_REF (VOIDmode, gen_label_rtx ());
>  
> -  emit_load_locked (mode, retval, mem);
> +  if (local)
> +    emit_load_locked (mode, retval, mem, const1_rtx);
> +  else
> +    emit_load_locked (mode, retval, mem, const0_rtx);
>  
>    x = retval;
>    if (mask)
> @@ -17112,7 +17124,7 @@ rs6000_expand_atomic_exchange (rtx operands[])
>    label = gen_rtx_LABEL_REF (VOIDmode, gen_label_rtx ());
>    emit_label (XEXP (label, 0));
>  
> -  emit_load_locked (mode, retval, mem);
> +  emit_load_locked (mode, retval, mem, const0_rtx);
>  
>    x = val;
>    if (mask)
> @@ -17217,7 +17229,7 @@ rs6000_expand_atomic_op (enum rtx_code code, rtx mem, rtx val,
>    if (before == NULL_RTX)
>      before = gen_reg_rtx (mode);
>  
> -  emit_load_locked (mode, before, mem);
> +  emit_load_locked (mode, before, mem, const0_rtx);
>  
>    if (code == NOT)
>      {
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index db6112a09e1..6b86949f963 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -597,6 +597,9 @@ extern unsigned char rs6000_recip_bits[];
>  #define TARGET_CPU_CPP_BUILTINS() \
>    rs6000_cpu_cpp_builtins (pfile)
>  
> +#define TARGET_HAVE_LOAD_WITH_EXCLUSIVE_ACCESS \
> +  rs6000_have_load_with_exclusive_access
> +
>  /* This is used by rs6000_cpu_cpp_builtins to indicate the byte order
>     we're compiling for.  Some configurations may need to override it.  */
>  #define RS6000_CPU_CPP_ENDIAN_BUILTINS()     \
> diff --git a/gcc/config/rs6000/sync.md b/gcc/config/rs6000/sync.md
> index f0ac3348f7b..2be7828d049 100644
> --- a/gcc/config/rs6000/sync.md
> +++ b/gcc/config/rs6000/sync.md
> @@ -278,17 +278,19 @@
>  (define_insn "load_locked<mode>"
>    [(set (match_operand:ATOMIC 0 "int_reg_operand" "=r")
>       (unspec_volatile:ATOMIC
> -         [(match_operand:ATOMIC 1 "memory_operand" "Z")] UNSPECV_LL))]
> +         [(match_operand:ATOMIC 1 "memory_operand" "Z")
> +          (match_operand:QI 2 "u1bit_cint_operand" "n")] UNSPECV_LL))]
>    ""
> -  "<larx> %0,%y1"
> +  "<larx> %0,%y1,%2"
>    [(set_attr "type" "load_l")])
>  
>  (define_insn "load_locked<QHI:mode>_si"
>    [(set (match_operand:SI 0 "int_reg_operand" "=r")
>       (unspec_volatile:SI
> -       [(match_operand:QHI 1 "memory_operand" "Z")] UNSPECV_LL))]
> +       [(match_operand:QHI 1 "memory_operand" "Z")
> +           (match_operand:QI 2 "u1bit_cint_operand" "n")] UNSPECV_LL))]
>    "TARGET_SYNC_HI_QI"
> -  "<QHI:larx> %0,%y1"
> +  "<QHI:larx> %0,%y1,%2"
>    [(set_attr "type" "load_l")])
>  
>  ;; Use PTImode to get even/odd register pairs.
> @@ -302,7 +304,8 @@
>  
>  (define_expand "load_lockedti"
>    [(use (match_operand:TI 0 "quad_int_reg_operand"))
> -   (use (match_operand:TI 1 "memory_operand"))]
> +   (use (match_operand:TI 1 "memory_operand"))
> +   (use (match_operand:QI 2 "u1bit_cint_operand"))]
>    "TARGET_SYNC_TI"
>  {
>    rtx op0 = operands[0];
> @@ -316,7 +319,7 @@
>        operands[1] = op1 = change_address (op1, TImode, new_addr);
>      }
>  
> -  emit_insn (gen_load_lockedpti (pti, op1));
> +  emit_insn (gen_load_lockedpti (pti, op1, operands[2]));
>    if (WORDS_BIG_ENDIAN)
>      emit_move_insn (op0, gen_lowpart (TImode, pti));
>    else
> @@ -330,11 +333,12 @@
>  (define_insn "load_lockedpti"
>    [(set (match_operand:PTI 0 "quad_int_reg_operand" "=&r")
>       (unspec_volatile:PTI
> -         [(match_operand:TI 1 "indexed_or_indirect_operand" "Z")] UNSPECV_LL))]
> +         [(match_operand:TI 1 "indexed_or_indirect_operand" "Z")
> +          (match_operand:QI 2 "u1bit_cint_operand" "n")] UNSPECV_LL))]
>    "TARGET_SYNC_TI
>     && !reg_mentioned_p (operands[0], operands[1])
>     && quad_int_reg_operand (operands[0], PTImode)"
> -  "lqarx %0,%y1"
> +  "lqarx %0,%y1,%2"
>    [(set_attr "type" "load_l")
>     (set_attr "size" "128")])
>  
> @@ -411,7 +415,22 @@
>     (match_operand:SI 7 "const_int_operand")]         ;; model fail
>    ""
>  {
> -  rs6000_expand_atomic_compare_and_swap (operands);
> +  rs6000_expand_atomic_compare_and_swap (operands, false);
> +  DONE;
> +})
> +
> +(define_expand "atomic_compare_and_swap_local<mode>"
> +  [(match_operand:SI 0 "int_reg_operand")            ;; bool out
> +   (match_operand:AINT 1 "int_reg_operand")          ;; val out
> +   (match_operand:AINT 2 "memory_operand")           ;; memory
> +   (match_operand:AINT 3 "reg_or_short_operand")     ;; expected
> +   (match_operand:AINT 4 "int_reg_operand")          ;; desired
> +   (match_operand:SI 5 "const_int_operand")          ;; is_weak
> +   (match_operand:SI 6 "const_int_operand")          ;; model succ
> +   (match_operand:SI 7 "const_int_operand")]         ;; model fail
> +  ""
> +{
> +  rs6000_expand_atomic_compare_and_swap (operands, true);
>    DONE;
>  })
>  
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 4d4e676aadf..05e1c15062b 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -12496,6 +12496,14 @@ enabled, such as in response to command-line flags. The default implementation
>  returns true iff @code{TARGET_GEN_CCMP_FIRST} is defined.
>  @end deftypefn
>  
> +@deftypefn {Target Hook} bool TARGET_HAVE_LOAD_WITH_EXCLUSIVE_ACCESS (void)
> +This target hook returns true if the target supports load instructions
> +with exclusive access hints that optimize how a cache block is transferred
> +between processor caches. Such hints are helpful, for example, to reduce the
> +number of times a cache block is transferred between processor caches when
> +there is significant lock contention.
> +@end deftypefn
> +
>  @deftypefn {Target Hook} unsigned TARGET_LOOP_UNROLL_ADJUST (unsigned @var{nunroll}, class loop *@var{loop})
>  This target hook returns a new value for the number of times @var{loop}
>  should be unrolled. The parameter @var{nunroll} is the number of times
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 1a51ad54817..8b953042335 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -7944,6 +7944,8 @@ lists.
>  
>  @hook TARGET_HAVE_CCMP
>  
> +@hook TARGET_HAVE_LOAD_WITH_EXCLUSIVE_ACCESS
> +
>  @hook TARGET_LOOP_UNROLL_ADJUST
>  
>  @defmac POWI_MAX_MULTS
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index 5c9450f6145..fe6840d21ed 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -7121,6 +7121,8 @@ expand_atomic_exchange (rtx target, rtx mem, rtx val, enum memmodel model)
>     success to the actual location of the corresponding result.
>  
>     MEMMODEL is the memory model variant to use.
> +   A true value for LOCAL indicates expansion of the builtin
> +   __atomic_compare_exchange_local.
>  
>     The return value of the function is true for success.  */
>  
> @@ -7128,7 +7130,7 @@ bool
>  expand_atomic_compare_and_swap (rtx *ptarget_bool, rtx *ptarget_oval,
>                               rtx mem, rtx expected, rtx desired,
>                               bool is_weak, enum memmodel succ_model,
> -                             enum memmodel fail_model)
> +                             enum memmodel fail_model, bool local)
>  {
>    machine_mode mode = GET_MODE (mem);
>    class expand_operand ops[8];
> @@ -7157,7 +7159,11 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool, rtx *ptarget_oval,
>        || reg_overlap_mentioned_p (expected, target_oval))
>      target_oval = gen_reg_rtx (mode);
>  
> -  icode = direct_optab_handler (atomic_compare_and_swap_optab, mode);
> +  if (!local)
> +    icode = direct_optab_handler (atomic_compare_and_swap_optab, mode);
> +  else
> +    icode = direct_optab_handler (atomic_compare_and_swap_local_optab, mode);
> +
>    if (icode != CODE_FOR_nothing)
>      {
>        machine_mode bool_mode = insn_data[icode].operand[0].mode;
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 87a8b85da15..1e730069caf 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -512,6 +512,7 @@ OPTAB_D (atomic_bit_test_and_set_optab, "atomic_bit_test_and_set$I$a")
>  OPTAB_D (atomic_bit_test_and_complement_optab, "atomic_bit_test_and_complement$I$a")
>  OPTAB_D (atomic_bit_test_and_reset_optab, "atomic_bit_test_and_reset$I$a")
>  OPTAB_D (atomic_compare_and_swap_optab, "atomic_compare_and_swap$I$a")
> +OPTAB_D (atomic_compare_and_swap_local_optab, "atomic_compare_and_swap_local$I$a")
>  OPTAB_D (atomic_exchange_optab,       "atomic_exchange$I$a")
>  OPTAB_D (atomic_fetch_add_optab, "atomic_fetch_add$I$a")
>  OPTAB_D (atomic_fetch_and_optab, "atomic_fetch_and$I$a")
> diff --git a/gcc/optabs.h b/gcc/optabs.h
> index a8b0e93d60b..6f7e0f5a027 100644
> --- a/gcc/optabs.h
> +++ b/gcc/optabs.h
> @@ -356,7 +356,7 @@ extern rtx expand_sync_lock_test_and_set (rtx, rtx, rtx);
>  extern rtx expand_atomic_test_and_set (rtx, rtx, enum memmodel);
>  extern rtx expand_atomic_exchange (rtx, rtx, rtx, enum memmodel);
>  extern bool expand_atomic_compare_and_swap (rtx *, rtx *, rtx, rtx, rtx, bool,
> -                                         enum memmodel, enum memmodel);
> +                                         enum memmodel, enum memmodel, bool local = false);
>  /* Generate memory barriers.  */
>  extern void expand_mem_thread_fence (enum memmodel);
>  extern void expand_mem_signal_fence (enum memmodel);
> diff --git a/gcc/predict.cc b/gcc/predict.cc
> index 5639d81d277..1006bdf3d3c 100644
> --- a/gcc/predict.cc
> +++ b/gcc/predict.cc
> @@ -2672,6 +2672,13 @@ expr_expected_value_1 (tree type, tree op0, enum tree_code code,
>             case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_4:
>             case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_8:
>             case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_16:
> +           case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL:
> +           case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_N:
> +           case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_1:
> +           case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_2:
> +           case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_4:
> +           case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_8:
> +           case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_16:
>               /* Assume that any given atomic operation has low contention,
>                  and thus the compare-and-swap operation succeeds.  */
>               *predictor = PRED_COMPARE_AND_SWAP;
> diff --git a/gcc/sync-builtins.def b/gcc/sync-builtins.def
> index 0f058187a20..ad1dd5e2d1f 100644
> --- a/gcc/sync-builtins.def
> +++ b/gcc/sync-builtins.def
> @@ -338,6 +338,34 @@ DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_16,
>                 BT_FN_BOOL_VPTR_PTR_I16_BOOL_INT_INT,
>                 ATTR_NOTHROWCALL_LEAF_LIST)
>  
> +DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL,
> +               "__atomic_compare_exchange_local",
> +               BT_FN_BOOL_SIZE_VPTR_PTR_PTR_INT_INT,
> +               ATTR_NOTHROWCALL_LEAF_LIST)
> +DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_N,
> +               "__atomic_compare_exchange_local_n",
> +               BT_FN_VOID_VAR, ATTR_NOTHROWCALL_LEAF_LIST)
> +DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_1,
> +               "__atomic_compare_exchange_local_1",
> +               BT_FN_BOOL_VPTR_PTR_I1_BOOL_INT_INT,
> +               ATTR_NOTHROWCALL_LEAF_LIST)
> +DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_2,
> +               "__atomic_compare_exchange_local_2",
> +               BT_FN_BOOL_VPTR_PTR_I2_BOOL_INT_INT,
> +               ATTR_NOTHROWCALL_LEAF_LIST)
> +DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_4,
> +               "__atomic_compare_exchange_local_4",
> +               BT_FN_BOOL_VPTR_PTR_I4_BOOL_INT_INT,
> +               ATTR_NOTHROWCALL_LEAF_LIST)
> +DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_8,
> +               "__atomic_compare_exchange_local_8",
> +               BT_FN_BOOL_VPTR_PTR_I8_BOOL_INT_INT,
> +               ATTR_NOTHROWCALL_LEAF_LIST)
> +DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_16,
> +               "__atomic_compare_exchange_local_16",
> +               BT_FN_BOOL_VPTR_PTR_I16_BOOL_INT_INT,
> +               ATTR_NOTHROWCALL_LEAF_LIST)
> +
>  DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_STORE,
>                 "__atomic_store",
>                 BT_FN_VOID_SIZE_VPTR_PTR_INT, ATTR_NOTHROWCALL_LEAF_LIST)
> diff --git a/gcc/target.def b/gcc/target.def
> index 5dd8f253ef6..fad306b0199 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -2828,6 +2828,17 @@ returns true iff @code{TARGET_GEN_CCMP_FIRST} is defined.",
>   bool, (void),
>   default_have_ccmp)
>  
> +/* Return true if the target supports load instructions with exclusive access.  */
> +DEFHOOK
> +(have_load_with_exclusive_access,
> + "This target hook returns true if the target supports load instructions\n\
> +with exclusive access hints that optimize how a cache block is transferred\n\
> +between processor caches. Such hints are helpful, for example, to reduce the\n\
> +number of times a cache block is transferred between processor caches when\n\
> +there is significant lock contention.",
> + bool, (void),
> + hook_bool_void_false)
> +
>  /* Return a new value for loop unroll size.  */
>  DEFHOOK
>  (loop_unroll_adjust,
> diff --git a/gcc/testsuite/gcc.target/powerpc/acmp-tst.c b/gcc/testsuite/gcc.target/powerpc/acmp-tst.c
> new file mode 100644
> index 00000000000..a4b5861216b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/acmp-tst.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O2" } */
> +
> +#include <stdint.h>
> +
> +bool
> +word_exchange (uint64_t *ptr, uint64_t *expected, uint64_t *desired)
> +{
> +  return __atomic_compare_exchange_local (ptr, expected, desired, 0, __ATOMIC_SEQ_CST, __ATOMIC_ACQUIRE);
> +}
> +
> +/* { dg-final { scan-assembler {\mldarx +[0-9]+,[0-9]+,[0-9]+,1} } } */
