Ping. Please review.
Regards,
Surya

On 06/08/25 6:51 pm, Surya Kumari Jangala wrote:
> The PowerPC ISA has Load-And-Reserve and Store-Conditional instructions
> which can be used to construct a sequence of instructions that appears
> to perform an atomic update operation on an aligned storage location.
>
> The larx (load-and-reserve) instruction supports an Exclusive Access
> Hint (EH).  A value of 0 for this hint indicates that other programs
> might attempt to modify the storage location.  A value of 1 indicates
> that other programs will not attempt to modify the memory location
> until the program that has done the load performs a subsequent store.
> EH = 1 should be used when the program is obtaining a lock variable
> that it will subsequently release before another program attempts to
> modify the lock variable.  When contention for a lock is significant,
> using this hint may reduce the number of times a cache block is
> transferred between processor caches.
>
> This patch introduces a new built-in function:
>   __atomic_compare_exchange_local()
>
> It behaves like __atomic_compare_exchange(), but it uses an EH value of
> 1 in the larx (load-and-reserve) instruction.  The new builtin helps
> optimize lock contention on PowerPC by keeping the lock cacheline in
> the local processor longer, reducing performance penalties from
> cacheline movement.
>
> This patch also provides a hook to specify whether a target supports
> load instructions with exclusive access hints.  For targets that do
> not support such load instructions, calling the new builtin will
> result in an error.
>
> The existing infrastructure for supporting __atomic_compare_exchange
> is reused, with some modifications, to accommodate the new builtin.
> In the expand pass, additional parameters are introduced in functions
> wherever necessary to indicate that the builtin being processed is
> __atomic_compare_exchange_local.
>
> Bootstrapped and regtested on powerpc64le and aarch64.  Ok for trunk?
>
> 2025-08-05  Surya Kumari Jangala  <[email protected]>
>
> gcc:
> 	* builtins.cc (expand_builtin_atomic_compare_exchange): Add a new
> 	parameter 'local'.  Pass new parameter 'local' to
> 	expand_atomic_compare_and_swap().
> 	(expand_builtin): Pass parameter 'local' to
> 	expand_builtin_atomic_compare_exchange().  Expand call to
> 	__atomic_compare_exchange_local().
> 	* c-family/c-common.cc (get_atomic_generic_size): Add new case
> 	for BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL in switch statement.
> 	(resolve_overloaded_atomic_compare_exchange): Issue error message if
> 	lock-free size not specified for __atomic_compare_exchange_local.
> 	(resolve_overloaded_builtin): Check if target supports loads with
> 	exclusive access hints.  Convert builtin to _N variant.
> 	* config/rs6000/rs6000-protos.h (rs6000_expand_atomic_compare_and_swap):
> 	Add additional parameter 'local' to the prototype.
> 	* config/rs6000/rs6000.cc (rs6000_have_load_with_exclusive_access):
> 	New function.
> 	(emit_load_locked): Add new parameter.  Pass new parameter to generate
> 	load-locked instruction.
> 	(rs6000_expand_atomic_compare_and_swap): Add new parameter.  Call
> 	emit_load_locked() with additional parameter value of EH bit.
> 	(rs6000_expand_atomic_exchange): Pass EH value 0 to emit_load_locked().
> 	(rs6000_expand_atomic_op): Likewise.
> 	* config/rs6000/rs6000.h (TARGET_HAVE_LOAD_WITH_EXCLUSIVE_ACCESS):
> 	Define.
> 	* config/rs6000/sync.md (load_locked<mode>): Add new operand in RTL
> 	template.  Specify EH bit in the larx instruction.
> 	(load_locked<QHI:mode>_si): Likewise.
> 	(load_lockedpti): Likewise.
> 	(load_lockedti): Add new operand in RTL template.  Pass EH bit to
> 	gen_load_lockedpti().
> 	(atomic_compare_and_swap<mode>): Pass new parameter 'false' to
> 	rs6000_expand_atomic_compare_and_swap.
> 	(atomic_compare_and_swap_local<mode>): New define_expand.
> 	* doc/tm.texi: Regenerate.
> 	* doc/tm.texi.in (TARGET_HAVE_LOAD_WITH_EXCLUSIVE_ACCESS): New hook.
> 	* optabs.cc (expand_atomic_compare_and_swap): Expand the new builtin.
> 	* optabs.def (atomic_compare_and_swap_local_optab): New entry.
> 	* optabs.h (expand_atomic_compare_and_swap): Add additional parameter
> 	'local' with default value false.
> 	* predict.cc (expr_expected_value_1): Set up predictor for the new
> 	builtin.
> 	* sync-builtins.def (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL): Define
> 	new enum.
> 	(BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_N): Likewise.
> 	(BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_1): Likewise.
> 	(BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_2): Likewise.
> 	(BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_4): Likewise.
> 	(BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_8): Likewise.
> 	(BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_16): Likewise.
> 	* target.def (have_load_with_exclusive_access): New hook.
>
> gcc/testsuite:
> 	* gcc.target/powerpc/acmp-tst.c: New test.
> ---
>  gcc/builtins.cc                             | 32 +++++++++++---
>  gcc/c-family/c-common.cc                    | 48 ++++++++++++++++++---
>  gcc/config/rs6000/rs6000-protos.h           |  2 +-
>  gcc/config/rs6000/rs6000.cc                 | 30 +++++++++----
>  gcc/config/rs6000/rs6000.h                  |  3 ++
>  gcc/config/rs6000/sync.md                   | 37 ++++++++++++----
>  gcc/doc/tm.texi                             |  8 ++++
>  gcc/doc/tm.texi.in                          |  2 +
>  gcc/optabs.cc                               | 10 ++++-
>  gcc/optabs.def                              |  1 +
>  gcc/optabs.h                                |  2 +-
>  gcc/predict.cc                              |  7 +++
>  gcc/sync-builtins.def                       | 28 ++++++++++++
>  gcc/target.def                              | 11 +++++
>  gcc/testsuite/gcc.target/powerpc/acmp-tst.c | 12 ++++++
>  15 files changed, 201 insertions(+), 32 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/acmp-tst.c
>
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 7f580a3145f..e44b2f2db9d 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -6691,17 +6691,25 @@ expand_builtin_atomic_exchange (machine_mode mode, tree exp, rtx target)
>    return expand_atomic_exchange (target, mem, val, model);
>  }
>
> -/* Expand the __atomic_compare_exchange intrinsic:
> +/* Expand the __atomic_compare_exchange and the
> +   __atomic_compare_exchange_local intrinsics:
>       bool __atomic_compare_exchange (TYPE *object, TYPE *expect,
>                                       TYPE desired, BOOL weak,
>                                       enum memmodel success,
>                                       enum memmodel failure)
> +     bool __atomic_compare_exchange_local (TYPE *object, TYPE *expect,
> +                                           TYPE desired, BOOL weak,
> +                                           enum memmodel success,
> +                                           enum memmodel failure)
>     EXP is the CALL_EXPR.
> -   TARGET is an optional place for us to store the results.  */
> +   TARGET is an optional place for us to store the results.
> +   LOCAL indicates which builtin is being expanded.  A value of true
> +   means __atomic_compare_exchange_local is being expanded, while a
> +   value of false indicates expansion of __atomic_compare_exchange.  */
>
>  static rtx
>  expand_builtin_atomic_compare_exchange (machine_mode mode, tree exp,
> -					rtx target)
> +					rtx target, bool local)
>  {
>    rtx expect, desired, mem, oldval;
>    rtx_code_label *label;
> @@ -6745,7 +6753,7 @@ expand_builtin_atomic_compare_exchange (machine_mode mode, tree exp,
>    oldval = NULL;
>
>    if (!expand_atomic_compare_and_swap (&target, &oldval, mem, expect, desired,
> -				       is_weak, success, failure))
> +				       is_weak, success, failure, local))
>      return NULL_RTX;
>
>    /* Conditionally store back to EXPECT, lest we create a race condition
> @@ -8711,7 +8719,7 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
>
> 	mode =
> 	  get_builtin_sync_mode (fcode - BUILT_IN_ATOMIC_COMPARE_EXCHANGE_1);
> -	target = expand_builtin_atomic_compare_exchange (mode, exp, target);
> +	target = expand_builtin_atomic_compare_exchange (mode, exp, target, false);
> 	if (target)
> 	  return target;
>
> @@ -8728,6 +8736,20 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
> 	break;
>        }
>
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_1:
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_2:
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_4:
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_8:
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_16:
> +      {
> +	mode =
> +	  get_builtin_sync_mode (fcode - BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_1);
> +	target = expand_builtin_atomic_compare_exchange (mode, exp, target, true);
> +	if (target)
> +	  return target;
> +	break;
> +      }
> +
>      case BUILT_IN_ATOMIC_LOAD_1:
>      case BUILT_IN_ATOMIC_LOAD_2:
>      case BUILT_IN_ATOMIC_LOAD_4:
> diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> index e7dd4602ac1..965b17947d3 100644
> --- a/gcc/c-family/c-common.cc
> +++ b/gcc/c-family/c-common.cc
> @@ -7807,6 +7807,11 @@ get_atomic_generic_size (location_t loc, tree function,
>        n_model = 2;
>        outputs = 3;
>        break;
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL:
> +      n_param = 6;
> +      n_model = 2;
> +      outputs = 3;
> +      break;
>      default:
>        gcc_unreachable ();
>      }
> @@ -8118,13 +8123,22 @@ resolve_overloaded_atomic_exchange (location_t loc, tree function,
>    return false;
>  }
>
> -/* This will process an __atomic_compare_exchange function call, determine
> -   whether it needs to be mapped to the _N variation, or turned into a lib call.
> +/* This will process __atomic_compare_exchange and
> +   __atomic_compare_exchange_local function calls and determine whether they
> +   can be mapped to the _N variation, or in the case of
> +   __atomic_compare_exchange, turned into a lib call.
>     LOC is the location of the builtin call.
>     FUNCTION is the DECL that has been invoked;
> -   PARAMS is the argument list for the call.  The return value is non-null
> +   PARAMS is the argument list for the call.
> +   The return value is non-null
> +   For __atomic_compare_exchange:
>     TRUE is returned if it is translated into the proper format for a call to the
>     external library, and NEW_RETURN is set the tree for that function.
> +   FALSE is returned if processing for the _N variation is required.
> +   For __atomic_compare_exchange_local:
> +   TRUE is returned if a lock-free size is not specified, and NEW_RETURN is
> +   set to error_mark_node.  Library support is not provided for this builtin
> +   since the intent of this builtin is to provide exclusive access hints on the
> +   machine instructions implementing this builtin.
>     FALSE is returned if processing for the _N variation is required.  */
>
>  static bool
> @@ -8146,6 +8160,14 @@ resolve_overloaded_atomic_compare_exchange (location_t loc, tree function,
>    /* If not a lock-free size, change to the library generic format.  */
>    if (!atomic_size_supported_p (n))
>      {
> +      enum built_in_function fn_code = DECL_FUNCTION_CODE (function);
> +      if (fn_code == BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL)
> +	{
> +	  error_at (loc, "lock-free size not specified for builtin-function %qE", function);
> +	  *new_return = error_mark_node;
> +	  return true;
> +	}
> +
>        /* The library generic format does not have the weak parameter, so
> 	 remove it from the param list.  Since a parameter has been removed,
> 	 we can be sure that there is room for the SIZE_T parameter, meaning
> @@ -8640,10 +8662,11 @@ resolve_overloaded_builtin (location_t loc, tree function,
>
>      case BUILT_IN_ATOMIC_EXCHANGE:
>      case BUILT_IN_ATOMIC_COMPARE_EXCHANGE:
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL:
>      case BUILT_IN_ATOMIC_LOAD:
>      case BUILT_IN_ATOMIC_STORE:
>        {
> -	/* Handle these 4 together so that they can fall through to the next
> +	/* Handle these 5 together so that they can fall through to the next
> 	   case if the call is transformed to an _N variant.  */
> 	switch (orig_code)
> 	  {
> @@ -8666,6 +8689,20 @@ resolve_overloaded_builtin (location_t loc, tree function,
> 	      orig_code = BUILT_IN_ATOMIC_COMPARE_EXCHANGE_N;
> 	      break;
> 	    }
> +	  case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL:
> +	    {
> +	      if (!targetm.have_load_with_exclusive_access())
> +		{
> +		  error_at(loc, "unsupported builtin-function %qE", function);
> +		  return error_mark_node;
> +		}
> +	      if (resolve_overloaded_atomic_compare_exchange (
> +		    loc, function, params, &new_return, complain))
> +		return new_return;
> +	      /* Change to the _N variant.  */
> +	      orig_code = BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_N;
> +	      break;
> +	    }
> 	  case BUILT_IN_ATOMIC_LOAD:
> 	    {
> 	      if (resolve_overloaded_atomic_load (loc, function, params,
> @@ -8771,7 +8808,8 @@ resolve_overloaded_builtin (location_t loc, tree function,
>    if (orig_code != BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_N
>        && orig_code != BUILT_IN_SYNC_LOCK_RELEASE_N
>        && orig_code != BUILT_IN_ATOMIC_STORE_N
> -      && orig_code != BUILT_IN_ATOMIC_COMPARE_EXCHANGE_N)
> +      && orig_code != BUILT_IN_ATOMIC_COMPARE_EXCHANGE_N
> +      && orig_code != BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_N)
>      result = sync_resolve_return (first_param, result, orig_format);
>
>    if (fetch_op)
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index 234eb0ae2b3..f4b9b4ee922 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -127,7 +127,7 @@ extern bool rs6000_emit_set_const (rtx, rtx);
>  extern bool rs6000_emit_cmove (rtx, rtx, rtx, rtx);
>  extern bool rs6000_emit_int_cmove (rtx, rtx, rtx, rtx);
>  extern void rs6000_emit_minmax (rtx, enum rtx_code, rtx, rtx);
> -extern void rs6000_expand_atomic_compare_and_swap (rtx op[]);
> +extern void rs6000_expand_atomic_compare_and_swap (rtx op[], bool local);
>  extern rtx swap_endian_selector_for_mode (machine_mode mode);
>
>  extern void rs6000_expand_atomic_exchange (rtx op[]);
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 764b4992fb5..0f4a5d2c4fb 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -395,6 +395,12 @@ mode_supports_dq_form (machine_mode mode)
> 	  != 0);
>  }
>
> +bool
> +rs6000_have_load_with_exclusive_access ()
> +{
> +  return true;
> +}
> +
>  /* Given that there exists at least one variable that is set (produced)
>     by OUT_INSN and read (consumed) by IN_INSN, return true iff
>     IN_INSN represents one or more memory store operations and none of
> @@ -16749,12 +16755,13 @@ emit_unlikely_jump (rtx cond, rtx label)
>
>  /* A subroutine of the atomic operation splitters.  Emit a load-locked
>     instruction in MODE.  For QI/HImode, possibly use a pattern than includes
> -   the zero_extend operation.  */
> +   the zero_extend operation.  LOCAL indicates the EH bit value for the
> +   load-locked instruction.  */
>
>  static void
> -emit_load_locked (machine_mode mode, rtx reg, rtx mem)
> +emit_load_locked (machine_mode mode, rtx reg, rtx mem, rtx local)
>  {
> -  rtx (*fn) (rtx, rtx) = NULL;
> +  rtx (*fn) (rtx, rtx, rtx) = NULL;
>
>    switch (mode)
>      {
> @@ -16781,7 +16788,7 @@ emit_load_locked (machine_mode mode, rtx reg, rtx mem)
>      default:
>        gcc_unreachable ();
>      }
> -  emit_insn (fn (reg, mem));
> +  emit_insn (fn (reg, mem, local));
>  }
>
>  /* A subroutine of the atomic operation splitters.  Emit a store-conditional
> @@ -16948,10 +16955,12 @@ rs6000_finish_atomic_subword (rtx narrow, rtx wide, rtx shift)
>    emit_move_insn (narrow, gen_lowpart (GET_MODE (narrow), wide));
>  }
>
> -/* Expand an atomic compare and swap operation.  */
> +/* Expand an atomic compare and swap operation.
> +   If LOCAL is true, the load-locked (larx) instruction should have
> +   an EH value of 1.  */
>
>  void
> -rs6000_expand_atomic_compare_and_swap (rtx operands[])
> +rs6000_expand_atomic_compare_and_swap (rtx operands[], bool local)
>  {
>    rtx boolval, retval, mem, oldval, newval, cond;
>    rtx label1, label2, x, mask, shift;
> @@ -17014,7 +17023,10 @@ rs6000_expand_atomic_compare_and_swap (rtx operands[])
>      }
>    label2 = gen_rtx_LABEL_REF (VOIDmode, gen_label_rtx ());
>
> -  emit_load_locked (mode, retval, mem);
> +  if (local)
> +    emit_load_locked (mode, retval, mem, const1_rtx);
> +  else
> +    emit_load_locked (mode, retval, mem, const0_rtx);
>
>    x = retval;
>    if (mask)
> @@ -17112,7 +17124,7 @@ rs6000_expand_atomic_exchange (rtx operands[])
>    label = gen_rtx_LABEL_REF (VOIDmode, gen_label_rtx ());
>    emit_label (XEXP (label, 0));
>
> -  emit_load_locked (mode, retval, mem);
> +  emit_load_locked (mode, retval, mem, const0_rtx);
>
>    x = val;
>    if (mask)
> @@ -17217,7 +17229,7 @@ rs6000_expand_atomic_op (enum rtx_code code, rtx mem, rtx val,
>    if (before == NULL_RTX)
>      before = gen_reg_rtx (mode);
>
> -  emit_load_locked (mode, before, mem);
> +  emit_load_locked (mode, before, mem, const0_rtx);
>
>    if (code == NOT)
>      {
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index db6112a09e1..6b86949f963 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -597,6 +597,9 @@ extern unsigned char rs6000_recip_bits[];
>  #define TARGET_CPU_CPP_BUILTINS() \
>    rs6000_cpu_cpp_builtins (pfile)
>
> +#define TARGET_HAVE_LOAD_WITH_EXCLUSIVE_ACCESS \
> +  rs6000_have_load_with_exclusive_access
> +
>  /* This is used by rs6000_cpu_cpp_builtins to indicate the byte order
>     we're compiling for.  Some configurations may need to override it.  */
>  #define RS6000_CPU_CPP_ENDIAN_BUILTINS() \
> diff --git a/gcc/config/rs6000/sync.md b/gcc/config/rs6000/sync.md
> index f0ac3348f7b..2be7828d049 100644
> --- a/gcc/config/rs6000/sync.md
> +++ b/gcc/config/rs6000/sync.md
> @@ -278,17 +278,19 @@
>  (define_insn "load_locked<mode>"
>    [(set (match_operand:ATOMIC 0 "int_reg_operand" "=r")
> 	(unspec_volatile:ATOMIC
> -	 [(match_operand:ATOMIC 1 "memory_operand" "Z")] UNSPECV_LL))]
> +	 [(match_operand:ATOMIC 1 "memory_operand" "Z")
> +	  (match_operand:QI 2 "u1bit_cint_operand" "n")] UNSPECV_LL))]
>    ""
> -  "<larx> %0,%y1"
> +  "<larx> %0,%y1,%2"
>    [(set_attr "type" "load_l")])
>
>  (define_insn "load_locked<QHI:mode>_si"
>    [(set (match_operand:SI 0 "int_reg_operand" "=r")
> 	(unspec_volatile:SI
> 	  [(match_operand:QHI 1 "memory_operand" "Z")
> +	   (match_operand:QI 2 "u1bit_cint_operand" "n")] UNSPECV_LL))]
>    "TARGET_SYNC_HI_QI"
> -  "<QHI:larx> %0,%y1"
> +  "<QHI:larx> %0,%y1,%2"
>    [(set_attr "type" "load_l")])
>
>  ;; Use PTImode to get even/odd register pairs.
> @@ -302,7 +304,8 @@
>
>  (define_expand "load_lockedti"
>    [(use (match_operand:TI 0 "quad_int_reg_operand"))
> -   (use (match_operand:TI 1 "memory_operand"))]
> +   (use (match_operand:TI 1 "memory_operand"))
> +   (use (match_operand:QI 2 "u1bit_cint_operand"))]
>    "TARGET_SYNC_TI"
>  {
>    rtx op0 = operands[0];
> @@ -316,7 +319,7 @@
>        operands[1] = op1 = change_address (op1, TImode, new_addr);
>      }
>
> -  emit_insn (gen_load_lockedpti (pti, op1));
> +  emit_insn (gen_load_lockedpti (pti, op1, operands[2]));
>    if (WORDS_BIG_ENDIAN)
>      emit_move_insn (op0, gen_lowpart (TImode, pti));
>    else
> @@ -330,11 +333,12 @@
>  (define_insn "load_lockedpti"
>    [(set (match_operand:PTI 0 "quad_int_reg_operand" "=&r")
> 	(unspec_volatile:PTI
> -	 [(match_operand:TI 1 "indexed_or_indirect_operand" "Z")] UNSPECV_LL))]
> +	 [(match_operand:TI 1 "indexed_or_indirect_operand" "Z")
> +	  (match_operand:QI 2 "u1bit_cint_operand" "n")] UNSPECV_LL))]
>    "TARGET_SYNC_TI
>     && !reg_mentioned_p (operands[0], operands[1])
>     && quad_int_reg_operand (operands[0], PTImode)"
> -  "lqarx %0,%y1"
> +  "lqarx %0,%y1,%2"
>    [(set_attr "type" "load_l")
>     (set_attr "size" "128")])
>
> @@ -411,7 +415,22 @@
>     (match_operand:SI 7 "const_int_operand")]	;; model fail
>    ""
>  {
> -  rs6000_expand_atomic_compare_and_swap (operands);
> +  rs6000_expand_atomic_compare_and_swap (operands, false);
>    DONE;
>  })
>
> +(define_expand "atomic_compare_and_swap_local<mode>"
> +  [(match_operand:SI 0 "int_reg_operand")		;; bool out
> +   (match_operand:AINT 1 "int_reg_operand")		;; val out
> +   (match_operand:AINT 2 "memory_operand")		;; memory
> +   (match_operand:AINT 3 "reg_or_short_operand")	;; expected
> +   (match_operand:AINT 4 "int_reg_operand")		;; desired
> +   (match_operand:SI 5 "const_int_operand")		;; is_weak
> +   (match_operand:SI 6 "const_int_operand")		;; model succ
> +   (match_operand:SI 7 "const_int_operand")]		;; model fail
> +  ""
> +{
> +  rs6000_expand_atomic_compare_and_swap (operands, true);
> +  DONE;
> +})
>
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 4d4e676aadf..05e1c15062b 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -12496,6 +12496,14 @@ enabled, such as in response to command-line flags.  The default implementation
>  returns true iff @code{TARGET_GEN_CCMP_FIRST} is defined.
>  @end deftypefn
>
> +@deftypefn {Target Hook} bool TARGET_HAVE_LOAD_WITH_EXCLUSIVE_ACCESS (void)
> +This target hook returns true if the target supports load instructions
> +with exclusive access hints that optimize how a cache block is transferred
> +between processor caches.  Such hints are helpful, for example, to reduce the
> +number of times a cache block is transferred between processor caches when
> +there is significant lock contention.
> +@end deftypefn
> +
>  @deftypefn {Target Hook} unsigned TARGET_LOOP_UNROLL_ADJUST (unsigned @var{nunroll}, class loop *@var{loop})
>  This target hook returns a new value for the number of times @var{loop}
>  should be unrolled.  The parameter @var{nunroll} is the number of times
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 1a51ad54817..8b953042335 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -7944,6 +7944,8 @@ lists.
>
>  @hook TARGET_HAVE_CCMP
>
> +@hook TARGET_HAVE_LOAD_WITH_EXCLUSIVE_ACCESS
> +
>  @hook TARGET_LOOP_UNROLL_ADJUST
>
>  @defmac POWI_MAX_MULTS
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index 5c9450f6145..fe6840d21ed 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -7121,6 +7121,8 @@ expand_atomic_exchange (rtx target, rtx mem, rtx val, enum memmodel model)
>     success to the actual location of the corresponding result.
>
>     MEMMODEL is the memory model variant to use.
> +   A true value for LOCAL indicates expansion of the builtin
> +   __atomic_compare_exchange_local.
>
>     The return value of the function is true for success.  */
>
> @@ -7128,7 +7130,7 @@ bool
>  expand_atomic_compare_and_swap (rtx *ptarget_bool, rtx *ptarget_oval,
> 				rtx mem, rtx expected, rtx desired,
> 				bool is_weak, enum memmodel succ_model,
> -				enum memmodel fail_model)
> +				enum memmodel fail_model, bool local)
>  {
>    machine_mode mode = GET_MODE (mem);
>    class expand_operand ops[8];
> @@ -7157,7 +7159,11 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool, rtx *ptarget_oval,
>        || reg_overlap_mentioned_p (expected, target_oval))
>      target_oval = gen_reg_rtx (mode);
>
> -  icode = direct_optab_handler (atomic_compare_and_swap_optab, mode);
> +  if (!local)
> +    icode = direct_optab_handler (atomic_compare_and_swap_optab, mode);
> +  else
> +    icode = direct_optab_handler (atomic_compare_and_swap_local_optab, mode);
> +
>    if (icode != CODE_FOR_nothing)
>      {
>        machine_mode bool_mode = insn_data[icode].operand[0].mode;
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 87a8b85da15..1e730069caf 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -512,6 +512,7 @@ OPTAB_D (atomic_bit_test_and_set_optab, "atomic_bit_test_and_set$I$a")
>  OPTAB_D (atomic_bit_test_and_complement_optab, "atomic_bit_test_and_complement$I$a")
>  OPTAB_D (atomic_bit_test_and_reset_optab, "atomic_bit_test_and_reset$I$a")
>  OPTAB_D (atomic_compare_and_swap_optab, "atomic_compare_and_swap$I$a")
> +OPTAB_D (atomic_compare_and_swap_local_optab, "atomic_compare_and_swap_local$I$a")
>  OPTAB_D (atomic_exchange_optab, "atomic_exchange$I$a")
>  OPTAB_D (atomic_fetch_add_optab, "atomic_fetch_add$I$a")
>  OPTAB_D (atomic_fetch_and_optab, "atomic_fetch_and$I$a")
> diff --git a/gcc/optabs.h b/gcc/optabs.h
> index a8b0e93d60b..6f7e0f5a027 100644
> --- a/gcc/optabs.h
> +++ b/gcc/optabs.h
> @@ -356,7 +356,7 @@ extern rtx expand_sync_lock_test_and_set (rtx, rtx, rtx);
>  extern rtx expand_atomic_test_and_set (rtx, rtx, enum memmodel);
>  extern rtx expand_atomic_exchange (rtx, rtx, rtx, enum memmodel);
>  extern bool expand_atomic_compare_and_swap (rtx *, rtx *, rtx, rtx, rtx, bool,
> -					    enum memmodel, enum memmodel);
> +					    enum memmodel, enum memmodel, bool local = false);
>  /* Generate memory barriers.  */
>  extern void expand_mem_thread_fence (enum memmodel);
>  extern void expand_mem_signal_fence (enum memmodel);
> diff --git a/gcc/predict.cc b/gcc/predict.cc
> index 5639d81d277..1006bdf3d3c 100644
> --- a/gcc/predict.cc
> +++ b/gcc/predict.cc
> @@ -2672,6 +2672,13 @@ expr_expected_value_1 (tree type, tree op0, enum tree_code code,
>      case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_4:
>      case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_8:
>      case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_16:
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL:
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_N:
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_1:
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_2:
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_4:
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_8:
> +    case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_16:
>        /* Assume that any given atomic operation has low contention,
> 	 and thus the compare-and-swap operation succeeds.  */
>        *predictor = PRED_COMPARE_AND_SWAP;
> diff --git a/gcc/sync-builtins.def b/gcc/sync-builtins.def
> index 0f058187a20..ad1dd5e2d1f 100644
> --- a/gcc/sync-builtins.def
> +++ b/gcc/sync-builtins.def
> @@ -338,6 +338,34 @@ DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_16,
> 		  BT_FN_BOOL_VPTR_PTR_I16_BOOL_INT_INT,
> 		  ATTR_NOTHROWCALL_LEAF_LIST)
>
> +DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL,
> +		  "__atomic_compare_exchange_local",
> +		  BT_FN_BOOL_SIZE_VPTR_PTR_PTR_INT_INT,
> +		  ATTR_NOTHROWCALL_LEAF_LIST)
> +DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_N,
> +		  "__atomic_compare_exchange_local_n",
> +		  BT_FN_VOID_VAR, ATTR_NOTHROWCALL_LEAF_LIST)
> +DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_1,
> +		  "__atomic_compare_exchange_local_1",
> +		  BT_FN_BOOL_VPTR_PTR_I1_BOOL_INT_INT,
> +		  ATTR_NOTHROWCALL_LEAF_LIST)
> +DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_2,
> +		  "__atomic_compare_exchange_local_2",
> +		  BT_FN_BOOL_VPTR_PTR_I2_BOOL_INT_INT,
> +		  ATTR_NOTHROWCALL_LEAF_LIST)
> +DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_4,
> +		  "__atomic_compare_exchange_local_4",
> +		  BT_FN_BOOL_VPTR_PTR_I4_BOOL_INT_INT,
> +		  ATTR_NOTHROWCALL_LEAF_LIST)
> +DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_8,
> +		  "__atomic_compare_exchange_local_8",
> +		  BT_FN_BOOL_VPTR_PTR_I8_BOOL_INT_INT,
> +		  ATTR_NOTHROWCALL_LEAF_LIST)
> +DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_LOCAL_16,
> +		  "__atomic_compare_exchange_local_16",
> +		  BT_FN_BOOL_VPTR_PTR_I16_BOOL_INT_INT,
> +		  ATTR_NOTHROWCALL_LEAF_LIST)
> +
>  DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_STORE,
> 		  "__atomic_store",
> 		  BT_FN_VOID_SIZE_VPTR_PTR_INT, ATTR_NOTHROWCALL_LEAF_LIST)
> diff --git a/gcc/target.def b/gcc/target.def
> index 5dd8f253ef6..fad306b0199 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -2828,6 +2828,17 @@ returns true iff @code{TARGET_GEN_CCMP_FIRST} is defined.",
>   bool, (void),
>   default_have_ccmp)
>
> +/* Return true if the target supports load instructions with exclusive
> +   access.  */
> +DEFHOOK
> +(have_load_with_exclusive_access,
> + "This target hook returns true if the target supports load instructions\n\
> +with exclusive access hints that optimize how a cache block is transferred\n\
> +between processor caches.  Such hints are helpful, for example, to reduce the\n\
> +number of times a cache block is transferred between processor caches when\n\
> +there is significant lock contention.",
> + bool, (void),
> + hook_bool_void_false)
> +
>  /* Return a new value for loop unroll size.  */
>  DEFHOOK
>  (loop_unroll_adjust,
> diff --git a/gcc/testsuite/gcc.target/powerpc/acmp-tst.c b/gcc/testsuite/gcc.target/powerpc/acmp-tst.c
> new file mode 100644
> index 00000000000..a4b5861216b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/acmp-tst.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O2" } */
> +
> +#include <stdint.h>
> +
> +bool
> +word_exchange (uint64_t *ptr, uint64_t *expected, uint64_t *desired)
> +{
> +  return __atomic_compare_exchange_local (ptr, expected, desired, 0, __ATOMIC_SEQ_CST, __ATOMIC_ACQUIRE);
> +}
> +
> +/* { dg-final { scan-assembler {\mldarx +[0-9]+,[0-9]+,[0-9]+,1} } } */
