https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71153
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Keywords|                              |missed-optimization
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2016-05-16
                 CC|                              |pinskia at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org
     Ever confirmed|0                             |1
           Severity|normal                        |enhancement

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
LDCLRL does MEM &= ~op.

LDCLR: atomic AND NOT (bitclear) of a location with value in a register, with
original data loaded into a register. Applies to 8, 16, 32, 64 bits. Each
instruction can have one of four possible orderings - acquire, release,
acquire&release, no order, as it performs both a load and a store.

So the code is correct, just not optimal.

Something like this should get the code back to optimal:

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 236c819..ad92f6a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11607,6 +11607,7 @@ aarch64_gen_atomic_ldop (enum rtx_code code, rtx out_data, rtx out_result,
   aarch64_atomic_load_op_code ldop_code;
   rtx src;
   rtx x;
+  bool already_swapped = false;
 
   if (out_data)
     out_data = gen_lowpart (mode, out_data);
@@ -11614,6 +11615,12 @@ aarch64_gen_atomic_ldop (enum rtx_code code, rtx out_data, rtx out_result,
   if (out_result)
     out_result = gen_lowpart (mode, out_result);
 
+  if (code == AND && CONST_INT_P (value))
+    {
+      value = simplify_gen_unary (NOT, mode, value, mode);
+      already_swapped = true;
+    }
+
   /* Make sure the value is in a register, putting it into a destination
      register if it needs to be manipulated.  */
   if (!register_operand (value, mode)
@@ -11670,8 +11677,11 @@ aarch64_gen_atomic_ldop (enum rtx_code code, rtx out_data, rtx out_result,
       if (short_mode)
         src = gen_lowpart (wmode, src);
 
-      not_src = gen_rtx_NOT (wmode, src);
-      emit_insn (gen_rtx_SET (src, not_src));
+      if (!already_swapped)
+        {
+          not_src = gen_rtx_NOT (wmode, src);
+          emit_insn (gen_rtx_SET (src, not_src));
+        }
 
       if (short_mode)
         src = gen_lowpart (mode, src);
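
To illustrate why inverting a constant mask ahead of time is safe, here is a
rough C11 sketch of the equivalence the patch relies on. It is not GCC code:
model_ldclr, fetch_and_runtime_not, fetch_and_folded_not and KEEP_MASK are
hypothetical names made up for this example, and the LDCLR behaviour is only
modelled in software with atomic_fetch_and, assuming the "MEM &= ~op, return
old value" semantics described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical software model of LDCLR: atomically perform *mem &= ~op
   and return the previous contents of *mem.  Real hardware does this in
   a single instruction; here it is expressed with C11 atomics.  */
static uint32_t
model_ldclr (_Atomic uint32_t *mem, uint32_t op)
{
  return atomic_fetch_and (mem, ~op);
}

/* Shape of the unoptimized sequence: the AND mask is put in a register,
   inverted at run time (the extra instruction), then fed to LDCLR.  */
static uint32_t
fetch_and_runtime_not (_Atomic uint32_t *mem, uint32_t mask)
{
  uint32_t inverted = ~mask;   /* run-time NOT of the operand */
  return model_ldclr (mem, inverted);
}

/* Illustrative constant mask (an assumption, not taken from the bug).  */
#define KEEP_MASK 0x0000ff0fu

/* Shape the patch aims for: the mask is a compile-time constant, so its
   inverse is folded by the compiler and no run-time NOT is needed.  */
static uint32_t
fetch_and_folded_not (_Atomic uint32_t *mem)
{
  return model_ldclr (mem, (uint32_t) ~KEEP_MASK);
}
```

Both helpers should leave the same value in memory and return the same old
value; the only intended difference is whether the NOT happens at run time or
is folded into the constant.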