On Mon, May 2, 2016 at 6:36 PM, Jakub Jelinek <ja...@redhat.com> wrote: > Hi! > > This patch adds pattern recognition (see the attached testcase for examples of > what it can handle) of the i?86/x86_64 lock; bt[src] operations. > It is too late to do this during or after RTL expansion, so it is done late > during gimple, by recognizing these sequences in the fold builtins pass, > turning those into an internal call which represents atomically setting, > complementing or resetting a bit and remembering the previous value of the > bit. > > The patch doesn't handle (yet) the weirdo handling of memory operands where > the counter can actually be not just between 0 and bitsize - 1 of the > particular mode, but much larger, and the CPU locates the right memory > word first; it could be extended to handle that later. > I'd like to find out if there are other targets that have similar > instructions in their ISAs, or if x86_64/i686 is the only one. > > Bootstrapped/regtested on x86_64-linux and i686-linux (relies on the > gimple.c patch I've just posted, otherwise the expected number of > scan-assembler-times would need to be tweaked for the short int cases). > Ok for trunk? > > 2016-05-02 Jakub Jelinek <ja...@redhat.com> > > PR target/49244 > * tree-ssa-ccp.c: Include stor-layout.h and optabs-query.h. > (optimize_atomic_bit_test_and): New function. > (pass_fold_builtins::execute): Use it. > * optabs.def (atomic_bit_test_and_set_optab, > atomic_bit_test_and_complement_optab, > atomic_bit_test_and_reset_optab): New optabs. > * internal-fn.def (ATOMIC_BIT_TEST_AND_SET, > ATOMIC_BIT_TEST_AND_COMPLEMENT, ATOMIC_BIT_TEST_AND_RESET): New ifns. > * builtins.h (expand_ifn_atomic_bit_test_and): New prototype. > * builtins.c (expand_ifn_atomic_bit_test_and): New function. > * internal-fn.c (expand_ATOMIC_BIT_TEST_AND_SET, > expand_ATOMIC_BIT_TEST_AND_COMPLEMENT, > expand_ATOMIC_BIT_TEST_AND_RESET): New functions.
> * doc/md.texi (atomic_bit_test_and_set@var{mode}, > atomic_bit_test_and_complement@var{mode}, > atomic_bit_test_and_reset@var{mode}): Document. > * config/i386/sync.md (atomic_bit_test_and_set<mode>, > atomic_bit_test_and_complement<mode>, > atomic_bit_test_and_reset<mode>): New expanders. > (atomic_bit_test_and_set<mode>_1, > atomic_bit_test_and_complement<mode>_1, > atomic_bit_test_and_reset<mode>_1): New insns. > > * gcc.target/i386/pr49244-1.c: New test. > * gcc.target/i386/pr49244-2.c: New test.
> +(define_expand "atomic_bit_test_and_set<mode>" > + [(match_operand:SWI248 0 "register_operand") > + (match_operand:SWI248 1 "memory_operand") > + (match_operand:SWI248 2 "nonmemory_operand") > + (match_operand:SI 3 "const_int_operand") ;; model > + (match_operand:SI 4 "const_int_operand")] > + "" > +{ > + emit_insn (gen_atomic_bit_test_and_set<mode>_1 (operands[1], operands[2], > + operands[3])); > + operands[5] = gen_reg_rtx (QImode); Please don't use operands[N] without a corresponding (match_dup N) in the RTL pattern. The "operands" array is only as long as the last operand number from the pattern. Just grep for the pattern name in the generated insn-emit.c and you will see the problem. > + ix86_expand_setcc (operands[5], EQ, gen_rtx_REG (CCCmode, FLAGS_REG), > + const0_rtx); > + rtx result = convert_modes (<MODE>mode, QImode, operands[5], 1); > + if (operands[4] == const0_rtx) > + result = expand_simple_binop (<MODE>mode, ASHIFT, result, > + operands[2], operands[0], 0, OPTAB_DIRECT); > + if (result != operands[0]) > + emit_move_insn (operands[0], result); > + DONE; > +}) > + > +(define_insn "atomic_bit_test_and_set<mode>_1" > + [(set (reg:CCC FLAGS_REG) > + (compare:CCC > + (unspec_volatile:SWI248 > + [(match_operand:SWI248 0 "memory_operand" "+m") > + (match_operand:SI 2 "const_int_operand")] ;; model > + UNSPECV_XCHG) > + (const_int 0))) > + (set (zero_extract:SWI248 (match_dup 0) > + (const_int 1) > + (match_operand:SWI248 1 > + "nonmemory_operand" "<r>N")) No need for the <r> attribute with SWI248; you can use plain "r" above. <r> makes a difference only for QImode.
> + (const_int 1))] > + "" > + "lock{%;} %K2bts{<imodesuffix>}\t{%1, %0|%0, %1}") > + > +(define_expand "atomic_bit_test_and_complement<mode>" > + [(match_operand:SWI248 0 "register_operand") > + (match_operand:SWI248 1 "memory_operand") > + (match_operand:SWI248 2 "nonmemory_operand") > + (match_operand:SI 3 "const_int_operand") ;; model > + (match_operand:SI 4 "const_int_operand")] > + "" > +{ > + emit_insn (gen_atomic_bit_test_and_complement<mode>_1 (operands[1], > + operands[2], > + operands[3])); > + operands[5] = gen_reg_rtx (QImode); > + ix86_expand_setcc (operands[5], EQ, gen_rtx_REG (CCCmode, FLAGS_REG), > + const0_rtx); > + rtx result = convert_modes (<MODE>mode, QImode, operands[5], 1); > + if (operands[4] == const0_rtx) > + result = expand_simple_binop (<MODE>mode, ASHIFT, result, > + operands[2], operands[0], 0, OPTAB_DIRECT); > + if (result != operands[0]) > + emit_move_insn (operands[0], result); > + DONE; > +}) > + > +(define_insn "atomic_bit_test_and_complement<mode>_1" > + [(set (reg:CCC FLAGS_REG) > + (compare:CCC > + (unspec_volatile:SWI248 > + [(match_operand:SWI248 0 "memory_operand" "+m") > + (match_operand:SI 2 "const_int_operand")] ;; model > + UNSPECV_XCHG) > + (const_int 0))) > + (set (zero_extract:SWI248 (match_dup 0) > + (const_int 1) > + (match_operand:SWI248 1 > + "nonmemory_operand" "<r>N")) > + (not:SWI248 (zero_extract:SWI248 (match_dup 0) > + (const_int 1) > + (match_dup 1))))] > + "" > + "lock{%;} %K2btc{<imodesuffix>}\t{%1, %0|%0, %1}") > + > +(define_expand "atomic_bit_test_and_reset<mode>" > + [(match_operand:SWI248 0 "register_operand") > + (match_operand:SWI248 1 "memory_operand") > + (match_operand:SWI248 2 "nonmemory_operand") > + (match_operand:SI 3 "const_int_operand") ;; model > + (match_operand:SI 4 "const_int_operand")] > + "" > +{ > + emit_insn (gen_atomic_bit_test_and_reset<mode>_1 (operands[1], operands[2], > + operands[3])); > + operands[5] = gen_reg_rtx (QImode); > + ix86_expand_setcc (operands[5], EQ, 
gen_rtx_REG (CCCmode, FLAGS_REG), > + const0_rtx); > + rtx result = convert_modes (<MODE>mode, QImode, operands[5], 1); > + if (operands[4] == const0_rtx) > + result = expand_simple_binop (<MODE>mode, ASHIFT, result, > + operands[2], operands[0], 0, OPTAB_DIRECT); > + if (result != operands[0]) > + emit_move_insn (operands[0], result); > + DONE; > +}) > + > +(define_insn "atomic_bit_test_and_reset<mode>_1" > + [(set (reg:CCC FLAGS_REG) > + (compare:CCC > + (unspec_volatile:SWI248 > + [(match_operand:SWI248 0 "memory_operand" "+m") > + (match_operand:SI 2 "const_int_operand")] ;; model > + UNSPECV_XCHG) > + (const_int 0))) > + (set (zero_extract:SWI248 (match_dup 0) > + (const_int 1) > + (match_operand:SWI248 1 > + "nonmemory_operand" "<r>N")) > + (const_int 0))] > + "" > + "lock{%;} %K2btr{<imodesuffix>}\t{%1, %0|%0, %1}") > --- gcc/testsuite/gcc.target/i386/pr49244-1.c.jj 2016-05-02 > 14:52:56.776814774 +0200 > +++ gcc/testsuite/gcc.target/i386/pr49244-1.c 2016-05-02 12:39:52.126750700 > +0200 > @@ -0,0 +1,188 @@ > +/* PR target/49244 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +void bar (void); > + > +__attribute__((noinline, noclone)) int > +f1 (int *a, int bit) > +{ > + unsigned int mask = (1u << bit); > + return (__sync_fetch_and_or (a, mask) & mask) != 0; > +} > + > +__attribute__((noinline, noclone)) int > +f2 (int *a, int bit) > +{ > + unsigned int mask = (1u << bit); > + unsigned int t1 = __atomic_fetch_or (a, mask, __ATOMIC_RELAXED); > + unsigned int t2 = t1 & mask; > + return t2 != 0; > +} > + > +__attribute__((noinline, noclone)) long int > +f3 (long int *a, int bit) > +{ > + unsigned long int mask = (1ul << bit); > + return (__atomic_fetch_or (a, mask, __ATOMIC_SEQ_CST) & mask) == 0; > +} > + > +__attribute__((noinline, noclone)) int > +f4 (int *a) > +{ > + unsigned int mask = (1u << 7); > + return (__sync_fetch_and_or (a, mask) & mask) != 0; > +} > + > +__attribute__((noinline, noclone)) int > +f5 (int *a) > +{ > + unsigned int 
mask = (1u << 13); > + return (__atomic_fetch_or (a, mask, __ATOMIC_RELAXED) & mask) != 0; > +} > + > +__attribute__((noinline, noclone)) int > +f6 (int *a) > +{ > + unsigned int mask = (1u << 0); > + return (__atomic_fetch_or (a, mask, __ATOMIC_SEQ_CST) & mask) != 0; > +} > + > +__attribute__((noinline, noclone)) void > +f7 (int *a, int bit) > +{ > + unsigned int mask = (1u << bit); > + if ((__sync_fetch_and_xor (a, mask) & mask) != 0) > + bar (); > +} > + > +__attribute__((noinline, noclone)) void > +f8 (int *a, int bit) > +{ > + unsigned int mask = (1u << bit); > + if ((__atomic_fetch_xor (a, mask, __ATOMIC_RELAXED) & mask) == 0) > + bar (); > +} > + > +__attribute__((noinline, noclone)) int > +f9 (int *a, int bit) > +{ > + unsigned int mask = (1u << bit); > + return (__atomic_fetch_xor (a, mask, __ATOMIC_SEQ_CST) & mask) != 0; > +} > + > +__attribute__((noinline, noclone)) int > +f10 (int *a) > +{ > + unsigned int mask = (1u << 7); > + return (__sync_fetch_and_xor (a, mask) & mask) != 0; > +} > + > +__attribute__((noinline, noclone)) int > +f11 (int *a) > +{ > + unsigned int mask = (1u << 13); > + return (__atomic_fetch_xor (a, mask, __ATOMIC_RELAXED) & mask) != 0; > +} > + > +__attribute__((noinline, noclone)) int > +f12 (int *a) > +{ > + unsigned int mask = (1u << 0); > + return (__atomic_fetch_xor (a, mask, __ATOMIC_SEQ_CST) & mask) != 0; > +} > + > +__attribute__((noinline, noclone)) int > +f13 (int *a, int bit) > +{ > + unsigned int mask = (1u << bit); > + return (__sync_fetch_and_and (a, ~mask) & mask) != 0; > +} > + > +__attribute__((noinline, noclone)) int > +f14 (int *a, int bit) > +{ > + unsigned int mask = (1u << bit); > + return (__atomic_fetch_and (a, ~mask, __ATOMIC_RELAXED) & mask) != 0; > +} > + > +__attribute__((noinline, noclone)) int > +f15 (int *a, int bit) > +{ > + unsigned int mask = (1u << bit); > + return (__atomic_fetch_and (a, ~mask, __ATOMIC_SEQ_CST) & mask) != 0; > +} > + > +__attribute__((noinline, noclone)) int > +f16 (int *a) > +{ 
> + unsigned int mask = (1u << 7); > + return (__sync_fetch_and_and (a, ~mask) & mask) != 0; > +} > + > +__attribute__((noinline, noclone)) int > +f17 (int *a) > +{ > + unsigned int mask = (1u << 13); > + return (__atomic_fetch_and (a, ~mask, __ATOMIC_RELAXED) & mask) != 0; > +} > + > +__attribute__((noinline, noclone)) int > +f18 (int *a) > +{ > + unsigned int mask = (1u << 0); > + return (__atomic_fetch_and (a, ~mask, __ATOMIC_SEQ_CST) & mask) != 0; > +} > + > +__attribute__((noinline, noclone)) unsigned long int > +f19 (unsigned long int *a, int bit) > +{ > + unsigned long int mask = (1ul << bit); > + return (__atomic_xor_fetch (a, mask, __ATOMIC_SEQ_CST) & mask) != 0; > +} > + > +__attribute__((noinline, noclone)) unsigned long int > +f20 (unsigned long int *a) > +{ > + unsigned long int mask = (1ul << 7); > + return (__atomic_xor_fetch (a, mask, __ATOMIC_SEQ_CST) & mask) == 0; > +} > + > +__attribute__((noinline, noclone)) int > +f21 (int *a, int bit) > +{ > + unsigned int mask = (1u << bit); > + return (__sync_fetch_and_or (a, mask) & mask); > +} > + > +__attribute__((noinline, noclone)) unsigned long int > +f22 (unsigned long int *a) > +{ > + unsigned long int mask = (1ul << 7); > + return (__atomic_xor_fetch (a, mask, __ATOMIC_SEQ_CST) & mask); > +} > + > +__attribute__((noinline, noclone)) unsigned long int > +f23 (unsigned long int *a) > +{ > + unsigned long int mask = (1ul << 7); > + return (__atomic_fetch_xor (a, mask, __ATOMIC_SEQ_CST) & mask); > +} > + > +__attribute__((noinline, noclone)) unsigned short int > +f24 (unsigned short int *a) > +{ > + unsigned short int mask = (1u << 7); > + return (__sync_fetch_and_or (a, mask) & mask) != 0; > +} > + > +__attribute__((noinline, noclone)) unsigned short int > +f25 (unsigned short int *a) > +{ > + unsigned short int mask = (1u << 7); > + return (__atomic_fetch_or (a, mask, __ATOMIC_SEQ_CST) & mask) != 0; > +} > + > +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*bts" 9 } } */ > +/* { dg-final { 
scan-assembler-times "lock;?\[ \t\]*btc" 10 } } */ > +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btr" 6 } } */ > --- gcc/testsuite/gcc.target/i386/pr49244-2.c.jj 2016-05-02 > 12:51:51.501983254 +0200 > +++ gcc/testsuite/gcc.target/i386/pr49244-2.c 2016-05-02 14:47:30.240202019 > +0200 > @@ -0,0 +1,109 @@ > +/* PR target/49244 */ > +/* { dg-do run } */ > +/* { dg-options "-O2 -g" } */ Probably no need for -g in the testcase? > +/* { dg-additional-options "-march=i486" { target ia32 } } */ Hm, a "dg-do run" testcase with -march is kind of dangerous without a runtime test for the requested architecture. OTOH, do we really need -march here? Uros.