[Bug target/77281] [ARM] Wrong code generated for move of constant vector with mix of signed and unsigned zeros
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77281

--- Comment #3 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Thu Sep  1 08:57:21 2016
New Revision: 239918

URL: https://gcc.gnu.org/viewcvs?rev=239918&root=gcc&view=rev
Log:
[ARM] Fix an invalid check for vectors of the same floating-point constants.

2016-09-01  Matthew Wahab

	PR target/77281
	* config/arm/arm.c (neon_valid_immediate): Delete declaration.
	Use const_vec_duplicate to check for duplicated elements.

Modified:
	branches/gcc-6-branch/gcc/ChangeLog
	branches/gcc-6-branch/gcc/config/arm/arm.c
[Bug tree-optimization/69270] DOM should exploit range information to create more equivalences
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69270 mwahab at gcc dot gnu.org changed: What|Removed |Added CC||mwahab at gcc dot gnu.org --- Comment #6 from mwahab at gcc dot gnu.org --- The new test gcc.dg/tree-ssa/pr69270-3.c fails on aarch64 and arm targets. There seems to be a disparity between the expected output directive and the comment preceding it. The directive seems to look for 4 instances of ", 1" but the comment says that it should be looking for one instance (of a constant argument). In the output generated for aarch64-none-linux-gnu, there is one instance of ", 1" in the line # phi_inserted_4 = PHI Is this just the test or could it be something else going wrong? Matthew
[Bug c++/57728] Explicit template instantiation with defaulted method causes missing symbol
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57728 mwahab at gcc dot gnu.org changed: What|Removed |Added CC||mwahab at gcc dot gnu.org --- Comment #10 from mwahab at gcc dot gnu.org --- For aarch64 and arm, the new test g++.dg/cpp0x/explicit12.C shows as UNRESOLVED and the log complains about output files not existing. The test has { dg-do link .. } and { dg-final { scan-assembler-not .. } } but dg-do link doesn't seem to preserve the assembler files. Was the test intended to set { dg-options "-save-temps" }? Matthew
[Bug tree-optimization/69047] memcpy is not as optimized as union is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69047 mwahab at gcc dot gnu.org changed: What|Removed |Added CC||mwahab at gcc dot gnu.org --- Comment #9 from mwahab at gcc dot gnu.org --- The new test gcc.dg/pr69047.c fails on big-endian aarch64 (aarch64_be-none-elf) because the expected output isn't generated. For big-endian aarch64, the expression in the cddce1 file is _2 = BIT_FIELD_REF ; For little-endian aarch64, the expression is as expected: _2 = (unsigned char) b_6(D); Matthew
[Bug tree-optimization/57326] Piecewise folding of operations on PHI nodes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57326 mwahab at gcc dot gnu.org changed: What|Removed |Added CC||mwahab at gcc dot gnu.org --- Comment #4 from mwahab at gcc dot gnu.org --- I think that this change is the cause of a failure in gcc.dg/autopar/pr46193.c on aarch64-none-linux-gnu. Looking at the parloop2 log when the test is run, neither function in pr46193.c is vectorized because, for both, the log reports 'latch block not empty'. Matthew
[Bug target/77281] [ARM] Wrong code generated for move of constant vector with mix of signed and unsigned zeros
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77281

--- Comment #2 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Fri Aug 19 13:59:18 2016
New Revision: 239610

URL: https://gcc.gnu.org/viewcvs?rev=239610&root=gcc&view=rev
Log:
[ARM] Fix an invalid check for vectors of the same floating-point constants.

2016-08-19  Matthew Wahab

	PR target/77281
	* config/arm/arm.c (neon_valid_immediate): Delete declaration.
	Use const_vec_duplicate to check for duplicated elements.

Modified:
	trunk/gcc/ChangeLog
	trunk/gcc/config/arm/arm.c
[Bug tree-optimization/72824] [5/6 Regression] Signed floating point zero semantics broken at optimization level -O3 (tree-loop-distribute-patterns)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72824 --- Comment #12 from mwahab at gcc dot gnu.org --- (In reply to mwahab from comment #11) > The new test-case gcc.c-torture/execute/ieee/pr72824-2.c is failing for > arm-none-linux-gnueabihf with gcc-6 and trunk. > > I'm still looking into why. > Matthew This is a bug in the arm backend. I've opened PR target/77281 and I'm testing a fix. Matthew
[Bug target/77281] [ARM] Wrong code generated for move of constant vector with mix of signed and unsigned zeros
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77281 mwahab at gcc dot gnu.org changed: What|Removed |Added Target||arm --- Comment #1 from mwahab at gcc dot gnu.org --- I think I've got a fix and I'm testing it now.
[Bug target/77281] New: [ARM] Wrong code generated for move of constant vector with mix of signed and unsigned zeros
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77281

            Bug ID: 77281
           Summary: [ARM] Wrong code generated for move of constant vector
                    with mix of signed and unsigned zeros
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mwahab at gcc dot gnu.org
  Target Milestone: ---

Test gcc.c-torture/execute/ieee/pr72824-2.c fails for arm targets because the
code generated to move a vector of signed and unsigned zeros treats it as a
vector of unsigned zeros. That is, an assignment
  x = { 0.f, -0.f, 0.f, -0.f }
is treated as the assignment
  x = { 0.f, 0.f, 0.f, 0.f }.

This is due to config/arm/arm.c/neon_valid_immediate using real_equal to
compare the vector elements.

Seen on trunk and gcc-6.
[Bug tree-optimization/72824] [5/6 Regression] Signed floating point zero semantics broken at optimization level -O3 (tree-loop-distribute-patterns)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72824 mwahab at gcc dot gnu.org changed: What|Removed |Added CC||mwahab at gcc dot gnu.org --- Comment #11 from mwahab at gcc dot gnu.org --- The new test-case gcc.c-torture/execute/ieee/pr72824-2.c is failing for arm-none-linux-gnueabihf with gcc-6 and trunk. I'm still looking into why. Matthew
[Bug rtl-optimization/69847] Spec 2006 403.gcc slows down with -mlra vs. reload on PowerPC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69847 --- Comment #24 from mwahab at gcc dot gnu.org --- Created attachment 39055 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39055&action=edit Testcase for ICE on arm targets Attached vshuf-v64qi.i for the ICE on arm targets. The failure is triggered with -mfloat-abi=hard -O1. The cc1 command line is: cc1 vshuf-v64qi.i -mfloat-abi=hard -O1 The float-abi option is suspicious but I get the same failure when -march=armv8-a -mfpu=neon-fp-armv8 is added so I don't think that it's the lack of FP registers that's the problem. Matthew
[Bug rtl-optimization/69847] Spec 2006 403.gcc slows down with -mlra vs. reload on PowerPC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69847

mwahab at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mwahab at gcc dot gnu.org

--- Comment #22 from mwahab at gcc dot gnu.org ---
I believe that this patch is the cause of compilation failures for a number of
tests on arm-none-linux-gnueabihf and arm-none-eabi. E.g.

  arm-none-linux-gnueabihf-gcc -S testsuite/gcc.dg/torture/vshuf-v64qi.c -O1

results in an ICE with

testsuite/gcc.dg/torture/vshuf-main.inc: In function ‘test_2’:
testsuite/gcc.dg/torture/vshuf-main.inc:28:1: error: insn does not satisfy its
constraints:
 TESTS
 ^
(insn 606 607 589 2 (set (reg:QI 3 r3 [326])
        (reg:QI 31 s15)) testsuite/gcc.dg/torture/vshuf-main.inc:28 182
     {*arm_movqi_insn}
     (expr_list:REG_EQUAL (const_int 13 [0xd])
        (nil)))
testsuite/gcc.dg/torture/vshuf-main.inc:28:1: internal compiler error: in
extract_constrain_insn, at recog.c:2211
0xaea3e8 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        /work/src/gnutools/armdsg/src/gcc/gcc/rtl-error.c:108
0xaea40f _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        /work/src/gnutools/armdsg/src/gcc/gcc/rtl-error.c:119
0xabcf1d extract_constrain_insn(rtx_insn*)
        /work/src/gnutools/armdsg/src/gcc/gcc/recog.c:2211
0x9d76af check_rtl
        /work/src/gnutools/armdsg/src/gcc/gcc/lra.c:2108
0x9dbfa9 lra(_IO_FILE*)
        /work/src/gnutools/armdsg/src/gcc/gcc/lra.c:2516
0x99106f do_reload
        /work/src/gnutools/armdsg/src/gcc/gcc/ira.c:5384
0x99106f execute
        /work/src/gnutools/armdsg/src/gcc/gcc/ira.c:5568

The test compiles with trunk before the change. Let me know if there's anything
else I should check.
Matthew
[Bug middle-end/71078] x/abs(x) -> sign(1.0,x)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71078 mwahab at gcc dot gnu.org changed: What|Removed |Added CC||mwahab at gcc dot gnu.org --- Comment #4 from mwahab at gcc dot gnu.org --- (In reply to prathamesh3492 from comment #3) > Fixed. The new test pr71078-1.c fails for arm and aarch64 because copysignl isn't generated for fabsl, so the expected output fails to match. Matthew
[Bug middle-end/70920] if ((intptr_t)ptr == 0) doesn't get simplified to if (ptr == 0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70920 mwahab at gcc dot gnu.org changed: What|Removed |Added CC||mwahab at gcc dot gnu.org --- Comment #4 from mwahab at gcc dot gnu.org --- (In reply to prathamesh3492 from comment #3) > Fixed on trunk. I think that this is the cause of a failure in gcc.dg/tree-ssa/pr22051-2.c for arm-none-linux-gnueabihf and arm-none-eabi. With the patch, the scan-tree-dump match in the test file fails because the conditional becomes if (q_2(D) != 0B) rather than the expected if (r_3 != 0) The comment in the test file suggests that the optimization might be wrong. Matthew
[Bug rtl-optimization/71984] [7 Regression] wrong code with -O -mavx512cd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71984 mwahab at gcc dot gnu.org changed: What|Removed |Added CC||mwahab at gcc dot gnu.org --- Comment #7 from mwahab at gcc dot gnu.org --- (In reply to Richard Biener from comment #6) > Should be fixed. The test pr71984.c fails for aarch64 big-endian, with x==2. (The test passes for little-endian.) Matthew
[Bug target/70711] GCC ARM big-endian ARMv8.1 code fails.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70711 mwahab at gcc dot gnu.org changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #5 from mwahab at gcc dot gnu.org --- Fix committed to trunk and to gcc-6.
[Bug target/70711] GCC ARM big-endian ARMv8.1 code fails.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70711 --- Comment #4 from mwahab at gcc dot gnu.org --- Author: mwahab Date: Mon Apr 18 12:18:10 2016 New Revision: 235133 URL: https://gcc.gnu.org/viewcvs?rev=235133&root=gcc&view=rev Log: PR target/70711 * config/arm/bpabi.h (BE8_LINK_SPEC): Add entries for armv8+crc, armv8.1-a and armv8.1-a+crc. Modified: branches/gcc-6-branch/gcc/ChangeLog branches/gcc-6-branch/gcc/config/arm/bpabi.h
[Bug target/70711] GCC ARM big-endian ARMv8.1 code fails.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70711 --- Comment #3 from mwahab at gcc dot gnu.org --- Author: mwahab Date: Mon Apr 18 12:11:03 2016 New Revision: 235132 URL: https://gcc.gnu.org/viewcvs?rev=235132&root=gcc&view=rev Log: PR target/70711 * config/arm/bpabi.h (BE8_LINK_SPEC): Add entries for armv8+crc, armv8.1-a and armv8.1-a+crc. Modified: trunk/gcc/ChangeLog trunk/gcc/config/arm/bpabi.h
[Bug target/70711] New: GCC ARM big-endian ARMv8.1 code fails.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70711

            Bug ID: 70711
           Summary: GCC ARM big-endian ARMv8.1 code fails.
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mwahab at gcc dot gnu.org
  Target Milestone: ---

Created attachment 38301
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38301&action=edit
Fix ARMv8.1-A big-endian builds.

The config/arm/bpabi.h file wasn't updated when ARMv8.1 support went in. This
means that when GCC is targeting big-endian ARMv8.1, the code it generates can
fail to execute.

The attached patch is being tested for armeb-none-eabi.
[Bug tree-optimization/64946] [AArch64] gcc.target/aarch64/vect-abs-compile.c - "abs" vectorization fails for char/short types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64946 mwahab at gcc dot gnu.org changed: What|Removed |Added CC||mwahab at gcc dot gnu.org --- Comment #11 from mwahab at gcc dot gnu.org --- I'm testing a patch that fixes the vect-abs-compile failure by adding a case to convert_to_integer to push the narrowing type conversion to the argument of the ABS_EXPR.
[Bug tree-optimization/68619] [6 Regression] error: loop with header 6 not in loop tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68619 mwahab at gcc dot gnu.org changed: What|Removed |Added CC||mwahab at gcc dot gnu.org --- Comment #8 from mwahab at gcc dot gnu.org --- (In reply to Jeffrey A. Law from comment #6) > Fixed on trunk. The new test case pr68619-4.c is failing for arm-none-eabi because it triggers the error pr68619-4.c:42:17: error: width of 'code' exceeds its type enum rtx_code code:16; For arm-none-eabi, the rtx_code enum is represented in 8 bits, so 'enum rtx_code code:8;' compiles without the error. Matthew
[Bug target/64784] -march=native should be supported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64784 Bug 64784 depends on bug 64783, which changed state. Bug 64783 Summary: -march=armv8.1-a should be supported https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64783 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug target/64783] -march=armv8.1-a should be supported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64783

mwahab at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #4 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Wed Dec 16 11:45:25 2015
New Revision: 231678

URL: https://gcc.gnu.org/viewcvs?rev=231678&root=gcc&view=rev
Log:
[ARM] Add support for ARMv8.1.

	* config/arm/arm-arches.def: Add "armv8.1-a" and "armv8.1-a+crc".
	* config/arm/arm-protos.h (FL2_ARCH8_1): New.
	(FL2_FOR_ARCH8_1A): New.
	* config/arm/arm-tables.opt: Regenerate.
	* config/arm/arm.c (arm_arch8_1): New.
	(arm_option_override): Set arm_arch8_1.
	* config/arm/arm.h (TARGET_NEON_RDMA): New.
	(arm_arch8_1): Declare.
	* doc/invoke.texi (ARM Options, -march): Add "armv8.1-a" and
	"armv8.1-a+crc".
	(ARM Options, -mfpu): Fix a typo.

Modified:
	trunk/gcc/ChangeLog
	trunk/gcc/config/arm/arm-arches.def
	trunk/gcc/config/arm/arm-protos.h
	trunk/gcc/config/arm/arm-tables.opt
	trunk/gcc/config/arm/arm.c
	trunk/gcc/config/arm/arm.h
	trunk/gcc/doc/invoke.texi
[Bug lto/61886] [4.9/5/6 Regression] LTO breaks fread with _FORTIFY_SOURCE=2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61886

mwahab at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mwahab at gcc dot gnu.org

--- Comment #53 from mwahab at gcc dot gnu.org ---
(In reply to Jan Hubicka from comment #52)
> We no longer merge the decls, so hopefully this is fixed, but it would be
> nice to give it a try on the original packages that failed. We still may
> lose/provoke some warnings during IPA optimizations I suppose as IPA code
> does not assume this kind of side effect for aliases, but that is not an LTO
> specific issue anymore.

The new test case alias-2.c fails for aarch64. Compiling from the command line
with gcc -O2 -fdump-tree-all, I get the following for the tree.optimized pass:

;; Function main (main, funcdef_no=0, decl_uid=2753, cgraph_uid=0,
symbol_order=3) (executed once)

main ()
{
  int off.0_2;

  :
  off.0_2 = off;
  b[off.0_2] = 1;
  a[off.0_2] = 2;
  __builtin_abort ();

}

Matthew
[Bug tree-optimization/68333] [6 Regression] FAIL: gcc.dg/vect/slp-multitypes-4.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68333 --- Comment #5 from mwahab at gcc dot gnu.org --- (In reply to mwahab from comment #4) > (In reply to Richard Biener from comment #3) > > Should be fixed now. > > It's still failing for aarch64_be-none-elf. > > The test has a dg-skip-if directive for aarch64-*-*, should that be > aarch64*-*-*? Sorry, I was looking at the wrong test, slp-multitypes-4 is passing for aarch64_be.
[Bug tree-optimization/68333] [6 Regression] FAIL: gcc.dg/vect/slp-multitypes-4.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68333 mwahab at gcc dot gnu.org changed: What|Removed |Added CC||mwahab at gcc dot gnu.org --- Comment #4 from mwahab at gcc dot gnu.org --- (In reply to Richard Biener from comment #3) > Should be fixed now. It's still failing for aarch64_be-none-elf. The test has a dg-skip-if directive for aarch64-*-*, should that be aarch64*-*-*? Matthew
[Bug tree-optimization/68659] [6 regression] FAIL: gcc.dg/graphite/id-pr45230-1.c (internal compiler error)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68659 mwahab at gcc dot gnu.org changed: What|Removed |Added CC||mwahab at gcc dot gnu.org --- Comment #9 from mwahab at gcc dot gnu.org --- I still see this on arm-none-eabi.
[Bug fortran/68534] No error on mismatch in number of arguments between submodule and module interface
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68534

mwahab at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mwahab at gcc dot gnu.org

--- Comment #4 from mwahab at gcc dot gnu.org ---
(In reply to Paul Thomas from comment #3)
> Author: pault
> Date: Mon Nov 30 13:33:27 2015
> New Revision: 231072
>
> URL: https://gcc.gnu.org/viewcvs?rev=231072&root=gcc&view=rev
> Log:
> 2015-11-30  Paul Thomas
>
> 	PR fortran/68534
> 	* decl.c (gfc_match_formal_arglist): Cope with zero formal args
> 	either in interface declaration or in procedure declaration in
> 	submodule.
>
> 2015-11-30  Paul Thomas
>
> 	PR fortran/68534
> 	* gfortran.dg/submodule_13.f08: New test.

Hello

This test is failing on aarch64-none-linux-gnu, it looks like the second
dg-error (on module subroutine bar) doesn't get the output it expected.
Running from the command line, only the first error message is emitted.

Is the test correct?
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 mwahab at gcc dot gnu.org changed: What|Removed |Added Known to fail||4.9.4 --- Comment #74 from mwahab at gcc dot gnu.org --- (In reply to Ramana Radhakrishnan from comment #73) > Now fixed ,Matthew ? It's fixed in trunk and gcc-5 but not in gcc-4.9.
[Bug target/67143] [5/6 Regression] ICE (could not split insn) on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67143

--- Comment #5 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Wed Sep 23 09:48:16 2015
New Revision: 228037

URL: https://gcc.gnu.org/viewcvs?rev=228037&root=gcc&view=rev
Log:
[Aarch64][target/PR 67143][5.2] Backport correct constraints for atomic
operations.

gcc/
2015-09-23  Matthew Wahab

	Backport from mainline
	2015-08-14  Matthew Wahab

	PR target/67143
	* config/aarch64/atomics.md (atomic_): Replace 'lconst_atomic'
	with 'const_atomic'.
	(atomic_fetch_): Likewise.
	(atomic__fetch): Likewise.
	* config/aarch64/iterators.md (lconst-atomic): Move below
	'const_atomic'.
	(const_atomic): New.

gcc/testsuite/
2015-09-23  Matthew Wahab

	Backport from mainline
	2015-08-14  Matthew Wahab
		    Matthias Klose

	PR target/67143
	* gcc.c-torture/compile/pr67143.c: New
	* gcc.target/aarch64/atomic-op-imm.c
	(atomic_fetch_add_negative_RELAXED): New.
	(atomic_fetch_sub_negative_ACQUIRE): New.

Added:
	branches/gcc-5-branch/gcc/testsuite/gcc.c-torture/compile/pr67143.c
Modified:
	branches/gcc-5-branch/gcc/ChangeLog
	branches/gcc-5-branch/gcc/config/aarch64/atomics.md
	branches/gcc-5-branch/gcc/config/aarch64/iterators.md
	branches/gcc-5-branch/gcc/testsuite/ChangeLog
	branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c
[Bug target/67143] [5/6 Regression] ICE (could not split insn) on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67143

--- Comment #3 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Fri Aug 14 15:05:42 2015
New Revision: 226895

URL: https://gcc.gnu.org/viewcvs?rev=226895&root=gcc&view=rev
Log:
gcc/
2015-08-14  Matthew Wahab

	PR target/67143
	* config/aarch64/atomics.md (atomic_): Replace 'lconst_atomic'
	with 'const_atomic'.
	(atomic_fetch_): Likewise.
	(atomic__fetch): Likewise.
	* config/aarch64/iterators.md (lconst-atomic): Move below
	'const_atomic'.
	(const_atomic): New.

gcc/testsuite/
2015-08-14  Matthew Wahab
	    Matthias Klose

	PR target/67143
	* gcc.c-torture/compile/pr67143.c: New
	* gcc.target/aarch64/atomic-op-imm.c
	(atomic_fetch_add_negative_RELAXED): New.
	(atomic_fetch_sub_negative_ACQUIRE): New.

Added:
	trunk/gcc/testsuite/gcc.c-torture/compile/pr67143.c
Modified:
	trunk/gcc/ChangeLog
	trunk/gcc/config/aarch64/atomics.md
	trunk/gcc/config/aarch64/iterators.md
	trunk/gcc/testsuite/ChangeLog
	trunk/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c
[Bug target/67143] [5/6 Regression] ICE (could not split insn) on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67143 mwahab at gcc dot gnu.org changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |mwahab at gcc dot gnu.org --- Comment #2 from mwahab at gcc dot gnu.org --- (In reply to ktkachov from comment #1) > Seems related to something Matthew has been working on recently? No, it looks like something to do with the constraint introduced by https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=217076
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #72 from mwahab at gcc dot gnu.org --- Author: mwahab Date: Wed Aug 5 13:43:04 2015 New Revision: 226628 URL: https://gcc.gnu.org/viewcvs?rev=226628&root=gcc&view=rev Log: Backport from trunk: 2015-06-29 Matthew Wahab PR target/65697 * gcc.target/arm/armv-sync-comp-swap.c: New. * gcc.target/arm/armv-sync-op-acquire.c: New. * gcc.target/arm/armv-sync-op-full.c: New. * gcc.target/arm/armv-sync-op-release.c: New. Added: branches/gcc-5-branch/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c branches/gcc-5-branch/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c branches/gcc-5-branch/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c branches/gcc-5-branch/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c Modified: branches/gcc-5-branch/gcc/testsuite/ChangeLog
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #71 from mwahab at gcc dot gnu.org --- Author: mwahab Date: Wed Aug 5 13:40:14 2015 New Revision: 226627 URL: https://gcc.gnu.org/viewcvs?rev=226627&root=gcc&view=rev Log: Backport from trunk: 2015-06-29 Matthew Wahab PR target/65697 * config/arm/arm.c (arm_split_compare_and_swap): For ARMv8, replace an initial acquire barrier with final barrier. Modified: branches/gcc-5-branch/gcc/ChangeLog branches/gcc-5-branch/gcc/config/arm/arm.c
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #70 from mwahab at gcc dot gnu.org --- Author: mwahab Date: Wed Aug 5 13:27:41 2015 New Revision: 226625 URL: https://gcc.gnu.org/viewcvs?rev=226625&root=gcc&view=rev Log: Backport from trunk: 2015-06-29 Matthew Wahab PR target/65697 * config/armc/arm.c (arm_split_atomic_op): For ARMv8, replace an initial acquire barrier with final barrier. Modified: branches/gcc-5-branch/gcc/ChangeLog branches/gcc-5-branch/gcc/config/arm/arm.c
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #69 from mwahab at gcc dot gnu.org --- Author: mwahab Date: Wed Aug 5 11:48:43 2015 New Revision: 226621 URL: https://gcc.gnu.org/viewcvs?rev=226621&root=gcc&view=rev Log: Backport from trunk 2015-06-01 Matthew Wahab PR target/65697 * gcc.target/aarch64/sync-comp-swap.c: New. * gcc.target/aarch64/sync-comp-swap.x: New. * gcc.target/aarch64/sync-op-acquire.c: New. * gcc.target/aarch64/sync-op-acquire.x: New. * gcc.target/aarch64/sync-op-full.c: New. * gcc.target/aarch64/sync-op-full.x: New. * gcc.target/aarch64/sync-op-release.c: New. * gcc.target/aarch64/sync-op-release.x: New. Added: branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-op-full.c branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-op-full.x branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-op-release.c branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-op-release.x Modified: branches/gcc-5-branch/gcc/testsuite/ChangeLog
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #68 from mwahab at gcc dot gnu.org --- Author: mwahab Date: Wed Aug 5 11:40:25 2015 New Revision: 226620 URL: https://gcc.gnu.org/viewcvs?rev=226620&root=gcc&view=rev Log: Backport from trunk. 2015-06-01 Matthew Wahab PR target/65697 * config/aarch64/aarch64.c (aarch64_split_compare_and_swap): Check for __sync memory models, emit initial loads and final barriers as appropriate. Modified: branches/gcc-5-branch/gcc/ChangeLog branches/gcc-5-branch/gcc/config/aarch64/aarch64.c
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #67 from mwahab at gcc dot gnu.org --- Author: mwahab Date: Wed Aug 5 11:29:28 2015 New Revision: 226619 URL: https://gcc.gnu.org/viewcvs?rev=226619&root=gcc&view=rev Log: Backport from trunk. 2015-06-01 Matthew Wahab PR target/65697 * config/aarch64/aarch64.c (aarch64_emit_post_barrier): New. (aarch64_split_atomic_op): Check for __sync memory models, emit appropriate initial loads and final barriers. Modified: branches/gcc-5-branch/gcc/ChangeLog branches/gcc-5-branch/gcc/config/aarch64/aarch64.c
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #66 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Wed Aug 5 11:20:59 2015
New Revision: 226618

URL: https://gcc.gnu.org/viewcvs?rev=226618&root=gcc&view=rev
Log:
Backport from trunk
2015-05-12  Andrew MacLeod

	PR target/65697
	* coretypes.h (MEMMODEL_SYNC, MEMMODEL_BASE_MASK): New macros.
	(enum memmodel): Add SYNC_{ACQUIRE,RELEASE,SEQ_CST}.
	* tree.h (memmodel_from_int, memmodel_base, is_mm_relaxed)
	(is_mm_consume,is_mm_acquire, is_mm_release, is_mm_acq_rel)
	(is_mm_seq_cst, is_mm_sync): New accessor functions.
	* builtins.c (expand_builtin_sync_operation)
	(expand_builtin_compare_and_swap): Use MEMMODEL_SYNC_SEQ_CST.
	(expand_builtin_sync_lock_release): Use MEMMODEL_SYNC_RELEASE.
	(get_memmodel, expand_builtin_atomic_compare_exchange)
	(expand_builtin_atomic_load, expand_builtin_atomic_store)
	(expand_builtin_atomic_clear): Use new accessor routines.
	(expand_builtin_sync_synchronize): Use MEMMODEL_SYNC_SEQ_CST.
	* optabs.c (expand_compare_and_swap_loop): Use MEMMODEL_SYNC_SEQ_CST.
	(maybe_emit_sync_lock_test_and_set): Use new accessors and
	MEMMODEL_SYNC_ACQUIRE.
	(expand_sync_lock_test_and_set): Use MEMMODEL_SYNC_ACQUIRE.
	(expand_mem_thread_fence, expand_mem_signal_fence, expand_atomic_load)
	(expand_atomic_store): Use new accessors.
	* emit-rtl.c (need_atomic_barrier_p): Add additional enum cases.
	* tsan.c (instrument_builtin_call): Update check for memory model
	beyond final enum to use MEMMODEL_LAST.
	* c-family/c-common.c: Use new accessor for memmodel_base.
	* config/aarch64/aarch64.c (aarch64_expand_compare_and_swap): Use
	new accessors.
	* config/aarch64/atomics.md (atomic_load,atomic_store)
	(arch64_load_exclusive, aarch64_store_exclusive)
	(mem_thread_fence, *dmb): Likewise.
	* config/alpha/alpha.c (alpha_split_compare_and_swap)
	(alpha_split_compare_and_swap_12): Likewise.
	* config/arm/arm.c (arm_expand_compare_and_swap)
	(arm_split_compare_and_swap, arm_split_atomic_op): Likewise.
	* config/arm/sync.md (atomic_load, atomic_store)
	(atomic_loaddi): Likewise.
	* config/i386/i386.c (ix86_destroy_cost_data, ix86_memmodel_check):
	Likewise.
	* config/i386/sync.md (mem_thread_fence, atomic_store): Likewise.
	* config/ia64/ia64.c (ia64_expand_atomic_op): Add new memmodel cases
	and use new accessors.
	* config/ia64/sync.md (mem_thread_fence, atomic_load)
	(atomic_store, atomic_compare_and_swap)
	(atomic_exchange): Use new accessors.
	* config/mips/mips.c (mips_process_sync_loop): Likewise.
	* config/pa/pa.md (atomic_loaddi, atomic_storedi): Likewise.
	* config/rs6000/rs6000.c (rs6000_pre_atomic_barrier)
	(rs6000_post_atomic_barrier): Add new cases.
	(rs6000_expand_atomic_compare_and_swap): Use new accessors.
	* config/rs6000/sync.md (mem_thread_fence): Add new cases.
	(atomic_load): Add new cases and use new accessors.
	(store_quadpti): Add new cases.
	* config/s390/s390.md (mem_thread_fence, atomic_store): Use new
	accessors.
	* config/sparc/sparc.c (sparc_emit_membar_for_model): Use new
	accessors.
	* doc/extend.texi: Update docs to indicate 16 bits are used for
	memory model, not 8.

Modified:
	branches/gcc-5-branch/gcc/ChangeLog
	branches/gcc-5-branch/gcc/builtins.c
	branches/gcc-5-branch/gcc/c-family/c-common.c
	branches/gcc-5-branch/gcc/config/aarch64/aarch64.c
	branches/gcc-5-branch/gcc/config/aarch64/atomics.md
	branches/gcc-5-branch/gcc/config/alpha/alpha.c
	branches/gcc-5-branch/gcc/config/arm/arm.c
	branches/gcc-5-branch/gcc/config/arm/sync.md
	branches/gcc-5-branch/gcc/config/i386/i386.c
	branches/gcc-5-branch/gcc/config/i386/sync.md
	branches/gcc-5-branch/gcc/config/ia64/ia64.c
	branches/gcc-5-branch/gcc/config/ia64/sync.md
	branches/gcc-5-branch/gcc/config/mips/mips.c
	branches/gcc-5-branch/gcc/config/pa/pa.md
	branches/gcc-5-branch/gcc/config/rs6000/rs6000.c
	branches/gcc-5-branch/gcc/config/rs6000/sync.md
	branches/gcc-5-branch/gcc/config/s390/s390.md
	branches/gcc-5-branch/gcc/config/sparc/sparc.c
	branches/gcc-5-branch/gcc/coretypes.h
	branches/gcc-5-branch/gcc/doc/extend.texi
	branches/gcc-5-branch/gcc/emit-rtl.c
	branches/gcc-5-branch/gcc/optabs.c
	branches/gcc-5-branch/gcc/tree.h
	branches/gcc-5-branch/gcc/tsan.c
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #65 from mwahab at gcc dot gnu.org --- Author: mwahab Date: Mon Jun 29 16:12:12 2015 New Revision: 225134 URL: https://gcc.gnu.org/viewcvs?rev=225134&root=gcc&view=rev Log: 2015-06-29 Matthew Wahab PR target/65697 * gcc.target/arm/armv-sync-comp-swap.c: New. * gcc.target/arm/armv-sync-op-acquire.c: New. * gcc.target/arm/armv-sync-op-full.c: New. * gcc.target/arm/armv-sync-op-release.c: New. Added: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c (with props) trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c (with props) trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c (with props) trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c (with props) Modified: trunk/gcc/ChangeLog Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c ('svn:eol-style' added) Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c ('svn:keywords' added) Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c ('svn:eol-style' added) Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c ('svn:keywords' added) Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c ('svn:eol-style' added) Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c ('svn:keywords' added) Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c ('svn:eol-style' added) Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c ('svn:keywords' added)
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #64 from mwahab at gcc dot gnu.org --- Author: mwahab Date: Mon Jun 29 16:09:10 2015 New Revision: 225133 URL: https://gcc.gnu.org/viewcvs?rev=225133&root=gcc&view=rev Log: 2015-06-29 Matthew Wahab PR target/65697 * config/armc/arm.c (arm_split_compare_and_swap): For ARMv8, replace an initial acquire barrier with final barrier. Modified: trunk/gcc/ChangeLog trunk/gcc/config/arm/arm.c
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #63 from mwahab at gcc dot gnu.org --- Author: mwahab Date: Mon Jun 29 16:03:34 2015 New Revision: 225132 URL: https://gcc.gnu.org/viewcvs?rev=225132&root=gcc&view=rev Log: 2015-06-29 Matthew Wahab PR target/65697 * config/arm/arm.c (arm_split_atomic_op): For ARMv8, replace an initial acquire barrier with final barrier. Modified: trunk/gcc/ChangeLog trunk/gcc/config/arm/arm.c
[Bug target/64783] -march=armv8.1-a should be supported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64783 --- Comment #3 from mwahab at gcc dot gnu.org --- I've just noticed this has been assigned to me. Support for -march=armv8.1-a has been added to the Aarch64 backend; the ARM backend is still to be done. Author: mwahab Date: Tue Jun 16 13:38:37 2015 New Revision: 224519 URL: https://gcc.gnu.org/viewcvs?rev=224519&root=gcc&view=rev Log: 2015-06-16 Matthew Wahab * config/aarch64/aarch64-arches.def: Add "armv8.1-a". * config/aarch64/aarch64-option-extensions.def: Update "fp", "simd" and "crypto". Add "lse", "pan", "lor" and "rdma". * gcc/config/aarch64/aarch64.h (AARCH64_FL_LSE): New. (AARCH64_FL_PAN): New. (AARCH64_FL_LOR): New. (AARCH64_FL_RDMA): New. (AARCH64_FL_FOR_ARCH8_1): New. * doc/invoke.texi (AArch64 Options): Add "armv8.1-a" to -march. Add "lse", "pan", "lor", "rdma" to feature modifiers. Modified: trunk/gcc/ChangeLog trunk/gcc/config/aarch64/aarch64-arches.def trunk/gcc/config/aarch64/aarch64-option-extensions.def trunk/gcc/config/aarch64/aarch64.h trunk/gcc/doc/invoke.texi
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #62 from mwahab at gcc dot gnu.org --- (In reply to Ramana Radhakrishnan from comment #61) > Well, confirmed at least. And at the minute fixed on trunk - not sure if we > are asking for backports for this ? Marcus has asked for this to be backported to 5.1 and maybe 4.9. I'm looking at it.
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #60 from mwahab at gcc dot gnu.org --- Author: mwahab Date: Mon Jun 1 15:24:37 2015 New Revision: 223986 URL: https://gcc.gnu.org/viewcvs?rev=223986&root=gcc&view=rev Log: PR target/65697 * gcc.target/aarch64/sync-comp-swap.c: New. * gcc.target/aarch64/sync-comp-swap.x: New. * gcc.target/aarch64/sync-op-acquire.c: New. * gcc.target/aarch64/sync-op-acquire.x: New. * gcc.target/aarch64/sync-op-full.c: New. * gcc.target/aarch64/sync-op-full.x: New. * gcc.target/aarch64/sync-op-release.c: New. * gcc.target/aarch64/sync-op-release.x: New. Added: trunk/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c trunk/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x trunk/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c trunk/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x trunk/gcc/testsuite/gcc.target/aarch64/sync-op-full.c trunk/gcc/testsuite/gcc.target/aarch64/sync-op-full.x trunk/gcc/testsuite/gcc.target/aarch64/sync-op-release.c trunk/gcc/testsuite/gcc.target/aarch64/sync-op-release.x Modified: trunk/gcc/testsuite/ChangeLog
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #59 from mwahab at gcc dot gnu.org --- Author: mwahab Date: Mon Jun 1 15:21:02 2015 New Revision: 223984 URL: https://gcc.gnu.org/viewcvs?rev=223984&root=gcc&view=rev Log: PR target/65697 * config/aarch64/aarch64.c (aarch64_split_compare_and_swap): Check for __sync memory models, emit initial loads and final barriers as appropriate. Modified: trunk/gcc/ChangeLog trunk/gcc/config/aarch64/aarch64.c
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #58 from mwahab at gcc dot gnu.org --- Author: mwahab Date: Mon Jun 1 15:18:19 2015 New Revision: 223983 URL: https://gcc.gnu.org/viewcvs?rev=223983&root=gcc&view=rev Log: PR target/65697 * config/aarch64/aarch64.c (aarch64_emit_post_barrier): New. (aarch64_split_atomic_op): Check for __sync memory models, emit appropriate initial loads and final barriers. Modified: trunk/gcc/ChangeLog trunk/gcc/config/aarch64/aarch64.c
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #56 from mwahab at gcc dot gnu.org --- (In reply to James Greenhalgh from comment #55) > (In reply to torvald from comment #49) > > > This is the case of allowing non-DRF normal accesses. The *other* case I > > was thinking about is how the test would have to look like when *not* > > allowing them. One way to do it would be: > > > > thread_a { > > __sync_lock_test_and_set (foo, 1) > > __sync_lock_test_and_set (bar, 1) // or __sync_lock_release, or __sync RMW > > } > > [..] (it is > a shame that the DRF code you provided will suffer from an extra barrier if > Matthew/Andrew's work is applied, but I think this is a corner case which we > probably don't want to put too much thought in to working around). I'm not familiar with the code but I would have thought that it would be straightforward to optimize away the first dmb. It seems like it would be a simple peephole to spot a sequence of two atomic operations, both with __sync barriers, and replace the first with the equivalent __atomic barrier.
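The back-to-back __sync sequence under discussion can be sketched as below. This is a minimal single-threaded illustration only: the names are hypothetical, and the comments describe the barrier placement the peephole idea targets, not guaranteed code generation.

```c
#include <assert.h>

/* Two adjacent __sync read-modify-write operations. Each implies a full
   barrier on both sides, so the trailing barrier of the first op is made
   redundant by the leading barrier of the second; the suggested peephole
   would weaken the first op's trailing barrier to the __atomic form. */
static int foo = 0;
static int bar = 0;

static int two_adjacent_syncs(void)
{
    __sync_fetch_and_add(&foo, 1);  /* barrier; RMW; barrier <- candidate */
    __sync_fetch_and_add(&bar, 1);  /* barrier; RMW; barrier */
    return foo + bar;
}
```

In a single thread the result is unaffected by which barrier form is emitted; the optimization only changes the generated barrier instructions.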
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #54 from mwahab at gcc dot gnu.org --- (In reply to mwahab from comment #53) > (In reply to Andrew Macleod from comment #50) > > Created attachment 35478 [details] > > implement SYNC flag for memory model > > > > This compiles on all targets, but is only runtime tested on > > x86_64-unknown-linux-gnu with no regressions. > > The patch passes check-gcc for aarch64-none-linux-gnu with no new failures. > I'll also run the tests on arm. > Also passes check-gcc for arm-none-eabi.
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #53 from mwahab at gcc dot gnu.org --- (In reply to Andrew Macleod from comment #50) > Created attachment 35478 [details] > implement SYNC flag for memory model > > This compiles on all targets, but is only runtime tested on > x86_64-unknown-linux-gnu with no regressions. The patch passes check-gcc for aarch64-none-linux-gnu with no new failures. I'll also run the tests on arm. > With this you should be able to easily issue whatever different insn > sequence you need to within an atomic pattern. Give it a try. If it works > as required, then we'll see about executing on all targets for testing... I modified the Aarch64 backend to use the new is_mm_sync test to generate stronger barriers when needed and it all works as required.
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #51 from mwahab at gcc dot gnu.org --- (In reply to Andrew Macleod from comment #50) > Created attachment 35478 [details] > implement SYNC flag for memory model > > > Adding the __sync barriers to coretypes.h is the better approach if more > > than one backend has this problem. Since it seems that only Aarch64 is > > affected, I think its better to go with the target hook. If a new > > architecture gets added with the same problem, it will be easy enough to > > switch to the coretypes.h approach. > > Well, if it affects some target now, it likely to affect some target in the > future as well. Didn't someone also mention there is a potential issue with > PPC somewhere too? The earlier comment (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697#c24) suggests that PPC generates full barriers so won't be affected by this. It looks like Aarch64 is the only current target with the problem. > In any case, I'm not a fan of adding target hooks for this... We ought to > just bite the bullet and integrate it cleanly into the memory model. OK. > I have a patch which adds a SYNC flag to the upper bit of the memory model. > It modifies all the generic code and target patterns to use access routines > (declared in tree.h) instead of hard coding masks and flags (which probably > should have be done in the first place). This makes carrying around the SYNC > flag transparent until you want to look for it. From a quick look, this seems to do what's needed to allow the Aarch64 backend to distinguish the sync and atomic code. > This compiles on all targets, but is only runtime tested on > x86_64-unknown-linux-gnu with no regressions. The mips backend was the only one that stood out as needing some care, because the way it uses the memory models (e.g. in function mips_process_sync_loop) is a little different from the other backends. 
> With this you should be able to easily issue whatever different insn > sequence you need to within an atomic pattern. Give it a try. If it works > as required, then we'll see about executing on all targets for testing... I'll give it a go.
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #48 from mwahab at gcc dot gnu.org --- (In reply to Andrew Macleod from comment #47) > Created attachment 35425 [details] > potential patch to add MEMMODEL_SYNC > > I don't know where we've finally settled on this, but I did prototype out > what involved in adding a new MEMMODEL_SYNC which can be identified and made > stronger than MEMMODEL_SEQ_CST if so desired. Normally it will behave > exactly like MEMMODEL_SEQ_CST. This bootstraps on x86-64 with no testsuite > regressions. > I've been working on patches for this. Adding a new MEMMODEL type was my first approach because it means that gcc has one representation for all the memory models it supports. I decided not to submit the patch because Aarch64 also needs a tag for the __sync acquire used in __sync_lock_test_and_set. Adding that touches more backends than the MEMMODEL_SYNC and some of those changes are not obvious (for the mips backend in particular). The reasons why Aarch64 needs a new acquire barrier are rather tenuous (they're in the earlier comments) and don't seem to justify the possibly invasive changes that would be needed. The approach I'm working on now is to add a target hook to allow a backend to supply its own expansion of calls to the __sync builtins. That makes for smaller changes in the target-independent code and lets the Aarch64 backend add the new barriers as target-specific memory models. The actual code generation goes through the infrastructure for the atomics. Adding the __sync barriers to coretypes.h is the better approach if more than one backend has this problem. Since it seems that only Aarch64 is affected, I think it's better to go with the target hook. If a new architecture gets added with the same problem, it will be easy enough to switch to the coretypes.h approach.
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #46 from mwahab at gcc dot gnu.org --- (In reply to James Greenhalgh from comment #45) > (In reply to mwahab from comment #44) > > And this final sentence is buggy by omission of a mention of memory writes: > > but following memory reads are not prevented from being speculated to > before the barrier. > > Which can be read as forbidding store-store reordering across the barrier. It doesn't say anything about store-store reordering so it's trumped by the stated intention to be compatible with the psABI. > As I say though, I think this is an artefact of transcribing the > documentation, rather than an attempt to provide stronger ordering semantics > in the GCC implementation. Agreed.
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #44 from mwahab at gcc dot gnu.org --- (In reply to James Greenhalgh from comment #43) > (In reply to torvald from comment #37) > > > > I'm not worried about __sync_lock_release, I think the documentation is > > > strong enough and unambiguous. > > > > Are you aware that the GCC's __sync disallow store-store reordering across > > __sync_lock_release, whereas the psABI docs don't? > > No I was not, and even looking exactly for the text you were referring to, > it took > me three attempts to spot it. Yes, I understand now why you are concerned > about > the GCC wording. Perhaps this is just an artefact of a mistake transcribing > the psABI? > > AArch64 is certainly not providing that guarantee just now. I wasn't able to find the text; could you copy it in a reply? In the GCC manual, the only description of __sync_lock_release behaviour is in the last paragraph. That description seems consistent with the function being a release barrier and with the current Aarch64 code generated for it.
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #41 from mwahab at gcc dot gnu.org --- (In reply to torvald from comment #38) > (In reply to Andrew Macleod from comment #34) > > Also, if you look at the IA-64 __sync_lock_release vs. GCC docs' > __sync_lock_release, the latter is like x86/TSO. Do you have any info on > which other semantics __sync was supposed to adhere to? > > One potential way to solve it would be to just require code that uses __sync > to more or less implement an IA-64 or x86 memory model, modulo allowing > compiler-reordering and optimization between adjacent non-__sync memory > accesses. This could be inefficient on ARM (see James' examples) and > perhaps Power too (or not -- see Jakub's comments). If the __sync barriers are as described in the GCC manual, that a barrier is atomic and its restrictions apply to all data references, then the Aarch64 backend doesn't currently emit strong enough barriers. For MEMMODEL_SEQ_CST, the problem was visible enough and the solution I suggested (extending the set of available memmodel types) was simple enough that the changes it would need could be justified. I don't think that's true for the MEMMODEL_ACQUIRE case which seems to be much less likely to be seen and would be rather more disruptive. I believe that Aarch64 is the only current target where the code needs to be strengthened. Since extending the set of memmodels is difficult to justify and (IMO) so is resurrecting the __sync patterns, I suggest just adding a target hook to allow the expansion of __sync calls to be overridden. That will allow Aarch64 to set a target-specific memmodel value, as is currently allowed, which can then be passed through the existing __atomics mechanisms in the middle end through to the Aarch64 backend. No other backend will need to be touched. If it happens that future architectures have a similar problem then we can reconsider whether any changes need to be made in the target-independent expansions. 
Does that sound like a reasonable approach?
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #40 from mwahab at gcc dot gnu.org --- (In reply to Andrew Macleod from comment #25) > Documentation needs updating for sure... The rules have changed under us > since originally SEQ_CST and sync were intended to be the same thing... > Guess I shouldn't have tied our horse to the C++11 cart :-) I've submitted a patch to try to update the __atomics documentation: https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01025.html.
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #36 from mwahab at gcc dot gnu.org --- (In reply to Andrew Macleod from comment #31) > > > Targets that don't need special sync patterns (ie most of them) simply don't > provide them. The expanders see no sync pattern and use SEQ_CST, exactly > like they do today. For current targets that may be true but it assumes that no new target will need the special sync patterns. > sync patterns would only be provided by targets which do need to do > something different. This means there is no potential bug introduction on > unaffected targets.. I was thinking of existing sync patterns in current backends which may have been made redundant by the atomics builtins but are still there. There's a danger that they'll get used even if they're not preferred for that target. There's also the question of whether they're getting tested, assuming that they were only ever generated for the __sync builtins. > Well, it actually returns back to the situation before they were merged. We > use to look for sync patterns too... until I thought they were redundant. I believe that they are redundant (for __sync builtins anyway) since it looks like everything could be done through the atomics + new barrier. > The intention was to deprecate __sync and support just the c++ memory model. > SEQ_CST == SYNC was the original intent and understanding, thus the code > merge. If we decide we want/need to provide a stronger form of SEQ_CST and > call it SYNC, then we can do that as required... but I'm not envisioning a > lot of future additional memory models. Although never say never I suppose. I don't expect that many models will be needed, the set should certainly be kept as small as possible. I think that extending it in this case is justified because of the gap between what is needed and what is now available. The choice seems to be between:
- continuing the move away from the syncs to the atomics, which makes the __sync and the __atomic builtins rely on one infrastructure, or
- keeping both the atomics and the syncs infrastructures, with individual targets choosing between them.
The first option seems better, not least because it reduces the number of things that need to be understood when dealing with synchronization primitives.
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #33 from mwahab at gcc dot gnu.org --- (In reply to torvald from comment #32) > (In reply to James Greenhalgh from comment #28) > > (In reply to torvald from comment #24) > > > 3) We could do something just on ARM (and scan other arcs for similar > > > issues). That's perhaps the cleanest option. > > > > Which leaves 3). From Andrew's two proposed solutions: > > 3) Also seems best to me. 2) is worst, 1) is too much of a stick. > > > This also gives us an easier route to fixing any issues with the > > acquire/release __sync primitives (__sync_lock_test_and_set and > > __sync_lock_release) if we decide that these also need to be stronger than > > their C++11 equivalents. > > I don't think we have another case of different __sync vs. __atomics > semantics in case of __sync_lock_test_and_set. The current specification > makes it clear that this is an acquire barrier, and how it describes the > semantics (ie, loads and stores that are program-order before the acquire op > can move to after it) , this seems to be consistent with the effects C11 > specifies for acquire MO (with perhaps the distinction that C11 is clear > that acquire needs to be paired with some release op to create an ordering > constraint). Thanks, I suspect that the acquire barrier may not be as much of an issue as I had remembered. (The issue came up while I was trying to understand the C11 semantics.) The test case (aarch64) I have is:

int foo = 0;
int bar = 0;

int T5(void)
{
  int x = __sync_lock_test_and_set(&foo, 1);
  return bar;
}

.L11:
    ldaxr w2, [x0]      ; load-acquire
    stxr  w3, w1, [x0]  ; store
    cbnz  w3, .L11
    ldr   w0, [x0, 4]   ; load
    ret

My concern was that the load could be speculated ahead of the store. Since the store marks the end of the barrier, that could make it appear as if the load had completed before the acquire-barrier. In retrospect, I don't think that there will be a problem because any load that could be moved would have to end up with the same value as if it had not moved.
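The acquire/release __sync pair discussed above is conventionally used as a simple spinlock; the following is a minimal single-threaded sketch of that pairing (names are illustrative), not the test case from this thread.

```c
#include <assert.h>

/* __sync_lock_test_and_set is the acquire barrier of the pair,
   __sync_lock_release the release barrier. */
static int lock = 0;
static int counter = 0;

static void spin_lock(void)
{
    /* Atomically write 1 and return the previous value; loop while the
       previous value was 1 (i.e. someone else holds the lock). */
    while (__sync_lock_test_and_set(&lock, 1))
        ;
}

static void spin_unlock(void)
{
    __sync_lock_release(&lock);  /* store 0 with release semantics */
}

static int guarded_increment(void)
{
    spin_lock();
    ++counter;      /* protected by the acquire/release pair */
    spin_unlock();
    return counter;
}
```

The ordering question in the comment above is whether the load of `bar` after the test-and-set can be observed before the lock-acquisition completes; the lock idiom is why that matters.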
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #30 from mwahab at gcc dot gnu.org --- (In reply to James Greenhalgh from comment #28) > Which leaves 3). From Andrew's two proposed solutions: > > > a) introduce an additional memory model... MEMMODEL_SYNC or something > > which is even stronger than SEQ_CST. This would involve fixing any places > > that assume SEQ_CST is the highest. And then we use this from the places > > that expand sync calls as the memory model. > > This seems a sensible approach and leads to a nicer design, but I worry that > it might be overkill for primitives which we ultimately want to reduce > support for and eventually deprecate. I don't understand why it would be overkill. It's already expected that not all barriers will be meaningful for all targets, and target code to handle redundant barriers usually comes down to a few clauses in a conditional statement. > > (b) may be easier to implement, but puts more onus on the target.. > > probably involves more complex patterns since you need both atomic and > > sync patterns. My guess is some duplication of code will occur here. But > > the impact is only to specific targets. > > When I looked at this problem internally a few weeks back, this is exactly > how I expected things to work (we can add the documentation for the pattern > names to the list of things which need fixing, as it took me a while to > figure out why my new patterns were never expanded!). > > I don't think this is particularly onerous for a target. The tough part in > all of this is figuring out the minimal cost instructions at the ISA level > to use for the various __atomic primitives. Extending support to a stronger > model, should be straightforward given explicit documentation of the > stronger ordering requirements. > My objection to using sync patterns is that it does take more work, both for the initial implementation and for continuing maintenance. 
It also means adding sync patterns to targets that would not otherwise need them and preserving a part of the gcc infrastructure that is only needed for a legacy feature and could otherwise be targeted for removal. > Certainly, a target would have to do the same legwork for a) regardless, and > would have to pollute their atomic patterns with checks and code paths for > MEMMODEL_SYNC. Actually, the changes needed for (a) are very simple. The checks and code-paths for handling barriers are already there, and reusing them for MEMMODEL_SYNC is trivial. The resulting code is much easier to understand, and safely fix, because it is all in the same place, than if it was spread out across patterns and functions. > This also gives us an easier route to fixing any issues with the > acquire/release __sync primitives (__sync_lock_test_and_set and > __sync_lock_release) if we decide that these also need to be stronger than > their C++11 equivalents. This seems like a chunky workaround to avoid having to add a barrier representation to gcc. It's a, not necessarily straightforward, reworking of the middle-end to use patterns that would need to be added to architectures that don't currently have them and at least checked in the architectures that do. This seems to come down to what enum memmodel is supposed to be for. If it is intended to represent the barriers provided by C11/C++11 atomics and nothing else, then a workaround seems unavoidable. If it is meant to represent barriers needed by gcc to compile user code then, it seems to me, it would be better to just add the barrier to the enum and update code where necessary. Since memmodel is relatively recent, was merged from what looks like the C++ memory model branch (cxx-mem-model), and doesn't seem to have changed since then, it's maybe not surprising that it doesn't already include everything needed by gcc. I don't see that adding to it should be prohibited, provided the additions can be shown to be strictly required.
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #29 from mwahab at gcc dot gnu.org --- (In reply to mwahab from comment #27) > (In reply to Andrew Macleod from comment #25) > > My opinion: > > > > 1) is undesirable... even though it could possibly accelerate the conversion > > of legacy sync to atomic calls... I fear it would instead just cause > > frustration, annoyance and worse. I don't think we need to actually > > penalize sync for no reason better than a 1 in a somethillion shot in the > > dark. > > > > 2) Similar reasoning, I don't think we need to penalize SEQ_CST everywhere > > for a legacy sync call that probably doesn't need stronger code either. > > > > which leaves option 3) > > > > There are 2 primary options I think There may be a third option, which is to set up a target hook to allow the sync expansions in the middle end to be overridden. This limits the changes to the backends that need the different semantics without having to continue with the sync_ patterns (which aren't in aarch64 for example). There's space in the memmodel values to support target-specific barriers so, for aarch64, this would allow the atomics code to be reused. I don't know much about the target hook infrastructure though so I don't know how disruptive a new hook would be.
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #27 from mwahab at gcc dot gnu.org --- (In reply to Andrew Macleod from comment #25) > My opinion: > > 1) is undesirable... even though it could possibly accelerate the conversion > of legacy sync to atomic calls... I fear it would instead just cause > frustration, annoyance and worse. I don't think we need to actually > penalize sync for no reason better than a 1 in a somethillion shot in the > dark. > > 2) Similar reasoning, I don't think we need to penalize SEQ_CST everywhere > for a legacy sync call that probably doesn't need stronger code either. > > which leaves option 3) > > There are 2 primary options I think > > a) introduce an additional memory model... MEMMODEL_SYNC or something which > is even stronger than SEQ_CST. This would involve fixing any places that > assume SEQ_CST is the highest. And then we use this from the places that > expand sync calls as the memory model. > > or > b) When we are expanding sync calls, we first look for target __sync > patterns instead of atomic patterns. If they exist, we use those. Otherwise > we simply expand like we do today. > > > (b) may be easier to implement, but puts more onus on the target.. probably > involves more complex patterns since you need both atomic and sync patterns. > My guess is some duplication of code will occur here. But the impact is > only to specific targets. > > (a) Is probably cleaner... just touches a lot more code. Since we're in > stage 1, maybe (a) is a better long term solution...? > Adding a new barrier specifier to enum memmodel seems the simplest approach. It would mean that all barriers used in gcc are represented in the same way and that would make them more visible and easier to understand than splitting them between the enum memmodel and the atomic/sync patterns. I'm testing a patch-set for option (a). 
It touches several backends but all the changes are very simple since the new barrier is the same as MEMMODEL_SEQ_CST for most targets. One thing, though, is that the aarch64 code for __sync_lock_test_and_set may have a similar problem with a too-weak barrier. It's not confirmed that there is a problem but, since it may mean more work in this area, I thought I'd mention it.
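Option (a) can be sketched in plain C as a flag bit carried on top of the C++11 models. The names below mirror the MEMMODEL_SYNC idea discussed here, but the values and helper names are hypothetical, not GCC's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: the __sync variant rides as an extra bit on the
   memory model, so generic code can pass it through unchanged and only
   a backend that cares needs to test for it. */
enum memmodel
{
  MEMMODEL_RELAXED = 0,
  MEMMODEL_CONSUME = 1,
  MEMMODEL_ACQUIRE = 2,
  MEMMODEL_RELEASE = 3,
  MEMMODEL_ACQ_REL = 4,
  MEMMODEL_SEQ_CST = 5,
  MEMMODEL_SYNC_FLAG = 8,   /* illustrative "upper bit" */
  MEMMODEL_SYNC_SEQ_CST = MEMMODEL_SEQ_CST | MEMMODEL_SYNC_FLAG
};

/* Strip the flag to recover the base C++11 model.  */
static inline enum memmodel memmodel_base(enum memmodel m)
{
  return (enum memmodel) (m & ~MEMMODEL_SYNC_FLAG);
}

/* True if the model originated from a __sync builtin.  */
static inline bool is_mm_sync(enum memmodel m)
{
  return (m & MEMMODEL_SYNC_FLAG) != 0;
}
```

A backend that needs the stronger __sync barrier checks `is_mm_sync` in its expander; every other target just sees the base model.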
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #26 from mwahab at gcc dot gnu.org --- (In reply to torvald from comment #21) > (In reply to Andrew Haley from comment #20) > > (In reply to mwahab from comment #19) > > > (In reply to Andrew Haley from comment #18) > > > > > > It looks inconsistent with C11 S7.17.7.4-2 (C++11 S29.6.4-21) "Further, if > > > the comparison is true, memory is affected according to the value of > > > success, and if the comparison is false, memory is affected according to > > > the > > > value of failure." (where success and failure are the memory model > > > arguments.) In this case, the write to *exp should be > > > memory_order_seq_cst. > > > > But no store actually takes place, so the only effect is that of the read. > > You can't have a sequentially consistent store without a store. > > I agree. If you continue reading in the C++11 paragraph that you cited, > you'll see that if just one MO is provided and the CAS fails, an acq_rel MO > is downgraded to acquire and a release MO to relaxed. This is consistent > with no update of the atomic variable (note that expected is not atomic, so > applying an MO to accesses to it is not meaningful in any case). However, > if the provided MO is seq_cst, I think a failed CAS still needs to be > equivalent to a seq_cst load. Yes, the last two sentences in the C++11 paragraph make it clear: "If the operation returns true, these operations are atomic read-modify-write operations (1.10). Otherwise, these operations are atomic load operations." In that case, the Aarch64 code looks ok.
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #19 from mwahab at gcc dot gnu.org --- (In reply to Andrew Haley from comment #18) > (In reply to mwahab from comment #17) > > > > > int cas(int* barf, int* expected, int* desired) > > { > > return __atomic_compare_exchange_n(barf, expected, desired, 0, > > __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST); > > } > > > > cas: > > ldr w3, [x1] > > .L3: > > ldaxr w4, [x0] > > cmp w4, w3 > > bne .L4 > > stlxr w5, w2, [x0] ; store-release > > cbnz w5, .L3 > > .L4: > > cset w0, eq > > cbz w0, .L6 > > ret > > .p2align 3 > > .L6: > > str w4, [x1] ; store, no barrier. > > ret > > > > > > This looks odd to me but I'd need look into it more. > > That looks fine to me: if the CAS fails, the prev value -> *expected. It looks inconsistent with C11 S7.17.7.4-2 (C++11 S29.6.4-21) "Further, if the comparison is true, memory is affected according to the value of success, and if the comparison is false, memory is affected according to the value of failure." (where success and failure are the memory model arguments.) In this case, the write to *exp should be memory_order_seq_cst.
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #17 from mwahab at gcc dot gnu.org --- According to the GCC documentation, __atomic_compare_exchange(ptr, exp, des, ..) is:

if (*ptr == *exp) *ptr = *des; else *exp = *ptr;

On Aarch64 the else (*ptr != *exp) branch is a store rather than a store-release. This doesn't appear in your example but it does with

int cas(int* barf, int* expected, int* desired)
{
  return __atomic_compare_exchange_n(barf, expected, desired, 0,
                                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

cas:
    ldr   w3, [x1]
.L3:
    ldaxr w4, [x0]
    cmp   w4, w3
    bne   .L4
    stlxr w5, w2, [x0]  ; store-release
    cbnz  w5, .L3
.L4:
    cset  w0, eq
    cbz   w0, .L6
    ret
    .p2align 3
.L6:
    str   w4, [x1]      ; store, no barrier.
    ret

This looks odd to me but I'd need to look into it more.
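The failure path under discussion can be observed directly from C: when the compare fails, the builtin performs an atomic load of *ptr and writes it back into *expected (the "*exp = *ptr" branch). A minimal sketch, with an illustrative wrapper name:

```c
#include <assert.h>

static int slot = 5;

/* Strong compare-exchange on `slot`: on success stores `desired`;
   on failure copies the observed value of `slot` into *expected. */
static int try_cas(int *expected, int desired)
{
    return __atomic_compare_exchange_n(&slot, expected, desired,
                                       0 /* strong */,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

Whether that write-back of *expected needs any barrier at all is the question raised above; expected is not atomic, so the interesting ordering is on the load of *ptr, not the plain store into *expected.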
[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 --- Comment #14 from mwahab at gcc dot gnu.org --- (In reply to Andrew Haley from comment #13) > There's surely a documentation problem here. > > GCC defines this: > > `__ATOMIC_SEQ_CST' > Full barrier in both directions and synchronizes with acquire > loads and release stores in all threads. The documentation is a little misleading. Barriers in __atomic operations are slightly weaker than barriers in __atomic fences; the differences are rather subtle though. For the __sync implementation a fence would be ok but an operation is too weak. > But LDAXR/STLXR doesn't do that, and there's no write barrier at all when > the compare fails. If the intention really is to map onto c++11, this > specification is wrong. The LDAXR/STLXR sequences rely on the C11/C++11 prohibition of data races. That the __atomic builtins assume this restriction is implied by the references to C11/C++11 in the documentation but should probably be made clearer. I'll try to write a patch, if nobody else gets there first. I'll have to look at the compare-exchange code, it does look like it goes wrong.
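The fence-vs-operation distinction drawn in the comment above can be shown from C. Both forms below return the same value in a single thread; the difference is the strength of the barrier the target emits. This is a sketch for illustration, with hypothetical names.

```c
#include <assert.h>

static int word = 42;

/* A standalone seq_cst fence is a full two-way barrier, independent of
   any particular access. */
static int load_then_full_fence(void)
{
    int v = __atomic_load_n(&word, __ATOMIC_RELAXED);
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    return v;
}

/* A seq_cst operation attaches the ordering to this one access, which is
   the slightly weaker form the comment describes. */
static int load_seq_cst(void)
{
    return __atomic_load_n(&word, __ATOMIC_SEQ_CST);
}
```

On AArch64 the operation form can use acquire/release instructions (LDAR/LDAXR etc.), while the standalone fence is a full dmb; that gap is what makes mapping the __sync builtins onto the __atomic operations too weak.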