[Bug target/77281] [ARM] Wrong code generated for move of constant vector with mix of signed and unsigned zeros

2016-09-01 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77281

--- Comment #3 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Thu Sep  1 08:57:21 2016
New Revision: 239918

URL: https://gcc.gnu.org/viewcvs?rev=239918&root=gcc&view=rev
Log:
[ARM] Fix an invalid check for vectors of the same floating-point constants.

2016-09-01  Matthew Wahab  

PR target/77281
* config/arm/arm.c (neon_valid_immediate): Delete declaration.
Use const_vec_duplicate to check for duplicated elements.


Modified:
branches/gcc-6-branch/gcc/ChangeLog
branches/gcc-6-branch/gcc/config/arm/arm.c

[Bug tree-optimization/69270] DOM should exploit range information to create more equivalences

2016-08-30 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69270

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

 CC||mwahab at gcc dot gnu.org

--- Comment #6 from mwahab at gcc dot gnu.org ---
The new test gcc.dg/tree-ssa/pr69270-3.c fails on aarch64 and arm targets.

There seems to be a disparity between the expected output directive and the
comment preceding it. The directive seems to look for 4 instances of ", 1" but
the comment says that it should be looking for one instance (of a constant
argument).

In the output generated for aarch64-none-linux-gnu, there is one instance of ",
1" in the line

  # phi_inserted_4 = PHI 


Is this just the test or could it be something else going wrong?
Matthew

[Bug c++/57728] Explicit template instantiation with defaulted method causes missing symbol

2016-08-30 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57728

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

 CC||mwahab at gcc dot gnu.org

--- Comment #10 from mwahab at gcc dot gnu.org ---
For aarch64 and arm, the new test g++.dg/cpp0x/explicit12.C shows as UNRESOLVED
and the log complain about output files not existing.

The test has { dg-do link .. } and { dg-final { scan-assembler-not .. } } but
dg-do link doesn't seem to preserve the assembler files. Was the test intended
to have a { dg-options "-save-temps" } set?

Matthew

[Bug tree-optimization/69047] memcpy is not as optimized as union is

2016-08-30 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69047

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

 CC||mwahab at gcc dot gnu.org

--- Comment #9 from mwahab at gcc dot gnu.org ---
The new test gcc.dg/pr69047.c fails on big-endian aarch64 (aarch64_be-none-elf)
because the expected output isn't generated.

For big-endian aarch64, the expression in the cddce1 file is

  _2 = BIT_FIELD_REF ;


For little-endian aarch64, the expression is as expected:

  _2 = (unsigned char) b_6(D);


Matthew

[Bug tree-optimization/57326] Piecewise folding of operations on PHI nodes

2016-08-22 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57326

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

 CC||mwahab at gcc dot gnu.org

--- Comment #4 from mwahab at gcc dot gnu.org ---
I think that this change is the cause of a failure in gcc.dg/autopar/pr46193.c
on aarch64-none-linux-gnu.

Looking at the parloop2 log when the test is run, neither function in pr46193.c
is vectorized because for both 'latch block not empty'.

Matthew

[Bug target/77281] [ARM] Wrong code generated for move of constant vector with mix of signed and unsigned zeros

2016-08-19 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77281

--- Comment #2 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Fri Aug 19 13:59:18 2016
New Revision: 239610

URL: https://gcc.gnu.org/viewcvs?rev=239610&root=gcc&view=rev
Log:
[ARM] Fix an invalid check for vectors of the same floating-point constants.

2016-08-19  Matthew Wahab  

PR target/77281
* config/arm/arm.c (neon_valid_immediate): Delete declaration.
Use const_vec_duplicate to check for duplicated elements.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/arm/arm.c

[Bug tree-optimization/72824] [5/6 Regression] Signed floating point zero semantics broken at optimization level -O3 (tree-loop-distribute-patterns)

2016-08-17 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72824

--- Comment #12 from mwahab at gcc dot gnu.org ---
(In reply to mwahab from comment #11)
> The new test-case gcc.c-torture/execute/ieee/pr72824-2.c is failing for
> arm-none-linux-gnueabihf with gcc-6 and trunk.
> 
> I'm still looking into why.
> Matthew

This is a bug in the arm backend. I've opened PR target/77281 and I'm testing a
fix.
Matthew

[Bug target/77281] [ARM] Wrong code generated for move of constant vector with mix of signed and unsigned zeros

2016-08-17 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77281

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

 Target||arm

--- Comment #1 from mwahab at gcc dot gnu.org ---
I think I've got a fix and I'm testing it now.

[Bug target/77281] New: [ARM] Wrong code generated for move of constant vector with mix of signed and unsigned zeros

2016-08-17 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77281

Bug ID: 77281
   Summary: [ARM] Wrong code generated for move of constant vector
with mix of signed and unsigned zeros
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mwahab at gcc dot gnu.org
  Target Milestone: ---

Test gcc.c-torture/execute/ieee/pr72824-2.c fails for arm targets because the
code generated to move a vector of signed and unsigned zeros treats it as a
vector of unsigned zeros.

That is, an assignment x = { 0.f, -0.f, 0.f, -0.f } is treated as the
assignment x = { 0.f, 0.f, 0.f, 0.f }. 

This is due to config/arm/arm.c/neon_valid_immediate using real_equal to
compare the vector elements.

Seen on trunk and gcc-6.

[Bug tree-optimization/72824] [5/6 Regression] Signed floating point zero semantics broken at optimization level -O3 (tree-loop-distribute-patterns)

2016-08-16 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72824

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

 CC||mwahab at gcc dot gnu.org

--- Comment #11 from mwahab at gcc dot gnu.org ---
The new test-case gcc.c-torture/execute/ieee/pr72824-2.c is failing for
arm-none-linux-gnueabihf with gcc-6 and trunk.

I'm still looking into why.
Matthew

[Bug rtl-optimization/69847] Spec 2006 403.gcc slows down with -mlra vs. reload on PowerPC

2016-08-05 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69847

--- Comment #24 from mwahab at gcc dot gnu.org ---
Created attachment 39055
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39055&action=edit
Testcase for ICE on arm targets

Attached vshuf-v64qi.i  for the ICE on arm targets.

The failure is triggered with -mfloat-abi=hard -O1.
The cc1 command line is:

cc1  vshuf-v64qi.i -mfloat-abi=hard -O1


The float-abi option is suspicious but I get the same failure when
-march=armv8-a -mfpu=neon-fp-armv8 is added so I don't think that it's the lack
of FP registers that's the problem.

Matthew

[Bug rtl-optimization/69847] Spec 2006 403.gcc slows down with -mlra vs. reload on PowerPC

2016-08-04 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69847

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

 CC||mwahab at gcc dot gnu.org

--- Comment #22 from mwahab at gcc dot gnu.org ---
I believe that this patch is the cause of compilation failures for a number of
tests on arm-none-linux-gnueabihf and arm-none-eabi.

E.g. arm-none-linux-gnueabihf-gcc -S testsuite/gcc.dg/torture/vshuf-v64qi.c -O1
results in an ICE with

testsuite/gcc.dg/torture/vshuf-main.inc: In function ‘test_2’:
testsuite/gcc.dg/torture/vshuf-main.inc:28:1: error: insn does not satisfy its
constraints:
 TESTS
 ^
(insn 606 607 589 2 (set (reg:QI 3 r3 [326])
(reg:QI 31 s15)) testsuite/gcc.dg/torture/vshuf-main.inc:28 182
{*arm_movqi_insn}
 (expr_list:REG_EQUAL (const_int 13 [0xd])
(nil)))
testsuite/gcc.dg/torture/vshuf-main.inc:28:1: internal compiler error: in
extract_constrain_insn, at recog.c:2211
0xaea3e8 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
/work/src/gnutools/armdsg/src/gcc/gcc/rtl-error.c:108
0xaea40f _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
/work/src/gnutools/armdsg/src/gcc/gcc/rtl-error.c:119
0xabcf1d extract_constrain_insn(rtx_insn*)
/work/src/gnutools/armdsg/src/gcc/gcc/recog.c:2211
0x9d76af check_rtl
/work/src/gnutools/armdsg/src/gcc/gcc/lra.c:2108
0x9dbfa9 lra(_IO_FILE*)
/work/src/gnutools/armdsg/src/gcc/gcc/lra.c:2516
0x99106f do_reload
/work/src/gnutools/armdsg/src/gcc/gcc/ira.c:5384
0x99106f execute
/work/src/gnutools/armdsg/src/gcc/gcc/ira.c:5568


The test compiles with trunk before the change.

Let me know if there's anything else I should check.
Matthew

[Bug middle-end/71078] x/abs(x) -> sign(1.0,x)

2016-08-04 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71078

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

 CC||mwahab at gcc dot gnu.org

--- Comment #4 from mwahab at gcc dot gnu.org ---
(In reply to prathamesh3492 from comment #3)
> Fixed.

The new test pr71078-1.c fails for arm and aarch64 because copysignl isn't 
generated for fabsl making the expected output fail to match.

Matthew

[Bug middle-end/70920] if ((intptr_t)ptr == 0) doesn't get simplified to if (ptr == 0)

2016-08-03 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70920

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

 CC||mwahab at gcc dot gnu.org

--- Comment #4 from mwahab at gcc dot gnu.org ---
(In reply to prathamesh3492 from comment #3)
> Fixed on trunk.

I think that this is the cause of a failure in gcc.dg/tree-ssa/pr22051-2.c for
arm-none-linux-gnueabihf and arm-none-eabi. 

With the patch, the scan-tree-dump match in the test file fails because the
conditional becomes

if (q_2(D) != 0B)

rather than the expected

if (r_3 != 0)


The comment in the test file suggests that the optimization might be wrong.

Matthew

[Bug rtl-optimization/71984] [7 Regression] wrong code with -O -mavx512cd

2016-08-03 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71984

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

 CC||mwahab at gcc dot gnu.org

--- Comment #7 from mwahab at gcc dot gnu.org ---
(In reply to Richard Biener from comment #6)
> Should be fixed.

The test pr71984.c fails for aarch64 big-endian, with x==2. (The test passes
for little-endian.)

Matthew

[Bug target/70711] GCC ARM big-endian ARMv8.1 code fails.

2016-04-18 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70711

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from mwahab at gcc dot gnu.org ---
Fix committed to trunk and to gcc-6.

[Bug target/70711] GCC ARM big-endian ARMv8.1 code fails.

2016-04-18 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70711

--- Comment #4 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Mon Apr 18 12:18:10 2016
New Revision: 235133

URL: https://gcc.gnu.org/viewcvs?rev=235133&root=gcc&view=rev
Log:
PR target/70711
* config/arm/bpabi.h (BE8_LINK_SPEC): Add entries for armv8+crc,
armv8.1-a and armv8.1-a+crc.


Modified:
branches/gcc-6-branch/gcc/ChangeLog
branches/gcc-6-branch/gcc/config/arm/bpabi.h

[Bug target/70711] GCC ARM big-endian ARMv8.1 code fails.

2016-04-18 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70711

--- Comment #3 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Mon Apr 18 12:11:03 2016
New Revision: 235132

URL: https://gcc.gnu.org/viewcvs?rev=235132&root=gcc&view=rev
Log:
PR target/70711
* config/arm/bpabi.h (BE8_LINK_SPEC): Add entries for armv8+crc,
armv8.1-a and armv8.1-a+crc.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/arm/bpabi.h

[Bug target/70711] New: GCC ARM big-endian ARMv8.1 code fails.

2016-04-18 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70711

Bug ID: 70711
   Summary: GCC ARM big-endian ARMv8.1 code fails.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mwahab at gcc dot gnu.org
  Target Milestone: ---

Created attachment 38301
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38301&action=edit
Fix ARMv8.1-A big-endian builds.

The config/arm/bpabi.h file wasn't updated when ARMv8.1 support went in. This
means that when GCC is targeting big-endian ARMv8.1, the code it generates can
fail to execute.

The attached patch is being tested for armeb-none-eabi.

[Bug tree-optimization/64946] [AArch64] gcc.target/aarch64/vect-abs-compile.c - "abs" vectorization fails for char/short types

2016-01-07 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64946

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

 CC||mwahab at gcc dot gnu.org

--- Comment #11 from mwahab at gcc dot gnu.org ---
I'm testing a patch that fixes the vect-abs-compile failure by adding a case to
 convert_to_integer to push the narrowing type conversion to the argument of
the ABS_EXPR.

[Bug tree-optimization/68619] [6 Regression] error: loop with header 6 not in loop tree

2015-12-16 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68619

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

 CC||mwahab at gcc dot gnu.org

--- Comment #8 from mwahab at gcc dot gnu.org ---
(In reply to Jeffrey A. Law from comment #6)
> Fixed on trunk.

The new test case pr68619-4.c is failing for arm-none-eabi because it triggers
the error

pr68619-4.c:42:17: error: width of 'code' exceeds its type
   enum rtx_code code:16;


For arm-none-eabi, the rtx_code enum is being represented in 8-bits.
so 'enum rtx_code code:8;' compiles without the error.

Matthew

[Bug target/64784] -march=native should be supported

2015-12-16 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64784
Bug 64784 depends on bug 64783, which changed state.

Bug 64783 Summary: -march=armv8.1-a should be supported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64783

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug target/64783] -march=armv8.1-a should be supported

2015-12-16 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64783

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Wed Dec 16 11:45:25 2015
New Revision: 231678

URL: https://gcc.gnu.org/viewcvs?rev=231678&root=gcc&view=rev
Log:
[ARM] Add support for ARMv8.1.

* config/arm/arm-arches.def: Add "armv8.1-a" and "armv8.1-a+crc".
* config/arm/arm-protos.h (FL2_ARCH8_1): New.
(FL2_FOR_ARCH8_1A): New.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.c (arm_arch8_1): New.
(arm_option_override): Set arm_arch8_1.
* config/arm/arm.h (TARGET_NEON_RDMA): New.
(arm_arch8_1): Declare.
* doc/invoke.texi (ARM Options, -march): Add "armv8.1-a" and
"armv8.1-a+crc".
(ARM Options, -mfpu): Fix a typo.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/arm/arm-arches.def
trunk/gcc/config/arm/arm-protos.h
trunk/gcc/config/arm/arm-tables.opt
trunk/gcc/config/arm/arm.c
trunk/gcc/config/arm/arm.h
trunk/gcc/doc/invoke.texi

[Bug lto/61886] [4.9/5/6 Regression] LTO breaks fread with _FORTIFY_SOURCE=2

2015-12-16 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61886

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

 CC||mwahab at gcc dot gnu.org

--- Comment #53 from mwahab at gcc dot gnu.org ---
(In reply to Jan Hubicka from comment #52)
> We no longer merge the decls, so hopefully this is fixed, but it would be
> nice to give it a try on the original packages that failed.  We still may
> lose/provoke some warnings during IPA optimizations I suppose as IPA code
> does not assume this kind of side effect for aliases, but that is not an LTO
> specific issue anymore.

The new test case alias-2.c fails for aarch64.

Compiling from the command line with gcc -O2 -fdump-tree-all, I get the
following for the tree.optimized pass
-
;; Function main (main, funcdef_no=0, decl_uid=2753, cgraph_uid=0,
symbol_order=3) (executed once)

main ()
{
  int off.0_2;

  :
  off.0_2 = off;
  b[off.0_2] = 1;
  a[off.0_2] = 2;
  __builtin_abort ();

}


Matthew

[Bug tree-optimization/68333] [6 Regression] FAIL: gcc.dg/vect/slp-multitypes-4.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 1

2015-12-08 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68333

--- Comment #5 from mwahab at gcc dot gnu.org ---
(In reply to mwahab from comment #4)
> (In reply to Richard Biener from comment #3)
> > Should be fixed now.
> 
> It's still failing for aarch64_be-none-elf.
> 
> The test has a dg-skip-if directive for aarch64-*-*, should that be
> aarch64*-*-*?

Sorry, I was looking at the wrong test, slp-multitype-4 is passing for
aarch64_be.

[Bug tree-optimization/68333] [6 Regression] FAIL: gcc.dg/vect/slp-multitypes-4.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 1

2015-12-08 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68333

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

 CC||mwahab at gcc dot gnu.org

--- Comment #4 from mwahab at gcc dot gnu.org ---
(In reply to Richard Biener from comment #3)
> Should be fixed now.

It's still failing for aarch64_be-none-elf.

The test has a dg-skip-if directive for aarch64-*-*, should that be
aarch64*-*-*?

Matthew

[Bug tree-optimization/68659] [6 regression] FAIL: gcc.dg/graphite/id-pr45230-1.c (internal compiler error)

2015-12-07 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68659

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

 CC||mwahab at gcc dot gnu.org

--- Comment #9 from mwahab at gcc dot gnu.org ---
I still see this on arm-none-eabi.

[Bug fortran/68534] No error on mismatch in number of arguments between submodule and module interface

2015-12-04 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68534

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

 CC||mwahab at gcc dot gnu.org

--- Comment #4 from mwahab at gcc dot gnu.org ---
(In reply to Paul Thomas from comment #3)
> Author: pault
Hello

> Date: Mon Nov 30 13:33:27 2015
> New Revision: 231072
> 
> URL: https://gcc.gnu.org/viewcvs?rev=231072&root=gcc&view=rev
> Log:
> 2015-11-30  Paul Thomas  
> 
>   PR fortran/68534
>   * decl.c (gfc_match_formal_arglist): Cope with zero formal args
>   either in interface declaration or in procedure declaration in
>   submodule.
> 
> 2015-11-30  Paul Thomas  
> 
>   PR fortran/68534
>   * gfortran.dg/submodule_13.f08: New test.
> 

This test is failing on aarch64-none-linux-gnu, it looks like the second
dg-error (on module subroutine bar) doesn't get the output it expected. Running
from the command line, only the first error message is emitted.

Is the test correct?

[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-10-05 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

  Known to fail||4.9.4

--- Comment #74 from mwahab at gcc dot gnu.org ---
(In reply to Ramana Radhakrishnan from comment #73)
> Now fixed ,Matthew ?

Its fixed in trunk and gcc-5 but not in gcc-4.9.


[Bug target/67143] [5/6 Regression] ICE (could not split insn) on aarch64-linux-gnu

2015-09-23 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67143

--- Comment #5 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Wed Sep 23 09:48:16 2015
New Revision: 228037

URL: https://gcc.gnu.org/viewcvs?rev=228037&root=gcc&view=rev
Log:
[Aarch64][target/PR 67143][5.2] Backport correct constraints for atomic
operations.

gcc/
2015-09-23  Matthew Wahab  

Backport from mainline
2015-08-14  Matthew Wahab  

PR target/67143
* config/aarch64/atomics.md (atomic_): Replace
'lconst_atomic' with 'const_atomic'.
(atomic_fetch_): Likewise.
(atomic__fetch): Likewise.
* config/aarch64/iterators.md (lconst-atomic): Move below
'const_atomic'.
(const_atomic): New.

gcc/testsuite/
2015-09-23  Matthew Wahab  

Backport from mainline
2015-08-14  Matthew Wahab  
Matthias Klose  

PR target/67143
* gcc.c-torture/compile/pr67143.c: New
* gcc.target/aarch64/atomic-op-imm.c
(atomic_fetch_add_negative_RELAXED): New.
(atomic_fetch_sub_negative_ACQUIRE): New.


Added:
branches/gcc-5-branch/gcc/testsuite/gcc.c-torture/compile/pr67143.c
Modified:
branches/gcc-5-branch/gcc/ChangeLog
branches/gcc-5-branch/gcc/config/aarch64/atomics.md
branches/gcc-5-branch/gcc/config/aarch64/iterators.md
branches/gcc-5-branch/gcc/testsuite/ChangeLog
branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c


[Bug target/67143] [5/6 Regression] ICE (could not split insn) on aarch64-linux-gnu

2015-08-14 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67143

--- Comment #3 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Fri Aug 14 15:05:42 2015
New Revision: 226895

URL: https://gcc.gnu.org/viewcvs?rev=226895&root=gcc&view=rev
Log:
gcc/
2015-08-14  Matthew Wahab  

PR target/67143
* config/aarch64/atomics.md (atomic_): Replace
'lconst_atomic' with 'const_atomic'.
(atomic_fetch_): Likewise.
(atomic__fetch): Likewise.
* config/aarch64/iterators.md (lconst-atomic): Move below
'const_atomic'.
(const_atomic): New.

gcc/testsuite/
2015-08-14  Matthew Wahab  
Matthias Klose  

PR target/67143
* gcc.c-torture/compile/pr67143.c: New
* gcc.target/aarch64/atomic-op-imm.c
(atomic_fetch_add_negative_RELAXED): New.
(atomic_fetch_sub_negative_ACQUIRE): New.


Added:
trunk/gcc/testsuite/gcc.c-torture/compile/pr67143.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/atomics.md
trunk/gcc/config/aarch64/iterators.md
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c


[Bug target/67143] [5/6 Regression] ICE (could not split insn) on aarch64-linux-gnu

2015-08-07 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67143

mwahab at gcc dot gnu.org changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |mwahab at gcc dot 
gnu.org

--- Comment #2 from mwahab at gcc dot gnu.org ---
(In reply to ktkachov from comment #1)
> Seems related to something Matthew has been working on recently?

No, it looks like something to do with the  constratint
introduced by https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=217076


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-08-05 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #72 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Wed Aug  5 13:43:04 2015
New Revision: 226628

URL: https://gcc.gnu.org/viewcvs?rev=226628&root=gcc&view=rev
Log:
Backport from trunk:
2015-06-29  Matthew Wahab  

PR target/65697
* gcc.target/arm/armv-sync-comp-swap.c: New.
* gcc.target/arm/armv-sync-op-acquire.c: New.
* gcc.target/arm/armv-sync-op-full.c: New.
* gcc.target/arm/armv-sync-op-release.c: New.


Added:
branches/gcc-5-branch/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
branches/gcc-5-branch/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
branches/gcc-5-branch/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
branches/gcc-5-branch/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
Modified:
branches/gcc-5-branch/gcc/testsuite/ChangeLog


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-08-05 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #71 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Wed Aug  5 13:40:14 2015
New Revision: 226627

URL: https://gcc.gnu.org/viewcvs?rev=226627&root=gcc&view=rev
Log:
Backport from trunk:
2015-06-29  Matthew Wahab  

PR target/65697
* config/arm/arm.c (arm_split_compare_and_swap): For ARMv8,
replace an initial acquire barrier with final barrier.


Modified:
branches/gcc-5-branch/gcc/ChangeLog
branches/gcc-5-branch/gcc/config/arm/arm.c


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-08-05 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #70 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Wed Aug  5 13:27:41 2015
New Revision: 226625

URL: https://gcc.gnu.org/viewcvs?rev=226625&root=gcc&view=rev
Log:
Backport from trunk:
2015-06-29  Matthew Wahab  

PR target/65697
* config/armc/arm.c (arm_split_atomic_op): For ARMv8, replace an
initial acquire barrier with final barrier.


Modified:
branches/gcc-5-branch/gcc/ChangeLog
branches/gcc-5-branch/gcc/config/arm/arm.c


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-08-05 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #69 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Wed Aug  5 11:48:43 2015
New Revision: 226621

URL: https://gcc.gnu.org/viewcvs?rev=226621&root=gcc&view=rev
Log:
Backport from trunk
2015-06-01  Matthew Wahab  

PR target/65697
* gcc.target/aarch64/sync-comp-swap.c: New.
* gcc.target/aarch64/sync-comp-swap.x: New.
* gcc.target/aarch64/sync-op-acquire.c: New.
* gcc.target/aarch64/sync-op-acquire.x: New.
* gcc.target/aarch64/sync-op-full.c: New.
* gcc.target/aarch64/sync-op-full.x: New.
* gcc.target/aarch64/sync-op-release.c: New.
* gcc.target/aarch64/sync-op-release.x: New.


Added:
branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x
branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x
branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-op-full.x
branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-op-release.c
branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-op-release.x
Modified:
branches/gcc-5-branch/gcc/testsuite/ChangeLog


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-08-05 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #68 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Wed Aug  5 11:40:25 2015
New Revision: 226620

URL: https://gcc.gnu.org/viewcvs?rev=226620&root=gcc&view=rev
Log:
Backport from trunk.
2015-06-01  Matthew Wahab  

PR target/65697
* config/aarch64/aarch64.c (aarch64_split_compare_and_swap): Check
for __sync memory models, emit initial loads and final barriers as
appropriate.


Modified:
branches/gcc-5-branch/gcc/ChangeLog
branches/gcc-5-branch/gcc/config/aarch64/aarch64.c


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-08-05 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #67 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Wed Aug  5 11:29:28 2015
New Revision: 226619

URL: https://gcc.gnu.org/viewcvs?rev=226619&root=gcc&view=rev
Log:
Backport from trunk.
2015-06-01  Matthew Wahab  

PR target/65697
* config/aarch64/aarch64.c (aarch64_emit_post_barrier): New.
(aarch64_split_atomic_op): Check for __sync memory models, emit
appropriate initial loads and final barriers.


Modified:
branches/gcc-5-branch/gcc/ChangeLog
branches/gcc-5-branch/gcc/config/aarch64/aarch64.c


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-08-05 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #66 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Wed Aug  5 11:20:59 2015
New Revision: 226618

URL: https://gcc.gnu.org/viewcvs?rev=226618&root=gcc&view=rev
Log:
Backport from trunk
2015-05-12  Andrew MacLeod  

PR target/65697
* coretypes.h (MEMMODEL_SYNC, MEMMODEL_BASE_MASK): New macros.
(enum memmodel): Add SYNC_{ACQUIRE,RELEASE,SEQ_CST}.
* tree.h (memmodel_from_int, memmodel_base, is_mm_relaxed)
(is_mm_consume,is_mm_acquire, is_mm_release, is_mm_acq_rel)
(is_mm_seq_cst, is_mm_sync): New accessor functions.
* builtins.c (expand_builtin_sync_operation)
(expand_builtin_compare_and_swap): Use MEMMODEL_SYNC_SEQ_CST.
(expand_builtin_sync_lock_release): Use MEMMODEL_SYNC_RELEASE.
(get_memmodel,  expand_builtin_atomic_compare_exchange)
(expand_builtin_atomic_load, expand_builtin_atomic_store)
(expand_builtin_atomic_clear): Use new accessor routines.
(expand_builtin_sync_synchronize): Use MEMMODEL_SYNC_SEQ_CST.
* optabs.c (expand_compare_and_swap_loop): Use MEMMODEL_SYNC_SEQ_CST.
(maybe_emit_sync_lock_test_and_set): Use new accessors and
MEMMODEL_SYNC_ACQUIRE.
(expand_sync_lock_test_and_set): Use MEMMODEL_SYNC_ACQUIRE.
(expand_mem_thread_fence, expand_mem_signal_fence, expand_atomic_load)
(expand_atomic_store): Use new accessors.
* emit-rtl.c (need_atomic_barrier_p): Add additional enum cases.
* tsan.c (instrument_builtin_call): Update check for memory model
beyond
final enum to use MEMMODEL_LAST.
* c-family/c-common.c: Use new accessor for memmodel_base.
* config/aarch64/aarch64.c (aarch64_expand_compare_and_swap): Use new
accessors.
* config/aarch64/atomics.md (atomic_load,atomic_store)
(arch64_load_exclusive, aarch64_store_exclusive)
(mem_thread_fence, *dmb): Likewise.
* config/alpha/alpha.c (alpha_split_compare_and_swap)
(alpha_split_compare_and_swap_12): Likewise.
* config/arm/arm.c (arm_expand_compare_and_swap)
(arm_split_compare_and_swap, arm_split_atomic_op): Likewise.
* config/arm/sync.md (atomic_load, atomic_store)
(atomic_loaddi): Likewise.
* config/i386/i386.c (ix86_destroy_cost_data, ix86_memmodel_check):
Likewise.
* config/i386/sync.md (mem_thread_fence, atomic_store): Likewise.
* config/ia64/ia64.c (ia64_expand_atomic_op): Add new memmodel cases
and
use new accessors.
* config/ia64/sync.md (mem_thread_fence, atomic_load)
(atomic_store, atomic_compare_and_swap)
(atomic_exchange): Use new accessors.
* config/mips/mips.c (mips_process_sync_loop): Likewise.
* config/pa/pa.md (atomic_loaddi, atomic_storedi): Likewise.
* config/rs6000/rs6000.c (rs6000_pre_atomic_barrier)
(rs6000_post_atomic_barrier): Add new cases.
(rs6000_expand_atomic_compare_and_swap): Use new accessors.
* config/rs6000/sync.md (mem_thread_fence): Add new cases.
(atomic_load): Add new cases and use new accessors.
(store_quadpti): Add new cases.
* config/s390/s390.md (mem_thread_fence, atomic_store): Use new
accessors.
* config/sparc/sparc.c (sparc_emit_membar_for_model): Use new
accessors.
* doc/extend.texi: Update docs to indicate 16 bits are used for memory
model, not 8.


Modified:
branches/gcc-5-branch/gcc/ChangeLog
branches/gcc-5-branch/gcc/builtins.c
branches/gcc-5-branch/gcc/c-family/c-common.c
branches/gcc-5-branch/gcc/config/aarch64/aarch64.c
branches/gcc-5-branch/gcc/config/aarch64/atomics.md
branches/gcc-5-branch/gcc/config/alpha/alpha.c
branches/gcc-5-branch/gcc/config/arm/arm.c
branches/gcc-5-branch/gcc/config/arm/sync.md
branches/gcc-5-branch/gcc/config/i386/i386.c
branches/gcc-5-branch/gcc/config/i386/sync.md
branches/gcc-5-branch/gcc/config/ia64/ia64.c
branches/gcc-5-branch/gcc/config/ia64/sync.md
branches/gcc-5-branch/gcc/config/mips/mips.c
branches/gcc-5-branch/gcc/config/pa/pa.md
branches/gcc-5-branch/gcc/config/rs6000/rs6000.c
branches/gcc-5-branch/gcc/config/rs6000/sync.md
branches/gcc-5-branch/gcc/config/s390/s390.md
branches/gcc-5-branch/gcc/config/sparc/sparc.c
branches/gcc-5-branch/gcc/coretypes.h
branches/gcc-5-branch/gcc/doc/extend.texi
branches/gcc-5-branch/gcc/emit-rtl.c
branches/gcc-5-branch/gcc/optabs.c
branches/gcc-5-branch/gcc/tree.h
branches/gcc-5-branch/gcc/tsan.c


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-06-29 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #65 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Mon Jun 29 16:12:12 2015
New Revision: 225134

URL: https://gcc.gnu.org/viewcvs?rev=225134&root=gcc&view=rev
Log:
2015-06-29  Matthew Wahab  

PR target/65697
* gcc.target/arm/armv-sync-comp-swap.c: New.
* gcc.target/arm/armv-sync-op-acquire.c: New.
* gcc.target/arm/armv-sync-op-full.c: New.
* gcc.target/arm/armv-sync-op-release.c: New.


Added:
trunk/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c   (with props)
trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c   (with props)
trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c   (with props)
trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c   (with props)
Modified:
trunk/gcc/ChangeLog

Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
('svn:eol-style' added)

Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
('svn:keywords' added)

Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
('svn:eol-style' added)

Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
('svn:keywords' added)

Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
('svn:eol-style' added)

Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
('svn:keywords' added)

Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
('svn:eol-style' added)

Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
('svn:keywords' added)


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-06-29 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #64 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Mon Jun 29 16:09:10 2015
New Revision: 225133

URL: https://gcc.gnu.org/viewcvs?rev=225133&root=gcc&view=rev
Log:
2015-06-29  Matthew Wahab  

PR target/65697
* config/armc/arm.c (arm_split_compare_and_swap): For ARMv8, replace an
initial acquire barrier with final barrier.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/arm/arm.c


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-06-29 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #63 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Mon Jun 29 16:03:34 2015
New Revision: 225132

URL: https://gcc.gnu.org/viewcvs?rev=225132&root=gcc&view=rev
Log:
2015-06-29  Matthew Wahab  

PR target/65697
* config/armc/arm.c (arm_split_atomic_op): For ARMv8, replace an
initial acquire barrier with final barrier.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/arm/arm.c


[Bug target/64783] -march=armv8.1-a should be supported

2015-06-25 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64783

--- Comment #3 from mwahab at gcc dot gnu.org ---
I've just noticed this has been assigned to me. Support for -march=armv8.1-a
has been added to the Aarch64 backend, the ARM backend is still to be done.

Author: mwahab
Date: Tue Jun 16 13:38:37 2015
New Revision: 224519

URL: https://gcc.gnu.org/viewcvs?rev=224519&root=gcc&view=rev
Log:
2015-06-16  Matthew Wahab  

* config/aarch64/aarch64-arches.def: Add "armv8.1-a".
* config/aarch64/aarch64-options-extensions.def: Update "fP",
"simd" and "crypto".  Add "lse", "pan", "lor" and "rdma".
* gcc/config/aarch64/aarch64.h (AARCH64_FL_LSE): New.
(AARCH64_FL_PAN): New.
(AARCH64_FL_LOR): New.
(AARCH64_FL_RDMA): New.
(AARCH64_FL_FOR_ARCH8_1): New.
* doc/invoke.texi (AArch64 Options): Add "armv8.1-a" to
-march. Add "lse", "pan", "lor", "rdma" to feature modifiers.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/aarch64-arches.def
trunk/gcc/config/aarch64/aarch64-option-extensions.def
trunk/gcc/config/aarch64/aarch64.h
trunk/gcc/doc/invoke.texi


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-06-11 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #62 from mwahab at gcc dot gnu.org ---
(In reply to Ramana Radhakrishnan from comment #61)
> Well, confirmed at least. And at the minute fixed on trunk - not sure if we
> are asking for backports for this ?

Marcus has asked for this to be backported to 5.1 and maybe 4.9. I'm looking at
it.


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-06-01 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #60 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Mon Jun  1 15:24:37 2015
New Revision: 223986

URL: https://gcc.gnu.org/viewcvs?rev=223986&root=gcc&view=rev
Log:
PR target/65697
* gcc.target/aarch64/sync-comp-swap.c: New.
* gcc.target/aarch64/sync-comp-swap.x: New.
* gcc.target/aarch64/sync-op-acquire.c: New.
* gcc.target/aarch64/sync-op-acquire.x: New.
* gcc.target/aarch64/sync-op-full.c: New.
* gcc.target/aarch64/sync-op-full.x: New.
* gcc.target/aarch64/sync-op-release.c: New.
* gcc.target/aarch64/sync-op-release.x: New.


Added:
trunk/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
trunk/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x
trunk/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
trunk/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x
trunk/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
trunk/gcc/testsuite/gcc.target/aarch64/sync-op-full.x
trunk/gcc/testsuite/gcc.target/aarch64/sync-op-release.c
trunk/gcc/testsuite/gcc.target/aarch64/sync-op-release.x
Modified:
trunk/gcc/testsuite/ChangeLog


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-06-01 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #59 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Mon Jun  1 15:21:02 2015
New Revision: 223984

URL: https://gcc.gnu.org/viewcvs?rev=223984&root=gcc&view=rev
Log:
PR target/65697
* config/aarch64/aarch64.c (aarch64_split_compare_and_swap): Check
for __sync memory models, emit initial loads and final barriers as
appropriate.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/aarch64.c


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-06-01 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #58 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Mon Jun  1 15:18:19 2015
New Revision: 223983

URL: https://gcc.gnu.org/viewcvs?rev=223983&root=gcc&view=rev
Log:
PR target/65697
* config/aarch64/aarch64.c (aarch64_emit_post_barrier):New.
(aarch64_split_atomic_op): Check for __sync memory models, emit
appropriate initial loads and final barriers.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/aarch64.c


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-05-11 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #56 from mwahab at gcc dot gnu.org ---
(In reply to James Greenhalgh from comment #55)
> (In reply to torvald from comment #49)
>
> > This is the case of allowing non-DRF normal accesses.  The *other* case I
> > was thinking about is how the test would have to look like when *not*
> > allowing them.  One way to do it would be:
> > 
> > thread_a {
> >   __sync_lock_test_and_set (foo, 1)
> >   __sync_lock_test_and_set (bar, 1) // or __sync_lock_release, or __sync RMW
> > }
>
> [..] (it is
> a shame that the DRF code you provided will suffer from an extra barrier if
> Matthew/Andrew's work is applied, but I think this is a corner case which we
> probably don't want to put too much thought in to working around).

I'm not familiar with the code but I would have thought that it would be 
straightforward to optimize away the first dmb. It seems like it would be a
simple peephole to spot a sequence of two atomic operations, both with __sync
barriers, and replace the first with the equivalent __atomic barrier.


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-05-08 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #54 from mwahab at gcc dot gnu.org ---
(In reply to mwahab from comment #53)
> (In reply to Andrew Macleod from comment #50)
> > Created attachment 35478 [details]
> > implement SYNC flag for memory model
> > 
> > This compiles on all targets, but is only runtime tested on
> > x86_64-unknown-linux-gnu with no regressions.
> 
> The patch passes check-gcc for aarch64-none-linux-gnu with no new failures.
> I'll also run the tests on arm.
> 

Also passes check-gcc for arm-none-eabi.


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-05-07 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #53 from mwahab at gcc dot gnu.org ---
(In reply to Andrew Macleod from comment #50)
> Created attachment 35478 [details]
> implement SYNC flag for memory model
> 
> This compiles on all targets, but is only runtime tested on
> x86_64-unknown-linux-gnu with no regressions.

The patch passes check-gcc for aarch64-none-linux-gnu with no new failures.
I'll also run the tests on arm.

> With this you should be able to easily issue whatever different insn
> sequence you need to within an atomic pattern.   Give it a try.  If it works
> as required, then we'll see about executing on all targets for testing...

I modified the Aarch64 backend to use the new is_mm_sync test to generate
stronger barriers when needed and it all works as required.


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-05-06 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #51 from mwahab at gcc dot gnu.org ---
(In reply to Andrew Macleod from comment #50)
> Created attachment 35478 [details]
> implement SYNC flag for memory model
> 
> > Adding the __sync barriers to coretypes.h is the better approach if more
> > than one backend has this problem. Since it seems that only Aarch64 is
> > affected, I think its better to go with the target hook. If a new
> > architecture gets added with the same problem, it will be easy enough to
> > switch to the coretypes.h approach.
> 
> Well, if it affects some target now, it likely to affect some target in the
> future as well. Didn't someone also mention there is a potential issue with
> PPC somewhere too?

The earlier comment (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697#c24)
suggests that PPC generates full barriers so won't be affected by this. It
looks like Aarch64 is the only current target with the problem.

> In any case, I'm not a fan of adding target hooks for this... We ought to
> just bite the bullet and integrate it cleanly into the memory model. 

OK.

> I have a patch which adds a SYNC flag to the upper bit of the memory model.
> It modifies all the generic code and target patterns to use access routines
> (declared in tree.h) instead of hard coding masks and flags (which probably
> should have be done in the first place). This makes carrying around the SYNC
> flag transparent until you want to look for it.

>From a quick look, this seems to do what's needed to allow the Aarch64 backend
to distinguish the sync and atomic code.

> This compiles on all targets, but is only runtime tested on
> x86_64-unknown-linux-gnu with no regressions.

The mips backend was the only one that stood out as needing some care, because
the way it uses the memory models (e.g. in function mips_process_sync_loop) is
a little different from the backends.

> With this you should be able to easily issue whatever different insn
> sequence you need to within an atomic pattern.   Give it a try.  If it works
> as required, then we'll see about executing on all targets for testing...

I'll give it a go.


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-04-30 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #48 from mwahab at gcc dot gnu.org ---
(In reply to Andrew Macleod from comment #47)
> Created attachment 35425 [details]
> potential patch to add MEMMODEL_SYNC
> 
> I don't know where we've finally settled on this, but I did prototype out
> what involved in adding a new MEMMODEL_SYNC which can be identified and made
> stronger than MEMMODEL_SEQ_CST if so desired. Normally it will behave
> exactly like MEMMODEL_SEQ_CST.   This bootstraps on x86-64 with no testsuite
> regressions.
> 

I've been working on patches for this. Adding a new MEMMODEL type was my first
approach because it means that gcc has one representation for all the memory
models it supports. I decided not to submit the patch because Aarch64 also
needs an tag for the __sync acquire used in __sync_lock_test_and_set. Adding
that touches more backends than the MEMMODEL_SYNC and some of those changes are
not obvious (for the mips backend in particular). The reasons why Aarch64 needs
a new acquire barrier are rather tenuous (they're in the earlier comments) and
don't seem to justify the possibly invasive changes that would be needed. 

The approach I'm working on now is to add a target hook to allow a backend to
supply its own expansion of calls to the __sync builtins. That makes for
smaller changes in the target-independent code and lets the Aarch64 backend add
the new barriers as target-specific memory models. The actual code generation
goes through the infrastructure for the atomics.

Adding the __sync barriers to coretypes.h is the better approach if more than
one backend has this problem. Since it seems that only Aarch64 is affected, I
think its better to go with the target hook. If a new architecture gets added
with the same problem, it will be easy enough to switch to the coretypes.h
approach.


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-04-29 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #46 from mwahab at gcc dot gnu.org ---
(In reply to James Greenhalgh from comment #45)
> (In reply to mwahab from comment #44)
>
> And this final sentence is buggy by omission of a mention of memory writes:
> 
>   but following memory reads are not prevented from being speculated to
>   before the barrier.
> 
> Which can be read as forbidding store-store reordering across the barrier.

It doesn't say anything about store-store reordering so it's trumped by the
stated intention to be compatible with the psABI.

> As I say though, I think this is an artefact of transcribing the
> documentation, rather than an attempt to provide stronger ordering semantics
> in the GCC implementation.

Agreed.


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-04-29 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #44 from mwahab at gcc dot gnu.org ---
(In reply to James Greenhalgh from comment #43)
> (In reply to torvald from comment #37)
> 
> > > I'm not worried about __sync_lock_release, I think the documentation is
> > > strong enough and unambiguous.
> > 
> > Are you aware that the GCC's __sync disallow store-store reordering across
> > __sync_lock_release, whereas the psABI docs don't?
> 
> No I was not, and even looking exactly for the text you were referring to,
> it took
> me three attempts to spot it. Yes, I understand now why you are concerned
> about
> the GCC wording. Perhaps this is just an artefact of a mistake transcribing
> the psABI?
> 
> AArch64 is certainly not providing that guarantee just now.

I wasn't able to find the text, could you copy it in a reply. 
In the GCC manual, the only description of __sync_lock_release behaviour is in
the last paragraph. That descriptions seems consistent with the function being
a release barrier and with the current Aarch64 code generated for it.


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-04-20 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #41 from mwahab at gcc dot gnu.org ---
(In reply to torvald from comment #38)
> (In reply to Andrew Macleod from comment #34)
> 
> Also, if you look at the IA-64 __sync_lock_release vs. GCC docs'
> __sync_lock_release, the latter is like x86/TSO.  Do you have any info on
> which other semantics __sync was supposed to adhere to?
> 
> One potential way to solve it would be to just require code that uses __sync
> to more or less implement an IA-64 or x86 memory model, modulo allowing
> compiler-reordering and optimization between adjacent non-__sync memory
> accesses.  This could be inefficient on ARM (see James' examples) and
> perhaps Power too (or not -- see Jakub's comments).

If the __sync barriers are as described in the GCC manual, that a barrier is
atomic and its restrictions apply to all data references, then the Aarch64
backend doesn't currently emit strong enough barriers.

For MEMMODEL_SEQ_CST, the problem was visible enough and the solution I
suggested (extending the set of available memmodel types) was simple enough
that the changes it would need could be justified. I don't think that's true
for the MEMMODEL_ACQUIRE case which seems to be much less likely to be seen and
would be rather more disruptive.

I believe that Aarch64 is the only current target where the code needs to be
strengthened. Since extending the set of memmodels is difficult to justify and
(IMO) so is resurrecting the __sync patterns, I suggest just adding a target
hook to allow the expansion of __sync calls to be overridden. That will allow
Aarch64 to set a target-specific memmodel value, as is currently allowed, which
can then be passed through the existing __atomics mechanisms in the middle
through to the Aarch64 backend. No other backend will need to be touched.

If it happens that future architectures have a similar problem then we can
reconsider whether any changes need to be made in the target-independent
expansions.

Does that sounds like a reasonable approach?


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-04-20 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #40 from mwahab at gcc dot gnu.org ---
(In reply to Andrew Macleod from comment #25)

> Documentation needs updating for sure... The rules have changed under us
> since originally SEQ_CST and sync were intended to be the same thing...
> Guess I shouldn't have tied our horse to the C++11 cart :-)

I've submitted a patch to try to update the __atomics documentation:
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01025.html.


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-04-16 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #36 from mwahab at gcc dot gnu.org ---
(In reply to Andrew Macleod from comment #31)
> 
> 
> Targets that don't need special sync patterns (ie most of them) simply don't
> provide them.   The expanders see no sync pattern and use SEQ_CST, exactly
> like they do today.

For current targets that may be true but assumes that no new target will need
the special sync patterns.

> sync patterns would only be provided by targets which do need to do
> something different.   This means there is no potential bug introduction on
> unaffected targets.. 

I was thinking of existing sync patterns in current backends which may have
been made redundant by the atomics builtins but are still there. There's a
danger that they'll get used even if they're not preferred for that target.
There's also the question of whether they're getting tested, assuming that they
were only ever generated for the __sync builtins.

> Well, it actually returns back to the situation before they were merged. We
> use to look for sync patterns too... until I thought they were redundant.

I believe that they are redundant (for __sync builtins anyway) since it looks
like everything could be done through the atomics + new barrier.

> The intention was to deprecate __sync and support just the c++ memory model.
> SEQ_CST == SYNC was the original intent and understanding, thus the code
> merge.  If we decide we want/need to provide a stronger form of SEQ_CST and
> call it SYNC, then we can do that as required... but I'm not envisioning a
> lot of future additional memory models. Although never say never I suppose.

I don't expect that many models will be needed, the set should certainly be
kept as small as possible. I think that extending it in this case is justified
because of the gap between what is needed and what is now available. 

The choice seems to be 
- between continuing the move away from the syncs to the atomics. This makes
the __sync and the __atomic builtins rely on one infrastructure.
- keeping both the atomics and the syncs infrastructures with individual
targets choosing between them.

The first option seems better, not least because it reduces the number of
things that need to be understood when dealing with synchronization primitives.


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-04-16 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #33 from mwahab at gcc dot gnu.org ---
(In reply to torvald from comment #32)
> (In reply to James Greenhalgh from comment #28)
> > (In reply to torvald from comment #24)
> > > 3) We could do something just on ARM (and scan other arcs for similar
> > > issues).  That's perhaps the cleanest option.
> > 
> > Which leaves 3). From Andrew's two proposed solutions:
> 
> 3) Also seems best to me.  2) is worst, 1) is too much of a stick.
> 
> > This also gives us an easier route to fixing any issues with the
> > acquire/release __sync primitives (__sync_lock_test_and_set and
> > __sync_lock_release) if we decide that these also need to be stronger than
> > their C++11 equivalents.
> 
> I don't think we have another case of different __sync vs. __atomics
> semantics in case of __sync_lock_test_and_set.  The current specification
> makes it clear that this is an acquire barrier, and how it describes the
> semantics (ie, loads and stores that are program-order before the acquire op
> can move to after it) , this seems to be consistent with the effects C11
> specifies for acquire MO (with perhaps the distinction that C11 is clear
> that acquire needs to be paired with some release op to create an ordering
> constraint).

Thanks, I suspect that the acquire barrier may not be much as much of an issue
as I had remembered. (The issue came up while I was trying to understand the
C11 semantics.)

The test case (aarch64) I have is:

int foo = 0;
int bar = 0;
int T5(void)
{
  int x = __sync_lock_test_and_set(&foo, 1);
  return bar;
}

.L11:
ldaxrw2, [x0] ; load-acquire
stxrw3, w1, [x0] ; store
cbnzw3, .L11
ldrw0, [x0, 4]  ; load
ret

My concern was that the load could be speculated ahead of the store. Since the
store marks the end of the barrier, that could make it appear as if the load
had completed before the acquire-barrier.

In retrospect, I don't think that there will be a problem because any load that
could be moved would have to end up with the same value as if it had not moved.


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-04-16 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #30 from mwahab at gcc dot gnu.org ---
(In reply to James Greenhalgh from comment #28)

> Which leaves 3). From Andrew's two proposed solutions:
> 
> > a) introduce an additional memory model... MEMMODEL_SYNC or something
> > which is even stronger than SEQ_CST.  This would involve fixing any places
> > that assume SEQ_CST is the highest.  And then we use this from the places
> > that expand sync calls as the memory model.
> 
> This seems a sensible approach and leads to a nicer design, but I worry that
> it might be overkill for primitives which we ultimately want to reduce
> support for and eventually deprecate.

I don't understand why it would be overkill. Its already expected that not all
barriers will be meaningful for all targets and target code to handle redundant
barriers usually comes down to a few clauses in a conditional statement. 

> > (b) may be easier to implement, but puts more onus on the target..
> > probably involves more complex patterns since you need both atomic and
> > sync patterns. My guess is some duplication of code will occur here.  But
> > the impact is only to specific targets.
> 
> When I looked at this problem internally a few weeks back, this is exactly
> how I expected things to work (we can add the documentation for the pattern
> names to the list of things which need fixing, as it took me a while to
> figure out why my new patterns were never expanded!).
> 
> I don't think this is particularly onerous for a target. The tough part in
> all of this is figuring out the minimal cost instructions at the ISA level
> to use for the various __atomic primitives. Extending support to a stronger
> model, should be straightforward given explicit documentation of the
> stronger ordering requirements.
>

My objection to using sync patterns is that it does take more work, both for
the initial implementation and for continuing maintenance. It also means adding
sync patterns to targets that would not otherwise need them and preserving a
part of the gcc infrastructure that is only needed for a legacy feature and
could otherwise be targeted for removal.

> Certainly, a target would have to do the same legwork for a) regardless, and
> would have to pollute their atomic patterns with checks and code paths for
> MEMMODEL_SYNC.

Actually, the changes needed for (a) are very simple. The checks and code-paths
for handling barriers are already there, reusing them for MEMMODEL_SYNC is
trivial. The resulting code is much easier to understand, and safely fix,
because it is all in the same place, than if it was spread out across patterns
and functions. 

> This also gives us an easier route to fixing any issues with the
> acquire/release __sync primitives (__sync_lock_test_and_set and
> __sync_lock_release) if we decide that these also need to be stronger than
> their C++11 equivalents.

This seems like a chunky workaround to avoid having to add a barrier
representation to gcc.  It's a, not necessarily straightforward, reworking of
the middle-end to use patterns that would need to be added to architectures
that don't currently have them and at least checked in the architectures that
do.

This seems to come down to what enum memmodel is supposed to be for. If it is
intended to represent the barriers provided by C11/C+11 atomics and nothing
else than a workaround seems unavoidable. If it is meant to represent barriers
needed by gcc to compile user code than, it seems to me, that it would be
better to just add the barrier to the enum and update code where necessary. 

Since memmodel is relatively recent, was merged from what looks like the C++
memory model branch (cxx-mem-model), and doesn't seem to have changed since
then, it's maybe not surprising that it doesn't already include every thing
needed by gcc. I don't see that adding to it should be prohibited, provided the
additions can be show to be strictly required.


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-04-16 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #29 from mwahab at gcc dot gnu.org ---
(In reply to mwahab from comment #27)
> (In reply to Andrew Macleod from comment #25)
> > My opinion:
> > 
> > 1) is undesirable... even though it could possibly accelerate the conversion
> > of legacy sync to atomic calls... I fear it would instead just cause
> > frustration, annoyance and worse.  I don't think we need to actually
> > penalize sync for no reason better than a 1 in a somethillion shot in the
> > dark.
> > 
> > 2) Similar reasoning, I don't think we need to penalize SEQ_CST everywhere
> > for a legacy sync call that probably doesn't need stronger code either.
> > 
> > which leaves option 3)
> > 
> > There are 2 primary options I think


There may be a third option, which is to set-up a target hook to allow the sync
expansions in the middle end to be overridden. This limits the changes to the
backends that need the different semantics without having to continue with the
sync_ patterns (which aren't in aarch64 for example). There's space in the
memmodel values to support target specific barriers so, for aarch64, this would
allow the atomics code to be reused. I don't know much about the target hook
infrastructure though so I don't know how disruptive a new hook would be.


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-04-16 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #27 from mwahab at gcc dot gnu.org ---
(In reply to Andrew Macleod from comment #25)
> My opinion:
> 
> 1) is undesirable... even though it could possibly accelerate the conversion
> of legacy sync to atomic calls... I fear it would instead just cause
> frustration, annoyance and worse.  I don't think we need to actually
> penalize sync for no reason better than a 1 in a somethillion shot in the
> dark.
> 
> 2) Similar reasoning, I don't think we need to penalize SEQ_CST everywhere
> for a legacy sync call that probably doesn't need stronger code either.
> 
> which leaves option 3)
> 
> There are 2 primary options I think
> 
> a) introduce an additional memory model... MEMMODEL_SYNC or something which
> is even stronger than SEQ_CST.  This would involve fixing any places that
> assume SEQ_CST is the highest.  And then we use this from the places that
> expand sync calls as the memory model.  
>
> or
> b) When we are expanding sync calls, we first look for target __sync
> patterns instead of atomic patterns.  If they exist, we use those. Otherwise
> we simply expand like we do today.  
> 
> 
> (b) may be easier to implement, but puts more onus on the target.. probably
> involves more complex patterns since you need both atomic and sync patterns.
> My guess is some duplication of code will occur here.  But the impact is
> only to specific targets.
> 
> (a) Is probably cleaner... just touches a lot more code.  Since we're in
> stage 1, maybe (a) is a better long term solution...?   
> 

Adding a new barrier specifer to enum memmodel seems the simplest approach. It
would mean that all barriers used in gcc are represented in the same way and
that would make them more visible and easier to understand than splitting them
between the enum memmodel and the atomic/sync patterns.

I'm testing a patch-set for option (a). It touches several backends but all the
changes are very simple since the new barrier is the same as MEMMODEL_SEQ_CST
for most targets.

One thing though is that the aarch64 code for __sync_lock_test_and_set and
there may have a similar problem with a too-weak barrier. It's not confirmed
that there is a problem but, since it may mean more work in this area, I
thought I'd mention it.


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-04-16 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #26 from mwahab at gcc dot gnu.org ---
(In reply to torvald from comment #21)
> (In reply to Andrew Haley from comment #20)
> > (In reply to mwahab from comment #19)
> > > (In reply to Andrew Haley from comment #18)
> > > 
> > > It looks inconsistent with C11 S7.17.7.4-2 (C++11 S29.6.4-21) "Further, if
> > > the comparison is true, memory is affected according to the value of
> > > success, and if the comparison is false, memory is affected according to 
> > > the
> > > value of failure." (where success and failure are the memory model
> > > arguments.) In this case, the write to *exp should be 
> > > memory_order_seq_cst.
> > 
> > But no store actually takes place, so the only effect is that of the read.
> > You can't have a sequentially consistent store without a store.
> 
> I agree. If you continue reading in the C++11 paragraph that you cited,
> you'll see that if just one MO is provided and the CAS fails, an acq_rel MO
> is downgraded to acquire and a release MO to relaxed.  This is consistent
> with no update of the atomic variable (note that expected is not atomic, so
> applying an MO to accesses to it is not meaningful in any case).  However,
> if the provided MO is seq_cst, I think a failed CAS still needs to be
> equivalent to a seq_cst load.

Yes, the last two sentences in the C++11 paragraph make it clear: "If the
operation returns true, these operations are atomic read-modify-write
operations (1.10). Otherwise, these operations are atomic load operations."  In
that case, the Aarch64 code looks ok.


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-04-15 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #19 from mwahab at gcc dot gnu.org ---
(In reply to Andrew Haley from comment #18)
> (In reply to mwahab from comment #17)
> 
> > 
> > int cas(int* barf, int* expected, int* desired)
> > {
> >   return __atomic_compare_exchange_n(barf, expected, desired, 0,
> >  __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
> > }
> > 
> > cas:
> > ldr w3, [x1]
> > .L3:
> > ldaxr   w4, [x0]
> > cmp w4, w3
> > bne .L4
> > stlxr   w5, w2, [x0]  ; store-release
> > cbnzw5, .L3
> > .L4:
> > csetw0, eq
> > cbz w0, .L6
> > ret
> > .p2align 3
> > .L6:
> > str w4, [x1]  ; store, no barrier.
> > ret
> > 
> > 
> > This looks odd to me but I'd need look into it more.
> 
> That looks fine to me: if the CAS fails, the prev value -> *expected.

It looks inconsistent with C11 S7.17.7.4-2 (C++11 S29.6.4-21) "Further, if the
comparison is true, memory is affected according to the value of success, and
if the comparison is false, memory is affected according to the value of
failure." (where success and failure are the memory model arguments.) In this
case, the write to *exp should be memory_order_seq_cst.

[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-04-15 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #17 from mwahab at gcc dot gnu.org ---
According to the GCC documentation, __atomic_compare_exchange(ptr, exp, des,
..) is: if (*ptr == *exp) *ptr = *exp; else *exp = *ptr;

On Aarch64 the else (*ptr != *exp) branch is a store rather than a
store-release.

This doesn't appear in your example but with

int cas(int* barf, int* expected, int* desired)
{
  return __atomic_compare_exchange_n(barf, expected, desired, 0,
 __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

cas:
ldrw3, [x1]
.L3:
ldaxrw4, [x0]
cmpw4, w3
bne.L4
stlxrw5, w2, [x0]  ; store-release
cbnzw5, .L3
.L4:
csetw0, eq
cbzw0, .L6
ret
.p2align 3
.L6:
strw4, [x1]  ; store, no barrier.
ret


This looks odd to me but I'd need look into it more.


[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

2015-04-15 Thread mwahab at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #14 from mwahab at gcc dot gnu.org ---
(In reply to Andrew Haley from comment #13)
> There's surely a documentation problem here.
> 
> GCC defines this:
> 
> `__ATOMIC_SEQ_CST'
>  Full barrier in both directions and synchronizes with acquire
>  loads and release stores in all threads.

The documentation is a little misleading. Barriers in __atomic operations are
slightly weaker then barriers in __atomic fences; the differences are rather
subtle though. For the __sync implementation an fence would be ok but an
operation is too weak.

> But LDAXR/STLXR doesn't do that, and there's no write barrier at all when
> the compare fails.  If the intention really is to map onto c++11, this
> specification is wrong.

The LDAXR/STLXR sequences rely on the C11/C++11 prohibition of data races. That
the __atomic builtins assume this restriction is implied by the references to
C11/C++11 in the documentation but should probably be made clearer. I'll try to
write a patch, if nobody else gets there first.

I'll have to look at the compare-exchange code, it does looks like it goes
wrong.