[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2022-01-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

--- Comment #26 from Andrew Pinski  ---
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0750r1.html

[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2022-01-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||wrong-code
   Target Milestone|--- |5.0

[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2016-01-21 Thread torvald at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

torvald at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #25 from torvald at gcc dot gnu.org ---
(In reply to Andrew Macleod from comment #24)
> Author: amacleod
> Date: Wed Jan 14 13:58:35 2015
> New Revision: 219601
> 
> URL: https://gcc.gnu.org/viewcvs?rev=219601=gcc=rev
> Log:
> 
> 2015-01-14  Andrew MacLeod  
> 
>   PR middle-end/59448
>   * builtins.c (get_memmodel): Promote consume to acquire always.
>   * testsuite/gcc.dg/atomic-invalid.c: Remove obselete test for illegal
>   consume in an atomic_exchange.

To summarize, Jeff Preshing confirmed that Andrew's patch fixes the problem
(see Comment 23).  Everyone in ISO C++ seems to agree that promoting consume to
acquire is the right approach until we have a different language-level facility
that is actually implementable.  Therefore, I'm closing this as fixed.

[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2015-01-14 Thread amacleod at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

--- Comment #24 from Andrew Macleod amacleod at redhat dot com ---
Author: amacleod
Date: Wed Jan 14 13:58:35 2015
New Revision: 219601

URL: https://gcc.gnu.org/viewcvs?rev=219601root=gccview=rev
Log:

2015-01-14  Andrew MacLeod  amacl...@redhat.com

PR middle-end/59448
* builtins.c (get_memmodel): Promote consume to acquire always.
* testsuite/gcc.dg/atomic-invalid.c: Remove obselete test for illegal
consume in an atomic_exchange.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/builtins.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/atomic-invalid.c


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2014-11-24 Thread filter-gcc at preshing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

--- Comment #23 from preshing filter-gcc at preshing dot com ---
Hi,

I went ahead and verified this bug using a cross-compiler built from GCC 4.9.2
sources. The bug indeed exists and happens when compiling for AArch64, but not
PowerPC. Andrew's patch fixes it (changing the first ldr instruction to an ldar
in this case). Full AArch64 assembly listings below.

I've also written a blog post on this subject in the hope of clarifying the
issue for anyone determined enough to make sense of it:
http://preshing.com/20141124/fixing-gccs-implementation-of-memory_order_consume

Andrew's patch, if it works the way I understand it, seems like the correct
thing for GCC to do until somebody figures out how to safely implement the
efficient compiler strategy for consume semantics.

I guess the next step is to run the test suite on a few platforms to make sure
there are no regressions, then submit?

Cheers,
Jeff

-- AArch64 listing of threadB() without Andrew's patch:

_Z7threadBv:
.LFB2304:
.cfi_startproc
adrpx1, .LANCHOR0
stp x29, x30, [sp, -16]!
.cfi_def_cfa_offset 16
.cfi_offset 29, -16
.cfi_offset 30, -8
add x1, x1, :lo12:.LANCHOR0
add x29, sp, 0
.cfi_def_cfa_register 29
.L10:
add x0, x1, 8
ldr w0, [x0]
cbz w0, .L10
ldr w0, [x1]
cmp w0, 1
bne .L15
str wzr, [x1]
add x0, x1, 8
stlrwzr, [x0]
b   .L10
.L15:
adrpx3, .LANCHOR1
adrpx0, .LC2
adrpx1, .LC1
add x3, x3, :lo12:.LANCHOR1
add x0, x0, :lo12:.LC2
add x1, x1, :lo12:.LC1
mov w2, 47
add x3, x3, 16
bl  __assert_fail
.cfi_endproc

-- AArch64 listing of threadB() with Andrew's patch:

_Z7threadBv:
.LFB2304:
.cfi_startproc
adrpx1, .LANCHOR0
stp x29, x30, [sp, -16]!
.cfi_def_cfa_offset 16
.cfi_offset 29, -16
.cfi_offset 30, -8
add x1, x1, :lo12:.LANCHOR0
add x29, sp, 0
.cfi_def_cfa_register 29
.L10:
add x0, x1, 8
ldarw0, [x0]
cbz w0, .L10
ldr w0, [x1]
cmp w0, 1
bne .L15
str wzr, [x1]
add x0, x1, 8
stlrwzr, [x0]
b   .L10
.L15:
adrpx3, .LANCHOR1
adrpx0, .LC2
adrpx1, .LC1
add x3, x3, :lo12:.LANCHOR1
add x0, x0, :lo12:.LC2
add x1, x1, :lo12:.LC1
mov w2, 47
add x3, x3, 16
bl  __assert_fail
.cfi_endproc


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2014-10-30 Thread torvald at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

--- Comment #21 from torvald at gcc dot gnu.org ---
(In reply to torvald from comment #17)
 (In reply to Andrew Macleod from comment #15)
  So have we concluded that we should promote memory_order_consume to
  memory_order_acquire for now?  
 
 I also think that this is the best way forward.  I believe everyone in ISO
 C++ SG1 agreed that this is basically a defect in the standard.

To clarify, my impression from the discussion was that there was general
consensus that memory_order_consume as specified now wouldn't achieve what it
was intended to in practice, due to the implementation issues.  IOW, it's not
useful even though it's not a defect in the sense of being inconsistent or
plain wrong.


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2014-10-30 Thread filter-gcc at preshing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

preshing filter-gcc at preshing dot com changed:

   What|Removed |Added

 CC||filter-gcc at preshing dot com

--- Comment #22 from preshing filter-gcc at preshing dot com ---
(In reply to algrant from comment #13)
 I see what you mean - there is a race if threadB reads the data when the
 flag is not set, i.e. in the case when the read value is never used.
 Moving the read of data after the test (as in the fragment in comment 4)
 would fix the race.

algrant, could you post the complete assembly listing for threadB(), using your
version of AArch64 g++, after all the corrections to the example have been
taken into account, with -O2? For convenience, I've posted the corrected
example below.

When I compile the example using PowerPC g++ 4.9.1, it seems that
memory_order_consume is already promoted to memory_order_acquire, as evidenced
by the cmpw;bne-;isync sequence following the load from flag, which acts as an
acquire barrier. I've posted the PowerPC assembly listing below as well.

Is the bug specific to AArch64?

-

/* Reproducer for 59448: compile with -std=c++11 -lpthread -O2 */

#include atomic
#include thread
#include stdio.h
#include assert.h

static std::atomicunsigned int flag(0);
static int data;

/*
Thread A tries to release a data value 1 to thread B via a
release/consume sequence.

In the demonstration, this is tried repeatedly.  The iterations of
the two threads are synchronized via a release/acquire sequence
from B to A.  This is not relevant to the ordering problem.
*/

void threadA(void)
{
  for (;;) {
data = 1;   // A.A
flag.store(1, std::memory_order_release);   // A.B
/* By 1.9#14, A.A is sequenced before A.B */
/* Now wait for thread B to accept the data. */
while (flag.load(std::memory_order_acquire) == 1);
assert(data == 0);
  }
}

void threadB(void)
{
  for (;;) {
int f, d;
do {
  f = flag.load(std::memory_order_consume); // B.A
} while (!f);
/* By 1.10#9, B.A carries a dependency to B.B */
/* By 1.10#10, A.B is dependency-ordered before B.B */
/* By 1.10#11, A.A intra-thread happens before B.B */
/* By 1.10#12, A.A happens before B.B */
/* By 1.10#13, A.A is a visible side-effect with respect to B.B
   and the value determined by B.B shall be the
   value (1) stored by A.A. */
d = *(data + (f - f)); // B.B
assert(d == 1);
/* Release thread A to start another iteration. */
data = 0;
flag.store(0, std::memory_order_release);
  }
}

int main(void)
{
  std::thread a(threadA);
  std::thread b(threadB);
  a.join();
  b.join();
  return 0;
}

- PowerPC listing of threadB():

_Z7threadBv:
.LFB2302:
.cfi_startproc
lis 10,.LANCHOR0@ha
li 8,0
la 10,.LANCHOR0@l(10)
.L16:
lwz 9,4(10)
cmpw 7,9,9  --- acquire barrier
bne- 7,$+4  ---
isync   ---
cmpwi 7,9,0
beq+ 7,.L16
lwz 9,0(10)
cmpwi 7,9,1
bne- 7,.L22
stw 8,0(10)
lwsync
stw 8,4(10)
b .L16
.L22:
mflr 0
lis 6,.LANCHOR1@ha
stwu 1,-16(1)
.cfi_def_cfa_offset 16
.cfi_register 65, 0
lis 3,.LC2@ha
lis 4,.LC1@ha
la 6,.LANCHOR1@l(6)
la 3,.LC2@l(3)
la 4,.LC1@l(4)
li 5,47
addi 6,6,16
stw 0,20(1)
.cfi_offset 65, 4
bl __assert_fail
.cfi_endproc


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2014-10-29 Thread torvald at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

--- Comment #20 from torvald at gcc dot gnu.org ---
(In reply to jos...@codesourcery.com from comment #19)
 * carries_dependency is about increasing optimization, not decreasing it.  
 If it affects when barriers are added, it does so by allowing some 
 barriers to be omitted that would otherwise be required.

That's not quite true, unfortunately, AFAIU.  I agree that generally,
attributes are supposed to be ignorable -- but in the case of
carries_dependency, C++11 explicitly requires that all declarations of a
function either have the attribute or none has (see 7.6.4p2).  This is because
you need that to actually exploit the attribute; it's a contract between
callers and callees.  If a compiler does try to preserve dependencies (or not
across function boundaries), then ignoring the attribute is fine.  But if there
should be a compiler out there that doesn't, and GCC-generated code is supposed
to link to that other compiler's code, then we need to do something to make
this work.


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2014-10-28 Thread ramana at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

Ramana Radhakrishnan ramana at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2014-10-28
 CC||ramana at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #14 from Ramana Radhakrishnan ramana at gcc dot gnu.org ---
This appears to have now morphed into a standards issue and possibly affects
all targets that have a weak memory model. Anyway confirming this.


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2014-10-28 Thread amacleod at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

--- Comment #15 from Andrew Macleod amacleod at redhat dot com ---
Created attachment 33831
  -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=33831action=edit
promote memory order consume to acquire

So have we concluded that we should promote memory_order_consume to
memory_order_acquire for now?  

That change should be fairly trivial. builtins.c::get_memmodel would simply
change any consumes to acquire.  All consumes would then be promoted to
acquire.

I don't know if libatomic does much with consume, but once compiled with this
patch in the compiler, it will also generate acquire code for consumes.

Here's an untested patch provided if someone wants to experiment.


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2014-10-28 Thread joseph at codesourcery dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

--- Comment #16 from joseph at codesourcery dot com joseph at codesourcery dot 
com ---
My view is promote consume to acquire until the standards committees and 
formal model people have sorted out what to do with it.


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2014-10-28 Thread torvald at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

--- Comment #17 from torvald at gcc dot gnu.org ---
(In reply to Andrew Macleod from comment #15)
 So have we concluded that we should promote memory_order_consume to
 memory_order_acquire for now?  

I also think that this is the best way forward.  I believe everyone in ISO C++
SG1 agreed that this is basically a defect in the standard.

What I haven't thought through is how to deal with with carries_dependency
(7.6.4 in C++11): For GCC code generated after we promote consume to acquire,
it can safely be ignored; but should GCC code be linked to code generated by
another compiler that does not promote and expects the code to preserve
dependencies, this won't work.

I am not aware of any shipping compiler that would actually try to preserve
dependencies, and nobody else mentioned any during the discussion of this topic
in ISO C++ SG1.  Thus, we could assume that there are no such other compilers,
and make it part of the ABI (assumptions) that consume is promoted to acquire
in a correct compiler.

Alternatively, we could try to be conservative and add an acquire barrier
before the function body if any parameter of the function has
carries_dependency; and, likewise, add an acquire barrier after every call to a
function which has carries_dependency.

I don't have more input from the ISO C side, but I would guess that the
situation there is similar.

Thoughts?


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2014-10-28 Thread t.p.northover at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

--- Comment #18 from Tim Northover t.p.northover at gmail dot com ---
 I am not aware of any shipping compiler that would actually try to preserve 
 dependencies, and nobody else mentioned any during the discussion of this 
 topic in ISO C++ SG1.

In case the data point is useful, Clang promotes consume to acquire at the
moment (though I wouldn't rule out bugs even in that choice).


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2014-10-28 Thread joseph at codesourcery dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

--- Comment #19 from joseph at codesourcery dot com joseph at codesourcery dot 
com ---
On Tue, 28 Oct 2014, torvald at gcc dot gnu.org wrote:

 Alternatively, we could try to be conservative and add an acquire barrier
 before the function body if any parameter of the function has
 carries_dependency; and, likewise, add an acquire barrier after every call to 
 a
 function which has carries_dependency.
 
 I don't have more input from the ISO C side, but I would guess that the
 situation there is similar.

As previously noted:

* ISO C does not have carries_dependency (or attributes at all).

* carries_dependency is about increasing optimization, not decreasing it.  
If it affects when barriers are added, it does so by allowing some 
barriers to be omitted that would otherwise be required.


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2014-02-17 Thread algrant at acm dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

--- Comment #11 from algrant at acm dot org ---
Where do you get that this is racy if the access to data is not atomic?  By
design, release/acquire and release/consume sequences don't require wholesale
changes to the way the data payload (in the general case, multiple fields
within a structure) is first constructed and then used.  1.10#13 makes clear
that as a result of the intra-thread sequencing between atomic and non-atomic
operations (1.9#14), and the inter-thread ordering between atomic operations
(1.10 various), there is a resulting ordering on operations to ordinary (sic)
objects.  Please see the references to the C++ standard in the source example,
for the chain of reasoning here.


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2014-02-17 Thread torvald at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

--- Comment #12 from torvald at gcc dot gnu.org ---
(In reply to algrant from comment #11)
 Where do you get that this is racy if the access to data is not atomic?

In threadB(), you do:

  f = flag.load(std::memory_order_consume); // B.A
  d = *(data + f - f); // B.B

That reads from data irrespective of the value of f, so will read when data is
actually not owned by threadB.  You can either make the accesses to data
atomic, or move the load from data to after checking that flag equals 1.

 By
 design, release/acquire and release/consume sequences don't require
 wholesale changes to the way the data payload (in the general case, multiple
 fields within a structure) is first constructed and then used.  1.10#13
 makes clear that as a result of the intra-thread sequencing between atomic
 and non-atomic operations (1.9#14), and the inter-thread ordering between
 atomic operations (1.10 various), there is a resulting ordering on
 operations to ordinary (sic) objects.  Please see the references to the
 C++ standard in the source example, for the chain of reasoning here.

I'm very much aware of the rules, I believe.  But as far as you can see,
there's a data race in your test program, which results in undefined behavior.


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2014-02-17 Thread algrant at acm dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

--- Comment #13 from algrant at acm dot org ---
I see what you mean - there is a race if threadB reads the data when the
flag is not set, i.e. in the case when the read value is never used.
Moving the read of data after the test (as in the fragment in comment 4)
would fix the race.


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2014-01-23 Thread torvald at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

torvald at gcc dot gnu.org changed:

   What|Removed |Added

 CC||rth at gcc dot gnu.org

--- Comment #10 from torvald at gcc dot gnu.org ---
(In reply to algrant from comment #6)
 Here is a complete C++11 test case.  The code generated for the two loads
 in thread B doesn't maintain the required ordering:

The C++ test has a race condition because the nonatomic load from data is not
ordered wrt. the nonatomic store to data in the other thread.  Please fix that
and report back whether this changes anything.  (I believe the use of
relaxed-MO atomic accesses should be sufficient.)

Nonetheless, if there is a fixed (data-race-free) similar test, then I believe
this fixed test case to be valid in the sense that AFAIU the standards'
requirements (both C++11's and C11's; they are equal in this aspect), code as
in this test case is supposed to be carrying the dependencies forward.

However, I cannot see a *practical* implementation of this requirement.  AFAIU,
dependencies are carried forward through loads/stores too (5.1.2.4#14):
An evaluation A carries a dependency to an evaluation B if: [...]A writes a
scalar object or bit-field M, B reads from M the value written by A, and A is
sequenced before B.
Because a compiler cannot, in general, analyze which piece of code stored last
to M, it would have to conservatively assume that a dependency is carried
through M.  In turn, it could never optimize expressions such as (x-x), at
least if those affect atomic operations on M in some way, irrespective of how
remotely that might have happened.  (I'm restricting my statement to atomic
accesses because I'm not quite sure whether there's some other reason why
nonatomic accesses couldn't be affected by this.)  OTOH, maybe there are
factors that I'm not aware of and that effectively constrain
carries-a-dependency to something that doesn't widely prohibit optimizations.

I don't have significant background information on the design of this part of
the memory model (nor the implementability of this part), but I'll ask around
at the C++ meeting in two weeks.

 According to the architecture specification, to achieve the ordering it's
 sufficient to use the result of the first load in the calculation of the
 address of the second, even if it's done in such a way as to have no
 dependence on the value.

And that's certainly a feasible *architecture* specification.  ISTM that the
language standard tried to model this, but then didn't give the compiler enough
leeway to still be able to optimize code.  The carries-a-depency rules seem
like an unusual shortcut from the language level to requirements on the
generated code, without the usual indirection using the observable behavior of
the abstract machine and as-if.  For example -- and this isn't a precise
definition of course -- if the standard would require that f (in your test
case) is needed to process the evaluation, then things like f-f could be
optimized out, and the compiler might just do the right thing with standard
optimizations.

As a short-term fix, I believe we can promote all memory_order_consume atomics
to memory_order_acquire, when generating code with consume-MO atomics.  GCC is
at fault here, not the code in the builtins or in libatomic.  Thus, this would
prevent errors when GCC compiled the code and a correct atomics implementation
is used (e.g., third-party libatomic).

This won't help when another (potentially correct) compiler generated the code
with the consume-MO atomics, and GCC-compiled code fails to carry the
dependencies forward.  We could try to change libatomic's consume-MO
implementation to prevent that, but this would any catch cases in which
libatomic is actually used and not the other compiler's builtins or similar. 
Thus, changing libatomic does not seem to be necessary.

Richard, Andrew: Thoughts?


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2014-01-20 Thread algrant at acm dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

--- Comment #6 from algrant at acm dot org ---
Here is a complete C++11 test case.  The code generated for the two loads
in thread B doesn't maintain the required ordering:

...
ldrbw1, [x0]
uxtbw1, w1
adrpx0, .LANCHOR0
ldr w2, [x3]
...

According to the architecture specification, to achieve the ordering it's
sufficient to use the result of the first load in the calculation of the
address of the second, even if it's done in such a way as to have no
dependence on the value.

/* Reproducer for 59448: compile with -std=c++11 -lpthread */

#include atomic
#include thread
#include stdio.h
#include assert.h

static std::atomicbool flag(0);
static int data;

/*
Thread A tries to release a data value 1 to thread B via a
release/consume sequence.

In the demonstration, this is tried repeatedly.  The iterations of
the two threads are synchronized via a release/acquire sequence
from B to A.  This is not relevant to the ordering problem.
*/

void threadA(void)
{
  for (;;) {
data = 1;   // A.A
flag.store(1, std::memory_order_release);   // A.B
/* By 1.9#14, A.A is sequenced before A.B */
/* Now wait for thread B to accept the data. */
while (flag.load(std::memory_order_acquire) == 1);
assert(data == 0);
  }
}

void threadB(void)
{
  for (;;) {
int f, d;
do {
  f = flag.load(std::memory_order_consume); // B.A
  d = *(data + f - f); // B.B
  /* By 1.10#9, B.A carries a dependency to B.B */
  /* By 1.10#10, A.B is dependency-ordered before B.B */
  /* By 1.10#11, A.A intra-thread happens before B.B */
  /* By 1.10#12, A.A happens before B.B */
  /* By 1.10#13, A.A is a visible side-effect with respect to B.B
 and the value determined by B.B shall be the
 value (1) stored by A.A. */
} while (!f);
assert(d == 1);
/* Release thread A to start another iteration. */
data = 0;
flag.store(0, std::memory_order_release);
  }
}

int main(void)
{
  std::thread a(threadA);
  std::thread b(threadB);
  a.join();
  b.join();
  return 0;
}


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2014-01-20 Thread pinskia at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

--- Comment #7 from Andrew Pinski pinskia at gcc dot gnu.org ---
*(data + f - f);  

This could be undefined behavior if f is not 0 or 1.


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2014-01-20 Thread algrant at acm dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

--- Comment #8 from algrant at acm dot org ---
I don't see how f can not be 0 or 1 here, but to make this even more clear
that there is no UB, define flag this way:

  static std::atomicunsigned int flag(0);

and calculate the address this way:

  d = *(data + (f - f)); // B.B

The ordering requirements are the same, and there is no possibility of UB.


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2014-01-20 Thread joseph at codesourcery dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

--- Comment #9 from joseph at codesourcery dot com joseph at codesourcery dot 
com ---
Thanks for the testcase.  At this point I think we need Andrew MacLeod or 
Torvald Riegel to review it and assess what should happen where to fix 
this (both for direct uses of the atomic built-in functions, and for 
_Atomic / std::atomic).

Since the original bug report it has come to my attention that there are 
known ambiguities in how the C11/C++11 memory model is mapped to processor 
operations on some architectures, which will need to be addressed in 
platform ABIs for interoperation between implementations (when one thread 
is built with one implementation and the other thread with another 
implementation, for example).

http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

(It is not clear to me if research in this area is sufficiently advanced 
to provide a good way for ABIs to specify this that is unambiguous and 
does not inhibit optimizations within a single implementation.)

It will also of course be necessary for ABIs to specify such things as 
alignment for C11 _Atomic for that to interoperate between implementations 
(for GCC at present, if the size of the type is 8 / 16 / 32 / 64/ 128 
bits, the alignment gets increased to match that of the corresponding core 
type).  But first it may take a while for things to work reliably within 
single implementations.


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2013-12-16 Thread joseph at codesourcery dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

--- Comment #5 from joseph at codesourcery dot com joseph at codesourcery dot 
com ---
On Thu, 12 Dec 2013, algrant at acm dot org wrote:

 demonstrates the same lack of ordering.  You suggest that this might
 be a problem with the atomic built-ins - and yes, if this had been a
 load-acquire, it would be a problem with the built-in not introducing a
 barrier or using a load-acquire instruction.  But for a load-consume on
 this architecture, no barrier is necessary to separate the load-consume
 from a load that is address-dependent on it.  The programmer wrote a
 dependency but the compiler lost track of it.

address-dependent is not a C standard concept.  As far as I can tell, at 
least as regards C there are no such ordering constraints between 
non-atomic operations, only between operations at least one of which is 
atomic - thus, it is the responsibility of the atomic built-in functions 
to ensure whatever ordering may be required.  (Whereas the parts of the 
memory model defining what counts as a memory location *do* have 
implications in the absence of atomics, restricting the code sequences 
that can be used for struct modifications and preventing speculative 
stores.)

 It's not necessary to demonstrate failure - there's an architectural 
 race condition here.  Even if it doesn't fail now there's no guarantee
 it will never fail on future more aggressively reordering cores.

You still need to provide a testcase (a complete program that can be 
compiled and linked with current GCC) that (a) does not show undefined 
behavior, (b) that, you justify by reference to the standard definitions 
and how they apply to source code constructs, must exhibit specific 
observable behavior (values printed, assertions passed, etc. - *not* just 
ordering of loads at the architectural level), and (c) that, you justify 
by reference to the architecture definition if not to actual observed 
failure, could fail to meet the requirements for observable behavior, 
given the code generated by GCC and the behavior permitted by the 
architecture for that code.

I am unable to tell what code you envisage running in another thread or 
what observable failure you think could result from lack of ordering, 
because you have not provided a complete testcase.  Thus, I am unable to 
tell if there is a genuine bug here at all.  The standard definitions 
associated with atomicity are extremely complicated; you need to be very 
careful about identifying exactly how particular definitions apply to 
particular source code constructs and so how you deduce the requirements 
on behavior of a particular program, for a bug report to be of any use.


[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency

2013-12-12 Thread algrant at acm dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

--- Comment #4 from algrant at acm dot org ---
So using g++,

  #include atomic
  int f1(std::atomicint const *p, std::atomicint const *q)
  {
int flag = p-load(std::memory_order_consume);
return flag ? (q + flag - flag)-load(std::memory_order_relaxed) : 0;
  }

demonstrates the same lack of ordering.  You suggest that this might
be a problem with the atomic built-ins - and yes, if this had been a
load-acquire, it would be a problem with the built-in not introducing a
barrier or using a load-acquire instruction.  But for a load-consume on
this architecture, no barrier is necessary to separate the load-consume
from a load that is address-dependent on it.  The programmer wrote a
dependency but the compiler lost track of it.

It's not necessary to demonstrate failure - there's an architectural 
race condition here.  Even if it doesn't fail now there's no guarantee
it will never fail on future more aggressively reordering cores.