On 10/21/07, Tomash Brechko <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I have a question regarding the thread-safeness of a particular GCC
> optimization.  I'm sorry if this was already discussed on the list, if
> so please provide me with the reference to the previous discussion.
>
> Consider this piece of code:
>
>     extern int v;
>
>     void
>     f(int set_v)
>     {
>       if (set_v)
>         v = 1;
>     }
>
> If f() is called concurrently from several threads, then the call to
> f(1) should be protected by a mutex.  But do we have to acquire the
> mutex for f(0) calls?  I'd say no, since there's no access to the
> global v in that case.  But GCC 3.3.4--4.3.0 on i686 with -O1
> generates the following:
>
>     f:
>             pushl   %ebp
>             movl    %esp, %ebp
>             cmpl    $0, 8(%ebp)
>             movl    $1, %eax
>             cmove   v, %eax        ; load (maybe)
>             movl    %eax, v        ; store (always)
>             popl    %ebp
>             ret
>
> Note the last unconditional store to v.  Now, if some thread modifies
> v between our load and store (acquiring the mutex first), then we will
> overwrite the new value with the old one (and will do so in a
> thread-unsafe manner, without acquiring the mutex).
>
> So, do the calls to f(0) require the mutex, or is it a GCC bug?
...
> So, could someone explain to me why this GCC optimization is valid,
> and, if so, where the boundary lies below which I may safely assume
> GCC won't try to store to objects that aren't stored to explicitly
> during a particular execution path?  Or maybe the named bug report is
> valid after all?

Hello Tomash,

I'm not an expert in the C89/C99 standards, but I have written a Ph.D.
thesis on the subject of memory models. What I learned while writing
that thesis is the following:

- If you want to know which optimizations are valid and which ones are
not, you have to look at the semantics defined in the language
standard.

- Every language standard defines the result of executing a sequential
program. The definition of the behavior of a multithreaded program
written in a given programming language is called the memory model of
that language.

- The memory model of C and C++ is still under discussion, as has
already been pointed out on this mailing list.

- Although the memory model for C and C++ is still under discussion,
there is a definition for the behavior of multithreaded C and C++
programs. The following is required by the ANSI/ISO C89 standard (from
paragraph 5.1.2.3, Program Execution):
  Accessing a volatile object, modifying an object, modifying a file,
  or calling a function that does any of those operations are all side
  effects, which are changes in the state of the execution environment.
  Evaluation of an expression may produce side effects. At certain
  specified points in the execution sequence called sequence points,
  all side effects of previous evaluations shall be complete and no
  side effects of subsequent evaluations shall have taken place. (A
  summary of the sequence points is given in annex C.)

Annex C explains that, among other things, a call to a function (after
argument evaluation) is a sequence point.

See also http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.pdf
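
To illustrate (a minimal sketch with hypothetical names, not taken from
your mail): because a function call is a sequence point, a side effect
such as a store to a volatile object must be complete before the call
takes place:

    extern volatile int flag;               /* volatile: each access is a side effect */
    extern void notify_other_thread(void);  /* hypothetical external function */

    void
    publish(void)
    {
      flag = 1;               /* side effect: must be complete at the next sequence point */
      notify_other_thread();  /* the call (after argument evaluation) is a sequence
                                 point, so the store to flag may not be delayed past it */
    }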

- The above paragraph does not impose any limitation on the compiler
with regard to optimizations on non-volatile variables. In other words,
the generated code shown in your mail is allowed by that paragraph.
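
For illustration, the -O1 assembly quoted above corresponds roughly to
the following C (my own rendering of the transformation, not actual
compiler output):

    extern int v;

    /* The optimizer has turned the conditional store into a
       conditional load plus an unconditional store. */
    void
    f_as_compiled(int set_v)
    {
      int tmp = set_v ? 1 : v;  /* cmove: load v only when set_v == 0 */
      v = tmp;                  /* movl: store to v on every call, even f(0) */
    }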

- The above paragraph also has the following implications for volatile
variables:
  * There exists a total order for all accesses to all volatile variables.
  * It is the responsibility of the compiler to ensure cache coherency
for volatile variables. If memory barrier instructions are needed to
ensure cache coherency on the architecture for which the compiler is
generating code, then it is the responsibility of the compiler to
generate these instructions for volatile variables. This fact is often
overlooked.
  * The compiler must generate code such that exactly one store
statement is executed for each assignment to a volatile variable.
Prefetching volatile variables is allowed as long as it does not
violate paragraph 5.1.2.3 from the language definition.
  * As is known, the compiler may reorder function calls and
assignments to non-volatile variables if it can prove that the called
function won't modify that variable. This becomes problematic if the
variable is modified by more than one thread and the called function
is a synchronization function, e.g. pthread_mutex_lock(). This kind of
reordering is highly undesirable. This is why any variable that is
shared between threads has to be declared volatile, even when using
explicit locking calls (see the sketch below this list).
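
Applied to the f() from your mail, the advice above would look roughly
like this (a sketch only; the mutex name and its initialization are my
own additions):

    #include <pthread.h>

    extern volatile int v;  /* volatile: exactly one store per assignment,
                               and no speculative stores */

    /* Hypothetical mutex protecting v. */
    static pthread_mutex_t v_mutex = PTHREAD_MUTEX_INITIALIZER;

    void
    f(int set_v)
    {
      if (set_v)
        {
          pthread_mutex_lock(&v_mutex);   /* sequence point: the volatile store
                                             may not be moved across this call */
          v = 1;
          pthread_mutex_unlock(&v_mutex);
        }
      /* For f(0) nothing touches v, and with v declared volatile the
         compiler may not introduce the unconditional store seen in the
         -O1 output above. */
    }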

I hope the above brings more clarity to this discussion.

Bart Van Assche.
