[EMAIL PROTECTED] wrote on 17/07/2005 08:30:26:

> Michael Veksler <[EMAIL PROTECTED]> writes:
>
> | Gabriel Dos Reis wrote on 17/07/2005 06:07:29:
> |
> | > Daniel Berlin <[EMAIL PROTECTED]> writes:
> | >
> | > | Anything it sees anything in a statement with volatile, it marks the
> | > | statement as volatile, which should stop things from touching it
> | > | (anything that *does* optimize something marked volatile is buggy).
> | > great!
> | >
> |
> | I can't agree with that as is. I would refine it to:
> |   Anything that *does* optimizes away visible reads or writes of
> |   something marked volatile is buggy.
>
> How do you define "visible reads or writes", and how is it different
> from Daniel's statement?

Visible reads or writes are to be defined by the hardware.
The most precise definition would include cache coherency
and other very complicated hardware issues. For now I'll
only say that "visible reads or writes" follow the "as if"
rule from the hardware's point of view (equivalent cache/bus
activity for external memory addresses, a bit more tricky
to define for internal memory addresses).

From Daniel's statement it follows that you are not
allowed to do what PR 3506 wants to do:

To generate:
        incl      y
Instead of:
        movl    y, %eax
        incl      %eax
        movl    %eax, y

Because "anything that *does* optimize something marked volatile
is buggy" [Daniel Berlin].

I say that it is perfectly legal to do so, provided that "incl y"
is not atomic.
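
For reference, a minimal sketch of the kind of C source under
discussion. The PR's test case is not quoted in this thread, so
bump() and bump_split() are hypothetical names, and y is simply the
symbol that appears in the assembly. The point is only that both
instruction sequences perform exactly one read and one write of y:

```c
/* Hypothetical stand-in for the volatile object; the name y is taken
   from the quoted assembly, not from the PR itself. */
volatile int y;

/* Source-level increment: one read of y and one write of y. */
void bump(void)
{
    y++;
}

/* The same accesses spelled out, mirroring movl / incl / movl. */
void bump_split(void)
{
    int tmp = y;   /* movl y, %eax : the one visible read  */
    tmp++;         /* incl %eax    : register only         */
    y = tmp;       /* movl %eax, y : the one visible write */
}
```

Whether a compiler emits "incl y" or the three-instruction form for
bump() is exactly the question here; the sketch takes no side on it.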

My statements are correct as long as the following is allowed to
become an infinite loop:


1:f()
2:{
3:    static volatile int i=0;
4:    while (i <= 1 && i >=0)
5:    {
6:        ++i;
7:        --i;
8:    }
9:}

When f() is simultaneously called from 2 threads.

A possible run before the transformation:

time=0: i==0 ;  thread 0 [line 6]: reg=i ; ++reg ; next(i) == 0
time=1: i==0 ;  thread 1 [line 6]: reg=i ; ++reg ; next(i) == 0
time=2: i==0 ;  thread 1 [line 6]: i=reg ; next(i) == 1
time=3: i==1 ;  thread 0 [line 6]: i=reg ; next(i) == 1
time=4: i==1 ;  thread 0 [line 7]: reg=i ; --reg ; i=reg ; next(i) == 0
time=5: i==0 ;  thread 1 [line 7]: reg=i ; --reg ; i=reg ; next(i) == -1
time=6: i==-1 ; thread 0 [line 4]: i >= 0 is false: break loop


After the transformation, "inc i" will be an atomic operation
on a single processor. No two ++i will then be able to overlap
(on a single processor), so times [0..3] can no longer
interleave. With no lost update, each thread always sees i as
0 or 1 at line 4, and this becomes an infinite loop.
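
The difference can be replayed deterministically, without real
threads. What follows is a rough sketch, not a model of any real
scheduler: struct sim_thread, load(), store(), run_split() and
run_indivisible() are all hypothetical names, and the schedule is
hard-coded. It replays the lost update at times 0..3, lets both
threads run --i, and then does the same with indivisible steps:

```c
#include <assert.h>

/* Shared counter standing in for "static volatile int i". */
static int shared_i;

/* Hypothetical model of one thread: a private register, like %eax. */
struct sim_thread { int reg; };

static void load(struct sim_thread *t)        { t->reg = shared_i; }
static void store(const struct sim_thread *t) { shared_i = t->reg; }

/* The loop condition from line 4. */
static int loop_continues(int v) { return v <= 1 && v >= 0; }

/* ++i and --i as separate load / modify / store steps. */
static int run_split(void)
{
    struct sim_thread t0 = {0}, t1 = {0};
    shared_i = 0;

    load(&t0); t0.reg++;   /* time 0: thread 0 reads 0 */
    load(&t1); t1.reg++;   /* time 1: thread 1 reads 0 */
    store(&t1);            /* time 2: i becomes 1 */
    store(&t0);            /* time 3: i is 1 again, one increment is lost */
    assert(shared_i == 1);

    load(&t0); t0.reg--; store(&t0);   /* thread 0 runs --i: i becomes 0 */
    load(&t1); t1.reg--; store(&t1);   /* thread 1 runs --i: i becomes -1 */
    return shared_i;
}

/* ++i and --i as indivisible steps, in the same thread order. */
static int run_indivisible(void)
{
    shared_i = 0;
    shared_i++;   /* thread 0 ++i */
    shared_i++;   /* thread 1 ++i */
    shared_i--;   /* thread 0 --i */
    shared_i--;   /* thread 1 --i */
    return shared_i;
}
```

Here loop_continues(run_split()) is false, because the lost
increment leaves i at -1, while loop_continues(run_indivisible())
is true, because i returns to 0 and the loop never exits.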

With MP or with a multi-threaded processor [inc i] will not be
atomic (correct me if I am wrong WRT x86), so the above
run is still possible - even after transformation.


Considering the above example, do you think that this
transformation is invalid?


  Michael
