Re: [RFC] Do not consider volatile asms as optimization barriers #1

Richard Biener Mon, 03 Mar 2014 02:16:26 -0800

On Mon, Mar 3, 2014 at 9:01 AM, Richard Sandiford
<rdsandif...@googlemail.com> wrote:
> Eric Botcazou <ebotca...@adacore.com> writes:
>>> Thanks for doing this.  FWIW I agree it's probably the best stop-gap fix.
>>> But the implication seems to be that unspec_volatile and volatile asms
>>> are volatile in different ways.  IMO they're volatile in the same way
>>> and the problems for volatile asms apply to unspec_volatile too.
>>
>> I disagree, we need a simple way for the RTL middle-end as well as the back-
>> ends to block most optimizations across a specific point (e.g. a non-local
>> label as in HP's fix) and UNSPEC_VOLATILE is the best candidate, at least in
>> the short term.
>
> I don't agree it's the best candidate if...
>
>>> E.g. although cse.c will flush the table for unspec_volatile,
>>> it isn't the case that unspec_volatile forces a containing function
>>> to save all call-saved registers.  That would be excessive for a plain
>>> blockage instruction.  So again we seem to be assuming one thing in places
>>> like cse.c and another in the register allocator.  Code that uses the DF
>>> framework will also assume that registers are not implicitly clobbered
>>> by an unspec_volatile:
>>> [...]
>>> Also, ira-lives.c (which tracks the liveness of both pseudo and hard
>>> registers) doesn't mention "volatile" at all.
>>
>> Yes, the definition of a blockage instruction is somewhat vague and I agree
>> that it shoudn't cause registers to be spilled.  But it needs to block most,
>> if not all, optimizations.
>
> ...it's so loosely defined.  If non-local labels are the specific problem,
> I think it'd be better to limit the flush to that.


non-local labels should block most optimizations by the fact they
are a receiver of control flow and thus should have an abnormal
edge coming into them.  If that's not the case (no abnormal edge)
then that's the bug to fix.

Otherwise I agree with Richard.  Please sit down and _exactly_ define
what 'volatile' in an asm provides for guarantees compared to non-volatile
asms.  Likewise do so for volatile UNSPECs.

A volatile shouldn't be a cheap way out of properly enumerating all
uses, defs and clobbers of a stmt.  If volatile is used to tell the
insn has additional uses/defs or clobbers to those explicitely given
the only reason that may be valid is because we cannot explicitely
enumerate those.  But we should fix that instead (for example with
the special register idea or by adding a middle-end wide "special"
"blockage" that you can use/def/clobber).

To better assess the problem at hand can you enumerate the cases
where you need that special easy "blockage" instruction?  With
testcases please?

Note that on GIMPLE even volatiles are not strictly ordered if they
don't have a dependence that orders them (that doesn't mean that
any existing transform deliberately re-orders them, but as shown
with the loop example below such re-ordering can happen
as a side-effect of a valid transform).

> I'm back to throwing examples around, sorry, but take the MIPS testcase:
>
>     volatile int x = 1;
>
>     void foo (void)
>     {
>       x = 1;
>       __builtin_mips_set_fcsr (0);
>       x = 2;
>     }
>
> where __builtin_mips_set_fcsr is a handy way of getting unspec_volatile.
> (I'm not interested in what the function does here.)  Even at -O2,
> the cse.c code successfully prevents %hi(x) from being shared,
> as you'd expect:
>
>         li      $3,1                    # 0x1
>         lui     $2,%hi(x)
>         sw      $3,%lo(x)($2)
>         move    $2,$0
>         ctc1    $2,$31
>         li      $3,2                    # 0x2
>         lui     $2,%hi(x)
>         sw      $3,%lo(x)($2)
>         j       $31
>         nop
>
> But put it in a loop:
>
>     void frob (void)
>     {
>       for (;;)
>         {
>           x = 1;
>           __builtin_mips_set_fcsr (0);
>           x = 2;
>         }
>     }
>
> and we get the rather bizarre code:
>
>         lui     $2,%hi(x)
>         li      $6,1                    # 0x1
>         move    $5,$0
>         move    $4,$2
>         li      $3,2                    # 0x2
>         .align  3
> .L3:
>         sw      $6,%lo(x)($2)
>         ctc1    $5,$31
>         sw      $3,%lo(x)($4)
>         j       .L3
>         lui     $2,%hi(x)
>
> Here the _second_ %hi(x), the 1 and the 2 have been hoisted but the first
> %hi(x) is reloaded each time.  So what's the correct behaviour here?
> Should the hoisting of the second %hi(x) have been disabled because the
> loop contains an unspec_volatile?  What about the 1 (from the first store)
> and the 2?
>
> If instead it was:
>
>    for (i = 0; i < 100; i++)
>
> would converting to a hardware do-loop be acceptable?
>
>>> So most passes assume that no pseudos or hard registers will be
>>> implicitly clobbered by unspec_volatile, just like for a volatile asm.
>>> And IMO that's right.  I think the rule should be the same for volatile
>>> asms and unspec_volatiles, and the same for registers as it already is for
>>> memory: if the instruction clobbers something, it should say so explicitly.
>>
>> IMO that would buy us nothing and, on the contrary, would add complexity 
>> where
>> there currently isn't.  We really need a simple blockage instruction.
>>
>>> Volatile itself should:
>>>
>>> (a) prevent deletion or duplication of the operation
>>> (b) prevent reordering wrt other volatiles
>>> (c) prevent the operation from being considered equivalent to any other
>>>     operation (even if it's structurally identical and has the same inputs)
>>>
>>> but nothing beyond that.
>>
>> Maybe UNSPEC_VOLATILE is a misnomer then and we should allow volatile UNSPECs
>> along the above lines.
>
> That'd be fine with me, especially since with the patch it sounds like
> using volatile asm would produce better code than a built-in function
> that expands to an unspec_volatile, whereas IMO the opposite should
> always be true.
>
> But I still think we need a similar list for what unspec_volatile
> means now, if we decide to keep something with the current meaning.
>
> Thanks,
> Richard

Re: [RFC] Do not consider volatile asms as optimization barriers #1

Reply via email to