On Tue, Aug 22, 2017 at 3:28 PM, Pekka Jääskeläinen <pe...@parmance.com> wrote:
> Hi Richard and Joseph,
>
> Replies for both inline:
>
> I wrote:
>>> Both the inputs and outputs must be flushed to zero in the HSAIL’s
>>> ‘ftz’ semantics.
>>> FTZ operations were previously always “explicit” in the BRIG FE output,
>>> like you propose here; there were builtin calls injected for all inputs
>>> and the output of ‘ftz’-marked float HSAIL instructions. This is still
>>> provided as a fallback for targets which do not support a CPU mode flag.
>
> On Mon, Aug 14, 2017 at 1:17 PM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> I see.  But how does making them implicit fix cases in the conformance
>> testsuite?  That is, isn't the error in the runtime implementation of
>> __hsail_ftz_*?  I'd have used a "simple" [...]
>
> There are two parts in the story here:
>
> 1) Making FTZ/DAZ “the default”, meaning no builtin calls or similar are
> used to flush the operands/results; instead we rely on the runtime turning
> on the FTZ/DAZ CPU flags before executing this code. This is purely a
> performance optimization because those FTZ/DAZ builtin calls (three per
> HSAIL instruction) ruin the performance for multiple reasons. We implemented
> this optimization already in our staging branch of the BRIG FE.
>
> 2) Ensuring GCC does not perform certain compile-time optimizations under
> the assumption that FTZ/DAZ is optional, but instead assumes that ftz must
> happen for correctness. The proposed patch addresses this part on the
> compiler side by disabling the currently known optimizations that are
> invalid when values should be flushed at runtime, i.e. when “ftz denorm
> math” is desired.
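The two parts above can be illustrated with a minimal sketch. It assumes an x86-64 host, where the FTZ and DAZ bits live in the MXCSR register and can be set with the SSE intrinsic macros; the helper names enable_ftz_daz and mul_runtime are hypothetical and are not taken from the BRIG FE or its runtime. On other targets the sketch does nothing.

```c
#if defined(__x86_64__)
#include <xmmintrin.h>  /* _MM_SET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>  /* _MM_SET_DENORMALS_ZERO_MODE */
#define HAVE_FTZ_DAZ 1
#else
#define HAVE_FTZ_DAZ 0
#endif

/* Hypothetical runtime hook: turn on the FTZ and DAZ CPU flags.
   Returns 1 if the flags were set, 0 if this target is not handled. */
static int enable_ftz_daz(void)
{
#if HAVE_FTZ_DAZ
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);         /* flush subnormal results */
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON); /* treat subnormal inputs as zero */
    return 1;
#else
    return 0;
#endif
}

/* The volatile copies force the multiplication to happen at runtime.
   Without them the compiler may fold the product with gradual-underflow
   semantics, which is the mismatch part 2 is about. */
static float mul_runtime(float a, float b)
{
    volatile float x = a, y = b;
    return x * y;
}
```

With the flags on, mul_runtime(1e-40f, 1.0f) yields zero at runtime, whereas folding the same expression at compile time under default IEEE semantics would yield the subnormal 1e-40f.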
>
>>> The problem with a special FTZ ‘operation’ of some kind in the generic
>>> output is that the basic optimizations get confused by a new operation
>>> and we’d need to add knowledge of the ‘FTZ’ operation to a bunch of
>>> existing optimizer code, which seems unnecessary to support this case as
>>> the optimizations typically apply also for the ‘FTZ semantics’ when the
>>> FTZ/DAZ flag is on.
>>
>> Apart from the exceptions you needed to guard ... do you have an example
>> of a transform that is confused by explicit FTZ and that would be valid if
>> that FTZ were implicit?  An explicit FTZ should be much safer.  I think the
>> builtins should also be CONST and not only PURE.
>
> Explicit builtin calls ruin many optimizations starting from a simple
> common subexpression elimination if they don’t understand what the builtin
> returns for any given operand.

Calls to const functions are CSEd just fine (if they are passed the same
argument, that is).

int __attribute__((const)) foo (int i);

int main()
{
  return foo(1) + foo(1);
}

results in 2 * foo (1).

Note that I expected FTZ to be a tree code and not a builtin.  The target
can then choose to simply elide all FTZ.  Constant folding can then
also correctly handle FTZ in the places where it is relevant.
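To make concrete what an explicit FTZ operation would compute, and therefore what constant folding would have to evaluate for it, here is a minimal software sketch. It assumes IEEE-754 binary32 floats and assumes that the flushed zero keeps the sign of the input; the names ftz_f32 and add_ftz_f32 are hypothetical and are not the actual __hsail_ftz_* builtins.

```c
#include <math.h>

/* Hypothetical helper: replace a subnormal value by a zero of the same
   sign (sign preservation is an assumption); pass everything else through. */
static float ftz_f32(float x)
{
    if (fpclassify(x) == FP_SUBNORMAL)
        return signbit(x) ? -0.0f : 0.0f;
    return x;
}

/* Model of an 'ftz'-marked add: both inputs and the output are flushed,
   i.e. three flush operations per instruction. */
static float add_ftz_f32(float a, float b)
{
    return ftz_f32(ftz_f32(a) + ftz_f32(b));
}
```

For example, add_ftz_f32(0x1.8p-126f, -0x1p-126f) has two normal inputs but a subnormal exact result, so the output flush turns it into zero.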

> Thus, inlining the builtin function’s code would be needed first and there
> would be a lot of code inlined due to the abundance of ftz calls required
> and you cannot eliminate it all (as at compile time you don’t know if the
> operand is a denorm or not). Another approach would be to introduce special
> cases to the optimizations affected so they understand the FTZ builtin and
> might be able to remove the useless ones. This potentially touches _a lot_
> of code. And in the end, if the CPU could flush denorms efficiently using
> hardware (typically it’s faster to do FTZ in HW than gradual underflow so
> this is likely the case), any builtin call to do it that cannot be
> optimized away presents additional, possibly major, runtime overhead.

Understood.

> We tested if a simple common subexpression elimination case works with the
> ftz builtins and it didn’t. CONST didn’t help here.
>
> However, I understand your concern that there might be optimizations that
> still break the FTZ semantics if there are no explicit builtin calls, but
> we are prepared to fix them case by case if/when they appear. The attached
> updated patch fixes a few additional cases we noticed, e.g. it disables
> several constant folding cases.
>
> On Mon, Aug 14, 2017 at 2:30 PM, Joseph Myers <jos...@codesourcery.com> wrote:
>> Presumably this means that constant folding needs to know about those
>> semantics, both for operations with a subnormal floating-point argument
>> (whether or not the output is floating point, or floating point in the
>> same format), and those with such a result?
>> Can assignments copy subnormals without converting them to zero?  Should
>> comparisons flush input subnormals to zero before comparing?  Should
>> conversions e.g. from float to double convert a float subnormal input to
>> zero?
>
> I can answer yes to all of these questions.
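The consequences of those answers can be modelled in software. The sketch below assumes IEEE-754 binary32/binary64 and uses hypothetical helper names (ftz_f32, lt_ftz_f32, cvt_ftz_f32_to_f64); it only illustrates the semantics discussed and is not how the BRIG FE implements them.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: flush a subnormal float to a same-signed zero. */
static float ftz_f32(float x)
{
    if (fpclassify(x) == FP_SUBNORMAL)
        return signbit(x) ? -0.0f : 0.0f;
    return x;
}

/* Comparison: input subnormals are flushed before comparing. */
static bool lt_ftz_f32(float a, float b)
{
    return ftz_f32(a) < ftz_f32(b);
}

/* Conversion: a subnormal float input becomes zero, even though double
   could represent its value exactly. */
static double cvt_ftz_f32_to_f64(float x)
{
    return (double)ftz_f32(x);
}

/* Plain assignments are not modelled: per the answer above they may copy
   a subnormal unchanged. */
```

Under plain IEEE semantics 0.0f < 1e-40f is true and (double)1e-40f is nonzero, while lt_ftz_f32(0.0f, 1e-40f) is false and cvt_ftz_f32_to_f64(1e-40f) is zero, so a constant folder that ignores these rules produces different results.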

I think the flag approach isn't good here.  If we had a mode that doesn't
have denormals we could represent that, but it's the language frontend that
requires a certain semantic and thus it should impose those as IL details.
These days I'd rather not introduce global flags for semantic details of the
IL, as we are trying to get rid of the existing ones.

Richard.

> BR,
> Pekka
