On Thu, Oct 08, 2020 at 11:58:21AM +0200, Aldy Hernandez via Gcc-patches wrote:
> I am quoting my analysis from the PR.  Could an aarch64 expert pontificate
> here?
> 
> This test is checking the final assembly for a specific sequence.  I don't
> speak aarch64 assembly, but the IL is different coming out of evrp.
> 
> The first culprit is this difference in the mergephi1 dump:
> 
>    _9 = .CTZ (x_6(D));
> -  _10 = _9 & 31;
> +  _10 = _9;
> 
> These are unsigned ints, so assuming they are 32 bits on aarch64,
> __builtin_ctz is always less than 32.  This is because a CTZ of 0 is
> undefined according to the GCC manual:
> 
> [[
> Built-in Function: int __builtin_ctz (unsigned int x)
> 
>     Returns the number of trailing 0-bits in x, starting at the least
> significant bit position. If x is 0, the result is undefined.

In user __builtin_ctz yes, but it isn't that easy.

Aarch64 defines CTZ_DEFINED_VALUE_AT_ZERO to 2 and sets value to bitsize
of the mode (i.e. 32 in this case), 

And documentation says:
 -- Macro: CLZ_DEFINED_VALUE_AT_ZERO (MODE, VALUE)
 -- Macro: CTZ_DEFINED_VALUE_AT_ZERO (MODE, VALUE)
     A C expression that indicates whether the architecture defines a
     value for 'clz' or 'ctz' with a zero operand.  A result of '0'
     indicates the value is undefined.  If the value is defined for only
     the RTL expression, the macro should evaluate to '1'; if the value
     applies also to the corresponding optab entry (which is normally
     the case if it expands directly into the corresponding RTL), then
     the macro should evaluate to '2'.  In the cases where the value is
     defined, VALUE should be set to this value.

At least for *_DEFINED_VALUE_AT_ZERO (*) == 2, I'm afraid VRP can't
assume it is undefined at zero, but needs to use a range that covers
both the values for non-zero operand and the VALUE from
CTZ_DEFINED_VALUE_AT_ZERO, so [0, 32] range in this case, because
earlier GIMPLE passes could have already optimized away e.g. a != 0 check.
See simplify_count_trailing_zeroes in forwprop.

Perhaps another way out of this would be document and enforce that
__builtin_c[lt]z{,l,ll} etc calls are undefined at zero, but C[TL]Z ifn
calls are defined there based on *_DEFINED_VALUE_AT_ZERO (*) == 2, and then
we would need to make sure that e.g. in simplify_count_trailing_zeroes
we emit a .CTZ or __builtin_ctz call depending on whether it is undefined
there or not; or give .C[TL]Z an additional argument (0/1) which would tell
if it is defined at zero or not.

We have several enhancement reports in bugzilla from Gabriel Ravier on this
topic I think.

        Jakub

Reply via email to