On Thu, Oct 08, 2020 at 11:58:21AM +0200, Aldy Hernandez via Gcc-patches wrote: > I am quoting my analysis from the PR. Could an aarch64 expert pontificate > here? > > This test is checking the final assembly for a specific sequence. I don't > speak aarch64 assembly, but the IL is different coming out of evrp. > > The first culprit is this difference in the mergephi1 dump: > > _9 = .CTZ (x_6(D)); > - _10 = _9 & 31; > + _10 = _9; > > These are unsigned ints, so assuming they are 32 bits on aarch64, > __builtin_ctz is always less than 32. This is because a CTZ of 0 is > undefined according to the GCC manual: > > [[ > Built-in Function: int __builtin_ctz (unsigned int x) > > Returns the number of trailing 0-bits in x, starting at the least > significant bit position. If x is 0, the result is undefined.
In user __builtin_ctz yes, but it isn't that easy. Aarch64 defines CTZ_DEFINED_VALUE_AT_ZERO to 2 and sets value to bitsize of the mode (i.e. 32 in this case), And documentation says: -- Macro: CLZ_DEFINED_VALUE_AT_ZERO (MODE, VALUE) -- Macro: CTZ_DEFINED_VALUE_AT_ZERO (MODE, VALUE) A C expression that indicates whether the architecture defines a value for 'clz' or 'ctz' with a zero operand. A result of '0' indicates the value is undefined. If the value is defined for only the RTL expression, the macro should evaluate to '1'; if the value applies also to the corresponding optab entry (which is normally the case if it expands directly into the corresponding RTL), then the macro should evaluate to '2'. In the cases where the value is defined, VALUE should be set to this value. At least for *_DEFINED_VALUE_AT_ZERO (*) == 2, I'm afraid VRP can't assume it is undefined at zero, but needs to use a range that covers both the values for non-zero operand and the VALUE from CTZ_DEFINED_VALUE_AT_ZERO, so [0, 32] range in this case, because earlier GIMPLE passes could have already optimized away e.g. a != 0 check. See simplify_count_trailing_zeroes in forwprop. Perhaps another way out of this would be document and enforce that __builtin_c[lt]z{,l,ll} etc calls are undefined at zero, but C[TL]Z ifn calls are defined there based on *_DEFINED_VALUE_AT_ZERO (*) == 2, and then we would need to make sure that e.g. in simplify_count_trailing_zeroes we emit a .CTZ or __builtin_ctz call depending on whether it is undefined there or not; or give .C[TL]Z an additional argument (0/1) which would tell if it is defined at zero or not. We have several enhancement reports in bugzilla from Gabriel Ravier on this topic I think. Jakub