Richard Biener <rguent...@suse.de> writes: > On Tue, 1 Aug 2023, Richard Sandiford wrote: > >> Richard Sandiford <richard.sandif...@arm.com> writes: >> > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >> >> The following makes sure to limit the shift operand when vectorizing >> >> (short)((int)x >> 31) via (short)x >> 31 as the out of bounds shift >> >> operand otherwise invokes undefined behavior. When we determine >> >> whether we can demote the operand we know we at most shift in the >> >> sign bit so we can adjust the shift amount. >> >> >> >> Note this has the possibility of un-CSEing common shift operands >> >> as there's no good way to share pattern stmts between patterns. >> >> We'd have to separately pattern recognize the definition. >> >> >> >> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. >> >> >> >> Not sure about LSHIFT_EXPR, it probably has the same issue but >> >> the fallback optimistic zero for out-of-range shifts is at least >> >> "corrrect". Not sure we ever try to demote rotates (probably not). >> > >> > I guess you mean "correct" for x86? But that's just a quirk of x86. >> > IMO the behaviour is equally wrong for LSHIFT_EXPR. > > I meant "correct" for the constant folding that evaluates out-of-bound > shifts as zero. > >> Sorry for the multiple messages. Wanted to get something out quickly >> because I wasn't sure how long it would take me to write this... >> >> On rotates, for: >> >> void >> foo (unsigned short *restrict ptr) >> { >> for (int i = 0; i < 200; ++i) >> { >> unsigned int x = ptr[i] & 0xff0; >> ptr[i] = (x << 1) | (x >> 31); >> } >> } >> >> we do get: >> >> can narrow to unsigned:13 without loss of precision: _5 = x_12 r>> 31; >> >> although aarch64 doesn't provide rrotate patterns, so nothing actually >> comes of it. > > I think it's still correct that we only need unsigned:13 for the input, > we know other bits are zero. But of course when actually applying > this as documented > > /* Record that STMT_INFO could be changed from operating on TYPE to > operating on a type with the precision and sign given by PRECISION > and SIGN respectively. > > the operation itself has to be altered (the above doesn't suggest > promoting/demoting the operands to TYPE is the only thing to do). > > So it seems to be the burden is on the consumers of the information?
Yeah, textually that seems fair. Not sure I was thinking of it in those terms at the time though. :) >> I think the handling of variable shifts is flawed for other reasons. Given: >> >> void >> uu (unsigned short *restrict ptr1, unsigned short *restrict ptr2) >> { >> for (int i = 0; i < 200; ++i) >> ptr1[i] = ptr1[i] >> ptr2[i]; >> } >> >> void >> us (unsigned short *restrict ptr1, short *restrict ptr2) >> { >> for (int i = 0; i < 200; ++i) >> ptr1[i] = ptr1[i] >> ptr2[i]; >> } >> >> void >> su (short *restrict ptr1, unsigned short *restrict ptr2) >> { >> for (int i = 0; i < 200; ++i) >> ptr1[i] = ptr1[i] >> ptr2[i]; >> } >> >> void >> ss (short *restrict ptr1, short *restrict ptr2) >> { >> for (int i = 0; i < 200; ++i) >> ptr1[i] = ptr1[i] >> ptr2[i]; >> } >> >> we only narrow uu and ss, due to: >> >> /* Ignore codes that don't take uniform arguments. */ >> if (!types_compatible_p (TREE_TYPE (op), type)) >> return; > > I suppose that's because we care about the shift operand at all here. > We could possibly use [0 .. precision-1] as known range for it > and only if that doesn't fit 'type' give up (and otherwise simply > ignore the input range of the shift operands here). > >> in vect_determine_precisions_from_range. Maybe we should drop >> the shift handling from there and instead rely on >> vect_determine_precisions_from_users, extending: >> >> if (TREE_CODE (shift) != INTEGER_CST >> || !wi::ltu_p (wi::to_widest (shift), precision)) >> return; >> >> to handle ranges where the max is known to be < precision. >> >> There again, if masking is enough for right shifts and right rotates, >> maybe we should keep the current handling for then (with your fix) >> and skip the types_compatible_p check for those cases. > > I think it should be enough for left-shifts as well? If we lshift > out like 0x100 << 9 so the lhs range is [0,0] the input range from > op0 will still make us use HImode. I think we only ever get overly > conservative answers for left-shifts from this function? But if we have: short x, y; int z = (int) x << (int) y; and at runtime, x == 1, y == 16, (short) z should be 0 (no UB), whereas x << y would invoke UB and x << (y & 15) would be 1. > Whatever works for RROTATE should also work for LROTATE. I think the same problem affects LROTATE. >> So: >> >> - restrict shift handling in vect_determine_precisions_from_range to >> RSHIFT_EXPR and RROTATE_EXPR >> >> - remove types_compatible_p restriction for those cases >> >> - extend vect_determine_precisions_from_users shift handling to check >> for ranges on the shift amount >> >> Does that sound right? > > I'm not sure. This all felt somewhat fragile when looking closer > (I was hoping you would eventually tackle it from the older > referenced bug) ... so my main idea was to perform incremental changes > where I have test coverage (as with the new wrong-code bug). Fair :) > Originally I completely disabled shift support but that regressed > the over-widen testcases a lot which at least have widened shifts > by constants a lot. > > x86 has vector rotates only for AMD XOP (which is dead) plus > some for V1TImode AFAICS, but I think we pattern-match rotates > to shifts, so maybe the precision stuff is interesting for the > case where we match the pattern rotate sequence for widenings? > > So for the types_compatible_p issue something along > the following? We could also exempt the shift operand from > being covered by min_precision so the consumer would have > to make sure it can be represented (I think that's never going > to be an issue in practice until we get 256bit integers vectorized). > It will have to fixup the shift operands anyway. > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > index e4ab8c2d65b..cdeeaf98a47 100644 > --- a/gcc/tree-vect-patterns.cc > +++ b/gcc/tree-vect-patterns.cc > @@ -6378,16 +6378,26 @@ vect_determine_precisions_from_range > (stmt_vec_info stmt_info, gassign *stmt) > } > else if (TREE_CODE (op) == SSA_NAME) > { > - /* Ignore codes that don't take uniform arguments. */ > - if (!types_compatible_p (TREE_TYPE (op), type)) > + /* Ignore codes that don't take uniform arguments. For shifts > + the shift amount is known to be in-range. */ I guess it's more "we can assume that the amount is in range"? > + if (code == LSHIFT_EXPR > + || code == RSHIFT_EXPR > + || code == LROTATE_EXPR > + || code == RROTATE_EXPR) > + { > + min_value = wi::min (min_value, 0, sign); > + max_value = wi::max (max_value, TYPE_PRECISION (type), > sign); LGTM for shifts right. Because of the above lshift thing, I think we need something like: if (code == LSHIFT_EXPR || code == LROTATE_EXPR) { wide_int op_min_value, op_max_value; if (!vect_get_range_info (op, &op_min_value, op_max_value)) return; /* We can ignore left shifts by negative amounts, which are UB. */ min_value = wi::min (min_value, 0, sign); /* Make sure the highest non-UB shift amount doesn't become UB. */ op_max_value = wi::umin (op_max_value, TYPE_PRECISION (type)); auto mask = wi::mask (TYPE_PRECISION (type), false, op_max_value.to_uhwi ()); max_value = wi::max (max_value, mask, sign); } Does that look right? Thanks, and sorry for the scope creep. Richard > + } > + else if (types_compatible_p (TREE_TYPE (op), type)) > + { > + wide_int op_min_value, op_max_value; > + if (!vect_get_range_info (op, &op_min_value, > &op_max_value)) > + return; > + min_value = wi::min (min_value, op_min_value, sign); > + max_value = wi::max (max_value, op_max_value, sign); > + } > + else > return; > - > - wide_int op_min_value, op_max_value; > - if (!vect_get_range_info (op, &op_min_value, &op_max_value)) > - return; > - > - min_value = wi::min (min_value, op_min_value, sign); > - max_value = wi::max (max_value, op_max_value, sign); > } > else > return;