Richard Sandiford <richard.sandif...@arm.com> writes:
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> The following makes sure to limit the shift operand when vectorizing
>> (short)((int)x >> 31) via (short)x >> 31 as the out-of-bounds shift
>> operand otherwise invokes undefined behavior.  When we determine
>> whether we can demote the operand we know we at most shift in the
>> sign bit, so we can adjust the shift amount.
>>
>> Note this has the possibility of un-CSEing common shift operands
>> as there's no good way to share pattern stmts between patterns.
>> We'd have to separately pattern recognize the definition.
>>
>> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>>
>> Not sure about LSHIFT_EXPR, it probably has the same issue but
>> the fallback optimistic zero for out-of-range shifts is at least
>> "correct".  Not sure we ever try to demote rotates (probably not).
>
> I guess you mean "correct" for x86?  But that's just a quirk of x86.
> IMO the behaviour is equally wrong for LSHIFT_EXPR.
Sorry for the multiple messages.  Wanted to get something out quickly
because I wasn't sure how long it would take me to write this...

On rotates, for:

void
foo (unsigned short *restrict ptr)
{
  for (int i = 0; i < 200; ++i)
    {
      unsigned int x = ptr[i] & 0xff0;
      ptr[i] = (x << 1) | (x >> 31);
    }
}

we do get:

  can narrow to unsigned:13 without loss of precision: _5 = x_12 r>> 31;

although aarch64 doesn't provide rrotate patterns, so nothing actually
comes of it.

I think the handling of variable shifts is flawed for other reasons.
Given:

void
uu (unsigned short *restrict ptr1, unsigned short *restrict ptr2)
{
  for (int i = 0; i < 200; ++i)
    ptr1[i] = ptr1[i] >> ptr2[i];
}

void
us (unsigned short *restrict ptr1, short *restrict ptr2)
{
  for (int i = 0; i < 200; ++i)
    ptr1[i] = ptr1[i] >> ptr2[i];
}

void
su (short *restrict ptr1, unsigned short *restrict ptr2)
{
  for (int i = 0; i < 200; ++i)
    ptr1[i] = ptr1[i] >> ptr2[i];
}

void
ss (short *restrict ptr1, short *restrict ptr2)
{
  for (int i = 0; i < 200; ++i)
    ptr1[i] = ptr1[i] >> ptr2[i];
}

we only narrow uu and ss, due to:

  /* Ignore codes that don't take uniform arguments.  */
  if (!types_compatible_p (TREE_TYPE (op), type))
    return;

in vect_determine_precisions_from_range.  Maybe we should drop the
shift handling from there and instead rely on
vect_determine_precisions_from_users, extending:

  if (TREE_CODE (shift) != INTEGER_CST
      || !wi::ltu_p (wi::to_widest (shift), precision))
    return;

to handle ranges where the max is known to be < precision.

There again, if masking is enough for right shifts and right rotates,
maybe we should keep the current handling for them (with your fix)
and skip the types_compatible_p check for those cases.  So:

- restrict the shift handling in vect_determine_precisions_from_range
  to RSHIFT_EXPR and RROTATE_EXPR
- remove the types_compatible_p restriction for those cases
- extend the vect_determine_precisions_from_users shift handling to
  check for ranges on the shift amount

Does that sound right?

Thanks,
Richard