On 8/25/18 1:09 PM, Segher Boessenkool wrote: > Hey, > > On Sat, Aug 25, 2018 at 12:15:16PM -0500, Bill Schmidt wrote: >> On 8/20/18 4:44 PM, Will Schmidt wrote: >>> Enable GIMPLE folding of the vec_splat() intrinsic. (v3). >>> >>> This uses the tree_vec_extract() function out of tree-vect-generic.c >>> to retrieve the splat value, which is a BIT_FIELD_REF. That function is >>> made non-static as part of this change. >>> >>> Testcases are already in-tree. >>> >>> V2 updates, per feedback previously received. >>> Forced arg1 into range (modulo #elements) before attempting to extract >>> the splat value. >>> Removed the (now unnecessary) code that did bounds-checking before calling >>> the tree_vec_extract helper. >>> Used arg0_type rather than lhs_type for calculating the tree size. >>> >>> V3 updates, inspired by additional offline discussion with Segher. >>> Noting that for vec_splat(arg1,arg2), the ABI describes ARG2 as an element >>> number less than the number of elements supported by the respective ARG1 >>> type, so we do not attempt to gimple-fold the intrinsic if we determine our >>> value is out of range. Thus, the initial check ensures that ARG1 is both >>> constant and has a valid index into ARG0. >>> The subsequent modulo operation is no longer necessary, and has been >>> removed. >>> 2018-08-20 Will Schmidt <[email protected]> >>> >>> * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add >>> support for >>> early gimple folding of vec_splat(). > s/ // > >>> * tree-vect-generic.c: Remove static from tree_vec_extract() >>> definition. >>> * gimple-fold.h: Add an extern define for tree_vec_extract(). > s/ / / > >>> + /* Only fold the vec_splat_*() if arg1 is both a constant value, and a >>> valid >>> + index into the arg0 vector. */ > Needs two more spaces indent? > >>> + unsigned int n_elts = VECTOR_CST_NELTS (arg0); >>> + if (TREE_CODE (arg1) != INTEGER_CST >>> + || TREE_INT_CST_LOW (arg1) > (n_elts -1)) >> Space between - and 1. >> >> I'm still not convinced we should do this. The semantics of vec_splat are >> to select the element as arg1 modulo n_elts. Odd as it seems, it's >> historically >> valid for values between 0 and 31 to be used regardless of the value of >> n_elts. > But, the assembler complains. So it never worked, and it seems better to > not allow it in the compiler either. > > A lot of existing testcases generate assembler code the assembler chokes > on -- but the testcases are dg-do compile, so the assembler isn't run. > > This is also PR86987; I have patches, but it snowballs because of the > aforementioned existing wrong testcases. > >> Since arg1 is a constant we can easily bring it into valid range and expand >> it >> early, rather than losing optimization opportunities by keeping this as a >> built-in call. > We should just error on invalid input. > >> Do we have test cases for arg1 in {n_elts, ..., 31}? What's GCC's current >> behavior? Maybe we already aren't honoring this technically legal case? > We do, lots. Sometimes we compile it; the assembler chokes on the output > (but the assembler is not run in those testcases). Sometimes we ICE.
Thanks for the reminders. :-) No concerns, then. Bill > >> Otherwise looks fine to me, though of course I can't approve. > Looks fine to me, too. > >>> + int splat_elem_size = TREE_INT_CST_LOW (size_in_bytes (arg0_type)) >>> + * BITS_PER_UNIT / n_elts; > (Well one thing then... Wrong indent?) > > > Segher >
