On Wed, Dec 14, 2011 at 01:25:13PM +0100, Jakub Jelinek wrote:
> On Tue, Dec 13, 2011 at 05:57:40PM +0400, Kirill Yukhin wrote:
> > > Let me hack up a quick pattern recognizer for this...
>
> Here it is, untested so far.
> On the testcase doing 2000000 f1+f2+f3+f4 calls in the loop with -O3 -mavx
> on Sandybridge (so, vectorized just with 16 byte vectors) gives:
>
> vanilla                                   0m34.571s
> the tree-vect* parts of this patch only   0m9.013s
> the whole patch                           0m8.824s
>
> The i386 parts are just a small optimization, I guess it could be
> done in the vectorizer too (but then we'd have to check whether the
> arithmetic/logical right shifts are supported and check costs?), or
> perhaps in the generic vcond expander (again, we'd need to check some
> costs).
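The scalar identities behind the i386 part can be sketched in C for a single 32-bit lane: x < 0 ? -1 : 0 is an arithmetic right shift by 31, and x < 0 ? 1 : 0 is a logical right shift by 31. This is only an illustration, not the ix86_expand_int_vcond code; the function names are hypothetical, and it assumes GCC's documented behaviour that >> on a negative signed value shifts in copies of the sign bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x < 0 ? -1 : 0 as an arithmetic right shift by 31.
   Right-shifting a negative signed value is implementation-defined in C;
   GCC implements it as a sign-propagating (arithmetic) shift. */
static int32_t sign_mask_shift(int32_t x)
{
  return x >> 31;
}

/* Hypothetical helper: x < 0 ? 1 : 0 as a logical right shift by 31.
   Reinterpreting the bits as unsigned makes the shift fill with zeros. */
static int32_t sign_bit_shift(int32_t x)
{
  return (int32_t) ((uint32_t) x >> 31);
}

/* Compare the shift forms against the original conditional forms. */
static void check_sign_shifts(void)
{
  static const int32_t samples[] = { INT32_MIN, -100, -1, 0, 1, 100, INT32_MAX };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      int32_t x = samples[i];
      assert (sign_mask_shift (x) == (x < 0 ? -1 : 0));
      assert (sign_bit_shift (x) == (x < 0 ? 1 : 0));
    }
}
```

In the vector case the same shifts apply per element (e.g. psrad/psrld for 32-bit lanes), which is why they can replace a compare plus blend when the vcond arms are exactly -1/0 or 1/0.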
Now bootstrapped/regtested on x86_64-linux and i686-linux.  Ok for trunk
(at least the pattern recognizer)?

> 2011-12-14  Jakub Jelinek  <ja...@redhat.com>
>
> 	* tree-vectorizer.h (NUM_PATTERNS): Bump to 10.
> 	* tree-vect-patterns.c (vect_recog_sdivmod_pow2_pattern): New
> 	function.
> 	(vect_vect_recog_func_ptrs): Add it.
>
> 	* config/i386/sse.md (vcond<V_256:mode><VI_256:mode>,
> 	vcond<V_128:mode><VI124_128:mode>, vcond<VI8F_128:mode>v2di):
> 	Use general_operand instead of nonimmediate_operand for
> 	operand 5 and no predicate for operands 1 and 2.
> 	* config/i386/i386.c (ix86_expand_int_vcond): Optimize
> 	x < 0 ? -1 : 0 and x < 0 ? 1 : 0 into vector arithmetic
> 	resp. logical shift.
>
> 	* gcc.dg/vect/vect-sdivmod-1.c: New test.

	Jakub
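For reference, signed division and modulo by a power of two can be written with only adds, masks and shifts, which is the kind of rewrite a recognizer such as vect_recog_sdivmod_pow2_pattern enables for vector loops. The following is a scalar C sketch under stated assumptions (32-bit two's complement int32_t, arithmetic >> on negative values as GCC provides, 0 <= k <= 30); the helper names are hypothetical and the exact statement sequence the patch emits may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bias = x < 0 ? (1 << k) - 1 : 0, without a branch.
   x >> 31 is -1 for negative x and 0 otherwise (arithmetic shift). */
static int32_t pow2_bias(int32_t x, unsigned k)
{
  return (x >> 31) & (int32_t) ((1u << k) - 1);
}

/* x / (1 << k) with C's round-toward-zero semantics, for 0 <= k <= 30.
   A bare arithmetic shift rounds toward minus infinity, so negative
   dividends are biased by 2^k - 1 first; this cannot overflow because
   the bias is only added to negative values. */
static int32_t sdiv_pow2(int32_t x, unsigned k)
{
  return (x + pow2_bias(x, k)) >> k;
}

/* x % (1 << k) for 0 <= k <= 30; like C's %, the result takes the
   sign of x. */
static int32_t smod_pow2(int32_t x, unsigned k)
{
  int32_t bias = pow2_bias(x, k);
  int32_t mask = (int32_t) ((1u << k) - 1);
  return ((x + bias) & mask) - bias;
}

/* Compare against the compiler's own / and % for a few dividends. */
static void check_sdivmod_pow2(void)
{
  static const int32_t samples[] = { INT32_MIN, -17, -8, -5, -1, 0, 1, 5, 17, INT32_MAX };
  for (unsigned k = 0; k <= 30; k++)
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
      {
        int32_t x = samples[i];
        int32_t n = (int32_t) (1u << k);
        assert (sdiv_pow2 (x, k) == x / n);
        assert (smod_pow2 (x, k) == x % n);
      }
}
```

The bias computation is itself an instance of the x < 0 ? -1 : 0 idiom followed by a mask, which is why cheap vector shifts matter for this pattern.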