On Wed, Dec 14, 2011 at 01:25:13PM +0100, Jakub Jelinek wrote:
> On Tue, Dec 13, 2011 at 05:57:40PM +0400, Kirill Yukhin wrote:
> > > Let me hack up a quick pattern recognizer for this...
> 
> Here it is, untested so far.
> The testcase, doing 2000000 f1+f2+f3+f4 calls in a loop, built with
> -O3 -mavx on Sandybridge (so vectorized with just 16-byte vectors), gives:
> vanilla                                       0m34.571s
> the tree-vect* parts of this patch only       0m9.013s
> the whole patch                               0m8.824s
> The i386 parts are just a small optimization; I guess it could be
> done in the vectorizer too (but then we'd have to check whether the
> arithmetic/logical right shifts are supported and check costs?), or
> perhaps in the generic vcond expander (again, we'd need to check some
> costs).

Now bootstrapped/regtested on x86_64-linux and i686-linux.
Ok for trunk (at least the pattern recognizer)?
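
For readers following along, here is a minimal scalar sketch of the kind of
branch-free sequence a signed division or modulo by a power of two can be
lowered to. It is only an illustration, not necessarily the exact statements
vect_recog_sdivmod_pow2_pattern generates. The names sdiv_pow2, smod_pow2 and
sdivmod_pow2_selfcheck are hypothetical, and the code assumes that >> on a
negative int32_t is an arithmetic shift (implementation-defined in C, true
for GCC).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x / 2^k, truncating toward zero, for 1 <= k <= 30.
   Assumes >> on a negative int32_t is an arithmetic shift. */
int32_t sdiv_pow2(int32_t x, int k)
{
    int32_t sign = x >> 31;                               /* x < 0 ? -1 : 0 */
    int32_t bias = (int32_t)((uint32_t)sign >> (32 - k)); /* x < 0 ? 2^k - 1 : 0 */
    return (x + bias) >> k;
}

/* x % 2^k with the sign of x, for 1 <= k <= 30 (same assumption). */
int32_t smod_pow2(int32_t x, int k)
{
    int32_t sign = x >> 31;
    int32_t bias = (int32_t)((uint32_t)sign >> (32 - k));
    int32_t mask = (int32_t)((UINT32_C(1) << k) - 1);     /* 2^k - 1 */
    return ((x + bias) & mask) - bias;
}

/* Compare against the compiler's own / and % on a few samples. */
void sdivmod_pow2_selfcheck(void)
{
    static const int32_t samples[] = {
        0, 1, -1, 7, -7, 8, -8, 1000003, -1000003, INT32_MAX, INT32_MIN
    };
    for (int k = 1; k <= 30; k++) {
        int32_t d = (int32_t)1 << k;
        for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
            assert(sdiv_pow2(samples[i], k) == samples[i] / d);
            assert(smod_pow2(samples[i], k) == samples[i] % d);
        }
    }
}
```

Every operation here (shift, add, and, subtract) has a straightforward vector
counterpart, which is why recognizing the pattern lets the loop be vectorized
instead of falling back to scalar division.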

> 2011-12-14  Jakub Jelinek  <ja...@redhat.com>
> 
>       * tree-vectorizer.h (NUM_PATTERNS): Bump to 10.
>       * tree-vect-patterns.c (vect_recog_sdivmod_pow2_pattern): New
>       function.
>       (vect_vect_recog_func_ptrs): Add it.
> 
>       * config/i386/sse.md (vcond<V_256:mode><VI_256:mode>,
>       vcond<V_128:mode><VI124_128:mode>, vcond<VI8F_128:mode>v2di):
>       Use general_operand instead of nonimmediate_operand for
>       operand 5 and no predicate for operands 1 and 2.
>       * config/i386/i386.c (ix86_expand_int_vcond): Optimize
>       x < 0 ? -1 : 0 and x < 0 ? 1 : 0 into vector arithmetic
>       resp. logical shift.
> 
>       * gcc.dg/vect/vect-sdivmod-1.c: New test.
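
The identity behind the ix86_expand_int_vcond entry can be shown in scalar
form. This is a sketch only: neg_mask and neg_flag are hypothetical names, a
32-bit element is assumed, and >> on a negative int32_t is assumed to be an
arithmetic shift, as it is with GCC.

```c
#include <stdint.h>

/* x < 0 ? -1 : 0: an arithmetic right shift replicates the sign bit
   into every bit of the result. */
int32_t neg_mask(int32_t x)
{
    return x >> 31;
}

/* x < 0 ? 1 : 0: a logical right shift moves the sign bit down to
   bit 0 and fills the rest with zeros. */
int32_t neg_flag(int32_t x)
{
    return (int32_t)((uint32_t)x >> 31);
}
```

Applied per element, each select becomes a single vector shift (presumably
psrad or psrld for 32-bit elements on SSE2) rather than a compare followed by
a blend.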

        Jakub
