On Fri, Jun 29, 2012 at 11:00:14AM +0200, Richard Guenther wrote:
> Indeed - the lack of cross-sub-128bit-word operations makes some
> vectorizations very expensive.  Initially we added the patterns for
> vectorization of the hi/lo and interleave stuff because we didn't want
> regressions for vectorizing with 256bit vectors vs. 128bit vectors in
> the vectorizer testsuite.
> But now that we have support for vectorizing with both sizes we could
> consider not advertising the nonexistent instructions for 256bit
> vectors.  Or at least properly model their cost.

With -O3 -mavx, the code generated for pr51581-3.c (f2) is shorter only
when using hi/lo rather than even/odd; with -O3 -mavx2, the even/odd
sequence is shorter than the hi/lo one.
$ ~/timing ./pr51581-3-evenodd
Strip out best and worst realtime result
minimum: 0.110145575 sec real / 0.000071177 sec CPU
maximum: 0.134790162 sec real / 0.000140234 sec CPU
average: 0.113982306 sec real / 0.000113236 sec CPU
stdev  : 0.002545680 sec real / 0.000009365 sec CPU
$ ~/timing ./pr51581-3-hilo
Strip out best and worst realtime result
minimum: 0.098651474 sec real / 0.000069318 sec CPU
maximum: 0.102126514 sec real / 0.000129507 sec CPU
average: 0.100120802 sec real / 0.000104589 sec CPU
stdev  : 0.001008010 sec real / 0.000013241 sec CPU
Can't benchmark -mavx2 though...

        Jakub
