On Fri, Jun 29, 2012 at 11:00:14AM +0200, Richard Guenther wrote:
> Indeed - the lack of cross-sub-128bit-word operations makes it very much
> expensive for some vectorizations.  Initially we added the patterns for
> vectorization of the hi/lo and interleave stuff because we didn't want
> regressions for vectorizing with 256bit vectors vs. 128bit vectors in the
> vectorizer testsuite.
> But now as we have support for vectorizing with both sizes we could consider
> not advertising the really not existing instructions for 256bit vectors.  Or
> at least properly model their cost.

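For reference, here is a minimal scalar sketch of the two ways a widening multiply can be split across two result vectors, which is the hi/lo versus even/odd distinction compared in the reply. It is only a model of the lane arithmetic, not GCC's implementation: the function names are hypothetical, and Python lists stand in for vector registers. The hi/lo results concatenate directly into element order, while the even/odd results need an interleave to get back into element order.

```python
# Scalar model of two ways to split a widening multiply over two result
# vectors. All names are hypothetical; lists stand in for vector registers.

def widen_mult_lo(a, b):
    """Products of the low half of the lanes."""
    half = len(a) // 2
    return [x * y for x, y in zip(a[:half], b[:half])]

def widen_mult_hi(a, b):
    """Products of the high half of the lanes."""
    half = len(a) // 2
    return [x * y for x, y in zip(a[half:], b[half:])]

def widen_mult_even(a, b):
    """Products of the even-numbered lanes (0, 2, 4, ...)."""
    return [x * y for x, y in zip(a[0::2], b[0::2])]

def widen_mult_odd(a, b):
    """Products of the odd-numbered lanes (1, 3, 5, ...)."""
    return [x * y for x, y in zip(a[1::2], b[1::2])]

def interleave(even, odd):
    """Merge even/odd results back into element order."""
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
reference = [x * y for x, y in zip(a, b)]

# hi/lo: the two halves are already in element order.
from_hilo = widen_mult_lo(a, b) + widen_mult_hi(a, b)

# even/odd: an extra permutation is needed to restore element order.
from_evenodd = interleave(widen_mult_even(a, b), widen_mult_odd(a, b))

assert from_hilo == reference
assert from_evenodd == reference
```

Which variant is cheaper on real hardware depends on which lane selections and permutations the target can do in few instructions, and that is what the code-size and timing comparison is about.
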
For pr51581-3.c (f2), the hi/lo sequence gives shorter generated code than
even/odd only with -O3 -mavx; with -O3 -mavx2 the even/odd sequence is
shorter than hi/lo.

$ ~/timing ./pr51581-3-evenodd
Strip out best and worst realtime result
minimum: 0.110145575 sec real / 0.000071177 sec CPU
maximum: 0.134790162 sec real / 0.000140234 sec CPU
average: 0.113982306 sec real / 0.000113236 sec CPU
stdev  : 0.002545680 sec real / 0.000009365 sec CPU
$ ~/timing ./pr51581-3-hilo
Strip out best and worst realtime result
minimum: 0.098651474 sec real / 0.000069318 sec CPU
maximum: 0.102126514 sec real / 0.000129507 sec CPU
average: 0.100120802 sec real / 0.000104589 sec CPU
stdev  : 0.001008010 sec real / 0.000013241 sec CPU

Can't benchmark -mavx2 though...

	Jakub