On Thu, Jan 19, 2012 at 2:12 PM, Alexander Herz <alexander.h...@mytum.de> wrote:
> The generated non-vectorized assembly is simply the unrolled loop with >8
> iterations, so loop structure is pretty much intact (except for unrolling).
>
> Does the vectorizer fail on unrolled loops?
>
> I can compile some assembly dumps showing both the vectorized and the
> unvectorized loop?

Assembly does not help.  Loop unrolling happens after vectorization.

Richard.

> Alex
>
>
> On 01/19/2012 11:29 AM, Richard Guenther wrote:
>>
>> On Wed, Jan 18, 2012 at 6:37 PM, Alexander Herz<alexander.h...@mytum.de>
>> wrote:
>>>
>>> Given this piece of code (gcc-4.7-20120114):
>>>
>>> static void Test(Batch* block,Batch* new_block,const uint32 offs)
>>> {
>>>
>>>     T* __restrict old_values =(T*)__builtin_assume_aligned(block->items,16);
>>>     T* __restrict new_values =(T*)__builtin_assume_aligned(new_block->items,16);
>>>
>>>     //assert(((uint64)(&block->items)%16)==0); //OK!!
>>>     //assert(((uint64)(&new_block->items)%16)==0);
>>>
>>>     for(uint32 c=0;c<(BS<<1);c++) //hopefully compiler applies SIMD here
>>>     {
>>>         new_values[c]=old_values[c]*old_values[c];
>>>     }
>>>
>>> }
>>>
>>> I would assume that the loop is always vectorized (pointers tagged as
>>> restricted and aligned, loop over fixed iteration space even a power
>>> of 2, so most likely divisible by 4), it is quite similar to
>>> vectorization example22
>>> (http://gcc.gnu.org/projects/tree-ssa/vectorization.html#vectorizab).
>>>
>>> I run the previously mentioned g++ version with this command line:
>>> -std=c++0x -g -O3 -msse -msse2 -msse3 -msse4.1 -Wall -Wstrict-aliasing=2
>>> -ftree-vectorizer-verbose=2
>>>
>>> Looking at the vectorizer output (and at the generated assembly) it
>>> looks as if the loop given above is indeed vectorized if Test() is
>>> called from main() (vectorized 1 loop).
>>>
>>> When the function Test() is called nested inside some complex code, it
>>> looks as if the vectorization analysis gives up because the code is too
>>> complex to analyze and never considers the loop inside Test() in this
>>> context even though it should be easily vectorizable in any context
>>> given the hints inside Test().
>>>
>>> Is there anything I can do, so that Test() is analyzed in all contexts?
>>> I guess all methods that contain the __builtin_assume_aligned hint
>>> should be considered for vectorization, independent of their context.
>>
>> Without a concrete example it is impossible to say.  I suppose earlier
>> optimizations destroy loop structure too much?
>>
>>> Thx for your help,
>>> Alex
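Since the thread asks for a concrete example, the loop from the original post could be wrapped into a self-contained test case roughly as sketched below. The thread does not give T, uint32, BS or the layout of Batch, so the definitions here (float, a 32-bit unsigned, 64, and a single 16-byte-aligned array) are assumptions for illustration only. RunExample() is a hypothetical helper, and the noinline attribute is added only so the loop stays in its own function body; whether that changes the vectorizer's decision in the original code is not established in the thread.

```cpp
#include <cassert>
#include <stdint.h>

// Assumed definitions; the thread does not show them.
typedef float T;
typedef uint32_t uint32;
static const uint32 BS = 64;

struct Batch {
    // Aligned so that the __builtin_assume_aligned(..., 16) hint is truthful.
    T items[BS << 1] __attribute__((aligned(16)));
};

// Loop from the original post. noinline (an addition, not in the thread)
// keeps the loop in a separate function regardless of the caller.
__attribute__((noinline))
static void Test(Batch* block, Batch* new_block, const uint32 offs)
{
    (void)offs; // unused in the posted snippet as well

    T* __restrict old_values = (T*)__builtin_assume_aligned(block->items, 16);
    T* __restrict new_values = (T*)__builtin_assume_aligned(new_block->items, 16);

    for (uint32 c = 0; c < (BS << 1); c++)
    {
        new_values[c] = old_values[c] * old_values[c];
    }
}

// Hypothetical driver: fills the input, runs Test(), and checks every square.
bool RunExample()
{
    Batch in, out;
    for (uint32 c = 0; c < (BS << 1); c++)
    {
        in.items[c] = 0.5f * (T)c; // values whose squares are exact in float
        out.items[c] = -1.0f;
    }

    Test(&in, &out, 0);

    for (uint32 c = 0; c < (BS << 1); c++)
    {
        if (out.items[c] != in.items[c] * in.items[c])
            return false;
    }
    return true;
}
```

Calling RunExample() from main() and compiling with the command line quoted above (in particular -O3 and -ftree-vectorizer-verbose=2) should show whether this loop is reported as vectorized in isolation. Newer GCC releases report this through -fopt-info-vec instead.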