Hi there!

I am developing software which deliberately tries to exploit the compiler's autovectorization facilities by feeding data into autovectorization-friendly loops. I'm currently using both g++ and clang++ to see how well this approach works. Using simple arithmetic, I often get good results. To widen the scope of my work, I was looking for documentation on which constructs are recognized by the autovectorization stage, and found:

https://www.gnu.org/software/gcc/projects/tree-ssa/vectorization.html

By the looks of it, this document has not seen any changes for several years. Has development on the autovectorization stage stopped, or is there simply no documentation?

In my experience, vectorization is essential to speed up arithmetic on the CPU, and reliable recognition of vectorization opportunities by the compiler can provide vectorization to programs which don't bother to code it explicitly. I feel the topic is being neglected - at least the documentation I found suggests this. To demonstrate what I mean, I have two concrete scenarios which I'd like to be handled by the autovectorization stage:

- gather/scatter with arbitrary indexes

In C, this would be loops like

// gather from B to A using gather indexes

for ( int i = 0 ; i < vsz ; i++ )
  A [ i ] = B [ indexes [ i ] ] ;

AVX2 introduced hardware gather instructions, and AVX-512 added hardware scatter; both can speed things up a good deal.
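
To make this scenario concrete, here is a minimal sketch of the gather loop and its scatter counterpart as standalone C functions. The function names (gather_f32, scatter_f32) and the use of 'restrict' are my own assumptions for illustration, not something the compilers require. One way to check whether a loop was vectorized is to compile with optimization and an ISA flag (for example -O3 -march=native) and ask for a report: -fopt-info-vec with gcc, -Rpass=loop-vectorize with clang. Whether gather/scatter instructions are actually emitted depends on the compiler version, the target and the cost model.

```c
#include <stddef.h>

/* Gather: a[i] = b[idx[i]].
   'restrict' promises the compiler that the arrays do not overlap,
   which removes one common obstacle to vectorization. */
void gather_f32(float *restrict a, const float *restrict b,
                const int *restrict idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[idx[i]];
}

/* Scatter: a[idx[i]] = b[i].
   If idx contains duplicates, the last write wins, exactly as in
   the scalar loop. */
void scatter_f32(float *restrict a, const float *restrict b,
                 const int *restrict idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[idx[i]] = b[i];
}
```

The scalar semantics are the reference here: any vectorized version the compiler produces has to give the same result as these loops.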

- repeated use of vectorizable functions

for ( int i = 0 ; i < vsz ; i++ )
  A [ i ] = sqrt ( B [ i ] ) ;

Here, replacing the repeated scalar calls to sqrt with a vectorized equivalent gives a dramatic speedup (ca. 4x).

If the compiler provided such autovectorization facilities, and if the patterns it recognizes were well-documented, users could rely on certain code patterns being recognized and autovectorized - sort of a contract between the user and the compiler. With a well-chosen spectrum of patterns, this would make explicit vectorization unnecessary in many cases. My hope is that such an interface would help vectorization become more widely used - as I understand the status quo, this is still a niche topic, even though many processors provide suitable hardware nowadays.

Can you point me to where 'the action is' in this regard?

With regards

Kay F. Jahnke

