Hi, I've recently been trying to write plain C code that GCC will auto-vectorize on Intel x86 machines (without using the intrinsics from immintrin.h), but I can't seem to get GCC to emit the concise `vpmovzxbd` instruction or similar ones.
My goal is to convert 8 `uint8_t` elements to `int32_t` and print the result. Using the intrinsic `_mm256_cvtepu8_epi32` from immintrin.h, the code looks like this:

```c
#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int immintrin(void) {
    int size = 10000, offset = 3;
    uint8_t* a = malloc(sizeof(char) * size);
    /* Zero-extend the 8 bytes at a + offset to eight 32-bit ints. */
    __v8si b = (__v8si)_mm256_cvtepu8_epi32(*(__m128i *)(a + offset));
    for (int i = 0; i < 8; i++) {
        printf("%d\n", b[i]);
    }
    return 0;
}
```

Compiled with `-mavx2 -O3`, this produces concise, efficient code built around `vpmovzxbd` (see https://godbolt.org/z/8ojzdav47).

However, if I avoid the intrinsic and instead use a for-loop or GCC's `__builtin_convertvector`, I cannot get the same result. The code is as follows:

```c
typedef uint8_t v8qiu __attribute__ ((__vector_size__ (8)));

int forloop(void) {
    int size = 10000, offset = 3;
    uint8_t* a = malloc(sizeof(char) * size);
    v8qiu av = *(v8qiu *)(a + offset);
    __v8si b = {};
    /* Element-by-element widening, hoping the vectorizer merges it. */
    for (int i = 0; i < 8; i++) {
        b[i] = (a + offset)[i];
    }
    for (int i = 0; i < 8; i++) {
        printf("%d\n", b[i]);
    }
    return 0;
}

int builtin_cvt(void) {
    int size = 10000, offset = 3;
    uint8_t* a = malloc(sizeof(char) * size);
    v8qiu av = *(v8qiu *)(a + offset);
    /* Whole-vector conversion from 8 x uint8_t to 8 x int32_t. */
    __v8si b = __builtin_convertvector(av, __v8si);
    for (int i = 0; i < 8; i++) {
        printf("%d\n", b[i]);
    }
    return 0;
}
```

The assembly generated for both functions is long and contains redundant instructions, and it is much harder to read than what calling `_mm256_cvtepu8_epi32` directly produces (same link: https://godbolt.org/z/8ojzdav47).

My question is: how should I write the source code to get assembly similar to what a direct call to `_mm256_cvtepu8_epi32` gives? Or would it be easier to modify the GIMPLE directly? As far as I can tell, though, GIMPLE has no expression or interface that corresponds directly to `vpmovzxbd`.

Thanks,
Hanke Zhang