Hi,
I've recently been trying to write plain C code that GCC's automatic
vectorization turns into good instructions on Intel x86 machines
(without using the intrinsics in immintrin.h), but I'm running into a
problem: I can't get GCC to emit the concise `vpmovzxbd` or a similar
instruction.

My requirement is to convert 8 `uint8_t` elements to `int32_t` and
print the result. If I use the `_mm256_cvtepu8_epi32` intrinsic from
immintrin.h, the code is as follows:

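#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int immintrin () {
    int size = 10000, offset = 3;
    uint8_t* a = malloc(sizeof(char) * size);

    /* Zero-extend 8 bytes to 8 x int32_t with the AVX2 intrinsic. */
    __v8si b = (__v8si)_mm256_cvtepu8_epi32(*(__m128i *)(a + offset));

    for (int i = 0; i < 8; i++) {
        printf("%d\n", b[i]);
    }
    return 0;
}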

Compiled with -mavx2 -O3, this produces concise and efficient
instructions. (You can see it here: https://godbolt.org/z/8ojzdav47)

But if I avoid the intrinsic and instead use a for-loop or the
`__builtin_convertvector` built-in provided by GCC, I do not get the
same result. The code is as follows:

typedef uint8_t v8qiu __attribute__ ((__vector_size__ (8)));
int forloop () {
    int size = 10000, offset = 3;
    uint8_t* a = malloc(sizeof(char) * size);

    v8qiu av = *(v8qiu *)(a + offset);  /* not used by this variant */
    __v8si b = {};
    /* Widen element by element and rely on the vectorizer. */
    for (int i = 0; i < 8; i++) {
        b[i] = (a + offset)[i];
    }

    for (int i = 0; i < 8; i++) {
        printf("%d\n", b[i]);
    }
    return 0;
}

int builtin_cvt () {
    int size = 10000, offset = 3;
    uint8_t* a = malloc(sizeof(char) * size);

    /* Load 8 bytes as a vector, then widen with the GCC built-in. */
    v8qiu av = *(v8qiu *)(a + offset);
    __v8si b = __builtin_convertvector(av, __v8si);

    for (int i = 0; i < 8; i++) {
        printf("%d\n", b[i]);
    }
    return 0;
}

The instructions generated for both functions are redundant and
complex, and are much harder to read than the code produced by calling
`_mm256_cvtepu8_epi32` directly. (You can see it at the same link:
https://godbolt.org/z/8ojzdav47)
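
One detail that may be worth ruling out in the two functions above: the
pointer a + offset is not 8-byte aligned, while dereferencing it as
`v8qiu *` tells the compiler that it is. Below is a minimal sketch of a
variant that loads and stores through memcpy instead, so no alignment
is assumed. The type names `u8x8` and `i32x8` and the helper
`widen_u8x8_to_i32x8` are hypothetical names made up for this
illustration. Whether GCC turns it into `vpmovzxbd` is not guaranteed
and probably depends on the GCC version and the flags used, so the
output still has to be checked on Compiler Explorer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 8 x uint8_t and 8 x int32_t, using GCC/Clang vector extensions. */
typedef uint8_t u8x8 __attribute__((vector_size(8)));
typedef int32_t i32x8 __attribute__((vector_size(32)));

/* Hypothetical helper: zero-extend the 8 bytes at src (any alignment)
   into dst[0..7]. */
static inline void widen_u8x8_to_i32x8(const uint8_t *src, int32_t *dst)
{
    assert(src != NULL && dst != NULL);

    u8x8 in;
    memcpy(&in, src, sizeof in);    /* 8-byte load, no alignment assumed */

    /* Element-wise conversion; uint8_t -> int32_t zero-extends. */
    i32x8 out = __builtin_convertvector(in, i32x8);

    memcpy(dst, &out, sizeof out);  /* 32-byte store, no alignment assumed */
}
```

It would be called as `widen_u8x8_to_i32x8(a + offset, result)` with
`int32_t result[8]`, and the printing loop then reads from `result`.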

What I want to ask is: How should I write the source code to get
assembly instructions similar to directly calling
_mm256_cvtepu8_epi32?

Or would it be easier if I modified the GIMPLE directly? But it seems
that there is no relevant expression or interface directly
corresponding to `vpmovzxbd` in GIMPLE.

Thanks
Hanke Zhang
