"unrollOptimizations.c", compiled on a Ryzen 5 2600 with x86-64 gcc 12.2
and "-O2 -march=native", produces vastly different assembly depending on
some scope-related details I have found. If the iterator in both for loops
is declared as int i, gcc seemingly spreads arrayA and arrayB into one
single XMM register; name the iterators differently and you get both XMM
and YMM registers in use. Additionally, whitespace seems to be a factor as
well: with the two for loops one newline apart the output uses XMM and YMM,
but put them directly together and you get only XMM. Changing float* a to
float* const a (same goes for float* b) also changes which registers are
used.

Godbolt equivalent, for simplicity:
https://godbolt.org/z/4xWrGoPaE
#include <stdint.h>
#include <stdio.h>
#include <string.h>

const uint32_t array_A[] = {
        4473924, 2236962, 1118481, 559240,
        279620, 139810, 69905, 34952,
        17476, 8738, 4369
};

const uint32_t array_B[] = {
        4210752, 2105376, 1052688, 526344,
        263172, 131586, 65793, 32896,
        16448, 8224, 4112, 2056,
        1028, 514, 257
};

void Setup(int exponent, float* a, float* b) {
    exponent <<= 23;
    float arrA[11];
    float arrB[15];
    memcpy(arrA, array_A, sizeof(arrA));
    memcpy(arrB, array_B, sizeof(arrB));
    for(int i = 0; i < 11; i++) {
        uint32_t v = *(uint32_t*)&array_A[i];
        v |= exponent;
        arrA[i] = *(float*)&v;
    }
    for(int i = 0; i < 15; i++) {
        uint32_t v = *(uint32_t*)&arrB[i];
        v |= exponent;
        arrB[i] = *(float*)&v;
    }
    memcpy(a, arrA, sizeof(arrA));
    memcpy(b, arrB, sizeof(arrB));
}
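
For comparison, here is a minimal sketch of the same bit manipulation
written with memcpy instead of pointer casts, since the loops above read a
float through a uint32_t* and a uint32_t through a float*. The names
with_exponent, float_bits and setup_memcpy are hypothetical and not part of
unrollOptimizations.c, and setup_memcpy takes the source table and its
length as parameters, which is an assumption made to keep the sketch short.
Whether this version changes the XMM/YMM choice is untested here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not in the original file): OR a biased exponent
   into a 23-bit mantissa pattern and reinterpret the result as a float.
   memcpy performs the reinterpretation instead of a pointer cast. */
float with_exponent(uint32_t mantissa_bits, int exponent) {
    uint32_t v = mantissa_bits | ((uint32_t)exponent << 23);
    float f;
    memcpy(&f, &v, sizeof f);
    return f;
}

/* Hypothetical helper: read a float's bit pattern back, for checking. */
uint32_t float_bits(float f) {
    uint32_t v;
    memcpy(&v, &f, sizeof v);
    return v;
}

/* Hypothetical variant of one Setup loop: writes n floats to dst, each
   built from src[i] with the exponent ORed in. */
void setup_memcpy(int exponent, const uint32_t *src, int n, float *dst) {
    for (int i = 0; i < n; i++) {
        dst[i] = with_exponent(src[i], exponent);
    }
}
```

With these definitions, with_exponent(4473924, 127) produces the bit
pattern 0x3FC44444, i.e. the first entry of array_A with exponent 127 ORed
in, which is a value between 1.0 and 2.0.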
