Re: XMM/YMM assembly bloat

2023-02-14 Thread Owen Cook via Gcc-bugs
Adding whitespace/newlines outside the function also affects the length of
the generated assembly.

On Tue, Feb 14, 2023 at 5:09 PM Owen Cook wrote:

> [original report snipped; it is reproduced in full below]


XMM/YMM assembly bloat

2023-02-14 Thread Owen Cook via Gcc-bugs
Compiling "unrollOptimizations.c" on a Ryzen 5 2600 with x86-64 gcc 12.2 and
"-O2 -march=native" produces vastly different assembly depending on a few
scope-related details I have found. If the iterator in both for loops is
named int i, gcc seemingly places arrayA and arrayB into a single XMM
register; name the iterators differently and the output uses both XMM and
YMM registers. Whitespace also seems to matter: with the two for loops
separated by a blank line the output uses XMM and YMM, but with the loops
directly adjacent it uses only XMM. Changing float* a to float* const a
(and likewise float* b) also changes the result.

Godbolt equivalent, for convenience:
https://godbolt.org/z/4xWrGoPaE
#include <stdint.h>
#include <string.h>

const uint32_t array_A[] = {
    4473924, 2236962, 1118481, 559240,
    279620, 139810, 69905, 34952,
    17476, 8738, 4369
};

const uint32_t array_B[] = {
    4210752, 2105376, 1052688, 526344,
    263172, 131586, 65793, 32896,
    16448, 8224, 4112, 2056,
    1028, 514, 257
};

void Setup(int exponent, float* a, float* b) {
    exponent <<= 23;  /* move the exponent into the float's exponent field */
    float arrA[11];
    float arrB[15];
    memcpy(arrA, array_A, sizeof(arrA));
    memcpy(arrB, array_B, sizeof(arrB));
    for(int i = 0; i < 11; i++) {
        uint32_t v = *(uint32_t*)&array_A[i];
        v |= exponent;
        arrA[i] = *(float*)&v;  /* reinterpret the bits as a float (type punning) */
    }
    for(int i = 0; i < 15; i++) {
        uint32_t v = *(uint32_t*)&arrB[i];  /* read the float's bits (type punning) */
        v |= exponent;
        arrB[i] = *(float*)&v;
    }
    memcpy(a, arrA, sizeof(arrA));
    memcpy(b, arrB, sizeof(arrB));
}
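For comparison, here is a sketch of the same function with the pointer casts replaced by memcpy, so no object is read or written through an incompatible type (the casts in Setup access float objects through uint32_t* and a uint32_t through float*, which C's aliasing rules do not permit). It assumes the intent is simply to OR a pre-shifted exponent into each stored bit pattern. The name SetupNoPun is hypothetical, and the tables are repeated so the snippet compiles on its own. Whether this form shows the same XMM/YMM differences under gcc 12.2 "-O2 -march=native" is untested.

```c
#include <stdint.h>
#include <string.h>

/* Same bit patterns as array_A / array_B above, repeated so this is standalone. */
const uint32_t array_A[] = {
    4473924, 2236962, 1118481, 559240,
    279620, 139810, 69905, 34952,
    17476, 8738, 4369
};

const uint32_t array_B[] = {
    4210752, 2105376, 1052688, 526344,
    263172, 131586, 65793, 32896,
    16448, 8224, 4112, 2056,
    1028, 514, 257
};

/* Hypothetical name. Same structure as Setup, but every reinterpretation
   between float and uint32_t goes through memcpy. */
void SetupNoPun(int exponent, float* a, float* b) {
    /* shift the exponent into the float's exponent field (bits 23-30) */
    uint32_t e = (uint32_t)exponent << 23;
    float arrA[11];
    float arrB[15];
    memcpy(arrA, array_A, sizeof(arrA));
    memcpy(arrB, array_B, sizeof(arrB));
    for (int i = 0; i < 11; i++) {
        uint32_t v;
        memcpy(&v, &arrA[i], sizeof v);  /* read the float's bit pattern */
        v |= e;
        memcpy(&arrA[i], &v, sizeof v);  /* write the updated bits back */
    }
    for (int i = 0; i < 15; i++) {
        uint32_t v;
        memcpy(&v, &arrB[i], sizeof v);
        v |= e;
        memcpy(&arrB[i], &v, sizeof v);
    }
    memcpy(a, arrA, sizeof(arrA));
    memcpy(b, arrB, sizeof(arrB));
}
```

With exponent = 127 (the IEEE-754 bias), a[0] gets the bit pattern 0x3FC44444, i.e. roughly 1.5333f. A fixed-size 4-byte memcpy is typically lowered to a plain register move at -O2, so this form usually costs nothing over the casts.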