[Bug target/82731] _mm256_set_epi8(array[offset[0]], array[offset[1]], ...) byte gather makes slow code, trying to zero-extend all the uint16_t offsets first and spilling them.

2024-04-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82731
--- Comment #7 from Hongtao Liu ---
(In reply to Hongtao Liu from comment #4)
> (In reply to Hongtao Liu from comment #3)
> > Looks like ix86_vect_estimate_reg_pressure doesn't work here, taking a look.
>
> Oh, ix86_vect_estimate_reg_pressure

[Bug target/82731] _mm256_set_epi8(array[offset[0]], array[offset[1]], ...) byte gather makes slow code, trying to zero-extend all the uint16_t offsets first and spilling them.

2024-04-17 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82731
--- Comment #6 from Richard Biener ---
That's ix86_expand_vector_init_interleave which for QI inner_mode extends to SImode, likely because it tries to work with just SSE2?

[Bug target/82731] _mm256_set_epi8(array[offset[0]], array[offset[1]], ...) byte gather makes slow code, trying to zero-extend all the uint16_t offsets first and spilling them.

2024-04-17 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82731
--- Comment #5 from Richard Biener ---
We do not BB vectorize gathers I think (ISTR some "loop" uses in the infrastructure, not too difficult to fix I guess). In the end the problem is RTL expansion of the CTOR and then lack of combine? Look

[Bug target/82731] _mm256_set_epi8(array[offset[0]], array[offset[1]], ...) byte gather makes slow code, trying to zero-extend all the uint16_t offsets first and spilling them.

2024-04-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82731
--- Comment #4 from Hongtao Liu ---
(In reply to Hongtao Liu from comment #3)
> Looks like ix86_vect_estimate_reg_pressure doesn't work here, taking a look.

Oh, ix86_vect_estimate_reg_pressure is only for loop, BB vectorizer only use

[Bug target/82731] _mm256_set_epi8(array[offset[0]], array[offset[1]], ...) byte gather makes slow code, trying to zero-extend all the uint16_t offsets first and spilling them.

2024-04-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82731

Hongtao Liu changed:

What|Removed|Added
CC||liuhongt at gcc dot gnu.org

--- Comment

[Bug target/82731] _mm256_set_epi8(array[offset[0]], array[offset[1]], ...) byte gather makes slow code, trying to zero-extend all the uint16_t offsets first and spilling them.

2024-04-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82731
--- Comment #2 from Andrew Pinski ---
Note you can reproduce the same issue with SSE2 (and not just AVX):
```
#define vect16 __attribute__((vector_size(16)))
vect16 char gather(char *array, unsigned short *offset)
{
  return (vect16
```
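
For readers without the full comment at hand, here is a minimal sketch of the kind of 16-byte gather the bug title describes, written with the GNU C vector extension instead of `_mm256_set_epi8`. It is not the reproducer from comment #2, which is truncated in this archive view. The names `vect16c`, `gather16` and `gather16_matches_scalar` are hypothetical, and the sketch only shows the source pattern (one byte load per `uint16_t`-style offset, combined in a vector constructor); it makes no claim about what code any particular GCC version emits for it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical typedef: a 16-byte vector of char (GNU C vector extension). */
typedef char vect16c __attribute__((vector_size(16)));

/* Assumed shape of the pattern under discussion: each element is a byte
   loaded from array[] at a 16-bit offset, gathered by a vector constructor. */
vect16c gather16(const char *array, const unsigned short *offset)
{
    assert(array != NULL && offset != NULL);
    return (vect16c){
        array[offset[0]],  array[offset[1]],  array[offset[2]],  array[offset[3]],
        array[offset[4]],  array[offset[5]],  array[offset[6]],  array[offset[7]],
        array[offset[8]],  array[offset[9]],  array[offset[10]], array[offset[11]],
        array[offset[12]], array[offset[13]], array[offset[14]], array[offset[15]]
    };
}

/* Scalar cross-check (illustration only): returns 1 when every lane of the
   gathered vector equals the corresponding scalar load, 0 otherwise. */
int gather16_matches_scalar(const char *array, const unsigned short *offset)
{
    vect16c v = gather16(array, offset);
    for (size_t i = 0; i < 16; i++) {
        if (v[i] != array[offset[i]])
            return 0;
    }
    return 1;
}
```

Compiling `gather16` alone with `-O2` (optionally adding `-msse2` or `-mavx2`) and inspecting the assembly is one way to check for the zero-extension and spilling the title mentions on a given compiler version.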

[Bug target/82731] _mm256_set_epi8(array[offset[0]], array[offset[1]], ...) byte gather makes slow code, trying to zero-extend all the uint16_t offsets first and spilling them.

2024-04-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82731

Andrew Pinski changed:

What|Removed|Added
Last reconfirmed||2024-04-15
Severity|normal