https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66115

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to carloscastro10 from comment #9)
> __alignof__(__m128i) is 16, just like __alignof__(long) is 8 and
> __alignof__(int) is 4. However, if I have a pointer to long or a pointer to
> int, the memory addresses pointed at by those pointers can be aligned to any
> byte boundary.

That is not the case, please read the C or C++ standards.

> Assuming that the address pointed at by a pointer to __m128i
> is aligned to a 16-byte boundary is not a correct assumption, especially
> when compiling with -mavx. It prevents proper modeling in debug mode of
> access to unaligned operands in memory. This problem is also present with
> the __m256i type. In AVX, aligned memory load operations (_mm256_load_si256
> and similar) are the exceptions in that they require pointers to aligned
> memory addresses. Most AVX operations accept unaligned addresses.

One thing is how the HW instructions look like, another is the language you're
writing this in (C/C++), that isn't necessarily the same thing.
As I said, you really shouldn't add *(__m128i*) or similar (other __m128*,
__m256*, custom vector_size attribute defined types, etc.) dereferences if the
pointer isn't suitably aligned.  Use the unaligned loads and the compiler for
-mavx will when optimizing combine them where possible with the actual vector
arithmetics etc. instructions.
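
For illustration only (this is not part of the bug report), the advice above might look roughly like the following C sketch. The helper name sum4_u32_unaligned is hypothetical, and the memcpy fallback for targets without SSE2 is an assumption added so the example compiles elsewhere. The point is that the 16 bytes are fetched with the unaligned-load intrinsic _mm_loadu_si128 rather than by dereferencing a __m128i pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2: __m128i, _mm_loadu_si128, _mm_storeu_si128 */
#endif

/* Hypothetical helper: add up the four 32-bit lanes stored at p.
   p may point at any byte address; no 16-byte alignment is assumed. */
static uint32_t sum4_u32_unaligned(const void *p)
{
    uint32_t lanes[4];

    assert(p != NULL);
#if defined(__SSE2__)
    /* Unaligned load. The cast only matches the intrinsic's parameter
       type; the code never writes *(const __m128i *)p itself, which
       would tell the compiler that p is 16-byte aligned. */
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    _mm_storeu_si128((__m128i *)lanes, v);
#else
    /* Assumed portable fallback when SSE2 is not available. */
    memcpy(lanes, p, sizeof lanes);
#endif
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Given an unsigned char buffer buf, a call such as sum4_u32_unaligned(buf + 1) is fine here, whereas *(const __m128i *)(buf + 1) is the kind of dereference the comment warns against.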