https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81389
--- Comment #13 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to rockeet from comment #7)
> @Marc @Jakub @Martin
> Intel CPU document says: operand of _mm_cmpestri can be memory or mm
> register, when the operand is memory, it does not require alignment.

That's the doc for the CPU instruction. The intrinsic, as a C function, always takes an object of type __m128i, not a register or memory. The only question is what the alignment of the type __m128i is. In gcc, it is 16 bytes. What does alignof (or _Alignof or whatever variant you can get working) return with Intel's compiler?

> The issue is: GCC does not know this knowledge(memory operand need not
> memory align), and there is no way to enforce gcc to generate a _mm_cmpestri
> which always use memory operand, not mm register.

Use inline asm? Intrinsics are not quite as low level as you seem to expect.

> If I manually load the unaligned memory into an aligned `__m128i`, it has
> performance penalty on optimizing compilation.

Uh? With -O1, the compiler merges the unaligned load with pcmpestri (it knows that the insn can read unaligned memory). Did you mean to talk about the performance of code generated with -O0? We explicitly do not care about that.
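A minimal sketch of the pattern described in this comment, assuming GCC targeting x86-64 on a CPU with SSE4.2: load possibly unaligned bytes into `__m128i` values with `_mm_loadu_si128`, then pass those values to `_mm_cmpestri`. The helper name `find_any_of` and the "find the first delimiter" use case are hypothetical and chosen only for illustration. The `_Static_assert` encodes the 16-byte alignment of `__m128i` stated above. Whether the load is actually folded into the `pcmpestri` memory operand can be checked by inspecting the output of `gcc -O1 -S`; the exact instruction selection may vary between GCC versions.

```c
/* Sketch only: assumes GCC targeting x86-64 and a CPU with SSE4.2. */
#include <nmmintrin.h> /* _mm_cmpestri and the _SIDD_* mode flags */

/* GCC gives __m128i 16-byte alignment; this fails to compile otherwise. */
_Static_assert(_Alignof(__m128i) == 16, "expected 16-byte __m128i in GCC");

/* Hypothetical helper: returns the index of the first byte among the first
   hay_len bytes at hay that equals any of the first set_len bytes at set,
   or 16 if there is no match. Neither pointer needs to be aligned, but 16
   bytes must be readable from each, because every load reads 16 bytes. */
__attribute__((target("sse4.2")))
int find_any_of(const char *set, int set_len, const char *hay, int hay_len)
{
    /* Unaligned loads produce __m128i values; the intrinsic itself never
       sees a pointer, only these values. */
    __m128i needles = _mm_loadu_si128((const __m128i *)set);
    __m128i text = _mm_loadu_si128((const __m128i *)hay);

    /* pcmpestri's second source operand (text) may be an unaligned memory
       operand, so an optimizing build can fold the load of text into it. */
    return _mm_cmpestri(needles, set_len, text, hay_len,
                        _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                            _SIDD_LEAST_SIGNIFICANT);
}
```

For example, with `char set[16] = ",;"` and a buffer declared as `_Alignas(16) char buf[32] = "xhello, world;; padding padding"`, the call `find_any_of(set, 2, buf + 1, 15)` returns 5, the index of the comma, even though `buf + 1` is not 16-byte aligned.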