https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68991
            Bug ID: 68991
           Summary: -O3 generates misaligned xorv4si3
           Product: gcc
           Version: 5.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hjl.tools at gmail dot com
  Target Milestone: ---
            Target: x32

When compiling llvm 3.8 at -O3 for x32, GCC 5.3.1 turns

(insn 194 193 195 24 (set (reg:V4SI 246 [ vect__45.575 ])
        (xor:V4SI (mem/c:V4SI (plus:SI (reg/f:SI 20 frame)
                    (const_int -32 [0xffffffffffffffe0])) [14 MEM[(long unsigned int *)&D.120283]+0 S16 A128])
            (reg:V4SI 247))) /usr/include/c++/5.3.1/bitset:163 3434 {*xorv4si3}
     (expr_list:REG_EQUAL (not:V4SI (mem/c:V4SI (plus:SI (reg/f:SI 20 frame)
                    (const_int -32 [0xffffffffffffffe0])) [14 MEM[(long unsigned int *)&D.120283]+0 S16 A128]))
        (nil)))

into

(insn 194 193 439 22 (set (reg:V4SI 246 [ vect__45.575 ])
        (xor:V4SI (reg:V4SI 326)
            (reg:V4SI 247))) /usr/include/c++/5.3.1/bitset:163 3434 {*xorv4si3}
     (expr_list:REG_DEAD (reg:V4SI 326)
        (expr_list:REG_DEAD (reg:V4SI 247)
            (expr_list:REG_EQUAL (not:V4SI (mem/c:V4SI (plus:SI (reg/f:SI 20 frame)
                            (const_int -32 [0xffffffffffffffe0])) [14 MEM[(long unsigned int *)&D.120283]+0 S16 A128]))
                (nil)))))

Combine generates

(insn 194 193 439 22 (set (reg:V4SI 246 [ vect__45.575 ])
        (xor:V4SI (reg:V4SI 247)
            (subreg:V4SI (reg:TI 245 [ MEM[(const struct bitset &)FeatureEntry_21 + 8] ]) 0))) /usr/include/c++/5.3.1/bitset:163 3434 {*xorv4si3}
     (expr_list:REG_DEAD (reg:TI 245 [ MEM[(const struct bitset &)FeatureEntry_21 + 8] ])
        (expr_list:REG_DEAD (reg:V4SI 247)
            (nil))))

But memory is aligned at 4 bytes:

(insn 194 193 439 22 (set (reg:V4SI 21 xmm0 [orig:246 vect__45.575 ] [246])
        (xor:V4SI (reg:V4SI 21 xmm0 [247])
            (mem:V4SI (plus:SI (reg/v/f:SI 3 bx [orig:88 FeatureEntry ] [88])
                    (const_int 8 [0x8])) [12 MEM[(const struct bitset &)FeatureEntry_21 + 8]+0 S16 A32]))) /usr/include/c++/5.3.1/bitset:163 3434 {*xorv4si3}
     (nil))

Combine with subreg over vector memory, which may be misaligned, is only
valid for AVX, not for SSE.
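
For readers less used to RTL, the alignment problem can be sketched in plain C. This is not code from the report or from llvm. It assumes the GCC/Clang vector extensions, and the helper names (is_aligned, load_unaligned, not_v4si) are hypothetical. It also assumes that reg 247 holds all-ones, which is what the REG_EQUAL (not:V4SI ...) note suggests. The point is that an address like base + 8 can be 4-byte aligned (A32) without being 16-byte aligned, so a 16-byte load from it has to be an explicitly unaligned load rather than a memory operand folded into a non-AVX SSE instruction.

```c
#include <stdint.h>
#include <string.h>

/* GCC/Clang vector extension: four 32-bit ints, 16 bytes in total. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Hypothetical helper: nonzero if p is aligned to `align` bytes. */
static int is_aligned(const void *p, uintptr_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/* Hypothetical helper: load 16 bytes from a possibly misaligned address.
   memcpy makes no alignment assumption about p, so the compiler is free
   to pick an unaligned load instead of an aligned memory operand. */
static v4si load_unaligned(const void *p)
{
    v4si v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Bitwise NOT written as XOR with all-ones, mirroring the
   (xor:V4SI mem reg) form with its REG_EQUAL (not:V4SI mem) note. */
static v4si not_v4si(const void *p)
{
    const v4si ones = { -1, -1, -1, -1 };
    return load_unaligned(p) ^ ones;
}
```

Calling not_v4si(buf + 8) on a 16-byte-aligned buffer buf gives the same bytes as inverting each of the 16 source bytes individually, without relying on buf + 8 being 16-byte aligned.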