https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68991

            Bug ID: 68991
           Summary: -O3 generates misaligned xorv4si3
           Product: gcc
           Version: 5.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hjl.tools at gmail dot com
  Target Milestone: ---
            Target: x32

When compiling LLVM 3.8 at -O3 for x32, GCC 5.3.1 turns

(insn 194 193 195 24 (set (reg:V4SI 246 [ vect__45.575 ])
        (xor:V4SI (mem/c:V4SI (plus:SI (reg/f:SI 20 frame)
                    (const_int -32 [0xffffffffffffffe0])) [14 MEM[(long
unsigned int *)&D.120283]+0 S16 A128])
            (reg:V4SI 247))) /usr/include/c++/5.3.1/bitset:163 3434 {*xorv4si3}
     (expr_list:REG_EQUAL (not:V4SI (mem/c:V4SI (plus:SI (reg/f:SI 20 frame)
                    (const_int -32 [0xffffffffffffffe0])) [14 MEM[(long
unsigned int *)&D.120283]+0 S16 A128]))
        (nil)))

into

(insn 194 193 439 22 (set (reg:V4SI 246 [ vect__45.575 ])
        (xor:V4SI (reg:V4SI 326)
            (reg:V4SI 247))) /usr/include/c++/5.3.1/bitset:163 3434 {*xorv4si3}
     (expr_list:REG_DEAD (reg:V4SI 326)
        (expr_list:REG_DEAD (reg:V4SI 247)
            (expr_list:REG_EQUAL (not:V4SI (mem/c:V4SI (plus:SI (reg/f:SI 20
frame)
                            (const_int -32 [0xffffffffffffffe0])) [14 MEM[(long
unsigned int *)&D.120283]+0 S16 A128]))
                (nil)))))

Combine generates

(insn 194 193 439 22 (set (reg:V4SI 246 [ vect__45.575 ])
        (xor:V4SI (reg:V4SI 247)
            (subreg:V4SI (reg:TI 245 [ MEM[(const struct bitset
&)FeatureEntry_21 + 8] ]) 0))) /usr/include/c++/5.3.1/bitset:163 3434
{*xorv4si3}
     (expr_list:REG_DEAD (reg:TI 245 [ MEM[(const struct bitset
&)FeatureEntry_21 + 8] ])
        (expr_list:REG_DEAD (reg:V4SI 247) 
            (nil))))

But the memory is only 4-byte aligned (A32), not 16-byte aligned:

(insn 194 193 439 22 (set (reg:V4SI 21 xmm0 [orig:246 vect__45.575 ] [246])
        (xor:V4SI (reg:V4SI 21 xmm0 [247])
            (mem:V4SI (plus:SI (reg/v/f:SI 3 bx [orig:88 FeatureEntry ] [88])
                    (const_int 8 [0x8])) [12 MEM[(const struct bitset
&)FeatureEntry_21 + 8]+0 S16 A32]))) /usr/include/c++/5.3.1/bitset:163 3434
{*xorv4si3}
     (nil))

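Combining a subreg of a vector memory load, which may be misaligned,
into the insn as a memory operand is valid only for AVX, not for SSE:
SSE instructions other than the unaligned moves require their 16-byte
memory operands to be 16-byte aligned.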
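
The rule can be sketched as a small predicate. This is only an
illustrative model, not GCC's actual code: the names isa, ISA_SSE,
ISA_AVX and mem_operand_ok are hypothetical. It assumes the RTL dump
notation used above, where S16 is the access size in bytes and A32 or
A128 is the known alignment in bits.

```c
#include <stdbool.h>

/* Hypothetical model of the check; not taken from the GCC sources. */
enum isa { ISA_SSE, ISA_AVX };

/* Return true if a vector memory reference with the given known
   alignment (in bits, the "A" value in the dump) and size (in bytes,
   the "S" value) may be used directly as the memory operand of a
   vector arithmetic insn such as *xorv4si3.  */
static bool
mem_operand_ok (unsigned align_bits, unsigned size_bytes, enum isa isa)
{
  /* VEX-encoded (AVX) arithmetic instructions accept unaligned
     memory operands.  */
  if (isa == ISA_AVX)
    return true;

  /* Legacy SSE arithmetic instructions fault unless the memory
     operand is aligned to its full size.  */
  return align_bits >= size_bytes * 8;
}
```

Under this model the operand from the original insn (S16 A128) is
acceptable for SSE, while the operand that ends up in the final insn
(S16 A32) is acceptable only when AVX is enabled.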
