Hello, I was running a couple of tests on various platforms in preparation for getting the find_reload_subreg_address patch needed by aarch64 upstream: http://gcc.gnu.org/ml/gcc-patches/2012-07/msg01421.html
This unfortunately uncovered a regression in vect-98-big-array.c on i386.  It seems to me that this is due to a pre-existing problem in the back-end.

The insn being reloaded is:

(insn 17 15 19 3 (set (subreg:V16QI (reg:V4SI 182) 0)
        (unspec:V16QI [
                (mem:V16QI (plus:SI (reg:SI 64 [ ivtmp.97 ])
                        (const_int 16 [0x10])) [2 MEM[base: _164, offset: 16B]+0 S16 A32])
            ] UNSPEC_MOVU)) /home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c:23 921 {sse2_movdqu}
     (nil))

where reg:V4SI 182 was spilled to the stack.

With the current compiler, this gets reloaded via a straightforward output reload:

(insn 17 15 195 3 (set (reg:V16QI 21 xmm0)
        (unspec:V16QI [
                (mem:V16QI (plus:SI (reg:SI 0 ax [orig:64 ivtmp.97 ] [64])
                        (const_int 16 [0x10])) [2 MEM[base: _164, offset: 16B]+0 S16 A32])
            ] UNSPEC_MOVU)) /home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c:23 921 {sse2_movdqu}
     (nil))

(insn 195 17 19 3 (set (mem/c:V16QI (plus:SI (reg/f:SI 7 sp)
                (const_int 112 [0x70])) [4 %sfp+-1232 S16 A128])
        (reg:V16QI 21 xmm0)) /home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c:23 901 {*movv16qi_internal}
     (nil))

But with the above patch applied, we get an input reload instead:

(insn 196 15 17 3 (set (reg:V16QI 21 xmm0)
        (mem:V16QI (plus:SI (reg:SI 0 ax [orig:64 ivtmp.97 ] [64])
                (const_int 16 [0x10])) [2 MEM[base: _164, offset: 16B]+0 S16 A32])) /home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c:23 901 {*movv16qi_internal}
     (nil))

(insn 17 196 19 3 (set (mem/c:V16QI (plus:SI (reg/f:SI 7 sp)
                (const_int 112 [0x70])) [4 %sfp+-1232 S16 A128])
        (unspec:V16QI [
                (reg:V16QI 21 xmm0)
            ] UNSPEC_MOVU)) /home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c:23 921 {sse2_movdqu}
     (nil))

Now this is clearly broken, since the load from memory is no longer marked as potentially unaligned.  And indeed, execution of this instruction traps due to a misaligned access.
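For what it's worth, one way to avoid the ambiguity would be to split sse2_movdqu into a pure load and a pure store pattern, so that the memory operand is pinned to one side of each insn.  A rough, untested sketch -- the pattern names, mode iterators, and asm templates below are illustrative only, not actual sse.md code:

;; Hypothetical: unaligned load only -- the destination must be a
;; register, so reload cannot turn this into an unaligned-marked store.
(define_insn "<sse2>_loaddqu<avxsizesuffix>"
  [(set (match_operand:VI1 0 "register_operand" "=x")
        (unspec:VI1 [(match_operand:VI1 1 "memory_operand" "m")]
                    UNSPEC_MOVU))]
  "TARGET_SSE2"
  "%vmovdqu\t{%1, %0|%0, %1}")

;; Hypothetical: unaligned store only -- the source must be a register.
(define_insn "<sse2>_storedqu<avxsizesuffix>"
  [(set (match_operand:VI1 0 "memory_operand" "=m")
        (unspec:VI1 [(match_operand:VI1 1 "register_operand" "x")]
                    UNSPEC_MOVU))]
  "TARGET_SSE2"
  "%vmovdqu\t{%1, %0|%0, %1}")

With this split, reload can still spill the register operand, but it can no longer move the UNSPEC_MOVU wrapper from the load side to the store side of the instruction.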
However, looking at the underlying pattern:

(define_insn "<sse2>_movdqu<avxsizesuffix>"
  [(set (match_operand:VI1 0 "nonimmediate_operand" "=x,m")
        (unspec:VI1 [(match_operand:VI1 1 "nonimmediate_operand" "xm,x")]
                    UNSPEC_MOVU))]
  "TARGET_SSE2 && !(MEM_P (operands[0]) && MEM_P (operands[1]))"

the back-end actually *allows* this transformation just fine.  So reload has always had both options.  Since both require just one reload, which one gets chosen is a matter of additional heuristics, and my patch as a side effect slightly changes one of those heuristics ...  But in principle the problem is latent even without the patch.  The back-end should not permit reload to make changes to the pattern that fundamentally change the semantics of the resulting instruction ...

I was wondering if the i386 port maintainers could have a look at this pattern.  Shouldn't we really have two patterns, one to *load* an unaligned value and one to *store* an unaligned value, and not permit that memory access to get reloaded?

[I guess a more fundamental change might also be to not have an unspec in the first place, but only plain moves, and check MEM_ALIGN in the move insn emitter to see which variant of the instruction is required ...]

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com