https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89049

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
So combine can see

(insn 11 10 13 3 (set (reg:V8SF 105)
        (vec_concat:V8SF (reg:V4SF 106 [ MEM[base: _2, offset: 0B] ])
            (mem:V4SF (plus:DI (reg:DI 85 [ ivtmp.11 ])
                    (const_int 16 [0x10])) [1 MEM[base: _2, offset: 0B]+16 S16 A32]))) "t.c":1:72 5046 {avx_vec_concatv8sf}
     (nil))

with its uses

(insn 13 11 14 3 (set (reg:V4SF 107)
        (vec_select:V4SF (reg:V8SF 105)
            (parallel [
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (const_int 2 [0x2])
                    (const_int 3 [0x3])
                ]))) 2702 {vec_extract_lo_v8sf}
     (nil))
(insn 25 24 26 3 (set (reg:V4SF 111)
        (vec_select:V4SF (reg:V8SF 105)
            (parallel [
                    (const_int 4 [0x4])
                    (const_int 5 [0x5])
                    (const_int 6 [0x6])
                    (const_int 7 [0x7])
                ]))) 2711 {vec_extract_hi_v8sf}
     (expr_list:REG_DEAD (reg:V8SF 105)
        (nil)))

but somehow it only tries 11 -> 13:

Trying 11 -> 13:
   11: r105:V8SF=vec_concat(r106:V4SF,[r85:DI+0x10])
      REG_DEAD r106:V4SF
   13: r107:V4SF=vec_select(r105:V8SF,parallel)
...
Successfully matched this instruction:
(set (reg:V8SF 105)
    (vec_concat:V8SF (reg:V4SF 106 [ MEM[base: _2, offset: 0B] ])
        (mem:V4SF (plus:DI (reg:DI 85 [ ivtmp.11 ])
                (const_int 16 [0x10])) [1 MEM[base: _2, offset: 0B]+16 S16 A32])))
Successfully matched this instruction:
(set (reg:V4SF 107)
    (reg:V4SF 106 [ MEM[base: _2, offset: 0B] ]))
allowing combination of insns 11 and 13
original costs 4 + 4 = 8
replacement costs 4 + 4 = 8
modifying insn i2    11: r105:V8SF=vec_concat(r106:V4SF,[r85:DI+0x10])
deferring rescan insn with uid = 11.
modifying insn i3    13: r107:V4SF=r106:V4SF
      REG_DEAD r106:V4SF

then it continues:

Trying 11 -> 25:
   11: r105:V8SF=vec_concat(r106:V4SF,[r85:DI+0x10])
   25: r111:V4SF=vec_select(r105:V8SF,parallel)
      REG_DEAD r105:V8SF
Successfully matched this instruction:
(set (reg:V4SF 111)
    (mem:V4SF (plus:DI (reg:DI 85 [ ivtmp.11 ])
            (const_int 16 [0x10])) [1 MEM[base: _2, offset: 0B]+16 S16 A32]))
rejecting combination of insns 11 and 25
original costs 4 + 4 = 8
replacement cost 12

where it rejects this for some reason...  I think the cost of 4 assigned
to 11 is bogus here (maybe combine uses wrong costs, not accounting for
embedded MEMs?)
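
The two decisions in the dumps are consistent with a simple rule: a combination is kept only if the summed cost of the replacement insns does not exceed the summed cost of the originals. Below is a minimal sketch of that comparison in C, fed with the numbers printed in the dumps. The function name `combination_allowed` is hypothetical and is not GCC's code; the actual check in combine may take more into account.

```c
#include <stdbool.h>

/* Hypothetical helper, not a GCC function: models the accept/reject
   decision implied by the "original costs" / "replacement cost" lines
   in the combine dump.  A combination is rejected only when the old
   cost is known (positive) and the new cost is strictly greater.  */
static bool
combination_allowed (int old_cost, int new_cost)
{
  bool reject = old_cost > 0 && new_cost > old_cost;
  return !reject;
}

/* Costs as printed in the dump for 11 -> 13:
   original 4 + 4 = 8, replacement 4 + 4 = 8.  */
static bool
try_11_13 (void)
{
  return combination_allowed (4 + 4, 4 + 4);
}

/* Costs as printed in the dump for 11 -> 25:
   original 4 + 4 = 8, replacement 12 (the V4SF load from memory).  */
static bool
try_11_25 (void)
{
  return combination_allowed (4 + 4, 12);
}
```

With these inputs `try_11_13` returns true and `try_11_25` returns false, matching the "allowing" and "rejecting" lines. It also shows why the cost of 4 for insn 11 looks suspicious: that insn contains the same V4SF memory load that is costed at 12 when it stands alone, so a higher cost for insn 11 would let the second combination through.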