https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104034
Bug ID: 104034
Summary: Miscompilation of LLVM on s390x with -march=z13 -mtune=z14 in GCC 8.x
Product: gcc
Version: 8.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: krebbel at gcc dot gnu.org
Target Milestone: ---

Created attachment 52194
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52194&action=edit
Testcase

Initial analysis done by Jakub Jelinek as part of:
https://bugzilla.redhat.com/show_bug.cgi?id=2028609

The following testcase is miscompiled on s390x with
g++ -fPIC -fvisibility-inlines-hidden -ffunction-sections -fdata-sections -O2 -fPIC -fno-exceptions -fno-rtti -std=c++14 -mlong-double-128 -march=z13 -mtune=z14
both with the RHEL gcc 8.x and with upstream 8.5.0.

When miscompiled, it prints something like
__insertion_sort 0x3ffd74fd310 0x3ffd74fd348 0xdeadbeefcafebabe 0xdeadbeefcafebabe
__insertion_sort 0x3ffd74fd348 0x3ffd74fd348 0x10006b8 0xdeadbeefcafebabe
rather than
__insertion_sort 0x3ffd74fd310 0x3ffd74fd348 0x10006b8 0xdeadbeefcafebabe
__insertion_sort 0x3ffd74fd348 0x3ffd74fd348 0x10006b8 0xdeadbeefcafebabe

The interesting part is below, with .cfi_* directives removed for brevity.
On entry, this function has 3 pointers in the %r2, %r3 and %r4 registers, and
%r5 is a pointer to the 16-byte function_ref<decltype(foo)> object, a
trivially copyable class containing two 8-byte members.

_ZSt24__merge_sort_with_bufferIPPvS1_N4llvm12function_refIFbS0_S0_EEEEvT_S6_T0_T1_:
        stmg    %r6,%r15,48(%r15)
        lgr     %r14,%r15
        lay     %r15,-248(%r15)
        aghi    %r14,-32
        std     %f8,0(%r14)
        std     %f12,8(%r14)
        std     %f14,16(%r14)
        std     %f9,24(%r14)
        sgrk    %r11,%r3,%r2
        lgr     %r1,%r4
        srag    %r13,%r11,3
        agr     %r1,%r11
        lmg     %r8,%r9,0(%r5)
        stmg    %r8,%r9,160(%r15)
! The above stores the whole 16-byte function_ref correctly to %r15+160
        cgijle  %r11,48,.L13
        vlvgp   %v0,%r8,%r9
        ldgr    %f9,%r1
        ldgr    %f12,%r4
        la      %r1,200(%r15)
        lgr     %r10,%r3
        stg     %r11,176(%r15)
        ldgr    %f8,%r2
        lgr     %r6,%r9
        vlgvg   %r7,%v0,1
        stmg    %r8,%r9,184(%r15)
! So does the above
        lgr     %r8,%r1
.L14:
        la      %r11,56(%r2)
        lgr     %r4,%r8
        lgr     %r3,%r11
        stmg    %r6,%r7,200(%r15)
! But this one actually stores the same value in both 8-byte words at %r15+200, and %r15+200 is passed as %r4 to the function
        brasl   %r14,_ZSt16__insertion_sortIPPvN4llvm12function_refIFbS0_S0_EEEEvT_S6_T0_@PLT

In *.postreload, the RTL is still correct:

(insn 16 12 166 2 (set (reg/v:TI 16 %f0 [orig:69 __comp ] [69])
        (reg:TI 8 %r8)) 1268 {movti}
     (nil))
...
(insn 137 136 140 3 (set (reg/v:TI 6 %r6 [orig:69 __comp ] [69])
        (reg/v:TI 16 %f0 [orig:69 __comp ] [69])) 1268 {movti}
     (nil))

The code spills the value to the 128-bit %f0 register and loads it back from
there. Next, the split2 pass splits the latter (but not the former) into:

(insn 167 136 168 3 (set (reg:DI 6 %r6 [ __comp ])
        (reg:DI 16 %f0)) 1269 {*movdi_64}
     (nil))
(insn 168 167 140 3 (set (reg:DI 7 %r7 [orig:69 __comp+8 ] [69])
        (unspec:DI [
                (reg:V2DI 16 %f0)
                (const_int 1 [0x1])
            ] UNSPEC_VEC_EXTRACT)) 402 {*vec_extractv2di}
     (nil))

and finally cprop_hardreg, seeing

(insn 187 188 186 3 (set (reg/v:TI 16 %f0 [orig:69 __comp ] [69])
        (reg:TI 8 %r8)) 1268 {movti}
     (nil))

changes insn 167 to:

(insn 167 136 168 3 (set (reg:DI 6 %r6 [ __comp ])
        (reg:DI 9 %r9 [16])) 1269 {*movdi_64}
     (nil))

I'm not sure if this is a bug in the

; Split a VR -> GPR TImode move into 2 vector load GR from VR element.
; For the higher order bits we do simply a DImode move while the
; second part is done via vec extract.  Both will end up as vlgvg.
(define_split
  [(set (match_operand:TI 0 "register_operand" "")
        (match_operand:TI 1 "register_operand" ""))]
  "TARGET_VX && reload_completed
   && GENERAL_REG_P (operands[0])
   && VECTOR_REG_P (operands[1])"
  [(set (match_dup 2) (match_dup 4))
   (set (match_dup 3) (unspec:DI [(match_dup 5) (const_int 1)]
                                 UNSPEC_VEC_EXTRACT))]
{
  operands[2] = operand_subword (operands[0], 0, 0, TImode);
  operands[3] = operand_subword (operands[0], 1, 0, TImode);
  operands[4] = gen_rtx_REG (DImode, REGNO (operands[1]));
  operands[5] = gen_rtx_REG (V2DImode, REGNO (operands[1]));
})

splitter, in cprop_hardreg or in the s390x representation of those TImodes in
floating point registers.

In GCC 9 it got "fixed" with
https://gcc.gnu.org/r9-3763-gef976be1a23a517
but that just means it went latent. I also can't reproduce it with the
upstream GCC 9 branch with r9-3763 reverted, because some RA decisions changed.
That doesn't mean the problem isn't latent even on the trunk; the above
splitter is certainly still there.
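
The effect of the cprop_hardreg rewrite can be illustrated with a small toy model. This is only a sketch and not GCC code: the types GprPair and VecReg and all three functions are hypothetical names invented here. It assumes that the even register of a GPR pair (%r8) holds the high 64 bits of the TImode value, and that the DImode view of %f0 is doubleword element 0 of %v0, as the vlvgp/vlgvg sequence in the assembly suggests.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: none of these names exist in GCC.

// A 128-bit value held in an even/odd GPR pair such as %r8/%r9.
// Assumption: the even register carries the high 64 bits.
struct GprPair {
    std::uint64_t even;  // e.g. %r8 or %r6
    std::uint64_t odd;   // e.g. %r9 or %r7
};

// A 128-bit vector register such as %v0.
// Assumption: elem0 is the half that the DImode view of %f0 refers to.
struct VecReg {
    std::uint64_t elem0;
    std::uint64_t elem1;
};

// Models "vlvgp %v0,%r8,%r9": even register to element 0, odd to element 1.
inline VecReg move_gpr_pair_to_vr(GprPair src) {
    return VecReg{src.even, src.odd};
}

// Models what the splitter intends: a DImode move of element 0 plus a
// vec extract of element 1.
inline GprPair split_vr_to_gpr_pair(VecReg src) {
    return GprPair{src.elem0, src.elem1};
}

// Models the split after cprop_hardreg replaced (reg:DI %f0) with
// (reg:DI %r9): the first word now comes from the odd source GPR,
// while the second word still comes from element 1.
inline GprPair split_after_bad_copy_prop(VecReg src, GprPair origin) {
    return GprPair{origin.odd, src.elem1};
}
```

Feeding the model the pair {0x10006b8, 0xdeadbeefcafebabe} from the testcase output gives the same pair back through split_vr_to_gpr_pair, while split_after_bad_copy_prop yields 0xdeadbeefcafebabe in both words, which matches the miscompiled first __insertion_sort line.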