[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org --- Comment #9 from Tamar Christina --- This change has quite a negative effect on the cost model in AArch64 an ICE due to the new costs and register classes it picks. See PR88282 Thanks, Tamar
[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718 Jakub Jelinek changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #8 from Jakub Jelinek --- Fixed, thanks.
[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718 --- Comment #7 from Vladimir Makarov --- Author: vmakarov Date: Thu Nov 22 17:25:57 2018 New Revision: 266385 URL: https://gcc.gnu.org/viewcvs?rev=266385&root=gcc&view=rev Log: 2018-11-22 Vladimir Makarov PR rtl-optimization/87718 * ira-costs.c: Remove trailing white-spaces. (record_operand_costs): Add a special treatment for moves involving a hard register. 2018-11-22 Vladimir Makarov PR rtl-optimization/87718 * gcc.target/i386/pr82361-1.c: Check only the first operand of moves. Modified: trunk/gcc/ChangeLog trunk/gcc/ira-costs.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/i386/pr82361-1.c
[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718 --- Comment #6 from Vladimir Makarov --- The culprit for the bad code generation is the following insn description (define_insn "*movsi_internal" [(set (match_operand:SI 0 "nonimmediate_operand" "=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm") (match_operand:SI 1 "general_operand" "g ,re,C ,*y,m ,*y,*y,r ,C ,*v,m ,*v,*v,r ,*r,*km,*k"))] Alternatives with sse regs are not considered at all (hint *) for cost calculation even if one operand is sse hard reg. And therefore sse class for another operand with pseudo is too costly. Removing the hints is not a solution. I believe we will have even more problems with GCC testsuite. So I am trying to solve it with specific treatment of moves for cost calculations. The patch I am working on solves the PR but currently creates a few GCC testsuite failures (unexpected but correct code generation). So I am continuing to work on the PR.
[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718 --- Comment #5 from Vladimir Makarov --- In general moving from propagation of hard regs is good thing for RA. Although there are exception as this PR. The problem starts with IRA. It decides that r91 should be a general regs based on cost calculation. The cost calculation code in IRA is very sensitive. A change there usually results in new PRs with unexpected code generation. I'll investigate more the PR and how to fix the PR with minimal effect to other targets and tests. But right now I can guess that the cost of move of greg with sse-regs and move of sse-regs is the same for intel but if we choose sse-regs they are coalesced and the move is removed.
[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718 --- Comment #4 from Terry Guo --- (In reply to Uroš Bizjak from comment #2) > Following testcase: > > --cut here-- > typedef int V __attribute__((vector_size (8))); > > void foo (int x, int y) > { > register int a __asm ("xmm1"); > register int b __asm ("xmm2"); > register V c __asm ("xmm3"); > a = x; > b = y; > asm volatile ("" : "+v" (a), "+v" (b)); > c = (V) { a, b }; > asm volatile ("" : "+v" (c)); > } > --cut here-- > > gets compiled with -O2 -mavx -mtune=intel: > > vmovd %edi, %xmm1 > vmovd %esi, %xmm2 > vmovd %xmm2, %eax > vpinsrd $1, %eax, %xmm1, %xmm3 > ret > > The relevant pattern is defined as: > > (define_insn "*vec_concatv2si_sse4_1" > [(set (match_operand:V2SI 0 "register_operand" > "=Yr,*x, x, v,Yr,*x, v, v, *y,*y") > (vec_concat:V2SI > (match_operand:SI 1 "nonimmediate_operand" > " 0, 0, x,Yv, 0, 0,Yv,rm, 0,rm") > (match_operand:SI 2 "nonimm_or_0_operand" > " rm,rm,rm,rm,Yr,*x,Yv, C,*ym, C")))] > "TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))" > "@ >pinsrd\t{$1, %2, %0|%0, %2, 1} >pinsrd\t{$1, %2, %0|%0, %2, 1} >vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1} >vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1} >punpckldq\t{%2, %0|%0, %2} >punpckldq\t{%2, %0|%0, %2} >vpunpckldq\t{%2, %1, %0|%0, %1, %2} >%vmovd\t{%1, %0|%0, %1} >punpckldq\t{%2, %0|%0, %2} >movd\t{%1, %0|%0, %1}" > > but for some reason RA chooses alternative 2 (x<-x,rm) instead of > alternative 6 (v<-Yv,Yv), although alternative 2 needs an extra reload from > %xmm2 to %eax. I dig this a bit and looks like we missed something in combine pass, hence fail to get a pattern that can match alternative 6. The combine pass dump of old gcc shows: --- REG_UNUSED flags:CC insn_cost 4 for10: r82:SI=xmm16:SI REG_DEAD xmm16:SI insn_cost 4 for11: r83:SI=xmm17:SI REG_DEAD xmm17:SI insn_cost 4 for12: r87:V2SI=vec_concat(r82:SI,r83:SI) REG_DEAD r83:SI REG_DEAD r82:SI --- then we got: --- Trying 10 -> 12: 10: r82:SI=xmm16:SI REG_DEAD xmm16:SI 12: r87:V2SI=vec_concat(r82:SI,r83:SI) REG_DEAD r83:SI REG_DEAD r82:SI Successfully matched this instruction: (set (reg:V2SI 87) (vec_concat:V2SI (reg/v:SI 52 xmm16 [ a ]) (reg:SI 83 [ b.1_2 ]))) allowing combination of insns 10 and 12 original costs 4 + 4 = 8 replacement cost 4 deferring deletion of insn with uid = 10. modifying insn i312: r87:V2SI=vec_concat(xmm16:SI,r83:SI) REG_DEAD xmm16:SI REG_DEAD r83:SI deferring rescan insn with uid = 12. Trying 11 -> 12: 11: r83:SI=xmm17:SI REG_DEAD xmm17:SI 12: r87:V2SI=vec_concat(xmm16:SI,r83:SI) REG_DEAD xmm16:SI REG_DEAD r83:SI Successfully matched this instruction: (set (reg:V2SI 87) (vec_concat:V2SI (reg/v:SI 52 xmm16 [ a ]) (reg/v:SI 53 xmm17 [ b ]))) allowing combination of insns 11 and 12 original costs 4 + 4 = 8 replacement cost 4 deferring deletion of insn with uid = 11. modifying insn i312: r87:V2SI=vec_concat(xmm16:SI,xmm17:SI) REG_DEAD xmm17:SI REG_DEAD xmm16:SI deferring rescan insn with uid = 12. --- There are two successful combine attempts. We end up with pattern that can match alternative 6. However dump from current GCC trunk shows: --- insn_cost 4 for19: r90:SI=xmm16:SI REG_DEAD xmm16:SI insn_cost 4 for10: r82:SI=r90:SI REG_DEAD r90:SI insn_cost 4 for20: r91:SI=xmm17:SI REG_DEAD xmm17:SI insn_cost 4 for11: r83:SI=r91:SI REG_DEAD r91:SI insn_cost 4 for12: r87:V2SI=vec_concat(r82:SI,r83:SI) REG_DEAD r83:SI REG_DEAD r82:SI insn_cost 4 for13: xmm3:V2SI=r87:V2SI REG_DEAD r87:V2SI --- Trying 11 -> 12: 11: r83:SI=r91:SI REG_DEAD r91:SI 12: r87:V2SI=vec_concat(r90:SI,r83:SI) REG_DEAD r90:SI REG_DEAD r83:SI Successfully matched this instruction: (set (reg:V2SI 87) (vec_concat:V2SI (reg:SI 90) (reg:SI 91))) allowing combination of insns 11 and 12 original costs 4 + 4 = 8 replacement cost 4 deferring deletion of insn with uid = 11. modifying insn i312: r87:V2SI=vec_concat(r90:SI,r91:SI) REG_DEAD r91:SI REG_DEAD r90:SI deferring rescan insn with uid = 12. --- We end up with "12: r87:V2SI=vec_concat(r90:SI,r91:SI)", later in LRA pass, the operand r90 is replaced with XMM register, the r91 is kept as general register. Then no chance match against preferred alternative 6.
[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org, ||vmakarov at gcc dot gnu.org --- Comment #3 from Jakub Jelinek --- Vlad, could you please have a look?
[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718 --- Comment #2 from Uroš Bizjak --- Following testcase: --cut here-- typedef int V __attribute__((vector_size (8))); void foo (int x, int y) { register int a __asm ("xmm1"); register int b __asm ("xmm2"); register V c __asm ("xmm3"); a = x; b = y; asm volatile ("" : "+v" (a), "+v" (b)); c = (V) { a, b }; asm volatile ("" : "+v" (c)); } --cut here-- gets compiled with -O2 -mavx -mtune=intel: vmovd %edi, %xmm1 vmovd %esi, %xmm2 vmovd %xmm2, %eax vpinsrd $1, %eax, %xmm1, %xmm3 ret The relevant pattern is defined as: (define_insn "*vec_concatv2si_sse4_1" [(set (match_operand:V2SI 0 "register_operand" "=Yr,*x, x, v,Yr,*x, v, v, *y,*y") (vec_concat:V2SI (match_operand:SI 1 "nonimmediate_operand" " 0, 0, x,Yv, 0, 0,Yv,rm, 0,rm") (match_operand:SI 2 "nonimm_or_0_operand" " rm,rm,rm,rm,Yr,*x,Yv, C,*ym, C")))] "TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))" "@ pinsrd\t{$1, %2, %0|%0, %2, 1} pinsrd\t{$1, %2, %0|%0, %2, 1} vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1} vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1} punpckldq\t{%2, %0|%0, %2} punpckldq\t{%2, %0|%0, %2} vpunpckldq\t{%2, %1, %0|%0, %1, %2} %vmovd\t{%1, %0|%0, %1} punpckldq\t{%2, %0|%0, %2} movd\t{%1, %0|%0, %1}" but for some reason RA chooses alternative 2 (x<-x,rm) instead of alternative 6 (v<-Yv,Yv), although alternative 2 needs an extra reload from %xmm2 to %eax.
[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718 --- Comment #1 from Uroš Bizjak --- *** Bug 87717 has been marked as a duplicate of this bug. ***
[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718 Richard Biener changed: What|Removed |Added Keywords||missed-optimization, ra Status|UNCONFIRMED |NEW Last reconfirmed||2018-10-24 Target Milestone|--- |9.0 Ever confirmed|0 |1