[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c

2018-11-30 Thread tnfchris at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718

Tamar Christina  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tnfchris at gcc dot gnu.org

--- Comment #9 from Tamar Christina  ---
This change has quite a negative effect on the cost model on AArch64 and causes
an ICE due to the new costs and register classes it picks.

See PR88282

Thanks,
Tamar

[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c

2018-11-27 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Jakub Jelinek  ---
Fixed, thanks.

[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c

2018-11-22 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718

--- Comment #7 from Vladimir Makarov  ---
Author: vmakarov
Date: Thu Nov 22 17:25:57 2018
New Revision: 266385

URL: https://gcc.gnu.org/viewcvs?rev=266385&root=gcc&view=rev
Log:
2018-11-22  Vladimir Makarov  

PR rtl-optimization/87718
* ira-costs.c: Remove trailing white-spaces.
(record_operand_costs): Add a special treatment for moves
involving a hard register.

2018-11-22  Vladimir Makarov  

PR rtl-optimization/87718
* gcc.target/i386/pr82361-1.c: Check only the first operand of
moves.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/ira-costs.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/i386/pr82361-1.c

[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c

2018-11-19 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718

--- Comment #6 from Vladimir Makarov  ---
The culprit for the bad code generation is the following insn description:

(define_insn "*movsi_internal"
  [(set (match_operand:SI 0 "nonimmediate_operand"
"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm")
(match_operand:SI 1 "general_operand"
"g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k"))]

Alternatives with SSE regs are not considered at all for cost calculation
(because of the * hint), even if one operand is an SSE hard reg.  As a result,
the SSE class looks too costly for the other operand, which is a pseudo.
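
As a rough illustration of the "*" hint (a minimal sketch, not the code in
ira-costs.c): the C helper below drops every constraint letter that follows a
"*" from one alternative, which is roughly what the preference calculation
gets to see.  The name visible_for_preferences is made up for this example.
The sketch assumes single-letter constraints and treats blanks as alignment
padding; both are simplifications.

```c
#include <assert.h>
#include <stddef.h>

/* Copy one alternative's constraint string into OUT, dropping every
   letter that is preceded by '*'.  Hypothetical helper for illustration,
   not GCC code.  Assumption: each constraint is a single character
   (real GCC also has multi-letter constraints such as "Yv").  */
static void
visible_for_preferences (const char *alt, char *out, size_t out_size)
{
  size_t n = 0;

  assert (alt != NULL && out != NULL && out_size > 0);
  for (const char *p = alt; *p != '\0'; p++)
    {
      if (*p == '*')
        {
          /* Skip the '*' and the letter it hides, if there is one.  */
          if (p[1] != '\0')
            p++;
          continue;
        }
      if (*p == ' ')
        continue;               /* Assumed to be alignment padding.  */
      if (n + 1 < out_size)
        out[n++] = *p;          /* Letter stays visible.  */
    }
  out[n] = '\0';
}
```

For the alternative "*v" this leaves an empty string, so nothing in that
alternative speaks for an SSE class when preferences are computed; for "*rm"
only "m" remains, and "re" is left unchanged.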

Removing the hints is not a solution.  I believe we would have even more
problems with the GCC testsuite.  So I am trying to solve it with a specific
treatment of moves in the cost calculation.  The patch I am working on solves
the PR but currently creates a few GCC testsuite failures (unexpected but
correct code generation).  So I am continuing to work on the PR.

[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c

2018-11-14 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718

--- Comment #5 from Vladimir Makarov  ---
  In general, moving away from propagation of hard regs is a good thing for RA,
although there are exceptions such as this PR.

  The problem starts with IRA.  Based on its cost calculation, it decides that
r91 should be in general regs.  The cost calculation code in IRA is very
sensitive.  A change there usually results in new PRs with unexpected code
generation.

  I'll investigate the PR further and look for a fix with minimal effect on
other targets and tests.  Right now my guess is that, for intel tuning, a move
between a general reg and an SSE reg costs the same as a move between SSE regs,
but if we choose SSE regs the pseudos are coalesced and the move is removed.

[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c

2018-11-13 Thread xuepeng.guo at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718

--- Comment #4 from Terry Guo  ---
(In reply to Uroš Bizjak from comment #2)
> Following testcase:
> 
> --cut here--
> typedef int V __attribute__((vector_size (8)));
> 
> void foo (int x, int y)
> {
>   register int a __asm ("xmm1");
>   register int b __asm ("xmm2");
>   register V c __asm ("xmm3");
>   a = x;
>   b = y;
>   asm volatile ("" : "+v" (a), "+v" (b));
>   c = (V) { a, b };
>   asm volatile ("" : "+v" (c));
> }
> --cut here--
> 
> gets compiled with -O2 -mavx -mtune=intel:
> 
> vmovd   %edi, %xmm1
> vmovd   %esi, %xmm2
> vmovd   %xmm2, %eax
> vpinsrd $1, %eax, %xmm1, %xmm3
> ret
> 
> The relevant pattern is defined as:
> 
> (define_insn "*vec_concatv2si_sse4_1"
>   [(set (match_operand:V2SI 0 "register_operand"
> "=Yr,*x, x, v,Yr,*x, v, v, *y,*y")
>   (vec_concat:V2SI
> (match_operand:SI 1 "nonimmediate_operand"
> "  0, 0, x,Yv, 0, 0,Yv,rm,  0,rm")
> (match_operand:SI 2 "nonimm_or_0_operand"
> " rm,rm,rm,rm,Yr,*x,Yv, C,*ym, C")))]
>   "TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
>   "@
>pinsrd\t{$1, %2, %0|%0, %2, 1}
>pinsrd\t{$1, %2, %0|%0, %2, 1}
>vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1}
>vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1}
>punpckldq\t{%2, %0|%0, %2}
>punpckldq\t{%2, %0|%0, %2}
>vpunpckldq\t{%2, %1, %0|%0, %1, %2}
>%vmovd\t{%1, %0|%0, %1}
>punpckldq\t{%2, %0|%0, %2}
>movd\t{%1, %0|%0, %1}"
> 
> but for some reason RA chooses alternative 2 (x<-x,rm) instead of
> alternative 6 (v<-Yv,Yv), although alternative 2 needs an extra reload from
> %xmm2 to %eax.

I dug into this a bit, and it looks like we miss something in the combine pass
and hence fail to get a pattern that can match alternative 6.  The combine pass
dump of the old GCC shows:
---
  REG_UNUSED flags:CC
insn_cost 4 for    10: r82:SI=xmm16:SI
  REG_DEAD xmm16:SI
insn_cost 4 for    11: r83:SI=xmm17:SI
  REG_DEAD xmm17:SI
insn_cost 4 for    12: r87:V2SI=vec_concat(r82:SI,r83:SI)
  REG_DEAD r83:SI
  REG_DEAD r82:SI
---

then we got:
---
Trying 10 -> 12:
   10: r82:SI=xmm16:SI
  REG_DEAD xmm16:SI
   12: r87:V2SI=vec_concat(r82:SI,r83:SI)
  REG_DEAD r83:SI
  REG_DEAD r82:SI
Successfully matched this instruction:
(set (reg:V2SI 87)
(vec_concat:V2SI (reg/v:SI 52 xmm16 [ a ])
(reg:SI 83 [ b.1_2 ])))
allowing combination of insns 10 and 12
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 10.
modifying insn i3    12: r87:V2SI=vec_concat(xmm16:SI,r83:SI)
  REG_DEAD xmm16:SI
  REG_DEAD r83:SI
deferring rescan insn with uid = 12.

Trying 11 -> 12:
   11: r83:SI=xmm17:SI
  REG_DEAD xmm17:SI
   12: r87:V2SI=vec_concat(xmm16:SI,r83:SI)
  REG_DEAD xmm16:SI
  REG_DEAD r83:SI
Successfully matched this instruction:
(set (reg:V2SI 87)
(vec_concat:V2SI (reg/v:SI 52 xmm16 [ a ])
(reg/v:SI 53 xmm17 [ b ])))
allowing combination of insns 11 and 12
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 11.
modifying insn i3    12: r87:V2SI=vec_concat(xmm16:SI,xmm17:SI)
  REG_DEAD xmm17:SI
  REG_DEAD xmm16:SI
deferring rescan insn with uid = 12.
---

There are two successful combine attempts, and we end up with a pattern that
can match alternative 6.
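
The "original costs 4 + 4 = 8" and "replacement cost 4" lines in the dump can
be read as a simple comparison.  A minimal sketch of that check, assuming the
rule is "accept when the replacement is not more expensive than the insns it
replaces" (the function name is hypothetical, and the real check in combine
handles more cases than this):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the decision printed in the combine dump:
   two insns with costs COST_I2 and COST_I3 are replaced by a single
   insn with cost REPLACEMENT_COST.  Not the actual combine code.  */
static bool
combination_is_profitable (int cost_i2, int cost_i3, int replacement_cost)
{
  assert (cost_i2 >= 0 && cost_i3 >= 0 && replacement_cost >= 0);

  /* Assumed rule: allow the combination unless it raises the cost.  */
  return replacement_cost <= cost_i2 + cost_i3;
}
```

With the numbers from the dump, combination_is_profitable (4, 4, 4) is true,
which corresponds to "allowing combination of insns 10 and 12".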

However, the dump from current GCC trunk shows:
---
insn_cost 4 for    19: r90:SI=xmm16:SI
  REG_DEAD xmm16:SI
insn_cost 4 for    10: r82:SI=r90:SI
  REG_DEAD r90:SI
insn_cost 4 for    20: r91:SI=xmm17:SI
  REG_DEAD xmm17:SI
insn_cost 4 for    11: r83:SI=r91:SI
  REG_DEAD r91:SI
insn_cost 4 for    12: r87:V2SI=vec_concat(r82:SI,r83:SI)
  REG_DEAD r83:SI
  REG_DEAD r82:SI
insn_cost 4 for    13: xmm3:V2SI=r87:V2SI
  REG_DEAD r87:V2SI
---
Trying 11 -> 12:
   11: r83:SI=r91:SI
  REG_DEAD r91:SI
   12: r87:V2SI=vec_concat(r90:SI,r83:SI)
  REG_DEAD r90:SI
  REG_DEAD r83:SI
Successfully matched this instruction:
(set (reg:V2SI 87)
(vec_concat:V2SI (reg:SI 90)
(reg:SI 91)))
allowing combination of insns 11 and 12
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 11.
modifying insn i3    12: r87:V2SI=vec_concat(r90:SI,r91:SI)
  REG_DEAD r91:SI
  REG_DEAD r90:SI
deferring rescan insn with uid = 12.
---

We end up with "12: r87:V2SI=vec_concat(r90:SI,r91:SI)".  Later, in the LRA
pass, operand r90 is replaced with an XMM register while r91 is kept in a
general register, so there is no chance to match the preferred alternative 6.
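
To make the alternative matching concrete, here is a simplified model in C.
It is only a sketch under stated assumptions: it covers just the three AVX
alternatives 2, 3 and 6 of *vec_concatv2si_sse4_1, lumps "x", "v" and "Yv"
together as "some SSE register", and all names in it are invented for this
example.  LRA itself picks alternatives by cost and is far more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Where an input operand lives.  Values are bit flags.  */
enum loc { LOC_GPR = 1, LOC_SSE = 2, LOC_MEM = 4 };

/* One alternative: which locations are accepted for operands 1 and 2.
   Simplified and hypothetical; not GCC's data structures.  */
struct alternative
{
  int number;
  unsigned op1_ok;
  unsigned op2_ok;
};

static const struct alternative avx_alternatives[] = {
  { 2, LOC_SSE, LOC_GPR | LOC_MEM },  /* x <- x, rm    (vpinsrd)     */
  { 3, LOC_SSE, LOC_GPR | LOC_MEM },  /* v <- Yv, rm   (vpinsrd)     */
  { 6, LOC_SSE, LOC_SSE },            /* v <- Yv, Yv   (vpunpckldq)  */
};

/* Return true if alternative ALT_NUMBER accepts both operands where
   they already are, i.e. without an extra reload.  Alternatives that
   are not in the table above never match.  */
static bool
matches_without_reload (int alt_number, enum loc op1, enum loc op2)
{
  size_t count = sizeof avx_alternatives / sizeof avx_alternatives[0];

  assert (op1 != 0 && op2 != 0);
  for (size_t i = 0; i < count; i++)
    {
      const struct alternative *a = &avx_alternatives[i];

      if (a->number == alt_number)
        return (a->op1_ok & (unsigned) op1) != 0
               && (a->op2_ok & (unsigned) op2) != 0;
    }
  return false;
}
```

In this model, with operand 1 in an SSE register and operand 2 in a general
register, matches_without_reload (6, LOC_SSE, LOC_GPR) is false while
alternative 2 matches.  With both operands in SSE registers, alternative 6
matches and alternative 2 does not, which is where the extra move from
%xmm2 to %eax in the generated code comes from.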

[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c

2018-11-13 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
Vlad, could you please have a look?

[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c

2018-11-12 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718

--- Comment #2 from Uroš Bizjak  ---
Following testcase:

--cut here--
typedef int V __attribute__((vector_size (8)));

void foo (int x, int y)
{
  register int a __asm ("xmm1");
  register int b __asm ("xmm2");
  register V c __asm ("xmm3");
  a = x;
  b = y;
  asm volatile ("" : "+v" (a), "+v" (b));
  c = (V) { a, b };
  asm volatile ("" : "+v" (c));
}
--cut here--

gets compiled with -O2 -mavx -mtune=intel:

vmovd   %edi, %xmm1
vmovd   %esi, %xmm2
vmovd   %xmm2, %eax
vpinsrd $1, %eax, %xmm1, %xmm3
ret

The relevant pattern is defined as:

(define_insn "*vec_concatv2si_sse4_1"
  [(set (match_operand:V2SI 0 "register_operand"
  "=Yr,*x, x, v,Yr,*x, v, v, *y,*y")
(vec_concat:V2SI
  (match_operand:SI 1 "nonimmediate_operand"
  "  0, 0, x,Yv, 0, 0,Yv,rm,  0,rm")
  (match_operand:SI 2 "nonimm_or_0_operand"
  " rm,rm,rm,rm,Yr,*x,Yv, C,*ym, C")))]
  "TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
  "@
   pinsrd\t{$1, %2, %0|%0, %2, 1}
   pinsrd\t{$1, %2, %0|%0, %2, 1}
   vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1}
   vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1}
   punpckldq\t{%2, %0|%0, %2}
   punpckldq\t{%2, %0|%0, %2}
   vpunpckldq\t{%2, %1, %0|%0, %1, %2}
   %vmovd\t{%1, %0|%0, %1}
   punpckldq\t{%2, %0|%0, %2}
   movd\t{%1, %0|%0, %1}"

but for some reason RA chooses alternative 2 (x<-x,rm) instead of alternative 6
(v<-Yv,Yv), although alternative 2 needs an extra reload from %xmm2 to %eax.

[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c

2018-11-12 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718

--- Comment #1 from Uroš Bizjak  ---
*** Bug 87717 has been marked as a duplicate of this bug. ***

[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c

2018-10-24 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization, ra
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-10-24
   Target Milestone|---                         |9.0
     Ever confirmed|0                           |1