[Bug rtl-optimization/59511] [4.9 Regression] FAIL: gcc.target/i386/pr36222-1.c scan-assembler-not movdqa with -mtune=corei7

2016-06-02 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59511

--- Comment #7 from Peter Cordes <peter at cordes dot ca> ---
I'm seeing the same symptom, affecting gcc4.9 through 5.3.  Not present in 6.1.

IDK if the cause is the same.

(code from an improvement to the horizontal_add functions in Agner Fog's vector
class library)

#include <immintrin.h>
int hsum16_gccmovdqa (__m128i const a) {
    __m128i lo    = _mm_cvtepi16_epi32(a);      // sign-extended a0, a1, a2, a3
    __m128i hi    = _mm_unpackhi_epi64(a,a);    // gcc4.9 through 5.3 wastes a movdqa on this
    hi            = _mm_cvtepi16_epi32(hi);
    __m128i sum1  = _mm_add_epi32(lo,hi);       // add sign-extended upper / lower halves
    //return horizontal_add(sum1);  // manually inlined.
    // Shortening the code below can avoid the movdqa
    __m128i shuf  = _mm_shuffle_epi32(sum1, 0xEE);
    __m128i sum2  = _mm_add_epi32(shuf,sum1);   // 2 sums
    shuf          = _mm_shufflelo_epi16(sum2, 0xEE);
    __m128i sum4  = _mm_add_epi32(shuf,sum2);
    return  _mm_cvtsi128_si32(sum4);            // 32 bit sum
}
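
For anyone checking what hsum16_gccmovdqa is supposed to return, here is a rough scalar sketch of the same reduction. It is not part of the original report: the name hsum16_scalar_model and the array-of-lanes argument are made up for illustration, and it assumes v[0..7] holds the eight 16-bit lanes of `a` in order, lowest lane first.

```c
#include <stdint.h>

/* Hypothetical scalar model of the intrinsic sequence (illustration only).
   v[0..7] are assumed to be the eight int16 lanes of `a`, lowest first. */
static int32_t hsum16_scalar_model(const int16_t v[8]) {
    int32_t sum1[4];
    for (int i = 0; i < 4; i++) {
        int32_t lo = v[i];      /* _mm_cvtepi16_epi32 on the low half         */
        int32_t hi = v[i + 4];  /* _mm_unpackhi_epi64, then sign-extend again */
        sum1[i] = lo + hi;      /* _mm_add_epi32(lo, hi)                      */
    }
    /* _mm_shuffle_epi32(sum1, 0xEE) moves lanes 2,3 down onto lanes 0,1 */
    int32_t sum2_0 = sum1[0] + sum1[2];
    int32_t sum2_1 = sum1[1] + sum1[3];
    /* _mm_shufflelo_epi16(sum2, 0xEE) moves 32-bit lane 1 down onto lane 0 */
    return sum2_0 + sum2_1;     /* value read by _mm_cvtsi128_si32 */
}
```

Comparing this against the intrinsic version on a few inputs is a quick way to confirm that only the register allocation differs between compiler versions, not the result.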

gcc4.9 through gcc5.3 output (-O3 -mtune=generic -msse4.1):

        movdqa      %xmm0, %xmm1
        pmovsxwd    %xmm0, %xmm2
        punpckhqdq  %xmm0, %xmm1
        pmovsxwd    %xmm1, %xmm0
        paddd       %xmm2, %xmm0
...

gcc6.1 output:

        pmovsxwd    %xmm0, %xmm1
        punpckhqdq  %xmm0, %xmm0
        pmovsxwd    %xmm0, %xmm0
        paddd       %xmm0, %xmm1
...



In a more complicated case, with this code either inlined or not, there's
actually a difference between gcc 4.9 and 5.x: gcc5 emits the extra movdqa in
more cases.
See my attachment, copied from https://godbolt.org/g/e8iQsj

2016-06-02 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59511

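Peter Cordes <peter at cordes dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca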

--- Comment #6 from Peter Cordes <peter at cordes dot ca> ---
Created attachment 38629
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38629&action=edit
extra-movdqa-with-gcc5-not-4.9.cpp

2014-01-15 Thread vmakarov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59511

--- Comment #4 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
Author: vmakarov
Date: Wed Jan 15 17:32:47 2014
New Revision: 206636

URL: http://gcc.gnu.org/viewcvs?rev=206636&root=gcc&view=rev
Log:
2014-01-15  Vladimir Makarov  vmaka...@redhat.com

	PR rtl-optimization/59511
	* ira.c (ira_init_register_move_cost): Use memory costs for some
	cases of register move cost calculations.
	* lra-constraints.c (lra_constraints): Use REG_FREQ_FROM_BB
	instead of BB frequency.
	* lra-coalesce.c (move_freq_compare_func, lra_coalesce): Ditto.
	* lra-assigns.c (find_hard_regno_for): Ditto.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/ira.c
trunk/gcc/lra-assigns.c
trunk/gcc/lra-coalesce.c
trunk/gcc/lra-constraints.c


2014-01-15 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59511

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed, thanks.


2013-12-19 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59511

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1


2013-12-17 Thread vmakarov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59511

--- Comment #3 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #2)
> One movdqa started appearing with r204212, the second movdqa started
> appearing with r204752.  Vlad, can you please have a look?

It seems the changes triggered a bug in the register move cost calculations.  I
have a patch to fix it, but I need more time to check its effect on
performance.  So the fix will be ready at the end of the week if everything is OK.


2013-12-16 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59511

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
One movdqa started appearing with r204212, the second movdqa started appearing
with r204752.  Vlad, can you please have a look?


2013-12-15 Thread ubizjak at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59511

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra
             Target|                            |x86_64-pc-linux-gnu
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-12-15
                 CC|                            |vmakarov at gcc dot gnu.org
   Target Milestone|---                         |4.9.0
     Ever confirmed|0                           |1

--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
Confirmed as RA regression.