[Bug target/50829] avx extra copy for _mm256_insertf128_pd

2020-08-10 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

Marc Glisse  changed:

           What|Removed|Added
         Status|NEW    |RESOLVED
  Known to work|       |10.1.0
     Resolution|---    |FIXED
  Known to fail|       |9.3.0

--- Comment #14 from Marc Glisse  ---
This was fixed (by Jakub I think).

[Bug target/50829] avx extra copy for _mm256_insertf128_pd

2013-03-30 Thread glisse at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #13 from Marc Glisse  2013-03-30 10:13:46 UTC ---
(In reply to comment #10)
> Created attachment 28846 [details]
> Use subreg

The patch was submitted at
http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00683.html and rejected, see the
discussion in that thread. We need a way to completely remove the pattern.


[Bug target/50829] avx extra copy for _mm256_insertf128_pd

2012-12-01 Thread hjl.tools at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #12 from H.J. Lu  2012-12-01 22:22:28 UTC ---
Also see PR 44551.


[Bug target/50829] avx extra copy for _mm256_insertf128_pd

2012-12-01 Thread hjl.tools at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

H.J. Lu  changed:

           What|Removed|Added
             CC|       |areg.melikadamyan at gmail dot com,
               |       |hjl.tools at gmail dot com

--- Comment #11 from H.J. Lu  2012-12-01 20:26:32 UTC ---
(In reply to comment #10)
> Created attachment 28846 [details]
> Use subreg
>
> Hmm, I don't understand why we use UNSPEC_CAST. I tried the attached patch to
> use a subreg instead. It passed bootstrap+testsuite and generates optimal code
> for the testcase of this PR.
>
> So, any idea what advantage the unspec has over a subreg? And if none, what is
> the best way to use a subreg?

subreg didn't work before.


[Bug target/50829] avx extra copy for _mm256_insertf128_pd

2012-12-01 Thread glisse at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #10 from Marc Glisse  2012-12-01 19:50:28 UTC ---
Created attachment 28846
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28846
Use subreg

Hmm, I don't understand why we use UNSPEC_CAST. I tried the attached patch to
use a subreg instead. It passed bootstrap+testsuite and generates optimal code
for the testcase of this PR.

So, any idea what advantage the unspec has over a subreg? And if none, what is
the best way to use a subreg?


[Bug target/50829] avx extra copy for _mm256_insertf128_pd

2012-12-01 Thread glisse at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

Marc Glisse  changed:

           What|Removed|Added
             CC|       |glisse at gcc dot gnu.org

--- Comment #9 from Marc Glisse  2012-12-01 16:30:23 UTC ---
The code with intrinsics still has the extra move, but note that this version
without intrinsics generates perfect code:

#include <immintrin.h>

/* Build {x[0], x[1], x[0], x[1]} using GCC vector element subscripting.  */
__m256d concat(__m128d x){
  __m256d z={x[0],x[1],x[0],x[1]};
  return z;
}
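For comparison, below is a sketch of an intrinsics formulation that builds the same value. It rests on an assumption: the PR's original testcase is not quoted in this part of the thread, so concat_intrin, concat_vec and concat_both are hypothetical names. _mm256_castpd128_pd256 normally generates no instruction and leaves the upper 128 bits undefined. _mm256_insertf128_pd then fills the upper lane with x. The target("avx") attribute lets the file build without -mavx, but concat_both must still run only on a CPU with AVX.

```c
#include <immintrin.h>

/* Hypothetical intrinsics version: reinterpret x as the low half of a
   256-bit vector (upper half undefined), then insert x as the upper half. */
__attribute__((target("avx")))
__m256d concat_intrin(__m128d x)
{
    return _mm256_insertf128_pd(_mm256_castpd128_pd256(x), x, 1);
}

/* Vector-extension version from comment #9. */
__attribute__((target("avx")))
__m256d concat_vec(__m128d x)
{
    __m256d z = {x[0], x[1], x[0], x[1]};
    return z;
}

/* Hypothetical helper: runs both versions and stores the results to memory,
   so callers never pass 256-bit vectors across functions compiled for
   different targets. */
__attribute__((target("avx")))
void concat_both(const double in[2], double out_intrin[4], double out_vec[4])
{
    __m128d x = _mm_loadu_pd(in);
    _mm256_storeu_pd(out_intrin, concat_intrin(x));
    _mm256_storeu_pd(out_vec, concat_vec(x));
}
```

Both functions should produce {in[0], in[1], in[0], in[1]}. On an affected release, you could compare the assembly of concat_intrin and concat_vec (for example, with gcc -O2 -mavx -S) to look for the extra vmovapd reported in this PR.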


[Bug target/50829] avx extra copy for _mm256_insertf128_pd

2011-11-23 Thread pinskia at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #8 from Andrew Pinski  2011-11-24 03:47:06 UTC ---
I should note that I have used that trick (and the same regrename.c patch) on
two different targets (PowerPC and MIPS) while at two different companies.


[Bug target/50829] avx extra copy for _mm256_insertf128_pd

2011-11-23 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #7 from Vladimir Makarov  2011-11-24 03:45:24 UTC ---
As for stack allocation: crtl->stack_realign_needed == 1 results in
frame_pointer_needed := 1 in ira.c::ira_setup_eliminable_regset. I don't
remember the origin of the code. Probably it is from HJ's stack alignment
work. Sorry if I am wrong.

I guess we should re-evaluate frame_pointer_needed at the end of RA if no
memory was allocated during RA at all.


[Bug target/50829] avx extra copy for _mm256_insertf128_pd

2011-11-23 Thread pinskia at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

Andrew Pinski  changed:

           What|Removed         |Added
      Component|rtl-optimization|target

--- Comment #6 from Andrew Pinski  2011-11-24 03:43:46 UTC ---
The move that is generated in the split should be in the bigger mode. And then
apply the following patch:

        * regrename.c (maybe_mode_change): Return the reg in the
        new mode if the copy was done in the same mode size.
Index: regrename.c
===================================================================
--- regrename.c (revision 55954)
+++ regrename.c (revision 55955)
@@ -1322,6 +1322,15 @@ maybe_mode_change (enum machine_mode ori
                    enum machine_mode new_mode, unsigned int regno,
                    unsigned int copy_regno ATTRIBUTE_UNUSED)
 {
+  /*  If we are using the register in the copy mode (if the number of hard
+      registers is the same), just use the reg with the new mode.  */
+  if (GET_MODE_SIZE (copy_mode) == GET_MODE_SIZE (new_mode)
+      && hard_regno_nregs[copy_regno][copy_mode] ==
+         hard_regno_nregs[copy_regno][new_mode]
+      && hard_regno_nregs[regno][copy_mode] ==
+         hard_regno_nregs[copy_regno][new_mode])
+    return gen_rtx_raw_REG (new_mode, regno);
+
   if (GET_MODE_SIZE (copy_mode) < GET_MODE_SIZE (orig_mode)
       && GET_MODE_SIZE (copy_mode) < GET_MODE_SIZE (new_mode))
     return NULL_RTX;

This will at least remove it with -frename-registers, which we most likely
should have on by default at -O2 and above now anyway.
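The early return added by this patch can be read as a predicate over mode sizes and hard-register counts. The sketch below is a toy model outside GCC, not GCC code. toy_mode, toy_mode_size, toy_nregs and can_reuse_reg_in_new_mode are hypothetical stand-ins for machine modes, GET_MODE_SIZE and hard_regno_nregs. The table assumes every register is a 256-bit vector register that holds each toy mode in one hard register. The three comparisons are reproduced exactly as written in the patch.

```c
#include <stdbool.h>

/* Hypothetical toy modes, standing in for V2DFmode, V4DFmode and DFmode. */
enum toy_mode { TOY_V2DF, TOY_V4DF, TOY_DF, TOY_NUM_MODES };

/* Toy GET_MODE_SIZE: size in bytes of each mode. */
static const unsigned toy_mode_size[TOY_NUM_MODES] = { 16, 32, 8 };

/* Toy hard_regno_nregs[regno][mode]: registers 0..3 are assumed to be
   256-bit vector registers, so each toy mode fits in one register. */
static const unsigned toy_nregs[4][TOY_NUM_MODES] = {
    { 1, 1, 1 }, { 1, 1, 1 }, { 1, 1, 1 }, { 1, 1, 1 }
};

/* Mirrors the condition added to maybe_mode_change: when it holds, the
   patched code returns REGNO in NEW_MODE instead of falling through to the
   existing size checks. */
bool can_reuse_reg_in_new_mode(enum toy_mode copy_mode, enum toy_mode new_mode,
                               unsigned regno, unsigned copy_regno)
{
    return toy_mode_size[copy_mode] == toy_mode_size[new_mode]
           && toy_nregs[copy_regno][copy_mode] == toy_nregs[copy_regno][new_mode]
           && toy_nregs[regno][copy_mode] == toy_nregs[copy_regno][new_mode];
}
```

With these toy values, a copy done in the 256-bit mode passes the check when the new mode is also 256-bit. A 128-bit copy fails on the size comparison and falls through to the existing checks. This matches the suggestion that the move generated in the split be done in the bigger mode.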


[Bug target/50829] avx extra copy for _mm256_insertf128_pd

2011-10-23 Thread marc.glisse at normalesup dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #3 from Marc Glisse  2011-10-23 08:20:50 UTC ---
(In reply to comment #1)
> This looks similar to PR 34283, a RA problem.

PR 48037 too. I didn't find all of those before reporting because I was looking
for something AVX-specific, sorry.


[Bug target/50829] avx extra copy for _mm256_insertf128_pd

2011-10-22 Thread marc.glisse at normalesup dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #2 from Marc Glisse  2011-10-22 17:01:51 UTC ---
(In reply to comment #1)
> This looks similar to PR 34283, a RA problem.

Ah, indeed. I also had a function that ended with:
    vmovapd %xmm1, %xmm0
    vmaxpd  %xmm2, %xmm0, %xmm0
    vzeroupper
    ret

which looks related too. Thanks for the link.
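The copy in that epilogue is redundant because AVX's three-operand (VEX) encoding lets vmaxpd write a destination that differs from both sources. A single vmaxpd %xmm2, %xmm1, %xmm0 would therefore suffice. The sketch below shows a hypothetical function of that shape; it is not the function from this comment, which is not shown. Under the x86-64 System V ABI, its three __m128d arguments arrive in %xmm0, %xmm1 and %xmm2, and the result is returned in %xmm0.

```c
#include <immintrin.h>

/* Hypothetical example: the result must end up in %xmm0, but the operands
   arrive in %xmm1 (a) and %xmm2 (b). With AVX enabled, the ideal code is a
   single vmaxpd %xmm2, %xmm1, %xmm0 with no preceding vmovapd. */
__m128d max_of_last_two(__m128d unused, __m128d a, __m128d b)
{
    (void)unused; /* occupies %xmm0 only to force the register layout */
    return _mm_max_pd(a, b);
}
```

The function only needs SSE2, which is baseline on x86-64. Building it with -O2 -mavx and inspecting the output with -S is one way to check whether a given compiler emits the extra copy.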


[Bug target/50829] avx extra copy for _mm256_insertf128_pd

2011-10-22 Thread ubizjak at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

Uros Bizjak  changed:

           What|Removed|Added
       Keywords|       |ra

--- Comment #1 from Uros Bizjak  2011-10-22 16:10:12 UTC ---
This looks similar to PR 34283, a RA problem.