[Bug target/50829] avx extra copy for _mm256_insertf128_pd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

Marc Glisse changed:

           What    |Removed    |Added
----------------------------------------
             Status|NEW        |RESOLVED
      Known to work|           |10.1.0
         Resolution|---        |FIXED
      Known to fail|           |9.3.0

--- Comment #14 from Marc Glisse ---
This was fixed (by Jakub I think).
[Bug target/50829] avx extra copy for _mm256_insertf128_pd
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #13 from Marc Glisse 2013-03-30 10:13:46 UTC ---
(In reply to comment #10)
> Created attachment 28846 [details]
> Use subreg

The patch was submitted at
http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00683.html
and rejected, see the discussion in that thread. We need a way to completely
remove the pattern.
[Bug target/50829] avx extra copy for _mm256_insertf128_pd
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #12 from H.J. Lu 2012-12-01 22:22:28 UTC ---
Also see PR 44551.
[Bug target/50829] avx extra copy for _mm256_insertf128_pd
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

H.J. Lu changed:

           What    |Removed    |Added
----------------------------------------
                 CC|           |areg.melikadamyan at gmail
                   |           |dot com, hjl.tools at gmail
                   |           |dot com

--- Comment #11 from H.J. Lu 2012-12-01 20:26:32 UTC ---
(In reply to comment #10)
> Created attachment 28846 [details]
> Use subreg
>
> Hmm, I don't understand why we use UNSPEC_CAST. I tried the attached patch to
> use a subreg instead. It passed bootstrap+testsuite and generates optimal code
> for the testcase of this PR.
>
> So, any idea what advantage the unspec has over a subreg? And if none, what is
> the best way to use a subreg?

subreg didn't work before.
[Bug target/50829] avx extra copy for _mm256_insertf128_pd
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #10 from Marc Glisse 2012-12-01 19:50:28 UTC ---
Created attachment 28846
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28846
Use subreg

Hmm, I don't understand why we use UNSPEC_CAST. I tried the attached patch to
use a subreg instead. It passed bootstrap+testsuite and generates optimal code
for the testcase of this PR.

So, any idea what advantage the unspec has over a subreg? And if none, what is
the best way to use a subreg?
[Bug target/50829] avx extra copy for _mm256_insertf128_pd
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

Marc Glisse changed:

           What    |Removed    |Added
----------------------------------------
                 CC|           |glisse at gcc dot gnu.org

--- Comment #9 from Marc Glisse 2012-12-01 16:30:23 UTC ---
The code with intrinsics still has the extra move, but note that this version
without intrinsics generates perfect code:

    #include <immintrin.h>
    __m256d concat(__m128d x){
      __m256d z={x[0],x[1],x[0],x[1]};
      return z;
    }
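
For readers who want to try the intrinsic-free form from comment #9 without the AVX headers, here is a minimal sketch using the GCC/Clang generic vector extension. The typedef names v2df and v4df are hypothetical stand-ins for __m128d and __m256d, and this assumes a compiler that supports __attribute__((vector_size)). It only illustrates the element-wise construction; whether the extra move appears still depends on the GCC version and on building with -mavx.

```c
/* Hypothetical typedef names standing in for __m128d / __m256d;
   they rely on the GCC/Clang generic vector extension. */
typedef double v2df __attribute__((vector_size(16)));
typedef double v4df __attribute__((vector_size(32)));

/* Copy the 128-bit input into both halves of a 256-bit vector,
   element by element, as in comment #9. */
static v4df concat(v2df x)
{
    v4df z = { x[0], x[1], x[0], x[1] };
    return z;
}
```

Compiling this with -O2 -mavx -S and inspecting the assembly for concat is one way to compare it against the intrinsics version on a given compiler.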
[Bug target/50829] avx extra copy for _mm256_insertf128_pd
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #8 from Andrew Pinski 2011-11-24 03:47:06 UTC ---
I should note that I have used that trick (and the same regrename.c patch) on
two different targets (PowerPC and MIPS) while at two different companies.
[Bug target/50829] avx extra copy for _mm256_insertf128_pd
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #7 from Vladimir Makarov 2011-11-24 03:45:24 UTC ---
As for stack allocation: crtl->stack_realign_needed == 1 results in
frame_pointer_needed := 1 in ira.c::ira_setup_eliminable_regset. I don't
remember the origin of the code. Probably it is from HJ's stack aligning work.
Sorry if I am wrong.

I guess we should re-evaluate frame_pointer_needed at the end of RA if we
don't allocate any memory during RA.
[Bug target/50829] avx extra copy for _mm256_insertf128_pd
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

Andrew Pinski changed:

           What    |Removed            |Added
----------------------------------------------
          Component|rtl-optimization   |target

--- Comment #6 from Andrew Pinski 2011-11-24 03:43:46 UTC ---
The move that is generated in the split should be in the bigger mode. And then
apply the following patch:

* regrename.c (maybe_mode_change): Return the reg in the new mode
if the copy was done in the same mode size.

Index: regrename.c
===
--- regrename.c (revision 55954)
+++ regrename.c (revision 55955)
@@ -1322,6 +1322,15 @@ maybe_mode_change (enum machine_mode ori
                    enum machine_mode new_mode, unsigned int regno,
                    unsigned int copy_regno ATTRIBUTE_UNUSED)
 {
+  /* If we are using the register in the copy mode (if the number of hard
+     registers is the same), just used the reg with the new mode.  */
+  if (GET_MODE_SIZE (copy_mode) == GET_MODE_SIZE (new_mode)
+      && hard_regno_nregs[copy_regno][copy_mode] ==
+         hard_regno_nregs[copy_regno][new_mode]
+      && hard_regno_nregs[regno][copy_mode] ==
+         hard_regno_nregs[copy_regno][new_mode])
+    return gen_rtx_raw_REG (new_mode, regno);
+
   if (GET_MODE_SIZE (copy_mode) < GET_MODE_SIZE (orig_mode)
       && GET_MODE_SIZE (copy_mode) < GET_MODE_SIZE (new_mode))
     return NULL_RTX;

This will at least remove it with -frename-registers which we most likely
should have on by default at -O2 and above now anyways.
[Bug target/50829] avx extra copy for _mm256_insertf128_pd
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #3 from Marc Glisse 2011-10-23 08:20:50 UTC ---
(In reply to comment #1)
> This looks similar to PR 34283, a RA problem.

PR 48037 too. I didn't find all of those before reporting because I was
looking for something AVX-specific, sorry.
[Bug target/50829] avx extra copy for _mm256_insertf128_pd
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #2 from Marc Glisse 2011-10-22 17:01:51 UTC ---
(In reply to comment #1)
> This looks similar to PR 34283, a RA problem.

Ah, indeed. I also had a function that ended with:

    vmovapd    %xmm1, %xmm0
    vmaxpd     %xmm2, %xmm0, %xmm0
    vzeroupper
    ret

which looks related too.

Thanks for the link.
[Bug target/50829] avx extra copy for _mm256_insertf128_pd
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

Uros Bizjak changed:

           What    |Removed    |Added
----------------------------------------
           Keywords|           |ra

--- Comment #1 from Uros Bizjak 2011-10-22 16:10:12 UTC ---
This looks similar to PR 34283, a RA problem.