[Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt

2021-09-10 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |8.0

[Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt

2021-07-30 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858

--- Comment #8 from Andrew Pinski  ---
(In reply to Richard Biener from comment #7)
> It was fixed by adding another loop header copying pass before
> vectorization, aka ch_vect. 

But that went in way back in GCC 6 (r6-1951), yet the loop header copying was
not happening until GCC 8.

[Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt

2021-07-30 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |RESOLVED
             Blocks|                            |53947
         Resolution|---                         |FIXED

--- Comment #7 from Richard Biener  ---
It was fixed by adding another loop header copying pass before vectorization,
aka ch_vect.  Of course it means we peel one iteration which might not be 100%
optimal.  Optimally we'd teach PRE that those loop carried dependences are
bad(TM) just like we do for loads and extend that to cover calls.  The peeling
means we need an epilogue, so we didn't really save a sqrt call.

That said, the situation is somewhat mitigated now and I'd declare it fixed
anyway, the testcase is somewhat artificial (resolvable at compile time).


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

[Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt

2021-07-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #6 from Andrew Pinski  ---
So this was fixed in GCC 8 but I cannot tell by what.  ch_vect has been there
since 2014 which should have done the copying of the header but did not until
GCC 8.  There is not enough debug output to tell what changed either.

[Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt

2013-07-10 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858

--- Comment #5 from vincenzo Innocente  ---
I remember something similar in the past:
--param max-completely-peel-times=1
sort of fixes it… (why does PRE not recognize that 1/(1+0) == 1, btw?)

of course it is just a benchmark (and I can modify it to avoid the loop
peeling),
still

[Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt

2013-07-09 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858

--- Comment #4 from Jakub Jelinek  ---
Actually, it isn't vectorized at all, because PRE attempts to be smart, figures
out that for the first iteration of the loop it can avoid computing the sqrt
because the result will be one, and thus moves the sqrt call into the latch,
but we can't vectorize any loops that have non-empty latches.
So, either the vectorizer would need to undo this transformation, or PRE not do
it at all, or arrange for it to be done only after vectorization.  Richard,
any thoughts on this?
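
To make the transformation described above concrete, here is a rough sketch in C++. The original avx2sqrt.cc is not shown in this thread, so sqr_source() is only a guess at the loop, based on the constants in the scalar assembly from comment #2; both function names are hypothetical. sqr_after_pre() mimics the control flow of that assembly: the first value is the constant 1.0 and the sqrt sits on the back edge, which is the non-empty latch that blocks vectorization.

```cpp
#include <cmath>

// Assumed shape of the source loop (hypothetical reconstruction).
double sqr_source() {
  double s = 0.0;
  for (int i = 0; i != 1024; ++i)
    s += std::sqrt(static_cast<double>(i + 1));  // whole body before the back edge
  return s;
}

// Hand-written equivalent of the loop shape seen in the scalar assembly:
// sqrt(1 + 0) is replaced by the constant 1.0 for the first iteration and
// the sqrt for every later iteration is computed on the back edge.
double sqr_after_pre() {
  double s = 0.0;
  double v = 1.0;  // value for the first iteration, known at compile time
  int n = 1;
  for (;;) {
    ++n;
    s += v;                                  // corresponds to label .L6
    if (n == 1025)
      break;
    v = std::sqrt(static_cast<double>(n));   // corresponds to label .L7 (latch)
  }
  return s;
}
```

Both functions add the same 1024 values in the same order, so they should return the same sum; only the position of the sqrt relative to the loop back edge differs.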


[Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt

2013-07-09 Thread glisse at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858

--- Comment #3 from Marc Glisse  ---
-fno-tree-pre lets it vectorize sqr as well. PRE creates a jump to the middle
of the loop body, which is nice but prevents vectorization.
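
For experimenting without changing the flags of the whole translation unit, GCC's optimize function attribute can request the equivalent of -fno-tree-pre for a single function. The sketch below is an assumption, not something tested in this thread: the function name and loop body are hypothetical, GCC documents the attribute as a debugging aid, and whether it has the same effect on vectorization as the command-line flag should be checked in the vectorizer dump for the GCC version at hand.

```cpp
#include <cmath>

// Apply the attribute only when compiling with GCC; other compilers get a
// plain function. NO_TREE_PRE is a hypothetical helper macro.
#if defined(__GNUC__) && !defined(__clang__)
#define NO_TREE_PRE __attribute__((optimize("no-tree-pre")))
#else
#define NO_TREE_PRE
#endif

// Assumed shape of the benchmark loop, with PRE disabled for this function.
NO_TREE_PRE double sqr_no_pre() {
  double s = 0.0;
  for (int i = 0; i != 1024; ++i)
    s += std::sqrt(static_cast<double>(i + 1));
  return s;
}
```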


[Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt

2013-07-09 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858

--- Comment #2 from vincenzo Innocente  ---
Actually, the code for div and sqr already differs with standard SSE:
c++ -std=c++11 -Ofast -S avx2sqrt.cc -ftree-vectorizer-verbose=1 -Wall ; cat avx2sqrt.s

.L2:
	movdqa	%xmm0, %xmm1
	addl	$1, %eax
	movdqa	%xmm0, %xmm4
	cmpl	$256, %eax
	paddd	%xmm5, %xmm1
	pshufd	$238, %xmm1, %xmm0
	cvtdq2pd	%xmm1, %xmm1
	movapd	%xmm3, %xmm7
	paddd	%xmm6, %xmm4
	cvtdq2pd	%xmm0, %xmm0
	divpd	%xmm0, %xmm7
	movapd	%xmm7, %xmm0
	movapd	%xmm3, %xmm7
	divpd	%xmm1, %xmm7
	addpd	%xmm7, %xmm0
	addpd	%xmm0, %xmm2
	jne	.L3
	movapd	%xmm2, -24(%rsp)
	movsd	-16(%rsp), %xmm0
	addsd	%xmm2, %xmm0
	ret
	.cfi_endproc
.LFE3:
	.size	_Z3divv, .-_Z3divv
	.p2align 4,,15
	.globl	_Z3sqrv
	.type	_Z3sqrv, @function
_Z3sqrv:
.LFB4:
	.cfi_startproc
	movl	$1, %eax
	movsd	.LC4(%rip), %xmm1
	xorpd	%xmm0, %xmm0
	jmp	.L6
	.p2align 4,,10
	.p2align 3
.L7:
	cvtsi2sd	%eax, %xmm1
	sqrtsd	%xmm1, %xmm1
.L6:
	addl	$1, %eax
	addsd	%xmm1, %xmm0
	cmpl	$1025, %eax
	jne	.L7
	rep; ret
	.cfi_endproc


[Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt

2013-07-09 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
I'll look at this.