[Bug middle-end/92455] Unnecessary memory read in a loop

2022-01-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug middle-end/92455] Unnecessary memory read in a loop

2021-07-28 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|11.2|---

[Bug middle-end/92455] Unnecessary memory read in a loop

2021-04-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|11.0|11.2

--- Comment #8 from Jakub Jelinek  ---
GCC 11.1 has been released, retargeting bugs to GCC 11.2.

[Bug middle-end/92455] Unnecessary memory read in a loop

2020-05-30 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

Martin Liška  changed:

   What|Removed |Added

 CC||marxin at gcc dot gnu.org

--- Comment #7 from Martin Liška  ---
Created attachment 48644
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48644&action=edit
Complete LNT results

Here are the complete LNT results; nothing has improved significantly.

[Bug middle-end/92455] Unnecessary memory read in a loop

2020-05-29 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

Martin Liška  changed:

   What|Removed |Added

     Status|ASSIGNED                  |NEW
   Assignee|marxin at gcc dot gnu.org |unassigned at gcc dot gnu.org

--- Comment #6 from Martin Liška  ---
I've just run SPEC2006 and SPEC2017 on various machines and I haven't found any
speed improvement.
There are 2 noticeable regressions:

znver2 -Ofast: SPEC/SPEC2017/FP/507.cactuBSSN_r - +5.62%
znver2 -Ofast PGO: SPEC/SPEC2017/INT/520.omnetpp_r - +2.42%

[Bug middle-end/92455] Unnecessary memory read in a loop

2020-02-06 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

Martin Liška  changed:

   What|Removed |Added

   Target Milestone|--- |11.0

[Bug middle-end/92455] Unnecessary memory read in a loop

2020-01-30 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

Martin Liška  changed:

   What|Removed |Added

         Status|UNCONFIRMED                   |ASSIGNED
Last reconfirmed|                              |2020-01-30
             CC|                              |marxin at gcc dot gnu.org
       Assignee|unassigned at gcc dot gnu.org |marxin at gcc dot gnu.org
 Ever confirmed|0                             |1

--- Comment #5 from Martin Liška  ---
I'll measure impact of the option on SPEC benchmarks.

[Bug middle-end/92455] Unnecessary memory read in a loop

2019-11-11 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

--- Comment #4 from Antony Polukhin  ---
(In reply to Richard Biener from comment #3)
> But maybe
> you can provide benchmark data (including compile-time/memory-use figures)?

OK. Is there any GCC specific tool or flag for that?

[Bug middle-end/92455] Unnecessary memory read in a loop

2019-11-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

--- Comment #3 from Richard Biener  ---
(In reply to Antony Polukhin from comment #2)
> Can the -ftree-partial-pre flag be enabled by default for -O2?

It used to be quite slow in its dataflow compute but that has improved.
It's still quadratic in size though and its scope is extremely limited
(partial antic but fully available).  So I don't think so.  But maybe
you can provide benchmark data (including compile-time/memory-use figures)?

[Bug middle-end/92455] Unnecessary memory read in a loop

2019-11-11 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

--- Comment #2 from Antony Polukhin  ---
Can the -ftree-partial-pre flag be enabled by default for -O2?

[Bug middle-end/92455] Unnecessary memory read in a loop

2019-11-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

--- Comment #1 from Richard Biener  ---
You need partial-PRE to perform the desired transform.  With -O3 or -O2
-ftree-partial-pre we do what you suggest (plus also cache *max->ptr in
exchange for another IV):

f1:
.LFB0:
	.cfi_startproc
	movq	(%rdi), %rax
	leaq	40(%rdi), %rcx
	movq	%rdi, %rsi
	movl	(%rax), %edx
.L3:
	movq	8(%rdi), %rax
	addq	$8, %rdi
	movl	(%rax), %eax
	cmpl	%edx, %eax
	jle	.L2
	movl	%eax, %edx
	movq	%rdi, %rsi
.L2:
	cmpq	%rdi, %rcx
	jne	.L3
	movq	(%rsi), %rax
	ret

because of the two conditional values (*max and *max->ptr_) the cmov
transform doesn't trigger.
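
For readers without the original testcase at hand, the following is a rough C sketch of what the assembly above corresponds to. It is a reconstruction from the listing, not the reporter's code: the names `struct wrapper`, `f1_naive` and `f1_cached`, and the element count of 6 (inferred from `leaq 40(%rdi)`, i.e. five 8-byte steps past the first element) are all assumptions. `f1_naive` re-reads `*max->ptr` on every iteration; `f1_cached` is the shape partial-PRE produces, keeping that value in a local.

```c
/* Hypothetical reconstruction; all names and N are assumptions. */
struct wrapper { int *ptr; };

enum { N = 6 };  /* assumed from "leaq 40(%rdi)" with 8-byte elements */

/* Straightforward form: *max->ptr is loaded from memory each iteration. */
int *f1_naive(struct wrapper *p)
{
    struct wrapper *max = p;
    for (int i = 1; i < N; ++i)
        if (*p[i].ptr > *max->ptr)
            max = &p[i];
    return max->ptr;
}

/* Form after partial-PRE: the current maximum value lives in a local
   (%edx in the listing), so only *p[i].ptr is loaded in the loop. */
int *f1_cached(struct wrapper *p)
{
    struct wrapper *max = p;
    int max_val = *p->ptr;          /* loaded once before the loop */
    for (int i = 1; i < N; ++i) {
        int cur = *p[i].ptr;
        if (cur > max_val) {        /* two values change together here, */
            max_val = cur;          /* the cached value ...             */
            max = &p[i];            /* ... and the pointer to its slot  */
        }
    }
    return max->ptr;
}
```

Both functions return the same pointer for the same input; the second just makes the two conditionally updated values (the pointer and the cached value) explicit, which is the situation described above as blocking the cmov transform.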