[Bug middle-end/92455] Unnecessary memory read in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
[Bug middle-end/92455] Unnecessary memory read in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.2                        |---
[Bug middle-end/92455] Unnecessary memory read in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.0                        |11.2

--- Comment #8 from Jakub Jelinek ---
GCC 11.1 has been released, retargeting bugs to GCC 11.2.
[Bug middle-end/92455] Unnecessary memory read in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #7 from Martin Liška ---
Created attachment 48644
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48644&action=edit
Complete LNT results

There are complete LNT results, nothing has improved rapidly.
[Bug middle-end/92455] Unnecessary memory read in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
           Assignee|marxin at gcc dot gnu.org   |unassigned at gcc dot gnu.org

--- Comment #6 from Martin Liška ---
I've just run SPEC2006 and SPEC2017 on various machines and I haven't found
any speed improvement. There are 2 noticeable regressions:

znver2 -Ofast:     SPEC/SPEC2017/FP/507.cactuBSSN_r - +5.62%
znver2 -Ofast PGO: SPEC/SPEC2017/INT/520.omnetpp_r - +2.42%
[Bug middle-end/92455] Unnecessary memory read in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.0
[Bug middle-end/92455] Unnecessary memory read in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

Martin Liška changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2020-01-30
                 CC|                              |marxin at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org |marxin at gcc dot gnu.org
     Ever confirmed|0                             |1

--- Comment #5 from Martin Liška ---
I'll measure impact of the option on SPEC benchmarks.
[Bug middle-end/92455] Unnecessary memory read in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

--- Comment #4 from Antony Polukhin ---
(In reply to Richard Biener from comment #3)
> But maybe
> you can provide benchmark data (including compile-time/memory-use figures)?

OK. Is there any GCC specific tool or flag for that?
[Bug middle-end/92455] Unnecessary memory read in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

--- Comment #3 from Richard Biener ---
(In reply to Antony Polukhin from comment #2)
> Can the -ftree-partial-pre flag be enabled by default for -O2?

It used to be quite slow in its dataflow compute but that has improved.
It's still quadratic in size though and its scope is extremely limited
(partial antic but fully available). So I don't think so. But maybe
you can provide benchmark data (including compile-time/memory-use figures)?
[Bug middle-end/92455] Unnecessary memory read in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

--- Comment #2 from Antony Polukhin ---
Can the -ftree-partial-pre flag be enabled by default for -O2?
[Bug middle-end/92455] Unnecessary memory read in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

--- Comment #1 from Richard Biener ---
You need partial-PRE to perform the desired transform. With -O3 or
-O2 -ftree-partial-pre we do what you suggest (plus also cache *max->ptr
in exchange for another IV):

f1:
.LFB0:
        .cfi_startproc
        movq    (%rdi), %rax
        leaq    40(%rdi), %rcx
        movq    %rdi, %rsi
        movl    (%rax), %edx
.L3:
        movq    8(%rdi), %rax
        addq    $8, %rdi
        movl    (%rax), %eax
        cmpl    %edx, %eax
        jle     .L2
        movl    %eax, %edx
        movq    %rdi, %rsi
.L2:
        cmpq    %rdi, %rcx
        jne     .L3
        movq    (%rsi), %rax
        ret

because of the two conditional values (*max and *max->ptr_) the cmov transform
doesn't trigger.
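
For readers without the original testcase at hand, here is a rough C sketch of
the kind of loop comment #1 is talking about, and of the cached form that
partial-PRE is said to produce. It is not the reporter's code: the type name
elem, the function names max_naive and max_cached, and the length parameter n
are made up for illustration; only the field name ptr_ comes from the comment.

```c
#include <stddef.h>

/* Hypothetical element type; only the field name ptr_ is from comment #1. */
struct elem { int *ptr_; };

/* Naive form: *max->ptr_ is re-read from memory on every iteration.
   Requires n >= 1. */
int *max_naive(struct elem *beg, size_t n) {
    struct elem *max = beg;
    for (struct elem *it = beg + 1; it != beg + n; ++it)
        if (*max->ptr_ < *it->ptr_)
            max = it;
    return max->ptr_;
}

/* Hand-transformed form: the current maximum value lives in a local, so the
   loop body loads only the candidate element. Requires n >= 1. */
int *max_cached(struct elem *beg, size_t n) {
    struct elem *max = beg;
    int max_val = *beg->ptr_;
    for (struct elem *it = beg + 1; it != beg + n; ++it) {
        int v = *it->ptr_;
        if (max_val < v) {
            max_val = v;   /* first conditional value: the cached maximum */
            max = it;      /* second conditional value: the pointer to it */
        }
    }
    return max->ptr_;
}
```

The two assignments inside the if in max_cached correspond to the two
conditional values mentioned in the comment. Building both functions at -O2
with and without -ftree-partial-pre and comparing the generated assembly is one
way to check whether the extra load disappears on a given GCC version.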