https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90056
--- Comment #1 from Martin Liška ---
Created attachment 46169
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46169&action=edit
perf annotate - Ofast native vs. Ofast native PGO
I'm attaching HTML and text perf annotate output for the Ofast native and
Ofast native PGO builds. As can be seen, it's still the same story: high
register pressure leads to spilling of some of the induction variables.
For these builds, the most significant difference is:
GOOD:
: if(block(row, 4, i4) <= 0) cycle
0.00 : 41c660:  mov    (%r9),%r12d
1.99 : 41c663:  mov    %r11d,0x80(%rsp)
0.11 : 41c66b:  mov    %r11d,%edx
0.02 : 41c66e:  test   %r12d,%r12d
0.15 : 41c671:  jg     41c7b0 <__brute_force_MOD_digits_2+0xe00>
0.01 : 41c677:  inc    %r11
0.64 : 41c67a:  add    $0x144,%r9
0.13 : 41c681:  add    $0x144,%r8
0.05 : 41c688:  add    $0x144,%r10
: do i4 = l(4), u(4)
0.15 : 41c68f:  cmp    %r11d,0x6c(%rsp)
2.39 : 41c694:  jge    41c660 <__brute_force_MOD_digits_2+0xcb0>
0.00 : 41c696:  mov    0x168(%rsp),%r10
0.55 : 41c69e:  mov    0x170(%rsp),%r9
0.08 : 41c6a6:  mov    0x178(%rsp),%r11
0.05 : 41c6ae:  mov    0x180(%rsp),%r8
: block(row, 4:9, i3) = block(row, 4:9, i3) + 10
BAD:
: if(block(row, 4, i4) <= 0) cycle
0.05 : 41a8b0:  mov    (%r11),%edi
0.78 : 41a8b3:  mov    %r10d,0x84(%rsp)
0.04 : 41a8bb:  mov    %r10d,%r13d
0.01 : 41a8be:  test   %edi,%edi
0.26 : 41a8c0:  jg     41aa10 <__brute_force_MOD_digits_2+0x1210>
0.44 : 41a8c6:  addq   $0x144,0x48(%rsp)
4.04 : 41a8cf:  addq   $0x144,0x58(%rsp)
1.31 : 41a8d8:  inc    %r10
0.02 : 41a8db:  add    $0x144,%r11
: do i4 = l(4), u(4)
0.01 : 41a8e2:  cmp    %r10d,0x88(%rsp)
0.25 : 41a8ea:  jge    41a8b0 <__brute_force_MOD_digits_2+0x10b0>
: block(row, 4:9, i3) = block(row, 4:9, i3) + 10
0.03 : 41a8ec:  mov    0xd0(%rsp),%r15
0.27 : 41a8f4:  addl   $0xa,-0xdc(%r15)
0.20 : 41a8fc:  addl   $0xa,-0xb8(%r15)
0.01 : 41a904:  addl   $0xa,-0x94(%r15)
0.07 : 41a90c:  addl   $0xa,-0x70(%r15)
0.05 : 41a911:  addl   $0xa,-0x4c(%r15)
0.06 : 41a916:  addl   $0xa,-0x28(%r15)
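To make the comparison of the two inner i4 loops easier to read, here is a
rough tally sketch. It only re-adds the percentages from the listings above
(loop head through the backward jge) and counts the instructions that touch a
stack slot. The names GOOD, BAD, total and stack_refs are made up for this
illustration, and the percentages are relative to each build's own profile,
so the totals are not a direct measure of run time.

```python
# (percent, instruction) pairs copied from the two listings above,
# covering only the inner i4 loop of each build.
GOOD = [
    (0.00, "mov (%r9),%r12d"),
    (1.99, "mov %r11d,0x80(%rsp)"),
    (0.11, "mov %r11d,%edx"),
    (0.02, "test %r12d,%r12d"),
    (0.15, "jg 41c7b0"),
    (0.01, "inc %r11"),
    (0.64, "add $0x144,%r9"),
    (0.13, "add $0x144,%r8"),
    (0.05, "add $0x144,%r10"),
    (0.15, "cmp %r11d,0x6c(%rsp)"),
    (2.39, "jge 41c660"),
]
BAD = [
    (0.05, "mov (%r11),%edi"),
    (0.78, "mov %r10d,0x84(%rsp)"),
    (0.04, "mov %r10d,%r13d"),
    (0.01, "test %edi,%edi"),
    (0.26, "jg 41aa10"),
    (0.44, "addq $0x144,0x48(%rsp)"),
    (4.04, "addq $0x144,0x58(%rsp)"),
    (1.31, "inc %r10"),
    (0.02, "add $0x144,%r11"),
    (0.01, "cmp %r10d,0x88(%rsp)"),
    (0.25, "jge 41a8b0"),
]


def total(listing):
    """Sum the per-instruction sample percentages of one loop."""
    return round(sum(pct for pct, _ in listing), 2)


def stack_refs(listing):
    """Return the instructions that have an %rsp-relative memory operand."""
    return [insn for _, insn in listing if "(%rsp)" in insn]


if __name__ == "__main__":
    for name, listing in (("GOOD", GOOD), ("BAD", BAD)):
        print(name, total(listing), stack_refs(listing))
```

The tally gives 5.64 for the GOOD loop and 7.21 for the BAD loop. The GOOD
loop has two stack references and keeps all three 0x144-stride pointers in
registers, while the BAD loop has four stack references, two of which are the
addq instructions that update spilled pointers directly in memory.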
The benchmark is quite unpredictable, so I'm leaving this for now.