https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916
--- Comment #7 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
LLVM also gets the execution counts wrong, just in a different (and less
harmful) way:
test:270773509:9780
1: 9116
2: 51984 for (
4: 51984 i<s <- this is i<s and should also have a large count
5: 7081488 i++
6: 7081488 a[i]++
7: 8576
main:36431:0
1: 0
2.1: 9051
3: 9278 test:9780
4: 0
I am confused about why the AutoFDO tools do this. For the inner loop we output:
.L4:
.loc 1 10 11 is_stmt 1 view .LVU8 <- a[i]
.loc 1 10 15 is_stmt 0 view .LVU9 <- ++
movdqa a(%rax), %xmm0
addq $16, %rax
paddd %xmm1, %xmm0
movaps %xmm0, a-16(%rax)
.loc 1 9 15 is_stmt 1 view .LVU10 <- i++
.loc 1 8 16 view .LVU11 <- i<s
cmpq %rax, %rdx
jne .L4
Swapping the order of the last two .loc directives to
.loc 1 8 16 view .LVU11 <- i<s
.loc 1 9 15 is_stmt 1 view .LVU10 <- i++
yields:
test total:2652901 head:4123
3: 0
4: 4123
5: 1322715
6: 1322715
7: 3348
main total:3983 head:0
1: 0
2.1: 1916
3: 2067 test:1925
4: 0
So it seems that the tool takes only the first location of each sample, which
is odd, since debug stmts may come from multiple original basic blocks and
this fact is not visible.
Ideally we could do something like:
.L4:
.loc 1 10 11 is_stmt 1 view .LVU8 <- a[i]
movdqa a(%rax), %xmm0
.loc 1 9 15 is_stmt 1 view .LVU10 <- i++
addq $16, %rax
.loc 1 10 15 is_stmt 0 view .LVU9 <- ++
paddd %xmm1, %xmm0
movaps %xmm0, a-16(%rax)
.loc 1 8 16 view .LVU11 <- i<s
cmpq %rax, %rdx
jne .L4
This would make things work (since there are no chained debug stmts) and
breakpointing would be less surprising, but I understand it is not designed to
work this way...
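A toy model of the "first location only" hypothesis may help make the difference between the two layouts concrete. This is a sketch under stated assumptions, not the AutoFDO tool's actual logic: instruction indices stand in for addresses, every instruction of the .L4 loop is assumed to be sampled equally often, a line-table row is assumed to cover instructions up to the next row, and column numbers and is_stmt flags are ignored. The names (attribute, CURRENT_ROWS, IDEAL_ROWS, the "first"/"all" policies) are hypothetical.

```python
from collections import Counter

# Toy line tables for the .L4 loop. Each row is (instruction_index, line);
# indices stand in for addresses: movdqa=0, addq=1, paddd=2, movaps=3,
# cmpq=4, jne=5. Columns and is_stmt are deliberately ignored.
CURRENT_ROWS = [(0, 10), (0, 10), (4, 9), (4, 8)]  # 9:15 and 8:16 both precede cmpq
IDEAL_ROWS = [(0, 10), (1, 9), (2, 10), (4, 8)]    # interleaved: one row per address
SAMPLES = [1] * 6  # assumption: every instruction is sampled equally often


def attribute(rows, samples, policy):
    """Credit per-instruction samples to source lines (toy model).

    A row covers instructions up to the next row's address. With
    policy="first", only the first line recorded at an address is credited
    (the hypothesised tool behaviour); with policy="all", every distinct
    line recorded at that address is credited.
    """
    addrs = sorted({addr for addr, _ in rows})
    counts = Counter()
    for insn, n in enumerate(samples):
        start = max(a for a in addrs if a <= insn)  # nearest row at or before insn
        # Distinct lines recorded at that address, in directive order.
        lines = list(dict.fromkeys(line for a, line in rows if a == start))
        chosen = lines[:1] if policy == "first" else lines
        for line in chosen:
            counts[line] += n
    return counts


for name, rows in (("current", CURRENT_ROWS), ("ideal", IDEAL_ROWS)):
    for policy in ("first", "all"):
        print(name, policy, dict(sorted(attribute(rows, SAMPLES, policy).items())))
```

Under the "first" policy, line 8 (i<s) gets nothing in the current layout, while in the interleaved layout each address carries a single row, so both policies agree. The model only illustrates the mechanism and does not try to reproduce the counts in the dumps above.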
LLVM does:
.LBB0_4: # =>This Inner Loop Header: Depth=1
.loc 0 10 15 is_stmt 1 discriminator 33 # ll.c:10:15
movdqa (%rsi,%rdi), %xmm1
movdqa 16(%rsi,%rdi), %xmm2
psubd %xmm0, %xmm1
psubd %xmm0, %xmm2
movdqa %xmm1, (%rsi,%rdi)
movdqa %xmm2, 16(%rsi,%rdi)
.loc 0 9 15 discriminator 33 # ll.c:9:15
addq $32, %rsi
cmpq %rsi, %rdx
jne .LBB0_4
So it has only lines 9 and 10. The large discriminator numbers seem to be the
FS (flow-sensitive) discriminator encoding. LLVM assigns discriminators twice.
The first assignment is done similarly to ours, but scaled up.
I think it is supposed to handle the case when a statement gets duplicated
into multiple basic blocks, as a[i]++ does. So it has:
.loc 0 10 15 is_stmt 1 discriminator 33 # ll.c:10:15
movdqa (%rsi,%rdi), %xmm1
movdqa 16(%rsi,%rdi), %xmm2
psubd %xmm0, %xmm1
psubd %xmm0, %xmm2
movdqa %xmm1, (%rsi,%rdi)
movdqa %xmm2, 16(%rsi,%rdi)
for the vectorized body and
.loc 0 10 15 is_stmt 1 # ll.c:10:15
leaq (%rcx,%rdx,4), %rdi
incl (%rsi,%rdi)
for the epilogue. The tool has a -fuse_discriminator_encoding option which
then merges the values back. I will look into what this really does.
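What "merging the values back" means is still open here. One plausible reading, sketched below purely as an assumption, is that profile entries for the same line offset that differ only in discriminator (for example the vectorized copy of a[i]++ and its epilogue copy) are summed into a single count per line. The helper name and the counts are made up for illustration; this does not describe what -fuse_discriminator_encoding actually does, and the FS discriminator bit layout is not modelled.

```python
from collections import defaultdict


def merge_discriminators(body):
    """Sum counts of profile entries that differ only in discriminator.

    Hypothetical helper. `body` maps keys written as "offset" or
    "offset.discriminator" (the notation of the "2.1: 9051" entry in the
    dumps above) to sample counts.
    """
    merged = defaultdict(int)
    for key, count in body.items():
        offset = int(key.split(".", 1)[0])  # drop the discriminator, if any
        merged[offset] += count
    return dict(merged)


# Made-up counts: line offset 6 (a[i]++) split between the vectorized
# body (discriminator 33) and the scalar epilogue (no discriminator).
print(merge_discriminators({"5": 900, "6.33": 900, "6": 40}))
```

With these inputs the two copies of offset 6 collapse into one entry of 940, which is what a consumer that ignores discriminators would want to see for the statement as a whole.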