https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90078

--- Comment #6 from bin cheng <amker at gcc dot gnu.org> ---
(In reply to Martin Liška from comment #5)
> (In reply to bin cheng from comment #4)
> > In get_scaled_computation_cost_at, we have very big ratio between
> > bb_count/loop_count:
> > 
> > (gdb) p data->current_loop->latch->count                   
> > $50 = {static n_bits = 61, static max_count = 2305843009213693950, static
> > uninitialized_count = 2305843009213693951, m_val = 158483, m_quality =
> > profile_guessed_local}
> > (gdb) p gimple_bb(at)->count
> > $51 = {static n_bits = 61, static max_count = 2305843009213693950, static
> > uninitialized_count = 2305843009213693951, m_val = 1569139790, m_quality =
> > profile_guessed_local}
> > (gdb) p 1569139790 / 158483
> > $52 = 9900
> > (gdb) p cost
> > $53 = {cost = 20, complexity = 2, scratch = 1}
> > (gdb) p 19 * 9900
> > $54 = 188100
> > 
> > as a result, sum_cost soon overflows infinite_cost.  Shall we cap the
> > ratio so that it doesn't grow too quickly?  Of course, some benchmark
> > data would be needed to tune this heuristic.
> 
> I would implement the capping in the comp_cost struct, where each
> individual operator can cap to infinite. What do you think, Bin?
Implementing the capping in comp_cost's operators, saturating at infinite_cost,
is less invasive.  OTOH, capping bb_freq/loop_freq has its own advantage:
once a cost reaches infinite, it becomes meaningless for comparison and for
candidate choosing, while capping bb_freq/loop_freq can still express the
hotness of the code to some extent.
Let's fix the issue by capping in comp_cost's operators first for this stage 4,
and revisit the idea of capping bb_freq/loop_freq with more benchmark data in
the next stage 1.  How about that?

Thanks.
> 
> > 
> > 
> > Another problem is that the generated binary segfaults even when
> > compiled at -O0:
> > 
> > $ ./g++ -O0 pr90078.cc -o a.out -ftemplate-depth=1000000 -ftime-report  -g
> > -std=c++14
> > $ gdb --args ./a.out
> > 
> > Dump of assembler code for function main():
> >    0x0000000000400572 <+0>:     push   %rbp
> >    0x0000000000400573 <+1>:     mov    %rsp,%rbp
> >    0x0000000000400576 <+4>:     sub    $0x2625a020,%rsp
> >    0x000000000040057d <+11>:    lea    -0x2625a020(%rbp),%rax
> >    0x0000000000400584 <+18>:    mov    %rax,%rdi
> > => 0x0000000000400587 <+21>:    callq  0x4006c0 <Tensor4<float, 100, 100,
> > 100, 100>::Tensor4()>
> >    0x000000000040058c <+26>:    lea    -0x4c4b410(%rbp),%rax
> >    0x0000000000400593 <+33>:    lea    -0xe4e1c10(%rbp),%rdx
> > 
> > The segmentation fault happens at the callq instruction.
> 
> Yes, the same happens with clang. It's a stack overflow:
> 
> $ g++ pr90078.cpp  -ftemplate-depth=1111111 -fsanitize=address && ./a.out 
> AddressSanitizer:DEADLYSIGNAL
> =================================================================
> ==5750==ERROR: AddressSanitizer: stack-overflow on address 0x7fffd9da3af0
> (pc 0x0000004011cb bp 0x7fffffffdc60 sp 0x7fffd9da3af0 T0)
>     #0 0x4011ca in main (/home/marxin/Programming/testcases/a.out+0x4011ca)
>     #1 0x7ffff6d32b7a in __libc_start_main ../csu/libc-start.c:308
>     #2 0x401109 in _start (/home/marxin/Programming/testcases/a.out+0x401109)
> 
> SUMMARY: AddressSanitizer: stack-overflow
> (/home/marxin/Programming/testcases/a.out+0x4011ca) in main
> ==5750==ABORTING
