On 2020/9/12 01:36, Tamar Christina wrote:
> Hi Martin,
> 
>>
>> Can you please confirm that the difference between these two is all due to
>> the last option, -fno-inline-functions-called-once?  Is LTO necessary?  I.e.,
>> can you also run the benchmark built with the branch compiler and
>> -mcpu=native -Ofast -fomit-frame-pointer -fno-inline-functions-called-once?
>>
> 
> Done, see below.
> 
>>> +----------+------------------------------------------------+--------------+
>>> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto | -24%         |
>>> +----------+------------------------------------------------+--------------+
>>> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer       | -26%         |
>>> +----------+------------------------------------------------+--------------+
>>
>>>
>>> (Hopefully the table shows up correct)
>>
>> it does show OK for me, thanks.
>>
>>>
>>> It looks like your patch definitely does improve the basic cases. So
>>> there's not much difference between lto and non-lto anymore and it's
>>> much better than GCC 10. However it still contains the regression
>>> introduced by Honza's changes.
>>
>> I assume these are rates, not times, so negative means bad.  But do I
>> understand it correctly that you're comparing against GCC 10 with the two
>> parameters set to rather special values?  Because your table seems to
>> indicate that even for you, the branch is faster than GCC 10 with just
>> -mcpu=native -Ofast -fomit-frame-pointer.
> 
> Yes, these are indeed rates, and I am comparing against the same options
> that gave us the fastest rates before: the two parameters and the inline
> flag.
> 
>>
>> So is the problem that the best obtainable run-time, even with obscure
>> options, from the branch is slower than the best obtainable run-time from
>> GCC 10?
>>
> 
> Yeah that's the problem, when we compare the two we're still behind.
> 
> I've done the additional two runs
> 
> +----------+------------------------------------------------+--------------+
> | Compiler | Flags                                          | diff GCC 10  |
> +----------+------------------------------------------------+--------------+
> | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto |              |
> |          | --param ipa-cp-eval-threshold=1                |              |
> |          | --param ipa-cp-unit-growth=80                  |              |
> |          | -fno-inline-functions-called-once              |              |
> +----------+------------------------------------------------+--------------+
> | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer       | -44%         |
> +----------+------------------------------------------------+--------------+
> | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto | -36%         |
> +----------+------------------------------------------------+--------------+
> | GCC 11   | -mcpu=native -Ofast -fomit-frame-pointer -flto | -12%         |
> |          | --param ipa-cp-eval-threshold=1                |              |
> |          | --param ipa-cp-unit-growth=80                  |              |
> |          | -fno-inline-functions-called-once              |              |
> +----------+------------------------------------------------+--------------+
> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto | -22%         |
> |          | --param ipa-cp-eval-threshold=1                |              |
> |          | --param ipa-cp-unit-growth=80                  |              |
> +----------+------------------------------------------------+--------------+
> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto | -12%         |
> |          | --param ipa-cp-eval-threshold=1                |              |
> |          | --param ipa-cp-unit-growth=80                  |              |
> |          | -fno-inline-functions-called-once              |              |
> +----------+------------------------------------------------+--------------+
> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto | -24%         |
> +----------+------------------------------------------------+--------------+
> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer       | -26%         |
> +----------+------------------------------------------------+--------------+
> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto | -12%         |
> |          | -fno-inline-functions-called-once              |              |
> +----------+------------------------------------------------+--------------+
> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer       | -11%         |
> |          | -fno-inline-functions-called-once              |              |
> +----------+------------------------------------------------+--------------+
> 
> And this confirms that LTO isn't needed and that the branch without any
> extra options is much better than GCC 10 was without any extra options.
> 
> It also confirms that the only remaining difference comes from
> -fno-inline-functions-called-once.

If -fno-inline-functions-called-once is added, the recursive function
digits_2 won't be inlined, because each digits_2 is specialized into its own
clone node and each clone is called only once. So getting the performance
back is expected; I guess the option behaves somewhat like -fno-inline in
this case.
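
To make the called-once situation concrete, here is a minimal C sketch. It is
not the benchmark code: solve, run_example and the depth limit of 4 are
hypothetical stand-ins for digits_2, and whether GCC really clones and
inlines this toy function depends on the compiler version and its heuristics.

```c
/* Hypothetical stand-in for a recursive routine such as digits_2.
 * The outermost call passes a constant depth, and every level calls
 * the next one with depth + 1.  If IPA-CP specializes the function
 * per constant depth, each resulting clone has exactly one caller. */
static int solve(int depth, int acc)
{
    if (depth == 4)          /* recursion stops at a fixed depth */
        return acc;
    return solve(depth + 1, acc * 2 + depth);
}

/* Single external entry point; depth starts at the constant 1. */
int run_example(void)
{
    return solve(1, 0);      /* evaluates to 11 */
}
```

Compiling this once with plain -O2 and once with -O2 plus
-fno-inline-functions-called-once, and comparing the generated assembly,
should show whether the single-caller functions were inlined.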

@Jambor @Honza Is there any progress on this (a --param controlling the
maximal recursion depth) and on the other regression, the one about
LOOP_GUARD_WITH_PREDICTION in PR96825
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96825), please? :)
I tested the current FSF master code and the regression still exists...


Thanks,
Xionghu

