On 5/30/2023 6:08 PM, Alex Bennée wrote:
> 
> "Wu, Fei" <fei2...@intel.com> writes:
> 
>> On 5/30/2023 1:01 PM, Wu, Fei wrote:
>>> On 5/30/2023 12:07 PM, Richard Henderson wrote:
>>>> On 5/29/23 04:49, Fei Wu wrote:
>>>>> +/*
>>>>> + * The TCGProfile structure holds data for analysing the quality of
>>>>> + * the code generation. The data is split between stuff that is valid
>>>>> + * for the lifetime of a single translation and things that are valid
>>>>> + * for the lifetime of the translator. As the former is reset for each
>>>>> + * new translation so it should be copied elsewhere if you want to
>>>>> + * keep it.
>>>>> + *
>>>>> + * The structure is safe to access within the context of translation
>>>>> + * but accessing the data from elsewhere should be done with safe
>>>>> + * work.
>>>>> + */
> <snip>
>>>>> +    int64_t cpu_exec_time;
>>>>> +    int64_t op_count; /* total insn count */
>>>>> +    int64_t code_in_len;
>>>>> +    int64_t code_out_len;
>>>>> +    int64_t search_out_len;
>>>>> +
>>>>> +    /* Timestamps during translation  */
>>>>> +    uint64_t gen_start_time;
>>>>> +    uint64_t gen_ir_done_time;
>>>>> +    uint64_t gen_opt_done_time;
>>>>> +    uint64_t gen_la_done_time;
>>>>> +    uint64_t gen_code_done_time;
>>>>> +
>>>>> +    /* Lifetime count of TCGOps per TCGContext */
>>>>> +    uint64_t table_op_count[NB_OPS];
>>>>> +} TCGProfile;
>>>>> +
>>>>
>>>> Why have you added this back?
>>>>
>>>> The whole point of removing CONFIG_PROFILE to begin was to have all new
>>>> code.  Not to remove it then reintroduce it unchanged.
>>>>
>>>> In tcg_gen_code, you have access to the TranslationBlock as s->gen_tb. 
>>>> There is zero point to accumulating into TCGProfile, when you could be
>>>> accumulating into TCGStatistics directly.
>>>>
>>> TCGProfile contains global wide (per TCGContext) stats, but TBStatistics
>>> is TB specific, some info in TCGProfile such as table_op_count is not
>>> able to be summed up from TBStatistics. The per-translation stats in
>>> TCGProfile may be removed indeed.
>>>
>> After some cleanup locally, these are the remains in TCGProfile:
>> * cpu_exec_time - which is not guarded by tb_stats_enabled, it could be
>> moved out as an individual variable?
>> * gen_xxx_time - which in kinda global variables across functions to
>> calc ts->gen_times
> 
> Given the work on JIT profiling I think there is an argument to drop the
> time profile bits and pieces. I think you can get the same information
> from a perf run although it does amortise the cost of generation over
> all translations. Do we see any particular TBs that are particularly
> expensive to translate by more than a standard deviation?
> 
I added some code locally to sort by ir/code, (opt & la are not printed
in 'info tb-list').

(qemu) info tb-list 2 ir
(qemu) TB id:1 | phys:0x2abe14 virt:0xffffffff800abe14 flags:0x01024001
0 inv/1
        | exec:99079/0 guest inst cov:0.03%
        | trans:1 ints: g:6 op:24 op_opt:21 spills:0
        | h/g (host bytes / guest insts): 20.000000
        | time to gen at 2.4GHz => code:840.00(ns) IR:75274.58(ns)

TB id:2 | phys:0x24cf94 virt:0xffffffff8004cf94 flags:0x01024001 0 inv/3
        | exec:99085/0 guest inst cov:0.53%
        | trans:3 ints: g:31 op:86 op_opt:74 spills:0
        | h/g (host bytes / guest insts): 42.322582
        | time to gen at 2.4GHz => code:10011.25(ns) IR:118656.25(ns)

(qemu) info tb-list 2 ir
(qemu) TB id:1 | phys:0x24cf94 virt:0xffffffff8004cf94 flags:0x01024001
0 inv/3
        | exec:131259/0 guest inst cov:0.59%
        | trans:3 ints: g:31 op:86 op_opt:74 spills:0
        | h/g (host bytes / guest insts): 42.322582
        | time to gen at 2.4GHz => code:10011.25(ns) IR:118656.25(ns)

TB id:2 | phys:0x2abe14 virt:0xffffffff800abe14 flags:0x01024001 1 inv/3
        | exec:131541/0 guest inst cov:0.11%
        | trans:3 ints: g:6 op:24 op_opt:21 spills:0
        | h/g (host bytes / guest insts): 20.000000
        | time to gen at 2.4GHz => code:4411.25(ns) IR:88288.33(ns)

(qemu) info tb 2
(qemu)
------------------------------

TB id:2 | phys:0x2abe14 virt:0xffffffff800abe14 flags:0x01024001 1 inv/3
        | exec:668754/0 guest inst cov:0.16%
        | trans:3 ints: g:6 op:24 op_opt:21 spills:0
        | h/g (host bytes / guest insts): 20.000000
        | time to gen at 2.4GHz => code:4411.25(ns) IR:88288.33(ns)

----------------
IN:
Priv: 1; Virt: 0

0xffffffff800abe14:  864a              mv                      a2,s2
0xffffffff800abe16:  85ce              mv                      a1,s3
0xffffffff800abe18:  8526              mv                      a0,s1
0xffffffff800abe1a:  46bd              addi                    a3,zero,15
0xffffffff800abe1c:  fffff097          auipc                   ra,-4096
              # 0xffffffff800aae1c
0xffffffff800abe20:  cc0080e7          jalr                    ra,ra,-832
------------------------------

Look at the tb with phys:0x2abe14, although the first time IR takes
75274ns, but in the second command we can see it takes much less time
(Note IR time is accumulated).

So if the time for the same TB is not consistent, and the deviation
could be dominated by system events such as memory allocation instead of
codegen itself (?), I think it's less useful.

Alex, regarding dropping time profile, do you mean remove TB_JIT_TIME
completely?

Thanks,
Fei.

>> * table_op_count - it's indeed guarded by tb_stats_enabled, but it's a
>> large array, it might be too large to put into TBStaticstics.
> 
> Probably. This is probably more interesting information as an aggregate
> than per TB.
> 
>>
>> Thanks,
>> Fei.
>>
>>> Thanks,
>>> Fei.
>>>
>>>>
>>>> r~
>>>
> 
> 


Reply via email to