Thank you for the review, Honza. 

> On 15 Jan 2026, at 8:03 AM, Jan Hubicka <[email protected]> wrote:
> 
>> This patch attempts to reduce compile time for locality cloning pass by
>> reducing recursive calls to partition_callchain ().  This is achieved by
>> precomputing caller callee information into locality_info.  locality_info
>> stores all callees of a node, either directly or via inlined nodes thereby
>> avoiding calls to partition_callchain () for inlined nodes which are already
>> partitioned with their inlined_to nodes.  locality_info stores precomputed
>> accumulated incoming edge frequencies per unique caller and avoids repeated
>> computation within partition_callchain ().  It also stores preaccumulated and
>> sorted outgoing edge frequencies for unique callees.
>> 
>> This patch refines is_entry_node_p () check by calling local_p () instead of
>> just alias check.
>> 
>> A compile time improvement of approximately 45% is observed for the
>> bootstrap-lto-locality config, which now takes 2-5% more time than
>> bootstrap-lto.
>> 
>> This patch also handles appropriate memory management of pass specific data
>> structures.
>> 
>> Bootstrapped and tested on aarch64-none-linux-gnu.
>> Ok for mainline?
>> 
>> Thanks,
>> Prachi
>> 
>> Signed-off-by: Prachi Godbole <[email protected]>
>> 
>> gcc/ChangeLog:
>> 
>>      * ipa-locality-cloning.cc (struct locality_callee_info): New struct.
>>      (struct locality_info): Ditto.
>>      (loc_infos): Ditto.
>>      (get_locality_info): New function.
>>      (sort_all_callees_default): Ditto.
>>      (callee_default_cmp): Ditto.
>>      (populate_callee_locality_info): Ditto.
>>      (populate_caller_locality_info): Ditto.
>>      (create_locality_info): Ditto.
>>      (adjust_recursive_callees): Access node_to_clone by reference.
>>      (inline_clones): Access node_to_clone and clone_to_node by reference.
>>      (clone_node_as_needed): Ditto.
>>      (accumulate_incoming_edge_frequency): Remove function.
>>      (clone_node_p): New function.
>>      (partition_callchain): Refactor the function.
>>      (is_entry_node_p): Call local_p ().
>>      (locality_determine_ipa_order): Call create_locality_info ().
>>      (locality_determine_static_order): Ditto.
>>      (locality_partition_and_clone): Update call to partition_callchain ()
>>      according to prototype.
>>      (lc_execute): Allocate and free node_to_ch_info, node_to_clone,
>>      clone_to_node.
>> 
> 
> +/* Data structure to hold precomputed callchain information.  */
> +struct locality_info
> +{
> +  cgraph_node *node;
> +
> +  /* Consolidated callees, including callees of inlined nodes.  */
> +  vec<locality_callee_info *> all_callees;
> +
> +  /* Accumulated caller->node edge frequencies for unique callers.  */
> +  hash_map<loc_map_hash, sreal> caller_freq;
> +  /* Accumulated node->callee edge frequencies for unique callees.  */
> +  hash_map<loc_map_hash, locality_callee_info> callee_info;
> +
> +  locality_info ()
> +    {
> +      all_callees.create (1);
> +    }
> +  ~locality_info ()
> +    {
> +      all_callees.release ();
> +    }
> Wouldn't auto_vec do the job here?
> 
I tried auto_vec, but for bigger vectors it sometimes resulted in memory
corruption, so I went with explicit allocation and deallocation.
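
For illustration only, the explicit allocation and deallocation pattern can
be sketched in standalone C++ as below.  This is not code from the patch: it
assumes plain realloc/free in place of GCC's vec API, and callee_list and its
members are hypothetical names.  Copying is deleted in the sketch so that the
buffer has a single owner and is freed exactly once.

```cpp
#include <cstddef>
#include <cstdlib>
#include <utility>

// Hypothetical stand-in for a struct that owns a growable array explicitly.
struct callee_list
{
  int *data = nullptr;
  std::size_t len = 0;
  std::size_t cap = 0;

  // Allocate room for one element up front, analogous to create (1).
  callee_list () { reserve (1); }

  // Release the buffer exactly once, analogous to release ().
  ~callee_list () { std::free (data); }

  // Copying is disabled so two objects never free the same buffer.
  callee_list (const callee_list &) = delete;
  callee_list &operator= (const callee_list &) = delete;

  // Moving transfers ownership and leaves the source empty.
  callee_list (callee_list &&o) noexcept
    : data (o.data), len (o.len), cap (o.cap)
  {
    o.data = nullptr;
    o.len = 0;
    o.cap = 0;
  }

  // Grow the buffer to hold at least N elements.
  void reserve (std::size_t n)
  {
    if (n <= cap)
      return;
    int *p = static_cast<int *> (std::realloc (data, n * sizeof (int)));
    if (!p)
      std::abort ();
    data = p;
    cap = n;
  }

  // Append V, doubling the capacity when the buffer is full.
  void push (int v)
  {
    if (len == cap)
      reserve (cap ? cap * 2 : 1);
    data[len++] = v;
  }
};
```

The sketch only shows the ownership discipline; it does not attempt to
reproduce or explain the corruption seen with auto_vec.
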
> Otherwise the patch looks OK.
> Honza

