Ping
Thanks,
Prachi
________________________________________
From: Prachi Godbole <[email protected]>
Sent: Wednesday, July 30, 2025 6:48 PM
To: [email protected]
Cc: [email protected]
Subject: [Patch] Address compile time issues for locality cloning pass
This patch attempts to reduce compile time for the locality cloning pass by
reducing recursive calls to partition_callchain (). This is achieved by
precomputing caller and callee information into locality_info.
locality_info stores all callees of a node, whether called directly or via
inlined nodes, thereby avoiding calls to partition_callchain () for inlined
nodes, which are already partitioned with their inlined_to nodes.
locality_info also stores precomputed accumulated incoming edge frequencies per
unique caller and avoids repeated computation within partition_callchain ().
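For readers unfamiliar with the pass, the precomputation described above can be
sketched roughly as follows. This is a simplified standalone illustration, not
the code in the patch: Node, Edge, LocalityInfo, collect_callees and
create_locality_info_sketch are hypothetical stand-ins for the real cgraph types
and helpers, and attributing an inlined caller's edges to its inlined_to root is
an assumption made for the sketch.

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-ins for cgraph_node / cgraph_edge (not GCC's real types).
struct Node;
struct Edge
{
  Node *callee;
  double freq;                  // stand-in for the edge frequency
};
struct Node
{
  int id;
  Node *inlined_to;             // root node this body was inlined into, or nullptr
  std::vector<Edge> callees;    // outgoing edges
};

// Assumed shape of the precomputed per-node data.
struct LocalityInfo
{
  std::vector<Node *> all_callees;        // callees reached directly or via inlined nodes
  std::map<Node *, double> caller_freq;   // accumulated incoming frequency per unique caller
};

// Look through inlined callees so that only real (offline) callees are recorded.
static void
collect_callees (const Node *n, std::set<Node *> &seen, std::vector<Node *> &out)
{
  for (const Edge &e : n->callees)
    {
      if (!seen.insert (e.callee).second)
        continue;                         // already recorded or already walked
      if (e.callee->inlined_to)
        collect_callees (e.callee, seen, out);
      else
        out.push_back (e.callee);
    }
}

// Build the info once, up front, instead of recomputing it during partitioning.
std::map<Node *, LocalityInfo>
create_locality_info_sketch (const std::vector<Node *> &nodes)
{
  std::map<Node *, LocalityInfo> infos;
  for (Node *n : nodes)
    {
      if (n->inlined_to)
        continue;                         // inlined bodies are covered by their root
      std::set<Node *> seen;
      collect_callees (n, seen, infos[n].all_callees);
    }
  for (Node *n : nodes)
    {
      // Assumption: calls made from an inlined body count for its root.
      Node *caller = n->inlined_to ? n->inlined_to : n;
      for (const Edge &e : n->callees)
        if (!e.callee->inlined_to)        // skip inline edges, they are not real calls
          infos[e.callee].caller_freq[caller] += e.freq;
    }
  return infos;
}
```

With data of this shape, a partitioning walk only needs to iterate over
all_callees and look up caller_freq, rather than recursing into inlined nodes
or re-summing edge frequencies on every visit.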
A compile time improvement of approximately 45% is observed for the
bootstrap-lto-locality config, which takes 2-5% more time than bootstrap-lto.
This patch also adds proper memory management for pass-specific data
structures.
Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for mainline?
Thanks,
Prachi
Signed-off-by: Prachi Godbole
config/ChangeLog:
* bootstrap-lto-locality.mk (STAGE2_CFLAGS): Add param
lto-max-locality-partition.
(STAGE3_CFLAGS): Ditto.
(STAGEprofile_CFLAGS): Remove -fipa-reorder-for-locality.
(STAGEtrain_CFLAGS): Ditto.
gcc/ChangeLog:
* ipa-locality-cloning.cc (struct locality_info): New struct.
(loc_infos): Ditto.
(get_locality_info): New function.
(populate_callee_locality_info): Ditto.
(populate_caller_locality_info): Ditto.
(create_locality_info): Ditto.
(adjust_recursive_callees): Access node_to_clone by reference.
(inline_clones): Access node_to_clone and clone_to_node by reference.
(clone_node_as_needed): Ditto.
(accumulate_incoming_edge_frequency): Remove function.
(clone_node_p): New function.
(partition_callchain): Change prototype.
(locality_determine_ipa_order): Call create_locality_info ().
(locality_determine_static_order): Ditto.
(locality_partition_and_clone): Update call to partition_callchain ()
according to the new prototype.
(lc_execute): Allocate and free node_to_ch_info, node_to_clone,
clone_to_node.