Hi Honza, > On 24 Jun 2025, at 4:37 pm, Jan Hubicka <hubi...@ucw.cz> wrote: > > External email: Use caution opening links or attachments > > >>> >>> With part suffixes we also may want to merge specially, since the >>> entry_count of the split part does not correspond to entry_count of the >>> original function. >>> >>> I wonder, does partitioned function work with the google tool? I >>> remember it had limitations in this respect. >>> >> >> Yes, Here are some examples. >> >> _Z17expand_assignmentP9tree_nodeS0_b.part.0 total:7045 head:297 >> 0: 297 >> 20: 297 >> >> _Z17expand_assignmentP9tree_nodeS0_b total:1488 head:277 >> 1: 277 >> 9: 277 >> Here, we should keep the head as it is as head is for offset 1. > > I actually had in ming the .cold partition > (-freprder-blocks-and-partitoin) > but this is interesting too. We should track if we stripped .part > suffix and in that case do not merge in head counts. > > However # of invocations of > _Z17expand_assignmentP9tree_nodeS0_b.part.0 > should always be strictly lower than #of invocations of > _Z17expand_assignmentP9tree_nodeS0_b > > this is not reflected by head count, since I suppose the second is > inlined into some contexts which makes the execution to be accounted > spearately into their inline instances. > > So merging the profiles will also lead to inconsistencies making the > .part variant to seem more hot than it is...
I am looking into this and will post the patch as a follow up patch. > >> >> >> _Z19recompute_dominator13cdi_directionP15basic_block_def.part.0 total:1182 >> head:13 >> 0: 13 >> 3: 13 >> 11: 13 >> >> _Z19recompute_dominator13cdi_directionP15basic_block_def total:11 head:9 >> 1: 0 >> 3: 0 >> 9: 9 >> >> Here also, we should keep the head as it is as head is for offset 9. >> >> _Z22init_attr_rdwr_indicesP8hash_mapI16rdwr_access_hash11attr_access21simple_hashmap_traitsI19default_hash_traitsIS0_ES1_EEP9tree_node.part.0 >> total:85 head:5 >> 0: 8 >> 11: 0 >> 12: 0 >> 16: 0 >> 17: 0 >> 18: 0 >> 20: 0 >> 21: 0 >> 25: 0 >> 25.1: 2 >> 27: 2 >> 30: 0 >> 31: 0 >> 34: 0 >> 35: 2 >> 38: 2 >> 38.1: 2 >> 39: 2 >> 41: 2 >> 46: 2 >> 52.1: 0 >> 54: 0 >> 54.1: 0 >> 56: 8 >> 57: 0 >> 59: 0 >> 62: 0 >> 63: 3 >> 65: 0 >> 71.1: 0 >> 77: 0 >> 78: 0 >> 81: 3 >> 84: 2 >> 86: 0 >> 89: 0 >> 91: 0 >> 92: 0 >> 92.1: 0 >> 98: 0 >> 99: 0 >> 103: 0 >> 108: 0 >> 108.1: 0 >> 111: 0 >> 114: 0 >> 120: 1 >> 124: 0 >> 125: 0 >> 127: 0 >> 128: 0 >> 130: 0 >> 131: 0 >> 134: 0 >> 139: 0 >> 140: 0 >> 143: 1 >> 6: lookup_attribute total:40 >> 4: 5 >> >> >> _Z22init_attr_rdwr_indicesP8hash_mapI16rdwr_access_hash11attr_access21simple_hashmap_traitsI19default_hash_traitsIS0_ES1_EEP9tree_node >> total:212 head:71 >> 2: 71 >> _Z22init_attr_rdwr_indicesP8hash_mapI16rdwr_access_hash11attr_access21simple_hashmap_traitsI19default_hash_traitsIS0_ES1_EEP9tree_node.part.0:5 >> 143: 141 >> >> This looks odd. Looks like create_gcovt getting mixed up with the offset of >> inlined functions > > I am not sure I follow what you mean here? Head count here is 5 ( head:5). But the sample counts for the offset does not match this. Except that: >> 6: lookup_attribute total:40 >> 4: 5 This looks wrong? Here is the revised patch for get_original_name. Also added a test case. Is this OK? Thanks, Kugan > > This is my current benchmark run with -Ofast -mtune=native (without LTO) > comparing no feedback (base) to autofdo (peak) > > 500.perlbench_r 1 167 9.51 * 1 155 10.3 > * > 502.gcc_r 1 132 10.7 * 1 126 11.2 > * > 505.mcf_r 1 226 7.16 * 1 225 > 7.20 * > 520.omnetpp_r 1 203 6.47 * 1 203 > 6.47 * > 523.xalancbmk_r NR > NR > 525.x264_r 1 84.7 20.7 * 1 90.7 19.3 > * > 531.deepsjeng_r 1 208 5.50 * 1 209 > 5.47 * > 541.leela_r 1 295 5.61 * 1 318 > 5.21 * > 548.exchange2_r 1 85.9 30.5 * 1 93.3 28.1 > * > 557.xz_r 1 225 4.79 * 1 220 > 4.90 * > Est. SPECrate2017_int_base 9.13 > Est. SPECrate2017_int_peak 9.05 > > So there are regressions in x264, deepsjeng, leela and exchange neighter > of them very bad. I think it would be interesting to understand > 541.leela_r first. > > Honza >> >> Thanksm >> Kugabn
0001-AutoFDO-v3-Fix-get_original_name-to-strip-only-names.patch
Description: 0001-AutoFDO-v3-Fix-get_original_name-to-strip-only-names.patch