Hi On 9 August 2013 17:28, Jan Hubicka <hubi...@ucw.cz> wrote: >> > Do we sanity check that the cold partition does not contain any blocks of >> > count 0? It may be that the profile is broken enough to make partitioning >> > not work. >> >> Do you mean sanity check that the cold partition does not contain any >> blocks of count > 0? (they should all be zero) I don't think that >> sanity check is there, but I can try adding that. > > Thanks, lets start with this - I suppose we need to figure out if > 1) the reachable blocks goes to cold section because partitioning decides > so even if they have non-0 count. > 2) the reachable blocks goes to cold section because they have incorrectly > updated count to 0 by someone > 3) profiling gets some blocks wrong. >> >> The issue with such a sanity check may be due to the later fixup I >> have in this patch (fixup_partitions). It is invoked after certain >> optimizations on the cfg that may make hot blocks previously reached >> by both hot and cold edges only reachable by cold blocks. These blocks >> are remarked cold. If the profile data hasn't been updated correctly >> it is possible that they would still have a non-0 count, although they >> are essentially cold after the cfg transformation. > > Well, or the other posibility is that the edges was updated wrong > and the blocks are really cold. We need to figure out if that happens > commonly enough. > > I will try to think of some artificial testcases. > >> >> But certainly such a sanity check should always succeed after the >> original partitioning. >> >> > I can think of inlining where the count gets scaled all way down to 0. >> > Perhaps >> > count scaling code can be modified to never round towards 0 for block >> > executing >> > non-0 times... >> >> This reminds me of why this situation could happen. When I have been >> testing this on the google branch I found situations where COMDAT >> routines have 0 profile counts (if I remember correctly, this happens >> when profile-gen binary has call to out-of-line copy of COMDAT in >> module A, linker chooses the out-of-line copy from module B, therefore >> the profile data for COMDAT in module A is 0). When the COMDAT gets >> inlined, the 0 counts on its bbs are scaled to 0, even though the >> callsite is non-zero. I have a patch that I was planning to send as a >> follow-up that handles this case by propagating the callsite bb's >> count to the inlined code when it has 0 counts, scaling by the edge >> frequencies. I can either include that patch in this one, or send it >> for review separately right now. Do you want to give it a try with >> this one to see if it addresses the issue? > > This scenario should not happen with LTO setup: the LTO symbol tables contains > code before early optimization and should be identical with profiling or > without (modulo the new references and call from profile code). > > But this patch seems useful as a backup solution for non-LTO, so yes, please > send it separately and I can try to double check that it really do not happen > with LTO. > (acutally LTO symtab may just chose COMDAT from module that has counts with > it. > It has all the info for it. I was thinkin about it few weeks back. It is > bit hard to do - you need to verify that all references from the function are > the same or linking might fail if you overwrite linker's decisiosns). >> >> Also, can you send me reproduction instructions for gimp? I don't >> think I need Martin's patch, but which version of gimp and what is the >> equivalent way for me to train it? I have some scripts to generate a >> similar type of instruction heat map graph that I have been using to >> tune partitioning and function reordering. Essentially it uses linux >> perf to sample on instructions_retired and then munge the data in >> several ways to produce various stats and graphs. One thing that has >> been useful has been to combine the perf data with nm output to >> determine which cold functions are being executed at runtime. > > Martin?
I use gimp from git repository, commit: 88ecd59c3436d302b644a5d25c1938c0e7b60ae0 (from Fet 5 2013) Link: http://www.gimp.org/source/#gimp_from_git Martin >> >> However, for this to tell me which split cold bbs are being executed I >> need to use a patch that Sri sent for review several months back that >> gives the split cold section its own name: >> http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01571.html >> Steven had some follow up comments that Sri hasn't had a chance to address >> yet: >> http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00798.html >> (cc'ing Sri as we should probably revive this patch soon to address >> gdb and other issues with detecting split functions properly) > > Intresting, I used linker script for this purposes, but that his GNU ld > only... > > Honza >> >> Thanks! >> Teresa >> >> > >> > Honza >> >> >> >> Thanks, >> >> Teresa >> >> >> >> > I think we are really looking primarily for dead parts of the functions >> >> > (sanity checks/error handling) >> >> > that should not be visited by train run. We can then see how to make >> >> > the heuristic more aggressive? >> >> > >> >> > Honza >> >> >> >> >> >> >> >> -- >> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413