On Fri, Aug 9, 2013 at 8:54 AM, Martin Liška <marxin.li...@gmail.com> wrote:
> Hi
>
> On 9 August 2013 17:28, Jan Hubicka <hubi...@ucw.cz> wrote:
>>> > Do we sanity check that the cold partition does not contain any blocks
>>> > of count 0? It may be that the profile is broken enough to make
>>> > partitioning not work.
>>>
>>> Do you mean sanity check that the cold partition does not contain any
>>> blocks of count > 0? (they should all be zero) I don't think that
>>> sanity check is there, but I can try adding it.
>>
>> Thanks, let's start with this - I suppose we need to figure out whether
>> 1) the reachable blocks go to the cold section because partitioning
>> decides so even though they have a non-0 count,
>> 2) the reachable blocks go to the cold section because someone
>> incorrectly updated their count to 0, or
>> 3) profiling gets some blocks wrong.
>>>
>>> The issue with such a sanity check may be due to the later fixup I
>>> have in this patch (fixup_partitions). It is invoked after certain
>>> optimizations on the cfg that may leave hot blocks previously reached
>>> by both hot and cold edges reachable only from cold blocks. These
>>> blocks are re-marked cold. If the profile data hasn't been updated
>>> correctly, it is possible that they would still have a non-0 count,
>>> although they are essentially cold after the cfg transformation.
>>
>> Well, the other possibility is that the edges were updated wrongly
>> and the blocks are really cold. We need to figure out whether that
>> happens commonly enough.
>>
>> I will try to think of some artificial testcases.
>>
>>>
>>> But certainly such a sanity check should always succeed after the
>>> original partitioning.
>>>
>>> > I can think of inlining where the count gets scaled all the way down
>>> > to 0. Perhaps the count scaling code can be modified to never round
>>> > towards 0 for blocks executing a non-0 number of times...
>>>
>>> This reminds me of why this situation could happen.
>>> When I have been testing this on the google branch I found situations
>>> where COMDAT routines have 0 profile counts (if I remember correctly,
>>> this happens when the profile-gen binary has a call to the out-of-line
>>> copy of a COMDAT in module A, the linker chooses the out-of-line copy
>>> from module B, and therefore the profile data for the COMDAT in module
>>> A is 0). When the COMDAT gets inlined, the 0 counts on its bbs are
>>> scaled to 0, even though the callsite is non-zero. I have a patch that
>>> I was planning to send as a follow-up that handles this case by
>>> propagating the callsite bb's count to the inlined code when it has 0
>>> counts, scaling by the edge frequencies. I can either include that
>>> patch in this one, or send it for review separately right now. Do you
>>> want to give it a try with this one to see if it addresses the issue?
>>
>> This scenario should not happen with the LTO setup: the LTO symbol
>> tables contain code before early optimization and should be identical
>> with profiling or without (modulo the new references and calls from
>> the profile code).
>>
>> But this patch seems useful as a backup solution for non-LTO, so yes,
>> please send it separately and I can try to double check that it really
>> does not happen with LTO.
>> (Actually the LTO symtab may just choose the COMDAT from the module
>> that has counts with it. It has all the info for it. I was thinking
>> about it a few weeks back. It is a bit hard to do - you need to verify
>> that all references from the function are the same, or linking might
>> fail if you overwrite the linker's decisions.)
>>>
>>> Also, can you send me reproduction instructions for gimp? I don't
>>> think I need Martin's patch, but which version of gimp and what is the
>>> equivalent way for me to train it? I have some scripts to generate a
>>> similar type of instruction heat map graph that I have been using to
>>> tune partitioning and function reordering.
>>> Essentially it uses linux
>>> perf to sample on instructions_retired and then munge the data in
>>> several ways to produce various stats and graphs. One thing that has
>>> been useful has been to combine the perf data with nm output to
>>> determine which cold functions are being executed at runtime.
>>
>> Martin?
>
> I use gimp from the git repository, commit:
> 88ecd59c3436d302b644a5d25c1938c0e7b60ae0 (from Feb 5 2013)
> Link: http://www.gimp.org/source/#gimp_from_git
Thanks. Were you building with LTO? And just -O2, or any other options I
should use?
Teresa

> Martin
>
>>>
>>> However, for this to tell me which split cold bbs are being executed I
>>> need to use a patch that Sri sent for review several months back that
>>> gives the split cold section its own name:
>>> http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01571.html
>>> Steven had some follow-up comments that Sri hasn't had a chance to
>>> address yet:
>>> http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00798.html
>>> (cc'ing Sri as we should probably revive this patch soon to address
>>> gdb and other issues with detecting split functions properly)
>>
>> Interesting, I used a linker script for this purpose, but that is GNU
>> ld only...
>>
>> Honza
>>>
>>> Thanks!
>>> Teresa
>>>
>>> >
>>> > Honza
>>> >>
>>> >> Thanks,
>>> >> Teresa
>>> >>
>>> >> > I think we are really looking primarily for dead parts of the
>>> >> > functions (sanity checks/error handling) that should not be
>>> >> > visited by the train run. We can then see how to make the
>>> >> > heuristic more aggressive.
>>> >> >
>>> >> > Honza
>>> >>
>>> >>
>>> >> --
>>> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413