On Fri, Aug 9, 2013 at 8:54 AM, Martin Liška <marxin.li...@gmail.com> wrote:
> Hi
>
> On 9 August 2013 17:28, Jan Hubicka <hubi...@ucw.cz> wrote:
>>> > Do we sanity check that the cold partition does not contain any blocks of
>>> > count 0?  It may be that the profile is broken enough to make partitioning
>>> > not work.
>>>
>>> Do you mean sanity check that the cold partition does not contain any
>>> blocks of count > 0? (they should all be zero) I don't think that
>>> sanity check is there, but I can try adding that.
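>>> Something along these lines, presumably (a rough sketch against the
>>> basic-block API; verify_cold_partition_counts is a made-up name):
>>>
>>>   static void
>>>   verify_cold_partition_counts (void)
>>>   {
>>>     basic_block bb;
>>>
>>>     FOR_EACH_BB (bb)
>>>       if (BB_PARTITION (bb) == BB_COLD_PARTITION)
>>>         /* Every block placed in the cold section should have a
>>>            0 profile count.  */
>>>         gcc_checking_assert (bb->count == 0);
>>>   }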
>>
>> Thanks, let's start with this - I suppose we need to figure out whether
>>  1) reachable blocks go to the cold section because partitioning decides
>>     so even though they have a non-0 count.
>>  2) reachable blocks go to the cold section because someone incorrectly
>>     updated their count to 0.
>>  3) profiling gets some blocks wrong.
>>>
>>> The issue with such a sanity check may be the later fixup I have in
>>> this patch (fixup_partitions). It is invoked after certain
>>> optimizations on the cfg that may leave hot blocks previously reached
>>> by both hot and cold edges reachable only via cold blocks. These blocks
>>> are re-marked cold. If the profile data hasn't been updated correctly,
>>> it is possible that they would still have a non-0 count, although they
>>> are essentially cold after the cfg transformation.
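>>> To illustrate the re-marking (a simplified single-pass sketch, not the
>>> actual fixup_partitions code - a real fixup has to repeat until no
>>> block changes, since moving one block can expose further candidates):
>>>
>>>   static void
>>>   remark_cold_only_reachable_blocks (void)
>>>   {
>>>     basic_block bb;
>>>     edge e;
>>>     edge_iterator ei;
>>>
>>>     FOR_EACH_BB (bb)
>>>       {
>>>         bool all_preds_cold = true;
>>>
>>>         if (BB_PARTITION (bb) == BB_COLD_PARTITION
>>>             || EDGE_COUNT (bb->preds) == 0)
>>>           continue;
>>>
>>>         FOR_EACH_EDGE (e, ei, bb->preds)
>>>           if (BB_PARTITION (e->src) != BB_COLD_PARTITION)
>>>             all_preds_cold = false;
>>>
>>>         /* A hot block now reachable only via cold blocks is
>>>            re-marked cold.  */
>>>         if (all_preds_cold)
>>>           BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>>       }
>>>   }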
>>
>> Well, the other possibility is that the edges were updated wrongly
>> and the blocks really are cold.  We need to figure out whether that
>> happens commonly enough.
>>
>> I will try to think of some artificial testcases.
>>
>>>
>>> But certainly such a sanity check should always succeed after the
>>> original partitioning.
>>>
>>> > I can think of inlining where the count gets scaled all the way down to 0.
>>> > Perhaps the count scaling code can be modified to never round towards 0
>>> > for blocks executing a non-0 number of times...
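>>> > Roughly like this hypothetical helper (overflow handling omitted,
>>> > and the name is made up):
>>> >
>>> >   static gcov_type
>>> >   scale_count_round_up (gcov_type count, gcov_type num, gcov_type den)
>>> >   {
>>> >     gcov_type scaled;
>>> >
>>> >     if (den == 0)
>>> >       return count;
>>> >     scaled = count * num / den;
>>> >     /* Never round a block that executed a non-0 number of times
>>> >        down to a 0 count.  */
>>> >     if (scaled == 0 && count > 0 && num > 0)
>>> >       scaled = 1;
>>> >     return scaled;
>>> >   }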
>>>
>>> This reminds me of why this situation could happen. When I was
>>> testing this on the google branch I found situations where COMDAT
>>> routines have 0 profile counts (if I remember correctly, this happens
>>> when the profile-gen binary has a call to the out-of-line copy of a
>>> COMDAT in module A, but the linker chooses the out-of-line copy from
>>> module B, so the profile data for the COMDAT in module A is 0). When
>>> the COMDAT gets inlined, the 0 counts on its bbs stay 0 after scaling,
>>> even though the callsite is non-zero. I have a patch that I was
>>> planning to send as a follow-up that handles this case by propagating
>>> the callsite bb's count to the inlined code when it has 0 counts,
>>> scaling by the edge frequencies (rough sketch below). I can either
>>> include that patch in this one, or send it for review separately
>>> right now. Do you want to give it a try with this one to see if it
>>> addresses the issue?
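>>> The idea is roughly the following (a sketch with a made-up helper
>>> name, not the actual patch; it uses the block frequency to stand in
>>> for the edge-frequency scaling):
>>>
>>>   /* Called for each block copied from an inlined callee whose
>>>      profile was entirely 0: seed the copy from the call site's
>>>      count, scaled by the copied block's frequency.  */
>>>   static void
>>>   seed_count_from_call_site (basic_block call_bb, basic_block copy_bb)
>>>   {
>>>     if (call_bb->count > 0)
>>>       copy_bb->count = call_bb->count * copy_bb->frequency / BB_FREQ_MAX;
>>>   }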
>>
>> This scenario should not happen with an LTO setup: the LTO symbol table
>> contains code before early optimization and should be identical with
>> profiling or without (modulo the new references and calls from the
>> profiling code).
>>
>> But this patch seems useful as a backup solution for non-LTO, so yes, please
>> send it separately and I can try to double check that it really does not
>> happen with LTO.
>> (Actually the LTO symtab may just choose the COMDAT from the module that
>> has counts for it.  It has all the info needed.  I was thinking about it a
>> few weeks back.  It is a bit hard to do - you need to verify that all
>> references from the function are the same, or linking might fail if you
>> override the linker's decisions.)
>>>
>>> Also, can you send me reproduction instructions for gimp? I don't
>>> think I need Martin's patch, but which version of gimp and what is the
>>> equivalent way for me to train it? I have some scripts to generate a
>>> similar type of instruction heat map graph that I have been using to
>>> tune partitioning and function reordering. Essentially they use Linux
>>> perf to sample on instructions_retired and then munge the data in
>>> several ways to produce various stats and graphs. One thing that has
>>> been useful has been to combine the perf data with nm output to
>>> determine which cold functions are being executed at runtime.
>>
>> Martin?
>
> I use gimp from git repository, commit:
> 88ecd59c3436d302b644a5d25c1938c0e7b60ae0 (from Feb 5 2013)
> Link: http://www.gimp.org/source/#gimp_from_git

Thanks. Were you building with LTO? And just -O2, or any other options
I should use?

Teresa

>
> Martin
>
>>>
>>> However, for this to tell me which split cold bbs are being executed I
>>> need to use a patch that Sri sent for review several months back that
>>> gives the split cold section its own name:
>>>   http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01571.html
>>> Steven had some follow-up comments that Sri hasn't had a chance to
>>> address yet:
>>>   http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00798.html
>>> (cc'ing Sri as we should probably revive this patch soon to address
>>> gdb and other issues with detecting split functions properly)
>>
>> Interesting. I used a linker script for this purpose, but that is GNU ld
>> only...
>>
>> Honza
>>>
>>> Thanks!
>>> Teresa
>>>
>>> >
>>> > Honza
>>> >>
>>> >> Thanks,
>>> >> Teresa
>>> >>
>>> >> > I think we are really looking primarily for dead parts of the
>>> >> > functions (sanity checks / error handling) that should not be visited
>>> >> > by the training run.  We can then see how to make the heuristic more
>>> >> > aggressive?
>>> >> >
>>> >> > Honza
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
