> 
> I see, yes LTO can deal with this better since it has global
> information. In non-LTO mode (including LIPO) we have the issue.

Either Martin or me will implement merging of the multiple copies at
LTO link time.  This is needed for Martin's code unification patch anyway.

Theoretically gcov runtime can also have symbol names and cfg checksums of
comdats in the static data and at exit produce buckets based on matching
names+checksums+counter counts, merge all data into in each bucket to one
representative by the existing merging routines and then memcpy them to
all the oriignal copiles.  This way all compilation units will receive same
results.

I am not very keen about making gcov runtime bigger and more complex than it
needs to be, but having sane profile for comdats seems quite important.
Perhaps, in GNU toolchain, ordered subsections can be used to make linker to
produce ordered list of comdats, so the runtime won't need to do hashing +
lookups.

Honza
> 
> I take it gimp is built with LTO and therefore shouldn't be hitting
> this comdat issue?
> 
> Let me do a couple things:
> - port over my comdat inlining fix from the google branch to trunk and
> send it for review. If you or Martin could try it to see if it helps
> with function splitting to avoid the hits from the cold code that
> would be great
> - I'll add some new sanity checking to try to detect non-zero blocks
> in the cold section, or 0 blocks reached by non-zero edges and see if
> I can flush out any problems with my tests or a profiledbootstrap or
> gimp.
> - I'll try building and profiling gimp myself to see if I can
> reproduce the issue with code executing out of the cold section.
> 
> Thanks,
> Teresa
> 
> >>
> >> Also, can you send me reproduction instructions for gimp? I don't
> >> think I need Martin's patch, but which version of gimp and what is the
> >> equivalent way for me to train it? I have some scripts to generate a
> >> similar type of instruction heat map graph that I have been using to
> >> tune partitioning and function reordering. Essentially it uses linux
> >> perf to sample on instructions_retired and then munge the data in
> >> several ways to produce various stats and graphs. One thing that has
> >> been useful has been to combine the perf data with nm output to
> >> determine which cold functions are being executed at runtime.
> >
> > Martin?
> >
> >>
> >> However, for this to tell me which split cold bbs are being executed I
> >> need to use a patch that Sri sent for review several months back that
> >> gives the split cold section its own name:
> >>   http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01571.html
> >> Steven had some follow up comments that Sri hasn't had a chance to address 
> >> yet:
> >>   http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00798.html
> >> (cc'ing Sri as we should probably revive this patch soon to address
> >> gdb and other issues with detecting split functions properly)
> >
> > Intresting, I used linker script for this purposes, but that his GNU ld 
> > only...
> >
> > Honza
> >>
> >> Thanks!
> >> Teresa
> >>
> >> >
> >> > Honza
> >> >>
> >> >> Thanks,
> >> >> Teresa
> >> >>
> >> >> > I think we are really looking primarily for dead parts of the 
> >> >> > functions (sanity checks/error handling)
> >> >> > that should not be visited by train run.  We can then see how to make 
> >> >> > the heuristic more aggressive?
> >> >> >
> >> >> > Honza
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
> >>
> >>
> >>
> >> --
> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Reply via email to