mzolotukhin added a comment.
> What kinds of would be useful to you? (How do you want the
> runtime of Clang broken down?) Are vertical slices (through all Clang's
> various layers and potentially parts of LLVM) -- as this patch will produce
> -- useful, or would you really want horizontal slices (as in, what part of
> Clang is actually spending the time)? Or just anything that's basically
> expected to be consistent from run to run, so you can identify that something
> has slowed down, even if you don't have enough information to really know
> what?
For me "something has slowed down" would be enough. I.e. even if "parse
templates" would be erroneously attributed to 90% time spent in front-end, I
would be able to see a jump from 90% to 95%. While these numbers are not
reflecting the actual ratio, they still will indicate changes.
I would find horizontal slicing more interesting, as we can get vertical slices
from profilers. I.e. if in LLVM profile I see that time spent in APInt::add has
grown a lot, I'd like to know which pass started to use it more extensively -
it's rarely the case that APInt::add slowed down itself.
> I think we need to fix the overlap issue as a prerequisite to adding timers
> with large amounts of overlap, especially self-overlap. Otherwise the numbers
> will likely do more harm than good, as they will significantly misattribute
> runtime. Fortunately, I think that should only require some relatively small
> changes to the LLVM timer infrastructure.
I definitely would support that :)
> Well, that depends on the profiler. With some profilers, you can just attach
> a profiler to your entire compilation and then ask it what the hottest
> functions were. But sure, if you have existing infrastructure built around
> -ftime-report, I can see that it makes sense to reuse that infrastructure.
Yeah, that was exactly my point.
> While LLVM may have some overlap between regions, the vertical timing slices
> are still pretty well aligned with the horizontal functionality slices. I
> expect the problems in Clang to be a lot worse. Suppose you enter N levels of
> parsing templates, and then trigger a whole bunch of template instantiation,
> AST deserialization, and code generation. Let's say that takes 1s in total.
> With this patch, I think we'd end up attributing Ns of compile time to
> "parsing templates", which is clearly very far from the truth, but will
> likely be listed as (by far) the slowest thing in the compilation. This does
> not even rise to the level of "not-perfect", it's going to be actively
> misleading in a lot of cases, and won't even necessarily point anywhere near
> the problematic spot.
> I think with this patch and no fix to the overlap problem, we will find
> ourselves frequently fielding bugs where people say "X is slow" when it
> actually isn't. Plus I think we have the opportunity to deliver something
> that's hugely useful and not much harder to achieve (providing timing
> information that relates back to the source code), and I'd prefer we spend
> our complexity budget for this feature there.
I see your point, and I agree it would be really misleading if e.g the sum of
timers won't match the total time due to self-overlaps. I think adding these
timers still might be worthwhile as my guess is that no one usually uses them
anyway unless they know what to expect from the reported timers, but I might be
mistaken here. And anyway, if you think it's possible to fix the issue first,
it's totally fine with me.
Thanks,
Michael
https://reviews.llvm.org/D43578
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits