Re: [go-nuts] Shrinking pprof data for PGO through quantization.

2023-11-17 Thread 'Adam Azarchs' via golang-nuts
Thanks for the pointers!  It looks like the aggregation alone from your 
CL cuts the profile for the thing I was using it on to about a quarter of 
its original size, which is nice.  Your CL does in fact truncate the 
stacks to length 5; it just, somewhat surprisingly, does so while looping 
over the samples referencing those locations.  I somehow hadn't come 
across github.com/google/pprof/profile, which seems to do a lot of the 
work I thought I'd need to implement myself.
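
For anyone else who ends up here, a minimal round-trip with that package 
looks something like the sketch below.  The file names are made up, and 
the middle is where trimming passes would slot in:

    package main

    import (
        "log"
        "os"

        "github.com/google/pprof/profile"
    )

    func main() {
        in, err := os.Open("default.pgo") // hypothetical input path
        if err != nil {
            log.Fatal(err)
        }
        defer in.Close()

        // Parse accepts both gzip-compressed and raw profiles.
        p, err := profile.Parse(in)
        if err != nil {
            log.Fatal(err)
        }

        // ... trimming passes would go here ...

        // Compact garbage-collects locations, functions, and mappings
        // that are no longer referenced by any sample.
        p = p.Compact()

        out, err := os.Create("default.trimmed.pgo")
        if err != nil {
            log.Fatal(err)
        }
        defer out.Close()
        if err := p.Write(out); err != nil { // gzip, default level
            log.Fatal(err)
        }
    }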

I'd also suggest that it might be nice to have a version of 
profile.Write() that took a gzip level argument; anyone who bothers to 
use a tool like this to trim a profile will gladly spend the small amount 
of extra time that maximum-level compression costs.  It doesn't really 
matter to me, as I've been re-compressing with zopfli anyway, but that 
workflow is inconvenient and the gains from zopfli are very small.
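
In the meantime, a sketch of the workaround (WriteUncompressed is the 
package's existing method; the wrapper is mine):

    import (
        "compress/gzip"
        "io"

        "github.com/google/pprof/profile"
    )

    // writeBest serializes the profile without the package's
    // default-level gzip and layers BestCompression on instead.
    func writeBest(p *profile.Profile, w io.Writer) error {
        zw, err := gzip.NewWriterLevel(w, gzip.BestCompression)
        if err != nil {
            return err
        }
        if err := p.WriteUncompressed(zw); err != nil {
            zw.Close()
            return err
        }
        return zw.Close()
    }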

Anyway, that little tool solves my immediate needs, so I probably can't 
justify spending effort on polishing or upstreaming it, but it's probably 
worth doing at some point.

On Thursday, November 9, 2023 at 1:58:43 PM UTC-8 Michael Pratt wrote:

> Hi Adam,
>
> Thanks for sending this, lots of interesting ideas here!
>
> I've been meaning for a long time to have more concrete guidelines for how 
> much data collection is enough for good results, to include on 
> https://go.dev/doc/pgo.
>
> I also think it would be valuable to provide tooling that can help 
> minimize file sizes. I played with this a bit last year, with results in 
> https://go.dev/cl/449500. In this CL I experimented with a few 
> approaches: truncating stacks (as you mention, though it seems this part 
> didn't make it into the code in my CL?), as well as aggregation: I drop PCs 
> entirely since the compiler can't use them anyway, and drop line numbers 
> from leaf functions since they don't matter to our current optimizations. 
> Another thing you'll see the source of that tool do is truncate the 
> profile to a 99% CDF, i.e., drop any samples in the coldest 1% of the 
> application. This is not a lossless operation, because the compiler 
> performs its own CDF computations to determine which calls are hot, and 
> removing some of the total sample weight from the profile will probably 
> cause a few functions on the edge to no longer be considered hot. To do 
> this losslessly, I think we'd need to keep the remaining weight in the 
> profile as a single "unknown" sample and have the compiler always sort 
> that last.
>
> Some other thoughts inline below:
>
> On Thu, Nov 9, 2023 at 3:15 PM 'Adam Azarchs' via golang-nuts <
> golan...@googlegroups.com> wrote:
>
>> It's great that we have PGO support in Go now, and it's relatively easy 
>> to use.  One thing that is of mild concern to me is that typically one 
>> will be checking the default.pgo file into the repo, and git is 
>> notoriously not great at handling binary blobs that are updated 
>> frequently.  Its "diff compression" doesn't really work on compressed 
>> data, of course, which means that in a standard (non-shallow) clone that 
>> pulls down all history, over time it can contribute significantly to 
>> clone time and to the size of the `.git` directory.  Git LFS avoids the 
>> cumulative burden over time, but has its own set of problems.  At a few 
>> hundred kB each, these files aren't so concerning, but if you had, for 
>> example, an automated system that collected data from production and 
>> updated the profile data every few days (which you want to do if you're 
>> deploying new code that often), possibly for multiple executables, it 
>> starts to add up.  So far I've been recompressing the pprof data (which 
>> is automatically gzip-compressed) with zopfli, but I was thinking about 
>> making a tool to do better than that.
>>
>> Firstly, we can drop entries from the mappings table if they aren't 
>> referenced by any locations in samples.  For a Linux cgo binary this 
>> will include stuff like libc.  We can then also drop the corresponding 
>> entries from the string table.
>>
>> We can drop values for sample_types which are not of interest; that is, 
>> everything but the first sample_type that is "samples"/"count" or 
>> "cpu"/"nanoseconds".  Most profiles seem to have both, but we only need 
>> to keep one of them.
>>
>> Next, it seems like PGO ignores all but the last two stack frames. 
>> This is of course an implementation detail subject to change, but the 
>> logic for why it does this is sound, so it's probably still safe to 
>> truncate stack frames at least somewhat.  Doing so would likely permit 
>> many samples to be merged, which could significantly reduce the 
>> uncompressed size of the profile.
>>
>> A pprof profile is, for purposes of PGO,

Re: [go-nuts] Shrinking pprof data for PGO through quantization.

2023-11-09 Thread 'Michael Pratt' via golang-nuts
Hi Adam,

Thanks for sending this, lots of interesting ideas here!

I've been meaning for a long time to have more concrete guidelines for how
much data collection is enough for good results, to include on
https://go.dev/doc/pgo.

I also think it would be valuable to provide tooling that can help minimize
file sizes. I played with this a bit last year, with results in
https://go.dev/cl/449500. In this CL I experimented with a few approaches:
truncating stacks (as you mention, though it seems this part didn't make it
into the code in my CL?), as well as aggregation: I drop PCs entirely since
the compiler can't use them anyway, and drop line numbers from leaf
functions since they don't matter to our current optimizations. Another
thing you'll see the source of that tool do is truncate the profile to a
99% CDF, i.e., drop any samples in the coldest 1% of the application. This
is not a lossless operation, because the compiler performs its own CDF
computations to determine which calls are hot, and removing some of the
total sample weight from the profile will probably cause a few functions
on the edge to no longer be considered hot. To do this losslessly, I think
we'd need to keep the remaining weight in the profile as a single
"unknown" sample and have the compiler always sort that last.
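
In sketch form, the truncation looks something like this (reconstructed
from memory against the github.com/google/pprof/profile types, not the
CL's actual code; it assumes a single value column, imports "sort", and
the trailing stack-less sample is the hypothetical lossless variant
described above):

    // truncateCDF keeps the hottest samples covering frac (e.g. 0.99)
    // of the total weight, folding the remainder into one stack-less
    // sample so the total weight, and hence the compiler's own CDF
    // cutoff, would be preserved.
    func truncateCDF(p *profile.Profile, frac float64) {
        sort.Slice(p.Sample, func(i, j int) bool {
            return p.Sample[i].Value[0] > p.Sample[j].Value[0]
        })
        var total int64
        for _, s := range p.Sample {
            total += s.Value[0]
        }
        var cum int64
        cut := len(p.Sample)
        for i, s := range p.Sample {
            cum += s.Value[0]
            if float64(cum) >= frac*float64(total) {
                cut = i + 1
                break
            }
        }
        if rest := total - cum; rest > 0 {
            p.Sample = append(p.Sample[:cut],
                &profile.Sample{Value: []int64{rest}})
        } else {
            p.Sample = p.Sample[:cut]
        }
    }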

Some other thoughts inline below:

On Thu, Nov 9, 2023 at 3:15 PM 'Adam Azarchs' via golang-nuts <
golang-nuts@googlegroups.com> wrote:

> It's great that we have PGO support in Go now, and it's relatively easy to
> use.  One thing that is of mild concern to me is that typically one will
> be checking the default.pgo file into the repo, and git is notoriously not
> great at handling binary blobs that are updated frequently.  Its "diff
> compression" doesn't really work on compressed data, of course, which means
> that in a standard (non-shallow) clone that pulls down all history, over
> time it can contribute significantly to clone time and to the size of the
> `.git` directory.  Git LFS avoids the cumulative burden over time, but has
> its own set of problems.  At a few hundred kB each, these files aren't so
> concerning, but if you had, for example, an automated system that collected
> data from production and updated the profile data every few days (which you
> want to do if you're deploying new code that often), possibly for multiple
> executables, it starts to add up.  So far I've been recompressing the pprof
> data (which is automatically gzip-compressed) with zopfli, but I was
> thinking about making a tool to do better than that.
>
> Firstly, we can drop entries from the mappings table if they aren't
> referenced by any locations in samples.  For a Linux cgo binary this will
> include stuff like libc.  We can then also drop the corresponding entries
> from the string table.
>
> We can drop values for sample_types which are not of interest; that is,
> everything but the first sample_type that is "samples"/"count" or
> "cpu"/"nanoseconds".  Most profiles seem to have both, but we only need
> to keep one of them.
>
> Next, it seems like PGO ignores all but the last two stack frames.
> This is of course an implementation detail subject to change, but the
> logic for why it does this is sound, so it's probably still safe to
> truncate stack frames at least somewhat.  Doing so would likely permit
> many samples to be merged, which could significantly reduce the
> uncompressed size of the profile.
>
> A pprof profile is, for purposes of PGO, effectively a table of execution
> stacks and how often they were sampled.  If you want to get really good
> profiling data, you do as the PGO guide tells you and collect multiple
> samples and merge them, which gets you more coverage, but also makes for
> larger, more varied sample counts, which decreases the effectiveness of
> compression.  For purposes of PGO, we only care about the relative
> frequency of different code paths at a pretty coarse granularity.  There
> are two opportunities here.
>
> Normalizing and quantizing the sample counts should be possible to do with
> no significant effect on the accuracy or usefulness to PGO, and would
> improve the effectiveness of compression.  That is, you could for example
> round each sample count to the nearest power of N, and then scale them all
> so that the smallest sample count is N (where N is e.g. 2).  The effect of
> this would likely be minor, since most of the space in the profile is taken
> up by other things like the location and function tables, but it wouldn't
> hurt.
>

This sounds feasible, but, as you say, I imagine the impact on the final
size would be very small.



>
> The other, much more complicated thing we can do is merge sampled
> locations.  PGO is using the profile data to improve its guesses about
> wh

[go-nuts] Shrinking pprof data for PGO through quantization.

2023-11-09 Thread 'Adam Azarchs' via golang-nuts
It's great that we have PGO support in Go now, and it's relatively easy to 
use.  One thing that is of mild concern to me is that typically one will be 
checking the default.pgo file into the repo, and git is notoriously not 
great at handling binary blobs that are updated frequently.  Its "diff 
compression" doesn't really work on compressed data, of course, which means 
that in a standard (non-shallow) clone that pulls down all history, over 
time it can contribute significantly to clone time and to the size of the 
`.git` directory.  Git LFS avoids the cumulative burden over time, but has 
its own set of problems.  At a few hundred kB each, these files aren't so 
concerning, but if you had, for example, an automated system that collected 
data from production and updated the profile data every few days (which you 
want to do if you're deploying new code that often), possibly for multiple 
executables, it starts to add up.  So far I've been recompressing the pprof 
data (which is automatically gzip-compressed) with zopfli, but I was 
thinking about making a tool to do better than that.

Firstly, we can drop entries from the mappings table if they aren't 
referenced by any locations in samples.  For a Linux cgo binary this will 
include stuff like libc.  We can then also drop the corresponding entries 
from the string table.
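
For concreteness, here's a sketch of that pass against the 
github.com/google/pprof/profile in-memory types (the field names are that 
package's; the pass itself is just my illustration).  Re-encoding with 
that package rebuilds the string table, so unreferenced strings fall out 
automatically:

    // dropUnusedMappings removes mapping-table entries (libc, VDSO,
    // etc.) that no sampled location refers to.
    func dropUnusedMappings(p *profile.Profile) {
        used := make(map[uint64]bool)
        for _, s := range p.Sample {
            for _, loc := range s.Location {
                if loc.Mapping != nil {
                    used[loc.Mapping.ID] = true
                }
            }
        }
        kept := p.Mapping[:0]
        for _, m := range p.Mapping {
            if used[m.ID] {
                kept = append(kept, m)
            }
        }
        p.Mapping = kept
    }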

We can drop values for sample_types which are not of interest; that is, 
everything but the first sample_type that is "samples"/"count" or 
"cpu"/"nanoseconds".  Most profiles seem to have both, but we only need to 
keep one of them.
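
Sketched against the same package (the preference order is the one just 
described; the DefaultSampleType handling is a guess on my part):

    // keepOneSampleType keeps only the first value column that is
    // "samples"/"count" or "cpu"/"nanoseconds" and drops the rest.
    func keepOneSampleType(p *profile.Profile) {
        idx := -1
        for i, st := range p.SampleType {
            if (st.Type == "samples" && st.Unit == "count") ||
                (st.Type == "cpu" && st.Unit == "nanoseconds") {
                idx = i
                break
            }
        }
        if idx < 0 {
            return // nothing we recognize; leave the profile alone
        }
        p.SampleType = []*profile.ValueType{p.SampleType[idx]}
        for _, s := range p.Sample {
            s.Value = []int64{s.Value[idx]}
        }
        p.DefaultSampleType = p.SampleType[0].Type
    }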

Next, it seems like PGO ignores all but the last two stack frames. 
This is of course an implementation detail subject to change, but the logic 
for why it does this is sound, so it's probably still safe to truncate 
stack frames at least somewhat.  Doing so would likely permit many samples 
to be merged, which could significantly reduce the uncompressed size of the 
profile.
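
A sketch of that truncate-and-merge pass (imports "strconv"; Location[0] 
is the leaf frame in pprof's encoding, n = 2 matches the current compiler 
behavior described above but is an assumption, and this ignores sample 
labels, which I don't believe PGO looks at):

    // truncateAndMerge keeps only the n leaf-most frames of each stack
    // and sums the values of samples that become identical.
    func truncateAndMerge(p *profile.Profile, n int) {
        merged := make(map[string]*profile.Sample)
        out := p.Sample[:0]
        for _, s := range p.Sample {
            if len(s.Location) > n {
                s.Location = s.Location[:n]
            }
            var key string
            for _, loc := range s.Location {
                key += strconv.FormatUint(loc.ID, 10) + ","
            }
            if prev, ok := merged[key]; ok {
                for i := range prev.Value {
                    prev.Value[i] += s.Value[i]
                }
                continue // drop the now-duplicate sample
            }
            merged[key] = s
            out = append(out, s)
        }
        p.Sample = out
    }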

A pprof profile is, for purposes of PGO, effectively a table of execution 
stacks and how often they were sampled.  If you want to get really good 
profiling data, you do as the PGO guide tells you and collect multiple 
samples and merge them, which gets you more coverage, but also makes for 
larger, more varied sample counts, which decreases the effectiveness of 
compression.  For purposes of PGO, we only care about the relative 
frequency of different code paths at a pretty coarse granularity.  There 
are two opportunities here.

Normalizing and quantizing the sample counts should be possible to do with 
no significant effect on the accuracy or usefulness to PGO, and would 
improve the effectiveness of compression.  That is, you could for example 
round each sample count to the nearest power of N, and then scale them all 
so that the smallest sample count is N (where N is e.g. 2).  The effect of 
this would likely be minor, since most of the space in the profile is taken 
up by other things like the location and function tables, but it wouldn't 
hurt.
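
A sketch of that quantization for N = 2 (imports "math"; the arithmetic 
is my own and untested against the compiler's actual hotness cutoffs):

    // quantize rounds each positive count to the nearest power of two
    // in log space, then rescales so the smallest kept count becomes 2.
    // Relative frequencies survive to within a factor of about sqrt(2).
    func quantize(counts []int64) {
        const unset = -1 << 30
        minExp := 1 << 30
        exps := make([]int, len(counts))
        for i, c := range counts {
            exps[i] = unset
            if c <= 0 {
                continue
            }
            e := int(math.Round(math.Log2(float64(c))))
            exps[i] = e
            if e < minExp {
                minExp = e
            }
        }
        if minExp == 1<<30 {
            return // no positive counts
        }
        for i, e := range exps {
            if e == unset {
                continue
            }
            counts[i] = 1 << uint(e-minExp+1) // smallest maps to 2
        }
    }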

The other, much more complicated thing we can do is merge sampled 
locations.  PGO uses the profile data to improve its guesses about which 
branches are taken (including the implicit branches on type for interface 
method calls).  We generally don't actually care which specific statement 
within each branch is taking up the most time.  If there are no possible 
branches between two sampled locations, from PGO's perspective one might 
as well merge them (e.g. just drop the one with the lower sample count).  
This is more complicated to do than quantization, of course, as it 
requires control-flow analysis.

My questions for anyone who's read this far are:

   1. Would these ideas work, or am I making bad assumptions about what PGO 
   actually needs?
   2. Are there pre-existing tools for doing this kind of thing that I just 
   haven't noticed?
   3. Are there other significant opportunities for pruning the pprof data 
   in ways that wouldn't impact PGO?
   4. Would this be valuable enough to try to roll it into an option for 
   `go tool pprof -proto`?
   5. Any pointers to pre-existing tools for doing the control-flow 
   analysis bits?
