Re: [PATCH v7 08/13] pack-objects: shrink z_delta_size field in struct object_entry

2018-03-31 Thread Jeff King
On Sat, Mar 31, 2018 at 06:40:23AM +0200, Duy Nguyen wrote:

> > Unlike the depth, I don't think there's any _inherent_ reason you
> > couldn't throw, say, 1MB deltas into the cache (if you sized it large
> > enough). But I doubt such deltas are really all that common. Here are
> > the top 10 in linux.git:
> >
> >   $ git cat-file --batch-all-objects \
> >       --batch-check='%(deltabase) %(objectsize:disk)' |
> >     grep -v ^0 | sort -k 2nr | head
> >   a02b6794337286bc12c907c33d5d75537c240bd0 769103
> >   b28d4b64c05da02c5e8c684dcb9422876225ebdc 327116
> >   1e98ce86ed19aff9ba721d13a749ff08088c9922 325257
> >   a02b6794337286bc12c907c33d5d75537c240bd0 240647
> >   c550d99286c01867dfb26e432417f3106acf8611 177896
> >   5977795854f852c2b95dd023fd03cace023ee41c 119737
> >   4ccf9681c45d01d17376f7e0d266532a4460f5f8 112671
> >   b39fb6821faa9e7bc36de738152a2817b4bf3654 112657
> >   2645d6239b74bebd661436762e819b831095b084 103980
> >   b8ce7fe5d8def58dc63b7ae099eff7bd07e4e845 101014
> >
> > It's possible some weird workload would want to tweak this. Say you were
> > storing a ton of delta-capable files that were big and always differed
> > by a megabyte. And it was somehow really important to you to trade off
> > memory for CPU during the write phase of a pack.
> 
> We're not short on spare bits so I will try to raise this limit to 1MB
> (not because you mentioned 1MB, but because the largest size in your
> output is close to 1MB).

I doubt it matters much. Unless somebody has been tweaking the config
themselves, this has been limited to 1000 for everybody running
linux.git and nobody has ever noticed.

So I think it would only be an issue if:

  1. you had an oddball repo with gigantic deltas

AND

  2. you for some reason really cared about caching the deltas between
     phases

AND

  3. you had done enough homework to even figure out that this knob
     existed

I was thinking that you might care about (2) for serving fetches of your
oddball repository. But really, if you care about minimizing work, you
want to be reusing on-disk deltas anyway, which would skip this cache
entirely. So any work we do to reproduce the delta would probably be
dwarfed by the finding of this giant delta in the first place.
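
To make the ordering in that argument concrete, here is a minimal sketch
of how the write phase might pick where a delta comes from. The enum, the
function name, and the two flags are all hypothetical and are not taken
from builtin/pack-objects.c; the only point is that a reusable on-disk
delta wins before the cache is ever consulted.

```c
/* Hypothetical names, for illustration only. */
enum delta_source {
	DELTA_REUSE_ON_DISK,	/* copy the existing packed delta as-is */
	DELTA_FROM_CACHE,	/* delta kept in memory since the search phase */
	DELTA_RECOMPUTE		/* run the delta algorithm again at write time */
};

/*
 * Decide where the bytes for a delta come from when writing the pack.
 * A reusable on-disk delta is checked first, so in that case the
 * in-memory cache (and its size limit) never comes into play.
 */
static enum delta_source pick_delta_source(int reusable_on_disk, int in_cache)
{
	if (reusable_on_disk)
		return DELTA_REUSE_ON_DISK;
	if (in_cache)
		return DELTA_FROM_CACHE;
	return DELTA_RECOMPUTE;
}
```

Only the last case pays the CPU cost again, and only for deltas that
were both freshly found and too big for the cache.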

So raise the limit if you want, but I'd be surprised if anybody was even
doing (3) in the first place.

-Peff


Re: [PATCH v7 08/13] pack-objects: shrink z_delta_size field in struct object_entry

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 10:59 PM, Jeff King wrote:
> On Sat, Mar 24, 2018 at 07:33:48AM +0100, Nguyễn Thái Ngọc Duy wrote:
>
>> We only cache a delta when it is smaller than a certain limit. This
>> limit defaults to 1000, but we save the delta's compressed length in a
>> 64-bit field. Shrink that field down to 16 bits, so you can only cache
>> 65kb deltas. Larger deltas must be recomputed when the pack is written
>> down.
>
> Unlike the depth, I don't think there's any _inherent_ reason you
> couldn't throw, say, 1MB deltas into the cache (if you sized it large
> enough). But I doubt such deltas are really all that common. Here are
> the top 10 in linux.git:
>
>   $ git cat-file --batch-all-objects \
>       --batch-check='%(deltabase) %(objectsize:disk)' |
>     grep -v ^0 | sort -k 2nr | head
>   a02b6794337286bc12c907c33d5d75537c240bd0 769103
>   b28d4b64c05da02c5e8c684dcb9422876225ebdc 327116
>   1e98ce86ed19aff9ba721d13a749ff08088c9922 325257
>   a02b6794337286bc12c907c33d5d75537c240bd0 240647
>   c550d99286c01867dfb26e432417f3106acf8611 177896
>   5977795854f852c2b95dd023fd03cace023ee41c 119737
>   4ccf9681c45d01d17376f7e0d266532a4460f5f8 112671
>   b39fb6821faa9e7bc36de738152a2817b4bf3654 112657
>   2645d6239b74bebd661436762e819b831095b084 103980
>   b8ce7fe5d8def58dc63b7ae099eff7bd07e4e845 101014
>
> It's possible some weird workload would want to tweak this. Say you were
> storing a ton of delta-capable files that were big and always differed
> by a megabyte. And it was somehow really important to you to trade off
> memory for CPU during the write phase of a pack.

We're not short on spare bits so I will try to raise this limit to 1MB
(not because you mentioned 1MB, but because the largest size in your
output is close to 1MB).
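
As a quick sanity check on the bit budget, the helper below (a
hypothetical name, not something in git) counts how many bits an unsigned
field needs to hold a given size. By that count the largest delta in the
list above (769103 bytes) needs 20 bits, and a 20-bit field tops out at
1048575, i.e. just under 1MB.

```c
/*
 * Number of bits an unsigned bitfield needs in order to store v.
 * Zero is treated as needing one bit. Hypothetical helper for
 * illustration; not part of git.
 */
static unsigned bits_needed(unsigned long v)
{
	unsigned bits = 1;

	while (v >>= 1)
		bits++;
	return bits;
}
```

So a 16-bit field covers sizes up to 65535, and going to 20 bits is
enough for everything in that top-10 list.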
-- 
Duy


Re: [PATCH v7 08/13] pack-objects: shrink z_delta_size field in struct object_entry

2018-03-30 Thread Jeff King
On Sat, Mar 24, 2018 at 07:33:48AM +0100, Nguyễn Thái Ngọc Duy wrote:

> We only cache a delta when it is smaller than a certain limit. This
> limit defaults to 1000, but we save the delta's compressed length in a
> 64-bit field. Shrink that field down to 16 bits, so you can only cache
> 65kb deltas. Larger deltas must be recomputed when the pack is written
> down.

Unlike the depth, I don't think there's any _inherent_ reason you
couldn't throw, say, 1MB deltas into the cache (if you sized it large
enough). But I doubt such deltas are really all that common. Here are
the top 10 in linux.git:

  $ git cat-file --batch-all-objects \
      --batch-check='%(deltabase) %(objectsize:disk)' |
    grep -v ^0 | sort -k 2nr | head
  a02b6794337286bc12c907c33d5d75537c240bd0 769103
  b28d4b64c05da02c5e8c684dcb9422876225ebdc 327116
  1e98ce86ed19aff9ba721d13a749ff08088c9922 325257
  a02b6794337286bc12c907c33d5d75537c240bd0 240647
  c550d99286c01867dfb26e432417f3106acf8611 177896
  5977795854f852c2b95dd023fd03cace023ee41c 119737
  4ccf9681c45d01d17376f7e0d266532a4460f5f8 112671
  b39fb6821faa9e7bc36de738152a2817b4bf3654 112657
  2645d6239b74bebd661436762e819b831095b084 103980
  b8ce7fe5d8def58dc63b7ae099eff7bd07e4e845 101014

It's possible some weird workload would want to tweak this. Say you were
storing a ton of delta-capable files that were big and always differed
by a megabyte. And it was somehow really important to you to trade off
memory for CPU during the write phase of a pack.

That seems pretty unlikely to bite anybody (and that was the best I
could come up with as a devil's advocate against it).
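
For readers following along, the scheme in the quoted commit message can
be sketched roughly as below. This is a simplified illustration, not the
patch itself: the struct, the macro, and the function name are made up,
and it uses a single size for both the limit check and the stored length,
which the real code may not do.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct object_entry; only the relevant bits. */
struct entry_sketch {
	void *delta_data;		/* cached delta, or NULL if not cached */
	unsigned z_delta_size : 16;	/* size of the cached delta */
};

/* Largest value a 16-bit field can hold. */
#define Z_DELTA_SIZE_MAX ((1UL << 16) - 1)

/*
 * Try to keep a delta in the cache. Returns 1 if it was cached, or 0 if
 * it is at or over the configured limit or does not fit in the 16-bit
 * field, in which case the caller has to recompute it when the pack is
 * written.
 */
static int cache_delta_sketch(struct entry_sketch *e, void *delta,
			      unsigned long size, unsigned long cache_limit)
{
	if (size >= cache_limit || size > Z_DELTA_SIZE_MAX) {
		e->delta_data = NULL;
		e->z_delta_size = 0;
		return 0;
	}
	e->delta_data = delta;
	e->z_delta_size = (unsigned)size;
	return 1;
}
```

With the default limit of 1000 the 16-bit ceiling is never reached; it
only starts to matter once somebody raises the limit past 65535.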

> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  Documentation/config.txt |  3 ++-
>  builtin/pack-objects.c   | 22 ++++++++++++++++------
>  pack-objects.h           |  3 ++-
>  3 files changed, 20 insertions(+), 8 deletions(-)

Patch looks OK.

-Peff