Re: [PATCH] [RFC] Making use of bitmaps for thin objects

2014-01-07 Thread Jeff King
On Mon, Jan 06, 2014 at 10:14:30PM +, Ben Maurer wrote:

 It looks like for my repo the size win wasn't as big (~10%). Is it
 possible that with the kernel test you got extremely lucky and there
 was some huge binary blob that thin packing turned into a tiny delta?

I don't think so. When I look at the reused-delta numbers, I see that we
reused ~3000 extra deltas (and the compressing progress meter drops by
the same amount. If I do a full clone (or just index-pack the result),
it claims ~3000 thin objects completed with local objects (versus 0 in
the normal case).

So I think we really are getting a lot of little savings adding up,
which makes sense. If there were thousands of changed files, a non-thin
pack has to have at least _one_ full version of each file. With thin
packs, we might literally have only deltas.

It was a 7-week period, which might make more difference. I'm going to
run some experiments with different time periods to see if that changes
anything.

It might also be the repo contents. I'm going to try my experiments on a
few different repositories. It may be that either the kernel or your
repo is unusual in some way.

Or maybe I was just lucky. :)

 When you get a chance, it'd be handy if you could push an updated
 version of your change out to your public github repo. I'd like to see
 if folks here are interested in testing this more, and it'd be good to
 make sure we're testing the diff that is targeted for upstream.

I've pushed it to:

  git://github.com/peff/git.git jk/bitmap-reuse-delta

I'll continue to rebase it forward as time goes on (until a cleaned-up
version gets merged upstream), but the tip of that branch should always
be in a working state.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [RFC] Making use of bitmaps for thin objects

2014-01-06 Thread Jeff King
On Sun, Dec 22, 2013 at 09:55:23PM +, Ben Maurer wrote:

 One issue with this approach is that it seems git-pack-index doesn't
 perform as well with thin packs. git-index-pack uses a multi-threaded
 approach to resolving the deltas. However, the multithreading only
 works on deltas that are exclusively in the pack. After the
 multi-threaded phase, it incrementally brings in objects from external
 packs, but in single threaded manner. Many objects in the pack have
 some dependency on an external object, therefore, defeating the
 multithreading.

Yes. It will also just plain perform worse, because it will have to
copy over more external objects. This is somewhat made up for getting an
actual smaller pack size, but I suspect the completed thin-pack ends up
larger than what the server would otherwise send. Because the server is
blindly reusing on-disk deltas (which is good, because it takes load off
of the server), it misses good delta opportunities between objects in
the sent pack (which are likely almost as small, but would not require
fixing on the other end).

Single-threading the extra work we have to do just exacerbates the
problem, of course.

Still, I think it will be a net win for end-to-end wall clock time of
the operation. You are saving CPU time on the server end, and you're
saving network bandwidth with a smaller pack.

In my tests on torvalds/linux, doing a fetch across a local machine (so
basically discounting network improvements), the times look like (this
is end-to-end, counting both server and client CPU time):

  [vanilla]
  real0m3.850s
  user0m7.504s
  sys 0m0.380s

  [patched]
  real0m2.785s
  user0m2.472s
  sys 0m0.180s

So it was a win both for wall-clock and CPU.

 What's the use case for a pack file with a SHA1 reference living
 inside the pack file (why not just use an offset?) Would it make sense
 to assume that all such files are external and bring them in in the
 first phase.

Once upon a time, ref-delta was the only format supported by packfiles. Later,
delta-base-offset was invented, and the client and server negotiate the
use of the feature before the packfile is generated (and even when we
reuse objects, pack-objects will rewrite the header on the fly to use
ref-delta if necessary).

These days, pretty much everybody supports delta-base-offset, so I don't
think there is any reason index-pack should see ref-delta for a non-thin
object. We could probably teach index-pack an --assume-refs-are-thin
option to optimize for this case, and have fetch-pack/receive-pack pass
it whenever they know that delta-base-offset was negotiated.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [RFC] Making use of bitmaps for thin objects

2014-01-06 Thread Jeff King
On Sun, Dec 22, 2013 at 11:47:34AM -0800, Ben Maurer wrote:

 Jeff King's bitmap branch appears to give a very substantial speedup.
 After applying this branch, the counting objects phase is basically
 free. However, I found that the compression phase still takes a
 non-trivial amount of time.

Sorry for the slow reply; I've been on vacation.

First off, I'm excited that you're looking into using bitmaps. We've
been using them for a while at GitHub, but more testing is definitely
appreciated. :)

When you build your bitmaps, do you set the pack.writeBitmapHashCache
option? We found that it makes a significant difference during the
compression phase, as otherwise git attempts deltas between random files
based on size. Here are some numbers for a simulated fetch from
torvalds/linux, representing about 7 weeks of history. Running:

  from=2d3c627502f2a9b0a7de06a5a2df2365542a72c9
  to=f0a679afefc0b6288310f88606b4bb1f243f1aa9
  run() {
(echo $to  echo ^$from) |
git pack-objects --stdout --all-progress --revs /dev/null
  }

  echo == no hash cache
  git repack -adb 2/dev/null
  time run

  echo == with hash cache
  git -c pack.writebitmaphashcache=1 repack -adb 2/dev/null
  time run

produces:

  == no hash cache
  Counting objects: 20661, done.
  Delta compression using up to 8 threads.
  Compressing objects: 100% (7700/7700), done.
  Writing objects: 100% (20661/20661), 23.23 MiB | 11.13 MiB/s, done.
  Total 20661 (delta 13884), reused 16638 (delta 12940)

  real0m3.626s
  user0m10.760s
  sys 0m0.060s

  == with hash cache
  Counting objects: 20661, done.
  Delta compression using up to 8 threads.
  Compressing objects: 100% (7700/7700), done.
  Writing objects: 100% (20661/20661), 22.64 MiB | 10.82 MiB/s, done.
  Total 20661 (delta 14038), reused 16638 (delta 12940)

  real0m3.072s
  user0m6.168s
  sys 0m0.100s

So our resulting pack shrinks a little because we find better deltas,
but note that we save a fair bit of CPU time (the wall clock time ends
up not all that different, because the single-threaded writing phase
represents a big chunk of that).

 It looks like most of the time spent compressing objects was in cases
 where the object was already compressed in the packfile, but the delta
 was based on an object that the client already had. For some reason,
 --thin wasn't enabling reuse of these deltas.

I'm not too surprised. The long-time strategy for a fetch has been to
walk down the haves and wants to their merge base. That boundary
commit is marked as a preferred base, meaning we won't send it, but
it's a good base for other objects, since we know the client has it.

Technically _all_ of the history reachable from that merge base could be
marked as a preferred base, but we don't do so for efficiency reasons:

  1. It's expensive to walk the full object graph for a small fetch, and

  2. You would clog the delta-search algorithm if you had a very large
 number of preferred-base objects.

With bitmaps, though, the history walk is free (we just check each
object against our have bitmap), so (1) is a non-issue. For (2), we
probably don't want to stick each object into the preferred-base list,
but we do want to reuse on-disk deltas we have, if we know the other
side has the base.

I don't know if you went through the same line of thinking, but that
matches your proposed solution. :)

 This is a hacky, poorly styled attempt to figure how how much better
 performance could be. With the bitmap branch, git should know what
 objects the client has already and can easily test if an existing delta
 can be reused. I don't know the branch well enough to code this, so
 as a hack, I just assumed the client has any delta base that is in
 the pack file (for our repo, this is always true, because we have a
 linear history)

Even without a linear history, it mostly works. If you are fetching all
of the branches from the other side, then you will end up with all of
the objects that the remote has. Which means that either you already
have the base, or the remote is about to send it to you.

It will break down, though, whenever the other side has something you're
not fetching. For that you really need to do the have bitmap check.

 This greatly reduces the time:
 
 $ { echo HEAD  echo ^$have; } | time ../git-bitmap/install/bin/git 
 pack-objects --use-bitmap-index --revs --stdout --thin  /dev/null
 Counting objects: 220909, done.
 Compressing objects: 100% (14203/14203), done.
 Total 220909 (delta 194050), reused 220909 (delta 199885)
 3.57user 1.28system 0:04.59elapsed 105%CPU (0avgtext+0avgdata 
 2007296maxresident)k
 0inputs+0outputs (0major+416243minor)pagefaults 0swaps

You might try with --all-progress (or pipe to wc -c), as this should
be reducing the output size, too.

Here's my same torvalds/linux test, run with the patch I'm including
below:

  Counting objects: 20661, done.
  Delta compression using up to 8 threads.
  Compressing objects: 100% (3677/3677), done.
  Writing 

Re: [PATCH] [RFC] Making use of bitmaps for thin objects

2014-01-06 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 We could probably teach index-pack an --assume-refs-are-thin
 option to optimize for this case, and have fetch-pack/receive-pack pass
 it whenever they know that delta-base-offset was negotiated.

I thought the existing negotiation merely means I understand offset
encoded bases, so you are allowed to use that encoding, not I will
not accept encoding other than the offset format, so you must use
that encoding for everything.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [RFC] Making use of bitmaps for thin objects

2014-01-06 Thread Jeff King
On Mon, Jan 06, 2014 at 08:37:49AM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  We could probably teach index-pack an --assume-refs-are-thin
  option to optimize for this case, and have fetch-pack/receive-pack pass
  it whenever they know that delta-base-offset was negotiated.
 
 I thought the existing negotiation merely means I understand offset
 encoded bases, so you are allowed to use that encoding, not I will
 not accept encoding other than the offset format, so you must use
 that encoding for everything.

You are right about what it means. But this is an optimization, not a
correctness thing. So if we assume that senders who are allowed to send
offsets will generally do so, it might be a reasonable optimization to
guess that ref-delta objects will need thin completion. If we are wrong,
the worst case is that we add an extra local object to the end of the
pack. So as long as we are right most of the time, it may still be a
win.

Of course, it may also be possible to simply multi-thread the
thin-completion portion of index-pack. That would be even better, though
I am not sure how it would work. The resolution of an object in one
thread can always become the input for another thread. But maybe we
could have each thread come up with a proposed set of objects to add to
the pack, and then drop duplicates or something. I haven't looked
closely.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [RFC] Making use of bitmaps for thin objects

2014-01-06 Thread Jeff King
On Mon, Jan 06, 2014 at 09:57:23AM -0500, Jeff King wrote:

 diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
 index c733379..0cff874 100644
 --- a/builtin/pack-objects.c
 +++ b/builtin/pack-objects.c
 @@ -1402,6 +1402,19 @@ static void check_object(struct object_entry *entry)
   base_entry-delta_child = entry;
   unuse_pack(w_curs);
   return;
 + } else if(base_ref  bitmap_have(base_ref)) {
 + entry-type = entry-in_pack_type;
 + entry-delta_size = entry-size;
 + /*
 +  * XXX we'll leak this, but it's probably OK
 +  * since we'll exit immediately after the packing
 +  * is done
 +  */
 + entry-delta = xcalloc(1, sizeof(*entry-delta));
 + hashcpy(entry-delta-idx.sha1, base_ref);
 + entry-delta-preferred_base = 1;
 + unuse_pack(w_curs);
 + return;
   }

Just reading over this again, the conditional here should obviously be
checking thin (which needs to become a global, as in your patch).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] [RFC] Making use of bitmaps for thin objects

2014-01-06 Thread Ben Maurer
 Sorry for the slow reply; I've been on vacation.

No worries.

 When you build your bitmaps, do you set the pack.writeBitmapHashCache
 option? We found that it makes a significant difference during the
 compression phase, as otherwise git attempts deltas between random files
 based on size. Here are some numbers for a simulated fetch from
 torvalds/linux, representing about 7 weeks of history. Running:

Yeah, I enabled this option. I don't have timings for generating the bitmap 
index without this option unfortunately (this is a pretty large repo, doing the 
full repack that I needed to do to generate the first version took 15+ minutes)

 Having been confused by it myself before, I think the magic keyword is
 preferred base. Once you know that, the code makes more sense. :)

Ah, nice! That was pretty confusing :-)

 Let me know how this patch does for you. My testing has been fairly
 limited so far.

This patch looks like a much cleaner version :-). Works well for me, but my 
test setup may not be great since I didn't get any errors from completely 
ignoring the haves bitmap :-). --
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [RFC] Making use of bitmaps for thin objects

2014-01-06 Thread Jeff King
On Mon, Jan 06, 2014 at 09:15:04PM +, Ben Maurer wrote:

  Let me know how this patch does for you. My testing has been fairly
  limited so far.
 
 This patch looks like a much cleaner version :-). Works well for me,
 but my test setup may not be great since I didn't get any errors from
 completely ignoring the haves bitmap :-).

Great. Out of curiosity, can you show the before/after? The timings
should be similar to what your patch produced, but I'm really curious to
see how the pack size changes.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] [RFC] Making use of bitmaps for thin objects

2014-01-06 Thread Ben Maurer
It looks like for my repo the size win wasn't as big (~10%). Is it possible 
that with the kernel test you got extremely lucky and there was some huge 
binary blob that thin packing turned into a tiny delta?

The repo I'm testing with here isn't a typical codebase -- it is used to store 
configuration information and it has very different update patterns than most 
codebases.

When you get a chance, it'd be handy if you could push an updated version of 
your change out to your public github repo. I'd like to see if folks here are 
interested in testing this more, and it'd be good to make sure we're testing 
the diff that is targeted for upstream.

Bitmap index, without thin packing:

Counting objects: 158825, done.
Delta compression using up to 32 threads.
Compressing objects: 100% (18113/18113), done.
Writing objects: 100% (158825/158825), 89.87 MiB | 11.23 MiB/s, done.
Total 158825 (delta 139493), reused 153076 (delta 135730)
real 15.60
user 34.38
sys 2.99


Bitmap index, with thin packing:

Counting objects: 158825, done.
Delta compression using up to 32 threads.
Compressing objects: 100% (12364/12364), done.
Writing objects: 100% (158825/158825), 81.35 MiB | 0 bytes/s, done.
Total 158825 (delta 135730), reused 158825 (delta 141479)
real 2.70
user 2.28
sys 0.65



From: Jeff King [p...@peff.net]
Sent: Monday, January 06, 2014 1:57 PM
To: Ben Maurer
Cc: git@vger.kernel.org
Subject: Re: [PATCH] [RFC] Making use of bitmaps for thin objects

On Mon, Jan 06, 2014 at 09:15:04PM +, Ben Maurer wrote:

  Let me know how this patch does for you. My testing has been fairly
  limited so far.

 This patch looks like a much cleaner version :-). Works well for me,
 but my test setup may not be great since I didn't get any errors from
 completely ignoring the haves bitmap :-).

Great. Out of curiosity, can you show the before/after? The timings
should be similar to what your patch produced, but I'm really curious to
see how the pack size changes.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [RFC] Making use of bitmaps for thin objects

2013-12-22 Thread Ben Maurer
I've been hacking with the performance of git on a large, quickly
changing git repo used inside Facebook. Pulling a week of changes
from this repo can be quite painful.

$ { echo HEAD  echo ^$have; } | time git pack-objects --no-use-bitmap-index 
--revs --stdout  /dev/null
Counting objects: 221082, done.
Compressing objects: 100% (20174/20174), done.
Total 221082 (delta 197889), reused 215144 (delta 194084)
86.53user 7.89system 1:12.76elapsed 129%CPU (0avgtext+0avgdata 
5746608maxresident)k
0inputs+0outputs (0major+3177262minor)pagefaults 0swaps

Jeff King's bitmap branch appears to give a very substantial speedup.
After applying this branch, the counting objects phase is basically
free. However, I found that the compression phase still takes a
non-trivial amount of time.

$ { echo HEAD  echo ^$have; } | time ../git-bitmap/install/bin/git 
pack-objects --use-bitmap-index --revs --stdout  /dev/null
Counting objects: 220909, done.
Compressing objects: 100% (20038/20038), done.
Total 220909 (delta 197794), reused 215074 (delta 194050)
47.96user 2.45system 0:28.39elapsed 177%CPU (0avgtext+0avgdata 
5890768maxresident)k
0inputs+0outputs (0major+984875minor)pagefaults 0swaps

It looks like most of the time spent compressing objects was in cases
where the object was already compressed in the packfile, but the delta
was based on an object that the client already had. For some reason,
--thin wasn't enabling reuse of these deltas.

This is a hacky, poorly styled attempt to figure how how much better
performance could be. With the bitmap branch, git should know what
objects the client has already and can easily test if an existing delta
can be reused. I don't know the branch well enough to code this, so
as a hack, I just assumed the client has any delta base that is in
the pack file (for our repo, this is always true, because we have a
linear history)

This greatly reduces the time:

$ { echo HEAD  echo ^$have; } | time ../git-bitmap/install/bin/git 
pack-objects --use-bitmap-index --revs --stdout --thin  /dev/null
Counting objects: 220909, done.
Compressing objects: 100% (14203/14203), done.
Total 220909 (delta 194050), reused 220909 (delta 199885)
3.57user 1.28system 0:04.59elapsed 105%CPU (0avgtext+0avgdata 
2007296maxresident)k
0inputs+0outputs (0major+416243minor)pagefaults 0swaps

I didn't do much testing here, I did run git-index-pack --fix-thin
then git fsck, as a sanity check.

I'd appreciate feedback here, if this is a valid approach, maybe
somebody more involved in the bitmap branch would be interested
in implementing the actual logic to figure out if the thin
revision is valid

 * I was rather confused as to how --thin works today. I could
   not figure out how the choice was made to reuse a thin delta
   in the existing code base.
 * I had a very hacky way of communicating to the pack writing code
   that there was a delta, but that it need not take that into
   account for the pruposes of sorting the file. Does anybody have
   a better suggestion here?

Signed-off-by: Ben Maurer bmau...@fb.com
---
 builtin/pack-objects.c | 26 +++---
 pack-objects.h |  1 +
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index c2f2847..3dc4411 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -53,6 +53,7 @@ static unsigned long pack_size_limit;
 static int depth = 50;
 static int delta_search_threads;
 static int pack_to_stdout;
+static int thin = 0;
 static int num_preferred_base;
 static struct progress *progress_state;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
@@ -347,7 +348,8 @@ static unsigned long write_no_reuse_object(struct sha1file 
*f, struct object_ent

 /* Return 0 if we will bust the pack-size limit */
 static unsigned long write_reuse_object(struct sha1file *f, struct 
object_entry *entry,
-   unsigned long limit, int usable_delta)
+   unsigned long limit, int usable_delta,
+   const unsigned char* thin_delta_sha1)
 {
struct packed_git *p = entry-in_pack;
struct pack_window *w_curs = NULL;
@@ -361,7 +363,11 @@ static unsigned long write_reuse_object(struct sha1file 
*f, struct object_entry
if (entry-delta)
type = (allow_ofs_delta  entry-delta-idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
-   hdrlen = encode_in_pack_object_header(type, entry-size, header);
+
+   if (thin_delta_sha1)
+   type = OBJ_REF_DELTA;
+
+   hdrlen = encode_in_pack_object_header(type, thin_delta_sha1 ? 
entry-delta_size : entry-size, header);

offset = entry-in_pack_offset;
revidx = find_pack_revindex(p, offset);
@@ -403,7 +409,7 @@ static unsigned long write_reuse_object(struct sha1file *f, 
struct object_entry
return 0;
}
sha1write(f, header, 

RE: [PATCH] [RFC] Making use of bitmaps for thin objects

2013-12-22 Thread Ben Maurer
One issue with this approach is that it seems git-pack-index doesn't perform as 
well with thin packs. git-index-pack uses a multi-threaded approach to 
resolving the deltas. However, the multithreading only works on deltas that are 
exclusively in the pack. After the multi-threaded phase, it incrementally 
brings in objects from external packs, but in single threaded manner. Many 
objects in the pack have some dependency on an external object, therefore, 
defeating the multithreading.

What's the use case for a pack file with a SHA1 reference living inside the 
pack file (why not just use an offset?) Would it make sense to assume that all 
such files are external and bring them in in the first phase.--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html