Re: [PATCH] [RFC] Making use of bitmaps for thin objects
On Mon, Jan 06, 2014 at 10:14:30PM +, Ben Maurer wrote: It looks like for my repo the size win wasn't as big (~10%). Is it possible that with the kernel test you got extremely lucky and there was some huge binary blob that thin packing turned into a tiny delta? I don't think so. When I look at the reused-delta numbers, I see that we reused ~3000 extra deltas (and the compressing progress meter drops by the same amount. If I do a full clone (or just index-pack the result), it claims ~3000 thin objects completed with local objects (versus 0 in the normal case). So I think we really are getting a lot of little savings adding up, which makes sense. If there were thousands of changed files, a non-thin pack has to have at least _one_ full version of each file. With thin packs, we might literally have only deltas. It was a 7-week period, which might make more difference. I'm going to run some experiments with different time periods to see if that changes anything. It might also be the repo contents. I'm going to try my experiments on a few different repositories. It may be that either the kernel or your repo is unusual in some way. Or maybe I was just lucky. :) When you get a chance, it'd be handy if you could push an updated version of your change out to your public github repo. I'd like to see if folks here are interested in testing this more, and it'd be good to make sure we're testing the diff that is targeted for upstream. I've pushed it to: git://github.com/peff/git.git jk/bitmap-reuse-delta I'll continue to rebase it forward as time goes on (until a cleaned-up version gets merged upstream), but the tip of that branch should always be in a working state. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [RFC] Making use of bitmaps for thin objects
On Sun, Dec 22, 2013 at 09:55:23PM +, Ben Maurer wrote: One issue with this approach is that it seems git-pack-index doesn't perform as well with thin packs. git-index-pack uses a multi-threaded approach to resolving the deltas. However, the multithreading only works on deltas that are exclusively in the pack. After the multi-threaded phase, it incrementally brings in objects from external packs, but in single threaded manner. Many objects in the pack have some dependency on an external object, therefore, defeating the multithreading. Yes. It will also just plain perform worse, because it will have to copy over more external objects. This is somewhat made up for getting an actual smaller pack size, but I suspect the completed thin-pack ends up larger than what the server would otherwise send. Because the server is blindly reusing on-disk deltas (which is good, because it takes load off of the server), it misses good delta opportunities between objects in the sent pack (which are likely almost as small, but would not require fixing on the other end). Single-threading the extra work we have to do just exacerbates the problem, of course. Still, I think it will be a net win for end-to-end wall clock time of the operation. You are saving CPU time on the server end, and you're saving network bandwidth with a smaller pack. In my tests on torvalds/linux, doing a fetch across a local machine (so basically discounting network improvements), the times look like (this is end-to-end, counting both server and client CPU time): [vanilla] real0m3.850s user0m7.504s sys 0m0.380s [patched] real0m2.785s user0m2.472s sys 0m0.180s So it was a win both for wall-clock and CPU. What's the use case for a pack file with a SHA1 reference living inside the pack file (why not just use an offset?) Would it make sense to assume that all such files are external and bring them in in the first phase. Once upon a time, ref-delta was the only format supported by packfiles. Later, delta-base-offset was invented, and the client and server negotiate the use of the feature before the packfile is generated (and even when we reuse objects, pack-objects will rewrite the header on the fly to use ref-delta if necessary). These days, pretty much everybody supports delta-base-offset, so I don't think there is any reason index-pack should see ref-delta for a non-thin object. We could probably teach index-pack an --assume-refs-are-thin option to optimize for this case, and have fetch-pack/receive-pack pass it whenever they know that delta-base-offset was negotiated. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [RFC] Making use of bitmaps for thin objects
On Sun, Dec 22, 2013 at 11:47:34AM -0800, Ben Maurer wrote: Jeff King's bitmap branch appears to give a very substantial speedup. After applying this branch, the counting objects phase is basically free. However, I found that the compression phase still takes a non-trivial amount of time. Sorry for the slow reply; I've been on vacation. First off, I'm excited that you're looking into using bitmaps. We've been using them for a while at GitHub, but more testing is definitely appreciated. :) When you build your bitmaps, do you set the pack.writeBitmapHashCache option? We found that it makes a significant difference during the compression phase, as otherwise git attempts deltas between random files based on size. Here are some numbers for a simulated fetch from torvalds/linux, representing about 7 weeks of history. Running: from=2d3c627502f2a9b0a7de06a5a2df2365542a72c9 to=f0a679afefc0b6288310f88606b4bb1f243f1aa9 run() { (echo $to echo ^$from) | git pack-objects --stdout --all-progress --revs /dev/null } echo == no hash cache git repack -adb 2/dev/null time run echo == with hash cache git -c pack.writebitmaphashcache=1 repack -adb 2/dev/null time run produces: == no hash cache Counting objects: 20661, done. Delta compression using up to 8 threads. Compressing objects: 100% (7700/7700), done. Writing objects: 100% (20661/20661), 23.23 MiB | 11.13 MiB/s, done. Total 20661 (delta 13884), reused 16638 (delta 12940) real0m3.626s user0m10.760s sys 0m0.060s == with hash cache Counting objects: 20661, done. Delta compression using up to 8 threads. Compressing objects: 100% (7700/7700), done. Writing objects: 100% (20661/20661), 22.64 MiB | 10.82 MiB/s, done. Total 20661 (delta 14038), reused 16638 (delta 12940) real0m3.072s user0m6.168s sys 0m0.100s So our resulting pack shrinks a little because we find better deltas, but note that we save a fair bit of CPU time (the wall clock time ends up not all that different, because the single-threaded writing phase represents a big chunk of that). It looks like most of the time spent compressing objects was in cases where the object was already compressed in the packfile, but the delta was based on an object that the client already had. For some reason, --thin wasn't enabling reuse of these deltas. I'm not too surprised. The long-time strategy for a fetch has been to walk down the haves and wants to their merge base. That boundary commit is marked as a preferred base, meaning we won't send it, but it's a good base for other objects, since we know the client has it. Technically _all_ of the history reachable from that merge base could be marked as a preferred base, but we don't do so for efficiency reasons: 1. It's expensive to walk the full object graph for a small fetch, and 2. You would clog the delta-search algorithm if you had a very large number of preferred-base objects. With bitmaps, though, the history walk is free (we just check each object against our have bitmap), so (1) is a non-issue. For (2), we probably don't want to stick each object into the preferred-base list, but we do want to reuse on-disk deltas we have, if we know the other side has the base. I don't know if you went through the same line of thinking, but that matches your proposed solution. :) This is a hacky, poorly styled attempt to figure how how much better performance could be. With the bitmap branch, git should know what objects the client has already and can easily test if an existing delta can be reused. I don't know the branch well enough to code this, so as a hack, I just assumed the client has any delta base that is in the pack file (for our repo, this is always true, because we have a linear history) Even without a linear history, it mostly works. If you are fetching all of the branches from the other side, then you will end up with all of the objects that the remote has. Which means that either you already have the base, or the remote is about to send it to you. It will break down, though, whenever the other side has something you're not fetching. For that you really need to do the have bitmap check. This greatly reduces the time: $ { echo HEAD echo ^$have; } | time ../git-bitmap/install/bin/git pack-objects --use-bitmap-index --revs --stdout --thin /dev/null Counting objects: 220909, done. Compressing objects: 100% (14203/14203), done. Total 220909 (delta 194050), reused 220909 (delta 199885) 3.57user 1.28system 0:04.59elapsed 105%CPU (0avgtext+0avgdata 2007296maxresident)k 0inputs+0outputs (0major+416243minor)pagefaults 0swaps You might try with --all-progress (or pipe to wc -c), as this should be reducing the output size, too. Here's my same torvalds/linux test, run with the patch I'm including below: Counting objects: 20661, done. Delta compression using up to 8 threads. Compressing objects: 100% (3677/3677), done. Writing
Re: [PATCH] [RFC] Making use of bitmaps for thin objects
Jeff King p...@peff.net writes: We could probably teach index-pack an --assume-refs-are-thin option to optimize for this case, and have fetch-pack/receive-pack pass it whenever they know that delta-base-offset was negotiated. I thought the existing negotiation merely means I understand offset encoded bases, so you are allowed to use that encoding, not I will not accept encoding other than the offset format, so you must use that encoding for everything. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [RFC] Making use of bitmaps for thin objects
On Mon, Jan 06, 2014 at 08:37:49AM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: We could probably teach index-pack an --assume-refs-are-thin option to optimize for this case, and have fetch-pack/receive-pack pass it whenever they know that delta-base-offset was negotiated. I thought the existing negotiation merely means I understand offset encoded bases, so you are allowed to use that encoding, not I will not accept encoding other than the offset format, so you must use that encoding for everything. You are right about what it means. But this is an optimization, not a correctness thing. So if we assume that senders who are allowed to send offsets will generally do so, it might be a reasonable optimization to guess that ref-delta objects will need thin completion. If we are wrong, the worst case is that we add an extra local object to the end of the pack. So as long as we are right most of the time, it may still be a win. Of course, it may also be possible to simply multi-thread the thin-completion portion of index-pack. That would be even better, though I am not sure how it would work. The resolution of an object in one thread can always become the input for another thread. But maybe we could have each thread come up with a proposed set of objects to add to the pack, and then drop duplicates or something. I haven't looked closely. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [RFC] Making use of bitmaps for thin objects
On Mon, Jan 06, 2014 at 09:57:23AM -0500, Jeff King wrote: diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index c733379..0cff874 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1402,6 +1402,19 @@ static void check_object(struct object_entry *entry) base_entry-delta_child = entry; unuse_pack(w_curs); return; + } else if(base_ref bitmap_have(base_ref)) { + entry-type = entry-in_pack_type; + entry-delta_size = entry-size; + /* + * XXX we'll leak this, but it's probably OK + * since we'll exit immediately after the packing + * is done + */ + entry-delta = xcalloc(1, sizeof(*entry-delta)); + hashcpy(entry-delta-idx.sha1, base_ref); + entry-delta-preferred_base = 1; + unuse_pack(w_curs); + return; } Just reading over this again, the conditional here should obviously be checking thin (which needs to become a global, as in your patch). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] [RFC] Making use of bitmaps for thin objects
Sorry for the slow reply; I've been on vacation. No worries. When you build your bitmaps, do you set the pack.writeBitmapHashCache option? We found that it makes a significant difference during the compression phase, as otherwise git attempts deltas between random files based on size. Here are some numbers for a simulated fetch from torvalds/linux, representing about 7 weeks of history. Running: Yeah, I enabled this option. I don't have timings for generating the bitmap index without this option unfortunately (this is a pretty large repo, doing the full repack that I needed to do to generate the first version took 15+ minutes) Having been confused by it myself before, I think the magic keyword is preferred base. Once you know that, the code makes more sense. :) Ah, nice! That was pretty confusing :-) Let me know how this patch does for you. My testing has been fairly limited so far. This patch looks like a much cleaner version :-). Works well for me, but my test setup may not be great since I didn't get any errors from completely ignoring the haves bitmap :-). -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [RFC] Making use of bitmaps for thin objects
On Mon, Jan 06, 2014 at 09:15:04PM +, Ben Maurer wrote: Let me know how this patch does for you. My testing has been fairly limited so far. This patch looks like a much cleaner version :-). Works well for me, but my test setup may not be great since I didn't get any errors from completely ignoring the haves bitmap :-). Great. Out of curiosity, can you show the before/after? The timings should be similar to what your patch produced, but I'm really curious to see how the pack size changes. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] [RFC] Making use of bitmaps for thin objects
It looks like for my repo the size win wasn't as big (~10%). Is it possible that with the kernel test you got extremely lucky and there was some huge binary blob that thin packing turned into a tiny delta? The repo I'm testing with here isn't a typical codebase -- it is used to store configuration information and it has very different update patterns than most codebases. When you get a chance, it'd be handy if you could push an updated version of your change out to your public github repo. I'd like to see if folks here are interested in testing this more, and it'd be good to make sure we're testing the diff that is targeted for upstream. Bitmap index, without thin packing: Counting objects: 158825, done. Delta compression using up to 32 threads. Compressing objects: 100% (18113/18113), done. Writing objects: 100% (158825/158825), 89.87 MiB | 11.23 MiB/s, done. Total 158825 (delta 139493), reused 153076 (delta 135730) real 15.60 user 34.38 sys 2.99 Bitmap index, with thin packing: Counting objects: 158825, done. Delta compression using up to 32 threads. Compressing objects: 100% (12364/12364), done. Writing objects: 100% (158825/158825), 81.35 MiB | 0 bytes/s, done. Total 158825 (delta 135730), reused 158825 (delta 141479) real 2.70 user 2.28 sys 0.65 From: Jeff King [p...@peff.net] Sent: Monday, January 06, 2014 1:57 PM To: Ben Maurer Cc: git@vger.kernel.org Subject: Re: [PATCH] [RFC] Making use of bitmaps for thin objects On Mon, Jan 06, 2014 at 09:15:04PM +, Ben Maurer wrote: Let me know how this patch does for you. My testing has been fairly limited so far. This patch looks like a much cleaner version :-). Works well for me, but my test setup may not be great since I didn't get any errors from completely ignoring the haves bitmap :-). Great. Out of curiosity, can you show the before/after? The timings should be similar to what your patch produced, but I'm really curious to see how the pack size changes. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [RFC] Making use of bitmaps for thin objects
I've been hacking with the performance of git on a large, quickly changing git repo used inside Facebook. Pulling a week of changes from this repo can be quite painful. $ { echo HEAD echo ^$have; } | time git pack-objects --no-use-bitmap-index --revs --stdout /dev/null Counting objects: 221082, done. Compressing objects: 100% (20174/20174), done. Total 221082 (delta 197889), reused 215144 (delta 194084) 86.53user 7.89system 1:12.76elapsed 129%CPU (0avgtext+0avgdata 5746608maxresident)k 0inputs+0outputs (0major+3177262minor)pagefaults 0swaps Jeff King's bitmap branch appears to give a very substantial speedup. After applying this branch, the counting objects phase is basically free. However, I found that the compression phase still takes a non-trivial amount of time. $ { echo HEAD echo ^$have; } | time ../git-bitmap/install/bin/git pack-objects --use-bitmap-index --revs --stdout /dev/null Counting objects: 220909, done. Compressing objects: 100% (20038/20038), done. Total 220909 (delta 197794), reused 215074 (delta 194050) 47.96user 2.45system 0:28.39elapsed 177%CPU (0avgtext+0avgdata 5890768maxresident)k 0inputs+0outputs (0major+984875minor)pagefaults 0swaps It looks like most of the time spent compressing objects was in cases where the object was already compressed in the packfile, but the delta was based on an object that the client already had. For some reason, --thin wasn't enabling reuse of these deltas. This is a hacky, poorly styled attempt to figure how how much better performance could be. With the bitmap branch, git should know what objects the client has already and can easily test if an existing delta can be reused. I don't know the branch well enough to code this, so as a hack, I just assumed the client has any delta base that is in the pack file (for our repo, this is always true, because we have a linear history) This greatly reduces the time: $ { echo HEAD echo ^$have; } | time ../git-bitmap/install/bin/git pack-objects --use-bitmap-index --revs --stdout --thin /dev/null Counting objects: 220909, done. Compressing objects: 100% (14203/14203), done. Total 220909 (delta 194050), reused 220909 (delta 199885) 3.57user 1.28system 0:04.59elapsed 105%CPU (0avgtext+0avgdata 2007296maxresident)k 0inputs+0outputs (0major+416243minor)pagefaults 0swaps I didn't do much testing here, I did run git-index-pack --fix-thin then git fsck, as a sanity check. I'd appreciate feedback here, if this is a valid approach, maybe somebody more involved in the bitmap branch would be interested in implementing the actual logic to figure out if the thin revision is valid * I was rather confused as to how --thin works today. I could not figure out how the choice was made to reuse a thin delta in the existing code base. * I had a very hacky way of communicating to the pack writing code that there was a delta, but that it need not take that into account for the pruposes of sorting the file. Does anybody have a better suggestion here? Signed-off-by: Ben Maurer bmau...@fb.com --- builtin/pack-objects.c | 26 +++--- pack-objects.h | 1 + 2 files changed, 20 insertions(+), 7 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index c2f2847..3dc4411 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -53,6 +53,7 @@ static unsigned long pack_size_limit; static int depth = 50; static int delta_search_threads; static int pack_to_stdout; +static int thin = 0; static int num_preferred_base; static struct progress *progress_state; static int pack_compression_level = Z_DEFAULT_COMPRESSION; @@ -347,7 +348,8 @@ static unsigned long write_no_reuse_object(struct sha1file *f, struct object_ent /* Return 0 if we will bust the pack-size limit */ static unsigned long write_reuse_object(struct sha1file *f, struct object_entry *entry, - unsigned long limit, int usable_delta) + unsigned long limit, int usable_delta, + const unsigned char* thin_delta_sha1) { struct packed_git *p = entry-in_pack; struct pack_window *w_curs = NULL; @@ -361,7 +363,11 @@ static unsigned long write_reuse_object(struct sha1file *f, struct object_entry if (entry-delta) type = (allow_ofs_delta entry-delta-idx.offset) ? OBJ_OFS_DELTA : OBJ_REF_DELTA; - hdrlen = encode_in_pack_object_header(type, entry-size, header); + + if (thin_delta_sha1) + type = OBJ_REF_DELTA; + + hdrlen = encode_in_pack_object_header(type, thin_delta_sha1 ? entry-delta_size : entry-size, header); offset = entry-in_pack_offset; revidx = find_pack_revindex(p, offset); @@ -403,7 +409,7 @@ static unsigned long write_reuse_object(struct sha1file *f, struct object_entry return 0; } sha1write(f, header,
RE: [PATCH] [RFC] Making use of bitmaps for thin objects
One issue with this approach is that it seems git-pack-index doesn't perform as well with thin packs. git-index-pack uses a multi-threaded approach to resolving the deltas. However, the multithreading only works on deltas that are exclusively in the pack. After the multi-threaded phase, it incrementally brings in objects from external packs, but in single threaded manner. Many objects in the pack have some dependency on an external object, therefore, defeating the multithreading. What's the use case for a pack file with a SHA1 reference living inside the pack file (why not just use an offset?) Would it make sense to assume that all such files are external and bring them in in the first phase.-- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html