Re: [PATCH] pack-objects: disable pack reuse for object-selection options
Jeff Kingwrites: > On Tue, May 09, 2017 at 11:48:17AM +0900, Junio C Hamano wrote: > >> > I guess what I was asking was: do you still think it was unclear, or do >> > you think you were just being dense? >> > >> > I don't feel like I gave any information in the follow-on explanation >> > that wasn't in the commit message, so I wasn't clear if I worded it >> > better or if it just sunk in better. >> >> At least, "the current code is buggy when --local and friend are >> given and includes needless objects in the result" was something I >> learned only during the discussion, and would never have guessed by >> reading the log message. The second paragraph does talk about "This >> bug has been present since...", but the first paragraph does not say >> anything about the current output being broken. > > While waiting for your response I took a look to see if I could improve > it and came to the same conclusion. The result is below. Looks good to me. I really like how the third-paragraph reasons about pros and cons and decides to just disable the codepath. I see this as an example of omitting something you know so well as "too obvious", and it turns out that it isn't so obvious to others; I commit the same sin all the time myself. Catching instances of these is part of the review process. Thanks. > -- >8 -- > Subject: pack-objects: disable pack reuse for object-selection options > > If certain options like --honor-pack-keep, --local, or > --incremental are used with pack-objects, then we need to > feed each potential object to want_object_in_pack() to see > if it should be filtered out. But when the bitmap > reuse_packfile optimization is in effect, we do not call > that function at all, and in fact skip adding the objects to > the to_pack list entirely. This means we have a bug: for > certain requests we will silently ignore those options and > include objects in that pack that should not be there. > > The problem has been present since the inception of the > pack-reuse code in 6b8fda2db (pack-objects: use bitmaps when > packing objects, 2013-12-21), but it was unlikely to come up > in practice. These options are generally used for on-disk > packing, not transfer packs (which go to stdout), but we've > never allowed pack reuse for non-stdout packs (until > 645c432d6, we did not even use bitmaps, which the reuse > optimization relies on; after that, we explicitly turned it > off when not packing to stdout). > > We can fix this by just disabling the reuse_packfile > optimization when the options are in use. In theory we could > teach the pack-reuse code to satisfy these checks, but it's > not worth the complexity. The purpose of the optimization is > to keep the amount of per-object work we do to a minimum. > But these options inherently require us to search for other > copies of each object, drowning out any benefit of the > pack-reuse optimization. But note that the optimizations > from 56dfeb626 (pack-objects: compute local/ignore_pack_keep > early, 2016-07-29) happen before pack-reuse, meaning that > specifying "--honor-pack-keep" in a repository with no .keep > files can still follow the fast path. > > There are tests in t5310 that check these options with > bitmaps and --stdout, but they didn't catch the bug, and > it's hard to adapt them to do so. > > One problem is that they don't use --delta-base-offset; > without that option, we always disable the reuse > optimization entirely. It would be fine to add it in (it > actually makes the test more realistic), but that still > isn't quite enough. > > The other problem is that the reuse code is very picky; it > only kicks in when it can reuse most of a pack, starting > from the first byte. So we'd have to start from a fully > repacked and bitmapped state to trigger it. But the tests > for these options use a much more subtle state; they want to > be sure that the want_object_in_pack() code is allowing some > objects but not others. Doing a full repack runs counter to > that. > > So this patch adds new tests at the end of the script which > create the fully-packed state and make sure that each option > is not fooled by reusable pack. > > Signed-off-by: Jeff King
Re: [PATCH] pack-objects: disable pack reuse for object-selection options
On Mon, May 08, 2017 at 10:54:12PM -0400, Jeff King wrote: > Contents are the same. I decided to leave the "; do" as it > matches the rest of the script. If we're going to fix it, we > should do them all. Like this, perhaps. I didn't go on a crusade against "; do" in the other scripts, but perhaps that is low-hanging fruit that somebody else might want to pick. -- >8 -- Subject: [PATCH] t5310: fix "; do" style Our usual shell style is to put the "do" of a loop on its own line, like: while $cond do something done instead of: while $cond; do something done We have a bit of both in our code base, but the former is what's in CodingGuidelines (and outnumbers the latter in t/ by about 6:1). Signed-off-by: Jeff King--- t/t5310-pack-bitmaps.sh | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index c3ddfa217..20e2473a0 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -20,11 +20,13 @@ has_any () { } test_expect_success 'setup repo with moderate-sized history' ' - for i in $(test_seq 1 10); do + for i in $(test_seq 1 10) + do test_commit $i done && git checkout -b other HEAD~5 && - for i in $(test_seq 1 10); do + for i in $(test_seq 1 10) + do test_commit side-$i done && git checkout master && @@ -104,7 +106,8 @@ test_expect_success 'clone from bitmapped repository' ' ' test_expect_success 'setup further non-bitmapped commits' ' - for i in $(test_seq 1 10); do + for i in $(test_seq 1 10) + do test_commit further-$i done ' @@ -300,7 +303,8 @@ test_expect_success 'set up reusable pack' ' test_expect_success 'pack reuse respects --honor-pack-keep' ' test_when_finished "rm -f .git/objects/pack/*.keep" && - for i in .git/objects/pack/*.pack; do + for i in .git/objects/pack/*.pack + do >${i%.pack}.keep done && reusable_pack --honor-pack-keep >empty.pack && -- 2.13.0.rc2.436.g524cea07c
Re: [PATCH] pack-objects: disable pack reuse for object-selection options
On Tue, May 09, 2017 at 11:48:17AM +0900, Junio C Hamano wrote: > > I guess what I was asking was: do you still think it was unclear, or do > > you think you were just being dense? > > > > I don't feel like I gave any information in the follow-on explanation > > that wasn't in the commit message, so I wasn't clear if I worded it > > better or if it just sunk in better. > > At least, "the current code is buggy when --local and friend are > given and includes needless objects in the result" was something I > learned only during the discussion, and would never have guessed by > reading the log message. The second paragraph does talk about "This > bug has been present since...", but the first paragraph does not say > anything about the current output being broken. While waiting for your response I took a look to see if I could improve it and came to the same conclusion. The result is below. > So, I am not sure if this was a case where I was dense. I think the > first paragraph needs a bit more clarity. Yeah, you were not being dense. I just wrote it badly. Sorry for the confusion. -- >8 -- Subject: pack-objects: disable pack reuse for object-selection options If certain options like --honor-pack-keep, --local, or --incremental are used with pack-objects, then we need to feed each potential object to want_object_in_pack() to see if it should be filtered out. But when the bitmap reuse_packfile optimization is in effect, we do not call that function at all, and in fact skip adding the objects to the to_pack list entirely. This means we have a bug: for certain requests we will silently ignore those options and include objects in that pack that should not be there. The problem has been present since the inception of the pack-reuse code in 6b8fda2db (pack-objects: use bitmaps when packing objects, 2013-12-21), but it was unlikely to come up in practice. These options are generally used for on-disk packing, not transfer packs (which go to stdout), but we've never allowed pack reuse for non-stdout packs (until 645c432d6, we did not even use bitmaps, which the reuse optimization relies on; after that, we explicitly turned it off when not packing to stdout). We can fix this by just disabling the reuse_packfile optimization when the options are in use. In theory we could teach the pack-reuse code to satisfy these checks, but it's not worth the complexity. The purpose of the optimization is to keep the amount of per-object work we do to a minimum. But these options inherently require us to search for other copies of each object, drowning out any benefit of the pack-reuse optimization. But note that the optimizations from 56dfeb626 (pack-objects: compute local/ignore_pack_keep early, 2016-07-29) happen before pack-reuse, meaning that specifying "--honor-pack-keep" in a repository with no .keep files can still follow the fast path. There are tests in t5310 that check these options with bitmaps and --stdout, but they didn't catch the bug, and it's hard to adapt them to do so. One problem is that they don't use --delta-base-offset; without that option, we always disable the reuse optimization entirely. It would be fine to add it in (it actually makes the test more realistic), but that still isn't quite enough. The other problem is that the reuse code is very picky; it only kicks in when it can reuse most of a pack, starting from the first byte. So we'd have to start from a fully repacked and bitmapped state to trigger it. But the tests for these options use a much more subtle state; they want to be sure that the want_object_in_pack() code is allowing some objects but not others. Doing a full repack runs counter to that. So this patch adds new tests at the end of the script which create the fully-packed state and make sure that each option is not fooled by reusable pack. Signed-off-by: Jeff King--- Contents are the same. I decided to leave the "; do" as it matches the rest of the script. If we're going to fix it, we should do them all. builtin/pack-objects.c | 6 +- t/t5310-pack-bitmaps.sh | 38 ++ 2 files changed, 43 insertions(+), 1 deletion(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 0fe35d1b5..50e01aa80 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -2717,7 +2717,11 @@ static void loosen_unused_packed_objects(struct rev_info *revs) */ static int pack_options_allow_reuse(void) { - return pack_to_stdout && allow_ofs_delta; + return pack_to_stdout && + allow_ofs_delta && + !ignore_packed_keep && + (!local || !have_non_local_packs) && + !incremental; } static int get_object_list_from_bitmap(struct rev_info *revs) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 424bec7d7..c3ddfa217 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -289,4 +289,42 @@ test_expect_success 'splitting packs does
Re: [PATCH] pack-objects: disable pack reuse for object-selection options
Jeff Kingwrites: > On Tue, May 09, 2017 at 11:14:18AM +0900, Junio C Hamano wrote: > >> Jeff King writes: >> >> >> Ah, OK, and now I understand why you called this a "bug" (which is >> >> older and do not need to be addressed as part of 2.13) in the >> >> original message. The new tests check requests that ought to >> >> produce an empty packfile as the result actually do, but with the >> >> current code, the reuse code does not work with --local and friends >> >> and ends up including what was requested to be excluded. >> > >> > Right. Did you want me to try re-wording the commit message, or does it >> > make sense now? >> >> It does make sense to me now, but I do not speak for all future >> readers of "git log", so... > > I guess what I was asking was: do you still think it was unclear, or do > you think you were just being dense? > > I don't feel like I gave any information in the follow-on explanation > that wasn't in the commit message, so I wasn't clear if I worded it > better or if it just sunk in better. At least, "the current code is buggy when --local and friend are given and includes needless objects in the result" was something I learned only during the discussion, and would never have guessed by reading the log message. The second paragraph does talk about "This bug has been present since...", but the first paragraph does not say anything about the current output being broken. So, I am not sure if this was a case where I was dense. I think the first paragraph needs a bit more clarity. If certain options like --honor-pack-keep, --local, or --incremental are used with pack-objects, then we need to feed each potential object to want_object_in_pack() to see if it should be filtered out. This is totally contrary to the purpose of the pack-reuse optimization, which tries hard to avoid doing any per-object work. Therefore we need to disable this optimization when these options are in use. Perhaps "... should be filtered out." can be followed by "However, the current code fails to do so, and we end up including these unwanted objects in the output.", and then "This is totally..." can instead begin with "Besides, having to do per-object filtering is totally...". I wouldn't have been confused if it were like so.
Re: [PATCH] pack-objects: disable pack reuse for object-selection options
On Tue, May 09, 2017 at 11:14:18AM +0900, Junio C Hamano wrote: > Jeff Kingwrites: > > >> Ah, OK, and now I understand why you called this a "bug" (which is > >> older and do not need to be addressed as part of 2.13) in the > >> original message. The new tests check requests that ought to > >> produce an empty packfile as the result actually do, but with the > >> current code, the reuse code does not work with --local and friends > >> and ends up including what was requested to be excluded. > > > > Right. Did you want me to try re-wording the commit message, or does it > > make sense now? > > It does make sense to me now, but I do not speak for all future > readers of "git log", so... I guess what I was asking was: do you still think it was unclear, or do you think you were just being dense? I don't feel like I gave any information in the follow-on explanation that wasn't in the commit message, so I wasn't clear if I worded it better or if it just sunk in better. -Peff
Re: [PATCH] pack-objects: disable pack reuse for object-selection options
Jeff Kingwrites: >> Ah, OK, and now I understand why you called this a "bug" (which is >> older and do not need to be addressed as part of 2.13) in the >> original message. The new tests check requests that ought to >> produce an empty packfile as the result actually do, but with the >> current code, the reuse code does not work with --local and friends >> and ends up including what was requested to be excluded. > > Right. Did you want me to try re-wording the commit message, or does it > make sense now? It does make sense to me now, but I do not speak for all future readers of "git log", so...
Re: [PATCH] pack-objects: disable pack reuse for object-selection options
On Tue, May 09, 2017 at 09:44:50AM +0900, Junio C Hamano wrote: > Jeff Kingwrites: > > > On Mon, May 08, 2017 at 01:56:27PM +0900, Junio C Hamano wrote: > > > >> Surely, even if we need to exclude some objects from an existing > >> packfile due to these selection options, we should be able to reuse > >> the non-excluded part, no? The end result may involve having to > >> pick and reuse more and smaller slices from existing packfiles, > >> which may be much less efficient, but it is no immediately obvious > >> to me if it leads to "need to disable". I would understand it if it > >> were "it becomes much less efficient and we are better off not using > >> the bitmap code at all", though. > > > > Yes, it's this last bit. The main win of the packfile reuse code is that > > it builds on the bitmaps to avoid doing as much per-object work as > > possible. So the objects don't even get added to the list of "struct > > object_entry", and we never consider them for the "should they be in the > > result" checks beyond the have/want computation done by the bitmaps. > > > > We could add those checks in, but what's the point? The idea of the > > reuse code is to be a fast path for serving vanilla clones. Searching > > through all of the packfiles for a .keep entry is the antithesis of > > that. > > Ah, OK, and now I understand why you called this a "bug" (which is > older and do not need to be addressed as part of 2.13) in the > original message. The new tests check requests that ought to > produce an empty packfile as the result actually do, but with the > current code, the reuse code does not work with --local and friends > and ends up including what was requested to be excluded. Right. Did you want me to try re-wording the commit message, or does it make sense now? -Peff
Re: [PATCH] pack-objects: disable pack reuse for object-selection options
Jeff Kingwrites: > On Mon, May 08, 2017 at 01:56:27PM +0900, Junio C Hamano wrote: > >> Surely, even if we need to exclude some objects from an existing >> packfile due to these selection options, we should be able to reuse >> the non-excluded part, no? The end result may involve having to >> pick and reuse more and smaller slices from existing packfiles, >> which may be much less efficient, but it is no immediately obvious >> to me if it leads to "need to disable". I would understand it if it >> were "it becomes much less efficient and we are better off not using >> the bitmap code at all", though. > > Yes, it's this last bit. The main win of the packfile reuse code is that > it builds on the bitmaps to avoid doing as much per-object work as > possible. So the objects don't even get added to the list of "struct > object_entry", and we never consider them for the "should they be in the > result" checks beyond the have/want computation done by the bitmaps. > > We could add those checks in, but what's the point? The idea of the > reuse code is to be a fast path for serving vanilla clones. Searching > through all of the packfiles for a .keep entry is the antithesis of > that. Ah, OK, and now I understand why you called this a "bug" (which is older and do not need to be addressed as part of 2.13) in the original message. The new tests check requests that ought to produce an empty packfile as the result actually do, but with the current code, the reuse code does not work with --local and friends and ends up including what was requested to be excluded. Thanks.
Re: [PATCH] pack-objects: disable pack reuse for object-selection options
On Mon, May 08, 2017 at 01:56:27PM +0900, Junio C Hamano wrote: > Jeff Kingwrites: > > > If certain options like --honor-pack-keep, --local, or > > --incremental are used with pack-objects, then we need to > > feed each potential object to want_object_in_pack() to see > > if it should be filtered out. This is totally contrary to > > the purpose of the pack-reuse optimization, which tries hard > > to avoid doing any per-object work. Therefore we need to > > disable this optimization when these options are in use. > > I read this five times, as I wanted to understand what you are > saying, but I am not sure if I got it right. One of the reasons why > I was confused was that I originally thought this "reuse" was about > delta reuse, but it is not. It is "sending a slice of the original > packfile straight to the output". Right. The "send a slice" goes under the name "reuse_packfile" in the code, but I'm not surprised you're not familiar with it. We added it long ago as part of the bitmap feature, and it's not often talked about (and in its current incarnation, isn't actually very useful; I have patches to improve that, but haven't gotten around to upstreaming them). > But even after getting myself out > of that confusion, I still do not see why we "need to disable". > Surely, even if we need to exclude some objects from an existing > packfile due to these selection options, we should be able to reuse > the non-excluded part, no? The end result may involve having to > pick and reuse more and smaller slices from existing packfiles, > which may be much less efficient, but it is no immediately obvious > to me if it leads to "need to disable". I would understand it if it > were "it becomes much less efficient and we are better off not using > the bitmap code at all", though. Yes, it's this last bit. The main win of the packfile reuse code is that it builds on the bitmaps to avoid doing as much per-object work as possible. So the objects don't even get added to the list of "struct object_entry", and we never consider them for the "should they be in the result" checks beyond the have/want computation done by the bitmaps. We could add those checks in, but what's the point? The idea of the reuse code is to be a fast path for serving vanilla clones. Searching through all of the packfiles for a .keep entry is the antithesis of that. > Is the real reason why we want to disable the "reuse" because it is > not easy to update the reuse_partial_packfile_from_bitmap() logic to > take these selection options into account? If so, I would also > understand why this is a good change. This is a side concern for the current form. With the patches I mentioned above, it's not too hard to omit some objects and still reuse the other bits verbatim. But again, even evaluating the function for each pack is too expensive, even if the implementation supported it. > > +test_expect_success 'pack reuse respects --honor-pack-keep' ' > > + test_when_finished "rm -f .git/objects/pack/*.keep" && > > + for i in .git/objects/pack/*.pack; do > > + >${i%.pack}.keep > > + done && > > Micronit: style. I assume you mean putting "do" on a separate line. Sorry, the "; do" is my native tongue and I sometimes slip back into it without thinking. -Peff
Re: [PATCH] pack-objects: disable pack reuse for object-selection options
Jeff Kingwrites: > If certain options like --honor-pack-keep, --local, or > --incremental are used with pack-objects, then we need to > feed each potential object to want_object_in_pack() to see > if it should be filtered out. This is totally contrary to > the purpose of the pack-reuse optimization, which tries hard > to avoid doing any per-object work. Therefore we need to > disable this optimization when these options are in use. I read this five times, as I wanted to understand what you are saying, but I am not sure if I got it right. One of the reasons why I was confused was that I originally thought this "reuse" was about delta reuse, but it is not. It is "sending a slice of the original packfile straight to the output". But even after getting myself out of that confusion, I still do not see why we "need to disable". Surely, even if we need to exclude some objects from an existing packfile due to these selection options, we should be able to reuse the non-excluded part, no? The end result may involve having to pick and reuse more and smaller slices from existing packfiles, which may be much less efficient, but it is no immediately obvious to me if it leads to "need to disable". I would understand it if it were "it becomes much less efficient and we are better off not using the bitmap code at all", though. Is the real reason why we want to disable the "reuse" because it is not easy to update the reuse_partial_packfile_from_bitmap() logic to take these selection options into account? If so, I would also understand why this is a good change. Puzzled. > This bug has been present since the inception of the > pack-reuse code, but was unlikely to come up in practice. > +test_expect_success 'pack reuse respects --honor-pack-keep' ' > + test_when_finished "rm -f .git/objects/pack/*.keep" && > + for i in .git/objects/pack/*.pack; do > + >${i%.pack}.keep > + done && Micronit: style.
[PATCH] pack-objects: disable pack reuse for object-selection options
If certain options like --honor-pack-keep, --local, or --incremental are used with pack-objects, then we need to feed each potential object to want_object_in_pack() to see if it should be filtered out. This is totally contrary to the purpose of the pack-reuse optimization, which tries hard to avoid doing any per-object work. Therefore we need to disable this optimization when these options are in use. This bug has been present since the inception of the pack-reuse code, but was unlikely to come up in practice. These options are generally used for on-disk packing, not transfer packs (which go to stdout), but we've never allowed pack reuse for non-stdout packs (until 645c432d6, we did not even use bitmaps, which the reuse optimization relies on; after that, we explicitly turned it off when not packing to stdout). There are tests in t5310 that check these options with bitmaps and --stdout, but they didn't catch the bug, and it's hard to adapt them to do so. One problem is that they don't use --delta-base-offset; without that option, we always disable the reuse optimization entirely. It would be fine to add it in (it actually makes the test more realistic), but that still isn't quite enough. Another problem is that the reuse code is very picky; it only kicks in when it can reuse most of a pack, starting from the first byte. So we'd have to start from a fully repacked and bitmapped state to trigger it. But the tests for these options use a much more subtle state; they want to be sure that the want_object_in_pack() code is allowing some objects but not others. Doing a full repack runs counter to that. So this patch adds new tests at the end of the script which create the fully-packed state and make sure that each option is not fooled by reusable pack. Signed-off-by: Jeff King--- I happened to notice this because I have a series which makes the reuse code much less picky (it kicks in more often, and can even convert to non-ofs-delta clients on the fly). And it fails the tests when merged with 702d1b958 (pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use, 2016-09-10). But the bug is much older than that. So this isn't at all urgent for v2.13. builtin/pack-objects.c | 6 +- t/t5310-pack-bitmaps.sh | 38 ++ 2 files changed, 43 insertions(+), 1 deletion(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 0fe35d1b5..50e01aa80 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -2717,7 +2717,11 @@ static void loosen_unused_packed_objects(struct rev_info *revs) */ static int pack_options_allow_reuse(void) { - return pack_to_stdout && allow_ofs_delta; + return pack_to_stdout && + allow_ofs_delta && + !ignore_packed_keep && + (!local || !have_non_local_packs) && + !incremental; } static int get_object_list_from_bitmap(struct rev_info *revs) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 424bec7d7..c3ddfa217 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -289,4 +289,42 @@ test_expect_success 'splitting packs does not generate bogus bitmaps' ' git -C no-bitmaps.git fetch .. HEAD ' +test_expect_success 'set up reusable pack' ' + rm -f .git/objects/pack/*.keep && + git repack -adb && + reusable_pack () { + git for-each-ref --format="%(objectname)" | + git pack-objects --delta-base-offset --revs --stdout "$@" + } +' + +test_expect_success 'pack reuse respects --honor-pack-keep' ' + test_when_finished "rm -f .git/objects/pack/*.keep" && + for i in .git/objects/pack/*.pack; do + >${i%.pack}.keep + done && + reusable_pack --honor-pack-keep >empty.pack && + git index-pack empty.pack && + >expect && + git show-index actual && + test_cmp expect actual +' + +test_expect_success 'pack reuse respects --local' ' + mv .git/objects/pack/* alt.git/objects/pack/ && + test_when_finished "mv alt.git/objects/pack/* .git/objects/pack/" && + reusable_pack --local >empty.pack && + git index-pack empty.pack && + >expect && + git show-index actual && + test_cmp expect actual +' + +test_expect_success 'pack reuse respects --incremental' ' + reusable_pack --incremental >empty.pack && + git index-pack empty.pack && + >expect && + git show-index actual && + test_cmp expect actual +' test_done -- 2.13.0.rc1.437.g927e4246e