[PATCH 2/2] index-pack: handle duplicate base objects gracefully
If a pack contains duplicates of an object, and if that object has any deltas pointing at it with REF_DELTA, we will try to resolve the deltas twice. While unusual, this is not strictly an error, but we currently die() with an unhelpful message. We should instead silently ignore the delta and move on. The first resolution already filled in any data we needed (like the sha1). The duplicate base object is not our concern during the resolution phase, and the .idx-writing phase will decide whether to complain or allow it (based on whether --strict is set). Based-on-work-by: René Scharfe Signed-off-by: Jeff King --- builtin/index-pack.c | 18 ++ t/t5308-pack-detect-duplicates.sh | 15 +++ 2 files changed, 25 insertions(+), 8 deletions(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index eebf1a8..acc30a9 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -936,20 +936,22 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base, link_base_data(prev_base, base); } - if (base->ref_first <= base->ref_last) { + while (base->ref_first <= base->ref_last) { struct object_entry *child = objects + deltas[base->ref_first].obj_no; - struct base_data *result = alloc_base_data(); + struct base_data *result = NULL; - if (!compare_and_swap_type(&child->real_type, OBJ_REF_DELTA, - base->obj->real_type)) - die("BUG: child->real_type != OBJ_REF_DELTA"); + if (compare_and_swap_type(&child->real_type, OBJ_REF_DELTA, + base->obj->real_type)) { + result = alloc_base_data(); + resolve_delta(child, base, result); + } - resolve_delta(child, base, result); if (base->ref_first == base->ref_last && base->ofs_last == -1) free_base_data(base); - base->ref_first++; - return result; + + if (result) + return result; } if (base->ofs_first <= base->ofs_last) { diff --git a/t/t5308-pack-detect-duplicates.sh b/t/t5308-pack-detect-duplicates.sh index 9c5a876..50f7a69 100755 --- a/t/t5308-pack-detect-duplicates.sh +++ b/t/t5308-pack-detect-duplicates.sh @@ -77,4 +77,19 @@ test_expect_success 'index-pack can reject packs with duplicates' ' test_expect_code 1 git cat-file -e $LO_SHA1 ' +test_expect_success 'duplicated delta base does not trigger assert' ' + clear_packs && + { + A=01d7713666f4de822776c7622c10f1b07de280dc && + B=e68fe8129b546b101aee9510c5328e7f21ca1d18 && + pack_header 3 && + pack_obj $A $B && + pack_obj $B && + pack_obj $B + } >dups.pack && + pack_trailer dups.pack && + git index-pack --stdin http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
Jeff King writes: > If a pack contains duplicates of an object, and if that > object has any deltas pointing at it with REF_DELTA, we will > try to resolve the deltas twice. While unusual, this is not > strictly an error, but we currently die() with an unhelpful > message. Hmm, I vaguely recall Shawn declaring this as an error in the pack stream, though. > The duplicate base object is not our concern during the > resolution phase, and the .idx-writing phase will decide > whether to complain or allow it (based on whether --strict > is set). OK. We still diagnose and just be more lenient without loosening the correctness of the result, so that would be not just OK but a very welcome change. Will queue. Thanks. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
On Fri, Aug 29, 2014 at 02:56:18PM -0700, Junio C Hamano wrote: > Jeff King writes: > > > If a pack contains duplicates of an object, and if that > > object has any deltas pointing at it with REF_DELTA, we will > > try to resolve the deltas twice. While unusual, this is not > > strictly an error, but we currently die() with an unhelpful > > message. > > Hmm, I vaguely recall Shawn declaring this as an error in the pack > stream, though. I agree it is probably a bug on the sending side, but I think last time this came up we decided to try to be liberal in what we accept. c.f. http://thread.gmane.org/gmane.comp.version-control.git/232305/focus=232310 -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
On Fri, Aug 29, 2014 at 3:08 PM, Jeff King wrote: > On Fri, Aug 29, 2014 at 02:56:18PM -0700, Junio C Hamano wrote: > >> Jeff King writes: >> >> > If a pack contains duplicates of an object, and if that >> > object has any deltas pointing at it with REF_DELTA, we will >> > try to resolve the deltas twice. While unusual, this is not >> > strictly an error, but we currently die() with an unhelpful >> > message. >> >> Hmm, I vaguely recall Shawn declaring this as an error in the pack >> stream, though. > > I agree it is probably a bug on the sending side, but I think last time > this came up we decided to try to be liberal in what we accept. c.f. > http://thread.gmane.org/gmane.comp.version-control.git/232305/focus=232310 IIRC they aren't valid pack files to contain duplicates. Once upon a time JGit had a bug and android.googlesource.com returned duplicate objects in a Linux kernel repository. This caused at least some versions of git-core to fail very badly in binary search at object lookup time or something. We had a lot of users angry with us. :) I know Nico said its OK last year, but its really not. I don't think implementations are capable of handling it. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
On Fri, Aug 29, 2014 at 07:59:32PM -0700, Shawn Pearce wrote: > > I agree it is probably a bug on the sending side, but I think last time > > this came up we decided to try to be liberal in what we accept. c.f. > > http://thread.gmane.org/gmane.comp.version-control.git/232305/focus=232310 > > IIRC they aren't valid pack files to contain duplicates. > > Once upon a time JGit had a bug and android.googlesource.com returned > duplicate objects in a Linux kernel repository. This caused at least > some versions of git-core to fail very badly in binary search at > object lookup time or something. We had a lot of users angry with us. > :) > > I know Nico said its OK last year, but its really not. I don't think > implementations are capable of handling it. We do detect and complain if --strict is given. Should we make it the default instead? I think it is still worthwhile to have a mode that can handle these packs. It may be the only reasonable way to recover the data from such a broken pack (and that broken pack may be the only copy of the data you have, if you are stuck getting it out of a broken implementation on a remote server). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
Am 30.08.2014 um 15:16 schrieb Jeff King: On Fri, Aug 29, 2014 at 07:59:32PM -0700, Shawn Pearce wrote: I agree it is probably a bug on the sending side, but I think last time this came up we decided to try to be liberal in what we accept. c.f. http://thread.gmane.org/gmane.comp.version-control.git/232305/focus=232310 IIRC they aren't valid pack files to contain duplicates. Once upon a time JGit had a bug and android.googlesource.com returned duplicate objects in a Linux kernel repository. This caused at least some versions of git-core to fail very badly in binary search at object lookup time or something. We had a lot of users angry with us. :) I know Nico said its OK last year, but its really not. I don't think implementations are capable of handling it. We do detect and complain if --strict is given. Should we make it the default instead? I think it is still worthwhile to have a mode that can handle these packs. It may be the only reasonable way to recover the data from such a broken pack (and that broken pack may be the only copy of the data you have, if you are stuck getting it out of a broken implementation on a remote server). Sounds reasonable; being able to extract code from broken repos -- especially in this real-world case -- is beneficial. My only nit with patch 2: Petr Stodulka and Martin von Gagern should be mentioned as bug reporters. René -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
On Sat, Aug 30, 2014 at 6:16 AM, Jeff King wrote: > On Fri, Aug 29, 2014 at 07:59:32PM -0700, Shawn Pearce wrote: > >> > I agree it is probably a bug on the sending side, but I think last time >> > this came up we decided to try to be liberal in what we accept. c.f. >> > http://thread.gmane.org/gmane.comp.version-control.git/232305/focus=232310 >> >> IIRC they aren't valid pack files to contain duplicates. >> >> Once upon a time JGit had a bug and android.googlesource.com returned >> duplicate objects in a Linux kernel repository. This caused at least >> some versions of git-core to fail very badly in binary search at >> object lookup time or something. We had a lot of users angry with us. >> :) >> >> I know Nico said its OK last year, but its really not. I don't think >> implementations are capable of handling it. > > We do detect and complain if --strict is given. Should we make it the > default instead? I think it is still worthwhile to have a mode that can > handle these packs. It may be the only reasonable way to recover the > data from such a broken pack (and that broken pack may be the only copy > of the data you have, if you are stuck getting it out of a broken > implementation on a remote server). Eh, OK, that does make a lot of sense. Unfortunately accepting it by default encourages broken implementations to stay broken and ship packs with duplicate objects, which they shouldn't do since there are readers out there that cannot handle it. In my opinion we should make --strict default, but its an excellent point made that users should have an escape hatch to disable this check, attempt accepting a broken pack and try fixing it locally with repack. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
On Sat, Aug 30, 2014 at 06:00:59PM +0200, René Scharfe wrote: > My only nit with patch 2: Petr Stodulka and Martin von > Gagern should be mentioned as bug reporters. Yeah, I agree with that. And actually, you should get a Reported-by: on the first patch. :) However, I think there are some grave implications of this series; see the message I just posted elsewhere in the thread. I think we will end up dropping patch 2, but keep patch 1. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
On Sat, Aug 30, 2014 at 06:10:33PM -0700, Shawn Pearce wrote: > > We do detect and complain if --strict is given. Should we make it the > > default instead? I think it is still worthwhile to have a mode that can > > handle these packs. It may be the only reasonable way to recover the > > data from such a broken pack (and that broken pack may be the only copy > > of the data you have, if you are stuck getting it out of a broken > > implementation on a remote server). > > Eh, OK, that does make a lot of sense. > > Unfortunately accepting it by default encourages broken > implementations to stay broken and ship packs with duplicate objects, > which they shouldn't do since there are readers out there that cannot > handle it. I was pretty much convinced by this line of reasoning after you reading your message. But after discovering some horrible side effects that I just posted elsewhere, I think it's a no-brainer. > In my opinion we should make --strict default, but its an excellent > point made that users should have an escape hatch to disable this > check, attempt accepting a broken pack and try fixing it locally with > repack. I am not sure if you meant it this way, but --strict controls a lot more than just duplicate objects. It also runs fsck checks against the incoming objects. Turning all of that on is a much bigger change than I think we've been talking about. But it's one I think we should consider. We've had --strict on at GitHub for a few years, and it has caught many problems. Frequently we get push-back from users saying "but nowhere else I pushed this to complains". My opinion is that it is not that we are wrong for being picky, but that other servers are not being picky enough. It's also a bit of a hassle, though. E.g., a lot of projects have broken ident lines deep in their history (from old versions of git, or other broken implementations) and it is often way too late to fix them. We usually try to get people to rewrite the history if it's not a problem, but otherwise will lift the restrictions temporarily to let them push up old history. Broken ident lines are annoying, but not _too_ fundamentally bad. Duplicate tree entries are a lot worse. Fsck even distinguishes between "error" and "warning", but "index-pack --strict" treats both as a reason to reject the object. We could perhaps loosen that, and make sure our error/warning categories are reasonable. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
Am 31.08.2014 um 17:17 schrieb Jeff King: On Sat, Aug 30, 2014 at 06:00:59PM +0200, René Scharfe wrote: My only nit with patch 2: Petr Stodulka and Martin von Gagern should be mentioned as bug reporters. Yeah, I agree with that. And actually, you should get a Reported-by: on the first patch. :) However, I think there are some grave implications of this series; see the message I just posted elsewhere in the thread. I think we will end up dropping patch 2, but keep patch 1. OK, but it would still be a good idea to replace the assert()s with die()s and error messages that tell users that the repo is broken, not git. René -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully
Jeff King writes: > Broken ident lines are annoying, but not _too_ fundamentally bad. > Duplicate tree entries are a lot worse. Fsck even distinguishes between > "error" and "warning", but "index-pack --strict" treats both as a reason > to reject the object. We could perhaps loosen that, and make sure our > error/warning categories are reasonable. A pack stream that records the same object twice is not a breakage that is buried deep in the history and cannot be corrected without a wholesale history rewrite, unlike commit objects with broken ident lines or tree objects with 0-filled file type designators. As such, I think it is a reasonable thing to do the following: - "git repack" should be taught to dedup, as a way to recover from such a breakage, if not done already. - "git fsck" should complain, suggesting users to repack to recover. It may even want to exit with non-zero status. - "git receive-pack" and "git fetch-pack" should warn, loudly, without failing. They should ideally not keep such a corrupt pack stream on disk at all, de-duping the objects while streaming, but if that is not practical, at least they should give an easy way for the user to cause de-duping immediately (I do not mind if that is "we detected that the other side fed us a pack stream that is broken. We automatically correct the breakage for you by repacking. Be patient while we do so"). - If the other side of "receive-pack/fetch-pack" implements the agent capability, upon detecting such a breakage, it may not be a bad idea to offer the user to send an e-mail reporting the version of the software to a wall-of-shame e-mail address ;-). I agree that a tree that records the same path twice should be outright rejected. There is no sane recovery path. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html