[PATCH 2/2] index-pack: handle duplicate base objects gracefully

2014-08-29 Thread Jeff King
If a pack contains duplicates of an object, and if that
object has any deltas pointing at it with REF_DELTA, we will
try to resolve the deltas twice. While unusual, this is not
strictly an error, but we currently die() with an unhelpful
message.  We should instead silently ignore the delta and
move on. The first resolution already filled in any data we
needed (like the sha1).

The duplicate base object is not our concern during the
resolution phase, and the .idx-writing phase will decide
whether to complain or allow it (based on whether --strict
is set).

Based-on-work-by: René Scharfe 
Signed-off-by: Jeff King 
---
 builtin/index-pack.c  | 18 ++
 t/t5308-pack-detect-duplicates.sh | 15 +++
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index eebf1a8..acc30a9 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -936,20 +936,22 @@ static struct base_data *find_unresolved_deltas_1(struct 
base_data *base,
link_base_data(prev_base, base);
}
 
-   if (base->ref_first <= base->ref_last) {
+   while (base->ref_first <= base->ref_last) {
struct object_entry *child = objects + 
deltas[base->ref_first].obj_no;
-   struct base_data *result = alloc_base_data();
+   struct base_data *result = NULL;
 
-   if (!compare_and_swap_type(&child->real_type, OBJ_REF_DELTA,
-  base->obj->real_type))
-   die("BUG: child->real_type != OBJ_REF_DELTA");
+   if (compare_and_swap_type(&child->real_type, OBJ_REF_DELTA,
+ base->obj->real_type)) {
+   result = alloc_base_data();
+   resolve_delta(child, base, result);
+   }
 
-   resolve_delta(child, base, result);
if (base->ref_first == base->ref_last && base->ofs_last == -1)
free_base_data(base);
-
base->ref_first++;
-   return result;
+
+   if (result)
+   return result;
}
 
if (base->ofs_first <= base->ofs_last) {
diff --git a/t/t5308-pack-detect-duplicates.sh 
b/t/t5308-pack-detect-duplicates.sh
index 9c5a876..50f7a69 100755
--- a/t/t5308-pack-detect-duplicates.sh
+++ b/t/t5308-pack-detect-duplicates.sh
@@ -77,4 +77,19 @@ test_expect_success 'index-pack can reject packs with 
duplicates' '
test_expect_code 1 git cat-file -e $LO_SHA1
 '
 
+test_expect_success 'duplicated delta base does not trigger assert' '
+   clear_packs &&
+   {
+   A=01d7713666f4de822776c7622c10f1b07de280dc &&
+   B=e68fe8129b546b101aee9510c5328e7f21ca1d18 &&
+   pack_header 3 &&
+   pack_obj $A $B &&
+   pack_obj $B &&
+   pack_obj $B
+   } >dups.pack &&
+   pack_trailer dups.pack &&
+   git index-pack --stdin http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully

2014-08-29 Thread Junio C Hamano
Jeff King  writes:

> If a pack contains duplicates of an object, and if that
> object has any deltas pointing at it with REF_DELTA, we will
> try to resolve the deltas twice. While unusual, this is not
> strictly an error, but we currently die() with an unhelpful
> message.

Hmm, I vaguely recall Shawn declaring this as an error in the pack
stream, though.

> The duplicate base object is not our concern during the
> resolution phase, and the .idx-writing phase will decide
> whether to complain or allow it (based on whether --strict
> is set).

OK.  We still diagnose and just be more lenient without loosening
the correctness of the result, so that would be not just OK but a
very welcome change.

Will queue.  Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully

2014-08-29 Thread Jeff King
On Fri, Aug 29, 2014 at 02:56:18PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > If a pack contains duplicates of an object, and if that
> > object has any deltas pointing at it with REF_DELTA, we will
> > try to resolve the deltas twice. While unusual, this is not
> > strictly an error, but we currently die() with an unhelpful
> > message.
> 
> Hmm, I vaguely recall Shawn declaring this as an error in the pack
> stream, though.

I agree it is probably a bug on the sending side, but I think last time
this came up we decided to try to be liberal in what we accept.  c.f.
http://thread.gmane.org/gmane.comp.version-control.git/232305/focus=232310

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully

2014-08-29 Thread Shawn Pearce
On Fri, Aug 29, 2014 at 3:08 PM, Jeff King  wrote:
> On Fri, Aug 29, 2014 at 02:56:18PM -0700, Junio C Hamano wrote:
>
>> Jeff King  writes:
>>
>> > If a pack contains duplicates of an object, and if that
>> > object has any deltas pointing at it with REF_DELTA, we will
>> > try to resolve the deltas twice. While unusual, this is not
>> > strictly an error, but we currently die() with an unhelpful
>> > message.
>>
>> Hmm, I vaguely recall Shawn declaring this as an error in the pack
>> stream, though.
>
> I agree it is probably a bug on the sending side, but I think last time
> this came up we decided to try to be liberal in what we accept.  c.f.
> http://thread.gmane.org/gmane.comp.version-control.git/232305/focus=232310

IIRC they aren't valid pack files to contain duplicates.

Once upon a time JGit had a bug and android.googlesource.com returned
duplicate objects in a Linux kernel repository. This caused at least
some versions of git-core to fail very badly in binary search at
object lookup time or something. We had a lot of users angry with us.
:)

I know Nico said its OK last year, but its really not. I don't think
implementations are capable of handling it.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully

2014-08-30 Thread Jeff King
On Fri, Aug 29, 2014 at 07:59:32PM -0700, Shawn Pearce wrote:

> > I agree it is probably a bug on the sending side, but I think last time
> > this came up we decided to try to be liberal in what we accept.  c.f.
> > http://thread.gmane.org/gmane.comp.version-control.git/232305/focus=232310
> 
> IIRC they aren't valid pack files to contain duplicates.
> 
> Once upon a time JGit had a bug and android.googlesource.com returned
> duplicate objects in a Linux kernel repository. This caused at least
> some versions of git-core to fail very badly in binary search at
> object lookup time or something. We had a lot of users angry with us.
> :)
> 
> I know Nico said its OK last year, but its really not. I don't think
> implementations are capable of handling it.

We do detect and complain if --strict is given. Should we make it the
default instead? I think it is still worthwhile to have a mode that can
handle these packs. It may be the only reasonable way to recover the
data from such a broken pack (and that broken pack may be the only copy
of the data you have, if you are stuck getting it out of a broken
implementation on a remote server).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully

2014-08-30 Thread René Scharfe

Am 30.08.2014 um 15:16 schrieb Jeff King:

On Fri, Aug 29, 2014 at 07:59:32PM -0700, Shawn Pearce wrote:


I agree it is probably a bug on the sending side, but I think last time
this came up we decided to try to be liberal in what we accept.  c.f.
http://thread.gmane.org/gmane.comp.version-control.git/232305/focus=232310


IIRC they aren't valid pack files to contain duplicates.

Once upon a time JGit had a bug and android.googlesource.com returned
duplicate objects in a Linux kernel repository. This caused at least
some versions of git-core to fail very badly in binary search at
object lookup time or something. We had a lot of users angry with us.
:)

I know Nico said its OK last year, but its really not. I don't think
implementations are capable of handling it.


We do detect and complain if --strict is given. Should we make it the
default instead? I think it is still worthwhile to have a mode that can
handle these packs. It may be the only reasonable way to recover the
data from such a broken pack (and that broken pack may be the only copy
of the data you have, if you are stuck getting it out of a broken
implementation on a remote server).


Sounds reasonable; being able to extract code from broken repos -- 
especially in this real-world case -- is beneficial.


My only nit with patch 2: Petr Stodulka  and Martin 
von Gagern  should be mentioned as bug reporters.


René
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully

2014-08-30 Thread Shawn Pearce
On Sat, Aug 30, 2014 at 6:16 AM, Jeff King  wrote:
> On Fri, Aug 29, 2014 at 07:59:32PM -0700, Shawn Pearce wrote:
>
>> > I agree it is probably a bug on the sending side, but I think last time
>> > this came up we decided to try to be liberal in what we accept.  c.f.
>> > http://thread.gmane.org/gmane.comp.version-control.git/232305/focus=232310
>>
>> IIRC they aren't valid pack files to contain duplicates.
>>
>> Once upon a time JGit had a bug and android.googlesource.com returned
>> duplicate objects in a Linux kernel repository. This caused at least
>> some versions of git-core to fail very badly in binary search at
>> object lookup time or something. We had a lot of users angry with us.
>> :)
>>
>> I know Nico said its OK last year, but its really not. I don't think
>> implementations are capable of handling it.
>
> We do detect and complain if --strict is given. Should we make it the
> default instead? I think it is still worthwhile to have a mode that can
> handle these packs. It may be the only reasonable way to recover the
> data from such a broken pack (and that broken pack may be the only copy
> of the data you have, if you are stuck getting it out of a broken
> implementation on a remote server).

Eh, OK, that does make a lot of sense.

Unfortunately accepting it by default encourages broken
implementations to stay broken and ship packs with duplicate objects,
which they shouldn't do since there are readers out there that cannot
handle it.

In my opinion we should make --strict default, but its an excellent
point made that users should have an escape hatch to disable this
check, attempt accepting a broken pack and try fixing it locally with
repack.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully

2014-08-31 Thread Jeff King
On Sat, Aug 30, 2014 at 06:00:59PM +0200, René Scharfe wrote:

> My only nit with patch 2: Petr Stodulka  and Martin von
> Gagern  should be mentioned as bug reporters.

Yeah, I agree with that. And actually, you should get a Reported-by:
on the first patch. :)

However, I think there are some grave implications of this series; see
the message I just posted elsewhere in the thread. I think we will end
up dropping patch 2, but keep patch 1.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully

2014-08-31 Thread Jeff King
On Sat, Aug 30, 2014 at 06:10:33PM -0700, Shawn Pearce wrote:

> > We do detect and complain if --strict is given. Should we make it the
> > default instead? I think it is still worthwhile to have a mode that can
> > handle these packs. It may be the only reasonable way to recover the
> > data from such a broken pack (and that broken pack may be the only copy
> > of the data you have, if you are stuck getting it out of a broken
> > implementation on a remote server).
> 
> Eh, OK, that does make a lot of sense.
> 
> Unfortunately accepting it by default encourages broken
> implementations to stay broken and ship packs with duplicate objects,
> which they shouldn't do since there are readers out there that cannot
> handle it.

I was pretty much convinced by this line of reasoning after you reading
your message. But after discovering some horrible side effects that I
just posted elsewhere, I think it's a no-brainer.

> In my opinion we should make --strict default, but its an excellent
> point made that users should have an escape hatch to disable this
> check, attempt accepting a broken pack and try fixing it locally with
> repack.

I am not sure if you meant it this way, but --strict controls a lot more
than just duplicate objects. It also runs fsck checks against the
incoming objects. Turning all of that on is a much bigger change than I
think we've been talking about.

But it's one I think we should consider. We've had --strict on at GitHub
for a few years, and it has caught many problems. Frequently we get
push-back from users saying "but nowhere else I pushed this to
complains". My opinion is that it is not that we are wrong for being
picky, but that other servers are not being picky enough.

It's also a bit of a hassle, though. E.g., a lot of projects have broken
ident lines deep in their history (from old versions of git, or other
broken implementations) and it is often way too late to fix them. We
usually try to get people to rewrite the history if it's not a problem,
but otherwise will lift the restrictions temporarily to let them push up
old history.

Broken ident lines are annoying, but not _too_ fundamentally bad.
Duplicate tree entries are a lot worse. Fsck even distinguishes between
"error" and "warning", but "index-pack --strict" treats both as a reason
to reject the object. We could perhaps loosen that, and make sure our
error/warning categories are reasonable.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully

2014-08-31 Thread René Scharfe

Am 31.08.2014 um 17:17 schrieb Jeff King:

On Sat, Aug 30, 2014 at 06:00:59PM +0200, René Scharfe wrote:


My only nit with patch 2: Petr Stodulka  and Martin von
Gagern  should be mentioned as bug reporters.


Yeah, I agree with that. And actually, you should get a Reported-by:
on the first patch. :)

However, I think there are some grave implications of this series; see
the message I just posted elsewhere in the thread. I think we will end
up dropping patch 2, but keep patch 1.


OK, but it would still be a good idea to replace the assert()s with 
die()s and error messages that tell users that the repo is broken, not git.


René
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] index-pack: handle duplicate base objects gracefully

2014-08-31 Thread Junio C Hamano
Jeff King  writes:

> Broken ident lines are annoying, but not _too_ fundamentally bad.
> Duplicate tree entries are a lot worse. Fsck even distinguishes between
> "error" and "warning", but "index-pack --strict" treats both as a reason
> to reject the object. We could perhaps loosen that, and make sure our
> error/warning categories are reasonable.

A pack stream that records the same object twice is not a breakage
that is buried deep in the history and cannot be corrected without a
wholesale history rewrite, unlike commit objects with broken ident
lines or tree objects with 0-filled file type designators.

As such, I think it is a reasonable thing to do the following:

 - "git repack" should be taught to dedup, as a way to recover from
   such a breakage, if not done already.

 - "git fsck" should complain, suggesting users to repack to
   recover.  It may even want to exit with non-zero status.

 - "git receive-pack" and "git fetch-pack" should warn, loudly,
   without failing.  They should ideally not keep such a corrupt
   pack stream on disk at all, de-duping the objects while
   streaming, but if that is not practical, at least they should
   give an easy way for the user to cause de-duping immediately (I
   do not mind if that is "we detected that the other side fed us a
   pack stream that is broken.  We automatically correct the
   breakage for you by repacking.  Be patient while we do so").

 - If the other side of "receive-pack/fetch-pack" implements the
   agent capability, upon detecting such a breakage, it may not be a
   bad idea to offer the user to send an e-mail reporting the
   version of the software to a wall-of-shame e-mail address ;-).

I agree that a tree that records the same path twice should be
outright rejected.  There is no sane recovery path.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html