Re: Bug: git-upload-pack will return successfully even when it can't read all references

2015-09-08 Thread Jeff King
On Mon, Sep 07, 2015 at 02:11:15PM +0200, Ævar Arnfjörð Bjarmason wrote:

> This turned out to be a pretty trivial filesystem error.
> refs/heads/master wasn't readable by the backup process, but some
> other stuff in refs/heads and objects/* was.
>
> [...]
>
> I wanted to check if this was a regression and got as far back as
> v1.4.3 with the same behavior before the commands wouldn't work
> anymore due to changes in the git config parsing code.

Right, it has basically always been this way. for_each_ref() silently
eats oddities or errors while reading refs. Calling for_each_rawref()
will include them, but we don't do it in most places; it would make
non-critical operations on a corrupted repo barf.  And it is difficult
to know what is "critical" inside the code. You might be calling
"upload-pack" to salvage what you can from a corrupted repo, or to make
a backup where you want to know what is corrupted and what is not.

Commit 49672f2 introduced a "ref paranoia" environment variable to let
you specify this (and robust backups was definitely one of the use cases
I had in mind). It's a little tricky to use with upload-pack because you
may be crossing an ssh boundary, but:

  git clone -u 'GIT_REF_PARANOIA=1 git-upload-pack' ...

should work.

With your case:

  $ git clone --no-local -u 'GIT_REF_PARANOIA=1 git-upload-pack' repo 
repo-checkout
  Cloning into 'repo-checkout'...
  fatal: git upload-pack: not our ref 
  fatal: The remote end hung up unexpectedly

Without "--no-local" it behaves weirdly, but I would not recommend local
clones in general if you are trying to be careful. They optimize out a
lot of the safety checks, and we do things like copy the packed-refs
file wholesale.

And certainly the error message is not the greatest. upload-pack is not
checking for the REF_ISBROKEN flag, so it just dumps:

   refs/heads/master

in the advertisement, and the client happily requests that object.
REF_PARANOIA is really just a band-aid to feed the broken refs to the
normal code paths, which typically barf on their own. :)

Something like this:

diff --git a/upload-pack.c b/upload-pack.c
index 89e832b..3c621a5 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -731,6 +731,9 @@ static int send_ref(const char *refname, const struct 
object_id *oid,
if (mark_our_ref(refname, oid))
return 0;
 
+   if (flag & REF_ISBROKEN)
+   warning("remote ref '%s' is broken", refname);
+
if (capabilities) {
struct strbuf symref_info = STRBUF_INIT;
 
kind of helps, but the advertisement is too early for us to send
sideband messages. So it makes it to the user if the transport is local
or ssh, but not over git:// or http.

That's something we could do better with protocol v2 (we'll negotiate
capabilities before the advertisement).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug: git-upload-pack will return successfully even when it can't read all references

2015-09-08 Thread Ævar Arnfjörð Bjarmason
On Tue, Sep 8, 2015 at 8:53 AM, Jeff King  wrote:
> On Mon, Sep 07, 2015 at 02:11:15PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> This turned out to be a pretty trivial filesystem error.
>> refs/heads/master wasn't readable by the backup process, but some
>> other stuff in refs/heads and objects/* was.
>>
>> [...]
>>
>> I wanted to check if this was a regression and got as far back as
>> v1.4.3 with the same behavior before the commands wouldn't work
>> anymore due to changes in the git config parsing code.
>
> Right, it has basically always been this way. for_each_ref() silently
> eats oddities or errors while reading refs. Calling for_each_rawref()
> will include them, but we don't do it in most places; it would make
> non-critical operations on a corrupted repo barf.  And it is difficult
> to know what is "critical" inside the code. You might be calling
> "upload-pack" to salvage what you can from a corrupted repo, or to make
> a backup where you want to know what is corrupted and what is not.
>
> Commit 49672f2 introduced a "ref paranoia" environment variable to let
> you specify this (and robust backups was definitely one of the use cases
> I had in mind). It's a little tricky to use with upload-pack because you
> may be crossing an ssh boundary, but:
>
>   git clone -u 'GIT_REF_PARANOIA=1 git-upload-pack' ...
>
> should work.
>
> With your case:
>
>   $ git clone --no-local -u 'GIT_REF_PARANOIA=1 git-upload-pack' repo 
> repo-checkout
>   Cloning into 'repo-checkout'...
>   fatal: git upload-pack: not our ref 
>   fatal: The remote end hung up unexpectedly
>
> Without "--no-local" it behaves weirdly, but I would not recommend local
> clones in general if you are trying to be careful. They optimize out a
> lot of the safety checks, and we do things like copy the packed-refs
> file wholesale.
>
> And certainly the error message is not the greatest. upload-pack is not
> checking for the REF_ISBROKEN flag, so it just dumps:
>
>    refs/heads/master
>
> in the advertisement, and the client happily requests that object.
> REF_PARANOIA is really just a band-aid to feed the broken refs to the
> normal code paths, which typically barf on their own. :)
>
> Something like this:
>
> diff --git a/upload-pack.c b/upload-pack.c
> index 89e832b..3c621a5 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -731,6 +731,9 @@ static int send_ref(const char *refname, const struct 
> object_id *oid,
> if (mark_our_ref(refname, oid))
> return 0;
>
> +   if (flag & REF_ISBROKEN)
> +   warning("remote ref '%s' is broken", refname);
> +
> if (capabilities) {
> struct strbuf symref_info = STRBUF_INIT;
>
> kind of helps, but the advertisement is too early for us to send
> sideband messages. So it makes it to the user if the transport is local
> or ssh, but not over git:// or http.
>
> That's something we could do better with protocol v2 (we'll negotiate
> capabilities before the advertisement).

Fantastic. REF_PARANOIA does exactly what I need, i.e. stall the fetch
process so permissions can be manually repaired.

I think it makes sense to keep the default at "let's try to copy over
what we can", for salvage purposes. I think the bug is that we still
return success in that case, and should return non-zero, but as you
point out this is easier said than done due to needing to deal with
the case where the remote transport sends us the ... ref.

I wonder if --upload-pack="GIT_REF_PARANOIA=1 git-upload-pack" should
be the default when running fetch if you have --prune enabled. There's
a particularly bad edge case now where if you have permission errors
on the master repository and run --prune on your backup along with a
--mirror clone to mirror the refs, then when you have permission
issues you'll prune everything from the backup.

But yeah, a proper fix needs protocol v2. Because among other things
that --upload-pack hack will only work for ssh, not http.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug: git-upload-pack will return successfully even when it can't read all references

2015-09-08 Thread Jeff King
On Tue, Sep 08, 2015 at 02:16:03PM +0200, Ævar Arnfjörð Bjarmason wrote:

> I wonder if --upload-pack="GIT_REF_PARANOIA=1 git-upload-pack" should
> be the default when running fetch if you have --prune enabled. There's
> a particularly bad edge case now where if you have permission errors
> on the master repository and run --prune on your backup along with a
> --mirror clone to mirror the refs, then when you have permission
> issues you'll prune everything from the backup.
> 
> But yeah, a proper fix needs protocol v2. Because among other things
> that --upload-pack hack will only work for ssh, not http.

Right. There is no real way to flip the flag from the client side,
because we cannot reliably communicate it. Once we have such a
mechanism, it might actually make sense to _always_ flip on paranoia.
That is, we would rather an operation fail and be retried with in a
"loose" mode explicitly. That's safer. It's less convenient when you
have a broken repo, but surely that's the exception, right?

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Bug: git-upload-pack will return successfully even when it can't read all references

2015-09-07 Thread Ævar Arnfjörð Bjarmason
We have a process to back up our Git repositories at work, this
started alerting because it wasn't getting the same refs as the
remote.

This turned out to be a pretty trivial filesystem error.
refs/heads/master wasn't readable by the backup process, but some
other stuff in refs/heads and objects/* was.

But I think it's a bug that if we ssh to the remote end, and
git-upload-pack can't read certain refs in refs/heads/ that we don't
return an error.

This simple shellscript reproduces the issue:

rm -rf /tmp/repo /tmp/repo-checkout
git init /tmp/repo
cd /tmp/repo
touch foo
git add foo
git commit -m"foo"
git checkout -b branch
git checkout master
git show-ref
chmod 000 .git/refs/heads/master
git show-ref
cd /tmp
git clone repo repo-checkout
echo "Status code of clone: $?"
cd repo-checkout
git show-ref

After running this you get:

$ (cd /tmp/repo-checkout && echo -n | strace
/tmp/avar/bin/git-upload-pack /tmp/repo 2>&1 | grep -e EACCES)
open("refs/heads/master", O_RDONLY) = -1 EACCES (Permission denied)
open("refs/heads/master", O_RDONLY) = -1 EACCES (Permission denied)
open("refs/heads/master", O_RDONLY) = -1 EACCES (Permission denied)

And "git fetch" will return 0.

We fail to call get refs/heads/master in head_ref_namespaced() called
by upload_pack(). I was going to see if I could patch it to return an
error, but that code seems very far removed from any error checking.

This isn't only an issue with git-upload-pack, e.g. show-ref itself
has the same issue:

$ chmod 600 .git/refs/heads/master
$ git show-ref; echo $?
e7255c8fcabc6e15f57cd984f9f117870052c1a0 refs/heads/branch
e7255c8fcabc6e15f57cd984f9f117870052c1a0 refs/heads/master
0
$ chmod 000 .git/refs/heads/master
$ git show-ref; echo $?
e7255c8fcabc6e15f57cd984f9f117870052c1a0 refs/heads/branch
0

I wanted to check if this was a regression and got as far back as
v1.4.3 with the same behavior before the commands wouldn't work
anymore due to changes in the git config parsing code.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html