On Tue, Jan 27, 2015 at 03:20:41PM +0000, Charles Bailey wrote:

> From: Charles Bailey <cbaile...@bloomberg.net>
> 
> When objects are spread across multiple packs, if an initial fetch does
> require all pack files, a subsequent fetch for objects in packs not
> retrieved in the initial fetch will fail.

s/does/does not/, I think?

> I'm not very familiar with the http client code so this analysis is based
> purely on observed behaviour.

Debugging the http code is a royal pain because all the work happens in
a separate helper. I use a git-remote-debug script like this:

  #!/bin/sh
  host=localhost:5001
  proto=$(echo "${2:-$1}" | sed 's/:.*//')
  prog=git-remote-$proto
  echo >&2 "gdb -ex 'target remote $host' $prog"
  gdbserver localhost:5001 "$prog" "$@"

and then you can use:

  git fetch debug::http://...

in the test script, cut-and-paste the gdb command printed to stderr, and
you're dropped into the appropriate debugger without worrying about all
of the stdio mess.

> When fetching only some refs from a repository served over dumb httpd Git
> appears to download all of the index files for the available packs but then
> only chooses the pack files that help it resolve the objects which we need.

Right. And it looks like we have special code in sha1_file.c to make
sure we do not trust an index which does not have a matching packfile.
So that's good.

The http-walker code does its own check, in fetch_and_setup_pack_index,
that checks for an existing valid copy of the index. If we don't have
it, we download the index and proceed. If we do, we skip straight to
grabbing the pack. But if we have it and it doesn't appear valid, we
return an error. And there seems to be a bug with checking the validity.

It looks like the culprit is 7b64469 (Allow parse_pack_index on
temporary files, 2010-04-19). It added a new "idx_path" parameter to
parse_pack_index, which we pass as NULL.  That causes its call to
check_packed_git_idx to fail (because it has no idea what file we are
talking about!).

This seems to fix it:

diff --git a/sha1_file.c b/sha1_file.c
index 30995e6..eda4d90 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1149,6 +1149,9 @@ struct packed_git *parse_pack_index(unsigned char *sha1, 
const char *idx_path)
        const char *path = sha1_pack_name(sha1);
        struct packed_git *p = alloc_packed_git(strlen(path) + 1);
 
+       if (!idx_path)
+               idx_path = sha1_pack_index_name(sha1);
+
        strcpy(p->pack_name, path);
        hashcpy(p->sha1, sha1);
        if (check_packed_git_idx(idx_path, p)) {

(Alternatively, we could pass in sha1_pack_index_name instead of NULL in
the first place, but I think it is reasonable for parse_pack_index to
take care of this).

I think it may also make sense for fetch_and_setup_pack_index to delete
and re-download a broken .idx file (rather than aborting), but I don't
think that's a big deal. It should only happen in the face of on-disk
data corruption, and the user can remove the broken .idx themselves.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to