Re: [PATCH v2] fetch-pack: don't try to fetch peel values with --all
On Wed, Jun 13, 2018 at 05:13:02PM -0400, Jeff King wrote: > On Tue, Jun 12, 2018 at 06:54:17PM +, Kirill Smelkov wrote: > > > > If an extra connection isn't a problem, you might be better off with > > > "git ls-remote", and then picking through the results for refs of > > > interest, and then "git fetch-pack" to actually get the pack. That's how > > > git-fetch worked when it was a shell script (e.g., see c3a200120d, the > > > last shell version). > > > > Yes, this is what I ended up doing: > > > > https://lab.nexedi.com/kirr/git-backup/commit/899103bf > > > > but for another reason - to avoid repeating for every fetched repository > > slow (in case of my "big" destination backup repository) quickfetch() > > checking in every spawned `git fetch`: git-backup can build index of > > objects we already have ourselves only once at startup, and then in > > fetch, after checking lsremote output, consult that index, and if we see > > we already have everything for an advertised reference - just avoid > > giving it to fetch-pack to process. It turns out for many pulled > > repositories there is usually no references changed at all and this way > > fetch-pack can be skipped completely: > > > > https://lab.nexedi.com/kirr/git-backup/commit/3efed898 > > Thanks for sharing that, it's an interesting case. I'd hope that > git-fetch is smart enough not to even bother with quickfetch() if there > are no refs to update. But if we have even one change to fetch, then > yeah, in the general case it makes sense to me that you could do better > by amortizing the scan of local objects across many operations. Thanks for feedback. For the reference in case of git-backup `git fetch` or `git fetch-pack` would have to always do quickfetch scan or equivalent, because in case of backup repo there is only one reference in it - its master - and references of backed up repositories do not have anything representing them in backup.git/refs/ . Kirill
Re: [PATCH v2] fetch-pack: don't try to fetch peel values with --all
On Tue, Jun 12, 2018 at 06:54:17PM +, Kirill Smelkov wrote: > > If an extra connection isn't a problem, you might be better off with > > "git ls-remote", and then picking through the results for refs of > > interest, and then "git fetch-pack" to actually get the pack. That's how > > git-fetch worked when it was a shell script (e.g., see c3a200120d, the > > last shell version). > > Yes, this is what I ended up doing: > > https://lab.nexedi.com/kirr/git-backup/commit/899103bf > > but for another reason - to avoid repeating for every fetched repository > slow (in case of my "big" destination backup repository) quickfetch() > checking in every spawned `git fetch`: git-backup can build index of > objects we already have ourselves only once at startup, and then in > fetch, after checking lsremote output, consult that index, and if we see > we already have everything for an advertised reference - just avoid > giving it to fetch-pack to process. It turns out for many pulled > repositories there is usually no references changed at all and this way > fetch-pack can be skipped completely: > > https://lab.nexedi.com/kirr/git-backup/commit/3efed898 Thanks for sharing that, it's an interesting case. I'd hope that git-fetch is smart enough not to even bother with quickfetch() if there are no refs to update. But if we have even one change to fetch, then yeah, in the general case it makes sense to me that you could do better by amortizing the scan of local objects across many operations. -Peff
Re: [PATCH v2] fetch-pack: don't try to fetch peel values with --all
On Tue, Jun 12, 2018 at 05:48:49AM -0400, Jeff King wrote: > On Mon, Jun 11, 2018 at 09:43:02AM +, Kirill Smelkov wrote: > > > > Looking deeper, we do not need these trees and blobs at all. The problem > > > is really just a tag that peels to an object that is not otherwise a ref > > > tip, regardless of its type. > > > > Thanks for feedback and for coming up with the fix. Sure, I'm ok with > > moving the test into your patch. However, even if a test becomes > > different - narrowing down root of _current_ problem, I suggest to also > > keep explicitly testing tag-to-blob and tag-to-tree (and if we really > > also want tag-to-commit and tag-to-tag) behaviour. Reason is: if we skip > > those now, they can potentially break in the future. > > Yeah, I have no problem testing these cases separately. There's no bug > with them now, but it is a slightly uncommon case. My suggestion would > be to submit a patch that goes on top of mine that covers these cases. Ok, I will try to do it. > > I would also suggest to fix upload-pack, as it is just not consistent to > > reject sending objects that were advertised, and so can strike again > > some way in the future. After all git.git's fetch-pack is not the only > > git client that should be possible to interact with git.git's > > upload-pack on remote side, right? > > No, it's not the only client. At the same time, I am on the fence over > whether upload-pack's behavior is wrong or not. It depends what you take > a peeled advertisement line to mean. Does it mean: this object has been > advertised and clients should be able to fetch it? Or does it mean: by > the way, you may be interested to know the peeled value of this tag in > case you want to do tag-following? > > So far I think it has only meant the latter. I could see an argument for > the former, but any client depending on that would never have worked, > AFAICT. We could _make_ it work, but how would a client know which > server version it's talking to (and therefore whether it is safe to make > the request?). I think you'd have to add a capability to negotiate. I see. I don't know the details of the exchange, just it was surprising for outside observer that fetching what was advertised is rejected. For the reference there is no strong need for me for this to work anymore (please see below). > > I'm not sure, but I would say that `fetch-pack --all` from an empty > > repository should not fail and should just give empty output as fetch > > does. > > Yeah, that seems reasonable to me. The die() that catches this dates > back to 2005-era, and we later taught the "fetch" porcelain to handle > this. I don't _think_ anybody would be upset that the plumbing learned > to treat this as a noop. It's probably a one-liner change in > fetch_pack() to return early instead of dying. Ok, I will try to send related testcase, and it is indeed easy to find - the fix itself. > > For the reference all the cases presented here are real - they appear in > > our repositories on lab.nexedi.com for which I maintain the backup, and > > I've noticed them in the process of switching git-backup from using > > fetch to fetch-pack here: > > > > https://lab.nexedi.com/kirr/git-backup/blob/0ab7bbb6/git-backup.go#L436 > > I applaud you using the porcelain for your scripts, but I suspect that > fetch-pack by itself is not at all well-used or well-tested these days > (certainly this --all bug has been around for almost 6 years and is not > very hard to trigger in practice). I see; thanks for the warning. > If an extra connection isn't a problem, you might be better off with > "git ls-remote", and then picking through the results for refs of > interest, and then "git fetch-pack" to actually get the pack. That's how > git-fetch worked when it was a shell script (e.g., see c3a200120d, the > last shell version). Yes, this is what I ended up doing: https://lab.nexedi.com/kirr/git-backup/commit/899103bf but for another reason - to avoid repeating for every fetched repository slow (in case of my "big" destination backup repository) quickfetch() checking in every spawned `git fetch`: git-backup can build index of objects we already have ourselves only once at startup, and then in fetch, after checking lsremote output, consult that index, and if we see we already have everything for an advertised reference - just avoid giving it to fetch-pack to process. It turns out for many pulled repositories there is usually no references changed at all and this way fetch-pack can be skipped completely: https://lab.nexedi.com/kirr/git-backup/commit/3efed898 > It may also be sane to just use "git fetch", which I'd say is _fairly_ > safe to script. Of course I have no problem if you want to fix all of > the corner cases in fetch-pack. Just giving you fair warning. :) Thanks again for the warning. I'm happy the switch to fetch plumbing happenned on my side, and so far it is working well. Like I said above I cannot use `git fetch` as is, becaus
Re: [PATCH v2] fetch-pack: don't try to fetch peel values with --all
On Mon, Jun 11, 2018 at 09:43:02AM +, Kirill Smelkov wrote: > > Looking deeper, we do not need these trees and blobs at all. The problem > > is really just a tag that peels to an object that is not otherwise a ref > > tip, regardless of its type. > > Thanks for feedback and for coming up with the fix. Sure, I'm ok with > moving the test into your patch. However, even if a test becomes > different - narrowing down root of _current_ problem, I suggest to also > keep explicitly testing tag-to-blob and tag-to-tree (and if we really > also want tag-to-commit and tag-to-tag) behaviour. Reason is: if we skip > those now, they can potentially break in the future. Yeah, I have no problem testing these cases separately. There's no bug with them now, but it is a slightly uncommon case. My suggestion would be to submit a patch that goes on top of mine that covers these cases. > I would also suggest to fix upload-pack, as it is just not consistent to > reject sending objects that were advertised, and so can strike again > some way in the future. After all git.git's fetch-pack is not the only > git client that should be possible to interact with git.git's > upload-pack on remote side, right? No, it's not the only client. At the same time, I am on the fence over whether upload-pack's behavior is wrong or not. It depends what you take a peeled advertisement line to mean. Does it mean: this object has been advertised and clients should be able to fetch it? Or does it mean: by the way, you may be interested to know the peeled value of this tag in case you want to do tag-following? So far I think it has only meant the latter. I could see an argument for the former, but any client depending on that would never have worked, AFAICT. We could _make_ it work, but how would a client know which server version it's talking to (and therefore whether it is safe to make the request?). I think you'd have to add a capability to negotiate. > I'm not sure, but I would say that `fetch-pack --all` from an empty > repository should not fail and should just give empty output as fetch > does. Yeah, that seems reasonable to me. The die() that catches this dates back to 2005-era, and we later taught the "fetch" porcelain to handle this. I don't _think_ anybody would be upset that the plumbing learned to treat this as a noop. It's probably a one-liner change in fetch_pack() to return early instead of dying. > For the reference all the cases presented here are real - they appear in > our repositories on lab.nexedi.com for which I maintain the backup, and > I've noticed them in the process of switching git-backup from using > fetch to fetch-pack here: > > https://lab.nexedi.com/kirr/git-backup/blob/0ab7bbb6/git-backup.go#L436 I applaud you using the porcelain for your scripts, but I suspect that fetch-pack by itself is not at all well-used or well-tested these days (certainly this --all bug has been around for almost 6 years and is not very hard to trigger in practice). If an extra connection isn't a problem, you might be better off with "git ls-remote", and then picking through the results for refs of interest, and then "git fetch-pack" to actually get the pack. That's how git-fetch worked when it was a shell script (e.g., see c3a200120d, the last shell version). It may also be sane to just use "git fetch", which I'd say is _fairly_ safe to script. Of course I have no problem if you want to fix all of the corner cases in fetch-pack. Just giving you fair warning. :) -Peff
Re: [PATCH v2] fetch-pack: don't try to fetch peel values with --all
Jeff, On Mon, Jun 11, 2018 at 01:53:57AM -0400, Jeff King wrote: > On Mon, Jun 11, 2018 at 01:28:23AM -0400, Eric Sunshine wrote: > > > On Mon, Jun 11, 2018 at 12:47 AM, Jeff King wrote: > > > Subject: fetch-pack: don't try to fetch peeled values with --all > > > [...] > > > Original report and test from Kirill Smelkov. > > > > > > Signed-off-by: Kirill Smelkov > > > Signed-off-by: Jeff King > > > --- > > > diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh > > > @@ -506,30 +506,45 @@ test_expect_success 'test missing ref before > > > existing' ' > > > +test_expect_success 'test --all wrt tag to non-commits' ' > > > + blob_sha1=$(echo "hello blob" | git hash-object -t blob -w > > > --stdin) && > > > + git tag -a -m "tag -> blob" tag-to-blob $blob_sha1 && > > > + tree_sha1=$(printf "100644 blob $blob_sha1\tfile\n" | git mktree) > > > && > > > > Perhaps modernize these names to 'blob_oid' and 'tree_oid', or even > > simpler, just 'blob' and 'tree'. > > Looking deeper, we do not need these trees and blobs at all. The problem > is really just a tag that peels to an object that is not otherwise a ref > tip, regardless of its type. Thanks for feedback and for coming up with the fix. Sure, I'm ok with moving the test into your patch. However, even if a test becomes different - narrowing down root of _current_ problem, I suggest to also keep explicitly testing tag-to-blob and tag-to-tree (and if we really also want tag-to-commit and tag-to-tag) behaviour. Reason is: if we skip those now, they can potentially break in the future. I would also suggest to fix upload-pack, as it is just not consistent to reject sending objects that were advertised, and so can strike again some way in the future. After all git.git's fetch-pack is not the only git client that should be possible to interact with git.git's upload-pack on remote side, right? By the way, another problem I noticed with fetch-pack is that fetching with --all from completely empty repository also fails: .../r$ git init --bare repo.git Initialized empty Git repository in /home/kirr/tmp/trashme/r/repo.git/ .../r$ mkdir clone .../r$ cd clone/ .../r/clone$ git init Initialized empty Git repository in /home/kirr/tmp/trashme/r/clone/.git/ .../r/clone$ git fetch-pack --all ../repo.git/ fatal: no matching remote head .../r/clone$ echo $? 128 .../r/clone$ git ls-remote ../repo.git/ .../r/clone$ echo $? 0 .../r/clone$ git fetch ../repo.git/ 'refs/*:refs/repo/*' .../r/clone$ echo $? 0 I'm not sure, but I would say that `fetch-pack --all` from an empty repository should not fail and should just give empty output as fetch does. For the reference all the cases presented here are real - they appear in our repositories on lab.nexedi.com for which I maintain the backup, and I've noticed them in the process of switching git-backup from using fetch to fetch-pack here: https://lab.nexedi.com/kirr/git-backup/blob/0ab7bbb6/git-backup.go#L436 Kirill > So below is a patch that simplifies the test even further (the actual > code change is the same). > > > > + git tag -a -m "tag -> tree" tag-to-tree $tree_sha1 && > > > + mkdir fetchall && > > > + ( > > > + cd fetchall && > > > + git init && > > > + git fetch-pack --all .. && > > > > Simpler: > > > > git init fetchall && > > ( > > cd fetchall && > > git fetch-pack --all .. && > > > > Although, I see that this script already has a mix of the two styles > > (simpler and not-so-simple), so... > > The nearby tests actually reuse the "client" directory. We can do that, > too, if we simply create new objects for our test, to make sure they > still need fetching. See below (we could also use "git -C" there, but > the subshell matches the other tests). > > -- >8 -- > Subject: fetch-pack: don't try to fetch peel values with --all > > When "fetch-pack --all" sees a tag-to-blob on the remote, it > tries to fetch both the tag itself ("refs/tags/foo") and the > peeled value that the remote advertises ("refs/tags/foo^{}"). > Asking for the object pointed to by the latter can cause > upload-pack to complain with "not our ref", since it does > not mark the peeled objects with the OUR_REF (unless they > were at the tip of some other ref). > > Arguably upload-pack _should_ be marking those peeled > objects. But it never has in the past, since clients would > generally just ask for the tag and expect to get the peeled > value along with it. And that's how "git fetch" works, as > well as older versions of "fetch-pack --all". > > The problem was introduced by 5f0fc64513 (fetch-pack: > eliminate spurious error messages, 2012-09-09). Before then, > the matching logic was something like: > > if (refname is ill-formed) > do nothing > else if (doing --all) > always consider it matched
[PATCH v2] fetch-pack: don't try to fetch peel values with --all
On Mon, Jun 11, 2018 at 01:28:23AM -0400, Eric Sunshine wrote: > On Mon, Jun 11, 2018 at 12:47 AM, Jeff King wrote: > > Subject: fetch-pack: don't try to fetch peeled values with --all > > [...] > > Original report and test from Kirill Smelkov. > > > > Signed-off-by: Kirill Smelkov > > Signed-off-by: Jeff King > > --- > > diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh > > @@ -506,30 +506,45 @@ test_expect_success 'test missing ref before > > existing' ' > > +test_expect_success 'test --all wrt tag to non-commits' ' > > + blob_sha1=$(echo "hello blob" | git hash-object -t blob -w --stdin) > > && > > + git tag -a -m "tag -> blob" tag-to-blob $blob_sha1 && > > + tree_sha1=$(printf "100644 blob $blob_sha1\tfile\n" | git mktree) && > > Perhaps modernize these names to 'blob_oid' and 'tree_oid', or even > simpler, just 'blob' and 'tree'. Looking deeper, we do not need these trees and blobs at all. The problem is really just a tag that peels to an object that is not otherwise a ref tip, regardless of its type. So below is a patch that simplifies the test even further (the actual code change is the same). > > + git tag -a -m "tag -> tree" tag-to-tree $tree_sha1 && > > + mkdir fetchall && > > + ( > > + cd fetchall && > > + git init && > > + git fetch-pack --all .. && > > Simpler: > > git init fetchall && > ( > cd fetchall && > git fetch-pack --all .. && > > Although, I see that this script already has a mix of the two styles > (simpler and not-so-simple), so... The nearby tests actually reuse the "client" directory. We can do that, too, if we simply create new objects for our test, to make sure they still need fetching. See below (we could also use "git -C" there, but the subshell matches the other tests). -- >8 -- Subject: fetch-pack: don't try to fetch peel values with --all When "fetch-pack --all" sees a tag-to-blob on the remote, it tries to fetch both the tag itself ("refs/tags/foo") and the peeled value that the remote advertises ("refs/tags/foo^{}"). Asking for the object pointed to by the latter can cause upload-pack to complain with "not our ref", since it does not mark the peeled objects with the OUR_REF (unless they were at the tip of some other ref). Arguably upload-pack _should_ be marking those peeled objects. But it never has in the past, since clients would generally just ask for the tag and expect to get the peeled value along with it. And that's how "git fetch" works, as well as older versions of "fetch-pack --all". The problem was introduced by 5f0fc64513 (fetch-pack: eliminate spurious error messages, 2012-09-09). Before then, the matching logic was something like: if (refname is ill-formed) do nothing else if (doing --all) always consider it matched else look through list of sought refs for a match That commit wanted to flip the order of the second two arms of that conditional. But we ended up with: if (refname is ill-formed) do nothing else look through list of sought refs for a match if (--all and no match so far) always consider it matched That means tha an ill-formed ref will trigger the --all conditional block, even though we should just be ignoring it. We can fix that by having a single "else" with all of the well-formed logic, that checks the sought refs and "--all" in the correct order. Reported-by: Kirill Smelkov Signed-off-by: Jeff King --- fetch-pack.c | 8 t/t5500-fetch-pack.sh | 10 ++ 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/fetch-pack.c b/fetch-pack.c index a320ce9872..cc7a42fee9 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -657,11 +657,11 @@ static void filter_refs(struct fetch_pack_args *args, } i++; } - } - if (!keep && args->fetch_all && - (!args->deepen || !starts_with(ref->name, "refs/tags/"))) - keep = 1; + if (!keep && args->fetch_all && + (!args->deepen || !starts_with(ref->name, "refs/tags/"))) + keep = 1; + } if (keep) { *newtail = ref; diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh index d4f435155f..5726f83ea3 100755 --- a/t/t5500-fetch-pack.sh +++ b/t/t5500-fetch-pack.sh @@ -518,6 +518,16 @@ test_expect_success 'test --all, --depth, and explicit tag' ' ) >out-adt 2>error-adt ' +test_expect_success 'test --all with tag to non-tip' ' + git commit --allow-empty -m non-tip && + git commit --allow-empty -m tip && + git tag -m "annotated" non-tip HEAD^ && + ( + cd client && + git fetch-pack --all .. + ) +' + test_expect_success 'shallow fetch with tags does not break the repo