Re: [BUG] allowtipsha1inwant serves unreachable blobs if you know its hash
On Thu, Feb 23, 2017 at 03:03:58PM -0800, Jonathan Tan wrote: > If a server sets allowtipsha1inwant (or allowreachablesha1inwant), a > client can call "git fetch " where SHA-1 is the hash of > a blob (reachable or unreachable) to obtain it. The test below (which > passes) demonstrates that. Thanks for a nice demonstration. TBH, I think the whole "we will not give you unreachable things" is overblown as a security measure. There are a lot of ways to leak information out of the repo. E.g., claiming you _do_ have the unreachable thing, at which point pack-objects may create a delta against it. Done repeatedly, it works as an oracle by which you can guess the contents of the unreachable thing. > This happens most likely because the "rev-list" call in > "do_reachable_revlist" (called through "has_unreachable") is invoked > without the "--objects" argument. "has_unreachable" determines that an > object is unreachable if nothing is printed to stdout, which normally > works, except that "rev-list" prints nothing when asked which commits > are reachable from a blob (which makes sense). > > Adding "--objects" works, and all existing tests pass, except for the > potential performance issue and the side effect that even fetching a > reachable blob no longer works. This is due to a possible bug where a > call like "git rev-list --objects $tree ^master" (where $tree is the > tree object corresponding to master) prints out objects even though all > objects reachable from the tree are also reachable from master. (There > should be no issue with "get_reachable_list", the other invoker of > "do_reachable_revlist", because non-commits in the command's stdout are > skipped.) I think "--objects --use-bitmap-index" would be faster. If you have bitmaps. -Peff
Re: [BUG] allowtipsha1inwant serves unreachable blobs if you know its hash
Junio C Hamano writes: > Jonathan Tan writes: > >> Adding "--objects" works, and all existing tests pass, except for the >> potential performance issue and the side effect that even fetching a >> reachable blob no longer works. This is due to a possible bug where a >> call like "git rev-list --objects $tree ^master" (where $tree is the >> tree object corresponding to master) prints out objects ... > > The "reachable from this, excluding what is reachable from that" > notation was originally designed to work only on commits, and I > wouldn't be surprised if "$tree ^master" did not work as you expect > in the current implementation. > > I agree that ideally it shouldn't show anything, but I suspect that > it would make it very expensive if done naively---we'd end up having > to call mark_tree_uninteresting() for all uninteresting commits, not > just for the commits at the edge of the DAG as we do right now. BTW, by "it would be expensive" I didn't mean "hence it shouldn't be done." Even though I do not know by how much expensive it would become, I think this approach is a lot better way going forward than punting and say "no, you cannot specify anything lower than commit." Thanks for looking into this.
Re: [BUG] allowtipsha1inwant serves unreachable blobs if you know its hash
On Thu, Feb 23, 2017 at 03:50:00PM -0800, Junio C Hamano wrote: > Jonathan Tan writes: > > > Adding "--objects" works, and all existing tests pass, except for the > > potential performance issue and the side effect that even fetching a > > reachable blob no longer works. This is due to a possible bug where a > > call like "git rev-list --objects $tree ^master" (where $tree is the > > tree object corresponding to master) prints out objects ... > > The "reachable from this, excluding what is reachable from that" > notation was originally designed to work only on commits, and I > wouldn't be surprised if "$tree ^master" did not work as you expect > in the current implementation. > > I agree that ideally it shouldn't show anything, but I suspect that > it would make it very expensive if done naively---we'd end up having > to call mark_tree_uninteresting() for all uninteresting commits, not > just for the commits at the edge of the DAG as we do right now. Yes, it's super-expensive to do naively (like 40+ seconds of CPU on torvalds/linux). Bitmaps should generally make it tolerable, though there are corner cases (e.g., if a ref tip has to walk a bit to get to a bitmap; we try to put bitmaps near the ref tips, but when you have 50,000 tags it's hard to do). We've explored similar things at GitHub for doing reachability checks on all 40-hex lookups (because right now I can point you to .../git/git/commit/1234abcd and you see that object, even if it's only in _my_ fork and not reachable from git/git). -Peff
Re: [BUG] allowtipsha1inwant serves unreachable blobs if you know its hash
Jonathan Tan writes: > Adding "--objects" works, and all existing tests pass, except for the > potential performance issue and the side effect that even fetching a > reachable blob no longer works. This is due to a possible bug where a > call like "git rev-list --objects $tree ^master" (where $tree is the > tree object corresponding to master) prints out objects ... The "reachable from this, excluding what is reachable from that" notation was originally designed to work only on commits, and I wouldn't be surprised if "$tree ^master" did not work as you expect in the current implementation. I agree that ideally it shouldn't show anything, but I suspect that it would make it very expensive if done naively---we'd end up having to call mark_tree_uninteresting() for all uninteresting commits, not just for the commits at the edge of the DAG as we do right now.
[BUG] allowtipsha1inwant serves unreachable blobs if you know its hash
If a server sets allowtipsha1inwant (or allowreachablesha1inwant), a client can call "git fetch " where SHA-1 is the hash of a blob (reachable or unreachable) to obtain it. The test below (which passes) demonstrates that. I have bisected this, and this bug occurs at least as early as the introduction of allowreachablesha1inwant in commit 68ee628 ("upload-pack: optionally allow fetching reachable sha1", 2015-05-21). It may have occurred earlier, though - initially, I thought that this was due to allowreachablesha1inwant, so I did not investigate allowtipsha1inwant until later. I found this out while investigating the feasibility of using the existing fetch-pack/upload-pack protocol to support fetching of arbitrary blobs. This happens most likely because the "rev-list" call in "do_reachable_revlist" (called through "has_unreachable") is invoked without the "--objects" argument. "has_unreachable" determines that an object is unreachable if nothing is printed to stdout, which normally works, except that "rev-list" prints nothing when asked which commits are reachable from a blob (which makes sense). Adding "--objects" works, and all existing tests pass, except for the potential performance issue and the side effect that even fetching a reachable blob no longer works. This is due to a possible bug where a call like "git rev-list --objects $tree ^master" (where $tree is the tree object corresponding to master) prints out objects even though all objects reachable from the tree are also reachable from master. (There should be no issue with "get_reachable_list", the other invoker of "do_reachable_revlist", because non-commits in the command's stdout are skipped.) The other option, of course, is to whitelist object types (that is, only allow commits and tags - I haven't checked if this is sufficient, though). This might be a sufficient temporary fix, although I expect that we would still have to revisit this later (especially if we decide that we want to allow downloading blobs through this interface). I don't mind taking a look at either solution, but thought of putting this out first in case people have any opinion or insight into this problem and its possible solutions. (CC-ing some people who might have an interest in downloading arbitrary blobs.) Signed-off-by: Jonathan Tan --- t/t-mytests.sh | 61 ++ 1 file changed, 61 insertions(+) create mode 100644 t/t-mytests.sh diff --git a/t/t-mytests.sh b/t/t-mytests.sh new file mode 100644 index 0..e71cfbf48 --- /dev/null +++ b/t/t-mytests.sh @@ -0,0 +1,61 @@ +#!/bin/sh + +test_description='my tests' +. ./test-lib.sh +. "$TEST_DIRECTORY"/lib-httpd.sh +start_httpd + +test_expect_success 'setup' ' + server="$HTTPD_DOCUMENT_ROOT_PATH/blobs" && + git init "$server" && + ( + cd "$server" && + test_commit myfile + ) && + + # A reachable blob. + reachable=$( + git -C "$server" cat-file -p \ + $(git -C "$server" cat-file -p HEAD | grep "^tree " | cut -c6-) | \ + grep myfile | cut -c13-52) && + test -n "$reachable" && + + # 2 unreachable blobs. Only one will be fetched. (2 are included here + # to demonstrate that there is no whole-repo copying or anything like + # that.) + unreachable=$(echo abc123 | git -C "$server" hash-object -w --stdin) && + test -n "$unreachable" && + another_unreachable=$(echo def456 | git -C "$server" hash-object -w --stdin) && + test -n "$another_unreachable" && + + git init --bare client.git && + git -C client.git remote add origin "$HTTPD_URL/smart/blobs" && + + # These fetches are supposed to fail (and do) + test_must_fail git -C client.git fetch origin $reachable && + test_must_fail git -C client.git fetch origin $unreachable +' + +test_expect_success 'allowtipsha1inwant suddenly allows blobs' ' + test -n "$reachable" && + + git -C "$server" config uploadpack.allowtipsha1inwant 1 && + + # This fetch passes + git -C client.git fetch origin $reachable && + test "myfile" = $(git -C client.git cat-file -p $reachable) +' + +test_expect_success 'even unreachable ones' ' + test -n "$unreachable" && + + # This fetch is supposed to fail (for multiple reasons), but passes. + # Only the wanted blob is fetched. + git -C client.git fetch origin $unreachable && + git -C client.git cat-file -e $unreachable && + test_must_fail git -C client.git cat-file -e $another_unreachable && + test "abc123" = $(git -C client.git cat-file -p $unreachable) +' + +stop_httpd +test_done -- 2.11.0.483.g087da7b7c-goog