Re: [BUG] allowtipsha1inwant serves unreachable blobs if you know its hash

2017-02-23 Thread Jeff King
On Thu, Feb 23, 2017 at 03:03:58PM -0800, Jonathan Tan wrote:

> If a server sets allowtipsha1inwant (or allowreachablesha1inwant), a
> client can call "git fetch  " where SHA-1 is the hash of
> a blob (reachable or unreachable) to obtain it. The test below (which
> passes) demonstrates that.

Thanks for a nice demonstration.

TBH, I think the whole "we will not give you unreachable things" is
overblown as a security measure. There are a lot of ways to leak
information out of the repo. E.g., claiming you _do_ have the
unreachable thing, at which point pack-objects may create a delta
against it. Done repeatedly, it works as an oracle by which you can
guess the contents of the unreachable thing.

> This happens most likely because the "rev-list" call in
> "do_reachable_revlist" (called through "has_unreachable") is invoked
> without the "--objects" argument. "has_unreachable" determines that an
> object is unreachable if nothing is printed to stdout, which normally
> works, except that "rev-list" prints nothing when asked which commits
> are reachable from a blob (which makes sense).
> 
> Adding "--objects" works, and all existing tests pass, except for the
> potential performance issue and the side effect that even fetching a
> reachable blob no longer works. This is due to a possible bug where a
> call like "git rev-list --objects $tree ^master" (where $tree is the
> tree object corresponding to master) prints out objects even though all
> objects reachable from the tree are also reachable from master. (There
> should be no issue with "get_reachable_list", the other invoker of
> "do_reachable_revlist", because non-commits in the command's stdout are
> skipped.)

I think "--objects --use-bitmap-index" would be faster. If you have
bitmaps.

-Peff


Re: [BUG] allowtipsha1inwant serves unreachable blobs if you know its hash

2017-02-23 Thread Junio C Hamano
Junio C Hamano  writes:

> Jonathan Tan  writes:
>
>> Adding "--objects" works, and all existing tests pass, except for the
>> potential performance issue and the side effect that even fetching a
>> reachable blob no longer works. This is due to a possible bug where a
>> call like "git rev-list --objects $tree ^master" (where $tree is the
>> tree object corresponding to master) prints out objects ...
>
> The "reachable from this, excluding what is reachable from that"
> notation was originally designed to work only on commits, and I
> wouldn't be surprised if "$tree ^master" did not work as you expect
> in the current implementation.
>
> I agree that ideally it shouldn't show anything, but I suspect that
> it would make it very expensive if done naively---we'd end up having
> to call mark_tree_uninteresting() for all uninteresting commits, not
> just for the commits at the edge of the DAG as we do right now.

BTW, by "it would be expensive" I didn't mean "hence it shouldn't be
done."  Even though I do not know by how much expensive it would
become, I think this approach is a lot better way going forward than
punting and say "no, you cannot specify anything lower than commit."

Thanks for looking into this.



Re: [BUG] allowtipsha1inwant serves unreachable blobs if you know its hash

2017-02-23 Thread Jeff King
On Thu, Feb 23, 2017 at 03:50:00PM -0800, Junio C Hamano wrote:

> Jonathan Tan  writes:
> 
> > Adding "--objects" works, and all existing tests pass, except for the
> > potential performance issue and the side effect that even fetching a
> > reachable blob no longer works. This is due to a possible bug where a
> > call like "git rev-list --objects $tree ^master" (where $tree is the
> > tree object corresponding to master) prints out objects ...
> 
> The "reachable from this, excluding what is reachable from that"
> notation was originally designed to work only on commits, and I
> wouldn't be surprised if "$tree ^master" did not work as you expect
> in the current implementation.
> 
> I agree that ideally it shouldn't show anything, but I suspect that
> it would make it very expensive if done naively---we'd end up having
> to call mark_tree_uninteresting() for all uninteresting commits, not
> just for the commits at the edge of the DAG as we do right now.

Yes, it's super-expensive to do naively (like 40+ seconds of CPU on
torvalds/linux). Bitmaps should generally make it tolerable, though
there are corner cases (e.g., if a ref tip has to walk a bit to get to a
bitmap; we try to put bitmaps near the ref tips, but when you have
50,000 tags it's hard to do).

We've explored similar things at GitHub for doing reachability checks on
all 40-hex lookups (because right now I can point you to
.../git/git/commit/1234abcd and you see that object, even if it's only
in _my_ fork and not reachable from git/git).

-Peff


Re: [BUG] allowtipsha1inwant serves unreachable blobs if you know its hash

2017-02-23 Thread Junio C Hamano
Jonathan Tan  writes:

> Adding "--objects" works, and all existing tests pass, except for the
> potential performance issue and the side effect that even fetching a
> reachable blob no longer works. This is due to a possible bug where a
> call like "git rev-list --objects $tree ^master" (where $tree is the
> tree object corresponding to master) prints out objects ...

The "reachable from this, excluding what is reachable from that"
notation was originally designed to work only on commits, and I
wouldn't be surprised if "$tree ^master" did not work as you expect
in the current implementation.

I agree that ideally it shouldn't show anything, but I suspect that
it would make it very expensive if done naively---we'd end up having
to call mark_tree_uninteresting() for all uninteresting commits, not
just for the commits at the edge of the DAG as we do right now.


[BUG] allowtipsha1inwant serves unreachable blobs if you know its hash

2017-02-23 Thread Jonathan Tan
If a server sets allowtipsha1inwant (or allowreachablesha1inwant), a
client can call "git fetch  " where SHA-1 is the hash of
a blob (reachable or unreachable) to obtain it. The test below (which
passes) demonstrates that.

I have bisected this, and this bug occurs at least as early as the
introduction of allowreachablesha1inwant in commit 68ee628
("upload-pack: optionally allow fetching reachable sha1", 2015-05-21).
It may have occurred earlier, though - initially, I thought that this
was due to allowreachablesha1inwant, so I did not investigate
allowtipsha1inwant until later.

I found this out while investigating the feasibility of using the
existing fetch-pack/upload-pack protocol to support fetching of
arbitrary blobs.

This happens most likely because the "rev-list" call in
"do_reachable_revlist" (called through "has_unreachable") is invoked
without the "--objects" argument. "has_unreachable" determines that an
object is unreachable if nothing is printed to stdout, which normally
works, except that "rev-list" prints nothing when asked which commits
are reachable from a blob (which makes sense).

Adding "--objects" works, and all existing tests pass, except for the
potential performance issue and the side effect that even fetching a
reachable blob no longer works. This is due to a possible bug where a
call like "git rev-list --objects $tree ^master" (where $tree is the
tree object corresponding to master) prints out objects even though all
objects reachable from the tree are also reachable from master. (There
should be no issue with "get_reachable_list", the other invoker of
"do_reachable_revlist", because non-commits in the command's stdout are
skipped.)

The other option, of course, is to whitelist object types (that is, only
allow commits and tags - I haven't checked if this is sufficient,
though). This might be a sufficient temporary fix, although I expect
that we would still have to revisit this later (especially if we decide
that we want to allow downloading blobs through this interface).

I don't mind taking a look at either solution, but thought of putting
this out first in case people have any opinion or insight into this
problem and its possible solutions.

(CC-ing some people who might have an interest in downloading arbitrary
blobs.)

Signed-off-by: Jonathan Tan 
---
 t/t-mytests.sh | 61 ++
 1 file changed, 61 insertions(+)
 create mode 100644 t/t-mytests.sh

diff --git a/t/t-mytests.sh b/t/t-mytests.sh
new file mode 100644
index 0..e71cfbf48
--- /dev/null
+++ b/t/t-mytests.sh
@@ -0,0 +1,61 @@
+#!/bin/sh
+
+test_description='my tests'
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'setup' '
+   server="$HTTPD_DOCUMENT_ROOT_PATH/blobs" &&
+   git init "$server" &&
+   (
+   cd "$server" &&
+   test_commit myfile
+   ) &&
+
+   # A reachable blob.
+   reachable=$(
+   git -C "$server" cat-file -p \
+   $(git -C "$server" cat-file -p HEAD | grep "^tree " | cut -c6-) 
| \
+   grep myfile | cut -c13-52) &&
+   test -n "$reachable" &&
+
+   # 2 unreachable blobs. Only one will be fetched. (2 are included here
+   # to demonstrate that there is no whole-repo copying or anything like
+   # that.)
+   unreachable=$(echo abc123 | git -C "$server" hash-object -w --stdin) &&
+   test -n "$unreachable" &&
+   another_unreachable=$(echo def456 | git -C "$server" hash-object -w 
--stdin) &&
+   test -n "$another_unreachable" &&
+
+   git init --bare client.git &&
+   git -C client.git remote add origin "$HTTPD_URL/smart/blobs" &&
+
+   # These fetches are supposed to fail (and do)
+   test_must_fail git -C client.git fetch origin $reachable &&
+   test_must_fail git -C client.git fetch origin $unreachable
+'
+
+test_expect_success 'allowtipsha1inwant suddenly allows blobs' '
+   test -n "$reachable" &&
+
+   git -C "$server" config uploadpack.allowtipsha1inwant 1 &&
+
+   # This fetch passes
+   git -C client.git fetch origin $reachable &&
+   test "myfile" = $(git -C client.git cat-file -p $reachable)
+'
+
+test_expect_success 'even unreachable ones' '
+   test -n "$unreachable" &&
+
+   # This fetch is supposed to fail (for multiple reasons), but passes.
+   # Only the wanted blob is fetched.
+   git -C client.git fetch origin $unreachable &&
+   git -C client.git cat-file -e $unreachable &&
+   test_must_fail git -C client.git cat-file -e $another_unreachable &&
+   test "abc123" = $(git -C client.git cat-file -p $unreachable)
+'
+
+stop_httpd
+test_done
-- 
2.11.0.483.g087da7b7c-goog