Re: [PATCH 4/4] archive: ignore blob objects when checking reachability

2013-06-06 Thread Michael Haggerty
On 06/06/2013 12:40 AM, Jeff King wrote:
 We cannot create an archive from a blob object, so we would
 not expect anyone to provide one to us. And if they do, we
 will fail anyway just after the reachability check.  We can
 therefore optimize our reachability check to ignore blobs
 completely, and not even create a struct blob for them.
 
 Depending on the repository size and the exact place we find
 the reachable object in the traversal, this can save 20-25%,
 a we can avoid many lookups in the object hash.
 
 The downside of this is that a blob provided to a remote
 archive process will fail with no such object rather than
 object is not a tree (we could organize the code to retain
 the old message, but since we no longer know whether the
 blob is reachable or not, we would potentially be leaking
 information about the existence of unreachable objects).

Could we change the error message to no such tree object to be
non-committal about the reason for the failure?


For a moment I thought that one could get correct error messages while
retaining the speed gain in the usual case by doing a quick object
lookup, and then

check type of object
if object is missing:
die(there is no such object)
else if object is a blob:
do reachability test including blobs
if object is not reachable:
die(there is no such object)
else
die(object is not a tree)
else
do reachability test excluding blobs
etc

However, even this would leak information about the existence of
nonreachable objects to a client measuring time time for the response
because the death due to non-reachability would take longer than death
due to missing object.  So, if one would insist on correct error
messages and no information leakage, one could just skip the first
object is missing optimization (it should be pretty rare anyway!) like so:

check type of object
if object is missing or object is a blob:
/* Force the same delay in either case: */
do reachability test including blobs
if object is missing or object is not reachable:
die(there is no such object)
else
die(object is not a tree)
else
do reachability test excluding blobs
etc

I'm not suggesting that the extra effort is worth it; I just wanted to
mention the possibility.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] archive: ignore blob objects when checking reachability

2013-06-06 Thread Eric Sunshine
On Wed, Jun 5, 2013 at 6:40 PM, Jeff King p...@peff.net wrote:
 We cannot create an archive from a blob object, so we would
 not expect anyone to provide one to us. And if they do, we
 will fail anyway just after the reachability check.  We can
 therefore optimize our reachability check to ignore blobs
 completely, and not even create a struct blob for them.

 Depending on the repository size and the exact place we find
 the reachable object in the traversal, this can save 20-25%,
 a we can avoid many lookups in the object hash.

s/a/as/

 The downside of this is that a blob provided to a remote
 archive process will fail with no such object rather than
 object is not a tree (we could organize the code to retain
 the old message, but since we no longer know whether the
 blob is reachable or not, we would potentially be leaking
 information about the existence of unreachable objects).

 Signed-off-by: Jeff King p...@peff.net
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] archive: ignore blob objects when checking reachability

2013-06-05 Thread Jeff King
We cannot create an archive from a blob object, so we would
not expect anyone to provide one to us. And if they do, we
will fail anyway just after the reachability check.  We can
therefore optimize our reachability check to ignore blobs
completely, and not even create a struct blob for them.

Depending on the repository size and the exact place we find
the reachable object in the traversal, this can save 20-25%,
a we can avoid many lookups in the object hash.

The downside of this is that a blob provided to a remote
archive process will fail with no such object rather than
object is not a tree (we could organize the code to retain
the old message, but since we no longer know whether the
blob is reachable or not, we would potentially be leaking
information about the existence of unreachable objects).

Signed-off-by: Jeff King p...@peff.net
---
 archive.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/archive.c b/archive.c
index 4d77624..98691cd 100644
--- a/archive.c
+++ b/archive.c
@@ -290,6 +290,7 @@ static int object_is_reachable(const unsigned char *sha1)
save_commit_buffer = 0;
init_revisions(data.revs, NULL);
setup_revisions(ARRAY_SIZE(argv) - 1, argv, data.revs, NULL);
+   data.revs.blob_objects = 0;
if (prepare_revision_walk(data.revs))
return 0;
 
-- 
1.8.3.rc2.14.g7eee6b3
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html