Re: [PATCH 4/4] archive: ignore blob objects when checking reachability
On 06/06/2013 12:40 AM, Jeff King wrote: We cannot create an archive from a blob object, so we would not expect anyone to provide one to us. And if they do, we will fail anyway just after the reachability check. We can therefore optimize our reachability check to ignore blobs completely, and not even create a struct blob for them. Depending on the repository size and the exact place we find the reachable object in the traversal, this can save 20-25%, a we can avoid many lookups in the object hash. The downside of this is that a blob provided to a remote archive process will fail with no such object rather than object is not a tree (we could organize the code to retain the old message, but since we no longer know whether the blob is reachable or not, we would potentially be leaking information about the existence of unreachable objects). Could we change the error message to no such tree object to be non-committal about the reason for the failure? For a moment I thought that one could get correct error messages while retaining the speed gain in the usual case by doing a quick object lookup, and then check type of object if object is missing: die(there is no such object) else if object is a blob: do reachability test including blobs if object is not reachable: die(there is no such object) else die(object is not a tree) else do reachability test excluding blobs etc However, even this would leak information about the existence of nonreachable objects to a client measuring time time for the response because the death due to non-reachability would take longer than death due to missing object. So, if one would insist on correct error messages and no information leakage, one could just skip the first object is missing optimization (it should be pretty rare anyway!) like so: check type of object if object is missing or object is a blob: /* Force the same delay in either case: */ do reachability test including blobs if object is missing or object is not reachable: die(there is no such object) else die(object is not a tree) else do reachability test excluding blobs etc I'm not suggesting that the extra effort is worth it; I just wanted to mention the possibility. Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] archive: ignore blob objects when checking reachability
On Wed, Jun 5, 2013 at 6:40 PM, Jeff King p...@peff.net wrote: We cannot create an archive from a blob object, so we would not expect anyone to provide one to us. And if they do, we will fail anyway just after the reachability check. We can therefore optimize our reachability check to ignore blobs completely, and not even create a struct blob for them. Depending on the repository size and the exact place we find the reachable object in the traversal, this can save 20-25%, a we can avoid many lookups in the object hash. s/a/as/ The downside of this is that a blob provided to a remote archive process will fail with no such object rather than object is not a tree (we could organize the code to retain the old message, but since we no longer know whether the blob is reachable or not, we would potentially be leaking information about the existence of unreachable objects). Signed-off-by: Jeff King p...@peff.net -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] archive: ignore blob objects when checking reachability
We cannot create an archive from a blob object, so we would not expect anyone to provide one to us. And if they do, we will fail anyway just after the reachability check. We can therefore optimize our reachability check to ignore blobs completely, and not even create a struct blob for them. Depending on the repository size and the exact place we find the reachable object in the traversal, this can save 20-25%, a we can avoid many lookups in the object hash. The downside of this is that a blob provided to a remote archive process will fail with no such object rather than object is not a tree (we could organize the code to retain the old message, but since we no longer know whether the blob is reachable or not, we would potentially be leaking information about the existence of unreachable objects). Signed-off-by: Jeff King p...@peff.net --- archive.c | 1 + 1 file changed, 1 insertion(+) diff --git a/archive.c b/archive.c index 4d77624..98691cd 100644 --- a/archive.c +++ b/archive.c @@ -290,6 +290,7 @@ static int object_is_reachable(const unsigned char *sha1) save_commit_buffer = 0; init_revisions(data.revs, NULL); setup_revisions(ARRAY_SIZE(argv) - 1, argv, data.revs, NULL); + data.revs.blob_objects = 0; if (prepare_revision_walk(data.revs)) return 0; -- 1.8.3.rc2.14.g7eee6b3 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html