On Wed, 20 Apr 2005, Martin Uecker wrote:

> The other thing I don't like is the use of a sha1
> for a complete file. Switching to some kind of hash
> tree would allow introducing chunks later. This has
> two advantages:

You can (and my code demonstrates/will demonstrate) still use a whole-file hash while chunking. With content prefixes, computing all the hashes takes O(N ln M) time (where N is the file size and M is the number of chunks); if subtrees can share the same prefix, you can do this in O(N) time (i.e., as fast as possible, modulo a constant factor, which is '2'). You don't *need* internal hashing functions.
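
To make that concrete, here is a rough sketch in Python (my illustration only, not the code referred to above; the function name is made up, it doesn't show the content-prefix trick, and the chunk boundaries are simply handed in by the caller). The whole-file SHA-1 still names the object, while the chunk hashes stay a private object-store detail, all computed in a single scan of the data:

    import hashlib

    def whole_file_hash_with_chunks(data, boundaries):
        # Compute the whole-file SHA-1 (which still names the object)
        # and the per-chunk SHA-1s (an object-store detail) in one
        # scan.  `boundaries` lists the end offset of every chunk
        # except the last; how they are chosen is a separate question.
        whole = hashlib.sha1()
        chunk_ids = []
        prev = 0
        for end in list(boundaries) + [len(data)]:
            chunk = data[prev:end]
            whole.update(chunk)  # whole-file hash, O(N) over all chunks
            chunk_ids.append(hashlib.sha1(chunk).hexdigest())
            prev = end
        return whole.hexdigest(), chunk_ids

Note that every byte is hashed twice here (once for the file, once for its chunk) -- a constant factor of 2, in the spirit of the figure above.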


> It would allow git to scale to repositories of large
> binary files. And it would allow building a very cool
> content transport algorithm for those repositories.
> This algorithm could combine all the advantages of
> bittorrent and rsync (without the cpu load).

Yes, the big benefit of internal hashing is that it lets you check validity of a chunk w/o having the entire file available. I'm not sure that's terribly useful in this case. [And, if it is, then it can obviously be done w/ other means.]
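
For reference, "check a chunk without the entire file" is just the usual Merkle-tree property. A minimal sketch of the verification side (my own illustration, assuming SHA-1 and a binary tree; not a proposal for any particular git format):

    import hashlib

    def h(x):
        return hashlib.sha1(x).digest()

    def verify_chunk(chunk, proof, root):
        # `proof` is a list of (sibling_hash, sibling_is_left) pairs
        # from the leaf up to the root, so validating one chunk needs
        # only O(log M) extra hashes, not the other M-1 chunks.
        node = h(chunk)
        for sibling, sibling_is_left in proof:
            node = h(sibling + node) if sibling_is_left else h(node + sibling)
        return node == root

That O(log M) proof size is the whole win; if you never need to verify chunks in isolation, the tree buys you nothing here.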


> And it would allow trivial merging of patches which
> apply to different chunks of a file in exactly the same
> way as merging changesets which apply to different
> files in a tree.

I'm not sure anyone should be looking at chunks. To me, at least, they are an object-store-implementation detail only. For merging, etc., we should be looking at whole files, or (better) the whole repository.
The chunking algorithm is guaranteed not to respect semantic boundaries (for *some* semantics of *some* file).
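
To illustrate: content-defined chunkers put a boundary wherever a rolling hash of the recent bytes happens to hit a magic value. A toy version (made-up constants; real chunkers use Rabin fingerprints or similar), just to show that boundaries are a statistical accident of the bytes:

    def chunk_boundaries(data, mask=0x1FFF):
        # Toy rolling hash: each byte is mixed into a 32-bit state and
        # old bytes shift out, so the state depends only on recent
        # input.  A boundary falls wherever the low bits match `mask`
        # (expected chunk size here is about 8K), with no regard for
        # lines, functions, or any other semantic unit.
        boundaries = []
        state = 0
        for i, byte in enumerate(data):
            state = ((state << 1) & 0xFFFFFFFF) ^ byte
            if (state & mask) == mask:
                boundaries.append(i + 1)  # chunk ends after byte i
                state = 0
        return boundaries
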
--scott


( http://cscott.net/ )
