On 3/23/2010 0:49, Max Battcher wrote:
A similar pony repository format idea might be to try experimenting with one of the new, hip document databases like couchdb. I've thought at times hashed-storage already seems to be converging in the direction of a document database... Interesting thought, a couchdb-based darcs...
Curiously, I stumbled upon an alpha of a git backend that uses couchdb for storage (git-db on GitHub). (Of course it's under-documented, and it's tough to figure out how much of what it claims it actually supports...)
Anyway, here's a quick overview of couchdb (and maybe some of why it seems kin to me to hashed-storage), for the curious:
Couchdb is a document database that is also in the "written in a functional language" club with darcs, albeit in this case the language is Erlang. It's owned by the Apache Software Foundation and released under the Apache License. It's now installed by default on Ubuntu, and through the "desktop-couch" project it is already used for several types of data interoperation/synchronization in "normal" Ubuntu applications.
A document database is geared towards/optimized for storing documents (schema-less key-value mappings) by some key (often, by default, a UUID, but sometimes more of a URI/"file name"). Document lookups by key are generally designed to be very fast (presumably as near to a file system lookup as possible). Document databases then provide a mechanism (usually a direct descendant of the MapReduce pattern) to query (and index) for values stored within the document database. Generally if the query can be expressed directly and simply as a MapReduce, it is a fast, parallel query (that can run across a partitioned cluster, even).
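To make the "fast lookup by key, plus MapReduce for queries" model concrete, here is a minimal Python sketch. It is a toy, in-memory store with hypothetical names (`DocumentStore`, `put`, `get`, `view`); it is not couchdb's API, and real couchdb views are written in JavaScript and persisted as indexes. The point it illustrates is that the map function looks at each document independently, which is what lets the query be split across partitions, and the reduce function only combines the values emitted under each key.

```python
import uuid
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Doc = Dict[str, Any]


class DocumentStore:
    """Hypothetical in-memory document store; illustrative only, not couchdb."""

    def __init__(self) -> None:
        self._docs: Dict[str, Doc] = {}

    def put(self, doc: Doc, key: Optional[str] = None) -> str:
        # Default to a random UUID key, as many document databases do.
        key = key or uuid.uuid4().hex
        self._docs[key] = dict(doc)
        return key

    def get(self, key: str) -> Doc:
        # Lookup by key is a single hash-table access.
        return self._docs[key]

    def view(self,
             map_fn: Callable[[Doc], Iterable[Tuple[Any, Any]]],
             reduce_fn: Callable[[List[Any]], Any]) -> Dict[Any, Any]:
        # Map phase: each document emits (key, value) pairs on its own,
        # so this loop could in principle run on separate partitions.
        grouped: Dict[Any, List[Any]] = defaultdict(list)
        for doc in self._docs.values():
            for k, v in map_fn(doc):
                grouped[k].append(v)
        # Reduce phase: combine all values emitted under the same key.
        return {k: reduce_fn(vs) for k, vs in grouped.items()}


# Usage: count patch documents per author, skipping non-patch documents.
store = DocumentStore()
store.put({"type": "patch", "author": "alice", "name": "fix typo"})
store.put({"type": "patch", "author": "bob", "name": "add feature"})
store.put({"type": "patch", "author": "alice", "name": "refactor"})
store.put({"type": "tag", "name": "1.0"})


def patches_by_author(doc: Doc) -> Iterable[Tuple[str, int]]:
    if doc.get("type") == "patch":
        yield doc["author"], 1


print(store.view(patches_by_author, sum))  # {'alice': 2, 'bob': 1}
```

A real document database would also cache the view's output as an index and update it incrementally as documents change, rather than rescanning everything on each query as this sketch does.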
Couchdb's particular "native" document format is the JSON object, which it serializes into a "crash-only", append-only B-Tree file format, and its map/reduce "view" language is JavaScript. Apart from these format choices, most document databases are very similar. If you consider hashed-storage in this light: hashed-storage stores whatever format it is given in individual files in the file system, and it doesn't provide a generic query interface or indexing. I don't mean to be terribly harsh to hashed-storage; I'm merely trying to point out that perhaps hashed-storage is taking the "long way around" to something like a generalized document database such as couchdb. (Particularly if you consider that packs would probably be append-only B-Trees or similar, and that hashed-storage/darcs is already working towards its second, specialized index...)
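For comparison, here is a rough Python sketch of the content-addressed, one-file-per-blob idea described for hashed-storage. The class and method names are hypothetical, and SHA-256 with a flat directory is an assumption for illustration, not hashed-storage's actual on-disk layout. It shows the two properties that make it feel kin to a document database (immutable blobs, fast lookup by key) and the piece that is missing: with no index, any query has to read every stored file.

```python
import hashlib
import os
import tempfile
from typing import Callable, Iterator, Tuple


class HashedFileStore:
    """Hypothetical content-addressed store: one file per blob, named by the
    SHA-256 hex digest of its contents. Illustrative only."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        path = os.path.join(self.root, digest)
        # Same content always yields the same name, so an existing file
        # already holds these exact bytes and need not be rewritten.
        if not os.path.exists(path):
            # Write to a temporary file, then rename it into place, so a
            # reader never observes a partially written blob.
            fd, tmp = tempfile.mkstemp(dir=self.root)
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            os.replace(tmp, path)
        return digest

    def get(self, digest: str) -> bytes:
        # Lookup by key is just a file open: the file system is the index.
        with open(os.path.join(self.root, digest), "rb") as f:
            return f.read()

    def scan(self, predicate: Callable[[bytes], bool]) -> Iterator[Tuple[str, bytes]]:
        # No query interface or index: answering a question about contents
        # means reading every blob in the store.
        for name in sorted(os.listdir(self.root)):
            data = self.get(name)
            if predicate(data):
                yield name, data


# Usage: storing identical content twice yields one file and one key.
with tempfile.TemporaryDirectory() as d:
    store = HashedFileStore(d)
    h = store.put(b"hello\n")
    assert store.put(b"hello\n") == h
    print(store.get(h))  # b'hello\n'
```

Layering a MapReduce-style view with a persisted index on top of `scan` is roughly the gap between this kind of store and a general document database.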
It may be interesting to see a camp-like experiment on couchdb. Maybe even something that could be done in a branch of darcs itself, using some of the storage backend generalization that hashed-storage already began.
Obviously there are a ton of pros and cons to consider... a couchdb-based darcs would probably have to be pretty different to take full advantage of it, and it would need entirely different communication strategies from the current, but very useful, "dumb file transfer protocols". (I'm a proponent of those, so I'd probably want a lot of good testing and some sort of concessions for "cheap, dumb hosts" before something like couchdb became the default backend...)
Anyway, that's the thought experiment. Maybe there are some useful ideas to explore there for camp, darcs and/or hashed-storage.
-- 
--Max Battcher--
http://worldmaker.net

_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
