Re: [darcs-users] patch metadata, annotations, Ignore-this, tagging, etc

Max Battcher Mon, 29 Mar 2010 22:54:01 -0700

On 3/23/2010 0:49, Max Battcher wrote:

A similar pony repository format idea might be to try experimenting with
one of the new, hip document databases like couchdb. I've thought at
times hashed-storage already seems to be converging in the direction of
a document database... Interesting thought, a couchdb-based darcs...

Curiously, I stumbled upon an alpha of a git backend to use couchdb for
storage (git-db on github). (Of course it's under-documented and tough to
figure out how much of what it actually supports...)

Anyway, here's a quick overview of couchdb (and maybe some of why it
seems kin to me to hashed-storage), for the curious:

Couchdb is a document database that is also in the "written in a
functional language club" with darcs, albeit in this case Erlang. It's
owned by the Apache Foundation and under the Apache License. It's now
installed by default on Ubuntu and under the "desktop-couch" project is
already used for several types of data interoperation/synchronization in
"normal" Ubuntu applications.

A document database is geared towards/optimized for storing documents
(schema-less key-value mappings) by some key (often, by default, a UUID,
but sometimes more of a URI/"file name"). Document lookups by key are
generally designed to be very fast (presumably as near to a file system
lookup as possible). Document databases then provide a mechanism
(usually a direct descendant of the MapReduce pattern) to query (and
index) for values stored within the document database. Generally if the
query can be expressed directly and simply as a MapReduce, it is a fast,
parallel query (that can run across a partitioned cluster, even).
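To make that description concrete, here is a minimal in-memory sketch of
the idea in Python. It is not couchdb's API; the DocumentStore class, the
patches_by_author function and the shape of the documents are all
hypothetical, chosen only to show key lookup plus a map/reduce query.

```python
import uuid
from collections import defaultdict


class DocumentStore:
    """Toy in-memory document store: schema-less dicts looked up by key."""

    def __init__(self):
        self._docs = {}

    def put(self, doc, key=None):
        # Default to a UUID key; a caller may pass a "file name" style key.
        key = key or str(uuid.uuid4())
        self._docs[key] = dict(doc)
        return key

    def get(self, key):
        # Lookup by key is a single dictionary access.
        return self._docs[key]

    def query(self, map_fn, reduce_fn):
        # Map: each document emits zero or more (key, value) pairs.
        grouped = defaultdict(list)
        for doc_id, doc in self._docs.items():
            for k, v in map_fn(doc_id, doc):
                grouped[k].append(v)
        # Reduce: combine the values emitted under each key.
        return {k: reduce_fn(vs) for k, vs in grouped.items()}


def patches_by_author(doc_id, doc):
    # Hypothetical document shape: {"type": "patch", "author": ...}
    if doc.get("type") == "patch":
        yield doc["author"], 1


store = DocumentStore()
store.put({"type": "patch", "author": "alice", "name": "fix typo"})
store.put({"type": "patch", "author": "bob", "name": "add test"})
store.put({"type": "patch", "author": "alice", "name": "refactor"})
store.put({"type": "tag", "name": "2.4"}, key="tags/2.4")

counts = store.query(patches_by_author, sum)
```

In this sketch the map step only looks at one document at a time, which
is what would let a real system run it in parallel across partitions.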

Couchdb's particular "native" document format is JSON objects which it
serializes into a "crash-only", append-only B-Tree file format, and its
map/reduce "view" language is JavaScript. Other than some of these
format choices, most document databases are very similar. If you
consider hashed-storage in this light: hashed-storage stores whichever
format it is provided into individual files in the file system, and it
doesn't provide a generic query interface or indexing. I'm not meaning
to be terribly harsh to hashed-storage, I'm merely trying to point out
that perhaps hashed-storage is the "long way around" to something more
like a generalized document database like couchdb. (Particularly if you
consider that packs would probably be append-only B-Trees or similar,
and that hashed-storage/darcs is already working towards its second,
specialized index...)
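For contrast, the "individual files, no generic query" side of that
comparison can be sketched like this. This is not the hashed-storage
library's interface (that is a Haskell library); store_blob, load_blob
and scan are hypothetical helpers, and SHA-256 and JSON blobs are
assumptions made for the example.

```python
import hashlib
import json
import os
import tempfile


def store_blob(root, data):
    """Write bytes to a file named after their SHA-256 digest."""
    digest = hashlib.sha256(data).hexdigest()
    path = os.path.join(root, digest)
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(data)
    return digest


def load_blob(root, digest):
    """Read a blob back by its hash: one file system lookup."""
    with open(os.path.join(root, digest), "rb") as f:
        return f.read()


def scan(root, predicate):
    # No index: a question about contents means reading every file.
    return sorted(
        name for name in os.listdir(root) if predicate(load_blob(root, name))
    )


root = tempfile.mkdtemp()
h1 = store_blob(root, json.dumps({"author": "alice"}).encode())
h2 = store_blob(root, json.dumps({"author": "bob"}).encode())
h3 = store_blob(root, json.dumps({"author": "alice"}).encode())  # same bytes as h1

alice = scan(root, lambda raw: json.loads(raw)["author"] == "alice")
```

Lookup by hash is as cheap as the file system allows, and identical
content is stored once, but anything like "all blobs by alice" needs
either a full scan as above or a separately maintained index.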

It may be interesting to see a camp-like experiment on couchdb. Maybe
even something that could be done in a branch of darcs itself, using
some of the storage backend generalization that hashed-storage already
began.

Obviously there are a ton of pros and cons to consider... a couchdb-based
darcs would probably have to be pretty different to take the most
advantage of it, and it would have to have entirely different
communication strategies than the current, but very useful, "dumb file
transfer protocols". (Of which I am a proponent, so I'd probably want a
lot of good testing and some sort of concessions for "cheap, dumb hosts"
before something like couchdb became the default backend...)

Anyway, that's the thought experiment. Maybe there are some useful ideas
to explore there for camp, darcs and/or hashed-storage.


--
--Max Battcher--
http://worldmaker.net
_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
