On Tue, Feb 24, 2009 at 09:06:09AM +0100, Patrick Antivackis wrote:
> Oh and by the way, in a use case where there is only one database and you
> don't use compaction because you want to keep everything, well _rev is a
> revision that can be used to see the history of the document.

This is a good point. If you follow "accountants don't use erasers" then
you will never compact (and maybe you want a flag which prevents
compaction). However, you must then be prepared for your database to be a
single file which grows without bound.

If CouchDB wants to support this model, it would be helpful if the data
were stored in chunks which can be backed up separately.

"Compaction" for saving space could be achieved by rewriting the database,
but keeping diffs for earlier revisions. At this point you would end up
with something roughly like git.

On a random tangent: has anyone considered a CouchDB-like system where
documents are raw blobs, rather than JSON? ISTM that:

- it would save a lot of conversion between Erlang terms and JSON
- it would remove the second-class nature of attachments
- it would allow structured data to be stored in arbitrary formats
  (e.g. XML)
- it would allow map/reduce to work on binary data (e.g. use a map
  function to make thumbnails of all your jpegs)
- you could still use JSON quite happily, e.g.

    function map(type, data) {
      if (type == "application/json") {
        doc = evalcx(data);
        ... continue as normal
      }
    }

I guess some of the APIs would become a bit more awkward though. For
example, bulk document insert would probably become MIME multipart.

In principle, I think you could get today's CouchDB as a thin layer on top
of this. However, "attachments" do have interesting special semantics
(e.g. deleting a document deletes all its attachments) which might need
some parent/child relationship between documents to maintain. Having that
relationship between documents in a more general form could also be
useful.

Just thinking out loud.

Regards,

Brian.