Heya Ralf,
On Apr 13, 2008, at 02:33, Ralf Nieuwenhuijsen wrote:
Because the storage system is pretty wasteful and you'd end up with several gigabytes of database files for just a few hundred megabytes of actual data. So we do need compaction in one form or another. A compaction that retains revisions is a lot harder to write. Also, dealing with revisions in a distributed setup is less than trivial and would complicate the replication system quite a bit.
The gigabytes versus hundred megabytes seem acceptable to me, especially when we can scale that easily. Also, it seems to depend on how often the data changes. A simple solution to compact revisions would be to store each revision as a reverse-diff as well. The normal data can then be compacted, whereas the reverse-diff is just kept; from the most recent version the older versions can be reconstructed.
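(To illustrate the reverse-diff idea, here is a toy sketch in Python; the diff format is invented for illustration and has nothing to do with how CouchDB actually stores data:)

    # Keep only the latest full document plus a chain of reverse diffs
    # that can rebuild older revisions on demand.
    REMOVED = object()  # sentinel: "key was absent in the old revision"

    def reverse_diff(new, old):
        """Record what must change to turn `new` back into `old`."""
        diff = {}
        for key in set(old) | set(new):
            if old.get(key, REMOVED) != new.get(key, REMOVED):
                diff[key] = old.get(key, REMOVED)
        return diff

    def apply_diff(doc, diff):
        doc = dict(doc)
        for key, value in diff.items():
            if value is REMOVED:
                doc.pop(key, None)
            else:
                doc[key] = value
        return doc

    rev2 = {"title": "CouchDb", "votes": 2}
    rev3 = {"title": "CouchDB", "votes": 3}
    back = reverse_diff(rev3, rev2)   # store rev3 in full, plus `back`
    assert apply_diff(rev3, back) == rev2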
Question 1: How would manual revisions be any more space-efficient?
Manual revisions as top-level documents could be compacted to under 2N of the data size, not 100N, where N is the size of your actual data. Of course, the amount of data you want to store won't magically decrease. It is rather that CouchDB's storage engine trades disk space for speed and consistency at runtime, with asynchronous compaction to regain wasted space. And, just to make sure, quoting myself: "The revisions are not, at least at this point, meant to implement revision control systems; they rather exist for the optimistic concurrency control that allows any number of parallel readers while serialised writes are happening, and to power replication."
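To make the optimistic concurrency part concrete, here is a minimal sketch against a local CouchDB. The database name, the document and the use of the Python `requests` library are my assumptions; the _rev mechanics are CouchDB's:

    import requests  # third-party HTTP client

    DB = "http://127.0.0.1:5984/demo"   # assumed local database

    requests.put(DB)                    # create db (412 if it exists)
    requests.put(DB + "/mydoc", json={"count": 1})

    doc = requests.get(DB + "/mydoc").json()  # carries the current _rev

    # a concurrent writer gets its update in first...
    requests.put(DB + "/mydoc", json={"count": 2, "_rev": doc["_rev"]})

    # ...so our write with the now-stale _rev is refused instead of
    # silently overwriting: CouchDB answers 409 Conflict
    stale = requests.put(DB + "/mydoc",
                         json={"count": 3, "_rev": doc["_rev"]})
    print(stale.status_code)  # 409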
Compacting is a manual process at the moment. If we introduced a scheduling mechanism, it would certainly be more general purpose and you could hook in all sorts of operations, including compaction.
Question 2: In which case 'compacting' (a.k.a. destroying the revisions) would still be optional; something we can turn off?
You need to run it explicitly at the moment. So by default, everything is kept. This might change in the future, but you will be able to disable it at the database level.
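(For reference, kicking off that explicit run is a single POST; the database name is assumed:)

    import requests

    # compaction runs asynchronously; the POST returns as soon as the
    # request has been accepted
    r = requests.post("http://127.0.0.1:5984/demo/_compact",
                      headers={"Content-Type": "application/json"})
    print(r.status_code)  # 202 Accepted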
Question 3: Can we use older revisions in views?
No.
See http://damienkatz.net/2008/02/incremental_map.html
and http://damienkatz.net/2008/02/incremental_map_1.html
Question 4: It appears from the comments that this will behave much like a combinator. So the algorithmic complexity of adding one new document would be O(1)?
I think so, but I am not the definitive source here. Damien?
You don't merge, at least at the moment, but declare one revision to be the winner when resolving the conflict. Since this is a manual process, you can make sure you don't lose revision trees. Merge might be in at some point, but no thought (public, at least) has gone into that.
Question 5: Is manually implementing a conflict resolver possible at the moment (I didn't find it on the wiki), and if so, why not let that function just return the winning _data_? That way we could easily implement a merger (which would be a much saner approach for most documents).
It is not possible at the moment. You need to resolve a conflict from
your application. There you can do all the merging you need :)
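A sketch of what that application-side resolution can look like; the merge rule (take the larger counter), the database and the field names are mine, and the ?conflicts=true / ?rev=... parameters are how current CouchDB exposes and addresses the losing revisions:

    import requests

    DB = "http://127.0.0.1:5984/demo"   # assumed

    # fetch the winning revision together with its conflict metadata
    doc = requests.get(DB + "/mydoc",
                       params={"conflicts": "true"}).json()

    for losing_rev in doc.get("_conflicts", []):
        loser = requests.get(DB + "/mydoc",
                             params={"rev": losing_rev}).json()
        # merge whatever you need from the loser into the winner...
        doc["count"] = max(doc.get("count", 0), loser.get("count", 0))
        # ...and delete the losing revision to settle the conflict
        requests.delete(DB + "/mydoc", params={"rev": losing_rev})

    requests.put(DB + "/mydoc", json=doc)  # save the merged winner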
I don't understand what you mean here :) What is 'doc-is' in this
context?
Oops, I meant 'doc-IDs'. If I have several revisions of the same document as separate documents, then the doc-ID can no longer be some nice name, since doc-IDs have to be unique.
Correct. Pretty names can get hairy in a distributed setup, so you might want to stick to UUIDs and provide your own "pretty name". You wouldn't be able to use that, programmatically, except in views, though.
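A sketch of that view-based lookup; all names here (database, design doc, the pretty_name field) are mine. The map function inside the design document is the usual JavaScript-as-a-string:

    import json
    import requests

    DB = "http://127.0.0.1:5984/demo"   # assumed

    # index documents by their user-facing name field
    design = {"views": {"by_name": {"map":
        "function(doc) {"
        "  if (doc.pretty_name) emit(doc.pretty_name, null);"
        "}"}}}
    requests.put(DB + "/_design/lookup", json=design)

    # resolve a pretty name to the document behind its UUID doc-ID
    r = requests.get(DB + "/_design/lookup/_view/by_name",
                     params={"key": json.dumps("my-nice-name"),
                             "include_docs": "true"})
    print(r.json())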
The alternative of a cron-like system could work much like the view documents. These documents could contain a source URL (possibly local), a schedule parameter and a function that maps a document to an array of documents that is treated as a batch put. This way we could easily set up replication, but also all kinds of delayed and/or scheduled processing of data.
Indeed. No planning has gone into such a thing at the moment. You might want to open a feature request at https://issues.apache.org/jira/browse/COUCHDB or come up with a patch.
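Purely as a thought experiment for such a request, the scheduling document might look something like this; every field below is invented, none of it exists in CouchDB:

    # hypothetical shape for a cron-like processing document
    schedule_doc = {
        "_id": "_schedule/nightly-digest",
        "source": "http://127.0.0.1:5984/demo",  # possibly a local db
        "schedule": "0 3 * * *",                 # cron-style parameter
        # maps one input document to an array of documents that would
        # be written back as a single batch put
        "map": "function(doc) {"
               "  return [{_id: 'digest-' + doc._id,"
               "           summary: doc.title}];"
               "}",
    }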
Perhaps I will look into it myself, if it turns out I need this desperately. I don't have any Erlang experience, but I think my experience with Haskell will pull me through ;-)
Awesome! May I recommend Joe Armstrong's 'Programming Erlang' book at http://www.pragprog.com/titles/jaerlang in case you are looking for literature. Thanks for considering helping out. Contributions are very important for the project.
Conflict resolution and merge functions do sound interesting; I don't understand the "not guaranteeing scalability" remark, though. In the current implementation, this feature actually makes CouchDB scalable by ensuring that all nodes participating in a cluster eventually end up with the same data. If you really do need two-phase commit (if I understand correctly, you want that), that would need to be part of your application or an intermediate storage layer.
No, no need for two-phase commits. Rather, I would suggest the complete opposite extreme: no failed inserts/updates ever, including batch puts. Just a generic merging conflict solver. JSON seems very merge-friendly to me ;-) It would seem that 99% of all documents and use cases could be handled with the same generic merge function.
I think this idea is worth pursuing.
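To make it concrete, a toy version of such a generic merge (not anything CouchDB provides; arrays are unioned here and the first argument wins scalar clashes):

    def merge(a, b):
        """Toy generic JSON merge: union objects recursively, union
        arrays, and let `a` win on scalar clashes. Not a 3-way merge."""
        if isinstance(a, dict) and isinstance(b, dict):
            out = dict(b)
            for key, value in a.items():
                out[key] = merge(value, b[key]) if key in b else value
            return out
        if isinstance(a, list) and isinstance(b, list):
            return a + [x for x in b if x not in a]
        return a  # scalar conflict: declare `a` the winner

    left = {"tags": ["db"], "title": "CouchDB", "votes": 3}
    right = {"tags": ["erlang"], "title": "CouchDB", "votes": 5}
    print(merge(left, right))
    # {'tags': ['db', 'erlang'], 'title': 'CouchDB', 'votes': 3}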
Cheers
Jan
--