Thanks Paul. I am starting to accept the fact that the internal rev-control is not to be used. And I feel I am barely starting to understand why it shouldn't be used. I thought that compacting was the only reason (and compacting is manual and controlled - at least for now...), so that's why I am being stubborn about it. :)
I understood how the process worked, but my problem was: how are the A revisions merged? How do I handle the conflict during replication? I have never actually performed a replication before. Will it do internal conflict management where the latest edit wins? Or is the process calling a conflict handler I set up beforehand? Or is the item flagged so that I can pick up conflicting docs after replication? This is one of the reasons I am holding back on the idea of replication (for any purpose other than backup).

Thanks again for a descriptive answer, it's most appreciated.

Regards,
Ronny

2008/9/18 Paul Davis <[EMAIL PROTECTED]>

> So two things here. First, the update steps, similar to before:
>
> 1. Get current version of document id X
> 2. Clone doc X making doc Y
> 3. Make doc Y a history doc:
>    a. Y._id = new_uuid()
>    b. Y.is_current_revision = false
>    c. Delete Y._rev
>    d. X.previous_version = Y._id
> 4. Edit doc X as desired
> 5. In a single HTTP request, send both documents to _bulk_docs
>
> So, given a document A, we have a current history of A.previous = C._id.
> Getting A, clone A to get B and edit as per step 3.
> Now we have A.previous = B, B.previous = C, or A -> B -> C.
>
> To make this permanent, we post both A and B to _bulk_docs. If A was
> edited simultaneously, our A will get rejected, as will B. So nothing
> changed; you'd resolve this as per any other normal situation.
>
> This will work in the face of replication. Just as with any other
> replication we may have to resolve conflicts, but our histories should
> never conflict. What this system does introduce is this:
>
> Given that someone did the A->B->C above, say someone else
> simultaneously did A->D->C and we replicate.
>
> B, C, and D will not conflict.
>
> The two versions of A will. We resolve this as we would for any other
> case. Then we indicate that A now has *two* previous histories:
> A -> (B or D) -> C
>
> Full stop.
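[Editor's note: Paul's five update steps can be sketched as pure document manipulation. This is a minimal sketch, not a working CouchDB client: the actual POST of `{"docs": [head, history]}` to `/db/_bulk_docs` is left out, the field names `is_current_revision` and `previous_version` follow the scheme in the thread, and `new_uuid()` is rendered with Python's `uuid` module.]

```python
import copy
import uuid

def make_history_update(current_doc, edits):
    """Build the two docs (edited head X + history snapshot Y) that would
    be sent to _bulk_docs in one atomic request."""
    # Step 2: clone doc X making doc Y
    history = copy.deepcopy(current_doc)
    # Steps 3a-3c: make Y a standalone history doc
    history["_id"] = uuid.uuid4().hex          # 3a: Y._id = new_uuid()
    history["is_current_revision"] = False     # 3b
    history.pop("_rev", None)                  # 3c: _rev must not be sent for a new doc
    # Step 3d: X points at its snapshot; X keeps its _id and _rev so a
    # concurrent edit of X makes the whole _bulk_docs request fail
    head = dict(current_doc)
    head["previous_version"] = history["_id"]
    # Step 4: apply the actual edits to the head
    head.update(edits)
    # Step 5 would POST {"docs": [head, history]} to /db/_bulk_docs
    return head, history

# Example: document A whose history already points at C (A.previous = C._id)
a = {"_id": "A", "_rev": "1-abc", "title": "v1", "previous_version": "C"}
head, hist = make_history_update(a, {"title": "v2"})
# Now head.previous_version -> hist._id, and hist.previous_version -> "C",
# i.e. A -> B -> C as in Paul's description.
```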
> Using the built-in revisioning for app-dependent revisioning is a Bad
> Idea™. It's not meant for that and shouldn't be relied on. I am
> not saying "Don't use the built-in rev-control for rev-control." I'm
> saying "Don't use the built-in collision detection system that is not
> at all meant for rev-control for rev-control." I know, shades of gray
> and all.
>
> The single node case can't be handled by the internal revision
> control. You may think it can. It may look like it can. But it just
> can't. You'll be whistling along and then wham! Something will happen
> and you'll be up shit creek. (Something will happen = accidental
> compaction, need for replication, changes to CouchDB internals
> invalidating this approach, a meteor hits your datacenter, you get the
> idea.)
>
> We can't use the internal _rev system for multi-node stuff because old
> revisions are never replicated. Not even attempted to be replicated.
> CouchDB ideology says that there is one version of each document, the
> most recent revision. Yes, it is possible to obtain previous revisions,
> making it look like revision control, but that's an effect of the
> implementation and hence should not be relied upon (caveats apply;
> using it for things like undo etc. is probably kosher as long as you
> handle the possibly missing document etc.).
>
> HTH,
> Paul
>
>
> On Thu, Sep 18, 2008 at 11:13 AM, Ronny Hanssen <[EMAIL PROTECTED]> wrote:
> > Ok, I get it... I understand _bulk_docs is atomic, but I missed that
> > you actually preserved the *original* doc._id (doh). I thought that with
> > clone you meant a new doc in CouchDB, with its own id. And I just
> > couldn't understand why you did that :). This now makes more sense to me.
> > Sorry.
> >> As to replication, what you'd need is a flag that says if a particular
> >> node is the head node. Then your history docs should never clash. If
> >> you get conflicts on the head node you resolve them and store all
> >> conflicting previous revisions.
> >> In this manner your linked list becomes a linked directed acyclic
> >> graph. (Yay college.) This does mean that at any given point in the
> >> history you could possibly have multiple versions of the same doc,
> >> but replication works.
> >
> > Ok, but how is that flag supposed to be set? At the time of inserting
> > with _bulk_docs the system needs to update the current doc, which means
> > that any node racing during an update will flag its version as current
> > and actual. Which means that replication in race conditions will
> > conflict(?).
> >
> > I am just asking because the single node case could be handled by the
> > internal CouchDB revision control. So, using the elaborate scheme you
> > propose isn't really helping for that scenario. My impression was that
> > we cannot use the internal CouchDB revisions due to the difficulties in
> > handling conflicts with multiple nodes involved (because conflicts
> > could/would occur), and that this would be better handled by manual
> > hand-coded rev-control.
> >
> > It seems to me that there are no solutions on how to do this by hand
> > coding either. So, it seems we are saying "don't use the built-in
> > rev-control for rev-control of data" to avoid people blaming CouchDB
> > when the built-in "revision control" conflicts.
> >
> > Thanks for your patience guys.
> >
> > ~Ronny
> >
> > 2008/9/18 Paul Davis <[EMAIL PROTECTED]>
> >
> >> Ronny,
> >>
> >> There are two points that I think you're missing.
> >>
> >> 1. _bulk_docs is atomic. As in, if one doc fails, they all fail.
> >> 2. I was trying to make sure that the latest _id of a doc is constant.
> >>
> >> Think of this as a linked list. You grab the head document (most
> >> current revision) and clone it. Then we change the uuid of the second
> >> doc and make our pointer links fit into the list. Then after making
> >> the necessary changes, we edit the head node to our desire. Now we
> >> post *both* (in the same HTTP request!) docs to _bulk_docs.
> >> This ensures that if someone else edited this particular doc, the
> >> revisions will be different and the second edit will fail. Thus, on
> >> success 2 docs are inserted; on failure, 0 docs.
> >>
> >> As to replication, what you'd need is a flag that says if a particular
> >> node is the head node. Then your history docs should never clash. If
> >> you get conflicts on the head node you resolve them and store all
> >> conflicting previous revisions. In this manner your linked list
> >> becomes a linked directed acyclic graph. (Yay college.) This does mean
> >> that at any given point in the history you could possibly have
> >> multiple versions of the same doc, but replication works.
> >>
> >> For views, you'd just want to have a flag that says "not the most
> >> recent version." Then in your view you would know whether to emit
> >> key/value pairs for it. This could be something like "no next version
> >> pointer" or some such. Actually, this couldn't be a next pointer
> >> without two initial gets, because you'd need to get both the head node
> >> and the next node. A boolean flag indicating head node status would be
> >> sufficient though. And then you could have a history view if you ever
> >> need to walk from tail to head.
> >>
> >> HTH,
> >> Paul
> >>
> >>
> >> On Wed, Sep 17, 2008 at 9:35 PM, Ronny Hanssen <[EMAIL PROTECTED]> wrote:
> >> > Hm.
> >> >
> >> > In Paul's case I am not 100% sure what is going on. Here's a use case
> >> > for two concurrent edits:
> >> > * First, two users get the original.
> >> > * Both make a copy which they save.
> >> > This means that there are two fresh docs in CouchDB (even on a single
> >> > node).
> >> > * Save the original using a new doc._id (which the copy is to persist
> >> > in copy.previous_version).
> >> > This means that the two new docs know where to find their previous
> >> > versions.
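[Editor's note: the view filtering Paul describes — emit rows only for head documents, keyed off a boolean flag — can be modeled like this. Real CouchDB map functions are written in JavaScript inside a design document; this Python sketch only mirrors the logic, and the field names `is_current_revision` / `previous_version` are the ones assumed by this thread's scheme.]

```python
def map_current_titles(doc):
    """Mimics a CouchDB map function: yield (key, value) rows, but only
    for docs carrying head-node status (history docs carry False)."""
    if doc.get("is_current_revision", True):
        yield (doc["_id"], doc.get("title"))

# A head doc and its two history snapshots, linked A -> B -> C
docs = [
    {"_id": "A", "title": "v2", "is_current_revision": True,
     "previous_version": "B"},
    {"_id": "B", "title": "v1", "is_current_revision": False,
     "previous_version": "C"},
    {"_id": "C", "title": "v0", "is_current_revision": False},
]
rows = [row for doc in docs for row in map_current_titles(doc)]
# Only the head doc A produces a row; B and C are invisible to the view.
```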
> >> > The problem I have with this scheme is that every change of a
> >> > document means that it needs to store not only the new version, but
> >> > also its old version (in addition to the original). The fact that two
> >> > racing updates will generate 4(!) new docs in addition to the
> >> > original document is worrying. I guess Paul also wants the original
> >> > to be marked as deleted in the _bulk_docs? But in any case the
> >> > previous versions are now two new docs, and they look exactly the
> >> > same, except for the doc._id, naturally...
> >> >
> >> > Wouldn't this be enough, Paul?
> >> > 1. old = get_doc()
> >> > 2. update = clone(old);
> >> > 3. update.previous_version = old._id;
> >> > 4. post via _bulk_docs
> >> >
> >> > This way there won't be multiple old docs around.
> >> >
> >> > Jan's way ensures that for a view there is always only one current
> >> > version of a doc, since it is using the built-in rev-control.
> >> > Competing updates on the same node may fail, which is what CouchDB is
> >> > designed to handle. If on different nodes, then the rev-control
> >> > history might come "out of sync" via concurrent updates. How does
> >> > CouchDB handle this? Which update wins? On a single node this is
> >> > intercepted when saving the doc. For multiple nodes they might both
> >> > get a response saying "save complete". So, these then need merging.
> >> > How is that done? Jan further secures the previous version by storing
> >> > it as a new doc, allowing it to be persisted beyond compaction. I
> >> > guess Jan's sample would benefit nicely from _bulk_docs too. I like
> >> > this method due to the fact that it allows only one current doc. But
> >> > I worry about how revision control handles conflicts, Jan?
> >> >
> >> > Paul's and my updated suggestion always post new versions, not using
> >> > the revision system at all.
> >> > The downside is that there may be multiple current versions
> >> > around... And this is a bit tricky, I believe... Anyone?
> >> >
> >> > Paul's suggestion also keeps multiple copies of the previous version.
> >> > I am not sure why, Paul?
> >> >
> >> >
> >> > Regards,
> >> > Ronny
> >> >
> >> > 2008/9/17 Paul Davis <[EMAIL PROTECTED]>
> >> >
> >> >> Good point, Chris.
> >> >>
> >> >> On Wed, Sep 17, 2008 at 11:39 AM, Chris Anderson <[EMAIL PROTECTED]> wrote:
> >> >> > On Wed, Sep 17, 2008 at 11:34 AM, Paul Davis <[EMAIL PROTECTED]> wrote:
> >> >> >> Alternatively something like the following might work:
> >> >> >>
> >> >> >> Keep an eye on the specifics of _bulk_docs though. There have been
> >> >> >> requests to make it non-atomic, but I think in the face of
> >> >> >> something like this we might make non-atomic _bulk_docs
> >> >> >> a non-default or some such.
> >> >> >
> >> >> > I think the need for non-transactional bulk-docs will be obviated
> >> >> > when we have the failure response say which docs caused failure;
> >> >> > that way one can retry once to save all the non-conflicting docs,
> >> >> > and then loop back through to handle the conflicts.
> >> >> >
> >> >> > upshot: I bet you can count on bulk docs being transactional.
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Chris Anderson
> >> >> > http://jchris.mfdz.com
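[Editor's note: the `previous_version` chain discussed throughout the thread — Paul's "history view if you ever need to walk from tail to head" — amounts to following pointers until they run out. A minimal sketch, with a plain dict standing in for the CouchDB database (a real walk would be one GET per document, and Paul's caveat about possibly-missing history docs applies):]

```python
def walk_history(db, head_id):
    """Follow previous_version pointers from the head doc back in time,
    returning the chain of docs that could be found."""
    chain = []
    doc_id = head_id
    while doc_id is not None:
        doc = db.get(doc_id)
        if doc is None:  # snapshot may be gone (e.g. compacted or not replicated)
            break
        chain.append(doc)
        doc_id = doc.get("previous_version")
    return chain

# The A -> B -> C history from Paul's example
db = {
    "A": {"_id": "A", "title": "v2", "previous_version": "B"},
    "B": {"_id": "B", "title": "v1", "previous_version": "C"},
    "C": {"_id": "C", "title": "v0"},
}
ids = [d["_id"] for d in walk_history(db, "A")]
# Walks head to tail: A, then B, then C.
```

Note that once a doc records *two* previous histories after conflict resolution (A -> (B or D) -> C), the structure is a DAG rather than a list, and the walk would need to follow a list of pointers instead of a single one.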
