single node atomic bulk_docs operations

Tim Parkin Tue, 17 Mar 2009 05:10:44 -0700

I've been thinking about the change in bulk docs behaviour and wanted to
discuss online but it's difficult to get my thoughts across
conversationally so I've written a little 'article'. I'd love feedback
on it and if we can reach some conclusions I will write up a final
document about the issues as a wiki page.


Summary
=======

What is this about
------------------

Prior to the 0.9 release, it was possible to make atomic operations
against a local database using the bulk_docs functionality. This allowed
a group of operations to be either all applied without error or conflict
or not applied at all.
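
As a rough illustration of the all-or-nothing behaviour described above,
here is a toy in-memory model in Python. It is not CouchDB's
implementation: `ToyStore`, `bulk_atomic`, the integer revisions and the
document fields are all invented for this sketch.

```python
class ToyStore:
    """In-memory stand-in for one database (hypothetical, for illustration)."""

    def __init__(self):
        self.docs = {}  # doc id -> (integer revision, body)

    def bulk_atomic(self, updates):
        """Apply every (doc_id, expected_rev, body) update, or none of them."""
        # First pass: a single stale revision rejects the whole group.
        for doc_id, expected_rev, _body in updates:
            current_rev = self.docs.get(doc_id, (0, None))[0]
            if current_rev != expected_rev:
                return False
        # Second pass: nothing conflicted, so every update is applied.
        for doc_id, expected_rev, body in updates:
            self.docs[doc_id] = (expected_rev + 1, body)
        return True


store = ToyStore()
created = store.bulk_atomic([
    ("doctor", 0, {"surgery": "Old Road"}),
    ("patient", 0, {"doctor_surgery": "Old Road"}),
])

# Someone else edits the patient, so our copy of its revision (1) is stale.
store.docs["patient"] = (2, {"doctor_surgery": "Old Road", "phone": "555-0100"})

moved = store.bulk_atomic([
    ("doctor", 1, {"surgery": "New Street"}),
    ("patient", 1, {"doctor_surgery": "New Street"}),
])
print(created, moved)        # the second group is rejected as a whole
print(store.docs["doctor"])  # the doctor document is left untouched
```

The caller only ever sees one boolean per group, which is the simple
success/fail result this article is concerned with.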

The 0.9 release of CouchDB changed the functionality so that it would
only fail to apply the changes on validation or system error. If the
last change in a group of operations was in conflict (e.g. an out-of-date
rev because someone had changed a document in the meantime) then
the change would still be applied but a conflict flag would be attached
and a message returned listing which operations were in conflict.

Why is it an issue
------------------

This feature made it possible to provide simple success/fail wrappers
around operations (for instance, an API call or a web request). The
change means that conflict resolution has to be handled in some way for
any bulk operation.

Examples of Issues faced due to change in functionality
=======================================================

We'll use an example with patient and doctor documents, where each
patient holds a reference to some of their doctor's data
[Hospital_ID, Name and Surgery].

Here are a couple of example operations, with some notes and questions.

Simple change of the Doctor's details
=====================================

If an admin changes the doctor's Surgery then the following should happen:

  i) Make change to doctor
  ii) Make change to any patients referring to doctor

If step two fails (i.e. changing the patient's reference to the doctor)
-----------------------------------------------------------------------

We can

   a) rollback the changes
   b) ignore the failure
   c) reload the patient and try to apply the change again

   Option b) isn't really an option: we can't have patients being sent to
   the wrong surgery. Anyway, if we did, how would we let the admin fix
   things?

   How do we roll back the change to the doctor?
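
Option c) above (reload and try again) can be sketched as a small retry
loop. This is a toy under stated assumptions: `retry_update`, `load`,
`save` and `set_surgery` are hypothetical helpers, the "database" is a
single in-memory dict with an integer revision, and the concurrent edit
is simulated inside `save`.

```python
def retry_update(load, save, change, attempts=3):
    """Reload the document and re-apply `change` until a save is accepted."""
    for _ in range(attempts):
        rev, body = load()                 # always start from the latest revision
        if save(rev, change(dict(body))):  # change a copy, then try to save it
            return True
    return False


# Toy single-document "database" with an integer revision.
patient = {"rev": 1, "body": {"name": "P1", "doctor_surgery": "Old Road"}}
save_calls = []


def load():
    return patient["rev"], patient["body"]


def save(rev, body):
    save_calls.append(rev)
    if len(save_calls) == 1:
        # Simulate someone else editing the patient just before our first save.
        patient["rev"] += 1
        patient["body"] = dict(patient["body"], phone="555-0100")
    if rev != patient["rev"]:
        return False                       # stale revision: rejected
    patient["rev"] += 1
    patient["body"] = body
    return True


def set_surgery(body):
    body["doctor_surgery"] = "New Street"
    return body


ok = retry_update(load, save, set_surgery)
print(ok, patient)
```

Retrying only works because this particular change can be re-applied
blindly on top of whatever the other writer did. It also leaves a window
in which the doctor and the patient disagree, and it says nothing about
what to do if every attempt fails.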

If step one fails (i.e. changing the doctor's surgery)
------------------------------------------------------

We can

  a) rollback the changes?
  b) accept the failure

If we accept the failure, we have to report back to the user that half
of their changes succeeded. What does the user do then?

How do we roll back the changes to the patients?

For both of these examples, the only realistic way we can see for the
administrator to recover is to roll back the changes and tell
them that their 'change' failed.


Two Patients changing references because a doctor changes
=========================================================

A doctor (D) has two patients (P1 and P2).

If an admin changes the doctor's Surgery then the following should happen:

    i) Make change to D
    ii) Make change to P1
    iii) Make changes to P2
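
The three steps above would normally be sent as one bulk_docs request.
The sketch below only builds the JSON body; it assumes the 0.9-style
format in which `all_or_nothing` is a top-level flag next to `docs`. The
helper name `bulk_docs_body`, the ids, the revs and the field names are
placeholders, not values from a real database.

```python
import json


def bulk_docs_body(docs, all_or_nothing=False):
    """Build the JSON body for a POST to /<db>/_bulk_docs."""
    body = {"docs": docs}
    if all_or_nothing:
        body["all_or_nothing"] = True
    return json.dumps(body, indent=2)


# Hypothetical ids, revs and field names for D, P1 and P2.
docs = [
    {"_id": "D", "_rev": "1-aaa", "type": "doctor", "surgery": "New Street"},
    {"_id": "P1", "_rev": "1-bbb", "type": "patient", "doctor_surgery": "New Street"},
    {"_id": "P2", "_rev": "1-ccc", "type": "patient", "doctor_surgery": "New Street"},
]
default_body = bulk_docs_body(docs)
atomic_body = bulk_docs_body(docs, all_or_nothing=True)
print(atomic_body)
```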


If step iii) fails (i.e. someone has changed P2 in the meantime)
----------------------------------------------------------------

With all_or_nothing false (default) :-
......................................

We have inconsistent data where P2 contains the wrong surgery. We can:-

    a) rollback the changes?
    b) accept the failure

a) The problem we have is that the Doctor change applied successfully,
   as did the first Patient change so how do we rollback?

b) If we accept it, what do we report to the administrator interface?
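
To make the reporting question concrete, here is a minimal sketch of
splitting a bulk response into what was saved and what was rejected. It
assumes the response is a list with one entry per document, where
rejected entries carry an "error" field; the sample response and the
helper `split_bulk_results` are made up for illustration.

```python
def split_bulk_results(results):
    """Separate the documents that were saved from those that were rejected."""
    saved = [r["id"] for r in results if "error" not in r]
    rejected = [(r["id"], r["error"]) for r in results if "error" in r]
    return saved, rejected


# Hypothetical response for the D, P1, P2 example, with P2 out of date.
response = [
    {"id": "D", "rev": "2-aaa"},
    {"id": "P1", "rev": "2-bbb"},
    {"id": "P2", "error": "conflict", "reason": "Document update conflict."},
]
saved, rejected = split_bulk_results(response)
print("saved:", saved)
print("rejected:", rejected)
```

A non-empty `rejected` list next to a non-empty `saved` list is exactly
the "half of the changes succeeded" state that is so hard to present to
an administrator.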

With all_or_nothing true :-
...........................

We now have a conflict on P2 and we don't know whether it contains our
change or not (and someone else's legitimate changes may have been
affected).

We now need a plan for resolving this conflict. Because we know
nothing about the previous change that we are conflicting with, the only
way to resolve it is to remove our change (we can't delete someone else's
change without unknown repercussions). So how do we remove our change?

As far as we can see the only consistent way to report this to the
administrator is to revert all changes and report failure.

Anyway, let's see how to deal with accepting the failure in different
places.

Accepting that conflicts exist
==============================

Because of the nature of CouchDB we accept that conflicts may exist.
This does not mean that we don't care about minimising users' exposure to
these conflicts.

Let's think about a possible result of an accepted conflict.

     / r2[r3]---r4[r3]
    /
r1 *
    \
     \ r3 (failed conflict)

What we have here is a document which starts at revision 1.

A change is made, creating r2.

A second change is applied to r1, which conflicts: r2 is chosen as the
winning rev and r3 is saved alongside it as a conflict.

A change is made to r2 to create r4, but the conflict flag still points
at r3.

If we want to rebase our document using r3 instead of r2, we have to
work out a way to apply r4's changes to r3.

This could potentially be conflicting (application dependent) or it may
be possible to merge changes (if the changes are across different json
elements and the document doesn't have references).
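
The "different json elements" case can be sketched as a three-way merge
over top-level keys, using r1 as the common ancestor. This is only an
illustration under assumptions: `merge_disjoint` is a hypothetical
helper, the documents are flat dicts with invented fields, and it does
nothing about references between documents.

```python
_MISSING = object()  # marks a key that is absent from a revision


def merge_disjoint(base, ours, theirs):
    """Three-way merge of flat dicts; None if both sides changed the same key."""
    merged = {}
    for key in set(base) | set(ours) | set(theirs):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:
            value = o      # both sides agree (or neither touched the key)
        elif o == b:
            value = t      # only their side changed it
        elif t == b:
            value = o      # only our side changed it
        else:
            return None    # both changed it differently: needs a human
        if value is not _MISSING:
            merged[key] = value
    return merged


r1 = {"name": "P2", "doctor_surgery": "Old Road", "phone": "555-0100"}
r3 = dict(r1, doctor_surgery="New Street")   # our conflicting change
r4 = dict(r1, phone="555-0199")              # the winning branch (via r2)

merged = merge_disjoint(r1, r3, r4)
clash = merge_disjoint(r1, r3, dict(r1, doctor_surgery="High Street"))
print(merged)
print(clash)
```

When both branches touch the same element the function gives up and
returns None, which is the application-dependent case mentioned above.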


Conflicts exist at replication outside of a user interaction
-------------------------------------------------------------

If these conflicts happen during replication, then the failure can be
dealt with without affecting users working on a single node. This is
potentially an 'offline' job, and the number of occurrences of conflicts
should be lower but also confined to a point in time (i.e. when you
replicate).


Conflicts exist on a single node because of a single user operation
-------------------------------------------------------------------

If the conflicts happened during a simple change, the person making those
changes will have to be informed of the problem and be given the options
to resolve that problem. Most users will only want to see a binary
'worked/didn't work' result and won't be informed enough to deal with the
subtleties of rebasing changesets.

Conflicts during normal operation affect individual user interface views
and will occur at a greater frequency and distributed in time.


What this means for dealing with users.
=======================================

For most web applications, I would imagine a single user will be dealing
with a single database instance. Because of this consciously chosen
specialisation it would be nice to have the tools available to make
single database instance operations as simple as they need to be.

Removing the previous bulk docs atomic operation makes these user
interface operations unnecessarily complex.

Trying to provide a self consistent view and simple user interaction
with this single database instance is fundamentally different than the
eventual consistency and conflict resolution that is required
occasionally on a single node and on database replication.

The steps that developers will have to take to provide conflict
resolution during user interface transactions (in order to provide user
interface consistency) will probably not be the same steps that they
will have to take to deal with conflict resolution in the general sense
(i.e. background conflict checking and conflict checking at
replication).

I would like to propose that the old bulk_docs functionality be
reinstated in some way but with enough information for developers to
understand what it actually means in the context of distributed data
(i.e. it is a tool to improve consistency not to guarantee consistency).

Conclusion
==========

My understanding is that atomic bulk_docs was removed to force people
into dealing with distributed conflict resolution (i.e. to prevent
people from using bulk docs as a crutch or using it as an indicator of
atomicity). However, solving the issues raised here will not mean that
the general conflict resolution problem is also solved. The two problems
are very different, as would be the probable technical solutions.

Multiple query operations against a single node initiated by a user
usually require a success/fail result. Dealing with conflicts during a
user 'request' complicates the writing of CouchDB-backed user
interfaces significantly. We feel the reintroduction of some form of the
previous bulk_docs functionality (with appropriate caveats and naming
convention, etc) is critical to some real world applications and will
provide more real world benefits than philosophical drawbacks.
