On 3 Aug 2009, at 17:26, Rune Skou Larsen wrote:
Damien Katz wrote:
2009/7/31 Jason Davies <[email protected]>:
The main points of this proposal are:
1. Store the historical versions of documents in a separate database.
This is for a number of reasons: a) keeping it separate means we don't
clog up the main database with historical data b) history-specific
views can be kept here c) non-intrusive implementation of this is
easier.
Some comments about the proposal:
1. The callbacks must be synchronous. Queueing them for writing later
means the queue can get overloaded and changes lost.
2. Changes can still get lost. We don't have commits across dbs, so
it's possible a crash during update will put the main and history dbs
out of sync.
3. Replicated changes get lost. If a client makes 5 edits to a local
replica of a document, then replicates it to a server db, only the
most recent change gets recorded in the history.
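To make point 3 concrete, here is a toy sketch in Python. ToyDb, its
save() hook and replicate() are all made up for illustration and are
not the CouchDB API; the only thing modelled is that replication ships
the current version of a document, so a history hook on the target
fires once rather than once per edit.

```python
class ToyDb:
    """Toy in-memory database (hypothetical, not the CouchDB API)."""

    def __init__(self):
        self.docs = {}      # doc id -> current body
        self.history = []   # bodies recorded by the update hook

    def save(self, doc_id, body):
        # Hypothetical synchronous hook: record the new body, then store it.
        self.history.append((doc_id, dict(body)))
        self.docs[doc_id] = dict(body)


def replicate(source, target):
    # Only the current version of each document is transferred.
    for doc_id, body in source.docs.items():
        target.save(doc_id, body)


local = ToyDb()
server = ToyDb()
for n in range(1, 6):
    local.save("page", {"text": f"edit {n}"})

replicate(local, server)
print(len(local.history))   # 5
print(len(server.history))  # 1
```

The server's history ends up with a single entry ("edit 5"), while the
four intermediate edits exist only in the local replica's history.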
I would prefer to store the history as attachments to the main
document.
-Damien
I agree that _all versions of a document should be in the same
database_, because the commit scope of a change should include saving
the undo history.
What good is unreliable undo?
But also for other reasons:
1) Future versions
In my company, we need a system where we can replicate data to all
CouchDB instances before it is used. This is also very common in the
CMS world for scheduling a change to the website. So we need to be
able to store a future version, which becomes valid at a specified
time, and make the change between versions "invisible" (we use a URL
rewrite). That's very tough if current data and history data are in
separate databases and in different formats.
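As a rough sketch of what "becomes valid at a specified time" could
mean if past, current and future versions all shared one format: pick
the latest version whose start time is not after the requested time.
The valid_from field and the version_at() helper are assumptions made
up for this example, not anything CouchDB provides.

```python
from datetime import datetime, timezone


def version_at(versions, when):
    """Return the latest version with valid_from <= when, or None."""
    candidates = [v for v in versions if v["valid_from"] <= when]
    return max(candidates, key=lambda v: v["valid_from"], default=None)


# Hypothetical versions of one document: past, current and scheduled.
versions = [
    {"valid_from": datetime(2009, 7, 1, tzinfo=timezone.utc), "title": "old"},
    {"valid_from": datetime(2009, 8, 1, tzinfo=timezone.utc), "title": "current"},
    {"valid_from": datetime(2009, 9, 1, tzinfo=timezone.utc), "title": "scheduled"},
]

now = datetime(2009, 8, 3, 17, 26, tzinfo=timezone.utc)
print(version_at(versions, now)["title"])  # current
```

With everything in one place, the scheduled version simply takes over
once the clock passes its valid_from, with no write needed at that
moment.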
2) Applying views
Viewing historic docs should be as powerful as viewing "current"
docs. With the proposed format for historic documents, the same view
cannot be applied to both the current and the history db. In fact,
complex views can't be used at all in the history db, since the
one-dimensional view index must include time.
I dream of a fully temporal CouchDB, where all GET requests can
include the point in time for which I want to see the docs through my
views, lists and shows :-)
Using attachments is not optimal, because there's still the
"un-dynamic" distinction between past, current and future, but it's
much better than a separate db. The attachments proposal retains the
possibility to manipulate versions of the same doc in one commit
scope.
We've just been discussing this some more on IRC and Benoît suggested
adding a "_history" member to allow historical versions of documents
to be stored there (essentially as attachments, because doc._history
would by default only contain stubs). I'd prefer not to overpopulate
the "_" namespace so I'm not set on adding doc._history but let's run
with this for this discussion.
The stubs would contain basic metadata: last-modified timestamp and
the userCtx that modified the doc (perhaps we can do away with
doc._history and add this metadata to the attachment metadata? Or
decide on a format for the attachment filename, e.g.
_history/<timestamp>/<userCtx>.json?)
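Purely as a sketch of the shape being discussed, the following Python
builds the proposed attachment name and a matching doc._history stub.
The stub field names, the ISO 8601 timestamp and the use of the user's
name to stand in for the userCtx are all assumptions for illustration;
nothing here is an existing CouchDB feature.

```python
import json


def history_attachment_name(timestamp, user_name):
    # Follows the proposed pattern _history/<timestamp>/<userCtx>.json
    return f"_history/{timestamp}/{user_name}.json"


def add_history_stub(doc, old_body, timestamp, user_name):
    """Append a stub for old_body to doc["_history"].

    Returns the attachment name and the serialised old body, which
    would be stored as the attachment itself.
    """
    name = history_attachment_name(timestamp, user_name)
    data = json.dumps(old_body, sort_keys=True)
    doc.setdefault("_history", []).append({
        "attachment": name,      # hypothetical field names throughout
        "timestamp": timestamp,
        "user": user_name,
        "length": len(data),
        "stub": True,
    })
    return name, data


doc = {"_id": "page", "text": "second draft"}
name, data = add_history_stub(
    doc, {"_id": "page", "text": "first draft"},
    "2009-08-03T17:26:00Z", "jason",
)
print(name)  # _history/2009-08-03T17:26:00Z/jason.json
```

The stub carries only the metadata; the old body lives in the
attachment and is fetched only when someone actually asks for it.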
This would then make it easy to write views that manipulate the
history via the doc._history stubs. I'm thinking we probably only
want to send the stubs to the view server, as serialising all the
historical data for each doc could get CPU-hungry.
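A view over the stubs might look something like the sketch below. Real
view functions would normally be JavaScript in a design doc; this is a
Python emulation, and the stub fields (timestamp, user) are assumed
for the sake of the example.

```python
from collections import Counter


def map_history(doc):
    """Emulated map function: one ((user, timestamp), 1) row per stub."""
    for stub in doc.get("_history", []):
        yield (stub["user"], stub["timestamp"]), 1


# Hypothetical docs as the view server would see them (stubs only).
docs = [
    {"_id": "page-1", "_history": [
        {"timestamp": "2009-08-01T10:00:00Z", "user": "jason", "stub": True},
        {"timestamp": "2009-08-02T09:30:00Z", "user": "rune", "stub": True},
    ]},
    {"_id": "page-2", "_history": [
        {"timestamp": "2009-08-03T17:26:00Z", "user": "jason", "stub": True},
    ]},
    {"_id": "page-3"},  # no history yet
]

rows = sorted(row for doc in docs for row in map_history(doc))
edits_per_user = Counter(key[0] for key, _ in rows)
print(edits_per_user["jason"], edits_per_user["rune"])  # 2 1
```

Since only the stubs are touched, the cost per doc scales with the
number of versions rather than with the size of their bodies.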
The other question is whether to make this a db-wide setting, perhaps
a special doc so that it will be replicated (_history_settings), or
perhaps put it in design docs, or do we want to configure it on a
per-doc level? Rune suggested something like
{ _history_settings: { num_docs: 10, ... } }. I would probably lean
towards putting it in design docs, so that the decision can be made by
the app developer.
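One possible reading of num_docs, sketched in Python: keep only the N
most recent history entries. That interpretation, the prune_history()
helper and the timestamp field on each stub are assumptions for
illustration; the setting's meaning has not been pinned down.

```python
def prune_history(stubs, num_docs):
    """Keep only the num_docs most recent stubs, oldest first."""
    if num_docs <= 0:
        return []
    # ISO 8601 UTC timestamps sort correctly as plain strings.
    ordered = sorted(stubs, key=lambda s: s["timestamp"])
    return ordered[-num_docs:]


settings = {"_history_settings": {"num_docs": 2}}
stubs = [{"timestamp": f"2009-08-0{d}T00:00:00Z"} for d in (3, 1, 4, 2)]

kept = prune_history(stubs, settings["_history_settings"]["num_docs"])
print([s["timestamp"] for s in kept])
# ['2009-08-03T00:00:00Z', '2009-08-04T00:00:00Z']
```

Wherever the setting ends up living (special doc, design doc or
per-doc), the pruning step itself would look much the same.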
There is a possibility that this could be implemented in the _update
handler but I'd strongly prefer to have a core module written in
Erlang for performance reasons, and to make it easier for people to
turn it on and off.
Finally, whartung pointed out this paper: http://www.cs.tau.ac.il/~ohadrode/papers/btree_TOS.pdf
which contains some interesting info on using B-trees to support
snapshots. Maybe someone can comment on the feasibility of supporting
that?
Comments welcomed!
--
Jason Davies
www.jasondavies.com