Just so I understand:
attaching previous versions as attachments means either:
1) Last version of document containing a list of attachments
or
2) Last version of document containing previous version as
attachment, which itself contains previous version as attachment,
etc...
If the answer is (2), then merging updates from several servers might
be really difficult!
If the answer is (1), merging is simpler but it is not very easy to
generate a version number, except using revision dates.
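To make the two options concrete, here is a rough Python sketch using
plain dicts. It is not CouchDB code: the field names (body, saved_at,
history, previous) and the helper functions are made up for
illustration only.

```python
import copy


def save_flat(doc, new_body, when):
    """Option (1): the latest version carries a flat list of earlier versions."""
    earlier = {"body": doc["body"], "saved_at": doc["saved_at"]}
    return {
        "body": new_body,
        "saved_at": when,
        "history": doc.get("history", []) + [earlier],
    }


def save_nested(doc, new_body, when):
    """Option (2): the latest version embeds the whole previous document."""
    return {"body": new_body, "saved_at": when, "previous": copy.deepcopy(doc)}


def nesting_depth(doc):
    """Count how many versions are buried inside a nested document."""
    depth = 0
    while "previous" in doc:
        doc = doc["previous"]
        depth += 1
    return depth


def merge_flat(history_a, history_b):
    """Union two flat histories from different servers, ordered by save date."""
    unique = {(e["saved_at"], e["body"]): e for e in history_a + history_b}
    return sorted(unique.values(), key=lambda e: e["saved_at"])


v1 = {"body": "first", "saved_at": "2008-03-17T10:00"}
flat = save_flat(save_flat(v1, "second", "2008-03-17T11:00"),
                 "third", "2008-03-17T12:00")
nested = save_nested(save_nested(v1, "second", "2008-03-17T11:00"),
                     "third", "2008-03-17T12:00")
```

With the flat layout, merging two servers' histories is a union sorted
by date, which is why dates end up standing in for version numbers. With
the nested layout you would have to walk both chains before you could
even compare them.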
Ultimately, the reasons to keep revisions (in what I am considering
using couchdb for) are:
1) audit trail (for legal reasons) which means not only "show me who
changed what when in document X" but "show me a set of documents as
they were on Jan-3-2008 10:28"
2) support different document statuses: archived (i.e. can't be changed),
published (for global use), published locally, work in progress (only
for the user editing)
Point 2 is important because it means a document can be "live" with
several different revisions and depending who you are in the system,
you get to see one or another revision.
It actually means that it should be easy to write views which say, for
example:
"give me all published documents + all my work in progress documents"
Since there could be many published revisions, it is actually "give me
the last revision with published status + last revision with work in
progress status"
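As a rough illustration of that query (plain Python over a list of
dicts, not an actual CouchDB view; the fields doc_id, status, author and
saved_at and the status value "wip" are assumptions for the example):

```python
def visible_revisions(revisions, user):
    """Per document: the latest published revision plus the latest
    work-in-progress revision owned by `user`."""
    latest = {}
    for rev in revisions:
        if rev["status"] == "published":
            key = (rev["doc_id"], "published")
        elif rev["status"] == "wip" and rev["author"] == user:
            key = (rev["doc_id"], "wip")
        else:
            continue  # archived, or someone else's draft
        if key not in latest or rev["saved_at"] > latest[key]["saved_at"]:
            latest[key] = rev
    return sorted(latest.values(), key=lambda r: (r["doc_id"], r["status"]))


sample_revisions = [
    {"doc_id": "a", "status": "published", "author": "bob",
     "saved_at": "2008-01-01", "body": "a1"},
    {"doc_id": "a", "status": "published", "author": "bob",
     "saved_at": "2008-02-01", "body": "a2"},
    {"doc_id": "a", "status": "wip", "author": "alex",
     "saved_at": "2008-02-10", "body": "alex draft 1"},
    {"doc_id": "a", "status": "wip", "author": "alex",
     "saved_at": "2008-02-12", "body": "alex draft 2"},
    {"doc_id": "a", "status": "wip", "author": "bob",
     "saved_at": "2008-02-15", "body": "bob draft"},
]
alex_view = visible_revisions(sample_revisions, "alex")
```

Here alex_view holds the revision with body "a2" and the one with body
"alex draft 2"; Bob's draft is never shown to Alex.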
Then, when I have finished working on my "work in progress" document I want
to store it as "published" and delete all revisions with status work
in progress I created between last published document and my new
version...
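A minimal sketch of that "publish" step, again in plain Python with
made-up field names (doc_id, status, author, saved_at) and a
hypothetical publish() helper, just to pin down what I mean:

```python
def publish(revisions, doc_id, user, when):
    """Promote the user's newest draft to published and drop the drafts
    that user saved since the last published revision."""
    same_doc = [r for r in revisions if r["doc_id"] == doc_id]
    last_published = max(
        (r["saved_at"] for r in same_doc if r["status"] == "published"),
        default="",
    )
    drafts = [
        r for r in same_doc
        if r["status"] == "wip"
        and r["author"] == user
        and r["saved_at"] > last_published
    ]
    if not drafts:
        return list(revisions)  # nothing to publish
    newest = max(drafts, key=lambda r: r["saved_at"])
    promoted = dict(newest, status="published", saved_at=when)
    draft_ids = {id(r) for r in drafts}
    kept = [r for r in revisions if id(r) not in draft_ids]
    return kept + [promoted]


before = [
    {"doc_id": "a", "status": "published", "author": "bob",
     "saved_at": "2008-02-01", "body": "a2"},
    {"doc_id": "a", "status": "wip", "author": "alex",
     "saved_at": "2008-02-10", "body": "draft 1"},
    {"doc_id": "a", "status": "wip", "author": "alex",
     "saved_at": "2008-02-12", "body": "draft 2"},
    {"doc_id": "a", "status": "wip", "author": "bob",
     "saved_at": "2008-02-15", "body": "bob draft"},
]
after = publish(before, "a", "alex", "2008-02-20")
```

Other users' drafts and all earlier published revisions are left alone,
so the audit trail of published versions stays intact.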
In summary, what I am describing here is rather generic in document
management systems. Do we want this as custom-built code, as an actual
part of CouchDB, or as an optional layer on top of CouchDB?
My 2 euro cents :-)
Alex
On 17 March 08, at 20:52, Damien Katz wrote:
On Mar 17, 2008, at 2:48 PM, Alan Bell wrote:
Jan Lehnardt wrote:
You can do that, too. With attachments, you'd have it all in one
place and would not need to write your views in a way that they
don't pick up old revisions. That said, it is certainly possible to
store older revisions in other documents, if that solves your
problems.
Cheers
Jan
--
well I might be missing something about the way couchdb handles
attachments but this doesn't sound good to me. Adding attachments
to hold the revision history means that the attachments have to be
replicated each time a revision happens.
Right now, this is true. But with attachment-level incremental
replication, only attachments that have changed will replicate.
Also a replication conflict is pretty much the same thing as a
revision, a client application would have no knowledge of a
replication conflict happening but this would be good to see in a
wiki-like page history. I can imagine in a distributed system it
would be very hard for the clients to maintain a revision history
as attachments.
I disagree about the difficulty. It's surprisingly simple
conceptually.
The first thing is, every time you update the document, simply
attach the previous revision when you save. Eventually there will be
a flag you can pass in to do this automatically.
Then, if there is a replication conflict to resolve, simply open the
two conflicting documents (manually if necessary), update your
chosen winner with any info you want to preserve from the loser
(data, revision histories, etc.), then delete the loser revision.
And that's it. The thing about this system is you can get very
simple or very complicated with the revision history aspects, it's
up to the application developer. The nice thing is you generally
don't need to worry about concurrent or distributed updates with
other nodes attempting the same thing. The same rules still apply
and eventually the conflicts will be resolved.
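The resolution steps described above could look roughly like this. It
is only a sketch in plain Python over dicts, not the CouchDB API: the
revision labels, the history field and the resolve() helper are all
assumptions made for the example.

```python
def resolve(conflicting, winner_rev, loser_rev):
    """Fold the loser (and its history) into the winner, then drop the loser.

    `conflicting` maps a revision label to a document dict.
    """
    winner = conflicting[winner_rev]
    loser = conflicting[loser_rev]
    loser_entry = {"body": loser["body"], "saved_at": loser["saved_at"]}
    entries = winner.get("history", []) + loser.get("history", []) + [loser_entry]
    # Entries shared by both sides (the common ancestry) collapse to one.
    unique = {(e["saved_at"], e["body"]): e for e in entries}
    resolved = dict(winner)
    resolved["history"] = sorted(unique.values(), key=lambda e: e["saved_at"])
    remaining = {r: d for r, d in conflicting.items() if r != loser_rev}
    remaining[winner_rev] = resolved
    return remaining


ancestor = {"body": "v1", "saved_at": "2008-03-17T10:00"}
conflict = {
    "2-a": {"body": "edit on server A", "saved_at": "2008-03-17T11:00",
            "history": [ancestor]},
    "2-b": {"body": "edit on server B", "saved_at": "2008-03-17T11:30",
            "history": [dict(ancestor)]},
}
resolved_docs = resolve(conflict, winner_rev="2-a", loser_rev="2-b")
```

After this, only "2-a" remains, and its history lists "v1" followed by
"edit on server B", so the losing edit still shows up in a page history.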
As for writing views to not pick up old revisions, I think all
applications should assume that all documents are at all times
carrying a bundle of prior versions and replication/save conflicts.
One of the nasty things in Notes is that most applications assume
that replication conflicts don't happen and can break when they do
happen. I think a major feature of CouchDB is sensible handling of
revisions and conflicts. Purging revisions and conflicts is going
to be necessary for some applications, but in others it is
desirable to retain all versions. It would be good at least to be
able to specify which databases to run compaction on and which to
exclude.
The scheduling of compaction is something that will be external to
the core database code. Much of the work here isn't in the actual
file-level compaction code, but in creating tools to monitor things
and initiate it with desired options.
What is the proposed rule for compaction? Just deleting all
revisions it finds? Deleting old revisions over a certain age?
For the first cut of compaction, it will unconditionally purge all
previous revisions of a document from a database, leaving only the
most recent revisions of the winner and its conflicts.
Then we will provide a way to perform selective purging during
compaction, probably with a user-provided function that will be fed
each document at compaction time and will return true or false
depending on whether the document should be kept or discarded. This is
also how deletion "stubs" will be purged (keeping some meta info about
deleted documents is necessary for replication).
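A hedged sketch of both compaction modes, in plain Python rather than
anything from the actual compaction code. The record fields (id, rev,
current, deleted, saved_at) and the functions compact(), first_cut()
and keep_recent() are invented for illustration; "current" marks the
latest revision of the winner or of one of its conflicts.

```python
def first_cut(record):
    """First-cut rule: keep only current revisions (winner and conflicts)."""
    return record["current"]


def compact(records, keep=first_cut):
    """Feed every record to `keep` and retain those it returns True for."""
    return [r for r in records if keep(r)]


def keep_recent(cutoff):
    """Example user rule: keep current revisions, plus anything saved on
    or after `cutoff`; purge deletion stubs older than `cutoff`."""
    def keep(record):
        if record["deleted"]:
            return record["saved_at"] >= cutoff
        return record["current"] or record["saved_at"] >= cutoff
    return keep


records = [
    {"id": "a", "rev": "1", "current": False, "deleted": False,
     "saved_at": "2008-01-01"},
    {"id": "a", "rev": "2", "current": False, "deleted": False,
     "saved_at": "2008-03-01"},
    {"id": "a", "rev": "3", "current": True, "deleted": False,
     "saved_at": "2008-03-10"},
    {"id": "a", "rev": "3-conflict", "current": True, "deleted": False,
     "saved_at": "2008-03-11"},
    {"id": "b", "rev": "2", "current": True, "deleted": True,
     "saved_at": "2008-01-15"},
]
unconditional = compact(records)
selective = compact(records, keep=keep_recent("2008-02-01"))
```

The unconditional pass keeps revision 3 of "a", its conflict, and the
deletion stub for "b". The selective pass keeps revisions 2 and 3 of
"a" and the conflict, and purges the old stub.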
Another thought, it would be nice perhaps to run compaction on some
servers but not on others for replicas of the same database. Thus a
bunch of offline clients could compact fairly frequently and
aggressively, however a central server they all replicate with that
has lots of disk space could retain all versions.
Ok, that's a neat use case but I'm not sure how you would handle the
intermediate edits replicating back to the server. Maybe they just
get lost. It seems possible to support such a thing without a lot of
work. We'll see what is possible.
I am thinking in particular of the scenario of OLPC XO laptops
replicating with a school server.
Alan.