Heya Ralf,

Thanks for your input and engaging in this discussion!

On Apr 12, 2008, at 04:36, Ralf Nieuwenhuijsen wrote:
Hi,

I've joined this mailing-list, because i wanted to reply to this discussion
specifically.
I was hoping you could clear a number of things up for me.

1. Why make compacting the default? Isn't it more likely that in this day &
age, most will prefer revisions for all data?

Because the storage system is pretty wasteful and you'd end up with several gigabytes of database files for just a few hundred megabytes of actual data. So we do need compaction in one form or another. A compaction that retains revisions is a lot harder to write. Also, dealing with revisions in a distributed setup is less than trivial and would complicate the replication system quite a bit.
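
To illustrate why the files grow, here is a minimal sketch of compacting an append-only log. It is not CouchDB's actual code; the `compact` function and the `(doc_id, rev, body)` tuple layout are made up for this example. Every update appends a new entry, and compaction keeps only the newest entry per document:

```python
def compact(log):
    """Keep only the most recent entry per document.

    `log` is a list of (doc_id, rev, body) tuples in write order.
    This layout is hypothetical and only serves the illustration.
    """
    latest = {}
    for doc_id, rev, body in log:
        # Later entries overwrite earlier ones for the same document.
        latest[doc_id] = (doc_id, rev, body)
    return list(latest.values())


log = [
    ("a", 1, {"v": 1}),
    ("b", 1, {}),
    ("a", 2, {"v": 2}),  # supersedes the first entry for "a"
]
compacted = compact(log)
```

After this, the old revision of "a" is gone, which is exactly what makes a revision-retaining compaction harder.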


2. Compacting seems like very specific behavior, wouldn't a built-in
cron-like system be much more generic? It could allow for all kinds of
background processing, like replication, fulltext-search using javascript,
compacting, searching-for-dead-urls, etc.

Compacting is a manual process at the moment. If we introduced a scheduling mechanism, it would certainly be more general purpose and you could hook in all sorts of operations, including compaction.


3. Is support for some sort of reduce behavior, as part of the views,
planned and if so, what can we expect?

See http://damienkatz.net/2008/02/incremental_map.html
and http://damienkatz.net/2008/02/incremental_map_1.html


4. What is the default conflict behavior? Most recent version wins?

There's no 'recent' in a distributed system. At the moment, the revision with the most changes wins, if I remember correctly.
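
As a rough sketch of the idea (assuming the "most changes wins" rule I recalled above; the tie-break on the revision id and the `pick_winner` name are my own assumptions, not CouchDB's code), every node can apply the same deterministic rule and arrive at the same winner without consulting a clock:

```python
def pick_winner(revisions):
    """Pick a winner among conflicting revisions.

    `revisions` is an iterable of (edit_count, rev_id) tuples.
    The revision with the most edits wins; ties are broken by
    rev_id (an assumed rule) so the result never depends on the
    order in which a node happened to receive the revisions.
    """
    return max(revisions, key=lambda r: (r[0], r[1]))


# Two nodes see the same conflict in a different order.
node_a = pick_winner([(3, "b"), (2, "z")])
node_b = pick_winner([(2, "z"), (3, "b")])
```

Both nodes end up with the same winner, which is the property that matters here.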


5. Is it possible to merge on conflicts, or if not, how could attachments possibly model revisions properly? Wouldn't we lose a whole revision tree?

You don't merge, at least at the moment, but declare one revision to be the winner when resolving the conflict. Since this is a manual process, you can make sure you don't lose revision trees. Merging might be added at some point, but no thought (at least not publicly) has gone into that yet.


6. Without merging, we need to store revisions in separate documents,
thereby prohibiting useful doc-is for documents under revision.

I don't understand what you mean here :) What is 'doc-is' in this context?


7. What added benefit do manual revisions have when we can just store extra
revision data to each document anyway?
I'm quite sure my understanding of CouchDB can be lacking. But to me it
seems like guaranteed revisions are the killer feature.

The revisions are not, at least at this point, meant for implementing revision control systems. Rather, they exist for the optimistic concurrency control that allows any number of parallel readers while serialised writes are happening, and to power replication.
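
A minimal sketch of what optimistic concurrency control means in practice. This is a toy in-memory store, not CouchDB's implementation: `TinyStore`, `ConflictError` and the integer revisions are all invented for the example. A write only succeeds if it names the revision it was based on; otherwise the writer has to re-read and retry:

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale revision."""


class TinyStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (rev, body)

    def get(self, doc_id):
        # Readers never block; they just see the current revision.
        return self._docs[doc_id]

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            # Someone else updated the document in the meantime.
            raise ConflictError(f"expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = TinyStore()
r1 = store.put("doc", {"x": 1})          # create
r2 = store.put("doc", {"x": 2}, rev=r1)  # update based on r1
# A second writer still holding r1 would now get a ConflictError.
```

Note that the old revision is only needed to detect the conflict, not to keep history around.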


The alternative of a cron-like system could work much like the
view-documents. These documents could contain a source url (possibly local), a schedule-parameter and a function that maps a document to an array of documents that is treated as a batch-put. This way we could easily set up replication, but also all kinds of delayed and/or scheduled processing of
data.

Indeed. No planning has gone into such a thing so far. You might want to open a feature request at https://issues.apache.org/jira/browse/COUCHDB or come up with a patch.


Likewise, being able to define a conflict function that could merge data or
decide who wins, seems like a much better alternative to the 'atomic'
batch-put-operations, that break down when distributed. (thereby no longer
guaranteeing the scalability; another killer-feature).

Conflict resolution and merge functions do sound interesting. I don't understand the "not guaranteeing scalability" remark, though. In the current implementation, this feature actually makes CouchDB scalable by ensuring that all nodes participating in a cluster eventually end up with the same data. If you really do need two-phase commit (if I understand correctly, that is what you want), that would need to be part of your application or an intermediate storage layer.


Cheers
Jan
--
