On Sep 24, 2008, at 2:15, Ayende Rahien wrote:
First, I must admit that I am a complete newbie with Erlang.
Nevertheless, I tried to read the CouchDB source in order to see if I
can actually understand what is going on.
You can see the results here:
http://ayende.com/Blog/archive/2008/09/24/reading-erlang-inspecting-couchdb.aspx
I would be happy if someone could point out all the gross inaccuracies
that are undoubtedly there.
I'm only halfway through, but I'll send in my comments soon.
Anyway, I had a few questions that I hope I'll be able to get some
answers for.
Merge conflicts: how does CouchDB decide on the "best" revision?
It arbitrarily chooses one revision. The only guarantee made is that,
for the same conflict, all nodes in a CouchDB cluster choose the same
latest revision, which ensures data consistency.
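
As an illustration only (this is not CouchDB's actual rule), the
guarantee above can be sketched in Python: if every node picks the
winner with a pure function of the set of conflicting revisions, the
order in which the revisions arrived does not matter. The revision id
format ("generation-hash"), the tie-break rule, and the name
`pick_winner` are assumptions made for this sketch.

```python
def pick_winner(revisions):
    """Pick one revision deterministically from a set of conflicting ones.

    Hypothetical rule: highest generation number wins, ties are broken by
    the lexicographically largest hash part. Any pure function of the set
    would give the same "all nodes agree" property.
    """
    def sort_key(rev):
        generation, _, digest = rev.partition("-")
        return (int(generation), digest)

    return max(revisions, key=sort_key)


# Two nodes that saw the same conflicting revisions in a different order
# still agree on the winner.
node_a = pick_winner(["2-aaa", "2-bbb", "1-zzz"])
node_b = pick_winner(["1-zzz", "2-bbb", "2-aaa"])
```

The choice is arbitrary in the sense that it says nothing about which
revision is "better"; it only has to be the same everywhere.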
Does CouchDB store all documents on all servers, or does it implement
sharding? From browsing the code, it seems like all documents exist on
all servers,
For the moment, yes: all docs are on all nodes, but we will have
sharding. Also, consistent hashing in your data storage layer could
already emulate that.
and it is up to the server management to decide how to replicate
between them. Something like one master and two slaves for every three
nodes should do quite well, I imagine.
Correct.
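
For the consistent hashing suggestion above, here is a minimal Python
sketch of how an application's own storage layer might map document ids
to CouchDB nodes. Nothing in it is part of CouchDB: the `HashRing`
class, the node names, and the number of virtual points per node are
all assumptions for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring mapping document ids to node names."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring at `replicas` pseudo-random
        # points so that documents spread evenly across nodes.
        self._ring = []
        for node in nodes:
            for i in range(replicas):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._keys = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, doc_id):
        # The first ring point clockwise from the document's hash owns
        # it; wrap around to the start when we run off the end.
        idx = bisect.bisect(self._keys, self._hash(doc_id)) % len(self._keys)
        return self._ring[idx][1]


ring = HashRing(["couch1", "couch2", "couch3"])
target = ring.node_for("doc-42")
```

The point of the ring, compared with a plain `hash(id) % n`, is that
adding a node only moves the documents that now belong to the new node;
everything else stays where it was.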
Two questions that are of particular interest to me, and that I haven't
been able to answer from the code so far, are:
- How is the data stored? I think that it is a binary tree on disk, but
I am not following how updates to it can be done safely with ACID
guarantees.
Writes are serialized. Only one write can happen at a time and it is
completely flushed and committed to disk (2 x fsync()) before another
write comes in. Writes are append-only. No data is ever overwritten.
This gives us the ACID & MVCC buzzcronyms :-)
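
A rough Python sketch of the write path described above, under stated
assumptions: one lock serializes writers, each write appends the data
and syncs it, then appends a small commit marker and syncs again. The
JSON record format, the marker layout, and the name `append_commit` are
invented for the example; CouchDB's real on-disk format is different.

```python
import json
import os
import tempfile
import threading

_write_lock = threading.Lock()  # only one writer at a time


def append_commit(path, doc):
    """Append `doc` to the file at `path` and return its byte offset."""
    payload = json.dumps(doc).encode("utf-8") + b"\n"
    with _write_lock:
        with open(path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # first fsync: the data is on disk
            marker = {"commit": offset, "length": len(payload)}
            f.write(json.dumps(marker).encode("utf-8") + b"\n")
            f.flush()
            os.fsync(f.fileno())  # second fsync: the commit marker is on disk
    return offset


# Usage: two updates to the same document id. The old version is still in
# the file, which is what lets readers keep a consistent older view.
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "example.couch")
    first = append_commit(db_path, {"_id": "a", "v": 1})
    second = append_commit(db_path, {"_id": "a", "v": 2})
    with open(db_path, "rb") as f:
        lines = f.read().splitlines()
```

If the process dies between the two syncs, the file ends with data that
has no commit marker after it, so a reader can ignore the incomplete
tail and fall back to the last marker it finds.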
- How are views stored?
In the same way as a database.
Cheers
Jan