On Feb 20, 2009, at 4:37 PM, Stefan Karpinski wrote:
Trees would be overkill except for very large clusters. With CouchDB map views, you need to combine results from every node in a big merge sort. If you combine all results at a single node, that single client's ability to simultaneously pull and sort data from all the other nodes may become the bottleneck. So to parallelize, you have multiple nodes doing a merge sort of sub-nodes, then sending those results to another node to be combined further, etc. The same goes for the reduce views, but instead of a merge sort it's just re-reducing results. The natural "shape" of that computation is a tree, with only the final root node at the top being the bottleneck, but now it only has to maintain connections to, and merge the sorted values from, far fewer nodes.
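As a rough illustration of the tree-shaped merge described above, here is a minimal Python sketch (not CouchDB code; the per-node results and the fanout value are made up for the example). Each "level" of the tree merges a small group of already-sorted streams, so no single node ever has to pull from every other node at once:

```python
import heapq

def merge_level(sorted_runs, fanout=2):
    """Merge sorted per-node result lists in groups of `fanout`,
    producing one merged list per group (one level of the tree)."""
    return [list(heapq.merge(*sorted_runs[i:i + fanout]))
            for i in range(0, len(sorted_runs), fanout)]

def tree_merge(sorted_runs, fanout=2):
    """Repeat until a single globally sorted result remains. At each
    level, any one merging node only combines `fanout` streams, which
    is what keeps the root from becoming the bottleneck."""
    while len(sorted_runs) > 1:
        sorted_runs = merge_level(sorted_runs, fanout)
    return sorted_runs[0]

# Four hypothetical nodes, each returning view rows already sorted by key:
node_results = [[1, 5, 9], [2, 6], [3, 7, 8], [4]]
print(tree_merge(node_results))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```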
-Damien
That makes sense and it clarifies one of my questions about this topic. Is the goal of partitioned clustering to increase performance for very large data sets, or to increase reliability? It would seem from this answer that the goal is to increase query performance by distributing the query processing, and not to increase reliability.
I see partitioning and clustering as 2 different things. Partitioning is data partitioning, spreading the data out across nodes, no node having the complete database. Clustering is nodes having the same, or nearly the same, data (they might be behind on replicating changes, but otherwise they have the same data).
Partitioning would primarily increase write performance (updates happening concurrently on many nodes) and the size of the data set. Partitioning helps with client read scalability, but only for document reads, not view queries. Partitioning alone could reduce reliability, depending on how tolerant you are to missing portions of the database.
Clustering would primarily address database reliability (failover) and client read scalability for both docs and views. Clustering doesn't help much with write performance because even if you spread out the update load, the replication as the cluster syncs up means every node gets the update anyway. It might be useful for dealing with update spikes, where you get a bunch of updates at once and can wait for the replication delay to get everyone synced back up.
For a really big, really reliable database, I'd have clusters of partitions, where the database is partitioned N ways, with each partition having at least M identical cluster members. Increase N for larger databases and update load, M for higher availability and read load.
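To make the N × M layout above concrete, here is a small Python sketch (purely illustrative; the hash-based routing scheme and node names are assumptions, not anything CouchDB specifies). The database is split into N partitions, each held by M identical replicas, for N * M nodes total; a document's id hashes to exactly one partition, and any of that partition's M members can serve it:

```python
import hashlib

N = 4  # partitions: increase for larger databases and update load
M = 3  # identical cluster members per partition: increase for availability and read load

def partition_for(doc_id, n=N):
    """Hash the doc id to pick the owning partition (illustrative scheme)."""
    digest = hashlib.md5(doc_id.encode()).hexdigest()
    return int(digest, 16) % n

# nodes[p] is the list of M replicas that all hold partition p's data
nodes = [[f"node-{p}-{m}" for m in range(M)] for p in range(N)]

p = partition_for("user:1234")
print(f"doc 'user:1234' -> partition {p}, any of replicas {nodes[p]}")
print(f"total nodes: {N * M}")
```

A write for a doc goes only to its partition's M members (concurrent writes spread across partitions), while a clustered read can hit whichever replica is up, which is the reliability side of the trade.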
-Damien