On Feb 20, 2009, at 4:37 PM, Stefan Karpinski wrote:


Trees would be overkill except with very large clusters.


With CouchDB map views, you need to combine results from every node in a big merge sort. If you combine all results at a single node, that single client's ability to simultaneously pull and sort data from all the other nodes may become the bottleneck. So to parallelize, you have multiple nodes each doing a merge sort of their sub-nodes, then sending those results to another node to be combined further, etc. The same goes for reduce views, but instead of a merge sort it's just rereducing results. The natural "shape" of that computation is a tree, with only the final root node at the top being a potential bottleneck, but now it only has to maintain connections and merge sorted values from far fewer nodes.
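
As a rough illustration (not how CouchDB itself implements this), here's a minimal Python sketch of that tree-shaped merge, assuming each node hands back its view rows already sorted by key; the sample rows and the two-level tree layout are made up:

    import heapq

    # Hypothetical sorted (key, value) view rows as they might come
    # back from four partition nodes.
    node_results = [
        [(1, "a"), (5, "b")],
        [(2, "c"), (6, "d")],
        [(3, "e"), (7, "f")],
        [(4, "g"), (8, "h")],
    ]

    def merge_node(child_streams):
        # k-way merge of already-sorted row streams from child nodes,
        # yielding one sorted stream to hand up the tree.
        return heapq.merge(*child_streams, key=lambda row: row[0])

    # Two interior nodes each merge two leaves; the root then merges
    # just the two intermediate streams instead of all four leaves.
    left = merge_node(node_results[:2])
    right = merge_node(node_results[2:])
    print(list(merge_node([left, right])))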

-Damien


That makes sense, and it clarifies one of my questions about this topic. Is the goal of partitioned clustering to increase performance for very large data sets, or to increase reliability? It would seem from this answer that the goal is to increase query performance by distributing the query processing, and not to increase reliability.


I see partitioning and clustering as 2 different things. Partitioning is data partitioning: spreading the data out across nodes, with no node having the complete database. Clustering is nodes having the same, or nearly the same, data (they might be behind on replicating changes, but otherwise they have the same data).
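
To make the partitioning side concrete, a minimal sketch assuming documents are routed by hashing the doc id; the node names and the choice of hash are just illustrative, not CouchDB's own mechanism:

    import hashlib

    # Hypothetical 4-way partitioned layout: each node holds a disjoint
    # slice of the database, so no single node has the complete database.
    partition_nodes = ["node0", "node1", "node2", "node3"]

    def node_for(doc_id):
        # A stable hash of the doc id routes each document's reads and
        # writes to the one node that owns it.
        h = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
        return partition_nodes[h % len(partition_nodes)]

    print(node_for("some-doc-id"))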

Partitioning would primarily increase write performance (updates happening concurrently on many nodes) and the size of the data set. Partitioning helps with client read scalability, but only for document reads, not view queries. Partitioning alone could reduce reliability, depending on how tolerant you are to missing portions of the database.

Clustering would primarily address database reliability (failover) and client read scalability for docs and views. Clustering doesn't help much with write performance, because even if you spread out the update load, the replication as the cluster syncs up means every node gets the update anyway. It might be useful for dealing with update spikes, where you get a bunch of updates at once and can wait out the replication delay to get everyone synced back up.

For a really big, really reliable database, I'd have clusters of partitions, where the database is partitioned N ways, with each partition having at least M identical cluster members. Increase N for larger databases and update load, M for higher availability and read load.
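
Extending the routing sketch above with the replica dimension, a hypothetical N-by-M layout (the constants and node names are made up):

    import hashlib

    N, M = 4, 3  # hypothetical: 4 partitions, 3 identical members each

    # nodes[p] is the cluster serving partition p; every member holds
    # the same slice, so any of them can serve reads or take over on
    # failure.
    nodes = [["p%d-r%d" % (p, r) for r in range(M)] for p in range(N)]

    def cluster_for(doc_id):
        # The hash picks the owning partition; a write lands on one
        # member and replicates to the rest, a read can go to any member.
        p = int(hashlib.md5(doc_id.encode()).hexdigest(), 16) % N
        return nodes[p]

    print(cluster_for("some-doc-id"))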

-Damien
