>
> Trees would be overkill except in very large clusters.
>

> With CouchDB map views, you need to combine results from every node in a
> big merge sort. If you combine all results at a single node, the single
> client's ability to simultaneously pull data and sort data from all other
> nodes may become the bottleneck. So to parallelize, you have multiple nodes
> doing a merge sort of sub-nodes, then sending those results to another node
> to be combined further, etc. The same goes for the reduce views, but
> instead of a merge sort it's just rereducing results. The natural "shape" of
> that computation is a tree, with only the final root node at the top being
> the bottleneck, but now it has to maintain connections and merge the sorted
> values from far fewer nodes.
>
> -Damien


That makes sense, and it clarifies one of my questions about this topic. Is
the goal of partitioned clustering to increase performance for very large
data sets, or to increase reliability? It would seem from this answer that
the goal is to increase query performance by distributing the query
processing, not to increase reliability.
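
To check my understanding of the tree-shaped combination described in the
quoted message, here is a minimal sketch in Python. It is not CouchDB code:
the names tree_merge, tree_rereduce and fanout are hypothetical, each
"node" is just an in-memory sorted list, and there is no networking. It only
shows that no single merge step has to handle more than fanout inputs.

```python
import heapq
from functools import reduce


def tree_merge(node_results, fanout=2):
    """Merge sorted per-node row lists level by level, like a tree.

    node_results: list of already-sorted lists (one per node; assumed shape).
    fanout: how many children each intermediate merge combines (assumed knob).
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = [list(rows) for rows in node_results]
    if not level:
        return []
    # Each pass is one level of the tree: groups of `fanout` sorted lists
    # are merged into one sorted list, until only the root remains.
    while len(level) > 1:
        level = [
            list(heapq.merge(*level[i:i + fanout]))
            for i in range(0, len(level), fanout)
        ]
    return level[0]


def tree_rereduce(partials, rereduce, fanout=2):
    """Combine per-node reduce values the same way, rereducing instead of merging.

    rereduce: a two-argument function that combines two partial results;
    it should be associative so the grouping does not change the answer.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = list(partials)
    if not level:
        raise ValueError("need at least one partial result")
    while len(level) > 1:
        level = [
            reduce(rereduce, level[i:i + fanout])
            for i in range(0, len(level), fanout)
        ]
    return level[0]


if __name__ == "__main__":
    # Three hypothetical nodes, each holding sorted (key, value) rows.
    nodes = [[("a", 1), ("d", 4)], [("b", 2)], [("c", 3), ("e", 5)]]
    print(tree_merge(nodes, fanout=2))
    # Per-node counts combined by summing, as a rereduce step might.
    print(tree_rereduce([2, 1, 2], lambda x, y: x + y, fanout=2))
```

With a fanout of 2 the root only ever merges two inputs, which is the
"far fewer nodes" point above; a larger fanout gives a shallower tree at the
cost of more connections per merging node.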
