Mahout algorithms guide

Bogdan Vatkov Wed, 06 Jan 2010 18:21:29 -0800

Hi,

I am wondering whether the different algorithms available in Mahout produce
different results and behave differently (e.g. performance: memory, speed,
etc.), and if so, could we have a short description (2-3 sentences per
algorithm) of each of them?
For example, how do they perform under different conditions, e.g. how do they
behave with respect to:
- number of documents
- average document size
- documents of very different sizes (e.g. half of the docs are very small
and the other half very large - would either of the doc sizes win for some
reason during clustering?)
- cluster size
- ratio of the number of documents to cluster size
- memory needed
- time needed


For example, right now I am interested in clustering documents:
- of similar size (most of the documents have a size very close to the
average size)
- with a desired ratio of docs to clusters of 23 000 : 80 (or maybe even
: 40 and : 20)
Which Mahout algorithm, and with which parameters, is recommended for my
case?

Of course, I should be able to run my data through all the available
algorithms and then compare the results, but it would be good to know whether
using one algorithm or another leads to a particular flavor of result,
especially if this is already known from the specifics of the algorithms.

Best regards,
Bogdan
