[jira] [Commented] (MAHOUT-1431) Comparison of Mahout 0.8 vs mahout 0.9 in EMR

Suneel Marthi (JIRA) Tue, 04 Mar 2014 03:36:43 -0800

    [ 
https://issues.apache.org/jira/browse/MAHOUT-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919293#comment-13919293
 ]


Suneel Marthi commented on MAHOUT-1431:
---------------------------------------

Comparing 0.7 to 0.8 is an apples-to-oranges comparison. The clustering code 
was redone for 0.7 and did not work correctly until 0.8.

> Comparison of Mahout 0.8 vs mahout 0.9 in EMR
> ---------------------------------------------
>
>                 Key: MAHOUT-1431
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1431
>             Project: Mahout
>          Issue Type: Question
>          Components: Clustering
>    Affects Versions: 0.8, 0.9
>            Reporter: yannis ats
>              Labels: performance
>
> Hi all,
> I tested Mahout 0.8 and 0.9 on Amazon EMR, running k-means experiments with 
> both versions on a large dataset.
> I found that Mahout 0.8 is faster than Mahout 0.9. In particular, 0.8 
> performs fewer iterations, and every k-means iteration in 0.8 is twice as 
> fast as in 0.9.
> The Hadoop version was 1.0.x, and the input was roughly 2 million data points 
> with a dimensionality of 1800.
> The input parameters were exactly the same in both experiments, apart from 
> the initialization, which was random in both cases. I understand that this 
> may affect convergence (the number of iterations), but I am baffled that 
> every iteration takes almost twice as long in 0.9 as in 0.8.
> Is this normal? Is this expected?
> Thank you in advance for your time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
