[ https://issues.apache.org/jira/browse/MAHOUT-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919314#comment-13919314 ]
yannis ats commented on MAHOUT-1431:
------------------------------------

I am fairly sure that the reducer takes more time than the mapper. If I remember correctly, both the mapper and the reducer probably take more time in 0.9 than in 0.8, and I think the extra time was mostly in the mapper, but I am not very confident (my memory is not very good). I cannot answer this question right now; I will probably have to rerun the jobs and check manually how long each phase takes.

> Comparison of Mahout 0.8 vs mahout 0.9 in EMR
> ---------------------------------------------
>
>                 Key: MAHOUT-1431
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1431
>             Project: Mahout
>          Issue Type: Question
>          Components: Clustering
>   Affects Versions: 0.8, 0.9
>            Reporter: yannis ats
>              Labels: performance
>
> Hi all,
> I tested Mahout 0.8 and 0.9 on Amazon EMR, running k-means experiments
> with both versions on a large dataset.
> What I found is that Mahout 0.8 is faster than Mahout 0.9.
> In particular, I observed that Mahout 0.8 performs fewer iterations, and
> that every k-means iteration in 0.8 is about twice as fast as in 0.9.
> The Hadoop version was 1.0.x, and the input was roughly 2 million
> data points with a dimensionality of 1800.
> The input parameters in both experiments were exactly the same, apart from
> the initialization, which was random in both cases. I can understand that
> this may affect convergence (the number of iterations), but I am baffled by
> the fact that every iteration takes almost twice as long in 0.9 as in 0.8.
> Is this normal? Is this expected?
> Thank you in advance for your time.

--
This message was sent by Atlassian JIRA
(v6.2#6252)