[ https://issues.apache.org/jira/browse/MAHOUT-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924841#comment-13924841 ]
yannis ats commented on MAHOUT-1431:
------------------------------------

How can I tell which mapper is slower? From the logs? I will try to look at the logs. The times I reported here are approximate and were read from the EMR console.

> Comparison of Mahout 0.8 vs mahout 0.9 in EMR
> ---------------------------------------------
>
>                 Key: MAHOUT-1431
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1431
>             Project: Mahout
>          Issue Type: Question
>          Components: Clustering
>   Affects Versions: 0.8, 0.9
>            Reporter: yannis ats
>              Labels: performance
>
> Hi all,
> I tested Mahout 0.8 and 0.9 with a large dataset as input, running k-means
> experiments with both versions on Amazon EMR.
> What I found is that Mahout 0.8 is faster than Mahout 0.9.
> In particular, I observed that Mahout 0.8 performs fewer iterations, and
> every k-means iteration is faster than in Mahout 0.9: each iteration in
> Mahout 0.8 is twice as fast as in 0.9.
> The Hadoop version was 1.0.x, and the input was roughly 2 million
> data points with a dimensionality of 1800.
> The input parameters in both experiments were exactly the same, modulo the
> initialization, which was random in both cases. I understand that this
> may affect convergence (the number of iterations), but I am baffled by the
> fact that every iteration takes almost twice as long in 0.9 as in 0.8.
> Is this normal? Is this expected?
> Thank you in advance for your time.

--
This message was sent by Atlassian JIRA
(v6.2#6252)