[ https://issues.apache.org/jira/browse/MAHOUT-966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Ingersoll resolved MAHOUT-966.
------------------------------------

    Resolution: Not A Problem

This is actually behaving correctly. Here's what I did:
# bin/mahout org.apache.mahout.clustering.syntheticcontrol.meanshift.Job -x 25 -cd 5 -t1 50 -t2 10 -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure -i /path/to/synthetic_control.data -ow -o output -cl
# Independently, do:
## bin/mahout clusterdump -i output/clusters-5-final/ -p output/clusteredPoints -o /tmp/clusterdump.txt
## For clusterPP
### bin/mahout clusterpp -i output -o output/post
### bin/mahout seqdumper -i output/post/0/part-r-00000 --facets

Both report 5 clusters total. For clusterpp, Seq Dumper reports the following number of points per cluster:
{quote}
-----Facets---
Key Count
0 145
101 31
104 25
200 199
300 200
{quote}

For clusterdumper, I see:
{quote}
MSV-0{n=145
MSV-101{n=31
MSV-104{n=25
MSV-200{n=199
MSV-300{n=200
{quote}

> Mismatch in the number of points given by the clusterDumper and ClusterOutputPostProcessor
> ------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-966
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-966
>             Project: Mahout
>          Issue Type: Bug
>          Components: Integration
>    Affects Versions: 0.6
>         Environment: hadoop 0.20.2 mahout 0.6
>            Reporter: Gaurav Redkar
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.8
>
>         Attachments: cluster-dumper-output.txt, clusterpp-output.txt, mtestdata.txt, points100dCCNorm.txt
>
>
> After running the post processor the number of points that each cluster contains is not matching the number of points each cluster should contain as stated by clusterdumper.
>
> MSV-287{ n=90 c=[0.05195, 0.05675, 0.07151, 0.05713, 0.06946,...}
> MSV-145{ n=90 c=[0.93685, 0.93071, 0.93641, 0.94629, 0.94409,..}
> the n mentioned in clusters-n-final against each cluster is different from the number of points actually contained in d directory for each cluster.
> Any idea why is this happening ...?
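
For anyone repeating the check described in the resolution, the per-cluster counts from the two tools can be compared mechanically rather than by eye. The following is a minimal Python sketch, assuming the seqdumper facets section and the clusterdump lines look like the two excerpts quoted in the resolution. The function names parse_facets and parse_clusterdump are hypothetical helpers and are not part of Mahout.

```python
import re


def parse_facets(text):
    """Collect 'key count' rows from a seqdumper --facets excerpt.

    Assumes each data row is two whitespace-separated integers, as in the
    excerpt quoted in the resolution; the header and rule lines are skipped.
    """
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit() and parts[1].isdigit():
            counts[int(parts[0])] = int(parts[1])
    return counts


def parse_clusterdump(text):
    """Collect cluster id and n from entries such as 'MSV-101{n=31'."""
    return {
        int(m.group(1)): int(m.group(2))
        for m in re.finditer(r"MSV-(\d+)\{\s*n=(\d+)", text)
    }


# Excerpts copied from the resolution comment.
facets_text = """-----Facets---
Key Count
0 145
101 31
104 25
200 199
300 200"""

clusterdump_text = """MSV-0{n=145
MSV-101{n=31
MSV-104{n=25
MSV-200{n=199
MSV-300{n=200"""

clusterpp_counts = parse_facets(facets_text)
clusterdump_counts = parse_clusterdump(clusterdump_text)

# Map each cluster id whose counts differ to (clusterdump n, clusterpp count).
mismatches = {
    key: (clusterdump_counts.get(key), clusterpp_counts.get(key))
    for key in set(clusterdump_counts) | set(clusterpp_counts)
    if clusterdump_counts.get(key) != clusterpp_counts.get(key)
}
print(mismatches)  # an empty dict means the two tools agree
```

With the numbers quoted in the resolution the dictionary is empty, which matches the "Not A Problem" outcome. Pointing the same helpers at the attached cluster-dumper-output.txt and clusterpp-output.txt would only work if those files follow the same layout, which is an assumption here.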