[ https://issues.apache.org/jira/browse/MAHOUT-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171226#comment-15171226 ]
Andrew Musselman commented on MAHOUT-551:
-----------------------------------------

This k-means job does work with numerical inputs but keep in mind this ticket is quite old so things may have changed since this was integrated into the code base. Are you having trouble running the job?

> Kmeans example with space delimited data
> ----------------------------------------
>
>                 Key: MAHOUT-551
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-551
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Integration
>    Affects Versions: 0.4
>            Reporter: Djellel Eddine Difallah
>            Assignee: Jeff Eastman
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: MAHOUT-551.patch, MAHOUT-551.patch
>
>
> The provided example for Kmeans clustering using the synthetic control data
> asks for t1 and t2 measures because it runs the Canopy Driver to determine
> the initial clusters. Kmeans originally requires a K variable to generate
> random centers from the input data. I propose to add another example in the
> package which will serve for any space delimited numerical input to cluster
> with Kmeans in its original form and not using Canopy. The modification is
> quite simple and is mostly based on the synthetic control Job.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
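
For readers unsure what "Kmeans in its original form" means in the quoted description, here is a minimal plain-Java sketch: parse space-delimited numeric rows, pick K input rows at random as the initial centers, then iterate. It is not the code in MAHOUT-551.patch and does not use any Mahout API. The class name SpaceDelimitedKMeans, its method names, and the fixed seed are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration class; not part of Mahout or the attached patch.
class SpaceDelimitedKMeans {

    // Parse one line of whitespace-separated numbers into a vector.
    static double[] parseLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        double[] v = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            v[i] = Double.parseDouble(tokens[i]);
        }
        return v;
    }

    // Pick k different input rows as the initial centers (no Canopy, no t1/t2).
    static List<double[]> randomCenters(List<double[]> points, int k, long seed) {
        if (k < 1 || k > points.size()) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        List<double[]> shuffled = new ArrayList<>(points);
        Collections.shuffle(shuffled, new Random(seed));
        List<double[]> centers = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            centers.add(shuffled.get(i).clone());
        }
        return centers;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the center closest to p.
    static int nearest(double[] p, List<double[]> centers) {
        int best = 0;
        for (int c = 1; c < centers.size(); c++) {
            if (squaredDistance(p, centers.get(c)) < squaredDistance(p, centers.get(best))) {
                best = c;
            }
        }
        return best;
    }

    // Assign each point to its nearest center, recompute the means, and repeat
    // until no center moves or maxIterations is reached.
    static List<double[]> cluster(List<double[]> points, int k, int maxIterations, long seed) {
        List<double[]> centers = randomCenters(points, k, seed);
        int dim = points.get(0).length;
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] p : points) {
                int c = nearest(p, centers);
                counts[c]++;
                for (int d = 0; d < dim; d++) {
                    sums[c][d] += p[d];
                }
            }
            boolean moved = false;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                for (int d = 0; d < dim; d++) {
                    double mean = sums[c][d] / counts[c];
                    if (mean != centers.get(c)[d]) {
                        centers.get(c)[d] = mean;
                        moved = true;
                    }
                }
            }
            if (!moved) {
                break;
            }
        }
        return centers;
    }
}
```

Calling SpaceDelimitedKMeans.cluster(points, k, maxIterations, seed) on rows parsed with parseLine returns the K final centers. The only tuning input is K, which is the contrast the description draws with the Canopy-based example and its t1 and t2 thresholds.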