I'll fish for one more hint. I'm using the MAHOUT-126 code to turn text into data via TF-IDF. Its output is not in the same format as your example data. Does this mean I need a different InputDriver? Is there one lying about for the format written by that DocumentVector class?
On Fri, May 29, 2009 at 10:29 AM, Jeff Eastman <[email protected]> wrote:

> Benson Margulies wrote:
>
>> OK, I've got some inputs, I want to run k-means, how do I feed the beast?
>>
> Make sure you can run the Synthetic Control example to get everything wired
> together correctly: JDK, Hadoop, Mahout. See
> http://cwiki.apache.org/MAHOUT/syntheticcontroldata.html. Then write an
> input job to convert your data similar to
> /Mahout/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/canopy/InputDriver.java
> and make a new job like
> /Mahout/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java.
> You will have a small adventure and then be operational.
>
> Have fun,
> Jeff
