What do you see when you dump out the values of the input vectors? Is there actually content going in?
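The reason I ask: the top of the trace is ArrayList.get failing with "Index: 0, Size: 0" inside RandomSeedGenerator.buildRandom, which suggests a list that was expected to hold input vectors was empty. Here is a minimal, self-contained sketch of that failure mode. It is not Mahout's code; the class EmptyInputCheck and the method pickSeeds are hypothetical names I made up for illustration, and sampling with replacement is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class EmptyInputCheck {

    // Hypothetical stand-in for seed selection: picks k items at random
    // (with replacement) from whatever vectors were read from the input.
    static <T> List<T> pickSeeds(List<T> vectors, int k, Random rng) {
        if (vectors.isEmpty()) {
            // Fail with a descriptive message instead of a bare index error.
            throw new IllegalStateException(
                "No input vectors were read; check the input path and its contents");
        }
        List<T> seeds = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            seeds.add(vectors.get(rng.nextInt(vectors.size())));
        }
        return seeds;
    }

    public static void main(String[] args) {
        List<double[]> empty = new ArrayList<>();

        // Unguarded access to an empty list throws IndexOutOfBoundsException,
        // the same exception type seen at the top of the reported trace.
        try {
            empty.get(0);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Unguarded access failed: " + e.getMessage());
        }

        // The guarded version reports the likely cause instead.
        try {
            pickSeeds(empty, 15, new Random(42));
        } catch (IllegalStateException e) {
            System.out.println("Guarded access failed: " + e.getMessage());
        }
    }
}
```

If the input really is empty, the thing to look at is the vector-generation step rather than kmeans itself.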
Also, what version of Mahout are you running?

On May 4, 2011, at 9:25 AM, Dipti Mathur wrote:

> Hi All,
>
> I get the following error while running the kmeans algorithm with the
> reuters dataset. The topic has been discussed in many forums (
> http://search.lucidimagination.com/search/document/3f6b06ee9d45b4fe/tranforming_data_for_k_means_analysis)
> but no one has mentioned a solution. Anyone faced this and has a solution?
>
> dipti@dipti-laptop:~$ mahout kmeans -i vect-output/tf-vectors/part-r-00000
> -k 15 --output kmeans-output --clusters kmeans-output/clusters --maxIter 200
> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop-0.20.2/
> HADOOP_CONF_DIR=/usr/lib/hadoop-0.20.2/conf
> 11/05/04 18:41:59 INFO common.AbstractJob: Command line arguments:
> {--clusters=kmeans-output/clusters, --convergenceDelta=0.5,
> --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure,
> --endPhase=2147483647, --input=vect-output/tf-vectors/part-r-00000,
> --maxIter=200, --method=mapreduce, --numClusters=15, --output=kmeans-output,
> --startPhase=0, --tempDir=temp}
> 11/05/04 18:41:59 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> 11/05/04 18:41:59 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
> 11/05/04 18:41:59 INFO compress.CodecPool: Got brand-new compressor
> Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>     at java.util.ArrayList.rangeCheck(ArrayList.java:571)
>     at java.util.ArrayList.get(ArrayList.java:349)
>     at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:107)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:96)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:54)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:616)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:616)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> Regards,
> Dipti Mathur

--------------------------
Grant Ingersoll
Lucene Revolution -- Lucene and Solr User Conference
May 25-26 in San Francisco
www.lucenerevolution.org
