I suspect you are seeing this error because ClusterDumper doesn't work against hadoop/hdfs. You will need to copy the directories down to your local disk and run it from there using java.
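Roughly, that would look like the sketch below. It is untested against your setup and rests on several assumptions: the local directory, the classpath (MAHOUT_CP) and the local output file name are placeholders I made up, and you will need the Mahout utils jar plus its dependencies on the classpath. The -s and -o options are the ones from your own command. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only: copy the k-means output from HDFS to local disk, then run
# ClusterDumper with plain java instead of "hadoop jar".
# LOCAL_DIR and MAHOUT_CP are hypothetical placeholders; adjust for your install.
HDFS_OUT="output"
LOCAL_DIR="/tmp/kmeans-output"
MAHOUT_CP="mahout-utils-0.3-SNAPSHOT.jar:lib/*"   # assumption: dependency jars live in lib/

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

dump_clusters() {
  run mkdir -p "$LOCAL_DIR"
  # Pull the final iteration's clusters and the labeled points down from HDFS.
  run hadoop fs -get "$HDFS_OUT/clusters-6" "$LOCAL_DIR/clusters-6"
  run hadoop fs -get "$HDFS_OUT/points" "$LOCAL_DIR/points"
  # Run the dumper against the local copy.
  run java -cp "$MAHOUT_CP" org.apache.mahout.utils.clustering.ClusterDumper \
    -s "$LOCAL_DIR/clusters-6" -o "$LOCAL_DIR/clusters-6.txt"
}

dump_clusters
```

If the NullPointerException still shows up when running against the local copy, then the cause is something other than HDFS and the stack trace from the local run would be the next thing to look at.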

On Thu, Jan 7, 2010 at 4:30 PM, diveman <[email protected]> wrote:
>
> Thanks!
>
> and when I try to run the dumper it gives me the following:
> hadoop jar mahout-utils-0.3-SNAPSHOT.jar org.apache.mahout.utils.clustering.ClusterDumper -s output/clusters-6/ -o /data/output
> Exception in thread "main" java.lang.NullPointerException
>        at org.apache.mahout.utils.clustering.ClusterDumper.printClusters(ClusterDumper.java:112)
>        at org.apache.mahout.utils.clustering.ClusterDumper.main(ClusterDumper.java:253)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
>
> Drew Farris wrote:
>>
>> Each iteration of k-means clustering will produce a cluster-X file. In
>> this case, there were 7 iterations prior to the clusters converging.
>> The final cluster data can be found in clusters-6.
>>
>> There is a utility in mahout-util,
>> o.a.m.utils.clustering.ClusterDumper that can be used to dump the data
>> from clusters-6 and points into a json-like format. You could use that
>> code as a starting point for discovering how to get at the data you're
>> interested in.
>>
>> On Thu, Jan 7, 2010 at 3:23 PM, diveman <[email protected]> wrote:
>>>
>>> I'm new to Mahout. Installed 0.3 in a 4-node cluster and run mahout kmean
>>> example with syntheticcontrol data. I got outputs like the following:
>>>
>>> output/canopies
>>> output/clusters-0
>>> output/clusters-1
>>> output/clusters-2
>>> output/clusters-3
>>> output/clusters-4
>>> output/clusters-5
>>> output/clusters-6
>>> output/data
>>> output/points
>>>
>>> by which I understand in the points folder, each point is labeled with a
>>> cluster id. I'm wondering where I can find the cluster center, radius
>>> info, etc. And what's in clusters-0~6? BTW, the sample data has 6 groups
>>> and the result has 7 clusters, any clue?
>>>
>>> Thanks!
>>> --
>>> View this message in context:
>>> http://old.nabble.com/Kmeans-clustering-tp27066415p27066415.html
>>> Sent from the Mahout User List mailing list archive at Nabble.com.
>>>
>>
>
> --
> View this message in context:
> http://old.nabble.com/Kmeans-clustering-tp27066415p27067350.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
>
