Re: Kmeans clustering

diveman Mon, 11 Jan 2010 10:45:51 -0800

I copied the directories down to local disk and found that ClusterDumper
needs the files both in HDFS and on local disk, so I created two folders
with exactly the same name, one in HDFS and one locally. Now I get the
following:


hadoop jar mahout-utils-0.3-SNAPSHOT.jar \
    org.apache.mahout.utils.clustering.ClusterDumper \
    -s /data/clusters-6 -o /data/output
Input Path: /data/clusters-6/part-00000
Exception in thread "main" java.lang.NullPointerException
        at org.apache.mahout.utils.vectors.VectorHelper.vectorToString(VectorHelper.java:60)
        at org.apache.mahout.utils.clustering.ClusterDumper.printClusters(ClusterDumper.java:124)
        at org.apache.mahout.utils.clustering.ClusterDumper.main(ClusterDumper.java:253)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Any thoughts?
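Drew's explanation quoted below is that each k-means iteration writes a
clusters-N directory and that the final clusters are in the highest-numbered
one (clusters-6 here). As a minimal sketch, assuming a local copy of the
output directory and using only the JDK (the class and method names are
hypothetical, not part of Mahout), the last iteration can be picked by
comparing N numerically:

```java
import java.io.File;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Mahout): given a local copy of the k-means
 * output directory, report the clusters-N subdirectory with the largest N.
 */
public class LatestClusters {
    private static final Pattern CLUSTERS_DIR = Pattern.compile("clusters-(\\d+)");

    /** Returns N for a name like "clusters-6", or -1 if the name does not match. */
    static int iterationOf(String name) {
        Matcher m = CLUSTERS_DIR.matcher(name);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Picks the highest-numbered clusters-N name, or null if there is none. */
    static String latest(String[] names) {
        String best = null;
        int bestN = -1;
        for (String name : names) {
            int n = iterationOf(name);
            if (n > bestN) {
                bestN = n;
                best = name;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed usage: java LatestClusters /path/to/local/output
        File outputDir = new File(args.length > 0 ? args[0] : "output");
        String[] names = outputDir.list();
        if (names == null) {
            System.err.println("Not a readable directory: " + outputDir);
            return;
        }
        String last = latest(names);
        System.out.println(last == null ? "No clusters-N directory found"
                                        : new File(outputDir, last).getPath());
    }
}
```

Comparing N as a number rather than as a string matters once there are ten
or more iterations, because "clusters-10" sorts before "clusters-9" as text.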




Drew Farris wrote:
> 
> I suspect you are seeing this error because ClusterDumper doesn't work
> from Hadoop/HDFS; you will need to copy the directories down to your
> local disk and run it from there using java.
> 
> On Thu, Jan 7, 2010 at 4:30 PM, diveman <shilian...@gmail.com> wrote:
>>
>> Thanks!
>>
>> And when I try to run the dumper, it gives me the following:
>> hadoop jar mahout-utils-0.3-SNAPSHOT.jar \
>>     org.apache.mahout.utils.clustering.ClusterDumper \
>>     -s output/clusters-6/ -o /data/output
>> Exception in thread "main" java.lang.NullPointerException
>>        at org.apache.mahout.utils.clustering.ClusterDumper.printClusters(ClusterDumper.java:112)
>>        at org.apache.mahout.utils.clustering.ClusterDumper.main(ClusterDumper.java:253)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>
>>
>>
>> Drew Farris wrote:
>>>
>>> Each iteration of k-means clustering produces a clusters-X directory.
>>> In this case, there were 7 iterations (clusters-0 through clusters-6)
>>> before the clusters converged. The final cluster data can be found in
>>> clusters-6.
>>>
>>> There is a utility in mahout-utils,
>>> o.a.m.utils.clustering.ClusterDumper, that can be used to dump the data
>>> from clusters-6 and points into a JSON-like format. You could use that
>>> code as a starting point for discovering how to get at the data you're
>>> interested in.
>>>
>>> On Thu, Jan 7, 2010 at 3:23 PM, diveman <shilian...@gmail.com> wrote:
>>>>
>>>> I'm new to Mahout. I installed 0.3 on a 4-node cluster and ran the
>>>> Mahout k-means example with the synthetic control data. I got output
>>>> like the following:
>>>>
>>>> output/canopies
>>>> output/clusters-0
>>>> output/clusters-1
>>>> output/clusters-2
>>>> output/clusters-3
>>>> output/clusters-4
>>>> output/clusters-5
>>>> output/clusters-6
>>>> output/data
>>>> output/points
>>>>
>>>> From this I understand that in the points folder each point is labeled
>>>> with a cluster id. Where can I find the cluster centers, radius info,
>>>> etc.? And what's in clusters-0 through clusters-6? BTW, the sample data
>>>> has 6 groups but the result has 7 clusters; any clue?
>>>>
>>>> Thanks!
>>>> --
>>>> View this message in context:
>>>> http://old.nabble.com/Kmeans-clustering-tp27066415p27066415.html
>>>> Sent from the Mahout User List mailing list archive at Nabble.com.
>>>>
>>>>
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Kmeans-clustering-tp27066415p27067350.html
>> Sent from the Mahout User List mailing list archive at Nabble.com.
>>
>>
> 
> 

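The original question asked where to find each cluster's center and radius.
Drew's answer is that they are stored in the final clusters-N directory,
which ClusterDumper can print. Purely to illustrate what those quantities
mean, here is a sketch that recomputes them from points already labeled with
a cluster id. It assumes the points are in memory as double[] arrays (this is
not Mahout's on-disk format), uses the per-component mean as the center, and
uses the largest Euclidean distance from the center as the radius; Mahout's
own radius may be defined differently. Class and method names are
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical illustration (not Mahout code): recompute each cluster's center
 * (mean) and a simple radius (max distance from the center) from labeled points.
 */
public class ClusterStats {

    /** Component-wise mean of the given points; assumes a non-empty list. */
    static double[] centroid(List<double[]> points) {
        int dim = points.get(0).length;
        double[] c = new double[dim];
        for (double[] p : points) {
            for (int i = 0; i < dim; i++) {
                c[i] += p[i];
            }
        }
        for (int i = 0; i < dim; i++) {
            c[i] /= points.size();
        }
        return c;
    }

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Largest distance from the center to any member point (an assumed definition). */
    static double radius(List<double[]> points, double[] center) {
        double max = 0.0;
        for (double[] p : points) {
            max = Math.max(max, distance(p, center));
        }
        return max;
    }

    /** Groups points by cluster id; ids[i] is the label of points[i]. */
    static Map<Integer, List<double[]>> groupById(int[] ids, double[][] points) {
        Map<Integer, List<double[]>> groups = new TreeMap<>();
        for (int i = 0; i < ids.length; i++) {
            groups.computeIfAbsent(ids[i], k -> new ArrayList<>()).add(points[i]);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up 2-D points and labels, purely for illustration.
        int[] ids = {0, 0, 1, 1};
        double[][] pts = {{0, 0}, {2, 0}, {10, 10}, {10, 12}};
        for (Map.Entry<Integer, List<double[]>> e : groupById(ids, pts).entrySet()) {
            double[] c = centroid(e.getValue());
            System.out.printf("cluster %d: center=(%.1f, %.1f) radius=%.1f%n",
                    e.getKey(), c[0], c[1], radius(e.getValue(), c));
        }
    }
}
```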
-- 
View this message in context: 
http://old.nabble.com/Kmeans-clustering-tp27066415p27115555.html
Sent from the Mahout User List mailing list archive at Nabble.com.
