Hi Shashi,

That does sound like a JDK version problem. Most jobs require an initial step to convert the input into the Vector format the clustering code expects. The /Mahout/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/canopy/Job.java calls an InputDriver that does that for the syntheticcontrol examples. You would need to do something similar to massage your data into Mahout Vector format before you can run the clustering job of your choosing.
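In case it helps, here is a minimal sketch of that massaging step, assuming the input is lines of comma-separated numbers. It is only illustrative: the class name (TextToVectors), the assumed input layout, and the package names (org.apache.mahout.math.DenseVector / VectorWritable, which match current trunk) may not match the Mahout release you are running, so adjust the imports and parsing to your data.

// Illustrative only: converts lines of comma-separated numbers into
// Mahout Vectors stored in a SequenceFile of (Text, VectorWritable),
// roughly what the syntheticcontrol InputDriver does for its input.
import java.io.BufferedReader;
import java.io.FileReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.VectorWritable;

public class TextToVectors {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // args[0]: local input file, args[1]: output path in HDFS
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path(args[1]), Text.class, VectorWritable.class);
    BufferedReader reader = new BufferedReader(new FileReader(args[0]));
    String line;
    int key = 0;
    while ((line = reader.readLine()) != null) {
      String[] fields = line.split(",");  // assumed input layout
      double[] values = new double[fields.length];
      for (int i = 0; i < fields.length; i++) {
        values[i] = Double.parseDouble(fields[i]);
      }
      writer.append(new Text(String.valueOf(key++)),
                    new VectorWritable(new DenseVector(values)));
    }
    reader.close();
    writer.close();
  }
}

You would then point the clustering job of your choosing at the directory containing that SequenceFile.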

Jeff

Shashikant Kore wrote:
Thanks for the response, Grant.

Upgrading Hadoop didn't really help. Now I am not able to launch even
the Namenode, JobTracker, etc., as I am getting the same error. I suspect
a version conflict somewhere, as there are two JDK versions on the box. I
will try it out on another box which has only JDK 6.

From the documentation of clustering, it is not clear how to get the
vectors from text (or HTML) files. I suppose you can get TF-IDF
values by indexing this content with Lucene. How does one proceed from
there? Any pointers would be appreciated.

--shashi

On Tue, Apr 28, 2009 at 8:40 PM, Grant Ingersoll <[email protected]> wrote:
On Apr 28, 2009, at 6:01 AM, Shashikant Kore wrote:

Hi,

Initially, I got the version number error. I found that the JDK version
was 1.5 and have upgraded it to 1.6. Now JAVA_HOME points to
/usr/java/jdk1.6.0_13/ and I am using Hadoop 0.18.3.

1. What could possibly be wrong? I checked the Hadoop script, and the value of
JAVA_HOME is correct (i.e., 1.6). Is it possible that somehow it is still
using 1.5?
I'm going to guess the issue is that you need Hadoop 0.19.
2. The last step of the clustering tutorial says "Get the data out of
HDFS and have a look." Can you please point me to the Hadoop
documentation on how to read this data?
See http://hadoop.apache.org/core/docs/current/quickstart.html towards the
bottom.  It shows some of the commands you can use with HDFS: -get, -cat,
etc.
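For example, assuming the job wrote its results to an "output" directory in HDFS (adjust the paths to whatever your clustering job actually produced):

  bin/hadoop fs -ls output
  bin/hadoop fs -get output local-output
  bin/hadoop fs -cat output/part-00000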


-Grant



