Re: Error with KMeans example in trunk (793689)

Paul Ingles Tue, 14 Jul 2009 06:24:48 -0700

I've also tried r787776 on Hadoop 0.19.1, and I get a NoClassDefFoundError for com/google/gson/reflect/TypeToken. I'm pretty sure this is the same error I was seeing when trying 793689 against Hadoop 0.20.0.

I've checked the mahout-*-examples.job file, and its lib directory does contain gson-1.3.jar, which in turn contains TypeToken.class at com/google/gson/reflect, so I'm not sure what's happening.
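For anyone wanting to repeat the packaging check described above, here is a minimal sketch in Python. It assumes the .job file is an ordinary zip archive with its dependency jars under lib/; the function name find_class_in_job is made up for illustration and is not part of Mahout or Hadoop.

```python
import io
import zipfile


def find_class_in_job(job_file, class_entry):
    """Return the nested lib/*.jar names that contain class_entry.

    job_file may be a filesystem path or a binary file-like object.
    Assumption: the job archive is a plain zip with jars under lib/.
    """
    hits = []
    with zipfile.ZipFile(job_file) as job:
        for name in job.namelist():
            # Only look at jars bundled under lib/ inside the job archive.
            if not (name.startswith("lib/") and name.endswith(".jar")):
                continue
            # Read the nested jar into memory and open it as its own archive.
            with zipfile.ZipFile(io.BytesIO(job.read(name))) as nested:
                if class_entry in nested.namelist():
                    hits.append(name)
    return hits
```

Calling find_class_in_job with the path to the built .job file and "com/google/gson/reflect/TypeToken.class" should list the bundled jar(s) holding that class. A hit only confirms the class is packaged; it says nothing about whether the task classpath actually picks it up at runtime.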


On 14 Jul 2009, at 13:23, Paul Ingles wrote:

I noticed it was using 0.20.0 this morning and gave it a go. I think it failed at the clustering phase with a NoClassDef error for the Gson stuff, but I don't remember exactly.
I'm running from an earlier revision against 0.19 at the moment, but will try 0.20 again when it's finished and let you know how it goes.
Thanks again,
Paul

On 14 Jul 2009, at 12:58, Grant Ingersoll wrote:
Try Hadoop 0.20.0, which is what trunk is now on. I will update the docs.
On Jul 13, 2009, at 7:02 PM, Paul Ingles wrote:
Hi,
I've been going over the kmeans stuff the last few days to try and understand how it works, and how I might extend it to work with the data I'm looking to process. It's taken me a while to get a basic understanding of things, and I really appreciate having lists like this around for support.
I need to be able to label the vectors: each vector holds (for a document) a set of similarity scores across a number of attributes. I did some searching around payloads (after coming across the term in some comments) but couldn't see how I add a payload to the Vector. I then stumbled on MAHOUT-65 (https://issues.apache.org/jira/browse/MAHOUT-65), which mentions the addition of the setName method to Vector. I've tried building trunk, and although there were a few test failures for other (seemingly unrelated) examples, I continued and managed to get the mahout-examples jar/job files built to give it a whirl.
When I run the following:
$ hadoop jar examples/target/mahout-examples-0.2-SNAPSHOT.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
I see it run the "Preparing Input", "Running Canopy to get initial clusters", and then finally it starts "Running KMeans". But, shortly after, it breaks with the following trace:
---snip---
Running KMeans
09/07/13 23:49:34 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/canopies Out: output Distance: org.apache.mahout.utils.EuclideanDistanceMeasure
09/07/13 23:49:34 INFO kmeans.KMeansDriver: convergence: 0.5 maxIterations: 10 num Reduce Tasks: 1 Input Vectors: org.apache.mahout.matrix.SparseVector
09/07/13 23:49:34 INFO kmeans.KMeansDriver: Iteration 0
09/07/13 23:49:34 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/07/13 23:49:34 INFO mapred.FileInputFormat: Total input paths to process : 2
09/07/13 23:49:34 INFO mapred.JobClient: Running job: job_200907132019_0040
09/07/13 23:49:35 INFO mapred.JobClient:  map 0% reduce 0%
09/07/13 23:49:42 INFO mapred.JobClient:  map 50% reduce 0%
09/07/13 23:49:43 INFO mapred.JobClient:  map 100% reduce 0%
09/07/13 23:49:49 INFO mapred.JobClient:  map 100% reduce 100%
09/07/13 23:49:50 INFO mapred.JobClient: Job complete: job_200907132019_0040
09/07/13 23:49:50 INFO mapred.JobClient: Counters: 16
09/07/13 23:49:50 INFO mapred.JobClient:   File Systems
09/07/13 23:49:50 INFO mapred.JobClient:     HDFS bytes read=465629
09/07/13 23:49:50 INFO mapred.JobClient:     HDFS bytes written=5631
09/07/13 23:49:50 INFO mapred.JobClient:     Local bytes read=7806
09/07/13 23:49:50 INFO mapred.JobClient:     Local bytes written=15674
09/07/13 23:49:50 INFO mapred.JobClient:   Job Counters
09/07/13 23:49:50 INFO mapred.JobClient:     Launched reduce tasks=1
09/07/13 23:49:50 INFO mapred.JobClient:     Launched map tasks=2
09/07/13 23:49:50 INFO mapred.JobClient:     Data-local map tasks=2
09/07/13 23:49:50 INFO mapred.JobClient:   Map-Reduce Framework
09/07/13 23:49:50 INFO mapred.JobClient:     Reduce input groups=7
09/07/13 23:49:50 INFO mapred.JobClient:     Combine output records=10
09/07/13 23:49:50 INFO mapred.JobClient:     Map input records=600
09/07/13 23:49:50 INFO mapred.JobClient:     Reduce output records=7
09/07/13 23:49:50 INFO mapred.JobClient:     Map output bytes=465600
09/07/13 23:49:50 INFO mapred.JobClient:     Map input bytes=448580
09/07/13 23:49:50 INFO mapred.JobClient:     Combine input records=600
09/07/13 23:49:50 INFO mapred.JobClient:     Map output records=600
09/07/13 23:49:50 INFO mapred.JobClient:     Reduce input records=10
09/07/13 23:49:50 WARN kmeans.KMeansDriver: java.io.IOException: Cannot open filename /user/paul/output/clusters-0/_logs
java.io.IOException: Cannot open filename /user/paul/output/clusters-0/_logs
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1394)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1385)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:338)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:171)
        at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.isConverged(KMeansDriver.java:304)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:241)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.runJob(KMeansDriver.java:194)
        at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:100)
        at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:56)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
---snip---
This is against revision 793689, running on my development Mac Pro (pseudo-distributed single node) with Hadoop 0.19.1.
It's a bit late to be digging through what's going on, but I will try and take a look tomorrow; I'm really excited about giving kmeans a whirl on the document processing I'm playing with. In the meantime, I was wondering whether anyone else had seen the same, or knew a way to accomplish something similar with the released version (or could point me to a past good revision, perhaps?).
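The trace shows the convergence check trying to open clusters-0/_logs as a SequenceFile. One possible reading (a guess, not something confirmed in this thread) is that the output directory is being scanned without skipping bookkeeping entries. Hadoop's own input formats conventionally ignore names that start with "_" or "."; the sketch below illustrates that filtering idea in Python. The helper names and the part-00000 path are hypothetical, and this is not Mahout's code.

```python
import posixpath


def is_hidden(path):
    """True for entries whose final name starts with '_' or '.'."""
    name = posixpath.basename(path.rstrip("/"))
    return name.startswith("_") or name.startswith(".")


def data_files(paths):
    """Keep only the entries that look like real output parts."""
    return [p for p in paths if not is_hidden(p)]


# Illustrative listing: the _logs path is from the trace above,
# the part-00000 path is an assumed example of a reducer output.
listing = [
    "/user/paul/output/clusters-0/_logs",
    "/user/paul/output/clusters-0/part-00000",
]
print(data_files(listing))
```

With a filter like this, only the part file would be handed to a reader and the _logs directory would be skipped.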
Thanks again,
Paul
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
