I've flattened the JOB with all classes in the same JAR and that works successfully.

Steps:

1) svn co http://svn.apache.org/repos/asf/lucene/mahout/trunk mahout-trunk
2) cd mahout-trunk
3) mvn install
4) hadoop jar examples/target/mahout-examples-0.2-SNAPSHOT.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job -libjars examples/target/dependency/gson-1.3.jar
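
The steps above do not show the flattening itself. As a rough sketch only, and not necessarily how it was done here, a job file could be flattened along these lines. It assumes the .job file is an ordinary zip archive with its dependency jars under lib/, as described elsewhere in this thread; the function name flatten_job and the output file name are hypothetical.

```python
import io
import zipfile


def flatten_job(src_path, dst_path):
    """Copy src_path to dst_path, expanding each nested lib/*.jar into top-level entries."""
    seen = set()

    def is_nested_jar(name):
        return name.startswith("lib/") and name.endswith(".jar")

    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:

        def copy(name, data):
            # Skip directory entries and keep only the first copy of a
            # duplicated name (for example META-INF/MANIFEST.MF).
            if name.endswith("/") or name in seen:
                return
            seen.add(name)
            dst.writestr(name, data)

        # First pass: the job's own entries, so they win over library copies.
        for info in src.infolist():
            if not is_nested_jar(info.filename):
                copy(info.filename, src.read(info))

        # Second pass: the contents of every nested dependency jar.
        for info in src.infolist():
            if is_nested_jar(info.filename):
                with zipfile.ZipFile(io.BytesIO(src.read(info))) as nested:
                    for inner in nested.infolist():
                        copy(inner.filename, nested.read(inner))

    return sorted(seen)
```

It could be called as flatten_job("examples/target/mahout-examples-0.2-SNAPSHOT.job", "mahout-examples-flat.job"), and the resulting file passed to hadoop jar in step 4. Duplicate entries are resolved by keeping the first copy, which is a simplification; signed dependency jars would probably need their signature files removed as well.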

As for setting up Hadoop in pseudo-distributed mode, I followed the guide on the site, but I'll check it again in case it has been updated recently.

Thanks again for all the help,
Paul

On 17 Jul 2009, at 13:39, Grant Ingersoll wrote:

Have you tried flattening the JOB so all the classes are packed in a single JAR? Also, can you give the full list of steps you are doing? I am able to run this in pseudo-distributed mode without getting this error. Also, have you checked the Hadoop logs ($HADOOP/logs, I believe)?

I also notice that the Hadoop quick start has different configuration settings now due to 0.20.

-Grant

On Jul 17, 2009, at 5:00 AM, Paul Ingles wrote:

I've tried re-running specifically adding the gson jar as follows:

$ hadoop jar examples/target/mahout-examples-0.2-SNAPSHOT.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job -libjars examples/target/dependency/gson-1.3.jar

Unfortunately, I get the same errors as before:

09/07/17 09:53:50 INFO kmeans.KMeansDriver: Clustering
09/07/17 09:53:50 INFO kmeans.KMeansDriver: Running Clustering
09/07/17 09:53:50 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-4 Out: output/points Distance: org.apache.mahout.utils.EuclideanDistanceMeasure
09/07/17 09:53:50 INFO kmeans.KMeansDriver: convergence: 0.5 Input Vectors: org.apache.mahout.matrix.SparseVector
09/07/17 09:53:50 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/07/17 09:53:50 INFO mapred.FileInputFormat: Total input paths to process : 2
09/07/17 09:53:51 INFO mapred.JobClient: Running job: job_200907161209_0018
09/07/17 09:53:52 INFO mapred.JobClient:  map 0% reduce 0%
09/07/17 09:54:06 INFO mapred.JobClient: Task Id : attempt_200907161209_0018_m_000000_0, Status : FAILED
java.lang.NoClassDefFoundError: com/google/gson/reflect/TypeToken
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:703)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
        at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:330)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:254)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:402)
        at org.apache.mahout.matrix.AbstractVector.asFormatString(AbstractVector.java:374)
        at org.apache.mahout.clustering.kmeans.Cluster.outputPointWithClusterInfo(Cluster.java:198)
        at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:39)
        at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:32)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: com.google.gson.reflect.TypeToken
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:330)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:254)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:402)
        ... 20 more

This is running pseudo-distributed on my laptop.

On 16 Jul 2009, at 18:57, Adil Aijaz wrote:

My basic understanding of the class loader stuff is:

1. Any jars that need to be available to map/reduce jobs should be specified through -libjars (e.g. hadoop --config ... -libjars gson.jar jar <path to my jar> ...)
2. Any jars that need to be available to the main class should be specified through lib/*.jar (that is, in the mahout-examples-0.2-SNAPSHOT/lib/*.jar)

unless, of course, as Jeff is saying, one ends up flattening the lib/*.jar into top-level classes.

Adil

Jeff Eastman wrote:
Isn't this the same old problem: our Job jar file has a lib directory with the Mahout code in it, and the way Hadoop loads the jar it sometimes cannot resolve classes in it? IIRC, one needs to smash the job jar file into a single jar in order for Dirichlet (at least, and any other examples which contain non-core classes) to run. I confess I do not understand the class loader stuff enough to be more specific.

I have duplicated the CNF exception by defining and using a user-defined distance measure in the Job file and running KMeans with it, so it is not specific to Dirichlet.


Grant Ingersoll wrote:
Hmm, I'm not seeing the ClassNotFound problem but am getting fetch failures. Will look later.

-Grant

On Jul 16, 2009, at 11:32 AM, Paul Ingles wrote:

I've just tried setting up a brand new machine (Ubuntu 8.04 virtual machine) with Hadoop 0.20.0 and running the compiled jobs against it. I get the same problems as before... still scratching my head :(

On 16 Jul 2009, at 12:15, Paul Ingles wrote:

Sure,

I'm running (currently) on my MacBook Air, running OSX Leopard.

JDK: java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03-211)
Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02-83, mixed mode)

Hadoop is: 0.20.0, r763504

I'm compiling mahout from trunk (r794023) as follows (in the root of the project directory):

% mvn install
% hadoop jar examples/target/mahout-examples-0.2-SNAPSHOT.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

The only difference (for dirichlet) is the different class to run.

Thanks,
Paul

On 16 Jul 2009, at 11:33, Grant Ingersoll wrote:

Can you share how you built and how you are running, as in command line options, etc.? Also, JDK version, Hadoop version, etc.

On Jul 16, 2009, at 6:21 AM, Paul Ingles wrote:

Hi,

Thank you for the suggestion. Unfortunately, when I tried that I received the same error. I've also tried copying the gson jar directly into $HADOOP_HOME/lib (when I was running a single node pseudo-distributed) and get the same error still.

Weirdly enough, if I try and run the Dirichlet example on the cluster I receive another ClassNotFoundException:

09/07/16 10:27:54 INFO mapred.JobClient: Task Id : attempt_200907161026_0002_m_000001_0, Status : FAILED
java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:352)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        ... 5 more
Caused by: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
        ... 10 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        ... 13 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
        at org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:95)
        at org.apache.mahout.clustering.dirichlet.DirichletMapper.configure(DirichletMapper.java:60)
        ... 18 more
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at org.apache.mahout.clustering.dirichlet.DirichletDriver.createState(DirichletDriver.java:121)
        at org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:71)
        ... 19 more


Hoping this sparks some other suggestions :)

Thanks,
Paul


On Wed Jul 15 22:08:09 UTC 2009, Adil Aijaz <a...@yahoo-inc.com > wrote:
try hadoop --config <hod-cluster-dir> jar -libjars <path to gson.jar>
<your job/jar file> <your class> <arguments>

Adil

Paul Ingles wrote:
Hi,

Apologies for the cross-posting (I also sent this to the Hadoop user list) but I'm still getting errors if I try and run the KMeans examples on a cluster, whether that be my single-node Mac Pro, or our cluster. I've attached the stack trace at the bottom of the email.

The gson jar is definitely included in the packaged .job, and is also in the temporary directory when the task tracker picks up the work. The gson jar also includes TypeToken.class in the expected path.
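
A check like the one described above can be scripted. The following is a minimal sketch, assuming the .job file is a plain zip archive whose dependency jars are stored as nested archives; the helper name find_class is hypothetical. It reports whether a class file sits at the top level of the archive, inside a nested jar, or both.

```python
import io
import zipfile


def find_class(archive_path, class_name):
    """Return the places inside a job/jar archive where class_name's .class file is found."""
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            if name == entry:
                # The class is directly loadable from the archive root.
                hits.append("<top level>")
            elif name.endswith(".jar"):
                # Look one level down, inside a nested dependency jar.
                with zipfile.ZipFile(io.BytesIO(archive.read(name))) as nested:
                    if entry in nested.namelist():
                        hits.append(name)
    return hits
```

For the case in this message, find_class("examples/target/mahout-examples-0.2-SNAPSHOT.job", "com.google.gson.reflect.TypeToken") would be expected to name the nested gson jar rather than the top level. That only shows where the class is packaged; it does not show which class loader the task will use at run time.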

Again, really appreciate people's help in getting this going!

----snip----
09/07/15 17:06:38 INFO mapred.JobClient: Task Id : attempt_200907151617_0010_m_000000_0, Status : FAILED
java.lang.NoClassDefFoundError: com/google/gson/reflect/TypeToken
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:703)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
        at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:330)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:254)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:402)
        at org.apache.mahout.matrix.AbstractVector.asFormatString(AbstractVector.java:374)
        at org.apache.mahout.clustering.kmeans.Cluster.outputPointWithClusterInfo(Cluster.java:198)
        at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:39)
        at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:32)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: com.google.gson.reflect.TypeToken
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:330)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:254)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:402)
        ... 20 more
----snip----

Incidentally, as part of this work I've also implemented a Pearson distance measure. If people think it would be useful to fold it in, I'd be happy to put together an SVN patch with the implementation and tests.

Thanks,
Paul

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search