Flattening the JOB just means putting all the Java class files in a single JAR. A JOB file is made up of other JAR files, so you would unjar the JOB file, unjar all the JAR files inside it, and then JAR everything back up into one big happy JAR file.
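The unjar/rejar steps above can be sketched in Java using only java.util.zip. This is a minimal sketch, not a Mahout or Hadoop tool: it assumes the JOB keeps its dependencies as lib/*.jar (the layout discussed in this thread), keeps the first copy of any duplicate entry name, and drops dependency signature files (META-INF/*.SF, *.RSA, *.DSA), which would no longer match the merged JAR. The class name JobFlattener is hypothetical, and it uses Java 9+ stream methods, so it is meant to be run as a standalone tool on a current JDK rather than on the 1.6 JDK mentioned later in the thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: flattens a JOB file (dependencies under lib/*.jar) into one flat JAR. */
class JobFlattener {

    /** Writes a flattened copy of {@code job} to {@code out}; returns the entry names written. */
    public static Set<String> flatten(Path job, Path out) throws IOException {
        Set<String> written = new LinkedHashSet<>();
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(job));
             ZipOutputStream zout = new ZipOutputStream(Files.newOutputStream(out))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                String name = entry.getName();
                if (isNestedJar(name)) {
                    // "Unjar" the nested JAR in memory and copy its entries to the top level.
                    byte[] nested = in.readAllBytes();
                    try (ZipInputStream nin = new ZipInputStream(new ByteArrayInputStream(nested))) {
                        ZipEntry inner;
                        while ((inner = nin.getNextEntry()) != null) {
                            copyEntry(inner.getName(), nin, zout, written);
                        }
                    }
                } else {
                    copyEntry(name, in, zout, written);
                }
            }
        }
        return written;
    }

    /** Assumes dependencies are stored as lib/<name>.jar inside the JOB. */
    static boolean isNestedJar(String name) {
        return name.startsWith("lib/") && name.endsWith(".jar");
    }

    /** A dependency's signature files would not match the merged JAR, so they are dropped. */
    static boolean isSignatureFile(String name) {
        String upper = name.toUpperCase(Locale.ROOT);
        return upper.startsWith("META-INF/")
                && (upper.endsWith(".SF") || upper.endsWith(".RSA") || upper.endsWith(".DSA"));
    }

    private static void copyEntry(String name, InputStream data, ZipOutputStream zout,
                                  Set<String> written) throws IOException {
        // Skip signatures and duplicate names; which duplicate survives depends on entry order.
        if (isSignatureFile(name) || !written.add(name)) {
            return;
        }
        zout.putNextEntry(new ZipEntry(name));
        data.transferTo(zout); // copies the bytes of the current entry only
        zout.closeEntry();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JobFlattener <input.job> <output.jar>");
            System.exit(1);
        }
        Set<String> names = flatten(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Wrote " + names.size() + " entries to " + args[1]);
    }
}
```

Usage would be along the lines of `java JobFlattener examples/target/mahout-examples-0.2-SNAPSHOT.job mahout-examples-flat.jar` (the output name is made up), then passing the flat JAR to `hadoop jar` in place of the .job file.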

On Aug 3, 2009, at 2:33 PM, tigertail wrote:


Got to ask this again.

I installed and started Hadoop 0.20.0 properly on a two-box cluster. Then I followed the steps Paul gave to install Mahout on the master node. After that I can run canopy with no problem, but I cannot run kmeans. I always get the error java.lang.NoClassDefFoundError:
com/google/gson/reflect/TypeToken.

Can Paul or Grant help me out with this, please? Thanks!


tigertail wrote:

Hi Paul,

Sorry for the naive question, but can you show me how to "flatten the JOB with all classes in the same JAR"? And has this error been fixed in the latest SVN version?


Paul Ingles-4 wrote:

I've flattened the JOB with all classes in the same JAR and that works
successfully.

Steps:

1) svn co http://svn.apache.org/repos/asf/lucene/mahout/trunk mahout-trunk
2) cd mahout-trunk
3) mvn install
4) hadoop jar examples/target/mahout-examples-0.2-SNAPSHOT.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job -libjars examples/target/dependency/gson-1.3.jar

As for setting up Hadoop in pseudo-distributed mode, I followed the guide on the site, but I'll check it again in case it's been updated recently.

Thanks again for all the help,
Paul

On 17 Jul 2009, at 13:39, Grant Ingersoll wrote:

Have you tried flattening the JOB so all the classes are packed in a single JAR? Also, can you give the full list of steps you are following? I am able to run this in pseudo-distributed mode without getting this error. Also, have you checked the Hadoop logs ($HADOOP/logs, I believe)?

I also notice that the Hadoop quick start has different configuration settings now, due to 0.20.

-Grant

On Jul 17, 2009, at 5:00 AM, Paul Ingles wrote:

I've tried re-running specifically adding the gson jar as follows:

$ hadoop jar examples/target/mahout-examples-0.2-SNAPSHOT.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job -libjars examples/target/dependency/gson-1.3.jar

Unfortunately, I get the same errors as before:

09/07/17 09:53:50 INFO kmeans.KMeansDriver: Clustering
09/07/17 09:53:50 INFO kmeans.KMeansDriver: Running Clustering
09/07/17 09:53:50 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-4 Out: output/points Distance: org.apache.mahout.utils.EuclideanDistanceMeasure
09/07/17 09:53:50 INFO kmeans.KMeansDriver: convergence: 0.5 Input Vectors: org.apache.mahout.matrix.SparseVector
09/07/17 09:53:50 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/07/17 09:53:50 INFO mapred.FileInputFormat: Total input paths to process : 2
09/07/17 09:53:51 INFO mapred.JobClient: Running job: job_200907161209_0018
09/07/17 09:53:52 INFO mapred.JobClient:  map 0% reduce 0%
09/07/17 09:54:06 INFO mapred.JobClient: Task Id : attempt_200907161209_0018_m_000000_0, Status : FAILED
java.lang.NoClassDefFoundError: com/google/gson/reflect/TypeToken
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:703)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
        at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:330)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:254)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:402)
        at org.apache.mahout.matrix.AbstractVector.asFormatString(AbstractVector.java:374)
        at org.apache.mahout.clustering.kmeans.Cluster.outputPointWithClusterInfo(Cluster.java:198)
        at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:39)
        at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:32)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: com.google.gson.reflect.TypeToken
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:330)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:254)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:402)
        ... 20 more

This is running pseudo-distributed on my laptop.

On 16 Jul 2009, at 18:57, Adil Aijaz wrote:

My basic understanding of the class loader stuff is:

1. Any jars that need to be available to map/reduce jobs should be specified through -libjars (e.g. hadoop --config ... -libjars gson.jar jar <path to my jar> ...)
2. Any jars that need to be available to the main class should be packaged under lib/ (that is, in mahout-examples-0.2-SNAPSHOT/lib/*.jar)

Unless, of course, as Jeff says, one ends up flattening lib/*.jar into top-level classes.

Adil

Jeff Eastman wrote:
Isn't this the same old problem: our Job jar file has a lib directory with the Mahout code in it, and because of the way Hadoop loads the jar, it sometimes cannot resolve classes in it? IIRC, one needs to smash the job jar file into a single jar for Dirichlet to work (at least, and for any other examples that contain non-core classes). I confess I do not understand the class loader stuff well enough to be more specific.

I have reproduced the CNF exception by defining a custom distance measure in the Job file and running KMeans with it, so it is not specific to Dirichlet.

Grant Ingersoll wrote:
Hmm, I'm not seeing the ClassNotFound problem, but I am getting fetch failures. Will look later.

-Grant

On Jul 16, 2009, at 11:32 AM, Paul Ingles wrote:

I've just tried setting up a brand new machine (an Ubuntu 8.04 virtual machine) with Hadoop 0.20.0 and running the compiled jobs against it. I get the same problems as before... still scratching my head :(

On 16 Jul 2009, at 12:15, Paul Ingles wrote:

Sure,

I'm currently running on my MacBook Air with OS X Leopard.

JDK: java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03-211)
Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02-83, mixed mode)

Hadoop is: 0.20.0, r763504

I'm compiling mahout from trunk (r794023) as follows (in the
root of the project directory):

% mvn install
% hadoop jar examples/target/mahout-examples-0.2-SNAPSHOT.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

The only difference for Dirichlet is the class to run.

Thanks,
Paul

On 16 Jul 2009, at 11:33, Grant Ingersoll wrote:

Can you share how you built it and how you are running it (command-line options, etc.)? Also, your JDK version, Hadoop version, etc.

On Jul 16, 2009, at 6:21 AM, Paul Ingles wrote:

Hi,

Thank you for the suggestion. Unfortunately, when I tried that I received the same error. I've also tried copying the gson jar directly into $HADOOP_HOME/lib (when I was running a single-node pseudo-distributed setup) and I still get the same error.

Weirdly enough, if I try to run the Dirichlet example on the cluster, I get a different ClassNotFoundException:

09/07/16 10:27:54 INFO mapred.JobClient: Task Id : attempt_200907161026_0002_m_000001_0, Status : FAILED
java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:352)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        ... 5 more
Caused by: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
        ... 10 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        ... 13 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
        at org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:95)
        at org.apache.mahout.clustering.dirichlet.DirichletMapper.configure(DirichletMapper.java:60)
        ... 18 more
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at org.apache.mahout.clustering.dirichlet.DirichletDriver.createState(DirichletDriver.java:121)
        at org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:71)
        ... 19 more


Hoping this sparks some other suggestions :)

Thanks,
Paul


On Wed Jul 15 22:08:09 UTC 2009, Adil Aijaz <a...@yahoo-inc.com> wrote:
Try: hadoop --config <hod-cluster-dir> jar -libjars <path to gson.jar> <your job/jar file> <your class> <arguments>

Adil

Paul Ingles wrote:
Hi,

Apologies for the cross-posting (I also sent this to the Hadoop user list), but I'm still getting errors if I try to run the KMeans examples on a cluster, whether that's my single-node Mac Pro or our cluster. I've included the stack trace at the bottom of this email.

The gson jar is definitely included in the packaged .job, and is also in the temporary directory when the task tracker picks up the work. The gson jar also includes TypeToken.class in the expected path.
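A claim like this can be double-checked with a small inspector that reports where a class file actually lives inside a JOB, including inside nested lib/*.jar entries. This is a sketch with a hypothetical class name (JobInspector) that assumes the lib/*.jar layout. Finding the class in the JOB only proves it was packaged, not that the task's class loader will see it, which is exactly the gap the stack traces in this thread show.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: reports where a class lives inside a JOB file. */
class JobInspector {

    /** Returns "top-level" and/or the names of nested lib/*.jar entries containing the class. */
    static List<String> locate(Path job, String className) throws IOException {
        String wanted = className.replace('.', '/') + ".class";
        List<String> hits = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(job))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.equals(wanted)) {
                    hits.add("top-level");
                } else if (name.startsWith("lib/") && name.endsWith(".jar")) {
                    // Read the nested JAR into memory and scan its entry names.
                    byte[] bytes = in.readAllBytes();
                    try (ZipInputStream nested = new ZipInputStream(new ByteArrayInputStream(bytes))) {
                        ZipEntry inner;
                        while ((inner = nested.getNextEntry()) != null) {
                            if (inner.getName().equals(wanted)) {
                                hits.add(name);
                                break;
                            }
                        }
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JobInspector <file.job> <fully.qualified.ClassName>");
            System.exit(1);
        }
        List<String> hits = locate(Paths.get(args[0]), args[1]);
        System.out.println(hits.isEmpty() ? "not found" : "found in: " + hits);
    }
}
```

For example, `java JobInspector examples/target/mahout-examples-0.2-SNAPSHOT.job com.google.gson.reflect.TypeToken` would report which lib/*.jar entry, if any, holds TypeToken.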

Again, I really appreciate people's help in getting this going!

----snip----
09/07/15 17:06:38 INFO mapred.JobClient: Task Id : attempt_200907151617_0010_m_000000_0, Status : FAILED
java.lang.NoClassDefFoundError: com/google/gson/reflect/TypeToken
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:703)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
        at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:330)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:254)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:402)
        at org.apache.mahout.matrix.AbstractVector.asFormatString(AbstractVector.java:374)
        at org.apache.mahout.clustering.kmeans.Cluster.outputPointWithClusterInfo(Cluster.java:198)
        at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:39)
        at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:32)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: com.google.gson.reflect.TypeToken
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:330)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:254)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:402)
        ... 20 more
----snip----

Incidentally, as part of this work I've also implemented a Pearson distance measure. If people think it would be useful to fold in, I'd be happy to put together an SVN patch with the implementation and tests.
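For reference, a Pearson distance is commonly defined as one minus the Pearson correlation coefficient, d(x, y) = 1 - r(x, y), where r = sum((x_i - mean(x)) * (y_i - mean(y))) / sqrt(sum((x_i - mean(x))^2) * sum((y_i - mean(y))^2)). The sketch below assumes that definition and works on plain double arrays; it does not implement Mahout's DistanceMeasure interface, and the class name PearsonDistance is hypothetical, not Paul's implementation.

```java
/**
 * Hypothetical sketch of a Pearson correlation distance, d(x, y) = 1 - r(x, y),
 * over plain double arrays (not Mahout's DistanceMeasure interface).
 */
final class PearsonDistance {

    private PearsonDistance() {}

    /** Returns 1 - r: 0 for perfectly correlated vectors, 2 for perfectly anti-correlated ones. */
    static double distance(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("vectors must have the same length (>= 2)");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Accumulate the covariance and variance sums around the means.
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0 || varY == 0) {
            throw new IllegalArgumentException("correlation is undefined for a constant vector");
        }
        double r = cov / Math.sqrt(varX * varY);
        return 1.0 - r;
    }

    public static void main(String[] args) {
        System.out.println(distance(new double[] {1, 2, 3}, new double[] {2, 4, 6})); // prints 0.0
        System.out.println(distance(new double[] {1, 2, 3}, new double[] {3, 2, 1})); // prints 2.0
    }
}
```

Rejecting constant vectors is a design choice of this sketch; a production measure might instead return a fixed distance for that case.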

Thanks,
Paul

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search

--
View this message in context: 
http://www.nabble.com/ClassNotFoundException-with-pseudo-distributed-run-of-KMeans-tp24505889p24795839.html
Sent from the Mahout User List mailing list archive at Nabble.com.
