Hi Adrian,

Spark expects a specific name for the tgz and for the folder inside it, namely the ones generated by running make-distribution.sh --tgz in the Spark source folder.
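A quick way to check an archive before uploading it is to compare its file name with the top-level folder it unpacks to. The following is only a sketch: `tarball_top_dir` and `check_spark_tarball` are made-up helper names, not part of Spark or Mesos, and it assumes the archive should be named `<folder>.tgz`, the way make-distribution.sh --tgz names its output.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct top-level entries inside a tarball.
tarball_top_dir() {
    # Strip a leading "./", keep the first path component, drop empty lines.
    tar -tf "$1" | sed -e 's|^\./||' -e 's|/.*||' | grep -v '^$' | sort -u
}

# Hypothetical helper: succeed only if "<name>.tgz" unpacks into a single
# top-level folder called "<name>".
check_spark_tarball() {
    expected=$(basename "$1" .tgz)
    actual=$(tarball_top_dir "$1")
    if [ "$actual" = "$expected" ]; then
        echo "OK: $1 unpacks to $expected/"
    else
        echo "MISMATCH: $1 unpacks to '$actual', expected '$expected'" >&2
        return 1
    fi
}
```

For example, `check_spark_tarball spark-1.5.0-bin-os1.tgz` should report OK for an archive produced by the script, while the same archive renamed to spark15.tgz would be reported as a mismatch.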
If you take a Spark 1.4 tgz generated with that script, keep the name the script gives it, upload it to HDFS again, and fix the URI, then it should work.

Tim

On Wed, Sep 9, 2015 at 8:18 AM, Adrian Bridgett <adr...@opensignal.com> wrote:
> 5mins later...
>
> Trying 1.5 with a fairly plain build:
> ./make-distribution.sh --tgz --name os1 -Phadoop-2.6
>
> and on my first attempt stderr showed:
> I0909 15:16:49.392144 1619 fetcher.cpp:441] Fetched 'hdfs:///apps/spark/spark15.tgz' to '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S1/frameworks/20150826-133446-3217621258-5050-4064-211204/executors/20150826-133446-3217621258-5050-4064-S1/runs/43026ba8-6624-4817-912c-3d7573433102/spark15.tgz'
> sh: 1: cd: can't cd to spark15.tgz
> sh: 1: ./bin/spark-class: not found
>
> Aha, let's rename the file in hdfs (and the two configs) from spark15.tgz
> to spark-1.5.0-bin-os1.tgz...
> Success!!!
>
> The same trick with 1.4 doesn't work, but now that I have something that
> does I can make progress.
>
> Hopefully this helps someone else :-)
>
> Adrian
>
> On 09/09/2015 16:59, Adrian Bridgett wrote:
>
> I'm trying to run spark (1.4.1) on top of mesos (0.23). I've followed the
> instructions (uploaded spark tarball to HDFS, set executor uri in both
> places etc) and yet on the slaves it's failing to launch even the SparkPi
> example with a JNI error. It does run with a local master. A day of
> debugging later and it's time to ask for help!
>
> bin/spark-submit --master mesos://10.1.201.191:5050 --class org.apache.spark.examples.SparkPi /tmp/examples.jar
>
> (I'm putting the jar outside hdfs - on both client box + slave (turned
> off other slaves for debugging) - due to
> http://apache-spark-user-list.1001560.n3.nabble.com/Remote-jar-file-td20649.html.
> I should note that I had the same JNI errors when using the mesos cluster
> dispatcher).
>
> I'm using Oracle Java 8 (no other java - even openjdk - is installed)
>
> As you can see, the slave is downloading the framework fine (you can even
> see it extracted on the slave). Can anyone shed some light on what's going
> on - e.g. how is it attempting to run the executor?
>
> I'm going to try a different JVM (and try a custom spark distribution) but
> I suspect that the problem is much more basic. Maybe it can't find the
> hadoop native libs?
>
> Any light would be much appreciated :) I've included the slave's stderr
> below:
>
> I0909 14:14:01.405185 32132 logging.cpp:177] Logging to STDERR
> I0909 14:14:01.405256 32132 fetcher.cpp:409] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20150826-133446-3217621258-5050-4064-S0\/ubuntu","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"hdfs:\/\/\/apps\/spark\/spark.tgz"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/20150826-133446-3217621258-5050-4064-S0\/frameworks\/20150826-133446-3217621258-5050-4064-211198\/executors\/20150826-133446-3217621258-5050-4064-S0\/runs\/38077da2-553e-4888-bfa3-ece2ab2119f3","user":"ubuntu"}
> I0909 14:14:01.406332 32132 fetcher.cpp:364] Fetching URI 'hdfs:///apps/spark/spark.tgz'
> I0909 14:14:01.406344 32132 fetcher.cpp:238] Fetching directly into the sandbox directory
> I0909 14:14:01.406358 32132 fetcher.cpp:176] Fetching URI 'hdfs:///apps/spark/spark.tgz'
> I0909 14:14:01.679055 32132 fetcher.cpp:104] Downloading resource with Hadoop client from 'hdfs:///apps/spark/spark.tgz' to '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
> I0909 14:14:05.492626 32132 fetcher.cpp:76] Extracting with command: tar -C '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3' -xf '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
> I0909 14:14:07.489753 32132 fetcher.cpp:84] Extracted '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz' into '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3'
> W0909 14:14:07.489784 32132 fetcher.cpp:260] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: hdfs:///apps/spark/spark.tgz
> I0909 14:14:07.489791 32132 fetcher.cpp:441] Fetched 'hdfs:///apps/spark/spark.tgz' to '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
> Error: A JNI error has occurred, please check your installation and try again
> Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
>     at java.lang.Class.getDeclaredMethods0(Native Method)
>     at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
>     at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
>     at java.lang.Class.getMethod0(Class.java:3018)
>     at java.lang.Class.getMethod(Class.java:1784)
>     at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
>     at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
> Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 7 more
>
> --
> *Adrian Bridgett* | Sysadmin Engineer, OpenSignal <http://www.opensignal.com>
> _____________________________________________________
> Office: First Floor, Scriptor Court, 155-157 Farringdon Road, Clerkenwell, London, EC1R 3AD
> Phone #: +44 777-377-8251
> Skype: abridgett | @adrianbridgett <http://twitter.com/adrianbridgett> | LinkedIn link <https://uk.linkedin.com/in/abridgett>
> _____________________________________________________