5 mins later...

Trying 1.5 with a fairly plain build:
./make-distribution.sh --tgz --name os1 -Phadoop-2.6

and on my first attempt stderr showed:
I0909 15:16:49.392144 1619 fetcher.cpp:441] Fetched 'hdfs:///apps/spark/spark15.tgz' to '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S1/frameworks/20150826-133446-3217621258-5050-4064-211204/executors/20150826-133446-3217621258-5050-4064-S1/runs/43026ba8-6624-4817-912c-3d7573433102/spark15.tgz'
sh: 1: cd: can't cd to spark15.tgz
sh: 1: ./bin/spark-class: not found

Aha, let's rename the file in HDFS (and in the two configs) from spark15.tgz to spark-1.5.0-bin-os1.tgz...
Success!!!

The same trick with 1.4 doesn't work, but now that I have something that does I can make progress.
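For anyone else hitting this, the fix boils down to something like the following (a sketch of my setup; I'm assuming "the two configs" are the usual Mesos ones, spark.executor.uri / SPARK_EXECUTOR_URI - adjust paths to taste):

    # rename so the tarball's name matches the directory inside it
    hdfs dfs -mv /apps/spark/spark15.tgz /apps/spark/spark-1.5.0-bin-os1.tgz

    # then point both settings at the new name:
    # conf/spark-defaults.conf:
    #   spark.executor.uri  hdfs:///apps/spark/spark-1.5.0-bin-os1.tgz
    # conf/spark-env.sh:
    #   export SPARK_EXECUTOR_URI=hdfs:///apps/spark/spark-1.5.0-bin-os1.tgz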

Hopefully this helps someone else :-)

Adrian

On 09/09/2015 16:59, Adrian Bridgett wrote:
I'm trying to run Spark (1.4.1) on top of Mesos (0.23). I've followed the instructions (uploaded the Spark tarball to HDFS, set the executor URI in both places, etc.) and yet on the slaves it's failing to launch even the SparkPi example, with a JNI error. It does run with a local master. A day of debugging later and it's time to ask for help!
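(By "both places" I mean the two settings from the Spark-on-Mesos docs - roughly this, with the HDFS URI matching what you'll see in the logs below:)

    # conf/spark-defaults.conf
    spark.executor.uri  hdfs:///apps/spark/spark.tgz

    # conf/spark-env.sh
    export SPARK_EXECUTOR_URI=hdfs:///apps/spark/spark.tgz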

bin/spark-submit --master mesos://10.1.201.191:5050 --class org.apache.spark.examples.SparkPi /tmp/examples.jar

(I'm putting the jar outside hdfs - on both client box + slave (turned off other slaves for debugging) - due to http://apache-spark-user-list.1001560.n3.nabble.com/Remote-jar-file-td20649.html. I should note that I had the same JNI errors when using the mesos cluster dispatcher).

I'm using Oracle Java 8 (no other Java - even OpenJDK - is installed).

As you can see, the slave is downloading the framework fine (you can even see it extracted on the slave). Can anyone shed some light on what's going on - e.g. how is it attempting to run the executor?

I'm going to try a different JVM (and try a custom spark distribution) but I suspect that the problem is much more basic. Maybe it can't find the hadoop native libs?

Any light would be much appreciated :) I've included the slave's stderr below:

I0909 14:14:01.405185 32132 logging.cpp:177] Logging to STDERR
I0909 14:14:01.405256 32132 fetcher.cpp:409] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20150826-133446-3217621258-5050-4064-S0\/ubuntu","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"hdfs:\/\/\/apps\/spark\/spark.tgz"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/20150826-133446-3217621258-5050-4064-S0\/frameworks\/20150826-133446-3217621258-5050-4064-211198\/executors\/20150826-133446-3217621258-5050-4064-S0\/runs\/38077da2-553e-4888-bfa3-ece2ab2119f3","user":"ubuntu"}
I0909 14:14:01.406332 32132 fetcher.cpp:364] Fetching URI 'hdfs:///apps/spark/spark.tgz'
I0909 14:14:01.406344 32132 fetcher.cpp:238] Fetching directly into the sandbox directory
I0909 14:14:01.406358 32132 fetcher.cpp:176] Fetching URI 'hdfs:///apps/spark/spark.tgz'
I0909 14:14:01.679055 32132 fetcher.cpp:104] Downloading resource with Hadoop client from 'hdfs:///apps/spark/spark.tgz' to '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
I0909 14:14:05.492626 32132 fetcher.cpp:76] Extracting with command: tar -C '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3' -xf '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
I0909 14:14:07.489753 32132 fetcher.cpp:84] Extracted '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz' into '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3'
W0909 14:14:07.489784 32132 fetcher.cpp:260] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: hdfs:///apps/spark/spark.tgz
I0909 14:14:07.489791 32132 fetcher.cpp:441] Fetched 'hdfs:///apps/spark/spark.tgz' to '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
    at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
    at java.lang.Class.getMethod0(Class.java:3018)
    at java.lang.Class.getMethod(Class.java:1784)
    at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 7 more



--
*Adrian Bridgett* | Sysadmin Engineer, OpenSignal <http://www.opensignal.com>
_____________________________________________________
Office: First Floor, Scriptor Court, 155-157 Farringdon Road, Clerkenwell, London, EC1R 3AD
Phone #: +44 777-377-8251
Skype: abridgett |@adrianbridgett <http://twitter.com/adrianbridgett>| LinkedIn link <https://uk.linkedin.com/in/abridgett>
_____________________________________________________
