I finally managed to get the example working; here are the details, which may
help other users.
I have two Windows nodes for the test system, PN01 and PN02. Both see the same
shared drive S: (on PN02 it is mapped to C:\source).
If I run the worker and master from S:\spark-1.1.0-bin-hadoop2.4, then running
the simple test fails with the ClassNotFoundException (even when a single node
hosts both the master and the worker).
If I run the workers and masters from the local drive
(c:\source\spark-1.1.0-bin-hadoop2.4), then the simple test runs OK (with
either one or two nodes).
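For reference, here is roughly how I start them from the local copy (a sketch
from memory; adjust the paths and host name to your setup):

c:\source\spark-1.1.0-bin-hadoop2.4> bin\spark-class.cmd org.apache.spark.deploy.master.Master --host UK-RND-PN02.actixhost.eu --port 7077
c:\source\spark-1.1.0-bin-hadoop2.4> bin\spark-class.cmd org.apache.spark.deploy.worker.Worker spark://UK-RND-PN02.actixhost.eu:7077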
I haven’t found out why the class fails to load from the shared drive (I
checked the permissions and they look fine), but at least the cluster is
working now.
If anyone has experience running Spark from a Windows shared drive, any advice
is welcome!
Thanks,
Benoit.
PS: Yes, thanks Angel, I did check that:
s:\spark\simple>"%JAVA_HOME%"\bin\jar tvf s:\spark\simple\target\scala-2.10\simple-project_2.10-1.0.jar
299 Thu Nov 20 17:29:40 GMT 2014 META-INF/MANIFEST.MF
1070 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$$anonfun$2.class
1350 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$$anonfun$main$1.class
2581 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$.class
1070 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$$anonfun$1.class
710 Thu Nov 20 17:29:40 GMT 2014 SimpleApp.class
From: angel2014 [mailto:angel.alvarez.pas...@gmail.com]
Sent: Friday, November 21, 2014 3:16 AM
To: u...@spark.incubator.apache.org
Subject: Re: ClassNotFoundException in standalone mode
Can you make sure the class "SimpleApp$$anonfun$1" is included in your app jar?
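For instance, something like this should list it (assuming JAVA_HOME points at
a JDK):

"%JAVA_HOME%"\bin\jar tf s:\spark\simple\target\scala-2.10\simple-project_2.10-1.0.jar | findstr anonfun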
2014-11-20 18:19 GMT+01:00 Benoit Pasquereau:
Hi Guys,
I’m having an issue in standalone mode (Spark 1.1, Hadoop 2.4, Windows Server
2008).
A very simple program runs fine in local mode but fails in standalone mode.
Here is the error:
14/11/20 17:01:53 INFO DAGScheduler: Failed to run count at SimpleApp.scala:22
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, UK-RND-PN02.actixhost.eu): java.lang.ClassNotFoundException: SimpleApp$$anonfun$1
        java.net.URLClassLoader$1.run(URLClassLoader.java:202)
I have added the jar to the SparkConf() to be on the safe side, and it appears
in the standard output (copied after the code):
/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import java.net.URLClassLoader

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "S:\\spark-1.1.0-bin-hadoop2.4\\README.md"
    val conf = new SparkConf()//.setJars(Seq("s:\\spark\\simple\\target\\scala-2.10\\simple-project_2.10-1.0.jar"))
      .setMaster("spark://UK-RND-PN02.actixhost.eu:7077")
      //.setMaster("local[4]")
      .setAppName("Simple Application")
    val sc = new SparkContext(conf)

    // Dump the system classloader URLs (note: this runs in the driver)
    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Executor classpath is:" + url.getFile))

    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}
Simple-project is in the executor classpath list:
14/11/20 17:01:48 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Executor classpath is:/S:/spark/simple/
Executor classpath is:/S:/spark/simple/target/scala-2.10/simple-project_2.10-1.0.jar
Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/conf/
Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/lib/spark-assembly-1.1.0-hadoop2.4.0.jar
Executor classpath is:/S:/spark/simple/
Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.1.jar
Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-core-3.2.2.jar
Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.1.jar
Executor classpath is:/S:/spark/simple/
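One thing I could still try is to let spark-submit ship the jar to the workers
instead of setting it on the SparkConf; a sketch with my paths:

s:\spark\simple> S:\spark-1.1.0-bin-hadoop2.4\bin\spark-submit.cmd --class SimpleApp --master spark://UK-RND-PN02.actixhost.eu:7077 target\scala-2.10\simple-project_2.10-1.0.jar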
Would you have any idea how I could investigate further?
Thanks!
Benoit.
PS: I could attach a debugger to the worker where the ClassNotFoundException
happens, but it is a bit painful.
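For anyone wanting to try the same, this is roughly what I do (an assumption
on my side: spark.executor.extraJavaOptions should reach the JVM that throws
the exception). Set it on the conf before creating the SparkContext:

// Sketch: make each executor JVM listen for a remote debugger on port 5005.
// suspend=n lets the executor start without waiting for the debugger to attach.
conf.set("spark.executor.extraJavaOptions",
  "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005")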