Hello everyone,

I'm a student research assistant at the University of Paderborn, working
on integrating Spark (v1.6.2) with a new network resource management
system. I have already taken a deep dive into the spark-core source code
with regard to its scheduling components.

We are running a cluster in standalone mode, consisting of a master node
and three slave nodes. Am I right in assuming that, in this mode, tasks
are scheduled by the TaskSchedulerImpl together with the DAGScheduler? I
need to find the place where the execution plan (and each of its stages)
for a job is computed and can be analyzed, so I have placed breakpoints
in these two classes.
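
To give a concrete example of the kind of plan I would like to inspect:
a small job with a shuffle, run from spark-shell, which I would expect
the DAGScheduler to split into two stages (a shuffle-map stage and a
result stage):

  // run from spark-shell; reduceByKey introduces a shuffle, so this
  // single action should give the DAGScheduler two stages to plan
  val pairs = sc.parallelize(1 to 1000, 4).map(x => (x % 10, 1))
  pairs.reduceByKey(_ + _).count()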

I established the remote debugging session from IntelliJ IDEA by first
running the following commands on the master node:

  export SPARK_WORKER_OPTS="-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=4000,suspend=n"
  export SPARK_MASTER_OPTS="-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=4000,suspend=n"
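
(For what it's worth, I assume the newer -agentlib:jdwp form is
equivalent, and that these settings could also live in conf/spark-env.sh
so they survive restarting the daemons; a sketch:)

  # presumably equivalent, using the -agentlib:jdwp syntax; could be put
  # into conf/spark-env.sh instead of exporting by hand before start-all.sh
  export SPARK_MASTER_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=4000"
  export SPARK_WORKER_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=4000"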

Port 4000 has been forwarded to my local machine. Unfortunately, none of
my breakpoints in these classes are hit when I run a job such as
sc.parallelize(1 to 1000).count() in spark-shell on the master node
(started with --master spark://...). However, when I pause all threads, I
can see that the process I am attached to is running some kind of event
queue, so the debugger is at least connected to /something/.
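
Concretely, the forwarding and the job submission look roughly like this
(user name, host name, and the master URL are placeholders for our
actual setup):

  # on my local machine: forward the remote debug port
  ssh -L 4000:localhost:4000 user@master-host

  # on the master node: start the shell against the standalone master
  ./bin/spark-shell --master spark://master-host:7077
  scala> sc.parallelize(1 to 1000).count()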

Am I relying on false assumptions, or should these breakpoints in fact be hit?
I am not too familiar with Spark, so please bear with me if I got something
wrong. Many thanks in advance for your help.

Best regards,
Christian Brüggemann


