I have set up a Spark debug env on Windows and Mac, and thought it's worth sharing given some of the issues I encountered. The instructions at <https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-Eclipse> did not work for *Eclipse* (possibly outdated now). The first step, "sbt/sbt" or "build/sbt", hangs while downloading sbt with the message "Getting org.scala-sbt sbt 0.13.7 ...". I tried the alternative "build/mvn eclipse:eclipse", but that too failed, as the generated .classpath files contained classpathentry elements only for Java files.
1. Build Spark using Maven on the command line. This will download all the necessary jars from Maven repos and speed up the Eclipse build. Maven 3.3.3 is required; Spark ships with it. Just use build/mvn and ensure that there is no "mvn" command in PATH:

   build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package

2. Download the latest Scala IDE (4.1.1 as of now) from http://scala-ide.org

3. Check if the Eclipse Scala Maven plugin is installed. If not, install it: Help --> Install New Software --> http://alchim31.free.fr/m2e-scala/update-site/, which is sourced from https://github.com/sonatype/m2eclipse-scala.

4. If using Scala 2.10, add installation 2.10.4. If you built Spark using the steps described at <http://spark.apache.org/docs/latest/building-spark.html> (the build/mvn command above), it gets installed in build/scala-2.10.4. In Eclipse Preferences -> Scala -> Installations -> Add, specify <spark-dir>/build/scala-2.10.4/lib.

5. In Eclipse -> Project, disable Build Automatically. This avoids building projects until all projects are imported and some settings are changed; otherwise Eclipse spends hours building projects in a half-baked state.

6. In Eclipse -> Preferences -> Java -> Compiler -> Errors/Warnings --> Deprecated and Restricted API, change the setting from Error to Warning. This takes care of the Unsafe classes used by project Tungsten.

7. Import the Maven projects: in Eclipse, File --> Import --> Maven --> Existing Maven Projects (*not* General --> Existing Projects into Workspace).

8. After the projects are completely imported, select all projects except java8-tests_2.10, spark-assembly_2.10 and spark-parent_2.10, right click and choose Scala -> Set the Scala Installation, then choose 2.10.4. This step is also described at <https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-Eclipse>. It does not work for some projects; for those, right click on each project, Properties -> Scala Compiler, check Use Project Settings, select the 2.10.4 Scala installation and click OK.

9. Some projects will give the error "Plugin execution not covered by lifecycle configuration" when building. The issue is described at <http://stackoverflow.com/questions/6352208/how-to-solve-plugin-execution-not-covered-by-lifecycle-configuration-for-sprin>. The pom.xml of those projects needs <pluginManagement> ... </pluginManagement> wrapped around the <plugins> element, as in the sketch after this list. The projects that need this change are spark-streaming-flume-sink_2.10 (external/flume-sink/pom.xml), spark-repl_2.10 (repl/pom.xml), spark-sql_2.10 (sql/pom.xml), spark-hive_2.10 (sql/hive/pom.xml), spark-hive-thriftserver_2.10 (sql/hive-thriftserver/pom.xml) and spark-unsafe_2.10 (unsafe/pom.xml).

10. Right click on the project spark-streaming-flume-sink_2.10, Properties -> Java Build Path -> Source -> Add Folder. Navigate to target -> scala-2.10 -> src_managed -> main -> compiled_avro, check the checkbox and click OK.

11. Now enable Project -> Build Automatically. Sit back and relax. If the build fails for some projects (SBT crashes sometimes), just select those and run Project -> Clean -> Clean selected projects.

12. After the build completes (hopefully without any errors), run/debug an example from spark-examples_2.10. You should be able to set breakpoints in Spark code and debug. You may have to change the example's source to add .setMaster("local") on the "val sparkConf" line (see the Scala snippet after this list). After this minor change, it will work.
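For step 9, here is a minimal sketch of the pom.xml change, assuming the project's existing <plugin> entries are left exactly as they are; only the <pluginManagement> wrapper is new. As the Stack Overflow answer linked above describes, this keeps m2e from trying to map those plugin executions onto the Eclipse build lifecycle:

   <build>
     <pluginManagement>
       <plugins>
         <!-- the project's existing <plugin> entries go here, unchanged -->
         ...
       </plugins>
     </pluginManagement>
   </build>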
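And for step 12, a sketch of the one-line change in Scala; the app name and variable name below are illustrative and should match whatever the example you picked already uses:

   import org.apache.spark.SparkConf

   // Find the line in the example that builds the SparkConf and append
   // .setMaster("local") so the job runs in-process, where Eclipse can hit
   // breakpoints, instead of needing an external cluster.
   val sparkConf = new SparkConf()
     .setAppName("SparkExample")   // whatever the example already sets
     .setMaster("local")           // the one addition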
Also, the first time you debug, it will ask you to specify the source path. Just select Add -> Java Project -> select all Spark projects. Let the first debugging session complete; it will not show any Spark code, and you may disable breakpoints in this session to let it run through. Subsequent sessions let you step through Spark code. Enjoy!

You may not have to go through all this if you are using Scala 2.11 or IntelliJ. But if, like me, you use Eclipse along with Spark's current Scala 2.10.4, you will find this useful and avoid a lot of googling.

The one issue I have encountered is debugging/setting breakpoints in the generated Java code for expressions. This code is generated as strings in spark-catalyst_2.10, under org.apache.spark.sql.catalyst.expressions and org.apache.spark.sql.catalyst.expressions.codegen. If anyone has figured out how to do it, please update this thread.