Hi Steve,

Have you looked at the spark-testing-base package by Holden? It’s really useful for unit testing Spark apps as it handles all the bootstrapping for you.
https://github.com/holdenk/spark-testing-base

DataFrame examples are here:
https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala

Thanks,
Silvio

From: Steve Annessa <steve.anne...@gmail.com>
Date: Thursday, February 4, 2016 at 8:36 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Unit test with sqlContext

I'm trying to unit test a function that reads in a JSON file, manipulates the DF and then returns a Scala Map.

The function has signature:

    def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext)

I've created a bootstrap spec for spark jobs that instantiates the Spark Context and SQLContext like so:

    @transient var sc: SparkContext = _
    @transient var sqlContext: SQLContext = _

    override def beforeAll = {
      System.clearProperty("spark.driver.port")
      System.clearProperty("spark.hostPort")

      val conf = new SparkConf()
        .setMaster(master)
        .setAppName(appName)

      sc = new SparkContext(conf)
      sqlContext = new SQLContext(sc)
    }

When I do not include sqlContext, my tests run. Once I add the sqlContext I get the following errors:

    16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at:
    org.apache.spark.SparkContext.<init>(SparkContext.scala:81)

    16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext.
    akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is not unique!

and finally:

    [info] IngestSpec:
    [info] Exception encountered when attempting to run a suite with class name: com.company.package.IngestSpec *** ABORTED ***
    [info] akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is not unique!
What do I need to do to get a sqlContext through my tests?

Thanks,

-- Steve
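
To illustrate the suggestion at the top of this thread, here is a minimal sketch of a spec that lets spark-testing-base supply the contexts instead of constructing them in beforeAll. It assumes a version of the library whose DataFrameSuiteBase trait exposes both sc and sqlContext (as in the linked SampleDataFrameTest), and a Spark version with sqlContext.read (1.4 or later). The Ingest object, its body, and the row-count Map are hypothetical stand-ins for the real ingest function, which is not shown in the thread.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import com.holdenkarau.spark.testing.DataFrameSuiteBase
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.scalatest.FunSuite

// Hypothetical stand-in for the function under test: reads a JSON file
// and returns a Scala Map. The real ingest logic is not in the thread.
object Ingest {
  def ingest(dataLocation: String, sc: SparkContext,
             sqlContext: SQLContext): Map[String, Long] = {
    val df = sqlContext.read.json(dataLocation)
    Map("rows" -> df.count())
  }
}

class IngestSpec extends FunSuite with DataFrameSuiteBase {

  test("ingest counts the rows of a JSON file") {
    // Write a small JSON-lines fixture to a temporary file.
    val path = Files.createTempFile("ingest", ".json")
    Files.write(path,
      "{\"id\": 1}\n{\"id\": 2}\n".getBytes(StandardCharsets.UTF_8))

    // sc and sqlContext come from DataFrameSuiteBase (assumed here);
    // the suite never calls new SparkContext or new SQLContext itself.
    val result = Ingest.ingest(path.toString, sc, sqlContext)

    assert(result("rows") == 2L)
  }
}
```

Because the trait owns the context lifecycle, the spec has no beforeAll of its own. Whether this removes the "actor name is not unique" error depends on what else in the build creates a SparkContext, so treat it as a starting point rather than a confirmed fix.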