Hi Steve,

Have you looked at the spark-testing-base package by Holden? It’s really useful 
for unit testing Spark apps as it handles all the bootstrapping for you.

https://github.com/holdenk/spark-testing-base

DataFrame examples are here: 
https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala
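
As a rough sketch of how this could look for your case (not tested against your project): mixing in DataFrameSuiteBase gives the suite a shared sc and sqlContext, so you should not need to construct either one in beforeAll. The Ingest object, its body, and the temp-file JSON below are placeholders I made up to match your signature, and the FunSuite import assumes a ScalaTest version from the same era as the linked example.

```scala
import java.nio.file.Files

import com.holdenkarau.spark.testing.DataFrameSuiteBase
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.scalatest.FunSuite

// Hypothetical stand-in with the signature from the question;
// the real ingest body is not shown in this thread.
object Ingest {
  def ingest(dataLocation: String, sc: SparkContext,
             sqlContext: SQLContext): Map[String, String] = {
    val df = sqlContext.read.json(dataLocation)
    df.dtypes.toMap // column name -> Spark SQL type name
  }
}

class IngestSpec extends FunSuite with DataFrameSuiteBase {
  test("ingest returns a Map keyed by the JSON fields") {
    // Write a one-record JSON file so the test needs no fixture on disk.
    val file = Files.createTempFile("ingest", ".json")
    Files.write(file, """{"name": "a", "count": 1}""".getBytes("UTF-8"))

    // sc and sqlContext are supplied by DataFrameSuiteBase.
    val result = Ingest.ingest(file.toString, sc, sqlContext)

    assert(result.keySet == Set("name", "count"))
  }
}
```

Because the trait owns the context lifecycle, every suite that mixes it in reuses the same SparkContext rather than starting a second one in the same JVM.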

Thanks,
Silvio

From: Steve Annessa <steve.anne...@gmail.com>
Date: Thursday, February 4, 2016 at 8:36 PM
To: user@spark.apache.org
Subject: Unit test with sqlContext

I'm trying to unit test a function that reads in a JSON file, manipulates the 
DF and then returns a Scala Map.

The function has signature:
def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext)

I've created a bootstrap spec for spark jobs that instantiates the Spark 
Context and SQLContext like so:

@transient var sc: SparkContext = _
@transient var sqlContext: SQLContext = _

override def beforeAll = {
  System.clearProperty("spark.driver.port")
  System.clearProperty("spark.hostPort")

  val conf = new SparkConf()
    .setMaster(master)
    .setAppName(appName)

  sc = new SparkContext(conf)
  sqlContext = new SQLContext(sc)
}

When I do not include sqlContext, my tests run. Once I add the sqlContext I get 
the following errors:

16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being constructed 
(or threw an exception in its constructor).  This may indicate an error, since 
only one SparkContext may be running in this JVM (see SPARK-2243). The other 
SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:81)

16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext.
akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is not 
unique!

and finally:

[info] IngestSpec:
[info] Exception encountered when attempting to run a suite with class name: 
com.company.package.IngestSpec *** ABORTED ***
[info]   akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is 
not unique!


What do I need to do to get a sqlContext through my tests?

Thanks,

-- Steve
