If you prefer the py.test framework, I just wrote a blog post with some examples:
Unit testing Apache Spark with py.test:
https://engblog.nextdoor.com/unit-testing-apache-spark-with-py-test-3b8970dc013b

On Fri, Feb 5, 2016 at 11:43 AM, Steve Annessa <steve.anne...@gmail.com> wrote:

> Thanks for all of the responses.
>
> I do have an afterAll that stops the sc.
>
> While looking over Holden's readme, I noticed she mentioned "Make sure
> to disable parallel execution." That was what I was missing; I added the
> following to my build.sbt:
>
> ```
> parallelExecution in Test := false
> ```
>
> Now all of my tests are running.
>
> I'm going to look into using the package she created.
>
> Thanks again,
>
> -- Steve
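A minimal sketch of what using that package can look like, for readers following the thread. It assumes ScalaTest's FunSuite and the SharedSparkContext trait that spark-testing-base provides; the suite name, the sample data, and the dependency placeholder are illustrative, not from the thread:

```scala
// In build.sbt (illustrative coordinates; check the project README for
// the artifact version that matches your Spark version):
//   libraryDependencies += "com.holdenkarau" %% "spark-testing-base" % "<version>" % "test"
//   parallelExecution in Test := false

import com.holdenkarau.spark.testing.SharedSparkContext
import org.scalatest.FunSuite

// SharedSparkContext supplies a ready-made `sc` and tears it down after
// the suite, so the test never constructs a second SparkContext itself.
class WordCountSpec extends FunSuite with SharedSparkContext {
  test("counts repeated words") {
    val counts = sc.parallelize(Seq("a", "b", "a")).countByValue()
    assert(counts("a") === 2L)
  }
}
```

The DataFrame sample Silvio links below shows the equivalent for suites that need a SQLContext.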
> On Thu, Feb 4, 2016 at 8:50 PM, Rishi Mishra <rmis...@snappydata.io> wrote:
>
>> Hi Steve,
>> Have you cleaned up your SparkContext (sc.stop()) in an afterAll()?
>> The error suggests you are creating more than one SparkContext.
>>
>> On Fri, Feb 5, 2016 at 10:04 AM, Holden Karau <hol...@pigscanfly.ca> wrote:
>>
>>> Thanks for recommending spark-testing-base :) Just wanted to add: if
>>> anyone has feature requests for Spark testing, please get in touch (or
>>> add an issue on GitHub) :)
>>>
>>> On Thu, Feb 4, 2016 at 8:25 PM, Silvio Fiorito <silvio.fior...@granturing.com> wrote:
>>>
>>>> Hi Steve,
>>>>
>>>> Have you looked at the spark-testing-base package by Holden? It's
>>>> really useful for unit testing Spark apps, as it handles all the
>>>> bootstrapping for you.
>>>>
>>>> https://github.com/holdenk/spark-testing-base
>>>>
>>>> DataFrame examples are here:
>>>> https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala
>>>>
>>>> Thanks,
>>>> Silvio
>>>>
>>>> From: Steve Annessa <steve.anne...@gmail.com>
>>>> Date: Thursday, February 4, 2016 at 8:36 PM
>>>> To: "user@spark.apache.org" <user@spark.apache.org>
>>>> Subject: Unit test with sqlContext
>>>>
>>>> I'm trying to unit test a function that reads in a JSON file,
>>>> manipulates the DataFrame, and then returns a Scala Map.
>>>>
>>>> The function has the signature:
>>>>
>>>> def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext)
>>>>
>>>> I've created a bootstrap spec for Spark jobs that instantiates the
>>>> SparkContext and SQLContext like so:
>>>>
>>>> @transient var sc: SparkContext = _
>>>> @transient var sqlContext: SQLContext = _
>>>>
>>>> override def beforeAll = {
>>>>   System.clearProperty("spark.driver.port")
>>>>   System.clearProperty("spark.hostPort")
>>>>
>>>>   val conf = new SparkConf()
>>>>     .setMaster(master)
>>>>     .setAppName(appName)
>>>>
>>>>   sc = new SparkContext(conf)
>>>>   sqlContext = new SQLContext(sc)
>>>> }
>>>>
>>>> When I do not include the sqlContext, my tests run. Once I add the
>>>> sqlContext, I get the following errors:
>>>>
>>>> 16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being
>>>> constructed (or threw an exception in its constructor). This may indicate
>>>> an error, since only one SparkContext may be running in this JVM (see
>>>> SPARK-2243). The other SparkContext was created at:
>>>> org.apache.spark.SparkContext.<init>(SparkContext.scala:81)
>>>>
>>>> 16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext.
>>>> akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is
>>>> not unique!
>>>>
>>>> and finally:
>>>>
>>>> [info] IngestSpec:
>>>> [info] Exception encountered when attempting to run a suite with class
>>>> name: com.company.package.IngestSpec *** ABORTED ***
>>>> [info]   akka.actor.InvalidActorNameException: actor name
>>>> [ExecutorEndpoint] is not unique!
>>>>
>>>> What do I need to do to get a sqlContext through my tests?
>>>>
>>>> Thanks,
>>>>
>>>> -- Steve
>>>
>>> --
>>> Cell: 425-233-8271
>>> Twitter: https://twitter.com/holdenkarau
>>
>> --
>> Regards,
>> Rishitesh Mishra,
>> SnappyData (http://www.snappydata.io/)
>> https://in.linkedin.com/in/rishiteshmishra
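For completeness, a sketch of the hand-rolled setup the thread converges on: one SparkContext per suite, created in beforeAll, stopped in afterAll (Rishi's point), with `parallelExecution in Test := false` in build.sbt (Steve's fix). The trait name, master URL, and app name are illustrative, and the suite name just echoes the thread's IngestSpec:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.scalatest.{BeforeAndAfterAll, FunSuite, Suite}

// Mix-in that owns the suite's single SparkContext/SQLContext pair.
// Requires `parallelExecution in Test := false` in build.sbt so two
// suites never hold a SparkContext in the same JVM at once.
trait SparkBootstrapSpec extends BeforeAndAfterAll { self: Suite =>
  @transient var sc: SparkContext = _
  @transient var sqlContext: SQLContext = _

  override def beforeAll(): Unit = {
    super.beforeAll()
    val conf = new SparkConf()
      .setMaster("local[2]")      // illustrative; the thread's spec used a `master` val
      .setAppName("ingest-tests") // illustrative app name
    sc = new SparkContext(conf)
    sqlContext = new SQLContext(sc) // wraps the same context; no second SparkContext
  }

  override def afterAll(): Unit = {
    if (sc != null) sc.stop() // the cleanup Rishi asked about
    sc = null
    super.afterAll()
  }
}

// A concrete suite mixes the trait in and can hand both contexts to `ingest`:
class IngestSpec extends FunSuite with SparkBootstrapSpec {
  test("ingest returns a Map") {
    // e.g. val result = ingest("src/test/resources/sample.json", sc, sqlContext)
    pending // replace with real assertions against `result`
  }
}
```

Stopping the context in afterAll and serializing the suites is what makes both the "Another SparkContext is being constructed" warning and the InvalidActorNameException go away.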