If you prefer the py.test framework, I just wrote a blog post with some
examples:

Unit testing Apache Spark with py.test
https://engblog.nextdoor.com/unit-testing-apache-spark-with-py-test-3b8970dc013b

On Fri, Feb 5, 2016 at 11:43 AM, Steve Annessa <steve.anne...@gmail.com>
wrote:

> Thanks for all of the responses.
>
> I do have an afterAll that stops the sc.
>
> While looking over Holden's readme I noticed she mentioned "Make sure to
> disable parallel execution." That was what I was missing; with parallel
> execution enabled, more than one suite tries to create a SparkContext in
> the same JVM. I added the following to my build.sbt:
>
> ```
> parallelExecution in Test := false
> ```
>
> Now all of my tests are running.
>
> I'm going to look into using the package she created.
>
> Thanks again,
>
> -- Steve
>
>
> On Thu, Feb 4, 2016 at 8:50 PM, Rishi Mishra <rmis...@snappydata.io>
> wrote:
>
>> Hi Steve,
>> Have you cleaned up your SparkContext (sc.stop()) in an afterAll()?
>> The error suggests you are creating more than one SparkContext.
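>>
>> A minimal sketch of that cleanup, assuming a ScalaTest suite mixing in
>> BeforeAndAfterAll (the class and config names below are illustrative, not
>> taken from your code):
>>
>> ```
>> import org.apache.spark.{SparkConf, SparkContext}
>> import org.apache.spark.sql.SQLContext
>> import org.scalatest.{BeforeAndAfterAll, FunSuite}
>>
>> class ExampleSparkSpec extends FunSuite with BeforeAndAfterAll {
>>   @transient var sc: SparkContext = _
>>   @transient var sqlContext: SQLContext = _
>>
>>   override def beforeAll(): Unit = {
>>     val conf = new SparkConf().setMaster("local[2]").setAppName("test")
>>     sc = new SparkContext(conf)
>>     sqlContext = new SQLContext(sc)
>>   }
>>
>>   override def afterAll(): Unit = {
>>     // Stop the context so the next suite can create its own in this JVM.
>>     if (sc != null) sc.stop()
>>     sc = null
>>   }
>> }
>> ```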
>>
>>
>> On Fri, Feb 5, 2016 at 10:04 AM, Holden Karau <hol...@pigscanfly.ca>
>> wrote:
>>
>>> Thanks for recommending spark-testing-base :) Just wanted to add that if
>>> anyone has feature requests for Spark testing, please get in touch (or
>>> open an issue on GitHub) :)
>>>
>>>
>>> On Thu, Feb 4, 2016 at 8:25 PM, Silvio Fiorito <
>>> silvio.fior...@granturing.com> wrote:
>>>
>>>> Hi Steve,
>>>>
>>>> Have you looked at the spark-testing-base package by Holden? It’s
>>>> really useful for unit testing Spark apps as it handles all the
>>>> bootstrapping for you.
>>>>
>>>> https://github.com/holdenk/spark-testing-base
>>>>
>>>> DataFrame examples are here:
>>>> https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala
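>>>>
>>>> As a rough sketch of what a suite can look like with it (assuming the
>>>> DataFrameSuiteBase trait from that repo and a hypothetical sample.json
>>>> test fixture):
>>>>
>>>> ```
>>>> import com.holdenkarau.spark.testing.DataFrameSuiteBase
>>>> import org.scalatest.FunSuite
>>>>
>>>> class IngestSpec extends FunSuite with DataFrameSuiteBase {
>>>>   test("ingest reads the sample file") {
>>>>     // sc and sqlContext are provided by the trait, so no manual
>>>>     // bootstrap or teardown is needed in the suite itself.
>>>>     val df = sqlContext.read.json("src/test/resources/sample.json")
>>>>     assert(df.count() > 0)
>>>>   }
>>>> }
>>>> ```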
>>>>
>>>> Thanks,
>>>> Silvio
>>>>
>>>> From: Steve Annessa <steve.anne...@gmail.com>
>>>> Date: Thursday, February 4, 2016 at 8:36 PM
>>>> To: "user@spark.apache.org" <user@spark.apache.org>
>>>> Subject: Unit test with sqlContext
>>>>
>>>> I'm trying to unit test a function that reads in a JSON file,
>>>> manipulates the DF and then returns a Scala Map.
>>>>
>>>> The function has signature:
>>>> def ingest(dataLocation: String, sc: SparkContext, sqlContext:
>>>> SQLContext)
>>>>
>>>> I've created a bootstrap spec for Spark jobs that instantiates the
>>>> SparkContext and SQLContext like so:
>>>>
>>>> @transient var sc: SparkContext = _
>>>> @transient var sqlContext: SQLContext = _
>>>>
>>>> override def beforeAll = {
>>>>   System.clearProperty("spark.driver.port")
>>>>   System.clearProperty("spark.hostPort")
>>>>
>>>>   val conf = new SparkConf()
>>>>     .setMaster(master)
>>>>     .setAppName(appName)
>>>>
>>>>   sc = new SparkContext(conf)
>>>>   sqlContext = new SQLContext(sc)
>>>> }
>>>>
>>>> When I do not include sqlContext, my tests run. Once I add the
>>>> sqlContext I get the following errors:
>>>>
>>>> 16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being
>>>> constructed (or threw an exception in its constructor).  This may indicate
>>>> an error, since only one SparkContext may be running in this JVM (see
>>>> SPARK-2243). The other SparkContext was created at:
>>>> org.apache.spark.SparkContext.<init>(SparkContext.scala:81)
>>>>
>>>> 16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext.
>>>> akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is
>>>> not unique!
>>>>
>>>> and finally:
>>>>
>>>> [info] IngestSpec:
>>>> [info] Exception encountered when attempting to run a suite with class
>>>> name: com.company.package.IngestSpec *** ABORTED ***
>>>> [info]   akka.actor.InvalidActorNameException: actor name
>>>> [ExecutorEndpoint] is not unique!
>>>>
>>>>
>>>> What do I need to do to make a sqlContext available in my tests?
>>>>
>>>> Thanks,
>>>>
>>>> -- Steve
>>>>
>>>
>>>
>>>
>>> --
>>> Cell : 425-233-8271
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>
>>
>>
>> --
>> Regards,
>> Rishitesh Mishra,
>> SnappyData . (http://www.snappydata.io/)
>>
>> https://in.linkedin.com/in/rishiteshmishra
>>
>
>
