Re: Unit test with sqlContext

2016-03-19 Thread Vikas Kawadia
If you prefer the py.test framework, I just wrote a blog post with some
examples:

Unit testing Apache Spark with py.test
https://engblog.nextdoor.com/unit-testing-apache-spark-with-py-test-3b8970dc013b

On Fri, Feb 5, 2016 at 11:43 AM, Steve Annessa wrote:

> Thanks for all of the responses.
>
> I do have an afterAll that stops the sc.
>
> While looking over Holden's readme I noticed she mentioned "Make sure to
> disable parallel execution." That was what I was missing; I added the
> following to my build.sbt:
>
> ```
> parallelExecution in Test := false
> ```
>
> Now all of my tests are running.
>
> I'm going to look into using the package she created.
>
> Thanks again,
>
> -- Steve
>
>
> On Thu, Feb 4, 2016 at 8:50 PM, Rishi Mishra wrote:
>
>> Hi Steve,
>> Have you cleaned up your SparkContext (sc.stop()) in an afterAll()?
>> The error suggests you are creating more than one SparkContext.
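>>
>> A minimal sketch of that cleanup with ScalaTest's BeforeAndAfterAll (the
>> suite name, master and app name here are placeholders):
>>
>> ```
>> import org.apache.spark.{SparkConf, SparkContext}
>> import org.scalatest.{BeforeAndAfterAll, FunSuite}
>>
>> class SparkSpecBase extends FunSuite with BeforeAndAfterAll {
>>   @transient var sc: SparkContext = _
>>
>>   override def beforeAll(): Unit = {
>>     val conf = new SparkConf().setMaster("local[2]").setAppName("test")
>>     sc = new SparkContext(conf)
>>   }
>>
>>   override def afterAll(): Unit = {
>>     // Stop the context so the next suite can create its own.
>>     if (sc != null) sc.stop()
>>     sc = null
>>   }
>> }
>> ```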
>>
>>
>> On Fri, Feb 5, 2016 at 10:04 AM, Holden Karau wrote:
>>
>>> Thanks for recommending spark-testing-base :) Just wanted to add: if
>>> anyone has feature requests for Spark testing, please get in touch (or
>>> add an issue on the GitHub repo) :)
>>>
>>>
>>> On Thu, Feb 4, 2016 at 8:25 PM, Silvio Fiorito
>>> <silvio.fior...@granturing.com> wrote:
>>>
 Hi Steve,

 Have you looked at the spark-testing-base package by Holden? It’s
 really useful for unit testing Spark apps as it handles all the
 bootstrapping for you.

 https://github.com/holdenk/spark-testing-base

 DataFrame examples are here:
 https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala
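
 A rough sketch of what a suite built on it can look like (the test data
 here is made up, and method names can differ between versions; the linked
 sample shows the exact API):

 ```
 import com.holdenkarau.spark.testing.DataFrameSuiteBase
 import org.scalatest.FunSuite

 class SampleDataFrameSpec extends FunSuite with DataFrameSuiteBase {
   test("a DataFrame equals itself") {
     // DataFrameSuiteBase bootstraps the contexts, so sqlContext is ready here.
     val df = sqlContext.createDataFrame(Seq(("a", 1), ("b", 2)))
       .toDF("key", "value")
     assertDataFrameEquals(df, df)
   }
 }
 ```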

 Thanks,
 Silvio

 From: Steve Annessa 
 Date: Thursday, February 4, 2016 at 8:36 PM
 To: "user@spark.apache.org" 
 Subject: Unit test with sqlContext

 I'm trying to unit test a function that reads in a JSON file,
 manipulates the DF and then returns a Scala Map.

 The function has signature:
 def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext)

 I've created a bootstrap spec for spark jobs that instantiates the
 Spark Context and SQLContext like so:

 @transient var sc: SparkContext = _
 @transient var sqlContext: SQLContext = _

 override def beforeAll = {
   System.clearProperty("spark.driver.port")
   System.clearProperty("spark.hostPort")

   val conf = new SparkConf()
     .setMaster(master)
     .setAppName(appName)

   sc = new SparkContext(conf)
   sqlContext = new SQLContext(sc)
 }

 When I do not include sqlContext, my tests run. Once I add the
 sqlContext I get the following errors:

 16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being
 constructed (or threw an exception in its constructor).  This may indicate
 an error, since only one SparkContext may be running in this JVM (see
 SPARK-2243). The other SparkContext was created at:
 org.apache.spark.SparkContext.<init>(SparkContext.scala:81)

 16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext.
 akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is
 not unique!

 and finally:

 [info] IngestSpec:
 [info] Exception encountered when attempting to run a suite with class
 name: com.company.package.IngestSpec *** ABORTED ***
 [info]   akka.actor.InvalidActorNameException: actor name
 [ExecutorEndpoint] is not unique!


 What do I need to do to get a sqlContext through my tests?

 Thanks,

 -- Steve

>>>
>>>
>>>
>>> --
>>> Cell : 425-233-8271
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>
>>
>>
>> --
>> Regards,
>> Rishitesh Mishra,
>> SnappyData . (http://www.snappydata.io/)
>>
>> https://in.linkedin.com/in/rishiteshmishra
>>
>
>


Re: Unit testing framework for Spark Jobs?

2016-03-19 Thread Vikas Kawadia
I just wrote a blog post on Unit testing Apache Spark with py.test
https://engblog.nextdoor.com/unit-testing-apache-spark-with-py-test-3b8970dc013b

If you prefer using the py.test framework, then it might be useful.

-vikas

On Wed, Mar 2, 2016 at 10:59 AM, radoburansky wrote:

> I am sure you have googled this:
> https://github.com/holdenk/spark-testing-base
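>
> For streaming jobs it also ships a StreamingSuiteBase trait. A rough
> sketch, with the word-count logic invented for illustration (check the
> project README for the exact API):
>
> ```
> import com.holdenkarau.spark.testing.StreamingSuiteBase
> import org.apache.spark.streaming.dstream.DStream
> import org.scalatest.FunSuite
>
> class WordCountStreamSpec extends FunSuite with StreamingSuiteBase {
>   def count(lines: DStream[String]): DStream[(String, Int)] =
>     lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
>
>   test("counts words in each batch") {
>     val input = List(List("a b", "b"))
>     val expected = List(List(("a", 1), ("b", 2)))
>     // Each inner List is one micro-batch fed through the transformation.
>     testOperation(input, count _, expected, ordered = false)
>   }
> }
> ```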
>
> On Wed, Mar 2, 2016 at 6:54 PM, SRK [via Apache Spark User List]
> <[hidden email]> wrote:
>
>> Hi,
>>
>> What is a good unit testing framework for Spark batch/streaming jobs? I'm
>> using core Spark, Spark SQL with DataFrames, and the Streaming API. Is
>> there a good framework that covers unit tests for these APIs?
>>
>> Thanks!
>>
>
>