Re: Unit test with sqlContext
If you prefer the py.test framework, I just wrote a blog post with some examples: Unit testing Apache Spark with py.test

https://engblog.nextdoor.com/unit-testing-apache-spark-with-py-test-3b8970dc013b

On Fri, Feb 5, 2016 at 11:43 AM, Steve Annessa <steve.anne...@gmail.com> wrote:
Re: Unit test with sqlContext
Thanks for all of the responses.

I do have an afterAll that stops the sc.

While looking over Holden's readme I noticed she mentioned "Make sure to disable parallel execution." That was what I was missing; I added the following to my build.sbt:

```
parallelExecution in Test := false
```

Now all of my tests are running.

I'm going to look into using the package she created.

Thanks again,

-- Steve

On Thu, Feb 4, 2016 at 8:50 PM, Rishi Mishra <rmis...@snappydata.io> wrote:
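Steve's fix can be captured directly in build.sbt. The first setting below is exactly the one from the thread; the second is an additional, commonly used knob (my addition, not mentioned in the thread) that runs tests in a JVM forked from sbt's own:

```
// build.sbt (sbt 0.13-era syntax, as used in the thread)

// Run test suites sequentially so only one SparkContext exists at a time.
parallelExecution in Test := false

// Optional extra (an assumption, not from the thread): fork a separate
// JVM for tests so a leaked context cannot collide with sbt's own JVM.
fork in Test := true
```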
Unit test with sqlContext
I'm trying to unit test a function that reads in a JSON file, manipulates the DF and then returns a Scala Map.

The function has signature:

```
def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext)
```

I've created a bootstrap spec for Spark jobs that instantiates the SparkContext and SQLContext like so:

```
@transient var sc: SparkContext = _
@transient var sqlContext: SQLContext = _

override def beforeAll = {
  System.clearProperty("spark.driver.port")
  System.clearProperty("spark.hostPort")

  val conf = new SparkConf()
    .setMaster(master)
    .setAppName(appName)

  sc = new SparkContext(conf)
  sqlContext = new SQLContext(sc)
}
```

When I do not include sqlContext, my tests run. Once I add the sqlContext I get the following errors:

```
16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:81)

16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext.
akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is not unique!
```

and finally:

```
[info] IngestSpec:
[info] Exception encountered when attempting to run a suite with class name: com.company.package.IngestSpec *** ABORTED ***
[info]   akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is not unique!
```

What do I need to do to get a sqlContext through my tests?

Thanks,

-- Steve
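The stack traces above come from SparkContext's one-live-context-per-JVM rule (SPARK-2243): a second test suite constructing a context while another is still running fails. As a rough, Spark-free illustration of that failure mode and of the stop()-between-suites discipline, here is a sketch in which `ContextLifecycleDemo` and `FakeContext` are hypothetical stand-ins, not Spark APIs:

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical model (NOT Spark's API) of the rule that only one live
// context may exist per JVM: construction claims a JVM-wide slot,
// stop() releases it.
object ContextLifecycleDemo {
  private val slotTaken = new AtomicBoolean(false)

  class FakeContext {
    require(slotTaken.compareAndSet(false, true),
      "Another context is already running in this JVM")
    def stop(): Unit = slotTaken.set(false)
  }

  // Sequential suites, each stopping its context in afterAll: every
  // suite finds the slot free.
  def sequentialSuitesOk(): Boolean = {
    val a = new FakeContext; a.stop()   // suite A: beforeAll / afterAll
    val b = new FakeContext; b.stop()   // suite B: beforeAll / afterAll
    true
  }

  // Overlapping suites (parallel execution, or a missing stop()): the
  // second construction is rejected, analogous to the
  // "actor name [ExecutorEndpoint] is not unique!" symptom in the logs.
  def overlappingSuitesFail(): Boolean = {
    val a = new FakeContext
    val secondRejected =
      try { new FakeContext; false }
      catch { case _: IllegalArgumentException => true }
    a.stop()
    secondRejected
  }
}
```

The same shape explains both fixes discussed later in the thread: stop the context in afterAll, and disable parallel test execution so suites never overlap.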
Re: Unit test with sqlContext
Hi Steve,

Have you looked at the spark-testing-base package by Holden? It’s really useful for unit testing Spark apps as it handles all the bootstrapping for you.

https://github.com/holdenk/spark-testing-base

DataFrame examples are here:
https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala

Thanks,
Silvio

From: Steve Annessa <steve.anne...@gmail.com>
Date: Thursday, February 4, 2016 at 8:36 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Unit test with sqlContext
Re: Unit test with sqlContext
Hi Steve,

Have you cleaned up your SparkContext (sc.stop()) in an afterAll()? The error suggests you are creating more than one SparkContext.

On Fri, Feb 5, 2016 at 10:04 AM, Holden Karau <hol...@pigscanfly.ca> wrote:

--
Regards,
Rishitesh Mishra
SnappyData (http://www.snappydata.io/)
https://in.linkedin.com/in/rishiteshmishra
Re: Unit test with sqlContext
Thanks for recommending spark-testing-base :) Just wanted to add if anyone has feature requests for Spark testing please get in touch (or add an issue on the github) :)

On Thu, Feb 4, 2016 at 8:25 PM, Silvio Fiorito <silvio.fior...@granturing.com> wrote:

--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau