Re: TDD in Spark

2017-01-20 Thread A Shaikh
Thanks for all the suggestions. Very helpful.



Re: TDD in Spark

2017-01-17 Thread Lars Albertsson
My advice, short version:
* Start by testing one job per test.
* Use ScalaTest or another standard framework.
* Generate input datasets with Spark routines and write them to local files.
* Run job with local master.
* Read output with Spark routines, validate only the fields you care
about for the test case at hand.
* Focus on building a functional regression test suite with small test
cases before testing with large input datasets. The former improves
productivity more.
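A minimal ScalaTest sketch of the steps above, assuming Spark 3.x and ScalaTest 3.1 or later on the test classpath. `PageViewCountJob`, its `run` method and the `user`/`url`/`views` fields are hypothetical stand-ins for whatever batch job you want to test. What matters is the shape: generate input with Spark, write it to a local directory, run the job against a `local[2]` master, then read the output back and assert only on the fields this test case is about.

```scala
import java.io.File
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, lit}
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical job under test: counts page views per user.
// It reads and writes files, so the test exercises it end to end.
object PageViewCountJob {
  def run(spark: SparkSession, inputPath: String, outputPath: String): Unit =
    spark.read.json(inputPath)
      .groupBy("user")
      .agg(count(lit(1)).as("views"))
      .write.json(outputPath)
}

class PageViewCountJobTest extends AnyFunSuite {
  test("counts page views per user") {
    // Run the job with a local master; no cluster is needed.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("PageViewCountJobTest")
      .getOrCreate()
    import spark.implicits._

    try {
      val dir = Files.createTempDirectory("pageviews").toFile
      val inputPath = new File(dir, "input").getPath
      val outputPath = new File(dir, "output").getPath

      // Generate the input dataset with Spark routines and write it locally.
      Seq(("alice", "/home"), ("alice", "/about"), ("bob", "/home"))
        .toDF("user", "url")
        .write.json(inputPath)

      PageViewCountJob.run(spark, inputPath, outputPath)

      // Read the output with Spark routines and check only the fields
      // this test case cares about.
      val views = spark.read.json(outputPath)
        .select("user", "views")
        .as[(String, Long)]
        .collect()
        .toMap
      assert(views == Map("alice" -> 2L, "bob" -> 1L))
    } finally {
      spark.stop()
    }
  }
}
```

Creating the session inside the test keeps the sketch self-contained; in a larger suite you would probably share one session across tests, since Spark startup dominates the run time of small cases.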

Avoid:
* Test frameworks coupled to your processing technology. They will
make it difficult to switch.
* Spending much effort on small unit tests. Internal interfaces in
Spark tend to be volatile, and testing against them results in high
maintenance costs.
* Input files checked into version control. They are difficult to
maintain. Generate input files with code instead.
* Expected output files checked into version control. Same reason.
Validate selected fields instead.
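One possible way to follow the last two points, sketched under the same assumptions (Spark 3.x, Scala): describe input records as a case class with default values, so each test spells out only the fields it depends on, write them with Spark instead of keeping data files in version control, and read back only the output columns a test checks. `PageView`, its fields, `TestData.writePageViews` and `TestData.fieldByKey` are hypothetical names for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical input record. Defaults keep every record valid, so a test
// only overrides the fields that matter for the case at hand.
case class PageView(
    user: String = "user-1",
    url: String = "/",
    timestamp: Long = 0L,
    userAgent: String = "test-agent")

object TestData {
  // Writes generated records as JSON to a local path, in place of a
  // checked-in input file. Returns the path for convenience.
  def writePageViews(spark: SparkSession, path: String, views: Seq[PageView]): String = {
    import spark.implicits._
    views.toDS().write.json(path)
    path
  }

  // Reads a JSON output directory and returns one selected field per key
  // (the key column is assumed to be a string), ignoring every other column.
  def fieldByKey(spark: SparkSession, path: String, key: String, field: String): Map[String, Any] =
    spark.read.json(path)
      .select(key, field)
      .collect()
      .map(row => row.getString(0) -> row.get(1))
      .toMap
}
```

A test would then generate, say, `Seq(PageView(user = "alice"), PageView(user = "alice", url = "/about"), PageView(user = "bob"))` and assert on a single output field per user. Adding a new field with a default to `PageView` then leaves existing tests untouched, which is the maintenance benefit over checked-in input and expected-output files.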

For a longer answer, please search for my previous posts to the user
list, or watch this presentation: https://vimeo.com/192429554

Slides at 
http://www.slideshare.net/lallea/test-strategies-for-data-processing-pipelines-67244458


Regards,



Lars Albertsson
Data engineering consultant
www.mapflat.com
https://twitter.com/lalleal
+46 70 7687109
Calendar: https://goo.gl/6FBtlS, https://freebusy.io/la...@mapflat.com


On Sun, Jan 15, 2017 at 7:14 PM, A Shaikh  wrote:
> What's the most popular testing approach for Spark apps? I am looking for
> something along the lines of TDD.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: TDD in Spark

2017-01-15 Thread Miguel Morales
I've also written a small blog post that may help you out:
https://medium.com/@therevoltingx/test-driven-development-w-apache-spark-746082b44941#.ia6stbl6n





Re: TDD in Spark

2017-01-15 Thread Silvio Fiorito
You should check out Holden’s excellent spark-testing-base package: 
https://github.com/holdenk/spark-testing-base
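For reference, a minimal sketch of a suite built on spark-testing-base's `SharedSparkContext` trait, which starts one local `SparkContext` for the whole suite and exposes it as `sc`. It assumes a spark-testing-base release matching your Spark version whose ScalaTest dependency provides `AnyFunSuite` (with older releases, `org.scalatest.FunSuite` plays the same role); `WordCountTest` and its data are made up for illustration.

```scala
import com.holdenkarau.spark.testing.SharedSparkContext
import org.scalatest.funsuite.AnyFunSuite

// SharedSparkContext (from spark-testing-base) creates a local SparkContext
// before the suite runs, exposes it as `sc`, and stops it afterwards.
class WordCountTest extends AnyFunSuite with SharedSparkContext {
  test("counts words in a small RDD") {
    val lines = sc.parallelize(Seq("a b", "b c", "b"))
    val counts = lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collectAsMap()
    assert(counts == Map("a" -> 1, "b" -> 3, "c" -> 1))
  }
}
```

Sharing one context per suite avoids paying Spark's startup cost in every test, at the price of tying the suite to a Spark-specific base trait.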


From: A Shaikh 
Date: Sunday, January 15, 2017 at 1:14 PM
To: User 
Subject: TDD in Spark

What's the most popular testing approach for Spark apps? I am looking for something
along the lines of TDD.