Re: How to unit test spark streaming?

2017-03-07 Thread kant kodali
I agree with the statement quoted below: whether or not one wants to write
unit tests, it is good practice to structure code that way. But I think the
more painful and tedious task is mocking or emulating all the pieces, such as
the Spark workers, the master, HDFS, and the input source stream. I wish there
were something really simple. Perhaps the simplest approach is to write
integration tests that also exercise the transformations and business logic.
That way I can spawn a small cluster, run my tests, and bring the cluster down
when I am done. Granted, if the cluster isn't available then I can't run the
tests, but some node should always be available, even if only to run a single
process. I somehow feel we may be doing too much work to fit the archaic
definition of unit tests.

"Basically you abstract your transformations to take in a dataframe and
return one, then you assert on the returned df"

On Tue, Mar 7, 2017 at 11:14 AM, Michael Armbrust wrote:

>> Basically you abstract your transformations to take in a dataframe and
>> return one, then you assert on the returned df
>
> +1 to this suggestion. This is why we wanted streaming and batch
> dataframes to share the same API.


Re: How to unit test spark streaming?

2017-03-07 Thread Michael Armbrust
>
> Basically you abstract your transformations to take in a dataframe and
> return one, then you assert on the returned df
>

+1 to this suggestion.  This is why we wanted streaming and batch
dataframes to share the same API.
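
To illustrate the shared API, here is a minimal sketch in which a function
written against a plain DataFrame is applied, unchanged, to a streaming one.
The names StreamingSketch, onlyPositive, and the "positives" query name are
hypothetical. MemoryStream lives in an internal Spark package
(org.apache.spark.sql.execution.streaming) and its constructor has changed
between releases, so treat this as an assumption written against the Spark
2.x/3.x API rather than a stable recipe.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext, SparkSession}
import org.apache.spark.sql.execution.streaming.MemoryStream
import org.apache.spark.sql.functions.col

object StreamingSketch {
  // Hypothetical transformation: works on batch and streaming DataFrames alike.
  def onlyPositive(df: DataFrame): DataFrame =
    df.filter(col("amount") > 0)

  def main(args: Array[String]): Unit = {
    // local[2] keeps the driver and executors inside this single JVM.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("streaming-sketch")
      .getOrCreate()
    import spark.implicits._
    // MemoryStream (Spark 2.x/3.x) expects an implicit SQLContext.
    implicit val sqlContext: SQLContext = spark.sqlContext

    try {
      // In-memory source standing in for the real input stream.
      val source = MemoryStream[(String, Int)]
      val streamingInput = source.toDS().toDF("id", "amount")

      // Write results to the in-memory sink so they can be queried as a table.
      val query = onlyPositive(streamingInput)
        .writeStream
        .format("memory")
        .queryName("positives")
        .outputMode("append")
        .start()

      source.addData(("a", 10), ("b", -5), ("c", 3))
      query.processAllAvailable() // block until the added rows are processed

      val ids = spark.table("positives").select("id").as[String].collect().toSet
      assert(ids == Set("a", "c"))
      query.stop()
    } finally {
      spark.stop()
    }
  }
}
```

Because onlyPositive only sees a DataFrame, the same function can be asserted
on with a small batch DataFrame in a faster test, and the streaming run above
is only needed to check the wiring.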


Re: How to unit test spark streaming?

2017-03-07 Thread Jörn Franke
This depends on your target setup! For example, in the integration tests for
my open source Spark libraries (kept in a dedicated folder alongside the unit
tests) I run a local Spark master, but I also use a MiniDFS cluster (to
simulate HDFS on a node) and sometimes a MiniYARN cluster (see
https://wiki.apache.org/hadoop/HowToDevelopUnitTests).
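
A rough sketch of that kind of setup, combining a local Spark master with an
in-process MiniDFSCluster. This is not taken from the linked projects; the
object name MiniDfsSketch and the /tmp/numbers path are made up for
illustration, and it assumes Hadoop's minicluster test artifact is on the test
classpath in a version compatible with the Hadoop libraries Spark was built
against.

```scala
import java.nio.file.Files

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hdfs.MiniDFSCluster
import org.apache.spark.sql.SparkSession

object MiniDfsSketch {
  def main(args: Array[String]): Unit = {
    // Keep the simulated HDFS storage in a throwaway temp directory.
    val baseDir = Files.createTempDirectory("minidfs").toFile
    val hadoopConf = new Configuration()
    hadoopConf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, baseDir.getAbsolutePath)

    // One NameNode and one DataNode, both running inside this JVM.
    val dfs = new MiniDFSCluster.Builder(hadoopConf).numDataNodes(1).build()
    try {
      dfs.waitClusterUp()
      val spark = SparkSession.builder()
        .master("local[2]")
        .appName("minidfs-sketch")
        .getOrCreate()
      import spark.implicits._
      try {
        // Round-trip a tiny dataset through the simulated HDFS.
        val path = s"hdfs://localhost:${dfs.getNameNodePort}/tmp/numbers"
        Seq(1, 2, 3).toDF("n").write.parquet(path)
        val total = spark.read.parquet(path).as[Int].collect().sum
        assert(total == 6)
      } finally {
        spark.stop()
      }
    } finally {
      dfs.shutdown()
    }
  }
}
```

Nothing here needs an external cluster: the Spark driver, the executors, and
the HDFS daemons all live in the test JVM and are torn down at the end.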

An example can be found here:
https://github.com/ZuInnoTe/hadoopcryptoledger/tree/master/examples/spark-bitcoinblock

or, if you need Scala:
https://github.com/ZuInnoTe/hadoopcryptoledger/tree/master/examples/scala-spark-bitcoinblock

In both cases it is in the integration-tests (Java) or it (Scala) folder.

For Spark Streaming I have no open source example at hand, but basically you
need to simulate the source; the rest is as above.
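
One possible way to simulate a source for the DStream API is
StreamingContext.queueStream, which turns a queue of RDDs into an input stream.
The sketch below is only an illustration of that idea: QueueStreamSketch and
lineLengths are hypothetical names, and the fixed timeout makes the check
timing-dependent, so a real test would want a more robust way to wait for the
batch.

```scala
import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Milliseconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object QueueStreamSketch {
  // Hypothetical business logic, kept separate from the source and the sink.
  def lineLengths(lines: DStream[String]): DStream[Int] =
    lines.map(_.length)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("queue-stream-sketch")
    val ssc = new StreamingContext(conf, Milliseconds(200))

    // The queue of RDDs plays the role of the real input source.
    val queue = mutable.Queue[RDD[String]]()
    queue += ssc.sparkContext.parallelize(Seq("spark", "test"))

    // foreachRDD runs on the driver, so results can be gathered locally.
    val collected = mutable.ArrayBuffer[Int]()
    lineLengths(ssc.queueStream(queue)).foreachRDD { rdd =>
      val batch = rdd.collect()
      collected.synchronized { collected ++= batch }
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(3000) // give the first batches time to run
    ssc.stop(stopSparkContext = true, stopGracefully = true)

    val lengths = collected.synchronized { collected.toList.sorted }
    assert(lengths == List(4, 5))
  }
}
```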

I will eventually write a blog post about this with more details.

> On 7 Mar 2017, at 13:04, kant kodali wrote:
> 
> Hi All,
> 
> How to unit test spark streaming or spark in general? How do I test the 
> results of my transformations? Also, more importantly don't we need to spawn 
> master and worker JVM's either in one or multiple nodes?
> 
> Thanks!
> kant


Re: How to unit test spark streaming?

2017-03-07 Thread Sam Elamin
Hey kant

You can use Holden's spark-testing-base.

Have a look at some of the specs I wrote here to give you an idea:

https://github.com/samelamin/spark-bigquery/blob/master/src/test/scala/com/samelamin/spark/bigquery/BigQuerySchemaSpecs.scala

Basically you abstract your transformations to take in a dataframe and
return one, then you assert on the returned df
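
A minimal sketch of that pattern, using only a local SparkSession and a plain
assert rather than any particular test framework. The names
TransformationSketch and onlyPositive, and the "id" and "amount" columns, are
invented for illustration and are not taken from the specs linked above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object TransformationSketch {
  // Hypothetical transformation: takes a DataFrame in, returns a DataFrame.
  def onlyPositive(df: DataFrame): DataFrame =
    df.filter(col("amount") > 0)

  def main(args: Array[String]): Unit = {
    // local[2] runs the driver and executors as threads in this one JVM,
    // so no separate master or worker processes are needed.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("transformation-sketch")
      .getOrCreate()
    import spark.implicits._

    try {
      // Small hand-built input standing in for the real data.
      val input = Seq(("a", 10), ("b", -5), ("c", 3)).toDF("id", "amount")

      val result = onlyPositive(input)

      // Assert on the returned DataFrame.
      val ids = result.select("id").as[String].collect().toSet
      assert(ids == Set("a", "c"))
    } finally {
      spark.stop()
    }
  }
}
```

In a real suite the body of main would become a test case, and the
SparkSession would typically be shared across tests to avoid paying the
startup cost each time.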

Regards
Sam

On Tue, 7 Mar 2017 at 12:05, kant kodali wrote:

> Hi All,
>
> How to unit test spark streaming or spark in general? How do I test the
> results of my transformations? Also, more importantly don't we need to
> spawn master and worker JVM's either in one or multiple nodes?
>
> Thanks!
> kant
>