Re: How to unit test spark streaming?

2017-03-07 Thread kant kodali
Agreed with the statement quoted below: whether or not one wants to write unit tests, it is good practice to write code that way. But I think the more painful and tedious task is to mock/emulate all the nodes, such as the Spark workers/master, HDFS, the input source stream, and all that. I wish there were

Re: How to unit test spark streaming?

2017-03-07 Thread Michael Armbrust
> Basically you abstract your transformations to take in a dataframe and
> return one, then you assert on the returned df

+1 to this suggestion. This is why we wanted streaming and batch dataframes to share the same API.
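
As an illustration of the quoted suggestion, here is a minimal sketch in Scala. It is not code from the thread: the `Transformations` object, the `positiveAmounts` function, and the `id`/`amount` columns are hypothetical names chosen for the example. The idea is that the transformation only sees a DataFrame, so a small batch DataFrame built in a local session can stand in for the streaming input during a test.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object Transformations {
  // Hypothetical transformation: keep only rows with a positive "amount".
  // It depends only on the DataFrame API, not on where the data comes from.
  def positiveAmounts(df: DataFrame): DataFrame =
    df.filter(col("amount") > 0)
}

object TransformationsCheck {
  def main(args: Array[String]): Unit = {
    // A local, single-threaded session is enough for this kind of check.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("transformation-check")
      .getOrCreate()
    import spark.implicits._

    try {
      // Small in-memory batch input standing in for the real source.
      val input = Seq(("a", 10), ("b", -5), ("c", 3)).toDF("id", "amount")

      val result = Transformations.positiveAmounts(input)

      // Assert on the returned DataFrame.
      val ids = result.select("id").as[String].collect().toSet
      assert(ids == Set("a", "c"))
    } finally {
      spark.stop()
    }
  }
}
```

In a real project the body of `main` would normally live in a test framework spec rather than a standalone object; the standalone form is used here only to keep the sketch self-contained.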

Re: How to unit test spark streaming?

2017-03-07 Thread Jörn Franke
This depends on your target setup! For example, for the integration tests of my open source Spark libraries (kept in a dedicated folder alongside the unit tests), I run a local Spark master, but I also use a MiniDFS cluster (to simulate HDFS on a node) and sometimes a MiniYARN cluster (see

Re: How to unit test spark streaming?

2017-03-07 Thread Sam Elamin
Hey kant, you can use Holden's spark-testing-base. Have a look at some of the specs I wrote here to give you an idea: https://github.com/samelamin/spark-bigquery/blob/master/src/test/scala/com/samelamin/spark/bigquery/BigQuerySchemaSpecs.scala Basically you abstract your transformations to take in a