Re: How to unit test spark streaming?
Agreed with the statement in quotes below. Whether one wants to do unit tests or not, it is a good practice to write code that way. But I think the more painful and tedious task is to mock/emulate all the nodes: Spark workers/master, HDFS, the input source stream, and so on. I wish there were something really simple.

Perhaps the simplest thing to do is just integration tests, which also exercise the transformations/business logic. That way I can spawn a small cluster, run my tests, and bring the cluster down when I am done. And sure, if the cluster isn't available then I can't run the tests, but some node should be available even to run a single process. I somehow feel we may be doing too much work to fit into the archaic definition of unit tests.

"Basically you abstract your transformations to take in a dataframe and return one, then you assert on the returned df" -- this.

On Tue, Mar 7, 2017 at 11:14 AM, Michael Armbrust wrote:
> Basically you abstract your transformations to take in a dataframe and
> return one, then you assert on the returned df
>
> +1 to this suggestion. This is why we wanted streaming and batch
> dataframes to share the same API.
Re: How to unit test spark streaming?
> Basically you abstract your transformations to take in a dataframe and
> return one, then you assert on the returned df

+1 to this suggestion. This is why we wanted streaming and batch dataframes to share the same API.
Re: How to unit test spark streaming?
This depends on your target setup! For my open source libraries, for example, I run Spark integration tests (kept in a dedicated folder alongside the unit tests) against a local Spark master, but I also use a MiniDFSCluster (to simulate HDFS on a node) and sometimes a MiniYARNCluster (see https://wiki.apache.org/hadoop/HowToDevelopUnitTests).

An example can be found here: https://github.com/ZuInnoTe/hadoopcryptoledger/tree/master/examples/spark-bitcoinblock or, if you need Scala, https://github.com/ZuInnoTe/hadoopcryptoledger/tree/master/examples/scala-spark-bitcoinblock
In both cases it is in the integration-tests (Java) or it (Scala) folder.

Spark Streaming: I have no open source example at hand, but basically you need to simulate the source, and the rest is as above. I will eventually write a blog post about this with more details.

> On 7 Mar 2017, at 13:04, kant kodali wrote:
>
> Hi All,
>
> How do I unit test Spark Streaming, or Spark in general? How do I test the
> results of my transformations? Also, more importantly, don't we need to spawn
> master and worker JVMs, either on one or multiple nodes?
>
> Thanks!
> kant
Re: How to unit test spark streaming?
Hey Kant,

You can use Holden's spark-testing-base. Have a look at some of the specs I wrote here to give you an idea: https://github.com/samelamin/spark-bigquery/blob/master/src/test/scala/com/samelamin/spark/bigquery/BigQuerySchemaSpecs.scala

Basically you abstract your transformations to take in a dataframe and return one, then you assert on the returned df.

Regards
Sam

On Tue, 7 Mar 2017 at 12:05, kant kodali wrote:
> Hi All,
>
> How do I unit test Spark Streaming, or Spark in general? How do I test the
> results of my transformations? Also, more importantly, don't we need to
> spawn master and worker JVMs, either on one or multiple nodes?
>
> Thanks!
> kant