Hi, I've been following this thread for a while.
I'm trying to introduce a test strategy in my team for testing a number of data pipelines before they go to production. I watched Lars' presentation and found it great. However, I'm debating whether unit tests are worth the effort when there are already good job-level and pipeline-level tests. Does anybody have experience of unit tests paying off in such a case?

Cheers,
Shiv

On Mon, Dec 12, 2016 at 6:00 AM, Juan Rodríguez Hortalá <juan.rodriguez.hort...@gmail.com> wrote:

> Hi all,
>
> I would also like to participate in that.
>
> Greetings,
>
> Juan
>
> On Fri, Dec 9, 2016 at 6:03 AM, Michael Stratton <michael.stratton@komodohealth.com> wrote:
>
>> That sounds great, please include me so I can get involved.
>>
>> On Fri, Dec 9, 2016 at 7:39 AM, Marco Mistroni <mmistr...@gmail.com> wrote:
>>
>>> Me too, as I spent most of my time writing unit/integration tests. Please
>>> advise on where I can start.
>>> Kr
>>>
>>> On 9 Dec 2016 12:15 am, "Miguel Morales" <therevolti...@gmail.com> wrote:
>>>
>>>> I would be interested in contributing. I've created my own library for
>>>> this as well. In my blog post I talk about testing with Spark in RSpec
>>>> style:
>>>> https://medium.com/@therevoltingx/test-driven-development-w-apache-spark-746082b44941
>>>>
>>>> On Dec 8, 2016, at 4:09 PM, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>
>>>> There are also libraries designed to simplify testing Spark on the
>>>> various platforms: spark-testing-base
>>>> <http://github.com/holdenk/spark-testing-base> for Scala/Java/Python
>>>> (& video https://www.youtube.com/watch?v=f69gSGSLGrY), sscheck
>>>> <https://github.com/juanrh/sscheck> (Scala-focused, property-based),
>>>> pyspark.test (Python-focused, with py.test instead of unittest2) (&
>>>> blog post from Nextdoor
>>>> https://engblog.nextdoor.com/unit-testing-apache-spark-with-py-test-3b8970dc013b#.jw3bdcej9)
>>>>
>>>> Good luck on your Spark Adventures :)
>>>>
>>>> P.S.
>>>>
>>>> If anyone is interested in helping improve Spark testing libraries, I'm
>>>> always looking for more people to be involved with spark-testing-base
>>>> because I'm lazy :p
>>>>
>>>> On Thu, Dec 8, 2016 at 2:05 PM, Lars Albertsson <la...@mapflat.com> wrote:
>>>>
>>>>> I wrote some advice in a previous post on the list:
>>>>> http://markmail.org/message/bbs5acrnksjxsrrs
>>>>>
>>>>> It does not mention Python, but the strategy advice is the same. Just
>>>>> replace JUnit/Scalatest with pytest, unittest, or your favourite
>>>>> Python test framework.
>>>>>
>>>>> I recently held a presentation on the subject. There is a video
>>>>> recording at https://vimeo.com/192429554 and slides at
>>>>> http://www.slideshare.net/lallea/test-strategies-for-data-processing-pipelines-67244458
>>>>>
>>>>> You can find more material on test strategies at
>>>>> http://www.mapflat.com/lands/resources/reading-list/index.html
>>>>>
>>>>> Lars Albertsson
>>>>> Data engineering consultant
>>>>> www.mapflat.com
>>>>> https://twitter.com/lalleal
>>>>> +46 70 7687109
>>>>> Calendar: https://goo.gl/6FBtlS, https://freebusy.io/la...@mapflat.com
>>>>>
>>>>> On Thu, Dec 8, 2016 at 4:14 PM, pseudo oduesp <pseudo20...@gmail.com> wrote:
>>>>> > Can someone tell me how I can write unit tests for PySpark?
>>>>> > (book, tutorial ...)
>>>>
>>>> --
>>>> Cell : 425-233-8271
>>>> Twitter: https://twitter.com/holdenkarau
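
For anyone weighing the unit-test versus job-level-test question at the top of this message, here is a minimal sketch of what a unit test could look like in this setting. It assumes the record-level logic can be kept in a plain Python function, so the test needs no SparkContext. The function `parse_event` and its 'user_id,amount' input format are hypothetical and not taken from anything in this thread; the test uses the standard `unittest` module that Lars mentions.

```python
import unittest


def parse_event(line):
    """Parse one 'user_id,amount' line into a (user_id, amount) pair.

    Hypothetical record-level logic, for illustration only. Returns None
    for malformed lines so that a job can filter them out.
    """
    parts = line.strip().split(",")
    if len(parts) != 2:
        return None
    user_id, raw_amount = parts
    try:
        return user_id, float(raw_amount)
    except ValueError:
        return None


class ParseEventTest(unittest.TestCase):
    def test_valid_line(self):
        self.assertEqual(parse_event("u1,2.5\n"), ("u1", 2.5))

    def test_malformed_lines_return_none(self):
        self.assertIsNone(parse_event("u1"))
        self.assertIsNone(parse_event("u1,not-a-number"))


# In a PySpark job the same function would be applied per record, e.g.
#   rdd.map(parse_event).filter(lambda r: r is not None)
# That wiring is what a job-level or pipeline-level test would exercise;
# the unit test above only covers the per-record edge cases.
```

Saved as a test module, this should run with `python -m unittest` or with pytest. Whether such tests are worth having next to good job-level tests is exactly the open question above; the sketch only shows that they can be cheap when the logic is separated from the Spark wiring.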