Oh interesting. I did send a PR, or thought I did. Will check this evening.

On Apr 29, 2017 10:04 AM, "Sam Elamin" <hussam.ela...@gmail.com> wrote:
> Hi Lucas,
>
> Thanks for the detailed feedback, that's really useful!
>
> I did suggest GitHub, but my colleague asked for an email.
>
> You raise a good point about the grammar; sure, I will rephrase it. I am
> more than happy to merge in the PR if you send it.
>
> That said, I know you can write BDD tests with any framework, but I am a
> lazy developer and would rather use the framework or library defaults to
> make it easier for other devs to pick up.
>
> Correct, the number of rows is only a start. We can add more tests to
> check the transformed version, but I was going to cover that in a future
> part of the series, since this one is mainly about raw extracts.
>
> Thank you very much for the feedback, and I will be sure to add it once I
> have more.
>
> Maybe we can create a gist of all this, or even a tiny book on best
> practices, if people find it useful.
>
> Looking forward to the PR!
>
> Regards
> Sam
>
> On Sat, 29 Apr 2017 at 06:36, lucas.g...@gmail.com <lucas.g...@gmail.com>
> wrote:
>
>> Awesome, thanks.
>>
>> Just reading your post.
>>
>> A few observations:
>> 1) You're giving out Marius's email: "I have been lucky enough to
>> build this pipeline with the amazing Marius Feteanu". A LinkedIn or
>> GitHub link might be more helpful.
>>
>> 2) "If you are in Pyspark world sadly Holden’s test base wont work so
>> I suggest you check out Pytest and pytest-bdd." doesn't read well to
>> me. On first read I was wondering if Spark-Test-Base wasn't available
>> in Python... It took me about 20 seconds to figure out that you
>> probably meant it doesn't allow for direct BDD semantics. My second
>> observation here is that BDD semantics can be aped in any given
>> testing framework. You just need to be flexible :)
>>
>> 3) You're doing a transformation (i.e. JSON input against a JSON
>> schema). You are testing for the number of rows, which is a good start,
>> but I don't think that really exercises a test against your JSON schema.
>> I tend to view the schema as the thing that needs the most rigorous
>> testing (it's code, after all). I.e. I would want to confirm that the
>> output matches the expected shape and values after being loaded against
>> the schema.
>>
>> I saw a few minor spelling and grammatical issues as well. I put a PR
>> into your blog for them. I won't be offended if you squish it :)
>>
>> I should be getting into our testing 'how-to' stuff this week. I'll
>> scrape our org-specific stuff and put it up on GitHub this week as
>> well. It'll be in Python, so maybe we'll get both use cases covered
>> with examples :)
>>
>> G
>>
>> On 27 April 2017 at 03:46, Sam Elamin <hussam.ela...@gmail.com> wrote:
>> > Hi
>> >
>> > @Lucas I would certainly love to write an integration testing library
>> > for workflows. I have a few ideas I would love to share with others,
>> > and they are focused around Airflow, since that is what we use.
>> >
>> > As promised, here is the first blog post in a series of posts I hope
>> > to write on how we build data pipelines.
>> >
>> > Please feel free to retweet my original tweet and share, because the
>> > more ideas we have the better!
>> >
>> > Feedback is always welcome!
>> >
>> > Regards
>> > Sam
>> >
>> > On Tue, Apr 25, 2017 at 10:32 PM, lucas.g...@gmail.com
>> > <lucas.g...@gmail.com> wrote:
>> >>
>> >> Hi all, whoever (Sam, I think) was going to do some work on a
>> >> template testing pipeline: I'd love to be involved. I have a current
>> >> task in my day job (data engineer) to flesh out our testing how-to /
>> >> best practices for Spark jobs, and I think I'll be doing something
>> >> very similar for the next week or two.
>> >>
>> >> I'll scrape out what I have now in the next day or so and put it up
>> >> in a gist that I can share too.
>> >>
>> >> G
>> >>
>> >> On 25 April 2017 at 13:04, Holden Karau <hol...@pigscanfly.ca> wrote:
>> >>>
>> >>> Urgh, Hangouts did something frustrating. Updated link:
>> >>> https://hangouts.google.com/hangouts/_/ha6kusycp5fvzei2trhay4uhhqe
>> >>>
>> >>> On Mon, Apr 24, 2017 at 12:13 AM, Holden Karau <hol...@pigscanfly.ca>
>> >>> wrote:
>> >>>>
>> >>>> The (tentative) link for those interested is
>> >>>> https://hangouts.google.com/hangouts/_/oyjvcnffejcjhi6qazf3lysypue .
>> >>>>
>> >>>> On Mon, Apr 24, 2017 at 12:02 AM, Holden Karau <hol...@pigscanfly.ca>
>> >>>> wrote:
>> >>>>>
>> >>>>> So 14 people have said they are available on Tuesday the 25th at
>> >>>>> 1 PM Pacific, so we will do this meeting then
>> >>>>> (https://doodle.com/poll/69y6yab4pyf7u8bn).
>> >>>>>
>> >>>>> Since Hangouts tends to work OK on the Linux distro I'm running, my
>> >>>>> default is to host this as a "hangouts-on-air" unless there are
>> >>>>> alternative ideas.
>> >>>>>
>> >>>>> I'll record the hangout and, if it isn't terrible, I'll post it for
>> >>>>> those who weren't able to make it (and next time I'll include more
>> >>>>> European-friendly time options; Doodle wouldn't let me update it
>> >>>>> once posted).
>> >>>>>
>> >>>>> On Fri, Apr 14, 2017 at 11:17 AM, Holden Karau <hol...@pigscanfly.ca>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> Hi Spark Users (+ some Spark testing devs on BCC),
>> >>>>>>
>> >>>>>> A while back, on one of the many threads about testing in Spark,
>> >>>>>> there was some interest in having a chat about the state of Spark
>> >>>>>> testing and what people want/need.
>> >>>>>>
>> >>>>>> So if you are interested in joining an online chat about Spark
>> >>>>>> testing (maybe with an IRL component if enough people are
>> >>>>>> SF-based), please fill out this Doodle:
>> >>>>>> https://doodle.com/poll/69y6yab4pyf7u8bn
>> >>>>>>
>> >>>>>> I think reasonable topics of discussion could be:
>> >>>>>>
>> >>>>>> 1) What is the state of the different Spark testing libraries in
>> >>>>>> the core languages (Scala, Python, R, Java) and the extended
>> >>>>>> languages (C#, JavaScript, etc.)?
>> >>>>>> 2) How do we make these more easily discovered by users?
>> >>>>>> 3) What are people looking for in their testing libraries that we
>> >>>>>> are missing? (Can be functionality, documentation, etc.)
>> >>>>>> 4) Are there any examples of well-tested open source Spark
>> >>>>>> projects, and where are they?
>> >>>>>>
>> >>>>>> If you have other topics, that's awesome.
>> >>>>>>
>> >>>>>> To clarify, this is about libraries and best practices for people
>> >>>>>> testing their Spark applications, and less about testing Spark's
>> >>>>>> internals (although, as illustrated by some of the libraries, there
>> >>>>>> is strong overlap in what is required to make that work).
>> >>>>>>
>> >>>>>> Cheers,
>> >>>>>>
>> >>>>>> Holden :)
>> >>>>>>
>> >>>>>> --
>> >>>>>> Cell : 425-233-8271
>> >>>>>> Twitter: https://twitter.com/holdenkarau
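Lucas's third observation in the thread (a row count alone does not exercise the JSON schema; the output should be checked for shape and values) and his remark that BDD structure can be imitated in any test framework can both be illustrated with a small sketch. This is not code from Sam's post or from any Spark testing library: it is plain Python written as a pytest-style test, and `EXPECTED_SCHEMA`, `load_with_schema`, and the sample records are all hypothetical. The loader is only a stand-in for reading JSON against a schema in Spark, and its rule that a missing or wrong-typed field becomes None is an assumption made for the example. The Given/When/Then comments show the BDD shape without pytest-bdd.

```python
import json

# Hypothetical schema for a raw extract: field name -> expected Python type.
# In a PySpark job this role would be played by a StructType applied at read time.
EXPECTED_SCHEMA = {"order_id": int, "customer": str, "amount": float}


def load_with_schema(raw_lines, schema):
    """Hypothetical stand-in for loading JSON lines against a schema.

    Keeps only the schema's fields. A field that is missing or has the wrong
    type becomes None (an assumption of this sketch), so a schema problem shows
    up in the values rather than in the row count.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for name, expected_type in schema.items():
            value = record.get(name)
            row[name] = value if isinstance(value, expected_type) else None
        rows.append(row)
    return rows


def test_loaded_rows_match_expected_shape_and_values():
    # Given: a small hand-written JSON input (hypothetical sample data)
    raw = [
        '{"order_id": 1, "customer": "alice", "amount": 12.5}',
        '{"order_id": 2, "customer": "bob", "amount": "7.25"}',
    ]

    # When: the input is loaded against the expected schema
    rows = load_with_schema(raw, EXPECTED_SCHEMA)

    # Then: a row-count check alone passes...
    assert len(rows) == 2
    # ...while the shape check confirms every row has exactly the schema's fields,
    assert all(set(row) == set(EXPECTED_SCHEMA) for row in rows)
    # and the value check shows the string "7.25" did not fit the float
    # column and was loaded as None.
    assert rows == [
        {"order_id": 1, "customer": "alice", "amount": 12.5},
        {"order_id": 2, "customer": "bob", "amount": None},
    ]
```

pytest would collect the `test_` function from a `test_*.py` file, and it can also be called directly. In an actual PySpark test, the rows returned by `df.collect()` could be turned into dicts with `Row.asDict()` and compared the same way.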