Oh, interesting. I did send a PR, or thought I did. I will check this evening.

On Apr 29, 2017 10:04 AM, "Sam Elamin" <hussam.ela...@gmail.com> wrote:

> Hi Lucas
>
>
> Thanks for the detailed feedback, that's really useful!
>
> I did suggest GitHub, but my colleague asked for an email.
>
> You raise a good point about the grammar; I will rephrase it. I am
> more than happy to merge the PR if you send it.
>
>
> That said, I know you can write BDD tests using any framework, but I am a
> lazy developer and would rather use the framework or library defaults to
> make it easier for other devs to pick up.
>
> You are correct that the number of rows is only a start. We can add more
> tests to check the transformed version, but I was going to point that out in
> a future part of the series, since this one is mainly about raw extracts.
>
>
> Thank you very much for the feedback and I will be sure to add it once I
> have more feedback
>
>
> Maybe we can create a gist of all this or even a tiny book on best
> practices if people find it useful
>
> Looking forward to the PR!
>
> Regards
> Sam
>
>
>
>
>
> On Sat, 29 Apr 2017 at 06:36, lucas.g...@gmail.com <lucas.g...@gmail.com>
> wrote:
>
>> Awesome, thanks.
>>
>> Just reading your post
>>
>> A few observations:
>> 1) You're giving out Marius's email: "I have been lucky enough to
>> build this pipeline with the amazing Marius Feteanu".  A LinkedIn or
>> GitHub link might be more helpful.
>>
>> 2) "If you are in Pyspark world sadly Holden’s test base wont work so
>> I suggest you check out Pytest and pytest-bdd."  This doesn't read well to
>> me. On first read I was wondering whether Spark-Test-Base wasn't available
>> in Python... It took me about 20 seconds to figure out that you
>> probably meant it doesn't allow for direct BDD semantics.  My second
>> observation here is that BDD semantics can be aped in any given
>> testing framework.  You just need to be flexible :)
>>
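As a rough illustration of point 2, here is a minimal sketch of given/when/then structure written with nothing but Python's standard `unittest` module. The `count_rows` function and the sample records are hypothetical stand-ins, not code from the blog post or from any Spark testing library.

```python
import unittest


def count_rows(records):
    """Hypothetical stand-in for the job under test: counts non-empty records."""
    return sum(1 for record in records if record)


class RawExtractScenario(unittest.TestCase):
    """Scenario: a raw extract keeps every non-empty record."""

    # The given/when/then helpers do not start with "test", so unittest
    # does not collect them as tests; they only structure the scenario.
    def given_a_raw_extract(self):
        return [{"id": 1}, {}, {"id": 2}]

    def when_rows_are_counted(self, records):
        return count_rows(records)

    def then_the_count_is(self, actual, expected):
        self.assertEqual(actual, expected)

    def test_non_empty_records_are_counted(self):
        records = self.given_a_raw_extract()
        result = self.when_rows_are_counted(records)
        self.then_the_count_is(result, 2)
```

The same naming pattern should carry over to pytest or any other runner; the trade-off against a dedicated BDD library is that the scenario text lives in method names rather than in a separate feature file.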
>> 3) You're doing a transformation (i.e. JSON input against a JSON
>> schema).  You are testing for the number of rows, which is a good start, but
>> I don't think that really exercises your JSON schema. I
>> tend to view schemas as the things that need the most rigorous testing
>> (they're code, after all).  I.e. I would want to confirm that the output
>> matches the expected shape and values after being loaded against the
>> schema.
>>
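To make point 3 concrete, here is a small sketch of a test that goes beyond row counts and also checks shape and values. It uses plain Python and the standard `json` module rather than Spark; the schema, field names, and sample rows are all assumptions made up for illustration.

```python
import json

# Hypothetical schema: field name -> expected Python type.
EXPECTED_SCHEMA = {"id": int, "name": str, "amount": float}


def load_rows(json_lines):
    """Parse one JSON object per line (a stand-in for the real load step)."""
    return [json.loads(line) for line in json_lines if line.strip()]


def schema_errors(rows, schema):
    """Return a list of readable mismatches between the rows and the schema."""
    errors = []
    for i, row in enumerate(rows):
        if set(row) != set(schema):
            errors.append(f"row {i}: fields {sorted(row)} != {sorted(schema)}")
            continue
        for field, expected_type in schema.items():
            if not isinstance(row[field], expected_type):
                errors.append(f"row {i}: {field} is {type(row[field]).__name__}")
    return errors


raw = [
    '{"id": 1, "name": "a", "amount": 9.5}',
    '{"id": 2, "name": "b", "amount": 3.0}',
]
rows = load_rows(raw)

assert len(rows) == 2                                      # row count
assert schema_errors(rows, EXPECTED_SCHEMA) == []          # shape and types
assert rows[0] == {"id": 1, "name": "a", "amount": 9.5}    # values
```

In a Spark job the same three assertions would presumably be made against the loaded DataFrame's schema and collected rows instead of plain dictionaries.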
>> I saw a few minor spelling and grammatical issues as well.  I put a PR
>> into your blog for them.  I won't be offended if you squish it :)
>>
>> I should be getting into our testing 'how-to' stuff this week.  I'll
>> strip out our org-specific stuff and put it up on GitHub this week as
>> well.  It'll be in Python, so maybe we'll get both use cases covered
>> with examples :)
>>
>> G
>>
>> On 27 April 2017 at 03:46, Sam Elamin <hussam.ela...@gmail.com> wrote:
>> > Hi
>> >
>> > @Lucas I certainly would love to write an integration testing library for
>> > workflows, I have a few ideas I would love to share with others and they are
>> > focused around Airflow since that is what we use
>> >
>> >
>> > As promised here is the first blog post in a series of posts I hope to write
>> > on how we build data pipelines
>> >
>> > Please feel free to retweet my original tweet and share because the more
>> > ideas we have the better!
>> >
>> > Feedback is always welcome!
>> >
>> > Regards
>> > Sam
>> >
>> > On Tue, Apr 25, 2017 at 10:32 PM, lucas.g...@gmail.com
>> > <lucas.g...@gmail.com> wrote:
>> >>
>> >> Hi all, and whoever (Sam, I think) was going to do some work on a
>> >> template testing pipeline: I'd love to be involved. I have a current task
>> >> in my day job (data engineer) to flesh out our testing how-to / best
>> >> practices for Spark jobs, and I think I'll be doing something very similar
>> >> for the next week or two.
>> >>
>> >> I'll scrape out what I have now in the next day or so and put it up in a
>> >> gist that I can share too.
>> >>
>> >> G
>> >>
>> >> On 25 April 2017 at 13:04, Holden Karau <hol...@pigscanfly.ca> wrote:
>> >>>
>> >>> Urgh, Hangouts did something frustrating. Updated link:
>> >>> https://hangouts.google.com/hangouts/_/ha6kusycp5fvzei2trhay4uhhqe
>> >>>
>> >>> On Mon, Apr 24, 2017 at 12:13 AM, Holden Karau <hol...@pigscanfly.ca>
>> >>> wrote:
>> >>>>
>> >>>> The (tentative) link for those interested is
>> >>>> https://hangouts.google.com/hangouts/_/oyjvcnffejcjhi6qazf3lysypue .
>> >>>>
>> >>>> On Mon, Apr 24, 2017 at 12:02 AM, Holden Karau <hol...@pigscanfly.ca>
>> >>>> wrote:
>> >>>>>
>> >>>>> So 14 people have said they are available on Tuesday the 25th at 1 PM
>> >>>>> Pacific, so we will do this meeting then (
>> >>>>> https://doodle.com/poll/69y6yab4pyf7u8bn ).
>> >>>>>
>> >>>>> Since hangouts tends to work ok on the Linux distro I'm running my
>> >>>>> default is to host this as a "hangouts-on-air" unless there are
>> >>>>> alternative ideas.
>> >>>>>
>> >>>>> I'll record the hangout and if it isn't terrible I'll post it for those
>> >>>>> who weren't able to make it (and for next time I'll include more
>> >>>>> European friendly time options - Doodle wouldn't let me update it once
>> >>>>> posted).
>> >>>>> On Fri, Apr 14, 2017 at 11:17 AM, Holden Karau <hol...@pigscanfly.ca>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> Hi Spark Users (+ Some Spark Testing Devs on BCC),
>> >>>>>>
>> >>>>>> A while back, on one of the many threads about testing in Spark, there
>> >>>>>> was some interest in having a chat about the state of Spark testing
>> >>>>>> and what people want/need.
>> >>>>>>
>> >>>>>> So if you are interested in joining an online (with maybe an IRL
>> >>>>>> component if enough people are SF based) chat about Spark testing
>> >>>>>> please fill out this doodle - https://doodle.com/poll/69y6yab4pyf7u8bn
>> >>>>>>
>> >>>>>> I think reasonable topics of discussion could be:
>> >>>>>>
>> >>>>>> 1) What is the state of the different Spark testing libraries in the
>> >>>>>> different core (Scala, Python, R, Java) and extended languages (C#,
>> >>>>>> Javascript, etc.)?
>> >>>>>> 2) How do we make these more easily discovered by users?
>> >>>>>> 3) What are people looking for in their testing libraries that we are
>> >>>>>> missing? (can be functionality, documentation, etc.)
>> >>>>>> 4) Are there any examples of well tested open source Spark projects
>> >>>>>> and where are they?
>> >>>>>>
>> >>>>>> If you have other topics that's awesome.
>> >>>>>>
>> >>>>>> To clarify, this is about libraries and best practices for people
>> >>>>>> testing their Spark applications, and less about testing Spark's
>> >>>>>> internals (although, as illustrated by some of the libraries, there is
>> >>>>>> some strong overlap in what is required to make that work).
>> >>>>>>
>> >>>>>> Cheers,
>> >>>>>>
>> >>>>>> Holden :)
>> >>>>>>
>> >>>>>> --
>> >>>>>> Cell : 425-233-8271
>> >>>>>> Twitter: https://twitter.com/holdenkarau
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >>
>> >
>>
>
