Hi, I would love to see if we can contribute some of the work we have done internally at Airbnb to support testing of DAGs. We have a long way to go, though :)
Best, Arthur

On Tue, May 9, 2017 at 12:34 PM, Sam Elamin <[email protected]> wrote:

> Thanks Gerard and Laura, I have created an email thread as agreed in the call, so let's take the discussion there. If anyone else is interested in helping us build this library, please do get in touch!
>
> On Tue, May 9, 2017 at 5:40 PM, Laura Lorenz <[email protected]> wrote:
>
> > Good points @Gerard. I think the distinctions you make between different testing considerations could help us focus our efforts. Here's my 2 cents in the buckets you describe; I'm wondering if any of these use cases align with anyone else and can help narrow our scope, and if I understood you right @Gerard:
> >
> > Regarding platform code: For our own platform code (i.e. custom Operators and Hooks), we have our CI platform running unit tests on their construction and, in the case of hooks, integration tests on connectivity. The latter involves us setting up test integration services (i.e. a test MySQL process) which we start up as Docker containers, and we flip our Airflow configuration to point at them during testing using environment variables. It seems from a browse of Airflow's tests that operators and hooks are mostly unit tested, with the integrations mocked or skipped (e.g. https://github.com/apache/incubator-airflow/blob/master/tests/contrib/hooks/test_jira_hook.py#L40-L41 or https://github.com/apache/incubator-airflow/blob/master/tests/contrib/hooks/test_sqoop_hook.py#L123-L125). If the hook is using some other, well-tested library to actually establish the connection, the case can probably be made that custom operator and hook authors don't need integration tests; since the normal unittest library is enough to handle these, that might not need to be in scope for a new testing library.
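The mocked-hook unit-test pattern referenced above (as in the linked `test_jira_hook` / `test_sqoop_hook` tests) can be sketched as follows. The `MySqlCountHook` class and its methods are hypothetical stand-ins, not Airflow APIs; the point is that `get_conn` is mocked so no real database is needed:

```python
import unittest
from unittest import mock


class MySqlCountHook:
    """Hypothetical custom hook wrapping a well-tested client library."""

    def get_conn(self):
        # The real connecting is delegated to the underlying library;
        # imported lazily so tests that mock get_conn never touch it.
        import MySQLdb
        return MySQLdb.connect(host="prod-db")

    def row_count(self, table):
        # Illustrative only: don't interpolate identifiers like this in real code.
        cur = self.get_conn().cursor()
        cur.execute("SELECT COUNT(*) FROM %s" % table)
        return cur.fetchone()[0]


class TestMySqlCountHook(unittest.TestCase):
    def test_row_count_uses_connection(self):
        hook = MySqlCountHook()
        # Mock out the connection entirely, as the linked Airflow tests do.
        with mock.patch.object(hook, "get_conn") as get_conn:
            get_conn.return_value.cursor.return_value.fetchone.return_value = (42,)
            self.assertEqual(hook.row_count("events"), 42)


# In a real project this is `python -m unittest ...`; run inline here.
result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestMySqlCountHook)
)
```

Because the hook's logic (cursor handling, result unpacking) is exercised while the connection is faked, this stays a plain unittest with no test services to stand up.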
> > Regarding data manipulation functions of the business code: For us, we run tests on each operator in each DAG on CI, seeded with test input data and asserted against known output data, all of which we have compiled over time to represent different edge cases we expect or have seen. So this is a test at the level of the operator as described in a given DAG. Because we only describe edge cases we have seen or can predict, it's a very reactive way to handle testing at this level.
> >
> > If I understand your idea right, another way to test (or at least surface errors) at this level is: given you have a DAG that is resilient against arbitrary data failures, your DAG should include a validation task/report at its end, or a test suite should run daily against the production error log for that DAG to surface errors your business code encountered on production data. I think this is really interesting and reminds me of an Airflow video I saw once (can't remember who gave the talk) on a DAG whose last task self-reported error counts and rows lost. If implemented as a test suite you would run against production, this might be a direction we would want a testing library to go in.
> >
> > Regarding the workflow correctness of the business code: What we set out to do on our side was a hybrid of your items 1 and 2, which we call "end-to-end tests": to call a whole DAG against 'real' existing systems (though really they are test Docker containers of the processes we need, MySQL and Neo4j specifically, that we use environment variables to switch our Airflow to use when instantiating hooks, etc.), seeded with test input files for services that are hard to set up (i.e. third-party APIs we ingest data from).
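The seeded, operator-level testing described above (run one DAG step's data manipulation over inputs compiled from observed edge cases, assert against known outputs) can be sketched like this; the transform function and the edge cases themselves are illustrative, not anyone's actual pipeline:

```python
def dedupe_and_total(rows):
    """Illustrative data-manipulation step: total amounts per user,
    ignoring duplicate row ids delivered by a flaky upstream."""
    seen, totals = set(), {}
    for row_id, user, amount in rows:
        if row_id in seen:  # duplicate delivery, an edge case seen in prod
            continue
        seen.add(row_id)
        totals[user] = totals.get(user, 0) + amount
    return totals


# Edge cases compiled over time: empty input, duplicate ids, multiple users.
SEEDED_CASES = [
    ([], {}),
    ([(1, "a", 5), (1, "a", 5)], {"a": 5}),
    ([(1, "a", 5), (2, "b", 3), (3, "a", 2)], {"a": 7, "b": 3}),
]

for seeded_input, known_output in SEEDED_CASES:
    assert dedupe_and_total(seeded_input) == known_output
```

As Laura notes, this is reactive: each production surprise becomes one more `(input, expected)` pair in the seeded table.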
> > Since the whole DAG is seeded with known input data, this gives us a way to compare the last output of a DAG to a known file, so that if any workflow changes OR business logic in the middle affected the final output, we would know as part of our test suite instead of when production breaks. In other words, a way to test a regression of the whole DAG. So this is the framework we were thinking needed to be created, and is a direction we could go with a testing library as well.
> >
> > This doesn't get to your point of determining what workflow was used, which is interesting, just not a use case we have encountered yet (we only have deterministic DAGs). In my mind, in this case we would want a testing suite to be able to more or less turn some DAGs "on" against seeded input data and mocked or test integration services, let a scheduler go at it, and then check the metadata database for what workflow happened (and, if we had test integration services, maybe also check the output against the known output for the seeded input). I can definitely see your suggestion of developing instrumentation to inspect a followed workflow as a useful addition a testing library could include.
> >
> > To some degree our end-to-end DAG tests overlap with your point 3 (UAT environment), but we've found that more useful for testing whether "wild data" causes uncaught exceptions or integration errors with difficult-to-mock third-party services, not DAG-level logic regressions, since the input data is unknown and thus we can't compare to a known output, depending instead on fallible human QA or just accepting the DAG running with no exceptions as passing UAT.
> >
> > Laura
> >
> > On Tue, May 9, 2017 at 2:15 AM, Gerard Toonstra <[email protected]> wrote:
> >
> > > Very interesting video. I was unable to take part. I watched only part of it for now.
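The "check the metadata database for what workflow happened" idea above can be sketched against a stand-in `task_instance` table. Airflow's metadata database does have a `task_instance` table keyed by dag_id/task_id/execution_date with a state column, but the schema and states here are simplified and the DAG is hypothetical:

```python
import sqlite3

# Stand-in for Airflow's metadata DB: a simplified task_instance table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, "
    "execution_date TEXT, state TEXT)"
)
# Pretend a scheduler already ran the DAG against seeded input.
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
    [
        ("etl_dag", "extract", "2017-05-09", "success"),
        ("etl_dag", "transform", "2017-05-09", "success"),
        ("etl_dag", "load", "2017-05-09", "skipped"),
    ],
)


def followed_workflow(conn, dag_id, execution_date):
    """Return {task_id: state} so a test can assert which path the DAG took."""
    rows = conn.execute(
        "SELECT task_id, state FROM task_instance "
        "WHERE dag_id = ? AND execution_date = ?",
        (dag_id, execution_date),
    )
    return dict(rows.fetchall())


states = followed_workflow(conn, "etl_dag", "2017-05-09")
assert states["load"] == "skipped"  # e.g. the branch we expected was not taken
```

The instrumentation Gerard and Laura mention would essentially be a supported, stable version of this query against the real metadata database.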
> > > Let us know where the discussion is being moved to.
> > >
> > > The Confluence does indeed seem to be the place to put final conclusions and thoughts.
> > >
> > > For Airflow, I like to make a distinction between "platform" and "business" code. The platform code is the hooks and operators, and provides the capabilities of what your ETL system can do. You'll test this code with a lot of thoroughness, such that each component behaves how you'd expect, judging from the constructor interface. Any abstractions in there (like copying files to GCS) should be kept as hidden as possible (retries, etc.).
> > >
> > > The "business" code is what runs on a daily basis. This can be divided into another two concerns for testing:
> > >
> > > 1. The workflow: the code between the data manipulation functions that decides which operators get called.
> > > 2. The data manipulation functions.
> > >
> > > I think it's good practice to run tests on "2" on a daily basis and not just once on CI. The reason is that there are too many unforeseen circumstances where data can get into a bad state. So such tests shouldn't run once in a highly controlled environment like CI, but daily in a less predictable environment like production, where all kinds of weird things can happen that you'll be able to catch with proper checks in place. Even if the checks are too rigorous, you can skip them and improve on them, so that they fit what goes on in your environment to the best of your ability.
> > >
> > > Which mostly leaves testing workflow correctness and platform code. What I had intended to do was:
> > >
> > > 1. Test the platform code against real existing systems (or maybe Docker containers), to test their behavior in success and failure conditions.
> > > 2. Create workflow scripts for testing the workflow; this probably requires some specific changes in hooks, which wouldn't call out to other systems but would just pick up small files you prepare from a testing repo and pass them around. The test script could also simulate unavailability, etc. This relieves you of the huge responsibility of setting up systems and Docker containers and loading them with data. Airflow sets up pretty quickly as a Docker container, and you can also start up a sample database with that. Afterwards, from a test script, you can check which workflow was followed by inspecting the database, so develop some instrumentation for that.
> > > 3. Test the data manipulation in a UAT environment, mirroring the runs in production to some extent. That would be a place to verify that the data comes out correctly and also to show people what kind of monitoring is in place to double-check that.
> > >
> > > On Tue, May 9, 2017 at 1:14 AM, Arnie Salazar <[email protected]> wrote:
> > >
> > > > Scratch that. I see the whole video now.
> > > >
> > > > On Mon, May 8, 2017 at 3:33 PM Arnie Salazar <[email protected]> wrote:
> > > >
> > > > > Thanks Sam!
> > > > >
> > > > > Is there a part 2 to the video? If not, can you post the "next steps" notes you took whenever you have a chance?
> > > > > Cheers,
> > > > > Arnie
> > > > >
> > > > > On Mon, May 8, 2017 at 3:08 PM Sam Elamin <[email protected]> wrote:
> > > > >
> > > > > > Hi Folks
> > > > > >
> > > > > > For those of you who missed it, you can catch the discussion from the link on this tweet <https://twitter.com/samelamin/status/861703888298225670>
> > > > > >
> > > > > > Please do share, and feel free to get involved; the more feedback we get, the better the library we create will be :)
> > > > > >
> > > > > > Regards
> > > > > > Sam
> > > > > >
> > > > > > On Mon, May 8, 2017 at 9:43 PM, Sam Elamin <[email protected]> wrote:
> > > > > >
> > > > > > > Bit late notice, but the call is happening today at 9:15 UTC, so in about 30 minutes or so.
> > > > > > >
> > > > > > > It will be recorded, but if anyone would like to join in on the discussion, the hangout link is https://hangouts.google.com/hangouts/_/mbkr6xassnahjjonpuvrirxbnae
> > > > > > >
> > > > > > > Regards
> > > > > > > Sam
> > > > > > >
> > > > > > > On Fri, 5 May 2017 at 21:35, Ali Uz <[email protected]> wrote:
> > > > > > >
> > > > > > > > I am also very interested in seeing how this turns out. Even though we don't have a testing framework in place on the project I am working on, I would very much like to contribute to some general framework for testing DAGs.
> > > > > > > >
> > > > > > > > As of now we are just implementing dummy tasks that test our actual tasks and verify that the given input produces the expected output. Nothing crazy, and certainly not flexible in the long run.
> > > > > > > >
> > > > > > > > On Fri, 5 May 2017 at 22:59, Sam Elamin <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Haha yes Scott, you are in!
> > > > > > > > > On Fri, 5 May 2017 at 20:07, Scott Halgrim <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Sounds A+ to me. By "both of you" did you include me? My first response was just to your email address.
> > > > > > > > > >
> > > > > > > > > > On May 5, 2017, 11:58 AM -0700, Sam Elamin <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Ok sounds great folks
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the detailed response, Laura! I'll invite both of you to the group if you are happy, and we can schedule a call for next week?
> > > > > > > > > > >
> > > > > > > > > > > How does that sound?
> > > > > > > > > > >
> > > > > > > > > > > On Fri, 5 May 2017 at 17:41, Laura Lorenz <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > We do! We developed our own little in-house DAG test framework, which we could share insights on, and we would love to hear what other folks are up to. Basically we mock a DAG's input data, use the BackfillJob API directly to call a DAG in a test, and compare its outputs to the intended result given the inputs. We use docker/docker-compose to manage services, and split our dev and test stacks locally so that the tests have their own scheduler and metadata database, and so that our CI tool knows how to construct the test stack as well.
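The "compare its outputs to the intended result" step of an end-to-end run like the one Laura describes can be as simple as a checksum comparison between the DAG's final artifact and a known-good file committed alongside the tests. This is a generic sketch; the file names and contents are illustrative:

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file, streamed in chunks so large outputs are fine."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def assert_dag_output_unchanged(actual_path, known_good_path):
    """Fail the end-to-end test if the DAG's final output drifted."""
    actual, expected = file_digest(actual_path), file_digest(known_good_path)
    assert actual == expected, "DAG output regression: %s != %s" % (actual, expected)


# Illustrative usage, with temp files standing in for the real artifacts:
tmp = Path(tempfile.mkdtemp())
(tmp / "known_good.csv").write_bytes(b"user,total\na,7\n")
(tmp / "dag_output.csv").write_bytes(b"user,total\na,7\n")
assert_dag_output_unchanged(tmp / "dag_output.csv", tmp / "known_good.csv")
```

Because the whole DAG ran from seeded input, any workflow or business-logic change that alters the final file trips this assertion in CI rather than in production.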
> > > > > > > > > > > > We co-opted the BackfillJob API for our own purposes here, but it seemed overly complicated and fragile to start and interact with our own in-test-process executor like we saw in a few of the tests in the Airflow test suite. So I'd be really interested in finding a way to streamline how to describe a test executor, for both the Airflow test suite and people's own DAG testing, and make that a first-class type of API.
> > > > > > > > > > > >
> > > > > > > > > > > > Laura
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, May 5, 2017 at 11:46 AM, Sam Elamin <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi All
> > > > > > > > > > > > >
> > > > > > > > > > > > > A few people in the Spark community are interested in writing a testing library for Airflow. We would love anyone who uses Airflow heavily in production to be involved.
> > > > > > > > > > > > >
> > > > > > > > > > > > > At the moment (AFAIK) testing your DAGs is a bit of a pain, especially if you want to run them in a CI server.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Is anyone interested in being involved in the discussion?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Kind Regards
> > > > > > > > > > > > > Sam
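At its core, the first-class "test executor" Laura asks for could be a synchronous runner that executes task callables in dependency order and records per-task states, so a test can assert on the followed workflow afterwards without a scheduler or message queue. This is a hypothetical design sketch, not an Airflow API; all names are invented:

```python
def run_dag_synchronously(tasks, deps):
    """tasks: {task_id: callable}; deps: {task_id: [upstream_ids]}.
    Returns {task_id: 'success' | 'failed' | 'upstream_failed'}."""
    states, done = {}, set()
    while len(done) < len(tasks):
        progressed = False
        for task_id, fn in tasks.items():
            if task_id in done:
                continue
            upstream = deps.get(task_id, [])
            if not all(u in done for u in upstream):
                continue  # wait for upstream tasks to settle first
            if any(states[u] != "success" for u in upstream):
                states[task_id] = "upstream_failed"
            else:
                try:
                    fn()
                    states[task_id] = "success"
                except Exception:
                    states[task_id] = "failed"
            done.add(task_id)
            progressed = True
        if not progressed:
            raise ValueError("cycle in DAG")
    return states


# A test would assert on the recorded states:
results = []
states = run_dag_synchronously(
    tasks={
        "extract": lambda: results.append("raw"),
        "transform": lambda: results.append("clean"),
        "load": lambda: 1 / 0,  # simulate a failing final task
    },
    deps={"transform": ["extract"], "load": ["transform"]},
)
assert states == {"extract": "success", "transform": "success", "load": "failed"}
```

Running tasks in-process like this is the streamlined alternative to starting a real scheduler and backfill job inside a test, at the cost of not exercising Airflow's own execution machinery.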
