Hi, I would love to see if we can contribute some of the work we have done internally at Airbnb to support testing of DAGs. We have a long way to go, though :)
Best, Arthur

On Tue, May 9, 2017 at 12:34 PM, Sam Elamin <[email protected]> wrote:

> Thanks Gerard and Laura, I have created an email thread as agreed in the call, so let's take the discussion there. If anyone else is interested in helping us build this library, please do get in touch!
>
> On Tue, May 9, 2017 at 5:40 PM, Laura Lorenz <[email protected]> wrote:
>
> > Good points @Gerard. I think the distinctions you make between different testing considerations could help us focus our efforts. Here's my 2 cents in the buckets you describe; I'm wondering if any of these use cases align with anyone else and can help narrow our scope, and if I understood you right @Gerard:
> >
> > Regarding platform code: For our own platform code (i.e. custom Operators and Hooks), we have our CI platform running unit tests on their construction and, in the case of hooks, integration tests on connectivity. The latter involves us setting up test integration services (i.e. a test MySQL process) which we start up as Docker containers, and we flip our Airflow configuration to point at them during testing using environment variables. It seems from a browse of Airflow's tests that operators and hooks are mostly unit tested, with the integrations mocked or skipped (e.g. https://github.com/apache/incubator-airflow/blob/master/tests/contrib/hooks/test_jira_hook.py#L40-L41 or https://github.com/apache/incubator-airflow/blob/master/tests/contrib/hooks/test_sqoop_hook.py#L123-L125). If the hook is using some other, well-tested library to actually establish the connection, the case can probably be made that custom operator and hook authors don't need integration tests; since the normal unittest library is enough to handle these, that might not need to be in scope for a new testing library.
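The mocked-hook unit-test pattern referenced above (as in the linked `test_jira_hook` / `test_sqoop_hook` tests) can be sketched as follows. The `MySqlCountHook` class and its methods are hypothetical stand-ins, not Airflow APIs; the point is that `get_conn` is mocked so no real database is needed:

```python
import unittest
from unittest import mock


class MySqlCountHook:
    """Hypothetical custom hook wrapping a well-tested client library."""

    def get_conn(self):
        # The real connecting is delegated to the underlying library;
        # imported lazily so tests that mock get_conn never touch it.
        import MySQLdb
        return MySQLdb.connect(host="prod-db")

    def row_count(self, table):
        # Illustrative only: don't interpolate identifiers like this in real code.
        cur = self.get_conn().cursor()
        cur.execute("SELECT COUNT(*) FROM %s" % table)
        return cur.fetchone()[0]


class TestMySqlCountHook(unittest.TestCase):
    def test_row_count_uses_connection(self):
        hook = MySqlCountHook()
        # Mock out the connection entirely, as the linked Airflow tests do.
        with mock.patch.object(hook, "get_conn") as get_conn:
            get_conn.return_value.cursor.return_value.fetchone.return_value = (42,)
            self.assertEqual(hook.row_count("events"), 42)


# In a real project this is `python -m unittest ...`; run inline here.
result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestMySqlCountHook)
)
```

Because the hook's logic (cursor handling, result unpacking) is exercised while the connection is faked, this stays a plain unittest with no test services to stand up.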
> > Regarding data manipulation functions of the business code: For us, we run tests on each operator in each DAG on CI, seeded with test input data and asserted against known output data, all of which we have compiled over time to represent different edge cases we expect or have seen. So this is a test at the level of the operator as described in a given DAG. Because we only describe edge cases we have seen or can predict, it's a very reactive way to handle testing at this level.
> >
> > If I understand your idea right, another way to test (or at least surface errors) at this level is: given you have a DAG that is resilient against arbitrary data failures, your DAG should include a validation task/report at its end, or a test suite should run daily against the production error log for that DAG to surface errors your business code encountered on production data. I think this is really interesting and reminds me of an Airflow video I saw once (can't remember who gave the talk) on a DAG whose last task self-reported error counts and rows lost. If implemented as a test suite you would run against production, this might be a direction we would want a testing library to go in.
> >
> > Regarding the workflow correctness of the business code: What we set out to do on our side was a hybrid of your items 1 and 2, which we call "end-to-end tests": to call a whole DAG against 'real' existing systems (though really they are test Docker containers of the processes we need, MySQL and Neo4j specifically, that we use environment variables to switch our Airflow to use when instantiating hooks, etc.), seeded with test input files for services that are hard to set up (i.e. third-party APIs we ingest data from).
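The seeded, operator-level testing described above (run one DAG step's data manipulation over inputs compiled from observed edge cases, assert against known outputs) can be sketched like this; the transform function and the edge cases themselves are illustrative, not anyone's actual pipeline:

```python
def dedupe_and_total(rows):
    """Illustrative data-manipulation step: total amounts per user,
    ignoring duplicate row ids delivered by a flaky upstream."""
    seen, totals = set(), {}
    for row_id, user, amount in rows:
        if row_id in seen:  # duplicate delivery, an edge case seen in prod
            continue
        seen.add(row_id)
        totals[user] = totals.get(user, 0) + amount
    return totals


# Edge cases compiled over time: empty input, duplicate ids, multiple users.
SEEDED_CASES = [
    ([], {}),
    ([(1, "a", 5), (1, "a", 5)], {"a": 5}),
    ([(1, "a", 5), (2, "b", 3), (3, "a", 2)], {"a": 7, "b": 3}),
]

for seeded_input, known_output in SEEDED_CASES:
    assert dedupe_and_total(seeded_input) == known_output
```

As Laura notes, this is reactive: each production surprise becomes one more `(input, expected)` pair in the seeded table.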
> > Since the whole DAG is seeded with known input data, this gives us a way to compare the last output of a DAG to a known file, so that if any workflow changes OR business logic in the middle affected the final output, we would know as part of our test suite instead of when production breaks. In other words, a way to test a regression of the whole DAG. So this is the framework we were thinking needed to be created, and is a direction we could go with a testing library as well.
> >
> > This doesn't get to your point of determining what workflow was used, which is interesting, just not a use case we have encountered yet (we only have deterministic DAGs). In my mind, in this case we would want a testing suite to be able to more or less turn some DAGs "on" against seeded input data and mocked or test integration services, let a scheduler go at it, and then check the metadata database for what workflow happened (and, if we had test integration services, maybe also check the output against the known output for the seeded input). I can definitely see your suggestion of developing instrumentation to inspect a followed workflow as a useful addition a testing library could include.
> >
> > To some degree our end-to-end DAG tests overlap with your point 3 (UAT environment), but we've found that more useful for testing whether "wild data" causes uncaught exceptions or integration errors with difficult-to-mock third-party services, not DAG-level logic regressions, since the input data is unknown and thus we can't compare to a known output, depending instead on fallible human QA or just accepting the DAG running with no exceptions as passing UAT.
> >
> > Laura
> >
> > On Tue, May 9, 2017 at 2:15 AM, Gerard Toonstra <[email protected]> wrote:
> >
> > > Very interesting video. I was unable to take part. I watched only part of it for now.
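The "check the metadata database for what workflow happened" idea above can be sketched against a stand-in `task_instance` table. Airflow's metadata database does have a `task_instance` table keyed by dag_id/task_id/execution_date with a state column, but the schema and states here are simplified and the DAG is hypothetical:

```python
import sqlite3

# Stand-in for Airflow's metadata DB: a simplified task_instance table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, "
    "execution_date TEXT, state TEXT)"
)
# Pretend a scheduler already ran the DAG against seeded input.
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
    [
        ("etl_dag", "extract", "2017-05-09", "success"),
        ("etl_dag", "transform", "2017-05-09", "success"),
        ("etl_dag", "load", "2017-05-09", "skipped"),
    ],
)


def followed_workflow(conn, dag_id, execution_date):
    """Return {task_id: state} so a test can assert which path the DAG took."""
    rows = conn.execute(
        "SELECT task_id, state FROM task_instance "
        "WHERE dag_id = ? AND execution_date = ?",
        (dag_id, execution_date),
    )
    return dict(rows.fetchall())


states = followed_workflow(conn, "etl_dag", "2017-05-09")
assert states["load"] == "skipped"  # e.g. the branch we expected was not taken
```

The instrumentation Gerard and Laura mention would essentially be a supported, stable version of this query against the real metadata database.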
> > > Let us know where the discussion is being moved to.
> > >
> > > The Confluence does indeed seem to be the place to put final conclusions and thoughts.
> > >
> > > For Airflow, I like to make a distinction between "platform" and "business" code. The platform code is the hooks and operators, and provides the capabilities of what your ETL system can do. You'll test this code with a lot of thoroughness, such that each component behaves how you'd expect, judging from the constructor interface. Any abstractions in there (like copying files to GCS) should be kept as hidden as possible (retries, etc.).
> > >
> > > The "business" code is what runs on a daily basis. This can be divided into another two concerns for testing:
> > >
> > > 1. The workflow: the code between the data manipulation functions that decides which operators get called.
> > > 2. The data manipulation functions.
> > >
> > > I think it's good practice to run tests on "2" on a daily basis and not just once on CI. The reason is that there are too many unforeseen circumstances where data can get into a bad state. So such tests shouldn't run once in a highly controlled environment like CI, but daily in a less predictable environment like production, where all kinds of weird things can happen that you'll be able to catch with proper checks in place. Even if the checks are too rigorous, you can skip them and improve on them, so that they fit what goes on in your environment to the best of your ability.
> > >
> > > Which mostly leaves testing workflow correctness and platform code. What I had intended to do was:
> > >
> > > 1. Test the platform code against real existing systems (or maybe Docker containers), to test their behavior in success and failure conditions.
> > > 2. Create workflow scripts for testing the workflow; this probably requires some specific changes in hooks, which wouldn't call out to other systems but would just pick up small files you prepare from a testing repo and pass them around. The test script could also simulate unavailability, etc. This relieves you of the huge responsibility of setting up systems and Docker containers and loading them with data. Airflow sets up pretty quickly as a Docker container, and you can also start up a sample database with that. Afterwards, from a test script, you can check which workflow was followed by inspecting the database, so develop some instrumentation for that.
> > > 3. Test the data manipulation in a UAT environment, mirroring the runs in production to some extent. That would be a place to verify that the data comes out correctly and also to show people what kind of monitoring is in place to double-check that.
> > >
> > > On Tue, May 9, 2017 at 1:14 AM, Arnie Salazar <[email protected]> wrote:
> > >
> > > > Scratch that. I see the whole video now.
> > > >
> > > > On Mon, May 8, 2017 at 3:33 PM Arnie Salazar <[email protected]> wrote:
> > > >
> > > > > Thanks Sam!
> > > > >
> > > > > Is there a part 2 to the video? If not, can you post the "next steps" notes you took whenever you have a chance?
> > > > > Cheers,
> > > > > Arnie
> > > > >
> > > > > On Mon, May 8, 2017 at 3:08 PM Sam Elamin <[email protected]> wrote:
> > > > >
> > > > > > Hi Folks
> > > > > >
> > > > > > For those of you who missed it, you can catch the discussion from the link on this tweet <https://twitter.com/samelamin/status/861703888298225670>
> > > > > >
> > > > > > Please do share, and feel free to get involved; the more feedback we get, the better the library we create will be :)
> > > > > >
> > > > > > Regards
> > > > > > Sam
> > > > > >
> > > > > > On Mon, May 8, 2017 at 9:43 PM, Sam Elamin <[email protected]> wrote:
> > > > > >
> > > > > > > Bit late notice, but the call is happening today at 9:15 UTC, so in about 30 minutes or so.
> > > > > > >
> > > > > > > It will be recorded, but if anyone would like to join in on the discussion, the hangout link is https://hangouts.google.com/hangouts/_/mbkr6xassnahjjonpuvrirxbnae
> > > > > > >
> > > > > > > Regards
> > > > > > > Sam
> > > > > > >
> > > > > > > On Fri, 5 May 2017 at 21:35, Ali Uz <[email protected]> wrote:
> > > > > > >
> > > > > > > > I am also very interested in seeing how this turns out. Even though we don't have a testing framework in place on the project I am working on, I would very much like to contribute to some general framework for testing DAGs.
> > > > > > > >
> > > > > > > > As of now we are just implementing dummy tasks that test our actual tasks and verify that the given input produces the expected output. Nothing crazy, and certainly not flexible in the long run.
> > > > > > > >
> > > > > > > > On Fri, 5 May 2017 at 22:59, Sam Elamin <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Haha yes Scott, you are in!
> > > > > > > > > On Fri, 5 May 2017 at 20:07, Scott Halgrim <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Sounds A+ to me. By "both of you" did you include me? My first response was just to your email address.
> > > > > > > > > >
> > > > > > > > > > On May 5, 2017, 11:58 AM -0700, Sam Elamin <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Ok sounds great folks
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the detailed response, Laura! I'll invite both of you to the group if you are happy, and we can schedule a call for next week?
> > > > > > > > > > >
> > > > > > > > > > > How does that sound?
> > > > > > > > > > >
> > > > > > > > > > > On Fri, 5 May 2017 at 17:41, Laura Lorenz <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > We do! We developed our own little in-house DAG test framework, which we could share insights on, and we would love to hear what other folks are up to. Basically we mock a DAG's input data, use the BackfillJob API directly to call a DAG in a test, and compare its outputs to the intended result given the inputs. We use docker/docker-compose to manage services, and split our dev and test stacks locally so that the tests have their own scheduler and metadata database, and so that our CI tool knows how to construct the test stack as well.
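The "compare its outputs to the intended result" step of an end-to-end run like the one Laura describes can be as simple as a checksum comparison between the DAG's final artifact and a known-good file committed alongside the tests. This is a generic sketch; the file names and contents are illustrative:

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file, streamed in chunks so large outputs are fine."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def assert_dag_output_unchanged(actual_path, known_good_path):
    """Fail the end-to-end test if the DAG's final output drifted."""
    actual, expected = file_digest(actual_path), file_digest(known_good_path)
    assert actual == expected, "DAG output regression: %s != %s" % (actual, expected)


# Illustrative usage, with temp files standing in for the real artifacts:
tmp = Path(tempfile.mkdtemp())
(tmp / "known_good.csv").write_bytes(b"user,total\na,7\n")
(tmp / "dag_output.csv").write_bytes(b"user,total\na,7\n")
assert_dag_output_unchanged(tmp / "dag_output.csv", tmp / "known_good.csv")
```

Because the whole DAG ran from seeded input, any workflow or business-logic change that alters the final file trips this assertion in CI rather than in production.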
> > > > > > > > > > > > We co-opted the BackfillJob API for our own purposes here, but it seemed overly complicated and fragile to start and interact with our own in-test-process executor like we saw in a few of the tests in the Airflow test suite. So I'd be really interested in finding a way to streamline how to describe a test executor, for both the Airflow test suite and people's own DAG testing, and make that a first-class type of API.
> > > > > > > > > > > >
> > > > > > > > > > > > Laura
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, May 5, 2017 at 11:46 AM, Sam Elamin <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi All
> > > > > > > > > > > > >
> > > > > > > > > > > > > A few people in the Spark community are interested in writing a testing library for Airflow. We would love anyone who uses Airflow heavily in production to be involved.
> > > > > > > > > > > > >
> > > > > > > > > > > > > At the moment (AFAIK) testing your DAGs is a bit of a pain, especially if you want to run them in a CI server.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Is anyone interested in being involved in the discussion?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Kind Regards
> > > > > > > > > > > > > Sam
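At its core, the first-class "test executor" Laura asks for could be a synchronous runner that executes task callables in dependency order and records per-task states, so a test can assert on the followed workflow afterwards without a scheduler or message queue. This is a hypothetical design sketch, not an Airflow API; all names are invented:

```python
def run_dag_synchronously(tasks, deps):
    """tasks: {task_id: callable}; deps: {task_id: [upstream_ids]}.
    Returns {task_id: 'success' | 'failed' | 'upstream_failed'}."""
    states, done = {}, set()
    while len(done) < len(tasks):
        progressed = False
        for task_id, fn in tasks.items():
            if task_id in done:
                continue
            upstream = deps.get(task_id, [])
            if not all(u in done for u in upstream):
                continue  # wait for upstream tasks to settle first
            if any(states[u] != "success" for u in upstream):
                states[task_id] = "upstream_failed"
            else:
                try:
                    fn()
                    states[task_id] = "success"
                except Exception:
                    states[task_id] = "failed"
            done.add(task_id)
            progressed = True
        if not progressed:
            raise ValueError("cycle in DAG")
    return states


# A test would assert on the recorded states:
results = []
states = run_dag_synchronously(
    tasks={
        "extract": lambda: results.append("raw"),
        "transform": lambda: results.append("clean"),
        "load": lambda: 1 / 0,  # simulate a failing final task
    },
    deps={"transform": ["extract"], "load": ["transform"]},
)
assert states == {"extract": "success", "transform": "success", "load": "failed"}
```

Running tasks in-process like this is the streamlined alternative to starting a real scheduler and backfill job inside a test, at the cost of not exercising Airflow's own execution machinery.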
