Hi Jarek,

I think you can unit test your DAGs without a database. You will have to
patch some connections, but it's feasible. On my side, I test the
topological order of my DAGs to make sure the tasks run in the right
order, and I patch the xcom_push and xcom_pull methods to be sure that
whatever passes between tasks will be OK. If your hooks are well tested,
I think that's enough. For instance, I have this kind of code to test the
topological order:
https://gist.github.com/Bl3f/acd3d4b251eb565c96168635d84d0513

Regards,
Christophe
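[A minimal sketch of what such a test can look like. This is not the code
from the gist; the dag_id "my_dag", the task ids, the dags/ folder, and the
assumption that the "transform" callable reads and writes XCom through the
"ti" in its context are all hypothetical.]

    import unittest
    from unittest import mock

    from airflow.models import DagBag


    class TestMyDagStructure(unittest.TestCase):
        def setUp(self):
            # Load DAGs from an explicit folder instead of the configured
            # dags_folder, and skip the bundled example DAGs.
            self.dagbag = DagBag(dag_folder="dags/", include_examples=False)

        def test_dag_loads_without_errors(self):
            # Any import error (e.g. a missing import) shows up here.
            self.assertEqual(self.dagbag.import_errors, {})

        def test_topological_order(self):
            dag = self.dagbag.get_dag("my_dag")  # hypothetical dag_id
            ordered = [t.task_id for t in dag.topological_sort()]
            # extract must run before transform, transform before load.
            self.assertLess(ordered.index("extract"), ordered.index("transform"))
            self.assertLess(ordered.index("transform"), ordered.index("load"))

        def test_xcom_between_tasks(self):
            # Hypothetical: assumes the "transform" callable pulls its input
            # via ti.xcom_pull and pushes its result via ti.xcom_push, so a
            # MagicMock standing in for the TaskInstance is enough and no
            # database is needed.
            dag = self.dagbag.get_dag("my_dag")
            task = dag.get_task("transform")
            ti = mock.MagicMock()
            ti.xcom_pull.return_value = {"rows": 42}  # fake upstream output
            task.execute(context={"ti": ti})
            ti.xcom_push.assert_called()


    if __name__ == "__main__":
        unittest.main()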
On Fri, Oct 19, 2018 at 10:23, Jarek Potiuk <jarek.pot...@polidea.com> wrote:

> Thanks! I like the suggestion about testing hooks rather than whole DAGs -
> we will certainly use it in the future. And BDD is an approach I really
> like - thanks for the code examples! We might also use it in the near
> future. Super helpful!
>
> So far we have only mocked hooks in our unit tests (for example here:
> https://github.com/PolideaInternal/incubator-airflow/blob/master/tests/contrib/operators/test_gcp_compute_operator.py#L241)
> - that helps to test the logic of more complex operators.
> @Anthony - we also use a modified docker-based environment to run the
> tests (https://github.com/PolideaInternal/airflow-breeze/tree/integration-tests),
> including running full DAGs. And yeah, the missing import was just an
> exaggerated example :) we also use IDE/lints to catch those early :D.
>
> I still think there is a need to run whole DAGs on top of testing
> operators and hooks separately, to test more complex interactions between
> the operators. In our case we use example DAGs for both documentation and
> for running full e2e integration tests (for example here:
> https://github.com/PolideaInternal/incubator-airflow/blob/master/airflow/contrib/example_dags/example_gcp_compute.py).
> Those are simple examples, but we will have more complex interactions,
> and it would be great to be able to run them more quickly. However, if we
> get the hook tests automated/unit-testable as well, maybe our current
> approach, where we run them in the full dockerized environment, will be
> good enough.
>
> J.
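[A minimal sketch of the mocked-hook pattern Jarek describes. The operator
and hook below are hypothetical stand-ins defined inline so the test is
self-contained; they are not the GCP code from the linked test file.]

    import unittest
    from unittest import mock

    from airflow.models import BaseOperator


    class ComputeHook(object):
        """Hypothetical hook; the real one would talk to an external API."""

        def start_instance(self, project_id, zone, resource_id):
            raise RuntimeError("must never be reached in a unit test")


    class StartInstanceOperator(BaseOperator):
        """Hypothetical operator: all external interaction goes through
        the hook, which is exactly what gets mocked in the test."""

        def __init__(self, project_id, zone, resource_id, **kwargs):
            super().__init__(**kwargs)
            self.project_id = project_id
            self.zone = zone
            self.resource_id = resource_id

        def execute(self, context):
            hook = ComputeHook()
            return hook.start_instance(self.project_id, self.zone,
                                       self.resource_id)


    class TestStartInstanceOperator(unittest.TestCase):
        # Patch the hook where the operator looks it up, so no external
        # system is touched and only the operator's own logic is exercised.
        @mock.patch(f"{__name__}.ComputeHook")
        def test_execute_delegates_to_hook(self, mock_hook):
            op = StartInstanceOperator(
                project_id="my-project",       # hypothetical values
                zone="europe-west1-b",
                resource_id="my-instance",
                task_id="start_instance",
            )
            op.execute(context={})
            mock_hook.return_value.start_instance.assert_called_once_with(
                "my-project", "europe-west1-b", "my-instance")


    if __name__ == "__main__":
        unittest.main()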
> On Thu, Oct 18, 2018 at 5:44 PM Anthony Brown
> <anthony.br...@johnlewis.co.uk> wrote:
>
> > I have pylint set up in my IDE, which catches most silly errors like
> > missing imports.
> > I also use a docker image so I can start up Airflow locally and
> > manually test any changes before trying to deploy them. I use a
> > slightly modified version of https://github.com/puckel/docker-airflow
> > to control it. This only works for connections I have access to from
> > my machine.
> > Finally, I have a suite of tests based on
> > https://blog.usejournal.com/testing-in-airflow-part-1-dag-validation-tests-dag-definition-tests-and-unit-tests-2aa94970570c
> > which I can run to check that DAGs are valid, plus any unit tests I
> > can put in. The tests are run in a docker container which runs a local
> > db instance, so I have access to xcoms etc.
> >
> > As part of my deployment pipeline, I run pylint and the tests again
> > before deploying anywhere, to make sure nobody has forgotten to run
> > them locally.
> >
> > Gerard - I like the suggestion about using mocked hooks and BDD. I
> > will look into this further.
> >
> > On Thu, 18 Oct 2018 at 15:12, Gerard Toonstra <gtoons...@gmail.com>
> > wrote:
> >
> > > There was a discussion about a unit testing approach last year, 2017
> > > I believe. If you dig through the mail archives, you can find it.
> > >
> > > My take is:
> > >
> > > - You should test "hooks" against some real system, which can be a
> > > docker container. Make sure the behavior is predictable when talking
> > > against that system. Hook tests are not part of the general CI tests
> > > because of the complexity of the CI setup you'd have to build, so
> > > they are run on local boxes.
> > > - Maybe add additional "mock" hook tests, mocking out the connected
> > > systems.
> > > - When hooks are tested, operators can use "mocked" hooks that no
> > > longer need access to actual systems. You can then set up an
> > > environment where you have predictable inputs and outputs and test
> > > how the operators act on them. I've used "behave" to do that with
> > > very simple record sets, but you can make these as complex as you
> > > want.
> > > - Then you know your hooks and operators work functionally. Testing
> > > whether your workflow works in general can be implemented by adding
> > > "check" operators. The benefit here is that you don't test the
> > > workflow once; you test for data consistency every time the DAG
> > > runs. If you have complex workflows where the correct behavior of
> > > the flow is worrisome, then you may need to go deeper into it.
> > >
> > > The above doesn't depend on DAGs that need to be scheduled and the
> > > delays involved in that.
> > >
> > > All of the above is implemented in my repo
> > > https://github.com/gtoonstra/airflow-hovercraft , using "behave" as
> > > a BDD method of testing, so you can peruse that.
> > >
> > > Rgds,
> > >
> > > G>
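[A minimal sketch of the "check" operator idea from Gerard's last bullet,
using Airflow's built-in CheckOperator, which fails the task if its query
returns a falsy value. The dag_id, connection id, table, and SQL are
hypothetical.]

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.check_operator import CheckOperator

    dag = DAG(
        dag_id="etl_with_checks",             # hypothetical DAG
        start_date=datetime(2018, 10, 1),
        schedule_interval="@daily",
    )

    # Runs on every DagRun, so data consistency is verified on each
    # execution rather than once in a test suite. The task fails if the
    # load step produced no rows for the execution date.
    check_rows_loaded = CheckOperator(
        task_id="check_rows_loaded",
        conn_id="my_warehouse",               # hypothetical connection
        sql="SELECT COUNT(*) FROM fact_sales WHERE ds = '{{ ds }}'",
        dag=dag,
    )

    # Upstream ETL tasks would be wired in front of the check, e.g.:
    # load_task >> check_rows_loaded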
> > > On Thu, Oct 18, 2018 at 2:43 PM Jarek Potiuk
> > > <jarek.pot...@polidea.com> wrote:
> > >
> > > > I am also looking to have (I think) a similar workflow. Maybe
> > > > someone has done something similar and can give some hints on how
> > > > to do it the easiest way?
> > > >
> > > > Context:
> > > >
> > > > While developing operators I am using example test DAGs that talk
> > > > to GCP. So far our "integration tests" require copying the dag
> > > > folder, restarting the airflow servers, unpausing the dag and
> > > > waiting for it to start. That takes a lot of time, sometimes just
> > > > to find out that you missed one import.
> > > >
> > > > Ideal workflow:
> > > >
> > > > Ideally I'd love to have a "unit" test (i.e. possible to run via
> > > > nosetests or IDE integration/PyCharm) that:
> > > >
> > > > - does not need the airflow scheduler/webserver started. I guess
> > > > we need a DB, but possibly an in-memory, on-demand created
> > > > database might be a good solution
> > > > - loads the DAG from a specified file (not from the /dags
> > > > directory)
> > > > - builds the internal dependencies between the DAG tasks (as
> > > > specified in the DAG)
> > > > - runs the DAG immediately and fully (i.e. runs all the "execute"
> > > > methods as needed and passes XCom between tasks)
> > > > - ideally produces log output on the console rather than in
> > > > per-task files.
> > > >
> > > > I thought about using DagRun/DagBag but have not tried it yet, and
> > > > I'm not sure if you need to have the whole environment set up
> > > > (which parts?). Any help appreciated :) ?
> > > >
> > > > J.
> > > >
> > > > On Thu, Oct 18, 2018 at 1:08 AM bielllob...@gmail.com
> > > > <bielllob...@gmail.com> wrote:
> > > >
> > > > > I think it would be great to have a way to mock airflow for unit
> > > > > tests. The way I approached this was to create a context manager
> > > > > that creates a temporary directory, sets the AIRFLOW_HOME
> > > > > environment variable to this directory (only within the scope of
> > > > > the context manager) and then renders an airflow.cfg to that
> > > > > location. This creates an SQLite database just for the test, so
> > > > > you can add the variables and connections needed for the test
> > > > > without affecting the real Airflow installation.
> > > > >
> > > > > The first thing I realized is that this didn't work if the
> > > > > imports were outside the context manager, since
> > > > > airflow.configuration and airflow.settings perform all their
> > > > > initialization when they are imported, so the AIRFLOW_HOME
> > > > > variable is already set to the real installation before getting
> > > > > inside the context manager.
> > > > >
> > > > > The workaround for this was to reload those modules, and this
> > > > > works for the tests I have written. However, when I tried to use
> > > > > it for something more complex (I have a plugin that I'm
> > > > > importing) I noticed that inside the operator in this plugin,
> > > > > AIRFLOW_HOME is still set to the real installation, not the
> > > > > temporary one for the test. I thought this must be related to
> > > > > the imports, but I haven't been able to figure out a way to fix
> > > > > the issue. I tried patching some methods, but I must have been
> > > > > missing something because the database initialization failed.
> > > > >
> > > > > Does anyone have an idea of the best way to mock/patch airflow
> > > > > so that EVERYTHING executed inside the context manager uses the
> > > > > temporary installation?
> > > > >
> > > > > PS: This is my current attempt, which works for the tests I
> > > > > defined but not for external plugins:
> > > > > https://github.com/biellls/airflow_testing
> > > > >
> > > > > For an example of how it works:
> > > > > https://github.com/biellls/airflow_testing/blob/master/tests/mock_airflow_test.py
> > > >
> > > > --
> > > > *Jarek Potiuk, Principal Software Engineer*
> > > > Mobile: +48 660 796 129
> >
> > --
> > Anthony Brown
> > Data Engineer BI Team - John Lewis
> > Tel : 0787 215 7305
>
> --
> *Jarek Potiuk, Principal Software Engineer*
> Mobile: +48 660 796 129
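[A minimal sketch of the temp-AIRFLOW_HOME context manager described in the
last quoted message. This is not the code from the linked repo; the exact
reload list and the initdb call are assumptions about what a working
version needs, and it is subject to the same import-time caveat the author
raises.]

    import contextlib
    import importlib
    import os
    import shutil
    import tempfile


    @contextlib.contextmanager
    def mock_airflow_home():
        """Run the enclosed block against a throwaway Airflow install."""
        old_home = os.environ.get("AIRFLOW_HOME")
        tmp_home = tempfile.mkdtemp(prefix="airflow_test_")
        os.environ["AIRFLOW_HOME"] = tmp_home
        try:
            # airflow.configuration and airflow.settings read AIRFLOW_HOME
            # at import time, so they must be reloaded after the variable
            # changes. Anything that imported them *before* this point
            # still holds the old configuration (the plugin problem above).
            import airflow.configuration
            import airflow.settings
            importlib.reload(airflow.configuration)
            importlib.reload(airflow.settings)

            from airflow.utils import db
            db.initdb()  # fresh SQLite metadata db inside tmp_home
            yield tmp_home
        finally:
            if old_home is not None:
                os.environ["AIRFLOW_HOME"] = old_home
            else:
                del os.environ["AIRFLOW_HOME"]
            shutil.rmtree(tmp_home, ignore_errors=True)


    # Usage: variables/connections created here land in the temporary db
    # only, leaving the real installation untouched.
    # with mock_airflow_home():
    #     from airflow.models import Variable
    #     Variable.set("foo", "bar")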