Yeah. Developing faster than Airflow itself is a very valid point, Ash.

On Sun, Jul 17, 2022 at 10:36 PM Ash Berlin-Taylor <[email protected]> wrote:

> I agree this would be a great addition to the Airflow ecosystem but I
> think it should start out life as an external package for two reasons:
>
> 1. It means you can release and iterate quickly without being beholden to
> the Airflow release process (voting, timelines etc)
> 2. It means we can see how popular it is before we (Airflow maintainers)
> have to commit to supporting it long term.
>
> -a
>
> On 17 July 2022 21:19:21 BST, Jarek Potiuk <[email protected]> wrote:
>>
>> First comment - without looking at the details yet - those kinds of tests
>> are badly needed. We get many questions from our users like "How do I test my
>> DAGs?", and one of the comments I've heard about another
>> orchestration framework was "I really like how easy it is to run tests".
>> Getting a "built-in" simple test harness for DAG writing would be cool.
>>
>> Whether it is part of Airflow or an external library - I think both have
>> pros and cons, but as long as it is small and easy to follow and maintain, I am
>> for getting it in (provided that we have good documentation/guidance
>> for our users on how to use it, and plenty of examples). I think this is the
>> only thing I'd be worried about when considering accepting such a framework
>> into the community: code we take into Airflow can become a liability if
>> supporting its users draws maintainers' attention and effort away from
>> other things. This is basically what in regular business is
>> called "opportunity cost".
>>
>> So as long as we can get really great documentation, examples and some
>> ways to make our users mostly self-service, I am all in.
>>
>> J.
>>
>> On Sun, Jul 17, 2022 at 10:09 PM Pablo Estrada <[email protected]>
>> wrote:
>>
>>> Understood!
>>>
>>> TL;DR: I propose a testing framework where users can check for 'DAG
>>> execution invariants' or 'DAG execution expectations' given certain task
>>> outcomes.
>>>
>>> As DAGs grow in complexity, it can become difficult to
>>> reason about their runtime behavior across many scenarios. Users may want to
>>> lay out rules, in the form of tests, that verify DAG execution results.
>>> For example:
>>>
>>> - If any of my database_backup_* tasks fails, I want to ensure that at
>>> least one email_alert_* task will run.
>>> - If my 'check_authentication' task fails, I want to ensure that the
>>> whole DAG will fail.
>>> - If any of my DataflowOperator tasks fails, I want to ensure that a
>>> PubsubOperator downstream will always run.
>>>
>>> Invariants like these can be checked without actually executing the DAG,
>>> yet they are pretty hard to test today: staging environments can't exercise
>>> every possible runtime outcome.
>>>
>>> In this framework, users would define unit tests like this:
>>>
>>> ```
>>> import datetime
>>>
>>> from airflow import models
>>> from airflow.operators.empty import EmptyOperator
>>>
>>> # assert_that, given, task, succeeds, runs come from the proposed framework.
>>>
>>> DEFAULT_DATE = datetime.datetime(2022, 1, 1)
>>>
>>> def test_my_example_dag():
>>>     the_dag = models.DAG(
>>>         'the_basic_dag',
>>>         schedule_interval='@daily',
>>>         start_date=DEFAULT_DATE,
>>>     )
>>>
>>>     with the_dag:
>>>         op1 = EmptyOperator(task_id='task_1')
>>>         op2 = EmptyOperator(task_id='task_2')
>>>         op3 = EmptyOperator(task_id='task_3')
>>>
>>>         op1 >> op2 >> op3
>>>
>>>     # DAG invariant: if task_1 and task_2 succeed, then task_3 will
>>>     # always run.
>>>     assert_that(
>>>         given(the_dag)
>>>             .when(task('task_1'), succeeds())
>>>             .and_(task('task_2'), succeeds())
>>>             .then(task('task_3'), runs()))
>>> ```
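To make the "no execution needed" point concrete, here is a self-contained toy sketch of how such an invariant could be checked statically. This is not the proposed framework's API and does not use Airflow at all; it assumes every task uses the default all_success trigger rule, and all names in it are made up for illustration.

```python
# Toy model of static DAG invariant checking (hypothetical, not the proposed
# framework API). Assumes only the default "all_success" trigger rule:
# a task runs iff every upstream task ended in "success".
from itertools import product

# A DAG as task -> list of upstream tasks (mirrors task_1 >> task_2 >> task_3).
DAG = {"task_1": [], "task_2": ["task_1"], "task_3": ["task_2"]}

def outcome(dag, results):
    """Final state of every task, given each task's own result if it runs.

    results: task -> "success" or "failed" (the task's own result).
    A task whose upstreams did not all succeed never runs and ends in
    "upstream_failed".
    """
    states = {}
    while len(states) < len(dag):
        for t, ups in dag.items():
            if t not in states and all(u in states for u in ups):
                ok = all(states[u] == "success" for u in ups)
                states[t] = results[t] if ok else "upstream_failed"
    return states

def invariant_holds(dag, given, then_task, predicate):
    """True iff `predicate` holds for then_task in *every* scenario
    consistent with the `given` task results."""
    free = [t for t in dag if t not in given]
    for combo in product(["success", "failed"], repeat=len(free)):
        results = dict(given, **dict(zip(free, combo)))
        if not predicate(outcome(dag, results)[then_task]):
            return False
    return True

runs = lambda state: state != "upstream_failed"

# If task_1 and task_2 succeed, task_3 always runs.
assert invariant_holds(
    DAG, {"task_1": "success", "task_2": "success"}, "task_3", runs)
# If task_1 fails, task_3 never runs.
assert invariant_holds(
    DAG, {"task_1": "failed"}, "task_3", lambda s: s == "upstream_failed")
```

Because the check enumerates every scenario consistent with the given outcomes, it covers exactly the cases a staging run cannot: all combinations of task results, without scheduling anything.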
>>>
>>> This is a very simple example - and it's not great, because it only
>>> duplicates the DAG logic - but you can see more examples in my draft PR
>>> [1] and in my draft AIP [2].
>>>
>>> I started writing up an AIP in a Google doc[2] which y'all can check.
>>> It's very close to what I have written here : )
>>>
>>> LMK what y'all think. I am also happy to publish this as a separate
>>> library if y'all wanna be cautious about adding it directly to Airflow.
>>> -P.
>>>
>>> [1]
>>> https://github.com/apache/airflow/pull/25112/files#diff-b1f30afa38d247f9204790392ab6888b04288603ac4d38154d05e6c5b998cf85R28-R82
>>> [2]
>>> https://docs.google.com/document/d/1priak1uiJTXP1F9K5B8XS8qmeRbJ8trYLvE4k2aBY5c/edit#
>>>
>>>
>>> On Sun, Jul 17, 2022 at 2:13 AM Jarek Potiuk <[email protected]> wrote:
>>>
>>>> Yep. Just outline your proposal on devlist, Pablo :).
>>>>
>>>> On Sun, Jul 17, 2022 at 10:35 AM Ash Berlin-Taylor <[email protected]>
>>>> wrote:
>>>> >
>>>> > Hi Pablo,
>>>> >
>>>> > Could you describe at a high level what you are thinking of? It's
>>>> entirely possible it doesn't need any changes to core Airflow, or isn't
>>>> significant enough to need an AIP.
>>>> >
>>>> > Thanks,
>>>> > Ash
>>>> >
>>>> > On 17 July 2022 07:43:54 BST, Pablo Estrada
>>>> <[email protected]> wrote:
>>>> >>
>>>> >> Hi there!
>>>> >> I would like to start a discussion of an idea that I had for a
>>>> testing framework for Airflow.
>>>> >> I believe the first step would be to write up an AIP - so could I
>>>> have access to write a new one on the cwiki?
>>>> >>
>>>> >> Thanks!
>>>> >> -P.
>>>>
>>>
