If much is out rather than in, is there a different pool from which you
will draw contributors and eventually committers/PMC members?

It sounds somewhat like a question of whether to grow the tent of
contributors, committers, and PMC members of what is deemed to be "Airflow"
(capital "A" and in), or to err towards what is manageable for the existing
committers and PMC. With more things deemed not-in, would adding new blood
to the project become more difficult?



On Sat, Aug 6, 2022, 9:02 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> What do you think, Pablo about the "being out" vs. "being in" the
> official repo?
>
> On Thu, Jul 28, 2022 at 3:51 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Anyone :) ?
>>
>> On Mon, Jul 18, 2022 at 10:38 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> I would love to hear what others think about the "in/out" approach -
>>> mine is just the line of thought I've been exploring over the last few
>>> months about providers, maintenance, the incentives of entities
>>> maintaining open-source projects, and especially the expectations it
>>> creates for users. But those are just my thoughts, and I'd love to hear
>>> what others think.
>>>
>>> On Mon, Jul 18, 2022 at 10:33 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>
>>>> I had some thoughts about this - it is also connected with the recent
>>>> discussions about mixed governance for providers, and I think it's worth
>>>> using this discussion to set some rules and "boundaries" on when, how,
>>>> and especially why we want to accept some contributions, while other
>>>> contributions are better kept outside.
>>>>
>>>> We are about to start thinking (and discussing) more seriously about how
>>>> to split the Airflow providers off from Airflow. And I think we can split
>>>> off more than providers - this might be a good candidate for a standalone,
>>>> but still community-maintained, package. If we are going to solve the
>>>> problem of splitting Airflow into N packages, one more package does not
>>>> matter. It would also nicely solve "version independence". We could even
>>>> make it Airflow 2.0+ compatible if we want.
>>>>
>>>> So the question of "is it tied to a specific Airflow version or not" does
>>>> not really prevent us from making it part of the community - those two
>>>> are not related (if we are going to have more repositories anyway).
>>>>
>>>> The important part is really how "self-servicing" we can make it, how we
>>>> make sure it stays relevant for future versions of Airflow, and who does
>>>> that - namely, who has the incentive and "responsibility" to maintain it.
>>>> I am sure we will add more features to Airflow DAGs and simplify the way
>>>> DAGs are written over time, and the test harness will have to adapt.
>>>>
>>>> There are pros and cons to having such a standalone package "in the
>>>> community/ASF project" versus "out of it". We have a good example (a
>>>> similar kind of tool/util) from the past that we can learn from (and
>>>> maybe Bas can share more insights):
>>>>
>>>> https://github.com/BasPH/pylint-airflow - pylint plugin for
>>>> Airflow DAGs
>>>>
Initially that was "sponsored" by GoDataDriven, where Bas worked, and I
>>>> think that is where it was born. That made sense, as it was likely also
>>>> useful for GoDataDriven's customers (here I am guessing). But apparently
>>>> GoDataDriven's incentives wound down, and it turned out that the tool was
>>>> not as useful as hoped (also, I think we all in the Python community
>>>> learned that Pylint is more of a distraction than a real help - we dumped
>>>> Pylint eventually). The plugin was not maintained beyond some 1.10
>>>> versions, and the tool is all but defunct now. Which is perfectly
>>>> understandable.
>>>>
>>>> In this case there is (I think) no risk of a "pylint"-like problem, but
>>>> the question of maintenance and adaptation to future versions of Airflow
>>>> remains.
>>>>
>>>> I think there is one big difference between something that is "in ASF
>>>> repos" and something that is "out":
>>>>
>>>> * If we make it a standalone package in the "ASF Airflow community", we
>>>> take on an obligation, and expectations from our users, to maintain it.
>>>> We can add a test harness (regardless of whether it lives in the airflow
>>>> repository or in a separate one) to make sure that new Airflow "core"
>>>> changes do not break it (and we can fail our PRs if they do) - basically
>>>> making the "core" maintainers take care of this problem, rather than
>>>> delegating it to someone else to react to core changes (this is what has
>>>> to happen with providers, I believe, even if we split them into a
>>>> separate repo). I think anything that we as the ASF community release
>>>> should have such harnesses - making sure that whatever we release and
>>>> make available to our users works together.
>>>>
>>>> * If it is outside the "ASF community", someone else will have to react
>>>> to "core airflow" changes. We will not do it in the community, we will
>>>> not pay attention, and such an "external tool" might break at any time
>>>> because we introduced a change in a part of the core that the external
>>>> tool implicitly relied on.
>>>>
>>>> For me, whether something should be in or out should be based on:
>>>>
>>>> * Is it really useful for the community as a whole? -> If yes, we should
>>>> consider it.
>>>> * Is it strongly tied to the core of Airflow, in the sense of relying on
>>>> internals that might change easily? -> If not, there is no need to bring
>>>> it in; it can easily be maintained outside by anyone.
>>>> * If it is strongly tied to the core -> is there someone (a person or
>>>> organisation) who wants to take on the burden of maintaining it and has
>>>> an incentive to do so for quite some time? -> If yes, great, let them do
>>>> that!
>>>> * If it is strongly tied, do we as "core airflow maintainers" want to
>>>> take on the burden of keeping it updated together with the core? -> If
>>>> yes, we should bring it in.
>>>>
>>>> If we have a strongly tied tool that we do not want to maintain in the
>>>> core and there is no entity who would like to do it, then I think this idea
>>>> should be dropped :).
>>>>
>>>> J.
>>>>
>>>>
>>>> On Mon, Jul 18, 2022 at 1:52 AM Ping Zhang <pin...@umich.edu> wrote:
>>>>
>>>>> Hi Pablo,
>>>>>
>>>>> Wow, I really love this idea. This will greatly enrich the airflow
>>>>> ecosystem.
>>>>>
I agree with Ash that it is better to have it as a standalone package.
>>>>> We could also use this framework to write Airflow core invariant tests,
>>>>> running them on every Airflow release to guard against regressions.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Ping
>>>>>
>>>>>
>>>>> On Sun, Jul 17, 2022 at 1:09 PM Pablo Estrada
>>>>> <pabl...@google.com.invalid> wrote:
>>>>>
>>>>>> Understood!
>>>>>>
>>>>>> TL;DR: I propose a testing framework where users can check for 'DAG
>>>>>> execution invariants' or 'DAG execution expectations' given certain task
>>>>>> outcomes.
>>>>>>
>>>>>> As DAGs grow in complexity, it can become difficult to reason about
>>>>>> their runtime behavior across many scenarios. Users may want to lay out
>>>>>> rules, in the form of tests, that verify DAG execution results.
>>>>>> For example:
>>>>>>
>>>>>> - If any of my database_backup_* tasks fails, I want to ensure that
>>>>>> at least one email_alert_* task will run.
>>>>>> - If my 'check_authentication' task fails, I want to ensure that the
>>>>>> whole DAG will fail.
>>>>>> - If any of my DataflowOperator tasks fails, I want to ensure that a
>>>>>> PubsubOperator downstream will always run.
>>>>>>
>>>>>> Checking these sorts of invariants doesn't require actually executing
>>>>>> the DAG, yet they are pretty hard to test today: staging environments
>>>>>> can't exercise every possible runtime outcome.
>>>>>>
>>>>>> In this framework, users would define unit tests like this:
>>>>>>
>>>>>> ```
>>>>>> def test_my_example_dag():
>>>>>>     the_dag = models.DAG(
>>>>>>         'the_basic_dag',
>>>>>>         schedule_interval='@daily',
>>>>>>         start_date=DEFAULT_DATE,
>>>>>>     )
>>>>>>
>>>>>>     with the_dag:
>>>>>>         op1 = EmptyOperator(task_id='task_1')
>>>>>>         op2 = EmptyOperator(task_id='task_2')
>>>>>>         op3 = EmptyOperator(task_id='task_3')
>>>>>>
>>>>>>         op1 >> op2 >> op3
>>>>>>
>>>>>>     # DAG invariant: if task_1 and task_2 succeed, then task_3 will
>>>>>>     # always run
>>>>>>     assert_that(
>>>>>>         given(the_dag)
>>>>>>             .when(task('task_1'), succeeds())
>>>>>>             .and_(task('task_2'), succeeds())
>>>>>>             .then(task('task_3'), runs()))
>>>>>> ```
>>>>>>
>>>>>> This is a very simple example - and it's not great, because it only
>>>>>> duplicates the DAG logic - but you can see more examples in my draft
>>>>>> PR
>>>>>> <https://github.com/apache/airflow/pull/25112/files#diff-b1f30afa38d247f9204790392ab6888b04288603ac4d38154d05e6c5b998cf85R28-R82>[1]
>>>>>> and in my draft AIP
>>>>>> <https://docs.google.com/document/d/1priak1uiJTXP1F9K5B8XS8qmeRbJ8trYLvE4k2aBY5c/edit#heading=h.atmk0p7fmv7g>
>>>>>> [2].
>>>>>>
>>>>>> I started writing up an AIP in a Google doc[2] which y'all can check.
>>>>>> It's very close to what I have written here : )
>>>>>>
>>>>>> LMK what y'all think. I am also happy to publish this as a separate
>>>>>> library if y'all wanna be cautious about adding it directly to Airflow.
>>>>>> -P.
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/apache/airflow/pull/25112/files#diff-b1f30afa38d247f9204790392ab6888b04288603ac4d38154d05e6c5b998cf85R28-R82
>>>>>> [2]
>>>>>> https://docs.google.com/document/d/1priak1uiJTXP1F9K5B8XS8qmeRbJ8trYLvE4k2aBY5c/edit#
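The semantics behind such given/when/then assertions can be sketched in
plain Python, with no Airflow internals at all. Everything below - the
`simulate` helper, the hand-built dependency dict, and the simplified
"skip when any upstream fails" rule standing in for trigger rules - is a
hypothetical illustration of the idea, not the proposed framework's actual
implementation:

```python
# Hypothetical sketch: propagate forced task outcomes through a DAG's
# dependency graph and check invariants on the resulting states.

SUCCESS, FAILED, SKIPPED = "success", "failed", "skipped"


def simulate(deps, forced):
    """Return the final state of every task.

    deps maps task_id -> list of upstream task_ids; forced maps task_id ->
    an outcome fixed by the test scenario. A task runs (and succeeds by
    default) only if all upstreams succeeded; otherwise it is skipped -
    a stand-in for Airflow's default all_success trigger rule.
    """
    states = {}

    def resolve(task_id):
        if task_id not in states:
            if all(resolve(up) == SUCCESS for up in deps.get(task_id, [])):
                states[task_id] = forced.get(task_id, SUCCESS)
            else:
                states[task_id] = SKIPPED
        return states[task_id]

    for task_id in deps:
        resolve(task_id)
    return states


# task_1 >> task_2 >> task_3, as in the example DAG above
deps = {"task_1": [], "task_2": ["task_1"], "task_3": ["task_2"]}

# Invariant: if task_1 and task_2 succeed, task_3 always runs.
assert simulate(deps, {"task_1": SUCCESS, "task_2": SUCCESS})["task_3"] == SUCCESS

# Counterfactual: if task_1 fails, task_3 never runs.
assert simulate(deps, {"task_1": FAILED})["task_3"] == SKIPPED
```

A real implementation would read the dependencies and per-task trigger
rules from the DAG object itself rather than from a hand-built dict, but
the exhaustive "force an outcome, propagate, assert" loop is the core idea.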
>>>>>>
>>>>>>
>>>>>> On Sun, Jul 17, 2022 at 2:13 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Yep. Just outline your proposal on devlist, Pablo :).
>>>>>>>
>>>>>>> On Sun, Jul 17, 2022 at 10:35 AM Ash Berlin-Taylor <a...@apache.org>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > Hi Pablo,
>>>>>>> >
>>>>>>> > Could you describe at a high level what you are thinking of? It's
>>>>>>> entirely possible it doesn't need any changes to core Airflow, or isn't
>>>>>>> significant enough to need an AIP.
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Ash
>>>>>>> >
>>>>>>> > On 17 July 2022 07:43:54 BST, Pablo Estrada
>>>>>>> <pabl...@google.com.INVALID> wrote:
>>>>>>> >>
>>>>>>> >> Hi there!
>>>>>>> >> I would like to start a discussion of an idea that I had for a
>>>>>>> testing framework for airflow.
>>>>>>> >> I believe the first step would be to write up an AIP - so could I
>>>>>>> have access to write a new one on the cwiki?
>>>>>>> >>
>>>>>>> >> Thanks!
>>>>>>> >> -P.
>>>>>>>
>>>>>>
