Hi all, I have created a document to summarize the discussion from our third dev call for Airflow 2.0.
Thank you to all who joined the call.

*Doc Link*: https://cwiki.apache.org/confluence/display/AIRFLOW/Meeting+Notes#MeetingNotes-#4:14Sep2020

To all those who attended: can you please double-check and add anything I have missed? To all those who didn't join: if you disagree with anything in the summary, please voice your opinion. Also, please let me know if anyone wants to include an item in the next call's agenda.

Including the summary here too (the formatting might break):

*Key Decisions*

- *Updates*
  - The Airflow v2-0-test branch <https://github.com/apache/airflow/commits/v2-0-test> has already been cut and is currently rebased manually on top of Master. For now we don't run CI on it, as the branch is in sync with Master. As soon as there is a PR / commit that we don't want in 2.0, v2-0-test will diverge from Master and we will start running tests against it.
  - The upgrade-check PR <https://github.com/apache/airflow/pull/9467> was merged; we now need to define more rules to add more checks.
- *API*
  - Progress:
    - Project Board: https://github.com/apache/airflow/projects/1
    - The issues labelled with "Enhancement" are not a requirement for 2.0
  - Endpoints:
    - Task Instance Endpoint <https://github.com/apache/airflow/pull/9597> is WIP; all the other endpoints have been implemented.
  - Permissions Model:
    - On-going discussion on the PR <https://github.com/apache/airflow/pull/10594>, but close to completion.
    - The next piece of work is migrating existing Views to use resource-based permissions (GitHub issue <https://github.com/apache/airflow/issues/10469>). This is mainly to standardize the permissions model across the API and UI.
- *Improvements to SubDags / Concept of TaskGroup*
  - AIP-34 <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator> | PR <https://github.com/apache/airflow/pull/10153> introduced the concept of TaskGroup and will be *included in Airflow 2.0*.
  - The PR implements TaskGroups for the Graph View; the Tree View will be implemented in follow-up PRs.
  - Follow-up items from the discussion:
    - Discuss on the mailing list whether we should deprecate SubDags in favour of TaskGroup in 2.0 or wait until Airflow 2.1 or 2.2
    - Add docs on when to use TaskGroup vs SubDag, potentially listing pros and cons.
- *Scheduler HA* (AIP-15 <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103092651>)
  - A Draft PR <https://github.com/apache/airflow/pull/10956> has been created to enable code reviews and to allow members of the community to start testing it with various setups.
  - To get the most benefit from Scheduler HA on MySQL, users will need to use MySQL 8. This is because MySQL 5.7 does not support the SKIP LOCKED <https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-reads.html#innodb-locking-reads-nowait-skip-locked> feature, but note that *MySQL 5.7 will still continue to work with at least the same or improved performance as now*.
  - Astronomer has done performance testing with different scenarios and will publish benchmarks over the coming weeks. The Google Composer Team + Polidea said that they would be happy to carry out various tests for Scheduler HA as well.
  - Some concerns were raised around locking timeout periods and the usage of DAG Serialization. More testing in the upcoming weeks should help mitigate any concerns and help fix any bugs that are discovered.
  - *Docs:*
    - Explicitly mention that the HA Scheduler reads some of the properties from the serialized_dag table. Users can turn DAG Serialization on/off in the Webserver, but the Scheduler will continue using it.
    - Do we recommend 2 schedulers for Production deployments?
    - X Schedulers vs a single Scheduler: use cases where one would be better than the other.
    - Some kind of bell curve showing the point at which adding Schedulers stops improving performance and maybe even degrades it.
      This is intended to give guidance on how many schedulers to run based on expected load, since this decision could be based on multiple factors.
  - Follow-up items:
    - Create a mailing list thread to discuss "Removing Pickling from Airflow 2.0". Currently, pickled DAGs are only supported by CeleryExecutor, and we have a flag (--do-pickle) on the *airflow scheduler <https://airflow.readthedocs.io/en/latest/cli-ref.html#scheduler>* command and "--ship-dag" on the *airflow tasks run <https://airflow.readthedocs.io/en/latest/cli-ref.html#run>* command. If we want to remove pickling, Airflow 2.0 is the right time; otherwise we shouldn't do it until 3.0.
- *Helm Chart*
  - We will continue focusing on getting Airflow 2.0 out, so the first official release of the Helm Chart might need to wait.
  - The issue with Helm Chart sources was fixed, and there are currently no blockers if we were to release it in the near future.
  - Enhancements (but not blockers) are:
    - Better test coverage with integration tests
    - Docs pointing to the chart on the Airflow website or the docs site
  - The artifacts for the Helm Chart would be published at https://downloads.apache.org/airflow/
  - There is still an open question around the *Helm Chart Versioning Policy*, i.e. do we want to tie Helm Chart versions to Airflow versions, or do we just start from *1.0.0*? This needs to be decided before the release of the Helm Chart.

*Things to Discuss Next*

- *21 September (Subject to Change)*
  - Finish up open discussion items from the earlier meeting if not yet resolved:
    - Providers versioning
    - SubDag deprecation
    - Helm Chart release
    - REST API permissions
  - Docs changes
  - UI Changes for 2.0
    - Minimum-effort changes: CSS/colours/spacing to make the UI look a bit more modern
  - Process:
    - When should we defer the in-scope items to post-2.0?
      - Completion by a date?
      - Progress by a date?

Regards,
Kaxil