Hey everyone,

Thank you for attending the dev call on Thursday. I updated our meeting notes on the Airflow wiki and the link for those notes is here <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=373886699#Airflow3.xDevCall:Meetingnotes-Summary.31>
To everyone who attended the meeting, please check the summary and add anything that I may have missed. For those who could not join, please let us know if you disagree with anything discussed and agreed upon in the meeting. Also, please do ask questions if something is unclear.

Our next meeting is scheduled for the 26th of February at the same time. The agenda is already populated, primarily with swim lane updates and Airflow 3.2 AIP updates. If you would like to use this call to discuss a particular topic, please let me know so that it can be added to the agenda <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=373886699#Airflow3.xDevCall:Meetingnotes-ProposedAgenda.33>.

Best regards,
Vikram

--

Below is the summary from the call:

- Swim lane updates:
  - UI Test framework (Rahul Vats):
    - Rahul shared that we now have good coverage on the E2E UI tests, with more tests added over the last two weeks, bringing the total to 83.
    - He said that some of the end-to-end tests had been converted to unit tests to reduce CI times, and that this work was ongoing, since CI still took around 14-15 minutes despite the move because of the additional tests.
  - UI / API update (Pierre):
    - Pierre confirmed that the API issues covered only the APIs supporting the UI. The API issues supporting task execution were now flagged under TaskSDK.
    - Pierre shared that the team was leveraging the test deployment created by Rahul, which contains millions of task instance records, to identify performance issues, and that there was great community participation in this process, including endpoint improvements with caching.
- Airflow 3.2 development updates:
  - AIP-76 Asset Partitions (Wei Lee):
    - Wei Lee shared a recorded demo showcasing the progress made to date on Asset Partitions, which showed support for date-based partitions, leveraging timetables.
    - The concepts of flexible date-based partitions such as Hourly, Daily, Weekly, etc., were clearly demonstrated in the demo, and the overall demo was very well received by the team.
    - There were some questions around the partition keys and the visibility of those partition keys in the UI, which the team agreed to take offline with Wei. Vikram also requested community feedback on potential issues with mismatched partition keys and conditions for subsequent triggering.
  - AIP-86 Deadline alerts (Dennis):
    - Dennis said that the last two PRs were ready for review, with synchronous callbacks working in the local executor and abstracted in the base executor.
    - Dennis shared that the Celery implementation was nearly complete and that he would be asking for community help on the other executors.
    - Dennis said that they were well positioned to hit the code freeze target date of the week of Feb 26th.
    - Vikram to follow up with Dennis async regarding the configuration tradeoffs around sync callback execution, concurrency controls, and timeliness of alerts.
  - AIP-67 Multi-team (Niko / Vincent):
    - Vincent said that they were working on adding minimal multi-team functionality to the simple auth manager for testing.
    - Rajesh said that Niko had asked for community help on the Kubernetes executor for multi-team, but that there wasn't much progress here yet.
  - AIP-98 Async Python Operator (David Blain):
    - Vikram enquired if there was any progress on the documentation around the Async Python Operator, specifically the usage guidance as compared to Deferrable operators.
    - There didn't seem to be, so Vikram to follow up async with David on this.
- Discussion topics:
  - AIP-99 Common data access patterns (Pavan):
    - Pavan presented a comprehensive overview of the work planned for this AIP, specifically including: SQL query generation using DB schemas, human-in-the-loop review before execution, data transfer operators via DataFusion, and building on support for all existing Airflow database hooks.
    - Pavan also showed a quick demo which covered: automatic schema fetching from Postgres, SQL generation with validation, and XCom integration for query results.
    - Pavan said that the implementation approach would be to start with basic SQL operators and then expand to multiple databases.
    - The overview and demo were very well received by the team.
    - Vikram asked for broader interfaces to be defined first, before going broad with database support, and Pavan agreed with that guidance.
  - AIP-100 Task Priorities (Natanel and Theo S):
    - Natanel shared the analysis and draft design approaches written up by the two of them (Theo could not make the call) regarding priority-based scheduling within Airflow.
    - Natanel said that the current priority-based scheduling causes starvation at significant scale, when concurrency limits are hit with worker saturation. Based on their research, the proposed solutions are built on a combination of priority + aging.
    - Jens shared his feedback (through Vikram) that he very much appreciated the analysis, but was unconvinced by any of the currently proposed solutions.
    - Vikram also commended the team on their research into the problem, and added that he was surprised that task priorities still existed in Airflow, saying that he had thought we had deprecated them a long time ago (since Airflow 2).
    - The general consensus was that the topic needed more in-depth offline thought before settling on an algorithm and a migration strategy.

--
Vikram Koka
Chief Strategy Officer
Email: [email protected] <https://www.astronomer.io/>
