Hello Everyone,

I think we need some more pairs of eyes to look at the potential fixes we have for the pesky LocalExecutorTest failures that we are all seeing in our Travis builds. Once we solve it, I think we will be much closer to having stable builds, especially with the other flaky-test fixes merged recently.

It turned out that the problem relates to quite deep internals of how data is passed between processes using multiprocessing queues. It's really deep in the core processing of Airflow, so I think it would be great if other experienced Airflowers also reviewed it, commented on it, and helped select the best solution, as we could have missed something.

I was looking at it together with Ash and Bas, and I (a bit too fast) merged a preliminary version of the fix last week. We reverted it later, as it turned out to have some side effects, so we know we have to be careful with this one.

After more detailed analysis and discussions with Omar, we now have two candidate fixes. Both are green, and in local testing both solve the problem, each in a different way:

- https://github.com/apache/airflow/pull/5199
- https://github.com/apache/airflow/pull/5200

I tried to describe the problem and the solution candidates, with pros and cons, in the JIRA ticket: https://issues.apache.org/jira/browse/AIRFLOW-4401

I'd love to get reviews on the PRs and input to the discussion on which solution to choose.

J.

--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer
M: +48 660 796 129 <+48660796129>
E: jarek.pot...@polidea.com