Hi All,

I've been on a team using Airflow to ingest batch data for about 2 years, and I wanted to throw some support behind the recent AIP-15 by Xiaodong DENG and to say that it probably doesn't go far enough in its current state.
AIP-15 Support Multiple-Schedulers for HA & Better Scheduling Performance
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103092651

However, the number one frustration I have experienced, and heard about across a few companies using Airflow, is that the scheduler is hard to control. I don't know if the teams I have talked to have problems identical to the ones my teams have had. Still, mismatched scheduler expectations have got to be a top reason engineers do not adopt Airflow. Behaviors like scheduling tasks 24h ahead, the inability to trigger tasks at exact times, and the lack of a way to prioritize dropped DAGs so they are picked up first need to be tunable to the needs of specific organizations.

There are common workarounds that I won't get into here. There might even be short-term value in the idea of spinning up an Airflow Docker container for every task to trigger a manual run, using some other scheduler. I have my thumb on at least a few pulses, and I believe the next step folks will take is to look for a way off Airflow to get past the scheduling woes.

I guess I would say that Airflow's major value has been templatizing workflows with the DAG constraint and pulling them out of bash. Now we've exposed the next issue, which is the wide variety of business-logic expectations people bring to a scheduler. Airflow is pretty far ahead of other tools in the space.

I am moving to a role where I don't use Airflow, but for those who want to grow the tool, I think this is the single biggest blocker to adoption and the best way to create a feeling of joy/relief (and not dread) when you open up Airflow at 9 AM on Wednesday.

Best,
Trent Robbins
https://www.linkedin.com/in/trentrobbins
