potiuk commented on issue #27449: URL: https://github.com/apache/airflow/issues/27449#issuecomment-1303554730
Wild idea. This is a loose thought I have had for quite some time while reviewing some of the issues connected to the mini-scheduler. I think the mini-scheduler "approach" was a good idea - triggering the direct downstream task as soon as possible after a task finishes is generally cool. However, it introduces complexity (similar to the above) because it is, IMHO, "almost, but not quite, entirely unlike tea", to quote Douglas Adams. It looks like the scheduler scheduling tasks, and it behaves like that, but there are some small quirks (like the one above) that make it susceptible to subtle bugs.

Also, I think the original idea that the mini-scheduler would run in a more "distributed" fashion is hampered by the fact that the bulk of what the mini-scheduler does goes through the DB anyway, and it synchronizes on the DagRun lock and DB operations, so pretty much all the "distribution" benefits are all but gone. Having multiple schedulers already provides a way to "distribute" scheduling, and distributing it even more does not change much. Yes, it uses the fact that the DAG is already loaded in memory and that some of the DB objects are effectively cached by SQLAlchemy, but I think with 2.4 and the "micro-pipelines" approach, where our users have much better ways to split their DAGs into smaller, independent DAGs, this becomes far less of an issue.

I thought that we could implement "mini-scheduling" slightly differently. Rather than actually doing the scheduling, we could add the DagRun of the just-completed task to some kind of table where we keep "priority" DagRuns to be scheduled by a "real" scheduler. It would then be picked up almost immediately by (one of) the schedulers at the next scheduling loop and scheduled. The latency would be slightly higher, but not by much IMHO, and we would lose the "DAG structure in-memory" benefit, but I think it would make things a bit more robust - especially when we start introducing new concepts.
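To make the idea concrete, here is a rough Python sketch of what the "priority table" flow could look like: a finishing task records its DagRun in the table, and each scheduler loop drains those entries first before looking at regular DagRuns. This is purely illustrative - none of the names below are actual Airflow APIs, and a real implementation would be a DB table, not an in-memory queue.

```python
# Illustrative sketch only: stands in for a DB table of DagRuns flagged
# for immediate scheduling. Not actual Airflow code.
from collections import deque


class PriorityDagRunTable:
    """Holds DagRuns that a just-finished task flagged for fast scheduling."""

    def __init__(self):
        self._queue = deque()
        self._seen = set()  # avoid duplicate entries for the same DagRun

    def add(self, dag_run_id: str) -> None:
        """Called when a task finishes, instead of running the mini-scheduler."""
        if dag_run_id not in self._seen:
            self._seen.add(dag_run_id)
            self._queue.append(dag_run_id)

    def drain(self, limit: int) -> list:
        """Return up to `limit` priority DagRuns, removing them from the table."""
        batch = []
        while self._queue and len(batch) < limit:
            run_id = self._queue.popleft()
            self._seen.discard(run_id)
            batch.append(run_id)
        return batch


def scheduling_loop_iteration(table, regular_runs, max_runs_per_loop=5):
    """One scheduler loop: priority DagRuns first, remaining budget to the rest."""
    to_schedule = table.drain(max_runs_per_loop)
    budget = max_runs_per_loop - len(to_schedule)
    to_schedule += regular_runs[:budget]
    return to_schedule
```

The point is that the "real" scheduler keeps doing all the actual scheduling work (with its existing locking and DB semantics); the only new moving part is a hint about which DagRuns to look at first.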
It's not entirely hashed out - there are probably some edge cases, for example when we have a lot of those "priority" DagRuns to process (what should we do with the non-priority ones?). I am mentioning it here because I think such an approach would pretty much automatically solve the above problem.
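One possible answer to the starvation edge case (again, just a hypothetical sketch, not a worked-out design) would be to cap the share of each scheduling loop's budget that priority DagRuns may consume, so non-priority runs always make some progress:

```python
# Hypothetical fairness rule: priority DagRuns get at most a fixed share
# of each loop's budget; unused slots on either side go to the other pool.
# Illustrative only - not actual Airflow code.
def pick_runs(priority_runs, regular_runs, budget=10, priority_share=0.7):
    """Split the per-loop scheduling budget between priority and regular runs."""
    max_priority = int(budget * priority_share)
    picked = list(priority_runs[:max_priority])
    # Fill the rest of the budget from the regular pool.
    picked += regular_runs[: budget - len(picked)]
    # If the regular pool was short, give leftover slots back to priority runs.
    if len(picked) < budget:
        picked += priority_runs[max_priority : max_priority + (budget - len(picked))]
    return picked
```

With `priority_share=0.7` and a full backlog on both sides, each loop would schedule 7 priority and 3 regular DagRuns; if either pool is short, the other gets the spare slots.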
