potiuk commented on issue #27449: URL: https://github.com/apache/airflow/issues/27449#issuecomment-1303554730
Wild idea. This is a loose thought I have had for quite some time while reviewing some of the issues connected to the mini-scheduler. I think the mini-scheduler "approach" was a good idea - triggering the direct downstream task as soon as possible after a task finishes is generally cool. However, it introduces complexity (similar to the above) because it is, IMHO, "almost, but not quite, entirely unlike tea", to quote Douglas Adams. It looks like the scheduler scheduling tasks, and it behaves like that, but there are some small quirks (like the one above) that make it susceptible to subtle bugs.

Also, I think the original idea that the mini-scheduler would run in a more "distributed" fashion is hampered by the fact that the bulk of what the mini-scheduler does goes through the DB anyway, and it synchronizes on the DagRun lock and DB operations, so pretty much all the "distribution" benefits are all but gone. Having multiple schedulers already provides a way to "distribute" scheduling, and distributing it even more does not change much. Yes, it uses the fact that the DAG is already loaded in memory and that some of the DB objects are effectively cached by SQLAlchemy, but I think with 2.4 and the "micro-pipelines" approach, where our users have much better ways to split their DAGs into smaller, independent DAGs, this becomes far less of an issue.

I thought that we could implement "mini-scheduling" slightly differently. Rather than actually doing the scheduling, we could add the DagRun of the just-completed task to some kind of table where we keep "priority" DagRuns to be scheduled by a "real" scheduler. It would then be picked up almost immediately by (one of) the schedulers at the next scheduling loop and scheduled. The latency would be slightly higher, but not by much IMHO, and we would lose the "DAG structure in-memory" benefit, but I think it would make things a bit more robust - especially when we start introducing new concepts.
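To make the idea concrete, here is a rough Python sketch of what the "priority table" flow could look like: a finishing task records its DagRun in the table, and each scheduler loop drains those entries first before looking at regular DagRuns. This is purely illustrative - none of the names below are actual Airflow APIs, and a real implementation would be a DB table, not an in-memory queue.

```python
# Illustrative sketch only: stands in for a DB table of DagRuns flagged
# for immediate scheduling. Not actual Airflow code.
from collections import deque


class PriorityDagRunTable:
    """Holds DagRuns that a just-finished task flagged for fast scheduling."""

    def __init__(self):
        self._queue = deque()
        self._seen = set()  # avoid duplicate entries for the same DagRun

    def add(self, dag_run_id: str) -> None:
        """Called when a task finishes, instead of running the mini-scheduler."""
        if dag_run_id not in self._seen:
            self._seen.add(dag_run_id)
            self._queue.append(dag_run_id)

    def drain(self, limit: int) -> list:
        """Return up to `limit` priority DagRuns, removing them from the table."""
        batch = []
        while self._queue and len(batch) < limit:
            run_id = self._queue.popleft()
            self._seen.discard(run_id)
            batch.append(run_id)
        return batch


def scheduling_loop_iteration(table, regular_runs, max_runs_per_loop=5):
    """One scheduler loop: priority DagRuns first, remaining budget to the rest."""
    to_schedule = table.drain(max_runs_per_loop)
    budget = max_runs_per_loop - len(to_schedule)
    to_schedule += regular_runs[:budget]
    return to_schedule
```

The point is that the "real" scheduler keeps doing all the actual scheduling work (with its existing locking and DB semantics); the only new moving part is a hint about which DagRuns to look at first.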
It's not entirely hashed out - there are probably some edge cases, for example when we have a lot of those "priority" DagRuns to process (what should we do with the non-priority ones?). I am mentioning it here because I think such an approach would pretty much automatically solve the above problem.
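One possible answer to the starvation edge case (again, just a hypothetical sketch, not a worked-out design) would be to cap the share of each scheduling loop's budget that priority DagRuns may consume, so non-priority runs always make some progress:

```python
# Hypothetical fairness rule: priority DagRuns get at most a fixed share
# of each loop's budget; unused slots on either side go to the other pool.
# Illustrative only - not actual Airflow code.
def pick_runs(priority_runs, regular_runs, budget=10, priority_share=0.7):
    """Split the per-loop scheduling budget between priority and regular runs."""
    max_priority = int(budget * priority_share)
    picked = list(priority_runs[:max_priority])
    # Fill the rest of the budget from the regular pool.
    picked += regular_runs[: budget - len(picked)]
    # If the regular pool was short, give leftover slots back to priority runs.
    if len(picked) < budget:
        picked += priority_runs[max_priority : max_priority + (budget - len(picked))]
    return picked
```

With `priority_share=0.7` and a full backlog on both sides, each loop would schedule 7 priority and 3 regular DagRuns; if either pool is short, the other gets the spare slots.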
