ashb commented on issue #27449: URL: https://github.com/apache/airflow/issues/27449#issuecomment-1303250538
Okay, so the problem here is, as Ephraim identified, that the mini scheduler operates on a subset of the DAG, not the complete DAG. So when `first_task` finishes we get a partial DAG containing the following tasks: `first_task, last_task, middle_task`. Note, crucially, that we _don't_ have `second_task`.

The reason we use a partial subset rather than the whole DAG is a performance optimization. Essentially: this task just finished, let's look at the downstream tasks and see if any of them can be scheduled. But in order to check if those can be scheduled, we need the upstream of _those_ tasks (which is how we get to include `middle_task`).

So I think we have two options here:

1) Remove `include_direct_upstream` so that the partial DAG includes _all_ upstreams. Con to this: the mini scheduler does more work (but likely not all that much more).
2) Change the expansion so that we don't fail mapped tasks if we can't expand when `self.dag.partial is True`?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
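
To make the subset selection described in the comment concrete, here is a minimal sketch in plain Python. It does not use Airflow's API: the edge list, the helper names (`build_maps`, `reachable`, `mini_scheduler_subset`) and the `direct_upstream_only` flag are all hypothetical, and the edges are an assumption chosen only to be consistent with the task names in the comment. The actual DAG in the issue may be wired differently.

```python
# Toy model of the "partial DAG" selection; NOT Airflow code.
# Assumed (hypothetical) edges, consistent with the comment:
#   first_task -> last_task
#   second_task -> middle_task -> last_task
EDGES = [
    ("first_task", "last_task"),
    ("second_task", "middle_task"),
    ("middle_task", "last_task"),
]


def build_maps(edges):
    """Return (upstream, downstream) adjacency dicts of sets."""
    upstream, downstream = {}, {}
    for up, down in edges:
        downstream.setdefault(up, set()).add(down)
        upstream.setdefault(down, set()).add(up)
    return upstream, downstream


def reachable(start, adjacency):
    """All tasks reachable from `start` by following `adjacency` (excluding start)."""
    seen, stack = set(), [start]
    while stack:
        for nxt in adjacency.get(stack.pop(), set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def mini_scheduler_subset(finished, edges, direct_upstream_only=True):
    """Tasks a mini-scheduler-like pass would look at after `finished` completes."""
    upstream, downstream = build_maps(edges)
    # Everything downstream of the finished task is a scheduling candidate.
    candidates = reachable(finished, downstream)
    needed = set()
    for task in candidates:
        if direct_upstream_only:
            # Current behaviour in the comment: only the direct upstreams.
            needed |= upstream.get(task, set())
        else:
            # Option 1 in the comment: include all (transitive) upstreams.
            needed |= reachable(task, upstream)
    return candidates | needed


print(sorted(mini_scheduler_subset("first_task", EDGES)))
# ['first_task', 'last_task', 'middle_task']
print(sorted(mini_scheduler_subset("first_task", EDGES, direct_upstream_only=False)))
# ['first_task', 'last_task', 'middle_task', 'second_task']
```

Under these assumed edges, the direct-upstream variant pulls in `middle_task` but leaves out `second_task`, which matches the subset reported in the comment. Switching to all upstreams (the sketch's stand-in for option 1) brings `second_task` back at the cost of walking more of the graph.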
