whitleykeith opened a new issue, #38407:
URL: https://github.com/apache/airflow/issues/38407

   ### Description
   
   A major limitation of Airflow backfills is how `--ignore-dependencies`
   behaves when combined with `task_regex`: it ignores not only dependencies
   on the tasks that were filtered out, but also dependencies between the
   tasks of the resulting partial DAG. Without `--ignore-dependencies`,
   however, you end up rerunning tasks you may not need or want to rerun in
   a given backfill.
   
   For example, say you have a dag like this: 
   
   ```python
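   from airflow.operators.dummy import DummyOperator  # deprecated in favor of EmptyOperator in Airflow 2.3+
   from airflow.sensors.external_task import ExternalTaskSensor
   
   # sensor_list: (dag_id, task_id) pairs defined elsewhere. Each sensor needs its
   # own task_id; the "wait_for_" naming here is an assumption.
   sensors = [
       ExternalTaskSensor(task_id=f"wait_for_{dag_id}_{task_id}",
                          external_dag_id=dag_id, external_task_id=task_id)
       for dag_id, task_id in sensor_list
   ]
   
   foo = DummyOperator(task_id="not_sensor_foo")
   bar = DummyOperator(task_id="not_sensor_bar")
   baz = DummyOperator(task_id="not_sensor_baz")
   
   sensors >> foo >> bar >> baz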
   
   ```
   
   A common use case is to backfill only the non-sensor tasks, because the
   sensors may not need to be rerun. However, `--reset-dagruns` would reset
   the sensor tasks even if they had already completed, and that flag is
   sometimes needed to clear existing runs of tasks.
   
   This leads to a situation where you cannot skip the desired sensor tasks 
without also effectively ignoring dependencies you actually don't want to 
ignore. Therefore you must either:
   
   * Run the backfill with all unwanted tasks
   * Or run individual backfills for each "layer" of the DAG dependency tree
   
   Neither option is ideal: the first means waiting longer than necessary
   for the backfill to finish, and the second requires manual hand-holding.
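
To illustrate the per-layer workaround, here is a minimal plain-Python sketch of how the layers could be derived. It is not Airflow code: the `layers` helper and the `non_sensor_tasks` mapping (task id to upstream task ids, already filtered down to the tasks being backfilled) are hypothetical names used only for illustration.

```python
def layers(upstream):
    """Group tasks into layers; each layer depends only on earlier layers."""
    remaining = {task: set(deps) for task, deps in upstream.items()}
    result = []
    while remaining:
        # Tasks with no unresolved upstream dependencies form the next layer.
        ready = sorted(task for task, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cycle or unknown upstream task in input")
        result.append(ready)
        for task in ready:
            del remaining[task]
        for deps in remaining.values():
            deps.difference_update(ready)
    return result


# Hypothetical stand-in for the non-sensor part of the example DAG.
non_sensor_tasks = {
    "not_sensor_foo": [],
    "not_sensor_bar": ["not_sensor_foo"],
    "not_sensor_baz": ["not_sensor_bar"],
}
backfill_order = layers(non_sensor_tasks)
```

For the example DAG this yields three single-task layers, so the workaround means three separate backfills run one after another, each started by hand once the previous one has finished.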
   
   
   ### Use case/motivation
   
   Ideally, when running the above DAG with a task regex of `not_sensor_.*`,
   either the existing `--ignore-dependencies` flag or a new flag would
   honor the dependencies within the partial DAG while still ignoring
   dependencies on the tasks that were filtered out.
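
The requested behavior can be sketched in plain Python. This is only an illustration of the semantics, not a proposed implementation and not Airflow's API: `partial_upstream` and `example_dag` are hypothetical names, the DAG is modeled as a mapping from task id to upstream task ids, and search-style regex matching is assumed.

```python
import re


def partial_upstream(upstream, task_regex):
    """Keep only tasks matching task_regex, and only the edges between them."""
    pattern = re.compile(task_regex)
    selected = {task_id for task_id in upstream if pattern.search(task_id)}
    return {
        # Edges to filtered-out tasks are dropped; edges inside the subset stay.
        task_id: sorted(dep for dep in upstream[task_id] if dep in selected)
        for task_id in sorted(selected)
    }


# Hypothetical stand-in for the example DAG above, with two sensors.
example_dag = {
    "sensor_a": [],
    "sensor_b": [],
    "not_sensor_foo": ["sensor_a", "sensor_b"],
    "not_sensor_bar": ["not_sensor_foo"],
    "not_sensor_baz": ["not_sensor_bar"],
}

partial = partial_upstream(example_dag, r"not_sensor_.*")
```

Here `not_sensor_foo` loses its dependencies on the sensors, while `not_sensor_bar` and `not_sensor_baz` still wait on their upstream tasks, which is the ordering a single backfill would need to respect.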
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
