ephraimbuddy opened a new pull request, #40293:
URL: https://github.com/apache/airflow/pull/40293

   TI.are_dependencies_met run over and over even when no changes have happened 
that would allow it to pass. This causes the scheduler loop to get slower and 
slower as more blocked TIs pile up.
   
   This scenario is easy to reproduce with this DAG (courtesy of @rob-1126): 
Before running it, enable debug logging
   
   ```python
   
   from datetime import datetime
   
   from airflow import DAG
   from airflow.operators.bash_operator import BashOperator
   
   class FailsFirstTimeOperator(BashOperator):
       def execute(self, context):
           if context["ti"].try_number == 1:
               raise Exception("I fail the first time on purpose to test retry 
delay")
           print(context["ti"].try_number)
           return super().execute(context)
   
   one_day_of_seconds = 60 * 60 * 24
   with DAG(dag_id="waity", schedule_interval=None, start_date=datetime(2021, 
1, 1)):
       starting_task = FailsFirstTimeOperator(task_id="starting_task", 
retry_delay=one_day_of_seconds, retries=1, bash_command="echo whee")
       for i in range(0,1*1000):
           task = BashOperator(task_id=f"task_{i}", bash_command="sleep 1")
           starting_task >> task
   
   ```
   
   Simply run multiples of the above DAG (6 dagruns is enough to observe the 
delay). Note that the scheduler loop is now taking ~4-6 seconds, and grows with 
each new waity dagrun.
   
   The solution was to change the last_scheduling_decision to next_schedulable 
date for the dagrun.
   1. When the task instance enter up_for_retry state, we set the 
next_schedulable date for the dagrun to the next retry date of the ti. This 
way, we stop unnecessary dependency checks for other TIs blocked by this retry 
state.
   2. In other cases, we check once if the dependencies of a TI are met, if not 
met, we nullify the next_schedulable date.
   3. The next schedulable date is updated for a dagrun when any of its 
taskinstance's state is in the finished state.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to