karakanb commented on issue #43948:
URL: https://github.com/apache/airflow/issues/43948#issuecomment-2507302897

   I looked into this a bit, but it seems like there's a fundamental issue here; 
I'll try to explain it below.
   
   
   The expected behavior would be to have a sensor that can run with retries, 
in case something fails during the sensor check, e.g. infra issues. The retries 
are not about the sensor not finding what it was supposed to, e.g. "the task is 
not there", but to recover from infra failures, e.g. the database being 
temporarily unavailable. This behavior works as expected with sensors in 
general.
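   A toy model of that split follows. It is purely illustrative and is not Airflow's sensor code: `InfraError` and `run_sensor` are hypothetical names. "Condition not met" simply keeps the sensor waiting. Only an infrastructure error consumes a retry. The timeout is modeled as a poke budget that all tries share, so the clock starts at the first try.

```python
from typing import Callable


class InfraError(Exception):
    """Hypothetical stand-in for an infra failure, e.g. the DB being unavailable."""


def run_sensor(poke: Callable[[], bool], max_pokes: int, retries: int) -> str:
    """Return "success", "timed_out", or "failed" (toy model, not Airflow's API)."""
    attempts_left = retries + 1
    pokes_used = 0  # shared across tries: the timeout budget starts at the first try
    while attempts_left > 0:
        try:
            while pokes_used < max_pokes:
                pokes_used += 1
                if poke():
                    return "success"
            # The condition never became true: that is a timeout, not a retryable error.
            return "timed_out"
        except InfraError:
            # Only infrastructure failures use up a retry.
            attempts_left -= 1
    return "failed"
```

In this model a retry after an `InfraError` resumes with whatever poke budget is left. It does not get a fresh one.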
   
   However, when combining retries on sensors with timeouts, that's where 
things start getting interesting:
   - When the user sets a timeout, the intention is "wait this long _from the 
beginning of the first try_", which is a very important factor that is also 
highlighted in the [Timeouts section of the 
docs](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/tasks.html#timeouts).
 This behavior seems to work correctly with `reschedule` mode thanks to the 
`task_reschedule` table that records the start timestamp for the first try.
   - However, when deferrable mode is used, the timeouts do not work with 
retries since there's no way to retrieve the start time of the first attempt of 
a task instance.
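   A rough way to see why the first-try start time matters: the remaining time can only be computed from whichever start timestamp is available. `remaining_timeout` below is a hypothetical helper, not an Airflow API. The example assumes a 1-hour timeout and a retry that begins 55 minutes after the first try started.

```python
from datetime import datetime, timedelta


def remaining_timeout(try_start: datetime, now: datetime, timeout: timedelta) -> timedelta:
    """Time left before the sensor should time out, never negative.

    Passing the *first* try's start gives the documented behavior; passing the
    *current* try's start silently restarts the clock on every retry.
    """
    return max(timeout - (now - try_start), timedelta(0))


timeout = timedelta(hours=1)
first_try_start = datetime(2024, 1, 1, 0, 0)
retry_start = datetime(2024, 1, 1, 0, 55)  # second try, after an infra failure
now = datetime(2024, 1, 1, 0, 56)

print(remaining_timeout(first_try_start, now, timeout))  # 0:04:00
print(remaining_timeout(retry_start, now, timeout))      # 0:59:00
```

Measured from the retry, the sensor could keep waiting until almost two hours after it first started. That is well past the intended one-hour limit.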
   
   It seems like users would expect the same timeout-with-retries behavior 
from both the deferrable and non-deferrable versions of the sensor, but I 
couldn't find a way to achieve that without adding a new table to Airflow. Is 
the start time of the first try saved anywhere?
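   The sketch below is hypothetical and only illustrates what such an extra table would need to hold. It stores one start timestamp per task instance. That value is written on the first try, and retries never overwrite it. `TIKey` and `FirstStartStore` are invented names, and the in-memory dict merely stands in for a database table. The key fields are modeled loosely on how a task instance is identified (with `map_index` defaulting to -1 for unmapped tasks).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, NamedTuple


class TIKey(NamedTuple):
    """Hypothetical key identifying a task instance across all of its tries."""
    dag_id: str
    task_id: str
    run_id: str
    map_index: int = -1


@dataclass
class FirstStartStore:
    """In-memory stand-in for the extra table: one start time per task instance."""
    _starts: Dict[TIKey, datetime] = field(default_factory=dict)

    def record_start(self, key: TIKey, started_at: datetime) -> datetime:
        # setdefault keeps the value from the first call; later tries get it back
        # instead of overwriting it, so timeouts can be measured from the first try.
        return self._starts.setdefault(key, started_at)
```

A deferred sensor could then compute its deadline from the value `record_start` returns, whichever try is currently running.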
   
   


-- 
This is an automated message from the Apache Git Service.