kzosabe commented on issue #38632:
URL: https://github.com/apache/airflow/issues/38632#issuecomment-2031136250

   @uranusjr Thanks for the explanation!
   
   Just to confirm, does a DEFFERED task really transition directly to RUNNING?
   I may be missing something, but I could not find a code path that directly 
transitions from DEFFERED to RUNNING.
   As far as I have investigated, it seems that DEFFERED tasks are first rolled 
back to SCHEDULED, as in the following test code.
   
https://github.com/apache/airflow/blob/db3181c27bc52ebf0167fb6b2f8181a13e7d02b9/tests/models/test_trigger.py#L93-L118
   
   However, whether there is a direct transition or not, your explanation seems 
to be correct.
   
   As I was checking your comment, I noticed that, at least the presence or 
absence of start_date is not be determined by the value of ti.state. So the 
form of matrix I wrote above may be nonsense.
   
   From reading your comment, in my interpretation, the current desired 
behavior of `start_date` is:
   
   - The datetime when the task is first touched by worker is stored.
   - And is normally retained no matter how the state changes thereafter 
(including changes to pre-running state).
   - If there is a action that can be regarded as `restart` , it will be 
overwritten with the datetime of it.
       - Perhaps we don't want to consider a run from DEFERRED as a `restart`, 
but we want to consider a run from UP_FOR_RESCHEDULE as a `restart`?
   
   Is this understanding correct?
   
   If this is correct, then at least the changes I've made in 
https://github.com/apache/airflow/pull/38631 should be reasonable in terms of 
conforming to this interpretation.
   Because in the current implementation, the change from QUEUED to SCHEDULED 
also sets the end_date, which is also clearly contrary to the above.
   
   However, it seems that changes such as putting None in start_date when 
reverting to a pre-running state, as described in 
https://github.com/apache/airflow/pull/38631#discussion_r1545086130 , are wrong 
and should not be done.
   
   
   > I think your reference on UP_FOR_RETRY’s end_date is wrong?
   
   According to my referenced code, 
`ti.set_state(TaskInstanceState.UP_FOR_RETRY)` actually sets current time as 
ti.end_date (because of `or ti.state == TaskInstanceState.UP_FOR_RETRY`).
   However, from reading your commentary, this may also be the wrong behavior.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to