kzosabe opened a new issue, #38632: URL: https://github.com/apache/airflow/issues/38632
### What do you see as an issue? There was a problem with the emitting of statistics logs regarding execution time introduced in https://github.com/apache/airflow/pull/30612 . Problem details: - https://github.com/apache/airflow/pull/30612#issuecomment-1980985307 - https://github.com/apache/airflow/pull/34771#discussion_r1542257954 When issuing such statistical indicators of execution time, it is important to first clarify what the corresponding time fields are for each state, and to make recognition and implementation consistent. The time fields corresponding to the transitions to each state of TIs, as I read the code, are as follows: ``` SCHEDULED: (no time field in this state currently exists) QUEUED: queued_dttm RUNNING: start_date SUCCESS: end_date FAILED: end_date SKIPPED: end_date UPSTREAM_FAILED: end_date REMOVED: end_date UP_FOR_RESCHEDULE: (uncertain) RESTARTING: (uncertain) UP_FOR_RETRY: (uncertain) DEFERRED: (uncertain) ``` So as I understood, the matrix of states and TI dates should look like this (corrections welcome): --- ``` SCHEDULED - queued_dttm: no, start_date: no, end_date: no QUEUED - queued_dttm: yes, start_date: no, end_date: no RUNNING - queued_dttm: yes, start_date: yes, end_date: no SUCCESS - queued_dttm: yes, start_date: yes, end_date: yes(1) FAILED - queued_dttm: yes, start_date: yes, end_date: yes(1) SKIPPED - queued_dttm: yes, start_date: yes, end_date: yes(1) UPSTREAM_FAILED - queued_dttm: yes, start_date: yes, end_date: yes(1) REMOVED - queued_dttm: yes, start_date: yes, end_date: yes(1) UP_FOR_RESCHEDULE - queued_dttm: yes, start_date: yes, end_date: yes?(2) RESTARTING - queued_dttm: no?, start_date: no?, end_date: no?(3) UP_FOR_RETRY - queued_dttm: no?, start_date: no?, end_date: no?(3) DEFERRED - queued_dttm: no, start_date: no, end_date: no(4) ``` (1) https://github.com/apache/airflow/blob/fce3a583348162f655282d032eca654dcb67b497/airflow/models/taskinstance.py#L1894 and https://github.com/apache/airflow/blob/33e5d0351d7d09033c57a4a0b851b9e1b50bfb01/airflow/utils/state.py#L160 (2) https://github.com/apache/airflow/blob/fce3a583348162f655282d032eca654dcb67b497/airflow/models/taskinstance.py#L2829 Is this state being handled this way because the task is interpreted as finished but waiting to be picked up by the scheduler? (3) https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/tasks.html#task-instances One interpretation of the above diagram should look like this. If RESTARTING and UP_FOR_RETRY are interpreted as pre-scheduled states, then all times should be no. If we interpret them as finished states like (2), then all times should be yes. We could also treat these as having indeterminate time field states, but that would probably not be a good idea. (4) Undocumented, but commented as `Deferrable operator waiting on a trigger` on this line. https://github.com/apache/airflow/blob/33e5d0351d7d09033c57a4a0b851b9e1b50bfb01/airflow/utils/state.py#L59 And these lines imply that this state is a pre-scheduled state https://github.com/apache/airflow/blob/db3181c27bc52ebf0167fb6b2f8181a13e7d02b9/airflow/models/trigger.py#L191 --- ### Solving the problem So it will be great that - Fill in the matrix as above by community consensus (and write it into the official document if we can) - Fix codes that do not follow the matrix above - Add tests or debug logs that can check for incorrect state and time combinations Is it worth working on this? In the first, I tried to tackle the set_state issue mentioned in https://github.com/apache/airflow/pull/34771#discussion_r1542257954 . https://github.com/apache/airflow/pull/38631 ### Anything else _No response_ ### Are you willing to submit PR? - [X] Yes I am willing to submit a PR! ### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org