ashb commented on issue #44147:
URL: https://github.com/apache/airflow/issues/44147#issuecomment-2631381889

   Thinking through this a little bit more, @jedcunningham and I have decided 
that:
   
   TaskReschedule, XCom, TaskMap and TINote should _not_ be linked to a 
specific TI Try/TIHistory row and should instead apply to the whole "dag run 
task" (a concept which doesn't currently exist in Airflow 2 or 3 or even have a 
good name).
   
   Our thinking is:
   
   - TI Note: For UX reasons the note should be shared across all attempts of a 
Task.
   - TaskMap and TaskReschedule: there is no value in storing these beyond the 
currently active task (so they could either be FK'd to a specific TI uuid, or be 
deleted when the TI state is terminal).
   - XCom: Bit of a funny one. We clear the XCom value on start of the next 
attempt _anyway_, so we don't ever store history. However, there _could_ be some 
value in some of these being tied to an attempt. For example, many 
BaseOperatorLink subclasses store the URL in XCom, and being able to see 
external job logs (EMR, DataProc, etc.) for a previous attempt would be a nice 
feature.
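
   To make the XCom trade-off above concrete, here is a minimal sketch using 
plain dicts. Nothing in it is Airflow's API: the stores, the helper functions, 
the `job_url` key and the example URLs are all made up for illustration, and it 
only assumes the "clear on start of the next attempt" behaviour described above.

```python
# Hypothetical stand-ins for XCom storage; not Airflow's API.
shared_store = {}   # keyed by (dag_id, run_id, task_id, map_index)
per_try_store = {}  # keyed by (dag_id, run_id, task_id, map_index, try_number)

TASK = ("dag1", "run1", "my_task", -1)


def start_attempt_shared(task):
    """Mimic the behaviour described above: clear XCom when the next attempt starts."""
    shared_store.pop(task, None)


def push_shared(task, key, value):
    """Store a value against the whole task, regardless of attempt."""
    shared_store.setdefault(task, {})[key] = value


def push_per_try(task, try_number, key, value):
    """Store a value against one specific attempt of the task."""
    per_try_store.setdefault((*task, try_number), {})[key] = value


for try_number in (1, 2):
    url = f"https://example.invalid/jobs/{try_number}"  # made-up link value
    start_attempt_shared(TASK)
    push_shared(TASK, "job_url", url)
    push_per_try(TASK, try_number, "job_url", url)

# Only the latest attempt's link survives in the shared store ...
print(shared_store[TASK]["job_url"])
# ... while the per-try store can still answer "what was the link for try 1?"
print(per_try_store[(*TASK, 1)]["job_url"])
```

   The per-try variant is what would let an operator link point at the external 
job logs of an earlier attempt; the shared variant matches what we have today.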
   
   The only other relationships _onto_ the TaskInstance table are TIHistory and 
RenderedTaskInstanceFields.
   
   While it might be nice to see some XCom values and the RTIF for a given 
TIHistory row, the cost of keeping TIHistory in sync with TaskInstance changes, 
plus the shenanigans we'd need to pull to get the FKs updated, means we don't 
_need_ to update anything.
   
   One option, if we only want to avoid composite PKs of 
(dag_id,run_id,task_id,map_index), would be to create a "Dag Task id" (this name 
sucks though) column, which is an integer/UUID key that is unique over 
(dag_id,run_id).
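
   As a rough sketch of that idea: one surrogate key is allocated per logical 
task in a dag run and reused by every attempt. The `dag_task_id` helper and the 
in-memory registry are hypothetical, and the sketch assumes the key identifies 
one (dag_id, run_id, task_id, map_index) combination, which is how the example 
rows that follow use it.

```python
import uuid

# Hypothetical in-memory registry: one surrogate id per logical task in a dag run.
_ids = {}


def dag_task_id(dag_id, run_id, task_id, map_index=-1):
    """Return a stable surrogate key for the composite key, creating it on first use."""
    composite = (dag_id, run_id, task_id, map_index)
    if composite not in _ids:
        _ids[composite] = uuid.uuid4()
    return _ids[composite]


first_try = dag_task_id("dag1", "run1", "my_task")
retry = dag_task_id("dag1", "run1", "my_task")      # same logical task, later attempt
other = dag_task_id("dag1", "run1", "other_task")   # different task in the same run

print(first_try == retry)  # attempts share the key
print(first_try == other)  # different tasks do not
```

   Other tables could then carry a single-column FK to this key instead of the 
four-column composite.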
   
   For example, we might store these TI history rows
   
   ```
    dag_id | run_id | task_id |                NEW_ID                |                  id                  | map_index | try_number
   --------+--------+---------+--------------------------------------+--------------------------------------+-----------+------------
    dag1   | run1   | my_task | 64379755-f717-4463-9f8b-5e2cb16c74f8 | f4578f24-1ecb-4ae1-b23d-2da43863cf80 |        -1 |          1
    dag1   | run1   | my_task | 64379755-f717-4463-9f8b-5e2cb16c74f8 | 7fdd63b8-b6da-4726-8aab-423612dc98b7 |        -1 |          2
   ```
   
   and this "active" TI:
   
   
   ```
    dag1   | run1   | my_task | 64379755-f717-4463-9f8b-5e2cb16c74f8 | 87dc53d3-d46c-4e80-a3fe-e7a986879951 |        -1 |          3
   ```
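
   A small sketch of how a note could hang off NEW_ID and so apply to every 
attempt, using the rows above in an in-memory SQLite database. The table names 
(`ti_history`, `ti`, `ti_note`), their columns and the note text are simplified 
and hypothetical; this is not Airflow's real schema.

```python
import sqlite3

NEW_ID = "64379755-f717-4463-9f8b-5e2cb16c74f8"

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, hypothetical schemas; not Airflow's real migrations.
    CREATE TABLE ti_history (id TEXT PRIMARY KEY, new_id TEXT, try_number INTEGER);
    CREATE TABLE ti (id TEXT PRIMARY KEY, new_id TEXT UNIQUE, try_number INTEGER);
    CREATE TABLE ti_note (new_id TEXT PRIMARY KEY, content TEXT);
    """
)

# The two history rows and the one "active" TI from the example above.
conn.executemany(
    "INSERT INTO ti_history VALUES (?, ?, ?)",
    [
        ("f4578f24-1ecb-4ae1-b23d-2da43863cf80", NEW_ID, 1),
        ("7fdd63b8-b6da-4726-8aab-423612dc98b7", NEW_ID, 2),
    ],
)
conn.execute(
    "INSERT INTO ti VALUES (?, ?, ?)",
    ("87dc53d3-d46c-4e80-a3fe-e7a986879951", NEW_ID, 3),
)

# One note per logical task, keyed by NEW_ID rather than by a specific try.
conn.execute("INSERT INTO ti_note VALUES (?, ?)", (NEW_ID, "made-up note text"))

rows = conn.execute(
    """
    SELECT a.try_number, n.content
    FROM (SELECT new_id, try_number FROM ti_history
          UNION ALL
          SELECT new_id, try_number FROM ti) AS a
    JOIN ti_note AS n ON n.new_id = a.new_id
    ORDER BY a.try_number
    """
).fetchall()
print(rows)
```

   The same note comes back for tries 1, 2 and 3 without any FK needing to be 
rewritten when a new attempt starts.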


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

