kaxil commented on PR #67902:
URL: https://github.com/apache/airflow/pull/67902#issuecomment-4608326246

   @jroachgolf84 I read your PR and the AIP use-case you pointed to, and I 
agree it's a valid use-case. So you're right, especially the S3 use-case and 
watermarking. I'm also on board with keeping the store task-unaware.
   
   The one thing I'd flag so we don't over-correct: tasks do write the Asset 
Store too (a producer maintaining an asset's watermark across runs), and for 
those entries it's genuinely useful to be able to jump to which run set a 
value, e.g. when a watermark looks wrong. So I'd keep a per-entry "where was 
this written", just not as a task-only field.
   
   The problem with `last_updated_by_ti_id` isn't that it records a writer, 
it's that it can only record a task writer, so watcher watermarks (the headline 
case) come back NULL. The user-facing value is really a link to where the value 
was set: the run's logs for a task write, the triggerer logs for a watcher 
write, rather than a raw owner shown in a column. So if we record anything it 
should be writer-agnostic (a `kind`: task / watcher / api) so we can build the 
right link for each, instead of a task-only field that can't point anywhere for 
a watcher. Ownership itself stays where it belongs via the scope, asset-state 
on the asset and task-state on the TI, which already matches the UI layout.
   
   On storage, the asset store is long-lived (a watermark outlives the runs 
that touch it), so for the task case I'd lean to plain 
`dag_id/run_id/task_id/map_index` over a `task_instance` FK: an FK with `ON 
DELETE SET NULL` loses the link target the moment the run is cleaned up, while 
the plain fields still build it. That part's an implementation detail though, 
the user-facing thing is the link.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to