I was able to chat with a couple folks about this. Small sample, but the
sentiment was, "this is just a timeout".  In other words, if we're going to
call this SLA, we really ought to evaluate against the "this thing should
have run by" time and not the actual start time.  And, ideally, we should
also have a way to enforce "this should have run by X time daily" (for
example) even when it's a dataset-triggered or API-triggered dag with *no*
 schedule.

Like I said, it's a small number of folks I've talked to, so I don't have
overwhelming confidence about this assessment.  But I do think it's more
likely than not that this would be the prevailing assessment were somehow
able to get better data on this.

Reply via email to