Hi Daniel,

I wrote the current OTel spans approach and would like to share some context.

The spans for the dag_run and the tasks don't add much value on their own, because all of that info is already available in the web server. The main benefit is the ability to add sub-spans under tasks and monitor individual steps or external operations. That's why it was designed this way. You need the task span to stay active long enough to get its context and propagate it under the task. After that you can use the context to create sub-spans. If the parent span isn't active while you are creating the sub-span, the visual result won't make much sense.

Initially, I had an idea to create a short span at the beginning of the task and another short span at the end, but it doesn't look good when there are multiple sub-spans, and it's hard to figure out the order of the steps and their placement in the dag_run.

*Possible tree*

dag_run   [------------------------------------------]
task_1    [-----]
task_2           [----------------------------------]
task_2_1          [--------]
task_2_2                    [----]
task_2_3                          [----------------]

The implementation ended up being quite complex due to scheduler HA. Span objects can't be shared outside of the process that created them; they are thread local. One dag_run can run for so long that the scheduler that started processing it isn't the one that marks it as finished. The benefit of having an active span in each scheduler that keeps track of a dag_run is a clear picture of individual task steps and continuous observability.

I'm not saying it shouldn't be simplified if possible. When I contributed these changes, Airflow looked very different from what it is now. For example, the task-sdk didn't exist and tasks had direct DB access. If, for some reason, the task span ended and a new span was started, you could get the context_carrier directly from the DB and use it to make the new span the parent of any future sub-spans. That's not possible anymore.

I'm open to all ideas. I just wanted to explain the complexity.
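
To make the carrier idea concrete, here is a minimal sketch of why a small serializable carrier can cross a process boundary when the span object itself can't. It is a toy, stdlib-only model and not Airflow's or OpenTelemetry's code: the `Span` class and the `inject`, `extract` and `start_child` helpers are hypothetical names made up for illustration. The only real-world detail it borrows is the W3C Trace Context `traceparent` layout (`version-trace_id-parent_id-flags`). In OpenTelemetry Python the corresponding operations are a propagator's `inject` and `extract`.

```python
import json
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Span:
    """Toy stand-in for a span (hypothetical, not the OpenTelemetry class)."""
    name: str
    trace_id: str                      # 32 hex chars, shared by the whole trace
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))
    parent_id: Optional[str] = None    # span_id of the parent, if any


def inject(span: Span) -> Dict[str, str]:
    """Write the span's identity into a plain dict of strings (the carrier)."""
    return {"traceparent": f"00-{span.trace_id}-{span.span_id}-01"}


def extract(carrier: Dict[str, str]) -> Tuple[str, str]:
    """Read the trace id and parent span id back out of the carrier."""
    _version, trace_id, parent_id, _flags = carrier["traceparent"].split("-")
    return trace_id, parent_id


def start_child(name: str, carrier: Dict[str, str]) -> Span:
    """Create a sub-span whose parent is the span described by the carrier."""
    trace_id, parent_id = extract(carrier)
    return Span(name=name, trace_id=trace_id, parent_id=parent_id)


# "Scheduler" side: the task span exists only as an in-process object.
task_span = Span(name="task_2", trace_id=secrets.token_hex(16))

# Only the carrier is serialized, e.g. to be stored or handed to the task.
stored = json.dumps(inject(task_span))

# "Task" side: rebuild the carrier and hang sub-spans under the task span.
carrier = json.loads(stored)
sub_spans = [start_child(n, carrier) for n in ("task_2_1", "task_2_2", "task_2_3")]

for sub in sub_spans:
    print(sub.name, "parent:", sub.parent_id)
```

Every sub-span ends up with the task span's id as its parent and shares its trace id, which is what lets a tracing backend draw the sub-spans underneath `task_2` in a tree like the one above.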

Christos

On Tue, Feb 17, 2026 at 6:53 PM Daniel Standish <[email protected]> wrote:

> I should add a bit of detail....
>
> What we currently do is create a span when scheduler realizes a dag run is
> running. And then we store it in a dictionary. Then when we detect the
> dag run ends we check in the dictionary and close it if we can.
>
> Similar for tasks.
>
> But these spans have to deal with scheduler restarts, different schedulers
> handling the same dag run and tasks etc.
>
> I don't think it's really the way you're supposed to create spans. And I'm
> not one to say we always have to do things the "right" way. But I don't
> like the complexity it introduces and I don't see the benefit.
>
> So in the PR I rip out all of those "active spans" dictionaries and just
> create spans when it makes sense and maintain the parent child
> relationships so you can still see the full flow in the end.
>
> On Tue, Feb 17, 2026 at 8:45 AM Daniel Standish <[email protected]>
> wrote:
>
> > Hi I am looking at our OTEL stuff and I have reached the conclusion that
> > we should rework it so we don't jump through hoops to keep alive very long
> > spans for dag run and task.
> >
> > We should still have the spans, we just shouldn't jump through hoops to
> > ensure that their start and end times match those in the metastore.
> >
> > Indeed, that's information that's always already available!
> >
> > I do not claim to be an OTEL expert.
> >
> > But intuitively we can see that the current approach is very complicated
> > and confusing, and therein probably less reliable and certainly less
> > maintainable.
> >
> > You can see what I've done so far here:
> > https://github.com/apache/airflow/pull/61897
> >
> > Sycophantic though it may be, chat gpt seems to agree with me:
> > https://chatgpt.com/share/698fea02-fa18-8013-93f7-1e26215bc3f6
> >
> > And it makes sense -- just make spans that will automatically be closed
> > when your specific action is over.
> >
> > WDYT?
> >
> > Thanks
