Thanks Daniel! I'm going to run some tests to see how your patch behaves.

Christos

On Wed, Feb 18, 2026 at 2:52 AM Daniel Standish <[email protected]>
wrote:

> Thanks, Christos.
>
> Please take a look at the screenshots in my PR; they show what it looks
> like.
>
> I am able to use the parent dag run and task spans even down into the Task
> SDK.
>
> The only thing you don’t get is a green bar that lasts the whole time, but
> to me this doesn’t really matter.
>
> Please take a look when you have a chance. I think it’s a good approach
> with a very small trade-off.
>
> On Tue, Feb 17, 2026 at 9:37 AM Christos Bisias <[email protected]>
> wrote:
>
> > Hi Daniel,
> >
> > I was the author of the current OTel spans approach and I would like to
> > share some context.
> >
> > The spans for the dag_run and the tasks don't add much value on their own,
> > because all the info is already available in the web server. The main
> > benefit is the ability to add sub-spans under tasks and monitor individual
> > steps or external operations. That's why it was designed this way.
> >
> > You need the task span to stay active long enough to get its context and
> > propagate it down into the task. After that, you can use it to create
> > sub-spans. If the parent span isn't active while you are creating a
> > sub-span, the visual result won't make much sense.
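A toy, stdlib-only sketch of the propagation step described above. It does not use the OpenTelemetry API, and `Span`, `new_span`, `to_carrier` and `child_from_carrier` are hypothetical names. The task span's identifiers are serialized into a carrier in the W3C Trace Context `traceparent` format, and sub-spans created from that carrier join the same trace with the task span as their parent. It models only the ID linkage, not the timing.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Span:
    """Toy span holding only the IDs needed for parent/child links (not the OTel API)."""
    name: str
    trace_id: str                        # 32 hex chars (128 bits)
    span_id: str                         # 16 hex chars (64 bits)
    parent_span_id: Optional[str] = None


def new_span(name: str, parent: Optional[Span] = None) -> Span:
    # A root span starts a new trace; a child reuses its parent's trace_id.
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    parent_id = parent.span_id if parent else None
    return Span(name, trace_id, secrets.token_hex(8), parent_id)


def to_carrier(span: Span) -> Dict[str, str]:
    # W3C traceparent: version-trace_id-parent_id-flags ("01" = sampled).
    return {"traceparent": f"00-{span.trace_id}-{span.span_id}-01"}


def child_from_carrier(name: str, carrier: Dict[str, str]) -> Span:
    # Runs where only the carrier is available, e.g. inside the task process.
    _version, trace_id, parent_id, _flags = carrier["traceparent"].split("-")
    return Span(name, trace_id, secrets.token_hex(8), parent_id)


dag_run = new_span("dag_run")
task_2 = new_span("task_2", parent=dag_run)
carrier = to_carrier(task_2)                        # handed to the task
task_2_1 = child_from_carrier("task_2_1", carrier)  # created inside the task
```

With the real OpenTelemetry Python API, the serialize/deserialize steps are normally done by a propagator's `inject` and `extract` calls rather than by hand.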
> >
> > Initially, I had the idea of creating a short span at the beginning of the
> > task and another short span at the end, but it doesn't look good when there
> > are multiple sub-spans: it's hard to figure out the order of the steps and
> > where they fall within the dag_run.
> >
> > *Possible tree*
> > dag_run    [------------------------------------------]
> > task_1        [-----]
> > task_2                [----------------------------------]
> > task_2_1                [--------]
> > task_2_2                           [----]
> > task_2_3                                 [----------------]
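The nesting point can be checked mechanically. Below is a toy, stdlib-only sketch with made-up timestamps loosely following the tree above; `TimedSpan` and `outside_parent` are hypothetical names, not Airflow or OTel APIs. With a long-lived `task_2` span every sub-span nests inside it; if `task_2` is only a short span at the start of the task, its sub-spans fall outside their parent's window.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TimedSpan:
    name: str
    start: int                    # hypothetical time units since the dag_run started
    end: int
    parent: Optional[str] = None


def outside_parent(spans: List[TimedSpan]) -> List[str]:
    """Names of spans whose [start, end] is not contained in their parent's window."""
    by_name: Dict[str, TimedSpan] = {s.name: s for s in spans}
    bad = []
    for s in spans:
        p = by_name.get(s.parent) if s.parent else None
        if p is not None and not (p.start <= s.start and s.end <= p.end):
            bad.append(s.name)
    return bad


# Long-lived parent spans: every sub-span nests inside its parent.
long_lived = [
    TimedSpan("dag_run", 0, 100),
    TimedSpan("task_1", 5, 15, "dag_run"),
    TimedSpan("task_2", 20, 100, "dag_run"),
    TimedSpan("task_2_1", 22, 40, "task_2"),
    TimedSpan("task_2_2", 45, 55, "task_2"),
    TimedSpan("task_2_3", 60, 95, "task_2"),
]

# Same trace, but task_2 is only a short span at the start of the task.
short_task = [
    TimedSpan("task_2", 20, 21, "dag_run") if s.name == "task_2" else s
    for s in long_lived
]
```

Here `outside_parent(long_lived)` is empty, while `outside_parent(short_task)` lists all three `task_2_*` sub-spans.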
> >
> > The implementation ended up being quite complex due to scheduler HA. The
> > span objects can't be shared outside of the process that created them;
> > they are thread-local. One dag_run can run for so long that the scheduler
> > that started processing it isn't the one that marks it as finished. The
> > benefit of having an active span for each scheduler that keeps track of a
> > dag_run is that you get a clear picture of individual task steps and
> > continuous observability.
> >
> > I'm not saying it shouldn't be simplified if possible. When I contributed
> > these changes, Airflow looked very different from how it looks now. For
> > example, the task-sdk didn't exist and tasks had direct DB access. If, for
> > some reason, the task span ended and a new span was started, you could get
> > the context_carrier directly from the DB and use it to make the new span
> > the parent of any future sub-spans. That's not possible anymore.
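A stdlib-only sketch of the context_carrier idea: one process persists the task span's context as a string, and another process (a task with DB access, or a different or restarted scheduler) rebuilds the parent reference from the database instead of from the in-memory span object. The `task_instance` table layout and the helper names are hypothetical, and both "processes" are simulated in one script with an in-memory SQLite database.

```python
import secrets
import sqlite3
from typing import Tuple


def make_traceparent(trace_id: str, span_id: str) -> str:
    # W3C Trace Context format: version-trace_id-parent_id-flags ("01" = sampled).
    return f"00-{trace_id}-{span_id}-01"


def parse_traceparent(value: str) -> Tuple[str, str]:
    _version, trace_id, span_id, _flags = value.split("-")
    return trace_id, span_id


# Hypothetical table; both "processes" below share this in-memory database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE task_instance (id TEXT PRIMARY KEY, context_carrier TEXT)")

# Process A (the scheduler that started the task): persist the span context.
trace_id, task_span_id = secrets.token_hex(16), secrets.token_hex(8)
db.execute(
    "INSERT INTO task_instance (id, context_carrier) VALUES (?, ?)",
    ("ti_1", make_traceparent(trace_id, task_span_id)),
)
db.commit()

# Process B: it never held the span object, but it can rebuild the parent
# reference from the stored string and attach new sub-spans to it.
(stored,) = db.execute(
    "SELECT context_carrier FROM task_instance WHERE id = ?", ("ti_1",)
).fetchone()
parent_trace_id, parent_span_id = parse_traceparent(stored)
sub_span = {
    "name": "task_2_1",
    "trace_id": parent_trace_id,
    "span_id": secrets.token_hex(8),
    "parent_span_id": parent_span_id,
}
```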
> >
> > I'm open to all ideas. I just wanted to explain the complexity.
> >
> > Christos
> >
> > On Tue, Feb 17, 2026 at 6:53 PM Daniel Standish <[email protected]>
> > wrote:
> >
> > > I should add a bit of detail...
> > >
> > > What we currently do is create a span when the scheduler realizes a dag
> > > run is running, and then store it in a dictionary. When we detect that
> > > the dag run has ended, we look it up in the dictionary and close it if we
> > > can.
> > >
> > > We do the same for tasks.
> > >
> > > But these spans have to deal with scheduler restarts, different
> > > schedulers handling the same dag run and tasks, etc.
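A toy model of the "active spans dictionary" approach described above, showing why scheduler HA and restarts make it fragile: the dictionary lives in one process's memory, so a second scheduler (or the same one after a restart) cannot find the open span to close it. `SchedulerSpans` and its methods are hypothetical names, not the actual Airflow code.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Span:
    name: str
    start: float
    end: Optional[float] = None


class SchedulerSpans:
    """Toy model of one scheduler process keeping long-lived dag run spans in memory."""

    def __init__(self) -> None:
        self.active: Dict[str, Span] = {}  # run_id -> open span; lost on restart

    def on_dag_run_running(self, run_id: str) -> None:
        self.active[run_id] = Span(f"dag_run:{run_id}", start=time.time())

    def on_dag_run_finished(self, run_id: str) -> Optional[Span]:
        # Only the process that opened the span holds it, so only it can close it.
        span = self.active.pop(run_id, None)
        if span is not None:
            span.end = time.time()
        return span


scheduler_a = SchedulerSpans()
scheduler_b = SchedulerSpans()  # a second HA scheduler, or scheduler_a after a restart

scheduler_a.on_dag_run_running("run_1")
closed_by_b = scheduler_b.on_dag_run_finished("run_1")  # None: B never saw the span
closed_by_a = scheduler_a.on_dag_run_finished("run_1")  # the span, now with an end time
```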
> > >
> > > I don't think that's really the way you're supposed to create spans. And
> > > I'm not one to say we always have to do things the "right" way. But I
> > > don't like the complexity it introduces, and I don't see the benefit.
> > >
> > > So in the PR I rip out all of those "active spans" dictionaries and just
> > > create spans when it makes sense, maintaining the parent-child
> > > relationships so you can still see the full flow in the end.
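One possible way to keep parent-child links without long-lived in-memory spans, offered as an assumption rather than a description of what the PR does: derive the trace ID and the parent span ID deterministically from identifiers every process already knows (dag_id, run_id, task_id, try number), so any scheduler or worker can attach a new short span to the right parent. `derive_id`, `task_span` and the hashing scheme are hypothetical; storing the IDs in the metastore would be an alternative.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def derive_id(*parts: str, n_bytes: int) -> str:
    # Deterministic hex ID: any scheduler or worker computes the same value.
    return hashlib.sha256("/".join(parts).encode()).hexdigest()[: 2 * n_bytes]


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]


def dag_run_trace_id(dag_id: str, run_id: str) -> str:
    return derive_id(dag_id, run_id, n_bytes=16)  # 128-bit trace ID


def dag_run_span_id(dag_id: str, run_id: str) -> str:
    return derive_id(dag_id, run_id, "dag_run", n_bytes=8)  # 64-bit span ID


def task_span(dag_id: str, run_id: str, task_id: str, try_number: int) -> Span:
    # Created by whichever process handles the event; no shared in-memory state.
    return Span(
        name=f"task:{task_id}",
        trace_id=dag_run_trace_id(dag_id, run_id),
        span_id=derive_id(dag_id, run_id, task_id, str(try_number), n_bytes=8),
        parent_span_id=dag_run_span_id(dag_id, run_id),
    )


from_scheduler_a = task_span("etl", "run_1", "extract", 1)
from_scheduler_b = task_span("etl", "run_1", "extract", 1)  # same IDs, no coordination
```

For the link to resolve in a real tracing backend, the dag run span itself would also have to be exported with the same derived span ID; that detail is left out of this sketch.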
> > >
> > > On Tue, Feb 17, 2026 at 8:45 AM Daniel Standish <[email protected]>
> > > wrote:
> > >
> > > > Hi, I am looking at our OTel stuff and I have reached the conclusion
> > > > that we should rework it so we don't jump through hoops to keep very
> > > > long spans for dag runs and tasks alive.
> > > >
> > > > We should still have the spans; we just shouldn't jump through hoops to
> > > > ensure that their start and end times match those in the metastore.
> > > >
> > > > Indeed, that information is always already available there!
> > > >
> > > > I do not claim to be an OTel expert.
> > > >
> > > > But intuitively, we can see that the current approach is very
> > > > complicated and confusing, and therefore probably less reliable and
> > > > certainly less maintainable.
> > > >
> > > > You can see what I've done so far here:
> > > > https://github.com/apache/airflow/pull/61897
> > > >
> > > > Sycophantic though it may be, ChatGPT seems to agree with me:
> > > > https://chatgpt.com/share/698fea02-fa18-8013-93f7-1e26215bc3f6
> > > >
> > > > And it makes sense: just make spans that are closed automatically
> > > > when your specific action is over.
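A stdlib-only sketch of spans that are closed automatically when the action finishes, using a context manager so the span ends even if the action raises. `span` and `FINISHED` are hypothetical stand-ins for a tracer and an exporter. In the OpenTelemetry Python API, `tracer.start_as_current_span(...)` used in a `with` block behaves similarly: the span is ended when the block exits.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Span:
    name: str
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None
    error: Optional[str] = None


FINISHED: List[Span] = []  # stand-in for an exporter collecting ended spans


@contextmanager
def span(name: str) -> Iterator[Span]:
    s = Span(name)
    try:
        yield s
    except Exception as exc:
        s.error = repr(exc)  # record the failure, then let it propagate
        raise
    finally:
        s.end = time.monotonic()  # always closed when the block exits
        FINISHED.append(s)


with span("schedule_dag_run"):
    pass

try:
    with span("run_task"):
        raise RuntimeError("task failed")
except RuntimeError:
    pass
```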
> > > >
> > > > WDYT?
> > > >
> > > > Thanks
> > > >
> > >
> >
>
