Re: [DISCUSS] Airflow error codes (AERR) design

Jarek Potiuk Mon, 11 May 2026 07:33:31 -0700

1. Agree with the grouping idea. I think even originally, when you discussed
it, Omkar, there were some "groups" of exceptions.
AERR-DAG-NOTFOUND-BACKFILL seems like a more suitable short name than 0001,
provided it is descriptive enough for you to easily understand what each
error means. I would hate always having to look up the error code in a
table or YAML file. We could have such a table generated and in the docs, but
essentially after seeing enough logs you should know what the short code
means without memorizing the number. It's almost inhuman to force people to
associate numeric values with meaning.


2. I think a 1:1 mapping of exception to code would be good. While a short
error code is useful in logs, seeing the short name in the code when you
"raise" the exception is counterproductive because it adds noise to something
we already have: the Exception class name. On the other hand, such a class
name looks way worse in the logs.

3. *Idea:* Why don't we just keep the correct naming convention for our
Exceptions and map them into IDs automatically (e.g.,
AirflowDagNotFoundBackfillException -> AERR-DAG-NOT-FOUND-BACKFILL)? I
think it ticks all the boxes:

* 0 maintenance (just a hook to check if all exceptions follow the right
conventions)
* 0 mapping
* Code friendly
* Log friendly
* You see what you get by looking at either the exception class or ID
* We can build an exception hierarchy that allows us to catch several
exceptions (e.g., `AirflowDagNotFoundException` being an abstract
(non-instantiable) parent of AirflowDagNotFoundBackfillException and
AirflowDagNotFoundParsingException)
* Grouping works naturally and without conscious thought—in both exception
classes and IDs

Essentially, no SKILL is needed for that.
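
A minimal sketch of the automatic mapping described above, in Python. It
assumes every exception class is named Airflow<Words>Exception and that the
ID is the AERR prefix plus the upper-cased CamelCase words. The helper
`class_name_to_error_id`, the `_abstract` flag and the `error_id` method are
hypothetical names for illustration only, not anything in Airflow or in the
PR. Acronyms written in all caps (e.g. "DAG") would be split letter by
letter here, so the convention would need to require "Dag".

```python
import re

PREFIX = "Airflow"      # assumed class-name prefix
SUFFIX = "Exception"    # assumed class-name suffix


def class_name_to_error_id(name: str) -> str:
    """Derive an ID such as AERR-DAG-NOT-FOUND-BACKFILL from a class name.

    Raises ValueError when the name does not follow the assumed convention,
    so the same function can back a pre-commit style check.
    """
    if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
        raise ValueError(f"{name} must look like {PREFIX}<Words>{SUFFIX}")
    core = name[len(PREFIX):-len(SUFFIX)]
    words = re.findall(r"[A-Z][a-z0-9]*", core)
    # Reject empty cores and anything that is not plain CamelCase.
    if not words or "".join(words) != core:
        raise ValueError(f"{name} is not plain CamelCase between prefix and suffix")
    return "AERR-" + "-".join(word.upper() for word in words)


class AirflowDagNotFoundException(Exception):
    """Group parent: meant to be caught, never raised directly."""

    _abstract = True  # hypothetical marker, checked per class, not inherited

    def __init__(self, *args):
        # Only the class that declares _abstract itself is blocked.
        if type(self).__dict__.get("_abstract", False):
            raise TypeError(f"{type(self).__name__} is abstract")
        super().__init__(*args)

    @classmethod
    def error_id(cls) -> str:
        return class_name_to_error_id(cls.__name__)

    def __str__(self) -> str:
        # Log-friendly form: the ID first, then the original message.
        return f"[{self.error_id()}] {super().__str__()}"


class AirflowDagNotFoundBackfillException(AirflowDagNotFoundException):
    pass


class AirflowDagNotFoundParsingException(AirflowDagNotFoundException):
    pass


if __name__ == "__main__":
    try:
        raise AirflowDagNotFoundBackfillException("dag_id=example")
    except AirflowDagNotFoundException as err:  # catches the whole group
        print(err)
```

With this shape the raise site only names the class, the log line carries
the ID, and catching the parent covers both child exceptions.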

And BTW, I think none of our "coding" should really "require" using SKILLS
and "impair" those who do not use agents. Even though I'm known as an AI
and Agent enthusiast, we should avoid making standard code parts or
development workflows inaccessible to those who don't want to use agents,
especially if it's easy.

It's one thing to empower maintainers and contributors with SKILLS to
review or triage PRs if they want to or for someone doing translation to
add a new phrase in a language. However, it's a different story when
discussing basic "code" tasks, like adding new exceptions. Ideally, those
tasks should not **require** you to use Agents or be "difficult" without
them. We should totally respect people who choose not to use agents
themselves and ensure they do not feel like "lesser" people. Promoting
something and giving people new tools is one thing; making it a mandatory
part of the regular workflow when it isn't truly required is another.

J.



On Mon, May 11, 2026 at 3:30 PM Ash Berlin-Taylor <[email protected]> wrote:

> Maybe we should not have sequential IDs at all and do something similar to
> what SQLA does: https://sqlalche.me/e/20/xd2s for example (That’s
> `/e/<major><minor>/<code>` which redirects)
>
> Some of the example(?) errors are internal to a single component and never
> exposed to users, so shouldn’t be in the registry - AERR009/DagCodeNotFound
> for instance, is likely thrown by the ORM layer and caught by the API
> server, which is to say it is entirely invisible to the user? I imagine
> there are many more in this category.
>
>
> AERR010 and AERR011 are both DagNotFound, but 11 is specifically for
> "Requested DAG could not be found for backfill operation" — that seems very
> odd to have a different error code for that.
>
> We also have provider specific error codes in the main registry which
> isn’t a pattern that will work (`user_facing_error_message: Google Ads link
> not found for the specified property`) etc.
>
> -ash
>
>
> > On 11 May 2026, at 14:20, Ash Berlin-Taylor <[email protected]> wrote:
> >
> > If we do this (and I’m still not sure what I think overall) +1 to some
> > kind of grouping. Right now for instance the registry has AERR002 for
> > connection not found, but no space to add Variable not found, or State not
> > found in the future.
> >
> >> On 11 May 2026, at 12:25, Dev-iL <[email protected]> wrote:
> >>
> >> (please assume there's a "In my opinion, " prefix to every sentence)
> >>
> >> 0. Since the dev workflow is very structured, it can/should be made
> >> into a SKILL.
> >> 1. Long term yes, but while we refactor the existing code we should
> >> allow it (assuming it trip hooks or CI)
> >> 2. YAML seems suitable at first glance
> >> 3. One code per exception makes sense to me. Depending on how we want
> >> the exception taxonomy to evolve, perhaps we want to have codes like
> >> ###.### for "parent" and "subclass" exceptions, or Ruff-style #00 will
> >> be a family of similar exceptions.
> >>
> >>
> >> On Mon, 11 May 2026, 12:15 Omkar P, <[email protected]> wrote:
> >>
> >>> Hi team,
> >>>
> >>> Starting this thread to discuss the design of Airflow error codes.
> >>> These are LLM-friendly strings starting with AERR, which airflow devs
> >>> can use when raising exceptions, to convey the error context to dag
> >>> users in a succinct way. Providing current design details below.
> >>>
> >>> PR: https://github.com/apache/airflow/pull/65423
> >>>
> >>> Feature flow:
> >>> 1. airflow dev identifies error case and defines a new error code in
> >>> the error mapping yaml (say AERR002).
> >>> 2. dev then adds AirflowErrorCodeMixin to respective exception class
> >>> that they'd want to raise with an error_code.
> >>> 3. dev then specifies the error_code in raise in code (e.g. raise
> >>> AirflowNotFoundException(..., error_code="AERR002")).
> >>> 4. dev runs breeze build-docs that generates a new docs page
> >>> AERR002.rst
> >>> 5. breeze static check takes care of validating if error code is mapped
> >>> to correct exception class.
> >>>
> >>> User side:
> >>> On airflow users' side, they now see airflow error code as
> >>> part of the stack trace, which they can use for communicating problems
> >>> instead of pasting verbose stack traces. Error codes also improve
> >>> LLM-based discovery of airflow errors as codes are much more
> >>> deterministic/well-defined than plain stack traces.
> >>>
> >>> Open questions:
> >>> 1. Should the error code be mandatory for all raises of an exception
> >>> class that uses them?
> >>> 2. Where should the error code info be stored? Is a YAML-based registry
> >>> good enough?
> >>> 3. Shall we have a 1:1 mapping between an error code and exception
> >>> class? e.g. AirflowNotFoundException mapped only to AERR002 i.e. only
> >>> one error code. (current implementation in PR supports many to one
> >>> mapping, one exception class <-> multiple error codes based on
> >>> respective context).
> >>>
> >>> Look forward to your thoughts on above open questions or any other
> >>> design suggestions you'd like to add, thanks!
> >>>
> >>> Regards,
> >>> Omkar
> >>>
> >
>
>
