Re: [DISCUSS] Airflow error codes (AERR) design

Jens Scheffler Mon, 11 May 2026 14:30:48 -0700

Hi,

+1 to Jarek and Ash. While I generally like the idea, I'd favor _not_ needing a manual mapping in YAML and having no code lookup table.

I assume a 1:1 mapping from error code to exception is reasonable for 95% of cases. For the remaining 5%, it should be fairly easy to split exceptions or add a manual code for those special cases. But it would be great if the large majority needed zero maintenance. Automated mapping from Exception class to error code seems reasonable.

And for sure it would be very important to support Providers in general. If this exists only in core, it would be half-baked. Most exceptions in real life are hopefully generated in providers.


Jens

On 11.05.26 16:35, Jarek Potiuk wrote:

Ah ... and one good thing about the auto-mapping idea. You know that
saying: *The world is a slightly better place with every single line of
yaml removed or not even created in the first place.* This is almost
literally the quote from our "Monorepo" talk with Amogh in Talk Python To
Me :).

On Mon, May 11, 2026 at 4:33 PM Jarek Potiuk <[email protected]> wrote:

1. Agree with the grouping idea. I think even originally when you
discussed it Omkar - there were some "groups" of exceptions.
AERR-DAG-NOTFOUND-BACKFILL seems like a more suitable short name than 0001,
provided it is descriptive enough for you to easily understand what each
error means. I would hate always having to look up the error code in a
table or YAML file. We could have such a table generated and in docs, but
essentially after seeing enough logs you should know what the short code
means without memorizing the number. It's almost inhuman to force people to
associate numeric values with meaning.

2. I think 1-1 mapping exception to the code would be good. While a short
error code is useful in logs, seeing the short name in the code when you
"raise" them is counterproductive because it adds noise to something we
already have: the Exception Class name. On the other hand, such a class
name looks way worse in the logs.

3. *Idea:* Why don't we just keep the correct naming convention for our
Exceptions and map them into IDs automatically (e.g.,
AirflowDagNotFoundBackfillException -> AERR-DAG-NOT-FOUND-BACKFILL). I
think it ticks all the boxes:

* 0 maintenance (just a hook to check if all exceptions follow the right
conventions)
* 0 mapping
* Code friendly
* Log friendly
* You see what you get by looking at either the exception class or ID
* We can build an exception hierarchy that allows us to catch several
exceptions (e.g., `AirflowDagNotFoundException` being an abstract
(non-instantiable) parent of AirflowDagNotFoundBackfillException and
AirflowDagNotFoundParsingException, for example)
* Grouping works naturally and without conscious thought—in both exception
classes and IDs

Essentially, no SKILL is needed for that.
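
To make the idea concrete, here is a minimal sketch of the automatic mapping, assuming every exception is named `Airflow<CamelCaseWords>Exception`. The helper names `follows_convention` and `error_code_for` are hypothetical and are not taken from the PR. The parent class is not actually made non-instantiable here; it only illustrates grouping.

```python
import re

# Hypothetical convention: Airflow + one or more CamelCase words + Exception.
_NAME_RE = re.compile(r"^Airflow((?:[A-Z][a-z0-9]*)+)Exception$")


def follows_convention(exc_class: type) -> bool:
    """The kind of check a pre-commit hook could run on every exception class."""
    return _NAME_RE.match(exc_class.__name__) is not None


def error_code_for(exc_class: type) -> str:
    """Derive the ID from the class name; no YAML and no lookup table."""
    match = _NAME_RE.match(exc_class.__name__)
    if match is None:
        raise ValueError(f"{exc_class.__name__} does not follow the naming convention")
    # Split the CamelCase middle part into words and upper-case each one.
    words = re.findall(r"[A-Z][a-z0-9]*", match.group(1))
    return "-".join(["AERR", *(word.upper() for word in words)])


class AirflowDagNotFoundException(Exception):
    """Parent used for grouping and for broad `except` clauses."""


class AirflowDagNotFoundBackfillException(AirflowDagNotFoundException):
    pass


class AirflowDagNotFoundParsingException(AirflowDagNotFoundException):
    pass
```

With these definitions, `error_code_for(AirflowDagNotFoundBackfillException)` returns `AERR-DAG-NOT-FOUND-BACKFILL`, and the child IDs share the parent's `AERR-DAG-NOT-FOUND` prefix, so grouping falls out of the class names. Acronyms and other non-CamelCase names would need an extended pattern.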

And BTW. I think none of our "coding" should really "require" using
SKILLS and "impair" those who do not use agents. Even though I'm known as
an AI and Agent enthusiast, we should avoid making standard code parts or
development workflows inaccessible to those who don't want to use agents,
especially if it's easy.

It's one thing to empower maintainers and contributors with SKILLS to
review or triage PRs if they want to or for someone doing translation to
add a new phrase in a language. However, it's a different story when
discussing basic "code" tasks, like adding new exceptions. Ideally, those
tasks should not **require** you to use Agents or be "difficult" without
them. We should totally respect people who choose not to use agents
themselves and ensure they do not feel like "lesser" people. Promoting
something and giving people new tools is one thing; making it a mandatory
part of the regular workflow when it isn't truly required is another.

J.



On Mon, May 11, 2026 at 3:30 PM Ash Berlin-Taylor <[email protected]> wrote:

Maybe we should not have sequential IDs at all and do something similar
to what SQLA does: https://sqlalche.me/e/20/xd2s for example (That’s
`/e/<major><minor>/<code>` which redirects)
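
As a rough sketch of that URL shape only: the helper name `error_doc_url` and the base URL below are placeholders I made up for illustration, not an existing Airflow or SQLAlchemy API.

```python
def error_doc_url(version: str, code: str, base: str = "https://example.invalid") -> str:
    """Build a /e/<major><minor>/<code> style link from a version string and a short code."""
    # Only the first two version components are used, e.g. "3.1.0" -> "31".
    major, minor = version.split(".")[:2]
    return f"{base}/e/{major}{minor}/{code}"
```

A redirect service behind such a base URL could then point each versioned short code at the matching docs page.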

Some of the example(?) errors are internal to a single component and
never exposed to users, so shouldn’t be in the registry -
AERR009/DagCodeNotFound for instance, is likely thrown by the ORM layer and
caught by the API server, which is to say it is entirely invisible to the
user? I imagine there are many more in this category.


AERR010 and AERR011 are both DagNotFound, but 11 is specifically for
"Requested DAG could not be found for backfill operation" — that seems very
odd to have a different error code for that.

We also have provider specific error codes in the main registry which
isn’t a pattern that will work (`user_facing_error_message: Google Ads link
not found for the specified property`) etc.

-ash

On 11 May 2026, at 14:20, Ash Berlin-Taylor <[email protected]> wrote:

If we do this (and I’m still not sure what I think overall) +1 to some
kind of grouping. Right now for instance the registry has AERR002 for
connection not found, but no space to add Variable not found, or State not
found in the future.

On 11 May 2026, at 12:25, Dev-iL <[email protected]> wrote:

(please assume there's a "In my opinion, " prefix to every sentence)

0. Since the dev workflow is very structured, it can/should be made into a
SKILL.
1. Long term yes, but while we refactor the existing code we should allow
it (assuming it trip hooks or CI)
2. YAML seems suitable at first glance
3. One code per exception makes sense to me. Depending on how we want the
exception taxonomy to evolve, perhaps we want to have codes like ###.###
for "parent" and "subclass" exceptions, or Ruff-style #00 will be a family
of similar exceptions.


On Mon, 11 May 2026, 12:15 Omkar P, <[email protected]> wrote:

Hi team,

Starting this thread to discuss the design of Airflow error codes. These
are LLM-friendly strings starting with AERR, which airflow devs can use
when raising exceptions, to convey the error context to dag users in a
succinct way. Providing current design details below.

PR: https://github.com/apache/airflow/pull/65423

Feature flow:
1. airflow dev identifies error case and defines a new error code in the
error mapping yaml (say AERR002).
2. dev then adds AirflowErrorCodeMixin to respective exception class
that they'd want to raise with an error_code.
3. dev then specifies the error_code in raise in code (e.g. raise
AirflowNotFoundException(..., error_code="AERR002")).
4. dev runs breeze build-docs that generates a new docs page AERR002.rst
5. breeze static check takes care of validating if error code is mapped
to correct exception class.
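
For readers who have not opened the PR, steps 2, 3 and 5 could look roughly like the sketch below. This is an assumption-laden illustration, not the PR's code: only the names `AirflowErrorCodeMixin`, `AirflowNotFoundException` and the `error_code` keyword come from this thread. The `ERROR_REGISTRY` dict stands in for the YAML file, and `code_matches_class` is a hypothetical runtime stand-in for the static check.

```python
from typing import Optional

# Hypothetical stand-in for the YAML registry: error code -> exception class name.
ERROR_REGISTRY = {"AERR002": "AirflowNotFoundException"}


class AirflowErrorCodeMixin:
    """Sketch of a mixin that lets an exception carry an optional error code."""

    def __init__(self, *args, error_code: Optional[str] = None, **kwargs):
        super().__init__(*args, **kwargs)
        self.error_code = error_code

    def __str__(self) -> str:
        # Prefix the message with the code so it shows up in the stack trace.
        message = super().__str__()
        return f"[{self.error_code}] {message}" if self.error_code else message


class AirflowNotFoundException(AirflowErrorCodeMixin, Exception):
    pass


def code_matches_class(exc: BaseException) -> bool:
    """True if the exception has no code, or its code is registered for its class."""
    code = getattr(exc, "error_code", None)
    return code is None or ERROR_REGISTRY.get(code) == type(exc).__name__
```

Raising `AirflowNotFoundException("Connection not found", error_code="AERR002")` would then render as `[AERR002] Connection not found`, while raises without a code keep their plain message.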

User side:
On airflow users' side, they now see airflow error code as
part of the stack trace, which they can use for communicating problems
instead of pasting verbose stack traces. Error codes also improve
LLM-based discovery of airflow errors as codes are much more
deterministic/well-defined than plain stack traces.

Open questions:
1. Should the error code be mandatory for all raises of an exception
class that uses them?
2. Where should the error code info be stored? Is a YAML-based registry
good enough?
3. Shall we have a 1:1 mapping between an error code and exception
class? e.g. AirflowNotFoundException mapped only to AERR002 i.e. only one
error code. (current implementation in PR supports many to one mapping,
one exception class <-> multiple error codes based on respective context).

Look forward to your thoughts on above open questions or any other
design suggestions you'd like to add, thanks!

Regards,
Omkar


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
