1. Agree with the grouping idea. I think even originally, when you discussed it, Omkar, there were some "groups" of exceptions. AERR-DAG-NOTFOUND-BACKFILL seems like a more suitable short name than 0001, provided it is descriptive enough for you to easily understand what each error means. I would hate always having to look up the error code in a table or YAML file. We could have such a table generated and published in the docs, but essentially, after seeing enough logs, you should know what the short code means without memorizing a number. It's almost inhuman to force people to associate numeric values with meaning.
2. I think a 1-1 mapping of exception to code would be good. While a short error code is useful in logs, seeing the short name in the code when you "raise" the exception is counterproductive because it adds noise to something we already have: the exception class name. On the other hand, such a class name looks way worse in the logs.

3. *Idea:* Why don't we just keep the correct naming convention for our exceptions and map them into IDs automatically (e.g., AirflowDagNotFoundBackfillException -> AERR-DAG-NOT-FOUND-BACKFILL)? I think it ticks all the boxes:

* 0 maintenance (just a hook to check that all exceptions follow the right conventions)
* 0 mapping
* Code friendly
* Log friendly
* You see what you get by looking at either the exception class or the ID
* We can build an exception hierarchy that allows us to catch several exceptions (e.g., `AirflowDagNotFoundException` being an abstract (non-instantiable) parent of AirflowDagNotFoundBackfillException and AirflowDagNotFoundParsingException)
* Grouping works naturally and without conscious thought, in both exception classes and IDs

Essentially, no SKILL is needed for that.

And BTW, I think none of our "coding" should really "require" using SKILLS or "impair" those who do not use agents. Even though I'm known as an AI and Agent enthusiast, we should avoid making standard code parts or development workflows inaccessible to those who don't want to use agents, especially when that is easy to avoid. It's one thing to empower maintainers and contributors with SKILLS to review or triage PRs if they want to, or to help someone doing translation add a new phrase in a language. It's a different story when we talk about basic "code" tasks, like adding new exceptions. Ideally, those tasks should not **require** you to use agents or be "difficult" without them. We should totally respect people who choose not to use agents themselves and ensure they do not feel like "lesser" people.
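
To make the idea in point 3 a bit more concrete, here is a rough Python sketch (3.9+) of how an ID could be derived from the class name. Everything in it is an assumption for illustration only: the stand-in `AirflowException` base, the `error_id()` and `follows_convention()` helpers, and the `__init__` guard that makes the parent non-instantiable are hypothetical names, not existing Airflow code and not what the PR does.

```python
import re

# Assumed convention for this sketch: Airflow<CamelCaseWords>Exception.
NAME_PATTERN = re.compile(r"^Airflow((?:[A-Z][a-z0-9]*)*)Exception$")


def follows_convention(cls: type) -> bool:
    """What a pre-commit hook could check for every exception class."""
    return NAME_PATTERN.match(cls.__name__) is not None


class AirflowException(Exception):
    """Stand-in base class for this sketch, not the real Airflow one."""

    @classmethod
    def error_id(cls) -> str:
        """Derive the ID from the class name, e.g. AERR-DAG-NOT-FOUND-BACKFILL."""
        core = cls.__name__.removeprefix("Airflow").removesuffix("Exception")
        # Splits plain CamelCase words; acronyms like "DAG" would need extra care.
        words = re.findall(r"[A-Z][a-z0-9]*", core)
        return "-".join(["AERR", *(word.upper() for word in words)])

    def __str__(self) -> str:
        # Prefix the message so the ID shows up in logs and stack traces.
        return f"[{self.error_id()}] {super().__str__()}"


class AirflowDagNotFoundException(AirflowException):
    """Abstract parent: useful in `except`, never raised directly."""

    def __init__(self, *args: object) -> None:
        if type(self) is AirflowDagNotFoundException:
            raise TypeError("abstract exception; raise a concrete subclass")
        super().__init__(*args)


class AirflowDagNotFoundBackfillException(AirflowDagNotFoundException):
    pass


class AirflowDagNotFoundParsingException(AirflowDagNotFoundException):
    pass


try:
    raise AirflowDagNotFoundBackfillException("dag_id=example")
except AirflowDagNotFoundException as err:  # catches the whole group
    message = str(err)

print(message)
```

With this shape the ID of a child always starts with the ID of its parent (AERR-DAG-NOT-FOUND...), so the grouping in the logs falls out of the class hierarchy without any registry to maintain.
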
Promoting something and giving people new tools is one thing; making it a mandatory part of the regular workflow when it isn't truly required is another.

J.

On Mon, May 11, 2026 at 3:30 PM Ash Berlin-Taylor <[email protected]> wrote:

> Maybe we should not have sequential IDs at all and do something similar to
> what SQLA does: https://sqlalche.me/e/20/xd2s for example (That’s
> `/e/<major><minor>/<code>` which redirects)
>
> Some of the example(?) errors are internal to a single component and never
> exposed to users, so shouldn’t be in the registry - AERR009/DagCodeNotFound
> for instance, is likely thrown by the ORM layer and caught by the API
> server, which is to say it is entirely invisible to the user? I imagine
> there are many more in this category.
>
> AERR010 and AERR011 are both DagNotFound, but 11 is specifically for
> "Requested DAG could not be found for backfill operation" - that seems very
> odd to have a different error code for that.
>
> We also have provider specific error codes in the main registry which
> isn’t a pattern that will work (`user_facing_error_message: Google Ads link
> not found for the specified property`) etc.
>
> -ash
>
> > On 11 May 2026, at 14:20, Ash Berlin-Taylor <[email protected]> wrote:
> >
> > If we do this (and I’m still not sure what I think overall) +1 to some
> > kind of grouping. Right now for instance the registry has AERR002 for
> > connection not found, but no space to add Variable not found, or State not
> > found in the future.
> >
> >> On 11 May 2026, at 12:25, Dev-iL <[email protected]> wrote:
> >>
> >> (please assume there's a "In my opinion, " prefix to every sentence)
> >>
> >> 0. Since the dev workflow is very structured, it can/should be made into a
> >> SKILL.
> >> 1. Long term yes, but while we refactor the existing code we should allow
> >> it (assuming it trip hooks or CI)
> >> 2. YAML seems suitable at first glance
> >> 3. One code per exception makes sense to me. Depending on how we want the
> >> exception taxonomy to evolve, perhaps we want to have codes like ###.###
> >> for "parent" and "subclass" exceptions, or Ruff-style #00 will be a family
> >> of similar exceptions.
> >>
> >>
> >> On Mon, 11 May 2026, 12:15 Omkar P, <[email protected]> wrote:
> >>
> >>> Hi team,
> >>>
> >>> Starting this thread to discuss the design of Airflow error codes. These
> >>> are LLM-friendly strings starting with AERR, which airflow devs can use
> >>> when raising exceptions, to convey the error context to dag users in a
> >>> succinct way. Providing current design details below.
> >>>
> >>> PR: https://github.com/apache/airflow/pull/65423
> >>>
> >>> Feature flow:
> >>> 1. airflow dev identifies error case and defines a new error code in the
> >>> error mapping yaml (say AERR002).
> >>> 2. dev then adds AirflowErrorCodeMixin to respective exception class
> >>> that they'd want to raise with an error_code.
> >>> 3. dev then specifies the error_code in raise in code (e.g. raise
> >>> AirflowNotFoundException(..., error_code="AERR002")).
> >>> 4. dev runs breeze build-docs that generates a new docs page AERR002.rst
> >>> 5. breeze static check takes care of validating if error code is mapped
> >>> to correct exception class.
> >>>
> >>> User side:
> >>> On airflow users' side, they now see airflow error code as
> >>> part of the stack trace, which they can use for communicating problems
> >>> instead of pasting verbose stack traces. Error codes also improve
> >>> LLM-based discovery of airflow errors as codes are much more
> >>> deterministic/well-defined than plain stack traces.
> >>>
> >>> Open questions:
> >>> 1. Should the error code be mandatory for all raises of an exception
> >>> class that uses them?
> >>> 2. Where should the error code info be stored? Is a YAML-based registry
> >>> good enough?
> >>> 3. Shall we have a 1:1 mapping between an error code and exception
> >>> class? e.g. AirflowNotFoundException mapped only to AERR002 i.e. only one
> >>> error code. (current implementation in PR has supports many to one mapping,
> >>> one exception class <-> multiple error codes based on respective context).
> >>>
> >>> Look forward to your thoughts on above open questions or any other
> >>> design suggestions you'd like to add, thanks!
> >>>
> >>> Regards,
> >>> Omkar
