Thanks for your comments too, Jens.

   * Aggregate status of tasks in the upstream of same Dag (pass, fail,
     listing)

Does the DAG run page not show that?
Partly yes, but in our environment it is a bit more complex than "pass/fail". We want to know more details about the failed tasks and aggregate those details - so, at a high level: get the XCom from the failed tasks and then aggregate the details. Imagine all tasks have an owner and we want to send a notification to each owner, but if 10 tasks from one owner fail we want to send 1 notification listing the 10 failures in the text. And, yes, this can be done via the API.
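
For example, roughly like this (an untested sketch: the /api/v2 paths, the bearer-token header, the "error_details" XCom key and the notify() helper are all assumptions to be adapted to the actual deployment):

    from collections import defaultdict
    import requests

    BASE = "https://airflow.example.com/api/v2"    # adjust to your API server
    HEADERS = {"Authorization": "Bearer <token>"}  # however you obtain tokens

    def notify(owner: str, failures: list) -> None:
        ...  # hypothetical helper: send a mail/chat message to the owner

    def report_failures_per_owner(dag_id: str, run_id: str) -> None:
        # map task_id -> owner from the DAG's task list
        tasks = requests.get(f"{BASE}/dags/{dag_id}/tasks", headers=HEADERS).json()["tasks"]
        owner_of = {t["task_id"]: t.get("owner", "unknown") for t in tasks}

        # collect the failed task instances of this run
        tis = requests.get(
            f"{BASE}/dags/{dag_id}/dagRuns/{run_id}/taskInstances",
            params={"state": "failed"},
            headers=HEADERS,
        ).json()["task_instances"]

        by_owner = defaultdict(list)
        for ti in tis:
            # pull the details the task pushed as XCom before failing
            xcom = requests.get(
                f"{BASE}/dags/{dag_id}/dagRuns/{run_id}/taskInstances/"
                f"{ti['task_id']}/xcomEntries/error_details",
                headers=HEADERS,
            ).json().get("value")
            by_owner[owner_of.get(ti["task_id"], "unknown")].append(xcom)

        for owner, failures in by_owner.items():
            notify(owner, failures)  # one message per owner, e.g. "10 tasks failed"
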
   * Custom mass-triggering of other dags and collection of results from
    triggered dags as scale-out option for dynamic task mapping

Can't an API do that?
Yes, the API could do this with roughly 5 times more code, including handling the per-response limits where you need to loop over all pages until you have the full list (e.g. the API is limited to 100 results per page). Not impossible, but a lot of re-implementation.
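
Just to illustrate the boilerplate, a sketch of the paging loop (assuming the usual limit/offset query parameters and a 100-entry page cap; adjust to the actual endpoint and auth):

    import requests

    def list_all_dag_runs(base_url: str, headers: dict, dag_id: str) -> list[dict]:
        """Collect ALL runs of a DAG by walking limit/offset pages of the REST API."""
        runs, offset, limit = [], 0, 100
        while True:
            page = requests.get(
                f"{base_url}/dags/{dag_id}/dagRuns",
                params={"limit": limit, "offset": offset},
                headers=headers,
            ).json()
            runs.extend(page["dag_runs"])
            offset += limit
            if offset >= page["total_entries"]:
                break
        return runs
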
   * And the famous: Partial database clean on a per Dag level with
     different retention

Can you elaborate this one a bit :D

Yes. We have some Dags that are called 50k-100k times per day and others that are called 12 times a day, and a lot of others in between, e.g. 25k runs per month. For the Dag with 100k runs per day we want to archive ASAP, probably after 3 days, all runs that did not fail, to reduce DB overhead. The failed ones we keep for 14 days for potential re-processing in case there was an outage.

Most other Dag runs we keep for a month. And some we cap, archiving once there are more than 25k runs.

Might be something that could be a potential contribution to "airflow db clean".
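
Today we can only approximate this globally, e.g. a maintenance task shelling out to the CLI roughly like below (a sketch; one cutoff for everything, which is exactly why a per-Dag retention in "airflow db clean" would help):

    import subprocess
    from datetime import datetime, timedelta, timezone

    # single global retention - no way to say "3 days for Dag X, 30 days for Dag Y"
    cutoff = (datetime.now(timezone.utc) - timedelta(days=30)).isoformat()
    subprocess.run(
        [
            "airflow", "db", "clean",
            "--clean-before-timestamp", cutoff,
            "--tables", "dag_run,task_instance",
            "--yes",
        ],
        check=True,
    )
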


Thanks & Regards,
Amogh Desai


On Wed, Nov 5, 2025 at 3:12 AM Jens Scheffler <[email protected]> wrote:

Thanks Amogh for adding docs for migration hints.

We actually suffer from a lot of integrations that were built in the past,
which now makes migrating to version 3 hard and a serious effort. So most
probably we ourselves need to take option 2, knowing (like in the past)
that we cannot ask for support. But at least this un-blocks us from being
stuck on 2.x.

I'd love to take route 1 as well, but then a lot of code needs to be
rewritten. This will take time, and in the mid term we will migrate to (1).

As mentioned in the dev call, I'd love it if in Airflow 3.2 we could have
option 1 supported out-of-the-box - knowing that some security discussion
is implied, so it may need to be turned on explicitly and not be enabled
by default.

The use cases we have which require some kind of DB access, and where the
Task SDK does not help:

   * Adding task and dag run notes to tasks as a more readable status
     during and after execution
   * Aggregate status of tasks in the upstream of same Dag (pass, fail,
     listing)
   * Custom mass-triggering of other dags and collection of results from
     triggered dags as scale-out option for dynamic task mapping
   * Adjusting Pools based on available workers (see the sketch after
     this list)
   * Checking pass/fail results per Edge worker and, depending on
     stability, adjusting Queues on Edge workers based on the status and
     errors of the workers
   * Adjust Pools based on time of day
   * And the famous: Partial database clean on a per Dag level with
     different retention
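
For the Pool adjustments, something like the following against the stable REST API would probably already do (a rough sketch; base URL, auth and how the number of slots gets calculated are our own logic and left out here):

    import requests

    def resize_pool(base_url: str, headers: dict, pool_name: str, slots: int) -> None:
        # PATCH /pools/{pool_name} with an update_mask so only "slots" changes
        resp = requests.patch(
            f"{base_url}/pools/{pool_name}",
            params={"update_mask": "slots"},
            json={"slots": slots},
            headers=headers,
        )
        resp.raise_for_status()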

I would be okay with removing option 3, and a clear warning on option 2
is also okay.

Jens

On 11/4/25 13:06, Jarek Potiuk wrote:
My take (and details can be found in the discussion):

2. Don't give the impression that it is something we will support - and
explain to the users that it **WILL** break in the future and it's on
**THEM** to fix it when it breaks.

The 2 is **kinda** possible, but we should strongly discourage it and say
"this can break at any time and it's you who has to adapt to any future
changes in the schema". We had a lot of similar cases in the past where
our users felt entitled when **something** they saw as a "valid way of
using things" got broken by our changes. If we say "recommended" they
will take it as "all the usage there is expected to keep working when
Airflow gets a new version, so I am fully entitled to open a valid issue
when things change". I think "recommended" in this case is far too strong
from our side.

3. Absolutely remove.

Sounds like we are going back to Airflow 2 behaviour, and we've made all
the effort to break out of that. Various things will start breaking in
Airflow 3.2 and beyond. Once we complete the task isolation work, Airflow
workers will NOT have the sqlalchemy package installed by default - it
simply will not be a task-sdk dependency. The fact that you **can** use
sqlalchemy now is mostly a by-product of the fact that we have not
completed the split yet - but it was not even **SUPPOSED** to work.

J.



On Tue, Nov 4, 2025 at 10:03 AM Amogh Desai<[email protected]>
wrote:
Hi All,

I'm working on expanding the Airflow 3 upgrade documentation to address
a frequently asked question from users migrating from Airflow 2.x: "How
do I access the metadata database from my tasks now that direct database
access is blocked?"

Currently, Step 5 of the upgrade guide[1] only mentions that direct DB
access is blocked and points to a GitHub issue.
However, users need concrete guidance on migration options.

I've drafted documentation via [2] describing three approaches, but
before proceeding to finalise it, I'd like to get community consensus
on how we should present these options, especially given the
architectural principles we've established with Airflow 3.

## Proposed Approaches

Approach 1: Airflow Python Client (REST API)
- Uses `apache-airflow-client` [3] to interact via REST API
- Pros: No DB drivers needed, aligned with Airflow 3 architecture, API-first
- Cons: Requires package installation, API server dependency, auth token management, limited operations possible
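
For illustration, a minimal sketch of what this looks like (module paths follow the OpenAPI-generated client and may differ slightly between client versions; the auth setup depends on the deployment):

    import airflow_client.client as client
    from airflow_client.client.api import dag_run_api

    configuration = client.Configuration(host="https://airflow.example.com/api/v1")
    configuration.access_token = "<token>"  # or basic auth, depending on your setup

    with client.ApiClient(configuration) as api_client:
        api = dag_run_api.DAGRunApi(api_client)
        runs = api.get_dag_runs(dag_id="my_dag", limit=100)
        print(runs.total_entries)
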

Approach 2: Database Hooks (PostgresHook/MySqlHook)
- Create a connection to the metadata DB and use DB hooks to execute SQL directly
- Pros: Uses Airflow connection management, simple SQL interface
- Cons: Requires DB drivers, direct network access, bypasses Airflow API server and connects to DB directly
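
Illustratively (a sketch; "airflow_metadata_db" is a connection the user would define themselves, pointing at the metadata database):

    from airflow.providers.postgres.hooks.postgres import PostgresHook

    # the connection id is user-defined and must point at the metadata database
    hook = PostgresHook(postgres_conn_id="airflow_metadata_db")
    rows = hook.get_records(
        "SELECT dag_id, state, count(*) FROM dag_run GROUP BY dag_id, state"
    )
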

Approach 3: Direct SQLAlchemy Access (last resort)
- Use an environment variable with the DB connection string and create a SQLAlchemy session directly
- Pros: Maximum flexibility
- Cons: Bypasses all Airflow protections, schema coupling, manual connection management, worst possible option
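
For completeness, this is roughly what it looks like (a sketch; the environment variable name is just an example, and the schema is not a public interface and may change without notice):

    import os
    from sqlalchemy import create_engine, text

    # example variable holding the metadata DB URI - not an Airflow-defined name
    engine = create_engine(os.environ["AIRFLOW_METADATA_DB_URI"])
    with engine.connect() as conn:
        for row in conn.execute(text("SELECT dag_id, state FROM dag_run LIMIT 10")):
            print(row)
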

I was expecting some pushback regarding these approaches, and there were
(rightly) some important concerns raised by Jarek about Approaches 2 and 3:

1. Breaks Task Isolation - Contradicts Airflow 3's core promise
2. DB as Public Interface - Schema changes would require release notes
   and break user code
3. Performance Impact - Using Approach 2 creates direct DB access and can
   bring back Airflow 2's connection-per-task overhead
4. Security Model Violation - Contradicts documented isolation principles

Considering these comments, this is what I want to document now:

1. Approach 1 - Keep as the primary/recommended solution (aligns with the
   Airflow 3 architecture)
2. Approach 2 - Present as a "known workaround" (not a recommendation),
   with explicit warnings about breaking isolation, the schema not being
   a public API, performance implications, and no support guarantees
3. Approach 3 - Remove entirely, or keep with the strongest possible
   warnings (would love to hear what others think for this one
   particularly)

Once the discussion here converges, I would like to call for a lazy
consensus, for posterity and for visibility in the community.

Looking forward to your feedback!

[1] https://github.com/apache/airflow/blob/main/airflow-core/docs/installation/upgrading_to_airflow3.rst#step-5-review-custom-operators-for-direct-db-access
[2] https://github.com/apache/airflow/pull/57479
[3] https://github.com/apache/airflow-client-python

