Short reminder: About 10 hours left until I wind this discussion up and start the lazy consensus.
Thanks & Regards,
Amogh Desai

On Fri, Nov 7, 2025 at 12:58 PM Amogh Desai <[email protected]> wrote:

> I will be waiting for responses on this discussion until *Tue, Nov 11, 3:00 PM UTC* before calling a lazy consensus.
>
> So, if you have thoughts, feel free to chime in now :)
>
> Thanks & Regards,
> Amogh Desai
>
> On Fri, Nov 7, 2025 at 4:57 AM Buğra Öztürk <[email protected]> wrote:
>
>> Great initiative Amogh, thanks! I agree with others on 1, and on not encouraging 2 as well.
>>
>> The idea of filling the gaps by adding more endpoints would enable more automation in a secure environment in the long run. In addition, we could consider providing more granular cleanup/db functionality in the CLI too, where it could be automated on the server side with admin commands rather than from Dags. Just an idea.
>>
>> I hope we will add airflowctl there soon, of course with limited operations. 🤞
>>
>> Bugra Ozturk
>>
>> On Thu, 6 Nov 2025, 14:32 Amogh Desai, <[email protected]> wrote:
>>
>>> Looking for some more eyes on this one.
>>>
>>> Thanks & Regards,
>>> Amogh Desai
>>>
>>> On Thu, Nov 6, 2025 at 12:55 PM Amogh Desai <[email protected]> wrote:
>>>
>>>> > Yes, an API could do this, with 5 times more code, including the limits per response, where you need to loop over all pages until you have a full list (e.g. the API is limited to 100 results). Not impossible, but a lot of re-implementation.
>>>>
>>>> Just wondering, why not vanilla task mapping?
>>>>
>>>> > Might be something that could be a potential contribution to "airflow db clean"
>>>>
>>>> Maybe, yes.
>>>>
>>>> Thanks & Regards,
>>>> Amogh Desai
>>>>
>>>> On Thu, Nov 6, 2025 at 12:53 PM Amogh Desai <[email protected]> wrote:
>>>>
>>>>> > I think our efforts should be way more focused on adding some missing API calls in the Task SDK that our users miss, rather than on allowing them to use "old ways". Every time someone says "I cannot migrate because I did this", our first thought should be:
>>>>> >
>>>>> > * is it a valid way?
>>>>> > * is it acceptable to have an API call for it in the SDK?
>>>>> > * should we do it?
>>>>>
>>>>> That is currently a grey zone we need to define better, I think. Certain use cases might be general enough that we need an execution API endpoint for them, and we can certainly do that. But there will also be cases where the use case is niche and we will NOT want to have execution API endpoints for it, for various reasons. The harder problem to solve is the latter.
>>>>>
>>>>> But you make a fair point here.
>>>>>
>>>>> Thanks & Regards,
>>>>> Amogh Desai
>>>>>
>>>>> On Thu, Nov 6, 2025 at 2:33 AM Jens Scheffler <[email protected]> wrote:
>>>>>
>>>>>> > Thanks for your comments too, Jens.
>>>>>> >
>>>>>>>> * Aggregate status of tasks in the upstream of same Dag (pass, fail, listing)
>>>>>>>
>>>>>>> Does the DAG run page not show that?
>>>>>>
>>>>>> Partly yes, but in our environment it is a bit more complex than "pass/fail". It is a slightly longer story: we want to know more details of the failed tasks and aggregate those details. So, at a high level: get the XCom from the failed tasks and then aggregate the details. Imagine all tasks have an owner and we want to send a notification to each owner, but if 10 tasks from one owner fail, we want to send 1 notification with the 10 failures in the text. And, yes, it can be done via API.
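A minimal sketch of the per-owner aggregation Jens describes, assuming the failed task instances and their owners have already been collected via the REST API; the input shape and send_notification are hypothetical stand-ins, not anything the Task SDK or client provides:

    from collections import defaultdict

    def notify_owners(failed_tasks, send_notification):
        # failed_tasks: iterable of (task_id, owner) pairs, assumed to be
        # collected from the REST API beforehand. send_notification is a
        # hypothetical callable (mail, chat, ...) supplied by the caller.
        by_owner = defaultdict(list)
        for task_id, owner in failed_tasks:
            by_owner[owner].append(task_id)
        # One message per owner, however many of their tasks failed.
        for owner, task_ids in by_owner.items():
            send_notification(
                to=owner,
                subject=f"{len(task_ids)} task(s) failed",
                body="Failed tasks: " + ", ".join(sorted(task_ids)),
            )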
>>>>>>>> * Custom mass-triggering of other dags and collection of results from triggered dags as scale-out option for dynamic task mapping
>>>>>>>
>>>>>>> Can't an API do that?
>>>>>>
>>>>>> Yes, an API could do this, with 5 times more code, including the limits per response, where you need to loop over all pages until you have a full list (e.g. the API is limited to 100 results). Not impossible, but a lot of re-implementation.
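The page loop Jens refers to is short but easy to get subtly wrong; a sketch against the stable REST API, assuming a dagRuns-style endpoint that caps `limit` and reports `total_entries` in each response, with the base URL and bearer-token auth as placeholders for whatever your deployment uses:

    import requests

    def list_all_dag_runs(base_url, token, dag_id, page_size=100):
        # Keep requesting pages until total_entries runs are collected.
        runs, offset = [], 0
        while True:
            resp = requests.get(
                f"{base_url}/dags/{dag_id}/dagRuns",
                params={"limit": page_size, "offset": offset},
                headers={"Authorization": f"Bearer {token}"},
                timeout=30,
            )
            resp.raise_for_status()
            payload = resp.json()
            runs.extend(payload["dag_runs"])
            offset += page_size
            if offset >= payload["total_entries"]:
                return runs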
>>>>>>>> * And the famous: Partial database clean on a per-Dag level with different retention
>>>>>>>
>>>>>>> Can you elaborate this one a bit :D
>>>>>>
>>>>>> Yes. We have one Dag that is called 50k-100k times per day, others that are called 12 times a day, and a lot of others in between, like 25k runs per month. For the Dag with 100k runs per day, we want to archive ASAP, probably after 3 days, all runs that did not fail, to reduce DB overhead. The failed ones we keep for 14 days for potential re-processing in case there was an outage.
>>>>>>
>>>>>> Most other Dag Runs we keep for a month. And for some we cap it: we archive once there are more than 25k runs.
>>>>>>
>>>>>> Might be something that could be a potential contribution to "airflow db clean".
>>>>>>
>>>>>>> Thanks & Regards,
>>>>>>> Amogh Desai
>>>>>>>
>>>>>>> On Wed, Nov 5, 2025 at 3:12 AM Jens Scheffler <[email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks Amogh for adding docs for migration hints.
>>>>>>>>
>>>>>>>> We actually suffer from a lot of integrations that were built in the past, which now makes it a hard and serious effort to migrate to version 3. So most probably we ourselves need to take option 2, knowing (like in the past) that you cannot ask for support. But at least this un-blocks us from staying on 2.x.
>>>>>>>>
>>>>>>>> I'd love to take route 1 as well, but then a lot of code needs to be rewritten. This will take time, and in the mid term we will migrate to (1).
>>>>>>>>
>>>>>>>> As in the dev call, I'd love it if in Airflow 3.2 we could have option 1 supported out-of-the-box - knowing that some security discussion is implied, so it may need to be turned on and not enabled by default.
>>>>>>>>
>>>>>>>> The use cases we have which require some kind of DB access, and where the Task SDK is not helping with support:
>>>>>>>>
>>>>>>>> * Adding task and dag run notes to tasks as better readable status during and after execution
>>>>>>>> * Aggregate status of tasks in the upstream of same Dag (pass, fail, listing)
>>>>>>>> * Custom mass-triggering of other dags and collection of results from triggered dags as scale-out option for dynamic task mapping
>>>>>>>> * Adjusting Pools based on available workers
>>>>>>>> * Checking results of pass/fail per edge worker and, depending on stability, adjusting Queues on Edge workers based on status and errors of workers
>>>>>>>> * Adjust Pools based on time of day (see the sketch after this message)
>>>>>>>> * And the famous: Partial database clean on a per-Dag level with different retention
>>>>>>>>
>>>>>>>> I would be okay removing option 3, and a clear warning on option 2 is also okay.
>>>>>>>>
>>>>>>>> Jens
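For the two pool-related items in the list above, the stable REST API already covers the write path, so no DB access should be needed; a rough sketch, with the host, token, and pool name as placeholders:

    import requests

    def set_pool_slots(base_url, token, pool_name, slots):
        # PATCH only the slot count of an existing pool; update_mask
        # limits the change to the "slots" field.
        resp = requests.patch(
            f"{base_url}/pools/{pool_name}",
            params={"update_mask": "slots"},
            json={"slots": slots},
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()

    # e.g. shrink a hypothetical pool outside business hours:
    # set_pool_slots("http://localhost:8080/api/v1", token, "etl_pool", 16)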
>>>>>>>>
>>>>>>>> On 11/4/25 13:06, Jarek Potiuk wrote:
>>>>>>>>> My take (and details can be found in the discussion):
>>>>>>>>>
>>>>>>>>> 2. Don't give the impression it is something that we will support - and explain to the users that it **WILL** break in the future and it's on **THEM** to fix it when it breaks.
>>>>>>>>>
>>>>>>>>> The 2 is **kinda** possible, but we should strongly discourage it and say "this can break at any time and it's you who have to adapt to any future changes in the schema" - we had a lot of similar cases in the past where our users felt entitled when **something** they saw as a "valid way of using things" got broken by our changes. If we say "recommended", they will take it as "all the usage there is expected to keep working when Airflow gets a new version, so I am fully entitled to open a valid issue when things change". I think "recommended" in this case is far too strong from our side.
>>>>>>>>>
>>>>>>>>> 3. Absolutely remove.
>>>>>>>>>
>>>>>>>>> Sounds like we are going back to Airflow 2 behaviour, and we've made all the effort to break out of that. Various things will start breaking in Airflow 3.2 and beyond. Once we complete the task isolation work, Airflow workers will NOT have the sqlalchemy package installed by default - it simply will not be a task-sdk dependency. The fact that you **can** use sqlalchemy now is mostly a by-product of the fact that we have not completed the split yet - it was not even **SUPPOSED** to work.
>>>>>>>>>
>>>>>>>>> J.
>>>>>>>>>
>>>>>>>>> On Tue, Nov 4, 2025 at 10:03 AM Amogh Desai <[email protected]> wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> I'm working on expanding the Airflow 3 upgrade documentation to address a frequently asked question from users migrating from Airflow 2.x: "How do I access the metadata database from my tasks now that direct database access is blocked?"
>>>>>>>>>>
>>>>>>>>>> Currently, Step 5 of the upgrade guide [1] only mentions that direct DB access is blocked and points to a GitHub issue. However, users need concrete guidance on migration options.
>>>>>>>>>>
>>>>>>>>>> I've drafted documentation via [2] describing three approaches, but before finalising it, I'd like to get community consensus on how we should present these options, especially given the architectural principles we've established with Airflow 3.
>>>>>>>>>>
>>>>>>>>>> ## Proposed Approaches
>>>>>>>>>>
>>>>>>>>>> Approach 1: Airflow Python Client (REST API)
>>>>>>>>>> - Uses `apache-airflow-client` [3] to interact via the REST API
>>>>>>>>>> - Pros: No DB drivers needed, aligned with the Airflow 3 architecture, API-first
>>>>>>>>>> - Cons: Requires package installation, API server dependency, auth token management, limited operations possible
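To make Approach 1 concrete, usage of the generated client looks roughly like this, following the pattern in the apache-airflow-client README; exact module paths, the auth setup, and the API version in the host URL vary by client and Airflow version, so treat it as indicative:

    import airflow_client.client
    from airflow_client.client.api import dag_run_api

    # Host and token are placeholders; the auth mechanism depends on how
    # your API server is configured.
    configuration = airflow_client.client.Configuration(
        host="http://localhost:8080/api/v1",
    )
    configuration.access_token = "<token>"

    with airflow_client.client.ApiClient(configuration) as api_client:
        api = dag_run_api.DAGRunApi(api_client)
        # The same run state a task might previously have read straight
        # from the metadata database:
        run = api.get_dag_run(
            dag_id="my_dag",
            dag_run_id="manual__2025-11-04T00:00:00+00:00",
        )
        print(run.state)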
>>>>>>>>>> Approach 2: Database Hooks (PostgresHook/MySqlHook)
>>>>>>>>>> - Create a connection to the metadata DB and use DB hooks to execute SQL directly
>>>>>>>>>> - Pros: Uses Airflow connection management, simple SQL interface
>>>>>>>>>> - Cons: Requires DB drivers and direct network access; bypasses the Airflow API server and connects to the DB directly
>>>>>>>>>>
>>>>>>>>>> Approach 3: Direct SQLAlchemy Access (last resort)
>>>>>>>>>> - Use an environment variable with the DB connection string and create a SQLAlchemy session directly
>>>>>>>>>> - Pros: Maximum flexibility
>>>>>>>>>> - Cons: Bypasses all Airflow protections, schema coupling, manual connection management; the worst possible option
>>>>>>>>>>
>>>>>>>>>> I was expecting some pushback on these approaches, and there were (rightly) some important concerns raised by Jarek about Approaches 2 and 3:
>>>>>>>>>>
>>>>>>>>>> 1. Breaks Task Isolation - contradicts Airflow 3's core promise
>>>>>>>>>> 2. DB as Public Interface - schema changes would require release notes and would break user code
>>>>>>>>>> 3. Performance Impact - Approach 2 creates direct DB access and can bring back Airflow 2's connection-per-task overhead
>>>>>>>>>> 4. Security Model Violation - contradicts documented isolation principles
>>>>>>>>>>
>>>>>>>>>> Considering these comments, this is what I want to document now:
>>>>>>>>>>
>>>>>>>>>> 1. Approach 1 - Keep as the primary/recommended solution (aligns with the Airflow 3 architecture)
>>>>>>>>>> 2. Approach 2 - Present as a "known workaround" (not a recommendation) with explicit warnings about breaking isolation, the schema not being a public API, performance implications, and no support guarantees
>>>>>>>>>> 3. Approach 3 - Remove entirely, or keep with the strongest possible warnings (would love to hear what others think on this one in particular)
>>>>>>>>>>
>>>>>>>>>> Once we arrive at some discussion points on this one, I would like to call for a lazy consensus, for posterity and visibility in the community.
>>>>>>>>>>
>>>>>>>>>> Looking forward to your feedback!
>>>>>>>>>>
>>>>>>>>>> [1] https://github.com/apache/airflow/blob/main/airflow-core/docs/installation/upgrading_to_airflow3.rst#step-5-review-custom-operators-for-direct-db-access
>>>>>>>>>> [2] https://github.com/apache/airflow/pull/57479
>>>>>>>>>> [3] https://github.com/apache/airflow-client-python
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>> For additional commands, e-mail: [email protected]
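For completeness, the Approach 2 workaround discussed in this thread boils down to something like the following; the connection id is one the user has to create themselves, and, per the warnings above, the internal schema this queries can change in any release without notice:

    from airflow.providers.postgres.hooks.postgres import PostgresHook

    # WARNING: known workaround, not a supported interface. This bypasses
    # the API server and couples the task to Airflow's internal schema.
    hook = PostgresHook(postgres_conn_id="airflow_metadata_db")  # user-created connection
    rows = hook.get_records(
        # dag_run is an internal table; this query can break on upgrade.
        "SELECT dag_id, state, COUNT(*) FROM dag_run GROUP BY dag_id, state"
    )
    for dag_id, state, count in rows:
        print(dag_id, state, count)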
