Thank you, everyone, for the thoughtful discussions on this AIP. We have sent it out for voting.
Regards,
Pavan

On Tue, Jan 13, 2026 at 10:52 PM Pavankumar Gopidesu <[email protected]> wrote:

Thanks Alex,

I agree that evals will be a core part of the operator implementations. I haven’t yet fully thought through the structure or how best to expose and serve evals across operators, so your perspective is very timely. The idea of a BaseEvals operator is interesting as well.

Thank you for offering your support, we’ll definitely take you up on that. I’ll reach out when we move into implementation so we can collaborate on this.

Regards,
Pavan

On Tue, Jan 13, 2026 at 11:43 AM Alex <[email protected]> wrote:

Thanks Pavan, this thread and the AIP are awesome!

I've been starting to use and advocate an eval-first approach (including a lightning talk at the Airflow Summit [1]), not just for traditional software developers but also for new builders from other domains (so I can't just say "it's like TDD with integration tests for AI apps"), and I'd be happy to help build the evals for, test, design or brainstorm components in this space.

I firmly believe evals are a key area, and I'm starting to contact the MCP server pioneers I met at the last summit so we can experiment with building a testbed [2] to evaluate operators/agents/MCPs/skills.

Including a BaseEvals operator (which I believe differs from the goal of LLMDataQualityOperator) in the proposal might be worth it (unless the evals scope deserves its own place).

Any specific area where you'd like support?

Thanks,
Alex

- [1] https://alexhans.github.io/posts/talk.toward-a-shared-vision-of-llm-evals-in-airflow-ecosystem.html
- [2] https://github.com/Alexhans/evals-playground

On Thu, Jan 8, 2026 at 9:43 PM Pavankumar Gopidesu <[email protected]> wrote:

Thanks Niko, for reviewing.

For now I am moving the cyclicness implementation to future scope, maybe a new AIP to bring this in and rethink it.

Regards,
Pavan

On Wed, Jan 7, 2026 at 9:59 PM Oliveira, Niko <[email protected]> wrote:

I read through the AIP and I like the idea a lot! I see both sides of where to put the HITL portion. But I think that's something we can adjust one way or another (in an additive way), so if we find out that it's not the right fit later, we can pivot.

________________________________
From: Pavankumar Gopidesu <[email protected]>
Sent: Monday, January 5, 2026 9:08:00 AM
To: [email protected]
Subject: RE: [EXT] AI-Native Airflow - LLM-Driven Intelligence for Production Data Workflows

Yes Zoppi, as Kaxil mentioned, we will be using PydanticAI, and it provides nice interfaces to integrate validations etc.

Pavan

On Wed, Dec 31, 2025 at 2:28 AM Kaxil Naik <[email protected]> wrote:

Evals will be part of it as this will be built on top of PydanticAI that supports it.

On Mon, 29 Dec 2025 at 19:03, Giorgio Zoppi <[email protected]> wrote:

Hey Pavan.
If you are going to introduce this, have you thought about the evaluation framework?
How do you evaluate the LLM operator?

On Mon, Dec 29, 2025, 09:40 Pavankumar Gopidesu <[email protected]> wrote:

Thanks Jens and Jarek, agree on both points raised in comments.

I am happy to defer the embedding of the HITL to a separate AIP.

To Jens:
Yes, it's planned phase-wise; our plan starts with only provider changes.

Regards
Pavan

On Sun, Dec 28, 2025 at 2:03 PM Jarek Potiuk <[email protected]> wrote:

I also looked at it and I love it as well. I think of it as a missing abstraction between current Airflow users and current LLM app developers. I also proposed something a little bit bolder there, which I think shows the true potential of that approach.
I added a comment in the doc, but I will copy it here for better visibility.

---

After thinking quite a bit about the proposal, I actually love it and I think that should be the next frontier of making Airflow abstractions more approachable and usable by those who want to implement various patterns of interacting with LLMs.

And I have a little different opinion than Jens regarding HITL. I see those common LLM operators as slightly "higher" level operators that might implement a set of common LLM-related patterns that are currently either difficult or impossible to express by putting things together via Dag and individual tasks. In this sense, the capability of making a HITL call-out for approval or selection from within such an operator - without completing the operator, and even running those "call-outs" more than once, actually even an unbounded number of times during a single operator's execution.

Actually it's a great way for us to implement some "cyclicness" - without breaking the "acyclic" property of our Dags (for now at least). Making Dags "cyclic" is quite a dramatic change, and possibly we do not even have to do it, because the "cyclic" part can likely be encompassed within the specialized LLM operators. I can imagine an operator that performs LLM querying and refines the result via additional interactions with LLMs "internally" - during a single operator's execution. And some of those iterations might result in a HITL "call-out" - even multiple times during one execution.

Also one more proposal I have here is to use an API similar to HITL (or maybe repurpose HITL for that) to report PROGRESS of such a task. It is a typical property of a good LLM task that it provides some feedback to the user - it might be HITL when it asks for something, but it might also be HOOTL (Human Outside Of The Loop) - where the task is simply reporting its progress and allows the user to perform asynchronous actions based on that progress → for example abort the execution (to stop the Dag), or mark it as "skipped" (to trigger the skip processing path), or mark it as "success" to simulate things being completed when they are not. While we already have those three "async" operations, we do not currently have "progress" targeted at the kind of actor who is also a HITL "actor" - someone who is not interested in detailed logs, but rather wants to monitor progress and assess the quality of the output (even if it is just a partial output in the iterative process).

I think that it will be easier and much more "surgical" (and applied in the right place) to embed this "iterative" feedback / progress than to modify the "acyclic" property of our Dags.

Also - this kind of Progress interface can also be used to publish the "async" tasks' progress as the next step of [WIP] AIP-98: Add async support for PythonOperator in Airflow 3:
https://cwiki.apache.org/confluence/display/AIRFLOW/%5BWIP%5D+AIP-98%3A+Add+async+support+for+PythonOperator+in+Airflow+3
that we discussed with David.

J.

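The pattern Jarek describes (refining an LLM result inside a single task, with repeated HITL call-outs and progress reports along the way) can be sketched in plain Python. This is only an illustration under stated assumptions, not the AIP's design: `refine_until_approved`, `RefinementResult`, `generate`, `review` and `report_progress` are hypothetical names, no Airflow or PydanticAI API is used, and the loop is capped by `max_iterations` rather than left unbounded.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RefinementResult:
    """Outcome of one refinement run (hypothetical type)."""
    output: str
    iterations: int
    approved: bool
    progress_log: List[str] = field(default_factory=list)


def refine_until_approved(
    prompt: str,
    generate: Callable[[str, Optional[str]], str],
    review: Callable[[str], Optional[str]],
    report_progress: Callable[[str], None],
    max_iterations: int = 5,
) -> RefinementResult:
    """Iterate inside one task: generate, report progress, ask a reviewer.

    generate(prompt, feedback) stands in for an LLM call.
    review(draft) stands in for a HITL call-out: it returns None to
    approve, or a feedback string to request another iteration.
    report_progress(message) stands in for a one-way progress report.
    """
    log: List[str] = []
    feedback: Optional[str] = None
    draft = ""
    for i in range(1, max_iterations + 1):
        draft = generate(prompt, feedback)
        message = f"iteration {i}: produced {len(draft)} chars"
        log.append(message)
        report_progress(message)   # progress only, no answer expected
        feedback = review(draft)   # call-out, may happen several times
        if feedback is None:
            return RefinementResult(draft, i, True, log)
    # Iteration budget exhausted without approval.
    return RefinementResult(draft, max_iterations, False, log)


# Usage with stub callables in place of a real LLM and a real reviewer.
def fake_llm(prompt: str, feedback: Optional[str]) -> str:
    return prompt if feedback is None else f"{prompt} ({feedback})"


def fake_reviewer(draft: str) -> Optional[str]:
    return None if "limit 10" in draft else "add limit 10"


seen: List[str] = []
result = refine_until_approved("select emails", fake_llm, fake_reviewer, seen.append)
```

The "cycle" lives entirely inside one function call, so the surrounding Dag stays acyclic, which is the point made above.
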
On Sun, Dec 28, 2025 at 2:16 PM Jens Scheffler <[email protected]> wrote:

I like the AIP very much, and in my view it can be made completely in a Provider package... with some comments (I assume non-blocking), and I would propose to really start in increments and then adjust by learning on the path.

On 12/27/25 22:00, Pavankumar Gopidesu wrote:

Thanks Giorgio Zoppi, for reviewing the AIP. Yes, it's already planned as part of this AIP; see the example in [1], where you can disable the HITL step or enable it. So it's an integrated part of the operator with the help of the HITL operator.

```
LLMDataQualityOperator(
    task_id="customer_quality_analysis",
    data_sources=[customer_s3],
    prompt="Generate data quality validation queries",
    require_approval=True,  # Built-in HITL
    approval_timeout=timedelta(hours=2)
)
```

[1]: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406618285

Regards,
Pavan

On Sat, Dec 27, 2025 at 9:16 AM Giorgio Zoppi <[email protected]> wrote:

Hello,
Just 1c, skimming the AIP:
You might want to explore how to avoid human approval for generated queries by using an LLM as judge to evaluate the quality. The nice thing about data pipelines is automation.

On Wed, Dec 24, 2025, 10:23 Pavankumar Gopidesu <[email protected]> wrote:

Hello everyone,

The thread has been quiet for some time, and I would like to restart the discussion with the AIP.

First, a sincere thank you to Kaxil for presenting the idea at Airflow Summit 2025. The session was very well received, and many attendees expressed strong interest in the proposal. Unfortunately, I was unable to attend the summit due to visa issues, but I am hopeful I will be able to join next year.

The demo included well-structured prototypes. For those who were unable to attend the session, please refer to the recorded talk here [1].

I have also drafted the complete AIP proposal, which is available here [2]. I would greatly appreciate your reviews and look forward to feedback and further discussion.

Finally, to those celebrating Christmas, I wish you a very happy Christmas and a wonderful holiday season.

Regards
Pavan

[1] https://www.youtube.com/watch?v=XSAzSDVUi2o
[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406618285

On Wed, Oct 15, 2025 at 6:13 AM Amogh Desai <[email protected]> wrote:

Thanks Pavan and Kaxil, seems like an interesting idea and a pretty reasonable problem to solve.

I also like the idea of starting with `apache-airflow-providers-common-ai` and expanding as / when needed.

Looking forward to when the recording will be out, missed attending this session at the Airflow Summit.

Thanks & Regards,
Amogh Desai

On Thu, Oct 9, 2025 at 10:49 AM Kaxil Naik <[email protected]> wrote:

Yea I think it should be apache-airflow-providers-common-ai

On Wed, 8 Oct 2025 at 02:04, Pavankumar Gopidesu <[email protected]> wrote:

Yes, it's a new provider, starting out completely experimental; we don't want to break functionality in existing providers :)

Mostly it's SQL-based operators, so I named it sql-ai, but agree we can make it generic without specifying SQL in it :)

Pavan

On Tue, Oct 7, 2025 at 3:48 PM Ryan Hatter via dev <[email protected]> wrote:

Would this really necessitate a new provider? Should this just be baked into the common SQL provider?

Alternatively, instead of a narrow `sql-ai` provider, why not have a generic common ai provider with a SQL package, which would allow for us to build AI-based subpackages into the provider other than just SQL?

On Mon, Oct 6, 2025 at 4:31 PM Pavankumar Gopidesu <[email protected]> wrote:

@Giorgio Yes indeed, that's also a good thought to integrate. I will keep it in mind when I draft the AIP and message about this a bit more :)

Yes please join. We have great demos packed on this topic :)

@kaxil, yes, that's a great blog post from Wren AI, leveraging Apache DataFusion as a query engine to connect to different data sources.

Pavan

On Tue, Sep 30, 2025 at 7:37 PM Giorgio Zoppi <[email protected]> wrote:

Hey Pavan,
Some notes:
1. LLMs can also be very useful in detecting the root causes of your errors while developing and designing a pipeline. Let me explain: in the past we had several Spark processes; when everything is green it's fine, but when one fails, it would be nice to have an integrated tool to ask why.
2. Ideally such an operator could be a ModelContextProtocolOperator, and you would need nothing else than to pass an LLM as a parameter to that operator and just call for tools, execute queries, and so on. This would be more powerful, because you create an abstraction between devices, databases, servers and so on, so each source of data can be injected into the pipeline.
3. Good job! Looking forward to seeing the presentation.
Best Regards,
Giorgio

--
Life is a chess game - Anonymous.

On Tue, Sep 30, 2025 at 14:51, Pavankumar Gopidesu <[email protected]> wrote:

Hi everyone,

We're exploring adding LLM-powered SQL operators to Airflow and would love community input before writing an AIP.

The idea: Let users write natural language prompts like "find customers with missing emails" and have Airflow generate safe SQL queries with full context about your database schema, connections, and data sensitivity.

Why this matters:

Most of us spend too much time on schema drift detection and manual data quality checks. Meanwhile, AI agents are getting powerful but lack production-ready data integrations. Airflow could bridge this gap.

Here's what we're dealing with at Tavant:

Our team works with multiple data domain teams producing data in different formats and storage across S3, PostgreSQL, Iceberg, and Aurora. When data assets become available for consumption, we need:

- Detection of breaking schema changes between systems
- Data quality assessments between snapshots
- Validation that assets meet mandatory metadata requirements
- Lookup validation against existing data (comparing file feeds with different formats to existing data in Iceberg/Aurora)

This is exactly the type of work that LLMs could automate while maintaining governance.

What we're thinking:

```python
# Instead of writing complex SQL by hand...
quality_check = LLMSQLQueryOperator(
    task_id="find_data_issues",
    prompt="Find customers with invalid email formats and missing phone numbers",
    data_sources=[customer_asset],  # Airflow knows the schema automatically
    # Built-in safety: won't generate DROP/DELETE statements
)
```

The operator would:

- Auto-inject database schema, sample data, and connection details
- Generate safe SQL (blocks dangerous operations)
- Work across PostgreSQL, Snowflake, BigQuery with dialect awareness
- Support schema drift detection between systems
- Handle multi-cloud data via Apache DataFusion [1] (did some experiments with 50M+ records and results are in 10-15 seconds for common aggregations); for more info on benchmarks see [2]

Key benefit: Assets become smarter with structured metadata (schema, sensitivity, format) instead of just throwing everything in `extra`.

Implementation plan:

Start with a separate provider (`apache-airflow-providers-sql-ai`) so we can iterate without touching the Airflow core. No breaking changes, works with existing connections and hooks.

I am presenting this at Airflow Summit 2025 in Seattle with Kaxil - come see the live demo!

Next steps:

If this resonates after the Summit, we'll write a proper AIP with technical details and further build a working prototype.

Thoughts? Concerns? Better ideas?

[1]: https://datafusion.apache.org/
[2]: https://datafusion.apache.org/blog/2024/11/18/datafusion-fastest-single-node-parquet-clickbench/

Thanks,
Pavan

P.S. - Happy to share more technical details with anyone interested.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

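The "blocks dangerous operations" idea from the original proposal can be illustrated with a small read-only check on generated SQL. This is a rough sketch and not the provider's actual mechanism, which the thread does not specify: `is_read_only` and `FORBIDDEN_KEYWORDS` are hypothetical names, the keyword list is not exhaustive, and the check assumes standard SQL quoting (`''` escapes, no backslash escapes or dollar quoting). A real implementation would more likely rely on a dialect-aware SQL parser together with read-only database credentials.

```python
import re

# Hypothetical, non-exhaustive list of statement keywords to refuse.
FORBIDDEN_KEYWORDS = {
    "DROP", "DELETE", "TRUNCATE", "ALTER", "UPDATE",
    "INSERT", "MERGE", "CREATE", "GRANT", "REVOKE", "INTO",
}

# One left-to-right pass: string literals, line comments, block comments.
_LITERAL_OR_COMMENT = re.compile(r"'(?:[^']|'')*'|--[^\n]*|/\*.*?\*/", re.S)
_WORD = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b")


def is_read_only(sql: str) -> bool:
    """Return True only for a single SELECT/WITH statement without forbidden keywords."""
    # Blank out literals and comments so their contents are not read as keywords.
    cleaned = _LITERAL_OR_COMMENT.sub(" ", sql).strip()
    # Allow one optional trailing semicolon, but no second statement.
    cleaned = cleaned.rstrip(";").strip()
    if not cleaned or ";" in cleaned:
        return False
    words = [w.upper() for w in _WORD.findall(cleaned)]
    if not words or words[0] not in {"SELECT", "WITH"}:
        return False
    return FORBIDDEN_KEYWORDS.isdisjoint(words)
```

The check is deliberately conservative: it may reject harmless queries (for example, a quoted identifier named "update"), which seems the safer failure mode for LLM-generated SQL.
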
