In response to the other devlist thread on AIP 91 / MCP, I set out to build
a minimally viable example of what it could look like. I ended up building
a tool that would allow users to interact with Airflow via LLM/MCP in a way
that goes beyond just exposing the Airflow REST API. For example, a user
could say "please update the revenue dashboard", which would trigger a dag
(or set of dags via asset aware scheduling). IMO, it became too opinionated
for the main Airflow project (I'd be happy to be proven wrong here!), but I
think it's pretty cool. Not sure if a gif will work in the devlist but I'll
give it a try:

[image: demo.gif]

I'm really quite skeptical of LLMs beyond individual/personal use, but I
don't think it's out of the realm of possibility that they become better
and more capable of complex, code-based tasks. Hopefully this spurs some
ideas :)

Demo repo: https://github.com/RNHTTR/airflow-mcp-demo
Project repo with the gif in case it doesn't load in the dev list:
https://github.com/RNHTTR/MCPipeline?tab=readme-ov-file#demo

On Tue, Dec 2, 2025 at 9:05 AM Kunal Bhattacharya <[email protected]>
wrote:

> My experience with Gen AI code editors has been somewhat mixed as well. I
> have mostly used Windsurf, alternating between GPT-5 and Claude Sonnet 4.5
> models and I have felt it useful for:
> * Understanding pre-existing code, with some caveats: it tends to miss the
> more nitty-gritty details but gets things right overall
> * Writing repetitive code, like tests, when provided the exact framework
> such as a couple of reference test suites to mirror in terms of approach.
> In a particular scenario where I had to write 16 integration tests, I wrote
> 2 with all of the helper functions and asked Windsurf to replicate the same
> framework for the remaining 14 and it did a decent job. But even then, it
> seemed to go off the rails with simple tasks such as appending only to the
> end of the file and not the middle.
>
> In summary, for writing code, it seems to be useful only when you already
> know, hands-on, 80-90% of the exact changes required.
>
> On the other hand, when asked to do stuff where the solution is not
> entirely clear to me, e.g. updating the extensibility of a package that is
> being used in our codebase which would impact which functions to call,
> which objects to change for deprecations etc., it fails horribly. I also
> tried using it to resolve a service-to-service communication issue in our
> platform via configurations in the Helm chart, but it again started running
> in circles. So the R&D and trial-and-error stuff seems to stay with me while
> I can assign it the more mundane stuff, which is okay I guess :) I am
> interested to experiment with locally hosted LLMs to see if this changes my
> experience!
>
> On a lighter note, I would definitely be more concerned if it starts doing
> pinpoint R&Ds and suggesting accurate solutions across large codebases!
>
> Curious to know how others are using it!
>
> Regards,
> Kunal
>
>
>
> On Tue, Dec 2, 2025 at 2:55 PM Jarek Potiuk <[email protected]> wrote:
>
> > Thanks Dennis :). Hopefully with your message we will get back on track,
> > rather than being distracted with mailing list issues ;).
> >
> > Yeah, I have quite similar experiences - hopefully we can get this thread
> > going and others will chime in as well. I am not sure if the channel on
> > Slack is a good idea, so maybe let's continue here.
> >
> > One more comment. We recently had a discussion at the ASF members@ about
> > using AI for AF (including the guidelines I shared) - and of course
> people
> > have various concerns - from licensing, training AI on copyrighted
> > material, "dependency on big tech" etc. Valid concerns and we have some
> > very constructive discussions on how we can make a better use of AI
> > ourselves in a way that follows our principles.
> >
> > I personally think that, first of all, AI is of course overhyped, but it's
> > here to stay. I also see how the models can get optimised over time, start
> > fitting onto smaller hardware, and run locally - and eventually, while
> > some of the big players are trying to take over AI and monetise it, the
> > open-source world (maybe even the ASF building and releasing its own fully
> > open-source models) will win. Many of us don't remember (because they were
> > not born yet ;) ) - we saw this 30 years ago when open source was just
> > starting, when proprietary software was basically the only thing you could
> > get. Now 9X% of the software out there is open-source and, while
> > proprietary services are still out there, you can use most of the software
> > for free (for example - Airflow :D).
> >
> > I'd also love to hear from others how they are using AI now :). BTW, I
> > will be speaking in February at a new "grass-roots" conference in Poland,
> > https://post-software.intentee.com/ (run by two of my very young and
> > enthusiastic friends), where I will talk about our usage of AI (starting
> > with the UI translation project) - so I also have a very good reason to
> > ask you for feedback here :).
> >
> > J.
> >
> >
> >
> > On Mon, Dec 1, 2025 at 8:27 PM Jarek Potiuk <[email protected]> wrote:
> >
> > > > Hey, please remove me from this distribution list! Thanks!
> > >
> > > Hey - you can remove yourself following the description on
> > > https://airflow.apache.org/community/
> > >
> > >
> > > On Mon, Dec 1, 2025 at 8:05 PM Aaron Dantley <[email protected]>
> > > wrote:
> > >
> > >> Hey, please remove me from this distribution list! Thanks!
> > >>
> > >> On Mon, Dec 1, 2025 at 1:36 PM Ferruzzi, Dennis <[email protected]>
> > >> wrote:
> > >>
> > >> > I was hoping this thread would get more love so I could see how
> > >> > others are using it. I'm not using LLMs a whole lot for writing
> > >> > actual code right now; I don't find them all that intelligent. My
> > >> > experience feels more like having an overeager intern: the code isn't
> > >> > great, the "thinking" is pretty one-track - often retrying the same
> > >> > failed ideas multiple times - and it's often faster to just do it
> > >> > myself.
> > >> >
> > >> > I have tried things like:
> > >> >  - "Here is a python file I have made changes to, and the existing
> > >> > test file; do I still have coverage?" A dedicated tool like codecov
> > >> > is better for this, but I'm trying to give them a fair shot.
> > >> >  - "I just wrote a couple of functions; I need you to check for any
> > >> > missing type hints and generate the method docstrings following
> > >> > pydocstyle formatting rules and the formatting style of the existing
> > >> > methods." The docstrings then need to be reviewed, but they are
> > >> > usually pretty decent, and a dedicated linter is likely better at the
> > >> > hinting.
> > >> >
> > >> > - Summarizing existing code into plain English seems to work pretty
> > >> > well if you just want an overview of what a block of code is actually
> > >> > doing.
> > >> > - "Summarize this git diff into a 2-line PR description" usually
> > >> > results in a pretty reasonable starting point that just needs some
> > >> > tweaks.
> > >> >
> > >> > Parsing stack traces is, I think, the biggest thing that it actually
> > >> > does well; those things can get out of hand sometimes, and it can be
> > >> > handy to have the LLM parse one and give you a summary of the main
> > >> > issues (don't show me the internal calls of 3rd-party packages, etc.).
> > >> >
> > >> > I recently started giving Cline a try; it's a code-aware LLM that
> > >> > lives in your IDE and has access to any files in the current project.
> > >> > It's definitely better, but still not great IMHO. What I do like
> > >> > about that one is you can ask things like "where do we ACTUALLY write
> > >> > the serialized_dag to the database?" and "show me where we actually
> > >> > re-parse the dag bag", and it seems to be pretty good at tracing
> > >> > through the code to find that kind of thing, which has saved me a
> > >> > little time when poking at corners of the project I'm not as familiar
> > >> > with. But given my experience with them in the past and the
> > >> > complexity of the codebase, I never really trust that it finds all
> > >> > the references. For example, if it points to a line of code where we
> > >> > re-parse the dag bag, I can't trust that this is the **only** place
> > >> > we do that, so I may have to double-check its work anyway.
> > >> >
> > >> > Overall, I think Jarek hit the nail on the head with his comment
> > >> > that the key to using them right now is figuring out what they
> > >> > actually CAN do well and avoiding them for tasks where they are going
> > >> > to slow you down. It takes some trial and error to figure out where
> > >> > that line is, and new models and tools come out so fast that the line
> > >> > is constantly shifting.
> > >> >
> > >> >
> > >> >  - ferruzzi
> > >> >
> > >> >
> > >> > ________________________________
> > >> > From: Jarek Potiuk <[email protected]>
> > >> > Sent: Tuesday, November 11, 2025 3:21 AM
> > >> > To: [email protected]
> > >> > Subject: [EXT] Share your Gen-AI contributions ?
> > >> >
> > >> > Hello community,
> > >> >
> > >> > *TL;DR; I have a proposal that we share a bit more openly how we are
> > >> using
> > >> > Gen AI tooling to make us more productive. I thought about creating
> a
> > >> > dedicated #gen-ai-contribution-sharing channel in Slack for that
> > >> purpose*
> > >> >
> > >> > I've been using various Gen-AI tools and I am sure many of us do and
> > >> I've
> > >> > seen people sharing their experiences in various places - we also
> > >> shared it
> > >> > a bit here - our UI Translation project is largely based on AI
> helping
> > >> our
> > >> > translators to do the heavy-lifting. I also shared a few times how
> AI
> > >> > helped me to massively speed up work on fixing footers on our 250K
> > >> pages of
> > >> > documentation and - more recently - make sure our licensing in
> > packages
> > >> is
> > >> > compliant with ASF - but also I used Gen AI to generate some
> scripting
> > >> > tools (breeze ci upgrade and the check_translation_completness.py
> > >> script).
> > >> > Also, many of our contributors use various Gen AI tools to create
> > >> > their PRs. And I know a few of us use it to analyse stack traces and
> > >> > errors, and to explain how our code works.
> > >> >
> > >> > I thought there are two interesting aspects it would be great for us
> > >> > to learn from one another:
> > >> >
> > >> > 1) What kind of tooling you use and how it fits into the UX and
> > >> > developer experience (I used a number of things - from Copilot CLI
> > >> > and IDE integration to Copilot reviews and Agents). I found that the
> > >> > better integrated the tool is in your daily regular tasks, the more
> > >> > useful it is.
> > >> >
> > >> > 2) The recurring theme from all the Gen-AI discussions I hear is
> > >> > that it's most important to learn where Gen AI helps, and where it
> > >> > stands in the way:
> > >> > * in a few things I tried, I feel Gen AI makes me vastly more
> > >> > productive
> > >> > * in some of them, I feel the reviews, correction of mistakes, and
> > >> > generally iteration on it slow me down significantly
> > >> > * in some cases it is maybe not faster, but it takes a lot less
> > >> > mental energy, decision making, and mostly repetitive coding, so
> > >> > generally I feel happier
> > >> > * finally, there are cases (like the UI translation) that I would
> > >> > never even attempt because of the vast amount of mostly repetitive
> > >> > and generally boring work that would normally cause me to drop out
> > >> > very quickly and abandon it eventually
> > >> >
> > >> > I feel that we could learn from each other. For me, learning by
> > >> > example - especially an example in a project that you know well, so
> > >> > you can easily transplant the learnings to your own tasks - is the
> > >> > fastest and best way of learning.
> > >> >
> > >> > Finally - the Apache Software Foundation has official guidance on
> > >> > using AI to contribute code [1]. I think it is very well written, and
> > >> > it describes some boundary conditions where AI contributions are "OK"
> > >> > from the licensing and copyright point of view - largely to avoid big
> > >> > chunks of copyrightable code leaking from GPL-licensed training
> > >> > material. And while it does not have definitive answers, I think when
> > >> > we share our contributions openly we can discuss things like "is that
> > >> > copyrightable?", where is it coming from, etc. (Note that in many
> > >> > cases - when you generate large chunks of code - you can ask the LLM
> > >> > where the code comes from, and several of the LLM tools even
> > >> > immediately provide you the references to the sources of the code in
> > >> > such cases.)
> > >> >
> > >> > So my proposal is to create a *#gen-ai-contribution-sharing* channel
> > >> > in our Slack - where we will share our experiences from using AI, and
> > >> > ask when you have doubts about whether you can submit such code, etc.
> > >> >
> > >> > WDYT? Is it a good idea?
> > >> >
> > >> > [1] Generative Tooling Guidance by ASF:
> > >> > https://www.apache.org/legal/generative-tooling.html
> > >> >
> > >> > J.
> > >> >
> > >>
> > >
> >
>
