Thanks Dennis :). Hopefully with your message we will get back on track,
rather than being distracted by mailing list issues ;).

Yeah, I have quite similar experiences - hopefully we can get this thread
going and others will chime in as well. I am not sure if the channel on
Slack is a good idea, so maybe let's continue here.

One more comment. We recently had a discussion at the ASF members@ list
about using AI at the ASF (including the guidelines I shared) - and of
course people have various concerns - from licensing and training AI on
copyrighted material to "dependency on big tech". These are valid concerns,
and we had some very constructive discussions on how we can make better use
of AI ourselves in a way that follows our principles.

I personally think that, first of all, AI is of course overhyped, but it's
here to stay. I also see how the models get optimised over time, start
fitting into smaller hardware, and can run locally - and eventually, while
some of the big players are trying to take over AI and monetise it, the
open-source world (maybe even the ASF building and releasing its own fully
open-source models) will win. Many of us don't remember - because we were
not born yet ;) - but we've seen this 30 years ago when open source was
just starting, when proprietary software was basically the only thing you
could get. Now 9X% of the software out there is open-source, and while
proprietary services still exist, you can use most of the software for free
(for example - Airflow :D).

I'd also love to hear from others how they are using AI now :). BTW, in
February I will be speaking at a new "grass-roots" conference in Poland,
https://post-software.intentee.com/ (run by two of my very young and
enthusiastic friends), about our usage of AI (starting with the UI
translation project) - so I have a very good reason to ask you for
feedback here :).

J.



On Mon, Dec 1, 2025 at 8:27 PM Jarek Potiuk <[email protected]> wrote:

> > Hey, please remove me from this distribution list! Thanks!
>
> Hey - you can remove yourself following the description on
> https://airflow.apache.org/community/
>
>
> On Mon, Dec 1, 2025 at 8:05 PM Aaron Dantley <[email protected]>
> wrote:
>
>> Hey, please remove me from this distribution list! Thanks!
>>
>> On Mon, Dec 1, 2025 at 1:36 PM Ferruzzi, Dennis <[email protected]>
>> wrote:
>>
>> > I was hoping this thread would get more love so I could see how others
>> > are using it. I'm not using LLMs a whole lot for writing actual code
>> > right now; I don't find them all that intelligent. My experience feels
>> > more like having an overeager intern: the code isn't great, the
>> > "thinking" is pretty one-track - often retrying the same failed ideas
>> > multiple times - and it's often faster to just do it myself.
>> >
>> > I have tried things like:
>> >  - "Here is a Python file I have made changes to, and the existing
>> > test file - do I still have coverage?" A dedicated tool like codecov is
>> > better for this, but I'm trying to give them a fair shot.
>> >  - "I just wrote a couple of functions; I need you to check for any
>> > missing type hints and generate the method docstrings following
>> > pydocstyle formatting rules and the formatting style of the existing
>> > methods." The docstrings then need to be reviewed, but they are usually
>> > pretty decent, and a dedicated linter is likely better at the hinting.
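>> > As a concrete illustration of the kind of output being asked for above,
>> > here is a hedged, made-up example (not real Airflow code) of a
>> > type-hinted function with a pydocstyle-clean docstring:

```python
def retry_delay(attempt: int, base_seconds: float = 1.0) -> float:
    """Return the exponential backoff delay for a retry attempt.

    :param attempt: Zero-based retry attempt number.
    :param base_seconds: Delay before the first retry, in seconds.
    :return: The delay in seconds, ``base_seconds * 2 ** attempt``.
    """
    return base_seconds * 2 ** attempt
```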
>> >
>> > - Summarizing existing code into plain English seems to work pretty
>> > well if you just want an overview of what a block of code is actually
>> > doing.
>> > - "Summarize this git diff into a 2-line PR description" usually
>> > results in a pretty reasonable starting point that just needs some
>> > tweaks.
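>> > For what it's worth, that diff-summarizing workflow can be sketched
>> > roughly like this (the prompt wording and the helper names are my own
>> > assumptions, not any particular tool's API):

```python
import subprocess


def pr_summary_prompt(diff: str) -> str:
    """Wrap a git diff in a prompt asking an LLM for a 2-line PR description."""
    return (
        "Summarize this git diff into a 2-line PR description.\n"
        "Line 1: what changed. Line 2: why it changed.\n\n" + diff
    )


def diff_against(base: str = "main") -> str:
    """Get the diff between `base` and the working tree via git."""
    return subprocess.run(
        ["git", "diff", base], capture_output=True, text=True, check=True
    ).stdout
```

>> > The resulting prompt can then be pasted into whichever LLM chat or CLI
>> > you happen to use.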
>> >
>> > Parsing stack traces is, I think, the biggest thing that it actually
>> > does well; those things can get out of hand sometimes, and it can be
>> > handy to have the LLM parse them and give you a summary of the main
>> > issues (don't show me the internal calls of 3rd party packages, etc.).
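>> > The "hide the 3rd-party internals" part does not strictly need an LLM;
>> > here is a rough sketch of the same filtering in plain Python (the
>> > site-packages heuristic is an assumption, not how any real tool does it):

```python
def local_frames_only(tb_text: str) -> str:
    """Drop traceback frames that point into site-packages.

    Removes each 'File ...' line whose path contains 'site-packages',
    together with the source line printed right after it.
    """
    kept = []
    skip_next = False
    for line in tb_text.splitlines():
        if skip_next:
            skip_next = False
            continue
        if line.strip().startswith('File "') and "site-packages" in line:
            skip_next = True  # also drop the code line that follows
            continue
        kept.append(line)
    return "\n".join(kept)
```

>> > Feeding it the output of `traceback.format_exc()` gives you a trimmed
>> > trace before (or instead of) handing it to an LLM.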
>> >
>> > I recently started giving Cline a try; it's a code-aware LLM assistant
>> > that lives in your IDE and has access to any files in the current
>> > project. It's definitely better, but still not great IMHO. What I do
>> > like about that one is you can ask things like "where do we ACTUALLY
>> > write the serialized_dag to the database?" or "Show me where we
>> > actually re-parse the dag bag", and it seems to be pretty good at
>> > tracing through the code to find that kind of thing, which has saved me
>> > a little time when poking at corners of the project I'm not as familiar
>> > with. But given my experience with them in the past and the complexity
>> > of the codebase, I never really trust that it finds all the references.
>> > For example, if it points to a line of code where we re-parse the dag
>> > bag, I can't trust that this is the **only** place we do that, so I may
>> > have to double-check its work anyway.
>> >
>> > Overall, I think Jarek hit the nail on the head with his comment that
>> > the key to using them right now is figuring out what they actually CAN
>> > do well and avoiding them for tasks where they are going to slow you
>> > down. It takes some trial and error to figure out where that line is,
>> > and new models and tools come out so fast that the line is constantly
>> > shifting.
>> >
>> >
>> >  - ferruzzi
>> >
>> >
>> > ________________________________
>> > From: Jarek Potiuk <[email protected]>
>> > Sent: Tuesday, November 11, 2025 3:21 AM
>> > To: [email protected]
>> > Subject: [EXT] Share your Gen-AI contributions ?
>> >
>> >
>> >
>> > Hello community,
>> >
>> > *TL;DR: I have a proposal that we share a bit more openly how we are
>> > using Gen AI tooling to make us more productive. I thought about
>> > creating a dedicated #gen-ai-contribution-sharing channel in Slack for
>> > that purpose.*
>> >
>> > I've been using various Gen-AI tools, and I am sure many of us do -
>> > I've seen people sharing their experiences in various places, and we
>> > also shared a bit here: our UI Translation project is largely based on
>> > AI helping our translators do the heavy lifting. I also shared a few
>> > times how AI helped me massively speed up fixing footers on our 250K
>> > pages of documentation and - more recently - making sure our licensing
>> > in packages is compliant with ASF policy. I also used Gen AI to
>> > generate some scripting tools (breeze ci upgrade and the
>> > check_translation_completness.py script). Many of our contributors also
>> > use various Gen AI tools to create their PRs. And I know a few of us
>> > use it to analyse stack traces and errors, and to explain how our code
>> > works.
>> >
>> > I thought there are two interesting aspects where it would be great to
>> > learn from one another:
>> >
>> > 1) What kind of tooling you use and how it fits into the UX and
>> > developer experience. (I used a number of things - from Copilot CLI and
>> > IDE integration to Copilot reviews and Agents. I found that the better
>> > integrated the tool is with your daily, regular tasks, the more useful
>> > it is.)
>> >
>> > 2) The recurring theme from all the Gen-AI discussions I hear is that
>> > it's most important to learn where Gen AI helps and where it stands in
>> > the way:
>> > * in a few things I tried, I feel Gen AI makes me vastly more
>> > productive
>> > * in some of them, I feel the reviews, correction of mistakes, and
>> > general iteration on it slow me down significantly
>> > * in some cases it may not be faster, but it takes a lot less mental
>> > energy, decision making, and mostly repetitive coding, so generally I
>> > feel happier
>> > * finally, there are cases (like the UI translation) that I would never
>> > even attempt because of the vast amount of mostly repetitive and
>> > generally boring work that would normally cause me to drop out very
>> > quickly and abandon it eventually
>> >
>> > I feel that we could learn from each other. For me, learning by
>> > example - especially an example in a project that you know well, so you
>> > can easily transplant the learnings to your own tasks - is the fastest
>> > and best way of learning.
>> >
>> > Finally - The Apache Software Foundation has official guidance on
>> > using AI to contribute code [1]. I think it is very well written, and
>> > it describes some border conditions where AI contributions are "OK"
>> > from the licensing and copyright point of view - largely to avoid big
>> > chunks of copyrightable code leaking from GPL-licensed training
>> > material. And while it does not have definite answers, I think when we
>> > share our contributions openly, we can discuss things like "is that
>> > copyrightable?", where it is coming from, etc. (Note that in many
>> > cases - when you generate large chunks of code - you can ask the LLM
>> > where the code comes from, and several of the LLM tools even provide
>> > you with references to the sources of the code in such cases.)
>> >
>> > So my proposal is to create a #gen-ai-contribution-sharing channel in
>> > our Slack - where we will share our experiences using AI, ask when we
>> > have doubts about whether we can submit such code, etc.
>> >
>> > WDYT? Is it a good idea?
>> >
>> > [1] Generative Tooling Guidance by ASF:
>> > https://www.apache.org/legal/generative-tooling.html
>> >
>> > J.
>> >
>>
>
