On Mon, Feb 9, 2026 at 6:23 PM Matthew Brett via NumPy-Discussion <
[email protected]> wrote:

> Hi,
>
> I thought your (Ralf's) distinction was interesting, so here's some
> more reflection.  The distinction starts at:
>
> > we don't prescribe to others how they are and aren't allowed to
> contribute (to the extent possible)
>
> I think it's correct that it's not sensible for policies to reflect
> things like dislike of AI's use of energy or the environmental effects
> of AI data centers. However, it seems obvious to me that
> it is sensible for policies to take into account the effect of AI on
> learning.


Why would that be obvious? It seems incredibly presumptuous to decide for
other people what methods or tools they are or aren't allowed to use for
learning. We're not running a high school or university here.

At most we can provide docs and a happy path for some types of tools, but
that's about it. We cannot prescribe anything.

> But why the distinction?
>
> On reflection, it seems to me that policies should reflect only on the
> interests of the project, but those interests should be seen broadly,
> and include planning for future community and maintainers.  Thus,
> environmental concerns might well be important in general, but do not
> bear directly on the work of the project.  Therefore the project's
> managers have no mandate to act on that concern, at least without
> explicit consensus.   However, any sensible project should be thinking
> about the state of maintenance in 5 or 10 years.  Therefore, the
> project does have a potential mandate to prefer tools that will lead
> to better overall understanding, communication, community building, or
> code quality in the future.
>

This also presumes that you, or we, are able to determine what usage of AI
tools helps or hinders learning. That is not possible at the level of
individuals: people can learn in very different ways, plus it will strongly
depend on how the tools are used. And even in the aggregate it's not
practically possible: most of the studies that have been referenced in this
and the linked threads (a) are one-offs, and often inconsistent with each
other, and (b) already outdated, given how fast the field is developing.

It's easy to think of ways that using AI tools for contributing could help
with learning:

   - Simple time gain: once one has done the same thing a number of times
   and it becomes routine, automate it with AI so the contributor can spend
   more time focusing on learning about new topics.
   - Improved code quality and internal consistency from letting AI tools
   fix up and verify design rules (e.g., how type promotion is handled; see
   the short sketch after this list) will make it easier to learn the
   concepts from the code base in a more consistent fashion.
   - Use as a brainstorming tool to suggest multiple design options,
   broadening discovery.
   - We could ask AI tools to write internal design documentation, of the
   kind that only a handful of maintainers would be able to write (but
   almost never do, because we're too busy). There are important parts of the
   code base that have no documentation beyond some scattered code comments.
   - Give contributors feedback that the maintainers often don't have the
   time or interest to give, in a timely fashion or at all.
   - Writing throwaway prototypes of ideas for NumPy that would otherwise
   take too long to implement and would never get done, thereby letting us
   learn whether something is feasible at all, or a good idea.
   - Learning to use the AI tools themselves: this may well become an
   essential skill for most software-related roles in the near future.
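
To make the type promotion bullet concrete, here is a deliberately
simplified sketch (illustrative only, and not a statement of NumPy's actual
design rules) of the kind of invariant one could ask a tool to verify
across a code base:

    import numpy as np

    # One plausible promotion invariant: the result type should not
    # depend on the order of the arguments.
    assert np.result_type(np.int8, np.uint8) == np.result_type(np.uint8, np.int8)

    # Mixing signed and unsigned integers of the same width promotes to
    # a wider signed type, or to float64 when no wider integer exists:
    print(np.result_type(np.int8, np.uint8))    # int16
    print(np.result_type(np.int64, np.uint64))  # float64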

The same goes for the future community and new maintainers:

   - Current maintainers may enjoy both learning something new and
   automating the more tedious parts of maintenance, so they can focus on the
   more interesting parts. That will aid maintainer retention.
      - Ilhan's point is a great example here. He just finished a massive
      amount of work rewriting code from Fortran into C, and has now found
      that AI tools can be quite helpful in that endeavour (while six months
      ago they weren't). This work must have been extremely tedious (thanks
      again for biting that bullet, Ilhan), and it really wasn't fun to
      review either.
   - New contributors may default to working with these tools more often
   than not, and be turned off from contributing by rules that say they cannot
   use their default workflow.

I'm sure it's not hard to think of more along these lines, but I hope the
point is clear.

Cheers,
Ralf

> Cheers,
>
> Matthew
>
> On Sun, Feb 8, 2026 at 12:01 PM Ralf Gommers via NumPy-Discussion
> <[email protected]> wrote:
> >
> >
> >
> > On Sat, Feb 7, 2026 at 3:11 PM David Cournapeau via NumPy-Discussion <
> [email protected]> wrote:
> >>
> >>
> >>
> >> On Sat, Feb 7, 2026 at 2:50 AM Ilhan Polat via NumPy-Discussion <
> [email protected]> wrote:
> >>>
> >>> That's fantastic that you are working on it, David. A good high-level
> ARPACK is beneficial for all, and it may be better to re-map it to C if the
> accuracy is higher. We could maybe replace the translated C code with it.
> >>>
> >>> There are a few places where discussion has already taken place; a few
> of them are below, along with the references therein:
> >>>
> >>>
> https://discuss.scientific-python.org/t/a-policy-on-generative-ai-assisted-contributions/1702
> >>> https://github.com/scientific-python/summit-2025/issues/35
> >>>
> >>> I wish these models had been available when I was translating all that
> Fortran code, because now I can scan my previous work and find errors
> extremely quickly when I am hunting for bugs. In just a few months they
> leaped forward from the pointless "this code uses Fortran, let me compile
> with f2c, hihi" to "I ran it under valgrind and on line 760, the Fortran
> has an out-of-bounds access which seems to cause an issue, I'll fix the
> translated code". I think I wrote sufficient text in those sources, so I'll
> leave it to others, but regardless of the policy discussions, you have at
> least one customer looking forward to it.
> >>
> >>
> >> I missed that recent discussion, thanks. It seems to clarify the
> direction the NumPy community may follow, based on the SymPy policy.
> >
> >
> > I agree, this seems to be at least the majority view of both NumPy and
> SciPy maintainers, and it matches the high-level principles that a lot of
> well-known OSS projects are ending up with when they write down a policy.
> I'll copy the four principles from Stefan's blog post here:
> >
> > Be transparent
> > Take responsibility
> > Gain understanding
> > Honor copyright
> >
> > Adding the "we want to interact with other humans, not machines"
> principle more explicitly to that would indeed be good as well. LLVM's
> recently adopted policy (https://llvm.org/docs/AIToolPolicy.html) is
> another example that I like, with principles similar to the ones Stefan
> articulated and the SymPy policy.
> >
> > I'd add one principle here that doesn't need to be in a policy but is
> important for this discussion: we don't prescribe to others how they are
> and aren't allowed to contribute (to the extent possible). That means that
> arguments about the productivity gains of using any given tool, or other
> effects of using that tool, like a reduction in learning, the impact on
> society or the environment, etc., are - while quite interesting and important
> - not applicable to the question of "am I allowed to use tool X to
> contribute to NumPy or SciPy?". There are obviously better and worse ways
> to use any tool, but the responsibility for that lies with every individual.
> >
> > Re ARPACK rewrite: I think at this point I'd recommend steering clear of
> letting an LLM tool generate substantial algorithmic code - given the niche
> application, the copyright implications of doing that are pretty murky
> indeed. However, using an LLM tool to generate more unit tests given a
> specific criterion, or have it fill in stubbed out C code in the
> implementation for things like error handling, checking/fixing
> Py_DECREF'ing issues, adding the "create a Python extension module"
> boilerplate, and all such kinds of clearly non-copyrightable code seems
> perfectly fine to do. That just automates some of the tedious and fiddly
> parts of coding, without breaking any of the principles listed above.
> >
> > Cheers,
> > Ralf
> >
> >
>
>
>
> --
> This email is fully human-source.    Unless I'm quoting AI, I did not
> use AI for any text in this email.
>
_______________________________________________
NumPy-Discussion mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3//lists/numpy-discussion.python.org
Member address: [email protected]
