On Sat, Feb 7, 2026 at 3:11 PM David Cournapeau via NumPy-Discussion <
[email protected]> wrote:

>
>
> On Sat, Feb 7, 2026 at 2:50 AM Ilhan Polat via NumPy-Discussion <
> [email protected]> wrote:
>
>> That's fantastic that you are working on it, David. A good high-level
>> ARPACK implementation benefits everyone, and it may well be worth re-mapping it
>> to C if the accuracy is higher; we could then perhaps replace the translated C
>> code with it.
>>
>> There are a few places where discussion has already taken place; a few of
>> them are listed below, along with the references therein:
>>
>>
>> https://discuss.scientific-python.org/t/a-policy-on-generative-ai-assisted-contributions/1702
>> https://github.com/scientific-python/summit-2025/issues/35
>>
>> I wish these models had been available when I was translating all that
>> Fortran code, because now I can scan my previous work and find the errors
>> extremely quickly when I am hunting for bugs. In just a few months they
>> leaped forward from the pointless "this code uses Fortran, let me compile it
>> with f2c, hihi" to "I ran it under valgrind and on line 760 the Fortran
>> has an out-of-bounds access which seems to cause an issue; I'll fix the
>> translated code". I think I wrote sufficient text in those sources, so I'll
>> leave it to others, but regardless of the policy discussions, you have at
>> least one customer looking forward to it.
>>
>
> I missed that recent discussion, thanks. It seems to clarify the direction
> the NumPy community may follow, based on the SymPy policy.
>

I agree; this seems to be at least the majority view among NumPy and SciPy
maintainers, and it matches the high-level principles that a lot of
well-known OSS projects are ending up with when they write down a policy.
I'll copy the four principles from Stefan's blog post here:

   1. Be transparent
   2. Take responsibility
   3. Gain understanding
   4. Honor Copyright

Adding the "we want to interact with other humans, not machines" principle
more explicitly to that would indeed be good as well. LLVM's recently
adopted policy (https://llvm.org/docs/AIToolPolicy.html) is another example
that I like, with principles similar to the ones Stefan articulated and to
those in the SymPy policy.

I'd add one principle here that doesn't need to be in a policy but is
important for this discussion: we don't prescribe to others how they are
and aren't allowed to contribute (to the extent possible). That means that
arguments about the productivity gains of using any given tool, or about
other effects of using it, such as reduced learning or the impact on society
or the environment, are, while quite interesting and important, not
applicable to the question "am I allowed to use tool X to contribute to
NumPy or SciPy?". There are obviously better and worse ways to use any tool,
but the responsibility for that lies with each individual.

Re the ARPACK rewrite: I think at this point I'd recommend steering clear of
letting an LLM tool generate substantial algorithmic code; given the niche
application, the copyright implications of doing that are pretty murky
indeed. However, using an LLM tool to generate more unit tests for a
specific criterion, or to have it fill in stubbed-out C code in the
implementation for things like error handling, checking/fixing
Py_DECREF'ing issues, adding the "create a Python extension module"
boilerplate, and all such kinds of clearly non-copyrightable code, seems
perfectly fine to do. That just automates some of the tedious and fiddly
parts of coding, without breaking any of the principles listed above.
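
To make the "extension module boilerplate and Py_DECREF handling" part
concrete, here is a minimal sketch of the kind of code I mean; it is an
illustration only, not part of the ARPACK work, and the module and function
names ("toymod", "scale_list") are made up:

/* Minimal CPython extension-module boilerplate with explicit error
 * handling and reference counting. Hypothetical example for illustration. */
#define PY_SSIZE_T_CLEAN
#include <Python.h>

/* scale_list(seq, factor) -> new list with every item multiplied by factor */
static PyObject *
scale_list(PyObject *self, PyObject *args)
{
    PyObject *seq;
    double factor;

    if (!PyArg_ParseTuple(args, "Od", &seq, &factor)) {
        return NULL;  /* exception already set by PyArg_ParseTuple */
    }

    PyObject *fast = PySequence_Fast(seq, "expected a sequence");
    if (fast == NULL) {
        return NULL;
    }

    Py_ssize_t n = PySequence_Fast_GET_SIZE(fast);
    PyObject *result = PyList_New(n);
    if (result == NULL) {
        Py_DECREF(fast);
        return NULL;
    }

    for (Py_ssize_t i = 0; i < n; i++) {
        /* borrowed reference; must not be DECREF'ed */
        PyObject *item = PySequence_Fast_GET_ITEM(fast, i);
        double val = PyFloat_AsDouble(item);
        if (val == -1.0 && PyErr_Occurred()) {
            Py_DECREF(result);
            Py_DECREF(fast);
            return NULL;
        }
        PyObject *scaled = PyFloat_FromDouble(val * factor);
        if (scaled == NULL) {
            Py_DECREF(result);
            Py_DECREF(fast);
            return NULL;
        }
        PyList_SET_ITEM(result, i, scaled);  /* steals the reference */
    }

    Py_DECREF(fast);
    return result;
}

static PyMethodDef ToymodMethods[] = {
    {"scale_list", scale_list, METH_VARARGS,
     "Return a new list with every item multiplied by factor."},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef toymodmodule = {
    PyModuleDef_HEAD_INIT,
    "toymod",   /* module name */
    NULL,       /* no module docstring */
    -1,         /* no per-interpreter state */
    ToymodMethods
};

PyMODINIT_FUNC
PyInit_toymod(void)
{
    return PyModule_Create(&toymodmodule);
}

Every error path drops exactly the references it owns before returning NULL.
That is precisely the mechanical, easily-verified kind of code where having
an LLM fill in or check the stubs seems unproblematic to me.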

Cheers,
Ralf