Hi,

I thought your (Ralf's) distinction was interesting, so here's some
more reflection.  The distinction starts at:

> we don't prescribe to others how they are and aren't allowed to contribute 
> (to the extent possible)

I think it's correct that policies shouldn't reflect things like a
dislike of AI's energy use or the environmental effects of AI data
centers. However, it seems equally obvious to me that it is sensible
for policies to take into account the effect of AI on learning. So
why the distinction?

On reflection, it seems to me that policies should reflect only the
interests of the project, but those interests should be construed
broadly, including planning for the project's future community and
maintainers. Thus,
environmental concerns might well be important in general, but do not
bear directly on the work of the project.  Therefore the project's
managers have no mandate to act on that concern, at least without
explicit consensus.   However, any sensible project should be thinking
about the state of maintenance in 5 or 10 years.  Therefore, the
project does have a potential mandate to prefer tools that will lead
to better overall understanding, communication, community building, or
code quality in the future.

Cheers,

Matthew

On Sun, Feb 8, 2026 at 12:01 PM Ralf Gommers via NumPy-Discussion
<[email protected]> wrote:
>
>
>
> On Sat, Feb 7, 2026 at 3:11 PM David Cournapeau via NumPy-Discussion 
> <[email protected]> wrote:
>>
>>
>>
>> On Sat, Feb 7, 2026 at 2:50 AM Ilhan Polat via NumPy-Discussion 
>> <[email protected]> wrote:
>>>
>>> That's fantastic that you are working on it, David. A good high-level
>>> ARPACK implementation is beneficial for all, and it may be better to
>>> re-map it to C if the accuracy is higher. We could maybe replace the
>>> translated C code with it.
>>>
>>> There are a few places discussion took place already, a few of them below 
>>> and the references therein
>>>
>>> https://discuss.scientific-python.org/t/a-policy-on-generative-ai-assisted-contributions/1702
>>> https://github.com/scientific-python/summit-2025/issues/35
>>>
>>> I wish these models had been available when I was translating all that
>>> Fortran code, because now I can scan my previous work and find the
>>> errors extremely quickly when I am hunting for bugs. In just a few
>>> months they leaped forward from the pointless "this code uses Fortran,
>>> let me compile it with f2c, hihi" to "I ran it under valgrind and on
>>> line 760 the Fortran has an out-of-bounds access which seems to cause
>>> an issue; I'll fix the translated code". I think I wrote sufficient
>>> text in those sources, so I'll leave it to others; but regardless of
>>> the policy discussions, you have at least one customer looking forward
>>> to it.
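
[Inline note from me: for readers who haven't done this kind of
translation, one canonical bug class when mechanically translating
Fortran to C is the 1-based vs 0-based indexing mismatch. A minimal
sketch in C; hypothetical code, not from ARPACK:

    #include <stdlib.h>

    /* Naive mechanical translation of "DO I = 1, N: WORK(I) = 0.0D0"
       keeps the 1-based Fortran bounds and writes one past the end. */
    static void zero_work_buggy(double *work, int n)
    {
        for (int i = 1; i <= n; i++)
            work[i] = 0.0;    /* work[n] is out of bounds */
    }

    /* Correct 0-based translation. */
    static void zero_work(double *work, int n)
    {
        for (int i = 0; i < n; i++)
            work[i] = 0.0;
    }

    int main(void)
    {
        int n = 4;
        double *work = malloc(n * sizeof *work);
        zero_work_buggy(work, n);  /* valgrind: invalid write of size 8 */
        zero_work(work, n);        /* the fixed version stays in bounds */
        free(work);
        return 0;
    }

Running the buggy version under valgrind flags the invalid write, which
is exactly the kind of report Ilhan describes.]
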
>>
>>
>> I missed that recent discussion, thanks. It seems to clarify the
>> direction the NumPy community may follow, based on the SymPy policy.
>
>
> I agree, this seems to be at least the majority view of both NumPy and
> SciPy maintainers, as well as the high-level principles that a lot of
> well-known OSS projects are ending up with when they write down a policy.
> I'll copy the four principles from Stefan's blog post here:
>
> - Be transparent
> - Take responsibility
> - Gain understanding
> - Honor copyright
>
> Adding the "we want to interact with other humans, not machines" principle
> more explicitly to that would indeed be good as well. LLVM's recently
> adopted policy (https://llvm.org/docs/AIToolPolicy.html) is another example
> that I like, with principles similar to the ones Stefan articulated and to
> the SymPy policy.
>
> I'd add one principle here that doesn't need to be in a policy but is
> important for this discussion: we don't prescribe to others how they are and
> aren't allowed to contribute (to the extent possible). That means that
> arguments about the productivity gains of using any given tool, or about
> other effects of using it (a reduction in learning, the impact on society
> or the environment, etc.) are, while quite interesting and important, not
> applicable to the question of "am I allowed to use tool X to contribute to
> NumPy or SciPy?". There are obviously better and worse ways to use any
> tool, but the responsibility for that lies with each individual.
>
> Re the ARPACK rewrite: I think at this point I'd recommend steering clear
> of letting an LLM tool generate substantial algorithmic code; given the
> niche application, the copyright implications of doing that are pretty
> murky indeed. However, using an LLM tool to generate more unit tests given
> a specific criterion, or having it fill in stubbed-out C code in the
> implementation for things like error handling, checking/fixing Py_DECREF
> issues, or adding the "create a Python extension module" boilerplate (all
> clearly not copyrightable code) seems perfectly fine. That just automates
> some of the tedious and fiddly parts of coding, without breaking any of
> the principles listed above.
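
[Inline note from me: for concreteness, the sort of extension-module
boilerplate and Py_DECREF care described above looks roughly like
this. A minimal sketch; the module and function names are made up:

    #define PY_SSIZE_T_CLEAN
    #include <Python.h>

    /* Hypothetical function: return seq[0] scaled by a factor. The
       point is the refcount-careful error handling: every exit path
       releases what was acquired before it. */
    static PyObject *
    scale_first(PyObject *self, PyObject *args)
    {
        PyObject *seq, *item;
        double factor, value;

        if (!PyArg_ParseTuple(args, "Od", &seq, &factor))
            return NULL;                    /* exception already set */

        item = PySequence_GetItem(seq, 0);  /* new reference */
        if (item == NULL)
            return NULL;

        value = PyFloat_AsDouble(item);
        Py_DECREF(item);                    /* released on every path */
        if (value == -1.0 && PyErr_Occurred())
            return NULL;

        return PyFloat_FromDouble(value * factor);
    }

    static PyMethodDef example_methods[] = {
        {"scale_first", scale_first, METH_VARARGS,
         "Return seq[0] * factor."},
        {NULL, NULL, 0, NULL}
    };

    static struct PyModuleDef example_module = {
        PyModuleDef_HEAD_INIT, "example", NULL, -1, example_methods
    };

    PyMODINIT_FUNC
    PyInit_example(void)
    {
        return PyModule_Create(&example_module);
    }

None of this is creative; it has the same shape in every CPython
extension module, which is presumably what makes it the safe, tedious
territory to hand to a tool.]
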
>
> Cheers,
> Ralf



-- 
This email is fully human-source.    Unless I'm quoting AI, I did not
use AI for any text in this email.