Re: [QGIS-Developer] [Poll created] Re: "Human In The Loop" Policy For AI/Tool-Assisted Contributions

Vincent Picavet via QGIS-Developer Sat, 14 Feb 2026 01:49:22 -0800

Hello,

On 11/02/2026 16:03, Martin Dobias wrote:

> Hi all
>
> My perspective on AI tools is that they are just another tool in a developer's
> toolbox, just like a debugger or a code analyzer.


AI tools, and LLMs in particular, have very specific characteristics which are
important to take into account: cost, transparency, vendor lock-in, and social
and environmental impacts, to cite just a few aspects, are totally different
for LLMs than for other tools like a debugger or a code analyzer.

Having a developer-centric approach to tooling would most probably lead us to
choices that are not compatible with our global missions and views.

> There are lots of unknowns about the training data, I agree. But being
> conservative and adopting a strict no AI tools policy seems like restricting
> ourselves from using the best tools for the job. In the end, it is still the
> responsibility of the developer to ensure their contribution is correct, and
> not violating copyright, whether they use AI tools or not.


I have not been talking about a strict no-AI policy. I am saying that **code
generation** by LLMs poses an existential threat to open source projects like QGIS.

AI tools may be useful for tasks other than code generation.

Again, pushing the responsibility onto individual contributors would be very
hypocritical: there is **no way** for anyone to assess whether a contribution
generated by an LLM violates copyright. Putting responsibility on people without
giving them any means of action and verification is definitely not something we
should defend.

Quality and security are concerns that can be intrinsically assessed by
individuals; IP issues are not.

> It is also hard to draw a line for a conservative "no AI" policy:
> - is it acceptable to brainstorm design with AI?
> - is it acceptable to get a prototype built with AI, for inspiration?
> - is it acceptable to get AI to check code for bugs?
> - is it acceptable to ask AI to improve the tone of my reviews?
>
> With a strict no-AI policy, I guess we would also need to make sure that all
> of the 50+ dependencies also have a strict no-AI policy, otherwise QGIS builds
> could still in theory contain AI-generated copyrighted code? I am not sure
> that is realistic...


Again, who has been arguing for a no-AI policy? There are more balanced
policies which can be evaluated.

> Let's be pragmatic: AI tools are here to stay, we can either ignore them, or we
> can learn to use them responsibly to deliver even more QGIS goodness :-) And we
> can expect that with the increased risk of copyright issues, there will be
> automated tools to scan the code for possible copyright problems, integrated in
> CI, which will flag any risky contributions.


"AI tools are here to stay" is your own opinion, and an idea LLM companies are 
trying to convince everyone is a fact. Given the economics of data centers and LLM
companies, this may or may not be true at all. Or at least not at the current costs,
which are also a serious issue. To make a comparison, note that there was a time when
asbestos was here to stay too.

Also "we can expect that... " sounds like magical thinking, definitely not
something we can really count on.

We cannot ignore LLMs and AI though, and this is why this discussion is taking place.

Also, I saw an assertion about "fair use" being the default legal position in
the US right now. This is again what the "Magnificent Seven" want you to believe, but the
actual analysis for the US Congress is far from this statement:

https://www.congress.gov/crs_external_products/LSB/PDF/LSB10922/LSB10922.8.pdf

Last but not least, let me point to the latest development in "AI contributions to
open source":

https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/

We are living in interesting times…

Vincent


> Cheers
> Martin


On Tue, Feb 10, 2026 at 6:06 PM Vincent Picavet via QGIS-Developer 
<[email protected]> wrote:

    Hello,

    On 07/02/2026 00:30, Even Rouault wrote:
    > Thanks for your feedback. Yes copyright / IP issues are a tricky problem
    > to deal with and you make good points on how things could go wrong. That said, in
    > practice I believe most generated code would be mostly derived from QGIS itself,
    > Qt code or doc or be non-copyrightable material. I can

    I seriously doubt that, given the amount of copyrighted material that has
    already been used to train LLMs. And most importantly, we have no way to know
    that, unless the LLM companies get subpoenaed and disclose hidden practices
    when asked in court.

    Believing is not enough when talking about legal stuff.

    > imagine though that contributions including non-trivial algorithms could
    > possibly be infringing copyright. In that situation, the reviewers could ask the

    Even simpler code can be plagiarized, not only non-trivial algorithms.

    > submitter to bring more light on the provenance of such code (the
    > contributor may ask the LLM to dig for references for the provenance and check
    > they are OK with GPL2 inclusion, being aware that the LLM could hallucinate
    > them...), and if no satisfactory answer is given, reject the contribution. But
    > that can be admittedly hard to spot for reviewers. I'd be happy to amend the QEP
    > with that if someone can propose an adequate formulation.

    Again, a reviewer has no way to know or even suspect that code has been
    plagiarized: you would have to be aware of the full training dataset of the
    LLM, and this is 1. behind closed doors 2. not humanly possible.
    > I have doubts a "no AI" policy is achievable in practice, or people will lie. As you
    > mention, we can require people to mention the tool they have used, possibly the prompt(s) they used,
    > and which manual modifications they applied on top of that. Doesn't the paragraph starting at line 35
    > (https://github.com/qgis/QGIS-Enhancement-Proposals/pull/363/changes#diff-4f4102e51f04fdfc82e843c6942abe9965c03ac85a92e9becf21bcca8b5571adR35)
    > sufficiently cover your point about "have a mandatory mention and description of LLM usage for each
    > contribution"?

    Lying about not using AI is just like lying about being sure one has the
    right to contribute the code (see the contributor's agreement): if this is a
    rule, and people lie when we ask them, then we should also have sanctions and be
    strict about it.

    For me, full transparency about the usage of a black box **for code generation**
    is not enough as a protection against legal issues. For anything else that can
    be AI-aided, this is more about resilience, transparency and trust, and we
    should make it clear that explaining exactly how AI has been used is mandatory
    and must be provided with every contribution.

    > The main driver for this QEP was to give us a tool to be able to quickly
    > reject sloppy contributions with a solid reference to back our decisions, but we
    > must indeed decide whether we go further than this.

    I guess there is room for debate and a lot of uncertainty. Hence the
    conservative approach, to avoid the worst and maybe open things up later on, once
    we see the situation more clearly.

    > For that purpose, I've created a quick poll at
    > https://docs.google.com/forms/d/e/1FAIpQLSdnVWoD5DrwCbNXqPqHsLw2jfbLkPMKBkvfyQfTQOPZkj_EaQ/viewform
    > so we can gather opinions on the general direction we want on that subject. All,
    > please fill it in!

    OK to gather more opinions, thanks for running the poll,


    Vincent

    >
    > Even
    >
    > On 06/02/2026 at 18:01, Vincent Picavet via QGIS-Developer wrote:
    >> Hi,
    >>
    >> I would double down on Greg Troxel's advice concerning copyright issues,
    >> especially concerning the introduction of LLM-generated code into the QGIS codebase.
    >>
    >> Open source's success is based on these main characteristics: quality,
    >> security, trust.
    >>
    >> AI contributions pose a threat to quality, security and trust alike.
    >>
    >> A human-in-the-loop policy for contributions written with AI may help
    >> with quality and security issues, but will still leave a huge problem for trust.
    >>
    >> Among the various aspects of trust, what worries me most right now is
    >> the copyright issue. Open source software is based on intellectual property laws, and
    >> especially on copyright, to be able to derive copyleft and grant more rights to
    >> end-users.
    >>
    >> End-users trust open source software from a legal point of view because:
    >>
    >> - it is backed by well-established copyright laws
    >>
    >> - it has clear and well-established end-user contracts (open source
    >> licences)
    >>
    >> - it has a full record of modifications of the source code, hence a
    >> full lineage and certification of IP rights for the code
    >>
    >> - also, foundations like OSGeo additionally put a stamp on the software
    >> to guarantee that the process and initial IP can be trusted enough to provide legal
    >> assurance concerning the software
    >>
    >> Introducing AI black boxes into the development process breaks the
    >> ability to control the lineage of the code and guarantee that it is a genuine
    >> invention, and therefore allowed to be licensed under the GPL.
    >>
    >> For quality and security, a developer can always intrinsically assess
    >> that the generated code has the required level of quality, and that it does not
    >> include any security flaw.
    >>
    >> But **there is no way for a developer to evaluate the IP rights on
    >> code generated by an LLM**. How would one do it, since the code has been generated
    >> through a totally opaque black box ingesting enormous volumes of unidentified data?
    >>
    >> Today, we definitely know that LLMs (ChatGPT, Claude and others) have
    >> been trained on illegally obtained copyrighted material. It is proven that they trained LLMs on
    >> pirated books. Furthermore, every time someone complains about IP violation by an LLM,
    >> big corporations reach a financial settlement with the copyright owners and move on.
    >>
    >> There is therefore no doubt that they have also trained LLMs on
    >> proprietary code. And also on open source code whose licence is not compatible with GPLv2+.
    >>
    >> Big corporations currently hide behind a "fair use" argument, but this is
    >> clearly rubbish, otherwise why would they bother to settle large financial deals with copyright
    >> owners?
    >>
    >> So, LLM-generated code contributed to QGIS will at some point be
    >> plagiarized from random code available on the internet, and neither QGIS.org nor the
    >> contributor will be able to know.
    >>
    >> If we start accepting such code without being able to check provenance
    >> or copyright issues, it will end up buried deep inside QGIS, and the day we
    >> discover that it infringes copyright, it will be a nightmare to solve: in that case
    >> we will want to revert all the offending code, and also all code depending on the
    >> plagiarized code **and have it rewritten from scratch by someone who has never read
    >> the plagiarized code** (ref: SCO/UNIX for example). This is almost impossible.
    >>
    >> This would be a nightmare, just for one identified contribution.
    >>
    >> Even more, if/when the fair-use argument for LLMs falls, then all
    >> LLM-generated code would have to be removed from QGIS, along with all code depending on it. This is
    >> a really high risk with high impact.
    >>
    >> You may say: "OK, but everyone does it, the chances of being caught are low,
    >> why not benefit from the opportunity?"
    >>
    >> Then what about "everyone copies GPL code into proprietary code, the chances
    >> of being caught are low, why not benefit from the opportunity?"
    >>
    >> Copyright is at the foundation of open source software, and especially
    >> GPL-based software. If we choose to deny it, then we lose our core principle.
    >>
    >> In the text Even proposes, there is a copyright section, pushing the
    >> responsibility for IP compliance control back to the contributor. It may protect
    >> QGIS.org or other developers from being sued whenever there is a problem, or they
    >> could sue the faulty contributor in return, but this is not enough:
    >>
    >> - the faulty contributor has no way to ensure their generated code has no
    >> IP issue (other than NOT using LLMs): responsibility without any means of action is
    >> neither fair nor sustainable
    >>
    >> - even if the QGIS project can avoid being convicted by transferring
    >> responsibility, the situation would still be unresolved and a nightmare: removing
    >> plagiarized code entangled deep in the core of the software, along with all code depending on it,
    >> and rewriting it without IP issues is really hard
    >>
    >> Therefore, I do not think this mention is enough for IP protection.
    >>
    >> This rationale concerns the generated code itself, contributed to QGIS
    >> or other software in the ecosystem. LLMs may be useful and without IP risks to help
    >> find bugs, write parts of the documentation where there is no risk of plagiarism, or
    >> for other use cases.
    >>
    >> But I would definitely **forbid any generated code from being introduced into
    >> the main source code because of IP risk**.
    >>
    >> Also, the least we can do for any contribution is not only to have a
    >> human in the loop, but also to have a mandatory mention and description of LLM usage
    >> for each contribution. This would at least give traceability. It does not solve
    >> anything, but in case of a problem, we could at least start to investigate.
    >>
    >> I am glad this conversation is taking place, and I am willing to pursue the
    >> discussion. Sorry for having been long.
    >>
    >> Have a nice weekend,
    >>
    >> Vincent
    >>
    >>
    >>
    >>
    >>
    >> On 31/01/2026 01:01, Greg Troxel via QGIS-Developer wrote:
    >>> I would suggest a much stronger policy:
    >>>
    >>>    no LLM-generated code or discussion may be submitted to any QGIS forum
    >>>
    >>>
    >>> The idea that LLM-generated code has been "reviewed" intends to be that
    >>> it is of high enough quality that it is reasonable for *humans* to spend
    >>> time reviewing it.  But I don't believe that asking that it be reviewed
    >>> will achieve that in practice.
    >>>
    >>> I've already had the experience (in a different project) of seeing a
    >>> posted PR(ish, patch on list), taking the time to comment, and getting
    >>> LLM-generated (vacuous) replies to my comments.
    >>>
    >>> Besides the ethical problems with asking humans to review, improve,
    >>> judge or in any other way pay attention to LLM output, there's the
    >>> problem of copyright.  While machine-generated text isn't copyrightable
    >>> as is, LLM output is a derived work of stolen human work, scraped
    >>> and used without permission, often as DDOS.
    >>>
    >>> On the basis of each reason, I believe the policy about LLM should just
    >>> be "no".
    >>> _______________________________________________
    >>> QGIS-Developer mailing list
    >>> [email protected]
    >>> List info: https://lists.osgeo.org/mailman/listinfo/qgis-developer
    >>> Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-developer


