Thanks for your feedback!
I think your statement "There's nothing special about LLMs and this,
other than perhaps the speed with which you can make mistakes" hits the
nail on the head. To me that means there should actually be no special
rule for GenAI in this sense, but rather a "warning" that with GenAI you
risk breaking existing copyright laws more easily or unknowingly.
Re examples, I could imagine there are GenAI tools which make it more
obvious where the content comes from, or which provide better references,
than other tools. That of course does not mean that you should be less
aware of possibly breaking copyright laws.
Under the EU AI Act, LLMs should actually have to declare what data
they were trained on, etc., which should also make things more
transparent in the future.
Thanks
Michael
On 22.04.24 at 10:35, Nick Burch wrote:
On Sun, 21 Apr 2024, Michael Wechner wrote:
Thanks for the pointer to the Generative Tooling rules, which I was
not aware of so far.
At the bottom it says that the ASF does not tell developers what
tools to use, but I think it would be useful to have some
concrete examples, which would make the rules clearer.
(Not a lawyer, not an official ASF response)
There's nothing special about LLMs and this, other than perhaps the
speed with which you can make mistakes... When including other
people's code, it's all about license compatibility and attribution.
The ASF started when a bunch of people started sharing patches for a
web server, with attribution and code under a compatible license. The
foundation grew during a period where it got easier to find code +
code snippets online, including much that wasn't under a compatible
license. Rules didn't change, other than clarifying processes for
checking licenses and what was/wasn't compatible.
You weren't, and still aren't, allowed to copy + paste large chunks of
someone else's code without a compatible license and suitable
attribution. Using an LLM to read all the internet and suggest the code
to copy doesn't change that. Well, other than the well-documented
issues with getting LLMs to cite their sources...
LLMs have loads of great uses, including helping you learn new things,
decoding error messages, finding common patterns, rubber-ducking etc.
They're even worse than many internet forums for suggesting large
chunks of code of unclear provenance to copy+paste.
It doesn't matter if it's ChatGPT, GitHub Copilot, a local LLM,
someone on StackOverflow, or a YouTube video that's giving you some
code you want to copy. 3 characters are almost certainly fine, 3 pages
are almost certainly not, a general idea is often fine, and you
absolutely need to engage your brain before committing to ASF repos!
Otherwise, if you do still think more rules / examples / etc are
needed, you'll be wanting legal-discuss@
https://lists.apache.org/list.html?legal-disc...@apache.org
Cheers
Nick