Paolo Bonzini <[email protected]> writes: > [People in Cc are a mix of Python people, tracing people, and people > who followed the recent AI discussions. - Paolo] > > This series adds type annotations to tracetool. While useful on its own, > it also served as an experiment in whether AI tools could be useful and > appropriate for mechanical code transformations that may not involve > copyrightable expression. > > In this version, the types were added mostly with the RightTyper tool > (https://github.com/RightTyper/RightTyper), which uses profiling to detect > the types of arguments and return types at run time. However, because > adding type annotations is such a narrow and verifiable task, I also developed > a parallel version using an LLM, to provide some data on topics such as: > > - how much choice/creativity is there in writing type annotations? > Is it closer to writing functional code or to refactoring?
Based on my work with John Snow on typing of the QAPI generator: there
is some choice.

Consider typing a function's argument. Should we pick the type based
on what the function requires from its argument, or should it reflect
how the function is actually used? Say the function merely iterates
over the argument. So we make the argument Iterable[...], right? But
what if all callers pass a list? Making it List[...] could be clearer
then. It's a choice. (A sketch of both options follows at the end of
this message.)

I think the choice depends on context and taste. At a library's
external interface, picking a more general type can make the function
more generally useful. But for an internal helper, I'd pick the actual
type.

My point isn't that an LLM could not possibly do the right thing based
on context, and maybe even "taste" distilled from its training data.
My point is that this isn't entirely mechanical, with basically one
correct output. Once we have such judgement calls, there's the
question of how an LLM's choice depends on its training data (first
order approximation today: nobody knows), and whether and when that
makes the LLM's output a derived work of its training data (to be
settled in court).

[...]

> Based on this experience, my answer to the copyrightability question
> is that, for this kind of narrow request, the output of AI can be
> treated as the output of an imperfect tool, rather than as creative
> content potentially tainted by the training material.

Maybe.

> Of course this is one data point and is intended as an experiment
> rather than a policy recommendation.

Understood. We need to develop a better understanding of capabilities,
potential benefits and risks, and such experiments can only help with
that.
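To make the choice concrete, here is a minimal sketch; the function and
its names are made up for illustration, they are not from tracetool or
the QAPI generator:

    from typing import Iterable, List

    # Option 1: type by what the body requires.  The function only
    # iterates over its argument, so any iterable of strings will do.
    def format_args_general(args: Iterable[str]) -> str:
        return ", ".join(args)

    # Option 2: type by how callers actually use it.  If every caller
    # passes a list, List[str] documents that more directly.
    def format_args_concrete(args: List[str]) -> str:
        return ", ".join(args)

Both versions type-check and behave identically; picking between them
is exactly the judgement call discussed above.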
