Current Lilypond Pipeline Issue

I'd like to return to the actual concern: is our workflow of automatically integrating every contribution unless flagged by reviewers broken?
I wonder if we can be clearer about it, since there are two implications in this concern.

The first is: does it imply that someone who has access to our pipeline cannot be trusted? I would suspect not. And if it did, the remedy would be to remove that user's permissions. So I don't think this is the concern we really need to solve. (If I am wrong and this is a problem, then we do have to solve it, because if we were planning on granting access to people we don't trust, there would be no workable solution to any of our other concerns.)

So the main concern is that we see a future where we have too few reviewers, and therefore some problematic PRs will not get flagged. In my estimation, so long as the permissions and trust of contributors are under control, this is not an imminent concern. However, I do suspect it is inevitable that we will see more activity and interest from AI-savvy developers. Even an increase in contributions of unproblematic code could easily overwhelm the capacity of our reviewers. Since our reviewers tend to focus on their areas of expertise, no single developer feels responsible for making sure every PR gets reviewed, nor should they. So we can imagine that this will become a problem if PR size and frequency increase.

If we accept that this is a concern, and want to prevent it from becoming a problem, then we will want to adjust our pipeline, or our management of it.

Options for Solving the Pipeline Issue

The nuclear approach is to do away with automatic merging and require every PR to receive approval. But that would add friction to the existing pipeline, and we may want to avoid that.
(Please forgive any inaccurate use of terms like PR/approve/flag, as I am not intimately familiar with how our pipeline is implemented.)

That leaves what I think is the easiest way to address this concern under the current pipeline: appoint a reviewer whose sole job is to reject any PR that makes it to the penultimate step without comments from any other reviewer. Crucially, however, they would only reject PRs from contributors who are not legacy trusted contributors. Contributions from trusted contributors would be allowed to go through silently, as is customary. This person could be a new volunteer whose only technical qualification is familiarity with GitHub, to avoid adding any burden to trusted reviewers.

It is worth being clear about what kind of gatekeeping we are doing and why. The current pipeline is designed for a small and well-trusted group of developers. They are valuable and we don't want to waste their time. If we want to enforce more guardrails for new developers, then we will need to create some kind of distinction between trusted and untrusted developers. We can start out being loose about this, because we kind of all know who that is. We could head towards establishing criteria around metrics like commit history, etc., to have a coherent way of establishing what "trusted" means. But for now, let's just suppose that we have a loyalty czar who will only flag unfamiliar contributors' PRs.

The second topic is how futile it is to attempt to gatekeep the amount of AI influence over contributors' code. In all cases, the responsibility for the code lies with the contributor. Again, if we are letting new developers in and don't trust them, that is an upstream problem with a different solution. The problem is a reviewer bandwidth problem. The contribution is not digestible enough.
For its size, it does not provide enough context for someone to understand what problem is being addressed, what the architectural design of the solution is, why this approach was chosen over others, what other considerations went into the design, what the compromises are, how the feature is tested, and what the benchmark performance is, along with docs and helpful snippets.

If the loyalty czar wants to facilitate a PR, they may have to figure out who is most appropriate to review it, and bring it to their attention. The reviewer should be able to articulate any concerns, such as those above and any others, without spending much time delving into the code. They can just raise potential concerns based on the scope of the work being done: edge cases, antipatterns, desired regtests, comparison of the approach to similar features, efficiency, backward compatibility, etc. Whatever they would like to see that would help them more easily determine whether the problem being solved is worthwhile and is solved in a coherent way. If that conversation happens outside GitHub, then the loyalty czar will update the PR with the reviewer's concerns.

The contributor can then address the concerns. Note that if the contributor used AI to generate the code, they will likely use AI to generate the responses. And if the contributor is using a coherent setup, then the answers provided will make sense. If the answers make sense, then the code is also likely good enough. If the reviewer has literally any concerns that cannot be addressed to their satisfaction, then they can leave the PR unmerged and ask for more improvements to code consumability.

Crucially, we want to adopt the stance that the reviewers' time is most valuable: they can keep asking questions, and until they are satisfied, the PR will not move. It is up to the contributors to prepare their contribution so that it is sufficiently consumable.
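
To make the gatekeeping rule described above a bit more concrete, here is a minimal Python sketch of the loyalty czar's decision. Everything in it is an assumption for illustration only: the names `TRUSTED_CONTRIBUTORS`, `REQUIRED_CONTEXT`, `PullRequest`, `missing_context` and `triage` are hypothetical, the contributor names are made up, and nothing here reflects how our pipeline is actually implemented. In particular, letting an unfamiliar contributor's PR proceed once some reviewer has commented is only one reading of the proposal.

```python
from dataclasses import dataclass, field

# Hypothetical list of legacy trusted contributors. The proposal suggests
# starting informally, so these names are placeholders.
TRUSTED_CONTRIBUTORS = {"alice", "bob"}

# Short labels for the context a contribution is asked to supply:
# problem addressed, design, alternatives considered, other considerations,
# compromises, testing, benchmark performance, docs and snippets.
REQUIRED_CONTEXT = (
    "problem",
    "design",
    "alternatives",
    "considerations",
    "compromises",
    "testing",
    "benchmarks",
    "docs",
)


@dataclass
class PullRequest:
    author: str
    reviewer_comments: int = 0  # comments from reviewers other than the czar
    context_provided: set = field(default_factory=set)


def missing_context(pr):
    """Return the context items the PR has not supplied, in checklist order."""
    return [item for item in REQUIRED_CONTEXT if item not in pr.context_provided]


def triage(pr):
    """Decide what happens to a PR that has reached the penultimate step."""
    if pr.author in TRUSTED_CONTRIBUTORS:
        return "proceed"  # legacy path: goes through silently, as is customary
    if pr.reviewer_comments == 0:
        return "reject"   # unfamiliar contributor and nobody has looked at it
    return "proceed"      # assumption: a reviewed PR follows the existing pipeline
```

With these placeholder names, `triage(PullRequest("newcomer"))` gives `"reject"`, while the same PR with at least one reviewer comment would proceed. `missing_context` could give the czar a ready-made list of what to ask the contributor for.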
Contributors will need to document enough due diligence to be commensurate with the scope of their changes, and with the whims of the reviewers. If the code seems to work according to the regtests, but the reviewer can't comprehend it because of style issues, naming conventions, or lack of comments, it is valid to reject it and ask for the code to be refactored according to their standards. Again, if the contributor is using a productive contemporary development environment, then they will easily be able to provide any requested documentation, or trivially make the requested legibility changes.

The Future of Lilypond and AI

Rather than being suspicious of contributors' use of AI, we should understand what an AI-oriented development approach should be. We might want to consider having a parallel pipeline for contemporary work.

I will hereby stop trying to distinguish how much AI is used to write code. It is a somewhat arbitrary label at this point, because it can refer to anything from auto-complete in an IDE, to asking an LLM for code suggestions that you cut & paste, to having an agent write the code... to having agent orchestration that writes design docs, writes code, compares code to design docs, writes unit tests, validates them, writes documentation, checks for vulnerabilities, optimizes, and measures performance... So much so that I would argue that not using AI in any capacity is becoming a charming historical practice, much like doing long division on paper.

So, I think that we want to have a legacy pipeline and a contemporary pipeline. The legacy pipeline is our current trusted low-friction approach. Publish silently. The contemporary one would have more structure and context. PRs would need to meet high standards and be well documented in order to merge.

In terms of facilitating contributions, we would want to develop sets of skills that represent aspects of lilypond development best practices.
"Skills" is an AI term for what is basically a set of instructions for repeatable tasks. Skills are used to populate the context window of LLM requests and allow the model to do the intended work in a consistent fashion. Being able to provide this kind of clarity and consistency is what will let lilypond survive vibe coding.

Elaine
