On 9/19/24 11:51 AM, Joseph Myers wrote: > 1. Introduction Thanks for writing this up!
> This message expands on my remarks at the Cauldron (especially the > patch review and maintenance BoF, and the Sourceware infrastructure > BoF) regarding desired features for a system providing pull request > functionality (patch submission via creating branches that are then > proposed using some kind of web interface or API, with a central > database then tracking the status of each pull request and review > comments thereon automatically), for use by the GNU toolchain (or one > or more components thereof - there is no need for each component to > make the same decision about moving to such software and workflow, and > indeed we have no mechanism to make such decisions for the toolchain > as a whole). > > This does not advocate a particular choice of software for such > functionality (though much of the discussion seemed to suggest Forgejo > as the most likely starting point), or a particular choice of where to > host it. Hosting would of course need to meet appropriate security > requirements, and to achieve a passing grade on the GNU Ethical > Repository Criteria, and the software would need to be entirely free > software. Where relevant features are not already supported, it's > important that the software is receptive to the addition of such > features (including cases where part of the functionality is provided > by software specific to the GNU toolchain or parts thereof - such as > for the custom checks currently implemented with git hooks - and the > underlying software provides appropriate interfaces to allow > integration of such external pieces). The list of features here may > be a good basis for reviewing what particular forge software supports > and whether other features can be added, directly or through use of > appropriate APIs. > > Forge software may provide other pieces such as bug tracking or wikis > that we currently handle separately from git hosting. In such cases, > we should be able to disable those pieces and keep using the existing > bug tracking and wiki software (while having the option to decide > independently to migrate those if desired). > > I consider the overall benefits of such a move to be having more > structured data about all changes proposed for inclusion and their > status (needing review, needing changes from the author, under > discussion, needing merge from mainline, etc.), to help all people > involved in the patch submission and review process to track such > information and to find patches needing review as applicable, along > with providing a more familiar workflow for many people that avoids > many of the problems with email (which affect experienced contributors > working around corporate email systems, not just new contributors). > It would not of course by itself turn people with no interest in or > understanding of systems software development into contributors (for > example, people without knowledge of directories and hierarchical file > storage, or people who only understand software development as web > development). Nor would it prevent the accumulation of large backlogs > of unreviewed patches, as is evident from many large and active > projects using PR workflows with large numbers of open PRs. > > As Richard noted in his BoF, email sucks. As I noted in reply, so do > the web and web browsers when trying to deal with large amounts of > patch review state (when one wishes to apply one's own view, not the > forge's, of what is resolved and what needs attention). As I also > noted, in the Sourceware infrastructure BoF, tools such as patchwork > and b4 are the right answer to the wrong question: trying to get > structured data about patch submissions when working from the axiom > that emails on a mailing list should be the primary source of truth > for everything needing review, rather than starting from more > structured data and generating emails as one form of output. > > Moving to a pull request system is not expected to change policies > regarding who can approve a change for inclusion, or the technical > limits on who can cause a change to enter mainline (e.g. all people > with commit access would be expected to be able to use a button in a > web interface to cause to PR to be merged, though policy might limit > when they should do so). We can of course choose to change policies, > either as part of adopting a PR system or later. Agreed. > > > 2. Key features > > (a) Some forges have a design that emphasises the tree you get after a > proposed contribution, but not the sequence of commits to get there. This has changed a lot in recent years in gitlab and github where per-commit reviews were emerging as a needed property of the web UI, so the commits (rather than the final state of the tree) were being exposed for review purposes. I see this as a general maturing of the tooling towards what we already knew in systems design, that the commits and the clean logical changes were a necessary prerequisite to managing the complexity of the change. > For the toolchain, we care about the clean, logical sequence of > commits on the mainline branch. (We also use linear history on > mainline, but that's a secondary matter - though certainly we'd want > any forge used to support such linear history so that property doesn't > need to change as part of adopting pull request workflow.) Having a > clean sequence of commits has some implications for forge support: > > * Support for reviewing the proposed commit message, not just the > diffs, is important (and it should be clear what commit message > would result from merging any pull request). > > * Patch series and dependencies between patches are important. In > such cases, fixes for issues from review should go into the > appropriate logical commit in the PR (with rebasing as necessary), > and it should be possible at all times to see what the sequence of > commits on mainline would look like. (E.g. people use some > workarounds on GitHub to manage PR dependencies, but those result in > second and subsequent PRs in a series showing the full set of diffs > from a PR and those it depends on, rather than just the logical > diffs for one PR.) This area has seen a lot of change recently in the large forges, and warrants a review of current offerings. > I consider patch series and dependencies to be separate but related > things: a patch series may not have strictly linear dependencies > (and it's often useful to merge the more straightforward patches > from a series while still discussing and revising others), while a > patch may depend on other patches that it's not logically part of > the same series as. They are, however, closely related, and a > sufficient solution for dependencies might also be adequate for many > cases of series. Agreed. > Note that series can sometimes have hundreds of patches; any > solution for patch series and dependencies needs to scale that far. > > There is of course the common case of a single-patch submission, > where the patch is ready for inclusion after some number of fixes. > In those cases, it's probably convenient if it's not necessary to > rebase - provided it's clear that a particular PR would be > squash-merged, and also what the commit message would be for the > final fixed commit. > > * Given the need for rebasing when working with patch series, it's > important to have good support for rebasing. In particular, all > revisions of the changes for a PR that was rebased need to remain > permanently available (e.g. through appropriate documented refs to > fetch to get all revisions of all PRs). > > (b) Various people have built efficient workflows for going through > all patch submissions and comments (or all in a particular area), even > when only reviewing a small proportion, and have concerns about > efficiency of a web interface when working with many patches and > comments. It's important to have good API support to allow people to > build tools supporting their own workflow like this without needing to > use the browser interface (and following their own logic, not the > forge's, for what changes are of interest). Good API support might, > for example, include a straightforward way to get all changes to PR > and comment data and metadata since some particular point, as well as > for actions such as reviewing / commenting / approving a PR. Such API > support might be similar to what's needed to ensure people can readily > get and maintain a local replica of all the key data and its history > for all PRs. Agreed. > Replication like that is also important for reliably ensuring key data > remains available even if the forge software subsequently ceases to be > maintained. Consider, for example, the apparent disappearance of the > data from www-gnats.gnu.org (we occasionally come across references to > old bug reports from that system in glibc development, but don't have > any way to resolve those references). > > Another use of such an API would be to allow maintaining local copies > of all PR comments etc. in a plain text form that can be searched with > grep. > > (c) Given that a transition would be from using mailing lists, it's > important to have good plain text outward email notifications for all > new merge requests, comments thereon and other significant actions > (changing metadata, approvals / merges, etc.) - through whatever > combination of built-in support in a forge and local implementation > using the API. Such notifications would go to the existing mailing > lists (note that the choice of mailing lists for a patch can depend on > which parts of the tree it changes - for example, the choice between > binutils and gdb-patches lists, or some GCC patches going to the > fortran or libstdc++ lists) - as well as to individuals concerned with > / subscribed to a given PR. Some very routine changes, such as > reports of clean CI results, might be omitted by default from > notifications sent to the mailing lists. > > "good" includes quoting appropriate diff hunks being comments on. > It's OK to have an HTML multipart as well, but the quality of the > plain text part matters. Diffs themselves should be included in > new-PR emails (and those for changes to a PR) unless very large. > > I do not however suggest trying to take incoming email (unstructured) > and turn it into more structured comments on particular parts of a PR > in the database. Agreed Re: incoming email. > Similarly, commit emails should continue to go to the existing mailing > lists for those. > > (d) Rather than the forge owning the mainline branch in the sense of > every commit having to go through an approved PR, at least initially > it should be possible for people to push directly to mainline as well, > so the transition doesn't have to be instantaneous (and in particular, > if a change was posted before the transition and approved after it, it > shouldn't be necessary to turn it into a PR). Longer term, I think it > would be a good idea to move to everything going through the PR system > (with people able to self-approve and merge their own PRs immediately > where they can commit without review at present), so there is better > structured uniform tracking of all changes and associated CI > information etc. - however, direct commits to branches other than > mainline (and maybe release branches) should continue to be OK long > term (although it should also be possible to make a PR to such a > branch if desired). > > Beyond putting everything through PRs, eventually I'd hope to have > merges to mainline normally all go through a CI system that makes sure > there are no regressions for at least one configuration before merging > changes. Agreed. > (e) All existing pre-commit checks from hooks should be kept in some > form, to maintain existing invariants on both tree and commit contents > (some hook checks make sure that commits don't have commit messages > that would cause other automated processes to fall over later, for > example). These could all move to pre-commit CI checks that block merging. > (f) Existing cron jobs that read from or commit to repositories should > use normal remote git access, not filesystem access to the repository, > including using APIs to create and self-merge PRs when pushing to the > repository is involved. Agreed. Thanks for writing this up! -- Cheers, Carlos.