Anton makes a good point. We have been talking about policy for what we do, but the real issue is what we want to come out of it: a clear history for seeing where code came from and granular rollback. I think in both cases the key thing is that each commit is a single clear change. How they get there is not the point.
I have worked on multiple projects with a 1 commit = 1 review (and sometimes even = 1 ticket). These pretty much never have a good history. The best case is that each commit has a message that is a bullet point of many separate changes, because it is simply too inefficient to review each logical change separately. But since the messages become less useful, it encourages a culture of not even bothering to write meaningful messages at all. Note that is trivial for a committer-reviewer to edit the history any way they like without the button. "Allow edits by maintainers" is on by default. The "Squash and merge" button just adds a button for something we can already do. Charles: super useful! Worth noting that for a PR with a good history it will skip meaningful commits (but still give the summary line, which is nice). Kenn On Fri, Sep 28, 2018 at 8:54 AM Anton Kedin <ke...@google.com> wrote: > Is there an actual problem caused by squashing or not squashing the > commits that we face in the project? I personally have never needed to > revert something complicated that would be problematic either way (and > don't have a strong opinion about which way we should do it). From what I > see so far in the thread it doesn't look like reverting is a frequent major > pain for anyone. Maybe it is exactly because we're mostly following some > best practice and it makes it easy. If someone has concrete examples from > their experience in the project, please share them, this way it would be > easier to justify the choice. > > The PR and commit cleanliness, size and isolation are probably more > important thing to have guidance and maybe enforcement for. There are well > known practices and guidelines that I think we should follow, and I think > they will make squashing or not squashing mostly irrelevant. For example, > if we accept that commits should have description that actually describes > what commit does, then "!fixup", "address comments" and similar should not > be part of the history and should be squashed before submitting the PR no > matter which way we decide to go in general. Also, I think that making > commits isolated is also a good practice, and PR author should be able to > relatively easily split the PR upon reviewer's request. And if we choose to > keep whole PRs small and incremental with descriptive isolated commits, > then there won't be too much difference how many commits there are. > > Regards, > Anton > > On Fri, Sep 28, 2018 at 8:21 AM Andrew Pilloud <apill...@google.com> > wrote: > >> I brought up this discussion a few months ago from the other side: I >> don't like my commits being squashed. I try to create logical commits that >> each passes tests and could be broken up into multiple PRs. Keeping those >> changes intact is useful from a history perspective and squashing may break >> other PRs I have in flight. If the intent is clear (one commit with a >> descriptive message and a bunch of "fixups"), then feel free to squash, >> otherwise ask first. When you do squash, it would be good to leave a >> comment as to how the author can avoid having their commits squashed in the >> future. >> >> >> https://lists.apache.org/thread.html/8d29e474e681ab9123280164d95075bb8b0b91486b66d3fa25ed20c2@%3Cdev.beam.apache.org%3E >> >> Andrew >> >> On Fri, Sep 28, 2018 at 7:29 AM Chamikara Jayalath <chamik...@google.com> >> wrote: >> >>> >>> >>> On Thu, Sep 27, 2018 at 9:51 AM Robert Bradshaw <rober...@google.com> >>> wrote: >>> >>>> I agree that we should create a good pointer for cleaning up PRs, and >>>> request (though not require) that authors do it. It's unfortunate though >>>> that squashing during a review makes things difficult to follow, so adds >>>> one more round trip. >>>> >>>> We could consider for those PRs that make sense as a single logical >>>> commit (most, but not all, of them) simply using the "squash and merge" >>>> button even though it technically doesn't create a merge commit. >>>> >>> >>> +1 for allowing "squash and merge" as an option. Most of the reviews (at >>> least for me) consist of a single valid commit and several additional >>> commits that get piled up during the review process which obviously should >>> not be included in the commit history. Going through another round here >>> just to ask the author to fixup everything is unnecessarily time consuming. >>> >>> - Cham >>> >>> >>>> >>>> >>>> On Fri, Sep 21, 2018 at 9:24 PM Daniel Oliveira <danolive...@google.com> >>>> wrote: >>>> >>>>> As a non-committer I think some automated squashing of commits sounds >>>>> best since it lightens the load of regular contributors, by not having to >>>>> always remember to squash, and lightens the load of committers so it >>>>> doesn't take as long to have your PR approved by one. >>>>> >>>>> But for now I think the second best route would be making it PR >>>>> author's responsibility to squash fixup commits. Having that expectation >>>>> described clearly in the Contributor's Guide, along with some simple >>>>> step-by-step instructions for how to do so should be enough. I mainly >>>>> support this because I've been doing the squashing myself since I saw a >>>>> thread about it here a few months ago. It's not nearly as huge a burden on >>>>> me as it probably is for committers who have to merge in many more PRs, >>>>> it's very easy to learn how to do, and it's one less barrier to having my >>>>> code merged in. >>>>> >>>>> Of course I wouldn't expect that committers wait for PR authors to >>>>> squash their fixup commits, but I think leaving a message like "For future >>>>> pull requests you should squash any small fixup commits, as described >>>>> here: >>>>> <link>" should be fine. >>>>> >>>>> >>>>>> I was also thinking about the possibility of wanting to revert >>>>>> individual commits from a merge commit. The solution you propose >>>>>> works, >>>>>> but only if you want to revert everything. >>>>> >>>>> >>>>> Does this happen often? I might not have enough context since I'm not >>>>> a committer, but it seems to me that often the person performing a revert >>>>> is not the original author of a change and doesn't have the context or >>>>> time >>>>> to pick out an individual commit to revert. >>>>> >>>>> On Wed, Sep 19, 2018 at 1:32 PM Maximilian Michels <m...@apache.org> >>>>> wrote: >>>>> >>>>>> I tend to agree with you Lukasz. Of course we should try to follow >>>>>> the >>>>>> guide lines as much as possible but if it requires an extra back and >>>>>> forth with the PR author for a cosmetic change, it may not be worth >>>>>> the >>>>>> time. >>>>>> >>>>>> On 19.09.18 22:17, Lukasz Cwik wrote: >>>>>> > I have to say I'm guilty of not following the merge guidelines, >>>>>> > sometimes doing merges without rebasing/flatten commits. >>>>>> > >>>>>> > I find that it is a few extra mins of my time to fix someones PR >>>>>> history >>>>>> > if they have more then one logical commit they want to be separate >>>>>> and >>>>>> > it usually takes days for the PR author to do merging with the >>>>>> extra >>>>>> > burden as a committer to keep track of another PR and its state >>>>>> (waiting >>>>>> > for clean-up) is taxing. I really liked the idea of the mergebot >>>>>> (even >>>>>> > though it didn't work out in practice) because it could do all the >>>>>> > policy work on my behalf. >>>>>> > >>>>>> > Anything that reduces my overhead as a committer is useful as for >>>>>> the >>>>>> > 100s of PRs that I have merged, I've only had to rollback a couple >>>>>> so >>>>>> > I'm for Charle's suggestion which makes the rollback flow slightly >>>>>> more >>>>>> > complicated for a significantly easier PR merge workflow. >>>>>> > >>>>>> > On Wed, Sep 19, 2018 at 1:13 PM Charles Chen <c...@google.com >>>>>> > <mailto:c...@google.com>> wrote: >>>>>> > >>>>>> > What I mean is that if you get the first-parent commit using >>>>>> "git >>>>>> > log --first-parent", it will incorporate any and all fix up >>>>>> commits >>>>>> > so we don't need to worry about missing any. >>>>>> > >>>>>> > On Wed, Sep 19, 2018, 1:07 PM Maximilian Michels < >>>>>> m...@apache.org >>>>>> > <mailto:m...@apache.org>> wrote: >>>>>> > >>>>>> > Generally, +1 for isolated commits which are easy to revert. >>>>>> > >>>>>> > > I don't think it's actually harder to roll back a set of >>>>>> > commits that are merged together. >>>>>> > I think Thomas was mainly concerned about "fixup" commits to >>>>>> > land in >>>>>> > master (as part of a merge). These indeed make reverting >>>>>> commits >>>>>> > more >>>>>> > difficult because you have to check whether you missed a >>>>>> "fixup". >>>>>> > >>>>>> > > Ideally every commit should compile and pass tests >>>>>> though, right? >>>>>> > >>>>>> > That is definitely what we should strive for when doing a >>>>>> merge >>>>>> > against >>>>>> > master. >>>>>> > >>>>>> > > Perhaps the bigger issue is that we need better >>>>>> documentation >>>>>> > and a playbook on how to do this these common tasks in git. >>>>>> > >>>>>> > We do actually have basic documentation about this but most >>>>>> > people don't >>>>>> > read it. For example, the commit message of a Merge commit >>>>>> > should be: >>>>>> > >>>>>> > Merge pull request #xxxx: [BEAM-yyyy] Issue title >>>>>> > >>>>>> > But most merge commits don't comply with this rule :) See >>>>>> > >>>>>> https://beam.apache.org/contribute/committer-guide/#merging-it >>>>>> > >>>>>> > On 19.09.18 21:34, Reuven Lax wrote: >>>>>> > > Ideally every commit should compile and pass tests >>>>>> though, right? >>>>>> > > >>>>>> > > On Wed, Sep 19, 2018 at 12:15 PM Ankur Goenka >>>>>> > <goe...@google.com <mailto:goe...@google.com> >>>>>> > > <mailto:goe...@google.com <mailto:goe...@google.com>>> >>>>>> wrote: >>>>>> > > >>>>>> > > I agree with the cleanliness of the Commit history. >>>>>> > > "Fixup!", "Address comments", "Address even more >>>>>> > comments" type of >>>>>> > > comments does not convey meaningful information and >>>>>> are >>>>>> > not very >>>>>> > > useful. Its a good idea to squash them. >>>>>> > > However, I think its ok to keep separate commits for >>>>>> > different >>>>>> > > logical pieces of the code which make reviewing and >>>>>> > revisiting code >>>>>> > > easier. >>>>>> > > Example PR: Support X in the pipeline >>>>>> > > Commit 1: Restructuring a bunch of code without any >>>>>> > logical change. >>>>>> > > Commit 2: Changing validation logic for pipeline. >>>>>> > > Commit 3: Supporting new field "X" for pipeline. >>>>>> > > >>>>>> > > On Wed, Sep 19, 2018 at 11:27 AM Charles Chen >>>>>> > <c...@google.com <mailto:c...@google.com> >>>>>> > > <mailto:c...@google.com <mailto:c...@google.com>>> >>>>>> wrote: >>>>>> > > >>>>>> > > To be concrete, it is very easy to revert a >>>>>> commit in >>>>>> > any case: >>>>>> > > >>>>>> > > 1. First, use "git log --first-parent" to find >>>>>> the >>>>>> > first-parent >>>>>> > > commit corresponding to a PR merge (this is a >>>>>> > one-to-one >>>>>> > > correspondence). >>>>>> > > 2. Use "git revert -m 1 <commitid>" to revert >>>>>> the >>>>>> > commit; this >>>>>> > > selects the first parent as the base for a >>>>>> merge >>>>>> > commit (in >>>>>> > > the case where a single commit needs to be >>>>>> > reverted, just >>>>>> > > use "git revert <commitid>" without the "-m >>>>>> 1" flag). >>>>>> > > >>>>>> > > In any case, as a general good engineering >>>>>> practice, >>>>>> > I do agree >>>>>> > > that it is highly desirable to have small >>>>>> independent PRs >>>>>> > > instead of large jumbo PRs whenever possible. >>>>>> > > >>>>>> > > On Wed, Sep 19, 2018 at 11:20 AM Charles Chen >>>>>> > <c...@google.com <mailto:c...@google.com> >>>>>> > > <mailto:c...@google.com <mailto:c...@google.com>>> >>>>>> wrote: >>>>>> > > >>>>>> > > I don't think it's actually harder to roll >>>>>> back a >>>>>> > set of >>>>>> > > commits that are merged together. Git has >>>>>> the >>>>>> > notion of >>>>>> > > first-parent commits (you can see, for >>>>>> example, >>>>>> > "git log >>>>>> > > --first-parent", which filters out the >>>>>> intermediate >>>>>> > > commits). In this sense, PRs still get >>>>>> merged as >>>>>> > one unit >>>>>> > > and this is preserved even if intermediate >>>>>> > commits are >>>>>> > > kept. Perhaps the bigger issue is that we >>>>>> need >>>>>> > better >>>>>> > > documentation and a playbook on how to do >>>>>> this >>>>>> > these common >>>>>> > > tasks in git. >>>>>> > > >>>>>> > > On Wed, Sep 19, 2018 at 9:27 AM Thomas Weise >>>>>> > <t...@apache.org <mailto:t...@apache.org> >>>>>> > > <mailto:t...@apache.org <mailto: >>>>>> t...@apache.org>>> >>>>>> > wrote: >>>>>> > > >>>>>> > > Wanted to bring this up as reminder as >>>>>> well as >>>>>> > > opportunity to discuss potential changes >>>>>> to our >>>>>> > > committer guide. It has been a while >>>>>> since >>>>>> > last related >>>>>> > > discussion and we welcomed several new >>>>>> > committers since >>>>>> > > then. >>>>>> > > >>>>>> > > Finishing up pull requests pre-merge: >>>>>> > > >>>>>> > > >>>>>> > >>>>>> https://beam.apache.org/contribute/committer-guide/#finishing-touches >>>>>> > > >>>>>> > > PRs are worked on over time and may >>>>>> > accumulate many >>>>>> > > commits. Sometimes because scope expands, >>>>>> > sometimes just >>>>>> > > to separate independent changes but most >>>>>> of >>>>>> > the time the >>>>>> > > commits are just fixups that are added as >>>>>> > review progresses. >>>>>> > > It is important that the latter get >>>>>> squashed >>>>>> > prior to PR >>>>>> > > merge, as otherwise we lost the ability >>>>>> to >>>>>> > roll back >>>>>> > > changes by reverting a single commit and >>>>>> also >>>>>> > generally >>>>>> > > cause a lot of noise in the commit >>>>>> history >>>>>> > that does not >>>>>> > > help other contributors. To be clear, I >>>>>> refer >>>>>> > to the >>>>>> > > "Fixup!", "Address comments", "Address >>>>>> even more >>>>>> > > comments" type of entries :) >>>>>> > > >>>>>> > > I would also propose that every commit >>>>>> gets >>>>>> > tagged with >>>>>> > > a JIRA (except those fixups that will be >>>>>> > squashed). >>>>>> > > Having the JIRA and possibly other tags >>>>>> makes >>>>>> > it easier >>>>>> > > for others not involved in the PR to >>>>>> identify >>>>>> > changes >>>>>> > > after they were merged, for example when >>>>>> > looking at the >>>>>> > > revision history or annotated source. >>>>>> > > >>>>>> > > As for other scenarios of jumbo PRs with >>>>>> many >>>>>> > commits, >>>>>> > > there are probably situations where work >>>>>> > needs to be >>>>>> > > broken down into smaller units, making >>>>>> life >>>>>> > better for >>>>>> > > both, contributor and reviewer(s). >>>>>> Ideally, >>>>>> > every PR >>>>>> > > would have only one commit, but that may >>>>>> be a >>>>>> > bit much >>>>>> > > to mandate? Is the general expectation >>>>>> > something we need >>>>>> > > to document more clearly? >>>>>> > > >>>>>> > > Thanks, >>>>>> > > Thomas >>>>>> > > >>>>>> > >>>>>> >>>>>