SGTM. I generally favor "trust the committer's judgment" aka (2)(d) when it is obvious. We had some small communication problem about this before so I just wanted to be careful to ask the author when it is not obvious.
Kenn On Sat, Sep 29, 2018 at 9:06 AM Robert Bradshaw <rober...@google.com> wrote: > There's two separate topics of discussion going on here. > > (1) Is it acceptable for PRs to have more than one commit. We've discussed > this before, and the consensus seems to be that this is both allowed and > intentional. > > (2) What to do about PRs that are clearly "[BEAM-NNNN] Implement feature" > + (fixup/address comments/lint/...)*. I think this is easier (or at least > quicker) to make a call on. Our options are > > (a) Merge as is. > (b) Ask the author to do a final squash. (Doing a squash before an > approval, however, makes reviewing more difficult, so this enforces another > round.) > (c) Having the committer manually pull, squash, (force push to PR?), and > merge. > (d) Use the "squash and merge" button in this case. > > Clearly, we shouldn't be doing any squashing unless the intent is obvious, > but when it is, I think (d) is the sanest and safest route. > > - Robert > > > > On Fri, Sep 28, 2018 at 9:53 PM Kenneth Knowles <k...@apache.org> wrote: > >> On Fri, Sep 28, 2018 at 10:29 AM Thomas Weise <t...@apache.org> wrote: >> >>> +1 for stating the goal of clear provenance and granular rollback. >>> >> >> >>> >>> >> Also of course efficiency and quality of review (we don't to review tiny >> or out-of-context changes or huge mega-changes), efficiency of authoring >> (don't want to wait on a review for a tiny bit because GitHub makes it very >> hard to stack up reviews in sequence / don't want to have major changes >> blocked because of difficulty of review), ease of new contribution (OK for >> committers to do more IMO, while new/one-time contributors shouldn't need >> to know or obey any policy). >> >> I think this discussion helps to remind/identify best practices how to >>> get there. Where appropriate we should augment guidelines for both, >>> contributor and committer. >>> >> >> Kenn: would you elaborate on the "1 commit = 1 review (and sometimes even >>> = 1 ticket)" a bit more. Is that a problem of insufficient task / ticket >>> granularity or something else? >>> >> >> The problem isn't a failure to define tasks at the right granularity, but >> that they naturally and fundamentally exist with a different granularity >> (unit of change -> unit of review -> unit of tracking/delivery). I'd guess >> there's often a 5x jump in size from commit -> review -> ticket. There are >> many many easy-to-isolate changes that are wasteful to independent review >> or track. >> >> To get some concrete facts, I bet the one things we could probably find >> research on is the ideal review size. And we could also scrape logs for >> messages with bullet points (often each bullet is basically what would have >> been a commit). Generally getting a sense of what these ratios are in >> practice and idealized would be kind of interesting but maybe overkill. >> >> Kenn >> >> Charles: my plan is to translate the outcome of this discussion to >>> guideline updates and your log filter trick will be part of it. Though my >>> hope is that it won't be needed if we get closer to the goals above. >>> >>> Thanks, >>> Thomas >>> >>> >>> On Fri, Sep 28, 2018 at 9:44 AM Kenneth Knowles <k...@apache.org> wrote: >>> >>>> Anton makes a good point. We have been talking about policy for what we >>>> do, but the real issue is what we want to come out of it: a clear history >>>> for seeing where code came from and granular rollback. I think in both >>>> cases the key thing is that each commit is a single clear change. How they >>>> get there is not the point. >>>> >>>> I have worked on multiple projects with a 1 commit = 1 review (and >>>> sometimes even = 1 ticket). These pretty much never have a good history. >>>> The best case is that each commit has a message that is a bullet point of >>>> many separate changes, because it is simply too inefficient to review each >>>> logical change separately. But since the messages become less useful, it >>>> encourages a culture of not even bothering to write meaningful messages at >>>> all. >>>> >>>> Note that is trivial for a committer-reviewer to edit the history any >>>> way they like without the button. "Allow edits by maintainers" is on by >>>> default. The "Squash and merge" button just adds a button for something we >>>> can already do. >>>> >>>> Charles: super useful! Worth noting that for a PR with a good history >>>> it will skip meaningful commits (but still give the summary line, which is >>>> nice). >>>> >>>> Kenn >>>> >>>> >>>> >>>> On Fri, Sep 28, 2018 at 8:54 AM Anton Kedin <ke...@google.com> wrote: >>>> >>>>> Is there an actual problem caused by squashing or not squashing the >>>>> commits that we face in the project? I personally have never needed to >>>>> revert something complicated that would be problematic either way (and >>>>> don't have a strong opinion about which way we should do it). From what I >>>>> see so far in the thread it doesn't look like reverting is a frequent >>>>> major >>>>> pain for anyone. Maybe it is exactly because we're mostly following some >>>>> best practice and it makes it easy. If someone has concrete examples from >>>>> their experience in the project, please share them, this way it would be >>>>> easier to justify the choice. >>>>> >>>>> The PR and commit cleanliness, size and isolation are probably more >>>>> important thing to have guidance and maybe enforcement for. There are well >>>>> known practices and guidelines that I think we should follow, and I think >>>>> they will make squashing or not squashing mostly irrelevant. For example, >>>>> if we accept that commits should have description that actually describes >>>>> what commit does, then "!fixup", "address comments" and similar should not >>>>> be part of the history and should be squashed before submitting the PR no >>>>> matter which way we decide to go in general. Also, I think that making >>>>> commits isolated is also a good practice, and PR author should be able to >>>>> relatively easily split the PR upon reviewer's request. And if we choose >>>>> to >>>>> keep whole PRs small and incremental with descriptive isolated commits, >>>>> then there won't be too much difference how many commits there are. >>>>> >>>>> Regards, >>>>> Anton >>>>> >>>>> On Fri, Sep 28, 2018 at 8:21 AM Andrew Pilloud <apill...@google.com> >>>>> wrote: >>>>> >>>>>> I brought up this discussion a few months ago from the other side: I >>>>>> don't like my commits being squashed. I try to create logical commits >>>>>> that >>>>>> each passes tests and could be broken up into multiple PRs. Keeping those >>>>>> changes intact is useful from a history perspective and squashing may >>>>>> break >>>>>> other PRs I have in flight. If the intent is clear (one commit with a >>>>>> descriptive message and a bunch of "fixups"), then feel free to squash, >>>>>> otherwise ask first. When you do squash, it would be good to leave a >>>>>> comment as to how the author can avoid having their commits squashed in >>>>>> the >>>>>> future. >>>>>> >>>>>> >>>>>> https://lists.apache.org/thread.html/8d29e474e681ab9123280164d95075bb8b0b91486b66d3fa25ed20c2@%3Cdev.beam.apache.org%3E >>>>>> >>>>>> Andrew >>>>>> >>>>>> On Fri, Sep 28, 2018 at 7:29 AM Chamikara Jayalath < >>>>>> chamik...@google.com> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, Sep 27, 2018 at 9:51 AM Robert Bradshaw <rober...@google.com> >>>>>>> wrote: >>>>>>> >>>>>>>> I agree that we should create a good pointer for cleaning up PRs, >>>>>>>> and request (though not require) that authors do it. It's unfortunate >>>>>>>> though that squashing during a review makes things difficult to >>>>>>>> follow, so >>>>>>>> adds one more round trip. >>>>>>>> >>>>>>>> We could consider for those PRs that make sense as a single logical >>>>>>>> commit (most, but not all, of them) simply using the "squash and merge" >>>>>>>> button even though it technically doesn't create a merge commit. >>>>>>>> >>>>>>> >>>>>>> +1 for allowing "squash and merge" as an option. Most of the reviews >>>>>>> (at least for me) consist of a single valid commit and several >>>>>>> additional >>>>>>> commits that get piled up during the review process which obviously >>>>>>> should >>>>>>> not be included in the commit history. Going through another round here >>>>>>> just to ask the author to fixup everything is unnecessarily time >>>>>>> consuming. >>>>>>> >>>>>>> - Cham >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Sep 21, 2018 at 9:24 PM Daniel Oliveira < >>>>>>>> danolive...@google.com> wrote: >>>>>>>> >>>>>>>>> As a non-committer I think some automated squashing of commits >>>>>>>>> sounds best since it lightens the load of regular contributors, by not >>>>>>>>> having to always remember to squash, and lightens the load of >>>>>>>>> committers so >>>>>>>>> it doesn't take as long to have your PR approved by one. >>>>>>>>> >>>>>>>>> But for now I think the second best route would be making it PR >>>>>>>>> author's responsibility to squash fixup commits. Having that >>>>>>>>> expectation >>>>>>>>> described clearly in the Contributor's Guide, along with some simple >>>>>>>>> step-by-step instructions for how to do so should be enough. I mainly >>>>>>>>> support this because I've been doing the squashing myself since I saw >>>>>>>>> a >>>>>>>>> thread about it here a few months ago. It's not nearly as huge a >>>>>>>>> burden on >>>>>>>>> me as it probably is for committers who have to merge in many more >>>>>>>>> PRs, >>>>>>>>> it's very easy to learn how to do, and it's one less barrier to >>>>>>>>> having my >>>>>>>>> code merged in. >>>>>>>>> >>>>>>>>> Of course I wouldn't expect that committers wait for PR authors to >>>>>>>>> squash their fixup commits, but I think leaving a message like "For >>>>>>>>> future >>>>>>>>> pull requests you should squash any small fixup commits, as described >>>>>>>>> here: >>>>>>>>> <link>" should be fine. >>>>>>>>> >>>>>>>>> >>>>>>>>>> I was also thinking about the possibility of wanting to revert >>>>>>>>>> individual commits from a merge commit. The solution you propose >>>>>>>>>> works, >>>>>>>>>> but only if you want to revert everything. >>>>>>>>> >>>>>>>>> >>>>>>>>> Does this happen often? I might not have enough context since I'm >>>>>>>>> not a committer, but it seems to me that often the person performing a >>>>>>>>> revert is not the original author of a change and doesn't have the >>>>>>>>> context >>>>>>>>> or time to pick out an individual commit to revert. >>>>>>>>> >>>>>>>>> On Wed, Sep 19, 2018 at 1:32 PM Maximilian Michels <m...@apache.org> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I tend to agree with you Lukasz. Of course we should try to >>>>>>>>>> follow the >>>>>>>>>> guide lines as much as possible but if it requires an extra back >>>>>>>>>> and >>>>>>>>>> forth with the PR author for a cosmetic change, it may not be >>>>>>>>>> worth the >>>>>>>>>> time. >>>>>>>>>> >>>>>>>>>> On 19.09.18 22:17, Lukasz Cwik wrote: >>>>>>>>>> > I have to say I'm guilty of not following the merge guidelines, >>>>>>>>>> > sometimes doing merges without rebasing/flatten commits. >>>>>>>>>> > >>>>>>>>>> > I find that it is a few extra mins of my time to fix someones >>>>>>>>>> PR history >>>>>>>>>> > if they have more then one logical commit they want to be >>>>>>>>>> separate and >>>>>>>>>> > it usually takes days for the PR author to do merging with the >>>>>>>>>> extra >>>>>>>>>> > burden as a committer to keep track of another PR and its state >>>>>>>>>> (waiting >>>>>>>>>> > for clean-up) is taxing. I really liked the idea of the >>>>>>>>>> mergebot (even >>>>>>>>>> > though it didn't work out in practice) because it could do all >>>>>>>>>> the >>>>>>>>>> > policy work on my behalf. >>>>>>>>>> > >>>>>>>>>> > Anything that reduces my overhead as a committer is useful as >>>>>>>>>> for the >>>>>>>>>> > 100s of PRs that I have merged, I've only had to rollback a >>>>>>>>>> couple so >>>>>>>>>> > I'm for Charle's suggestion which makes the rollback flow >>>>>>>>>> slightly more >>>>>>>>>> > complicated for a significantly easier PR merge workflow. >>>>>>>>>> > >>>>>>>>>> > On Wed, Sep 19, 2018 at 1:13 PM Charles Chen <c...@google.com >>>>>>>>>> > <mailto:c...@google.com>> wrote: >>>>>>>>>> > >>>>>>>>>> > What I mean is that if you get the first-parent commit >>>>>>>>>> using "git >>>>>>>>>> > log --first-parent", it will incorporate any and all fix up >>>>>>>>>> commits >>>>>>>>>> > so we don't need to worry about missing any. >>>>>>>>>> > >>>>>>>>>> > On Wed, Sep 19, 2018, 1:07 PM Maximilian Michels < >>>>>>>>>> m...@apache.org >>>>>>>>>> > <mailto:m...@apache.org>> wrote: >>>>>>>>>> > >>>>>>>>>> > Generally, +1 for isolated commits which are easy to >>>>>>>>>> revert. >>>>>>>>>> > >>>>>>>>>> > > I don't think it's actually harder to roll back a >>>>>>>>>> set of >>>>>>>>>> > commits that are merged together. >>>>>>>>>> > I think Thomas was mainly concerned about "fixup" >>>>>>>>>> commits to >>>>>>>>>> > land in >>>>>>>>>> > master (as part of a merge). These indeed make >>>>>>>>>> reverting commits >>>>>>>>>> > more >>>>>>>>>> > difficult because you have to check whether you missed >>>>>>>>>> a "fixup". >>>>>>>>>> > >>>>>>>>>> > > Ideally every commit should compile and pass tests >>>>>>>>>> though, right? >>>>>>>>>> > >>>>>>>>>> > That is definitely what we should strive for when doing >>>>>>>>>> a merge >>>>>>>>>> > against >>>>>>>>>> > master. >>>>>>>>>> > >>>>>>>>>> > > Perhaps the bigger issue is that we need better >>>>>>>>>> documentation >>>>>>>>>> > and a playbook on how to do this these common tasks in >>>>>>>>>> git. >>>>>>>>>> > >>>>>>>>>> > We do actually have basic documentation about this but >>>>>>>>>> most >>>>>>>>>> > people don't >>>>>>>>>> > read it. For example, the commit message of a Merge >>>>>>>>>> commit >>>>>>>>>> > should be: >>>>>>>>>> > >>>>>>>>>> > Merge pull request #xxxx: [BEAM-yyyy] Issue title >>>>>>>>>> > >>>>>>>>>> > But most merge commits don't comply with this rule :) >>>>>>>>>> See >>>>>>>>>> > >>>>>>>>>> https://beam.apache.org/contribute/committer-guide/#merging-it >>>>>>>>>> > >>>>>>>>>> > On 19.09.18 21:34, Reuven Lax wrote: >>>>>>>>>> > > Ideally every commit should compile and pass tests >>>>>>>>>> though, right? >>>>>>>>>> > > >>>>>>>>>> > > On Wed, Sep 19, 2018 at 12:15 PM Ankur Goenka >>>>>>>>>> > <goe...@google.com <mailto:goe...@google.com> >>>>>>>>>> > > <mailto:goe...@google.com <mailto:goe...@google.com>>> >>>>>>>>>> wrote: >>>>>>>>>> > > >>>>>>>>>> > > I agree with the cleanliness of the Commit >>>>>>>>>> history. >>>>>>>>>> > > "Fixup!", "Address comments", "Address even more >>>>>>>>>> > comments" type of >>>>>>>>>> > > comments does not convey meaningful information >>>>>>>>>> and are >>>>>>>>>> > not very >>>>>>>>>> > > useful. Its a good idea to squash them. >>>>>>>>>> > > However, I think its ok to keep separate commits >>>>>>>>>> for >>>>>>>>>> > different >>>>>>>>>> > > logical pieces of the code which make reviewing >>>>>>>>>> and >>>>>>>>>> > revisiting code >>>>>>>>>> > > easier. >>>>>>>>>> > > Example PR: Support X in the pipeline >>>>>>>>>> > > Commit 1: Restructuring a bunch of code without >>>>>>>>>> any >>>>>>>>>> > logical change. >>>>>>>>>> > > Commit 2: Changing validation logic for pipeline. >>>>>>>>>> > > Commit 3: Supporting new field "X" for pipeline. >>>>>>>>>> > > >>>>>>>>>> > > On Wed, Sep 19, 2018 at 11:27 AM Charles Chen >>>>>>>>>> > <c...@google.com <mailto:c...@google.com> >>>>>>>>>> > > <mailto:c...@google.com <mailto:c...@google.com>>> >>>>>>>>>> wrote: >>>>>>>>>> > > >>>>>>>>>> > > To be concrete, it is very easy to revert a >>>>>>>>>> commit in >>>>>>>>>> > any case: >>>>>>>>>> > > >>>>>>>>>> > > 1. First, use "git log --first-parent" to >>>>>>>>>> find the >>>>>>>>>> > first-parent >>>>>>>>>> > > commit corresponding to a PR merge (this >>>>>>>>>> is a >>>>>>>>>> > one-to-one >>>>>>>>>> > > correspondence). >>>>>>>>>> > > 2. Use "git revert -m 1 <commitid>" to >>>>>>>>>> revert the >>>>>>>>>> > commit; this >>>>>>>>>> > > selects the first parent as the base for >>>>>>>>>> a merge >>>>>>>>>> > commit (in >>>>>>>>>> > > the case where a single commit needs to >>>>>>>>>> be >>>>>>>>>> > reverted, just >>>>>>>>>> > > use "git revert <commitid>" without the >>>>>>>>>> "-m 1" flag). >>>>>>>>>> > > >>>>>>>>>> > > In any case, as a general good engineering >>>>>>>>>> practice, >>>>>>>>>> > I do agree >>>>>>>>>> > > that it is highly desirable to have small >>>>>>>>>> independent PRs >>>>>>>>>> > > instead of large jumbo PRs whenever possible. >>>>>>>>>> > > >>>>>>>>>> > > On Wed, Sep 19, 2018 at 11:20 AM Charles Chen >>>>>>>>>> > <c...@google.com <mailto:c...@google.com> >>>>>>>>>> > > <mailto:c...@google.com <mailto: >>>>>>>>>> c...@google.com>>> wrote: >>>>>>>>>> > > >>>>>>>>>> > > I don't think it's actually harder to >>>>>>>>>> roll back a >>>>>>>>>> > set of >>>>>>>>>> > > commits that are merged together. Git >>>>>>>>>> has the >>>>>>>>>> > notion of >>>>>>>>>> > > first-parent commits (you can see, for >>>>>>>>>> example, >>>>>>>>>> > "git log >>>>>>>>>> > > --first-parent", which filters out the >>>>>>>>>> intermediate >>>>>>>>>> > > commits). In this sense, PRs still get >>>>>>>>>> merged as >>>>>>>>>> > one unit >>>>>>>>>> > > and this is preserved even if >>>>>>>>>> intermediate >>>>>>>>>> > commits are >>>>>>>>>> > > kept. Perhaps the bigger issue is that >>>>>>>>>> we need >>>>>>>>>> > better >>>>>>>>>> > > documentation and a playbook on how to >>>>>>>>>> do this >>>>>>>>>> > these common >>>>>>>>>> > > tasks in git. >>>>>>>>>> > > >>>>>>>>>> > > On Wed, Sep 19, 2018 at 9:27 AM Thomas >>>>>>>>>> Weise >>>>>>>>>> > <t...@apache.org <mailto:t...@apache.org> >>>>>>>>>> > > <mailto:t...@apache.org <mailto: >>>>>>>>>> t...@apache.org>>> >>>>>>>>>> > wrote: >>>>>>>>>> > > >>>>>>>>>> > > Wanted to bring this up as reminder >>>>>>>>>> as well as >>>>>>>>>> > > opportunity to discuss potential >>>>>>>>>> changes to our >>>>>>>>>> > > committer guide. It has been a while >>>>>>>>>> since >>>>>>>>>> > last related >>>>>>>>>> > > discussion and we welcomed several >>>>>>>>>> new >>>>>>>>>> > committers since >>>>>>>>>> > > then. >>>>>>>>>> > > >>>>>>>>>> > > Finishing up pull requests pre-merge: >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > >>>>>>>>>> https://beam.apache.org/contribute/committer-guide/#finishing-touches >>>>>>>>>> > > >>>>>>>>>> > > PRs are worked on over time and may >>>>>>>>>> > accumulate many >>>>>>>>>> > > commits. Sometimes because scope >>>>>>>>>> expands, >>>>>>>>>> > sometimes just >>>>>>>>>> > > to separate independent changes but >>>>>>>>>> most of >>>>>>>>>> > the time the >>>>>>>>>> > > commits are just fixups that are >>>>>>>>>> added as >>>>>>>>>> > review progresses. >>>>>>>>>> > > It is important that the latter get >>>>>>>>>> squashed >>>>>>>>>> > prior to PR >>>>>>>>>> > > merge, as otherwise we lost the >>>>>>>>>> ability to >>>>>>>>>> > roll back >>>>>>>>>> > > changes by reverting a single commit >>>>>>>>>> and also >>>>>>>>>> > generally >>>>>>>>>> > > cause a lot of noise in the commit >>>>>>>>>> history >>>>>>>>>> > that does not >>>>>>>>>> > > help other contributors. To be >>>>>>>>>> clear, I refer >>>>>>>>>> > to the >>>>>>>>>> > > "Fixup!", "Address comments", >>>>>>>>>> "Address even more >>>>>>>>>> > > comments" type of entries :) >>>>>>>>>> > > >>>>>>>>>> > > I would also propose that every >>>>>>>>>> commit gets >>>>>>>>>> > tagged with >>>>>>>>>> > > a JIRA (except those fixups that >>>>>>>>>> will be >>>>>>>>>> > squashed). >>>>>>>>>> > > Having the JIRA and possibly other >>>>>>>>>> tags makes >>>>>>>>>> > it easier >>>>>>>>>> > > for others not involved in the PR to >>>>>>>>>> identify >>>>>>>>>> > changes >>>>>>>>>> > > after they were merged, for example >>>>>>>>>> when >>>>>>>>>> > looking at the >>>>>>>>>> > > revision history or annotated source. >>>>>>>>>> > > >>>>>>>>>> > > As for other scenarios of jumbo PRs >>>>>>>>>> with many >>>>>>>>>> > commits, >>>>>>>>>> > > there are probably situations where >>>>>>>>>> work >>>>>>>>>> > needs to be >>>>>>>>>> > > broken down into smaller units, >>>>>>>>>> making life >>>>>>>>>> > better for >>>>>>>>>> > > both, contributor and reviewer(s). >>>>>>>>>> Ideally, >>>>>>>>>> > every PR >>>>>>>>>> > > would have only one commit, but that >>>>>>>>>> may be a >>>>>>>>>> > bit much >>>>>>>>>> > > to mandate? Is the general >>>>>>>>>> expectation >>>>>>>>>> > something we need >>>>>>>>>> > > to document more clearly? >>>>>>>>>> > > >>>>>>>>>> > > Thanks, >>>>>>>>>> > > Thomas >>>>>>>>>> > > >>>>>>>>>> > >>>>>>>>>> >>>>>>>>>