Case study: https://github.com/apache/beam/pull/14618/commits
To get back to the question of Issue and PR titles: - When doing Jira triage, fix issue titles and issue type to be meaningful. These autogenerate release notes, and also we use these to find duplicates, etc. - Any committer can usually edit PR titles and descriptions. I do this frequently. I am unhappy with how often the template "PLEASE add a meaningful description" is left exactly as-is. The template is literally begging you to actually describe the change. It is fine to duplicate the body of the commit message. Good commit messages have the format <subject>\n\n<description>. Kenn On Thu, Apr 22, 2021 at 12:49 PM Kenneth Knowles <k...@apache.org> wrote: > I think I am wrong about this. It seems like for squashed/rebased commits > it is still GitHub that is committer? But it does seem to have the metadata > about who did the squash & merge. This pattern of storing important > metadata outside of git is not a good direction. > > Kenn > > On Thu, Apr 22, 2021 at 12:45 PM Kenneth Knowles <k...@apache.org> wrote: > >> That is unfortunate that GitHub is the committer of merge commits :-/ >> though I suppose you have the author field you can use. It is unfortunate >> the this is a different field based on method. >> >> Kenn >> >> On Thu, Apr 22, 2021 at 12:39 PM Ismaël Mejía <ieme...@gmail.com> wrote: >> >>> I was not referring to author identity but to committer identity that >>> matters to know who accepted to merge something but it seems we are >>> not really using this much because github is the 'committer' of merge >>> commits too :S maybe something we can improve as part of this >>> discussion. >>> >>> git show --pretty=full COMMITID >>> >>> On Thu, Apr 22, 2021 at 9:10 PM Valentyn Tymofieiev <valen...@google.com> >>> wrote: >>> > >>> > Author identity is preserved. Here's an output of 'git log' >>> > >>> > commit 93ecc1d3a4b997b2490c4439972ffaf09125299f >>> > Merge: 2e9ee8c005 4e3decbb4e >>> <------ a merge commit that merges 2 commit, >>> 4e3decbb4e and it's parent. Author history is preserved on 4e3decbb4e >>> > Author: Ismaël Mejía <ieme...@gmail.com> >>> <------ this is the author of merge commit >>> > Date: Thu Apr 22 12:46:38 2021 +0200 >>> > >>> > Merge pull request #14616: [BEAM-12207] Remove log messages about >>> files to stage. <------ Note that message was edited, and does not >>> include a branch, which is nice! >>> > commit 2e9ee8c0052d96045588e617c9e5de017f30454a >>> > >>> > >>> > commit 28020effca12a18a65799ac7d2d3d520d73072d7 >>> > Author: yoshiki.obata <1285728+lazyl...@users.noreply.github.com> >>> > Date: Thu Apr 22 11:57:45 2021 +0900 >>> > >>> > [BEAM-7372] cleanup codes for py2 from apache_beam/transforms >>> (#14544) <--- 1-commit PR was squashed-and-merged by me. Author's >>> identity is preserved >>> > >>> > On Thu, Apr 22, 2021 at 11:47 AM Ismaël Mejía <ieme...@gmail.com> >>> wrote: >>> >> >>> >> In the past github squash and merge did not preserve the committer >>> >> identity correctly, is it still the case? If so we should not be >>> >> using it. >>> >> https://github.com/isaacs/github/issues/1368 >>> >> >>> >> On Thu, Apr 22, 2021 at 8:41 PM Robert Bradshaw <rober...@google.com> >>> wrote: >>> >> > >>> >> > On Thu, Apr 22, 2021 at 11:29 AM Valentyn Tymofieiev < >>> valen...@google.com> wrote: >>> >> >> >>> >> >> I always squash-and-merge even when there is only 1 commit. This >>> avoids the necessity to edit the commit message to remove not so helpful >>> "Merge pull request xxx" message. Is there any harm to recommend squash by >>> default in the upcoming squash bot even for single commit PRs? >>> >> > >>> >> > >>> >> > Does squash-and-merge in that case preserve the commit as-is if >>> there's only one? In that case, there'd be no issues of history. (I opted >>> to not comment on 1-commit PRs to be less chatty.) >>> >> > >>> >> >> >>> >> >> >>> >> >> On Thu, Apr 22, 2021 at 11:19 AM Robert Bradshaw < >>> rober...@google.com> wrote: >>> >> >>> >>> >> >>> On Thu, Apr 22, 2021 at 9:33 AM Kenneth Knowles <k...@apache.org> >>> wrote: >>> >> >>>> >>> >> >>>> >>> >> >>>> On Thu, Apr 22, 2021 at 7:04 AM Alexey Romanenko < >>> aromanenko....@gmail.com> wrote: >>> >> >>>>> >>> >> >>>>> Thanks Ismael for bringing this on the table again. Kind of my >>> “favourite” topic, unfortunately, that I raised a couple of times… Let me >>> share some of my thoughts on this. >>> >> >>>>> >>> >> >>>>> First of all, as Beam developers, honestly we have to agree if >>> we care about our commits history or not. If not (or not so much) then >>> probably there is no more things to discuss and we use Git as just Git… >>> It’s not a bad thing, it’s just different but for large projects, like >>> Beam, clear commits history is ultra important, imho. >>> >> >>>>> >>> >> >>>>> Well, for now we do care and we clearly mention this in our >>> Contribution Guide. Probably, it sounds only as a recommendation there or >>> not all contributors (especially first-time ones) read this or take this >>> into account or pay attention on this. It’s fine and we always can expect >>> not following our guide because of many different reasons. And this is >>> exactly where Committers have to play their role! I mean that our clear Git >>> history mostly relies on committer's shoulders and, before clicking on >>> Merge button, every committer have (even “must" I’d say) make sure that PR >>> respects all our rules (we have them because of some reasons, right?) and >>> ready to be merged. Nice and correct titles/messages is one this thing. >>> Personally, the first thing that I do once I start to do a review and >>> before merge, is checking the PR’s title, branches (if it’s from a feature >>> branch and against main Beam branch), number of commits and their messages. >>> Then I take a look on related Jira issue which ID should be prefixed to >>> PR's title and commit’s message(s). >>> >> >>>>> >>> >> >>>>> For sure, there are always exceptions. In case of emergency, >>> for example, if the build is broken because of tiny thing then it makes >>> sense to fix this as fast as possible and then, probably, to neglect some >>> rules. But if exceptions become the common practice and happen quite often, >>> then it’s a signal that either we have to change the rules or change our >>> attitude to this. >>> >> >>>>> >>> >> >>>>> As I see, the initial Ismael’s message of his email was more >>> about titles and multiple commits per PR is another but, of course, related >>> topic. For both, I believe we can partly automate it - add checks to >>> prevent merging the commits with not correct messages or/and limit number >>> of commits per PR, for example. Some other big projects, like Apache Spark, >>> have even special tool to merge PR in well-formed way [1]. I’m not sure >>> that we need to have something similar because I’m pretty sure it will >>> affect the performance of adding new fixes/features (at least in the >>> beginning), but since we already started the similar discussions several >>> times on regular bases, we might want to think in this way as an option too. >>> >> >>>> >>> >> >>>> >>> >> >>>> Noting that we had one too [1]. The trouble was that the bot had >>> a lot of downtime, code was not part of Beam's repo, and also did not >>> encode best practices (for example it broke the connection between PRs and >>> master branch history because it just cherry-picked and squashed commits >>> and pushed those new unrelated commits straight to master). A script would >>> address much of this. >>> >> >>> >>> >> >>> >>> >> >>> Yeah, the mergebot was much more hassle than it was worth, and >>> lots harder to use than pushing a button. I wouldn't be opposed to trying >>> again with a better (simpler, under our control) one (and in my >>> investigations of github actions, it doesn't look that hard). >>> >> >>> >>> >> >>>> Heuristic CI that says "this commit history looks OK" might >>> solve a lot of the problem (I see that Robert already started on this). >>> >> >>>> >>> >> >>>> And finally I was to repeat my agreement with Ismaël and Alexey >>> that the root problem is this: we need to actually care about the commit >>> history and communication of PR/commit titles and descriptions. We use >>> tools to help us to implement our intentions and to communicate them to >>> newcomers, but I don't think this will replace taking care of the repo. >>> >> >>> >>> >> >>> >>> >> >>> Committers should care about taking care of the repo more than >>> the average contributor, but even there there is high variance. I think the >>> issue is "oh, I didn't think to squash vs. merge" rather than "who cares, I >>> always press merge anyway" in which case a timely reminder will go a long >>> way. >>> >> >>> >>> >> >>>> Kenn >>> >> >>>> >>> >> >>>> [1] >>> https://lists.apache.org/thread.html/4a65fb0b66935c9dc61568a3067538775edc3e685c6ac03dd3fa4725%40%3Cdev.beam.apache.org%3E >>> >> >>>> >>> >> >>>>> >>> >> >>>>> >>> >> >>>>> As for me, I’d prefer that every committer paid more attention >>> (if not yet) on these “non code” things before reviewing/merging a PR. >>> >> >>>>> >>> >> >>>>> [1] >>> https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py >>> >> >>>>> >>> >> >>>>> >>> >> >>>>> On 22 Apr 2021, at 01:28, Robert Bradshaw <rober...@google.com> >>> wrote: >>> >> >>>>> >>> >> >>>>> I am also in the camp that it often makes sense to have more >>> than 1 commit per PR, but rather than enforce a 1 commit per PR policy, I >>> would say that it's too much bother to look at the commit history whether >>> it should be squashed or merged (though I think it is almost always very >>> obvious which is preferable for a given PR), go ahead and squash rather >>> than merge by default. >>> >> >>>>> >>> >> >>>>> >>> >> >>>>> On Wed, Apr 21, 2021 at 2:23 PM Kenneth Knowles < >>> k...@apache.org> wrote: >>> >> >>>>>> >>> >> >>>>>> This seems to come up a lot. Maybe we should change something? >>> >> >>>>>> >>> >> >>>>>> Having worked on a number of projects and at companies with >>> this policy, companies using non-distributed source control, and companies >>> that just "use git like git", I know all these ways of life pretty well. >>> >> >>>>>> >>> >> >>>>>> TL;DR my experience is: >>> >> >>>>>> - when people care about the commit history and take care of >>> it, then just "use git like git" results in faster development and clearer >>> history, despite intermediate commits not being tested by Jenkins/Travis/GHA >>> >> >>>>>> - when people see git as an inconvenience, view the history >>> as an implementation detail, or think in linear history of PR merges and >>> view the commits as an implementation detail, it becomes a mess >>> >> >>>>>> >>> >> >>>>>> Empirically, this is what I expect from a 1 commit = 1 PR >>> policy (and how I feel about each point): >>> >> >>>>>> - fewer commits with bad messages (yay!) >>> >> >>>>>> - simpler git graph if we squash + rebase (meh) >>> >> >>>>>> - larger commits of related-but-independent changes (could be >>> OK) >>> >> >>>>>> - commits with bullet points in their description that bundle >>> unrelated changes (sad face) >>> >> >>>>>> - slowdown of development (neutral - slow can be good) >>> >> >>>>>> - fewer "quality of life" improvements, since those would add >>> lines of diff to a PR and are off topic; when they have to be done in a >>> separate PR they don't get done and they don't get reviewed with the same >>> priority (extra sad face) >>> >> >>>>>> >>> >> >>>>>> <rant>I know I am in the minority. I tend to have a lot of PRs >>> where there are 2-5 fairly independent commits. It is "to aid code review" >>> but not in the way you might think: The best size for code review is pretty >>> big, compared to the best size for commit. A commit is the unit of >>> roll-forward, roll-back, cherry-pick, etc. Brian's point about commits not >>> being independently tested is important: this is a tooling issue, but not >>> that easy to change. Here is why I am not that worried about it: I believe >>> strongly in a "rollback first" policy to restore greenness, but also that >>> the rollback change itself must be verified to restore greenness. When a >>> multi-commit PR fails, you can easily open a revert of the whole PR as well >>> as reverts of individual suspect commits. The CI for these will finish >>> around the same time, and if you manage a smaller revert, great! Imagine if >>> to revert a PR you had to revert _every_ change between HEAD and that PR. >>> It would restore to a known green state. Yet we don't do this, because we >>> have technology that makes it unnecessary. Ultimately, single large commits >>> with bullet points are just an unstructured version of multi-commit PRs. So >>> I favor the structure. But people seem to be more likely to write good >>> bullet points than to write independent commits. Perhaps because it is >>> easier.</rant> >>> >> >>>>>> >>> >> >>>>>> So at this point, I think I am OK with a 1 commit per PR >>> policy. I think the net benefits to our commit history would be good. I >>> have grown tired of repeating the conversation. Rebase-and-squash edits >>> commit ids in ways that confuses tools, so I do not favor this. Tooling >>> that merges one commit at a time (without altering commit id) would also be >>> super cool and not that hard. It would prevent intermediate results from >>> merging, solving both problems. >>> >> >>>>>> >>> >> >>>>>> Kenn >>> >> >>>>>> >>> >> >>>>>> >>> >> >>>>>> On Wed, Apr 21, 2021 at 1:25 PM Brian Hulette < >>> bhule...@google.com> wrote: >>> >> >>>>>>> >>> >> >>>>>>> I'd argue that the history is almost always "most useful" >>> when one PR == one commit on master. Intermediate commits from a PR may be >>> useful to aid code review, but they're not verified by presubmits and thus >>> aren't necessarily independently revertible, so I see little value in >>> keeping them around on master. In fact if you're breaking up a PR into >>> multiple commits to aid code review, it's worth considering if they >>> could/should be separately reviewed and verified PRs. >>> >> >>>>>>> We could solve the unwanted commit issue if we have a policy >>> to always "Squash and Merge" PRs with rare exceptions. >>> >> >>>>>>> >>> >> >>>>>>> I agree jira/PR titles could be better, I'm not sure what we >>> can do about it aside from reminding committers of this responsibility. >>> Perhaps the triage process can help catch poorly titled jiras? >>> >> >>>>>>> >>> >> >>>>>>> On Wed, Apr 21, 2021 at 11:38 AM Robert Bradshaw < >>> rober...@google.com> wrote: >>> >> >>>>>>>> >>> >> >>>>>>>> +1 to better descriptions for JIRA (and PRs). Thanks for >>> bringing this up. >>> >> >>>>>>>> >>> >> >>>>>>>> For merging unwanted commits, can we automate a simple check >>> (e.g. with github actions)? >>> >> >>>>>>>> >>> >> >>>>>>>> On Wed, Apr 21, 2021 at 8:00 AM Tomo Suzuki < >>> suzt...@google.com> wrote: >>> >> >>>>>>>>> >>> >> >>>>>>>>> BEAM-12173 is on me. I'm sorry about that. Re-reading >>> committer guide >>> >> >>>>>>>>> [1], I see I was not following this >>> >> >>>>>>>>> >>> >> >>>>>>>>> > The reviewer should give the LGTM and then request that >>> the author of the pull request rebase, squash, split, etc, the commits, so >>> that the history is most useful >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> Thank you for the feedback on this matter! (And I don't >>> think we >>> >> >>>>>>>>> should change the contribution guide) >>> >> >>>>>>>>> >>> >> >>>>>>>>> [1] https://beam.apache.org/contribute/committer-guide/ >>> >> >>>>>>>>> >>> >> >>>>>>>>> On Wed, Apr 21, 2021 at 10:35 AM Ismaël Mejía < >>> ieme...@gmail.com> wrote: >>> >> >>>>>>>>> > >>> >> >>>>>>>>> > Hello, >>> >> >>>>>>>>> > >>> >> >>>>>>>>> > I have noticed an ongoing pattern of carelessness around >>> issues/PR titles and >>> >> >>>>>>>>> > descriptions. It is really painful to see more and more >>> examples like: >>> >> >>>>>>>>> > >>> >> >>>>>>>>> > BEAM-12160 Add TODO for fixing the warning >>> >> >>>>>>>>> > BEAM-12165 Fix ParquetIO >>> >> >>>>>>>>> > BEAM-12173 avoid intermediate conversion (PR) and >>> BEAM-12173 use >>> >> >>>>>>>>> > toMinutes (commit) >>> >> >>>>>>>>> > >>> >> >>>>>>>>> > In all these cases with just a bit of detail in the title >>> it would be enough to >>> >> >>>>>>>>> > make other contributors or reviewers life easierm as well >>> as to have a better >>> >> >>>>>>>>> > project history. What astonishes me apart of the lack of >>> care is that some of >>> >> >>>>>>>>> > those are from Beam commmitters. >>> >> >>>>>>>>> > >>> >> >>>>>>>>> > We already have discussed about not paying attention >>> during commit merges where >>> >> >>>>>>>>> > some PRs end up merging tons of 'unwanted' fixup commits, >>> and nothing has >>> >> >>>>>>>>> > changed so I am wondering if we should maybe just totally >>> remove that rule (for >>> >> >>>>>>>>> > commits) and also eventually for titles and descriptions. >>> >> >>>>>>>>> > >>> >> >>>>>>>>> > Ismaël >>> >> >>>>>>>>> > >>> >> >>>>>>>>> > [1] https://beam.apache.org/contribute/ >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> >>> >> >>>>>>>>> -- >>> >> >>>>>>>>> Regards, >>> >> >>>>>>>>> Tomo >>> >> >>>>> >>> >> >>>>> >>> >>