Case study: https://github.com/apache/beam/pull/14618/commits

To get back to the question of Issue and PR titles:

 - When doing Jira triage, fix issue titles and issue type to be
meaningful. These autogenerate release notes, and also we use these to find
duplicates, etc.
 - Any committer can usually edit PR titles and descriptions. I do this
frequently. I am unhappy with how often the template "PLEASE add a
meaningful description" is left exactly as-is. The template is literally
begging you to actually describe the change. It is fine to duplicate the
body of the commit message. Good commit messages have the format
<subject>\n\n<description>.

Kenn

On Thu, Apr 22, 2021 at 12:49 PM Kenneth Knowles <k...@apache.org> wrote:

> I think I am wrong about this. It seems like for squashed/rebased commits
> it is still GitHub that is committer? But it does seem to have the metadata
> about who did the squash & merge. This pattern of storing important
> metadata outside of git is not a good direction.
>
> Kenn
>
> On Thu, Apr 22, 2021 at 12:45 PM Kenneth Knowles <k...@apache.org> wrote:
>
>> That is unfortunate that GitHub is the committer of merge commits :-/
>> though I suppose you have the author field you can use. It is unfortunate
>> the this is a different field based on method.
>>
>> Kenn
>>
>> On Thu, Apr 22, 2021 at 12:39 PM Ismaël Mejía <ieme...@gmail.com> wrote:
>>
>>> I was not referring to author identity but to committer identity that
>>> matters to know who accepted to merge something but it seems we are
>>> not really using this much because github is the 'committer' of merge
>>> commits too :S maybe something we can improve as part of this
>>> discussion.
>>>
>>> git show --pretty=full COMMITID
>>>
>>> On Thu, Apr 22, 2021 at 9:10 PM Valentyn Tymofieiev <valen...@google.com>
>>> wrote:
>>> >
>>> > Author identity is preserved. Here's an output of 'git log'
>>> >
>>> > commit 93ecc1d3a4b997b2490c4439972ffaf09125299f
>>> > Merge: 2e9ee8c005 4e3decbb4e
>>>                         <------ a merge commit that merges 2 commit,
>>> 4e3decbb4e and it's parent. Author history is preserved on 4e3decbb4e
>>> > Author: Ismaël Mejía <ieme...@gmail.com>
>>>                    <------  this is the author of merge commit
>>> > Date:   Thu Apr 22 12:46:38 2021 +0200
>>> >
>>> >     Merge pull request #14616: [BEAM-12207] Remove log messages about
>>> files to stage.    <------ Note that message was edited, and does not
>>> include a branch, which is nice!
>>> > commit 2e9ee8c0052d96045588e617c9e5de017f30454a
>>> >
>>> >
>>> > commit 28020effca12a18a65799ac7d2d3d520d73072d7
>>> > Author: yoshiki.obata <1285728+lazyl...@users.noreply.github.com>
>>> > Date:   Thu Apr 22 11:57:45 2021 +0900
>>> >
>>> >     [BEAM-7372] cleanup codes for py2 from apache_beam/transforms
>>> (#14544)     <--- 1-commit PR  was squashed-and-merged by me. Author's
>>> identity is preserved
>>> >
>>> > On Thu, Apr 22, 2021 at 11:47 AM Ismaël Mejía <ieme...@gmail.com>
>>> wrote:
>>> >>
>>> >> In the past github squash and merge did not preserve the committer
>>> >> identity correctly, is it still the case? If  so we should not be
>>> >> using it.
>>> >> https://github.com/isaacs/github/issues/1368
>>> >>
>>> >> On Thu, Apr 22, 2021 at 8:41 PM Robert Bradshaw <rober...@google.com>
>>> wrote:
>>> >> >
>>> >> > On Thu, Apr 22, 2021 at 11:29 AM Valentyn Tymofieiev <
>>> valen...@google.com> wrote:
>>> >> >>
>>> >> >> I always squash-and-merge even when there is only 1 commit. This
>>> avoids the necessity to edit the commit message to remove not so helpful
>>> "Merge pull request xxx" message. Is there any harm to recommend squash by
>>> default in the upcoming squash bot even for single commit PRs?
>>> >> >
>>> >> >
>>> >> > Does squash-and-merge in that case preserve the commit as-is if
>>> there's only one? In that case, there'd be no issues of history. (I opted
>>> to not comment on 1-commit PRs to be less chatty.)
>>> >> >
>>> >> >>
>>> >> >>
>>> >> >> On Thu, Apr 22, 2021 at 11:19 AM Robert Bradshaw <
>>> rober...@google.com> wrote:
>>> >> >>>
>>> >> >>> On Thu, Apr 22, 2021 at 9:33 AM Kenneth Knowles <k...@apache.org>
>>> wrote:
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> On Thu, Apr 22, 2021 at 7:04 AM Alexey Romanenko <
>>> aromanenko....@gmail.com> wrote:
>>> >> >>>>>
>>> >> >>>>> Thanks Ismael for bringing this on the table again. Kind of my
>>> “favourite” topic, unfortunately, that I raised a couple of times… Let me
>>> share some of my thoughts on this.
>>> >> >>>>>
>>> >> >>>>> First of all, as Beam developers, honestly we have to agree if
>>> we care about our commits history or not. If not (or not so much) then
>>> probably there is no more things to discuss and we use Git as just Git…
>>> It’s not a bad thing, it’s just different but for large projects, like
>>> Beam, clear commits history is ultra important, imho.
>>> >> >>>>>
>>> >> >>>>> Well, for now we do care and we clearly mention this in our
>>> Contribution Guide. Probably, it sounds only as a recommendation there or
>>> not all contributors (especially first-time ones) read this or take this
>>> into account or pay attention on this. It’s fine and we always can expect
>>> not following our guide because of many different reasons. And this is
>>> exactly where Committers have to play their role! I mean that our clear Git
>>> history mostly relies on committer's shoulders and, before clicking on
>>> Merge button, every committer have (even “must" I’d say) make sure that PR
>>> respects all our rules (we have them because of some reasons, right?) and
>>> ready to be merged. Nice and correct titles/messages is one this thing.
>>> Personally, the first thing that I do once I start to do a review and
>>> before merge, is checking the PR’s title, branches (if it’s from a feature
>>> branch and against main Beam branch), number of commits and their messages.
>>> Then I take a look on related Jira issue which ID should be prefixed to
>>> PR's title and commit’s message(s).
>>> >> >>>>>
>>> >> >>>>> For sure, there are always exceptions. In case of emergency,
>>> for example, if the build is broken because of tiny thing then it makes
>>> sense to fix this as fast as possible and then, probably, to neglect some
>>> rules. But if exceptions become the common practice and happen quite often,
>>> then it’s a signal that either we have to change the rules or change our
>>> attitude to this.
>>> >> >>>>>
>>> >> >>>>> As I see, the initial Ismael’s message of his email was more
>>> about titles and multiple commits per PR is another but, of course, related
>>> topic. For both, I believe we can partly automate it - add checks to
>>> prevent merging the commits with not correct messages or/and limit number
>>> of commits per PR, for example. Some other big projects, like Apache Spark,
>>> have even special tool to merge PR in well-formed way [1]. I’m not sure
>>> that we need to have something similar because I’m pretty sure it will
>>> affect the performance of adding new fixes/features (at least in the
>>> beginning), but since we already started the similar discussions several
>>> times on regular bases, we might want to think in this way as an option too.
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> Noting that we had one too [1]. The trouble was that the bot had
>>> a lot of downtime, code was not part of Beam's repo, and also did not
>>> encode best practices (for example it broke the connection between PRs and
>>> master branch history because it just cherry-picked and squashed commits
>>> and pushed those new unrelated commits straight to master). A script would
>>> address much of this.
>>> >> >>>
>>> >> >>>
>>> >> >>> Yeah, the mergebot was much more hassle than it was worth, and
>>> lots harder to use than pushing a button. I wouldn't be opposed to trying
>>> again with a better (simpler, under our control) one (and in my
>>> investigations of github actions, it doesn't look that hard).
>>> >> >>>
>>> >> >>>> Heuristic CI that says "this commit history looks OK" might
>>> solve a lot of the problem (I see that Robert already started on this).
>>> >> >>>>
>>> >> >>>> And finally I was to repeat my agreement with Ismaël and Alexey
>>> that the root problem is this: we need to actually care about the commit
>>> history and communication of PR/commit titles and descriptions. We use
>>> tools to help us to implement our intentions and to communicate them to
>>> newcomers, but I don't think this will replace taking care of the repo.
>>> >> >>>
>>> >> >>>
>>> >> >>> Committers should care about taking care of the repo more than
>>> the average contributor, but even there there is high variance. I think the
>>> issue is "oh, I didn't think to squash vs. merge" rather than "who cares, I
>>> always press merge anyway" in which case a timely reminder will go a long
>>> way.
>>> >> >>>
>>> >> >>>> Kenn
>>> >> >>>>
>>> >> >>>> [1]
>>> https://lists.apache.org/thread.html/4a65fb0b66935c9dc61568a3067538775edc3e685c6ac03dd3fa4725%40%3Cdev.beam.apache.org%3E
>>> >> >>>>
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>> As for me, I’d prefer that every committer paid more attention
>>> (if not yet) on these “non code” things before reviewing/merging a PR.
>>> >> >>>>>
>>> >> >>>>> [1]
>>> https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>> On 22 Apr 2021, at 01:28, Robert Bradshaw <rober...@google.com>
>>> wrote:
>>> >> >>>>>
>>> >> >>>>> I am also in the camp that it often makes sense to have more
>>> than 1 commit per PR, but rather than enforce a 1 commit per PR policy, I
>>> would say that it's too much bother to look at the commit history whether
>>> it should be squashed or merged (though I think it is almost always very
>>> obvious which is preferable for a given PR), go ahead and squash rather
>>> than merge by default.
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>> On Wed, Apr 21, 2021 at 2:23 PM Kenneth Knowles <
>>> k...@apache.org> wrote:
>>> >> >>>>>>
>>> >> >>>>>> This seems to come up a lot. Maybe we should change something?
>>> >> >>>>>>
>>> >> >>>>>> Having worked on a number of projects and at companies with
>>> this policy, companies using non-distributed source control, and companies
>>> that just "use git like git", I know all these ways of life pretty well.
>>> >> >>>>>>
>>> >> >>>>>> TL;DR my experience is:
>>> >> >>>>>>  - when people care about the commit history and take care of
>>> it, then just "use git like git" results in faster development and clearer
>>> history, despite intermediate commits not being tested by Jenkins/Travis/GHA
>>> >> >>>>>>  - when people see git as an inconvenience, view the history
>>> as an implementation detail, or think in linear history of PR merges and
>>> view the commits as an implementation detail, it becomes a mess
>>> >> >>>>>>
>>> >> >>>>>> Empirically, this is what I expect from a 1 commit = 1 PR
>>> policy (and how I feel about each point):
>>> >> >>>>>>  - fewer commits with bad messages (yay!)
>>> >> >>>>>>  - simpler git graph if we squash + rebase (meh)
>>> >> >>>>>>  - larger commits of related-but-independent changes (could be
>>> OK)
>>> >> >>>>>>  - commits with bullet points in their description that bundle
>>> unrelated changes (sad face)
>>> >> >>>>>>  - slowdown of development (neutral - slow can be good)
>>> >> >>>>>>  - fewer "quality of life" improvements, since those would add
>>> lines of diff to a PR and are off topic; when they have to be done in a
>>> separate PR they don't get done and they don't get reviewed with the same
>>> priority (extra sad face)
>>> >> >>>>>>
>>> >> >>>>>> <rant>I know I am in the minority. I tend to have a lot of PRs
>>> where there are 2-5 fairly independent commits. It is "to aid code review"
>>> but not in the way you might think: The best size for code review is pretty
>>> big, compared to the best size for commit. A commit is the unit of
>>> roll-forward, roll-back, cherry-pick, etc. Brian's point about commits not
>>> being independently tested is important: this is a tooling issue, but not
>>> that easy to change. Here is why I am not that worried about it: I believe
>>> strongly in a "rollback first" policy to restore greenness, but also that
>>> the rollback change itself must be verified to restore greenness. When a
>>> multi-commit PR fails, you can easily open a revert of the whole PR as well
>>> as reverts of individual suspect commits. The CI for these will finish
>>> around the same time, and if you manage a smaller revert, great! Imagine if
>>> to revert a PR you had to revert _every_ change between HEAD and that PR.
>>> It would restore to a known green state. Yet we don't do this, because we
>>> have technology that makes it unnecessary. Ultimately, single large commits
>>> with bullet points are just an unstructured version of multi-commit PRs. So
>>> I favor the structure. But people seem to be more likely to write good
>>> bullet points than to write independent commits. Perhaps because it is
>>> easier.</rant>
>>> >> >>>>>>
>>> >> >>>>>> So at this point, I think I am OK with a 1 commit per PR
>>> policy. I think the net benefits to our commit history would be good. I
>>> have grown tired of repeating the conversation. Rebase-and-squash edits
>>> commit ids in ways that confuses tools, so I do not favor this. Tooling
>>> that merges one commit at a time (without altering commit id) would also be
>>> super cool and not that hard. It would prevent intermediate results from
>>> merging, solving both problems.
>>> >> >>>>>>
>>> >> >>>>>> Kenn
>>> >> >>>>>>
>>> >> >>>>>>
>>> >> >>>>>> On Wed, Apr 21, 2021 at 1:25 PM Brian Hulette <
>>> bhule...@google.com> wrote:
>>> >> >>>>>>>
>>> >> >>>>>>> I'd argue that the history is almost always "most useful"
>>> when one PR == one commit on master. Intermediate commits from a PR may be
>>> useful to aid code review, but they're not verified by presubmits and thus
>>> aren't necessarily independently revertible, so I see little value in
>>> keeping them around on master. In fact if you're breaking up a PR into
>>> multiple commits to aid code review, it's worth considering if they
>>> could/should be separately reviewed and verified PRs.
>>> >> >>>>>>> We could solve the unwanted commit issue if we have a policy
>>> to always "Squash and Merge" PRs with rare exceptions.
>>> >> >>>>>>>
>>> >> >>>>>>> I agree jira/PR titles could be better, I'm not sure what we
>>> can do about it aside from reminding committers of this responsibility.
>>> Perhaps the triage process can help catch poorly titled jiras?
>>> >> >>>>>>>
>>> >> >>>>>>> On Wed, Apr 21, 2021 at 11:38 AM Robert Bradshaw <
>>> rober...@google.com> wrote:
>>> >> >>>>>>>>
>>> >> >>>>>>>> +1 to better descriptions for JIRA (and PRs). Thanks for
>>> bringing this up.
>>> >> >>>>>>>>
>>> >> >>>>>>>> For merging unwanted commits, can we automate a simple check
>>> (e.g. with github actions)?
>>> >> >>>>>>>>
>>> >> >>>>>>>> On Wed, Apr 21, 2021 at 8:00 AM Tomo Suzuki <
>>> suzt...@google.com> wrote:
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> BEAM-12173 is on me. I'm sorry about that. Re-reading
>>> committer guide
>>> >> >>>>>>>>> [1], I see I was not following this
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> > The reviewer should give the LGTM and then request that
>>> the author of the pull request rebase, squash, split, etc, the commits, so
>>> that the history is most useful
>>> >> >>>>>>>>>
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> Thank you for the feedback on this matter! (And I don't
>>> think we
>>> >> >>>>>>>>> should change the contribution guide)
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> [1] https://beam.apache.org/contribute/committer-guide/
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> On Wed, Apr 21, 2021 at 10:35 AM Ismaël Mejía <
>>> ieme...@gmail.com> wrote:
>>> >> >>>>>>>>> >
>>> >> >>>>>>>>> > Hello,
>>> >> >>>>>>>>> >
>>> >> >>>>>>>>> > I have noticed an ongoing pattern of carelessness around
>>> issues/PR titles and
>>> >> >>>>>>>>> > descriptions. It is really painful to see more and more
>>> examples like:
>>> >> >>>>>>>>> >
>>> >> >>>>>>>>> > BEAM-12160 Add TODO for fixing the warning
>>> >> >>>>>>>>> > BEAM-12165 Fix ParquetIO
>>> >> >>>>>>>>> > BEAM-12173 avoid intermediate conversion (PR) and
>>> BEAM-12173 use
>>> >> >>>>>>>>> > toMinutes (commit)
>>> >> >>>>>>>>> >
>>> >> >>>>>>>>> > In all these cases with just a bit of detail in the title
>>> it would be enough to
>>> >> >>>>>>>>> > make other contributors or reviewers life easierm as well
>>> as to have a better
>>> >> >>>>>>>>> > project history.  What astonishes me apart of the lack of
>>> care is that some of
>>> >> >>>>>>>>> > those are from Beam commmitters.
>>> >> >>>>>>>>> >
>>> >> >>>>>>>>> > We already have discussed about not paying attention
>>> during commit merges where
>>> >> >>>>>>>>> > some PRs end up merging tons of 'unwanted' fixup commits,
>>> and nothing has
>>> >> >>>>>>>>> > changed so I am wondering if we should maybe just totally
>>> remove that rule (for
>>> >> >>>>>>>>> > commits) and also eventually for titles and descriptions.
>>> >> >>>>>>>>> >
>>> >> >>>>>>>>> > Ismaël
>>> >> >>>>>>>>> >
>>> >> >>>>>>>>> > [1] https://beam.apache.org/contribute/
>>> >> >>>>>>>>>
>>> >> >>>>>>>>>
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> --
>>> >> >>>>>>>>> Regards,
>>> >> >>>>>>>>> Tomo
>>> >> >>>>>
>>> >> >>>>>
>>>
>>

Reply via email to