Thanks for all the super useful info. I can add some experience from the
early days of Beam: Because the GitHub "merge" button did not exist, we
rolled our own "merge bot". The pros/cons below are not general to the
concept, but to our experience with it:

PROs:
 - It single-threaded commits to the repo so there weren't race conditions
on test results. I think this will be OK for our scale for the foreseeable
future.
 - It re-ran tests after merge but before pushing to master, which is the
second half of eliminating that race condition.

CONs:
 - It was very flaky and often just didn't stay up. We danced for joy when
it was gone.
 - It ran a kind of arbitrary set of tests that didn't match the PR
statuses. It did not have any filter on which tests were run during merge.
We were small enough that mostly just "run the tests" was specific enough.
 - It always squashed the commits and then pushed the squashed commit with
a comment "this closes #<pr>". This messes up PRs that have multiple
commits that should remain separate. But more importantly it made it
impossible to easily distinguish PRs that were merged versus those that
were closed without merge. And of course it is way harder to navigate
history when the commits on master have different hashes than the commits
that were authored.
 - Getting reasonable logs with error messages when a merge failed for
whatever reason was hard or impossible IIRC.

So I think in what you have said there is an option to get the best of all,
something like:

 - Do merges through an action. Merge commit or squash-and-merge would be
separate labels. Or not bother with squash and merge, instead using
heuristics to block PRs with bad commit histories.
 - Have the workflow check that all PR statuses are green before continuing
to merge.

Kenn

On Tue, Jun 28, 2022 at 8:03 AM Danny McCormick <dannymccorm...@google.com>
wrote:

> After looking into this a little bit more, I need to revise my opinion on
> how we would do this; I don't think it's practical to have all required
> status checks via the .asf.yml file because those required checks can't be
> filtered by path (for example, if we want to require Python precommit on
> Python PRs, it would need to be required on *all *PRs). That's a GitHub
> limitation
> <https://docs.github.com/en/enterprise-cloud@latest/repositories/configuring-branches-and-merges-in-your-repository/defining-the-mergeability-of-pull-requests/troubleshooting-required-status-checks#handling-skipped-but-required-checks>,
> not an ASF one.
>
> One option is to write an action that makes sure no checks are failing
> (except maybe codecov?) and put a single required check on that. That would
> also make it easy to build in logic to override required checks like Robert
> suggested ("specific wording would have to be in the last comment"). We
> already have logic in the PR bot that does some of this
> <https://github.com/apache/beam/blob/master/scripts/ci/pr-bot/shared/checks.ts>
> .
>
> The downside to that approach is that it's not clear what the best way to
> trigger that workflow is since it has to run after all other checks have
> completed. We could have it trigger on some label (e.g. "ready to merge")
> and then automatically merge the PR when it's done or comment and remove
> the label if checks are failing/incomplete. This changes the workflow for
> committers from "click the merge button" to "add a label", but doesn't
> require significantly more action or oversight and is pretty similar to how
> Kubernetes <https://github.com/kubernetes/kubernetes/pull/110812> and
> some other large repos run things.
>
> Another option would be to trigger that check at the end of every Actions
> run and use the check_suite trigger for external runs (unfortunately
> actions doesn't trigger that). That would require a lot of boilerplate on
> every Actions workflow though. There are other similar options which may be
> cleaner that also require modifying every Actions workflow, but I don't
> love that option.
>
> I'm still in favor of doing this via the "ready to merge" label option I
> just described, but this has dampened my enthusiasm a little bit since we'd
> need to build out/maintain our own tooling.
>
> On Tue, Jun 28, 2022 at 8:49 AM Danny McCormick <dannymccorm...@google.com>
> wrote:
>
>> Regarding code cov - it does check overall % coverage as well as the
>> coverage of the diff, but I don't think it's designed to be a blocking
>> metric. A good example of where this is not helpful is this pr to remove
>> dead code <https://github.com/apache/beam/pull/22019>. Because it
>> removes tested code, the pr lowers our coverage percentage and fails the
>> check, but it's a totally reasonable PR that doesn't need additional
>> testing (because it only removes functionality). Other classes of changes
>> that are problematic include things like improving hard to cover error
>> messages or tough to test proto changes which can only be covered in an
>> integration test that doesn't make it into codecov (these are both common
>> at least in the Go SDK).
>>
>> I'd be opposed to including that as a required check, though I support
>> adding pretty much every other suite.
>>
>> > I assume we would have a way to override it in case the repo gets into
>> a bad state (for example needing to modify the workflow definitions in
>> order to get them to pass).
>>
>> The way we make a check required is by adding it in the .asf.yml file
>> <https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-Branchprotection>.
>> So we could always update that if needed to push an emergency fix as long
>> as we don't accidentally include any required checks on the .asf.yml
>> itself. If we mess that up we would need to bring in Infra to help.
>>
>> On Mon, Jun 27, 2022 at 7:59 PM Ahmet Altay <al...@google.com> wrote:
>>
>>>
>>>
>>> On Mon, Jun 27, 2022 at 4:55 PM Robert Bradshaw <rober...@google.com>
>>> wrote:
>>>
>>>> On Mon, Jun 27, 2022 at 3:06 PM Kenneth Knowles <k...@apache.org>
>>>> wrote:
>>>>
>>>>> I am in favor of requiring precommit green before merge. To make this
>>>>> better, we should:
>>>>>
>>>>> (1) run fewer irrelevant tests, for example almost every
>>>>> language-based presubmit tests vastly more code than could possibly be
>>>>> affected.
>>>>> (2) run more relevant tests, for example when an IO is touched running
>>>>> slightly more heavyweight integration tests before merge is reasonable.
>>>>>
>>>>
>>>> Does it have to be all or nothing? E.g. we could start with requiring a
>>>> smaller subset of tests that must pass, with the goal of eventually
>>>> requiring (almost?) everything.
>>>>
>>>>
>>>>> I worry that requiring total line coverage in unit tests / presubmits
>>>>> will encourage mock-based tests that don't demonstrate any meaningful
>>>>> correctness. But I am OK to try it.
>>>>>
>>>>
>>> IIRC, codecov test is checking that % coverage is not reduced with the
>>> new PR and it is not checking for total line coverage. I agree with you
>>> that total line coverage is probably not meaningful in most cases.
>>>
>>>
>>>>
>>>> +1. I think there are a class of presubmits (coverage among them) that
>>>> are advisory rather than binding.
>>>>
>>>
>>> +1 - especially if we can make the tools distinguish between required
>>> and advisory presubmits.
>>>
>>>
>>>>
>>>>> I assume we would have a way to override it in case the repo gets into
>>>>> a bad state (for example needing to modify the workflow definitions in
>>>>> order to get them to pass).
>>>>>
>>>>
>>>> +1. But making this explicit (e.g. specific wording would have to be in
>>>> the last comment) would be a good barrier to have even if it's not
>>>> absolute.
>>>>
>>>>
>>>>> On Mon, Jun 27, 2022 at 2:58 PM Ahmet Altay <al...@google.com> wrote:
>>>>>
>>>>>> Just an opinion: If we have a precommit (including codecov) we should
>>>>>> be willing to enforce those passing before merging a PR. Selectively
>>>>>> applying when/which precommits would be blockers eventually leads to more
>>>>>> opinions. I think codecov is actually providing value by checking that
>>>>>> delta coverage is not dropped. In my experience, for PRs that fail the
>>>>>> codecov precommit there tends to be an opportunity for adding unit tests.
>>>>>>
>>>>>> On Mon, Jun 27, 2022 at 2:14 PM Danny McCormick <
>>>>>> dannymccorm...@google.com> wrote:
>>>>>>
>>>>>>> Brian, you keep beating my emails by an instant :) I think codecov
>>>>>>> provides value independent of blocking a PR - it is a good sanity check
>>>>>>> (e.g. if codecov drops from 50% to 5%) and it gives us a nice metric to
>>>>>>> track over time.
>>>>>>>
>>>>>>> On Mon, Jun 27, 2022 at 5:12 PM Danny McCormick <
>>>>>>> dannymccorm...@google.com> wrote:
>>>>>>>
>>>>>>>> Yeah, codecov is the one exception I had in mind, there may be
>>>>>>>> others that are reasonable to exclude as well
>>>>>>>>
>>>>>>>> On Mon, Jun 27, 2022 at 4:59 PM Jack McCluskey <
>>>>>>>> jrmcclus...@google.com> wrote:
>>>>>>>>
>>>>>>>>> +1 for enforcing some precommit checks as required, but making
>>>>>>>>> them all required would be a bit of a nightmare with how Codecov 
>>>>>>>>> works. Any
>>>>>>>>> Python or Go patch gets flagged as failing codecov if its changes 
>>>>>>>>> aren't
>>>>>>>>> covered at or above the current overall coverage level at master 
>>>>>>>>> (around
>>>>>>>>> 75% right now.) This is a nice goal, but ultimately not every change 
>>>>>>>>> lends
>>>>>>>>> itself to that level of unit testing. But I would be in favor of 
>>>>>>>>> things
>>>>>>>>> like the various ValidatesRunner suites being required.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Jack McCluskey
>>>>>>>>>
>>>>>>>>> On Mon, Jun 27, 2022 at 4:51 PM Danny McCormick <
>>>>>>>>> dannymccorm...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> Brian beat me by a few seconds, but I actually agree that there
>>>>>>>>>> is a problem here - when Damon's PR was submitted there was 
>>>>>>>>>> basically no
>>>>>>>>>> way of knowing that checks would break without having detailed 
>>>>>>>>>> knowledge of
>>>>>>>>>> how the build system works (so I'm on board with your
>>>>>>>>>> Subject/Background/Assessment). With that said, I disagree with the
>>>>>>>>>> recommended path forward - our automation failed us here and I don't 
>>>>>>>>>> think
>>>>>>>>>> the correct response is to make our manual validation better. That 
>>>>>>>>>> shifts
>>>>>>>>>> the burden of verifying correctness from robots (easy to validate,
>>>>>>>>>> basically free) to humans (expensive, forgetful, and hard to 
>>>>>>>>>> validate,
>>>>>>>>>> especially for newer contributors). Instead, I think we should fix 
>>>>>>>>>> the
>>>>>>>>>> underlying technical problems, which are IMO:
>>>>>>>>>>
>>>>>>>>>> 1) Flaky tests unrelated to the PR caused it to fail
>>>>>>>>>> 2) There were no guardrails in place to avoid merging a PR that
>>>>>>>>>> wasn't fully validated.
>>>>>>>>>>
>>>>>>>>>> Things that would've made this better which I think we should
>>>>>>>>>> focus our attention on:
>>>>>>>>>>
>>>>>>>>>> 1) (easy) Split the linting task out from the test step.
>>>>>>>>>> 2) This one is more controversial, but I would vote we start
>>>>>>>>>> enforcing some (or all?) precommit checks as required. I don't 
>>>>>>>>>> really blame
>>>>>>>>>> anyone for merging this PR since it had repeatedly run into unrelated
>>>>>>>>>> flakes, but that's kinda the point - we need to feel some of that 
>>>>>>>>>> pain to
>>>>>>>>>> avoid accidentally merging bad code. Blocking PRs on failed precommit
>>>>>>>>>> suites would force us to deal with these kinds of flakes and 
>>>>>>>>>> incentivize a
>>>>>>>>>> lot of good behavior in the repo (and disincentivize bad tests).
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Danny
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 27, 2022 at 4:48 PM Brian Hulette <
>>>>>>>>>> bhule...@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Damon,
>>>>>>>>>>> I think our CI system was working as intended for your PR. We
>>>>>>>>>>> select which PreCommits to run based on the files that are edited, 
>>>>>>>>>>> and the
>>>>>>>>>>> author/reviewer should collaborate to get those green before 
>>>>>>>>>>> merging the
>>>>>>>>>>> PR. Sometimes that will involve inspecting the logs in Jenkins to 
>>>>>>>>>>> find the
>>>>>>>>>>> failing gradle tasks and run them locally to reproduce.
>>>>>>>>>>>
>>>>>>>>>>> It looks like what broke down in this case was that the PR was
>>>>>>>>>>> accidentally merged despite the failing Java PreCommit run.
>>>>>>>>>>>
>>>>>>>>>>> Brian
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 27, 2022 at 12:02 PM Damon Douglas <
>>>>>>>>>>> damondoug...@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I write this to prevent a recent experience I encountered
>>>>>>>>>>>> contributing to the Apache Beam repository.  To address a system 
>>>>>>>>>>>> issue, I
>>>>>>>>>>>> write in SBAR
>>>>>>>>>>>> <https://www.hopkinsmedicine.org/antimicrobial-stewardship/nursing-toolkit/_docs/sbar-communication-tool.pdf>
>>>>>>>>>>>> format for clarity.  If there are no objections to the 
>>>>>>>>>>>> recommendation, I'm
>>>>>>>>>>>> happy to fulfill the first recommendation i.e. create GitHub 
>>>>>>>>>>>> issues.
>>>>>>>>>>>>
>>>>>>>>>>>> *Subject*
>>>>>>>>>>>>
>>>>>>>>>>>> I submitted a PR <https://github.com/apache/beam/pull/21953>
>>>>>>>>>>>> that was subsequently reverted
>>>>>>>>>>>> <https://github.com/apache/beam/pull/22044>.
>>>>>>>>>>>>
>>>>>>>>>>>> *Background*
>>>>>>>>>>>>
>>>>>>>>>>>> Despite passing the following gradle tasks,
>>>>>>>>>>>> the PreCommit_Java_Commit failed with a test unrelated to the
>>>>>>>>>>>> code in the PR
>>>>>>>>>>>> <https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/23066/testReport/junit/org.apache.beam.sdk.io.gcp.bigquery/BigQueryServicesImplTest/testInsertWithinRowCountLimits/>.
>>>>>>>>>>>> The PR was merged and then reverted due to discovered vendor 
>>>>>>>>>>>> dependency
>>>>>>>>>>>> issues and failed style checks.
>>>>>>>>>>>>
>>>>>>>>>>>> ./gradlew rat
>>>>>>>>>>>> ./gradlew spotlessCheck
>>>>>>>>>>>> ./gradlew sdks:java:io:google-cloud-platform:test
>>>>>>>>>>>>
>>>>>>>>>>>> *Assessment*
>>>>>>>>>>>>
>>>>>>>>>>>> Output from the following gradle commands did not surface prior
>>>>>>>>>>>> to the PR merge.
>>>>>>>>>>>>
>>>>>>>>>>>> ./gradlew sdks:java:io:google-cloud-platform:checkStyleMain
>>>>>>>>>>>> ./gradlew sdks:java:io:google-cloud-platform:checkStyleTest
>>>>>>>>>>>>
>>>>>>>>>>>> Both the aforementioned failing PreCommit_Java_Commit test and
>>>>>>>>>>>> a lack of knowledge contributed to the root cause.
>>>>>>>>>>>>
>>>>>>>>>>>> *Recommendation / Proposal*
>>>>>>>>>>>>
>>>>>>>>>>>> I propose updating PR template description
>>>>>>>>>>>> <https://github.com/apache/beam/blob/master/.github/PULL_REQUEST_TEMPLATE.md>
>>>>>>>>>>>>  that
>>>>>>>>>>>> prescribes gradle tasks to be successfully executed prior to 
>>>>>>>>>>>> submitting a
>>>>>>>>>>>> PR.  Toward this aim:
>>>>>>>>>>>>
>>>>>>>>>>>> 1.  Four separate GitHub issues in
>>>>>>>>>>>> https://github.com/apache/beam representative of *common*, and
>>>>>>>>>>>> language specific: *Java*, *Python*, and *Go*, prescribed
>>>>>>>>>>>> gradle tasks
>>>>>>>>>>>> 2.  Don't reply to this email with specifics on how; delegation
>>>>>>>>>>>> of comments/discussion within the relevant context of the GitHub 
>>>>>>>>>>>> issues in 1
>>>>>>>>>>>> 3.  PRs that fix the GitHub issues in 1
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *Damon Douglas*
>>>>>>>>>>>>
>>>>>>>>>>>> Strategic Cloud Engineer, Data & Analytics, Google Cloud
>>>>>>>>>>>>
>>>>>>>>>>>> damondoug...@google.com
>>>>>>>>>>>>
>>>>>>>>>>>

Reply via email to