Hi all, I have drafted a document summarizing the ideas suggested in this thread.
https://docs.google.com/document/d/10FlXOo_hL2QYTPhwS8uHSyJbQCzwC3K3C12tFccANA8/edit# where I extracted the following action items: short-term (easy) items - Split the linting task (e.g. java checkStyle) out from the test (java percommit) step mid-to-long-term items - A single github action that indicates the PR is ready to merge - Merging the PR through action I am happy to volunteer to take on this task. Once we have decided the direction(s) to proceed I am going to detail the design. Regards, Yi On Tue, Jun 28, 2022 at 10:40 PM Ahmet Altay <al...@google.com> wrote: > I am worried about adding our own custom tooling. They require > maintenance, they introduce new flakiness, and so far we have not been > great about maintaining custom infra. > > On Tue, Jun 28, 2022 at 1:36 PM Kenneth Knowles <k...@apache.org> wrote: > >> Thanks for all the super useful info. I can add some experience from the >> early days of Beam: Because the GitHub "merge" button did not exist, we >> rolled our own "merge bot". The pros/cons below are not general to the >> concept, but to our experience with it: >> >> PROs: >> - It single-threaded commits to the repo so there weren't race >> conditions on test results. I think this will be OK for our scale for the >> foreseeable future. >> - It re-ran tests after merge but before pushing to master, which is the >> second half of eliminating that race condition. >> >> CONs: >> - It was very flaky and often just didn't stay up. We danced for joy >> when it was gone. >> - It ran a kind of arbitrary set of tests that didn't match the PR >> statuses. It did not have any filter on which tests were run during merge. >> We were small enough that mostly just "run the tests" was specific enough. >> - It always squashed the commits and then pushed the squashed commit >> with a comment "this closes #<pr>". This messes up PRs that have multiple >> commits that should remain separate. But more importantly it made it >> impossible to easily distinguish PRs that were merged versus those that >> were closed without merge. And of course it is way harder to navigate >> history when the commits on master have different hashes than the commits >> that were authored. >> - Getting reasonable logs with error messages when a merge failed for >> whatever reason was hard or impossible IIRC. >> >> So I think in what you have said there is an option to get the best of >> all, something like: >> >> - Do merges through an action. Merge commit or squash-and-merge would be >> separate labels. Or not bother with squash and merge, instead using >> heuristics to block PRs with bad commit histories. >> - Have the workflow check that all PR statuses are green before >> continuing to merge. >> > > This sounds like a reasonable process. I assume the only addition in this > case would be another github action. > > >> >> Kenn >> >> On Tue, Jun 28, 2022 at 8:03 AM Danny McCormick < >> dannymccorm...@google.com> wrote: >> >>> After looking into this a little bit more, I need to revise my opinion >>> on how we would do this; I don't think it's practical to have all required >>> status checks via the .asf.yml file because those required checks can't be >>> filtered by path (for example, if we want to require Python precommit on >>> Python PRs, it would need to be required on *all *PRs). That's a GitHub >>> limitation >>> <https://docs.github.com/en/enterprise-cloud@latest/repositories/configuring-branches-and-merges-in-your-repository/defining-the-mergeability-of-pull-requests/troubleshooting-required-status-checks#handling-skipped-but-required-checks>, >>> not an ASF one. >>> >>> One option is to write an action that makes sure no checks are failing >>> (except maybe codecov?) and put a single required check on that. That would >>> also make it easy to build in logic to override required checks like Robert >>> suggested ("specific wording would have to be in the last comment"). We >>> already have logic in the PR bot that does some of this >>> <https://github.com/apache/beam/blob/master/scripts/ci/pr-bot/shared/checks.ts> >>> . >>> >>> The downside to that approach is that it's not clear what the best way >>> to trigger that workflow is since it has to run after all other checks have >>> completed. We could have it trigger on some label (e.g. "ready to merge") >>> and then automatically merge the PR when it's done or comment and remove >>> the label if checks are failing/incomplete. This changes the workflow for >>> committers from "click the merge button" to "add a label", but doesn't >>> require significantly more action or oversight and is pretty similar to how >>> Kubernetes <https://github.com/kubernetes/kubernetes/pull/110812> and >>> some other large repos run things. >>> >>> Another option would be to trigger that check at the end of every >>> Actions run and use the check_suite trigger for external runs >>> (unfortunately actions doesn't trigger that). That would require a lot of >>> boilerplate on every Actions workflow though. There are other similar >>> options which may be cleaner that also require modifying every Actions >>> workflow, but I don't love that option. >>> >>> I'm still in favor of doing this via the "ready to merge" label option I >>> just described, but this has dampened my enthusiasm a little bit since we'd >>> need to build out/maintain our own tooling. >>> >>> On Tue, Jun 28, 2022 at 8:49 AM Danny McCormick < >>> dannymccorm...@google.com> wrote: >>> >>>> Regarding code cov - it does check overall % coverage as well as the >>>> coverage of the diff, but I don't think it's designed to be a blocking >>>> metric. A good example of where this is not helpful is this pr to >>>> remove dead code <https://github.com/apache/beam/pull/22019>. Because >>>> it removes tested code, the pr lowers our coverage percentage and fails the >>>> check, but it's a totally reasonable PR that doesn't need additional >>>> testing (because it only removes functionality). Other classes of changes >>>> that are problematic include things like improving hard to cover error >>>> messages or tough to test proto changes which can only be covered in an >>>> integration test that doesn't make it into codecov (these are both common >>>> at least in the Go SDK). >>>> >>>> I'd be opposed to including that as a required check, though I support >>>> adding pretty much every other suite. >>>> >>>> > I assume we would have a way to override it in case the repo gets >>>> into a bad state (for example needing to modify the workflow definitions in >>>> order to get them to pass). >>>> >>>> The way we make a check required is by adding it in the .asf.yml file >>>> <https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-Branchprotection>. >>>> So we could always update that if needed to push an emergency fix as long >>>> as we don't accidentally include any required checks on the .asf.yml >>>> itself. If we mess that up we would need to bring in Infra to help. >>>> >>>> On Mon, Jun 27, 2022 at 7:59 PM Ahmet Altay <al...@google.com> wrote: >>>> >>>>> >>>>> >>>>> On Mon, Jun 27, 2022 at 4:55 PM Robert Bradshaw <rober...@google.com> >>>>> wrote: >>>>> >>>>>> On Mon, Jun 27, 2022 at 3:06 PM Kenneth Knowles <k...@apache.org> >>>>>> wrote: >>>>>> >>>>>>> I am in favor of requiring precommit green before merge. To make >>>>>>> this better, we should: >>>>>>> >>>>>>> (1) run fewer irrelevant tests, for example almost every >>>>>>> language-based presubmit tests vastly more code than could possibly be >>>>>>> affected. >>>>>>> (2) run more relevant tests, for example when an IO is touched >>>>>>> running slightly more heavyweight integration tests before merge is >>>>>>> reasonable. >>>>>>> >>>>>> >>>>>> Does it have to be all or nothing? E.g. we could start with requiring >>>>>> a smaller subset of tests that must pass, with the goal of eventually >>>>>> requiring (almost?) everything. >>>>>> >>>>>> >>>>>>> I worry that requiring total line coverage in unit tests / >>>>>>> presubmits will encourage mock-based tests that don't demonstrate any >>>>>>> meaningful correctness. But I am OK to try it. >>>>>>> >>>>>> >>>>> IIRC, codecov test is checking that % coverage is not reduced with the >>>>> new PR and it is not checking for total line coverage. I agree with you >>>>> that total line coverage is probably not meaningful in most cases. >>>>> >>>>> >>>>>> >>>>>> +1. I think there are a class of presubmits (coverage among them) >>>>>> that are advisory rather than binding. >>>>>> >>>>> >>>>> +1 - especially if we can make the tools distinguish between required >>>>> and advisory presubmits. >>>>> >>>>> >>>>>> >>>>>>> I assume we would have a way to override it in case the repo gets >>>>>>> into a bad state (for example needing to modify the workflow >>>>>>> definitions in >>>>>>> order to get them to pass). >>>>>>> >>>>>> >>>>>> +1. But making this explicit (e.g. specific wording would have to be >>>>>> in the last comment) would be a good barrier to have even if it's not >>>>>> absolute. >>>>>> >>>>>> >>>>>>> On Mon, Jun 27, 2022 at 2:58 PM Ahmet Altay <al...@google.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Just an opinion: If we have a precommit (including codecov) we >>>>>>>> should be willing to enforce those passing before merging a PR. >>>>>>>> Selectively >>>>>>>> applying when/which precommits would be blockers eventually leads to >>>>>>>> more >>>>>>>> opinions. I think codecov is actually providing value by checking that >>>>>>>> delta coverage is not dropped. In my experience, for PRs that fail the >>>>>>>> codecov precommit there tends to be an opportunity for adding unit >>>>>>>> tests. >>>>>>>> >>>>>>>> On Mon, Jun 27, 2022 at 2:14 PM Danny McCormick < >>>>>>>> dannymccorm...@google.com> wrote: >>>>>>>> >>>>>>>>> Brian, you keep beating my emails by an instant :) I think codecov >>>>>>>>> provides value independent of blocking a PR - it is a good sanity >>>>>>>>> check >>>>>>>>> (e.g. if codecov drops from 50% to 5%) and it gives us a nice metric >>>>>>>>> to >>>>>>>>> track over time. >>>>>>>>> >>>>>>>>> On Mon, Jun 27, 2022 at 5:12 PM Danny McCormick < >>>>>>>>> dannymccorm...@google.com> wrote: >>>>>>>>> >>>>>>>>>> Yeah, codecov is the one exception I had in mind, there may be >>>>>>>>>> others that are reasonable to exclude as well >>>>>>>>>> >>>>>>>>>> On Mon, Jun 27, 2022 at 4:59 PM Jack McCluskey < >>>>>>>>>> jrmcclus...@google.com> wrote: >>>>>>>>>> >>>>>>>>>>> +1 for enforcing some precommit checks as required, but making >>>>>>>>>>> them all required would be a bit of a nightmare with how Codecov >>>>>>>>>>> works. Any >>>>>>>>>>> Python or Go patch gets flagged as failing codecov if its changes >>>>>>>>>>> aren't >>>>>>>>>>> covered at or above the current overall coverage level at master >>>>>>>>>>> (around >>>>>>>>>>> 75% right now.) This is a nice goal, but ultimately not every >>>>>>>>>>> change lends >>>>>>>>>>> itself to that level of unit testing. But I would be in favor of >>>>>>>>>>> things >>>>>>>>>>> like the various ValidatesRunner suites being required. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Jack McCluskey >>>>>>>>>>> >>>>>>>>>>> On Mon, Jun 27, 2022 at 4:51 PM Danny McCormick < >>>>>>>>>>> dannymccorm...@google.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Brian beat me by a few seconds, but I actually agree that there >>>>>>>>>>>> is a problem here - when Damon's PR was submitted there was >>>>>>>>>>>> basically no >>>>>>>>>>>> way of knowing that checks would break without having detailed >>>>>>>>>>>> knowledge of >>>>>>>>>>>> how the build system works (so I'm on board with your >>>>>>>>>>>> Subject/Background/Assessment). With that said, I disagree with the >>>>>>>>>>>> recommended path forward - our automation failed us here and I >>>>>>>>>>>> don't think >>>>>>>>>>>> the correct response is to make our manual validation better. That >>>>>>>>>>>> shifts >>>>>>>>>>>> the burden of verifying correctness from robots (easy to validate, >>>>>>>>>>>> basically free) to humans (expensive, forgetful, and hard to >>>>>>>>>>>> validate, >>>>>>>>>>>> especially for newer contributors). Instead, I think we should fix >>>>>>>>>>>> the >>>>>>>>>>>> underlying technical problems, which are IMO: >>>>>>>>>>>> >>>>>>>>>>>> 1) Flaky tests unrelated to the PR caused it to fail >>>>>>>>>>>> >>>>>>>>>>> > I missed this part earlier. I would encourage all committers who are > blocked on flaky tests to file a new github issue with a link to the flaked > test before re-running the flaking test. Re-running tests until they pass > also makes it easier to introduce new flaky tests. > > >> 2) There were no guardrails in place to avoid merging a PR that wasn't >>>>>>>>>>>> fully validated. >>>>>>>>>>>> >>>>>>>>>>>> Things that would've made this better which I think we should >>>>>>>>>>>> focus our attention on: >>>>>>>>>>>> >>>>>>>>>>>> 1) (easy) Split the linting task out from the test step. >>>>>>>>>>>> 2) This one is more controversial, but I would vote we start >>>>>>>>>>>> enforcing some (or all?) precommit checks as required. I don't >>>>>>>>>>>> really blame >>>>>>>>>>>> anyone for merging this PR since it had repeatedly run into >>>>>>>>>>>> unrelated >>>>>>>>>>>> flakes, but that's kinda the point - we need to feel some of that >>>>>>>>>>>> pain to >>>>>>>>>>>> avoid accidentally merging bad code. Blocking PRs on failed >>>>>>>>>>>> precommit >>>>>>>>>>>> suites would force us to deal with these kinds of flakes and >>>>>>>>>>>> incentivize a >>>>>>>>>>>> lot of good behavior in the repo (and disincentivize bad tests). >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Danny >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Jun 27, 2022 at 4:48 PM Brian Hulette < >>>>>>>>>>>> bhule...@google.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Damon, >>>>>>>>>>>>> I think our CI system was working as intended for your PR. We >>>>>>>>>>>>> select which PreCommits to run based on the files that are >>>>>>>>>>>>> edited, and the >>>>>>>>>>>>> author/reviewer should collaborate to get those green before >>>>>>>>>>>>> merging the >>>>>>>>>>>>> PR. Sometimes that will involve inspecting the logs in Jenkins to >>>>>>>>>>>>> find the >>>>>>>>>>>>> failing gradle tasks and run them locally to reproduce. >>>>>>>>>>>>> >>>>>>>>>>>>> It looks like what broke down in this case was that the PR was >>>>>>>>>>>>> accidentally merged despite the failing Java PreCommit run. >>>>>>>>>>>>> >>>>>>>>>>>>> Brian >>>>>>>>>>>>> >>>>>>>>>>>>> On Mon, Jun 27, 2022 at 12:02 PM Damon Douglas < >>>>>>>>>>>>> damondoug...@google.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> I write this to prevent a recent experience I encountered >>>>>>>>>>>>>> contributing to the Apache Beam repository. To address a system >>>>>>>>>>>>>> issue, I >>>>>>>>>>>>>> write in SBAR >>>>>>>>>>>>>> <https://www.hopkinsmedicine.org/antimicrobial-stewardship/nursing-toolkit/_docs/sbar-communication-tool.pdf> >>>>>>>>>>>>>> format for clarity. If there are no objections to the >>>>>>>>>>>>>> recommendation, I'm >>>>>>>>>>>>>> happy to fulfill the first recommendation i.e. create GitHub >>>>>>>>>>>>>> issues. >>>>>>>>>>>>>> >>>>>>>>>>>>>> *Subject* >>>>>>>>>>>>>> >>>>>>>>>>>>>> I submitted a PR <https://github.com/apache/beam/pull/21953> >>>>>>>>>>>>>> that was subsequently reverted >>>>>>>>>>>>>> <https://github.com/apache/beam/pull/22044>. >>>>>>>>>>>>>> >>>>>>>>>>>>>> *Background* >>>>>>>>>>>>>> >>>>>>>>>>>>>> Despite passing the following gradle tasks, >>>>>>>>>>>>>> the PreCommit_Java_Commit failed with a test unrelated to >>>>>>>>>>>>>> the code in the PR >>>>>>>>>>>>>> <https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/23066/testReport/junit/org.apache.beam.sdk.io.gcp.bigquery/BigQueryServicesImplTest/testInsertWithinRowCountLimits/>. >>>>>>>>>>>>>> The PR was merged and then reverted due to discovered vendor >>>>>>>>>>>>>> dependency >>>>>>>>>>>>>> issues and failed style checks. >>>>>>>>>>>>>> >>>>>>>>>>>>>> ./gradlew rat >>>>>>>>>>>>>> ./gradlew spotlessCheck >>>>>>>>>>>>>> ./gradlew sdks:java:io:google-cloud-platform:test >>>>>>>>>>>>>> >>>>>>>>>>>>>> *Assessment* >>>>>>>>>>>>>> >>>>>>>>>>>>>> Output from the following gradle commands did not surface >>>>>>>>>>>>>> prior to the PR merge. >>>>>>>>>>>>>> >>>>>>>>>>>>>> ./gradlew sdks:java:io:google-cloud-platform:checkStyleMain >>>>>>>>>>>>>> ./gradlew sdks:java:io:google-cloud-platform:checkStyleTest >>>>>>>>>>>>>> >>>>>>>>>>>>>> Both the aforementioned failing PreCommit_Java_Commit test >>>>>>>>>>>>>> and a lack of knowledge contributed to the root cause. >>>>>>>>>>>>>> >>>>>>>>>>>>>> *Recommendation / Proposal* >>>>>>>>>>>>>> >>>>>>>>>>>>>> I propose updating PR template description >>>>>>>>>>>>>> <https://github.com/apache/beam/blob/master/.github/PULL_REQUEST_TEMPLATE.md> >>>>>>>>>>>>>> that >>>>>>>>>>>>>> prescribes gradle tasks to be successfully executed prior to >>>>>>>>>>>>>> submitting a >>>>>>>>>>>>>> PR. Toward this aim: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1. Four separate GitHub issues in >>>>>>>>>>>>>> https://github.com/apache/beam representative of *common*, >>>>>>>>>>>>>> and language specific: *Java*, *Python*, and *Go*, >>>>>>>>>>>>>> prescribed gradle tasks >>>>>>>>>>>>>> 2. Don't reply to this email with specifics on how; >>>>>>>>>>>>>> delegation of comments/discussion within the relevant context of >>>>>>>>>>>>>> the GitHub >>>>>>>>>>>>>> issues in 1 >>>>>>>>>>>>>> 3. PRs that fix the GitHub issues in 1 >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> *Damon Douglas* >>>>>>>>>>>>>> >>>>>>>>>>>>>> Strategic Cloud Engineer, Data & Analytics, Google Cloud >>>>>>>>>>>>>> >>>>>>>>>>>>>> damondoug...@google.com >>>>>>>>>>>>>> >>>>>>>>>>>>>