Re: Proposal: keeping post-commit tests green

Mikhail Gryzykhin Wed, 13 Jun 2018 15:46:08 -0700

Hello everybody,

Thanks everyone. I didn't receive any more feedback on the design proposal
document [1] and I believe we've reached consensus. I've added
implementation tasks in JIRA (BEAM-4559 [2]) and will start coding soon.
As a recap, the high-level plan is:

   - Split existing post-commit test jobs into automatically and manually
   triggered ones
   - Track failing test jobs with JIRA bugs
   - Create a document describing the policies for handling post-commit
   failures
   - Add a test status badge to the PR template
   - Create a dashboard for post-commit tests
   - Detect and fix flaky Java tests (if any)


[1] https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME
[2] https://issues.apache.org/jira/browse/BEAM-4559

--Mikhail


On Wed, Jun 6, 2018 at 1:12 PM Mikhail Gryzykhin <[email protected]> wrote:

> Hello everyone,
>
> Most of the comments on my last draft addressed technical details of how
> to automate the specific processes proposed. No major process changes
> were suggested.
>
> If you have not done so yet, please review the document.
>
> Highlights from last change:
> * Bumped splitting test jobs after Kenneth's comment.
> * No-commit in case of too many open JIRA tickets (metric was there,
> action was missing)
> * No-commit in case of a JIRA ticket that is too old (metric was there,
> action was missing)
> * Closed comments that are addressed in the document.
>
> This document already has two LGTMs from Scott Wegner and Thomas Weise.
> If no major comments come in, I'll treat this document as complete and
> start implementing the work items defined in it.
>
> Thank you,
> --Mikhail
>
>
> On Tue, Jun 5, 2018 at 7:38 PM Thomas Weise <[email protected]> wrote:
>
>> Thanks for taking this initiative. As the number of contributors grows,
>> so does the cost of broken builds. I'm also in favor of locking master
>> merges until related issues are fixed (short term pain for long term
>> gain). It would penalize a few for the benefit of many.
>>
>> On that note, recently we also had a fair share of pre-commit build
>> issues, with a few making their way to master. These include instances
>> unrelated to build tooling, such as compile errors or packaging. I don't
>> think we should run PR merges over the red light and suggest it is
>> necessary to step up the gatekeeper responsibility committers have.
>>
>> Thanks,
>> Thomas
>>
>>
>> On Tue, Jun 5, 2018 at 10:56 AM, Scott Wegner <[email protected]> wrote:
>>
>>> I've taken another pass over the doc, and it looks good to me. Thanks
>>> for driving this effort!
>>>
>>> On Mon, Jun 4, 2018 at 9:08 AM Mikhail Gryzykhin <[email protected]>
>>> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> I have addressed the comments on the proposal doc and updated it
>>>> accordingly. I have also added a section on the metrics that we want to
>>>> track for pre-commit tests and on the contents of the dashboard.
>>>>
>>>> Please take a second look at the document.
>>>>
>>>> Highlights:
>>>> * Sections that I feel require more discussion are marked with [More
>>>> opinions wanted]
>>>>   * I've kept the original comments open for this iteration. Please
>>>>   close them if you feel they are resolved, or elaborate on the topic.
>>>> * Added information on metrics to track
>>>> * Moved “Split test jobs into automatically and manually triggered” to
>>>> “Other ideas to consider”
>>>> * Prioritized automated JIRA ticket creation over manual
>>>> * Prioritized roll-back first policy
>>>> * Added process for enforcing proposed policies.
>>>>
>>>> --Mikhail
>>>>
>>>> Have feedback <http://go/migryz-feedback>?
>>>>
>>>>
>>>> On Tue, May 22, 2018 at 10:11 AM Scott Wegner <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks for the thoughtful proposal, Mikhail. I've left some comments in
>>>>> the doc.
>>>>>
>>>>> I encourage others to take a look: the proposal adds some strong
>>>>> policies about dealing with post-commit failures (rollback policy,
>>>>> locking master). Currently our post-commits are frequently red, and
>>>>> we're missing out on a valuable quality signal. I'm in favor of such
>>>>> policies to help get the test signals back to a healthy state.
>>>>>
>>>>> On Mon, May 21, 2018 at 2:48 PM Mikhail Gryzykhin <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Everyone,
>>>>>>
>>>>>> I've updated the design doc according to the comments.
>>>>>>
>>>>>> https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME
>>>>>>
>>>>>> In general, the proposed ideas seem to be appreciated. Still, some
>>>>>> sections require more discussion.
>>>>>>
>>>>>> Highlights of the changes:
>>>>>> * Added a roll-back first policy to the best practices. This includes
>>>>>> a process for handling roll-backs.
>>>>>> * Marked topics that I'd like to have more input on (in cyan).
>>>>>>
>>>>>> --Mikhail
>>>>>>
>>>>>> Have feedback <http://go/migryz-feedback>?
>>>>>>
>>>>>>
>>>>>> On Fri, May 18, 2018 at 10:56 AM Andrew Pilloud <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Blocking commits to master on test flaps seems critical here. The
>>>>>>> test flaps won't get the attention they deserve as long as people
>>>>>>> are just spamming their PRs with 'Run Java Precommit' until they
>>>>>>> turn green. I'm guilty of this behavior and I know it masks new
>>>>>>> flaky tests.
>>>>>>>
>>>>>>> I added a comment to your doc about detecting flaky tests. This can
>>>>>>> easily be done by rerunning the postcommits during times when
>>>>>>> Jenkins would otherwise be idle. You'll easily get a few dozen runs
>>>>>>> every weekend; you just need a process to triage all the flakes and
>>>>>>> ensure bugs are filed for them. I worked on a project that did this
>>>>>>> along with blocking master on any post-commit failure. It was
>>>>>>> painful for the first few weeks, but things got significantly
>>>>>>> better once most of the bugs were fixed.
>>>>>>>
>>>>>>> Andrew
>>>>>>>
>>>>>>> On Fri, May 18, 2018 at 10:39 AM Kenneth Knowles <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Love it. I would also pull out the key point from the doc: make the
>>>>>>>> postcommit status constantly visible to everyone.
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>>>>>>>> On Fri, May 18, 2018 at 10:17 AM Mikhail Gryzykhin <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi everyone,
>>>>>>>>>
>>>>>>>>> I'm Mikhail, and I started working on Google Dataflow several
>>>>>>>>> months ago. I'm really excited to work with the Beam open source
>>>>>>>>> community.
>>>>>>>>>
>>>>>>>>> I have a proposal to improve the contributor experience by keeping
>>>>>>>>> post-commit tests green.
>>>>>>>>>
>>>>>>>>> I'm looking to get community consensus and approval on the
>>>>>>>>> process for keeping post-commit tests green and addressing
>>>>>>>>> post-commit test failures.
>>>>>>>>>
>>>>>>>>> The full list of ideas brought up for discussion is in this
>>>>>>>>> document:
>>>>>>>>>
>>>>>>>>> https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME
>>>>>>>>>
>>>>>>>>> Key points are:
>>>>>>>>> 1. Add explicit tracking of failures via JIRA
>>>>>>>>> 2. No-Commit policy when post-commit tests are red
>>>>>>>>>
>>>>>>>>> --Mikhail
>>>>>>>>>
>>>>>>>>>
