Thanks to Xintong for bringing up this topic, I'm +1 in general.

However, I think it's still not very clear how we address the unstable
tests.
I think this is a very important part of this new guideline.

According to the discussion above, if some tests are unstable, we can
manually disable it.
But I have some questions in my mind:
1) Is the instability judged by the committer themselves or by some
metrics?
2) Should we log the disable commit in the corresponding issue and increase
the priority?
3) What if nobody looks into this issue and this becomes some potential
bugs released with the new version?
4) If no person is actively working on the issue, who should re-enable it?
Would it block PRs again?


Best,
Jark


On Thu, 24 Jun 2021 at 10:04, Xintong Song <tonysong...@gmail.com> wrote:

> Thanks all for the feedback.
>
> @Till @Yangze
>
> I'm also not convinced by the idea of having an exception for local builds.
> We need to execute the entire build (or at least the failing stage)
> locally, to make sure subsequent test cases prevented by the failure one
> are all executed. In that case, it's probably easier to rerun the build on
> azure than locally.
>
> Concerning disabling unstable test cases that regularly block PRs from
> merging, maybe we can say that such cases can only be disabled when someone
> is actively looking into it, likely the person who disabled the case. If
> this person is no longer actively working on it, he/she should enable the
> case again no matter if it is fixed or not.
>
> @Jing
>
> Thanks for the suggestions.
>
> +1 to provide guidelines on handling test failures.
>
> 1. Report the test failures in the JIRA.
> >
>
> +1 on this. Currently, the release managers are monitoring the ci and cron
> build instabilities and reporting them on JIRA. We should also encourage
> other contributors to do that for PRs.
>
> 2. Set a deadline to find out the root cause and solve the failure for the
> > new created JIRA  because we could not block other commit merges for a
> long
> > time
> >
> 3. What to do if the JIRA has not made significant progress when reached to
> > the deadline time?
>
>
> I'm not sure about these two. It feels a bit against the voluntary nature
> of open source projects.
>
> IMHO, frequent instabilities are more likely to be upgraded to the critical
> / blocker priority, receive more attention and eventually get fixed.
> Release managers are also responsible for looking for assignees for such
> issues. If a case is still not fixed soonish, even with all these efforts,
> I'm not sure how setting a deadline can help this.
>
> 4. If we disable the respective tests temporarily, we also need a mechanism
> > to ensure the issue would be continued to be investigated in the future.
> >
>
> +1. As mentioned above, we may consider disabling such tests iff someone is
> actively working on it.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG <beyond1...@gmail.com> wrote:
>
> > Hi Xintong,
> > +1 to the proposal.
> > In order to better comply with the rule, it is necessary to describe
> what's
> > best practice if encountering test failure which seems unrelated with the
> > current commits.
> > How to avoid merging PR with test failures and not blocking code merging
> > for a long time?
> > I tried to think about the possible steps, and found there are some
> > detailed problems that need to be discussed in a step further:
> > 1. Report the test failures in the JIRA.
> > 2. Set a deadline to find out the root cause and solve the failure for
> the
> > new created JIRA  because we could not block other commit merges for a
> long
> > time
> >     When is a reasonable deadline here?
> > 3. What to do if the JIRA has not made significant progress when reached
> to
> > the deadline time?
> >     There are several situations as follows, maybe different cases need
> > different approaches.
> >     1. the JIRA is non-assigned yet
> >     2. not found the root cause yet
> >     3. not found a good solution, but already found the root cause
> >     4. found a solution, but it needs more time to be done.
> > 4. If we disable the respective tests temporarily, we also need a
> mechanism
> > to ensure the issue would be continued to be investigated in the future.
> >
> > Best regards,
> > JING ZHANG
> >
> > Stephan Ewen <se...@apache.org> 于2021年6月23日周三 下午8:16写道:
> >
> > > +1 to Xintong's proposal
> > >
> > > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann <trohrm...@apache.org>
> > > wrote:
> > >
> > > > I would first try to not introduce the exception for local builds. It
> > > makes
> > > > it quite hard for others to verify the build and to make sure that
> the
> > > > right things were executed. If we see that this becomes an issue then
> > we
> > > > can revisit this idea.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo <karma...@gmail.com>
> wrote:
> > > >
> > > > > +1 for appending this to community guidelines for merging PRs.
> > > > >
> > > > > @Till Rohrmann
> > > > > I agree that with this approach unstable tests will not block other
> > > > > commit merges. However, it might be hard to prevent merging commits
> > > > > that are related to those tests and should have been passed them.
> > It's
> > > > > true that this judgment can be made by the committers, but no one
> can
> > > > > ensure the judgment is always precise and so that we have this
> > > > > discussion thread.
> > > > >
> > > > > Regarding the unstable tests, how about adding another exception:
> > > > > committers verify it in their local environment and comment in such
> > > > > cases?
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <liujiangangp...@gmail.com>
> > wrote:
> > > > > >
> > > > > > It is a good principle to run all tests successfully with any
> > change.
> > > > > This
> > > > > > means a lot for project's stability and development. I am big +1
> > for
> > > > this
> > > > > > proposal.
> > > > > >
> > > > > > Best
> > > > > > liujiangang
> > > > > >
> > > > > > Till Rohrmann <trohrm...@apache.org> 于2021年6月22日周二 下午6:36写道:
> > > > > >
> > > > > > > One way to address the problem of regularly failing tests that
> > > block
> > > > > > > merging of PRs is to disable the respective tests for the time
> > > being.
> > > > > Of
> > > > > > > course, the failing test then needs to be fixed. But at least
> > that
> > > > way
> > > > > we
> > > > > > > would not block everyone from making progress.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise <ar...@apache.org
> >
> > > > wrote:
> > > > > > >
> > > > > > > > I think this is overall a good idea. So +1 from my side.
> > > > > > > > However, I'd like to put a higher priority on infrastructure
> > > then,
> > > > in
> > > > > > > > particular docker image/artifact caches.
> > > > > > > >
> > > > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann <
> > > > trohrm...@apache.org
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for bringing this topic to our attention Xintong. I
> > > think
> > > > > your
> > > > > > > > > proposal makes a lot of sense and we should follow it. It
> > will
> > > > > give us
> > > > > > > > > confidence that our changes are working and it might be a
> > good
> > > > > > > incentive
> > > > > > > > to
> > > > > > > > > quickly fix build instabilities. Hence, +1.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Till
> > > > > > > > >
> > > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <
> > > > > tonysong...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi everyone,
> > > > > > > > > >
> > > > > > > > > > In the past a couple of weeks, I've observed several
> times
> > > that
> > > > > PRs
> > > > > > > are
> > > > > > > > > > merged without a green light from the CI tests, where
> > failure
> > > > > cases
> > > > > > > are
> > > > > > > > > > considered *unrelated*. This may not always cause
> problems,
> > > but
> > > > > would
> > > > > > > > > > increase the chance of breaking our code base. In fact,
> it
> > > has
> > > > > > > occurred
> > > > > > > > > to
> > > > > > > > > > me twice in the past few weeks that I had to revert a
> > commit
> > > > > which
> > > > > > > > breaks
> > > > > > > > > > the master branch due to this.
> > > > > > > > > >
> > > > > > > > > > I think it would be nicer to enforce a stricter rule,
> that
> > no
> > > > PRs
> > > > > > > > should
> > > > > > > > > be
> > > > > > > > > > merged without passing CI.
> > > > > > > > > >
> > > > > > > > > > The problems of merging PRs with "unrelated" test
> failures
> > > are:
> > > > > > > > > > - It's not always straightforward to tell whether a test
> > > > > failures are
> > > > > > > > > > related or not.
> > > > > > > > > > - It prevents subsequent test cases from being executed,
> > > which
> > > > > may
> > > > > > > fail
> > > > > > > > > > relating to the PR changes.
> > > > > > > > > >
> > > > > > > > > > To make things easier for the committers, the following
> > > > > exceptions
> > > > > > > > might
> > > > > > > > > be
> > > > > > > > > > considered acceptable.
> > > > > > > > > > - The PR has passed CI in the contributor's personal
> > > workspace.
> > > > > > > Please
> > > > > > > > > post
> > > > > > > > > > the link in such cases.
> > > > > > > > > > - The CI tests have been triggered multiple times, on the
> > > same
> > > > > > > commit,
> > > > > > > > > and
> > > > > > > > > > each stage has at least passed for once. Please also
> > comment
> > > in
> > > > > such
> > > > > > > > > cases.
> > > > > > > > > >
> > > > > > > > > > If we all agree on this, I'd update the community
> > guidelines
> > > > for
> > > > > > > > merging
> > > > > > > > > > PRs wrt. this proposal. [1]
> > > > > > > > > >
> > > > > > > > > > Please let me know what do you think.
> > > > > > > > > >
> > > > > > > > > > Thank you~
> > > > > > > > > >
> > > > > > > > > > Xintong Song
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > >
> > > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to