Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-07-07 Thread Xintong Song
Thanks, Etienne. Comments addressed.

Thank you~

Xintong Song



On Wed, Jul 7, 2021 at 6:41 PM Etienne Chauchot 
wrote:

> Hi,
>
> Thanks Xintong for the very good doc ! I added 2 comments to it.
>
> Best,
>
> Etienne
>
> On 02/07/2021 05:57, Xintong Song wrote:
> > Thanks all for the positive feedback.
> >
> > I have updated the wiki page [1], and will send an announcement in a
> > separate thread, to draw more committers' attention.
> >
> > Moreover, I've opened FLINK-23212 where we can continue with the
> discussion
> > around pure documentation changing PRs.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> > [1]
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> >
> > On Wed, Jun 30, 2021 at 5:26 PM Xintong Song 
> wrote:
> >
> >> I second Tison's opinion.
> >>
> >> Similar to how we skip docs_404_check for PRs that do not touch the
> >> documentation, we can skip other stages for PRs that only contain
> >> documentation changes.
> >>
> >> In addition to making merging documentation PRs easier, we can also
> reduce
> >> the workload on CI workers. Especially during the last days of a release
> >> cycle, which is usually the most busy time for the CI workers, and is
> also
> >> where most documentation efforts take place.
> >>
> >> Thank you~
> >>
> >> Xintong Song
> >>
> >>
> >>
> >> On Wed, Jun 30, 2021 at 3:56 PM Till Rohrmann 
> >> wrote:
> >>
> >>> I think you are right Tison that docs are a special case and they only
> >>> require flink-docs to pass. What I am wondering is how much of a
> problem
> >>> this will be (assuming that we have a decent build stability). The more
> >>> exceptions we add, the harder it will be to properly follow the
> >>> guidelines.
> >>> Maybe we can observe how many docs PRs get delayed/not merged because
> of
> >>> this and then revisit this discussion if needed.
> >>>
> >>> Cheers,
> >>> Till
> >>>
> >>> On Wed, Jun 30, 2021 at 8:30 AM tison  wrote:
> >>>
>  Hi,
> 
>  There are a number of PRs modifying docs only, but we still require
> all
>  tests passed on that.
> 
>  It is a good proposal we avoid merge PR with "unrelated" failure, but
> >>> can
>  we improve the case where the contributor only works for docs?
> 
>  For example, base on the file change set, run doc tests only.
> 
>  Best,
>  tison.
> 
> 
>  godfrey he  于2021年6月30日周三 下午2:17写道:
> 
> > +1 for the proposal. Thanks Xintong!
> >
> > Best,
> > Godfrey
> >
> >
> >
> > Rui Li  于2021年6月30日周三 上午11:36写道:
> >
> >> Thanks Xintong. +1 to the proposal.
> >>
> >> On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 
>  wrote:
> >>> +1 for the proposal. Since the test time is long and environment
> >>> may
> >> vary,
> >>> unstable tests are really annoying for developers. The solution is
> >> welcome.
> >>> Best
> >>> liujiangang
> >>>
> >>> Jingsong Li  于2021年6月29日周二 上午10:31写道:
> >>>
>  +1 Thanks Xintong for the update!
> 
>  Best,
>  Jingsong
> 
>  On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann <
>  trohrm...@apache.org>
>  wrote:
> 
> > +1, thanks for updating the guidelines Xintong!
> >
> > Cheers,
> > Till
> >
> > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo <
> >>> karma...@gmail.com>
> >>> wrote:
> >> +1
> >>
> >> Thanks Xintong for drafting this doc.
> >>
> >> Best,
> >> Yangze Guo
> >>
> >> On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG <
>  beyond1...@gmail.com
>  wrote:
> >>> Thanks Xintong for giving detailed documentation.
> >>>
> >>> The best practice for handling test failure is very
> >>> detailed,
> >> it's
> >>> a
> > good
> >>> guidelines document with clear action steps.
> >>>
> >>> +1 to Xintong's proposal.
> >>>
> >>> Xintong Song  于2021年6月28日周一
> >>> 下午4:07写道:
>  Thanks all for the discussion.
> 
>  Based on the opinions so far, I've drafted the new
>  guidelines
> >>> [1],
> > as a
>  potential replacement of the original wiki page [2].
> 
>  Hopefully this draft has covered the most opinions
>  discussed
> >> and
> >> consensus
>  made in this discussion thread.
> 
>  Looking forward to your feedback.
> 
>  Thank you~
> 
>  Xintong Song
> 
> 
>  [1]
> 
> 
> >>>
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
>  [2]
> 
> 
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> 
> 
>  On F

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-07-07 Thread Etienne Chauchot

Hi,

Thanks Xintong for the very good doc ! I added 2 comments to it.

Best,

Etienne

On 02/07/2021 05:57, Xintong Song wrote:

Thanks all for the positive feedback.

I have updated the wiki page [1], and will send an announcement in a
separate thread, to draw more committers' attention.

Moreover, I've opened FLINK-23212 where we can continue with the discussion
around pure documentation changing PRs.

Thank you~

Xintong Song


[1] https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests

On Wed, Jun 30, 2021 at 5:26 PM Xintong Song  wrote:


I second Tison's opinion.

Similar to how we skip docs_404_check for PRs that do not touch the
documentation, we can skip other stages for PRs that only contain
documentation changes.

In addition to making merging documentation PRs easier, we can also reduce
the workload on CI workers. Especially during the last days of a release
cycle, which is usually the most busy time for the CI workers, and is also
where most documentation efforts take place.

Thank you~

Xintong Song



On Wed, Jun 30, 2021 at 3:56 PM Till Rohrmann 
wrote:


I think you are right Tison that docs are a special case and they only
require flink-docs to pass. What I am wondering is how much of a problem
this will be (assuming that we have a decent build stability). The more
exceptions we add, the harder it will be to properly follow the
guidelines.
Maybe we can observe how many docs PRs get delayed/not merged because of
this and then revisit this discussion if needed.

Cheers,
Till

On Wed, Jun 30, 2021 at 8:30 AM tison  wrote:


Hi,

There are a number of PRs modifying docs only, but we still require all
tests passed on that.

It is a good proposal we avoid merge PR with "unrelated" failure, but

can

we improve the case where the contributor only works for docs?

For example, base on the file change set, run doc tests only.

Best,
tison.


godfrey he  于2021年6月30日周三 下午2:17写道:


+1 for the proposal. Thanks Xintong!

Best,
Godfrey



Rui Li  于2021年6月30日周三 上午11:36写道:


Thanks Xintong. +1 to the proposal.

On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 

wrote:

+1 for the proposal. Since the test time is long and environment

may

vary,

unstable tests are really annoying for developers. The solution is

welcome.

Best
liujiangang

Jingsong Li  于2021年6月29日周二 上午10:31写道:


+1 Thanks Xintong for the update!

Best,
Jingsong

On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann <

trohrm...@apache.org>

wrote:


+1, thanks for updating the guidelines Xintong!

Cheers,
Till

On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo <

karma...@gmail.com>

wrote:

+1

Thanks Xintong for drafting this doc.

Best,
Yangze Guo

On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG <

beyond1...@gmail.com

wrote:

Thanks Xintong for giving detailed documentation.

The best practice for handling test failure is very

detailed,

it's

a

good

guidelines document with clear action steps.

+1 to Xintong's proposal.

Xintong Song  于2021年6月28日周一

下午4:07写道:

Thanks all for the discussion.

Based on the opinions so far, I've drafted the new

guidelines

[1],

as a

potential replacement of the original wiki page [2].

Hopefully this draft has covered the most opinions

discussed

and

consensus

made in this discussion thread.

Looking forward to your feedback.

Thank you~

Xintong Song


[1]



https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing

[2]


https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests



On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <

pnowoj...@apache.org>

wrote:


Thanks for the clarification Till. +1 for what you

have

written.

Piotrek

pt., 25 cze 2021 o 16:00 Till Rohrmann <

trohrm...@apache.org

napisał(a):

One quick note for clarification. I don't have

anything

against

builds

running on your personal Azure account and this is

not

what I

understood

under "local environment". For me "local

environment"

means

that

someone

runs the test locally on his machine and then says

that

the

tests have passed locally.

I do agree that there might be a conflict of

interests

if a

PR

author

disables tests. Here I would argue that we don't

have

malignant

committers

which means that every committer will probably first

check

the

respective

ticket for how often the test failed. Then I guess

the

next

step

would

be

to discuss on the ticket whether to disable it or

not.

And

finally,

after

reaching a consensus, it will be disabled. If we see

someone

abusing

this

policy, then we can still think about how to guard

against

it.

But,

honestly, I have very rarely seen such a case. I am

also

ok

to

pull in

the

release manager to make the final call if this

resolves

concerns.

Cheers,
Till

On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <

pnowoj...@apache.org>

wrote:


+1 for the general idea, however I have concerns

about

a

couple

of

details.

I would first try to not introduce the exception

for

local

builds.

I

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-07-01 Thread Xintong Song
Thanks all for the positive feedback.

I have updated the wiki page [1], and will send an announcement in a
separate thread, to draw more committers' attention.

Moreover, I've opened FLINK-23212 where we can continue with the discussion
around pure documentation changing PRs.

Thank you~

Xintong Song


[1] https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests

On Wed, Jun 30, 2021 at 5:26 PM Xintong Song  wrote:

> I second Tison's opinion.
>
> Similar to how we skip docs_404_check for PRs that do not touch the
> documentation, we can skip other stages for PRs that only contain
> documentation changes.
>
> In addition to making merging documentation PRs easier, we can also reduce
> the workload on CI workers. Especially during the last days of a release
> cycle, which is usually the most busy time for the CI workers, and is also
> where most documentation efforts take place.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Jun 30, 2021 at 3:56 PM Till Rohrmann 
> wrote:
>
>> I think you are right Tison that docs are a special case and they only
>> require flink-docs to pass. What I am wondering is how much of a problem
>> this will be (assuming that we have a decent build stability). The more
>> exceptions we add, the harder it will be to properly follow the
>> guidelines.
>> Maybe we can observe how many docs PRs get delayed/not merged because of
>> this and then revisit this discussion if needed.
>>
>> Cheers,
>> Till
>>
>> On Wed, Jun 30, 2021 at 8:30 AM tison  wrote:
>>
>> > Hi,
>> >
>> > There are a number of PRs modifying docs only, but we still require all
>> > tests passed on that.
>> >
>> > It is a good proposal we avoid merge PR with "unrelated" failure, but
>> can
>> > we improve the case where the contributor only works for docs?
>> >
>> > For example, base on the file change set, run doc tests only.
>> >
>> > Best,
>> > tison.
>> >
>> >
>> > godfrey he  于2021年6月30日周三 下午2:17写道:
>> >
>> > > +1 for the proposal. Thanks Xintong!
>> > >
>> > > Best,
>> > > Godfrey
>> > >
>> > >
>> > >
>> > > Rui Li  于2021年6月30日周三 上午11:36写道:
>> > >
>> > > > Thanks Xintong. +1 to the proposal.
>> > > >
>> > > > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 
>> > wrote:
>> > > >
>> > > > > +1 for the proposal. Since the test time is long and environment
>> may
>> > > > vary,
>> > > > > unstable tests are really annoying for developers. The solution is
>> > > > welcome.
>> > > > >
>> > > > > Best
>> > > > > liujiangang
>> > > > >
>> > > > > Jingsong Li  于2021年6月29日周二 上午10:31写道:
>> > > > >
>> > > > > > +1 Thanks Xintong for the update!
>> > > > > >
>> > > > > > Best,
>> > > > > > Jingsong
>> > > > > >
>> > > > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann <
>> > trohrm...@apache.org>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > +1, thanks for updating the guidelines Xintong!
>> > > > > > >
>> > > > > > > Cheers,
>> > > > > > > Till
>> > > > > > >
>> > > > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo <
>> karma...@gmail.com>
>> > > > > wrote:
>> > > > > > >
>> > > > > > > > +1
>> > > > > > > >
>> > > > > > > > Thanks Xintong for drafting this doc.
>> > > > > > > >
>> > > > > > > > Best,
>> > > > > > > > Yangze Guo
>> > > > > > > >
>> > > > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG <
>> > beyond1...@gmail.com
>> > > >
>> > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > Thanks Xintong for giving detailed documentation.
>> > > > > > > > >
>> > > > > > > > > The best practice for handling test failure is very
>> detailed,
>> > > > it's
>> > > > > a
>> > > > > > > good
>> > > > > > > > > guidelines document with clear action steps.
>> > > > > > > > >
>> > > > > > > > > +1 to Xintong's proposal.
>> > > > > > > > >
>> > > > > > > > > Xintong Song  于2021年6月28日周一
>> 下午4:07写道:
>> > > > > > > > >
>> > > > > > > > > > Thanks all for the discussion.
>> > > > > > > > > >
>> > > > > > > > > > Based on the opinions so far, I've drafted the new
>> > guidelines
>> > > > > [1],
>> > > > > > > as a
>> > > > > > > > > > potential replacement of the original wiki page [2].
>> > > > > > > > > >
>> > > > > > > > > > Hopefully this draft has covered the most opinions
>> > discussed
>> > > > and
>> > > > > > > > consensus
>> > > > > > > > > > made in this discussion thread.
>> > > > > > > > > >
>> > > > > > > > > > Looking forward to your feedback.
>> > > > > > > > > >
>> > > > > > > > > > Thank you~
>> > > > > > > > > >
>> > > > > > > > > > Xintong Song
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > [1]
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
>> > > > > > > > > >
>> > > > > > > > > > [2]
>> > > > > > > > > >
>> > > > > > > >
>> > > > > >
>> > > >
>> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > On Fri, Jun 2

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-30 Thread Xintong Song
I second Tison's opinion.

Similar to how we skip docs_404_check for PRs that do not touch the
documentation, we can skip other stages for PRs that only contain
documentation changes.

In addition to making merging documentation PRs easier, we can also reduce
the workload on CI workers. Especially during the last days of a release
cycle, which is usually the most busy time for the CI workers, and is also
where most documentation efforts take place.

Thank you~

Xintong Song



On Wed, Jun 30, 2021 at 3:56 PM Till Rohrmann  wrote:

> I think you are right Tison that docs are a special case and they only
> require flink-docs to pass. What I am wondering is how much of a problem
> this will be (assuming that we have a decent build stability). The more
> exceptions we add, the harder it will be to properly follow the guidelines.
> Maybe we can observe how many docs PRs get delayed/not merged because of
> this and then revisit this discussion if needed.
>
> Cheers,
> Till
>
> On Wed, Jun 30, 2021 at 8:30 AM tison  wrote:
>
> > Hi,
> >
> > There are a number of PRs modifying docs only, but we still require all
> > tests passed on that.
> >
> > It is a good proposal we avoid merge PR with "unrelated" failure, but can
> > we improve the case where the contributor only works for docs?
> >
> > For example, base on the file change set, run doc tests only.
> >
> > Best,
> > tison.
> >
> >
> > godfrey he  于2021年6月30日周三 下午2:17写道:
> >
> > > +1 for the proposal. Thanks Xintong!
> > >
> > > Best,
> > > Godfrey
> > >
> > >
> > >
> > > Rui Li  于2021年6月30日周三 上午11:36写道:
> > >
> > > > Thanks Xintong. +1 to the proposal.
> > > >
> > > > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 
> > wrote:
> > > >
> > > > > +1 for the proposal. Since the test time is long and environment
> may
> > > > vary,
> > > > > unstable tests are really annoying for developers. The solution is
> > > > welcome.
> > > > >
> > > > > Best
> > > > > liujiangang
> > > > >
> > > > > Jingsong Li  于2021年6月29日周二 上午10:31写道:
> > > > >
> > > > > > +1 Thanks Xintong for the update!
> > > > > >
> > > > > > Best,
> > > > > > Jingsong
> > > > > >
> > > > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann <
> > trohrm...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > +1, thanks for updating the guidelines Xintong!
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo <
> karma...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > +1
> > > > > > > >
> > > > > > > > Thanks Xintong for drafting this doc.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yangze Guo
> > > > > > > >
> > > > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG <
> > beyond1...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Thanks Xintong for giving detailed documentation.
> > > > > > > > >
> > > > > > > > > The best practice for handling test failure is very
> detailed,
> > > > it's
> > > > > a
> > > > > > > good
> > > > > > > > > guidelines document with clear action steps.
> > > > > > > > >
> > > > > > > > > +1 to Xintong's proposal.
> > > > > > > > >
> > > > > > > > > Xintong Song  于2021年6月28日周一
> 下午4:07写道:
> > > > > > > > >
> > > > > > > > > > Thanks all for the discussion.
> > > > > > > > > >
> > > > > > > > > > Based on the opinions so far, I've drafted the new
> > guidelines
> > > > > [1],
> > > > > > > as a
> > > > > > > > > > potential replacement of the original wiki page [2].
> > > > > > > > > >
> > > > > > > > > > Hopefully this draft has covered the most opinions
> > discussed
> > > > and
> > > > > > > > consensus
> > > > > > > > > > made in this discussion thread.
> > > > > > > > > >
> > > > > > > > > > Looking forward to your feedback.
> > > > > > > > > >
> > > > > > > > > > Thank you~
> > > > > > > > > >
> > > > > > > > > > Xintong Song
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > > > > > > > > >
> > > > > > > > > > [2]
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
> > > > > > > pnowoj...@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks for the clarification Till. +1 for what you have
> > > > > written.
> > > > > > > > > > >
> > > > > > > > > > > Piotrek
> > > > > > > > > > >
> > > > > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann <
> > > trohrm...@apache.org
> > > > >
> > > > > > > > > > napisał(a):
> > > > > > > > > > >
> > > > > > > > > > > > One quick note for clarification. I don't have
> anything
> > > > > against
> > > > > > > > builds
> > > > > > > > > > > > running on your persona

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-30 Thread Till Rohrmann
I think you are right Tison that docs are a special case and they only
require flink-docs to pass. What I am wondering is how much of a problem
this will be (assuming that we have a decent build stability). The more
exceptions we add, the harder it will be to properly follow the guidelines.
Maybe we can observe how many docs PRs get delayed/not merged because of
this and then revisit this discussion if needed.

Cheers,
Till

On Wed, Jun 30, 2021 at 8:30 AM tison  wrote:

> Hi,
>
> There are a number of PRs modifying docs only, but we still require all
> tests passed on that.
>
> It is a good proposal we avoid merge PR with "unrelated" failure, but can
> we improve the case where the contributor only works for docs?
>
> For example, base on the file change set, run doc tests only.
>
> Best,
> tison.
>
>
> godfrey he  于2021年6月30日周三 下午2:17写道:
>
> > +1 for the proposal. Thanks Xintong!
> >
> > Best,
> > Godfrey
> >
> >
> >
> > Rui Li  于2021年6月30日周三 上午11:36写道:
> >
> > > Thanks Xintong. +1 to the proposal.
> > >
> > > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 
> wrote:
> > >
> > > > +1 for the proposal. Since the test time is long and environment may
> > > vary,
> > > > unstable tests are really annoying for developers. The solution is
> > > welcome.
> > > >
> > > > Best
> > > > liujiangang
> > > >
> > > > Jingsong Li  于2021年6月29日周二 上午10:31写道:
> > > >
> > > > > +1 Thanks Xintong for the update!
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann <
> trohrm...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > +1, thanks for updating the guidelines Xintong!
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo 
> > > > wrote:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > Thanks Xintong for drafting this doc.
> > > > > > >
> > > > > > > Best,
> > > > > > > Yangze Guo
> > > > > > >
> > > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG <
> beyond1...@gmail.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > Thanks Xintong for giving detailed documentation.
> > > > > > > >
> > > > > > > > The best practice for handling test failure is very detailed,
> > > it's
> > > > a
> > > > > > good
> > > > > > > > guidelines document with clear action steps.
> > > > > > > >
> > > > > > > > +1 to Xintong's proposal.
> > > > > > > >
> > > > > > > > Xintong Song  于2021年6月28日周一 下午4:07写道:
> > > > > > > >
> > > > > > > > > Thanks all for the discussion.
> > > > > > > > >
> > > > > > > > > Based on the opinions so far, I've drafted the new
> guidelines
> > > > [1],
> > > > > > as a
> > > > > > > > > potential replacement of the original wiki page [2].
> > > > > > > > >
> > > > > > > > > Hopefully this draft has covered the most opinions
> discussed
> > > and
> > > > > > > consensus
> > > > > > > > > made in this discussion thread.
> > > > > > > > >
> > > > > > > > > Looking forward to your feedback.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > > > > > > > >
> > > > > > > > > [2]
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
> > > > > > pnowoj...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for the clarification Till. +1 for what you have
> > > > written.
> > > > > > > > > >
> > > > > > > > > > Piotrek
> > > > > > > > > >
> > > > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann <
> > trohrm...@apache.org
> > > >
> > > > > > > > > napisał(a):
> > > > > > > > > >
> > > > > > > > > > > One quick note for clarification. I don't have anything
> > > > against
> > > > > > > builds
> > > > > > > > > > > running on your personal Azure account and this is not
> > > what I
> > > > > > > > > understood
> > > > > > > > > > > under "local environment". For me "local environment"
> > means
> > > > > that
> > > > > > > > > someone
> > > > > > > > > > > runs the test locally on his machine and then says that
> > the
> > > > > > > > > > > tests have passed locally.
> > > > > > > > > > >
> > > > > > > > > > > I do agree that there might be a conflict of interests
> > if a
> > > > PR
> > > > > > > author
> > > > > > > > > > > disables tests. Here I would argue that we don't have
> > > > malignant
> > > > > > > > > > committers
> > > > > > > > > > > which means that every committer will probably first
> > check
> > > > the
> > > > > > > > > respective
> > > > > > > > > > > ticket for how often the test failed. Then I guess the
> > next
> > > > > step
> > > > > > > would
> > > > > > > > > be
> 

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-29 Thread tison
Hi,

There are a number of PRs modifying docs only, but we still require all
tests passed on that.

It is a good proposal we avoid merge PR with "unrelated" failure, but can
we improve the case where the contributor only works for docs?

For example, base on the file change set, run doc tests only.

Best,
tison.


godfrey he  于2021年6月30日周三 下午2:17写道:

> +1 for the proposal. Thanks Xintong!
>
> Best,
> Godfrey
>
>
>
> Rui Li  于2021年6月30日周三 上午11:36写道:
>
> > Thanks Xintong. +1 to the proposal.
> >
> > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚  wrote:
> >
> > > +1 for the proposal. Since the test time is long and environment may
> > vary,
> > > unstable tests are really annoying for developers. The solution is
> > welcome.
> > >
> > > Best
> > > liujiangang
> > >
> > > Jingsong Li  于2021年6月29日周二 上午10:31写道:
> > >
> > > > +1 Thanks Xintong for the update!
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann 
> > > > wrote:
> > > >
> > > > > +1, thanks for updating the guidelines Xintong!
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo 
> > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > Thanks Xintong for drafting this doc.
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG  >
> > > > wrote:
> > > > > > >
> > > > > > > Thanks Xintong for giving detailed documentation.
> > > > > > >
> > > > > > > The best practice for handling test failure is very detailed,
> > it's
> > > a
> > > > > good
> > > > > > > guidelines document with clear action steps.
> > > > > > >
> > > > > > > +1 to Xintong's proposal.
> > > > > > >
> > > > > > > Xintong Song  于2021年6月28日周一 下午4:07写道:
> > > > > > >
> > > > > > > > Thanks all for the discussion.
> > > > > > > >
> > > > > > > > Based on the opinions so far, I've drafted the new guidelines
> > > [1],
> > > > > as a
> > > > > > > > potential replacement of the original wiki page [2].
> > > > > > > >
> > > > > > > > Hopefully this draft has covered the most opinions discussed
> > and
> > > > > > consensus
> > > > > > > > made in this discussion thread.
> > > > > > > >
> > > > > > > > Looking forward to your feedback.
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > > > > > > >
> > > > > > > > [2]
> > > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
> > > > > pnowoj...@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for the clarification Till. +1 for what you have
> > > written.
> > > > > > > > >
> > > > > > > > > Piotrek
> > > > > > > > >
> > > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann <
> trohrm...@apache.org
> > >
> > > > > > > > napisał(a):
> > > > > > > > >
> > > > > > > > > > One quick note for clarification. I don't have anything
> > > against
> > > > > > builds
> > > > > > > > > > running on your personal Azure account and this is not
> > what I
> > > > > > > > understood
> > > > > > > > > > under "local environment". For me "local environment"
> means
> > > > that
> > > > > > > > someone
> > > > > > > > > > runs the test locally on his machine and then says that
> the
> > > > > > > > > > tests have passed locally.
> > > > > > > > > >
> > > > > > > > > > I do agree that there might be a conflict of interests
> if a
> > > PR
> > > > > > author
> > > > > > > > > > disables tests. Here I would argue that we don't have
> > > malignant
> > > > > > > > > committers
> > > > > > > > > > which means that every committer will probably first
> check
> > > the
> > > > > > > > respective
> > > > > > > > > > ticket for how often the test failed. Then I guess the
> next
> > > > step
> > > > > > would
> > > > > > > > be
> > > > > > > > > > to discuss on the ticket whether to disable it or not.
> And
> > > > > finally,
> > > > > > > > after
> > > > > > > > > > reaching a consensus, it will be disabled. If we see
> > someone
> > > > > > abusing
> > > > > > > > this
> > > > > > > > > > policy, then we can still think about how to guard
> against
> > > it.
> > > > > But,
> > > > > > > > > > honestly, I have very rarely seen such a case. I am also
> ok
> > > to
> > > > > > pull in
> > > > > > > > > the
> > > > > > > > > > release manager to make the final call if this resolves
> > > > concerns.
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Till
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <
> > > > > > pnowoj...@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > +1 f

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-29 Thread godfrey he
+1 for the proposal. Thanks Xintong!

Best,
Godfrey



Rui Li  于2021年6月30日周三 上午11:36写道:

> Thanks Xintong. +1 to the proposal.
>
> On Tue, Jun 29, 2021 at 11:05 AM 刘建刚  wrote:
>
> > +1 for the proposal. Since the test time is long and environment may
> vary,
> > unstable tests are really annoying for developers. The solution is
> welcome.
> >
> > Best
> > liujiangang
> >
> > Jingsong Li  于2021年6月29日周二 上午10:31写道:
> >
> > > +1 Thanks Xintong for the update!
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann 
> > > wrote:
> > >
> > > > +1, thanks for updating the guidelines Xintong!
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo 
> > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Thanks Xintong for drafting this doc.
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG 
> > > wrote:
> > > > > >
> > > > > > Thanks Xintong for giving detailed documentation.
> > > > > >
> > > > > > The best practice for handling test failure is very detailed,
> it's
> > a
> > > > good
> > > > > > guidelines document with clear action steps.
> > > > > >
> > > > > > +1 to Xintong's proposal.
> > > > > >
> > > > > > Xintong Song  于2021年6月28日周一 下午4:07写道:
> > > > > >
> > > > > > > Thanks all for the discussion.
> > > > > > >
> > > > > > > Based on the opinions so far, I've drafted the new guidelines
> > [1],
> > > > as a
> > > > > > > potential replacement of the original wiki page [2].
> > > > > > >
> > > > > > > Hopefully this draft has covered the most opinions discussed
> and
> > > > > consensus
> > > > > > > made in this discussion thread.
> > > > > > >
> > > > > > > Looking forward to your feedback.
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > > > > > >
> > > > > > > [2]
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
> > > > pnowoj...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for the clarification Till. +1 for what you have
> > written.
> > > > > > > >
> > > > > > > > Piotrek
> > > > > > > >
> > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann  >
> > > > > > > napisał(a):
> > > > > > > >
> > > > > > > > > One quick note for clarification. I don't have anything
> > against
> > > > > builds
> > > > > > > > > running on your personal Azure account and this is not
> what I
> > > > > > > understood
> > > > > > > > > under "local environment". For me "local environment" means
> > > that
> > > > > > > someone
> > > > > > > > > runs the test locally on his machine and then says that the
> > > > > > > > > tests have passed locally.
> > > > > > > > >
> > > > > > > > > I do agree that there might be a conflict of interests if a
> > PR
> > > > > author
> > > > > > > > > disables tests. Here I would argue that we don't have
> > malignant
> > > > > > > > committers
> > > > > > > > > which means that every committer will probably first check
> > the
> > > > > > > respective
> > > > > > > > > ticket for how often the test failed. Then I guess the next
> > > step
> > > > > would
> > > > > > > be
> > > > > > > > > to discuss on the ticket whether to disable it or not. And
> > > > finally,
> > > > > > > after
> > > > > > > > > reaching a consensus, it will be disabled. If we see
> someone
> > > > > abusing
> > > > > > > this
> > > > > > > > > policy, then we can still think about how to guard against
> > it.
> > > > But,
> > > > > > > > > honestly, I have very rarely seen such a case. I am also ok
> > to
> > > > > pull in
> > > > > > > > the
> > > > > > > > > release manager to make the final call if this resolves
> > > concerns.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Till
> > > > > > > > >
> > > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <
> > > > > pnowoj...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > +1 for the general idea, however I have concerns about a
> > > couple
> > > > > of
> > > > > > > > > details.
> > > > > > > > > >
> > > > > > > > > > > I would first try to not introduce the exception for
> > local
> > > > > builds.
> > > > > > > > > > > It makes it quite hard for others to verify the build
> and
> > > to
> > > > > make
> > > > > > > > sure
> > > > > > > > > > that the right things were executed.
> > > > > > > > > >
> > > > > > > > > > I would counter Till's proposal to ignore local green
> > builds.
> > > > If
> > > > > > > > > committer
> > > > > > > > > > is merging and closing a PR, with official azure failure,
> > but
> > > > > there
> > > > > > > > was a
> > > > > > > > > > green build 

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-29 Thread Rui Li
Thanks Xintong. +1 to the proposal.

On Tue, Jun 29, 2021 at 11:05 AM 刘建刚  wrote:

> +1 for the proposal. Since the test time is long and environment may vary,
> unstable tests are really annoying for developers. The solution is welcome.
>
> Best
> liujiangang
>
> Jingsong Li  于2021年6月29日周二 上午10:31写道:
>
> > +1 Thanks Xintong for the update!
> >
> > Best,
> > Jingsong
> >
> > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann 
> > wrote:
> >
> > > +1, thanks for updating the guidelines Xintong!
> > >
> > > Cheers,
> > > Till
> > >
> > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo 
> wrote:
> > >
> > > > +1
> > > >
> > > > Thanks Xintong for drafting this doc.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG 
> > wrote:
> > > > >
> > > > > Thanks Xintong for giving detailed documentation.
> > > > >
> > > > > The best practice for handling test failure is very detailed, it's
> a
> > > good
> > > > > guidelines document with clear action steps.
> > > > >
> > > > > +1 to Xintong's proposal.
> > > > >
> > > > > Xintong Song  于2021年6月28日周一 下午4:07写道:
> > > > >
> > > > > > Thanks all for the discussion.
> > > > > >
> > > > > > Based on the opinions so far, I've drafted the new guidelines
> [1],
> > > as a
> > > > > > potential replacement of the original wiki page [2].
> > > > > >
> > > > > > Hopefully this draft has covered the most opinions discussed and
> > > > consensus
> > > > > > made in this discussion thread.
> > > > > >
> > > > > > Looking forward to your feedback.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > > > > >
> > > > > > [2]
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
> > > pnowoj...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for the clarification Till. +1 for what you have
> written.
> > > > > > >
> > > > > > > Piotrek
> > > > > > >
> > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann 
> > > > > > napisał(a):
> > > > > > >
> > > > > > > > One quick note for clarification. I don't have anything
> against
> > > > builds
> > > > > > > > running on your personal Azure account and this is not what I
> > > > > > understood
> > > > > > > > under "local environment". For me "local environment" means
> > that
> > > > > > someone
> > > > > > > > runs the test locally on his machine and then says that the
> > > > > > > > tests have passed locally.
> > > > > > > >
> > > > > > > > I do agree that there might be a conflict of interests if a
> PR
> > > > author
> > > > > > > > disables tests. Here I would argue that we don't have
> malignant
> > > > > > > committers
> > > > > > > > which means that every committer will probably first check
> the
> > > > > > respective
> > > > > > > > ticket for how often the test failed. Then I guess the next
> > step
> > > > would
> > > > > > be
> > > > > > > > to discuss on the ticket whether to disable it or not. And
> > > finally,
> > > > > > after
> > > > > > > > reaching a consensus, it will be disabled. If we see someone
> > > > abusing
> > > > > > this
> > > > > > > > policy, then we can still think about how to guard against
> it.
> > > But,
> > > > > > > > honestly, I have very rarely seen such a case. I am also ok
> to
> > > > pull in
> > > > > > > the
> > > > > > > > release manager to make the final call if this resolves
> > concerns.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Till
> > > > > > > >
> > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <
> > > > pnowoj...@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 for the general idea, however I have concerns about a
> > couple
> > > > of
> > > > > > > > details.
> > > > > > > > >
> > > > > > > > > > I would first try to not introduce the exception for
> local
> > > > builds.
> > > > > > > > > > It makes it quite hard for others to verify the build and
> > to
> > > > make
> > > > > > > sure
> > > > > > > > > that the right things were executed.
> > > > > > > > >
> > > > > > > > > I would counter Till's proposal to ignore local green
> builds.
> > > If
> > > > > > > > committer
> > > > > > > > > is merging and closing a PR, with official azure failure,
> but
> > > > there
> > > > > > > was a
> > > > > > > > > green build before or in local azure it's IMO enough to
> leave
> > > the
> > > > > > > > message:
> > > > > > > > >
> > > > > > > > > > Latest build failure is a known issue: FLINK-12345
> > > > > > > > > > Green local build: URL
> > > > > > > > >
> > > > > > > > > This should address Till's concern about verification.
> > > > > > > > >
> > > > > > > > > On the other hand I have concerns about disabling tests.*
> It
> > > > > > shouldn't
> > 

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-28 Thread 刘建刚
+1 for the proposal. Since the test time is long and environment may vary,
unstable tests are really annoying for developers. The solution is welcome.

Best
liujiangang

Jingsong Li  于2021年6月29日周二 上午10:31写道:

> +1 Thanks Xintong for the update!
>
> Best,
> Jingsong
>
> On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann 
> wrote:
>
> > +1, thanks for updating the guidelines Xintong!
> >
> > Cheers,
> > Till
> >
> > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo  wrote:
> >
> > > +1
> > >
> > > Thanks Xintong for drafting this doc.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG 
> wrote:
> > > >
> > > > Thanks Xintong for giving detailed documentation.
> > > >
> > > > The best practice for handling test failure is very detailed, it's a
> > good
> > > > guidelines document with clear action steps.
> > > >
> > > > +1 to Xintong's proposal.
> > > >
> > > > Xintong Song  于2021年6月28日周一 下午4:07写道:
> > > >
> > > > > Thanks all for the discussion.
> > > > >
> > > > > Based on the opinions so far, I've drafted the new guidelines [1],
> > as a
> > > > > potential replacement of the original wiki page [2].
> > > > >
> > > > > Hopefully this draft has covered the most opinions discussed and
> > > consensus
> > > > > made in this discussion thread.
> > > > >
> > > > > Looking forward to your feedback.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > >
> >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > > > >
> > > > > [2]
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
> > pnowoj...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Thanks for the clarification Till. +1 for what you have written.
> > > > > >
> > > > > > Piotrek
> > > > > >
> > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann 
> > > > > napisał(a):
> > > > > >
> > > > > > > One quick note for clarification. I don't have anything against
> > > builds
> > > > > > > running on your personal Azure account and this is not what I
> > > > > understood
> > > > > > > under "local environment". For me "local environment" means
> that
> > > > > someone
> > > > > > > runs the test locally on his machine and then says that the
> > > > > > > tests have passed locally.
> > > > > > >
> > > > > > > I do agree that there might be a conflict of interests if a PR
> > > author
> > > > > > > disables tests. Here I would argue that we don't have malignant
> > > > > > committers
> > > > > > > which means that every committer will probably first check the
> > > > > respective
> > > > > > > ticket for how often the test failed. Then I guess the next
> step
> > > would
> > > > > be
> > > > > > > to discuss on the ticket whether to disable it or not. And
> > finally,
> > > > > after
> > > > > > > reaching a consensus, it will be disabled. If we see someone
> > > abusing
> > > > > this
> > > > > > > policy, then we can still think about how to guard against it.
> > But,
> > > > > > > honestly, I have very rarely seen such a case. I am also ok to
> > > pull in
> > > > > > the
> > > > > > > release manager to make the final call if this resolves
> concerns.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <
> > > pnowoj...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > +1 for the general idea, however I have concerns about a
> couple
> > > of
> > > > > > > details.
> > > > > > > >
> > > > > > > > > I would first try to not introduce the exception for local
> > > builds.
> > > > > > > > > It makes it quite hard for others to verify the build and
> to
> > > make
> > > > > > sure
> > > > > > > > that the right things were executed.
> > > > > > > >
> > > > > > > > I would counter Till's proposal to ignore local green builds.
> > If
> > > > > > > committer
> > > > > > > > is merging and closing a PR, with official azure failure, but
> > > there
> > > > > > was a
> > > > > > > > green build before or in local azure it's IMO enough to leave
> > the
> > > > > > > message:
> > > > > > > >
> > > > > > > > > Latest build failure is a known issue: FLINK-12345
> > > > > > > > > Green local build: URL
> > > > > > > >
> > > > > > > > This should address Till's concern about verification.
> > > > > > > >
> > > > > > > > On the other hand I have concerns about disabling tests.* It
> > > > > shouldn't
> > > > > > be
> > > > > > > > the PR author/committer that's disabling a test on his own,
> as
> > > > > that's a
> > > > > > > > conflict of interests*. I have however no problems with
> > disabling
> > > > > test
> > > > > > > > instabilities that were marked as "blockers" though, that
> > should
> > > work
> > > > > > > > pretty well. But the important thing here is to correctly
> judge
> > > > > bumping
> > > > > > > > 

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-28 Thread Jingsong Li
+1 Thanks Xintong for the update!

Best,
Jingsong

On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann  wrote:

> +1, thanks for updating the guidelines Xintong!
>
> Cheers,
> Till
>
> On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo  wrote:
>
> > +1
> >
> > Thanks Xintong for drafting this doc.
> >
> > Best,
> > Yangze Guo
> >
> > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG  wrote:
> > >
> > > Thanks Xintong for giving detailed documentation.
> > >
> > > The best practice for handling test failure is very detailed, it's a
> good
> > > guidelines document with clear action steps.
> > >
> > > +1 to Xintong's proposal.
> > >
> > > Xintong Song  于2021年6月28日周一 下午4:07写道:
> > >
> > > > Thanks all for the discussion.
> > > >
> > > > Based on the opinions so far, I've drafted the new guidelines [1],
> as a
> > > > potential replacement of the original wiki page [2].
> > > >
> > > > Hopefully this draft has covered the most opinions discussed and
> > consensus
> > > > made in this discussion thread.
> > > >
> > > > Looking forward to your feedback.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > > [1]
> > > >
> > > >
> >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > > >
> > > > [2]
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > >
> > > >
> > > >
> > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
> pnowoj...@apache.org>
> > > > wrote:
> > > >
> > > > > Thanks for the clarification Till. +1 for what you have written.
> > > > >
> > > > > Piotrek
> > > > >
> > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann 
> > > > napisał(a):
> > > > >
> > > > > > One quick note for clarification. I don't have anything against
> > builds
> > > > > > running on your personal Azure account and this is not what I
> > > > understood
> > > > > > under "local environment". For me "local environment" means that
> > > > someone
> > > > > > runs the test locally on his machine and then says that the
> > > > > > tests have passed locally.
> > > > > >
> > > > > > I do agree that there might be a conflict of interests if a PR
> > author
> > > > > > disables tests. Here I would argue that we don't have malignant
> > > > > committers
> > > > > > which means that every committer will probably first check the
> > > > respective
> > > > > > ticket for how often the test failed. Then I guess the next step
> > would
> > > > be
> > > > > > to discuss on the ticket whether to disable it or not. And
> finally,
> > > > after
> > > > > > reaching a consensus, it will be disabled. If we see someone
> > abusing
> > > > this
> > > > > > policy, then we can still think about how to guard against it.
> But,
> > > > > > honestly, I have very rarely seen such a case. I am also ok to
> > pull in
> > > > > the
> > > > > > release manager to make the final call if this resolves concerns.
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <
> > pnowoj...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > +1 for the general idea, however I have concerns about a couple
> > of
> > > > > > details.
> > > > > > >
> > > > > > > > I would first try to not introduce the exception for local
> > builds.
> > > > > > > > It makes it quite hard for others to verify the build and to
> > make
> > > > > sure
> > > > > > > that the right things were executed.
> > > > > > >
> > > > > > > I would counter Till's proposal to ignore local green builds.
> If
> > > > > > committer
> > > > > > > is merging and closing a PR, with official azure failure, but
> > there
> > > > > was a
> > > > > > > green build before or in local azure it's IMO enough to leave
> the
> > > > > > message:
> > > > > > >
> > > > > > > > Latest build failure is a known issue: FLINK-12345
> > > > > > > > Green local build: URL
> > > > > > >
> > > > > > > This should address Till's concern about verification.
> > > > > > >
> > > > > > > On the other hand I have concerns about disabling tests.* It
> > > > shouldn't
> > > > > be
> > > > > > > the PR author/committer that's disabling a test on his own, as
> > > > that's a
> > > > > > > conflict of interests*. I have however no problems with
> disabling
> > > > test
> > > > > > > instabilities that were marked as "blockers" though, that
> should
> > work
> > > > > > > pretty well. But the important thing here is to correctly judge
> > > > bumping
> > > > > > > priorities of test instabilities based on their frequency and
> > current
> > > > > > > general health of the system. I believe that release managers
> > should
> > > > be
> > > > > > > playing a big role here in deciding on the guidelines of what
> > should
> > > > > be a
> > > > > > > priority of certain test instabilities.
> > > > > > >
> > > > > > > What I mean by that is two example scenarios:
> > > > > > > 1. if we have a handful of very frequently failing tests and a
> > > > handful
> > > > > of
> > > > > > > very rar

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-28 Thread Till Rohrmann
+1, thanks for updating the guidelines Xintong!

Cheers,
Till

On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo  wrote:

> +1
>
> Thanks Xintong for drafting this doc.
>
> Best,
> Yangze Guo
>
> On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG  wrote:
> >
> > Thanks Xintong for giving detailed documentation.
> >
> > The best practice for handling test failure is very detailed, it's a good
> > guidelines document with clear action steps.
> >
> > +1 to Xintong's proposal.
> >
> > Xintong Song  于2021年6月28日周一 下午4:07写道:
> >
> > > Thanks all for the discussion.
> > >
> > > Based on the opinions so far, I've drafted the new guidelines [1], as a
> > > potential replacement of the original wiki page [2].
> > >
> > > Hopefully this draft has covered the most opinions discussed and
> consensus
> > > made in this discussion thread.
> > >
> > > Looking forward to your feedback.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > > [1]
> > >
> > >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > >
> > > [2]
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > >
> > >
> > >
> > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski 
> > > wrote:
> > >
> > > > Thanks for the clarification Till. +1 for what you have written.
> > > >
> > > > Piotrek
> > > >
> > > > pt., 25 cze 2021 o 16:00 Till Rohrmann 
> > > napisał(a):
> > > >
> > > > > One quick note for clarification. I don't have anything against
> builds
> > > > > running on your personal Azure account and this is not what I
> > > understood
> > > > > under "local environment". For me "local environment" means that
> > > someone
> > > > > runs the test locally on his machine and then says that the
> > > > > tests have passed locally.
> > > > >
> > > > > I do agree that there might be a conflict of interests if a PR
> author
> > > > > disables tests. Here I would argue that we don't have malignant
> > > > committers
> > > > > which means that every committer will probably first check the
> > > respective
> > > > > ticket for how often the test failed. Then I guess the next step
> would
> > > be
> > > > > to discuss on the ticket whether to disable it or not. And finally,
> > > after
> > > > > reaching a consensus, it will be disabled. If we see someone
> abusing
> > > this
> > > > > policy, then we can still think about how to guard against it. But,
> > > > > honestly, I have very rarely seen such a case. I am also ok to
> pull in
> > > > the
> > > > > release manager to make the final call if this resolves concerns.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <
> pnowoj...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > +1 for the general idea, however I have concerns about a couple
> of
> > > > > details.
> > > > > >
> > > > > > > I would first try to not introduce the exception for local
> builds.
> > > > > > > It makes it quite hard for others to verify the build and to
> make
> > > > sure
> > > > > > that the right things were executed.
> > > > > >
> > > > > > I would counter Till's proposal to ignore local green builds. If
> > > > > committer
> > > > > > is merging and closing a PR, with official azure failure, but
> there
> > > > was a
> > > > > > green build before or in local azure it's IMO enough to leave the
> > > > > message:
> > > > > >
> > > > > > > Latest build failure is a known issue: FLINK-12345
> > > > > > > Green local build: URL
> > > > > >
> > > > > > This should address Till's concern about verification.
> > > > > >
> > > > > > On the other hand I have concerns about disabling tests.* It
> > > shouldn't
> > > > be
> > > > > > the PR author/committer that's disabling a test on his own, as
> > > that's a
> > > > > > conflict of interests*. I have however no problems with disabling
> > > test
> > > > > > instabilities that were marked as "blockers" though, that should
> work
> > > > > > pretty well. But the important thing here is to correctly judge
> > > bumping
> > > > > > priorities of test instabilities based on their frequency and
> current
> > > > > > general health of the system. I believe that release managers
> should
> > > be
> > > > > > playing a big role here in deciding on the guidelines of what
> should
> > > > be a
> > > > > > priority of certain test instabilities.
> > > > > >
> > > > > > What I mean by that is two example scenarios:
> > > > > > 1. if we have a handful of very frequently failing tests and a
> > > handful
> > > > of
> > > > > > very rarely failing tests (like one reported failure and no
> another
> > > > > > occurrence in many months, and let's even say that the failure
> looks
> > > > like
> > > > > > infrastructure/network timeout), we should focus on the
> frequently
> > > > > failing
> > > > > > ones, and probably we are safe to ignore for the time being the
> rare
> > > > > issues
> > > > > > - at least until we deal with the most pressing ones.
> > > > > > 2. If w

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-28 Thread Yangze Guo
+1

Thanks Xintong for drafting this doc.

Best,
Yangze Guo

On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG  wrote:
>
> Thanks Xintong for giving detailed documentation.
>
> The best practice for handling test failure is very detailed, it's a good
> guidelines document with clear action steps.
>
> +1 to Xintong's proposal.
>
> Xintong Song  于2021年6月28日周一 下午4:07写道:
>
> > Thanks all for the discussion.
> >
> > Based on the opinions so far, I've drafted the new guidelines [1], as a
> > potential replacement of the original wiki page [2].
> >
> > Hopefully this draft has covered the most opinions discussed and consensus
> > made in this discussion thread.
> >
> > Looking forward to your feedback.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> > [1]
> >
> > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> >
> > [2]
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> >
> >
> >
> > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski 
> > wrote:
> >
> > > Thanks for the clarification Till. +1 for what you have written.
> > >
> > > Piotrek
> > >
> > > pt., 25 cze 2021 o 16:00 Till Rohrmann 
> > napisał(a):
> > >
> > > > One quick note for clarification. I don't have anything against builds
> > > > running on your personal Azure account and this is not what I
> > understood
> > > > under "local environment". For me "local environment" means that
> > someone
> > > > runs the test locally on his machine and then says that the
> > > > tests have passed locally.
> > > >
> > > > I do agree that there might be a conflict of interests if a PR author
> > > > disables tests. Here I would argue that we don't have malignant
> > > committers
> > > > which means that every committer will probably first check the
> > respective
> > > > ticket for how often the test failed. Then I guess the next step would
> > be
> > > > to discuss on the ticket whether to disable it or not. And finally,
> > after
> > > > reaching a consensus, it will be disabled. If we see someone abusing
> > this
> > > > policy, then we can still think about how to guard against it. But,
> > > > honestly, I have very rarely seen such a case. I am also ok to pull in
> > > the
> > > > release manager to make the final call if this resolves concerns.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski 
> > > > wrote:
> > > >
> > > > > +1 for the general idea, however I have concerns about a couple of
> > > > details.
> > > > >
> > > > > > I would first try to not introduce the exception for local builds.
> > > > > > It makes it quite hard for others to verify the build and to make
> > > sure
> > > > > that the right things were executed.
> > > > >
> > > > > I would counter Till's proposal to ignore local green builds. If
> > > > committer
> > > > > is merging and closing a PR, with official azure failure, but there
> > > was a
> > > > > green build before or in local azure it's IMO enough to leave the
> > > > message:
> > > > >
> > > > > > Latest build failure is a known issue: FLINK-12345
> > > > > > Green local build: URL
> > > > >
> > > > > This should address Till's concern about verification.
> > > > >
> > > > > On the other hand I have concerns about disabling tests.* It
> > shouldn't
> > > be
> > > > > the PR author/committer that's disabling a test on his own, as
> > that's a
> > > > > conflict of interests*. I have however no problems with disabling
> > test
> > > > > instabilities that were marked as "blockers" though, that should work
> > > > > pretty well. But the important thing here is to correctly judge
> > bumping
> > > > > priorities of test instabilities based on their frequency and current
> > > > > general health of the system. I believe that release managers should
> > be
> > > > > playing a big role here in deciding on the guidelines of what should
> > > be a
> > > > > priority of certain test instabilities.
> > > > >
> > > > > What I mean by that is two example scenarios:
> > > > > 1. if we have a handful of very frequently failing tests and a
> > handful
> > > of
> > > > > very rarely failing tests (like one reported failure and no another
> > > > > occurrence in many months, and let's even say that the failure looks
> > > like
> > > > > infrastructure/network timeout), we should focus on the frequently
> > > > failing
> > > > > ones, and probably we are safe to ignore for the time being the rare
> > > > issues
> > > > > - at least until we deal with the most pressing ones.
> > > > > 2. If we have tons of rarely failing test instabilities, we should
> > > > probably
> > > > > start addressing them as blockers.
> > > > >
> > > > > I'm using my own conscious and my best judgement when I'm
> > > > > bumping/decreasing priorities of test instabilities (and bugs), but
> > as
> > > > > individual committer I don't have the full picture. As I wrote
> > above, I
> > > > > think release managers are in a much better position to keep

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-28 Thread JING ZHANG
Thanks Xintong for giving detailed documentation.

The best practice for handling test failure is very detailed, it's a good
guidelines document with clear action steps.

+1 to Xintong's proposal.

Xintong Song  于2021年6月28日周一 下午4:07写道:

> Thanks all for the discussion.
>
> Based on the opinions so far, I've drafted the new guidelines [1], as a
> potential replacement of the original wiki page [2].
>
> Hopefully this draft has covered the most opinions discussed and consensus
> made in this discussion thread.
>
> Looking forward to your feedback.
>
> Thank you~
>
> Xintong Song
>
>
> [1]
>
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
>
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
>
>
>
> On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski 
> wrote:
>
> > Thanks for the clarification Till. +1 for what you have written.
> >
> > Piotrek
> >
> > pt., 25 cze 2021 o 16:00 Till Rohrmann 
> napisał(a):
> >
> > > One quick note for clarification. I don't have anything against builds
> > > running on your personal Azure account and this is not what I
> understood
> > > under "local environment". For me "local environment" means that
> someone
> > > runs the test locally on his machine and then says that the
> > > tests have passed locally.
> > >
> > > I do agree that there might be a conflict of interests if a PR author
> > > disables tests. Here I would argue that we don't have malignant
> > committers
> > > which means that every committer will probably first check the
> respective
> > > ticket for how often the test failed. Then I guess the next step would
> be
> > > to discuss on the ticket whether to disable it or not. And finally,
> after
> > > reaching a consensus, it will be disabled. If we see someone abusing
> this
> > > policy, then we can still think about how to guard against it. But,
> > > honestly, I have very rarely seen such a case. I am also ok to pull in
> > the
> > > release manager to make the final call if this resolves concerns.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski 
> > > wrote:
> > >
> > > > +1 for the general idea, however I have concerns about a couple of
> > > details.
> > > >
> > > > > I would first try to not introduce the exception for local builds.
> > > > > It makes it quite hard for others to verify the build and to make
> > sure
> > > > that the right things were executed.
> > > >
> > > > I would counter Till's proposal to ignore local green builds. If
> > > committer
> > > > is merging and closing a PR, with official azure failure, but there
> > was a
> > > > green build before or in local azure it's IMO enough to leave the
> > > message:
> > > >
> > > > > Latest build failure is a known issue: FLINK-12345
> > > > > Green local build: URL
> > > >
> > > > This should address Till's concern about verification.
> > > >
> > > > On the other hand I have concerns about disabling tests.* It
> shouldn't
> > be
> > > > the PR author/committer that's disabling a test on his own, as
> that's a
> > > > conflict of interests*. I have however no problems with disabling
> test
> > > > instabilities that were marked as "blockers" though, that should work
> > > > pretty well. But the important thing here is to correctly judge
> bumping
> > > > priorities of test instabilities based on their frequency and current
> > > > general health of the system. I believe that release managers should
> be
> > > > playing a big role here in deciding on the guidelines of what should
> > be a
> > > > priority of certain test instabilities.
> > > >
> > > > What I mean by that is two example scenarios:
> > > > 1. if we have a handful of very frequently failing tests and a
> handful
> > of
> > > > very rarely failing tests (like one reported failure and no another
> > > > occurrence in many months, and let's even say that the failure looks
> > like
> > > > infrastructure/network timeout), we should focus on the frequently
> > > failing
> > > > ones, and probably we are safe to ignore for the time being the rare
> > > issues
> > > > - at least until we deal with the most pressing ones.
> > > > 2. If we have tons of rarely failing test instabilities, we should
> > > probably
> > > > start addressing them as blockers.
> > > >
> > > > I'm using my own conscious and my best judgement when I'm
> > > > bumping/decreasing priorities of test instabilities (and bugs), but
> as
> > > > individual committer I don't have the full picture. As I wrote
> above, I
> > > > think release managers are in a much better position to keep
> adjusting
> > > > those kind of guidelines.
> > > >
> > > > Best, Piotrek
> > > >
> > > > pt., 25 cze 2021 o 08:10 Yu Li  napisał(a):
> > > >
> > > > > +1 for Xintong's proposal.
> > > > >
> > > > > For me, resolving problems directly (fixing the infrastructure
> issue,
> > > > > disabling unstable tests and creating blocker JIRAs to track the
> fix
> > > and
> > > > > 

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-28 Thread Xintong Song
Thanks all for the discussion.

Based on the opinions so far, I've drafted the new guidelines [1], as a
potential replacement of the original wiki page [2].

Hopefully this draft has covered the most opinions discussed and consensus
made in this discussion thread.

Looking forward to your feedback.

Thank you~

Xintong Song


[1]
https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing

[2] https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests



On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski 
wrote:

> Thanks for the clarification Till. +1 for what you have written.
>
> Piotrek
>
> pt., 25 cze 2021 o 16:00 Till Rohrmann  napisał(a):
>
> > One quick note for clarification. I don't have anything against builds
> > running on your personal Azure account and this is not what I understood
> > under "local environment". For me "local environment" means that someone
> > runs the test locally on his machine and then says that the
> > tests have passed locally.
> >
> > I do agree that there might be a conflict of interests if a PR author
> > disables tests. Here I would argue that we don't have malignant
> committers
> > which means that every committer will probably first check the respective
> > ticket for how often the test failed. Then I guess the next step would be
> > to discuss on the ticket whether to disable it or not. And finally, after
> > reaching a consensus, it will be disabled. If we see someone abusing this
> > policy, then we can still think about how to guard against it. But,
> > honestly, I have very rarely seen such a case. I am also ok to pull in
> the
> > release manager to make the final call if this resolves concerns.
> >
> > Cheers,
> > Till
> >
> > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski 
> > wrote:
> >
> > > +1 for the general idea, however I have concerns about a couple of
> > details.
> > >
> > > > I would first try to not introduce the exception for local builds.
> > > > It makes it quite hard for others to verify the build and to make
> sure
> > > that the right things were executed.
> > >
> > > I would counter Till's proposal to ignore local green builds. If
> > committer
> > > is merging and closing a PR, with official azure failure, but there
> was a
> > > green build before or in local azure it's IMO enough to leave the
> > message:
> > >
> > > > Latest build failure is a known issue: FLINK-12345
> > > > Green local build: URL
> > >
> > > This should address Till's concern about verification.
> > >
> > > On the other hand I have concerns about disabling tests.* It shouldn't
> be
> > > the PR author/committer that's disabling a test on his own, as that's a
> > > conflict of interests*. I have however no problems with disabling test
> > > instabilities that were marked as "blockers" though, that should work
> > > pretty well. But the important thing here is to correctly judge bumping
> > > priorities of test instabilities based on their frequency and current
> > > general health of the system. I believe that release managers should be
> > > playing a big role here in deciding on the guidelines of what should
> be a
> > > priority of certain test instabilities.
> > >
> > > What I mean by that is two example scenarios:
> > > 1. if we have a handful of very frequently failing tests and a handful
> of
> > > very rarely failing tests (like one reported failure and no another
> > > occurrence in many months, and let's even say that the failure looks
> like
> > > infrastructure/network timeout), we should focus on the frequently
> > failing
> > > ones, and probably we are safe to ignore for the time being the rare
> > issues
> > > - at least until we deal with the most pressing ones.
> > > 2. If we have tons of rarely failing test instabilities, we should
> > probably
> > > start addressing them as blockers.
> > >
> > > I'm using my own conscious and my best judgement when I'm
> > > bumping/decreasing priorities of test instabilities (and bugs), but as
> > > individual committer I don't have the full picture. As I wrote above, I
> > > think release managers are in a much better position to keep adjusting
> > > those kind of guidelines.
> > >
> > > Best, Piotrek
> > >
> > > pt., 25 cze 2021 o 08:10 Yu Li  napisał(a):
> > >
> > > > +1 for Xintong's proposal.
> > > >
> > > > For me, resolving problems directly (fixing the infrastructure issue,
> > > > disabling unstable tests and creating blocker JIRAs to track the fix
> > and
> > > > re-enable them asap, etc.) is (in most cases) better than working
> > around
> > > > them (verify locally, manually check and judge the failure as
> > > "unrelated",
> > > > etc.), and I believe the proposal could help us pushing those more
> > "real"
> > > > solutions forward.
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > >
> > > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo  wrote:
> > > >
> > > > > Creating a blocker issue for the manually disabled tests sounds
> good
> > to
> > > > me.

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-25 Thread Piotr Nowojski
Thanks for the clarification Till. +1 for what you have written.

Piotrek

pt., 25 cze 2021 o 16:00 Till Rohrmann  napisał(a):

> One quick note for clarification. I don't have anything against builds
> running on your personal Azure account and this is not what I understood
> under "local environment". For me "local environment" means that someone
> runs the test locally on his machine and then says that the
> tests have passed locally.
>
> I do agree that there might be a conflict of interests if a PR author
> disables tests. Here I would argue that we don't have malignant committers
> which means that every committer will probably first check the respective
> ticket for how often the test failed. Then I guess the next step would be
> to discuss on the ticket whether to disable it or not. And finally, after
> reaching a consensus, it will be disabled. If we see someone abusing this
> policy, then we can still think about how to guard against it. But,
> honestly, I have very rarely seen such a case. I am also ok to pull in the
> release manager to make the final call if this resolves concerns.
>
> Cheers,
> Till
>
> On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski 
> wrote:
>
> > +1 for the general idea, however I have concerns about a couple of
> details.
> >
> > > I would first try to not introduce the exception for local builds.
> > > It makes it quite hard for others to verify the build and to make sure
> > that the right things were executed.
> >
> > I would counter Till's proposal to ignore local green builds. If
> committer
> > is merging and closing a PR, with official azure failure, but there was a
> > green build before or in local azure it's IMO enough to leave the
> message:
> >
> > > Latest build failure is a known issue: FLINK-12345
> > > Green local build: URL
> >
> > This should address Till's concern about verification.
> >
> > On the other hand I have concerns about disabling tests.* It shouldn't be
> > the PR author/committer that's disabling a test on his own, as that's a
> > conflict of interests*. I have however no problems with disabling test
> > instabilities that were marked as "blockers" though, that should work
> > pretty well. But the important thing here is to correctly judge bumping
> > priorities of test instabilities based on their frequency and current
> > general health of the system. I believe that release managers should be
> > playing a big role here in deciding on the guidelines of what should be a
> > priority of certain test instabilities.
> >
> > What I mean by that is two example scenarios:
> > 1. if we have a handful of very frequently failing tests and a handful of
> > very rarely failing tests (like one reported failure and no another
> > occurrence in many months, and let's even say that the failure looks like
> > infrastructure/network timeout), we should focus on the frequently
> failing
> > ones, and probably we are safe to ignore for the time being the rare
> issues
> > - at least until we deal with the most pressing ones.
> > 2. If we have tons of rarely failing test instabilities, we should
> probably
> > start addressing them as blockers.
> >
> > I'm using my own conscious and my best judgement when I'm
> > bumping/decreasing priorities of test instabilities (and bugs), but as
> > individual committer I don't have the full picture. As I wrote above, I
> > think release managers are in a much better position to keep adjusting
> > those kind of guidelines.
> >
> > Best, Piotrek
> >
> > pt., 25 cze 2021 o 08:10 Yu Li  napisał(a):
> >
> > > +1 for Xintong's proposal.
> > >
> > > For me, resolving problems directly (fixing the infrastructure issue,
> > > disabling unstable tests and creating blocker JIRAs to track the fix
> and
> > > re-enable them asap, etc.) is (in most cases) better than working
> around
> > > them (verify locally, manually check and judge the failure as
> > "unrelated",
> > > etc.), and I believe the proposal could help us pushing those more
> "real"
> > > solutions forward.
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo  wrote:
> > >
> > > > Creating a blocker issue for the manually disabled tests sounds good
> to
> > > me.
> > > >
> > > > Minor: I'm still a bit worried about the commits merged before we fix
> > > > the unstable tests can also break those tests. Instead of letting the
> > > > assigners keep a look at all potentially related commits, they can
> > > > maintain a branch that is periodically synced with the master branch
> > > > while enabling the unstable test. So that they can catch the breaking
> > > > changes asap.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann 
> > > > wrote:
> > > > >
> > > > > I like the idea of creating a blocker issue for a disabled test.
> This
> > > > will
> > > > > force us to resolve it in a timely manner and it won't fall through
> > the
> > > > > cracks.
> > > > >
> > > > > Cheers,
> > > > > T

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-25 Thread Till Rohrmann
One quick note for clarification. I don't have anything against builds
running on your personal Azure account and this is not what I understood
under "local environment". For me "local environment" means that someone
runs the test locally on his machine and then says that the
tests have passed locally.

I do agree that there might be a conflict of interests if a PR author
disables tests. Here I would argue that we don't have malignant committers
which means that every committer will probably first check the respective
ticket for how often the test failed. Then I guess the next step would be
to discuss on the ticket whether to disable it or not. And finally, after
reaching a consensus, it will be disabled. If we see someone abusing this
policy, then we can still think about how to guard against it. But,
honestly, I have very rarely seen such a case. I am also ok to pull in the
release manager to make the final call if this resolves concerns.

Cheers,
Till

On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski  wrote:

> +1 for the general idea, however I have concerns about a couple of details.
>
> > I would first try to not introduce the exception for local builds.
> > It makes it quite hard for others to verify the build and to make sure
> that the right things were executed.
>
> I would counter Till's proposal to ignore local green builds. If committer
> is merging and closing a PR, with official azure failure, but there was a
> green build before or in local azure it's IMO enough to leave the message:
>
> > Latest build failure is a known issue: FLINK-12345
> > Green local build: URL
>
> This should address Till's concern about verification.
>
> On the other hand I have concerns about disabling tests.* It shouldn't be
> the PR author/committer that's disabling a test on his own, as that's a
> conflict of interests*. I have however no problems with disabling test
> instabilities that were marked as "blockers" though, that should work
> pretty well. But the important thing here is to correctly judge bumping
> priorities of test instabilities based on their frequency and current
> general health of the system. I believe that release managers should be
> playing a big role here in deciding on the guidelines of what should be a
> priority of certain test instabilities.
>
> What I mean by that is two example scenarios:
> 1. if we have a handful of very frequently failing tests and a handful of
> very rarely failing tests (like one reported failure and no another
> occurrence in many months, and let's even say that the failure looks like
> infrastructure/network timeout), we should focus on the frequently failing
> ones, and probably we are safe to ignore for the time being the rare issues
> - at least until we deal with the most pressing ones.
> 2. If we have tons of rarely failing test instabilities, we should probably
> start addressing them as blockers.
>
> I'm using my own conscious and my best judgement when I'm
> bumping/decreasing priorities of test instabilities (and bugs), but as
> individual committer I don't have the full picture. As I wrote above, I
> think release managers are in a much better position to keep adjusting
> those kind of guidelines.
>
> Best, Piotrek
>
> pt., 25 cze 2021 o 08:10 Yu Li  napisał(a):
>
> > +1 for Xintong's proposal.
> >
> > For me, resolving problems directly (fixing the infrastructure issue,
> > disabling unstable tests and creating blocker JIRAs to track the fix and
> > re-enable them asap, etc.) is (in most cases) better than working around
> > them (verify locally, manually check and judge the failure as
> "unrelated",
> > etc.), and I believe the proposal could help us pushing those more "real"
> > solutions forward.
> >
> > Best Regards,
> > Yu
> >
> >
> > On Fri, 25 Jun 2021 at 10:58, Yangze Guo  wrote:
> >
> > > Creating a blocker issue for the manually disabled tests sounds good to
> > me.
> > >
> > > Minor: I'm still a bit worried about the commits merged before we fix
> > > the unstable tests can also break those tests. Instead of letting the
> > > assigners keep a look at all potentially related commits, they can
> > > maintain a branch that is periodically synced with the master branch
> > > while enabling the unstable test. So that they can catch the breaking
> > > changes asap.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann 
> > > wrote:
> > > >
> > > > I like the idea of creating a blocker issue for a disabled test. This
> > > will
> > > > force us to resolve it in a timely manner and it won't fall through
> the
> > > > cracks.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li 
> > > wrote:
> > > >
> > > > > +1 to Xintong's proposal
> > > > >
> > > > > I also have some concerns about unstable cases.
> > > > >
> > > > > I think unstable cases can be divided into these types:
> > > > >
> > > > > - Force majeure: For example, network timeout, sudden environmental
> > > >

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-25 Thread Piotr Nowojski
+1 for the general idea, however I have concerns about a couple of details.

> I would first try to not introduce the exception for local builds.
> It makes it quite hard for others to verify the build and to make sure
that the right things were executed.

I would counter Till's proposal to ignore local green builds. If committer
is merging and closing a PR, with official azure failure, but there was a
green build before or in local azure it's IMO enough to leave the message:

> Latest build failure is a known issue: FLINK-12345
> Green local build: URL

This should address Till's concern about verification.

On the other hand I have concerns about disabling tests.* It shouldn't be
the PR author/committer that's disabling a test on his own, as that's a
conflict of interests*. I have however no problems with disabling test
instabilities that were marked as "blockers" though, that should work
pretty well. But the important thing here is to correctly judge bumping
priorities of test instabilities based on their frequency and current
general health of the system. I believe that release managers should be
playing a big role here in deciding on the guidelines of what should be a
priority of certain test instabilities.

What I mean by that is two example scenarios:
1. if we have a handful of very frequently failing tests and a handful of
very rarely failing tests (like one reported failure and no another
occurrence in many months, and let's even say that the failure looks like
infrastructure/network timeout), we should focus on the frequently failing
ones, and probably we are safe to ignore for the time being the rare issues
- at least until we deal with the most pressing ones.
2. If we have tons of rarely failing test instabilities, we should probably
start addressing them as blockers.

I'm using my own conscious and my best judgement when I'm
bumping/decreasing priorities of test instabilities (and bugs), but as
individual committer I don't have the full picture. As I wrote above, I
think release managers are in a much better position to keep adjusting
those kind of guidelines.

Best, Piotrek

pt., 25 cze 2021 o 08:10 Yu Li  napisał(a):

> +1 for Xintong's proposal.
>
> For me, resolving problems directly (fixing the infrastructure issue,
> disabling unstable tests and creating blocker JIRAs to track the fix and
> re-enable them asap, etc.) is (in most cases) better than working around
> them (verify locally, manually check and judge the failure as "unrelated",
> etc.), and I believe the proposal could help us pushing those more "real"
> solutions forward.
>
> Best Regards,
> Yu
>
>
> On Fri, 25 Jun 2021 at 10:58, Yangze Guo  wrote:
>
> > Creating a blocker issue for the manually disabled tests sounds good to
> me.
> >
> > Minor: I'm still a bit worried about the commits merged before we fix
> > the unstable tests can also break those tests. Instead of letting the
> > assigners keep a look at all potentially related commits, they can
> > maintain a branch that is periodically synced with the master branch
> > while enabling the unstable test. So that they can catch the breaking
> > changes asap.
> >
> > Best,
> > Yangze Guo
> >
> > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann 
> > wrote:
> > >
> > > I like the idea of creating a blocker issue for a disabled test. This
> > will
> > > force us to resolve it in a timely manner and it won't fall through the
> > > cracks.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li 
> > wrote:
> > >
> > > > +1 to Xintong's proposal
> > > >
> > > > I also have some concerns about unstable cases.
> > > >
> > > > I think unstable cases can be divided into these types:
> > > >
> > > > - Force majeure: For example, network timeout, sudden environmental
> > > > collapse, they are accidental and can always be solved by triggering
> > azure
> > > > again. Committers should wait for the next green azure.
> > > >
> > > > - Obvious mistakes: For example, some errors caused by obvious
> reasons
> > may
> > > > be repaired quickly. At this time, do we need to wait, or not wait
> and
> > just
> > > > ignore?
> > > >
> > > > - Difficult questions: These problems are very difficult to find.
> There
> > > > will be no solution for a while and a half. We don't even know the
> > reason.
> > > > At this time, we should ignore it. (Maybe it's judged by the author
> of
> > the
> > > > case. But what about the old case whose author can't be found?)
> > > >
> > > > So, the ignored cases should be the block of the next release until
> the
> > > > reason is found or the case is fixed?  We need to ensure that someone
> > will
> > > > take care of these cases, because there is no deepening of failed
> > tests, no
> > > > one may continue to pay attention to these cases.
> > > >
> > > > I think this guideline should consider these situations, and show how
> > to
> > > > solve them.
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Thu, Jun 24, 2021 at 10:57 AM

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-24 Thread Yu Li
+1 for Xintong's proposal.

For me, resolving problems directly (fixing the infrastructure issue,
disabling unstable tests and creating blocker JIRAs to track the fix and
re-enable them asap, etc.) is (in most cases) better than working around
them (verify locally, manually check and judge the failure as "unrelated",
etc.), and I believe the proposal could help us pushing those more "real"
solutions forward.

Best Regards,
Yu


On Fri, 25 Jun 2021 at 10:58, Yangze Guo  wrote:

> Creating a blocker issue for the manually disabled tests sounds good to me.
>
> Minor: I'm still a bit worried about the commits merged before we fix
> the unstable tests can also break those tests. Instead of letting the
> assigners keep a look at all potentially related commits, they can
> maintain a branch that is periodically synced with the master branch
> while enabling the unstable test. So that they can catch the breaking
> changes asap.
>
> Best,
> Yangze Guo
>
> On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann 
> wrote:
> >
> > I like the idea of creating a blocker issue for a disabled test. This
> will
> > force us to resolve it in a timely manner and it won't fall through the
> > cracks.
> >
> > Cheers,
> > Till
> >
> > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li 
> wrote:
> >
> > > +1 to Xintong's proposal
> > >
> > > I also have some concerns about unstable cases.
> > >
> > > I think unstable cases can be divided into these types:
> > >
> > > - Force majeure: For example, network timeout, sudden environmental
> > > collapse, they are accidental and can always be solved by triggering
> azure
> > > again. Committers should wait for the next green azure.
> > >
> > > - Obvious mistakes: For example, some errors caused by obvious reasons
> may
> > > be repaired quickly. At this time, do we need to wait, or not wait and
> just
> > > ignore?
> > >
> > > - Difficult questions: These problems are very difficult to find. There
> > > will be no solution for a while and a half. We don't even know the
> reason.
> > > At this time, we should ignore it. (Maybe it's judged by the author of
> the
> > > case. But what about the old case whose author can't be found?)
> > >
> > > So, the ignored cases should be the block of the next release until the
> > > reason is found or the case is fixed?  We need to ensure that someone
> will
> > > take care of these cases, because there is no deepening of failed
> tests, no
> > > one may continue to pay attention to these cases.
> > >
> > > I think this guideline should consider these situations, and show how
> to
> > > solve them.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu  wrote:
> > >
> > > > Thanks to Xintong for bringing up this topic, I'm +1 in general.
> > > >
> > > > However, I think it's still not very clear how we address the
> unstable
> > > > tests.
> > > > I think this is a very important part of this new guideline.
> > > >
> > > > According to the discussion above, if some tests are unstable, we can
> > > > manually disable it.
> > > > But I have some questions in my mind:
> > > > 1) Is the instability judged by the committer themselves or by some
> > > > metrics?
> > > > 2) Should we log the disable commit in the corresponding issue and
> > > increase
> > > > the priority?
> > > > 3) What if nobody looks into this issue and this becomes some
> potential
> > > > bugs released with the new version?
> > > > 4) If no person is actively working on the issue, who should
> re-enable
> > > it?
> > > > Would it block PRs again?
> > > >
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > >
> > > > On Thu, 24 Jun 2021 at 10:04, Xintong Song 
> > > wrote:
> > > >
> > > > > Thanks all for the feedback.
> > > > >
> > > > > @Till @Yangze
> > > > >
> > > > > I'm also not convinced by the idea of having an exception for local
> > > > builds.
> > > > > We need to execute the entire build (or at least the failing stage)
> > > > > locally, to make sure subsequent test cases prevented by the
> failure
> > > one
> > > > > are all executed. In that case, it's probably easier to rerun the
> build
> > > > on
> > > > > azure than locally.
> > > > >
> > > > > Concerning disabling unstable test cases that regularly block PRs
> from
> > > > > merging, maybe we can say that such cases can only be disabled when
> > > > someone
> > > > > is actively looking into it, likely the person who disabled the
> case.
> > > If
> > > > > this person is no longer actively working on it, he/she should
> enable
> > > the
> > > > > case again no matter if it is fixed or not.
> > > > >
> > > > > @Jing
> > > > >
> > > > > Thanks for the suggestions.
> > > > >
> > > > > +1 to provide guidelines on handling test failures.
> > > > >
> > > > > 1. Report the test failures in the JIRA.
> > > > > >
> > > > >
> > > > > +1 on this. Currently, the release managers are monitoring the ci
> and
> > > > cron
> > > > > build instabilities and reporting them on JIRA. We should also
> > > encourage
> > > > > ot

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-24 Thread Yangze Guo
Creating a blocker issue for the manually disabled tests sounds good to me.

Minor: I'm still a bit worried about the commits merged before we fix
the unstable tests can also break those tests. Instead of letting the
assigners keep a look at all potentially related commits, they can
maintain a branch that is periodically synced with the master branch
while enabling the unstable test. So that they can catch the breaking
changes asap.

Best,
Yangze Guo

On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann  wrote:
>
> I like the idea of creating a blocker issue for a disabled test. This will
> force us to resolve it in a timely manner and it won't fall through the
> cracks.
>
> Cheers,
> Till
>
> On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li  wrote:
>
> > +1 to Xintong's proposal
> >
> > I also have some concerns about unstable cases.
> >
> > I think unstable cases can be divided into these types:
> >
> > - Force majeure: For example, network timeout, sudden environmental
> > collapse, they are accidental and can always be solved by triggering azure
> > again. Committers should wait for the next green azure.
> >
> > - Obvious mistakes: For example, some errors caused by obvious reasons may
> > be repaired quickly. At this time, do we need to wait, or not wait and just
> > ignore?
> >
> > - Difficult questions: These problems are very difficult to find. There
> > will be no solution for a while and a half. We don't even know the reason.
> > At this time, we should ignore it. (Maybe it's judged by the author of the
> > case. But what about the old case whose author can't be found?)
> >
> > So, the ignored cases should be the block of the next release until the
> > reason is found or the case is fixed?  We need to ensure that someone will
> > take care of these cases, because there is no deepening of failed tests, no
> > one may continue to pay attention to these cases.
> >
> > I think this guideline should consider these situations, and show how to
> > solve them.
> >
> > Best,
> > Jingsong
> >
> > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu  wrote:
> >
> > > Thanks to Xintong for bringing up this topic, I'm +1 in general.
> > >
> > > However, I think it's still not very clear how we address the unstable
> > > tests.
> > > I think this is a very important part of this new guideline.
> > >
> > > According to the discussion above, if some tests are unstable, we can
> > > manually disable it.
> > > But I have some questions in my mind:
> > > 1) Is the instability judged by the committer themselves or by some
> > > metrics?
> > > 2) Should we log the disable commit in the corresponding issue and
> > increase
> > > the priority?
> > > 3) What if nobody looks into this issue and this becomes some potential
> > > bugs released with the new version?
> > > 4) If no person is actively working on the issue, who should re-enable
> > it?
> > > Would it block PRs again?
> > >
> > >
> > > Best,
> > > Jark
> > >
> > >
> > > On Thu, 24 Jun 2021 at 10:04, Xintong Song 
> > wrote:
> > >
> > > > Thanks all for the feedback.
> > > >
> > > > @Till @Yangze
> > > >
> > > > I'm also not convinced by the idea of having an exception for local
> > > builds.
> > > > We need to execute the entire build (or at least the failing stage)
> > > > locally, to make sure subsequent test cases prevented by the failure
> > one
> > > > are all executed. In that case, it's probably easier to rerun the build
> > > on
> > > > azure than locally.
> > > >
> > > > Concerning disabling unstable test cases that regularly block PRs from
> > > > merging, maybe we can say that such cases can only be disabled when
> > > someone
> > > > is actively looking into it, likely the person who disabled the case.
> > If
> > > > this person is no longer actively working on it, he/she should enable
> > the
> > > > case again no matter if it is fixed or not.
> > > >
> > > > @Jing
> > > >
> > > > Thanks for the suggestions.
> > > >
> > > > +1 to provide guidelines on handling test failures.
> > > >
> > > > 1. Report the test failures in the JIRA.
> > > > >
> > > >
> > > > +1 on this. Currently, the release managers are monitoring the ci and
> > > cron
> > > > build instabilities and reporting them on JIRA. We should also
> > encourage
> > > > other contributors to do that for PRs.
> > > >
> > > > 2. Set a deadline to find out the root cause and solve the failure for
> > > the
> > > > > new created JIRA  because we could not block other commit merges for
> > a
> > > > long
> > > > > time
> > > > >
> > > > 3. What to do if the JIRA has not made significant progress when
> > reached
> > > to
> > > > > the deadline time?
> > > >
> > > >
> > > > I'm not sure about these two. It feels a bit against the voluntary
> > nature
> > > > of open source projects.
> > > >
> > > > IMHO, frequent instabilities are more likely to be upgraded to the
> > > critical
> > > > / blocker priority, receive more attention and eventually get fixed.
> > > > Release managers are also responsible for looking for assig

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-24 Thread Till Rohrmann
I like the idea of creating a blocker issue for a disabled test. This will
force us to resolve it in a timely manner and it won't fall through the
cracks.

Cheers,
Till

On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li  wrote:

> +1 to Xintong's proposal
>
> I also have some concerns about unstable cases.
>
> I think unstable cases can be divided into these types:
>
> - Force majeure: For example, network timeout, sudden environmental
> collapse, they are accidental and can always be solved by triggering azure
> again. Committers should wait for the next green azure.
>
> - Obvious mistakes: For example, some errors caused by obvious reasons may
> be repaired quickly. At this time, do we need to wait, or not wait and just
> ignore?
>
> - Difficult questions: These problems are very difficult to find. There
> will be no solution for a while and a half. We don't even know the reason.
> At this time, we should ignore it. (Maybe it's judged by the author of the
> case. But what about the old case whose author can't be found?)
>
> So, the ignored cases should be the block of the next release until the
> reason is found or the case is fixed?  We need to ensure that someone will
> take care of these cases, because there is no deepening of failed tests, no
> one may continue to pay attention to these cases.
>
> I think this guideline should consider these situations, and show how to
> solve them.
>
> Best,
> Jingsong
>
> On Thu, Jun 24, 2021 at 10:57 AM Jark Wu  wrote:
>
> > Thanks to Xintong for bringing up this topic, I'm +1 in general.
> >
> > However, I think it's still not very clear how we address the unstable
> > tests.
> > I think this is a very important part of this new guideline.
> >
> > According to the discussion above, if some tests are unstable, we can
> > manually disable it.
> > But I have some questions in my mind:
> > 1) Is the instability judged by the committer themselves or by some
> > metrics?
> > 2) Should we log the disable commit in the corresponding issue and
> increase
> > the priority?
> > 3) What if nobody looks into this issue and this becomes some potential
> > bugs released with the new version?
> > 4) If no person is actively working on the issue, who should re-enable
> it?
> > Would it block PRs again?
> >
> >
> > Best,
> > Jark
> >
> >
> > On Thu, 24 Jun 2021 at 10:04, Xintong Song 
> wrote:
> >
> > > Thanks all for the feedback.
> > >
> > > @Till @Yangze
> > >
> > > I'm also not convinced by the idea of having an exception for local
> > builds.
> > > We need to execute the entire build (or at least the failing stage)
> > > locally, to make sure subsequent test cases prevented by the failure
> one
> > > are all executed. In that case, it's probably easier to rerun the build
> > on
> > > azure than locally.
> > >
> > > Concerning disabling unstable test cases that regularly block PRs from
> > > merging, maybe we can say that such cases can only be disabled when
> > someone
> > > is actively looking into it, likely the person who disabled the case.
> If
> > > this person is no longer actively working on it, he/she should enable
> the
> > > case again no matter if it is fixed or not.
> > >
> > > @Jing
> > >
> > > Thanks for the suggestions.
> > >
> > > +1 to provide guidelines on handling test failures.
> > >
> > > 1. Report the test failures in the JIRA.
> > > >
> > >
> > > +1 on this. Currently, the release managers are monitoring the ci and
> > cron
> > > build instabilities and reporting them on JIRA. We should also
> encourage
> > > other contributors to do that for PRs.
> > >
> > > 2. Set a deadline to find out the root cause and solve the failure for
> > the
> > > > new created JIRA  because we could not block other commit merges for
> a
> > > long
> > > > time
> > > >
> > > 3. What to do if the JIRA has not made significant progress when
> reached
> > to
> > > > the deadline time?
> > >
> > >
> > > I'm not sure about these two. It feels a bit against the voluntary
> nature
> > > of open source projects.
> > >
> > > IMHO, frequent instabilities are more likely to be upgraded to the
> > critical
> > > / blocker priority, receive more attention and eventually get fixed.
> > > Release managers are also responsible for looking for assignees for
> such
> > > issues. If a case is still not fixed soonish, even with all these
> > efforts,
> > > I'm not sure how setting a deadline can help this.
> > >
> > > 4. If we disable the respective tests temporarily, we also need a
> > mechanism
> > > > to ensure the issue would be continued to be investigated in the
> > future.
> > > >
> > >
> > > +1. As mentioned above, we may consider disabling such tests iff
> someone
> > is
> > > actively working on it.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG 
> wrote:
> > >
> > > > Hi Xintong,
> > > > +1 to the proposal.
> > > > In order to better comply with the rule, it is necessary to describe
> > > what's
> > > > best practice if en

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-23 Thread Jingsong Li
+1 to Xintong's proposal

I also have some concerns about unstable cases.

I think unstable cases can be divided into these types:

- Force majeure: For example, network timeout, sudden environmental
collapse, they are accidental and can always be solved by triggering azure
again. Committers should wait for the next green azure.

- Obvious mistakes: For example, some errors caused by obvious reasons may
be repaired quickly. At this time, do we need to wait, or not wait and just
ignore?

- Difficult questions: These problems are very difficult to find. There
will be no solution for a while and a half. We don't even know the reason.
At this time, we should ignore it. (Maybe it's judged by the author of the
case. But what about the old case whose author can't be found?)

So, the ignored cases should be the block of the next release until the
reason is found or the case is fixed?  We need to ensure that someone will
take care of these cases, because there is no deepening of failed tests, no
one may continue to pay attention to these cases.

I think this guideline should consider these situations, and show how to
solve them.

Best,
Jingsong

On Thu, Jun 24, 2021 at 10:57 AM Jark Wu  wrote:

> Thanks to Xintong for bringing up this topic, I'm +1 in general.
>
> However, I think it's still not very clear how we address the unstable
> tests.
> I think this is a very important part of this new guideline.
>
> According to the discussion above, if some tests are unstable, we can
> manually disable it.
> But I have some questions in my mind:
> 1) Is the instability judged by the committer themselves or by some
> metrics?
> 2) Should we log the disable commit in the corresponding issue and increase
> the priority?
> 3) What if nobody looks into this issue and this becomes some potential
> bugs released with the new version?
> 4) If no person is actively working on the issue, who should re-enable it?
> Would it block PRs again?
>
>
> Best,
> Jark
>
>
> On Thu, 24 Jun 2021 at 10:04, Xintong Song  wrote:
>
> > Thanks all for the feedback.
> >
> > @Till @Yangze
> >
> > I'm also not convinced by the idea of having an exception for local
> builds.
> > We need to execute the entire build (or at least the failing stage)
> > locally, to make sure subsequent test cases prevented by the failure one
> > are all executed. In that case, it's probably easier to rerun the build
> on
> > azure than locally.
> >
> > Concerning disabling unstable test cases that regularly block PRs from
> > merging, maybe we can say that such cases can only be disabled when
> someone
> > is actively looking into it, likely the person who disabled the case. If
> > this person is no longer actively working on it, he/she should enable the
> > case again no matter if it is fixed or not.
> >
> > @Jing
> >
> > Thanks for the suggestions.
> >
> > +1 to provide guidelines on handling test failures.
> >
> > 1. Report the test failures in the JIRA.
> > >
> >
> > +1 on this. Currently, the release managers are monitoring the ci and
> cron
> > build instabilities and reporting them on JIRA. We should also encourage
> > other contributors to do that for PRs.
> >
> > 2. Set a deadline to find out the root cause and solve the failure for
> the
> > > new created JIRA  because we could not block other commit merges for a
> > long
> > > time
> > >
> > 3. What to do if the JIRA has not made significant progress when reached
> to
> > > the deadline time?
> >
> >
> > I'm not sure about these two. It feels a bit against the voluntary nature
> > of open source projects.
> >
> > IMHO, frequent instabilities are more likely to be upgraded to the
> critical
> > / blocker priority, receive more attention and eventually get fixed.
> > Release managers are also responsible for looking for assignees for such
> > issues. If a case is still not fixed soonish, even with all these
> efforts,
> > I'm not sure how setting a deadline can help this.
> >
> > 4. If we disable the respective tests temporarily, we also need a
> mechanism
> > > to ensure the issue would be continued to be investigated in the
> future.
> > >
> >
> > +1. As mentioned above, we may consider disabling such tests iff someone
> is
> > actively working on it.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG  wrote:
> >
> > > Hi Xintong,
> > > +1 to the proposal.
> > > In order to better comply with the rule, it is necessary to describe
> > what's
> > > best practice if encountering test failure which seems unrelated with
> the
> > > current commits.
> > > How to avoid merging PR with test failures and not blocking code
> merging
> > > for a long time?
> > > I tried to think about the possible steps, and found there are some
> > > detailed problems that need to be discussed in a step further:
> > > 1. Report the test failures in the JIRA.
> > > 2. Set a deadline to find out the root cause and solve the failure for
> > the
> > > new created JIRA  because we could not block

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-23 Thread Jark Wu
Thanks to Xintong for bringing up this topic, I'm +1 in general.

However, I think it's still not very clear how we address the unstable
tests.
I think this is a very important part of this new guideline.

According to the discussion above, if some tests are unstable, we can
manually disable it.
But I have some questions in my mind:
1) Is the instability judged by the committer themselves or by some
metrics?
2) Should we log the disable commit in the corresponding issue and increase
the priority?
3) What if nobody looks into this issue and this becomes some potential
bugs released with the new version?
4) If no person is actively working on the issue, who should re-enable it?
Would it block PRs again?


Best,
Jark


On Thu, 24 Jun 2021 at 10:04, Xintong Song  wrote:

> Thanks all for the feedback.
>
> @Till @Yangze
>
> I'm also not convinced by the idea of having an exception for local builds.
> We need to execute the entire build (or at least the failing stage)
> locally, to make sure subsequent test cases prevented by the failure one
> are all executed. In that case, it's probably easier to rerun the build on
> azure than locally.
>
> Concerning disabling unstable test cases that regularly block PRs from
> merging, maybe we can say that such cases can only be disabled when someone
> is actively looking into it, likely the person who disabled the case. If
> this person is no longer actively working on it, he/she should enable the
> case again no matter if it is fixed or not.
>
> @Jing
>
> Thanks for the suggestions.
>
> +1 to provide guidelines on handling test failures.
>
> 1. Report the test failures in the JIRA.
> >
>
> +1 on this. Currently, the release managers are monitoring the ci and cron
> build instabilities and reporting them on JIRA. We should also encourage
> other contributors to do that for PRs.
>
> 2. Set a deadline to find out the root cause and solve the failure for the
> > new created JIRA  because we could not block other commit merges for a
> long
> > time
> >
> 3. What to do if the JIRA has not made significant progress when reached to
> > the deadline time?
>
>
> I'm not sure about these two. It feels a bit against the voluntary nature
> of open source projects.
>
> IMHO, frequent instabilities are more likely to be upgraded to the critical
> / blocker priority, receive more attention and eventually get fixed.
> Release managers are also responsible for looking for assignees for such
> issues. If a case is still not fixed soonish, even with all these efforts,
> I'm not sure how setting a deadline can help this.
>
> 4. If we disable the respective tests temporarily, we also need a mechanism
> > to ensure the issue would be continued to be investigated in the future.
> >
>
> +1. As mentioned above, we may consider disabling such tests iff someone is
> actively working on it.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG  wrote:
>
> > Hi Xintong,
> > +1 to the proposal.
> > In order to better comply with the rule, it is necessary to describe
> what's
> > best practice if encountering test failure which seems unrelated with the
> > current commits.
> > How to avoid merging PR with test failures and not blocking code merging
> > for a long time?
> > I tried to think about the possible steps, and found there are some
> > detailed problems that need to be discussed in a step further:
> > 1. Report the test failures in the JIRA.
> > 2. Set a deadline to find out the root cause and solve the failure for
> the
> > new created JIRA  because we could not block other commit merges for a
> long
> > time
> > When is a reasonable deadline here?
> > 3. What to do if the JIRA has not made significant progress when reached
> to
> > the deadline time?
> > There are several situations as follows, maybe different cases need
> > different approaches.
> > 1. the JIRA is non-assigned yet
> > 2. not found the root cause yet
> > 3. not found a good solution, but already found the root cause
> > 4. found a solution, but it needs more time to be done.
> > 4. If we disable the respective tests temporarily, we also need a
> mechanism
> > to ensure the issue would be continued to be investigated in the future.
> >
> > Best regards,
> > JING ZHANG
> >
> > Stephan Ewen  于2021年6月23日周三 下午8:16写道:
> >
> > > +1 to Xintong's proposal
> > >
> > > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann 
> > > wrote:
> > >
> > > > I would first try to not introduce the exception for local builds. It
> > > makes
> > > > it quite hard for others to verify the build and to make sure that
> the
> > > > right things were executed. If we see that this becomes an issue then
> > we
> > > > can revisit this idea.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo 
> wrote:
> > > >
> > > > > +1 for appending this to community guidelines for merging PRs.
> > > > >
> > > > > @Till Rohrmann
> > > > > I agree that with this approach unstable te

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-23 Thread Xintong Song
Thanks all for the feedback.

@Till @Yangze

I'm also not convinced by the idea of having an exception for local builds.
We need to execute the entire build (or at least the failing stage)
locally, to make sure subsequent test cases prevented by the failure one
are all executed. In that case, it's probably easier to rerun the build on
azure than locally.

Concerning disabling unstable test cases that regularly block PRs from
merging, maybe we can say that such cases can only be disabled when someone
is actively looking into it, likely the person who disabled the case. If
this person is no longer actively working on it, he/she should enable the
case again no matter if it is fixed or not.

@Jing

Thanks for the suggestions.

+1 to provide guidelines on handling test failures.

1. Report the test failures in the JIRA.
>

+1 on this. Currently, the release managers are monitoring the ci and cron
build instabilities and reporting them on JIRA. We should also encourage
other contributors to do that for PRs.

2. Set a deadline to find out the root cause and solve the failure for the
> new created JIRA  because we could not block other commit merges for a long
> time
>
3. What to do if the JIRA has not made significant progress when reached to
> the deadline time?


I'm not sure about these two. It feels a bit against the voluntary nature
of open source projects.

IMHO, frequent instabilities are more likely to be upgraded to the critical
/ blocker priority, receive more attention and eventually get fixed.
Release managers are also responsible for looking for assignees for such
issues. If a case is still not fixed soonish, even with all these efforts,
I'm not sure how setting a deadline can help this.

4. If we disable the respective tests temporarily, we also need a mechanism
> to ensure the issue would be continued to be investigated in the future.
>

+1. As mentioned above, we may consider disabling such tests iff someone is
actively working on it.

Thank you~

Xintong Song



On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG  wrote:

> Hi Xintong,
> +1 to the proposal.
> In order to better comply with the rule, it is necessary to describe what's
> best practice if encountering test failure which seems unrelated with the
> current commits.
> How to avoid merging PR with test failures and not blocking code merging
> for a long time?
> I tried to think about the possible steps, and found there are some
> detailed problems that need to be discussed in a step further:
> 1. Report the test failures in the JIRA.
> 2. Set a deadline to find out the root cause and solve the failure for the
> new created JIRA  because we could not block other commit merges for a long
> time
> When is a reasonable deadline here?
> 3. What to do if the JIRA has not made significant progress when reached to
> the deadline time?
> There are several situations as follows, maybe different cases need
> different approaches.
> 1. the JIRA is non-assigned yet
> 2. not found the root cause yet
> 3. not found a good solution, but already found the root cause
> 4. found a solution, but it needs more time to be done.
> 4. If we disable the respective tests temporarily, we also need a mechanism
> to ensure the issue would be continued to be investigated in the future.
>
> Best regards,
> JING ZHANG
>
> Stephan Ewen  于2021年6月23日周三 下午8:16写道:
>
> > +1 to Xintong's proposal
> >
> > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann 
> > wrote:
> >
> > > I would first try to not introduce the exception for local builds. It
> > makes
> > > it quite hard for others to verify the build and to make sure that the
> > > right things were executed. If we see that this becomes an issue then
> we
> > > can revisit this idea.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo  wrote:
> > >
> > > > +1 for appending this to community guidelines for merging PRs.
> > > >
> > > > @Till Rohrmann
> > > > I agree that with this approach unstable tests will not block other
> > > > commit merges. However, it might be hard to prevent merging commits
> > > > that are related to those tests and should have been passed them.
> It's
> > > > true that this judgment can be made by the committers, but no one can
> > > > ensure the judgment is always precise and so that we have this
> > > > discussion thread.
> > > >
> > > > Regarding the unstable tests, how about adding another exception:
> > > > committers verify it in their local environment and comment in such
> > > > cases?
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 
> wrote:
> > > > >
> > > > > It is a good principle to run all tests successfully with any
> change.
> > > > This
> > > > > means a lot for project's stability and development. I am big +1
> for
> > > this
> > > > > proposal.
> > > > >
> > > > > Best
> > > > > liujiangang
> > > > >
> > > > > Till Rohrmann  于2021年6月22日周二 下午6:36写道:
> > > > >
> > > > > > One way to address 

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-23 Thread JING ZHANG
Hi Xintong,
+1 to the proposal.
In order to better comply with the rule, it is necessary to describe what's
best practice if encountering test failure which seems unrelated with the
current commits.
How to avoid merging PR with test failures and not blocking code merging
for a long time?
I tried to think about the possible steps, and found there are some
detailed problems that need to be discussed in a step further:
1. Report the test failures in the JIRA.
2. Set a deadline to find out the root cause and solve the failure for the
new created JIRA  because we could not block other commit merges for a long
time
When is a reasonable deadline here?
3. What to do if the JIRA has not made significant progress when reached to
the deadline time?
There are several situations as follows, maybe different cases need
different approaches.
1. the JIRA is non-assigned yet
2. not found the root cause yet
3. not found a good solution, but already found the root cause
4. found a solution, but it needs more time to be done.
4. If we disable the respective tests temporarily, we also need a mechanism
to ensure the issue would be continued to be investigated in the future.

Best regards,
JING ZHANG

Stephan Ewen  于2021年6月23日周三 下午8:16写道:

> +1 to Xintong's proposal
>
> On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann 
> wrote:
>
> > I would first try to not introduce the exception for local builds. It
> makes
> > it quite hard for others to verify the build and to make sure that the
> > right things were executed. If we see that this becomes an issue then we
> > can revisit this idea.
> >
> > Cheers,
> > Till
> >
> > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo  wrote:
> >
> > > +1 for appending this to community guidelines for merging PRs.
> > >
> > > @Till Rohrmann
> > > I agree that with this approach unstable tests will not block other
> > > commit merges. However, it might be hard to prevent merging commits
> > > that are related to those tests and should have been passed them. It's
> > > true that this judgment can be made by the committers, but no one can
> > > ensure the judgment is always precise and so that we have this
> > > discussion thread.
> > >
> > > Regarding the unstable tests, how about adding another exception:
> > > committers verify it in their local environment and comment in such
> > > cases?
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚  wrote:
> > > >
> > > > It is a good principle to run all tests successfully with any change.
> > > This
> > > > means a lot for project's stability and development. I am big +1 for
> > this
> > > > proposal.
> > > >
> > > > Best
> > > > liujiangang
> > > >
> > > > Till Rohrmann  于2021年6月22日周二 下午6:36写道:
> > > >
> > > > > One way to address the problem of regularly failing tests that
> block
> > > > > merging of PRs is to disable the respective tests for the time
> being.
> > > Of
> > > > > course, the failing test then needs to be fixed. But at least that
> > way
> > > we
> > > > > would not block everyone from making progress.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise 
> > wrote:
> > > > >
> > > > > > I think this is overall a good idea. So +1 from my side.
> > > > > > However, I'd like to put a higher priority on infrastructure
> then,
> > in
> > > > > > particular docker image/artifact caches.
> > > > > >
> > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann <
> > trohrm...@apache.org
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for bringing this topic to our attention Xintong. I
> think
> > > your
> > > > > > > proposal makes a lot of sense and we should follow it. It will
> > > give us
> > > > > > > confidence that our changes are working and it might be a good
> > > > > incentive
> > > > > > to
> > > > > > > quickly fix build instabilities. Hence, +1.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <
> > > tonysong...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > In the past a couple of weeks, I've observed several times
> that
> > > PRs
> > > > > are
> > > > > > > > merged without a green light from the CI tests, where failure
> > > cases
> > > > > are
> > > > > > > > considered *unrelated*. This may not always cause problems,
> but
> > > would
> > > > > > > > increase the chance of breaking our code base. In fact, it
> has
> > > > > occurred
> > > > > > > to
> > > > > > > > me twice in the past few weeks that I had to revert a commit
> > > which
> > > > > > breaks
> > > > > > > > the master branch due to this.
> > > > > > > >
> > > > > > > > I think it would be nicer to enforce a stricter rule, that no
> > PRs
> > > > > > should
> > > > > > > be
> > > > > > > > merged without passing CI.
> > > > > > > >
> > > > > > > > The problems of merging PRs with "unrelated" test failures
> are:
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-23 Thread Stephan Ewen
+1 to Xintong's proposal

On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann  wrote:

> I would first try to not introduce the exception for local builds. It makes
> it quite hard for others to verify the build and to make sure that the
> right things were executed. If we see that this becomes an issue then we
> can revisit this idea.
>
> Cheers,
> Till
>
> On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo  wrote:
>
> > +1 for appending this to community guidelines for merging PRs.
> >
> > @Till Rohrmann
> > I agree that with this approach unstable tests will not block other
> > commit merges. However, it might be hard to prevent merging commits
> > that are related to those tests and should have been passed them. It's
> > true that this judgment can be made by the committers, but no one can
> > ensure the judgment is always precise and so that we have this
> > discussion thread.
> >
> > Regarding the unstable tests, how about adding another exception:
> > committers verify it in their local environment and comment in such
> > cases?
> >
> > Best,
> > Yangze Guo
> >
> > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚  wrote:
> > >
> > > It is a good principle to run all tests successfully with any change.
> > This
> > > means a lot for project's stability and development. I am big +1 for
> this
> > > proposal.
> > >
> > > Best
> > > liujiangang
> > >
> > > Till Rohrmann  于2021年6月22日周二 下午6:36写道:
> > >
> > > > One way to address the problem of regularly failing tests that block
> > > > merging of PRs is to disable the respective tests for the time being.
> > Of
> > > > course, the failing test then needs to be fixed. But at least that
> way
> > we
> > > > would not block everyone from making progress.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise 
> wrote:
> > > >
> > > > > I think this is overall a good idea. So +1 from my side.
> > > > > However, I'd like to put a higher priority on infrastructure then,
> in
> > > > > particular docker image/artifact caches.
> > > > >
> > > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann <
> trohrm...@apache.org
> > >
> > > > > wrote:
> > > > >
> > > > > > Thanks for bringing this topic to our attention Xintong. I think
> > your
> > > > > > proposal makes a lot of sense and we should follow it. It will
> > give us
> > > > > > confidence that our changes are working and it might be a good
> > > > incentive
> > > > > to
> > > > > > quickly fix build instabilities. Hence, +1.
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <
> > tonysong...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > In the past a couple of weeks, I've observed several times that
> > PRs
> > > > are
> > > > > > > merged without a green light from the CI tests, where failure
> > cases
> > > > are
> > > > > > > considered *unrelated*. This may not always cause problems, but
> > would
> > > > > > > increase the chance of breaking our code base. In fact, it has
> > > > occurred
> > > > > > to
> > > > > > > me twice in the past few weeks that I had to revert a commit
> > which
> > > > > breaks
> > > > > > > the master branch due to this.
> > > > > > >
> > > > > > > I think it would be nicer to enforce a stricter rule, that no
> PRs
> > > > > should
> > > > > > be
> > > > > > > merged without passing CI.
> > > > > > >
> > > > > > > The problems of merging PRs with "unrelated" test failures are:
> > > > > > > - It's not always straightforward to tell whether a test
> > failures are
> > > > > > > related or not.
> > > > > > > - It prevents subsequent test cases from being executed, which
> > may
> > > > fail
> > > > > > > relating to the PR changes.
> > > > > > >
> > > > > > > To make things easier for the committers, the following
> > exceptions
> > > > > might
> > > > > > be
> > > > > > > considered acceptable.
> > > > > > > - The PR has passed CI in the contributor's personal workspace.
> > > > Please
> > > > > > post
> > > > > > > the link in such cases.
> > > > > > > - The CI tests have been triggered multiple times, on the same
> > > > commit,
> > > > > > and
> > > > > > > each stage has at least passed for once. Please also comment in
> > such
> > > > > > cases.
> > > > > > >
> > > > > > > If we all agree on this, I'd update the community guidelines
> for
> > > > > merging
> > > > > > > PRs wrt. this proposal. [1]
> > > > > > >
> > > > > > > Please let me know what do you think.
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>


Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-23 Thread Till Rohrmann
I would first try to not introduce the exception for local builds. It makes
it quite hard for others to verify the build and to make sure that the
right things were executed. If we see that this becomes an issue then we
can revisit this idea.

Cheers,
Till

On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo  wrote:

> +1 for appending this to community guidelines for merging PRs.
>
> @Till Rohrmann
> I agree that with this approach unstable tests will not block other
> commit merges. However, it might be hard to prevent merging commits
> that are related to those tests and should have been passed them. It's
> true that this judgment can be made by the committers, but no one can
> ensure the judgment is always precise and so that we have this
> discussion thread.
>
> Regarding the unstable tests, how about adding another exception:
> committers verify it in their local environment and comment in such
> cases?
>
> Best,
> Yangze Guo
>
> On Tue, Jun 22, 2021 at 8:23 PM 刘建刚  wrote:
> >
> > It is a good principle to run all tests successfully with any change.
> This
> > means a lot for project's stability and development. I am big +1 for this
> > proposal.
> >
> > Best
> > liujiangang
> >
> > Till Rohrmann  于2021年6月22日周二 下午6:36写道:
> >
> > > One way to address the problem of regularly failing tests that block
> > > merging of PRs is to disable the respective tests for the time being.
> Of
> > > course, the failing test then needs to be fixed. But at least that way
> we
> > > would not block everyone from making progress.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise  wrote:
> > >
> > > > I think this is overall a good idea. So +1 from my side.
> > > > However, I'd like to put a higher priority on infrastructure then, in
> > > > particular docker image/artifact caches.
> > > >
> > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann  >
> > > > wrote:
> > > >
> > > > > Thanks for bringing this topic to our attention Xintong. I think
> your
> > > > > proposal makes a lot of sense and we should follow it. It will
> give us
> > > > > confidence that our changes are working and it might be a good
> > > incentive
> > > > to
> > > > > quickly fix build instabilities. Hence, +1.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <
> tonysong...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > In the past a couple of weeks, I've observed several times that
> PRs
> > > are
> > > > > > merged without a green light from the CI tests, where failure
> cases
> > > are
> > > > > > considered *unrelated*. This may not always cause problems, but
> would
> > > > > > increase the chance of breaking our code base. In fact, it has
> > > occurred
> > > > > to
> > > > > > me twice in the past few weeks that I had to revert a commit
> which
> > > > breaks
> > > > > > the master branch due to this.
> > > > > >
> > > > > > I think it would be nicer to enforce a stricter rule, that no PRs
> > > > should
> > > > > be
> > > > > > merged without passing CI.
> > > > > >
> > > > > > The problems of merging PRs with "unrelated" test failures are:
> > > > > > - It's not always straightforward to tell whether a test
> failures are
> > > > > > related or not.
> > > > > > - It prevents subsequent test cases from being executed, which
> may
> > > fail
> > > > > > relating to the PR changes.
> > > > > >
> > > > > > To make things easier for the committers, the following
> exceptions
> > > > might
> > > > > be
> > > > > > considered acceptable.
> > > > > > - The PR has passed CI in the contributor's personal workspace.
> > > Please
> > > > > post
> > > > > > the link in such cases.
> > > > > > - The CI tests have been triggered multiple times, on the same
> > > commit,
> > > > > and
> > > > > > each stage has at least passed for once. Please also comment in
> such
> > > > > cases.
> > > > > >
> > > > > > If we all agree on this, I'd update the community guidelines for
> > > > merging
> > > > > > PRs wrt. this proposal. [1]
> > > > > >
> > > > > > Please let me know what do you think.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > > [1]
> > > > > >
> > > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > >
> > > > >
> > > >
> > >
>


Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-22 Thread Yangze Guo
+1 for appending this to community guidelines for merging PRs.

@Till Rohrmann
I agree that with this approach unstable tests will not block other
commit merges. However, it might be hard to prevent merging commits
that are related to those tests and should have been passed them. It's
true that this judgment can be made by the committers, but no one can
ensure the judgment is always precise and so that we have this
discussion thread.

Regarding the unstable tests, how about adding another exception:
committers verify it in their local environment and comment in such
cases?

Best,
Yangze Guo

On Tue, Jun 22, 2021 at 8:23 PM 刘建刚  wrote:
>
> It is a good principle to run all tests successfully with any change.  This
> means a lot for project's stability and development. I am big +1 for this
> proposal.
>
> Best
> liujiangang
>
> Till Rohrmann  于2021年6月22日周二 下午6:36写道:
>
> > One way to address the problem of regularly failing tests that block
> > merging of PRs is to disable the respective tests for the time being. Of
> > course, the failing test then needs to be fixed. But at least that way we
> > would not block everyone from making progress.
> >
> > Cheers,
> > Till
> >
> > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise  wrote:
> >
> > > I think this is overall a good idea. So +1 from my side.
> > > However, I'd like to put a higher priority on infrastructure then, in
> > > particular docker image/artifact caches.
> > >
> > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann 
> > > wrote:
> > >
> > > > Thanks for bringing this topic to our attention Xintong. I think your
> > > > proposal makes a lot of sense and we should follow it. It will give us
> > > > confidence that our changes are working and it might be a good
> > incentive
> > > to
> > > > quickly fix build instabilities. Hence, +1.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song 
> > > > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > In the past a couple of weeks, I've observed several times that PRs
> > are
> > > > > merged without a green light from the CI tests, where failure cases
> > are
> > > > > considered *unrelated*. This may not always cause problems, but would
> > > > > increase the chance of breaking our code base. In fact, it has
> > occurred
> > > > to
> > > > > me twice in the past few weeks that I had to revert a commit which
> > > breaks
> > > > > the master branch due to this.
> > > > >
> > > > > I think it would be nicer to enforce a stricter rule, that no PRs
> > > should
> > > > be
> > > > > merged without passing CI.
> > > > >
> > > > > The problems of merging PRs with "unrelated" test failures are:
> > > > > - It's not always straightforward to tell whether a test failures are
> > > > > related or not.
> > > > > - It prevents subsequent test cases from being executed, which may
> > fail
> > > > > relating to the PR changes.
> > > > >
> > > > > To make things easier for the committers, the following exceptions
> > > might
> > > > be
> > > > > considered acceptable.
> > > > > - The PR has passed CI in the contributor's personal workspace.
> > Please
> > > > post
> > > > > the link in such cases.
> > > > > - The CI tests have been triggered multiple times, on the same
> > commit,
> > > > and
> > > > > each stage has at least passed for once. Please also comment in such
> > > > cases.
> > > > >
> > > > > If we all agree on this, I'd update the community guidelines for
> > > merging
> > > > > PRs wrt. this proposal. [1]
> > > > >
> > > > > Please let me know what do you think.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > > [1]
> > > > >
> > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > >
> > > >
> > >
> >


Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-22 Thread 刘建刚
It is a good principle to run all tests successfully with any change.  This
means a lot for project's stability and development. I am big +1 for this
proposal.

Best
liujiangang

Till Rohrmann  于2021年6月22日周二 下午6:36写道:

> One way to address the problem of regularly failing tests that block
> merging of PRs is to disable the respective tests for the time being. Of
> course, the failing test then needs to be fixed. But at least that way we
> would not block everyone from making progress.
>
> Cheers,
> Till
>
> On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise  wrote:
>
> > I think this is overall a good idea. So +1 from my side.
> > However, I'd like to put a higher priority on infrastructure then, in
> > particular docker image/artifact caches.
> >
> > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann 
> > wrote:
> >
> > > Thanks for bringing this topic to our attention Xintong. I think your
> > > proposal makes a lot of sense and we should follow it. It will give us
> > > confidence that our changes are working and it might be a good
> incentive
> > to
> > > quickly fix build instabilities. Hence, +1.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song 
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > In the past a couple of weeks, I've observed several times that PRs
> are
> > > > merged without a green light from the CI tests, where failure cases
> are
> > > > considered *unrelated*. This may not always cause problems, but would
> > > > increase the chance of breaking our code base. In fact, it has
> occurred
> > > to
> > > > me twice in the past few weeks that I had to revert a commit which
> > breaks
> > > > the master branch due to this.
> > > >
> > > > I think it would be nicer to enforce a stricter rule, that no PRs
> > should
> > > be
> > > > merged without passing CI.
> > > >
> > > > The problems of merging PRs with "unrelated" test failures are:
> > > > - It's not always straightforward to tell whether a test failures are
> > > > related or not.
> > > > - It prevents subsequent test cases from being executed, which may
> fail
> > > > relating to the PR changes.
> > > >
> > > > To make things easier for the committers, the following exceptions
> > might
> > > be
> > > > considered acceptable.
> > > > - The PR has passed CI in the contributor's personal workspace.
> Please
> > > post
> > > > the link in such cases.
> > > > - The CI tests have been triggered multiple times, on the same
> commit,
> > > and
> > > > each stage has at least passed for once. Please also comment in such
> > > cases.
> > > >
> > > > If we all agree on this, I'd update the community guidelines for
> > merging
> > > > PRs wrt. this proposal. [1]
> > > >
> > > > Please let me know what do you think.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > > [1]
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > >
> > >
> >
>


Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-22 Thread Till Rohrmann
One way to address the problem of regularly failing tests that block
merging of PRs is to disable the respective tests for the time being. Of
course, the failing test then needs to be fixed. But at least that way we
would not block everyone from making progress.

Cheers,
Till

On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise  wrote:

> I think this is overall a good idea. So +1 from my side.
> However, I'd like to put a higher priority on infrastructure then, in
> particular docker image/artifact caches.
>
> On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann 
> wrote:
>
> > Thanks for bringing this topic to our attention Xintong. I think your
> > proposal makes a lot of sense and we should follow it. It will give us
> > confidence that our changes are working and it might be a good incentive
> to
> > quickly fix build instabilities. Hence, +1.
> >
> > Cheers,
> > Till
> >
> > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song 
> > wrote:
> >
> > > Hi everyone,
> > >
> > > In the past a couple of weeks, I've observed several times that PRs are
> > > merged without a green light from the CI tests, where failure cases are
> > > considered *unrelated*. This may not always cause problems, but would
> > > increase the chance of breaking our code base. In fact, it has occurred
> > to
> > > me twice in the past few weeks that I had to revert a commit which
> breaks
> > > the master branch due to this.
> > >
> > > I think it would be nicer to enforce a stricter rule, that no PRs
> should
> > be
> > > merged without passing CI.
> > >
> > > The problems of merging PRs with "unrelated" test failures are:
> > > - It's not always straightforward to tell whether a test failures are
> > > related or not.
> > > - It prevents subsequent test cases from being executed, which may fail
> > > relating to the PR changes.
> > >
> > > To make things easier for the committers, the following exceptions
> might
> > be
> > > considered acceptable.
> > > - The PR has passed CI in the contributor's personal workspace. Please
> > post
> > > the link in such cases.
> > > - The CI tests have been triggered multiple times, on the same commit,
> > and
> > > each stage has at least passed for once. Please also comment in such
> > cases.
> > >
> > > If we all agree on this, I'd update the community guidelines for
> merging
> > > PRs wrt. this proposal. [1]
> > >
> > > Please let me know what do you think.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > > [1]
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > >
> >
>


Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-22 Thread Arvid Heise
I think this is overall a good idea. So +1 from my side.
However, I'd like to put a higher priority on infrastructure then, in
particular docker image/artifact caches.

On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann  wrote:

> Thanks for bringing this topic to our attention Xintong. I think your
> proposal makes a lot of sense and we should follow it. It will give us
> confidence that our changes are working and it might be a good incentive to
> quickly fix build instabilities. Hence, +1.
>
> Cheers,
> Till
>
> On Tue, Jun 22, 2021 at 11:12 AM Xintong Song 
> wrote:
>
> > Hi everyone,
> >
> > In the past a couple of weeks, I've observed several times that PRs are
> > merged without a green light from the CI tests, where failure cases are
> > considered *unrelated*. This may not always cause problems, but would
> > increase the chance of breaking our code base. In fact, it has occurred
> to
> > me twice in the past few weeks that I had to revert a commit which breaks
> > the master branch due to this.
> >
> > I think it would be nicer to enforce a stricter rule, that no PRs should
> be
> > merged without passing CI.
> >
> > The problems of merging PRs with "unrelated" test failures are:
> > - It's not always straightforward to tell whether a test failures are
> > related or not.
> > - It prevents subsequent test cases from being executed, which may fail
> > relating to the PR changes.
> >
> > To make things easier for the committers, the following exceptions might
> be
> > considered acceptable.
> > - The PR has passed CI in the contributor's personal workspace. Please
> post
> > the link in such cases.
> > - The CI tests have been triggered multiple times, on the same commit,
> and
> > each stage has at least passed for once. Please also comment in such
> cases.
> >
> > If we all agree on this, I'd update the community guidelines for merging
> > PRs wrt. this proposal. [1]
> >
> > Please let me know what do you think.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> >
>


Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-22 Thread Till Rohrmann
Thanks for bringing this topic to our attention Xintong. I think your
proposal makes a lot of sense and we should follow it. It will give us
confidence that our changes are working and it might be a good incentive to
quickly fix build instabilities. Hence, +1.

Cheers,
Till

On Tue, Jun 22, 2021 at 11:12 AM Xintong Song  wrote:

> Hi everyone,
>
> In the past a couple of weeks, I've observed several times that PRs are
> merged without a green light from the CI tests, where failure cases are
> considered *unrelated*. This may not always cause problems, but would
> increase the chance of breaking our code base. In fact, it has occurred to
> me twice in the past few weeks that I had to revert a commit which breaks
> the master branch due to this.
>
> I think it would be nicer to enforce a stricter rule, that no PRs should be
> merged without passing CI.
>
> The problems of merging PRs with "unrelated" test failures are:
> - It's not always straightforward to tell whether a test failures are
> related or not.
> - It prevents subsequent test cases from being executed, which may fail
> relating to the PR changes.
>
> To make things easier for the committers, the following exceptions might be
> considered acceptable.
> - The PR has passed CI in the contributor's personal workspace. Please post
> the link in such cases.
> - The CI tests have been triggered multiple times, on the same commit, and
> each stage has at least passed for once. Please also comment in such cases.
>
> If we all agree on this, I'd update the community guidelines for merging
> PRs wrt. this proposal. [1]
>
> Please let me know what do you think.
>
> Thank you~
>
> Xintong Song
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
>


[DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-22 Thread Xintong Song
Hi everyone,

In the past a couple of weeks, I've observed several times that PRs are
merged without a green light from the CI tests, where failure cases are
considered *unrelated*. This may not always cause problems, but would
increase the chance of breaking our code base. In fact, it has occurred to
me twice in the past few weeks that I had to revert a commit which breaks
the master branch due to this.

I think it would be nicer to enforce a stricter rule, that no PRs should be
merged without passing CI.

The problems of merging PRs with "unrelated" test failures are:
- It's not always straightforward to tell whether a test failures are
related or not.
- It prevents subsequent test cases from being executed, which may fail
relating to the PR changes.

To make things easier for the committers, the following exceptions might be
considered acceptable.
- The PR has passed CI in the contributor's personal workspace. Please post
the link in such cases.
- The CI tests have been triggered multiple times, on the same commit, and
each stage has at least passed for once. Please also comment in such cases.

If we all agree on this, I'd update the community guidelines for merging
PRs wrt. this proposal. [1]

Please let me know what do you think.

Thank you~

Xintong Song


[1] https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests