Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
Thanks, Etienne. Comments addressed. Thank you~ Xintong Song On Wed, Jul 7, 2021 at 6:41 PM Etienne Chauchot wrote: > Hi, > > Thanks Xintong for the very good doc ! I added 2 comments to it. > > Best, > > Etienne > > On 02/07/2021 05:57, Xintong Song wrote: > > Thanks all for the positive feedback. > > > > I have updated the wiki page [1], and will send an announcement in a > > separate thread, to draw more committers' attention. > > > > Moreover, I've opened FLINK-23212 where we can continue with the > discussion > > around pure documentation changing PRs. > > > > Thank you~ > > > > Xintong Song > > > > > > [1] > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > On Wed, Jun 30, 2021 at 5:26 PM Xintong Song > wrote: > > > >> I second Tison's opinion. > >> > >> Similar to how we skip docs_404_check for PRs that do not touch the > >> documentation, we can skip other stages for PRs that only contain > >> documentation changes. > >> > >> In addition to making merging documentation PRs easier, we can also > reduce > >> the workload on CI workers. Especially during the last days of a release > >> cycle, which is usually the most busy time for the CI workers, and is > also > >> where most documentation efforts take place. > >> > >> Thank you~ > >> > >> Xintong Song > >> > >> > >> > >> On Wed, Jun 30, 2021 at 3:56 PM Till Rohrmann > >> wrote: > >> > >>> I think you are right Tison that docs are a special case and they only > >>> require flink-docs to pass. What I am wondering is how much of a > problem > >>> this will be (assuming that we have a decent build stability). The more > >>> exceptions we add, the harder it will be to properly follow the > >>> guidelines. > >>> Maybe we can observe how many docs PRs get delayed/not merged because > of > >>> this and then revisit this discussion if needed. > >>> > >>> Cheers, > >>> Till > >>> > >>> On Wed, Jun 30, 2021 at 8:30 AM tison wrote: > >>> > Hi, > > There are a number of PRs modifying docs only, but we still require > all > tests passed on that. > > It is a good proposal we avoid merge PR with "unrelated" failure, but > >>> can > we improve the case where the contributor only works for docs? > > For example, base on the file change set, run doc tests only. > > Best, > tison. > > > godfrey he 于2021年6月30日周三 下午2:17写道: > > > +1 for the proposal. Thanks Xintong! > > > > Best, > > Godfrey > > > > > > > > Rui Li 于2021年6月30日周三 上午11:36写道: > > > >> Thanks Xintong. +1 to the proposal. > >> > >> On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 > wrote: > >>> +1 for the proposal. Since the test time is long and environment > >>> may > >> vary, > >>> unstable tests are really annoying for developers. The solution is > >> welcome. > >>> Best > >>> liujiangang > >>> > >>> Jingsong Li 于2021年6月29日周二 上午10:31写道: > >>> > +1 Thanks Xintong for the update! > > Best, > Jingsong > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann < > trohrm...@apache.org> > wrote: > > > +1, thanks for updating the guidelines Xintong! > > > > Cheers, > > Till > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo < > >>> karma...@gmail.com> > >>> wrote: > >> +1 > >> > >> Thanks Xintong for drafting this doc. > >> > >> Best, > >> Yangze Guo > >> > >> On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG < > beyond1...@gmail.com > wrote: > >>> Thanks Xintong for giving detailed documentation. > >>> > >>> The best practice for handling test failure is very > >>> detailed, > >> it's > >>> a > > good > >>> guidelines document with clear action steps. > >>> > >>> +1 to Xintong's proposal. > >>> > >>> Xintong Song 于2021年6月28日周一 > >>> 下午4:07写道: > Thanks all for the discussion. > > Based on the opinions so far, I've drafted the new > guidelines > >>> [1], > > as a > potential replacement of the original wiki page [2]. > > Hopefully this draft has covered the most opinions > discussed > >> and > >> consensus > made in this discussion thread. > > Looking forward to your feedback. > > Thank you~ > > Xintong Song > > > [1] > > > >>> > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing > [2] > > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > On F
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
Hi, Thanks Xintong for the very good doc ! I added 2 comments to it. Best, Etienne On 02/07/2021 05:57, Xintong Song wrote: Thanks all for the positive feedback. I have updated the wiki page [1], and will send an announcement in a separate thread, to draw more committers' attention. Moreover, I've opened FLINK-23212 where we can continue with the discussion around pure documentation changing PRs. Thank you~ Xintong Song [1] https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests On Wed, Jun 30, 2021 at 5:26 PM Xintong Song wrote: I second Tison's opinion. Similar to how we skip docs_404_check for PRs that do not touch the documentation, we can skip other stages for PRs that only contain documentation changes. In addition to making merging documentation PRs easier, we can also reduce the workload on CI workers. Especially during the last days of a release cycle, which is usually the most busy time for the CI workers, and is also where most documentation efforts take place. Thank you~ Xintong Song On Wed, Jun 30, 2021 at 3:56 PM Till Rohrmann wrote: I think you are right Tison that docs are a special case and they only require flink-docs to pass. What I am wondering is how much of a problem this will be (assuming that we have a decent build stability). The more exceptions we add, the harder it will be to properly follow the guidelines. Maybe we can observe how many docs PRs get delayed/not merged because of this and then revisit this discussion if needed. Cheers, Till On Wed, Jun 30, 2021 at 8:30 AM tison wrote: Hi, There are a number of PRs modifying docs only, but we still require all tests passed on that. It is a good proposal we avoid merge PR with "unrelated" failure, but can we improve the case where the contributor only works for docs? For example, base on the file change set, run doc tests only. Best, tison. godfrey he 于2021年6月30日周三 下午2:17写道: +1 for the proposal. Thanks Xintong! Best, Godfrey Rui Li 于2021年6月30日周三 上午11:36写道: Thanks Xintong. +1 to the proposal. On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 wrote: +1 for the proposal. Since the test time is long and environment may vary, unstable tests are really annoying for developers. The solution is welcome. Best liujiangang Jingsong Li 于2021年6月29日周二 上午10:31写道: +1 Thanks Xintong for the update! Best, Jingsong On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann < trohrm...@apache.org> wrote: +1, thanks for updating the guidelines Xintong! Cheers, Till On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo < karma...@gmail.com> wrote: +1 Thanks Xintong for drafting this doc. Best, Yangze Guo On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG < beyond1...@gmail.com wrote: Thanks Xintong for giving detailed documentation. The best practice for handling test failure is very detailed, it's a good guidelines document with clear action steps. +1 to Xintong's proposal. Xintong Song 于2021年6月28日周一 下午4:07写道: Thanks all for the discussion. Based on the opinions so far, I've drafted the new guidelines [1], as a potential replacement of the original wiki page [2]. Hopefully this draft has covered the most opinions discussed and consensus made in this discussion thread. Looking forward to your feedback. Thank you~ Xintong Song [1] https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing [2] https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski < pnowoj...@apache.org> wrote: Thanks for the clarification Till. +1 for what you have written. Piotrek pt., 25 cze 2021 o 16:00 Till Rohrmann < trohrm...@apache.org napisał(a): One quick note for clarification. I don't have anything against builds running on your personal Azure account and this is not what I understood under "local environment". For me "local environment" means that someone runs the test locally on his machine and then says that the tests have passed locally. I do agree that there might be a conflict of interests if a PR author disables tests. Here I would argue that we don't have malignant committers which means that every committer will probably first check the respective ticket for how often the test failed. Then I guess the next step would be to discuss on the ticket whether to disable it or not. And finally, after reaching a consensus, it will be disabled. If we see someone abusing this policy, then we can still think about how to guard against it. But, honestly, I have very rarely seen such a case. I am also ok to pull in the release manager to make the final call if this resolves concerns. Cheers, Till On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski < pnowoj...@apache.org> wrote: +1 for the general idea, however I have concerns about a couple of details. I would first try to not introduce the exception for local builds. I
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
Thanks all for the positive feedback. I have updated the wiki page [1], and will send an announcement in a separate thread, to draw more committers' attention. Moreover, I've opened FLINK-23212 where we can continue with the discussion around pure documentation changing PRs. Thank you~ Xintong Song [1] https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests On Wed, Jun 30, 2021 at 5:26 PM Xintong Song wrote: > I second Tison's opinion. > > Similar to how we skip docs_404_check for PRs that do not touch the > documentation, we can skip other stages for PRs that only contain > documentation changes. > > In addition to making merging documentation PRs easier, we can also reduce > the workload on CI workers. Especially during the last days of a release > cycle, which is usually the most busy time for the CI workers, and is also > where most documentation efforts take place. > > Thank you~ > > Xintong Song > > > > On Wed, Jun 30, 2021 at 3:56 PM Till Rohrmann > wrote: > >> I think you are right Tison that docs are a special case and they only >> require flink-docs to pass. What I am wondering is how much of a problem >> this will be (assuming that we have a decent build stability). The more >> exceptions we add, the harder it will be to properly follow the >> guidelines. >> Maybe we can observe how many docs PRs get delayed/not merged because of >> this and then revisit this discussion if needed. >> >> Cheers, >> Till >> >> On Wed, Jun 30, 2021 at 8:30 AM tison wrote: >> >> > Hi, >> > >> > There are a number of PRs modifying docs only, but we still require all >> > tests passed on that. >> > >> > It is a good proposal we avoid merge PR with "unrelated" failure, but >> can >> > we improve the case where the contributor only works for docs? >> > >> > For example, base on the file change set, run doc tests only. >> > >> > Best, >> > tison. >> > >> > >> > godfrey he 于2021年6月30日周三 下午2:17写道: >> > >> > > +1 for the proposal. Thanks Xintong! >> > > >> > > Best, >> > > Godfrey >> > > >> > > >> > > >> > > Rui Li 于2021年6月30日周三 上午11:36写道: >> > > >> > > > Thanks Xintong. +1 to the proposal. >> > > > >> > > > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 >> > wrote: >> > > > >> > > > > +1 for the proposal. Since the test time is long and environment >> may >> > > > vary, >> > > > > unstable tests are really annoying for developers. The solution is >> > > > welcome. >> > > > > >> > > > > Best >> > > > > liujiangang >> > > > > >> > > > > Jingsong Li 于2021年6月29日周二 上午10:31写道: >> > > > > >> > > > > > +1 Thanks Xintong for the update! >> > > > > > >> > > > > > Best, >> > > > > > Jingsong >> > > > > > >> > > > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann < >> > trohrm...@apache.org> >> > > > > > wrote: >> > > > > > >> > > > > > > +1, thanks for updating the guidelines Xintong! >> > > > > > > >> > > > > > > Cheers, >> > > > > > > Till >> > > > > > > >> > > > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo < >> karma...@gmail.com> >> > > > > wrote: >> > > > > > > >> > > > > > > > +1 >> > > > > > > > >> > > > > > > > Thanks Xintong for drafting this doc. >> > > > > > > > >> > > > > > > > Best, >> > > > > > > > Yangze Guo >> > > > > > > > >> > > > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG < >> > beyond1...@gmail.com >> > > > >> > > > > > wrote: >> > > > > > > > > >> > > > > > > > > Thanks Xintong for giving detailed documentation. >> > > > > > > > > >> > > > > > > > > The best practice for handling test failure is very >> detailed, >> > > > it's >> > > > > a >> > > > > > > good >> > > > > > > > > guidelines document with clear action steps. >> > > > > > > > > >> > > > > > > > > +1 to Xintong's proposal. >> > > > > > > > > >> > > > > > > > > Xintong Song 于2021年6月28日周一 >> 下午4:07写道: >> > > > > > > > > >> > > > > > > > > > Thanks all for the discussion. >> > > > > > > > > > >> > > > > > > > > > Based on the opinions so far, I've drafted the new >> > guidelines >> > > > > [1], >> > > > > > > as a >> > > > > > > > > > potential replacement of the original wiki page [2]. >> > > > > > > > > > >> > > > > > > > > > Hopefully this draft has covered the most opinions >> > discussed >> > > > and >> > > > > > > > consensus >> > > > > > > > > > made in this discussion thread. >> > > > > > > > > > >> > > > > > > > > > Looking forward to your feedback. >> > > > > > > > > > >> > > > > > > > > > Thank you~ >> > > > > > > > > > >> > > > > > > > > > Xintong Song >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > [1] >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing >> > > > > > > > > > >> > > > > > > > > > [2] >> > > > > > > > > > >> > > > > > > > >> > > > > > >> > > > >> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > On Fri, Jun 2
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
I second Tison's opinion. Similar to how we skip docs_404_check for PRs that do not touch the documentation, we can skip other stages for PRs that only contain documentation changes. In addition to making merging documentation PRs easier, we can also reduce the workload on CI workers. Especially during the last days of a release cycle, which is usually the most busy time for the CI workers, and is also where most documentation efforts take place. Thank you~ Xintong Song On Wed, Jun 30, 2021 at 3:56 PM Till Rohrmann wrote: > I think you are right Tison that docs are a special case and they only > require flink-docs to pass. What I am wondering is how much of a problem > this will be (assuming that we have a decent build stability). The more > exceptions we add, the harder it will be to properly follow the guidelines. > Maybe we can observe how many docs PRs get delayed/not merged because of > this and then revisit this discussion if needed. > > Cheers, > Till > > On Wed, Jun 30, 2021 at 8:30 AM tison wrote: > > > Hi, > > > > There are a number of PRs modifying docs only, but we still require all > > tests passed on that. > > > > It is a good proposal we avoid merge PR with "unrelated" failure, but can > > we improve the case where the contributor only works for docs? > > > > For example, base on the file change set, run doc tests only. > > > > Best, > > tison. > > > > > > godfrey he 于2021年6月30日周三 下午2:17写道: > > > > > +1 for the proposal. Thanks Xintong! > > > > > > Best, > > > Godfrey > > > > > > > > > > > > Rui Li 于2021年6月30日周三 上午11:36写道: > > > > > > > Thanks Xintong. +1 to the proposal. > > > > > > > > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 > > wrote: > > > > > > > > > +1 for the proposal. Since the test time is long and environment > may > > > > vary, > > > > > unstable tests are really annoying for developers. The solution is > > > > welcome. > > > > > > > > > > Best > > > > > liujiangang > > > > > > > > > > Jingsong Li 于2021年6月29日周二 上午10:31写道: > > > > > > > > > > > +1 Thanks Xintong for the update! > > > > > > > > > > > > Best, > > > > > > Jingsong > > > > > > > > > > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann < > > trohrm...@apache.org> > > > > > > wrote: > > > > > > > > > > > > > +1, thanks for updating the guidelines Xintong! > > > > > > > > > > > > > > Cheers, > > > > > > > Till > > > > > > > > > > > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo < > karma...@gmail.com> > > > > > wrote: > > > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > > > Thanks Xintong for drafting this doc. > > > > > > > > > > > > > > > > Best, > > > > > > > > Yangze Guo > > > > > > > > > > > > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG < > > beyond1...@gmail.com > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > Thanks Xintong for giving detailed documentation. > > > > > > > > > > > > > > > > > > The best practice for handling test failure is very > detailed, > > > > it's > > > > > a > > > > > > > good > > > > > > > > > guidelines document with clear action steps. > > > > > > > > > > > > > > > > > > +1 to Xintong's proposal. > > > > > > > > > > > > > > > > > > Xintong Song 于2021年6月28日周一 > 下午4:07写道: > > > > > > > > > > > > > > > > > > > Thanks all for the discussion. > > > > > > > > > > > > > > > > > > > > Based on the opinions so far, I've drafted the new > > guidelines > > > > > [1], > > > > > > > as a > > > > > > > > > > potential replacement of the original wiki page [2]. > > > > > > > > > > > > > > > > > > > > Hopefully this draft has covered the most opinions > > discussed > > > > and > > > > > > > > consensus > > > > > > > > > > made in this discussion thread. > > > > > > > > > > > > > > > > > > > > Looking forward to your feedback. > > > > > > > > > > > > > > > > > > > > Thank you~ > > > > > > > > > > > > > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing > > > > > > > > > > > > > > > > > > > > [2] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski < > > > > > > > pnowoj...@apache.org> > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Thanks for the clarification Till. +1 for what you have > > > > > written. > > > > > > > > > > > > > > > > > > > > > > Piotrek > > > > > > > > > > > > > > > > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann < > > > trohrm...@apache.org > > > > > > > > > > > > > > > napisał(a): > > > > > > > > > > > > > > > > > > > > > > > One quick note for clarification. I don't have > anything > > > > > against > > > > > > > > builds > > > > > > > > > > > > running on your persona
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
I think you are right Tison that docs are a special case and they only require flink-docs to pass. What I am wondering is how much of a problem this will be (assuming that we have a decent build stability). The more exceptions we add, the harder it will be to properly follow the guidelines. Maybe we can observe how many docs PRs get delayed/not merged because of this and then revisit this discussion if needed. Cheers, Till On Wed, Jun 30, 2021 at 8:30 AM tison wrote: > Hi, > > There are a number of PRs modifying docs only, but we still require all > tests passed on that. > > It is a good proposal we avoid merge PR with "unrelated" failure, but can > we improve the case where the contributor only works for docs? > > For example, base on the file change set, run doc tests only. > > Best, > tison. > > > godfrey he 于2021年6月30日周三 下午2:17写道: > > > +1 for the proposal. Thanks Xintong! > > > > Best, > > Godfrey > > > > > > > > Rui Li 于2021年6月30日周三 上午11:36写道: > > > > > Thanks Xintong. +1 to the proposal. > > > > > > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 > wrote: > > > > > > > +1 for the proposal. Since the test time is long and environment may > > > vary, > > > > unstable tests are really annoying for developers. The solution is > > > welcome. > > > > > > > > Best > > > > liujiangang > > > > > > > > Jingsong Li 于2021年6月29日周二 上午10:31写道: > > > > > > > > > +1 Thanks Xintong for the update! > > > > > > > > > > Best, > > > > > Jingsong > > > > > > > > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann < > trohrm...@apache.org> > > > > > wrote: > > > > > > > > > > > +1, thanks for updating the guidelines Xintong! > > > > > > > > > > > > Cheers, > > > > > > Till > > > > > > > > > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo > > > > wrote: > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > Thanks Xintong for drafting this doc. > > > > > > > > > > > > > > Best, > > > > > > > Yangze Guo > > > > > > > > > > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG < > beyond1...@gmail.com > > > > > > > > wrote: > > > > > > > > > > > > > > > > Thanks Xintong for giving detailed documentation. > > > > > > > > > > > > > > > > The best practice for handling test failure is very detailed, > > > it's > > > > a > > > > > > good > > > > > > > > guidelines document with clear action steps. > > > > > > > > > > > > > > > > +1 to Xintong's proposal. > > > > > > > > > > > > > > > > Xintong Song 于2021年6月28日周一 下午4:07写道: > > > > > > > > > > > > > > > > > Thanks all for the discussion. > > > > > > > > > > > > > > > > > > Based on the opinions so far, I've drafted the new > guidelines > > > > [1], > > > > > > as a > > > > > > > > > potential replacement of the original wiki page [2]. > > > > > > > > > > > > > > > > > > Hopefully this draft has covered the most opinions > discussed > > > and > > > > > > > consensus > > > > > > > > > made in this discussion thread. > > > > > > > > > > > > > > > > > > Looking forward to your feedback. > > > > > > > > > > > > > > > > > > Thank you~ > > > > > > > > > > > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing > > > > > > > > > > > > > > > > > > [2] > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski < > > > > > > pnowoj...@apache.org> > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Thanks for the clarification Till. +1 for what you have > > > > written. > > > > > > > > > > > > > > > > > > > > Piotrek > > > > > > > > > > > > > > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann < > > trohrm...@apache.org > > > > > > > > > > > > > napisał(a): > > > > > > > > > > > > > > > > > > > > > One quick note for clarification. I don't have anything > > > > against > > > > > > > builds > > > > > > > > > > > running on your personal Azure account and this is not > > > what I > > > > > > > > > understood > > > > > > > > > > > under "local environment". For me "local environment" > > means > > > > > that > > > > > > > > > someone > > > > > > > > > > > runs the test locally on his machine and then says that > > the > > > > > > > > > > > tests have passed locally. > > > > > > > > > > > > > > > > > > > > > > I do agree that there might be a conflict of interests > > if a > > > > PR > > > > > > > author > > > > > > > > > > > disables tests. Here I would argue that we don't have > > > > malignant > > > > > > > > > > committers > > > > > > > > > > > which means that every committer will probably first > > check > > > > the > > > > > > > > > respective > > > > > > > > > > > ticket for how often the test failed. Then I guess the > > next > > > > > step > > > > > > > would > > > > > > > > > be >
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
Hi, There are a number of PRs modifying docs only, but we still require all tests passed on that. It is a good proposal we avoid merge PR with "unrelated" failure, but can we improve the case where the contributor only works for docs? For example, base on the file change set, run doc tests only. Best, tison. godfrey he 于2021年6月30日周三 下午2:17写道: > +1 for the proposal. Thanks Xintong! > > Best, > Godfrey > > > > Rui Li 于2021年6月30日周三 上午11:36写道: > > > Thanks Xintong. +1 to the proposal. > > > > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 wrote: > > > > > +1 for the proposal. Since the test time is long and environment may > > vary, > > > unstable tests are really annoying for developers. The solution is > > welcome. > > > > > > Best > > > liujiangang > > > > > > Jingsong Li 于2021年6月29日周二 上午10:31写道: > > > > > > > +1 Thanks Xintong for the update! > > > > > > > > Best, > > > > Jingsong > > > > > > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann > > > > wrote: > > > > > > > > > +1, thanks for updating the guidelines Xintong! > > > > > > > > > > Cheers, > > > > > Till > > > > > > > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo > > > wrote: > > > > > > > > > > > +1 > > > > > > > > > > > > Thanks Xintong for drafting this doc. > > > > > > > > > > > > Best, > > > > > > Yangze Guo > > > > > > > > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG > > > > > wrote: > > > > > > > > > > > > > > Thanks Xintong for giving detailed documentation. > > > > > > > > > > > > > > The best practice for handling test failure is very detailed, > > it's > > > a > > > > > good > > > > > > > guidelines document with clear action steps. > > > > > > > > > > > > > > +1 to Xintong's proposal. > > > > > > > > > > > > > > Xintong Song 于2021年6月28日周一 下午4:07写道: > > > > > > > > > > > > > > > Thanks all for the discussion. > > > > > > > > > > > > > > > > Based on the opinions so far, I've drafted the new guidelines > > > [1], > > > > > as a > > > > > > > > potential replacement of the original wiki page [2]. > > > > > > > > > > > > > > > > Hopefully this draft has covered the most opinions discussed > > and > > > > > > consensus > > > > > > > > made in this discussion thread. > > > > > > > > > > > > > > > > Looking forward to your feedback. > > > > > > > > > > > > > > > > Thank you~ > > > > > > > > > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing > > > > > > > > > > > > > > > > [2] > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski < > > > > > pnowoj...@apache.org> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Thanks for the clarification Till. +1 for what you have > > > written. > > > > > > > > > > > > > > > > > > Piotrek > > > > > > > > > > > > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann < > trohrm...@apache.org > > > > > > > > > > > napisał(a): > > > > > > > > > > > > > > > > > > > One quick note for clarification. I don't have anything > > > against > > > > > > builds > > > > > > > > > > running on your personal Azure account and this is not > > what I > > > > > > > > understood > > > > > > > > > > under "local environment". For me "local environment" > means > > > > that > > > > > > > > someone > > > > > > > > > > runs the test locally on his machine and then says that > the > > > > > > > > > > tests have passed locally. > > > > > > > > > > > > > > > > > > > > I do agree that there might be a conflict of interests > if a > > > PR > > > > > > author > > > > > > > > > > disables tests. Here I would argue that we don't have > > > malignant > > > > > > > > > committers > > > > > > > > > > which means that every committer will probably first > check > > > the > > > > > > > > respective > > > > > > > > > > ticket for how often the test failed. Then I guess the > next > > > > step > > > > > > would > > > > > > > > be > > > > > > > > > > to discuss on the ticket whether to disable it or not. > And > > > > > finally, > > > > > > > > after > > > > > > > > > > reaching a consensus, it will be disabled. If we see > > someone > > > > > > abusing > > > > > > > > this > > > > > > > > > > policy, then we can still think about how to guard > against > > > it. > > > > > But, > > > > > > > > > > honestly, I have very rarely seen such a case. I am also > ok > > > to > > > > > > pull in > > > > > > > > > the > > > > > > > > > > release manager to make the final call if this resolves > > > > concerns. > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > Till > > > > > > > > > > > > > > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski < > > > > > > pnowoj...@apache.org> > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > +1 f
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
+1 for the proposal. Thanks Xintong! Best, Godfrey Rui Li 于2021年6月30日周三 上午11:36写道: > Thanks Xintong. +1 to the proposal. > > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 wrote: > > > +1 for the proposal. Since the test time is long and environment may > vary, > > unstable tests are really annoying for developers. The solution is > welcome. > > > > Best > > liujiangang > > > > Jingsong Li 于2021年6月29日周二 上午10:31写道: > > > > > +1 Thanks Xintong for the update! > > > > > > Best, > > > Jingsong > > > > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann > > > wrote: > > > > > > > +1, thanks for updating the guidelines Xintong! > > > > > > > > Cheers, > > > > Till > > > > > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo > > wrote: > > > > > > > > > +1 > > > > > > > > > > Thanks Xintong for drafting this doc. > > > > > > > > > > Best, > > > > > Yangze Guo > > > > > > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG > > > wrote: > > > > > > > > > > > > Thanks Xintong for giving detailed documentation. > > > > > > > > > > > > The best practice for handling test failure is very detailed, > it's > > a > > > > good > > > > > > guidelines document with clear action steps. > > > > > > > > > > > > +1 to Xintong's proposal. > > > > > > > > > > > > Xintong Song 于2021年6月28日周一 下午4:07写道: > > > > > > > > > > > > > Thanks all for the discussion. > > > > > > > > > > > > > > Based on the opinions so far, I've drafted the new guidelines > > [1], > > > > as a > > > > > > > potential replacement of the original wiki page [2]. > > > > > > > > > > > > > > Hopefully this draft has covered the most opinions discussed > and > > > > > consensus > > > > > > > made in this discussion thread. > > > > > > > > > > > > > > Looking forward to your feedback. > > > > > > > > > > > > > > Thank you~ > > > > > > > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing > > > > > > > > > > > > > > [2] > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski < > > > > pnowoj...@apache.org> > > > > > > > wrote: > > > > > > > > > > > > > > > Thanks for the clarification Till. +1 for what you have > > written. > > > > > > > > > > > > > > > > Piotrek > > > > > > > > > > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann > > > > > > > > napisał(a): > > > > > > > > > > > > > > > > > One quick note for clarification. I don't have anything > > against > > > > > builds > > > > > > > > > running on your personal Azure account and this is not > what I > > > > > > > understood > > > > > > > > > under "local environment". For me "local environment" means > > > that > > > > > > > someone > > > > > > > > > runs the test locally on his machine and then says that the > > > > > > > > > tests have passed locally. > > > > > > > > > > > > > > > > > > I do agree that there might be a conflict of interests if a > > PR > > > > > author > > > > > > > > > disables tests. Here I would argue that we don't have > > malignant > > > > > > > > committers > > > > > > > > > which means that every committer will probably first check > > the > > > > > > > respective > > > > > > > > > ticket for how often the test failed. Then I guess the next > > > step > > > > > would > > > > > > > be > > > > > > > > > to discuss on the ticket whether to disable it or not. And > > > > finally, > > > > > > > after > > > > > > > > > reaching a consensus, it will be disabled. If we see > someone > > > > > abusing > > > > > > > this > > > > > > > > > policy, then we can still think about how to guard against > > it. > > > > But, > > > > > > > > > honestly, I have very rarely seen such a case. I am also ok > > to > > > > > pull in > > > > > > > > the > > > > > > > > > release manager to make the final call if this resolves > > > concerns. > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > Till > > > > > > > > > > > > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski < > > > > > pnowoj...@apache.org> > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > +1 for the general idea, however I have concerns about a > > > couple > > > > > of > > > > > > > > > details. > > > > > > > > > > > > > > > > > > > > > I would first try to not introduce the exception for > > local > > > > > builds. > > > > > > > > > > > It makes it quite hard for others to verify the build > and > > > to > > > > > make > > > > > > > > sure > > > > > > > > > > that the right things were executed. > > > > > > > > > > > > > > > > > > > > I would counter Till's proposal to ignore local green > > builds. > > > > If > > > > > > > > > committer > > > > > > > > > > is merging and closing a PR, with official azure failure, > > but > > > > > there > > > > > > > > was a > > > > > > > > > > green build
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
Thanks Xintong. +1 to the proposal. On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 wrote: > +1 for the proposal. Since the test time is long and environment may vary, > unstable tests are really annoying for developers. The solution is welcome. > > Best > liujiangang > > Jingsong Li 于2021年6月29日周二 上午10:31写道: > > > +1 Thanks Xintong for the update! > > > > Best, > > Jingsong > > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann > > wrote: > > > > > +1, thanks for updating the guidelines Xintong! > > > > > > Cheers, > > > Till > > > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo > wrote: > > > > > > > +1 > > > > > > > > Thanks Xintong for drafting this doc. > > > > > > > > Best, > > > > Yangze Guo > > > > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG > > wrote: > > > > > > > > > > Thanks Xintong for giving detailed documentation. > > > > > > > > > > The best practice for handling test failure is very detailed, it's > a > > > good > > > > > guidelines document with clear action steps. > > > > > > > > > > +1 to Xintong's proposal. > > > > > > > > > > Xintong Song 于2021年6月28日周一 下午4:07写道: > > > > > > > > > > > Thanks all for the discussion. > > > > > > > > > > > > Based on the opinions so far, I've drafted the new guidelines > [1], > > > as a > > > > > > potential replacement of the original wiki page [2]. > > > > > > > > > > > > Hopefully this draft has covered the most opinions discussed and > > > > consensus > > > > > > made in this discussion thread. > > > > > > > > > > > > Looking forward to your feedback. > > > > > > > > > > > > Thank you~ > > > > > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing > > > > > > > > > > > > [2] > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski < > > > pnowoj...@apache.org> > > > > > > wrote: > > > > > > > > > > > > > Thanks for the clarification Till. +1 for what you have > written. > > > > > > > > > > > > > > Piotrek > > > > > > > > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann > > > > > > napisał(a): > > > > > > > > > > > > > > > One quick note for clarification. I don't have anything > against > > > > builds > > > > > > > > running on your personal Azure account and this is not what I > > > > > > understood > > > > > > > > under "local environment". For me "local environment" means > > that > > > > > > someone > > > > > > > > runs the test locally on his machine and then says that the > > > > > > > > tests have passed locally. > > > > > > > > > > > > > > > > I do agree that there might be a conflict of interests if a > PR > > > > author > > > > > > > > disables tests. Here I would argue that we don't have > malignant > > > > > > > committers > > > > > > > > which means that every committer will probably first check > the > > > > > > respective > > > > > > > > ticket for how often the test failed. Then I guess the next > > step > > > > would > > > > > > be > > > > > > > > to discuss on the ticket whether to disable it or not. And > > > finally, > > > > > > after > > > > > > > > reaching a consensus, it will be disabled. If we see someone > > > > abusing > > > > > > this > > > > > > > > policy, then we can still think about how to guard against > it. > > > But, > > > > > > > > honestly, I have very rarely seen such a case. I am also ok > to > > > > pull in > > > > > > > the > > > > > > > > release manager to make the final call if this resolves > > concerns. > > > > > > > > > > > > > > > > Cheers, > > > > > > > > Till > > > > > > > > > > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski < > > > > pnowoj...@apache.org> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > +1 for the general idea, however I have concerns about a > > couple > > > > of > > > > > > > > details. > > > > > > > > > > > > > > > > > > > I would first try to not introduce the exception for > local > > > > builds. > > > > > > > > > > It makes it quite hard for others to verify the build and > > to > > > > make > > > > > > > sure > > > > > > > > > that the right things were executed. > > > > > > > > > > > > > > > > > > I would counter Till's proposal to ignore local green > builds. > > > If > > > > > > > > committer > > > > > > > > > is merging and closing a PR, with official azure failure, > but > > > > there > > > > > > > was a > > > > > > > > > green build before or in local azure it's IMO enough to > leave > > > the > > > > > > > > message: > > > > > > > > > > > > > > > > > > > Latest build failure is a known issue: FLINK-12345 > > > > > > > > > > Green local build: URL > > > > > > > > > > > > > > > > > > This should address Till's concern about verification. > > > > > > > > > > > > > > > > > > On the other hand I have concerns about disabling tests.* > It > > > > > > shouldn't > >
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
+1 for the proposal. Since the test time is long and environment may vary, unstable tests are really annoying for developers. The solution is welcome. Best liujiangang Jingsong Li 于2021年6月29日周二 上午10:31写道: > +1 Thanks Xintong for the update! > > Best, > Jingsong > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann > wrote: > > > +1, thanks for updating the guidelines Xintong! > > > > Cheers, > > Till > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo wrote: > > > > > +1 > > > > > > Thanks Xintong for drafting this doc. > > > > > > Best, > > > Yangze Guo > > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG > wrote: > > > > > > > > Thanks Xintong for giving detailed documentation. > > > > > > > > The best practice for handling test failure is very detailed, it's a > > good > > > > guidelines document with clear action steps. > > > > > > > > +1 to Xintong's proposal. > > > > > > > > Xintong Song 于2021年6月28日周一 下午4:07写道: > > > > > > > > > Thanks all for the discussion. > > > > > > > > > > Based on the opinions so far, I've drafted the new guidelines [1], > > as a > > > > > potential replacement of the original wiki page [2]. > > > > > > > > > > Hopefully this draft has covered the most opinions discussed and > > > consensus > > > > > made in this discussion thread. > > > > > > > > > > Looking forward to your feedback. > > > > > > > > > > Thank you~ > > > > > > > > > > Xintong Song > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing > > > > > > > > > > [2] > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > > > > > > > > > > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski < > > pnowoj...@apache.org> > > > > > wrote: > > > > > > > > > > > Thanks for the clarification Till. +1 for what you have written. > > > > > > > > > > > > Piotrek > > > > > > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann > > > > > napisał(a): > > > > > > > > > > > > > One quick note for clarification. I don't have anything against > > > builds > > > > > > > running on your personal Azure account and this is not what I > > > > > understood > > > > > > > under "local environment". For me "local environment" means > that > > > > > someone > > > > > > > runs the test locally on his machine and then says that the > > > > > > > tests have passed locally. > > > > > > > > > > > > > > I do agree that there might be a conflict of interests if a PR > > > author > > > > > > > disables tests. Here I would argue that we don't have malignant > > > > > > committers > > > > > > > which means that every committer will probably first check the > > > > > respective > > > > > > > ticket for how often the test failed. Then I guess the next > step > > > would > > > > > be > > > > > > > to discuss on the ticket whether to disable it or not. And > > finally, > > > > > after > > > > > > > reaching a consensus, it will be disabled. If we see someone > > > abusing > > > > > this > > > > > > > policy, then we can still think about how to guard against it. > > But, > > > > > > > honestly, I have very rarely seen such a case. I am also ok to > > > pull in > > > > > > the > > > > > > > release manager to make the final call if this resolves > concerns. > > > > > > > > > > > > > > Cheers, > > > > > > > Till > > > > > > > > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski < > > > pnowoj...@apache.org> > > > > > > > wrote: > > > > > > > > > > > > > > > +1 for the general idea, however I have concerns about a > couple > > > of > > > > > > > details. > > > > > > > > > > > > > > > > > I would first try to not introduce the exception for local > > > builds. > > > > > > > > > It makes it quite hard for others to verify the build and > to > > > make > > > > > > sure > > > > > > > > that the right things were executed. > > > > > > > > > > > > > > > > I would counter Till's proposal to ignore local green builds. > > If > > > > > > > committer > > > > > > > > is merging and closing a PR, with official azure failure, but > > > there > > > > > > was a > > > > > > > > green build before or in local azure it's IMO enough to leave > > the > > > > > > > message: > > > > > > > > > > > > > > > > > Latest build failure is a known issue: FLINK-12345 > > > > > > > > > Green local build: URL > > > > > > > > > > > > > > > > This should address Till's concern about verification. > > > > > > > > > > > > > > > > On the other hand I have concerns about disabling tests.* It > > > > > shouldn't > > > > > > be > > > > > > > > the PR author/committer that's disabling a test on his own, > as > > > > > that's a > > > > > > > > conflict of interests*. I have however no problems with > > disabling > > > > > test > > > > > > > > instabilities that were marked as "blockers" though, that > > should > > > work > > > > > > > > pretty well. But the important thing here is to correctly > judge > > > > > bumping > > > > > > > >
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
+1 Thanks Xintong for the update! Best, Jingsong On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann wrote: > +1, thanks for updating the guidelines Xintong! > > Cheers, > Till > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo wrote: > > > +1 > > > > Thanks Xintong for drafting this doc. > > > > Best, > > Yangze Guo > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG wrote: > > > > > > Thanks Xintong for giving detailed documentation. > > > > > > The best practice for handling test failure is very detailed, it's a > good > > > guidelines document with clear action steps. > > > > > > +1 to Xintong's proposal. > > > > > > Xintong Song 于2021年6月28日周一 下午4:07写道: > > > > > > > Thanks all for the discussion. > > > > > > > > Based on the opinions so far, I've drafted the new guidelines [1], > as a > > > > potential replacement of the original wiki page [2]. > > > > > > > > Hopefully this draft has covered the most opinions discussed and > > consensus > > > > made in this discussion thread. > > > > > > > > Looking forward to your feedback. > > > > > > > > Thank you~ > > > > > > > > Xintong Song > > > > > > > > > > > > [1] > > > > > > > > > > > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing > > > > > > > > [2] > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > > > > > > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski < > pnowoj...@apache.org> > > > > wrote: > > > > > > > > > Thanks for the clarification Till. +1 for what you have written. > > > > > > > > > > Piotrek > > > > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann > > > > napisał(a): > > > > > > > > > > > One quick note for clarification. I don't have anything against > > builds > > > > > > running on your personal Azure account and this is not what I > > > > understood > > > > > > under "local environment". For me "local environment" means that > > > > someone > > > > > > runs the test locally on his machine and then says that the > > > > > > tests have passed locally. > > > > > > > > > > > > I do agree that there might be a conflict of interests if a PR > > author > > > > > > disables tests. Here I would argue that we don't have malignant > > > > > committers > > > > > > which means that every committer will probably first check the > > > > respective > > > > > > ticket for how often the test failed. Then I guess the next step > > would > > > > be > > > > > > to discuss on the ticket whether to disable it or not. And > finally, > > > > after > > > > > > reaching a consensus, it will be disabled. If we see someone > > abusing > > > > this > > > > > > policy, then we can still think about how to guard against it. > But, > > > > > > honestly, I have very rarely seen such a case. I am also ok to > > pull in > > > > > the > > > > > > release manager to make the final call if this resolves concerns. > > > > > > > > > > > > Cheers, > > > > > > Till > > > > > > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski < > > pnowoj...@apache.org> > > > > > > wrote: > > > > > > > > > > > > > +1 for the general idea, however I have concerns about a couple > > of > > > > > > details. > > > > > > > > > > > > > > > I would first try to not introduce the exception for local > > builds. > > > > > > > > It makes it quite hard for others to verify the build and to > > make > > > > > sure > > > > > > > that the right things were executed. > > > > > > > > > > > > > > I would counter Till's proposal to ignore local green builds. > If > > > > > > committer > > > > > > > is merging and closing a PR, with official azure failure, but > > there > > > > > was a > > > > > > > green build before or in local azure it's IMO enough to leave > the > > > > > > message: > > > > > > > > > > > > > > > Latest build failure is a known issue: FLINK-12345 > > > > > > > > Green local build: URL > > > > > > > > > > > > > > This should address Till's concern about verification. > > > > > > > > > > > > > > On the other hand I have concerns about disabling tests.* It > > > > shouldn't > > > > > be > > > > > > > the PR author/committer that's disabling a test on his own, as > > > > that's a > > > > > > > conflict of interests*. I have however no problems with > disabling > > > > test > > > > > > > instabilities that were marked as "blockers" though, that > should > > work > > > > > > > pretty well. But the important thing here is to correctly judge > > > > bumping > > > > > > > priorities of test instabilities based on their frequency and > > current > > > > > > > general health of the system. I believe that release managers > > should > > > > be > > > > > > > playing a big role here in deciding on the guidelines of what > > should > > > > > be a > > > > > > > priority of certain test instabilities. > > > > > > > > > > > > > > What I mean by that is two example scenarios: > > > > > > > 1. if we have a handful of very frequently failing tests and a > > > > handful > > > > > of > > > > > > > very rar
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
+1, thanks for updating the guidelines Xintong! Cheers, Till On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo wrote: > +1 > > Thanks Xintong for drafting this doc. > > Best, > Yangze Guo > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG wrote: > > > > Thanks Xintong for giving detailed documentation. > > > > The best practice for handling test failure is very detailed, it's a good > > guidelines document with clear action steps. > > > > +1 to Xintong's proposal. > > > > Xintong Song 于2021年6月28日周一 下午4:07写道: > > > > > Thanks all for the discussion. > > > > > > Based on the opinions so far, I've drafted the new guidelines [1], as a > > > potential replacement of the original wiki page [2]. > > > > > > Hopefully this draft has covered the most opinions discussed and > consensus > > > made in this discussion thread. > > > > > > Looking forward to your feedback. > > > > > > Thank you~ > > > > > > Xintong Song > > > > > > > > > [1] > > > > > > > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing > > > > > > [2] > > > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski > > > wrote: > > > > > > > Thanks for the clarification Till. +1 for what you have written. > > > > > > > > Piotrek > > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann > > > napisał(a): > > > > > > > > > One quick note for clarification. I don't have anything against > builds > > > > > running on your personal Azure account and this is not what I > > > understood > > > > > under "local environment". For me "local environment" means that > > > someone > > > > > runs the test locally on his machine and then says that the > > > > > tests have passed locally. > > > > > > > > > > I do agree that there might be a conflict of interests if a PR > author > > > > > disables tests. Here I would argue that we don't have malignant > > > > committers > > > > > which means that every committer will probably first check the > > > respective > > > > > ticket for how often the test failed. Then I guess the next step > would > > > be > > > > > to discuss on the ticket whether to disable it or not. And finally, > > > after > > > > > reaching a consensus, it will be disabled. If we see someone > abusing > > > this > > > > > policy, then we can still think about how to guard against it. But, > > > > > honestly, I have very rarely seen such a case. I am also ok to > pull in > > > > the > > > > > release manager to make the final call if this resolves concerns. > > > > > > > > > > Cheers, > > > > > Till > > > > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski < > pnowoj...@apache.org> > > > > > wrote: > > > > > > > > > > > +1 for the general idea, however I have concerns about a couple > of > > > > > details. > > > > > > > > > > > > > I would first try to not introduce the exception for local > builds. > > > > > > > It makes it quite hard for others to verify the build and to > make > > > > sure > > > > > > that the right things were executed. > > > > > > > > > > > > I would counter Till's proposal to ignore local green builds. If > > > > > committer > > > > > > is merging and closing a PR, with official azure failure, but > there > > > > was a > > > > > > green build before or in local azure it's IMO enough to leave the > > > > > message: > > > > > > > > > > > > > Latest build failure is a known issue: FLINK-12345 > > > > > > > Green local build: URL > > > > > > > > > > > > This should address Till's concern about verification. > > > > > > > > > > > > On the other hand I have concerns about disabling tests.* It > > > shouldn't > > > > be > > > > > > the PR author/committer that's disabling a test on his own, as > > > that's a > > > > > > conflict of interests*. I have however no problems with disabling > > > test > > > > > > instabilities that were marked as "blockers" though, that should > work > > > > > > pretty well. But the important thing here is to correctly judge > > > bumping > > > > > > priorities of test instabilities based on their frequency and > current > > > > > > general health of the system. I believe that release managers > should > > > be > > > > > > playing a big role here in deciding on the guidelines of what > should > > > > be a > > > > > > priority of certain test instabilities. > > > > > > > > > > > > What I mean by that is two example scenarios: > > > > > > 1. if we have a handful of very frequently failing tests and a > > > handful > > > > of > > > > > > very rarely failing tests (like one reported failure and no > another > > > > > > occurrence in many months, and let's even say that the failure > looks > > > > like > > > > > > infrastructure/network timeout), we should focus on the > frequently > > > > > failing > > > > > > ones, and probably we are safe to ignore for the time being the > rare > > > > > issues > > > > > > - at least until we deal with the most pressing ones. > > > > > > 2. If w
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
+1 Thanks Xintong for drafting this doc. Best, Yangze Guo On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG wrote: > > Thanks Xintong for giving detailed documentation. > > The best practice for handling test failure is very detailed, it's a good > guidelines document with clear action steps. > > +1 to Xintong's proposal. > > Xintong Song 于2021年6月28日周一 下午4:07写道: > > > Thanks all for the discussion. > > > > Based on the opinions so far, I've drafted the new guidelines [1], as a > > potential replacement of the original wiki page [2]. > > > > Hopefully this draft has covered the most opinions discussed and consensus > > made in this discussion thread. > > > > Looking forward to your feedback. > > > > Thank you~ > > > > Xintong Song > > > > > > [1] > > > > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing > > > > [2] > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski > > wrote: > > > > > Thanks for the clarification Till. +1 for what you have written. > > > > > > Piotrek > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann > > napisał(a): > > > > > > > One quick note for clarification. I don't have anything against builds > > > > running on your personal Azure account and this is not what I > > understood > > > > under "local environment". For me "local environment" means that > > someone > > > > runs the test locally on his machine and then says that the > > > > tests have passed locally. > > > > > > > > I do agree that there might be a conflict of interests if a PR author > > > > disables tests. Here I would argue that we don't have malignant > > > committers > > > > which means that every committer will probably first check the > > respective > > > > ticket for how often the test failed. Then I guess the next step would > > be > > > > to discuss on the ticket whether to disable it or not. And finally, > > after > > > > reaching a consensus, it will be disabled. If we see someone abusing > > this > > > > policy, then we can still think about how to guard against it. But, > > > > honestly, I have very rarely seen such a case. I am also ok to pull in > > > the > > > > release manager to make the final call if this resolves concerns. > > > > > > > > Cheers, > > > > Till > > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski > > > > wrote: > > > > > > > > > +1 for the general idea, however I have concerns about a couple of > > > > details. > > > > > > > > > > > I would first try to not introduce the exception for local builds. > > > > > > It makes it quite hard for others to verify the build and to make > > > sure > > > > > that the right things were executed. > > > > > > > > > > I would counter Till's proposal to ignore local green builds. If > > > > committer > > > > > is merging and closing a PR, with official azure failure, but there > > > was a > > > > > green build before or in local azure it's IMO enough to leave the > > > > message: > > > > > > > > > > > Latest build failure is a known issue: FLINK-12345 > > > > > > Green local build: URL > > > > > > > > > > This should address Till's concern about verification. > > > > > > > > > > On the other hand I have concerns about disabling tests.* It > > shouldn't > > > be > > > > > the PR author/committer that's disabling a test on his own, as > > that's a > > > > > conflict of interests*. I have however no problems with disabling > > test > > > > > instabilities that were marked as "blockers" though, that should work > > > > > pretty well. But the important thing here is to correctly judge > > bumping > > > > > priorities of test instabilities based on their frequency and current > > > > > general health of the system. I believe that release managers should > > be > > > > > playing a big role here in deciding on the guidelines of what should > > > be a > > > > > priority of certain test instabilities. > > > > > > > > > > What I mean by that is two example scenarios: > > > > > 1. if we have a handful of very frequently failing tests and a > > handful > > > of > > > > > very rarely failing tests (like one reported failure and no another > > > > > occurrence in many months, and let's even say that the failure looks > > > like > > > > > infrastructure/network timeout), we should focus on the frequently > > > > failing > > > > > ones, and probably we are safe to ignore for the time being the rare > > > > issues > > > > > - at least until we deal with the most pressing ones. > > > > > 2. If we have tons of rarely failing test instabilities, we should > > > > probably > > > > > start addressing them as blockers. > > > > > > > > > > I'm using my own conscious and my best judgement when I'm > > > > > bumping/decreasing priorities of test instabilities (and bugs), but > > as > > > > > individual committer I don't have the full picture. As I wrote > > above, I > > > > > think release managers are in a much better position to keep
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
Thanks Xintong for giving detailed documentation. The best practice for handling test failure is very detailed, it's a good guidelines document with clear action steps. +1 to Xintong's proposal. Xintong Song 于2021年6月28日周一 下午4:07写道: > Thanks all for the discussion. > > Based on the opinions so far, I've drafted the new guidelines [1], as a > potential replacement of the original wiki page [2]. > > Hopefully this draft has covered the most opinions discussed and consensus > made in this discussion thread. > > Looking forward to your feedback. > > Thank you~ > > Xintong Song > > > [1] > > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing > > [2] > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski > wrote: > > > Thanks for the clarification Till. +1 for what you have written. > > > > Piotrek > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann > napisał(a): > > > > > One quick note for clarification. I don't have anything against builds > > > running on your personal Azure account and this is not what I > understood > > > under "local environment". For me "local environment" means that > someone > > > runs the test locally on his machine and then says that the > > > tests have passed locally. > > > > > > I do agree that there might be a conflict of interests if a PR author > > > disables tests. Here I would argue that we don't have malignant > > committers > > > which means that every committer will probably first check the > respective > > > ticket for how often the test failed. Then I guess the next step would > be > > > to discuss on the ticket whether to disable it or not. And finally, > after > > > reaching a consensus, it will be disabled. If we see someone abusing > this > > > policy, then we can still think about how to guard against it. But, > > > honestly, I have very rarely seen such a case. I am also ok to pull in > > the > > > release manager to make the final call if this resolves concerns. > > > > > > Cheers, > > > Till > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski > > > wrote: > > > > > > > +1 for the general idea, however I have concerns about a couple of > > > details. > > > > > > > > > I would first try to not introduce the exception for local builds. > > > > > It makes it quite hard for others to verify the build and to make > > sure > > > > that the right things were executed. > > > > > > > > I would counter Till's proposal to ignore local green builds. If > > > committer > > > > is merging and closing a PR, with official azure failure, but there > > was a > > > > green build before or in local azure it's IMO enough to leave the > > > message: > > > > > > > > > Latest build failure is a known issue: FLINK-12345 > > > > > Green local build: URL > > > > > > > > This should address Till's concern about verification. > > > > > > > > On the other hand I have concerns about disabling tests.* It > shouldn't > > be > > > > the PR author/committer that's disabling a test on his own, as > that's a > > > > conflict of interests*. I have however no problems with disabling > test > > > > instabilities that were marked as "blockers" though, that should work > > > > pretty well. But the important thing here is to correctly judge > bumping > > > > priorities of test instabilities based on their frequency and current > > > > general health of the system. I believe that release managers should > be > > > > playing a big role here in deciding on the guidelines of what should > > be a > > > > priority of certain test instabilities. > > > > > > > > What I mean by that is two example scenarios: > > > > 1. if we have a handful of very frequently failing tests and a > handful > > of > > > > very rarely failing tests (like one reported failure and no another > > > > occurrence in many months, and let's even say that the failure looks > > like > > > > infrastructure/network timeout), we should focus on the frequently > > > failing > > > > ones, and probably we are safe to ignore for the time being the rare > > > issues > > > > - at least until we deal with the most pressing ones. > > > > 2. If we have tons of rarely failing test instabilities, we should > > > probably > > > > start addressing them as blockers. > > > > > > > > I'm using my own conscious and my best judgement when I'm > > > > bumping/decreasing priorities of test instabilities (and bugs), but > as > > > > individual committer I don't have the full picture. As I wrote > above, I > > > > think release managers are in a much better position to keep > adjusting > > > > those kind of guidelines. > > > > > > > > Best, Piotrek > > > > > > > > pt., 25 cze 2021 o 08:10 Yu Li napisał(a): > > > > > > > > > +1 for Xintong's proposal. > > > > > > > > > > For me, resolving problems directly (fixing the infrastructure > issue, > > > > > disabling unstable tests and creating blocker JIRAs to track the > fix > > > and > > > > >
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
Thanks all for the discussion. Based on the opinions so far, I've drafted the new guidelines [1], as a potential replacement of the original wiki page [2]. Hopefully this draft has covered the most opinions discussed and consensus made in this discussion thread. Looking forward to your feedback. Thank you~ Xintong Song [1] https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing [2] https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski wrote: > Thanks for the clarification Till. +1 for what you have written. > > Piotrek > > pt., 25 cze 2021 o 16:00 Till Rohrmann napisał(a): > > > One quick note for clarification. I don't have anything against builds > > running on your personal Azure account and this is not what I understood > > under "local environment". For me "local environment" means that someone > > runs the test locally on his machine and then says that the > > tests have passed locally. > > > > I do agree that there might be a conflict of interests if a PR author > > disables tests. Here I would argue that we don't have malignant > committers > > which means that every committer will probably first check the respective > > ticket for how often the test failed. Then I guess the next step would be > > to discuss on the ticket whether to disable it or not. And finally, after > > reaching a consensus, it will be disabled. If we see someone abusing this > > policy, then we can still think about how to guard against it. But, > > honestly, I have very rarely seen such a case. I am also ok to pull in > the > > release manager to make the final call if this resolves concerns. > > > > Cheers, > > Till > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski > > wrote: > > > > > +1 for the general idea, however I have concerns about a couple of > > details. > > > > > > > I would first try to not introduce the exception for local builds. > > > > It makes it quite hard for others to verify the build and to make > sure > > > that the right things were executed. > > > > > > I would counter Till's proposal to ignore local green builds. If > > committer > > > is merging and closing a PR, with official azure failure, but there > was a > > > green build before or in local azure it's IMO enough to leave the > > message: > > > > > > > Latest build failure is a known issue: FLINK-12345 > > > > Green local build: URL > > > > > > This should address Till's concern about verification. > > > > > > On the other hand I have concerns about disabling tests.* It shouldn't > be > > > the PR author/committer that's disabling a test on his own, as that's a > > > conflict of interests*. I have however no problems with disabling test > > > instabilities that were marked as "blockers" though, that should work > > > pretty well. But the important thing here is to correctly judge bumping > > > priorities of test instabilities based on their frequency and current > > > general health of the system. I believe that release managers should be > > > playing a big role here in deciding on the guidelines of what should > be a > > > priority of certain test instabilities. > > > > > > What I mean by that is two example scenarios: > > > 1. if we have a handful of very frequently failing tests and a handful > of > > > very rarely failing tests (like one reported failure and no another > > > occurrence in many months, and let's even say that the failure looks > like > > > infrastructure/network timeout), we should focus on the frequently > > failing > > > ones, and probably we are safe to ignore for the time being the rare > > issues > > > - at least until we deal with the most pressing ones. > > > 2. If we have tons of rarely failing test instabilities, we should > > probably > > > start addressing them as blockers. > > > > > > I'm using my own conscious and my best judgement when I'm > > > bumping/decreasing priorities of test instabilities (and bugs), but as > > > individual committer I don't have the full picture. As I wrote above, I > > > think release managers are in a much better position to keep adjusting > > > those kind of guidelines. > > > > > > Best, Piotrek > > > > > > pt., 25 cze 2021 o 08:10 Yu Li napisał(a): > > > > > > > +1 for Xintong's proposal. > > > > > > > > For me, resolving problems directly (fixing the infrastructure issue, > > > > disabling unstable tests and creating blocker JIRAs to track the fix > > and > > > > re-enable them asap, etc.) is (in most cases) better than working > > around > > > > them (verify locally, manually check and judge the failure as > > > "unrelated", > > > > etc.), and I believe the proposal could help us pushing those more > > "real" > > > > solutions forward. > > > > > > > > Best Regards, > > > > Yu > > > > > > > > > > > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo wrote: > > > > > > > > > Creating a blocker issue for the manually disabled tests sounds > good > > to > > > > me.
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
Thanks for the clarification Till. +1 for what you have written. Piotrek pt., 25 cze 2021 o 16:00 Till Rohrmann napisał(a): > One quick note for clarification. I don't have anything against builds > running on your personal Azure account and this is not what I understood > under "local environment". For me "local environment" means that someone > runs the test locally on his machine and then says that the > tests have passed locally. > > I do agree that there might be a conflict of interests if a PR author > disables tests. Here I would argue that we don't have malignant committers > which means that every committer will probably first check the respective > ticket for how often the test failed. Then I guess the next step would be > to discuss on the ticket whether to disable it or not. And finally, after > reaching a consensus, it will be disabled. If we see someone abusing this > policy, then we can still think about how to guard against it. But, > honestly, I have very rarely seen such a case. I am also ok to pull in the > release manager to make the final call if this resolves concerns. > > Cheers, > Till > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski > wrote: > > > +1 for the general idea, however I have concerns about a couple of > details. > > > > > I would first try to not introduce the exception for local builds. > > > It makes it quite hard for others to verify the build and to make sure > > that the right things were executed. > > > > I would counter Till's proposal to ignore local green builds. If > committer > > is merging and closing a PR, with official azure failure, but there was a > > green build before or in local azure it's IMO enough to leave the > message: > > > > > Latest build failure is a known issue: FLINK-12345 > > > Green local build: URL > > > > This should address Till's concern about verification. > > > > On the other hand I have concerns about disabling tests.* It shouldn't be > > the PR author/committer that's disabling a test on his own, as that's a > > conflict of interests*. I have however no problems with disabling test > > instabilities that were marked as "blockers" though, that should work > > pretty well. But the important thing here is to correctly judge bumping > > priorities of test instabilities based on their frequency and current > > general health of the system. I believe that release managers should be > > playing a big role here in deciding on the guidelines of what should be a > > priority of certain test instabilities. > > > > What I mean by that is two example scenarios: > > 1. if we have a handful of very frequently failing tests and a handful of > > very rarely failing tests (like one reported failure and no another > > occurrence in many months, and let's even say that the failure looks like > > infrastructure/network timeout), we should focus on the frequently > failing > > ones, and probably we are safe to ignore for the time being the rare > issues > > - at least until we deal with the most pressing ones. > > 2. If we have tons of rarely failing test instabilities, we should > probably > > start addressing them as blockers. > > > > I'm using my own conscious and my best judgement when I'm > > bumping/decreasing priorities of test instabilities (and bugs), but as > > individual committer I don't have the full picture. As I wrote above, I > > think release managers are in a much better position to keep adjusting > > those kind of guidelines. > > > > Best, Piotrek > > > > pt., 25 cze 2021 o 08:10 Yu Li napisał(a): > > > > > +1 for Xintong's proposal. > > > > > > For me, resolving problems directly (fixing the infrastructure issue, > > > disabling unstable tests and creating blocker JIRAs to track the fix > and > > > re-enable them asap, etc.) is (in most cases) better than working > around > > > them (verify locally, manually check and judge the failure as > > "unrelated", > > > etc.), and I believe the proposal could help us pushing those more > "real" > > > solutions forward. > > > > > > Best Regards, > > > Yu > > > > > > > > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo wrote: > > > > > > > Creating a blocker issue for the manually disabled tests sounds good > to > > > me. > > > > > > > > Minor: I'm still a bit worried about the commits merged before we fix > > > > the unstable tests can also break those tests. Instead of letting the > > > > assigners keep a look at all potentially related commits, they can > > > > maintain a branch that is periodically synced with the master branch > > > > while enabling the unstable test. So that they can catch the breaking > > > > changes asap. > > > > > > > > Best, > > > > Yangze Guo > > > > > > > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann > > > > wrote: > > > > > > > > > > I like the idea of creating a blocker issue for a disabled test. > This > > > > will > > > > > force us to resolve it in a timely manner and it won't fall through > > the > > > > > cracks. > > > > > > > > > > Cheers, > > > > > T
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
One quick note for clarification. I don't have anything against builds running on your personal Azure account and this is not what I understood under "local environment". For me "local environment" means that someone runs the test locally on his machine and then says that the tests have passed locally. I do agree that there might be a conflict of interests if a PR author disables tests. Here I would argue that we don't have malignant committers which means that every committer will probably first check the respective ticket for how often the test failed. Then I guess the next step would be to discuss on the ticket whether to disable it or not. And finally, after reaching a consensus, it will be disabled. If we see someone abusing this policy, then we can still think about how to guard against it. But, honestly, I have very rarely seen such a case. I am also ok to pull in the release manager to make the final call if this resolves concerns. Cheers, Till On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski wrote: > +1 for the general idea, however I have concerns about a couple of details. > > > I would first try to not introduce the exception for local builds. > > It makes it quite hard for others to verify the build and to make sure > that the right things were executed. > > I would counter Till's proposal to ignore local green builds. If committer > is merging and closing a PR, with official azure failure, but there was a > green build before or in local azure it's IMO enough to leave the message: > > > Latest build failure is a known issue: FLINK-12345 > > Green local build: URL > > This should address Till's concern about verification. > > On the other hand I have concerns about disabling tests.* It shouldn't be > the PR author/committer that's disabling a test on his own, as that's a > conflict of interests*. I have however no problems with disabling test > instabilities that were marked as "blockers" though, that should work > pretty well. But the important thing here is to correctly judge bumping > priorities of test instabilities based on their frequency and current > general health of the system. I believe that release managers should be > playing a big role here in deciding on the guidelines of what should be a > priority of certain test instabilities. > > What I mean by that is two example scenarios: > 1. if we have a handful of very frequently failing tests and a handful of > very rarely failing tests (like one reported failure and no another > occurrence in many months, and let's even say that the failure looks like > infrastructure/network timeout), we should focus on the frequently failing > ones, and probably we are safe to ignore for the time being the rare issues > - at least until we deal with the most pressing ones. > 2. If we have tons of rarely failing test instabilities, we should probably > start addressing them as blockers. > > I'm using my own conscious and my best judgement when I'm > bumping/decreasing priorities of test instabilities (and bugs), but as > individual committer I don't have the full picture. As I wrote above, I > think release managers are in a much better position to keep adjusting > those kind of guidelines. > > Best, Piotrek > > pt., 25 cze 2021 o 08:10 Yu Li napisał(a): > > > +1 for Xintong's proposal. > > > > For me, resolving problems directly (fixing the infrastructure issue, > > disabling unstable tests and creating blocker JIRAs to track the fix and > > re-enable them asap, etc.) is (in most cases) better than working around > > them (verify locally, manually check and judge the failure as > "unrelated", > > etc.), and I believe the proposal could help us pushing those more "real" > > solutions forward. > > > > Best Regards, > > Yu > > > > > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo wrote: > > > > > Creating a blocker issue for the manually disabled tests sounds good to > > me. > > > > > > Minor: I'm still a bit worried about the commits merged before we fix > > > the unstable tests can also break those tests. Instead of letting the > > > assigners keep a look at all potentially related commits, they can > > > maintain a branch that is periodically synced with the master branch > > > while enabling the unstable test. So that they can catch the breaking > > > changes asap. > > > > > > Best, > > > Yangze Guo > > > > > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann > > > wrote: > > > > > > > > I like the idea of creating a blocker issue for a disabled test. This > > > will > > > > force us to resolve it in a timely manner and it won't fall through > the > > > > cracks. > > > > > > > > Cheers, > > > > Till > > > > > > > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li > > > wrote: > > > > > > > > > +1 to Xintong's proposal > > > > > > > > > > I also have some concerns about unstable cases. > > > > > > > > > > I think unstable cases can be divided into these types: > > > > > > > > > > - Force majeure: For example, network timeout, sudden environmental > > > >
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
+1 for the general idea, however I have concerns about a couple of details. > I would first try to not introduce the exception for local builds. > It makes it quite hard for others to verify the build and to make sure that the right things were executed. I would counter Till's proposal to ignore local green builds. If committer is merging and closing a PR, with official azure failure, but there was a green build before or in local azure it's IMO enough to leave the message: > Latest build failure is a known issue: FLINK-12345 > Green local build: URL This should address Till's concern about verification. On the other hand I have concerns about disabling tests.* It shouldn't be the PR author/committer that's disabling a test on his own, as that's a conflict of interests*. I have however no problems with disabling test instabilities that were marked as "blockers" though, that should work pretty well. But the important thing here is to correctly judge bumping priorities of test instabilities based on their frequency and current general health of the system. I believe that release managers should be playing a big role here in deciding on the guidelines of what should be a priority of certain test instabilities. What I mean by that is two example scenarios: 1. if we have a handful of very frequently failing tests and a handful of very rarely failing tests (like one reported failure and no another occurrence in many months, and let's even say that the failure looks like infrastructure/network timeout), we should focus on the frequently failing ones, and probably we are safe to ignore for the time being the rare issues - at least until we deal with the most pressing ones. 2. If we have tons of rarely failing test instabilities, we should probably start addressing them as blockers. I'm using my own conscious and my best judgement when I'm bumping/decreasing priorities of test instabilities (and bugs), but as individual committer I don't have the full picture. As I wrote above, I think release managers are in a much better position to keep adjusting those kind of guidelines. Best, Piotrek pt., 25 cze 2021 o 08:10 Yu Li napisał(a): > +1 for Xintong's proposal. > > For me, resolving problems directly (fixing the infrastructure issue, > disabling unstable tests and creating blocker JIRAs to track the fix and > re-enable them asap, etc.) is (in most cases) better than working around > them (verify locally, manually check and judge the failure as "unrelated", > etc.), and I believe the proposal could help us pushing those more "real" > solutions forward. > > Best Regards, > Yu > > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo wrote: > > > Creating a blocker issue for the manually disabled tests sounds good to > me. > > > > Minor: I'm still a bit worried about the commits merged before we fix > > the unstable tests can also break those tests. Instead of letting the > > assigners keep a look at all potentially related commits, they can > > maintain a branch that is periodically synced with the master branch > > while enabling the unstable test. So that they can catch the breaking > > changes asap. > > > > Best, > > Yangze Guo > > > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann > > wrote: > > > > > > I like the idea of creating a blocker issue for a disabled test. This > > will > > > force us to resolve it in a timely manner and it won't fall through the > > > cracks. > > > > > > Cheers, > > > Till > > > > > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li > > wrote: > > > > > > > +1 to Xintong's proposal > > > > > > > > I also have some concerns about unstable cases. > > > > > > > > I think unstable cases can be divided into these types: > > > > > > > > - Force majeure: For example, network timeout, sudden environmental > > > > collapse, they are accidental and can always be solved by triggering > > azure > > > > again. Committers should wait for the next green azure. > > > > > > > > - Obvious mistakes: For example, some errors caused by obvious > reasons > > may > > > > be repaired quickly. At this time, do we need to wait, or not wait > and > > just > > > > ignore? > > > > > > > > - Difficult questions: These problems are very difficult to find. > There > > > > will be no solution for a while and a half. We don't even know the > > reason. > > > > At this time, we should ignore it. (Maybe it's judged by the author > of > > the > > > > case. But what about the old case whose author can't be found?) > > > > > > > > So, the ignored cases should be the block of the next release until > the > > > > reason is found or the case is fixed? We need to ensure that someone > > will > > > > take care of these cases, because there is no deepening of failed > > tests, no > > > > one may continue to pay attention to these cases. > > > > > > > > I think this guideline should consider these situations, and show how > > to > > > > solve them. > > > > > > > > Best, > > > > Jingsong > > > > > > > > On Thu, Jun 24, 2021 at 10:57 AM
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
+1 for Xintong's proposal. For me, resolving problems directly (fixing the infrastructure issue, disabling unstable tests and creating blocker JIRAs to track the fix and re-enable them asap, etc.) is (in most cases) better than working around them (verify locally, manually check and judge the failure as "unrelated", etc.), and I believe the proposal could help us pushing those more "real" solutions forward. Best Regards, Yu On Fri, 25 Jun 2021 at 10:58, Yangze Guo wrote: > Creating a blocker issue for the manually disabled tests sounds good to me. > > Minor: I'm still a bit worried about the commits merged before we fix > the unstable tests can also break those tests. Instead of letting the > assigners keep a look at all potentially related commits, they can > maintain a branch that is periodically synced with the master branch > while enabling the unstable test. So that they can catch the breaking > changes asap. > > Best, > Yangze Guo > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann > wrote: > > > > I like the idea of creating a blocker issue for a disabled test. This > will > > force us to resolve it in a timely manner and it won't fall through the > > cracks. > > > > Cheers, > > Till > > > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li > wrote: > > > > > +1 to Xintong's proposal > > > > > > I also have some concerns about unstable cases. > > > > > > I think unstable cases can be divided into these types: > > > > > > - Force majeure: For example, network timeout, sudden environmental > > > collapse, they are accidental and can always be solved by triggering > azure > > > again. Committers should wait for the next green azure. > > > > > > - Obvious mistakes: For example, some errors caused by obvious reasons > may > > > be repaired quickly. At this time, do we need to wait, or not wait and > just > > > ignore? > > > > > > - Difficult questions: These problems are very difficult to find. There > > > will be no solution for a while and a half. We don't even know the > reason. > > > At this time, we should ignore it. (Maybe it's judged by the author of > the > > > case. But what about the old case whose author can't be found?) > > > > > > So, the ignored cases should be the block of the next release until the > > > reason is found or the case is fixed? We need to ensure that someone > will > > > take care of these cases, because there is no deepening of failed > tests, no > > > one may continue to pay attention to these cases. > > > > > > I think this guideline should consider these situations, and show how > to > > > solve them. > > > > > > Best, > > > Jingsong > > > > > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu wrote: > > > > > > > Thanks to Xintong for bringing up this topic, I'm +1 in general. > > > > > > > > However, I think it's still not very clear how we address the > unstable > > > > tests. > > > > I think this is a very important part of this new guideline. > > > > > > > > According to the discussion above, if some tests are unstable, we can > > > > manually disable it. > > > > But I have some questions in my mind: > > > > 1) Is the instability judged by the committer themselves or by some > > > > metrics? > > > > 2) Should we log the disable commit in the corresponding issue and > > > increase > > > > the priority? > > > > 3) What if nobody looks into this issue and this becomes some > potential > > > > bugs released with the new version? > > > > 4) If no person is actively working on the issue, who should > re-enable > > > it? > > > > Would it block PRs again? > > > > > > > > > > > > Best, > > > > Jark > > > > > > > > > > > > On Thu, 24 Jun 2021 at 10:04, Xintong Song > > > wrote: > > > > > > > > > Thanks all for the feedback. > > > > > > > > > > @Till @Yangze > > > > > > > > > > I'm also not convinced by the idea of having an exception for local > > > > builds. > > > > > We need to execute the entire build (or at least the failing stage) > > > > > locally, to make sure subsequent test cases prevented by the > failure > > > one > > > > > are all executed. In that case, it's probably easier to rerun the > build > > > > on > > > > > azure than locally. > > > > > > > > > > Concerning disabling unstable test cases that regularly block PRs > from > > > > > merging, maybe we can say that such cases can only be disabled when > > > > someone > > > > > is actively looking into it, likely the person who disabled the > case. > > > If > > > > > this person is no longer actively working on it, he/she should > enable > > > the > > > > > case again no matter if it is fixed or not. > > > > > > > > > > @Jing > > > > > > > > > > Thanks for the suggestions. > > > > > > > > > > +1 to provide guidelines on handling test failures. > > > > > > > > > > 1. Report the test failures in the JIRA. > > > > > > > > > > > > > > > > +1 on this. Currently, the release managers are monitoring the ci > and > > > > cron > > > > > build instabilities and reporting them on JIRA. We should also > > > encourage > > > > > ot
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
Creating a blocker issue for the manually disabled tests sounds good to me. Minor: I'm still a bit worried about the commits merged before we fix the unstable tests can also break those tests. Instead of letting the assigners keep a look at all potentially related commits, they can maintain a branch that is periodically synced with the master branch while enabling the unstable test. So that they can catch the breaking changes asap. Best, Yangze Guo On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann wrote: > > I like the idea of creating a blocker issue for a disabled test. This will > force us to resolve it in a timely manner and it won't fall through the > cracks. > > Cheers, > Till > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li wrote: > > > +1 to Xintong's proposal > > > > I also have some concerns about unstable cases. > > > > I think unstable cases can be divided into these types: > > > > - Force majeure: For example, network timeout, sudden environmental > > collapse, they are accidental and can always be solved by triggering azure > > again. Committers should wait for the next green azure. > > > > - Obvious mistakes: For example, some errors caused by obvious reasons may > > be repaired quickly. At this time, do we need to wait, or not wait and just > > ignore? > > > > - Difficult questions: These problems are very difficult to find. There > > will be no solution for a while and a half. We don't even know the reason. > > At this time, we should ignore it. (Maybe it's judged by the author of the > > case. But what about the old case whose author can't be found?) > > > > So, the ignored cases should be the block of the next release until the > > reason is found or the case is fixed? We need to ensure that someone will > > take care of these cases, because there is no deepening of failed tests, no > > one may continue to pay attention to these cases. > > > > I think this guideline should consider these situations, and show how to > > solve them. > > > > Best, > > Jingsong > > > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu wrote: > > > > > Thanks to Xintong for bringing up this topic, I'm +1 in general. > > > > > > However, I think it's still not very clear how we address the unstable > > > tests. > > > I think this is a very important part of this new guideline. > > > > > > According to the discussion above, if some tests are unstable, we can > > > manually disable it. > > > But I have some questions in my mind: > > > 1) Is the instability judged by the committer themselves or by some > > > metrics? > > > 2) Should we log the disable commit in the corresponding issue and > > increase > > > the priority? > > > 3) What if nobody looks into this issue and this becomes some potential > > > bugs released with the new version? > > > 4) If no person is actively working on the issue, who should re-enable > > it? > > > Would it block PRs again? > > > > > > > > > Best, > > > Jark > > > > > > > > > On Thu, 24 Jun 2021 at 10:04, Xintong Song > > wrote: > > > > > > > Thanks all for the feedback. > > > > > > > > @Till @Yangze > > > > > > > > I'm also not convinced by the idea of having an exception for local > > > builds. > > > > We need to execute the entire build (or at least the failing stage) > > > > locally, to make sure subsequent test cases prevented by the failure > > one > > > > are all executed. In that case, it's probably easier to rerun the build > > > on > > > > azure than locally. > > > > > > > > Concerning disabling unstable test cases that regularly block PRs from > > > > merging, maybe we can say that such cases can only be disabled when > > > someone > > > > is actively looking into it, likely the person who disabled the case. > > If > > > > this person is no longer actively working on it, he/she should enable > > the > > > > case again no matter if it is fixed or not. > > > > > > > > @Jing > > > > > > > > Thanks for the suggestions. > > > > > > > > +1 to provide guidelines on handling test failures. > > > > > > > > 1. Report the test failures in the JIRA. > > > > > > > > > > > > > +1 on this. Currently, the release managers are monitoring the ci and > > > cron > > > > build instabilities and reporting them on JIRA. We should also > > encourage > > > > other contributors to do that for PRs. > > > > > > > > 2. Set a deadline to find out the root cause and solve the failure for > > > the > > > > > new created JIRA because we could not block other commit merges for > > a > > > > long > > > > > time > > > > > > > > > 3. What to do if the JIRA has not made significant progress when > > reached > > > to > > > > > the deadline time? > > > > > > > > > > > > I'm not sure about these two. It feels a bit against the voluntary > > nature > > > > of open source projects. > > > > > > > > IMHO, frequent instabilities are more likely to be upgraded to the > > > critical > > > > / blocker priority, receive more attention and eventually get fixed. > > > > Release managers are also responsible for looking for assig
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
I like the idea of creating a blocker issue for a disabled test. This will force us to resolve it in a timely manner and it won't fall through the cracks. Cheers, Till On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li wrote: > +1 to Xintong's proposal > > I also have some concerns about unstable cases. > > I think unstable cases can be divided into these types: > > - Force majeure: For example, network timeout, sudden environmental > collapse, they are accidental and can always be solved by triggering azure > again. Committers should wait for the next green azure. > > - Obvious mistakes: For example, some errors caused by obvious reasons may > be repaired quickly. At this time, do we need to wait, or not wait and just > ignore? > > - Difficult questions: These problems are very difficult to find. There > will be no solution for a while and a half. We don't even know the reason. > At this time, we should ignore it. (Maybe it's judged by the author of the > case. But what about the old case whose author can't be found?) > > So, the ignored cases should be the block of the next release until the > reason is found or the case is fixed? We need to ensure that someone will > take care of these cases, because there is no deepening of failed tests, no > one may continue to pay attention to these cases. > > I think this guideline should consider these situations, and show how to > solve them. > > Best, > Jingsong > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu wrote: > > > Thanks to Xintong for bringing up this topic, I'm +1 in general. > > > > However, I think it's still not very clear how we address the unstable > > tests. > > I think this is a very important part of this new guideline. > > > > According to the discussion above, if some tests are unstable, we can > > manually disable it. > > But I have some questions in my mind: > > 1) Is the instability judged by the committer themselves or by some > > metrics? > > 2) Should we log the disable commit in the corresponding issue and > increase > > the priority? > > 3) What if nobody looks into this issue and this becomes some potential > > bugs released with the new version? > > 4) If no person is actively working on the issue, who should re-enable > it? > > Would it block PRs again? > > > > > > Best, > > Jark > > > > > > On Thu, 24 Jun 2021 at 10:04, Xintong Song > wrote: > > > > > Thanks all for the feedback. > > > > > > @Till @Yangze > > > > > > I'm also not convinced by the idea of having an exception for local > > builds. > > > We need to execute the entire build (or at least the failing stage) > > > locally, to make sure subsequent test cases prevented by the failure > one > > > are all executed. In that case, it's probably easier to rerun the build > > on > > > azure than locally. > > > > > > Concerning disabling unstable test cases that regularly block PRs from > > > merging, maybe we can say that such cases can only be disabled when > > someone > > > is actively looking into it, likely the person who disabled the case. > If > > > this person is no longer actively working on it, he/she should enable > the > > > case again no matter if it is fixed or not. > > > > > > @Jing > > > > > > Thanks for the suggestions. > > > > > > +1 to provide guidelines on handling test failures. > > > > > > 1. Report the test failures in the JIRA. > > > > > > > > > > +1 on this. Currently, the release managers are monitoring the ci and > > cron > > > build instabilities and reporting them on JIRA. We should also > encourage > > > other contributors to do that for PRs. > > > > > > 2. Set a deadline to find out the root cause and solve the failure for > > the > > > > new created JIRA because we could not block other commit merges for > a > > > long > > > > time > > > > > > > 3. What to do if the JIRA has not made significant progress when > reached > > to > > > > the deadline time? > > > > > > > > > I'm not sure about these two. It feels a bit against the voluntary > nature > > > of open source projects. > > > > > > IMHO, frequent instabilities are more likely to be upgraded to the > > critical > > > / blocker priority, receive more attention and eventually get fixed. > > > Release managers are also responsible for looking for assignees for > such > > > issues. If a case is still not fixed soonish, even with all these > > efforts, > > > I'm not sure how setting a deadline can help this. > > > > > > 4. If we disable the respective tests temporarily, we also need a > > mechanism > > > > to ensure the issue would be continued to be investigated in the > > future. > > > > > > > > > > +1. As mentioned above, we may consider disabling such tests iff > someone > > is > > > actively working on it. > > > > > > Thank you~ > > > > > > Xintong Song > > > > > > > > > > > > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG > wrote: > > > > > > > Hi Xintong, > > > > +1 to the proposal. > > > > In order to better comply with the rule, it is necessary to describe > > > what's > > > > best practice if en
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
+1 to Xintong's proposal I also have some concerns about unstable cases. I think unstable cases can be divided into these types: - Force majeure: For example, network timeout, sudden environmental collapse, they are accidental and can always be solved by triggering azure again. Committers should wait for the next green azure. - Obvious mistakes: For example, some errors caused by obvious reasons may be repaired quickly. At this time, do we need to wait, or not wait and just ignore? - Difficult questions: These problems are very difficult to find. There will be no solution for a while and a half. We don't even know the reason. At this time, we should ignore it. (Maybe it's judged by the author of the case. But what about the old case whose author can't be found?) So, the ignored cases should be the block of the next release until the reason is found or the case is fixed? We need to ensure that someone will take care of these cases, because there is no deepening of failed tests, no one may continue to pay attention to these cases. I think this guideline should consider these situations, and show how to solve them. Best, Jingsong On Thu, Jun 24, 2021 at 10:57 AM Jark Wu wrote: > Thanks to Xintong for bringing up this topic, I'm +1 in general. > > However, I think it's still not very clear how we address the unstable > tests. > I think this is a very important part of this new guideline. > > According to the discussion above, if some tests are unstable, we can > manually disable it. > But I have some questions in my mind: > 1) Is the instability judged by the committer themselves or by some > metrics? > 2) Should we log the disable commit in the corresponding issue and increase > the priority? > 3) What if nobody looks into this issue and this becomes some potential > bugs released with the new version? > 4) If no person is actively working on the issue, who should re-enable it? > Would it block PRs again? > > > Best, > Jark > > > On Thu, 24 Jun 2021 at 10:04, Xintong Song wrote: > > > Thanks all for the feedback. > > > > @Till @Yangze > > > > I'm also not convinced by the idea of having an exception for local > builds. > > We need to execute the entire build (or at least the failing stage) > > locally, to make sure subsequent test cases prevented by the failure one > > are all executed. In that case, it's probably easier to rerun the build > on > > azure than locally. > > > > Concerning disabling unstable test cases that regularly block PRs from > > merging, maybe we can say that such cases can only be disabled when > someone > > is actively looking into it, likely the person who disabled the case. If > > this person is no longer actively working on it, he/she should enable the > > case again no matter if it is fixed or not. > > > > @Jing > > > > Thanks for the suggestions. > > > > +1 to provide guidelines on handling test failures. > > > > 1. Report the test failures in the JIRA. > > > > > > > +1 on this. Currently, the release managers are monitoring the ci and > cron > > build instabilities and reporting them on JIRA. We should also encourage > > other contributors to do that for PRs. > > > > 2. Set a deadline to find out the root cause and solve the failure for > the > > > new created JIRA because we could not block other commit merges for a > > long > > > time > > > > > 3. What to do if the JIRA has not made significant progress when reached > to > > > the deadline time? > > > > > > I'm not sure about these two. It feels a bit against the voluntary nature > > of open source projects. > > > > IMHO, frequent instabilities are more likely to be upgraded to the > critical > > / blocker priority, receive more attention and eventually get fixed. > > Release managers are also responsible for looking for assignees for such > > issues. If a case is still not fixed soonish, even with all these > efforts, > > I'm not sure how setting a deadline can help this. > > > > 4. If we disable the respective tests temporarily, we also need a > mechanism > > > to ensure the issue would be continued to be investigated in the > future. > > > > > > > +1. As mentioned above, we may consider disabling such tests iff someone > is > > actively working on it. > > > > Thank you~ > > > > Xintong Song > > > > > > > > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG wrote: > > > > > Hi Xintong, > > > +1 to the proposal. > > > In order to better comply with the rule, it is necessary to describe > > what's > > > best practice if encountering test failure which seems unrelated with > the > > > current commits. > > > How to avoid merging PR with test failures and not blocking code > merging > > > for a long time? > > > I tried to think about the possible steps, and found there are some > > > detailed problems that need to be discussed in a step further: > > > 1. Report the test failures in the JIRA. > > > 2. Set a deadline to find out the root cause and solve the failure for > > the > > > new created JIRA because we could not block
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
Thanks to Xintong for bringing up this topic, I'm +1 in general. However, I think it's still not very clear how we address the unstable tests. I think this is a very important part of this new guideline. According to the discussion above, if some tests are unstable, we can manually disable it. But I have some questions in my mind: 1) Is the instability judged by the committer themselves or by some metrics? 2) Should we log the disable commit in the corresponding issue and increase the priority? 3) What if nobody looks into this issue and this becomes some potential bugs released with the new version? 4) If no person is actively working on the issue, who should re-enable it? Would it block PRs again? Best, Jark On Thu, 24 Jun 2021 at 10:04, Xintong Song wrote: > Thanks all for the feedback. > > @Till @Yangze > > I'm also not convinced by the idea of having an exception for local builds. > We need to execute the entire build (or at least the failing stage) > locally, to make sure subsequent test cases prevented by the failure one > are all executed. In that case, it's probably easier to rerun the build on > azure than locally. > > Concerning disabling unstable test cases that regularly block PRs from > merging, maybe we can say that such cases can only be disabled when someone > is actively looking into it, likely the person who disabled the case. If > this person is no longer actively working on it, he/she should enable the > case again no matter if it is fixed or not. > > @Jing > > Thanks for the suggestions. > > +1 to provide guidelines on handling test failures. > > 1. Report the test failures in the JIRA. > > > > +1 on this. Currently, the release managers are monitoring the ci and cron > build instabilities and reporting them on JIRA. We should also encourage > other contributors to do that for PRs. > > 2. Set a deadline to find out the root cause and solve the failure for the > > new created JIRA because we could not block other commit merges for a > long > > time > > > 3. What to do if the JIRA has not made significant progress when reached to > > the deadline time? > > > I'm not sure about these two. It feels a bit against the voluntary nature > of open source projects. > > IMHO, frequent instabilities are more likely to be upgraded to the critical > / blocker priority, receive more attention and eventually get fixed. > Release managers are also responsible for looking for assignees for such > issues. If a case is still not fixed soonish, even with all these efforts, > I'm not sure how setting a deadline can help this. > > 4. If we disable the respective tests temporarily, we also need a mechanism > > to ensure the issue would be continued to be investigated in the future. > > > > +1. As mentioned above, we may consider disabling such tests iff someone is > actively working on it. > > Thank you~ > > Xintong Song > > > > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG wrote: > > > Hi Xintong, > > +1 to the proposal. > > In order to better comply with the rule, it is necessary to describe > what's > > best practice if encountering test failure which seems unrelated with the > > current commits. > > How to avoid merging PR with test failures and not blocking code merging > > for a long time? > > I tried to think about the possible steps, and found there are some > > detailed problems that need to be discussed in a step further: > > 1. Report the test failures in the JIRA. > > 2. Set a deadline to find out the root cause and solve the failure for > the > > new created JIRA because we could not block other commit merges for a > long > > time > > When is a reasonable deadline here? > > 3. What to do if the JIRA has not made significant progress when reached > to > > the deadline time? > > There are several situations as follows, maybe different cases need > > different approaches. > > 1. the JIRA is non-assigned yet > > 2. not found the root cause yet > > 3. not found a good solution, but already found the root cause > > 4. found a solution, but it needs more time to be done. > > 4. If we disable the respective tests temporarily, we also need a > mechanism > > to ensure the issue would be continued to be investigated in the future. > > > > Best regards, > > JING ZHANG > > > > Stephan Ewen 于2021年6月23日周三 下午8:16写道: > > > > > +1 to Xintong's proposal > > > > > > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann > > > wrote: > > > > > > > I would first try to not introduce the exception for local builds. It > > > makes > > > > it quite hard for others to verify the build and to make sure that > the > > > > right things were executed. If we see that this becomes an issue then > > we > > > > can revisit this idea. > > > > > > > > Cheers, > > > > Till > > > > > > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo > wrote: > > > > > > > > > +1 for appending this to community guidelines for merging PRs. > > > > > > > > > > @Till Rohrmann > > > > > I agree that with this approach unstable te
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
Thanks all for the feedback. @Till @Yangze I'm also not convinced by the idea of having an exception for local builds. We need to execute the entire build (or at least the failing stage) locally, to make sure subsequent test cases prevented by the failure one are all executed. In that case, it's probably easier to rerun the build on azure than locally. Concerning disabling unstable test cases that regularly block PRs from merging, maybe we can say that such cases can only be disabled when someone is actively looking into it, likely the person who disabled the case. If this person is no longer actively working on it, he/she should enable the case again no matter if it is fixed or not. @Jing Thanks for the suggestions. +1 to provide guidelines on handling test failures. 1. Report the test failures in the JIRA. > +1 on this. Currently, the release managers are monitoring the ci and cron build instabilities and reporting them on JIRA. We should also encourage other contributors to do that for PRs. 2. Set a deadline to find out the root cause and solve the failure for the > new created JIRA because we could not block other commit merges for a long > time > 3. What to do if the JIRA has not made significant progress when reached to > the deadline time? I'm not sure about these two. It feels a bit against the voluntary nature of open source projects. IMHO, frequent instabilities are more likely to be upgraded to the critical / blocker priority, receive more attention and eventually get fixed. Release managers are also responsible for looking for assignees for such issues. If a case is still not fixed soonish, even with all these efforts, I'm not sure how setting a deadline can help this. 4. If we disable the respective tests temporarily, we also need a mechanism > to ensure the issue would be continued to be investigated in the future. > +1. As mentioned above, we may consider disabling such tests iff someone is actively working on it. Thank you~ Xintong Song On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG wrote: > Hi Xintong, > +1 to the proposal. > In order to better comply with the rule, it is necessary to describe what's > best practice if encountering test failure which seems unrelated with the > current commits. > How to avoid merging PR with test failures and not blocking code merging > for a long time? > I tried to think about the possible steps, and found there are some > detailed problems that need to be discussed in a step further: > 1. Report the test failures in the JIRA. > 2. Set a deadline to find out the root cause and solve the failure for the > new created JIRA because we could not block other commit merges for a long > time > When is a reasonable deadline here? > 3. What to do if the JIRA has not made significant progress when reached to > the deadline time? > There are several situations as follows, maybe different cases need > different approaches. > 1. the JIRA is non-assigned yet > 2. not found the root cause yet > 3. not found a good solution, but already found the root cause > 4. found a solution, but it needs more time to be done. > 4. If we disable the respective tests temporarily, we also need a mechanism > to ensure the issue would be continued to be investigated in the future. > > Best regards, > JING ZHANG > > Stephan Ewen 于2021年6月23日周三 下午8:16写道: > > > +1 to Xintong's proposal > > > > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann > > wrote: > > > > > I would first try to not introduce the exception for local builds. It > > makes > > > it quite hard for others to verify the build and to make sure that the > > > right things were executed. If we see that this becomes an issue then > we > > > can revisit this idea. > > > > > > Cheers, > > > Till > > > > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo wrote: > > > > > > > +1 for appending this to community guidelines for merging PRs. > > > > > > > > @Till Rohrmann > > > > I agree that with this approach unstable tests will not block other > > > > commit merges. However, it might be hard to prevent merging commits > > > > that are related to those tests and should have been passed them. > It's > > > > true that this judgment can be made by the committers, but no one can > > > > ensure the judgment is always precise and so that we have this > > > > discussion thread. > > > > > > > > Regarding the unstable tests, how about adding another exception: > > > > committers verify it in their local environment and comment in such > > > > cases? > > > > > > > > Best, > > > > Yangze Guo > > > > > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 > wrote: > > > > > > > > > > It is a good principle to run all tests successfully with any > change. > > > > This > > > > > means a lot for project's stability and development. I am big +1 > for > > > this > > > > > proposal. > > > > > > > > > > Best > > > > > liujiangang > > > > > > > > > > Till Rohrmann 于2021年6月22日周二 下午6:36写道: > > > > > > > > > > > One way to address
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
Hi Xintong, +1 to the proposal. In order to better comply with the rule, it is necessary to describe what's best practice if encountering test failure which seems unrelated with the current commits. How to avoid merging PR with test failures and not blocking code merging for a long time? I tried to think about the possible steps, and found there are some detailed problems that need to be discussed in a step further: 1. Report the test failures in the JIRA. 2. Set a deadline to find out the root cause and solve the failure for the new created JIRA because we could not block other commit merges for a long time When is a reasonable deadline here? 3. What to do if the JIRA has not made significant progress when reached to the deadline time? There are several situations as follows, maybe different cases need different approaches. 1. the JIRA is non-assigned yet 2. not found the root cause yet 3. not found a good solution, but already found the root cause 4. found a solution, but it needs more time to be done. 4. If we disable the respective tests temporarily, we also need a mechanism to ensure the issue would be continued to be investigated in the future. Best regards, JING ZHANG Stephan Ewen 于2021年6月23日周三 下午8:16写道: > +1 to Xintong's proposal > > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann > wrote: > > > I would first try to not introduce the exception for local builds. It > makes > > it quite hard for others to verify the build and to make sure that the > > right things were executed. If we see that this becomes an issue then we > > can revisit this idea. > > > > Cheers, > > Till > > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo wrote: > > > > > +1 for appending this to community guidelines for merging PRs. > > > > > > @Till Rohrmann > > > I agree that with this approach unstable tests will not block other > > > commit merges. However, it might be hard to prevent merging commits > > > that are related to those tests and should have been passed them. It's > > > true that this judgment can be made by the committers, but no one can > > > ensure the judgment is always precise and so that we have this > > > discussion thread. > > > > > > Regarding the unstable tests, how about adding another exception: > > > committers verify it in their local environment and comment in such > > > cases? > > > > > > Best, > > > Yangze Guo > > > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 wrote: > > > > > > > > It is a good principle to run all tests successfully with any change. > > > This > > > > means a lot for project's stability and development. I am big +1 for > > this > > > > proposal. > > > > > > > > Best > > > > liujiangang > > > > > > > > Till Rohrmann 于2021年6月22日周二 下午6:36写道: > > > > > > > > > One way to address the problem of regularly failing tests that > block > > > > > merging of PRs is to disable the respective tests for the time > being. > > > Of > > > > > course, the failing test then needs to be fixed. But at least that > > way > > > we > > > > > would not block everyone from making progress. > > > > > > > > > > Cheers, > > > > > Till > > > > > > > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise > > wrote: > > > > > > > > > > > I think this is overall a good idea. So +1 from my side. > > > > > > However, I'd like to put a higher priority on infrastructure > then, > > in > > > > > > particular docker image/artifact caches. > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann < > > trohrm...@apache.org > > > > > > > > > > wrote: > > > > > > > > > > > > > Thanks for bringing this topic to our attention Xintong. I > think > > > your > > > > > > > proposal makes a lot of sense and we should follow it. It will > > > give us > > > > > > > confidence that our changes are working and it might be a good > > > > > incentive > > > > > > to > > > > > > > quickly fix build instabilities. Hence, +1. > > > > > > > > > > > > > > Cheers, > > > > > > > Till > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song < > > > tonysong...@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > > > > > In the past a couple of weeks, I've observed several times > that > > > PRs > > > > > are > > > > > > > > merged without a green light from the CI tests, where failure > > > cases > > > > > are > > > > > > > > considered *unrelated*. This may not always cause problems, > but > > > would > > > > > > > > increase the chance of breaking our code base. In fact, it > has > > > > > occurred > > > > > > > to > > > > > > > > me twice in the past few weeks that I had to revert a commit > > > which > > > > > > breaks > > > > > > > > the master branch due to this. > > > > > > > > > > > > > > > > I think it would be nicer to enforce a stricter rule, that no > > PRs > > > > > > should > > > > > > > be > > > > > > > > merged without passing CI. > > > > > > > > > > > > > > > > The problems of merging PRs with "unrelated" test failures > are: >
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
+1 to Xintong's proposal On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann wrote: > I would first try to not introduce the exception for local builds. It makes > it quite hard for others to verify the build and to make sure that the > right things were executed. If we see that this becomes an issue then we > can revisit this idea. > > Cheers, > Till > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo wrote: > > > +1 for appending this to community guidelines for merging PRs. > > > > @Till Rohrmann > > I agree that with this approach unstable tests will not block other > > commit merges. However, it might be hard to prevent merging commits > > that are related to those tests and should have been passed them. It's > > true that this judgment can be made by the committers, but no one can > > ensure the judgment is always precise and so that we have this > > discussion thread. > > > > Regarding the unstable tests, how about adding another exception: > > committers verify it in their local environment and comment in such > > cases? > > > > Best, > > Yangze Guo > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 wrote: > > > > > > It is a good principle to run all tests successfully with any change. > > This > > > means a lot for project's stability and development. I am big +1 for > this > > > proposal. > > > > > > Best > > > liujiangang > > > > > > Till Rohrmann 于2021年6月22日周二 下午6:36写道: > > > > > > > One way to address the problem of regularly failing tests that block > > > > merging of PRs is to disable the respective tests for the time being. > > Of > > > > course, the failing test then needs to be fixed. But at least that > way > > we > > > > would not block everyone from making progress. > > > > > > > > Cheers, > > > > Till > > > > > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise > wrote: > > > > > > > > > I think this is overall a good idea. So +1 from my side. > > > > > However, I'd like to put a higher priority on infrastructure then, > in > > > > > particular docker image/artifact caches. > > > > > > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann < > trohrm...@apache.org > > > > > > > > wrote: > > > > > > > > > > > Thanks for bringing this topic to our attention Xintong. I think > > your > > > > > > proposal makes a lot of sense and we should follow it. It will > > give us > > > > > > confidence that our changes are working and it might be a good > > > > incentive > > > > > to > > > > > > quickly fix build instabilities. Hence, +1. > > > > > > > > > > > > Cheers, > > > > > > Till > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song < > > tonysong...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > > > In the past a couple of weeks, I've observed several times that > > PRs > > > > are > > > > > > > merged without a green light from the CI tests, where failure > > cases > > > > are > > > > > > > considered *unrelated*. This may not always cause problems, but > > would > > > > > > > increase the chance of breaking our code base. In fact, it has > > > > occurred > > > > > > to > > > > > > > me twice in the past few weeks that I had to revert a commit > > which > > > > > breaks > > > > > > > the master branch due to this. > > > > > > > > > > > > > > I think it would be nicer to enforce a stricter rule, that no > PRs > > > > > should > > > > > > be > > > > > > > merged without passing CI. > > > > > > > > > > > > > > The problems of merging PRs with "unrelated" test failures are: > > > > > > > - It's not always straightforward to tell whether a test > > failures are > > > > > > > related or not. > > > > > > > - It prevents subsequent test cases from being executed, which > > may > > > > fail > > > > > > > relating to the PR changes. > > > > > > > > > > > > > > To make things easier for the committers, the following > > exceptions > > > > > might > > > > > > be > > > > > > > considered acceptable. > > > > > > > - The PR has passed CI in the contributor's personal workspace. > > > > Please > > > > > > post > > > > > > > the link in such cases. > > > > > > > - The CI tests have been triggered multiple times, on the same > > > > commit, > > > > > > and > > > > > > > each stage has at least passed for once. Please also comment in > > such > > > > > > cases. > > > > > > > > > > > > > > If we all agree on this, I'd update the community guidelines > for > > > > > merging > > > > > > > PRs wrt. this proposal. [1] > > > > > > > > > > > > > > Please let me know what do you think. > > > > > > > > > > > > > > Thank you~ > > > > > > > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > > > > > > > > > > > > > > > > > > > > > >
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
I would first try to not introduce the exception for local builds. It makes it quite hard for others to verify the build and to make sure that the right things were executed. If we see that this becomes an issue then we can revisit this idea. Cheers, Till On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo wrote: > +1 for appending this to community guidelines for merging PRs. > > @Till Rohrmann > I agree that with this approach unstable tests will not block other > commit merges. However, it might be hard to prevent merging commits > that are related to those tests and should have been passed them. It's > true that this judgment can be made by the committers, but no one can > ensure the judgment is always precise and so that we have this > discussion thread. > > Regarding the unstable tests, how about adding another exception: > committers verify it in their local environment and comment in such > cases? > > Best, > Yangze Guo > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 wrote: > > > > It is a good principle to run all tests successfully with any change. > This > > means a lot for project's stability and development. I am big +1 for this > > proposal. > > > > Best > > liujiangang > > > > Till Rohrmann 于2021年6月22日周二 下午6:36写道: > > > > > One way to address the problem of regularly failing tests that block > > > merging of PRs is to disable the respective tests for the time being. > Of > > > course, the failing test then needs to be fixed. But at least that way > we > > > would not block everyone from making progress. > > > > > > Cheers, > > > Till > > > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise wrote: > > > > > > > I think this is overall a good idea. So +1 from my side. > > > > However, I'd like to put a higher priority on infrastructure then, in > > > > particular docker image/artifact caches. > > > > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann > > > > > wrote: > > > > > > > > > Thanks for bringing this topic to our attention Xintong. I think > your > > > > > proposal makes a lot of sense and we should follow it. It will > give us > > > > > confidence that our changes are working and it might be a good > > > incentive > > > > to > > > > > quickly fix build instabilities. Hence, +1. > > > > > > > > > > Cheers, > > > > > Till > > > > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song < > tonysong...@gmail.com> > > > > > wrote: > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > In the past a couple of weeks, I've observed several times that > PRs > > > are > > > > > > merged without a green light from the CI tests, where failure > cases > > > are > > > > > > considered *unrelated*. This may not always cause problems, but > would > > > > > > increase the chance of breaking our code base. In fact, it has > > > occurred > > > > > to > > > > > > me twice in the past few weeks that I had to revert a commit > which > > > > breaks > > > > > > the master branch due to this. > > > > > > > > > > > > I think it would be nicer to enforce a stricter rule, that no PRs > > > > should > > > > > be > > > > > > merged without passing CI. > > > > > > > > > > > > The problems of merging PRs with "unrelated" test failures are: > > > > > > - It's not always straightforward to tell whether a test > failures are > > > > > > related or not. > > > > > > - It prevents subsequent test cases from being executed, which > may > > > fail > > > > > > relating to the PR changes. > > > > > > > > > > > > To make things easier for the committers, the following > exceptions > > > > might > > > > > be > > > > > > considered acceptable. > > > > > > - The PR has passed CI in the contributor's personal workspace. > > > Please > > > > > post > > > > > > the link in such cases. > > > > > > - The CI tests have been triggered multiple times, on the same > > > commit, > > > > > and > > > > > > each stage has at least passed for once. Please also comment in > such > > > > > cases. > > > > > > > > > > > > If we all agree on this, I'd update the community guidelines for > > > > merging > > > > > > PRs wrt. this proposal. [1] > > > > > > > > > > > > Please let me know what do you think. > > > > > > > > > > > > Thank you~ > > > > > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > > > > > > > > > > > > > > > >
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
+1 for appending this to community guidelines for merging PRs. @Till Rohrmann I agree that with this approach unstable tests will not block other commit merges. However, it might be hard to prevent merging commits that are related to those tests and should have been passed them. It's true that this judgment can be made by the committers, but no one can ensure the judgment is always precise and so that we have this discussion thread. Regarding the unstable tests, how about adding another exception: committers verify it in their local environment and comment in such cases? Best, Yangze Guo On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 wrote: > > It is a good principle to run all tests successfully with any change. This > means a lot for project's stability and development. I am big +1 for this > proposal. > > Best > liujiangang > > Till Rohrmann 于2021年6月22日周二 下午6:36写道: > > > One way to address the problem of regularly failing tests that block > > merging of PRs is to disable the respective tests for the time being. Of > > course, the failing test then needs to be fixed. But at least that way we > > would not block everyone from making progress. > > > > Cheers, > > Till > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise wrote: > > > > > I think this is overall a good idea. So +1 from my side. > > > However, I'd like to put a higher priority on infrastructure then, in > > > particular docker image/artifact caches. > > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann > > > wrote: > > > > > > > Thanks for bringing this topic to our attention Xintong. I think your > > > > proposal makes a lot of sense and we should follow it. It will give us > > > > confidence that our changes are working and it might be a good > > incentive > > > to > > > > quickly fix build instabilities. Hence, +1. > > > > > > > > Cheers, > > > > Till > > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song > > > > wrote: > > > > > > > > > Hi everyone, > > > > > > > > > > In the past a couple of weeks, I've observed several times that PRs > > are > > > > > merged without a green light from the CI tests, where failure cases > > are > > > > > considered *unrelated*. This may not always cause problems, but would > > > > > increase the chance of breaking our code base. In fact, it has > > occurred > > > > to > > > > > me twice in the past few weeks that I had to revert a commit which > > > breaks > > > > > the master branch due to this. > > > > > > > > > > I think it would be nicer to enforce a stricter rule, that no PRs > > > should > > > > be > > > > > merged without passing CI. > > > > > > > > > > The problems of merging PRs with "unrelated" test failures are: > > > > > - It's not always straightforward to tell whether a test failures are > > > > > related or not. > > > > > - It prevents subsequent test cases from being executed, which may > > fail > > > > > relating to the PR changes. > > > > > > > > > > To make things easier for the committers, the following exceptions > > > might > > > > be > > > > > considered acceptable. > > > > > - The PR has passed CI in the contributor's personal workspace. > > Please > > > > post > > > > > the link in such cases. > > > > > - The CI tests have been triggered multiple times, on the same > > commit, > > > > and > > > > > each stage has at least passed for once. Please also comment in such > > > > cases. > > > > > > > > > > If we all agree on this, I'd update the community guidelines for > > > merging > > > > > PRs wrt. this proposal. [1] > > > > > > > > > > Please let me know what do you think. > > > > > > > > > > Thank you~ > > > > > > > > > > Xintong Song > > > > > > > > > > > > > > > [1] > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > > > > > > > > > > >
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
It is a good principle to run all tests successfully with any change. This means a lot for project's stability and development. I am big +1 for this proposal. Best liujiangang Till Rohrmann 于2021年6月22日周二 下午6:36写道: > One way to address the problem of regularly failing tests that block > merging of PRs is to disable the respective tests for the time being. Of > course, the failing test then needs to be fixed. But at least that way we > would not block everyone from making progress. > > Cheers, > Till > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise wrote: > > > I think this is overall a good idea. So +1 from my side. > > However, I'd like to put a higher priority on infrastructure then, in > > particular docker image/artifact caches. > > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann > > wrote: > > > > > Thanks for bringing this topic to our attention Xintong. I think your > > > proposal makes a lot of sense and we should follow it. It will give us > > > confidence that our changes are working and it might be a good > incentive > > to > > > quickly fix build instabilities. Hence, +1. > > > > > > Cheers, > > > Till > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song > > > wrote: > > > > > > > Hi everyone, > > > > > > > > In the past a couple of weeks, I've observed several times that PRs > are > > > > merged without a green light from the CI tests, where failure cases > are > > > > considered *unrelated*. This may not always cause problems, but would > > > > increase the chance of breaking our code base. In fact, it has > occurred > > > to > > > > me twice in the past few weeks that I had to revert a commit which > > breaks > > > > the master branch due to this. > > > > > > > > I think it would be nicer to enforce a stricter rule, that no PRs > > should > > > be > > > > merged without passing CI. > > > > > > > > The problems of merging PRs with "unrelated" test failures are: > > > > - It's not always straightforward to tell whether a test failures are > > > > related or not. > > > > - It prevents subsequent test cases from being executed, which may > fail > > > > relating to the PR changes. > > > > > > > > To make things easier for the committers, the following exceptions > > might > > > be > > > > considered acceptable. > > > > - The PR has passed CI in the contributor's personal workspace. > Please > > > post > > > > the link in such cases. > > > > - The CI tests have been triggered multiple times, on the same > commit, > > > and > > > > each stage has at least passed for once. Please also comment in such > > > cases. > > > > > > > > If we all agree on this, I'd update the community guidelines for > > merging > > > > PRs wrt. this proposal. [1] > > > > > > > > Please let me know what do you think. > > > > > > > > Thank you~ > > > > > > > > Xintong Song > > > > > > > > > > > > [1] > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > > > > > > >
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
One way to address the problem of regularly failing tests that block merging of PRs is to disable the respective tests for the time being. Of course, the failing test then needs to be fixed. But at least that way we would not block everyone from making progress. Cheers, Till On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise wrote: > I think this is overall a good idea. So +1 from my side. > However, I'd like to put a higher priority on infrastructure then, in > particular docker image/artifact caches. > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann > wrote: > > > Thanks for bringing this topic to our attention Xintong. I think your > > proposal makes a lot of sense and we should follow it. It will give us > > confidence that our changes are working and it might be a good incentive > to > > quickly fix build instabilities. Hence, +1. > > > > Cheers, > > Till > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song > > wrote: > > > > > Hi everyone, > > > > > > In the past a couple of weeks, I've observed several times that PRs are > > > merged without a green light from the CI tests, where failure cases are > > > considered *unrelated*. This may not always cause problems, but would > > > increase the chance of breaking our code base. In fact, it has occurred > > to > > > me twice in the past few weeks that I had to revert a commit which > breaks > > > the master branch due to this. > > > > > > I think it would be nicer to enforce a stricter rule, that no PRs > should > > be > > > merged without passing CI. > > > > > > The problems of merging PRs with "unrelated" test failures are: > > > - It's not always straightforward to tell whether a test failures are > > > related or not. > > > - It prevents subsequent test cases from being executed, which may fail > > > relating to the PR changes. > > > > > > To make things easier for the committers, the following exceptions > might > > be > > > considered acceptable. > > > - The PR has passed CI in the contributor's personal workspace. Please > > post > > > the link in such cases. > > > - The CI tests have been triggered multiple times, on the same commit, > > and > > > each stage has at least passed for once. Please also comment in such > > cases. > > > > > > If we all agree on this, I'd update the community guidelines for > merging > > > PRs wrt. this proposal. [1] > > > > > > Please let me know what do you think. > > > > > > Thank you~ > > > > > > Xintong Song > > > > > > > > > [1] > > > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > > >
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
I think this is overall a good idea. So +1 from my side. However, I'd like to put a higher priority on infrastructure then, in particular docker image/artifact caches. On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann wrote: > Thanks for bringing this topic to our attention Xintong. I think your > proposal makes a lot of sense and we should follow it. It will give us > confidence that our changes are working and it might be a good incentive to > quickly fix build instabilities. Hence, +1. > > Cheers, > Till > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song > wrote: > > > Hi everyone, > > > > In the past a couple of weeks, I've observed several times that PRs are > > merged without a green light from the CI tests, where failure cases are > > considered *unrelated*. This may not always cause problems, but would > > increase the chance of breaking our code base. In fact, it has occurred > to > > me twice in the past few weeks that I had to revert a commit which breaks > > the master branch due to this. > > > > I think it would be nicer to enforce a stricter rule, that no PRs should > be > > merged without passing CI. > > > > The problems of merging PRs with "unrelated" test failures are: > > - It's not always straightforward to tell whether a test failures are > > related or not. > > - It prevents subsequent test cases from being executed, which may fail > > relating to the PR changes. > > > > To make things easier for the committers, the following exceptions might > be > > considered acceptable. > > - The PR has passed CI in the contributor's personal workspace. Please > post > > the link in such cases. > > - The CI tests have been triggered multiple times, on the same commit, > and > > each stage has at least passed for once. Please also comment in such > cases. > > > > If we all agree on this, I'd update the community guidelines for merging > > PRs wrt. this proposal. [1] > > > > Please let me know what do you think. > > > > Thank you~ > > > > Xintong Song > > > > > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > >
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
Thanks for bringing this topic to our attention Xintong. I think your proposal makes a lot of sense and we should follow it. It will give us confidence that our changes are working and it might be a good incentive to quickly fix build instabilities. Hence, +1. Cheers, Till On Tue, Jun 22, 2021 at 11:12 AM Xintong Song wrote: > Hi everyone, > > In the past a couple of weeks, I've observed several times that PRs are > merged without a green light from the CI tests, where failure cases are > considered *unrelated*. This may not always cause problems, but would > increase the chance of breaking our code base. In fact, it has occurred to > me twice in the past few weeks that I had to revert a commit which breaks > the master branch due to this. > > I think it would be nicer to enforce a stricter rule, that no PRs should be > merged without passing CI. > > The problems of merging PRs with "unrelated" test failures are: > - It's not always straightforward to tell whether a test failures are > related or not. > - It prevents subsequent test cases from being executed, which may fail > relating to the PR changes. > > To make things easier for the committers, the following exceptions might be > considered acceptable. > - The PR has passed CI in the contributor's personal workspace. Please post > the link in such cases. > - The CI tests have been triggered multiple times, on the same commit, and > each stage has at least passed for once. Please also comment in such cases. > > If we all agree on this, I'd update the community guidelines for merging > PRs wrt. this proposal. [1] > > Please let me know what do you think. > > Thank you~ > > Xintong Song > > > [1] > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests >
[DISCUSS] Do not merge PRs with "unrelated" test failures.
Hi everyone, In the past a couple of weeks, I've observed several times that PRs are merged without a green light from the CI tests, where failure cases are considered *unrelated*. This may not always cause problems, but would increase the chance of breaking our code base. In fact, it has occurred to me twice in the past few weeks that I had to revert a commit which breaks the master branch due to this. I think it would be nicer to enforce a stricter rule, that no PRs should be merged without passing CI. The problems of merging PRs with "unrelated" test failures are: - It's not always straightforward to tell whether a test failures are related or not. - It prevents subsequent test cases from being executed, which may fail relating to the PR changes. To make things easier for the committers, the following exceptions might be considered acceptable. - The PR has passed CI in the contributor's personal workspace. Please post the link in such cases. - The CI tests have been triggered multiple times, on the same commit, and each stage has at least passed for once. Please also comment in such cases. If we all agree on this, I'd update the community guidelines for merging PRs wrt. this proposal. [1] Please let me know what do you think. Thank you~ Xintong Song [1] https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests