Re: Status of solr tests
> On Jun 15, 2018, at 8:29 AM, David Smiley wrote:
>
> I'm +1 to modify the Lucene-side JIRA QA bot (Yetus) to not execute Solr
> tests.

Right now, Yetus only executes Solr tests when there is a Solr change in the patch; otherwise only Lucene tests are executed. I just committed a modification to the Lucene/Solr Yetus personality that adds "-Dtests.badapples=false" to the per-modified-module “ant test” cmdline. This should reduce the noise appreciably.

--
Steve
www.lucidworks.com
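For context, the flag Steve mentions keys off the test framework's BadApple annotation: tests carrying it are skipped whenever a run is started with -Dtests.badapples=false, which is what the per-modified-module “ant test” invocation from Yetus now passes. A minimal sketch of how a flaky test gets opted in (the test class and JIRA id below are made-up placeholders; the annotation itself comes from LuceneTestCase):

    import org.apache.lucene.util.LuceneTestCase;
    import org.junit.Test;

    public class TestSomethingFlaky extends LuceneTestCase {

      // Hypothetical flaky test; bugUrl should point at the JIRA issue tracking the flakiness.
      @BadApple(bugUrl = "https://issues.apache.org/jira/browse/SOLR-00000")
      @Test
      public void testSometimesFails() throws Exception {
        // ... assertions that only fail intermittently ...
      }
    }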
Re: Status of solr tests
Hi steve, I saw and followed that thread but the only outcome that I can see it stuff being bad appled? I might miss something and I can go and argue on specifics on that thread like: > Testing distributed systems requires, well, distributed systems which is what > starting clusters is all about. which I have worked on for several years and I am convinced it's a false statement. I didn't wanna go down that route which I think boils down to the cultural disconnect. If I missed anything that is answered I am sorry I will go and re-read it. simon On Tue, Jun 19, 2018 at 4:29 PM, Steve Rowe wrote: > Hi Simon, > > Have you seen the late-February thread “Test failures are out of control….”? > : > https://lists.apache.org/thread.html/b783a9d7c22f518b07355e8e4f2c6f56020a7c32f36a58a86d51a3b7@%3Cdev.lucene.apache.org%3E > > If not, I suggest you go take a look. Some of your questions are answered > there. > > -- > Steve > www.lucidworks.com > >> On Jun 19, 2018, at 9:41 AM, Simon Willnauer >> wrote: >> >> Thanks folks, I appreciate you are sharing some thoughts about this. My >> biggest issue is that this is a permanent condition. I could have sent this >> mail 2, 4 or 6 years ago and it would have been as relevant as today. >> >> I am convinced mark can make some progress but this isn't fixable by a >> single person this is a structural problem or rather a cultural. I am not >> sure if everybody is aware of how terrible it is. I took a screenshot of my >> inbox the other day what I have to dig through on a constant basis everytime >> I commit a change to lucene to make sure I am not missing something. >> >> >> >> I don't even know how we can attract any new contributors or how many >> contributors have been scared away by this in the past. This is not good and >> bad-appeling these test isn't the answer unless we put a lot of effort into >> it, sorry I don't see it happening. I would have expected more than like 4 >> people from this PMC to reply to something like this. From my perspective >> there is a lot of harm done by this to the project and we have to figure out >> what we wanna do. This also affects our ability to release, guys our >> smoke-test builds never pass [1]. I don't know what to do if I were a RM for >> 7.4 (thanks adrien for doing it) Like I can not tell what is serious and >> what not on a solr build. It's also not just be smoke tester it's basically >> everything that runs after solr that is skipped on a regular basis. >> >> I don't have a good answer but we have to get this under control it's >> burdensome for lucene to carry this load and it's carrying it a quite some >> time. It wasn't very obvious how big this weights since I wasn't working on >> lucene internals for quite a while and speaking to many folks around here >> this is on their shoulders but it's not brought up for discussion, i think >> we have to. >> >> simon >> >> [1] https://builds.apache.org/job/Lucene-Solr-SmokeRelease-7.4/ >> >> >> On Sat, Jun 16, 2018 at 6:40 AM, Erick Erickson >> wrote: >> Martin: >> >> I have no idea how logging severity levels apply to unit tests that fail. >> It's not a question of triaging logs, it's a matter of Jenkins junit test >> runs reporting failures. >> >> >> >> On Fri, Jun 15, 2018 at 4:25 PM, Martin Gainty wrote: >> Erick- >> >> appears that style mis-application may be categorised as INFO >> are mixed in with SEVERE errors >> >> Would it make sense to filter the errors based on severity ? 
>> >> https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html >> Level (Java Platform SE 7 ) - Oracle Help Center >> docs.oracle.com >> The Level class defines a set of standard logging levels that can be used to >> control logging output. The logging Level objects are ordered and are >> specified by ordered integers. >> if you know Severity you can triage the SEVERE errors before working down to >> INFO errors >> >> >> WDYT? >> Martin >> __ >> >> >> >> From: Erick Erickson >> Sent: Friday, June 15, 2018 1:05 PM >> To: dev@lucene.apache.org; Mark Miller >> Subject: Re: Status of solr tests >> >> Mark (and everyone). >> >> I'm trying to be somewhat conservative about what I BadApple, at this >> point it's only things that have failed every week for the last 4. >> Part of that con
Re: Status of solr tests
Hi Simon, Have you seen the late-February thread “Test failures are out of control….”? : https://lists.apache.org/thread.html/b783a9d7c22f518b07355e8e4f2c6f56020a7c32f36a58a86d51a3b7@%3Cdev.lucene.apache.org%3E If not, I suggest you go take a look. Some of your questions are answered there. -- Steve www.lucidworks.com > On Jun 19, 2018, at 9:41 AM, Simon Willnauer > wrote: > > Thanks folks, I appreciate you are sharing some thoughts about this. My > biggest issue is that this is a permanent condition. I could have sent this > mail 2, 4 or 6 years ago and it would have been as relevant as today. > > I am convinced mark can make some progress but this isn't fixable by a single > person this is a structural problem or rather a cultural. I am not sure if > everybody is aware of how terrible it is. I took a screenshot of my inbox the > other day what I have to dig through on a constant basis everytime I commit a > change to lucene to make sure I am not missing something. > > > > I don't even know how we can attract any new contributors or how many > contributors have been scared away by this in the past. This is not good and > bad-appeling these test isn't the answer unless we put a lot of effort into > it, sorry I don't see it happening. I would have expected more than like 4 > people from this PMC to reply to something like this. From my perspective > there is a lot of harm done by this to the project and we have to figure out > what we wanna do. This also affects our ability to release, guys our > smoke-test builds never pass [1]. I don't know what to do if I were a RM for > 7.4 (thanks adrien for doing it) Like I can not tell what is serious and what > not on a solr build. It's also not just be smoke tester it's basically > everything that runs after solr that is skipped on a regular basis. > > I don't have a good answer but we have to get this under control it's > burdensome for lucene to carry this load and it's carrying it a quite some > time. It wasn't very obvious how big this weights since I wasn't working on > lucene internals for quite a while and speaking to many folks around here > this is on their shoulders but it's not brought up for discussion, i think we > have to. > > simon > > [1] https://builds.apache.org/job/Lucene-Solr-SmokeRelease-7.4/ > > > On Sat, Jun 16, 2018 at 6:40 AM, Erick Erickson > wrote: > Martin: > > I have no idea how logging severity levels apply to unit tests that fail. > It's not a question of triaging logs, it's a matter of Jenkins junit test > runs reporting failures. > > > > On Fri, Jun 15, 2018 at 4:25 PM, Martin Gainty wrote: > Erick- > > appears that style mis-application may be categorised as INFO > are mixed in with SEVERE errors > > Would it make sense to filter the errors based on severity ? > > https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html > Level (Java Platform SE 7 ) - Oracle Help Center > docs.oracle.com > The Level class defines a set of standard logging levels that can be used to > control logging output. The logging Level objects are ordered and are > specified by ordered integers. > if you know Severity you can triage the SEVERE errors before working down to > INFO errors > > > WDYT? > Martin > __ > > > > From: Erick Erickson > Sent: Friday, June 15, 2018 1:05 PM > To: dev@lucene.apache.org; Mark Miller > Subject: Re: Status of solr tests > > Mark (and everyone). > > I'm trying to be somewhat conservative about what I BadApple, at this > point it's only things that have failed every week for the last 4. 
> Part of that conservatism is to avoid BadApple'ing tests that are > failing and _should_ fail. > > I'm explicitly _not_ delving into any of the causes at all at this > point, it's overwhelming until we reduce the noise as everyone knows. > > So please feel totally free to BadApple anything you know is flakey, > it won't intrude on my turf ;) > > And since I realized I can also report tests that have _not_ failed in > a month that _are_ BadApple'd, we can be a little freer with > BadApple'ing tests since there's a mechanism for un-annotating them > without a lot of tedious effort. > > FWIW. > > On Fri, Jun 15, 2018 at 9:09 AM, Mark Miller wrote: > > There is an okay chance I'm going to start making some improvements here as > > well. I've been working on a very stable set of tests on my starburst branch > > and will slowly bring in test fixes over time (I've already
Re: Status of solr tests
Martin: I have no idea how logging severity levels apply to unit tests that fail. It's not a question of triaging logs, it's a matter of Jenkins junit test runs reporting failures. On Fri, Jun 15, 2018 at 4:25 PM, Martin Gainty wrote: > Erick- > > appears that style mis-application may be categorised as INFO > are mixed in with SEVERE errors > > Would it make sense to filter the errors based on severity ? > > > https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html > Level (Java Platform SE 7 ) - Oracle Help Center > <https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html> > docs.oracle.com > The Level class defines a set of standard logging levels that can be used > to control logging output. The logging Level objects are ordered and are > specified by ordered integers. > if you know Severity you can triage the SEVERE errors before working down > to INFO errors > > WDYT? > Martin > __ > > > > > -- > *From:* Erick Erickson > *Sent:* Friday, June 15, 2018 1:05 PM > *To:* dev@lucene.apache.org; Mark Miller > *Subject:* Re: Status of solr tests > > Mark (and everyone). > > I'm trying to be somewhat conservative about what I BadApple, at this > point it's only things that have failed every week for the last 4. > Part of that conservatism is to avoid BadApple'ing tests that are > failing and _should_ fail. > > I'm explicitly _not_ delving into any of the causes at all at this > point, it's overwhelming until we reduce the noise as everyone knows. > > So please feel totally free to BadApple anything you know is flakey, > it won't intrude on my turf ;) > > And since I realized I can also report tests that have _not_ failed in > a month that _are_ BadApple'd, we can be a little freer with > BadApple'ing tests since there's a mechanism for un-annotating them > without a lot of tedious effort. > > FWIW. > > On Fri, Jun 15, 2018 at 9:09 AM, Mark Miller > wrote: > > There is an okay chance I'm going to start making some improvements here > as > > well. I've been working on a very stable set of tests on my starburst > branch > > and will slowly bring in test fixes over time (I've already been making > some > > on that branch for important tests). We should currently be defaulting to > > tests.badapples=false on all solr test runs - it's a joke to try and get > a > > clean run otherwise, and even then somehow 4 or 5 tests that fail > somewhat > > commonly have so far avoided Erick's @BadApple hack and slash. They are > bad > > appled on my dev branch now, but that is currently where any time I have > is > > spent rather than on the main dev branches. > > > > Also, too many flakey tests are introduced because devs are not beasting > or > > beasting well before committing new heavy tests. Perhaps we could add > some > > docs around that. > > > > We have built in beasting support, we need to emphasize that a couple > passes > > on a new test is not sufficient to test it's quality. > > > > - Mark > > > > On Fri, Jun 15, 2018 at 9:46 AM Erick Erickson > > wrote: > >> > >> (Sg) All very true. You're not alone in your frustration. > >> > >> I've been trying to at least BadApple tests that fail consistently, so > >> another option could be to disable BadApple'd tests. My hope has been > >> to get to the point of being able to reliably get clean runs, at least > >> when BadApple'd tests are disabled. > >> > >> From that point I want to draw a line in the sand and immediately > >> address tests that fail that are _not_ BadApple'd. At least then we'll > >> stop getting _worse_. 
And then we can work on the BadApple'd tests. > >> But as David says, that's not going to be any time soon. It's been a > >> couple of months that I've been trying to just get the tests > >> BadApple'd without even trying to fix any of them. > >> > >> It's particularly pernicious because with all the noise we don't see > >> failures we _should_ see. > >> > >> So I don't have any good short-term answer either. We've built up a > >> very large technical debt in the testing. The first step is to stop > >> adding more debt, which is what I've been working on so far. And > >> that's the easy part > >> > >> Siigghh
Re: Status of solr tests
can we disable this bot already? On Fri, Jun 15, 2018, 7:25 PM Martin Gainty wrote: > Erick- > > appears that style mis-application may be categorised as INFO > are mixed in with SEVERE errors > > Would it make sense to filter the errors based on severity ? > > > https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html > Level (Java Platform SE 7 ) - Oracle Help Center > <https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html> > docs.oracle.com > The Level class defines a set of standard logging levels that can be used > to control logging output. The logging Level objects are ordered and are > specified by ordered integers. > if you know Severity you can triage the SEVERE errors before working down > to INFO errors > > WDYT? > Martin > __ > > > > > -- > *From:* Erick Erickson > *Sent:* Friday, June 15, 2018 1:05 PM > *To:* dev@lucene.apache.org; Mark Miller > *Subject:* Re: Status of solr tests > > Mark (and everyone). > > I'm trying to be somewhat conservative about what I BadApple, at this > point it's only things that have failed every week for the last 4. > Part of that conservatism is to avoid BadApple'ing tests that are > failing and _should_ fail. > > I'm explicitly _not_ delving into any of the causes at all at this > point, it's overwhelming until we reduce the noise as everyone knows. > > So please feel totally free to BadApple anything you know is flakey, > it won't intrude on my turf ;) > > And since I realized I can also report tests that have _not_ failed in > a month that _are_ BadApple'd, we can be a little freer with > BadApple'ing tests since there's a mechanism for un-annotating them > without a lot of tedious effort. > > FWIW. > > On Fri, Jun 15, 2018 at 9:09 AM, Mark Miller > wrote: > > There is an okay chance I'm going to start making some improvements here > as > > well. I've been working on a very stable set of tests on my starburst > branch > > and will slowly bring in test fixes over time (I've already been making > some > > on that branch for important tests). We should currently be defaulting to > > tests.badapples=false on all solr test runs - it's a joke to try and get > a > > clean run otherwise, and even then somehow 4 or 5 tests that fail > somewhat > > commonly have so far avoided Erick's @BadApple hack and slash. They are > bad > > appled on my dev branch now, but that is currently where any time I have > is > > spent rather than on the main dev branches. > > > > Also, too many flakey tests are introduced because devs are not beasting > or > > beasting well before committing new heavy tests. Perhaps we could add > some > > docs around that. > > > > We have built in beasting support, we need to emphasize that a couple > passes > > on a new test is not sufficient to test it's quality. > > > > - Mark > > > > On Fri, Jun 15, 2018 at 9:46 AM Erick Erickson > > wrote: > >> > >> (Sg) All very true. You're not alone in your frustration. > >> > >> I've been trying to at least BadApple tests that fail consistently, so > >> another option could be to disable BadApple'd tests. My hope has been > >> to get to the point of being able to reliably get clean runs, at least > >> when BadApple'd tests are disabled. > >> > >> From that point I want to draw a line in the sand and immediately > >> address tests that fail that are _not_ BadApple'd. At least then we'll > >> stop getting _worse_. And then we can work on the BadApple'd tests. > >> But as David says, that's not going to be any time soon. 
It's been a > >> couple of months that I've been trying to just get the tests > >> BadApple'd without even trying to fix any of them. > >> > >> It's particularly pernicious because with all the noise we don't see > >> failures we _should_ see. > >> > >> So I don't have any good short-term answer either. We've built up a > >> very large technical debt in the testing. The first step is to stop > >> adding more debt, which is what I've been working on so far. And > >> that's the easy part > >> > >> Siigghh > >> > >> Erick > >> > >> > >> On Fri, Jun 15, 2018 at 5:29 AM, David Smiley > > >> wrote: > >
Re: Status of solr tests
Erick- appears that style mis-application may be categorised as INFO are mixed in with SEVERE errors Would it make sense to filter the errors based on severity ? https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html Level (Java Platform SE 7 ) - Oracle Help Center<https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html> docs.oracle.com The Level class defines a set of standard logging levels that can be used to control logging output. The logging Level objects are ordered and are specified by ordered integers. if you know Severity you can triage the SEVERE errors before working down to INFO errors WDYT? Martin __ From: Erick Erickson Sent: Friday, June 15, 2018 1:05 PM To: dev@lucene.apache.org; Mark Miller Subject: Re: Status of solr tests Mark (and everyone). I'm trying to be somewhat conservative about what I BadApple, at this point it's only things that have failed every week for the last 4. Part of that conservatism is to avoid BadApple'ing tests that are failing and _should_ fail. I'm explicitly _not_ delving into any of the causes at all at this point, it's overwhelming until we reduce the noise as everyone knows. So please feel totally free to BadApple anything you know is flakey, it won't intrude on my turf ;) And since I realized I can also report tests that have _not_ failed in a month that _are_ BadApple'd, we can be a little freer with BadApple'ing tests since there's a mechanism for un-annotating them without a lot of tedious effort. FWIW. On Fri, Jun 15, 2018 at 9:09 AM, Mark Miller wrote: > There is an okay chance I'm going to start making some improvements here as > well. I've been working on a very stable set of tests on my starburst branch > and will slowly bring in test fixes over time (I've already been making some > on that branch for important tests). We should currently be defaulting to > tests.badapples=false on all solr test runs - it's a joke to try and get a > clean run otherwise, and even then somehow 4 or 5 tests that fail somewhat > commonly have so far avoided Erick's @BadApple hack and slash. They are bad > appled on my dev branch now, but that is currently where any time I have is > spent rather than on the main dev branches. > > Also, too many flakey tests are introduced because devs are not beasting or > beasting well before committing new heavy tests. Perhaps we could add some > docs around that. > > We have built in beasting support, we need to emphasize that a couple passes > on a new test is not sufficient to test it's quality. > > - Mark > > On Fri, Jun 15, 2018 at 9:46 AM Erick Erickson > wrote: >> >> (Sg) All very true. You're not alone in your frustration. >> >> I've been trying to at least BadApple tests that fail consistently, so >> another option could be to disable BadApple'd tests. My hope has been >> to get to the point of being able to reliably get clean runs, at least >> when BadApple'd tests are disabled. >> >> From that point I want to draw a line in the sand and immediately >> address tests that fail that are _not_ BadApple'd. At least then we'll >> stop getting _worse_. And then we can work on the BadApple'd tests. >> But as David says, that's not going to be any time soon. It's been a >> couple of months that I've been trying to just get the tests >> BadApple'd without even trying to fix any of them. >> >> It's particularly pernicious because with all the noise we don't see >> failures we _should_ see. >> >> So I don't have any good short-term answer either. 
We've built up a >> very large technical debt in the testing. The first step is to stop >> adding more debt, which is what I've been working on so far. And >> that's the easy part >> >> Siigghh >> >> Erick >> >> >> On Fri, Jun 15, 2018 at 5:29 AM, David Smiley >> wrote: >> > (Sigh) I sympathize with your points Simon. I'm +1 to modify the >> > Lucene-side JIRA QA bot (Yetus) to not execute Solr tests. We can and >> > are >> > trying to improve the stability of the Solr tests but even >> > optimistically >> > the practical reality is that it won't be good enough anytime soon. >> > When we >> > get there, we can reverse this. >> > >> > On Fri, Jun 15, 2018 at 3:32 AM Simon Willnauer >> > >> > wrote: >> >> >> >> folks, >> >> >
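To make Martin's suggestion concrete: with plain java.util.logging, the severity cut he links can be applied at the logger and the handler, along the lines of the standalone sketch below. Solr's own logging actually goes through SLF4J rather than java.util.logging, so this only illustrates the Level ordering; and, as Erick points out, it would not touch the real problem, since the Jenkins noise comes from JUnit reporting test failures, not from log records.

    import java.util.logging.Handler;
    import java.util.logging.Level;
    import java.util.logging.Logger;

    public class SevereOnlyDemo {
      public static void main(String[] args) {
        Logger root = Logger.getLogger("");   // root logger owns the default console handler
        root.setLevel(Level.SEVERE);          // drop anything below SEVERE at the logger
        for (Handler h : root.getHandlers()) {
          h.setLevel(Level.SEVERE);           // and at the handler, so nothing below SEVERE is printed
        }

        Logger log = Logger.getLogger(SevereOnlyDemo.class.getName());
        log.info("style mis-application categorised as INFO - filtered out");
        log.severe("a real failure - still reported");
      }
    }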
Re: Status of solr tests
Mark (and everyone). I'm trying to be somewhat conservative about what I BadApple, at this point it's only things that have failed every week for the last 4. Part of that conservatism is to avoid BadApple'ing tests that are failing and _should_ fail. I'm explicitly _not_ delving into any of the causes at all at this point, it's overwhelming until we reduce the noise as everyone knows. So please feel totally free to BadApple anything you know is flakey, it won't intrude on my turf ;) And since I realized I can also report tests that have _not_ failed in a month that _are_ BadApple'd, we can be a little freer with BadApple'ing tests since there's a mechanism for un-annotating them without a lot of tedious effort. FWIW. On Fri, Jun 15, 2018 at 9:09 AM, Mark Miller wrote: > There is an okay chance I'm going to start making some improvements here as > well. I've been working on a very stable set of tests on my starburst branch > and will slowly bring in test fixes over time (I've already been making some > on that branch for important tests). We should currently be defaulting to > tests.badapples=false on all solr test runs - it's a joke to try and get a > clean run otherwise, and even then somehow 4 or 5 tests that fail somewhat > commonly have so far avoided Erick's @BadApple hack and slash. They are bad > appled on my dev branch now, but that is currently where any time I have is > spent rather than on the main dev branches. > > Also, too many flakey tests are introduced because devs are not beasting or > beasting well before committing new heavy tests. Perhaps we could add some > docs around that. > > We have built in beasting support, we need to emphasize that a couple passes > on a new test is not sufficient to test it's quality. > > - Mark > > On Fri, Jun 15, 2018 at 9:46 AM Erick Erickson > wrote: >> >> (Sg) All very true. You're not alone in your frustration. >> >> I've been trying to at least BadApple tests that fail consistently, so >> another option could be to disable BadApple'd tests. My hope has been >> to get to the point of being able to reliably get clean runs, at least >> when BadApple'd tests are disabled. >> >> From that point I want to draw a line in the sand and immediately >> address tests that fail that are _not_ BadApple'd. At least then we'll >> stop getting _worse_. And then we can work on the BadApple'd tests. >> But as David says, that's not going to be any time soon. It's been a >> couple of months that I've been trying to just get the tests >> BadApple'd without even trying to fix any of them. >> >> It's particularly pernicious because with all the noise we don't see >> failures we _should_ see. >> >> So I don't have any good short-term answer either. We've built up a >> very large technical debt in the testing. The first step is to stop >> adding more debt, which is what I've been working on so far. And >> that's the easy part >> >> Siigghh >> >> Erick >> >> >> On Fri, Jun 15, 2018 at 5:29 AM, David Smiley >> wrote: >> > (Sigh) I sympathize with your points Simon. I'm +1 to modify the >> > Lucene-side JIRA QA bot (Yetus) to not execute Solr tests. We can and >> > are >> > trying to improve the stability of the Solr tests but even >> > optimistically >> > the practical reality is that it won't be good enough anytime soon. >> > When we >> > get there, we can reverse this. >> > >> > On Fri, Jun 15, 2018 at 3:32 AM Simon Willnauer >> > >> > wrote: >> >> >> >> folks, >> >> >> >> I got more active working on IndexWriter and Soft-Deletes etc. 
in the >> >> last couple of weeks. It's a blast again and I really enjoy it. The >> >> one thing that is IMO not acceptable is the status of solr tests. I >> >> tried so many times to get them passing on several different OSs but >> >> it seems this is pretty hopepless. It's get's even worse the >> >> Lucene/Solr QA job literally marks every ticket I attach a patch to as >> >> `-1` because of arbitrary solr tests, here is an example: >> >> >> >> || Reason || Tests || >> >> | Failed junit tests | solr.rest.TestManagedResourceStorage | >> >> | | solr.cloud.autoscaling.SearchRateTriggerIntegrationTest | >> >> | | solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest | >> >> | | solr.client.solrj.impl.CloudSolrClientTest | >> >> | | solr.common.util.TestJsonRecordReader | >> >> >> >> Speaking to other committers I hear we should just disable this job. >> >> Sorry, WTF? >> >> >> >> These tests seem to fail all the time, randomly and over and over >> >> again. This renders the test as entirely useless to me. I even invest >> >> time (wrong, I invested) looking into it if they are caused by me or >> >> if I can do something about it. Yet, someone could call me out for >> >> being responsible for them as a commiter, yes I am hence this email. I >> >> don't think I am obliged to fix them. These projects have 50+ >> >> committers and having a shared codebase doesn't mean everybody has to >> >>
Re: Status of solr tests
There is an okay chance I'm going to start making some improvements here as well. I've been working on a very stable set of tests on my starburst branch and will slowly bring in test fixes over time (I've already been making some on that branch for important tests). We should currently be defaulting to tests.badapples=false on all solr test runs - it's a joke to try and get a clean run otherwise, and even then somehow 4 or 5 tests that fail somewhat commonly have so far avoided Erick's @BadApple hack and slash. They are bad appled on my dev branch now, but that is currently where any time I have is spent rather than on the main dev branches. Also, too many flakey tests are introduced because devs are not beasting or beasting well before committing new heavy tests. Perhaps we could add some docs around that. We have built in beasting support, we need to emphasize that a couple passes on a new test is not sufficient to test it's quality. - Mark On Fri, Jun 15, 2018 at 9:46 AM Erick Erickson wrote: > (Sg) All very true. You're not alone in your frustration. > > I've been trying to at least BadApple tests that fail consistently, so > another option could be to disable BadApple'd tests. My hope has been > to get to the point of being able to reliably get clean runs, at least > when BadApple'd tests are disabled. > > From that point I want to draw a line in the sand and immediately > address tests that fail that are _not_ BadApple'd. At least then we'll > stop getting _worse_. And then we can work on the BadApple'd tests. > But as David says, that's not going to be any time soon. It's been a > couple of months that I've been trying to just get the tests > BadApple'd without even trying to fix any of them. > > It's particularly pernicious because with all the noise we don't see > failures we _should_ see. > > So I don't have any good short-term answer either. We've built up a > very large technical debt in the testing. The first step is to stop > adding more debt, which is what I've been working on so far. And > that's the easy part > > Siigghh > > Erick > > > On Fri, Jun 15, 2018 at 5:29 AM, David Smiley > wrote: > > (Sigh) I sympathize with your points Simon. I'm +1 to modify the > > Lucene-side JIRA QA bot (Yetus) to not execute Solr tests. We can and > are > > trying to improve the stability of the Solr tests but even optimistically > > the practical reality is that it won't be good enough anytime soon. > When we > > get there, we can reverse this. > > > > On Fri, Jun 15, 2018 at 3:32 AM Simon Willnauer < > simon.willna...@gmail.com> > > wrote: > >> > >> folks, > >> > >> I got more active working on IndexWriter and Soft-Deletes etc. in the > >> last couple of weeks. It's a blast again and I really enjoy it. The > >> one thing that is IMO not acceptable is the status of solr tests. I > >> tried so many times to get them passing on several different OSs but > >> it seems this is pretty hopepless. It's get's even worse the > >> Lucene/Solr QA job literally marks every ticket I attach a patch to as > >> `-1` because of arbitrary solr tests, here is an example: > >> > >> || Reason || Tests || > >> | Failed junit tests | solr.rest.TestManagedResourceStorage | > >> | | solr.cloud.autoscaling.SearchRateTriggerIntegrationTest | > >> | | solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest | > >> | | solr.client.solrj.impl.CloudSolrClientTest | > >> | | solr.common.util.TestJsonRecordReader | > >> > >> Speaking to other committers I hear we should just disable this job. > >> Sorry, WTF? 
> >> > >> These tests seem to fail all the time, randomly and over and over > >> again. This renders the test as entirely useless to me. I even invest > >> time (wrong, I invested) looking into it if they are caused by me or > >> if I can do something about it. Yet, someone could call me out for > >> being responsible for them as a commiter, yes I am hence this email. I > >> don't think I am obliged to fix them. These projects have 50+ > >> committers and having a shared codebase doesn't mean everybody has to > >> take care of everything. I think we are at the point where if I work > >> on Lucene I won't run solr tests at all otherwise there won't be any > >> progress. On the other hand solr tests never pass I wonder if the solr > >> code-base gets changes nevertheless? That is again a terrible > >> situation. > >> > >> I spoke to varun and anshum during buzzwords if they can give me some > >> hints what I am doing wrong but it seems like the way it is. I feel > >> terrible pushing stuff to our repo still seeing our tests fail. I get > >> ~15 build failures from solr tests a day I am not the only one that > >> has mail filters to archive them if there isn't a lucene tests in the > >> failures. > >> > >> This is a terrible state folks, how do we fix it? It's the lucene land > >> that get much love on the testing end but that also requires more work > >> on it, I expect solr
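On Mark's beasting point: the build already has a beast target for hammering one test with many random seeds before it is committed, something along the lines of "ant beast -Dbeast.iters=20 -Dtestcase=TestSomethingNew" run from the test's module (the exact property names are from memory, so treat that spelling as an assumption). The randomizedtesting framework's @Repeat annotation gives a similar effect from an IDE; a sketch with a made-up test name:

    import com.carrotsearch.randomizedtesting.annotations.Repeat;
    import org.apache.lucene.util.LuceneTestCase;
    import org.junit.Test;

    public class TestSomethingNew extends LuceneTestCase {

      // Hypothetical new test: run it 50 times, each iteration with a fresh random seed,
      // so a couple of lucky passes are not mistaken for stability.
      @Repeat(iterations = 50)
      @Test
      public void testNewBehaviour() throws Exception {
        // ... the logic being added ...
      }
    }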
Re: Status of solr tests
(Sg) All very true. You're not alone in your frustration. I've been trying to at least BadApple tests that fail consistently, so another option could be to disable BadApple'd tests. My hope has been to get to the point of being able to reliably get clean runs, at least when BadApple'd tests are disabled. >From that point I want to draw a line in the sand and immediately address tests that fail that are _not_ BadApple'd. At least then we'll stop getting _worse_. And then we can work on the BadApple'd tests. But as David says, that's not going to be any time soon. It's been a couple of months that I've been trying to just get the tests BadApple'd without even trying to fix any of them. It's particularly pernicious because with all the noise we don't see failures we _should_ see. So I don't have any good short-term answer either. We've built up a very large technical debt in the testing. The first step is to stop adding more debt, which is what I've been working on so far. And that's the easy part Siigghh Erick On Fri, Jun 15, 2018 at 5:29 AM, David Smiley wrote: > (Sigh) I sympathize with your points Simon. I'm +1 to modify the > Lucene-side JIRA QA bot (Yetus) to not execute Solr tests. We can and are > trying to improve the stability of the Solr tests but even optimistically > the practical reality is that it won't be good enough anytime soon. When we > get there, we can reverse this. > > On Fri, Jun 15, 2018 at 3:32 AM Simon Willnauer > wrote: >> >> folks, >> >> I got more active working on IndexWriter and Soft-Deletes etc. in the >> last couple of weeks. It's a blast again and I really enjoy it. The >> one thing that is IMO not acceptable is the status of solr tests. I >> tried so many times to get them passing on several different OSs but >> it seems this is pretty hopepless. It's get's even worse the >> Lucene/Solr QA job literally marks every ticket I attach a patch to as >> `-1` because of arbitrary solr tests, here is an example: >> >> || Reason || Tests || >> | Failed junit tests | solr.rest.TestManagedResourceStorage | >> | | solr.cloud.autoscaling.SearchRateTriggerIntegrationTest | >> | | solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest | >> | | solr.client.solrj.impl.CloudSolrClientTest | >> | | solr.common.util.TestJsonRecordReader | >> >> Speaking to other committers I hear we should just disable this job. >> Sorry, WTF? >> >> These tests seem to fail all the time, randomly and over and over >> again. This renders the test as entirely useless to me. I even invest >> time (wrong, I invested) looking into it if they are caused by me or >> if I can do something about it. Yet, someone could call me out for >> being responsible for them as a commiter, yes I am hence this email. I >> don't think I am obliged to fix them. These projects have 50+ >> committers and having a shared codebase doesn't mean everybody has to >> take care of everything. I think we are at the point where if I work >> on Lucene I won't run solr tests at all otherwise there won't be any >> progress. On the other hand solr tests never pass I wonder if the solr >> code-base gets changes nevertheless? That is again a terrible >> situation. >> >> I spoke to varun and anshum during buzzwords if they can give me some >> hints what I am doing wrong but it seems like the way it is. I feel >> terrible pushing stuff to our repo still seeing our tests fail. I get >> ~15 build failures from solr tests a day I am not the only one that >> has mail filters to archive them if there isn't a lucene tests in the >> failures. 
>> >> This is a terrible state folks, how do we fix it? It's the lucene land >> that get much love on the testing end but that also requires more work >> on it, I expect solr to do the same. That at the same time requires >> stop pushing new stuff until the situation is under control. The >> effort of marking stuff as bad apples isn't the answer, this requires >> effort from the drivers behind this project. >> >> simon >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> > -- > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker > LinkedIn: http://linkedin.com/in/davidwsmiley | Book: > http://www.solrenterprisesearchserver.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Status of solr tests
(Sigh) I sympathize with your points Simon. I'm +1 to modify the Lucene-side JIRA QA bot (Yetus) to not execute Solr tests. We can and are trying to improve the stability of the Solr tests but even optimistically the practical reality is that it won't be good enough anytime soon. When we get there, we can reverse this. On Fri, Jun 15, 2018 at 3:32 AM Simon Willnauer wrote: > folks, > > I got more active working on IndexWriter and Soft-Deletes etc. in the > last couple of weeks. It's a blast again and I really enjoy it. The > one thing that is IMO not acceptable is the status of solr tests. I > tried so many times to get them passing on several different OSs but > it seems this is pretty hopepless. It's get's even worse the > Lucene/Solr QA job literally marks every ticket I attach a patch to as > `-1` because of arbitrary solr tests, here is an example: > > || Reason || Tests || > | Failed junit tests | solr.rest.TestManagedResourceStorage | > | | solr.cloud.autoscaling.SearchRateTriggerIntegrationTest | > | | solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest | > | | solr.client.solrj.impl.CloudSolrClientTest | > | | solr.common.util.TestJsonRecordReader | > > Speaking to other committers I hear we should just disable this job. > Sorry, WTF? > > These tests seem to fail all the time, randomly and over and over > again. This renders the test as entirely useless to me. I even invest > time (wrong, I invested) looking into it if they are caused by me or > if I can do something about it. Yet, someone could call me out for > being responsible for them as a commiter, yes I am hence this email. I > don't think I am obliged to fix them. These projects have 50+ > committers and having a shared codebase doesn't mean everybody has to > take care of everything. I think we are at the point where if I work > on Lucene I won't run solr tests at all otherwise there won't be any > progress. On the other hand solr tests never pass I wonder if the solr > code-base gets changes nevertheless? That is again a terrible > situation. > > I spoke to varun and anshum during buzzwords if they can give me some > hints what I am doing wrong but it seems like the way it is. I feel > terrible pushing stuff to our repo still seeing our tests fail. I get > ~15 build failures from solr tests a day I am not the only one that > has mail filters to archive them if there isn't a lucene tests in the > failures. > > This is a terrible state folks, how do we fix it? It's the lucene land > that get much love on the testing end but that also requires more work > on it, I expect solr to do the same. That at the same time requires > stop pushing new stuff until the situation is under control. The > effort of marking stuff as bad apples isn't the answer, this requires > effort from the drivers behind this project. > > simon > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > -- Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com