Reviving this thread from last November.

I see that e.g. TestCollectionAPI.test has failed on Solr-check-main lately, with a failure rate of 6%:
http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.api.collections.TestCollectionAPI.test

I think it's time to do a @BadApple and test-fixing raid to get rid of all the noise.

Jan

> On 21 Nov 2021, at 05:40, Gus Heck <[email protected]> wrote:
>
> So in trying to assess whether my changes for
> https://issues.apache.org/jira/browse/SOLR-15590 had any effect on tests, I
> ran the full clean check 10 times on main and branch (from the date of this
> email). The results were not suggestive of a difference, but they were
> depressing:
>
> TEST FAILS:
>
> Main:
>
> Failed on runs 4, 5, 7, 9, 10; hung on run 2 (4 runs succeeded, 7 total failures)
> Ran for over 1.5h (killed manually):
>   1 @ :solr:core:test > Executing test
>       org.apache.solr.cloud.CollectionPropsTest (run 2) (message on screen
>       when killed)
> Thread leaks:
>   1 @ org.apache.solr.cloud.TestLeaderElectionZkExpiry.classMethod
>       (:solr:core) (run 10)
>   2 @ org.apache.solr.schema.TestBulkSchemaConcurrent.classMethod
>       (:solr:core) (runs 4, 7)
> Others:
>   2 @ org.apache.solr.cloud.api.collections.TestCollectionAPI.test
>       (:solr:core) (runs 7, 9) (not repro)
>   1 @ org.apache.solr.common.util.TestJsonRecordReader.testArrayOfRootObjects
>       (:solr:solrj) (run 5) (not repro)
>   1 @ org.apache.solr.cloud.TestPullReplica.testRemoveAllWriterReplicas
>       (:solr:core) (run 4) (not repro)
>
> Branch:
>
> Failed on runs 1, 2, 6, 8, 9, 10 (4 runs succeeded, 6 total failures)
> Thread leaks:
>   1 @ org.apache.solr.schema.TestBulkSchemaConcurrent.classMethod
>       (:solr:core) (run 1)
>   2 @ org.apache.solr.handler.designer.TestSchemaDesignerSettingsDAO.classMethod
>       (:solr:core) (runs 9, 10)
>   1 @ org.apache.solr.client.solrj.impl.CloudHttp2SolrClientBuilderTest.classMethod
>       (:solr:solrj) (run 2)
>   1 @ org.apache.solr.cloud.TestLeaderElectionZkExpiry.classMethod
>       (:solr:core) (run 8)
> Other:
>   1 @ org.apache.solr.cloud.OverseerTest.testShardLeaderChange (:solr:core)
>       (run 6) (not repro)
>
> There is nothing suggestive of a difference here, and both branch and main
> are in terrible shape, each passing less than 50%, though I got curious about
> how much it does tell
> me, and the sample is so small that this really only gives us good
> confidence that the fail rate is over 30% on each branch. Fiddling with
> R Studio a bit gives me depressingly large numbers of runs needed to
> reliably detect anything like a 20% change in build failure rate. For example
> (assuming I didn't make an error interpreting this; stats class was several
> decades ago), if we take the 60% just observed as current and introduce a bad
> change that causes a 20-point increase in overall build flakiness, to 80%,
> then if we allow ourselves a 10% chance of type 1 error and a 10% chance of
> type 2 error, we need to run the build 135 times (output from R version
> 3.6.3, chi-square):
>
> power.prop.test(n=NULL, p1=0.6, p2=0.8, power=0.9, sig.level=0.1,
>     alternative="one.sided")
>
>      Two-sample comparison of proportions power calculation
>
>               n = 67.32733
>              p1 = 0.6
>              p2 = 0.8
>       sig.level = 0.1
>           power = 0.9
>     alternative = one.sided
>
> NOTE: n is number in *each* group
>
> Right now there is basically no practical way to be confident that you have
> not introduced or caused a change in the flakiness of the build. This also
> more or less proves what I had suspected: the only way to control flaky tests
> is to completely kill them, since any sane number of builds has a good
> likelihood of misleading you unless you do something that leads to a virtual
> 100% fail rate.
>
> On Mon, Nov 15, 2021 at 7:45 PM Gus Heck <[email protected]> wrote:
> Things seem to be not great.
>
> I was testing my branch and saw one class with a non-reproducing failure
> twice in 9 runs (7 successes).
>
> So I checked out main to see what the failure rates looked like there:
>
> I've now run main 5 times and had 6 test failures. All of these are
> different tests, and all failing tests pass repeatedly if run alone.
> Things seem to have gone downhill at the end of April:
>
> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html
>
> As I write, check and nightly are failing on Jenkins (and, amusingly,
> badapples is passing):
>
> https://ci-builds.apache.org/job/Solr/
>
> As it stands, there's even a chance that some of what I've done in reworking
> how CoreContainer starts up may have improved the test situation for many
> tests (but the one that's failing is doing so with a thread leak detected
> from httpclient).
>
> I'm really a bit alarmed at the number of failures I'm seeing on main.
>
> Let me know if you have been building main with no issues recently (then I
> need to figure out what's wrong with my setup, or why it's more sensitive).
> Or let me know if you have an idea why main fails so much.
>
> -Gus
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
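
For anyone who wants to cross-check the power.prop.test figure quoted above without R, here is a rough Python sketch. It assumes the usual normal-approximation formula for comparing two proportions, which as far as I can tell is what R solves by default; the function name runs_per_group and its parameters are my own, not from R or from Gus's mail. Note that n is per group, so the total is roughly twice that: 2 x 67.33 is about 135, or 136 if each group is rounded up to whole runs first.

```python
from math import ceil, sqrt
from statistics import NormalDist


def runs_per_group(p1, p2, power=0.9, sig_level=0.1, one_sided=True):
    """Approximate runs needed in EACH group to tell failure rate p1 from p2.

    Hypothetical helper: normal approximation for two proportions, assumed
    to mirror the default behaviour of R's power.prop.test.
    """
    z = NormalDist().inv_cdf                   # standard normal quantile
    tails = 1 if one_sided else 2
    z_alpha = z(1 - sig_level / tails)         # type 1 error threshold
    z_beta = z(power)                          # power = 1 - type 2 error
    sd_null = sqrt((p1 + p2) * (1 - (p1 + p2) / 2))    # pooled, under H0
    sd_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))       # unpooled, under H1
    return ((z_alpha * sd_null + z_beta * sd_alt) / abs(p1 - p2)) ** 2


# Same inputs as the quoted R call: 60% vs 80% failure rate.
n = runs_per_group(0.6, 0.8)
print(f"per group: {n:.2f}, rounded up: {ceil(n)}, both groups: {2 * ceil(n)}")
# per group: 67.33, rounded up: 68, both groups: 136
```

Plugging in a smaller shift, e.g. runs_per_group(0.6, 0.7), shows how quickly the required number of builds grows as the change you want to detect shrinks.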
