Actually, I spent a bunch of time fiddling with tests last weekend, which is why I didn't get to other things. On a checkout of main that wasn't too terribly old (maybe 2 weeks at that time; I hadn't seen anything that looked like it should fix the tests and Fucit still looked on fire, so I just continued with what I had), I did some fixes, mostly centering on shutdown race conditions. I got to a point with 9/10 runs passing.
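For a rough feel of what 10 runs can show, here is a minimal sketch of the chance of seeing at most one failing run out of 10 under a few assumed per-run failure rates. It treats runs as independent, which real builds almost certainly are not, and the function name prob_at_most is made up for illustration.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p): chance of k or fewer failing runs."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Assumed "true" per-run failure rates (hypothetical values in the range
# reported for main); independence between runs is also an assumption.
for rate in (0.4, 0.5, 0.6):
    chance = prob_at_most(1, 10, rate)
    print(f"fail rate {rate:.0%}: P(<=1 failure in 10 runs) = {chance:.4f}")
```

Under those assumptions, one or zero failures in 10 runs comes out at about 4.6% for a 40% failure rate and about 1% for a 50% failure rate, so it would be uncommon but not impossible.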
However, I then stashed my changes, updated, and ran the tests 10 times as a check, and got 1 failure, so I'm now pondering whether my changes were worth it. I haven't had time to check whether the one fix I pulled in (I saw it go by in email) can account for these improvements:

https://github.com/apache/solr/pull/505/commits/dfb887201e65f4e2a0b4aecadaba0f46d5e527df

The original thing that set me off was that I was seeing lots of thread leak suite failures. It's also entirely possible that my 10 runs with one fail after the update were just luck... previously 4-6 fails in 10 runs was common.

On Fri, Jan 14, 2022 at 7:44 AM Jan Høydahl <[email protected]> wrote:

> Reviving this thread from last November.
>
> I see that e.g. TestCollectionAPI.test has failed on Solr-check-main
> lately. Failure rate 6%.
> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.api.collections.TestCollectionAPI.test
> Think it's time to do a @BadApple and test-fixing raid to get rid of all
> the noise.
>
> Jan
>
> On 21 Nov 2021, at 05:40, Gus Heck <[email protected]> wrote:
>
> So in trying to assess whether my changes for
> https://issues.apache.org/jira/browse/SOLR-15590 had any effect on tests
> I ran the full clean check 10 times on main and branch (from the date of
> this email).
> The results were not suggestive of a difference but they were
> depressing:
>
> TEST FAILS:
>
> Main:
>
> Failed on run 4,5,7,9,10, hung on run 2 (4 runs succeeded, 7 total
> failures)
>   Ran for over 1.5h (killed manually):
>     1 @ :solr:core:test > Executing test
>         org.apache.solr.cloud.CollectionPropsTest (run 2)
>         (message on screen when killed)
>   Thread leaks:
>     1 @ org.apache.solr.cloud.TestLeaderElectionZkExpiry.classMethod
>         (:solr:core) (run 10)
>     2 @ org.apache.solr.schema.TestBulkSchemaConcurrent.classMethod
>         (:solr:core) (run 4,7)
>   Others:
>     2 @ org.apache.solr.cloud.api.collections.TestCollectionAPI.test
>         (:solr:core) (run 7,9) (not repro)
>     1 @ org.apache.solr.common.util.TestJsonRecordReader.testArrayOfRootObjects
>         (:solr:solrj) (run 5) (not repro)
>     1 @ org.apache.solr.cloud.TestPullReplica.testRemoveAllWriterReplicas
>         (:solr:core) (run 4) (not repro)
>
> Branch:
>
> Failed on run 1,2,6,8,9,10 (4 runs succeeded, 6 total failures)
>   Thread leaks:
>     1 @ org.apache.solr.schema.TestBulkSchemaConcurrent.classMethod
>         (:solr:core) (run 1)
>     2 @ org.apache.solr.handler.designer.TestSchemaDesignerSettingsDAO.classMethod
>         (:solr:core) (run 9,10)
>     1 @ org.apache.solr.client.solrj.impl.CloudHttp2SolrClientBuilderTest.classMethod
>         (:solr:solrj) (run 2)
>     1 @ org.apache.solr.cloud.TestLeaderElectionZkExpiry.classMethod
>         (:solr:core) (run 8)
>   Other:
>     1 @ org.apache.solr.cloud.OverseerTest.testShardLeaderChange
>         (:solr:core) (run 6) (not repro)
>
> There is nothing suggestive of a difference here, and both branch and
> main are in terrible shape, each passing less than 50% of runs. I got
> curious about how much this does tell me, and the sample is so small
> that it really only gives us good confidence that the fail rate is over
> 30% on each branch. Fiddling with R Studio a bit shows that depressingly
> large numbers of runs are needed to reliably detect anything like a 20%
> change in build failure rate.
> For example (assuming I didn't make an error interpreting this; stats
> class was several decades ago), if we take the 60% just observed as
> current and introduce a bad change that causes a 20% increase in overall
> build flakiness, to 80%, then if we allow ourselves a 10% chance of type
> 1 error and a 10% chance of type 2 error, we need to run the build 135
> times in total...
> (output from R version 3.6.3, chi-square)
>
>> > power.prop.test(n=NULL, p1=0.6, p2=0.8, power=0.9, sig.level=0.1,
>>     alternative="one.sided")
>>
>>      Two-sample comparison of proportions power calculation
>>
>>               n = 67.32733
>>              p1 = 0.6
>>              p2 = 0.8
>>       sig.level = 0.1
>>           power = 0.9
>>     alternative = one.sided
>>
>> NOTE: n is number in *each* group
>
> Right now there is basically no practical way to be confident that you
> have not introduced or caused a change in the flakiness of the build.
> This also more or less proves what I had suspected... The only way to
> control flaky tests is to completely kill them, since any sane number of
> builds has a good likelihood of misleading you unless you do something
> that leads to a virtual 100% fail rate.
>
> On Mon, Nov 15, 2021 at 7:45 PM Gus Heck <[email protected]> wrote:
>
>> Things seem to be not great.
>>
>> I was testing my branch and saw one class with a non-reproducing
>> failure twice in 9 runs (7 successes).
>>
>> So I checked out main to see what the failure rates looked like there:
>>
>> I've now run it 5 times and had 6 test failures. All of these are
>> different tests and all failing tests pass repeatedly if run alone.
>> Things seem to have gone downhill at the end of April:
>>
>> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html
>>
>> As I write, check and nightly are failing on Jenkins (and, amusingly,
>> badapples is passing):
>>
>> https://ci-builds.apache.org/job/Solr/
>>
>> As it stands, there's even a chance that some of what I've done in
>> reworking how CoreContainer starts up may have improved the test
>> situation for many tests (but the one that's failing is doing so with
>> a thread leak detected from httpclient).
>>
>> I'm really a bit alarmed at the number of failures I'm seeing on main.
>>
>> Let me know if you have been building main with no issues recently
>> (then I need to figure out what's wrong with my setup, or why it's
>> more sensitive). Or let me know if you have an idea why main fails so
>> much.
>>
>> -Gus
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
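
The power.prop.test figure quoted in the thread can be re-checked without R. This is a minimal sketch, assuming the usual one-sided normal-approximation formula for comparing two proportions (pooled variance under the null, unpooled under the alternative). The function name runs_per_group is hypothetical, and this is a re-derivation for illustration, not R's own code.

```python
from math import sqrt
from statistics import NormalDist


def runs_per_group(p1: float, p2: float, power: float = 0.9,
                   sig_level: float = 0.1) -> float:
    """Runs needed in EACH group to tell failure rate p1 from p2 (one-sided)."""
    z_alpha = NormalDist().inv_cdf(1 - sig_level)  # type 1 error quantile
    z_beta = NormalDist().inv_cdf(power)           # type 2 error quantile
    # Standard deviation terms under the null (pooled) and the alternative.
    pooled = sqrt((p1 + p2) * (1 - (p1 + p2) / 2))
    unpooled = sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ((z_alpha * pooled + z_beta * unpooled) / abs(p1 - p2)) ** 2


# Same inputs as the quoted R call: 60% vs 80% failure rate.
n = runs_per_group(0.6, 0.8)
print(f"runs per group: {n:.2f}, total across both groups: {2 * n:.0f}")
```

With p1=0.6 and p2=0.8 this gives about 67.33 runs per group, in line with the R output, or roughly 135 runs in total across the two builds being compared.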
