Reviving this thread from last November.

I see that e.g. TestCollectionAPI.test has failed on Solr-check-main lately. 
Failure rate 6%. 
http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.api.collections.TestCollectionAPI.test
Think it's time to do a @BadApple and test-fixing raid to get rid of all the 
noise.

Jan

> On 21 Nov 2021, at 05:40, Gus Heck <[email protected]> wrote:
> 
> So in trying to assess whether my changes for 
> https://issues.apache.org/jira/browse/SOLR-15590 had any effect on tests, I 
> ran the full clean check 10 times on main and branch (from the date of this 
> email). The results were not suggestive of a difference but they were 
> depressing:
> 
> TEST FAILS:
> 
> Main:
> 
> Failed on run 4,5,7,9,10, hung on run 2 (4 runs succeeded, 7 total failures)
>   Ran for over 1.5h (killed manually):
>     1 @ :solr:core:test > Executing test 
> org.apache.solr.cloud.CollectionPropsTest (run 2) (message on screen when 
> killed)
>   Thread leaks:
>     1 @ org.apache.solr.cloud.TestLeaderElectionZkExpiry.classMethod 
> (:solr:core) (run 10)
>     2 @ org.apache.solr.schema.TestBulkSchemaConcurrent.classMethod 
> (:solr:core)  (run 4,7)
>   Others:
>     2 @ org.apache.solr.cloud.api.collections.TestCollectionAPI.test 
> (:solr:core) (run 7,9) (not repro)
>     1 @ 
> org.apache.solr.common.util.TestJsonRecordReader.testArrayOfRootObjects 
> (:solr:solrj) (run 5) (not repro)
>     1 @ org.apache.solr.cloud.TestPullReplica.testRemoveAllWriterReplicas 
> (:solr:core) (run 4) (not repro)
> 
> Branch:
> 
> Failed on run 1,2,6,8,9,10 (4 runs succeeded, 6 total failures)
>   Thread leaks:
>     1 @ org.apache.solr.schema.TestBulkSchemaConcurrent.classMethod 
> (:solr:core) (run 1)
>     2 @ 
> org.apache.solr.handler.designer.TestSchemaDesignerSettingsDAO.classMethod 
> (:solr:core) (run 9,10)
>     1 @ 
> org.apache.solr.client.solrj.impl.CloudHttp2SolrClientBuilderTest.classMethod 
> (:solr:solrj) (run 2)
>     1 @ org.apache.solr.cloud.TestLeaderElectionZkExpiry.classMethod 
> (:solr:core) (run 8)
>   Others:
>     1 @ org.apache.solr.cloud.OverseerTest.testShardLeaderChange (:solr:core) 
> (run 6) (not repro)
> 
> There is nothing suggestive of a difference here, and both branch and main 
> are in terrible shape, each passing less than 50% of the time. I got curious 
> about how much this actually tells me, and the sample is so small that it 
> really only gives us good confidence that the fail rate is over 30% on each 
> branch. Fiddling with R Studio a bit gives me depressingly large numbers of 
> runs needed to reliably detect anything like a 20 percentage point change in 
> build failure rate. For example (assuming I didn't make an error interpreting 
> this; stats class was several decades ago), if we take the 60% just observed 
> as current, and introduce a bad change that raises overall build flakiness by 
> 20 points to 80%, then if we allow ourselves a 10% chance of type 1 error and 
> a 10% chance of type 2 error, we need to run the build about 135 times in 
> total, split between the two builds being compared (output from R version 
> 3.6.3, chi-square):
> 
> power.prop.test(n=NULL, p1=0.6, p2=0.8, power=0.9, sig.level=0.1, 
> alternative="one.sided")
> 
>      Two-sample comparison of proportions power calculation 
> 
>               n = 67.32733
>              p1 = 0.6
>              p2 = 0.8
>       sig.level = 0.1
>           power = 0.9
>     alternative = one.sided
> 
> NOTE: n is number in *each* group
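
For anyone without R handy, here is a rough Python cross-check of the numbers above. It is only a sketch: it assumes the usual normal-approximation sample-size formula for comparing two proportions (pooled variance under the null, unpooled under the alternative), and the function name `runs_per_group` is made up for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def runs_per_group(p1, p2, power=0.9, sig_level=0.1):
    """Approximate builds needed per group to tell failure rate p1 from p2.

    One-sided test, normal approximation (an assumption of this sketch).
    """
    z_alpha = NormalDist().inv_cdf(1 - sig_level)  # type 1 error quantile
    z_beta = NormalDist().inv_cdf(power)           # type 2 error quantile
    p_bar = (p1 + p2) / 2
    sd_null = sqrt(2 * p_bar * (1 - p_bar))            # pooled, no real change
    sd_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))       # unpooled, real change
    return ((z_alpha * sd_null + z_beta * sd_alt) / abs(p1 - p2)) ** 2


n = runs_per_group(0.6, 0.8)
print(f"n per group = {n:.5f}")
print(f"whole builds: {ceil(n)} per group, {2 * ceil(n)} in total")
```

With p1=0.6 and p2=0.8 this gives roughly the same n per group as the R output. Rounding up to whole builds, that is 68 runs of each, so slightly more than twice 67.3.
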
> 
> Right now there is basically no practical way to be confident that you have 
> not introduced or caused a change in the flakiness of the build. This also 
> more or less proves what I had suspected: the only way to control flaky tests 
> is to completely kill them, since any sane number of builds has a good 
> likelihood of misleading you unless you do something that leads to a virtual 
> 100% fail rate.
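
To put rough numbers on how little 10 runs can tell us, here is a small Python sketch. It makes two assumptions the original mail does not state: a one-sided 95% exact (Clopper-Pearson) lower bound is used for the "over 30%" style claim, and builds are treated as independent coin flips. The names `lower_bound` and `chance_of_missing` are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k failed builds out of n at failure rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def lower_bound(failures, runs, confidence=0.95):
    """One-sided exact lower bound on the true build failure rate."""
    if failures == 0:
        return 0.0
    alpha = 1 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection; the upper tail grows with p
        mid = (lo + hi) / 2
        tail = sum(binom_pmf(k, runs, mid) for k in range(failures, runs + 1))
        if tail < alpha:
            lo = mid   # rate too low to explain this many failures
        else:
            hi = mid
    return lo


def chance_of_missing(p_before, p_after, runs):
    """P(the worse build shows no more failed runs than the old one)."""
    return sum(
        binom_pmf(a, runs, p_before) * binom_pmf(b, runs, p_after)
        for a in range(runs + 1)
        for b in range(a + 1)
    )


print(f"lower bound for 6/10 failures: {lower_bound(6, 10):.3f}")
print(f"chance 10 runs hide a 60% -> 80% regression: "
      f"{chance_of_missing(0.6, 0.8, 10):.3f}")
```

Under those assumptions, 6 failures in 10 runs only pins the failure rate to a little over 30%. The second number is the chance that a real 60% to 80% regression shows up in 10 runs as "no worse than before".
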
> 
> On Mon, Nov 15, 2021 at 7:45 PM Gus Heck <[email protected]> wrote:
> Things seem to be not great. 
> 
> I was testing my branch and saw one class with a non-reproducing failure 
> twice in 9 runs (7 successes) 
> 
> So I checked out main to see what the failure rates looked like there:
> 
> I've now run main 5 times and had 6 test failures. All of these are 
> different tests, and all failing tests pass repeatedly if run alone. Things 
> seem to have gone downhill at the end of April:
> 
> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html
> 
> As I write, check and nightly are failing on Jenkins (and, amusingly, 
> badapples is passing):
> 
> https://ci-builds.apache.org/job/Solr/
> 
> As it stands, there's even a chance that some of what I've done in reworking 
> how CoreContainer starts up may have improved the test situation for many 
> tests (but the one that's failing is doing so with a thread leak detected 
> from httpclient).
> 
> I'm really a bit alarmed at the number of failures I'm seeing on main. 
> 
> Let me know if you have been building main with no issues recently. (then I 
> need to figure out what's wrong with my setup, or why it's more sensitive) 
> Or, let me know if you have an idea why main fails so much.
> 
> -Gus
> 
> -- 
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
> 
> 
> -- 
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
