In trying to assess whether my changes for
https://issues.apache.org/jira/browse/SOLR-15590 had any effect on tests, I
ran the full clean check 10 times each on main and on my branch (as of the
date of this email). The results were not suggestive of a difference, but
they were depressing:
TEST FAILS:
Main:
Failed on runs 4, 5, 7, 9, 10; hung on run 2 (4 runs succeeded, 7 total failures)
Ran for over 1.5h (killed manually):
  1 @ :solr:core:test > Executing test org.apache.solr.cloud.CollectionPropsTest (run 2) (message on screen when killed)
Thread leaks:
  1 @ org.apache.solr.cloud.TestLeaderElectionZkExpiry.classMethod (:solr:core) (run 10)
  2 @ org.apache.solr.schema.TestBulkSchemaConcurrent.classMethod (:solr:core) (run 4,7)
Others:
  2 @ org.apache.solr.cloud.api.collections.TestCollectionAPI.test (:solr:core) (run 7,9) (not repro)
  1 @ org.apache.solr.common.util.TestJsonRecordReader.testArrayOfRootObjects (:solr:solrj) (run 5) (not repro)
  1 @ org.apache.solr.cloud.TestPullReplica.testRemoveAllWriterReplicas (:solr:core) (run 4) (not repro)
Branch:
Failed on runs 1, 2, 6, 8, 9, 10 (4 runs succeeded, 6 total failures)
Thread leaks:
  1 @ org.apache.solr.schema.TestBulkSchemaConcurrent.classMethod (:solr:core) (run 1)
  2 @ org.apache.solr.handler.designer.TestSchemaDesignerSettingsDAO.classMethod (:solr:core) (run 9,10)
  1 @ org.apache.solr.client.solrj.impl.CloudHttp2SolrClientBuilderTest.classMethod (:solr:solrj) (run 2)
  1 @ org.apache.solr.cloud.TestLeaderElectionZkExpiry.classMethod (:solr:core) (run 8)
Others:
  1 @ org.apache.solr.cloud.OverseerTest.testShardLeaderChange (:solr:core) (run 6) (not repro)
There is nothing suggestive of a difference here, and both branch and main
are in terrible shape, each passing less than 50% of the time. I got curious
about how much this actually tells me, and the sample is so small that it
really only gives us good confidence that the fail rate is over 30% on each
branch. Fiddling with R Studio a bit gives me depressingly large numbers of
runs to reliably detect anything like a 20% change in build failure rate.
For example (assuming I didn't make an error interpreting this; stats class
was several decades ago), if we take the 60% failure rate just observed as
current and introduce a bad change that increases overall build flakiness by
20 percentage points, to 80%, then allowing ourselves a 10% chance of type 1
error and a 10% chance of type 2 error, we need to run the build about 135
times in total (n = 67.3 per branch, so 68 each after rounding up).
(output from R version 3.6.3, chi-square)
    power.prop.test(n=NULL, p1=0.6, p2=0.8, power=0.9, sig.level=0.1,
                    alternative="one.sided")

         Two-sample comparison of proportions power calculation

                  n = 67.32733
                 p1 = 0.6
                 p2 = 0.8
          sig.level = 0.1
              power = 0.9
        alternative = one.sided

    NOTE: n is number in *each* group

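As a cross-check on the R output above, the same number can be reproduced
from the usual normal-approximation sample-size formula for comparing two
proportions. This is a sketch in Python (standard library only) rather than
R. The function name runs_per_branch is made up for illustration, and I am
assuming this is the formula power.prop.test solves in the one-sided case.

```python
from math import ceil, sqrt
from statistics import NormalDist


def runs_per_branch(p1, p2, power=0.9, sig_level=0.1):
    """Approximate runs needed per branch to detect p1 -> p2 (one-sided).

    Assumption: normal approximation to the binomial, equal group sizes.
    """
    z_alpha = NormalDist().inv_cdf(1 - sig_level)  # quantile for type 1 error
    z_beta = NormalDist().inv_cdf(power)           # quantile for type 2 error
    pooled = (p1 + p2) * (1 - (p1 + p2) / 2)       # variance term, no difference
    separate = p1 * (1 - p1) + p2 * (1 - p2)       # variance term, real difference
    numerator = z_alpha * sqrt(pooled) + z_beta * sqrt(separate)
    return (numerator / abs(p2 - p1)) ** 2


n = runs_per_branch(0.6, 0.8)
print(f"per branch: {n:.2f}, rounded up: {ceil(n)}, both branches: {2 * ceil(n)}")
```

With p1=0.6 and p2=0.8 this lands on roughly 67.3 runs per branch, in line
with the R result, so 68 full builds of main plus 68 of the branch once you
round up.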
Right now there is basically no practical way to be confident that you have
not introduced a change in the flakiness of the build. This also more or
less confirms what I had suspected: the only way to control flaky tests is
to completely kill them, since any sane number of builds has a good
likelihood of misleading you unless you do something that leads to a
virtually 100% fail rate.
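To put a rough number on "misleading", here is a minimal Python sketch
(standard library only) for a hypothetical scenario: main truly fails 60% of
builds, a bad branch truly fails 80%, and we ask how often the branch still
shows no more failed runs than main. It assumes every build fails
independently with a fixed probability, which real builds may well not do,
and the function names are made up for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k failed builds out of n at failure rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def chance_branch_looks_no_worse(n, p_main, p_branch):
    """P(failed branch runs <= failed main runs) with n runs of each."""
    return sum(
        binom_pmf(b, n, p_branch) * binom_pmf(m, n, p_main)
        for m in range(n + 1)   # m = failed runs on main
        for b in range(m + 1)   # b = failed runs on branch, at most m
    )


for runs in (10, 68):
    p = chance_branch_looks_no_worse(runs, p_main=0.6, p_branch=0.8)
    print(f"{runs} runs each: branch looks no worse with probability {p:.3f}")
```

Comparing the 10-run and 68-run lines shows how much of the apparent "no
difference" at small sample sizes can be plain luck.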
On Mon, Nov 15, 2021 at 7:45 PM Gus Heck wrote:
> Things seem to be not great.
>
> I was testing my branch and saw one class with a non-reproducing failure
> twice in 9 runs (7 successes)
>
> So I checked out main to see what the failure rates looked like there:
>
> I've now run main 5 times and had 6 test failures. All of these are
> different tests, and all failing tests pass repeatedly if run alone. Things
> seem to have gone downhill at the end of April:
>
> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html
>
> As I write, check and nightly are failing on Jenkins (and amusingly
> badapples is passing)
>
> https://ci-builds.apache.org/job/Solr/
>
> As it stands there's even a chance that some of what I've done in
> reworking how CoreContainer starts up may have improved the test
> situation for many tests (but the one that's failing is doing so with a
> thread leak detected from httpclient)
>
> I'm really a bit alarmed at the number of failures I'm seeing on main.
>
> Let me know if you have been building main with no issues recently. (then
> I need to figure out what's wrong with my setup, or why it's more
> sensitive) Or, let me know if you have an idea why main fails so much.
>
> -Gus
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>