Re: Unit tests

Gus Heck Fri, 14 Jan 2022 06:52:49 -0800

Actually, I spent a bunch of time fiddling with tests last weekend, which is
why I didn't get to the things. On a checkout of main that wasn't too
terribly old (maybe 2 weeks old at that time? I hadn't seen anything that
looked like it should fix the tests, and Fucit still looked on fire, so I just
continued with what I had), I did some fixes, mostly centering on shutdown
race conditions. I got to a point where 9/10 runs passed.


However, as a check I then stashed my changes, updated, ran the tests 10
times and got 1 failure, so I'm now pondering whether my changes were worth
it. I haven't had time to check whether the one fix I saw go by in email, and
pulled in, can account for these improvements... (
https://github.com/apache/solr/pull/505/commits/dfb887201e65f4e2a0b4aecadaba0f46d5e527df)
What originally set me off was seeing lots of thread leak suite failures.

It's also entirely possible that my 10 runs with one failure after the update
were just luck; previously, 4-6 failures in 10 runs were common.
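To put a rough number on "just luck": assuming every run fails independently with the same probability (an idealization; real flaky failures are often correlated), the chance of seeing at most 1 failure in 10 runs while the true per-run failure rate is still 40-60% follows from the binomial distribution. A minimal sketch, where `prob_at_most` is a hypothetical helper:

```python
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p): chance of k or fewer failing runs out of n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Assumed "true" per-run failure rates, matching the 4-6 fails in 10 runs seen before.
for rate in (0.4, 0.5, 0.6):
    print(f"fail rate {rate:.0%}: P(<= 1 failure in 10 runs) = {prob_at_most(1, 10, rate):.4f}")
```

For 40%, 50% and 60% this comes out to roughly 4.6%, 1.1% and 0.2%, so a 1-in-10 result would be fairly unlikely if nothing had changed, though not impossible.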

On Fri, Jan 14, 2022 at 7:44 AM Jan Høydahl <[email protected]> wrote:

> Reviving this thread from last November.
>
> I see that e.g. TestCollectionAPI.test has failed on Solr-check-main
> lately, with a failure rate of 6%.
> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.api.collections.TestCollectionAPI.test
> I think it's time for a @BadApple and test-fixing raid to get rid of all
> the noise.
>
> Jan
>
> On 21 Nov 2021, at 05:40, Gus Heck <[email protected]> wrote:
>
> So, in trying to assess whether my changes for
> https://issues.apache.org/jira/browse/SOLR-15590 had any effect on tests,
> I ran the full clean check 10 times each on main and on the branch (both as
> of the date of this email). The results did not suggest a difference, but
> they were depressing:
>
> TEST FAILS:
>
> Main:
>
> Failed on runs 4, 5, 7, 9 and 10; hung on run 2 (4 runs succeeded, 7 total failures)
>   Ran for over 1.5h (killed manually):
>     1 @ :solr:core:test > Executing test org.apache.solr.cloud.CollectionPropsTest (run 2) (message on screen when killed)
>   Thread leaks:
>     1 @ org.apache.solr.cloud.TestLeaderElectionZkExpiry.classMethod (:solr:core) (run 10)
>     2 @ org.apache.solr.schema.TestBulkSchemaConcurrent.classMethod (:solr:core) (runs 4, 7)
>   Others:
>     2 @ org.apache.solr.cloud.api.collections.TestCollectionAPI.test (:solr:core) (runs 7, 9) (not repro)
>     1 @ org.apache.solr.common.util.TestJsonRecordReader.testArrayOfRootObjects (:solr:solrj) (run 5) (not repro)
>     1 @ org.apache.solr.cloud.TestPullReplica.testRemoveAllWriterReplicas (:solr:core) (run 4) (not repro)
>
> Branch:
>
> Failed on runs 1, 2, 6, 8, 9 and 10 (4 runs succeeded, 6 total failures)
>   Thread leaks:
>     1 @ org.apache.solr.schema.TestBulkSchemaConcurrent.classMethod (:solr:core) (run 1)
>     2 @ org.apache.solr.handler.designer.TestSchemaDesignerSettingsDAO.classMethod (:solr:core) (runs 9, 10)
>     1 @ org.apache.solr.client.solrj.impl.CloudHttp2SolrClientBuilderTest.classMethod (:solr:solrj) (run 2)
>     1 @ org.apache.solr.cloud.TestLeaderElectionZkExpiry.classMethod (:solr:core) (run 8)
>   Others:
>     1 @ org.apache.solr.cloud.OverseerTest.testShardLeaderChange (:solr:core) (run 6) (not repro)
>
> There is nothing here that suggests a difference, and both branch and main
> are in terrible shape, each passing less than 50% of the time. I got curious
> about how much this actually tells me: the sample is so small that it really
> only gives us good confidence that the failure rate is over 30% on each
> branch. Fiddling with RStudio a bit gives me depressingly large numbers of
> runs needed to reliably detect anything like a 20% change in build failure
> rate. For example (assuming I didn't make an error interpreting this; my
> stats class was several decades ago), if we take the 60% failure rate just
> observed as current and introduce a bad change that raises overall build
> flakiness by 20 percentage points, to 80%, then allowing ourselves a 10%
> chance of type 1 error and a 10% chance of type 2 error, we need to run the
> build about 135 times in total... (output from R version 3.6.3, chi-square)
>
> power.prop.test(n=NULL, p1=0.6, p2=0.8, power=0.9, sig.level=0.1,
>                 alternative="one.sided")
>
>      Two-sample comparison of proportions power calculation
>
>               n = 67.32733
>              p1 = 0.6
>              p2 = 0.8
>       sig.level = 0.1
>           power = 0.9
>     alternative = one.sided
>
> NOTE: n is number in *each* group
>
>
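The R result can be cross-checked by hand. Assuming R's `power.prop.test` (with its default `strict = FALSE`) solves the usual normal-approximation power equation for a one-sided two-proportion test, the per-group sample size has a closed form. The sketch below evaluates it with the standard library's `statistics.NormalDist`; the `n_per_group` name is made up for illustration:

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, sig_level: float, power: float) -> float:
    """Runs needed per group to detect p1 vs p2 with a one-sided two-proportion z-test.

    Assumed to match the normal-approximation formula behind R's power.prop.test.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - sig_level)  # one-sided critical value
    z_beta = z(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2
    pooled = math.sqrt(2 * p_bar * (1 - p_bar))          # spread under "no change"
    unpooled = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))  # spread under the real change
    return ((z_alpha * pooled + z_beta * unpooled) / abs(p1 - p2)) ** 2


n = n_per_group(0.6, 0.8, sig_level=0.1, power=0.9)
print(f"n = {n:.2f} per group; {math.ceil(n)} runs each, {2 * math.ceil(n)} total")
```

This prints about 67.33 per group, in line with the R output; rounding up to whole builds gives 68 runs on each of main and the branch, 136 in total.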
> Right now there is basically no practical way to be confident that a change
> has not altered the flakiness of the build. This also more or less confirms
> what I had suspected: the only way to control flaky tests is to kill them
> completely, since any sane number of builds has a good chance of misleading
> you unless a change produces a virtually 100% failure rate.
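The flip side is how many builds it takes just to notice a flaky test at all. As a sketch, again assuming independent runs with a fixed failure probability p, a test that fails 6% of the time (like the TestCollectionAPI.test rate quoted earlier in the thread) passes all of 10 runs with probability 0.94^10, and the number of runs for a 95% chance of seeing at least one failure comes from solving (1 - p)^n <= 0.05. `runs_to_observe_failure` is a hypothetical helper:

```python
import math


def runs_to_observe_failure(fail_rate: float, confidence: float) -> int:
    """Smallest n with P(at least one failure in n independent runs) >= confidence."""
    # P(no failures in n runs) = (1 - fail_rate) ** n, so solve (1 - fail_rate) ** n <= 1 - confidence.
    estimate = math.log(1 - confidence) / math.log(1 - fail_rate)
    # Start one below the analytic answer and step up, to absorb floating-point rounding.
    n = max(1, math.ceil(estimate) - 1)
    while 1 - (1 - fail_rate) ** n < confidence:
        n += 1
    return n


rate = 0.06  # e.g. the 6% failure rate reported for TestCollectionAPI.test
print(f"P(no failure in 10 runs) = {(1 - rate) ** 10:.3f}")
print(f"runs for a 95% chance of seeing it fail: {runs_to_observe_failure(rate, 0.95)}")
```

This prints roughly 0.539 and 49: ten clean runs would miss such a test more often than not.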
>
> On Mon, Nov 15, 2021 at 7:45 PM Gus Heck <[email protected]> wrote:
>
>> Things seem to be not great.
>>
>> I was testing my branch and saw one class with a non-reproducing failure
>> twice in 9 runs (7 successes).
>>
>> So I checked out main to see what the failure rates looked like there:
>>
>> I've now run main 5 times and had 6 test failures. All of these are
>> different tests, and all of the failing tests pass repeatedly when run
>> alone. Things seem to have gone downhill at the end of April:
>>
>>
>> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html
>>
>> As I write, check and nightly are failing on Jenkins (and, amusingly,
>> badapples is passing):
>>
>> https://ci-builds.apache.org/job/Solr/
>>
>> As it stands, there's even a chance that some of what I've done in
>> reworking how CoreContainer starts up may have improved the test situation
>> for many tests (but the one that's failing is doing so with a thread leak
>> detected from httpclient).
>>
>> I'm really a bit alarmed at the number of failures I'm seeing on main.
>>
>> Let me know if you have been building main with no issues recently (then I
>> need to figure out what's wrong with my setup, or why it's more
>> sensitive), or let me know if you have an idea why main fails so much.
>>
>> -Gus
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>
>
>
>
>

