@Ishan ~ Can you look at the question Mike raised about https://issues.apache.org/jira/browse/SOLR-15135 please?
So the AutoscalingHistoryHandlerTest has a number of hard-coded wait times in it. While I can appreciate the need for waiting to see state changes occur, tests like this aren't great for CI and RC smoke tests given the variability of hardware. Case in point, I made this change:

```
diff --git a/solr/core/src/test/org/apache/solr/handler/admin/AutoscalingHistoryHandlerTest.java b/solr/core/src/test/org/apache/solr/handler/admin/AutoscalingHistoryHandlerTest.java
index a9eea7f7ca5..3b2d39c3317 100644
--- a/solr/core/src/test/org/apache/solr/handler/admin/AutoscalingHistoryHandlerTest.java
+++ b/solr/core/src/test/org/apache/solr/handler/admin/AutoscalingHistoryHandlerTest.java
@@ -282,7 +282,7 @@ public class AutoscalingHistoryHandlerTest extends SolrCloudTestCase {
     boolean await = actionFiredLatch.await(60, TimeUnit.SECONDS);
     assertTrue("action did not execute", await);
-    await = listenerFiredLatch.await(60, TimeUnit.SECONDS);
+    await = listenerFiredLatch.await(120, TimeUnit.SECONDS);
     assertTrue("listener did not execute", await);
     waitForRecovery(COLL_NAME);
```

And of course, beasting passes 5 out of 5; it fails pretty consistently on the first run w/o this change.

So I vote we @BadApple this test for 8.8.1 and move forward with RC2 now that Ishan's changes are in. Moreover, since we removed auto-scaling from master, holding up a critical bug fix for a test that fails intermittently b/c of timing seems imprudent. I'm also biased in that I want to get the fix for 15145 out ASAP ;-)

On Tue, Feb 16, 2021 at 9:08 AM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:

> Sounds good, Tim. I've ported the fix to the release branch. Just ran the
> tests to make sure it works fine.
> Thanks for the extra work you'll have to do (RC2) in order to save me
> future work (8.8.2). Really owe you one!
>
> > Are there other fixes you're aware of that are slated for 8.8.2 @Ishan Chattopadhyaya <ichattopadhy...@gmail.com>?
> I am not aware of anything else.
>
> On Tue, Feb 16, 2021 at 9:19 PM Timothy Potter <thelabd...@gmail.com>
> wrote:
>
>> I'm beasting AutoscalingHistoryHandlerTest locally now, I haven't seen
>> that one fail on my side yet.
>>
>> As far as respin 8.8.1 RC, it's not a problem for me and I prefer that to
>> doing an 8.8.2 soon after 8.8.1 comes out. Are there other fixes you're
>> aware of that are slated for 8.8.2 @Ishan Chattopadhyaya
>> <ichattopadhy...@gmail.com>? In other words, if the fix for 15138 is all
>> that will be in 8.8.2, let's just include it in 8.8.1 and hopefully we
>> won't need an 8.8.2 ;-)
>>
>> Tim
>>
>> On Tue, Feb 16, 2021 at 7:01 AM Michael Sokolov <msoko...@gmail.com>
>> wrote:
>>
>>> Hmm, I got a failure on
>>> org.apache.solr.handler.admin.AutoscalingHistoryHandlerTest.testHistory,
>>> but it did not reproduce (tried twice). Would that possibly also be
>>> addressed by those fixes?
>>>
>>> On Tue, Feb 16, 2021 at 7:38 AM Ishan Chattopadhyaya
>>> <ichattopadhy...@gmail.com> wrote:
>>> >
>>> > > The failure seems to be because of a timeout during collection
>>> > > creation
>>> >
>>> > Thanks for digging in. Seems like that is the exact class of fix that we did for SOLR-15138 and are planning for 8.8.2. Shall we backport that fix to the release branch now (for RC2 or 8.8.2)?
>>> >
>>> > > My h/w is really fast and beefy and may be that's why it doesn't get reproduced.
>>> > Same here, Ryzen 9 5950X (fastest mainstream CPU out there).
>>> >
>>> > On Tue, Feb 16, 2021 at 5:36 PM Michael McCandless <luc...@mikemccandless.com> wrote:
>>> >>
>>> >> Curious, the smoke tester passed for me on the first try:
>>> >>
>>> >> SUCCESS! [0:44:29.979512]
>>> >>
>>> >> Mike McCandless
>>> >>
>>> >> http://blog.mikemccandless.com
>>> >>
>>> >> On Sun, Feb 14, 2021 at 11:26 AM Timothy Potter <thelabd...@apache.org> wrote:
>>> >>>
>>> >>> Please vote for release candidate 1 for Lucene/Solr 8.8.1
>>> >>>
>>> >>> The artifacts can be downloaded from:
>>> >>>
>>> >>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC1-rev6a50a0315ac7e4979abb0b530857c7795bb3b928
>>> >>>
>>> >>> You can run the smoke tester directly with this command:
>>> >>>
>>> >>> python3 -u dev-tools/scripts/smokeTestRelease.py \
>>> >>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC1-rev6a50a0315ac7e4979abb0b530857c7795bb3b928
>>> >>>
>>> >>> The vote will be open for at least 72 hours i.e. until 2021-02-17 17:00 UTC.
>>> >>>
>>> >>> Here is my +1 ~ SUCCESS! [0:50:06.728441]
>>> >>>
>>> >>> In addition to the smoke test, I built a Docker image from solr-8.8.1.tgz locally and verified:
>>> >>>
>>> >>> a. A rolling upgrade of a 3-node 8.7.0 cluster to the 8.8.1 RC completes successfully w/o any NPEs or weirdness with leader election / recoveries.
>>> >>>
>>> >>> b. The base_url property is stored in replica state after the upgrade
>>> >>>
>>> >>> c. A basic client application built with SolrJ 8.7.0 can load cluster state info directly from ZK and query the 8.8.1 RC1 servers.
>>> >>>
>>> >>> d. Same client app built with SolrJ 8.8.0 works as well.
>>> >>>
>>> >>> As this bug-fix release is primarily needed to address a SolrJ back-compat break (SOLR-15145) and unfortunately our smoke tester framework does not test for backcompat of older SolrJ against the RC, I ask others to please test rolling upgrades of servers (ideally multi-node clusters) running pre-8.8.0 to this RC if possible. Also, please try client applications that are using an older SolrJ, esp. those that load cluster state directly from ZK.
>>> >>>
>>> >>> Best regards,
>>> >>>
>>> >>> Tim