Interesting. Do you have a guess as to why the failures there are ~5% and not 100% reproducible?
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sat, Feb 20, 2021 at 6:41 PM Ilan Ginzburg <ilans...@gmail.com> wrote:

> Indeed the issue is due to my changes.
>
> In OverseerStatusCmd I've skipped some stat collection when running in
> distributed cluster state updates mode because I thought these were only
> stats related to cluster state updates.
> Obviously that was too aggressive and some of the stats are related to the
> Collection API.
>
> I will make sure to skip returning only the stats that are related to
> cluster state updater and restore returning Collection API stats (when
> running in distributed cluster updates mode, otherwise all stats are
> returned).
>
> Tomorrow...
>
> Ilan
>
> On Sun, Feb 21, 2021 at 12:22 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
>
>> Thank you David for reporting this.
>>
>> Seems due to my recent changes. I reproduce the failure locally and will
>> look at this tomorrow.
>>
>> With the distributed cluster state updates I've introduced a
>> randomization for using either Overseer based cluster state updates or
>> distributed cluster state updates in tests. This failure seems to happen in
>> the distributed state update case. I suspect it is due to Overseer
>> returning fewer stats than expected by the test (which is expected: Overseer
>> cannot return stats about cluster state updates if it does not handle
>> cluster state updates).
>>
>> The following line in the logs tells that the run is using distributed
>> cluster state:
>> 972874 INFO (jetty-launcher-8973-thread-2) [ ]
>> o.a.s.c.DistributedClusterStateUpdater Creating
>> DistributedClusterStateUpdater with useDistributedStateUpdate=true. Solr
>> will be using distributed cluster state updates.
>>
>> Ilan
>>
>>
>> On Sat, Feb 20, 2021 at 3:00 PM David Smiley <dsmi...@apache.org> wrote:
>>
>>> I encountered a failure from OverseerStatusTest locally. According to
>>> our test failure trends, this guy only just recently started failing ~4-5%
>>> of the time, but previously was fine. Only master branch.
>>>
>>>
>>> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.OverseerStatusTest.test
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>