Yes Marcus, this is the commit. David, I would have expected 50% failures, as 50% of the runs use distributed updates. I’ll try to understand better as I fix the issue.
Ilan

On Sun 21 Feb 2021 at 06:17, David Smiley <dsmi...@apache.org> wrote:

> Interesting. Do you have a guess as to why the failures there are ~5% and
> not 100% reproducible?
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Sat, Feb 20, 2021 at 6:41 PM Ilan Ginzburg <ilans...@gmail.com> wrote:
>
>> Indeed the issue is due to my changes.
>>
>> In OverseerStatusCmd I've skipped some stat collection when running in
>> distributed cluster state updates mode because I thought these were only
>> stats related to cluster state updates.
>> Obviously that was too aggressive and some of the stats are related to
>> the Collection API.
>>
>> I will make sure to skip returning only the stats that are related to
>> cluster state updater and restore returning collection api stats (when
>> running in distributed cluster updates mode, otherwise all stats are
>> returned).
>>
>> Tomorrow...
>>
>> Ilan
>>
>> On Sun, Feb 21, 2021 at 12:22 AM Ilan Ginzburg <ilans...@gmail.com>
>> wrote:
>>
>>> Thank you David for reporting this.
>>>
>>> Seems due to my recent changes. I reproduce the failure locally and will
>>> look at this tomorrow.
>>>
>>> With the distributed cluster state updates i've introduced a
>>> randomization for using either Overseer based cluster state updates or
>>> distributed cluster state updates in tests. This failure seems to happen in
>>> the distributed state update case. I suspect it is due to Overseer
>>> returning less stats than expected by the test (which is expected: Overseer
>>> cannot return stats about cluster state updates if it does not handle
>>> cluster state updates).
>>>
>>> The following line in the logs tells that the run is using distributed
>>> cluster state:
>>> 972874 INFO (jetty-launcher-8973-thread-2) [ ]
>>> o.a.s.c.DistributedClusterStateUpdater Creating
>>> DistributedClusterStateUpdater with useDistributedStateUpdate=true. Solr
>>> will be using distributed cluster state updates.
>>>
>>> Ilan
>>>
>>>
>>> On Sat, Feb 20, 2021 at 3:00 PM David Smiley <dsmi...@apache.org> wrote:
>>>
>>>> I encountered a failure from OverseerStatusTest locally. According to
>>>> our test failure trends, this guy only just recently started failing ~4-5%
>>>> of the time, but previously was fine. Only master branch.
>>>>
>>>>
>>>> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.OverseerStatusTest.test
>>>>
>>>> ~ David Smiley
>>>> Apache Lucene/Solr Search Developer
>>>> http://www.linkedin.com/in/davidwsmiley
>>>>
>>>
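
For readers following the thread, the fix described above (in distributed cluster state updates mode, skip only the stats tied to the cluster state updater and keep returning the Collection API stats) could look roughly like the sketch below. This is only an illustration under assumptions: the class, the method, and the stat group names are hypothetical and are not taken from OverseerStatusCmd.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class OverseerStatsFilterSketch {

    // Hypothetical stat group names; the real keys in OverseerStatusCmd may differ.
    static final Set<String> CLUSTER_STATE_UPDATER_STATS =
            Set.of("cluster_state_update_operations", "cluster_state_update_queue");

    /**
     * Returns the stats to report. In distributed cluster state updates mode,
     * only the groups related to the cluster state updater are dropped;
     * everything else (such as Collection API stats) is kept. In Overseer
     * based mode, all stats are returned unchanged.
     */
    static Map<String, Object> statsToReturn(Map<String, Object> allStats,
                                             boolean distributedClusterStateUpdates) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : allStats.entrySet()) {
            boolean updaterStat = CLUSTER_STATE_UPDATER_STATS.contains(entry.getKey());
            if (distributedClusterStateUpdates && updaterStat) {
                continue; // Overseer does not handle these updates, so it has nothing to report
            }
            result.put(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> all = new LinkedHashMap<>();
        all.put("cluster_state_update_operations", 12);
        all.put("cluster_state_update_queue", 0);
        all.put("collection_api_operations", 7);

        // Distributed mode: only the Collection API group remains.
        System.out.println(statsToReturn(all, true));
        // Overseer based mode: all groups are returned.
        System.out.println(statsToReturn(all, false));
    }
}
```

The too-aggressive behavior described in the thread would correspond to dropping every group when the flag is true, which is why a test expecting Collection API stats would fail only in runs randomized into distributed mode.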