I agree with Erick's response, and thus the test/assertion seems unreasonable.
If ZK is down, all bets are off on indexing proceeding. In practice, people
expect searches to continue for some time at least.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Apr 22, 2019 at 1:54 PM Erick Erickson <erickerick...@gmail.com> wrote:

> On the surface, I’m automatically suspicious of _anything_ that relies on
> an arbitrary wait period for a state to settle down. Would this 300ms sleep
> be adequate on a very fast machine running just one test?
>
> I don’t see the value of that assert anyway. I can’t come up with a use-case
> for a running Solr functioning incorrectly because it failed to update a
> document while ZooKeeper was shutting down.
>
> FWIW
> Erick
>
> > On Apr 22, 2019, at 8:42 AM, Gus Heck <gus.h...@gmail.com> wrote:
> >
> > BasicZkTest has the following bit of code, that I'm tripping on.
> >
> >     zkServer.shutdown();
> >
> >     // document indexing shouldn't stop immediately after a ZK disconnect
> >     assertU(adoc("id", "201"));
> >
> >     Thread.sleep(300);
> >
> >     // try a reconnect from disconnect
> >     zkServer = new ZkTestServer(zkDir, zkPort);
> >     zkServer.run(false);
> >
> > It's not entirely clear to me that this should always be true.
> > ZkStateReader has means to cache and watch various bits of information,
> > but if it hasn't done the caching yet it may need to talk to zk before
> > completing the request. I am trying to use Collection Properties as an
> > alternative location for looking up the routed alias for a collection.
> > Current code uses a core property, but this is inconvenient for testing
> > as it can't be altered in the test... or at least I didn't find a way to
> > alter it. Also, future features such as archiving older collections from
> > a TRA, might find it useful to be able to disconnect the older
> > collections from the alias, but right now that would require finding all
> > cores and editing properties for all of them...
> >
> > However BasicZkTest fails on this assert, because the fetching of
> > properties fails, throwing an exception.
> >
> > So is this assert really reasonable? It kind of feels unreasonable but
> > I'd like some background from other folks here...
> > https://issues.apache.org/jira/browse/SOLR-7819 seems to have discussed
> > this some, but the more I think about it, the more I'm convinced that
> > proceeding without zookeeper available seems dangerous. Any update sent
> > to an alias (TRA/CRA or regular) will need to check zookeeper for
> > example.... Also security.json is in zookeeper, so anyone running with
> > security on probably tries to hit zookeeper on a cache miss too.
> >
> > I guess it comes down to the question of whether solr cloud should work
> > while zookeeper is down/unavailable or not. This is the first I've run
> > into the notion that the answer might be yes. I'd always presumed that
> > if Zk went away all bets were off, because ZK is what makes a cloud out
> > of us.
> >
> > What I don't know is what existing use cases/installs might find this
> > assert critical (most of the above bug talked about LIR, and the comment
> > on the commit mentions leader election)
> >
> > Thoughts?
> >
> > -Gus