On 6/18/2020 1:35 AM, Mikhail Khludnev wrote:
I'm challenged with cluster recovery. Think about total failure: ZK
state is lost, however instanceDirs survived since they are mounted via
EBS. Let's say the collection is read-only and/or it doesn't have
replicas, just leaders.
Is there a way to create a new empty collection and say, hey here's
shard1 instance, shard2 instance is there etc?
The customer says that the old version of Solr does it automatically: when
an empty ZK is connected, the collection's shards just appear there. Right
now, due to https://issues.apache.org/jira/browse/SOLR-12066 (Cleanup
deleted core when node start), if instances with data dirs connect to an
empty ZK, it just wipes the dirs away.
I think that SOLR-12066 was a mistake. See SOLR-13396, which is linked
to SOLR-12066. There are some interesting ideas outlined in SOLR-13396.
There is info in the clusterstate that is currently not recorded
anywhere but zookeeper, making it impossible to fully reconstruct a
collection from existing cores when ZK data is lost.
A quick look at the cloud example on version 8.5.1 tells me that for
such reconstruction to be possible, in addition to what it currently
contains, core.properties would need to record the shard hash range, the
router, maxShardsPerNode, and autoAddReplicas. And there may be other
things related to features that the cloud example does not use.
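To make that gap concrete, here is a rough Python sketch that parses a
core.properties file and reports which of the extra values are absent.
The key names shardRange, router, maxShardsPerNode, and autoAddReplicas
are hypothetical names I made up for illustration; Solr does not write
them today. The sample file content is illustrative as well, not copied
from a real install.

```python
# Hypothetical key names for the extra values listed above; Solr does not
# currently write any of these to core.properties.
PROPOSED_KEYS = ("shardRange", "router", "maxShardsPerNode", "autoAddReplicas")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_for_reconstruction(props):
    """Return the proposed keys that this core.properties does not record."""
    return [key for key in PROPOSED_KEYS if key not in props]


# Illustrative content only (assumed shape of a SolrCloud core.properties).
SAMPLE = """\
name=gettingstarted_shard1_replica_n1
collection=gettingstarted
shard=shard1
coreNodeName=core_node3
replicaType=NRT
numShards=2
collection.configName=gettingstarted
"""

if __name__ == "__main__":
    print(missing_for_reconstruction(parse_properties(SAMPLE)))
```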
If both properties and clusterstate in ZK are available, any mismatches
between the two should generate a WARN log, and ZK info should probably
be preferred over properties. A Collections API action should probably
be created to force mismatches back into agreement.
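The "prefer ZK, but warn" rule could look roughly like the Python sketch
below. It is only an illustration of the policy, not Solr code: the
function name reconcile and the flat dict representation of both sources
are assumptions.

```python
import logging

log = logging.getLogger("reconcile")


def reconcile(zk_state, core_props):
    """Merge two flat dicts, preferring ZK and warning on each disagreement.

    Returns the merged view plus the list of keys that disagreed, which is
    what a "force back into agreement" action would need to rewrite.
    """
    merged = dict(core_props)
    mismatches = []
    for key, zk_value in zk_state.items():
        local_value = core_props.get(key)
        if local_value is not None and local_value != zk_value:
            log.warning(
                "Mismatch for %s: ZK=%r, properties=%r; using ZK value",
                key, zk_value, local_value,
            )
            mismatches.append(key)
        merged[key] = zk_value  # ZK wins whenever it has a value
    return merged, mismatches
```

Keys that exist only in the properties are kept as-is, so the same
function would also cover the case where ZK has lost part of its data.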
Alternately, the new info could be recorded in a new file, with
cloud.properties being one possibility for the filename. I can think of
reasons to prefer this approach, but I worry about the stability of
adding a whole new file to the config mechanisms.
If the capability does not already exist, I think there should be some
combination of Collections API actions that will allow somebody to
manually reconstruct the collection clusterstate in ZK.
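One combination that might come close is CREATE with createNodeSet=EMPTY
followed by one ADDREPLICA per shard pointing dataDir at the surviving
index. Those parameters are documented, but I have not tested whether
this actually adopts the existing data, and with the compositeId router
the hash ranges are recomputed from numShards, so it could only match a
collection whose shards were never split. The Python sketch below just
builds the request URLs; the host, node name, and paths in the usage
part are made up.

```python
from urllib.parse import urlencode


def collections_url(base, params):
    """Build a Collections API URL from a base Solr URL and a param dict."""
    return f"{base}/admin/collections?{urlencode(params)}"


def rebuild_requests(base, collection, config, shards):
    """Return the URLs for an untested manual rebuild.

    shards maps a shard name to a (node_name, data_dir) tuple.
    """
    urls = [collections_url(base, {
        "action": "CREATE",
        "name": collection,
        "numShards": len(shards),
        "collection.configName": config,
        "createNodeSet": "EMPTY",  # create the collection with no replicas
    })]
    for shard, (node, data_dir) in sorted(shards.items()):
        urls.append(collections_url(base, {
            "action": "ADDREPLICA",
            "collection": collection,
            "shard": shard,
            "node": node,
            "dataDir": data_dir,  # assumption: points at the surviving index
        }))
    return urls


if __name__ == "__main__":
    # Hypothetical host, node names, and paths, for illustration only.
    for url in rebuild_requests(
        "http://localhost:8983/solr", "c1", "c1_conf",
        {"shard1": ("host1:8983_solr", "/ebs/c1_shard1/data"),
         "shard2": ("host2:8983_solr", "/ebs/c1_shard2/data")},
    ):
        print(url)
```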
Side note: While playing with examples on 8.5.1 so I could be accurate
in this message, I discovered that the "Files" tab in the admin UI has
issues, in both cloud and standalone mode. The following screenshot has
red lines marking the problems I found. Subdirectories do not work
correctly, the column for filenames is not wide enough for the example
configs, and the filenames do not have mouseover expansion, which would
be an alternate way to deal with really long filenames.
https://www.dropbox.com/s/4lm3uad2uv53630/SolrAdminFilesTabProblems.png?dl=0
That's probably worthy of an issue, but I don't want to open one without
discussion.
Thanks,
Shawn