On 6/18/2020 1:35 AM, Mikhail Khludnev wrote:
I'm challenged with cluster recovery. Think about total failure: ZK state is lost, however instanceDirs survived since they are mounted via EBS. Let's say the collection is read-only and/or it doesn't have replicas, just leaders. Is there a way to create a new empty collection and say, hey, here's the shard1 instance, the shard2 instance is there, etc.?

Customer says that the old version of Solr does it automatically: when an empty ZK is connected, the collection's shards just appear there. Right now, due to https://issues.apache.org/jira/browse/SOLR-12066 (Cleanup deleted core when node start), if instances with data dirs connect to an empty ZK, it just wipes the dirs away.

I think that SOLR-12066 was a mistake. See SOLR-13396, which is linked to SOLR-12066. There are some interesting ideas outlined in SOLR-13396.

There is info in the clusterstate that is currently not recorded anywhere but zookeeper, making it impossible to fully reconstruct a collection from existing cores when ZK data is lost.

A quick look at the cloud example on version 8.5.1 tells me that for such reconstruction to be possible, in addition to what it currently contains, core.properties would need to record the shard hash range, the router, maxShardsPerNode, and autoAddReplicas. And there may be other things related to features that the cloud example does not use.
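As a rough illustration of that gap, here is a minimal Java sketch that checks whether a core.properties file carries the extra fields needed for reconstruction. The four key names (shardRange, router, maxShardsPerNode, autoAddReplicas) are hypothetical: no Solr release reads or writes them in core.properties, and the sample values are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

final class CorePropsCheck {
    // Hypothetical keys: these names are assumptions for illustration only.
    static final List<String> EXTRA_KEYS =
        Arrays.asList("shardRange", "router", "maxShardsPerNode", "autoAddReplicas");

    // Returns the hypothetical keys that the given core.properties text lacks.
    static List<String> missingKeys(String corePropertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(corePropertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        for (String key : EXTRA_KEYS) {
            if (props.getProperty(key) == null) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up sample: the usual identifying fields plus two of the extras.
        String sample = String.join("\n",
            "name=gettingstarted_shard1_replica_n1",
            "collection=gettingstarted",
            "shard=shard1",
            "shardRange=80000000-ffffffff",
            "router=compositeId");
        System.out.println("Missing for reconstruction: " + missingKeys(sample));
    }
}
```

A core whose list comes back empty would, under these assumptions, have enough recorded locally to rebuild its part of the clusterstate.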

If both properties and clusterstate in ZK are available, any mismatches between the two should generate a WARN log, and ZK info should probably be preferred over properties. A Collections API action should probably be created to force mismatches back into agreement.
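The preference rule could look something like the following Java sketch. It is not existing Solr code: the class, the method, and the property keys are all hypothetical, and plain maps stand in for the real properties and clusterstate objects. ZK wins on any conflict, and each conflict produces a WARN-style message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CloudStateReconciler {
    // Merges locally recorded values with values from ZK (hypothetical helper).
    // ZK is preferred on conflict; every mismatch is appended to `warnings`.
    static Map<String, String> reconcile(Map<String, String> fromProps,
                                         Map<String, String> fromZk,
                                         List<String> warnings) {
        Map<String, String> result = new LinkedHashMap<>(fromProps);
        for (Map.Entry<String, String> entry : fromZk.entrySet()) {
            String key = entry.getKey();
            String zkValue = entry.getValue();
            String localValue = fromProps.get(key);
            if (localValue != null && !localValue.equals(zkValue)) {
                warnings.add("WARN mismatch for " + key + ": properties=" + localValue
                    + ", zk=" + zkValue + " (using zk)");
            }
            result.put(key, zkValue);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("shardRange", "80000000-ffffffff"); // hypothetical key
        props.put("router", "compositeId");           // hypothetical key
        Map<String, String> zk = new LinkedHashMap<>();
        zk.put("shardRange", "0-7fffffff");
        List<String> warnings = new ArrayList<>();
        System.out.println(reconcile(props, zk, warnings));
        warnings.forEach(System.out::println);
    }
}
```

When ZK has no entry for a key, the local value is kept, which is the case that matters for recovery after ZK data is lost.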

Alternatively, the new info could be recorded in a new file, with cloud.properties being one possibility for the filename. I can think of reasons to prefer this approach, but I worry about the stability impact of adding a whole new file to the config mechanisms.

If the capability does not already exist, I think there should be some combination of Collections API actions that will allow somebody to manually reconstruct the collection clusterstate in ZK.

Side note: While playing with examples on 8.5.1 so I could be accurate in this message, I discovered that the "Files" tab in the admin UI has issues, in both cloud and standalone mode. The following screenshot has some red lines marking the problems I found. Subdirectories do not work correctly, the column for filenames is not wide enough for the example configs, and the filenames do not have mouseover expansion, which would be an alternate way to deal with really long filenames.

https://www.dropbox.com/s/4lm3uad2uv53630/SolrAdminFilesTabProblems.png?dl=0

That's probably worthy of an issue, but I don't want to open one without discussion.

Thanks,
Shawn
