[ https://issues.apache.org/jira/browse/SOLR-13396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816367#comment-16816367 ]

Kevin Risden commented on SOLR-13396:
-------------------------------------

I agree that arbitrarily deleting data is bad. The other issue is how you clean 
up if you JUST have the error/warn. It would be nice if, in addition to knowing 
there was a problem, you also knew what you needed to do about it.

So I will caveat this by saying I have no idea how this works today, but when I 
read this I thought it would make sense for each node responsible for a 
shard/collection to have to "ack" that the operation was complete. If the node 
was down at the time, then when it comes back up it should know it needs to do 
"xyz" and finish the operation.

Again, I'm not sure of the ZK details, but here are some rough ideas (see the 
sketch after this list):
* Create a znode for each node with the list of operations it needs to complete 
- this would be written to by the leader?
* Keep track of which operations each node has completed on an existing list 
before deleting it? - I think this could be hard, since the leader could change.
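
To make the first idea concrete, here is a minimal sketch using the plain 
ZooKeeper client API. Everything in it is hypothetical - the /pending-ops path, 
the payload format, and the PendingOpsQueue/OpHandler names are illustrative, 
not Solr's actual layout or code:

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

/**
 * Hypothetical helper: the leader records an operation under a per-node
 * znode, and the target node "acks" by deleting the entry once it has
 * actually applied the change.
 */
public class PendingOpsQueue {

  private static final String BASE = "/pending-ops"; // illustrative path
  private final ZooKeeper zk;

  public PendingOpsQueue(ZooKeeper zk) {
    this.zk = zk;
  }

  /** Leader side: enqueue an operation for a (possibly down) node. */
  public void enqueue(String nodeName, String op)
      throws KeeperException, InterruptedException {
    ensureExists(BASE);
    ensureExists(BASE + "/" + nodeName);
    // Sequential children preserve per-node operation order.
    zk.create(BASE + "/" + nodeName + "/op-",
        op.getBytes(StandardCharsets.UTF_8),
        ZooDefs.Ids.OPEN_ACL_UNSAFE,
        CreateMode.PERSISTENT_SEQUENTIAL);
  }

  /** Node side, on startup: apply and ack (delete) each pending op. */
  public void drain(String nodeName, OpHandler handler)
      throws KeeperException, InterruptedException {
    String nodePath = BASE + "/" + nodeName;
    if (zk.exists(nodePath, false) == null) {
      return; // nothing pending for this node
    }
    List<String> children = new ArrayList<>(zk.getChildren(nodePath, false));
    children.sort(String::compareTo); // sequential suffix gives order
    for (String child : children) {
      String path = nodePath + "/" + child;
      byte[] data = zk.getData(path, false, null);
      handler.apply(new String(data, StandardCharsets.UTF_8));
      // "Ack" by deleting only after the operation actually completed.
      zk.delete(path, -1);
    }
  }

  private void ensureExists(String path)
      throws KeeperException, InterruptedException {
    if (zk.exists(path, false) == null) {
      try {
        zk.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
            CreateMode.PERSISTENT);
      } catch (KeeperException.NodeExistsException ignore) {
        // another node created it concurrently - fine
      }
    }
  }

  public interface OpHandler {
    void apply(String op) throws InterruptedException;
  }
}

The key point is that the entry is only deleted after the handler finishes, so 
a node that dies mid-operation would see the op again on restart.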

One of the concerns would be the added load on ZK from reading/writing these 
operations.

The above could have already been considered when building SolrCloud, so it 
might be a nonstarter.

> SolrCloud will delete the core data for any core that is not referenced in 
> the clusterstate
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13396
>                 URL: https://issues.apache.org/jira/browse/SOLR-13396
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 7.3.1, 8.0
>            Reporter: Shawn Heisey
>            Priority: Major
>
> SOLR-12066 is an improvement designed to delete core data for replicas that 
> were deleted while the node was down -- better cleanup.
> In practice, that change causes SolrCloud to delete all core data for cores 
> that are not referenced in the ZK clusterstate.  If all the ZK data gets 
> deleted or the Solr instance is pointed at a ZK ensemble with no data, it 
> will proceed to delete all of the cores in the solr home, with no possibility 
> of recovery.
> I do not think that Solr should ever delete core data unless an explicit 
> DELETE action has been made and the node is operational at the time of the 
> request.  If a core exists during startup that cannot be found in the ZK 
> clusterstate, it should be ignored (not started) and a helpful message should 
> be logged.  I think that message should probably be at WARN so that it shows 
> up in the admin UI logging tab with default settings.
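
For concreteness, the startup behavior the issue asks for might look roughly 
like the sketch below. This is plain illustrative Java, not Solr's actual 
CoreContainer code path; coreName, replicasInClusterState, and loadCore are 
stand-ins:

import java.util.Set;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Illustrative sketch of "ignore and warn" instead of "delete". */
public class OrphanCoreCheck {
  private static final Logger log =
      LoggerFactory.getLogger(OrphanCoreCheck.class);

  /** Returns true if the core was loaded, false if it was skipped. */
  public boolean startCore(String coreName,
      Set<String> replicasInClusterState) {
    if (!replicasInClusterState.contains(coreName)) {
      // Do NOT delete anything: leave the data on disk, tell the operator.
      log.warn("Core {} found on disk but not in ZK clusterstate; ignoring "
          + "it (not loading, not deleting). Remove it explicitly with a "
          + "DELETE request if it is truly unwanted.", coreName);
      return false;
    }
    loadCore(coreName);
    return true;
  }

  private void loadCore(String coreName) {
    // placeholder for the real core-loading logic
  }
}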


