[ https://issues.apache.org/jira/browse/SOLR-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mike Drob updated SOLR-9836:
----------------------------
    Attachment: SOLR-9836.patch

Current WIP patch.
* Moved {{modifyIndexProps}} to {{SolrCore}}
* Added a system property toggle for controlling the desired behaviour here.
** The property name and values are shots in the dark and by no means final.
** Used an enum because it made sense logically at the time; not sure whether this actually matters.
* Switched to looking for CorruptIndexException
* Falling back to an earlier segments file is not implemented yet, pending the questions below (there is a unit test for it, though).
** It's very hard to tell whether it was actually the segments file that was corrupt, or something else.
** Is it sufficient to delete {{segments_n}} and let Lucene try to read from the new "latest" commit? Will this screw up replication? Do we need to update the generation anywhere else? I'm also still nervous about indiscriminately deleting files when recovery might be possible. I guess that's the point of the config options.
** Another option is to hack a FilterDirectory onto the index that hides the latest segments_n file instead of deleting it. That might be enough to open the index, but we would likely end up with write conflicts the next time we commit.

The more I toss this idea around, the more it feels like something that would be handled more cleanly at the Lucene level. It's possibly best to start with two options (recover from leader, do nothing) instead of the three initially proposed by [~markrmil...@gmail.com], and expand on them later.

> Add more graceful recovery steps when failing to create SolrCore
> ----------------------------------------------------------------
>
>                 Key: SOLR-9836
>                 URL: https://issues.apache.org/jira/browse/SOLR-9836
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: SolrCloud
>            Reporter: Mike Drob
>         Attachments: SOLR-9836.patch, SOLR-9836.patch
>
>
> I have seen several cases where there is a zero-length segments_n file.
> We haven't identified the root cause of these issues (possibly a poorly timed
> crash during replication?) but if there is another node available then Solr
> should be able to recover from this situation. Currently, we log and give up
> on loading that core, leaving the user to manually intervene.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
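The system property toggle mentioned in the patch notes could look roughly like the minimal sketch below. It covers only the two options suggested in the comment (recover from leader, do nothing). The enum `CoreInitFailureAction`, its constants, and the property name `solr.coreInitFailureAction` are all hypothetical, since the patch says the names and values are not final. The sketch shows one way to read an enum-valued system property with a conservative default. It is not the patch's actual code.

```java
import java.util.Locale;

// Hypothetical enum covering the two options suggested in the comment.
// Names are illustrative only; they are not Solr's actual API.
enum CoreInitFailureAction {
    NONE,                // log the failure and leave the core unloaded (current behaviour)
    RECOVER_FROM_LEADER; // discard the local index and fetch a fresh copy from the leader

    // Hypothetical property name; the real one was still undecided in the patch.
    static final String PROPERTY = "solr.coreInitFailureAction";

    /** Reads the toggle from the system property, defaulting to NONE. */
    static CoreInitFailureAction fromSystemProperty() {
        return parse(System.getProperty(PROPERTY));
    }

    /** Parses a value case-insensitively; missing or unknown values map to NONE. */
    static CoreInitFailureAction parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return NONE;
        }
        try {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // Be conservative: a typo in the property must never trigger a destructive action.
            return NONE;
        }
    }
}
```

Falling back to `NONE` for unset or unrecognized values keeps today's behaviour (log and give up) unless an operator opts in explicitly. That fits the concern about deleting files when recovery might still be possible.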
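The minimal sketch below illustrates the zero-length `segments_n` symptom from the issue description. It finds the newest `segments_N` file in an index directory, reports whether that file is empty, and returns the next-older commit generation that a fallback would try. It assumes Lucene's naming convention, in which N is the commit generation written in base 36 (`Character.MAX_RADIX`). The class `SegmentsFileInspector` and its methods are hypothetical. The sketch uses plain `java.nio.file` rather than Lucene APIs and deliberately deletes or hides nothing. It leaves open the question of whether deletion or a FilterDirectory is the right fallback.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical read-only helper: inspects segments_N files without opening or modifying the index.
final class SegmentsFileInspector {
    private static final String PREFIX = "segments_";

    /** Parses the generation from a name like "segments_a" (base 36); returns -1 for anything else. */
    static long generationOf(String fileName) {
        if (!fileName.startsWith(PREFIX)) {
            return -1;
        }
        try {
            return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Builds the file name for a generation, assuming the base-36 convention above. */
    static String fileName(long gen) {
        return PREFIX + Long.toString(gen, Character.MAX_RADIX);
    }

    /** Returns every commit generation found in the directory, newest first. */
    static List<Long> generationsNewestFirst(Path indexDir) throws IOException {
        List<Long> gens = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, PREFIX + "*")) {
            for (Path p : files) {
                long gen = generationOf(p.getFileName().toString());
                if (gen >= 0) {
                    gens.add(gen);
                }
            }
        }
        gens.sort(Collections.reverseOrder());
        return gens;
    }

    /** True if the newest segments_N file is zero bytes long (the symptom described in the issue). */
    static boolean latestIsEmpty(Path indexDir) throws IOException {
        List<Long> gens = generationsNewestFirst(indexDir);
        return !gens.isEmpty() && Files.size(indexDir.resolve(fileName(gens.get(0)))) == 0;
    }

    /** The generation a fallback would try next (second newest), or -1 if there is none. */
    static long previousGeneration(Path indexDir) throws IOException {
        List<Long> gens = generationsNewestFirst(indexDir);
        return gens.size() >= 2 ? gens.get(1) : -1;
    }
}
```

Finding an older generation is only the easy half. As the comment notes, whether Lucene, replication, and the next commit stay consistent after skipping the newest `segments_N` is the open question.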