[ https://issues.apache.org/jira/browse/SOLR-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15730254#comment-15730254 ]
Mike Drob commented on SOLR-9836:
---------------------------------

bq. I'm not sure that is the right exception to catch - very brittle. We should probably be mostly looking for CorruptedIndexException and if that doesn't cover a case at the Lucene level, look at improving that there. Even if the case of a 0 byte segments file with nothing to roll back on throws an EOFException today, it may not tomorrow. I think that is the goal of the CorruptIndexException - you can actually have a little more than momentary confidence that your code is not treating exceptions one way while things change underneath you over time.

I could add a check somewhere along the chain that would turn an {{EOF}} into a {{CorruptIndex}}. However, I'm not confident enough in the Lucene internals to know if this leads to eventual false positives somewhere...

It probably looks like:
{code:title=SegmentInfos.java:276}
    long generation = generationFromSegmentsFileName(segmentFileName);
    //System.out.println(Thread.currentThread() + ": SegmentInfos.readCommit " + segmentFileName);
+   ChecksumIndexInput saved = null;
    try (ChecksumIndexInput input = directory.openChecksumInput(segmentFileName, IOContext.READ)) {
+     saved = input;
      return readCommit(directory, input, generation);
+   } catch (EOFException e) {
+     throw new CorruptIndexException("Unexpected end of file while reading index.", saved, e);
    }
  }
{code}

But the method javadoc worries me: {{* Read a particular segmentFileName. Note that this may throw an IOException if a commit is in process.}}

Under what circumstances would this throw an IOException? Randomly returning CorruptIndex during normal operation is bad news.

> Add more graceful recovery steps when failing to create SolrCore
> ----------------------------------------------------------------
>
>                 Key: SOLR-9836
>                 URL: https://issues.apache.org/jira/browse/SOLR-9836
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: SolrCloud
>            Reporter: Mike Drob
>         Attachments: SOLR-9836.patch
>
>
> I have seen several cases where there is a zero-length segments_n file. We
> haven't identified the root cause of these issues (possibly a poorly timed
> crash during replication?) but if there is another node available then Solr
> should be able to recover from this situation. Currently, we log and give up
> on loading that core, leaving the user to manually intervene.
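
The EOF-to-corruption translation proposed in the comment above can be tried in isolation. The following is only a rough sketch under stated assumptions, not the Lucene code: {{CorruptDataException}}, {{EofTranslationSketch}} and {{readHeader}} are hypothetical names, and a JDK {{DataInputStream}} stands in for Lucene's {{ChecksumIndexInput}}. It also shows why the patch needs the {{saved}} variable: a resource declared in try-with-resources is not in scope in the catch clause and is already closed by the time the catch runs, so this sketch describes the resource by name instead.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

final class EofTranslationSketch {

    // Hypothetical stand-in for a corruption exception; this is NOT Lucene's class.
    static final class CorruptDataException extends IOException {
        CorruptDataException(String message, String resourceDescription, Throwable cause) {
            super(message + " (resource=" + resourceDescription + ")", cause);
        }
    }

    // Reads a 4-byte big-endian header. A truncated or empty input is reported
    // as CorruptDataException instead of a bare EOFException.
    static int readHeader(byte[] contents, String resourceName) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(contents))) {
            return in.readInt();
        } catch (EOFException e) {
            // `in` is out of scope and already closed here, so use the name we were given.
            throw new CorruptDataException(
                "Unexpected end of file while reading index.", resourceName, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Well-formed input: the header value is returned.
        System.out.println(readHeader(new byte[] {0, 0, 0, 42}, "segments_1"));
        // Zero-length input, like the empty segments_n file from the issue.
        try {
            readHeader(new byte[0], "segments_2");
        } catch (CorruptDataException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The sketch does not address the open question from the comment: whether an EOF can also occur during normal operation (for example while a commit is in progress), in which case translating every EOF into a corruption error would produce false positives.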