[ https://issues.apache.org/jira/browse/NUTCH-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392138#comment-14392138 ]
Jorge Luis Betancourt Gonzalez commented on NUTCH-1771:
-------------------------------------------------------

+1 for this patch and for [~wastl-nagel]'s suggestion: moving this to a new class would make it possible to write a small "segment checker". If the crawl process is stopped by a hard reboot, for instance, such a tool could help locate the problematic segment before the crawl is started again.

> Solrindex fails if a segment is corrupted or incomplete
> -------------------------------------------------------
>
>                 Key: NUTCH-1771
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1771
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.8, 1.10
>            Reporter: Diaa
>            Priority: Minor
>             Fix For: 1.11
>
>
> When using solrindex to index multiple segments via -dir segment,
> the indexing fails if one or more segments are corrupted/incomplete
> (generated but not fetched for example)
> The failure is simply java.io exception.
> Deleting the segment fixes the issue.
> The expected behavior should be one of the following:
> * skipping the segment and proceeding with others (while logging)
> * stopping the indexing and logging the failed segment

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
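As a rough illustration of the "segment checker" idea and of the first expected behavior (skip the segment and log it), here is a minimal Java sketch. It is hypothetical: `SegmentCheckerSketch` and its methods are made up for this example and are not Nutch's API. It reads the local filesystem through `java.nio.file` instead of Hadoop's `FileSystem`, and it assumes a segment is indexable only once `crawl_fetch`, `crawl_parse`, `parse_data` and `parse_text` exist (a segment that was generated but never fetched typically holds only `crawl_generate`).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical segment checker (illustration only, not Nutch's API).
 * Inspects segment directories on the local filesystem and keeps only
 * those that look complete enough to index.
 */
public class SegmentCheckerSketch {

    private static final Logger LOG = Logger.getLogger(SegmentCheckerSketch.class.getName());

    // Assumption: subdirectories a fetched and parsed Nutch 1.x segment
    // must contain before it can be indexed.
    static final String[] REQUIRED = {"crawl_fetch", "crawl_parse", "parse_data", "parse_text"};

    /** Returns the required subdirectories missing from one segment (empty if indexable). */
    static List<String> missingParts(Path segment) {
        List<String> missing = new ArrayList<>();
        for (String part : REQUIRED) {
            if (!Files.isDirectory(segment.resolve(part))) {
                missing.add(part);
            }
        }
        return missing;
    }

    /** Lists segments under segmentsDir, logs incomplete ones, returns only complete ones. */
    static List<Path> indexableSegments(Path segmentsDir) throws IOException {
        List<Path> children;
        try (Stream<Path> stream = Files.list(segmentsDir)) {
            children = stream.sorted().collect(Collectors.toList());
        }
        List<Path> result = new ArrayList<>();
        for (Path segment : children) {
            if (!Files.isDirectory(segment)) {
                continue; // ignore stray files next to the segments
            }
            List<String> missing = missingParts(segment);
            if (missing.isEmpty()) {
                result.add(segment);
            } else {
                // Behavior 1 from the issue: skip the segment, log it, keep going.
                LOG.warning("Skipping segment " + segment + ", missing: " + missing);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // The default path is only an example.
        Path dir = Paths.get(args.length > 0 ? args[0] : "crawl/segments");
        for (Path segment : indexableSegments(dir)) {
            System.out.println("indexable: " + segment);
        }
    }
}
```

For example, `java SegmentCheckerSketch crawl/segments` would print only the complete segments. Switching to the second expected behavior (stop and report the failed segment) would mean throwing an exception from the `else` branch instead of logging and continuing.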