[ 
https://issues.apache.org/jira/browse/NUTCH-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392138#comment-14392138
 ] 

Jorge Luis Betancourt Gonzalez commented on NUTCH-1771:
-------------------------------------------------------

+1 for this patch and for [~wastl-nagel]. Moving to a new class would allow 
us to write a small "segment checker". If the crawl process is stopped by a 
hard reboot, for instance, this tool could help locate the problematic 
segment before the crawl is started again.
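
As a rough illustration of the "segment checker" idea, here is a minimal 
sketch in plain Java. It is not the patch attached to this issue and not part 
of Nutch: the class name SegmentChecker and its methods are hypothetical, and 
the list of required subdirectories is an assumption about what the indexer 
reads from a Nutch 1.x segment. It also only looks at the local filesystem; a 
real tool would probably go through Hadoop's FileSystem API instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Nutch): finds segments that lack expected subdirectories. */
class SegmentChecker {

    // Assumption: these are the segment subdirectories indexing depends on.
    // A segment that was generated but never fetched has only crawl_generate.
    static final List<String> EXPECTED_PARTS = Arrays.asList(
        "crawl_fetch", "crawl_parse", "parse_data", "parse_text");

    /** Returns the expected subdirectory names that are missing from one segment. */
    static List<String> missingParts(Path segment) {
        List<String> missing = new ArrayList<>();
        for (String part : EXPECTED_PARTS) {
            if (!Files.isDirectory(segment.resolve(part))) {
                missing.add(part);
            }
        }
        return missing;
    }

    /** Returns every directory under segmentsDir that is missing at least one part. */
    static List<Path> incompleteSegments(Path segmentsDir) {
        List<Path> incomplete = new ArrayList<>();
        try (DirectoryStream<Path> segments = Files.newDirectoryStream(segmentsDir)) {
            for (Path segment : segments) {
                if (Files.isDirectory(segment) && !missingParts(segment).isEmpty()) {
                    incomplete.add(segment);
                }
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers can decide whether to skip or abort.
            throw new UncheckedIOException(e);
        }
        return incomplete;
    }

    /** Usage: java SegmentChecker <segments-dir> */
    public static void main(String[] args) {
        for (Path segment : incompleteSegments(Paths.get(args[0]))) {
            System.out.println(segment + " is missing " + missingParts(segment));
        }
    }
}
```

Run against the segments directory before indexing, this would list the 
segments to delete or re-fetch. It only checks that the directories exist, so 
a segment whose files are present but truncated would still pass.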

> Solrindex fails if a segment is corrupted or incomplete
> -------------------------------------------------------
>
>                 Key: NUTCH-1771
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1771
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.8, 1.10
>            Reporter: Diaa
>            Priority: Minor
>             Fix For: 1.11
>
>
> When using solrindex to index multiple segments via -dir segment,
> indexing fails if one or more segments are corrupted or incomplete 
> (generated but not fetched, for example).
> The failure is simply a java.io exception.
> Deleting the segment fixes the issue.
> The expected behavior should be one of the following:
> * skipping the segment and proceeding with others (while logging)
> * stopping the indexing and logging the failed segment



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
