Hi - 

I'm getting the same problem with Nutch 0.9. Does the patch apply to that
version too?

- Grease


JIRA j...@apache.org wrote:
> 
> ParseSegment no longer allow reparsing
> --------------------------------------
> 
>                  Key: NUTCH-633
>                  URL: https://issues.apache.org/jira/browse/NUTCH-633
>              Project: Nutch
>           Issue Type: Bug
>     Affects Versions: 1.0.0
>          Environment: any
>             Reporter: Xue Yong Zhi
>             Priority: Minor
> 
> 
> ParseSegment used to allow reparsing even if parsing has been enabled in
> Fetcher. But now it throws a NumberFormatException as
> 'content.getMetadata().get(Nutch.FETCH_STATUS_KEY)' is null.
> 
> This patch will fix the problem:
> 
> --- a/src/java/org/apache/nutch/parse/ParseSegment.java
> +++ b/src/java/org/apache/nutch/parse/ParseSegment.java
> @@ -70,8 +70,10 @@ public class ParseSegment extends Configured implements Tool, Mapper<WritableCom
>        key = newKey;
>      }
>      
> +    //status_key is only available when parsing is not done in fetcher
> +    String status_key = content.getMetadata().get(Nutch.FETCH_STATUS_KEY);
>      int status =
> -      Integer.parseInt(content.getMetadata().get(Nutch.FETCH_STATUS_KEY));
> +      (null == status_key) ? CrawlDatum.STATUS_FETCH_SUCCESS : Integer.parseInt(status_key);
>      if (status != CrawlDatum.STATUS_FETCH_SUCCESS) {
>        // content not fetched successfully, skip document
>        LOG.debug("Skipping " + key + " as content is not fetched successfully");
> 
> 

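For anyone checking whether the same failure mode applies to their version, the null check from the quoted patch can be tried in isolation. This is only a minimal sketch, not Nutch code: the class name, the key string, and the constant value below are hypothetical stand-ins for Nutch.FETCH_STATUS_KEY and CrawlDatum.STATUS_FETCH_SUCCESS, and a plain Map stands in for the content metadata.

```java
import java.util.HashMap;
import java.util.Map;

public class FetchStatusDemo {
    // Hypothetical stand-ins; the real values live in Nutch and CrawlDatum.
    static final String FETCH_STATUS_KEY = "demo.fetch.status";
    static final int STATUS_FETCH_SUCCESS = 1;

    // Same shape as the patched code: a missing status entry is treated
    // as a successful fetch instead of being passed to Integer.parseInt.
    static int resolveStatus(Map<String, String> metadata) {
        String statusKey = metadata.get(FETCH_STATUS_KEY);
        return (statusKey == null) ? STATUS_FETCH_SUCCESS : Integer.parseInt(statusKey);
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();

        // Unpatched behaviour: Integer.parseInt(null) throws NumberFormatException.
        try {
            Integer.parseInt(metadata.get(FETCH_STATUS_KEY));
        } catch (NumberFormatException e) {
            System.out.println("unpatched: NumberFormatException");
        }

        // Patched behaviour: falls back to the success status.
        System.out.println("patched, key missing: " + resolveStatus(metadata));

        // When the key is present, its value is parsed as before.
        metadata.put(FETCH_STATUS_KEY, "2");
        System.out.println("patched, key present: " + resolveStatus(metadata));
    }
}
```

Whether Nutch 0.9 reaches the same parseInt call is not something this sketch can answer; it only shows why a missing metadata entry produces the exception and how the fallback avoids it.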