Hi, I'm getting the same problem with Nutch 0.9 and am wondering whether the patch applies to that version too.
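In case it helps anyone reproduce this outside Nutch, here is a minimal standalone sketch of what I understand the quoted patch to do: Integer.parseInt fails on a missing (null) status value, and the patch falls back to "fetch success" instead. The class name, the key string and the status value are placeholders I made up for illustration, not the real Nutch constants, and a plain Map stands in for content.getMetadata().

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch; none of these names are the real Nutch classes.
class ReparseStatusSketch {
    // Placeholder values, NOT the real Nutch constants.
    static final String FETCH_STATUS_KEY = "fetch-status-placeholder";
    static final int STATUS_FETCH_SUCCESS = 1;

    // Old behaviour: parseInt on a missing (null) value throws NumberFormatException.
    static int statusUnpatched(Map<String, String> metadata) {
        return Integer.parseInt(metadata.get(FETCH_STATUS_KEY));
    }

    // Patched behaviour: a missing key is treated as a successful fetch.
    static int statusPatched(Map<String, String> metadata) {
        String statusKey = metadata.get(FETCH_STATUS_KEY);
        return (statusKey == null) ? STATUS_FETCH_SUCCESS : Integer.parseInt(statusKey);
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>(); // no status entry at all
        try {
            statusUnpatched(metadata);
        } catch (NumberFormatException e) {
            System.out.println("unpatched: NumberFormatException");
        }
        System.out.println("patched: " + statusPatched(metadata));
    }
}
```

Whether the 0.9 code path hits the same null lookup is exactly what I'm unsure about, so this only illustrates the guard itself.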
- Grease

JIRA j...@apache.org wrote:
>
> ParseSegment no longer allow reparsing
> --------------------------------------
>
>                 Key: NUTCH-633
>                 URL: https://issues.apache.org/jira/browse/NUTCH-633
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>         Environment: any
>            Reporter: Xue Yong Zhi
>            Priority: Minor
>
>
> ParseSegment used to allow reparsing even if parsing has been enabled in
> Fetcher. But now it throws a NumberFormatException as
> 'content.getMetadata().get(Nutch.FETCH_STATUS_KEY)' is null.
>
> This patch will fix the problem:
>
> --- a/src/java/org/apache/nutch/parse/ParseSegment.java
> +++ b/src/java/org/apache/nutch/parse/ParseSegment.java
> @@ -70,8 +70,10 @@ public class ParseSegment extends Configured implements Tool, Mapper<WritableCom
>        key = newKey;
>      }
>
> +    //status_key is only available when parsing is not done in fetcher
> +    String status_key = content.getMetadata().get(Nutch.FETCH_STATUS_KEY);
>      int status =
> -      Integer.parseInt(content.getMetadata().get(Nutch.FETCH_STATUS_KEY));
> +      (null == status_key) ? CrawlDatum.STATUS_FETCH_SUCCESS : Integer.parseInt(status_key);
>      if (status != CrawlDatum.STATUS_FETCH_SUCCESS) {
>        // content not fetched successfully, skip document
>        LOG.debug("Skipping " + key + " as content is not fetched successfully");
>
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>

--
View this message in context: http://www.nabble.com/-jira--Created%3A-%28NUTCH-633%29-ParseSegment-no-longer-allow-reparsing-tp17467079p21760251.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.