Author: siren
Date: Tue Oct 24 08:21:43 2006
New Revision: 467355

URL: http://svn.apache.org/viewvc?view=rev&rev=467355
Log:
fix for NUTCH-379

Modified:
    lucene/nutch/trunk/CHANGES.txt
    lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseUtil.java

Modified: lucene/nutch/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/nutch/trunk/CHANGES.txt?view=diff&rev=467355&r1=467354&r2=467355
==============================================================================
--- lucene/nutch/trunk/CHANGES.txt (original)
+++ lucene/nutch/trunk/CHANGES.txt Tue Oct 24 08:21:43 2006
@@ -50,9 +50,6 @@
 
 17. NUTCH-383 - upgrade to Hadoop 0.7.1 and Lucene 2.0.0. (ab)
 
-18. NUTCH-391 - ParseUtil logs file contents to log file when it cannot
-    find parser (siren)
-
 ****************************** WARNING !!! ********************************
 * This upgrade breaks data format compatibility. A tool 'convertdb'       *
 * was added to migrate existing CrawlDb-s to the new format. Segment data *
@@ -63,6 +60,11 @@
 
 18. NUTCH-371 - DeleteDuplicates now correctly implements both parts of
     the algorithm. (ab)
 
+19. NUTCH-391 - ParseUtil logs file contents to log file when it cannot
+    find parser (siren)
+
+20. NUTCH-379 - ParseUtil does not pass through the content's URL to the
+    ParserFactory (Chris A. Mattmann via siren)
 
 Release 0.8 - 2006-07-25

Modified: lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseUtil.java
URL: http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseUtil.java?view=diff&rev=467355&r1=467354&r2=467355
==============================================================================
--- lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseUtil.java (original)
+++ lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseUtil.java Tue Oct 24 08:21:43 2006
@@ -65,7 +65,8 @@
     Parser[] parsers = null;
 
     try {
-      parsers = this.parserFactory.getParsers(content.getContentType(), "");
+      parsers = this.parserFactory.getParsers(content.getContentType(),
+                content.getUrl() != null ? content.getUrl():"");
     } catch (ParserNotFound e) {
       if (LOG.isWarnEnabled()) {
         LOG.warn("No suitable parser found when trying to parse content " + content.getUrl() +