Author: siren
Date: Tue Oct 24 08:21:43 2006
New Revision: 467355

URL: http://svn.apache.org/viewvc?view=rev&rev=467355
Log:
fix for NUTCH-379

Modified:
    lucene/nutch/trunk/CHANGES.txt
    lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseUtil.java

Modified: lucene/nutch/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/nutch/trunk/CHANGES.txt?view=diff&rev=467355&r1=467354&r2=467355
==============================================================================
--- lucene/nutch/trunk/CHANGES.txt (original)
+++ lucene/nutch/trunk/CHANGES.txt Tue Oct 24 08:21:43 2006
@@ -50,9 +50,6 @@
 
 17. NUTCH-383 - upgrade to Hadoop 0.7.1 and Lucene 2.0.0. (ab)
 
-18. NUTCH-391 - ParseUtil logs file contents to log file when it cannot
-    find parser (siren)
-
   ****************************** WARNING !!! ********************************
   * This upgrade breaks data format compatibility. A tool 'convertdb'       *
   * was added to migrate existing CrawlDb-s to the new format. Segment data *
@@ -63,6 +60,11 @@
 18. NUTCH-371 - DeleteDuplicates now correctly implements both parts of
     the algorithm. (ab)
 
+19. NUTCH-391 - ParseUtil logs file contents to log file when it cannot
+    find parser (siren)
+
+20. NUTCH-391 - ParseUtil does not pass through the content's URL to the
+    ParserFactory (Chris A. Mattmann via siren)
 
 
 Release 0.8 - 2006-07-25

Modified: lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseUtil.java
URL: http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseUtil.java?view=diff&rev=467355&r1=467354&r2=467355
==============================================================================
--- lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseUtil.java (original)
+++ lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseUtil.java Tue Oct 24 08:21:43 2006
@@ -65,7 +65,8 @@
     Parser[] parsers = null;
     
     try {
-      parsers = this.parserFactory.getParsers(content.getContentType(), "");
+      parsers = this.parserFactory.getParsers(content.getContentType(), 
+                content.getUrl() != null ? content.getUrl():"");
     } catch (ParserNotFound e) {
       if (LOG.isWarnEnabled()) {
         LOG.warn("No suitable parser found when trying to parse content " + content.getUrl() +

