Author: dogacan
Date: Mon Jul 30 12:02:27 2007
New Revision: 561092

URL: http://svn.apache.org/viewvc?view=rev&rev=561092
Log:
NUTCH-514 - Indexer should only index pages with fetch status SUCCESS.

Modified:
    lucene/nutch/trunk/CHANGES.txt
    lucene/nutch/trunk/src/java/org/apache/nutch/indexer/Indexer.java

Modified: lucene/nutch/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/nutch/trunk/CHANGES.txt?view=diff&rev=561092&r1=561091&r2=561092
==============================================================================
--- lucene/nutch/trunk/CHANGES.txt (original)
+++ lucene/nutch/trunk/CHANGES.txt Mon Jul 30 12:02:27 2007
@@ -102,6 +102,11 @@
 34. NUTCH-525 - DeleteDuplicates generates ArrayIndexOutOfBoundsException 
     when trying to rerun dedup on a segment. (Vishal Shah via dogacan)
 
+35. NUTCH-514 - Indexer should only index pages with fetch status SUCCESS.
+    (dogacan) Note: There is a bigger problem, i.e how to deal
+    with redirected pages, and this issue can be considered as a band-aid 
+    for the time being. See NUTCH-273 and NUTCH-353 for more details. 
+
 Release 0.9 - 2007-04-02
 
  1. Changed log4j confiquration to log to stdout on commandline

Modified: lucene/nutch/trunk/src/java/org/apache/nutch/indexer/Indexer.java
URL: http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/indexer/Indexer.java?view=diff&rev=561092&r1=561091&r2=561092
==============================================================================
--- lucene/nutch/trunk/src/java/org/apache/nutch/indexer/Indexer.java (original)
+++ lucene/nutch/trunk/src/java/org/apache/nutch/indexer/Indexer.java Mon Jul 30 12:02:27 2007
@@ -214,7 +214,8 @@
       return;                                     // only have inlinks
     }
     
-    if (!parseData.getStatus().isSuccess()) {
+    if (!parseData.getStatus().isSuccess() || 
+        fetchDatum.getStatus() != CrawlDatum.STATUS_FETCH_SUCCESS) {
       return;
     }
 
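For readers without the Nutch source at hand, the effect of the new guard can be sketched in isolation. This is a minimal illustration, not the actual Indexer code: the class IndexGuardSketch, the method shouldIndex, and the numeric status values are hypothetical stand-ins (the real constants live in CrawlDatum), and the parse status is reduced to a boolean.

```java
// Illustrative stand-in only: these are NOT the real Nutch classes or values.
public class IndexGuardSketch {

    // Hypothetical status codes; the real ones are defined in
    // org.apache.nutch.crawl.CrawlDatum and may differ.
    static final byte STATUS_FETCH_SUCCESS = 1;
    static final byte STATUS_FETCH_REDIR_TEMP = 2;
    static final byte STATUS_FETCH_GONE = 3;

    /**
     * Mirrors the shape of the patched condition: a page is indexed only
     * if parsing succeeded AND the fetch status is exactly SUCCESS.
     */
    static boolean shouldIndex(boolean parseSucceeded, byte fetchStatus) {
        if (!parseSucceeded || fetchStatus != STATUS_FETCH_SUCCESS) {
            return false; // corresponds to the early "return;" in the patch
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(shouldIndex(true, STATUS_FETCH_SUCCESS));    // true
        System.out.println(shouldIndex(true, STATUS_FETCH_REDIR_TEMP)); // false
        System.out.println(shouldIndex(false, STATUS_FETCH_SUCCESS));   // false
    }
}
```

Before this change only the parse status was checked, so in this sketch the second case (a redirect whose parse succeeded) would have passed the guard. As the CHANGES.txt note says, this only skips such pages; how redirects should really be handled is left to NUTCH-273 and NUTCH-353.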



_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs
