Author: dogacan
Date: Mon Jul 30 12:02:27 2007
New Revision: 561092

URL: http://svn.apache.org/viewvc?view=rev&rev=561092
Log:
NUTCH-514 - Indexer should only index pages with fetch status SUCCESS.

Modified:
    lucene/nutch/trunk/CHANGES.txt
    lucene/nutch/trunk/src/java/org/apache/nutch/indexer/Indexer.java

Modified: lucene/nutch/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/nutch/trunk/CHANGES.txt?view=diff&rev=561092&r1=561091&r2=561092
==============================================================================
--- lucene/nutch/trunk/CHANGES.txt (original)
+++ lucene/nutch/trunk/CHANGES.txt Mon Jul 30 12:02:27 2007
@@ -102,6 +102,11 @@
 34. NUTCH-525 - DeleteDuplicates generates ArrayIndexOutOfBoundsException
     when trying to rerun dedup on a segment. (Vishal Shah via dogacan)
 
+35. NUTCH-514 - Indexer should only index pages with fetch status SUCCESS.
+    (dogacan) Note: There is a bigger problem, i.e how to deal
+    with redirected pages, and this issue can be considered as a band-aid
+    for the time being. See NUTCH-273 and NUTCH-353 for more details.
+
 Release 0.9 - 2007-04-02
 
  1. Changed log4j confiquration to log to stdout on commandline

Modified: lucene/nutch/trunk/src/java/org/apache/nutch/indexer/Indexer.java
URL: http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/indexer/Indexer.java?view=diff&rev=561092&r1=561091&r2=561092
==============================================================================
--- lucene/nutch/trunk/src/java/org/apache/nutch/indexer/Indexer.java (original)
+++ lucene/nutch/trunk/src/java/org/apache/nutch/indexer/Indexer.java Mon Jul 30 12:02:27 2007
@@ -214,7 +214,8 @@
       return;                                     // only have inlinks
     }
 
-    if (!parseData.getStatus().isSuccess()) {
+    if (!parseData.getStatus().isSuccess() ||
+        fetchDatum.getStatus() != CrawlDatum.STATUS_FETCH_SUCCESS) {
       return;
     }
 

_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs