Author: ab
Date: Fri Sep 22 14:49:09 2006
New Revision: 449102

URL: http://svn.apache.org/viewvc?view=rev&rev=449102
Log:
NUTCH-332: fix the problem of doubling scores caused by links pointing
to the current page (e.g. anchors).

Modified:
    lucene/nutch/trunk/CHANGES.txt
    lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseOutputFormat.java

Modified: lucene/nutch/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/nutch/trunk/CHANGES.txt?view=diff&rev=449102&r1=449101&r2=449102
==============================================================================
--- lucene/nutch/trunk/CHANGES.txt (original)
+++ lucene/nutch/trunk/CHANGES.txt Fri Sep 22 14:49:09 2006
@@ -29,6 +29,9 @@
     
 10. NUTCH-367 - DistributedSearch thown ClassCastException (siren)
 
+11. NUTCH-332 - Fix the problem of doubling scores caused by links pointing
+    to the current page (e.g. anchors). (Stefan Groschupf via ab)
+
 
 Release 0.8 - 2006-07-25
 

Modified: lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseOutputFormat.java
URL: http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseOutputFormat.java?view=diff&rev=449102&r1=449101&r2=449102
==============================================================================
--- lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseOutputFormat.java (original)
+++ lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseOutputFormat.java Fri Sep 22 14:49:09 2006
@@ -121,6 +121,8 @@
             } catch (Exception e) {
               toUrl = null;
             }
+            // ignore links to self (or anchors within the page)
+            if (fromUrl.equals(toUrl)) toUrl = null;
             if (toUrl != null) validCount++;
             toUrls[i] = toUrl;
           }
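The two added lines can be illustrated with a standalone sketch (not Nutch code). The helper `normalize` and the class `SelfLinkFilter` are hypothetical: `normalize` only rejects unparseable URLs and strips the `#fragment`. It assumes that the real normalization in the elided `try` block likewise reduces an in-page anchor such as `page.html#top` to `page.html`. That reduction is what lets `fromUrl.equals(toUrl)` catch anchors as well as plain self-links.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

class SelfLinkFilter {

  // Hypothetical stand-in for Nutch's URL normalization: it only validates
  // the syntax and drops the "#fragment" part of the URL.
  static String normalize(String url) {
    try {
      String s = new URI(url).toString();
      int hash = s.indexOf('#');
      return hash >= 0 ? s.substring(0, hash) : s;
    } catch (URISyntaxException e) {
      return null; // like the patch's catch block: bad links become null
    }
  }

  // Collects the outlinks that would count toward validCount.
  static List<String> validOutlinks(String fromUrl, String[] rawLinks) {
    List<String> result = new ArrayList<>();
    for (String raw : rawLinks) {
      String toUrl = normalize(raw);
      // The NUTCH-332 check: ignore links to self (or anchors within the page)
      if (fromUrl.equals(toUrl)) toUrl = null;
      if (toUrl != null) result.add(toUrl);
    }
    return result;
  }

  public static void main(String[] args) {
    String from = "http://example.com/page.html";
    String[] links = {
      "http://example.com/page.html#top", // anchor on the same page: dropped
      "http://example.com/page.html",     // direct self-link: dropped
      "http://example.com/other.html",    // real outlink: kept
      "not a url"                         // unparseable: dropped
    };
    System.out.println(validOutlinks(from, links));
  }
}
```

Without the self-link check, the first two entries would survive as outlinks of `page.html` pointing back at `page.html`. Any score the page passes along its outlinks would then partly flow back into the page itself, which is the effect the log message describes as doubling.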

