Hi,

Below is a patch to IndexingFilters.java that avoids running duplicate filters. Duplicates can occur if you inadvertently put multiple copies of a plugin on the plugin.folders path list.

Plugins currently don't follow a contract to add fields only once, so if a filter runs more than once you end up with Documents containing multiple fields with the same names and values, which may badly affect search results.
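
To illustrate the idea outside of Nutch, here is a minimal standalone sketch of de-duplicating filters by class name. The Filter interface and the TitleFilter/AnchorFilter classes are hypothetical stand-ins, not Nutch classes. The sketch also assumes you want to keep the original load order, so it uses LinkedHashMap; the patch below uses a plain HashMap, whose iteration order is unspecified.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupFilters {
    // Hypothetical stand-in for an indexing filter; not the Nutch interface.
    interface Filter { String apply(String doc); }

    static class TitleFilter implements Filter {
        public String apply(String doc) { return doc + "[title]"; }
    }

    static class AnchorFilter implements Filter {
        public String apply(String doc) { return doc + "[anchor]"; }
    }

    // Keep only the first instance seen for each class name.
    // LinkedHashMap preserves insertion order (an assumption that order matters).
    static List<Filter> dedupeByClassName(List<Filter> filters) {
        Map<String, Filter> byName = new LinkedHashMap<>();
        for (Filter f : filters) {
            byName.putIfAbsent(f.getClass().getName(), f);
        }
        return new ArrayList<>(byName.values());
    }

    public static void main(String[] args) {
        List<Filter> loaded = new ArrayList<>();
        loaded.add(new TitleFilter());
        loaded.add(new AnchorFilter());
        loaded.add(new TitleFilter()); // duplicate, as if the plugin were found twice

        String doc = "doc";
        for (Filter f : dedupeByClassName(loaded)) {
            doc = f.apply(doc);
        }
        System.out.println(doc); // prints "doc[title][anchor]"
    }
}
```

Without the de-duplication step the same input would produce "doc[title][anchor][title]", which mirrors the duplicated-field problem described above.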

--
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)


Index: IndexingFilters.java
===================================================================
RCS file: /cvsroot/nutch/nutch/src/java/net/nutch/indexer/IndexingFilters.java,v
retrieving revision 1.1
diff -b -d -u -r1.1 IndexingFilters.java
--- IndexingFilters.java        28 Jun 2004 21:26:35 -0000      1.1
+++ IndexingFilters.java        7 Jul 2004 16:13:39 -0000
@@ -3,6 +3,8 @@
 
 package net.nutch.indexer;
 
+import java.util.HashMap;
+
 import org.apache.lucene.document.Document;
 
 import net.nutch.plugin.*;
@@ -20,11 +22,15 @@
       if (point == null)
         throw new RuntimeException(IndexingFilter.X_POINT_ID+" not found.");
       Extension[] extensions = point.getExtentens();
-      CACHE = new IndexingFilter[extensions.length];
+      HashMap filterMap = new HashMap();
       for (int i = 0; i < extensions.length; i++) {
         Extension extension = extensions[i];
-        CACHE[i] = (IndexingFilter)extension.getExtensionInstance();
+        IndexingFilter filter = (IndexingFilter)extension.getExtensionInstance();
+        if (!filterMap.containsKey(filter.getClass().getName())) {
+          filterMap.put(filter.getClass().getName(), filter);
+        }
       }
+      CACHE = (IndexingFilter[])filterMap.values().toArray(new IndexingFilter[0]);
     } catch (PluginRuntimeException e) {
       throw new RuntimeException(e);
     }
@@ -36,7 +42,7 @@
   public static Document filter(Document doc, Parse parse, FetcherOutput fo)
     throws IndexingException {
 
-    for (int i = 0 ; i < CACHE.length; i++) {
+    for (int i = 0; i < CACHE.length; i++) {
       doc = CACHE[i].filter(doc, parse, fo);
     }
 
