[ http://issues.apache.org/jira/browse/NUTCH-393?page=all ]
Eelco Lempsink updated NUTCH-393:
---------------------------------
Attachment: NUTCH-393.patch
Here's a complete patch against the latest revision that fixes this issue.
Note that adjusting Indexer.java alone is not enough: the loop in
IndexingFilters.java that executes each filter must also stop when doc == null,
otherwise the next filter in the chain would be handed a null document.
This means that once a filter decides to drop the document, no other filter can
undo that action.
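For illustration only, here is a minimal sketch of the behaviour described above. It is not the code from the attached patch: Doc, DocFilter, FilterChain and IndexSketch are hypothetical stand-ins rather than the real Nutch classes, and the sketch assumes a filter signals "drop this document" by returning null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an indexable document (not a Nutch class).
final class Doc {
    final String url;
    Doc(String url) { this.url = url; }
}

// Hypothetical filter contract: return the document, or null to drop it.
interface DocFilter {
    Doc filter(Doc doc);
}

// Hypothetical counterpart of the filter loop in IndexingFilters.java.
final class FilterChain {
    private final List<DocFilter> filters;

    FilterChain(List<DocFilter> filters) {
        this.filters = new ArrayList<>(filters);
    }

    // Runs the filters in order and stops at the first null result,
    // so later filters never see (and cannot revive) a dropped document.
    Doc apply(Doc doc) {
        for (DocFilter f : filters) {
            doc = f.filter(doc);
            if (doc == null) {
                return null;
            }
        }
        return doc;
    }
}

// Hypothetical counterpart of the caller in Indexer.java.
final class IndexSketch {
    static final List<String> indexed = new ArrayList<>();

    static void index(FilterChain chain, Doc doc) {
        Doc result = chain.apply(doc);
        if (result == null) return; // dropped by a filter: skip indexing
        indexed.add(result.url);
    }
}
```

With both guards in place, a filter that returns null ends the chain immediately and the caller skips the document instead of failing on it.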
> Indexer doesn't handle null documents returned by filters
> ---------------------------------------------------------
>
> Key: NUTCH-393
> URL: http://issues.apache.org/jira/browse/NUTCH-393
> Project: Nutch
> Issue Type: Bug
> Components: indexer
> Affects Versions: 0.8.1
> Reporter: Eelco Lempsink
> Attachments: NUTCH-393.patch
>
>
> Plugins (like IndexingFilter) may return a null value, but this isn't handled
> by the Indexer. A trivial adjustment is all it takes:
> @@ -237,6 +237,7 @@
>        if (LOG.isWarnEnabled()) { LOG.warn("Error indexing "+key+": "+e); }
>        return;
>      }
> +    if (doc == null) return;
>
>      float boost = 1.0f;
>      // run scoring filters
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers