index-extra plugin creates additional fields in the index, based on
configurable logic
--------------------------------------------------------------------------------------
Key: NUTCH-422
URL: http://issues.apache.org/jira/browse/NUTCH-422
Project: Nutch
Issue Type: New Feature
Components: indexer
Affects Versions: 0.8.1
Environment: All environments
Reporter: Alan Tanaman
Extract from the Readme file:
A. Introduction
The index-extra plugin lets you configure additional fields to be added to the
index, each based on one of the following sources:
- The parsed text
- Metadata fields
- Fields already created on the document to be indexed
- A plain constant string
- A Java expression that combines one or more of the above and resolves to a
string
A regex can also be applied to any of the above, allowing fields to be
created based on patterns extracted from the source.
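To make the source-plus-regex idea concrete, here is a minimal Java sketch. The ExtraFieldSketch class, the SourceType enum and the "use capture group 1 if present, else the whole match" rule are assumptions for illustration only; they are not the plugin's actual classes or its index-extra-conf.xml format. The Java-expression source is left out to keep it short.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of deriving extra fields; not the plugin's real code. */
public class ExtraFieldSketch {

    /** Hypothetical source kinds, mirroring the list in the Readme. */
    enum SourceType { TEXT, METADATA, FIELD, CONSTANT }

    /** Looks up the raw value; key names a metadata or document field, or is the constant. */
    static String resolve(SourceType type, String key, String parsedText,
                          Map<String, String> metadata, Map<String, String> docFields) {
        switch (type) {
            case TEXT:     return parsedText;
            case METADATA: return metadata.get(key);
            case FIELD:    return docFields.get(key);
            case CONSTANT: return key; // the "key" is the constant value itself
            default:       throw new IllegalArgumentException("unknown source " + type);
        }
    }

    /** Applies an optional regex: group 1 if the pattern has one, else the whole match. */
    static String applyRegex(String value, String regex) {
        if (value == null || regex == null) {
            return value; // no regex configured: use the raw value
        }
        Matcher m = Pattern.compile(regex).matcher(value);
        if (!m.find()) {
            return null; // no match: no field value
        }
        return m.groupCount() >= 1 ? m.group(1) : m.group();
    }

    public static void main(String[] args) {
        String text = "Invoice number: INV-2006-0042. Total due: 120 EUR.";
        Map<String, String> meta = new LinkedHashMap<String, String>();
        meta.put("Content-Type", "application/pdf");
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("url", "http://example.com/docs/report.pdf");

        Map<String, String> extra = new LinkedHashMap<String, String>();
        extra.put("invoice", applyRegex(resolve(SourceType.TEXT, null, text, meta, doc),
                "Invoice number:\\s*(\\S+?)\\."));
        extra.put("mimeMajor", applyRegex(resolve(SourceType.METADATA, "Content-Type", text, meta, doc),
                "^([^/]+)/"));
        extra.put("site", applyRegex(resolve(SourceType.FIELD, "url", text, meta, doc),
                "^https?://([^/]+)"));
        extra.put("collection", resolve(SourceType.CONSTANT, "reports", text, meta, doc));
        System.out.println(extra);
        // {invoice=INV-2006-0042, mimeMajor=application, site=example.com, collection=reports}
    }
}
```

The field names (invoice, mimeMajor, site, collection) are made up; in the real plugin the fields, their sources and their regexes are defined in index-extra-conf.xml.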
B. Installation
1) Binaries only:
- Copy the 'index-extra' folder within index-extra-v1.0-bin-java1.5.zip
to NUTCHDIR/build.
- Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf, and configure
your extra fields in it.
- Enable the plugin by adding index-extra to the plugin.includes property
in the nutch-site.xml file.
2) Source code: Always refer to the Nutch wiki for detailed instructions on
building Nutch. In short:
- Copy the 'index-extra' folder within index-extra-v1.0-source.zip to
NUTCHDIR/src/plugin.
- Update NUTCHDIR/src/plugin/build.xml to include the plugin.
- Update the NUTCHDIR/default.properties file to include the plugin.
- Run ant to build.
- Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf, and configure
your extra fields in it.
- Enable the plugin by adding index-extra to the plugin.includes property
in the nutch-site.xml file.
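Both installation routes end by enabling the plugin in nutch-site.xml. Nutch decides which plugins to load by matching each plugin id against plugin.includes, which is a regular expression, so enabling index-extra amounts to adding it as one more alternative. The Java sketch below assumes whole-id matching (Pattern.matches); the PluginIncludesCheck class is hypothetical, and the "before" value is illustrative only, not the default of any particular Nutch release. Take the real starting value from conf/nutch-default.xml.

```java
import java.util.regex.Pattern;

/** Hypothetical check: would a plugin.includes regex enable a given plugin id? */
public class PluginIncludesCheck {

    /** Assumes the regex must match the whole plugin id. */
    static boolean isIncluded(String includesRegex, String pluginId) {
        return Pattern.matches(includesRegex, pluginId);
    }

    public static void main(String[] args) {
        // Illustrative value only; copy the real one from conf/nutch-default.xml.
        String before = "protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)";
        String after = before + "|index-extra";
        System.out.println(isIncluded(before, "index-extra")); // false
        System.out.println(isIncluded(after, "index-extra"));  // true
    }
}
```

Appending "|index-extra" keeps every previously enabled plugin matching while adding the new one; replacing the value outright would silently disable the others.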
C. Known Issues
1) For this plugin to work on document fields created by other index filters,
those filters must run before it, so that all the basic document fields
already exist. To ensure this, configure the indexingfilter.order property.
(See patch NUTCH-421, which adds support for the indexingfilter.order
property. If this patch is not applied, the plugin still works, but it cannot
use document fields created by other index filter plugins.)
2) At this stage, field boost cannot be used, because Nutch scoring overrides
the field boost with its own document-level boost calculation. This happens
at the end of the reduce method of org.apache.nutch.indexer.Indexer.
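Known Issue 1 comes down to ordering: a filter that derives a field from another document field only sees that field if the filter creating it has already run. The sketch below illustrates this with hypothetical Filter, BasicFilter and ExtraFilter classes and a hypothetical space-separated list of filter names. It does not use Nutch's IndexingFilter interface, and the exact format of indexingfilter.order in NUTCH-421 may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical demo of why index filter order matters; not Nutch's API. */
public class FilterOrderDemo {

    interface Filter {
        String name();
        void filter(Map<String, String> doc);
    }

    /** Stands in for a basic filter that creates the "url" field. */
    static class BasicFilter implements Filter {
        public String name() { return "basic"; }
        public void filter(Map<String, String> doc) {
            doc.put("url", "http://example.com/docs/report.pdf");
        }
    }

    /** Stands in for index-extra: derives "site" from an existing "url" field. */
    static class ExtraFilter implements Filter {
        public String name() { return "extra"; }
        public void filter(Map<String, String> doc) {
            String url = doc.get("url");
            if (url != null) { // nothing to derive from if "url" is not there yet
                doc.put("site", url.replaceFirst("^https?://([^/]+).*$", "$1"));
            }
        }
    }

    /** Runs the filters in the order given by a space-separated list of names. */
    static Map<String, String> run(String order, List<Filter> available) {
        Map<String, Filter> byName = new HashMap<String, Filter>();
        for (Filter f : available) {
            byName.put(f.name(), f);
        }
        Map<String, String> doc = new HashMap<String, String>();
        for (String name : order.trim().split("\\s+")) {
            Filter f = byName.get(name);
            if (f != null) {
                f.filter(doc);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        List<Filter> filters = new ArrayList<Filter>();
        filters.add(new ExtraFilter());
        filters.add(new BasicFilter());
        System.out.println(run("extra basic", filters).get("site")); // null
        System.out.println(run("basic extra", filters).get("site")); // example.com
    }
}
```

With "extra basic" the extra filter runs before "url" exists and silently produces nothing, which matches the Readme's note that, without ordering, the plugin still works but cannot use fields created by other filters.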
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers