[ 
https://issues.apache.org/jira/browse/NUTCH-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma closed NUTCH-260.
-------------------------------

    Resolution: Won't Fix

> Three new plugins that parse, index and query meta tags defined in the 
> configuration
> ------------------------------------------------------------------------------------
>
>                 Key: NUTCH-260
>                 URL: https://issues.apache.org/jira/browse/NUTCH-260
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer, searcher
>    Affects Versions: 0.7.2
>         Environment: Built and tested on Linux so far.
>            Reporter: Jake Vanderdray
>            Priority: Minor
>         Attachments: nutch_customizations.tar
>
>
> These plugins allow you to define meta tags in your nutch-site file that 
> you want to include in parsing, indexing and searching.  The query plugin 
> must replace query-basic.  The format for adding query terms to 
> nutch-site.xml is:
> <property>
>   <name>meta.names</name>
>   <value>keywords,recommended</value>
>   <description>This is a comma separated list of meta tag names that will
>   be parsed, indexed and searched against when parse-meta, index-meta and
>   query-meta are used.</description>
> </property>
> <property>
>   <name>meta.boosts</name>
>   <value>1.0,5.0</value>
>   <description>Comma separated list of boost values when searching using
>   query-meta.  The order of the values should match the order of meta.names.
>   </description>
> </property>
> Meta tags found are assumed to have either a single value or be a comma 
> separated list of values.  The values found are added to the index as Lucene 
> keywords (e.g. meta name=keywords content="First Thing, Second Thing" would 
> result in two keyword fields named "keywords".  The first would contain 
> "First Thing" and the second would contain "Second Thing").
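The configuration and value-splitting rules described above can be illustrated with a short Python sketch. This is not the plugins' actual code (they are Java plugins for Nutch 0.7.2); the function names parse_meta_config, split_meta_values and index_fields are hypothetical, and the strict length check between meta.names and meta.boosts is an assumption rather than documented behaviour.

```python
def parse_meta_config(names_value, boosts_value):
    """Pair each configured meta tag name with its boost, by position.

    Assumption: a length mismatch is treated as an error; the original
    plugins may handle this differently.
    """
    names = [n.strip() for n in names_value.split(",") if n.strip()]
    boosts = [float(b) for b in boosts_value.split(",") if b.strip()]
    if len(names) != len(boosts):
        raise ValueError("meta.names and meta.boosts must have the same length")
    return dict(zip(names, boosts))


def split_meta_values(content):
    """Split a meta tag's value string on commas into individual values."""
    return [v.strip() for v in content.split(",") if v.strip()]


def index_fields(meta_tags, configured):
    """Return (field name, value) pairs for the configured meta tags only."""
    fields = []
    for name, content in meta_tags.items():
        if name in configured:
            fields.extend((name, v) for v in split_meta_values(content))
    return fields


config = parse_meta_config("keywords,recommended", "1.0,5.0")
fields = index_fields({"keywords": "First Thing, Second Thing"}, config)
```

With the example from the description, fields holds two entries named "keywords", one per comma separated value.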
> I had to replace the query-basic plugin in order to allow matches in the meta 
> fields to return hits even if there were no matches in any of the default 
> fields.  The query-basic plugin only returns hits when every search term is 
> found in at least one default field.  I needed hits returned if matches were 
> found in at least one field for every term, and/or the entire search phrase 
> appeared in a meta index field.
> One known bug is that common terms are not getting stripped out of the 
> fields' values before they get indexed, so "The Next Big Thing" could not be 
> matched because the query engine will strip out "the" from all queries.  I 
> intend to fix this by stripping out common terms from meta fields before 
> indexing them.
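The intended fix, stripping common terms from meta field values before indexing, might look roughly like the following Python sketch. The STOP_WORDS set and the strip_common_terms function are hypothetical; the real list of common terms is whatever the Nutch query engine is configured to strip, which is not given here.

```python
# Hypothetical stop list for illustration only; the actual list used by
# the query engine is an assumption and would need to match it exactly.
STOP_WORDS = frozenset({"the", "a", "an", "and", "of"})


def strip_common_terms(value, stop_words=STOP_WORDS):
    """Drop common terms from a meta field value, keeping word order."""
    kept = [word for word in value.split() if word.lower() not in stop_words]
    return " ".join(kept)


cleaned = strip_common_terms("The Next Big Thing")
```

Under these assumptions, cleaned is "Next Big Thing", which is the same form the query takes once "the" has been stripped from it.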
> Another issue is that searching for "Next Big Thing" would not match meta 
> index values for "Next", "Big" or "Thing".  You can consider that a bug or a 
> feature depending on how you look at it.
> These plugins were written for and only work on the 0.7.2 branch.
> I'm going to attach a tarball of the source of these three plugins after I 
> create the issue.  To use the plugins, you'll need to untar them in your 
> src/plugins directory and add them to the ant build.xml directive (and of 
> course add them in your nutch-site.xml file).  If these end up getting added 
> to the project, I'll write up documentation on the wiki.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
