[ 
https://issues.apache.org/jira/browse/NUTCH-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201412#comment-13201412
 ] 

Hudson commented on NUTCH-1264:
-------------------------------

Integrated in nutch-trunk-maven #137 (See 
[https://builds.apache.org/job/nutch-trunk-maven/137/])
    NUTCH-1264 Index-metadata

jnioche : 
http://svn.apache.org/viewvc/nutch/trunk/viewvc/?view=rev&root=&revision=1241074
Files : 
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/conf/nutch-default.xml
* /nutch/trunk/src/plugin/build.xml
* /nutch/trunk/src/plugin/index-metadata
* /nutch/trunk/src/plugin/index-metadata/build.xml
* /nutch/trunk/src/plugin/index-metadata/ivy.xml
* /nutch/trunk/src/plugin/index-metadata/plugin.xml
* /nutch/trunk/src/plugin/index-metadata/src
* /nutch/trunk/src/plugin/index-metadata/src/java
* /nutch/trunk/src/plugin/index-metadata/src/java/org
* /nutch/trunk/src/plugin/index-metadata/src/java/org/apache
* /nutch/trunk/src/plugin/index-metadata/src/java/org/apache/nutch
* /nutch/trunk/src/plugin/index-metadata/src/java/org/apache/nutch/indexer
* /nutch/trunk/src/plugin/index-metadata/src/java/org/apache/nutch/indexer/metadata
* /nutch/trunk/src/plugin/index-metadata/src/java/org/apache/nutch/indexer/metadata/MetadataIndexer.java

                
> Configurable indexing plugin (index-metadata) 
> ----------------------------------------------
>
>                 Key: NUTCH-1264
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1264
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: 1.5
>            Reporter: Julien Nioche
>             Fix For: 1.5
>
>         Attachments: NUTCH-1264-trunk-v2.patch, NUTCH-1264-trunk.patch
>
>
> Several plugins, already distributed or proposed, do very similar things:
> - parse-meta [NUTCH-809] generates metadata fields in the parse metadata and 
> indexes them
> - headings [NUTCH-1005] generates heading fields in the parse metadata and 
> indexes them
> - index-extra [NUTCH-422] indexes configurable fields
> - urlmeta [NUTCH-855] propagates metadata from the seeds to the outlinks 
> and indexes it
> - index-static [NUTCH-940] generates configurable static fields
> All these plugins extract information from various sources and generate 
> fields from it, so they are largely redundant. 
> Instead, this issue proposes a single plugin that generates configurable 
> fields from:
> - static values
> - parse metadata
> - content metadata
> - crawldb metadata
> and lets the other plugins focus on parsing and extracting the values 
> to index. This will make adding new fields simpler by relying on a 
> stable common plugin instead of duplicating code across various plugins.
> This plugin will replace index-extra [NUTCH-422] and will serve as a basis 
> for further improvements.
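
The idea in the description above (one component that builds index fields from static values plus configurable keys taken from parse, content, and crawldb metadata) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses plain Java maps rather than Nutch classes, and every name in it (MetadataFieldSketch, buildFields, copyConfigured, the sample keys) is hypothetical and not taken from the committed MetadataIndexer.java.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Nutch code base.
final class MetadataFieldSketch {

    // Copies only the configured keys from one metadata source into the
    // document; keys missing from the source are silently skipped.
    static void copyConfigured(Map<String, String> source, List<String> keys,
                               Map<String, List<String>> doc) {
        for (String key : keys) {
            String value = source.get(key);
            if (value != null) {
                doc.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
            }
        }
    }

    // Builds the index fields from the four sources named in the issue.
    // Static values are always added; the other sources are filtered by
    // their (assumed) configured key lists.
    static Map<String, List<String>> buildFields(
            Map<String, String> staticValues,
            Map<String, String> parseMeta, List<String> parseKeys,
            Map<String, String> contentMeta, List<String> contentKeys,
            Map<String, String> dbMeta, List<String> dbKeys) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        staticValues.forEach(
            (k, v) -> doc.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        copyConfigured(parseMeta, parseKeys, doc);
        copyConfigured(contentMeta, contentKeys, doc);
        copyConfigured(dbMeta, dbKeys, doc);
        return doc;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        Map<String, List<String>> doc = buildFields(
            Map.of("source", "nutch"),
            Map.of("description", "A page", "keywords", "x"),
            List.of("description"),
            Map.of("Content-Type", "text/html"),
            List.of("Content-Type"),
            Map.of("seedLabel", "news"),
            List.of("seedLabel", "missing"));
        System.out.println(doc);
    }
}
```

In this sketch a parsing plugin would only need to put its values into the parse metadata; which of them become index fields is decided by the key lists, which is the separation of concerns the issue argues for.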

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
