[ 
https://issues.apache.org/jira/browse/NUTCH-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398367#comment-13398367
 ] 

Kristof  commented on NUTCH-1406:
---------------------------------

I found a way to do it, but it involved replacing MetadataIndexer.java completely. 
The date conversion works. I tested it with the seed url stated in the 
description. Patch against trunk is attached.
                
> Metatags-index/-parse plugin: conversion to Solr date format and prevents 
> parsing/indexing of empty tags
> --------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-1406
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1406
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer, parser
>            Reporter: Kristof 
>            Priority: Minor
>              Labels: conversion, date
>         Attachments: index-metadata-plugin.patch, index-metatags.jar
>
>
> This improvement to the index-metatags plugin (sometimes also referred to as the 
> parse-metatags plugin) allows for conversion of selected fields to the Solr 
> date format and prevents parsing/indexing of metatags that do not contain any 
> content.
> In order to convert the values of selected metatags to Solr date format, you 
> must list them in the metatags.convert property in nutch-site.xml. The example 
> below uses the extended Dublin Core element dcterms.modified with the seed url 
> http://www.cic.gc.ca/. 
> dcterms.modified must also be defined in the metatags.names property.
> {code}
> <property>
>       <name>metatags.convert</name>
>       <value>dcterms.modified</value>
>       <description>For plugin index-metadata: Indicate here the name of the 
> html meta tag that should be converted to Solr date format.
>       </description>
> </property>
> {code}
> I have read that SimpleDateFormat is not a robust solution, so this 
> improvement might have some problems.
> So far it has worked well for me. More details about the changes are below.
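
For illustration only, the conversion described above could be sketched roughly as follows. This is not the code from the attached patch: the class name SolrDateSketch and the method toSolrDate are hypothetical, the input is assumed to be a plain ISO date such as 2012-06-21 (the actual format of dcterms.modified on the seed site is not stated in the issue), and java.time is swapped in for SimpleDateFormat because of the robustness concern mentioned in the description. Empty or unparseable values are skipped, in the spirit of not indexing empty tags.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper, not part of Nutch or of the attached patch.
final class SolrDateSketch {

    // Solr date fields expect UTC timestamps of the form 2012-06-21T00:00:00Z.
    private static final DateTimeFormatter SOLR_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'");

    // Returns the Solr-formatted date, or empty if the tag has no usable content.
    // Assumption: the metatag value is an ISO local date (yyyy-MM-dd).
    public static Optional<String> toSolrDate(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty(); // empty tag: nothing to index
        }
        try {
            LocalDate date = LocalDate.parse(raw.trim());
            // Interpret the date as midnight UTC before formatting.
            return Optional.of(date.atStartOfDay(ZoneOffset.UTC).format(SOLR_FORMAT));
        } catch (DateTimeParseException e) {
            return Optional.empty(); // unparseable value: skip rather than fail
        }
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate("2012-06-21").orElse("(skipped)"));
        System.out.println(toSolrDate("   ").orElse("(skipped)"));
        System.out.println(toSolrDate("not a date").orElse("(skipped)"));
    }
}
```

The DateTimeFormatter instance is immutable and safe to share between threads, which is one reason to prefer it over a shared SimpleDateFormat in an indexing filter.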

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
