[ https://issues.apache.org/jira/browse/NUTCH-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398367#comment-13398367 ]
Kristof commented on NUTCH-1406:
---------------------------------

I found a way, but it involved replacing MetadataIndexer.java completely. The date conversion works; I tested it with the seed URL stated in the description. A patch against trunk is attached.

> Metatags-index/-parse plugin: conversion to Solr date format and prevents parsing/indexing of empty tags
> --------------------------------------------------------------------------------------------------------
>
>          Key: NUTCH-1406
>          URL: https://issues.apache.org/jira/browse/NUTCH-1406
>      Project: Nutch
>   Issue Type: Improvement
>   Components: indexer, parser
>     Reporter: Kristof
>     Priority: Minor
>       Labels: conversion, date
>  Attachments: index-metadata-plugin.patch, index-metatags.jar
>
>
> This improvement to the index-metatags plugin (sometimes also referred to as
> the parse-metatags plugin) allows selected fields to be converted to the Solr
> date format and prevents parsing/indexing of metatags that do not contain any
> content.
> To convert the values of selected metatags to the Solr date format, you must
> list them in the metatags.convert property in nutch-site.xml. The example used
> is the extended Dublin Core element dcterms.modified with the seed URL
> http://www.cic.gc.ca/. dcterms.modified must also be defined in the
> metatags.names property.
> {code}
> <property>
>   <name>metatags.convert</name>
>   <value>dcterms.modified</value>
>   <description>For plugin index-metadata: Indicate here the name of the
>   html meta tag that should be converted to Solr date format.
>   </description>
> </property>
> {code}
> I have read that SimpleDateFormat is not a robust solution, so this
> improvement might have some problems. So far it has worked well for me.
> Below more details about the changes.
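
For readers who want to see what the conversion requested through metatags.convert amounts to, here is a minimal sketch in plain Java. It is not the code from the attached patch: the class name SolrDateSketch, the method toSolrDate, and the list of accepted input patterns are assumptions made for illustration. The only details taken from the issue are the use of SimpleDateFormat, the Solr date target format, and the rule that empty tags are skipped.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of Nutch or of the attached patch.
class SolrDateSketch {

    // Assumed input patterns; the real plugin may accept a different set.
    private static final String[] INPUT_PATTERNS = {
        "yyyy-MM-dd'T'HH:mm:ss",
        "yyyy-MM-dd"
    };

    // Solr expects UTC timestamps of the form 2012-06-15T00:00:00Z.
    private static final String SOLR_PATTERN = "yyyy-MM-dd'T'HH:mm:ss'Z'";

    /** Returns the value in Solr date format, or null if it is empty or unparseable. */
    static String toSolrDate(String value) {
        if (value == null || value.trim().isEmpty()) {
            return null; // empty metatags are skipped rather than indexed
        }
        String trimmed = value.trim();
        TimeZone utc = TimeZone.getTimeZone("UTC");

        for (String pattern : INPUT_PATTERNS) {
            // SimpleDateFormat is not thread-safe, so create one per call.
            SimpleDateFormat in = new SimpleDateFormat(pattern, Locale.ROOT);
            in.setTimeZone(utc);   // assumption: values without a zone are UTC
            in.setLenient(false);  // reject dates such as 2012-02-30

            ParsePosition pos = new ParsePosition(0);
            Date date = in.parse(trimmed, pos);
            // Accept only if the whole string was consumed by the pattern.
            if (date != null && pos.getIndex() == trimmed.length()) {
                SimpleDateFormat out = new SimpleDateFormat(SOLR_PATTERN, Locale.ROOT);
                out.setTimeZone(utc);
                return out.format(date);
            }
        }
        return null; // no assumed pattern matched
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate("2012-06-15"));          // 2012-06-15T00:00:00Z
        System.out.println(toSolrDate("2012-06-15T10:30:00")); // 2012-06-15T10:30:00Z
        System.out.println(toSolrDate(""));                    // null
    }
}
```

The sketch checks that the whole string was consumed because SimpleDateFormat.parse otherwise accepts a matching prefix and silently ignores trailing text, which is one of the reasons it is considered fragile. Values without a time zone are treated as UTC here, which is also an assumption.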