[ https://issues.apache.org/jira/browse/NUTCH-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211586#comment-13211586 ]
Elisabeth Adler commented on NUTCH-809:
---------------------------------------

I haven't tested the plugin in 1.4 myself, but I think a few guys on the mailing list already used it with 1.4.

> Parse-metatags plugin
> ---------------------
>
>                 Key: NUTCH-809
>                 URL: https://issues.apache.org/jira/browse/NUTCH-809
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.4, nutchgora
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>             Fix For: 1.5
>
>         Attachments: NUTCH-809.patch, NUTCH-809_metatags_1.3.patch, metatags-plugin+tutorial.zip
>
>
> h2. Parse-metatags plugin
> The parse-metatags plugin consists of a HTMLParserFilter which takes as parameter a list of metatag names with '*' as default value. The values are separated by ';'.
> In order to extract the values of the metatags description and keywords, you must specify in nutch-site.xml
> {code:xml}
> <property>
>   <name>metatags.names</name>
>   <value>description;keywords</value>
> </property>
> {code}
> The MetatagIndexer uses the output of the parsing above to create two fields 'keywords' and 'description'. Note that keywords is multivalued.
> The query-basic plugin is used to include these fields in the search e.g. in nutch-site.xml
> {code:xml}
> <property>
>   <name>query.basic.description.boost</name>
>   <value>2.0</value>
> </property>
> <property>
>   <name>query.basic.keywords.boost</name>
>   <value>2.0</value>
> </property>
> {code}
> This code has been developed by DigitalPebble Ltd and offered to the community by ANT.com

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
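
For readers unfamiliar with the metatags.names setting described in the issue, the following is a rough Python sketch of the semantics as stated there: a ';'-separated list of metatag names, with '*' meaning every metatag. It is not the plugin's code (the plugin is written in Java); the function names parse_metatag_names and select_metatags, the sample page dictionary, and the case-insensitive matching are assumptions made only for illustration.

```python
def parse_metatag_names(value="*"):
    """Split a metatags.names-style value on ';' into a set of names.

    Returns None when '*' is present, meaning "keep every metatag".
    Lowercasing the names is an assumption, not confirmed plugin behaviour.
    """
    names = {part.strip().lower() for part in value.split(";") if part.strip()}
    if "*" in names:
        return None
    return names


def select_metatags(metatags, value="*"):
    """Keep only the metatags whose name is listed in the configured value."""
    wanted = parse_metatag_names(value)
    if wanted is None:
        # Wildcard: pass every metatag through unchanged.
        return dict(metatags)
    return {name: content for name, content in metatags.items()
            if name.lower() in wanted}


# Hypothetical metatags extracted from one HTML page.
page = {
    "description": "A crawler",
    "keywords": "nutch, search",
    "robots": "noindex",
}

# Equivalent to <value>description;keywords</value> in nutch-site.xml.
selected = select_metatags(page, "description;keywords")
```

With the value description;keywords only those two entries are kept, while the default '*' passes every metatag through.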