[ https://issues.apache.org/jira/browse/NUTCH-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184880#comment-13184880 ]
Lewis John McGibbney commented on NUTCH-809:
--------------------------------------------

Hi Elisabeth, although I haven't had time to look through your zip yet, a big thank you must be aimed your way. If you have time and are willing, please create a new page on the Nutch wiki under PluginCentral [1]. As you can see, this issue is closely linked to some others of a similar nature, so it may or may not change in the future; however, community-driven documentation is exactly what we are after, and it is greatly welcomed. Please contact me off list or on dev@ with your wiki username and I will add you to the wiki contributors page.
Thank you

[1] http://wiki.apache.org/nutch/PluginCentral

> Parse-metatags plugin
> ---------------------
>
> Key: NUTCH-809
> URL: https://issues.apache.org/jira/browse/NUTCH-809
> Project: Nutch
> Issue Type: New Feature
> Components: parser
> Affects Versions: 1.4, nutchgora
> Reporter: Julien Nioche
> Assignee: Julien Nioche
> Fix For: 1.5
>
> Attachments: NUTCH-809.patch, NUTCH-809_metatags_1.3.patch, metatags-plugin+tutorial.zip
>
>
> h2. Parse-metatags plugin
> The parse-metatags plugin consists of a HTMLParserFilter which takes as parameter a list of metatag names, with '*' as the default value. The values are separated by ';'.
> In order to extract the values of the metatags description and keywords, you must specify in nutch-site.xml:
> {code:xml}
> <property>
>   <name>metatags.names</name>
>   <value>description;keywords</value>
> </property>
> {code}
> The MetatagIndexer uses the output of the parsing above to create two fields, 'keywords' and 'description'. Note that keywords is multivalued.
> The query-basic plugin is used to include these fields in the search, e.g. in nutch-site.xml:
> {code:xml}
> <property>
>   <name>query.basic.description.boost</name>
>   <value>2.0</value>
> </property>
> <property>
>   <name>query.basic.keywords.boost</name>
>   <value>2.0</value>
> </property>
> {code}
> This code has been developed by DigitalPebble Ltd and offered to the community by ANT.com

--
This message is automatically generated by JIRA.
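The issue description says the filter takes a ';'-separated list of metatag names from `metatags.names`, defaulting to '*', and that the indexed 'keywords' field is multivalued. The following is a minimal, self-contained Java sketch of how such a value could be interpreted. It is not the plugin's source: the class `MetatagNames`, its methods, the case-insensitive matching, and splitting keywords on ',' are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch (not the Nutch plugin source) of interpreting a
 * metatags.names value such as "description;keywords".
 */
public class MetatagNames {

    private final Set<String> names = new LinkedHashSet<>();
    private final boolean matchAll;

    public MetatagNames(String configValue) {
        // Fall back to the documented default '*' when the property is unset or blank.
        String value = (configValue == null || configValue.trim().isEmpty()) ? "*" : configValue;
        // Names are separated by ';'. Lower-casing is an assumption of this sketch.
        for (String name : value.split(";")) {
            String trimmed = name.trim().toLowerCase(Locale.ROOT);
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        // '*' anywhere in the list means "extract every metatag".
        matchAll = names.contains("*");
    }

    /** Returns true if a metatag with this name should be extracted. */
    public boolean accepts(String metaName) {
        return matchAll || names.contains(metaName.toLowerCase(Locale.ROOT));
    }

    /** Hypothetical helper: turns one keywords value into several field values, assuming ',' separates them. */
    public static List<String> splitKeywords(String content) {
        List<String> values = new ArrayList<>();
        for (String kw : content.split(",")) {
            String trimmed = kw.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        MetatagNames configured = new MetatagNames("description;keywords");
        System.out.println(configured.accepts("Description"));          // true
        System.out.println(configured.accepts("robots"));               // false
        System.out.println(new MetatagNames(null).accepts("robots"));   // true ('*' default)
        System.out.println(splitKeywords("nutch, crawler ,,search"));   // [nutch, crawler, search]
    }
}
```

With the nutch-site.xml setting from the description, only `description` and `keywords` pass the check; leaving the property unset falls back to '*', which accepts every metatag name.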