[ https://issues.apache.org/jira/browse/NUTCH-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fengtan updated NUTCH-1553:
---------------------------

    Attachment: NUTCH-1553-trunk-1.patch

> Property 'indexer.delete.robots.noindex' not working when using parser-html.
> ----------------------------------------------------------------------------
>
>                 Key: NUTCH-1553
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1553
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer, parser
>    Affects Versions: 1.6
>            Reporter: Alfonso Presa
>            Priority: Minor
>         Attachments: NUTCH-1553-trunk-1.patch
>
>
> Maybe I'm doing something wrong, but it seems to me that the +NUTCH-1434+
> patch only works when using Tika's parser. When using parser-html, the
> "robots" metatag is only populated if the parse-metatags plugin is enabled,
> and then it is stored under the prefix "metatag.". So
> parseData.getMeta("robots") returns nothing if not using Tika.
> I guess the simplest solution would be to provide a fallback: if
> parseData.getMeta("robots") is null, use
> parseData.getMeta("metatag.robots") instead.
> Also, the dependency of this property on the parse-metatags plugin when
> using parse-html would be worth documenting somewhere
> (nutch-default.xml?).
> Thanks!

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
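
For illustration only, the fallback proposed in the report could look roughly like the sketch below. It is not the attached patch. To stay self-contained it uses a plain Map as a stand-in for the parse metadata rather than Nutch's ParseData, and the class and method names (RobotsMetaFallback, robotsDirective, isNoIndex) are hypothetical. Only the two keys, "robots" and "metatag.robots", come from the report.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Nutch: sketches the null-check fallback
// described in the report, with a Map standing in for the parse metadata.
class RobotsMetaFallback {

    // Prefer the plain "robots" key (populated by the Tika parser according
    // to the report); fall back to "metatag.robots" (populated by
    // parse-metatags when parser-html is used). Returns null if neither is set.
    static String robotsDirective(Map<String, String> parseMeta) {
        String robots = parseMeta.get("robots");
        if (robots == null) {
            robots = parseMeta.get("metatag.robots");
        }
        return robots;
    }

    // True when the resolved directive contains "noindex", ignoring case.
    static boolean isNoIndex(Map<String, String> parseMeta) {
        String robots = robotsDirective(parseMeta);
        return robots != null
                && robots.toLowerCase(Locale.ROOT).contains("noindex");
    }

    public static void main(String[] args) {
        // Simulates a page parsed with parser-html plus parse-metatags:
        // only the prefixed key is present.
        Map<String, String> htmlParsed = new HashMap<>();
        htmlParsed.put("metatag.robots", "noindex,follow");
        System.out.println(isNoIndex(htmlParsed));

        // Simulates a page with no robots metadata at all.
        System.out.println(isNoIndex(new HashMap<>()));
    }
}
```

In the indexer itself, the same null check would presumably sit where the 'indexer.delete.robots.noindex' property is evaluated, but that placement is an assumption and not something the report states.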