[ 
https://issues.apache.org/jira/browse/NUTCH-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15356694#comment-15356694
 ] 

Hudson commented on NUTCH-1553:
-------------------------------

FAILURE: Integrated in Nutch-trunk #3377 (See 
[https://builds.apache.org/job/Nutch-trunk/3377/])
NUTCH-1553 Property 'indexer.delete.robots.noindex' not working when (snagel: 
rev cb6fbae51a56587c30d15b8f170ebbf470851168)
* src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java


> Property 'indexer.delete.robots.noindex' not working when using parser-html.
> ----------------------------------------------------------------------------
>
>                 Key: NUTCH-1553
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1553
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer, parser
>    Affects Versions: 1.6
>            Reporter: Alfonso Presa
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.13
>
>         Attachments: NUTCH-1553-trunk-1.patch
>
>
> Maybe I'm doing something wrong, but it seems to me that the +NUTCH-1434+ 
> patch only works when using Tika's parser. When using parse-html, the 
> "robots" metatag is only populated if the parse-metatags plugin is enabled, 
> and then it is stored with the prefix "metatag.". So 
> parseData.getMeta("robots") returns nothing if not using Tika.
> I guess the simplest solution would be to provide a fallback: if 
> parseData.getMeta("robots") is null, read 
> parseData.getMeta("metatag.robots") instead.
> Also, the dependency of this property on the parse-metatags plugin when 
> using parse-html would be worth documenting somewhere 
> (nutch-default.xml?).
> Thanks!
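
The fallback proposed in the description could look roughly like the sketch below. It is only an illustration, not the committed patch: a plain Map stands in for the parse metadata that parseData.getMeta(...) reads, and the class and method names (RobotsMetaFallback, robotsDirective, isNoIndex) are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in, not a Nutch class: 'meta' plays the role of the
// parse metadata queried via parseData.getMeta(...) in the description.
class RobotsMetaFallback {

    // Prefer the plain "robots" key; fall back to the "metatag."-prefixed
    // key mentioned in the description when the plain key is absent.
    static String robotsDirective(Map<String, String> meta) {
        String robots = meta.get("robots");
        if (robots == null) {
            robots = meta.get("metatag.robots");
        }
        return robots;
    }

    // True if either key carries a "noindex" directive (case-insensitive).
    static boolean isNoIndex(Map<String, String> meta) {
        String robots = robotsDirective(meta);
        return robots != null
                && robots.toLowerCase(Locale.ROOT).contains("noindex");
    }

    public static void main(String[] args) {
        Map<String, String> plainKey = new HashMap<>();
        plainKey.put("robots", "noindex,nofollow");

        Map<String, String> prefixedKey = new HashMap<>();
        prefixedKey.put("metatag.robots", "NOINDEX");

        System.out.println(isNoIndex(plainKey));        // true
        System.out.println(isNoIndex(prefixedKey));     // true
        System.out.println(isNoIndex(new HashMap<>())); // false
    }
}
```

Checking the unprefixed key first keeps the behaviour unchanged for setups that already populate "robots"; the prefixed key is consulted only when the plain one is missing.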



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
