[ https://issues.apache.org/jira/browse/CONNECTORS-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888994#comment-15888994 ]
Karl Wright commented on CONNECTORS-1392:
-----------------------------------------

I've had a look at this patch. There's a minor formatting problem: the form is 4 columns wide, and it looks like part of the form is now 2 columns wide and the other part 4:

{code}
@@ -3300,8 +3330,12 @@
 " <tr>\n"+
 " <td class=\"description\" colspan=\"1\"><nobr>"+Messages.getBodyString(locale,"WebcrawlerConnector.EmailAddress")+"</nobr></td>\n"+
 " <td class=\"value\" colspan=\"1\">"+Encoder.bodyEscape(email)+"</td>\n"+
+" </tr>\n"+
+" <tr>\n"+
 " <td class=\"description\" colspan=\"1\"><nobr>"+Messages.getBodyString(locale,"WebcrawlerConnector.RobotsUsage")+"</nobr></td>\n"+
 " <td class=\"value\" colspan=\"1\"><nobr>"+Encoder.bodyEscape(robots)+"</nobr></td>\n"+
+" <td class=\"description\" colspan=\"1\"><nobr>"+Messages.getBodyString(locale,"WebcrawlerConnector.MetaRobotsTagsUsage")+"</nobr></td>\n"+
+" <td class=\"value\" colspan=\"1\">"+Encoder.bodyEscape(metaRobotsTagsUsage)+"</td>\n"+
 " </tr>\n"+
 " <tr>\n"+
 " <td class=\"description\"><nobr>" + Messages.getBodyString(locale,"WebcrawlerConnector.ProxyHostColon") + "</nobr></td>\n"+
{code}

> Add option for Web connector to ignore robots instructions in meta tags
> -----------------------------------------------------------------------
>
>                 Key: CONNECTORS-1392
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1392
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Web connector
>            Reporter: Markus Schuch
>         Attachments: CONNECTORS-1392.patch
>
>
> The Web connector already allows robots.txt to be ignored via an option.
> With this ticket, another option is added to allow the connector to ignore
> robots instructions in {{<meta name="robots ...}} tags.
> *Proposal (to be discussed)*
> Add a new option list "Page level robots instructions" to the "Robots" Tab.
> List entries:
> # Obey meta robots tags (the default)
> # Don't look at meta robots tags
> The end user doc needs to be updated.
> Google resources on robot instructions in HTML pages:
> [0] https://support.google.com/webmasters/answer/79812?hl=en&ctx=cb&src=cb&cbid=tnnsjq5jcodt&cbrank=4
> [1] https://support.google.com/webmasters/answer/96569?hl=en&ctx=cb&src=cb&cbid=-5rmggrfsp2rq&cbrank=3
> [2] https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag?csw=1
> Thread on the mailing list:
> [3] https://www.mail-archive.com/user@manifoldcf.apache.org/msg03258.html

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)