[ 
https://issues.apache.org/jira/browse/CONNECTORS-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Schuch updated CONNECTORS-1392:
--------------------------------------
    Description: 
The Web connector already has an option to ignore robots.txt.

This ticket adds a further option that lets the connector ignore robots 
instructions in {{<meta name="robots" ...>}} tags and {{<a ... 
rel="nofollow" ...>}} attributes.

*First proposal (to be discussed)*

Reuse the existing "Robots.txt usage" option in the "Robots" tab. Rename the 
existing options to:
# Don't look at robots.txt, meta robots tags and rel attributes
# Obey robots.txt, meta robots tags and rel attributes for data fetches only
# Obey robots.txt, meta robots tags and rel attributes _(the default)_
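
As a basis for discussion, the three renamed options could gate the in-page checks roughly as sketched below. Everything here is hypothetical: the class, enum and method names are invented for illustration and are not taken from the ManifoldCF code base. The sketch also assumes that meta robots tags and rel attributes are honored in both "obey" modes; how "data fetches only" should apply to in-page instructions is still open.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch only; none of these names exist in ManifoldCF. */
final class RobotsUsageSketch {

  /** The three proposed settings of the "Robots.txt usage" option. */
  public enum RobotsUsage {
    IGNORE_ALL,        // Don't look at robots.txt, meta robots tags and rel attributes
    OBEY_DATA_FETCHES, // Obey them for data fetches only
    OBEY_ALL           // Obey them (the default)
  }

  /** Splits a meta robots content value such as "noindex, nofollow" into lower-case tokens. */
  public static Set<String> parseDirectives(String content) {
    if (content == null) {
      return Set.of();
    }
    return Arrays.stream(content.split(","))
        .map(s -> s.trim().toLowerCase(Locale.ROOT))
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toSet());
  }

  /** Assumption: in-page instructions are honored in every mode except IGNORE_ALL. */
  public static boolean honorsInPageInstructions(RobotsUsage usage) {
    return usage != RobotsUsage.IGNORE_ALL;
  }

  /** True if the page may be indexed given its meta robots content value. */
  public static boolean mayIndex(RobotsUsage usage, String metaRobotsContent) {
    if (!honorsInPageInstructions(usage)) {
      return true;
    }
    Set<String> directives = parseDirectives(metaRobotsContent);
    // "none" is shorthand for "noindex, nofollow".
    return !(directives.contains("noindex") || directives.contains("none"));
  }

  /** True if a link may be followed given the page's meta robots value and the link's rel value. */
  public static boolean mayFollowLink(RobotsUsage usage, String metaRobotsContent, String rel) {
    if (!honorsInPageInstructions(usage)) {
      return true;
    }
    Set<String> directives = parseDirectives(metaRobotsContent);
    if (directives.contains("nofollow") || directives.contains("none")) {
      return false;
    }
    if (rel == null) {
      return true;
    }
    // rel holds a space-separated list of link types, e.g. "external nofollow".
    return Arrays.stream(rel.trim().toLowerCase(Locale.ROOT).split("\\s+"))
        .noneMatch("nofollow"::equals);
  }
}
```

With the first option selected, {{mayIndex}} and {{mayFollowLink}} always return true, so a page marked {{noindex}} is still indexed and a link marked {{rel="nofollow"}} is still followed. With either of the other two options, the instructions are respected.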

The end-user documentation needs to be updated.

Google resources on robots instructions in HTML pages:
[0] 
https://support.google.com/webmasters/answer/79812?hl=en&ctx=cb&src=cb&cbid=tnnsjq5jcodt&cbrank=4
[1] 
https://support.google.com/webmasters/answer/96569?hl=en&ctx=cb&src=cb&cbid=-5rmggrfsp2rq&cbrank=3

  was:
The Web connector already has an option to ignore robots.txt.

This ticket adds a further option that lets the connector ignore robots 
instructions in {{<meta name="robots" ...>}} tags and {{<a ... 
rel="nofollow" ...>}} attributes.

*First proposal*

Reuse the existing "Robots.txt usage" option in the "Robots" tab. Rename the 
existing options to:
# Don't look at robots.txt, meta robots tags and rel attributes
# Obey robots.txt, meta robots tags and rel attributes for data fetches only
# Obey robots.txt, meta robots tags and rel attributes _(the default)_

The end-user documentation needs to be updated.

Google resources on robots instructions in HTML pages:
[0] 
https://support.google.com/webmasters/answer/79812?hl=en&ctx=cb&src=cb&cbid=tnnsjq5jcodt&cbrank=4
[1] 
https://support.google.com/webmasters/answer/96569?hl=en&ctx=cb&src=cb&cbid=-5rmggrfsp2rq&cbrank=3


> Add option for Web connector to ignore robots instructions in meta tags and 
> rel attributes
> ------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1392
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1392
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Web connector
>            Reporter: Markus Schuch



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
