[ https://issues.apache.org/jira/browse/CONNECTORS-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376611#comment-17376611 ]
Sebastian Bölling commented on CONNECTORS-1657:
-----------------------------------------------

I would also appreciate this feature. The use of a sitemap reference in robots.txt is specified in [https://www.sitemaps.org/protocol.html#submit_robots]. This feature would give webmasters an easy, standard way to inform the Web connector about the pages available for crawling.

> Web connector - Handle sitemap instruction in robot.txt
> -------------------------------------------------------
>
>                 Key: CONNECTORS-1657
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1657
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Web connector
>    Affects Versions: ManifoldCF 2.17
>            Reporter: Julien Massiera
>            Priority: Major
>
> Currently the web connector does not understand a robots.txt file that
> points to a sitemap. For example, for the site [https://www.persee.fr],
> the simple history shows the following error:
> Unknown robots.txt line: 'Sitemap: https://www.persee.fr/sitemap.xml'

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
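To illustrate what handling the directive involves: the sitemaps.org protocol lets a robots.txt file declare one or more `Sitemap: <full URL>` lines, independent of any `User-agent` group. The sketch below is a hypothetical helper (`extract_sitemaps` is not part of ManifoldCF) that collects those URLs instead of rejecting the line. It assumes the field name is matched case-insensitively and that `#` starts a comment; the `User-agent`/`Disallow` lines in the sample are made up, and only the `Sitemap` line comes from the report above.

```python
from typing import List


def extract_sitemaps(robots_txt: str) -> List[str]:
    """Return the sitemap URLs declared by 'Sitemap:' lines in a robots.txt body.

    Hypothetical helper for illustration only (not ManifoldCF code).
    Assumptions: the field name is case-insensitive and '#' starts a comment.
    """
    sitemaps: List[str] = []
    for raw_line in robots_txt.splitlines():
        # Strip any trailing comment and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so the "https:" in the URL stays intact.
        field, value = line.split(":", 1)
        # Sitemap lines are not tied to a User-agent group, so they are
        # collected wherever they appear in the file.
        if field.strip().lower() == "sitemap":
            url = value.strip()
            if url:
                sitemaps.append(url)
    return sitemaps


if __name__ == "__main__":
    sample = (
        "User-agent: *\n"
        "Disallow: /private/\n"
        "Sitemap: https://www.persee.fr/sitemap.xml\n"
    )
    print(extract_sitemaps(sample))  # ['https://www.persee.fr/sitemap.xml']
```

For comparison, Python's standard `urllib.robotparser.RobotFileParser` exposes a `site_maps()` method (Python 3.8+) that returns the same kind of list, which suggests that treating `Sitemap` as a recognized field rather than an unknown line is the conventional behavior.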