[jira] [Resolved] (CONNECTORS-1620) Accept Sitemaps with content type application/xml
[ https://issues.apache.org/jira/browse/CONNECTORS-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright resolved CONNECTORS-1620.
-------------------------------------
    Fix Version/s: ManifoldCF 2.14
       Resolution: Fixed

r1865689

> Accept Sitemaps with content type application/xml
> -------------------------------------------------
>
>                 Key: CONNECTORS-1620
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1620
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Web connector
>            Reporter: Markus Schuch
>            Assignee: Markus Schuch
>            Priority: Major
>             Fix For: ManifoldCF 2.14
>
> Given an Output Connection that does not accept the MIME type
> {{application/xml}} for ingestion, it is currently not possible to crawl a
> sitemap.xml when the web server returns {{application/xml}} as the content
> type for the sitemap.
>
> The sitemap is discarded before the links are extracted, because the MIME
> type {{application/xml}} is not listed in the {{interestingMimeTypeArray}}.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
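For readers following the thread, the behavior described in the issue can be illustrated with a minimal Java sketch. This is not the actual WebcrawlerConnector code: the class name `MimeTypeFilterSketch`, the method `isInteresting`, and the contents of both arrays are hypothetical and chosen only for illustration. The sketch assumes the connector decides whether to parse a fetched document for links by looking up its MIME type in a fixed list, which is what the issue description implies.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration only; not taken from the ManifoldCF source.
class MimeTypeFilterSketch {

    // Assumed stand-in for interestingMimeTypeArray before the change.
    // The real array contents are not given in this thread.
    static final String[] BEFORE_FIX = {
        "text/html",
        "application/xhtml+xml",
        "text/xml"
    };

    // The same list with application/xml added, as proposed in the issue.
    static final String[] AFTER_FIX = {
        "text/html",
        "application/xhtml+xml",
        "text/xml",
        "application/xml"
    };

    // Returns true if the Content-Type header names a MIME type in the list.
    // Parameters such as "; charset=UTF-8" are stripped before the lookup.
    static boolean isInteresting(String contentTypeHeader, String[] interesting) {
        if (contentTypeHeader == null) {
            return false;
        }
        int semicolon = contentTypeHeader.indexOf(';');
        String mimeType = (semicolon >= 0
                ? contentTypeHeader.substring(0, semicolon)
                : contentTypeHeader).trim().toLowerCase(Locale.ROOT);
        return Arrays.asList(interesting).contains(mimeType);
    }

    public static void main(String[] args) {
        String sitemapContentType = "application/xml; charset=UTF-8";
        // With the first list the sitemap would be skipped before link extraction.
        System.out.println("before: " + isInteresting(sitemapContentType, BEFORE_FIX));
        // With the extended list the sitemap would be parsed for links.
        System.out.println("after:  " + isInteresting(sitemapContentType, AFTER_FIX));
    }
}
```

Under these assumptions, a sitemap served as `application/xml` is rejected by the first list and accepted by the second, independent of which MIME types the Output Connection accepts for ingestion.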
[jira] [Commented] (CONNECTORS-1620) Accept Sitemaps with content type application/xml
[ https://issues.apache.org/jira/browse/CONNECTORS-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913410#comment-16913410 ]

Markus Schuch commented on CONNECTORS-1620:
-------------------------------------------

[~daddywri] Are you OK with adding {{application/xml}} to the {{interestingMimeTypeArray}} array of the {{WebcrawlerConnector}}?
[jira] [Updated] (CONNECTORS-1620) Accept Sitemaps with content type application/xml
[ https://issues.apache.org/jira/browse/CONNECTORS-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Schuch updated CONNECTORS-1620:
--------------------------------------
    Description:
Given an Output Connection that does not accept the MIME type {{application/xml}} for ingestion, it is currently not possible to crawl a sitemap.xml when the web server returns {{application/xml}} as the content type for the sitemap.

The sitemap is discarded before the links are extracted, because the MIME type {{application/xml}} is not listed in the {{interestingMimeTypeArray}}.

  was:
Given an Output Connection that does not accept the MIME type {{application/xml}} for ingestion, it is currently not possible to crawl a sitemap.xml when the web server returns {{application/xml}} as the content type for the sitemap.

The sitemap is discarded before the links are extracted.
[jira] [Created] (CONNECTORS-1620) Accept Sitemaps with content type application/xml
Markus Schuch created CONNECTORS-1620:
--------------------------------------

             Summary: Accept Sitemaps with content type application/xml
                 Key: CONNECTORS-1620
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1620
             Project: ManifoldCF
          Issue Type: Improvement
          Components: Web connector
            Reporter: Markus Schuch
            Assignee: Markus Schuch

Given an Output Connection that does not accept the MIME type {{application/xml}} for ingestion, it is currently not possible to crawl a sitemap.xml when the web server returns {{application/xml}} as the content type for the sitemap.

The sitemap is discarded before the links are extracted.