[jira] [Resolved] (CONNECTORS-1620) Accept Sitemaps with content type application/xml

2019-08-22 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1620.
-
Fix Version/s: ManifoldCF 2.14
   Resolution: Fixed

r1865689


> Accept Sitemaps with content type application/xml
> -
>
> Key: CONNECTORS-1620
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1620
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Web connector
> Reporter: Markus Schuch
> Assignee: Markus Schuch
> Priority: Major
> Fix For: ManifoldCF 2.14
>
>
> Given an Output Connection that does not accept the MIME type
> {{application/xml}} for ingestion, it is currently not possible to crawl a
> sitemap.xml when the web server returns {{application/xml}} as the content
> type for the sitemap.
> The sitemap is discarded before the links are extracted, because the MIME
> type {{application/xml}} is not listed in the {{interestingMimeTypeArray}}.
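
To illustrate the behaviour described above, here is a minimal Java sketch of
an allow-list check on the response content type. It is not the
WebcrawlerConnector code: the class name, the method names, and the contents
of both lists are assumptions made for illustration. Only the name
interestingMimeTypeArray and the MIME type application/xml come from this
issue.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative only: not the actual WebcrawlerConnector implementation.
final class MimeTypeFilterSketch {

    // Hypothetical list contents before the change (the real array is not shown in this thread).
    static final Set<String> BEFORE =
        new HashSet<>(Arrays.asList("text/html", "text/xml"));

    // The same hypothetical list with application/xml added, as the issue proposes.
    static final Set<String> AFTER =
        new HashSet<>(Arrays.asList("text/html", "text/xml", "application/xml"));

    // Strip parameters such as "; charset=UTF-8" and lower-case the media type.
    static String baseType(String contentType) {
        if (contentType == null) {
            return "";
        }
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // In this sketch, a document whose type is not listed is dropped before link extraction.
    static boolean isInteresting(String contentType, Set<String> interesting) {
        return interesting.contains(baseType(contentType));
    }

    public static void main(String[] args) {
        String sitemapContentType = "application/xml; charset=UTF-8";
        System.out.println("before: " + isInteresting(sitemapContentType, BEFORE));
        System.out.println("after:  " + isInteresting(sitemapContentType, AFTER));
    }
}
```

Under these assumptions, a sitemap served as application/xml fails the check
against the first list and passes against the second, independently of which
MIME types the Output Connection accepts for ingestion.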



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (CONNECTORS-1620) Accept Sitemaps with content type application/xml

2019-08-22 Thread Markus Schuch (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913410#comment-16913410
 ] 

Markus Schuch commented on CONNECTORS-1620:
---

[~daddywri] Are you OK with adding {{application/xml}} to the
{{interestingMimeTypeArray}} of the {{WebcrawlerConnector}}?

> Accept Sitemaps with content type application/xml
> -
>
> Key: CONNECTORS-1620
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1620
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Web connector
> Reporter: Markus Schuch
> Assignee: Markus Schuch
> Priority: Major
>
> Given an Output Connection that does not accept the MIME type
> {{application/xml}} for ingestion, it is currently not possible to crawl a
> sitemap.xml when the web server returns {{application/xml}} as the content
> type for the sitemap.
> The sitemap is discarded before the links are extracted, because the MIME
> type {{application/xml}} is not listed in the {{interestingMimeTypeArray}}.





[jira] [Updated] (CONNECTORS-1620) Accept Sitemaps with content type application/xml

2019-08-22 Thread Markus Schuch (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Schuch updated CONNECTORS-1620:
--
Description: 
Given an Output Connection that does not accept the MIME type
{{application/xml}} for ingestion, it is currently not possible to crawl a
sitemap.xml when the web server returns {{application/xml}} as the content
type for the sitemap.

The sitemap is discarded before the links are extracted, because the MIME type
{{application/xml}} is not listed in the {{interestingMimeTypeArray}}.

  was:
Given an Output Connection that does not accept the MIME type
{{application/xml}} for ingestion, it is currently not possible to crawl a
sitemap.xml when the web server returns {{application/xml}} as the content
type for the sitemap.

The sitemap is discarded before the links are extracted.


> Accept Sitemaps with content type application/xml
> -
>
> Key: CONNECTORS-1620
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1620
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Web connector
> Reporter: Markus Schuch
> Assignee: Markus Schuch
> Priority: Major
>
> Given an Output Connection that does not accept the MIME type
> {{application/xml}} for ingestion, it is currently not possible to crawl a
> sitemap.xml when the web server returns {{application/xml}} as the content
> type for the sitemap.
> The sitemap is discarded before the links are extracted, because the MIME
> type {{application/xml}} is not listed in the {{interestingMimeTypeArray}}.





[jira] [Created] (CONNECTORS-1620) Accept Sitemaps with content type application/xml

2019-08-22 Thread Markus Schuch (Jira)
Markus Schuch created CONNECTORS-1620:
-

 Summary: Accept Sitemaps with content type application/xml
 Key: CONNECTORS-1620
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1620
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Web connector
Reporter: Markus Schuch
Assignee: Markus Schuch


Given an Output Connection that does not accept the MIME type
{{application/xml}} for ingestion, it is currently not possible to crawl a
sitemap.xml when the web server returns {{application/xml}} as the content
type for the sitemap.

The sitemap is discarded before the links are extracted.


