[
https://issues.apache.org/jira/browse/CONNECTORS-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215375#comment-17215375
]
Julien Massiera commented on CONNECTORS-1655:
---------------------------------------------
Thanks for the fix!
> Web connector - UnsupportedEncodingException utf-8
> --------------------------------------------------
>
> Key: CONNECTORS-1655
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1655
> Project: ManifoldCF
> Issue Type: Bug
> Components: Web connector
> Affects Versions: ManifoldCF 2.17
> Reporter: Julien Massiera
> Assignee: Karl Wright
> Priority: Critical
> Fix For: ManifoldCF 2.18
>
>
> When crawling some sites (for instance this one:
> [http://www.antibes-juanlespins.com/]) the job manages to index some
> documents, but then stops with the following error:
> Error: IO error: utf-8; filename=rseventspro_rss20_56.xml
> Here is one of the MCF stack traces:
> Exception tossed: IO error: utf-8; filename=rseventspro_rss20_56.xml
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: IO error: utf-8; filename=rseventspro_rss20_56.xml
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4203) ~[?:?]
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.extractLinks(WebcrawlerConnector.java:3855) ~[?:?]
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:746) ~[?:?]
> at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
> Caused by: java.io.UnsupportedEncodingException: utf-8; filename=rseventspro_rss20_56.xml
> at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:71) ~[?:1.8.0_212]
> at java.io.InputStreamReader.<init>(InputStreamReader.java:100) ~[?:1.8.0_212]
> at org.apache.manifoldcf.connectorcommon.fuzzyml.DecodingByteReceiver.dealWithBytes(DecodingByteReceiver.java:47) ~[?:?]
> at org.apache.manifoldcf.connectorcommon.fuzzyml.BOMEncodingDetector.dealWithRemainder(BOMEncodingDetector.java:250) ~[?:?]
> at org.apache.manifoldcf.connectorcommon.fuzzyml.SingleByteReceiver.dealWithBytes(SingleByteReceiver.java:52) ~[?:?]
> at org.apache.manifoldcf.connectorcommon.fuzzyml.Parser.parseWithCharsetDetection(Parser.java:74) ~[?:?]
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4174) ~[?:?]
> ... 3 more
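
For readers of the trace above: the encoding name that reached InputStreamReader was the whole string "utf-8; filename=rseventspro_rss20_56.xml", which suggests that a trailing header parameter was kept as part of the charset value. The following is only a minimal sketch of how a charset could be isolated from a Content-Type value. It is not the ManifoldCF 2.18 fix, the class and method names (ContentTypeCharset, charsetOf) are hypothetical, and the header value in the usage note is an assumed example, not one captured from the site.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ManifoldCF.
final class ContentTypeCharset {
    private ContentTypeCharset() {}

    /**
     * Returns the charset named by the "charset" parameter of a Content-Type
     * value, or the fallback if the parameter is absent or not a usable name.
     * Simplification: parameters are split on ';', so a quoted value that
     * itself contains ';' is not handled.
     */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; the parameters follow it.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim();
            if (!name.equalsIgnoreCase("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Strip optional surrounding double quotes.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            try {
                return Charset.forName(value);
            } catch (IllegalArgumentException e) {
                // Covers both illegal and unsupported charset names.
                return fallback;
            }
        }
        return fallback;
    }
}
```

With an assumed header value such as "application/rss+xml; charset=utf-8; filename=rseventspro_rss20_56.xml", charsetOf(header, StandardCharsets.ISO_8859_1) resolves to UTF-8 instead of passing the trailing "filename" parameter on to the decoder. Returning a Charset rather than a String also means the InputStreamReader constructor that takes a Charset can be used, which does not throw UnsupportedEncodingException.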