[ https://issues.apache.org/jira/browse/CONNECTORS-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215309#comment-17215309 ]
Karl Wright commented on CONNECTORS-1655:
-----------------------------------------

Basically, what is failing is the use of the character encoding "utf-8". As you know, this is a very standard charset and almost nothing will work without it. It is not on the list of things removed from JDK 11, as far as I am aware. Perhaps its name has changed and we therefore need to add a list of names that map to it somewhere, but usage would be strewn throughout ManifoldCF in any case.

The official Oracle doc says it should be there, and that charset names aren't case sensitive either:

https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/charset/Charset.html

I'm afraid it's up to you to research why it's not found in your setup.

> Web connector - UnsupportedEncodingException utf-8
> --------------------------------------------------
>
>                 Key: CONNECTORS-1655
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1655
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Web connector
>    Affects Versions: ManifoldCF 2.17
>            Reporter: Julien Massiera
>            Priority: Critical
>
> When crawling some sites (for instance this one: [http://www.antibes-juanlespins.com/] ) the job manages to index some documents, but then stops with the following error code:
> Error: IO error: utf-8; filename=rseventspro_rss20_56.xml
> Here is the MCF stacktrace:
> Exception tossed: IO error: utf-8; filename=rseventspro_rss20_56.xml
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: IO error: utf-8; filename=rseventspro_rss20_56.xml
>         at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4203) ~[?:?]
>         at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.extractLinks(WebcrawlerConnector.java:3855) ~[?:?]
>         at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:746) ~[?:?]
>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
> Caused by: java.io.UnsupportedEncodingException: utf-8; filename=rseventspro_rss20_56.xml
>         at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:71) ~[?:1.8.0_212]
>         at java.io.InputStreamReader.<init>(InputStreamReader.java:100) ~[?:1.8.0_212]
>         at org.apache.manifoldcf.connectorcommon.fuzzyml.DecodingByteReceiver.dealWithBytes(DecodingByteReceiver.java:47) ~[?:?]
>         at org.apache.manifoldcf.connectorcommon.fuzzyml.BOMEncodingDetector.dealWithRemainder(BOMEncodingDetector.java:250) ~[?:?]
>         at org.apache.manifoldcf.connectorcommon.fuzzyml.SingleByteReceiver.dealWithBytes(SingleByteReceiver.java:52) ~[?:?]
>         at org.apache.manifoldcf.connectorcommon.fuzzyml.Parser.parseWithCharsetDetection(Parser.java:74) ~[?:?]
>         at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4174) ~[?:?]
>         ... 3 more

--
This message was sent by Atlassian Jira (v8.3.4#803005)
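
For anyone wanting to do the research suggested in the comment above, a minimal sketch of a standalone check follows. It only uses the java.nio.charset.Charset API linked in the comment. The class name CharsetCheck and the helper describe() are hypothetical names chosen for illustration and are not part of ManifoldCF.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class CharsetCheck {
    // Hypothetical helper (not ManifoldCF code): reports how the running
    // JVM resolves a given charset name.
    static String describe(String name) {
        try {
            if (Charset.isSupported(name)) {
                // forName() resolves aliases and ignores case; name() is the canonical name.
                return "supported: " + Charset.forName(name).name();
            }
            return "unsupported";
        } catch (IllegalCharsetNameException e) {
            // Thrown when the string contains characters not allowed in a charset name.
            return "illegal name";
        }
    }

    public static void main(String[] args) {
        // Defaults cover the case-insensitivity point; pass other names as arguments.
        String[] names = args.length > 0 ? args : new String[] {"utf-8", "UTF-8"};
        for (String n : names) {
            System.out.println(n + " -> " + describe(n));
        }
    }
}
```

Running it on the same JVM that runs the crawler shows whether "utf-8" resolves there. Any other string, such as the exact text that follows "UnsupportedEncodingException:" in the quoted stack trace, can be passed as a command-line argument to see how that JVM treats it.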