[jira] [Assigned] (CONNECTORS-1655) Web connector - UnsupportedEncodingException utf-8

2020-10-16 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1655:
---

Assignee: Karl Wright

> Web connector - UnsupportedEncodingException utf-8
> --
>
> Key: CONNECTORS-1655
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1655
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Critical
>
> When crawling some sites (for instance this one: 
> [http://www.antibes-juanlespins.com/] ) the job manages to index some 
> documents, but then stops with the following error:
> Error: IO error: utf-8; filename=rseventspro_rss20_56.xml
> Here is the MCF stack trace: 
> Exception tossed: IO error: utf-8; filename=rseventspro_rss20_56.xml
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: IO error: utf-8; 
> filename=rseventspro_rss20_56.xml
> at 
> org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4203)
>  ~[?:?]
> at 
> org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.extractLinks(WebcrawlerConnector.java:3855)
>  ~[?:?]
> at 
> org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:746)
>  ~[?:?]
> at 
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
> [mcf-pull-agent.jar:?]
> Caused by: java.io.UnsupportedEncodingException: utf-8; 
> filename=rseventspro_rss20_56.xml
> at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:71) 
> ~[?:1.8.0_212]
> at java.io.InputStreamReader.<init>(InputStreamReader.java:100) ~[?:1.8.0_212]
> at 
> org.apache.manifoldcf.connectorcommon.fuzzyml.DecodingByteReceiver.dealWithBytes(DecodingByteReceiver.java:47)
>  ~[?:?]
> at 
> org.apache.manifoldcf.connectorcommon.fuzzyml.BOMEncodingDetector.dealWithRemainder(BOMEncodingDetector.java:250)
>  ~[?:?]
> at 
> org.apache.manifoldcf.connectorcommon.fuzzyml.SingleByteReceiver.dealWithBytes(SingleByteReceiver.java:52)
>  ~[?:?]
> at 
> org.apache.manifoldcf.connectorcommon.fuzzyml.Parser.parseWithCharsetDetection(Parser.java:74)
>  ~[?:?]
> at 
> org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4174)
>  ~[?:?]
> ... 3 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CONNECTORS-1655) Web connector - UnsupportedEncodingException utf-8

2020-10-16 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215309#comment-17215309
 ] 

Karl Wright commented on CONNECTORS-1655:
-

Basically what is failing is using character encoding "utf-8".  As you know 
this is a very standard charset and almost nothing will work without it.  This 
is not on the list of things removed from JDK 11 as far as I am aware.  Perhaps 
its name has changed and we therefore need to add a list of names that map to 
it somewhere.  But usage would be strewn throughout ManifoldCF in any case.

But the official Oracle doc says it should be there, and isn't case sensitive 
either:

https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/charset/Charset.html

I'm afraid it's up to you to do research as to why it's not found in your setup.
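
The claim about the JDK can be checked in isolation. The sketch below is not ManifoldCF code; the class name CharsetLookupCheck and its method are made up for illustration. It uses only java.nio.charset.Charset to check whether a given spelling of the name resolves to the JDK's built-in UTF-8 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone check; not part of ManifoldCF.
final class CharsetLookupCheck {

  // Returns true when the given name resolves to the JDK's UTF-8 charset.
  // Assumes the name is syntactically legal; Charset.isSupported throws
  // IllegalCharsetNameException otherwise.
  static boolean resolvesToUtf8(String name) {
    return Charset.isSupported(name)
        && Charset.forName(name).equals(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Different spellings of the same charset name.
    for (String name : new String[] {"utf-8", "UTF-8", "Utf-8", "UTF8"}) {
      System.out.println(name + " -> " + resolvesToUtf8(name));
    }
  }
}
```

If every spelling resolves on the affected machine, the JDK installation is unlikely to be the cause.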




[jira] [Commented] (CONNECTORS-1655) Web connector - UnsupportedEncodingException utf-8

2020-10-16 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215375#comment-17215375
 ] 

Julien Massiera commented on CONNECTORS-1655:
-

Thanks for the fix!



[jira] [Commented] (CONNECTORS-1655) Web connector - UnsupportedEncodingException utf-8

2020-10-16 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215238#comment-17215238
 ] 

Julien Massiera commented on CONNECTORS-1655:
-

Hi [~kwri...@metacarta.com], I am using the official OpenJDK 11 installed from the 
Debian repo:
openjdk version "11.0.8" 2020-07-14
OpenJDK Runtime Environment 18.9 (build 11.0.8+10)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.8+10, mixed mode)



[jira] [Commented] (CONNECTORS-1655) Web connector - UnsupportedEncodingException utf-8

2020-10-16 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215311#comment-17215311
 ] 

Karl Wright commented on CONNECTORS-1655:
-

Ah, but wait a minute: the issue is that the document in question has an 
illegal content-type:

"utf-8; filename=rseventspro_rss20_56.xml"

A patch for that is possible.  
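
To make the failure concrete: the string handed to InputStreamReader is the whole fragment quoted above, and characters such as ';', space and '=' are not legal in a charset name, so the lookup fails even though "utf-8" itself is supported. The sketch below shows one way such a value could be cleaned up before use. It is an illustration only, not the change committed for this issue; the class CharsetNameCleaner, its methods, and the fallback to UTF-8 are all assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not the ManifoldCF patch.
final class CharsetNameCleaner {

  // Keep only the text before the first ';', then drop quotes and whitespace.
  static String clean(String raw) {
    if (raw == null) {
      return null;
    }
    int semi = raw.indexOf(';');
    String head = semi >= 0 ? raw.substring(0, semi) : raw;
    return head.replace("\"", "").trim();
  }

  // Resolve the cleaned name, falling back to UTF-8 (an assumed default)
  // when the name is missing, illegal, or unsupported.
  static Charset resolve(String raw) {
    String name = clean(raw);
    if (name == null || name.isEmpty()) {
      return StandardCharsets.UTF_8;
    }
    try {
      return Charset.isSupported(name) ? Charset.forName(name) : StandardCharsets.UTF_8;
    } catch (IllegalCharsetNameException e) {
      return StandardCharsets.UTF_8;
    }
  }

  // Returns true when InputStreamReader refuses the name as given,
  // which is the failure seen in the stack trace.
  static boolean readerRejects(String name) {
    try (InputStreamReader r =
        new InputStreamReader(new ByteArrayInputStream(new byte[0]), name)) {
      return false;
    } catch (UnsupportedEncodingException e) {
      return true;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String raw = "utf-8; filename=rseventspro_rss20_56.xml";
    System.out.println("reader rejects raw value: " + readerRejects(raw));
    System.out.println("cleaned name: " + clean(raw));
    System.out.println("resolved charset: " + resolve(raw));
  }
}
```

Falling back to a default rather than throwing is a design choice: it keeps the crawl going when a server sends a malformed header, at the cost of possibly decoding a document with the wrong charset.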




[jira] [Resolved] (CONNECTORS-1655) Web connector - UnsupportedEncodingException utf-8

2020-10-16 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1655.
-
Fix Version/s: ManifoldCF 2.18
   Resolution: Fixed

r1882582




Build failed in Jenkins: ManifoldCF » ManifoldCF-Artifacts-Ant-JDK11 #8

2020-10-16 Thread Apache Jenkins Server
See 


Changes:

[Karl Wright] Fix for CONNECTORS-1655.


--
[...truncated 1.21 MB...]
[javac]   symbol:   class Authentication
[javac]   location: class CswsSession
[javac] 
:98:
 error: cannot find symbol
[javac]   private final DocumentManagement documentManagementHandle;
[javac] ^
[javac]   symbol:   class DocumentManagement
[javac]   location: class CswsSession
[javac] 
:99:
 error: cannot find symbol
[javac]   private final ContentService contentServiceHandle;
[javac] ^
[javac]   symbol:   class ContentService
[javac]   location: class CswsSession
[javac] 
:100:
 error: cannot find symbol
[javac]   private final MemberService memberServiceHandle;
[javac] ^
[javac]   symbol:   class MemberService
[javac]   location: class CswsSession
[javac] 
:101:
 error: cannot find symbol
[javac]   private final SearchService searchServiceHandle;
[javac] ^
[javac]   symbol:   class SearchService
[javac]   location: class CswsSession
[javac] 
:109:
 error: cannot find symbol
[javac]   private Map workspaceTypeNodes = new HashMap<>();
[javac]   ^
[javac]   symbol:   class Node
[javac]   location: class CswsSession
[javac] 
:198:
 error: cannot find symbol
[javac]   public DocumentManagement getDocumentManagementHandle() {
[javac]  ^
[javac]   symbol:   class DocumentManagement
[javac]   location: class CswsSession
[javac] 
:206:
 error: cannot find symbol
[javac]   public ContentService getContentServiceHandle() {
[javac]  ^
[javac]   symbol:   class ContentService
[javac]   location: class CswsSession
[javac] 
:214:
 error: cannot find symbol
[javac]   public MemberService getMemberServiceHandle() {
[javac]  ^
[javac]   symbol:   class MemberService
[javac]   location: class CswsSession
[javac] 
:222:
 error: cannot find symbol
[javac]   public SearchService getSearchServiceHandle() {
[javac]  ^
[javac]   symbol:   class SearchService
[javac]   location: class CswsSession
[javac] 
:250:
 error: cannot find symbol
[javac]   public Node getRootNode(final String nodeType)
[javac]  ^
[javac]   symbol:   class Node
[javac]   location: class CswsSession
[javac] 
:266:
 error: cannot find symbol
[javac]   public List listNodes(final long nodeId)
[javac] ^
[javac]   symbol:   class Node
[javac]   location: class CswsSession
[javac] 
:282:
 error: cannot find symbol
[javac]   public List getChildren(final long nodeId)
[javac] ^
[javac]   symbol:   class Node
[javac]   location: class CswsSession