[ https://issues.apache.org/jira/browse/UIMA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537368#comment-16537368 ]
Marshall Schor edited comment on UIMA-5791 at 7/9/18 6:50 PM:
--------------------------------------------------------------

Looks like on windows the default is:
file.encoding: Cp1252
defaultCharset: windows-1252

and on Linux:
file.encoding: UTF-8
defaultCharset: UTF-8

Should the service send to a client its encoding setting in a GetMeta response? A client could then use it for meta and subsequent cas deserialization.


was (Author: cwiklik):
Looks like on windows the default is:
file.encoding: Cp1252
defaultCharset: windows-1252

and on Linux:
file.encoding: UTF-8
defaultCharset: UTF-8

Should the service send to a client its encoding setting in a GetMeta response? A client could then use it for meta and subsequent cas deserialization.

> UIMA-AS: fix client SAXParseException when deserializing metadata
> -----------------------------------------------------------------
>
>                 Key: UIMA-5791
>                 URL: https://issues.apache.org/jira/browse/UIMA-5791
>             Project: UIMA
>          Issue Type: Bug
>          Components: Async Scaleout
>            Reporter: Jerry Cwiklik
>            Assignee: Jerry Cwiklik
>            Priority: Major
>             Fix For: 2.10.4AS
>
>
> XML parser fails with SAXParseException when trying to deserialize service metadata. The scenario which causes the error is:
> UIMA-AS client running on windows
> Service runs on linux
> The client sends getMeta request and receives a response from a service.
> The client tries to deserialize the meta and gets:
> Jun 06, 2018 2:25:10 PM org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl$2 onMessage
> WARNING: org.apache.uima.util.InvalidXMLException: Invalid descriptor at <unknown source>.
>     at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:219)
>     at org.apache.uima.util.impl.XMLParser_impl.parseResourceMetaData(XMLParser_impl.java:438)
>     at org.apache.uima.util.impl.XMLParser_impl.parseResourceMetaData(XMLParser_impl.java:420)
>     at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.handleMetadataReply(BaseUIMAAsynchronousEngineCommon_impl.java:1178)
>     at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl$2.run(BaseUIMAAsynchronousEngineCommon_impl.java:2065)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>     at java.lang.Thread.run(Thread.java:811)
> Caused by: org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.
>     at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>     at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>     at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:202)
>     ... 7 more
>
> A workaround for the above was to set -D"file.encoding=UTF-8" on the client.
> Review the code and provide a fix. Perhaps XML InputSource has a way to set encoding. The default should be UTF-8. It seems we need a new uima-as property (or command line arg) to override the default in case a user needs a different encoding.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
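
On the question of whether InputSource can set the encoding: org.xml.sax.InputSource does have a setEncoding(String) method, which a parser applies when the source is a byte stream. The following is only a minimal sketch of that idea, not the UIMA-AS fix. The class and method names (ExplicitEncodingParse, parseAsUtf8, TextCollector) are hypothetical, it uses the JDK's built-in SAX parser rather than UIMA's XMLParser_impl, and it assumes the service serialized its metadata as UTF-8 bytes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; none of these names come from UIMA-AS.
public class ExplicitEncodingParse {

    /** Collects element text so the caller can inspect what was decoded. */
    static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per element, so append each chunk.
            text.append(ch, start, length);
        }
    }

    /** Parses the given bytes as UTF-8 XML, independent of file.encoding. */
    static String parseAsUtf8(byte[] xmlBytes) throws Exception {
        InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
        // Name the encoding explicitly instead of relying on platform defaults.
        source.setEncoding("UTF-8");
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        // The two values quoted in the comment above (Cp1252 vs UTF-8).
        System.out.println("file.encoding: " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset: " + Charset.defaultCharset());

        // Assumption: the sender encodes explicitly as UTF-8, not via getBytes().
        byte[] wire = "<name>caf\u00e9</name>".getBytes(StandardCharsets.UTF_8);
        System.out.println(parseAsUtf8(wire));
    }
}
```

The sketch only helps if both ends agree on the encoding. If either side converts between String and bytes with the platform default (for example String.getBytes() with no argument), a Cp1252 client and a UTF-8 service can still disagree, which may be what the file.encoding workaround papers over.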