Is UIMA-AS missing some encoding spec?

Marshall Schor Thu, 07 Jun 2018 06:32:02 -0700

Recently, we debugged an issue where a user had a UIMA-AS client running on Windows, connecting to a UIMA-AS service running on Linux in the cloud.

The Linux box was set up with LANG etc. set to UTF-8. Windows did not have any special configuration.

After a successful service deployment on Linux, the Windows client sent a getmeta request, received a "message string" from the transport, and tried to parse it with the XML parser, which failed with this error:


org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:202)
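
A minimal sketch that reproduces this class of error outside UIMA-AS. It assumes (this is not confirmed above) that the failure comes from text encoded in a single-byte platform charset being handed to a parser that expects UTF-8. ISO-8859-1 stands in for the Windows default (typically windows-1252); both encode U+00A9 as the single byte 0xA9, which is not a valid UTF-8 lead byte. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingMismatchDemo {

    // The XML declaration promises UTF-8; the content holds one non-ASCII char (U+00A9).
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?><meta>\u00A9</meta>";

    /** Encodes XML with the given charset and reports whether the SAX parser accepts the bytes. */
    static boolean parsesWhenEncodedAs(Charset charset) {
        byte[] bytes = XML.getBytes(charset);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Xerces-based parsers typically report "Invalid byte 1 of 1-byte UTF-8 sequence."
            // here; the exact wording depends on the parser implementation.
            System.out.println(charset + ": " + e.getMessage());
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 bytes parse: " + parsesWhenEncodedAs(StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 bytes parse: " + parsesWhenEncodedAs(StandardCharsets.ISO_8859_1));
    }
}
```

The same document parses cleanly when its bytes really are UTF-8. Only the mismatch between the declared and the actual encoding triggers the exception.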

Eventually the user worked around this by launching the Windows client Java with the extra parameter


 -Dfile.encoding=UTF-8

which made this problem go away (but may introduce other issues).
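
The flag helps because, before JDK 18, file.encoding determines Charset.defaultCharset(), which is what charset-less calls such as String.getBytes() or new String(byte[]) use. Since JDK 18 (JEP 400) the default is UTF-8 on every OS. A small diagnostic, sketched here under those assumptions with a hypothetical class name, can be run on both the client and the service host to compare what "platform default" means on each:

```java
import java.nio.charset.Charset;

public class EncodingReport {

    /** Summarizes the settings that drive "platform default" encoding in this JVM. */
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
            + ", defaultCharset=" + Charset.defaultCharset()
            + ", os.name=" + System.getProperty("os.name");
    }

    public static void main(String[] args) {
        // Run on both the client and the service host and compare the two lines.
        System.out.println(describe());
    }
}
```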

Should UIMA-AS communication protocols specify UTF-8 explicitly, instead of defaulting to "platform defaults", which seem to cause issues when the defaults aren't compatible?
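
A sketch of what "specify UTF-8 explicitly" could look like at a transport boundary. It does not show UIMA-AS's actual transport code; toWire and fromWire are hypothetical helpers. Both sides pass StandardCharsets.UTF_8 instead of relying on the platform default. Where the message is already a String, parsing it through a Reader skips byte decoding entirely, so the declared encoding cannot disagree with the bytes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ExplicitUtf8Codec {

    /** Sender side (hypothetical): always encode message text as UTF-8. */
    static byte[] toWire(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Receiver side (hypothetical): always decode as UTF-8, independent of file.encoding. */
    static String fromWire(byte[] wire) {
        return new String(wire, StandardCharsets.UTF_8);
    }

    /** Parses XML held in a String via a character stream, so no byte decoding happens. */
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><meta>\u00A9</meta>";
        String received = fromWire(toWire(xml));
        System.out.println("round trip intact: " + received.equals(xml));
        System.out.println("well-formed: " + isWellFormed(received));
        // Decoding the UTF-8 bytes C2 A9 with the wrong charset yields two chars instead of one.
        String wrong = new String(toWire("\u00A9"), StandardCharsets.ISO_8859_1);
        System.out.println("wrong charset length: " + wrong.length());
    }
}
```

Fixing the charset at both ends makes the result independent of the OS locale, which is the property the Windows/Linux pairing above lacked.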


-Marshall
