I think the communication between uima-as service and client ought to be:

1) an internal detail, not something "spec'd" which we have to adhere to

2) done in such a way as to always "work", regardless of what various "OS" defaults are doing.

Whatever gets implemented, shouldn't it work not just for getMetaData data, but also other kinds of data passed between client and server?

I'm thinking the problem might be one of underspecifying encoding/decoding configurations in the various communication subsystems and APIs we use internally, allowing them to pick up the OS "defaults".
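
To illustrate what "underspecifying" could look like in practice, here is a minimal sketch. It is not taken from the UIMA-AS code base; the class and method names are made up for the example. It assumes the message crosses the wire as bytes, and shows that naming the charset on both sides round-trips the text, while a receiver decoding with a different charset (ISO-8859-1 here, standing in for a differing platform default) gets something else back.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of UIMA-AS.
public class ExplicitEncodingSketch {

    // Sender side: name the charset instead of calling message.getBytes(),
    // which would silently use the platform default.
    static byte[] encode(String message) {
        return message.getBytes(StandardCharsets.UTF_8);
    }

    // Receiver side: decode with the same, explicitly named charset.
    static String decode(byte[] wire) {
        return new String(wire, StandardCharsets.UTF_8);
    }

    // Simulates a receiver that decodes with some other charset,
    // as happens when it falls back to a differing platform default.
    static String decodeAs(byte[] wire, Charset receiverCharset) {
        return new String(wire, receiverCharset);
    }

    public static void main(String[] args) {
        // Non-ASCII characters are what expose the mismatch.
        String metadata = "<description>Gr\u00fc\u00dfe</description>";
        byte[] wire = encode(metadata);

        // Same charset on both ends: the text survives.
        System.out.println(decode(wire).equals(metadata));

        // Different charset on the receiving end: the text is corrupted.
        System.out.println(
            decodeAs(wire, StandardCharsets.ISO_8859_1).equals(metadata));
    }
}
```

Pure ASCII content would survive either way, which may be why this kind of problem only shows up with certain descriptors.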

-Marshall

On 6/7/2018 3:01 PM, Jaroslaw Cwiklik wrote:

I've created a JIRA for this: https://issues.apache.org/jira/browse/UIMA-5791
Not yet sure how to fix this. Will take a look next week. If I understand
the requirements right, the default encoding should be UTF-8 when
deserializing service metadata.
There should also be a way to override the default. Seems like we need a
new cmdline arg (or property) for the client to override the default encoding.
Jerry
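
One possible shape for such an override, as a rough sketch only: default to UTF-8 and let a system property replace it. The property name `uima.as.client.encoding` and the class below are hypothetical, invented for this example; nothing in this thread says UIMA-AS defines them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of UIMA-AS.
public class ClientEncodingSketch {

    // Hypothetical property name, chosen only for this example.
    static final String ENCODING_PROPERTY = "uima.as.client.encoding";

    // Returns UTF-8 unless a non-empty override is supplied.
    // Charset.forName throws an unchecked exception for an unknown or
    // malformed name, so a bad override fails fast instead of being ignored.
    static Charset resolveCharset(String override) {
        if (override == null || override.trim().isEmpty()) {
            return StandardCharsets.UTF_8;
        }
        return Charset.forName(override.trim());
    }

    // Decodes a received message with the resolved charset.
    static String decodeMetadata(byte[] wire) {
        Charset charset = resolveCharset(System.getProperty(ENCODING_PROPERTY));
        return new String(wire, charset);
    }

    public static void main(String[] args) {
        byte[] wire = "<name>caf\u00e9</name>".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeMetadata(wire));
    }
}
```

With this shape the client would be started with something like `-Duima.as.client.encoding=ISO-8859-1` only when the service is known to send a different encoding; otherwise UTF-8 applies regardless of the platform default.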

On Thu, Jun 7, 2018 at 9:31 AM Marshall Schor <m...@schor.com> wrote:

Recently, we debugged an issue where a user had a UIMA-AS client running on
Windows, connecting to a UIMA-AS service running on Linux in the cloud.

The Linux box was set up with LANG etc. set to UTF-8.  Windows did not have any
special configuration.

After a successful service deployment on Linux, the Windows client sent a
get-meta request, received a "message string" from the transport, and tried
to parse it with the XML parser, which failed with this error:

org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:202)

Eventually the user worked around this by launching the Windows client Java
with the extra parameter

   -D"file.encoding=UTF-8"

which made this problem go away (but may introduce other issues).

Should UIMA-AS communication protocols specify UTF-8 explicitly, instead of
defaulting to "platform defaults", which seem to cause issues if the defaults
aren't compatible?

-Marshall

