Robert Half created CAMEL-11846:
-----------------------------------

             Summary: xtokenize and apply xslt to a string does not work with UTF-16BE
                 Key: CAMEL-11846
                 URL: https://issues.apache.org/jira/browse/CAMEL-11846
             Project: Camel
          Issue Type: Bug
          Components: camel-core
    Affects Versions: 2.17.5
            Reporter: Robert Half


In XML, the encoding is often declared in the <?xml ...?> declaration. In
general, you cannot read that declaration without knowing the encoding, but XML
parsers can auto-detect several encodings (from a byte order mark, or from the
byte pattern of the leading "<?xml"), which allows them to read the declaration.
With that information they can read the whole file without knowing the charset
in the first place.
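A minimal sketch of that auto-detection, assuming the JDK's built-in StAX implementation (javax.xml.stream). It is plain StAX, not Camel code, and the class name EncodingDetectionDemo is hypothetical. The raw bytes are handed to XMLInputFactory#createXMLStreamReader(InputStream) with no charset, so the parser has to work out the encoding from the bytes themselves:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EncodingDetectionDemo {

    // Hands raw bytes to StAX; no charset is supplied, so the parser must detect it
    static String readText(byte[] xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new ByteArrayInputStream(xml));
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                // Collect all character data (it may arrive in several events)
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
            reader.close();
            return text.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // UTF-16BE without a byte order mark: detectable from the leading bytes 00 3C 00 3F
        byte[] utf16be = "<?xml version=\"1.0\" encoding=\"UTF-16BE\"?><root>\u00e4</root>"
                .getBytes(StandardCharsets.UTF_16BE);
        System.out.println(readText(utf16be).equals("\u00e4"));
    }
}
```

The same readText call also works for UTF-8 input. The caller never has to know which encoding the bytes use.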

xtokenize and xslt use XMLInputFactory#createXMLStreamReader(Reader). By
passing a Reader, Camel signals that it already knows the encoding, so the XML
parser does not detect it.
Camel also sets the charset to UTF-8 if none is provided in a header. As a
result, the underlying Reader fails to read UTF-16 input.
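The failure mode can be reproduced with plain StAX. This is a sketch assuming the JDK's built-in parser; ReaderCharsetDemo and parsesWith are hypothetical names, and this is not Camel's implementation. The bytes are decoded by an InputStreamReader with a fixed charset, and the resulting Reader is passed to XMLInputFactory#createXMLStreamReader(Reader). When the fixed charset is wrong (UTF-8 for UTF-16BE bytes), the parser receives characters such as U+0000 and rejects the document:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ReaderCharsetDemo {

    // Mimics a caller that decodes the bytes itself with a fixed charset and passes a Reader
    static boolean parsesWith(byte[] xml, Charset assumedCharset) {
        Reader chars = new InputStreamReader(new ByteArrayInputStream(xml), assumedCharset);
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            // With a Reader the parser cannot sniff the bytes or honor encoding="..."
            XMLStreamReader reader = factory.createXMLStreamReader(chars);
            while (reader.hasNext()) {
                reader.next();
            }
            reader.close();
            return true;
        } catch (XMLStreamException | RuntimeException e) {
            // Wrongly decoded input (e.g. U+0000 characters) is rejected as malformed XML
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf16be = "<?xml version=\"1.0\" encoding=\"UTF-16BE\"?><root>\u00e4</root>"
                .getBytes(StandardCharsets.UTF_16BE);
        System.out.println("UTF-8 Reader:    " + parsesWith(utf16be, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE Reader: " + parsesWith(utf16be, StandardCharsets.UTF_16BE));
    }
}
```

A Reader only works when the charset it was built with matches the bytes. A UTF-8 default is therefore wrong for UTF-16 input.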

Using XMLInputFactory#createXMLStreamReader(InputStream) inside
XMLTokenExpressionIterator works (I tried it in a patch). However, the
subsequent xslt step fails again, because it also uses a Reader.
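The patch itself is not included in the issue. As a rough sketch of the same idea applied to an XSLT step, here is plain JAXP (not Camel's xslt component). It assumes the JDK's built-in TransformerFactory, and XsltFromBytesDemo plus its stylesheet are hypothetical. The document is passed as a StreamSource built from an InputStream rather than a Reader, so the parser detects the encoding itself:

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltFromBytesDemo {

    // Hypothetical stylesheet: outputs the text content of <root> as plain text
    private static final String XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\"><xsl:value-of select=\"/root\"/></xsl:template>"
            + "</xsl:stylesheet>";

    static String transform(byte[] xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            // StreamSource(InputStream): the XML parser detects the document's encoding itself
            t.transform(new StreamSource(new ByteArrayInputStream(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] utf16be = "<?xml version=\"1.0\" encoding=\"UTF-16BE\"?><root>\u00e4</root>"
                .getBytes(StandardCharsets.UTF_16BE);
        System.out.println(transform(utf16be).equals("\u00e4"));
    }
}
```

Only the document source changes here. The stylesheet is a String, so reading it through a StringReader involves no decoding of bytes.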

See the Stack Overflow question for reference:
[https://stackoverflow.com/questions/46322376/apache-camel-to-handle-encoding-declared-in-xml-file]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)