XOPAwareStAXOMBuilder / MTOMStAXSOAPModelBuilder should use UTF-8 to decode
cid: URIs
-------------------------------------------------------------------------------------
Key: WSCOMMONS-429
URL: https://issues.apache.org/jira/browse/WSCOMMONS-429
Project: WS-Commons
Issue Type: Bug
Reporter: Andreas Veithen
Assignee: Andreas Veithen
Priority: Minor
XOPAwareStAXOMBuilder and MTOMStAXSOAPModelBuilder use the document charset
encoding to decode cid: URIs (see usage of URLDecoder.decode in
ElementHelper#getContentID). However, as explained in [1] (referenced by the
definition of the anyURI type), %HH escaping should always be done using UTF-8.
Since non-ASCII characters are not allowed in content IDs, this is only an
issue if the document uses a charset encoding that is not a superset of ASCII
(e.g. UTF-16). Moreover, most of the characters that require %HH encoding are
themselves not allowed (or are unusual) in content IDs, which is why this is a
minor issue.
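
To illustrate the difference described above, here is a minimal sketch. It is
not the actual ElementHelper#getContentID code: the class CidDecodeSketch, the
helper decodeContentId and the content ID "part1@example.org" are made up for
this example. It only assumes the standard java.net.URLDecoder API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class CidDecodeSketch {
    // Hypothetical helper: strips the "cid:" prefix and decodes %HH escapes
    // using the given charset name.
    public static String decodeContentId(String href, String charset) {
        if (!href.startsWith("cid:")) {
            throw new IllegalArgumentException("Not a cid: URI: " + href);
        }
        try {
            return URLDecoder.decode(href.substring(4), charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up content ID with '@' escaped as %40.
        String href = "cid:part1%40example.org";

        // Decoding with UTF-8, as required for %HH escapes.
        System.out.println(decodeContentId(href, "UTF-8"));

        // Decoding with the document charset (here UTF-16): the single
        // escaped byte 0x40 is not a complete UTF-16 code unit, so the
        // result no longer matches the real content ID.
        String wrong = decodeContentId(href, "UTF-16");
        System.out.println("part1@example.org".equals(wrong));
    }
}
```

With the charset fixed to UTF-8, the decoded content ID no longer depends on
the encoding of the enclosing document.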
Note that the unit test MTOMStAXSOAPModelBuilderTest#testUTF16MTOMMessage
specifically tests this incorrect behavior. The test should therefore be
corrected or removed entirely.
[1] http://www.w3.org/TR/2001/WD-charmod-20010126/#sec-URIs