[ https://issues.apache.org/jira/browse/XERCESJ-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Simmons updated XERCESJ-1574:
-----------------------------------
Attachment: patch.txt
It seems that the same changes completely broke encoding detection for the
"ISO-10646-UCS-2" encoding. This test passes with r578659 of XMLEntityManager
(which was in 2.9.1) but fails with the subsequent revision r581487, producing
this error:
org.xml.sax.SAXParseException: Content is not allowed in prolog.
This problem was also fixed in r1363647, so it probably has the same underlying
cause. I haven't investigated whether any other encodings are affected.
This seemed bad enough for me to build a patched Xerces locally; perhaps a
2.11 point release is warranted?
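For context, a parser can tell the 16-bit cases apart from the first four bytes of the entity, as tabulated in XML 1.0 Appendix F.1 (Detection Without External Encoding Information). The sketch below covers only the UTF-16 / ISO-10646-UCS-2 rows and reports the byte order; the final encoding name still comes from the encoding declaration. The class Utf16Sniffer and its method are hypothetical names, this is not the Xerces implementation, and it makes no claim about what r581487 actually changed.

```java
/**
 * Hypothetical helper (not Xerces code): infers the byte order of a 16-bit
 * encoding (UTF-16 or ISO-10646-UCS-2) from the first four bytes of an XML
 * entity, following the relevant rows of XML 1.0 Appendix F.1.
 */
public class Utf16Sniffer {

    /** Byte order as far as the first four bytes reveal it. */
    enum Order { BIG_ENDIAN, LITTLE_ENDIAN, UNKNOWN }

    static Order sniff(byte[] b) {
        if (b.length < 4) {
            return Order.UNKNOWN;
        }
        int b0 = b[0] & 0xFF, b1 = b[1] & 0xFF, b2 = b[2] & 0xFF, b3 = b[3] & 0xFF;
        // FE FF 00 00 and FF FE 00 00 indicate UCS-4, not a 16-bit encoding
        boolean nextTwoZero = b2 == 0x00 && b3 == 0x00;

        // With a byte order mark
        if (b0 == 0xFE && b1 == 0xFF && !nextTwoZero) {
            return Order.BIG_ENDIAN;
        }
        if (b0 == 0xFF && b1 == 0xFE && !nextTwoZero) {
            return Order.LITTLE_ENDIAN;
        }
        // Without a BOM: "<?" encoded with 16-bit code units
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) {
            return Order.BIG_ENDIAN;
        }
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) {
            return Order.LITTLE_ENDIAN;
        }
        return Order.UNKNOWN;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>\n<a/>";
        System.out.println(sniff(xml.getBytes("UnicodeLittle"))); // LITTLE_ENDIAN (BOM FF FE)
        System.out.println(sniff(xml.getBytes("UTF-16BE")));      // BIG_ENDIAN (no BOM)
        System.out.println(sniff(xml.getBytes("UTF-8")));         // UNKNOWN
    }
}
```

If bytes like these are decoded with the wrong byte order, the parser no longer sees "<?xml" at the start of the document, which is consistent with the "Content is not allowed in prolog" error reported above.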
> Problem with detected encoding for UTF-16 encoded as Unicode Little
> -------------------------------------------------------------------
>
> Key: XERCESJ-1574
> URL: https://issues.apache.org/jira/browse/XERCESJ-1574
> Project: Xerces2-J
> Issue Type: Bug
> Components: DOM (Level 3 Core)
> Affects Versions: 2.11.0
> Reporter: Radu Coravu
> Assignee: Michael Glavassevich
> Attachments: patch.txt
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> I have the following test case:
> import java.io.ByteArrayInputStream;
> import org.apache.xerces.parsers.DOMParser;
> import org.xml.sax.InputSource;
>
> ByteArrayInputStream bis = new ByteArrayInputStream(
>     "<?xml version=\"1.0\" encoding=\"UTF-16\"?>\n<a/>".getBytes("UnicodeLittle"));
> InputSource is = new InputSource(bis);
> DOMParser dp = new DOMParser();
> dp.parse(is);
> assertEquals("UTF-16LE", dp.getDocument().getInputEncoding());
> The input stream is encoded as "UnicodeLittle", and
> "dp.getDocument().getInputEncoding()" should return "UTF-16LE" (at least it
> did so in the previous Xerces version). Right now it returns "UTF-16"
> regardless of the byte order mark in the input stream.
> So a developer relying on "dp.getDocument().getInputEncoding()" does not
> know how to save the document in order to preserve the same BOM.
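To see why a bare "UTF-16" is ambiguous when saving, it helps to look at the bytes the JDK's charsets produce. "UnicodeLittle" writes a little-endian BOM (FF FE), the "UTF-16" encoder writes a big-endian BOM (FE FF), and "UTF-16LE" writes no BOM at all. The class BomDemo and its helper head are illustrative names only.

```java
import java.nio.charset.Charset;

/** Prints the leading bytes that a few of the JDK's UTF-16 charsets produce. */
public class BomDemo {

    /** Formats the first n bytes of b as space-separated upper-case hex. */
    static String head(byte[] b, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n && i < b.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>\n<a/>";
        for (String cs : new String[] {"UnicodeLittle", "UTF-16", "UTF-16LE"}) {
            System.out.println(cs + ": " + head(xml.getBytes(Charset.forName(cs)), 4));
        }
        // UnicodeLittle: FF FE 3C 00
        // UTF-16: FE FF 00 3C
        // UTF-16LE: 3C 00 3F 00
    }
}
```

Re-encoding with the reported "UTF-16" therefore flips the byte order of the original document. Knowing "UTF-16LE" at least preserves the byte order, although with the JDK's "UTF-16LE" encoder the application would still have to write FF FE itself (or use "UnicodeLittle").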
> This problem is related to the encoding detection changes that were made in
> XMLEntityManager.
> As a proposed modification, in the method:
> org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(String,
> XMLInputSource, boolean, boolean)
> before the code:
> fCurrentEntity = new ScannedEntity(name,....
> we could add the following code:
> if ("UTF-16".equals(encoding)) {
>     if (isBigEndian != null) {
>         if (isBigEndian) {
>             encoding = "UTF-16BE";
>         } else {
>             encoding = "UTF-16LE";
>         }
>     }
> }
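The proposed change can be tried in isolation with a small standalone helper. This sketch assumes, as the null check in the proposal suggests, that isBigEndian is a java.lang.Boolean holding the byte order found during auto-detection (null when unknown); EncodingRefinement and refineUtf16Encoding are hypothetical names, not Xerces API.

```java
/**
 * Hypothetical standalone version of the proposed refinement; in the
 * proposal this logic sits inline in XMLEntityManager.setupCurrentEntity.
 */
public class EncodingRefinement {

    /**
     * @param encoding    encoding name chosen so far, e.g. "UTF-16"
     * @param isBigEndian byte order from auto-detection, or null if unknown
     * @return "UTF-16BE"/"UTF-16LE" when the byte order is known, else encoding unchanged
     */
    static String refineUtf16Encoding(String encoding, Boolean isBigEndian) {
        if ("UTF-16".equals(encoding) && isBigEndian != null) {
            return isBigEndian ? "UTF-16BE" : "UTF-16LE";
        }
        // Unknown byte order, or an encoding other than UTF-16: leave as-is
        return encoding;
    }

    public static void main(String[] args) {
        System.out.println(refineUtf16Encoding("UTF-16", Boolean.FALSE)); // UTF-16LE
        System.out.println(refineUtf16Encoding("UTF-16", Boolean.TRUE));  // UTF-16BE
        System.out.println(refineUtf16Encoding("UTF-16", null));          // UTF-16
        System.out.println(refineUtf16Encoding("UTF-8", Boolean.TRUE));   // UTF-8
    }
}
```

Folding the nested ifs into one condition and a conditional expression keeps the behaviour of the proposal: any encoding other than "UTF-16", or an unknown byte order, passes through unchanged.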