[ https://issues.apache.org/jira/browse/XALANJ-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774677#comment-16774677 ]
Jason Harrop commented on XALANJ-2419:
--------------------------------------
The test passes under Java 11 if I change makeStream("ISO-8859-1") to
makeStream("ISO8859_1").
With makeStream("ISO-8859-1"), s.getBytes(encoding) throws
UnsupportedEncodingException for the encoding name "8859-1", with this stack:
{code:java}
EncodingInfo.inEncoding(char, String) line: 438
EncodingInfo$EncodingImpl.isInEncoding(char) line: 226
EncodingInfo$EncodingImpl.isInEncoding(char) line: 215
EncodingInfo.isInEncoding(char) line: 113
ToXMLStream(ToStream).characters(char[], int, int) line: 1597
ToXMLStream(ToStream).characters(String) line: 1774
ToXMLStreamTest(ToStreamTest).outputCharacters(ToStream, String) line: 88
ToXMLStreamTest.testCase2() line: 114
NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]
NativeMethodAccessorImpl.invoke(Object, Object[]) line: 62
DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
Method.invoke(Object, Object...) line: 566
Reporter.executeTests(Test, int, Object) line: 787
ToXMLStreamTest(FileBasedTest).runTestCases(Properties) line: 339
ToXMLStreamTest(TestImpl).runTest(Properties) line: 205
ToXMLStreamTest(FileBasedTest).doMain(String[]) line: 833
ToXMLStreamTest.main(String[]) line: 196
{code}
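For anyone who wants to check how their own JVM resolves these names, here is a minimal sketch. The class and method names (CharsetNameProbe, canonicalName) are hypothetical and not part of Xalan or the test suite. It only asks the JVM whether a name is a known charset or alias; it does not reproduce whatever processing in the serializer ends up passing "8859-1" to getBytes().

```java
import java.nio.charset.Charset;

public class CharsetNameProbe {

    /**
     * Returns the canonical charset name the JVM maps 'name' to,
     * or null if the JVM does not recognize the name.
     */
    static String canonicalName(String name) {
        try {
            return Charset.isSupported(name) ? Charset.forName(name).name() : null;
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException: the name is not syntactically legal
            return null;
        }
    }

    public static void main(String[] args) {
        // The two names passed to makeStream(), plus the name from the exception.
        for (String name : new String[] {"ISO-8859-1", "ISO8859_1", "8859-1"}) {
            System.out.println(name + " -> " + canonicalName(name));
        }
    }
}
```

A null result for a name means String.getBytes(name) would throw UnsupportedEncodingException for it on that JVM.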
Not related to 2419, but FYI one other test fails because of date
formatting differences introduced by JEP 252 (CLDR locale data used by
default): http://openjdk.java.net/jeps/252
I've put the test code on GitHub; for Java 11 I am using
https://github.com/plutext/xalan-test/tree/Plutext_Java11_xalan-j_2_7_x
> Astral characters written as a pair of NCRs with the surrogate scalar values
> when using UTF-8
> ---------------------------------------------------------------------------------------------
>
> Key: XALANJ-2419
> URL: https://issues.apache.org/jira/browse/XALANJ-2419
> Project: XalanJ2
> Issue Type: Bug
> Components: Serialization
> Affects Versions: 2.7.1
> Reporter: Henri Sivonen
> Priority: Major
> Attachments: XALANJ-2419-fix-v3.txt, XALANJ-2419-tests-v3.txt
>
>
> org.apache.xml.serializer.ToStream contains the following code:
>     else if (m_encodingInfo.isInEncoding(ch)) {
>         // If the character is in the encoding, and
>         // not in the normal ASCII range, we also
>         // just leave it get added on to the clean characters
>
>     }
>     else {
>         // This is a fallback plan, we should never get here
>         // but if the character wasn't previously handled
>         // (i.e. isn't in the encoding, etc.) then what
>         // should we do? We choose to write out an entity
>         writeOutCleanChars(chars, i, lastDirtyCharProcessed);
>         writer.write("&#");
>         writer.write(Integer.toString(ch));
>         writer.write(';');
>         lastDirtyCharProcessed = i;
>     }
> This leads to the wrong (latter) if branch running for surrogates, because
> isInEncoding() for UTF-8 returns false for surrogates. It is always wrong
> (regardless of encoding) to escape a surrogate as an NCR.
> The practical effect of this bug is that any document with astral characters
> in it ends up in an ill-formed serialization and does not parse back using an
> XML parser.
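
To illustrate the point in the description above, here is a minimal sketch of escaping per code point rather than per char. This is not the attached patch and not Xalan code; the class and method names (AstralNcr, escapeNonAscii) are hypothetical. It assumes the simplest policy of escaping everything outside ASCII, and it does not handle XML markup characters such as & or <.

```java
public class AstralNcr {

    /**
     * Escapes every non-ASCII character as a decimal numeric character
     * reference, emitting one NCR per Unicode code point.
     */
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            // codePointAt() combines a valid high/low surrogate pair
            // into a single supplementary code point.
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 0x80) {
                out.append((char) cp);
            } else if (cp >= 0xD800 && cp <= 0xDFFF) {
                // An unpaired surrogate has no valid XML representation.
                throw new IllegalArgumentException(
                        "Unpaired surrogate U+" + Integer.toHexString(cp).toUpperCase());
            } else {
                out.append("&#").append(cp).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) is the pair D834 DD1E in UTF-16.
        System.out.println(escapeNonAscii("a\uD834\uDD1Eb")); // prints a&#119070;b
    }
}
```

Escaping each char separately would instead give a&#55348;&#56606;b, which is the ill-formed output the issue describes.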