[
https://issues.apache.org/jira/browse/XALANJ-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810609#comment-17810609
]
Joe Kesselman edited comment on XALANJ-2419 at 1/25/24 12:59 AM:
-----------------------------------------------------------------
Arggh. Found the difference in invocation, I think. It's an annoying one.
The failing version is using my commandline's current default binding of
"java", which runs through the /etc/alternatives system to invoke
`/usr/lib/jvm/java-17-openjdk-17.0.8.0.7-1.fc37.x86_64/bin/java`
The succeeding versions explicitly invoke /usr/lib/jvm/jre-1.8.0/bin/java,
which /etc/alternatives eventually maps to
`/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.382.b05-2.fc37.x86_64/jre/bin/java`
If I change my one-liner to use that 1.8 jre rather than my current default 17,
the errors vanish.
In the vain hope that the problem was specifically OpenJDK 17, I tried it with
`/jre-21-openjdk-21.0.1.0.12-1.rolling.fc37.x86_64`. Fails there too.
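When the same command behaves differently depending on how "java" is resolved, it can help to have the JVM itself report which runtime actually got launched. A minimal sketch, using only the standard `java.version` and `java.home` system properties; the class name `WhichJvm` is hypothetical and not part of Xalan or its test harness:

```java
// Hypothetical helper: prints the version and install directory of the
// JVM that is actually executing, regardless of how "java" was resolved.
class WhichJvm {
    static String describe() {
        // Both properties are standard and present on every JVM.
        return System.getProperty("java.version")
                + " @ " + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this once via the default "java" binding and once via the explicit 1.8 path should show two different versions and home directories if /etc/alternatives is indeed the cause.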
----
So *something* is being java-version sensitive and changed some time after Java
1.8. A bug fixed, a new bug, something redefined, something formatted
differently, something ordered differently. May be ... +_interesting_+ ... to
track down.
At least we now know how to provoke the divergent behavior in the debugger for
study.
Deep breath. Let it out slowly. Recite the mantra: "+_If it was easy, they
wouldn't need people like us_+."
> Astral characters written as a pair of NCRs with the surrogate scalar values
> when using UTF-8
> ---------------------------------------------------------------------------------------------
>
> Key: XALANJ-2419
> URL: https://issues.apache.org/jira/browse/XALANJ-2419
> Project: XalanJ2
> Issue Type: Bug
> Components: Serialization
> Affects Versions: 2.7.1
> Reporter: Henri Sivonen
> Assignee: Joe Kesselman
> Priority: Major
> Fix For: The Latest Development Code
>
> Attachments: XALANJ-2419-fix-v3.txt, XALANJ-2419-tests-v3.txt
>
>
> org.apache.xml.serializer.ToStream contains the following code:
>     else if (m_encodingInfo.isInEncoding(ch)) {
>         // If the character is in the encoding, and
>         // not in the normal ASCII range, we also
>         // just leave it get added on to the clean characters
>
>     }
>     else {
>         // This is a fallback plan, we should never get here
>         // but if the character wasn't previously handled
>         // (i.e. isn't in the encoding, etc.) then what
>         // should we do? We choose to write out an entity
>         writeOutCleanChars(chars, i, lastDirtyCharProcessed);
>         writer.write("&#");
>         writer.write(Integer.toString(ch));
>         writer.write(';');
>         lastDirtyCharProcessed = i;
>     }
> This leads to the wrong (latter) if branch running for surrogates, because
> isInEncoding() for UTF-8 returns false for surrogates. It is always wrong
> (regardless of encoding) to escape a surrogate as an NCR.
> The practical effect of this bug is that any document with astral characters
> in it ends up in an ill-formed serialization and does not parse back using an
> XML parser.
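To make the failure mode concrete: U+1F600 is stored in a Java string as the surrogate pair \uD83D \uDE00, and escaping each char separately yields `&#55357;&#56832;`, which no XML parser will accept. The well-formed alternative is to combine the pair into one code point first. The following is only an illustrative sketch, not the attached patch or the actual ToStream logic; the class `NcrSketch` and its `escape` method are hypothetical, and escaping every non-ASCII character is a simplifying assumption made to keep the example short:

```java
// Hypothetical sketch: escape by code point, never by individual surrogate.
class NcrSketch {
    static String escape(CharSequence chars) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < chars.length(); ) {
            // codePointAt joins a valid high/low surrogate pair into one
            // code point; an unpaired surrogate comes back as itself.
            int cp = Character.codePointAt(chars, i);
            if (cp >= 0xD800 && cp <= 0xDFFF) {
                // Unpaired surrogate: there is no well-formed way to write it.
                throw new IllegalArgumentException(
                        "unpaired surrogate at index " + i);
            }
            if (cp < 0x80) {
                out.appendCodePoint(cp);               // plain ASCII, copy through
            } else {
                out.append("&#").append(cp).append(';'); // one NCR per code point
            }
            i += Character.charCount(cp);              // advance 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a\uD83D\uDE00")); // prints a&#128512;
    }
}
```

A real serializer for UTF-8 output could equally write the astral character directly instead of escaping it; the point of the sketch is only that the pair is handled as a unit.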