[ 
https://issues.apache.org/jira/browse/XALANJ-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439379#comment-16439379
 ] 

Uwe Schindler edited comment on XALANJ-2419 at 4/16/18 12:48 PM:
-----------------------------------------------------------------

Thanks for the fix, I will test it in a moment.

About a release: I am an Apache member and committer, so I might start a 
thread to push for a release. As these bugs are horrible and break almost any 
XML handling of things like emojis, we should maybe do a bugfix release of 
serializer.jar. Keep in mind that this would also require a Xerces release, as 
Xerces and Xalan share serializer.jar (I think they depend on each other on 
Maven Central).

I would try to help with a release. The fix is indeed simple. Somebody should 
just commit it (I could theoretically do it, but a commit by a non-project 
member should only be a last resort), and somebody else would then press the 
button for the release.

BTW, my own projects like Apache Solr are also affected by this bug (for 
people who still use XML instead of JSON with Solr).


> Astral characters written as a pair of NCRs with the surrogate scalar values 
> when using UTF-8
> ---------------------------------------------------------------------------------------------
>
>                 Key: XALANJ-2419
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2419
>             Project: XalanJ2
>          Issue Type: Bug
>          Components: Serialization
>    Affects Versions: 2.7.1
>            Reporter: Henri Sivonen
>            Priority: Major
>         Attachments: XALANJ-2419-fix-v3.txt, XALANJ-2419-tests-v3.txt
>
>
> org.apache.xml.serializer.ToStream contains the following code:
>                     else if (m_encodingInfo.isInEncoding(ch)) {
>                         // If the character is in the encoding, and
>                         // not in the normal ASCII range, we also
>                         // just leave it get added on to the clean characters
>                         
>                     }
>                     else {
>                         // This is a fallback plan, we should never get here
>                         // but if the character wasn't previously handled
>                         // (i.e. isn't in the encoding, etc.) then what
>                         // should we do?  We choose to write out an entity
>                         writeOutCleanChars(chars, i, lastDirtyCharProcessed);
>                         writer.write("&#");
>                         writer.write(Integer.toString(ch));
>                         writer.write(';');
>                         lastDirtyCharProcessed = i;
>                     }
> This leads to the wrong (latter) if branch running for surrogates, because 
> isInEncoding() for UTF-8 returns false for surrogates. It is always wrong 
> (regardless of encoding) to escape a surrogate as an NCR.
> The practical effect of this bug is that any document with astral characters 
> in it ends up in an ill-formed serialization and does not parse back using an 
> XML parser.
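
For illustration, the behavior the description asks for can be sketched as 
below. This is only a minimal sketch and not the attached patch: the class 
SurrogateAwareEscaper and its escape() method are hypothetical names, and the 
sketch assumes that every non-ASCII character is to be written as an NCR. It 
combines a surrogate pair into one code point before escaping, so U+1F600 
comes out as the single reference &#128512; rather than the ill-formed pair 
&#55357;&#56832;.

```java
// Hypothetical helper for illustration only; it is not part of Xalan.
final class SurrogateAwareEscaper {

    // Escapes every non-ASCII character as a numeric character reference,
    // treating a surrogate pair as one code point. Markup characters such
    // as '<' and '&' are deliberately left alone to keep the sketch short.
    static String escape(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char ch = input.charAt(i);
            if (Character.isHighSurrogate(ch)
                    && i + 1 < input.length()
                    && Character.isLowSurrogate(input.charAt(i + 1))) {
                // Combine both halves into the astral code point and
                // write one NCR for the pair.
                int codePoint = Character.toCodePoint(ch, input.charAt(i + 1));
                out.append("&#").append(codePoint).append(';');
                i += 2;
            } else if (Character.isSurrogate(ch)) {
                // An unpaired surrogate cannot be represented in XML at all.
                throw new IllegalArgumentException(
                        "Unpaired surrogate at index " + i);
            } else if (ch < 0x80) {
                out.append(ch);
                i++;
            } else {
                out.append("&#").append((int) ch).append(';');
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "\uD83D\uDE00" is the surrogate pair for U+1F600.
        System.out.println(escape("smile: \uD83D\uDE00"));
    }
}
```

When the output encoding is UTF-8, writing the astral character directly 
instead of as an NCR would be equally valid; the sketch uses an NCR only to 
make the difference from the buggy output visible.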



