[
https://issues.apache.org/jira/browse/XALANJ-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810437#comment-17810437
]
Max commented on XALANJ-2725:
-----------------------------
Hi [[email protected]] , all good points. Considering that I am not an
"expert" in xalan, and have no deep understanding in its full complex
functionality, my intention was to minimize the potential blast radius that may
have been caused by my change. Thus I changed as little code as possible
solving a very narrow and very specific problem. I agree that a broken
character sequence SHOULD result in an error and not in masking. Will this
change break backward compatibility? I am not sure, but it is the right course
of action. As far as using a Character class I just followed suit from other
areas in the code. As you suggested, we do not really need that class and can
get away with a numeric value, as you suggested. I do not think 0 is a valid
value, so might as well serve in place of null.
Also, I tested my changes live on my use-cases, I did not find the regression
tests that you are mentioning as failing. Would you mind pointing me in the
right direction? I'd like to run them and see where they fail, maybe I'll catch
what I missed.
Thanks!
> Possible buffer-boundry issue when serializing surrogate pairs
> --------------------------------------------------------------
>
> Key: XALANJ-2725
> URL: https://issues.apache.org/jira/browse/XALANJ-2725
> Project: XalanJ2
> Issue Type: Improvement
> Security Level: No security risk; visible to anyone(Ordinary problems in
> Xalan projects. Anybody can view the issue.)
> Components: Serialization
> Reporter: Joe Kesselman
> Assignee: Joe Kesselman
> Priority: Major
> Labels: Surrogates, escaping, unicode, utf
> Attachments: astral-chars-split-buffer.patch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> XALANJ-2419 addressed a case where "astral" Unicode characters, requiring a
> surrogate pair (two UTF-16 units), were not being serialized correctly. We
> have a proposed fix for that.
> There is reported to still be an edge case when a surrogate pair which
> crosses buffer boundaries might not be handled correctly. [~maxfortun]
> offered what looks like a reasonable proposed fix
> (https://github.com/maxfortun/xalan-j/blob/a9bd5591d9f8a523548aeec091e886b64c691628/src/org/apache/xml/serializer/ToStream.java#L1607),
> but in my testing this was not serializing the surrogate pairs correctly,
> causing regression on the tests XALANJ-2419 introduced. I don't know whether
> that's because we're taking multiple paths through
> But the edge case does appear to be real, and if so we will need some such
> solution.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]