It does look like a step forward; I was just being confused by the hashtable behavior. Lemme look one more time and I'll probably merge.
I agree, this may be approaching enough of a fix to justify a new release.

--
/_ Joe Kesselman (he/him/his)
-/ _) My Alexa skill for New Music/New Sounds fans:
/ https://www.amazon.com/dp/B09WJ3H657/

Caveat: Opinionated old geezer with overcompensated writer's block. May be redundant, verbose, prolix, sesquipedalian, didactic, officious, or redundant.

________________________________
From: Cédric Damioli (Jira) <[email protected]>
Sent: Saturday, January 27, 2024 12:53:00 PM
To: [email protected] <[email protected]>
Subject: [jira] [Commented] (XALANJ-2725) Possible buffer-boundry issue when serializing surrogate pairs

[ https://issues.apache.org/jira/browse/XALANJ-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811574#comment-17811574 ]

Cédric Damioli commented on XALANJ-2725:
----------------------------------------

It would be great to solve all open encoding issues and then be able to make a release.
I think [~maxfortun]'s proposal, even if it don't cover all potential cases may still be a good achievement.
What is missing ? Does all tests pass ?
I'll test the PR on my edge cases.

> Possible buffer-boundry issue when serializing surrogate pairs
> --------------------------------------------------------------
>
> Key: XALANJ-2725
> URL: https://issues.apache.org/jira/browse/XALANJ-2725
> Project: XalanJ2
> Issue Type: Improvement
> Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
> Components: Serialization
> Reporter: Joe Kesselman
> Assignee: Joe Kesselman
> Priority: Major
> Labels: Surrogates, escaping, unicode, utf
> Attachments: astral-chars-split-buffer.patch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> XALANJ-2419 addressed a case where "astral" Unicode characters, requiring a
> surrogate pair (two UTF-16 units), were not being serialized correctly. We
> have a proposed fix for that.
> There is reported to still be an edge case when a surrogate pair which
> crosses buffer boundaries might not be handled correctly. [~maxfortun]
> offered what looks like a reasonable proposed fix
> (https://github.com/maxfortun/xalan-j/blob/a9bd5591d9f8a523548aeec091e886b64c691628/src/org/apache/xml/serializer/ToStream.java#L1607),
> but in my testing this was not serializing the surrogate pairs correctly,
> causing regression on the tests XALANJ-2419 introduced. I don't know whether
> that's because we're taking multiple paths through
> But the edge case does appear to be real, and if so we will need some such
> solution.
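
For illustration, the buffer-boundary case described in the issue can be sketched as below. This is a minimal standalone sketch, not the code in ToStream.java and not the proposed patch: the class name SurrogateChunkEncoder and its methods are hypothetical, and both the decimal character-reference output and the use of U+FFFD for unpaired surrogates are assumptions made for the example. The idea is simply to remember a high surrogate that ends one chunk and pair it with the low surrogate that starts the next.

```java
/**
 * Hypothetical illustration only; not Xalan's ToStream implementation.
 * Escapes astral characters as numeric character references even when
 * the surrogate pair is split across two write() calls.
 */
final class SurrogateChunkEncoder {
    // High surrogate left over from the end of the previous chunk, if any.
    private char pendingHigh;
    private boolean hasPending;

    /** Processes buf[off .. off+len) and returns the escaped text for this chunk. */
    public String write(char[] buf, int off, int len) {
        StringBuilder out = new StringBuilder();
        for (int i = off; i < off + len; i++) {
            char c = buf[i];
            if (hasPending) {
                hasPending = false;
                if (Character.isLowSurrogate(c)) {
                    // Pair completed, possibly across a chunk boundary.
                    appendCharRef(out, Character.toCodePoint(pendingHigh, c));
                    continue;
                }
                // Unpaired high surrogate: replaced with U+FFFD (assumption).
                out.append('\uFFFD');
            }
            if (Character.isHighSurrogate(c)) {
                // Hold it back until we see what follows, even in a later chunk.
                pendingHigh = c;
                hasPending = true;
            } else if (Character.isLowSurrogate(c)) {
                // Low surrogate with no preceding high surrogate.
                out.append('\uFFFD');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Call at end of input; emits a replacement for a dangling high surrogate. */
    public String flush() {
        if (!hasPending) {
            return "";
        }
        hasPending = false;
        return "\uFFFD";
    }

    private static void appendCharRef(StringBuilder out, int codePoint) {
        out.append("&#").append(codePoint).append(';');
    }

    public static void main(String[] args) {
        SurrogateChunkEncoder enc = new SurrogateChunkEncoder();
        // 'a', U+1F600 as a surrogate pair, 'b'; split so the pair straddles two chunks.
        char[] text = "a\uD83D\uDE00b".toCharArray();
        StringBuilder result = new StringBuilder();
        result.append(enc.write(text, 0, 2)); // 'a' + high surrogate
        result.append(enc.write(text, 2, 2)); // low surrogate + 'b'
        result.append(enc.flush());
        System.out.println(result); // prints a&#128512;b
    }
}
```

Keeping the pending surrogate as state on the writer, rather than looking ahead in the current buffer, is what lets the pair survive a chunk split; whether that matches the multiple code paths in the real serializer is not something this sketch addresses.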
