[
https://issues.apache.org/jira/browse/XALANJ-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819804#comment-17819804
]
Joe Kesselman commented on XALANJ-2730:
---------------------------------------
As discussed in XALANJ-2725, there are still some edge conditions possible even
after the problem of splitting output across UTF16 buffer boundaries has been
handled. I dropped some additional comments into the serializer ToStream class
to document my concerns.
If an isolated High or Low surrogate somehow gets into the data stream, we are
inconsistent in how we handle it – it may throw an exception, or it may
"silently" output the surrogate as a Numeric Character Reference – which will
not be syntactically or semantically correct per either XML or UTF16, and which
doesn't warn the user of the problem, but which does attempt to show the
problem (approximately) in context.
My _preferred_ fix would be to have malformed UTF16 input always throw
exceptions rather than trying to dance around this to output (unusable) Numeric
Character References for isolated surrogates, especially since the remaining
edge conditions are particularly ugly ones. But comments in the code seem to
suggest that we moved away from that for some reason, and I don't recall
why/how that was justified.
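
To make the "always throw" option concrete, here is a minimal sketch of a strict well-formedness check. It is not the ToStream code; the class name, the method names, and the choice of IOException are assumptions made only for illustration.

```java
import java.io.IOException;

// Hypothetical helper, not part of Xalan: rejects any UTF-16 text that
// contains a high surrogate without a following low surrogate, or a low
// surrogate without a preceding high surrogate.
final class SurrogateCheck {

    /** Returns the index of the first isolated surrogate, or -1 if none. */
    static int firstIsolatedSurrogate(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // well-formed pair: skip the low half
                } else {
                    return i; // high surrogate with no partner
                }
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate with no preceding high surrogate
            }
        }
        return -1;
    }

    /** Throws instead of emitting anything for malformed input (assumed policy). */
    static void requireWellFormed(CharSequence s) throws IOException {
        int at = firstIsolatedSurrogate(s);
        if (at >= 0) {
            throw new IOException("Isolated UTF-16 surrogate 0x"
                    + Integer.toHexString(s.charAt(at)) + " at index " + at);
        }
    }

    public static void main(String[] args) throws IOException {
        requireWellFormed("plain text");   // passes
        requireWellFormed("\uD83D\uDE00"); // passes: a complete surrogate pair
        try {
            requireWellFormed("a\uD80Cb"); // isolated high surrogate
        } catch (IOException expected) {
            System.out.println(expected.getMessage());
        }
    }
}
```

Note that a check like this only sees one buffer at a time. A pair split across buffer boundaries would still need the carry-over handling discussed in XALANJ-2725.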
If we do stay with fake-NCRs for isolated surrogates, I'm seriously considering
changing them to be fake-entity-references, which will at least not be
syntactically incorrect; this could be done by replacing the current output, e.g.
{{&#55308;}}, with something more like
{{&ERR_INVALID_UTF16_SURROGATE_55308;}}, using the MsgKey string so we are at
least in sync with the internationalization layer for clarity.
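
The two output forms can be compared side by side in a small sketch. The class and method names are hypothetical, and the entity-name prefix is simply copied from the example above rather than read from MsgKey. The NCR form is not well-formed because the XML Char production excludes the surrogate range 0xD800-0xDFFF, whereas the entity-reference form is at least a syntactically valid reference.

```java
// Hypothetical helper, not part of Xalan: formats the two candidate
// markers for an isolated surrogate so they can be compared.
final class IsolatedSurrogateMarkers {

    // Assumed prefix, taken from the example in this comment.
    private static final String ENTITY_PREFIX = "ERR_INVALID_UTF16_SURROGATE_";

    /** Current style: a decimal Numeric Character Reference. */
    static String asNumericCharRef(char surrogate) {
        return "&#" + (int) surrogate + ";";
    }

    /** Proposed style: a fake entity reference carrying the same code unit. */
    static String asEntityRef(char surrogate) {
        return "&" + ENTITY_PREFIX + (int) surrogate + ";";
    }

    public static void main(String[] args) {
        char isolated = (char) 0xD80C; // 55308 decimal, a high surrogate
        System.out.println(asNumericCharRef(isolated)); // &#55308;
        System.out.println(asEntityRef(isolated));      // &ERR_INVALID_UTF16_SURROGATE_55308;
    }
}
```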
> Review handling of isolated UTF16 surrogate characters in serializer
> --------------------------------------------------------------------
>
> Key: XALANJ-2730
> URL: https://issues.apache.org/jira/browse/XALANJ-2730
> Project: XalanJ2
> Issue Type: Bug
> Security Level: No security risk; visible to anyone (Ordinary problems in
> Xalan projects. Anybody can view the issue.)
> Reporter: Joe Kesselman
> Assignee: Gary D. Gregory
> Priority: Major
>