Stefano Mazzocchi wrote:
... 0xd800 is not a legal XML character.
...
<high-unicode ...>&#65536;</high-unicode>

Now: whose problem is this, Slide's or JDOM's?

JDOM, I'd guess without looking at the code. This is a very general problem: the surrogate code points (0xD800 to 0xDFFF) are illegal by themselves in XML, but in Java strings there is no way to express non-baseplane characters other than as a pair of surrogates. The difficulty is where to put the check. If the test for illegal surrogates runs before character reference expansion, illegal surrogates may sneak in as char refs. If it runs after expansion, a legitimate non-baseplane character, which by then is a surrogate pair in the string, may trigger a false positive. So the test has to be done twice: once for literal characters and once as part of dealing with character references.
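
To make the two checks concrete, here is a rough sketch. It is not JDOM's code: the class and method names (XmlCharCheck, isXmlChar, firstIllegalIndex, expandCharRef) are made up for illustration, and it assumes a Java version that has String.codePointAt. The legal ranges are the Char production of XML 1.0.

```java
// Hypothetical helper, not part of JDOM or Slide.
final class XmlCharCheck {

    // XML 1.0 Char production, tested on a full code point.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Check 1: literal text. Walk the string by code point so that a
    // well-formed surrogate pair is seen as one non-baseplane character,
    // while a lone surrogate is seen as itself and rejected.
    // Returns the index of the first illegal char, or -1 if none.
    static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    // Check 2: numeric character reference such as "&#65536;" or "&#x10000;".
    // The number is validated as a code point before it is turned into
    // Java chars, so "&#xD800;" is rejected and "&#65536;" is accepted.
    static String expandCharRef(String ref) {
        String body = ref.substring(2, ref.length() - 1);
        int cp = body.startsWith("x")
            ? Integer.parseInt(body.substring(1), 16)
            : Integer.parseInt(body);
        if (!isXmlChar(cp)) {
            throw new IllegalArgumentException("illegal XML character: " + cp);
        }
        return new String(Character.toChars(cp));
    }
}
```

With this, expandCharRef("&#65536;") yields a two-char string that firstIllegalIndex accepts, while expandCharRef("&#xD800;") throws and a literal lone 0xD800 in text is reported at its index.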

I personally wouldn't lose much sleep over this particular problem.
Unless you are into MathML or obscure historic scripts, non-baseplane
characters are more of a curiosity.

J.Pietschmann


