Re: Random comments and bugfixes

J.Pietschmann Tue, 11 Nov 2003 11:08:53 -0800

Stefano Mazzocchi wrote:

... 0xd800 is not a legal XML character.

...

<high-unicode ...>𐀀</high-unicode>

Now: whose problem is this, Slide's or JDOM's?


JDOM, I'd guess without looking at the code. This is a very general
problem: the surrogate Unicode code points are illegal by themselves
in XML, but in Java strings there is no way to express non-baseplane
characters other than as a pair of surrogates. The problem: if the
test for illegal surrogates runs before character reference expansion,
illegal surrogates may sneak in as char refs. If the test runs after
character reference expansion, a legitimate non-baseplane character,
by then stored as a surrogate pair, may trigger a false positive. So
the test has to be done twice: once for literal characters and once
as part of dealing with character references.
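
A rough sketch of the two checks, not taken from JDOM or Slide. The
class and method names are made up for illustration, and it assumes a
JDK with code point support (String.codePointAt, Character.toChars).
The range test follows the Char production of XML 1.0.

```java
// Hypothetical helper, not part of JDOM or Slide.
final class XmlCharCheck {

    // XML 1.0 "Char" production, applied to a whole code point.
    // Surrogates (0xD800..0xDFFF) fall outside every range below.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Check 1: literal text already held as a Java (UTF-16) string.
    // codePointAt() merges a well-formed surrogate pair into one code
    // point and returns an unpaired surrogate unchanged, so pairs pass
    // and lone surrogates are rejected.
    static boolean isLegalLiteralText(String s) {
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    // Check 2: the numeric value of a character reference, tested
    // before it is turned into UTF-16. &#xd800; fails here even if a
    // low surrogate reference follows it.
    static boolean isLegalCharRef(int value) {
        return isXmlChar(value);
    }

    // Expand a character reference only after it has passed check 2.
    static String expandCharRef(int value) {
        if (!isLegalCharRef(value)) {
            throw new IllegalArgumentException(
                "illegal XML character reference: 0x" + Integer.toHexString(value));
        }
        return new String(Character.toChars(value));
    }
}
```

With this split, &#x10000; expands to a surrogate pair that the
literal check accepts, while &#xd800; is rejected at the reference
stage.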

I personally wouldn't lose much sleep over this particular problem.
Unless you are into MathML or obscure historic scripts, non-baseplane
characters are more of a curiosity.

J.Pietschmann

