On 11-Feb-08, at 12:49 PM, Dan Fabulich wrote:
I didn't know this, so I imagine others might not. The string
"&#0;" is invalid XML. The character is simply not allowed in XML
in any representation. The XML 1.0 standard blocks most of the
characters under x20, allowing only x9, xA and xD. XML 1.1 allows
x1-x1F (the control characters only as character references), but
still blocks x0.
http://www.w3.org/TR/1998/REC-xml-19980210#charsets
http://www.w3.org/TR/xml11/#charsets
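
For reference, the XML 1.0 Char production linked above can be checked with a few range comparisons. This is only a minimal sketch; the class and method names (XmlChars, isLegalXml10, firstIllegal) are made up for illustration and are not part of Surefire or of the patch discussed below.

```java
public final class XmlChars {
    private XmlChars() {}

    /** True if the code point matches the Char production of XML 1.0. */
    public static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Index of the first char that starts an illegal code point, or -1 if there is none. */
    public static int firstIllegal(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isLegalXml10(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

Iterating by code point rather than by char means an unpaired surrogate is reported as illegal, while a valid surrogate pair is accepted as one supplementary character.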
This creates an interesting problem for serializing Java strings
containing the null character, e.g. "\u0000", or for other non-
whitespace control characters like the bell character "\u0007".
We've got an integration test for this case in Surefire, and it does
entirely the wrong thing (SUREFIRE-455).
In the patch submitted to that bug, Todor throws away nulls in his
XML escaper, silently omitting them from the output; all other
control characters (even the 1.0-illegal ones) pass through. That
doesn't seem right, especially when we're talking about test
results! (Expected "" but was "" ... Just imagine how painful it
would be to track something like that down.)
But neither does it seem right to insert "&#0;" when it's illegal
XML. Notably, Java will cheerfully print &#0; in XML if you tell it
to do so, and many parsers will figure out what to do with it just
fine; the same applies to "&#7;".
Thoughts? Should we emit "&#0;", standards-be-damned? Silently
omit the character? Print a "?" instead? Something else?
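
One possible shape for the "something else" option, purely as a sketch and not what the SUREFIRE-455 patch does: keep the output well-formed but write each illegal character as a visible Java-style escape, so the report still shows what was actually in the string. The class name VisibleXmlEscaper and the choice of a backslash-u notation are assumptions made for this example.

```java
public final class VisibleXmlEscaper {
    private VisibleXmlEscaper() {}

    // Char production of XML 1.0.
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Escapes markup characters and replaces XML-illegal ones with a visible text escape. */
    public static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (isLegalXml10(cp)) {
                        out.appendCodePoint(cp);
                    } else {
                        // Illegal in XML 1.0: emit backslash, 'u' and four hex digits as plain text.
                        out.append(String.format("\\u%04X", cp));
                    }
            }
        }
        return out.toString();
    }
}
```

The trade-off is ambiguity: a string that really contains a backslash followed by u0000 as ordinary text would look the same in the report as one containing the null character, unless backslashes are escaped as well.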

Where did you run into this? I think not showing what it actually is
makes it not immediately obvious what's going wrong. So I'm for
showing what it actually is. Can you just wrap it in CDATA?

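
A caveat on the CDATA idea, as far as the XML 1.0 grammar goes: CDATA content is defined in terms of the same Char production, so a conforming parser should reject a raw null inside a CDATA section just as it rejects "&#0;". A quick way to check this against the JDK's built-in parser is sketched below; the class name CdataCheck and the sample documents are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public final class CdataCheck {
    private CdataCheck() {}

    /** Returns true if the JDK's default DOM parser accepts the document, false on a parse error. */
    public static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            // I/O or parser configuration problems are not what this check is about.
            throw new IllegalStateException(e);
        }
    }
}
```

Calling parses with a document whose CDATA section contains a raw null character, or with one containing the character reference for null, is expected to return false, while the same document without the null parses normally.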
-Dan
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Thanks,
Jason
----------------------------------------------------------
Jason van Zyl
Founder, Apache Maven
jason at sonatype dot com
----------------------------------------------------------
happiness is like a butterfly: the more you chase it, the more it will
elude you, but if you turn your attention to other things, it will come
and sit softly on your shoulder ...
-- Thoreau