Dawid Weiss wrote:

>> We should not drop the offending characters, but escape them. Either the Unicode entity (&#nn;) or the CDATA way is OK (and the CDATA way is simpler).

> This isn't entirely true, Andrzej: escaping a character and putting it in a CDATA section are just different ways of expressing the same character code in an XML document. That character code is still ILLEGAL under the XML spec (which has a section specifying the legal character ranges), so a conforming XML parser should throw an exception if it encounters anything outside the legal range. The only way to transfer arbitrary binary data is to encode it as legal Unicode characters (using uuencode or similar).
>
> I agree with the person who submitted this patch that it is a potential issue and should be addressed somehow.
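To make the point about legal ranges concrete, here is a minimal Java sketch. The class name XmlChars is hypothetical. It checks code points against the XML 1.0 "Char" production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). Because the test applies to the code point itself, a character that fails it is rejected whether it appears literally, as &#nn;, or inside CDATA.

```java
// Hypothetical helper: checks code points against the XML 1.0 "Char" production.
public final class XmlChars {
    private XmlChars() {}

    /** True if the code point may appear anywhere in an XML 1.0 document. */
    public static boolean isLegal(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the first illegal code point in s, or -1 if all are legal. */
    public static int firstIllegal(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i); // an unpaired surrogate comes back as itself, which is illegal
            if (!isLegal(cp)) {
                return cp;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstIllegal("page title"));   // prints -1
        System.out.println(firstIllegal("bad\u0001byte")); // prints 1
    }
}
```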

Right, I didn't think about this... somehow I thought this was all about special characters like ' " & <.

Then we should take the best of both worlds: escape the special characters that are valid, and replace the invalid ones with '?', a space, or nothing. I know a place where we could find some inspiration (Carrot2 XMLSerializerHelper.java ... ;-) )
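A rough sketch of that "best of both worlds" idea, assuming XML 1.0 rules: escape the five predefined entities and substitute a caller-chosen replacement for code points outside the legal ranges. The class XmlSanitizer and its escape(String, String) method are hypothetical illustrations, not the actual Carrot2 XMLSerializerHelper API.

```java
// Hypothetical sanitizer; not the Carrot2 XMLSerializerHelper API.
public final class XmlSanitizer {
    private XmlSanitizer() {}

    // XML 1.0 "Char" production.
    static boolean isLegal(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Escapes XML special characters and replaces illegal code points
     * with the given replacement ("?", " ", or "" to drop them).
     */
    public static String escape(String s, String replacement) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (isLegal(cp)) {
                        out.appendCodePoint(cp); // valid character: keep as-is
                    } else {
                        out.append(replacement); // illegal character: substitute
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a < b & 'c'\0", "?")); // prints a &lt; b &amp; &apos;c&apos;?
    }
}
```

Passing "" as the replacement drops illegal characters entirely, while "?" or " " keeps a visible marker of where something was removed.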

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
