RE: chainsaw and "escaping" XML entities

Nicko Cadell 11 Apr 2005 13:14:15 -0000

 
> >> For invalid characters such as 0x1e there are 3 possible solutions:
> 
> >> 1) Discard the character from the output.
> 
> >> 2) Replace the character with a numeric representation e.g. "0x1E".
> 
> >> 3) Replace the character with an XML element e.g. <char code="30"/>
> 
> > Nicko
> > 
> >> favour option 3 above because information is not lost. In options 1
> >> and 2 information is lost. In 2 the encoding is not reversible. With
> >> 3 the application reading the data requires additional smarts to
> >> pick up on the encoded values in the elements, but all the original
> >> information is preserved. If the app just asks for the text nodes,
> >> ignoring the child elements, then they will get back the same result
> >> as from 1.
> 
> If the application just deserializes the string, they'll end 
> up with a much more complex tree structure with a couple of 
> text nodes, an attribute node, ....


If the app does a GetText on the message element it will get all the
text nodes joined up without the sub-elements, which is reasonable, i.e.
the control codes are just dropped. If it uses InnerXml then it gets XML
elements, but then it should expect that and live with it!
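
To make the trade-off concrete, here is a minimal sketch of option 3 in
Python rather than C#. It is an illustration only, not log4net code: the
function names (is_xml_char, encode_message, text_only, decode_message)
are hypothetical, and the <message> and <char code="..."/> element names
are simply taken from the example earlier in this thread.

```python
import xml.etree.ElementTree as ET


def is_xml_char(cp):
    # Code points allowed by the XML 1.0 "Char" production.
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def encode_message(text):
    # Option 3: each invalid character becomes a <char code="N"/> child
    # element (hypothetical element name, as in the thread's example).
    msg = ET.Element("message")
    msg.text = ""
    last = None  # most recently added <char> element, if any
    for ch in text:
        if is_xml_char(ord(ch)):
            if last is None:
                msg.text += ch
            else:
                last.tail = (last.tail or "") + ch
        else:
            last = ET.SubElement(msg, "char", code=str(ord(ch)))
    return msg


def text_only(msg):
    # A reader that only joins the text nodes and ignores child elements.
    # The result is the same as option 1 (invalid characters dropped).
    return "".join(msg.itertext())


def decode_message(msg):
    # A reader with the "additional smarts": restores the original string.
    parts = [msg.text or ""]
    for child in msg:
        if child.tag == "char":
            parts.append(chr(int(child.get("code"))))
        parts.append(child.tail or "")
    return "".join(parts)


original = "field1\x1efield2"
serialized = ET.tostring(encode_message(original), encoding="unicode")
parsed = ET.fromstring(serialized)
```

After a round trip through the serialized form, text_only(parsed) gives
the message with the control code dropped, while decode_message(parsed)
gives back the original string including 0x1E.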


> I don't see that the transport of binary data is a key 
> purpose for log4net. Much as I dislike option proliferation, 

Log4net should not be throwing away data, even if it is not very string
like, just because XML doesn't like it.

> I wonder if it would be reasonable to have 3 as an optional 
> behavior but 1 or 2 as a default?  What does log4j do in this 
> situation?

log4j's XMLLayout just writes to the output stream through a Writer so
it does no escaping or numeric character reference encoding. It does
write the message out into a CDATA section but that should not resolve
the issue, because a character like 0x1e is not a legal XML 1.0
character inside a CDATA section either. I doubt that it works there
either.
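
A quick way to check the CDATA point is to feed a parser a document that
contains 0x1e. The sketch below uses Python's expat-based ElementTree
parser as a stand-in for whatever reads the log output; that choice is an
assumption, and the helper name parses is hypothetical. It does not
exercise log4j or chainsaw themselves.

```python
import xml.etree.ElementTree as ET


def parses(document):
    # True if the parser accepts the document as well-formed XML.
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# The same message with a raw 0x1e, written three different ways.
plain = "<message>a\x1eb</message>"
cdata = "<message><![CDATA[a\x1eb]]></message>"
charref = "<message>a&#30;b</message>"

for name, doc in (("plain", plain), ("cdata", cdata), ("charref", charref)):
    print(name, "accepted" if parses(doc) else "rejected")
```

With this parser all three forms are rejected, so neither a CDATA section
nor an XML 1.0 numeric character reference gets the character through.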

Nicko


> --
> Mike Blake-Knox
>
