> So is it OK in XML to escape all other control characters 
> with the &#xx; ? That seemed to be what I understood from my googling.

This is only true for XML 1.1. XML 1.1 allows every control character except
0x0, but most of them are restricted characters, which may appear only as
character references:

[2]     Char           ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
        /* any Unicode character, excluding the surrogate blocks,
           FFFE, and FFFF. */

[2a]    RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84]
                           | [#x86-#x9F]


The use of the restricted characters is discouraged. See
http://www.w3.org/TR/2006/REC-xml11-20060816/#charsets for more details.
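
To make the XML 1.1 rule concrete, here is a minimal Python sketch that writes
restricted characters as numeric character references. The helper name
escape_xml11 is made up for this illustration and is not part of LTK; the
sketch also ignores surrogates, FFFE, and FFFF.

```python
import re

# RestrictedChar, production [2a] of the XML 1.1 recommendation.
_RESTRICTED = re.compile("[\x01-\x08\x0b\x0c\x0e-\x1f\x7f-\x84\x86-\x9f]")


def escape_xml11(text: str) -> str:
    """Escape text for XML 1.1 element content (hypothetical helper)."""
    if "\x00" in text:
        # 0x0 is not a Char in XML 1.1, so no character reference exists for it.
        raise ValueError("0x0 cannot be represented in XML 1.1")
    # Escape markup characters first so the references added below stay intact.
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    # Replace each restricted character with a hexadecimal character reference.
    return _RESTRICTED.sub(lambda m: "&#x%X;" % ord(m.group()), text)


if __name__ == "__main__":
    print(escape_xml11("3.0.1.240\x01"))
```

Tab, line feed, carriage return, and #x85 pass through unchanged because they
are not in RestrictedChar. A string that contains 0x0 still needs some other
encoding.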

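XML 1.0 excludes every control character below #x20 except tab (#x9), line
feed (#xA), and carriage return (#xD). Character references to the excluded
characters are not allowed either: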

Character Range
[2]     Char           ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
                           | [#x10000-#x10FFFF]
        /* any Unicode character, excluding the surrogate blocks,
           FFFE, and FFFF. */

See http://www.w3.org/TR/xml/#charsets
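
The two Char productions can be compared side by side with a small Python
sketch. The function names are invented for this example. The parser check at
the end relies on Python's xml.etree.ElementTree, which is built on Expat and,
to my knowledge, implements XML 1.0 only, so it only demonstrates the XML 1.0
behaviour.

```python
import xml.etree.ElementTree as ET


def is_xml10_char(cp: int) -> bool:
    """Production [2] Char of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """Production [2] Char of XML 1.1."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_restricted(cp: int) -> bool:
    """Production [2a] RestrictedChar of XML 1.1."""
    return (
        0x1 <= cp <= 0x8
        or cp in (0xB, 0xC)
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )


def parses(document: str) -> bool:
    """Return True if the XML 1.0 parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for cp in (0x0, 0x1, 0x9, 0x1F, 0x85):
        print(hex(cp), is_xml10_char(cp), is_xml11_char(cp),
              is_xml11_restricted(cp))
    # A reference to tab is accepted; a reference to #x1 is rejected.
    print(parses("<v>3.0.1.240&#x9;</v>"))
    print(parses("<v>3.0.1.240&#x1;</v>"))
```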


I dislike option c (the CDATA approach) because all implementations would need
to implement their own CDATA parser (the XML parser ignores the CDATA
statements). I also don't think we want to restrict the allowed character set
in LTK (binary and XML), which rules out option a.

I am generally also in favour of the Base64 approach because it is probably
the cleanest solution. The one problem I see is that it limits how easily a
human can read and author the LTK XML file (one of its major use cases,
right?) if control characters are used heavily. However, the only alternative
I see is an LTK-specific encoding rule for 0x0 (e.g. \NULL) combined with
XML 1.1:

 <rp:ReaderFirmwareVersion>3.0.1.240\NULL</rp:ReaderFirmwareVersion>
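
For comparison, the Base64 approach could look roughly like the Python sketch
below. It is only an illustration under assumptions: the attribute name
binencode is borrowed from John's hex example, the value "base64" and both
helper functions are hypothetical, and nothing here reflects an agreed LTK
format. Strings that are legal XML 1.0 stay readable; only the rare string
with an illegal character gets encoded.

```python
import base64
import xml.etree.ElementTree as ET


def _is_xml10_char(ch: str) -> bool:
    """Production [2] Char of XML 1.0, applied to a single character."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def set_utf8_text(elem: ET.Element, value: str, attr: str = "binencode") -> None:
    """Store value as plain text if legal in XML 1.0, else as Base64."""
    if all(_is_xml10_char(ch) for ch in value):
        elem.text = value
    else:
        # Hypothetical marker attribute telling the reader to decode the text.
        elem.set(attr, "base64")
        elem.text = base64.b64encode(value.encode("utf-8")).decode("ascii")


def get_utf8_text(elem: ET.Element, attr: str = "binencode") -> str:
    """Inverse of set_utf8_text."""
    if elem.get(attr) == "base64":
        return base64.b64decode(elem.text or "").decode("utf-8")
    return elem.text or ""


if __name__ == "__main__":
    elem = ET.Element("ReaderFirmwareVersion")
    set_utf8_text(elem, "3.0.1.240\x00")
    xml_bytes = ET.tostring(elem)
    print(xml_bytes)
    print(repr(get_utf8_text(ET.fromstring(xml_bytes))))
```

The encoded form survives any XML 1.0 parser, at the cost of the readability
problem described above.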

        - Christian






> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of John R. Hogerhuis
> Sent: Friday, 29 February 2008 13:29
> To: LLRP Toolkit Development List
> Subject: Re: [ltk-d] Java LTK and non XML characters
> 
> On Fri, Feb 29, 2008 at 9:55 AM, Christian Floerkemeier 
> <[EMAIL PROTECTED]> wrote:
> >
> >  I agree, but escaping via &#0; etc is not an option to my 
> knowledge. 
> > Control  characters are illegal in XML, regardless of encoding.
> >
> 
> Yikes... did some research based on your comment and I agree. 
> Seems that the XML folks want to be nice to the C programmers 
> too. OK, then in that case I think we need to do one of:
> 
> a) Ban null from utf8's in LTK
> b) When they appear (which is, hopefully never), escape the entire
> utf8 as a hex string or base64. We could put an attribute on 
> the element in the LTK-XML instance to indicate that we are 
> representing the string as xs:hexBinary.
> c) CDATA as you propose.
> 
> CDATA has its own problems and complexities. If we don't do 
> (a) I think I would prefer a simple hex encoding in the rare 
> case that a NULL appears since the XML parser will work with 
> it just fine.
> 
> <rp:ReaderFirmwareVersion
> binencode="hex">332E302E312E323400</rp:ReaderFirmwareVersion>
> 
> The default encoding is raw utf-8.
> 
> So is it OK in XML to escape all other control characters 
> with the &#xx; ? That seemed to be what I understood from my googling.
> 
> -- John.
> 
> --------------------------------------------------------------
> -----------
> This SF.net email is sponsored by: Microsoft Defy all 
> challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> llrp-toolkit-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/llrp-toolkit-devel
> 

