On Mon, Oct 06, 2008 at 08:13:58PM +0200, Mike Hommey wrote:
> On Wed, Sep 24, 2008 at 07:30:39PM -0700, Matt Kraai wrote:
> (...)
> > Character Range
> >
> >   [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] |  /* any Unicode character,
> >                [#xE000-#xFFFD] |                     excluding the surrogate
> >                [#x10000-#x10FFFF]                    blocks, FFFE, and FFFF. */
> >
> > but it doesn't specify that it must accept *only* characters in that
> > range. In fact, the next paragraph states
> >
> >   All XML processors MUST accept the UTF-8 and UTF-16 encodings of
> >   Unicode 3.1 ...
> >
> > In http://www.unicode.org/Public/3.1-Update/UnicodeData-3.1.0.txt, the
> > list of Unicode 3.1 characters, the SOH character is the second entry.
>
> I'll copy below upstream words:
> > That's bull...t
> >
> > The allowed set of caracter is enumerated in the Char production, that
> > simple. Put a caracter out of that range in the document (whatever the
> > encoding used) and the processor MUST consider this a fatal error, raise
> > it to the application and stop passing data to the application from that
> > point in the document.
>
> IOW: Not a bug.
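
For the record, here is how I read upstream's interpretation, as a minimal Python sketch. It only encodes the ranges from the Char production quoted above. The names XML_CHAR_RANGES, is_xml_char and first_illegal_char are made up for illustration, and this is not the parser's actual code.

```python
# Ranges taken from production [2] Char as quoted above.
# All names in this sketch are hypothetical.
XML_CHAR_RANGES = (
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_xml_char(code_point: int) -> bool:
    """Return True if code_point falls inside the Char production."""
    return any(low <= code_point <= high for low, high in XML_CHAR_RANGES)


def first_illegal_char(text: str):
    """Return (index, code point) of the first character outside Char,
    or None if every character is allowed."""
    for index, char in enumerate(text):
        if not is_xml_char(ord(char)):
            return index, ord(char)
    return None
```

Under that reading, is_xml_char(0x01) is False, so SOH is rejected whatever encoding the document uses, even though it is listed in UnicodeData-3.1.0.txt.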

Thanks for checking upstream. I still don't understand how I'm
misunderstanding the spec, but it doesn't look like upstream is likely
to agree with me. :)

-- 
Matt                                                 http://ftbfs.org/