Bug#500015: Cannot parse feed containing SOH character

Matt Kraai Mon, 06 Oct 2008 20:12:11 -0700

On Mon, Oct 06, 2008 at 08:13:58PM +0200, Mike Hommey wrote:
> On Wed, Sep 24, 2008 at 07:30:39PM -0700, Matt Kraai wrote:
> (...)
> >  Character Range
> > 
> >  [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] |  /* any Unicode character,
> >               [#xE000-#xFFFD] |                   excluding the surrogate
> >               [#x10000-#x10FFFF]                  blocks, FFFE, and FFFF. */
> > 
> > but it doesn't specify that it must accept *only* characters in that
> > range.  In fact, the next paragraph states
> > 
> >  All XML processors MUST accept the UTF-8 and UTF-16 encodings of
> >  Unicode 3.1 ...
> > 
> > In http://www.unicode.org/Public/3.1-Update/UnicodeData-3.1.0.txt, the
> > list of Unicode 3.1 characters, the SOH character is the second entry.
> 
> I'll copy upstream's words below:
> >   That's bull...t
> >
> > The allowed set of characters is enumerated in the Char production, that
> > simple. Put a character out of that range in the document (whatever the
> > encoding used) and the processor MUST consider this a fatal error, raise
> > it to the application and stop passing data to the application from that
> > point in the document.
> 
> IOW: Not a bug.


Thanks for checking upstream.

I still don't understand how I'm misunderstanding the spec, but it
doesn't look like upstream is likely to agree with me.  :)
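
For reference, here is a minimal sketch of the rule as upstream reads it,
assuming Python and its bundled expat-based parser rather than the parser
this bug was filed against. The helper names is_xml_char,
first_illegal_char, and parses are hypothetical and used only for
illustration.

```python
import xml.etree.ElementTree as ET


def is_xml_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 Char production [2]."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_illegal_char(text: str):
    """Return (index, char) of the first character outside Char, or None."""
    for i, ch in enumerate(text):
        if not is_xml_char(ch):
            return i, ch
    return None


def parses(document: str) -> bool:
    """Return True if the stdlib parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    feed = "<title>bad\x01feed</title>"
    # SOH (U+0001) is a Unicode character but falls outside Char.
    print(first_illegal_char(feed))
    print(parses(feed))
```

Under this reading, SOH is a valid Unicode 3.1 character that the
processor must be able to decode, but it is still not a legal character
in an XML 1.0 document, so a conforming parser rejects the feed.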

-- 
Matt                                                 http://ftbfs.org/


