Re: Handling of NUL in the input stream (r825)

Sam Ruby Wed, 27 Jun 2007 04:37:04 -0700

On 6/27/07, Thomas Broyer <[EMAIL PROTECTED]> wrote:
>
> > Log:
> >  The spec says: "All U+0000 NULL characters in the input must be replaced
> > by U+FFFD REPLACEMENT CHARACTERs. Any occurrences of such
> > characters is a parse error."
> >
> >
> > Modified: trunk/testdata/tokenizer/test2.test
> >  ===============================================
> >  --- trunk/testdata/tokenizer/test2.test (original)
> >  +++ trunk/testdata/tokenizer/test2.test Tue Jun 26 23:58:40 2007
> >  @@ -118,7 +118,7 @@
> >   {"description":"Null Byte Replacement",
> >   "input":"\u0000",
> >  -"output":[["Character", "\ufffd"]]}
> >  +"output":["ParseError", ["Character", "\ufffd"]]}
>
> Fixing this in html5lib would require huge refactoring because this
> conversion is done in the HTMLInputStream which doesn't yield tokens
> (and parse errors are tokens currently).


Until a larger refactoring is done, I added a simple queue of errors to
HTMLInputStream; the tokenizer converts its entries into ParseErrors.
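
Roughly, the idea looks like the sketch below. This is only an illustration, not the committed code: the class name InputStream, the method char(), the "null-character" error code and the dict-shaped tokens are all made up for the example.

```python
# Hypothetical sketch of an error queue on the input stream.
# None of these names are taken from html5lib itself.

class InputStream:
    def __init__(self, data):
        # Queue of pending parse-error codes; the stream cannot yield
        # tokens itself, so it only records that something went wrong.
        self.errors = []
        nul_count = data.count("\u0000")
        if nul_count:
            # One queued error per NUL, then replace NUL with U+FFFD.
            self.errors.extend(["null-character"] * nul_count)
            data = data.replace("\u0000", "\ufffd")
        self.data = data
        self.pos = 0

    def char(self):
        """Return the next character, or None at end of input."""
        if self.pos >= len(self.data):
            return None
        c = self.data[self.pos]
        self.pos += 1
        return c


def tokenize(stream):
    """Yield Character tokens, turning queued stream errors into ParseErrors."""
    while True:
        c = stream.char()
        # Emit any errors the stream has queued before the next token.
        while stream.errors:
            yield {"type": "ParseError", "data": stream.errors.pop(0)}
        if c is None:
            break
        yield {"type": "Character", "data": c}


tokens = list(tokenize(InputStream("\u0000")))
```

Because this sketch queues everything when the stream is constructed, all stream-level errors come out ahead of the first character, whatever position the NULs were at. That is enough for the single-NUL test above, but it does not preserve ordering in general.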

> I suggest refactoring how parse errors are reported. First, we
> probably shouldn't use tokens to represent parse errors. I suggest
> using either something along the lines of the 'warnings' Python module
> (with a ParseError class inheriting from Warning and carrying a
> reference to the source object (input stream, tokenizer or parser) and
> position within the input stream) or something resembling the SAX
> ErrorHandler (the parser registers its error handler on the newly
> created tokenizer; and the tokenizer in turn registers its error
> handler on the newly created input stream), defaulting to a handler
> adding parse errors to the parser's errors list, for backwards
> compatibility (most probably cannot be achieved with a
> warnings.warn-looking reporting model).
>
> But this means that tokenizer tests might need to be refactored also,
> because we might not be able to "arrange" parse errors "pseudo tokens"
> in the right order for the tests to pass (or maybe we should just
> extract them from the expected output and then check whether the
> number of reported parse errors is the same as the number expected,
> without checking where they happened).
>
> Any thoughts?
>
> --
> Thomas Broyer
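
For comparison, the SAX-ErrorHandler-style chaining suggested above might look something like this. It is a sketch under assumptions: the class names, the report() callback signature and the "null-character" code are invented for illustration and are not html5lib's API.

```python
# Hypothetical sketch of handler chaining: parser -> tokenizer -> stream.
# The handler is a plain callable taking (code, position).

class InputStream:
    def __init__(self, data, error_handler):
        # Report each NUL with its position, then replace it with U+FFFD.
        for position, ch in enumerate(data):
            if ch == "\u0000":
                error_handler("null-character", position)
        self.data = data.replace("\u0000", "\ufffd")


class Tokenizer:
    def __init__(self, data, error_handler):
        self.error_handler = error_handler
        # The tokenizer passes its handler on to the stream it creates.
        self.stream = InputStream(data, error_handler)

    def __iter__(self):
        for ch in self.stream.data:
            yield ("Character", ch)


class Parser:
    def __init__(self):
        # Default handler appends to this list, for backwards compatibility.
        self.errors = []

    def report(self, code, position):
        self.errors.append((code, position))

    def parse(self, data):
        # The parser registers its handler on the tokenizer it creates.
        return list(Tokenizer(data, self.report))


parser = Parser()
tokens = parser.parse("a\u0000b")

# Count-only check, as proposed for the tokenizer tests: pull the
# "ParseError" markers out of the expected output and compare counts.
expected = ["ParseError", ["Character", "\ufffd"]]
expected_error_count = expected.count("ParseError")
```

With errors kept out of the token stream like this, a test can only compare the number of reported errors (and optionally their positions) against the expected output, not where they fall between tokens.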

- Sam Ruby
