This is a vague list of things I plan on doing, as I want to get the
tokenizer more or less finished before I move on to tree-building.
- Split out the input stream from the tokenizer. This is more work
than it sounds, because I also want to move to properly consuming
characters as we tokenize and never unconsuming more than one
character; that is primarily so we can allow $fp =
fopen('http://example.com') to be used as an input stream in the
future without much extra work. This is what I'm currently working
on.
- Remove cases where we directly call states (as we can now
unconsume), to more closely match the spec, and to make the next item
at least possible.
- Investigate moving the whole state machine to one big switch
statement within the parse method (this shouldn't make the codebase
messy, IMO, and it avoids the function call overhead, which is
currently a non-negligible part of our running cost).
- Grab multiple characters at once in more places, rather than
consuming one character at a time.
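A minimal sketch of what the first and last items above could look like, assuming a standalone stream class. The names InputStream, char(), unconsume() and charsUntil() are hypothetical and chosen for illustration; they are not taken from the existing code. For simplicity the sketch reads the whole input up front, so it only shows the one-character unconsume rule and multi-character reads, not true incremental reading from a remote stream.

```php
<?php
// Hypothetical sketch: class and method names are illustrative only.
class InputStream
{
    private $data;                  // whole input, read eagerly for simplicity
    private $pos = 0;               // index of the next character to consume
    private $canUnconsume = false;  // true only directly after a successful char()

    public function __construct($source)
    {
        // Accept either a string or an already opened stream resource
        // (for example the result of fopen()).
        $this->data = is_resource($source)
            ? (string) stream_get_contents($source)
            : (string) $source;
    }

    // Consume and return the next character, or false at end of input.
    public function char()
    {
        if ($this->pos >= strlen($this->data)) {
            $this->canUnconsume = false;
            return false;
        }
        $this->canUnconsume = true;
        return $this->data[$this->pos++];
    }

    // Push back the most recently consumed character; never more than one.
    public function unconsume()
    {
        if (!$this->canUnconsume) {
            throw new LogicException('Can only unconsume one character');
        }
        $this->pos--;
        $this->canUnconsume = false;
    }

    // Consume and return the run of characters up to, but not including,
    // the first occurrence of any character listed in $stop.
    public function charsUntil($stop)
    {
        $len = strcspn($this->data, $stop, $this->pos);
        $run = (string) substr($this->data, $this->pos, $len);
        $this->pos += $len;
        $this->canUnconsume = false;
        return $run;
    }
}

// Usage: read a tag name in one call instead of character by character.
$stream = new InputStream('<p>Hi');
$stream->char();                    // consumes '<'
$name = $stream->charsUntil('>');   // consumes 'p'
```

Because a state only ever needs to push back the character it has just read, a stream like this needs no arbitrary seeking, which is the property that would let a non-seekable source stand in for the string later.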
--
Geoffrey Sneddon
<http://gsnedders.com/>