Anne van Kesteren wrote:
> On Mon, 08 Jan 2007 18:23:40 +0100, Sam Ruby <[EMAIL PROTECTED]>
> wrote:
>>> Because there is no difference between them. See the HTML5
>>> specification.
>>
>> My point is that by "baking in" that behavior into the tokenizer, it
>> essentially limits that tokenizer to just supporting HTML5. By
>> providing one extra "bit" of information, the potential for reuse is
>> increased.
>
> Well, the next "bit" would probably be processing instructions. That's
> why it would be nice to have some formalization / standardization first
> to see how many changes are required exactly.
I have no interest in XML processing instructions at this time.

> Currently html5lib maps rather well to the specification, which improves
> the readability of the code a lot (imho). I'd like to know at how many
> changes we're looking and how that impacts the code.

That's why I provided a comprehensive patch:

http://intertwingly.net/stories/2007/01/08/xhtml5.diff

>>> Not sure how to do the .lower() stuff. I kind of guessed the reason
>>> you wanted to change that was because of a project like this :-)
>>
>> I've provided one way: by refactoring it so that all the lowercasing
>> of element names is done in exactly one place, and that the
>> lowercasing of attribute names is also done in exactly one place.
>> That class can be subclassed to provide a different behavior.
>
> Do you have this as a standalone patch somewhere? As mentioned before, I'd
> like to see how it deals with non-ASCII characters.

The patch isn't all that big. The relevant portions are:

    import string
    # Map only ASCII A-Z to a-z; every other character passes through unchanged.
    asciiLower = dict([(ord(c),ord(c.lower())) for c in string.ascii_uppercase])
    token["name"] = token["name"].translate(asciiLower)
    # Reversed so that, for a duplicated attribute name, the first occurrence wins.
    token["data"] = dict([(attr.translate(asciiLower), value) for attr,value in token["data"][::-1]])

- Sam Ruby
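A minimal sketch of the refactoring described above, written for Python 3 and assuming a heavily simplified tokenizer. The class names `Tokenizer` and `XHTMLTokenizer` and the method names `normalizeName` and `normalizeToken` are hypothetical, not html5lib's actual API. All case folding goes through one method, so an XHTML subclass can preserve case by overriding it. The ASCII-only `asciiLower` table leaves non-ASCII letters such as "Ä" untouched, whereas `str.lower()` would fold them to "ä".

```python
import string

# Translation table mapping only the 26 ASCII uppercase letters to lowercase;
# every other code point (including non-ASCII letters) is left unchanged.
asciiLower = {ord(c): ord(c.lower()) for c in string.ascii_uppercase}


class Tokenizer:
    """Hypothetical, simplified stand-in for an HTML tokenizer.

    Tag and attribute names are case-folded only in normalizeName(),
    so a subclass can change that behavior in exactly one place.
    """

    def normalizeName(self, name):
        # HTML: ASCII-only lowercasing.
        return name.translate(asciiLower)

    def normalizeToken(self, token):
        token["name"] = self.normalizeName(token["name"])
        # token["data"] is a list of (name, value) pairs in source order.
        # Reversing it means the first occurrence of a duplicated name is
        # inserted into the dict last, so it is the value that survives.
        token["data"] = {self.normalizeName(attr): value
                         for attr, value in token["data"][::-1]}
        return token


class XHTMLTokenizer(Tokenizer):
    """Hypothetical subclass: XML names are case-sensitive, so keep them."""

    def normalizeName(self, name):
        return name
```

For example, `Tokenizer().normalizeToken({"name": "DIV", "data": [("CLASS", "a"), ("class", "b")]})` yields the name `"div"` and the attributes `{"class": "a"}`, while `XHTMLTokenizer` keeps `CLASS` and `class` as two distinct attributes.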
