Here's something I've implemented in Instiki, which may (or may not)
be of more general interest.

As you know, I use the HTML5 Sanitizer. One option was to sanitize a
string, but that requires invoking the (rather slow) inputstream.rb.
Instead, I decided to sanitize a REXML tree (which Maruku can also
output), using the :sanitize option to XHTMLSerializer.

The fly in the ointment was that I needed to convert MathML named
entities to UTF-8 before they hit the sanitizer. Traversing the tree
to convert text nodes (and attribute values) also proved to be quite
slow, vitiating the advantage of working with a REXML tree.

The solution I hit on was to create a custom TreeWalker, which is
basically the same as the standard REXML TreeWalker, except that it
converts named entities in text nodes and attribute values as it
goes.
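
To make the idea concrete, here is a rough sketch of what "converting
as it goes" could look like. This is not the actual Instiki or html5lib
code: the EntityConvertingWalker class, the convert_named_entities
helper, the three-entry NAMED_ENTITIES table and the simple token
hashes are all made up for illustration, and a real walker would use
the full MathML entity table and html5lib's own token format.

```ruby
require 'rexml/document'

# Hypothetical, tiny subset of the MathML named-entity table.
NAMED_ENTITIES = {
  'alpha' => "\u03B1",
  'rarr'  => "\u2192",
  'infin' => "\u221E"
}.freeze

# Replace known named entities in a raw (still-escaped) string with
# UTF-8 characters. Unknown names and the XML built-ins (&amp; etc.)
# are not in the table, so they pass through untouched.
def convert_named_entities(raw)
  raw.gsub(/&([A-Za-z][A-Za-z0-9]*);/) { |ref| NAMED_ENTITIES.fetch($1, ref) }
end

# Illustrative walker (assumed design): yields token hashes for
# elements and text, converting named entities during the single pass
# over the tree instead of in a separate traversal beforehand.
# Comments and other node types are skipped to keep the sketch short.
class EntityConvertingWalker
  include Enumerable

  def initialize(root)
    @root = root
  end

  def each(&block)
    return enum_for(:each) unless block
    walk(@root, &block)
  end

  private

  def walk(node, &block)
    case node
    when REXML::Element
      attrs = {}
      node.attributes.each_attribute do |attr|
        attrs[attr.expanded_name] = decode(attr.to_s)
      end
      yield({ type: :StartTag, name: node.expanded_name, data: attrs })
      node.each_child { |child| walk(child, &block) }
      yield({ type: :EndTag, name: node.expanded_name })
    when REXML::Text
      yield({ type: :Characters, data: decode(node.to_s) })
    end
  end

  # Convert named entities on the escaped form first, then let REXML
  # resolve the XML built-ins and numeric references.
  def decode(raw)
    REXML::Text.unnormalize(convert_named_entities(raw))
  end
end

doc = REXML::Document.new('<math title="&rarr;">x &alpha; &amp; y</math>')
tokens = EntityConvertingWalker.new(doc.root).to_a
```

Doing the conversion on the escaped string, before the built-in
entities are resolved, means a literal "&amp;alpha;" in the source is
left alone rather than being mistaken for an entity reference.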

So, my question is: would this be of more general interest? Would it
be desirable to have a :convert_named_entities option (defaulting to
false) to the standard TreeWalker(s)?
