Here's something I've implemented in Instiki, which may (or may not) be of more general interest.
As you know, I use the HTML5 Sanitizer. One option was to sanitize a string, but that requires invoking the (rather slow) inputstream.rb. Instead, I decided to sanitize a REXML tree (which Maruku can also output), using the :sanitize option to XHTMLSerializer.

The fly in the ointment was that I needed to convert MathML named entities to UTF-8 before they hit the sanitizer. Traversing the tree in a separate pass, converting text nodes (and attribute values), also proved to be quite slow, vitiating the advantage of working with a REXML tree.

The solution I hit on was to create a custom TreeWalker, which is basically the same as the standard REXML TreeWalker, except that it converts named entities in text nodes and attribute values as it goes.

So, my question is: would this be of more general interest? Would it be desirable to have a :convert_named_entities option (defaulting to false) to the standard TreeWalker(s)?
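To make the idea concrete, here is a rough, self-contained sketch of a walker that converts entities while it emits tokens, rather than in a separate pass. It is not the Instiki code and does not use the html5lib or REXML classes: Element, NAMED_ENTITIES, expand_named_entities and EntityConvertingWalker are hypothetical names, the entity table is a three-entry sample, and the token hashes are only loosely modelled on what a TreeWalker yields.

```ruby
# Hypothetical stand-in for a tree node (not a REXML class).
# Children are either Element instances or plain Strings (text nodes).
Element = Struct.new(:name, :attributes, :children)

# Tiny sample table; a real MathML entity table is far larger.
NAMED_ENTITIES = {
  'alpha' => "\u03B1",
  'rarr'  => "\u2192",
  'infin' => "\u221E"
}.freeze

# Replace known named entities with their UTF-8 characters.
# Anything not in the table (including &amp; and &lt;) is left untouched.
def expand_named_entities(str)
  str.gsub(/&([a-zA-Z][a-zA-Z0-9]*);/) { NAMED_ENTITIES.fetch($1, $&) }
end

# Walks the tree once and yields a flat stream of token hashes,
# converting entities on the way if the option is switched on.
class EntityConvertingWalker
  include Enumerable

  def initialize(root, convert_named_entities: false)
    @root = root
    @convert = convert_named_entities
  end

  def each(&block)
    return enum_for(:each) unless block
    walk(@root, &block)
  end

  private

  def fix(str)
    @convert ? expand_named_entities(str) : str
  end

  def walk(node, &block)
    if node.is_a?(String)
      yield({ type: :Characters, data: fix(node) })
    else
      attrs = node.attributes.transform_values { |value| fix(value) }
      yield({ type: :StartTag, name: node.name, data: attrs })
      node.children.each { |child| walk(child, &block) }
      yield({ type: :EndTag, name: node.name })
    end
  end
end

tree = Element.new('math', { 'title' => 'x &rarr; &infin;' },
                   [Element.new('mi', {}, ['&alpha;']), ' &amp; &unknown;'])

tokens = EntityConvertingWalker.new(tree, convert_named_entities: true).to_a
```

With the option left at its default of false, the same walker passes text and attribute values through unchanged, which is the behaviour I would expect the standard TreeWalker(s) to keep.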
