Brett Parker <[EMAIL PROTECTED]> writes:

> On Fri, Jul 13, 2007 at 11:18:18AM +0100, Nic James Ferrier wrote:
>>
>> Derek Anderson <[EMAIL PROTECTED]> writes:
>>
>> > hey all,
>> >
>> > could anyone point me to a python html sanitizer implementation (or
>> > example)?  i don't mean to strip all html, just tags and attributes not
>> > on a whitelist, such as I/B/A href/U/etc.
>>
>> I use libxml2/libxslt, something like:
>>
>>   doc = libxml2.htmlParseDoc(whatever, "utf8")
>>   result = libxslt.applyStylesheetFile(doc, "strip.xslt", {})
>>
>> There are loads of ways of stripping in xslt depending on what you
>> want to do.
>
> Only works on well formed XHTML documents though... which although they
> should be the norm, really aren't!

No. In my example I deliberately used libxml2's HTML parser, which is
an HTML parser, not an XHTML parser. It copes with non-well-formed
documents as well as all the usual entity problems.

--
Nic Ferrier
http://www.tapsellferrier.co.uk
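
For anyone following along without libxml2 installed, here is a rough
sketch of the whitelist idea from the original question, using only
the standard library's html.parser rather than the libxml2/libxslt
approach described above. The ALLOWED table, the WhitelistSanitizer
class, the sanitize() helper and the http/https-only rule for href are
all made up for illustration. Like libxml2's HTML parser, html.parser
accepts markup that is not well formed, but unlike libxml2 it does not
build a tree, so this sketch will not close unbalanced tags for you.
Treat it as a starting point, not as a vetted sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration: tag name -> allowed attributes.
ALLOWED = {"a": {"href"}, "b": set(), "i": set(), "u": set()}

# Assumption: only plain web links are acceptable in href.
SAFE_SCHEMES = ("http://", "https://")


class WhitelistSanitizer(HTMLParser):
    """Re-emit only whitelisted tags and attributes; keep all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _keep_attr(self, tag, name, value):
        if name not in ALLOWED[tag]:
            return False
        if name == "href":
            # Drop javascript: and other unexpected schemes.
            return (value or "").strip().lower().startswith(SAFE_SCHEMES)
        return True

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the tag itself; its text content is still kept
        kept = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
            if self._keep_attr(tag, name, value)
        )
        self.out.append("<%s%s>" % (tag, kept))

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text (including the body of a dropped <script>) is escaped.
        self.out.append(escape(data, quote=False))


def sanitize(markup):
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize('<p onclick="x()">Hi <b>there</b></p>'))
```

Note that text inside dropped elements is kept (escaped), so the body
of a <script> element would show up as visible text. If that is not
what you want, you would need to track which element you are inside
and skip its data.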