Brett Parker <[EMAIL PROTECTED]> writes:
> On Fri, Jul 13, 2007 at 11:18:18AM +0100, Nic James Ferrier wrote:
>>
>> Derek Anderson <[EMAIL PROTECTED]> writes:
>>
>> > hey all,
>> >
>> > could anyone point me to a python html sanitizer implementation (or
>> > example)? i don't mean to strip all html, just tags and attributes not
>> > on a whitelist, such as I/B/A href/U/etc.
>>
>> I use libxml2/libxslt, something like:
>>
>> doc = libxml2.htmlParseDoc(whatever, "utf8")
>> result = libxslt.applyStylesheetFile(doc, "strip.xslt", {})
>>
>> There are loads of ways of stripping in xslt depending on what you
>> want to do.
>
> Only works on well formed XHTML documents though... which although they
> should be the norm, really aren't!
No. In my example I deliberately used libxml2's HTML parser, which is an
HTML parser, not an XHTML parser.
It copes with non-well-formed documents as well as all the usual
entity problems.
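
For comparison, here is a rough sketch of the same whitelist idea using
only Python's standard library (html.parser), which is also forgiving
about malformed markup. It is a different technique from the
libxml2/XSLT approach above, not a replacement for it. The whitelist
(ALLOWED), the set of elements whose content is dropped (SKIP_CONTENT),
the URL scheme check and the helper names are all hypothetical choices
made for illustration, not a vetted sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> attribute names allowed on that tag.
ALLOWED = {"a": {"href"}, "b": set(), "i": set(), "u": set()}
# Hypothetical: elements whose text content is discarded entirely.
SKIP_CONTENT = {"script", "style"}
# Hypothetical: URL schemes accepted in href (relative URLs also pass).
SAFE_SCHEMES = ("http:", "https:", "mailto:")


def _safe_url(value):
    """Accept known-safe schemes or scheme-less (relative) URLs."""
    v = value.strip().lower()
    return v.startswith(SAFE_SCHEMES) or ":" not in v.split("/", 1)[0]


class WhitelistSanitizer(HTMLParser):
    """Re-emit only whitelisted tags/attributes; escape all text."""

    def __init__(self):
        # convert_charrefs=True: entities arrive decoded in handle_data.
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        if tag in SKIP_CONTENT:
            self.skip += 1
            return
        if tag not in ALLOWED:
            return  # drop the tag itself but keep its text
        kept = "".join(
            f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
            if name in ALLOWED[tag]
            and value is not None
            and (name != "href" or _safe_url(value))
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            if self.skip:
                self.skip -= 1
            return
        if tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data, quote=False))


def sanitize(markup):
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p onclick="x()">Hi <b>there</b> '
               '<a href="javascript:alert(1)" title="t">link</a></p>'))
```

Comments and disallowed tags are simply dropped while their text is
kept (escaped). This sketch does not re-balance unclosed tags, so the
output can still be malformed if the input was.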
--
Nic Ferrier
http://www.tapsellferrier.co.uk
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Django users" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at
http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---