Brett Parker <[EMAIL PROTECTED]> writes:

> On Fri, Jul 13, 2007 at 11:18:18AM +0100, Nic James Ferrier wrote:
>> 
>> Derek Anderson <[EMAIL PROTECTED]> writes:
>> 
>> > hey all,
>> >
>> > could anyone point me to a python html sanitizer implementation (or 
>> > example)?  i don't mean to strip all html, just tags and attributes not 
>> > on a whitelist, such as I/B/A href/U/etc.
>> 
>> I use libxml2/libxslt, something like:
>> 
>>   import libxml2, libxslt
>>   doc = libxml2.htmlParseDoc(whatever, "utf8")
>>   style = libxslt.parseStylesheetDoc(libxml2.parseFile("strip.xslt"))
>>   result = style.applyStylesheet(doc, None)
>> 
>> There are loads of ways of stripping in xslt depending on what you
>> want to do.
>
> Only works on well formed XHTML documents though... which although they
> should be the norm, really aren't!

No. In my example I deliberately used libxml2's HTML parser, which is
an HTML parser, not an XHTML parser.

It copes with documents that are not well formed, as well as all the
usual entity problems.
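
For anyone who wants to try the whitelist idea without libxml2/libxslt
installed, here is a minimal sketch of the same approach using only
Python's standard-library html.parser, which also tolerates markup that
is not well formed. It is a stand-in for the XSLT route above, not an
equivalent of it. The ALLOWED table, the WhitelistSanitizer class and
the sanitize() helper are hypothetical names made up for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> set of attribute names to keep.
ALLOWED = {"i": set(), "b": set(), "u": set(), "a": {"href"}}


class WhitelistSanitizer(HTMLParser):
    """Re-emit only whitelisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the tag itself; its text content is still emitted
        kept = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
            if name in ALLOWED[tag]
        )
        self.out.append("<%s%s>" % (tag, kept))

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape them.
        self.out.append(escape(data, quote=False))


def sanitize(markup):
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at end of input
    return "".join(parser.out)
```

Calling sanitize() on a fragment keeps the whitelisted tags, drops
everything else, and leaves the text in place. Note the limits of the
sketch: it does not vet URL schemes in href (so javascript: links pass
through), it does not close tags the input left open, and text inside
dropped elements is kept as escaped text. Treat it as a starting point
rather than something to rely on for untrusted input.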


-- 
Nic Ferrier
http://www.tapsellferrier.co.uk   
