On Feb 23, 4:59 pm, Jacob Kaplan-Moss <jacob.kaplanm...@gmail.com>
wrote:
> On Mon, Feb 23, 2009 at 3:50 PM, Andy Mckay <a...@clearwind.ca> wrote:
> > You want to use a script to only allow certain HTML tags and enforce a
> > whitelist. Don't be naive and just use string operations or a regular
> > expression to strip only a few; there are lots of hacks that can be
> > done. I use the SGMLParser in Plone, here's an old one:
> > http://code.activestate.com/recipes/52281/
> > Some googling will probably find you more.
>
> Please don't use this. It's very insecure.
>
> Use html5lib (http://code.google.com/p/html5lib/wiki/UserDocumentation)
> and the sanitizing tokenizer instead.

Interesting, I've also come across this:

http://codespeak.net/lxml/lxmlhtml.html#cleaning-up-html

I've heard it is very fast, as it is just a Python binding to a C
library...?

>
> Generally, stripping HTML of anything dangerous is *very*, *very*
> difficult to get right. In fact, it's so hard you should try to avoid
> it if at all possible -- markup languages like Markdown, Textile, and
> reStructuredText are a good choice as a replacement.
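
To illustrate why the regex approach mentioned earlier in the thread is
so easy to get wrong, here is a minimal sketch using only the standard
library. The function name naive_strip and the payload are made up for
illustration; this is one well-known bypass, not an exhaustive list.

```python
import re

# Hypothetical naive filter: remove <script>...</script> pairs in one pass.
SCRIPT_RE = re.compile(r"<script.*?>.*?</script>", re.IGNORECASE | re.DOTALL)

def naive_strip(html):
    """Strip script elements with a single regex substitution (unsafe)."""
    return SCRIPT_RE.sub("", html)

# The attacker nests an empty script element inside a broken-up tag name.
# Removing the inner elements reassembles a working script tag.
payload = "<scr<script></script>ipt>alert(1)</scr<script></script>ipt>"

print(naive_strip(payload))
```

The substitution makes a single pass, so the tag it pieces back together
is never re-checked. Running it repeatedly only addresses this one trick,
which is why a real parser with a whitelist is the usual recommendation.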

I've been using Markdown after seeing it used in many places. There is
a safe mode that instructs it to escape the HTML it would normally
allow. Check the Django source for usage.
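
To show what "escaping" means here, below is a rough sketch using only
the Python standard library. It is not how Markdown implements its safe
mode; the helper name escape_raw_html is an assumption for illustration.
It just shows the effect: markup in user input becomes inert text.

```python
from html import escape

def escape_raw_html(text):
    """Replace HTML-special characters with entities so that any tags in
    the input are displayed literally instead of being interpreted."""
    return escape(text, quote=True)

# Hypothetical user input mixing raw HTML with Markdown-style emphasis.
untrusted = '<script>alert("x")</script> *hello*'

print(escape_raw_html(untrusted))
```

The asterisks are left alone, so a markup renderer run afterwards can
still turn *hello* into emphasis while the script tag stays harmless.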

BN