On Feb 23, 4:59 pm, Jacob Kaplan-Moss <jacob.kaplanm...@gmail.com> wrote:
> On Mon, Feb 23, 2009 at 3:50 PM, Andy Mckay <a...@clearwind.ca> wrote:
> > You want to use a script to only allow certain HTML tags and enforce a
> > whitelist. Don't be naive and just use string or regular expression to
> > strip only a few, there's lots of hacks that can be done. I use the
> > SGMLParser in Plone, here's an old
> > one: http://code.activestate.com/recipes/52281/
> > some googling will probably find you more.
>
> Please don't use this. It's very insecure.
>
> Use html5lib (http://code.google.com/p/html5lib/wiki/UserDocumentation)
> and the sanitizing tokenizer instead.

Interesting, I've also come across this:
http://codespeak.net/lxml/lxmlhtml.html#cleaning-up-html
I've heard it is very fast, as it is just a Python binding to a C library...?

> Generally, stripping HTML of anything dangerous is *very*, *very*
> difficult to get right. In fact, it's so hard you should try to avoid
> it if at all possible -- markup languages like Markdown, Textile, and
> reStructuredText are a good choice as a replacement.

I've been using Markdown after seeing it used in many places. There is a
safe mode that instructs it to escape the HTML it would normally allow.
Check the Django source for usage.

BN
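
To make the two points above concrete, here is a rough sketch using only the Python standard library. It is not the html5lib, lxml, or Markdown API. The helper names (`naive_strip_script`, `escape_all`) and the sample payload are made up for illustration. The first function is the kind of regex stripping the thread warns against; the second just escapes everything, which is the general idea behind an "escape" mode, though a real Markdown safe mode may behave differently in its details.

```python
import html
import re


def naive_strip_script(text: str) -> str:
    """Hypothetical naive filter: delete <script> and </script> tags in one regex pass."""
    return re.sub(r"</?script[^>]*>", "", text, flags=re.IGNORECASE)


def escape_all(text: str) -> str:
    """Hypothetical 'escape' approach: turn every markup character into an entity."""
    return html.escape(text, quote=True)


# Illustrative payload: a script tag nested inside a split script tag.
payload = "<scr<script>ipt>alert(1)</scr</script>ipt>"

# Removing the inner tags glues the outer fragments back into a working tag.
stripped = naive_strip_script(payload)

# Escaping leaves no raw angle brackets for a browser to interpret.
escaped = escape_all(payload)

print(stripped)
print(escaped)
```

The single-pass strip reassembles the very tag it was meant to remove, while the escaped string renders as inert text. Escaping throws away all user HTML, so it only fits when the formatting comes from something like Markdown rather than from raw tags.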