On 1/19/06, tonemcd <[EMAIL PROTECTED]> wrote:
> Didn't realise stripogram was so open to those sort of exploits (I've
> only ever used it to get rid of the stuff that might mangle layout).
> There's obviously more to this than meets the eye.

Here are some interesting resources on the challenges involved with
escaping dangerous HTML.

Cal Henderson (from Flickr) has developed a filtering library in PHP.
It's documented in two tutorials - the code is also available (with
unit tests):

http://iamcal.com/publish/articles/php/processing_html/
http://iamcal.com/publish/articles/php/processing_html_part_2/
http://code.iamcal.com/php/lib_filter/

The changelog for LiveJournal's HTML sanitizing code lists dozens of
interesting vulnerabilities. The code is worth looking at too, since it
has lots of interesting comments:

http://cvs.livejournal.org/browse.cgi/livejournal/cgi-bin/cleanhtml.pl

Mark Pilgrim's feedparser library has unit tests for the sanitizing component:

http://feedparser.org/tests/wellformed/sanitize/
http://feedparser.org/tests/illformed/sanitize/

Even PHP's strip_tags function (which doesn't attempt to sanitize, it
just removes anything that looks like a tag) has had its fair share of
problems:

http://bugs.php.net/search.php?cmd=display&search_for=strip_tags
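
To make the "just removes anything that looks like a tag" point concrete,
here is a minimal Python sketch. It is not how strip_tags or stripogram is
actually implemented; naive_strip_tags and escape_html are hypothetical
helpers I made up for illustration. It shows one way a single stripping
pass can go wrong (the leftover fragments join up into a new tag), and how
escaping the input avoids that particular problem:

```python
import html
import re

# Hypothetical helper, for illustration only: delete anything that looks
# like a complete tag, i.e. "<", some characters that are not angle
# brackets, then ">".
TAG_RE = re.compile(r"<[^<>]*>")


def naive_strip_tags(text):
    """Remove tag-shaped substrings in a single pass."""
    return TAG_RE.sub("", text)


def escape_html(text):
    """Escape markup characters so the browser shows them as text."""
    return html.escape(text, quote=True)


# Ordinary markup is removed as expected.
plain = naive_strip_tags("<b>hello</b>")

# Removing the inner <b> tags leaves fragments that form a script tag.
tricky = "<<b>script>alert(1)<<b>/script>"
stripped = naive_strip_tags(tricky)

# Escaping the same input leaves no live tags behind.
escaped = escape_html(tricky)

print(plain)
print(stripped)
print(escaped)
```

Escaping only helps when you want no markup at all. If some tags have to
survive, you are back to whitelist-style sanitizing, which is what the
libraries above attempt.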

Cheers,

Simon
