On 1/19/06, tonemcd <[EMAIL PROTECTED]> wrote:
> Didn't realise stripogram was so open to those sort of exploits (I've
> only ever used it to get rid of the stuff that might mangle layout).
> There's obviously more to this than meets the eye.

Here are some interesting resources on the challenges involved with
escaping dangerous HTML.

Cal Henderson (from Flickr) has developed a filtering library in PHP.
It's documented in two tutorials, and the code is also available (with
unit tests):

http://iamcal.com/publish/articles/php/processing_html/
http://iamcal.com/publish/articles/php/processing_html_part_2/
http://code.iamcal.com/php/lib_filter/

The changelog for LiveJournal's HTML sanitizing code lists dozens of
interesting vulnerabilities. The code is worth looking at too, since it
has lots of interesting comments:

http://cvs.livejournal.org/browse.cgi/livejournal/cgi-bin/cleanhtml.pl

Mark Pilgrim's feedparser library has unit tests for the sanitizing
component:

http://feedparser.org/tests/wellformed/sanitize/
http://feedparser.org/tests/illformed/sanitize/

Even PHP's strip_tags function (which doesn't attempt to sanitize, it
just removes anything that looks like a tag) has had its fair share of
problems:

http://bugs.php.net/search.php?cmd=display&search_for=strip_tags

Cheers,

Simon
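
P.S. To illustrate why "remove anything that looks like a tag" is not the same as sanitizing, here is a minimal Python sketch. The regex and the naive_strip_tags helper are hypothetical, written only for this example; this is not how stripogram, lib_filter, or PHP's strip_tags is implemented. It just shows one class of problem: a single pass over nested input can leave a brand new tag behind.

```python
import re
from html import escape

# Hypothetical one-pass stripper: deletes every "<...>" run that has no
# angle brackets inside it. An assumption for illustration only, not the
# behaviour of any real library.
TAG_RE = re.compile(r"<[^<>]*>")


def naive_strip_tags(text):
    """Remove anything that looks like a simple tag, in a single pass."""
    return TAG_RE.sub("", text)


# The inner <b> and </b> are removed, and the pieces around them join up
# into a complete script element.
tricky = "<scr<b>ipt>alert(1)</scr</b>ipt>"
print(naive_strip_tags(tricky))  # -> <script>alert(1)</script>

# Escaping instead of stripping leaves no raw markup in the output,
# at the cost of showing the tags to the reader as literal text.
print(escape(tricky))
```

Escaping is only appropriate when no markup should survive at all. Allowing a subset of HTML through needs a real whitelist-based filter like the ones linked above.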