ID:               36112
Updated by:       [EMAIL PROTECTED]
Reported By:      pornel at despammed dot com
-Status:           Open
+Status:           Assigned
Bug Type:         Documentation problem
PHP Version:      Irrelevant
-Assigned To:
+Assigned To:      gavinfo

Previous Comments:
------------------------------------------------------------------------

[2006-01-20 23:54:03] pornel at despammed dot com

Description:
------------
The code on http://uk.php.net/preg_replace:

$search = array ('@<script[^>]*?>.*?</script>@si', // Strip out javascript
                 '@<[\/\!]*?[^<>]*?>@si',          // Strip out HTML tags

doesn't work as advertised. For example, it will leave the contents of:

<script>xxx</script >

and worse, it will output valid script tags if given:

<<>script>evil<<>/script>

If these patterns were used on a website (for stripping markup from users' comments, for example), they'd allow an XSS attack.

Since it's nearly impossible to properly parse HTML with regular expressions, I suggest:

* renaming the example from 'Convert HTML to text' to 'Remove HTML markup'
* adding replacement of '<' as '&lt;'
* suggesting use of more robust methods, like strip_tags, nl2br, htmlspecialchars or the DOM interface.

------------------------------------------------------------------------
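
The two failure cases from the report can be reproduced with a short script. This is only a sketch: it uses just the two patterns quoted in the report, and the empty replacement strings and the helper name strip_with_patterns are assumptions made for illustration, not part of the manual's example. The last lines show how htmlspecialchars handles the same input by escaping it instead of stripping it.

```php
<?php
// The two patterns quoted in the report. The empty replacement strings
// are an assumption made for this sketch.
$search  = array('@<script[^>]*?>.*?</script>@si',  // Strip out javascript
                 '@<[\/\!]*?[^<>]*?>@si');          // Strip out HTML tags
$replace = array('', '');

// Hypothetical helper name, used only for this illustration.
function strip_with_patterns($html, $search, $replace)
{
    // With array arguments, preg_replace applies each pattern in turn
    // to the result of the previous one.
    return preg_replace($search, $replace, $html);
}

// Case 1: the space in "</script >" stops the first pattern from matching,
// so only the tags are removed and the script body is left behind.
$leaked = strip_with_patterns('<script>xxx</script >', $search, $replace);
echo $leaked, "\n";   // xxx

// Case 2: the second pattern removes each "<>" pair, which joins the
// surrounding fragments into a working script element.
$rebuilt = strip_with_patterns('<<>script>evil<<>/script>', $search, $replace);
echo $rebuilt, "\n";  // <script>evil</script>

// Escaping instead of stripping: no literal '<' or '>' survives.
$escaped = htmlspecialchars('<<>script>evil<<>/script>', ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";  // &lt;&lt;&gt;script&gt;evil&lt;&lt;&gt;/script&gt;
```

The second case works because the patterns run once, in order: by the time the tag pattern has joined the fragments into a script element, the script pattern has already been applied and is not run again.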