 ID:               36112
 Updated by:       [EMAIL PROTECTED]
 Reported By:      pornel at despammed dot com
-Status:           Assigned
+Status:           Open
 Bug Type:         Documentation problem
 PHP Version:      Irrelevant
 Assigned To:      gavinfo

Previous Comments:
------------------------------------------------------------------------

[2006-03-12 17:06:18] [EMAIL PROTECTED]

There are a lot of inconsistencies in this example:

1) About @<script[^>]*?>.*?</script>@si :
   a) the first ? is useless.

2) About @<[\/\!]*?[^<>]*?>@si :
   a) / and ! don't have to be escaped.
   b) [\/\!]*? is useless, as it's already matched by [^<>]*?.
   c) the ? of [^<>]*? is useless.
   d) the PCRE_DOTALL modifier is useless, there is no dot.
   e) the PCRE_CASELESS modifier is useless.
   f) what is the point of avoiding "<" in a tag?

3) About @([\r\n])[\s]+@ :
   a) no need to put \s in a char class.
   b) every \r\n will be changed to \r, as \s matches \n.

I think the whole example has to be reconsidered, because there are
already functions to do some of the job, like strip_tags() and
html_entity_decode().

------------------------------------------------------------------------

[2006-01-20 23:54:03] pornel at despammed dot com

Description:
------------
The code on http://uk.php.net/preg_replace:

$search = array ('@<script[^>]*?>.*?</script>@si', // Strip out javascript
                 '@<[\/\!]*?[^<>]*?>@si',          // Strip out HTML tags

doesn't work as advertised. For example, it will leave the contents of:

<script>xxx</script >

and worse, it will output valid script tags if given:

<<>script>evil<<>/script>

If these patterns were used on some website (for stripping markup from
users' comments, for example), they'd allow an XSS attack.

Since it's near impossible to properly parse HTML with regular
expressions, I suggest:

* renaming the example from 'Convert HTML to text' to 'Remove HTML markup'
* adding replacement of '<' as '&lt;'
* suggesting use of more robust methods, like strip_tags, nl2br,
  htmlspecialchars or the DOM interface.

------------------------------------------------------------------------
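
The two failure cases described in the report can be reproduced with a short script. This is only a sketch under stated assumptions: the helper name strip_with_quoted_patterns() is hypothetical (it does not appear in the manual), and it assumes every match is replaced with an empty string. Only the two patterns quoted in the report are used. The last line shows htmlspecialchars(), one of the alternatives suggested above, applied to the same input.

```php
<?php
// Hypothetical helper for illustration only; not part of the manual example.
function strip_with_quoted_patterns($html)
{
    // The two patterns quoted in the report, copied verbatim.
    $search = array(
        '@<script[^>]*?>.*?</script>@si', // Strip out javascript
        '@<[\/\!]*?[^<>]*?>@si',          // Strip out HTML tags
    );

    // Assumption: each match is replaced with an empty string.
    return preg_replace($search, '', $html);
}

// A space before the closing ">" defeats the first pattern, so only the
// tags are removed and the script body survives.
echo strip_with_quoted_patterns('<script>xxx</script >'), "\n";
// prints: xxx

// The second pattern removes each "<>" pair, which reassembles real tags.
echo strip_with_quoted_patterns('<<>script>evil<<>/script>'), "\n";
// prints: <script>evil</script>

// Escaping instead of stripping leaves nothing a browser would parse as a tag.
echo htmlspecialchars('<<>script>evil<<>/script>'), "\n";
// prints: &lt;&lt;&gt;script&gt;evil&lt;&lt;&gt;/script&gt;
```

Escaping the input sidesteps the parsing problem entirely: the markup is displayed as text rather than removed, so there is no pattern for an attacker to work around.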