ID:          36112
 Updated by:  [EMAIL PROTECTED]
 Reported By: pornel at despammed dot com
-Status:      Open
+Status:      Assigned
 Bug Type:    Documentation problem
 PHP Version: Irrelevant
-Assigned To: 
+Assigned To: gavinfo


Previous Comments:
------------------------------------------------------------------------

[2006-01-20 23:54:03] pornel at despammed dot com

Description:
------------
The code on http://uk.php.net/preg_replace:

$search = array ('@<script[^>]*?>.*?</script>@si', // Strip out javascript
                 '@<[\/\!]*?[^<>]*?>@si',          // Strip out HTML tags

doesn't work as advertised. For example, it will leave the
contents of:
<script>xxx</script       >
and worse, it will output valid script tags if given:
<<>script>evil<<>/script>
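
Both cases can be reproduced with a minimal sketch like the one below.
It assumes that only the two patterns quoted above are applied (the
rest of the manual's example is left out) and that the replacement is
an empty string; the variable names $inputs and $outputs are my own.

```php
<?php
// The two patterns quoted from the manual example.
$search = array(
    '@<script[^>]*?>.*?</script>@si', // intended to strip out javascript
    '@<[\/\!]*?[^<>]*?>@si',          // intended to strip out HTML tags
);

// The two problem inputs from this report.
$inputs = array(
    '<script>xxx</script       >',
    '<<>script>evil<<>/script>',
);

$outputs = array();
foreach ($inputs as $input) {
    // preg_replace() applies each pattern once, in order; it does not
    // rescan the text that is left after a replacement.
    $outputs[] = preg_replace($search, '', $input);
}

foreach ($outputs as $i => $output) {
    echo $inputs[$i], ' => ', $output, "\n";
}
// <script>xxx</script       > => xxx
// <<>script>evil<<>/script> => <script>evil</script>
```

In the first case the closing tag with trailing spaces never matches
the script pattern, so only the tags are removed and the script body
stays. In the second case removing each '<>' pair is what assembles
the script tags.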

If these patterns were used on a website (for example, to strip
markup from users' comments), they would allow an XSS attack.


Since it's nearly impossible to parse HTML properly with
regular expressions, I suggest:
* renaming the example from 'Convert HTML to text' to 'Remove
HTML markup'
* adding a replacement of '<' with '&lt;'
* suggesting the use of more robust methods, like strip_tags,
nl2br, htmlspecialchars or the DOM interface.
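
As a rough illustration of the last two suggestions, here is a short
sketch using htmlspecialchars and strip_tags. The sample strings, the
variable names and the choice of ENT_QUOTES with UTF-8 are assumptions
for the example, not something taken from the manual page.

```php
<?php
// Hypothetical user input: the second problem case from this report.
$comment = '<<>script>evil<<>/script>';

// Escaping instead of stripping: every '<' and '>' becomes an entity,
// so nothing is left that a browser could parse as a tag.
$escaped = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";
// &lt;&lt;&gt;script&gt;evil&lt;&lt;&gt;/script&gt;

// strip_tags() removes markup from well-formed input.
$plain = strip_tags('<p>Hello <b>world</b></p>');
echo $plain, "\n";
// Hello world
```

Escaping on output does not depend on recognising tags at all, which
is why it holds up against malformed input where the patterns fail.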




------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=36112&edit=1