Edit report at https://bugs.php.net/bug.php?id=47494&edit=1

 ID:                 47494
 Updated by:         ras...@php.net
 Reported by:        philipp dot feigl at gmail dot com
 Summary:            htmlspecialchars does not throw E_WARNING on
                     multibyte problems
 Status:             Not a bug
 Type:               Feature/Change Request
 Package:            Strings related
 Operating System:   CentOS5
 PHP Version:        5.2.8
 Block user comment: N
 Private report:     N

 New Comment:

Also note that many, if not most, apps use this as their only validity filter, 
and if you output invalid UTF-8, for example, it can lead to security problems 
like the well-known IE 0xE0 XSS exploit. So at some point along the line you 
have to do a multi-byte check, and it may as well be here since we need to do 
it anyway.


Previous Comments:
------------------------------------------------------------------------
[2012-09-06 15:29:07] ras...@php.net

You assume ASCII7 compatibility for all encodings, which is a bad assumption.

------------------------------------------------------------------------
[2012-09-06 11:39:19] lzsiga at freemail dot c3 dot hu

Imho htmlspecialchars should not check for multi-byte validity at all, because 
it only deals with a few characters that are all in ASCII7, so it could safely 
ignore every byte between 0x80 and 0xFF. The third parameter could be simply 
ignored (as if it were 'ISO-8859-1').

------------------------------------------------------------------------
[2012-08-30 19:21:49] ni...@php.net

@the disappointed user: PHP 5.4 no longer throws said warning (it was just 
confusing). Instead there are several new options for dealing with incorrect 
encoding. Of particular interest is ENT_SUBSTITUTE, which will replace invalid 
code unit sequences with the Unicode Replacement Character (instead of 
returning a rather unhelpful empty string). This way you can easily spot where 
the string is incorrectly encoded. Furthermore, this option has the additional 
advantage of being more graceful (it replaces only the individual incorrectly 
encoded bytes, not the whole string).

Hope this helps you. More info in the docs: http://de2.php.net/htmlspecialchars
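
To make the difference concrete, here is a minimal sketch comparing the default 
handling with ENT_SUBSTITUTE. The sample input and variable names are made up 
for illustration; the flags are passed explicitly because the default flags 
differ between PHP versions.

```php
<?php
// Illustrative input: the byte 0xFF can never appear in well-formed UTF-8.
$input = "a\xFFb <script>";

// Without ENT_SUBSTITUTE or ENT_IGNORE, one invalid sequence makes
// htmlspecialchars() return an empty string for the whole input.
$strict = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// With ENT_SUBSTITUTE (PHP 5.4+), the invalid sequence is replaced by
// U+FFFD and the rest of the string is escaped as usual.
$substituted = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($strict === '');
var_dump(strpos($substituted, "\xEF\xBF\xBD") !== false); // UTF-8 bytes of U+FFFD
var_dump(strpos($substituted, '&lt;script&gt;') !== false);
```

Searching the output for U+FFFD is then a cheap way to find where the input 
was incorrectly encoded.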

------------------------------------------------------------------------
[2012-08-30 19:01:22] another_disappointed_php_programmer at exam

This is very sad.

This is a bug, and it's sad that PHP core developers said that it's a feature 
and it won't be fixed. I'm disappointed.

------------------------------------------------------------------------
[2012-07-01 15:34:03] ras...@php.net

This really isn't a bug. I do agree that the approach isn't ideal, but we 
shouldn't throw warnings on bad input here because htmlspecialchars() is 
explicitly designed to clean up bad input and it is run directly on user data 
most of the time. In order for someone to avoid this warning they would need to 
first call something like iconv('utf-8','utf-8') to clean up the input data and 
that doesn't make much sense since htmlspecialchars() essentially does that 
already. But, in order to help debugging, there should be some way to see why an 
htmlspecialchars() call failed, so a last_error() function, similar to how it is 
handled for JSON decoding, would make sense.
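
Until something like that exists, a userland approximation is possible. The 
sketch below is only an assumption about how one might do it: 
escape_or_explain() is a hypothetical helper, not part of PHP, and it relies on 
PCRE's /u modifier to validate UTF-8, which is close to, but not guaranteed to 
be identical to, the check htmlspecialchars() performs internally.

```php
<?php
// Hypothetical helper, not part of PHP: returns array(escaped string, null)
// on success or array(null, reason) when the input is not valid UTF-8.
function escape_or_explain($s)
{
    // With the /u modifier PCRE validates the subject as UTF-8;
    // preg_match() returns false when the subject is malformed.
    if (preg_match('//u', $s) === false) {
        $reason = preg_last_error() === PREG_BAD_UTF8_ERROR
            ? 'input is not valid UTF-8'
            : 'PCRE error code ' . preg_last_error();
        return array(null, $reason);
    }
    return array(htmlspecialchars($s, ENT_QUOTES, 'UTF-8'), null);
}

// "\xE9" is "e acute" in ISO-8859-1 but an incomplete sequence in UTF-8.
list($badHtml, $badError) = escape_or_explain("caf\xE9 <b>");

// The same character correctly encoded as UTF-8 (0xC3 0xA9).
list($okHtml, $okError) = escape_or_explain("caf\xC3\xA9 <b>");

var_dump($badHtml, $badError, $okHtml, $okError);
```

This is the same pattern as json_decode() followed by json_last_error(): the 
call itself stays quiet on bad input, and the caller asks afterwards why it 
failed.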

------------------------------------------------------------------------


The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=47494

