Edit report at https://bugs.php.net/bug.php?id=62341&edit=1
ID: 62341
Comment by: andreas dot rieber at t-online dot de
Reported by: bfanger at gmail dot com
Summary: htmlspecialchars() should work on ascii compatible encodings by default.
Status: Open
Type: Feature/Change Request
Package: *Unicode Issues
PHP Version: 5.4.4
Block user comment: N
Private report: N
New Comment:
OK, understood. So I will go for a wrapper function where I can set the charset
globally and report an error in any case (to identify user problems, potential
XSS trouble or simply wrong database entries).
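
A minimal sketch of such a wrapper, assuming the application charset is UTF-8.
The names APP_CHARSET and html_escape() are made up for illustration and are
not part of PHP. The sketch reports rejected input with a warning and then
falls back to ENT_SUBSTITUTE (PHP 5.4+) so the page still shows something:

```php
<?php
// Hypothetical application-wide charset; adjust to the charset the pages are served in.
const APP_CHARSET = 'UTF-8';

// Hypothetical wrapper around htmlspecialchars(): one place to set the
// charset, and one place to report input that is invalid in that charset.
function html_escape($text, $charset = APP_CHARSET)
{
    $escaped = htmlspecialchars($text, ENT_QUOTES, $charset);

    // An empty result for non-empty input means the input was rejected
    // as invalid in $charset.
    if ($escaped === '' && $text !== '') {
        trigger_error(
            'html_escape(): input is not valid ' . $charset,
            E_USER_WARNING
        );
        // Replace each invalid sequence with U+FFFD instead of dropping
        // the whole string.
        $escaped = htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, $charset);
    }

    return $escaped;
}
```

Whether to fall back or to return an empty string after reporting is a design
choice; the fallback here is only one option.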
Previous Comments:
------------------------------------------------------------------------
[2012-09-06 15:43:59] [email protected]
The problem with setting it to 8859-1 is that it lets everything through. If
your page is actually in UTF-8, it means you are now vulnerable to 0xE0-style
XSS attacks that rely on invalid UTF-8. In PHP 5.4 we have addressed this by
adding an ENT_SUBSTITUTE option that lets you substitute any invalid chars
instead of returning an empty string.
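
To illustrate the difference, here is a small sketch (PHP 5.4 or later). The
input string is made up for the example: a lone 0xE0 byte, which opens a
three-byte UTF-8 sequence, followed by markup.

```php
<?php
// Made-up input: 0xE0 starts a three-byte UTF-8 sequence, but no
// continuation bytes follow it.
$input = "\xE0<script>";

// Strict UTF-8: the whole string is rejected and '' is returned.
$strict = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// ENT_SUBSTITUTE: the invalid byte becomes U+FFFD, the rest is escaped.
$substituted = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// ISO-8859-1: every byte is accepted, so the raw 0xE0 passes through
// unchanged into a page that may be served as UTF-8.
$latin1 = htmlspecialchars($input, ENT_QUOTES, 'ISO-8859-1');

var_dump($strict === '');
var_dump($substituted === "\xEF\xBF\xBD&lt;script&gt;");
var_dump($latin1 === "\xE0&lt;script&gt;");
```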
------------------------------------------------------------------------
[2012-09-06 15:36:43] andreas dot rieber at t-online dot de
I also spotted that problem in an older ISO-8859-1 application. I could now
convert the database to UTF-8 or change ca. 150 places in the old code.
Then I checked the problem a bit more closely: it is user input, so we don't
really know what charset it is. We can only assume it is the charset we
published the page in. That might be wrong, but with the new htmlspecialchars
behavior we would show nothing instead of partly wrong input.
I made some tests and it looks like the best option is to change my code (even
for applications which use UTF-8) to:
htmlspecialchars( $text, 0, "iso-8859-1");
There must be a better way... Returning nothing is not really good.
------------------------------------------------------------------------
[2012-07-03 19:39:32] Bonefish26 at aol dot com
Everything is fine with htmlspecialchars until someone copies data from their
auto-formatted MS Word document and puts it in the update box. Setting the
charset option seems to solve the problem.
------------------------------------------------------------------------
[2012-06-18 14:26:23] [email protected]
EUC-JP is heavily used, supported by htmlspecialchars and it is not ASCII
compatible.
------------------------------------------------------------------------
[2012-06-17 17:47:34] bfanger at gmail dot com
Rereading the manpage more thoroughly, all the info is there. Another nice
resource is
http://nikic.github.com/2012/01/28/htmlspecialchars-improvements-in-PHP-5-4.html
I now disagree with the decision to return an empty string; with PHP's flexible
typing this should have been false or null.
PHP 5.4 no longer has the weird 'only errors when "display_errors" is off'
behavior, but sadly the chosen behaviour is to always silently suppress those
errors.
If throwing E_WARNING is too risky, an E_ENCODING error level would be a very
welcome addition.
ENT_IGNORE: Removes special characters from the string instead of ignoring
them.
(My previous statement "unless ENT_IGNORE is passed." is therefore invalid)
Using strtr($text, array('<' => '&lt;', '>' => '&gt;', '&' => '&amp;')); is 35%
slower than htmlspecialchars($text, ENT_NOQUOTES, 'ISO-8859-1'), which has the
same output.
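
A quick sketch to check the "same output" claim on a few sample strings (the
samples are made up for illustration; no timing is measured here):

```php
<?php
// Replacement map matching what htmlspecialchars() escapes with ENT_NOQUOTES.
$map = array('<' => '&lt;', '>' => '&gt;', '&' => '&amp;');

// Made-up samples, including quotes, a non-ASCII byte and an existing entity.
$samples = array(
    'a < b && c > d',
    '"quoted" & \'single\'',
    "caf\xE9 <b>",
    '&amp; already escaped',
);

$allEqual = true;
foreach ($samples as $text) {
    $viaStrtr = strtr($text, $map);
    $viaHtml  = htmlspecialchars($text, ENT_NOQUOTES, 'ISO-8859-1');
    if ($viaStrtr !== $viaHtml) {
        $allEqual = false;
    }
}

var_dump($allEqual);
```

With the array form, strtr() does not rescan text it has already replaced, so
the '&' in '&lt;' is not escaped a second time.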
The security risk applies only to multibyte encodings which always use 2 or
more bytes per character, like UTF-16 (but UTF-16 and UTF-32 aren't supported
by htmlspecialchars; I'm not sure if any of the supported charsets is
incompatible with ASCII).
My framework uses UTF-8 95% of the time, but to prevent silent truncating I'll
have to add 'ISO-8859-1' as the encoding. It just feels wrong.
The default charset for htmlspecialchars should be "ASCII compatible"
"the encodings ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252, and
KOI8-R are effectively equivalent"
no ifs, no buts.
------------------------------------------------------------------------
The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
https://bugs.php.net/bug.php?id=62341