Edit report at https://bugs.php.net/bug.php?id=61354&edit=1
 ID:                 61354
 Comment by:         tokul at users dot sourceforge dot net
 Reported by:        hufeng1987 at gmail dot com
 Summary:            htmlentities and htmlspecialchars doesn't respect the
                     default_charset
 Status:             Not a bug
 Type:               Bug
 Package:            Strings related
 Operating System:   Linux/Windows/
 PHP Version:        5.4.0
 Block user comment: N
 Private report:     N

 New Comment:

> If you want it to work correctly, you should replace the following
> code:
>
> old code:
>
> htmlspecialchars($string);
>
> new code:
>
> htmlspecialchars($string, NULL, 'GB2312');

htmlspecialchars($string, ENT_COMPAT, 'GB2312');

The default flag is ENT_COMPAT, which escapes double quotes. Passing NULL
is treated as 0 (ENT_NOQUOTES), so quotes are left unescaped.


Previous Comments:
------------------------------------------------------------------------
[2012-03-12 18:27:13] tokul at users dot sourceforge dot net

Two small comments.

Could you write your Chinese characters in hex notation? That way they
are friendlier to pages written in other charsets.

Your test code is
-----
<?php
$string = "<pre><p>\xce\xd2\xca\xc7\xb2\xe2\xca\xd4</p></pre>";

var_dump(htmlspecialchars($string));
var_dump(htmlspecialchars($string, NULL, 'GB2312'));
-----

Expected result - both var_dumps should be the same.

> htmlspecialchars should use the charset defined by php.ini
> default_charset.

htmlspecialchars() should not use the charset defined in the PHP
configuration. It should use ISO-8859-1 for backward-compatibility
reasons.

------------------------------------------------------------------------
[2012-03-12 06:12:58] hufeng1987 at gmail dot com

When your project uses GB2312 as its default charset encoding and you
upgrade to PHP 5.4, you will find that htmlspecialchars no longer works
as it used to.

If you want it to work correctly, you should replace the following
code:

old code:

htmlspecialchars($string);

new code:

htmlspecialchars($string, NULL, 'GB2312');

Recoding the full project is a huge amount of work, especially when the
project is old.
------------------------------------------------------------------------
[2012-03-12 06:05:54] hufeng1987 at gmail dot com

Maybe you are right, and PHP 5.4 should have UTF-8 as the default
encoding. But in a production environment this will cause more
accidents.

Why doesn't PHP handle default_charset wisely? That would free us from
recoding.

------------------------------------------------------------------------
[2012-03-12 06:04:35] ras...@php.net

What do you mean it is impossible to rewrite old code? In previous
versions htmlspecialchars() didn't respect the default_charset ini
setting either. It only looks at that setting if you pass an empty
string as the encoding. The change in PHP 5.4 was simply to switch from
ISO-8859-1 to UTF-8 when you do not specify a charset.

------------------------------------------------------------------------
[2012-03-12 05:56:17] hufeng1987 at gmail dot com

If this is not a bug, why did this change break our old project?

In PHP versions before 5.4, we could call htmlspecialchars simply as:

htmlspecialchars($string);

and this call would not break the string.

But now, under PHP 5.4, the default encoding has changed to UTF-8, which
may break old code.

It is impossible to rewrite old code to add an explicit charset
encoding.

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=61354
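
A minimal sketch of the workaround discussed in this report, assuming the project's charset is GB2312 as the reporter states. The helper name h() and the constant PROJECT_CHARSET are hypothetical and not part of PHP; htmlspecialchars(), ENT_COMPAT and ENT_NOQUOTES are the real built-ins. The idea is to write the flags and the charset once instead of at every call site, and to show why ENT_COMPAT rather than NULL belongs in the flags position:

```php
<?php
// Hypothetical project-wide setting; GB2312 is the charset named in the report.
const PROJECT_CHARSET = 'GB2312';

// Hypothetical helper: always passes explicit flags and an explicit charset,
// so the result does not depend on the PHP version's default encoding.
function h($string, $charset = PROJECT_CHARSET)
{
    return htmlspecialchars($string, ENT_COMPAT, $charset);
}

// Same GB2312 bytes as the test code in the report, plus a quoted attribute.
$string = "<p title=\"x\">\xce\xd2\xca\xc7\xb2\xe2\xca\xd4</p>";

// ENT_COMPAT: tags and double quotes are escaped, GB2312 bytes pass through.
var_dump(h($string));

// NULL in the flags position is coerced to 0, which equals ENT_NOQUOTES,
// so double quotes are left unescaped.
var_dump(htmlspecialchars($string, ENT_NOQUOTES, PROJECT_CHARSET));

// Treated as UTF-8, the same bytes are an invalid sequence and the call
// returns an empty string; this is the breakage described in the report.
var_dump(htmlspecialchars($string, ENT_COMPAT, 'UTF-8'));
```

Legacy call sites would then change from htmlspecialchars($string) to h($string). Whether that is less work than adding the two arguments everywhere depends on the project.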