Edit report at http://bugs.php.net/bug.php?id=51385&edit=1
 ID:               51385
 User updated by:  baudav at gmail dot com
 Reported by:      baudav at gmail dot com
 Summary:          htmlentities next substr with UTF-8
 Status:           Bogus
 Type:             Bug
 Package:          *Unicode Issues
 Operating System: W2k3 IIS6
 PHP Version:      5.3.2 vc9-nts

 New Comment:

Oh! Sorry for my incomplete report!

Tested with substr and mb_substr; the result is the same with mb_string.

Previous Comments:
------------------------------------------------------------------------
[2010-03-25 05:27:58] baudav at gmail dot com

Windows 2003 with IIS6 FastCGI; PHP 5.3.1 or 5.3.2 vc9-nts

------------------------------------------------------------------------
[2010-03-25 05:24:44] ahar...@php.net

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

Like most PHP functions, substr() is not multibyte-aware. You may prefer
to use mb_substr() instead.

------------------------------------------------------------------------
[2010-03-25 05:18:11] baudav at gmail dot com

Description:
------------
substr does not truncate UTF-8 correctly and generates an invalid UTF-8
string.

The test script must be saved in UTF-8.

Test script:
---------------
<?php
$str = 'câble TOSLink mâle/mâle (1.5 à 25m)';
$etc = '...';
echo htmlentities(substr($str, 0, 33). $etc, ENT_QUOTES, 'UTF-8')
?>

Expected result:
----------------
câble TOSLink mâle/mâle (1.5 ...

Actual result:
--------------
Nothing is returned; only a PHP error is logged:

PHP Warning:  htmlentities(): Invalid multibyte sequence in argument in
C:\DATA\WWW\test.php on line 5

Changing substr($str, 0, 33) to substr($str, 0, 32) makes it work.

------------------------------------------------------------------------
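
For illustration, here is a minimal sketch of the character-based truncation suggested in the reply above. It assumes the mbstring extension is loaded and that the file is saved as UTF-8. The variable names $bytes and $chars are only illustrative. Passing 'UTF-8' explicitly to mb_substr() is also an assumption on my part: the report does not show the mb_substr() call that was tested, and one possible reason for the identical failure is that the configured internal encoding was not UTF-8.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext/mbstring is available.
$str = 'câble TOSLink mâle/mâle (1.5 à 25m)';
$etc = '...';

// Byte-based cut: byte offset 32 is the first byte of the two-byte "à",
// so a 33-byte prefix ends in an incomplete UTF-8 sequence.
$bytes = substr($str, 0, 33);
var_dump(mb_check_encoding($bytes, 'UTF-8'));

// Character-based cut, with the encoding passed explicitly instead of
// relying on the configured internal encoding.
$chars = mb_substr($str, 0, 33, 'UTF-8');
var_dump(mb_check_encoding($chars, 'UTF-8'));

// The character-based prefix is valid UTF-8, so it can be escaped safely.
echo htmlentities($chars . $etc, ENT_QUOTES, 'UTF-8'), "\n";
```

Note that the two calls count differently: substr() takes 33 bytes, while mb_substr() takes 33 characters, so the second prefix is longer in bytes than the first.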