Edit report at http://bugs.php.net/bug.php?id=52810&edit=1
 ID:                 52810
 Updated by:         cataphr...@php.net
 Reported by:        trane at gol dot com
 Summary:            substr() and $string[n] corrupt multi-byte UTF-8 strings
-Status:             Open
+Status:             Bogus
 Type:               Bug
 Package:            Strings related
 Operating System:   OS X 10.6.4
 PHP Version:        Irrelevant
 Block user comment: N

 New Comment:

This is not a bug. substr and $str[n] or $str{n} treat the string as a
byte array. If you want to get the n-th Unicode code point, use
mb_substr.

Previous Comments:
------------------------------------------------------------------------
[2010-09-10 12:46:44] trane at gol dot com

Description:
------------
(PHP 5.3.2 (cli) (built: Aug 7 2010 00:04:41)
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies)

When trying to extract a single character from a UTF-8-encoded Japanese
string, instead of the expected character, one gets the dreaded
black-diamond-question-mark-of-death.

Test script:
---------------
$s_string = "静岡は蒸し暑いです。";

echo $s_string[3], "<p />";
// expected output is 蒸
// actual output is �

print_r($s_string[3]);
// expected output is 蒸
// actual output is �

echo "<p />";
$sub = substr($s_string, 3, 1);
echo $sub, "<p />";
// expected output is 蒸
// actual output is �

Expected result:
----------------
Expected output is 蒸

Actual result:
--------------
Actual output is �

------------------------------------------------------------------------
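
For illustration, the byte-versus-code-point distinction described in the comment above can be sketched as follows. This is a minimal sketch, not part of the original report. It assumes the source file is saved as UTF-8, that the mbstring extension is loaded, and that the test string is the ten-character Japanese sentence below, in which every character occupies three bytes. The variable names are hypothetical.

```php
<?php
// Assumed test string: ten characters, three bytes each in UTF-8.
$s = "静岡は蒸し暑いです。";

// Byte-oriented functions count and index bytes.
$byteLen = strlen($s);        // number of bytes
$rawByte = substr($s, 3, 1);  // byte at offset 3: first byte of the 2nd character
$indexed = $s[3];             // same single byte as substr($s, 3, 1)

// Multibyte-aware functions count and index characters (needs mbstring).
$charLen = mb_strlen($s, 'UTF-8');
$fourth  = mb_substr($s, 3, 1, 'UTF-8');

echo "bytes: $byteLen, characters: $charLen\n";
echo "byte at offset 3: 0x", bin2hex($rawByte), "\n";
echo "character at index 3: $fourth\n";
```

A lone lead byte such as 0xE5 is not valid UTF-8 on its own, which is presumably why the browser renders it as the replacement character. Passing the encoding explicitly to the mb_* functions avoids depending on the configured internal encoding.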