Edit report at http://bugs.php.net/bug.php?id=52810&edit=1

 ID:                 52810
 Updated by:         cataphr...@php.net
 Reported by:        trane at gol dot com
 Summary:            substr() and $string[n] corrupt multi-byte UTF-8
                     strings
-Status:             Open
+Status:             Bogus
 Type:               Bug
 Package:            Strings related
 Operating System:   OS X 10.6.4
 PHP Version:        Irrelevant
 Block user comment: N

 New Comment:

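This is not a bug.

substr() and $str[n] (or $str{n}) treat the string as an array of bytes,
not characters. In UTF-8 each of these Japanese characters takes three
bytes, so offset 3 is the first byte of 岡, not the character 蒸. A lone
byte like that is not valid UTF-8, which is why it is displayed as �.
If you want the n-th Unicode code point, use mb_substr() and pass the
encoding explicitly, e.g. mb_substr($s_string, 3, 1, 'UTF-8').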
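
A minimal sketch of the difference, using the string from the report. It
assumes the mbstring extension is enabled and that the script file itself
is saved as UTF-8; the variable names other than $s_string are
illustrative only.

```php
<?php
// Compare byte-oriented and code-point-oriented access on the reported
// string. Assumes: mbstring is enabled and this file is saved as UTF-8.
$s_string = "静岡は蒸し暑いです。";

// strlen() counts bytes; each of these 10 characters is 3 bytes in UTF-8.
$byteLength = strlen($s_string);             // 30
$charLength = mb_strlen($s_string, 'UTF-8'); // 10

// $s_string[3] is the 4th *byte*: the lead byte of 岡 (U+5CA1 = E5 B2 A1).
// On its own it is an incomplete UTF-8 sequence, hence the � in a browser.
$byte = $s_string[3];
$byteHex = bin2hex($byte);                        // "e5"
$byteIsValid = mb_check_encoding($byte, 'UTF-8'); // false

// mb_substr() counts code points, so offset 3 really is 蒸.
// The encoding is passed explicitly instead of relying on the configured
// mbstring internal encoding.
$char = mb_substr($s_string, 3, 1, 'UTF-8');

echo "bytes: $byteLength, characters: $charLength\n";
echo "byte 3: 0x$byteHex (valid UTF-8: " . var_export($byteIsValid, true) . ")\n";
echo "character 3: $char\n";
```

Running it should report 30 bytes and 10 characters, show byte 3 as 0xe5
(not valid UTF-8 by itself), and print 蒸 for character 3.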


Previous Comments:
------------------------------------------------------------------------
[2010-09-10 12:46:44] trane at gol dot com

Description:
------------
(PHP 5.3.2 (cli) (built: Aug  7 2010 00:04:41)
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies)

When trying to extract a single character from a UTF-8-encoded Japanese
string, one gets the dreaded black-diamond-question-mark-of-death (the
� replacement character) instead of the expected character.

Test script:
---------------
<?php
$s_string = "静岡は蒸し暑いです。";

echo $s_string[3], "<p />";
// expected output is 蒸
// actual output is �

print_r($s_string[3]);
// expected output is 蒸
// actual output is �

echo "<p />";

$sub = substr($s_string, 3, 1);
echo $sub, "<p />";
// expected output is 蒸
// actual output is �

Expected result:
----------------
Expected output is 蒸

Actual result:
--------------
Actual output is �

------------------------------------------------------------------------