Edit report at https://bugs.php.net/bug.php?id=63079&edit=1

 ID:                 63079
 Updated by:         larue...@php.net
 Reported by:        astatutov at gmail dot com
 Summary:            String access by character is not multibyte-safe
 Status:             Open
 Type:               Bug
 Package:            Strings related
 PHP Version:        Irrelevant
 Block user comment: N
 Private report:     N

 New Comment:

Yes, it's not: string offsets ($str[n]) address bytes, not characters.
You should use the mb_* functions to deal with multi-byte characters.
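
A minimal sketch of the mb_* approach, assuming the mbstring extension is
loaded and the script file is saved as UTF-8. The helper name char_at is
hypothetical (it is not part of PHP or mbstring); it only wraps mb_substr to
fetch one character by position instead of one byte.

```php
<?php
// Hypothetical helper, not a PHP built-in: returns the character at
// position $index, assuming $s holds valid UTF-8.
function char_at($s, $index)
{
    // mb_substr counts characters (not bytes) when given an encoding.
    return mb_substr($s, $index, 1, 'UTF-8');
}

$str = "Kąt";

// The bracket operator returns a single byte: the first byte of "ą".
echo bin2hex($str[1]), "\n";          // c4

// mb_substr returns the whole two-byte character.
echo char_at($str, 1), "\n";          // ą

// Byte length versus character length.
echo strlen($str), "\n";              // 4 (without func_overload)
echo mb_strlen($str, 'UTF-8'), "\n";  // 3
```

Passing the encoding explicitly keeps the result independent of the
mbstring.internal_encoding setting.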


Previous Comments:
------------------------------------------------------------------------
[2012-09-13 09:58:55] astatutov at gmail dot com

Description:
------------
I know there is a section named "Details of the String Type" in the
documentation. But there is another section that states "Think of a string as
an array of characters for this purpose". It is very convenient to think so. We
use the mbstring extension to work entirely in UTF-8, and the
mbstring.func_overload option lets us almost forget about the differences
between regular and multibyte strings. We just write our application, thinking
about its native logic, not PHP's internal logic. This is a high-level
programming language, after all. We use strlen, substr, etc. as we would with
regular strings. And BANG! The string bracket operator returns bytes, not
characters!

I think this is unpredictable behavior, even if it's well documented (but it's
not). Considering that the use of UTF-8 is growing everywhere and maybe even
PHP 6 will support it by default, why not implement multibyte support for
bracket operations in the mbstring extension now? Of course, it must be
configurable to stay backward compatible. I know we can use substr as a
replacement for the string access operation, but it's very slow and it's wrong
in general.

I also know this is not the first bug on this subject. There was #51919, for
example, which was closed and marked as not a bug. But I propose looking at
this problem from the point of view of the language logic, not the
implementation.

Sorry if I've missed something else.

Test script:
---------------
<?php
$str = "Kąt";   // "ą" is two bytes in UTF-8 (0xC4 0x85)
echo $str[1];   // outputs only the first of those bytes

Expected result:
----------------
ą

Actual result:
--------------
�


------------------------------------------------------------------------


