Edit report at https://bugs.php.net/bug.php?id=63079&edit=1

 ID:                 63079
 Comment by:         Matti dot jarvinen at nitroid dot fi
 Reported by:        astatutov at gmail dot com
 Summary:            String access by character is not multibyte-safe
 Status:             Open
 Type:               Bug
 Package:            Strings related
 PHP Version:        Irrelevant
 Block user comment: N
 Private report:     N

 New Comment:

Under "String access and modification by character" at 
http://php.net/manual/en/language.types.string.php there is no mention that 
the [] syntax is not multibyte-safe.

At least make this a documentation issue.


Previous Comments:
------------------------------------------------------------------------
[2012-09-13 19:27:35] astatutov at gmail dot com

*mbstring* just determines which module reads this option. It doesn't say 
which module the option affects. For example, mbstring.func_overload affects 
the whole of PHP, because it overrides native functions, and 
mbstring.http_input changes the default PHP behavior when reading an HTTP 
request. So why can't mbstring.func_overload or, say, mbstring.op_overload 
override the string access operation?

------------------------------------------------------------------------
[2012-09-13 14:12:32] larue...@php.net

As the option name itself says, it is *mbstring*.internal_encoding, not 
php.internal_encoding...

------------------------------------------------------------------------
[2012-09-13 11:57:39] astatutov at gmail dot com

> you should use mb_* to deal with multi-byte characters

I know. I mentioned it in the description. The mbstring.func_overload option 
does that for me. But the bracket operator is still unusable: the 
documentation states that it accesses a character, while in fact it accesses 
a byte. And I believe this is not a documentation problem. Every modern 
language I know that can work with utf-8 does so transparently for the 
developer. The aim of mbstring is the same, isn't it? A developer who sets 
mbstring.internal_encoding to utf-8 will expect the INTERNAL string access 
operator to support it. This is what the term "predictable behavior" means.

------------------------------------------------------------------------
[2012-09-13 10:48:25] larue...@php.net

yeah, it's not.
you should use mb_* to deal with multi-byte characters
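
A minimal sketch of that advice, assuming the mbstring extension is loaded 
and the string is valid UTF-8. The helper name char_at() is hypothetical and 
is not part of PHP or mbstring; it only wraps mb_substr().

```php
<?php
// Hypothetical helper: return the character (not the byte) at $index.
// Assumes mbstring is loaded and $s is valid UTF-8.
function char_at(string $s, int $index): string
{
    return mb_substr($s, $index, 1, 'UTF-8');
}

$str = "Kąt";
echo char_at($str, 1), "\n";         // ą (character at index 1)
echo strlen($str), "\n";             // 4 (bytes, with mbstring.func_overload off)
echo mb_strlen($str, 'UTF-8'), "\n"; // 3 (characters)
```

Passing the encoding explicitly avoids depending on the 
mbstring.internal_encoding setting discussed in this report.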

------------------------------------------------------------------------
[2012-09-13 09:58:55] astatutov at gmail dot com

Description:
------------
I know there is a section named "Details of the String Type" in the 
documentation. But there is also another section that states "Think of a 
string as an array of characters for this purpose". It is very convenient to 
think so. We use the mbstring extension to work entirely in utf-8, and the 
mbstring.func_overload option allows us to almost forget about the 
differences between regular and multibyte strings. We just write our 
application, thinking about its native logic, not PHP's internal logic. This 
is a high-level programming language, after all. We use strlen, substr, etc. 
just as we do with regular strings. And BANG! The string bracket operator 
returns bytes, not characters! 

I think this is unpredictable behavior, even if it were well documented (and 
it is not). Considering that the use of utf-8 is growing everywhere and that 
maybe even PHP 6 will support it by default, why not implement multibyte 
support for bracket operations in the mbstring extension now? Of course, it 
must be configurable to stay backward compatible. I know we can use substr 
as a replacement for the string access operation, but it's very slow and 
it's wrong in general.

I also know this is not the first bug on this subject. There was #51919, for 
example, which was closed and marked as not a bug. But I propose looking at 
this problem from the point of view of the language logic, not the 
implementation.

Sorry if I've missed something else. 

Test script:
---------------
$str = "Kąt";
echo $str[1];

Expected result:
----------------
ą

Actual result:
--------------
�
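
The garbled output can be inspected byte by byte. The following is a sketch, 
assuming the script file is saved as UTF-8, in which "ą" (U+0105) is encoded 
as the two bytes C4 85. It uses preg_split() with the /u modifier as one 
possible way to get a per-character array; that is an illustration, not what 
the bracket operator does.

```php
<?php
$str = "Kąt";

echo bin2hex($str), "\n";              // 4bc48574: K, then two bytes for ą, then t
echo bin2hex($str[1]), "\n";           // c4: only the first byte of ą
echo bin2hex($str[1] . $str[2]), "\n"; // c485: both bytes together form ą

// Split into UTF-8 characters so that index 1 is the whole character.
$chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
echo $chars[1], "\n";                  // ą
```

A lone C4 byte is not valid UTF-8, which is why a terminal or browser shows 
the replacement character for $str[1].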


------------------------------------------------------------------------


