On 26 March 2024 17:04:18 GMT, Casper Langemeijer <[email protected]> wrote:
>I'd like to address an issue I have with this RFC.
Please don't top reply.
>I'm not sure it solves a problem by itself. If I understand all of this
>correctly this only does what already can be accomplished with
>preg_match_all('/\X/u', ...). The result of this method in my opinion is not
>very useful by itself. I've done some searching on various code platforms
>where I mostly find the use-case for counting the number of graphemes. I've
>used it to implement a strrev() that works correctly on multibyte strings.
>
>I'm very sad that mbstring works on codepoints instead of graphemes and I
>would very much like to see something happening in that area, but I think
>expanding a simple string to an array of as many elements to give developers a
>tool to do this in PHP-space is not good enough. Especially since it can
>already be achieved with a regexp that already works.
>
>In my opinion: This adds nothing, and tells the PHP developer that it is OK to
>do count(grapheme_str_split()) for a more accurate mb_strlen().
>
>I would like to see a family of functions that can do multibyte str_split(),
>strrev(), substr(). Ideally as bugfix in mb_* functions, because the edge-case
>of wanting to know the length in codepoints of a string is a weird edge-case.
>No developer wants to know that. mb_strlen() should have returned the number
>of graphemes from the start.
Many of these already exist, such as grapheme_substr. We can't simply change
the behaviour of the already existing functions due to BC reasons.
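As a rough sketch of what is already available (this assumes the intl and
mbstring extensions are loaded; the sample string and variable names are only
for illustration):

```php
<?php
// "e" followed by U+0301 COMBINING ACUTE ACCENT, then "a":
// three code points, but only two graphemes.
$s = "e\u{0301}a";

$bytes      = strlen($s);              // counts bytes
$codepoints = mb_strlen($s, 'UTF-8');  // counts code points
$graphemes  = grapheme_strlen($s);     // counts grapheme clusters

// grapheme_substr() takes its offset and length in grapheme units,
// so the combining accent stays attached to its base letter.
$first = grapheme_substr($s, 0, 1);

// The \X approach from the quoted mail, used for a grapheme-aware reverse.
preg_match_all('/\X/u', $s, $m);
$reversed = implode('', array_reverse($m[0]));

var_dump($bytes, $codepoints, $graphemes);
var_dump($first === "e\u{0301}");
var_dump($reversed === "ae\u{0301}");
```

The three length functions disagree on this input, which is the distinction
between bytes, code points and graphemes being discussed here.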
The intl extension is also built on ICU, an actual Unicode text processing
library.
The grapheme_str_split function, like the other intl extension functions, is
what should really replace mbstring.
cheers
Derick