Re: [PHP-DEV][RFC] grapheme cluster for str_split, grapheme_str_split function

Derick Rethans Tue, 26 Mar 2024 10:16:23 -0700

On 26 March 2024 17:04:18 GMT, Casper Langemeijer <[email protected]> wrote:
>I'd like to address an issue I have with this RFC.


Please don't top reply. 

>I'm not sure is solves a problem by itself. If I understand all of this 
>correctly this only does what already can be accomplished with 
>preg_match_all('/\X/u', ...). The result of this method in my opinion is not 
>very usefull by itself. I've done some searching on various code platforms 
>where I mostly find the use-case for counting the number of grapheme's. I've 
>used it to implement strrev() that correctly works multibyte. 
>
>I'm very sad that mbstring works on codepoints instead of grapheme's and I 
>would very much like to see something happening in that area, but I think 
>expanding a simple string to an array of as many elements to give developers a 
>tool to do this in PHP-space is not good enough. Especially since it can 
>already be achieved with a regexp that already works.
>
>In my opinion: This adds nothing, and tells the PHP developer that is ok to do 
>count(grapheme_str_split()) for a more accurate mb_strlen().
>
>I would like to see a family of functions that can do multibyte str_split(), 
>strrev(), substr(). Ideally as bugfix in mb_* functions, because the edge-case 
>of wanting to know the length in codepoints of a string is a weird edge-case. 
>No developer wants to know that. mb_strlen() should have returned the number 
>of graphemes from the start.

Many of these already exist, such as grapheme_substr. We can't simply change 
the behaviour of the already existing functions due to BC reasons. 

The intl extension is also built on ICU, an actual unicode text processing 
library. 

The grapheme_str_split function, as well as other intl extension functions is 
what should replace mbstring really. 

cheers 
Derick

Re: [PHP-DEV][RFC] grapheme cluster for str_split, grapheme_str_split function

Reply via email to