>
> Hi, Niels
>
> Thank you for your comment.
> Indeed, returning false makes sense.
>
> Therefore, I changed it to return false for invalid UTF-8 strings.
>
> Regards
> Yuya
>
> --
> ---------------------------
> Yuya Hamada (tekimen)
> - https://tekitoh-memdhoi.info
> - https://github.com/youkidearitai
> -----------------------------
Sorry, again.
I checked the behavior of the mb_str_split function: illegal byte sequences
are returned as-is.
```
sapi/cli/php -r 'var_dump(mb_str_split("あ\xc2\xf4\x80あ"));'
array(4) {
[0]=>
string(3) "あ"
[1]=>
string(2) "��"
[2]=>
string(1) "�"
[3]=>
string(3) "あ"
}
```
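The output above can be reproduced by a hypothetical, simplified model in which each chunk's length is read from its first byte alone, without validating the continuation bytes that follow. \xc2 announces a 2-byte sequence, so it swallows \xf4, and the leftover \x80 becomes its own chunk. The lead-byte table below is an assumption for illustration. It is not copied from mbstring, whose real table may treat bytes such as 0xC0, 0xC1 and 0xF5 to 0xFF differently.

```python
def lead_byte_length(b: int) -> int:
    """Sequence length implied by a UTF-8 lead byte (simplified, hypothetical table)."""
    if b < 0xC0:   # ASCII or stray continuation byte: one byte on its own
        return 1
    if b < 0xE0:   # 110xxxxx: two-byte sequence
        return 2
    if b < 0xF0:   # 1110xxxx: three-byte sequence
        return 3
    return 4       # 11110xxx (and anything above, in this simplified model)


def split_by_lead_byte(data: bytes) -> list[bytes]:
    """Split bytes into chunks whose size comes from each chunk's first byte,
    without checking that the following bytes are valid continuation bytes."""
    chunks = []
    i = 0
    while i < len(data):
        n = lead_byte_length(data[i])
        chunks.append(data[i:i + n])  # slicing clamps at the end of data
        i += n
    return chunks


sample = "あ".encode("utf-8") + b"\xc2\xf4\x80" + "あ".encode("utf-8")
print(split_by_lead_byte(sample))
# [b'\xe3\x81\x82', b'\xc2\xf4', b'\x80', b'\xe3\x81\x82']
```

This gives the same 3/2/1/3-byte pieces as the var_dump output above.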
I also read the ICU documentation for utext_openUTF8 (link below):
https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/utext_8h.html#a130e7cba201c4b38799b432eb269f6d5
> Any invalid UTF-8 in the input will be handled in this way: a sequence of
> bytes that has the form of a truncated, but otherwise valid, UTF-8 sequence
> will be replaced by a single unicode replacement character, \uFFFD. Any other
> illegal bytes will each be replaced by a \uFFFD.
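The replacement policy quoted above can be tried out with Python's UTF-8 decoder and errors="replace". As far as I know, that decoder follows the same "maximal subpart" practice that Unicode recommends. This is an assumption: ICU itself was not run here, so treat this as a sketch of the described rule, not of ICU's exact output. In the sample, \xc2 is a truncated 2-byte sequence and \xf4\x80 is a truncated 4-byte sequence, so each becomes a single U+FFFD. A lone \x80 is an "other illegal byte" and becomes one U+FFFD per byte.

```python
sample = "あ".encode("utf-8") + b"\xc2\xf4\x80" + "あ".encode("utf-8")

# errors="replace" substitutes U+FFFD for each maximal ill-formed subsequence
decoded = sample.decode("utf-8", errors="replace")
print([hex(ord(c)) for c in decoded])
# ['0x3042', '0xfffd', '0xfffd', '0x3042']

# Truncated but otherwise valid sequence -> a single U+FFFD
print(b"\xe3\x81".decode("utf-8", errors="replace") == "\ufffd")      # True

# Other illegal bytes -> one U+FFFD each
print(b"\x80\x80".decode("utf-8", errors="replace") == "\ufffd\ufffd")  # True
```

Under this policy the sample also yields four code points, although the invalid bytes are grouped as \xc2 | \xf4\x80 rather than the \xc2\xf4 | \x80 split shown by mb_str_split above.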
Therefore, I think the encoding check is not needed.
The function should always return an array, consistent with mb_str_split.
Regards
Yuya