On Tue, Jul 15, 2025 at 16:05, youkidearitai <youkideari...@gmail.com> wrote:
>
> On Mon, Jul 14, 2025 at 19:22, Derick Rethans <der...@php.net> wrote:
> >
> > On Wed, 9 Jul 2025, youkidearitai wrote:
> >
> > > Hi, Internals
> > >
> > > I have updated the RFC below.
> > > - https://wiki.php.net/rfc/grapheme_add_locale_for_case_insensitive
> > > The pull request is below:
> > > - https://github.com/php/php-src/pull/18792
> > >
> > > The changes are as follows:
> > > - Add a strength parameter to the grapheme_* functions
> > > - This affects characters from all over the world, e.g. Ideographic
> > >   Variation Sequences (IVS)
> > > - Use the Collator class constant values.
> >
> > These settings are indeed important for these functions, but I can't get
> > around the fact that it makes these APIs really cluttered and
> > complicated, something that many functions in the grapheme_ / intl
> > extension already suffer from.
> >
> > Is this API really the best way?
> >
> > > The $locale parameter is not changed, because I could not find any
> > > other way.
> >
> > It seems that I came to a similar conclusion, but locales are much more
> > complicated than just languageCode_regionCode (for example, see
> > https://github.com/derickr/php-text/blob/main/tests/text-contains.phpt#L25)
> >
> > You also don't really need a strength argument, as you can 'encode' that
> > in the locale name, like: 'nb_NO-u-ks-primary'. I know, it's rather ugly
> > and the list of options is vast:
> > https://www.unicode.org/reports/tr35/tr35-collation.html#Common_Settings
> >
> > cheers,
> > Derick
>
> Hi, Derick
>
> Thank you very much for your response.
>
> > Is this API really the best way?
>
> I reconsidered the function signature based on what you said.
>
> > It seems that I came to a similar conclusion, but locales are much more
> > complicated than just languageCode_regionCode (for example, see
> > https://github.com/derickr/php-text/blob/main/tests/text-contains.phpt#L25)
> >
> > You also don't really need a strength argument, as you can 'encode' that
> > in the locale name, like: 'nb_NO-u-ks-primary'. I know, it's rather ugly
> > and the list of options is vast:
> > https://www.unicode.org/reports/tr35/tr35-collation.html#Common_Settings
>
> Indeed, since the strength can be specified in the locale,
> I thought it would be better to specify it there rather than
> as a separate strength parameter.
>
> For example, the grapheme_* functions can then detect differences in IVS.
> ```
> $ sapi/cli/php -r 'var_dump(grapheme_levenshtein("\u{908A}", "\u{908A}\u{E0101}", locale: "ja_JP-u-ks-identic"));'
> int(1)
> $ sapi/cli/php -r 'var_dump(grapheme_levenshtein("\u{908A}", "\u{908A}\u{E0101}"));'
> int(0)
> $ sapi/cli/php -r 'var_dump(grapheme_strpos("\u{908A}", "\u{908A}\u{E0101}"));'
> int(0)
> $ sapi/cli/php -r 'var_dump(grapheme_strpos("\u{908A}", "\u{908A}\u{E0101}", locale: "ja_JP-u-ks-identic"));'
> bool(false)
> ```
>
> Since ideographic characters also carry identities (e.g., in names), we
> would like to make IVS compatible with them.
> However, it should be simple, so we should compromise somewhere.
>
> Regards
> Yuya
>
>
> --
> ---------------------------
> Yuya Hamada (tekimen)
> - https://tekitoh-memdhoi.info
> - https://github.com/youkidearitai
> -----------------------------

Hi, Internals

I have revised this RFC:
https://wiki.php.net/rfc/grapheme_add_locale_for_case_insensitive

I believe I have done my best to address the complexity of Unicode.
I would like to move the RFC to the "Voting" phase.
If there are no objections, I would like to start voting this week.

Regards
Yuya

--
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------