On Mon, 14 Jul 2025 at 19:22, Derick Rethans <der...@php.net> wrote:
>
> On Wed, 9 Jul 2025, youkidearitai wrote:
>
> > Hi, Internals
> >
> > I changed below the RFC.
> > - https://wiki.php.net/rfc/grapheme_add_locale_for_case_insensitive
> > Pull request is below:
> > - https://github.com/php/php-src/pull/18792
> >
> > Change point is below:
> > - Add a strength for grapheme_* functions
> > - Affect to all over the world characters, ex: Ideographic Variation
> > Sequence(IVS)
> > - Use Collator object const values.
>
> These settings are indeed important for these functions, but I can't get
> around the fact that it makes these APIs really cluttered and
> complicated — something that many functions in the grapheme_ / intl
> extension already suffer from.
>
> Is this API really the best way?
>
> > $locale parameter is not change anything. Because I could not find any way.
>
> It seems that I came to a similar conclusion, but locales are much more
> complicated than just languageCode_regionCode (for example, see
> https://github.com/derickr/php-text/blob/main/tests/text-contains.phpt#L25)
>
> You also don't really need a strength argument, as you can 'encode' that
> in the locale name, like: 'nb_NO-u-ks-primary' (I know, it's rather ugly
> and the list of options is vast:
> https://www.unicode.org/reports/tr35/tr35-collation.html#Common_Settings
>
> cheers,
> Derick
Hi, Derick

Thank you very much for your response.

> Is this API really the best way?

I have reconsidered the function signature based on your comments.

> It seems that I came to a similar conclusion, but locales are much more
> complicated than just languageCode_regionCode (for example, see
> https://github.com/derickr/php-text/blob/main/tests/text-contains.phpt#L25)
>
> You also don't really need a strength argument, as you can 'encode' that
> in the locale name, like: 'nb_NO-u-ks-primary' (I know, it's rather ugly
> and the list of options is vast:
> https://www.unicode.org/reports/tr35/tr35-collation.html#Common_Settings

Indeed, since the strength can be encoded in the locale, I think it is better to specify it there rather than adding a separate strength parameter.

For example, with an identical-strength locale ("ja_JP-u-ks-identic") the grapheme_* functions detect the difference introduced by an IVS, while with the default settings they ignore it:

```
$ sapi/cli/php -r 'var_dump(grapheme_levenshtein("\u{908A}", "\u{908A}\u{E0101}", locale: "ja_JP-u-ks-identic"));'
int(1)
$ sapi/cli/php -r 'var_dump(grapheme_levenshtein("\u{908A}", "\u{908A}\u{E0101}"));'
int(0)
$ sapi/cli/php -r 'var_dump(grapheme_strpos("\u{908A}", "\u{908A}\u{E0101}"));'
int(0)
$ sapi/cli/php -r 'var_dump(grapheme_strpos("\u{908A}", "\u{908A}\u{E0101}", locale: "ja_JP-u-ks-identic"));'
bool(false)
```

Because ideographic variants can carry identity (for example, in personal names), we would like these functions to handle IVS correctly. However, the API should stay simple, so we need to compromise somewhere.

Regards
Yuya

--
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------
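Carrying the strength inside the locale name means the value lives in the `ks` key of the Unicode `-u-` extension (UTS #35). The pure-PHP sketch below only illustrates what information such a locale string carries. `locale_collation_strength()` is a hypothetical helper, not part of ext/intl or of the pull request, and accepting the long names (`primary`, `identical`, ...) alongside the UTS #35 values (`level1` to `level4`, `identic`) is an assumption. ICU can read these keywords itself when it opens a collator for the locale, so a real implementation would most likely leave the parsing to ICU.

```php
<?php
// Hypothetical helper (not in ext/intl, not in the RFC's pull request):
// returns the collation strength requested by the "ks" key of the Unicode
// "-u-" extension, or null when the locale does not set one, so the caller
// can fall back to the locale's default strength.
function locale_collation_strength(string $locale): ?string
{
    // UTS #35 values plus the long names (accepting both is an assumption).
    $map = [
        'level1'  => 'primary',    'primary'    => 'primary',
        'level2'  => 'secondary',  'secondary'  => 'secondary',
        'level3'  => 'tertiary',   'tertiary'   => 'tertiary',
        'level4'  => 'quaternary', 'quaternary' => 'quaternary',
        'identic' => 'identical',  'identical'  => 'identical',
    ];

    // Normalise separators and case: "ja_JP-u-ks-identic" -> ja, jp, u, ks, identic.
    $subtags = explode('-', strtolower(str_replace('_', '-', $locale)));

    $inUnicodeExt = false;
    foreach ($subtags as $i => $subtag) {
        if (strlen($subtag) === 1) {
            // A single-letter subtag starts an extension; only "u" is the Unicode one.
            $inUnicodeExt = ($subtag === 'u');
            continue;
        }
        if ($inUnicodeExt && $subtag === 'ks') {
            // The value follows the key; unknown values yield null.
            return $map[$subtags[$i + 1] ?? ''] ?? null;
        }
    }
    return null;
}

var_dump(locale_collation_strength('ja_JP-u-ks-identic')); // string(9) "identical"
var_dump(locale_collation_strength('nb_NO-u-ks-primary')); // string(7) "primary"
var_dump(locale_collation_strength('ja_JP'));              // NULL
```

Keeping the strength in the locale string means one `$locale` argument can select both the tailoring and the comparison level, which is the simplification discussed above.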