On Mon, Jul 14, 2025 at 19:22, Derick Rethans <der...@php.net> wrote:
>
> On Wed, 9 Jul 2025, youkidearitai wrote:
>
> > Hi, Internals
> >
> > I changed below the RFC.
> > - https://wiki.php.net/rfc/grapheme_add_locale_for_case_insensitive
> > Pull request is below:
> > - https://github.com/php/php-src/pull/18792
> >
> > Change point is below:
> > - Add a strength for grapheme_* functions
> >   - Affect to all over the world characters, ex: Ideographic Variation
> > Sequence(IVS)
> >   - Use Collator object const values.
>
> These settings are indeed important for these functions, but I can't get
> around the fact that it makes these APIs really cluttered and
> complicated — something that many functions in the grapheme_ / intl
> extension already suffer from.
>
> Is this API really the best way?
>
> > $locale parameter is not change anything. Because I could not find any way.
>
> It seems that I came to a similar conclusion, but locales are much more
> complicated than just languageCode_regionCode (for example, see
> https://github.com/derickr/php-text/blob/main/tests/text-contains.phpt#L25)
>
> You also don't really need a strength argument, as you can 'encode' that
> in the locale name, like: 'nb_NO-u-ks-primary' (I know, it's rather ugly
> and the list of options is vast:
> https://www.unicode.org/reports/tr35/tr35-collation.html#Common_Settings
>
> cheers,
> Derick

Hi, Derick

Thank you very much for your response.

> Is this API really the best way?

I reconsidered the function signature based on what you said.

> It seems that I came to a similar conclusion, but locales are much more
> complicated than just languageCode_regionCode (for example, see
> https://github.com/derickr/php-text/blob/main/tests/text-contains.phpt#L25)
>
> You also don't really need a strength argument, as you can 'encode' that
> in the locale name, like: 'nb_NO-u-ks-primary' (I know, it's rather ugly
> and the list of options is vast:
> https://www.unicode.org/reports/tr35/tr35-collation.html#Common_Settings

Indeed, since the strength can be encoded in the locale string,
I think it is better to specify it there rather than through a
separate strength parameter.

For example, with the strength set in the locale, the grapheme_* functions can detect IVS differences:
```
$ sapi/cli/php -r 'var_dump(grapheme_levenshtein("\u{908A}",
"\u{908A}\u{E0101}", locale: "ja_JP-u-ks-identic"));'
int(1)
$ sapi/cli/php -r 'var_dump(grapheme_levenshtein("\u{908A}",
"\u{908A}\u{E0101}"));'
int(0)
$ sapi/cli/php -r 'var_dump(grapheme_strpos("\u{908A}", "\u{908A}\u{E0101}"));'
int(0)
$ sapi/cli/php -r 'var_dump(grapheme_strpos("\u{908A}",
"\u{908A}\u{E0101}", locale: "ja_JP-u-ks-identic"));'
bool(false)
```
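
As a rough illustration of the same idea outside the proposed API, the
existing Collator class can show how the strength changes the comparison.
This is only a sketch: it assumes the intl extension is loaded, and the
locale-keyword form assumes that the installed ICU accepts "-u-ks-identic"
in the string passed to Collator. The variable names are illustrative.

```php
<?php
// Base ideograph and the same ideograph followed by a variation selector (IVS).
$base = "\u{908A}";
$ivs  = "\u{908A}\u{E0101}";

// Default strength for the locale.
$default = new Collator('ja_JP');

// Strength set explicitly through the Collator constant.
$explicit = new Collator('ja_JP');
$explicit->setStrength(Collator::IDENTICAL);

// Strength encoded in the locale string (assumption: ICU parses this keyword).
$encoded = new Collator('ja_JP-u-ks-identic');

// compare() returns 0 when the collator treats the strings as equal.
var_dump($default->compare($base, $ivs));
var_dump($explicit->compare($base, $ivs));
var_dump($encoded->compare($base, $ivs));
```

If the two IDENTICAL variants agree, that suggests the locale string alone
carries the strength and no extra parameter is needed.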

Since ideographic variants carry identity (for example, in personal names),
we would like these functions to be able to distinguish IVS.
However, the API should stay simple, so we will need to compromise somewhere.

Regards
Yuya


-- 
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------
