Hi Yuya,

I think this is a good idea. While spec compliance is generally desirable, DoS via unbounded grapheme clusters is a real threat, and it's reasonable for a language-level implementation to impose practical limits that the Unicode spec itself doesn't define. This kind of gap between a general-purpose spec and a concrete implementation is not unusual.
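
To make the idea of a practical limit concrete, here is a rough userland sketch of a per-cluster check. It is only an illustration under assumptions: the name `userland_grapheme_limit_codepoints` is hypothetical, it is not the proposed php-src implementation, and it relies on PCRE's `\X` (extended grapheme cluster) plus mbstring's `mb_strlen`. It also materializes every cluster in memory, so it shows the intended semantics rather than a DoS-safe implementation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland approximation, NOT the proposed php-src function.
 * Returns true when every grapheme cluster in $str has at most
 * $maxCodepoints code points; false otherwise, or when PCRE fails
 * (for example on invalid UTF-8).
 */
function userland_grapheme_limit_codepoints(string $str, int $maxCodepoints = 32): bool
{
    // \X matches one extended grapheme cluster; the /u flag enables UTF-8 mode.
    $count = preg_match_all('/\X/u', $str, $matches);
    if ($count === false) {
        // Assumption: treat a PCRE error as a failed validation.
        return false;
    }

    foreach ($matches[0] as $cluster) {
        // Count code points (not bytes) in this cluster.
        if (mb_strlen($cluster, 'UTF-8') > $maxCodepoints) {
            return false;
        }
    }

    return true;
}

// "e" plus one combining acute accent: 2 code points in the last cluster.
var_dump(userland_grapheme_limit_codepoints("cafe\u{0301}")); // bool(true)

// "a" plus 40 combining accents: a single cluster of 41 code points.
var_dump(userland_grapheme_limit_codepoints("a" . str_repeat("\u{0301}", 40))); // bool(false)
```

Returning a plain bool here is just one possible reading of the proposal; truncating or throwing would be other options.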

The default of 32 code points sounds sensible, given that natural-language grapheme clusters top out well below that. One minor note: it might help to clarify the intended behavior of `grapheme_limit_codepoints` a bit more, for instance whether it is meant as a validation check (returning false when a cluster exceeds the limit) or something else.

Regards,
Kentaro Takeda

On Mon, Feb 23, 2026 at 20:28, youkidearitai <[email protected]> wrote:

> Hi, Internals
>
> I noticed grapheme cluster is not limit code points in UAX#29.
> https://www.unicode.org/reports/tr29/
>
> And there is no limit code point in Unicode that confirmed in issue of ICU.
> https://unicode-org.atlassian.net/browse/ICU-23302
>
> So that means create many code points in 1 grapheme cluster,
> That is crash for program because computer resource is limited.
>
> For example, this code is 200MB but 1 grapheme cluster in emoji_bomb.txt
> ```
> php -r 'echo(mb_trim(str_repeat("\u{200d}\u{1f468}\u{200d}\u{1f466}\u{200d}\u{1f466}", 10000000), "\u{200d}"));' -d memory_limit=600M > emoji_bomb.txt
> ```
> (PLEASE BE CAREFUL OPEN IN emoji_bomb.txt BECAUSE MAYBE CRASH)
>
> So, I think we(php-src, programming language level) need to create new
> custom limit function.
> My idea is below:
>
> ```
> grapheme_limit_codepoints(string $str, integer $max_codepoints = 32): bool
> ```
>
> I don't have heavy opinion that $max_codepoints is 32.
> However, 32 code points is enough of grapheme cluster because
> human language max code points is maybe Hakṣhmalawarayaṁ(ཧ) in
> 9 code points.
>
> If need more than code points in grapheme cluster,
> Userland can to increase $max_codepoints.
>
> Please see also my speakerdeck.
>
> https://speakerdeck.com/youkidearitai/limit-of-code-point-for-grapheme-cluster
>
> What do you think about this idea?
>
> Regards
> Yuya
>
> --
> ---------------------------
> Yuya Hamada (tekimen)
> - https://tekitoh-memdhoi.info
> - https://github.com/youkidearitai
> -----------------------------
