Larry Wall larry-at-wall.org |Perl 6| wrote:
into *uint16 as long as they don't synthesize codepoints.  And we can
always resort to *uint32 and *int32 knowing that the Unicode consortium
isn't going to use the top bit any time in the foreseeable future.
(Unless, of course, they endorse something resembling NFG. :)

No, the roughly one million code points in the Unicode code space can produce an arbitrary number of unique grapheme clusters, since you can apply as many modifiers as you like to each base character. If you allow the same modifier more than once, the total is unbounded.

A small program, which ought to go into the test suite <g>, can generate >4G distinct grapheme clusters, one at a time. How many implementations will that break? If they want a fixed size, 64 bits should do for now. Also, if the spec doesn't list a minimum implementation limit as a requirement, *any* fixed-size implementation will be incorrect, even if untestable as such.
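For illustration, here is a rough sketch of such a generator in Python (not Perl 6, and not taken from any existing test suite). The names `BASE`, `MARKS`, `clusters`, and `count_up_to` are made up for this example. It assumes that one base letter plus any sequence of combining marks counts as a single grapheme cluster, and it sticks to two marks of the same canonical combining class so that no two sequences are canonically equivalent.

```python
import itertools
import unicodedata

# Assumption for this sketch: one base letter and two combining marks.
# U+0300 (grave) and U+0301 (acute) share combining class 230, so
# normalization never reorders them and every sequence stays distinct.
BASE = "x"
MARKS = ("\u0300", "\u0301")


def clusters():
    """Yield distinct grapheme clusters one at a time, shortest first, forever."""
    for length in itertools.count(0):
        for combo in itertools.product(MARKS, repeat=length):
            yield BASE + "".join(combo)


def count_up_to(max_marks):
    """Number of distinct clusters that carry at most max_marks marks."""
    return sum(len(MARKS) ** n for n in range(max_marks + 1))


if __name__ == "__main__":
    # Show the code points of the first few clusters.
    for cluster in itertools.islice(clusters(), 7):
        print(" ".join(f"U+{ord(ch):04X}" for ch in cluster))
    # With at most 32 marks per cluster the count already passes 2**32.
    print(count_up_to(32) > 2**32)
```

With only two marks, clusters of at most 32 marks already number 2**33 - 1, so an implementation that assigns each new cluster a 32-bit identifier would run out before this generator does.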

--John
