Moritz Lenz wrote:
t/spec/S02-builtin_data_types/unicode.t has tests like this:

# LATIN CAPITAL LETTER A, COMBINING GRAVE ACCENT
my Str $u = "\x[0041,0300]";
is $u.bytes, 3, 'combining À is three bytes as utf8';
is $u.codes, 2, 'combining À is two codes';
is $u.graphs, 1, 'combining À is one graph';

Which seems to imply that a Str remembers its codepoints, even if it is
in grapheme mode (because that's the default).

IMHO it's necessary to store the original assertion. Conversion to NFG should be lazy.

Is this correct? I don't really think that's sensible. I'd expect a
compiler to store strings in composed normalization (+ NFG), so $u.codes
would be 1.

If a string always stores only NFG, where can we store the result of a decomposition (NFD)?

It would also be very confusing if a developer who just reads a file, filters the lines, and writes them back ended up with output in a different normalization form.
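
To make the round-trip concern concrete, here is a minimal sketch in Python rather than Perl 6, since the .bytes/.codes/.graphs methods above come from the draft spec. It only assumes Python's standard unicodedata module. The helper counts() and the variable names are made up for illustration, and eager NFC normalization stands in for "store composed form only". It is not a statement about how any Perl 6 implementation behaves.

```python
import unicodedata

# LATIN CAPITAL LETTER A + COMBINING GRAVE ACCENT, as in the test above.
decomposed = "\u0041\u0300"

def counts(s):
    """Return (UTF-8 byte count, codepoint count) for a string (illustrative helper)."""
    return len(s.encode("utf-8")), len(s)

# What an implementation would hold if it composed eagerly on input.
composed = unicodedata.normalize("NFC", decomposed)

print(counts(decomposed))  # (3, 2)
print(counts(composed))    # (2, 1)

# Read bytes, normalize on decode, write back: the bytes no longer match.
original_bytes = decomposed.encode("utf-8")
written_bytes = unicodedata.normalize(
    "NFC", original_bytes.decode("utf-8")
).encode("utf-8")
print(original_bytes == written_bytes)  # False

# The decomposed form can be recomputed on demand, but only as NFD;
# the fact that the input was decomposed is not itself preserved.
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```

With eager composition the codepoint count drops from 2 to 1 and the UTF-8 output differs from the input, which is the surprise described above. Keeping the original codepoints and normalizing lazily would avoid it.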

Helmut Wollmersdorfer
