Moritz Lenz wrote:
> t/spec/S02-builtin_data_types/unicode.t has tests like this:
>
>     # LATIN CAPITAL LETTER A, COMBINING GRAVE ACCENT
>     my Str $u = "\x[0041,0300]";
>     is $u.bytes, 3, 'combining À is three bytes as utf8';
>     is $u.codes, 2, 'combining À is two codes';
>     is $u.graphs, 1, 'combining À is one graph';
>
> Which seems to imply that a Str remembers its codepoints, even if it is in grapheme mode (because that's the default).
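For readers who want to check the three numbers independently of any Perl 6 implementation, here is a minimal sketch in Python. It is only an illustration: the grapheme count is approximated by counting non-combining code points (an assumption that holds for this string, but is not full grapheme cluster segmentation), and the variable names are mine, not from the test file.

```python
import unicodedata

# LATIN CAPITAL LETTER A + COMBINING GRAVE ACCENT, as in the test above
u = "\u0041\u0300"

utf8_bytes = len(u.encode("utf-8"))   # length of the UTF-8 encoding
codes = len(u)                        # code points exactly as written

# Approximate grapheme count: code points that are not combining marks.
graphs = sum(1 for c in u if unicodedata.combining(c) == 0)

# Code points after canonical composition (NFC), for comparison.
composed_codes = len(unicodedata.normalize("NFC", u))

print(utf8_bytes, codes, graphs, composed_codes)  # prints: 3 2 1 1
```

The last value is the point of disagreement in this thread: a string stored only in composed form would report 1 code, not 2.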
IMHO it's necessary to store the original assertion. Conversion to NFG should be lazy.
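A rough sketch of what I mean by lazy conversion, written in Python only for illustration. `LazyStr` and its attributes are hypothetical names, NFC stands in for NFG here (NFG is a Perl 6 concept with no Python equivalent), and the grapheme count is again only an approximation.

```python
import unicodedata
from functools import cached_property


class LazyStr:
    """Hypothetical wrapper: keeps the code points as given, composes on demand."""

    def __init__(self, text):
        self.original = text  # code points exactly as they were supplied

    @cached_property
    def composed(self):
        # Computed on first access only, then cached on the instance.
        return unicodedata.normalize("NFC", self.original)

    @property
    def codes(self):
        # Answered from the original, so no normalization is triggered.
        return len(self.original)

    @property
    def graphs(self):
        # Approximation: count the non-combining code points of the composed form.
        return sum(1 for c in self.composed if unicodedata.combining(c) == 0)


s = LazyStr("\u0041\u0300")
print(s.codes, s.graphs)  # prints: 2 1
```

With this layout the original code points are never lost, and the composed form costs nothing until somebody asks for graphemes.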
> Is this correct? I don't really think that's sensible. I'd expect a compiler to store strings in composed normalization (+ NFG), so $u.codes would be 1.
If a string always stores only NFG, where can we store the result of a decomposition (NFD)?
Also, it would be very confusing if a developer who just reads a file, filters the lines, and writes them back got the result in a different normalization form.
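To make the round-trip concern concrete, here is a small Python sketch. `filter_lines` is a hypothetical helper, and the optional NFC step only simulates a string type that always stores composed text; it is not a claim about how any Perl 6 implementation behaves.

```python
import unicodedata


def filter_lines(lines, normalize=None):
    """Hypothetical filter: drop blank lines, optionally normalizing the rest."""
    kept = [ln for ln in lines if ln.strip()]
    if normalize:
        kept = [unicodedata.normalize(normalize, ln) for ln in kept]
    return kept


decomposed = "A\u0300\n"  # A + COMBINING GRAVE ACCENT
angstrom = "\u212b\n"     # ANGSTROM SIGN, a singleton that NFC replaces
lines = [decomposed, "\n", angstrom]

untouched = filter_lines(lines)                  # code points preserved
composed = filter_lines(lines, normalize="NFC")  # simulates composed-only storage

# The filtered output is byte-identical only if nothing was normalized.
print("".join(untouched).encode("utf-8") == (decomposed + angstrom).encode("utf-8"))
print("".join(composed).encode("utf-8") == (decomposed + angstrom).encode("utf-8"))

# Decomposing again does not always restore the input.
print(unicodedata.normalize("NFD", composed[0]) == decomposed)
print(unicodedata.normalize("NFD", composed[1]) == angstrom)
```

The four lines printed are True, False, True, False: the first input line can be recovered by decomposing again, but the ANGSTROM SIGN cannot, so the written file differs from the one that was read.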
Helmut Wollmersdorfer