On Mon, May 18, 2009 at 11:11:32AM +0200, Helmut Wollmersdorfer wrote:
> [1] Open questions:
>
> 1) Will graphemes have a unique charname?
>    e.g. GRAPHEME LATIN SMALL LETTER A WITH DOT BELOW AND DOT ABOVE

Yes, presumably that comes with the "normalization" part of NFG.  We're
not aiming for round-tripping of synthetic codepoints, just as NFC
doesn't do round-tripping of sequences that have precomposed codepoints.
We're really just extending the NFC notion a bit further to encompass
temporary precomposed codepoints.

> 2) Can I use Unicode property matching safely with graphemes?
>    If yes, who or what maintains the necessary tables?

Good question.  My assumption is that adding marks to a character
doesn't change its fundamental nature.  What needs to be provided
other than pass-through to the base character's properties?

> 3) Details of 'life-time', round-trip.

Which is a very interesting topic, with connections to type theory,
scope/domain management, and security issues (such as the possibility
of a DoS attack on the translation tables).

> 4) Should the definition of graphemes conform to Unicode Standard Annex
> #29 'grapheme clusters'?  Which level - legacy, extended or tailored?

No opinion, other than that we're aiming for the most modern formulation
that doesn't implicitly cede declarational control to something out
of the control of Perl 6 declarations.  (See locales for an example of
something Perl 6 ignores in the absence of an explicit declaration to
pay attention to them.)  So just guessing from the names without reading
the Annex in question, not legacy, but probably extended, with explicit
tailoring allowed by declaration.  (Unless extended has some dire
performance or policy consequences that would be contraindicative...)

So as long as we stay inside these fundamental Perl 6 design principles,
feel free to whack on the specs.

Larry
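
To make the "temporary precomposed codepoints" and property pass-through
ideas above concrete, here is a minimal sketch in Python (not Perl 6, and
not taken from the specs).  Everything in it is an assumption made for
illustration: the SyntheticTable class, the use of negative integers as
synthetic ids, and the simplified "base plus following marks" clustering,
which is much cruder than the grapheme cluster rules of Annex #29.

```python
import unicodedata


def clusters(text):
    """Split text into rough graphemes: a base character plus any
    following combining marks. Simplified; NOT the UAX #29 algorithm."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch          # attach mark to the preceding base
        else:
            out.append(ch)         # start a new cluster
    return out


class SyntheticTable:
    """Hypothetical table of temporary ids for graphemes that NFC
    cannot compose into a single codepoint."""

    def __init__(self):
        self._by_seq = {}          # NFC sequence -> synthetic id
        self._by_id = {}           # synthetic id -> NFC sequence

    def intern(self, cluster):
        nfc = unicodedata.normalize("NFC", cluster)
        if len(nfc) == 1:
            return ord(nfc)        # a real codepoint is enough
        if nfc not in self._by_seq:
            new_id = -(len(self._by_seq) + 1)   # assumed: negative ids
            self._by_seq[nfc] = new_id
            self._by_id[new_id] = nfc
        return self._by_seq[nfc]

    def sequence(self, grapheme_id):
        """Expand an id back to its NFC codepoint sequence."""
        if grapheme_id >= 0:
            return chr(grapheme_id)
        return self._by_id[grapheme_id]

    def category(self, grapheme_id):
        """Property pass-through: report the base character's category."""
        return unicodedata.category(self.sequence(grapheme_id)[0])


table = SyntheticTable()
# "a" + dot below + dot above has no precomposed form; "e" + acute does.
ids = [table.intern(c) for c in clusters("a\u0323\u0307e\u0301")]
```

In this sketch the first grapheme gets a synthetic id and the second
maps to an ordinary codepoint, and table.category() answers for both by
looking only at the base character.  Note that the table only ever grows,
which is exactly the kind of thing the DoS concern under question 3 is
about; a real design would have to bound or scope it somehow.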