Re: "Unicode in 'NFG' formation" ?

Larry Wall Mon, 18 May 2009 11:17:26 -0700

On Mon, May 18, 2009 at 11:11:32AM +0200, Helmut Wollmersdorfer wrote:
> [1] Open questions:
>
> 1) Will graphemes have a unique charname?
>    e.g. GRAPHEME LATIN SMALL LETTER A WITH DOT BELOW AND DOT ABOVE


Yes, presumably that comes with the "normalization" part of NFG.
We're not aiming for round-tripping of synthetic codepoints, just
as NFC doesn't do round-tripping of sequences that have precomposed
codepoints.  We're really just extending the NFC notion a bit further
to encompass temporary precomposed codepoints.
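
To make the "extended NFC" idea concrete, here is a minimal Python sketch.
It is an illustration under stated assumptions, not how any Perl 6
implementation does it: the SyntheticTable class, the use of negative
integers for synthetic ids, and the to_nfg helper are all hypothetical
names, and the clustering rule (a base plus any following characters with a
non-zero combining class) is a simplification rather than the UAX #29
algorithm.

```python
import unicodedata


class SyntheticTable:
    """Hypothetical table of temporary 'precomposed' ids (an assumption)."""

    def __init__(self):
        self._by_seq = {}   # NFC sequence -> synthetic id
        self._by_id = {}    # synthetic id -> NFC sequence

    def intern(self, seq):
        # Allocate a new negative id the first time a sequence is seen.
        if seq not in self._by_seq:
            new_id = -(len(self._by_seq) + 1)
            self._by_seq[seq] = new_id
            self._by_id[new_id] = seq
        return self._by_seq[seq]

    def expand(self, cp):
        # Real codepoints map to themselves; synthetic ids map to
        # their stored NFC sequence (not to the original input).
        return self._by_id[cp] if cp < 0 else chr(cp)


def to_nfg(text, table):
    """Return one integer per grapheme: a real codepoint or a synthetic id."""
    nfc = unicodedata.normalize("NFC", text)
    clusters = []
    for ch in nfc:
        # Simplified clustering: attach combining marks to the previous base.
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return [ord(c) if len(c) == 1 else table.intern(c) for c in clusters]
```

With this sketch, "e" plus COMBINING ACUTE ACCENT comes back as the single
real codepoint U+00E9, while "a" plus dot above plus dot below has no
precomposed form and gets one synthetic id. Expanding that id yields the
NFC-ordered sequence, not necessarily the input order, which is the lack of
round-tripping described above.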

> 2) Can I use Unicode property matching safely with graphemes?
>    If yes, who or what maintains the necessary tables?

Good question.  My assumption is that adding marks to a character
doesn't change its fundamental nature.  What needs to be provided
other than a pass-through to the base character's properties?
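
A pass-through of that kind could look roughly like the following Python
sketch. The function name grapheme_category is hypothetical, and taking the
first codepoint of the NFD form as "the base character" is an assumption
that holds for ordinary base-plus-marks graphemes but is not a full
treatment of every cluster type.

```python
import unicodedata


def grapheme_category(grapheme):
    """General category of a grapheme, taken from its base character.

    Assumption: the base is the first codepoint of the NFD decomposition.
    """
    base = unicodedata.normalize("NFD", grapheme)[0]
    return unicodedata.category(base)
```

Under this assumption, a lowercase "a" with dot below and dot above still
reports the category Ll, and the uppercase variant reports Lu, so a property
match sees the same answer it would for the bare letter.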

> 3) Details of 'life-time', round-trip.

Which is a very interesting topic, with connections to type theory,
scope/domain management, and security issues (such as the possibility
of a DoS attack on the translation tables).

> 4) Should the definition of graphemes conform to Unicode Standard Annex
> #29 'grapheme clusters'? Which level - legacy, extended or tailored?

No opinion, other than that we're aiming for the most modern
formulation that doesn't implicitly cede declarational control to
something out of the control of Perl 6 declarations.  (See locales for
an example of something Perl 6 ignores in the absence of an explicit
declaration to pay attention to them.)  So just guessing from the
names without reading the Annex in question, not legacy, but probably
extended, with explicit tailoring allowed by declaration.  (Unless
extended has some dire performance or policy consequences that would
be contraindicative...)

So as long as we stay inside these fundamental Perl 6 design
principles, feel free to whack on the specs.

Larry
