Tom Christiansen wrote:

Certainly it's perfectly well known amongst people who deal with
letters--including with the Unicode standard.

"Accent" does have a colloquial meaning that maps correctly,
but sadly that colloquial definition does not correspond to
the technical definition, so in being clear, you become less
accurate. There is, as far as I'm aware, no good middle
ground, here.

One doesn't *have* to make up play-words.  There's nothing wrong with the
correct terminology.  Calling a mark a mark is pretty darned simple.

Well, scientists are not always happy with Unicode terms, e.g. 'ideograph'
for Han characters, or 'Latin' for Roman scripts. But the terms should be
used as defined by the standard--as names/identifiers of properties.

Unicode has blocks for diacritic marks, and a Diacritic property for
testing whether something is one.  There are 1328 code points whose
canonical decompositions have both \p{Diacritic} and \pM in them,
946 code points that have only \pM but not \p{Diacritic}, and 197
that have \p{Diacritic} but not \pM.
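
A minimal Perl 5 sketch of that kind of tally (the exact counts depend on
the Unicode version your perl carries, so they may not match the figures
above):

    use strict;
    use warnings;
    use Unicode::Normalize qw(NFD);

    # Classify every code point by whether its canonical decomposition
    # contains Mark (\pM) and/or Diacritic (\p{Diacritic}) characters.
    my %tally;
    for my $cp (0 .. 0x10FFFF) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;    # skip surrogates
        my $nfd = NFD(chr $cp);
        my $mark      = $nfd =~ /\pM/           ? 1 : 0;
        my $diacritic = $nfd =~ /\p{Diacritic}/ ? 1 : 0;
        $tally{"mark=$mark diacritic=$diacritic"}++;
    }
    printf "%-22s %d\n", $_, $tally{$_} for sort keys %tally;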

If someone really uses Unicode, there is no way around deep knowledge of
the properties. Such code will use Unicode properties directly, and Perl 6
should therefore support all the properties; a sketch of what that direct
access looks like follows below.
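
As a point of reference, newer Perl 5 releases expose property lookup by
name through Unicode::UCD's charprop; a small sketch (property names are
the standard Unicode ones):

    use strict;
    use warnings;
    use Unicode::UCD qw(charprop);

    # Look up arbitrary properties by name for U+0301 COMBINING ACUTE ACCENT.
    for my $prop (qw(Name General_Category Script Diacritic)) {
        printf "%-18s %s\n", $prop, charprop(0x0301, $prop);
    }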

I still think resorting to talking about "accent marks" is a bad idea. I had somebody the other day thinking that "throwing out the accent marks"
meant deleting all characters whose code points were over 0x7F--and this
was a recent CompSci major, too.
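
The difference matters in code: deleting everything above 0x7F throws away
the base letters, while stripping marks means decomposing first and then
removing \pM (or only \p{Diacritic}, depending on what you are after). A
rough sketch:

    use strict;
    use warnings;
    use utf8;
    use Unicode::Normalize qw(NFD);

    my $word = "façade naïve";

    # The misreading: delete every code point above 0x7F -- the base
    # letters vanish along with their marks.
    (my $ascii_only = $word) =~ s/[^\x00-\x7F]//g;     # "faade nave"

    # Stripping marks: decompose canonically, then drop the \pM characters.
    (my $no_marks = NFD($word)) =~ s/\pM//g;           # "facade naive"

    print "$ascii_only\n$no_marks\n";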

I know this sort of person. They also believe that UTF-8 is a 2-byte encoding.

But that's nothing.  The more you look into it, the weirder it can get,
especially with collation and canonical equivalence, both of which really
require locale knowledge outside the charset itself.
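
Collation in particular cannot be read off the code points alone: the same
strings sort differently under different locale tailorings. A small sketch
with Unicode::Collate::Locale (locale names as shipped with that module):

    use strict;
    use warnings;
    use utf8;
    use open ':std', ':encoding(UTF-8)';
    use Unicode::Collate::Locale;

    my @words = qw(ad äb af);

    # German phonebook order treats a-umlaut like "ae"; Swedish sorts it
    # as a separate letter after "z".
    my $de = Unicode::Collate::Locale->new(locale => 'de__phonebook');
    my $sv = Unicode::Collate::Locale->new(locale => 'sv');

    print "de__phonebook: @{[ $de->sort(@words) ]}\n";   # ad äb af
    print "sv:            @{[ $sv->sort(@words) ]}\n";   # ad af äb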

Sure. The Perl 6 specs still need a lot of work on the Unicode part.

Helmut Wollmersdorfer
