On Apr 26, 2004, at 5:12 AM, Dan Sugalski wrote:

> At 9:34 PM -0400 4/25/04, Bryan C. Warnock wrote:
>
>> And what about codepoints that *are* glyphs and/but aren't graphemes?
>
> Where do we have those? (I'm getting tempted instead to just call them fred--it'll at least avoid some of this confusion...)

There shouldn't be any of those. At least under the usual definitions, a glyph is a graphic representation of a character (so different fonts define different glyphs to represent the same character), and a grapheme is a sequence of one or more characters which an ordinary user of the language would consider a single unit. [Note that this usage differs from what a linguist means by a "grapheme", so the Unicode standard currently uses the term "grapheme cluster" rather than "grapheme", to minimize confusion.]
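To make the character/grapheme distinction concrete, here is a rough Python sketch. The helper rough_clusters is a made-up name, and it only approximates grapheme clusters by attaching combining marks to the preceding base character; the real segmentation rules (Hangul, zero-width joiners, and so on) are more involved, so treat this as an illustration rather than a conforming implementation.

```python
import unicodedata

def rough_clusters(text):
    """Hypothetical helper: group each base character with the
    combining marks that follow it. Only an approximation of
    Unicode grapheme clusters."""
    clusters = []
    for ch in text:
        # combining() returns a nonzero class for marks such as U+0301
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# "résumé" spelled with "e" followed by COMBINING ACUTE ACCENT
word = "re\u0301sume\u0301"
print(len(word))                  # number of characters (code points)
print(len(rough_clusters(word)))  # number of user-perceived units
```

The string holds eight characters, but a reader would count six letters, which is the sense of "grapheme" used above.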


And further, the Unicode standard defines character (or abstract character) as picking out an "abstract meaning _or_ abstract shape", so a character for the "ff ligature" seems to be picking out something related to a visual representation, but it's actually not picking out a glyph (since that ligature looks different in different fonts).

(And ideally ligatures such as the above wouldn't be considered separate characters, but several standards treat them that way, and consequently Unicode includes them for backward compatibility with these standards. So for new usage they should be avoided, instead letting a rendering engine display a ligature glyph for a sequence of two "f" characters. But you'll still encounter them "in the wild".)
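As a small illustration of the compatibility status of that ligature, the following sketch uses Python's standard unicodedata module. It assumes nothing beyond the standard library; the variable names are just for the example.

```python
import unicodedata

ligature = "\ufb00"  # the single "ff ligature" character

# The character has its own name and a compatibility decomposition
# into two ordinary "f" characters.
print(unicodedata.name(ligature))
print(unicodedata.decomposition(ligature))

# Canonical normalization (NFC) leaves the ligature alone;
# compatibility normalization (NFKC) replaces it with "ff".
canonical = unicodedata.normalize("NFC", ligature)
plain = unicodedata.normalize("NFKC", ligature)
print(canonical == ligature, plain == "ff")
```

So text found "in the wild" with the ligature character can be folded to two "f" characters with a compatibility normalization, leaving the choice of ligature glyph to the renderer.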

(Also, I'm using the term "character" to match the Unicode standard's usage, but it's the same thing for which others are using the word "codepoint". But I'm avoiding the latter usage because it's got some problems: (1) a code point is a number which picks out an abstract character--there's a one-to-one mapping between the two, but they're different things; (2) a "code point" implies an assignment of numbers to abstract characters, and if you're thinking of an approach like the one Dan spelled out, then you need to say _which_ assignment of numbers to characters you're talking about at any given time; and (3) it's supposed to be "code point", not "codepoint".)
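Point (2) can be sketched in a few lines of Python: the same abstract character gets a different number depending on which assignment you consult. The three single-byte codecs below were picked only as examples of "other assignments"; they are not part of the approach Dan described.

```python
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# The number Unicode assigns to this character
print(hex(ord(ch)))

# The number other (single-byte) character sets assign to the
# same abstract character
numbers = {
    codec: ch.encode(codec)[0]
    for codec in ("latin-1", "mac_roman", "cp437")
}
for codec, number in numbers.items():
    print(codec, hex(number))
```

The character is the same in every case; only the number changes, so "the code point of this character" is meaningful only once you have said which assignment you mean.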

JEff


