Re: Dealing with different character map formats when mapping glyph indices to character codes

Hin-Tak Leung Tue, 23 May 2023 09:40:57 -0700

 On Tuesday, 23 May 2023, 17:19:46 BST, Craig White <gerzy...@gmail.com> wrote:


> I was looking into how freetype maps character codes to glyph indices, and 
> learned that there are many different formats the character map can be in, 
> not to mention the one-to-many and many-to-one mappings that Werner mentioned.
> Will it be necessary to implement the reverse mapping separately for every 
> cmap format?

Not sure why you need or want to implement it in FreeType. A glyph id is unique 
per glyph, but some glyphs are not mapped in any character encoding, e.g. in 
"symbol fonts with custom encoding vectors" <- there is even a name for such.

Perhaps it is best to STOP thinking about (Unicode) characters. Glyphs are 
shaped drawings with a glyph id. Some of them are, for example, ligatures 
("combo characters" like "ff", etc.), which correspond to two (Unicode) 
characters. And in Arabic, almost every character has 2 to 4 glyph shapes, 
called the isolated form and the init/medi/fini (initial, medial, final) forms.

I think I actually have a Python program which does the reverse map (for the 
purpose of dropping some glyphs in the many-to-one scenario): 
examples/cjk-multi-fix.py in my freetype-py fork 
(https://github.com/HinTak/freetype-py/; you might need to switch to the 
font-diag branch to see it if it is not the default branch).
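
For illustration only, here is a minimal sketch of such a reverse map. It is 
not the code from cjk-multi-fix.py. It assumes you already have an iterable of 
(character code, glyph id) pairs for the selected charmap, for instance from 
walking it with FT_Get_First_Char/FT_Get_Next_Char, which works the same way 
whatever the cmap format is. The function name and the sample pairs below are 
made up for the example.

```python
from collections import defaultdict


def reverse_cmap(pairs):
    """Invert (charcode, glyph_id) pairs into {glyph_id: [charcodes]}.

    Several character codes may share one glyph (many-to-one), so each
    glyph id maps to a sorted list of character codes.
    """
    glyph_to_chars = defaultdict(list)
    for charcode, glyph_id in pairs:
        # Glyph id 0 is the "missing glyph"; charmap iteration also
        # ends with a glyph id of 0, so skip it.
        if glyph_id == 0:
            continue
        glyph_to_chars[glyph_id].append(charcode)
    return {gid: sorted(codes) for gid, codes in glyph_to_chars.items()}


# Hypothetical sample data: U+0041 and U+0391 share glyph 36,
# U+0042 maps to glyph 37, and (0, 0) marks the end of iteration.
sample_pairs = [(0x41, 36), (0x391, 36), (0x42, 37), (0, 0)]
reverse = reverse_cmap(sample_pairs)
```

With freetype-py, I believe the pairs could come from face.get_chars() 
(treat that as an assumption and check your version). Glyph ids that never 
show up in the result are the unmapped ones, such as ligatures and the 
contextual Arabic forms mentioned above.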

The OpenType spec and font technology were designed to make lookup in the most 
frequently used direction (from character code to glyph id) fast and easy.
