Re: [Fonts]Automatic 'lang' determination

Keith Packard Sat, 29 Jun 2002 09:27:25 -0700


Around 9 o'clock on Jun 29, Jungshik Shin wrote:


> IMHO, most problems with Han Unification arise not from using a _single_
> font targeted at one of zh_TW/zh_CN/ja/ko to render a run of text in
> another but from mixing _multiple_ fonts (with _drastically different_
> design principle and other differences like baseline) to render a single
> run of text (say, 65% of characters drawn from one font, 25% from a second
> font, 7% from a third font, etc).

Yes, I agree -- this is true in Western languages as well, where the 
application selects a font covering only Latin-1 and attempts to display 
text requiring glyphs from Latin-2.  A "smart" application will locate an 
additional font to fill in the missing glyphs, and the result looks like a 
ransom note.

The hope is that proper language tags in the document can avoid this from 
the start by ensuring that the first font selected has the proper coverage 
for the entire block of text.

This goal is reflected in the design I outlined -- fonts are deemed 
"suitable" for a particular language when they cover a significant 
fraction of the codepoints commonly associated with that language.
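
As a rough illustration of that coverage test, here is a minimal Python
sketch.  The function names, the 0.9 threshold and the toy codepoint sets
are all assumptions made for the example, not the actual implementation.

```python
# Hypothetical sketch: names, threshold and data are illustrative assumptions.

def coverage(font_codepoints, lang_codepoints):
    """Fraction of a language's common codepoints that the font covers."""
    if not lang_codepoints:
        return 0.0
    return len(font_codepoints & lang_codepoints) / len(lang_codepoints)


def suitable_langs(font_codepoints, lang_sets, threshold=0.9):
    """Return the sorted language tags whose coverage meets the threshold."""
    return sorted(
        lang
        for lang, codepoints in lang_sets.items()
        if coverage(font_codepoints, codepoints) >= threshold
    )


# Toy font: printable ASCII plus the Latin-1 supplement, no Latin-2 letters.
latin1_font = set(range(0x20, 0x7F)) | set(range(0xA0, 0x100))

ascii_letters = set(range(0x41, 0x5B)) | set(range(0x61, 0x7B))
# Polish letters outside ASCII; only U+00F3 and U+00D3 fall within Latin-1.
polish_extra = set(map(ord,
    "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c"
    "\u0104\u0106\u0118\u0141\u0143\u00d3\u015a\u0179\u017b"))

lang_sets = {
    "en": ascii_letters,
    "pl": ascii_letters | polish_extra,
}

print(suitable_langs(latin1_font, lang_sets))
```

With this toy data the Latin-1 font qualifies for "en" but not for "pl",
so a request for Polish text would pick a different first font rather
than patching in glyphs from a second one.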

> Suppose there's a document tagged as zh_TW that explains how PRC government
> simplified Chinese characters to boost the literacy rate after WW II. If a
> Big5 font (that doesn't cover all characters in the doc) is selected
> instead of a GBK/GB18030 font (with the full coverage), simplified Han
> characters(not used in Taiwan but only used in PRC) in the doc have to be
> rendered with another font (most likely GB2312/GBK/GB18030 font).

A correct version of this document would tag individual sections of the
document with appropriate tags.  This way, the zh_TW sections could be
presented in a traditional Chinese font while the mainland portions are
displayed with simplified Chinese glyphs.

I don't know how prevalent language tagging is in office document formats, 
but it's certainly available in HTML.  It's the HTML case that started my 
journey into language tags.

>  I'm not sure what you meant by 'glyph forms are more likely
> simplified'. You might have misunderstood some aspects of Han Unification
> in Unicode/10646.  In Unicode, simplified forms of Chinese characters are
> NOT unified with corresponding traditional forms of Chinese characters.

You're right -- I didn't believe this to be the case.  I had heard that the
unified portion within the BMP does co-mingle simplified and traditional
forms, but that the non-BMP Han extensions provide separate codepoints for
each.

If even BMP codepoints are separate, then it should be possible to create 
a large set of codepoints which marks fonts as suitable for the display of 
simplified Chinese and which is distinct from the set of codepoints 
marking fonts as suitable for the display of traditional Chinese.  That 
would be nicer than my current kludge of marking any font suitable for 
traditional Chinese as unsuitable for simplified Chinese.
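
A minimal Python sketch of that idea follows.  The sample sets hold only a
handful of characters, and the names and the 0.75 threshold are assumptions
for illustration; real tables would be far larger.

```python
# Hypothetical sketch: tiny sample sets stand in for real per-language tables.

SHARED = {0x4EBA, 0x5C71, 0x5927}  # forms written the same way in both
SIMPLIFIED = SHARED | {0x56FD, 0x5B66, 0x4E66, 0x95E8}
TRADITIONAL = SHARED | {0x570B, 0x5B78, 0x66F8, 0x9580}


def classify(font_codepoints, threshold=0.75):
    """Tag a font using only the codepoints that distinguish each variant."""
    tags = []
    for tag, own, other in (
        ("zh_CN", SIMPLIFIED, TRADITIONAL),
        ("zh_TW", TRADITIONAL, SIMPLIFIED),
    ):
        marks = own - other  # codepoints found in this variant only
        if len(font_codepoints & marks) / len(marks) >= threshold:
            tags.append(tag)
    return tags


big5_like = set(TRADITIONAL)            # traditional forms only
gb2312_like = set(SIMPLIFIED)           # simplified forms only
gbk_like = SIMPLIFIED | TRADITIONAL     # covers both variants

print(classify(big5_like), classify(gb2312_like), classify(gbk_like))
```

Because each variant is tested against its own distinguishing set, a font
covering both sets of forms is tagged for both languages instead of being
excluded from one of them.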

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab


_______________________________________________
Fonts mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/fonts
