Around 9 o'clock on Jun 29, Jungshik Shin wrote:
> IMHO, most problems with Han Unification arise not from using a _single_
> font targeted at one of zh_TW/zh_CN/ja/ko to render a run of text in
> another but from mixing _multiple_ fonts (with _drastically different_
> design principle and other differences like baseline) to render a single
> run of text (say, 65% of characters drawn from one font, 25% from a second
> font, 7% from a third font, etc).

Yes, I agree -- this is true in Western languages as well, where the
application selects a font covering only Latin-1 and attempts to display
text requiring glyphs from Latin-2. A "smart" application will locate an
additional font to fill in the missing glyphs, and the result looks like a
ransom note. The hope is that proper language tags in the document can
avoid this from the start by ensuring that the first font selected has the
proper coverage for the entire block of text.

This goal is reflected in the design I outlined -- fonts are deemed
"suitable" for a particular language when they cover a significant
fraction of the codepoints commonly associated with that language.

> Suppose there's a document tagged as zh_TW that explains how PRC government
> simplified Chinese characters to boost the literacy rate after WW II. If a
> Big5 font (that doesn't cover all characters in the doc) is selected
> instead of a GBK/GB18030 font (with the full coverage), simplified Han
> characters(not used in Taiwan but only used in PRC) in the doc have to be
> rendered with another font (most likely GB2312/GBK/GB18030 font).

A correct version of this document would tag its individual sections with
the appropriate language. That way, the zh_TW sections could be presented
in a traditional Chinese font while the mainland portions are displayed
with simplified Chinese glyphs. I don't know how prevalent language
tagging is in office document formats, but it's certainly available in
HTML. It's the HTML case that started my journey into language tags.
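
To make the "suitable font" rule mentioned earlier concrete, here is a
minimal Python sketch, not the actual implementation. The function names,
the 0.9 threshold standing in for "a significant fraction", and the small
Polish letter set used as the Latin-2 example are all assumptions made
for illustration.

```python
# Hypothetical sketch: a font is "suitable" for a language when it covers
# a significant fraction of the codepoints associated with that language.
# The 0.9 threshold is an assumed stand-in for "significant fraction".

def coverage_fraction(font_codepoints, language_codepoints):
    """Return the fraction of the language's codepoints the font covers."""
    if not language_codepoints:
        return 0.0
    covered = font_codepoints & language_codepoints
    return len(covered) / len(language_codepoints)


def is_suitable(font_codepoints, language_codepoints, threshold=0.9):
    """True when the font covers at least `threshold` of the language set."""
    return coverage_fraction(font_codepoints, language_codepoints) >= threshold


def pick_font(fonts, language_codepoints, threshold=0.9):
    """Return the name of the first suitable font in `fonts`, or None.

    `fonts` is a list of (name, set_of_codepoints) pairs in preference order.
    """
    for name, codepoints in fonts:
        if is_suitable(codepoints, language_codepoints, threshold):
            return name
    return None


# Illustrative data only: the lowercase Polish alphabet (32 letters),
# which needs Latin-2 characters such as U+0105 and U+0142.
POLISH = set(map(ord, "abcdefghijklmnoprstuwyz")) | {
    0x0105, 0x0107, 0x0119, 0x0142, 0x0144, 0x00F3, 0x015B, 0x017A, 0x017C,
}

# A font limited to the Latin-1 range, and one that also has the Polish letters.
LATIN1_FONT = set(range(0x20, 0x100))
LATIN2_FONT = set(range(0x20, 0x7F)) | POLISH

FONTS = [("latin1-font", LATIN1_FONT), ("latin2-font", LATIN2_FONT)]

if __name__ == "__main__":
    print(coverage_fraction(LATIN1_FONT, POLISH))
    print(pick_font(FONTS, POLISH))
```

With a language tag available, the Latin-1-only font is rejected up front
instead of being chosen first and patched with glyphs from a second font.
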
> I'm not sure what you meant by 'glyph forms are more likely
> simplified'. You might have misunderstood some aspects of Han Unification
> in Unicode/10646. In Unicode, simplified forms of Chinese characters are
> NOT unified with corresponding traditional forms of Chinese characters.

You're right -- I didn't believe this to be the case. I had heard that
the unified portion within the BMP does commingle simplified and
traditional forms, but that the non-BMP Han extensions provide separate
codepoints for each. If even the BMP codepoints are separate, then it
should be possible to create a large set of codepoints marking fonts as
suitable for the display of simplified Chinese, distinct from the set of
codepoints suitable for the display of traditional Chinese. That would be
nicer than my current kludge of marking any font suitable for traditional
Chinese as unsuitable for simplified Chinese.

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab

_______________________________________________
Fonts mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/fonts