Re: [Fonts] Combining characters
On Sat, 6 Sep 2003, Anuradha Ratnaweera wrote: Let me put this in a simple point form using a hypothetical example: Now, if I want to render character 51 of X in place of the composite character 4001+4010, how should I proceed? Is there a way to map Unicode sequences to actual (physical) fonts? Preferably in the form: 4001,4010 - X,51 Your problem is not new and has been worked on for many years by a number of people, and today we have a few satisfactory solutions. You crossposted to a few lists I subscribe to. Although I've already answered you on the gtk-i18n list, here I'm gonna give you some URLs: http://www.microsoft.com/typography/specs/default.htm http://www.pango.org (and the source code of Pango available at http://cvs.gnome.org. Take a look at the files in pango/modules/indic and pango/modules/thai. You can also take a look at the ICU source code) http://graphite.sil.org http://developer.apple.com/fonts/ Jungshik ___ Fonts mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/fonts
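The many-to-one mapping asked about above is what shaping engines (Pango's indic/thai modules, ICU layout, OpenType GSUB) implement in general form. Purely as an illustration, here is a sketch in C of a table that maps a pair of code points to a single glyph index; the values 0x4001, 0x4010 and glyph 51 are the hypothetical ones from the question, not real font data.

```c
#include <stddef.h>
#include <assert.h>

/* One entry of a hypothetical pair-substitution table: a pair of
 * Unicode code points that should be replaced by a single glyph
 * from the physical font.  Real shaping tables (e.g. OpenType GSUB
 * ligature lookups) generalize this idea. */
typedef struct {
    unsigned a, b;   /* input code point pair */
    unsigned glyph;  /* glyph index in the font */
} PairSub;

static const PairSub subst_table[] = {
    { 0x4001, 0x4010, 51 },  /* the hypothetical example from the question */
};

/* Return the substitute glyph for the pair (a, b), or 0 if there is none. */
static unsigned lookup_pair(unsigned a, unsigned b)
{
    for (size_t i = 0; i < sizeof subst_table / sizeof subst_table[0]; i++)
        if (subst_table[i].a == a && subst_table[i].b == b)
            return subst_table[i].glyph;
    return 0;
}
```

A shaper would run such a lookup over the text before hitting the font's cmap, emitting glyph 51 wherever the pair occurs.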
[Fonts] Re: After-XTT's extension of the encoding field.
On Thu, 7 Aug 2003, Mike FABIAN wrote: Jungshik Shin [EMAIL PROTECTED] : On Sat, 2 Aug 2003, Chisato Yamauchi wrote: Have you seen CJK's *TYPICAL* fonts.dir of TrueType fonts? It is the following: Not many people would be fond of tweaking fonts.dir/scale files these days :-) It can be automatically generated. The /usr/sbin/fonts-config script on SuSE Linux generates such TTCap entries automatically into the fonts.dir if it detects that xtt is enabled in /etc/X11/XF86Config. That sounds nice. It'll certainly make things easier. However, it could make some people frustrated if it just overwrites the existing fonts.dir (I don't know whether fonts-config on SuSE Linux does that or not) that was 'hand-tweaked' to their satisfaction. In the past, I made it a rule to back up fonts.dir/fonts.scale after losing heavily customized fonts.dir/fonts.scale to an automated tool a couple of times. I agree that the old X fonts are broken beyond repair and we should move on to using fontconfig/Xft as much as possible. The old font system must be kept for backward compatibility, of course, but it is probably just a waste of effort to add more extensions to the X11 core font system. Much better said than mine. This is exactly what I meant, but apparently my choice of words was not that good. If I had thought that support for X11 core fonts needed to be removed _now_, I wouldn't have spent my time on the gb18030.2000-1 issue (XFree86 bug 441), let alone fixing bugs in CJK font encoding files for the freetype module last year. Jungshik
Re: [Fonts] Re: Problem of Xft2
On Sat, 9 Aug 2003, Jungshik Shin wrote: On Fri, 8 Aug 2003, Pablo Saratxaga wrote: That being said, it would be nice to have the ability to do user-configuration of glyph substitutions in gtk2; e.g. telling that when a given font is chosen, then characters in the range 0x00-0xff should be ignored, and taken from another font instead. The ASCII range of some CJK fonts is simply too ugly... or even buggy in some cases. That doesn't need to be that complex. Simply allowing a CSS-style font list is more than enough. That is, offering a UI for specifying an _ordered_ list of fonts (instead of just one font, generic or specific) should work well. For example, by putting a good Latin(-only) font, a Cyrillic(-only) font, and a Greek(-only) font before a CJK font followed by a generic font (e.g. Serif), you can get the best of all fonts. This UI needs to be a part of the system-wide 'control panel'. I have to correct myself. This does not work well when font selection is done in tandem with 'lang' ('lang' given a very large weight) and _without_ actually going through a run of text to render, which is often the case. What you described may be necessary in the following scenario. Suppose we specify 'Courier, MingLiu' for a block of text marked as 'zh-TW'. Because Latin letters in CJK fonts are not so good, we specify 'Courier' before 'MingLiu', expecting Latin letters to be rendered by Courier and Chinese characters to be rendered by MingLiu [1]. If the font selection is made solely based on the (ordered) font list and lang (with 'lang' given a large weight), only 'MingLiu' would be selected because 'zh-TW' is not covered by Courier. As a result, all characters end up being rendered by MingLiu. Char-by-char font selection doesn't have this problem. However, it's likely to be slower. Going through a run of text before choosing a font/a set of fonts may work better, but it may be even slower. Staying in a single font as long as possible is another possibility. 
Of course, if 'lang' is not taken into account and just the ordered list of fonts is used in the glyph/font search, we'd not have the above problem. On the other hand, unless the font list is carefully selected, one may get ransom-note style rendering in some cases. Jungshik [1] In some cases, exactly the opposite is desired, under the premise that glyphs of Latin letters in a CJK font are designed to match well with the CJK characters in the font. This works well just as it is now.
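The char-by-char font selection discussed above can be sketched very simply: for each character, walk the ordered font list and take the first font whose coverage includes it. A toy illustration in C, where the two-font coverage function is a stand-in for fontconfig's per-font charsets (font 0 is imagined as Latin-only, font 1 as a CJK font covering everything):

```c
#include <assert.h>

#define NFONTS 2

/* Fake coverage data, for illustration only: font 0 covers ASCII,
 * font 1 covers everything.  fontconfig would consult each font's
 * real charset here. */
static int font_covers(int font, unsigned ch)
{
    return font == 1 || ch < 0x80;
}

/* Char-by-char selection: return the index of the first font in the
 * ordered list that covers ch, or -1 if none does. */
static int pick_font(unsigned ch)
{
    for (int f = 0; f < NFONTS; f++)
        if (font_covers(f, ch))
            return f;
    return -1;
}
```

With this scheme a Latin letter lands in font 0 and a CJK character falls through to font 1, at the cost of one coverage test per font per character.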
Re: [Fonts] Re: After-XTT's extension of the encoding field.
On Fri, 8 Aug 2003, Mike FABIAN wrote: Jungshik Shin [EMAIL PROTECTED] : On Thu, 7 Aug 2003, Mike FABIAN wrote: It can be automatically generated. The /usr/sbin/fonts-config script on SuSE Linux generates such TTCap entries automatically into the make some people frustrated if it just overwrites the existing fonts.dir (I don't know whether fonts-config on SuSE Linux does that or not) Yes, it does. details on how Mike's font-config script works... snipped Thank you for the details. It seems that you gave a lot of thought to the script and that it meets my needs (if I ever have to tweak fonts.scale/dir files again :-)) Jungshik
[Fonts] Re: Problem of Xft2
On Fri, 8 Aug 2003, Pablo Saratxaga wrote: On Fri, Aug 08, 2003 at 06:59:43PM +0900, Chisato Yamauchi wrote: But Gtk2 does not have a complete font-substitution mechanism. Therefore, Gtk2 is insufficient in a CJK environment. Gtk2, using Pango, has a builtin fontset mechanism. (It is always enabled, and automatically built, depending on the language and the language coverage of available fonts.) Certainly this is true as long as you use Pango, but not all Gtk2 applications use Pango. Moreover, the font selection widget in Gtk2 does not have a UI to let users specify multiple fonts (CSS-like). Apparently, Qt has this UI, according to Yamauchi-san. So I *NEVER* use Gtk2-mozilla. It has no flexibility in font settings. Mozilla doesn't use Gtk2/Pango text rendering mechanisms to render html pages. So, you cannot judge the font abilities of the Gtk2 toolkit with Mozilla. Well, when rendering html/xml pages, Mozilla has its own 'fontset/font substitution' mechanism of a sort, very similar to what you wrote above about Pango (based on fontconfig in the case of the Xft build; the X11 core build is very complicated, partly because it has to support the CSS-style font list on its own, without any help from fontconfig, fielding its way through 'the jungle' of XLFD-based font names). Otherwise, how could it support a CSS-style font list? Gtk may automatically choose a font that looks funny, but at least a character is always displayed in a readable way; I prefer it that way. I guess just saying Gtk(2) is a bit misleading. Gnome-terminal is a Gtk(2) application, but by default it doesn't use Pango and it does not do 'automatic font substitution' as you described. Set the Gnome-terminal font to 'Courier' and see how CJK characters (or any character not covered by Courier) are rendered. They all come out as empty boxes. 
That being said, it would be nice to have the ability to do user-configuration of glyph substitutions in gtk2; e.g. telling that when a given font is chosen, then characters in the range 0x00-0xff should be ignored, and taken from another font instead. The ASCII range of some CJK fonts is simply too ugly... or even buggy in some cases. That doesn't need to be that complex. Simply allowing a CSS-style font list is more than enough. That is, offering a UI for specifying an _ordered_ list of fonts (instead of just one font, generic or specific) should work well. For example, by putting a good Latin(-only) font, a Cyrillic(-only) font, and a Greek(-only) font before a CJK font followed by a generic font (e.g. Serif), you can get the best of all fonts. This UI needs to be a part of the system-wide 'control panel'. Falling short of that, applications like Gnome-terminal (the same is true of Konsole) should at least offer a way to specify an East Asian (double/full-width) font separately, as is done by xterm, vim, OpenOffice and MS Office. Because Gnome-terminal and Konsole don't have this feature, I still prefer to work in xterm, for which I can specify my favorite font for single-width characters along with my favorite font for double-width characters (with the '-fw' option; I'm gonna add a '-faw' option to xterm). Jungshik
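For comparison, this is what such an ordered list already looks like in CSS, where a conforming renderer takes each character from the first listed family that covers it; the family names below are just placeholders echoing the discussion, not a recommendation:

```css
/* Ordered fallback list: a Latin font first, then a CJK font,
 * then a generic family as the final catch-all. */
body { font-family: "Courier", "MingLiu", serif; }
```

The UI being asked for is essentially a graphical editor for a list of this shape.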
[Fonts] Re: Terminal versus X11 fonts
On Thu, 7 Aug 2003, Steve Sullivan wrote: For example, the Terminal 'edit current profile' GUI shows the Miriam font, but Miriam isn't listed by xfontsel or xlsfonts. There are two separate font systems: the X11 core font system and the client-side system with Xft/fontconfig. What you get with xlsfonts/xfontsel is X11 core fonts. 'Terminal' in RedHat 9 uses the client-side font system (Xft/fontconfig based). You can make Miriam and other fonts available as X11 core fonts with the freetype/X-TT/type1 backends if they're of a type supported by them. http://www.xfree86.org/4.3.0/fonts.html has all the gory details about XF86 font systems. For (After) X-TT, see http://x-tt.sourceforge.jp/ A lot of people believe that the client-side font system is the way to go (although the core font system will be around for a long time to come), so you may want to consider writing your application with the client-side font system (especially if I18N - internationalization - is important to your projects/programs). You may also want to take a look at http://fontconfig.org and http://www.pango.org Jungshik
Re: [Fonts] After-XTT's extension of the encoding field.
On Sat, 2 Aug 2003, Chisato Yamauchi wrote: Although the pliability of handling such special fonts is also important, the non-BMP planes in XLFD are now the most important problem. Confusion is already seen on lists such as linux-utf8. An official definition should be indicated right now. Why has XFree86 left this? That's because XFree86 is moving away from the 15-year-old XLFD-based approach. As Owen wrote, we'd better let that poor thing rest in peace and move along. With fontconfig/Xft, we don't need to worry about XLFD any more except for the sake of backward compatibility. For non-BMP characters, there isn't much of a backward-compatibility issue to worry about. If you take a look at Mozilla's gfx/src/gtk/nsFontMetricsGTK.cpp and gfx/src/gtk/nsFontMetricsXft.cpp (or gfx/src/windows/nsFontMetricsWin.cpp) at http://lxr.mozilla.org/seamonkey, you'll know what I mean. Mozilla developers have put a tremendous amount of 'heroic' effort into making CSS-style font selection work with XLFD-based font names. However, the much simpler and shorter fontconfig-based code (in nsFontMetricsXft.cpp) works better than nsFontMetricsGTK.cpp (for XLFD-based font names). Adding yet another field to make XLFD more complex doesn't help a bit in this respect. Besides, in your example (GT fonts), I don't see why you need to extend XLFD. Couldn't you just use different numbers in the last field of XLFD? gt21.ttf -gt-mincho-medium-r-normal--0-0-0-0-c-0-gt.2000-0.1 gt22.ttf -gt-mincho-medium-r-normal--0-0-0-0-c-0-gt.2000-0.2 gt23.ttf -gt-mincho-medium-r-normal--0-0-0-0-c-0-gt.2000-0.3 Instead of the above, the following should work as well, shouldn't it? Am I missing something? gt21.ttf -gt-mincho-medium-r-normal--0-0-0-0-c-0-gt.2000-1 gt22.ttf -gt-mincho-medium-r-normal--0-0-0-0-c-0-gt.2000-2 gt23.ttf -gt-mincho-medium-r-normal--0-0-0-0-c-0-gt.2000-3 Why do we persist with X-TT? The reason is that libfreetype.a is not useful at all in CJK. Especially, the following two points are fatal. 
Well, X-TT's 'competitor' is not the freetype module, but fontconfig (+ FT2 + Xft). - Handling proportional multi-byte fonts is too slow. (The loading speed of libfreetype.a is 20 times slower than that of X-TT 1.4; I show a benchmark in the next email.) For the case with the TTCap option, the option has been set to fc=0x3400-0xe7ff:fm=0x5a00. This particular option setting indicates that xtt handles the glyphs within the CJK region (in Unicode) with constant spacing, whose metrics are similar to those of 0x5a00. This is a nifty idea that could be utilized in Freetype2 and/or fontconfig, but it seems to me that the fact that there's that much difference in performance between the two cases is yet another indication that X11 core fonts have to go away. - The modification of a font (such as auto-italic and double striking, etc.) cannot be used at all. That is, libfreetype.a should also have all the options of TTCap. Yeah, TTCap is useful, but it appears that we're trying to solve the wrong problem, turning away from the real issue. The real problem is that we don't have quality CJK fonts in multiple styles. Anyway, fontconfig offers 'artificial slanting', although it doesn't make much sense to have 'italic' or 'slant' typefaces for CJK. As for 'artificial bold', there's a patch somewhere, but it hasn't been accepted because Freetype2 reportedly will come up with a better solution for 'artificial bold'. Have you seen CJK's *TYPICAL* fonts.dir of TrueType fonts? It is the following: Not many people would be fond of tweaking fonts.dir/scale files these days :-) Why would they, when just dropping TrueType fonts into one of the directories listed in fontconfig's font search path works like a charm? Jungshik P.S. If merging the X-TT and freetype modules is not gonna happen soon, it would be nice if X-TT made use of the fontenc library used by the freetype module. With the fontenc library, the freetype module doesn't have to hardcode font-encoding-to-Unicode mapping tables. 
Because font encodings are not hard-coded, it's easy to add a new encoding, although these days we don't care much. Moreover, it'll cut down the size of X-TT significantly.
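For reference, TTCap options like the fc=/fm= pair discussed above are written as a colon-separated prefix on the font file name in a fonts.dir/fonts.scale entry. The option values in this sketch are the ones quoted above; the font file name and the XLFD are made up for illustration:

```
fc=0x3400-0xe7ff:fm=0x5a00:mincho.ttc -misc-mincho-medium-r-normal--0-0-0-0-c-0-jisx0208.1983-0
```

The X-TT backend parses everything up to the last colon as options and treats the remainder as the font file.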
[Fonts] Re: two different gb18030.2000-1 : Sun/Mozilla/Java vs RH
On Wed, 9 Jul 2003, Yu Shao wrote: Jungshik Shin wrote: On Tue, 8 Jul 2003, Yu Shao wrote: I don't get you here; the first version of the patch was made for Red Hat 7.3, and at that time we had to use Mozilla with X core fonts. Since then the patch has been there almost unchanged. GB18030.2000* aliases were added purely because we wanted Mozilla working (you made gb18030.2000-0 an alias to gbk-0, but you also made a new identity mapping for gb18030.2000-1. They're different/separate issues that cannot be aggregated with '*'.) As I wrote, Mozilla's GB18030Font1 is NOT your gb18030.2000-1.enc BUT Sun's gb18030.2000-1 (and what was proposed by James Su and Roland Mainz). There's NO dispute about gb18030.2000-0. The question is about gb18030.2000-1 (not '0'). With this difference, how could you make Mozilla (non-Xft build) work with your gb18030.2000-1? Probably, it gave you the impression that it worked either because you also had iso10646-1 fonts or because you haven't checked BMP characters _outside_ the repertoire of gb18030.2000-0 with Mozilla. About the identical mapping in RedHat's GB18030.2000-1, it is because the inside compound encoding part is treating them as ISO10646 codes. This is a bit confusing. How am I supposed to interpret this together with the first sentence in your reply? Do you need RH8's version of gb18030.2000-1.enc or not? This question of mine is about compound text encoding. You began your reply with the following: Because the RedHat XFree86 18030 patch's compound text encoding part was based on James Su's patch, which was derived from the UTF-8 code, it doesn't really need GB18030.2000-0.enc and GB18030.2000-1.enc to be functioning. and then ended it with 'About the identical mapping ... the inside compound ... is treating them as ...'. To me, the two appear to contradict each other. How would you propose the conflict between RH's gb18030.2000-1.enc and Solaris/Mozilla/Java's gb18030.2000-1 be solved? 
Could you add your comment to http://bugs.xfree86.org//cgi-bin/bugzilla/show_bug.cgi?id=441 ? What GB18030 compound encoding code has XFree86 decided to use? Right now, there is not even a GB18030 X locale definition in CVS, so there is no conflict; it just totally depends on how to approach the compound text encoding part. Let me make it clear. The conflict is not inside XFree86 but between RH8's gb18030.2000-1 on the one hand and Sun's and Mozilla's gb18030.2000-1 (and what James Su and Roland Mainz proposed) on the other hand. It's regrettable that your patch wasn't discussed in open forums like the fonts/i18n lists of XFree86, IIRC (my memory sometimes doesn't serve me well, so I may have missed it). You do care which of the two gb18030.2000-1's is included in XFree86, don't you? If you don't care, would you be willing to replace RH's gb18030.2000-1.enc with one based on Sun's/Mozilla's/Java's (as suggested by James Su in http://www.mail-archive.com/fonts%40xfree86.org/msg01343.html or by Roland in http://bugs.xfree86.org/cgi-bin/bugzilla/show_bug.cgi?id=441)? Independent of compound text encoding, it's bad that gb18030.2000-1 has two different meanings. That's what I want to resolve here. If you agree to go with Sun's version, we won't have to worry about having to figure out which is which (although X11 core fonts become less and less important...) Jungshik
[Fonts] re: a new font encoding file for XF86 : gb18030.2000-2? (fwd)
Hi, I sent the following to James Su to seek his opinion, but it bounced. Now I'm sending it to the i18n and fonts lists, expecting him or other Chinese experts to pick this up. Jungshik Hi, Could you make a comment on http://bugs.xfree86.org//cgi-bin/bugzilla/show_bug.cgi?id=441? It's about adding a new font encoding file to XF86 for BMP characters NOT covered by the gbk-0/gb18030.2000-0.enc and gb18030.2000-1.enc that you proposed and that were accepted. I don't think it's necessary, but your expert opinion would be great to have. I tried to add you to the CC of bugzilla, but you're not registered there, so I'm writing this instead. Thank you, Jungshik
[Fonts] Re: [Fonts]Xft patch for halfwidth glyphs in monospace CJK fonts
On Tue, 10 Dec 2002, Anthony Fok wrote: Thank you for bringing up this important issue. I was assigned the task of dealing with the s p a c e d - o u t CJK fixedPitch font issue in Konsole. In addition to Konsole, gnome-terminal, Mozilla-xft (for rendering text/plain or a portion of html documents with the font style set to monospace), vim-gtk2 and a lot of other programs that require 'fixed-width' fonts have the same problem with CJK 'fixed-width' (actually 'bi-width') fonts. I was about to look into Xft/Pango to see if I could solve this problem, because fixing it on the application side seems inefficient, but 'googled' it to find this message that was sitting in my mailbox unread. The follow-up of yours is missing (because I had a network outage for a few days), but I found it in the archive. It seems like the patch mentioned by Ken is more ambitious than Anthony's (http://www.kde.gr.jp/~akito/patch/fcpackage/2_1/) and it is probably harder to put that into the upcoming 4.3.0 release. Therefore, I'm wondering what Keith thinks of adding Anthony's or a similar patch to Xft. The CJK fixed-width font issue is serious for CJK users and it'd be very nice to take care of it before the release of 4.3.0. TrueType fonts with the fixedPitch flag set to true mean that: * All CJK glyphs have the same fullwidth * The ASCII glyphs and other special glyphs have the same halfwidth I have submitted a small patch to the FreeType mailing list to deal with the halfwidth monospace font issue, Has it been committed? and it turns out that Xft has the same issue. It took me a while to figure out that it was not konsole or Qt. :-) Any idea on how to deal with the 15 / 2 = 7; 7 + 7 = 14 issue? :-) How about rounding up to the nearest even number before dividing it by 2? Jungshik
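The rounding rule suggested above can be stated in one line: round the fullwidth advance up to the nearest even number before halving it, so that two halfwidth cells exactly tile a fullwidth cell (15 becomes 16, the halfwidth is 8, and 8 + 8 = 16 rather than 7 + 7 = 14). A minimal sketch in C:

```c
#include <assert.h>

/* Halfwidth advance derived from a fullwidth advance.  (fullwidth + 1) / 2
 * is equivalent to rounding fullwidth up to the nearest even number and
 * then dividing by 2, so 2 * halfwidth always covers fullwidth. */
static int halfwidth(int fullwidth)
{
    return (fullwidth + 1) / 2;
}
```

Whether the renderer should then also stretch the fullwidth cell to 2 * halfwidth (16 instead of 15) is a separate policy question.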
Re: [Fonts]Re: Xprint
On 10 Dec 2002, Juliusz Chroboczek wrote: JS Even with this weakness, Xprint is by far the best printing JS solution available at the moment for Mozilla under Unix/X11 JS because postscript printing module of Mozilla does not work very JS well yet JC Xprint might work for CJK fonts, It does work for CJK now. Especially, version 0.8 of Xprint with TrueType font support works pretty well. Even the PS output produced by 0.7 with X11 bitmap fonts doesn't look that bad. JC although I'm a little bit surprised at your enthusiasm for the thing. I'm not so enthusiastic about it as you may think. A better word to characterize what I think about it is ambivalence. See my postings to the mozilla-i18n newsgroup news://news.mozilla.org/netscape.public.mozilla.i18n. When I wrote 'by far the best', I meant that _as of now_ it gives the best match between the printout and the screen rendering. For CJK web pages, the Mozilla PS module can't do that because only *one* PS font for each language can be specified. That is, on the screen, Mozilla (especially Mozilla-Xft) can be a good implementation of CSS, but on the printout, it cannot. Xprint is not perfect, but it's better than printing out everything (CJK and non-Western European) in a single font (specified in a pref. file which has to be hand-edited by end-users). Besides, complex scripts cannot be printed out at all by Mozilla under Unix without Xprint. With Xprint, it's possible to print out web pages in complex scripts provided that you can render them on the screen with Mozilla-X11core. That's a big difference. JC There is no way, though, how Xprint JC could work for complex scripts without standardising on glyph JC mappings. As I understand it, Xprint is a specialized form of X11 server combined with some X clients. Therefore, I think it has all sorts of weaknesses found in the server-side font model we have been moving away from. 
It's neither fast nor efficient (compared with client-side font technology) and it doesn't support 'modern' CSS-based font selection/resolution at the same level as provided by fontconfig. Nonetheless, it works _now_. As for complex script rendering, it's possible to print them out, as I wrote above and as my test with Old Korean showed (see http://bugzilla.mozilla.org/show_bug.cgi?id=176315). Standardizing on glyph mappings is not a requirement if we just deal with a single application program (e.g. Mozilla). Mozilla-X11 has a way to map the last two fields of XLFD to a mapping between a string of Unicode characters and a sequence of glyphs. That's what Mozilla-X11 uses to render Indic scripts, Thai and Hangul Conjoining Jamos. (Mozilla doesn't yet support opentype fonts, at least under X11. Some Pango code was borrowed, but that's not from pango-xft but from pango-x.) Because the Xprint module of Mozilla shares many things with Mozilla-X11corefont/Mozilla-Gtk, without doing anything, Xprint just works when it comes to printing out web pages in Indic scripts, Thai and Old Korean. Of course, I'm well aware that we have to use opentype fonts with gsub/gpos tables for complex script rendering. However, we also need a short-term solution that works now. For instance, there is not a single opentype font freely available for Old Korean. The situation is much worse than that for Indic scripts, for which free opentype fonts have begun to emerge. In the meantime, we have to resort to font-specific-encoding hacks. JC There is also no way[1] how Xprint could implement JC dynamically generated fonts, as required for example by CSS2. I'm a bit confused as to what you meant by 'dynamically generated fonts'. Did you mean 'web fonts'? Can you tell me what you meant? JC The right approach is obviously to do incremental uploading of fonts JC to the printer at the PS level, as the Mozilla folks are trying to do. 
I totally agree with you, provided that the font resolution mechanism is tied with fontconfig. JC I'm a little bit suspicious about their choice to use Type 42 CIDFonts Given that TrueType fonts are much easier to come by than genuine CID-keyed fonts for CJK (which is also true of TrueType fonts vs. PS Type 1 fonts for European scripts, although to a lesser degree), I guess the choice is all but inevitable (perhaps OpenOffice also adopted this approach). Do you have a better idea? Judging from your reservation about rasterization on the host side, what you're thinking of cannot be converting all the glyphs into bitmaps and putting them in the PS output. Anyway, I believe this 'mini-project' for Mozilla printing has to be 'glued' with fontconfig in CSS2 font resolution so that the screen rendering and the PS output use the same set of fonts. What I can think of as an alternative to embedding Type 42 PS fonts (type 2 CIDFonts) is just to refer to CID-keyed fonts/Type 1 fonts in the PS output and let a real PS printer or ghostscript do the rest of the job. This is similar to what the present PS module for Mozilla does. However, in order to get a faithful
[Fonts]Re: Xprint
On Mon, 9 Dec 2002, Michael B. Allen wrote: Roland Mainz has released a new version of Xprint and appears to be actively working on another. The mozilla website has some nifty looking internationalized screenshots displaying Turkish, Chinese, etc. I've been using an Xprint CUPS setup for some time now with great success. http://xprint.mozdev.org/screenshots.html Yeah, Xprint works great (it can even be used to print out an Old Korean page with U+1100 Hangul Jamos). It solved a long-standing problem in X11 (well, commercial Unixes have some solutions for this): the enormous gap between what you see on the screen and what you get on paper (especially for non-European scripts). Because Xprint is an X server specialized for printing and shares many things with the main X server for screen rendering, what you see on the screen is faithfully replicated in what you print out with Xprint, as long as the two X servers (one for the screen and Xprint) have access to a common set of fonts. However, the fact that Xprint is a specialized form of X *server* is also a weakness. You may know that the whole Linux (and FreeBSD and other Unixes that rely on XFree86) community is moving away from server-side fonts and toward client-side font technology (fontconfig and Xft; http://fontconfig.org). With fontconfig and Xft, Unix/X11 finally got on par with Windows and MacOS in terms of font support. Arguably, this is the greatest development in X11 that has happened in the last 10 years. Mozilla-Xft is finally able to support CSS at the same level as Mozilla-Win and Mozilla-MacOS (no more need to tinker with XLFD and things like that). The problem of server-side fonts becomes very obvious when you search for some Japanese (Chinese, Korean) words in Google (they don't have to be CJK, but this makes sure that you get a truly multilingual page in UTF-8 that requires multiple fonts to render) and see Mozilla-X11core struggle (sometimes it can take almost 10 seconds on my PIII 750MHz with 384MB) to render the page. 
(Or, open up the font selection dialog box in Mozilla-X11core and compare that with the font selection dialog box in Mozilla-Xft/Mozilla-Windows/Mozilla-MacOS. You can repeat the experiment with Mozilla-Xft.) Mozilla-Xft renders the page instantaneously. Also try to print the page with Xprint. Mozilla doesn't respond for as long as 30 seconds (depending on the complexity and the length of pages) until Xprint is done searching for fonts to 'render' the page. Even with this weakness, Xprint is by far the best printing solution available at the moment for Mozilla under Unix/X11 because the postscript printing module of Mozilla does not work very well yet (it works, but is far behind what you can get with Mozilla-Windows and Mozilla-MacOS, where the OS-level printing infrastructure is far superior to that under Unix/X11. Well, on some commercial Unixes, it may be better.) It would be even greater if it were possible to combine Xprint somehow with fontconfig (although that's not likely). Better still would be to write something like XftPrint (or XftPS), which would do for printing what Xft does for screen rendering. There's an on-going project in Mozilla to directly use Freetype2 and embed Type 42 TrueType fonts in the PS output. This might be where fontconfig can come in to better support CSS in Mozilla printouts, as is done on the screen by fontconfig+Xft in Mozilla-Xft. I hope the Linux distros jump on the bandwagon and start shipping it along with an Xprint-enabled Mozilla (Red Hat's Mozilla RPMs do not have Xprint enabled). I'm not sure why RH disabled Xprint in their Mozilla RPM. Xft, Xprint and the PS printing module can coexist in Mozilla without much problem as far as I can tell. Perhaps that blocking I mentioned above may not be acceptable? Jungshik Shin P.S. I'm CCing to the fonts list of XF86.
Re: [Fonts]Xft/fontconfig and Non-BMP characters
On Sun, 1 Dec 2002, Jungshik Shin wrote: While trying to make Mozilla-Xft support non-BMP characters with fonts like CODE2001.TTF (with a pid=3/eid=10 Cmap), I found that freetype and Xft need a little change. Details are sent to linux-utf8 list (http://mail.nl.linux.org/linux-utf8/2002-12/msg0.html) and Bugzilla Extending XftTextExtents16() to support UTF-16 is similar to Attached is my patch (a bit revised) to extend XftTextExtents16 to support UTF-16 and to fix a typo in fcstr.c of fontconfig (which makes the conversion from UTF-16 to UCS-4 not work correctly for characters in even numbered planes with the 17th bit in UTF-32 unset). Keith, would you take a look? I also sent it to [EMAIL PROTECTED] and got a patch seq. #5522. Jungshik
Re: [Fonts]Xft/fontconfig and Non-BMP characters
On Tue, 3 Dec 2002, Jungshik Shin wrote: Attached is my patch (a bit revised) to extend XftTextExtents16 to support UTF-16 and to fix a typo in fcstr.c of fontconfig (which makes the conversion from UTF-16 to UCS-4 not work correctly for characters in Sorry, I forgot to attach it. This time, it's really attached. Jungshik

Index: xc/lib/fontconfig/src/fcstr.c
===================================================================
RCS file: /cvs/xc/lib/fontconfig/src/fcstr.c,v
retrieving revision 1.10
diff -u -r1.10 fcstr.c
--- xc/lib/fontconfig/src/fcstr.c	2002/08/31 22:17:32	1.10
+++ xc/lib/fontconfig/src/fcstr.c	2002/12/04 03:10:13
@@ -282,8 +282,8 @@
	 */
	if ((b & 0xfc00) != 0xdc00)
	    return 0;
-	result = ((((FcChar32) a & 0x3ff) << 10) |
-		  ((FcChar32) b & 0x3ff)) | 0x10000;
+	result = ((((FcChar32) a & 0x3ff) << 10) |
+		  ((FcChar32) b & 0x3ff)) + 0x10000;
     }
     else
	result = a;
Index: xc/lib/Xft/xftextent.c
===================================================================
RCS file: /cvs/xc/lib/Xft/xftextent.c,v
retrieving revision 1.9
diff -u -r1.9 xftextent.c
--- xc/lib/Xft/xftextent.c	2002/10/11 17:53:02	1.9
+++ xc/lib/Xft/xftextent.c	2002/12/04 03:10:14
@@ -147,6 +147,11 @@
	free (glyphs);
 }
 
+#define IS_HIGH_SURROGATE(u) (((FcChar16) (u) & 0xfc00L) == 0xd800L)
+#define IS_LOW_SURROGATE(u)  (((FcChar16) (u) & 0xfc00L) == 0xdc00L)
+#define SURROGATE_TO_UCS4(h,l) \
+    (((((FT_UInt) (h) & 0x03ffL) << 10) | ((FT_UInt) (l) & 0x03ffL)) + 0x10000L)
+
 void
 XftTextExtents16 (Display	*dpy,
		  XftFont	*pub,
@@ -156,6 +161,7 @@
 {
     FT_UInt	*glyphs, glyphs_local[NUM_LOCAL];
     int		i;
+    int		nglyphs = 0;
 
     if (len <= NUM_LOCAL)
	glyphs = glyphs_local;
@@ -169,8 +175,19 @@
	}
     }
     for (i = 0; i < len; i++)
-	glyphs[i] = XftCharIndex (dpy, pub, string[i]);
-    XftGlyphExtents (dpy, pub, glyphs, len, extents);
+    {
+	if (IS_HIGH_SURROGATE(string[i]) && i + 1 < len &&
+	    IS_LOW_SURROGATE(string[i + 1]))
+	{
+	    glyphs[nglyphs++] = XftCharIndex (dpy, pub,
+			SURROGATE_TO_UCS4(string[i], string[i + 1]));
+	    ++i;
+	}
+	else
+	    glyphs[nglyphs++] = XftCharIndex (dpy, pub, string[i]);
+    }
+
+    XftGlyphExtents (dpy, pub, glyphs, nglyphs, extents);
     if (glyphs != glyphs_local)
	free (glyphs);
 }
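As a stand-alone sanity check of the surrogate arithmetic used in the patch above (not part of the patch itself): a UTF-16 surrogate pair decodes to a UCS-4 code point by taking the low 10 bits of each surrogate, concatenating them, and adding 0x10000. Using bitwise OR instead of that final addition is exactly the kind of bug the fcstr.c typo fix addresses.

```c
#include <assert.h>

/* Decode a UTF-16 surrogate pair (high, low) to UCS-4:
 * ((high & 0x3ff) << 10 | (low & 0x3ff)) + 0x10000. */
static unsigned long surrogate_to_ucs4(unsigned short high, unsigned short low)
{
    return ((((unsigned long) high & 0x3ff) << 10) |
            ((unsigned long) low & 0x3ff)) + 0x10000UL;
}
```

The first valid pair (0xD800, 0xDC00) maps to U+10000 and the last (0xDBFF, 0xDFFF) to U+10FFFF, spanning all sixteen supplementary planes.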
[Fonts]Xft and Non-BMP characters
Hi, While trying to make Mozilla-Xft support non-BMP characters with fonts like CODE2001.TTF (with a pid=3/eid=10 cmap), I found that freetype and Xft need a little change. Details are sent to the linux-utf8 list (http://mail.nl.linux.org/linux-utf8/2002-12/msg0.html) and Bugzilla (http://bugzilla.mozilla.org/show_bug.cgi?id=182877). Below is the part of my message to the linux-utf8 list related to Xft. - I also have to extend XftTextExtents16() included in fcpackage-2.1 to deal with UTF-16 (instead of UCS-2). Xft2 has XftDrawStringUtf16() in addition to XftDrawString16() (the latter is for UCS-2). I thought about adding XftTextExtentsUtf16(), but it appears more convenient for programs like Mozilla, which use UTF-16 for their internal string representation, if XftTextExtents16() is extended to support UTF-16. Again, there's a little speed penalty. Below are links to the FT2 patch (against 2.1.3) and the Xft patch (against fcpackage 2.1): http://bugzilla.mozilla.org/attachment.cgi?id=107852 : FT2 patch http://bugzilla.mozilla.org/attachment.cgi?id=107858 : Xft patch There are a couple of screenshots along with the Mozilla patch, and a couple of sample pages with non-BMP characters, at http://bugzilla.mozilla.org/show_bug.cgi?id=182877 --- Extending XftTextExtents16() to support UTF-16 is similar to the extension of Win32 'W' APIs to support UTF-16. Some people may not like it. However, it seems not so bad an idea, and I even think that XftDrawString16() may as well be extended in a similar manner. I'm not a big fan of UTF-16, but neither am I very much against it. IMHO, it'd be very nice if either extending XftTextExtents16() or adding a new function XftTextExtentsUtf16() (a la XftDrawStringUtf16()) were done before the release of XFree86 4.3. Keith, what would you say? Jungshik ___ Fonts mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/fonts
[Fonts]Re: making editable charset/lang in fonts.conf
On Tue, 22 Oct 2002, Keith Packard wrote: Thank you for your explanation. Around 12 o'clock on Oct 22, Jungshik Shin wrote:

1. get a pattern from an application (fontconfig client)
2. apply configuration-specified editing rules to the pattern
For each font:
3. read in font properties from fc-cache (or directly from the font if fc-cache is not present)
4. measure the distance between the pattern and each font

Fontconfig reads the font properties at startup time, and thereafter only when they change (it checks file mod times when fonts are listed). I see. So, step 3 should be at the top. What we could do is add a set of rules executed when the patterns are loaded, although I'm not sure that's precisely what you want, More specifically, you meant 'the patterns holding font properties are loaded from font-cache files', didn't you? If so, that's what I want.

<match target="font">
  <test qual="any" name="family"> <string>FAMILY</string> </test>
  <edit name="charset" mode="MODE"> <charset>...</charset> </edit>
</match>

where 'MODE' can be 'add' (which 'append/prepend' just do), 'subtract' or 'assign' (or something similar to that effect). Because 'charset' is already taken for the Base85 representation of the coverage, a new property name (that has to be translated to charset internally, e.g. coderange) might be used, or 'charset' can be overloaded to mean a more human-readable representation of the coverage in fonts.conf (sth. like [0x-0x]). For instance, I want the following to be applied to the font properties of 'Gulim Old Hangul Jamo' (a hack-encoded font whose character/glyph assignment has NOTHING to do with the actual Unicode character assignment) read off from fc-cache BEFORE matching against an application-provided pattern (by measuring the distance) begins.
<match target="font">
  <test qual="any" name="family"> <string>Gulim Old Hangul Jamo</string> </test>
  <edit name="charset" mode="subtract">
    <charset>0x4e00-0x5400</charset>   // remove hack-encoding code points
  </edit>
  <edit name="charset" mode="add">
    <charset>0x1100-0x11ff</charset>   // Hangul Jamos
  </edit>
  <edit name="charset" mode="add">
    <charset>0x302e-0x302f</charset>   // Hangul Tone marks
  </edit>
  <edit name="charset" mode="add">
    <charset>0xac00-0xd7a4</charset>   // Hangul syllables
  </edit>
  <edit name="lang" mode="assign"> <string>ko</string> </edit>
</match>

Another example is for Baekmuk Batang, which doesn't have glyphs for U+1100-U+11FF but can be used to render them nonetheless.

<match target="font">
  <test name="family"> <string>Baekmuk Batang</string> </test>
  <edit name="charset" mode="add">
    <charset>0x1100-0x11ff</charset>   // Hangul Jamos
  </edit>
  <edit name="charset" mode="add">
    <charset>0x302e-0x302f</charset>   // Hangul Tone marks
  </edit>
</match>

it would significantly impact application startup performance. Would just adding the feature to fontconfig have this significant negative impact even in the absence of editing rules for this feature in fonts.conf? Or, would that negative impact manifest itself ONLY when fonts.conf actually has editing rules for this feature? If the latter is the case, the decision/choice would fall on end-users, wouldn't it? If they think they can exchange a performance hit at application start-up for a feature they desperately need, they would go for it. Otherwise, they wouldn't put in any editing rule to be applied at the font-properties loading stage. It seems like you want to select fonts based on Unicode coverage of the desired Hangul representation. Actually, that's related but not exactly what I want. I may have got you confused because last week I mentioned multiple ways of representing Hangul along with what I'm talking about here. Or, am I misunderstanding you?
I'd like to override the 'charset' property (Unicode coverage) of some fonts detected by fontconfig and stored in the font-cache, because the detected value of the 'charset' property doesn't represent their 'true' ability due to the hack-encodings used in them. In the example above, 'Gulim Old Hangul Jamo' is detected to cover [U+4E00-U+52xx] and [U+0000-U+007F], but I'd like that detected Unicode coverage to be modified by rules specified in fonts.conf to reflect its 'true' ability ([U+1100-U+11FF], [U+AC00-U+D7A4], etc.). Arguably, this could be useful for more general cases than I made it sound. Some fonts have precomposed Latin/Cyrillic/Greek letters with diacritics, but they may not have the combining diacritics themselves. When a client of fontconfig like Pango tries to render a sequence of a base character and a diacritic mark with such a font, Pango *seems* to end up using two separate fonts, one for the base character (the font specified in an application) and the other for the diacritic mark, even though the first font (spec. in an application) has a glyph for the precomposed character made up of the base character and the diacritic mark. Although this kind of problem can be solved at a level different from fontconfig, fontconfig could well help here. You could easily add
Re: [Fonts]fontconfig peculiarity(??)
On Fri, 18 Oct 2002, Keith Packard wrote: Around 7 o'clock on Oct 18, Jungshik Shin wrote: For some unknown reason, 'New Gulim' is picked up by 'fontconfig' or 'Xft' for certain characters when CODE2000 is explicitly requested by applications like Mozilla and gedit (via Pango). More specifically, those certain characters are U+115F (Hangul leading consonant filler) and U+1160 (Hangul vowel filler). Fontconfig has a kludge to weed out fonts with broken encoding tables; such fonts often have encoding table entries pointing at blank glyphs which aren't supposed to be blank. It checks each glyph in the encoding and ignores those which are inappropriately blank. which are expected to be blank, that list was derived from a similar table in Mozilla. Blank glyphs not in the table are assumed to represent broken Keith, we talked about this a month ago (Sep. 7th) on this very list :-) You came up with a much more extensive list of characters than Mozilla's blank glyph list. I also filed a bug for Mozilla-Windows (http://bugzilla.mozilla.org/show_bug.cgi?id=167136). You must have forgotten about it. :-) I added those two characters to the blank glyph list in /etc/fonts/fonts.conf then. In addition, both Ngulim and Code2000 have blank glyphs for both characters. The only difference is that in Ngulim they're both *spacing* (width > 0), while in Code2000 only U+115F is spacing and U+1160 is non-spacing (width = 0). So, even if my blank glyph list didn't have them, there's no reason I can think of that Ngulim would be preferred over Code2000 for those characters. If they're equal on this count, the explicit request ought to take precedence, shouldn't it? One possible explanation is that Code2000 isn't marked as supporting 'ko' in the font-cache for some reason while Ngulim is. However, both fonts have more or less similar coverage of Korean characters (the full set of precomposed syllables and Hangul Conjoining Jamos and other symbols in KS X 1001). So, this is a bit of a mystery, too.
weren't included in the table. This means that no font will ever be listed as supporting these glyphs, so Mozilla will pick the first font in the match list to draw them with, expecting that this will produce a missing glyph indication. BTW, could it be possible to 'deceive' or 'force' fontconfig into believing that a certain font covers a certain range of Unicode even if it doesn't appear to? I guess it's not possible at the moment, but wouldn't it be nice to add it? What I'm thinking of is something like this:

<match target="font">
  <test qual="any" name="family"> <string>Gulim Old Hangul Jamo</string> </test>
  <edit name="coverage" mode="assign" binding="strong"> <coderanges>...</coderanges> </edit>
</match>

where coderanges is a comma-separated list of Unicode code points (integers) or code ranges (sth. like [0x-0x]). I found in the font cache file that the charset property does exactly the thing I want to do with 'coderanges'. If so, would it be possible to use 'charset' to achieve what I described above? Well, I've gotta figure out how 'charset' represents Unicode ranges. Some fonts have a hack-encoding (although advertised as Unicode) and their apparent Unicode coverage cannot be guessed at all by fontconfig based on the Unicode cmap. An application or library aware of this hack-encoding can do some hack with them, though. However, fontconfig does not appear to return a requested font, even if explicitly specified, and instead comes up with a fallback after an 'intelligent guess', because what it thinks a font with a hack-encoding can cover does not match at all the range of Unicode an application wants to draw with the font. It'd also be nice to be able to do something similar with the 'lang' tag. I thought the following would *make* fontconfig *believe* (*ignoring* what it finds out with the OS lang tag and orthography map) that 'Gulim Old Hangul Jamo' is suitable for Korean, but it doesn't seem to work. Did I do anything wrong?
---
<match target="font">
  <test qual="any" name="family"> <string>Gulim Old Hangul Jamo</string> </test>
  <edit name="lang" mode="assign" binding="strong"> <string>ko</string> </edit>
</match>
---
Both of these certainly look like a hack, but some applications (perhaps mathml, Indic script handling, Korean alphabet handling...) need them until OTFs are widely available. Related problems are talked about at http://bugzilla.mozilla.org/show_bug.cgi?id=126919#c315 (and comments referenced therein) and http://bugzilla.mozilla.org/show_bug.cgi?id=95708 Jungshik ___ Fonts mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/fonts
Re: [Fonts]fontconfig peculiarity(??)
On Fri, 18 Oct 2002, Keith Packard wrote: Around 12 o'clock on Oct 18, Jungshik Shin wrote: One possible explanation is that Code2000 isn't marked as supporting 'ko' in the font-cache for some reason while Ngulim is. This explanation only makes sense when those two chars are NOT included in the blank glyph list, doesn't it? As I wrote, they've been in the blank glyph list in my fonts.conf since early September. Hmm, things are getting more interesting. After I removed Ngulim.ttf from my font path and then put it back (I ran fc-cache before testing), suddenly Mozilla picks up the U+1160 glyph from Code2000. The same is true of 'gedit' when Code2000 is specified as the font to use. Is it at the whim of electrons whirling around inside my computer :-) ? If your font specification includes language, this would cause Ngulim to be preferred over Code2000 if both are added to the pattern in the config file. If the application explicitly names 'Code2000' as a family name, then the language shouldn't matter. The page in question (http://jshin.net/i18n/korean/hunmin.html and http://jshin.net/i18n/korean/hunmin_comp.html) specifies font-family to be CODE2000 explicitly with CSS. I assume this will make Mozilla with Xft enabled ask fontconfig for that font explicitly. As for Pango (gedit), I'm less certain because I don't know whether Pango specifies language when sending font requests down (or up) the road. Therefore, my original mystery still remains a mystery :-) Code2000 isn't marked as supporting Korean as it is missing a large number of Han glyphs, totalling some 3136 characters from the KSC 5601-1992 encoding. Many Korean documents will not be completely covered by this Sorry, I didn't check Han glyphs, checking only that it has the full set of precomposed Hangul syllables (11,172 of them). As I suggested before, a kind of multi-level orthography check may be necessary to cope with situations like this.
Or, would it be possible for users to override manually what fontconfig *detects* (both code range coverage and lang) in fonts.conf as suggested in my prev. email? Jungshik ___ Fonts mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/fonts
[Fonts]blank glyph list in fonts.config
Since the release of a new CODE2000 font (by James Kass at http://home.att.net/~jameskass) with glyphs for Hangul Jamos, I've been trying to test how it works with various browsers. Mozilla with direct access to truetype fonts works fine, but Mozilla with the Xft patch has a problem with U+115F (Hangul leading consonant filler) and U+1160 (Hangul vowel filler). In CODE2000, the former is a spacing (non-zero width) _blank_ glyph while the latter is a non-spacing (zero-width/combining) _blank_ glyph. When Mozilla with the Xft patch is used to render http://jshin.net/i18n/korean/fillers.html (or http://jshin.net/i18n/korean/hunmin.html), U+115F and U+1160 are rendered with hollow boxes instead of spacing and non-spacing (combining) blanks, seemingly because they're not listed among the characters allowed to have blank glyphs. It's 'seemingly' because Mozilla with the Xft patch has this problem while 'gedit' doesn't. Anyway, adding U+115F and U+1160 to the list in fonts.config solved the problem. Two screenshots are put up at http://linux.mizi.com/~ganadist/filler1.png (with U+115F/U+1160 added to the blank glyph list) and http://linux.mizi.com/~ganadist/filler2.png (without). Mozilla for MS-Windows has a similar problem and I came up with a similar fix that works. See http://bugzilla.mozilla.org/show_bug.cgi?id=167136. I'm not sure adding U+115F/U+1160 to the blank glyph list is the best way, but it works. Keith, could you consider this? Thank you, Jungshik ___ Fonts mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/fonts
Re: [Fonts]blank glyph list in fonts.config
On Sat, 7 Sep 2002, Keith Packard wrote: Around 9 o'clock on Sep 7, Jungshik Shin wrote: I'm not sure adding U+115F/U+1160 to the blank glyph list is the best way, but it works. Keith, could you consider this? The blank glyph list is supposed to be filled with all of the Unicode values which have an empty visual representation. It's a hack to work ... I adapted the data I found in Mozilla for this purpose, hence the similar issues you found in the two programs. Thank you for going through the Unicode tables to come up with a more extensive list. I've just posted your list to bugzilla bug 167136 mentioned previously. I'm reading through the Unicode tables looking for other blank values, so far I've found:

Unicode range    added?  comments
U+180B - U+180E  no      (but I don't have a Mongolian font to check against)
U+200C - U+200F  yes     (the Unicode description isn't clear)
U+2028 - U+2029  no      (these seem like they're supposed to be drawn)
U+202A - U+202F  yes     (these also appear blank from the description)
U+3164           yes     (HANGUL FILLER, similar to U+1160)
U+FEFF           yes     (byte order detector (ZERO WIDTH NO-BREAK SPACE))
U+FFA0           yes     HALFWIDTH HANGUL FILLER (similar to U+3164)
U+FFF9 - U+FFFB  yes     INTERLINEAR ANNOTATION marks for furigana

Rules for inclusion -- if a font could reasonably draw these as blank, they should be included in the list. The idea is to ignore empty glyphs which should always have some visual representation. I think that U+200C/U+200D (ZWNJ, ZWJ) are meant to be used mainly (though not exclusively; Latin ligature formation may also be controlled by them) with Indic scripts, and fonts for Indic scripts are supposed to have some OT tables built in to map a sequence of characters including ZWNJ/ZWJ to appropriate glyph(s). As for U+200E/U+200F and U+202A - U+202F, I guess a lower-level layer like fontconfig is never supposed to see them because they have to be taken care of at a higher level (layout, e.g. Pango?).
Nonetheless, it seems like it's harmless (except for a little performance hit, if any) to include them in the blank glyph list. The same appears true of U+FFF9 - U+FFFB. BTW, although deprecated, U+206A - U+206D seem to have to be included as well. U+206E and U+206F may or may not have to be added; I'm not sure what they're for. Jungshik ___ Fonts mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/fonts
Re: [Fonts]Korean orthography for fontconfig
On Wed, 14 Aug 2002, Owen Taylor wrote: The current Korean orthography looks like a combination of KSC-5607.1987 with the complete Hangul Syllables area of Unicode. I'm sorry to be 'pedantic'. Strictly speaking, this way of talking about Korean orthography (in terms of precomposed syllables) is not quite right. You have to say what consonants and vowels are allowed/required in modern Korean orthography, just like you talk about what alphabetic letters are required of any given language represented with the Latin/Greek/Cyrillic alphabets. However, there are fonts out there that only have the Hangul syllables in KSC-5607.1987 ... one example would be the freely available 'Baekmuk Batang' font; Not any more. A new set of Baekmuk fonts with full coverage of the 11,172 precomposed modern syllables has been available for quite a while (over two years?), although they may not have been included in popular Linux distributions made outside Korea. You can get them at ftp://ftp.mizi.com/pub/baekmuk/baekmuk-ttf-2.1.tar.gz. In addition to having the full set of 11,172 syllables (precomposed, modern, complete), several glitches have been fixed. Such fonts are *not* currently recognized as supporting Korean. Nonetheless, you do have a point and I totally agree with you on it. If this was just a matter of preferring fonts with all the Hangul syllables in Unicode when all other things are equal, then this wouldn't be a big problem, but This is a reasonable thing to do. it's more serious than this:
- You can't specify such a font in a generic alias, and have it preferentially selected for Korean language tags.
- You can't specify such a font in a generic alias, and have it selected at all if you have fonts with the complete orthography.
- fontconfig statements like "disable hinting for Korean fonts" don't work properly with such a font.
These are certainly problematic.
I think the right thing to do is probably just to use only the KSC-5607.1987 syllables in the Korean orthography; my understanding is that they are sufficient for the vast majority of modern Korean text. I would omit 'vast'. :-) Thanks to the dominance of MS-Windows in Korea as the leading desktop platform, Koreans are no longer restricted to 2350 syllables (in the past, they resorted to the JOHAB encoding to achieve the same). MS Windows supports CP949 (an extension of EUC-KR based on KS X 1001:1998), and ordinary Korean users have no way of knowing whether a syllable they type belongs to KS X 1001:1998 or not. The result is that more and more Korean documents (especially on web BBSes, in emails and in on-line chatrooms, where 'colloquial' Korean, better called the 'slang' of the net subculture and oftentimes cryptic to people like me, is widely 'spoken' with intentional/unconscious use of non-orthography-compliant syllables) include syllables outside KS X 1001:1998 (see http://bugzilla.mozilla.org/show_bug.cgi?id=131388). Even under Linux, there's no more restriction, because the ko_KR.UTF-8 locale can be used with the Korean input method Ami (with my patch to allow input of all 11,172 syllables: http://jshin.net/faq/ami-1.0.11.utf8.patch.gz. It'd be nice if distributions like RH and Mandrake picked up this patch so that Linux users can be on par with MS Windows users.) Probably, the same is true of MacOS X. Jungshik ___ Fonts mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/fonts
Re: [Fonts]Korean orthography for fontconfig
On Wed, 14 Aug 2002, Owen Taylor wrote: Jungshik Shin [EMAIL PROTECTED] writes: On Wed, 14 Aug 2002, Owen Taylor wrote: The current Korean orthography looks like a combination of KSC-5607.1987 with the complete Hangul Syllables area of Unicode. I'm sorry to be 'pedantic'. Strictly speaking, this way of talking about Korean orthography (in terms of precomposed syllables) is not quite right. You have to say what consonants and vowels are allowed/required in modern Korean orthography, just like you talk about what alphabetic letters are required of any given language represented with the Latin/Greek/Cyrillic alphabets. I'm not sure I understand your objection here. But it is just a matter of terminology... I'm sorry I got you confused. For a moment, I forgot that 'orthography' in the fcpackage context has a specialized meaning different from its usual meaning. I was way too 'pedantic', writing the paragraph above from the point of view of an 'amateur linguist'. In the Korean orthography standards (both of the ROK and the DPRK), only the consonants and vowels allowed are enumerated, as opposed to listing all their possible combinations, because listing consonants and vowels is more than enough. However, in the fcpackage context, the situation is different. I'd say they definitely are composed syllables. And since it is possible to render Korean syllables by combining pieces at rendering time (Pango can do this for core X fonts, e.g.), I have more to ask/suggest about Pango's rendering of U+1100 Jamos and other issues in Korean rendering (e.g. Uniscribe-like OT support for Korean). I'll try to do that soon offline. However, there are fonts out there that only have the Hangul syllables in KSC-5607.1987 ... one example would be the freely available 'Baekmuk Batang' font; Not any more. A new set of Baekmuk fonts with full coverage of the 11,172 precomposed modern syllables has been ... ftp://ftp.mizi.com/pub/baekmuk/baekmuk-ttf-2.1.tar.gz.
In addition to having the full set of 11,172 syllables (precomposed, I just downloaded that, and it looks like the 'Dotum' font still only covers KSC-5607.1987, just like in the baekmuk-ttf-2.0.tar.gz that Red Hat ships currently. You're right. Dotum still has only 2350 syllables. Now this brings us back to the problem you raised. Basically, I agree with you that fonts with only KS C 5601-1987 coverage have to be regarded as supporting Korean by fontconfig. Especially, this loosening of the criteria is also required for bdf/pcf fonts or bdf-turned-sbit-only TTFs (which will replace bdf/pcf fonts sometime in the future, according to what's been discussed today). How about introducing a 'level' concept to fontconfig? Characters in level 1 are absolutely required (in the case of Korean, the 2350 Hangul syllables and some more in the symbol block of KS X 1001:1998). Level 2 has some optional characters (for Korean, it'd be the additional 8000+ syllables and 4800+ Hanjas in KS X 1001:1998), level 3 has even rarer characters (for Korean, it'd be the Hanjas in KS X 1002), and so on. I think the right thing to do is probably just to use only the KSC-5607.1987 syllables in the Korean orthography; my understanding is that they are sufficient for the vast majority of modern Korean text. I would omit 'vast'. :-) Thanks to the dominance of MS-Windows in Korea as the leading desktop platform, Koreans are no longer restricted to 2350 ... http://bugzilla.mozilla.org/show_bug.cgi?id=131388). I defer to your expertise in this area. I'd just like to make sure that this was only meant to tell you the current situation of Korean materials on the net, and that I still agree with your suggestion about the 'ko.orth' file in fcpackage. locale can be used with the Korean input method Ami (with my patch to allow input of all 11,172 syllables: http://jshin.net/faq/ami-1.0.11.utf8.patch.gz. It'd be nice if ... Is there any reason that it hasn't gotten into standard Ami? I'd also like to know :-).
I sent the patch to both the maintainer/author of Ami and the Ami mailing list, where he is active, in late April/early May, but somehow I haven't heard back from him. Perhaps I'll try once more to contact him. BTW, to make it work under ko_KR.UTF-8, the XLC_LOCALE file for ko_KR.UTF-8 should list ksc5601.1987-0 before jisx0208.1983-0. Jungshik ___ Fonts mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/fonts
Re: [Fonts]Font family name problem
On Mon, 22 Jul 2002, Keith Packard wrote: Around 8 o'clock on Jul 22, Brian Stell wrote: Will there be a way to get the localized name using the ascii only name? How about the other way around? Given a localized name+lang, would it be possible to get the ascii name? Put differently, would there be a way to access the mapping from a localized name+lang to a font(or ascii name/canonical name)? Yes. The representation of the names internally includes all of the localized names along with the postscript name (which is always ASCII), any match or list result will include all of these names. I would sort the names so that any English or Latin names would come first in the list. Reading this, I think it should be possible, but is there an API for that? A number of web pages have embedded or separate CSS with only 'localized font (family) names'. Web browsers or any other applications accessing those CSS' need to map localized names+lang (assuming that lang info. is available in one way or another) to fonts. Although it appears that they can roll out their own for this purpose, wouldn't it be nice to have this in fontconfig? Thanks, Jungshik ___ Fonts mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/fonts
Re: [Fonts]Re: [I18n]Using current locale in font selection
On Tue, 9 Jul 2002, Keith Packard wrote: Ok, so now what do I do with applications which haven't called setlocale (LC_ALL, "")? Do I:
a) call setlocale (LC_ALL, "") myself? I'm afraid this can have an unexpected side effect, which could surprise/upset some application program developers.
b) use $LANG or $LC_CTYPE? If this road is taken, it has to be determined which env. variables have to be referred to in what order. AFAIK, SUS and POSIX say that it's implementation-dependent. Since XF86 is used with many OSes, it'd be best to follow the 'local' convention. Then, I don't know how to figure it out without calling setlocale (LC_CTYPE, ""). In the case of glibc: if $LC_ALL is set, use it; else if $LC_CTYPE is set, use it; else if $LANG is set, use it.
c) Ignore the locale information and leave the font language preference unset? This might well be the best course, along with documenting that setlocale() should be called to make font matching/selection locale-dependent, or, better still, that lang info should be provided explicitly when invoking the font selection APIs.
Jungshik ___ Fonts mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/fonts
[Fonts]Re: [I18n]Using current locale in font selection
On Mon, 8 Jul 2002, Keith Packard wrote: Around 14 o'clock on Jul 8, Owen Taylor wrote: +locale = (FcChar8 *)setlocale (LC_CTYPE, NULL); Don't you mean LC_MESSAGES? I believe it should be LC_CTYPE. Some people like me have the following, because English menus and (error) messages are easier to understand than a not-so-good translation:

LC_CTYPE=ko_KR.eucKR
LC_MESSAGES=C
LC_PAPER=en_US   # because the US doesn't use ISO std. paper sizes
.

or

LC_CTYPE=ko_KR.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8
.

If so, I think we should be able to use this return value almost raw; stripping out the language and territory codes and passing them in as FC_LANG, right? Did you mean that only the codeset part is relevant here and we can go without relying on lang and territory codes? The codeset doesn't carry any lang-specific information if a UTF-8 locale is used. Jungshik ___ Fonts mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/fonts
Re: [Fonts]Automatic 'lang' determination
On Sat, 29 Jun 2002, Jungshik Shin wrote: On Fri, 28 Jun 2002, Keith Packard wrote: I'm confused by this; my exposure to Chinese fonts says that simplified Chinese and traditional Chinese have significant overlap in Unicode codepoints, but that the glyphs are quite a bit different in appearance. I doubt this is the case. As far as I can tell I found this needs some clarification. If the glyphs of 'A', 'B' and 'C' from a Times Roman Latin-1 font are compared with the corresponding glyphs from a New Century Schoolbook Latin-2 font, they certainly look different. However, that does not mean that you cannot use the Times Roman Latin-1 font to render a run of text in one of the languages Latin-2 is meant for, as long as the Times Roman Latin-1 font has _all_ the glyphs necessary in that particular run of text. I believe the same thing can happen between two fonts for zh-TW and zh-CN. If glyphs from font A for zh-TW are compared with glyphs from font B (with different design principles) for zh-CN, they for sure look different. However, they're different not because font A is for zh-TW and font B is for zh-CN, but because they're designed to appear different. Chinese and traditional Chinese have significant overlap in Unicode codepoints, but that the glyphs are quite a bit different in appearance. To make this kind of comparison meaningful, you have to compare two fonts, one for zh-TW and the other for zh-CN, made by a _single_ foundry with _identical_ design principles and look and feel (something like an Adobe Times Roman Latin-1 font and an Adobe Times Roman Latin-2 font). In practice, it's hard to find two fonts that satisfy the criteria I outlined here. However, the ISO 10646 code charts for Han characters should do almost as good a job. That's why I suggested comparing the glyphs for the PRC and Taiwan in the ISO 10646 Han character chart. Jungshik Shin ___ Fonts mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/fonts
[Fonts]Han unification(SC and TC)(was..Re: Automatic 'lang' determination)
On Sat, 29 Jun 2002, Keith Packard wrote: Ooops. My message crossed yours in the mail :-) Around 9 o'clock on Jun 29, Jungshik Shin wrote: IMHO, most problems with Han unification arise not from using a _single_ font targeted at one of zh_TW/zh_CN/ja/ko to render a run of text in another, but from mixing _multiple_ fonts (with _drastically different_ design principles and other differences like baseline) to render a single ... Yes, I agree -- this is true in Western languages as well where the We agree with each other on this point, but still reach different conclusions about zh-CN and zh-TW. I'm afraid that's because you have been misinformed about what Han unification has done about simplified and traditional forms of Chinese characters. Suppose there's a document tagged as zh_TW that explains how the PRC government simplified Chinese characters to boost the literacy rate after WW II. If a Big5 font (that doesn't cover all characters in the doc) is selected instead of a GBK/GB18030 font (with full coverage), the simplified Han characters (not used in Taiwan but only in the PRC) in the doc have to be rendered with another font (most likely a GB2312/GBK/GB18030 font). A correct version of this document would tag individual sections of the document with appropriate tags. This way, the zh_TW sections could be presented in a traditional Chinese font while the mainland portions are displayed with simplified Chinese glyphs. Well, even without language tagging, that would happen, which I regard as _ugly_ for the reason I gave in my previous message. Language tag or not, the result would be just as ugly as using a Times Roman Latin-1 font for most characters with a couple of characters rendered with a Palatino Latin-2 font.
My hypothetical document would not have separate sections for zh-TW and zh-CN; rather, occasional simplified forms of Chinese characters (absent in Big5 fonts but present in GB2312/GBK/GB18030 fonts) would pop up among traditional forms of Chinese characters (present in _both_ Big5 fonts and GBK/GB18030 fonts). IMHO, tagging the whole document as 'zh-TW' is perfectly valid, and rendering it with a GBK/GB18030 font (with full coverage of the characters in the document) is better than mixing two fonts, one with Big5 coverage and the other with GBK/GB18030 coverage. The latter would happen if you excluded GBK/GB18030 fonts for zh-TW text rendering. Tagging individual simplified forms of Chinese characters with 'lang=zh-CN' in a sea of traditional forms of Chinese characters would only lead to a less desirable result than otherwise possible.

I'm not sure what you meant by 'glyph forms are more likely simplified'. You might have misunderstood some aspects of Han unification in Unicode/10646.

> > In Unicode, simplified forms of Chinese characters are NOT unified with
> > corresponding traditional forms of Chinese characters.
> You're right -- I didn't believe this to be the case. I had heard that the
> unified portion within the BMP does co-mingle simplified and traditional
> forms, but that the non-BMP Han extensions provide separate codepoints for
> each.

I'm afraid what you have heard about the BMP is misleading, if I understood you correctly. Whether in the BMP or not, simplified forms of Chinese characters are NOT UNIFIED with traditional forms of Chinese characters. (Let me copy my message to John H. Jenkins @ Apple, who knows a lot more about Han unification than I do.) AFAIK, most complaints about Han unification do NOT come from zh-CN vs. zh-TW BUT from zh-CN/zh-TW vs. ja. For Han characters common to both zh-CN and zh-TW, there's no significant difference in appearance between zh-CN and zh-TW.
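The point above is easy to verify directly: well-known simplified/traditional character pairs sit at distinct Unicode codepoints, all within the BMP. A small Python illustration (the three pairs below are standard simplified/traditional counterparts):

```python
# Simplified/traditional Chinese character pairs: each member has its own
# Unicode codepoint -- Han unification did NOT merge these forms.
pairs = [
    ("\u95e8", "\u9580"),  # 'gate':    U+95E8 simplified, U+9580 traditional
    ("\u56fd", "\u570b"),  # 'country': U+56FD simplified, U+570B traditional
    ("\u4e66", "\u66f8"),  # 'book':    U+4E66 simplified, U+66F8 traditional
]

for simp, trad in pairs:
    # Distinct codepoints confirm the two forms are separately encoded.
    assert ord(simp) != ord(trad)
    print(f"U+{ord(simp):04X} ({simp})  vs.  U+{ord(trad):04X} ({trad})")
```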
Although many Japanese would not agree with me, I don't think there's any significant difference across CJKV. (Again, the ISO 10646 Han chart is a good reference, along with the ROC MOE's Han character variant dictionary at http://140.111.1.40.) To me, Han unification should have gone further (not less far) in a sense, and it's worrisome to me that the non-BMP planes include too many glyph variants (a whole bunch of them coming from Korean Buddhist texts: see http://www.sutra.re.kr) that should have been unified in my eyes.

> If even BMP codepoints are separate, then it should be possible to create
> a large set of codepoints which could mark fonts as suitable for the
> display of simplified Chinese which are distinct from the set of
> codepoints suitable for the display of traditional Chinese. That would be
> nicer than my current kludge of marking any font suitable for traditional
> Chinese as unsuitable for simplified Chinese.

How about this?

    if covers most of GB 18030:
        good for both zh-CN and zh-TW (and possibly good for ko)
    elif covers most of GBK:
        good for both zh-CN and zh-TW (and possibly good for ko); not good for ja
    elif covers most of Big5:
        good for zh-TW (and possibly good for ko); not good for ja
    elif covers most
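The tiered heuristic quoted above (the message is truncated mid-sentence in the archive) can be sketched as follows. This is only an illustration: `covers_most_of` is a hypothetical callback standing in for a real coverage check, which in practice would compare the font's cmap against each encoding's repertoire (e.g. via fontconfig's charset facilities).

```python
def classify_font(covers_most_of):
    """Return the set of languages a font is presumed good for.

    covers_most_of: hypothetical callable taking an encoding name and
    returning True if the font covers most of that encoding's repertoire.
    """
    if covers_most_of("GB18030"):
        return {"zh-CN", "zh-TW"}   # and possibly good for ko
    if covers_most_of("GBK"):
        return {"zh-CN", "zh-TW"}   # possibly ko; not good for ja
    if covers_most_of("Big5"):
        return {"zh-TW"}            # possibly ko; not good for ja
    # The original message is truncated here; further branches
    # (e.g. for GB2312) would presumably follow the same pattern.
    return set()
```

For example, a font covering only Big5 would be classified as suitable for zh-TW alone.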
Re: [Fonts]Automatic 'lang' determination
On Sat, 29 Jun 2002, Yao Zhang wrote:

> It should be
>
>     if (covers_almost_all_of (GB2312))
>         font supports SIMPLIFIED Chinese
>     if (covers_almost_all_of (Big5))
>         font supports traditional Chinese

After sending my previous message, I read this, and I have to agree. This is better than what I sent earlier. Just forgetting about GB18030/GBK coverage and concentrating on GB2312 and Big5 coverage is simpler as well as better.

Jungshik Shin

___
Fonts mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/fonts
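Yao Zhang's simpler rule can be sketched in Python as below. The names `covers_almost_all_of` and `font_languages` and the 95% threshold are illustrative assumptions; a real implementation would enumerate the font's cmap (e.g. via FreeType or fontconfig) and the actual GB2312 and Big5 repertoires rather than the toy codepoint sets shown in the usage example.

```python
def covers_almost_all_of(font_codepoints, repertoire, threshold=0.95):
    """True if the font covers at least `threshold` of the repertoire.

    font_codepoints, repertoire: sets of Unicode codepoints.
    """
    if not repertoire:
        return False
    covered = len(repertoire & font_codepoints)
    return covered / len(repertoire) >= threshold

def font_languages(font_codepoints, gb2312_set, big5_set):
    # The two checks are independent: a font may support both
    # simplified and traditional Chinese.
    langs = []
    if covers_almost_all_of(font_codepoints, gb2312_set):
        langs.append("zh-CN")   # simplified Chinese
    if covers_almost_all_of(font_codepoints, big5_set):
        langs.append("zh-TW")   # traditional Chinese
    return langs

# Toy usage: stand-in repertoires, not the real GB2312/Big5 code charts.
gb2312_toy = set(range(0, 100))
big5_toy = set(range(50, 200))
font_toy = set(range(0, 120))       # full GB2312 coverage, partial Big5
print(font_languages(font_toy, gb2312_toy, big5_toy))  # ['zh-CN']
```

Unlike the tiered `if`/`elif` heuristic, both tests run unconditionally, which is what makes this version simpler and more robust.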