RE: glyph selection for Unicode in browsers
At 13:41 02/10/02 +0900, Martin Duerst wrote: >I'm not sure this is possible with Apache, maybe there is a need >for a RemoveCharset directive similar to RemoveType >(http://httpd.apache.org/docs/mod/mod_mime.html#removetype). >Or maybe there is some other way to get the same result. >If a new directive is desirable, then let's try to hack >the Apache code or to propose it to the Apache people. >Similar of course for other server implementations. Over lunch, a colleague told me that RemoveCharset has been added to Apache 2.0. See e.g. http://httpd.apache.org/docs-2.0/mod/mod_mime.html#removecharset. So the right thing to do may be to ask your ISP to upgrade to Apache 2.0. Regards, Martin.
RE: glyph selection for Unicode in browsers
At 12:14 02/10/01 -0400, [EMAIL PROTECTED] wrote: >I agree that 'sniffing' and 'guessing' are ill-defined, and not to be >relied upon. However, I find it a bit 'ill-defined' that there is no >well-defined (web server independent) way for the 'users' to override >the possibly wrong encoding default of the web server. Either way >(a) the user has to do something web server dependent >(b) the admin has to do changes to the site config >seems a bit clunky and fragile. > >Since the current "resolving order" is obviously already deployed out >there and relied upon by someone, it cannot be changed, but possibly >something new could be introduced? Well, servers can always be improved by the various server implementers. What standards specify is what goes 'over the wire'. The only thing you actually have to do is to make sure that the server doesn't add a 'charset' parameter to the Content-Type header for the directories you are using. Then the <meta> charset declaration in the page is the only info, and is used by the browser. I'm not sure this is possible with Apache, maybe there is a need for a RemoveCharset directive similar to RemoveType (http://httpd.apache.org/docs/mod/mod_mime.html#removetype). Or maybe there is some other way to get the same result. If a new directive is desirable, then let's try to hack the Apache code or to propose it to the Apache people. Similar of course for other server implementations. Regards, Martin.
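The resolution order described in this message can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what any particular browser does: the function names `http_charset` and `effective_encoding` are made up for this example, and the `iso-8859-1` fallback is an assumed default chosen to match the server behaviour reported in the thread.

```python
from email.message import Message


def http_charset(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def effective_encoding(content_type, meta_charset, fallback="iso-8859-1"):
    """Pick an encoding: HTTP header first, then the page's meta, then a fallback.

    The fallback value is an assumption for this sketch.
    """
    from_header = http_charset(content_type)
    if from_header:
        return from_header          # the server's claim wins
    if meta_charset:
        return meta_charset.lower() # only consulted when the header is silent
    return fallback


# Server adds a charset: the page's own declaration is ignored.
print(effective_encoding("text/html; charset=ISO-8859-1", "UTF-8"))
# Server sends no charset: the page's declaration is used.
print(effective_encoding("text/html", "UTF-8"))
```

The second call shows why removing the 'charset' parameter on the server side is enough to let the page's own declaration take effect.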
Re: glyph selection for Unicode in browsers
At 03:22 AM 30-09-02, [EMAIL PROTECTED] wrote: > >I think the idea is that, in a word processor for example > >What would you say about a browser? Probably something about extended style sheets that include typographic system tagging. Ideally, as a typographer, I would like something like CSS that includes a tag for every registered OpenType Layout feature -- and OT 'language system' tagging that sits below the level of document language tagging etc. --, so that I can create sophisticated online documents with the same level of typographic control as I have for print documents. I realise that it may be necessary to dress this up as higher-level markup that is not specific to any proprietary technology. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] Those books that allow us to forget the most are accorded the status of a classic. - James Secord
RE: glyph selection for Unicode in browsers
> Sniffing isn't a good idea in the long term. It may work > for simple web page serving, but as soon as you go XML and > start to move data around without the user having a chance > to see it frequently, you'll end up with a big mess. > > Also, 'guessing' is very ill-defined. You might serve > a document to your favorite browser, and it looks okay. > But other browsers might guess a bit differently, or > a new version of your favorite browser may guess a bit > differently, and off you are. I agree that 'sniffing' and 'guessing' are ill-defined, and not to be relied upon. However, I find it a bit 'ill-defined' that there is no well-defined (web server independent) way for the 'users' to override the possibly wrong encoding default of the web server. Either way (a) the user has to do something web server dependent (b) the admin has to do changes to the site config seems a bit clunky and fragile. Since the current "resolving order" is obviously already deployed out there and relied upon by someone, it cannot be changed, but possibly something new could be introduced?
RE: glyph selection for Unicode in browsers
At 07:37 02/09/26 +0900, [EMAIL PROTECTED] wrote: >I would be happy if just this > > > >would be enough to convince the browsers that the page is in UTF-8... >It isn't if the HTTP server claims that the pages it serves are in >ISO 8859-1. A sample of this is http://www.iki.fi/jhi/jp_utf8.html, >it does have the meta charset, but since the webserver (www.hut.fi, >really, a server outside of my control) thinks it's serving Latin 1, >I cannot help the wrong result. (I guess some browsers might do better >work at sniffing the content of the page, but at least IE6 and Opera 6.05 >on Win32 seem to believe the server rather than the (HTML of the) page. Sniffing isn't a good idea in the long term. It may work for simple web page serving, but as soon as you go XML and start to move data around without the user having a chance to see it frequently, you'll end up with a big mess. Also, 'guessing' is very ill-defined. You might serve a document to your favorite browser, and it looks okay. But other browsers might guess a bit differently, or a new version of your favorite browser may guess a bit differently, and off you are. Regards, Martin.
Re: glyph selection for Unicode in browsers
On 09/29/2002 12:53:14 PM tiro wrote: >I think the idea is that, in a word processor for example What would you say about a browser? - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: <[EMAIL PROTECTED]>
Re: glyph selection for Unicode in browsers
Quoting [EMAIL PROTECTED]: > But should there not be some (possibly user-overridable) relationship > between an NLS or similar tag (e.g. "lang" in HTML or xml:lang) and one of > these so that a browser or word-processing app that knows what "language" > (e.g. what RFC 3066 tag) is applied to the data can tell the > layout/rendering sub-system what OT "language-system" tags to apply > (assuming some API exists to do so)? Surely that is where we want to move > toward. I think the idea is that, in a word processor for example, something like 'Typographic system' would be set by the user as an independent layout control, not directly linked to 'language'. This enables the user to select a language to use for sorting, spellchecking, etc. (character level text handling), and separately select a set of typographic conventions (glyph level text display). I suppose some developers may choose to pursue the direction you suggest, e.g. relating default typographic conventions to the user's language setting. I just make the fonts :) John Hudson
Re: glyph selection for Unicode in browsers
Peter Constable wrote, > >Once the font specs for all this are set and fonts are released with > >the necessary coverage and the shaping engines can access all of this, > >the browsers are sure to quickly add support, too. > > I'm not quite as optimistic in terms of how close we are to having all this > ready to go. I think there's some hard work still ahead. > Oh, certainly. There's always hard work still ahead, it seems. A possible option for dealing with language specific font selection would be to enhance the user font preference options in browsers to include "lang" font assignments under the existing script assignments. To assist users in setting up such preferences, perhaps some kind of font information registry could be collected and published in order to avoid the duplication of effort which would be required if every user had to separately contact the various font developers asking about language suitability. I'd expect this kind of user preference option to be under "advanced" settings... Or, perhaps this degree of font selection could be handled at the HTML/XML authoring tool level. In other words, an author could set up "lang" tag preferences once, and then the tool could automatically set up CSS with appropriate font selection information based upon the "lang" tags used in the document. Or, the browser folks themselves could maintain an internal listing of "lang" tags and suitable fonts and apply them invisibly as default if no other font is selected. Of course, this still would mean much hard work ahead, but it would shift the burden from OpenType developers to browser developers. (smile) Best regards, James Kass.
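The authoring-tool option mentioned above (set up "lang" preferences once, then let the tool emit CSS) might look roughly like the following Python sketch. The function name `css_for_langs` and the font names are placeholders invented for this example; only the CSS `:lang()` selector syntax comes from the CSS specification.

```python
def css_for_langs(lang_fonts):
    """Build one CSS :lang() rule per language tag from an author's font preferences."""
    rules = []
    for lang in sorted(lang_fonts):
        # Quote each family name so names containing spaces stay valid CSS.
        families = ", ".join('"%s"' % name for name in lang_fonts[lang])
        rules.append(":lang(%s) { font-family: %s; }" % (lang, families))
    return "\n".join(rules)


# Hypothetical preferences; the font names are placeholders, not real products.
prefs = {
    "ja": ["Example Mincho"],
    "zh-TW": ["Example Ming", "Example Fallback"],
}
print(css_for_langs(prefs))
```

Whether a given browser honours the generated rules is a separate question, which is the point of the "hard work ahead" remark.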
Re: glyph selection for Unicode in browsers
On 09/28/2002 04:47:49 AM tiro wrote: >'Language system' (not 'language') in the OpenType specification actually means >*writing* system, i.e. a particular set of orthographic/typographic conventions >associated with the use of a particular script. 'Language system' is a >misnomer -- an historical artifact of the incomplete understanding of the >format's original designers --, and it has caused all sorts of confusion, >especially among people who assume that the OT 'language system' tags must have >some relationship to things like NLS tags. There is no necessary relationship >and, indeed, it is possible to conceive of a user wanting to apply, for >instance, the typographic conventions of German to a language other than German. But should there not be some (possibly user-overridable) relationship between an NLS or similar tag (e.g. "lang" in HTML or xml:lang) and one of these so that a browser or word-processing app that knows what "language" (e.g. what RFC 3066 tag) is applied to the data can tell the layout/rendering sub-system what OT "language-system" tags to apply (assuming some API exists to do so)? Surely that is where we want to move toward. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: <[EMAIL PROTECTED]>
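A user-overridable relationship of the kind asked about here could be sketched as a lookup table plus an override map. This is a hypothetical illustration in Python: the function `langsys_for`, the table contents, and the choice to return `None` for "use the script's default language system" are all assumptions of the sketch, and no real layout API is being described. The four-character tags are written with the trailing space used for OpenType tags.

```python
# Assumed default mapping from (lowercased) RFC 3066 tags to OT language system tags.
DEFAULT_LANGSYS = {
    "de": "DEU ",
    "ja": "JAN ",
    "ko": "KOR ",
    "zh-cn": "ZHS ",
    "zh-tw": "ZHT ",
}


def langsys_for(lang_tag, overrides=None):
    """Map a language tag to an OT language system tag, honouring user overrides.

    Subtags are dropped from the right until a match is found; None means
    "no specific language system, use the script's default".
    """
    table = dict(DEFAULT_LANGSYS)
    if overrides:
        table.update({key.lower(): value for key, value in overrides.items()})
    parts = lang_tag.lower().split("-")
    while parts:
        key = "-".join(parts)
        if key in table:
            return table[key]
        parts.pop()
    return None


print(langsys_for("de-AT"))               # falls back from de-at to de
print(langsys_for("sv", {"sv": "DEU "}))  # user asks for German conventions on Swedish text
```

The override in the last line keeps the data tagged as Swedish while requesting a different set of typographic conventions, so the two notions stay separate even when a default link exists.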
Re: glyph selection for Unicode in browsers
[EMAIL PROTECTED] scripsit: > There is no necessary relationship > and, indeed, it is possible to conceive of a user wanting to apply, for > instance, the typographic conventions of German to a language other than German. Indeed, if one is doing early modern Swedish, that is exactly what one wants, IIRC. -- One art / There is John Cowan <[EMAIL PROTECTED]> No less / No more http://www.reutershealth.com All things / To do http://www.ccil.org/~cowan With sparks / Galore -- Douglas Hofstadter
Re: glyph selection for Unicode in browsers
Can anyone clarify this one? The Microsoft page at http://www.microsoft.com/typography/OTSPEC/indicot/default.htm says that Malayalam chillu glyphs are formed when inputting (consonant)+(virama). Can I use another formation for chillus? I want to use (consonant)+(virama)+(ZWJ); is there any problem with that? And is there any problem if I define the ligature formation this way in the OpenType tables? Regards, Baiju M --- [EMAIL PROTECTED] wrote: > Quoting [EMAIL PROTECTED]: > > > Actually, my point was specifically that *part* of the infrastructure is already present, at least in OpenType, but not *all*, either in OpenType (meaning of "language" in the OT spec needs to be clarified, and relationships between these tags and the "language" tags used for data e.g. RFC 3066, need to be resolved)... > > 'Language system' (not 'language') in the OpenType specification actually means *writing* system, i.e. a particular set of orthographic/typographic conventions associated with the use of a particular script. 'Language system' is a misnomer -- an historical artifact of the incomplete understanding of the format's original designers --, and it has caused all sorts of confusion, especially among people who assume that the OT 'language system' tags must have some relationship to things like NLS tags. There is no necessary relationship and, indeed, it is possible to conceive of a user wanting to apply, for instance, the typographic conventions of German to a language other than German. > > I've suggested to Microsoft and Adobe that the term used in the spec should be changed, or at least annotated. > > John Hudson
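For readers who want to see the two candidate sequences from the question at the code point level, here is a small Python sketch. It only spells out the sequences; it takes no position on which one a font or shaping engine should map to a chillu glyph. The choice of MALAYALAM LETTER NA as the sample consonant and the helper name `describe` are assumptions of the example.

```python
import unicodedata

NA = "\u0D28"      # sample consonant, chosen only for illustration
VIRAMA = "\u0D4D"
ZWJ = "\u200D"


def describe(sequence):
    """List each code point of a string with its Unicode character name."""
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch)) for ch in sequence]


candidates = (
    ("(consonant)+(virama)", NA + VIRAMA),
    ("(consonant)+(virama)+(ZWJ)", NA + VIRAMA + ZWJ),
)
for label, sequence in candidates:
    print(label)
    for line in describe(sequence):
        print("  " + line)
```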
Re: glyph selection for Unicode in browsers
Quoting [EMAIL PROTECTED]: > Actually, my point was specifically that *part* of the infrastructure is > already present, at least in OpenType, but not *all*, either in OpenType > (meaning of "language" in the OT spec needs to be clarified, and > relationships between these tags and the "language" tags used for data e.g. > RFC 3066, need to be resolved)... 'Language system' (not 'language') in the OpenType specification actually means *writing* system, i.e. a particular set of orthographic/typographic conventions associated with the use of a particular script. 'Language system' is a misnomer -- an historical artifact of the incomplete understanding of the format's original designers --, and it has caused all sorts of confusion, especially among people who assume that the OT 'language system' tags must have some relationship to things like NLS tags. There is no necessary relationship and, indeed, it is possible to conceive of a user wanting to apply, for instance, the typographic conventions of German to a language other than German. I've suggested to Microsoft and Adobe that the term used in the spec should be changed, or at least annotated. John Hudson
Re: glyph selection for Unicode in browsers
On 09/27/2002 10:56:00 AM Jungshik Shin wrote: >> Again, the problem is knowing just *how* they should go about doing this. > > As for 'how', what MS IE and Mozilla do may not be as user-friendly >as Tex wants them to be, but I think it's pretty reasonable at >least for CJK. If they're configured to use different Unicode-cmapped >(non-Pan-script) fonts for TC/SC/J/K (as opposed to pan-script Unicode >fonts like MS Arial Unicode, Cyberbit), runs of text tagged with TC/SC/J/K >are rendered with fonts configured for TC,SC,J and K, respectively. A couple of notes: Speaking in generalities, a font that isn't a "pan-script Unicode" font potentially can support TC/SC/J/K equally well with glyphs suited to users in each culture -- but not using default character-to-glyph mappings. The mechanisms available to IE or Mozilla today would not provide any means to determine which typographic preferences are supported by default in a given font. Nor does the infrastructure exist that will allow these apps to request the culturally-preferred glyphs that would exist in such fonts. Of course, in practice, many currently-existing CJK fonts may have been developed to support a single group of users, and don't include alternate glyphs that might be preferred by users in other cultures. Also, what IE and Mozilla currently do helps with the CJK issues, but these apps don't do anything, that I know of, in relation to comparable issues for other scripts, e.g. language-related preferences for Latin diacritics or Cyrillic italic forms. Which you anticipate: >I guess you already know this much and what you're alluding >to is a problem of another dimension: developing ( Pan-script >if necessary/possible) Unicode fonts with multiple lang-dependent >glyphs Yes (with the added note that the pan-script element is orthogonal to what I'm referring to).
>it seems like selecting lang-dependent glyphs for >Latin/Cyrillic letters are more difficult than CJK case I'm not sure; I haven't thought about that, in part because I have only limited knowledge of what glyph-variation issues there are for most scripts. >The font >selection part of these problems is addressed by fontconfig package by >Keith Packard (http://fontconfig.org). Of course, there should be >other implementations of/attempts at this problem. The fontconfig library is entirely new to me. Thanks for the link. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: <[EMAIL PROTECTED]>
Re: glyph selection for Unicode in browsers
Jungshik, I used characters that should display differently. Some punctuation and some like (I think it is) the bone character. However, feel free to suggest a list of characters that should be distinctive and I'll post a page with them so that we can all check whether there are differences or not on various platforms, browsers, etc. I agree for many characters there should be no differences. tex Jungshik Shin wrote: > Actually, you might have had a hard time telling the display difference > depending on what characters you used for your testing EVEN IF you > configured browsers to use different (but with __very similar__ design > principles and look/feels) Unicode-cmapped (but NON-pan-script) fonts > for TC,SC, J and K *under MS Windows*. This difficulty demonstrates > that CJK Unification in Unicode/10646 is not such a big problem as some > people tried to make it out to be. > > Jungshik -- - Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World -
Re: glyph selection for Unicode in browsers
On Fri, 27 Sep 2002 [EMAIL PROTECTED] wrote: > > On 09/26/2002 10:46:42 PM Andrew Cunningham wrote: > > >For me, this is the crux: that browsers have not implemented the css > >:lang selector. As I wrote in my response to Tex, css 'lang' pseudo-class is honored by MS IE and Mozilla 1.x/Netscape 7. > Again, the problem is knowing just *how* they should go about doing this. As for 'how', what MS IE and Mozilla do may not be as user-friendly as Tex wants them to be, but I think it's pretty reasonable at least for CJK. If they're configured to use different Unicode-cmapped (non-pan-script) fonts for TC/SC/J/K (as opposed to pan-script Unicode fonts like MS Arial Unicode, Cyberbit), runs of text tagged with TC/SC/J/K are rendered with fonts configured for TC, SC, J and K, respectively. I guess you already know this much, and what you're alluding to is a problem of another dimension: developing (pan-script, if necessary/possible) Unicode fonts with multiple lang-dependent glyphs (if that's possible at all, after solving the various subtle issues involved; selecting lang-dependent glyphs for Latin/Cyrillic letters seems more difficult than in the CJK case) and getting apps and rendering/font-selection libraries to make use of them. The font selection part of these problems is addressed by the fontconfig package by Keith Packard (http://fontconfig.org). Of course, there should be other implementations of/attempts at this problem. Jungshik Shin
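The per-language font configuration described above amounts to a lookup from the language tag of a run to a configured font, with a pan-script font as the last resort. A rough Python sketch follows; it is not how IE or Mozilla are implemented, and `font_for_run`, the `CONFIGURED` table, and every font name in it are placeholders made up for the example.

```python
# Hypothetical user configuration: one font per CJK typographic tradition.
CONFIGURED = {
    "zh-tw": "Example TC Font",
    "zh-cn": "Example SC Font",
    "ja": "Example J Font",
    "ko": "Example K Font",
}
PAN_SCRIPT = "Example Pan-Script Font"


def font_for_run(lang, configured, pan_script_font):
    """Choose a font for a run of text from its language tag."""
    tag = lang.lower()
    if tag in configured:
        return configured[tag]
    # Retry with the primary subtag only, e.g. "ja-JP" -> "ja".
    primary = tag.split("-")[0]
    return configured.get(primary, pan_script_font)


for lang in ("zh-TW", "zh-CN", "ja-JP", "ko", "ru"):
    print(lang, "->", font_for_run(lang, CONFIGURED, PAN_SCRIPT))
```

Note that this selects whole fonts per language; choosing language-dependent glyphs inside a single font is the separate, harder problem raised in the message.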
Re: glyph selection for Unicode in browsers
On Fri, 27 Sep 2002, Tex Texin wrote: > Jungshik Shin wrote: > > > > On Thu, 26 Sep 2002, Tex Texin wrote: > > > > > Yes, OS and browsers are getting better. My concerns center around: > > > Is the mechanism for selecting fallback fonts language-sensitive, so > > > that it would favor a Japanese font for Unicode Han characters that were > > > tagged as lang:ja > > > > I'm a little at loss as to why you have the impression > > that 'lang' tag has little effect on rendering of html (in > > UTF-8. e.g. your page or IUC10 announcement page which used to be at > > http://www.unicode.org/iuc/iuc10/x-utf8.html) by major browsers. MS > > IE has been making use of 'lang' attribute(html) for a long time and > > Mozilla solved the problem (although 'xml:lang' is not yet supported) > > last December. In case of Mozilla(and Netscape 7), see > I am glad to see the issue has been given some attention. > I concluded there was a problem after experimenting with some CJK > characters that I repeated with different lang tags and could not get > any display differences unless I used non-Unicode fonts assigned to each > language. I did this with IE 6 and NS 7 and Opera (dont recall if it was > 6 or 7.) Actually, you might have had a hard time telling the display difference depending on what characters you used for your testing EVEN IF you configured browsers to use different (but with __very similar__ design principles and look/feels) Unicode-cmapped (but NON-pan-script) fonts for TC, SC, J and K *under MS Windows*. This difficulty demonstrates that CJK Unification in Unicode/10646 is not such a big problem as some people tried to make it out to be. Jungshik
Re: glyph selection for Unicode in browsers
John Cowan wrote: > > Tex Texin scripsit: > > > An author of a primarily Japanese document could choose not to tag > > Chinese text as Chinese, and so get a Japanese rendering of the text, > > but that could hurt search engines or other applications that use > > language tags for purposes other than rendering... > > Indeed, indeed. Tagging (even implicit tagging) with a false language is > a very bad idea. > > > So I stick with the > > idea that text should be tagged with language appropriately, and a user > > that reads Japanese and prefers to see Chinese text with Japanese glyphs > > have the ability to override the language tags to affect rendering. > > The trouble is that that's the default for a Japanese reader reading > mixed-language text. No override should be required. It's not a big trouble. browsers already have options such as netscape's preferences under fonts, radio button: use fonts specified in document vs. override and use user-defined fonts. > > > I can't say if "typographical tradition preference" (TTP) is the correct > > term for "language preference". (I figure I got into enough trouble > > using "typographically correct".) I hope the discussion above was clear > > enough. I'll let others comment on TTP, and if there is general > > agreement that it is a better and more precise and accurate term, I am > > fine with it. > > My point was that it's one thing to want Chinese text displayed with > Japanese glyphs, based on a typographical-tradition preference, and it's > another thing to want the text in a Japanese-language version, > which is what setting a "language preference" would suggest. > > > I am not familiar enough with Fraktur and Antiqua to > > knowledgably comment. From what little I do know this seems to require > > more than language information to decide between them. > > Absolutely. 
> The analogy is that Fraktur is quite, or nearly, illegible if > all you know how to read is Antiqua (which looks like what you are seeing > now, ordinary Latin-script type). This makes the difference greater than a > mere font difference. OK, but it is not clear to me that we should try to fix this problem in the same way we fix the CJK rendering problem. Language is being tagged and provided for a number of reasons, and it should be utilized. Fraktur/Antiqua and other distinctions might need a different mechanism. tex > > -- > Business before pleasure, if not too bloomering long before. > --Nicholas van Rijn > John Cowan <[EMAIL PROTECTED]> > http://www.ccil.org/~cowan http://www.reutershealth.com -- - Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World -
Re: glyph selection for Unicode in browsers
Hi, I am glad to see the issue has been given some attention. I concluded there was a problem after experimenting with some CJK characters that I repeated with different lang tags and could not get any display differences unless I used non-Unicode fonts assigned to each language. I did this with IE 6 and NS 7 and Opera (dont recall if it was 6 or 7.) tex Jungshik Shin wrote: > > On Thu, 26 Sep 2002, Tex Texin wrote: > > > Yes, underlying fonts can be a Unicode architecture. That's a good > > thing, but invisible to end-users. > > I would like to keep the sense of "Unicode font" as meaning a font which > > supports a large number of scripts, rather than meaning one that uses > > Unicode for its mapping architecture. > > > > Yes, OS and browsers are getting better. My concerns center around: > > Is the mechanism for selecting fallback fonts language-sensitive, so > > that it would favor a Japanese font for Unicode Han characters that were > > tagged as lang:ja > > I'm a little at loss as to why you have the impression > that 'lang' tag has little effect on rendering of html (in > UTF-8. e.g. your page or IUC10 announcement page which used to be at > http://www.unicode.org/iuc/iuc10/x-utf8.html) by major browsers. MS > IE has been making use of 'lang' attribute(html) for a long time and > Mozilla solved the problem (although 'xml:lang' is not yet supported) > last December. In case of Mozilla(and Netscape 7), see > > http://bugzilla.mozilla.org/show_bug.cgi?id=105199 (fixed. >where you'll find a pair of screenshots with dramatically >different rendering results) > http://bugzilla.mozilla.org/show_bug.cgi?id=115121 > (xml:lang : not yet fixed) > http://bugzilla.mozilla.org/show_bug.cgi?id=122779 (C-L http header > and UTF-8 document) > > > And are the fonts labeled so that the supported language is known? > > Judging from the discussion about the issue in Xfree86-font > list, most of modern OTFs are. 
> Otherwise, applications (or a library > for text rendering/font selection) can resort to a kind of mapping the > character repertoire of a font to language(s) covered as is done by > fontconfig for XFree86. For instance, characters in JIS X 0208 are all > covered, but characters from GB2312, Big5 and KS X 1001 are missing, > a font is likely to be Japanese. > > > Even so, I'd still need to have a large collection of fonts then. > > Indeed that's the case. If OT lang-tag is made use of and > multiple alternative glyphs are available in a single(or > a few) pan-script Unicode font(s), you'd not have to. > > Jungshik -- - Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World -
Re: glyph selection for Unicode in browsers
Tex Texin scripsit: > An author of a primarily Japanese document could choose not to tag > Chinese text as Chinese, and so get a Japanese rendering of the text, > but that could hurt search engines or other applications that use > language tags for purposes other than rendering... Indeed, indeed. Tagging (even implicit tagging) with a false language is a very bad idea. > So I stick with the > idea that text should be tagged with language appropriately, and a user > that reads Japanese and prefers to see Chinese text with Japanese glyphs > have the ability to override the language tags to affect rendering. The trouble is that that's the default for a Japanese reader reading mixed-language text. No override should be required. > I can't say if "typographical tradition preference" (TTP) is the correct > term for "language preference". (I figure I got into enough trouble > using "typographically correct".) I hope the discussion above was clear > enough. I'll let others comment on TTP, and if there is general > agreement that it is a better and more precise and accurate term, I am > fine with it. My point was that it's one thing to want Chinese text displayed with Japanese glyphs, based on a typographical-tradition preference, and it's another thing to want the text in a Japanese-language version, which is what setting a "language preference" would suggest. > I am not familiar enough with Fraktur and Antiqua to > knowledgably comment. From what little I do know this seems to require > more than language information to decide between them. Absolutely. The analogy is that Fraktur is quite, or nearly, illegible if all you know how to read is Antiqua (which looks like what you are seeing now, ordinary Latin-script type). This makes the difference greater than a mere font difference. -- Business before pleasure, if not too bloomering long before. --Nicholas van Rijn John Cowan <[EMAIL PROTECTED]> http://www.ccil.org/~cowan http://www.reutershealth.com
Re: glyph selection for Unicode in browsers
On Thu, 26 Sep 2002, Tex Texin wrote: > Yes, underlying fonts can be a Unicode architecture. That's a good > thing, but invisible to end-users. > I would like to keep the sense of "Unicode font" as meaning a font which > supports a large number of scripts, rather than meaning one that uses > Unicode for its mapping architecture. > > Yes, OS and browsers are getting better. My concerns center around: > Is the mechanism for selecting fallback fonts language-sensitive, so > that it would favor a Japanese font for Unicode Han characters that were > tagged as lang:ja I'm a little at loss as to why you have the impression that 'lang' tag has little effect on rendering of html (in UTF-8. e.g. your page or IUC10 announcement page which used to be at http://www.unicode.org/iuc/iuc10/x-utf8.html) by major browsers. MS IE has been making use of 'lang' attribute(html) for a long time and Mozilla solved the problem (although 'xml:lang' is not yet supported) last December. In case of Mozilla(and Netscape 7), see http://bugzilla.mozilla.org/show_bug.cgi?id=105199 (fixed. where you'll find a pair of screenshots with dramatically different rendering results) http://bugzilla.mozilla.org/show_bug.cgi?id=115121 (xml:lang : not yet fixed) http://bugzilla.mozilla.org/show_bug.cgi?id=122779 (C-L http header and UTF-8 document) > And are the fonts labeled so that the supported language is known? Judging from the discussion about the issue in Xfree86-font list, most of modern OTFs are. Otherwise, applications (or a library for text rendering/font selection) can resort to a kind of mapping the character repertoire of a font to language(s) covered as is done by fontconfig for XFree86. For instance, characters in JIS X 0208 are all covered, but characters from GB2312, Big5 and KS X 1001 are missing, a font is likely to be Japanese. > Even so, I'd still need to have a large collection of fonts then. Indeed that's the case. 
If the OT lang-tag were made use of and multiple alternative glyphs were available in a single (or a few) pan-script Unicode font(s), you wouldn't have to. Jungshik
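The repertoire heuristic mentioned in this message (a font that covers all of JIS X 0208 but not the other national sets is probably Japanese) can be sketched as a set comparison. The Python below is a toy version only: `likely_languages` is a made-up name, the `SAMPLE` sets are tiny illustrative stand-ins rather than the real repertoires of any standard, and this is not fontconfig's actual algorithm.

```python
# Tiny stand-in repertoires for illustration; NOT the real character sets.
SAMPLE = {
    "ja": set("あア日本"),
    "ko": set("한글"),
    "zh-cn": set("汉语"),
}


def likely_languages(font_coverage, repertoires, threshold=1.0):
    """Return languages whose required characters the font covers well enough.

    font_coverage: set of characters the font has glyphs for.
    threshold: fraction of a repertoire that must be covered (1.0 = all of it).
    """
    matches = []
    for lang, required in repertoires.items():
        covered = len(required & font_coverage) / len(required)
        if covered >= threshold:
            matches.append(lang)
    return sorted(matches)


# A font covering the kana and a few Han characters, but no Hangul.
font = set("あいうアイウ日本語")
print(likely_languages(font, SAMPLE))
```

With real data the repertoires would be far larger, and a threshold slightly below 1.0 might be needed to tolerate fonts with a handful of gaps.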
Re: glyph selection for Unicode in browsers
[EMAIL PROTECTED] wrote: > > On 09/26/2002 08:55:18 PM Tex Texin wrote: > >In the case of HTML, XML, CSS, ways to specify typographic preferences > >exist, and language can be expressed via "lang". We just need browsers > >and other user agents to make use of the lang information as part of > >font selection. > > The difficult question is How? Do we want some means (not codepage) to know > that certain fonts are suited to particular languages? Fonts already span multiple languages, so that by itself would not work. > Or do we want to > make use of smart-font capabilities to allow culturally-preferred glyphs to > be selected from a font? If the latter, then some more infrastructure still > needs to be developed within APIs and layout engines. I think yes. APIs have relied on codepages, either implicitly or explicitly; this needs to be reexamined, and language should be allowed to play a suitable role in glyph selection and font selection. -- - Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World -
Re: glyph selection for Unicode in browsers
John, Thanks for commenting. Responses embedded. John Cowan wrote: > > Tex Texin scripsit: > > > I do need to point out that user preference is problematic if it means > > that for a user to display a multilingual document, the user has to go > > thru and specify font preferences for languages they know nothing about. > > How can this be avoided? If I print a document containing a small amount > of text in Georgian (in a bibliography entry, say), I am not going to > know if the Georgian font is the most beautiful thing ever made or one > that is utterly illegible. I have to pass it to someone who can read > Georgian and wait for the "Aah!" or "Arrgh!" as the case may be. > > Or I can take the default and hope for the best. All I ask is the defaults be adequate. I wouldn't disallow software from providing for users to express preferences. I am trying to avoid it being required of users to provide preferences. Yes, most users don't know which fonts are the best choices. > > > Just because I don't read CJK, doesn't mean I don't have legitimate > > needs to display or print CJK in a typographically correct way. > > Librarians, Commerce exchanges, mailing lists, localizers, etc. > > Since the issue is not really a matter of language, but of typographic > tradition (see John Jenkins's excellent discussion of this question at > http://www.unicode.org/unicode/faq/han_cjk.html#3), there is no such thing > as a "typographically correct way". In particular (as noted in the FAQ), > it is commonplace for a Japanese document that quotes Chinese text to > use Japanese-style glyphs for both languages, as this is apparently less > jarring to the average Japanese reader. "Typographically correct" was too strong. I am just looking for the font to reflect the language, so CJK is displayed as either C or J or K as indicated by HTML or XML lang tags. With respect to the comment from John's FAQ, it is reasonable but only for a user who is primarily or strongly a C, or J or K reader. 
For many applications, such as printing labels for card catalogs or mailing lists, the user's preference does not matter (because the printout targets someone other than the person operating the software). Also, for someone like myself who is not a CJK reader, I would like text displayed the same way each time so I stand a better chance of recognizing it. As more people work with multilingual data, I think more users will be like myself. An author of a primarily Japanese document could choose not to tag Chinese text as Chinese, and so get a Japanese rendering of the text, but that could hurt search engines or other applications that use language tags for purposes other than rendering... So I stick with the idea that text should be tagged with language appropriately, and a user who reads Japanese and prefers to see Chinese text with Japanese glyphs should have the ability to override the language tags to affect rendering. > > > But although you didn't quite say this, a user could provide a > > preference not for font, but language, i.e. if the script is CJK, > > display it as C or J or K (or T). And given the language the font > > mechanisms would do a reasonable thing. > > That is reasonable provided you grasp what is meant by "language > preference" here: namely, typographical tradition preference. It would > be like choosing between Fraktur and Antiqua when reading German text: > this too is rather broader than a mere font difference. I am not a typographer, and I am just trying to point out requirements for font selection for a typical user or at least a user that is not a linguist, not a typographer, not a font specialist, and who wants to display/print "pan-Unicode" or "pan-script Unicode-based" text. I am not trying to address high-end publishing requirements. I can't say if "typographical tradition preference" (TTP) is the correct term for "language preference". (I figure I got into enough trouble using "typographically correct".) 
I hope the discussion above was clear enough. I'll let others comment on TTP, and if there is general agreement that it is a better and more precise and accurate term, I am fine with it. I am not familiar enough with Fraktur and Antiqua to comment knowledgeably. From what little I do know, choosing between them seems to require more than language information. (I did find an interesting article on Fraktur, though, while trying to understand your meaning: http://www.waldenfont.com/public/gbpmanual.pdf) hth tex p.s. I am about to travel and may not have email for a few days. (A cheer goes up from the list...)
Re: glyph selection for Unicode in browsers
On 09/27/2002 12:27:22 AM jameskass wrote: >Don't despair. As Peter Constable has pointed out, the infrastructure >for having browsers support language tags is already present. Actually, my point was specifically that *part* of the infrastructure is already present, at least in OpenType, but not *all*, either in OpenType (meaning of "language" in the OT spec needs to be clarified, and relationships between these tags and the "language" tags used for data e.g. RFC 3066, need to be resolved), or in APIs (there's no way for apps to indicate which OT "language" tag to apply to a run unless the app wishes to do *all* of the OT support -- replacing e.g. Uniscribe -- itself). >Once the font specs for all this are set and fonts are released with >the necessary coverage and the shaping engines can access all of this, >the browsers are sure to quickly add support, too. I'm not quite as optimistic in terms of how close we are to having all this ready to go. I think there's some hard work still ahead. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: <[EMAIL PROTECTED]>
RE: glyph selection for Unicode in browsers
On 09/26/2002 07:24:08 PM "Murray Sargent" wrote: >I don't think the idea is that codepage equals language. Rather codepage >equals a writing system, which consists of one or more scripts (e.g., 6 >scripts for ShiftJIS). As such the codepage is a useful cue in choosing >an appropriate font for rendering text. (Murray and I talked about this some at dinner a couple of weeks ago, so there's some history here.) I don't think things are quite that simple. A codepage *can* be a useful cue in choosing an appropriate font (or in choosing typographic preferences by whatever means). This certainly may be the case in some instances, such as Shift JIS. But it's not always the case. For instance, cp1251 doesn't tell you what language is involved, and isn't sufficient to determine which italic variants of certain Cyrillic characters are needed. Similarly, cp1250 doesn't tell you what cultural preferences should apply in relation to design and alignment of the ogonek diacritic (e.g. Polish and Lithuanian differ in this regard), or other diacritics (e.g. caron should have a distinct form for Czech); and cp1252 doesn't tell you about cultural preferences regarding cedilla (three different forms can be used for French, but only one is acceptable for Portuguese or Catalan). That's why I maintain that a codepage is a character set, but not a writing system. In general, a codepage does not determine a set of rules for writing; it just provides a vocabulary with which to work. >The bottom line is that if text was generated using a particular >codepage it's likely that the creator of that text intended the text to >be rendered with a font that supports that codepage. Of course, fonts can support multiple codepages. Given e.g. Arial, Tahoma and Verdana, they all support codepages 1250, 1251, 1252, 1253, 1254, 1257 and 1258. That doesn't tell you whether they're appropriate for Polish or Lithuanian or Czech or whatever. 
Even the fact that they support cp1258 doesn't imply that they are appropriate for Vietnamese: e.g. the default glyphs in Arial for U+1EA5 and U+1EA7 do not have the diacritics stacked in the way needed for Vietnamese. I'm not saying that codepage information isn't ever useful. Obviously, you have found it very useful. But the usefulness has limits. - Peter
Re: glyph selection for Unicode in browsers
On 09/26/2002 08:55:18 PM Tex Texin wrote: >Yes code page was not a good indicator of language, but it was used that >way by some applications. > >And yes, Language should not dominate font selection, it should >influence it. Other typographic preferences also must be accommodated. I agree. >In the case of HTML, XML, CSS, ways to specify typographic preferences >exist, and language can be expressed via "lang". We just need browsers >and other user agents to make use of the lang information as part of >font selection. The difficult question is how. Do we want some means (not codepage) to know that certain fonts are suited to particular languages? Or do we want to make use of smart-font capabilities to allow culturally preferred glyphs to be selected from a font? If the latter, then some more infrastructure still needs to be developed within APIs and layout engines. - Peter
Re: glyph selection for Unicode in browsers
On 09/26/2002 10:46:42 PM Andrew Cunningham wrote: >For me, this is the crux: that browsers have not implemented the CSS >:lang selector. Again, the problem is knowing just *how* they should go about doing this. - Peter
Re: glyph selection for Unicode in browsers
Tex Texin scripsit: > I do need to point out that user preference is problematic if it means > that for a user to display a multilingual document, the user has to go > thru and specify font preferences for languages they know nothing about. How can this be avoided? If I print a document containing a small amount of text in Georgian (in a bibliography entry, say), I am not going to know if the Georgian font is the most beautiful thing ever made or one that is utterly illegible. I have to pass it to someone who can read Georgian and wait for the "Aah!" or "Arrgh!" as the case may be. Or I can take the default and hope for the best. > Just because I don't read CJK, doesn't mean I don't have legitimate > needs to display or print CJK in a typographically correct way. > Librarians, Commerce exchanges, mailing lists, localizers, etc. Since the issue is not really a matter of language, but of typographic tradition (see John Jenkins's excellent discussion of this question at http://www.unicode.org/unicode/faq/han_cjk.html#3), there is no such thing as a "typographically correct way". In particular (as noted in the FAQ), it is commonplace for a Japanese document that quotes Chinese text to use Japanese-style glyphs for both languages, as this is apparently less jarring to the average Japanese reader. > But although you didn't quite say this, a user could provide a > preference not for font, but language, i.e. if the script is CJK, > display it as C or J or K (or T). And given the language the font > mechanisms would do a reasonable thing. That is reasonable provided you grasp what is meant by "language preference" here: namely, typographical tradition preference. It would be like choosing between Fraktur and Antiqua when reading German text: this too is rather broader than a mere font difference. -- A mosquito cried out in his pain, John Cowan "A chemist has poisoned my brain!" 
http://www.ccil.org/~cowan The cause of his sorrow http://www.reutershealth.com Was para-dichloro- [EMAIL PROTECTED] Diphenyltrichloroethane.(aka DDT)
Re: glyph selection for Unicode in browsers
Mark, My preference is that tagged information should display as tagged and the user can do something specifically to override it if they want. But then, I can't read CJK and so would be glad to get comments from those communities. I can see arguments both for and against user preference taking precedence over tags. Where there is no language information in the document, it makes sense to have user preference or heuristics attempt to supply the information. Where the tag is clearly inappropriate, for example, text labeled as English that is clearly Chinese, sure, override the tag. Where the tag is wrong but difficult to detect (Traditional vs. Simplified), too bad; the author gets what he deserves. Also, heuristics work well with longer runs of text, but not for shorter runs (names and addresses, quotations, etc.). From an implementation standpoint, once you have the ability for language to influence font selection, the significant part is done. Determining which language to use, from a tag, or user preference, or heuristic, is the easy part. I wouldn't have a problem with some precedence rules over which to use, or even some negotiation where the text clearly belongs to a script, and the language influence of tag, user preference or heuristic is limited to whether their recommendation is appropriate for the script. (Hopefully the heuristic is always in line with the script.) I do need to point out that user preference is problematic if it means that for a user to display a multilingual document, the user has to go through and specify font preferences for languages they know nothing about. Just because I don't read CJK doesn't mean I don't have legitimate needs to display or print CJK in a typographically correct way. Librarians, commerce exchanges, mailing lists, localizers, etc. But although you didn't quite say this, a user could provide a preference not for font, but language, i.e. if the script is CJK, display it as C or J or K (or T). And given the language the font mechanisms would do a reasonable thing. tex
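The precedence-plus-negotiation idea Tex describes in this message (tag, user preference, or heuristic, each accepted only if it is appropriate for the script) might be sketched as below. This is only an illustration in Python: the function name `resolve_language`, the script-to-language table, and the exact ordering of candidates are assumptions made for the sketch, not something the thread settles.

```python
# Hypothetical table: which language tags are plausible for which script.
LANGS_FOR_SCRIPT = {
    "Han": {"zh-Hans", "zh-Hant", "ja", "ko"},
    "Latin": {"en", "fr", "pl"},
}

def resolve_language(script, tag=None, user_override=None,
                     user_default=None, heuristic=None):
    """Return the first candidate, in precedence order, that is
    appropriate for the script; None if no candidate qualifies."""
    plausible = LANGS_FOR_SCRIPT.get(script, set())
    # Assumed order: explicit override, document tag, user default, heuristic.
    for candidate in (user_override, tag, user_default, heuristic):
        if candidate in plausible:
            return candidate
    return None

# Text that is clearly Han but tagged "en": the tag is skipped.
print(resolve_language("Han", tag="en", user_default="ja"))
```

A tag that contradicts the script (English on Han text) is ignored, while a wrong but plausible tag (Traditional vs. Simplified) is honored, which matches the "author gets what he deserves" case.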
Re: glyph selection for Unicode in browsers
Tex Texin wrote, > James, thanks as always for your reply. > The 65K limit is ugly... Thank you, Tex, for this fascinating thread. Don't despair. As Peter Constable has pointed out, the infrastructure for having browsers support language tags is already present. He has also provided a great outline/overview of just what is required. IMO, we are close to many improvements in this regard. Paul Nelson of the Microsoft Typography Group is doing a tremendous job with respect to OpenType technology and the Uniscribe engine. Other platforms are also making great strides forward. Once the font specs for all this are set and fonts are released with the necessary coverage and the shaping engines can access all of this, the browsers are sure to quickly add support, too. It's all somewhat interrelated. Unicode is the best way to go, because Unicode is all about character encoding. Shucks, I guess I don't have to tell you about the benefits of Unicode, eh? Best regards, James Kass.
Re: glyph selection for Unicode in browsers
> not to replace one broken model (code page = language) with > another broken model (language = font preference). I would add that, given the number of documents that fail to tag language or, worse yet, tag the wrong language, I suspect other approaches may give generally better results. The main area of concern is CJK, and I suspect that in a great many cases the user is probably better off either: - simply using a font set according to the user's own preference, or - having a bit of smarts in the program for heuristically picking among C, J and K. Mark __ http://www.macchiato.com ◄ “Eppur si muove” ►
Re: glyph selection for Unicode in browsers
Hi, Tex Texin wrote: > > In the case of HTML, XML, CSS, ways to specify typographic preferences > exist, and language can be expressed via "lang". We just need browsers > and other user agents to make use of the lang information as part of > font selection. For me, this is the crux: that browsers have not implemented the CSS :lang selector. Things would be easier if we could tie presentation (via CSS) to the specified language of a document or part of a document. Andrew -- Andrew Cunningham Multilingual Technical Officer OPT, Vicnet State Library of Victoria Australia [EMAIL PROTECTED] Ph: +61-3-8664-7001 Fax: +61-3-9639-2175 http://home.vicnet.net.au/~andrewc/ http://www.openroad.net.au/
Re: glyph selection for Unicode in browsers
Ken, thanks. I absolutely agree. Yes, code page was not a good indicator of language, but it was used that way by some applications. And yes, language should not dominate font selection; it should influence it. Other typographic preferences also must be accommodated. Well said. In the case of HTML, XML, CSS, ways to specify typographic preferences exist, and language can be expressed via "lang". We just need browsers and other user agents to make use of the lang information as part of font selection. tex -- Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World
RE: glyph selection for Unicode in browsers
I don't think the idea is that codepage equals language. Rather codepage equals a writing system, which consists of one or more scripts (e.g., 6 scripts for ShiftJIS). As such the codepage is a useful cue in choosing an appropriate font for rendering text. In the RichEdit edit engine, we use a codepage generalization called a CharRep and break Unicode plain text into runs of text each characterized by a particular CharRep. We then bind these runs to appropriate fonts for rendering. There are many additional considerations, so unfortunately this isn't an easy task. But with enough refinements it works quite well. The bottom line is that if text was generated using a particular codepage it's likely that the creator of that text intended the text to be rendered with a font that supports that codepage. For text tagged with no codepage, we do our best to translate the keyboard language to a CharRep and proceed as above. When neither the keyboard nor codepage info is available, we use a set of heuristics to break the text into CharRep runs. Among the many heuristics used are 1) a string containing Kana is likely to have a Japanese CharRep, and 2) a CJK string that round trips through CHT, CHS, or ShiftJIS may well belong to those CharReps. In particular if a CJK string doesn't round trip through CHT, it's probably not Traditional Chinese. Murray
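The two heuristics Murray mentions (Kana suggests Japanese; a CJK string that round-trips through a codepage may belong to it) can be sketched roughly as follows. This is not RichEdit's actual logic: the function names, the label strings, and the use of Python's `big5`, `gb2312` and `shift_jis` codecs as stand-ins for the CHT, CHS and ShiftJIS codepages are all assumptions made for illustration.

```python
# Hypothetical sketch of two heuristics for guessing a CJK "CharRep".
# Python's codecs stand in for the Windows codepages, which differ in detail.
CANDIDATES = [
    ("CHT", "big5"),            # Traditional Chinese
    ("CHS", "gb2312"),          # Simplified Chinese
    ("ShiftJIS", "shift_jis"),  # Japanese
]

def contains_kana(text):
    """True if any character is Hiragana or Katakana (U+3040..U+30FF)."""
    return any("\u3040" <= ch <= "\u30ff" for ch in text)

def round_trips(text, codec):
    """True if the text survives encode/decode through the codec unchanged."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeError:
        return False

def guess_cjk_charreps(text):
    """Return the candidate labels that remain plausible for the text."""
    if contains_kana(text):
        return ["ShiftJIS"]
    return [label for label, codec in CANDIDATES if round_trips(text, codec)]

print(guess_cjk_charreps("\u3072\u3089\u304c\u306a"))  # Hiragana sample
print(guess_cjk_charreps("\u6c49\u8bed"))              # Simplified Chinese sample
```

A string can survive more than one round trip, so the result is a list of remaining candidates; as in the message, a failed round trip mainly serves to rule a CharRep out.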
Re: glyph selection for Unicode in browsers
Tex, > 3) The language information used to be derived dubiously > from code page and is > missing with Unicode, and architecture needs to accommodate a better > model for bringing language to font selection. The archetypal situation is for CJK, and in particular J, where language choice correlates closely with typographical preferences, and where character encoding could, in turn, be correlated reliably with language choice. But in general, the connection does not hold, as for data in any of hundreds of different languages written in Code Page 1252, for example. What you are really looking for, I believe, is a way to specify typographical preference, which then can be used to drive auto-selection of fonts. I don't think we should head down the garden path of trying to tie typographical preference too closely to language identity, however we unknot that particular problem. This could get you into contrarian problems, where browsers (or other tools) start paying *too* much attention to language tags, and automatically (and mysteriously) override user preferences about the typographical preferences they expect for characters. What is needed, I believe, is: a. a way to establish typographic preferences b. a way to link typographical preference choices to fonts that would express them correctly c. a way to (optionally) associate a language with a typographical preference And this all should be done, of course, in such a way that default behavior is reasonable and undue burdens of understanding, font acquisition, installation, and such are not placed on end-users who simply want to read and print documents from the web. A tall order, I am sure. But as long as we are blue-skying about architecture for better solutions, I think it is important not to replace one broken model (code page = language) with another broken model (language = font preference). --Ken
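One way to picture points (a) through (c) above is as two separate lookup tables, so that language only reaches fonts indirectly, through a typographic preference. The Python sketch below is a guess at such a data model, not a proposal from the thread: the preference names, font names, default, and the `choose_fonts` function are all invented for illustration.

```python
# Hypothetical data model. All names and font lists are made up.

# (b) typographic preference -> fonts that express it
FONTS_FOR_PREFERENCE = {
    "japanese-style": ["ExampleMincho"],
    "trad-chinese-style": ["ExampleMing"],
}

# (c) optional association of a language with a preference
PREFERENCE_FOR_LANGUAGE = {
    "ja": "japanese-style",
    "zh-TW": "trad-chinese-style",
}

# An arbitrary default, chosen only so the sketch always returns something.
DEFAULT_PREFERENCE = "japanese-style"

def choose_fonts(lang=None, user_preference=None):
    """(a) An explicit user preference is never overridden by a language tag."""
    if user_preference in FONTS_FOR_PREFERENCE:
        preference = user_preference
    else:
        preference = PREFERENCE_FOR_LANGUAGE.get(lang, DEFAULT_PREFERENCE)
    return FONTS_FOR_PREFERENCE[preference]

print(choose_fonts(lang="zh-TW"))
print(choose_fonts(lang="zh-TW", user_preference="japanese-style"))
```

Keeping the language table optional and separate is what avoids the "language = font preference" shortcut: a language tag can suggest a preference, but it cannot mysteriously override one the user has set.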
Re: glyph selection for Unicode in browsers
On 09/26/2002 03:05:36 PM Tex Texin wrote: >The problem I am looking to solve, is to be able to recommend Unicode as >best practice for the web. Which is a good thing to be concerned with, and all of the issues you raise are certainly important. >There is a market opportunity here for some industrious individuals... I think there are various font technology issues very much in need of solution, and preferably by agreement among font vendors (and also platform vendors, for certain issues) as to how to go about it. I'm not sure how that might come about. - Peter
Re: glyph selection for Unicode in browsers
At 02:59 PM 9/26/2002 -0400, Tex Texin wrote: >Shouldn't that be something more like: pan-script Unicode-based font? or p8e font? :) Barry Caplan www.i18n.com
Re: glyph selection for Unicode in browsers
Peter, Yes, I am aware of the difficulty of creating a single font that covers all of Unicode. And fine, let's change terminology. I was trying to make sure that my use of "Unicode font" was clear. Whether it's difficult or not: 1) There is a need for a simple solution for fonts that lay people can use in conjunction with Unicode text. 2) The glyphs need to vary based on language when such information is available. 3) The language information used to be derived from code page and is missing with Unicode, and architecture needs to accommodate a better model for bringing language to font selection. That said, I'll use any terminology people want me to use, provided it doesn't obscure the issue(s). I don't require that a single font be used to solve problem #1. It can be a bundle of fonts or some other packaging of fonts. I only require that it be accessible to non-technical, non-linguist people, who require a simple install and broad coverage, to get reasonable (not necessarily high-end publishing) quality. It should be something I can do once in advance of receiving documents, and not something I need to do or reconsider every time I get a document and find new missing glyphs. I also don't care who provides the solution: it can be a font vendor, or it can be a package distributed by a browser vendor or someone else. The problem I am looking to solve is to be able to recommend Unicode as best practice for the web. I don't think it is best practice if there are markets where the rendering is poor because the language information once provided by code page is lost and not replaced by the lang facility, and it is not best practice if, in using Unicode, you need to be either technical or a linguist to identify and use the right fonts to display a document. There is a market opportunity here for some industrious individuals... And I hope the browser vendors are looking at the use of lang to assist in font selection. tex
RE: glyph selection for Unicode in browsers
On 09/26/2002 01:15:37 PM "P. J. Patterson" wrote: >I think, ideally, I would be looking for a program to examine a >document, compare to the selected fonts (with fallback), and then list >the missing glyphs for individual handling. It wouldn't be all that difficult for someone to create a tool that compared a set of data with a preferential list of fonts to determine which characters are going to be supported by which fonts, and which are not covered. That still wouldn't address Tex's concern with regard to language-specific glyph preferences, unless the tool also knew which fonts were designed for which languages (or give preference to which languages as defaults). - Peter
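A minimal sketch of the kind of tool described in this message is below, in Python. Font coverage is modelled as plain sets of code points, and the font names and coverage ranges are invented; a real tool would have to read each font's cmap table, which is not shown here. The function name `assign_fonts` is likewise hypothetical.

```python
# Hypothetical sketch: map each character of a text to the first font,
# in preference order, that covers it, and collect the uncovered ones.

def assign_fonts(text, fonts):
    """fonts: list of (name, set_of_code_points) in preference order.
    Returns (assignments, missing): assignments maps each distinct
    character to a font name; missing is a sorted list of characters
    that no listed font covers."""
    assignments = {}
    missing = set()
    for ch in text:
        if ch in assignments or ch in missing:
            continue  # already classified
        for name, coverage in fonts:
            if ord(ch) in coverage:
                assignments[ch] = name
                break
        else:
            missing.add(ch)
    return assignments, sorted(missing)

# Made-up coverage data for two imaginary fonts.
FONTS = [
    ("LatinFont", set(range(0x20, 0x250))),
    ("CyrillicFont", set(range(0x400, 0x500))),
]

# Latin, Cyrillic, and one Georgian letter (U+10E5).
assignments, missing = assign_fonts("Abc \u0416\u0443\u043a \u10e5", FONTS)
for ch in missing:
    print(f"U+{ord(ch):04X} not covered by any listed font")
```

As the message points out, this only answers "is there a glyph at all"; it says nothing about whether the chosen font's glyphs suit a particular language, which would need extra per-font metadata.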
Re: glyph selection for Unicode in browsers
On 09/26/2002 12:52:13 PM Tex Texin wrote: >I would like to keep the sense of "Unicode font" as meaning a font which >supports a large number of scripts, rather than meaning one that uses >Unicode for its mapping architecture. I suppose you didn't happen to attend the sessions at a number of past Unicode conferences (not this last one, though) in which folks from Monotype presented on this theme. In general, font developers don't recommend the idea of a single font that covers "all of Unicode" (it's not possible, BTW, given the 64K glyph limit). There are a variety of reasons for this. Even so, people keep looking for them. As for terminology, "Unicode font" is too ambiguous for the reasons Markus mentioned having to do with cmaps. You may be far more concerned with comprehensive coverage, but that isn't necessarily everyone's concern. In my work, I have to deal far more with fonts that use different encodings than I do with fonts that have comprehensive coverage. I much prefer to refer to comprehensive-coverage fonts as "pan-Unicode" fonts, and for the other issue, to refer to "Unicode-encoded" or "Unicode-conformant" (as opposed to custom-encoded) fonts. - Peter
Re: glyph selection for Unicode in browsers
Shouldn't that be something more like: pan-script Unicode-based font? [EMAIL PROTECTED] wrote: > > Tex Texin wrote, > > > I would like to keep the sense of "Unicode font" as meaning a font which > > supports a large number of scripts, rather than meaning one that uses > > Unicode for its mapping architecture. > > "pan-Unicode font" > > I think Frank da Cruz coined that expression, but am not sure. > > Since fonts do use Unicode mapping, some kind of modifier is needed in > order to distinguish the big ones from the other kind. > > Best regards, > > James Kass. -- - Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World -
Re: glyph selection for Unicode in browsers
Tex Texin wrote, > I would like to keep the sense of "Unicode font" as meaning a font which > supports a large number of scripts, rather than meaning one that uses > Unicode for its mapping architecture. "pan-Unicode font" I think Frank da Cruz coined that expression, but am not sure. Since fonts do use Unicode mapping, some kind of modifier is needed in order to distinguish the big ones from the other kind. Best regards, James Kass.
RE: glyph selection for Unicode in browsers
Actually, as a publisher, we do have a problem with this. I publish scientific abstract data which is collected from authors all over the world. Since the information becomes dated so quickly, we are always looking for ways to reduce turn around time from collection to publication. The books are sometimes 600 pages, and overhead must be kept low. Unicode helps us keep the data accurate, but we are still running into problems identifying missing glyphs prior to printing - for the most part it comes down to visual recognition. This is complicated by the fact that the submission and review processes are all browser based. I spoke with a few people from Adobe at the conference, and the concept of fall-back fonts was very appealing, at least to minimize the missing glyphs and still allow for wider font selections. But some sort of alert system beyond the unrecognized character display is really what we are looking for. I think, ideally, I would be looking for a program to examine a document, compare to the selected fonts (with fallback), and then list the missing glyphs for individual handling. Anyone have any thoughts? P.J. Patterson Director of Product Research and Development Coe-Truman Technologies, Inc. e. [EMAIL PROTECTED] p. 217-398-8594 f. 217-355-0101 > -Original Message- > From: Tex Texin [mailto:[EMAIL PROTECTED]] > Sent: Thursday, September 26, 2002 12:21 PM > To: John Cowan > Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] > Subject: Re: glyph selection for Unicode in browsers > > > Hi, > Yes, these fonts do not solve everything. (Nor should they.) > > We should be careful not to apply the requirements for high > end publishing systems to software that just needs to have > adequate rendering, such as browsers and other software. > > I would like to have adequate coverage for the Unicode space, > with some language awareness or sensitivity, before we raise > the bar to the level of requiring publishing quality. 
> > I would guess high end publishers are quite comfortable > choosing (acquiring, installing, selecting) specialized fonts > for different situations, including for rendering different languages. > > However, for people that are not so adept at choosing fonts > and assigning them by language, browsers and other software > need to have a reasonable, solution. > > tex > > John Cowan wrote: > > > > Thomas Chan scripsit: > > > > > But changing the example to fonts like Arial Unicode MS doesn't > > > completely solve everything--a sans serif font is not the > norm for > > > non-trivial quantities of CJK text (compare any book or > newspaper). > > > > Nor any other kind of text, indeed, until the widespread use of > > Arial/Helvetica, which properly is only a display font, as > a text font > > (ugh). > > > > -- > > John Cowan [EMAIL PROTECTED] www.ccil.org/~cowan > > www.reutershealth.com "If I have not seen as far as others, it is > > because giants were standing on my shoulders." > > --Hal Abelson > > -- > - > Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] > Xen Master http://www.i18nGuy.com > > XenCraft http://www.XenCraft.com > Making e-Business Work Around the World > - > >
Re: glyph selection for Unicode in browsers
Markus, Yes, underlying fonts can be a Unicode architecture. That's a good thing, but invisible to end-users. I would like to keep the sense of "Unicode font" as meaning a font which supports a large number of scripts, rather than meaning one that uses Unicode for its mapping architecture. Yes, OS and browsers are getting better. My concerns center around: Is the mechanism for selecting fallback fonts language-sensitive, so that it would favor a Japanese font for Unicode Han characters that were tagged as lang:ja? And are the fonts labeled so that the supported language is known? Even so, I'd still need to have a large collection of fonts then. I would like to be able to publish a page such as the Unicode example page: http://www.i18nguy.com/unicode-example.html without feeling obligated to publish a pdf version http://www.i18nguy.com/unicode/unicodeexample.pdf so that the less technical among us would not feel challenged to acquire and install several fonts, language by language. And my main point is that my investment in tagging text segments with "lang" should result in the most appropriate rendering. Currently, using a Unicode font, lang has no visible effect. tex Markus Scherer wrote: > > Tex Texin wrote: > > > However, a Japanese user might have to choose a Japanese font, if the > > Unicode font does not favor (and cannot be made to favor with language > > tags) Japanese renderings. > > So it's catch 22. They have native fonts because Unicode fonts are > > inadequate, but we can be relieved that although Unicode fonts are > > inadequate, we are lucky the users don't use them. > > I am not sure this is as bad as it may sound: > Modern "native fonts" use Unicode cmaps (mapping tables from _Unicode text_ to glyph IDs) instead of SJIS/whatever cmaps. > They will just not contain entries for much else but "native" characters. > > In that sense, those native fonts will also be "Unicode fonts". 
> Operating systems and browsers are also getting better at automatically selecting fallback fonts for characters that are missing in the main font. > > markus -- - Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World -
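To make the question about language-sensitive fallback concrete, here is a rough sketch of what such a selection step could look like. It is not how any particular browser or operating system works; the font names, the per-language preference table, and the coverage sets are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def pick_font(ch: str, lang: Optional[str],
              preferred_by_lang: Dict[str, List[str]],
              default_chain: List[str],
              coverage: Dict[str, Set[int]]) -> Optional[str]:
    """Return the first font covering `ch`, trying fonts registered for the
    primary language subtag of `lang` before the language-neutral chain."""
    primary = lang.split("-")[0].lower() if lang else ""
    chain = preferred_by_lang.get(primary, []) + default_chain
    for font in chain:
        if ord(ch) in coverage.get(font, set()):
            return font
    return None  # no installed font covers this character


if __name__ == "__main__":
    han = {0x76F4}  # one Han character that every toy font below "covers"
    coverage = {"JapaneseMincho": han, "ChineseSong": han, "PanUnicode": han}
    preferred = {"ja": ["JapaneseMincho"], "zh": ["ChineseSong"]}
    for lang in ("ja", "zh-TW", None):
        print(lang, pick_font("\u76f4", lang, preferred, ["PanUnicode"], coverage))
```

This only works if the fonts are labeled with the languages they were designed for, which is exactly the second question raised above.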
Re: glyph selection for Unicode in browsers
Hi, Yes, these fonts do not solve everything. (Nor should they.) We should be careful not to apply the requirements for high end publishing systems to software that just needs to have adequate rendering, such as browsers and other software. I would like to have adequate coverage for the Unicode space, with some language awareness or sensitivity, before we raise the bar to the level of requiring publishing quality. I would guess high end publishers are quite comfortable choosing (acquiring, installing, selecting) specialized fonts for different situations, including for rendering different languages. However, for people that are not so adept at choosing fonts and assigning them by language, browsers and other software need to have a reasonable solution. tex John Cowan wrote: > > Thomas Chan scripsit: > > > But changing the example to fonts like Arial Unicode MS doesn't completely > > solve everything--a sans serif font is not the norm for non-trivial > > quantities of CJK text (compare any book or newspaper). > > Nor any other kind of text, indeed, until the widespread use of Arial/Helvetica, > which properly is only a display font, as a text font (ugh). > > -- > John Cowan [EMAIL PROTECTED] www.ccil.org/~cowan www.reutershealth.com > "If I have not seen as far as others, it is because giants were standing > on my shoulders." > --Hal Abelson -- - Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World -
Re: glyph selection for Unicode in browsers
Tex Texin wrote: > However, a Japanese user might have to choose a Japanese font, if the > Unicode font does not favor (and cannot be made to favor with language > tags) Japanese renderings. > So it's catch 22. They have native fonts because Unicode fonts are > inadequate, but we can be relieved that although Unicode fonts are > inadequate, we are lucky the users don't use them. I am not sure this is as bad as it may sound: Modern "native fonts" use Unicode cmaps (mapping tables from _Unicode text_ to glyph IDs) instead of SJIS/whatever cmaps. They will just not contain entries for much else but "native" characters. In that sense, those native fonts will also be "Unicode fonts". Operating systems and browsers are also getting better at automatically selecting fallback fonts for characters that are missing in the main font. markus
Re: glyph selection for Unicode in browsers
Thomas Chan scripsit: > But changing the example to fonts like Arial Unicode MS doesn't completely > solve everything--a sans serif font is not the norm for non-trivial > quantities of CJK text (compare any book or newspaper). Nor any other kind of text, indeed, until the widespread use of Arial/Helvetica, which properly is only a display font, as a text font (ugh). -- John Cowan [EMAIL PROTECTED] www.ccil.org/~cowan www.reutershealth.com "If I have not seen as far as others, it is because giants were standing on my shoulders." --Hal Abelson
Re: glyph selection for Unicode in browsers
On Thu, 26 Sep 2002 [EMAIL PROTECTED] wrote: > Tex Texin wrote, > > Given the (un)workable approach, do you then intend to have variants of > > code2000 for CJKT, so one can make the appropriate assignments? (ugh!) > > Code2000's coverage of CJKTV ideographs isn't adequate to support any language > yet. Eventually and hopefully the repertoire will be completed. Given the > current "ceiling" of 65536 max glyphs per font, it might not be feasible to > try to have one font cover all scripts and variants, but time will tell. I don't mean to detract from the point of this discussion, nor to criticize a particular font, but I think the Han glyphs in Code2000 are aesthetically disappointing in that they are distorted enough (shape, proportions, and positioning) that they differ from any typical CJK font more than two comparable CJK fonts would differ from each other due to language/country glyph preferences. Compare, for instance, with other sans serif CJK fonts like Arial Unicode MS, (cn) MS Hei, or (ja) MS Gothic. But changing the example to fonts like Arial Unicode MS doesn't completely solve everything--a sans serif font is not the norm for non-trivial quantities of CJK text (compare any book or newspaper). These problems would cause rejection of a font faster than adverse reactions to foreign/unfamiliar glyph designs. (The aging serifed Bitstream Cyberbit font might be a better example in this respect.) Thomas Chan [EMAIL PROTECTED]
Re: glyph selection for Unicode in browsers
James, thanks as always for your reply. The 65K limit is ugly... With respect to CJKT comment below, I guess it is true because of catch-22. For example, I set my browser to default to a Unicode font. I think everyone would if they could -- it's a knee-jerk response if the solution is adequate everywhere. You don't have to know which fonts work for which languages. For the Americas and Europe, users can easily just set a Unicode font. However, a Japanese user might have to choose a Japanese font, if the Unicode font does not favor (and cannot be made to favor with language tags) Japanese renderings. So it's catch 22. They have native fonts because Unicode fonts are inadequate, but we can be relieved that although Unicode fonts are inadequate, we are lucky the users don't use them. ugh! So where the differences are important, users are forced to select native fonts instead of Unicode fonts. This then creates the difficulty that to view a multilingual page, you need to a) acquire specialized fonts (tedious and costly perhaps), b) install them, c) assign them, d) finally view the page. Sadder still: Content developers that want to use Unicode: a) can invest a lot of time in declaring lang around sections of text, and really get no bang for it at the moment. In truth browsers do very little with this information as far as I can tell. (I suspect it helps search engines, but I need to test that assumption more). b) It is actually more beneficial to use native code pages than Unicode, since the browsers seem to do a better job of font selection here. (I need to test this statement more. However, from my own coding experience on Windows, knowing the code page allows easy setting of the "script" for the font, which has a major influence on Windows font selection. The language information wouldn't be available so easily for a Unicode file without it being carefully designed in to be passed from the markup layers down to the primitive font selection layers.) 
To be fair, I think font coverage for Unicode has been steadily improving and it is much easier today to produce multilingual docs than in the past. But I am disappointed in the state of the art for browsers, and I suspect it is also true for other products that are not professional publishing software of one kind or another. I suspect at the heart of the problem is that the rendering architecture has not carried language (as opposed to code page) to the primitive layers, and this needs to be addressed throughout the architecture, since the language information can no longer be deduced or presumed when the encoding is Unicode. Whatever the reason, this needs to be fixed a) so Unicode can be recommended as best practice and b) documents are rendered with appropriate glyphs, without extraordinary effort by users. tex [EMAIL PROTECTED] wrote: > > Also, this approach means I have to ask each Unicode font vendor, "Which > > language is your multilingual font designed for?" > > so I know which CJKT assignment is appropriate for that font... > > > > Sad but true. On a happier note, most Japanese users will already have a > Japanese font set as default, Chinese users will have a Chinese (Simp or > Trad) font installed, and so forth. Still, when you're trying to publish > a multilingual page which can be properly displayed anywhere, this isn't > much consolation. > Best regards, > > James Kass. -- - Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World -
Re: glyph selection for Unicode in browsers
Tex Texin wrote, > Which registry are you referring to for script and language tags? > Is this in the context of glyphs or do you just mean the IANA language > tag registry? As Peter Constable already noted, in this case "registered" only means registered as an OpenType tag. More info about this can be found on Adobe's page: http://partners.adobe.com/asn/developer/opentype/appendices/ttoreg.html > > Given the (un)workable approach, do you then intend to have variants of > code2000 for CJKT, so one can make the appropriate assignments? (ugh!) > Code2000's coverage of CJKTV ideographs isn't adequate to support any language yet. Eventually and hopefully the repertoire will be completed. Given the current "ceiling" of 65536 max glyphs per font, it might not be feasible to try to have one font cover all scripts and variants, but time will tell. > Also, this approach means I have to ask each Unicode font vendor, "Which > language is your multilingual font designed for?" > so I know which CJKT assignment is appropriate for that font... > Sad but true. On a happier note, most Japanese users will already have a Japanese font set as default, Chinese users will have a Chinese (Simp or Trad) font installed, and so forth. Still, when you're trying to publish a multilingual page which can be properly displayed anywhere, this isn't much consolation. > (I hope this doesn't read like I am attacking you, I am not. I am just > trying to highlight the difficulty I am having with this.) You are not alone... Best regards, James Kass.
Re: glyph selection for Unicode in browsers
On 09/25/2002 03:34:00 PM Tex Texin wrote: >Thanks James. > >Which registry are you referring to for script and language tags? >Is this in the context of glyphs or do you just mean the IANA language >tag registry? The OpenType script and "language" tags are specific to OpenType. As I mentioned in my previous message, one of the problems yet to be solved is how to associate OT "language" tags with the kind of things used for metadata, e.g. RFC 3066 (and also determining whether resolving those associations is the responsibility of the app, of a higher-level layout engine, or of the OpenType layout engine), and it hasn't even been worked out yet (IMO) just what the OT "language" tags are. >Given the (un)workable approach, do you then intend to have variants of >code2000 for CJKT, so one can make the appropriate assignments? (ugh!) > >Also, this approach means I have to ask each Unicode font vendor, "Which >language is your multilingual font designed for?" >so I know which CJKT assignment is appropriate for that font... Unfortunately, that's where we're stuck for the time being. I wish it were otherwise, since we're in the process of coming up with new Latin / Cyrillic fonts for our users throughout the world, and there are various Latin characters for which different glyphs are preferred in different language communities. And the variations for one character don't necessarily correlate with those for another, so you get lots of possible combinations needed -- which would make it a pain to come up with a bunch of language-specific fonts. For now, we're going to give them the ability to select alternate glyphs via Graphite features,* but they'll only be able to use that in Graphite-enabled apps -- it won't work in Word! 
*Since our software tools are intended for use by linguists working in hundreds of languages / writing systems for which there is no support in commercial software platforms, we have for a long time provided mechanisms to specify writing-system-specific behaviours, such as sorting or character properties determining basic things like word-boundary detection and line breaking. In our new tools that support Graphite, there's an ability for the linguist setting up a system for their writing system to specify what features should be active by default for their writing system. This gives us an interim mechanism to handle language-specific typography requirements. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: <[EMAIL PROTECTED]>
RE: glyph selection for Unicode in browsers
> I cannot help the wrong result. (I guess some browsers might do better > work at sniffing the content of the page, but at least IE6 and Opera 6.05 > on Win32 seem to believe the server rather than the (HTML of the) page. After some experimentation it seems that I blamed Opera 6.05/Win32 wrongly; it guesses the charset right. But as pointed out by Tex, HTTP/HTML charset ponderings are probably not a Unicode issue as such; they are more a WWW issue. Sorry about the slight off-topicalness.
Re: glyph selection for Unicode in browsers
On 09/25/2002 01:51:28 PM Tex Texin wrote: >a) Do Unicode fonts include the language-based glyph variants of >characters, so that a display system is capable of identifying or >hinting which glyph should be used in a particular scenario? They *can*, and some do. When this is the case, then there needs to be some mechanism to modify the relationship between sequences of characters and sequences of glyphs to arrive at the particular glyphs intended for the given language. In general terms, the same kinds of mechanisms that can be used for rendering complex scripts can also be used here -- it's a glyph substitution, comparable to substituting an initial or final form of an Arabic character. Of course, there is a different triggering condition involved in these situations than in the case of a complex script such as Arabic: in the complex-script situation, the triggers are the character context (e.g. preceded by non-word-forming character and followed by word-forming character), whereas here the trigger is a metadata tag. Let's consider how this would be dealt with in terms of implementation, using OpenType as an example. The OpenType font format provides means for storing different glyph-transformation rules according to "language". (1) The question is, then, what does it take for the rendering process to make use of one set of language-specific rules rather than another, or rather than a set of default rules (OT allows the font developer to specify a default). In OpenType, glyph-transformation rules are grouped by "features", and a set of rules will be applied when the associated feature has been activated. (Thus, in OT text layout, what's processed is a feature-marked-up string of characters.) This applies to the "language" distinctions as well: the desired "language" must be specified in the input, otherwise the default rules will apply. (2) The idea is that application software must determine what features are activated at what point. 
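The lookup order just described (script, then "language" with a fallback to the font's default, then the activated features) can be modelled as a toy nested table. This is only an illustration of the selection logic, not the OpenType table format or any layout engine's API; the rule names are hypothetical and the tags are used here purely as example keys.

```python
from typing import Dict, List, Optional

# script tag -> "language" tag -> feature tag -> list of rule names (placeholders)
FontRules = Dict[str, Dict[str, Dict[str, List[str]]]]

DEFAULT_LANG = "dflt"  # stands in for the font developer's default rule set


def select_rules(font_rules: FontRules, script: str,
                 language: Optional[str],
                 active_features: List[str]) -> List[str]:
    """Collect the rules for the activated features, using the requested
    "language" if the font defines it and the default rules otherwise."""
    langs = font_rules.get(script, {})
    if language in langs:
        features = langs[language]
    else:
        features = langs.get(DEFAULT_LANG, {})
    return [rule for feat in active_features for rule in features.get(feat, [])]


if __name__ == "__main__":
    rules: FontRules = {
        "hani": {
            "dflt": {"locl": ["default-han-forms"]},
            "JAN": {"locl": ["japanese-han-forms"]},
        }
    }
    print(select_rules(rules, "hani", "JAN", ["locl"]))  # language-specific rules
    print(select_rules(rules, "hani", None, ["locl"]))   # no tag given: default rules
```

The second call shows the situation discussed here: if nothing upstream supplies a "language" tag, the default rules are all the font can apply.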
Now, hardly any software gets written to interact directly with the OpenType layout engine. Instead, higher-level text layout libraries have been written that wrap the OpenType functionality. Uniscribe is one example; indeed, in Win32 on Windows 2000 and later, there is even another layer, since the standard text-drawing functions (TextOut and ExtTextOut) wrap Uniscribe's functionality. Other examples of libraries that wrap up the OT interface and expose a higher-level interface include Adobe's CoolType engine (not a published interface, that I know of), ICU, Pango and Sun's recent Standard Type Services Framework project. So, at the OT interface, a "language" tag (3) has to be specified in order to get language-specific glyphs. But apps generally don't write to that interface (for good reason); they usually write to a higher interface. The crux of the issue is that none of the higher-level interfaces, that I know of, yet provide any mechanism for the app to specify a "language" tag. (4) Hence, the building blocks are there, but more infrastructure is still needed. Note that there's a bit more involved than simply re-writing higher-level APIs to expose a way to specify OT features. In particular, a critical issue has to do with the relationship between OpenType's "language" tags, and whatever system of "language" or "locale" tagging might be used elsewhere in a given platform. I've described the situation in terms of OpenType. Neither AAT nor Graphite provides exactly the same kind of mechanism for providing different glyph transformations for different languages, though I believe some consideration has been given to possibilities for both technologies. Both use feature mechanisms, so can certainly do what you're looking for; but neither has defined features specifically related to "languages", let alone decided how these should be handled in terms of APIs. 
It would be possible to implement an AAT or Graphite font that used a feature to get at language-specific glyphs, and apps that exposed a user-interface for setting AAT or Graphite features (5) would offer the user a way to control this. But there would not be any automation whereby an app would specify this based on other "language" or "locale" tagging. Notes: (1) I put "language" in quotation marks since it has not really been adequately worked out what these distinctions are; I think these are probably groups of writing systems. (2) OpenType glyph-transformation rules are organised hierarchically, first by script, then by language, and then according to the other features they are associated with. (3) OpenType's "language" tags have no specified relationship with ISO 639, RFC 3066 or any other system of "language" tags. (4) The same issue applies to OpenType features that pertain to optional aspects of typography and rendering that are up to the user's discretion rather than being obligatory behaviour for a script. For instance, there is an OpenType feature for selecting small cap forms, which a fo
RE: glyph selection for Unicode in browsers
*sigh* Time for me to call it the day and go home, it seems. Opera 6.05/Win32 does *not* get it right if you have it on View -> Encoding -> Automatic detection. Why I was fooled in the below message was that the Encoding setting seems to stick even if I exit and restart Opera, that's why my test page seemed to be working. If I turn it back to autodetect, it doesn't autodetect the UTF-8-ness. (If nothing else this bumbling saga of mine illustrates how difficult it still is to get all this "just to work".) -Original Message- From: Hietaniemi Jarkko (NRC/Boston) Sent: 25 September, 2002 04:56 PM To: Hietaniemi Jarkko (NRC/Boston); 'ext Tex Texin'; 'WWW International'; 'Unicoders' Subject: RE: glyph selection for Unicode in browsers > I cannot help the wrong result. (I guess some browsers might do better > work at sniffing the content of the page, but at least IE6 and Opera 6.05 > on Win32 seem to believe the server rather than the (HTML of the) page. After some experimentation it seems that I blamed Opera 6.05/Win32 wrongly, it guesses the charset right. But as pointed out by Tex, HTTP/HTML charset ponderings are probably not Unicode issue as such, they are more a WWW issue, sorry about the slight off-topicalness.
Re: glyph selection for Unicode in browsers
Thanks James. Which registry are you referring to for script and language tags? Is this in the context of glyphs or do you just mean the IANA language tag registry? Given the (un)workable approach, do you then intend to have variants of code2000 for CJKT, so one can make the appropriate assignments? (ugh!) Also, this approach means I have to ask each Unicode font vendor, "Which language is your multilingual font designed for?" so I know which CJKT assignment is appropriate for that font... (I hope this doesn't read like I am attacking you, I am not. I am just trying to highlight the difficulty I am having with this.) tex [EMAIL PROTECTED] wrote: > > Tex Texin wrote, > > a) Do Unicode fonts include the language-based glyph variants of > > characters, so that a display system is capable of identifying or > > hinting which glyph should be used in a particular scenario? > >... > > OpenType allows for substitution of language-specific glyphs and many > script and language tags are already "registered". > > However, the last time I checked (quite recently), the Uniscribe engine > only implements one language tag per script. > > OpenType is still nascent and tremendous strides have been made within > the past few years. Once implementations do allow for multiple language > based substitutions under a single script tag, there should be much > improvement in browser display. (As long as the fonts get updated, too!) > > Meanwhile, the workable approach seems to remain assigning specific > fonts in the style declaration. > > Best regards, > > James Kass. -- - Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World -
Re: glyph selection for Unicode in browsers
Tex Texin wrote, >... > However, I am finding that browsers are not supporting this in a way > that is useful for Unicode. > > What has been working so far is that the browsers can associate > different fonts with different languages. So I might use a Japanese font > such as Mincho for Japanese text and another font for Chinese text. > However, now that there are "Unicode" fonts, if I assign a Unicode font > such as Arial Unicode MS, or CODE2000, to all languages, then I see the > same glyph for a character, regardless of the lang assignment. > > I would like to understand why this is. (Bear in mind, I don't know much > more than the rudiments of font technology.) > > a) Do Unicode fonts include the language-based glyph variants of > characters, so that a display system is capable of identifying or > hinting which glyph should be used in a particular scenario? >... OpenType allows for substitution of language-specific glyphs and many script and language tags are already "registered". However, the last time I checked (quite recently), the Uniscribe engine only implements one language tag per script. OpenType is still nascent and tremendous strides have been made within the past few years. Once implementations do allow for multiple language based substitutions under a single script tag, there should be much improvement in browser display. (As long as the fonts get updated, too!) Meanwhile, the workable approach seems to remain assigning specific fonts in the style declaration. Best regards, James Kass.
Re: glyph selection for Unicode in browsers
Done. I almost forgot, I have a web page that also describes how to use .htaccess with Apache. See tip #1 in: http://www.i18nguy.com/markup/serving.html tex [EMAIL PROTECTED] wrote: > > > You would be happy, but others might not- the standard specifically says > > that the http charset takes precedence. > > http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2 > > Yup. I guess I could argue both ways. The server admins want control; > the users want control, the latter lose :-) > > > However, what you say about user control of web server facilities being > > up to the administrator and not the page's author is true. > > Some of the servers allow users some control through directory-based > > files. > > > > I can send you a sample .htaccess file privately, if it will be of use > > to you. > > Please. -- - Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World -
RE: glyph selection for Unicode in browsers
> You would be happy, but others might not- the standard specifically says > that the http charset takes precedence. > http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2 Yup. I guess I could argue both ways. The server admins want control; the users want control, the latter lose :-) > However, what you say about user control of web server facilities being > up to the administrator and not the page's author is true. > Some of the servers allow users some control through directory-based > files. > > I can send you a sample .htaccess file privately, if it will be of use > to you. Please.
RE: glyph selection for Unicode in browsers
I would be happy if just this would be enough to convince the browsers that the page is in UTF-8... It isn't if the HTTP server claims that the pages it serves are in ISO 8859-1. A sample of this is http://www.iki.fi/jhi/jp_utf8.html; it does have the meta charset, but since the webserver (www.hut.fi, really, a server outside of my control) thinks it's serving Latin 1, I cannot help the wrong result. (I guess some browsers might do better work at sniffing the content of the page, but at least IE6 and Opera 6.05 on Win32 seem to believe the server rather than the (HTML of the) page.)
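The behaviour described here follows from the precedence order in the HTML 4 specification: a charset parameter on the HTTP Content-Type header wins over a meta declaration in the page. A small sketch of that resolution order, assuming simplified regex parsing that a real browser would not use; the function names are illustrative only.

```python
import re
from typing import Optional

CHARSET_RE = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)
META_CONTENT_RE = re.compile(
    r'<meta[^>]+content\s*=\s*["\']([^"\']*)["\']', re.IGNORECASE
)


def charset_from_content_type(value: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, if present."""
    match = CHARSET_RE.search(value or "")
    return match.group(1).lower() if match else None


def effective_charset(http_content_type: Optional[str],
                      html_head: str) -> Optional[str]:
    """Apply the precedence order: HTTP header first, then meta in the page."""
    from_http = charset_from_content_type(http_content_type)
    if from_http:
        return from_http
    for content in META_CONTENT_RE.findall(html_head):
        from_meta = charset_from_content_type(content)
        if from_meta:
            return from_meta
    return None  # nothing declared; a browser would fall back to guessing


if __name__ == "__main__":
    head = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    # Server declares Latin 1, so the meta declaration is ignored.
    print(effective_charset("text/html; charset=ISO-8859-1", head))
    # Server sends no charset parameter, so the meta declaration is used.
    print(effective_charset("text/html", head))
```

This is why the page can only be fixed on the server side in this case: the meta declaration is consulted only when the header carries no charset parameter.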
Re: glyph selection for Unicode in browsers
You would be happy, but others might not: the standard specifically says that the http charset takes precedence. http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2 However, what you say about user control of web server facilities being up to the administrator and not the page's author is true. Some of the servers allow users some control through directory-based files. My ISP uses Apache and so I can set the charset of my files through files named .htaccess in each directory. It is not optimal, but it is helpful. I can send you a sample .htaccess file privately, if it will be of use to you. tex [EMAIL PROTECTED] wrote: > > I would be happy if just this > > > > would be enough to convince the browsers that the page is in UTF-8... > It isn't if the HTTP server claims that the pages it serves are in > ISO 8859-1. A sample of this is http://www.iki.fi/jhi/jp_utf8.html, > it does have the meta charset, but since the webserver (www.hut.fi, > really, a server outside of my control) thinks it's serving Latin 1, > I cannot help the wrong result. (I guess some browsers might do better > work at sniffing the content of the page, but at least IE6 and Opera 6.05 > on Win32 seem to believe the server rather than the (HTML of the) page. -- - Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World -