RE: glyph selection for Unicode in browsers

2002-10-08 Thread Martin Duerst

At 13:41 02/10/02 +0900, Martin Duerst wrote:

 I'm not sure this is possible with Apache, maybe there is a need
 for a RemoveCharset directive similar to RemoveType
 (http://httpd.apache.org/docs/mod/mod_mime.html#removetype).
 Or maybe there is some other way to get the same result.
 If a new directive is desirable, then let's try to hack
 the Apache code or to propose it to the Apache people.
 Similar of course for other server implementations.

Over lunch, a colleague told me that RemoveCharset has
been added to Apache 2.0. See e.g.
http://httpd.apache.org/docs-2.0/mod/mod_mime.html#removecharset.

So the right thing to do may be to ask your ISP to
upgrade to Apache 2.0.

Regards,   Martin.





RE: glyph selection for Unicode in browsers

2002-10-02 Thread Martin Duerst

At 12:14 02/10/01 -0400, [EMAIL PROTECTED] wrote:

 I agree that 'sniffing' and 'guessing' are ill-defined, and not to be
 relied upon.  However, I find it a bit 'ill-defined' that there is no
 well-defined (web server independent) way for the 'users' to override
 the possibly wrong encoding default of the web server.  Either way
 (a) the user has to do something web server dependent
 (b) the admin has to do changes to the site config
 seems a bit clunky and fragile.

 Since the current resolving order is obviously already deployed out
 there and relied upon by someone, it cannot be changed, but possibly
 something new could be introduced?

Well, servers can always be improved by the various server implementers.
What standards specify is what goes 'over the wire'.

The only thing you actually have to do is to make sure that the server
doesn't add a 'charset' parameter to the Content-Type header for
the directories you are using. Then the meta is the only info,
and is used by the browser.
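
As a minimal sketch of the resolution order described above (not taken from any browser's source), the following Python lets the HTTP header charset win and consults the meta only when the header carries none. The function names, the 1024-byte scan window and the ISO 8859-1 fallback are assumptions made for illustration.

```python
import re


def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type value, or None."""
    if not value:
        return None
    match = re.search(r'charset\s*=\s*"?([\w.:-]+)', value, re.IGNORECASE)
    return match.group(1).lower() if match else None


def charset_from_meta(html_bytes):
    """Look for a charset declaration in a <meta> tag near the start of the page."""
    # Hypothetical scan window: only the first 1024 bytes are inspected.
    head = html_bytes[:1024].decode("ascii", errors="replace")
    match = re.search(r'''<meta[^>]*charset\s*=\s*["']?([\w.:-]+)''', head, re.IGNORECASE)
    return match.group(1).lower() if match else None


def effective_charset(content_type_header, html_bytes, fallback="iso-8859-1"):
    """HTTP header first, then the meta, then an assumed fallback."""
    return (charset_from_content_type(content_type_header)
            or charset_from_meta(html_bytes)
            or fallback)
```

With a server that sends "Content-Type: text/html" and no charset parameter, effective_charset falls through to the value declared in the meta, which is the situation the paragraph above aims for.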

I'm not sure this is possible with Apache, maybe there is a need
for a RemoveCharset directive similar to RemoveType
(http://httpd.apache.org/docs/mod/mod_mime.html#removetype).
Or maybe there is some other way to get the same result.
If a new directive is desirable, then let's try to hack
the Apache code or to propose it to the Apache people.
Similar of course for other server implementations.

Regards,   Martin.






RE: glyph selection for Unicode in browsers

2002-10-01 Thread jarkko.hietaniemi


 Sniffing isn't a good idea in the long term. It may work
 for simple web page serving, but as soon as you go XML and
 start to move data around without the user having a chance
 to see it frequently, you'll end up with a big mess.
 
 Also, 'guessing' is very ill-defined. You might serve
 a document to your favorite browser, and it looks okay.
 But other browsers might guess a bit differently, or
 a new version of your favorite browser may guess a bit
 differently, and off you are.

I agree that 'sniffing' and 'guessing' are ill-defined, and not to be
relied upon.  However, I find it a bit 'ill-defined' that there is no
well-defined (web server independent) way for the 'users' to override
the possibly wrong encoding default of the web server.  Either way
(a) the user has to do something web server dependent
(b) the admin has to do changes to the site config
seems a bit clunky and fragile.

Since the current resolving order is obviously already deployed out
there and relied upon by someone, it cannot be changed, but possibly
something new could be introduced?










Re: glyph selection for Unicode in browsers

2002-10-01 Thread John Hudson

At 03:22 AM 30-09-02, [EMAIL PROTECTED] wrote:

 I think the idea is that, in a word processor for example

 What would you say about a browser?

Probably something about extended style sheets that include typographic 
system tagging. Ideally, as a typographer, I would like something like CSS 
that includes a tag for every registered OpenType Layout feature -- and OT 
'language system' tagging that sits below the level of document language 
tagging etc. --, so that I can create sophisticated online documents with 
the same level of typographic control as I have for print documents. I 
realise that it may be necessary to dress this up as higher-level markup 
that is not tied to a proprietary technology.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]

Those books that allow us to forget the most
are accorded the status of a classic.
   - James Secord





Re: glyph selection for Unicode in browsers

2002-09-30 Thread Peter_Constable


On 09/29/2002 12:53:14 PM tiro wrote:

 I think the idea is that, in a word processor for example

What would you say about a browser?



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]







RE: glyph selection for Unicode in browsers

2002-09-30 Thread Martin Duerst

At 07:37 02/09/26 +0900, [EMAIL PROTECTED] wrote:
 I would be happy if just this

 <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

 would be enough to convince the browsers that the page is in UTF-8...
 It isn't if the HTTP server claims that the pages it serves are in
 ISO 8859-1.  A sample of this is http://www.iki.fi/jhi/jp_utf8.html,
 it does have the meta charset, but since the webserver (www.hut.fi,
 really, a server outside of my control) thinks it's serving Latin 1,
 I cannot help the wrong result.  (I guess some browsers might do better
 work at sniffing the content of the page, but at least IE6 and Opera 6.05
 on Win32 seem to believe the server rather than the (HTML of the) page.)
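
The wrong result described above can be reproduced in a few lines. This is only an illustrative sketch: the helper name misdecode and the sample string are assumptions, and it models a browser that trusts the server's ISO 8859-1 label for bytes that are really UTF-8.

```python
def misdecode(text, declared="iso-8859-1", actual="utf-8"):
    """Decode bytes written in `actual` as if they were in `declared`."""
    return text.encode(actual).decode(declared, errors="replace")


original = "日本語"            # three characters, nine bytes in UTF-8
garbled = misdecode(original)  # each byte becomes one Latin 1 character

print(len(original), len(garbled))
# The damage is reversible only if the reader knows which label was wrong.
print(garbled.encode("iso-8859-1").decode("utf-8") == original)
```

Pure ASCII text survives the mislabelling unchanged, which is one reason such misconfiguration tends to go unnoticed on mostly-English sites.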

Sniffing isn't a good idea in the long term. It may work
for simple web page serving, but as soon as you go XML and
start to move data around without the user having a chance
to see it frequently, you'll end up with a big mess.

Also, 'guessing' is very ill-defined. You might serve
a document to your favorite browser, and it looks okay.
But other browsers might guess a bit differently, or
a new version of your favorite browser may guess a bit
differently, and off you are.

Regards,   Martin.




Re: glyph selection for Unicode in browsers

2002-09-29 Thread tiro

Quoting [EMAIL PROTECTED]:

 But should there not be some (possibly user-overridable) relationship
 between an NLS or similar tag (e.g. lang in HTML or xml:lang) and one of
 these so that a browser or word-processing app that knows what language
 (e.g. what RFC 3066 tag) is applied to the data can tell the
 layout/rendering sub-system what OT language-system tags to apply
 (assuming some API exists to do so)? Surely that is where we want to move
 toward.

I think the idea is that, in a word processor for example, something 
like 'Typographic system' would be set by the user as an independent layout 
control, not directly linked to 'language'. This enables the user to select a 
language to use for sorting, spellchecking, etc. (character level text 
handling), and separately select a set of typographic conventions (glyph level 
text display).

I suppose some developers may choose to pursue the direction you suggest, e.g. 
relating default typographic conventions to the user's language setting.

I just make the fonts :)

John Hudson




Re: glyph selection for Unicode in browsers

2002-09-28 Thread tiro

Quoting [EMAIL PROTECTED]:

 Actually, my point was specifically that *part* of the infrastructure is
 already present, at least in OpenType, but not *all*, either in OpenType
 (meaning of language in the OT spec needs to be clarified, and
 relationships between these tags and the language tags used for data e.g.
 RFC 3066, need to be resolved)...

'Language system' (not 'language') in the OpenType specification actually means 
*writing* system, i.e. a particular set of orthographic/typographic conventions 
associated with the use of a particular script. 'Language system' is a 
misnomer -- an historical artifact of the incomplete understanding of the 
format's original designers --, and it has caused all sorts of confusion, 
especially among people who assume that the OT 'language system' tags must have 
some relationship to things like NLS tags. There is no necessary relationship 
and, indeed, it is possible to conceive of a user wanting to apply, for 
instance, the typographic conventions of German to a language other than German.

I've suggested to Microsoft and Adobe that the term used in the spec should be 
changed, or at least annotated.

John Hudson




Re: glyph selection for Unicode in browsers

2002-09-28 Thread Baiju M

Can anyone clarify this one? The Microsoft page at
http://www.microsoft.com/typography/OTSPEC/indicot/default.htm
says that Malayalam chillu glyphs are formed when inputting
(consonant)+(virama). Can I use another formation for chillus? I
want to use (consonant)+(virama)+(ZWJ). Is there any problem with that?
And is there any problem if I define ligature formation this way
in the OpenType tables?

Regards,
Baiju M

--- [EMAIL PROTECTED] wrote:
 Quoting [EMAIL PROTECTED]:

  Actually, my point was specifically that *part* of the infrastructure is
  already present, at least in OpenType, but not *all*, either in OpenType
  (meaning of language in the OT spec needs to be clarified, and
  relationships between these tags and the language tags used for data e.g.
  RFC 3066, need to be resolved)...

 'Language system' (not 'language') in the OpenType specification actually means
 *writing* system, i.e. a particular set of orthographic/typographic conventions
 associated with the use of a particular script. 'Language system' is a
 misnomer -- an historical artifact of the incomplete understanding of the
 format's original designers --, and it has caused all sorts of confusion,
 especially among people who assume that the OT 'language system' tags must have
 some relationship to things like NLS tags. There is no necessary relationship
 and, indeed, it is possible to conceive of a user wanting to apply, for
 instance, the typographic conventions of German to a language other than German.

 I've suggested to Microsoft and Adobe that the term used in the spec should be
 changed, or at least annotated.

 John Hudson






Re: glyph selection for Unicode in browsers

2002-09-28 Thread John Cowan

[EMAIL PROTECTED] scripsit:

 There is no necessary relationship 
 and, indeed, it is possible to conceive of a user wanting to apply, for 
 instance, the typographic conventions of German to a language other than German.

Indeed, if one is doing early modern Swedish, that is exactly what one
wants, IIRC.

-- 
One art / There is  John Cowan [EMAIL PROTECTED]
No less / No more   http://www.reutershealth.com
All things / To do  http://www.ccil.org/~cowan
With sparks / Galore -- Douglas Hofstadter




Re: glyph selection for Unicode in browsers

2002-09-28 Thread Peter_Constable


On 09/28/2002 04:47:49 AM tiro wrote:

 'Language system' (not 'language') in the OpenType specification actually means
 *writing* system, i.e. a particular set of orthographic/typographic conventions
 associated with the use of a particular script. 'Language system' is a
 misnomer -- an historical artifact of the incomplete understanding of the
 format's original designers --, and it has caused all sorts of confusion,
 especially among people who assume that the OT 'language system' tags must have
 some relationship to things like NLS tags. There is no necessary relationship
 and, indeed, it is possible to conceive of a user wanting to apply, for
 instance, the typographic conventions of German to a language other than German.

But should there not be some (possibly user-overridable) relationship
between an NLS or similar tag (e.g. lang in HTML or xml:lang) and one of
these so that a browser or word-processing app that knows what language
(e.g. what RFC 3066 tag) is applied to the data can tell the
layout/rendering sub-system what OT language-system tags to apply
(assuming some API exists to do so)? Surely that is where we want to move
toward.
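
A user-overridable relationship of the kind asked about above might look like the following sketch. It is not an existing API: the table name, the function langsys_for and the prefix-fallback rule are all assumptions, and the four-character values should be checked against the OpenType language-system tag registry before being relied on.

```python
# Hypothetical default table: RFC 3066 language tag -> OpenType language-system tag.
# OpenType tags are four characters, padded with a trailing space.
DEFAULT_LANGSYS = {
    "ja": "JAN ",
    "ko": "KOR ",
    "zh-cn": "ZHS ",
    "zh-tw": "ZHT ",
    "de": "DEU ",
    "pl": "PLK ",
}


def langsys_for(lang_tag, user_overrides=None, default="dflt"):
    """Pick a language-system tag for a document language tag.

    User overrides are consulted together with the defaults and win on
    conflict; subtags are dropped from the right until something matches.
    """
    table = dict(DEFAULT_LANGSYS)
    if user_overrides:
        table.update({key.lower(): value for key, value in user_overrides.items()})
    parts = lang_tag.lower().split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in table:
            return table[candidate]
        parts.pop()
    return default


print(langsys_for("zh-TW"))                      # regional default
print(langsys_for("sv", {"sv": "DEU "}))         # user asks for German conventions
```

Keeping the override separate from the document's own lang value means the data stays correctly tagged for searching and spellchecking while the rendering follows the reader's typographic preference.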



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]







Re: glyph selection for Unicode in browsers

2002-09-27 Thread John Cowan

Tex Texin scripsit:

 I do need to point out that user preference is problematic if it means
 that for a user to display a multilingual document, the user has to go
 thru and specify font preferences for languages they know nothing about.

How can this be avoided?  If I print a document containing a small amount
of text in Georgian (in a bibliography entry, say), I am not going to
know if the Georgian font is the most beautiful thing ever made or one
that is utterly illegible.  I have to pass it to someone who can read
Georgian and wait for the Aah! or Arrgh! as the case may be.

Or I can take the default and hope for the best.

 Just because I don't read CJK, doesn't mean I don't have legitimate
 needs to display or print CJK in a typographically correct way.
 Librarians, Commerce exchanges, mailing lists, localizers, etc.

Since the issue is not really a matter of language, but of typographic
tradition (see John Jenkins's excellent discussion of this question at
http://www.unicode.org/unicode/faq/han_cjk.html#3), there is no such thing
as a typographically correct way.  In particular (as noted in the FAQ),
it is commonplace for a Japanese document that quotes Chinese text to
use Japanese-style glyphs for both languages, as this is apparently less
jarring to the average Japanese reader.

 But although you didn't quite say this, a user could provide a
 preference not for font, but language, i.e. if the script is CJK,
 display it as C or J or K (or T). And given the language the font
 mechanisms would do a reasonable thing.

That is reasonable provided you grasp what is meant by language
preference here: namely, typographical tradition preference.  It would
be like choosing between Fraktur and Antiqua when reading German text:
this too is rather broader than a mere font difference.

-- 
A mosquito cried out in his pain,   John Cowan
A chemist has poisoned my brain!  http://www.ccil.org/~cowan
The cause of his sorrow http://www.reutershealth.com
Was para-dichloro-  [EMAIL PROTECTED]
Diphenyltrichloroethane.(aka DDT)




Re: glyph selection for Unicode in browsers

2002-09-27 Thread Peter_Constable


On 09/26/2002 10:46:42 PM Andrew Cunningham wrote:

 For me, this is the crux: that browsers have not implemented the CSS
 :lang selector.

Again, the problem is knowing just *how* they should go about doing this.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]







Re: glyph selection for Unicode in browsers

2002-09-27 Thread Peter_Constable


On 09/26/2002 08:55:18 PM Tex Texin wrote:

 Yes, code page was not a good indicator of language, but it was used that
 way by some applications.

 And yes, language should not dominate font selection, it should
 influence it. Other typographic preferences also must be accommodated.

I agree.


 In the case of HTML, XML, CSS, ways to specify typographic preferences
 exist, and language can be expressed via lang. We just need browsers
 and other user agents to make use of the lang information as part of
 font selection.

The difficult question is How? Do we want some means (not codepage) to know
that certain fonts are suited to particular languages? Or do we want to
make use of smart-font capabilities to allow culturally-preferred glyphs to
be selected from a font? If the latter, then some more infrastructure still
needs to be developed within APIs and layout engines.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]







RE: glyph selection for Unicode in browsers

2002-09-27 Thread Peter_Constable


On 09/26/2002 07:24:08 PM Murray Sargent wrote:

 I don't think the idea is that codepage equals language. Rather codepage
 equals a writing system, which consists of one or more scripts (e.g., 6
 scripts for ShiftJIS). As such the codepage is a useful cue in choosing
 an appropriate font for rendering text.

(Murray and I talked about this some at dinner a couple of weeks ago, so
there's some history here.)

I don't think things are quite that simple. A codepage *can* be a useful
cue in choosing an appropriate font (or in choosing typographic preferences
by whatever means). This certainly may be the case in some instances, such
as Shift JIS. But it's not always the case. For instance, cp1251 doesn't
tell you what language is involved, and isn't sufficient to determine which
italic variants of certain Cyrillic characters are needed. Similarly,
cp1250 doesn't tell you what cultural preferences should apply in relation
to design and alignment of the ogonek diacritic (e.g. Polish and Lithuanian
differ in this regard), or other diacritics (e.g. caron should have a
distinct form for Czech); and cp1252 doesn't tell you about cultural
preferences regarding cedilla (three different forms can be used for
French, but only one is acceptable for Portuguese or Catalan).

That's why I maintain that a codepage is a character set, but not a writing
system. In general, a codepage does not determine a set of rules for
writing; it just provides a vocabulary with which to work.
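
The point that a codepage is a character set rather than a writing system can be checked mechanically. In this sketch the helper fits_codepage and the sample words are assumptions chosen for illustration; the codec names are the standard Python ones.

```python
def fits_codepage(text, codepage):
    """True if every character of `text` can be encoded in `codepage`."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


# The word for "language" in Russian, Bulgarian and Serbian: all fit cp1251,
# so the codepage alone cannot say which italic letterforms are wanted.
cyrillic_samples = {"ru": "язык", "bg": "език", "sr": "језик"}

# A Polish and a Lithuanian word sharing a-ogonek: both fit cp1250,
# so the codepage cannot say which ogonek design applies.
ogonek_samples = {"pl": "wąż", "lt": "ąžuolas"}

for lang, word in cyrillic_samples.items():
    print(lang, word, fits_codepage(word, "cp1251"))
for lang, word in ogonek_samples.items():
    print(lang, word, fits_codepage(word, "cp1250"))
```

Every sample encodes successfully, and nothing in the resulting bytes records which language, and therefore which typographic convention, the author had in mind.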




 The bottom line is that if text was generated using a particular
 codepage it's likely that the creator of that text intended the text to
 be rendered with a font that supports that codepage.

Of course, fonts can support multiple codepages. Given e.g. Arial, Tahoma
and Verdana, they all support codepages 1250, 1251, 1252, 1253, 1254, 1257
and 1258. That doesn't tell you whether they're appropriate for Polish or
Lithuanian or Czech or whatever. Even the fact that they support cp1258
doesn't imply that they are appropriate for Vietnamese: e.g. the default
glyphs in Arial for U+1EA5 and U+1EA7 do not have the diacritics stacked in
the way needed for Vietnamese.
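
For readers unfamiliar with the two characters named above, this short sketch uses Python's unicodedata module to show that each carries two marks on one base letter, which is why their stacking matters. The helper name stacked_marks is an assumption for illustration; it says nothing about how any particular font draws them.

```python
import unicodedata


def stacked_marks(ch):
    """Return (base letter, [names of combining marks]) after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", ch)
    base, marks = decomposed[0], decomposed[1:]
    return base, [unicodedata.name(mark) for mark in marks]


for code_point in (0x1EA5, 0x1EA7):
    ch = chr(code_point)
    print(f"U+{code_point:04X}", unicodedata.name(ch), stacked_marks(ch))
```

Both characters decompose to the letter a plus a circumflex and a second accent, so a font has to position two diacritics relative to each other as well as to the base.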

I'm not saying that codepage information isn't ever useful. Obviously, you
have found it very useful. But the usefulness has limits.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]







Re: glyph selection for Unicode in browsers

2002-09-27 Thread Peter_Constable


On 09/27/2002 12:27:22 AM jameskass wrote:

 Don't despair.  As Peter Constable has pointed out, the infrastructure
 for having browsers support language tags is already present.

Actually, my point was specifically that *part* of the infrastructure is
already present, at least in OpenType, but not *all*, either in OpenType
(meaning of language in the OT spec needs to be clarified, and
relationships between these tags and the language tags used for data e.g.
RFC 3066, need to be resolved), or in APIs (there's no way for apps to
indicate which OT language tag to apply to a run unless the app wishes to
do *all* of the OT support -- replacing e.g. Uniscribe -- itself).


 Once the font specs for all this are set and fonts are released with
 the necessary coverage and the shaping engines can access all of this,
 the browsers are sure to quickly add support, too.

I'm not quite as optimistic in terms of how close we are to having all this
ready to go. I think there's some hard work still ahead.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]








Re: glyph selection for Unicode in browsers

2002-09-27 Thread Tex Texin

John,
Thanks for commenting. Responses embedded.

John Cowan wrote:
 
 Tex Texin scripsit:
 
  I do need to point out that user preference is problematic if it means
  that for a user to display a multilingual document, the user has to go
  thru and specify font preferences for languages they know nothing about.
 
 How can this be avoided?  If I print a document containing a small amount
 of text in Georgian (in a bibliography entry, say), I am not going to
 know if the Georgian font is the most beautiful thing ever made or one
 that is utterly illegible.  I have to pass it to someone who can read
 Georgian and wait for the Aah! or Arrgh! as the case may be.
 
 Or I can take the default and hope for the best.

All I ask is the defaults be adequate. I wouldn't disallow software from
providing for users to express preferences. I am trying to avoid it
being required of users to provide preferences. Yes, most users don't
know which fonts are the best choices.

 
  Just because I don't read CJK, doesn't mean I don't have legitimate
  needs to display or print CJK in a typographically correct way.
  Librarians, Commerce exchanges, mailing lists, localizers, etc.
 
 Since the issue is not really a matter of language, but of typographic
 tradition (see John Jenkins's excellent discussion of this question at
 http://www.unicode.org/unicode/faq/han_cjk.html#3), there is no such thing
 as a typographically correct way.  In particular (as noted in the FAQ),
 it is commonplace for a Japanese document that quotes Chinese text to
 use Japanese-style glyphs for both languages, as this is apparently less
 jarring to the average Japanese reader.

Typographically correct was too strong. I am just looking for the font
to reflect the language, so CJK is displayed as either C or J or K as
indicated by HTML or XML lang tags.

With respect to the comment from John's FAQ, it is reasonable but only
for a user who is primarily or strongly a C, or J or K reader.
For many applications, such as printing labels for card catalogs or
mailing lists, the user's preference does not matter (because the
printout targets someone other than the person operating the software).
Also, for someone like myself who is not a reader, I would like text
displayed the same way each time so I stand a better chance of
recognizing it.
As more people work with multilingual data, I think more users will be
like myself.

An author of a primarily Japanese document could choose not to tag
Chinese text as Chinese, and so get a Japanese rendering of the text,
but that could hurt search engines or other applications that use
language tags for purposes other than rendering... So I stick with the
idea that text should be tagged with language appropriately, and that a user
who reads Japanese and prefers to see Chinese text with Japanese glyphs
should have the ability to override the language tags to affect rendering.

 
  But although you didn't quite say this, a user could provide a
  preference not for font, but language, i.e. if the script is CJK,
  display it as C or J or K (or T). And given the language the font
  mechanisms would do a reasonable thing.
 
 That is reasonable provided you grasp what is meant by language
 preference here: namely, typographical tradition preference.  It would
 be like choosing between Fraktur and Antiqua when reading German text:
 this too is rather broader than a mere font difference.

I am not a typographer, and I am just trying to point out requirements
for font selection for a typical user or at least a user that is not a
linguist, not a typographer, not a font specialist, and who wants to
display/print pan-Unicode or pan-script Unicode-based text.
I am not trying to address high end publishing requirements.

I can't say if typographical tradition preference (TTP) is the correct
term for language preference. (I figure I got into enough trouble
using typographically correct.) I hope the discussion above was clear
enough. I'll let others comment on TTP, and if there is general
agreement that it is a better and more precise and accurate term, I am
fine with it. I am not familiar enough with Fraktur and Antiqua to
knowledgeably comment. From what little I do know this seems to require
more than language information to decide between them. 
(I did find an interesting article on Fraktur though in trying to
understand your meaning, http://www.waldenfont.com/public/gbpmanual.pdf)


hth
tex
p.s. I am about to travel and may not have email for a few days. (A
cheer goes up from the list...)





 
 --
 A mosquito cried out in his pain,   John Cowan
 A chemist has poisoned my brain!  http://www.ccil.org/~cowan
 The cause of his sorrow http://www.reutershealth.com
 Was para-dichloro-  [EMAIL PROTECTED]
 Diphenyltrichloroethane.(aka DDT)

-- 
-
Tex Texin   cell: +1 781 789 1898   

Re: glyph selection for Unicode in browsers

2002-09-27 Thread Tex Texin



[EMAIL PROTECTED] wrote:
 
 On 09/26/2002 08:55:18 PM Tex Texin wrote:
 In the case of HTML, XML, CSS, ways to specify typographic preferences
 exist, and language can be expressed via lang. We just need browsers
 and other user agents to make use of the lang information as part of
 font selection.
 
 The difficult question is How? Do we want some means (not codepage) to know
 that certain fonts are suited to particular languages? 

Fonts already span multiple languages, so that by itself would not work.

 Or do we want to
 make use of smart-font capabilities to allow culturally-preferred glyphs to
 be selected from a font? If the latter, then some more infrastructure still
 needs to be developed within APIs and layout engines.

I think yes. Whereas APIs have relied on codepages either implicitly or
explicitly, this needs to be reexamined, and language should be allowed
to play a suitable role in glyph selection and font selection.




-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: glyph selection for Unicode in browsers

2002-09-27 Thread Jungshik Shin

On Thu, 26 Sep 2002, Tex Texin wrote:

 Yes, underlying fonts can be a Unicode architecture. That's a good
 thing, but invisible to end-users.
 I would like to keep the sense of Unicode font as meaning a font which
 supports a large number of scripts, rather than meaning one that uses
 Unicode for its mapping architecture.

 Yes, OS and browsers are getting better. My concerns center around:
 Is the mechanism for selecting fallback fonts language-sensitive, so
 that it would favor a Japanese font for Unicode Han characters that were
 tagged as lang:ja


  I'm a little at a loss as to why you have the impression
that  'lang' tag has little effect on rendering of html (in
UTF-8. e.g. your page or IUC10 announcement page which used to be at
http://www.unicode.org/iuc/iuc10/x-utf8.html) by major browsers. MS
IE has been making use of 'lang' attribute(html) for a long time and
Mozilla solved the problem (although 'xml:lang' is not yet supported)
last December. In case of Mozilla(and Netscape 7), see

  http://bugzilla.mozilla.org/show_bug.cgi?id=105199  (fixed.
   where you'll find a pair of screenshots with dramatically
   different rendering results)
  http://bugzilla.mozilla.org/show_bug.cgi?id=115121
  (xml:lang : not yet fixed)
  http://bugzilla.mozilla.org/show_bug.cgi?id=122779 (C-L http header
 and  UTF-8 document)

 And are the fonts labeled so that the supported language is known?

  Judging from the discussion about the issue in Xfree86-font
list, most modern OTFs are. Otherwise, applications (or a library
for text rendering/font selection) can resort to mapping the
character repertoire of a font to the language(s) it covers, as is done by
fontconfig for XFree86. For instance, if the characters in JIS X 0208 are all
covered but characters from GB2312, Big5 and KS X 1001 are missing,
the font is likely to be Japanese.
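
The coverage heuristic described above can be sketched as a subset test. This is not fontconfig's actual code: the function name, the language keys and above all the tiny repertoires are toy stand-ins for the full JIS X 0208, GB2312 and KS X 1001 tables, used here only to show the shape of the check.

```python
# Toy stand-ins for the real character repertoires (a few characters each).
TOY_REPERTOIRES = {
    "ja": {"の", "日", "語"},      # stands in for JIS X 0208
    "zh-CN": {"们", "这", "日"},   # stands in for GB2312
    "ko": {"한", "글", "日"},      # stands in for KS X 1001
}


def languages_covered(font_chars, repertoires=TOY_REPERTOIRES):
    """Return the languages whose whole repertoire the font covers."""
    return sorted(lang for lang, needed in repertoires.items()
                  if needed <= font_chars)


# A hypothetical font that has kana and kanji but no Hangul or simplified forms.
japanese_font = {"の", "日", "語", "本"}
print(languages_covered(japanese_font))
```

A font covering several repertoires at once would be reported for all of them, which is exactly the pan-script case where coverage alone stops being informative and per-language glyph selection inside the font is needed.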

 Even so, I'd still need to have a large collection of fonts then.

  Indeed that's the case. If OT lang-tag is made use of and
multiple alternative glyphs are available in a single(or
a few) pan-script Unicode font(s), you'd not have to.

  Jungshik





Re: glyph selection for Unicode in browsers

2002-09-27 Thread John Cowan

Tex Texin scripsit:

 An author of a primarily Japanese document could choose not to tag
 Chinese text as Chinese, and so get a Japanese rendering of the text,
 but that could hurt search engines or other applications that use
 language tags for purposes other than rendering...

Indeed, indeed.  Tagging (even implicit tagging) with a false language is
a very bad idea.

 So I stick with the
 idea that text should be tagged with language appropriately, and a user
 that reads Japanese and prefers to see Chinese text with Japanese glyphs
 have the ability to override the language tags to affect rendering.

The trouble is that that's the default for a Japanese reader reading
mixed-language text.  No override should be required.

 I can't say if typographical tradition preference (TTP) is the correct
 term for language preference. (I figure I got into enough trouble
 using typographically correct.) I hope the discussion above was clear
 enough. I'll let others comment on TTP, and if there is general
 agreement that it is a better and more precise and accurate term, I am
 fine with it.

My point was that it's one thing to want Chinese text displayed with
Japanese glyphs, based on a typographical-tradition preference, and it's
another thing to want the text in a Japanese-language version,
which is what setting a language preference would suggest.

 I am not familiar enough with Fraktur and Antiqua to
 knowledgably comment. From what little I do know this seems to require
 more than language information to decide between them. 

Absolutely.  The analogy is that Fraktur is quite, or nearly, illegible if
all you know how to read is Antiqua (which looks like what you are seeing
now, ordinary Latin-script type).  This makes the difference greater than a
mere font difference.

-- 
Business before pleasure, if not too bloomering long before.
--Nicholas van Rijn
John Cowan [EMAIL PROTECTED]
http://www.ccil.org/~cowan  http://www.reutershealth.com




Re: glyph selection for Unicode in browsers

2002-09-27 Thread Tex Texin

Hi,
I am glad to see the issue has been given some attention.
I concluded there was a problem after experimenting with some CJK
characters that I repeated with different lang tags and could not get
any display differences unless I used non-Unicode fonts assigned to each
language. I did this with IE 6 and NS 7 and Opera (don't recall if it was
6 or 7.)

tex


Jungshik Shin wrote:
 
 On Thu, 26 Sep 2002, Tex Texin wrote:
 
  Yes, underlying fonts can be a Unicode architecture. That's a good
  thing, but invisible to end-users.
  I would like to keep the sense of Unicode font as meaning a font which
  supports a large number of scripts, rather than meaning one that uses
  Unicode for its mapping architecture.
 
  Yes, OS and browsers are getting better. My concerns center around:
  Is the mechanism for selecting fallback fonts language-sensitive, so
  that it would favor a Japanese font for Unicode Han characters that were
  tagged as lang:ja
 
   I'm a little at loss as to why you have the impression
 that  'lang' tag has little effect on rendering of html (in
 UTF-8. e.g. your page or IUC10 announcement page which used to be at
 http://www.unicode.org/iuc/iuc10/x-utf8.html) by major browsers. MS
 IE has been making use of 'lang' attribute(html) for a long time and
 Mozilla solved the problem (although 'xml:lang' is not yet supported)
 last December. In case of Mozilla(and Netscape 7), see
 
   http://bugzilla.mozilla.org/show_bug.cgi?id=105199  (fixed.
where you'll find a pair of screenshots with dramatically
different rendering results)
   http://bugzilla.mozilla.org/show_bug.cgi?id=115121
   (xml:lang : not yet fixed)
   http://bugzilla.mozilla.org/show_bug.cgi?id=122779 (C-L http header
  and  UTF-8 document)
 
  And are the fonts labeled so that the supported language is known?
 
   Judging from the discussion about the issue in Xfree86-font
 list, most of modern OTFs are. Otherwise, applications (or  a library
 for text rendering/font selection) can resort to a kind of mapping the
 character repertoire of a font to language(s) covered as is done by
 fontconfig for XFree86. For instance, if the characters in JIS X 0208 are
 all covered but characters from GB2312, Big5 and KS X 1001 are missing,
 a font is likely to be Japanese.
 
  Even so, I'd still need to have a large collection of fonts then.
 
   Indeed that's the case. If OT lang-tag is made use of and
 multiple alternative glyphs are available in a single(or
 a few) pan-script Unicode font(s), you'd not have to.
 
   Jungshik
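
The repertoire heuristic Jungshik describes (JIS X 0208 fully covered,
GB2312/Big5/KS X 1001 not) can be sketched roughly as follows. This is
only an illustration, not how fontconfig is implemented: Python's legacy
codecs stand in for the national character sets, only the main CJK
Unified Ideographs block is examined, and the names guess_languages and
the 0.99 threshold are assumptions made up for the example.

```python
# Sketch: guess which CJK language(s) a font targets from its repertoire.
# Legacy codecs stand in for the national character sets (an assumption).
CHARSET_CODECS = {
    "ja": "shift_jis",   # JIS X 0208 kanji
    "zh-CN": "gb2312",   # GB2312 hanzi
    "zh-TW": "big5",     # Big5 hanzi
    "ko": "euc_kr",      # KS X 1001 hanja
}

HAN_BLOCK = range(0x4E00, 0xA000)  # CJK Unified Ideographs block only


def han_repertoire(codec):
    """Return the code points in HAN_BLOCK that `codec` can encode."""
    repertoire = set()
    for cp in HAN_BLOCK:
        try:
            chr(cp).encode(codec)
        except UnicodeEncodeError:
            continue
        repertoire.add(cp)
    return repertoire


REPERTOIRES = {lang: han_repertoire(c) for lang, c in CHARSET_CODECS.items()}


def guess_languages(font_codepoints, threshold=0.99):
    """List the languages whose Han repertoire the font (nearly) covers."""
    covered = set(font_codepoints)
    return [
        lang
        for lang, repertoire in REPERTOIRES.items()
        if len(repertoire & covered) / len(repertoire) >= threshold
    ]
```

A font whose coverage is exactly the JIS X 0208 kanji would be reported as
"ja" and would not pass the test for "zh-CN" or "zh-TW", since those
repertoires are larger than the JIS one. A real implementation would read
the coverage from the font's cmap rather than from a hand-built set.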

-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: glyph selection for Unicode in browsers

2002-09-27 Thread Tex Texin



John Cowan wrote:
 
 Tex Texin scripsit:
 
  An author of a primarily Japanese document could choose not to tag
  Chinese text as Chinese, and so get a Japanese rendering of the text,
  but that could hurt search engines or other applications that use
  language tags for purposes other than rendering...
 
 Indeed, indeed.  Tagging (even implicit tagging) with a false language is
 a very bad idea.
 
  So I stick with the
  idea that text should be tagged with language appropriately, and a user
  that reads Japanese and prefers to see Chinese text with Japanese glyphs
  have the ability to override the language tags to affect rendering.
 
 The trouble is that that's the default for a Japanese reader reading
 mixed-language text.  No override should be required.

It's not a big problem. Browsers already have options such as the radio
button under fonts in Netscape's preferences:
use fonts specified in document vs. override and use user-defined fonts.

 
  I can't say if typographical tradition preference (TTP) is the correct
  term for language preference. (I figure I got into enough trouble
  using typographically correct.) I hope the discussion above was clear
  enough. I'll let others comment on TTP, and if there is general
  agreement that it is a better and more precise and accurate term, I am
  fine with it.
 
 My point was that it's one thing to want Chinese text displayed with
 Japanese glyphs, based on a typographical-tradition preference, and it's
 another thing to want the text in a Japanese-language version,
 which is what setting a language preference would suggest.
 
  I am not familiar enough with Fraktur and Antiqua to
  knowledgably comment. From what little I do know this seems to require
  more than language information to decide between them.
 
 Absolutely.  The analogy is that Fraktur is quite, or nearly, illegible if
 all you know how to read is Antiqua (which looks like what you are seeing
 now, ordinary Latin-script type).  This makes the difference greater than a
 mere font difference.

OK, but it is not clear to me that we should try to fix this problem in
the same way we fix the CJK rendering problem.
Language is being tagged and provided for a number of reasons, and it
should be utilized.
Fraktur/Antiqua and other distinctions might need a different mechanism.

tex

 
 --
 Business before pleasure, if not too bloomering long before.
 --Nicholas van Rijn
 John Cowan [EMAIL PROTECTED]
 http://www.ccil.org/~cowan  http://www.reutershealth.com

-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: glyph selection for Unicode in browsers

2002-09-27 Thread Jungshik Shin




On Fri, 27 Sep 2002 [EMAIL PROTECTED] wrote:


 On 09/26/2002 10:46:42 PM Andrew Cunningham wrote:

 For me, this is the crux: that browsers have not implemented the CSS
 :lang selector.

 As I wrote in my response to Tex, css 'lang' pseudo-class is
honored by MS IE and Mozilla 1.x/Netscape 7.

 Again, the problem is knowing just *how* they should go about doing this.

  As for 'how', what MS IE and Mozilla do may not be as user-friendly
as Tex wants them to be, but I think it's pretty reasonable at
least for CJK. If they're configured to use different Unicode-cmapped
(non-Pan-script) fonts for TC/SC/J/K (as opposed to pan-script Unicode
fonts like MS Arial Unicode, Cyberbit), runs of text tagged with TC/SC/J/K
are rendered with fonts configured for TC,SC,J and K, respectively.

I guess you already know this much and what you're alluding
to is a problem of another dimension: developing (pan-script,
if necessary/possible) Unicode fonts with multiple lang-dependent
glyphs (if that's possible at all, given the various subtle
issues involved; it seems that selecting lang-dependent glyphs for
Latin/Cyrillic letters is more difficult than the CJK case) and getting
apps and rendering/font selection libraries to make use of them.  The font
selection part of these problems is addressed by the fontconfig package by
Keith Packard (http://fontconfig.org). Of course, there should be
other implementations of/attempts at this problem.

  Jungshik Shin





Re: glyph selection for Unicode in browsers

2002-09-27 Thread Tex Texin

Jungshik,

I used characters that should display differently: some punctuation, and
some characters such as (I think it is) the bone character.

However, feel free to suggest a list of characters that should be
distinctive and I'll post a page with them, so that we can all review
whether or not there are differences on various platforms, browsers,
etc.

I agree for many characters there should be no differences.

tex

Jungshik Shin wrote:
 
   Actually, you might have had a hard time telling the display difference
 depending on what characters you used for your testing EVEN IF you
 configured browsers to use different (but with __very similar__ design
 principles and look/feels) Unicode-cmapped (but NON-pan-script) fonts
 for TC,SC, J and K *under MS Windows*.  This difficulty demonstrates
 that CJK Unification in Unicode/10646 is not such a big problem as some
 people tried to make it.
 
   Jungshik

-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: glyph selection for Unicode in browsers

2002-09-27 Thread Peter_Constable


On 09/27/2002 10:56:00 AM Jungshik Shin wrote:

 Again, the problem is knowing just *how* they should go about doing
this.

  As for 'how', what MS IE and Mozilla do may not be as user-friendly
as Tex wants them to be, but I think it's pretty reasonable at
least for CJK. If they're configured to use different Unicode-cmapped
(non-Pan-script) fonts for TC/SC/J/K (as opposed to pan-script Unicode
fonts like MS Arial Unicode, Cyberbit), runs of text tagged with TC/SC/J/K
are rendered with fonts configured for TC,SC,J and K, respectively.

A couple of notes:

Speaking in generalities, a font that isn't a pan-script Unicode font
potentially can support TC/SC/J/K equally well with glyphs suited to users
in each culture -- but not using default character-to-glyph mappings. The
mechanisms available to IE or Mozilla today would not provide any means to
determine which typographic preferences are supported by default in a given
font. Nor does the infrastructure exist that would allow these apps to
request the culturally-preferred glyphs that would exist in such fonts. Of
course, in practice, many currently-existing CJK fonts may have been
developed to support a single group of users, and don't include alternate
glyphs that might be preferred by users in other cultures.

Also, what IE and Mozilla currently do helps with the CJK issues, but these
apps don't do anything, that I know of, in relation to comparable issues
for other scripts, e.g. language-related preferences for Latin diacritics
or Cyrillic italic forms. Which you anticipate:


I guess you already know this much and what you're alluding
to is a problem of another dimension:  developing (pan-script,
if necessary/possible) Unicode fonts with multiple lang-dependent
glyphs

Yes (with the added note that the pan-script element is orthogonal to what
I'm referring to).


it seems like selecting lang-dependent glyphs for
Latin/Cyrillic letters are more difficult than CJK case

I'm not sure; I haven't thought about that, in part because I have
only limited knowledge of what glyph variation issues there are for most
scripts.


The font
selection part of these problems is addressed by fontconfig package by
Keith Packard (http://fontconfig.org). Of course, there should be
other implementations of/attempts at this problem.

The fontconfig library is entirely new to me. Thanks for the link.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]







Re: glyph selection for Unicode in browsers

2002-09-26 Thread Thomas Chan

On Thu, 26 Sep 2002 [EMAIL PROTECTED] wrote:

 Tex Texin wrote,
  Given the (un)workable approach, do you then intend to have variants of
  code2000 for CJKT, so one can make the appropriate assignments? (ugh!)
 
 Code2000's coverage of CJKTV ideographs isn't adequate to support any language
 yet.  Eventually and hopefully the repertoire will be completed.  Given the
 current ceiling of 65536 max glyphs per font, it might not be feasible to
 try to have one font cover all scripts and variants, but time will tell.

I don't mean to detract from the point of this discussion, nor to
criticize a particular font, but I think the Han glyphs in Code2000 are
aesthetically disappointing in that they are distorted enough (in shape,
proportions, and positioning) that they differ from any typical CJK font
more than two comparable CJK fonts differ from each other due to
language/country glyph preferences.  Compare, for instance, with other
sans serif CJK fonts like Arial Unicode MS, (cn) MS Hei, or (ja) MS
Gothic.

But changing the example to fonts like Arial Unicode MS doesn't completely
solve everything--a sans serif font is not the norm for non-trivial
quantities of CJK text (compare any book or newspaper).  These problems
would cause rejection of a font faster than adverse reactions to
foreign/unfamiliar glyph designs.  (The aging serifed Bitstream Cyberbit
font might be a better example in this respect.)


Thomas Chan
[EMAIL PROTECTED]





Re: glyph selection for Unicode in browsers

2002-09-26 Thread Markus Scherer

Tex Texin wrote:

 However, a Japanese user might have to choose a Japanese font, if the
 Unicode font does not favor (and cannot be made to favor with language
 tags) Japanese renderings.
 So it's catch 22. They have native fonts because Unicode fonts are
 inadequate, but we can be relieved that although Unicode fonts are
 inadequate, we are lucky the users don't use them.


I am not sure this is as bad as it may sound:
Modern native fonts use Unicode cmaps (mapping tables from _Unicode text_ to glyph 
IDs) instead of SJIS/whatever cmaps.
They will just not contain entries for much else but native characters.

In that sense, those native fonts will also be Unicode fonts.
Operating systems and browsers are also getting better at automatically selecting 
fallback fonts for characters that are missing in the main font.

markus





Re: glyph selection for Unicode in browsers

2002-09-26 Thread Tex Texin

Hi,
Yes, these fonts do not solve everything. (Nor should they.)

We should be careful not to apply the requirements for high end
publishing systems to software that just needs to have adequate
rendering, such as browsers and other software.

I would like to have adequate coverage for the Unicode space, with some
language awareness or sensitivity, before we raise the bar to the level
of requiring publishing quality.

I would guess high end publishers are quite comfortable choosing
(acquiring, installing, selecting) specialized fonts for different
situations, including for rendering different languages.

However, for people that are not so adept at choosing fonts and
assigning them by language, browsers and other software need to have a
reasonable solution.

tex

John Cowan wrote:
 
 Thomas Chan scripsit:
 
  But changing the example to fonts like Arial Unicode MS doesn't completely
  solve everything--a sans serif font is not the norm for non-trivial
  quantities of CJK text (compare any book or newspaper).
 
 Nor any other kind of text, indeed, until the widespread use of Arial/Helvetica,
 which properly is only a display font, as a text font (ugh).
 
 --
 John Cowan  [EMAIL PROTECTED]  www.ccil.org/~cowan  www.reutershealth.com
 If I have not seen as far as others, it is because giants were standing
 on my shoulders.
 --Hal Abelson

-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: glyph selection for Unicode in browsers

2002-09-26 Thread Tex Texin

Markus,

Yes, underlying fonts can be a Unicode architecture. That's a good
thing, but invisible to end-users.
I would like to keep the sense of Unicode font as meaning a font which
supports a large number of scripts, rather than meaning one that uses
Unicode for its mapping architecture.

Yes, OS and browsers are getting better. My concerns center around:
Is the mechanism for selecting fallback fonts language-sensitive, so
that it would favor a Japanese font for Unicode Han characters that were
tagged as lang:ja? And are the fonts labeled so that the supported
language is known?

Even so, I'd still need to have a large collection of fonts then.

I would like to be able to publish a page such as the Unicode example
page:
http://www.i18nguy.com/unicode-example.html

without feeling obligated to publish a pdf version
http://www.i18nguy.com/unicode/unicodeexample.pdf

so that the less technical among us would not feel challenged to acquire
and install several fonts, language by language.

And my main point is that my investment in tagging text segments with
lang should result in the most appropriate rendering.
Currently, using a Unicode font, lang has no visible effect.

tex


Markus Scherer wrote:
 
 Tex Texin wrote:
 
  However, a Japanese user might have to choose a Japanese font, if the
  Unicode font does not favor (and cannot be made to favor with language
  tags) Japanese renderings.
  So it's catch 22. They have native fonts because Unicode fonts are
  inadequate, but we can be relieved that although Unicode fonts are
  inadequate, we are lucky the users don't use them.
 
 I am not sure this is as bad as it may sound:
 Modern native fonts use Unicode cmaps (mapping tables from _Unicode text_ to glyph 
IDs) instead of SJIS/whatever cmaps.
 They will just not contain entries for much else but native characters.
 
 In that sense, those native fonts will also be Unicode fonts.
 Operating systems and browsers are also getting better at automatically selecting 
fallback fonts for characters that are missing in the main font.
 
 markus

-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




RE: glyph selection for Unicode in browsers

2002-09-26 Thread P. J. Patterson

Actually, as a publisher, we do have a problem with this.  I publish
scientific abstract data which is collected from authors all over the
world.  Since the information becomes dated so quickly, we are always
looking for ways to reduce turn around time from collection to
publication.  The books are sometimes 600 pages, and overhead must be
kept low.  Unicode helps us keep the data accurate, but we are still
running into problems identifying missing glyphs prior to printing - for
the most part it comes down to visual recognition.  This is complicated
by the fact that the submission and review processes are all browser
based.

I spoke with a few people from Adobe at the conference, and the concept
of fall-back fonts was very appealing, at least to minimize the missing
glyphs and still allow for wider font selections.  But some sort of
alert system beyond the unrecognized character display is really what we
are looking for.

I think, ideally, I would be looking for a program to examine a
document, compare to the selected fonts (with fallback), and then list
the missing glyphs for individual handling.

Anyone have any thoughts?  



P.J. Patterson
Director of Product Research and Development
Coe-Truman Technologies, Inc.
e. [EMAIL PROTECTED]
p. 217-398-8594
f. 217-355-0101 


 -Original Message-
 From: Tex Texin [mailto:[EMAIL PROTECTED]] 
 Sent: Thursday, September 26, 2002 12:21 PM
 To: John Cowan
 Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: Re: glyph selection for Unicode in browsers
 
 
 Hi,
 Yes, these fonts do not solve everything. (Nor should they.)
 
 We should be careful not to apply the requirements for high 
 end publishing systems to software that just needs to have 
 adequate rendering, such as browsers and other software.
 
 I would like to have adequate coverage for the Unicode space, 
 with some language awareness or sensitivity, before we raise 
 the bar to the level of requiring publishing quality.
 
 I would guess high end publishers are quite comfortable 
 choosing (acquiring, installing, selecting) specialized fonts 
 for different situations, including for rendering different languages.
 
 However, for people that are not so adept at choosing fonts 
 and assigning them by language, browsers and other software 
 need to have a reasonable solution.
 
 tex
 
 John Cowan wrote:
  
  Thomas Chan scripsit:
  
   But changing the example to fonts like Arial Unicode MS doesn't 
   completely solve everything--a sans serif font is not the 
 norm for 
   non-trivial quantities of CJK text (compare any book or 
 newspaper).
  
  Nor any other kind of text, indeed, until the widespread use of 
  Arial/Helvetica, which properly is only a display font, as 
 a text font 
  (ugh).
  
  --
  John Cowan  [EMAIL PROTECTED]  www.ccil.org/~cowan  
  www.reutershealth.com If I have not seen as far as others, it is 
  because giants were standing on my shoulders.
  --Hal Abelson
 
 -- 
 -
 Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
 Xen Master  http://www.i18nGuy.com
  
 XenCraft  http://www.XenCraft.com
 Making e-Business Work Around the World
 -
 
 




Re: glyph selection for Unicode in browsers

2002-09-26 Thread jameskass


Tex Texin wrote,

 I would like to keep the sense of Unicode font as meaning a font which
 supports a large number of scripts, rather than meaning one that uses
 Unicode for its mapping architecture.

pan-Unicode font

I think Frank da Cruz coined that expression, but am not sure.

Since fonts do use Unicode mapping, some kind of modifier is needed in
order to distinguish the big ones from the other kind.

Best regards,

James Kass.




Re: glyph selection for Unicode in browsers

2002-09-26 Thread Tex Texin

Shouldn't that be something more like: pan-script Unicode-based font?

[EMAIL PROTECTED] wrote:
 
 Tex Texin wrote,
 
  I would like to keep the sense of Unicode font as meaning a font which
  supports a large number of scripts, rather than meaning one that uses
  Unicode for its mapping architecture.
 
 pan-Unicode font
 
 I think Frank da Cruz coined that expression, but am not sure.
 
 Since fonts do use Unicode mapping, some kind of modifier is needed in
 order to distinguish the big ones from the other kind.
 
 Best regards,
 
 James Kass.

-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: glyph selection for Unicode in browsers

2002-09-26 Thread Peter_Constable


On 09/26/2002 12:52:13 PM Tex Texin wrote:

I would like to keep the sense of Unicode font as meaning a font which
supports a large number of scripts, rather than meaning one that uses
Unicode for its mapping architecture.

I suppose you didn't happen to attend the sessions at a number of past
Unicode conferences (not this last one, though) in which folks from Monotype
presented on this. In general, font developers don't recommend the
idea of a single font that covers all of Unicode (it's not possible, BTW,
given the 64K glyph limit). There are a variety of reasons for this. Even
so, people keep looking for one.

As for terminology, Unicode font is too ambiguous for the reasons Markus
mentioned having to do with cmaps. You may be far more concerned with
comprehensive coverage, but that isn't necessarily everyone's concern. In
my work, I have to deal far more with fonts that use different encodings
than I do with fonts that have comprehensive coverage. I much prefer to
refer to comprehensive-coverage fonts as pan-Unicode fonts, and for the
other issue, to refer to Unicode-encoded or Unicode-conformant (as
opposed to custom-encoded) fonts.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]







RE: glyph selection for Unicode in browsers

2002-09-26 Thread Peter_Constable


On 09/26/2002 01:15:37 PM P. J. Patterson wrote:

I think, ideally, I would be looking for a program to examine a
document, compare to the selected fonts (with fallback), and then list
the missing glyphs for individual handling.

It wouldn't be all that difficult for someone to create a tool that
compared a set of data with a preferential list of fonts to determine which
characters are going to be supported by which fonts, and which are not
covered. That still wouldn't address Tex's concern with regard to
language-specific glyph preferences, unless the tool also knew which fonts
were designed for which languages (or give preference to which languages as
defaults).
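
A rough sketch of such a tool, for illustration only. It assumes each
font's coverage is already available as a set of code points (in practice
this would be read from the font's cmap); the function name
coverage_report and the font names below are hypothetical. It does not
attempt the language-preference part, only the coverage part.

```python
def coverage_report(text, fonts):
    """Assign each distinct character to the first font that covers it.

    `fonts` is an ordered list of (name, set_of_code_points) pairs, with
    the preferred font first and fallbacks after it. Returns a pair:
    a dict mapping font name to the characters it would render, and a
    list of characters that no font in the list covers.
    """
    assigned = {}
    missing = []
    for ch in dict.fromkeys(text):  # distinct characters, in document order
        if ch.isspace():
            continue  # whitespace needs no glyph check in this sketch
        for name, codepoints in fonts:
            if ord(ch) in codepoints:
                assigned.setdefault(name, []).append(ch)
                break
        else:
            missing.append(ch)  # no font in the list has this character
    return assigned, missing


# Hypothetical coverage data: a Latin text font with a kana fallback.
fonts = [
    ("LatinText", set(range(0x20, 0x7F))),
    ("KanaFallback", set(range(0x3040, 0x3100))),
]
assigned, missing = coverage_report("abc かな 骨", fonts)
```

Here the Latin letters go to the first font, the kana to the fallback, and
the Han character ends up in the missing list for individual handling.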



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]







Re: glyph selection for Unicode in browsers

2002-09-26 Thread Tex Texin

Peter,

Yes, I am aware of the difficulty of creating a single font that covers
all of Unicode.
And fine, let's change terminology. I was trying to make sure that my
use of Unicode font was clear.

Whether it's difficult or not:
1) There is a need for a simple solution for fonts that lay people can
use in conjunction with Unicode text.
2) The glyphs need to vary based on language when such information is
available.
3) The language information used to be derived from code page and is
missing with Unicode, and the architecture needs to accommodate a better
model for bringing language to font selection.

That said, I'll use any terminology people want me to use, provided it
doesn't obscure the issue(s).

I don't require that a single font be used to solve problem #1. It can
be a bundle of fonts or some other packaging of fonts. I only require
that it be accessible to non-technical, non-linguist, people, who
require a simple install and broad coverage, to get reasonable (not
necessarily high end publishing) quality. It should be something I can
do once in advance of receiving documents, and not something I need to
do or reconsider every time I get a document and find new missing
glyphs.

I also don't care who provides the solution- it can be a font vendor, or
it can be a package distributed by a browser vendor or someone else.

The problem I am looking to solve is to be able to recommend Unicode as
best practice for the web.
I don't think it is best practice if there are markets where the
rendering is poor because the language information once provided by
code page is lost and not replaced by the lang facility. Nor is it best
practice if, in using Unicode, you need to be either technical or a
linguist to identify and use the right fonts to display a document.

There is a market opportunity here for some industrious individuals...
And I hope the browser vendors are looking at the use of lang to assist
in font selection.

tex


[EMAIL PROTECTED] wrote:
 
 On 09/26/2002 12:52:13 PM Tex Texin wrote:
 
 I would like to keep the sense of Unicode font as meaning a font which
 supports a large number of scripts, rather than meaning one that uses
 Unicode for its mapping architecture.
 
 I suppose you didn't happen to attend the sessions at a number of past
 Unicode conferences (not this last one, though) in which folks from Monotype
 presented on this. In general, font developers don't recommend the
 idea of a single font that covers all of Unicode (it's not possible, BTW,
 given the 64K glyph limit). There are a variety of reasons for this. Even
 so, people keep looking for one.
 
 As for terminology, Unicode font is too ambiguous for the reasons Markus
 mentioned having to do with cmaps. You may be far more concerned with
 comprehensive coverage, but that isn't necessarily everyone's concern. In
 my work, I have to deal far more with fonts that use different encodings
 than I do with fonts that have comprehensive coverage. I much prefer to
 refer to comprehensive-coverage fonts as pan-Unicode fonts, and for the
 other issue, to refer to Unicode-encoded or Unicode-conformant (as
 opposed to custom-encoded) fonts.
 
 - Peter
 
 ---
 Peter Constable
 
 Non-Roman Script Initiative, SIL International
 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
 Tel: +1 972 708 7485
 E-mail: [EMAIL PROTECTED]

-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: glyph selection for Unicode in browsers

2002-09-26 Thread Barry Caplan

At 02:59 PM 9/26/2002 -0400, Tex Texin wrote:
Shouldn't that be something more like: pan-script Unicode-based font?


or p8e font? :)

Barry Caplan
www.i18n.com





Re: glyph selection for Unicode in browsers

2002-09-26 Thread Peter_Constable


On 09/26/2002 03:05:36 PM Tex Texin wrote:

The problem I am looking to solve, is to be able to recommend Unicode as
best practice for the web.

Which is a good thing to be concerned with, and all of the issues you raise
are certainly important.



There is a market opportunity here for some industrious individuals...

I think there are various font technology issues very much in need of
solution, and preferably by agreement among font vendors (and also platform
vendors, for certain issues) as to how to go about it. I'm not sure how
that might come about.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]







Re: glyph selection for Unicode in browsers

2002-09-26 Thread Kenneth Whistler

Tex,

 3) The language information used to be derived

dubiously

 from code page and is
 missing with Unicode, and architecture needs to accommodate a better
 model for bringing language to font selection.

The archetypal situation is for CJK, and in particular J,
where language choice correlates closely with typographical
preferences, and where character encoding could, in turn,
be correlated reliably with language choice.

But in general, the connection does not hold, as for data
in any of hundreds of different languages written in Code Page 1252,
for example.

What you are really looking for, I believe, is a way to
specify typographical preference, which then can be used to
drive auto-selection of fonts.

I don't think we should head down the garden path of trying
to tie typographical preference too closely to language identity,
however we unknot that particular problem. This could get
you into contrarian problems, where browsers (or other tools)
start paying *too* much attention to language tags, and
automatically (and mysteriously) override user preferences
about the typographical preferences they expect for characters.

What is needed, I believe, is:

  a. a way to establish typographic preferences
  b. a way to link typographical preference choices to
   fonts that would express them correctly
  c. a way to (optionally) associate a language with
   a typographical preference

And this all should be done, of course, in such a way that
default behavior is reasonable and undue burdens of understanding,
font acquisition, installation, and such
are not placed on end-users who simply want to read and print
documents from the web.

A tall order, I am sure. But as long as we are blue-skying about
architecture for better solutions, I think it is important
not to replace one broken model (code page = language) with
another broken model (language = font preference).
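
As a minimal sketch of points a-c above, the three pieces can be kept as
separate tables so that language only supplies a default and never
overrides an explicit user choice. Everything here is an assumption made
for illustration: the preference names, the font names, and the function
names are hypothetical, and tag matching is case-sensitive for brevity.

```python
# (b) typographical preference -> fonts that express it (hypothetical names)
FONTS_FOR_PREFERENCE = {
    "japanese": ["ExampleMinchoJ"],
    "simplified-chinese": ["ExampleSongSC"],
    "traditional-chinese": ["ExampleMingTC"],
    "korean": ["ExampleBatangK"],
}

# (c) optional association: language tag -> default typographical preference
DEFAULT_PREFERENCE_FOR_LANG = {
    "ja": "japanese",
    "zh-CN": "simplified-chinese",
    "zh-TW": "traditional-chinese",
    "ko": "korean",
}


def preference_for(lang_tag):
    """Look up a default preference, falling back from 'ja-JP' to 'ja'."""
    tag = lang_tag
    while tag:
        if tag in DEFAULT_PREFERENCE_FOR_LANG:
            return DEFAULT_PREFERENCE_FOR_LANG[tag]
        tag = tag.rpartition("-")[0]  # drop the last subtag
    return None


def choose_fonts(lang_tag, user_preference=None, fallback="ExamplePanUnicode"):
    """(a) An explicit user preference wins; the language tag is a default."""
    preference = user_preference or preference_for(lang_tag)
    return FONTS_FOR_PREFERENCE.get(preference, []) + [fallback]
```

With these tables, choose_fonts("zh-TW") starts with the Traditional
Chinese font, while choose_fonts("zh-TW", user_preference="japanese")
starts with the Japanese one. The text stays correctly tagged as Chinese
in both cases, so other uses of the language tag are not affected.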

--Ken




RE: glyph selection for Unicode in browsers

2002-09-26 Thread Murray Sargent

I don't think the idea is that codepage equals language. Rather codepage
equals a writing system, which consists of one or more scripts (e.g., 6
scripts for ShiftJIS). As such the codepage is a useful cue in choosing
an appropriate font for rendering text. In the RichEdit edit engine, we
use a codepage generalization called a CharRep and break Unicode plain
text into runs of text each characterized by a particular CharRep. We
then bind these runs to appropriate fonts for rendering. There are many
additional considerations, so unfortunately this isn't an easy task. But
with enough refinements it works quite well. 

The bottom line is that if text was generated using a particular
codepage it's likely that the creator of that text intended the text to
be rendered with a font that supports that codepage. For text tagged
with no codepage, we do our best to translate the keyboard language to a
CharRep and proceed as above. When neither the keyboard nor codepage
info is available, we use a set of heuristics to break the text into
CharRep runs. Among the many heuristics used are 1) a string containing
Kana is likely to have a Japanese CharRep, and 2) a CJK string that
round trips through CHT, CHS, or ShiftJIS may well belong to those
CharReps. In particular if a CJK string doesn't round trip through CHT,
it's probably not Traditional Chinese.
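
To make the two heuristics above concrete, here is a small sketch. It is
not RichEdit's code: Python's big5, gb2312 and shift_jis codecs stand in
for CHT, CHS and ShiftJIS, and the function names and result labels are
made up for the example. It is only meaningful for strings that contain
CJK characters.

```python
def has_kana(text):
    """True if the text contains any Hiragana or Katakana (U+3040-U+30FF)."""
    return any("\u3040" <= ch <= "\u30ff" for ch in text)


def round_trips(text, codec):
    """True if the text survives encoding to `codec` and decoding back."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeError:
        return False


# Stand-in codecs for the CHT, CHS and ShiftJIS codepages (an assumption).
CANDIDATE_CODECS = {
    "traditional-chinese": "big5",
    "simplified-chinese": "gb2312",
    "japanese": "shift_jis",
}


def guess_charreps(text):
    """Return the plausible character repertoires for a CJK string."""
    if has_kana(text):
        return ["japanese"]  # heuristic 1: kana implies Japanese
    # heuristic 2: keep every codepage the string round-trips through
    return [
        name
        for name, codec in CANDIDATE_CODECS.items()
        if round_trips(text, codec)
    ]
```

A string with simplified forms, such as "汉语", round-trips only through
the Simplified Chinese codec. A string of traditional forms may round-trip
through both the Traditional Chinese and the Japanese codec, which is why
a result like this can only narrow the choice and further refinements are
needed.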

Murray




Re: glyph selection for Unicode in browsers

2002-09-26 Thread Tex Texin

Ken, thanks. I absolutely agree.

Yes, code page was not a good indicator of language, but it was used that
way by some applications.

And yes, language should not dominate font selection; it should
influence it. Other typographic preferences also must be accommodated.

Well said.

In the case of HTML, XML, CSS, ways to specify typographic preferences
exist, and language can be expressed via lang. We just need browsers
and other user agents to make use of the lang information as part of
font selection.

tex


Kenneth Whistler wrote:
 
 Tex,
 
  3) The language information used to be derived
 
 dubiously
 
  from code page and is
  missing with Unicode, and architecture needs to accommodate a better
  model for bringing language to font selection.
 
 The archetypal situation is for CJK, and in particular J,
 where language choice correlates closely with typographical
 preferences, and where character encoding could, in turn,
 be correlated reliably with language choice.
 
 But in general, the connection does not hold, as for data
 in any of hundreds of different languages written in Code Page 1252,
 for example.
 
 What you are really looking for, I believe, is a way to
 specify typographical preference, which then can be used to
 drive auto-selection of fonts.
 
 I don't think we should head down the garden path of trying
 to tie typographical preference too closely to language identity,
 however we unknot that particular problem. This could get
 you into contrarian problems, where browsers (or other tools)
 start paying *too* much attention to language tags, and
 automatically (and mysteriously) override user preferences
 about the typographical preferences they expect for characters.
 
 What is needed, I believe, is:
 
   a. a way to establish typographic preferences
   b. a way to link typographical preference choices to
fonts that would express them correctly
   c. a way to (optionally) associate a language with
a typographical preference
 
 And this all should be done, of course, in such a way that
 default behavior is reasonable and undue burdens of understanding,
 font acquisition, installation, and such
 are not placed on end-users who simply want to read and print
 documents from the web.
 
 A tall order, I am sure. But as long as we are blue-skying about
 architecture for better solutions, I think it is important
 not to replace one broken model (code page = language) with
 another broken model (language = font preference).
 
 --Ken

-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: glyph selection for Unicode in browsers

2002-09-26 Thread Andrew Cunningham

Hi

Tex Texin wrote:
 
 In the case of HTML, XML, CSS, ways to specify typographic preferences
 exist, and language can be expressed via lang. We just need browsers
 and other user agents to make use of the lang information as part of
 font selection.

For me, this is the crux: browsers have not implemented the CSS
:lang selector.

Things would be easier if we could tie presentation (via css) to the 
specified language of a document or part of a document.

Andrew

-- 
Andrew Cunningham
Multilingual  Technical Officer
OPT, Vicnet
State Library of Victoria
Australia

[EMAIL PROTECTED]

Ph: +61-3-8664-7001
Fax: +61-3-9639-2175

http://home.vicnet.net.au/~andrewc/
http://www.openroad.net.au/





Re: glyph selection for Unicode in browsers

2002-09-26 Thread Mark Davis

 not to replace one broken model (code page = language) with
 another broken model (language = font preference).

I would add that, given the number of documents that fail to tag with
language or, worse yet, tag with the wrong language, I suspect other
approaches may give generally better results. The main area of concern
is CJK, and I suspect that in a great many cases the user is probably
better off either:

- simply using a font set according to the user's own preference, or
- having a bit of smarts in the program for heuristically picking
among C, J and K.

Mark
__
http://www.macchiato.com
◄  “Eppur si muove” ►







Re: glyph selection for Unicode in browsers

2002-09-26 Thread jameskass

Tex Texin wrote,

 James, thanks as always for your reply.
 The 65K limit is ugly...

Thank you, Tex, for this fascinating thread.

Don't despair.  As Peter Constable has pointed out, the infrastructure
for having browsers support language tags is already present.  He has
also provided a great outline/overview of just what is required.

IMO, we are close to many improvements in this regard.  Paul Nelson of
the Microsoft Typography Group is doing a tremendous job with respect
to OpenType technology and the Uniscribe engine.  Other platforms are
also making great strides forward.

Once the font specs for all this are set and fonts are released with
the necessary coverage and the shaping engines can access all of this,
the browsers are sure to quickly add support, too.  It's all somewhat
interrelated.

Unicode is the best way to go, because Unicode is all about character
encoding.  Shucks, I guess I don't have to tell you about the
benefits of Unicode, eh?

Best regards,

James Kass.




Re: glyph selection for Unicode in browsers

2002-09-26 Thread Tex Texin

Mark,

My preference is that tagged information should display as tagged and
the user can do something specifically to override it if they want.
But then, I can't read CJK and so would be glad to get comments from
those communities. I can see arguments both for and against user
preference to take precedence over tags.

Where there is no language information in the document, it makes sense
to have user preference or heuristics attempt to supply the information.
Where the tag is clearly inappropriate (for example, text labeled as
English that is clearly Chinese), sure, override the tag.
Where the tag is wrong but difficult to detect (Traditional vs.
Simplified), too bad: the author gets what he deserves.

Also, heuristics work well with longer runs of text, but not for shorter
runs. (names and addresses, quotations, etc.)

From an implementation standpoint, once you have the ability for
language to influence font selection, the significant part is done.
Determining which language to use, from a tag, or user preference, or
heuristic, is the easy part.
I wouldn't have a problem with some precedence rules over which to use,
or even some negotiation where the text clearly belongs to a script, and
the language influence of tag, user preference or heuristic is limited
to whether their recommendation is appropriate for the script.
(Hopefully the heuristic is always in line with the script.)
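
One possible reading of the precedence-plus-negotiation idea above, as a minimal Python sketch. The table of languages per script and the function name are hypothetical, and the ordering (tag, then user preference, then heuristic) is just one of the orderings discussed, not a settled rule.

```python
from typing import Optional

# Hypothetical table of which language tags are plausible for a script.
LANGS_FOR_SCRIPT = {
    "Han": {"zh-CN", "zh-TW", "ja", "ko"},
    "Latin": {"en", "fr", "de"},
}


def resolve_language(script: str,
                     tag: Optional[str] = None,
                     user_pref: Optional[str] = None,
                     heuristic: Optional[str] = None) -> Optional[str]:
    """Pick a language for font selection.

    Candidates are tried in precedence order (document tag, user
    preference, heuristic); a candidate is accepted only if it is
    appropriate for the script the text clearly belongs to.
    """
    allowed = LANGS_FOR_SCRIPT.get(script, set())
    for candidate in (tag, user_pref, heuristic):
        if candidate is not None and candidate in allowed:
            return candidate
    return None
```

With this sketch, Han text mislabeled as `en` falls through to the user's preference, while a plausible but wrong tag (say `zh-TW` on Simplified text) is still honored, which matches the "author gets what he deserves" case.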

I do need to point out that user preference is problematic if it means
that for a user to display a multilingual document, the user has to go
through and specify font preferences for languages they know nothing about.
Just because I don't read CJK, doesn't mean I don't have legitimate
needs to display or print CJK in a typographically correct way.
Librarians, Commerce exchanges, mailing lists, localizers, etc.

But although you didn't quite say this, a user could provide a
preference not for font, but language, i.e. if the script is CJK,
display it as C or J or K (or T). And given the language the font
mechanisms would do a reasonable thing.


tex


RE: glyph selection for Unicode in browsers

2002-09-25 Thread jarkko.hietaniemi

I would be happy if just this

<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

would be enough to convince the browsers that the page is in UTF-8...
It isn't if the HTTP server claims that the pages it serves are in
ISO 8859-1.  A sample of this is http://www.iki.fi/jhi/jp_utf8.html,
it does have the meta charset, but since the webserver (www.hut.fi,
really, a server outside of my control) thinks it's serving Latin 1,
I cannot help the wrong result.  (I guess some browsers might do a better
job of sniffing the content of the page, but at least IE6 and Opera 6.05
on Win32 seem to believe the server rather than the (HTML of the) page.)
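
The behaviour described above (the HTTP header wins, the meta element is only consulted when the header carries no charset) can be sketched as follows. This is a rough illustration with hypothetical function names; real browsers also handle BOMs, user overrides and content sniffing, none of which is modelled here.

```python
import re
from typing import Optional

# Matches a <meta http-equiv="Content-Type" content="..."> element and
# captures the value of its content attribute.
META_RE = re.compile(
    r'<meta[^>]+http-equiv\s*=\s*["\']?content-type["\']?'
    r'[^>]*?content\s*=\s*["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def charset_from_content_type(value: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, if any."""
    if not value:
        return None
    match = re.search(r'charset\s*=\s*"?([^\s;"]+)', value, re.IGNORECASE)
    return match.group(1).lower() if match else None


def effective_charset(http_content_type: Optional[str],
                      html: str) -> Optional[str]:
    """Return the charset a header-first browser would use."""
    from_header = charset_from_content_type(http_content_type)
    if from_header:
        return from_header  # the server's claim takes precedence
    meta = META_RE.search(html)
    return charset_from_content_type(meta.group(1)) if meta else None
```

For a page carrying the meta element shown earlier, a server header of `text/html; charset=ISO-8859-1` yields `iso-8859-1`, whereas a bare `text/html` header lets the page's own `utf-8` declaration through.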




RE: glyph selection for Unicode in browsers

2002-09-25 Thread jarkko.hietaniemi

 You would be happy, but others might not- the standard specifically says
 that the http charset takes precedence.
 http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2

Yup.  I guess I could argue both ways.  The server admins want control;
the users want control, the latter lose :-)

 However, what you say about user control of web server facilities being
 up to the administrator and not the page's author is true.
 Some of the servers allow users some control through directory-based
 files.

 I can send you a sample .htaccess file privately, if it will be of use
 to you.

Please.




Re: glyph selection for Unicode in browsers

2002-09-25 Thread Tex Texin

Done.

I almost forgot, I have a web page that also describes how to use
.htaccess with Apache.
See tip #1 in:
http://www.i18nguy.com/markup/serving.html
tex






Re: glyph selection for Unicode in browsers

2002-09-25 Thread jameskass


Tex Texin wrote,

...
 However, I am finding that browsers are not supporting this in a way
 that is useful for Unicode.
 
 What has been working so far is that the browsers can associate
 different fonts with different languages. So I might use a Japanese font
 such as Mincho for Japanese text and another font for Chinese text.
 However, now that there are Unicode fonts, if I assign a Unicode font
 such as Arial Unicode MS, or CODE2000, to all languages, then I see the
 same glyph for a character, regardless of the lang assignment.
 
 I would like to understand why this is. (Bear in mind, I don't know much
 more than the rudiments of font technology.)
 
 a) Do Unicode fonts include the language-based glyph variants of
 characters, so that a display system is capable of identifying or
 hinting which glyph should be used in a particular scenario?
...

OpenType allows for substitution of language-specific glyphs and many
script and language tags are already registered.

However, the last time I checked (quite recently), the Uniscribe engine
only implements one language tag per script.

OpenType is still nascent and tremendous strides have been made within
the past few years.  Once implementations do allow for multiple language
based substitutions under a single script tag, there should be much
improvement in browser display.  (As long as the fonts get updated, too!)

Meanwhile, the workable approach seems to remain assigning specific
fonts in the style declaration.

Best regards,

James Kass.




Re: glyph selection for Unicode in browsers

2002-09-25 Thread Tex Texin

Thanks James.

Which registry are you referring to for script and language tags?
Is this in the context of glyphs or do you just mean the IANA language
tag registry?

Given the (un)workable approach, do you then intend to have variants of
code2000 for CJKT, so one can make the appropriate assignments? (ugh!)

Also, this approach means I have to ask each Unicode font vendor, "Which
language is your multilingual font designed for?"
so I know which CJKT assignment is appropriate for that font...

(I hope this doesn't read like I am attacking you, I am not. I am just
trying to highlight the difficulty I am having with this.)

tex






RE: glyph selection for Unicode in browsers

2002-09-25 Thread jarkko.hietaniemi

*sigh*  Time for me to call it a day and go home, it seems.  Opera 6.05/Win32
does *not* get it right if you have it on View > Encoding > Automatic detection.

Why I was fooled in the below message was that the Encoding setting seems to
stick even if I exit and restart Opera, that's why my test page seemed to be
working.  If I turn it back to autodetect, it doesn't autodetect the UTF-8-ness.

(If nothing else this bumbling saga of mine illustrates how difficult it still
is to get all this just to work.)

-Original Message-
From: Hietaniemi Jarkko (NRC/Boston) 
Sent: 25 September, 2002 04:56 PM
To: Hietaniemi Jarkko (NRC/Boston); 'ext Tex Texin'; 'WWW
International'; 'Unicoders'
Subject: RE: glyph selection for Unicode in browsers


 I cannot help the wrong result.  (I guess some browsers might do a better
 job of sniffing the content of the page, but at least IE6 and Opera 6.05
 on Win32 seem to believe the server rather than the (HTML of the) page.)

After some experimentation it seems that I blamed Opera 6.05/Win32 wrongly;
it guesses the charset right.  But as pointed out by Tex, HTTP/HTML charset
ponderings are probably not a Unicode issue as such, they are more a WWW issue,
so sorry about the slight off-topicness.







Re: glyph selection for Unicode in browsers

2002-09-25 Thread Peter_Constable


On 09/25/2002 01:51:28 PM Tex Texin wrote:

a) Do Unicode fonts include the language-based glyph variants of
characters, so that a display system is capable of identifying or
hinting which glyph should be used in a particular scenario?

They *can*, and some do. When this is the case, then there needs to be some
mechanism to modify the relationship between sequences of characters and
sequences of glyphs to arrive at the particular glyphs intended for the
given language. In general terms, the same kinds of mechanisms that can be
used for rendering complex scripts can also be used here -- it's a glyph
substitution, comparable to substituting an initial or final form of an
Arabic character. Of course, there is a different triggering condition
involved in these situations than in the case of a complex script such as
Arabic: in the complex-script situation, the triggers are the character
context (e.g. preceded by non-word-forming character and followed by
word-forming character), whereas here the trigger is a metadata tag.

Let's consider how this would be dealt with in terms of implementation,
using OpenType as an example. The OpenType font format provides means for
storing different glyph-transformation rules according to "language". (1)
The question is, then, what does it take for the rendering process to make
use of one set of language-specific rules rather than another, or rather
than a set of default rules (OT allows the font developer to specify a
default). In OpenType, glyph-transformation rules are grouped by
features, and a set of rules will be applied when the associated feature
has been activated. (Thus, in OT text layout, what's processed is a
feature-marked-up string of characters.) This applies to the language
distinctions as well: the desired language must be specified in the
input, otherwise the default rules will apply. (2) The idea is that
application software must determine what features are activated at what
point.
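
To make the lookup order concrete, here is a toy Python model of the idea that rules are stored per script and per language, with a default set used when no language is specified. The table, the glyph names and the function are all hypothetical; this is not the OpenType table format or any real layout API, only a sketch of the selection logic.

```python
from typing import Optional

# Hypothetical, heavily simplified stand-in for a font's substitution data:
# script tag -> language tag -> {character: glyph name}.
FONT_RULES = {
    "hani": {
        "dflt": {"U+76F4": "glyph_76F4_default"},
        "JAN ": {"U+76F4": "glyph_76F4_ja"},
        "ZHT ": {"U+76F4": "glyph_76F4_zht"},
    },
}


def select_glyph(script: str, codepoint: str,
                 langsys: Optional[str] = None) -> Optional[str]:
    """Pick a glyph using language-specific rules when they exist.

    If no language is given, or the font has no rules for it, the
    font's default rules for the script apply.
    """
    by_lang = FONT_RULES.get(script, {})
    rules = by_lang.get(langsys) or by_lang.get("dflt", {})
    return rules.get(codepoint)
```

The point of the sketch is the caller's obligation: unless something upstream passes `langsys`, every request ends up with the default glyph, however carefully the document was tagged.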

Now, hardly any software gets written to interact directly with the
OpenType layout engine. Instead, higher-level text layout libraries have
been written that wrap the OpenType functionality. Uniscribe is one
example; indeed, in Win32 on Windows 2000 and later, there is even another
layer, since the standard text-drawing functions (TextOut and ExtTextOut)
wrap Uniscribe's functionality. Other examples of libraries that wrap up the
OT interface and expose a higher-level interface include Adobe's CoolType
engine (not a published interface, that I know of), ICU, Pango and Sun's
recent Standard Type Services Framework project.

So, at the OT interface, a language tag (3) has to be specified in order
to get language-specific glyphs. But apps generally don't write to that
interface (for good reason); they usually write to a higher interface. The
crux of the issue is that none of the higher-level interfaces, that I know
of, yet provide any mechanism for the app to specify a language tag. (4)
Hence, the building blocks are there, but more infrastructure is still
needed. Note that there's a bit more involved than simply rewriting
higher-level APIs to expose a way to specify OT features. In particular, a
critical issue has to do with the relationship between OpenType's
language tags, and whatever system of language or locale tagging
might be used elsewhere in a given platform.

I've described the situation in terms of OpenType. Neither AAT nor Graphite
provides exactly the same kind of mechanism for providing different glyph
transformations for different languages, though I believe some
consideration has been given to possibilities for both technologies. Both
use feature mechanisms, so can certainly do what you're looking for; but
neither has defined features specifically related to
languages, let alone decided how these should be handled in terms of
APIs. It would be possible to implement an AAT or Graphite font that used a
feature to get at language-specific glyphs, and apps that exposed a
user-interface for setting AAT or Graphite features (5) would offer the
user a way to control this. But there would not be any automation whereby
an app would specify this based on other language or locale tagging.


Notes:

(1) I put "language" in quotation marks since it has not really been
adequately worked out what these distinctions are; I think these are
probably groups of writing systems.

(2) OpenType glyph-transformation rules are organised hierarchically, first
by script, then by language, and then according to the other features they
are associated with.

(3) OpenType's language tags have no specified relationship with ISO 639,
RFC 3066 or any other system of language tags.

(4) The same issue applies to OpenType features that pertain to optional
aspects of typography and rendering that are up to the user's discretion
rather than being obligatory behaviour for a script. For instance, there is
an OpenType feature for selecting small cap forms, which a font developer
can use to provide 

RE: glyph selection for Unicode in browsers

2002-09-25 Thread jarkko.hietaniemi

 I cannot help the wrong result.  (I guess some browsers might do a better
 job of sniffing the content of the page, but at least IE6 and Opera 6.05
 on Win32 seem to believe the server rather than the (HTML of the) page.)

After some experimentation it seems that I blamed Opera 6.05/Win32 wrongly;
it guesses the charset right.  But as pointed out by Tex, HTTP/HTML charset
ponderings are probably not a Unicode issue as such, they are more a WWW issue,
so sorry about the slight off-topicness.







Re: glyph selection for Unicode in browsers

2002-09-25 Thread Peter_Constable


On 09/25/2002 03:34:00 PM Tex Texin wrote:

Thanks James.

Which registry are you referring to for script and language tags?
Is this in the context of glyphs or do you just mean the IANA language
tag registry?

The OpenType script and language tags are specific to OpenType. As I
mentioned in my previous message, one of the problems yet to be solved is
how to associate OT language tags with the kind of things used for
metadata, e.g. RFC 3066 (and also determining whether resolving those
associations is the responsibility of the app, of a higher-level layout
engine, or of the OpenType layout engine), and it hasn't even been worked
out yet (IMO) just what the OT language tags are.



Given the (un)workable approach, do you then intend to have variants of
code2000 for CJKT, so one can make the appropriate assignments? (ugh!)

Also, this approach means I have to ask each Unicode font vendor, Which
language is your multilingual font designed for?
so I know which CJKT assignment is appropriate for that font...

Unfortunately, that's where we're stuck for the time being. I wish it were
otherwise, since we're in the process of coming up with new Latin /
Cyrillic fonts for our users throughout the world, and there are various
Latin characters for which different glyphs are preferred in different
language communities. And the variations for one character don't
necessarily correlate with those for another, so you get lots of possible
combinations needed -- which would make it a pain to come up with a bunch
of language-specific fonts. For now, we're going to give them the ability
to select alternate glyphs via Graphite features,* but they'll only be able
to use that in Graphite-enabled apps -- it won't work in Word!

*Since our software tools are intended for use by linguists working in
hundreds of languages / writing systems for which there is no support in
commercial software platforms, we have for a long time provided mechanisms
to specify writing-system-specific behaviours, such as sorting or character
properties determining basic things like word-boundary detection and line
breaking. In our new tools that support Graphite, there's an ability for
the linguist setting up a system for their writing system to specify what
features should be active by default for their writing system.  This gives
us an interim mechanism to handle language-specific typography
requirements.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]







Re: glyph selection for Unicode in browsers

2002-09-25 Thread jameskass


Tex Texin wrote,

 Which registry are you referring to for script and language tags?
 Is this in the context of glyphs or do you just mean the IANA language
 tag registry?

As Peter Constable already noted, in this case registered only means
registered as an OpenType tag.  More info about this can be found on
Adobe's page:
http://partners.adobe.com/asn/developer/opentype/appendices/ttoreg.html

 
 Given the (un)workable approach, do you then intend to have variants of
 code2000 for CJKT, so one can make the appropriate assignments? (ugh!)


Code2000's coverage of CJKTV ideographs isn't adequate to support any language
yet.  Eventually and hopefully the repertoire will be completed.  Given the
current ceiling of 65536 max glyphs per font, it might not be feasible to
try to have one font cover all scripts and variants, but time will tell.
 
 Also, this approach means I have to ask each Unicode font vendor, Which
 language is your multilingual font designed for?
 so I know which CJKT assignment is appropriate for that font...
 

Sad but true.  On a happier note, most Japanese users will already have a
Japanese font set as default, Chinese users will have a Chinese (Simp or
Trad) font installed, and so forth.  Still, when you're trying to publish
a multilingual page which can be properly displayed anywhere, this isn't
much consolation.

 (I hope this doesn't read like I am attacking you, I am not. I am just
 trying to highlight the difficulty I am having with this.)

You are not alone...

Best regards,

James Kass.




Re: glyph selection for Unicode in browsers

2002-09-25 Thread Tex Texin

James, thanks as always for your reply.
The 65K limit is ugly...

With respect to the CJKT comment below, I guess it is true because of
a catch-22.

For example, I set my browser to default to a Unicode font. I think
everyone would if they could; it's a knee-jerk response if the solution
is adequate everywhere. You don't have to know which fonts work for
which languages.
For the Americas and Europe, users can easily just set a Unicode font.

However, a Japanese user might have to choose a Japanese font, if the
Unicode font does not favor (and cannot be made to favor with language
tags) Japanese renderings.
So it's a catch-22. They have native fonts because Unicode fonts are
inadequate, but we can be relieved that although Unicode fonts are
inadequate, we are lucky the users don't use them.

ugh!

So where the differences are important, users are forced to select
native fonts instead of Unicode fonts. This then creates the difficulty
that to view a multilingual page, you need to a) acquire specialized
fonts (tedious and costly, perhaps), b) install them, c) assign them, and
d) finally view the page.

Sadder still:
Content developers that want to use Unicode:
a) can invest a lot of time in declaring lang around sections of text,
and really get no bang for it at the moment. In truth browsers do very
little with this information as far as I can tell. (I suspect it helps
search engines, but I need to test that assumption more).

b) It is actually more beneficial to use native code pages than Unicode,
since the browsers seem to do a better job of font selection here. (I
need to test this statement more. However, from my own coding experience
on windows, knowing the code page allows easy setting of the script
for the font, which has a major influence on Windows font selection. The
language information wouldn't be available so easily for a Unicode file
without it being carefully designed in to be passed from the markup
layers down to the primitive font selection layers.)

To be fair, I think font coverage for Unicode has been steadily
improving and it is much easier today to produce multilingual docs than
in the past. But I am disappointed in the state of the art for browsers,
and I suspect it is also true for other products that are not
professional publishing software of one kind or another. I suspect at
the heart of the problem is that rendering architecture has not carried
language (as opposed to code page) to the primitive layers, and this
needs to be addressed throughout the architecture, since the language
information can no longer be deduced or presumed when the encoding is
Unicode.

Whatever the reason, this needs to be fixed a) so Unicode can be
recommended as best practice and b) documents are rendered with
appropriate glyphs, without extraordinary effort by users.

tex


[EMAIL PROTECTED] wrote:
  Also, this approach means I have to ask each Unicode font vendor, Which
  language is your multilingual font designed for?
  so I know which CJKT assignment is appropriate for that font...
 
 
 Sad but true.  On a happier note, most Japanese users will already have a
 Japanese font set as default, Chinese users will have a Chinese (Simp or
 Trad) font installed, and so forth.  Still, when you're trying to publish
 a multilingual page which can be properly displayed anywhere, this isn't
 much consolation.

 Best regards,
 
 James Kass.
