Re: entities with breve

2002-09-25 Thread Michael Everson

At 10:04 -0500 2002-09-23, [EMAIL PROTECTED] wrote:
On 09/23/2002 07:50:04 AM PRANI6 wrote:

I wonder how to encode two entities with one breve or inverted breve below,
for example k+s, or p+f. Are there characters for half breves left and right
or something like that?

The answer would be to encode characters comparable to U+0361. A combining
double breve has already been approved for version 4.0. I intend to propose
(unless someone gets around to it before me) a combining double inverted
breve below.

Double INVERTED breve below?
-- 
Michael Everson * * Everson Typography *  * http://www.evertype.com
48B Gleann na Carraige; Cill Fhionntain; Baile Átha Cliath 13; Éire
Telephone +353 86 807 9169 * * Fax +353 1 832 2189 (by arrangement)




Keys. (derives from Re: Sequences of combining characters.)

2002-09-25 Thread William Overington

The recent discussion on sequences has led me to have a look through the
various combining characters and I have found the following.

U+20E3 COMBINING ENCLOSING KEYCAP

It has occurred to me that the use of a sequence of a base character, then
one or more combining characters so as to produce a sequence which would be
otherwise unlikely, followed by U+20E3 might be a very effective way to
include specialised markup systems within a plain text file without
disrupting the normal textual information conveying capabilities of a file.
An all-Unicode font would then produce a graphic representation of the key,
without any prior arrangement being necessary, so that such marked-up
sequences could be produced using just a regular all-Unicode plain text
editor.  A receiving program with a specialized plug-in could then decode
the markup, or it could be decoded manually in some cases.

For example, I am looking at using the following sequence so as to produce a
special purpose key within documents.

U+2604 U+0302 U+20E3

Hopefully that sequence will be so unlikely to occur other than in my
specialised application that the sequence can be used uniquely for that
specialised application.

I am also thinking in terms of using the following sequence to indicate the
end of the markup sequence.

U+2604 U+0302 U+20E2

I have it in mind that characters in the range U+2460 through to U+2473
could be used before parameters within the markup system.
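
As a rough illustration of the scheme sketched above, the following Python
fragment builds the two key sequences exactly as listed and pulls out whatever
lies between them. The function names (describe, extract_markup) and the sample
payload are assumptions made for this sketch; nothing here is an established
markup system.

```python
import unicodedata

# Start and end keys exactly as given in the message.
START_KEY = "\u2604\u0302\u20E3"
END_KEY = "\u2604\u0302\u20E2"

def describe(seq):
    """Return (code point, general category, name) for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.category(c), unicodedata.name(c))
            for c in seq]

def extract_markup(text):
    """Yield the text found between each START_KEY and the next END_KEY."""
    pos = 0
    while True:
        start = text.find(START_KEY, pos)
        if start == -1:
            return
        body_start = start + len(START_KEY)
        end = text.find(END_KEY, body_start)
        if end == -1:
            return  # unterminated markup is ignored in this sketch
        yield text[body_start:end]
        pos = end + len(END_KEY)

# Hypothetical payload: one circled digit (U+2460) followed by a parameter.
sample = "plain " + START_KEY + "\u2460bold" + END_KEY + " text"
payloads = list(extract_markup(sample))
```

Calling describe(START_KEY) lists the base character and the two combining
marks, which makes it easy to check that the sequence is what was intended.
A receiving program would still have to agree on what the payload means.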



Also, I have noticed that in the document U02D0.pdf that U+20E4 is shown, in
the listing, in magenta whereas U+20DF is shown in black.  Could someone say
what significance the magenta colouring in the document has please?  Is it
perhaps to indicate additions since the previous issue of the document?

William Overington

25 September 2002








Re: entities with breve

2002-09-25 Thread William Overington

Peter Constable wrote as follows.

The answer would be to encode characters comparable to U+0361. A combining
double breve has already been approved for version 4.0. I intend to propose
(unless someone gets around to it before me) a combining double inverted
breve below.

In the meantime, one can encode these as PUA characters (which is an
interim solution we're going to be using, at least for some purposes).

Could you please say some more about what is going to be encoded in regular
Unicode, and at which code points?

Could you also say more about encoding these characters in the Private Use
Area: which code points do you intend to use, and could encoding a combining
accent character or a combining double in the Private Use Area cause problems
with a rendering system recognizing the character as a combining character?
(I have no specific reason to think that it would; it is just a possibility
that occurred to me when considering various uses of the Private Use Area.)

William Overington

25 September 2002







Re: entities with breve

2002-09-25 Thread Peter_Constable


On 09/25/2002 01:54:25 AM Michael Everson wrote:

The answer would be to encode characters comparable to U+0361. A combining
double breve has already been approved for version 4.0. I intend to propose
(unless someone gets around to it before me) a combining double inverted
breve below.

Double INVERTED breve below?

My mistake: COMBINING DOUBLE BREVE BELOW.
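
For illustration only: assuming a Python build whose Unicode database already
includes the character named above (it is looked up by name, so no code point
is hard-coded), a double diacritic is stored between the two base letters it
spans. The helper name tie_below is hypothetical.

```python
import unicodedata

# Look the character up by name; this raises KeyError if the running
# Python's Unicode database does not know the character.
DOUBLE_BREVE_BELOW = unicodedata.lookup("COMBINING DOUBLE BREVE BELOW")

def tie_below(first, second):
    """Join two base letters with one double breve below.

    The double diacritic follows the first base character and
    precedes the second.
    """
    return first + DOUBLE_BREVE_BELOW + second

ks = tie_below("k", "s")
pf = tie_below("p", "f")
```

Whether the mark is actually drawn across both letters depends on the font and
the rendering system, as with any combining sequence.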



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]







Re: entities with breve

2002-09-25 Thread Peter_Constable


On 09/25/2002 04:04:40 AM William Overington wrote:

Could you please say some more about what is going to be encoded in regular
Unicode and with which code points please?

You can look at http://www.unicode.org/unicode/alloc/Pipeline.html to see
what's in the pipeline, but note that code points are not yet definite.
There will be a beta period, beginning in January I believe.



In relation to your encoding these characters as Private Use Area
characters, I wonder if you could please say some more about this please,
both in relation to which code points you are intending to use

Our choice of code points is relevant only for our users and those with whom
we might interchange data. Once we have implementations, such info will be
available either with those implementations or on our Web site for the
benefit of those using those implementations. I don't see any reason to
discuss this on this list.



and also as to whether encoding a combining accent character or a combining
double into the Private Use Area could lead potentially to any problems over a
rendering system recognizing the character as being a combining character
(please know that I have no specific reason to think that it would, it is just
a possibility about which I wondered when considering various uses of the
Private Use Area).

I can't comment about rendering systems in general: some may have issues
that others do not have.



- Peter









Re: Keys. (derives from Re: Sequences of combining characters.)

2002-09-25 Thread Doug Ewell

William Overington WOverington at ngo dot globalnet dot co dot uk
wrote:

 Also, I have noticed that in the document U02D0.pdf

(actually U20D0.pdf)

 that U+20E4 is
 shown, in the listing, in magenta whereas U+20DF is shown in black.
 Could someone say what significance the magenta colouring in the
 document has please?  Is it perhaps to indicate additions since the
 previous issue of the document?

Since the previous release of Unicode.  The magenta characters are those
added in Unicode 3.2.  They were marked specially in the draft copies of
the code charts to indicate the changes (and probably to highlight the
fact that the assignments were still tentative), and left that way after
3.2 went live.  Whether this was intentional or not, I don't know.

-Doug Ewell
 Fullerton, California





RE: Keys. (derives from Re: Sequences of combining characters.)

2002-09-25 Thread Marco Cimarosti

William Overington wrote:
 The recent discussion on sequences has led me to have a look 
 through the various combining characters and I have found
 the following.
 
 U+20E3 COMBINING ENCLOSING KEYCAP
 
 It has occurred to me that the use of a sequence of a base 
 character, then one or more combining characters so as to
 produce a sequence which would be otherwise unlikely,
 followed by U+20E3 might be a very effective way to
 include specialised markup systems within a plain text
 file [...]

What the hell do key caps have to do with mark up or text files!!??

Mr. Overington, why do you have this irresistible compulsion to mix up
apples and horses? (I feel that the usual apples and oranges is not enough
to convey the idea fully.)

Regards.

_ Marco




Re: no replies

2002-09-25 Thread Barry Caplan

Roslyn,

I will head off trouble for you, because your message is otherwise likely to be 
ignored or semi-flamed.

The best place to get information on compiling and configuring PHP is a PHP support 
or developer list. There should be information on how to subscribe to such lists on the 
PHP home page, php.net.

Another great source of answers, which I use at least 10 times a day with a 90%+ 
success rate, is searching on related keywords at google.com and groups.google.com. 
OTTOMH, in your case I would try searching for php enable-mbstring in those places and 
see what you find.

This list is for questions related to Unicode. That is probably why no one has replied 
previously. Few if any people here are PHP developers, and even fewer are going to be 
versed in the details of configuring and compiling PHP.

Hope this helps!

Barry Caplan
www.i18n.com


At 04:35 AM 9/24/2002 -0700, you wrote:

aaah finally, one reply to that question!! thank you BOB. anyways, could anyone tell 
me how i can recompile php to include mbstring support? i used the ./configure 
--enable-mbstring option, did the make install, etc., but i still can't seem to run any 
of the mbstring functions in my php code; i get fatal error: call to undefined 
function mb_(whatever)... could anyone pls assist me here? thanks 

regards, 

roslyn 





Small 's' with grave?

2002-09-25 Thread James E. Agenbroad

   Wednesday, September 25, 2002
A friend of a friend asked me if Unicode has a code for small s with a
grave.  I can't find one; am I overlooking it?  Has it been added
since 3.0? Thanks in advance.   

 Regards,
  Jim Agenbroad ( [EMAIL PROTECTED] )
 It is not true that people stop pursuing their dreams because they
grow old, they grow old because they stop pursuing their dreams. Adapted
from a letter by Gabriel Garcia Marquez.
 The above are purely personal opinions, not necessarily the official
views of any government or any agency of any.
 Addresses: Office: Phone: 202 707-9612; Fax: 202 707-0955; US
mail: I.T.S. Sys.Dev.Gp.4, Library of Congress, 101 Independence Ave. SE, 
Washington, D.C. 20540-9334 U.S.A.
Home: Phone: 301 946-7326; US mail: Box 291, Garrett Park, MD 20896.  





Old Hungarian?

2002-09-25 Thread jarkko.hietaniemi

 You can look at http://www.unicode.org/unicode/alloc/Pipeline.html to see
 what's in the pipeline, but note that code points are not yet definite.
 There will be a beta period, beginning in January I believe.

Whatever happened to Old Hungarian, aka  Hungarian Runic, aka rovasiras?
(sorry for missing diacritics) I can see a proposal by Mr Everson
back in 1998 (http://wwwold.dkuug.dk/JTC1/SC2/WG2/docs/n1686/n1686.htm,
http://www.dkuug.dk/jtc1/sc2/wg2/docs/n1758.pdf) but I cannot see it in
the above pipeline.





glyph selection for Unicode in browsers

2002-09-25 Thread Tex Texin

Declaring lang for text should help a browser display the text more
appropriately for the specified language
(e.g. <span lang="es">Hola</span>).

It seems especially appropriate for Unicode text, since an Asian
character may have very different display requirements in different
languages (CJKT), and the Han unification brought many of these glyph
variants together.

However, I am finding that browsers are not supporting this in a way
that is useful for Unicode.

What has been working so far is that the browsers can associate
different fonts with different languages. So I might use a Japanese font
such as Mincho for Japanese text and another font for Chinese text.
However, now that there are Unicode fonts, if I assign a Unicode font
such as Arial Unicode MS, or CODE2000, to all languages, then I see the
same glyph for a character, regardless of the lang assignment.

I would like to understand why this is. (Bear in mind, I don't know much
more than the rudiments of font technology.)

a) Do Unicode fonts include the language-based glyph variants of
characters, so that a display system is capable of identifying or
hinting which glyph should be used in a particular scenario?

b) If the above is possible, then I assume the browsers have not
implemented language-based selection yet. Are any browsers moving to
using the appropriate glyphs based on language without depending on each
language being assigned a different font?

c) If the above is not possible, then configuring browsers for Unicode
usage is greatly complicated by the need to have a lengthy list of fonts
assigned to different languages. Is there an alternative approach that
can be used, so users can easily view Unicode text and get the correct
display while using a single Unicode font?

Ideally (to my mind) I should be able to create web pages in Unicode,
with appropriate lang declarations and get reasonable displays on
systems where a user does not do much more than have available a Unicode
font. However, this does not seem to be the case at the moment.

If it will help I can post some test pages I have been using where I
take a string of characters and repeat them with different lang
assignments. The text looks the same unless I choose to assign different
fonts to each language in the browser preferences.
The examples are trivial so I haven't bothered to post them.

I would be glad to learn if there is another approach which is easy for
users to configure, that gives appropriate text rendering.


-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




RE: glyph selection for Unicode in browsers

2002-09-25 Thread jarkko.hietaniemi

I would be happy if just this

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

were enough to convince the browsers that the page is in UTF-8...
It isn't if the HTTP server claims that the pages it serves are in
ISO 8859-1.  A sample of this is http://www.iki.fi/jhi/jp_utf8.html:
it does have the meta charset, but since the web server (www.hut.fi,
really, a server outside of my control) thinks it's serving Latin 1,
I cannot help the wrong result.  (I guess some browsers might do better
work at sniffing the content of the page, but at least IE6 and Opera 6.05
on Win32 seem to believe the server rather than the (HTML of the) page.)
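
The behaviour described above, where the charset in the server's HTTP header
wins over the meta declaration in the page, can be sketched in Python as
follows. This is a minimal sketch, not how any particular browser is
implemented: the function names are made up, the regular expressions are
deliberately loose, and the final fallback value is an assumption.

```python
import re

def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type value, or None."""
    match = re.search(r"charset\s*=\s*[\"']?([\w.:-]+)", value or "",
                      re.IGNORECASE)
    return match.group(1).lower() if match else None

def charset_from_meta(html):
    """Return the charset of a <meta http-equiv="Content-Type"> tag, or None."""
    for tag in re.findall(r"<meta\b[^>]*>", html, re.IGNORECASE):
        if re.search(r"http-equiv\s*=\s*[\"']?content-type", tag,
                     re.IGNORECASE):
            return charset_from_content_type(tag)
    return None

def effective_charset(http_content_type, html, fallback="iso-8859-1"):
    """HTTP header first, then the meta element, then an assumed fallback."""
    return (charset_from_content_type(http_content_type)
            or charset_from_meta(html)
            or fallback)

page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
served_as_latin1 = effective_charset("text/html; charset=ISO-8859-1", page)
served_without_charset = effective_charset("text/html", page)
```

With the header claiming ISO-8859-1 the meta declaration is never consulted,
which matches the symptom reported for the sample page; only when the server
sends no charset does the page's own declaration take effect.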




Re: Small 's' with grave?

2002-09-25 Thread starner

A friend of a friend asked me if Unicode has a code for small s with a
grave.  

U+0073 U+0300

Has it been added
since 3.0? Thanks in advance.   

Afaik, there have not been, and will not be, any new precomposed characters since Unicode 3.0.
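
A quick way to check this kind of question is to normalize the combining
sequence to NFC and see whether it collapses into a single code point. The
sketch below uses Python's standard unicodedata module; the variable names
are only illustrative, and e with grave is included purely as a contrast.

```python
import unicodedata

s_grave = "\u0073\u0300"   # LATIN SMALL LETTER S + COMBINING GRAVE ACCENT
e_grave = "\u0065\u0300"   # same accent on e, which has a precomposed form

nfc_s = unicodedata.normalize("NFC", s_grave)
nfc_e = unicodedata.normalize("NFC", e_grave)

# A sequence with no precomposed equivalent stays as two code points.
s_is_precomposed = len(nfc_s) == 1
e_is_precomposed = len(nfc_e) == 1
```

Here s_is_precomposed comes out False and e_is_precomposed True, so the
two-character sequence is the way to represent s with grave.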




RE: glyph selection for Unicode in browsers

2002-09-25 Thread jarkko.hietaniemi

 You would be happy, but others might not- the standard specifically says
 that the http charset takes precedence.
 http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2

Yup.  I guess I could argue both ways.  The server admins want control;
the users want control, the latter lose :-)

 However, what you say about user control of web server facilities being
 up to the administrator and not the page's author is true.
 Some of the servers allow users some control through directory-based
 files.

 I can send you a sample .htaccess file privately, if it will be of use
 to you.

Please.




Re: glyph selection for Unicode in browsers

2002-09-25 Thread Tex Texin

Done.

I almost forgot, I have a web page that also describes how to use
.htaccess with Apache.
See tip #1 in:
http://www.i18nguy.com/markup/serving.html
tex






Re: glyph selection for Unicode in browsers

2002-09-25 Thread jameskass


Tex Texin wrote,

...
 However, I am finding that browsers are not supporting this in a way
 that is useful for Unicode.
 
 What has been working so far is that the browsers can associate
 different fonts with different languages. So I might use a Japanese font
 such as Mincho for Japanese text and another font for Chinese text.
 However, now that there are Unicode fonts, if I assign a Unicode font
 such as Arial Unicode MS, or CODE2000, to all languages, then I see the
 same glyph for a character, regardless of the lang assignment.
 
 I would like to understand why this is. (Bear in mind, I don't know much
 more than the rudiments of font technology.)
 
 a) Do Unicode fonts include the language-based glyph variants of
 characters, so that a display system is capable of identifying or
 hinting which glyph should be used in a particular scenario?
...

OpenType allows for substitution of language-specific glyphs and many
script and language tags are already registered.

However, the last time I checked (quite recently), the Uniscribe engine
only implements one language tag per script.

OpenType is still nascent and tremendous strides have been made within
the past few years.  Once implementations do allow for multiple language
based substitutions under a single script tag, there should be much
improvement in browser display.  (As long as the fonts get updated, too!)

Meanwhile, the workable approach seems to remain assigning specific
fonts in the style declaration.

Best regards,

James Kass.




Re: glyph selection for Unicode in browsers

2002-09-25 Thread Tex Texin

Thanks James.

Which registry are you referring to for script and language tags?
Is this in the context of glyphs or do you just mean the IANA language
tag registry?

Given the (un)workable approach, do you then intend to have variants of
code2000 for CJKT, so one can make the appropriate assignments? (ugh!)

Also, this approach means I have to ask each Unicode font vendor, "Which
language is your multilingual font designed for?"
so I know which CJKT assignment is appropriate for that font...

(I hope this doesn't read like I am attacking you, I am not. I am just
trying to highlight the difficulty I am having with this.)

tex






RE: glyph selection for Unicode in browsers

2002-09-25 Thread jarkko.hietaniemi

*sigh*  Time for me to call it a day and go home, it seems.  Opera 6.05/Win32
does *not* get it right if you have it on View - Encoding - Automatic detection.

The reason I was fooled in the message below is that the Encoding setting seems to
stick even if I exit and restart Opera; that's why my test page seemed to be
working.  If I turn it back to autodetect, it doesn't autodetect the UTF-8-ness.

(If nothing else this bumbling saga of mine illustrates how difficult it still
is to get all this just to work.)

-----Original Message-----
From: Hietaniemi Jarkko (NRC/Boston) 
Sent: 25 September, 2002 04:56 PM
To: Hietaniemi Jarkko (NRC/Boston); 'ext Tex Texin'; 'WWW
International'; 'Unicoders'
Subject: RE: glyph selection for Unicode in browsers


 I cannot help the wrong result.  (I guess some browsers might do better
 work at sniffing the content of the page, but at least IE6 and Opera 6.05
 on Win32 seem to believe the server rather than the (HTML of the) page.)

After some experimentation it seems that I blamed Opera 6.05/Win32 wrongly:
it guesses the charset right.  But as pointed out by Tex, HTTP/HTML charset
ponderings are probably not a Unicode issue as such; they are more a WWW issue.
Sorry about the slight off-topicness.







Re: glyph selection for Unicode in browsers

2002-09-25 Thread Peter_Constable


On 09/25/2002 01:51:28 PM Tex Texin wrote:

a) Do Unicode fonts include the language-based glyph variants of
characters, so that a display system is capable of identifying or
hinting which glyph should be used in a particular scenario?

They *can*, and some do. When this is the case, then there needs to be some
mechanism to modify the relationship between sequences of characters and
sequences of glyphs to arrive at the particular glyphs intended for the
given language. In general terms, the same kinds of mechanisms that can be
used for rendering complex scripts can also be used here -- it's a glyph
substitution, comparable to substituting an initial or final form of an
Arabic character. Of course, there is a different triggering condition
involved in these situations than in the case of a complex script such as
Arabic: in the complex-script situation, the triggers are the character
context (e.g. preceded by non-word-forming character and followed by
word-forming character), whereas here the trigger is a metadata tag.

Let's consider how this would be dealt with in terms of implementation,
using OpenType as an example. The OpenType font format provides means for
storing different glyph-transformation rules according to "language". (1)
The question is, then, what does it take for the rendering process to make
use of one set of language-specific rules rather than another, or rather
than a set of default rules (OT allows the font developer to specify a
default). In OpenType, glyph-transformation rules are grouped by
features, and a set of rules will be applied when the associated feature
has been activated. (Thus, in OT text layout, what's processed is a
feature-marked-up string of characters.) This applies to the language
distinctions as well: the desired language must be specified in the
input, otherwise the default rules will apply. (2) The idea is that
application software must determine what features are activated at what
point.

Now, hardly any software gets written to interact directly with the
OpenType layout engine. Instead, higher-level text layout libraries have
been written that wrap the OpenType functionality. Uniscribe is one
example; indeed, in Win32 on Windows 2000 and later, there is even another
layer, since the standard text-drawing functions (TextOut and ExtTextOut)
wrap Uniscribe's functionality. Other examples of libraries that wrap up the
OT interface and expose a higher-level interface include Adobe's CoolType
engine (not a published interface, that I know of), ICU, Pango and Sun's
recent Standard Type Services Framework project.

So, at the OT interface, a language tag (3) has to be specified in order
to get language-specific glyphs. But apps generally don't write to that
interface (for good reason); they usually write to a higher interface. The
crux of the issue is that none of the higher-level interfaces, that I know
of, yet provide any mechanism for the app to specify a language tag. (4)
Hence, the building blocks are there, but more infrastructure is still
needed. Note that there's a bit more involved than simply re-writing
higher-level APIs to expose a way to specify OT features. In particular, a
critical issue has to do with the relationship between OpenType's
language tags, and whatever system of language or locale tagging
might be used elsewhere in a given platform.

I've described the situation in terms of OpenType. Neither AAT nor Graphite
provides exactly the same kind of mechanism for providing different glyph
transformations for different languages, though I believe some
consideration has been given to possibilities for both technologies. Both
use feature mechanisms, so can certainly do what you're looking for; but
neither has defined features specifically related to
languages, let alone decided how these should be handled in terms of
APIs. It would be possible to implement an AAT or Graphite font that used a
feature to get at language-specific glyphs, and apps that exposed a
user-interface for setting AAT or Graphite features (5) would offer the
user a way to control this. But there would not be any automation whereby
an app would specify this based on other language or locale tagging.
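
To make the lookup order concrete, here is a toy Python model of the
organisation described above: rules grouped by script, then by language, then
by feature, with the font's default rules used when no language is specified.
It is only a sketch under stated assumptions. The tag names, glyph names and
functions are invented for illustration and do not correspond to real OpenType
tables or to any layout library's API.

```python
# Hypothetical, simplified model of the hierarchy described above:
# script -> language system -> feature -> glyph substitutions.
# All tags and glyph names here are invented for illustration.
FONT_RULES = {
    "scriptA": {
        "default": {"lang_forms": {}},
        "langX": {"lang_forms": {"eng": "eng.langX"}},
    },
}

def select_rules(font_rules, script, language=None):
    """Pick the rules for a language, else the font's default rules."""
    systems = font_rules.get(script, {})
    return systems.get(language, systems.get("default", {}))

def apply_features(glyphs, font_rules, script, language=None,
                   features=("lang_forms",)):
    """Apply one-to-one substitutions for each activated feature."""
    rules = select_rules(font_rules, script, language)
    out = list(glyphs)
    for feature in features:
        mapping = rules.get(feature, {})
        out = [mapping.get(g, g) for g in out]
    return out

untagged = apply_features(["a", "eng"], FONT_RULES, "scriptA")
tagged = apply_features(["a", "eng"], FONT_RULES, "scriptA", language="langX")
```

The point of the model is the missing link: unless the caller passes a
language down to this level, only the default branch is ever taken, however
carefully the text was tagged higher up.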


Notes:

(1) I put language in quotation marks since it has not really been
adequately worked out what these distinctions are; I think these are
probably groups of writing systems.

(2) OpenType glyph-transformation rules are organised hierarchically, first
by script, then by language, and then according to the other features they
are associated with.

(3) OpenType's language tags have no specified relationship with ISO 639,
RFC 3066 or any other system of language tags.

(4) The same issue applies to OpenType features that pertain to optional
aspects of typography and rendering that are up to the user's discretion
rather than being obligatory behaviour for a script. For instance, there is
an OpenType feature for selecting small cap forms, which a font developer
can use to provide 

RE: glyph selection for Unicode in browsers

2002-09-25 Thread jarkko.hietaniemi

 I cannot help the wrong result.  (I guess some browsers might do better
 work at sniffing the content of the page, but at least IE6 and Opera 6.05
 on Win32 seem to believe the server rather than the (HTML of the) page.)

After some experimentation it seems that I blamed Opera 6.05/Win32 wrongly:
it guesses the charset right.  But as pointed out by Tex, HTTP/HTML charset
ponderings are probably not a Unicode issue as such; they are more a WWW issue.
Sorry about the slight off-topicness.







Re: glyph selection for Unicode in browsers

2002-09-25 Thread Peter_Constable


On 09/25/2002 03:34:00 PM Tex Texin wrote:

Thanks James.

Which registry are you referring to for script and language tags?
Is this in the context of glyphs or do you just mean the IANA language
tag registry?

The OpenType script and language tags are specific to OpenType. As I
mentioned in my previous message, one of the problems yet to be solved is
how to associate OT language tags with the kind of things used for
metadata, e.g. RFC 3066 (and also determining whether resolving those
associations is the responsibility of the app, of a higher-level layout
engine, or of the OpenType layout engine), and it hasn't even been worked
out yet (IMO) just what the OT language tags are.



Given the (un)workable approach, do you then intend to have variants of
code2000 for CJKT, so one can make the appropriate assignments? (ugh!)

Also, this approach means I have to ask each Unicode font vendor, "Which
language is your multilingual font designed for?"
so I know which CJKT assignment is appropriate for that font...

Unfortunately, that's where we're stuck for the time being. I wish it were
otherwise, since we're in the process of coming up with new Latin /
Cyrillic fonts for our users throughout the world, and there are various
Latin characters for which different glyphs are preferred in different
language communities. And the variations for one character don't
necessarily correlate with those for another, so you get lots of possible
combinations needed -- which would make it a pain to come up with a bunch
of language-specific fonts. For now, we're going to give them the ability
to select alternate glyphs via Graphite features,* but they'll only be able
to use that in Graphite-enabled apps -- it won't work in Word!

*Since our software tools are intended for use by linguists working in
hundreds of languages / writing systems for which there is no support in
commercial software platforms, we have for a long time provided mechanisms
to specify writing-system-specific behaviours, such as sorting or character
properties determining basic things like word-boundary detection and line
breaking. In our new tools that support Graphite, there's an ability for
the linguist setting up a system for their writing system to specify what
features should be active by default for their writing system.  This gives
us an interim mechanism to handle language-specific typography
requirements.



- Peter









Re: glyph selection for Unicode in browsers

2002-09-25 Thread jameskass


Tex Texin wrote,

 Which registry are you referring to for script and language tags?
 Is this in the context of glyphs or do you just mean the IANA language
 tag registry?

As Peter Constable already noted, in this case "registered" only means
registered as an OpenType tag.  More info about this can be found on
Adobe's page:
http://partners.adobe.com/asn/developer/opentype/appendices/ttoreg.html

 
 Given the (un)workable approach, do you then intend to have variants of
 code2000 for CJKT, so one can make the appropriate assignments? (ugh!)


Code2000's coverage of CJKTV ideographs isn't adequate to support any language
yet.  Eventually and hopefully the repertoire will be completed.  Given the
current ceiling of 65536 max glyphs per font, it might not be feasible to
try to have one font cover all scripts and variants, but time will tell.
 
 Also, this approach means I have to ask each Unicode font vendor, "Which
 language is your multilingual font designed for?"
 so I know which CJKT assignment is appropriate for that font...
 

Sad but true.  On a happier note, most Japanese users will already have a
Japanese font set as default, Chinese users will have a Chinese (Simp or
Trad) font installed, and so forth.  Still, when you're trying to publish
a multilingual page which can be properly displayed anywhere, this isn't
much consolation.

 (I hope this doesn't read like I am attacking you, I am not. I am just
 trying to highlight the difficulty I am having with this.)

You are not alone...

Best regards,

James Kass.




Re: glyph selection for Unicode in browsers

2002-09-25 Thread Tex Texin

James, thanks as always for your reply.
The 65K limit is ugly...

With respect to the CJKT comment below, I guess it is true because of a
catch-22.

For example, I set my browser to default to a Unicode font. I think
everyone would if they could -- it's a knee-jerk response if the solution
is adequate everywhere. You don't have to know which fonts work for which
languages.
For the Americas and Europe, users can easily just set a Unicode font.

However, a Japanese user might have to choose a Japanese font, if the
Unicode font does not favor (and cannot be made to favor with language
tags) Japanese renderings.
So it's a catch-22. They have native fonts because Unicode fonts are
inadequate, but we can be relieved that although Unicode fonts are
inadequate, we are lucky the users don't use them.

ugh!

So where the differences are important, users are forced to select
native fonts instead of Unicode fonts. This then creates the difficulty
that to view a multilingual page, you need to a) acquire specialized
fonts (tedious and costly, perhaps), b) install them, c) assign them, and
d) finally view the page.

Sadder still, for content developers who want to use Unicode:
a) They can invest a lot of time in declaring lang around sections of
text and get no bang for it at the moment. As far as I can tell,
browsers do very little with this information. (I suspect it helps
search engines, but I need to test that assumption more.)

b) It is actually more beneficial to use native code pages than Unicode,
since the browsers seem to do a better job of font selection there. (I
need to test this statement more. However, from my own coding experience
on Windows, knowing the code page makes it easy to set the script
for the font, which has a major influence on Windows font selection. The
language information wouldn't be so easily available for a Unicode file
unless passing it from the markup layers down to the primitive font
selection layers were carefully designed in.)

To be fair, I think font coverage for Unicode has been steadily
improving, and it is much easier today to produce multilingual docs than
in the past. But I am disappointed in the state of the art for browsers,
and I suspect the same is true for other products that are not
professional publishing software of one kind or another. I suspect the
heart of the problem is that the rendering architecture has not carried
language (as opposed to code page) to the primitive layers. This needs
to be addressed throughout the architecture, since the language
information can no longer be deduced or presumed when the encoding is
Unicode.
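
To make the point about carrying language down to font selection concrete,
here is a minimal Python sketch. Everything in it is an assumption made for
illustration: the table CJK_FONT_BY_LANG, the font names in it, the default
font, and the helper names is_han and pick_font are hypothetical, and no
browser is claimed to work this way. It only shows that the same Han code
point can get a different font once a language tag reaches the selection
step, and falls back to one default when no tag arrives.

```python
# Hypothetical preference table: language tag -> font family.
# The font names are examples only, not a recommendation.
CJK_FONT_BY_LANG = {
    "ja": "MS Mincho",
    "zh-cn": "SimSun",
    "zh-tw": "MingLiU",
    "ko": "Batang",
}
DEFAULT_FONT = "Arial Unicode MS"


def is_han(ch):
    """Rough test: CJK Unified Ideographs block only (U+4E00..U+9FFF)."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


def pick_font(ch, lang=None):
    """Pick a font for one character, using the language tag if one arrived."""
    if is_han(ch) and lang:
        tag = lang.lower()
        # Try the full tag first, then the primary language subtag.
        for key in (tag, tag.split("-")[0]):
            if key in CJK_FONT_BY_LANG:
                return CJK_FONT_BY_LANG[key]
    # No language information (or not a Han character): one font for everyone.
    return DEFAULT_FONT
```

With a tag, pick_font("\u4e2d", "ja") and pick_font("\u4e2d", "zh-TW") give
different fonts for the same code point; without one, both get the default,
which is the situation described above.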

Whatever the reason, this needs to be fixed so that a) Unicode can be
recommended as best practice and b) documents are rendered with
appropriate glyphs, without extraordinary effort by users.

tex


[EMAIL PROTECTED] wrote:
  Also, this approach means I have to ask each Unicode font vendor, "Which
  language is your multilingual font designed for?"
  so I know which CJKT assignment is appropriate for that font...
 
 
 Sad but true.  On a happier note, most Japanese users will already have a
 Japanese font set as default, Chinese users will have a Chinese (Simp or
 Trad) font installed, and so forth.  Still, when you're trying to publish
 a multilingual page which can be properly displayed anywhere, this isn't
 much consolation.

 Best regards,
 
 James Kass.

-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




script detection program

2002-09-25 Thread chuck clemens

Does anyone have a program or tool that can identify the scripts to which the
characters in a UTF-16 encoded file belong?

I'd like a program that can scan the data and return script tags such as those
used in http://www.unicode.org/unicode/reports/tr24/

So if I had a UTF-16 encoded file with Latin and Cyrillic characters, the
tool/program would scan the text and return the names latn and cyrl.
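
A rough approximation of such a tool can be sketched in Python, under some
loud assumptions. The sketch does not read the TR24 script data; as a
stand-in it takes the first word of each character's name from the standard
unicodedata module, which is only a heuristic. The table NAME_PREFIX_TO_TAG
and the function names are made up for illustration, and the table covers
only a handful of scripts.

```python
import unicodedata

# Hypothetical mapping from the first word of a character name to a script
# tag. This is a heuristic stand-in for the real script property data.
NAME_PREFIX_TO_TAG = {
    "LATIN": "latn",
    "CYRILLIC": "cyrl",
    "GREEK": "grek",
    "ARABIC": "arab",
    "HEBREW": "hebr",
    "HIRAGANA": "hira",
    "KATAKANA": "kana",
    "CJK": "hani",
}


def detect_scripts(text):
    """Return a sorted list of script tags guessed for the letters in text."""
    found = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, spaces and other non-letters
        name = unicodedata.name(ch, "")
        prefix = name.split(" ", 1)[0] if name else ""
        tag = NAME_PREFIX_TO_TAG.get(prefix)
        if tag:
            found.add(tag)
    return sorted(found)


def detect_scripts_in_file(path):
    """Decode a UTF-16 file (BOM-aware) and guess the scripts it contains."""
    with open(path, encoding="utf-16") as f:
        return detect_scripts(f.read())
```

For a file holding Latin and Cyrillic text, detect_scripts_in_file would
return the two tags cyrl and latn. Letters whose names start with a word
missing from the table are silently ignored, so a real tool would want the
full script data instead.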









Composition chart

2002-09-25 Thread Mark Davis

Apropos the discussion on combining sequences:

I needed a quick reference for (canonical) composite characters, and
wrote a quick chart (well, actually, a quick program for generating one).
In case anyone is interested, I posted it at:

http://www.macchiato.com/unicode/composition_chart.html

I found it a reasonably compact way to reference all composites, including
the ones that are excluded from NFC. The presentation is not very polished,
but tool-tips are enabled with character names. Red means 'excluded from
composition'. The rest of the structure should be more or less clear.
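
For readers who want to check individual entries, the following is a small
Python sketch, not the program that generated the chart. It assumes the
standard unicodedata module and two helper names invented here. A character
is treated as 'excluded from composition' when it has a canonical
decomposition but NFC does not give the character back; the answer depends
on the Unicode version shipped with the Python in use.

```python
import unicodedata


def canonical_decomposition(ch):
    """Return the one-level canonical decomposition of ch, or "" if none."""
    decomp = unicodedata.decomposition(ch)
    # Compatibility decompositions carry a tag such as "<fraction>"; skip them.
    if not decomp or decomp.startswith("<"):
        return ""
    return "".join(chr(int(cp, 16)) for cp in decomp.split())


def is_excluded_from_composition(ch):
    """True if ch decomposes canonically but NFC does not recompose it."""
    if not canonical_decomposition(ch):
        return False
    return unicodedata.normalize("NFC", ch) != ch
```

For example, U+00E9 decomposes to U+0065 U+0301 and is recomposed by NFC, so
it is not excluded, while U+0958 stays decomposed under NFC and is reported
as excluded.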

Curiously enough, it seems to expose a bug in Arial Unicode MS: the
character U+1E4B gets a funny glyph. (This is on the AUMS from Office 2000;
I have Office XP but haven't yet gone through the process of installing it,
so it may be fixed there.)

Mark
__
http://www.macchiato.com
►  “Eppur si muove” ◄