New Public Review Issue posted

2004-09-13 Thread [EMAIL PROTECTED]
The Unicode Technical Committee has posted a new issue for public review  
and comment. Details are on the following web page:

http://www.unicode.org/review/

Review period for the new item closes on November 11, 2004.

Please see the page for links to discussion and relevant documents.
Briefly, the new issue is:


46   Proposal for Encoded Representations of Meteg

In some Biblical Hebrew usage, it is considered necessary to distinguish
how the meteg mark is positioned relative to a vowel point: to the left of
the vowel, to the right of it, or, in the case of a hataf vowel, between
the two components of the hataf vowel. A solution has been proposed using
control characters, including the zero width joiner and zero width
non-joiner. This public-review issue solicits feedback on the proposed
solution.
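
To make the alternatives concrete, here is a minimal Python sketch of the
kind of sequences involved. Which joiner marks which meteg position is
precisely what the proposal specifies, so the pairings below are
placeholders, not the proposed assignments:

PATAH = "\u05B7"   # HEBREW POINT PATAH (a vowel point)
METEG = "\u05BD"   # HEBREW POINT METEG
ZWNJ  = "\u200C"   # ZERO WIDTH NON-JOINER
ZWJ   = "\u200D"   # ZERO WIDTH JOINER

unmarked = PATAH + METEG          # positioning left to the rendering system
variant1 = PATAH + ZWNJ + METEG   # one joiner-marked position (placeholder)
variant2 = PATAH + ZWJ + METEG    # another joiner-marked position (placeholder)
print([hex(ord(c)) for c in variant1])  # ['0x5b7', '0x200c', '0x5bd']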


If you have comments for official UTC consideration, please post them by
submitting your comments through our feedback & reporting page:

http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please
use the following link to subscribe (if necessary). Please be aware
that discussion comments on the Unicode mail list are not automatically
recorded as input to the UTC. You must use the reporting link above
to generate comments for UTC consideration.

http://www.unicode.org/consortium/distlist.html

Regards,
Rick McGowan
Unicode, Inc.



Re: markup on combining characters

2004-09-08 Thread [EMAIL PROTECTED]
Philippe Verdy wrote, re Public Review Issue #41:

> I don't know if a formal proposal has been sent to ISO/IEC WG too.

Yes. In fact, the PRI document itself cites WG2 N2822. It *has* gone to WG2  
as well as to the UTC.

Rick




Re: RE: Public Review Issue: UAX #24 Proposed Update

2004-09-08 Thread [EMAIL PROTECTED]
Jony wrote,

> FB1D, HEBREW LETTER YOD WITH HIRIQ, should be assigned to the
> unknown group. It is not a Hebrew character, notwithstanding the
> misleading name.

Hmmm... Are you claiming that HEBREW LETTER YOD (the base character of the  
code point U+FB1D) is not a letter of the Hebrew script, and that you can
substantiate that claim? If so, please write a document to that effect with  
appropriate citations and send it to me for posting to the UTC.

Rick



Two new Public Review Issues posted

2004-09-08 Thread [EMAIL PROTECTED]
The Unicode Technical Committee has posted two new issues for public  
review and comment. Details are on the following web page:

http://www.unicode.org/review/

Review periods for the new items close on November 8, 2004.

Briefly, the new issues are:


44  Bidi Category of Fullwidth Solidus

Unicode 4.0.1 changes the Bidi Category of U+002F SOLIDUS from "ES" to "CS"  
but leaves U+FF0F FULLWIDTH SOLIDUS in category "ES". U+FF0F FULLWIDTH  
SOLIDUS should probably have the same bidi class as its regular sibling.  
The UTC proposes to make this change for Unicode 4.1.
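
As a quick check, Python's unicodedata module reports the bidi class from
whatever Unicode version that Python ships; with data from Unicode 4.1 or
later, both characters below should report "CS":

import unicodedata

print(unicodedata.bidirectional("\u002F"))  # SOLIDUS
print(unicodedata.bidirectional("\uFF0F"))  # FULLWIDTH SOLIDUS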


45  Linebreaking Category of Narrow No-Break Space

Should the linebreaking category of Narrow No-Break Space (NNBSP, U+202F)  
be changed from WS to CS, by analogy with No-Break Space U+00A0? The reason  
for the change is that in all scripts but Mongolian it acts like an ordinary  
NBSP, except for its width. In Mongolian it may be recognized in shaping.
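
The Line_Break property is not exposed by Python's standard library, but a
sketch like the following can look it up from a local copy of the UCD's
LineBreak.txt (the file path is an assumption):

def line_break_class(cp, path="LineBreak.txt"):
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop comments
            if not line:
                continue
            rng, cls = (s.strip() for s in line.split(";"))
            lo, _, hi = rng.partition("..")
            if int(lo, 16) <= cp <= int(hi or lo, 16):
                return cls
    return "XX"  # Unicode's default class for unlisted code points

print(line_break_class(0x202F))  # NARROW NO-BREAK SPACE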


If you have comments for official UTC consideration, please post them by
submitting your comments through our feedback & reporting page:

http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please
use the following link to subscribe (if necessary). Please be aware
that discussion comments on the Unicode mail list are not automatically
recorded as input to the UTC. You must use the reporting link above
to generate comments for UTC consideration.

http://www.unicode.org/consortium/distlist.html

Regards,
Rick McGowan
Unicode, Inc.



Public Review Issue: UAX #24 Proposed Update

2004-09-08 Thread [EMAIL PROTECTED]
The Unicode Technical Committee has posted a new issue for public review
and comment. Details are on the following web page:

http://www.unicode.org/review/

Review period for the new item closes on November 8, 2004.

Please see the page for links to discussion and relevant documents.
Briefly, the new issue is:


Proposed Update Unicode Standard Annex #24: Script Names
http://www.unicode.org/reports/tr24/tr24-6.html

This is a proposed update to a previously approved Unicode Standard Annex.  
It provides an assignment of script names to all Unicode code points. This  
information is useful in mechanisms such as regular expressions and in  
other text-processing tasks. The proposed update makes several substantial  
changes to the previously approved annex.
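
As one illustration of the kind of use this enables, script-based matching
in regular expressions might look like the following; Python's built-in re
module lacks \p{Script=...}, so this assumes the third-party regex module
is installed:

import regex  # third-party; pip install regex

text = "Latin text עם קצת עברית mixed in"
print(regex.findall(r"\p{Script=Hebrew}+", text))  # the Hebrew-script runs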


If you have comments for official UTC consideration, please post them by
submitting your comments through our feedback & reporting page:

http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please use
the following link to subscribe (if necessary). Please be aware that
discussion comments on the Unicode mail list are not automatically recorded  
as input to the UTC. You must use the reporting link above to generate
comments for UTC consideration.

http://www.unicode.org/consortium/distlist.html

Regards,
Rick McGowan
Unicode, Inc.




UTR #17 now available

2004-09-08 Thread [EMAIL PROTECTED]
The Unicode Technical Committee is pleased to announce the availability of  
a new fully-approved version of Unicode Technical Report #17: Character  
Encoding Model.

It may be obtained at the following URL:

http://www.unicode.org/reports/tr17/

This report describes a model for the structure of character encodings.  
The Unicode Character Encoding Model places the Unicode Standard in the  
context of other character encodings of all types, as well as existing  
models such as the character architecture promoted by the Internet  
Architecture Board (IAB) for use on the Internet.
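
A small illustration of the model's layering (a sketch in Python): the same
two coded characters serialize to different byte sequences under different
encoding forms and schemes:

s = "A\u20AC"                    # abstract characters coded as U+0041, U+20AC
print([hex(ord(c)) for c in s])  # the coded-character-set layer
print(s.encode("utf-8"))         # UTF-8 bytes: 41 E2 82 AC
print(s.encode("utf-16-le"))     # UTF-16LE bytes: 41 00 AC 20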


If you have comments for official UTC consideration, please post them by
submitting your comments through our feedback & reporting page:

http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please use
the following link to subscribe (if necessary). Please be aware that
discussion comments on the Unicode mail list are not automatically recorded  
as input to the UTC. You must use the reporting link above to generate
comments for UTC consideration.

http://www.unicode.org/consortium/distlist.html



Regards,
Rick McGowan
Unicode, Inc.



Public Review Issue: UAX #34 Proposed Draft

2004-09-07 Thread [EMAIL PROTECTED]
I'm sorry to report that the subject line of my previous note today was  
incorrect. The correct subject line should say UAX #34; it is UAX #34  
that has been released for public review.

http://www.unicode.org/review/

Regards,
Rick McGowan
Unicode, Inc.

--

Subject: Public Review Issue: UTR #17 Proposed Draft

The Unicode Technical Committee has posted a new issue for public review  
and comment. Details are on the following web page:

http://www.unicode.org/review/

Review period for the new item closes on November 8, 2004.

Please see the page for links to discussion and relevant documents.  
Briefly, the new issue is:


Proposed Draft Unicode Standard Annex #34: Unicode Named Character Sequences
http://www.unicode.org/reports/tr34/tr34-1.html


This annex specifies sequences of characters that may be treated as single  
units in particular types of processing, in references by standards, in  
listings of repertoires (such as for fonts or keyboards), or in  
communication with users.
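
A sketch of consuming such a list programmatically: parse the UCD's
NamedSequences.txt (a local copy is assumed) into a mapping from sequence
name to string:

def load_named_sequences(path="NamedSequences.txt"):
    seqs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop comments
            if not line:
                continue
            name, cps = (s.strip() for s in line.split(";"))
            seqs[name] = "".join(chr(int(cp, 16)) for cp in cps.split())
    return seqs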

If you have comments for official UTC consideration, please post them by  
submitting your comments through our feedback & reporting page:

http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please use  
the following link to subscribe (if necessary). Please be aware that  
discussion comments on the Unicode mail list are not automatically recorded  
as input to the UTC. You must use the reporting link above to generate  
comments for UTC consideration.

http://www.unicode.org/consortium/distlist.html

Regards,
Rick McGowan
Unicode, Inc.



Public Review Issue: UTR #17 Proposed Draft

2004-09-07 Thread [EMAIL PROTECTED]
The Unicode Technical Committee has posted a new issue for public review  
and comment. Details are on the following web page:

http://www.unicode.org/review/

Review period for the new item closes on November 8, 2004.

Please see the page for links to discussion and relevant documents.  
Briefly, the new issue is:


Proposed Draft Unicode Standard Annex #34: Unicode Named Character Sequences
http://www.unicode.org/reports/tr34/tr34-1.html


This annex specifies sequences of characters that may be treated as single  
units in particular types of processing, in references by standards, in  
listings of repertoires (such as for fonts or keyboards), or in  
communication with users.

If you have comments for official UTC consideration, please post them by  
submitting your comments through our feedback & reporting page:

http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please use  
the following link to subscribe (if necessary). Please be aware that  
discussion comments on the Unicode mail list are not automatically recorded  
as input to the UTC. You must use the reporting link above to generate  
comments for UTC consideration.

http://www.unicode.org/consortium/distlist.html

Regards,
Rick McGowan
Unicode, Inc.



Re: Common Locale Data Repository Project

2004-04-23 Thread Hideki Hiura - [EMAIL PROTECTED]
> From: "Peter Constable" <[EMAIL PROTECTED]>
> > due to the strong perception of OpenI18N.org as
> > opensource/Linux advocates, even though CLDR project is not
> > specifically bound to Linux.
> It is hard to look at OpenI18N.org's spec and not get the impression
> that all of that group's projects are bound to some flavour of Unix.

We understand what you mean. Sometimes perception is very important, 
and that's why we thought it was a good idea to transfer CLDR.

We started as the Linux Internationalization Initiative (li18nux.org) and
later changed our name and charter to OpenI18N.org to accommodate a wider
range of platforms and platform-neutral I18N technology development, so
projects at OpenI18N.org are not limited to Linux/Unix.

> CLDR doesn't have to be tied to any particular platform -- after all,
> it's just a collection of data.

Yup! So hopefully this move will help more parties join the project.
That would definitely help global interoperability on all platforms
and help everybody.

> But I don't think you can honestly say that OpenI18N isn't tied to a
> particular family of platforms

Most of our current projects are mainly for some flavour of Unix, since
most of the participants' expertise and interests lie with those platforms,
but we are neither limited nor bound to them.

The only requirements for projects at OpenI18N.org are to be open to
everyone, to be developed in an open process, and to be open-sourced.

For example, one of the projects I run, IIIMF, a platform-neutral
multilingual distributed Unicode input method framework, runs on Windows
as well, and I honestly hope Microsoft will adopt IIIMF in a future
release of Windows, so that we can unify Unicode input methods regardless
of platform.

Best Regards,
--
[EMAIL PROTECTED],OpenI18N.org,li18nux.org,unicode.org,sun.com} 
Chair, OpenI18N.org/The Free Standards Group  http://www.OpenI18N.org
Architect/Sr. Staff Engineer, Sun Microsystems, Inc, USA   eFAX: 509-693-8356



Re: Common Locale Data Repository Project

2004-04-22 Thread Hideki Hiura - [EMAIL PROTECTED]
> From: "Philippe Verdy" <[EMAIL PROTECTED]>
> Is that a contribution of the Unicode Consortium to the OpenI18n.org
> project (former li18nux.org, maintained with most help from the
> FSF), or a decision to make the OpenI18n.org project be more open by
> pushing it to a more visible standard?

More the latter, but slightly different. We believe it is good for both
the open-source community and the commercial IT industry that we transfer
(at least a part of) the project to the Unicode Consortium, after hearing
concerns that some commercial companies found it difficult to join the
project due to the strong perception of OpenI18N.org as opensource/Linux
advocates, even though the CLDR project is not specifically bound to Linux.

We hope this transfer will attract further participation from a wider
audience.

Regarding the confusion, I have to say it was anticipated, since the
project is still in transition (for example, the OpenI18N.org side has not
yet finished the procedures necessary to finalize this, so OpenI18N.org
does not have a press release ready yet; this announcement came a little
early). I expect it will all be sorted out as time goes by.

--
[EMAIL PROTECTED],OpenI18N.org,li18nux.org,unicode.org,sun.com} 
Chair, OpenI18N.org/The Free Standards Group  http://www.OpenI18N.org
Architect/Sr. Staff Engineer, Sun Microsystems, Inc, USA   eFAX: 509-693-8356




RE: OT? Languages with letters that always take diacriticals

2004-03-18 Thread [EMAIL PROTECTED]
A number of North American Native languages use a character+diacritic when
the corresponding plain character does not occur.

-Romanised Cree has <ē> but no plain <e>.
-Some west-coast Salishan languages have LATIN LAMBDA WITH STROKE+COMBINING
COMMA ABOVE, but no plain LATIN LAMBDA WITH STROKE.
-A number of languages (e.g. Meskwaki) use <č> but not <c>.

However, most if not all North American Native languages have had multiple
orthographies historically, if not synchronically. So some Cree speakers who
are using Roman orthography may very well write <e> instead of <ē> for
reasons of graphical economy.

Chris Harvey










Re: Canadian Unified Syllabics

2004-02-12 Thread [EMAIL PROTECTED]
Hi.

>Make your recommendation either for encoding 3 separate things, or for
>use of variation selectors; or put the issue out for the committee to
>decide.  

I would recommend that three "invisible" (I don't know the technical term)
characters be added: base-line final, mid-line final, and top-line final. I
can't see why the "top-line final" would be necessary, except see *** below.
The benefit of this concept is that for fonts which do not have these
OpenType substitutions there would be no visual effect on the screen, other
than all finals appearing at the top line instead of in their proper place.

>> Carrier is missing 2 characters (only one of which appears in the
>> text) and Blackfoot is missing 1 character (which doesn't appear in the
>> text).

>Those are the only 3 you found, apart from the different height finals
>for Dene languages, is that right? Or are there still others?

Those are the only missing characters that were used in examples on my
website. Other missing characters are:

-the Ojibway i-finals (vital), 
-Ojibway combining r and l finals (I believe necessary; I have a contact
who could verify whether the communities are still using these),
-historical Chipewyan l and g finals (obsolete, really), 
-Woods Cree dh-final (vital),
-Ojibway-Cree small ring final (vital), 
-West Cree w-vowel-y final (if this is not what U+141D is supposed to be, a
colon-like character would be the usual form of this), 
-West Cree y-dot final (I forgot to put an example of this on the webpage;
will fix) (vital),
-a syllabics hyphen? (just an idea),
-I have found evidence that Beaver used a superscript roman l in native
words (I will put an example up). I figure the other superscript roman
characters (used only in borrowings) could be coded something like "F" +
"top-line final". ***

along with the Blackfoot and Carrier characters:
-Blackfoot w-equals sign (vital)
-Carrier sans-serif s final (vital)
-Carrier f/v final (loan words only, but would be useful)
(I will also try to get in contact with an expert in Carrier).

thanks... chris




mail2web - Check your email from the web at
http://mail2web.com/ .





Re: Canadian Unified Syllabics

2004-02-12 Thread [EMAIL PROTECTED]
Hello

Here are some comments about UCAS suggestions.

* Encode the pages as compliantly as possible. (Also about the overuse of
the PUA on my website, languagegeek.com.)

-I spent all night making Unicode-only versions of the syllabics pages
where appropriate. In the end, for Carrier and Blackfoot, it doesn't look
too bad. Carrier is missing 2 characters (only one of which appears in the
text) and Blackfoot is missing 1 character (which doesn't appear in the
text). Apart from that, the remaining issues are stylistic, and don't
really concern us here. I have also included some .pdf files of what a more
neatly type-set version of the text would look like, using glyph variants
and missing characters. For the Dene languages, the only major problem is
with the baseline-midline-topline final situation. For now, I have used
superscript numbers to mark where the final ought to appear. Doesn't look
too good, but it's better than nothing.

Question: Are there any Unicode characters that one could use to mark final
height? Something like variation selectors? For an OpenType font, I need
some invisible character to tell the font where the final should go. If
nothing is appropriate, then I would suggest that three height selectors be
formally submitted for Unicode approval.
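
For what it's worth, here is what a variation-selector answer would look
like as plain data (a Python sketch only: whether any such sequences get
defined for syllabics finals is exactly the open question above):

final = "\u1420"          # CANADIAN SYLLABICS FINAL GRAVE
VS1   = "\uFE00"          # VARIATION SELECTOR-1
candidate = final + VS1   # base character followed by a variation selector
print([hex(ord(c)) for c in candidate])  # ['0x1420', '0xfe00']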

* Offer the one font to fit all the pages while awaiting either
   language-specific fonts or OpenType technology availability.
* Note on the pages that the one font aims to cover all syllabics, but that
   language-specific variants exist which can't yet be covered in a single 
   font due to technological limitations.
Done and done (or almost done anyway).

* Use any combining dots and so forth from the COMBINING
   DIACRITIC range.  (A font like Code2000 won't display these
   combiners well due to technology limitations, but, so what?
   In *your* font, you can place the combining glyphs so that
   their default position is acceptable and won't overstrike the
   base glyphs.)

I am going to do a few things. 
1) I am going to leave Aboriginal Serif as is, because people already have
the font and may have documents typed already. I have many warnings all
over the site about the drawbacks of using the PUA. I also have a big
notice that if at any time an old font from my site becomes obsolete, I
will provide software to make documents compliant with a new font. Any new
font will be Unicode and OpenType (hopefully any mutually agreed-upon
missing characters can be added to the standard by then).

2) I am making a syllabics-only OpenType font (staying away from a
mega-Unicode font here), which will position diacritics, finals, etc.
properly (hence the need for final-height-position characters). I will
include the glyph variants as "historical" or "alternate" on a case-by-case
basis. I have discovered that the new Adobe InDesign CS seems to process
the syllabics OpenType features nicely (I think; I just downloaded it).
This won't work in browsers yet, but if the website is done according to
Unicode (as best I can), then the OpenType font will make it all look good
in the future.

3) I am also going to make language-specific Unicode fonts. One could
look at syllabics as actually four scripts: Cree-Ojibway-Inuktitut, Dene,
Carrier, and Blackfoot. The differences between these four could be likened
to those between Roman, Cyrillic, and Greek: the alphabetical concept is
pretty much the same, and several glyph shapes are shared between them. For
this reason, it's tricky to make a unified range look really nice. For a
really nit-picky example, Western Cree finals tend to be quite short and
small, while Eastern Cree finals have to be taller due to their more
complex shapes. Yet both Eastern Cree and Western Cree share the h-final
U+1426. What happens is that either a tall h-final occurs alongside short
Western finals, or a short h-final occurs alongside tall Eastern finals.



Also, in a few weeks I would like to present to this forum a list of all
the suggestions, comments, criticisms, etc. that have been posted, and we
can see where to go from there.

Thanks for everyone's comments so far. I hope we can get more opinions.

Chris








Re: Canadian Unified Syllabics

2004-02-10 Thread [EMAIL PROTECTED]
Hello

I would like to make a few comments about the Aboriginal Serif font.

First, the reason for putting so many characters in the PUA is as follows.
For Blackfoot, Dene, Cree (some dialects), and Ojibway (some dialects),
some important characters needed to write these languages properly are
missing. For example, as far as I can tell, within UCAS one cannot
differentiate between top-line, mid-line, and baseline finals. Because
these finals had to be lumped into the PUA (along with some other
characters), I put glyph variants there also. Needless to say, one cannot
write Dene or Ojibway (i-final) using Code2000. So I don't know what else
to say. I want the examples on my site to be legible (dot accents
overstriking the middle of syllabics instead of sitting above them aren't
really acceptable), and I want the characters to look like what speakers
are familiar with; otherwise they may very well choose not to use the
font, keyboards, etc.

My aim is that people can type their own language on the computers they
have now. Once OpenType is available on my machine and others, I will
release fonts with OpenType tables that call the same glyphs that are now
in the PUA. This way, I am making a humble attempt at backward
compatibility. But for now, if people cannot use the OpenType
substitutions, what else should I do? 

I am building specific fonts for specific languages, but I wanted one font
that would display the lot. That way, if someone wanted to use
languagegeek.com, they would only have to download one font, instead of one
per language.

Please note that months ago I changed the name of the font from
"Aboriginal Serif Unicode" to "Aboriginal Serif" in response to comments
made earlier on this list; I also note on every page that one has to
download my font to view the pages properly.

Thank-you.. Chris







Unified Canadian Syllabics

2004-02-09 Thread [EMAIL PROTECTED]
Hello

I think I posted this to the list last week, but I haven't seen it come up.


I would like to present to the Unicode community some suggestions for
missing and mis-named characters related to the UCAS range. To properly 
describe the kinds of characters missing etc., many graphics are required.
For this reason, I would invite people to see the document at:

http://www.languagegeek.com/issues/ucas_unicode.html

A "words only" description follows here:
**
**

I would like to offer the Unicode community the following observations
relating to the Unified Canadian Aboriginal Syllabics range. My goal (see
www.languagegeek.com) is to enable all of the North American languages to
be properly and accurately written on the Internet and on computers in
general. Here I will focus specifically on the languages which currently
use syllabics or historically used them (and may still in some
communities). 

Some conventions used below. All Unicode character names are in majuscule,
and “Canadian Syllabics” has been abbreviated to CS. Hexadecimal Unicode
indices are in parentheses and prefixed with “U+”. All sources cited are
linked to the languagegeek.com bibliography. A “final” is the Syllabics
term for a character which represents a consonant only, not a consonant +
vowel, so CS FINAL GRAVE (U+1420), CS CARRIER H (U+144B) and CS NASKAPI SKW
(U+150A) would all be examples of “finals”. I use the term “syllabic” to
refer to a consonant + vowel character. A series is a row of characters on
a syllabic chart, so in Misnamed Characters Note 1, “tta, tte, tti, tto”
would be the tt-series.
 
Misnamed Characters
The asterisk ᕯ character (U+156F) appears on the code-page chart as
**, and is named CS TTH. This is a misreading of the syllabarium chart used
by the French Missionaries for Chipewyan—probably from the 1904 publication
Prières Catéchisme et Cantiques en langue Montagnaise ou Chipeweyan. The
chart in this book has been reprinted in most if not all “scripts of the
world” type books. Unlike most other syllabics charts, this one does not
have a column of finals to the right of the consonant-vowel syllabics.
Instead, it simply has a list of all the finals, which do not correspond
with the syllabics series on the same row. Thus, the CS WEST-CREE P
(U+144A) (looks like a prime ') final which appears to the right of the
“tta” row is not the sound “tt”, but is instead “h”. The blue circled
asterisk is not “tth”, but is in fact a symbol which indicates a proper
name, in this case /*adą/ (Adam). A second glitch on the Unicode
code-page chart is that this character is written with two asterisks “**”,
when in fact on the chart above, the first asterisk is the character
itself, and the second is part of the example. I believe this should
definitely be fixed. 

In the syllabics chart mentioned above, the final row is labelled
"tca, tce…" (U+1570-73), which corresponds to the modern Roman orthography
sound /t/ (an aspirated stop). Interpreting "tca" as "tya" is a
misunderstanding of the French description of what the c represents. The
Chipewyan Syllabarium page has more information on this. Renaming this
syllabics series is probably not a high priority.
In Naskapi, each a-type syllabic character can either be preceded by a
colon-like character or carry an umlaut-like diacritic. Unicode has
labelled these as having a long vowel, e.g. (U+1482) CS NASKAPI KWAA. In
fact, the colon or umlaut does not mark vowel length (Naskapi orthography
ignores length). Instead, the colon or umlaut simply indicates "wa". So
(U+1482) would be better named CS NASKAPI KWA. This is also probably not a
high priority.

Missing Characters

Naskapi
According to the Naskapi Lexicon, there is no symbol NASKAPI WOO (U+1416),
but there is a “wi”. This character look similar to U+140E CS WI, but is
different—the dot is higher up on the left side. “wi” may need to be added.
“woo” may be on a different Naskapi chart I have not seen.
 
Blackfoot
In Blackfoot, a raised "equals sign" is used much as CS FINAL MIDDLE DOT
(U+1427) is in Cree: to indicate a /w/ between the consonant and vowel of
the syllabic. A raised = with CS BLACKFOOT KA (U+15BD) before it gives the
sound /kwa/. This character is vital to writing Blackfoot and should be
added. 
 
Carrier Dene
A few finals used in Carrier are missing from Unicode. Information for
Carrier is from Poser 2000. There is an important graphical distinction
between the finals used for /s/ and /s(+macron-below)/ (in the Roman
orthography version). The former is a small serif "s" written mid-line,
while the latter is a small sans-serif "s" written mid-line. Unicode lists
only one version, (U+1506) CS ATHAPASCAN S. A second character, an
upside-down mid-line small "h", is used for loan words with /f/ or /v/
sounds. These two finals should be added.
 
In examples of Ca

FYI: CLAW 2003

2002-11-06 Thread [EMAIL PROTECTED]
FYI: Controlled Language Applications Workshop

http://www.eamt.org/eamt-claw03/

Bev

--
www.enso-company.com * [EMAIL PROTECTED]

---
Bev Corwin, President  Enso Company Ltd.
The Westin Building 2001 Sixth Avenue
Penthouse Suite 3403  Seattle WA 98121 USA
Telephone: 206.728.2232 Facsimile: 206.728.2262







remove

2002-01-24 Thread [EMAIL PROTECTED]







Devanagari on MacOS 9.2 and IE 5.1

2002-01-22 Thread [EMAIL PROTECTED]

I spoke too fast. Upon taking a closer look at the file, the font was not set properly. 
MacOS 9.2, the Indian Language Kit, Mac IE 5.1, and Devanagari MT as the font face seem to 
display UTF-8 encoded Hindi just fine.

Etienne

>Date: Mon, 21 Jan 2002 10:24:16 -0800
>From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>To: [EMAIL PROTECTED], [EMAIL PROTECTED]
>Cc: [EMAIL PROTECTED]
>Subject: RE: Devanagari
>
>On this subject, Win2K and IE5+ seem to do a nice job displaying UTF8-encoded Hindi. 
>On the Mac, the Indian Language Kit provides for OS support and fonts (with MacOS 9.2 
>and above), but I have not been able to display Hindi (UTF8 encoded) with Mac's IE 
>5.1. Am I correct in assuming that the Mac version of IE does not support Hindi 
>without a hack?
>
>Etienne
>
>>Reply-To: <[EMAIL PROTECTED]>
>>From: "Christopher J Fynn" <[EMAIL PROTECTED]>
>>To: <[EMAIL PROTECTED]>
>>Cc: "Aman Chawla" <[EMAIL PROTECTED]>
>>Subject: RE: Devanagari
>>Date: Mon, 21 Jan 2002 23:59:38 +0600
>>
>>Aman
>>
>>Here in Bhutan the Internet connection is still much worse than in most
>>places I've visited in India & Nepal (and the cost per minute is several
>>times higher) - believe me even then UTF-8 (or UTF-16) encoded pages do not
>>display noticeably slower than ASCII, ISCII or 8-bit font encoded pages -
>>and I don't need to download any special plug-ins or fonts.
>>
>>- Chris
>>
>>--
>>Christopher J Fynn
>>Thimphu, Bhutan
>>
>><[EMAIL PROTECTED]>
>><[EMAIL PROTECTED]>
>>
>>
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
>>> Behalf Of Aman Chawla
>>> Sent: 21 January 2002 10:57
>>> To: James Kass; Unicode
>>> Subject: Re: Devanagari
>>>
>>>
>>> - Original Message -
>>> From: "James Kass" <[EMAIL PROTECTED]>
>>> To: "Aman Chawla" <[EMAIL PROTECTED]>; "Unicode"
>>> <[EMAIL PROTECTED]>
>>> Sent: Monday, January 21, 2002 12:46 AM
>>> Subject: Re: Devanagari
>>>
>>>
>>> > 25% may not be 300%, but it isn't insignificant.  As you note, if the
>>> > mark-up were removed from both of those files, the percentage of
>>> > increase would be slightly higher.  But, as connection speeds continue
>>> > to improve, these differences are becoming almost minuscule.
>>>
>>> With regards to South Asia, where the most widely used modems are
>>> approx. 14
>>> kbps, maybe some 36 kbps and rarely 56 kbps, where broadband/DSL is mostly
>>> unheard of, efficiency in data transmission is of paramount importance...
>>> how can we convince the south asian user to create websites in an encoding
>>> that would make his client's 14 kbps modem as effective (rather,
>>> ineffective) as a 4.6 kbps modem?
>>>
>
>
>









RE: Devanagari

2002-01-21 Thread [EMAIL PROTECTED]


On this subject, Win2K and IE5+ seem to do a nice job displaying UTF8-encoded Hindi. 
On the Mac, the Indian Language Kit provides for OS support and fonts (with MacOS 9.2 
and above), but I have not been able to display Hindi (UTF8 encoded) with Mac's IE 
5.1. Am I correct in assuming that the Mac version of IE does not support Hindi 
without a hack?

Etienne

>Reply-To: <[EMAIL PROTECTED]>
>From: "Christopher J Fynn" <[EMAIL PROTECTED]>
>To: <[EMAIL PROTECTED]>
>Cc: "Aman Chawla" <[EMAIL PROTECTED]>
>Subject: RE: Devanagari
>Date: Mon, 21 Jan 2002 23:59:38 +0600
>
>Aman
>
>Here in Bhutan the Internet connection is still much worse than in most
>places I've visited in India & Nepal (and the cost per minute is several
>times higher) - believe me even then UTF-8 (or UTF-16) encoded pages do not
>display noticeably slower than ASCII, ISCII or 8-bit font encoded pages -
>and I don't need to download any special plug-ins or fonts.
>
>- Chris
>
>--
>Christopher J Fynn
>Thimphu, Bhutan
>
><[EMAIL PROTECTED]>
><[EMAIL PROTECTED]>
>
>
>> -----Original Message-----
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
>> Behalf Of Aman Chawla
>> Sent: 21 January 2002 10:57
>> To: James Kass; Unicode
>> Subject: Re: Devanagari
>>
>>
>> - Original Message -
>> From: "James Kass" <[EMAIL PROTECTED]>
>> To: "Aman Chawla" <[EMAIL PROTECTED]>; "Unicode"
>> <[EMAIL PROTECTED]>
>> Sent: Monday, January 21, 2002 12:46 AM
>> Subject: Re: Devanagari
>>
>>
>> > 25% may not be 300%, but it isn't insignificant.  As you note, if the
>> > mark-up were removed from both of those files, the percentage of
>> > increase would be slightly higher.  But, as connection speeds continue
>> > to improve, these differences are becoming almost minuscule.
>>
>> With regards to South Asia, where the most widely used modems are
>> approx. 14
>> kbps, maybe some 36 kbps and rarely 56 kbps, where broadband/DSL is mostly
>> unheard of, efficiency in data transmission is of paramount importance...
>> how can we convince the south asian user to create websites in an encoding
>> that would make his client's 14 kbps modem as effective (rather,
>> ineffective) as a 4.6 kbps modem?
>>







ISCII-Unicode Conversion

2001-11-12 Thread [EMAIL PROTECTED]

Hi,

Would anybody be able to point me to possible ISCII-Unicode conversion utilities/APIs? 
How reliable is the conversion?
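
(For flavor, such utilities are typically table-driven; the Python sketch
below carries only a handful of illustrative ISCII-91 entries. A real
converter needs the complete table plus handling for ATR, EXT, INV, and
nukta combinations, which is why a tested library such as ICU's ISCII
converters is the safer route.)

ISCII_TO_UNICODE = {
    0xA4: "\u0905",  # DEVANAGARI LETTER A
    0xA5: "\u0906",  # DEVANAGARI LETTER AA
    0xB3: "\u0915",  # DEVANAGARI LETTER KA
    0xE8: "\u094D",  # DEVANAGARI SIGN VIRAMA (halant)
}

def iscii_to_unicode(data: bytes) -> str:
    # ASCII passes through; unmapped high bytes become U+FFFD.
    return "".join(
        chr(b) if b < 0x80 else ISCII_TO_UNICODE.get(b, "\uFFFD")
        for b in data
    )

print(iscii_to_unicode(bytes([0xB3, 0xE8, 0xA4])))  # -> क्अ (illustrative)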

How well is Hindi supported by the UTF-8 + Internet Explorer combination?

Your expertise is GREATLY appreciated.

Best,

Etienne







RE: Terms "constructed script", "invented script" (was: FW: Re: Shavian)

2001-07-09 Thread [EMAIL PROTECTED]
>   Odd.  I've always considered Japanese "double consonants" to be
> glottal stops.  Could anyone please explain the difference?

They are glottal stops.  But Japanese writing doesn't have a (standard)  
means of expressing a glottally stopped vowel pair; it can only express  
consonants.  One supposes that a small "tsu" would suffice, e.g.  
ハヴァイッイ => hawai'i... and it has probably already been used somewhere  
to that effect.  As Ed Cherlin pointed out, "tsu" has been adapted for  
word-final consonants... in that sense, "tsu" is effectively used as a virama  
already.

I still don't know if there's any Japanese phonetic scholarship that  
distinguishes "L" and "R"...

Rick


Re: Terms "constructed script", "invented script" (was: FW: Re: Shavian)

2001-07-07 Thread [EMAIL PROTECTED]
>Hiragana (and katakana) assume certain things about the syllabic structure,
>specifically that syllables are of the form [C] V [C], where the trailing
>consonant (if any) must be "n".

Yes, but kana _has_ been used, even natively in comics and so forth, to end  
words with other consonants (i.e., eliding the last vowel), for example:   
インスタントッ・スープッ

The biggest problem with using kana for a wide variety of languages, aside  
from having a severely limited number of consonants & vowels even with  
extension, is that it doesn't express adjacent non-identical consonants at  
all.

Kana should be quite adequate for some other languages...  Hawaiian?  Oh,  
hmmm, well, except for that darned L/R distinction which kana doesn't have...  


Uh... Never mind...

Rick


Re: Shavian

2001-07-05 Thread [EMAIL PROTECTED]

David Starner - [EMAIL PROTECTED] wrote...

> A lot of the arguments against Klingon weren't specificially against
> Klingon;

That was in WG2, I guess... The most recent discussion material that UTC saw  
is a document I wrote, which is solely about Klingon and reasons for  
rejecting it.

Fictional or invented scripts aren't in and of themselves bad candidates for  
encoding; they should just be, in general, of low priority because, pretty  
much without exception, they are "toys".  Shavian and Deseret are examples of  
scripts that needn't have been encoded now: they aren't very widely used and  
aren't _NEEDED_ by anyone at all, but they were encoded because a while back  
someone happened to have done the work, and the proposals had been sitting  
around gathering dust.  Might as well get them in, because nothing more  
needed to be done to the proposals.

What's "bad" is that work seems to get done on fictional scripts while there  
are still millions of real people (some of whom even have access to  
computers) who can't express texts of their natively-used languages with  
Unicode because we don't have their scripts encoded.  There are various  
reasons for that, the most common being that we can't get enough information  
about them.  The most common reason for not having enough information is that  
we can't shlep enough experts to us, nor shlep enough of us to the experts,  
to complete any encoding proposals... a matter of time and funds.

Rick





Re: GBK, HZ and EUC-TW

2001-01-10 Thread [EMAIL PROTECTED]

> Lars Garshol wrote:
> 
> * Tom Emerson
> | 
> | As far as mapping tables go, the best one you'll find is the
> | Microsoft or ICU mapping tables. I personally have not seen an
> | official mapping table from GB 13000. As others have noted,
> | Microsoft has extended the "pure" GBK with Euro, and perhaps other
> | code points.
> 
> Hmmm. Does this mean that it is best to support the Microsoft
> extensions, or that it is best not to do so?  I guess we will be
> forced to support them sooner or later, and that we might as well do
> it now to save everyone some bother.

As others have already indirectly noted, the problem then is that the Euro  
is "double-defined" within GBK, at code points GB 0x80 and GB 0xA2E3.
Consequently, round-trip conversion between GBK and the Unicode U+20AC
Euro is not possible without some transformation of the code value on the
return trip for one of these two GBK values.

One alternative is to distinguish between the two forms of GBK,
supporting two forms of conversion: one to cp936 and the other to
"pure" GBK.

---

Out of curiosity, what does GB-18030 define for the Euro?  Does it 
define both a single-width and a double-width form?

If so, does it include any reference to how interoperability should 
be handled in conversions with Unicode (or for that matter, any 
character set which defines a single code value for this character)?  

(Lastly, throwing a lighted match onto gasoline...) If two forms are 
specified in GB-18030, should Unicode consider adding another code 
point in the fullwidth variant region to accommodate this?

- Sue