Query on Diacritical Characters in Unicode

2002-01-24 Thread Navijender yatam reddy

Hello all,
I'm implementing a new feature that supports Unicode characters in the UI
for printer drivers, and I have been going through some of the
documents in this regard.
1) What are diacritical characters?
2) In which languages are these characters used?
3) Which fonts support these characters?

YNR
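No reply to these questions appears in this part of the archive. As a minimal, hedged illustration using only Python's standard `unicodedata` module: a diacritic such as the acute accent in "é" can be stored either as one precomposed code point or as a base letter followed by a combining mark (general category Mn), and NFD normalization exposes the second form. The helper names `describe` and `strip_diacritics` are hypothetical. Stripping marks is lossy and only suits loose matching; letters such as ø or ł have no decomposition and pass through unchanged.

```python
import unicodedata


def describe(text):
    """Return (code point, character name, general category) for each code point."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
    ]


def strip_diacritics(text):
    """Hypothetical helper: NFD, drop nonspacing marks (category Mn), recompose with NFC."""
    decomposed = unicodedata.normalize("NFD", text)
    base_only = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", base_only)


precomposed = "\u00E9"  # é as one code point
for row in describe(unicodedata.normalize("NFD", precomposed)):
    print(*row)
# U+0065 LATIN SMALL LETTER E Ll
# U+0301 COMBINING ACUTE ACCENT Mn

print(strip_diacritics("déjà vu"))  # deja vu
```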







Multiple script Handling

2002-01-24 Thread Rajat Bawa



Hi,

I am presently working on a few algorithms to compare two Japanese input
strings and see whether they match.

The problem I am facing is that the input strings can be written in any of
the Japanese scripts (Kanji, Katakana, Hiragana), and two strings written in
different Japanese scripts can mean the same thing.

But when I compare them using the normal Unicode-based APIs (provided by the
ICU library), they come out as different, because these APIs generally
compare strings by their code points, which are obviously different for
different characters.

Are there any solutions for this, in other words, for comparing two Japanese
strings written in different scripts or in a mixture of two scripts?

Regards, 
Rajat Bawa
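A minimal sketch of one partial answer, assuming that only Hiragana/Katakana differences need to be ignored: Katakana U+30A1..U+30F6 lie exactly 0x60 above the corresponding Hiragana, so both strings can be folded to a common key before comparison. This uses Python's standard library rather than ICU, and the function names are hypothetical. It covers only the main Katakana letters (not, for example, the iteration marks or the prolonged sound mark). Matching Kanji against kana needs the word's reading, which requires a dictionary or morphological analyzer and is outside what a code-point mapping can do.

```python
import unicodedata

# Katakana U+30A1..U+30F6 sit exactly 0x60 above Hiragana U+3041..U+3096.
KATAKANA_TO_HIRAGANA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}


def kana_fold(text):
    """Hypothetical key: NFKC first (halfwidth Katakana -> fullwidth), then Katakana -> Hiragana."""
    return unicodedata.normalize("NFKC", text).translate(KATAKANA_TO_HIRAGANA)


def kana_equal(a, b):
    return kana_fold(a) == kana_fold(b)


print(kana_equal("カタカナ", "かたかな"))  # True
print(kana_equal("ｶﾀｶﾅ", "かたかな"))  # True: halfwidth input is widened by NFKC
print(kana_equal("漢字", "かんじ"))  # False: Kanji would need a reading dictionary
```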
 




Re: RE: [Very-OT] Re: ü

2002-01-24 Thread Michael Everson

At 21:11 -0500 2002-01-23, Patrick Andries wrote:

>"In the first edition of this dictionary it was said that in many 
>compounds whose second element begins with h the h is silent unless 
>the accent falls on the syllable that it begins; thus philhellenic 
>and philharmonic should not sound the h; in nihilism also it should 
>be silent. Here too the speak-as-you-spell movement has been at 
>work, and though the COD [Concise Oxford Dictionary] does not favour 
>the pronunciation of the h in these words,

Not so, at least not in the ninth edition, 1998.

>it is in fact often heard

I wouldn't say I'd ever heard these words without the h.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: TC/SC mapping

2002-01-24 Thread Werner LEMBERG

> > This is the kind of mess that has discouraged anybody from doing a
> > systematic survey of simplifications for the Unihan database.
> 
> Part of this is because there is the orthogonal complexity of
> variant TC forms.  Before converting TC to SC, one should resolve
> all TC variants to the most "common" or "standard" TC form (good
> luck deciding what that means).  e.g., in the above case, resolve to
> U+9EBD.

I think that any mapping will fail.  As with so many things involving CJK
characters, the usage depends on constraints beyond a character
encoding: time, location, purpose, etc.  This is the very reason why
CCCII hasn't succeeded.  As a consequence, the available fields are
not enough to really represent the interdependencies correctly.

Either increase the number of available keywords (e.g. kZVariant1,
kZVariant2) to be able to fine-tune the dependencies (something like
`character a in the meaning of b is a variant of character c'), or add
a remark to the description of the keywords saying that the fields can't
be exhaustive, and for which reasons.


Werner




Re: [Very-OT] Re: ü

2002-01-24 Thread Alain LaBonté

At 08:13 2002-01-23 -0500, John Cowan wrote:
>Middle French spelling is very unphonemic.  This is the so-called
>"aspirated h", which still blocks liaison even though it is
>quite silent now.

[Alain]  Not just quite, but absolutely mute; one need not be so shy. We
use the word "aspirated" to distinguish them from all other "mute" h's just
because the h has an effect on pronunciation, but the h itself is never
pronounced in French.

Example of "aspirated" h (they are exceptional anyway) in French: « des
héros » (which means « [some, many] heroes »)... pronounced « day 'ayro »
(which distinguishes the words from « des zéros » (« dayzayro »), which
means « [some, many] zeroes »).

Alain LaBonté
Québec





RE: [Very-OT] Re: ü

2002-01-24 Thread Alain LaBonté

At 16:18 2002-01-23 -0800, Yves Arrouye wrote:
> > >>Obviously (I advocate in French changing the spelling of common foreign
> > >>words so that there would be more consistency).
> > >
> > >Le ouiquende?
> >
> > That would be pronounced "wikãd"... To respect the English pronunciation
> > you would have to write it "ouiquennde", which would still be a very odd
> > spelling in French... The "end" sound is really not French in itself...
>
>France's Académie française is good at that: they recently invented cédérom
>(CD-ROM; gets used because it's quite okay), and mèl (mail, for e-mail;
>nobody uses it except to make fun of it).

[Alain]  Mel is a horrible and hypocritical abbreviation of "Messagerie
électronique" recommended by the French government. It is recommended not
to use it as a noun. However, some people in France who used to say "email"
now say "mel", in spite of the recommendation not to pronounce the
abbreviation.

Québec invented the (French-sounding) word "courriel" (for "courrier
électronique")... It is more and more used in France too.

For one, I must also confess that I personally write the word "cédérom"
(neither the sounds nor the spelling shock a French speaker, while "email"
pronounced "ee-mail" [iméle or imèle] in French is horribly
schizophrenic), although the word will probably disappear over time
[regardless of its spelling], as will the word "microsillon" (33 RPM
records)...

Using generic names (such as "disque" for CD-ROM, which is relatively
technology-independent) was a good evolution in languages (we use one word
for all "tables"; it is distracting to change words just because the shape
changes, if the intent is to describe a function). It seems that nowadays
we put more and more emphasis on technology, on how things are made, rather
than on their purpose (functionality). It is perhaps a sociological
fact that I find interesting to notice.

Alain LaBonté
Québec





Re: Issues with Unicode Hindi

2002-01-24 Thread Michael (michka) Kaplan

From: "Dinesh Agarwal" <[EMAIL PROTECTED]>

> 5. Following suggestions from this list, a Unicode
> Hindi-specific font (Mangal) was obtained (from
> private sources) and installed in Control Panel ->
> Fonts.

Please be aware of the fact that those "private sources" have perhaps
engaged in piracy by giving you this font, which they have no right to
redistribute. This list hosts the company who was the victim of the
piracy -- a big company, to be sure, but do we rob the rich man because he
is rich? (I hope everyone thinks the answer here is no!).

I hope you deleted the font now that you have verified the solution. :-)

> Is Microsoft doing anything to improve their publically
> distributed version of Arial Unicode MS?

Agfa Monotype makes the font, and they presumably charge Microsoft a large
amount of money to allow redistribution in so many of their products.  If
you look at the newer version, there are many improvements, but there is a
LOT of work to do there, so the fact that they are not done yet is perhaps
excusable.

> The UTC should press them to do so.

The Unicode Technical Committee does not generally pressure members to make
decisions on allocation of resources; this would not really be
appropriate.

> Also, does anyone know if the Mangal font is available
> in the public domain?

No, it is a part of Windows 2000 and Windows XP. All you need to do is
upgrade to an OS that supports Hindi input and locale information, and you
will get the font along with it.

> Microsoft should be asked by the UTC to have Mangal
> included with all future versions of Internet Explorer,
> by default, or something to that effect.

Again, this would not be appropriate.


MichKa

Michael Kaplan
Trigeminal Software, Inc.  -- http://www.trigeminal.com/






Re: TC/SC mapping

2002-01-24 Thread DougEwell2

Many have responded:

> Meanwhile, it is true that there are simplified characters which 
> correspond to more than one traditional form.
...
> This is the kind of mess that has discouraged anybody from doing a 
> systematic survey of simplifications for the Unihan database.
...
> Before converting TC to SC, one should resolve all TC variants to
> the most "common" or "standard" TC form (good luck deciding what that
> means).
...
> I think that any mapping will fail.

Thanks to everyone for your input concerning the TC/SC mapping issue.  You 
have confirmed what I already knew, but needed concrete evidence of; namely, 
that mapping between Traditional Chinese and Simplified Chinese is not a 
simple 1-to-1 table lookup problem, but involves lexical analysis and even 
knowledge of the author's intent.

Currently on the IDN mailing list there is a big debate over this topic.  It 
is well known that ASCII-based domain names are matched in the DNS in a 
case-insensitive manner.  Many people recognize that Chinese readers who are 
familiar with both TC and SC consider text written in the two sub-scripts to 
be interchangeable, in roughly the same way that uppercase and lowercase 
Latin are interchangeable.  They would like Chinese domain names written in 
TC to match the "equivalent" name written in SC, just as "UNICODE.ORG" 
matches "unicode.org".

The problem is getting people to understand the scope of the problem.  As you 
have illustrated so well, TC/SC mapping is NOT, in the general case, as 
simple as Latin case mapping.  It requires content analysis, and possibly 
some form of tagging.

Almost all of the list members whose e-mail addresses end in .cn, .tw or .hk 
seem to believe that there is a willful disregard on the part of the working 
group for the needs of Chinese users in this respect.  We have tried to 
convince them that (a) the solution is not as simple as Latin case mapping, 
as many have portrayed it; (b) the problem is not with Unicode Han 
unification, since TC and SC are not unified; (c) content analysis is not 
feasible for domain names; and (d) the entire problem is out of scope of the 
IDN WG.  We have proposed that organizations register both .cn 
and .cn if they want both hits to be successful.  So far, not 
much convincing has taken place.  In the above case, they claim that all 
eight (2^3) possible combinations (e.g. ".cn") would need to be 
registered, which is overkill.
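The "eight (2^3)" figure is a product: a label has one candidate spelling for every combination of forms, so the count is the product of the number of forms available at each position. The sketch below uses hypothetical placeholders ("T1"/"S1" and so on), since the actual characters did not survive in this archive. It shows that a position whose character has no separate TC/SC form contributes a factor of 1, so the count depends on how many characters in the label actually have paired forms.

```python
from itertools import product
from math import prod


def spelling_variants(forms_per_position):
    """Every label obtainable by choosing one form for each character position."""
    return ["".join(choice) for choice in product(*forms_per_position)]


# Hypothetical placeholders: "T1"/"S1" stand for the TC/SC forms of character 1, etc.
all_paired = [("T1", "S1"), ("T2", "S2"), ("T3", "S3")]
one_shared = [("T1", "S1"), ("X2",), ("T3", "S3")]  # position 2 has a single form

print(len(spelling_variants(all_paired)), prod(len(f) for f in all_paired))  # 8 8
print(len(spelling_variants(one_shared)))  # 4
```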

One list member has even proposed the prohibition of all CJK code points from 
internationalized domain names "until the problem can be solved," and he has 
the support of several others.  It is obvious that this is an attempt to 
hijack the entire IDN model by claiming "it does not support Chinese at all," 
which would certainly be true if Han characters were prohibited, and imposing 
a locally-constructed, Chinese-specific (i.e. not universal) model later on.

Unfortunately, as an American who does not speak or read Chinese, I have been 
in a poor position to argue with these people about their own written 
language.  So I relied on the combined expertise of the Unicode list, 
including native speakers and people with doctorates in Chinese, for 
background information.  Thanks again for your help.

-Doug Ewell
 Fullerton, California




Re: TC/SC mapping

2002-01-24 Thread John H. Jenkins


On Thursday, January 24, 2002, at 09:39 AM, [EMAIL PROTECTED] wrote:

>
> Currently on the IDN mailing list there is a big debate over this topic.
> It is well known that ASCII-based domain names are matched in the DNS in a
> case-insensitive manner.  Many people recognize that Chinese readers who
> are familiar with both TC and SC consider text written in the two
> sub-scripts to be interchangeable, in roughly the same way that uppercase
> and lowercase Latin are interchangeable.  They would like Chinese domain
> names written in TC to match the "equivalent" name written in SC, just as
> "UNICODE.ORG" matches "unicode.org".
>

Actually, this is more like asking "honor" and "honour" to match.

> Almost all of the list members whose e-mail addresses end in .cn, .tw or
> .hk seem to believe that there is a willful disregard on the part of the
> working group for the needs of Chinese users in this respect.  We have
> tried to convince them that (a) the solution is not as simple as Latin
> case mapping, as many have portrayed it; (b) the problem is not with
> Unicode Han unification, since TC and SC are not unified; (c) content
> analysis is not feasible for domain names; and (d) the entire problem is
> out of scope of the IDN WG.  We have proposed that organizations register
> both .cn and .cn if they want both hits to be successful.  So far, not
> much convincing has taken place.  In the above case, they claim that all
> eight (2^3) possible combinations (e.g. ".cn") would need to be
> registered, which is overkill.
>

The bulk of Han ideographs don't occur in TC/SC pairs, so this is specious.
That is, to register the equivalent of "unicode.org", you only need two
registrations, "<78BC>.org" (TC) and ".org" (SC).  You don't need eight
registrations.

Meanwhile, I'd like to offer a suggestion:

*If* they can live with one caveat, and *if* they can give us time to 
clean up our SC/TC mapping data, we could do the following:

1) SC/TC matching on Unicode data is only to be done on the SC/TC mapping 
data supplied by UTC.

2) Wherever a single SC character matches multiple TC characters, all the
characters are to be treated the same.

This means, for example, that U+53F0 (台) will be treated the same as 
U+6AAF (檯), U+81FA (臺), and U+98B1 (颱).  This also means, of course, that 
U+6AAF, U+81FA, and U+98B1 will end up being indistinguishable even in 
purely TC names.

3) This includes Unicode compatibility mappings.  (Thereby reducing a lot 
of turtles, if nothing else.)
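A minimal sketch of rules 1 to 3 as a matching key, assuming a hypothetical `SC_TO_TC` table that contains only the 台 example from this message (a real UTC-supplied table would be far larger). Python's `unicodedata.normalize("NFKC", ...)` stands in for "Unicode compatibility mappings"; it maps CJK compatibility ideographs and Kangxi radicals to unified ideographs. In line with the caveat that follows, this is a first-order equivalence and deliberately makes 檯, 臺 and 颱 indistinguishable.

```python
import unicodedata

# Hypothetical excerpt of a UTC-supplied SC -> TC table (rule 1). It holds only
# the example from this message; a real table would be much larger.
SC_TO_TC = {
    "\u53F0": ["\u53F0", "\u6AAF", "\u81FA", "\u98B1"],  # 台 -> 台, 檯, 臺, 颱
}

# Rule 2: every TC character listed under an SC character folds to that SC character.
TC_TO_KEY = {tc: sc for sc, tcs in SC_TO_TC.items() for tc in tcs}


def han_fold(label):
    """First-order matching key; NOT a lexically correct TC/SC conversion."""
    # Rule 3: apply Unicode compatibility mappings first. NFKC maps CJK
    # compatibility ideographs and Kangxi radicals to unified ideographs.
    label = unicodedata.normalize("NFKC", label)
    return "".join(TC_TO_KEY.get(ch, ch) for ch in label)


print(han_fold("臺北") == han_fold("台北"))  # True
print(han_fold("\u6AAF") == han_fold("\u98B1"))  # True: 檯 and 颱 now collide
```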

The caveat is that this must be understood to be a first-order, 
computer-appropriate equivalence and is not in any way to be held to be a 
generalized solution to the lexically appropriate conversion between SC 
and TC.  It also has to be understood that some things are going to slip 
through because it is not a generalized solution to Han normalization.  
Lexically inappropriate matches will take place!

(Maybe we should refer to *zhengguihua* instead of "Han normalization"…)

It also means that some desired matches won't happen, and some things can 
be "spoofed" by these nasty variant issues such as came up yesterday.  
U+9EBC and U+9EBD aren't likely to both match U+4E48.

However, this is already a problem in Unicode.  "shuowen.org" will have to 
register both ".org" and ".org"; Jingwa, 
Inc., will need both "" and "".

OK, so this is more than one caveat.  It will also mean that we will no 
longer be able to accept both the TC and SC form for a character as a 
candidate for separate encoding in the future, and future compatibility 
ideographs will be excluded from use in IDN.  (Actually, you could save 
yourself some grief right off by excluding Han radicals and all 
compatibility ideographs.)

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/





FW: Unicode Tibetan

2002-01-24 Thread Magda Danish (Unicode)



-Original Message-
From: Robin Sackmann [mailto:[EMAIL PROTECTED]] 
Sent: Tuesday, January 22, 2002 4:24 AM
To: [EMAIL PROTECTED]
Subject: Unicode Tibetan


Dear Sirs,

When I try to use the Tibetan characters contained in the Arial Unicode MS
font in Microsoft Word 2002, the vowel signs do not appear above (or below)
the base character, as they should, but in the next character position.

Example: the base character "BA" (0F56) cannot be combined with the vowel
sign "O" (0F7C) in the proper way. (The result should actually be a single
glyph representing the syllable "BO".)

I would be grateful for any information on how this problem can be
solved.

With many thanks and best regards,

Robin Sackmann
___

 Robin Sackmann, M.A.
 Research and Teaching Assistant
 FREIE UNIVERSITAET BERLIN
 Berlin, Germany

 [EMAIL PROTECTED]
 office +49-30-838-51373
 secretary  +49-30-838-54011 
 home   +49-30-621 46 49
 www.germanistik.fu-berlin.de/il ___





FW: Running out of options...

2002-01-24 Thread Magda Danish (Unicode)



-Original Message-
From: Whit Gurley [mailto:[EMAIL PROTECTED]] 
Sent: Wednesday, January 23, 2002 8:47 AM
To: [EMAIL PROTECTED]
Subject: Running out of options...


Hello!

[...]

Now, I understand how ASCII text encoding works and therefore have a 
pretty good understanding of how Unicode builds upon that concept in 
order to create additional, multi-lingual character sets, but I am 
having the worst time trying to figure out how to make it actually 
happen. I realize that it may have to do with the fact that I'm on a 
Mac, but the unicode.org site seems to think that the Mac can be made 
Unicode-ready with system software. I've tried following your 
installation instructions without success - I still can't see Farsi 
text on your Weblog sites, and the sample code that I download still
shows up as a series of question marks ("?") in my text editor
(BBEdit 6.5).

[...]

So I guess I have two questions:

1. Do you know if it's possible for me, on a Mac with OS 9 and 
BBEdit, to create a web setup that would allow my boss to add Unicode 
Farsi content? What tools would he need to create that content?

2. [...]


I think that's it. Thank you very much for your time!
_
w h i t   g u r l e y
Art Director,
Softeon, Inc.

p :: 415 948 4028
e :: [EMAIL PROTECTED]
w :: http://www.softeon.com
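The thread does not record an answer, and what follows does not cover Mac OS 9 or BBEdit specifics. As a hedged sketch in Python of the two things any such setup has to get right: the file's bytes must really be UTF-8, and the page must declare that charset so the browser decodes it correctly. The file location, title and sample word are hypothetical. Even with both in place, the reader's browser still needs a font that covers the Arabic script before Farsi text will display.

```python
import tempfile
from html import escape
from pathlib import Path

# Hypothetical sample text: "سلام" is Persian for "hello".
FARSI_SAMPLE = "سلام"

PAGE = """<html dir="rtl" lang="fa">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>{title}</title>
</head>
<body><p>{body}</p></body>
</html>
"""


def write_utf8_page(path, title, body):
    """Write the page as UTF-8 bytes and declare that charset inside the page."""
    page = PAGE.format(title=escape(title), body=escape(body))
    Path(path).write_text(page, encoding="utf-8")
    return page


out_file = Path(tempfile.gettempdir()) / "farsi_sample.html"  # hypothetical location
write_utf8_page(out_file, "Farsi test", FARSI_SAMPLE)
```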




Re: TC/SC mapping

2002-01-24 Thread Thomas Chan

On Thu, 24 Jan 2002, John H. Jenkins wrote:

> However, this is already a problem in Unicode.  "shuowen.org" will have to 
> register both ".org" and ".org"; Jingwa, 
> Inc., will need both "" and "".

U+8AAA and U+8AAC are given on p. 265 of TUS3.0 as an example of what
would have been unified had it not been for source separation.  Is it
possible to acquire data on other z-variants?  The kZVariant fields do not
seem to contain exactly that data.  Had that example not been pointed out,
I wouldn't have known that both were encoded.


Thomas Chan
[EMAIL PROTECTED]






Re: Issues with Unicode Hindi

2002-01-24 Thread John Hudson

At 21:32 1/23/2002, Dinesh Agarwal wrote:

>Is Microsoft doing anything to improve their publically distributed 
>version of Arial Unicode MS? The UTC should press them to do so. Also, 
>does anyone know if the Mangal font is available in the public domain? 
>Microsoft should be asked by the UTC to have Mangal included with all 
>future versions of Internet Explorer, by default, or something to that effect.

It is not the UTC's job to petition members to provide specific support to 
users of older operating system versions. Microsoft distribute Mangal with 
Windows 2000 and Windows XP, both of which operating systems provide 
extensive Indic script support. Obviously they view support for Indic 
scripts as an incentive to upgrade to these newer versions of the OS, and 
Mangal and other Indic fonts are part of that incentive.

With regard to enhancing the Arial Unicode font, Monotype were scheduled to 
make a presentation on this subject at the Unicode conference in San Jose 
in September. I didn't make it to the conference, because my flight was 
cancelled due to the WTC attack, and I don't know whether this presentation 
took place. Anyone?

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]

... es ist ein unwiederbringliches Bild der Vergangenheit,
das mit jeder Gegenwart zu verschwinden droht, die sich
nicht in ihm gemeint erkannte.

... every image of the past that is not recognized by the
present as one of its own concerns threatens to disappear
irretrievably.
   Walter Benjamin





[OT] Rich man Bill (RE: Issues with Unicode Hindi)

2002-01-24 Thread Marco Cimarosti

> [...] but do we rob the rich man because he is rich?
> (I hope everyone thinks the answer here is no!).

I have been told that this mailing list is not for politics, so let's stick
to bare logics:

1. "we rob the poor man because he is poor"
2. "we rob the poor man because he is rich"
3. "we rob the rich man because he is poor"
4. "we rob the rich man because he is rich"

Do you agree that 2 and 3 are self-contradictory? So the choice is between 1
and 4...

_ Marco




Maya numerals

2002-01-24 Thread jarkko . hietaniemi

Hi,

I was refreshing my memory about Maya numerals: http://www.michielb.nl/maya/math.html
and started thinking about how one would do these in Unicode, given the funny "stacking".

NOTE: the following is most definitely not an encoding suggestion/request for
the Maya numerals, firstly because I'm not a Maya expert, and secondly because
[EMAIL PROTECTED] would hardly be the right place to ask.

Am I correct in assuming that the following is how it would work:

MAYA DIGIT ZERO
MAYA DIGIT ONE
MAYA DIGIT TWO
MAYA DIGIT THREE
MAYA DIGIT FOUR
MAYA DIGIT FIVE
MAYA DIGIT ONE COMBINING
MAYA DIGIT TWO COMBINING
MAYA DIGIT THREE COMBINING
MAYA DIGIT FOUR COMBINING
MAYA DIGIT FIVE COMBINING

and, for example, 12 would be

MAYA DIGIT FIVE + MAYA DIGIT FIVE COMBINING + MAYA DIGIT TWO COMBINING

or would it work to have just

MAYA DIGIT ZERO
MAYA DIGIT ONE COMBINING
MAYA DIGIT TWO COMBINING
MAYA DIGIT THREE COMBINING
MAYA DIGIT FOUR COMBINING
MAYA DIGIT FIVE COMBINING

and make the combiners combine with U+0020 (or U+00A0)?
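Whatever the encoding model, the arithmetic behind the notation is place-value base 20, with each digit 0..19 drawn as bars (5 each) and dots (1 each), and zero as a shell glyph. A minimal Python sketch, assuming a pure base-20 place value (calendrical Long Count dates use 18 instead of 20 in the third place, which this ignores); the function names are hypothetical. It reproduces the decomposition above: 12 is two bars and two dots.

```python
def to_maya(n):
    """Base-20 digits of n, most significant first (pure place value, assumed)."""
    if n < 0:
        raise ValueError("only non-negative integers are handled")
    digits = [n % 20]
    n //= 20
    while n:
        digits.append(n % 20)
        n //= 20
    return digits[::-1]


def bars_and_dots(digit):
    """Split one digit 0..19 into (bars, dots); (0, 0) is drawn as the shell for zero."""
    return divmod(digit, 5)


for n in (12, 20, 365):
    print(n, [bars_and_dots(d) for d in to_maya(n)])
# 12 [(2, 2)]
# 20 [(0, 1), (0, 0)]
# 365 [(3, 3), (1, 0)]
```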










Re: Unicode Tibetan

2002-01-24 Thread Michael (michka) Kaplan

This is because Arial Unicode MS does not have any of the OpenType
information that would be necessary for the shaping in the Tibetan script. I
do not know of any fonts that currently have this information, but I assume
it is only a matter of time before there are some.
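A quick Python check (standard `unicodedata` only) is consistent with this: the text is already correctly encoded as two code points, a letter followed by a nonspacing combining mark, and Unicode has no precomposed "BO" for normalization to produce. Placing the vowel sign relative to BA is therefore the job of the font's OpenType tables and the shaping engine, not of the encoding.

```python
import unicodedata

bo = "\u0F56\u0F7C"  # TIBETAN LETTER BA + TIBETAN VOWEL SIGN O, the syllable "BO"

for ch in bo:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
# U+0F56 TIBETAN LETTER BA Lo
# U+0F7C TIBETAN VOWEL SIGN O Mn

# There is no precomposed "BO" code point, so normalization leaves the pair alone:
print(unicodedata.normalize("NFC", bo) == bo)  # True
```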


MichKa

Michael Kaplan
Trigeminal Software, Inc.  -- http://www.trigeminal.com/






RE: TC/SC mapping

2002-01-24 Thread Marco Cimarosti

Doug Ewell wrote:
> Currently on the IDN mailing list there is a big debate over 
> this topic.  It is well known that ASCII-based domain names
> are matched in the DNS in a case-insensitive manner.  Many
> people recognize that Chinese readers who are familiar with
> both TC and SC consider text written in the two sub-scripts
> to be interchangeable, in roughly the same way that
> uppercase and lowercase Latin are interchangeable.

Converting TC to SC is difficult, and the opposite is nearly impossible. But
a simple "loose match" like the one you describe does not seem so difficult.

On the other hand, outside the English-only realm, converting uppercase to
lowercase is also difficult, and the opposite is nearly impossible. But
simple case folding is not so difficult.

Here it is simply a matter of putting together all the groups of ideographs
that may be considered variants of each other (not only SC and TC, but also
Japanese simplifications, semantic variants, "specialized semantic
variants", compatibility equivalents, radicals, etc.), and to map them
*internally* to a single key (e.g., the lowest code point in the group).

You don't even need to care whether the result is TC, SC, or a horrible mix
of the two: nobody is supposed to see it anyway.

Of course there are security concerns. The conversion must be
well-defined and must not change over time. And, of course, domain names
should be registered in their "folded" version.

_ Marco
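A minimal sketch of the internal key Marco describes, under stated assumptions: the variant groups below are hypothetical (built from characters mentioned in this thread), and a union-find merges groups that share a member so that every character maps to the lowest code point in its merged group. Because the key depends on the exact group data, the table would have to be frozen once published, in line with the stability concern above.

```python
def build_fold_table(variant_groups):
    """Merge overlapping variant groups; map each member to its group's lowest code point."""
    parent = {}

    def find(cp):
        parent.setdefault(cp, cp)
        while parent[cp] != cp:
            parent[cp] = parent[parent[cp]]  # path halving
            cp = parent[cp]
        return cp

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # The smaller root wins, so each root is the minimum of its set.
            parent[max(ra, rb)] = min(ra, rb)

    for group in variant_groups:
        first = ord(group[0])
        for ch in group[1:]:
            union(first, ord(ch))

    return {chr(cp): chr(find(cp)) for cp in list(parent)}


def fold(label, table):
    """Internal matching key; never shown to users."""
    return "".join(table.get(ch, ch) for ch in label)


# Hypothetical variant groups, as if collected from different data sources.
groups = [
    ["\u53F0", "\u81FA"],            # 台 / 臺 (SC/TC)
    ["\u81FA", "\u6AAF", "\u98B1"],  # 臺 / 檯 / 颱
    ["\u8AAA", "\u8AAC"],            # 說 / 説 (z-variants)
]
table = build_fold_table(groups)
print(fold("\u98B1\u8AAC", table) == "\u53F0\u8AAA")  # True
```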




Re: [OT] Rich man Bill (RE: Issues with Unicode Hindi)

2002-01-24 Thread Michael (michka) Kaplan

We rob NO ONE. We behave with honor and we wish others to do the same with
us.

It's a respect thing.


MichKa

Michael Kaplan
Trigeminal Software, Inc.  -- http://www.trigeminal.com/






RE: [OT] Rich man Bill (RE: Issues with Unicode Hindi)

2002-01-24 Thread Rick Cameron

Hi, Michael

By 'we', do you mean Microsoft? If so, I'm surprised you identify yourself
so closely with that corporation - I thought you were an independent
contractor...

As for the question of whether Microsoft has robbed anyone, I would say that
there are a couple of cases before the courts that bear on that very
question - and that Microsoft hasn't been faring very well in establishing
its innocence! ;^)

Thanks

- rick cameron






Re: TC/SC mapping

2002-01-24 Thread John H. Jenkins


On Thursday, January 24, 2002, at 11:44 AM, Thomas Chan wrote:

> On Thu, 24 Jan 2002, John H. Jenkins wrote:
>
>> However, this is already a problem in Unicode.  "shuowen.org" will have
>> to register both ".org" and ".org"; Jingwa,
>> Inc., will need both "" and "".
>
> U+8AAA and U+8AAC are given on p. 265 of TUS3.0 as an example of what
> would have been unified had it not been for source separation.  Is it
> possible to acquire data on other z-variants?

Er, no.

> The kZVariant fields do not
> seem to contain exactly that data.

Nope.  As with the SC/TC problem from yesterday, this is just too messy 
for anyone to have found the time to do it properly.  The bulk of the 
kZVariant data we have right now is largely derived from the CCCII mapping 
data.

This is something we're going to ask WG2 to tell the IRG to do.

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/





Re: [OT] Rich man Bill (RE: Issues with Unicode Hindi)

2002-01-24 Thread John Hudson

At 10:44 1/24/2002, Marco Cimarosti wrote:

>I have been told that this mailing list is not for politics, so let's stick
>to bare logics:
>
>1. "we rob the poor man because he is poor"
>2. "we rob the poor man because he is rich"
>3. "we rob the rich man because he is poor"
>4. "we rob the rich man because he is rich"
>
>Do you agree that 2 and 3 are self-contradictory? So the choice is between 1
>and 4...

5. "we try not knowingly to rob anyone"

JH






Re: [OT] Rich man Bill (RE: Issues with Unicode Hindi)

2002-01-24 Thread Michael Everson

>From: "Marco Cimarosti" <[EMAIL PROTECTED]>
>>  I have been told that this mailing list is not for politics, so let's
>>  stick to bare logics:
>>
>>  1. "we rob the poor man because he is poor"
>>  2. "we rob the poor man because he is rich"
>>  3. "we rob the rich man because he is poor"
>>  4. "we rob the rich man because he is rich"
>>
>>  Do you agree that 2 and 3 are self-contradictory?

They remind me of the Gospel of Thomas. But in texts like that it is 
all metaphor.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Maya numerals

2002-01-24 Thread Michael Everson

I would encode Mayan using the same model we will use for Egyptian.
The individual characters cluster into groups much as Egyptian ones do,
though the font rendering is much more complex.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: [OT] Rich man Bill (RE: Issues with Unicode Hindi)

2002-01-24 Thread Michael (michka) Kaplan

From: "Rick Cameron" <[EMAIL PROTECTED]>

> By 'we', do you mean Microsoft?

No, I mean "we" as in we, the members of the human race on this spaceship
Earth. WE behave appropriately as that is what a good person does. If "we"
act badly, then "we" should be ashamed and keep it to ourselves.

> If so, I'm surprised you identify yourself so closely
> with that corporation - I thought you were an
> independent contractor...

I am. So I guess you do not need to be surprised.

> As for the question of whether Microsoft has robbed anyone, I
> would say that there are a couple of cases before the courts
> that bear on that very question - and that Microsoft hasn't been
> faring very well in establishing its innocence! ;^)

As the queen mother once said, "we are not amused."

It is in bad taste to claim that we may rob someone else who we believe has
robbed others, is it not?

In any case, this has nothing to do with the conversation here. May I call
upon Sarasvati to remind the participants of the purposes of this forum?
Unless it has suddenly become an extension of the federal court system in
the US? ;-)


MichKa

Michael Kaplan
Trigeminal Software, Inc.  -- http://www.trigeminal.com/





RE: [OT] Rich man Bill (RE: Issues with Unicode Hindi)

2002-01-24 Thread Rick Cameron

Hi, MichKa et al.

I certainly didn't intend to suggest that it's OK to rob someone (or
something) if you believe they (or it) have robbed others. My statement was
predicated on the erroneous assumption that you, MichKa, were asserting that
Microsoft has not robbed anyone; and I just wanted to point out that there's
room for discussion of such an assertion ;^)

I'm sorry I misunderstood what you meant by "we" - but I think it wasn't
very clear! (A distressingly common problem in e-mail...)

Cheers

- rick cameron





Re: Maya numerals

2002-01-24 Thread Rick McGowan

I would just encode the 20 numerals. However, nobody has yet come up with  
a comprehensive proposal, so I would defer any discussion to the point at  
which some expert(s) have an opinion about the script in general.
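
Encoding just the 20 numerals works because the system is positional: with one character per digit value 0 to 19, any count is a sequence of such digits. A minimal Python sketch, assuming pure base-20 place values (the Long Count calendar's 18 x 20 third place is deliberately ignored); the helper name is hypothetical:

```python
def to_vigesimal_digits(n):
    """Split a non-negative integer into base-20 digit values, most significant first.

    Assumes pure vigesimal place values; calendrical variants are not handled.
    """
    if n < 0:
        raise ValueError("only non-negative counts are supported")
    if n == 0:
        return [0]
    digits = []
    while n:
        n, digit = divmod(n, 20)
        digits.append(digit)
    return digits[::-1]
```

For example, 365 comes out as [18, 5]; rendering it would then be a lookup from each digit value to one of 20 numeral characters.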

Rick




RE: [OT] Rich man Bill (RE: Issues with Unicode Hindi)

2002-01-24 Thread Hohberger, Clive

Anyone who's ever gone to Business School knows that the pragmatically
correct answer is #4. It's called "The Robin Hood Strategy: Steal only from
the rich!"  Who ever made any money stealing from the poor?
Clive



-----Original Message-----
From: Michael Everson [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 24, 2002 1:18 PM
To: [EMAIL PROTECTED]
Subject: Re: [OT] Rich man Bill (RE: Issues with Unicode Hindi)
Importance: Low


>From: "Marco Cimarosti" <[EMAIL PROTECTED]>
>> I have been told that this mailing list is not for politics, so let's
>> stick to bare logic:
>>
>>  1. "we rob the poor man because he is poor"
>>  2. "we rob the poor man because he is rich"
>>  3. "we rob the rich man because he is poor"
>>  4. "we rob the rich man because he is rich"
>>
>  > Do you agree that 2 and 3 are self-contradictory?

They remind me of the Gospel of Thomas. But in texts like that it is 
all metaphor.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: TC/SC mapping

2002-01-24 Thread John H. Jenkins


On Thursday, January 24, 2002, at 12:29 PM, John Cowan wrote:

> John H. Jenkins wrote:
>
> {TC1, SC1, SC2, TC2, TC3, SC3} constitute a "Han simplification
> class" (HSC), and are all the same when appearing in IDNs.
>
> Correct?
>

Yes.

>
>> The caveat is that this must be understood to be a first-order, 
>> computer-appropriate equivalence and is not in any way to be held to be 
>> a generalized solution to the lexically appropriate conversion between 
>> SC and TC.
>
>
> Is there any danger that these classes will turn out to be a
> "small world", in the sense that we wind up with a few huge classes
> which include almost all the characters?
>

Nope.

>> (Maybe we should refer to *zhengguihua* instead of "Han normalization"…)
>
>
> Can you explain the joke?
>

It's just to make Ken happy.  He doesn't like me talking about "Han 
normalization," since "normalization" is Unicodespeak for something else.  
"Zhengguihua" is Mandarin for "normalization."

>> It will also mean that we will no longer be able to accept both the TC 
>> and SC form for a character as a candidate for separate encoding in the 
>> future,
>
>
> I don't understand this part.  Since this is neither compatibility nor
> canonical equivalence, it will not affect any of the known normalization
> forms.  Nor are we defining a new normalization form here, since in
> HSCs like the above there is no particular reason to pick any of the
> six characters as *the* normalized form, although by convention we can
> pick one -- say, the one with the smallest Unicode scalar
> value, or the one which appears in the largest number of legacy
> sets -- to aid in description and implementation.
>
> It's just another of those sets of equivalence classes provided for
> special purposes, like the Arabic/Syriac shaping classes or the
> canonical combining classes.
>

Well, first of all, the UTC is already on record as refusing to encode new 
SC separately.

Secondly, we would break IDN equivalence.  If we add a new SC which is 
equivalent to two TC, then suddenly domains which could be distinguished 
on the basis of the old TC pair can't any more.
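
The equivalence-class idea discussed above can be sketched with a union-find structure in Python. This is an illustration under assumptions, not UTC data: the TC/SC pairs (發 and 髮 to 发, 乾 and 幹 to 干) are familiar examples standing in for a real mapping table, the class name is hypothetical, and each class's representative is its smallest code point, one of the conventions John Cowan suggests.

```python
class HanSimplificationClasses:
    """Union-find over code points; every class is rooted at its smallest code point."""

    def __init__(self):
        self.parent = {}

    def _find(self, cp):
        self.parent.setdefault(cp, cp)
        root = cp
        while self.parent[root] != root:
            root = self.parent[root]
        while cp != root:  # path compression
            nxt = self.parent[cp]
            self.parent[cp] = root
            cp = nxt
        return root

    def add_pair(self, tc, sc):
        """Record that traditional form tc simplifies to sc (merging their classes)."""
        a, b = self._find(ord(tc)), self._find(ord(sc))
        if a != b:
            # Keep the smaller root so the representative stays the minimum code point.
            self.parent[max(a, b)] = min(a, b)

    def representative(self, ch):
        return chr(self._find(ord(ch)))

    def same_class(self, x, y):
        return self._find(ord(x)) == self._find(ord(y))


hsc = HanSimplificationClasses()
for tc, sc in [("發", "发"), ("髮", "发"), ("乾", "干"), ("幹", "干")]:
    hsc.add_pair(tc, sc)
```

The IDN concern falls out of the structure: a single new pairing that links two existing classes merges them, so labels that differed only in those characters stop being distinguishable.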

> Or are you saying that this new information should be represented
> as a Unicode compatibility equivalence?  If so, that would
> wreak havoc with existing NFC and NFKC code.
>

No,

>> (Actually, you could save yourself some grief right off by excluding Han 
>> radicals and all compatibility ideographs.)
>
> This would be a Bad Thing in Korean, though, because the whole point
> of Korean compatibility ideographs is to preserve differences in
> reading.  Or are ideographs not used in (modern) Korean names?
>

These compatibility ideographs are *not* to provide phonetic-specific 
distinctions between various Korean hanja.  They're for compatibility with 
an older standard only, which did make that distinction.  IMHO it would be 
more confusing to Chinese, Japanese, *and* Korean readers to have some 
domain names distinguished when the only thing different about them is
the Korean pronunciation of the hanja used to write them.
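
Most CJK compatibility ideographs have singleton canonical decompositions, so NFC already folds them onto the corresponding unified ideograph; twelve in the range U+FA0E to U+FA29 are actually unified ideographs and do not decompose. A small Python sketch (the helper name is hypothetical) that lists the compatibility ideographs in a string along with their NFC form:

```python
import unicodedata


def compatibility_ideographs(text):
    """Return (character, NFC form) for each CJK compatibility ideograph in text."""
    found = []
    for ch in text:
        # Character names in both compatibility blocks share this prefix.
        if unicodedata.name(ch, "").startswith("CJK COMPATIBILITY IDEOGRAPH"):
            found.append((ch, unicodedata.normalize("NFC", ch)))
    return found


print([(hex(ord(a)), hex(ord(b))) for a, b in compatibility_ideographs("\uF900\u4E00\uFA0E")])
# prints [('0xf900', '0x8c48'), ('0xfa0e', '0xfa0e')]
```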

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/





Vedic Characters

2002-01-24 Thread Dinesh Agarwal

Can someone tell me what the current status of encoding Sanskrit (Vedic) 
characters in the UCS is? Have any formal proposals been made yet or are any 
being planned? Also, what about the alternate shapes for certain Devanagari 
numerals like 9 that are mentioned at: 
http://www.evertype.com/standards/iso10646/pdf/vedic/

Thanks,
Dinesh Agarwal

_
MSN Photos is the easiest way to share and print your photos: 
http://photos.msn.com/support/worldwide.aspx





RE: Maya numerals

2002-01-24 Thread jarkko . hietaniemi

Yup, encoding the 20 numerals would probably be the easiest solution.






Re: TC/SC mapping

2002-01-24 Thread John Cowan

John H. Jenkins wrote:


> Well, first of all, the UTC is already on record as refusing to encode 
> new SC separately.
> 
> Secondly, we would break IDN equivalence.  If we add a new SC which is 
> equivalent to two TC,


By your previous graf, that can't happen; so it must be adding a new TC
(off a newly dug-up bone, perhaps) which simplifies to two different
SCs.  Fair enough.

-- 
Not to perambulate || John Cowan <[EMAIL PROTECTED]>
the corridors   || http://www.reutershealth.com
during the hours of repose || http://www.ccil.org/~cowan
in the boots of ascension.  \\ Sign in Austrian ski-resort hotel





Re: Issues with Unicode Hindi

2002-01-24 Thread Dinesh Agarwal

>Please be aware of the fact that those "private sources" have perhaps
>engaged in piracy by giving you this font, which they have no right to
>redistribute. This list hosts the company who was the victim of the
>piracy -- a big company, to be sure, but do we rob the rich man because he
>is rich? (I hope everyone thinks the answer here is no!).
>
>I hope you deleted the font now that you have verified the solution. :-)

The "private sources" that I mentioned were just this website: 
http://www.geocities.com/fontmagicus/index.html
I knew I wasn't supposed to download the font, since I don't own Windows XP 
or 2000, as the site requires. But I needed to conduct the test, and I have 
now deleted the font.

Regards,
Dinesh Agarwal


>From: "Michael \(michka\) Kaplan" <[EMAIL PROTECTED]>
>To: "Dinesh Agarwal" <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>
>Subject: Re: Issues with Unicode Hindi
>Date: Thu, 24 Jan 2002 08:42:17 -0800
>
>









Wade -> Pinyin transliteration (Unihan ?)

2002-01-24 Thread Patrick Andries


Let's assume I want to "transliterate" a large Wade-Giles database into 
pinyin. Is this a purely algorithmic process? For all nouns, common and 
proper (cf. Chiang Kai-Shek vs. Jiang Jieshi)? Even for "dialectal" words?

Would any data in the Unicode database help me in this process?

Patrick Andries






[Fwd: RE: FrameMaker+SGML 6.0, InDesign and Unicode]

2002-01-24 Thread Patrick Andries



Maurice Bauhahn wrote :

>No, FrameMaker 6.0 does not support Unicode. 

So if I understand correctly, for FrameMaker+SGML 6.0 to support Chinese I need a Chinese OS?
And to support both French (spelling, hyphenation) and Chinese, can I use a French 
FrameMaker+SGML on a Chinese OS? 

>This is a sore point with me.
>Discussions with Adobe folk at an ATYPI conference a little over a year ago
>did not leave me with any hope that Adobe would start supporting Unicode in
>that product either. It is dangerous to own the second priority product when
>a company publishes two products of similar genre...and especially so when
>the second priority product is bought in. The worst thing that happened to
>FrameMaker was to be bought by Adobe;-( 

Does anyone know whether there is an upgrade path from FrameMaker+SGML to something 
else, let's say InDesign (through XML/RTF documents)?
Would such a move make technical sense (can InDesign handle documents of thousands of 
pages, with the same features as FrameMaker?) or strategic sense (how is FrameMaker 
evolving(*))?

How good is InDesign's Unicode support?


Patrick Andries
 

(*) The Adobe link "FrameMaker is alive and well!" leads to a "File not Found" page. An 
ominous sign?
http://www.adobe.com/products/framemaker/prodinfosgml.html then click to go to 
http://www.adobe.com/products/framemaker/fmkrelease.html






Re: Wade -> Pinyin transliteration (Unihan ?)

2002-01-24 Thread Kenneth Whistler

Patrick,

> Let's assume I want to "transliterate" a large Wade-Giles database into 
> pinyin. Is this a purely algorithmic process? For all nouns, common and 
> proper (cf. Chiang Kai-Shek vs. Jiang Jieshi)? Even for "dialectal" words?

Mostly, but not completely. (Note that "Kai-Shek" is not a standard Mandarin
name, and won't convert to pinyin. The Wade-Giles for pinyin "Jieshi"
would be "Chieh-Shih".)

I suggest you visit the Library of Congress Pinyin Conversion Project
page:

http://www.loc.gov/catdir/pinyin/

and then contact them regarding their process and issues.

In particular, Wade-Giles romanization practices are not all
consistent, so you are bound to run into edge cases.
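
The "mostly, but not completely" answer can be illustrated with a table-driven sketch in Python. The tiny syllable table is a hypothetical stand-in for a full Wade-Giles to pinyin table; the sketch ignores aspiration apostrophes, tone marks and umlauts, and reports syllables it cannot map instead of guessing. A non-Mandarin form such as "shek" would not appear even in a complete table, so it lands in the unmapped list.

```python
# Hypothetical, deliberately tiny table; a real converter needs the full syllabary.
WG_TO_PINYIN = {
    "chiang": "jiang",
    "chieh": "jie",
    "shih": "shi",
    "mao": "mao",
    "tse": "ze",
    "tung": "dong",
}


def wade_to_pinyin(name):
    """Convert a Wade-Giles name syllable by syllable.

    Returns (converted_name, unmapped_syllables). Hyphenated syllables are
    joined into one word, following the usual pinyin convention for given names.
    """
    unmapped = []
    words = []
    for word in name.split():
        parts = []
        for syllable in word.split("-"):
            key = syllable.lower()
            if key in WG_TO_PINYIN:
                parts.append(WG_TO_PINYIN[key])
            else:
                unmapped.append(syllable)
                parts.append(key)  # left as-is for manual review
        words.append("".join(parts).capitalize())
    return " ".join(words), unmapped
```

Here "Chiang Chieh-shih" converts cleanly to "Jiang Jieshi", while "Chiang Kai-shek" comes back with "Kai" and "shek" flagged for a human to resolve.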

> 
> Would any data in the Unicode database help me in this process?

I don't think so.

--Ken

> 
> Patrick Andries




Unicode 3.2: BETA files updated

2002-01-24 Thread Kenneth Whistler

Unicoders:

The Unicode 3.2 BETA directory has been updated again,
to complete filling some last minute gaps. In particular:

ftp://www.unicode.org/Public/BETA/Unicode3.2/BidiMirroring-3.2.0d2.txt
ftp://www.unicode.org/Public/BETA/Unicode3.2/UnicodeData-3.2.0d8.txt

These complete the drafting of the Bidi_Mirrored property value
for all the new mathematical symbols in Unicode 3.2, and the
informative information about bidi mirroring pairs (in BidiMirroring.txt)
for the same set of symbols.
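
For readers who want to consume BidiMirroring.txt, a minimal Python parser, assuming the "code; mirrored-code # name" line layout of the released file (the BETA draft could differ). It builds the mirroring map and applies it character by character; a real bidi implementation applies mirroring only to characters resolved to right-to-left runs, which this sketch does not attempt.

```python
# Sample lines in BidiMirroring.txt style; a real program would read the file.
SAMPLE_LINES = [
    "# Sample excerpt",
    "0028; 0029 # LEFT PARENTHESIS",
    "0029; 0028 # RIGHT PARENTHESIS",
    "003C; 003E # LESS-THAN SIGN",
    "003E; 003C # GREATER-THAN SIGN",
]


def parse_bidi_mirroring(lines):
    """Map each code point to its mirrored counterpart."""
    pairs = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        source, target = (field.strip() for field in data.split(";"))
        pairs[int(source, 16)] = int(target, 16)
    return pairs


def mirror_glyphs(text, pairs):
    """Swap every character that has a mirroring partner (no bidi level check)."""
    return "".join(chr(pairs.get(ord(ch), ord(ch))) for ch in text)
```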

An HTML link syntax problem in UnicodeCharacterDatabase.html has also
been fixed.

And StandardizedVariants.html has been updated again, with more
of the missing glyphs provided.

Please remember that tomorrow (January 25) is the deadline for
feedback on the BETA data files for Unicode 3.2. The UTC will be
meeting in just a little over two weeks now, and will take
all the final decisions on any outstanding issues before the
formal release of Unicode 3.2.

--Ken Whistler




Re: Wade -> Pinyin transliteration (Unihan ?)

2002-01-24 Thread John Cowan

Patrick Andries scripsit:

> Let's assume I want to "transliterate" a large Wade-Giles database into 
> pinyin. Is this a purely algorithmic process? For all nouns, common and 
> proper (cf. Chiang Kai-Shek vs. Jiang Jieshi)? Even for "dialectal" words?

"Chiang Kai-Shek" isn't Wade-Giles; it isn't even Mandarin.

-- 
John Cowan   http://www.ccil.org/~cowan  [EMAIL PROTECTED]
Please leave your values|   Check your assumptions.  In fact,
   at the front desk.   |  check your assumptions at the door.
 --sign in Paris hotel  |--Miles Vorkosigan




Re: Unicode 3.2: BETA files updated

2002-01-24 Thread John Hudson

As Unicode continues to grow, I wonder if we can expect another book-- or 
multiple volumes -- at some stage, or if the standard will become a purely 
electronic document? Has any decision been taken about this?

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]

... es ist ein unwiederbringliches Bild der Vergangenheit,
das mit jeder Gegenwart zu verschwinden droht, die sich
nicht in ihm gemeint erkannte.

... every image of the past that is not recognized by the
present as one of its own concerns threatens to disappear
irretrievably.
   Walter Benjamin





Re: Wade -> Pinyin transliteration (Unihan ?)

2002-01-24 Thread Patrick Andries





John Cowan wrote:

> Patrick Andries scripsit:
>
> > Let's assume I want to "transliterate" a large Wade-Giles database into pinyin. Is this a purely algorithmic process? For all nouns, common and proper (cf. Chiang Kai-Shek vs. Jiang Jieshi)? Even for "dialectal" words?
>
> "Chiang Kai-Shek" isn't Wade-Giles; it isn't even Mandarin.

I did mention "dialectal" forms (I believe final -k no longer occurs
in Mandarin); I just wondered whether I would find such nouns (proper or
common) in dictionaries edited in Taiwan. I asked because I could see no algorithmic
way of converting this name using traditional Wade-to-Pinyin tables.

Incidentally, if this is not Wade-Giles applied to a "dialectal" pronunciation,
what is it? Genuinely interested.

Patrick Andries

PS: Thank you for the Library of Congress pointer.






[Quite, quite OT:] Re: ü

2002-01-24 Thread David Hopwood

-----BEGIN PGP SIGNED MESSAGE-----

"Alain LaBonté" wrote:
> At 08:13 2002-01-23 -0500, John Cowan wrote:
> >Middle French spelling is very unphonemic.  This is the so-called
> >"aspirated h", which still blocks liaison even though it is
> >quite silent now.
> 
> [Alain]  Not only quite, but absolutely mute, one must not be so shy.

"quite" means French "absolument" in this context. I think the rule is
this: if the adjective already describes an absolute quality (like "silent"
or "wrong" or "unacceptable", for example), then "quite" emphasises
that it really is absolute; in speech, the "i" sound in "quite" is
stressed.

If the adjective describes a graded quality, i.e. that often differs in
degree (like "hungry" or "good" or "fast"), then "quite" means French
"assez", and is unstressed.

Of course this makes very little sense. Such is English.

- -- 
David Hopwood <[EMAIL PROTECTED]>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip


-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBPE+ZADkCAxeYt5gVAQGY+wf/Wiz3gg1+i2LwIT/HSoHTtcSzAx7p885p
CZpsdw56TXwpof0US6Dh2tAFR6AAyNlvfYyN9Cr8LIlzKWZmtns2NtfTog9pE9A7
VIatK4X2MMYtdo+UmVM6LVC13PsPtI3VNkBFoCEojRYqRAj2BpilelwehHBb6Oyf
j9CBM0lgr1guAdQslW3O0KNYqOW89Sn7WdfYgVdeI3bIbpbq9Tx+TwDjbkw7t3gM
TqDPfuZ7ZPcamxwyFzziYbVTq/5IONUbx+c6MkQG9eDsfcnF4f1vhMspx2HBhwtd
X1eyzGXPfIs+ym+2BEku+fl1AZn7OP03Vq14D3Cl/Y/z7ePo0S+i0w==
=gbGZ
-----END PGP SIGNATURE-----