Re: Unicode 3.2: BETA files updated

2002-01-25 Thread David Hopwood


Kenneth Whistler wrote:
 And StandardizedVariants.html has been updated again, with more
 of the missing glyphs provided.

I can't see any difference between plain U+2278 (either in the draft
code chart or StandardizedVariants.html) and U+2278 with VS1.
Is plain U+2278 supposed to have an oblique stroke?

Same for U+2279.

For U+2A9D, the tilde-like part of the glyph is reversed left-to-right
relative to what it should be (compare U+2272 and U+2273, and look at
the code chart for plain U+2A9D). This is more important than it sounds!

Less importantly, U+2268 and U+2269 with VS1 should use the same style
of glyph (i.e. opening angle) for the less than/greater than sign, as
the other characters.


The Mongolian descriptions say "second form", "third form", and "fourth form".
Unless these are already defined somewhere, I suggest "variation one",
"variation two", and "variation three" instead.


Is "variant" or "variation" the preferred term? If "variant" is preferred,
then why VARIATION SELECTOR ONE, etc.? If not, why StandardizedVariants?

--
David Hopwood [EMAIL PROTECTED]

Home page  PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip






Re: Multiple script Handling (kanji - kana)

2002-01-25 Thread Berthold Frommann

Hi Rajat,
 
 Any solutions to handle this (in other words, to compare two Japanese
 strings written in different scripts or in a mixture of two scripts)?
It is definitely a non-trivial task.
If you just want to transform a katakana string into hiragana (or vice
versa), it is very easy. But as soon as you start dealing with kanji, it
gets really, really tricky.
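The easy kana-only case can be sketched in a few lines of Python. This is a minimal sketch, assuming only the basic kana ranges: katakana U+30A1..U+30F6 and hiragana U+3041..U+3096 line up one-to-one, 0x60 code points apart. The function names are made up for illustration. Characters with no counterpart (kanji, ASCII, the prolonged sound mark U+30FC) pass through unchanged, and halfwidth katakana would need NFKC normalization first.

```python
# Sketch: convert between katakana and hiragana by a fixed code-point offset.
# Assumption: only U+30A1..U+30F6 (katakana) <-> U+3041..U+3096 (hiragana);
# everything else (kanji, ASCII, the prolonged sound mark U+30FC) is unchanged.

KATAKANA_FIRST, KATAKANA_LAST = 0x30A1, 0x30F6
HIRAGANA_FIRST, HIRAGANA_LAST = 0x3041, 0x3096
OFFSET = KATAKANA_FIRST - HIRAGANA_FIRST  # 0x60


def katakana_to_hiragana(text: str) -> str:
    return "".join(
        chr(ord(ch) - OFFSET) if KATAKANA_FIRST <= ord(ch) <= KATAKANA_LAST else ch
        for ch in text
    )


def hiragana_to_katakana(text: str) -> str:
    return "".join(
        chr(ord(ch) + OFFSET) if HIRAGANA_FIRST <= ord(ch) <= HIRAGANA_LAST else ch
        for ch in text
    )


if __name__ == "__main__":
    print(katakana_to_hiragana("ラーメン"))  # らーめん (the ー is left alone)
    print(hiragana_to_katakana("ひらがな"))  # ヒラガナ
```

Kanji are untouched by design: mapping them to kana needs the dictionary work described next.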

Whereas Chinese - mostly - only has one reading per character, kanji very
often have multiple readings. If you want to compare a kanji string to a
hiragana string, you have to find out the reading of the kanji - and there
is no 1-to-1 table for doing this, rather 1-to-n.
You would need a dictionary of Japanese to determine the reading of a
compound. But some compounds have various readings, depending on the
context. So you would also need a semantic analysis of the sentence!

生物 = 1. seibutsu, 2. namamono
今日 = 1. konnichi, 2. kyou
上手 = 1. jouzu, 2. uwate, 3. kamite
下 = 1. ka, 2. ge, 3. shita, 4. shimo, 5. moto, 6. sa(...), 7. kuda(...), 8. o(riru)

Readings of place names and personal names are especially difficult to
figure out.
However, it really depends on what kind of data you are about to process. If,
e.g., you have two fields for a Japanese person's name, one in kanji and one
with the transcription in kana, you could at least check whether the kana is
among the possible readings of the name ... (sigh!)
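A check of that kind might look like the following minimal sketch. The reading dictionary is hypothetical and holds only the examples given above; a real application would need a full lexicon (including name readings). The answer can only ever be "this is one of the possible readings", never "this is the reading used in this context".

```python
from typing import Dict, Optional, Set

# Hypothetical lexicon: compound -> all readings it may have (in hiragana).
# Only the examples from this message; a real system needs a full dictionary.
READINGS: Dict[str, Set[str]] = {
    "生物": {"せいぶつ", "なまもの"},
    "今日": {"こんにち", "きょう"},
    "上手": {"じょうず", "うわて", "かみて"},
}


def _to_hiragana(text: str) -> str:
    # Fold katakana (U+30A1..U+30F6) onto hiragana so either script matches.
    return "".join(
        chr(ord(ch) - 0x60) if 0x30A1 <= ord(ch) <= 0x30F6 else ch for ch in text
    )


def is_possible_reading(kanji: str, kana: str) -> Optional[bool]:
    """True/False if the compound is in the lexicon, None if it is unknown."""
    readings = READINGS.get(kanji)
    if readings is None:
        return None
    return _to_hiragana(kana) in readings


print(is_possible_reading("今日", "キョウ"))  # True
```

Returning None for unknown compounds keeps "not in my dictionary" distinct from "definitely wrong".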


Regards,
   Berthold







RE: [OT] Rich man Bill (RE: Issues with Unicode Hindi)

2002-01-25 Thread Marco Cimarosti

Michael Kaplan wrote:
 We rob NO ONE. We behave with honor and we wish others to do the same with
 us.
 It's a respect thing.

For sure. But you understand that this is politics as well. Many aspects of
copyright and intellectual property, and even the very concept of private
property, are still the subjects of political debate.

In order for your noble statement to be totally effective, there must be no
rich men or poor men to play Robin Hood with. But solving such economic
dilemmas, as Sarasvati might remind us, is not the task for the public
mailing list of a character encoding standard.

Real-life, down-to-earth issues pop up every day on the Unicode List
(probably because this is the roll-out phase of the standard). We recently
discovered how IT people in India have to share rented modems on old,
slow telephone lines. It is not the task of Unicode to fix the telecom
infrastructure of India, not to mention the problem of uneven distribution
of resources in the world.

So, on that occasion, this forum concentrated on answering a single
pertinent question: is the size overhead of UTF-8 compatible with the
situation of the Internet in India? Luckily, the answer was yes: however slow
your Internet connection is, the fact that UTF-8 uses 3 bytes for each Indic
character won't make things worse, because plain text is an insignificant
part of the overall size of an Internet document.
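The 3-byte figure follows directly from the UTF-8 length rules: every code point from U+0800 to U+FFFF, which includes all the Indic blocks in the BMP, takes three bytes. A minimal check in Python (the helper name is made up for illustration):

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 needs for one Unicode scalar value."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4


# "Hindi" written in Devanagari: HA, VOWEL SIGN I, NA, VIRAMA, DA, VOWEL SIGN II.
hindi = "\u0939\u093f\u0928\u094d\u0926\u0940"

print(len(hindi), "code points")                     # 6 code points
print(len(hindi.encode("utf-8")), "bytes in UTF-8")  # 18 bytes in UTF-8
```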

About the issue that you raised regarding software piracy, you should
consider that in many countries it is easy to step into huge multi-floor
shops which sell illegal copies of software and manuals.

Feel free to disagree with this state of things, but please avoid publicly
calling people pirates when they just did what it is customary to do in most
parts of the world, or, quite probably, just did a quick test. This is not
fair, not appropriate, and not a technical approach to problems.

By the way, I would not be so sure that Microsoft Corp. needs your
help to do their math. It is not the task of this forum to decide whether,
for a major corporation, it is more important to be strict about copyright
or to be the first and best performer in one of the most promising
markets on the planet.

It is OK to point out that such-and-such font is not supposed to be free (or
that it is, but only provided you install the latest version of
such-and-such operating system), or to inform that there also are free or
shareware fonts out there that cover such-and-such Indic scripts.

But I feel that this is not the proper place to settle legal issues about
software distribution. JMHO.

_ Marco




Re: Multiple script Handling (kanji - kana)

2002-01-25 Thread SOS Uni Bonn

From my database of roughly 50,000 lexical entries (compounds), I get
1431 compounds with at least two readings and 71 with at least three
readings.

If I also include 60,000 personal and place names, I get 6840 compounds
with two or more, 1344 with three or more, and 348 with four or more
readings.


Kay Genenz, Bonn


 Berthold Frommann wrote:
  生物 = 1. seibutsu, 2. namamono
  今日 = 1. konnichi, 2. kyou
  上手 = 1. jouzu, 2. uwate, 3. kamite

 Do you have a rough evaluation of how many compound words have multiple
 readings?







Re: Wade - Pinyin transliteration (Unihan ?)

2002-01-25 Thread Thomas Chan

On Thu, 24 Jan 2002, Patrick Andries wrote:

 John Cowan wrote:
 Patrick Andries scripsit:
 Let's assume I want to transliterate a large Wade-Giles database into 
 pinyin. Is this a purely algorithmic process? For all nouns? Common and
 proper (cf. Chiang Kai-Shek vs Jiang Jieshi)? Even for dialectal words?
 
 Chiang Kai-Shek isn't Wade-Giles; it isn't even Mandarin.

 I did mention dialectal forms (I believe final -k no longer occurs
 in Mandarin); I just wondered whether I would find such nouns (proper or
 common) in dictionaries edited in Taiwan. I asked because I could see no
 algorithmic way of converting this name using traditional Wade to Pinyin
 tables.
 
 Incidentally, if this is not Wade-Giles applied to a dialectal
 pronunciation, what is it? Genuinely interested.

It should be noted that Wade-Giles is commonly misused as a cover term
for many old, ad hoc, non-Mandarin-based, or non-Pinyin romanization
systems.

Chiang Kai-shek is a mixture of what looks like Wade-Giles (surname
CHIANG) and some kind of archaic romanization based on Cantonese (given
name Kai-shek).  For place names, there are many postal romanizations
that are often erroneously considered to be Wade-Giles, e.g., the city
Nanking (postal)/Nan-ching (Wade-Giles)/Nanjing (Pinyin).

In any case, one should also beware of degenerate Wade-Giles forms where
details such as apostrophes (denoting aspiration) are omitted, e.g., the
city Changchun (degenerate Wade-Giles)/Ch'ang-ch'un
(Wade-Giles)/Changchun (Pinyin).  If Changchun were accepted as proper  
Wade-Giles input, then a corrupt *Zhangzhun pinyin form would be
generated.
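The degenerate-input problem can be made concrete with a deliberately tiny, hypothetical syllable table. Real converters need the full Wade-Giles syllabary and tone handling, and still cannot fix postal or Cantonese-based spellings like Kai-shek. Because aspirated ch' and plain ch map to different Pinyin initials, dropping the apostrophe silently changes the output:

```python
# Hypothetical, deliberately tiny Wade-Giles -> Pinyin syllable table (tones ignored).
# In Wade-Giles, ch' (aspirated) corresponds to Pinyin ch/q, plain ch to zh/j.
WG_TO_PINYIN = {
    "ch'ang": "chang",
    "chang": "zhang",
    "ch'un": "chun",
    "chun": "zhun",
    "ching": "jing",
    "nan": "nan",
}


def wade_giles_to_pinyin(word: str) -> str:
    # Treat typographic apostrophes (U+2019, U+02BC) like the ASCII one.
    word = word.replace("\u2019", "'").replace("\u02bc", "'").lower()
    pinyin = []
    for syllable in word.split("-"):
        if syllable not in WG_TO_PINYIN:
            raise ValueError(f"syllable not in table: {syllable!r}")
        pinyin.append(WG_TO_PINYIN[syllable])
    return "".join(pinyin).capitalize()


print(wade_giles_to_pinyin("Ch'ang-ch'un"))  # Changchun
print(wade_giles_to_pinyin("Nan-ching"))     # Nanjing
print(wade_giles_to_pinyin("Chang-chun"))    # Zhangzhun (apostrophes lost)
```

The converter raises on unknown syllables rather than guessing, which is one way to keep non-Wade-Giles input from slipping through unnoticed.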


Thomas Chan
[EMAIL PROTECTED]





Re: Multiple script Handling (kanji - kana)

2002-01-25 Thread Berthold Frommann
Dear Prof. Genenz,

 From my database with roughly 50.000 lexical
 entries (compounds) I get a
 number of 1431 compounds with
 at least two readings and 71 with at least three
 readings.
And that counts only compounds that themselves have multiple readings.

But imagine this: if a program merely had access to a database containing
the readings of single characters, it still couldn't (reliably) figure out
the reading of a compound. How could it "know" that 人間 is "ningen" and not,
for instance, *jinkan?
This means that it would be vital to have access to a database containing
entries for kanji-compound-reading(s).

Without semantic analysis of the sentences concerned, it is not possible to
determine the correct contextual reading of every Japanese compound.
It is only possible to check whether the reading given in the second string
is _one_ of the correct readings.

Greetings from Edo,
   Berthold Frommann



Re: Has anyone looked at Laban dance notation?

2002-01-25 Thread Michael Everson

At 12:46 -0800 2002-01-24, Kenneth Whistler wrote:

 This heads immediately into a rathole where any scheme of dynamic
 notation for anything whatsoever becomes a candidate for character
 encoding.

Any candidate for encoding has to meet certain criteria, which Klingon
didn't. One of those criteria would be "doable". Another would be
"meets user requirements". A priori rejection of things makes me
nervous, though.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Unicode 3.2 Beta Period Finishing

2002-01-25 Thread Bob_Hallissy


Hm, I still don't see 'em all -- left column images starting at U+2A3C are
still missing.

Bob

On 22-01-2002 05:02:24 Mark Davis wrote:

 I sent the message out somewhat prematurely -- the images in the first
 column (the normal representative glyphs) should be there tomorrow.

 Mark






Re: Unicode 3.2: BETA files updated

2002-01-25 Thread Asmus Freytag

At 06:29 AM 1/24/02 +, David Hopwood wrote:
 Kenneth Whistler wrote:
  And StandardizedVariants.html has been updated again, with more
  of the missing glyphs provided.

 I can't see any difference between plain U+2278 (either in the draft
 code chart or StandardizedVariants.html) and U+2278 with VS1.
 Is plain U+2278 supposed to have an oblique stroke?

 Same for U+2279.

The plain ones are supposed to have the oblique stroke in the *reference* 
glyphs.
As with all mathematical glyph variations, *both* variations are acceptable in
common, unmarked situations.
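In plain text the "with VS1" form is simply the base character followed by U+FE00 VARIATION SELECTOR-1. The sketch below (helper names are made up for illustration) builds such a sequence and shows one plausible fallback, assuming that a process which does not support a given variation sequence just drops the selector and displays the base character, which may then use either glyph as described above.

```python
import unicodedata

VS1 = "\uFE00"  # VARIATION SELECTOR-1


def with_variant(base: str, selector: str = VS1) -> str:
    """Build a variation sequence: base character followed by a selector."""
    return base + selector


def strip_variation_selectors(text: str) -> str:
    """Drop VS1..VS16 (U+FE00..U+FE0F) and the Mongolian FVS1..FVS3 (U+180B..U+180D)."""
    return "".join(
        ch for ch in text
        if not (0xFE00 <= ord(ch) <= 0xFE0F or 0x180B <= ord(ch) <= 0x180D)
    )


seq = with_variant("\u2278")
for ch in seq:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+2278 NEITHER LESS-THAN NOR GREATER-THAN
# U+FE00 VARIATION SELECTOR-1
```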

 For U+2A9D, the tilde-like part of the glyph is reversed left-to-right
 relative to what it should be (compare U+2272 and U+2273, and look at
 the code chart for plain U+2A9D). This is more important than it sounds!

After the last update, I sent Rick a font for these variations that pays
attention to these details.

 Less importantly, U+2268 and U+2269 with VS1 should use the same style
 of glyph (i.e. opening angle) for the less than/greater than sign, as
 the other characters.

Where possible, we've taken the variations from actual fonts, which means
that there may be minor differences like these that are unrelated to the
feature called out in the description.

 The Mongolian descriptions say "second form", "third form", and "fourth form".
 Unless these are already defined somewhere, I suggest "variation one",
 "variation two", and "variation three" instead.

This list is being published as Amd 1:ISO/IEC 10646-1:2000 (2002), so
it's essentially frozen. The list of variants has been out as a UNU TR
for a long time now with these terms.

 Is "variant" or "variation" the preferred term? If "variant" is preferred,
 then why VARIATION SELECTOR ONE, etc.? If not, why StandardizedVariants?

While VARIATION SELECTOR is the formal name of the character (and therefore
fixed), referring to the selected thing as a 'variation' sounds really
odd, that's why the more common term 'variant' is used all over the place.
Perhaps we ought to make them formally synonyms, somewhat like code point
and code location.

I think it's a subtle thing. Without context, *VARIANT SELECTOR could be
understood as a VARIANT of a SELECTOR. Equally, without context, referring
to the 'variation' of a character is less clear than saying 'variant'.

A./




Re: Has anyone looked at Laban dance notation?

2002-01-25 Thread Rick McGowan

Michael Everson wrote:

 Any candidate for encoding has to meet certain criteria, which Klingon
 didn't. One of those criteria would be "doable". Another would be
 "meets user requirements". A priori rejection of things makes me
 nervous, though.

Yeah. I agree that a priori rejection of Labanotation, or any other of  
various symbolic notations, might be imprudent. But these are cases where  
the burden of proof -- that a character-based encoding is doable and useful  
to the user community -- should be squarely on the proposers.

So far, nobody has even proposed Labanotation nor done anything near the  
analysis and inventory that would be required to really engage in a  
discussion of suitability for character encoding. Same applies to other  
symbologies, like chemical notation, for that matter.

Rick




Re: [Very-OT] Re: ü

2002-01-25 Thread J M Sykes


- Original Message -
From: Michael Everson [EMAIL PROTECTED]
To: Patrick Andries [EMAIL PROTECTED]
Cc: David Starner [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Wednesday, January 23, 2002 12:35 AM
Subject: [Very-OT] Re: ü


[snip]
 
 Garçon in Oxford English Dictionary but garconnière (bachelor's
 housing) in my Webster's New Lexicon (no cedilla, grave accent).

 Webster's Third New International (1961): garçon
 Supplement (n.d.): garçonnière.

 Oxford New Dictionary of English (2001): garçon, garçonnière.

New Shorter Oxford English Dictionary, January 1997, on CD-ROM has:
garçon, garconnière.

How's that for consistency?
Of course, given the evidence above, they may have revised that by now.

Mike.





RE: Unicode 3.2: BETA files updated

2002-01-25 Thread Suzanne M. Topping

 
 At 06:29 AM 1/24/02 +, David Hopwood wrote:
 Kenneth Whistler wrote:
   And StandardizedVariants.html has been updated again, with more
   of the missing glyphs provided.

Can anyone send me the URL for this chart? I can't seem to find it.




Re: Unicode 3.2 Beta Period Finishing

2002-01-25 Thread Kenneth Whistler

Bob,

 Hm, I still don't see 'em all -- left column images starting at U+2A3C are
 still missing.

They're all there. I just checked. Try reloading the page.

--Ken

 





Re: Fontlab 4.0, Opentype and supplementary characters

2002-01-25 Thread John Hudson

Yuri Yarmola has written to me again to say that he is working on 4-byte 
cmap support, but needs an existing font with such a cmap in order to test 
his import function. Does anyone have a font with Plane One characters 
encoded in such a cmap? If so, please contact Yuri directly at 
[EMAIL PROTECTED]. Thanks.
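For anyone wanting to produce a test font, this is roughly what such a subtable contains. A minimal sketch, assuming that "4-byte cmap" here means the cmap subtable format 12 (segmented coverage, 32-bit character codes): a 16-byte header followed by one 12-byte group per run of consecutive code points that map to consecutive glyph IDs. The function name and glyph IDs are made up for illustration. A complete font would also need the enclosing cmap table header and an encoding record (typically platform 3, encoding 10) pointing at this subtable, and Yuri's importer may of course target a different format.

```python
import struct
from typing import Dict, List


def build_cmap_format12(mapping: Dict[int, int]) -> bytes:
    """Pack a {code point: glyph ID} mapping as a cmap format 12 subtable."""
    groups: List[List[int]] = []  # each group: [startCharCode, endCharCode, startGlyphID]
    for code in sorted(mapping):
        gid = mapping[code]
        if groups:
            start, end, start_gid = groups[-1]
            # Extend the current group if both the code and the glyph ID continue the run.
            if code == end + 1 and gid == start_gid + (code - start):
                groups[-1][1] = code
                continue
        groups.append([code, code, gid])

    length = 16 + 12 * len(groups)
    # Header: format, reserved, length, language, numGroups (all big-endian).
    data = struct.pack(">HHIII", 12, 0, length, 0, len(groups))
    for start, end, start_gid in groups:
        data += struct.pack(">III", start, end, start_gid)
    return data


# Three consecutive Gothic letters plus MUSICAL SYMBOL G CLEF (all in Plane 1).
subtable = build_cmap_format12({0x10330: 1, 0x10331: 2, 0x10332: 3, 0x1D11E: 4})
print(len(subtable))  # 40 = 16-byte header + 2 groups of 12 bytes
```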

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]

... es ist ein unwiederbringliches Bild der Vergangenheit,
das mit jeder Gegenwart zu verschwinden droht, die sich
nicht in ihm gemeint erkannte.

... every image of the past that is not recognized by the
present as one of its own concerns threatens to disappear
irretrievably.
   Walter Benjamin





Re: Unicode 3.2: BETA files updated

2002-01-25 Thread John Cowan

Asmus Freytag scripsit:

 While VARIATION SELECTOR is the formal name of the character (and therefore
 fixed), referring to the selected thing as a 'variation' sounds really
 odd, that's why the more common term 'variant' is used all over the place.
 Perhaps we ought to make them formally synonyms, somewhat like code point
 and code location.
 
 I think it's a subtle thing. Without context, *VARIANT SELECTOR could be
 understood as a VARIANT of a SELECTOR. Equally, without context, referring
 to the 'variation' of a character is less clear than saying 'variant'.

The variation selector specifies the variation which will produce
the variant.

-- 
John Cowan   http://www.ccil.org/~cowan  [EMAIL PROTECTED]
Please leave your values|   Check your assumptions.  In fact,
   at the front desk.   |  check your assumptions at the door.
 --sign in Paris hotel  |--Miles Vorkosigan




Re: Microsoft's Japanese IME has no Unicode option

2002-01-25 Thread ろ〇〇〇〇 ろ〇〇〇



 From: "Michael \(michka\) Kaplan" [EMAIL PROTECTED]
 To: "ろ〇〇〇〇 ろ〇〇〇" [EMAIL PROTECTED]
 Subject: Re: Microsoft's Japanese IME has no Unicode option
 Date: Fri, 25 Jan 2002 10:15:32 -0800

 You are wrong.


わぁぁぁぁぁっっっっ！！！ (roughly: "Waaaaah!!!")

If that is so, how do I get the thing to give me Unicode?
All I saw in the list is JIS, Shift-JIS, and Kuten.




Re: Microsoft's Japanese IME has no Unicode option

2002-01-25 Thread Michael \(michka\) Kaplan
If you use a scripting language like VBScript or JScript, the strings in
your code are converted to Unicode. I have explained that you can
actually see Unicode in a particular scenario, but it is not going to
convert your pages for you or anything like that.

Rather than complaining about what is not happening here, why don't you stop
and calmly explain what you WANT to happen? Perhaps then someone can answer
your question.

(and also, no need to be offensive here!)


MichKa

Michael Kaplan
Trigeminal Software, Inc.  -- http://www.trigeminal.com/


- Original Message -
From: "ろ〇〇〇〇 ろ〇〇〇" [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Friday, January 25, 2002 11:00 AM
Subject: Re: Microsoft's Japanese IME has no Unicode option









RE: Unicode 3.2: BETA files updated

2002-01-25 Thread Julie Allen

John Hudson asked,

 As Unicode continues to grow, I wonder if we can expect another book --
 or multiple volumes -- at some stage, or if the standard will become a
 purely electronic document? Has any decision been taken about this?

Speaking in my official capacity as editor, the answer is yes, you can
expect another book. The editorial committee is already hard at work on
4.0, which we expect to publish as one volume. Publication is
tentatively scheduled for spring 2003. As to the form and timing of 5.0,
that would be pure speculation at this point. Someone else on the
committee might be willing to speculate, but I won't! 

Julie Allen






Re: Microsoft's Japanese IME has no Unicode option

2002-01-25 Thread Michael \(michka\) Kaplan
From: "ろ〇〇〇〇 ろ〇〇〇" [EMAIL PROTECTED]

 Okay, here's the scoop: I have a page with some (poorly
 written) Japanese in it, and it is in Unicode. I want to be
 able to edit the page without having to port the whole
 doggone thing into Unipad and then curse when I can't
 use my IME in Unipad so I have to cut-and-paste from
 MS Word and THEN go thru the whole rigmarole of
 replacing my page. No. I want it to work in the Geocities
 page editor. And I am using the .com (not .co.jp) version
 of Geocities for this page.

You should use a program (such as FrontPage 2000 or FrontPage XP) that can
support any encoding you choose to use for your pages. A browser is for
displaying pages in the encoding they are in, NOT a tool for editing web
pages.

If you do not like this kind of option, I do not know what else to tell
you. There are many programs out there designed to do what you are asking;
you cannot insist on using one that will not and then be surprised when it
does not work.

MichKa

Michael Kaplan
Trigeminal Software, Inc.  -- http://www.trigeminal.com/


RE: Unicode 3.2: BETA files updated

2002-01-25 Thread Kenneth Whistler

Julie said:

 As to the form and timing of 5.0,
 that would be pure speculation at this point. Someone else on the
 committee might be willing to speculate, but I won't! 

Ummm...

Unicode 5.0 will be published on December 22, 2007,
in DVD3 holographic format, complete with a remastered
Unicode hymn and MSNBC, E!, and MTV interviews with the cast of 
thousands who contributed to the project. We'll get David
Kelley to do the producing.

--Ken





Re: Unicode 3.2: BETA files updated

2002-01-25 Thread David Starner

On Fri, Jan 25, 2002 at 11:31:19AM -0800, Julie Allen wrote:
 Speaking in my official capacity as editor, the answer is yes, you can
 expect another book. The editorial committee is already hard at work on
 4.0, which we expect to publish as one volume.

So are you worried about 4.0 being 2,000 pages long, or do you have a
solution to that problem?

-- 
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing 
with the youth. -- Information Society, Peace and Love, Inc.




Re: RE: Unicode 3.2: BETA files updated

2002-01-25 Thread Rick McGowan

Ken let the cat out of the bag:

 Unicode 5.0 will be published on December 22, 2007...
 complete with a remastered Unicode hymn...

It's true. We've already booked an Abbey Road studio for five days in  
March 2007, and we've signed 75 of the hottest young voices in the world to  
be in the chorus, including Oumou Sangare, Nityasree Mahadevan, and Ning  
Liang... Send in your major cash donation today and you, too, can be in the  
chorus rubbing shoulders with the divas!

Rick




Re: Problems with viewing Hindi Unicode Page

2002-01-25 Thread Peter_Constable

On 01/23/2002 02:50:58 AM John Hudson wrote:

 The problem for Win 9x users, even with current browsers, is lack of a
 system-installed Devanagari font with OpenType layout tables. The version
 of Arial Unicode that ships with pre-XP versions of Windows does not
 contain layout tables for Indic scripts (I have not checked the XP version,
 but I know that this is something that Monotype have been working on for
 Microsoft).

The version of Arial Unicode MS on my system does have layout tables for 
Devanagari. I don't know with what product this version was introduced to 
my system -- I've got Win2K, IE5.5 and Office XP.

BTW, don't go and borrow Mangal.ttf from a Win2K user and install it on 
Win98; you won't get the results you're looking for.


- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]





Re: Problems with viewing Hindi Unicode Page

2002-01-25 Thread John Hudson

At 12:56 1/25/2002, [EMAIL PROTECTED] wrote:

 The version of Arial Unicode MS on my system does have layout tables for
 Devanagari. I don't know with what product this version was introduced to
 my system -- I've got Win2K, IE5.5 and Office XP.

Interesting. What's the file date on that font?

John Hudson






RE: Microsoft's Japanese IME has no Unicode option

2002-01-25 Thread Addison Phillips [wM]
There is a quite simple way to do what you want:

If you want to input directly into an HTML form on the Geocities site, all
you have to do is pull down your "View" menu (I presuppose IE here) and
choose "UTF-8" from the Encoding submenu. Since Geocities doesn't send a
META tag, your browser will now encode all of the data you type as UTF-8 for
you, and those are the bytes that will get stored in your page on the
back end. The reason you're getting Shift-JIS now is that your browser is
probably set to "Japanese auto-detect", and ASCII is certainly valid
Shift-JIS.

Note that adding a META tag to your page is a very good idea if you decide
to use UTF-8 as the encoding.

You can see that this works here:
http://www.geocities.com/apphillips2000/index.html

You will note that I included a META tag. Otherwise you have to manually
select UTF-8 as the page encoding.
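Both points, that ASCII is valid Shift-JIS and that the META tag matters, show up when the same text is encoded both ways. A minimal Python sketch; the `make_page` helper is hypothetical, and the META tag is the usual HTML 4 `http-equiv` form:

```python
META = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'


def make_page(title: str, body: str) -> bytes:
    """Return a small HTML page, declared and actually encoded as UTF-8."""
    html = (
        "<html><head>" + META + "<title>" + title + "</title></head>"
        "<body>" + body + "</body></html>"
    )
    return html.encode("utf-8")


# Pure ASCII produces identical bytes in both encodings, so a browser
# auto-detecting "Japanese" cannot tell them apart...
print("hello".encode("shift_jis") == "hello".encode("utf-8"))  # True

# ...but Japanese text does not.
print("日本".encode("shift_jis"))  # b'\x93\xfa\x96{'  (4 bytes)
print("日本".encode("utf-8"))      # b'\xe6\x97\xa5\xe6\x9c\xac'  (6 bytes)
```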

Regards,

Addison

Addison P. Phillips
Globalization Architect / Manager, Globalization Engineering
webMethods, Inc.  |  The Business Integration Company
432 Lakeside Drive, Sunnyvale, California, USA
+1 408.962.5487 (phone)  +1 408.210.3569 (mobile)
-
Internationalization is an architecture. It is not a feature.


Re: Multiple script Handling (kanji - kana)

2002-01-25 Thread Stefan Persson

- Original Message -
From: Marco Cimarosti [EMAIL PROTECTED]
To: 'Berthold Frommann' [EMAIL PROTECTED]; Rajat Bawa
[EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: den 25 januari 2002 12:06
Subject: RE: Multiple script Handling (kanji - kana)

 In plain-text Unicode, furigana might be encoded using this set of control
 characters:

 U+FFF9 (INTERLINEAR ANNOTATION ANCHOR)
 U+FFFA (INTERLINEAR ANNOTATION SEPARATOR)
 U+FFFB (INTERLINEAR ANNOTATION TERMINATOR)

 The format of a word with furigana should be:

 U+FFF9 kanji(s) U+FFFA hiragana(s) U+FFFB

Do you know any font that supports these characters?

Stefan
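For the interlinear annotation format quoted above: the three characters are format controls rather than visible symbols, so support depends on the layout software more than on any font. A minimal sketch of building and parsing such annotations follows. The function names are made up for illustration, and only the simple one-annotation, non-nested form is handled. Since the standard intends these characters mainly for internal use and discourages using them in interchange without prior agreement, a consumer that does not understand them might reasonably keep just the base text:

```python
import re
from typing import List, Tuple

ANCHOR = "\uFFF9"      # INTERLINEAR ANNOTATION ANCHOR
SEPARATOR = "\uFFFA"   # INTERLINEAR ANNOTATION SEPARATOR
TERMINATOR = "\uFFFB"  # INTERLINEAR ANNOTATION TERMINATOR

# Base text, then annotation text; neither may contain the three controls.
_ANNOTATION = re.compile(
    ANCHOR + "([^\uFFF9\uFFFA\uFFFB]*)" + SEPARATOR + "([^\uFFF9\uFFFA\uFFFB]*)" + TERMINATOR
)


def annotate(base: str, ruby: str) -> str:
    return ANCHOR + base + SEPARATOR + ruby + TERMINATOR


def strip_annotations(text: str) -> str:
    """Keep only the annotated base text, dropping the annotations."""
    return _ANNOTATION.sub(r"\1", text)


def extract_annotations(text: str) -> List[Tuple[str, str]]:
    """Return (base, annotation) pairs in order of appearance."""
    return _ANNOTATION.findall(text)


print(strip_annotations(annotate("瑞典", "スウェーデン")))  # 瑞典
```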







RE: Unicode 3.2: BETA files updated

2002-01-25 Thread Asmus Freytag

At 11:31 AM 1/25/02 -0800, Julie Allen wrote:
 John Hudson asked,

  As Unicode continues to grow, I wonder if we can expect another book --
  or multiple volumes -- at some stage, or if the standard will become a
  purely electronic document? Has any decision been taken about this?


There are lots of space-saving options. Printing the code charts
in 7pt type (they are now in 22pt type) would allow us to print
nine times as many characters per page. We would have room left
over for adding one of the handy little magnifying glasses like
the ones that accompany the boxed set of the OED.

A./





RE: Unicode 3.2: BETA files updated

2002-01-25 Thread Julie Allen


 On Fri, Jan 25, 2002 at 11:31:19AM -0800, Julie Allen wrote:
  Speaking in my official capacity as editor, the answer is yes, you can
  expect another book. The editorial committee is already hard at work on
  4.0, which we expect to publish as one volume.
 
 So are you worried about 4.0 being 2,000 pages long, or do you have a
 solution to that problem?

We're estimating that 4.0 will be roughly 1500 pages, which the
publisher says is not a problem for one volume. Now whether you can
carry it with one hand is a different question. :-)

--Julie






Furigana can be katakana

2002-01-25 Thread ろ〇〇〇〇 ろ〇〇〇
In my Love Hina vol. 7, 千年 ("thousand years") has the furigana ミレニアム ("millennium").

Just thought you might wanna know.



Re: Furigana can be katakana

2002-01-25 Thread Stefan Persson

- Original Message -
From: ろ〇〇〇〇 ろ〇〇〇 [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: den 25 januari 2002 23:23
Subject: Furigana can be katakana

 In my Love Hina vol 7, 千年 has furigana ミレニアム.

In cases such as U+FFF9 瑞典 U+FFFA スウェーデン U+FFFB (is the furigana
encoded correctly?), the furigana should always be written in katakana, right?

Stefan







Carrying it around [was RE: Unicode 3.2: BETA files updated]

2002-01-25 Thread Mark Leisher


Julie We're estimating that 4.0 will be roughly 1500 pages, which the
Julie publisher says is not a problem for one volume. Now whether you can
Julie carry it with one hand is a different question. :-)

We Unicode acolytes have a rule that requires using both hands when carrying
the holy book anyway :-)

Mind you, revealed wisdom should never exceed 4.5 kilograms in weight (in
Earth-normal gravity) so that it remains suitable for slamming authoritatively
on the tops of podiums, desks and other flattish surfaces.
-
Mark Leisher... I get my ideas from reading the news,
Computing Research Lab  which is probably why my writing has the
New Mexico State University intellectual depth of Saran Wrap.
Box 30001, Dept. 3CRL -- Michael Swaine
Las Cruces, NM  88003




RE: Multiple script Handling (kanji - kana)

2002-01-25 Thread Anil Joshi

Here I am in the list

-Original Message-
From: Stefan Persson [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 25, 2002 4:49 PM
To: Marco Cimarosti; 'Berthold Frommann'; Rajat Bawa
Cc: [EMAIL PROTECTED]
Subject: Re: Multiple script Handling (kanji - kana)







Re: Problems with viewing Hindi Unicode Page

2002-01-25 Thread Andrew Cunningham


- Original Message -
From: [EMAIL PROTECTED]
 The version of Arial Unicode MS on my system does have layout tables for
 Devanagari. I don't know with what product this version was introduced to
 my system -- I've got Win2K, IE5.5 and Office XP.


I guess the question becomes, which version of Arial Unicode MS?

I suspect that the version of Arial Unicode MS you have must be from
Office XP.


Andj





iMode to Unicode mapping data

2002-01-25 Thread Ken Krugler

Hi all,

Does anybody know whether mappings exist from any of the iMode
(DoCoMo) vendor-specific character codes to Unicode? The iMode
characters (164 of them) exist in Shift-JIS from 0xF89F to 0xF9AF. 
Most of them would be considered dingbats. Yes, I could eyeball the 
glyphs and come up with mappings on my own, but I'm wondering if this 
has already been done.
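While a semantic mapping is pending, one stopgap is to place vendor codes in the Private Use Area. A minimal sketch, assuming the arithmetic that Microsoft's cp932 (and CPython's cp932 codec) applies to the Shift-JIS user-defined area 0xF040..0xF9FC, which lands on U+E000..U+E757. The function name is made up for illustration. This yields stable placeholders only; it does not say which real Unicode dingbats the iMode symbols correspond to.

```python
def sjis_user_defined_to_pua(lead: int, trail: int) -> int:
    """Map a Shift-JIS user-defined-area code (lead 0xF0..0xF9) to a PUA code point.

    Each lead byte covers 188 trail bytes: 0x40..0x7E and 0x80..0xFC.
    """
    if not 0xF0 <= lead <= 0xF9:
        raise ValueError("lead byte outside the user-defined area")
    if 0x40 <= trail <= 0x7E:
        index = trail - 0x40
    elif 0x80 <= trail <= 0xFC:
        index = trail - 0x41  # skip 0x7F, which is never a trail byte
    else:
        raise ValueError("invalid trail byte")
    return 0xE000 + 188 * (lead - 0xF0) + index


first = sjis_user_defined_to_pua(0xF8, 0x9F)
last = sjis_user_defined_to_pua(0xF9, 0xAF)
print(f"U+{first:04X} .. U+{last:04X}")  # U+E63E .. U+E70A
```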

Thanks,

-- Ken


Ken Krugler
TransPac Software, Inc.
http://www.transpac.com
+1 530-470-9200




Re: Unicode 3.2: BETA files updated

2002-01-25 Thread David Hopwood


Asmus Freytag wrote:
 At 06:29 AM 1/24/02 +, David Hopwood wrote:
 Kenneth Whistler wrote:
   And StandardizedVariants.html has been updated again, with more
   of the missing glyphs provided.
 
 I can't see any difference between plain U+2278 (either in the draft
 code chart or StandardizedVariants.html) and U+2278 with VS1.
 Is plain U+2278 supposed to have an oblique stroke?
 
 Same for U+2279.
 
 The plain ones are supposed to have the oblique stroke in the
 *reference* glyphs. As with all mathematical glyph variations, *both*
 variations are acceptable in common, unmarked situations.

In that case, how do I specify that the reference glyph is required?
I.e. there's an asymmetry here between the VS1 glyph, which can be
specified explicitly, and the reference glyph, which can't.

One possibility is to make VS1 specify what is now the reference glyph,
and VS2 specify the alternate glyph. Unmarked would mean either.

The other possibility is to say that to be strictly Unicode-conformant,
fonts should always use the reference glyph for unmarked characters
(ignoring differences only of style). I think this is actually a better
solution in practice; it avoids having to add selectors that would
usually be redundant, and that would interfere with normalisation.
It's also consistent with the Mongolian variant selectors, where
unmarked should mean the first form.
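One concrete way a selector interacts with normalization, sketched with Python's standard `unicodedata` module. This is an illustration, not necessarily the case David has in mind; `e` + VS1 is not a standardized variant and is used only to show the blocking effect.

```python
import unicodedata

VS1 = "\uFE00"  # VARIATION SELECTOR-1

# 1. U+2278 has a canonical decomposition (U+2276 U+0338), so NFD separates
#    the base from VS1: the selector ends up after U+0338, not after U+2278.
marked = "\u2278" + VS1
nfd = unicodedata.normalize("NFD", marked)  # "\u2276\u0338\uFE00"
nfc = unicodedata.normalize("NFC", nfd)     # recomposes to "\u2278\uFE00"

# 2. VS1 has canonical combining class 0, so it acts as a starter and
#    blocks canonical composition across it.
plain = unicodedata.normalize("NFC", "e\u0301")              # "\u00E9"
blocked = unicodedata.normalize("NFC", "e" + VS1 + "\u0301")  # left unchanged
```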

[...]
 The Mongolian descriptions say second form, third form, and
 fourth form. Unless these are already defined somewhere, I suggest
 variation one, variation two, and variation three instead.
 
 This list is being published as Amd 1:ISO/IEC 10646-1:2000 (2002), so
 it's essentially frozen.

OK.

 Is variant or variation the preferred term? If variant is preferred,
 then why VARIATION SELECTOR ONE, etc.? If not, why StandardizedVariants?
 
 While VARIATION SELECTOR is the formal name of the character (and therefore
 fixed), referring to the selected thing as a 'variation' sounds really
 odd; that's why the more common term 'variant' is used all over the place.
 Perhaps we ought to make them formally synonyms, somewhat like code point
 and code location.

Yes, we should.

 I think it's a subtle thing. Without context, *VARIANT SELECTOR could be
 understood as a VARIANT of a SELECTOR.

But there is always sufficient context for Unicode character names: the
Unicode standard :-) I realise that the VS character names can't be changed
now, though (because they have been accepted for ISO 10646).

- -- 
David Hopwood [EMAIL PROTECTED]

Home page  PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip


-BEGIN PGP SIGNATURE-
Version: 2.6.3i
Charset: noconv

iQEVAwUBPFCRGDkCAxeYt5gVAQHMHwf/bsgs7cbksude6LMvxXi665uM7ypwTuUx
GOHxF4g7ji3KbHhYIdfKHqjhVikMrg8TyJFmfI7v3hcgtASZF6fJkOf9Ai3nRDuP
ku+l8LN0nuBTp2t3evsWa0gmBWcN6k4LhydiyGez1ndPM6nwLx4yF5nmyjaYWm+E
LiNtDn6Tn+oMsMzs7MwxPC6AOq1ZveIOtgw47Tbh/wa0AAjfa+1XCAnf2OEfZvR9
O6jGLCpmqHByoqzrDhlkVwGaGU6vn6TtXBR0xDWtLUI77DINWwi/dmpBTNHE+7FF
UsyL0+fue1dKZLUgV/idBPdZDxRVq6cjw0nksBZgPKjqjRBc+GmhQw==
=4JRE
-END PGP SIGNATURE-




[OT] weight vs. mass

2002-01-25 Thread Peter_Constable

On 01/25/2002 04:45:04 PM Mark Leisher wrote:

Mind you, revealed wisdom should never exceed 4.5 kilograms in weight (in
Earth-normal gravity)

Uh, Mark? Kilograms are units of mass, not weight, so something that's 4.5
kilograms or less will be 4.5 kilograms or less whether in Earth-normal
gravity or on the surface of a neutron star. Its weight in those two
locations would, of course, be quite different (and would be measured in
units of force: newtons, i.e. kg m / s^2).
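A worked version of the arithmetic, assuming "Earth-normal gravity" means standard gravity, g_n = 9.80665 m/s^2. Weight is a force, W = mg, and its SI unit is the newton (1 N = 1 kg m/s^2):

```latex
W = m\,g_n = 4.5\,\mathrm{kg} \times 9.80665\,\mathrm{m\,s^{-2}}
  \approx 44.1\,\mathrm{kg\,m\,s^{-2}} = 44.1\,\mathrm{N}
```

The mass term stays 4.5 kg wherever the object is; only g changes from place to place.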


Peter




POSITIVELY MUST READ! Bytext is here!

2002-01-25 Thread Bernard Miller

Hello Unicode list members,
Unicode now has a serious competitor. Please read
about it at www.bytext.org. Everyone on this list
should find it extremely interesting. 

I hope people concerned with Unicode see this as an
opportunity for growth. I think you will find Bytext
to be a superior technology that is worthy of the work
that will be required to implement it. It is
sufficiently different that it cannot possibly be
turned into a transformation format of Unicode or
anything similar.

I guess this means that many of you are now officially
my colleagues. After following this list and reading
various things you’ve written you all seem very
friendly and intelligent. I look forward to future
conversations between us. 

Sincerely,

Bernard Miller






RE: Unicode 3.2: BETA files updated

2002-01-25 Thread John Hudson

At 13:48 1/25/2002, Julie Allen wrote:

We're estimating that 4.0 will be roughly 1500 pages, which the
publisher says is not a problem for one volume. Now whether you can
carry it with one hand is a different question. :-)

Please try to ensure a sturdy binding. The binding of 3.0 is a little 
weaker than it should be for a book this size, and 1500 pages is going to 
make this more of an issue. Thanks.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]

... es ist ein unwiederbringliches Bild der Vergangenheit,
das mit jeder Gegenwart zu verschwinden droht, die sich
nicht in ihm gemeint erkannte.

... every image of the past that is not recognized by the
present as one of its own concerns threatens to disappear
irretrievably.
   Walter Benjamin





Re: POSITIVELY MUST READ! Bytext is here!

2002-01-25 Thread Youtie Effaight

Unicode now has a serious competitor. Please read
about it at www.bytext.org. Everyone on this list
should find it extremely interesting.

Goll dang! Just what ah've bin waitin' fer!

Code points is gettin' way too expensive in Unicode,
so I sure hope bytext is sellin' 'em cheaper.

Yer ol' pal,
 Youtie








Re: Problems with viewing Hindi Unicode Page

2002-01-25 Thread Peter_Constable

On 01/25/2002 03:17:59 PM John Hudson wrote:

The version of Arial Unicode MS on my system does have layout tables for
Devanagari. I don't know with what product this version was introduced to
my system -- I've got Win2K, IE5.5 and Office XP.

Interesting. What's the file date on that font?

Nov 30, 2000



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]





Re: POSITIVELY MUST READ! Bytext is here!

2002-01-25 Thread Michael \(michka\) Kaplan

From Bernard's personal site:

There are so many people smarter than me

Indeed.

But few are so presumptuous as to believe that they are a serious
competitor on such a basis. I can offer you a deal on personalized
tutorials to help with your misconceptions about Unicode, though it may be
too late to do any good. :-)

Good luck with your crusade; you will need it.


MichKa

Michael Kaplan
Trigeminal Software, Inc.  -- http://www.trigeminal.com/





Re: POSITIVELY MUST READ! Bytext is here!

2002-01-25 Thread David Starner

On Fri, Jan 25, 2002 at 08:33:15PM -0800, Bernard Miller wrote:
 Unicode now has a serious competitor. Please read
 about it at www.bytext.org. Everyone on this list
 should find it extremely interesting. 

Let's see. Bytext has no corporate supporters, nor is it supported by
any standards organizations. It has a hard-to-read standard that
requires intimate knowledge of Unicode to understand (I think), and that
shows no typographical sophistication. At several points (keyboard
design, markup language), the author seems to want to change the world
instead of being compatible with what's there.

For all their problems, I find Rosetta and Tron to be more serious
competitors and more interesting than Bytext.

-- 
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing 
with the youth. -- Information Society, Peace and Love, Inc.




Re: POSITIVELY MUST READ! Bytext is here!

2002-01-25 Thread DougEwell2

In a message dated 2002-01-25 20:45:46 Pacific Standard Time, 
[EMAIL PROTECTED] writes:

 Unicode now has a serious competitor. Please read
 about it at www.bytext.org. Everyone on this list
 should find it extremely interesting. 

I just downloaded the PDF file and spent about 10 minutes skimming through 
it.  This is a joke, right?

-Doug Ewell
 Fullerton, California




Re: Unicode 3.2: BETA files updated

2002-01-25 Thread Asmus Freytag

At 10:58 PM 1/24/02 +, David Hopwood wrote:
One possibility is to make VS1 specify what is now the reference glyph,
and VS2 specify the alternate glyph. Unmarked would mean either.

Boy, great minds do think alike. I proposed that in a paper to the UTC
last year. ;-)

You realize that this issue is not limited to variation selectors?
Read the section on Greek phi in http://www.unicode.org/unicode/reports/tr28

The other possibility is to say that to be strictly Unicode-conformant,
fonts should always use the reference glyph for unmarked characters
(ignoring differences only of style). I think this is actually a better
solution in practice; it avoids having to add selectors that would
usually be redundant, and that would interfere with normalisation.
It's also consistent with the Mongolian variant selectors, where
unmarked should mean the first form.

Boy, great minds do think alike. Mark Davis just proposed that in
a paper to the UTC this week.

Unfortunately, this is not a model that's always usable. Please
read the section on phi for background.

By adding a variation, we cannot restrict the glyph range for the
unmarked character - Mongolian being an exception since the unmarked
character's glyph range has been *explicitly* restricted from the
outset to the standard positional forms.

For VS1, the situation is different in that the glyph range of the
*unmarked* character *also* includes the glyph identified by VS1.
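A toy Python model of the distinction drawn above, using hypothetical glyph labels: each sequence maps to the set of glyphs a font may use for it. In the math case the unmarked range contains the VS1 glyph, so no sequence requests the reference glyph alone (the asymmetry David raised); in the Mongolian case the unmarked range was restricted from the outset, so it can. U+1820 (MONGOLIAN LETTER A) is only an example base, and the model collapses Mongolian positional shaping into a single "standard-form" label.

```python
from typing import Dict, FrozenSet

VS1 = "\uFE00"   # VARIATION SELECTOR-1 (math symbols)
FVS1 = "\u180B"  # MONGOLIAN FREE VARIATION SELECTOR ONE

# Hypothetical glyph labels; only the set relationships matter here.
GLYPH_RANGE: Dict[str, FrozenSet[str]] = {
    # Math symbol: the unmarked character may show either glyph,
    # while VS1 narrows the choice to the variant.
    "\u2278": frozenset({"reference", "vs1-variant"}),
    "\u2278" + VS1: frozenset({"vs1-variant"}),
    # Mongolian: the unmarked letter is restricted to its standard
    # positional forms, so its range excludes the FVS1 form.
    "\u1820": frozenset({"standard-form"}),
    "\u1820" + FVS1: frozenset({"fvs1-form"}),
}


def can_request_exactly(base: str, glyph: str) -> bool:
    """True if some sequence starting with `base` allows only `glyph`."""
    return any(
        seq.startswith(base) and glyph_set == {glyph}
        for seq, glyph_set in GLYPH_RANGE.items()
    )
```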

A./




Re: POSITIVELY MUST READ! Bytext is here!

2002-01-25 Thread John Hudson

So, is there a script -- something along the lines of the dialectizer 
recently mentioned here -- that automatically generates 'Competitor to 
Unicode' websites? I wonder, because they all make the same set of claims, 
display the same confusion about or misrepresentation of Unicode, and offer an
eerily identical absence of industry support.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]

... es ist ein unwiederbringliches Bild der Vergangenheit,
das mit jeder Gegenwart zu verschwinden droht, die sich
nicht in ihm gemeint erkannte.

... every image of the past that is not recognized by the
present as one of its own concerns threatens to disappear
irretrievably.
   Walter Benjamin





Re: POSITIVELY MUST READ! Bytext is here!

2002-01-25 Thread Sean M. Burke

At 08:33 PM 2002-01-25 -0800, Bernard Miller wrote:
Hello Unicode list members,
Unicode now has a serious competitor. Please read
about it at www.bytext.org. Everyone on this list
should find it extremely interesting. 

Juvenal said, "It is hard NOT to write satire."

But in future I suggest you resist the temptation.


--
Sean M. Burke  [EMAIL PROTECTED]  http://www.spinn.net/~sburke/





Re: POSITIVELY MUST READ! Bytext is here!

2002-01-25 Thread Rick McGowan

 Unicode now has a serious competitor.

Kllhk!! Kllhk!! Kllhk! Whoa! Almost choked on my tofu burger!

Oh dewd, you have it so, like, all wrong... Universal character encoding  
isn't about Competition and Marketing, it's about everybody doin' it in the  
road, all together like, in love, peace, and harmony.

One of the major, major take-home points is the "U" word in "universal
character encoding." There's only supposed to be one of them, otherwise,
sorry to say, it's just as bloody pointless as saddlin' a herd of cats to  
cart your horse to the flea market.

Please, get a grip.

Rick