RE: glyph selection for Unicode in browsers

2002-10-01 Thread Martin Duerst

At 12:14 02/10/01 -0400, [EMAIL PROTECTED] wrote:

>I agree that 'sniffing' and 'guessing' are ill-defined, and not to be
>relied upon.  However, I find it a bit 'ill-defined' that there is no
>well-defined (web server independent) way for the 'users' to override
>the possibly wrong encoding default of the web server.  Either way
>(a) the user has to do something web server dependent
>(b) the admin has to do changes to the site config
>seems a bit clunky and fragile.
>
>Since the current "resolving order" is obviously already deployed out
>there and relied upon by someone, it cannot be changed, but possibly
>something new could be introduced?

Well, servers can always be improved by the various server implementers.
What standards specify is what goes 'over the wire'.

The only thing you actually have to do is to make sure that the server
doesn't add a 'charset' parameter to the Content-Type header for
the directories you are using. Then the <meta> is the only info,
and is used by the browser.

I'm not sure this is possible with Apache, maybe there is a need
for a RemoveCharset directive similar to RemoveType
(http://httpd.apache.org/docs/mod/mod_mime.html#removetype).
Or maybe there is some other way to get the same result.
If a new directive is desirable, then let's try to hack
the Apache code or to propose it to the Apache people.
Similar of course for other server implementations.
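
As a rough illustration of the precedence discussed here, a client could
resolve the encoding along the following lines. This is only a sketch: it
assumes the HTML 4.01 rule that a 'charset' parameter on the HTTP
Content-Type header takes priority over an in-document <meta> declaration,
and the function names are invented for the example, not taken from any
browser or server.

```python
import re
from email.message import Message

# Loose pattern for a charset declared inside a <meta> tag (illustrative only).
META_RE = re.compile(rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
                     re.IGNORECASE)

def header_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased; None when absent

def meta_charset(html_bytes):
    """Return the charset declared in a <meta> tag near the top, or None."""
    match = META_RE.search(html_bytes[:4096])
    return match.group(1).decode("ascii").lower() if match else None

def effective_charset(content_type, html_bytes):
    """The header wins if it names a charset; otherwise fall back to <meta>."""
    return header_charset(content_type) or meta_charset(html_bytes)

page = (b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=UTF-8"></head></html>')
print(effective_charset("text/html; charset=ISO-8859-1", page))
print(effective_charset("text/html", page))
```

With the server-added parameter present, the page's own declaration is
ignored; once the server sends a bare "text/html", the <meta> value is used.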

Regards,    Martin.






Re: Pound and Lira (was: Re: The Currency Symbol of China)

2002-10-01 Thread Kenneth Whistler

Espen asked:

> Just out of curiosity, is what you call High Ogonek here what ended up as U+0313 or 
>U+0314?

No. It was an erroneous identification of:

U+02BD MODIFIER LETTER REVERSED COMMA

A spacing form for a modifier mark indicating aspiration, 
rather than a non-spacing comma above.
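
The spacing versus non-spacing distinction can be checked against the
Unicode Character Database. A minimal sketch, assuming Python's standard
unicodedata module (the helper name describe is made up for this example):

```python
import unicodedata

def describe(cp):
    """Return (name, general category, canonical combining class)."""
    ch = chr(cp)
    return (unicodedata.name(ch),
            unicodedata.category(ch),
            unicodedata.combining(ch))

# U+02BD is a spacing modifier letter; U+0313 and U+0314 are combining marks.
for cp in (0x02BD, 0x0313, 0x0314):
    name, category, ccc = describe(cp)
    print(f"U+{cp:04X} {name} category={category} ccc={ccc}")
```

A category of Lm marks a spacing modifier letter, while Mn with a non-zero
combining class marks a non-spacing mark.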

--Ken




Re: Mac Unicode question

2002-10-01 Thread Tom Gewecke


>One wishes that Word for OS X, or AppleWorks for OS X, or InDesign
>for OS X, allowed one to input text in Unicode. But one step at a
>time, I guess. :-)

I've been playing around with the java-based ThinkFree Write (part of
ThinkFree Office, 30 day free trial, $50 purchase)

http://www.thinkfree.com/

Although there are various things TF Write cannot yet do, it appears you
can input from all Mac OS X keyboards, Unicode and otherwise, plus the
Character Palette (BMP only, however), format in various ways, and save
results as .doc, .rtf, or UTF-8 .html files.

No luck with UTF-8 plain text or copy/paste, but still seems a lot more
capable than Word X.

I'd be curious what others think of it.








Re: Mac Unicode question

2002-10-01 Thread Michael Everson

At 10:22 -0600 2002-10-01, John H. Jenkins wrote:

>On X, any (non-Classic) application can use Windows TrueType fonts. 
>Carbon applications which do not explicitly use ATSUI or MLTE are 
>limited in how much of the font they can use.  Cocoa apps are pretty 
>much able to do anything.

One wishes that Word for OS X, or AppleWorks for OS X, or InDesign 
for OS X, allowed one to input text in Unicode. But one step at a 
time, I guess. :-)

Actually, InDesign lets you paste Unicode text from a TextEdit RTF 
document, but for instance Devanagari ligatures are undone. Maybe you 
could typeset Polish text this way but there are certainly 
limitations. You can't use a keyboard layout to enter text anyway.
-- 
Michael Everson * * Everson Typography *  * http://www.evertype.com
48B Gleann na Carraige; Cill Fhionntain; Baile Átha Cliath 13; Éire
Telephone +353 86 807 9169 * * Fax +353 1 832 2189 (by arrangement)




Re: Fraktur fonts

2002-10-01 Thread jameskass


James Kass wrote,

> ...ligation for Latin/Greek/Cyrillic can't yet be handled automatically
> with Uniscribe because Uniscribe doesn't yet implement OT features for
> these scripts.
> 
> Microsoft is working on adding CGL OT support to Uniscribe.

I stand cheerfully corrected on this.  It has been pointed out off-list
that Uniscribe does now offer some OT support for scripts such as
Latin and Cyrillic.

Best regards,

James Kass.




Re: Permission to reproduce?

2002-10-01 Thread Kenneth Whistler

Martin Kochanski asked:

> 
> I want to post a Cardbox database on our Web site (Cardbox is 
> the database that we sell) that contains a list of all 
> Unicode characters: hexadecimal code, decimal code, 
> character, and character name (eg. GREEK CAPITAL LETTER OMEGA 
> WITH TONOS). 
> 
> The first three of these elements are in the public domain, 
> but it strikes me that the character names might be 
> considered to be a literary work and therefore copyright. 
> Does anyone know whether I do in fact need to ask permission 
> before listing those names, and if so, whom I need to ask?

In case it wasn't clear from the short discussion that followed,
let me state for the record:

The character names are a normative part of the Unicode Standard,
and are also identically defined as a normative part of the
International Standard, ISO/IEC 10646 (English version). They
are, indeed, a part of those publicly available standard(s), intended
for free, unrestricted use by all users of those standard(s).

So you don't need to ask anyone's permission to list or otherwise
use those character names.

You *would* have to ask permission (from the Unicode Consortium)
before reproducing the exact *form* of the Unicode code charts,
as printed in the Unicode Standard itself, since the form of
the charts and associated name lists printed there *are* under
copyright.
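
A minimal sketch of how such a listing could be generated, assuming Python's
standard unicodedata module as the source of the names. The row layout and
the helper character_row are illustrative only and say nothing about
Cardbox's actual format:

```python
import unicodedata

def character_row(cp):
    """Return (hex code, decimal code, character, character name)."""
    ch = chr(cp)
    return (f"{cp:04X}", cp, ch, unicodedata.name(ch, "<unnamed>"))

# The example character from the question.
print(character_row(0x038F))
```

Note that the names reported depend on the version of the Unicode data
bundled with the Python build in use.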

--Ken




Re: Mac Unicode question

2002-10-01 Thread John Delacour

At 10:22 am -0600 1/10/02, John H. Jenkins wrote:

>On Tuesday, October 1, 2002, at 08:42 AM, Alan Wood wrote:
>
>>I don't think anyone replied to this.  As far as I know, these are the only
>>applications for Mac OS 9 that can use Windows TrueType fonts:
>>
>
>On X, any (non-Classic) application can use Windows TrueType fonts. 
>Carbon applications which do not explicitly use ATSUI or MLTE are 
>limited in how much of the font they can use.  Cocoa apps are pretty 
>much able to do anything.

Even better -- TT fonts (provided they do not use Unicode code 
points) can be used in ANY classic app from System 8.6 onwards (at 
least) with or without OS 10.

To make a font recognisable in OS <10, it must have its file type and 
creator types set.  To do this, select a single font file in the 
finder and run this script:

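 tell application "Finder"
   -- take the single font file currently selected in the Finder
   set fontfile to selection as string as alias
   -- set the type and creator codes so that OS 8/9 recognises the font
   set file type of fontfile to "sfnt"
   set creator type of fontfile to "movr"
 end tell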

TT fonts (such as Arial Unicode MS) can only be used in OS 8/9 in 
applications such as WorldText, which means, so far as I know, 
WorldText period.  Mac developers have not been and still are not 
rushing to produce editors that understand Unicode.  Some of them 
seem to believe their apps understand Unicode and make out to their 
customers that they do, but this is pure fantasy.  The illusion was 
successfully created by Apple while they dragged their feet for years 
and did conjuring tricks with the TEC.

TT fonts, whether Unicode or not, will work fine without modification 
in OS 10 and it suffices to put them in ~/Library/Fonts/.  I forget 
whether you need to log out to activate them.

I am hoping that the first serious Unicode word processor to emerge 
will be Nisus, which has done such wonderful service with 
multilingual stuff in the past.  Microsoft's Office X cannot yet 
display Unicode, though it looks as if it can 'store' Unicode behind 
its lines of dashes without destroying it by converting it.  I 
haven't tried, but it's quite possible that a Word X doc containing 
(undisplayed) Unicode strings can be transferred to Windows and 
displayed properly.  I'm just guessing here and am not going to 
bother trying because I detest MS Word in all its weak flavours.

JD





Re: Old Hungarian

2002-10-01 Thread Michael Everson

It has been pointed out to me that I ought to point people at the SEI 
initiative page at Berkeley: 
http://www.linguistics.berkeley.edu/~dwanders/




Re: glyph selection for Unicode in browsers

2002-10-01 Thread John Hudson

At 03:22 AM 30-09-02, [EMAIL PROTECTED] wrote:

> >I think the idea is that, in a word processor for example
>
>What would you say about a browser?

Probably something about extended style sheets that include typographic 
system tagging. Ideally, as a typographer, I would like something like CSS 
that includes a tag for every registered OpenType Layout feature -- and OT 
'language system' tagging that sits below the level of document language 
tagging etc. --, so that I can create sophisticated online documents with 
the same level of typographic control as I have for print documents. I 
realise that it may be necessary to dress this up as a higher level, 
non-proprietary-technology-specific markup.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]

Those books that allow us to forget the most
are accorded the status of a classic.
   - James Secord





Re: [semi-OT]Primer on message catalog for l10n techniques

2002-10-01 Thread Youtie Effaight

Barry wrote ...

>The thread last week on "comet circumflex" briefly veered ...
>While surfing over the weekend, I came across a presentation

Which leads us to RFC 1925 and the eleventh truth of networking:

"(11) Every old idea will be proposed again with a different
name and a different presentation, regardless of whether it works."


Youtie.








Re: Old Hungarian

2002-10-01 Thread Michael Everson

At 01:14 -0500 2002-02-01, [EMAIL PROTECTED] wrote:
>In a message dated 2002-01-31 20:20:33 Pacific Standard Time,
>[EMAIL PROTECTED] writes:
>
>>  Does anyone know if anything happened since the last
>>  proposal in 1988 to include Old Hungarian
>
>Actually 1998.  But yes, I was wondering about the status of the rovásírás as
>well.

The status is what you would expect. We had a core character set, but 
some users wanted explicit ligatures. I resisted that because 
ligation, while rare, is productive or at least unpredictable. Then a 
hiatus occurred because there were discussions about whether ZWJ 
could do the ligation. It turns out that it can, so that problem is 
solved. Another issue is whether the default directionality of the 
script should be LTR or RTL.

I have not been pressing further on this script because of other 
commitments and lack of resources. Please see Unicode Technical Note 
#4 http://www.unicode.org/notes/tn4 for information about how you 
(all) can help. And it's tax deductible.




RE: Pound and Lira (was: Re: The Currency Symbol of China)

2002-10-01 Thread Michael Everson

At 17:21 +0200 2002-10-01, Marco Cimarosti wrote:

>The Italian lira is not in circulation any more and, when it was, its symbol
>was written with U+00A3, which is the character Italian keyboards have on the key of
>digit "3", in place of the US "#".

And which is the position of the pound sign on UK keyboards.




Re: Mac Unicode question

2002-10-01 Thread John H. Jenkins


On Tuesday, October 1, 2002, at 08:42 AM, Alan Wood wrote:

> I don't think anyone replied to this.  As far as I know, these are the 
> only
> applications for Mac OS 9 that can use Windows TrueType fonts:
>

On X, any (non-Classic) application can use Windows TrueType fonts.  
Carbon applications which do not explicitly use ATSUI or MLTE are 
limited in how much of the font they can use.  Cocoa apps are pretty 
much able to do anything.

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://www.tejat.net/





Re: UTF-8 mail must be _all_ UTF-8

2002-10-01 Thread Stefan Persson

- Original Message -
From: "John Delacour" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "Stefan Persson" <[EMAIL PROTECTED]>
Sent: Tuesday, October 01, 2002 10:08 AM
Subject: UTF-8 mail must be _all_ UTF-8

>On an unrelated point of encoding -- wrong!  You are sending a
>message in charset "UTF-8" with a signature containing an unconverted
>character from the Latin-1 charset. I don't use OE very often, but it
>looks as though you need to create a separate UTF-8 encoded signature
>file to attach to messages you send in UTF-8, since I doubt if any
>mailer is going to tolerate this mixture, and there is no way your OE
>can guess that your signature is iso-8859-1 encoded.

That signature is an ad added by the Yahoo! server. I can't do anything
about it.

Stefan

_
Följ VM på nära håll på Yahoo!s officielle VM-sajt www.yahoo.se/vm2002
Håll dig ajour med nyheter och resultat, med vinnare och förlorare...





Re: Comma below, cedilla, and Gagauz

2002-10-01 Thread Herman Ranes

The core area for written Gagauz is the Gagauz Autonomous Region 
(Gagauz-Yeri) in Moldova. The language has had official status since 
autonomy in 1994. Virtually every Gagauz in Moldova who is literate in 
Gagauz is literate in Moldovan[Roumanian] and/or Russian as well.

Moldovan[Roumanian] uses Latin script with s-comma and t-comma. Turkish 
_is_ closely related to Gagauz (they are both Turkic languages), but the 
language is not used in Gagauz-Yeri. Turkish uses c-cedilla and 
s-cedilla. I suppose it would be extremely difficult to educate the people 
of Gagauz-Yeri to use the s-cedilla to write Gagauz, and the s-comma to 
write Moldovan[Roumanian] ... The only reasonable choice would be to use 
comma-letters only in both Latin-script languages on Gagauz-Yeri territory.
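
For reference, the comma-below and cedilla letters are separate characters
that canonical normalization does not merge. A small sketch, assuming
Python's standard unicodedata module (the helper name decompose is
hypothetical):

```python
import unicodedata

def decompose(ch):
    """Return the NFD decomposition of ch as a list of U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]

# s/t with cedilla versus s/t with comma below
for ch in ("\u015f", "\u0219", "\u0163", "\u021b"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {decompose(ch)}")
```

The cedilla letters decompose to the base letter plus U+0327, the
comma-below letters to the base letter plus U+0326, so text in one form
never compares equal to text in the other without an explicit mapping.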

But this is less an encoding matter, and more a political one, which 
will be decided upon in Chișinău and Komrat/Comrat. Moldova is the most 
impoverished country in Europe; are these somewhat subtle matters 
given high priority there?

Is there any Gagauz to ask, Michael ... ?

-Herman Ranes

Michael Everson skreiv:
> At 11:10 -0400 2002-09-26, Robert wrote:
> 
> 
>> The proper encoding of those letters is with *cedilla* (yup -- the 
>> French kind...); thus, c-cedilla, g-cedilla, s-cedilla, t-cedilla, and 
>> so on!
> 
> 
> The proper encoding of the relevant ones in Romanian is s-comma-below 
> and t-comma-below.
> 
> The proper encoding of the relevant ones in Turkish is c-cedilla and 
> s-cedilla.
> 
> What the proper encoding in Gagauz is is problematic. It is a Turkic 
> language so one would expect cedilla, but it is used in Romania and 
> Moldova (and Bulgaria and Greece), so the conclusion as to which is 
> preferred is not necessarily foregone.
> 
> We are going to ask the Gagauzi.


-- 
Herman Ranes  Høgskolen i Sør-Trøndelag
   Avdeling for teknologi
Telefon   +47 73559606Institutt for elektroteknikk
Telefaks  +47 73559581
<[EMAIL PROTECTED]>  N-7004 TRONDHEIM
http://www.hist.no/~herman/   NOREG





RE: Pound and Lira (was: Re: The Currency Symbol of China)

2002-10-01 Thread Marco Cimarosti

Kenneth Whistler wrote:
> [...] So it is possible that the lira sign
> simply derives from a draft list that was standardized
> without anyone ever spending time to debate the pound/lira
> symbol unification first. [...]

If it proves true that the lira sign was a unification fault, why not
state it officially in the next book?

The current information is misleading:

00A3  POUND SIGN
      = pound sterling, Irish punt
      x (lira sign - 20A4)

20A4  LIRA SIGN
      * Italy, Turkey
      x (pound sign - 00A3)

Why not substitute it with something more sensible, e.g.:

00A3  POUND SIGN
      = pound sterling, Irish punt, Italian lira, Turkish lira, etc.
      x (lira sign - 20A4)

20A4  LIRA SIGN
      * Intended for lira, but not widely used.
      * Preferred character for lira (Italy, Turkey, etc.) is 00A3.
      x (pound sign - 00A3)

The Italian lira is not in circulation any more and, when it was, its symbol
was written with U+00A3, which is the character Italian keyboards have on the key of
digit "3", in place of the US "#".

I don't know what the situation is in Turkey, Cyprus, Egypt, and so on, but I
would be very surprised to learn that anyone ever used U+20A4.

_ Marco




RE: The Currency Symbol of China

2002-10-01 Thread Barry Caplan

At 12:50 AM 10/1/2002 -0700, Ben Monroe wrote:
>> For instance, IIRC, Isabella Bird wrote in her (British) English
>> travelogue in the early Meiji restoration era (1878 AD)
>> of travels to Yedo (now commonly called "Edo" in the literature, and
>> known by its modern name to all as "Tokyo"). She called Tokyo "Tokiyo".
>Just a small correction. The Meiji Restoration was in 1867 (some
>historians view it as 1868 though).

That's a timezone issue, right? :) Actually the 1878 date I referred to is the date of 
the travels discussed in the book, not the date of the Meiji Restoration. The book 
itself, according to my copy from about 100 years later, was first published in 1880.

Barry Caplan
www.i18n.com






RE: glyph selection for Unicode in browsers

2002-10-01 Thread jarkko.hietaniemi


> Sniffing isn't a good idea in the long term. It may work
> for simple web page serving, but as soon as you go XML and
> start to move data around without the user having a chance
> to see it frequently, you'll end up with a big mess.
> 
> Also, 'guessing' is very ill-defined. You might serve
> a document to your favorite browser, and it looks okay.
> But other browsers might guess a bit differently, or
> a new version of your favorite browser may guess a bit
> differently, and off you are.

I agree that 'sniffing' and 'guessing' are ill-defined, and not to be
relied upon.  However, I find it a bit 'ill-defined' that there is no
well-defined (web server independent) way for the 'users' to override
the possibly wrong encoding default of the web server.  Either way
(a) the user has to do something web server dependent
(b) the admin has to do changes to the site config
seems a bit clunky and fragile.

Since the current "resolving order" is obviously already deployed out
there and relied upon by someone, it cannot be changed, but possibly
something new could be introduced?










Re: The Currency Symbol of China

2002-10-01 Thread Patrick Andries


- Original Message -
From: "Ben Monroe" <[EMAIL PROTECTED]>

> - the yen currency began in 1871

And written as such since 1871 in French, according to my Dictionnaire
historique de la langue française, which says that it "is the adaptation
(1871) of a Japanese word whose normal transcription would be èn, itself
from Chinese yüan 'round, circle' and also 'dollar' (as a round
coin)."

> - there are many foreign languages that have common words with the same
> spelling as "en", so there was a need to avoid this. French and Spanish
> have an "en" meaning "inside (something)"

True even though French could use an accent (as in sample above) to
disambiguate (en is a single nasal sound, èn is not). I'm not sure that the
French and Spanish had much influence on the transcription of Japanese
(English, Dutch or Portuguese maybe).

> and Dutch has an "en" meaning
> "and then". [I really do not know. I am just repeating what it says
> there.]

In Dutch « en » simply means « and » (or « both  and »); there is no
more a connotation of « then » than in English.

P. Andries






RE: The Currency Symbol of China

2002-10-01 Thread Thomas Chan

On Mon, 30 Sep 2002, Thomas Chan wrote:

> (Was U+56ED what you saw, James?--I don't have my Krause catalog by me at
> the moment, but I think it was present on older PRC coinage.)

A correction to myself here--I thought I had seen U+56ED as a currency
unit, but now I cannot find a reference in my notes, so I'm retracting
this one.


James Kass said:
>I don't blame you.  According to Krause...
>One Dollar (Yuan) = 100 Cents (Fen/Hsien) = 1000 Cash (Wen/Ch'ien) =
>(=)  0.72 Tael (Liang) = 7 Mace and 2 Candareens
>...and, that's just for starters.

Well, the last part is a different system--mace and candareens are weight
measures for silver coins as part of the "tael" system: liang/qian/fen/li
(tael/mace/candareen/?).  Hence, there are three systems: dollar, cash,
and tael.

The 1/100th units fen and xian (hsien in Krause) are part of different
systems: yuan-jiao-fen in the north, and yuan-hao-xian in the south.
(xian U+4ED9 < English 'cent', even in Macau, where 1/100th of a pataca is
an avo.)  The northern and southern systems may be seen residually in
contemporary Hong Kong and Macau, and historically during the early 20th
century during a period of provincial minting in mainland China, where
people used their local terminology on their coins, with the exception
of the 1.0 unit.  The situation is similar for the 1/10th unit; jiao in
the north and hao in the south.


Marco Cimarosti said:
>U+5143 元
>U+5186 円
>U+5706 圆
>U+570E 圎
>U+5713 圓

Thank you for finding these--I didn't realize that U+570E was encoded
independently of U+5713, and not as a font variant of the latter.  (And I
had forgotten the obvious U+5713 ~ U+5706 connection.)

I checked Krause--U+5713 may be seen on pre-war Japanese coinage for
"yen".


Alan Wood said:
>I have added all of the symbols from this discussion to the second table on
>my page at:
>http://www.alanwood.net/unicode/currency_symbols.html

Please remove U+56ED--that was my mistake.

U+6587 is not entirely appropriate there--while it was a currency unit
(approx. 1/1000th yuan), it was gone in all regions by the early 1930s,
and now it is just (at least) a colloquial Cantonese synonym for yuan, sort
of like northern kuai4 U+584A/U+5757 'piece'.  I can provide you with a
bunch of other terms for 1/10th and 1/100th units, but once one steps into
the realm of Han characters, one is no longer dealing with symbols but
words, and the list can inflate very quickly unless restrictions are set,
such as primary currency units (not 1/10th or 1/100th units) in
contemporary use (not historical) that appear on currency (not
other terms like "bucks", "benjamins", etc).

U+5713 I wouldn't list as "yen/yuan variant"--it should be on the same
level as U+5143 and U+5186, as U+5713 (Yuan) is the unit used in Taiwan
and Hong Kong on the currency (despite being "dollars" in English).


Thomas Chan
[EMAIL PROTECTED]





Re: Pound and Lira (was: Re: The Currency Symbol of China)

2002-10-01 Thread Jim Allan

Kenneth Whistler posted:

"It is a deeper subject to figure out how the LIRA SIGN got into
Unicode 1.0 in the first place, and I don't have all the
relevant documents to hand to track it down. It was certainly
already in the April 1990 pre-publication draft of Unicode 1.0
which was widely circulated."

The distinction between pound currency sign and lira currency sign 
appears in the HP Roman-8 character set, still the default character set 
in HP laser printers. See http://www.kostis.net/charsets/hproman8.htm. 
AF is LIRA SIGN and BB is POUND SIGN.

Someone must have thought the difference significant to include both 
glyphs in that set. This might be the source for Unicode. Not including 
both symbols would have broken encoding to Roman-8.
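
The Roman-8 positions mentioned above can be checked with the hp_roman8
codec that ships with Python's standard library. This is only a sketch of
the mapping as that codec defines it; it is not evidence of where Unicode
actually took the character from:

```python
import unicodedata

# Decode the two Roman-8 byte values cited above.
lira = bytes([0xAF]).decode("hp_roman8")
pound = bytes([0xBB]).decode("hp_roman8")

for byte, ch in ((0xAF, lira), (0xBB, pound)):
    print(f"0x{byte:02X} -> U+{ord(ch):04X} {unicodedata.name(ch)}")

# Both characters survive a round trip, which is what a one-to-one
# mapping to Roman-8 requires.
assert lira.encode("hp_roman8") == b"\xaf"
assert pound.encode("hp_roman8") == b"\xbb"
```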

Jim Allan






RE: Mac Unicode question

2002-10-01 Thread Alan Wood

David

I don't think anyone replied to this.  As far as I know, these are the only
applications for Mac OS 9 that can use Windows TrueType fonts:

1)  WorldText, an editor produced by Apple that requires OS 9.1 or later.

2)  SUE (Simple Unicode Editor)
http://members.tripod.com/%7Etomaszek/sue.html

3)  Pepper, a text editor that runs under both Mac OS 9 and Mac OS X 10
http://www.hekkelman.com/pepper.html

4)  MLTE Demo, a text editor for OS 9
http://www.merzwaren.com/snippets/index.html#mltedemo

5) Possibly jEdit, a Java text editor for programmers, but I cannot get Java
to work on my OS 9.2.2
http://www.jedit.org

6)  Possibly Simredo, a Java text editor, but I cannot get Java to work
http://www4.vc-net.ne.jp/~klivo/sim/simeng.htm

Alan Wood
http://www.alanwood.net (Unicode, special characters, pesticide names)

> -Original Message-
> From: David J. Perry [SMTP:[EMAIL PROTECTED]]
> Sent: Friday, August 23, 2002 8:26 PM
> To:   [EMAIL PROTECTED]
> Subject:  Mac Unicode question
> 
> I have a large unicode TT font (Windows/OS X) that some people want to
> use under earlier versions of Mac OS X.  I know that Unicode support
> began with OS 8.5 but that many applications were never updated to take
> advantage of it.
> 
> I've been told that Mac apps that _were_ updated can use Windows TT
> fonts just as OS X can.  I'm dubious but the source usually knows what
> he's talking about.  Can anybody confirm?  I don't have an older Mac to
> test on.
> 
> Thanks - David
> 




RE: The Currency Symbol of China

2002-10-01 Thread Alan Wood

I have added all of the symbols from this discussion to the second table on
my page at:

http://www.alanwood.net/unicode/currency_symbols.html

Alan Wood




RE: The Currency Symbol of China

2002-10-01 Thread Marco Cimarosti

Stefan Persson wrote:
> > Similarly, "yen" is just the Japanese (kun) pronunciation of Chinese
> "yuan".
> > IMHO, the preferred symbol for both currencies should be U+00A5.
> 
> Wrong:
> 
> Yen (円) is U+5186, while yuan (元) is U+5143.
> 
> "Yen" is an ancient "on" pronunciation for U+5186; today it's 
> pronounced "en."

They are just two different spellings of the same word, "yuán", which means
"currency unit" or "circle". A quick search brought up at least six variants,
all of which are pronounced "(y)en" or "(g)en" in Japanese on:

U+5143 元
U+5186 円
U+5706 圆
U+570E 圎
U+5713 圓
U+571C 圜

_ Marco




Re: The Currency Symbol of China

2002-10-01 Thread Raymond Mercier

In the Lonely Planet Guide to China the currency RMB is always abbreviated 
simply as Y, and this in a volume with no shortage of Chinese characters, 
and where the Japanese units are indicated as U+FFE5.
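
The fullwidth form mentioned here is a compatibility variant of the ordinary
yen sign. A small sketch, assuming Python's standard unicodedata module:

```python
import unicodedata

fullwidth = "\uffe5"
folded = unicodedata.normalize("NFKC", fullwidth)

# Compatibility normalization folds the fullwidth form to U+00A5.
print(f"U+{ord(fullwidth):04X} {unicodedata.name(fullwidth)}")
print(f"U+{ord(folded):04X} {unicodedata.name(folded)}")
```

Canonical normalization (NFC or NFD) leaves U+FFE5 unchanged; only the
compatibility forms fold it.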

Raymond Mercier






UTF-8 mail must be _all_ UTF-8

2002-10-01 Thread John Delacour

At 10:08 pm +0200 30/9/02, Stefan Persson wrote:

>Wrong:
>
>Yen (円) is U+5186, while yuan (元) is U+5143.
>
>"Yen" is an ancient "on" pronunciation for U+5186; today it's pronounced
>"en."
>
>Stefan
>
>_
>Gratis e-mail resten av livet på www.yahoo.se/mail
>Busenkelt!

On an unrelated point of encoding -- wrong!  You are sending a 
message in charset "UTF-8" with a signature containing an unconverted 
character from the Latin-1 charset. I don't use OE very often, but it 
looks as though you need to create a separate UTF-8 encoded signature 
file to attach to messages you send in UTF-8, since I doubt if any 
mailer is going to tolerate this mixture, and there is no way your OE 
can guess that your signature is iso-8859-1 encoded.
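
A minimal sketch of how a receiving program might detect this kind of
mixture, assuming Python. The helper name first_invalid_utf8 and the sample
strings are illustrative only:

```python
def first_invalid_utf8(data):
    """Return the offset of the first byte that breaks UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None

# A body correctly encoded as UTF-8 ...
body = "Yen (\u5186) is U+5186".encode("utf-8")
# ... followed by a signature appended as raw Latin-1 bytes.
signature = "Gratis e-mail resten av livet p\u00e5 www.yahoo.se/mail".encode("latin-1")

print(first_invalid_utf8(body))
print(first_invalid_utf8(body + b"\n" + signature))
```

The first call reports no problem; the second reports the offset of the
Latin-1 byte for "å", which is not a valid UTF-8 sequence in that position.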

JD

PS.  This message is _not_ sent with charset UTF-8





RE: The Currency Symbol of China

2002-10-01 Thread Ben Monroe

Barry Caplan wrote:

> Wow ! I brought Ben out of lurk status after 6 months!

Wow, someone still remembers me after 6 months. I hope it isn't because
I left a bad impression or seriously annoyed someone (smile).
As I'm sure many are, I have been busy with work and other projects.
I am quite interested in many of the topics here, but am afraid that I
am often out of my league for many, so often I do not have anything to
contribute.

> For instance, IIRC, Isabella Bird wrote in her (British) English
> travelogue in the early Meiji restoration era (1878 AD)
> of travels to Yedo (now commonly called "Edo" in the literature, and
> known by its modern name to all as "Tokyo"). She called Tokyo "Tokiyo".

Just a small correction. The Meiji Restoration was in 1867 (some
historians view it as 1868 though).

For those who read Japanese, an interesting exchange regarding the
spelling of "yen" (and many other topics) occurred in 1998 and 2000 on a
bulletin board at the following site. The various posts were collected
and put together here:
http://www.asahi-net.or.jp/~hi5k-stu/nihongo/hatuon.htm

I have written a _brief_ summary of interesting points regarding "yen"
in English below.
Some of the topics we have already discussed.
There are many, many other broad topics such as /wi, we/ --> /i, e/ and
other such changes which I do not intend to get into.
(Please note that these are not my postings or ideas. Some I agree with;
others I don't.)

- the yen currency began in 1871
- a foreigner introducing minting technology at that time found it
easier to say "yen", so that is how it was written
- it is easier for Westerners to say [ye] than [e]
- romanization of "en" could be easily mispronounced by English speakers
as something close to "in"; to avoid this, "y" was prefixed
- common romanization for /e/ at that time was "ye" so "en" just was
written as "yen"; the Christian texts used "ye" for /e/. Even though
[ye] gradually became [e], the spelling system did not change
- Chinese currency is called "yuan" and due to that influence, it is
called "yen" in Japan
- there are many foreign languages that have common words with the same
spelling as "en", so there was a need to avoid this. French and Spanish
have an "en" meaning "inside (something)" and Dutch has an "en" meaning
"and then". [I really do not know. I am just repeating what it says
there.]
- English speakers seem to hear "yen" instead of "en"

> I also think (but I could be wrong) that "ye" is not one of the
> characters in the famous Buddhist poem that
> uses each of the kana once and only once, and establishes a de facto
> sorting order by virtue of being the only such poem.

Kenneth Whistler was kind enough to respond to this already.

Kenneth Whistler wrote, regarding kana distinctions in the Iroha poem:

> i ro ha ni ho he to
> chi ri nu ru wo
> wa ka yo ta re so
> tsu ne na ra mu
> u wi no o ku ya ma
> ke fu ko e te
> [^ that is one ]
> a sa ki yu me mi shi
> ye hi mo se su
> [^ that is the other -- probably should be (w)e ]

Yes, it is and should be /we/. What makes this poem so important is that
it shows a contrast between /i/ and /wi/ and a contrast between /e/ and
/we/. However, by this time there is no contrast between /e/ and /ye/,
indicating that the distinction between these two sounds was no longer
made. These two sounds are usually thought to have
merged into [ye] (of course being written with "e"). In addition to what
I wrote before about this merging, Moto'ori Norinaga (1730-1801; one of
the four great "kokugaku" (linguist isn't quite right; perhaps
philologist is closer) scholars) noticed that the Heian poetry, which
was in various 5-7- patterns, would occasionally have an extra mora
(called "jiamari") in the 5-line resulting in 6 mora. He noticed
specifically that this "jiamari" occurred in mora ending in /a, i, u,
o/, but not /e/. He speculates that this /e/ in old texts was different
from the other vowels; i.e., not a vowel, but actually [ye]. (I do not
think the concept of "glides" or semi-vowels was an issue back then.)

> [Attributed to middle Heian, around A.D. 1000.]
> BTW, the translation of Kukai's iroha poem at that link leaves much
> to be desired, though the various version

Yes, it is traditionally attributed to Ku[u]kai (774-835). (He is also,
incorrectly, traditionally regarded as the creator of hiragana.)
However, during the era that he lived, both /e/ and /ye/ should have
been distinguished from each other, but they are not in this poem. This
implies that it should have been written roughly after 950 or so. Also,
the style of the poem, known as "imayou", indicates a date after
Kuukai lived. The oldest source of this Iroha poem that I know of is
from the "Konkoumyou saishouou kyou" sutra of 1079.

Another important poem often used as evidence for sound distinctions is
the "Ametuti no kotoba" (or "Ametsuchi no kotoba" if you prefer). It
dates from 967. Like the Iroha poem, it uses 48 kana each once with the
exception of /e
