RE: Saying characters out loud (derives from hash, pound, octothorpe?)

2002-07-11 Thread Becker, Joseph


OK, I was relying on Ken to retrieve this from archive, but he seems to be
off researching frogiform glyphs.  Check it out on Google for more
references.

Joe


Date:  7 Aug 90 17:00:43 PDT (Tuesday)
Subject: Re: Names of characters



<>!*''#
^@`$$-
!*'$_
%*<>#4
&)../
|{~~SYSTEM HALTED
 

Waka waka bang splat tick tick hash,
Caret at back-tick dollar dollar dash,
Bang splat tick dollar under-score,
Percent splat waka waka number four,
Ampersand right-paren dot dot slash,
Vertical-bar curly-bracket tilde tilde CRASH.










RE: Bird headed CJK variants?

2002-01-07 Thread Becker, Joseph


Bird script, cloud script, tadpole script, and many more are illustrated on
the fantastic Hawley Chinese Culture Chart "Fanciful Seal Characters".
Apparently the great Hawley charts are still available:

http://www.wmhawley.com/china1.html

And no, there is no Tadpole Script Area in Unicode.

Joe






RE: OT: Chocolate Letters

2001-12-07 Thread Becker, Joseph


I received my chocolate "B" from my Dutch co-worker two days ago, 5
December.  He apologized that the store had run out of "J"'s ... but of
course "B"'s contain a lot more chocolate!  I'm trying to teach him my
Chinese name ...

Joe B.



Date:  5 Dec 90 04:46:00 PST (Wednesday)
Subject: fonts for St. Nicolas
From:   "J. W. van Wingen" <[EMAIL PROTECTED]>
To: Multiple recipients of list ISO10646 

Dear Colleagues
For this special day I have a very special topic. Today is St. Nicolas
Eve, that is our "boxing day". Thus it is Letter Day, for people use to
give their friends and relations as a traditional present the Initial
Letter of their name, usually in chocolate. Thus there are enormous
piles of chocolate letters in the shops, all to be sold before tomorrow.

From our point of view the curious aspects are that these letters are
all capital, of the same font (serif), of two sizes only (large and
medium, sometimes also small, for children), and of the same weight
(within one size). But not the whole Latin alphabet is covered, there is
obviously a subrepertoire in use. This can only be discovered
empirically, and it is different for the various makes and producers.
(There are about 4 brands, Droste, Verkade, Baronie, Cote d'or (new),
and 3 chain-brands, V&D, Hema, Jamin). The I is a rarity, perhaps
because it is difficult to design one of the same weight as the other
letters. Q, X, Y are only available in one chain store, but U and Z are
also difficult to obtain. The others are being produced according to
some frequency distribution, which does not always correspond to
consumers' demands, and when the Day comes nearer, you can see many
people frantically delving in the piles, hoping to find at last the
letter they need.

The novelty of this year is the appearance of a new font (Verkade),
modelled on the computer display of straight lines and thus called in
Dutch "digitaal".

Thus, I hope, I could contribute for today to your knowledge and
amusement.

Best regards, Johan van Wingen








RE: What should be radicals

2001-07-09 Thread Becker, Joseph


> Unicode is going to stick with the KangXi radical system

There Unicode goes again, flouting the will of the people ... while
meanwhile in another thread an esteemed Unicode elder has proposed the death
radical.  It's time to bring this system into the 21st Century: where's the
plastics radical, the fast-food radical, the unix radical?!

Joe





FW: Learn more about Windows XP's international features

2001-04-16 Thread Becker, Joseph


FYI,

Joe

-----Original Message-----
From: Dr. International [mailto:[EMAIL PROTECTED]]
Sent: Friday, April 13, 2001 6:39 PM
To: [EMAIL PROTECTED]
Subject: Learn more about Windows XP's international features

Dear Friends,

One of my colleagues recently wrote an article on new Windows XP (code name
Whistler) international support.

"This two part article highlights the international and multilingual
functionality of Windows 2000 and Windows XP. Part One provides a brief
review of Windows 2000's international support. Part Two highlights Windows
XP's improvements and discusses the expanded feature list for use in a
global solution."

You can find it at:
http://www.microsoft.com/globaldev/articles/winxpintl.asp

Stay tuned for another chapter of "Ask Dr. International".

Kind Regards,

Dr. International
Windows Division
http://www.microsoft.com/globaldev/




RE: Does anyone know what language is this?

2000-12-21 Thread Becker, Joseph


Possibly one of the local dialects of Indonesia?

> apo kabar kau di sano

The first Indonesian/Malay we learn is the greeting "Apa khabar?!",
literally "What is the news?!" ... (later we learn that "khabar" is
Arabic!).

Joe






RE: Devanagari Consonant RA Rule R2

2000-11-08 Thread Becker, Joseph


EM> Is the rule in error, or is it written to
EM> cover some obscure case that most software doesn't bother with?

AJ> The RA[sup] is seen applied to the independent vowel Vocalic R (U+090B) in
AJ> printed samples in Sanskrit.

Yes, this clause of the rule is intended to apply (just) to this spelling of
"rr", treated as though it were a conjunct, as illustrated in line (4) of
Figure 9-3 on p. 214.

Joe




[OT] Mumbles of Earth

2000-10-13 Thread Becker, Joseph


Perhaps you remember that the Voyager spacecraft carried a gold phonograph
record with greetings in 55 languages for the spacepersons out there.  The
individual audio clips of those "Murmurs of Earth" are nicely posted on the
"Languages" link under:

http://vraptor.jpl.nasa.gov/voyager/record.html

indexed by language and with English translations.

The aliens are out of luck if they speak Pig Latin (or if they threw away
their phonograph when CD's came out), anyhow

Joe B. sez check it out



FYI: Tatap => Tatar

2000-09-01 Thread Becker, Joseph


Friday September 1 8:24 AM ET

Russia Region Drops Cyrillic Letters 

MOSCOW (AP) - One of Russia's largest republics marked the start of the new
school year Friday by dropping Cyrillic in favor of the Latin alphabet, in
part because it wants closer ties with Europe.

Schools in Tatarstan will now use the Latin alphabet for written work in the
local Tatar language, spokeswoman Zukhra Minekhanova said. The transition
from Cyrillic will take 10 years, she said.

Tatarstan, located 470 miles east of Moscow, has a population of 4 million
and is better off than most republics because of its considerable oil
deposits. It has been prominent in shirking central control from Moscow and
the adoption of the Latin alphabet will underline the trend.

President Vladimir Putin has been seeking to restore tight central control
over the republics that make up the Russian federation.

Minekhanova said the change was necessary because Cyrillic was not capable
of transliterating all the sounds in Tatar and because it would make
European culture more accessible to students. 

[END]




What is "Unicode" in Chinese?

2000-07-24 Thread Becker, Joseph


It seems that Chinese is the only major language in which the term "Unicode"
needs to be translated rather than transliterated.  It may be time to gather
up current usage and select an "official" translation, and perhaps to bless
one or more informal ones.

We have collected these candidates so far:

統一碼        tongyi ma             unified/unification code
單碼          dan ma                unit code
(標準)萬國碼  (biaozhun) wanguo ma  (standard) multinational code
國際碼        guoji ma              international code

Please let us know if you have found these or other terms in actual current
usage.  Or, if you have another suggestion, even better than all those.

Note that the goal here is simply to find the distinctive translation for
the term "Unicode", not to designate any other international or Chinese
standards related to Unicode.

Joe











RE: Unicode in VFAT file system

2000-07-21 Thread Becker, Joseph


Jony Rosenne, who has been a great contributor since or before the
beginning, wrote in an off moment:

> UTF-8 is a biased transformation format designed to save American and
> Western Europeans storage space and to give some people a warm feeling by
> keeping Unicode in the familiar 8 bit world.

FYI, below are the design goals of UTF-8 as specified by its originators,
Ken Thompson et al @ ATT.

Joe


---
From: [EMAIL PROTECTED]
Date: Tue, 8 Sep 92 03:22:07 EDT
To: [EMAIL PROTECTED]
Subject: (XoJIG 620) 

Here is our modified FSS-UTF proposal.  The words are the same as on the
previous proposal.  My apologies to the author.  The code has been tested to
some degree and should be in pretty good shape.  We have converted Plan 9 to
use this encoding and are about to issue a distribution to an initial set of
university users.

File System Safe Universal Character Set Transformation Format (FSS-UTF)
--

With the approval of ISO/IEC 10646 (Unicode) as an international standard
and the anticipated wide spread use of this universal coded character set
(UCS), it is necessary for historically ASCII based operating systems to
devise ways to cope with representation and handling of the large number of
characters that are possible to be encoded by this new standard.

There are several challenges presented by UCS which must be dealt with by
historical operating systems and the C-language programming environment.
The most significant of these challenges is the encoding scheme used by UCS.
More precisely, the challenge is the marrying of the UCS standard with
existing programming languages and existing operating systems and utilities.

The challenges of the programming languages and the UCS standard are being
dealt with by other activities in the industry.  However, we are still faced
with the handling of UCS by historical operating systems and utilities.
Prominent among the operating system UCS handling concerns is the
representation of the data within the file system.  An underlying assumption
is that there is an absolute requirement to maintain the existing operating
system software investment while at the same time taking advantage of the
use of the large number of characters provided by the UCS.

UCS provides the capability to encode multi-lingual text within a single
coded character set.  However, UCS and its UTF variant do not protect null
bytes and/or the ASCII slash ("/") making these character encodings
incompatible with existing Unix implementations.  The following proposal
provides a Unix compatible transformation format of UCS such that Unix
systems can support multi-lingual text in a single encoding.  This
transformation format encoding is intended to be used as a file code.  This
transformation format encoding of UCS is intended as an intermediate step
towards full UCS support.  However, since nearly all Unix implementations
face the same obstacles in supporting UCS, this proposal is intended to
provide a common and compatible encoding during this transition stage.


Goal/Objective
--

With the assumption that most, if not all, of the issues surrounding the
handling and storing of UCS in historical operating system file systems are
understood, the objective is to define a UCS transformation format which
also meets the requirement of being usable on a historical operating system
file system in a non-disruptive manner.  The intent is that UCS will be the
process code for the transformation format, which is usable as a file code.

Criteria for the Transformation Format
--

Below are the guidelines that were used in defining the UCS transformation
format:

1) Compatibility with historical file systems:

Historical file systems disallow the null byte and the ASCII slash
character as a part of the file name.

2) Compatibility with existing programs:

The existing model for multibyte processing is that ASCII does not
occur anywhere in a multibyte encoding.  There should be no ASCII code
values for any part of a transformation format representation of a character
that was not in the ASCII character set in the UCS representation of the
character.

3) Ease of conversion from/to UCS.

4) The first byte should indicate the number of bytes to follow in a
multibyte sequence.

5) The transformation format should not be extravagant in terms of
number of bytes used for encoding.

6) It should be possible to find the start of a character
efficiently starting from an arbitrary location in a byte stream.


Proposed FSS-UTF


The proposed UCS transformation format encodes UCS values in the range
[0,0x7fffffff] using multibyte characters of lengths 1, 2, 3, 4, 5, and 6
bytes.  For all encodings of more than one byte, the initial byte determines
the number of bytes used and the high-order bit in each byte is set. 
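
[Archive note] The bit-layout table that followed this paragraph is cut off
in this copy. As a rough illustration of criteria 1, 2, 4 and 6 above, here
is a minimal Python sketch. It assumes the widely documented 1-to-6-byte
layout (lead bytes 0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx, 111110xx,
1111110x, each followed by 10xxxxxx continuation bytes); that layout, and
the names fss_utf_encode and char_start, are assumptions for illustration
and are not taken from the text above.

```python
# Sketch only: the layout table is an assumption, since the original table
# is missing from this archive copy.

# (largest value in the range, lead-byte marker bits, continuation bytes)
_LAYOUT = [
    (0x7F,       0x00, 0),
    (0x7FF,      0xC0, 1),
    (0xFFFF,     0xE0, 2),
    (0x1FFFFF,   0xF0, 3),
    (0x3FFFFFF,  0xF8, 4),
    (0x7FFFFFFF, 0xFC, 5),
]


def fss_utf_encode(value: int) -> bytes:
    """Encode one UCS value as a 1-to-6-byte sequence."""
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError("value outside [0, 0x7fffffff]")
    # Pick the shortest form whose range covers the value (criterion 5).
    for limit, marker, ntrail in _LAYOUT:
        if value <= limit:
            break
    out = bytearray()
    # Emit continuation bytes from the low-order end: 10xxxxxx, six bits each.
    for _ in range(ntrail):
        out.append(0x80 | (value & 0x3F))
        value >>= 6
    # The lead byte carries the length marker plus the remaining high bits
    # (criterion 4).
    out.append(marker | value)
    out.reverse()
    return bytes(out)


def char_start(buf: bytes, pos: int) -> int:
    """Back up from pos to the first byte of the character containing it."""
    # Continuation bytes all look like 10xxxxxx, so skip over them
    # (criterion 6).
    while pos > 0 and (buf[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos
```

Under this assumed layout, every byte of a multibyte sequence has its high
bit set, so neither the null byte nor the ASCII slash can appear inside one
(criteria 1 and 2), and ASCII values encode as themselves.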

RE: Unicode FAQ addendum

2000-07-20 Thread Becker, Joseph


>>| C1 says "A process shall interpret Unicode code values as 16-bit
>>| quantities."

DE> I think the focus here was supposed to be on the fact that Unicode code
DE> values are *not 8-bit* quantities.

This may be the path to an update that is pithy yet true.  The original
mantra, paraphrased in C1 and 1), was just "Globally replace 8 by 16".
Reality later obsoleted the original design, bringing us UTF-8, surrogates,
and UTF-32; all good things, but less pithy.  Since we needn't quibble
terminology in an informal statement, I wouldn't have a problem with the
simple update:

1) Unicode code units are not 8 bits long; deal with it.

Joe
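
[Archive note] To make the "not 8 bits" point concrete, here is a small
Python sketch that counts code units for the same text in the three encoding
forms mentioned above. The helper name code_units is hypothetical; the codec
names are Python's standard ones, with the "-le" variants chosen so that no
byte-order mark is counted.

```python
def code_units(text: str) -> dict:
    """Count code units needed for text in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(text.encode("utf-8")),            # 8-bit code units
        "utf-16": len(text.encode("utf-16-le")) // 2,  # 16-bit code units
        "utf-32": len(text.encode("utf-32-le")) // 4,  # 32-bit code units
    }
```

A character outside the 16-bit range, such as U+1D11E, takes four UTF-8 code
units, two UTF-16 code units (a surrogate pair), and one UTF-32 code unit,
which is why a single fixed width no longer describes every form.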