RE: Saying characters out loud (derives from hash, pound, octothorpe?)
OK, I was relying on Ken to retrieve this from archive, but he seems to be off researching frogiform glyphs. Check it out on Google for more references.

Joe

Date: 7 Aug 90 17:00:43 PDT (Tuesday)
Subject: Re: Names of characters

<>!*''#
^@`$$-
!*'$_
%*<>#4
&)../
|{~~SYSTEM HALTED

Waka waka bang splat tick tick hash,
Caret at back-tick dollar dollar dash,
Bang splat tick dollar under-score,
Percent splat waka waka number four,
Ampersand right-paren dot dot slash,
Vertical-bar curly-bracket tilde tilde CRASH.
RE: Bird headed CJK variants?
Bird script, cloud script, tadpole script, and many more are illustrated on the fantastic Hawley Chinese Culture Chart "Fanciful Seal Characters". Apparently the great Hawley charts are still available: http://www.wmhawley.com/china1.html And no, there is no Tadpole Script Area in Unicode. Joe
RE: OT: Chocolate Letters
I received my chocolate "B" from my Dutch co-worker two days ago, 5 December. He apologized that the store had run out of "J"'s ... but of course "B"'s contain a lot more chocolate! I'm trying to teach him my Chinese name ...

Joe B.

Date: 5 Dec 90 04:46:00 PST (Wednesday)
Subject: fonts for St. Nicolas
From: "J. W. van Wingen" <[EMAIL PROTECTED]>
To: Multiple recipients of list ISO10646

Dear Colleagues

For this special day I have a very special topic. Today is St. Nicolas Eve, that is our "boxing day". Thus it is Letter Day, for people use to give their friends and relations as a traditional present the Initial Letter of their name, usually in chocolate. Thus there are enormous piles of chocolate letters in the shops, all to be sold before tomorrow.

From our point of view the curious aspects are that these letters are all capital, of the same font (serif), of two sizes only (large and medium, sometimes also small, for children), and of the same weight (within one size). But not the whole Latin alphabet is covered, there is obviously a subrepertoire in use. This can only be discovered empirically, and it is different for the various makes and producers. (There are about 4 brands, Droste, Verkade, Baronie, Cote d'or (new), and 3 chain-brands, V&D, Hema, Jamin).

The I is a rarity, perhaps because it is difficult to design one of the same weight as the other letters. Q, X, Y are only available in one chain store, but U and Z are also difficult to obtain. The others are being produced according to some frequency distribution, which does not always correspond to consumers' demands, and when the Day comes nearer, you can see many people frantically delving in the piles, hoping to find at last the letter they need.

The novelty of this year is the appearance of a new font (Verkade), modelled on the computer display of straight lines and thus called in Dutch "digitaal".

Thus, I hope, I could contribute for today to your knowledge and amusement.

Best regards,
Johan van Wingen
RE: What should be radicals
> Unicode is going to stick with the KangXi radical system

There Unicode goes again, flouting the will of the people ... while meanwhile in another thread an esteemed Unicode elder has proposed the death radical. It's time to bring this system into the 21st Century: where's the plastics radical, the fast-food radical, the unix radical?!

Joe
FW: Learn more about Windows XP's international features
FYI,
Joe

-----Original Message-----
From: Dr. International [mailto:[EMAIL PROTECTED]]
Sent: Friday, April 13, 2001 6:39 PM
To: [EMAIL PROTECTED]
Subject: Learn more about Windows XP's international features

Dear Friends,

One of my colleagues recently wrote an article on new Windows XP (code name Whistler) international support.

"This two part article highlights the international and multilingual functionality of Windows 2000 and Windows XP. Part One provides a brief review of Windows 2000's international support. Part Two highlights Windows XP's improvements and discusses the expanded feature list for use in a global solution."

You can find it at: http://www.microsoft.com/globaldev/articles/winxpintl.asp

Stay tuned for another chapter of "Ask Dr. International".

Kind Regards,
Dr. International
Windows Division
http://www.microsoft.com/globaldev/
RE: Does anyone know what language is this?
Possibly one of the local dialects of Indonesia?

> apo kabar kau di sano

The first Indonesian/Malay we learn is the greeting "Apa khabar?!", literally "What is the news?!" ... (later we learn that "khabar" is Arabic!).

Joe
RE: Devanagari Consonant RA Rule R2
EM> Is the rule in error, or is it written to
EM> cover some obscure case that most software doesn't bother with?

AJ> The RA[sup] is seen applied to the independent vowel Vocalic R (U+090B) in
AJ> printed samples in Sanskrit.

Yes, this clause of the rule is intended to apply (just) to this spelling of "rr", treated as though it were a conjunct, as illustrated in line (4) of Figure 9-3 on p. 214.

Joe
[OT] Mumbles of Earth
Perhaps you remember that the Voyager spacecraft carried a gold phonograph record with greetings in 55 languages for the spacepersons out there. The individual audio clips of those "Murmurs of Earth" are nicely posted on the "Languages" link under: http://vraptor.jpl.nasa.gov/voyager/record.html indexed by language and with English translations. The aliens are out of luck if they speak Pig Latin (or if they threw away their phonograph when CD's came out), anyhow Joe B. sez check it out
FYI: Tatap => Tatar
Friday September 1 8:24 AM ET

Russia Region Drops Cyrillic Letters

MOSCOW (AP) - One of Russia's largest republics marked the start of the new school year Friday by dropping Cyrillic in favor of the Latin alphabet, in part because it wants closer ties with Europe.

Schools in Tatarstan will now use the Latin alphabet for written work in the local Tatar language, spokeswoman Zukhra Minekhanova said. The transition from Cyrillic will take 10 years, she said.

Tatarstan, located 470 miles east of Moscow, has a population of 4 million and is better off than most republics because of its considerable oil deposits. It has been prominent in shirking central control from Moscow and the adoption of the Latin alphabet will underline the trend.

President Vladimir Putin has been seeking to restore tight central control over the republics that make up the Russian federation.

Minekhanova said the change was necessary because Cyrillic was not capable of transliterating all the sounds in Tatar and because it would make European culture more accessible to students.

[END]
What is "Unicode" in Chinese?
It seems that Chinese is the only major language in which the term "Unicode" needs to be translated rather than transliterated. It may be time to gather up current usage and select an "official" translation, and perhaps to bless one or more informal ones. We have collected these candidates so far:

  統一碼        tongyi ma              unified/unification code
  單碼          dan ma                 unit code
  (標準)萬國碼  (biaozhun) wanguo ma   (standard) multinational code
  國際碼        guoji ma               international code

Please let us know if you have found these or other terms in actual current usage. Or, if you have another suggestion, even better than all those. Note that the goal here is simply to find the distinctive translation for the term "Unicode", not to designate any other international or Chinese standards related to Unicode.

Joe
RE: Unicode in VFAT file system
Jony Rosenne, who has been a great contributor since or before the beginning, wrote in an off moment:

> UTF-8 is a biased transformation format designed to save American and
> Western Europeans storage space and to give some people a warm feeling by
> keeping Unicode in the familiar 8 bit world.

FYI, below are the design goals of UTF-8 as specified by its originators, Ken Thompson et al @ ATT.

Joe

---
From: [EMAIL PROTECTED]
Date: Tue, 8 Sep 92 03:22:07 EDT
To: [EMAIL PROTECTED]
Subject: (XoJIG 620)

Here is our modified FSS-UTF proposal. The words are the same as on the previous proposal. My apologies to the author. The code has been tested to some degree and should be in pretty good shape. We have converted Plan 9 to use this encoding and are about to issue a distribution to an initial set of university users.

File System Safe Universal Character Set Transformation Format (FSS-UTF)
------------------------------------------------------------------------

With the approval of ISO/IEC 10646 (Unicode) as an international standard and the anticipated widespread use of this universal coded character set (UCS), it is necessary for historically ASCII based operating systems to devise ways to cope with representation and handling of the large number of characters that are possible to be encoded by this new standard.

There are several challenges presented by UCS which must be dealt with by historical operating systems and the C-language programming environment. The most significant of these challenges is the encoding scheme used by UCS. More precisely, the challenge is the marrying of the UCS standard with existing programming languages and existing operating systems and utilities.

The challenges of the programming languages and the UCS standard are being dealt with by other activities in the industry. However, we are still faced with the handling of UCS by historical operating systems and utilities. Prominent among the operating system UCS handling concerns is the representation of the data within the file system.
An underlying assumption is that there is an absolute requirement to maintain the existing operating system software investment while at the same time taking advantage of the use of the large number of characters provided by the UCS.

UCS provides the capability to encode multi-lingual text within a single coded character set. However, UCS and its UTF variant do not protect null bytes and/or the ASCII slash ("/") making these character encodings incompatible with existing Unix implementations. The following proposal provides a Unix compatible transformation format of UCS such that Unix systems can support multi-lingual text in a single encoding. This transformation format encoding is intended to be used as a file code. This transformation format encoding of UCS is intended as an intermediate step towards full UCS support. However, since nearly all Unix implementations face the same obstacles in supporting UCS, this proposal is intended to provide a common and compatible encoding during this transition stage.

Goal/Objective
--------------

With the assumption that most, if not all, of the issues surrounding the handling and storing of UCS in historical operating system file systems are understood, the objective is to define a UCS transformation format which also meets the requirement of being usable on a historical operating system file system in a non-disruptive manner. The intent is that UCS will be the process code for the transformation format, which is usable as a file code.

Criteria for the Transformation Format
--------------------------------------

Below are the guidelines that were used in defining the UCS transformation format:

1) Compatibility with historical file systems: Historical file systems disallow the null byte and the ASCII slash character as a part of the file name.

2) Compatibility with existing programs: The existing model for multibyte processing is that ASCII does not occur anywhere in a multibyte encoding.
There should be no ASCII code values for any part of a transformation format representation of a character that was not in the ASCII character set in the UCS representation of the character.

3) Ease of conversion from/to UCS.

4) The first byte should indicate the number of bytes to follow in a multibyte sequence.

5) The transformation format should not be extravagant in terms of number of bytes used for encoding.

6) It should be possible to find the start of a character efficiently starting from an arbitrary location in a byte stream.

Proposed FSS-UTF
----------------

The proposed UCS transformation format encodes UCS values in the range [0,0x7fffffff] using multibyte characters of lengths 1, 2, 3, 4, 5, and 6 bytes. For all encodings of more than one byte, the initial byte determines the number of bytes used and the high-order bit in each byte is set.
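
[Editorial illustration, not part of the original message.] The quoted proposal stops before its bit-layout table, so the following Python sketch fills that gap with an assumption: the 1-to-6-byte layout later documented for the original form of UTF-8 (lead bytes 0xC0, 0xE0, 0xF0, 0xF8, 0xFC; continuation bytes 10vvvvvv; 31-bit range). The names `fss_utf_encode` and `char_start` are hypothetical and exist only for this example. It is meant to show how criteria 1, 2, 4 and 6 can be met together, not to reproduce the authors' tested code.

```python
# Assumed layout: (largest value, lead-byte marker, total bytes in sequence).
_RANGES = [
    (0x7F, 0x00, 1),
    (0x7FF, 0xC0, 2),
    (0xFFFF, 0xE0, 3),
    (0x1FFFFF, 0xF0, 4),
    (0x3FFFFFF, 0xF8, 5),
    (0x7FFFFFFF, 0xFC, 6),
]


def fss_utf_encode(value: int) -> bytes:
    """Encode one UCS value under the assumed 1-6 byte layout."""
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError("value outside the 31-bit range")
    # Pick the shortest sequence that can hold the value (criterion 5).
    for max_value, lead, length in _RANGES:
        if value <= max_value:
            break
    if length == 1:
        return bytes([value])  # ASCII passes through unchanged
    out = bytearray(length)
    # Fill continuation bytes from the end, six payload bits each.
    for i in range(length - 1, 0, -1):
        out[i] = 0x80 | (value & 0x3F)
        value >>= 6
    # The lead byte carries the length marker plus the remaining bits.
    out[0] = lead | value
    return bytes(out)


def char_start(buf: bytes, pos: int) -> int:
    """Back up from pos to the first byte of the character containing it."""
    # Continuation bytes are exactly those of the form 10vvvvvv.
    while pos > 0 and (buf[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos
```

Under this layout every byte of a multibyte sequence has its high-order bit set, so neither the null byte nor "/" can appear inside one, and `char_start` needs to back up at most five bytes.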
RE: Unicode FAQ addendum
>>| C1 says "A process shall interpret Unicode code values as 16-bit
>>| quantities."

DE> I think the focus here was supposed to be on the fact that Unicode code
DE> values are *not 8-bit* quantities.

This may be the path to an update that is pithy yet true. The original mantra, paraphrased in C1 and 1), was just "Globally replace 8 by 16". Reality later obsoleted the original design, bringing us UTF-8, surrogates, and UTF-32; all good things, but less pithy. Since we needn't quibble terminology in an informal statement, I wouldn't have a problem with the simple update:

  1) Unicode code units are not 8 bits long; deal with it.

Joe
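
[Editorial illustration, not part of the original message.] A small Python sketch of why "globally replace 8 by 16" stopped being sufficient: the same character can take a different number of code units in each encoding form. The helper name `code_unit_counts` is hypothetical; the codec names are the standard ones in Python's `str.encode`.

```python
def code_unit_counts(text: str) -> dict:
    """Count code units needed for text in each Unicode encoding form."""
    return {
        "utf-8": len(text.encode("utf-8")),            # 8-bit code units
        "utf-16": len(text.encode("utf-16-le")) // 2,  # 16-bit code units
        "utf-32": len(text.encode("utf-32-le")) // 4,  # 32-bit code units
    }


# A character outside the BMP needs a surrogate pair in UTF-16.
print(code_unit_counts("\U0001D11E"))
```

For U+1D11E (MUSICAL SYMBOL G CLEF) this reports four UTF-8 code units, two UTF-16 code units (a surrogate pair), and one UTF-32 code unit.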