Re: What code point is assigned for the Newton unit?
Your letter makes clear that Unicode needs to do a better job of identifying the preferred character code for many situations. The information is there to a large extent, but it is buried in the fine print or in data tables.

You will see that there is a canonical decomposition from U+212B to U+00C5. This means that once people use normalization in a widespread fashion, it will become practically impossible to maintain a distinction between these two codes. U+212B was included for historical reasons.

Many other characters have been included in Unicode over the years for legitimate purposes as compatibility characters (to allow round-trip conversion to and from important legacy character sets). These have all been given compatibility decompositions. Unfortunately, many characters that have legitimate uses in a legacy-free environment have also been given compatibility mappings at some time. This makes it very hard to use this information in its current form to decide when a distinction between characters should be kept and when it should not.

There is some very explicit guidance, however, in Unicode TR#20 (Unicode and XML). The information there is readily applicable to other environments, if you pay attention to the rationale for each recommendation and evaluate whether it applies in your specific case.

A./

PS:
> "Ångström" is spelled wrong on the code charts at Unicode's home page, BTW.

Can you cite the page number and approximate location on the page? (Please send this information to me and [EMAIL PROTECTED], not to the whole list.)
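The effect of the canonical decomposition from U+212B to U+00C5 can be checked with a short script. This is only a sketch, assuming Python's standard `unicodedata` module; the helper name `survives` is made up for illustration and is not part of any Unicode tool.

```python
import unicodedata

ANGSTROM_SIGN = "\u212B"  # ANGSTROM SIGN
A_WITH_RING = "\u00C5"    # LATIN CAPITAL LETTER A WITH RING ABOVE


def survives(form, a, b):
    """Return True if a and b are still different strings after normalizing both."""
    return unicodedata.normalize(form, a) != unicodedata.normalize(form, b)


# The raw decomposition field for U+212B: a bare code point with no <tag>
# in front of it, which marks the mapping as canonical.
print(unicodedata.decomposition(ANGSTROM_SIGN))

# Check each of the four normalization forms in turn.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, "distinction kept:", survives(form, ANGSTROM_SIGN, A_WITH_RING))
```

Because the mapping is canonical rather than a compatibility mapping, even NFC and NFD fold the two code points together, so any process that normalizes text loses the distinction.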
Re: What code point is assigned for the Newton unit?
Actually, you are mistaken. The decision to encode the Angstrom sign had more to do with the fact that it was encoded in many legacy encoding sets. There is no specific rule that every unit sign must also be encoded.

If you can use Unicode to properly store and render what you need, then there is no lack that would require new characters.

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/

----- Original Message -----
From: "Stefan Persson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 12, 2001 8:59 AM
Subject: What code point is assigned for the Newton unit?

> Hi!
>
> I recently noticed, that the Unicode does difference between the Swedish
> capital letter "Å" (U+00C5; Å) and the Ångström sign (U+212B; Å). So it
> seems that every unit sign has got it's own code point, while the Latin
> letters with exactly identical shape to those have other code points. For
> example, the CJK Compatibility block contains some unit signs (in katakana):
>
> ㌂: anpea/Ampère
> ㌕: kiroguramu/kilogram
> etc.
>
> So, can someone tell me the code points for the Newton unit sign (which
> looks exactly like an "N")? And can someone tell me why it's necessary to do
> this difference?
>
> "Ångström" is spelled wrong on the code charts at Unicode's home page, BTW.
>
> Stefan
RE: What code point is assigned for the Newton unit?
Hi Stefan,

Actually, you're making an understandable but incorrect assumption.

The various unit characters that look exactly like "normal" characters or sequences of characters, which you find scattered around Unicode, are there for one reason only: they provide compatibility with existing (legacy) character sets. If you look even more closely at the Unicode character database, you'll find that most of these characters have "pointers" back to the "real" character. That's why you find most of them in blocks called "compatibility": they only exist to provide backward compatibility (round-trip conversion to and from existing character sets and encodings).

In UNICHAR.TXT, look at the last fields (for Normalization Form KC and KD respectively) and you'll see that U+212B is mapped to U+00C5. You'll also see that the "kg" sign in CJK, for example, is mapped to the letter "k" followed by the letter "g".

So, the short answer to your question is: the symbol for "Newton" is the letter "N", or U+004E, since prior to the creation of Unicode no one saw fit to create a separate character called "newton" that looked just like an "N" but had a different semantic meaning. And one should not be created now, because the letter "N" contains all of the useful information necessary for that purpose.

Best Regards,

Addison

Addison P. Phillips
Globalization Architect / Manager, Globalization Engineering
webMethods, Inc.
432 Lakeside Drive, Sunnyvale, CA
+1 408.962.5487 (phone)
+1 408.210.3659 (mobile)
-
Internationalization is an architecture. It is not a feature.
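The "pointers" back to the real characters described in this reply can be inspected programmatically. The sketch below assumes Python's standard `unicodedata` module and takes U+338F (SQUARE KG) as the "kg" sign being referred to; the function name `decomposition_kind` is hypothetical and introduced only for this illustration.

```python
import unicodedata

SQUARE_KG = "\u338F"      # SQUARE KG, in the CJK Compatibility block
ANGSTROM_SIGN = "\u212B"  # ANGSTROM SIGN
LETTER_N = "\u004E"       # LATIN CAPITAL LETTER N, used for the newton


def decomposition_kind(ch):
    """Classify a character's decomposition field from the character database."""
    field = unicodedata.decomposition(ch)
    if not field:
        return "none"
    # Compatibility mappings carry a <tag> prefix; canonical mappings do not.
    return "compatibility" if field.startswith("<") else "canonical"


for ch in (SQUARE_KG, ANGSTROM_SIGN, LETTER_N):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        "|", decomposition_kind(ch),
        "|", unicodedata.decomposition(ch) or "(no mapping)",
    )

# Compatibility normalization replaces the square sign with plain letters,
# while canonical normalization leaves it untouched.
print(unicodedata.normalize("NFKC", SQUARE_KG))
print(unicodedata.normalize("NFC", SQUARE_KG) == SQUARE_KG)
```

The letter "N" has no decomposition at all, which is consistent with the answer above: it already is the character to use for the newton.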