Re: Names of planes, and request for sneak preview
On Tue, 11 Jul 2000, Asmus Freytag wrote: > There are 0x10 - 34 possible characters! minus 2048 surrogate codepoints.
Re: Names of planes, and request for sneak preview
John Cowan <[EMAIL PROTECTED]> wrote:

> I coined the term "Astral Planes" for the space 10000-10FFFF, but it
> ticked off someone or other (I suppose because it might suggest that
> they weren't worthy of implementation) and I stopped using it.

On the other hand, it amused me enough that I actually used it as a function name:

    int AstralPlanes(const unsigned long ulChar)
    {
        return ((ulChar & 0xFFFF0000) != 0);
    }

-Doug Ewell
 Fullerton, California
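A minimal sketch of the test Doug describes, assuming the mask keeps only the bits above the low 16, so that anything past U+FFFF tests true. The names is_astral and plane_of are hypothetical and are not taken from Doug's code.

```c
#include <stdint.h>

/* Assumption: the mask selects every bit above the low 16, so any
   value past U+FFFF (that is, outside the BMP) tests true. */
int is_astral(uint32_t cp)
{
    return (cp & 0xFFFF0000u) != 0;
}

/* Plane number of a code point: 0 for the BMP, 1..16 for the rest.
   Returns -1 for values outside the range 0..10FFFF. */
int plane_of(uint32_t cp)
{
    return (cp <= 0x10FFFFu) ? (int)(cp >> 16) : -1;
}
```

With these, plane_of(0x10400) gives 1 and plane_of(0xE0001) gives 14, which matches the "Plane 1" and "Plane 14" usage elsewhere in the thread.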
RE: Names of planes, and request for sneak preview
None of the code values ending in 0xFFFE and 0xFFFF refer to characters, i.e., 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, etc. These values exist for internal use only.

Murray

-----Original Message-----
From: john [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 11, 2000 5:17 PM
To: Unicode List
Subject: Re: Names of planes, and request for sneak preview

> Asmus Freytag wrote:
> There are 0x10 - 34 possible characters!
> All code values ending in 0xFFFE and 0xFFFF do *not* refer to characters.
> They are not just temporarily unassigned, but permanently reserved as
> non-characters.

Clarification request: Does that mean

None of the code values ending in 0xFFFE and 0xFFFF refer to characters?

or

Not all of the code values ending in 0xFFFE and 0xFFFF refer to characters (i.e. some do and some do not)?
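To make the rule concrete, here is a small sketch that tests for the two reserved values at the end of each plane and counts what is left. It assumes a code space of 0..10FFFF (17 planes) and excludes only the two categories mentioned in this thread: the 34 plane-end non-characters and the 2048 surrogate code points. The function names are hypothetical.

```c
#include <stdint.h>

/* True for the last two code values of each plane: xxFFFE and xxFFFF. */
int is_plane_end_noncharacter(uint32_t cp)
{
    return cp <= 0x10FFFFu && (cp & 0xFFFEu) == 0xFFFEu;
}

/* True for the surrogate code points D800..DFFF. */
int is_surrogate(uint32_t cp)
{
    return cp >= 0xD800u && cp <= 0xDFFFu;
}

/* Count the code values in 0..10FFFF that fall in neither category. */
uint32_t count_usable(void)
{
    uint32_t cp, n = 0;
    for (cp = 0; cp <= 0x10FFFFu; cp++)
        if (!is_plane_end_noncharacter(cp) && !is_surrogate(cp))
            n++;
    return n;
}
```

The arithmetic behind the loop: 17 planes of 65,536 values give 1,114,112 code values; two reserved values per plane account for the 34; removing those and the 2048 surrogate code points leaves 1,112,030.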
Re: Names of planes, and request for sneak preview
> Asmus Freytag wrote:
> There are 0x10 - 34 possible characters!
> All code values ending in 0xFFFE and 0xFFFF do *not* refer to characters.
> They are not just temporarily unassigned, but permanently reserved as
> non-characters.

Clarification request: Does that mean

None of the code values ending in 0xFFFE and 0xFFFF refer to characters?

or

Not all of the code values ending in 0xFFFE and 0xFFFF refer to characters (i.e. some do and some do not)?
Re: Names of planes, and request for sneak preview
Okay, 0x10FFDE different characters. But what of planes 15 and 16?

--
Robert Lozyniak
Accusplit pedometer, purchased about 2000a07l01d19h45mZ, has NOT FLIPPED
My page: http://walk.to/11
[EMAIL PROTECTED] - email
(917) 421-3909 x1133 - voicemail/fax

Asmus Freytag <[EMAIL PROTECTED]> wrote:

> At 12:18 PM 7/11/00 -0800, [EMAIL PROTECTED] wrote:
> >What about F? I was told that there are 0x10
> >possible characters?
> >Oh, by the way, if 12 is a dozen and 144 is a gross,
> >what are 16 and 256?
>
> There are 0x10 - 34 possible characters!
>
> All code values ending in 0xFFFE and 0xFFFF do *not* refer to characters.
> They are not just temporarily unassigned, but permanently reserved as
> non-characters.
>
> A./
Re: Names of planes, and request for sneak preview
At 12:18 PM 7/11/00 -0800, [EMAIL PROTECTED] wrote:

>What about F? I was told that there are 0x10
>possible characters?
>Oh, by the way, if 12 is a dozen and 144 is a gross,
>what are 16 and 256?

There are 0x10 - 34 possible characters!

All code values ending in 0xFFFE and 0xFFFF do *not* refer to characters. They are not just temporarily unassigned, but permanently reserved as non-characters.

A./
Re: Names of planes, and request for sneak preview
[EMAIL PROTECTED] wrote: > > What about F? I was told that there are 0x10 > possible characters? > Oh, by the way, if 12 is a dozen and 144 is a gross, > what are 16 and 256? 1 and 1/4 dozen and 9/16 of a gross.
Re: Names of planes, and request for sneak preview
Shoot, it's 1 1/3 dozen.

Tex Texin wrote:
>
> [EMAIL PROTECTED] wrote:
> >
> > What about F? I was told that there are 0x10
> > possible characters?
> > Oh, by the way, if 12 is a dozen and 144 is a gross,
> > what are 16 and 256?
>
> 1 and 1/4 dozen and 9/16 of a gross.

--
Tex Texin
Director, International Products
[EMAIL PROTECTED]
Progress Software Corp.
+1-781-280-4271 Fax: +1-781-280-4655
14 Oak Park, Bedford, MA 01730

http://www.progress.com The #1 Embedded Database
http://www.SonicMQ.com JMS Compliant Messaging - Best Middleware Award
http://www.aspconnections.com Leading provider in the ASP marketplace
http://www.NuSphere.com Open Source software and commercial services for MySQL

Progress Globalization Program
http://www.progress.com/partners/globalization.htm

Come to the Panel on Open Source Approaches to Unicode Libraries at the Sept. Unicode Conference
http://www.unicode.org/iuc/iuc17
Re: Names of planes, and request for sneak preview
> Oh, by the way, if 12 is a dozen and 144 is a gross, > what are 16 and 256? 272
Re: Names of planes, and request for sneak preview
What about F? I was told that there are 0x10 possible characters?

Oh, by the way, if 12 is a dozen and 144 is a gross, what are 16 and 256?

--
Robert Lozyniak
Accusplit pedometer, purchased about 2000a07l01d19h45mZ, has NOT FLIPPED
My page: http://walk.to/11
[EMAIL PROTECTED] - email
(917) 421-3909 x1133 - voicemail/fax

Kenneth Whistler <[EMAIL PROTECTED]> wrote:

> Mark responded reconditely:
>
> > I ALY FND ANMs HRD2 DL WTH. WD PFR NML WDS.
> >
> > Michael Everson wrote:
> >
> > > Ar 07:53 -0800 2000-07-11, scríobh John H. Jenkins:
> > >
> > > >At the same time, it would be nice to have a Unicodally correct way
> > > >of referring to planes 1 and 2, since there is an important boundary
> > > >between them.
> > >
> > > Just use the acronyms BMP, SMP, and SIP.
>
> From the practice that is developing in the relevant committees,
> and the discussion on this list, it would appear that the
> practical consensus seems to be heading towards:
>
>   0000..FFFF    "The BMP"
>   10000..1FFFF  "Plane 1"
>   20000..2FFFF  "Plane 2"
>   E0000..EFFFF  "Plane 14"
>
> Those are in fact the terms that most people are using. It is quite
> unlikely that "SMP" and "SIP" and "SPP" are going to catch on
> very widely, given the difficulty of keeping them straight, or
> separate from other TLA's and FLA's like SMTP, TCPIP, etc. (SPP
> also means Southwest Power Pool, Science and Policy Programs,
> School of Public Policy, Society for Philosophy and Psychology,
> Student Protector Plan, Sandy's Pattern Pantry, Self-Publishing Partners,
> and the Santiago Park Plaza...) "Plane 14" is actually a *much* better
> term -- if you do an Internet search on that, all the pertinent
> Unicode-related stuff actually pops right up to the top of the search.
>
> And despite Mark's disclaimer about the validity of any boundaries
> past the FFFF / 10000 boundary, the Plane boundaries do have some
> importance. They are likely to figure prominently in trie structures
> for accessing properties of characters past FFFF; the planes themselves
> have some uniformity of their properties, since different things
> are being isolated to Plane 2 or Plane 14 as opposed to Plane 1.
>
> Also, in favor of John Cowan's terminology, one might also note
> that all of the "Astral Planes" are self-naming by their initial
> hex digit.
>
> --Ken
Re: Names of planes, and request for sneak preview
Mark responded reconditely:

> I ALY FND ANMs HRD2 DL WTH. WD PFR NML WDS.
>
> Michael Everson wrote:
>
> > Ar 07:53 -0800 2000-07-11, scríobh John H. Jenkins:
> >
> > >At the same time, it would be nice to have a Unicodally correct way
> > >of referring to planes 1 and 2, since there is an important boundary
> > >between them.
> >
> > Just use the acronyms BMP, SMP, and SIP.

From the practice that is developing in the relevant committees, and the discussion on this list, it would appear that the practical consensus seems to be heading towards:

  0000..FFFF    "The BMP"
  10000..1FFFF  "Plane 1"
  20000..2FFFF  "Plane 2"
  E0000..EFFFF  "Plane 14"

Those are in fact the terms that most people are using. It is quite unlikely that "SMP" and "SIP" and "SPP" are going to catch on very widely, given the difficulty of keeping them straight, or separate from other TLA's and FLA's like SMTP, TCPIP, etc. (SPP also means Southwest Power Pool, Science and Policy Programs, School of Public Policy, Society for Philosophy and Psychology, Student Protector Plan, Sandy's Pattern Pantry, Self-Publishing Partners, and the Santiago Park Plaza...) "Plane 14" is actually a *much* better term -- if you do an Internet search on that, all the pertinent Unicode-related stuff actually pops right up to the top of the search.

And despite Mark's disclaimer about the validity of any boundaries past the FFFF / 10000 boundary, the Plane boundaries do have some importance. They are likely to figure prominently in trie structures for accessing properties of characters past FFFF; the planes themselves have some uniformity of their properties, since different things are being isolated to Plane 2 or Plane 14 as opposed to Plane 1.

Also, in favor of John Cowan's terminology, one might also note that all of the "Astral Planes" are self-naming by their initial hex digit.

--Ken
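As a rough illustration of why plane boundaries matter for lookups, here is a sketch of a property table whose first stage is indexed by plane, so a plane with uniform properties costs one pointer rather than 65,536 entries. The PlaneTrie layout and both function names are hypothetical; this is not a description of any existing implementation, and the labels cover only the four planes Ken lists.

```c
#include <stddef.h>
#include <stdint.h>

#define PLANE_COUNT 17

/* Label for the plane containing cp, using the names from Ken's list.
   Planes he does not mention (and out-of-range values) give NULL. */
const char *plane_label(uint32_t cp)
{
    switch (cp >> 16) {
    case 0x0: return "The BMP";
    case 0x1: return "Plane 1";
    case 0x2: return "Plane 2";
    case 0xE: return "Plane 14";
    default:  return NULL;
    }
}

/* Hypothetical two-stage property table: one slot per plane.  A NULL
   slot means every code point in that plane has default_value. */
typedef struct {
    const uint8_t *planes[PLANE_COUNT];  /* each non-NULL table has 65536 entries */
    uint8_t default_value;
} PlaneTrie;

uint8_t plane_trie_get(const PlaneTrie *t, uint32_t cp)
{
    uint32_t plane = cp >> 16;           /* the "initial hex digit(s)" */
    if (plane >= PLANE_COUNT || t->planes[plane] == NULL)
        return t->default_value;
    return t->planes[plane][cp & 0xFFFFu];
}
```

The shift by 16 is the same observation as the "self-naming" remark: the digits left of the last four hex digits are the plane number.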
Re: Names of planes, and request for sneak preview
I ALY FND ANMs HRD2 DL WTH. WD PFR NML WDS. Michael Everson wrote: > Ar 07:53 -0800 2000-07-11, scríobh John H. Jenkins: > > >At the same time, it would be nice to have a Unicodally correct way > >of referring to planes 1 and 2, since there is an important boundary > >between them. > > Just use the acronyms BMP, SMP, and SIP. > > Michael Everson ** Everson Gunn Teoranta ** http://www.egt.ie > 15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland > Vox +353 1 478 2597 ** Fax +353 1 478 2597 ** Mob +353 86 807 9169 > 27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire
Re: Names of planes, and request for sneak preview
Ar 07:53 -0800 2000-07-11, scríobh John H. Jenkins: >At the same time, it would be nice to have a Unicodally correct way >of referring to planes 1 and 2, since there is an important boundary >between them. Just use the acronyms BMP, SMP, and SIP. Michael Everson ** Everson Gunn Teoranta ** http://www.egt.ie 15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland Vox +353 1 478 2597 ** Fax +353 1 478 2597 ** Mob +353 86 807 9169 27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire
Re: Names of planes, and request for sneak preview
The boundary at 20000 is not really any more significant than the boundary at 3000, or 2000, or 900. We don't need special names for boundaries between different scripts.

At plane 14, it's only a few characters we regret ;-)

Mark

"John H. Jenkins" wrote:

> At 7:19 AM -0800 7/11/00, Mark Davis wrote:
> >However, there are certain units or thresholds that are useful to distinguish
> >in Unicode. The most important threshold is the one between FFFF and 10000:
> >important for UTF-16 implementations (and to a minor degree, UTF-8
> >implementations). So there are terms for codepoints above and below that. I've
> >heard the following used:
> >
> >BMP characters: those with codepoints < 10000 (borrowing BMP from 10646)
> >aka UCS-2 characters
> >aka non-surrogate characters
> >
> >non-BMP characters: those with codepoints > FFFF
> >aka non-UCS-2 characters
> >aka surrogate characters
> >
>
> At the same time, it would be nice to have a Unicodally correct way
> of referring to planes 1 and 2, since there is an important boundary
> between them.
>
> And of course, the *proper* way to refer to plane 14 is to pretend it
> doesn't exist at all. :-)
>
> --
> =
> John H. Jenkins
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> http://www.blueneptune.com/~tseng
Re: Names of planes, and request for sneak preview
Mark Davis wrote:

> It would be nice to have another positive, non-acronymic term for
> those characters above FFFF, but none has yet arisen.

I coined the term "Astral Planes" for the space 10000-10FFFF, but it ticked off someone or other (I suppose because it might suggest that they weren't worthy of implementation) and I stopped using it.

--
Schlingt dreifach einen Kreis um dies! || John Cowan <[EMAIL PROTECTED]>
Schliesst euer Aug vor heiliger Schau, || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,          || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.      -- Coleridge (tr. Politzer)
Re: Names of planes, and request for sneak preview
At 7:19 AM -0800 7/11/00, Mark Davis wrote:

>However, there are certain units or thresholds that are useful to distinguish
>in Unicode. The most important threshold is the one between FFFF and 10000:
>important for UTF-16 implementations (and to a minor degree, UTF-8
>implementations). So there are terms for codepoints above and below that. I've
>heard the following used:
>
>BMP characters: those with codepoints < 10000 (borrowing BMP from 10646)
>aka UCS-2 characters
>aka non-surrogate characters
>
>non-BMP characters: those with codepoints > FFFF
>aka non-UCS-2 characters
>aka surrogate characters
>

At the same time, it would be nice to have a Unicodally correct way of referring to planes 1 and 2, since there is an important boundary between them.

And of course, the *proper* way to refer to plane 14 is to pretend it doesn't exist at all. :-)

--
=
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://www.blueneptune.com/~tseng
Re: Names of planes, and request for sneak preview
We haven't used the notion of Planes and Groups. These actually derived, as far as I can remember from early days in L2, from later-discarded mechanisms that would let you swap planes into the BMP. Thus it was important to distinguish these levels. Planes and Groups are themselves not particularly useful in Unicode, which has a flat coding space from 0 to 10FFFF. We basically just use them now in communicating with our 10646 brethren.

However, there are certain units or thresholds that are useful to distinguish in Unicode. The most important threshold is the one between FFFF and 10000: important for UTF-16 implementations (and to a minor degree, UTF-8 implementations). So there are terms for codepoints above and below that. I've heard the following used:

BMP characters: those with codepoints < 10000 (borrowing BMP from 10646)
aka UCS-2 characters
aka non-surrogate characters

non-BMP characters: those with codepoints > FFFF
aka non-UCS-2 characters
aka surrogate characters

Note: D800 - DFFF are *not* surrogate characters. They are surrogate codepoints, two of which (in UTF-16) represent a surrogate character. The disadvantage of the term "surrogate character" is this possible confusion; you don't have the same problem if you say "non-BMP character". It would be nice to have another positive, non-acronymic term for those characters above FFFF, but none has yet arisen.

There are other useful boundaries:

Column - 16 values with all but the last 4 binary digits the same: e.g. 2060-206F

Window - 128 values with all but the last 7 binary digits the same, e.g. 2000-207F, or 2080-20FF. Used in SCSU; for compression, blocks of 128 are useful. In the UTC, we try not to span window boundaries unnecessarily when allocating characters (for historical reasons, we used to violate this, cf Hebrew or the Kanas).
aka half-row (In 10646, Row is 256 values with all but the last 8 binary digits the same).

Surrogate Block - 1024 values with all but the last 10 binary digits the same, e.g. E0000-E03FF. In UTF-16, these have the same high (leading) surrogate code value. In the UTC, we try not to span surrogate block boundaries unnecessarily when allocating characters.

Mark

Doug Ewell wrote:

> John Cowan <[EMAIL PROTECTED]> wrote:
>
> >> Everybody and his cat should know that BMP stands for Basic Multilingual
> >> Plane, and the Roadmap pages also show that SMP is short for Secondary
> >> Multilingual Plane. What are SIP and GPP?
> >
> > Supplementary Ideographic Plane, General Purpose Plane. Note that these
> > are 10646 names, not Unicode names.
>
> Interesting... I hadn't looked at it that way. I know that the entire
> group/plane/row/cell breakdown is a 10646 thing. Is there a Unicode-specific
> term for the range from U+0000 to U+FFFD, the code points that
> can be represented without surrogates?
>
> -Doug Ewell
> Fullerton, California
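The units Mark lists are all plain bit masks, and the surrogate-block property falls straight out of the UTF-16 arithmetic. A minimal sketch follows; the function names are hypothetical, and the surrogate formulas are the standard UTF-16 ones (subtract 10000, then split into two 10-bit halves).

```c
#include <stdint.h>

/* First code value of the 16-value column, 128-value window, and
   1024-value surrogate block that contain cp. */
uint32_t column_start(uint32_t cp)          { return cp & ~(uint32_t)0xF; }
uint32_t window_start(uint32_t cp)          { return cp & ~(uint32_t)0x7F; }
uint32_t surrogate_block_start(uint32_t cp) { return cp & ~(uint32_t)0x3FF; }

/* True for non-BMP code points, the ones UTF-16 encodes as a pair. */
int needs_surrogates(uint32_t cp)
{
    return cp >= 0x10000u && cp <= 0x10FFFFu;
}

/* High (leading) surrogate for a non-BMP code point; 0 otherwise. */
uint16_t high_surrogate(uint32_t cp)
{
    if (!needs_surrogates(cp))
        return 0;
    return (uint16_t)(0xD800u + ((cp - 0x10000u) >> 10));
}

/* Low (trailing) surrogate for a non-BMP code point; 0 otherwise. */
uint16_t low_surrogate(uint32_t cp)
{
    if (!needs_surrogates(cp))
        return 0;
    return (uint16_t)(0xDC00u + ((cp - 0x10000u) & 0x3FFu));
}
```

Because the high surrogate depends only on the bits above the low 10, every code point in one surrogate block shares it, and the next block starts a new one. That is the property Mark describes.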
Re: Names of planes, and request for sneak preview
John Cowan <[EMAIL PROTECTED]> wrote:

>> Everybody and his cat should know that BMP stands for Basic Multilingual
>> Plane, and the Roadmap pages also show that SMP is short for Secondary
>> Multilingual Plane. What are SIP and GPP?
>
> Supplementary Ideographic Plane, General Purpose Plane. Note that these
> are 10646 names, not Unicode names.

Interesting... I hadn't looked at it that way. I know that the entire group/plane/row/cell breakdown is a 10646 thing. Is there a Unicode-specific term for the range from U+0000 to U+FFFD, the code points that can be represented without surrogates?

-Doug Ewell
 Fullerton, California
Re: Names of planes, and request for sneak preview
Doug Ewell wrote: > > Michael Everson's "Roadmap" pages refer to Planes 0, 1, 2, and 14 as > the BMP, SMP, SIP, and GPP respectively. > > Everybody and his cat should know that BMP stands for Basic Multilingual > Plane, and the Roadmap pages also show that SMP is short for Secondary > Multilingual Plane. What are SIP and GPP? Supplementary Ideographic Plane, General Purpose Plane. Note that these are 10646 names, not Unicode names. -- Schlingt dreifach einen Kreis um dies! || John Cowan <[EMAIL PROTECTED]> Schliesst euer Aug vor heiliger Schau, || http://www.reutershealth.com Denn er genoss vom Honig-Tau, || http://www.ccil.org/~cowan Und trank die Milch vom Paradies.-- Coleridge (tr. Politzer)
Re: Names of planes, and request for sneak preview
Doug Ewell asked: > > Michael Everson's "Roadmap" pages refer to Planes 0, 1, 2, and 14 as > the BMP, SMP, SIP, and GPP respectively. > > Everybody and his cat should know that BMP stands for Basic Multilingual > Plane, and the Roadmap pages also show that SMP is short for Secondary > Multilingual Plane. What are SIP and GPP? SMP: Secondary Multilingual Plane for scripts and symbols (= Plane 1) SIP: Supplementary Plane for CJK Ideographs (= Plane 2) SPP: Special Purpose Plane (= Plane 14) GPP stood for "General Purpose Plane", which was the earlier acronym for the SPP, subsequently changed. > > Also, the Plane 1 Roadmap says that the Etruscan, Gothic, and Deseret > scripts are virtually guaranteed to be encoded at U+10300, U+10330, and > U+10400 respectively (they lack only the final approval from WG2, which > is widely expected). If I *PROMISE* to use them only for testing and > not for "live" data or interchange, can I assume these scripts are laid > out as shown in the proposals on Everson's site? You can do what you like with them, but don't assume they will not change. Etruscan is likely to be renamed to Old Italic (including Etruscan), and will most likely have at least one character removed and one character added. Gothic is having one character removed (I don't know whether Everson's document reflects that or not, but the FCD draft of 10646-2 should.) Deseret is probably safe! No one has asked for any emendations to that script. > In general, can someone > with inside information on Plane 1 assignments provide a sneak preview, > complete with the usual caveats that responsible implementors must follow > (no guarantees, use at own risk, etc.)? In addition to those three scripts, expect also to see Byzantine musical symbols, basically unchanged from the earlier draft of 10646-2. Western musical symbols will be encoded, but are likely to get churned a bit from earlier drafts, due to lots of expert feedback on the drafts. 
And then there are the mathematical alphanumerics, located at U+1D400..U+1D7FF. People who want details should refer to the SC2 website, where the FCD for 10646-2 is available for public review. (It is a gigantic document, however, because of the 40,000+ Chinese characters embedded in it.) For those who can wait just a little longer, however, it is advisable to wait until after the Athens WG2 meeting at the end of September, where the final decisions will be made about assignments of characters for 10646-2. --Ken
Re: Names of planes, and request for sneak preview
On Mon, 10 Jul 2000, Doug Ewell wrote: > What are SIP and GPP? Supplemental Ideograph[ic] Plane and General Purpose Plane, I'd guess. -- Robert