Re: Names of planes, and request for sneak preview

2000-07-12 Thread Roozbeh Pournader



On Tue, 11 Jul 2000, Asmus Freytag wrote:

> There are 0x110000 - 34 possible characters!

minus 2048 surrogate codepoints.
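
The arithmetic here can be written out as a few lines of C. This is only a sketch, not something posted in the thread: it assumes a code space of 17 planes (0 to 0x10FFFF), two reserved values per plane (those ending in FFFE and FFFF), and the surrogate range D800-DFFF. All names are made up for illustration.

```c
#include <stdint.h>

/* Assumed layout: 17 planes of 0x10000 code points each (0..0x10FFFF). */
static const uint32_t PLANES = 17;
static const uint32_t PLANE_SIZE = 0x10000;

/* Total number of code points in the code space. */
static uint32_t code_space_size(void)
{
    return PLANES * PLANE_SIZE;
}

/* Two values per plane end in FFFE or FFFF. */
static uint32_t noncharacter_count(void)
{
    return PLANES * 2;
}

/* Surrogate code points D800..DFFF. */
static uint32_t surrogate_count(void)
{
    return 0xDFFF - 0xD800 + 1;
}

/* Code points left once both reserved groups are subtracted. */
static uint32_t possible_characters(void)
{
    return code_space_size() - noncharacter_count() - surrogate_count();
}
```

Under those assumptions, code_space_size() - noncharacter_count() is 0x10FFDE, and subtracting the 2048 surrogate code points leaves 1,112,030.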




Re: Names of planes, and request for sneak preview

2000-07-11 Thread Doug Ewell

John Cowan <[EMAIL PROTECTED]> wrote:

> I coined the term "Astral Planes" for the space 10000-10FFFF, but it
> ticked off someone or other (I suppose because it might suggest that
> they weren't worthy of implementation) and I stopped using it.

On the other hand, it amused me enough so that I actually used it as
a function name:

int AstralPlanes(const unsigned long ulChar)
{
    /* Nonzero when any of bits 16-31 is set, i.e. the value lies beyond the BMP. */
    return((ulChar & 0xFFFF0000UL) != 0);
}

-Doug Ewell
 Fullerton, California



RE: Names of planes, and request for sneak preview

2000-07-11 Thread Murray Sargent

None of the code values ending in 0xFFFE and 0xFFFF refer to characters,
i.e., 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, etc.  These values exist for
internal use only.
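
As an illustration only (not code from this message), the rule "ends in FFFE or FFFF, in any plane" reduces to a single mask on the low 16 bits. The function name is hypothetical.

```c
#include <stdint.h>

/* Nonzero when the low 16 bits of cp are FFFE or FFFF, whatever the plane.
   Masking with 0xFFFE ignores the lowest bit, so both endings match. */
static int ends_in_fffe_or_ffff(uint32_t cp)
{
    return (cp & 0xFFFEu) == 0xFFFEu;
}
```

The test looks only at the low 16 bits, so it gives the same answer for 0xFFFE, 0x1FFFE, 0x10FFFE and so on.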

Murray

-Original Message-
From: john [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 11, 2000 5:17 PM
To: Unicode List
Subject: Re: Names of planes, and request for sneak preview

> Asmus Freytag wrote:
> There are 0x110000 - 34 possible characters!

> All code values ending in 0xFFFE and 0xFFFF do *not* refer to characters.
> They are not just temporarily unassigned, but permanently reserved as 
> non-characters.

Clarification request: Does that mean
None of the code values ending in 0xFFFE and 0xFFFF refer to characters?

or

Not all of the code values ending in 0xFFFE and 0xFFFF refer to characters
(i.e., some do and some do not)?



Re: Names of planes, and request for sneak preview

2000-07-11 Thread john


> Asmus Freytag wrote:
> There are 0x110000 - 34 possible characters!

> All code values ending in 0xFFFE and 0xFFFF do *not* refer to characters.
> They are not just temporarily unassigned, but permanently reserved as
> non-characters.

Clarification request: Does that mean
None of the code values ending in 0xFFFE and 0xFFFF refer to characters?

or

Not all of the code values ending in 0xFFFE and 0xFFFF refer to characters
(i.e., some do and some do not)?




Re: Names of planes, and request for sneak preview

2000-07-11 Thread 11digitboy

Okay, 0x10FFDE different characters. But what of
planes 15 and 16?

--
Robert Lozyniak
Accusplit pedometer, purchased about 2000a07l01d19h45mZ,
has NOT FLIPPED
My page: http://walk.to/11
[EMAIL PROTECTED] - email
(917) 421-3909 x1133 - voicemail/fax



 Asmus Freytag <[EMAIL PROTECTED]> wrote:
> At 12:18 PM 7/11/00 -0800, [EMAIL PROTECTED]
> wrote:
> >What about F? I was told that there are 0x10
> >possible characters?
> >Oh, by the way, if 12 is a dozen and 144 is a
> gross,
> >what are 16 and 256?
> 
> 
> There are 0x110000 - 34 possible characters!
> 
> All code values ending in 0xFFFE and 0xFFFF do *not* refer to characters.
> They are not just temporarily unassigned, but permanently reserved as
> non-characters.
> 
> A./
> 





Re: Names of planes, and request for sneak preview

2000-07-11 Thread Asmus Freytag

At 12:18 PM 7/11/00 -0800, [EMAIL PROTECTED] wrote:
>What about F? I was told that there are 0x10
>possible characters?
>Oh, by the way, if 12 is a dozen and 144 is a gross,
>what are 16 and 256?


There are 0x110000 - 34 possible characters!

All code values ending in 0xFFFE and 0xFFFF do *not* refer to characters.
They are not just temporarily unassigned, but permanently reserved as 
non-characters.

A./



Re: Names of planes, and request for sneak preview

2000-07-11 Thread Tex Texin



[EMAIL PROTECTED] wrote:
> 
> What about F? I was told that there are 0x10
> possible characters?
> Oh, by the way, if 12 is a dozen and 144 is a gross,
> what are 16 and 256?

1 and 1/4 dozen and 9/16 of a gross.



Re: Names of planes, and request for sneak preview

2000-07-11 Thread Tex Texin

Shoot, it's 1 1/3 dozen.

Tex Texin wrote:
> 
> [EMAIL PROTECTED] wrote:
> >
> > What about F? I was told that there are 0x10
> > possible characters?
> > Oh, by the way, if 12 is a dozen and 144 is a gross,
> > what are 16 and 256?
> 
> 1 and 1/4 dozen and 9/16 of a gross.

-- 

Tex Texin   Director, International Products
[EMAIL PROTECTED]
Progress Software Corp. +1-781-280-4271 Fax:+1-781-280-4655
14 Oak Park, Bedford, MA 01730

http://www.progress.com The #1 Embedded Database
http://www.SonicMQ.com  JMS Compliant Messaging- Best
Middleware Award
http://www.aspconnections.com   Leading provider in the ASP
marketplace
http://www.NuSphere.com Open Source software and commercial
services for MySQL

Progress Globalization Program
http://www.progress.com/partners/globalization.htm

Come to the Panel on Open Source Approaches to Unicode Libraries at
the Sept. Unicode Conference
http://www.unicode.org/iuc/iuc17



Re: Names of planes, and request for sneak preview

2000-07-11 Thread Jonathan Coxhead

> Oh, by the way, if 12 is a dozen and 144 is a gross,
> what are 16 and 256?

   272




Re: Names of planes, and request for sneak preview

2000-07-11 Thread 11digitboy

What about F? I was told that there are 0x10
possible characters?
Oh, by the way, if 12 is a dozen and 144 is a gross,
what are 16 and 256?

--
Robert Lozyniak
Accusplit pedometer, purchased about 2000a07l01d19h45mZ,
has NOT FLIPPED
My page: http://walk.to/11
[EMAIL PROTECTED] - email
(917) 421-3909 x1133 - voicemail/fax



 Kenneth Whistler <[EMAIL PROTECTED]> wrote:
> Mark responded reconditely:
> 
> > 
> > I ALY FND ANMs HRD2 DL WTH. WD PFR NML WDS.
> > 
> > Michael Everson wrote:
> > 
> > > Ar 07:53 -0800 2000-07-11, scríobh John H.
> Jenkins:
> > >
> > > >At the same time, it would be nice to have
> a Unicodally correct way
> > > >of referring to planes 1 and 2, since there
> is an important boundary
> > > >between them.
> > >
> > > Just use the acronyms BMP, SMP, and SIP.
> > >
> 
> From the practice that is developing in the relevant
> committees,
> and the discussion on this list, it would appear
> that the
> practical consensus seems to be heading towards:
> 
>  0000..FFFF  "The BMP"
> 10000..1FFFF "Plane 1"
> 20000..2FFFF "Plane 2"
> E0000..EFFFF "Plane 14"
> 
> Those are in fact the terms that most people are
> using. It is quite
> unlikely that "SMP" and "SIP" and "SPP" are going
> to catch on
> very widely, given the difficulty of keeping them
> straight, or
> separate from other TLA's and FLA's like SMTP,
> TCPIP, etc. (SPP
> also means Southwest Power Pool, Science and Policy
> Programs,
> School of Public Policy, Society for Philosophy
> and Psychology,
> Student Protector Plan, Sandy's Pattern Pantry,
> Self-Publishing Partners,
> and the Santiago Park Plaza...) "Plane 14" is actually
> a *much* better
> term -- if you do an Internet search on that, all
> the pertinent
> Unicode-related stuff actually pops right up to
> the top of the search.
> 
> And despite Mark's disclaimer about the validity of any boundaries
> past the FFFF / 10000 boundary, the Plane boundaries do have some
> importance. They are likely to figure prominently in trie structures
> for accessing properties of characters past FFFF; the planes themselves
> have some uniformity of their properties, since different things
> are being isolated to Plane 2 or Plane 14 as opposed to Plane 1.
> 
> Also, in favor of John Cowan's terminology, one
> might also note
> that all of the "Astral Planes" are self-naming
> by their initial
> hex digit.
> 
> --Ken
> 





Re: Names of planes, and request for sneak preview

2000-07-11 Thread Kenneth Whistler

Mark responded reconditely:

> 
> I ALY FND ANMs HRD2 DL WTH. WD PFR NML WDS.
> 
> Michael Everson wrote:
> 
> > Ar 07:53 -0800 2000-07-11, scríobh John H. Jenkins:
> >
> > >At the same time, it would be nice to have a Unicodally correct way
> > >of referring to planes 1 and 2, since there is an important boundary
> > >between them.
> >
> > Just use the acronyms BMP, SMP, and SIP.
> >

From the practice that is developing in the relevant committees,
and the discussion on this list, it would appear that the
practical consensus seems to be heading towards:

 0000..FFFF  "The BMP"
10000..1FFFF "Plane 1"
20000..2FFFF "Plane 2"
E0000..EFFFF "Plane 14"

Those are in fact the terms that most people are using. It is quite
unlikely that "SMP" and "SIP" and "SPP" are going to catch on
very widely, given the difficulty of keeping them straight, or
separate from other TLA's and FLA's like SMTP, TCPIP, etc. (SPP
also means Southwest Power Pool, Science and Policy Programs,
School of Public Policy, Society for Philosophy and Psychology,
Student Protector Plan, Sandy's Pattern Pantry, Self-Publishing Partners,
and the Santiago Park Plaza...) "Plane 14" is actually a *much* better
term -- if you do an Internet search on that, all the pertinent
Unicode-related stuff actually pops right up to the top of the search.

And despite Mark's disclaimer about the validity of any boundaries
past the FFFF / 10000 boundary, the Plane boundaries do have some
importance. They are likely to figure prominently in trie structures
for accessing properties of characters past FFFF; the planes themselves
have some uniformity of their properties, since different things
are being isolated to Plane 2 or Plane 14 as opposed to Plane 1.

Also, in favor of John Cowan's terminology, one might also note
that all of the "Astral Planes" are self-naming by their initial
hex digit.
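
To make the "self-naming" remark concrete, here is a small sketch (not from the original message; the function name is made up). It assumes code points are held in a 32-bit integer, so the plane is simply everything above the low 16 bits. That same value could serve as a first-level index in the kind of trie mentioned above.

```c
#include <stdint.h>

/* Plane number (0..16) of a code point in the range 0..0x10FFFF.
   The plane is whatever remains after dropping the low 16 bits. */
static unsigned plane_of(uint32_t cp)
{
    return (unsigned)(cp >> 16);
}
```

For example, plane_of(0x10300) is 1 and plane_of(0xE0001) is 14 (hex E), which matches the leading hex digit of each code point.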

--Ken



Re: Names of planes, and request for sneak preview

2000-07-11 Thread Mark Davis

I ALY FND ANMs HRD2 DL WTH. WD PFR NML WDS.

Michael Everson wrote:

> Ar 07:53 -0800 2000-07-11, scríobh John H. Jenkins:
>
> >At the same time, it would be nice to have a Unicodally correct way
> >of referring to planes 1 and 2, since there is an important boundary
> >between them.
>
> Just use the acronyms BMP, SMP, and SIP.
>
> Michael Everson  **  Everson Gunn Teoranta  **   http://www.egt.ie
> 15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
> Vox +353 1 478 2597 ** Fax +353 1 478 2597 ** Mob +353 86 807 9169
> 27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire




Re: Names of planes, and request for sneak preview

2000-07-11 Thread Michael Everson

Ar 07:53 -0800 2000-07-11, scríobh John H. Jenkins:

>At the same time, it would be nice to have a Unicodally correct way
>of referring to planes 1 and 2, since there is an important boundary
>between them.

Just use the acronyms BMP, SMP, and SIP.

Michael Everson  **  Everson Gunn Teoranta  **   http://www.egt.ie
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Vox +353 1 478 2597 ** Fax +353 1 478 2597 ** Mob +353 86 807 9169
27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire





Re: Names of planes, and request for sneak preview

2000-07-11 Thread Mark Davis

The boundary at 20000 is not really any more significant than the boundary
at 3000, or 2000, or 900. We don't need special names for boundaries between
different scripts.

At plane 14, it's only a few characters we regret ;-)

Mark

"John H. Jenkins" wrote:

> At 7:19 AM -0800 7/11/00, Mark Davis wrote:
> >However, there are certain units or thresholds that are useful to distinguish
> >in Unicode. The most important threshold is the one between FFFF and 10000:
> >important for UTF-16 implementations (and to a minor degree, UTF-8
> >implementations). So there are terms for codepoints above and below that. I've
> >heard the following used:
> >
> >BMP characters: those with codepoints < 10000 (borrowing BMP from 10646)
> >aka UCS-2 characters
> >aka non-surrogate characters
> >
> >non-BMP characters: those with codepoints > FFFF
> >aka non-UCS-2 characters
> >aka surrogate characters
> >
>
> At the same time, it would be nice to have a Unicodally correct way
> of referring to planes 1 and 2, since there is an important boundary
> between them.
>
> And of course, the *proper* way to refer to plane 14 is to pretend it
> doesn't exist at all.  :-)
>
> --
> =
> John H. Jenkins
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> http://www.blueneptune.com/~tseng




Re: Names of planes, and request for sneak preview

2000-07-11 Thread John Cowan

Mark Davis wrote:

> It would be nice to have another positive, non-acronymic term for
> those characters above FFFF, but none has yet arisen.

I coined the term "Astral Planes" for the space 10000-10FFFF, but it
ticked off someone or other (I suppose because it might suggest that
they weren't worthy of implementation) and I stopped using it.

-- 

Schlingt dreifach einen Kreis um dies! || John Cowan <[EMAIL PROTECTED]>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,   || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.-- Coleridge (tr. Politzer)



Re: Names of planes, and request for sneak preview

2000-07-11 Thread John H. Jenkins

At 7:19 AM -0800 7/11/00, Mark Davis wrote:
>However, there are certain units or thresholds that are useful to distinguish
>in Unicode. The most important threshold is the one between FFFF and 10000:
>important for UTF-16 implementations (and to a minor degree, UTF-8
>implementations). So there are terms for codepoints above and below that. I've
>heard the following used:
>
>BMP characters: those with codepoints < 10000 (borrowing BMP from 10646)
>aka UCS-2 characters
>aka non-surrogate characters
>
>non-BMP characters: those with codepoints > FFFF
>aka non-UCS-2 characters
>aka surrogate characters
>

At the same time, it would be nice to have a Unicodally correct way 
of referring to planes 1 and 2, since there is an important boundary 
between them.

And of course, the *proper* way to refer to plane 14 is to pretend it 
doesn't exist at all.  :-)

-- 
=
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://www.blueneptune.com/~tseng



Re: Names of planes, and request for sneak preview

2000-07-11 Thread Mark Davis

We haven't used the notion of Planes and Groups. These actually derived, as far
as I can remember from early days in L2, from later-discarded mechanisms that
would let you swap in planes into the BMP. Thus it was important to distinguish
these levels. Planes and Groups are themselves not particularly useful in
Unicode, which has a flat coding space from 0 to 10FFFF. We basically just use
them now in communicating with our 10646 brethren.

However, there are certain units or thresholds that are useful to distinguish
in Unicode. The most important threshold is the one between FFFF and 10000:
important for UTF-16 implementations (and to a minor degree, UTF-8
implementations). So there are terms for codepoints above and below that. I've
heard the following used:

BMP characters: those with codepoints < 10000 (borrowing BMP from 10646)
aka UCS-2 characters
aka non-surrogate characters

non-BMP characters: those with codepoints > FFFF
aka non-UCS-2 characters
aka surrogate characters

Note: D800 - DFFF are *not* surrogate characters. They are surrogate
codepoints, two of which (in UTF-16) represent a surrogate character. The
disadvantage of using the term "surrogate character" is this possible
confusion; you don't have the same problem if you say "non-BMP
character". It would be nice to have another positive, non-acronymic term for
those characters above FFFF, but none has yet arisen.
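
The distinction between surrogate codepoints and the characters they represent may be easier to see in code. The following is a sketch of the standard UTF-16 arithmetic, not something from this message; the function name and the return convention are assumptions made for illustration.

```c
#include <stdint.h>

/* Split a non-BMP code point (0x10000..0x10FFFF) into two UTF-16 code units.
   Returns 0 on success, -1 if cp is outside that range. */
static int to_surrogate_pair(uint32_t cp, uint16_t *high, uint16_t *low)
{
    uint32_t v;

    if (cp < 0x10000u || cp > 0x10FFFFu)
        return -1;

    v = cp - 0x10000u;                           /* 20-bit offset past the BMP */
    *high = (uint16_t)(0xD800u + (v >> 10));     /* top 10 bits    -> D800..DBFF */
    *low  = (uint16_t)(0xDC00u + (v & 0x3FFu));  /* bottom 10 bits -> DC00..DFFF */
    return 0;
}
```

With this sketch, 0x10400 comes out as the pair D801 DC00: two surrogate codepoints that together stand for one non-BMP character.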

There are other useful boundaries:

Column - 16 values with all but the last 4 binary digits the same: e.g.
2060-206F

Window - 128 values with all but the last 7 binary digits the same, e.g.
2000-207F, or 2080-20FF. Used in SCSU; for compression, blocks of 128 are
useful. In the UTC, we try not to span window boundaries unnecessarily when
allocating characters (for historical reasons, we used to violate this, cf
Hebrew or the Kanas).

aka half-row (In 10646, Row is 256 values with all but the last 8 binary digits
the same).

Surrogate Block - 1024 values with all but the last 10 binary digits the same,
e.g. E0000-E03FF. In UTF-16, these have the same high (leading) surrogate code
value. In the UTC, we try not to span surrogate block boundaries unnecessarily
when allocating characters.
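
Each of these units is a power-of-two block, so the start of the block containing a given code point can be found by clearing its low bits. The helpers below are a sketch with made-up names, not code from this message. The last one uses the usual UTF-16 formula and assumes a code point of 0x10000 or above.

```c
#include <stdint.h>

/* Start of the 16-value column containing cp (clear the low 4 bits). */
static uint32_t column_start(uint32_t cp)
{
    return cp & ~(uint32_t)0xF;
}

/* Start of the 128-value window containing cp (clear the low 7 bits). */
static uint32_t window_start(uint32_t cp)
{
    return cp & ~(uint32_t)0x7F;
}

/* Start of the 1024-value surrogate block containing cp (clear the low 10 bits). */
static uint32_t surrogate_block_start(uint32_t cp)
{
    return cp & ~(uint32_t)0x3FF;
}

/* High (leading) surrogate for a non-BMP code point; cp must be >= 0x10000. */
static uint16_t high_surrogate_of(uint32_t cp)
{
    return (uint16_t)(0xD800u + ((cp - 0x10000u) >> 10));
}
```

Under these definitions every code point from E0000 to E03FF gets the same high surrogate, and E0400 gets the next one.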

Mark

Doug Ewell wrote:

> John Cowan <[EMAIL PROTECTED]> wrote:
>
> >> Everybody and his cat should know that BMP stands for Basic Multilingual
> >> Plane, and the Roadmap pages also show that SMP is short for Secondary
> >> Multilingual Plane.  What are SIP and GPP?
> >
> > Supplementary Ideographic Plane, General Purpose Plane.  Note that these
> > are 10646 names, not Unicode names.
>
> Interesting... I hadn't looked at it that way.  I know that the entire
> group/plane/row/cell breakdown is a 10646 thing.  Is there a Unicode-
> specific term for the range from U+0000 to U+FFFD, the code points that
> can be represented without surrogates?
>
> -Doug Ewell
>  Fullerton, California




Re: Names of planes, and request for sneak preview

2000-07-10 Thread Doug Ewell

John Cowan <[EMAIL PROTECTED]> wrote:

>> Everybody and his cat should know that BMP stands for Basic Multilingual
>> Plane, and the Roadmap pages also show that SMP is short for Secondary
>> Multilingual Plane.  What are SIP and GPP?
>
> Supplementary Ideographic Plane, General Purpose Plane.  Note that these
> are 10646 names, not Unicode names.

Interesting... I hadn't looked at it that way.  I know that the entire
group/plane/row/cell breakdown is a 10646 thing.  Is there a Unicode-
specific term for the range from U+0000 to U+FFFD, the code points that
can be represented without surrogates?

-Doug Ewell
 Fullerton, California



Re: Names of planes, and request for sneak preview

2000-07-10 Thread John Cowan

Doug Ewell wrote:
> 
> Michael Everson's "Roadmap" pages refer to Planes 0, 1, 2, and 14 as
> the BMP, SMP, SIP, and GPP respectively.
> 
> Everybody and his cat should know that BMP stands for Basic Multilingual
> Plane, and the Roadmap pages also show that SMP is short for Secondary
> Multilingual Plane.  What are SIP and GPP?

Supplementary Ideographic Plane, General Purpose Plane.  Note that these
are 10646 names, not Unicode names.

-- 

Schlingt dreifach einen Kreis um dies! || John Cowan <[EMAIL PROTECTED]>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,   || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.-- Coleridge (tr. Politzer)



Re: Names of planes, and request for sneak preview

2000-07-10 Thread Kenneth Whistler

Doug Ewell asked:

> 
> Michael Everson's "Roadmap" pages refer to Planes 0, 1, 2, and 14 as
> the BMP, SMP, SIP, and GPP respectively.
> 
> Everybody and his cat should know that BMP stands for Basic Multilingual
> Plane, and the Roadmap pages also show that SMP is short for Secondary
> Multilingual Plane.  What are SIP and GPP?

SMP: Secondary Multilingual Plane for scripts and symbols (= Plane 1)
SIP: Supplementary Plane for CJK Ideographs (= Plane 2)
SPP: Special Purpose Plane (= Plane 14)

GPP stood for "General Purpose Plane", which was the earlier acronym for the
SPP, subsequently changed.

> 
> Also, the Plane 1 Roadmap says that the Etruscan, Gothic, and Deseret
> scripts are virtually guaranteed to be encoded at U+10300, U+10330, and
> U+10400 respectively (they lack only the final approval from WG2, which
> is widely expected).  If I *PROMISE* to use them only for testing and
> not for "live" data or interchange, can I assume these scripts are laid
> out as shown in the proposals on Everson's site? 

You can do what you like with them, but don't assume they will not change.
Etruscan is likely to be renamed to Old Italic (including Etruscan), and will
most likely have at least one character removed and one character added.
Gothic is having one character removed (I don't know whether Everson's
document reflects that or not, but the FCD draft of 10646-2 should.)
Deseret is probably safe! No one has asked for any emendations to that script.

> In general, can someone
> with inside information on Plane 1 assignments provide a sneak preview,
> complete with the usual caveats that responsible implementors must follow
> (no guarantees, use at own risk, etc.)?

In addition to those three scripts, expect also to see Byzantine musical
symbols, basically unchanged from the earlier draft of 10646-2.
Western musical symbols will be encoded, but are likely to get churned
a bit from earlier drafts, due to lots of expert feedback on the drafts.
And then there are the mathematical alphanumerics, located at
U+1D400..U+1D7FF.

People who want details should refer to the SC2 website, where the FCD
for 10646-2 is available for public review. (It is a gigantic document,
however, because of the 40,000+ Chinese characters embedded in it.)
For those who can wait just a little longer, however, it is advisable to
wait until after the Athens WG2 meeting at the end of September, where
the final decisions will be made about assignments of characters for 10646-2.

--Ken




Re: Names of planes, and request for sneak preview

2000-07-10 Thread Robert Brady

On Mon, 10 Jul 2000, Doug Ewell wrote:

> What are SIP and GPP?

Supplemental Ideograph[ic] Plane and General Purpose Plane, I'd guess.

-- 
Robert