Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-25 Thread CE Whitehead

 
> From: karl williamson (pub...@khwilliamson.com)
> Date: Sun Jul 25 2010 - 17:00:14 CDT 
> . . . 
>> From: cewcat...@hotmail.com 
>> Date: Sun, 25 Jul 2010 16:24:01 -0400 
>> 
>> 
>> > . . . 
>> > Date: Sun, 25 Jul 2010 10:43:11 -0600 
>> > From: pub...@khwilliamson.com 
>> > 
> . . . 
>> > 
>> > Prudence would dictate, then, that when assigning code points to the
>> > numbers in a script, a contiguous block of 12-13 be reserved for
>> > them, such that the first one in the block be set aside for ZERO, the
>> > next for ONE, etc.
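If a script's ZERO sits at code point z and ONE through NINE follow in order, a digit's value is simply its code point minus z. Offset-based implementations rely on exactly this. Here is a minimal Python sketch of the idea. The function names are hypothetical, and the caller is assumed to already know z for the script in question.

```python
def digit_value_by_offset(ch: str, zero_cp: int) -> int:
    """Value of a digit, assuming 0-9 are encoded contiguously from zero_cp."""
    offset = ord(ch) - zero_cp
    if not 0 <= offset <= 9:
        raise ValueError(f"{ch!r} is not in the run starting at U+{zero_cp:04X}")
    return offset


def parse_decimal(text: str, zero_cp: int) -> int:
    """Parse a string of digits that all come from one contiguous run."""
    value = 0
    for ch in text:
        value = value * 10 + digit_value_by_offset(ch, zero_cp)
    return value


# ARABIC-INDIC DIGIT ZERO is U+0660; DEVANAGARI DIGIT ZERO is U+0966.
print(parse_decimal("\u0662\u0660\u0661\u0660", 0x0660))  # 2010
print(parse_decimal("\u0967\u0968\u0969", 0x0966))  # 123
```

The whole conversion is one subtraction per character, with no table. That is why scattering the digits would break such code.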
>> > 
>> > My original question, then, comes down to this: would it be reasonable
>> > to codify this prudence? People have said it will never happen. But no
>> > one has said why that is.
>> > 
>> > Obviously, things can happen that will mess this up--the Phaistos disk 
>> > could turn out to be a base-46 numbering system, as an extremely 
>> > unlikely example. But by dictating prudence now, most such eventualities 
>> > wouldn't happen. 
>> > 
>> > I have since looked at the Nt=Di characters. The ones that aren't in 
>> > contiguous runs are the superscripts and ones that would never be 
>> > considered to be decimal digits, such as a circled ZERO. 
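Karl's survey can be repeated mechanically against whatever Unicode version a given Python ships. In `unicodedata`, `decimal()` returns a value only for Numeric_Type=Decimal characters. The sketch below (function name hypothetical) uses that to report any such character that is not part of a run of ten code points carrying the values 0..9 in order.

```python
import sys
import unicodedata


def misplaced_decimal_digits() -> list:
    """Labels of Numeric_Type=Decimal characters that are not inside a run of
    ten consecutive code points with decimal values 0..9 in order."""
    misplaced = []
    for cp in range(sys.maxunicode + 1):
        value = unicodedata.decimal(chr(cp), None)
        if value is None:
            continue  # not Numeric_Type=Decimal
        zero = cp - value  # where this digit's ZERO must be if runs are contiguous
        run = [
            unicodedata.decimal(chr(zero + k), None)
            for k in range(10)
            if zero + k <= sys.maxunicode
        ]
        if run != list(range(10)):
            misplaced.append(f"U+{cp:04X}")
    return misplaced


print(misplaced_decimal_digits())  # [] if every run is contiguous
```

An empty list means that every Numeric_Type=Decimal character in that version follows the contiguous pattern. Superscripts and circled digits are never examined, because `decimal()` is undefined for them.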
>> Hi 
>> Are you proposing that superscripts be in contiguous runs or not? 

> I was not proposing that. Just the codification of what existing 
> practice has been for Numeric_Type=Decimal_Digit. Superscripts are of 
> Numeric_Type=Digit; the two names are too similar, and cause confusion. 
OK, that's clear enough now.
I tend to feel, however, that Asmus has raised a reasonable objection,
although in cases other than those where alphabetic characters are reused
as numerals, this might at least be a harmless policy to have (meaning I
cannot think of an objection myself at the moment).
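The Decimal/Digit distinction Karl mentions shows up directly in Python's `unicodedata`. `decimal()` is defined only for Numeric_Type=Decimal. `digit()` also covers Numeric_Type=Digit (superscripts, circled digits), and `numeric()` covers any numeric type. The helper below is a hypothetical convenience that derives the numeric type from those three calls. It is an approximation, not a direct lookup of the Numeric_Type property.

```python
import unicodedata
from typing import Optional


def numeric_type(ch: str) -> Optional[str]:
    """Approximate Numeric_Type from the three unicodedata accessors."""
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return None


# DIGIT SEVEN, SUPERSCRIPT TWO, CIRCLED DIGIT ONE, VULGAR FRACTION ONE HALF
for ch in ["7", "\u00b2", "\u2460", "\u00bd"]:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), numeric_type(ch))
```

Only DIGIT SEVEN is Nd and Decimal. SUPERSCRIPT TWO and CIRCLED DIGIT ONE are No and Digit, and the fraction is No and Numeric.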

> I know of no general-purpose programming language that evaluates math
> equations written with superscripts. If you want exponentiation, you have
> to specify an exponentiation operator.

>> Above you disallowed subscripts (although I think subscripts have some
>> mathematical meaning in equations, as do superscripts, and it might be
>> worth converting them, albeit separately from other numbers; if these
>> were converted, complete equations could be converted from character
>> strings. But with only digits 1-9 I do not see that much of an issue.
>> I'd personally like to find a subscript i, but so far I've only looked
>> at http://unicode.org/charts/PDF/U2070.pdf, where the subscripts 0-9 are
>> all contiguous but superscripts 1, 2, and 3 are not. Searching through
>> http://unicode.org/Public/UNIDATA/UnicodeData.txt, that was all I found.
>> I then started going through the code charts one by one and have so far
>> gotten as far as Old South Arabian without finding a subscript i or more
>> superscript decimal digits, though maybe I've missed something. The
>> Arabic sukun is not going to be part of a series of superscripts in any
>> case.)
>>
Sorry again. Subscript i is encoded; I missed it. Indeed, there are a number
of subscript characters currently encoded.
What I found were:
subscript lowercase letters: a; e; o; x; schwa; j; i; r; u; v (still looking
for more);
also Greek letters beta; gamma; rho; chi; phi (still looking for alpha and
delta, but of course maybe I do not know where to search yet).
(But this is another thread entirely.)
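A chart-by-chart search like the one above can also be done by character name. This sketch (function name hypothetical) scans the names in Python's bundled Unicode database for characters whose name contains SUBSCRIPT together with LATIN or GREEK. Matching on names is a heuristic. The result also depends on which Unicode version the interpreter ships.

```python
import sys
import unicodedata


def find_subscript_letters() -> dict:
    """Map code points to names for Latin/Greek characters named SUBSCRIPT."""
    found = {}
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")
        if "SUBSCRIPT" in name and ("LATIN" in name or "GREEK" in name):
            found[cp] = name
    return found


for cp, name in sorted(find_subscript_letters().items()):
    print(f"U+{cp:04X} {name}")
```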
Best,
C. E. Whitehead
cewcat...@hotmail.com
 
  

Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-25 Thread Asmus Freytag

On 7/25/2010 6:05 PM, Martin J. Dürst wrote:

> On 2010/07/26 4:37, Asmus Freytag wrote:
>
>> PPS: a very hypothetical tough case would be a script where letters
>> serve both as letters and as decimal place-value digits, and with modern
>> living practice.
>
> Well, there actually is such a script, namely Han. The digits (一、
> 二、三、四、五、六、七、八、九、〇) are used both as letters and as
> decimal place-value digits, and they are scattered widely, and of
> course there is a lot of modern living practice.

Martin,

you found the hidden clue and solved it, first prize :)

They do not show up as gc=Nd, nor as numeric types Digit or Decimal.

The situation is worse than you indicate, because the same characters 
are also used as elements in a system that doesn't use place-value, but 
uses special characters to show powers of 10.
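Here is a concrete example of the contrast. Read as place-value digits, 二〇一〇 is 2010. In the other system, the same digit characters combine with 十 (ten), 百 (hundred), and 千 (thousand), so that 二千十 is 2 × 1000 + 10. Below is a minimal sketch of the latter, assuming values below 10,000 and well-formed input. Larger units such as 万, formal variants, and validation are left out, and the function name is hypothetical.

```python
DIGITS = {"〇": 0, "零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
POWERS = {"十": 10, "百": 100, "千": 1000}


def parse_han_multiplicative(text: str) -> int:
    """Parse numbers below 10,000 written with power-of-ten characters.

    A digit multiplies the power that follows it; a bare power (as in 十五)
    counts as 1 times that power. Malformed sequences are NOT rejected."""
    total = 0
    pending = None  # digit not yet multiplied by a power of ten
    for ch in text:
        if ch in DIGITS:
            pending = DIGITS[ch]
        elif ch in POWERS:
            total += (1 if pending is None else pending) * POWERS[ch]
            pending = None
        else:
            raise ValueError(f"unsupported character {ch!r}")
    return total + (pending or 0)


print(parse_han_multiplicative("二千十"))    # 2010
print(parse_han_multiplicative("三百零五"))  # 305
print(parse_han_multiplicative("十五"))      # 15
```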


However, as I indicated in my original post, in situations like that
some change in practice usually takes place. Much of the
living modern practice in these countries involves ASCII digits. While 
the ideographic numbers are definitely still used in certain contexts, 
I've not seen them in input fields and would frankly doubt that they 
exist there. I would fully expect that they are supported as number 
format for output, at least in some implementations, and, of course, 
that input methods convert ASCII digits into them. In other words, I 
wonder whether automatic conversion goes only one-way for these numbers. 
I would suspect it, for the general case, but I don't actually know for 
sure.
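The one-way direction Asmus suspects is the easy one: producing ideographic place-value digits for output is a per-digit substitution. The sketch below assumes a plain place-value style with 〇 for zero. The function name is hypothetical, and implementations may well prefer the power-of-ten style instead.

```python
HAN_DIGITS = "〇一二三四五六七八九"  # index = digit value
TO_HAN = str.maketrans("0123456789", HAN_DIGITS)


def format_han_place_value(n: int) -> str:
    """Render a non-negative integer digit by digit with ideographic digits."""
    if n < 0:
        raise ValueError("negative numbers are outside this sketch")
    return str(n).translate(TO_HAN)


print(format_han_place_value(2010))  # 二〇一〇
```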


For someone in Karl's situation, it would be interesting to learn 
whether and to what extent he should bother supporting these numbers in 
his language extension.


A./



Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-25 Thread Martin J. Dürst



On 2010/07/26 4:37, Asmus Freytag wrote:


> PPS: a very hypothetical tough case would be a script where letters
> serve both as letters and as decimal place-value digits, and with modern
> living practice.

Well, there actually is such a script, namely Han. The digits (一、二、
三、四、五、六、七、八、九、〇) are used both as letters and as decimal
place-value digits, and they are scattered widely, and of course there
is a lot of modern living practice.


Regards,
Martin.

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:due...@it.aoyama.ac.jp
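These digits are scattered (〇 is U+3007, 一 is U+4E00, 二 is U+4E8C), so offset arithmetic cannot work for them. A place-value reader needs an explicit table instead. Here is a minimal sketch with a hypothetical function name. It cannot tell from the characters alone whether 一 is a digit or part of a word, so the caller must already know it is looking at a number.

```python
HAN_DIGIT_VALUES = {ch: value for value, ch in enumerate("〇一二三四五六七八九")}


def parse_han_place_value(text: str) -> int:
    """Read ideographic digits positionally, e.g. 二〇一〇 -> 2010."""
    value = 0
    for ch in text:
        if ch not in HAN_DIGIT_VALUES:
            raise ValueError(f"{ch!r} is not an ideographic place-value digit")
        value = value * 10 + HAN_DIGIT_VALUES[ch]
    return value


print(parse_han_place_value("二〇一〇"))  # 2010
```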



Re: CSUR Tonal

2010-07-25 Thread Doug Ewell
I don't know if this belongs on the public Unicode list or not.  I 
assume anyone who feels it should not will not keep their feelings 
secret for long.


Luke-Jr wrote:

I would appreciate any constructive input on this future proposal (for 
CSUR, not Unicode). Doug, do you mind if Tonal is positioned after 
your Ewellic script (maybe you plan to extend it?) from U+E6D0 to 
U+E6FF?


Since I am not the administrator of CSUR, I really don't have any say or 
control over that.  However, since Ewellic could be considered a 
"living" conscript (I added 13 new letters two years ago, and would like 
to add at least one more), it doesn't seem strictly necessary to close 
off the right end of its block.  There is a contiguous block of over 
4,200 code points starting at U+E830 (after Monofon), and it seems to me
that it could be used for Tonal instead.



Specific questions I have:


Following are first-cut answers.  I would not suggest immediately 
cranking out a new revision of your proposal based on these comments 
until you get others.


- Am I using COMBINING correctly? Is it sufficient for fonts to render 
units properly?


These are not really combining marks; they appear to be nothing more 
than ordinary Latin superscript letters.  As such, I would suggest not 
only that the "multiplication" and "division" superscripts be unified 
with each other, but that they be unified with already-encoded Latin 
superscript letters S, T, b, m, r, s, and t.  (The existing multi-letter 
symbols you may have seen in the Unicode Standard are for compatibility 
with legacy character sets, a requirement that does not apply to Tonal.)


Imagine my surprise, then, when I found that these superscripts are not 
formally encoded (only i and n are).  So in that case, I would suggest 
simply encoding seven Latin superscript letters as above, which of 
course could be used for other purposes as well, and for that reason 
might eventually become candidates for formal encoding.


- Do I have enough background on the Tonal system itself? It doesn't 
seem the right place to give all the specific details on how it works, 
but maybe more would be useful?


I don't think you need to explain it further, but you should definitely 
have a bibliographic reference to Nystrom's book, perhaps including a 
link to the Google Books scan since it might be difficult for readers to 
find this book any other way.


- Is my usage of HTML  for unit examples appropriate, despite it 
not rendering *just* right (at least not in my viewer)?


Looks fine to me.  Don't expect your typography, mine, or anyone else's 
to be pixel-for-pixel identical to that of a book printed 150 years ago.


- Should TONAL DIGIT NINE be renamed to TONAL HEXADECIMAL DIGIT NINE 
since it is invalid in a decimal context?


All of the digits should be HEXADECIMAL.  You should also fix the typo 
"HEXADECIAML" in your proposal.


- Should I define TONAL HEXADECIMAL DIGIT TEN, even though it looks 
like Unicode DIGIT NINE (U+0039)?


I would encode all of them as a set.  The Basic Latin digits have the 
wrong property (General Category Nd) and it could be argued it is only a 
coincidence that they have the "right" appearance for tonal use.


- Should I put "(This position shall not be used)" in reserved 
positions, or does this mean it shall not be used *ever*?


You have used it correctly.  Note, however, that you should not reserve 
space for "future" multiplication and division signs unless you think 
Nystrom defined some and you simply haven't been able to find a 
reference.  This is not a space for you to invent your own symbols, 
unless you say so in the prose.


- Is it proper to give glyph examples to TONAL COMBINING UNIT 
DIVISION/MULTIPLICATION for tran, song, and tam, which Nystrom never 
explained how specifically they should be written (following his 
general directions, they would overlap with the 
division/multiplications for ton and san)?


If Nystrom did not use them, you should not include them unless you 
specifically state they are your own invention.


--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s




Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-25 Thread karl williamson

CE Whitehead wrote:



Sorry for my last email. I have that signature in Hotmail for a few
private emails and always delete it otherwise; I meant to delete it
but was very, very tired.
 
--C. E. Whitehead
cewcat...@hotmail.com  


From: cewcat...@hotmail.com
To: pub...@khwilliamson.com; verd...@wanadoo.fr
CC: kent.karlsso...@telia.com; unicode@unicode.org
Subject: RE: Reasonable to propose stability policy on numeric type = decimal

Date: Sun, 25 Jul 2010 16:24:01 -0400


 > . . . 
 > Date: Sun, 25 Jul 2010 10:43:11 -0600

 > From: pub...@khwilliamson.com
 > To: verd...@wanadoo.fr
 > CC: kent.karlsso...@telia.com; unicode@unicode.org
 > Subject: Re: Reasonable to propose stability policy on numeric type = decimal

 >
 > Philippe Verdy wrote:
 > . . .

RE: Reasonable to propose stability policy on numeric type = decimal

2010-07-25 Thread CE Whitehead



Sorry for my last email. I have that signature in Hotmail for a few private
emails and always delete it otherwise; I meant to delete it but was very,
very tired.

 

--C. E. Whitehead

cewcat...@hotmail.com 


From: cewcat...@hotmail.com
To: pub...@khwilliamson.com; verd...@wanadoo.fr
CC: kent.karlsso...@telia.com; unicode@unicode.org
Subject: RE: Reasonable to propose stability policy on numeric type = decimal
Date: Sun, 25 Jul 2010 16:24:01 -0400




> . . . 
> Date: Sun, 25 Jul 2010 10:43:11 -0600
> From: pub...@khwilliamson.com
> To: verd...@wanadoo.fr
> CC: kent.karlsso...@telia.com; unicode@unicode.org
> Subject: Re: Reasonable to propose stability policy on numeric type = decimal
> 
> Philippe Verdy wrote:
> . . .

RE: Reasonable to propose stability policy on numeric type = decimal

2010-07-25 Thread CE Whitehead



Damn this indecision; I don't know; shall I take an axe to it, or shall I let 
it grow. --from Judith Wright, "That Seed"


 

> Date: Sun, 25 Jul 2010 10:43:11 -0600
> From: pub...@khwilliamson.com
> To: verd...@wanadoo.fr
> CC: kent.karlsso...@telia.com; unicode@unicode.org
> Subject: Re: Reasonable to propose stability policy on numeric type = decimal
> 
> Philippe Verdy wrote:
> > "Kent Karlsson"  wrote:
> >> Den 2010-07-25 03.09, skrev "Michael Everson" :
> >>> On 25 Jul 2010, at 02:02, Bill Poser wrote:
> >>>> As I said, it isn't a huge issue, but scattering the digits makes the
> >>>> programming a bit more complex and error-prone and the programs a little
> >>>> less efficient.
> >>> But it would still *work*. So my hyperbole was not outrageous. And nobody
> >>> has actually scattered them. Though there are various types of "runs" in
> >>> existing encoded digits and numbers.
> >> While not formally of general category Nd (they are "No"), the superscript
> >> digits are a bit scattered:
> >>
> >> 00B2;SUPERSCRIPT TWO
> >> 00B3;SUPERSCRIPT THREE
> >> 00B9;SUPERSCRIPT ONE
> >> 2070;SUPERSCRIPT ZERO
> >> 2074;SUPERSCRIPT FOUR
> >> ...
> >> 2079;SUPERSCRIPT NINE
> >>
> >> And there are situations where one wants to interpret them as in a
> >> decimal-position system.
> > 
> > Scattering does not only affect decimal digits, but also mathematical
> > operators needed to represent:
> > 
> > - the numeric sign (« - » or « + »), with at least two variants
> > representing the minus sign in the same system (either the ambiguous
> > HYPHEN-MINUS U+002D, the only one supported in many text-to-number
> > conversions, or the true mathematical MINUS SIGN U+2212 « − », which has
> > the same width as the plus sign), including some « alternating signs »
> > that exist in two opposite versions (« ± », « ∓ »);
> > 
> > - the characters that represent the decimal separator (« . » or « , »)
> > which is almost always needed but locale-specific (this is not just a
> > property of the script);
> > 
> > - the optional character used to note exponential notations and used
> > in text-to-number conversion (usually « e » or « E »);
> > 
> > - the optional characters used in the conventional formatting for
> > grouping digits (NNBSP, the French « fine », with possible automatic
> > fallback to THINSP in font renderers and in rich-text documents that
> > control the line-breaking property with a separate style, or fallback to
> > NBSP in plain-text documents, or to a standard SPACE in preformatted
> > plain-text documents; also « , » or « ' », and possibly other
> > punctuation in its « wide » form for ideographic scripts).
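A text-to-number routine can fold the sign and grouping variants listed above before converting. Below is a minimal sketch. It assumes ASCII digits, a caller-supplied decimal separator (since that is locale-specific), and the particular set of grouping characters shown. The function name is hypothetical, ± and ∓ are not handled, and exponent letters are simply passed through to `float()`.

```python
MINUS_SIGNS = {"-", "\u2212"}  # HYPHEN-MINUS, MINUS SIGN
# NNBSP, THIN SPACE, NBSP, SPACE and apostrophe, all treated as digit grouping
GROUP_SEPARATORS = {"\u202f", "\u2009", "\u00a0", " ", "'"}


def parse_localized_number(text: str, decimal_sep: str = ".") -> float:
    """Parse a number written with ASCII digits, folding sign and grouping
    variants. `decimal_sep` is a parameter because it depends on the
    locale, not on the script."""
    text = text.strip()
    sign = 1.0
    if text[:1] in MINUS_SIGNS:
        sign, text = -1.0, text[1:]
    elif text[:1] == "+":
        text = text[1:]
    # Whichever of "." and "," is not the decimal separator groups digits.
    group_seps = GROUP_SEPARATORS | ({".", ","} - {decimal_sep})
    digits = "".join(ch for ch in text if ch not in group_seps)
    return sign * float(digits.replace(decimal_sep, "."))


print(parse_localized_number("\u22121\u202f234,5", decimal_sep=","))  # -1234.5
print(parse_localized_number("+1,234.5"))  # 1234.5
print(parse_localized_number("1'234'567"))  # 1234567.0
```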
> > 
> > Some of them exist in superscript (exponent) or subscript (index)
> > versions (notably digits and decimal separators), but not all of them
> > (not all separators for grouping digits: using NNBSP may not be
> > appropriate, as its width is not adjusted and it does not have the
> > semantics of a superscript or subscript).
> > 
> > For generality, it seems better to assume that digits and other
> > characters needed to write numbers in the positional decimal system may
> > be scattered (libraries may still avoid the small overhead of table
> > lookups by inspecting a property of the character '0', or of the
> > convention in use, that says either that it starts a contiguous range
> > or that the complete sequence is stored in a lookup array for the 10
> > digits).
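The check on '0' suggested above might look like the following sketch. It does not rely on any dedicated property. Instead, it computes "starts a contiguous run" with `unicodedata.digit()` and falls back to a lookup table when the caller supplies the ten scattered digits. The function names are hypothetical.

```python
import unicodedata
from typing import Callable, Optional


def is_contiguous_run(zero: str) -> bool:
    """True if `zero` and the nine code points after it carry digit values 0..9."""
    start = ord(zero)
    return all(unicodedata.digit(chr(start + k), None) == k for k in range(10))


def make_digit_reader(zero: str, scattered: Optional[str] = None) -> Callable[[str], int]:
    """Return a function mapping one digit character to its value.

    Contiguous runs use offset arithmetic; otherwise the caller must supply
    the ten digits (values 0..9, in order) and a lookup table is built."""
    if is_contiguous_run(zero):
        start = ord(zero)

        def by_offset(ch: str) -> int:
            value = ord(ch) - start
            if not 0 <= value <= 9:
                raise ValueError(f"{ch!r} is outside the digit run")
            return value

        return by_offset

    if scattered is None or len(scattered) != 10:
        raise ValueError("scattered digits need an explicit ten-character sequence")
    table = {ch: value for value, ch in enumerate(scattered)}

    def by_lookup(ch: str) -> int:
        if ch not in table:
            raise ValueError(f"{ch!r} is not one of the listed digits")
        return table[ch]

    return by_lookup


read_arabic_indic = make_digit_reader("\u0660")  # ARABIC-INDIC DIGIT ZERO
print(read_arabic_indic("\u0667"))  # 7

SUPERSCRIPT_DIGITS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
read_superscript = make_digit_reader("\u2070", SUPERSCRIPT_DIGITS)
print(read_superscript("\u00b3"))  # 3
```

SUPERSCRIPT ZERO fails the contiguity test because U+2071 is a letter, not SUPERSCRIPT ONE. The superscripts therefore take the table path.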
> > 
> > The general category "Nd" may not always be accurate to find all
> > digits usable in decimal notations of integers, because the sequence
> > may have been incomplete when it was first encoded, and completed
> > later in scattered positions.
> > 
> > In this case, the digits will often have a general category of "No"
> > (or even "Nl") that will remain stable. What should also be stable is
> > their numeric value property (but I'm not sure that this is the case
> > for "Nl" digits, notably in script systems that use letters as digits in
> > a way similar to Greek or Hebrew, even though Greek and Hebrew digits
> > are not encoded separately from the letters that these number notations
> > borrow).
> > 
> > Also, I'm not sure that scripts that define "half-digits", or digits
> > with numeric values higher than 9, permit the use of their digits with
> > a numeric value between 0 and 9 in a positional decimal system. The
> > Roman numeral system is such a system (borrowing some scattered Latin
> > letters and adding a few other specific digits), and there this would
> > be completely wrong.
> > 
> > Or a base other than 10 could be assumed by their positional system,
> > even if their digits are encoded in a contiguous range of characters
> > for the subset of values 0 to 9. This is probably no longer the case
> > with scripts that have modern use, but in historical scripts or in
> > historical texts using a modern script, the implied base may be
> > different and would have used more or 

Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-25 Thread Asmus Freytag
The short answer to Karl's question is that there will not be an 
absolute guarantee.


The long answer is that, partly for the reasons he's mentioned, this 
won't be a practical problem.


A. Most of the living scripts that are in wide use have been encoded, 
including whatever digits are in use.
B. People reviewing encoding proposals include programmers who would 
object to scattering digits.


Thus, the only time this would be an issue is if there were some 
exceptional circumstances. And, as the name says, those circumstances 
could force an exception. If that happens there are two possible 
consequences:


1. The script in question is important enough that everybody will build
exceptions into their conversion algorithms.
2. The script is so unimportant that its number system won't be
supported (i.e. it's treated just like other text).


So, for extending your computer language, there's no reason to hold up 
support for many important scripts, just because of a hypothetical 
future exception.


A./

PS: since I suspect that more than one existing implementation is
offset-based, there would already be tremendous pressure to prevent
exceptions :)


PPS: a very hypothetical tough case would be a script where letters 
serve both as letters and as decimal place-value digits, and with modern 
living practice.  Having a policy like you suggest would officially make 
that unsupportable, but there are other cases, like the language that 
wanted to use the @ sign as a letter, that are de facto unsupportable with
the modern infrastructure. My suspicion is that users of such a script
would realize that their method is de facto unsupported (or unsupportable)
and find some way to change their ways. Changing practices in the face of
changing technology is something that happens all the time, not only to
small communities, but that's an entirely new subject :)






Re: CSUR Tonal

2010-07-25 Thread Luke-Jr
I would appreciate any constructive input on this future proposal (for CSUR, 
not Unicode). Doug, do you mind if Tonal is positioned after your Ewellic 
script (maybe you plan to extend it?) from U+E6D0 to U+E6FF?

Specific questions I have:
- Am I using COMBINING correctly? Is it sufficient for fonts to render units 
properly?
- Do I have enough background on the Tonal system itself? It doesn't seem the 
right place to give all the specific details on how it works, but maybe more 
would be useful?
- Is my usage of HTML  for unit examples appropriate, despite it not 
rendering *just* right (at least not in my viewer)?
- Should TONAL DIGIT NINE be renamed to TONAL HEXADECIMAL DIGIT NINE since it 
is invalid in a decimal context?
- Should I define TONAL HEXADECIMAL DIGIT TEN, even though it looks like 
Unicode DIGIT NINE (U+0039)?
- Should I put "(This position shall not be used)" in reserved positions, or 
does this mean it shall not be used *ever*?
- Is it proper to give glyph examples to TONAL COMBINING UNIT 
DIVISION/MULTIPLICATION for tran, song, and tam, which Nystrom never explained 
how specifically they should be written (following his general directions, 
they would overlap with the division/multiplications for ton and san)?

Thanks,

Luke
Title: Tonal

Tonal: U+E6D0 - U+E6FF
0.1.alpha1
Proposal 2010-07-

NOTE: This is still a proposed encoding and has not been standardized.
		
			
In 1859, John W. Nystrom proposed a hexadecimal (base-16) system of notation, arithmetic, and metrology called the Tonal System. In addition to new weights and measures, his proposal included a new calendar with sixteen months, a new system of coinage, and a hexadecimal clock with sixteen hours in a day.

Digits

The Tonal digits are named, in order starting from zero: noll, an, de, ti, go, su, by, ra, me, ni, ko, hu, vy, la, po, and fy.
Noll (zero) through Me (eight) use the same Arabic glyphs as the decimal system (U+0030 - U+0038).
The glyph for Arabic nine (U+0039) is reused for Ko (ten).
This script defines new glyphs for the digits ni, hu, vy, la, po, and fy.
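Read against the encoding table of this draft, a Tonal numeral mixes ASCII glyphs with proposed private-use characters, and the glyph 9 (U+0039) means ko (ten). Below is a minimal sketch of a base-16 converter. It assumes the draft assignments: U+E6D9 for ni (nine), and U+E6DB..U+E6DF for hu through fy (eleven to fifteen). These are proposed CSUR code points, not standardized ones, and the function name is hypothetical.

```python
# Draft Tonal digit values (CSUR proposal, not standardized):
# U+0030..U+0038 -> 0..8, U+E6D9 -> 9 (ni), U+0039 -> 10 (ko),
# U+E6DB..U+E6DF -> 11..15 (hu, vy, la, po, fy)
TONAL_VALUES = {chr(0x30 + v): v for v in range(9)}
TONAL_VALUES["\ue6d9"] = 9
TONAL_VALUES["9"] = 10
TONAL_VALUES.update({chr(0xE6DB + i): 11 + i for i in range(5)})


def parse_tonal(text: str) -> int:
    """Convert a Tonal (base-16) numeral written with the draft glyphs to an int."""
    value = 0
    for ch in text:
        if ch not in TONAL_VALUES:
            raise ValueError(f"{ch!r} is not a Tonal digit in this draft")
        value = value * 16 + TONAL_VALUES[ch]
    return value


print(parse_tonal("10"))            # 16
print(parse_tonal("9"))             # 10, not nine
print(parse_tonal("\ue6df\ue6df"))  # 255
```

Because "9" means ten here, strings in this notation must never be fed to an ordinary decimal parser.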
		
		
Units

The Tonal system also defines new units, called meters, galls, tims, pons, horse-powers, dollars, and temps.

The units are abbreviated by capital letters (except for temps, which use the digraph Tp),
and their multiplication and division are written as an exponent, by a small letter placed before or after the unit, thus:
Mt = Meterton,
tM = Tonmeter,
Gs = Gallsan,
Ts = Timsan,
Pm = Ponmills,
etc.
These multiplications and divisions are encoded as combining marks, which can be combined with the standard ASCII letters.
		
		
Music

In this system, music is rearranged into five new clefs, called Canto, Alto, Treble, Tenor, and Bass.
These new clefs are encoded and can be used along with the classical musical symbols encoded at U+1D100.

Encoding Structure

The Tonal block is divided into the following ranges:

U+E6D0->U+E6DF  Digits
U+E6E0->U+E6EF  Unit multiplications and divisions
U+E6F0->U+E6FF  Musical clefs
		
		
			
U+E6D0  (This position shall not be used)
U+E6D1  (This position shall not be used)
U+E6D2  (This position shall not be used)
U+E6D3  (This position shall not be used)
U+E6D4  (This position shall not be used)
U+E6D5  (This position shall not be used)
U+E6D6  (This position shall not be used)
U+E6D7  (This position shall not be used)
U+E6D8  (This position shall not be used)
U+E6D9  TONAL DIGIT NINE
U+E6DA  (This position shall not be used)
U+E6DB  TONAL HEXADECIAML DIGIT ELEVEN
U+E6DC  TONAL HEXADECIAML DIGIT TWELVE
U+E6DD  TONAL HEXADECIAML DIGIT THIRTEEN
U+E6DE  TONAL HEXADECIAML DIGIT FOURTEEN
U+E6DF  TONAL HEXADECIAML DIGIT FIFTEEN
U+E6E0  (This position reserved for future division symbol)
U+E6E1  TONAL COMBINING UNIT DIVISION TRAN
U+E6E2  TONAL COMBINING UNIT DIVISION SONG
U+E6E3  TONAL COMBINING UNIT DIVISION TAM
U+E6E4  TONAL COMBINING UNIT DIVISION BONG
U+E6E5  TONAL COMBINING UNIT DIVISION MILL
U+E6E6  TONAL COMBINING UNIT DIVISION SAN
U+E6E7  TONAL COMBINING UNIT DIVISION TON
U+E6E8  TONAL COMBINING UNIT MULTIPLICATION TON
U+E6E9  TONAL COMBINING UNIT MULTIPLICATION SAN
U+E6EA  TONAL COMBINING UNIT MULTIPLICATION MILL
U+E6EB  TONAL COMBINING UNIT MULTIPLICATION BONG
U+E6EC  TONAL COMBINING UNIT MULTIPLICATION TAM
U+E6ED  TONAL COMBINING

Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-25 Thread karl williamson

Philippe Verdy wrote:

> "Kent Karlsson" wrote:
>
> > On 2010-07-25 03.09, "Michael Everson" wrote:
> >
> > > On 25 Jul 2010, at 02:02, Bill Poser wrote:
> > >
> > > > As I said, it isn't a huge issue, but scattering the digits makes the
> > > > programming a bit more complex and error-prone and the programs a
> > > > little less efficient.
> > >
> > > But it would still *work*. So my hyperbole was not outrageous. And
> > > nobody has actually scattered them. Though there are various types of
> > > "runs" in existing encoded digits and numbers.
> >
> > While not formally of general category Nd (they are "No"), the
> > superscript digits are a bit scattered:
> >
> > 00B2;SUPERSCRIPT TWO
> > 00B3;SUPERSCRIPT THREE
> > 00B9;SUPERSCRIPT ONE
> > 2070;SUPERSCRIPT ZERO
> > 2074;SUPERSCRIPT FOUR
> > ...
> > 2079;SUPERSCRIPT NINE
> >
> > And there are situations where one wants to interpret them as in a
> > decimal-position system.


> Scattering affects not only decimal digits but also the mathematical
> operators needed to represent:
>
> - the numeric sign (« - » or « + »), with at least two variants of the
> minus sign in the same system (either the ambiguous hyphen-minus, the
> only one supported by many text-to-number conversions, or the true
> mathematical minus sign U+2212 « − », which has the same width as the
> plus sign), including some « alternating signs » that exist in two
> opposite versions (« ± », « ∓ »);
>
> - the character used as the decimal separator (« . » or « , »), which
> is almost always needed but is locale-specific (it is not just a
> property of the script);
>
> - the optional character used to mark exponential notation in
> text-to-number conversion (usually « e » or « E »);
>
> - the optional characters used in conventional formatting to group
> digits: NNBSP (known in French typography as « fine »), with possible
> automatic fallback to THINSP in font renderers and in rich-text
> documents that control line breaking through separate styling, to
> NBSP in plain-text documents, or to an ordinary SPACE in preformatted
> plain text; « , » or « ' »; and possibly other punctuation in its
> « wide » form for ideographic scripts.
>
> Some of them exist in superscript (exponent) or subscript (index)
> versions (notably digits and decimal separators), but not all of them
> do: not every digit-grouping separator has such a version, and using
> NNBSP there may be inappropriate, because its width is not adjusted
> and it does not carry superscript or subscript semantics.
>
> For generality, it seems better to assume that the digits and other
> characters needed to write numbers in the positional decimal system
> may be scattered. Libraries can still avoid the small overhead of
> table lookups by inspecting a property of the character for '0' (or
> of the convention in use) that says either that it starts a
> contiguous range, or that the complete sequence of 10 digits is
> stored in a lookup array.
>
> General category "Nd" may not find all the digits usable in the
> decimal notation of integers, because a sequence may have been
> incomplete when it was first encoded and completed later at scattered
> positions.
>
> In that case, the digits will often have general category "No" (or
> even "Nl"), which will remain stable. Their numeric value property
> should also be stable (though I'm not sure that this is the case for
> "Nl" digits, notably in script systems that use letters as digits the
> way Greek or Hebrew do, even though Greek and Hebrew digits are not
> encoded separately from the letters that these number notations
> borrow).
>
> Also, I'm not sure that scripts that define "half-digits", or digits
> with numeric values higher than 9, permit the use of their digits with
> a numeric value between 0 and 9 in a positional decimal system. The
> Roman numeral system is one such system (borrowing some scattered
> Latin letters and adding a few other specific digits), where this
> would be completely wrong.
>
> Or their positional system could assume a base other than 10, even if
> their digits for the values 0 to 9 are encoded in a contiguous range
> of characters. This is probably no longer the case for scripts in
> modern use, but in historical scripts, or in historical texts written
> in a modern script, the implied base may be different and may have
> used more or fewer distinct digits. So instead of guessing from the
> encoded text automatically, it may be preferable to annotate the text
> (easy to do if the conversion of the historical text uses some
> rich-text format) to specify how the numeric value of the original
> number should be interpreted.
>
> And sometimes, conversion to the superscript/subscript compatibility
> characters will not be possible, even if some of them may be converted
> safely to their numeric value after detecting that they are
> superscript/subscript and that they don't behave like normal digits
> (16²⁰ must NOT be interpreted as the numeric value 1620, but must be
> parsed as two successive numbers, 16 and 20, where the second one has
> the semantics of an exponent, as if there were an exponentiatio

Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-25 Thread Philippe Verdy
"Kent Karlsson"  wrote:
> On 2010-07-25 03.09, "Michael Everson" wrote:
> > On 25 Jul 2010, at 02:02, Bill Poser wrote:
> >> As I said, it isn't a huge issue, but scattering the digits makes the
> >> programming a bit more complex and error-prone and the programs a little
> >> less efficient.
> >
> > But it would still *work*. So my hyperbole was not outrageous. And nobody
> > has actually scattered them. Though there are various types of "runs" in
> > existing encoded digits and numbers.
>
> While not formally of general category Nd (they are "No"), the superscript
> digits are a bit scattered:
>
> 00B2;SUPERSCRIPT TWO
> 00B3;SUPERSCRIPT THREE
> 00B9;SUPERSCRIPT ONE
> 2070;SUPERSCRIPT ZERO
> 2074;SUPERSCRIPT FOUR
> ...
> 2079;SUPERSCRIPT NINE
>
> And there are situations where one wants to interpret them as in a
> decimal-position system.

Scattering affects not only decimal digits but also the mathematical
operators needed to represent:

- the numeric sign (« - » or « + »), with at least two variants of the
minus sign in the same system (either the ambiguous hyphen-minus, the
only one supported by many text-to-number conversions, or the true
mathematical minus sign U+2212 « − », which has the same width as the
plus sign), including some « alternating signs » that exist in two
opposite versions (« ± », « ∓ »);

- the character used as the decimal separator (« . » or « , »), which
is almost always needed but is locale-specific (it is not just a
property of the script);

- the optional character used to mark exponential notation in
text-to-number conversion (usually « e » or « E »);

- the optional characters used in conventional formatting to group
digits: NNBSP (known in French typography as « fine »), with possible
automatic fallback to THINSP in font renderers and in rich-text
documents that control line breaking through separate styling, to
NBSP in plain-text documents, or to an ordinary SPACE in preformatted
plain text; « , » or « ' »; and possibly other punctuation in its
« wide » form for ideographic scripts.
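A minimal sketch of the normalization this list implies before text reaches a conventional text-to-number routine (here Python's built-in `float`). The function `parse_localized` and its parameters are hypothetical: the caller must supply the decimal separator and grouping characters, since, as noted, these depend on the locale rather than on the script. It deliberately ignores alternating signs and superscript exponents.

```python
MINUS_SIGN = "\u2212"  # U+2212 MINUS SIGN
# NNBSP, THINSP, NBSP, SPACE and apostrophe as default grouping characters.
DEFAULT_GROUPING = "\u202f\u2009\u00a0 '"


def parse_localized(text: str, decimal_sep: str = ".",
                    group_seps: str = DEFAULT_GROUPING) -> float:
    """Convert a locale-formatted number to float (illustrative only)."""
    # Normalize U+2212 to the ASCII hyphen-minus that float() expects.
    s = text.strip().replace(MINUS_SIGN, "-")
    # Drop digit-grouping characters.
    s = "".join(ch for ch in s if ch not in group_seps)
    # Map the locale's decimal separator to ".".
    if decimal_sep != ".":
        s = s.replace(decimal_sep, ".")
    # float() itself accepts "e" or "E" for exponential notation.
    return float(s)


print(parse_localized("\u22121\u202f234,5", decimal_sep=","))  # -1234.5
print(parse_localized("1,234.5", group_seps=","))              # 1234.5
```

Grouping characters are removed before the decimal separator is mapped, so a locale that groups with "." and separates decimals with "," works when called with `group_seps="."` and `decimal_sep=","`.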

Some of them exist in superscript (exponent) or subscript (index)
versions (notably digits and decimal separators), but not all of them
do: not every digit-grouping separator has such a version, and using
NNBSP there may be inappropriate, because its width is not adjusted
and it does not carry superscript or subscript semantics.

For generality, it seems better to assume that the digits and other
characters needed to write numbers in the positional decimal system
may be scattered. Libraries can still avoid the small overhead of
table lookups by inspecting a property of the character for '0' (or
of the convention in use) that says either that it starts a
contiguous range, or that the complete sequence of 10 digits is
stored in a lookup array.
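The shortcut described above can be sketched as follows. This is not a Unicode property or any existing library's API: `DigitSet` is a hypothetical descriptor built from the ten characters for 0 to 9, in value order. It checks once whether they form a contiguous run starting at the zero, then uses subtraction in that case and a lookup table otherwise, which the scattered superscript digits require.

```python
from typing import Dict, Optional, Sequence


class DigitSet:
    """Hypothetical descriptor for the ten digits 0..9 of one convention."""

    def __init__(self, digits: Sequence[str]) -> None:
        if len(digits) != 10:
            raise ValueError("expected exactly ten digits, 0 through 9")
        self.zero = ord(digits[0])
        # True when digit i is encoded at zero + i for every i.
        self.contiguous = all(ord(d) == self.zero + i for i, d in enumerate(digits))
        # Lookup table, consulted only when the run is scattered.
        self.table: Dict[str, int] = {d: i for i, d in enumerate(digits)}

    def value(self, ch: str) -> Optional[int]:
        if self.contiguous:
            offset = ord(ch) - self.zero
            return offset if 0 <= offset <= 9 else None
        return self.table.get(ch)

    def parse(self, text: str) -> int:
        """Read text as a non-negative decimal integer in this convention."""
        n = 0
        for ch in text:
            d = self.value(ch)
            if d is None:
                raise ValueError(f"not a digit in this set: {ch!r}")
            n = n * 10 + d
        return n


ASCII = DigitSet("0123456789")
SUPERSCRIPT = DigitSet("\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079")
print(ASCII.contiguous, SUPERSCRIPT.contiguous)  # True False
print(SUPERSCRIPT.parse("\u00b2\u2070"))         # 20
```

The contiguity test runs once per digit set, so the per-character cost in the common contiguous case is a single subtraction and range check.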

General category "Nd" may not find all the digits usable in the
decimal notation of integers, because a sequence may have been
incomplete when it was first encoded and completed later at scattered
positions.

In that case, the digits will often have general category "No" (or
even "Nl"), which will remain stable. Their numeric value property
should also be stable (though I'm not sure that this is the case for
"Nl" digits, notably in script systems that use letters as digits the
way Greek or Hebrew do, even though Greek and Hebrew digits are not
encoded separately from the letters that these number notations
borrow).
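Python's standard `unicodedata` module exposes these properties, so the distinctions above can be checked directly. The helper name `numeric_properties` and the sample characters are chosen here for illustration. SUPERSCRIPT TWO is "No" with a digit value but no decimal value, ROMAN NUMERAL TWELVE is "Nl" with numeric value 12, and GREEK SMALL LETTER ALPHA, although used as a numeral in Greek numbering, carries no numeric value at all.

```python
import unicodedata
from typing import Optional, Tuple


def numeric_properties(ch: str) -> Tuple[str, Optional[int], Optional[int], Optional[float]]:
    """Return (general category, decimal, digit, numeric) for one character."""
    return (
        unicodedata.category(ch),
        unicodedata.decimal(ch, None),  # defined only for Numeric_Type=Decimal (Nd)
        unicodedata.digit(ch, None),    # also defined for Numeric_Type=Digit
        unicodedata.numeric(ch, None),  # defined for any numeric value
    )


print(numeric_properties("7"))       # ('Nd', 7, 7, 7.0)
print(numeric_properties("\u00b2"))  # ('No', None, 2, 2.0)      SUPERSCRIPT TWO
print(numeric_properties("\u216b"))  # ('Nl', None, None, 12.0)  ROMAN NUMERAL TWELVE
print(numeric_properties("\u03b1"))  # ('Ll', None, None, None)  GREEK SMALL LETTER ALPHA
```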

Also, I'm not sure that scripts that define "half-digits", or digits
with numeric values higher than 9, permit the use of their digits with
a numeric value between 0 and 9 in a positional decimal system. The
Roman numeral system is one such system (borrowing some scattered
Latin letters and adding a few other specific digits), where this
would be completely wrong.

Or their positional system could assume a base other than 10, even if
their digits for the values 0 to 9 are encoded in a contiguous range
of characters. This is probably no longer the case for scripts in
modern use, but in historical scripts, or in historical texts written
in a modern script, the implied base may be different and may have
used more or fewer distinct digits. So instead of guessing from the
encoded text automatically, it may be preferable to annotate the text
(easy to do if the conversion of the historical text uses some
rich-text format) to specify how the numeric value of the original
number should be interpreted.
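The suggestion above amounts to making the base an explicit, annotated input rather than something inferred from the characters. A minimal sketch with a hypothetical `positional_value` function, assuming the digit values have already been extracted from the text: the same digit values give different numbers in different bases, and a digit that does not fit the annotated base is rejected instead of being silently misread.

```python
from typing import Sequence


def positional_value(digit_values: Sequence[int], base: int) -> int:
    """Evaluate digit values (most significant first) in an explicit base."""
    if base < 2:
        raise ValueError("base must be at least 2")
    n = 0
    for d in digit_values:
        if not 0 <= d < base:
            raise ValueError(f"digit {d} is not valid in base {base}")
        n = n * base + d
    return n


print(positional_value([1, 6], base=10))  # 16
print(positional_value([1, 6], base=16))  # 22
```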

And sometimes, conversion to the superscript/subscript compatibility
characters will not be possible, even if some of them may be converted
safely to their numeric value after detecting that they are
superscript/subscript and that they don't behave like normal digits
(16²⁰ must NOT be interpreted as the numeric value 1620, but must be
parsed as two successive numbers, 16 and 20, where the second one has
the semantics

Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-25 Thread Kent Karlsson

On 2010-07-25 03.09, "Michael Everson" wrote:

> On 25 Jul 2010, at 02:02, Bill Poser wrote:
> 
>> As I said, it isn't a huge issue, but scattering the digits makes the
>> programming a bit more complex and error-prone and the programs a little less
>> efficient.
> 
> But it would still *work*. So my hyperbole was not outrageous. And nobody has
> actually scattered them. Though there are various types of "runs" in existing
> encoded digits and numbers.

While not formally of general category Nd (they are "No"), the superscript
digits are a bit scattered:

00B2;SUPERSCRIPT TWO
00B3;SUPERSCRIPT THREE
00B9;SUPERSCRIPT ONE
2070;SUPERSCRIPT ZERO
2074;SUPERSCRIPT FOUR
2075;SUPERSCRIPT FIVE
2076;SUPERSCRIPT SIX
2077;SUPERSCRIPT SEVEN
2078;SUPERSCRIPT EIGHT
2079;SUPERSCRIPT NINE

And there are situations where one wants to interpret them as in a
decimal-position system.
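Reading superscript digits "as in a decimal-position system" still means treating a superscript run as a separate number, as Philippe points out elsewhere in this thread with 16²⁰. A minimal sketch, using a hypothetical `parse_power` helper that only handles a run of ASCII digits followed by a run of superscript digits; the superscript table has to be spelled out because, as listed above, their code points are scattered.

```python
from typing import Tuple

# Superscript digits 0..9 in value order; their code points are scattered.
SUPERSCRIPT_DIGITS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"


def parse_power(text: str) -> Tuple[int, int]:
    """Split e.g. '16²⁰' into (base, exponent) = (16, 20), not 1620."""
    i = 0
    while i < len(text) and text[i] in "0123456789":
        i += 1
    base_part, exp_part = text[:i], text[i:]
    if not base_part or not exp_part:
        raise ValueError("expected ASCII digits followed by superscript digits")
    exponent = 0
    for ch in exp_part:
        d = SUPERSCRIPT_DIGITS.find(ch)
        if d < 0:
            raise ValueError(f"not a superscript digit: {ch!r}")
        exponent = exponent * 10 + d
    return int(base_part), exponent


base, exponent = parse_power("16\u00b2\u2070")
print(base, exponent)    # 16 20
print(base ** exponent)  # 16**20, i.e. 2**80, not 1620
```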

/kent k