Re: statistics

2010-10-12 Thread Asmus Freytag

On 10/11/2010 9:49 PM, Janusz S. Bień wrote:

 On Mon, 11 Oct 2010  announceme...@unicode.org wrote:

  The newly finalized Unicode Version 6.0 adds 2,088 characters,

 What is the current total? Is other statistical information available
 somewhere?

The announcement gives a link to click through.

There you will find more statistics.

A./

 Best regards

 JSB






Re: statistics

2010-10-12 Thread Janusz S. Bień
On Mon, 11 Oct 2010  Asmus Freytag asm...@ix.netcom.com wrote:

 On 10/11/2010 9:49 PM, Janusz S. Bień wrote:
  On Mon, 11 Oct 2010  announceme...@unicode.org wrote:

   The newly finalized Unicode Version 6.0 adds 2,088 characters,
  What is the current total? Is other statistical information available
  somewhere?
 The announcement gives a link to click through.

 There you will find more statistics.

I guess you mean Character Assignment Overview at

  http://www.unicode.org/versions/Unicode6.0.0/

However, it does not provide a precise answer to my primary question,
which is not purely arithmetic but depends on the definition of a
character. In particular, do noncharacters count as characters?

Regards

JSB

-- 
dr hab. Janusz S. Bien, prof. UW -  Uniwersytet Warszawski (Katedra Lingwistyki 
Formalnej)
Prof. Janusz S. Bien - Warsaw University (Department of Formal Linguistics)
jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/




Re: statistics

2010-10-12 Thread Andrew West
2010/10/12 Janusz S. Bień jsb...@mimuw.edu.pl:

   The newly finalized Unicode Version 6.0 adds 2,088 characters,
 What is the current total? Is other statistical information available
 somewhere?

 However, it does not provide a precise answer to my primary question,
 which is not purely arithmetic but depends on the definition of a
 character. In particular, do noncharacters count as characters?

The Wikipedia article on Unicode gives the current total, and explains
what the various categories of characters are:

http://en.wikipedia.org/wiki/Unicode

I give a detailed breakdown of character statistics by Unicode
version (from 1.0.0 to 6.0) at:

http://babelstone.blogspot.com/2005/11/how-many-unicode-characters-are-there.html

Andrew




FW: statistics

2010-10-12 Thread Ernest van den Boogaard

FW to Unicode ml

From: ernestvandenbooga...@hotmail.com
To: jsb...@mimuw.edu.pl
Subject: RE: statistics
Date: Tue, 12 Oct 2010 10:13:17 +0200








Table 2-3 in Section 2.4 of Unicode 5.2 lists which General Categories count as
characters. Excluded are surrogates, private use, noncharacters, and reserved
code points. Note that format characters (Cf) are included as characters. The
code points with formatting aspects in C0 and C1 are controls (Cc), and so are
excluded.

The total number of characters in 6.0 is 109,242 + 142 = 109,384 (graphic plus
format characters).
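
A tally of this kind can be cross-checked with a short script. The following is only a minimal Python sketch using the standard `unicodedata` module; the helper names `is_noncharacter` and `count_code_point_types` are hypothetical, and the graphic, format, and reserved totals depend on the Unicode version bundled with the Python interpreter, so they will generally differ from the 6.0 figures. It prints the total both with and without the 65 control codes, since that is the point on which definitions differ.

```python
import unicodedata
from collections import Counter


def is_noncharacter(cp: int) -> bool:
    """True for the 66 noncharacters: U+FDD0..U+FDEF plus the last two
    code points of each of the 17 planes (U+xFFFE and U+xFFFF)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def count_code_point_types() -> Counter:
    """Tally all 0x110000 code points by type (hypothetical helper)."""
    counts = Counter()
    for cp in range(0x110000):
        gc = unicodedata.category(chr(cp))
        if gc == "Cc":
            counts["control"] += 1
        elif gc == "Cf":
            counts["format"] += 1
        elif gc == "Co":
            counts["private_use"] += 1
        elif gc == "Cs":
            counts["surrogate"] += 1
        elif gc == "Cn":
            # Cn covers both noncharacters and reserved code points.
            kind = "noncharacter" if is_noncharacter(cp) else "reserved"
            counts[kind] += 1
        else:
            counts["graphic"] += 1
    return counts


counts = count_code_point_types()
print(dict(counts))
print("graphic + format:", counts["graphic"] + counts["format"])
print("graphic + format + control:",
      counts["graphic"] + counts["format"] + counts["control"])
```

The control, surrogate, private-use, and noncharacter counts are fixed by the architecture of the standard, so only the graphic, format, and reserved figures change from version to version.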

Regards,
Ernest van den Boogaard


Creative people on Twitter

2010-10-12 Thread Jeroen Ruigrok van der Werven
Not satisfied with Twitter's plain-text-only option, people currently seem
to be following a trend of writing "love" as ℒℴѵℯ (U+2112, U+2134, U+0475,
U+212F) to get a sort of handwritten display.
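
For anyone who wants to see what those four code points are, here is a small Python sketch (standard `unicodedata` module only; the variable names are just for illustration). It prints each character's name and then shows what NFKC normalization does to the string: the three letterlike symbols fold back to plain Latin letters, while the Cyrillic letter has no compatibility mapping and stays as it is.

```python
import unicodedata

# The four code points listed above, written as escapes to avoid font issues.
styled = "\u2112\u2134\u0475\u212F"

for ch in styled:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# NFKC applies compatibility mappings; U+0475 has none, so it survives.
folded = unicodedata.normalize("NFKC", styled)
print([f"U+{ord(ch):04X}" for ch in folded])
```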

Creative, that's for sure.

-- 
Jeroen Ruigrok van der Werven asmodai(-at-)in-nomine.org / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Time is a twofold teacher, harsh and yet patient like no-one...



Re: statistics

2010-10-12 Thread Doug Ewell

Ernest van den Boogaard wrote:

 Table 2-3 in Section 2.4 of Unicode 5.2 lists which General Categories
 count as characters. Excluded are surrogates, private use, noncharacters,
 and reserved code points. Note that format characters (Cf) are included as
 characters. The code points with formatting aspects in C0 and C1 are
 controls (Cc), and so are excluded.


I don't understand why any control characters would be excluded from a 
count of characters.


--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s ­




Re: Creative people on Twitter

2010-10-12 Thread Leonardo Boiko
I guess it’s only a matter of 𝐭𝐢𝐦𝐞 before people start doing
things like 𝖙𝖍𝖎𝖘 (notice this email is plain-text).
-- 
Leonardo Boiko




James Kass and Code2000 font

2010-10-12 Thread Alan Wood
I am used to relying on fonts from James Kass to display new Unicode 
characters, 
but his fonts have not been updated for Unicode 5.2 yet, and he has not 
contributed to this list for some time.

I have e-mailed him, but he has not replied, which is not usual for James.

Does anyone know what has happened to James?

Incidentally, many of the new symbols in Unicode 6 are available in the Symbola 
font from George Douros, and they can be seen in Firefox:
http://users.teilar.gr/~g1951d/

Regards

Alan Wood
http://www.alanwood.net (Unicode, special characters, pesticide names) 


  




Re: Creative people on Twitter

2010-10-12 Thread David Starner
On Tue, Oct 12, 2010 at 7:53 AM, Leonardo Boiko leobo...@gmail.com wrote:
 I guess it’s only a matter of 𝐭𝐢𝐦𝐞 before people start doing
 things like 𝖙𝖍𝖎𝖘 (notice this email is plain-text).

Not that soon on Twitter, as Twitter apparently runs a filter and cuts
off all characters above U+ a couple weeks after posting.

-- 
Where there is life, there is hope.




Re: Creative people on Twitter

2010-10-12 Thread Doug Ewell
Leonardo Boiko leoboiko at gmail dot com wrote:

 I guess it’s only a matter of 𝐭𝐢𝐦𝐞 before people start doing
 things like 𝖙𝖍𝖎𝖘 (notice this email is plain-text).

I assumed this would become a big fad, back when I wrote my MathText
tool to automate the process, but it turns out not to have caught on.
Once in a while you find something like this Twitter citation, or the
Uncyclopedia article on Unicode, or the Unicode upside-down converter
on fileformat.info.  These don't cause any real harm, and people get
bored with them quickly.
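
To show how little is involved in automating this kind of styling, here is a minimal Python sketch. It is not the MathText tool mentioned above, whose internals are not described in this thread; `to_math_bold` is a hypothetical helper. It relies only on the fact that the Mathematical Bold capitals and small letters occupy two contiguous runs starting at U+1D400 and U+1D41A. Other styles such as script or fraktur have gaps where letters were already encoded in Letterlike Symbols, so a simple offset would not be enough for them.

```python
import unicodedata

BOLD_CAPITAL_A = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_SMALL_A = 0x1D41A    # MATHEMATICAL BOLD SMALL A


def to_math_bold(text: str) -> str:
    """Map ASCII letters to Mathematical Bold; leave everything else alone."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_CAPITAL_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_SMALL_A + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


styled = to_math_bold("time")
print([f"U+{ord(ch):04X}" for ch in styled])

# NFKC normalization undoes the styling again.
print(unicodedata.normalize("NFKC", styled))
```

Because the styled letters carry compatibility mappings to their plain counterparts, any system that normalizes to NFKC will silently turn the text back into ordinary letters.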

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s ­






RE: OpenType update for Unicode 5.2/6.0?

2010-10-12 Thread Peter Constable
We are in the process of updating the tags to sync with Unicode 6.0. This has 
to be coordinated with the ISO Open Font Format standard, so may take a little 
time.


Peter

From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of John H. Jenkins
Sent: Monday, October 11, 2010 9:32 AM
To: Unicode ML
Subject: Re: OpenType update for Unicode 5.2/6.0?

You might start with at http://www.microsoft.com/typography/otspec/otlist.htm.

On Oct 11, 2010, at 5:11 AM, Saqqara wrote:


Given that OpenType is the de-facto standard for fonts, it is disappointing to 
see the 'Script tag' list for OpenType has not been updated in almost three 
years. I'm a patient person but the lack of inclusion of new scripts in Unicode 
5.2 a year after the fact seems like carelessness. I've elaborated a little 
further on my jtotobsc blog, see 
http://jtotobsc.blogspot.com/2010/10/isounicode-scripts-missing-in-opentype.html.

My particular interest being ㌃ェ㏏㏯、㏪ㆎㅓ㊖ (mdt-kmt, the Egyptian language in 
hieroglyphs).

Any ideas who needs to be prodded to make an update happen? It would also be 
very useful if HTML5/WOFF could spec Unicode 6.0 or later as a step towards a 
multiscript web.

Bob Richmond





=
Siôn ap-Rhisiart
John H. Jenkins
jenk...@apple.com





Irrational numeric values in TUS

2010-10-12 Thread karl williamson
The Unicode standard only gives numeric values to rational numbers.  Is 
the reason for this merely because of the difficulty of representing 
irrational ones?


In looking through the list of code points, I actually found only one 
case where a character totally unambiguously refers to a particular 
irrational number, and that is U+2107, EULER CONSTANT.  NamesList.txt 
says that U+03C0, GREEK SMALL LETTER PI is used for the ratio of a 
circle's circumference to its diameter, but it has other uses as well, 
and does not have the Math property.  The various mathematical pi
characters don't necessarily seem to mean this value either.  Things like the two 
characters that have Planck's constant in their names, even if the 
code points always meant that, have different values in different 
measurement systems, so couldn't be said to refer to particular numbers.


I'm curious if any thought was given to this, and what code points I'm 
missing in my analysis.





My take on the Unicode 6.0 release

2010-10-12 Thread Roozbeh Pournader
Here is the tailored announcement I wrote for the Persian computing 
community:


http://www.advogato.org/person/roozbeh/diary/163.html

Roozbeh



Re: Irrational numeric values in TUS

2010-10-12 Thread Kenneth Whistler
Karl Williamson asked:

 The Unicode standard only gives numeric values to rational numbers.  Is 
 the reason for this merely because of the difficulty of representing 
 irrational ones?

No. Primarily it is because the Unicode Standard is a *character*
encoding standard, and not a standard for numeric values for
various mathematical constants that some characters might be
used to represent.
 
 In looking through the list of code points, I actually found only one 
 case where a character totally unambiguously refers to a particular 
 irrational number, and that is U+2107, EULER CONSTANT.

Well, U+2107 is classified as an uppercase letter. It isn't
classed as a number -- and only the numbers are systematically
given Numeric_Value values in the UCD, and that only because such
information is routinely required for their text processing --
particularly for the digits.
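
The distinction is easy to see from the UCD data exposed by Python's standard `unicodedata` module. The following is only an illustrative sketch (the `describe` helper and the choice of sample characters are assumptions made for this example): digits and fractions carry a numeric value, while U+2107, U+03C0, and U+212F are letters with none.

```python
import unicodedata


def describe(ch: str):
    """Return (code point, General_Category, numeric value or None)."""
    return (f"U+{ord(ch):04X}",
            unicodedata.category(ch),
            unicodedata.numeric(ch, None))


# DIGIT FIVE, VULGAR FRACTION ONE HALF, VULGAR FRACTION ONE FIFTH,
# EULER CONSTANT, GREEK SMALL LETTER PI, SCRIPT SMALL E
samples = ["\u0035", "\u00BD", "\u2155", "\u2107", "\u03C0", "\u212F"]

for ch in samples:
    code, gc, value = describe(ch)
    print(f"{code}  {unicodedata.name(ch)}: gc={gc}, numeric={value}")
```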

I consider EULER CONSTANT an unfortunate misnomer from the
very, very early days of the Unicode Standard. If we had it to
do over, particularly given the later addition of all the
styled mathematical alphanumerics, I would have favored:

2107 [insert stylename here] CAPITAL E
  = Euler constant
  
Or something similar -- just to make the point clearer.

  NamesList.txt 
 says that U+03C0, GREEK SMALL LETTER PI is used for the ratio of a 
 circle's circumference to its diameter, but it has other uses as well, 
 and does not have the Math property.

Having the Math property basically has nothing to do with whether
a character is assigned a Numeric_Value or not.

 The various mathematical pi characters don't necessarily seem to
 mean this value either.  Things like the two 
 characters that have Planck's constant in their names, even if the 
 code points always meant that, have different values in different 
 measurement systems, so couldn't be said to refer to particular numbers.
 
 I'm curious if any thought was given to this, and what code points I'm 
 missing in my analysis.

U+1D452 MATHEMATICAL ITALIC SMALL E (or merely U+0065 LATIN
SMALL LETTER E), also used for Euler's number. See also U+2147.

For that matter, why stop with irrationals? There is
also U+1D456 MATHEMATICAL ITALIC SMALL I (or merely U+0069 LATIN
SMALL LETTER I), used for the imaginary number, square root
of -1. See also U+2148 and U+2149.

Basically, there is no end to how mathematicians may end up
assigning odder and more exotic kinds of numbers to various
symbols available in the standard. And I think how they do
so and exactly what those values mean is basically out of
scope of the Unicode Standard.

--Ken




Re: Irrational numeric values in TUS

2010-10-12 Thread Asmus Freytag

 Ken,

some comments, and a few suggestions near the end.


On 10/12/2010 4:56 PM, Kenneth Whistler wrote:

 Karl Williamson asked:

  The Unicode standard only gives numeric values to rational numbers.  Is
  the reason for this merely because of the difficulty of representing
  irrational ones?

 No. Primarily it is because the Unicode Standard is a *character*
 encoding standard, and not a standard for numeric values for
 various mathematical constants that some characters might be
 used to represent.

Correct.



 I consider EULER CONSTANT an unfortunate misnomer from the
 very, very early days of the Unicode Standard. If we had it to
 do over, particularly given the later addition of all the
 styled mathematical alphanumerics, I would have favored:

 2107 [insert stylename here] CAPITAL E
    = Euler constant

 Or something similar -- just to make the point clearer.

Actually, what you advocate here is what I consider the mistake that was
made with the WEIERSTRASS ELLIPTIC FUNCTION. The problem is that the
Letterlike Symbols were conflated with styled letters used as symbols.
They are not at all the same category. The Planck constant is a styled
letter used as a symbol, and is correctly unified with the italic h, but
the Planck constant / (2 * pi), or h-bar, is not a styled letter but a
symbol derived from a styled letter - a true letterlike symbol.


2107 and 2118 are one-off designs, not part of complete sets, same as 210F.

Because these characters came from not-well-understood legacy
collections, and because the styled letters used as symbols were
initially deemed inadmissible to Unicode as complete sets, these
distinctions weren't clear at the time.

  NamesList.txt
  says that U+03C0, GREEK SMALL LETTER PI is used for the ratio of a
  circle's circumference to its diameter, but it has other uses as well,
  and does not have the Math property.

 Having the Math property basically has nothing to do with whether
 a character is assigned a Numeric_Value or not.

Correct.

  The various mathematical pi characters don't necessarily seem to
  mean this value either.  Things like the two
  characters that have Planck's constant in their names, even if the
  code points always meant that, have different values in different
  measurement systems, so couldn't be said to refer to particular numbers.

  I'm curious if any thought was given to this, and what code points I'm
  missing in my analysis.

 U+1D452 MATHEMATICAL ITALIC SMALL E (or merely U+0065 LATIN
 SMALL LETTER E), also used for Euler's number. See also U+2147.


Now you are confusing Euler's constant - also depicted with U+03B3 GREEK 
SMALL LETTER GAMMA, with the natural exponent. That kind of confusion is 
really not helpful and is what drives people like Karl to ask for 
numeric property values in the first place - to unambiguously define 
what these symbols were encoded for.


The proper place to document that, without introducing a formal 
property, is with additional nameslist annotation for a few characters.


I suggest that you add the correct value for Euler's constant as a 
comment and cross-reference that character to 03B3


0.57721 56649 01532 86060 65120 90082 40243 10421 59335 93992

should be approximate enough...?

At the same time you could add a comment e ≈ 2.718 for 212F - Again, not 
to document the value, but to make clear, beyond the character name, 
what constant the alias for 212F denotes.



 For that matter, why stop with irrationals? There is
 also U+1D456 MATHEMATICAL ITALIC SMALL I (or merely U+0069 LATIN
 SMALL LETTER I), used for the imaginary number, square root
 of -1. See also U+2148 and U+2149.

 Basically, there is no end to how mathematicians may end up
 assigning odder and more exotic kinds of numbers to various
 symbols available in the standard. And I think how they do
 so and exactly what those values mean is basically out of
 scope of the Unicode Standard.

Correct - it's not Unicode's role to make the assignment, but common
usage can and should be documented informally - that's no different from
documenting modifier letters with detailed linguistic usage.


A./




Re: Irrational numeric values in TUS

2010-10-12 Thread Kenneth Whistler
Asmus,

  I'm curious if any thought was given to this, and what code points I'm
  missing in my analysis.
  U+1D452 MATHEMATICAL ITALIC SMALL E (or merely U+0065 LATIN
  SMALL LETTER E), also used for Euler's number. See also U+2147.
 
 Now you are confusing Euler's constant - also depicted with U+03B3 GREEK 
 SMALL LETTER GAMMA, with the natural exponent.

Actually I'm not confusing the two -- which is why I wrote
"Euler's number", not "Euler's constant". Perhaps I misplaced
"also" in the sentence, but I was referring here to 2.718...,
not to 0.57721...
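
For readers who want the two values side by side, here is a small Python sketch. It is only an illustration, not anything proposed for the standard: `euler_mascheroni` is a hypothetical helper that approximates Euler's constant from its textbook definition as the limit of H_n - ln(n), where H_n is the n-th harmonic number, with the usual 1/(2n) correction term so that it converges quickly. Euler's number is taken directly from `math.e`.

```python
import math


def euler_mascheroni(n: int = 100_000) -> float:
    """Approximate Euler's constant (gamma) as H_n - ln(n) - 1/(2n)."""
    harmonic = math.fsum(1.0 / k for k in range(1, n + 1))
    return harmonic - math.log(n) - 1.0 / (2 * n)


gamma = euler_mascheroni()
print(f"Euler's constant (gamma): {gamma:.9f}")
print(f"Euler's number (e):       {math.e:.9f}")
```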

 That kind of confusion is 
 really not helpful 

Hehe. Well, it wasn't me, but mathematicians who took to calling
these things "Euler's number" and "Euler's constant" confusingly.
Check the wikis. ;-)

 and is what drives people like Karl to ask for 
 numeric property values in the first place - to unambiguously define 
 what these symbols were encoded for.
 
 The proper place to document that, without introducing a formal 
 property, is with additional nameslist annotation for a few characters.

I disagree. Because that just further cements the notion that
these characters *are* the constants. We keep going around on
this, both about mathematical values and about confusion of
characters with units of SI, as well.

 I suggest that you add the correct value for Euler's constant as a 
 comment and cross-reference that character to 03B3
 
 0.57721 56649 01532 86060 65120 90082 40243 10421 59335 93992
 
 should be approximate enough...?
 
 At the same time you could add a comment e ≈ 2.718 for 212F - Again, not 
 to document the value, but to make clear, beyond the character name, 
 what constant the alias for 212F denotes.

Nah, I don't think those are helpful here.

Maybe the UTC would disagree with me. ;-)

--Ken






Re: OpenType update for Unicode 5.2/6.0?

2010-10-12 Thread Ngwe Tun
Dear Peter Constable,

This might be off-topic, but when will Microsoft fix the MLang bugs for Myanmar?
http://blogs.msdn.com/b/michkap/archive/2008/04/18/8403631.aspx

As Burmese font developers, we are making fonts without a Microsoft OpenType
font specification for Myanmar. Can we have a specification for OpenType in
Unicode 6.0? Another disappointing case is that Character Map in Windows 7
has not yet been updated for the Myanmar changes in Unicode 5.1.

Best

Ngwe Tun.





-- 
Starting today, please contact me only at norris...@awwonline.biz.
ngwes...@gmail.com will be closed soon.


RE: OpenType update for Unicode 5.2/6.0?

2010-10-12 Thread Peter Constable
I can’t comment on when limitations in MLang will be addressed; I can only say 
that we are aware of them.

Can you clarify what you think is missing from Character Map?



Peter
