Re: statistics
On 10/11/2010 9:49 PM, Janusz S. Bień wrote: On Mon, 11 Oct 2010 announceme...@unicode.org wrote: The newly finalized Unicode Version 6.0 adds 2,088 characters, What is the current total? Are other statistic informations available somewhere? The announcement gives a link to click through. There you will find more statistics. A./ Best regards JSB
Re: statistics
On Mon, 11 Oct 2010 Asmus Freytag asm...@ix.netcom.com wrote: On 10/11/2010 9:49 PM, Janusz S. Bień wrote: On Mon, 11 Oct 2010 announceme...@unicode.org wrote: The newly finalized Unicode Version 6.0 adds 2,088 characters, What is the current total? Are other statistic informations available somewhere? The announcement gives a link to click through. There you will find more statistics. I guess you mean Character Assignment Overview at http://www.unicode.org/versions/Unicode6.0.0/ However it does not provide the precise answer to my primary question, which is not purely arithmetic but depends on the definition of the character. In particular, do noncharacters belong to characters? Regards JSB -- dr hab. Janusz S. Bien, prof. UW - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej) Prof. Janusz S. Bien - Warsaw University (Department of Formal Linguistics) jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/
Re: statistics
2010/10/12 Janusz S. Bień jsb...@mimuw.edu.pl: The newly finalized Unicode Version 6.0 adds 2,088 characters, What is the current total? Are other statistic informations available somewhere? However it does not provide the precise answer to my primary question, which is not purely arithmetic but depends on the definition of the character. In particular, do noncharacters belong to characters? The Wikipedia article on Unicode gives the current total, and explains what the various categories of characters are: http://en.wikipedia.org/wiki/Unicode I give a detailed breakdown of character statistics by Unicode version (from 1.0.0 to 6.0) at: http://babelstone.blogspot.com/2005/11/how-many-unicode-characters-are-there.html Andrew
FW: statistics
FW to Unicode ml From: ernestvandenbooga...@hotmail.com To: jsb...@mimuw.edu.pl Subject: RE: statistics Date: Tue, 12 Oct 2010 10:13:17 +0200 In 5.2, Chapter 2.4 table 2-3 is listed which General Categories are characters. Out are: Surrogates, Private Use, Non-characters and Reserved code points. Note that Format characters (Cf) are included as characters. The code points with formatting aspects in C0 and C1 are Controls (Cc), so excluded. Total number of characters in 6.0 is 109,242+142=109,384. Regards, Ernest van den Boogaard From: jsb...@mimuw.edu.pl To: asm...@ix.netcom.com CC: unicode@unicode.org Subject: Re: statistics Date: Tue, 12 Oct 2010 09:14:21 +0200
Creative people on Twitter
Since Twitter offers only plain text, a current trend seems to be writing love as ℒℴѵℯ (U+2112, U+2134, U+0475, U+212F) to get a sort of handwritten look. Creative, that's for sure. -- Jeroen Ruigrok van der Werven asmodai(-at-)in-nomine.org / asmodai イェルーン ラウフロック ヴァン デル ウェルヴェン http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Time is a twofold teacher, harsh and yet patient like no-one...
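A short Python sketch (standard-library `unicodedata` only) of what those four code points are. It also shows why the result is not just styled Latin: the three letterlike symbols carry `<font>` compatibility decompositions, so NFKC normalization folds them to plain L, o and e, but U+0475 is a Cyrillic letter with no decomposition, so it survives unchanged.

```python
import unicodedata

# The four code points quoted in the post.
styled = "\u2112\u2134\u0475\u212f"

for ch in styled:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# NFKC applies compatibility decompositions. U+2112, U+2134 and U+212F have
# <font> decompositions to L, o and e; U+0475 has none, so it is kept as-is.
folded = unicodedata.normalize("NFKC", styled)
print(folded == "Love")  # False: the third letter is still Cyrillic izhitsa
```

So any software that folds such text for matching would see "Lo" + izhitsa + "e", not the ASCII word.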
Re: statistics
Ernest van den Boogaard wrote: In 5.2, Chapter 2.4 table 2-3 is listed which General Categories are characters. Out are: Surrogates, Private Use, Non-characters and Reserved code points. Note that Format characters (Cf) are included as characters. The code points with formatting aspects in C0 and C1 are Controls (Cc), so excluded. I don't understand why any control characters would be excluded from a count of characters. -- Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s
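Ernest's sum and Doug's objection differ only in which General Categories are counted. A minimal Python sketch, taking the Unicode 6.0 graphic and format totals from Ernest's message and computing the control and noncharacter counts from their fixed code point ranges (the constant and function names are illustrative):

```python
import unicodedata

# Unicode 6.0 totals as quoted in Ernest's message: graphic and format (Cf).
GRAPHIC_6_0 = 109_242
FORMAT_6_0 = 142


def count_controls() -> int:
    """General_Category Cc: U+0000..U+001F and U+007F..U+009F (a fixed set)."""
    return len(range(0x0000, 0x0020)) + len(range(0x007F, 0x00A0))


def count_noncharacters() -> int:
    """U+FDD0..U+FDEF plus U+nFFFE and U+nFFFF in each of the 17 planes."""
    return len(range(0xFDD0, 0xFDF0)) + 2 * 17


# Cross-check the control count against the UCD bundled with Python. Python
# ships a later Unicode version than 6.0, but the set of Cc code points is
# frozen by the stability policy, so the count is the same.
cc_from_ucd = sum(
    1 for cp in range(0x110000) if unicodedata.category(chr(cp)) == "Cc"
)

graphic_and_format = GRAPHIC_6_0 + FORMAT_6_0            # Ernest's definition
with_controls = graphic_and_format + count_controls()    # Doug's objection

print(f"graphic + format:            {graphic_and_format:,}")
print(f"graphic + format + control:  {with_controls:,}")
print(f"noncharacters (not counted): {count_noncharacters()}")
```

Neither total includes the 66 noncharacters, which, like surrogates, private-use and reserved code points, fall outside Ernest's definition; that is the part of Janusz's question that depends on definition rather than arithmetic.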
Re: Creative people on Twitter
I guess it’s only a matter of 𝐭𝐢𝐦𝐞 before people start doing things like 𝖙𝖍𝖎𝖘 (notice this email is plain-text). -- Leonardo Boiko
James Kass and Code2000 font
I am used to relying on fonts from James Kass to display new Unicode characters, but his fonts have not been updated for Unicode 5.2 yet, and he has not contributed to this list for some time. I have e-mailed him, but he has not replied, which is not usual for James. Does anyone know what has happened to James? Incidentally, many of the new symbols in Unicode 6 are available in the Symbola font from George Douros, and they can be seen in Firefox: http://users.teilar.gr/~g1951d/ Regards Alan Wood http://www.alanwood.net (Unicode, special characters, pesticide names)
Re: Creative people on Twitter
On Tue, Oct 12, 2010 at 7:53 AM, Leonardo Boiko leobo...@gmail.com wrote: I guess it’s only a matter of 𝐭𝐢𝐦𝐞 before people start doing things like 𝖙𝖍𝖎𝖘 (notice this email is plain-text). Not that soon on Twitter, as Twitter apparently runs a filter and cuts off all characters above U+ a couple weeks after posting. -- Where there is life, there is hope.
Re: Creative people on Twitter
Leonardo Boiko leoboiko at gmail dot com wrote: I guess it’s only a matter of 𝐭𝐢𝐦𝐞 before people start doing things like 𝖙𝖍𝖎𝖘 (notice this email is plain-text). I assumed this would become a big fad, back when I wrote my MathText tool to automate the process, but it turns out not to have caught on. Once in a while you find something like this Twitter citation, or the Uncyclopedia article on Unicode, or the Unicode upside-down converter on fileformat.info. These don't cause any real harm, and people get bored with them quickly. -- Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s
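The kind of mapping a tool like MathText automates can be sketched in a few lines of Python. This is a hypothetical illustration, not Doug's code: the `restyle` helper and the two base constants are names introduced here, and it only handles alphabets whose lowercase letters are contiguous in the Mathematical Alphanumeric Symbols block.

```python
import unicodedata

# First lowercase letter of two Mathematical Alphanumeric Symbols alphabets.
MATH_BOLD_SMALL_A = 0x1D41A
MATH_BOLD_FRAKTUR_SMALL_A = 0x1D586

# Bold and bold fraktur lowercase a-z are contiguous with no holes. Some other
# alphabets are not: italic small h is U+210E PLANCK CONSTANT and script small
# e is U+212F, so a plain offset would produce reserved code points for them.


def restyle(text: str, base: int) -> str:
    """Map ASCII a-z to the alphabet starting at `base`; keep other chars."""
    return "".join(
        chr(base + ord(c) - ord("a")) if "a" <= c <= "z" else c for c in text
    )


bold = restyle("time", MATH_BOLD_SMALL_A)
fraktur = restyle("this", MATH_BOLD_FRAKTUR_SMALL_A)
print(bold, fraktur)
print(unicodedata.name(bold[0]))
# NFKC undoes the styling, since these characters have <font> decompositions.
print(unicodedata.normalize("NFKC", bold + " " + fraktur))
```

All of these characters lie outside the BMP, which is why a filter or converter that only handles 16-bit code units can damage them.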
RE: OpenType update for Unicode 5.2/6.0?
We are in the process of updating the tags to sync with Unicode 6.0. This has to be coordinated with the ISO Open Font Format standard, so may take a little time. Peter From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf Of John H. Jenkins Sent: Monday, October 11, 2010 9:32 AM To: Unicode ML Subject: Re: OpenType update for Unicode 5.2/6.0? You might start with at http://www.microsoft.com/typography/otspec/otlist.htm. On Oct 11, 2010, at 5:11 AM, Saqqara wrote: Given that OpenType is the de facto standard for fonts, it is disappointing to see that the 'Script tag' list for OpenType has not been updated in almost three years. I'm a patient person, but the lack of inclusion of new scripts in Unicode 5.2 a year after the fact seems like carelessness. I've elaborated a little further on my jtotobsc blog, see http://jtotobsc.blogspot.com/2010/10/isounicode-scripts-missing-in-opentype.html. My particular interest is 𓌃𓂧𓏏𓏯𓀁𓏪𓆎𓅓𓊖 (mdt-kmt, the Egyptian language in hieroglyphs). Any ideas who needs to be prodded to make an update happen? It would also be very useful if HTML5/WOFF could spec Unicode 6.0 or later as a step towards a multiscript web. Bob Richmond = Siôn ap-Rhisiart John H. Jenkins jenk...@apple.com
Irrational numeric values in TUS
The Unicode standard only gives numeric values to rational numbers. Is the reason for this merely because of the difficulty of representing irrational ones? In looking through the list of code points, I actually found only one case where a character totally unambiguously refers to a particular irrational number, and that is U+2107, EULER CONSTANT. NamesList.txt says that U+03C0, GREEK SMALL LETTER PI is used for the ratio of a circle's circumference to its diameter, but it has other uses as well, and does not have the Math property. The various Math PI's don't seem that they necessarily mean this value either. Things like the two characters that have Planck's constant in their names, even if the code points always meant that, have different values in different measurement systems, so couldn't be said to refer to particular numbers. I'm curious if any thought was given to this, and what code points I'm missing in my analysis.
My take on the Unicode 6.0 release
Here is the tailored announcement I wrote for the Persian computing community: http://www.advogato.org/person/roozbeh/diary/163.html Roozbeh
Re: Irrational numeric values in TUS
Karl Williamson asked: The Unicode standard only gives numeric values to rational numbers. Is the reason for this merely because of the difficulty of representing irrational ones? No. Primarily it is because the Unicode Standard is a *character* encoding standard, and not a standard for numeric values for various mathematical constants that some characters might be used to represent. In looking through the list of code points, I actually found only one case where a character totally unambiguously refers to a particular irrational number, and that is U+2107, EULER CONSTANT. Well, U+2107 is classified as an uppercase letter. It isn't classed as a number -- and only the numbers are systematically given Numeric_Value values in the UCD, and that only because such information is routinely required for their text processing -- particularly for the digits. I consider EULER CONSTANT an unfortunate misnomer from the very, very early days of the Unicode Standard. If we had it to do over, particularly given the later addition of all the styled mathematical alphanumerics, I would have favored: 2107 [insert stylename here] CAPITAL E = Euler constant Or something similar -- just to make the point clearer. NamesList.txt says that U+03C0, GREEK SMALL LETTER PI is used for the ratio of a circle's circumference to its diameter, but it has other uses as well, and does not have the Math property. Having the Math property basically has nothing to do with whether a character is assigned a Numeric_Value or not. The various Math PI's don't seem that they necessarily mean this value either. Things like the two characters that have Planck's constant in their names, even if the code points always meant that, have different values in different measurement systems, so couldn't be said to refer to particular numbers. I'm curious if any thought was given to this, and what code points I'm missing in my analysis. 
U+1D452 MATHEMATICAL ITALIC SMALL E (or merely U+0065 LATIN SMALL LETTER E), also used for Euler's number. See also U+2147. For that matter, why stop with irrationals? There is also U+1D456 MATHEMATICAL ITALIC SMALL I (or merely U+0069 LATIN SMALL LETTER I), used for the imaginary number, square root of -1. See also U+2148 and U+2149. Basically, there is no end to how mathematicians may end up assigning odder and more exotic kinds of numbers to various symbols available in the standard. And I think how they do so and exactly what those values mean is basically out of scope of the Unicode Standard. --Ken
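Ken's point that Numeric_Value follows the General Category rather than mathematical usage can be checked with Python's standard `unicodedata` module. A minimal sketch; `numeric_info` is a hypothetical helper, and Python bundles a newer UCD than 6.0, though these particular properties should be the same there.

```python
import unicodedata
from fractions import Fraction


def numeric_info(ch: str):
    """Return (General_Category, Numeric_Value as a Fraction, or None)."""
    value = unicodedata.numeric(ch, None)  # None when there is no Numeric_Value
    # unicodedata returns a float; UCD numeric values are rational, so bounding
    # the denominator recovers the exact fraction for simple values like these.
    exact = None if value is None else Fraction(value).limit_denominator(1000)
    return unicodedata.category(ch), exact


# U+2107, U+03C0, U+2147, U+1D452 from the thread; two vulgar fractions for contrast.
for ch in ["\u2107", "\u03c0", "\u2147", "\U0001D452", "\u00bd", "\u2153"]:
    gc, value = numeric_info(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: gc={gc}, Numeric_Value={value}")
```

EULER CONSTANT comes back as an uppercase letter (Lu) with no numeric value, exactly like the letters used for e and pi, while the fractions (gc=No) carry rational values.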
Re: Irrational numeric values in TUS
Ken, some comments, and a few suggestions near the end. On 10/12/2010 4:56 PM, Kenneth Whistler wrote: Karl Williamson asked: The Unicode standard only gives numeric values to rational numbers. Is the reason for this merely because of the difficulty of representing irrational ones? No. Primarily it is because the Unicode Standard is a *character* encoding standard, and not a standard for numeric values for various mathematical constants that some characters might be used to represent. Correct. I consider EULER CONSTANT an unfortunate misnomer from the very, very early days of the Unicode Standard. If we had it to do over, particularly given the later addition of all the styled mathematical alphanumerics, I would have favored: 2107 [insert stylename here] CAPITAL E = Euler constant Or something similar -- just to make the point clearer. Actually, what you advocate here is what I consider the mistake that was made with the WEIERSTRASS ELLIPTIC FUNCTION. The problem is that the Letterlike Symbols were conflated with styled letters used as symbols. They are not at all the same category. The Planck constant is a styled letter used as symbol, and is correctly unified with the italic h, but the Planck constant / (2 * pi), or h-bar, is not a styled letter but a symbol derived from a styled letter - a true letterlike symbol. 2107 and 2118 are one-off designs, not part of complete sets, same as 210F. Because these characters came from not-well-understood legacy collections, and because the styled letters used as symbols were initially deemed inadmissible to Unicode as complete sets, these distinctions weren't clear at the time. NamesList.txt says that U+03C0, GREEK SMALL LETTER PI is used for the ratio of a circle's circumference to its diameter, but it has other uses as well, and does not have the Math property. Having the Math property basically has nothing to do with whether a character is assigned a Numeric_Value or not. Correct.
The various Math PI's don't seem that they necessarily mean this value either. Things like the two characters that have Planck's constant in their names, even if the code points always meant that, have different values in different measurement systems, so couldn't be said to refer to particular numbers. I'm curious if any thought was given to this, and what code points I'm missing in my analysis. U+1D452 MATHEMATICAL ITALIC SMALL E (or merely U+0065 LATIN SMALL LETTER E), also used for Euler's number. See also U+2147. Now you are confusing Euler's constant - also depicted with U+03B3 GREEK SMALL LETTER GAMMA, with the natural exponent. That kind of confusion is really not helpful and is what drives people like Karl to ask for numeric property values in the first place - to unambiguously define what these symbols were encoded for. The proper place to document that, without introducing a formal property, is with additional nameslist annotation for a few characters. I suggest that you add the correct value for Euler's constant as a comment and cross-reference that character to 03B3: 0.57721 56649 01532 86060 65120 90082 40243 10421 59335 93992 should be approximate enough...? At the same time you could add a comment e ≈ 2.718 for 212F - Again, not to document the value, but to make clear, beyond the character name, what constant the alias for 212F denotes. For that matter, why stop with irrationals? There is also U+1D456 MATHEMATICAL ITALIC SMALL I (or merely U+0069 LATIN SMALL LETTER I), used for the imaginary number, square root of -1. See also U+2148 and U+2149. Basically, there is no end to how mathematicians may end up assigning odder and more exotic kinds of numbers to various symbols available in the standard. And I think how they do so and exactly what those values mean is basically out of scope of the Unicode Standard.
Correct - it's not Unicode's role to make the assignment, but common usage can and should be documented informally - that's no different to documenting modifier letters with detailed linguistic usage. A./
Re: Irrational numeric values in TUS
Asmus, I'm curious if any thought was given to this, and what code points I'm missing in my analysis. U+1D452 MATHEMATICAL ITALIC SMALL E (or merely U+0065 LATIN SMALL LETTER E), also used for Euler's number. See also U+2147. Now you are confusing Euler's constant - also depicted with U+03B3 GREEK SMALL LETTER GAMMA, with the natural exponent. Actually I'm not confusing the two -- which is why I wrote Euler's number, not Euler's constant. Perhaps I misplaced "also" in the sentence, but I was referring here to 2.718... not to 0.57721... That kind of confusion is really not helpful Hehe. Well, it wasn't me, but mathematicians who took to calling these things Euler's number and Euler's constant confusingly. Check the wikis. ;-) and is what drives people like Karl to ask for numeric property values in the first place - to unambiguously define what these symbols were encoded for. The proper place to document that, without introducing a formal property, is with additional nameslist annotation for a few characters. I disagree. Because that just further cements the notion that these characters *are* the constants. We keep going around on this, both about mathematical values and about confusion of characters with units of SI, as well. I suggest that you add the correct value for Euler's constant as a comment and cross-reference that character to 03B3: 0.57721 56649 01532 86060 65120 90082 40243 10421 59335 93992 should be approximate enough...? At the same time you could add a comment e ≈ 2.718 for 212F - Again, not to document the value, but to make clear, beyond the character name, what constant the alias for 212F denotes. Nah, I don't think those are helpful here. Maybe the UTC would disagree with me. ;-) --Ken
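Since the exchange turns on two different constants that both carry Euler's name, a small Python sketch computing each from its textbook definition may help keep them apart: Euler's number e (the 2.718... Ken meant with U+1D452 and U+2147) and the Euler-Mascheroni constant gamma (the 0.57721... Asmus proposed annotating with U+03B3). The function names are illustrative only.

```python
import math


def euler_number(terms: int = 20) -> float:
    """Euler's number e = sum over k >= 0 of 1/k! (2.71828...)."""
    return math.fsum(1 / math.factorial(k) for k in range(terms))


def euler_mascheroni(n: int = 1000) -> float:
    """Euler's constant gamma = lim (H_n - ln n) as n -> infinity (0.57721...).

    The correction terms -1/(2n) + 1/(12n^2) come from the asymptotic
    expansion of the harmonic number H_n, so a modest n is already accurate.
    """
    harmonic = math.fsum(1 / k for k in range(1, n + 1))
    return harmonic - math.log(n) - 1 / (2 * n) + 1 / (12 * n * n)


print(f"e     ~ {euler_number():.10f}")
print(f"gamma ~ {euler_mascheroni():.10f}")
```

The leading digits of the computed gamma agree with the value Asmus quoted.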
Re: OpenType update for Unicode 5.2/6.0?
Dear Peter Constable, this might be off-topic, but when will Microsoft fix the MLang bugs for Myanmar? http://blogs.msdn.com/b/michkap/archive/2008/04/18/8403631.aspx As Burmese font developers, we are making fonts without a Microsoft OpenType font specification for Myanmar. Can we have an OpenType specification for Unicode 6.0? Another disappointing case is that Character Map in Windows 7 has not yet been updated for the Myanmar changes in Unicode 5.1. Best, Ngwe Tun. On Wed, Oct 13, 2010 at 3:39 AM, Peter Constable peter...@microsoft.com wrote: We are in the process of updating the tags to sync with Unicode 6.0. This has to be coordinated with the ISO Open Font Format standard, so may take a little time. Peter -- From today on, please contact me only at norris...@awwonline.biz. ngwes...@gmail.com will be closed soon.
RE: OpenType update for Unicode 5.2/6.0?
I can’t comment on when limitations in MLang will be addressed; I can only say that we are aware of them. Can you clarify what you think is missing from Character Map? Peter From: Ngwe Tun [mailto:ngwes...@gmail.com] Sent: Tuesday, October 12, 2010 7:39 PM To: Peter Constable Cc: John H. Jenkins; Unicode ML Subject: Re: OpenType update for Unicode 5.2/6.0?