Re: Akkha script (used by Eastern Magar language) in ISO 15924?
> On Jul 23, 2019, at 12:26 AM, Richard Wordingham via Unicode > wrote: > > On Mon, 22 Jul 2019 17:42:37 -0700 > Anshuman Pandey via Unicode wrote: > >> As I pointed out in L2/11-144, the “Magar Akkha” script is an >> appropriation of Brahmi, renamed to link it to the primordialist >> daydreams of an ethno-linguistic community in Nepal. I have never >> seen actual usage of the script by Magars. If things have changed >> since 2011, I would very much welcome such information. Otherwise, >> the so-called “Magar Akkha” is not suitable for encoding. The Brahmi >> encoding that we have should suffice. > > How would mere usage qualify it as a separate script? I apologize for using the wrong conjunction. Instead of “otherwise” I should have written “nevertheless”. All my best, Anshu
Re: Akkha script (used by Eastern Magar language) in ISO 15924?
As I pointed out in L2/11-144, the “Magar Akkha” script is an appropriation of Brahmi, renamed to link it to the primordialist daydreams of an ethno-linguistic community in Nepal. I have never seen actual usage of the script by Magars. If things have changed since 2011, I would very much welcome such information. Otherwise, the so-called “Magar Akkha” is not suitable for encoding. The Brahmi encoding that we have should suffice. All my best, Anshu > On Jul 22, 2019, at 10:06 AM, Lorna Evans via Unicode > wrote: > > Also: https://scriptsource.org/scr/Qabl > > >> On Mon, Jul 22, 2019, 12:47 PM Ken Whistler via Unicode >> wrote: >> See the entry for "Magar Akkha" on: >> >> http://linguistics.berkeley.edu/sei/scripts-not-encoded.html >> >> Anshuman Pandey did preliminary research on this in 2011. >> >> http://www.unicode.org/L2/L2011/11144-magar-akkha.pdf >> >> It would be premature to assign an ISO 15924 script code, pending the >> research to determine whether this script should be separately encoded. >> >> --Ken >> >>> On 7/22/2019 9:16 AM, Philippe Verdy via Unicode wrote: >>> According to Ethnolog, the Eastern Magar language (mgp) is written in two >>> scripts: Devanagari and "Akkha". >>> >>> But the "Akkha" script does not seem to have any ISO 15924 code. >>> >>> The Ethnologue currently assigns a private use code (Qabl) for this script. >>> >>> Was the addition delayed due to lack of evidence (even if this language is >>> official in Nepal and India) ? >>> >>> Did the editors of Ethnologue submit an addition request for that script >>> (e.g. for the code "Akkh" or "Akha" ?) >>> >>> Or is it considered unified with another script that could explain why it >>> is not coded ? If this is a variant it could have its own code (like >>> Nastaliq in Arabic). Or may be this is just a subset of another >>> (Sino-Tibetan) script ? >>> >>> >>>
Fwd: L2/18-181
> On May 16, 2018, at 3:46 PM, Doug Ewell via Unicode > wrote: > > http://www.unicode.org/L2/L2018/18181-n4947-assamese.pdf > > This is a fascinating proposal to disunify the Assamese script from > Bengali on the following bases: ‘Fascinating’ is not a term I’d use for this proposal. If folks are interested in a valid proposal for disunification of Bengali, please look at the proposal for Tirhuta. > 1. The identity of Assamese as a script distinct from Bengali is in > jeopardy. This is not a technical matter. Moreover, it is typical rhetoric used by various language communities in South Asia. Fairly standard fare for those familiar with such issues. The proposal needs to show how the two scripts differ, e.g. conjuncts, CV ligatures, etc. The number forms are similar to those already encoded. Again, cf. Tirhuta. > 2. Collation is different between the Assamese and Bengali languages, > and code point order should reflect collation order. The same issue applies to dictionary order for Hindi and Marathi, which differ from the conventional Sanskrit order for Devanagari. Orthographies for various languages put conjuncts and other things that are not considered atomic letters at the end. Nothing special in this regard for Assamese and Bengali. > 3. Keyboard design is more difficult because consonants like ক্ষ > are encoded as conjunct forms instead of atomic characters. Ignorant question on my part: is it difficult to use character sequences as labels for keys? I see keys for both क्ष and ज्ञ on the iOS Hindi keyboard, and त्र is tucked away under त. > 4. The use of a single encoded script to write two languages forces > users to use language identifiers to identify the language. Same applies to each of the 40+ varieties of Hindi, as well as Marathi, etc. Another ignorant question: how does one identify the various languages that use Arabic and Cyrillic? > 5. Transliteration of Assamese into a different script is problematic > because letters have different phonological value in Assamese and > Bengali. Transliteration or transcription? In any case, this applies to other languages written using similar scripts: a Marathi speaker pronounces ज and ऋ differently than a Hindi speaker does. > It will be interesting to see where this proposal goes. Hopefully, it does not go too far. What it proposes is contrary to Unicode principles and redundant. > Given that all > or most of these issues can be claimed for English, French, German, > Spanish, and hundreds of other languages written in the Latin script, if > the Assamese proposal is approved we can expect similar disunification > of the Latin script into language-specific alphabets in the future. Fascinating. I mean, terrible. All my best, Anshuman
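On the keyboard point: each conjunct mentioned in this message is stored as a short sequence of consonant + virama + consonant, so a key labelled with the conjunct can simply insert that whole sequence. The minimal Python sketch below uses only the standard `unicodedata` module to list the code points behind those labels; `code_points` is an illustrative helper, not part of any library or keyboard API.

```python
import unicodedata
from typing import List


def code_points(seq: str) -> List[str]:
    """Return 'U+XXXX NAME' for each character in seq."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


# Key labels discussed in the message: Bengali KSSA and the
# Devanagari conjuncts KSSA, JNYA and TRA.
for label in ["ক্ষ", "क्ष", "ज्ञ", "त्र"]:
    print(label, code_points(label))
```

Each label comes out as three code points, which is why a key that emits a sequence, rather than an atomic character, is sufficient for input.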
Re: 0027, 02BC, 2019, or a new character?
> On Feb 20, 2018, at 9:49 PM, James Kass via Unicode > wrote: > > Michael Everson wrote: > >> Orthographic harmonization between these languages can ONLY help any >> speaker of one to access information in any of the others. That expands >> people’s worlds. That would be a good goal. > > Wouldn't dream of arguing with that. Expanding people's worlds is why > many of us have supported Unicode. Agreed! > The good news is that the thread title question is moot. Yes, now let’s please return to discussing emoji. All my best, Anshu
End of discussion, please — Re: Why so much emoji nonsense?
> On Feb 15, 2018, at 10:58 PM, Pierpaolo Bernardi via Unicode > wrote: > > On Fri, Feb 16, 2018 at 4:26 AM, James Kass via Unicode > wrote: > >> The best time to argue against the addition of emoji to Unicode would be >> 2007 or 2008, but you'd be wasting your time travel. Trust me. > > But it's always a good time to argue against the addition of more > nonsense to what we already have got. I think it’s a good time to end this conversation. Whether ‘nonsense’ or not, emoji are here and they’re in Unicode. This conversation has itself become nonsense, d’y’all agree? The amount of time that people have spent on this discussion could’ve been directed towards work on any one of the unencoded scripts listed at: http://www.linguistics.berkeley.edu/sei/scripts-not-encoded.html As many have noted during this discussion, the emoji “ship has already sailed”. I’d’ve jumped aboard sooner, but this metaphor is now also quite tired. 😴 All my best, Anshu
Re: Emoji for major planets at least?
Proposals for planet emoji were submitted in April 2017: https://www.unicode.org/L2/L2017/17100-planet-emoji-seq.pdf http://www.unicode.org/L2/L2017/17100r-planet-emoji-seq.pdf I’m not sure what the result was. Anshu > On Jan 18, 2018, at 12:46 PM, Asmus Freytag (c) via Unicode > wrote: > >> On 1/18/2018 10:01 AM, John H. Jenkins wrote: >> Well, you can go with Venus = white planet, Mercury = grey planet, Uranus = >> greenish planet, Neptune = bluish planet, Jupiter = striped planet. >> >> As you say, though, without a context, none of them convey much and Venus, >> at least, would just be a circle. >> >> Plus there's the question of the context in which someone would want to send >> little pictures of the planets. This sounds like it would be adding emoji >> just because. > > "Earth" as in "a blue ball in space" is something that reached iconic status > after the famous photo taken during the early Apollo missions. I could > definitely see that used in a variety of possible contexts. And the > recognition value is higher than for many recent emoji. > > Saturn, with its rings (even though it's no longer the only one known with > rings) also is iconic and highly recognizable. I lack imagination as to when > someone would want to use it in communication, but I have the same issue with > quite a few recent emoji, some of which are far less iconic or recognizable. > I think it does lend itself to describe a "non-earth" type planet, or even > the generic idea of a planet (as opposed to a star/sun). > > Mars and Venus have tons of connotations, which could be expressed by using > an emoji (as opposed to the astrological symbol for each), but only Mars is > reasonably recognizable without lots of pre-established context. That red > color. 
> > In a detailed enough rendering, Jupiter, as a shaded "ball" with stripes and > red dot would more recognizable than any of the remaining planets (on par or > better with many recent emoji), but I see even less scope for using it > metaphorically or in extended contexts. > > If someone were to make a proposal, I would suggest to them to limit it to > these four and to provide more of a suggestion as to how these might show up > in use. > > A./ >> >>> On Jan 18, 2018, at 10:44 AM, Asmus Freytag via Unicode >>> wrote: >>> On 1/18/2018 6:55 AM, Shriramana Sharma via Unicode wrote: Hello people. We have sun, earth and moon emoji (3 for the earth and more for the moon's phases). But we don't have emoji for the rest of the planets. We have astrological symbols for all the planets and a few non-existent imaginary "planets" as well. Given this, would it be impractical to encode proper emoji characters for the rest of the planets, at least the major ones whose physical characteristics are well known and identifiable? I mean for example identifying Sedna and Quaoar (https://en.wikipedia.org/wiki/File:EightTNOs.png) is probably not going to be practical for all those other than astronomy buffs but the physical shapes of the major planets are known to all high school students… >>> Earth = blue planet (with clouds) >>> >>> Mars = red planet >>> >>> Saturn = planet with rings >>> >>> I don't think any of the other ones are identifiable in a context-free >>> setting, unless you draw a "big planet with red dot" for Jupiter. >>> >>> Earth would have to be depicted in a way that doesn't focus on >>> "hemispheres", or you miss the idea of it as "planet". >>> >>> >>> >>> A./ >>> >>> >>> >> >
The need for a basic register of emoji submissions
There is a need for a basic register of proposals that have been submitted to the Emoji Subcommittee. Currently, emoji proposals are posted to the UTC register only after the ESC has reviewed them and found them actionable by the UTC. For proposals that make the cut, some time can pass between the date of submission and the date they are posted. For proposals that are deemed unsuitable, there is simply no public record. Consequently, there is no way to know whether a particular emoji has been proposed, either while a submitted proposal is being reviewed or after a proposal has been rejected. The "Submitting Emoji Proposals" page at http://unicode.org/emoji/selection.html quixotically tells the reader, in bold face, to "check the Emoji List to make sure your proposal is new", but that list contains only emoji that have already been encoded. This is a problem. There have been three instances where I worked on emoji proposals only to learn later that they had already been proposed. And I learned that only because I check the UTC register frequently for my script encoding efforts. If there were a basic register of emoji submissions, I could easily have checked it and saved the hours I spent drawing up documents. The de facto rationale for not posting emoji proposals to the UTC register right away is that 'there are too many proposals that are unactionable or of insufficient quality'. But I do not think this rationale holds water. A basic task of a standards subcommittee is to maintain a list of the artifacts that pertain to its function. For the ESC, these artifacts include all emoji submissions, and a list of them could easily be made available at http://unicode.org/emoji. That way, instead of pointing prospective emoji proposal authors to a list of already encoded emoji, the ESC could point them to a list of emoji submissions. This basic register can be as simple as a list of names. If the ESC prefers not to post other details, that is fine.
I am not asking for a Roadmap. I see from the announcement made yesterday that the ESC now has (at least) four members. Congratulations to the new members, who I believe are highly capable of setting up and maintaining a simple public list of emoji submissions in short order. All my best, Anshu
Re: Comparing Raw Values of the Age Property
I performed several operations on DerivedAge.txt a few months ago. One basic example here: https://pandey.github.io/posts/unicode-growth-UCD-python.html If you provide some more insight into your objective, I might be able to help. I would recommend against relying on the order of the data, and that you instead parse the individual entries to obtain the 'Age' property. All my best, Anshu > On May 22, 2017, at 4:44 PM, Richard Wordingham via Unicode > wrote: > > Given two raw values of the Age property, defined in UCD file > DerivedAge.txt, how is a computer program supposed to compare them? > Apart from special handling for the value "Unassigned" and its short > alias "NA", one used to be able to compare short values against short > values and long values against long values by simple string > comparison. However, now we are coming to Version 10.0 of Unicode, > this no longer works - "1.1" < "10.0" < "2.0". > > There are some possibilities - the values appear in order in > PropertyValueAliases.txt and in DerivedAge.txt. However, I can find no > relevant guarantees in UAX#44. I am looking for a solution that can be > driven by the data files, rather than requiring human thought at every > version release. Can one rely on the FULL STOP being the field > divider, and can one rely on there never being any grouping characters > in the short values? Again, I could find no guarantees. > > Richard.
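To illustrate parsing the individual entries rather than relying on file order, here is a minimal Python sketch that reads data lines of DerivedAge.txt and compares Age values as tuples of integers, so that "2.0" sorts before "10.0". It assumes, as Richard points out UAX #44 does not explicitly guarantee, that short values are integers separated by FULL STOP. "Unassigned"/"NA" is treated as later than any version. The helpers `age_key` and `parse_derived_age_line` are hypothetical names, not part of any Unicode tooling.

```python
from typing import Optional, Tuple

# Assumption: Age values look like "1.1" or "10.0" (dot-separated integers).
UNASSIGNED = {"Unassigned", "NA"}


def age_key(value: str) -> Tuple[int, Tuple[int, ...]]:
    """Sort key for a raw Age value; Unassigned/NA sorts after every version."""
    value = value.strip()
    if value in UNASSIGNED:
        return (1, ())
    return (0, tuple(int(part) for part in value.split(".")))


def parse_derived_age_line(line: str) -> Optional[Tuple[int, int, str]]:
    """Parse one DerivedAge.txt line into (first, last, age), or None."""
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None  # blank line or pure comment
    cps, age = (field.strip() for field in data.split(";"))
    first, _, last = cps.partition("..")  # single code points have no range
    return int(first, 16), int(last or first, 16), age
```

Plain string comparison gives "1.1" < "10.0" < "2.0"; with `age_key`, `sorted(ages, key=age_key)` orders them 1.1, 2.0, 10.0, and the ordering does not depend on the position of entries in the file.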
Re: Counting Devanagari Aksharas
> On Apr 20, 2017, at 8:19 PM, Richard Wordingham via Unicode > wrote: > > On Thu, 20 Apr 2017 14:14:00 -0700 > Manish Goregaokar via Unicode wrote: > >> On Thu, Apr 20, 2017 at 12:14 PM, Richard Wordingham via Unicode >> wrote: > >>> On Thu, 20 Apr 2017 11:17:05 -0700 >>> Manish Goregaokar via Unicode wrote: > I'm of the opinion that Unicode should start considering devanagari (and possibly other indic) consonant clusters as single extended grapheme clusters. > >>> You won't like it if cursor movement granularity is reduced to one >>> extended grapheme cluster. I'm grateful that Emacs allows me to > >> I mean, we do the same for Hangul. > > Hangul is generally a maximum of three characters, which is about the > border of tolerance. I find it irritating to have to completely retype > Thai grapheme clusters of consonant, vowel and tone mark. There were > loud protests from the Thais when preposed vowels were added to the > Thai grapheme cluster and implementations then responded, and Unicode > quickly removed them. Now imagine you're typing Vedic Sanskrit, with its > clusters and pitch indicators. I tried typing Vedic Sanskrit, and it seems to work: http://pandey.pythonanywhere.com/devsyll Haven't tried the orthographic oddity of the Nepali case in question. Above my pay grade. If you access the above link on an iOS device you'll see tofu and missing characters. Apple's Devanagari font needs to be fixed. - AP
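To make the difference between extended grapheme clusters and aksharas concrete: under the UAX #29 segmentation rules in force at the time of this thread, a virama is a combining extender, so क्ष splits into two clusters (क् and ष). (Unicode 15.1 later added rule GB9c, which keeps such conjuncts together in Devanagari and some other scripts.) The Python sketch below is a heuristic akshara splitter, not the UAX #29 algorithm: it attaches combining marks to the current unit and joins a letter to it when the unit ends in a virama. `naive_aksharas` is a hypothetical helper; it ignores ZWJ/ZWNJ, Vedic marks, and the Nepali orthographic case mentioned above.

```python
import unicodedata
from typing import List

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA


def naive_aksharas(text: str) -> List[str]:
    """Split Devanagari text into rough akshara-like units (heuristic)."""
    units: List[str] = []
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category.startswith("M")  # Mn/Mc: matras, virama, nukta...
        is_letter = category.startswith("L")
        follows_virama = bool(units) and units[-1].endswith(VIRAMA)
        if units and (is_mark or (is_letter and follows_virama)):
            units[-1] += ch  # extend the current unit (conjunct or marks)
        else:
            units.append(ch)  # start a new unit
    return units


print(naive_aksharas("क्षत्रिय"))
```

Spaces and punctuation come out as units of their own, so a caller counting aksharas would filter those out first.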
Re: Soyombo empty letter frame
> On Jan 4, 2017, at 8:54 PM, Mark E. Shoulson wrote: > >> On 01/04/2017 04:18 PM, eduardo marin wrote: >> The Soyombo proposal is beautiful, but it is missing a very important >> character in my opinion: http://www.unicode.org/L2/L2015/15004-soyombo.pdf >> >> Encoding an empty letter frame will allow for its proper description in >> plain text (as it is clear in the proposal itself), it could be used as an >> stylized cursor in text processors and also we could make zwj sequences such >> that combining with consonants makes it only render the nucleus. > > According to the proposal: > > In the proposed encoding a combination of frame and nucleus is considered an > atomic letter This approach enhances the conceptualization and > identification of letters in the script; for instance, the letter ‘ka’ refers > inherently to the fully-formed (X) and not to the nucleus (X). > In other words, they are explicitly rejecting the model considering the > "frame" as an item in its own right. I realize that you are not calling for > redefining all the letters in terms of frame+nucleus, but encoding the frame > seems to be something the proposers deliberately decided against doing. In > calling for encoding the frame (and why just one frame? Wouldn't you want > both the "closed" and "open" ones?), I think you really are going against > what seems to be a design principle of the proposers. Which of course you > are completely entitled to do: just that you probably are better off talking > it over with the proposers directly, to learn their thinking and so they can > learn yours. > > ~mark As the author of the Soyombo proposal, I should like to say that I did indeed consider proposing the two frames for encoding as "pedagogical" characters. I did not mention the possibility of such in the proposal, but the present discussion persuades me to reinvestigate the issue. I'd be happy to hear the opinion of others. All the best, Anshuman
Re: Offlist -- Re: Comment in a leading German newspaper regarding the way UTC and Apple handle Emoji as an attack on Free Speech
That should've been offlist... :) > On Aug 28, 2016, at 2:04 PM, Anshuman Pandey wrote: > > Hi Doug, > > Do you know who represents the US on ISO 3166? > > Anshu > > >> On Aug 28, 2016, at 1:22 PM, Doug Ewell wrote: >> >> Philippe Verdy wrote: >> >>> Well it is still not so universal as there are wide ranges of glyphs >>> excluded for now to encoding as characters: >>> [...] >>> - country flags have been included but many regional emblems are >>> excluded (as they don't match any ISO 3166-1 code) >> >> There are tentative plans (again) to provide a composite encoding for flags >> corresponding to country subdivisions encoded in ISO 3166-2. >> >> Unicode and 10646 have done well so far to avoid judging for themselves >> which regions or groups deserve encoding over others, and sticking with the >> decisions of ISO 3166/MA instead. >> >>> - common road signs/street signs and signs for indoor facilities & >>> services >> >> I wouldn't doubt those are coming soon. >> >>> - various box drawing characters used in legacy terminals (notably in >>> Teletext and on older 8-bit systems): a few of them were added from >>> DOS/OEM codepages. >> >> I thought that set had been pretty much completed by now. I wonder which one >> are supposedly still missing. >> >> -- >> Doug Ewell | Thornton, CO, US | ewellic.org
Offlist -- Re: Comment in a leading German newspaper regarding the way UTC and Apple handle Emoji as an attack on Free Speech
Hi Doug, Do you know who represents the US on ISO 3166? Anshu > On Aug 28, 2016, at 1:22 PM, Doug Ewell wrote: > > Philippe Verdy wrote: > >> Well it is still not so universal as there are wide ranges of glyphs >> excluded for now to encoding as characters: >> [...] >> - country flags have been included but many regional emblems are >> excluded (as they don't match any ISO 3166-1 code) > > There are tentative plans (again) to provide a composite encoding for flags > corresponding to country subdivisions encoded in ISO 3166-2. > > Unicode and 10646 have done well so far to avoid judging for themselves which > regions or groups deserve encoding over others, and sticking with the > decisions of ISO 3166/MA instead. > >> - common road signs/street signs and signs for indoor facilities & >> services > > I wouldn't doubt those are coming soon. > >> - various box drawing characters used in legacy terminals (notably in >> Teletext and on older 8-bit systems): a few of them were added from >> DOS/OEM codepages. > > I thought that set had been pretty much completed by now. I wonder which one > are supposedly still missing. > > -- > Doug Ewell | Thornton, CO, US | ewellic.org
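Regarding the composite encoding for subdivision flags quoted above: country flags are pairs of regional indicator symbols derived from ISO 3166-1 alpha-2 codes, and the subdivision scheme that was eventually adopted uses an emoji tag sequence (U+1F3F4 WAVING BLACK FLAG, tag characters spelling the lowercased ISO 3166-2 code without its hyphen, then U+E007F CANCEL TAG). The Python sketch below builds both; `country_flag` and `subdivision_flag` are hypothetical helpers. Only a few subdivision sequences (England, Scotland, Wales) are recommended for general interchange, so most will not display as flags.

```python
BLACK_FLAG = "\U0001F3F4"  # U+1F3F4 WAVING BLACK FLAG
CANCEL_TAG = "\U000E007F"  # U+E007F CANCEL TAG


def country_flag(alpha2: str) -> str:
    """Regional indicator pair for an ISO 3166-1 alpha-2 code, e.g. 'US'."""
    # U+1F1E6 is REGIONAL INDICATOR SYMBOL LETTER A.
    return "".join(chr(0x1F1E6 + ord(c) - ord("A")) for c in alpha2.upper())


def subdivision_flag(code: str) -> str:
    """Emoji tag sequence for an ISO 3166-2 code such as 'GB-ENG'."""
    tag_text = code.replace("-", "").lower()
    # Tag characters mirror ASCII at U+E0000 + code point (e.g. 'g' -> U+E0067).
    tags = "".join(chr(0xE0000 + ord(c)) for c in tag_text)
    return BLACK_FLAG + tags + CANCEL_TAG


print([f"U+{ord(c):04X}" for c in subdivision_flag("GB-ENG")])
```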
Re: Revenge of pIqaD
Dear Mark and Chris, I wonder whether copyright or other IP issues might affect the suitability of Klingon for encoding, as with the Tolkien scripts? And to be sure, Klingon certainly does have a larger digital presence than the Gondi scripts... All the best, Anshu > On Jul 28, 2015, at 10:21 PM, Mark Shoulson wrote: > > OK! I'm freshly back from the qep'a' cha'maH cha'DIch in Chicago, and I have > to report that Klingon pIqaD really is out there and getting some use, > despite having been banished to the PUA. I've seen it on a wine-bottle label > (commercially produced, not someone's homebrew), on the Klingon version of > the Monopoly game, a book or two (NOT published by the KLI); there are > websites using it (but then there were last time I mentioned this and that > didn't seem to count then), and apparently support for it on several > platforms, including a smartphone keypad, to say nothing of quite a few > T-shirts. Apparently there is a small community actually using pIqaD to > (*gasp*) exchange information via SMS. I'm copying Chris Lipscombe on this > email; he is better plugged in to the use of pIqaD in Real Life™ (don't > forget to Reply All if you want to include him, since I think he isn't on the > list at the moment). > > What has to be done to get this encoded? The proposal is likely still more > or less what we need, and it probably has at least as much online information > interchange as, say, Gondi does ("Well, what do you expect, Gondi isn't > encoded yet!" "Neither is pIqaD.") Are we ready to revisit this question > again? > > ~mark
Re: Accessing the WG2 document register
Andrew, Thank you for this detailed investigation. It is truly informative. As I am considered an ineligible contributor by ISO, um, standards, I hereby withdraw all of my contributions to Unicode, and by extension to ISO 10646. A list of the contributions that I withdraw is given at: http://linguistics.berkeley.edu/~pandey/ Whoever has the task of coordinating with ISO (is that you, Michel?), please withdraw all of my contributions. All the best, Anshuman
Re: Accessing the WG2 document register
> On Jun 10, 2015, at 5:07 AM, Janusz S. Bien wrote: > > Quote/Cytat - William_J_G Overington (Wed 10 Jun > 2015 10:25:19 AM CEST): > >>> Remind me why Unicode is still taking ISO to the dance? Sometimes going >>> stag has its benefits... >> >> >> As I understand it, Unicode Inc. is a recognised guest of ISO in >> participating in ISO producing an International Standard. > > Cf. http://www.unicode.org/L2/L2014/14286-wg2-liaison.pdf This document provides further evidence of the irrelevance of ISO in the Unicode world. Deference. Janusz, what was your intention in providing a link to this document? All the best, Anshuman
Re: Accessing the WG2 document register
On Jun 10, 2015, at 4:25 AM, William_J_G Overington wrote: >> Remind me why Unicode is still taking ISO to the dance? Sometimes going stag >> has its benefits... > > > As I understand it, Unicode Inc. is a recognised guest of ISO in > participating in ISO producing an International Standard. Does Unicode need ISO to exist? Or does ISO need Unicode? > The fact that Unicode Inc. provides a valuable public service in making > documents and encoding charts freely available to all who access the > www.unicode.org website is not in any way the same as the provenance that ISO > has of being recognised by governments around the world as providing > standards for technological matters ISO is a profit-making business. I worked on an ISO standard for the transliteration of Indic scripts two decades ago and I have yet to see the published standard. Back then I couldn't afford to buy the document, and ISO didn't have the heart to give me a copy as a contributor. So, to this day, I have yet to see the official standard that I helped to develop. ISO needs to function as a non-profit organization with open access to all of its activities and publications. > I am not a lawyer, yet as I understand it, the underlying theory of standards > work is that it is a legally permitted exception to a general legal > prohibition of businesses meeting together to decide and agree what will be > applied in industrial activity. And so ISO functions by relying upon contributions made by the public without granting either authorship or compensation to those who actually build their standards. And now they want to claim ownership of contributed documents... > Thus, for example, it is fine for businesses to agree that one particular > code point will be used for the symbol for the Indian Rupee, as that helps > consumers in that a message between computers of different brands can be > passed and read successfully. This can be done without ISO...
> Yet, for example, it is not permitted for businesses to meet together to > decide that all computers will be in a grey plastic box, as that hinders > choice for consumers. Who exactly is imposing these restrictions? Restriction of choice is an issue for political economy, not standards bodies. All the best, Anshuman
Re: Accessing the WG2 document register
Shervin, > On Jun 9, 2015, at 7:18 PM, Shervin Afshar wrote: > > Anshuman Pandey observed: > > > Remind me why Unicode is still taking ISO to the dance? Sometimes going > > stag has its benefits... > > Hear, hear! I really wanted to punctuate my statement with a STAG emoji, or REINDEER at the very least. But, the closest thing I found was 🐂. Pragmatically on the dot, but unforch not semantically... Anshu
Re: Accessing the WG2 document register
Hi Ken, > On Jun 9, 2015, at 6:38 PM, Ken Lunde wrote: > > Welcome to ISO. ☺ I think I'll skip that party. 😊 I've already started to add copyright statements to my proposals. Now I'll add another statement that says: "This document is intended for encoding the XYZ script in The Unicode Standard. If it and its contents are appropriated for encoding XYZ in ISO 10646, then ISO must make this document openly and publicly accessible to all." Remind me why Unicode is still taking ISO to the dance? Sometimes going stag has its benefits... All the best, Anshu
Accessing the WG2 document register
Hello all, I learned today that the WG2 document register is not publicly accessible. This means that I, as a proposal author, have no means of accessing the documents that I contribute. Can someone associated with WG2 or anyone else in the know please tell me why these documents are under lock and key? All the best, Anshuman
Re: sex and emoji
Never would have imagined 'sex' and 'Unicode' in the memetic scene, but a big ol' 🍆 to the UTC! Kudos, rather 🍆. > On Feb 12, 2015, at 4:47 PM, Asmus Freytag wrote: > > To quote: "While this probably isn’t news to fans of the eggplant emoji, > " > > More here: > > http://time.com/3694763/match-com-dating-survey-emoji-sex/ > > A./ > ___ > Unicode mailing list > Unicode@unicode.org > http://unicode.org/mailman/listinfo/unicode
Re: The Ruble sign has been approved
"In Russia... The de facto standard ruble sign approves the board of directors..." ;) On Dec 11, 2013, at 8:11 AM, Leo Broukhis wrote: The board of directors of the Central Bank of Russia has [finally] approved the de facto standard ruble sign. http://lenta.ru/news/2013/12/11/symbol/ Leo