Re: Mayan numerals (again)
The question is whether both versions (horizontal and vertical) are warranted. With my limited knowledge of the matter, I would expect only one set to be encodable, the other being free stylistic variation.

Sz

Szelp, André Szabolcs +43 (650) 79 22 400

On Sun, Jun 23, 2013 at 9:19 PM, Jameson Quinn jameson.qu...@gmail.com wrote:

Last year, I started a discussion about proposing the Mayan numerals for inclusion in Unicode. Several people on the list supported this idea and encouraged me to submit a proposal. I did not manage to do so last year, but I am ready to now.

I have access to dozens of different books with their page numbers, tables of contents, and publication dates in Mayan numerals. Several of them use the numerals in other ways, such as numbered lists or century numbers (i.e., siglo 16, 16th century, with 16 in Mayan numerals). All of these are from a single publishing house, and I know of 2 other publishers who use similar practices. None of the samples I have are textbooks, although it is common for math textbooks here in Guatemala to have a section on Mayan numerals, typically with a few simple addition problems or the like. The publisher of the books I have is interested, and would probably sign on to my proposal, though it would take about a month for them to get full consensus on this. I can also provide photos of Guatemalan currency notes, which have Mayan as well as Arabic numerals on them.

I'd like to propose 40 glyphs: the vertical and horizontal versions of the digits 0-19. The zero glyph would be in its shell form; the several minor variants of this form would be considered as the same base glyph.

This initial proposal would not include head variants or the petroglyphic flower zero, nor would it include petroglyphic marginal decorations on the glyphs for 1, 6, 11, and 16, as all of those are generally used in a context of fully glyphic writing, which has a number of difficult technical issues to resolve before it's ready for Unicode. (Although I could provide at least one modern example of a glyphic text; this is at least to some degree a living art today, though it was dead for centuries.)

I'd like to know what my next step should be, and whether anyone more experienced with Unicode procedures would like to advise me more closely.

Sincerely, Jameson Quinn
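As an illustration of the repertoire in the proposal quoted above (one glyph per digit 0-19, i.e. per base-20 digit), here is a minimal Python sketch. It assumes plain base-20 place value (not the calendrical variant) and the usual bar-and-dot composition, where a bar counts five and a dot counts one. The function names are made up for this example and are not part of any proposal.

```python
def to_vigesimal(n):
    """Return the base-20 digits of a non-negative integer, most significant first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    digits = []
    while True:
        n, digit = divmod(n, 20)
        digits.append(digit)
        if n == 0:
            break
    return digits[::-1]


def bars_and_dots(digit):
    """Split one digit (0-19) into (bars, dots); zero would be the shell glyph."""
    if not 0 <= digit <= 19:
        raise ValueError("a single Mayan digit must be in the range 0-19")
    return divmod(digit, 5)


# "siglo 16" fits in a single digit: three bars and one dot.
print(to_vigesimal(16), bars_and_dots(16))
# A publication year needs three digit positions.
print(to_vigesimal(2013))
```

Whether the digits are stacked vertically or laid out horizontally is a presentation question on top of this; the sketch says nothing about which of the two sets would need separate code points.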
Re: Mayan numerals (again)
One never stops learning... I'd be very interested in the examples, especially in the extent to which they are non-interchangeable.

Thanks

Szelp, André Szabolcs +43 (650) 79 22 400

On Tue, Jul 2, 2013 at 1:03 PM, Jameson Quinn jameson.qu...@gmail.com wrote:

2013/7/2, Szelp, A. Sz. a.sz.sz...@gmail.com:

The question is whether both versions (horizontal and vertical) are warranted. With my limited knowledge of the matter, I would expect only one set to be encodable, the other being free stylistic variation.

I have examples of printed pages using both forms on the same page non-interchangeably, if that helps.
Re: Latvian and Marshallese Ad Hoc Report (cedilla and comma below)
The COMMA BELOW / CEDILLA problem is typically something that probably cannot be solved in Unicode in a way that satisfies every possible aspect.[^1] These problems are an artifact of the historical development of Unicode, and for a standard, stability seems to take high priority, usually higher than canonical equivalences and NFD, especially as NFC is the usually recommended form.

Fixing these is probably an item to keep in mind for a hypothetical *NeoUniCode* standard of the future, like so many other issues. With modern font technologies capable of language-dependent glyph variants, and with markup languages, unifying from the beginning might be a solution, or disunifying both forms wherever either is acceptable (with the drawback of even more confusables). However, these considerations are pretty academic and hypothetical from a current Unicode point of view.

The case is similar with CARON / COMMA ABOVE RIGHT in Czech/Slovak, which is probably an even harder case. Here one might consider, for a hypothetical *NeoUniCode* standard, encoding them as they canonically appear, with CARON for uppercase and COMMA ABOVE RIGHT for lowercase, and defining language-dependent casing behaviour, as is already done in the current Unicode standard for Turkish with LATIN SMALL LETTER I and SMALL LETTER DOTLESS I / CAPITAL LETTER I and CAPITAL LETTER I WITH DOT ABOVE. (And while at it, one could consider doing away with separate code points for uppercase letters altogether and resolving the issue with a mechanism similar to combining characters or variation selectors.)

/Szabolcs

[^1]: In fact, in languages where both presentations are equally acceptable, even the (synchronic) identity is hard to determine: is it a CEDILLA that can take COMMA form as well, or the other way around?
Szelp, André Szabolcs +43 (650) 79 22 400 On Wed, Jun 19, 2013 at 2:41 PM, Denis Jacquerye moy...@gmail.com wrote: On Wed, Jun 19, 2013 at 9:12 AM, Michael Everson ever...@evertype.com wrote: On 19 Jun 2013, at 07:54, Denis Jacquerye moy...@gmail.com wrote: [...] How would one rationalize using one diacritic U+0327 with M/m and O/o but not with L/l and N/n in Marshallese? The same way one would rationalize using precomposed ãẽĩñõũỹ (aeinouy with tilde) but a necessarily de-composed g̃ (g with tilde) in Guaraní. This is wrong: ãẽĩñõũỹ normalize to use U+0303 in NFD, so they canonically use the same tilde as g̃. The 4 additional non decomposable characters with Marshallese with cedilla would not normalize to use the same cedilla as the others Marshallese characters with cedilla. The would no canonically use the same cedilla. [...] It would require less new characters to be encoded and would make it easier to support in fonts (adding 1 instead of 4). No! Because if you added a single new character you'd have to make sure you had good glyph placement with LlMmNnOo which is eight glyphs. The best practice would require to add diacritical mark placement whenever necessary if not on all possible base character, M/m and O/o would still need either way, L/l and N/n would need it for other combining diacritics either way. A modern font already needs to be able to correctly place combining diacritics, including cedilla or ogonek. Navajo and other languages need other placement of ogonek than that of European languages. This does not mean it is justified to encode single precomposed Navajo ogonek characters. The placement of the cedilla is not semantically different, m̧ with the cedilla on the left has the same meaning as if the cedilla were centered or on the right, even if just one of the two is correct in some contexts like in Marshallese. This does not mean it is justified to encode m with left cedilla, m with centered cedilla or m with right cedilla. 
An additional single combining diacritics would behave the same way. On Wed, Jun 19, 2013 at 9:49 AM, Michael Everson ever...@evertype.com wrote: On 19 Jun 2013, at 09:04, Denis Jacquerye moy...@gmail.com wrote: Furthermore, the cedilla can also have a proper cedilla form as opposed to the Latvian or Livonian comma below form in transliteration systems. This has nothing to do with the Marshallese/Latvian conflict, though. ALA-LC romanizations use cedilla with r as they do under c or s. Does ŗ contrast with r̦ in ALA-LC romanization? The same way Marshallese has cedilla letters contrasting with comma below letters. The only correct form is with cedilla and it doesn't use comma below. BGN/PCGN and UNGEGN romanizations use cedilla with d as they do under h, s, t or z. DIN 1460-2 uses the cedilla under d, k, l, n as it does under c, h, s, t and z. If those things are a problem, then solving this problem for Marshallese simply does nothing about that problem. But it solves the problem for Marshallese. If the 4 Marshallese cedilla characters are encoded as single characters, does this mean the d, k, l, r
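The normalization point made in the exchange quoted above can be checked against the character database that ships with Python. This is only a sketch of the current state of affairs as seen through the standard `unicodedata` module: the precomposed tilde vowels decompose to U+0303 under NFD, g with tilde exists only as a sequence, and the existing precomposed l and n with cedilla decompose to U+0327, the same mark used in the sequence for m with cedilla. It says nothing about how hypothetical newly encoded Marshallese letters would behave.

```python
import unicodedata

# Precomposed a, e, i, n, o, u, y with tilde, written as escapes to avoid ambiguity.
precomposed_tilde = "\u00e3\u1ebd\u0129\u00f1\u00f5\u0169\u1ef9"

# Under NFD each of them ends in COMBINING TILDE (U+0303).
all_use_combining_tilde = all(
    unicodedata.normalize("NFD", ch)[-1] == "\u0303" for ch in precomposed_tilde
)

# g with tilde has no precomposed form, so NFC leaves the two code points alone.
g_tilde_nfc = unicodedata.normalize("NFC", "g\u0303")

# Existing precomposed l and n with cedilla decompose to COMBINING CEDILLA (U+0327).
l_cedilla_nfd = unicodedata.normalize("NFD", "\u013c")
n_cedilla_nfd = unicodedata.normalize("NFD", "\u0146")

# m with cedilla, like g with tilde, only exists as a sequence.
m_cedilla_nfc = unicodedata.normalize("NFC", "m\u0327")

print(all_use_combining_tilde, len(g_tilde_nfc), len(m_cedilla_nfc))
print([f"U+{ord(c):04X}" for c in l_cedilla_nfd + n_cedilla_nfd])
```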
Re: Greek Astrology
Is there evidence that these have been used consistently, on most charts of the time? These could be ad hoc notations (given the contemporary practice, ligation per se does not make a symbol).

-- Szelp, André Szabolcs +43 (650) 79 22 400

On Thu, Nov 1, 2012 at 2:38 AM, CE Whitehead cewcat...@hotmail.com wrote:

Hi.

From: Raymond Mercier rm459_at_cam.ac.uk
Date: Mon, 29 Oct 2012 08:52:43 -

I think I had somehow assumed that the symbols used in Greek Horoscopes had already been encoded, but it seems not. The four signs used to mark the principal corners (ascendant, etc) of the horoscope diagram are shown in the attachment, taken from http://www.skyscript.co.uk/greek_horoscope.html These four signs should be encoded along with the zodiacal signs U+2648 to U+2653. Perhaps they are already in the pipeline ?

Perhaps these should be in the pipeline, as the online templates I could find for astrological charts do not have them; they have to be added in (although it would be possible to have these built into the chart template also, as the houses are always in the same place and the ascendant is always located between the 12th and the 1st, etc.); see: http://www.skyscript.co.uk/charttemp.html

Similarly, Paul Wade's copiable template is void of the symbols: http://books.google.com/books?id=WY8hjKtSaP0Cpg=PA40lpg=PA40dq=natal+charts+astrological+charts+templatessource=blots=By-xF3UGWBsig=KvomOKgo999CwuJPKaq1LmeoqHchl=frsa=Xei=oMCRUK-wF5Sc8gTWi4DYAgved=0CDQQ6AEwAzgK#v=onepageq=natal%20charts%20astrological%20charts%20templatesf=false

(I'll try to check an offline guide, too, but the few actual online templates, not sample charts, seem void of the symbols for the ascendant, midheaven, etc., so they seem to be separate from the actual chart of the houses, so go for it. Happy Halloween in any case.)

Best, --C. E. Whitehead cewcat...@hotmail.com

Best wishes Raymond Mercier
Re: Greek astrology
These look as if they were actually ligatures. Without knowing the Greek words for the principal corners, I'd read them as a rho-omega-kappa, a pi-upsilon, an alpha (delta?)-upsilon-nu-omega and a rho-mu ligature. I wouldn't be surprised if these letters were abbreviations of expanded terms for the four principal corners. On the other hand, there do exist ligatures which gained conventional meaning and are now encoded as their own characters, e.g. ℔, ℅.

Szabolcs

On Mon, Oct 29, 2012 at 9:52 AM, Raymond Mercier rm...@cam.ac.uk wrote:

I think I had somehow assumed that the symbols used in Greek Horoscopes had already been encoded, but it seems not. The four signs used to mark the principal corners (ascendant, etc) of the horoscope diagram are shown in the attachment, taken from http://www.skyscript.co.uk/greek_horoscope.html These four signs should be encoded along with the zodiacal signs U+2648 to U+2653. Perhaps they are already in the pipeline ?

Best wishes Raymond Mercier
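For the two ligature-derived characters mentioned in the reply above, a short Python sketch can show how they are recorded in the character database. It uses only the standard `unicodedata` module; the helper name `describe` is made up for this example.

```python
import unicodedata


def describe(ch):
    """Return (code point, name, general category, decomposition) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.decomposition(ch),
    )


lb_bar = describe("\u2114")   # the lb-bar symbol
care_of = describe("\u2105")  # the care-of symbol

print(lb_bar)
print(care_of)
```

Both come out as symbols (general category So) rather than letters. The care-of sign carries a compatibility decomposition to its constituent characters, which is one way the database records that a symbol started life as an abbreviation.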
Re: Greek astrology
Oh, actually the *very homepage you linked* makes it quite clear that these are not symbols but abbreviatures (emphasis by color by me; the original page uses a somewhat unusual transliteration scheme). These are just completely ordinary late ancient/medieval abbreviations; I would not think that they are encodable. (Use ZWJ, if you must.)

To demonstrate: the name for the midheaven is written in Greek as mesouranhma, the English transliteration of which is *Mesuranima* and the translation of which is 'midheaven' or 'middle of the sky'. The equivalent Latin term, which has remained in use, is *Medium Coeli*. Just as we abbreviate *M*edium *C*oeli to MC, the Greek word *m*esou*r*anhma is abbreviated to mr, which is worked into a symbol (see fig.6.D below) by allowing the Greek letter mu (m) to cut across the down stroke of the Greek letter rho (r).

[Fig. 6, the abbreviations and symbols of the angular house names: A) Ascendant, B) IC, C) Descendant, D) MC]

A similar approach is used to generate the symbol for the ascendant. The Greek word *w**r*os*k*opoV transliterates as *Horoskopos*, which is easily recognised as meaning 'hour-marker' or 'hour-watcher'. Here the abbreviated (emboldened) characters are combined so that the down stroke of rho (r) cuts across omega (w), and rests on top of kappa (k). This is one of only four symbols which have been noticed in ancient Greek charts. The others are the glyph for the midheaven which has just been described, and those for the Sun and Moon which are detailed below. Currently this symbol has the oldest heritage, appearing without the underlying kappa in a papyrus from Karanis relating to the year 182.
Of course, when the underlying kappa is removed, the glyph for the ascendant and that of the midheaven appear very similar, and in some charts the same symbol seems to have been used to mark either or both the ascendant and midheaven.[5]http://www.skyscript.co.uk/greek_horoscope.html#5 The name of the descendant shown here is not so much a symbol as an abbreviation with a raised character at the end. This presents the first four letters of the Greek word dunwn, which transliterates as *dunon* and translates as 'setting' or 'western' or 'evening' (in the same way that the word *oriens* can mean 'eastern' 'rising' or 'morning'; all of these words originating from the same root). The symbol that we see under the 4th house comprises the first two characters of the Greek word *u**v*ogeion [!], with pi (v) resting on top of upsilon (u). The transliteration of this word is *ypogeon* and its translation 'under-earth' (or 'underground' or 'underworld') presents a close association with traditional astrological references to the 4th house as 'under the earth'. Our common abbreviation I.C., derives from the Latin* Immum Coeli *which translates as 'lower heaven', but this older term seems to do a better job of conveying the underworld mythology that is anciently associated with the 4th house, and its interpretative role in describing what lies beneath the surface of the ground. On Mon, Oct 29, 2012 at 9:52 AM, Raymond Mercier rm...@cam.ac.uk wrote: ** I think I had somehow assumed that the symbols used in Greek Horoscopes had already been encoded, but it seems not. The four signs used to mark the principal corners (ascendant, etc) of the horoscope diagram are shown in the attachment, taken from http://www.skyscript.co.uk/greek_horoscope.html These four signs should be encoded along with the zodiacal signs U+2648 to U+2653. Perhaps they are already in the pipeline ? Best wishes Raymond Mercier
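To make the two technical references in this message concrete, here is a small Python sketch. The first part lists the zodiacal range U+2648 to U+2653 that the original request wanted the four signs to sit next to. The second part builds a mu + ZWJ + rho sequence of the kind suggested by "Use ZWJ, if you must". Whether any font actually renders that sequence as a ligature is entirely up to the font, so this is an assumption about presentation, not something the character data guarantees.

```python
import unicodedata

# The twelve zodiacal signs, U+2648 through U+2653 inclusive.
zodiac = [chr(cp) for cp in range(0x2648, 0x2653 + 1)]
zodiac_names = [unicodedata.name(ch) for ch in zodiac]

# ZERO WIDTH JOINER between two ordinary Greek letters requests (but cannot force)
# a ligated rendering; the underlying text is still just mu and rho.
ZWJ = "\u200d"
mu_rho = "\u03bc" + ZWJ + "\u03c1"

print(len(zodiac), zodiac_names[0], zodiac_names[-1])
print([unicodedata.name(ch) for ch in mu_rho])
```

The design point is that the abbreviation stays searchable as plain Greek letters, with the joiner carrying only a rendering hint.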
Re: ASSAMESE AND BENGALI CONTROVERSY IN UNICODE STANDARD ::::: SOLUTIONS
On Wed, Jul 11, 2012 at 10:30 PM, Richard Wordingham richard.wording...@ntlworld.com wrote:

On Wed, 11 Jul 2012 21:17:08 +0200 Joó Ádám a...@jooadam.hu wrote:

To extend the list, the Irish, Scots, English, Scandinavians and Poles picked up the Roman heritage without the assistance of being physically conquered. And the Romanians re-established it as an expression of non-Slavness.

Well, the official language of Hungary was Latin up until 1844. Does that qualify us as the true inheritors of the Roman Empire?

No. I wasn't sure how voluntarily Hungary (or rather, its rulers) had adopted West European ways, so I didn't add Hungary to the list.

Oh, its rulers adopted West European ways (or rather, the Latin Rite Church's ways as opposed to the Greek Rite Church's ways) quite voluntarily in the late 10th century...
Re: Mandombe
On Mon, Jun 11, 2012 at 10:58 AM, Stephan Stiller sstil...@stanford.edu wrote:

This is interesting only if the encodable elements would be different - remember, Unicode is not a font standard.

+1; rendering can be so much more complex than encoding. I'd really like to see a successful renderer for Nastaliq, (vertical) Mongolian, or Duployan. (What *are* the hardest writing systems to render?)

Vertical Mongolian does not seem to be harder to render _conceptually_ than, let's say, simple Arabic. It's more the architectural limitations of rendering engines, and the intermixing with horizontal text, that seem to limit its availability. For Nastaliq, Thomas Milo's DecoType is miraculous: it's hard, but given the good job they did, obviously not impossible. Well, I don't know about Duployan.

/Sz
Re: Mandombe
A very interesting script indeed. (I had never heard of it before.) While its shape and the impression it makes are quite intriguing and fascinating, I'd think that it's rather impractical to actually write. What are the experiences of the educators in this respect? (Though I understand that, this being a revealed and thus in many respects sacred script to its educators and users, accounts of it will probably be biased.)

/Sz

On Fri, Jun 8, 2012 at 10:43 PM, Jean-François Colson j...@colson.eu wrote:

Hello

In the French Wikipedia article about Mandombe ( http://fr.wikipedia.org/wiki/Mandombe ) I read: “Un dossier de demande d'encodage de l'écriture Mandombe a été introduit à l'Unicode au mois de décembre 2010. Ce dossier a été discuté à la réunion du Comité technique de l'Unicode au début du mois de février 2011.” which I would translate as “A Mandombe script encoding request dossier was introduced at Unicode in December 2010. That dossier was discussed at the Unicode technical committee meeting at the beginning of February 2011.”

Does anybody have information about that “dossier”? Is it available anywhere on the web?

Thanks

JF
Re: Latin chi and stretched x
Julian, if you look closely, it is not actually a turned s, but something created with a turned s in mind. In the very sort of the alphabet, the regular s has equal (or near-equal) top and bottom bowls. The turned one has an emphasized upper bowl, which of course stems from the idea of a turned s (as some fonts give s a larger lower bowl for balance), but it is quite clearly not a turned s as identity, but rather something _inspired_ by a turned s.

On Thu, Jun 7, 2012 at 11:05 PM, Julian Bradfield jcb+unic...@inf.ed.ac.uk wrote:

David Starner wrote:

LATIN SMALL LETTER ROTATED P was used; see http://commons.wikimedia.org/wiki/File:BAE-Siouan_Alphabet.png . It has caused some whimpering among those trying to transcribe the text.

Urk! And there's rotated s as well. Alright, I take it back. There is no limit to the barminess of script inventors. Obviously what we need are combining marks whose visual effect is reversing/rotating the previous glyph. No, I didn't say that, I really didn't say that...

-- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: Latin chi and stretched x
You are right, the s-acute just below it confused me.

-- Szelp, André Szabolcs +43 (650) 79 22 400

On Fri, Jun 8, 2012 at 11:32 AM, Julian Bradfield jcb+unic...@inf.ed.ac.uk wrote:

Szelp, A. Sz. wrote:

Julian, if you look closely, it is not actually a turned s, but something created with a turned s in mind. In the very sort of the alphabet, the regular s has equal (or near-equal) top and bottom bowls. The turned one has an emphasized upper bowl, which of course stems from the idea of a turned s (as some fonts give s a larger lower bowl for balance), but it is quite clearly not a turned s as identity, but rather something _inspired_ by a turned s.

Quite clearly wrong! I'm afraid you're suffering from optical delusion. I actually thought the same when I first looked at it, but it's not so. Cut out the turned s; then cut out, say, the initial s of sonant. Rotate it 180 degrees. They're identical, up to the tiny variations due to actual ink from metal type. (Beware that the ś immediately below is from a different fount, and *does* have more equal bowls. That's what confused me at first.)

Of course, since this was printed in the age of metal type, it *has* to be a turned s. Cutting a special type would cost far more, and as David pointed out in his original post, the reason for the absurd turned p and turned s was that the publishers weren't willing to cut the extra types to match the letters in the original hand-written script.
Re: Vexillological symbols
They probably are. They are routinely used in vexillological literature also in print. Szabolcs On Sat, Jun 2, 2012 at 10:17 PM, Jean-François Colson j...@colson.eu wrote: While we’re speaking of flags, the study of flags is named vexillology. In that discipline, a certain number of symbols are used: 63 symbols show where the flags are used and a score of additional symbols, presented at en.wikipedia.org/wiki/Vexillology , are used to describe the flags. There could be more symbols, I haven’t investigated that matter yet. Are these symbols worth a proposal? JF
Re: Latin chi and stretched x
Unvoiced so far, I had similar reservations regarding stretched x and Latin chi.

Michael wrote:

As I say, stretched x is in a family of other x's with one or two long feet, which may have rings or hooks on the end of them. But its weight is clearly x-like -- by design. Where Teuthonista texts occasionally used a proper Greek chi it is because of typographic deficiency.

This family of stretched x's seems to go back to a tradition of using different font sorts distinctively for sounds, most prominently Greek letters (this practice found its way into IPA as well) and Fraktur. (I know 19th c. and early 20th c. German tourists' basic Italian guidebooks that use a vs. Fraktur a to denote different sounds, just as they use x vs. chi differently.) The stretched x with one long leg quite probably comes from a Fraktur (more exactly: Textur) x, as does the stretched x from the chi. Denis gives good evidence for the stretched x being chi. Adding curls and modifications to existing (including innovative) signs is common in the phonetic tradition.

All in all, I also have the impression that, while encoding LATIN CHI as distinct from GREEK CHI was long overdue, there are not enough grounds to disunify Latin chi from stretched x. There is no contrastive use, and the history points to chi. The only difference between the two, according to Michael, is stroke weight distribution (if there is any; most use italic type). But it is Michael himself who has recognized that Teuthonista suffers from a good deal of extraordinarily bad typography, which shows us that the different stroke weight distribution is actually just bad typography. This is quite similar to something we've seen with the Cyrillic reform orthographies of the 1920s and 1930s (e.g. the gha derived from a handwritten old q, which got encoded misnamed as OI) and with the Chinese tone letters derived from numbers/Latin/Cyrillic type.
Szabolcs On Mon, Jun 4, 2012 at 3:10 PM, Denis Jacquerye moy...@gmail.com wrote: On Mon, Jun 4, 2012 at 11:38 AM, Michael Everson ever...@evertype.com wrote: On 4 Jun 2012, at 10:04, Denis Jacquerye wrote: On Mon, Jun 4, 2012 at 10:16 AM, Michael Everson ever...@evertype.com wrote: What is your point, though? Latin stretched x has been accepted based on examples with an Italic glyph like Lepsius' chi, a glyph like Greek chi and a stretched x taller than x-height (and not below baseline). All these are strictly different glyphs. Teuthonista suffers from a good deal of extraordinarily bad typography, and a fair bit of non-typographic handwritten text (which isn't bad). Where it uses Greek sorts it is because that was what they had, but it is clear from the *family* of stretched x's some with rings and curls that it is an x that is being stretched. (And not a chi with But Latin chi is being proposed as a different character because IPA has used a different glyph. Why? Because all, not some, of the IPA borrowings from Greek were explicitly stated to be designed to be different from Greek and to harmonize with Latin. The persisting unification doesn't make processing multi-script Greek and Latin text any easier, and ultimately is not what was designed. This is very clear in the beta, which now can be disunified because of its capital, but which should never have been unified in the first place. Furthermore, the Latin capital Chi is being proposed based on Lepsius' capital Chi which glyphs are strictly different from that one proposed. Yes, but it is still essentially a Latin Chi, not a Latin Stretched X. It is clearly not a Greek Chi, because Greek Chi does not use that shape for its capital. Lepsius, and the IPA, explicitly disunified Latin Chi from Greek, and I would say that both Lepsius and IPA glyphs could be taken for glyph variants of Latin Chi. But they are different from what is found in Greek. 
My concern is only with Latin chi being unified with Latin stretched x. The disunification of Latin chi from Greek chi (or the others in the proposal) is a good thing, I just think it has already been done with stretched x given the examples. As I say, stretched x is in a family of other x's with one or two long feet, which may have rings or hooks on the end of them. But its weight is clearly x-like -- by design. Where Teuthonista texts occasionally used a proper Greek chi it is because of typographic deficiency. How do we move forward? Is there evidence IPA Latin chi is any different from Teuthonista's multiple stretched x? Both use the glyph of Greek chi sometimes, and other glyphs other times. Stretched x is an x, not anything else. In its origin, they stretched a Latin x. Latin chi is borrowed from Greek chi, but in Lepsius uses a unique capital, and in IPA has a Greek-chi-like weight which differs from the Latin x. Lepsius' chi (with a proper Latin glyph) was already in use in Lepsius' Standard Alphabet (1855) for a guttural consonant, and chi with an acute for a palatal consonant. The
Re: Exact positioning of Indian Rupee symbol according to Unicode Technical Committee
Keyboard layouts are, to the best of my knowledge, not a matter for Unicode.

Szabolcs

On Mon, May 28, 2012 at 10:19 AM, Anand Kumar Sharma aksha...@cdac.in wrote:

Hi

I want to know the current exact position of the Indian Rupee symbol on the US-English (QWERTY) keyboard. I came across a blog showing the Rupee symbol at the extreme left, next to the character 1; see http://blog.foradian.com/rupee-foradian-keyboard-layout-type-the-india (refer to the keyboard picture). There is another way of typing the Rupee symbol, using AltGr+4, which I use most of the time on the third layer of the InScript keyboard. What will be the position of the Rupee symbol according to a particular STANDARD on our keyboards when new keyboards with the rupee symbol come onto the market?

-- Thanks and Regards
Anand Kumar Sharma GIST QA|CDAC-Pune|Ph:020-25503468|http://www.cdac.in
Re: Unicode 6.2 to Support the Turkish Lira Sign
Andreas, Asmus, let me add my two coins as well...

The Turks did not present “a new symbol”. They presented a new design for an existing symbol (₤) which stands in for an existing currency.

A new design makes it a new symbol. Especially a radical new design.

What makes a symbol a symbol? If I design a new door handle, is this going to get a new rubric in household suppliers’ catalogues? Or is it still just a door handle? And how would you define or measure the radicalism of the design in question? I can’t see any ‘radical new design’.

So far I see the relation between the new radical symbol and the encoded lira sign (₤) as the same as the one between the new (official, prescriptive) Euro symbol [1] (the geometric one, which, though supposed to be prescriptive, *no-one* uses) and the actual incarnations of the Euro symbol (€) in different font faces, matching their design (usually their C).

Now, I truly concur with Andreas that, as such, the new code position is _not_ warranted. Of course, if the sign does get encoded, we won't be able to prove ourselves right, as encoding this design fallacy is a self-fulfilling prophecy (thank god, in the case of the Euro, common sense and a sense of aesthetics won over bureaucratic shortsightedness). If you have a U+20A4 LIRA SIGN _and_ a U+20BA TURKISH LIRA SIGN, designers _will have to_ make a visual distinction between them, forcing them to take on the poor official design and not allowing them to interpret the sign creatively to make it more visually pleasing, as they did with the Euro.

I'm wondering how often we'll see (before the encoding happens! [2]) the new lira sign surface as ₤, £, Ł or Ƚ in handwritten naive typography after the fuss about the new sign settles in 1–2 months.

So I really think that the current situation is this: the Turkish government has presented an official Turkish lira sign (like the official Euro construction). This is probably going to be printed on banknotes, as it is official. This is fine. So does the ENB. However, the sign is in fact just a particular identity of ₤, used for official, engraving and minting purposes, and it should be used in text as ₤. Transcribing the particular design of [anchor-lira] 100 of a hypothetical future banknote in plain text as ₤ 100 is just as valid as transcribing the [official-geometric-euro-design] 10 of the Euro banknotes as € 10.

Of course, if you buy too quickly and too cheap, as Andreas put it, and encode the new glyph variant of the lira sign which happens to be the one preferred for future Turkish banknotes and coins, you open up Pandora's box by forcing a need for distinction where, as per the status quo, there is none.

You have been warned :-)

My two cents...

Szabolcs

[1] http://en.wikipedia.org/wiki/File:Euro_Construction.svg
[2] Of course, once the sign is encoded, it will be used in print, and that will influence handwritten usage. Well, the self-fulfilling prophecy sets in.
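The concern about a forced distinction can be illustrated from the data side. The sketch below uses Python's standard `unicodedata` module to check that U+20A4 and U+20BA are unrelated as far as normalization goes: neither has a decomposition, so no normalization form ever maps one to the other. It assumes a Python build whose character database already includes U+20BA (any reasonably recent Python 3 does); the helper name is made up for this example.

```python
import unicodedata

LIRA = "\u20a4"          # the sign discussed as "fancy L with two strokes"
TURKISH_LIRA = "\u20ba"  # the separately encoded sign
POUND = "\u00a3"


def is_normalization_stable(ch):
    """True if the character is unchanged under all four normalization forms."""
    return all(
        unicodedata.normalize(form, ch) == ch
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    )


for ch in (LIRA, TURKISH_LIRA, POUND):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), is_normalization_stable(ch))
```

In practice this means text search and comparison treat the two signs as unrelated unless an application adds its own folding, which is the kind of distinction the message above warns about.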
Re: Unicode 6.2 to Support the Turkish Lira Sign
Asmus, most of your letter (and my previous one, for that matter) is opinion, which is valuable to voice and to be heard, but which is hard to argue about, so I won't go into that. However, you write:

What you and Andreas are advocating, that is not to add a code point, would require a wholesale glyph change for U+20A4. All existing fonts would have to be tweaked to suddenly have shapes based on a L in a Turkish slipper (that's what the times-like example in the proposal document reminds me of) instead of a script-like shape (based on £).

And I must reject that. This is not what we are advocating. While not speaking for Andreas, *my* point is that if we were to encode the writing on that mug or on the flyer (cf. the proposal you are referring to), identifying the incriminated glyph as U+20A4 would be correct and preferable. Thank god, for fancy flyer designers (who might want to have the flashy anchor style, or let's put it this way: technocratic constructivist style), modern font technologies allow for glyph variants via stylistic sets or other means. (I.e. there might be a _preferred_ style of U+20A4 in Turkey, as there is a preferred style for certain italic Cyrillic letters in Serbia distinct from the Russian [= de facto general] style.)

If the usage of the sign develops in a way that warrants a disunification, we can do so later. No need to hurry. The Armenian dram sign was first printed on a banknote in 2003 (in the security strip of the 10.000 dram banknote). It has been used consistently in newer coinage and banknotes, appearing on the 1.000 dram in 2011. It was part of an Armenian national standard. Yet Unicode encoded it only in 2011 in v6.1. That's 8 years. There was obviously no need to hurry to encode a new-born currency sign. Neither is there a need to hurry here. We can wait and see whether there is a need or a real basis for disunification.

Szabolcs
Re: Unicode 6.2 to Support the Turkish Lira Sign
Michael wrote:

which happens to be the one preferred for future Turkish banknotes and coins, you open up Pandora's box by forcing a need for distinction, where there is, as per the status quo, none. You have been warned :-)

There is nothing new here.

2003-02-24 ₲ ₳ http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2579.pdf
2003-10-01 ؋ http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2640.pdf
2004-04-23 ₴ ₵ http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2743.pdf
2008-03-06 ₷ http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3390.pdf
2008-03-06 ₸ http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3392.pdf
2010-02-10 ֏ ftp://std.dkuug.dk/jtc1/sc2/wg2/docs/n3771.pdf (KP)
2010-07-19 ₹ http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3862.pdf
2012-04-17 ₺ http://std.dkuug.dk/jtc1/sc2/wg2/docs/n4258.pdf

Of course there is. These were signs unidentifiable with existing currency symbols. The new rupee sign is, of course, not identical with the [Rp] sign, which is quite distinct, only the semantics being identical (like € being semantically identical to the four-codepoint string Euro). Also, all of these (and also the Euro) have a clear description in terms of constituent letters.

$: an S with a single or a double vertical stroke
¢: a c with a vertical or slanted stroke
£: a fancy L with a horizontal stroke
₤: a fancy L with two horizontal strokes
₪: a SHIN and a HET ligated ( sheqel ḥadash 'new shekel')
€: a C with double strokes ( E)
₲: a G with a vertical stroke
₳: an A with a double crossbar
₴: a DZELO with a double crossbar ( italic minuscule GHE)
₵: a C with a vertical stroke
₷: an S with an m ligated
₸: a T with a second top bar
֏: an Armenian CAPITAL DA with a double crossbar (instead of the simple right twig)
₹: a stemless R crossed ( RA crossed)

Quite tellingly, in the case of the EURO it is *not* the very geometric official glyph of the Euro that is encoded; it was not even chosen as the representative glyph. So what is the proposed TURKISH LIRA SIGN, if not a fancy L with two horizontal strokes?

Szabolcs
Re: Unicode 6.2 to Support the Turkish Lira Sign
Philippe,

In fact I do expect that the real-world representation of the new sign (outside banknotes and preprinted check forms) will be more similar to a mirrored capital J; the two strokes will be there but their slanting will vary a lot.

so if your assumptions do turn out to be true, then it really will be an ARMENIAN DRAM rotated by 180°... ;-)
/Sz
Re: Unicode 6.2 to Support the Turkish Lira Sign
On Wed, May 23, 2012 at 2:31 PM, Philippe Verdy verd...@wanadoo.fr wrote:

so if your assumptions do turn out to be true, then it really will be an ARMENIAN DRAM rotated by 180°... ;-)

A 180-degree rotation is significant enough that there's no risk of confusion. Otherwise we would always confuse A and V, 6 and 9, L and 7, C and Ɔ, p and d, d and q, and so on.

... unless you are dyslexic ...

Come on, note the ;-), I was not suggesting that this was a problem.
/Sz
Re: Unicode 6.2 to Support the Turkish Lira Sign
I always wondered about the strange Drachma glyph in the standard: a Latin script D connected to a Greek rho.

What you identify as a Latin script D is probably also a Greek script D. Cf. also the Cyrillic script D, which coincides with the Latin, even though the roman (and even printed cursive!) letters diverge considerably. Having a Δρ in script style does not seem strange or absurd.

Szabolcs
Re: Unicode, SMS and year 2012
While the authors of HTML5 gave good reasons for ignoring SCSU or BOCU-1, excluding UTF-32, which is the most direct, one-to-one mapping of Unicode code points to fixed-width (four-byte) values, seems shortsighted. We are talking about the whole of Unicode, not just the BMP.
/Sz

On Sat, Apr 28, 2012 at 21:48, Doug Ewell d...@ewellic.org wrote:

anbu at peoplestring dot com wrote:

What are some of the reasons a new encoding will face challenges?

The main challenge to a new encoding is that UTF-8 is already present in numerous applications and operating systems, and that any encoding intended to serve as an alternative to, let alone a replacement for, UTF-8 must be better enough to justify re-engineering these systems.

Some people are simply opposed to additional encoding schemes. The HTML5 specification explicitly forbids the use of UTF-32, SCSU, and BOCU-1 (while allowing many non-Unicode legacy encodings and quietly mapping others to Windows encodings); one committee member was quoted as saying that other encodings of Unicode waste developer time.

Any encoding that does not align code point boundaries along byte boundaries will be criticized for requiring excessive processing. The argument that I made will be made by others: that if it is necessary to process bit by bit, one might as well use a general-purpose compression algorithm. It is popular to present gzip as the ideal compression approach, since it is widely available, especially on Linux-type systems, and publicly documented (and not IP-encumbered).

I may have missed some other objections.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
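To make the "one-to-one mapping" point concrete, here is a minimal Python sketch that counts how many code units a single code point needs in each encoding form. The helper name `code_units` and the sample characters are my own choices for illustration, not something from the thread.

```python
def code_units(ch: str) -> dict:
    """Count the code units one code point needs in each Unicode encoding form."""
    return {
        "utf-8": len(ch.encode("utf-8")),            # 8-bit code units
        "utf-16": len(ch.encode("utf-16-be")) // 2,  # 16-bit code units
        "utf-32": len(ch.encode("utf-32-be")) // 4,  # 32-bit code units
    }


# An ASCII letter, a BMP currency sign (U+20A4), and a supplementary-plane
# character (U+1D49C); all three are illustrative picks.
for ch in ["A", "\u20a4", "\U0001D49C"]:
    print(f"U+{ord(ch):04X}", code_units(ch))
```

Only the UTF-32 column stays at one unit for every code point, including those outside the BMP; the other two forms are variable-width.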
Re: Support for non-BMP characters
Shouldn't it be technically possible to store Supplementary Plane characters in UTF-16 / UCS-2 as well? Isn't that what surrogate pairs are for?
Sz

On Wed, Apr 25, 2012 at 11:09, Marc Durdin marc.dur...@tavultesoft.com wrote:

Probably the most egregious example I know of is JavaScript. As far as I know, JavaScript still only groks UCS-2. I'd love to be wrong.
Marc

-Original Message-
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf Of David Starner
Sent: Wednesday, 25 April 2012 6:32 PM
To: Unicode Mailing List
Subject: Support for non-BMP characters

It's been ten years since the first non-BMP characters were encoded. How are they working in your neck of the woods?

There are a lot of places where they're working just fine, but I was facing MySQL's support. It has had support for UCS-2 and UTF-8 limited to the BMP for a long time; now in MySQL 5.5 there's utf16, utf32 and utf8mb4. (MySQL 5.1 and 5.5 are the current stable releases.) But there are enough warnings about incompatibilities with utf8mb4 to make me pause before switching my private database to it, and I think the net will see MySQL databases with utf8 instead of utf8mb4 as long as MySQL exists, unless they decide to push people over to it.

(Ada's an issue too, though not one most people will have to deal with. While Ada 2005 added a UTF-32 string type, it left the UCS-2 string type as is. Again, I suspect a lot of nominally Unicode Ada programs are going to be BMP-only. Of course, UTF-8 as an ASCII superset is used, stuffed into strings labeled Latin-1; it's technically not conformant with the Ada standard but it works so long as you don't need much string processing.)

In any case, is the use of non-BMP characters still problematic in your corner of the computing world or is everything looking fine from where you are?

--
Kie ekzistas vivo, ekzistas espero.
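As a rough illustration of the surrogate-pair question above: UTF-16 represents a supplementary code point as two 16-bit units, whereas UCS-2 proper is a fixed-width, BMP-only form. The sketch below shows the arithmetic; the function names are hypothetical and the sample code point U+1D49C is just an example.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points need a surrogate pair")
    v = cp - 0x10000                  # 20-bit offset above the BMP
    high = 0xD800 + (v >> 10)         # top 10 bits select the high surrogate
    low = 0xDC00 + (v & 0x3FF)        # bottom 10 bits select the low surrogate
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1D49C)
print(f"U+1D49C -> {high:04X} {low:04X}")
```

Software that treats such 16-bit storage as plain UCS-2 can still hold both units, but it will see them as two separate "characters" rather than one.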
Re: Support for non-BMP characters
I'm really not a technical expert, but what you write rather sounds to me as if JavaScript's UCS-2 implementation were broken... Thanks for the linked document.
Sz

On Wed, Apr 25, 2012 at 11:41, Marc Durdin marc.dur...@tavultesoft.com wrote:

Yes, but this means that regexes with SMP characters don't work (e.g. [𝒜-𝒵]), character counts return code units, etc. So you have to reimplement string.length, string.charCodeAt, etc., if you don't want to deal with surrogate pairs (I reckon you've got better things to be spending your time on).

http://dheeb.files.wordpress.com/2011/07/gbu.pdf

"Unicode Support Shootout - The Good, the Bad, & the (mostly) Ugly" by Tom Christiansen has a great summary of some of the issues with relying on JavaScript's internal string manipulation (unfortunately I can't find a better working link at present; the official training.perl.com site seems to be down). Actually, that presentation is a fantastic place to start for understanding many of the limitations of various programming languages' support for Unicode. If you haven't read it, I'd urge you to go read it now.

Marc

From: Szelp, A. Sz. [mailto:a.sz.sz...@gmail.com]
Sent: Wednesday, 25 April 2012 7:28 PM
To: Marc Durdin
Cc: David Starner; Unicode Mailing List
Subject: Re: Support for non-BMP characters

Shouldn't it be technically possible to store Supplementary Plane characters in UTF-16 / UCS-2 as well? Isn't that what Surrogate Pairs are for?

Sz
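The "character counts return code units" complaint can be reproduced outside JavaScript. The following Python sketch emulates a string API that indexes 16-bit code units; the helper `utf16_units` is an illustrative name of my own, and the sample string is made up.

```python
def utf16_units(s: str) -> list[int]:
    """Return the 16-bit code units a UTF-16 based string API would index."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


text = "a\U0001D49Cb"           # 'a', one supplementary-plane character, 'b'
units = utf16_units(text)

print(len(text))                # code points, which is how Python 3 counts
print(len(units))               # 16-bit code units, which is how a UTF-16 API counts
print([f"{u:04X}" for u in units])
```

The second count is one higher than the first because the middle character occupies two units, and indexing at position 1 yields a lone high surrogate rather than the character itself.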
Re: Code2000 on SourceForge (was Re: [indic] Re: Lack of Complex script rendering support on Android)
James,

you might want to review (at least) the OFL: http://en.wikipedia.org/wiki/SIL_Open_Font_License, a license created specifically for fonts, with freedoms in mind. In several respects it fits fonts much better than GPLv3.
/Sz

On Fri, Feb 3, 2012 at 18:12, James Kass jamesk...@att.net wrote:

I would rather stick with GPLv3, simply because a more permissive license threatens freedom. For example, someone may take over my fonts, develop them further, and subsequently change their license to something commercial-only. That is what I want to avoid. Just like the stories known from MACOS X: initially a derivative of Berkeley-licensed software, finally a commercialized product.

James Kass
Re: Code2000 on SourceForge (was Re: [indic] Re: Lack of Complex script rendering support on Android)
Sorry, I was reading my mail threads in order of time/date. I see now that the same has been proposed on the other thread. I also see that you prefer not to act due to private commitments and time constraints. Sorry, again, for bringing this up unnecessarily. All the best for your struggle, and keep it simple!
/Szabolcs

On Sat, Feb 4, 2012 at 10:49, Szelp, A. Sz. a.sz.sz...@gmail.com wrote:

James, you might want to review (at least) the OFL: http://en.wikipedia.org/wiki/SIL_Open_Font_License, a license specifically created for fonts, created with freedoms in mind. In several respects it fits fonts much better than GPLv3.
/Sz
Re: Sorting and Volapük
Indeed, I can confirm that behaviour for ö and ü. However, Hungarian does not have ä, which is part of Volapük. (And where it nevertheless occurs, e.g. in name lists containing foreign names, or Hungarian names of foreign (German) origin, ä is sorted as a.) So Hungarian is not a perfect fit as a substitute locale for Volapük either.
/Szabolcs

On Sun, Jan 1, 2012 at 19:48, Jean-François Colson j...@colson.eu wrote:

On 01/01/12 16:27, Michael Everson wrote:

IIRC Hungarian does that for ö and ü: they're separate letters sorted after o and u respectively. But OTOH á, é, í, ó, ő, ú and ű are sorted as a, e, i, o, ö, u and ü respectively.
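The ordering described in this exchange can be sketched as a simple sort key in Python. This is only an approximation under loud assumptions: it covers just the rules mentioned above (ö and ü as separate letters after o and u; á, é, í, ó, ú folded to their base letters; ő and ű folded to ö and ü; ä folded to a), and it ignores Hungarian digraphs and any tie-breaking between a letter and its accented form. The names `ALPHABET`, `FOLD` and `sort_key`, and the sample words, are made up for illustration.

```python
import unicodedata

# Primary alphabet: ö and ü are separate letters placed right after o and u.
ALPHABET = "abcdefghijklmnoöpqrstuüvwxyz"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

# Letters that sort like another letter, per the rules quoted in the thread.
FOLD = {"á": "a", "ä": "a", "é": "e", "í": "i",
        "ó": "o", "ő": "ö", "ú": "u", "ű": "ü"}


def sort_key(word: str) -> list[int]:
    """Build a primary-level sort key; unknown characters sort after the alphabet."""
    key = []
    for ch in unicodedata.normalize("NFC", word).lower():
        ch = FOLD.get(ch, ch)
        key.append(RANK.get(ch, len(ALPHABET) + ord(ch)))
    return key


words = ["öl", "oz", "ól", "ok", "ül", "uz", "ár", "az"]
print(sorted(words, key=sort_key))
print(sorted(words))  # plain code point order, for comparison
```

A Volapük ordering would need a different table, since there ä is a letter of the alphabet rather than a variant of a, which is the mismatch pointed out above.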
Re: Archaic Pashto letter
- Is the present hamza convention a development of the two vertical dots proposal, or are they unrelated?

About a year ago I worked with several Afghan expatriates living in Southern California, and in handwriting they would typically join two diacritical dots as a squiggle rather than a line (which is more common in Arabic). One could see how two vertical dots might develop into a vertical squiggle and later into a hamza, especially given the note by Vladimir Ivanov cited below. But this is only a conjecture at this point.

This sounds quite plausible; in any case it seems more plausible than an original hamza. In that case U+0682 would actually be a glyph variant of U+0681. Anyway, I'm quite interested in the outcome of your and others' investigation into the matter.

Szabolcs