Re: Are Latin and Cyrillic essentially the same script?
On 22 Nov 2010, at 18:55, Asmus Freytag wrote:

> That seems to be true for IPA as well - because already, if you use the font binding for IPA, your a's and g's will not come out right, which means you don't even have to worry about betas and chis.

Not so. There is already a convention (going back to the late 19th or early 20th century) about handling this. In an ordinary Times-like font, a slopes and loses its hat when italicized. In an ordinary Times-like font, ɑ is replaced by an italic Greek α (alpha).

Michael Everson * http://www.evertype.com/
Re: Are Latin and Cyrillic essentially the same script?
On 19 Nov 2010, at 07:15, Peter Constable wrote:

> And while IPA is primarily based on Latin script, not all of its characters are Latin characters: bilabial and interdental fricative phonemes are represented using Greek letters beta and theta.

IPA beta and chi behave very differently from their Greek antecedents and should not remain unified. The case for theta is messier because theta is so very messy.

Michael Everson * http://www.evertype.com/
Re: Are Latin and Cyrillic essentially the same script?
On 19 Nov 2010, at 17:09, Peter Constable wrote:

> And historic texts aren’t as likely or unlikely to require specialized fonts?

Twenty years of historic text in Tatar isn't irrelevant.

>> It's also a notational system that requires specific training in its use,

> And working with historic texts doesn’t require specific training?

Not in terms of Jaŋalif. The training you need there is just to learn to read the language in another alphabet. IPA is more complex than that, especially if you go for close transcription.

>> While several orthographies have been based on IPA, my understanding is that some of them saw the encoding of additional characters to make them work as orthographies.

> Again, I don’t see how that impacts this particular case.

This particular case is analogous to the borrowing of Q and W into Cyrillic from Latin. By the way, I understand that there are many people who would like to revert to the Latin orthography for these Turkic languages. At present Russian law forbids this, but it is not the case that one may expect that this orthography will always remain historic.

> It boils down to this: just as there aren’t technical or usability reasons that make it problematic to represent IPA text using two Greek characters in an otherwise-Latin system,

Yes there are. Sorting multilingual text including Greek and IPA transcriptions, for one. The glyph shape for IPA beta is practically unknown in Greek. Latin capital Chi is not the same as Greek capital chi.

> so also there are no technical or usability reasons I’m aware of why it is problematic to represent this historic Janalif orthography using two Cyrillic characters.

They are the same technical and usability reasons which led to the disunification of Cyrillic Ԛ and Ԝ from Latin Q and W.

Michael Everson * http://www.evertype.com/
Re: Are Latin and Cyrillic essentially the same script?
On 11/22/2010 4:15 AM, Michael Everson wrote:

>> It boils down to this: just as there aren’t technical or usability reasons that make it problematic to represent IPA text using two Greek characters in an otherwise-Latin system,

> Yes there are. Sorting multilingual text including Greek and IPA transcriptions, for one. The glyph shape for IPA beta is practically unknown in Greek. Latin capital Chi is not the same as Greek capital chi.

>> so also there are no technical or usability reasons I’m aware of why it is problematic to represent this historic Janalif orthography using two Cyrillic characters.

> They are the same technical and usability reasons which led to the disunification of Cyrillic Ԛ and Ԝ from Latin Q and W.

The sorting problem I think I understand. Because scripts are kept together in sorting, when you have a mixed-script list, you normally override just the sorting for the script to which the (sort-)language belongs. A mixed French-Russian list would use French ordering for the Latin characters, but the Russian words would all appear together (and be sorted according to some generic sort order for Cyrillic characters, except that for a bilingual list, sorting the Cyrillic according to Russian rules might also make sense). Same for a French-Greek list: the Greek characters will be together and sorted either by a generic Greek (script) sort, or a specific Greek (language) sort.

When you sort a mixed list of IPA and Greek, the beta and chi will now sort with the Latin characters, in whatever sort order applies for IPA. That means the order of all Greek words in the list will get messed up. It will neither be a generic Greek (script) sort, nor a specific Greek (language) sort, because you can't tailor the same characters two different ways in the same sort. That's the problem I understand is behind the issue with the Kurdish Q and W, and with the character pair proposed for disunification for Janalif.
Perhaps, it seems, there are some technical problems that would make the support for such mixed-script orthographies not as seamless as for regular orthographies after all. In that case, a decision would boil down to whether these technical issues are significant enough (given the usage). In other words, it becomes a cost-benefit analysis. Duplication of characters (except where their glyphs have acquired a different appearance in the other context) always has a cost in added confusability. Users can select the wrong character accidentally; spoofers can do so intentionally to try to cause harm. But Unicode was never just a list of distinct glyphs, so duplication between Latin and Greek, or Latin and Cyrillic, is already widespread, especially among the capitals. Unlike what Michael claims for IPA, the Janalif characters don't seem to have a very different appearance, so there would not be any technical or usability issue there. Minor glyph variations can be handled by standard technologies, like OpenType, as long as the overall appearance remains legible should language binding of a text have gotten lost. That seems to be true for IPA as well - because already, if you use the font binding for IPA, your a's and g's will not come out right, which means you don't even have to worry about betas and chis. IPA being a notation, I would not be surprised to learn that mixed lists with both IPA and other terms are a rare thing. But for Janalif it would seem that mixed Janalif/Cyrillic lists would be rather common, relative to the size of the corpus, even if it's a dead (or currently out of use) orthography. I'd like to see this addressed a bit more in detail by those who support the decision to keep the borrowed characters unified. A./
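The sorting problem described in this message can be illustrated with a toy sketch. It is not the Unicode Collation Algorithm: `make_key`, the two alphabets, and the word list are all invented for illustration. What it tries to show is that a tailoring carries one weight per code point, so placing Greek β (U+03B2) next to Latin b for IPA purposes also moves it in every Greek word.

```python
def make_key(alphabet):
    """Build a sort key from an ordered alphabet string (toy collation)."""
    weights = {ch: i for i, ch in enumerate(alphabet)}

    def key(word):
        # Characters outside the alphabet sort after everything listed.
        return [weights.get(ch, len(weights) + ord(ch)) for ch in word]

    return key


LATIN = "abcdefghijklmnopqrstuvwxyz"

# Generic order: all Latin letters, then the Greek letters used below.
generic = make_key(LATIN + "αβγδ")

# Hypothetical IPA tailoring: β (U+03B2) is placed right after Latin b.
ipa_tailored = make_key("abβ" + LATIN[2:] + "αγδ")

# "aβ" stands in for an IPA transcription; "αβ", "βα", "γα" for Greek words.
words = ["γα", "βα", "αβ", "ab", "aβ", "ac"]

generic_order = sorted(words, key=generic)
tailored_order = sorted(words, key=ipa_tailored)

print(generic_order)   # Greek words stay in α, β, γ order
print(tailored_order)  # "βα" now precedes "αβ": Greek order is disturbed
```

With a separately encoded Latin beta, the tailoring could move that character alone and leave the Greek weights untouched, which is the argument made for disunification.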
Re: Are Latin and Cyrillic essentially the same script?
On 11/18/2010 11:15 PM, Peter Constable wrote:

> If you'd like a precedent, here's one:

Yes, I think discussion of precedents is important - it leads to the formulation of encoding principles that can then (hopefully) result in more consistency in future encoding efforts. Let me add the caveat that I fully understand that character encoding doesn't work by applying cook-book style recipes, and that principles are better phrased as criteria for weighing a decision rather than as formulaic rules. With these caveats, then:

> IPA is a widely-used system of transcription based primarily on the Latin script. In comparison to the Janalif orthography in question, there is far more existing data. Also, whereas that Janalif orthography is no longer in active use--hence there are not new texts to be represented (there are at best only new citations of existing texts), IPA is a writing system in active use, with new texts being created daily; thus, the body of digitized data for IPA is growing much faster than the data in the Janalif orthography. And while IPA is primarily based on Latin script, not all of its characters are Latin characters: bilabial and interdental fricative phonemes are represented using Greek letters beta and theta.

IPA has other characteristics in both its usage and its encoding that you need to consider to make the comparison valid. First, IPA requires specialized fonts because it relies on glyphic distinctions that fonts not designed for IPA use will not guarantee. (Latin a with and without hook, g with hook vs. two stories are just two examples). It's also a notational system that requires specific training in its use, and it is caseless - in distinction to ordinary Latin script. While several orthographies have been based on IPA, my understanding is that some of them saw the encoding of additional characters to make them work as orthographies.
Finally, IPA, like other phonetic notations, uses distinctions between letter forms on the character level that would almost always be relegated to styling in ordinary text. Because of these special aspects of IPA, I would class it in its own category of writing systems which makes it less useful as a precedent against which to evaluate general Latin-based orthographies.

> Given a precedent of a widely-used Latin writing system for which it is considered adequate to have characters of central importance represented using letters from a different script, Greek, it would seem reasonable if someone made the case that it's adequate to represent an historic Latin orthography using Cyrillic soft sign.

I think the question can and should be asked, what is adequate for a historic orthography. (I don't know anything about the particulars of Janalif, beyond what I read here, so for now, I accept your categorization of it as if it were fact). The precedent for historic orthographies is a bit uneven in Unicode. Some scripts have extensive collections of characters (even duplicates or near duplicates) to cover historic usage. Other historic orthographies cannot be fully represented without markup. And some are now better supported than at the beginning because the encoding has plugged certain gaps. A helpful precedent in this case would be that of another minority or historic orthography, or historic minority orthography for which the use of Greek or Cyrillic characters with Latin was deemed acceptable. I don't think Janalif is totally unique (although the others may not be dead). I'm thinking of the Latin OU that was encoded based on a Greek ligature, and the perennial question of the Kurdish Q and W (Latin borrowings into Cyrillic - I believe these are now 051A and 051C). Again, these may be for living orthographies.
/Against this backdrop, it would help if WG2 (and UTC) could point to agreed upon criteria that spell out what circumstances should favor, and what circumstances should disfavor, formal encoding of borrowed characters, in the LGC script family or in the general case./ That's the main point I'm trying to make here. I think it is not enough to somehow arrive at a decision for one orthography, but it is necessary for the encoding committees to grab hold of the reasoning behind that decision and work out how to apply consistent reasoning like that in future cases. This may still feel a little bit unsatisfactory for those whose proposal is thus becoming the test-case to settle a body of encoding principles, but to that I say, there's been ample precedent for doing it that way in Unicode and 10646. So let me ask these questions: A. What are the encoding principles that follow from the disposition of the Janalif proposal? B. What precedents are these based on resp. what precedents are consciously established by this decision? A./
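The code points recalled from memory in this message (051A, 051C, and the Latin OU) can be checked against the character database bundled with Python. This is only a minimal sketch, assuming a reasonably recent Python 3 whose `unicodedata` tables include these characters; the variable names are ours, not anything from the thread.

```python
import unicodedata

# Cyrillic letters borrowed from Latin Q and W (the Kurdish case).
for cp in (0x051A, 0x051B, 0x051C, 0x051D):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")

# The Latin letter OU, whose shape derives from a Greek ligature.
ou_name = unicodedata.name("\u0222")
print(f"U+0222 {ou_name}")

# The borrowed letters carry their own case pairs, separate from Latin Q/W.
qa_pairs = "\u051b".upper() == "\u051a" and "\u051d".upper() == "\u051c"

# They are distinct characters from the Latin originals.
distinct = "\u051a" != "Q" and "\u051c" != "W"
print(qa_pairs, distinct)
```

The names carry the script assignment: the borrowed Q and W are named as Cyrillic letters, while OU is named as a Latin letter despite its Greek origin.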
RE: Are Latin and Cyrillic essentially the same script?
From: Asmus Freytag [mailto:asm...@ix.netcom.com]

> IPA has other characteristics in both its usage and its encoding that you need to consider to make the comparison valid. First, IPA requires specialized fonts because it relies on glyphic distinctions that fonts not designed for IPA use will not guarantee.

And historic texts aren’t as likely or unlikely to require specialized fonts?

> It's also a notational system that requires specific training in its use,

And working with historic texts doesn’t require specific training?

> and it is caseless - in distinction to ordinary Latin script.

I could understand how that might be relevant if we were discussing a character borrowed from another script but with different casing behaviour in the original script. (E.g., the character is caseless in the original script, or it is cased but only the lowercase was borrowed and a novel uppercase character was created in the receptor script. This was a valid consideration in the encoding of Lisu, for instance.) I don’t really see how that impacts the discussion in this particular case.

> While several orthographies have been based on IPA, my understanding is that some of them saw the encoding of additional characters to make them work as orthographies.

Again, I don’t see how that impacts this particular case.

> Finally, IPA, like other phonetic notations, uses distinctions between letter forms on the character level that would almost always be relegated to styling in ordinary text.

And again, I don’t see how this impacts the particular case under discussion.

> Because of these special aspects of IPA, I would class it in its own category of writing systems which makes it less useful as a precedent against which to evaluate general Latin-based orthographies.

Perhaps in general it cannot serve as a precedent for all things. But as noted, I think several of the things you noted have no particular bearing in this case.
For the specific issue of borrowing a character from another script in a historic orthography, I think it’s a perfectly valid precedent. It boils down to this: just as there aren’t technical or usability reasons that make it problematic to represent IPA text using two Greek characters in an otherwise-Latin system, so also there are no technical or usability reasons I’m aware of why it is problematic to represent this historic Janalif orthography using two Cyrillic characters. Btw, I suspect that calling these Latin characters is completely revisionist: if we could ask anyone that taught or used this orthography in 1930 about these characters, I suspect they would say that they are Cyrillic characters.

> I think the question can and should be asked, what is adequate for a historic orthography.

Clearly you’re trying to have a discussion about general principles, not about the specific characters. At the moment, I’m prepared to discuss general principles to the extent that they impinge on the particular case at hand. Others may wish to engage in a broader discussion of general principles (though, hopefully under a different subject).

> Against this backdrop, it would help if WG2 (and UTC) could point to agreed upon criteria that spell out what circumstances should favor, and what circumstances should disfavor, formal encoding of borrowed characters, in the LGC script family or in the general case. That's the main point I'm trying to make here. I think it is not enough to somehow arrive at a decision for one orthography, but it is necessary for the encoding committees to grab hold of the reasoning behind that decision and work out how to apply consistent reasoning like that in future cases.

These are not unreasonable requests. I don’t see any inconsistency in practice as it relates to this particular case, however.

> So let me ask these questions: A. What are the encoding principles that follow from the disposition of the Janalif proposal?
I think one principle is that we do not always have to maintain a principle of orthographic script purity. In particular, in the case of historic orthographies no longer in active use that borrowed characters from another script in the LGC family, if there are no technical or usability reasons that make it problematic to represent those text elements using existing characters from the source script, then it is not necessary to encode equivalents in the receptor script so that we can say that the historic orthography is a pure-Latin / pure-Greek / pure-Cyrillic orthography (which, in terms of social history rather than character encoding, would likely be a revisionist perspective).

> B. What precedents are these based on resp. what precedents are consciously established by this decision?

I'm not sure I fully understand the question so won't venture a comment.

Peter
RE: Are Latin and Cyrillic essentially the same script?
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf Of André Szabolcs Szelp

> AFAIR the reservations of WG2 concerning the encoding of Jangalif Latin Ь/ь as a new character were not in view of Cyrillic Ь/ь, but rather in view of its potential identity with the tone sign mentioned by you as well. It is a Latin letter adapted from the Cyrillic soft sign,

There's another possible point of view: that it's a Cyrillic character that, for a short period, people tried using as a Latin character but that never stuck, and that it's completely adequate to represent Janalif text in that orthography using the Cyrillic soft sign.

Peter
Re: Are Latin and Cyrillic essentially the same script?
On 11/18/2010 8:04 AM, Peter Constable wrote:

> From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf Of André Szabolcs Szelp

>> AFAIR the reservations of WG2 concerning the encoding of Jangalif Latin Ь/ь as a new character were not in view of Cyrillic Ь/ь, but rather in view of its potential identity with the tone sign mentioned by you as well. It is a Latin letter adapted from the Cyrillic soft sign,

> There's another possible point of view: that it's a Cyrillic character that, for a short period, people tried using as a Latin character but that never stuck, and that it's completely adequate to represent Janalif text in that orthography using the Cyrillic soft sign.

When one language borrows a word from another, there are several stages of foreignness, ranging from treating the foreign word as a short quotation in the original language to treating it as essentially fully native. Now words are very complex in behavior and usage compared to characters. You can check for pronunciation, spelling and adaptation to the host grammar to check which stage of adaptation a word has reached. When a script borrows a letter from another, you are essentially limited in what evidence you can use to document objectively whether the borrowing has crossed over the script boundary and the character has become native. With typographically closely related scripts, getting tell-tale typographical evidence is very difficult. After all, these scripts started out from the same root. So, you need some other criteria. You could individually compare orthographies and decide which ones are important enough (or established enough) to warrant support. Or you could try to distinguish between orthographies for general use within the given language, vs. other systems of writing (transcriptions, say). But whatever you do, you should be consistent and take account of existing precedent.
There are a number of characters encoded as nominally Latin in Unicode that are borrowings from other scripts, usually Greek. A discussion of the current issue should include explicit explanation of why these precedents apply or do not apply, and, in the latter case, why some precedents may be regarded as examples of past mistakes. By explicitly analyzing existing precedents, it should be possible to avoid the impression that the current discussion is focused on the relative merits of a particular orthography based on personal and possibly arbitrary opinions by the work group experts. If it can be shown that all other cases where such borrowings were accepted into Unicode are based on orthographies that are more permanent, more widespread or both, or where other technical or typographical reasons prevailed that are absent here, then it would make any decision on the current request seem a lot less arbitrary. I don't know where the right answer lies in the case of Janalif, or which point of view, in Peter's phrasing, would make the most sense, but having this discussion without clear understanding of the precedents will lead to inconsistent encoding. A./
pupil's comment: Are Latin and Cyrillic essentially the same script?
Dear all, I still see myself as a pupil reading the introductory Unicode charts, but I am happy to join the discussion on Russian: it is quite different from Latin. Apart from the 33 characters in the Russian alphabet (more characters than in Latin), and apart from quite a few characters that an English speaker clearly does not know, Latin and Russian do contain some similar characters. But watch out. There are, if I am correct, 3 a's in the world; in this email a (Latin) looks like a (Russian), but they are different. So the Russian a is quite suited for a homoglyph attack. (I will try ontslag.com, Dutch for dismissal.com, with a Russian a, to see how search engines react. The Punycode form of the whole word is different.) A similar example: the Ukrainian i looks like ours, but you can't register it under .rf (Russian Federation). An experiment a year ago with Reïntegratie.com (correct Dutch for reintegration, but impossible as a .nl domain name because SIDN.nl, supposed to be nic.nl, is very conservative and does not even allow such signs) gave this result: in the beginning Google appreciated it; after a few months the hosted and filled site 'sank'. (I borrowed the ï from Catalan, amidst Latin characters.)
News about ss / sz for whoever is interested: most Germans were alert (ss-holders had priority for ß), so no Fußball for me, only the experimental domain names IDNexpress.de and IDNexpreß.de. It was a mini-landrush from Nov. 16 2010, 10:00 German time onwards (Denic.de). Very busy with the .rf auction now; in December I will put 2 different sites on these ss and sz names so people can wonder at their screens to see what is happening. The above reaction comes more from domain names and practical experience than from chartUTFxyz - but definitely: different script. Br, Philippe
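The look-alike a's mentioned in this message can be made visible with Python's built-in `idna` codec. This is a minimal sketch under stated assumptions: the codec implements the older IDNA 2003 rules rather than any particular registry's policy, and the spoofed label is constructed here purely for illustration.

```python
import unicodedata

latin_label = "ontslag"        # all Latin letters
spoof_label = "ontsl\u0430g"   # Cyrillic а (U+0430) replaces the Latin a

# The two a's have different code points and names.
for ch in ("a", "\u0430"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The labels look alike on screen but are different strings.
looks_alike_but_differs = latin_label != spoof_label

# ASCII labels pass through unchanged; the mixed label gets an xn-- form.
latin_ace = latin_label.encode("idna")
spoof_ace = spoof_label.encode("idna")
print(latin_ace, spoof_ace)
```

The differing ASCII forms are what a registry or resolver actually compares, which is why the two names can be registered separately even though they render alike.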
RE: Are Latin and Cyrillic essentially the same script?
If you'd like a precedent, here's one: IPA is a widely-used system of transcription based primarily on the Latin script. In comparison to the Janalif orthography in question, there is far more existing data. Also, whereas that Janalif orthography is no longer in active use--hence there are not new texts to be represented (there are at best only new citations of existing texts), IPA is a writing system in active use, with new texts being created daily; thus, the body of digitized data for IPA is growing much faster than the data in the Janalif orthography. And while IPA is primarily based on Latin script, not all of its characters are Latin characters: bilabial and interdental fricative phonemes are represented using Greek letters beta and theta. Given a precedent of a widely-used Latin writing system for which it is considered adequate to have characters of central importance represented using letters from a different script, Greek, it would seem reasonable if someone made the case that it's adequate to represent an historic Latin orthography using Cyrillic soft sign. Peter -Original Message- From: Asmus Freytag [mailto:asm...@ix.netcom.com] Sent: Thursday, November 18, 2010 11:05 AM To: Peter Constable Cc: André Szabolcs Szelp; Karl Pentzlin; unicode@unicode.org; Ilya Yevlampiev Subject: Re: Are Latin and Cyrillic essentially the same script? On 11/18/2010 8:04 AM, Peter Constable wrote: From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf Of André Szabolcs Szelp AFAIR the reservations of WG2 concerning the encoding of Jangalif Latin Ь/ь as a new character were not in view of Cyrillic Ь/ь, but rather in view of its potential identity with the tone sign mentioned by you as well.
Re: Are Latin and Cyrillic essentially the same script?
AFAIR the reservations of WG2 concerning the encoding of Jangalif Latin Ь/ь as a new character were not in view of Cyrillic Ь/ь, but rather in view of its potential identity with the tone sign mentioned by you as well. It is a Latin letter adapted from the Cyrillic soft sign, like the Jangalif character. Function, as you point out, is not a distinctive feature. The different serif style which you pointed out cannot be seen as a discriminating feature of character identity, especially not in a time of bad typography (and actually lack of Latin typographic tradition in China of the time). /Sz On Wed, Nov 10, 2010 at 5:08 PM, Karl Pentzlin karl-pentz...@acssoft.de wrote: As shown in N3916: http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3916.pdf = L2/10-356, there exists a Latin letter which resembles the Cyrillic soft sign Ь/ь (U+042C/U+044C). This letter is part of the Jaꞑalif variant of the alphabet, which was used for several languages in the former Soviet Union (e.g. Tatar), and was developed in parallel to the alphabet nowadays in use for Turkish and Azerbaijani, see: http://en.wikipedia.org/wiki/Janalif . In fact, it was proposed on this basis, being the only Jaꞑalif letter missing so far, since the ꞑ (occurring in the alphabet name itself) was introduced with Unicode 6.0. The letter is not a soft sign; it is the exact Tatar equivalent of the Turkish dotless i, thus it has a use similar to that of the Cyrillic yeru Ы/ы (U+042B/U+044B). In this function, it is a part of the adaptation of the Latin alphabet for a lot of non-Russian languages in the Soviet Union in the 1920s, see e.g.: Юшманов, Н. В.: Определитель Языков. Москва/Ленинград 1941, http://fotki.yandex.ru/users/ievlampiev/view/155697?page=3 . (A proposal regarding this subject is expected for 2011.) Thus, it shares with the Cyrillic soft sign its form and partly the geographical area of its use, but in no case its meaning. Similar can be said e.g.
for P/p (U+0050/U+0070, Latin letter P) and Р/р (U+0420/U+0440, Cyrillic letter ER). According to the pre-preliminary minutes of UTC #125 (L2/10-415), the UTC has not accepted the Latin Ь/ь. It is an established practice for the European alphabetic scripts to encode a new letter only if it has a different shape (in at least one of the capital and small forms) compared with all already encoded letters of the same script. The Y/y is well known to denote completely different pronunciations, used as consonant as well as vowel, even within the same language. Thus, if somebody unearths a Latin letter E/e in some obscure minority language which has no E-like vowel, to denote an M-like sound and in fact to be collated after the M in the local alphabet, this will probably not lead to a new encoding. But, Latin and Cyrillic are different scripts (the question in the Re of this mail is rhetorical, of course). Admittedly, there also is a precedent for using Cyrillic letters in Latin text: the use of U+0417/U+0437 and U+0427/U+0447 for tone letters in Zhuang. However, the orthography using them was short-lived, being superseded by another Latin orthography which uses genuine Latin letters as tone marks (J/j and X/x, in this case). On the other hand, Jaꞑalif and the other Latin alphabets which use Ь/ь did not lose the Ь/ь by an improvement of the orthography, but were completely deprecated by a ukase of Stalin. Thus, they continue to be the Latin alphabets of the respective languages. Whether formally requesting a revival or not, they are regarded as valid by the members of the cultural group (even if only to access their cultural inheritance). Especially, it cannot be excluded that persons want to create Latin domain names or e-mail addresses without being accused of script mixing. Taking this into account, not mentioning the technical problems regarding collation etc.
and the typographical issues when it comes to subtle differences between Latin and Cyrillic in high quality typography, it is really hard to understand why the UTC refuses to encode the Latin Ь/ь. A quick glance at the Юшманов table mentioned above proves that there is absolutely no request to duplicate the whole Cyrillic alphabet in Latin, as someone may have feared. - Karl Pentzlin -- Szelp, André Szabolcs +43 (650) 79 22 400
Are Latin and Cyrillic essentially the same script?
As shown in N3916: http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3916.pdf = L2/10-356, there exists a Latin letter which resembles the Cyrillic soft sign Ь/ь (U+042C/U+044C). This letter is part of the Jaꞑalif variant of the alphabet, which was used for several languages in the former Soviet Union (e.g. Tatar), and was developed in parallel to the alphabet nowadays in use for Turkish and Azerbaijani, see: http://en.wikipedia.org/wiki/Janalif . In fact, it was proposed on this basis, being the only Jaꞑalif letter missing so far, since the ꞑ (occurring in the alphabet name itself) was introduced with Unicode 6.0.
The letter is no soft sign; it is the exact Tatar equivalent of the Turkish dotless i, and thus has a use similar to that of the Cyrillic yeru Ы/ы (U+042B/U+044B). In this function, it is part of the adaptation of the Latin alphabet for a lot of non-Russian languages in the Soviet Union in the 1920s, see e.g.: Юшманов, Н. В.: Определитель Языков. Москва/Ленинград 1941, http://fotki.yandex.ru/users/ievlampiev/view/155697?page=3 . (A proposal regarding this subject is expected for 2011.)
Thus, it shares with the Cyrillic soft sign its form and partly the geographical area of its use, but in no case its meaning. The same can be said e.g. of P/p (U+0050/U+0070, Latin letter P) and Р/р (U+0420/U+0440, Cyrillic letter ER).
According to the pre-preliminary minutes of UTC #125 (L2/10-415), the UTC has not accepted the Latin Ь/ь. It is an established practice for the European alphabetic scripts to encode a new letter only if it differs in shape (in at least one of the capital and small forms) from all already encoded letters of the same script. The Y/y is well known to denote completely different pronunciations, used as consonant as well as vocal, even within the same language. Thus, if somebody unearths a Latin letter E/e in some obscure minority language which has no E-like vocal, used to denote an M-like sound and in fact collated after the M in the local alphabet, this will probably not lead to a new encoding.
But Latin and Cyrillic are different scripts (the question in the subject of this mail is rhetorical, of course). Admittedly, there is also a precedent for using Cyrillic letters in Latin text: the use of U+0417/U+0437 and U+0427/U+0447 for tone letters in Zhuang. However, the orthography using them was short-lived, being superseded by another Latin orthography which uses genuine Latin letters as tone marks (J/j and X/x, in this case).
On the other hand, Jaꞑalif and the other Latin alphabets which use Ь/ь did not lose the Ь/ь through an improvement of the orthography, but were completely deprecated by a ukase of Stalin. Thus, they continue to be the Latin alphabets of the respective languages. Whether or not a revival is formally requested, they are regarded as valid by the members of the cultural group (even if only to access their cultural inheritance). In particular, it cannot be excluded that people will want to create Latin domain names or e-mail addresses without being accused of script mixing.
Taking this into account, not to mention the technical problems regarding collation etc. and the typographical issues arising from subtle differences between Latin and Cyrillic in high-quality typography, it is really hard to understand why the UTC refuses to encode the Latin Ь/ь. A quick glance at the Юшманов table mentioned above proves that there is absolutely no request to duplicate the whole Cyrillic alphabet in Latin, as someone may have feared.
- Karl Pentzlin
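The script-mixing concern raised above can be illustrated with a minimal Python sketch. It is not how any registry or the UTC actually decides such cases: it assumes that the first word of a character's Unicode name is a usable stand-in for its script, and the helper names script_guess and is_mixed as well as the sample strings are hypothetical, chosen only for illustration.

```python
import unicodedata


def script_guess(ch: str) -> str:
    """Rough script label: the first word of the Unicode character name.

    This is a heuristic assumed for this sketch, not the real Script property.
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def is_mixed(text: str) -> bool:
    """True if the letters in `text` carry more than one script label."""
    labels = {script_guess(c) for c in text if c.isalpha()}
    return len(labels) > 1


# Look-alike letters are separate code points with separate names.
for ch in ("P", "\u0420", "\u044C", "\u0131"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Illustrative strings only: a Latin-script string that has to borrow
# Cyrillic U+044C is flagged as mixed, while one that uses the Latin
# dotless i (U+0131) is not.
print(is_mixed("q\u044Cz\u044Cl"))
print(is_mixed("q\u0131z\u0131l"))
```

Under these assumptions, any Jaꞑalif text written with U+044C is reported as mixed-script, which is the practical situation described for domain names and e-mail addresses.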
Re: Are Latin and Cyrillic essentially the same script?
2010-11-10 10:08, I wrote: KP As shown in N3916 ... Please read "vowel" instead of "vocal" throughout the mail. Sorry.