Re: Are Latin and Cyrillic essentially the same script?

2010-11-23 Thread Michael Everson
On 22 Nov 2010, at 18:55, Asmus Freytag wrote:

 That seems to be true for IPA as well - because already, if you use the font 
 binding for IPA, your a's and g's will not come out right, which means you 
 don't even have to worry about betas and chis.


Not so. There is already a convention (going back to the late 19th or early 
20th century) about handling this. 

In an ordinary Times-like font, a slopes and loses its hat when italicized. 
In an ordinary Times-like font, ɑ is replaced by an italic Greek α (alpha). 

Michael Everson * http://www.evertype.com/





Re: Are Latin and Cyrillic essentially the same script?

2010-11-22 Thread Michael Everson
On 19 Nov 2010, at 07:15, Peter Constable wrote:

 And while IPA is primarily based on Latin script, not all of its characters 
 are Latin characters: bilabial and interdental fricative phonemes are 
 represented using Greek letters beta and theta.

IPA beta and chi behave very differently from their Greek antecedents and 
should not remain unified. The case for theta is messier because theta is so 
very messy.

Michael Everson * http://www.evertype.com/





Re: Are Latin and Cyrillic essentially the same script?

2010-11-22 Thread Michael Everson
On 19 Nov 2010, at 17:09, Peter Constable wrote:

 And historic texts aren’t as likely or unlikely to require specialized fonts?

Twenty years of historic text in Tatar isn't irrelevant. 


 It's also a notational system that requires specific training in its use, 
 
 And working with historic texts doesn’t require specific training?

Not in terms of Jaŋalif. The training you need there is just to learn to read the 
language in another alphabet. IPA is more complex than that, especially if you 
go for close transcription.

 While several orthographies have been based on IPA, my understanding is 
 that some of them saw the encoding of additional characters to make them 
 work as orthographies.
 
 Again, I don’t see how that impacts this particular case.

This particular case is analogous to the borrowing of Q and W into Cyrillic 
from Latin. 

By the way I understand that there are many people who would like to revert to 
the Latin orthography for these Turkic languages. At present Russian law 
forbids this, but it is not the case that one may expect that this orthography 
will always remain historic. 

 It boils down to this: just as there aren’t technical or usability reasons 
 that make it problematic to represent IPA text using two Greek characters in 
 an otherwise-Latin system,

Yes there are. Sorting multilingual text including Greek and IPA 
transcriptions, for one. The glyph shape for IPA beta is practically unknown in 
Greek. Latin capital Chi is not the same as Greek capital chi. 

 so also there are no technical or usability reasons I’m aware of why it is 
 problematic to represent this historic Janalif orthography using two Cyrillic 
 characters.

They are the same technical and usability reasons which led to the 
disunification of Cyrillic Ԛ and Ԝ from Latin Q and W.

Michael Everson * http://www.evertype.com/





Re: Are Latin and Cyrillic essentially the same script?

2010-11-22 Thread Asmus Freytag

On 11/22/2010 4:15 AM, Michael Everson wrote:

It boils down to this: just as there aren’t technical or usability reasons that 
make it problematic to represent IPA text using two Greek characters in an 
otherwise-Latin system,

Yes there are. Sorting multilingual text including Greek and IPA 
transcriptions, for one. The glyph shape for IPA beta is practically unknown in 
Greek. Latin capital Chi is not the same as Greek capital chi.


  so also there are no technical or usability reasons I’m aware of why it is 
problematic to represent this historic Janalif orthography using two Cyrillic 
characters.

They are the same technical and usability reasons which led to the 
disunification of Cyrillic Ԛ and Ԝ from Latin Q and W.


The sorting problem I think I understand.

Because scripts are kept together in sorting, when you have a mixed 
script list, you normally override just the sorting for the script to 
which the (sort-)language belongs. A mixed French-Russian list would use 
French ordering for the Latin characters, but the Russian words would 
all appear together (and be sorted according to some generic sort order 
for Cyrillic characters - except that for a bilingual list, sorting the 
Cyrillic according to Russian rules might also make sense.).


Same for a French-Greek list. The Greek characters will be together and 
sorted either by a generic Greek (script) sort, or a specific Greek 
(language) sort. When you sort a mixed list of IPA and Greek, the beta 
and chi will now sort with the Latin characters, in whatever sort order 
applies for IPA. That means the order of all Greek words in the list 
will get messed up. It will neither be a generic Greek (script) sort, 
nor a specific Greek (language) sort, because you can't tailor the same 
characters two different ways in the same sort.
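
The conflict described above can be illustrated with a small sketch. It is 
not the Unicode Collation Algorithm, only a toy grouping by script followed 
by code point order. The script order table, the `beta_as_latin` switch, and 
the sample words are assumptions made up for illustration; the script is 
guessed from the character name because Python's unicodedata module does not 
expose the Script property.

```python
import unicodedata

# Hypothetical script order for a mixed list: Latin block first, then Greek.
SCRIPT_ORDER = {"LATIN": 0, "GREEK": 1, "CYRILLIC": 2}

def script_of(ch):
    # Approximation: use the first word of the character name as the script.
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def sort_key(word, beta_as_latin=False):
    first = word[0]
    script = script_of(first)
    # Optional "IPA tailoring": treat U+03B2 as if it were a Latin letter.
    if beta_as_latin and first == "\u03b2":
        script = "LATIN"
    # Group by script, then fall back to plain code point order.
    return (SCRIPT_ORDER.get(script, 99), word)

words = [
    "\u03b2a",                    # stands in for an IPA transcription using U+03B2
    "zeta",
    "\u03b2\u03ae\u03c4\u03b1",   # Greek word starting with beta
    "alpha",
    "\u03b1\u03bb\u03c6\u03b1",   # Greek word starting with alpha
]

plain = sorted(words, key=sort_key)
tailored = sorted(words, key=lambda w: sort_key(w, beta_as_latin=True))
print(plain)
print(tailored)
```

Without the tailoring, the IPA item lands inside the Greek block. With it, 
the Greek word beginning with beta is pulled into the Latin block and ends up 
ahead of the Greek word beginning with alpha, so the Greek order is broken.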


That's the problem I understand is behind the issue with the Kurdish Q 
and W, and with the character pair proposed for disunification for Janalif.


Perhaps, it seems, there are some technical problems that would make the 
support for such mixed-script orthographies not as seamless as for 
regular orthographies after all.


In that case, a decision would boil down to whether these technical 
issues are significant enough (given the usage).


In other words, it becomes a cost-benefit analysis. Duplication of 
characters (except where their glyphs have acquired a different 
appearance in the other context) always has a cost in added 
confusability. Users can select the wrong character accidentally, 
spoofers can do so intentionally to try to cause harm. But Unicode was 
never just a list of distinct glyphs, so duplication between Latin and 
Greek, or Latin and Cyrillic is already widespread, especially among the 
capitals.


Unlike what Michael claims for IPA, the Janalif characters don't seem to 
have a very different appearance, so there would not be any technical or 
usability issue there. Minor glyph variations can be handled by standard 
technologies, like OpenType, as long as the overall appearance remains 
legible should language binding of a text have gotten lost.


That seems to be true for IPA as well - because already, if you use the 
font binding for IPA, your a's and g's will not come out right, which 
means you don't even have to worry about betas and chis.


IPA being a notation, I would not be surprised to learn that mixed lists 
with both IPA and other terms are a rare thing. But for Janalif it would 
seem that mixed Janalif/Cyrillic lists would be rather common, relative 
to the size of the corpus, even if it's a dead (or currently out of use) 
orthography.


I'd like to see this addressed a bit more in detail by those who support 
the decision to keep the borrowed characters unified.


A./


Re: Are Latin and Cyrillic essentially the same script?

2010-11-19 Thread Asmus Freytag

On 11/18/2010 11:15 PM, Peter Constable wrote:

If you'd like a precedent, here's one:


Yes, I think discussion of precedents is important - it leads to the 
formulation of encoding principles that can then (hopefully) result in 
more consistency in future encoding efforts.


Let me add the caveat that I fully understand that character encoding 
doesn't work by applying cook-book style recipes, and that principles 
are better phrased as criteria for weighing a decision rather than as 
formulaic rules.


With these caveats, then:

  IPA is a widely-used system of transcription based primarily on the Latin 
script. In comparison to the Janalif orthography in question, there is far more 
existing data. Also, whereas that Janalif orthography is no longer in active 
use--hence there are not new texts to be represented (there are at best only 
new citations of existing texts), IPA is as a writing system in active use with 
new texts being created daily; thus, the body of digitized data for IPA is 
growing much more than is data in the Janalif orthography. And while IPA is 
primarily based on Latin script, not all of its characters are Latin 
characters: bilabial and interdental fricative phonemes are represented using 
Greek letters beta and theta.


IPA has other characteristics in both its usage and its encoding that 
you need to consider to make the comparison valid.


First, IPA requires specialized fonts because it relies on glyphic 
distinctions that fonts not designed for IPA use will not guarantee. 
(Latin a with and without hook, g with hook vs. two stories are just two 
examples). It's also a notational system that requires specific training 
in its use, and it is caseless - in distinction to ordinary Latin script.


While several orthographies have been based on IPA, my understanding is 
that some of them saw the encoding of additional characters to make them 
work as orthographies.


Finally, IPA, like other phonetic notations, uses distinctions between 
letter forms on the character level that would almost always be 
relegated to styling in ordinary text.


Because of these special aspects of IPA, I would class it in its own 
category of writing systems which makes it less useful as a precedent 
against which to evaluate general Latin-based orthographies.



Given a precedent of a widely-used Latin writing system for which it is 
considered adequate to have characters of central importance represented using 
letters from a different script, Greek, it would seem reasonable if someone 
made the case that it's adequate to represent an historic Latin orthography 
using Cyrillic soft sign.


I think the question can and should be asked, what is adequate for a 
historic orthography. (I don't know anything about the particulars of 
Janalif, beyond what I read here, so for now, I accept your 
categorization of it as if it were fact).


The precedent for historic orthographies is a bit uneven in Unicode. 
Some scripts have extensive collections of characters (even duplicates or 
near duplicates) to cover historic usage. Other historic orthographies 
cannot be fully represented without markup. And some are now better 
supported than at the beginning because the encoding has plugged certain 
gaps.


A helpful precedent in this case would be that of another minority or 
historic orthography, or historic minority orthography for which the use 
of Greek or Cyrillic characters with Latin was deemed acceptable. I 
don't think Janalif is totally unique (although the others may not be 
dead). I'm thinking of the Latin OU that was encoded based on a Greek 
ligature, and the perennial question of the Kurdish Q and W (Latin 
borrowings into Cyrillic - I believe these are now 051A and 051C). 
Again, these may be for living orthographies.
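
The code points mentioned above can be checked against the Unicode Character 
Database. The following is only a quick lookup sketch using Python's 
standard unicodedata module; the helper name `describe` is made up for 
illustration, and the names printed depend on the Unicode version shipped 
with the Python build.

```python
import unicodedata

def describe(ch):
    # Format a character as "U+XXXX NAME" using the Unicode Character Database.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for latin, cyrillic in [("Q", "\u051a"), ("W", "\u051c")]:
    print(describe(latin), "|", describe(cyrillic))
    # The Cyrillic letters have their own lowercase partners, separate from Latin.
    print("  lowercase:", describe(cyrillic.lower()))
```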


   /Against this backdrop, it would help if WG2 (and UTC) could point
   to agreed upon criteria that spell out what circumstances should
   favor, and what circumstances should disfavor, formal encoding of
   borrowed characters, in the LGC script family or in the general case./


That's the main point I'm trying to make here. I think it is not enough 
to somehow arrive at a decision for one orthography, but it is necessary 
for the encoding committees to grab hold of the reasoning behind that 
decision and work out how to apply consistent reasoning like that in 
future cases.


This may still feel a little bit unsatisfactory for those whose proposal 
is thus becoming the test-case to settle a body of encoding principles, 
but to that I say, there's been ample precedent for doing it that way in 
Unicode and 10646.


So let me ask these questions:

   A. What are the encoding principles that follow from the disposition
   of the Janalif proposal?

   B. What precedents are these based on resp. what precedents are
   consciously established by this decision?


A./




RE: Are Latin and Cyrillic essentially the same script?

2010-11-19 Thread Peter Constable
From: Asmus Freytag [mailto:asm...@ix.netcom.com] 

 IPA has other characteristics in both its usage and its encoding that you 
 need to consider to make the comparison valid.

 First, IPA requires specialized fonts because it relies on glyphic 
 distinctions 
 that fonts not designed for IPA use will not guarantee.

And historic texts aren’t as likely or unlikely to require specialized fonts?


 It's also a notational system that requires specific training in its use, 

And working with historic texts doesn’t require specific training?

 and it  is caseless - in distinction to ordinary Latin script.

I could understand how that might be relevant if we were discussing a character 
borrowed from another script but with different casing behaviour in the 
original script. (E.g., the character is caseless in the original script, or it 
is cased but only the lowercase was borrowed and a novel uppercase character was 
created in the receptor script. This was a valid consideration in the encoding 
of Lisu, for instance.) I don’t really see how that impacts the discussion in 
this particular case. 


 While several orthographies have been based on IPA, my understanding is 
 that some of them saw the encoding of additional characters to make them 
 work as orthographies.

Again, I don’t see how that impacts this particular case.


 Finally, IPA, like other phonetic notations, uses distinctions between letter 
 forms on the character level that would almost always be relegated to styling 
 in ordinary text.

And again, I don’t see how this impacts the particular case under discussion.


 Because of these special aspects of IPA, I would class it in its own category 
 of writing systems which makes it less useful as a precedent against which to 
 evaluate general Latin-based orthographies.

Perhaps in general it cannot serve as a precedent for all things. But as noted, 
I think several of the things you noted have no particular bearing in this 
case. For the specific issue of borrowing a character from another script in a 
historic orthography, I think it’s a perfectly valid precedent. It boils down 
to this: just as there aren’t technical or usability reasons that make it 
problematic to represent IPA text using two Greek characters in an 
otherwise-Latin system, so also there are no technical or usability reasons I’m 
aware of why it is problematic to represent this historic Janalif orthography 
using two Cyrillic characters.

Btw, I suspect that calling these Latin characters is completely revisionist: 
if we could ask anyone that taught or used this orthography in 1930 about these 
characters, I suspect they would say that they are Cyrillic characters.


 I think the question can and should be asked, what is adequate for a historic 
 orthography.

Clearly you’re trying to have a discussion about general principles, not about 
the specific characters. At the moment, I’m prepared to discuss general 
principles to the extent that they impinge on the particular case at hand. 
Others may wish to engage in a broader discussion of general principles 
(though, hopefully under a different subject).

 Against this backdrop, it would help if WG2 (and UTC) could point to agreed 
 upon criteria that spell out what circumstances should favor, and what 
 circumstances should disfavor, formal encoding of borrowed characters, in the 
 LGC script family or in the general case.

 That's the main point I'm trying to make here. I think it is not enough to 
 somehow 
 arrive at a decision for one orthography, but it is necessary for the 
 encoding 
 committees to grab hold of the reasoning behind that decision and work out 
 how 
 to apply consistent reasoning like that in future cases.

These are not unreasonable requests. I don’t see any inconsistency in practice 
as it relates to this particular case, however.

 So let me ask these questions:
 A. What are the encoding principles that follow from the disposition of the 
 Janalif 
 proposal?

I think one principle is that we do not always have to maintain a principle of 
orthographic script purity. In particular, in the case of historic 
orthographies no longer in active use that borrowed characters from another 
script in the LGC family, if there are no technical or usability reasons that 
make it problematic to represent those text elements using existing characters 
from the source script, then it is not necessary to encode equivalents in the 
receptor script so that we can say that the historic orthography is a 
pure-Latin / pure-Greek / pure-Cyrillic orthography (which, in terms of social 
history rather than character encoding, would likely be a revisionist 
perspective).


 B. What precedents are these based on resp. what precedents are consciously 
 established by this decision?

I'm not sure I fully understand the question so won't venture a comment.



Peter




RE: Are Latin and Cyrillic essentially the same script?

2010-11-18 Thread Peter Constable
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of André Szabolcs Szelp

 AFAIR the reservations of WG2 concerning the encoding of Jangalif 
 Latin Ь/ь as a new character were not in view of Cyrillic Ь/ь, but 
 rather in view of its potential identity with the tone sign mentioned 
 by you as well. It is a Latin letter adapted from the Cyrillic soft sign, 

There's another possible point of view: that it's a Cyrillic character that, 
for a short period, people tried using as a Latin character but that never 
stuck, and that it's completely adequate to represent Janalif text in that 
orthography using the Cyrillic soft sign.



Peter




Re: Are Latin and Cyrillic essentially the same script?

2010-11-18 Thread Asmus Freytag

On 11/18/2010 8:04 AM, Peter Constable wrote:

From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of André Szabolcs Szelp


AFAIR the reservations of WG2 concerning the encoding of Jangalif
Latin Ь/ь as a new character were not in view of Cyrillic Ь/ь, but
rather in view of its potential identity with the tone sign mentioned
by you as well. It is a Latin letter adapted from the Cyrillic soft sign,

There's another possible point of view: that it's a Cyrillic character that, 
for a short period, people tried using as a Latin character but that never 
stuck, and that it's completely adequate to represent Janalif text in that 
orthography using the Cyrillic soft sign.




When one language borrows a word from another, there are several stages 
of foreignness, ranging from treating the foreign word as a short 
quotation in the original language to treating it as essentially fully 
native.


Now words are very complex in behavior and usage compared to characters. 
You can check for pronunciation, spelling and adaptation to the host 
grammar to check which stage of adaptation a word has reached.


When a script borrows a letter from another, you are essentially limited 
in what evidence you can use to document objectively whether the 
borrowing has crossed over the script boundary and the character has 
become native.


With typographically closely related scripts, getting tell-tale 
typographical evidence is very difficult. After all, these scripts 
started out from the same root.


So, you need some other criteria.

You could individually compare orthographies and decide which ones are 
important enough (or established enough) to warrant support. Or you 
could try to distinguish between orthographies for general use within 
the given language, vs. other systems of writing (transcriptions, say).


But whatever you do, you should be consistent and take account of 
existing precedent.


There are a number of characters encoded as nominally Latin in Unicode 
that are borrowings from other scripts, usually Greek.


A discussion of the current issue should include explicit explanation of 
why these precedents apply or do not apply, and, in the latter case, why 
some precedents may be regarded as examples of past mistakes.


By explicitly analyzing existing precedents, it should be possible to 
avoid the impression that the current discussion is focused on the 
relative merits of a particular orthography based on personal and 
possibly arbitrary opinions by the work group experts.


If it can be shown that all other cases where such borrowings were 
accepted into Unicode are based on orthographies that are more 
permanent, more widespread or both, or where other technical or 
typographical reasons prevailed that are absent here, then it would make 
any decision on the current request seem a lot less arbitrary.


I don't know where the right answer lies in the case of Janalif, or 
which point of view, in Peter's phrasing, would make the most sense, but 
having this discussion without clear understanding of the precedents 
will lead to inconsistent encoding.


A./



pupil's comment: Are Latin and Cyrillic essentially the same script?

2010-11-18 Thread JP Blankert (thuis PC based)

Dear all,

Still see myself as pupil reading introduction chart of unicode, but I 
am happy to join the discussion on the Russian: it is quite different 
from Latin. Apart from 33 characters in Russian alphabet = more 
characters and apart from quite a few characters that as English speaker 
you clearly do not know, Latin and Russian indeed contain some similar 
characters. But watch out. There are if I am correct 3 a's in the world, 
in this email a (Latin) looks like a (Russian) but they are different. 
So the Russian a is quite suited for a homograph attack (I will try 
ontslag.com, which is Dutch for dismissal.com, with a Russian a, to see how 
search engines react. The Punycode of the word as a whole is different).
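
As a rough illustration of the point about look-alike letters, the sketch 
below compares an all-Latin label with one where the a is replaced by 
U+0430. The label and the helper `scripts_used` are assumptions for 
illustration only, and the script is guessed from the character name. The 
Punycode form is produced with Python's built-in "punycode" codec, without 
the "xn--" prefix that a registry would add.

```python
import unicodedata

latin_label = "ontslag"        # all Latin letters
spoof_label = "ontsl\u0430g"   # looks the same, but uses U+0430 CYRILLIC SMALL LETTER A

def scripts_used(label):
    # Approximation: take the first word of each character name as its script.
    return {unicodedata.name(ch).split()[0] for ch in label}

print(latin_label == spoof_label)       # the two strings are not equal
print(scripts_used(latin_label))
print(scripts_used(spoof_label))
print(spoof_label.encode("punycode"))   # ASCII letters first, then the encoded remainder
```

A label that mixes LATIN and CYRILLIC in this way is the kind of thing a 
registry or browser could flag as suspicious.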


Similar example: Ukraine i - looks like ours, but you can't register it 
on .rf (Russian Federation).


Experiment 1 year ago with Reïntegratie.com, 
being correct Dutch for reintegration, but being impossible as 
domain name because SIDN.nl (supposed to be nic.nl) is very conservative 
and does not even allow signs, gave as result: in the beginning Google 
appreciated it; after a few months the hosted and 
filled site 'sank'. (I borrowed the ï 
from Catalan, amidst Latin characters).


News about ss / sz for whoever is interested: most Germans were alert 
(ss-holders had priority to ß), so no Fußball for me, but only 
experimental domain names IDNexpress.de and IDNexpreß.de. It was a 
mini-landrush on Nov. 16 2010, 10:00 German time onwards (Denic.de).

Very busy with .rf auction now; in December I will put 2 different 
sites on these ss and sz names so people can wonder at their screens to 
see what is happening.


Above reaction was more out of domain names and practical experience 
than chartUTFxyz - but definitely: different script.


Br,

Philippe


On 18-11-2010 20:04, Asmus Freytag wrote:

On 11/18/2010 8:04 AM, Peter Constable wrote:
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] 
On Behalf Of André Szabolcs Szelp



AFAIR the reservations of WG2 concerning the encoding of Jangalif
Latin Ь/ь as a new character were not in view of Cyrillic Ь/ь, but
rather in view of its potential identity with the tone sign mentioned
by you as well. It is a Latin letter adapted from the Cyrillic soft 
sign,
There's another possible point of view: that it's a Cyrillic 
character that, for a short period, people tried using as a Latin 
character but that never stuck, and that it's completely adequate to 
represent Janalif text in that orthography using the Cyrillic soft sign.





When one language borrows a word from another, there are several 
stages of foreignness, ranging from treating the foreign word as a 
short quotation in the original language to treating it as essentially 
fully native.


Now words are very complex in behavior and usage compared to 
characters. You can check for pronunciation, spelling and adaptation 
to the host grammar to check which stage of adaptation a word has 
reached.


When a script borrows a letter from another, you are essentially 
limited in what evidence you can use to document objectively whether 
the borrowing has crossed over the script boundary and the character 
has become native.


With typographically closely related scripts, getting tell-tale 
typographical evidence is very difficult. After all, these scripts 
started out from the same root.


So, you need some other criteria.

You could individually compare orthographies and decide which ones are 
important enough (or established enough) to warrant support. Or 
you could try to distinguish between orthographies for general use 
within the given language, vs. other systems of writing 
(transcriptions, say).


But whatever you do, you should be consistent and take account of 
existing precedent.


There are a number of characters encoded as nominally Latin in 
Unicode that are borrowings from other scripts, usually Greek.


A discussion of the current issue should include explicit explanation 
of why these precedents apply or do not apply, and, in the latter 
case, why some precedents may be regarded as examples of past mistakes.


By explicitly analyzing existing precedents, it should be possible to 
avoid the 

RE: Are Latin and Cyrillic essentially the same script?

2010-11-18 Thread Peter Constable
If you'd like a precedent, here's one: IPA is a widely-used system of 
transcription based primarily on the Latin script. In comparison to the Janalif 
orthography in question, there is far more existing data. Also, whereas that 
Janalif orthography is no longer in active use--hence there are not new texts 
to be represented (there are at best only new citations of existing texts), IPA 
is as a writing system in active use with new texts being created daily; thus, 
the body of digitized data for IPA is growing much more than is data in the 
Janalif orthography. And while IPA is primarily based on Latin script, not all 
of its characters are Latin characters: bilabial and interdental fricative 
phonemes are represented using Greek letters beta and theta.

Given a precedent of a widely-used Latin writing system for which it is 
considered adequate to have characters of central importance represented using 
letters from a different script, Greek, it would seem reasonable if someone 
made the case that it's adequate to represent an historic Latin orthography 
using Cyrillic soft sign.


Peter


-Original Message-
From: Asmus Freytag [mailto:asm...@ix.netcom.com] 
Sent: Thursday, November 18, 2010 11:05 AM
To: Peter Constable
Cc: André Szabolcs Szelp; Karl Pentzlin; unicode@unicode.org; Ilya Yevlampiev
Subject: Re: Are Latin and Cyrillic essentially the same script?

On 11/18/2010 8:04 AM, Peter Constable wrote:
 From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] 
 On Behalf Of André Szabolcs Szelp

 AFAIR the reservations of WG2 concerning the encoding of Jangalif 
 Latin Ь/ь as a new character were not in view of Cyrillic Ь/ь, but 
 rather in view of its potential identity with the tone sign mentioned 
 by you as well. It is a Latin letter adapted from the Cyrillic soft 
 sign,
 There's another possible point of view: that it's a Cyrillic character that, 
 for a short period, people tried using as a Latin character but that never 
 stuck, and that it's completely adequate to represent Janalif text in that 
 orthography using the Cyrillic soft sign.



When one language borrows a word from another, there are several stages of 
foreignness, ranging from treating the foreign word as a short quotation in 
the original language to treating it as essentially fully native.

Now words are very complex in behavior and usage compared to characters. 
You can check for pronunciation, spelling and adaptation to the host grammar to 
check which stage of adaptation a word has reached.

When a script borrows a letter from another, you are essentially limited in 
what evidence you can use to document objectively whether the borrowing has 
crossed over the script boundary and the character has become native.

With typographically closely related scripts, getting tell-tale typographical 
evidence is very difficult. After all, these scripts started out from the same 
root.

So, you need some other criteria.

You could individually compare orthographies and decide which ones are 
important enough (or established enough) to warrant support. Or you could 
try to distinguish between orthographies for general use within the given 
language, vs. other systems of writing (transcriptions, say).

But whatever you do, you should be consistent and take account of existing 
precedent.

There are a number of characters encoded as nominally Latin in Unicode that 
are borrowings from other scripts, usually Greek.

A discussion of the current issue should include explicit explanation of why 
these precedents apply or do not apply, and, in the latter case, why some 
precedents may be regarded as examples of past mistakes.

By explicitly analyzing existing precedents, it should be possible to avoid the 
impression that the current discussion is focused on the relative merits of a 
particular orthography based on personal and possibly arbitrary opinions by the 
work group experts.

If it can be shown that all other cases where such borrowings were accepted 
into Unicode are based on orthographies that are more permanent, more 
widespread or both, or where other technical or typographical reasons prevailed 
that are absent here, then it would make any decision on the current request 
seem a lot less arbitrary.

I don't know where the right answer lies in the case of Janalif, or which point 
of view, in Peter's phrasing, would make the most sense, but having this 
discussion without clear understanding of the precedents will lead to 
inconsistent encoding.

A./





Re: Are Latin and Cyrillic essentially the same script?

2010-11-17 Thread André Szabolcs Szelp
AFAIR the reservations of WG2 concerning the encoding of Jangalif
Latin Ь/ь as a new character were not in view of Cyrillic Ь/ь, but
rather in view of its potential identity with the tone sign mentioned
by you as well. It is a Latin letter adapted from the Cyrillic soft
sign, like the Jangalif character. Function, as you point out, is not
a distinctive feature. The different serif style which you pointed out
cannot be seen as discriminating features of character identity,
especially not in a time of bad typography (and actually lack of Latin
typographic tradition in China of the time).


/Sz

On Wed, Nov 10, 2010 at 5:08 PM, Karl Pentzlin karl-pentz...@acssoft.de wrote:
 As shown in N3916: http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3916.pdf
 = L2/10-356, there exists a Latin letter which resembles the Cyrillic
 soft sign Ь/ь (U+042C/U+044C). This letter is part of the Jaꞑalif
 variant of the alphabet, which was used for several languages in the
 former Soviet Union (e.g. Tatar), and was developed in parallel to the
 alphabet nowadays in use for Turkish and Azerbaijani, see:
 http://en.wikipedia.org/wiki/Janalif .
 In fact, it was proposed on this base, being the only Jaꞑalif letter
 missing so far, since the ꞑ (occurring in the alphabet name itself)
 was introduced with Unicode 6.0.

 The letter is not a soft sign; it is the exact Tatar equivalent of the
 Turkish dotless i, and thus has a use similar to that of the Cyrillic yeru
 Ы/ы (U+042B/U+044B).

 In this function, it is part of the adaptation of the Latin alphabet
 for many non-Russian languages in the Soviet Union in the 1920s,
 see e.g.: Юшманов, Н. В.: Определитель Языков. Москва/Ленинград 1941,
 http://fotki.yandex.ru/users/ievlampiev/view/155697?page=3 .
 (A proposal regarding this subject is expected for 2011.)

 Thus, it shares with the Cyrillic soft sign its form and, in part, the
 geographical area of its use, but in no case its meaning. The same can
 be said, e.g., of P/p (U+0050/U+0070, Latin letter P) and Р/р
 (U+0420/U+0440, Cyrillic letter ER).

 According to the pre-preliminary minutes of UTC #125 (L2/10-415),
 the UTC has not accepted the Latin Ь/ь.

 It is established practice for the European alphabetic scripts to
 encode a new letter only if its shape differs (in at least one of the
 capital and small forms) from all letters of the same script that are
 already encoded. The Y/y is well known to denote completely
 different pronunciations, used as consonant as well as vocal, even within
 the same language. Thus, if somebody unearths a Latin letter E/e in some
 obscure minority language which has no E-like vocal, where it denotes an
 M-like sound and is in fact collated after the M in the local alphabet,
 this will probably not lead to a new encoding.

 But Latin and Cyrillic are different scripts (the question in the subject
 line of this mail is rhetorical, of course).

 Admittedly, there is also a precedent for using Cyrillic letters in
 Latin text: the use of U+0417/U+0437 and U+0427/U+0447 as tone
 letters in Zhuang. However, the orthography using them was
 short-lived, being superseded by another Latin orthography which uses
 genuine Latin letters as tone marks (J/j and X/x, in this case).

 On the other hand, Jaꞑalif and the other Latin alphabets which use Ь/ь
 did not lose the Ь/ь through an improvement of the orthography, but were
 completely deprecated by a ukase of Stalin. Thus, they continue to be
 the Latin alphabets of the respective languages.
 Whether or not a revival is formally requested, they are regarded as valid
 by the members of the cultural group (even if only to access their cultural
 heritage).
 In particular, it cannot be ruled out that people will want to create Latin
 domain names or e-mail addresses without being accused of script mixing.

 Taking this into account, not to mention the technical problems
 regarding collation etc. and the typographical issues when it comes to
 subtle differences between Latin and Cyrillic in high-quality
 typography, it is really hard to understand why the UTC refuses to encode
 the Latin Ь/ь.

 A quick glance at the Юшманов table mentioned above proves that there
 is absolutely no request to duplicate the whole Cyrillic alphabet in
 Latin, as someone may have feared.

 - Karl Pentzlin






-- 
Szelp, André Szabolcs

+43 (650) 79 22 400




Are Latin and Cyrillic essentially the same script?

2010-11-10 Thread Karl Pentzlin
As shown in N3916: http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3916.pdf
= L2/10-356, there exists a Latin letter which resembles the Cyrillic
soft sign Ь/ь (U+042C/U+044C). This letter is part of the Jaꞑalif
variant of the alphabet, which was used for several languages in the
former Soviet Union (e.g. Tatar), and was developed in parallel with the
alphabet now in use for Turkish and Azerbaijani, see:
http://en.wikipedia.org/wiki/Janalif .
In fact, it was proposed on this basis, being the only Jaꞑalif letter
still missing, since the ꞑ (occurring in the alphabet name itself)
was introduced with Unicode 6.0.

The letter is not a soft sign; it is the exact Tatar equivalent of the
Turkish dotless i, and thus has a use similar to that of the Cyrillic yeru
Ы/ы (U+042B/U+044B).

In this function, it is part of the adaptation of the Latin alphabet
for many non-Russian languages in the Soviet Union in the 1920s,
see e.g.: Юшманов, Н. В.: Определитель Языков. Москва/Ленинград 1941,
http://fotki.yandex.ru/users/ievlampiev/view/155697?page=3 .
(A proposal regarding this subject is expected for 2011.)

Thus, it shares with the Cyrillic soft sign its form and, in part, the
geographical area of its use, but in no case its meaning. The same can
be said, e.g., of P/p (U+0050/U+0070, Latin letter P) and Р/р
(U+0420/U+0440, Cyrillic letter ER).
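
To make the comparison concrete, here is a minimal Python sketch (an
illustration only, relying on the standard unicodedata module) that
prints the code point and Unicode name of the letters mentioned above.
The helper name describe is made up for this example. It shows that
look-alike letters such as Latin P and Cyrillic ER are separate
characters, and that the only Ь/ь encoded so far is the Cyrillic one.

```python
import unicodedata

# Letters named in the text: Latin P/p, Cyrillic ER, Cyrillic soft sign.
LETTERS = ["P", "p", "\u0420", "\u0440", "\u042C", "\u044C"]


def describe(ch):
    """Return (code point as 'U+XXXX', Unicode character name)."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


if __name__ == "__main__":
    for ch in LETTERS:
        code_point, name = describe(ch)
        print(code_point, name)
```

The names reported for U+042C/U+044C begin with CYRILLIC, so any Latin
text that uses them today necessarily contains Cyrillic characters.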

According to the pre-preliminary minutes of UTC #125 (L2/10-415),
the UTC has not accepted the Latin Ь/ь.

It is established practice for the European alphabetic scripts to
encode a new letter only if its shape differs (in at least one of the
capital and small forms) from all letters of the same script that are
already encoded. The Y/y is well known to denote completely
different pronunciations, used as consonant as well as vocal, even within
the same language. Thus, if somebody unearths a Latin letter E/e in some
obscure minority language which has no E-like vocal, where it denotes an
M-like sound and is in fact collated after the M in the local alphabet,
this will probably not lead to a new encoding.

But Latin and Cyrillic are different scripts (the question in the subject
line of this mail is rhetorical, of course).

Admittedly, there is also a precedent for using Cyrillic letters in
Latin text: the use of U+0417/U+0437 and U+0427/U+0447 as tone
letters in Zhuang. However, the orthography using them was
short-lived, being superseded by another Latin orthography which uses
genuine Latin letters as tone marks (J/j and X/x, in this case).

On the other hand, Jaꞑalif and the other Latin alphabets which use Ь/ь
did not lose the Ь/ь through an improvement of the orthography, but were
completely deprecated by a ukase of Stalin. Thus, they continue to be
the Latin alphabets of the respective languages.
Whether or not a revival is formally requested, they are regarded as valid
by the members of the cultural group (even if only to access their cultural
heritage).
In particular, it cannot be ruled out that people will want to create Latin
domain names or e-mail addresses without being accused of script mixing.
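
The script-mixing concern can be sketched in a few lines of Python.
This is a rough heuristic, not how any registry actually checks names:
the Python standard library does not expose the Unicode Script
property, so the sketch assumes that the first word of a character's
Unicode name ("LATIN", "CYRILLIC") is a good enough stand-in. The
function names and the sample string are made up for illustration.

```python
import unicodedata


def scripts_used(text):
    """Approximate the set of scripts used by the letters in text.

    Assumption: the first word of the Unicode character name stands in
    for the Script property, which the standard library lacks.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch).split()[0])
    return scripts


def is_mixed(text):
    """True if the letters in text come from more than one script."""
    return len(scripts_used(text)) > 1


if __name__ == "__main__":
    # Illustrative label: Latin q and z around the Cyrillic soft sign.
    sample = "q\u044Cz"
    print(sorted(scripts_used(sample)), is_mixed(sample))
```

With only the Cyrillic Ь/ь available, a label written this way is
flagged as mixed even though its author regards it as purely Latin.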

Taking this into account, not to mention the technical problems
regarding collation etc. and the typographical issues when it comes to
subtle differences between Latin and Cyrillic in high-quality
typography, it is really hard to understand why the UTC refuses to encode
the Latin Ь/ь.

A quick glance at the Юшманов table mentioned above proves that there
is absolutely no request to duplicate the whole Cyrillic alphabet in
Latin, as someone may have feared.

- Karl Pentzlin




Re: Are Latin and Cyrillic essentially the same script?

2010-11-10 Thread Karl Pentzlin
2010-11-10 10:08, I wrote:

KP As shown in N3916 ...

Please read "vowel" instead of "vocal" throughout the mail. Sorry.