RE: Are Latin and Cyrillic essentially the same script?

2010-11-18 Thread Peter Constable
If you'd like a precedent, here's one: IPA is a widely-used system of 
transcription based primarily on the Latin script. In comparison to the Janalif 
orthography in question, there is far more existing data. Also, whereas that 
Janalif orthography is no longer in active use--hence there are not new texts 
to be represented (there are at best only new citations of existing texts), IPA 
is a writing system in active use with new texts being created daily; thus, 
the body of digitized data for IPA is growing much faster than the data in the 
Janalif orthography. And while IPA is primarily based on Latin script, not all 
of its characters are Latin characters: bilabial and interdental fricative 
phonemes are represented using Greek letters beta and theta.
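
As a minimal illustration of that last point (a sketch, not part of the original argument), Python's standard unicodedata module can report the character names of the two symbols. The code points U+03B2 and U+03B8 are assumed here to be the ones meant by "beta and theta":

```python
import unicodedata

# Symbols used in IPA for the bilabial and interdental fricatives
# (assumed code points: U+03B2 and U+03B8).
ipa_fricatives = {"beta": "\u03b2", "theta": "\u03b8"}

# Look up the formal Unicode character name of each symbol.
names = {label: unicodedata.name(ch) for label, ch in ipa_fricatives.items()}

for label, name in names.items():
    print(label, "->", name)
```

Both names begin with GREEK, i.e. IPA text reuses the Greek-script characters for these two sounds rather than separately encoded Latin letters.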

Given a precedent of a widely-used Latin writing system for which it is 
considered adequate to have characters of central importance represented using 
letters from a different script, Greek, it would seem reasonable if someone 
made the case that it's adequate to represent an historic Latin orthography 
using Cyrillic soft sign.


Peter


-Original Message-
From: Asmus Freytag [mailto:asm...@ix.netcom.com] 
Sent: Thursday, November 18, 2010 11:05 AM
To: Peter Constable
Cc: André Szabolcs Szelp; Karl Pentzlin; unicode@unicode.org; Ilya Yevlampiev
Subject: Re: Are Latin and Cyrillic essentially the same script?

On 11/18/2010 8:04 AM, Peter Constable wrote:
> From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] 
> On Behalf Of André Szabolcs Szelp
>
>> AFAIR the reservations of WG2 concerning the encoding of Jangalif 
>> Latin Ь/ь as a new character were not in view of Cyrillic Ь/ь, but 
>> rather in view of its potential identity with the tone sign mentioned 
>> by you as well. It is a Latin letter adapted from the Cyrillic soft 
>> sign,
> There's another possible point of view: that it's a Cyrillic character that, 
> for a short period, people tried using as a Latin character but that never 
> stuck, and that it's completely adequate to represent Janalif text in that 
> orthography using the Cyrillic soft sign.
>
>

When one language borrows a word from another, there are several stages of 
"foreignness", ranging from treating the foreign word as a short quotation in 
the original language to treating it as essentially fully native.

Now words are very complex in behavior and usage compared to characters. 
You can check for pronunciation, spelling and adaptation to the host grammar to 
check which stage of adaptation a word has reached.

When a script borrows a letter from another, you are essentially limited in 
what evidence you can use to document objectively whether the borrowing has 
crossed over the script boundary and the character has become "native".

With typographically closely related scripts, getting tell-tale typographical 
evidence is very difficult. After all, these scripts started out from the same 
root.

So, you need some other criteria.

You could individually compare orthographies and decide which ones are 
"important" enough (or "established" enough) to warrant support. Or you could 
try to distinguish between orthographies for general use within the given 
language, vs. other systems of writing (transcriptions, say).

But whatever you do, you should be consistent and take account of existing 
precedent.

There are a number of characters encoded as nominally "Latin" in Unicode that 
are borrowings from other scripts, usually Greek.

A discussion of the current issue should include explicit explanation of why 
these precedents apply or do not apply, and, in the latter case, why some 
precedents may be regarded as examples of past mistakes.

By explicitly analyzing existing precedents, it should be possible to avoid the 
impression that the current discussion is focused on the relative merits of a 
particular orthography based on personal and possibly arbitrary opinions by the 
work group experts.

If it can be shown that all other cases where such borrowings were accepted 
into Unicode are based on orthographies that are more permanent, more 
widespread or both, or where other technical or typographical reasons prevailed 
that are absent here, then it would make any decision on the current request 
seem a lot less arbitrary.

I don't know where the right answer lies in the case of Janalif, or which point 
of view, in Peter's phrasing, would make the most sense, but having this 
discussion without clear understanding of the precedents will lead to 
inconsistent encoding.

A./





pupil's comment: Are Latin and Cyrillic essentially the same script?

2010-11-18 Thread JP Blankert (thuis & PC based)

Dear all,

I still see myself as a pupil reading the introductory charts of Unicode, 
but I am happy to join the discussion on Russian: it is quite different 
from Latin. Apart from the Russian alphabet having 33 characters (more 
than Latin), and apart from quite a few characters that an English speaker 
clearly will not know, Latin and Russian do indeed contain some 
similar-looking characters. But watch out. There are, if I am correct, 3 
a's in the world; in this email a (Latin) looks like a (Russian), but they 
are different characters. So the Russian a is quite suited for a homograph 
attack (I will try ontslag.com, which is Dutch for dismissal.com, with a 
Russian a, to see how search engines react; the Punycode of the whole word 
is different).
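
A minimal sketch of the look-alike problem, assuming the "Russian a" is U+0430 and using Python's built-in "idna" codec (which implements the older IDNA 2003 rules) to get the ASCII form of each name. The spoofed domain below is constructed purely for illustration:

```python
import unicodedata

latin_a = "\u0061"     # the ordinary Latin letter a
cyrillic_a = "\u0430"  # the visually similar Cyrillic letter (assumed)

latin_domain = "ontslag.com"
# Same-looking name, but with the Cyrillic letter in place of the Latin one.
spoofed_domain = "ontsl" + cyrillic_a + "g.com"

# ASCII-compatible forms as they would appear in the DNS.
latin_ascii = latin_domain.encode("idna")
spoofed_ascii = spoofed_domain.encode("idna")

print(unicodedata.name(latin_a), "|", unicodedata.name(cyrillic_a))
print(latin_ascii)
print(spoofed_ascii)
```

The two strings look alike on screen but compare unequal, and the mixed-script label is converted to a Punycode form with the "xn--" prefix, while the all-Latin name stays unchanged.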


Similar example: the Ukrainian i looks like ours, but you can't register it 
under .rf (Russian Federation).


An experiment 1 year ago with Reïntegratie.com (correct Dutch for 
reintegration, but impossible as a domain name because SIDN.nl, supposed 
to be nic.nl, is very conservative and does not even allow such signs) 
gave this result: in the beginning Google appreciated it; after a few 
months the hosted and filled site 'sank'. (I borrowed the ï from Catalan, 
amidst Latin characters.)


News about ss / sz for whoever is interested: most Germans were alert 
(ss-holders had priority for ß), so no Fußball for me, but only the 
experimental domain names IDNexpress.de and IDNexpreß.de. It was a 
mini-landrush from Nov. 16 2010, 10:00 German time onwards (Denic.de).
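
A small sketch of the ss / ß relationship behind this, using only the Python standard library. Python's built-in "idna" codec follows the older IDNA 2003 rules, under which ß is mapped to ss before lookup; under the newer IDNA 2008 rules ß is kept as a distinct character. The domain name below is an illustrative assumption, not one of the names registered above:

```python
# Hypothetical example name containing U+00DF (LATIN SMALL LETTER SHARP S).
name_with_eszett = "fu\u00dfball.de"

# Unicode case folding expands the sharp s to "ss".
folded = name_with_eszett.casefold()

# The built-in codec applies IDNA 2003 (RFC 3490) mapping before encoding.
idna2003 = name_with_eszett.encode("idna")

print(folded)
print(idna2003)
```

Under the older rules both spellings therefore end up as the same ASCII name, so a separate ß name only becomes registrable once a registry applies the newer rules.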

Very busy with the .rf auction now; in December I will put 2 different 
sites on these ss and sz names so people can wonder at their screens to 
see what is happening.


The above reaction comes more from domain names and practical experience 
than from chartUTFxyz - but definitely: different script.


Br,

Philippe



Re: Are Latin and Cyrillic essentially the same script?

2010-11-18 Thread Asmus Freytag

On 11/18/2010 8:04 AM, Peter Constable wrote:

From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of André Szabolcs Szelp


AFAIR the reservations of WG2 concerning the encoding of Jangalif
Latin Ь/ь as a new character were not in view of Cyrillic Ь/ь, but
rather in view of its potential identity with the tone sign mentioned
by you as well. It is a Latin letter adapted from the Cyrillic soft sign,

There's another possible point of view: that it's a Cyrillic character that, 
for a short period, people tried using as a Latin character but that never 
stuck, and that it's completely adequate to represent Janalif text in that 
orthography using the Cyrillic soft sign.




When one language borrows a word from another, there are several stages 
of "foreignness", ranging from treating the foreign word as a short 
quotation in the original language to treating it as essentially fully 
native.


Now words are very complex in behavior and usage compared to characters. 
You can check for pronunciation, spelling and adaptation to the host 
grammar to check which stage of adaptation a word has reached.


When a script borrows a letter from another, you are essentially limited 
in what evidence you can use to document objectively whether the 
borrowing has crossed over the script boundary and the character has 
become "native".


With typographically closely related scripts, getting tell-tale 
typographical evidence is very difficult. After all, these scripts 
started out from the same root.


So, you need some other criteria.

You could individually compare orthographies and decide which ones are 
"important" enough (or "established" enough) to warrant support. Or you 
could try to distinguish between orthographies for general use within 
the given language, vs. other systems of writing (transcriptions, say).


But whatever you do, you should be consistent and take account of 
existing precedent.


There are a number of characters encoded as nominally "Latin" in Unicode 
that are borrowings from other scripts, usually Greek.
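
A few such characters can be listed with Python's standard unicodedata module. This is only a sketch: the five code points below are an illustrative selection of Greek-derived letters that carry LATIN names, not a complete list of the precedents:

```python
import unicodedata

# Illustrative selection of Greek-derived letters encoded with Latin names.
borrowed = ["\u0251", "\u0263", "\u0269", "\u0278", "\u028a"]

# Map each code point (as U+XXXX) to its formal Unicode character name.
latin_named = {f"U+{ord(ch):04X}": unicodedata.name(ch) for ch in borrowed}

for code_point, name in latin_named.items():
    print(code_point, name)
```

Each of these is named LATIN SMALL LETTER followed by the Greek letter name (ALPHA, GAMMA, IOTA, PHI, UPSILON), which is the kind of precedent the discussion would need to account for.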


A discussion of the current issue should include explicit explanation of 
why these precedents apply or do not apply, and, in the latter case, why 
some precedents may be regarded as examples of past mistakes.


By explicitly analyzing existing precedents, it should be possible to 
avoid the impression that the current discussion is focused on the 
relative merits of a particular orthography based on personal and 
possibly arbitrary opinions by the work group experts.


If it can be shown that all other cases where such borrowings were 
accepted into Unicode are based on orthographies that are more 
permanent, more widespread or both, or where other technical or 
typographical reasons prevailed that are absent here, then it would make 
any decision on the current request seem a lot less arbitrary.


I don't know where the right answer lies in the case of Janalif, or 
which point of view, in Peter's phrasing, would make the most sense, but 
having this discussion without clear understanding of the precedents 
will lead to inconsistent encoding.


A./



RE: Are Latin and Cyrillic essentially the same script?

2010-11-18 Thread Peter Constable
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of André Szabolcs Szelp

> AFAIR the reservations of WG2 concerning the encoding of Jangalif 
> Latin Ь/ь as a new character were not in view of Cyrillic Ь/ь, but 
> rather in view of its potential identity with the tone sign mentioned 
> by you as well. It is a Latin letter adapted from the Cyrillic soft sign, 

There's another possible point of view: that it's a Cyrillic character that, 
for a short period, people tried using as a Latin character but that never 
stuck, and that it's completely adequate to represent Janalif text in that 
orthography using the Cyrillic soft sign.



Peter