Re: Writing Tatar using the Latin script; new characters to encode?

2004-07-28 Thread Philipp Reichmuth
Eric Muller schrieb:
Unicode exists to support what people use.  Do people use Latin script 
for Tatar?  Evidence indicates that they do.  Should Unicode support 
it, then?  Certainly.  Does Unicode support it?  Yes, Unicode supports 
the Latin script, with gobs of extensions.  So what's the problem?
Latin n with descender, which is not encoded  but needed according to  
.
For Tatar, at least, it's OK to use eng as far as I can see.
Official usage is n with tilde. See, e.g., the language law of 1999, 
which is available online and which uses n with tilde throughout: 
http://tugan-tel.noka.ru/belem/imla

However, I was in Tatarstan early this year, and signs etc. frequently 
show inverted e and n with descender where the law would call for a with 
diaeresis and n with tilde. It's pretty much of an alphabet in transition.

Philipp


Re: Writing Tatar using the Latin script; new characters to encode?

2004-07-27 Thread Peter Kirk
On 27/07/2004 18:29, Eric Muller wrote:
Mark E. Shoulson wrote:
Unicode exists to support what people use.  Do people use Latin 
script for Tatar?  Evidence indicates that they do.  Should Unicode 
support it, then?  Certainly.  Does Unicode support it?  Yes, Unicode 
supports the Latin script, with gobs of extensions.  So what's the 
problem?

Latin n with descender, which is not encoded  but needed according to  
.

Eric.
Possibly needed, but the sources I have seen suggest that this letter is 
actually an eng, 014A/014B, or else N with tilde is used. It is 
pronounced like a phonetic eng.
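You can check that the eng pair is already encoded and case-paired by querying the Unicode Character Database directly. Python's standard `unicodedata` module is one way to do this. Below is a minimal sketch. `describe` is a hypothetical helper, and the results come from whichever Unicode version the Python build bundles.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


ENG_CAPITAL = "\u014A"
ENG_SMALL = "\u014B"

for ch in (ENG_CAPITAL, ENG_SMALL):
    print(describe(ch))

# The two code points are an ordinary case pair, so the default,
# language-independent case mappings already handle them.
assert ENG_CAPITAL.lower() == ENG_SMALL
assert ENG_SMALL.upper() == ENG_CAPITAL
```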

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: Writing Tatar using the Latin script; new characters to encode?

2004-07-27 Thread Peter Kirk
On 27/07/2004 18:21, Mark E. Shoulson wrote:
What's all the fuss, then?

Ivan the Terrible conquered (what is now) Tatarstan in 1552. Stalin 
imposed the Cyrillic alphabet in 1939. Do the current Russian 
authorities want the same reputation? That's why this is a big issue in 
Tatarstan.

Barlıq keşelär dä azat häm üz abruyları häm xoquqları yağınnan tiñ bulıp 
tualar. Alarğa aqıl häm wöcdan birelgän häm ber-bersenä qarata tuğanarça 
monäsäbättä bulırğa tieşlär - for a translation, see 
http://www.omniglot.com/writing/tatar.htm (but there do seem to be some 
variations in the alphabet).

Unicode exists to support what people use.  Do people use Latin script 
for Tatar?  Evidence indicates that they do.  Should Unicode support 
it, then?  Certainly.  Does Unicode support it?  Yes, Unicode supports 
the Latin script, with gobs of extensions.  So what's the problem?  
Are there any characters in Latin transcription of Tatar that Unicode 
doesn't support?

Well, there is a strange curly Y for [y] in the 1929-1939 alphabet, 
which looks like 01B3/01B4 but I wonder if that is really the historic 
form as these letters are in Unicode for African languages. Apart from 
that, there are in fact no problems.
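The standard fixes only the identity and casing of the hooked Y, not the exact glyph shape. This minimal sketch queries both code points with Python's `unicodedata`. Whether the reference glyphs drawn for African usage match the 1929-1939 Tatar form is the open question raised above, and the code does not settle it.

```python
import unicodedata

# Candidate code points for the 1929-1939 "curly Y".
Y_HOOK_CAPITAL = "\u01B3"
Y_HOOK_SMALL = "\u01B4"

for ch in (Y_HOOK_CAPITAL, Y_HOOK_SMALL):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Encoded as a case pair, so default casing works out of the box.
assert Y_HOOK_CAPITAL.lower() == Y_HOOK_SMALL
assert Y_HOOK_SMALL.upper() == Y_HOOK_CAPITAL
```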

The law doesn't enter into this.  What's the big deal?
~mark






Re: Writing Tatar using the Latin script; new characters to encode?

2004-07-27 Thread Mark E. Shoulson
Eric Muller wrote:
Mark E. Shoulson wrote:
Unicode exists to support what people use.  Do people use Latin 
script for Tatar?  Evidence indicates that they do.  Should Unicode 
support it, then?  Certainly.  Does Unicode support it?  Yes, Unicode 
supports the Latin script, with gobs of extensions.  So what's the 
problem?

Latin n with descender, which is not encoded  but needed according to  
.
So we verify that it's true, and encode it.  Why should Russia care?  
(It's not even in the Cyrillic block.) It sounds like we've heard
evidence that this isn't an idiosyncratic usage by one or two people, 
and there are folks who use this orthography.  If it's in use, or if it 
*was* in use, Unicode has to support it.

There's always a higher bar for proving that something should *not* be 
encoded, as we saw already with Phoenician.  If a sizable minority of 
people use it, what's the harm?  We're not hurting for codepoints.  I 
haven't seen much argument on the only substantive questions: do these 
letters actually occur, and is the orthography actually being used?  
That's really all that matters to the decision.  I've seen some weak 
"yes" answers to the second question, one "yes" for the first, but not 
much in terms of evidence.

~mark



Re: Writing Tatar using the Latin script; new characters to encode?

2004-07-27 Thread Eric Muller
Mark E. Shoulson wrote:
Unicode exists to support what people use.  Do people use Latin script 
for Tatar?  Evidence indicates that they do.  Should Unicode support 
it, then?  Certainly.  Does Unicode support it?  Yes, Unicode supports 
the Latin script, with gobs of extensions.  So what's the problem?
Latin n with descender, which is not encoded  but needed according to  
.

Eric.



Re: Writing Tatar using the Latin script; new characters to encode?

2004-07-27 Thread Mark E. Shoulson
What's all the fuss, then?
Unicode exists to support what people use.  Do people use Latin script 
for Tatar?  Evidence indicates that they do.  Should Unicode support it, 
then?  Certainly.  Does Unicode support it?  Yes, Unicode supports the 
Latin script, with gobs of extensions.  So what's the problem?  Are 
there any characters in Latin transcription of Tatar that Unicode 
doesn't support?

The law doesn't enter into this.  What's the big deal?
~mark


Re: Writing Tatar using the Latin script; new characters to encode?

2004-07-27 Thread Anto'nio Martins-Tuva'lkin
On 2004.07.27, 15:21, Alexander Savenkov <[EMAIL PROTECTED]> wrote:

> Once again, Peter, you're going off the topic. You're invited to
> prove your assumptions with facts or withdraw them.

I was in Tatarstan in March 2000 (in Kazan and in Nabr. Cheln.) and
most of the (scarce) public usage of the Latin script for spelling
Tatar language I witnessed was indeed in media sponsored by, or under
the responsibility of the Tatarstan Government.

--.
António MARTINS-Tuválkin |  ()|
<[EMAIL PROTECTED]>||
PT-1XXX-XXX LISBOA   Não me invejo de quem tem|
+351 934 821 700 carros, parelhas e montes|
http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe|
http://pagina.de/bandeiras/  a água em todas as fontes|




Re: Writing Tatar using the Latin script; new characters to encode?

2004-07-27 Thread Peter Kirk
On 27/07/2004 15:21, Alexander Savenkov wrote:
...
I can't guess what is considered by "many in Tatarstan". And I think
you shouldn't be guessing too as it makes no difference in our case.
If someone, in spite of the law, considers killing people to be OK,
it's a matter of court.
 

There is a law against killing people, but that doesn't mean that it 
doesn't happen. Unicode encodes characters in actual use, not only those 
officially defined by governments. Even if use of the Latin alphabet 
were illegal in Tatarstan (and it is not illegal, just unofficial), if 
there is evidence that people actually use it, Unicode needs to support 
that alphabet.

...
Btw, I remember reading you visited Azerbaijan, so you know
the situation there better. I.e., you should know that many Azerbaijan
officials write their public speeches in Cyrillic script, so
the secretaries need to transliterate them into Latin before
publishing.
 

This is true, and proves my point. Cyrillic script is not the official 
script in Azerbaijan, and may not be used in publications, signs etc. 
Nevertheless, it is in widespread use. Therefore, Unicode needs to 
support it. The same applies to the Tatar Latin script.

5) This is an alphabet which has been used, even in official
websites, and very likely continues to be used by some. Decisions
made in Moscow do not change this, especially because they are in
practice widely ignored in Tatarstan
   

Once again, Peter, you're going off the topic. You're invited to prove
your assumptions with facts or withdraw them. I personally consider
statements of this kind as veiled attacks on Russia's statehood.
Please, stop that.
 

I am merely reporting the facts as I understand them. Decisions of the 
Moscow government are not always obeyed in Tatarstan. Decisions of the 
London government are not always obeyed in parts of Northern Ireland - 
although I don't think any of the disputed ones are relevant to Unicode.




Re: Writing Tatar using the Latin script; new characters to encode?

2004-07-18 Thread Doug Ewell
Peter Kirk <[EMAIL PROTECTED]> wrote:

> 5) This is an alphabet which has been used, even in official websites,
> and very likely continues to be used by some. Decisions made in Moscow
> do not change this, especially because they are in practice widely
> ignored in Tatarstan and have no force in some other places where Tatars
> live. This alphabet therefore needs to be supported by Unicode. But
> fortunately this is not a problem as all the characters are already
> defined.

Peter is absolutely right.  Whether the Russian government has banned
the Latin script for writing Tatar is irrelevant.  It has been used, at
least in the recent past and at least outside Russia.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/





Re: Writing Tatar using the Latin script; new characters to encode?

2004-07-18 Thread Peter Kirk
On 18/07/2004 16:22, Alexander Savenkov wrote:
Hello,
(delayed response)
2004-05-12T19:37:51+03:00 Ernest Cline <[EMAIL PROTECTED]> wrote:
 

From: Alexander Savenkov <[EMAIL PROTECTED]>
   

2004-05-12T03:08:59+03:00 Eric Muller <[EMAIL PROTECTED]> wrote:
 

According to , there is currently an effort to convert the
writing of Tatar from Cyrillic to Latin.
   

1. Does somebody have more information about that effort?
   

Perhaps it's their own effort.
 

Eki lists four characters as needed but missing in Unicode (see ).
 

2. The case pair for barred o is encoded (U+019F and U+0275), and it
seems that their confusion comes from less-than-perfect but annotated
name for U+019F, and from the usage remark "African". Can we 
authoritatively tell them that those two characters are the ones they
want? Can we add a "Tatar" usage remark to both?
   

Is there a need for this? You don't want to tell everyone on the net
about his or her wrong assumptions. There's one sentence in the page
you mentioned that gives a good description of this resource:
"The conversion from Cyrillic to Latin script is planned within years
2001-2011."
This is false.
 

3. The case pair n with descender is definitely not encoded, and from my
memory of the discussion of ghe with descender, we would want to encode
them as separate characters (rather than with combining descenders on
"n"). Is anybody working on that proposal?
   

There's no Latin Tatar script. It's the law. Full stop.
It's the Institute of Estonian language. I hope they know more about
Estonian than about other languages and Unicode.
 

 

There are numerous sites on the web about the change from Cyrillic to
Latin for Tatar that is planned for completion by 2011 by the Republic
of Tatarstan (a part of the Russian Federation).
   

Ernest, I fail to see how the fact that there are numerous sites about
Latin for Tatar proves it really exists. There are numerous sites
about Babylon 5 and Frankenstein. What are your thoughts about these?
 

There is legal wrangling
over whether Tatarstan can make the change back to Latin script official
for Tatar as it is used there, but no final decision has been reached and
there are probably at least several more years of legal shenanigans
before it is reached.
   

You're wrong and the facts you give here are outdated. Legal wrangling
is over. See links below (in Russian).
...
 

As for the merits of the proposed change back to Latin, I think
it is silly for Tatarstan to make the change and it is silly for the
Russian Federation to oppose it.
   

Your clever thoughts are really helpful. I wonder what Russians and
Tatars would do without them.
Links in Russian:
http://www.tatar.ru/?DNSID=0627096ec5c075004c0d219207f349de&node_id=978
 

An article about language on the official Tatarstan government website. 
Last paragraph:

With the aim of the further improvement of the Tatar alphabet on the 
basis of Latin graphics and the formation of favourable conditions for 
its entry into the system of world communications, on 15th September 
1999 there was accepted a Law of the Republic of Tatarstan "On the 
establishment of the Tatar alphabet on the basis of Latin graphics".

http://www.tatar.ru/1296_c.html
 

This is the text of that law. Article 5:
This law will come into force on 1st September 2001.
http://www.tatar.ru/index.php?node_id=1006
 

The alphabet, with pronunciations in Latin and Cyrillic. This alphabet 
consists of Latin-1 characters plus schwa, G breve, dotless i, dotted I, 
N with a descender, barred O (019F/0275) and S with cedilla. The N/n 
with descender might cause a problem because 014A/014B do not look quite 
right. (But a rather different form of the same alphabet appears in the 
left column of http://www.tatar.ru/?node_id=2611; here the N with 
descender looks like 014A/014B with the alternative form of the capital 
looking like the small letter.) The case mapping of the i's is as in 
Turkish.
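Turkish-style i casing matters in practice because default Unicode case mapping ignores language. Python's `str.lower()` and `str.upper()` apply those default mappings, as this sketch shows. `tatar_lower` and `tatar_upper` are hypothetical helpers. They special-case only the two i pairs and assume no other language-specific casing rules apply.

```python
# Default mappings: I <-> i, whatever the language of the text.
assert "BARLIQ".lower() == "barliq"   # a Tatar reader expects "barlıq"
assert "tiñ".upper() == "TIÑ"         # a Tatar reader expects "TİÑ"


def tatar_lower(text: str) -> str:
    """Lowercase with Turkish-style i mapping (hypothetical helper)."""
    # I -> dotless ı and İ (U+0130) -> i, then default mapping for the rest.
    # Replacing İ before calling .lower() also avoids Python's default
    # 'İ'.lower(), which yields 'i' + U+0307 COMBINING DOT ABOVE.
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


def tatar_upper(text: str) -> str:
    """Uppercase with Turkish-style i mapping (hypothetical helper)."""
    # i -> İ (U+0130) and ı -> I, then default mapping for the rest.
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


print(tatar_lower("BARLIQ"))   # barlıq
print(tatar_upper("tiñ"))      # TİÑ
```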

http://www.tatar.ru/?DNSID=0627096ec5c075004c0d219207f349de&node_id=2610
 

This describes Inalif, an experimental Tatar Latin alphabet for use on 
the Internet, based on the alphabet in the 1999 law.

http://www.tatar.ru/?node_id=2611
 

This gives details of Inalif, which appears to be ASCII-only and rather 
like the Uzbek Latin alphabet. This page dated December 2003 refers to 
the 1999 alphabet as "the official Latin alphabet", and is signed by 
many prominent Tatars at least one of whom is a top Tatarstan government 
official. It also mentions that the 1999 alphabet is u

Re: Writing Tatar using the Latin script; new characters to encode?

2004-05-12 Thread Kenneth Whistler
Eric Muller wrote:

> According to , there is currently an effort to convert the 
> writing of Tatar from Cyrillic to Latin.

Alexander Savenkov said:

> There's no Latin Tatar script. It's the law. Full stop.

Ernest Cline said:

> There are numerous sites on the web about the change from Cyrillic to
> Latin for Tatar that is planned for completion by 2011 by the Republic
> of Tatarstan (a part of the Russian Federation). ...

And you couldn't get a much clearer demonstration of how
linguistic politics can come to trump "objective" graphological
considerations when it comes to character encoding
standardization.

--Ken




RE: Writing Tatar using the Latin script; new characters to encode?

2004-05-12 Thread Peter Constable
>>2. The case pair for barred o is encoded (U+019F and U+0275), and it
>>seems that their confusion comes from less-than-perfect but annotated
>>name for U+019F, and from the usage remark "African". Can we
>>authoritatively tell them that those two characters are the ones they
>>want?

IMO, yes.


>>Can we add a "Tatar" usage remark to both?

That can certainly be done (assuming the info on Tatar is correct), and
may be helpful.
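The "less-than-perfect" name is visible directly in the character database. The two code points carry names that do not look like a pair, yet their case mappings link them to each other. A minimal sketch using Python's `unicodedata`:

```python
import unicodedata

BARRED_O_CAPITAL = "\u019F"
BARRED_O_SMALL = "\u0275"

print(unicodedata.name(BARRED_O_CAPITAL))  # LATIN CAPITAL LETTER O WITH MIDDLE TILDE
print(unicodedata.name(BARRED_O_SMALL))    # LATIN SMALL LETTER BARRED O

# The names disagree, but the case mappings tie the two together.
assert BARRED_O_CAPITAL.lower() == BARRED_O_SMALL
assert BARRED_O_SMALL.upper() == BARRED_O_CAPITAL
```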



>>3. The case pair n with descender is definitely not encoded, and from
>>my memory of the discussion of ghe with descender, we would want to encode
>>them as separate characters (rather than with combining descenders on
>>"n").

Yes.


>>Is anybody working on that proposal?

The ghe with descender is already approved by UTC and in the PDAM for
amendment 1.

If you look in the documentation on SIL's usage of the PUA
(http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&cat_id=UnicodePUA)
you'll find that I had been given evidence for Latin H/h with
descender in Judeo-Tat (not related to Tatar). I had anticipated
preparing a proposal for that and the other orthographic characters in
SIL's PUA usage, but have not yet had opportunity to do so. The
n-descender was not among the things that were added to SIL's PUA usage,
though.




Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division






Re: Writing Tatar using the Latin script; new characters to encode?

2004-05-12 Thread Ernest Cline

 From: Alexander Savenkov <[EMAIL PROTECTED]>
>
> 2004-05-12T03:08:59+03:00 Eric Muller <[EMAIL PROTECTED]> wrote:
>
> > According to , there is currently an effort to convert the
> > writing of Tatar from Cyrillic to Latin.
>
> > 1. Does somebody have more information about that effort?
>
> Perhaps it's their own effort.
>
> > Eki lists four characters as needed but missing in Unicode (see 
> > ).
>
> > 2. The case pair for barred o is encoded (U+019F and U+0275), and it
> > seems that their confusion comes from less-than-perfect but annotated
> > name for U+019F, and from the usage remark "African". Can we 
> > authoritatively tell them that those two characters are the ones they
> > want? Can we add a "Tatar" usage remark to both?
>
> Is there a need for this? You don't want to tell everyone on the net
> about his or her wrong assumptions. There's one sentence in the page
> you mentioned that gives a good description of this resource:
>
> "The conversion from Cyrillic to Latin script is planned within years
> 2001-2011."
>
> This is false.
>
> > 3. The case pair n with descender is definitely not encoded, and from my
> > memory of the discussion of ghe with descender, we would want to encode
> > them as separate characters (rather than with combining descenders on
> > "n"). Is anybody working on that proposal?
>
> There's no Latin Tatar script. It's the law. Full stop.
>
> It's the Institute of Estonian language. I hope they know more about
> Estonian than about other languages and Unicode.

There are numerous sites on the web about the change from Cyrillic to
Latin for Tatar that is planned for completion by 2011 by the Republic
of Tatarstan (a part of the Russian Federation). There is legal wrangling
over whether Tatarstan can make the change back to Latin script official
for Tatar as it is used there, but no final decision has been reached and
there are probably at least several more years of legal shenanigans
before it is reached.  However, I have not seen any other source for the
characters that the Institute of Estonian Language indicates.  Every other
source indicates using existing Unicode Latin characters and not a new
N WITH DESCENDER character.  Apparently the proposed new
Latin version of Tatar is not the same as the one used 1929-1939, so it
may be that the exact Latin letters have yet to be decided upon, or that
the Estonian institute got hold of a draft proposal that contained an
N with descender character in it.  Such a character would preserve a
symmetry between the usage of LATIN N and CYRILLIC EN for Tatar
in that to go from  to  one would use the descender in both
scripts, so I can see why such a character would be under
consideration by those planning such a change.
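Ernest's point about symmetry can be illustrated with a toy Cyrillic-to-Latin converter. This is a minimal sketch only. The table covers a handful of lowercase letters chosen for illustration, and it is an assumption, not the 1999 correspondence table. Latin N WITH DESCENDER has no code point, so the counterpart of CYRILLIC SMALL LETTER EN WITH DESCENDER (U+04A3) is a configurable fallback. It uses one of the two substitutes mentioned in this thread: eng (U+014B) or n with tilde (U+00F1).

```python
# Illustrative subset only; NOT the full 1999 table.
CYR_TO_LAT = {
    "\u0430": "a",        # а
    "\u0442": "t",        # т
    "\u043D": "n",        # н
    "\u04D9": "\u0259",   # ә -> ə (schwa)
    "\u04E9": "\u0275",   # ө -> ɵ (barred o)
}

# No code point exists for Latin n with descender, so pick a stand-in.
EN_DESCENDER_FALLBACKS = {
    "eng": "\u014B",      # ŋ
    "tilde": "\u00F1",    # ñ
}


def transliterate(text: str, fallback: str = "eng") -> str:
    """Map Cyrillic letters via CYR_TO_LAT; leave unknown characters as-is."""
    table = dict(CYR_TO_LAT)
    table["\u04A3"] = EN_DESCENDER_FALLBACKS[fallback]  # ң
    return "".join(table.get(ch, ch) for ch in text)


print(transliterate("таң"))           # taŋ
print(transliterate("таң", "tilde"))  # tañ
```

Capital letters (for example U+04A2 Ң) would need their own entries. Keeping the fallback in one place means only one line needs to change if the preferred substitute changes.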

As for the merits of the proposed change back to Latin, I think
it is silly for Tatarstan to make the change and it is silly for the
Russian Federation to oppose it.





Re: Writing Tatar using the Latin script; new characters to encode?

2004-05-12 Thread Alexander Savenkov
Hello,

2004-05-12T03:08:59+03:00 Eric Muller <[EMAIL PROTECTED]> wrote:

> According to , there is currently an effort to convert the
> writing of Tatar from Cyrillic to Latin.

> 1. Does somebody have more information about that effort?

Perhaps it's their own effort.

> Eki lists four characters as needed but missing in Unicode (see 
> ).

> 2. The case pair for barred o is encoded (U+019F and U+0275), and it
> seems that their confusion comes from less-than-perfect but annotated
> name for U+019F, and from the usage remark "African". Can we 
> authoritatively tell them that those two characters are the ones they
> want? Can we add a "Tatar" usage remark to both?

Is there a need for this? You don't want to tell everyone on the net
about his or her wrong assumptions. There's one sentence in the page
you mentioned that gives a good description of this resource:

"The conversion from Cyrillic to Latin script is planned within years
2001-2011."

This is false.

> 3. The case pair n with descender is definitely not encoded, and from my
> memory of the discussion of ghe with descender, we would want to encode
> them as separate characters (rather than with combining descenders on
> "n"). Is anybody working on that proposal?

There's no Latin Tatar script. It's the law. Full stop.

It's the Institute of Estonian language. I hope they know more about
Estonian than about other languages and Unicode.

> PS: sorry for the double post to unicode and unicore. However, given the
> current state of [EMAIL PROTECTED], this seems the best course of action.

What's up with [EMAIL PROTECTED]?
-- 
  Alexander Savenkovhttp://www.xmlhack.ru/
  [EMAIL PROTECTED] http://www.xmlhack.ru/authors/croll/




Re: Writing Tatar using the Latin script; new characters to encode?

2004-05-12 Thread Anto'nio Martins-Tuva'lkin
On 2004.05.12, 00:08, Eric Muller <[EMAIL PROTECTED]> wrote:

> less-than-perfect but annotated name for U+019F, and from the usage
> remark "African". Can we authoritatively tell them that those two
> characters are the ones they want? Can we add a "Tatar" usage remark
> to both?

As easily as "we" can remove the misleading remark "Catalan" from
U+0140, I should think... >;-|





Re: Writing Tatar using the Latin script; new characters to encode?

2004-05-12 Thread Doug Ewell
Kenneth Whistler  wrote:

> And Eki should be notified that the statement on the site about
> the barred o's is incorrect.

They've got an interesting little site there, with lots of information
pertaining to both Unicode and 8-bit encodings, but some misinformation
as well.  In particular, I wish they wouldn't perpetuate the myth that
certain letters-with-diacritic "aren't encoded" when they just need to
be composed.  Unexplained blanket statements like:

"Yoruba precomposed characters were rejected by the Unicode Technical
Committee in 1996."

give the wrong impression altogether.
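Doug distinguishes two cases: a letter-with-diacritic that has no single code point, and one that cannot be represented at all. Normalization makes the difference clear. This minimal sketch uses Python's `unicodedata.normalize`. The example, an e carrying both a dot below and an acute accent, is an assumed Yoruba-style case rather than one taken from the eki.ee page.

```python
import unicodedata

# e + U+0323 COMBINING DOT BELOW + U+0301 COMBINING ACUTE ACCENT
decomposed = "e\u0323\u0301"
nfc = unicodedata.normalize("NFC", decomposed)

# NFC composes only as far as precomposed characters exist: e + dot below
# becomes U+1EB9, but no single code point carries both marks, so the
# acute stays a combining character. The text is still fully encoded.
print([f"U+{ord(c):04X}" for c in nfc])   # ['U+1EB9', 'U+0301']

# Canonical ordering makes the input order of the two marks irrelevant.
assert unicodedata.normalize("NFC", "e\u0301\u0323") == nfc
```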

>> 3. The case pair n with descender is definitely not encoded, and from
>> my memory of the discussion of ghe with descender, we would want to
>> encode them as separate characters (rather than with combining
>> descenders on "n").
>
> Yes. But why, oh why, do people do this to themselves, instead of
> just making use of the existing Latin letters for this: 014A/014B ?
>
> This is another recipe to wait years for their orthography to be
> supported by conventional software.

Amen.





RE: Writing Tatar using the Latin script; new characters to encode?

2004-05-11 Thread Ernest Cline

> [Original Message]
> From: Eric Muller <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Date: 5/11/2004 7:09:34 PM
> Subject: Writing Tatar using the Latin script; new characters to encode?
>
> According to , there is currently an effort to convert the 
> writing of Tatar from Cyrillic to Latin.

They also give some other Latin characters that are not in Unicode
at that site. [1]  Most of them have decompositions into a basic
Latin character with a combining mark (which are listed on the referenced
web page), but there are a few which do not.

Besides the N WITH DESCENDER and O WITH MIDDLE BAR that
Eric mentioned there are also G WITH TURNED COMMA ABOVE RIGHT
and O WITH TURNED COMMA ABOVE RIGHT.  They look like they might
be nothing more than the base letter followed by U+02BB MODIFIER
LETTER TURNED COMMA kerned so that the turned comma intrudes
into the space of the letter (but doesn't overlap) or it might show a need
for a new combining character COMBINING TURNED COMMA ABOVE RIGHT
that would be a companion to U+0315 COMBINING COMMA ABOVE RIGHT.
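Ernest's two readings differ in how the mark is classified in the character database. One is a spacing modifier letter and the other is a combining mark with a nonzero combining class. This minimal sketch uses Python's `unicodedata`. It deliberately omits the hypothetical COMBINING TURNED COMMA ABOVE RIGHT, since no code point for it is assumed here.

```python
import unicodedata

for ch in ("\u02BB", "\u0315"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          unicodedata.category(ch), unicodedata.combining(ch))
# U+02BB MODIFIER LETTER TURNED COMMA Lm 0
# U+0315 COMBINING COMMA ABOVE RIGHT Mn 232

# Option 1: base letter followed by the spacing modifier letter.
# Nothing composes, so placement relative to the g is up to the font.
g_turned = "g\u02BB"
assert unicodedata.normalize("NFC", g_turned) == g_turned
assert len(g_turned) == 2
```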

There is the proposed CYRILLIC GHE WITH DESCENDER on that page as well,
with no indication of its status as a proposed character.

In addition they have what they call CYRILLIC XA WITH HOOK, although the
hook looks more like a loop to me, and several other Cyrillic letters that
might be of interest.

[1] http://www.eki.ee/letter/chardata.cgi?ucode=e000-f8ff
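The eki.ee page in [1] covers U+E000..U+F8FF, the Private Use Area of the BMP. The characters listed there therefore have no standard identity and only render with fonts that follow the same private assignments. This minimal sketch flags such code points in text. `private_use_chars` is a hypothetical helper. It relies on the General Category `Co`, which also covers the supplementary private-use planes.

```python
import unicodedata


def private_use_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each private-use code point in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"
    ]


print(private_use_chars("ta\uE000a"))   # [(2, 'U+E000')]
print(private_use_chars("ta\u014Ba"))   # []
```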