Re: Continue:Glaring mistake in the code list for South Asian Script

2011-09-10 Thread Christopher Fynn

On 10/09/2011 04:53, delex r wrote:

I figure out that Unicode has not addressed the sovereignty issues of a 
language while trying to devise an ASCII like encoding system for almost all 
the characters and symbols used on earth.


Unicode encodes writing systems, not languages. It certainly has 
nothing to do with the sovereignty issues of a language, nor should it.


There are many characters encoded in the Latin blocks that were never 
used for writing the Latin language, and similarly there are characters 
encoded in the Arabic block that are used only for writing Persian, not Arabic.


The characters in the Bengali block that are used only for writing Assamese 
are a similar case. As long as you can type all the characters necessary for 
writing your language, don't worry about names.
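
A minimal sketch of the point about block names, assuming Python 3 and its
standard unicodedata module. The two sample characters are chosen here for
illustration and do not come from the thread: U+00FE sits in a Latin block
but was not used for writing Latin, and U+067E sits in the Arabic block but
is used for Persian (among other languages) rather than Arabic.

```python
import unicodedata

# Illustrative samples (an assumption of this sketch, not from the thread):
# characters whose formal names mention a script label that does not match
# the language they are actually used to write.
SAMPLES = ["\u00FE", "\u067E"]


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for ch in SAMPLES:
        print(describe(ch))
```

The names printed are identifiers for the characters; they say nothing about
which languages may use them.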










Re: Continue:Glaring mistake in the code list for South Asian Script

2011-09-10 Thread Petr Tomasek
On Sat, Sep 10, 2011 at 02:02:09AM +, Doug Ewell wrote:
 English, French, German, Dutch, Spanish, Italian, Portuguese, Swedish, and 
 Polish are all different languages. Each has its own pronunciation, 
 vocabulary, orthography, national identity, and rich literary tradition.
 
 Would you suggest that the letters used in each of these languages should be 
 encoded separately?

That would be the best :). People would stop pretending
they are able to read/write the language of the other one :

-- 
Petr Tomasek http://www.etf.cuni.cz/~tomasek
Jabber: but...@jabbim.cz


EA 355:001  DU DU DU DU
EA 355:002  TU TU TU TU
EA 355:003  NU NU NU NU NU NU NU
EA 355:004  NA NA NA NA NA






Re: Continue:Glaring mistake in the code list for South Asian Script

2011-09-10 Thread anbu
Hi Unicode Community!

I recommend that Unicode take this grievance into account. No single
consonant in this code range is used by only one language. Refer:

http://en.wikipedia.org/wiki/Eastern_Nagari_alphabet#Consonants

The Indian census of 1961 recognised 1,652 different languages in India
(including languages not native to the subcontinent). The 1991 census
recognizes 1,576 classified mother tongues. Refer:

http://en.wikipedia.org/wiki/Languages_of_India#Inventories.

The Eastern Nagari script is an Abugida system of writing belonging to the
Brahmic family of scripts whose use is associated with the Assamese,
Bengali, Bishnupriya Manipuri, Maithili, Mising, Meitei Manipuri, Sylheti,
and Chittagonian languages. Refer:

http://en.wikipedia.org/wiki/Eastern_Nagari_alphabet

The Bengali alphabet (Bengali: বাংলা লিপি bangla lipi or Bengali: বঙ্গলিপি
bôņgôlipi) is the writing system for the Bengali language. The same script
is the basis for the Assamese, Meitei, Bishnupriya Manipuri, Kokborok, Garo
and Mundari alphabets. All these languages are spoken in the eastern region
of South Asia. Refer:

http://en.wikipedia.org/wiki/Bengali_alphabet

I propose that Unicode rename this code range to Eastern Nagari
or East(ern) South Asian Script.

Regards,
Anbu Kaveeswarar Selvaraju

On Sat, 10 Sep 2011 02:44:59 +0200, Kent Karlsson
kent.karlsso...@telia.com wrote:
 On 2011-09-10 00:53, delex r del...@indiatimes.com wrote:
 
 I figure out that Unicode has not addressed the sovereignty issues of a
 language
 
 Which, I daresay, is irrelevant from a *character* encoding perspective.
 
 while trying to devise an ASCII like encoding system for almost all
 the characters and symbols used on earth. I am continuing with my
 observation of the glaring mistake done by Unicode by naming a South
 Asian Script as “Bengali”. Here I would like to give certain
 information that I think will be of some help for Unicode in its
 endeavour to faithfully represent a Universal Character encoding
 standard truer to even micro-facts.
 
 India is believed to have at least 1652 mother tongues out of which
 only 22
 
 One list of languages in India is given in
 http://www.ethnologue.com/show_country.asp?name=IN
 (I did not count the number of entries)
 
 are recognized by the Indian Constitution as official languages for
 administrative communication among local governments and to the
 citizens. And the constitution has not explicitly recognized any
 official script. As Unicode has listed the languages and scripts, the
 Indian Constitution has also listed
 
 Unicode does not list any languages at all. Ok, the CLDR subproject
 copies a list of language codes from the IANA language subtag registry,
 which (in a complex manner) takes its language codes from (among
 others) the ISO 639-3 registry, which largely is in sync with
 Ethnologue (as in the list above); but I guess that is not what you
 referred to.
 
 the official languages ( In its 8th schedule). The first entry in that
 list is the Assamese language.  Assamese is a sovereign language with
 its own grammar
 
 Which I don't think is in dispute at all.
 
 and “script” that contains some unique characters that you will not
 find in any of the scripts so far discovered by Unicode. At least 30
 million people
 
 Unicode (at this stage) does not do any discovery. Unicode and ISO/IEC
 10646 is driven by applications (proposals) to encode characters (and
 define properties of characters).
 
 call it the “Assamese Script” and if provided with computers and
 internet
 
 If you want to disunify the Bengali script (and characters) from
 Assamese, you need to show, in a proposal document, that they really
 are different scripts, and should not be unified as just different uses
 of the same script.
 
 connection can bomb the Unicode e-mail address with confirmations.
 These
 
 Hmm, an email bombing threat... I'm sure Sarasvati can find a way to
 block those (or we may all simply file them away as spam).
 
 characters are, I repeat, the one that is given a Hexcode 09F0  and the
 other with 09F1 by this universal character encoding system but
 unfortunately has described both as “Bengali” Ra etc. etc. I don’t know
 who has advised Unicode to use the tag “Bengali” to name the block that
 includes these two characters.
 
 If you are not an Indian then just google an image of an Indian
 Currency note. There on one side of the note you will find a box inside
 which the value of the currency note is written in words in at least 15
 scripts of official Indian languages.( I don’t know why it is not 22).
 At the top , the script is Assamese as Assamese is the first officially
 recognized language (script?) . Next below it you will find almost
 similar shapes. That is in Bengali. India officially recognises the
 distinction between these two scripts which although shaped similar but
 sounds very different at many points. And the standard
 
 Minor font differences is not a 

Re: Continue:Glaring mistake in the code list for South Asian Script

2011-09-10 Thread Doug Ewell

Anbu Kaveeswarar Selvaraju anbu at peoplestring dot com wrote:

The Bengali alphabet (Bengali: বাংলা লিপি bangla lipi or Bengali: বঙ্গলিপি
bôņgôlipi) is the writing system for the Bengali language. The same script
is the basis for the Assamese, Meitei, Bishnupriya Manipuri, Kokborok, Garo
and Mundari alphabets. All these languages are spoken in the eastern region
of South Asia. Refer:

http://en.wikipedia.org/wiki/Bengali_alphabet

I propose to Unicode that it renames this code range as Eastern Nagari
or East(ern) South Asian Script.


You're not listening.  Block names in Unicode do not denote languages.

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell





RE: Continue:Glaring mistake in the code list for South Asian Script

2011-09-10 Thread Peter Constable
Once a script is encoded, the reference name used in the Standard for the 
script becomes part of stable character identifiers that _cannot be changed_. 
This is not just Unicode policy; this is policy of ISO JTC1/SC2. The reference 
name Bengali for the script in question cannot be changed. The most that 
could be done would be to add a comment indicating that the script is also 
known as Eastern Nagari or that the script is used for Assamese, Manipuri, 
and other languages as well as the Bengali language. But, in fact, the Standard 
already says this--see TUS 6.0, section 9.2, page 985 
(http://www.unicode.org/versions/Unicode6.0.0/ch09.pdf):

quote

9.2 Bengali (Bangla)

Bengali: U+0980–U+09FF

The Bengali script is a North Indian script closely related to Devanagari. It
is used to write the Bengali language primarily in the West Bengal state and
in the nation of Bangladesh. In India and Bangladesh, the preferred name for
the script and the language is Bangla. The script is also used to write
Assamese in Assam and a number of other minority languages, such as
Bishnupriya Manipuri, Daphla, Garo, Hallam, Khasi, Mizo, Munda, Naga, Rian,
and Santali, in northeastern India.

/quote

If there is any reasonable revision to this informative text that you think 
would improve it, you should submit that feedback; you can do that using the 
online feedback mechanism at http://www.unicode.org/reporting.html.
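
The block range quoted above, and the formal names of the two characters
under discussion (09F0 and 09F1), can be checked mechanically. This is a
minimal sketch, assuming Python 3 and its standard unicodedata module; the
constant BENGALI_BLOCK and the helper names are made up for illustration.

```python
import unicodedata

# Block range as quoted from the Standard: U+0980..U+09FF (inclusive).
# BENGALI_BLOCK is a hypothetical name used only in this sketch.
BENGALI_BLOCK = range(0x0980, 0x0A00)


def in_bengali_block(ch: str) -> bool:
    """True if the character's code point lies in U+0980..U+09FF."""
    return ord(ch) in BENGALI_BLOCK


def formal_name(ch: str) -> str:
    """Return the formal Unicode character name (a stable identifier)."""
    return unicodedata.name(ch)


if __name__ == "__main__":
    for ch in ("\u09F0", "\u09F1"):
        print(f"U+{ord(ch):04X}", in_bengali_block(ch), formal_name(ch))
```

Both characters report a name beginning with BENGALI even though they are
used for Assamese; the name identifies the character and is not a statement
about language.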


Peter



Re: Continue:Glaring mistake in the code list for South Asian Script

2011-09-10 Thread Richard Wordingham
On Sat, 10 Sep 2011 12:33:47 +0600
Christopher Fynn chris.f...@gmail.com wrote:

 Characters only used for writing Assamese in the Bengali block is 
 similar. As long as you can type all the characters necessary for 
 writing your language, don't worry about names.

Actually, names sometimes matter.  If one is forced to use a pick list
when typing, it is helpful to see the name of the character if the pick
list displays the character poorly.  However, apart from a few
totally confusing howlers (especially in Lao), that is largely an
internationalisation issue.

In this context, though, it is probably best to mutter, 'Unicode
idiots call the Assamese script Bengali' rather than totally confuse
people.  (I presume the Assamese are happy with the concept that
Bengali uses the Assamese script.)

Secondly, some people need to be able to type other people's languages
- a great many people need to be able to type English!  I imagine Anbu
needs to work with several scripts.

Richard.



Re: Continue:Glaring mistake in the code list for South Asian Script

2011-09-10 Thread Mark E. Shoulson
On 09/09/2011 08:12 PM, Peter Constable wrote (responding to 
del...@indiatimes.com):

Thus, what you refer to as a glaring mistake is not a mistake at all when 
considered in relation to what the intent and usage within the Standard is--and 
what it is _not_.

More significantly, it doesn't even matter if it *is* a mistake.  
Bringing evidence and trying to prove that you are correct is not 
relevant.  Even if you are completely right and everyone can see it, 
Unicode *still* isn't going to change its names.  If they won't even 
correct a misspelling, a single-letter transposition, they are not going 
to make other changes.
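
A minimal sketch of what name stability looks like in practice, assuming
Python 3.3 or later and its standard unicodedata module. U+FE18 is used here
as one well-known character whose formal name contains a transposed letter;
the message does not say which character it has in mind, so treat the choice
as an assumption. The helper names are hypothetical.

```python
import unicodedata

# Assumption: U+FE18 is an illustrative example of a misspelled formal name;
# the message above does not identify a specific character.
MISSPELLED = "\uFE18"


def formal_name(ch: str) -> str:
    """Return the formal (immutable) character name."""
    return unicodedata.name(ch)


def lookup_by_corrected_alias() -> str:
    """Look the character up through its corrected alias.

    unicodedata.lookup() accepts name aliases since Python 3.3.
    """
    return unicodedata.lookup(
        "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET"
    )


if __name__ == "__main__":
    print(formal_name(MISSPELLED))
    print(lookup_by_corrected_alias() == MISSPELLED)
```

The misspelled name stays as the formal identifier; the correction is
published only as an additional alias.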


Stop trying to tell us why you are right.  We can concede that you are, 
and it doesn't matter.


~mark



RE: Continue:Glaring mistake in the code list for South Asian Script

2011-09-09 Thread Peter Constable
You appear to be assuming that Unicode lists languages. It does not. It deals 
with characters and scripts. As mentioned before, it does not attempt to 
document all possible and preferred ways to refer to characters or scripts; 
that is well beyond the scope, purpose and requirements. All that Unicode does 
is provide a standard and universally-available means of encoding 
text--whatever text for whatever language, and referred to by whatever 
communities in whichever ways they may choose. To achieve that, it must adopt 
_some_ name for characters and scripts for reference purposes so that 
implementers of the standard have some way to refer to those things 
unambiguously. But that does not at all mean that _everybody_ is assumed to use 
those same terms, or even to think of collections of characters in the same way 
that Unicode uses the notion of script.

With that in mind, Bengali is used in the Unicode standard purely as an 
unambiguous way to refer to a particular collection of characters that are 
related in history and current conventional usage (across multiple language 
communities) and that share certain graphic and behavioural characteristics. It 
is mainly historical coincidence that Bengali is the term used in the 
Standard; as Doug Ewell and John Jenkins explained in other mail, these terms 
were adopted within the Standard based on how such collections are most 
typically referred to in English-language discussion. The term is being used to 
reference a collection of characters--a script--and not a language, and there 
is no intent whatsoever to suggest that any particular language should be 
considered to have any particular status relative to any other language.

Thus, what you refer to as a glaring mistake is not a mistake at all when 
considered in relation to what the intent and usage within the Standard is--and 
what it is _not_.


Peter

-----Original Message-----
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of delex r
Sent: Friday, September 09, 2011 3:54 PM
To: unicode@unicode.org
Subject: Continue:Glaring mistake in the code list for South Asian Script

I figure out that Unicode has not addressed the sovereignty issues of a 
language while trying to devise an ASCII like encoding system for almost all 
the characters and symbols used on earth. I am continuing with my observation 
of the glaring mistake done by Unicode by naming a South Asian Script as 
“Bengali”. Here I would like to give certain information that I think will be 
of some help for Unicode in its endeavour to faithfully represent a Universal 
Character encoding standard truer to even micro-facts.

India is believed to have at least 1652 mother tongues out of which only 22 are 
recognized by the Indian Constitution as official languages for administrative 
communication among local governments and to the citizens. And the constitution 
has not explicitly recognized any official script. As Unicode has listed the 
languages and scripts, the Indian Constitution has also listed the official 
languages ( In its 8th schedule). The first entry in that list is the Assamese 
language.  Assamese is a sovereign language with its own grammar and “script” 
that contains some unique characters that you will not find in any of the 
scripts so far discovered by Unicode. At least 30 million people call it the 
“Assamese Script” and if provided with computers and internet connection can 
bomb the Unicode e-mail address with confirmations. These characters are, I 
repeat, the one that is given a Hexcode 09F0  and the other with 09F1 by this 
universal character encoding system but unfortunately has described both as 
“Bengali” Ra etc. etc. I don’t know who has advised 
Unicode to use the tag “Bengali” to name the block that includes these two 
characters. 

If you are not an Indian then just google an image of an Indian Currency note. 
There on one side of the note you will find a box inside which the value of the 
currency note is written in words in at least 15 scripts of official Indian 
languages.( I don’t know why it is not 22). At the top , the script is Assamese 
as Assamese is the first officially recognized language (script?) . Next below 
it you will find almost similar shapes. That is in Bengali. India officially 
recognises the distinction between these two scripts which although shaped 
similar but sounds very different at many points. And the standard assamese 
alphabet set has extra characters which are never bengali just like London is 
never in Germany.

Coming again to the Hexcodes 09F0 (Raw) and 09F1 (wabo). Both have nothing 
Bengali in them and interestingly 09F1 ( sounds WO or WA when used within 
words) has even nothing ‘Ra’ sound in it. Thus you know, with actual Bengali 
alphabet set one can’t write anything to produce the sound “Watt” as in James 
Watt and instead need to combine three alphabets but even then only to sound  
like “ OOYAT “ in Bengali itself. 

Therefore 

Re: Continue:Glaring mistake in the code list for South Asian Script

2011-09-09 Thread Kent Karlsson

On 2011-09-10 00:53, delex r del...@indiatimes.com wrote:

 I figure out that Unicode has not addressed the sovereignty issues of a
 language

Which, I daresay, is irrelevant from a *character* encoding perspective.

 while trying to devise an ASCII like encoding system for almost all
 the characters and symbols used on earth. I am continuing with my observation
 of the glaring mistake done by Unicode by naming a South Asian Script as
 “Bengali”. Here I would like to give certain information that I think will be
 of some help for Unicode in its endeavour to faithfully represent a Universal
 Character encoding standard truer to even micro-facts.
 
 India is believed to have at least 1652 mother tongues out of which only 22

One list of languages in India is given in
http://www.ethnologue.com/show_country.asp?name=IN
(I did not count the number of entries)

 are recognized by the Indian Constitution as official languages for
 administrative communication among local governments and to the citizens. And
 the constitution has not explicitly recognized any official script. As Unicode
 has listed the languages and scripts, the Indian Constitution has also listed

Unicode does not list any languages at all. Ok, the CLDR subproject copies a
list of language codes from the IANA language subtag registry, which (in a
complex manner) takes its language codes from (among others) the ISO 639-3
registry, which largely is in sync with Ethnologue (as in the list above);
but I guess that is not what you referred to.

 the official languages ( In its 8th schedule). The first entry in that list is
 the Assamese language.  Assamese is a sovereign language with its own grammar

Which I don't think is in dispute at all.

 and “script” that contains some unique characters that you will not find in
 any of the scripts so far discovered by Unicode. At least 30 million people

Unicode (at this stage) does not do any discovery. Unicode and ISO/IEC
10646 are driven by applications (proposals) to encode characters (and define
properties of characters).

 call it the “Assamese Script” and if provided with computers and internet

If you want to disunify the Bengali script (and characters) from Assamese,
you need to show, in a proposal document, that they really are different
scripts, and should not be unified as just different uses of the same
script.

 connection can bomb the Unicode e-mail address with confirmations. These

Hmm, an email bombing threat... I'm sure Sarasvati can find a way to block
those (or we may all simply file them away as spam).

 characters are, I repeat, the one that is given a Hexcode 09F0  and the other
 with 09F1 by this universal character encoding system but unfortunately has
 described both as “Bengali” Ra etc. etc. I don’t know who has advised
 Unicode to use the tag “Bengali” to name the block that includes these two
 characters. 
 
 If you are not an Indian then just google an image of an Indian Currency note.
 There on one side of the note you will find a box inside which the value of
 the currency note is written in words in at least 15 scripts of official
 Indian languages.( I don’t know why it is not 22). At the top , the script is
 Assamese as Assamese is the first officially recognized language (script?) .
 Next below it you will find almost similar shapes. That is in Bengali. India
 officially recognises the distinction between these two scripts which although
 shaped similar but sounds very different at many points. And the standard

Minor font differences are not a reason for disunification. Different
pronunciations of the same letters are not a reason for disunification
either. Just think of how many different ways Latin letters (and letter
combinations) are pronounced in different languages (x, j, h, v, w, f, ...;
even "a" is pronounced differently in British English vs. US English,
and that is within the same language...; and most orthographies aren't
very accurately phonetic anyway, with quite a bit of varying (contextual
and dialectal) pronunciation for the letters).

 assamese alphabet set has extra characters which are never bengali just like
 London is never in Germany.

There are 8 Londons in the USA, two in Canada, one in Kiribati, ... ;-)
(http://en.wikipedia.org/wiki/London_(disambiguation))

 Coming again to the Hexcodes 09F0 (Raw) and 09F1 (wabo). Both have nothing
 Bengali in them and interestingly 09F1 ( sounds WO or WA when used within
 words) has even nothing ‘Ra’ sound in it. Thus you know, with actual Bengali
 alphabet set one can’t write anything to produce the sound “Watt” as in James
 Watt and instead need to combine three alphabets but even then only to sound
 like “ OOYAT “ in Bengali itself.

Yes, English has a rather peculiar pronunciation for the letter W... ;-)
Several languages will pronounce Watt (without changing the spelling) as
Vatt, and regard that as a normal pronunciation of Watt.

 Therefore Unicode must consider terming the block range as “Assamese” 

Re: Continue:Glaring mistake in the code list for South Asian Script

2011-09-09 Thread Doug Ewell
English, French, German, Dutch, Spanish, Italian, Portuguese, Swedish, and 
Polish are all different languages. Each has its own pronunciation, vocabulary, 
orthography, national identity, and rich literary tradition.

Would you suggest that the letters used in each of these languages should be 
encoded separately?

--
Doug Ewell • d...@ewellic.org
Sent via BlackBerry by ATT