Re: Devanagari

2002-01-20 Thread Aman Chawla



- Original Message -
From: "David Starner" <[EMAIL PROTECTED]>
To: "Aman Chawla" <[EMAIL PROTECTED]>
Cc: "James Kass" <[EMAIL PROTECTED]>; "Unicode" <[EMAIL PROTECTED]>
Sent: Monday, January 21, 2002 12:19 AM
Subject: Re: Devanagari

> What's your point in continuing this? Most of the people on this list
> already know how UTF-8 can expand the size of non-English text.

The issue was originally brought up to gather opinion from members of this
list as to whether UTF-8 or ISCII should be used for creating Devanagari
web pages. The point is not to criticise Unicode but to gather opinions of
informed persons (list members) and determine what is the best encoding for
information interchange in South-Asian scripts...


Re: Devanagari

2002-01-20 Thread Aman Chawla

- Original Message -
From: "James Kass" <[EMAIL PROTECTED]>
To: "Aman Chawla" <[EMAIL PROTECTED]>; "Unicode"
<[EMAIL PROTECTED]>
Sent: Monday, January 21, 2002 12:46 AM
Subject: Re: Devanagari


> 25% may not be 300%, but it isn't insignificant.  As you note, if the
> mark-up were removed from both of those files, the percentage of
> increase would be slightly higher.  But, as connection speeds continue
> to improve, these differences are becoming almost minuscule.

With regard to South Asia, where the most widely used modems are approx. 14
kbps, maybe some 36 kbps and rarely 56 kbps, and where broadband/DSL is
mostly unheard of, efficiency in data transmission is of paramount
importance... How can we convince the South Asian user to create websites in
an encoding that would make his client's 14 kbps modem as effective (rather,
ineffective) as a 4.6 kbps modem?
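
The arithmetic behind the 4.6 kbps figure can be sketched as below. This is only a rough model: it assumes the threefold size increase claimed for plain Devanagari text in UTF-8, and it ignores markup, images, and modem compression. The function name effective_rate_kbps is hypothetical.

```python
def effective_rate_kbps(line_rate_kbps: float, expansion_factor: float) -> float:
    """Rate at which the original content arrives when every byte of it
    is expanded by expansion_factor on the wire (assumed model)."""
    if expansion_factor <= 0:
        raise ValueError("expansion_factor must be positive")
    return line_rate_kbps / expansion_factor


if __name__ == "__main__":
    # Modem speeds mentioned in the message, with the assumed 3x expansion.
    for rate in (14.0, 36.0, 56.0):
        effective = effective_rate_kbps(rate, 3.0)
        print(f"{rate:>4} kbps line -> {effective:.2f} kbps effective")
```

With a smaller expansion factor, such as the 1.44 measured on the "What is Unicode?" pages, the same function gives a much milder slowdown.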





Re: Devanagari

2002-01-20 Thread Aman Chawla

Taking the extra links into account, the sizes are:
English: 10.4 KB
Devanagari: 15.0 KB
Thus the Dev. page is 1.44 times the Eng. page. For sites providing archives
of documents/manuscripts (in plain text) in Devanagari, this factor could be
as high as approx. 3 using UTF-8 and around 1 using ISCII.
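
A small sketch of why the factor falls between 1 and 3 depending on how much ASCII markup a page carries. It assumes an 8-bit encoding costs one byte per code point, which only approximates ISCII (Python's standard library has no ISCII codec); the helper names are hypothetical.

```python
def utf8_size(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def one_byte_size(text: str) -> int:
    """Assumed size in an 8-bit encoding: one byte per code point.

    This is a stand-in for ISCII, not a real ISCII encoder.
    """
    return len(text)


def size_ratio(markup: str, content: str) -> float:
    """UTF-8 size divided by the assumed 8-bit size for markup plus content."""
    page = markup + content
    return utf8_size(page) / one_byte_size(page)


if __name__ == "__main__":
    content = "\u0915\u0916\u0917"  # three Devanagari letters: ka, kha, ga
    print(size_ratio("", content))         # plain Devanagari text only
    print(size_ratio("<p></p>", content))  # same text wrapped in ASCII markup
```

Spaces, digits, and punctuation are ASCII as well, so even plain running text would come out somewhat below the worst case.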

- Original Message -
From: "James Kass" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Sunday, January 20, 2002 11:01 PM
Subject: Re: Devanagari


>
> Doug Ewell wrote,
>
> >
> > I think before worrying about the performance and storage effect on Web
> > pages due to UTF-8, it might help to do some profiling and see what the
> > actual impact is.
> >
>
> The "What is Unicode?" pages offer a quick study.
>
> 14808 bytes (English)
> 15218 bytes (Hindi)
> 10808 bytes (Danish)
> 11281 bytes (French)
>  9682 bytes (Chinese Trad.)
>
> (The English page includes links to all the other scripts, but the
> individual script pages only link back to the English page.  So, the
> English page is a bit larger than the other pages for this reason, not a
> fair test if we only count the English and Hindi pages.)
>
> The Unicode logo gif at the top left corner of each of these pages takes
>  bytes.  A screen shot of the beginning of the Hindi page takes
> 37569 bytes as a gif, the small portion cropped and attached takes
> 4939 bytes.
>
> The "What is Unicode?" pages are at:
> http://www.unicode.org/unicode/standard/WhatIsUnicode.html
>
> Best regards,
>
> James Kass.
>
>


Re: Devanagari

2002-01-20 Thread Aman Chawla

> The fact that UTF-8 economizes on the storage for ASCII characters, is a
> benefit for *all* HTML users, as the HTML syntax is entirely in ASCII and
> claims a significant fraction of the data.

> A UTF-8 encoded HTML file, will therefore have (percentage-wise) less
> overhead for Devanagari as claimed. Add to that James' observation on
> graphics files, many of which accompany even the simplest HTML documents
> and you get a percentage difference between the sizes of an English and
> Devanagari website (i.e. in its entirety) that's well within the
> fluctuation of the typical length in characters, for expressing the same
> concept in different languages.

The point was that a UTF-8 encoded HTML file for an English web page
carrying say 10 gifs would have a file size one-third that for a Devanagari
web page with the same no. of gifs - even if you take into account the
fluctuation of the typical length in characters, for expressing the same
concept in different languages. This is because in some cases one language
may express a concept more compactly while in other cases it may not, and on
the whole this effect would balance out and can therefore be neglected.
Therefore transmission of a Devanagari web page over a network would take
thrice as long as that of an English web page using the same images and
presenting the same information.






Devanagari Rupee Symbol

2002-01-20 Thread Aman Chawla



I am unable to find the Devanagari Rupee sign encoded in Unicode. Is it
encoded? If not, why?


Devanagari

2002-01-19 Thread Aman Chawla




I would be grateful if I could get opinions on 
the following: 
 
1. Which encoding/character set is most suitable 
for using Hindi/Marathi (both of which use Devanagari) on the internet as well 
as in databases, and why? In your response, please refer to: http://www.iiit.net/ltrc/Publications/iscii_plugin_display.html, particularly 
the following paragraphs:

"Many people hope that the standardization problem will get solved because of 
Unicode. However there is an issue of transmission efficiency. The transmission 
cost for Indian languages will be three times that of English! The real culprit 
being UTF-8. UTF-8 converts Unicode two-byte codes to byte sequence of one to 
four bytes. In the process they make sure that ASCII part of the Unicode is 
transmitted as single byte only. So for a language like English which uses only 
0-127 part of the code there is no overhead. European languages use only a few 
character codes in the region 128-255 in addition to 0-127 part. So in the case 
of the European languages the transmission of this portion may incur some 
overhead say of the order of 10%. 
In contrast to above cases Indian languages use no part of the code in region 
0-127. Secondly Indian character codes occupy less than 127 codes for each 
language. So what could have been transmitted in one byte if one uses ASCII will 
be transmitted in a sequence of two to four bytes. This amounts to extra 
overhead of 200%!"

2. Related to question 1, what can be done to 
encourage/force the use of a standardised encoding for Devanagari on the 
Internet? 
 
3. With reference to the previous question, can programs that convert the
myriad Devanagari encodings in use today to a standard encoding (question 1)
be made freely available, and how?
 
4. Is there any search engine on the internet that 
maintains an up to date index of sites in Devanagari? If not, what can be done 
to encourage proprietary search engines to support Hindi? Google supposedly has 
a Hindi language option, but surprise, it's in Roman script! Several emails to 
them have elicited the response: "At the moment we don't support 
Devanagari..."
 
Thanks,
 
Aman Chawla
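
The byte counts in the paragraph quoted under question 1 can be checked directly. The sketch below measures the UTF-8 length of each code point in a range; as far as the encoding rules go, every code point in the Devanagari block (U+0900 to U+097F) takes exactly three bytes, rather than "two to four". The helper names are hypothetical.

```python
def utf8_len(ch: str) -> int:
    """Bytes needed to encode a single character in UTF-8."""
    return len(ch.encode("utf-8"))


def block_utf8_lengths(first: int, last: int) -> set:
    """Set of distinct UTF-8 lengths across an inclusive code point range."""
    return {utf8_len(chr(cp)) for cp in range(first, last + 1)}


if __name__ == "__main__":
    print("ASCII      U+0000..U+007F:", block_utf8_lengths(0x0000, 0x007F))
    print("Latin-1    U+0080..U+00FF:", block_utf8_lengths(0x0080, 0x00FF))
    print("Devanagari U+0900..U+097F:", block_utf8_lengths(0x0900, 0x097F))
```

This covers plain text only; spaces, digits, and any HTML markup in a Devanagari page remain single-byte ASCII.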


Unicode Devanagari Range

2002-01-16 Thread Aman Chawla



This is with reference to the Unicode Devanagari (Hindi) range. Is there a
way to overcome/override the automatic glyph substitution that occurs when
one types a pure consonant (e.g. 0926 द) + halant (094D ्) + another
consonant (0918 घ)?

When one types this sequence, one gets a combined glyph द्घ that is extremely
difficult to read and is often avoided in Devanagari printing for the sake
of legibility. This glyph can easily be mistaken for द्ध, which is
(द + ् + ध). In all Devanagari newspapers, the sequence indicated above is
printed as it is, without substituting the complex combined glyph, to
minimise confusion.
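
One approach described in the Unicode Standard is to place U+200C ZERO WIDTH NON-JOINER directly after the halant, which asks the renderer to show the explicit halant form instead of the conjunct. The sketch below only builds and inspects the two sequences; whether the result displays as intended depends on the font and rendering engine, which is an assumption here. The variable names are hypothetical.

```python
import unicodedata

DA = "\u0926"      # DEVANAGARI LETTER DA
HALANT = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)
GHA = "\u0918"     # DEVANAGARI LETTER GHA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

# Default sequence: renderers may substitute the combined glyph.
conjunct_form = DA + HALANT + GHA

# ZWNJ after the halant requests the explicit halant form.
explicit_halant_form = DA + HALANT + ZWNJ + GHA


def describe(text: str) -> list:
    """List each code point in the text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for label, text in (("conjunct", conjunct_form),
                        ("explicit halant", explicit_halant_form)):
        print(label)
        for line in describe(text):
            print("  ", line)
```

Note that the two strings are different code point sequences, so searching or comparing text may need to ignore the ZWNJ.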


Unicode Search Engines

2002-01-16 Thread Aman Chawla



Are there any search engines at all at present 
which allow one to search sites encoded in UTF-8? If not, are there plans to 
build such search engines? For example, is Google going to implement such an 
engine?
 
Aman Chawla


Re: Hindi characters for transcribing the sound "e"

2002-01-15 Thread Aman Chawla



> > This is the kind of thing I am looking for: a 'special composite matra'
> > to write a new sound in Hindi, imported from English.
>
> I don't believe it exists. But what is your goal?  Trying to give an idea
> of how English is spoken to Hindi readers? I'm not sure a new or very rare
> character would really help.

My goal is to accurately transcribe English words such as 'get', 'bed' etc. 
into Hindi. Just as for Bengali a special character can be used to represent a 
sound not present in the language, similarly there should be (hopefully) a 
special character for this English sound. 
 
Also are there any words in Hindi that use the ऎ 
DEVANAGARI LETTER SHORT E or its corresponding diacritic mark ॆ? 
I personally have never come across one. Maybe this diacritic gives the sound of 
the "e" in bed or led?
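
The code points under discussion can be looked up by name with Python's standard unicodedata module, as a rough substitute for an online hex/name converter. This sketch assumes the diacritic shown above is U+0946; it identifies characters only and says nothing about how they are pronounced. The helper name describe is hypothetical.

```python
import unicodedata


def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME' for a single code point."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"


if __name__ == "__main__":
    # Independent letters and dependent vowel signs mentioned in the thread.
    for cp in (0x090E, 0x0946, 0x0910, 0x0948):
        print(describe(cp))
```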

  - Original Message -
  From: Patrick Andries
  To: Aman Chawla
  Cc: Unicode
  Sent: Tuesday, January 15, 2002 7:10 PM
  Subject: Re: Hindi characters for transcribing the sound "e"

  Aman Chawla wrote:

  > Thanks for the response Patrick. I understand your last sentence: the
  > closest you can come to /ɛ/ is using ै

  Yes, and I believe there is variability in the pronunciation of this
  grapheme among Hindi speakers. As mentioned, some authors say it is an
  /ɛ/ (open), some say it is a diphthong (such as English "rail"). There is
  nothing strange about this. Compare the pronunciation given on these two
  different sites: http://www.avashy.com/script/greendemo1.html (the woman
  pronounces the letter in isolation differently from the man, but both say
  /ɛ/ in aisâ) and the diphthong produced here:
  http://faculty.maxwell.syr.edu/jishnu/101/alphabet/sounds/018ei.wav
  (found on
  http://faculty.maxwell.syr.edu/jishnu/101/alphabet/default.asp?section=0).

  You say it is /ae/ (I take it) as in "shall"; this is corroborated by
  William Bright (op. cit.), but Ohala writes in her article that /ae/ only
  occurs in English loan words such as "bat" (cricket bat)... Knowing
  French phonology and its own diversity quite well, I would assume the same
  applies to Hindi: the same letters are pronounced differently in different
  regions or even social classes.

  > However, in the response given to the following FAQ:
  > http://www.unicode.org/unicode/faq/indic.html#13 you will find this
  > sentence: "This zophola_aa can be seen as a special "composite" matra to
  > write a new Bengali sound, imported from English."
  >
  > This is the kind of thing I am looking for: a 'special composite matra'
  > to write a new sound in Hindi, imported from English.

  I don't believe it exists. But what is your goal?  Trying to give an idea
  of how English is spoken to Hindi readers? I'm not sure a new or very rare
  character would really help.

  > Mark Davis suggests that: "I just checked with the ICU online demo at
  > http://oss.software.ibm.com/cgi-bin/icu/tr , and "e" is transliterated
  > as U+090E "ऎ" DEVANAGARI LETTER SHORT E*. "

  One has to distinguish between transcription and transliteration. A
  transliteration only allows one to preserve the original spelling in the
  absence of the original alphabet. It does not indicate how this letter
  should be pronounced (see the various pronunciations of the English "e" in
  "we", "red", "the", "new", "bottle/some", "clerk"), and this was your
  original question: "how do I represent in Devanâgarî the English SOUND
  found in "red", "bed"". A transliteration is of no help, a transcription
  is.

  Patrick A.


Re: Hindi characters for transcribing the sound "e"

2002-01-15 Thread Aman Chawla



Thanks for the response Patrick. I understand your last sentence: the
closest you can come to /ɛ/ is using ै

However, in the response given to the following FAQ:
http://www.unicode.org/unicode/faq/indic.html#13 you will find this
sentence: "This zophola_aa can be seen as a special "composite" matra to
write a new Bengali sound, imported from English." This is the kind of thing
I am looking for: a 'special composite matra' to write a new sound in Hindi,
imported from English.

Mark Davis suggests that: "I just checked with the ICU online demo at
http://oss.software.ibm.com/cgi-bin/icu/tr, and "e" is transliterated as
U+090E "ऎ" DEVANAGARI LETTER SHORT E*. "


Re: Hindi characters for transcribing the sound "e"

2002-01-14 Thread Aman Chawla



Yes, I am a native speaker. The Hindi word for dirt is मैल. The vowel in
this sounds like the vowel in the English word 'shall' (also like the first
vowel sound in the English word 'rally') and not at all like the vowel in
'bed' or 'red'. However, the Hindi word for harmony, which is मेल, has a
vowel which does sound like the vowel sound in the English word 'bake'. In
any case, the vowel sound that I am talking about is neither the one in
'shall' nor the one in 'bake', but rather the one in 'bed', 'red', 'said',
etc.

  - Original Message -
  From: Patrick Andries
  To: Aman Chawla
  Sent: Tuesday, January 15, 2002 1:42 AM
  Subject: Re: Hindi characters for transcribing the sound "e"

  Do you speak Hindi? Does the word for dirt have a vowel that sounds like
  bed/red? Does the word for harmony have one that sounds like bake? How do
  they sound to you?

  If these pairs of words do not sound alike, Manjari Ohala is wrong in his
  article about Hindi phonetics in the Handbook of the International
  Phonetic Association.

  Aman Chawla wrote:

  > Actually, I am not talking about the sound in hay or bake or the Hindi
  > words for dirt or harmony. Rather, the sound in bed, red, dead, led, fed,
  > said, etc.
  >
  >   - Original Message -
  >   From: Patrick Andries
  >   To: Aman Chawla
  >   Sent: Monday, January 14, 2002 10:24 PM
  >   Subject: Re: Hindi characters for transcribing the sound "e"
  >
  >   Aman Chawla wrote:
  >
  >   > With reference to the FAQ:
  >   > http://www.unicode.org/unicode/faq/indic.html#13 , I would like to
  >   > know what are the Hindi characters used to transcribe the sound "e"
  >   > (as in English "bet", "bed", "red" etc.) in Unicode.
  >   >
  >   > Thanks
  >
  >   English vowels, I'm not too sure about them. Let's see, are you
  >   speaking of the sound "e" (the open mid-front unrounded vowel) as in
  >   Hindi /mƐl/ ("dirt" according to my sources) and not /mel/
  >   ("harmony")? I believe it is often transliterated "ai" and could be
  >   transcribed back in Hindi with ऐ or ै (U+0910, U+0948 as a diacritic)
  >   although it is originally a diphthong. The English closed mid-front
  >   unrounded vowel /e/ (as in hay or bake) would be transcribed with a
  >   U+090E ऎ.
  >
  >   Patrick Andries


Re: Hindi characters for transcribing the sound "e"

2002-01-14 Thread Aman Chawla



The demo doesn't seem to be particularly reliable. For instance, the
following English words all have the same vowel sound: red, said, dead, led,
shed, fed. However, the demo gave the following Latin-Devanagari outputs:
रॆद् , सैद् , दॆअद् , लॆद् , शॆद् , फ़ॆद्

First of all, the ending 'd' sound in all the English words is ड् and not
द् as given by the demo. Secondly, though 'said' and 'red' have the same
vowel sound (not character, but sound), the demo gave two different Hindi
diacritics. Hindi is phonetic and so each diacritic has one and only one
sound.

I am looking for a transcription (by sound) of the "e" sound in bed, red,
get etc. into a Hindi character.

  - Original Message -
  From: Mark Davis
  To: Aman Chawla; Unicode
  Sent: Monday, January 14, 2002 9:04 PM
  Subject: Re: Hindi characters for transcribing the sound "e"

  There are two different processes: transliteration (which is by letter)
  and transcription (which is by sound).

  If transliteration is what you mean, I just checked with the ICU online
  demo at http://oss.software.ibm.com/cgi-bin/icu/tr, and "e" is
  transliterated as U+090E "ऎ" DEVANAGARI LETTER SHORT E*. ICU
  transliteration for Devanagari is based on ISCII (for the exact
  composition, see the last section of
  http://oss.software.ibm.com/icu/userguide/Transliteration.html, called
  "Script Transliteration Sources").
   
  Mark
   
  * I use the demo fairly often simply to get hex 
  converted to and from characters, and characters converted to and from 
  names.
   
  —
   
  Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο πάντα — Ὁμήρου Μαργίτῃ[For 
  transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]
   
  http://www.macchiato.com
  
- Original Message -
From: Aman Chawla
To: Unicode
Sent: Monday, January 14, 2002 05:48
Subject: Hindi characters for transcribing the sound "e"

With reference to the FAQ: http://www.unicode.org/unicode/faq/indic.html#13, 
I would like to know what are the Hindi characters used to transcribe the 
sound "e" (as in English "bet", "bed", "red" etc.) in Unicode.
Thanks


Re: Hindi characters for transcribing the sound "e"

2002-01-14 Thread Aman Chawla



Actually, I am not talking about the sound in hay 
or bake or the Hindi words for dirt or harmony. Rather, the sound in bed, red, 
dead, led, fed, said, etc.

  - Original Message -
  From: Patrick Andries
  To: Aman Chawla
  Sent: Monday, January 14, 2002 10:24 PM
  Subject: Re: Hindi characters for transcribing the sound "e"

  Aman Chawla wrote:

  > With reference to the FAQ:
  > http://www.unicode.org/unicode/faq/indic.html#13 , I would like to know
  > what are the Hindi characters used to transcribe the sound "e" (as in
  > English "bet", "bed", "red" etc.) in Unicode.
  >
  > Thanks

  English vowels, I'm not too sure about them. Let's see, are you speaking
  of the sound "e" (the open mid-front unrounded vowel) as in Hindi /mƐl/
  ("dirt" according to my sources) and not /mel/ ("harmony")? I believe it
  is often transliterated "ai" and could be transcribed back in Hindi with
  ऐ or ै (U+0910, U+0948 as a diacritic) although it is originally a
  diphthong. The English closed mid-front unrounded vowel /e/ (as in hay or
  bake) would be transcribed with a U+090E ऎ.

  Patrick Andries


Hindi characters for transcribing the sound "e"

2002-01-14 Thread Aman Chawla



With reference to the FAQ: http://www.unicode.org/unicode/faq/indic.html#13, 
I would like to know what are the Hindi characters used to transcribe the sound 
"e" (as in English "bet", "bed", "red" etc.) in Unicode.
Thanks