Re: Panjabi or Punjabi?

2003-02-21 Thread Aditya Gokhale

Hello,
It is like 'Panama', where 'pa' carries a strong 'a' sound, and 'Sun',
where the vowel also sounds like 'a' but is spelled with 'u'. So both
spellings (Punjabi and Panjabi) are correct. As for usage, in India it is
spelled Punjabi and in Pakistan it is spelled Panjabi.

Aditya Gokhale.


- Original Message -
From: Roozbeh Pournader [EMAIL PROTECTED]
To: Unicode List [EMAIL PROTECTED]
Sent: Friday, February 21, 2003 1:30 PM
Subject: Panjabi or Punjabi?



 I just found a difference between online references on the official ISO 639
 name of the pa language.

 http://ftp.ics.uci.edu/pub/ietf/http/related/iso639.txt mentions:

 pa Punjabi

 while http://www.loc.gov/standards/iso639-2/englangn.html says:

 Panjabi pendjabi pan pa

 Which spelling is the official ISO 639 one?

 roozbeh






Re: Indic Devanagari Query

2003-01-29 Thread Aditya Gokhale

Hello,
Thanks for the reply. I will check the points you raised as far as the
font issues are concerned. We all know how jna, shra and ksh are formed in
Unicode and ISCII, but the point I wanted to make was that if we have to
sort / search / process data in Devanagari script, we have to keep track
of at least three characters and not one. This becomes tedious, though not
impossible. If a single code point were present, it would be very easy to
process.
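
To make the "three characters and not one" point concrete, here is a minimal
Python sketch. The code point sequences are the standard Unicode encodings of
the three conjuncts; the names CONJUNCTS, describe and split_units are
hypothetical helpers made up for illustration, not part of any library. It
shows one possible way to treat each conjunct as a single unit while
processing, assuming the text contains no extra joiner characters.

```python
import unicodedata

# The three conjuncts discussed above, as they are stored in Unicode text.
CONJUNCTS = {
    "jna": "\u091C\u094D\u091E",   # JA + VIRAMA + NYA
    "shra": "\u0936\u094D\u0930",  # SHA + VIRAMA + RA
    "ksh": "\u0915\u094D\u0937",   # KA + VIRAMA + SSA
}


def describe(text):
    """Return the Unicode character name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


def split_units(text, units=tuple(CONJUNCTS.values())):
    """Split text into processing units, keeping the listed conjuncts whole.

    Any code point that does not start a listed conjunct becomes its own unit.
    """
    out = []
    i = 0
    while i < len(text):
        for unit in units:
            if text.startswith(unit, i):
                out.append(unit)
                i += len(unit)
                break
        else:
            out.append(text[i])
            i += 1
    return out


if __name__ == "__main__":
    for name, seq in CONJUNCTS.items():
        print(name, len(seq), describe(seq))
    # KSH followed by MA: two units, although the string has four code points.
    print(split_units("\u0915\u094D\u0937\u092E"))
```

A sort or search routine could then compare the units returned by
split_units instead of raw code points, which is the bookkeeping I was
calling tedious.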
With regard to predicting the language by using some heuristic, in my
opinion it is a very risky solution, at least when I don't have much
information at stage one of my application. I am running an OCR engine on a
Devanagari page and then, based on the formatting, tagging the language. So
I think tagging, as I am doing right now, is a better solution. I also agree
with the views expressed by Asmus Freytag that if we go on including all
the 6000 languages, it will be practically impossible to cross-correlate
these 'code pages'.
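
As a rough illustration of the tagging approach, here is a small Python
sketch. It assumes each segment coming out of the OCR stage is stored with a
language tag next to the text; TaggedSegment and texts_for are hypothetical
names, and "hi", "mr" and "sa" are the ISO 639-1 codes for Hindi, Marathi and
Sanskrit. This is not my actual application code, only the idea.

```python
from dataclasses import dataclass


@dataclass
class TaggedSegment:
    lang: str  # ISO 639-1 code: "hi" (Hindi), "mr" (Marathi), "sa" (Sanskrit)
    text: str  # Devanagari text, stored as ordinary Unicode


def texts_for(segments, lang):
    """Return the text of every segment carrying the given language tag."""
    return [seg.text for seg in segments if seg.lang == lang]


if __name__ == "__main__":
    # The same Devanagari word can belong to more than one language;
    # only the tag tells the two segments apart.
    page = [
        TaggedSegment("hi", "\u092D\u093E\u0930\u0924"),
        TaggedSegment("mr", "\u092D\u093E\u0930\u0924"),
    ]
    print(texts_for(page, "mr"))
```

The language information travels with the text instead of being encoded in
the code points, so no new code pages are needed for this to work.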

-Aditya






Indic Devanagari Query

2003-01-28 Thread Aditya Gokhale



Hello Everybody, I have a few queries regarding the representation of
Devanagari script in Unicode (code page 0x0900 - 0x097F). Devanagari is a
writing script that is used for the Hindi, Marathi and Sanskrit languages.
I have the following questions -

1. In Marathi and Sanskrit, the glyphs of the two characters 'la' and 'sha'
are represented differently, as shown in the image below -

(First glyph is 'la' and second one is 'sha')
as compared to Hindi, where these character glyphs are represented as shown
in the image below -

(First glyph is 'la' and second one is 'sha')

In the same script code page, how do I use these two different glyphs to
represent the same character? Is there any way I can do it in an OpenType
font and a FreeType font implementation?

2. Implementation Query -
 In an implementation where I need to send / process Hindi, Marathi and
Sanskrit data, how do I differentiate between the languages? Say, for
example, I am writing a translation engine and I want to translate a
document containing Hindi, Marathi and Sanskrit text. How do I know from the
code points between 0x0900 and 0x097F whether the data under perusal is
Hindi / Marathi / Sanskrit?
 I would suggest that we give different code pages to Marathi, Hindi and
Sanskrit. Maybe the current code page of Devanagari could be treated as
Hindi, and two new code pages added for Marathi and Sanskrit. This could
solve these issues. If there is any better way of solving this, could
anyone suggest it?
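
The problem in question 2 can be sketched in a few lines of Python. The
function names here are hypothetical and the two sample sentences are just
short everyday phrases picked for illustration. A range check on the code
points can tell me that the text is Devanagari, but it gives the same answer
for a Hindi sentence and a Marathi sentence.

```python
DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F


def is_devanagari(ch):
    """True if the code point of ch lies in the range 0x0900 - 0x097F."""
    return DEVANAGARI_START <= ord(ch) <= DEVANAGARI_END


def all_devanagari(text):
    """True if every non-space character of text lies in that range."""
    return all(is_devanagari(ch) for ch in text if not ch.isspace())


if __name__ == "__main__":
    hindi = "\u092E\u0948\u0902 \u0918\u0930 \u091C\u093E\u0924\u093E \u0939\u0942\u0901"
    marathi = "\u092E\u0940 \u0918\u0930\u0940 \u091C\u093E\u0924\u094B"
    # Both checks succeed, so the range alone cannot tell the languages apart.
    print(all_devanagari(hindi), all_devanagari(marathi))
```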


3. Character codes for jna, shra, ksh -

In Sanskrit and Marathi, jna, shra and ksh are considered separate
characters and not ligatures. How do we take care of this? Can I get the
overall views of the group on the matter? In my opinion they should be given
different code points in the specific language code page.
Please find the character glyphs below -

jna

shra

ksh


thanks,
Aditya Gokhale.
GIST Research and Development Lab,
C-DAC Pune,
Maharashtra, India.

http://www.cdacindia.com/html/gist/gistidx.asp