Re: Bengali Script

2010-07-27 Thread Tulasi
> the wrong way by not doing proper research yourself first.

I am looking for basic stuff: two lists that are already available.
Maybe you can educate me on how Unicode discovered the Bengali script and
what research it did; let's add that to the objectives as well.

(a) JPG image of letters/symbols/cascaded-conjuncts as per GOB-standard
(b) JPG image of letters/symbols/cascaded-conjuncts as per WBG-standard
(c) What research Unicode did leading to its Bengali script encoding.

> How about approaching said organisations and trying to get
> a copy of the standards?

I will write to them as well if I get a contact address. But GOB and WBG
are Unicode's Institutional members. This is the right spot to get both
requested items. Just two JPG images - GOB-standard and WBG-standard.

I am hoping Prof Pandey will email me WBG-standard and hoping Javier
will get GOB-standard (see appended email).

> rubbing a lot of people on this list

Not true!
I did/do not rub anybody.

Tulasi

From: Jeroen Ruigrok van der Werven 
Date: Tue, 13 Jul 2010 10:16:17 +0200
Subject: Re: Bengali Script
To: Tulasi 
Cc: Unicode Discussion 

-On [20100713 05:05], Tulasi (tulas...@gmail.com) wrote:
>So I needed 2 list of letters/symbols including cascaded conjuncts,
>one GOB-standard and the other WBG-standard.

How about approaching said organisations and trying to get a copy of the
standards?

I'm sorry, you probably mean well, but at the moment I get the distinct
impression that your many questions are rubbing a lot of people on this list
the wrong way by not doing proper research yourself first.

-- 
Jeroen Ruigrok van der Werven  / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Who looks under the surface does so at his own risk...


From: Tulasi 
Date: Fri, 16 Jul 2010 14:43:47 -0700
Subject: Re: Bengali Script
To: Javier Sola 
Cc: unicode@unicode.org, pan...@umich.edu

> I work with the Bangla Academy (note the word Bangla in the English
> name of the academy), the Ministry of Information Technology,
> the Office of the Prime Minister and the Ministry of Education.

Javier can certainly help us get a copy of JPG images of list of all
letters/symbols including cascaded conjuncts as per Government of
Bangladesh (GOB) standard.

So Javier, can you help get a copy?

We are fortunate to have someone like him on board who worked with
Ministry of Education and Office of the Prime Minister.

I have written to Prof Pandey, a knowledgeable Bengali.
I found his work while googling
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3311.pdf

Any comment on his work?

I can post to the group if JPG image I am expecting from Prof Pandey
is less than 50K :-')

Tulasi


From: Javier Sola 
Date: Fri, 09 Jul 2010 14:53:40 +0700
Subject: Re: Bengali Script
To:
Cc: unicode@unicode.org

I have been doing localization of software to Bangla for several years.
I work with the Bangla Academy (note the word Bangla in the English name
of the academy), the Ministry of Information Technology, the Office of
the Prime Minister and the Ministry of Education. Among other things we
are working on the standardization of computer language in Bangla.

> trimmed off <

Javier


From: Tulasi 
Date: Thu, 15 Jul 2010 00:55:40 -0700
Subject: Re: Bengali Script
To: pan...@umich.edu
Cc: Tulasi 

Hi Prof Pandey:

I have got your email ID through Internet where I have learned that
you are conducting research on Bengali Script
< http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3311.pdf >

Congratulations!

On this script, currently there is a discussion going on in Unicode forum.

I have learned that West Bengal Government (WBG) has a standard on this script.

Can you please email a JPG image with list of all letters/symbols
including cascaded conjuncts that are found in WBG standard?

Thank you!

Tulasi


Re: Pashto yeh characters

2010-07-27 Thread David Starner
On Tue, Jul 27, 2010 at 5:07 PM, Christoph Päper
 wrote:
> David Starner:
>> On Tue, Jul 27, 2010 at 12:43 PM, Andreas Prilop
>>> [U+0649] is no Arabic letter, but an Uighur letter.
>>
>> That's wrong, though. […] U+0649 must be an Arabic character;
>
> Andreas probably meant that U+0649 is not part of the Arabic writing system, 
> i.e. the Arabic script as used in writing the Arabic language (with some 
> recognised orthography).
>
> You probably mean that U+0649 is part of the Arabic script, which it 
> certainly is.

No, what I meant was that MacArabic, Windows-1256 and ISO-8859-6 are
designed to write the Arabic language. If U+0649 is in these character
sets, to say that it's really a Uighur character is like saying that
U+0041 is really a Greek character; it spits in the face of how the
character has been used and how fonts have been designed for the
character.

-- 
Kie ekzistas vivo, ekzistas espero.




Re: ? Reasonable to propose stability policy on numeric type = decimal

2010-07-27 Thread karl williamson

vanis...@boil.afraid.org wrote:
> From: Kenneth Whistler (k...@sybase.com)
>
>> C. E. Whitehead said:
>>
>>> I've not gone through many character charts though so I can't
>>> really speak as an expert as you all can; sorry I've not gotten
>>> to more; I will try to ...
>>
>> For people who wish to pursue this issue further, the relevant
>> information is neatly summarized in the extracted property
>> data file:
>>
>> http://www.unicode.org/Public/UNIDATA/extracted/DerivedNumericType.txt
>>
>> That is what you should look at for efficiency, and
>> is basically what the UTC would be using for discussion
>> about this matter.
>>
>> --Ken
>
> C.E.
>
> Specifically, notice New Tai Lue numbers (U+19D0-U+19DA). We have a sequence of
> eleven gc=Nd, that absolutely cannot be arranged so that consecutive code points
> have ascending numeric values. I doubt that if Arabic were encoded today that there
> would be a full set of Eastern digits, only 4-7, with 0-3 and 8 & 9 sharing
> with the regular Arabic digits.

I don't understand what you mean here by the "Arabic" digits.  Please
give a code point number example.

> This leads me to the conclusion that any formal policy is inviting
> definitionally insoluble problems in future encodings - collision
> between encoding each character only once, and having a mathematically
> pure digit sequence.
>
> That having been said, I have absolutely no problem with reserving a code point
> for zero, especially when a script is still in current use by a modern language
> community. Even if usage has not been place-value before, it is a simple
> adaptation for a script when its user community is exposed to global business,
> scientific, and standards communities.
>
> Even though I have no official say, as a script encoder, my vote would be to
> simply recommend that decimal digits be sequentially ordered 0-9, and to leave
> a reserved code point if the system is in modern use but does not currently use
> place-value, and hence has no digit zero. I would explicitly fight against
> anything more formal, as it would unnecessarily encumber script encoders who
> have to balance a lot more interests than just programmers who won't provide
> for an exception branch for non-sequential number arrangements. You've gotta do
> it anyway, for CJK and New Tai Lue. I would also question any programmer who
> wouldn't allow for mixing of the two blocks of Arabic digits. Just leave the
> code open for future additions, just as you do for the sequential/ascending
> numbers.

The original proposal also called for leaving empty a couple of code
points after '9' to allow things like New Tai Lue having duplicate '1'
digits to be adjacent to the block of 9 digits.  Do you have a problem
with that?

> -Van Anderson







Re: Why does EULER CONSTANT not have math property and PLANCK CONSTANT does?

2010-07-27 Thread Asmus Freytag

On 7/27/2010 3:02 PM, Kenneth Whistler wrote:
> Karl Williamson asked:
>
>> Subject: Why does EULER CONSTANT not have math property and PLANCK CONSTANT
>> does?
>>
>> They are U+2107 and U+210E respectively.
>
> Because U+210E PLANCK CONSTANT is, to quote the standard,
> "simply a mathematical italic h". It serves as the filler for
> the gap in the run of mathematical italic letters at U+1D455.

Correct - they form a set and need to be treated consistently.

> Other letterlike symbols in that block are not given the
> Other_Math property, even if they may be used in mathematical
> expressions. (Note that regular Greek letters are also not
> given the Other_Math property, even though they obviously also
> occur in mathematical expressions.)

For Euler Constant and Weierstrass elliptic function, this doesn't make
a lot of sense, as these are explicitly mathematical characters, not
characters that are "also used in mathematical expressions".

I have put in a formal proposal to add these two (2107 and 2118) to the
list of characters with the math property.

> The Math property can be thought of as a hint that a particular
> symbol is specialized for mathematical usage; it isn't a
> property that any character that ever occurs in a mathematical
> expression needs to have. Nor is every character with
> the Math property only used in mathematical contexts.

One way to look at this property is as a way to help detection of
mathematical expressions in running text. Characters that are primarily
used for mathematical purposes, or prominently used there, should be
included. Characters that are heavily used in ordinary text, with
non-mathematical uses should be excluded.

A./





Re: Pashto yeh characters

2010-07-27 Thread Mansour, Kamal
Ron, as you've already noticed, there can be multiple conventions for the 
orthography of a single language.

For the Yeh repertoire, typically the following are used:
u+06CC
u+06CD
u+06D0

For a current corpus, have a look at BBC News (http://www.bbc.co.uk/pashto) and 
Deutsche Welle (http://www.dw-world.de/)

Kamal


On 2010.7.22 10:17, "lingu...@artstein.org"  wrote:

Hi,

This is a query I had originally sent to the Linguist List, modified
based on feedback I got there. I am hoping that someone in the Unicode
community can help resolve this.

I'm interested in knowing if there is a standard way to encode the
various Pashto yeh-characters in Unicode, and if so, what it is. This
question is a bit more complicated than it sounds, so here's the
background.

Pashto is written using a derivative of the Arabic script. The Arabic
language uses a single character for both /j/ and /i:/ sounds. Like
many Arabic characters, this one is composed of a base form (which
changes shape based on its position in a word) and dots (in this case,
two dots below the base form). In most of the Arabic-speaking world
the dots are present with both the medial and final form, though in
Egypt (and possibly other places) the convention is to have two dots
on the medial form but leave them off the final form. The standard
arrangement of the two dots is horizontal, but they can be placed
vertically or diagonally with no change in meaning.

Persian also uses a single character for /j/ and /i:/, with the
convention of two dots on the medial form, no dots on the final form
(same as in Egypt).

The two conventions for the /j/-/i:/ character were given distinct
code points in unicode despite the fact that they do not contrast;
documentation is scarce, but presumably this was done in order to
allow writing both Arabic and Persian in the same document. Therefore,
Unicode has the following code points (I'm not giving the names, but
rather the typical visual representation of the glyphs and typical use).

U+064A two dots medially and finally (/j/-/i:/ Arabic convention)
U+06CC two dots medially, none finally (/j/-/i:/ Persian convention)

There are a few additional yeh-base code points defined, some of which
are relevant to Pashto (see below).

U+0649 no dots medially or finally (Arabic /a/ from etymological /j/)
U+0626 hamza above medially and finally (Arabic glottal stop in
certain contexts)
U+06D0 two dots medially and finally in vertical arrangement
U+06CD tail and no dots in final position
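
For reference, the formal character names left out of the list above can be looked up programmatically. The following is a minimal sketch, assuming Python 3 and its standard `unicodedata` module; the helper name `yeh_names` is made up for illustration, and the names reported come from whatever Unicode version the interpreter bundles.

```python
import unicodedata

# Code points from the list above, in the order they are discussed.
YEH_CODE_POINTS = [0x064A, 0x06CC, 0x0649, 0x0626, 0x06D0, 0x06CD]

def yeh_names():
    """Map 'U+XXXX' labels to the formal Unicode character names."""
    return {f"U+{cp:04X}": unicodedata.name(chr(cp)) for cp in YEH_CODE_POINTS}

if __name__ == "__main__":
    for label, name in yeh_names().items():
        print(label, name)
```

The names describe identity only; they say nothing about which glyph shapes a given font shows in medial or final position.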

As it so happens, there is much confusion in how these characters are
used in actual electronic documents, which is not surprising given
that U+06CC looks like U+064A in medial position but like U+0649 in
final position. There is an excellent article by Jonathan Kew that
sorts out what this means for various languages that use derivatives
of the Arabic script.

http://scripts.sil.org/cms/scripts/render_download.php?site_id=nrsi=file_id=arabicletterusagenotes=ArabicLetterUsageNotes.pdf
 


Unfortunately, this article does not discuss Pashto. I have little
knowledge of the language, but here's what I managed to understand
from the inspection of a few documents and with the help of friendly
people on the Linguist List (and please correct me if I'm wrong).

Traditionally, Pashto used a single character with the same convention
as in Persian, of two dots in the medial form and none on the final
form, and with no significance attached to the visual arrangement of
the dots. The character was 3-ways ambiguous between the sounds /j/,
/i:/ and /e/. In recent decades (probably since the 1970s or 1980s)
there has been some differentiation, partly due to changes in the
typesetting process and partly due to a deliberate effort of the
Pashto Academy at the University of Peshawar, Pakistan.

One convention that has gained fairly wide acceptance is a distinction
between a horizontal arrangement of the dots, representing /j/ or /i:/
as in Arabic and Persian, and a vertical arrangement representing the
sound /e/. This distinction is the same as in Uighur, and the
character with vertical dots has been codified as U+06D0. Additional
conventions include a hamza (U+0626) or tail (U+06CD) to represent /j/
at the end of a word in certain grammatical markers. All of these are
quite standard by now and do not pose much of a problem.

However, a further convention appears to have arisen, which as far as
I can tell is unique to Pashto in that it distinguishes between /j/
and /i:/ (though only in word-final position):

/j/ is written with two dots medially, none finally
/i:/ is written with two dots both medially and finally

I have never seen this codified explicitly, but this

Re: Why does EULER CONSTANT not have math property and PLANCK CONSTANT does?

2010-07-27 Thread Kenneth Whistler
Karl Williamson asked:

> Subject: Why does EULER CONSTANT not have math property and PLANCK CONSTANT 
> does?

> They are U+2107 and U+210E respectively.

Because U+210E PLANCK CONSTANT is, to quote the standard,
"simply a mathematical italic h". It serves as the filler for
the gap in the run of mathematical italic letters at U+1D455.

All of the mathematical alphanumeric symbols are given
the Other_Math property, and so also the derived Math property.
And for consistency, any of the mathematical alphanumeric
symbols omitted from the Mathematical Alphanumeric Symbols
block, because the corresponding font-styled variant had
already been encoded in the Letterlike Symbols block, are
also given the Other_Math property.

Other letterlike symbols in that block are not given the
Other_Math property, even if they may be used in mathematical
expressions. (Note that regular Greek letters are also not
given the Other_Math property, even though they obviously also
occur in mathematical expressions.)

The Math property can be thought of as a hint that a particular
symbol is specialized for mathematical usage; it isn't a
property that any character that ever occurs in a mathematical
expression needs to have. Nor is every character with
the Math property only used in mathematical contexts.
  
> Chapter 4 of TUS seems to 
> indicate that neither should, since they both are operands, and it says 
> this property applies to mathematical operators.

Actually, Chapter 4 no longer says anything about the Math
property. It is discussed in Section 15.4, Mathematical Symbols.

That text still says:

"The mathematical (math) property is an informative property of
characters that are used as operators in mathematical formulas."

Technically it doesn't say that it is a property *only* of such
operators -- and obviously it isn't when you examine the actual
list, since nobody considers the long list of mathematical
alphanumeric symbols to be operators. So it might be nice
if someone would propose an update to that text to better
describe the actual set and so as not to give the misleading
impression that it applies *only* to operators.

Incidentally, much more detailed information about the classification
of Unicode characters for math is available in the data file
associated with UTR #25:

http://www.unicode.org/Public/math/revision-11/MathClassEx-11.txt

The contents of that file is not limited just to characters
with the value Math=True.

--Ken
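
The "filler for the gap" point can be checked directly. This is a small sketch, assuming Python 3; as far as I know the standard `unicodedata` module does not expose the Math or Other_Math properties themselves, so the sketch only shows names, general categories and decompositions. The helper name `describe` is hypothetical.

```python
import unicodedata

def describe(cp):
    """Return (name, general category, decomposition) for a code point."""
    ch = chr(cp)
    return (
        unicodedata.name(ch, "<unassigned>"),
        unicodedata.category(ch),
        unicodedata.decomposition(ch),
    )

if __name__ == "__main__":
    # EULER CONSTANT, PLANCK CONSTANT, and the code points around the
    # reserved slot in the run of mathematical italic small letters.
    for cp in (0x2107, 0x210E, 0x1D454, 0x1D455, 0x1D456):
        print(f"U+{cp:04X}", *describe(cp), sep=" | ")
```

U+1D455 shows up as unassigned, while U+210E carries a font decomposition to U+0068, which matches the description of it as "simply a mathematical italic h".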




Re: Pashto yeh characters

2010-07-27 Thread Christoph Päper
David Starner:
> On Tue, Jul 27, 2010 at 12:43 PM, Andreas Prilop
>> [U+0649] is no Arabic letter, but an Uighur letter.
> 
> That's wrong, though. […] U+0649 must be an Arabic character;

Andreas probably meant that U+0649 is not part of the Arabic writing system, 
i.e. the Arabic script as used in writing the Arabic language (with some 
recognised orthography).

You probably mean that U+0649 is part of the Arabic script, which it certainly 
is.

No contradiction here, just not a good idea to use ‘Arabic’ as an adjective 
with ‘letter’ or ‘character’, unless you make sure everyone agrees – I would – 
that letters are constituents of writing systems, whereas characters form 
scripts. 

Manywhere, though, ‘writing system’, ‘script’, ‘orthography’, ‘alphabet’ and 
even ‘language’ tend to be synonyms (and may share a name with people and 
religion, too), as do ‘character’, ‘letter’, ‘glyph’, ‘grapheme’, ‘sign’ and 
‘symbol’. Some scholars like to use (or invent) alternative names to aid the 
distinction, e.g. I’ve seen – I think in one of Coulmas’ books – Latin/Roman 
and – elsewhere – Arabic/Arabetic/Arabian, but that would only really help if 
enough people understood and did it.



Re: ? Reasonable to propose stability policy on numeric type = decimal

2010-07-27 Thread vanisaac
From: Kenneth Whistler (k...@sybase.com)

> C. E. Whitehead said: 
> 
> > I've not gone through many character charts though so I can't 
> > really speak as an expert as you all can; sorry I've not gotten 
> > to more; I will try to ... 
> 
> For people who wish to pursue this issue further, the relevant 
> information is neatly summarized in the extracted property 
> data file: 
> 
> http://www.unicode.org/Public/UNIDATA/extracted/DerivedNumericType.txt 
> 
> That is what you should look at for efficiency, and 
> is basically what the UTC would be using for discussion 
> about this matter. 
> 
> --Ken 

C.E.

Specifically, notice New Tai Lue numbers (U+19D0-U+19DA). We have a sequence of 
eleven gc=Nd, that absolutely cannot be arranged so that consecutive code 
points have ascending numeric values. I doubt that if Arabic were encoded today 
that there would be a full set of Eastern digits, only 4-7, with 0-3 and 8 & 9 
sharing with the regular Arabic digits. This leads me to the conclusion that 
any formal policy is inviting definitionally insoluble problems in future 
encodings - collision between encoding each character only once, and having a 
mathematically pure digit sequence.

That having been said, I have absolutely no problem with reserving a code point 
for zero, especially when a script is still in current use by a modern language 
community. Even if usage has not been place-value before, it is a simple 
adaptation for a script when its user community is exposed to global business, 
scientific, and standards communities.

Even though I have no official say, as a script encoder, my vote would be to 
simply recommend that decimal digits be sequentially ordered 0-9, and to leave 
a reserved code point if the system is in modern use but does not currently use 
place-value, and hence has no digit zero. I would explicitly fight against 
anything more formal, as it would unnecessarily encumber script encoders who 
have to balance a lot more interests than just programmers who won't provide 
for an exception branch for non-sequential number arrangements. You've gotta do 
it anyway, for CJK and New Tai Lue. I would also question any programmer who 
wouldn't allow for mixing of the two blocks of Arabic digits. Just leave the 
code open for future additions, just as you do for the sequential/ascending 
numbers.

-Van Anderson




Re: ? Reasonable to propose stability policy on numeric type = decimal

2010-07-27 Thread Kenneth Whistler
C. E. Whitehead said:

> I've not gone through many character charts though so I can't 
> really speak as an expert as you all can; sorry I've not gotten 
> to more; I will try to ...

For people who wish to pursue this issue further, the relevant
information is neatly summarized in the extracted property
data file:

http://www.unicode.org/Public/UNIDATA/extracted/DerivedNumericType.txt

That is what you should look at for efficiency, and
is basically what the UTC would be using for discussion
about this matter.

--Ken
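
A rough way to explore the same question without downloading the data file is sketched below, assuming Python 3. It scans every code point with gc=Nd and checks whether each run consists of blocks of ten digits valued 0..9 in ascending order. The function names are made up for illustration, and the result reflects the Unicode version bundled with the interpreter, which may differ from the data under discussion in this thread.

```python
import sys
import unicodedata

def decimal_runs():
    """Return (start, end) pairs of consecutive code points with gc=Nd."""
    runs = []
    start = prev = None
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)) == "Nd":
            if start is None:
                start = cp
            prev = cp
        elif start is not None:
            runs.append((start, prev))
            start = None
    if start is not None:
        runs.append((start, prev))
    return runs

def is_clean_run(start, end):
    """True if the run is made of blocks of ten digits valued 0..9 in order."""
    values = [unicodedata.decimal(chr(cp), -1) for cp in range(start, end + 1)]
    return len(values) % 10 == 0 and all(
        value == index % 10 for index, value in enumerate(values)
    )

if __name__ == "__main__":
    for start, end in decimal_runs():
        status = "ok" if is_clean_run(start, end) else "IRREGULAR"
        print(f"U+{start:04X}..U+{end:04X} {status}")
```

Any run reported as irregular would be a counterexample to the "contiguous range of 10, ascending 0..9" assumption for that version of the data.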




Why does EULER CONSTANT not have math property and PLANCK CONSTANT does?

2010-07-27 Thread karl williamson
They are U+2107 and U+210E respectively.  Chapter 4 of TUS seems to 
indicate that neither should, since they both are operands, and it says 
this property applies to mathematical operators.




Re: ? Reasonable to propose stability policy on numeric type = decimal

2010-07-27 Thread CE Whitehead

Hi.

> From: Mark Davis ☕ (m...@macchiato.com)
> Date: Mon Jul 26 2010 - 14:13:22 CDT 
> I agree that having it stated at point of use is useful - and we do that in 
> other cases covered by stability clauses; but we can only state it IF we 
> have the corresponding stability policy. 

> Mark 

> . . . 
 
>> On Mon, Jul 26, 2010 at 11:06, Asmus Freytag  wrote: 

>>> On 7/26/2010 6:55 AM, John Burger wrote: 
> 
>>> Mark Davis ☕ wrote: 
>> 
>>>> From just a quick scan, it appears that they are currently all contiguous 
>>>> within their respective groups. If we were to impose a stability policy, it 
>>>> would be a constraint on the general_category: we would not assign 
>>>> general_category=decimal_number to any character unless it was part of a 
>>>> contiguous range of 10 such characters with ascending values from 0..9. 
>>> 
>> While that is true for the properties, it's not true for the encoding of 
>> character that are *used* as decimal digits. Martin gave the most widely 
>> used counterexample. 
>> 
>> 
>>> 
>>> Whether such a policy makes sense, I'm not clear on why it would be called 
>>> a "stability" policy - the analogy to the existing such policies seems 
>>> strained at best. 
>>> 
>> There are two parts to this. 
>> 
>> One, and I think this is the more important part, is to have an encoding 
>> policy of not splitting up runs of decimal digits - which would include 
>> reserving a spot for a zero, in case, *over the lifetime of Unicode*, some 
>> script changes their use from numbers 1-9 to decimal digits. 
>> 
>> The other is a guarantee of what it means for a character to have the 
>> decimal digit property. 
>> 
>> My suggestion for handling this, differ a bit from what has been discussed 
>> so far. 
>> 
>> The first I would address by suitable language in the WG2 Principles and 
>> Procedures document. This is where policies on encoding are maintained. 
>> True, these policies do allow exceptions, but exceptions (note Han !) do 
>> exist, and if a similar case of mixed-use character came along, then they 
>> would have to be dealt with accordingly. What the P&P would do is remove the 
>> wrong notion that it is OK to scatter runs of known decimal digits when 
>> encoding new scripts. 
>> 
>> The second I would address not by a stability policy, but by clarity of 
>> definition of the property. Language such as: 
>> 
>> "A character is given the decimal digit property, if and only if, it is 
>> used in a decimal place-value notation and all 10 digits are encoded 
>> in a single unbroken run starting with the digit of value 0, in 
>> ascending 
>> order of magnitude". 
>> 
>> or equivalent would be quite sufficient. That language happens to be a much 
>> clearer statement of the *implicit* definition used in assigning this 
>> property than the language found in UAX#44 or Unicode Section 4.6. 
>> 
>> Having that language where the property is documented is much more useful 
>> and visible than in a stability policy. 
>> 
>> A./ 
I like this policy -- both parts of it -- but agree with Asmus that the first 
thing to do is define a decimal digit; that will rule out the characters such 
as Asmus has described where 
">  the same [alphabetic] characters 
 > are also used as elements in a system that doesn't use place-value, but 
 > uses special characters to show powers of 10. "
(there is no reason for these not to be as contiguous as possible but these 
cannot be contiguous if they are alphabetic . . .
and if there is no zero then reserving a space for the zero is a moot issue; 
also these are all encoded and I think we want the policy for future encodings 
only)
there are other cases where characters do not use place value although they 
seem to be based on 10's 100's etc; 
a number of languages used | for 1 ; || for 2 ; ||| for 3 
or something similar, and then have bundled multiples of 10 (many of these seem 
to be ancient languages . . . mostly it seems, and certainly there is no 0 and 
no need to reserve space for it; 
I've not gone through many character charts though so I can't really speak as 
an expert as you all can; sorry I've not gotten to more; I will try to (I have 
been looking some at my registries instead; long story).
 
Best,
C. E. Whitehead
cewcat...@hotmail.com
 


 
  

RE: Pashto yeh characters

2010-07-27 Thread CE Whitehead

Hi, Khaled, Arno, Andreas:

 

All the Arabic characters (consonants, hamzas, but not vowel diacritics or 
numbers) that I need are between U+0621 (hamza) and U+064A; there are vowel 
diacritics that can be used immediately following these and then the Arabic 
numbers.  (Would any of these look-alikes be security issues?  Both these 
characters are allowed in IDNs; see:

http://unicode.org/reports/tr36/idn-chars.html)

 

Thanks all.

 

Best,

 

C. E. Whitehead

cewcat...@hotmail.com

So I would concur with Khaled and Arno here that U649 is Arabic aleph maqsura (


 
> Date: Tue, 27 Jul 2010 20:09:21 +0200
> From: a...@zedat.fu-berlin.de
> To: prilop4...@trashmail.net
> CC: unicode@unicode.org; lingu...@artstein.org
> Subject: Re: Pashto yeh characters
> 
> Andreas Prilop:
> > U+0649 has the traditional name "alif maqsura" because it was
> > taken from ISO-8859-6. But I see no objection to use U+06CC
> > for alif maqsura.
> 
> I beg to differ
> Since U+0649 is called alif maqsura
> it should be used for alif maqsura.
> 
> Please note that in the Qur'an
> it occurs not only at the end of words.
> 
> That two glyphs are the same
> does not mean that the letters are the same.
> Or do you use small l for capital I
> when using Helvetica?
> 
> 
> 
  

Re: Pashto yeh characters

2010-07-27 Thread Arno Schmitt
Andreas Prilop:
> U+0649 has the traditional name "alif maqsura" because it was
> taken from ISO-8859-6. But I see no objection to use U+06CC
> for alif maqsura.

I beg to differ
Since U+0649 is called alif maqsura
it should be used for alif maqsura.

Please note that in the Qur'an
it occurs not only at the end of words.

That two glyphs are the same
does not mean that the letters are the same.
Or do you use small l for capital I
when using Helvetica?





Re: Pashto yeh characters

2010-07-27 Thread Khaled Hosny
On Tue, Jul 27, 2010 at 06:43:19PM +0200, Andreas Prilop wrote:

[...]

> U+0649 has (should have) four glyphs without any dots. This is no
> Arabic letter, but an Uighur letter. Therefore you should not use
> U+0649 for Arabic, Persian, Pashto, Urdu but only U+06CC.

I'm not sure what the basis of this conclusion is, but U+0649 has no
dots in its initial/medial forms in Arabic either; it just happens not to
occur in those two positions in modern orthography, but it can be seen in
the Quran, which is still written in the old, early Islamic orthography.

See the attached image showing the words فسوىهن and ميكىل.

Regards,
 Khaled

-- 
 Khaled Hosny
 Arabic localiser and member of Arabeyes.org team
 Free font developer

Re: Pashto yeh characters

2010-07-27 Thread David Starner
On Tue, Jul 27, 2010 at 12:43 PM, Andreas Prilop
 wrote:
> U+0649 has (should have) four glyphs without any dots. This is no
> Arabic letter, but an Uighur letter. Therefore you should not use
> U+0649 for Arabic, Persian, Pashto, Urdu but only U+06CC.

That's wrong, though. MacArabic, Windows-1256 and ISO-8859-6 are all
standards for the encoding of Arabic. Thus U+0649 must be an Arabic
character; existing use in both those sets and in Unicode says that it is.

-- 
Kie ekzistas vivo, ekzistas espero.



Re: Pashto yeh characters

2010-07-27 Thread Andreas Prilop
On Thu, 22 Jul 2010, lingu...@artstein.org wrote:

> [...]
> To wrap up, are my observations about the Pashto writing conventions
> correct? And is there a standard for assigning the Pashto characters
> representing /j/ and /i:/ to Unicode code points?

Practical answer:

U+0649 and U+064A are included in MacArabic/MacFarsi and Windows-1256;
but U+06CC is not. Support for 0649 and 064A in fonts is still better
than for 06CC. For example, try the various Arabic fonts in Windows XP:
 http://www.user.uni-hannover.de/nhtcapri/temp/ya.arabic.html

Therefore you should use only U+0649 and U+064A for Arabic, Persian, Urdu
if you want your documents to be displayed on other computers.
I have done so in
 http://www.user.uni-hannover.de/nhtcapri/arabic-alphabet.html
 http://www.user.uni-hannover.de/nhtcapri/persian-alphabet.html
 http://www.user.uni-hannover.de/nhtcapri/mac-urdu-alphabet.html

However, for Pashto you need characters outside Windows-1256 anyway.
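
The coverage claims for the legacy character sets can be spot-checked with a short sketch like the one below, assuming Python 3 and its documented `cp1256` and `iso8859_6` codecs. MacArabic is left out here because I am not certain how it is exposed; the helper name `encodable` is hypothetical.

```python
def encodable(ch, encoding):
    """True if the character has a code in the given legacy encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

if __name__ == "__main__":
    # The yeh code points discussed in this thread.
    candidates = ["\u0649", "\u064A", "\u06CC", "\u06CD", "\u06D0"]
    for ch in candidates:
        coverage = {enc: encodable(ch, enc) for enc in ("cp1256", "iso8859_6")}
        print(f"U+{ord(ch):04X}", coverage)
```

With these two codecs, U+0649 and U+064A round-trip, while U+06CC and the Pashto-specific U+06CD and U+06D0 do not.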

   * * * * * *

Theoretical answer:

U+0649 has (should have) four glyphs without any dots. This is no
Arabic letter, but an Uighur letter. Therefore you should not use
U+0649 for Arabic, Persian, Pashto, Urdu but only U+06CC.
I have done so in
 http://www.user.uni-hannover.de/nhtcapri/urdu-alphabet.html
 http://www.user.uni-hannover.de/nhtcapri/pashto-alphabet.html

U+0649 has the traditional name "alif maqsura" because it was
taken from ISO-8859-6. But I see no objection to use U+06CC
for alif maqsura.

You cannot distinguish the initial and middle glyphs of 064A and 06CC.
Use whatever you want. Given the practical answer above, you might
prefer U+064A. But if you don't have U+06CC in your font, you
probably don't have Pashto letters either.
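
Because the initial and medial glyphs look alike, mixed usage is hard to spot by eye. One possible way to audit a document is to count which yeh code points it actually contains. This is only a sketch, assuming Python 3; the table and function names are made up for illustration.

```python
from collections import Counter

# Yeh-like code points mentioned in this thread.
YEH_VARIANTS = {
    "\u0626": "U+0626",
    "\u0649": "U+0649",
    "\u064A": "U+064A",
    "\u06CC": "U+06CC",
    "\u06CD": "U+06CD",
    "\u06D0": "U+06D0",
}

def yeh_usage(text):
    """Count occurrences of each yeh-like code point in the text."""
    return Counter(YEH_VARIANTS[ch] for ch in text if ch in YEH_VARIANTS)

if __name__ == "__main__":
    # The same word spelled once with U+064A and once with U+06CC.
    sample = "\u0639\u0644\u064A \u0639\u0644\u06CC"
    print(dict(yeh_usage(sample)))
```

A document that reports both U+064A and U+06CC is probably mixing conventions, although for Pashto that may be intentional if the word-final /j/ versus /i:/ distinction is being encoded.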



Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-27 Thread Raymond Mercier

"John Dlugosz" writes

> I can imagine supporting national representations for numbers for
> outputting reports, but I don't imagine anyone writing in a programming
> language would be compelled to type 四佰六十 instead of 560.

Especially since 四佰六十 is 460.

Raymond Mercier 





RE: Reasonable to propose stability policy on numeric type = decimal

2010-07-27 Thread John Dlugosz
> I'm considering extending an existing computer programming language
> which currently only understands numbers composed solely by the ASCII
> numbers to also understand those from other scripts.  I'm not going to
> do it unless it is easy within the existing implementation (not some
> theoretical better implementation) and efficient and not a security
> threat.

I can imagine supporting national representations for numbers for outputting 
reports, but I don't imagine anyone writing in a programming language would be 
compelled to type 四佰六十 instead of 560.  It's more like an English speaker 
spelling out the words.  Notice that it's not just digits, but contains 
explicit powers and no zero.  There are also variations used for writing 
checks, so they are still "scattered".

I can imagine wanting to write in-language numbers for writing headings, for 
example, and would want software to understand they are numbers and not opaque 
labels.  That usage would include roman numerals too.  So I can certainly see 
uses for a "string to value" library function that worked with all manners of 
national digits and the ways in which those digits are actually used -- not 
just transliterating to modern international notation.

--John
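
A "string to value" function for the non-positional style described above might look roughly like the following. This is a deliberately simplified sketch, assuming Python 3 and relying on `unicodedata.numeric` for the per-character values; the function name is hypothetical, and it does not handle larger grouping units such as 万 correctly.

```python
import unicodedata

def han_numeral_value(text):
    """Evaluate a simple multiplicative-additive numeral such as 四佰六十."""
    total = 0
    pending = 0  # digit waiting for its power-of-ten multiplier
    for ch in text:
        # Raises ValueError if the character has no numeric value.
        value = unicodedata.numeric(ch)
        if value >= 10:
            # A bare multiplier (e.g. a leading 十) counts as 1 x multiplier.
            total += (pending or 1) * value
            pending = 0
        else:
            pending = value
    return int(total + pending)

if __name__ == "__main__":
    print(han_numeral_value("四佰六十"))
    print(han_numeral_value("五百六十"))
```

The explicit multipliers are what make this different from reading a run of decimal digits: each digit is scaled by the power of ten that follows it, and there is no zero.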











Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-27 Thread John Burger

karl williamson wrote:
> Asmus Freytag wrote:
>> The situation is worse than you indicate, because the same characters
>> are also used as elements in a system that doesn't use place-value, but
>> uses special characters to show powers of 10.
>
> I would think I wouldn't support these numbers, since we couldn't be
> unambiguously sure of what was intended.
>
> Another issue that I brought up a while back on this list is Tamil
> numbers, where western practice seems to have infiltrated enough that
> Unicode gave them Gc=Nd, but IIRC from the responses I got back then,
> they can appear in older style with other characters meaning 10, 100,
> 1000.  In implementing this, if any of the other characters were
> encountered in parsing such a number, it would disqualify it.

I think you could treat the Han digits the same way:  In some of the
Chinese news corpora I work with, the ten Han digits are frequently
used Western-style, especially for years, phone numbers, and other
identifiers.


- John D. Burger
  MITRE
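
The "disqualify on any non-digit" approach could be sketched as follows, assuming Python 3. Each character must have a numeric value of exactly 0 through 9; anything else, including characters meaning 10, 100 or 1000, rejects the whole string. The function name is made up for illustration, and this sketch does not check that all digits come from the same script.

```python
import unicodedata

def positional_value(text):
    """Read a string as place-value decimal digits, or raise ValueError."""
    if not text:
        raise ValueError("empty numeral")
    result = 0
    for ch in text:
        value = unicodedata.numeric(ch, None)
        # Reject non-numeric characters, fractions, and values of 10 or more.
        if value is None or value not in range(10):
            raise ValueError(f"U+{ord(ch):04X} is not usable as a decimal digit")
        result = result * 10 + int(value)
    return result

if __name__ == "__main__":
    print(positional_value("二〇一〇"))  # Han digits used Western-style
    try:
        positional_value("四佰六十")
    except ValueError as err:
        print("rejected:", err)
```

A stricter version would also require every digit to belong to one run of ten consecutive code points, which is where the contiguity question from this thread comes in.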




Re: Indian Rupee Sign (U+20B9) proposal

2010-07-27 Thread Akshat Joshi
On Fri, Jul 23, 2010 at 6:28 AM, Kenneth Whistler  wrote:

> However, I can assure Tulasi (and anyone else of the "*nation of
> over a billion populations*") that having 17 (or 17 lakh, or
> even 17 crore) proposals to encode the Indian Rupee Sign
> won't get it encoded any quicker than having one or two
> proposals. Character encoding proposals are not *petitions*
> decided by numbers or weight of delivered stacks -- they are technical
> proposals reviewed for technical content and decision by two technical
> committees.
>

Apologies for the late mail on a matter that was long closed maybe.
Just read it now hence replying.

Even though you were reiterating what someone else had said, that too when
you know that you are addressing a nation, doesn't sound very positive.

-Akshat Joshi
India
PS: Even though new to post on this thread, am not very new to Unicode.