New Public Review Issue posted

2004-12-23 Thread Rick McGowan
The Unicode Technical Committee has posted a new issue for public review  
and comment. Details are on the following web page:

http://www.unicode.org/review/

The review period for the new item closes on January 31, 2005.

Please see the page for links to discussion and relevant documents.  
Briefly, the new issue is:


59  Disunification of Dandas

The UTC is considering the question of disunifying the characters U+0964  
DEVANAGARI DANDA and U+0965 DEVANAGARI DOUBLE DANDA from their counterparts  
in several other Indic scripts. Feedback on this issue, for or against the  
disunification, is being sought.

A background document is available here:

http://www.unicode.org/review/pr-59.html


If you have comments for official UTC consideration, please submit them
through our feedback reporting page:

http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please use  
the following link to subscribe (if necessary). Please be aware that  
discussion comments on the Unicode mail list are not automatically recorded  
as input to the UTC. You must use the reporting link above to generate  
comments for UTC consideration.

http://www.unicode.org/consortium/distlist.html

Regards,
Rick McGowan
Unicode, Inc.



Danda disunification (was Re: New Public Review Issue posted)

2004-12-23 Thread James Kass

Public Review Issue # 59 concerning danda and double danda
doesn't mention the Limbu script specifically.

The double danda, at least, is used in the Limbu script.
See the exhibit on page 12 of N2410.PDF.  It's also listed 
in the Limbu punctuation shown on page 16.

Best regards,

James Kass



Re: Danda disunification (was Re: New Public Review Issue posted)

2004-12-23 Thread Asmus Freytag
At 04:32 PM 12/23/2004, James Kass wrote:
> Public Review Issue # 59 concerning danda and double danda
> doesn't mention the Limbu script specifically.
>
> The double danda, at least, is used in the Limbu script.
> See the exhibit on page 12 of N2410.PDF.  It's also listed
> in the Limbu punctuation shown on page 16.

Some notes:

The Limbu double danda shows little visual differentiation
from the Devanagari double danda - it seems shorter, but
it's difficult to separate font-related from script-related
effects here.

If the text sample is typical, it would seem that it is used
quite frequently in ordinary text in Limbu, while dandas in
other scripts were claimed to be used only in special
contexts.

A./

PS: the URL is: http://anubis.dkuug.dk/jtc1/sc2/wg2/docs/n2410.pdf




New Public Review Issue posted

2004-12-22 Thread Rick McGowan
The CLDR Technical Committee has posted a new issue for public
review and comment. Details are on the following web page:

http://www.unicode.org/review/#pri58

The review period for the new item closes on January 31, 2005.

Please see the page for links to discussion and relevant documents.
Briefly, the new issue is:


58  Characters with cedilla and comma below in Romanian language

The CLDR Technical Committee is seeking feedback regarding the
relative frequency of use of the characters with comma below and
of the characters with cedilla in Romanian language textual material.
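
One way to gather the kind of frequency data requested here is to count both sets of letters in a sample of Romanian text. The following is only a minimal sketch using Python's standard unicodedata module; the helper name count_variants and the sample string are illustrative assumptions, not part of the review issue.

```python
import unicodedata
from collections import Counter

# Precomposed Romanian letters: S/s and T/t with cedilla vs. with comma below.
CEDILLA = "\u015E\u015F\u0162\u0163"
COMMA_BELOW = "\u0218\u0219\u021A\u021B"


def count_variants(text):
    """Count cedilla and comma-below forms in text (hypothetical helper)."""
    # NFC composes s/t followed by U+0327 or U+0326 into the precomposed
    # letters, so decomposed input is counted as well.
    counts = Counter(unicodedata.normalize("NFC", text))
    return {
        "cedilla": sum(counts[c] for c in CEDILLA),
        "comma_below": sum(counts[c] for c in COMMA_BELOW),
    }


if __name__ == "__main__":
    # Made-up sample: one cedilla form, one precomposed comma-below form,
    # and one comma-below form written as t + COMBINING COMMA BELOW.
    sample = "\u015Fi \u0219i t\u0326ar\u0103"
    print(count_variants(sample))
```

Running the same count over a larger corpus would give the relative frequencies; the result depends entirely on how the corpus was keyed in and converted.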




Regards,
Rick McGowan
Unicode, Inc.



New Public Review Issue

2004-11-24 Thread Rick McGowan
The Unicode Technical Committee has posted a new issue for public
review and comment. Details are on the following web page:

http://www.unicode.org/review/

Review period for the new item closes on January 31, 2005.

Please see the page for links to discussion and relevant documents.
Briefly, the new issue is:


57  Changes to Bidi categories of some characters used with Mathematics

The UTC is considering changing the bidi category of seven compatibility
characters from ET to ES:
U+207A SUPERSCRIPT PLUS SIGN
U+208A SUBSCRIPT PLUS SIGN
U+FB29 HEBREW LETTER ALTERNATIVE PLUS SIGN
U+FE62 SMALL PLUS SIGN
U+FE63 SMALL HYPHEN-MINUS
U+FF0B FULLWIDTH PLUS SIGN
U+FF0D FULLWIDTH HYPHEN-MINUS

The UTC is also seeking feedback on the bidi categories of the following
characters, and whether to also change these from ET to ES:
U+2212 MINUS SIGN
U+207B SUPERSCRIPT MINUS
U+208B SUBSCRIPT MINUS

All of these characters may be used in connection with mathematical 
applications.
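
For readers who want to see what bidi category their own platform currently assigns to these characters, here is a minimal sketch using Python's standard unicodedata module. The list names and the helper bidi_report are illustrative assumptions; the values printed depend on the version of the Unicode Character Database bundled with the Python build, so they may reflect either the older or the newer assignment.

```python
import unicodedata

# Characters whose bidi category the UTC proposes to change from ET to ES.
PROPOSED = ["\u207A", "\u208A", "\uFB29", "\uFE62", "\uFE63", "\uFF0B", "\uFF0D"]
# Characters on which the UTC is seeking feedback.
UNDER_DISCUSSION = ["\u2212", "\u207B", "\u208B"]


def bidi_report(chars):
    """Return (code point, name, bidi category) for each character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.bidirectional(c))
        for c in chars
    ]


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for row in bidi_report(PROPOSED + UNDER_DISCUSSION):
        print(*row)
```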



Regards,
Rick McGowan
Unicode, Inc.



New Public Review Issue posted

2004-09-13 Thread [EMAIL PROTECTED]
The Unicode Technical Committee has posted a new issue for public review  
and comment. Details are on the following web page:

http://www.unicode.org/review/

Review period for the new item closes on November 11, 2004.

Please see the page for links to discussion and relevant documents.
Briefly, the new issue is:


46   Proposal for Encoded Representations of Meteg

In some Biblical Hebrew usage, it is considered necessary to distinguish
where the meteg mark is positioned relative to a vowel point: to the left of the  
vowel, or to the right; or, in the case of a hataf vowel, between the two  
components of the hataf vowel. A solution for this has been proposed using  
control characters, including the zero width joiner and non-joiner
characters. This public-review issue is soliciting feedback on this
proposed solution.



Regards,
Rick McGowan
Unicode, Inc.



Re: New Public Review Issue posted

2004-09-13 Thread Chris Jacobs

- Original Message - 
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Tuesday, September 14, 2004 1:21 AM
Subject: New Public Review Issue posted


> The Unicode Technical Committee has posted a new issue for public review
> and comment. Details are on the following web page:
>
> http://www.unicode.org/review/
>
> Review period for the new item closes on November 11, 2004.
>
> Please see the page for links to discussion and relevant documents.

In table 7 the glyph for U+05D6 looks wrong




RE: New Public Review Issue posted

2004-09-13 Thread Peter Constable
That's what you get when you copy and paste text when you're a bit
tired. Of course, the column on the right was supposed to say 05D0...
ALEF. I've submitted a revised doc.


Peter






New Public Review Issue posted

2004-07-13 Thread Sarasvati
The officers of the Unicode Consortium have posted a new issue for public
review and comment. Details are on the following web page:

http://www.unicode.org/review/

Review period for the new item closes on August 3, 2004.

Please see the page for links to discussion and relevant documents.
Briefly, the new issue is:


37  Clarification of the Use of Zero Width Joiner in Indic Scripts

There are some inconsistencies in the use of ZERO WIDTH JOINER (ZWJ)
in a number of Indic scripts which are outlined in the accompanying review
document. This proposal intends to rectify these problems, clarifying
how the ZERO WIDTH JOINER is to be applied in scripts, and consolidating
common mechanisms for equivalent problems that exist in several scripts.
The scope for what is proposed covers Devanagari, Bengali, Gurmukhi,
Gujarati, Oriya, Tamil, Telugu, Kannada and Malayalam.

The question for reviewers is: Should the UTC adopt a model in which
ZWJ precedes Virama, as proposed in section 7 of the review document?




Regards,
Rick McGowan
Unicode, Inc.



Re: New Public Review Issue posted

2004-05-26 Thread D. Starner
Mark Davis [EMAIL PROTECTED] writes:

> Why modifier letters -- those are not really
> superscripts. Waw?
 
Last time I went looking for Modifier Letter Small N,
I decided it was encoded as U+207F, SUPERSCRIPT LATIN SMALL
LETTER N. If it's not, pretty much every variant of n has
been encoded as a modifier letter, except for the basic small
letter.




Re: New Public Review Issue posted

2004-05-26 Thread Michael Everson
At 10:19 -0800 2004-05-26, D. Starner wrote:
> Mark Davis [EMAIL PROTECTED] writes:
>> Why modifier letters -- those are not really
>> superscripts. Waw?
>
> Last time I went looking for Modifier Letter Small N,
> I decided it was encoded as U+207F, SUPERSCRIPT LATIN SMALL
> LETTER N. If it's not, pretty much every variant of n has
> been encoded as a modifier letter, except for the basic small
> letter.

That's it.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


RE: New Public Review Issue posted

2004-05-26 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of D. Starner
>
> Last time I went looking for Modifier Letter Small N,
> I decided it was encoded as U+207F, SUPERSCRIPT LATIN SMALL
> LETTER N. If it's not, pretty much every variant of n has
> been encoded as a modifier letter, except for the basic small
> letter.

Whatever the character properties, it is certainly the case that U+207F
is used in phonetic transcription in analogous contexts to characters in
the Modifier Letters block.
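
The properties in question can be inspected directly. This is a minimal sketch using Python's standard unicodedata module; the helper name describe is an illustrative assumption, and the general category printed for U+207F depends on the Unicode data version bundled with the Python build, so no particular value is claimed here.

```python
import unicodedata


def describe(ch):
    """Collect a few UnicodeData properties for one character."""
    return {
        "code": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "decomposition": unicodedata.decomposition(ch),
    }


if __name__ == "__main__":
    # U+207F from the Superscripts and Subscripts block, next to
    # U+02B0 MODIFIER LETTER SMALL H from the modifier letters.
    for ch in ("\u207F", "\u02B0"):
        print(describe(ch))
```

Both characters carry a compatibility decomposition tagged as superscript, which is one reason the two groups are easy to treat alike in practice.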



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division




RE: New Public Review Issue posted

2004-05-26 Thread Michael Everson
At 13:16 -0700 2004-05-26, Peter Constable wrote:
> Whatever the character properties, it is certainly the case that U+207F
> is used in phonetic transcription in analogous contexts to characters in
> the Modifier Letters block.

NOTA BENE: Is used. It's been recommended for more than a decade.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


New Public Review Issue posted

2004-05-25 Thread Rick McGowan
The Unicode Technical Committee has posted a new issue for public
review and comment. Details are on the following web page:

http://www.unicode.org/review/

Review period for the new item closes on June 8, 2004.

Please see the page for links to discussion and relevant documents.
Briefly, the new issue is:


Draft Unicode Technical Report #30 Character Foldings   2004.06.08

An updated draft of UTR #30 Character Foldings is now available. This
update also provides draft data files for four types of character foldings.
The Unicode Technical Committee especially seeks review of the data files.



Regards,
Rick McGowan
Unicode, Inc.



Re: New Public Review Issue posted

2004-05-25 Thread jcowan
Rick McGowan scripsit:
> The Unicode Technical Committee has posted a new issue for public
> review and comment. Details are on the following web page:
>
> http://www.unicode.org/review/

I have prepared a draft DiacriticFolding.txt file for this issue; it is
temporarily available at http://www.ccil.org/~cowan/DiacriticFolding.txt .
This was prepared by looking for lines in UnicodeData that matched
the regex '(GREEK|LATIN|CYRILLIC|HEBREW).*WITH'.  (I added Hebrew to the
set of scripts specified by the current draft of #30.)

Characters with decompositions were mapped into the base character of the
decomposition; characters without decompositions were mapped by name.
The file http://www.ccil.org/~cowan/DiacriticFoldingExceptions.txt contains
a list of 32 characters matching the pattern which did not seem to me
to be suitable for diacritic folding.
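
The procedure described above can be approximated as follows. This is only a sketch under stated assumptions, not the script actually used to produce DiacriticFolding.txt: it reads names from Python's unicodedata module instead of the UnicodeData file, uses only canonical decompositions, interprets "mapped by name" as looking up the part of the name before " WITH ", and applies no exception list. The function names are hypothetical.

```python
import re
import sys
import unicodedata

# The regex quoted in the message, applied to character names.
PATTERN = re.compile(r"(GREEK|LATIN|CYRILLIC|HEBREW).*WITH")


def fold_by_name(ch):
    """Map e.g. '... LETTER O WITH STROKE' to '... LETTER O' (assumed rule)."""
    base_name = unicodedata.name(ch, "").split(" WITH ", 1)[0]
    try:
        base = unicodedata.lookup(base_name)
    except KeyError:
        return None
    return base if base != ch else None


def fold(ch):
    """Fold one character to a base character, or return None."""
    nfd = unicodedata.normalize("NFD", ch)
    if nfd != ch:
        # First character of the full canonical decomposition.
        return nfd[0]
    return fold_by_name(ch)


def build_diacritic_folding():
    """Scan all code points and collect source -> base mappings."""
    table = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if PATTERN.search(unicodedata.name(ch, "")):
            base = fold(ch)
            if base is not None:
                table[ch] = base
    return table


if __name__ == "__main__":
    table = build_diacritic_folding()
    print(len(table), "mappings")
    print(repr(table["\u00E9"]), repr(table["\u00D8"]))
```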

I have posted a short version of this note to the Unicode comment form.

Comments?

-- 
John Cowan  http://www.ccil.org/~cowan  http://www.reutershealth.com  [EMAIL PROTECTED]
A rabbi whose congregation doesn't want to drive him out of town
isn't a rabbi, and a rabbi who lets them do it isn't a man.
        --Jewish saying



Re: New Public Review Issue posted

2004-05-25 Thread Mark Davis
I don't think the fold to base is as useful as some other information. For
those characters with a canonical decomposition, the decomposition carries
more information, since you can combine it with a "remove combining marks"
folding to get the folding to base.
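
The composition of the two steps can be sketched like this, using Python's standard unicodedata module. The function names are illustrative assumptions, and "combining mark" is taken here to mean any character whose general category starts with M; the draft TR may define the folding differently.

```python
import unicodedata


def remove_combining_marks(text):
    """Drop every character whose general category is a mark (Mn, Mc, Me)."""
    return "".join(c for c in text if not unicodedata.category(c).startswith("M"))


def fold_to_base(text):
    """Canonical decomposition followed by mark removal."""
    return remove_combining_marks(unicodedata.normalize("NFD", text))


if __name__ == "__main__":
    print(fold_to_base("\u00E9l\u00E8ve"))
    # U+00D8 has no canonical decomposition, so this route leaves it alone.
    print(fold_to_base("\u00D8") == "\u00D8")
```

Characters such as U+00D8 are exactly the ones for which separate data would still be needed.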

For my part, what would be more interesting would be a full decomposition of
the characters that don't have a canonical decomposition, e.g.

LATIN CAPITAL LETTER O WITH STROKE = O + /

BTW, I had posted some commentary on TR30, which I will repeat here.

... I found these files almost
impossible to assess in code point form, so I ran them through a quick ICU
transform to add comments with the real characters and names. I also NFC'd the
forms, just for consistency. These files generated from Asmus's are in
http://www.macchiato.com/utc/tr30/.

I had suggested posting them in this form for public review of the TR, since
others will have the same difficulty in assessing the quality of the data.

Here are some quick comments.

http://www.macchiato.com/utc/tr30/HiraganaFolding-new.txt

Adding digraph expansions seems quite odd.

http://www.macchiato.com/utc/tr30/KatakanaFolding-new.txt

When in NFC, whole batches of these mappings are NOPs. Don't know why they are
there; they are also not consistent in the use of composed vs. decomposed forms.
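
Whether a given mapping is a no-op under NFC can be checked mechanically. A minimal sketch, assuming each mapping is available as a (source, target) pair of strings; the function name is hypothetical and the two example pairs are chosen for illustration, not taken from the data file.

```python
import unicodedata


def is_nop_under_nfc(source, target):
    """True if source and target are identical once both are in NFC."""
    nfc = unicodedata.normalize
    return nfc("NFC", source) == nfc("NFC", target)


if __name__ == "__main__":
    # KATAKANA LETTER KA + COMBINING VOICED SOUND MARK vs. KATAKANA LETTER GA
    print(is_nop_under_nfc("\u30AB\u3099", "\u30AC"))
    # HALFWIDTH KATAKANA LETTER KA vs. KATAKANA LETTER KA
    print(is_nop_under_nfc("\uFF76", "\u30AB"))
```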

This file combines half-width katakana folding. I think it is much more useful
if that is separated out. Someone can apply a sequence of two transforms if they
want both.

http://www.macchiato.com/utc/tr30/SuperscriptFolding-new.txt

This feels like a real potpourri of stuff. Why superscripts and not subscripts?
Why annotation characters? Why modifier letters -- those are not really
superscripts. Waw?

http://www.macchiato.com/utc/tr30/WidthFolding-new.txt

This file would be MUCH more useful if in two separate files.

Full-width to half-width
Half-width to full-width

Again, remove the NFC mappings.

27E6; 301A #MATHEMATICAL LEFT WHITE SQUARE BRACKET  LEFT WHITE SQUARE BRACKET

These don't appear to be a width issue.
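
One possible way to separate the two kinds of width mapping, and to spot entries that are not width-related at all, is to look at the decomposition tags in the character database. This is a rough sketch using Python's unicodedata module, not the method used to build the draft data files; the function names are assumptions, and the tables map wide and narrow forms to their ordinary counterparts.

```python
import unicodedata


def width_mapping(ch):
    """Return (tag, target) for <wide>/<narrow> decompositions, else None."""
    parts = unicodedata.decomposition(ch).split()
    if not parts or parts[0] not in ("<wide>", "<narrow>"):
        return None
    target = "".join(chr(int(p, 16)) for p in parts[1:])
    return parts[0].strip("<>"), target


def split_by_width_tag(limit=0x10000):
    """Build two separate tables, one per decomposition tag."""
    tables = {"wide": {}, "narrow": {}}
    for cp in range(limit):
        ch = chr(cp)
        mapping = width_mapping(ch)
        if mapping is not None:
            tag, target = mapping
            tables[tag][ch] = target
    return tables


if __name__ == "__main__":
    tables = split_by_width_tag()
    print(len(tables["wide"]), "wide;", len(tables["narrow"]), "narrow")
    # U+27E6 carries no width decomposition, so it lands in neither table.
    print(width_mapping("\u27E6"))
```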

Note that I have not checked these new data tables for completeness; these were
just some quick observations.


Mark
__
http://www.macchiato.com
  






New Public Review Issue posted

2004-03-24 Thread Rick McGowan
The Unicode Technical Committee has posted a new issue for public review
and comment. Details are on the following web page:

http://www.unicode.org/review/

The review period for the new item closes on June 8, 2004.

Please see the page for links to discussion and relevant documents.
Briefly, the new issue is:

---

31  Cantonese Romanization  2004.06.08

The sources for the Unihan database use multiple competing romanizations  
of Cantonese, while the Unihan database uses yet another romanization. We  
feel that there is no good reason for Unicode to contribute to this  
confusion, so we plan to adopt a single, standard Cantonese romanization  
for use throughout the Unihan database.

---

Also, the closing dates for issues #20 and #25 have been extended into June.

---


Regards,
Rick McGowan
Unicode, Inc.



Re: New Public Review Issue

2004-02-24 Thread Peter Kirk
On 23/02/2004 15:33, Rick McGowan wrote:

> The Unicode Technical Committee has posted a new issue for public review
> and comment. Details are on the following web page:
>
> http://www.unicode.org/review/
>
> Review periods for the new item closes on June 8, 2004.
>
> Please see the page for links to discussion and relevant documents.
> Briefly, the new issue is:
>
> ---
>
> 30  Bengali Khanda Ta  (Closes 2004.06.08)
>
> ...

 

Although I don't know much about Bengali, my work on Hebrew and other 
languages leads me to think of other possible options beyond the four 
described in this document, which should be considered seriously if 
changes to the existing encoding model are being considered.

The option <ta, ZWJ, virama> is mentioned in the document, but 
dismissed without proper argument although it would seem to me that this 
is a far more logical encoding than <ta, virama, ZWJ>. After all, the 
character in question can easily be understood as a ligature of ta and 
virama, but certainly not as ta followed by a ligature of virama with 
the following character. While I can understand the objection that this 
"involve[s] innovations into the general Indic encoding model", there 
does come a time when such innovations are preferable to kludges of the 
existing model. A recent UTC decision has removed the objection to this 
encoding that ZWJ should not be used within a combining character sequence.

Another alternative which should be considered is use of a variation 
selector. These were apparently designed for situations like this where 
two characters are graphically distinct and perceived by the user 
community as distinct, but also have an underlying unity which should be 
preserved. In one sense this can be considered as like a new character, 
thus meeting the user community preference for model D, but it also 
meets the last objection to this model.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



RE: New Public Review Issue

2004-02-24 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Peter Kirk
>
> The option <ta, ZWJ, virama> is mentioned in the document, but
> dismissed without proper argument although it would seem to me that this
> is a far more logical encoding than <ta, virama, ZWJ>. After all, the
> character in question can easily be understood as a ligature of ta and
> virama, but certainly not as ta followed by a ligature of virama with
> the following character.

I had indeed thought of <ta, ZWJ, virama> because of the fact that the
khanda ta is kind of like a ligature of ta and virama. But the generic
use of ZWJ for requesting more-ligated forms is *not* applicable to
Indic scripts. (If it were, <C, virama, C> should produce a half form
and <C, virama, ZWJ, C> should be required to generate the conjunct
form.) It would *not* lead to more reliable implementations and better
usability to mix usages of ZWJ like this unless absolutely necessary.


> While I can understand the objection that this
> "involve[s] innovations into the general Indic encoding model", there
> does come a time when such innovations are preferable to kludges of
> the existing model.

Using <ta, virama, ZWJ> for khanda ta is hardly a kludge. While khanda
ta does not have behaviours typical of a half form wrt clustering (and
so is probably best not referred to as a half form), it *is* referred
to as such by some, including some Bengalis. The Indic model specifies
the use of <C, virama, C> normally and <C, virama, ZWJ, C> and <C,
virama, ZWNJ, C> for explicit overrides, and this is precisely what is
being proposed here.
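
For concreteness, the sequences under discussion can be spelled out as code points. This is a small sketch in Python; using Bengali TA for both consonants in the generic <C, virama, C> patterns is an illustrative choice, and the dictionary labels are assumptions. How each sequence renders depends on the font and shaping engine, so nothing about display is claimed here.

```python
import unicodedata

TA = "\u09A4"      # BENGALI LETTER TA
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

SEQUENCES = {
    "C, virama, C (normal)": TA + VIRAMA + TA,
    "C, virama, ZWJ, C (override)": TA + VIRAMA + ZWJ + TA,
    "C, virama, ZWNJ, C (override)": TA + VIRAMA + ZWNJ + TA,
    "ta, virama, ZWJ (representation defended here)": TA + VIRAMA + ZWJ,
    "ta, ZWJ, virama (Peter Kirk's alternative)": TA + ZWJ + VIRAMA,
}


def spell_out(seq):
    """List the character names making up a sequence."""
    return [unicodedata.name(c) for c in seq]


if __name__ == "__main__":
    for label, seq in SEQUENCES.items():
        print(label, "=", spell_out(seq))
```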



> Another alternative which should be considered is use of a variation
> selector.

None of the stakeholders on this issue has suggested that option, and I
suspect they would reject it outright. There is no need to introduce a
variation selector; it would constitute yet another innovation in the
Indic model and would only lead to more confusion. 

While the notion that a different presentation form for what is in some
sense the same thing does provide some motivation for the suggestion,
the Indic model already has mechanisms for dealing with this in the
context of Indic scripts. In this context, then, this would be a far
greater kludge than a minor deviation from prototypical behaviour of ZWJ
wrt clustering.

I was aware of these other possibilities; I left them out of the
discussion for a reason: they would only serve to make the document
longer with no real benefit.


Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division



RE: New Public Review Issue

2004-02-24 Thread Kenneth Whistler

>> Another alternative which should be considered is use of a variation
>> selector.
>
> None of the stakeholders on this issue has suggested that option, and I
> suspect would reject it outright. There is no need to introduce a
> variation selector; it would constitute yet another innovation in the
> Indic model and would only lead to more confusion.

I agree with Peter (C, not K) here. The problem with an
approach using variation selectors is twofold. As Peter
Constable says, it would constitute another innovation for
controlling forms in Indic processing, introducing the
possibility for more confusion and mismatch in implementations.
Even worse, however, is that variation selectors are intended
to be ignorable without seriously distorting the interpretation
of the text. The typical case of variation selection
for math symbols just picks out a glyph preference between
what are otherwise freely interchangeable forms. But in
the case of khanda-ta we have a fixed orthographic form that
is correct in some circumstances and incorrect in others, at
least by all accounts I've been hearing. It is such situations
that have typically used ZWJ and ZWNJ in Indic scripts to
control required forms.

Think of variation selection as being more appropriate when
what we are talking about are for most purposes simply
*free variants* for presentation -- either is equally correct
to most people under most circumstances -- but where for
particular presentation purposes someone wishes to choose
out a precise variant and have indication of that usage
reside in the text stream itself. (And even then, this is
only used in extreme circumstances when failure to have such
a mechanism available is causing a mapping problem or similar
issue which threatens to become a character *encoding* problem
for the committees.)
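
To illustrate what "ignorable" means in practice, here is a minimal sketch of a comparison that simply drops variation selectors before comparing two strings. It is an assumption-laden illustration, not a conformance requirement: only VS1 through VS16 (U+FE00..U+FE0F) are handled, and the function names are hypothetical.

```python
# VS1..VS16 only; other selector ranges are left out of this sketch.
VARIATION_SELECTORS = {chr(cp) for cp in range(0xFE00, 0xFE10)}


def strip_variation_selectors(text):
    """Remove variation selectors, keeping everything else."""
    return "".join(c for c in text if c not in VARIATION_SELECTORS)


def same_ignoring_variation(a, b):
    """Compare two strings as a process ignoring selectors would."""
    return strip_variation_selectors(a) == strip_variation_selectors(b)


if __name__ == "__main__":
    # A math symbol with and without VS1 compares equal under this rule.
    print(same_ignoring_variation("\u2268\uFE00", "\u2268"))
```

A process of this kind would lose the khanda-ta distinction entirely if it were carried by a variation selector, which is the concern raised above.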

--Ken




RE: New Public Review Issue

2004-02-24 Thread Asmus Freytag
At 12:11 PM 2/24/2004, Kenneth Whistler wrote:
> Think of variation selection as being more appropriate when
> what we are talking about are for most purposes simply
> *free variants* for presentation -- either is equally correct
> to most people under most circumstances -- but where for
> particular presentation purposes someone wishes to choose
> out a precise variant and have indication of that usage
> reside in the text stream itself. (And even then, this is
> only used in extreme circumstances when failure to have such
> a mechanism available is causing a mapping problem or similar
> issue which threatens to become a character *encoding* problem
> for the committees.)

This is *not* the case for the Mongolian FVS, by the way, one
of the reasons that we didn't use generic Variation selectors
for that script.

I'm not(!) advocating a Bengali FVS, but adding such a beast would
in theory overcome Ken's objection about ignorability of variation
selectors, as it could have documented behavior that's not generic.

However, that's got to be about the second least attractive option
imaginable. (Leaving the slot for truly least attractive option
open here for some as-yet-undiscovered monstrosity ;-)

A./





RE: New Public Review Issue

2004-02-24 Thread Kenneth Whistler

> I'm not(!) advocating a Bengali FVS, but adding such a beast would
> in theory overcome Ken's objection about ignorability of variation
> selectors, as it could have documented behavior that's not generic.
>
> However, that's got to be about the second least attractive option
> imaginable. (Leaving the slot for truly least attractive option
> open here for some as-yet-undiscovered monstrosity ;-)

BENGALI COMBINING KHANDA MODIFIER

A combining mark, which only applies to a TA baseform, and
which has the effect of reshaping the TA into a khanda-ta
form. How's that for an alternative?

--Ken ;-)

Or, if you don't like that, we could have khanda-ta represented
by the sequence of Latin letters, k,h,a,n,d,a,-,t,a and
have the rendering engines and fonts remap that sequence
to the appropriate glyph.




RE: New Public Review Issue

2004-02-24 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kenneth Whistler
>
>> However, that's got to be about the second least attractive option
>> imaginable. (Leaving the slot for truly least attractive option
>> open here for some as-yet-undiscovered monstrosity ;-)
>
> BENGALI COMBINING KHANDA MODIFIER
>
> A combining mark, which only applies to a TA baseform, and
> which has the effect of reshaping the TA into a khanda-ta
> form. How's that for an alternative?

Gackk!


> Or, if you don't like that, we could have khanda-ta represented
> by the sequence of Latin letters, k,h,a,n,d,a,-,t,a and
> have the rendering engines and fonts remap that sequence
> to the appropriate glyph.

Naw, that's pretty lame, being rather similar to what is done in some
systems representing characters as named entities. For instance, it
wouldn't be hard to imagine &khandata; in an XML stream. The only
difference is that it would be a layer of representation one level
removed from Unicode.


What about creating a new control character? Some possible names:

ZERO WIDTH CONJUNCTIVE NON-JOINER
ZERO WIDTH NON-JOINING JOINER
ZERO WIDTH HALF FORM NON-JOINER
ZERO WIDTH SEMI-JOINER

Hey, I think I like that last one. ;-) 

This could be used in the kinds of contexts in which ZWJ and ZWNJ have
been used, but would provide a third alternative for situations like
this in which there is a binary distinction but one of the two things to
be represented doesn't exactly fit into the mold of ZWJ. Now, in most
situations, where things *do* fit the mold of ZWJ, ZWSJ would behave
exactly like ZWJ. But in a situation like this, it would be the
opposite: ZWSJ would be used for the khanda ta; and as for how the
corresponding sequence with ZWJ should be displayed, ZWJ would behave
just like ZWSJ.
 
Of course, we would be free to start inventing new renderings that can
be given to things like Arabic letters preceded or followed by ZWSJ, or
<c, ZWSJ, t>.

(I'll bet this idea still doesn't reach the pinnacle of monstrosity.)


Peter



New Public Review Issue

2004-02-23 Thread Rick McGowan
The Unicode Technical Committee has posted a new issue for public review  
and comment. Details are on the following web page:

http://www.unicode.org/review/

The review period for the new item closes on June 8, 2004.

Please see the page for links to discussion and relevant documents.  
Briefly, the new issue is:

---

30  Bengali Khanda Ta  (Closes 2004.06.08)

The description of khanda ta in section 9.2 of Unicode 4.0 and in one of  
the current Indic FAQs assumed a particular understanding of expected  
behaviors rather than stating those expectations explicitly. Due to certain  
wording and an atypical use of ZERO WIDTH JOINER, some implementers have  
been misled about the behaviors related to khanda ta that were assumed.

In the course of investigating this issue, input was received suggesting  
that the atypical use of ZERO WIDTH JOINER was problematic, and that a  
different encoded representation for khanda ta should be adopted.

Alternate representations for khanda ta are described and evaluated in the  
review document. It is proposed that the existing representation specified  
in section 9.2 be retained, but that the description in the Standard be  
revised to remove any ambiguity and potential for misunderstanding.

---

If you have comments for official UTC consideration, please post them by
submitting your comments through our feedback reporting page:

http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please
use the following link to subscribe (if necessary). Please be aware
that discussion comments on the Unicode mail list are not automatically
recorded as input to the UTC. You must use the reporting link above
to generate comments for UTC consideration.

http://www.unicode.org/consortium/distlist.html

Regards,
Rick McGowan
Unicode, Inc.




Re: PR#11 (soft-dotted property) and digraphs (was: New Public Review Issue posted)

2004-02-13 Thread Philippe Verdy
From: Rick McGowan [EMAIL PROTECTED]
 Philippe (and others who might be looking),

  I can't remember what was decided about the Soft-Dotted property of some
  Latin ligatures/digraphs with i or j in PR #11 (yes it was closed last
  August...).

 The resolved issues are posted on the Resolved Issues page. It is linked
 from the Public Review page.

Exactly. It was while reading this page that I posted this question...

The resolved issue only mentions ij (explicitly excluded from the
soft-dotted characters) and says nothing about lj and similar digraphs
(formed with a soft-dotted letter)...




New Public Review Issue posted

2004-02-12 Thread Rick McGowan
The Unicode Technical Committee has posted a new issue for public review
and comment. Details are on the following web page:

http://www.unicode.org/review/

The review period for the new item closes on June 8, 2004.

Please see the page for links to discussion and relevant documents.
Briefly, the new issue is:


29  Normalization Issue  (Closes 2004.06.08)

There is a problem in the language of the specification of Unicode  
Standard Annex #15: Unicode Normalization Forms for forms NFC and NFKC. A  
textual fix is required to make normalization formally self-consistent. The  
fix will not have an impact on real data found in practice (with the  
possible exception of test cases for the algorithm itself), because the  
affected sequences do not constitute well-formed text in any language.  
Details, cases, and recommendations can be found in the review document.


If you have comments for official UTC consideration, please post them by
submitting your comments through our feedback reporting page:

http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please use
the following link to subscribe (if necessary). Please be aware that
discussion comments on the Unicode mail list are not automatically recorded  
as input to the UTC. You must use the reporting link above to generate
comments for UTC consideration.

http://www.unicode.org/consortium/distlist.html

Note: If you are a liaison representative, please forward this message as  
appropriate within your organization.


Please also note that the Unicode 4.0.1 beta period has now closed (issue  
#13). We have also closed issues #26, #27, and #28. Their resolutions can  
all be found on the Resolved Issues page, linked from the above Public  
Review page.


Regards,
Rick McGowan
Unicode, Inc.



Re: New Public Review Issue posted

2004-02-12 Thread Philippe Verdy
I can't remember what was decided about the Soft-Dotted property of some Latin
ligatures/digraphs with i or j in PR #11 (yes, it was closed last August...).

I am thinking of lj, for example. As they are not listed in the final resolution,
I suppose they are still not soft-dotted, and thus their dots are retained
intact even after a diacritic is added above them (exactly as for ij, where
this is explicitly stated).
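
As a rough illustration of which characters are meant, the sketch below lists three such digraphs and their compatibility decompositions. Python's standard unicodedata module does not expose the Soft_Dotted property, so this only shows that each digraph decomposes to a pair ending in j; the choice of U+0133, U+01C9 and U+01CC as samples, and the helper name compat_letters, are assumptions made for this example.

```python
import unicodedata

# Sample digraphs chosen for illustration: ij, lj, nj.
DIGRAPHS = ["\u0133", "\u01C9", "\u01CC"]


def compat_letters(ch):
    """Return the compatibility decomposition (NFKD) of a single character."""
    return unicodedata.normalize("NFKD", ch)


for ch in DIGRAPHS:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "->", compat_letters(ch))
```

Whether the dot of the j survives under a diacritic is a property of the digraph character itself, not of its decomposition, so it has to be looked up in the Unicode Character Database.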

- Original Message - 
From: Rick McGowan [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, February 12, 2004 11:20 PM
Subject: New Public Review Issue posted


 The Unicode Technical Committee has posted a new issue for public review
 and comment. Details are on the following web page:

 http://www.unicode.org/review/

 The review period for the new item closes on June 8, 2004.




Re: New Public Review Issue posted

2004-02-12 Thread Rick McGowan
Philippe (and others who might be looking),

 I can't remember what was decided about the Soft-Dotted property of some
 Latin
 ligatures/digraphs with i or j in PR #11 (yes it was closed last
 August...).


The resolved issues are posted on the Resolved Issues page. It is linked  
from the Public Review page.

Rick




New Public Review Issue

2004-01-29 Thread Rick McGowan
Note: This announcement was intended to go out a few days ago, but was  
delayed due to e-mail trouble with the recent net-wide virus. We apologize  
for the inconvenience of having such a short review period.

The Unicode Technical Committee has posted a new issue for public review  
and comment. Details are on the following web page:

http://www.unicode.org/review/

The review period for the new item closes on February 4, 2004.

Please see the page for links to discussion and relevant documents.  
Briefly, the new issue is:

28  BIDI Boundary_Neutral Property Value  (Closes 2004.02.04)

The BIDI property value BN is currently aligned with the General Category  
Value Format_Character (Cf), minus the BIDI-specific format characters  
(LRM, RLM, RLE, LRE, RLO, LRO, PDF). The intent of the BN property is to  
allow the BIDI algorithm to ignore invisible, irrelevant characters when  
determining the ordering of the visible characters. The proposal is to  
align the BN property with Default_Ignorable_Code_Point property (DICP)  
instead of Cf, minus again the BIDI specific characters.
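
The current alignment can be spot-checked with a small sketch like the one below. It uses Python's unicodedata module, whose data reflects whatever Unicode version ships with the interpreter rather than Unicode 4.0, and it cannot test Default_Ignorable_Code_Point because that property is not exposed there. The helper name cf_and_bidi and the table BIDI_FORMAT are made up for this illustration.

```python
import unicodedata

# The seven BIDI-specific format characters named in the issue text.
BIDI_FORMAT = {
    "LRM": "\u200E", "RLM": "\u200F",
    "LRE": "\u202A", "RLE": "\u202B", "PDF": "\u202C",
    "LRO": "\u202D", "RLO": "\u202E",
}

# A few other format characters, picked as samples for comparison.
OTHER_FORMAT = {"SHY": "\u00AD", "ZWNJ": "\u200C", "ZWJ": "\u200D"}


def cf_and_bidi(ch):
    """Return (General_Category, Bidi_Class) for one character."""
    return unicodedata.category(ch), unicodedata.bidirectional(ch)


for label, ch in {**BIDI_FORMAT, **OTHER_FORMAT}.items():
    category, bidi_class = cf_and_bidi(ch)
    print(f"{label:5} U+{ord(ch):04X}  gc={category}  bc={bidi_class}")
```

The BIDI-specific characters all come out as Cf with their own bidirectional classes, while the sample format characters come out as Cf with class BN.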


If you have comments for official UTC consideration, please post them by  
submitting your comments through our feedback reporting page:

http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please use  
the following link to subscribe (if necessary). Please be aware that  
discussion comments on the Unicode mail list are not automatically recorded  
as input to the UTC. You must use the reporting link above to generate  
comments for UTC consideration.

http://www.unicode.org/consortium/distlist.html

Regards,
Rick McGowan
Unicode, Inc.



RE: [hebrew] ZWJ and ZWNJ in combining sequences, was: New Public Review Issue posted

2004-01-19 Thread Peter Constable
Is there any reason why this needed to be cross-posted to both lists?
Certain members of the Hebrew list have had a very bad habit of allowing
that discussion to spill over to the Unicode list for no good reason. I
hope that responders will be careful in posting to the Hebrew list only.


Peter




New Public Review Issue posted

2004-01-16 Thread Rick McGowan
The Unicode Technical Committee has posted a new issue for public review  
and comment. Details are on the following web page:

http://www.unicode.org/review/

The review period for the new item closes on January 27, 2004.

Please see the page for links to discussion and relevant documents.  
Briefly, the new issue is:


Issue #27   Joiner/Nonjoiner in Combining Character Sequences

Unicode 4.0 describes the structure of Khmer syllables, saying that they  
may contain an interior ZWJ. There is a problem with this that needs to be  
resolved in 4.0.1, because some of the characters later in the syllable can  
be combining characters. This paper describes a proposal to fix this  
problem. As a part of the proposal, a choice has to be made between two  
alternatives.


If you have comments for official UTC consideration, please post them by  
submitting your comments through our feedback reporting page:

http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please use  
the following link to subscribe (if necessary). Please be aware that  
discussion comments on the Unicode mail list are not automatically recorded  
as input to the UTC. You must use the reporting link above to generate  
comments for UTC consideration.

http://www.unicode.org/consortium/distlist.html

Let me take this opportunity also to remind everyone that the closing date  
for comment on several other public review issues is approaching, so if  
you have comments, please try to send them in soon.

Note: If you are a liaison representative, please forward this message as  
appropriate within your organization.

Regards,
Rick McGowan
Unicode, Inc.





ZWJ and ZWNJ in combining sequences, was: New Public Review Issue posted

2004-01-16 Thread Peter Kirk
On 16/01/2004 11:17, Rick McGowan wrote:

The Unicode Technical Committee has posted a new issue for public review  
and comment. Details are on the following web page:

	http://www.unicode.org/review/

The review period for the new item closes on January 27, 2004.

Please see the page for links to discussion and relevant documents.  
Briefly, the new issue is:

Issue #27   Joiner/Nonjoiner in Combining Character Sequences

Unicode 4.0 describes the structure of Khmer syllables, saying that they  
may contain an interior ZWJ. There is a problem with this that needs to be  
resolved in 4.0.1, because some of the characters later in the syllable can  
be combining characters. This paper describes a proposal to fix this  
problem. As a part of the proposal, a choice has to be made between two  
alternatives.

 

Although this issue has been brought up for review in the light of the 
problem with Khmer, it also has a significant impact on Hebrew, and for 
that reason I am bringing it to the attention of the Hebrew list as well.

I support the main proposal, which is to allow the ZWJ and ZWNJ 
characters to occur within combining character sequences. When they 
occur between two combining marks, they will indicate joining and 
non-joining forms respectively of those two combining marks. In Hebrew, 
this will provide a convenient mechanism for requesting or inhibiting 
ligatures between meteg and hataf vowels (see 
http://www.qaya.org/academic/hebrew/Issues-Hebrew-Unicode.html section 
3.5). Previously there was no such mechanism which was strictly 
compatible with Unicode definitions. With this change, the following 
distinctions can be made:

vowel, ZWJ, meteg - medial meteg preferred, but only possible if the 
vowel is a hataf vowel (ZWJ must be ignored for other vowels)

vowel, ZWNJ, meteg - left meteg preferred

vowel, meteg - no preference, font default should be used (probably 
left meteg with all vowels)

meteg, CGJ, vowel - right meteg preferred - or should this last one be 
meteg, ZWNJ, vowel, considering that ZWNJ will have the same effect as 
CGJ of blocking canonical reordering?
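
The sequences above can be written out and checked against canonical reordering with a short sketch such as this one. It only tests how normalization treats the sequences; the meteg positions in the labels are the preferences proposed above, not anything a library enforces. ALEF as the base letter and hataf patah as the sample vowel are arbitrary choices for this illustration, and the helper names are made up.

```python
import unicodedata

ALEF = "\u05D0"         # sample base letter (arbitrary choice)
HATAF_PATAH = "\u05B2"  # sample hataf vowel, ccc 12
METEG = "\u05BD"        # ccc 22
ZWJ = "\u200D"          # ccc 0
ZWNJ = "\u200C"         # ccc 0
CGJ = "\u034F"          # ccc 0


def nfd(text):
    """Apply canonical decomposition, which includes canonical reordering."""
    return unicodedata.normalize("NFD", text)


def code_points(text):
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


SEQUENCES = {
    "medial meteg": ALEF + HATAF_PATAH + ZWJ + METEG,
    "left meteg": ALEF + HATAF_PATAH + ZWNJ + METEG,
    "font default": ALEF + HATAF_PATAH + METEG,
    "right meteg (CGJ)": ALEF + METEG + CGJ + HATAF_PATAH,
    "right meteg (ZWNJ)": ALEF + METEG + ZWNJ + HATAF_PATAH,
}

for label, seq in SEQUENCES.items():
    status = "stable" if nfd(seq) == seq else "reordered"
    print(f"{label:20} {code_points(seq)}  ({status} under NFD)")

# Without CGJ or ZWNJ, meteg before the vowel does not survive normalization.
UNPROTECTED = ALEF + METEG + HATAF_PATAH
print(code_points(UNPROTECTED), "->", code_points(nfd(UNPROTECTED)))
```

Both CGJ and ZWNJ have combining class 0, so as far as normalization is concerned either one keeps the meteg ahead of the vowel.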

I have a small concern that at least potentially there might be a need 
to promote or inhibit a ligature between combining marks which do not 
come together in canonical order. For example, in principle a single 
Hebrew base character might be combined with a hataf vowel (ccc 11-13), 
dagesh (ccc 21) and meteg (ccc 22). In canonical order the dagesh would 
be reordered between the hataf vowel and the meteg, either before or 
after ZWJ/ZWNJ, and would interfere with the mechanism. It might be 
necessary to code dagesh, CGJ, hataf vowel, ZW(N)J, meteg or hataf 
vowel, ZW(N)J, meteg, CGJ, dagesh. No such combination actually occurs 
in the standard text of the Hebrew Bible, but in principle one might be 
found in other texts.
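
The interference described above can be reproduced with a minimal sketch like the following. As before, ALEF and hataf patah are arbitrary samples and the helper names are made up; the code only shows where the dagesh ends up after canonical reordering.

```python
import unicodedata

ALEF = "\u05D0"         # sample base letter (arbitrary choice)
HATAF_PATAH = "\u05B2"  # ccc 12
DAGESH = "\u05BC"       # ccc 21
METEG = "\u05BD"        # ccc 22
ZWJ = "\u200D"          # ccc 0
CGJ = "\u034F"          # ccc 0


def nfd(text):
    """Apply canonical decomposition, which includes canonical reordering."""
    return unicodedata.normalize("NFD", text)


def code_points(text):
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


CASES = {
    # Dagesh typed first: it moves after the vowel, landing before ZWJ.
    "dagesh first": ALEF + DAGESH + HATAF_PATAH + ZWJ + METEG,
    # Dagesh typed last: it moves before the meteg, landing after ZWJ.
    "dagesh last": ALEF + HATAF_PATAH + ZWJ + METEG + DAGESH,
    # The two CGJ-protected spellings suggested above.
    "dagesh, CGJ first": ALEF + DAGESH + CGJ + HATAF_PATAH + ZWJ + METEG,
    "CGJ, dagesh last": ALEF + HATAF_PATAH + ZWJ + METEG + CGJ + DAGESH,
}

for label, seq in CASES.items():
    print(f"{label:18} {code_points(seq)} -> {code_points(nfd(seq))}")
```

In the two unprotected cases the dagesh ends up between the hataf vowel and the meteg, on one side of the ZWJ or the other, while the CGJ spellings pass through normalization unchanged.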

At first sight I see no reason to express a preference between option A 
or option B in the review issue, for Hebrew or any other reason.

Please note the following if you wish to submit official feedback to the 
UTC on this matter.

If you have comments for official UTC consideration, please post them by  
submitting your comments through our feedback reporting page:

   http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please use  
the following link to subscribe (if necessary). Please be aware that  
discussion comments on the Unicode mail list are not automatically recorded  
as input to the UTC. You must use the reporting link above to generate  
comments for UTC consideration.

   http://www.unicode.org/consortium/distlist.html

Let me take this opportunity also to remind everyone that the closing date  
for comment on several other public review issues is approaching, so if  
you have comments, please try to send them in soon.

Note: If you are a liaison representative, please forward this message as  
appropriate within your organization.

Regards,
Rick McGowan
Unicode, Inc.




 



--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/