New Public Review Issue posted
The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ Review periods for the new items close on January 31, 2005. Please see the page for links to discussion and relevant documents. Briefly, the new issue is: 59 Disunification of Dandas The UTC is considering the question of disunifying the characters U+0964 DEVANAGARI DANDA and U+0965 DEVANAGARI DOUBLE DANDA from their counterparts in several other Indic scripts. Feedback on this issue, for or against the disunification, is being sought. A background document is available here: http://www.unicode.org/review/pr-59.html If you have comments for official UTC consideration, please post them by submitting your comments through our feedback reporting page: http://www.unicode.org/reporting.html If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration. http://www.unicode.org/consortium/distlist.html Regards, Rick McGowan Unicode, Inc.
Danda disunification (was Re: New Public Review Issue posted)
Public Review Issue # 59 concerning danda and double danda doesn't mention the Limbu script specifically. The double danda, at least, is used in the Limbu script. See the exhibit on page 12 of N2410.PDF. It's also listed in the Limbu punctuation shown on page 16. Best regards, James Kass
Re: Danda disunification (was Re: New Public Review Issue posted)
At 04:32 PM 12/23/2004, James Kass wrote: Public Review Issue # 59 concerning danda and double danda doesn't mention the Limbu script specifically. The double danda, at least, is used in the Limbu script. See the exhibit on page 12 of N2410.PDF. It's also listed in the Limbu punctuation shown on page 16. Some notes: The Limbu double danda shows little visual differentiation from the Devanagari double danda - it seems shorter, but it's difficult to separate font-related from script related effects here. If the text sample is typical, it would seem that it is used quite frequently in ordinary text in Limbu, while dandas in other scripts were claimed to be used only in special contexts. A./ PS: the URL is: http://anubis.dkuug.dk/jtc1/sc2/wg2/docs/n2410.pdf
New Public Review Issue posted
The CLDR Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/#pri58 Review periods for the new items close on January 31, 2005. Please see the page for links to discussion and relevant documents. Briefly, the new issue is: 58 Characters with cedilla and comma below in Romanian language The CLDR Technical Committee is seeking feedback regarding the relative frequency of use of the characters with comma below and of the characters with cedilla in Romanian language textual material. If you have comments for official UTC consideration, please post them by submitting your comments through our feedback reporting page: http://www.unicode.org/reporting.html If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration. http://www.unicode.org/consortium/distlist.html Regards, Rick McGowan Unicode, Inc.
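The relative-frequency question in issue 58 can be approximated on any Romanian text sample with a simple count. The sketch below is only an illustration: the function name `count_forms` and the sample string are hypothetical, and the input is normalized to NFC first so that decomposed sequences (s or t followed by a combining cedilla or comma below) are counted together with the precomposed characters.

```python
import unicodedata
from collections import Counter

# The two competing encodings of Romanian s/t with a mark below.
CEDILLA = {"\u015E", "\u015F", "\u0162", "\u0163"}      # S/s/T/t WITH CEDILLA
COMMA_BELOW = {"\u0218", "\u0219", "\u021A", "\u021B"}  # S/s/T/t WITH COMMA BELOW


def count_forms(text):
    """Return (cedilla_count, comma_below_count) for the given text."""
    # Compose decomposed sequences so both spellings are counted uniformly.
    counts = Counter(unicodedata.normalize("NFC", text))
    cedilla = sum(counts[c] for c in CEDILLA)
    comma_below = sum(counts[c] for c in COMMA_BELOW)
    return cedilla, comma_below


# Hypothetical mixed sample: two comma-below letters, two cedilla letters.
sample = "\u0219i \u0163ar\u0103 \u015fi \u021bar\u0103"
print(count_forms(sample))
```

Running this over a corpus only measures what is stored in that corpus; it says nothing about which form authors intended.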
New Public Review Issue
The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ Review period for the new item closes on January 31, 2005. Please see the page for links to discussion and relevant documents. Briefly, the new issue is: 57 Changes to Bidi categories of some characters used with Mathematics The UTC is considering changing the bidi category of seven compatibility characters from ET to ES: U+207A SUPERSCRIPT PLUS SIGN U+208A SUBSCRIPT PLUS SIGN U+FB29 HEBREW LETTER ALTERNATIVE PLUS SIGN U+FE62 SMALL PLUS SIGN U+FE63 SMALL HYPHEN-MINUS U+FF0B FULLWIDTH PLUS SIGN U+FF0D FULLWIDTH HYPHEN-MINUS The UTC is also seeking feedback on the bidi categories of the following characters, and whether to also change these from ET to ES: U+2212 MINUS SIGN U+207B SUPERSCRIPT MINUS U+208B SUBSCRIPT MINUS All of these characters may be used in connection with mathematical applications. If you have comments for official UTC consideration, please post them by submitting your comments through our feedback reporting page: http://www.unicode.org/reporting.html If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration. http://www.unicode.org/consortium/distlist.html Regards, Rick McGowan Unicode, Inc.
New Public Review Issue posted
The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ Review period for the new item closes on November 11, 2004. Please see the page for links to discussion and relevant documents. Briefly, the new issue is: 46 Proposal for Encoded Representations of Meteg In some Biblical Hebrew usage, it is considered necessary to distinguish how the meteg mark positions relative to a vowel point: to the left of the vowel, or to the right; or, in the case of a hataf vowel, between the two components of the hataf vowel. A solution for this has been proposed using control characters, including the zero width joiner and non-joiner characters. This public-review issue is soliciting feedback on this proposed solution. If you have comments for official UTC consideration, please post them by submitting your comments through our feedback reporting page: http://www.unicode.org/reporting.html If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration. http://www.unicode.org/consortium/distlist.html Regards, Rick McGowan Unicode, Inc.
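One plausible reason the order of meteg and vowel in the character stream cannot by itself carry the left/right distinction is canonical reordering: meteg and the vowel points have different non-zero combining classes, so normalization sorts them into a fixed order. The announcement does not state this reasoning, so treat it as an assumption; the sketch below only demonstrates the reordering and does not reproduce the ZWJ/ZWNJ sequences of the proposal.

```python
import unicodedata

ALEF, PATAH, METEG = "\u05D0", "\u05B7", "\u05BD"

# Both marks have non-zero canonical combining classes.
print(unicodedata.combining(PATAH), unicodedata.combining(METEG))

meteg_first = ALEF + METEG + PATAH
vowel_first = ALEF + PATAH + METEG

# After normalization the two spellings are indistinguishable.
normalized = unicodedata.normalize("NFC", meteg_first)
print([f"U+{ord(c):04X}" for c in normalized])
print(normalized == vowel_first)
```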
Re: New Public Review Issue posted
- Original Message - From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Tuesday, September 14, 2004 1:21 AM Subject: New Public Review Issue posted The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ Review period for the new item closes on November 11, 2004. Please see the page for links to discussion and relevant documents. In table 7 the glyph for U+05D6 looks wrong
RE: New Public Review Issue posted
That's what you get when you copy and paste text when you're a bit tired. Of course, the column on the right was supposed to say 05D0... ALEF. I've submitted a revised doc. Peter -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Jacobs Sent: Monday, September 13, 2004 6:35 PM To: [EMAIL PROTECTED]; [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: New Public Review Issue posted - Original Message - From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Tuesday, September 14, 2004 1:21 AM Subject: New Public Review Issue posted The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ Review period for the new item closes on November 11, 2004. Please see the page for links to discussion and relevant documents. In table 7 the glyph for U+05D6 looks wrong
New Public Review Issue posted
The officers of the Unicode Consortium have posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ Review period for the new item closes on August 3, 2004. Please see the page for links to discussion and relevant documents. Briefly, the new issue is: 37 Clarification of the Use of Zero Width Joiner in Indic Scripts There are some inconsistencies in the use of ZERO WIDTH JOINER (ZWJ) in a number of Indic scripts which are outlined in the accompanying review document. This proposal intends to rectify these problems, clarifying how the ZERO WIDTH JOINER is to be applied in scripts, and consolidating common mechanisms for equivalent problems that exist in several scripts. The scope for what is proposed covers Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada and Malayalam. The question for reviewers is: Should the UTC adopt a model in which ZWJ precedes Virama, as proposed in section 7 of the review document? If you have comments for official UTC consideration, please post them by submitting your comments through our feedback reporting page: http://www.unicode.org/reporting.html If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration. http://www.unicode.org/consortium/distlist.html Regards, Rick McGowan Unicode, Inc.
Re: New Public Review Issue posted
Mark Davis [EMAIL PROTECTED] writes: Why modifier letters -- those are not really superscripts. Waw? Last time I went looking for Modifier Letter Small N, I decided it was encoded as U+207F, SUPERSCRIPT LATIN SMALL LETTER N. If it's not, pretty much every variant of n has been encoded as a modifier letter, except for the basic small letter.
Re: New Public Review Issue posted
At 10:19 -0800 2004-05-26, D. Starner wrote: Mark Davis [EMAIL PROTECTED] writes: Why modifier letters -- those are not really superscripts. Waw? Last time I went looking for Modifier Letter Small N, I decided it was encoded as U+207F, SUPERSCRIPT LATIN SMALL LETTER N. If it's not, pretty much every variant of n has been encoded as a modifier letter, except for the basic small letter. That's it. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: New Public Review Issue posted
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of D. Starner Last time I went looking for Modifier Letter Small N, I decided it was encoded as U+207F, SUPERSCRIPT LATIN SMALL LETTER N. If it's not, pretty much every variant of n has been encoded as a modifier letter, except for the basic small letter. Whatever the character properties, it is certainly the case that U+207F is used in phonetic transcription in analogous contexts to characters in the Modifier Letters block. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
RE: New Public Review Issue posted
At 13:16 -0700 2004-05-26, Peter Constable wrote: Whatever the character properties, it is certainly the case that U+207F is used in phonetic transcription in analogous contexts to characters in the Modifier Letters block. NOTA BENE: Is used. It's been recommended for more than a decade. -- Michael Everson * * Everson Typography * * http://www.evertype.com
New Public Review Issue posted
The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ Review period for the new item closes on June 8, 2004. Please see the page for links to discussion and relevant documents. Briefly, the new issue is: Draft Unicode Technical Report #30 Character Foldings 2004.06.08 An updated draft of UTR #30 Character Foldings is now available. This update also provides draft data files for four types of character foldings. The Unicode Technical Committee especially seeks review of the data files. If you have comments for official UTC consideration, please post them by submitting your comments through our feedback reporting page: http://www.unicode.org/reporting.html If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration. http://www.unicode.org/consortium/distlist.html Regards, Rick McGowan Unicode, Inc.
Re: New Public Review Issue posted
Rick McGowan scripsit: The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ I have prepared a draft DiacriticFolding.txt file for this issue; it is temporarily available at http://www.ccil.org/~cowan/DiacriticFolding.txt . This was prepared by looking for lines in UnicodeData that matched the regex '(GREEK|LATIN|CYRILLIC|HEBREW).*WITH'. (I added Hebrew to the set of scripts specified by the current draft of #30.) Characters with decompositions were mapped into the base character of the decomposition; characters without decompositions were mapped by name. The file http://www.ccil.org/~cowan/DiacriticFoldingExceptions.txt contains a list of 32 characters matching the pattern which did not seem to me to be suitable for diacritic folding. I have posted a short version of this note to the Unicode comment form. Comments? -- A rabbi whose congregation doesn't want John Cowan to drive him out of town isn't a rabbi, http://www.ccil.org/~cowan and a rabbi who lets them do it [EMAIL PROTECTED] isn't a man.--Jewish saying http://www.reutershealth.com
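The procedure described above (select characters by name, then map each to the base of its decomposition) might look roughly like the following. It is a sketch under assumptions, not the script actually used: it reads names from Python's `unicodedata` rather than from UnicodeData.txt, follows canonical decompositions only, and returns None for characters that would have to be "mapped by name". The function name `fold_candidate` is hypothetical.

```python
import re
import unicodedata

# The regex quoted in the message above.
PATTERN = re.compile(r"(GREEK|LATIN|CYRILLIC|HEBREW).*WITH")


def fold_candidate(ch):
    """Return the base character of ch's canonical decomposition, or None.

    None means either the name does not match PATTERN, or the character
    has no canonical decomposition and would need a mapping by name.
    """
    if not PATTERN.search(unicodedata.name(ch, "")):
        return None
    base = ch
    decomp = unicodedata.decomposition(base)
    # Follow canonical decompositions; a leading "<" marks a compatibility one.
    while decomp and not decomp.startswith("<"):
        base = chr(int(decomp.split()[0], 16))
        decomp = unicodedata.decomposition(base)
    return base if base != ch else None


for ch in ("\u00E9", "\u1EA5", "\u00D8", "e"):
    print(f"U+{ord(ch):04X} -> {fold_candidate(ch)!r}")
```

U+00D8 LATIN CAPITAL LETTER O WITH STROKE matches the pattern but has no decomposition, which is the kind of character that needs either a by-name mapping or an entry in the exceptions list.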
Re: New Public Review Issue posted
I don't think the fold to base is as useful as some other information. For those characters with a canonical decomposition, the decomposition carries more information, since you can combine it with a remove-combining-marks folding to get the folding to base. For my part, what would be more interesting would be a full decomposition of the characters that don't have a canonical decomposition, e.g. LATIN CAPITAL LETTER O WITH STROKE = O + / BTW, I had posted some commentary on TR30, which I will repeat here. ... I found these files almost impossible to assess in code point form, so I ran them through a quick ICU transform to add comments with the real characters and names. I also NFC'd the forms, just for consistency. These files generated from Asmus's are in http://www.macchiato.com/utc/tr30/. I'd suggest posting them in this form for public review of the TR, since others will have the same difficulty in assessing the quality of the data. Here are some quick comments. http://www.macchiato.com/utc/tr30/HiraganaFolding-new.txt Adding digraph expansions seems quite odd. http://www.macchiato.com/utc/tr30/KatakanaFolding-new.txt When in NFC, whole batches of these mappings are NOPs. Don't know why they are there; they are also not consistent in the use of composed vs. decomposed forms. This file combines half-width katakana folding. I think it is much more useful if that is separated out. Someone can apply a sequence of two transforms if they want both. http://www.macchiato.com/utc/tr30/SuperscriptFolding-new.txt This feels like a real potpourri of stuff. Why superscripts and not subscripts? Why annotation characters? Why modifier letters -- those are not really superscripts. Waw? http://www.macchiato.com/utc/tr30/WidthFolding-new.txt This file would be MUCH more useful if in two separate files. Full-width to half-width Half-width to full-width Again, remove the NFC mappings. 
27E6; 301A #MATHEMATICAL LEFT WHITE SQUARE BRACKET LEFT WHITE SQUARE BRACKET These don't appear to be a width issue. Note that I have not checked these new data tables for completeness; these were just some quick observations. Mark __ http://www.macchiato.com - Original Message - From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Tue, 2004 May 25 14:57 Subject: Re: New Public Review Issue posted Rick McGowan scripsit: The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ I have prepared a draft DiacriticFolding.txt file for this issue; it is temporarily available at http://www.ccil.org/~cowan/DiacriticFolding.txt . This was prepared by looking for lines in UnicodeData that matched the regex '(GREEK|LATIN|CYRILLIC|HEBREW).*WITH'. (I added Hebrew to the set of scripts specified by the current draft of #30.) Characters with decompositions were mapped into the base character of the decomposition; characters without decompositions were mapped by name. The file http://www.ccil.org/~cowan/DiacriticFoldingExceptions.txt contains a list of 32 characters matching the pattern which did not seem to me to be suitable for diacritic folding. I have posted a short version of this note to the Unicode comment form. Comments? -- A rabbi whose congregation doesn't want John Cowan to drive him out of town isn't a rabbi, http://www.ccil.org/~cowan and a rabbi who lets them do it [EMAIL PROTECTED] isn't a man.--Jewish saying http://www.reutershealth.com
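Mark's point that canonical decomposition plus a "remove combining marks" folding already yields the fold to base can be sketched in a few lines. The function name `fold_to_base` is an assumption, and "combining mark" is taken here to mean any character whose general category starts with M.

```python
import unicodedata


def fold_to_base(text):
    """Canonically decompose text, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )


print(fold_to_base("\u00E9"))  # e with acute
print(fold_to_base("\u1EA5"))  # a with circumflex and acute
print(fold_to_base("\u00D8"))  # O with stroke: no canonical decomposition
```

The last case comes back unchanged, which is why a separate "full decomposition" (O + /) for such characters is the more interesting data to supply.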
New Public Review Issue posted
The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ The review period for the new item closes on June 8, 2004. Please see the page for links to discussion and relevant documents. Briefly, the new issue is: --- 31 Cantonese Romanization 2004.06.08 The sources for the Unihan database use multiple competing romanizations of Cantonese, while the Unihan database uses yet another romanization. We feel that there is no good reason for Unicode to contribute to this confusion, so we plan to adopt a single, standard Cantonese romanization for use throughout the Unihan database. --- Also, the closing dates for issues #20 and #25 have been extended into June. --- If you have comments for official UTC consideration, please post them by submitting your comments through our feedback reporting page: http://www.unicode.org/reporting.html If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration. http://www.unicode.org/consortium/distlist.html Regards, Rick McGowan Unicode, Inc.
Re: New Public Review Issue
On 23/02/2004 15:33, Rick McGowan wrote: The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ Review periods for the new item closes on June 8, 2004. Please see the page for links to discussion and relevant documents. Briefly, the new issue is: --- 30 Bengali Khanda Ta (Closes 2004.06.08) ... Although I don't know much about Bengali, my work on Hebrew and other languages leads me to think of other possible options beyond the four described in this document, which should be considered seriously if changes to the existing encoding model are being considered. The option ta, ZWJ, virama is mentioned in the document, but dismissed without proper argument although it would seem to me that this is a far more logical encoding than ta, virama, ZWJ . After all, the character in question can easily be understood as a ligature of ta and virama, but certainly not as ta followed by a ligature of virama with the following character. While I can understand the objection that this involve[s] innovations into the general Indic encoding model, there does come a time when such innovations are preferable to kludges of the existing model. A recent UTC decision has removed the objection to this encoding that ZWJ should not be used within a combining character sequence. Another alternative which should be considered is use of a variation selector. These were apparently designed for situations like this where two characters are graphically distinct and perceived by the user community as distinct, but also have an underlying unity which should be preserved. In one sense this can be considered as like a new character, thus meeting the user community preference for model D, but it also meets the last objection to this model. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
RE: New Public Review Issue
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Peter Kirk The option ta, ZWJ, virama is mentioned in the document, but dismissed without proper argument although it would seem to me that this is a far more logical encoding than ta, virama, ZWJ . After all, the character in question can easily be understood as a ligature of ta and virama, but certainly not as ta followed by a ligature of virama with the following character. I had indeed thought of ta, ZWJ, virama because of the fact that the khanda ta is kind of like a ligature of ta and virama. But the generic use of ZWJ for requesting more-ligated forms is *not* applicable to Indic scripts. (If it were, C, virama, C should produce a half form and C, virama, ZWJ, C should be required to generate the conjunct form.) It would *not* lead to more reliable implementations and better usability to mix usages of ZWJ like this unless absolutely necessary. While I can understand the objection that this involve[s] innovations into the general Indic encoding model, there does come a time when such innovations are preferable to kludges of the existing model. Using ta, virama, ZWJ for khanda ta is hardly a kludge. While khanda ta does not have behaviours typical of a half form wrt clustering (and so is probably best not referred to as a half form), it *is* referred to as such by some, including some Bengalis. The Indic model specifies the use of C, virama, C normally and C, virama, ZWJ, C and C, virama, ZWNJ, C for explicit overrides, and this is precisely what is being proposed here. Another alternative which should be considered is use of a variation selector. None of the stakeholders on this issue has suggested that option, and I suspect would reject it outright. There is no need to introduce a variation selector; it would constitute yet another innovation in the Indic model and would only lead to more confusion. 
While the notion that a different presentation form for what is in some sense the same thing does provide some motivation for the suggestion, the Indic model already has mechanisms for dealing with this in the context of Indic scripts. In this context, then, this would be a far greater kludge than a minor deviation from prototypical behaviour of ZWJ wrt clustering. I was aware of these other possibilities; I left them out of the discussion for a reason: they would only serve to make the document longer with no real benefit. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
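For readers who want the candidate encodings spelled out as code points, the sequences discussed in this exchange can be written down as follows. The dictionary labels are informal paraphrases of the thread, not official terminology, and the helper `describe` is hypothetical; how any of these sequences actually renders depends entirely on the font and shaping engine.

```python
import unicodedata

TA, VIRAMA = "\u09A4", "\u09CD"   # BENGALI LETTER TA, BENGALI SIGN VIRAMA
ZWJ, ZWNJ = "\u200D", "\u200C"    # joiner and non-joiner controls

SEQUENCES = {
    "C, virama (default)": TA + VIRAMA,
    "C, virama, ZWJ (representation retained for khanda ta)": TA + VIRAMA + ZWJ,
    "C, virama, ZWNJ (explicit override)": TA + VIRAMA + ZWNJ,
    "C, ZWJ, virama (alternative raised by Peter Kirk)": TA + ZWJ + VIRAMA,
}


def describe(sequence):
    """List each character of a sequence as 'U+XXXX NAME'."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in sequence]


for label, sequence in SEQUENCES.items():
    print(label)
    for line in describe(sequence):
        print("   ", line)
```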
RE: New Public Review Issue
Another alternative which should be considered is use of a variation selector. None of the stakeholders on this issue has suggested that option, and I suspect would reject it outright. There is no need to introduce a variation selector; it would constitute yet another innovation in the Indic model and would only lead to more confusion. I agree with Peter (C, not K) here. The problem with an approach using variation selectors is twofold. As Peter Constable says, it would constitute another innovation for controlling forms in Indic processing, introducing the possibility for more confusion and mismatch in implementations. Even worse, however, is that variation selectors are intended to be ignorable without serious distortion of the impact on text interpretation. The typical cases of variation selection for math symbols just picks out a glyph preference between what are otherwise freely interchangeable forms. But in the case of khanda-ta we have a fixed orthographic form that is correct in some circumstances and incorrect in others, at least by all accounts I've been hearing. It is such situations that have typically used ZWJ and ZWNJ in Indic scripts to control required forms. Think of variation selection as being more appropriate when what we are talking about are for most purposes simply *free variants* for presentation -- either is equally correct to most people under most circumstances -- but where for particular presentation purposes someone wishes to choose out a precise variant and have indication of that usage reside in the text stream itself. (And even then, this is only used in extreme circumstances when failure to have such a mechanism available is causing a mapping problem or similar issue which threatens to become a character *encoding* problem for the committees.) --Ken
RE: New Public Review Issue
At 12:11 PM 2/24/2004, Kenneth Whistler wrote: Think of variation selection as being more appropriate when what we are talking about are for most purposes simply *free variants* for presentation -- either is equally correct to most people under most circumstances -- but where for particular presentation purposes someone wishes to choose out a precise variant and have indication of that usage reside in the text stream itself. (And even then, this is only used in extreme circumstances when failure to have such a mechanism available is causing a mapping problem or similar issue which threatens to become a character *encoding* problem for the committees.) This is *not* the case for the Mongolian FVS, by the way, one of the reasons that we didn't use generic Variation selectors for that script. I'm not(!) advocating a Bengali FVS, but adding such a beast would in theory overcome Ken's objection about ignorability of variation selectors, as it could have documented behavior that's not generic. However, that's got to be about the second least attractive option imaginable. (Leaving the slot for truly least attractive option open here for some as-yet-undiscovered monstrosity ;-) A./
RE: New Public Review Issue
I'm not(!) advocating a Bengali FVS, but adding such a beast would in theory overcome Ken's objection about ignorability of variation selectors, as it could have documented behavior that's not generic. However, that's got to be about the second least attractive option imaginable. (Leaving the slot for truly least attractive option open here for some as-yet-undiscovered monstrosity ;-) BENGALI COMBINING KHANDA MODIFIER A combining mark, which only applies to a TA baseform, and which has the effect of reshaping the TA into a khanda-ta form. How's that for an alternative? --Ken ;-) Or, if you don't like that, we could have khanda-ta represented by the sequence of Latin letters, k,h,a,n,d,a,-,t,a and have the rendering engines and fonts remap that sequence to the appropriate glyph.
RE: New Public Review Issue
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kenneth Whistler However, that's got to be about the second least attractive option imaginable. (Leaving the slot for truly least attractive option open here for some as-yet-undiscovered monstrosity ;-) BENGALI COMBINING KHANDA MODIFIER A combining mark, which only applies to a TA baseform, and which has the effect of reshaping the TA into a khanda-ta form. How's that for an alternative? Gackk! Or, if you don't like that, we could have khanda-ta represented by the sequence of Latin letters, k,h,a,n,d,a,-,t,a and have the rendering engines and fonts remap that sequence to the appropriate glyph. Naw, that's pretty lame, being rather similar to what is done in some systems representing characters as named entities. For instance, it wouldn't be hard to imagine &khandata; in an XML stream. The only thing missing is that would be a layer of representation one level removed from Unicode. What about creating a new control character? Some possible names: ZERO WIDTH CONJUNCTIVE NON-JOINER ZERO WIDTH NON-JOINING JOINER ZERO WIDTH HALF FORM NON-JOINER ZERO WIDTH SEMI-JOINER Hey, I think I like that last one. ;-) This could be used in the kinds of contexts in which ZWJ and ZWNJ have been used, but would provide a third alternative for situations like this in which there is a binary distinction but one of the two things to be represented doesn't exactly fit into the mold of ZWJ. Now, in most situations, where things *do* fit the mold of ZWJ, ZWSJ would behave exactly like ZWJ. But in a situation like this, it would be the opposite: ZWSJ would be used for the khanda ta; and as for how the corresponding sequence with ZWJ should be displayed, ZWJ would behave just like ZWSJ. Of course, we would be free to start inventing new renderings that can be given to things like Arabic letters preceded or followed by ZWSJ, or c, ZWSJ, t . (I'll bet this idea still doesn't reach the pinnacle of monstrosity.) Peter
New Public Review Issue
The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ The review period for the new item closes on June 8, 2004. Please see the page for links to discussion and relevant documents. Briefly, the new issue is: --- 30 Bengali Khanda Ta (Closes 2004.06.08) The description of khanda ta in section 9.2 of Unicode 4.0 and in one of the current Indic FAQs assumed a particular understanding of expected behaviors rather than stating those expectations explicitly. Due to certain wording and an atypical use of ZERO WIDTH JOINER, some implementers have been misled about the behaviors related to khanda ta that were assumed. In the course of investigating this issue, input was received suggesting that the atypical use of ZERO WIDTH JOINER was problematic, and that a different encoded representation for khanda ta should be adopted. Alternate representations for khanda ta are described and evaluated in the review document. It is proposed that the existing representation specified in section 9.2 be retained, but that the description in the Standard be revised to remove any ambiguity and potential for misunderstanding. --- If you have comments for official UTC consideration, please post them by submitting your comments through our feedback reporting page: http://www.unicode.org/reporting.html If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration. http://www.unicode.org/consortium/distlist.html Regards, Rick McGowan Unicode, Inc.
Re: PR#11 (soft-dotted property) and digraphs (was: New Public Review Issue posted)
From: Rick McGowan [EMAIL PROTECTED] Philippe (and others who might be looking), I can't remember what was decided about the Soft-Dotted property of some Latin ligatures/digraphs with i or j in PR #11 (yes, it was closed last August...). The resolved issues are posted on the Resolved Issues page. It is linked from the Public Review page. Exactly. It was while reading that page that I posted this question... The resolved issue mentions only ij (explicitly excluded from the soft-dotted characters), not lj and similar digraphs (formed with a soft-dotted letter)...
New Public Review Issue posted
The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ The review period for the new item closes on June 8, 2004. Please see the page for links to discussion and relevant documents. Briefly, the new issue is: 29 Normalization Issue (Closes 2004.06.08) There is a problem in the language of the specification of Unicode Standard Annex #15: Unicode Normalization Forms for forms NFC and NFKC. A textual fix is required to make normalization formally self-consistent. The fix will not have an impact on real data found in practice (with the possible exception of test cases for the algorithm itself), because the affected sequences do not constitute well-formed text in any language. Details, cases, and recommendations can be found in the review document. If you have comments for official UTC consideration, please post them by submitting your comments through our feedback reporting page: http://www.unicode.org/reporting.html If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration. http://www.unicode.org/consortium/distlist.html Note: If you are a liaison representative, please forward this message as appropriate within your organization. Please also note that the Unicode 4.0.1 beta period has now closed (issue #13). We have also closed issues #26, #27, and #28. Their resolutions can all be found on the Resolved Issues page, linked from the above Public Review page. Regards, Rick McGowan Unicode, Inc.
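The kind of sequence at stake is one in which a combining mark sits between two characters that would otherwise compose. The sketch below uses an Oriya vowel sign pair as an illustration; the choice of sequence is an assumption on my part and is not quoted from the announcement, and the behaviour shown is simply what Python's `unicodedata` does, which may or may not match any given older implementation.

```python
import unicodedata

E, AA = "\u0B47", "\u0B3E"   # ORIYA VOWEL SIGN E, ORIYA VOWEL SIGN AA
GRAVE = "\u0300"             # COMBINING GRAVE ACCENT


def code_points(text):
    """Format a string as a list of U+XXXX code points."""
    return [f"U+{ord(c):04X}" for c in text]


# Adjacent pair: composes under NFC.
print(code_points(unicodedata.normalize("NFC", E + AA)))

# Same pair with an intervening combining mark: the second starter is
# blocked from the first, so the sequence is left as it is.
print(code_points(unicodedata.normalize("NFC", E + GRAVE + AA)))
```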
Re: New Public Review Issue posted
I can't remember what was decided about the Soft-Dotted property of some Latin ligatures/digraphs with i or j in PR #11 (yes, it was closed last August...). I am thinking of lj, for example. As they are not listed in the final resolution, I suppose they are still not soft-dotted, and thus their dots are retained intact even after a diacritic is added above them (exactly as for ij, where this is explicitly stated). - Original Message - From: Rick McGowan [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, February 12, 2004 11:20 PM Subject: New Public Review Issue posted The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ The review period for the new item closes on June 8, 2004.
Re: New Public Review Issue posted
Philippe (and others who might be looking), I can't remember what was decided about the Soft-Dotted property of some Latin ligatures/digraphs with i or j in PR #11 (yes, it was closed last August...). The resolved issues are posted on the Resolved Issues page. It is linked from the Public Review page. Rick
New Public Review Issue
Note: This announcement was intended to go out a few days ago, but was delayed due to e-mail trouble with the recent net-wide virus. We apologize for the inconvenience of having such a short review period. The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ The review period for the new item closes on February 4, 2004. Please see the page for links to discussion and relevant documents. Briefly, the new issue is: 28 BIDI Boundary_Neutral Property Value (Closes 2004.02.04) The BIDI property value BN is currently aligned with the General Category value Format_Character (Cf), minus the BIDI-specific format characters (LRM, RLM, RLE, LRE, RLO, LRO, PDF). The intent of the BN property is to allow the BIDI algorithm to ignore invisible, irrelevant characters when determining the ordering of the visible characters. The proposal is to align the BN property with the Default_Ignorable_Code_Point property (DICP) instead of Cf, again minus the BIDI-specific characters. If you have comments for official UTC consideration, please post them by submitting your comments through our feedback reporting page: http://www.unicode.org/reporting.html If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration. http://www.unicode.org/consortium/distlist.html Regards, Rick McGowan Unicode, Inc.
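As a rough illustration of the alignment described in the announcement above, the following sketch uses Python's standard unicodedata module to print the General Category and the BIDI class of the seven BIDI-specific format characters and of a few other Cf characters. The helper name bidi_info and the choice of sample characters are assumptions made for this example, and the values reported depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# The seven BIDI-specific format characters named in the announcement.
BIDI_FORMAT = {
    "LRM": "\u200e", "RLM": "\u200f",
    "LRE": "\u202a", "RLE": "\u202b",
    "PDF": "\u202c", "LRO": "\u202d", "RLO": "\u202e",
}

# A few other format (Cf) characters, chosen only as samples.
OTHER_FORMAT = ["\u00ad", "\u200c", "\u200d"]  # SOFT HYPHEN, ZWNJ, ZWJ


def bidi_info(ch):
    """Return (General Category, BIDI class) for one character."""
    return unicodedata.category(ch), unicodedata.bidirectional(ch)


if __name__ == "__main__":
    for label, ch in BIDI_FORMAT.items():
        print(label, bidi_info(ch))
    for ch in OTHER_FORMAT:
        print(unicodedata.name(ch), bidi_info(ch))
```

All ten characters have General Category Cf, but only the three samples outside the BIDI-specific set carry the BN class; the seven BIDI controls keep their own classes.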
RE: [hebrew] ZWJ and ZWNJ in combining sequences, was: New Public Review Issue posted
Is there any reason why this needed to be cross-posted to both lists? Certain members of the Hebrew list have had a very bad habit of allowing that discussion to spill over to the Unicode list for no good reason. I hope that responders will be careful in posting to the Hebrew list only. Peter
New Public Review Issue posted
The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ The review period for the new item closes on January 27, 2004. Please see the page for links to discussion and relevant documents. Briefly, the new issue is: Issue #27 Joiner/Nonjoiner in Combining Character Sequences Unicode 4.0 describes the structure of Khmer syllables, saying that they may contain an interior ZWJ. There is a problem with this that needs to be resolved in 4.0.1, because some of the characters later in the syllable can be combining characters. This paper describes a proposal to fix this problem. As a part of the proposal, a choice has to be made between two alternatives. If you have comments for official UTC consideration, please post them by submitting your comments through our feedback reporting page: http://www.unicode.org/reporting.html If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration. http://www.unicode.org/consortium/distlist.html Let me take this opportunity also to remind everyone that the closing date for comment on several other public review issues is approaching, so if you have comments, please try to send them in soon. Note: If you are a liaison representative, please forward this message as appropriate within your organization. Regards, Rick McGowan Unicode, Inc.
ZWJ and ZWNJ in combining sequences, was: New Public Review Issue posted
On 16/01/2004 11:17, Rick McGowan wrote: The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ The review period for the new item closes on January 27, 2004. Please see the page for links to discussion and relevant documents. Briefly, the new issue is: Issue #27 Joiner/Nonjoiner in Combining Character Sequences Unicode 4.0 describes the structure of Khmer syllables, saying that they may contain an interior ZWJ. There is a problem with this that needs to be resolved in 4.0.1, because some of the characters later in the syllable can be combining characters. This paper describes a proposal to fix this problem. As a part of the proposal, a choice has to be made between two alternatives. Although this issue has been brought up for review in the light of the problem with Khmer, it also has a significant impact on Hebrew, and for that reason I am bringing it to the attention of the Hebrew list as well. I support the main proposal, which is to allow the ZWJ and ZWNJ characters to occur within combining character sequences. When they occur between two combining marks, they will indicate joining and non-joining forms respectively of those two combining marks. In Hebrew, this will provide a convenient mechanism for requesting or inhibiting ligatures between meteg and hataf vowels (see http://www.qaya.org/academic/hebrew/Issues-Hebrew-Unicode.html section 3.5). Previously there was no such mechanism that was strictly compatible with Unicode definitions.
With this change, the following distinctions can be made:

  vowel, ZWJ, meteg - medial meteg preferred, but only possible if the vowel is a hataf vowel (ZWJ must be ignored for other vowels)
  vowel, ZWNJ, meteg - left meteg preferred
  vowel, meteg - no preference, font default should be used (probably left meteg with all vowels)
  meteg, CGJ, vowel - right meteg preferred

Or should this last one be meteg, ZWNJ, vowel, considering that ZWNJ will have the same effect as CGJ of blocking canonical reordering?

I have a small concern that, at least potentially, there might be a need to promote or inhibit a ligature between combining marks which do not come together in canonical order. For example, in principle a single Hebrew base character might be combined with a hataf vowel (ccc 11-13), dagesh (ccc 21) and meteg (ccc 22). In canonical order the dagesh would be reordered between the hataf vowel and the meteg, either before or after ZWJ/ZWNJ, and would interfere with the mechanism. It might be necessary to code dagesh, CGJ, hataf vowel, ZW(N)J, meteg or hataf vowel, ZW(N)J, meteg, CGJ, dagesh. No such combination actually occurs in the standard text of the Hebrew Bible, but in principle one might be found in other texts.

At first sight I see no reason to express a preference between option A and option B in the review issue, for Hebrew or any other reason. Please note the following if you wish to submit official feedback to the UTC on this matter. If you have comments for official UTC consideration, please post them by submitting your comments through our feedback reporting page: http://www.unicode.org/reporting.html If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration.
http://www.unicode.org/consortium/distlist.html Let me take this opportunity also to remind everyone that the closing date for comment on several other public review issues is approaching, so if you have comments, please try to send them in soon. Note: If you are a liaison representative, please forward this message as appropriate within your organization. Regards, Rick McGowan Unicode, Inc. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
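The reordering concern raised in the message above can be checked with a small sketch using Python's standard unicodedata module. It is only an illustration under stated assumptions: alef is picked arbitrarily as the base letter, hataf patah (ccc 12) stands in for any hataf vowel, and the helper names ccc and nfd are invented for this example. It shows canonical ordering only and says nothing about how a font or renderer would treat ZWJ, ZWNJ or CGJ.

```python
import unicodedata

ALEF = "\u05d0"         # base letter, chosen arbitrarily for the example
HATAF_PATAH = "\u05b2"  # a hataf vowel, ccc 12
DAGESH = "\u05bc"       # ccc 21
METEG = "\u05bd"        # ccc 22
ZWJ = "\u200d"          # ccc 0, so marks cannot be reordered across it
CGJ = "\u034f"          # ccc 0, likewise blocks canonical reordering


def ccc(text):
    """Canonical combining class of each character in text."""
    return [unicodedata.combining(ch) for ch in text]


def nfd(text):
    """Canonical decomposition, which applies canonical ordering."""
    return unicodedata.normalize("NFD", text)


if __name__ == "__main__":
    # Without a blocker, meteg + vowel is reordered to vowel + meteg.
    print(ccc(nfd(ALEF + METEG + HATAF_PATAH)))
    # With CGJ in between, the typed order survives normalization.
    print(ccc(nfd(ALEF + METEG + CGJ + HATAF_PATAH)))
    # A dagesh typed after the meteg moves in front of it, landing
    # between the ZWJ and the meteg.
    print(ccc(nfd(ALEF + HATAF_PATAH + ZWJ + METEG + DAGESH)))
    # One of the two CGJ encodings suggested above: nothing is reordered.
    print(ccc(nfd(ALEF + DAGESH + CGJ + HATAF_PATAH + ZWJ + METEG)))
```

The third case is the interference described in the message: after normalization the dagesh sits between the joiner and the meteg. The fourth case shows that putting CGJ after the dagesh keeps the hataf vowel, ZWJ and meteg adjacent.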