Re: [OT] Voiced velar fricative
On Wed, Nov 05, 2003 at 10:10:58AM -0800, Doug Ewell wrote:
> I need someone to think of a quick example, off the top of their head, of a language (and example word) that uses the voiced velar fricative, the voiced equivalent of the 'ch' in Scottish 'loch'. The IPA symbol for this sound is [ɣ], or U+0263. The more commonly known the language, the better (i.e. no South American languages with 200 speakers, please).

Czech and Slovak, where it is an allophone of the voiceless velar fricative, so assimilation has to take place: the grapheme ch is usually pronounced /x/, unless certain voiced consonants follow immediately, in which case it is indeed /ɣ/ (U+0263). I have noticed, though, that young people in Bratislava in particular are starting to pronounce it as something similar to the voiced _uvular_ fricative /ʁ/ (U+0281).

--
| Radovan Garabík http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__ garabik @ melkor.dnp.fmph.uniba.sk |
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
Re: [hebrew] Re: Hebrew composition model, with cantillation marks
At 15:53 -0800 2003-11-05, Doug Ewell wrote: Gads, how I wish there were a Hebrew-specific list where these protracted Hebrew-specific discussions could take place. There is. [EMAIL PROTECTED] I just unsubscribed from it because I just can't track the volume of what's being discussed there. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Encoding Tamil SRI
Tamil SHRI [sic] can't be represented correctly in Unicode yet. It will not be possible to represent it correctly until U+0BB6 is encoded. It was accepted for ballot by WG2 and UTC but has to go through the process now. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Merging combining classes, was: New contribution N2676
On 05/11/2003 19:59, Jony Rosenne wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Philippe Verdy Sent: Thursday, November 06, 2003 3:46 AM Is there an initiative in Israel related to the supported glyphs and rendering features required to support Hebrew, like it exists in Europe with MES subsets, and will soon be developed for Chinese? Why would we need it? All major vendors support Hebrew quite well now. Jony You mean, I think, that they support the (unofficial) subset of the Unicode Hebrew block used in modern Hebrew, either only unpointed or with a limited inventory and limited combinations of points. Adequate for normal use in Israel, but not for biblical scholarship. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
RE: Encoding Tamil SRI
Peter Constable wrote:
> > Alternatives given were
> > (0BB8)(0BCD)(0BB1)(0BC0)
> > (0BB6)(0BCD)(0BB1)(0BC0) (if and when U+0BB6 becomes Unicode)
> > (0B9A)(0BBF)(0BB1)(0BC0)
> Alternatives to what? The first and third sequence would have distinct appearances (see attached file), and would constitute distinct spellings. The second cannot be evaluated without knowing what they intend 0BB6 to be.

U+0BB6 = TAMIL LETTER SHA (see http://www.unicode.org/alloc/Pipeline.html).

_ Marco
Re: UTF-16 inside UTF-8
Doug Ewell scripsit:
> To cite a non-Unicode example, in ECMAScript (née JavaScript) there is a function Date.getYear() that was intended to return the last two digits of the year but actually returned the year minus 1900. Of course, starting in 2000 the function returned a value which was useful to practically nobody.

How, not useful? C programmers have been dealing with (year - 1900) since the 70s, and it is now 103. :-)

> Did Sun or ECMA change the definition of Date.getYear()? No, they introduced a new function, Date.getFullYear(), which does what users really want.

I wonder why they bothered, since it can be defined in a single line of ECMAScript. Now if getYear() had indeed returned the last two digits, that would have been annoying, since it could be used only for presentation, and in order to get the actual year, one would have to impose an arbitrary heuristic to map the 2-digit value to a year number.

--
Only do what only you can do. --Edsger W. Dijkstra, deceased 6 August 2002
John Cowan [EMAIL PROTECTED]
http://www.reutershealth.com
http://www.ccil.org/~cowan
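The two cases discussed above can be made concrete. What follows is a minimal sketch in Python rather than ECMAScript; the function names and the pivot value of 70 are assumptions chosen for illustration and do not come from the thread or from any standard. It shows that a (year - 1900) value is recovered with one addition, whereas a two-digit value needs an arbitrary windowing rule.

```python
def year_from_offset(offset_from_1900: int) -> int:
    """Recover the full year from a (year - 1900) value, as C's tm_year stores it."""
    return offset_from_1900 + 1900


def year_from_two_digits(two_digits: int, pivot: int = 70) -> int:
    """Guess a full year from its last two digits.

    The pivot is an arbitrary assumption: values below it are read as 20xx,
    values at or above it as 19xx. Any such rule is a heuristic, not a fact
    about the data.
    """
    if not 0 <= two_digits <= 99:
        raise ValueError("expected a value in 0..99")
    century = 2000 if two_digits < pivot else 1900
    return century + two_digits


print(year_from_offset(103))     # lossless: one addition
print(year_from_two_digits(3))   # depends entirely on the chosen pivot
print(year_from_two_digits(85))
```

The first function never loses information; the second gives different answers for the same input as soon as the pivot changes, which is the annoyance described above.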
Re: [hebrew] Re: Hebrew composition model, with cantillation marks
On 06/11/2003 02:42, Michael Everson wrote: There is. [EMAIL PROTECTED] I just unsubscribed from it because I just can't track the volume of what's being discussed there. Understandable, but sad. When new people join a discussion like that they often have a lot of questions which need answering as well as new ideas which need consideration, and these contribute to a temporary high volume of traffic. In retrospect it might have been better to take some of this off list. But if this means that the older participants who already understand the issues are scared away, the cause of standardisation is not advanced. Meanwhile I would judge that the current spate of high traffic has almost run its course, and things will quieten down over the next couple of days. We need to work towards some real proposals for improving Hebrew support, not just chat. But who is going to know about these proposals and assess them if they are not on the Hebrew list, and if discussion of Hebrew is not allowed on the main list? -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: [hebrew] Re: Hebrew composition model, with cantillation marks
At 04:55 -0800 2003-11-06, Peter Kirk wrote: We need to work towards some real proposals for improving Hebrew support, not just chat. But who is going to know about these proposals and assess them if they are not on the Hebrew list, and if discussion of Hebrew is not allowed on the main list? Please keep the detailed proposals on the Hebrew-specific list. It's probably best not to cc: the main list. If you're thinking of cc:ing, it probably belongs to the detailed list. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: elided base character or obliterated character (was: Hebrew composition model, with cantillation marks)
On Wed, 5 Nov 2003 12:24:00 +0100, Philippe Verdy wrote:
> The obliterated character needed for paleolitic studies, or to encode any texts in which the character is not recognizable already exists: isn't it the REPLACEMENT CHARACTER?

The problem of how to represent missing/obliterated characters in Unicode when transcribing manuscript/printed texts and inscriptions, etc. has always perplexed me. U+FFFD [Replacement Character] is used to replace an incoming character whose value is unknown or unrepresentable in Unicode, and is definitely not the correct character to use to represent a missing or obliterated character in a non-electronic source text.

For Chinese the standard glyph for a missing/obliterated/unclear ideograph is a full-width hollow square (i.e. the same size as a CJK ideograph). This glyph is very common in modern printed Chinese texts, from scholarly editions of ancient texts unearthed from 2,000-year-old tombs to popular typeset reprints of 19th century novels. Several examples of the usage of this glyph in modern printed texts from the PRC can be found at http://uk.geocities.com/babelstone1357/CJK/missing.html

The problem is how to represent this glyph in electronic texts. Browsing the internet there seem to be two, both unsatisfactory, ways of representing this missing ideograph glyph:

1. Using U+25A1 [WHITE SQUARE] (although any of the other white square graphic symbols encoded in Unicode, such as U+25A2, U+25FB or U+25FD, could also be used I suppose). The problems with this character are:
   a) it has the wrong character properties for use within running CJK text.
   b) with CJK fonts such as SimSun U+25A1 is rendered the same height and width as a CJK ideograph, but with non-Chinese fonts such as Arial Unicode MS U+25A1 may be rendered much smaller than a CJK ideograph, which looks totally wrong.
2. Using U+56D7 [a CJK ideograph, rarely used other than as a radical = U+2F1E], which has the right character properties, and renders at the correct size; but the glyph shape may not be completely square depending upon the font style, and basically it is just the wrong character for the job.

It would be extremely useful to have a dedicated Unicode character for a missing CJK ideograph with the right character properties, and I have considered making a proposal for such a character. I have hesitated, though: if there really is such a great need for it (and I personally have web pages which transcribe texts with missing/obliterated ideographs where such a character is desperately needed), then why does it not already exist in Unicode or pre-existing Chinese encoding standards?

Andrew
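The difference in character properties between the candidates named above can be inspected directly. This is a small sketch using Python's standard unicodedata module; the helper name describe is an assumption for illustration, and the values printed depend on the Unicode version bundled with the Python build in use.

```python
import unicodedata

# Candidates mentioned in the message above.
candidates = {
    "U+FFFD": "\uFFFD",  # REPLACEMENT CHARACTER
    "U+25A1": "\u25A1",  # WHITE SQUARE
    "U+56D7": "\u56D7",  # CJK ideograph, same shape as KANGXI RADICAL ENCLOSURE
}


def describe(ch: str) -> tuple[str, str]:
    """Return (General_Category, East_Asian_Width) for a single character."""
    return unicodedata.category(ch), unicodedata.east_asian_width(ch)


for label, ch in candidates.items():
    category, width = describe(ch)
    print(f"{label} {unicodedata.name(ch)}: category={category}, width={width}")
```

The symbols come out as "So" (Symbol, other) with ambiguous East Asian width, while the ideograph is "Lo" (Letter, other) and wide, which matches the complaint that U+25A1 has the wrong properties for running CJK text.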
Re: [hebrew] Re: Hebrew composition model, with cantillation marks
On 06/11/2003 05:14, Michael Everson wrote: At 04:55 -0800 2003-11-06, Peter Kirk wrote: We need to work towards some real proposals for improving Hebrew support, not just chat. But who is going to know about these proposals and assess them if they are not on the Hebrew list, and if discussion of Hebrew is not allowed on the main list? Please keep the detailed proposals on the Hebrew-specific list. It's probably best not to cc: the main list. If you're thinking of cc:ing, it probably belongs to the detailed list. But we Hebrew experts want our proposals to be reviewed in advance by UTC members and others who understand the broad scope of Unicode. This avoids wasting the UTC's time as well as ours by presenting proposals which are clearly unacceptable. But how are UTC members to see or even know about such proposals if they don't monitor the Hebrew list and if the proposals cannot be mentioned, as I proposed, on the general list? -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
RE: Encoding Tamil SRI
Peter Jacobi wrote, The point is, that contrary to northern Indian scripts, Tamil doesn't form conjunct consonants. Perhaps this could be stated as “... Tamil doesn't form many conjunct consonants”? U+0B95, U+0BCD, U+0BB7 should render as Tamil K-SSA (க்ஷ). Best regards, James Kass
RE: Encoding Tamil SRI
Michael Everson wrote, Tamil SHRI [sic] can't be represented correctly in Unicode yet. It will not be possible to represent it correctly until U+0BB6 is encoded. It was accepted for ballot by WG2 and UTC but has to go through the process now. Proposal for adding SHA at U+0BB6 can be seen at: http://wwwold.dkuug.dk/JTC1/SC2/WG2/docs/n2617 In the document, it is noted that the current practice for encoding SHRI in Unicode is SA+VIRAMA+RA. Does this mean that existing documents/data are incorrect or will become incorrect once SHA is formally approved? Best regards, James Kass
RE: Encoding Tamil SRI
In the document, it is noted that the current practice for encoding SHRI in Unicode is SA+VIRAMA+RA. Plus II. (SA+VIRAMA+RA+II). Best regards, James Kass
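For reference, the SA+VIRAMA+RA+II sequence can be spelled out by character name. This is a sketch using Python's unicodedata module; it assumes RA means U+0BB0 TAMIL LETTER RA (the sequences quoted earlier in the thread use U+0BB1 instead), and the helper name names is hypothetical.

```python
import unicodedata


def names(text: str) -> list[str]:
    """List each code point in the string with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# Current practice as described above: SA + VIRAMA + RA + II.
# Assumption: RA is taken to be U+0BB0.
current_practice = "\u0BB8\u0BCD\u0BB0\u0BC0"

for line in names(current_practice):
    print(line)
```

Swapping the first code point for U+0BB6 gives the SHA-based spelling discussed in the proposal; whether a given font renders either sequence as the SRI ligature is a separate rendering question that this listing does not answer.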
[OT] Voiced velar fricative
Common enough in Irish, Doug. Herewith some minimal pairs: ghroí (voiced) chroí (unvoiced) ghas (voiced) chas (unvoiced) ghual (voiced) chual (unvoiced) ghoill (voiced) choill (unvoiced) ghnó (voiced) chnó (unvoiced) Learners (until they develop a good ear for the difference) can make mistakes to their cost in re the above and similar pairings. Hope this helps, mg -- Marion Gunn * EGTeo (Estab.1991) 27 Páirc an Fhéithlinn, Baile an Bhóthair, Co. Átha Cliath, Éire. * [EMAIL PROTECTED] * [EMAIL PROTECTED] *
Re: [hebrew] Re: Hebrew composition model, with cantillation marks
Michael Everson everson at evertype dot com wrote: At 15:53 -0800 2003-11-05, Doug Ewell wrote: Gads, how I wish there were a Hebrew-specific list where these protracted Hebrew-specific discussions could take place. There is. [EMAIL PROTECTED] I know. I was being facetious. Peter Kirk peterkirk at qaya dot org responded to Michael a few messages later: Please keep the detailed proposals on the Hebrew-specific list. It's probably best not to cc: the main list. If you're thinking of cc:ing, it probably belongs to the detailed list. But we Hebrew experts want our proposals to be reviewed in advance by UTC members and others who understand the broad scope of Unicode. This avoids wasting the UTC's time as well as ours by presenting proposals which are clearly unacceptable. But how are UTC members to see or even know about such proposals if they don't monitor the Hebrew list and if the proposals cannot be mentioned, as I proposed, on the general list? I don't think mentioning the proposals is something anyone would object to. It would be nice, though, if the great volume of committee work, which involves initial bouncing around of ideas and maximum controversy among participants, could take place on the [hebrew] list and the proposals, if any, could be brought back to the main list after there is some semblance of consensus among [hebrew] participants: We've come up with the following suggestions for handling this problem with shuffling of Hebrew combining marks or whatever: (1) create a new combining character X; (2) redefine the semantics of existing character Y; (3) create a new base character Z; (4) create a Technical Report clarifying how things should be encoded; (5) etc. etc. 
Comments would then be appropriate to the main list if they are relevant to Unicode in general, or deal with the acceptability of the proposal, or should return to the [hebrew] list if they deal with the minute details of Hebrew, especially if they are comprehensible only to those with a working knowledge of Hebrew (which characterizes much of the current discussion). This bi-level approach is suggested only because of the very high volume of detailed discussion this topic has engendered, not because I think there's anything wrong with discussing Hebrew or details on the Unicode list. I can't help thinking that other specialized lists, such as those for bidi and CJK, were created to resolve this exact type of problem. I realize I may be way off base on this, in which case I'll just continue to make frequent use of my Delete button. -Doug Ewell Fullerton, California http://users.adelphia.net/~dewell/
Re: [hebrew] Re: Hebrew composition model, with cantillation marks
On Thu, 6 Nov 2003 08:30:24 -0800, Doug Ewell wrote: I can't help thinking that other specialized lists, such as those for bidi and CJK, were created to resolve this exact type of problem. CJK list ? Now if only there was a list of Unicode lists ...
Re: Hebrew composition model, with cantillation marks
Andrew, There isn't a CJK list. Rick CJK list ? Now if only there was a list of Unicode lists ...
Re: [hebrew] Re: Hebrew composition model, with cantillation marks
I agree with you here, Doug. I am copying this to the Hebrew list in the hope that those on both lists will follow this kind of procedure. Or does anyone have strong objections? On 06/11/2003 08:30, Doug Ewell wrote: ... Peter Kirk peterkirk at qaya dot org responded to Michael a few messages later: Please keep the detailed proposals on the Hebrew-specific list. It's probably best not to cc: the main list. If you're thinking of cc:ing, it probably belongs to the detailed list. But we Hebrew experts want our proposals to be reviewed in advance by UTC members and others who understand the broad scope of Unicode. This avoids wasting the UTC's time as well as ours by presenting proposals which are clearly unacceptable. But how are UTC members to see or even know about such proposals if they don't monitor the Hebrew list and if the proposals cannot be mentioned, as I proposed, on the general list? I don't think mentioning the proposals is something anyone would object to. It would be nice, though, if the great volume of committee work, which involves initial bouncing around of ideas and maximum controversy among participants, could take place on the [hebrew] list and the proposals, if any, could be brought back to the main list after there is some semblance of consensus among [hebrew] participants: We've come up with the following suggestions for handling this problem with shuffling of Hebrew combining marks or whatever: (1) create a new combining character X; (2) redefine the semantics of existing character Y; (3) create a new base character Z; (4) create a Technical Report clarifying how things should be encoded; (5) etc. etc. 
Comments would then be appropriate to the main list if they are relevant to Unicode in general, or deal with the acceptability of the proposal, or should return to the [hebrew] list if they deal with the minute details of Hebrew, especially if they are comprehensible only to those with a working knowledge of Hebrew (which characterizes much of the current discussion). (Actually, this is not quite true. Most of the recent thread has been an attempt to educate someone who was, by their own admission, not familiar with the details of Hebrew, but nevertheless wanted to help fix the problems.) This bi-level approach is suggested only because of the very high volume of detailed discussion this topic has engendered, not because I think there's anything wrong with discussing Hebrew or details on the Unicode list. I can't help thinking that other specialized lists, such as those for bidi and CJK, were created to resolve this exact type of problem. I realize I may be way off base on this, in which case I'll just continue to make frequent use of my Delete button. -Doug Ewell Fullerton, California http://users.adelphia.net/~dewell/ -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: elided base character or obliterated character (was: Hebrew composition model, with cantillation marks)
Andrew C. West scripsit: The problem of how to represent missing/obliterated characters in Unicode when transcribing manuscript/printed texts and inscriptions, etc. has always perplexed me. IIRC we talked about this a year or so ago, and kicked around the idea that the Chinese square could be treated as a glyph variant of U+3013 GETA MARK, which looks quite different but symbolizes the same thing. I don't remember the outcome. -- But you, Wormtongue, you have done what you could for your true master. Some reward you have earned at least. Yet Saruman is apt to overlook his bargains. I should advise you to go quickly and remind him, lest he forget your faithful service. --Gandalf John Cowan [EMAIL PROTECTED]
[offline] RE: [hebrew] Re: Hebrew composition model, with cantillation marks
While I might have added this thread yesterday, I trust you believe that I am attempting to get Hebrew-specific stuff onto the Hebrew list, and trying to kill the kind of rambling threads that have been going on, which I consider fruitless and find very annoying. Peter -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Doug Ewell Sent: Wednesday, November 05, 2003 3:54 PM To: Unicode Mailing List Subject: Re: [hebrew] Re: Hebrew composition model, with cantillation marks Gads, how I wish there were a Hebrew-specific list where these protracted Hebrew-specific discussions could take place. -Doug Ewell Fullerton, California http://users.adelphia.net/~dewell/
list etiquette (was RE: Merging combining classes, was: New contribution N2676)
Folks, there are people on the Unicode list who have been frustrated by the volume of traffic on Hebrew, and for that reason a separate list was created. All of the people currently discussing Hebrew are members of that other list, although certain individuals have a bad habit of sending replies back to the Unicode list. When this happens, I think it would be a courtesy to actively steer the discussion back to the Hebrew list by sending your replies there. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Peter Kirk Sent: Thursday, November 06, 2003 3:34 AM To: Jony Rosenne Cc: 'Philippe Verdy'; [EMAIL PROTECTED] Subject: Re: Merging combining classes, was: New contribution N2676 On 05/11/2003 19:59, Jony Rosenne wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Philippe Verdy Sent: Thursday, November 06, 2003 3:46 AM Is there an initiative in Israel related to the supported glyphs and rendering features required to support Hebrew, like it exists in Europe with MES subsets, and will soon be developed for Chinese? Why would we need it? All major vendors support Hebrew quite well now. Jony You mean, I think, that they support the (unofficial) subset of the Unicode Hebrew block used in modern Hebrew, either only unpointed or with a limited inventory and limited combinations of points. Adequate for normal use in Israel, but not for biblical scholarship. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
RE: [hebrew] Re: Hebrew composition model, with cantillation marks
But we Hebrew experts want our proposals to be reviewed in advance by UTC members and others who understand the broad scope of Unicode... There have been several such people subscribed to the Hebrew list. Rambling verbose discussions are making some of them leave however. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
Re: UTF-16 inside UTF-8
I would like to comment on several statements that I have seen in this thread:

- Migrating from UCS-2 to UTF-16: Doable, and has been done for many applications and libraries.
- Difficult to handle UTF-16? Use ICU - it handles all of Unicode for collation, regular expressions, string casing, codepage conversion, and many other things.
- Support for supplementary characters only for Chinese? Japan has defined JIS X 0213, which has characters that map to
  + supplementary characters as well as
  + multiple BMP characters (ICU 2.8 will support codepage conversion involving multiple characters on either side)
  CJKV ideographs, used in several languages, are driving support for supplementary characters.
- Case mappings can be modified to return a 32-bit Unicode code point instead of 16-bit BMP? This works, but only for simple case mappings. Full Unicode case mappings are defined on strings, and single-character APIs won't work at all. Full string mappings map 1:n and are context- and language-sensitive.

markus
http://oss.software.ibm.com/icu/
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.
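Two of the points above, surrogate pairs in UTF-16 and 1:n case mappings, can be illustrated briefly. This is a sketch in Python using only built-in string methods, not ICU; the helper name utf16_code_units and the sample characters are assumptions chosen for illustration.

```python
def utf16_code_units(text: str) -> list[int]:
    """Return the UTF-16 code units (big-endian, no BOM) of a string."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# U+20000 is a supplementary-plane CJK ideograph: one code point, two code units.
supplementary = "\U00020000"
print([hex(unit) for unit in utf16_code_units(supplementary)])

# Full case mappings are defined on strings and may change the length,
# so an API that maps one code point to one code point cannot express them.
print("\u00DF".upper())  # LATIN SMALL LETTER SHARP S uppercases to two letters
print("\uFB01".upper())  # LATIN SMALL LIGATURE FI uppercases to two letters
```

Python's str.upper() applies the language-independent full mappings only; language-sensitive cases such as Turkish dotted and dotless i need a locale-aware library such as ICU.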
Re: Merging combining classes
On 2003.10.30, 15:48, Jim Allan [EMAIL PROTECTED] wrote:
> I offered a suggestion on cedilla and combining undercomma: ... One wants to find matches for Romanian and Latvian personal names or place names or individual forms using cedilla or undercomma regardless of the language in which they are embedded.

All this cedilla vs. undercomma reminds me of something I spotted last summer (and will have on photo ASAP): Portuguese roadsigns are usually set in a type whose cedilla glyphs are shaped like undercommas (which are less frequent than the connecting variant but nonetheless correct).

A large sign at the main western road access to Miranda do Douro, Portugal's northeasternmost city, informs that if you take the road to the left out of the next roundabout you will reach the neighboring city Bragança... All this quite OK, but for some weird reason the cedilla was placed under the second a instead of under the c.

Now the real challenge is to try and encode this typo: someone learned in Portuguese would prefer

0042 0072 0061 0067 0061 0327 006E 0063 0061

but any other would never know and have it

0042 0072 0061 0067 0061 0326 006E 0063 0061

Of course the same can be said about any correctly spelt word, but these may be checked against a dictionary and corrected -- typos cannot.

Anyway -- who ever decided that cedilla and undercomma are different things? Do they have different origins? Any language / orthography using both distinctly?...

--
António MARTINS-Tuválkin [EMAIL PROTECTED]
R. Laureano de Oliveira, 64 r/c esq.
PT-1885-050 MOSCAVIDE (LRS)
+351 934 821 700
http://www.tuvalkin.web.pt/bandeira/
http://pagina.de/bandeiras/
Não me invejo de quem tem
carros, parelhas e montes
só me invejo de quem bebe
a água em todas as fontes
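The two encodings of the misplaced mark can be compared mechanically. This is a sketch using Python's unicodedata module; the helper name from_hex and the variable names are assumptions for illustration.

```python
import unicodedata


def from_hex(sequence: str) -> str:
    """Build a string from space-separated hexadecimal code points."""
    return "".join(chr(int(cp, 16)) for cp in sequence.split())


# The two candidate encodings of the roadsign typo given above.
with_cedilla = from_hex("0042 0072 0061 0067 0061 0327 006E 0063 0061")
with_comma = from_hex("0042 0072 0061 0067 0061 0326 006E 0063 0061")

# The strings differ, and normalization does not unify them.
print(with_cedilla == with_comma)
print(unicodedata.normalize("NFC", with_cedilla) == unicodedata.normalize("NFC", with_comma))

# The two marks even have different canonical combining classes.
print(unicodedata.combining("\u0327"), unicodedata.combining("\u0326"))

# The correct spelling decomposes to c + COMBINING CEDILLA.
print(unicodedata.normalize("NFD", "Bragan\u00E7a") == "Braganc\u0327a")
```

Because no precomposed "a with cedilla" or "a with comma below" exists, NFC leaves both typo strings as they are, so a search would have to treat U+0326 and U+0327 as equivalent explicitly to match them.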
Re: [hebrew] Re: Hebrew composition model, with cantillation marks
Philippe Verdy wrote at 10:15 PM on Wednesday, November 5, 2003: If it's not in the written text, it is not implied by the writer. If this were true, based on the fact that writers wrote very few of them, we would be faced with the implication that there were very few vowels indeed in the old Hebrew, Aramaic, Arabic, Syriac, Phoenician, Moabite, Ammonite, and Ugaritic languages. Respectfully, Dean A. Snyder Scholarly Technology Specialist Library Digital Programs, Sheridan Libraries Garrett Room, MSE Library, 3400 N. Charles St. Johns Hopkins University Baltimore, Maryland, USA 21218 office: 410 516-6850 mobile: 410 245-7168 fax: 410-516-6229 Manager, Digital Hammurabi Project: www.jhu.edu/digitalhammurabi
Re: CJK mailing list (was: Hebrew composition model, with cantillation marks)
From: Rick McGowan [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, November 06, 2003 6:43 PM Subject: Re: Hebrew composition model, with cantillation marks
> Andrew, There isn't a CJK list. Rick

CJK normalization at least does not cause so many problems, as ideographs are not encoded by combining sequences, but individually (except for some private ideographs that may be encoded with ideographic description characters and basic radicals or strokes, but these are all base characters of combining class 0 and are not affected by normalization). So the only discussions in CJK are mostly related to the unification of repertoires from various sources (national standards bodies, research groups and universities, librarians, dictionaries and their publishers...). This concerns a much smaller population, but it is certainly a problem when CJK can be augmented ad infinitum, without any policy, by anyone who uses the script and constantly invents new characters in their own EUDC area and uses them to publish something.
Re: Merging combining classes
António Martins-Tuválkin wrote:
> Anyway -- who ever decided that cedilla and undercomma are different things? Do they have different origins? Any language / orthography using both distinctly?...

I don't know whether undercomma is in origin distinct from cedilla or is historically an adaptation of the cedilla. I *suspect* the latter. Even given a common origin, it is debatable whether they should now be considered the same or not. That is why there is a problem. It isn't cut and dried.

The MARC 21 and ANSEL character sets distinguished the two as CEDILLA and LEFT HOOK (for the undercomma), though it is dubious whether the originators of these sets knew what this left hook was. See http://lcweb2.loc.gov/cocoon/codetables/45.html for current ANSEL specifications and http://www.niso.org/standards/resources/Z39-47-1993(R2002).pdf for 1963 table where it was notoriously given the name LEFT HOOF. Its identity with the undercomma is asserted at http://www.niso.org/international/SC4/Wg1_240.pdf:

> 5/2 HOOK TO LEFT In ISO 5426, this character is annotated ' used in Latvian, Romanian.' Because of this use, the most appropriate mapping is to U+0326 COMBINING COMMA BELOW (annotated as 'variant of the following' [combining cedilla] in the Unicode Standard).

The original ISO 6429 character sets were constructed under the philosophy that differences between cedilla and undercomma were only stylistic. The default images in those tables and in Unicode Standard versions 1 and 2 showed a cedilla form throughout.

However, users of Latvian and Romanian insisted firmly that cedilla forms were not historically correct for printed material in those languages. It was *only* increasing use of fonts created outside of eastern Europe that had caused the incorrect cedilla shape to be seen, especially as computer technology took hold.
For Latvian (and Livonian), the problem was easily solved within standard character sets by font designers using the undercomma character beneath all letters except _c_ or _s_. However, Romanian _s_, which traditionally had undercomma, conflicted with Turkish _s_ with cedilla. The result was a Romanian proposal to add uppercase and lowercase combined characters with undercomma for uppercase and lowercase _s_ and _t_. See ISO/IEC JTC 1/SC 2/WG 2 N1604 (1997) at http://anubis.dkuug.dk/JTC1/SC2/WG2/docs/n1604.htm :

> *RESOLUTION M33.24 (4 Latin characters): _Netherland Negative._* WG 2 accepts the following four Latin characters (requested by Romania), their names and shapes to be encoded in the BMP as follows:
> 0218 LATIN CAPITAL LETTER S WITH COMMA BELOW
> 0219 LATIN SMALL LETTER S WITH COMMA BELOW
> 021A LATIN CAPITAL LETTER T WITH COMMA BELOW
> 021B LATIN SMALL LETTER T WITH COMMA BELOW
> in accordance with document N1361. See resolution M33.26 for further processing.

But Romanians are still frustrated because most fonts distributed as part of computer operating systems or otherwise available do not support these characters.

ISO 8859/16 (intended as a replacement for ISO 8859/2) specifically designates undercomma rather than cedilla with _s_, _S_, _t_, _T_. See ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-16.TXT For the Netherlands opposition see http://wwwold.dkuug.dk/JTC1/SC2/WG3/docs/n441.pdf .

Since there is no linguistic tradition in any language for _t_ with a cedilla shape beneath, most modern fonts display an undercomma beneath U+0162, U+0163 instead of a cedilla shape. It is really only with _s_ that there are two conflicting usages. There are actually three conflicting uses, since Gagauz traditionally uses a cedilla shape under _c_, an undercomma beneath _t_, and a symbol halfway between the two under _s_. See http://www.unicode.org/mail-arch/unicode-ml/y2002-m09/0199.html

Jim Allan
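The distinction between the cedilla and comma-below letters is visible in their canonical decompositions. This is a sketch using Python's unicodedata module; the helper name base_and_mark is an assumption for illustration.

```python
import unicodedata


def base_and_mark(ch: str) -> str:
    """Return the canonical decomposition of a character as hex code points."""
    return unicodedata.decomposition(ch)


# Lowercase s and t, first with cedilla, then with comma below.
for code_point in (0x015F, 0x0219, 0x0163, 0x021B):
    ch = chr(code_point)
    print(f"U+{code_point:04X} {unicodedata.name(ch)} -> {base_and_mark(ch)}")
```

The cedilla letters decompose to the base letter plus U+0327, the comma-below letters to the base letter plus U+0326, so the two sets stay distinct under every normalization form no matter how a particular font draws them.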
OT: Inuktitut dictionary?
A friend is looking for a vocabulary-rich English-Inuktitut dictionary as a source for names for malamute dogs. He is a scholar in another field (astrobiology), and so is concerned with accuracy. I'm sure he would gladly learn the syllabics to the extent necessary. He has access to university interlibrary loan if the best dictionary is out of print. And I imagine he would be fine with Inupiaq, too. Please email offlist if you have any suggestions. Thanks! -- Curtis Clark http://www.csupomona.edu/~jcclark/ Mockingbird Font Works http://www.mockfont.com/