Re: [OT] Voiced velar fricative

2003-11-06 Thread Radovan Garabik
On Wed, Nov 05, 2003 at 10:10:58AM -0800, Doug Ewell wrote:
 I need someone to think of a quick example, off the top of their head,
 of a language (and example word) that uses the voiced velar fricative,
 the voiced equivalent of the 'ch' in Scottish 'loch'.  The IPA symbol
 for this sound is [ɣ], or U+0263.
 
 The more commonly known the language, the better (i.e. no South American
 languages with 200 speakers, please).

Czech & Slovak, where it is an allophone of the voiceless velar fricative,
so assimilation has to take place - 
the grapheme ch is usually pronounced /x/, unless certain voiced
consonants follow immediately - then it is indeed /ɣ/ (U+0263). Although
I have noticed that young people in Bratislava especially have started to
pronounce it as something similar to the voiced _uvular_ fricative /ʁ/ (U+0281).


-- 
 ---
| Radovan Garabík http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__garabik @ melkor.dnp.fmph.uniba.sk |
 ---
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!



Re: [hebrew] Re: Hebrew composition model, with cantillation marks

2003-11-06 Thread Michael Everson
At 15:53 -0800 2003-11-05, Doug Ewell wrote:
Gads, how I wish there were a Hebrew-specific list where these
protracted Hebrew-specific discussions could take place.
There is. [EMAIL PROTECTED]

I just unsubscribed from it because I just can't track the volume of 
what's being discussed there.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Encoding Tamil SRI

2003-11-06 Thread Michael Everson
Tamil SHRI [sic] can't be represented correctly in Unicode yet. It 
will not be representable correctly until U+0BB6 is encoded. It was 
accepted for ballot by WG2 and UTC but has to go through the process 
now.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Merging combining classes, was: New contribution N2676

2003-11-06 Thread Peter Kirk
On 05/11/2003 19:59, Jony Rosenne wrote:

 

-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Philippe Verdy
Sent: Thursday, November 06, 2003 3:46 AM

Is there an initiative in Israel related to the supported 
glyphs and rendering features required to support Hebrew, 
like it exists in Europe with MES subsets, and will soon be 
developed for Chinese?

   

Why would we need it? All major vendors support Hebrew quite well now.

Jony
 

You mean, I think, that they support the (unofficial) subset of the 
Unicode Hebrew block used in modern Hebrew, either only unpointed or 
with a limited inventory and limited combinations of points. Adequate 
for normal use in Israel, but not for biblical scholarship.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




RE: Encoding Tamil SRI

2003-11-06 Thread Marco Cimarosti
Peter Constable wrote:
  Alternatives given were
  (0BB8)(0BCD)(0BB1)(0BC0)
  (0BB6)(0BCD)(0BB1)(0BC0)  (if and when U+0BB6 becomes Unicode)
  (0B9A)(0BBF)(0BB1)(0BC0)
 
 Alternatives to what? The first and third sequence would have distinct
 appearances (see attached file), and would constitute distinct
 spellings. The second cannot be evaluated without knowing what they
 intend 0BB6 to be.

U+0BB6 = TAMIL LETTER SHA (see
http://www.unicode.org/alloc/Pipeline.html).

_ Marco



Re: UTF-16 inside UTF-8

2003-11-06 Thread John Cowan
Doug Ewell scripsit:

 To cite a non-Unicode example, in ECMAScript (née JavaScript) there is a
 function Date.getYear() that was intended to return the last two digits
 of the year but actually returned the year minus 1900.  Of course,
 starting in 2000 the function returned a value which was useful to
 practically nobody.

How, not useful?  C programmers have been dealing with (year - 1900)
since the 70s, and it is now 103.  :-)

 Did Sun or ECMA change the definition of
 Date.getYear()?  No, they introduced a new function, Date.getFullYear(),
 which does what users really want.

I wonder why they bothered, since it can be defined in a single line of
ECMAScript.

Now if getYear() had indeed returned the last two digits, that would have
been annoying, since it could be used only for presentation, and in
order to get the actual year, one would have to impose an arbitrary
heuristic to map the 2-digit value to a year number.
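The two points above can be sketched as follows. This is an illustration in Python rather than ECMAScript, and all three function names are hypothetical: the first mirrors the year-minus-1900 convention, the second is the "single line" that recovers the full year, and the third shows the kind of arbitrary windowing heuristic a true two-digit year would force on callers.

```python
from datetime import date

def get_year(d: date) -> int:
    """Hypothetical analogue of the legacy convention: year minus 1900."""
    return d.year - 1900

def get_full_year(d: date) -> int:
    """The one-line fix: undo the 1900 offset."""
    return get_year(d) + 1900

def guess_year(two_digits: int, pivot: int = 70) -> int:
    """Arbitrary windowing heuristic (an assumption, not a standard):
    values at or above the pivot map to 19xx, the rest to 20xx."""
    return (1900 if two_digits >= pivot else 2000) + two_digits

d = date(2003, 11, 6)
print(get_year(d), get_full_year(d), guess_year(d.year % 100))
```

The pivot of 70 is chosen only for the example; any pivot misdates some years, which is exactly why a two-digit result would have been worse than year minus 1900.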

-- 
Only do what only you can do.   John Cowan [EMAIL PROTECTED]
  --Edsger W. Dijkstra, http://www.reutershealth.com
deceased 6 August 2002  http://www.ccil.org/~cowan



Re: [hebrew] Re: Hebrew composition model, with cantillation marks

2003-11-06 Thread Peter Kirk
On 06/11/2003 02:42, Michael Everson wrote:

There is. [EMAIL PROTECTED]

I just unsubscribed from it because I just can't track the volume of 
what's being discussed there.
Understandable, but sad. When new people join a discussion like that 
they often have a lot of questions which need answering as well as new 
ideas which need consideration, and these contribute to a temporary high 
volume of traffic. In retrospect it might have been better to take some 
of this off list. But if this means that the older participants who 
already understand the issues are scared away, the cause of 
standardisation is not advanced. Meanwhile I would judge that the 
current spate of high traffic has almost run its course, and things will 
quieten down over the next couple of days.

We need to work towards some real proposals for improving Hebrew 
support, not just chat. But who is going to know about these proposals 
and assess them if they are not on the Hebrew list, and if discussion of 
Hebrew is not allowed on the main list?

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




Re: [hebrew] Re: Hebrew composition model, with cantillation marks

2003-11-06 Thread Michael Everson
At 04:55 -0800 2003-11-06, Peter Kirk wrote:

We need to work towards some real proposals for improving Hebrew 
support, not just chat. But who is going to know about these 
proposals and assess them if they are not on the Hebrew list, and if 
discussion of Hebrew is not allowed on the main list?
Please keep the detailed proposals on the Hebrew-specific list. It's 
probably best not to cc: the main list. If you're thinking of cc:ing, 
it probably belongs to the detailed list.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: elided base character or obliterated character (was: Hebrew composition model, with cantillation marks)

2003-11-06 Thread Andrew C. West
On Wed, 5 Nov 2003 12:24:00 +0100, Philippe Verdy wrote:
 
 The obliterated character needed for paleographic studies, or to encode any
 texts in which the character is not recognizable already exists: isn't it
 the REPLACEMENT CHARACTER?
 

The problem of how to represent missing/obliterated characters in Unicode when
transcribing manuscript/printed texts and inscriptions, etc. has always
perplexed me.

U+FFFD [Replacement Character] is used to replace an incoming character whose
value is unknown or unrepresentable in Unicode, and is definitely not the
correct character to use to represent a missing or obliterated character in a
non-electronic source text.
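For illustration, here is a minimal Python sketch of the situation U+FFFD is meant for: a decoder meets a byte it cannot interpret and substitutes the replacement character. The sample bytes are an assumption chosen for the example (a Latin-1 e-acute fed to a UTF-8 decoder).

```python
raw = b"caf\xe9"  # Latin-1 bytes; 0xE9 alone is not valid UTF-8

# errors="replace" substitutes U+FFFD for each undecodable sequence
text = raw.decode("utf-8", errors="replace")

print([f"U+{ord(ch):04X}" for ch in text])
```

Here U+FFFD records a failure of the conversion process, not a gap in the source document, which is the distinction drawn above.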

For Chinese the standard glyph for a missing/obliterated/unclear ideograph is a
full-width hollow square (i.e. the same size as a CJK ideograph). This glyph is
very common in modern printed Chinese texts, from scholarly editions of ancient
texts unearthed from 2,000 year old tombs to popular typeset reprints of 19th
century novels. Several examples of the usage of this glyph in modern printed
texts from the PRC can be found at
http://uk.geocities.com/babelstone1357/CJK/missing.html

The problem is how to represent this glyph in electronic texts. Browsing the
internet there seem to be two, both unsatisfactory, ways of representing this
missing ideograph glyph :

1. Using U+25A1 □ [WHITE SQUARE] (although any of the other white square
graphic symbols encoded in Unicode, such as U+25A2, U+25FB or U+25FD, could also
be used I suppose). The problems with this character are :
a) it has the wrong character properties for use within running CJK text.
b) with CJK fonts such as SimSun U+25A1 is rendered the same height and width as
a CJK ideograph, but with non-Chinese fonts such as Arial Unicode MS U+25A1 may
be rendered much smaller than a CJK ideograph, which looks totally wrong.

2. Using U+56D7 囗 [a CJK ideograph, rarely used other than as a radical =
U+2F1E], which has the right character properties, and renders at the correct
size; but the glyph shape may not be completely square depending upon the font
style, and basically it is just the wrong character for the job.
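The difference in character properties between the two candidates can be inspected with Python's standard `unicodedata` module. This is only a sketch: the `describe` helper is hypothetical, and the values reported depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

candidates = {
    "U+25A1": "\u25a1",  # WHITE SQUARE
    "U+56D7": "\u56d7",  # CJK ideograph used as a stand-in
    "U+FFFD": "\ufffd",  # REPLACEMENT CHARACTER, for comparison
}

def describe(ch: str) -> tuple:
    """Return (name, general category, East Asian width) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.east_asian_width(ch),
    )

for label, ch in candidates.items():
    print(label, *describe(ch))
```

The general category separates the symbol (So) from the ideograph (Lo), and the East Asian width property is one reason the symbol's rendered size varies by font.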

It would be extremely useful to have a dedicated Unicode character for missing
CJK ideograph with the right character properties, and I have considered making
a proposal for such a character, but have hesitated as if there really is such a
great need for it (and I personally have web pages which transcribe texts with
missing/obliterated ideographs where such a character is desperately needed)
then why does it not already exist in Unicode or pre-existing Chinese encoding
standards ?

Andrew



Re: [hebrew] Re: Hebrew composition model, with cantillation marks

2003-11-06 Thread Peter Kirk
On 06/11/2003 05:14, Michael Everson wrote:

At 04:55 -0800 2003-11-06, Peter Kirk wrote:

We need to work towards some real proposals for improving Hebrew 
support, not just chat. But who is going to know about these 
proposals and assess them if they are not on the Hebrew list, and if 
discussion of Hebrew is not allowed on the main list?


Please keep the detailed proposals on the Hebrew-specific list. It's 
probably best not to cc: the main list. If you're thinking of cc:ing, 
it probably belongs to the detailed list.
But we Hebrew experts want our proposals to be reviewed in advance by 
UTC members and others who understand the broad scope of Unicode. This 
avoids wasting the UTC's time as well as ours by presenting proposals 
which are clearly unacceptable. But how are UTC members to see or even 
know about such proposals if they don't monitor the Hebrew list and if 
the proposals cannot be mentioned, as I proposed, on the general list?

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




RE: Encoding Tamil SRI

2003-11-06 Thread jameskass
.
Peter Jacobi wrote,

 The point is, that contrary to northern Indian scripts, Tamil doesn't form
 conjunct consonants.

Perhaps this could be stated as “... Tamil doesn't form many conjunct
consonants”?

U+0B95, U+0BCD, U+0BB7 should render as Tamil K-SSA (க்ஷ).

Best regards,

James Kass
.



RE: Encoding Tamil SRI

2003-11-06 Thread jameskass
.
Michael Everson wrote,

 Tamil SHRI [sic] can't be represented correctly in Unicode yet. It 
 will not be able to be correctly until U+0BB6 is encoded. It was 
 accepted for ballot by WG2 and UTC but has to go through the process 
 now.

Proposal for adding SHA at U+0BB6 can be seen at:
http://wwwold.dkuug.dk/JTC1/SC2/WG2/docs/n2617

In the document, it is noted that the current practice for encoding
SHRI in Unicode is SA+VIRAMA+RA.  Does this mean that existing
documents/data are incorrect or will become incorrect once SHA is
formally approved?

Best regards,

James Kass
.



RE: Encoding Tamil SRI

2003-11-06 Thread jameskass
 In the document, it is noted that the current practice for encoding
 SHRI in Unicode is SA+VIRAMA+RA.

Plus II. (SA+VIRAMA+RA+II).
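As a small sketch of the sequence just described, the following Python builds SA+VIRAMA+RA+II from its code points and spells out the character names. The `spell_out` helper is hypothetical, and whether the result renders as the SRI ligature depends entirely on the font and shaping engine.

```python
import unicodedata

# Current practice as described above: SA + VIRAMA + RA + II
SRI_CURRENT = [0x0BB8, 0x0BCD, 0x0BB0, 0x0BC0]

def spell_out(code_points):
    """List each code point with its Unicode character name."""
    return [f"U+{cp:04X} {unicodedata.name(chr(cp))}" for cp in code_points]

text = "".join(chr(cp) for cp in SRI_CURRENT)
for line in spell_out(SRI_CURRENT):
    print(line)
print(len(text), "code points")
```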

Best regards,

James Kass
.



[OT] Voiced velar fricative

2003-11-06 Thread Marion Gunn


Common enough in Irish, Doug.

Herewith some minimal pairs:

ghroí (voiced)
chroí (unvoiced)

ghas (voiced)
chas (unvoiced)

ghual (voiced)
chual (unvoiced)

ghoill (voiced)
choill (unvoiced)

ghnó (voiced)
chnó (unvoiced)

Learners (until they develop a good ear for the difference) can make
mistakes to their cost in re the above and similar pairings. 

Hope this helps,
mg

-- 
Marion Gunn * EGTeo (Estab.1991)
27 Páirc an Fhéithlinn, Baile an 
Bhóthair, Co. Átha Cliath, Éire.
* [EMAIL PROTECTED] * [EMAIL PROTECTED] *



Re: [hebrew] Re: Hebrew composition model, with cantillation marks

2003-11-06 Thread Doug Ewell
Michael Everson everson at evertype dot com wrote:

 At 15:53 -0800 2003-11-05, Doug Ewell wrote:
 Gads, how I wish there were a Hebrew-specific list where these
 protracted Hebrew-specific discussions could take place.

 There is. [EMAIL PROTECTED]

I know.  I was being facetious.


Peter Kirk peterkirk at qaya dot org responded to Michael a few
messages later:

 Please keep the detailed proposals on the Hebrew-specific list. It's
 probably best not to cc: the main list. If you're thinking of cc:ing,
 it probably belongs to the detailed list.

 But we Hebrew experts want our proposals to be reviewed in advance
 by UTC members and others who understand the broad scope of Unicode.
 This avoids wasting the UTC's time as well as ours by presenting
 proposals which are clearly unacceptable. But how are UTC members to
 see or even know about such proposals if they don't monitor the Hebrew
 list and if the proposals cannot be mentioned, as I proposed, on the
 general list?

I don't think mentioning the proposals is something anyone would
object to.  It would be nice, though,  if the great volume of committee
work, which involves initial bouncing around of ideas and maximum
controversy among participants, could take place on the [hebrew] list
and the proposals, if any, could be brought back to the main list after
there is some semblance of consensus among [hebrew] participants:

We've come up with the following suggestions for handling this problem
with shuffling of Hebrew combining marks or whatever:  (1) create a new
combining character X; (2) redefine the semantics of existing character
Y; (3) create a new base character Z; (4) create a Technical Report
clarifying how things should be encoded; (5) etc. etc.

Comments would then be appropriate to the main list if they are relevant
to Unicode in general, or deal with the acceptability of the proposal,
or should return to the [hebrew] list if they deal with the minute
details of Hebrew, especially if they are comprehensible only to those
with a working knowledge of Hebrew (which characterizes much of the
current discussion).

This bi-level approach is suggested only because of the very high volume
of detailed discussion this topic has engendered, not because I think
there's anything wrong with discussing Hebrew or details on the Unicode
list.  I can't help thinking that other specialized lists, such as those
for bidi and CJK, were created to resolve this exact type of problem.

I realize I may be way off base on this, in which case I'll just
continue to make frequent use of my Delete button.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/




Re: [hebrew] Re: Hebrew composition model, with cantillation marks

2003-11-06 Thread Andrew C. West
On Thu, 6 Nov 2003 08:30:24 -0800, Doug Ewell wrote:
 
 I can't help thinking that other specialized lists, such as those
 for bidi and CJK, were created to resolve this exact type of problem.

CJK list ? Now if only there was a list of Unicode lists ...



Re: Hebrew composition model, with cantillation marks

2003-11-06 Thread Rick McGowan
Andrew, There isn't a CJK list.
Rick


 CJK list ? Now if only there was a list of Unicode lists ...




Re: [hebrew] Re: Hebrew composition model, with cantillation marks

2003-11-06 Thread Peter Kirk
I agree with you here, Doug. I am copying this to the Hebrew list in the 
hope that those on both lists will follow this kind of procedure. Or 
does anyone have strong objections?

On 06/11/2003 08:30, Doug Ewell wrote:

...

Peter Kirk peterkirk at qaya dot org responded to Michael a few
messages later:
 

Please keep the detailed proposals on the Hebrew-specific list. It's
probably best not to cc: the main list. If you're thinking of cc:ing,
it probably belongs to the detailed list.
 

But we Hebrew experts want our proposals to be reviewed in advance
by UTC members and others who understand the broad scope of Unicode.
This avoids wasting the UTC's time as well as ours by presenting
proposals which are clearly unacceptable. But how are UTC members to
see or even know about such proposals if they don't monitor the Hebrew
list and if the proposals cannot be mentioned, as I proposed, on the
general list?
   

I don't think mentioning the proposals is something anyone would
object to.  It would be nice, though,  if the great volume of committee
work, which involves initial bouncing around of ideas and maximum
controversy among participants, could take place on the [hebrew] list
and the proposals, if any, could be brought back to the main list after
there is some semblance of consensus among [hebrew] participants:
We've come up with the following suggestions for handling this problem
with shuffling of Hebrew combining marks or whatever:  (1) create a new
combining character X; (2) redefine the semantics of existing character
Y; (3) create a new base character Z; (4) create a Technical Report
clarifying how things should be encoded; (5) etc. etc.
Comments would then be appropriate to the main list if they are relevant
to Unicode in general, or deal with the acceptability of the proposal,
or should return to the [hebrew] list if they deal with the minute
details of Hebrew, especially if they are comprehensible only to those
with a working knowledge of Hebrew (which characterizes much of the
current discussion).
 

(Actually, this is not quite true. Most of the recent thread has been an 
attempt to educate someone who was, by their own admission, not familiar 
with the details of Hebrew, but nevertheless wanted to help fix the 
problems.)

This bi-level approach is suggested only because of the very high volume
of detailed discussion this topic has engendered, not because I think
there's anything wrong with discussing Hebrew or details on the Unicode
list.  I can't help thinking that other specialized lists, such as those
for bidi and CJK, were created to resolve this exact type of problem.
I realize I may be way off base on this, in which case I'll just
continue to make frequent use of my Delete button.
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/


 



--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




Re: elided base character or obliterated character (was: Hebrew composition model, with cantillation marks)

2003-11-06 Thread John Cowan
Andrew C. West scripsit:

 The problem of how to represent missing/obliterated characters in Unicode when
 transcribing manuscript/printed texts and inscriptions, etc. has always
 perplexed me.

IIRC we talked about this a year or so ago, and kicked around the idea that
the Chinese square could be treated as a glyph variant of U+3013 GETA MARK,
which looks quite different but symbolizes the same thing.

I don't remember the outcome.

-- 
But you, Wormtongue, you have done what you could for your true master.  Some
reward you have earned at least.  Yet Saruman is apt to overlook his bargains.
I should advise you to go quickly and remind him, lest he forget your faithful
service.  --Gandalf John Cowan [EMAIL PROTECTED]



[offline] RE: [hebrew] Re: Hebrew composition model, with cantillation marks

2003-11-06 Thread Peter Constable
While I might have added this thread yesterday, I trust you believe that
I am attempting to get Hebrew-specific stuff onto the Hebrew list, and
trying to kill the kind of rambling threads that have been going on,
which I consider fruitless and find very annoying.

Peter

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
 Behalf Of Doug Ewell
 Sent: Wednesday, November 05, 2003 3:54 PM
 To: Unicode Mailing List
 Subject: Re: [hebrew] Re: Hebrew composition model, with cantillation
 marks
 
 Gads, how I wish there were a Hebrew-specific list where these
 protracted Hebrew-specific discussions could take place.
 
 -Doug Ewell
  Fullerton, California
  http://users.adelphia.net/~dewell/
 
 





list etiquette (was RE: Merging combining classes, was: New contribution N2676)

2003-11-06 Thread Peter Constable
Folks, there are people on the Unicode list who have been frustrated by
the volume of traffic on Hebrew, and for that reason a separate list was
created. All of the people currently discussing Hebrew are members of
that other list, although certain individuals have a bad habit of
sending replies back to the Unicode list. When this happens, I think it
would be a courtesy to actively steer the discussion back to the Hebrew
list by sending your replies there.

Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division



 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
 Behalf Of Peter Kirk
 Sent: Thursday, November 06, 2003 3:34 AM
 To: Jony Rosenne
 Cc: 'Philippe Verdy'; [EMAIL PROTECTED]
 Subject: Re: Merging combining classes, was: New contribution N2676
 
 On 05/11/2003 19:59, Jony Rosenne wrote:
 
 
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Philippe Verdy
 Sent: Thursday, November 06, 2003 3:46 AM
 
 Is there an initiative in Israel related to the supported
 glyphs and rendering features required to support Hebrew,
 like it exists in Europe with MES subsets, and will soon be
 developed for Chinese?
 
 
 
 
 Why would we need it? All major vendors support Hebrew quite well now.
 
 Jony
 
 
 You mean, I think, that they support the (unofficial) subset of the
 Unicode Hebrew block used in modern Hebrew, either only unpointed or
 with a limited inventory and limited combinations of points. Adequate
 for normal use in Israel, but not for biblical scholarship.
 
 --
 Peter Kirk
 [EMAIL PROTECTED] (personal)
 [EMAIL PROTECTED] (work)
 http://www.qaya.org/
 
 
 





RE: [hebrew] Re: Hebrew composition model, with cantillation marks

2003-11-06 Thread Peter Constable
 But we Hebrew experts want our proposals to be reviewed in advance by
 UTC members and others who understand the broad scope of Unicode...

There have been several such people subscribed to the Hebrew list.
Rambling verbose discussions are making some of them leave however.


Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division




Re: UTF-16 inside UTF-8

2003-11-06 Thread Markus Scherer
I would like to comment on several statements that I have seen in this thread -

- Migrating from UCS-2 to UTF-16:
  Doable, and has been done for many applications and libraries.
- Difficult to handle UTF-16?
  Use ICU - it handles all of Unicode for collation,
  regular expressions, string casing, codepage conversion,
  and many other things.
- Support for supplementary characters only for Chinese?
  Japan has defined JIS X 0213 which has characters that map to
  + supplementary characters
  as well as
  + multiple BMP characters
  (ICU 2.8 will support codepage conversion involving
   multiple characters on either side)
  CJKV ideographs, used in several languages, are driving support
  for supplementary characters.
- Case mappings can be modified to return a 32-bit Unicode
  code point instead of 16-bit BMP?
  This works, but only for simple case mappings.
  Full Unicode case mappings are defined on strings, and
  single-character APIs won't work at all.
  Full string mappings map 1:n and are context- and language-sensitive.
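Two of the points above can be sketched in Python (used here only for illustration; ICU is the library actually recommended above). The `to_surrogates` helper is hypothetical and shows how one supplementary code point becomes two UTF-16 code units. The `str.upper()` and `str.lower()` calls show why full case mapping has to work on strings: one mapping is 1:2 and the other depends on context. Python's built-in mappings are not language-sensitive, so Turkish or Lithuanian rules are out of scope for this sketch.

```python
def to_surrogates(cp: int):
    """Split a supplementary code point into a UTF-16 (high, low) pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

high, low = to_surrogates(0x20000)  # first CJK Extension B ideograph
print(f"{high:04X} {low:04X}")

# Full case mapping is defined on strings, not on single characters:
print("straße".upper())                   # sharp s maps to two letters
print("\u039f\u0394\u039f\u03a3".lower())  # word-final sigma is contextual
```

A single-character API returning one code point cannot express either of the last two results, whatever the width of its return type.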
markus

http://oss.software.ibm.com/icu/

--
Opinions expressed here may not reflect my company's positions unless otherwise noted.



Re: Merging combining classes

2003-11-06 Thread António Martins-Tuválkin
On 2003.10.30, 15:48, Jim Allan [EMAIL PROTECTED] wrote:

 I offered a suggestion on cedilla and combining undercomma:
...
 One wants to find matches for Romanian and Latvian personal names or
 place names or individual forms using cedilla or undercomma regardless
 of the language in which they are embedded.

All this cedilla vs. undercomma reminds me of something I spotted last
summer (and will have on photo ASAP): Portuguese roadsigns are usually
set in a type whose cedilla glyphs are shaped like undercommas (which
are less frequent than the connecting variant but nonetheless correct).

A large sign at the main western road access to Miranda do Douro,
Portugal's northeasternmost city, informs that if you take the road to
the left out of the next roundabout you will reach the neighboring city
Bragança...

All this quite OK, but for some weird reason the cedilla was placed
under the second a instead of under the c. Now the real challenge is
to try and encode this typo: someone learned in Portuguese would prefer

0042 0072 0061 0067 0061 0327 006E 0063 0061

but any other would never know and have it

0042 0072 0061 0067 0061 0326 006E 0063 0061

of course the same can be said about any correctly spelt word, but these
may be checked against a dictionary and corrected -- typos cannot.
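The two transcriptions above can be compared mechanically. The following Python sketch uses a hypothetical `from_hex` helper to build both strings from the code point lists given, then shows that they are distinct, that the two marks have different combining classes, and that normalization does not merge them.

```python
import unicodedata

def from_hex(seq: str) -> str:
    """Build a string from space-separated hexadecimal code points."""
    return "".join(chr(int(h, 16)) for h in seq.split())

with_cedilla = from_hex("0042 0072 0061 0067 0061 0327 006E 0063 0061")
with_comma = from_hex("0042 0072 0061 0067 0061 0326 006E 0063 0061")

print(with_cedilla == with_comma)
for mark in ("\u0327", "\u0326"):
    print(unicodedata.name(mark), unicodedata.combining(mark))

# No precomposed "a with cedilla" exists, so NFC leaves the typo as typed
nfc = unicodedata.normalize("NFC", with_cedilla)
print(nfc == with_cedilla)
```

By contrast, the correctly spelt c followed by U+0327 does compose to U+00E7 under NFC, so only the misplaced mark stays as a combining sequence.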

Anyway -- who ever decided that cedilla and undercomma are different
things? Do they have different origins? Any language / orthography using
both distinctly?...

--   .
António MARTINS-Tuválkin,   |  ()|
[EMAIL PROTECTED]   ||
R. Laureano de Oliveira, 64 r/c esq. |
PT-1885-050 MOSCAVIDE (LRS)  Não me invejo de quem tem   |
+351 934 821 700 carros, parelhas e montes   |
http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe   |
http://pagina.de/bandeiras/  a água em todas as fontes   |




Re: [hebrew] Re: Hebrew composition model, with cantillation marks

2003-11-06 Thread Dean Snyder
Philippe Verdy wrote at 10:15 PM on Wednesday, November 5, 2003:

If it's not in the written text, it is not implied by the writer.

If this were true, based on the fact that writers wrote very few of them,
we would be faced with the implication that there were very few vowels
indeed in the old Hebrew, Aramaic, Arabic, Syriac, Phoenician, Moabite,
Ammonite, and Ugaritic languages.


Respectfully,

Dean A. Snyder
Scholarly Technology Specialist
Library Digital Programs, Sheridan Libraries
Garrett Room, MSE Library, 3400 N. Charles St.
Johns Hopkins University
Baltimore, Maryland, USA 21218

office: 410 516-6850 mobile: 410 245-7168 fax: 410-516-6229
Manager, Digital Hammurabi Project: www.jhu.edu/digitalhammurabi





Re: CJK mailing list (was: Hebrew composition model, with cantillation marks)

2003-11-06 Thread Philippe Verdy
From: Rick McGowan [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, November 06, 2003 6:43 PM
Subject: Re: Hebrew composition model, with cantillation marks


 Andrew, There isn't a CJK list.
 Rick

CJK normalization at least does not cause so many problems, as ideographs
are not encoded by combining sequences, but individually (except for some
private ideographs that may be encoded with ideographic description
characters and basic radicals or strokes, but these are all base characters
with combining class 0 and are not affected by normalization).
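A short Python sketch of the claim above, with sample strings chosen as assumptions for the example: an ideographic description sequence consists entirely of class 0 characters and survives normalization unchanged, whereas a Hebrew sequence of dagesh followed by patah is reordered by canonical ordering.

```python
import unicodedata

# IDC "left to right" followed by two copies of U+6728
ids = "\u2ff0\u6728\u6728"
print([unicodedata.combining(ch) for ch in ids])
print(unicodedata.normalize("NFC", ids) == ids)
print(unicodedata.normalize("NFD", ids) == ids)

# Contrast: BET + DAGESH (class 21) + PATAH (class 17)
hebrew = "\u05d1\u05bc\u05b7"
reordered = unicodedata.normalize("NFD", hebrew)
print([f"U+{ord(ch):04X}" for ch in reordered])
```

Canonical ordering sorts adjacent non-zero combining classes in ascending order, which is why the Hebrew marks swap places while the class 0 sequence is untouched.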

So the only discussions in CJK are mostly related to unification of
repertoires from various sources (national standards bodies, research
groups and universities, librarians, dictionaries and their
publishers...)

This involves far fewer people, but it is certainly a problem when CJK can
be extended ad infinitum, without any policy, by anyone who uses the script,
constantly invents new characters in their own EUDC area, and uses them to
publish something.




Re: Merging combining classes

2003-11-06 Thread Jim Allan
António Martins-Tuválkin wrote:

Anyway -- who ever decided that cedilla and undercomma are different
things? Do they have different origins? Any language / orthography using
both distinctly?... 
I don't know whether undercomma is in origin distinct from cedilla or is 
historically an adaptation of the cedilla. I *suspect* the latter.

Even given a common origin, it is debatable whether they should now be 
considered the same or not. That is why there is a problem. It isn't cut 
and dried.

The MARC 21 and Ansel character sets distinguished the two as CEDILLA 
and LEFT HOOK (for the undercomma) though it is dubious whether the 
originators of these sets knew what this left hook was. See 
http://lcweb2.loc.gov/cocoon/codetables/45.html for current ANSEL 
specifications and 
http://www.niso.org/standards/resources/Z39-47-1993(R2002).pdf for the 1993 
table where it was notoriously given the name LEFT HOOF.

Its identity with the undercomma is asserted at 
http://www.niso.org/international/SC4/Wg1_240.pdf:


5/2 HOOK TO LEFT
In ISO 5426, this character is annotated ' used in Latvian, Romanian.' 
Because of this use, the most appropriate mapping is to U+0326 COMBINING 
COMMA BELOW (annotated as 'variant of the following' [combining cedilla] 
in the Unicode Standard).


The original ISO 8859 character sets were constructed under the 
philosophy that differences between cedilla and undercomma were only 
stylistic. The default images in those tables and in Unicode Standard 
versions 1 and 2 showed a cedilla form throughout.

However users of Latvian and Romanian insisted firmly that cedilla forms 
were not historically correct for printed material in those languages. 
It was *only* increasing use of fonts created outside of eastern Europe 
that had caused the incorrect cedilla shape to be seen, especially as 
computer technology took hold.

For Latvian (and Livonian), the problem was easily solved within  
standard character sets by font designers using the undercomma character 
beneath all letters except _c_ or _s_ .

However Romanian _s_ which traditionally had undercomma conflicted with 
Turkish _s_ with cedilla.

The result was a Romanian proposal to add uppercase and lowercase 
combined characters with undercomma for uppercase and lowercase _s_ and _t_.

See ISO/IEC JTC 1/SC 2/WG 2 N1604 (1997) at 
http://anubis.dkuug.dk/JTC1/SC2/WG2/docs/n1604.htm :


*RESOLUTION M33.24 (4 Latin characters):
_Netherland Negative._*

WG 2 accepts the following four Latin characters (requested by Romania), 
their names and shapes to be encoded in the BMP as follows:

   0218 LATIN CAPITAL LETTER S WITH COMMA BELOW

   0219 LATIN SMALL LETTER S WITH COMMA BELOW

   021A LATIN CAPITAL LETTER T WITH COMMA BELOW

   021B LATIN SMALL LETTER T WITH COMMA BELOW

in accordance with document N1361.

See resolution M33.26 for further processing.

But Romanians are still frustrated because most fonts distributed as 
part of computer operating systems or otherwise available do not support 
these characters.

ISO 8859/16 (intended as a replacement for ISO 8859/2) specifically 
designates undercomma rather than cedilla with _s_, _S_, _t_, _T_. See 
ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-16.TXT

For the Netherlands opposition see 
http://wwwold.dkuug.dk/JTC1/SC2/WG3/docs/n441.pdf .

Since there is no linguistic tradition in any language for _t_ with a 
cedilla shape beneath, most modern fonts display an undercomma beneath 
U+0162, U+0163 instead of a cedilla shape.

It is really only with _s_ that there are two conflicting usages.

There are actually three conflicting uses, since Gagauz traditionally 
uses a cedilla shape under _c_, an undercomma beneath _t_, and a symbol 
halfway between the two under _s_. See 
http://www.unicode.org/mail-arch/unicode-ml/y2002-m09/0199.html
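The cedilla and undercomma distinction for _s_ can be checked against the character data. This Python sketch (using the standard `unicodedata` module and the built-in ISO 8859 codecs; the loop labels are only for display) compares the canonical decompositions of the two letters and shows that the same byte decodes differently under ISO 8859-2 and ISO 8859-16.

```python
import unicodedata

letters = {"undercomma": "\u0219", "cedilla": "\u015f"}
for label, ch in letters.items():
    # decomposition() returns the canonical decomposition as hex code points
    print(label, unicodedata.name(ch), "=", unicodedata.decomposition(ch))

# The same byte value is s-cedilla in 8859-2 but s-comma-below in 8859-16
byte = b"\xba"
for codec in ("iso8859_2", "iso8859_16"):
    ch = byte.decode(codec)
    print(codec, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Because the two letters decompose to different combining marks, no normalization form treats them as equivalent, so matching Romanian text across both encodings needs an explicit folding step.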

Jim Allan













OT: Inuktitut dictionary?

2003-11-06 Thread Curtis Clark
A friend is looking for a vocabulary-rich English-Inuktitut dictionary 
as a source for names for malamute dogs. He is a scholar in another 
field (astrobiology), and so is concerned with accuracy. I'm sure he 
would gladly learn the syllabics to the extent necessary. He has access 
to university interlibrary loan if the best dictionary is out of print. 
And I imagine he would be fine with Inupiaq, too. Please email offlist 
if you have any suggestions. Thanks!

--
Curtis Clark  http://www.csupomona.edu/~jcclark/
Mockingbird Font Works  http://www.mockfont.com/