RE: Pronunciation of U+0429 (was RE: Digraphs as Distinct Logical Units)

2002-08-09 Thread Marco Cimarosti

David Starner wrote:
 At 11:00 AM 8/8/02 -0700, David Possin wrote:
 I have seen the German transliteration being 'schtsch' for it; English
 would be 'shtsh', with 'sh' spoken as in "sharp" in both cases. The
 German 'ch' sound is very different.
 
 Shouldn't that be 'shch' for English? I've seen that before, and it
 makes more sense.

Yes, that's the normal English romanization (Хрущёв = Khrushchev).

And the official Russian scientific transliteration is šč (Хрущёв =
Chruščëv).
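
The two romanizations quoted above can be compared mechanically. What follows is only a sketch: the two tables cover just the six letters of Хрущёв, with values read off the two spellings given in this message, and the names ENGLISH, SCIENTIFIC and romanize are made up for illustration rather than taken from any standard or library.

```python
import unicodedata

# Letter values read off the two spellings quoted above; only the six
# letters of "Хрущёв" are covered, so this is not a full romanization table.
ENGLISH = {"х": "kh", "р": "r", "у": "u", "щ": "shch", "ё": "e", "в": "v"}
SCIENTIFIC = {"х": "ch", "р": "r", "у": "u", "щ": "šč", "ё": "ë", "в": "v"}


def romanize(word, table):
    """Romanize a single Cyrillic word letter by letter (hypothetical helper)."""
    # NFC keeps ё as the single code point U+0451 so the table lookup works.
    letters = unicodedata.normalize("NFC", word).lower()
    result = "".join(table[ch] for ch in letters)
    # Restore the initial capital if the input had one.
    return result.capitalize() if word[:1].isupper() else result


print(romanize("Хрущёв", ENGLISH))     # Khrushchev
print(romanize("Хрущёв", SCIENTIFIC))  # Chruščëv
```

Either way щ comes out as a two-part sequence (shch or šč), which is the point being made about how the letter is conventionally analysed.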

This puzzles me a little bit because it seems that Russians themselves think
that the letter represents two consonants, or a glide. BTW, a Russian course
book I have at home represents the pronunciation with a sort of
Cyrillic-based IPA: letter щ is transcribed as [шч].

Perhaps the two-consonant pronunciation is a sort of Russian received
pronunciation, while the ich-Laut pronunciation could be dialectal.

_ Marco




Re: Digraphs as Distinct Logical Units

2002-08-09 Thread Andrew C. West

Doug Ewell wrote:

 And if you think that's bad, you should have seen the ones that got rejected -- 
 special emphasized Hangul for writing the names of North Korean dictators

Not so outlandish as it may first appear. When Egyptian hieroglyphs get encoded
in Unicode, I would not be surprised to see special characters for the
cartouched names of pharaohs (for pharaohs read dictators).

And in China, historically the personal names of emperors (for emperors read
dictators) have been tabooed (some dynasties, e.g. Han, Song and Qing, more
than others), meaning that if you had to write a character that happened to be
part of the emperor's personal name, then you either substituted another
character (synonym or homophone as appropriate), or wrote the character with
the last stroke omitted. This latter practice was prevalent during the Qing
dynasty (1644-1911). For example, the character hong 弘 [U+5F18] is often found
written without the final dot on the bottom right in texts dating from and
after the reign of the Qianlong emperor (r.1736-1795), whose personal name was
hongli 弘曆 [U+5F18, U+66C6].

Whilst an editorial decision may be made to transcribe all instances of the
tabooed form of 弘 [U+5F18] as 弘 [U+5F18] for a given text, because these
tabooed forms are so useful for dating purposes, textual scholars often have
to refer to the tabooed form as distinct from the canonical form (I myself
have had to do so, and have been reduced to using awkward formulae such as
"the character 弘 with a missing final stroke").

I was thinking that perhaps there might be a need for a new Unicode block - CJK
Taboo Replacement Characters, but having just looked at the chart for CJK
Unified Ideographs Extension B http://www.unicode.org/charts/PDF/U2.pdf (scary
reading for you font developers), I notice that the tabooed form of hong is
encoded at U+2239E, as is at least one other taboo-form that I checked
(U+248E5).
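
For anyone who wants to handle these code points in software, here is a minimal sketch. It assumes only what is stated in this message (U+2239E is the tabooed form of U+5F18) plus the published range of the Extension B block; the names EXT_B, TABOO_TO_CANONICAL, in_extension_b and fold_taboo_forms are hypothetical, and the one-entry folding table is an illustration, not a real data file.

```python
# CJK Unified Ideographs Extension B occupies U+20000..U+2A6DF.
EXT_B = range(0x20000, 0x2A6E0)

# Hypothetical folding table built from the one pair named in this message:
# the tabooed form U+2239E and its canonical form U+5F18.
TABOO_TO_CANONICAL = {0x2239E: 0x5F18}


def in_extension_b(code_point):
    """Return True if the code point lies in the Extension B block."""
    return code_point in EXT_B


def fold_taboo_forms(text):
    """Replace known tabooed forms with their canonical characters."""
    # str.translate accepts a dict mapping code point to code point.
    return text.translate(TABOO_TO_CANONICAL)


for cp in (0x2239E, 0x248E5, 0x5F18):
    # Supplementary-plane characters need a surrogate pair in UTF-16.
    utf16 = chr(cp).encode("utf-16-be").hex()
    print(f"U+{cp:04X} ext_b={in_extension_b(cp)} utf16={utf16}")

print(fold_taboo_forms("\U0002239E\u66C6") == "\u5F18\u66C6")  # True
```

Folding like this suits searching and indexing; a transcription meant for dating a text would keep the tabooed code point as it stands.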

Andrew West




OT: Re: Pronunciation of U+0429 (was RE: Digraphs as Distinct Logical Units)

2002-08-09 Thread Philipp Reichmuth

Hello Rick,

RC My native Russian speaker isn't available at the moment, but when she
RC pronounced U+0429 for me this morning, it sounded like a single phoneme. And
RC when I pronounced an ich-laut for her, she said it was the same sound.

Unfortunately, the latter experiment does not prove very much because
of categorical perception. Speakers of a language will always have a
certain tolerance with which they perceive phonemes. German native
speakers are an extreme case: almost everyone without phonetic
training will say that [ç] and [x] are the same *sound* (because
they're allophones of the same *phoneme*), even though they're really
different.

A similar case exists with Russian [l], for example. Because Russian
has two L-sounds ([l] and [l']), Russian [l] is usually darker and
more tense than, say, German [l]. However, when I produce a German [l]
and ask a Russian what sound it is, they will always say it's an [l],
even though their own realization of [l] is phonetically different.
And when I ask them to produce an [l] and then produce my own [l],
they will say that it's the same sound, even though it is a different
sound *phonetically* (because, of course, when asked whether A and B
are the same sound, most people answer from their *phonological*
viewpoint).

If you want to experiment, ask her to say chemistry in Russian,
listen to the first phoneme, compare it to U+0429 (they *are*
different) and then figure out which one is the ich-sound [ç].

RC The entry for U+0429 (which they write as Ш') sure looks and
RC sounds like an ich-laut to me.

Oh, the entry for [x'] sounds so, too :-) For a native speaker of a
language other than Russian, both probably sound like it. For a native
speaker of German (like myself), *both* sound *different* from High
German [ç] (or at least my own idea of how ich *should* be
articulated in High German). (However, when speakers of Ripuarian (the
dialect of German in Bonn where I live) say ich, it sounds pretty
much like my idea of U+0429, whatever that signifies...)

Ah, this is all so complicated.

  Philipp
___
Chaos reigns within / Reflect, repent, and reboot / Order shall return





Backward accent order

2002-08-09 Thread Ake Persson

The French language uses backward accent order. Is backward accent order
used in any other language?

Regards,
Åke Persson





Re: Re: Pronunciation of U+0429 (was RE: Digraphs as Distinct Logical Units)

2002-08-09 Thread Stefan Persson

- Original Message -
From: Philipp Reichmuth [EMAIL PROTECTED]
To: Rick Cameron [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Friday, August 09, 2002 12:02 PM
Subject: OT: Re: Pronunciation of U+0429 (was RE: Digraphs as Distinct
Logical Units)

 German native
 speakers are an extreme case: almost everyone without phonetic
 training will say that [ç] and [x] are the same *sound* (because
 they're allophones of the same *phoneme*), even though they're really
 different.

But [ç] and [χ] aren't very different:

[ç] is how native Swedes pronounce k before some vowels.
[χ] is how many immigrants pronounce the same letter.
[ʧ] is how it's pronounced in the Finnish dialect.

Stefan






Re: Digraphs as Distinct Logical Units

2002-08-09 Thread James Kass


Andrew C. West wrote of pharaohs and taboos.

Egyptian Hieroglyphic Encoding Proposal:
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n1637/n1637.htm


Proposal to Add IDEOGRAPHIC TABOO VARIATION INDICATOR 
to ISO/IEC 10646:
http://std.dkuug.dk/JTC1/SC2/WG2/docs/n2475.pdf

Best regards,

James Kass.






Re: OT: Re: Pronunciation of U+0429 (was RE: Digraphs as Distinct Logical Units)

2002-08-09 Thread Anatoly Vorobey

Hello Philipp,

PR Hello Rick,

RC My native Russian speaker isn't available at the moment, but when she
RC pronounced U+0429 for me this morning, it sounded like a single phoneme. And
RC when I pronounced an ich-laut for her, she said it was the same sound.

There are two ways to pronounce U+0429. One is a single consonant that
sounds like a softer version of [S] (the sh-sound), the other is very
similar to [StS].

The [StS]-variation, recorded in many foreign textbooks and other
sources, is almost, but not quite, extinct. The single-consonant
version is almost, but not quite, universal in modern Russian.

More clarifications:
- the single-consonant version [S'] is indeed one sound; it's not the
case that it's just [StS] mistakenly believed to be a single sound by
native speakers. [S'] and [StS] are different to a native ear
(but you don't hear [StS] so much anymore).
- both [StS] and [S'] are double in length; that's why in fact [S'] is
usually denoted [S':] in Russian phonetic texts. The letter U+0429
always denotes a double consonant, whether its quality is [S'] or
[StS] (the actual length is not exactly double but somewhat less than
twice the normal consonant length; that is true of all cases of
consonant doubling in Russian, however).

There are very few cases where U+0429 is pronounced as a single [S']
consonant in casual speech; e.g. in the word voobsche. This is
probably due to such words' high frequency in speech; whether it'll in
time affect the length of U+0429 in general remains to be seen.

- in any case it's a single phoneme, both in the [S'] and the [StS]
version. It contrasts meaningfully with S+tS. S+tS (which occurs
fairly often on morpheme boundaries) sounds slightly different from
U+0429 in its [StS] variant (as far as I can make out; my native
version of U+0429 is [S']).

- the [StS] variation is normally thought of as belonging to the
St.Petersburg [Leningrad] accent. St.Petersburg is where it survives
(barely) today, and it's by no means universal there today. It's
disappearing pretty rapidly. A generation ago, many actors, singers,
sometimes TV announcers used [StS]; today it's no longer considered
acceptable.

- historically, the [StS] pronunciation used to be universal in
Russian (this [StS] evolved from earlier proto-Slavic [St], IIRC; the
same letter denotes [St] in old Slavonic texts). The currently
standard [S'] variation used to be a Muscovite accent feature which
started to appear around the 15th-16th centuries. Slowly it propagated
throughout most of Russian dialects, until in the end only some
Northern dialects, including the St.Petersburg dialect, remained with
[StS]. This also helps explain why [S'] is always (well, nearly -- see
above) a double consonant, the only such consonant in Russian. It
appeared as a kind of flattening of the differences between S and tS
in [StS], both consonants coming together, in a way, and forming a
single [S':] (tS is perceived to be a single consonant sound in Russian and is
different from t+S).

- some phoneticians prefer to speak of [S'tS] in the St.Petersburg
accent and not [StS]. It's certainly true that the first consonant in
[S'tS] is softer than the standard, rather hard, Russian [S].

(I am a native speaker.)

-- 
Anatoly Vorobey,
my journal (in Russian): http://www.livejournal.com/users/avva/
[EMAIL PROTECTED] http://pobox.com/~mellon/
Angels can fly because they take themselves lightly - G.K.Chesterton





Re: OT: Re: Pronunciation of U+0429 (was RE: Digraphs as Distinct Logical Units)

2002-08-09 Thread John Cowan

Anatoly Vorobey scripsit:

 - historically, the [StS] pronunciation used to be universal in
 Russian (this [StS] evolved from earlier proto-Slavic [St], IIRC; the
 same letter denotes [St] in old Slavonic texts). 

And in modern Bulgarian as well.

-- 
John Cowan  [EMAIL PROTECTED]
http://www.ccil.org/~cowan  http://www.reutershealth.com
Charles li reis, nostre emperesdre magnes,
Set anz totz pleinz ad ested in Espagnes.




Re: Pronunciation of U+0429

2002-08-09 Thread Radovan Garabik

Rick Cameron wrote:

 Is Щ pronounced in Russian something like the ich-Laut in German?

Not at all. First, Щ is a double consonant.

 I believe this sound is represented in IPA by /ç/. In TUS 2.0 it says
 that /ɕ/ (U+0255) represents the sound spelled with ś (U+015B) in
 Polish, so perhaps these sounds are different. If so, any hints on the
 difference?

 (FWIW, I too was taught that Щ was pronounced /ʃʧ/ - but my Russian
 teacher was a Czech!

That is indeed the official pronunciation, and if you ask an (educated)
Russian speaker to slowly pronounce a word with Щ he will pronounce it as
/ʃʧ/ - but I guess it is influenced by orthography. In normal speech, this
sound is almost like /ʃː/ or /ɕː/ (definitely softer than just plain /ʃː/).

 Are there any Slavic languages that do have a letter pronounced /ʃʧ/?)

East Slovak dialects, and it is a real combination of two phonemes /ʃʧ/
there (it is usually written šč, when these dialects are written down at
all). However, in some dialects it turns into /ɕʨ/, as known from Polish.

BTW, the Ukrainian pronunciation of Щ is IMHO /ʃː/.

-- 
 ---
| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__garabik  melkor.dnp.fmph.uniba.sk |
 ---
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!




Re: German 'ich' (was: Pronunciation of U+0429)

2002-08-09 Thread David Possin

I was thinking about Hessisch too, which is Frankfurt area and the
German Bundesland Hessen. 
I think I can distinguish about 6 different dialects, each one has a
different pronunciation of 'ich'. If anybody is interested I can
organize a conference call offlist and we can listen to the various
sounds by phone. Compare it with the Berlin version ;-)

Dave
--- Otto Stolz [EMAIL PROTECTED] wrote:
 Rick Cameron wrote:
 
  At http://www.philol.msu.ru/rus/galya-1/kons/n-2.htm you can find
  audiovisual samples for the consonants of the Russian alphabet. The
  entry for U+0429 (which they write as Ш') sure looks and sounds like
  an ich-laut to me.
 
 Are you referring to the German standard pronunciation [ç],
 or have you, by any chance, heard this phoneme pronounced by
 a Hessian [ʃ]? The latter would resemble the pronunciation of
 щ much more than the former (which is normally transliterated
 into Russian as г).
 
 Best wishes,
Otto Stolz
 
 


=
Dave Possin
Globalization Consultant
www.Welocalize.com
http://groups.yahoo.com/group/locales/





Re: Tildes on vowels

2002-08-09 Thread William Overington

David Possin wrote as follows.

quote

In German it was common to use a macron over m and n to show mm and nn,
I saw it being written this way up to the 1970's. But I never saw it
used for any other double letters.

Dave

end quote

There is a very interesting document entitled The Gutenberg Press available
as a file named gbpmanual.pdf from the Walden Font website.

The website address is as follows.

http://www.waldenfont.com

The address for the file is as follows.

http://www.waldenfont.com/public/gbpmanual.pdf

On page 14 are some special characters, ligatures and abbreviations, as used
by Gutenberg.

Searching through the table is great fun so I will only mention here the
first entry in the table which shows a letter a with a horizontal line over
the top which is stated as am, an in the pdf file.

The Walden Font website also has some sample fonts showing some of the
characters in each font.  With the Gutenberg sample some of the special
characters with a horizontal line over the top are in the sample.  I managed
to find them using the Insert Symbol facility of Word 97 on a Windows 98
platform.

I have also experimented using WordPad on a Windows 98 platform and found
that I could get one of the characters by using Alt+0200.

I also managed to get that same character into WordPad on an older Windows
95 PC.
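
As a rough illustration of what Alt+0200 enters: on Windows, an Alt code typed with a leading zero selects a position in the system's ANSI code page. The sketch below assumes that code page was Windows-1252 (an assumption about the machines described, which the message does not state); alt_code_to_char is a hypothetical helper name. Which glyph a given font draws at that position is a separate matter that this code cannot show.

```python
import unicodedata


def alt_code_to_char(alt_code, codepage="cp1252"):
    """Decode the byte typed as Alt+0nnn under an assumed ANSI code page."""
    return bytes([alt_code]).decode(codepage)


ch = alt_code_to_char(200)
print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+00C8 LATIN CAPITAL LETTER E WITH GRAVE
```

So the text file stores an ordinary È; it only looks like a letter with a line over it while a font that draws it that way is selected.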

I have not referred to the line over the top as a macron as I am not sure
whether it is a macron.  I say not sure because I am learning and am not
sure in that context, not in any way because I am expressing a learned
opinion on the matter or anything like that.

The document refers to Gutenberg having 290 characters in his typeset.

However, the Walden Font font seems not to have that many characters, so
perhaps someone might like to say something about Gutenberg's character set
please.

An email correspondent recently informed me that Gutenberg used a qv
ligature.  Does anyone know please of what ligatures and abbreviations were
used by Gutenberg, if any, which are not in Walden Font font please?

I recently saw a television programme in the United Kingdom about Gutenberg
not having used a reusable matrix for typecasting but having to make a new
matrix for each casting, without the benefit of having a punch to make the
matrix.  This was discovered by really high magnification of characters in
some of Gutenberg's printing.  It appears that the type was reused on
different pages but that no two versions of the same letter on any given
page were congruently identical.

William Overington

9 August 2002















Taboo Variants (was Re: Digraphs as Distinct Logical Units )

2002-08-09 Thread Andrew C. West

James Kass wrote:

 
 Proposal to Add IDEOGRAPHIC TABOO VARIATION INDICATOR 
 to ISO/IEC 10646:
 http://std.dkuug.dk/JTC1/SC2/WG2/docs/n2475.pdf

Thanks for the reference.

There seem to be a couple of problems with this proposal as far as I can see.

1. The Ideographic Taboo Variation Indicator is proposed for inclusion in the
Kangxi Radicals block !!!

Surely they can't be serious. If they just need an empty code point, they might
as well put it at U+03A2 and be damned. Probably the CJK Symbols and
Punctuation block would be more appropriate, but that's full up now, which I
guess is why it's proposed to put the character at any old empty code point.
The original CJK Symbols and Punctuation block was always going to be too
small, and I believe that a new block is needed for extended CJK Symbols and
Punctuation (there are still a number of ideographic symbols that need
encoding, such as the two or three commonly encountered symbols that have the
same semantics as U+3005 IDEOGRAPHIC ITERATION MARK).

2. Looking at CJK Unified Ideographs Extension B, it seems that the most common
taboo variants are now already encoded in Unicode. In addition to U+2239E and
U+248E5 which I have already mentioned, the primary example of a taboo-form
variant character given in the proposal is also encoded at U+22606. The
secondary examples (where the taboo-form is used as a phonetic component in a
more complex character) could be currently coded using Ideographic Description
Characters - e.g. U+2FF0, U+2E98, U+22606 and U+2FF0, U+2EAF, U+22606. Is
there still a need for an Ideographic Taboo Variation Indicator?
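
The two sequences mentioned in point 2 can be built and inspected as follows. This is a sketch only: SEQUENCES and describe are illustrative names, the character names come from whatever Unicode data ships with the Python build in use, and such a sequence merely describes an unencoded character; whether anything renders it as a single glyph depends on the software.

```python
import unicodedata

# The two sequences given in the message, as lists of code points.
SEQUENCES = [
    [0x2FF0, 0x2E98, 0x22606],
    [0x2FF0, 0x2EAF, 0x22606],
]


def describe(code_points):
    """Return the description sequence as a string plus a readable breakdown."""
    text = "".join(chr(cp) for cp in code_points)
    # unicodedata.name raises ValueError for unnamed code points unless
    # a default is supplied, so one is passed here.
    breakdown = [
        f"U+{cp:04X} {unicodedata.name(chr(cp), '<no name>')}"
        for cp in code_points
    ]
    return text, breakdown


for seq in SEQUENCES:
    text, breakdown = describe(seq)
    print(len(text), "code points:")
    for line in breakdown:
        print("   ", line)
```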

Personally I still think that a separate CJK Taboo Replacement Characters block
would have been more logical ... but it's too late now.

By the way, when's Code2000 going to include the CJK Unified Ideographs
Extension B glyphs? There are actually a few useful characters hidden here and
there amongst the morass of junk characters.

Andrew West




Re[2]: Pronunciation of U+0429

2002-08-09 Thread Anatoly Vorobey

Hello Radovan,

RG that is indeed the official pronunciation,

No, it really isn't!

RG and if you ask an (educated) Russian
RG speaker to slowly pronounce a word with [U+0429] he will pronounce it as
RG [StS]

No, he really won't!

RG  but I guess it is influenced by orthography.

What's the orthography got to do with it??

-- 
Anatoly Vorobey,
my journal (in Russian): http://www.livejournal.com/users/avva/
[EMAIL PROTECTED] http://pobox.com/~mellon/
Angels can fly because they take themselves lightly - G.K.Chesterton





Re: Backward accent order

2002-08-09 Thread Michael \(michka\) Kaplan

AFAIK reverse diacritics are unique to French -- of course French is spoken
in a lot of different locales. ;-)
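
For readers who have not met the rule: in French dictionary order, when two words differ only in their accents, the difference nearest the end of the word decides. The sketch below is a deliberately simplified two-level sort key and not the Unicode Collation Algorithm or any library's collator; accent_key is a hypothetical name, and comparing the marks by code point is an arbitrary stand-in for real accent weights.

```python
import unicodedata


def accent_key(word, backward):
    """Two-level sort key: base letters first, then accents (simplified)."""
    base = []      # primary level: letters with combining marks removed
    accents = []   # secondary level: marks attached to each base letter
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and base:
            accents[-1] += ch
        else:
            base.append(ch.lower())
            accents.append("")
    if backward:
        # French rule: compare accent differences from the end of the word.
        accents.reverse()
    return ("".join(base), tuple(accents))


words = ["côté", "cote", "coté", "côte"]
print(sorted(words, key=lambda w: accent_key(w, backward=False)))
# ['cote', 'coté', 'côte', 'côté']
print(sorted(words, key=lambda w: accent_key(w, backward=True)))
# ['cote', 'côte', 'coté', 'côté']
```

The second ordering is the one French dictionaries use for this set of words.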

MichKa

- Original Message -
From: Ake Persson [EMAIL PROTECTED]
To: Unicode List [EMAIL PROTECTED]
Sent: Friday, August 09, 2002 3:58 AM
Subject: Backward accent order


 The French language uses backward accent order. Is backward accent order
 used in any other language?

 Regards,
 Åke Persson








RE: German 'ich' (was: Pronunciation of U+0429)

2002-08-09 Thread David Possin

I guess everybody knows that "the" has genders in German: der, die, das

Now imagine the poor American arriving in Munich and stepping on a
Bavarian's toe:
Das die der Dei-bel hol 
(I messed with the Bavarian spelling a bit to get my point across.)

I' bä a Schwob
(I learned German the first time in a tiny Swabian village near
Tübingen)
Dave
 
--- Vaintroub, Wladislav [EMAIL PROTECTED] wrote:
 Despite all the similarities in the pronunciations of Russian U+0429
 and German ich, U+0429 seems to be very hard for Germans who learn
 Russian to pronounce (the most complicated for Germans is, I think,
 U+042B, which most of them pronounce like German u).
 
 Icke,
 (a Russian living in Berlin)
 
 
 -Original Message-
 From: David Possin [mailto:[EMAIL PROTECTED]]
 Sent: Friday, August 09, 2002 2:17 PM
 To: Otto Stolz; Rick Cameron
 Cc: [EMAIL PROTECTED]
 Subject: Re: German 'ich' (was: Pronunciation of U+0429)
 
 
 I was thinking about Hessisch too, which is Frankfurt area and the
 German Bundesland Hessen. 
 I think I can distinguish about 6 different dialects, each one has a
 different pronunciation of 'ich'. If anybody is interested I can
 organize a conference call offlist and we can listen to the various
 sounds by phone. Compare it with the Berlin version ;-)
 
 Dave
 --- Otto Stolz [EMAIL PROTECTED] wrote:
  Rick Cameron wrote:
  
   At http://www.philol.msu.ru/rus/galya-1/kons/n-2.htm you can find
   audiovisual samples for the consonants of the Russian alphabet. The
   entry for U+0429 (which they write as Ш') sure looks and sounds like
   an ich-laut to me.
  
  Are you referring to the German standard pronunciation [ç],
  or have you, by any chance, heard this phoneme pronounced by
  a Hessian [ʃ]? The latter would resemble the pronunciation of
  щ much more than the former (which is normally transliterated
  into Russian as г).
  
  Best wishes,
 Otto Stolz
  
  


=
Dave Possin
Globalization Consultant
www.Welocalize.com
http://groups.yahoo.com/group/locales/





Re: Digraphs as Distinct Logical Units

2002-08-09 Thread John H. Jenkins


On Friday, August 9, 2002, at 03:54 AM, Andrew C. West wrote:

 And in China, historically the personal names of emperors (for 
 emperors read dictators) have been
 tabooed

An Ideographic Taboo Variation Indicator has been approved by the UTC 
for addition to the standard to handle precisely this kind of situation 
(see http://www.unicode.org/unicode/alloc/Pipeline.html).  It works on 
the theory that you rarely need to know the precise *form* of the taboo 
variant, just that a taboo form is being used.  There was some 
disagreement in WG2 about its utility, however, and there is the 
problem that, as you note, some taboo variants have already been 
encoded.  It's currently scheduled to be reconsidered by the UTC.

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jhjenkins/





Re: Digraphs as Distinct Logical Units

2002-08-09 Thread Doug Ewell

Andrew C. West andrewcwest at alumni dot princeton dot edu wrote:

 And if you think that's bad, you should have seen the ones that got
 rejected -- special emphasized Hangul for writing the names of
 North Korean dictators

 Not so outlandish as it may first appear. When Egyptian hieroglyphs
 get encoded in Unicode, I would not be surprised to see special
 characters for the cartouched names of pharaohs (for pharaohs read
 dictators).

 And in China, historically the personal names of emperors (for
 emperors read dictators) have been tabooed (some dynasties, e.g. Han,
 Song and Qing, more than others), meaning that if you had to write a
 character that happened to be part of the emperor's personal name,
 then you either substituted another character (synonym or homophone as
 appropriate), or wrote the character with the last stroke omitted.
 This latter practice was prevalent during the Qing dynasty (1644-1911).

The Egyptian pharaohs and Chinese emperors were generally viewed as gods
or demigods.  It's not too surprising to see the names of supreme beings
written in a special way.  In the Hebrew tradition, the name of God
(Yahweh) is written specially to avoid the appearance of blasphemy.
Mark Shoulson and Michael Everson co-wrote a draft proposal in 1998 to
encode the Tetragrammaton in Unicode:

http://std.dkuug.dk/jtc1/sc2/wg2/docs/n1740/n1740.htm

But in 2002, political leaders and heads of state are more likely to be
seen as human, rather than superhuman, at least in most cultures, and to
have their names written with the same characters as the common folk.

For the North Koreans to encode special emphasized Hangul characters
for the names of their two Great Leaders, Kim Il-sung and Kim Jong-il,
in their national standard -- going so far as to encode separate
characters for Kim and Il for each leader, though the two were
father and son -- and to propose these emphasized characters for
ISO/IEC 10646, seems extremely backward and/or extremely repressive, at
least to this Westerner.

-Doug Ewell
 Fullerton, California





Re: Tildes on vowels

2002-08-09 Thread William Overington

Stefan Persson wrote as follows, responding to Andrew C. West.

 Personally I think that markup may be more appropriate, given the
 countless possible permutations of combining/superscript letters that
 may be encountered in mediaeval texts in various languages.

 Why not just add *two* characters, either to the PUA or to Unicode?

 U+XXXX = COMBINING LETTER ABOVE INDICATOR
 U+XXXY = SUPERSCRIPT LETTER INDICATOR

 This means that U+XXXX directly followed by a is a combining a above,
 and that U+XXXY directly followed by a is a superscript a.

 This means some normalisation issues:

 U+0061 U+0363 ≡ U+0061 U+XXXX U+0061
 U+00AA ≡ U+XXXY U+0061
 etc.

 Stefan
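
To make the normalisation point concrete, the sketch below asks Python's unicodedata module what the standard already says about the two existing characters on the left-hand sides. It makes no claim about the proposed indicators, which are not part of Unicode; any equivalence involving them would have to be defined by whoever adopts them.

```python
import unicodedata

# U+0363 is an existing combining "a above"; U+00AA is an existing
# superscript-style "a" (the feminine ordinal indicator).
for ch in ("\u0363", "\u00AA"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          "ccc =", unicodedata.combining(ch),
          "decomposition =", repr(unicodedata.decomposition(ch)))

# Canonical normalization leaves both alone; only the compatibility
# forms (NFKC/NFKD) fold the superscript letter to a plain "a".
print(unicodedata.normalize("NFD", "a\u0363") == "a\u0363")   # True
print(unicodedata.normalize("NFD", "\u00AA") == "\u00AA")     # True
print(unicodedata.normalize("NFKD", "\u00AA") == "a")         # True
```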

Well, such normalisation could be as private a matter as the allocation of
the two characters to the Private Use Area.  Consider please the following
scenario, which is a scenario which I have devised in a creative writing
manner as a fictional scenario, yet which does not seem unrealistic in
relation to what might happen in practice, somewhere, sometime.  Suppose
please that someone wishes to transcribe the text of a medieval manuscript
so as to have the text stored in a computerised format.  Upon finding
various characters in the manuscript such that he or she cannot enter them
as Unicode characters, he or she might reasonably devise his or her own
encoding list, by, say, making a handwritten list (with a view to later
putting the piece of paper through a scanner to produce a graphic file) and
use that encoding list in order to make human decisions as to which
characters to key into the computer system, perhaps doing the keying with a
program such as UniPad.

The UniPad website is as follows.

http://www.unipad.org

It may be that the UniPad program could be customised so as to have a
special soft keyboard to help the transcriber in keying the codes, yet even
if that is not possible the Private Use Area codes could be entered using
the character map which UniPad provides.

In such circumstances the transcriber could decide to have a Private Use
Area encoding of the characters of the manuscript on the basis of one
Private Use Area code point for each character in the manuscript or he or
she could decide to have a system which used the two operators which you
suggest together with zero or more other operators and zero or more
individual characters depending upon the repertoire of characters which
exist in the manuscript.

Certainly there are then issues of using the data once it is in a computer
file, maybe some special program will need to be written (such as a small
Pascal program, I am not meaning some major development project to produce a
special program, just something which will do what is required for the
particular transcription project), yet for someone to use two such Private
Use Area encodings in order to facilitate the task of getting the
information content accurately from the document into the computer, it seems
a perfectly reasonable thing to do.  The transcriber might need to do the
transcribing of the original document during certain daytime hours at a
table in a secure library environment during a time frame arranged by prior
appointment and permissions.  Once the transcribed data is in the computer,
either keyed in while in the library or transcribed from notes made using a
pencil, the transcriber and other interested people throughout the world
can analyse the meaning of the text of the document almost anywhere.
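
The "small program" mentioned above might look something like the following, written here in Python rather than Pascal. Everything in it is an assumption made for illustration: U+E000 and U+E001 are arbitrary Private Use Area code points standing in for the two indicators, expand is a hypothetical function name, and the tables list only letters that already have a combining or superscript form in Unicode.

```python
# Hypothetical private agreement: these two PUA code points stand in for
# the two indicators discussed above. They are NOT assigned by Unicode
# and mean nothing outside a group that agrees on them.
COMBINING_ABOVE = "\uE000"
SUPERSCRIPT = "\uE001"

# Existing Unicode characters that already cover some of the cases:
# combining letters U+0363..U+036F and a few superscript letters.
ABOVE = dict(zip("aeioucdhmrtvx", map(chr, range(0x0363, 0x0370))))
SUPER = {"a": "\u00AA", "o": "\u00BA", "i": "\u2071", "n": "\u207F"}


def expand(text):
    """Replace indicator+letter pairs with standard characters where possible."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        table = {COMBINING_ABOVE: ABOVE, SUPERSCRIPT: SUPER}.get(ch)
        if table is not None and i + 1 < len(text) and text[i + 1] in table:
            out.append(table[text[i + 1]])
            i += 2
        else:
            # Unknown pairs are left untouched for a human to review.
            out.append(ch)
            i += 1
    return "".join(out)


print(expand("a\uE000a") == "a\u0363")  # True
print(expand("\uE001a") == "\u00AA")    # True
```

Pairs with no standard counterpart pass through unchanged, so the private codes stay visible in the output rather than being silently dropped.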

In such circumstances of some people trying to understand such documents,
maybe using the two codes within the Private Use Area together with an
ordinary TrueType font which has U+XXXX implemented so as to show a glyph of
an arrow starting by going straight upwards then going steeply diagonally
upwards in a bend dexter direction until it reaches the point of the arrow,
(as if the back half of the arrow were as in U+2191 and the front half of
the arrow were as in U+2196) and U+XXXY implemented as an arrow going
straight upwards until it reaches the point of the arrow, (similar to
U+2191) would be a way of researchers having a look at the transcribed text
of the document in a convenient manner.  I only suggest those particular
glyphs as examples in this posting, please feel free to use whatever glyph
designs you wish.

Certainly, the use of such Private Use Area codes would only have any
validity in their use amongst a group of users of the Unicode system who had
agreed to use those particular Private Use Area encodings to have those
meanings.  Yet the use of such a Private Use Area encoding could, I feel, be
very useful amongst such a group of researchers in that it would get the
document transcription job done and would have the considerable advantage
that if the transcribed file were to be displayed in a program such as
WordPad or Word that in order to be able to understand an indication of the
presence in the original document of any regular Unicode character combined
above any other regular Unicode 

RE: [unicode] Re[2]: Pronunciation of U+0429

2002-08-09 Thread Marco Cimarosti

Radovan Garabik wrote:
  RG  but I guess it is influenced by orthography.
  
  What's the orthography got to do with it??
 
 if the children in schools are taught that щ is pronounced
 as шч, they (those who are paying attention) will remember it
 and then use this pronunciation when asked to pronounce each phoneme
 of a given word.

Uh!? Are you thinking about children from ethnic minorities? Russian
children are supposed to be already able to speak Russian when they go to
school: I guess what they learn is that sound has that letter, not the
other way round.

_ Marco




Re: Tildes on vowels

2002-08-09 Thread Stefan Persson

- Original Message -
From: William Overington [EMAIL PROTECTED]
To: Stefan Persson [EMAIL PROTECTED]; Andrew C. West
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Friday, August 09, 2002 6:00 PM
Subject: Re: Tildes on vowels

 Well, why not go ahead and decide on two code points within the Private
Use
 Area as values for XXXX and XXXY, post them in this list and perhaps that
 action will lead to that facility becoming available as a facility to
 document transcribers all around the world.

There have been several messages sent to this list about why this would be
inappropriate. Just read the answers to some of your recent discussions, and
you'll understand what I mean.

Stefan






Re: Taboo Variants

2002-08-09 Thread Andrew C. West

John H. Jenkins wrote:

 
 Yes, because you do not *encode* characters using IDC's.  You describe 
 them.  This is carefully explained in the standard.

I stand corrected.

 
 Of course, using the taboo variant selector is about as vague as an 
 IDC, so it doesn't make that much difference.

My point is that if the commonly encountered taboo variants are already encoded
in CJK-B, then either the other taboo variants should also be added to CJK-B or
they could be *described* using IDCs. Adding a taboo variant selector does make
a difference, because then there'll be more than one way to reference the same
character.

On the other hand, given the lack of font support for CJK-B, perhaps a taboo
variant selector would be preferable ... now I don't know where I stand on
this!

 
 As to the proposed location, note that the byte-order mark got stuck 
 with a bunch of Arabic compatibility forms.

U+FEFF is only stuck with a bunch of Arabic compatibility forms because it's the 
little-endian of
U+FFFE, and as far as I'm aware it's not actually a BOM character, but a code point 
that is used
solely with the semantic of BOM (TR28 Section 3.9).
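The byte-order relationship between U+FEFF and U+FFFE can be checked with a small Python sketch. It assumes nothing beyond the standard codecs; the variable names are made up for the illustration.

```python
import unicodedata

bom = "\ufeff"

# Serialize U+FEFF big-endian, then read the same two bytes little-endian.
be_bytes = bom.encode("utf-16-be")
swapped = be_bytes.decode("utf-16-le")

print(be_bytes.hex())
print(f"U+{ord(swapped):04X}", unicodedata.category(swapped))
```

A reader that sees the swapped value can conclude that it guessed the byte order wrong, which is why the two code points are tied together.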

 Sometimes the odd 
 character gets stuck in an odd place; as you say, there wasn't any room 
 left in the more logical location, and this spot in the KangXi radicals 
 block was pretty much never going to be used otherwise.  Six of one, as 
 it were.
 

I simply can't accept this.

For argument's sake, what are you going to do when I publish the manuscript copy of a 
draft edition
of the Kangxi dictionary that I recently purchased in a second-hand bookstore in 
London that
includes ten supplementary radicals not found in the printed editions ?

In principle, as has been argued convincingly in another thread recently, you can 
never assume that
any unused code point will always remain vacant. The Kangxi Radical block may look as 
if it will
never change, but we shouldn't rely on that being the case.

Given that there's going to be proposals for additional CJK symbols and punctuation 
marks in the
future (if no-one else does I've got a few I'll propose), surely it would be better to 
simply create
a CJK Symbols and Punctuation B block for the proposed IDEOGRAPHIC TABOO VARIATION 
INDICATOR. It's
irrelevant that the block will only have one character to start with. It's got to be 
better than
polluting other blocks with characters that just don't belong there.

Andrew




Re: Taboo Variants

2002-08-09 Thread Andrew C. West

John H. Jenkins wrote:

 
 Of course, using the taboo variant selector is about as vague as an 
 IDC, so it doesn't make that much difference.
 

Actually, on second thoughts, why do we need a taboo variant selector when we already 
have generic
variation selectors (U+FE00 through U+FE0F) ? The Standardized Variants document
http://www.unicode.org/Public/UNIDATA/StandardizedVariants.html states :

quote
Han Variants
At this time no Han variants exist. When they do, a table will be inserted here.
/quote

Surely if there ever was a place to put taboo-form variants, this is it.
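For readers unfamiliar with the mechanism, here is a minimal Python sketch of what a variation sequence looks like as text. The pairing of U+5F18 with VARIATION SELECTOR-1 is purely HYPOTHETICAL: no such standardized variant is claimed to exist, and the names `selectors`, `base` and `sequence` are just for the example.

```python
import unicodedata

# The sixteen generic variation selectors, U+FE00 through U+FE0F.
selectors = [chr(cp) for cp in range(0xFE00, 0xFE10)]
print(unicodedata.name(selectors[0]), "...", unicodedata.name(selectors[-1]))

# HYPOTHETICAL sequence: base ideograph followed by a variation selector.
base = "\u5f18"
sequence = base + selectors[0]
print([f"U+{ord(c):04X}" for c in sequence])
```

A process that does not recognize the sequence can fall back to the base character, which is part of the appeal of this approach.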

Andrew C. West
http://uk.geocities.com/babelstone1357/




Re: [unicode] Re[2]: Pronunciation of U+0429

2002-08-09 Thread Radovan Garabik

On Fri, Aug 09, 2002 at 07:16:09PM +0200, Marco Cimarosti wrote:
 Radovan Garabik wrote:
   RG  but I guess it is influenced by orthography.
   
   What's the orthography got to do with it??
  
  if the children in schools are taught that щ is pronounced
  as шч, they (those who are paying attention) will remember it
  and then use this pronunciation when asked to pronounce each phoneme
  of a given word.
 
 Uh!? Are you thinking about children from ethnic minorities? Russian

no, I am speaking about Russians

 children are supposed to be already able to speak Russian when they go to
 school: I guess what they learn is that sound has that letter, not the
 other way round.

I have no idea how it is in the Russian school system, but:
1) they can speak a dialect
2) as was already pointed out, щ, when transcribed phonetically,
is written as шч in Russian literature.
When I was being taught Russian (5th grade, elementary school),
there was never any mention that щ can be pronounced differently
from the шч combination. Indeed, when our teacher explained Cyrillic, she
made a special effort to explain that in Russian, the шч combination
is written as щ (with some exceptions, of course, such as счастие).
Also, the Russian textbooks stated everywhere that, as far as
pronunciation is concerned, щ=шч. But again, these were textbooks
written by Slovaks, for Slovak pupils (and not particularly good ones;
e.g. until I started to read real Russian literature I had no
idea that ё is often written as е. It took me some time to get out
of this confusion :-))


-- 
 ---
| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__garabik  melkor.dnp.fmph.uniba.sk |
 ---
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!




Re: Taboo Variants

2002-08-09 Thread John H. Jenkins


On Friday, August 9, 2002, at 11:38 AM, Andrew C. West wrote:

 My point is that if the commonly encountered taboo variants are 
 already encoded in CJK-B, then
 either the other taboo variants should also be added to CJK-B or they 
 could be *described* using
 IDCs.

Encoding them was a mistake, pure and simple.  We didn't monitor the 
IRG well enough in the CJK-B encoding process, or we would have 
objected to this kind of cruft.

And describing them is a valid approach.  It depends on what's more 
important to you—the appearance (which IDS's are better at), or the 
semantic (which is explicit with the TVS).

 Adding a taboo variant selector does make a difference, because then 
 there'll be more than one
 way to reference the same character.


Well, yes and no.  Even though we've already got taboo variants 
encoded, we have no way to flag in a text that the purpose they're 
serving is taboo variants.  The interesting thing about the taboo 
variants is precisely that meaning:  This is character X written in a 
deliberately distorted way.  You identified the taboo variants you 
found in Ext B not based on anything in the standard, but because of 
your outside knowledge.  A student encountering them in a text may well 
be stymied until she goes to her professor.

Meanwhile, multiple encodings of the same Han character are *already* a 
major problem.  This is one reason why the UTC is determined to be 
stricter in the future to keep it from continuing to happen.

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jhjenkins/





Re: Taboo Variants

2002-08-09 Thread John Cowan

Andrew C. West scripsit:

 Given that there's going to be proposals for additional CJK symbols
 and punctuation marks in the future (if no-one else does I've got a few
 I'll propose), surely it would be better to simply create a CJK Symbols
 and Punctuation B block for the proposed IDEOGRAPHIC TABOO VARIATION
 INDICATOR. It's irrelevant that the block will only have one character
 to start with. It's got to be better than polluting other blocks with
 characters that just don't belong there.

Blocks exist to keep things simple for allocators (i.e. UTC and WG2), and
not to allow end-users to make deductions about them; all such deductions
are quite illegitimate.  (If this isn't actually written down anywhere,
it should be.)

ISO 10646 (but not Unicode) does have the notion of labelled collections,
which may be open (i.e. include currently unassigned codepoints) or closed.
Regrettably, I can't cite examples, as AFAIK the list of collections is
not online anywhere.

-- 
John Cowan  [EMAIL PROTECTED]
http://www.ccil.org/~cowan  http://www.reutershealth.com
Unified Gaelic in Cyrillic script!
http://groups.yahoo.com/group/Celticonlang




Re: Taboo Variants

2002-08-09 Thread Andrew C. West

John Cowan wrote:

 
 Blocks exist to keep things simple for allocators (i.e. UTC and WG2), and
 not to allow end-users to make deductions about them; all such deductions
 are quite illegitimate.  (If this isn't actually written down anywhere,
 it should be.)

Surely assigning a character to a block with other like-minded characters IS keeping 
things simple
for allocators, and randomly assigning miscellaneous characters all over the place 
makes it as
confusing to allocators as to end-users. Surely that's the whole point of having 
designated block
names.

It sounds to me that what you're suggesting is that characters should be allocated 
sequentially from
U+ up, with no gaps. Would that not be the most simple solution for allocators !?

After all, as long as the end-user sees the glyph that they're expecting, they don't 
care what code
point it's mapped to (indeed, as you imply, code points should be invisible to the 
end-user).

Andrew C. West
http://uk.geocities.com/babelstone1357/




Re: Taboo Variants

2002-08-09 Thread David Starner

At 10:54 AM 8/9/02 -0700, Andrew C. West wrote:
Actually, on second thoughts, why do we need a taboo variant selector when 
we already have generic
variation selectors (U+FE00 through U+FE0F) ? The Standardized Variants 
document
http://www.unicode.org/Public/UNIDATA/StandardizedVariants.html states :

quote
Han Variants
At this time no Han variants exist. When they do, a table will be inserted 
here.
/quote

Surely if there ever was a place to put taboo-form variants, this is it.

The difference being that the table matches a certain number of ideographs
with specific variants, where the taboo variant selector potentially matches
any ideograph with an (unspecified) taboo variant.






Re[2]: [unicode] Re[2]: Pronunciation of U+0429

2002-08-09 Thread Anatoly Vorobey

Hello Radovan,

 RG that is indeed the official pronunciation,
 
 No, it really isn't!

RG not even if you ask your fellow innocent russian speakers
RG please read for me this word  v e r y   s l o w l y 
RG and listen carefully?

No, it isn't.

The [StS] pronunciation has been considered a dialect pronunciation for 50
years now. The official, standard pronunciation is [S'], and has been
for a long time.

RG We were certainly taught to pronounce [U+0429] as [StS] (soft [tS] before soft
RG vowels, of course),

[tS] is _always_ soft in Russian.

 RG  but I guess it is influenced by orthography.
 
 What's the orthography got to do with it??

RG if the children in schools are taught that [U+0429] is pronounced
RG as [StS],

Trust me, they aren't.

-- 
Anatoly Vorobey,
my journal (in Russian): http://www.livejournal.com/users/avva/
[EMAIL PROTECTED] http://pobox.com/~mellon/
Angels can fly because they take themselves lightly - G.K.Chesterton





Re: Taboo Variants

2002-08-09 Thread Kenneth Whistler

Lest everyone go scrabbling off the deep end and drown on
this particular thread, I would like to point out the following
facts:

U+2FDF IDEOGRAPHIC TABOO VARIATION INDICATOR

was accepted by the UTC on April 30, 2002. However, when the
proposal was taken into WG2 it met a wall of opposition led
by China. WG2 did *NOT* accept the character, and it is not
a part of the FPDAM 2 currently being ballotted for inclusion
in 10646.

The UTC will have to deal with this mismatch (along with a number
of others) in its upcoming meeting this month.

China's clear preference is to simply encode all the taboo
variants as separate characters. At the WG2 meeting, they
pointed out a number of instances already encoded in Extension B,
as you have. And with China not wanting an IDEOGRAPHIC TABOO
VARIATION INDICATOR encoded, many other members of WG2 will
defer to their opinion on the topic.

This issue clearly needs to be worked further in the IRG context
before a consensus will emerge.

At any rate, don't consider it a done deal. What
matters is what eventually gets published in the final, approved
Amendment 2 for ISO/IEC 10646, which *will* match what we
publish in Unicode 4.0.

--Ken





Re: Taboo Variants

2002-08-09 Thread John Cowan

Andrew C. West scripsit:

 It sounds to me that what you're suggesting is that characters should be allocated 
sequentially from
 U+ up, with no gaps. Would that not be the most simple solution for allocators !?

Only if they acted sequentially, which they did not and do not.   Different
scripts are being worked on simultaneously, and without block allocation
it would be impossible to keep them from stepping on each others' code
points.  But once the job is done, the notion of blocks is dispensable.

-- 
John Cowan  [EMAIL PROTECTED]  www.ccil.org/~cowan  www.reutershealth.com
In computer science, we stand on each other's feet.
--Brian K. Reid




Re: Pronunciation of U+0429

2002-08-09 Thread Philipp Reichmuth

JC so unnatural to peoples with more phonemic orthographies.

Russian orthography is pretty *phonemic*, excluding historic forms such
as the -ogo genitive or the soft sign with the 2nd person singular of the
verb. Most accent-counting languages tend to reduce sounds rather
heavily in nonstressed syllables, however, and in those cases a
phonemic orthography doesn't help a lot.

  Philipp  mailto:[EMAIL PROTECTED]
___
Chaos reigns within / Reflect, repent, and reboot / Order shall return





Re: Pronunciation of U+0429

2002-08-09 Thread John Cowan

Philipp Reichmuth scripsit:

 Russian orthography is pretty *phonemic*, excluding historic forms such
 as the -ogo genitive or the soft sign with the 2nd person singular of the
 verb. Most accent-counting languages tend to reduce sounds rather
 heavily in nonstressed syllables, however, and in those cases a
 phonemic orthography doesn't help a lot.

I take it to be rather morphophonemic, much like German orthography.

-- 
John Cowan   [EMAIL PROTECTED]   http://www.reutershealth.com
Mr. Lane, if you ever wish anything that I can do all you will have to do
will be to send me a telegram asking and it will be done.
Mr. Hearst, if you ever get a telegram from me asking
you to do anything you can put the telegram down as a forgery.




OT Laugh for the day - I liked the title of this security related article

2002-08-09 Thread Barry Caplan

and the first few sentences as well


Barry Caplan
www.i18n.com

http://www.securitymanagement.com/library/000599.html


How to Keep Out Bad Characters

By DeQuendre Neeley

The business world is one of constant motion. But it is not just people who are on the 
move. It is also information. Businesses today depend on the efficient exchange of 
information, for which they rely increasingly on the Internet and other computer 
networks. Unfortunately, in the digital world, as in its physical counterpart, bad 
characters will sometimes try to slip in with the good. 





Re: Is U+0140 (l with middle dot) ever used?

2002-08-09 Thread Anto'nio Martins-Tuva'lkin

I asked my Catalan contacts about this issue; something like
_

IMO, in Catalan [L][·][L] is preferred to [L·][L] because L-dot is not
really a separate letter, like Spanish ñ, but simply a separator, just
like an ordinary -.

Actually, AFAIK, in Catalan typography if one needs to compose with
exaggerated letter spacing, middle dot is dealt with as a separate
symbol, and thus paral·lel looks like

   P   A   R   A   L   ·   L   E   L

and not

 P   A   R   A   L·   L   E   L
_

I just received an answer about this issue. Translated below:
_

 From: Hèctor Alós i Font [EMAIL PROTECTED]
 Date: Thu, 08 Aug 2002 08:17:18 +0200
 Subject: Re: [esperpentu] Fwd: Is U+0140 (l with middle dot) ever
  used?

 Vi pravas: temas pri memstara signo, ne alglulajxo al antauxa lo.
 Nuntempe en la hispaniaj klavaroj (legxo Majó), temas pri memstara
 signo tajpita per Maj+3. Mi memoras tamen malnovajn tajpilojn kun
 aparta klavo l+mezpunkto.

You're right: it's a standalone symbol, not an addition to the previous
L. On current Spanish keyboards (Majó law), it's a separate symbol
located at Shift+3. But I remember older typewriters with a separate
key for L + middle dot.

 Principe temas pri mezalta punkto, sed estas homoj uzantaj normalan
 punkton: ekzemple la kataluna eldono de El Periódico ( 
 http://www.elperiodico.com/EDICION/portada.htm?l=CAT ). Persone mi 
 konsideras tion suficxe malbela - kvankam estas vere, ke tio apenaux 
 konfuzas: tuj sekve, sen spaco, estas minuskla litero, malkiel okazas
 kun la vera punkto.

In principle it is a dot at mid-line height, but some people use a normal
period: e.g. the Catalan edition of the newspaper El Periódico (
http://www.elperiodico.com/EDICION/portada.htm?l=CAT ). Personally I
find it rather ugly -- though it's true that this practice is hardly
ambiguous: right after the period, with no space, there's a lowercase
letter, unlike what happens with a real period.

 Gxi estas uzata ankaux katalune kiel apartigilo ekz-e en kelkaj fakaj
 eldonoj de mezepokaj tekstoj: se mi bone komprenas, tiel oni indikas, ke 
 en la originalo estis unu sola vorto, sed nuntempe oni skribus dise.

In Catalan it is also used as a separator, e.g. in scholarly editions of
medieval texts: IIUC, it marks that something written as a single word
in the original would nowadays be written separately.

 Mi rimarkis gxian uzon ankaux en la okcitana (Zamen·hof), sed mi tre
 dubas, ke tio estas norma uzo - simple kataluna influo. Eble 
 portugallingvanoj povus imiti :)

I noted the use of middle dot also in Occitan (Zamen·hof) [thus
distinguishing a foreign nh, here Polish, from the Occitan digraph
nh], but I strongly doubt that this is normative -- it's probably just
some Catalan influence. Maybe Portuguese speakers could do the same :)
[nh also occurs in Portuguese].

 Kaj jes gxi estas efektive cxiutage uzata: amaseto da vortoj gxin
 enhavas, kvankam la barcelona (nenorma) prononco ne distingas inter l
 kaj l·l - sed jes duobligas suficxe multajn aliajn konsononantojn.

And, yes, L + middle dot + L is indeed used every day: a smallish number
of Catalan words contain it, even if the Barcelona [non-normative]
pronunciation doesn't distinguish between L and L·L, though it does
double quite a few other consonants.
_

So, unless it is (or becomes) used in any other language, U+0140 seems
about to disappear from actual usage, with or without any official
deprecation.

As for the use of a normal period mentioned above, it suffers from the
known problems of having a punctuation sign used as a letter symbol (word
division, word count, alphabetical sorting etc.).
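The word-division problem can be seen in a small Python sketch. It only shows how Python's `\w` regex class behaves, which is an assumption standing in for "some word-splitting tool"; real segmenters and other regex engines may differ.

```python
import re

candidates = {
    "U+0140": "paral\u0140lel",  # precomposed l with middle dot
    "U+00B7": "paral\u00b7lel",  # l + MIDDLE DOT + l
    "U+002E": "paral.lel",       # l + FULL STOP + l
}

# Split each spelling into "words" the way a naive regex tokenizer would.
tokens = {label: re.findall(r"\w+", word) for label, word in candidates.items()}
for label, parts in tokens.items():
    print(label, parts)
```

Under this naive tokenizer only the precomposed spelling survives as one word. Both the middle dot and the period count as punctuation, so either one cuts the word in two.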

Hm. But middle dot is not only a letter symbol. It's also used as a
bullet, a tab filling, even a box-drawing char. Shouldn't Unicode
provide a way to separate this duality?

--   .
António MARTINS-Tuválkin,   |  ()|
[EMAIL PROTECTED]   ||
R. Laureano de Oliveira, 64 r/c esq. |
PT-1885-050 MOSCAVIDE (LRS)  Não me invejo de quem tem   |
+351 917 511 549 carros, parelhas e montes   |
http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe   |
http://pagina.de/bandeiras/  a água em todas as fontes   |





Re: Taboo Variants

2002-08-09 Thread David Hopwood

-BEGIN PGP SIGNED MESSAGE-

Andrew C. West wrote:
[re: proposed IDEOGRAPHIC TABOO VARIATION INDICATOR]
 
 Given that there's going to be proposals for additional CJK symbols and
 punctuation marks in the future (if no-one else does I've got a few I'll
 propose), surely it would be better to simply create a CJK Symbols and
 Punctuation B block for the proposed IDEOGRAPHIC TABOO VARIATION INDICATOR.
 It's irrelevant that the block will only have one character to start with.
 It's got to be better than polluting other blocks with characters that just
 don't belong there.

There's an unassigned block right next to the other ideographic variation
selectors, at U+FE10..U+FE1F. *If* there are going to be variation selectors
for particular semantics, I would have thought that's the obvious place to
encode them.

However, it doesn't make much sense to me to suddenly change from encoding
variants using separate code points, to encoding them using variation
selectors. Arguably variation selectors would have been the better approach
if they had been used from the start (in particular, there would have been
no need for any of the compatibility ideographs). However, requiring
implementations to handle lots of separately encoded variant characters
*and* variation selectors, is the worst of both worlds IMHO.

- -- 
David Hopwood [EMAIL PROTECTED]

Home page  PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip


-BEGIN PGP SIGNATURE-
Version: 2.6.3i
Charset: noconv

iQEVAwUBPVQm/DkCAxeYt5gVAQGD7Qf6Ai+Zxx+M9T+1cZt8J8+QF4iHdh1Ga7k0
gU+L/8YU7smq66s56y2y+chWMQr5LJvgfO1C3Z43dKlSfZ2acZBIYRIuISkHhVWl
wmawQ9kXenmKHMX2NB3abvlzuYXyZ7F2L12DoKnIapilfUeZtyjNKGM7njmqCEoo
JoUaMXOJrqLggI0FuYfn4sXMdsJXhUZkwouaG4i4qg/+UQ9yH5t4uWMc8a1vZbrq
TjOUllqPJ/fHqip7r13DFcCA3qIjq8jyJgyY7n6VOpSL6yBoBlaYiGKj1pMC84YC
3WpSF74JbDuYVMg9mOSRdUQgb5UiOr+7JsF4MSa1izTOpJCNi96HZg==
=KWkE
-END PGP SIGNATURE-




Re[2]: Pronunciation of U+0429

2002-08-09 Thread Anatoly Vorobey

Hello John,

 Russian orthography is pretty *phonemic*, excluding historic forms such
 as the -ogo genitive or the soft sign with the 2nd person singular of the
 verb. Most accent-counting languages tend to reduce sounds rather
 heavily in nonstressed syllables, however, and in those cases a
 phonemic orthography doesn't help a lot.

JC I take it to be rather morphophonemic, much like German orthography.

Yep. Russian phoneticians usually call phonemes what Western
phoneticians call morphophonemes, so they have no problem with calling
Russian orthography _phonemic_.

-- 
Anatoly Vorobey,
my journal (in Russian): http://www.livejournal.com/users/avva/
[EMAIL PROTECTED] http://pobox.com/~mellon/
Angels can fly because they take themselves lightly - G.K.Chesterton





Re: Is U+0140 (l with middle dot) ever used?

2002-08-09 Thread Doug Ewell

Anto'nio Martins-Tuva'lkin antonio at tuvalkin dot web dot pt wrote:

 Hm. But middle dot is not only a letter symbol. It's also used as a
 bullet, a tab filling, even a box-drawing char. Shouldn't Unicode
 provide a way to separate this duality?

It should, and does.  Unicode has plenty of bullet operators, hyphen
bullets, dot operators, little black circles and squares and triangles,
all kinds of stuff to fill these various typographical needs.  The only
question is whether people will actually use these new goodies, or
continue to settle for whatever their keyboard and favorite 8-bit code
page gave them.
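A few of the dedicated characters alluded to here can be listed with a short Python sketch. The selection of code points is just a sample chosen for illustration, not an exhaustive or recommended set.

```python
import unicodedata

# A sample of dot-like characters with distinct intended uses.
sample = (0x00B7, 0x2022, 0x2043, 0x2219, 0x22C5, 0x25CF)
names = {cp: unicodedata.name(chr(cp)) for cp in sample}

for cp, name in names.items():
    print(f"U+{cp:04X}", unicodedata.category(chr(cp)), name)
```

The names make the intended division of labour fairly plain, even where the glyphs look almost identical in a given font.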

-Doug Ewell
 Fullerton, California





Re: Digraphs as Distinct Logical Units

2002-08-09 Thread Doug Ewell

Philipp Reichmuth uzsv2k at uni dash bonn dot de wrote:

 What about round-trip compatibility?

UTC and WG2 apparently decided that some degree of compatibility with
this relatively new (1997) DPRK standard could be sacrificed.  The
horizontal-bar fractions can be mapped to the existing Unicode
fractions, and the only thing lost in round-tripping is the exact glyph
shape.  Likewise the emphasized name syllables; the only loss of
information is the emphasis, not the plain-text identity of the
syllables.

-Doug Ewell
 Fullerton, California





Re: Taboo Variants

2002-08-09 Thread Doug Ewell

John Cowan jcowan at reutershealth dot com wrote:

 ISO 10646 (but not Unicode) does have the notion of labelled
 collections, which may be open (i.e. include currently unassigned
 codepoints) or closed.  Regrettably, I can't cite examples, as AFAIK
 the list of collections is not online anywhere.

http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2499.pdf
pages 37 through 46.

-Doug Ewell
 Fullerton, California






Re: Digraphs as Distinct Logical Units

2002-08-09 Thread Roozbeh Pournader

On Fri, 9 Aug 2002, Doug Ewell wrote:

 Re: Mixed up priorities
 From: Michael Everson
 Date: Sun Oct 24 1999 - 06:34:24 EDT
 [...]
 
 (I just love that name, don't you?  I could say it all day, if only I
 knew how.  !Xóõ !Xóõ !Xóõ.)
 
 -Doug Ewell
  Fullerton, California

which makes one wonder if the above comment is a quote or yours.

roozbeh





Re: Digraphs as Distinct Logical Units

2002-08-09 Thread Doug Ewell

Roozbeh Pournader roozbeh at sharif dot edu wrote:

 Was there anything decided about using variant selectors for selecting
 exact shapes?

StandardizedVariants.html doesn't list anything for vulgar fractions.  I
assume they decided the distinction wasn't worth making.

-Doug Ewell
 Fullerton, California