RE: Pronunciation of U+0429 (was RE: Digraphs as Distinct Logical Units)
David Possin wrote (at 11:00 AM 8/8/02 -0700):
DP I have seen the German transliteration being 'schtsch' for it, English would be 'shtsh' with 'sh' spoken like sharp in both cases. The German 'ch' sound is very different.
David Starner wrote:
DS Shouldn't that be 'shch' for English? I've seen that before, and it makes more sense.
Yes, that's the normal English romanization (Хрущёв = Khrushchev). And the official Russian scientific transliteration is šč (Хрущёв = Chruščëv). This puzzles me a little bit, because it seems that Russians themselves think that the letter represents two consonants, or a glide. BTW, a Russian course book I have at home represents the pronunciation with a sort of Cyrillic-based IPA: the letter щ is transcribed as [шч]. Perhaps the two-consonant version is a sort of Russian received pronunciation, while the ich-Laut pronunciation could be dialectal.
Marco
Re: Digraphs as Distinct Logical Units
Doug Ewell wrote:
DE And if you think that's bad, you should have seen the ones that got rejected -- special emphasized Hangul for writing the names of North Korean dictators
Not so outlandish as it may first appear. When Egyptian hieroglyphs get encoded in Unicode, I would not be surprised to see special characters for the cartouched names of pharaohs (for pharaohs read dictators). And in China, historically the personal names of emperors (for emperors read dictators) have been tabooed (some dynasties, e.g. Han, Song and Qing, more than others), meaning that if you had to write a character that happened to be part of the emperor's personal name, then you either substituted another character (synonym or homophone as appropriate), or wrote the character with the last stroke omitted. This latter practice was prevalent during the Qing dynasty (1644-1911). For example, the character hong 弘 [U+5F18] is often found written without the final dot on the bottom right in texts dating from and after the reign of the Qianlong emperor (r.1736-1795), whose personal name was hongli 弘曆 [U+5F18, U+66C6].
Whilst an editorial decision may be made to transcribe all instances of the tabooed form of 弘 [U+5F18] as 弘 [U+5F18] for a given text, these tabooed forms are so useful for dating purposes that textual scholars often have to refer to the tabooed form as distinct from the canonical form (I myself have had to do so, and have been reduced to using awkward formulae such as "the character 弘 with a missing final stroke").
I was thinking that perhaps there might be a need for a new Unicode block - CJK Taboo Replacement Characters, but having just looked at the chart for CJK Unified Ideographs Extension B http://www.unicode.org/charts/PDF/U2.pdf (scary reading for you font developers), I notice that the tabooed form of hong is encoded at U+2239E, as is at least one other taboo-form that I checked (U+248E5).
Andrew West
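A minimal Python 3 sketch of where the code points mentioned above live: U+5F18 sits in the Basic Multilingual Plane, while U+2239E and U+248E5 are in plane 2 (where Extension B is) and therefore need a surrogate pair in UTF-16. The helper name `describe` is made up for this illustration and comes from nowhere in the thread.

```python
def describe(cp: int) -> dict:
    """Return the plane number and the UTF-16 code units of a code point."""
    utf16 = chr(cp).encode("utf-16-be")
    # Split the big-endian byte string into 16-bit code units, shown as hex.
    units = [utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)]
    return {"plane": cp >> 16, "utf16_units": units}


# The canonical form of hong and the two taboo forms cited in the message.
for cp in (0x5F18, 0x2239E, 0x248E5):
    print(f"U+{cp:04X}", describe(cp))
```

Any font or program that assumes one 16-bit unit per character will stumble over the last two, which is presumably part of what makes the Extension B chart scary reading.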
OT: Re: Pronunciation of U+0429 (was RE: Digraphs as Distinct Logical Units)
Hello Rick,
RC My native Russian speaker isn't available at the moment, but when she
RC pronounced U+0429 for me this morning, it sounded like a single phoneme. And
RC when I pronounced an ich-laut for her, she said it was the same sound.
Unfortunately, the latter experiment does not prove very much because of categorical perception. A speaker of a language will always have a certain tolerance with which they perceive phonemes. German native speakers are an extreme case: almost everyone without phonetic training will say that [ç] and [x] are the same *sound* (because they're allophones of the same *phoneme*), even though they're really different.
A similar case exists with Russian [l], for example. Because Russian has two L-sounds ([l] and [l']), Russian [l] is usually darker and more tense than, say, German [l]. However, when I produce a German [l] and ask a Russian what sound it is, they will always say it's an [l], even though their own realization of [l] is phonetically different. And when I ask them to produce an [l] and then produce my own [l], they will say that it's the same sound, even though it is a different sound *phonetically* (because, of course, when asked whether A and B are the same sound, most people answer from their *phonological* viewpoint).
If you want to experiment, ask her to say "chemistry" in Russian, listen to the first phoneme, compare it to U+0429 (they *are* different) and then figure out which one is the ich-sound [ç].
RC The entry for U+0429 (which they write as Ш') sure looks and
RC sounds like an ich-laut to me.
Oh, the entry for [x'] sounds so, too :-) For a native speaker of a language other than Russian, both probably sound like it. For a native speaker of German (like myself), *both* sound *different* from High German [ç] (or at least my own idea of how "ich" *should* be articulated in High German).
(However, when speakers of Ripuarian (the dialect of German in Bonn where I live) say "ich", it sounds pretty much like my idea of U+0429, whatever that signifies...)
Ah, this is all so complicated.
Philipp
Chaos reigns within / Reflect, repent, and reboot / Order shall return
Backward accent order
The French language uses backward accent order. Is backward accent order used in any other language? Regards, Åke Persson
Re: Re: Pronunciation of U+0429 (was RE: Digraphs as Distinct Logical Units)
Philipp Reichmuth wrote (Friday, August 09, 2002 12:02 PM):
PR German native speakers are an extreme case: almost everyone without phonetic training will say that [ç] and [x] are the same *sound* (because they're allophones of the same *phoneme*), even though they're really different.
But [ç] and [Ï] aren't very different: [ç] is how native Swedes pronounce k before some vowels. [Ï] is how many immigrants pronounce the same letter. [ʧ] is how it's pronounced in the Finnish dialect.
Stefan
Re: Digraphs as Distinct Logical Units
Andrew C. West wrote of pharaohs and taboos.
Egyptian Hieroglyphic Encoding Proposal: http://std.dkuug.dk/jtc1/sc2/wg2/docs/n1637/n1637.htm
Proposal to Add IDEOGRAPHIC TABOO VARIATION INDICATOR to ISO/IEC 10646: http://std.dkuug.dk/JTC1/SC2/WG2/docs/n2475.pdf
Best regards, James Kass.
Re: OT: Re: Pronunciation of U+0429 (was RE: Digraphs as Distinct Logical Units)
Hello Philipp,
PR Hello Rick,
RC My native Russian speaker isn't available at the moment, but when she
RC pronounced U+0429 for me this morning, it sounded like a single phoneme. And
RC when I pronounced an ich-laut for her, she said it was the same sound.
There are two ways to pronounce U+0429. One is a single consonant that sounds like a softer version of [S] (the sh-sound), the other is very similar to [StS]. The [StS]-variation, recorded in many foreign textbooks and other sources, is almost, but not quite, extinct. The single-consonant version is almost, but not quite, universal in modern Russian.
More clarifications:
- the single-consonant version [S'] is indeed one sound; it's not the case that it's just [StS] mistakenly believed to be a single sound by native speakers. [S'] and [StS] are different to a native ear (but you don't hear [StS] so much anymore).
- both [StS] and [S'] are double in length; that's why in fact [S'] is usually denoted [S':] in Russian phonetic texts. The letter U+0429 always denotes a double consonant, whether its quality is [S'] or [StS] (the actual length is not exactly double but somewhat less than twice the normal consonant length; that is true of all cases of consonant doubling in Russian, however). There are very few cases where U+0429 is pronounced as a single [S'] consonant in casual speech, e.g. in the word voobsche. This is probably due to such words' high frequency in speech; whether it'll in time affect the length of U+0429 in general remains to be seen.
- in any case it's a single phoneme, both in the [S'] and the [StS] version. It contrasts meaningfully with S+tS. S+tS (which occurs fairly often on morpheme boundaries) sounds slightly different from U+0429 in its [StS] variant (as far as I can make out; my native version of U+0429 is [S']).
- the [StS] variation is normally thought of as belonging to the St.Petersburg [Leningrad] accent. St.Petersburg is where it survives (barely) today, and it's by no means universal there. It's disappearing pretty rapidly. A generation ago, many actors, singers, sometimes TV announcers used [StS]; today it's no longer considered acceptable.
- historically, the [StS] pronunciation used to be universal in Russian (this [StS] evolved from earlier proto-Slavic [St], IIRC; the same letter denotes [St] in old Slavonic texts). The currently standard [S'] variation used to be a Muscovite accent feature which started to appear around the 15th-16th centuries. Slowly it propagated throughout most Russian dialects, until in the end only some Northern dialects, including the St.Petersburg dialect, remained with [StS]. This also helps explain why [S'] is always (well, nearly -- see above) a double consonant, the only such consonant in Russian. It appeared as a kind of flattening of the differences between S and tS in [StS], both consonants coming together, in a way, and forming a single [S':] (tS is perceived to be a single consonant sound in Russian and is different from t+S).
- some phoneticians prefer to speak of [S'tS] in the St.Petersburg accent and not [StS]. It's certainly true that the first consonant in [S'tS] is softer than the standard, rather hard, Russian [S].
(I am a native speaker.)
-- Anatoly Vorobey, my journal (in Russian): http://www.livejournal.com/users/avva/ [EMAIL PROTECTED] http://pobox.com/~mellon/ Angels can fly because they take themselves lightly - G.K.Chesterton
Re: OT: Re: Pronunciation of U+0429 (was RE: Digraphs as Distinct Logical Units)
Anatoly Vorobey scripsit: - historically, the [StS] pronunciation used to be universal in Russian (this [StS] evolved from earlier proto-Slavic [St], IIRC; the same letter denotes [St] in old Slavonic texts). And in modern Bulgarian as well. -- John Cowan [EMAIL PROTECTED] http://www.ccil.org/~cowan http://www.reutershealth.com Charles li reis, nostre emperesdre magnes, Set anz totz pleinz ad ested in Espagnes.
Re: Pronunciation of U+0429
Rick Cameron wrote:
RC Is Щ pronounced in Russian something like the ich-Laut in German? I
not at all. first, Щ is a double consonant
RC believe this sound is represented in IPA by /ç/. In TUS 2.0 it says that /ɕ/ (U+0255) represents the sound spelled with ś (U+015B) in Polish, so perhaps these sounds are different. If so, any hints on the difference? (FWIW, I too was taught that Щ was pronounced /ʃʧ/ - but my
that is indeed the official pronunciation, and if you ask an (educated) Russian speaker to slowly pronounce a word with Щ he will pronounce it as /ʃʧ/ - but I guess it is influenced by orthography. In normal speech, this sound is almost like /ʃː/ or /ɕː/ (definitely softer than just plain /ʃː/)
RC Russian teacher was a Czech! Are there any Slavic languages that do have a letter pronounced /ʃʧ/?)
east slovak dialects, and it is a real combination of two phonemes /ʃʧ/ there (and it is usually written šč, when these dialects are written down at all). however, in some dialects it turns into /ɕʨ/ as known from polish
btw ukrainian pronunciation of Щ is IMHO /ʃː/
-- Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ garabik melkor.dnp.fmph.uniba.sk
Antivirus alert: file .signature infected by signature virus. Hi! I'm a signature virus! Copy me into your signature file to help me spread!
Re: German 'ich' (was: Pronunciation of U+0429)
I was thinking about Hessisch too, which is spoken in the Frankfurt area and the German Bundesland Hessen. I think I can distinguish about 6 different dialects, each of which has a different pronunciation of 'ich'. If anybody is interested I can organize a conference call offlist and we can listen to the various sounds by phone. Compare it with the Berlin version ;-)
Dave
--- Otto Stolz [EMAIL PROTECTED] wrote:
Rick Cameron wrote:
RC At http://www.philol.msu.ru/rus/galya-1/kons/n-2.htm you can find audiovisual samples for the consonants of the Russian alphabet. The entry for U+0429 (which they write as Ш') sure looks and sounds like an ich-laut to me.
OS Are you referring to the German standard pronunciation [ç], or have you, by any chance, heard this phoneme pronounced by a Hessian [ʃ]? The latter would resemble the pronunciation of щ much more than the former (which is normally transliterated into Russian as г).
OS Best wishes, Otto Stolz
Dave Possin, Globalization Consultant, www.Welocalize.com http://groups.yahoo.com/group/locales/
Re: Tildes on vowels
David Possin wrote as follows. quote In German it was common to use a macron over m and n to show mm and nn, I saw it being written this way up to the 1970's. But I never saw it used for any other double letters. Dave end quote There is a very interesting document entitled The Gutenberg Press available as a file named gbpmanual.pdf from the Walden Font website. The website address is as follows. http://www.waldenfont.com The address for the file is as follows. http://www.waldenfont.com/public/gbpmanual.pdf On page 14 are some special characters, ligatures and abbreviations, as used by Gutenberg. Searching through the table is great fun so I will only mention here the first entry in the table which shows a letter a with a horizontal line over the top which is stated as am, an in the pdf file. The Walden Font website also has some sample fonts showing some of the characters in each font. With the Gutenberg sample some of the special characters with a horizontal line over the top are in the sample. I managed to find them using the Insert Symbol facility of Word 97 on a Windows 98 platform. I have also experimented using WordPad on a Windows 98 platform and found that I could get one of the characters by using Alt+0200. I also managed to get that same character into WordPad on an older Windows 95 PC. I have not referred to the line over the top as a macron as I am not sure whether it is a macron. I say not sure because I am learning and am not sure in that context, not in any way because I am expressing a learned opinion on the matter or anything like that. The document refers to Gutenberg having 290 characters in his typeset. However, the Walden Font font seems not to have that many characters, so perhaps someone might like to say something about Gutenberg's character set please. An email correspondent recently informed me that Gutenberg used a qv ligature. 
Does anyone know please of what ligatures and abbreviations were used by Gutenberg, if any, which are not in Walden Font font please? I recently saw a television programme in the United Kingdom about Gutenberg not having used a reusable matrix for typecasting but having to make a new matrix for each casting, without the benefit of having a punch to make the matrix. This was discovered by really high magnification of characters in some of Gutenberg's printing. It appears that the type was reused on different pages but that no two versions of the same letter on any given page were congruently identical. William Overington 9 August 2002
Taboo Variants (was Re: Digraphs as Distinct Logical Units )
James Kass wrote:
JK Proposal to Add IDEOGRAPHIC TABOO VARIATION INDICATOR to ISO/IEC 10646: http://std.dkuug.dk/JTC1/SC2/WG2/docs/n2475.pdf
Thanks for the reference. There seem to be a couple of problems with this proposal as far as I can see.
1. The Ideographic Taboo Variation Indicator is proposed for inclusion in the Kangxi Radicals block !!! Surely they can't be serious. If they just need an empty code point, they might as well put it at U+03A2 and be damned. Probably the CJK Symbols and Punctuation block would be more appropriate, but that's full up now, which I guess is why it's proposed to put the character at any old empty code point. The original CJK Symbols and Punctuation block was always going to be too small, and I believe that a new block is needed for extended CJK Symbols and Punctuation (there are still a number of ideographic symbols that need encoding, such as the two or three commonly encountered symbols that have the same semantics as U+3005 IDEOGRAPHIC ITERATION MARK).
2. Looking at CJK Unified Ideographs Extension B, it seems that the most common taboo variants are now already encoded in Unicode. In addition to U+2239E and U+248E5 which I have already mentioned, the primary example of a taboo-form variant character given in the proposal is also encoded at U+22606. The secondary examples (where the taboo-form is used as a phonetic component in a more complex character) could be currently coded using Ideographic Description Characters - e.g. U+2FF0, U+2E98, U+22606 and U+2FF0, U+2EAF, U+22606. Is there still a need for an Ideographic Taboo Variation Indicator ?
Personally I still think that a separate CJK Taboo Replacement Characters block would have been more logical ... but it's too late now.
By the way, when's Code2000 going to include the CJK Unified Ideographs Extension B glyphs ?
There are actually a few useful characters hidden here and there amongst the morass of junk characters. Andrew West
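The two sequences mentioned in point 2 can be put together as follows. This is only a sketch in Python 3: the function names `describe_left_right` and `code_points` are hypothetical, and such a sequence describes the shape of a character rather than standing in for an encoded character.

```python
import unicodedata

IDC_LEFT_TO_RIGHT = "\u2FF0"   # first code point of both sequences in the message
TABOO_FORM = "\U00022606"      # taboo-form variant cited from Extension B


def describe_left_right(left: str, right: str) -> str:
    """Build a two-part, left-to-right Ideographic Description Sequence."""
    return IDC_LEFT_TO_RIGHT + left + right


def code_points(text: str) -> str:
    """Render a string as a comma-separated list of U+XXXX values."""
    return ", ".join(f"U+{ord(ch):04X}" for ch in text)


# The two left-hand radicals named in the message: U+2E98 and U+2EAF.
for radical in ("\u2E98", "\u2EAF"):
    ids = describe_left_right(radical, TABOO_FORM)
    print(code_points(ids), "-", unicodedata.name(radical, "<unnamed>"))
```

Whether a renderer draws such a sequence as one composed glyph or as three separate glyphs depends entirely on the font and layout engine, so the sequence is best treated as a description for human readers.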
Re[2]: Pronunciation of U+0429
Hello Radovan, RG that is indeed the official pronunciation, No, it really isn't! RG and if you ask an (educated) Russian RG speaker to slowly pronounce a word with [U+0429] he will pronounce it as RG [StS] No, he really won't! RG but I guess it is influenced by orthography. What's the orthography got to do with it?? -- Anatoly Vorobey, my journal (in Russian): http://www.livejournal.com/users/avva/ [EMAIL PROTECTED] http://pobox.com/~mellon/ Angels can fly because they take themselves lightly - G.K.Chesterton
Re: Backward accent order
AFAIK reverse diacritics are unique to French -- of course French is spoken in a lot of different locales. ;-)
MichKa
Ake Persson wrote (Friday, August 09, 2002 3:58 AM, Subject: Backward accent order):
AP The French language uses backward accent order. Is backward accent order used in any other language? Regards, Åke Persson
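For readers who have not met backward accent order: when two French words differ only in their accents, the accents are compared starting from the end of the word rather than from the beginning. The sketch below, in Python 3, is an illustration under simplifying assumptions and not a real collation implementation: accent weights are just the code points of the combining marks, and the names `split_accents`, `forward_key` and `backward_key` are invented here.

```python
import unicodedata


def split_accents(word):
    """Return (base letters, list of per-letter tuples of combining marks)."""
    bases, accents = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and accents:
            # Attach the mark to the most recent base letter.
            accents[-1] += (ord(ch),)
        else:
            bases.append(ch.lower())
            accents.append(())
    return "".join(bases), accents


def forward_key(word):
    """Base letters first, then accents compared left to right."""
    base, accents = split_accents(word)
    return (base, accents)


def backward_key(word):
    """Base letters first, then accents compared right to left."""
    base, accents = split_accents(word)
    return (base, accents[::-1])


words = ["côté", "coté", "côte", "cote"]
print(sorted(words, key=forward_key))   # ['cote', 'coté', 'côte', 'côté']
print(sorted(words, key=backward_key))  # ['cote', 'côte', 'coté', 'côté']
```

The second ordering is the one French dictionaries use; the only difference between the two keys is the reversal of the accent list.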
RE: German 'ich' (was: Pronunciation of U+0429)
I guess everybody knows that 'the' has genders in Germany: der, die, das. Now imagine the poor American arriving in Munich and stepping on a Bavarian's toe: "Das die der Dei-bel hol" (I messed with the Bavarian spelling a bit to get my point across.) I' bä a Schwob (I learned German the first time in a tiny Swabian village near Tübingen)
Dave
--- Vaintroub, Wladislav [EMAIL PROTECTED] wrote:
WV Despite all the similarities in the pronunciation of Russian U+0429 and German ich, U+0429 seems to be very hard to pronounce for Germans who learn Russian (the most complicated for Germans is I think U+042B, which most of them pronounce like German u).
WV Icke, (a Russian living in Berlin)
Re: Digraphs as Distinct Logical Units
On Friday, August 9, 2002, at 03:54 AM, Andrew C. West wrote:
AW And in China, historically the personal names of emperors (for emperors read dictators) have been tabooed
An Ideographic Taboo Variation Indicator has been approved by the UTC for addition to the standard to handle precisely this kind of situation (see http://www.unicode.org/unicode/alloc/Pipeline.html). It works on the theory that you rarely need to know the precise *form* of the taboo variant, just that a taboo form is being used. There was some disagreement in WG2 about its utility, however, and there is the problem that, as you note, some taboo variants have already been encoded. It's currently scheduled to be reconsidered by the UTC.
John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Digraphs as Distinct Logical Units
Andrew C. West andrewcwest at alumni dot princeton dot edu wrote:
DE And if you think that's bad, you should have seen the ones that got rejected -- special emphasized Hangul for writing the names of North Korean dictators
AW Not so outlandish as it may first appear. When Egyptian hieroglyphs get encoded in Unicode, I would not be surprised to see special characters for the cartouched names of pharaohs (for pharaohs read dictators). And in China, historically the personal names of emperors (for emperors read dictators) have been tabooed (some dynasties, e.g. Han, Song and Qing, more than others), meaning that if you had to write a character that happened to be part of the emperor's personal name, then you either substituted another character (synonym or homophone as appropriate), or wrote the character with the last stroke omitted. This latter practice was prevalent during the Qing dynasty (1644-1911).
The Egyptian pharaohs and Chinese emperors were generally viewed as gods or demigods. It's not too surprising to see the names of supreme beings written in a special way. In the Hebrew tradition, the name of God (Yahweh) is written specially to avoid the appearance of blasphemy. Mark Shoulson and Michael Everson co-wrote a draft proposal in 1998 to encode the Tetragrammaton in Unicode: http://std.dkuug.dk/jtc1/sc2/wg2/docs/n1740/n1740.htm
But in 2002, political leaders and heads of state are more likely to be seen as human, rather than superhuman, at least in most cultures, and to have their names written with the same characters as the common folk. For the North Koreans to encode special emphasized Hangul characters for the names of their two Great Leaders, Kim Il-sung and Kim Jong-il, in their national standard -- going so far as to encode separate characters for Kim and Il for each leader, though the two were father and son -- and to propose these emphasized characters for ISO/IEC 10646, seems extremely backward and/or extremely repressive, at least to this Westerner.
-Doug Ewell Fullerton, California
Re: Tildes on vowels
Stefan Persson wrote as follows, responding to Andrew C. West.
AW Personally I think that markup may be more appropriate, given the countless possible permutations of combining/superscript letters that may be encountered in mediaeval texts in various languages.
SP Why not just add *two* characters, either to the PUA or to Unicode?
SP U+XXXX = COMBINING LETTER ABOVE INDICATOR
SP U+XXXY = SUPERSCRIPT LETTER INDICATOR
SP This means that U+XXXX directly followed by a is a combining a above, and that U+XXXY directly followed by a is a superscript a. This means some normalisation issues:
SP U+0061 U+0363 ≡ U+0061 U+XXXX U+0061
SP U+00AA ≡ U+XXXY U+0061
SP etc.
SP Stefan
Well, such normalisation could be as private a matter as the allocation of the two characters to the Private Use Area. Consider please the following scenario, which I have devised in a creative writing manner as a fictional scenario, yet which does not seem unrealistic in relation to what might happen in practice, somewhere, sometime.
Suppose please that someone wishes to transcribe the text of a medieval manuscript so as to have the text stored in a computerised format. Upon finding various characters in the manuscript such that he or she cannot enter them as Unicode characters, he or she might reasonably devise his or her own encoding list, by, say, making a handwritten list (with a view to later putting the piece of paper through a scanner to produce a graphic file) and use that encoding list in order to make human decisions as to which characters to key into the computer system, perhaps doing the keying with a program such as UniPad. The UniPad website is as follows. http://www.unipad.org It may be that the UniPad program could be customised so as to have a special soft keyboard to help the transcriber in keying the codes, yet even if that is not possible the Private Use Area codes could be entered using the character map which UniPad provides.
In such circumstances the transcriber could decide to have a Private Use Area encoding of the characters of the manuscript on the basis of one Private Use Area code point for each character in the manuscript or he or she could decide to have a system which used the two operators which you suggest together with zero or more other operators and zero or more individual characters depending upon the repertoire of characters which exist in the manuscript. Certainly there are then issues of using the data once it is in a computer file, maybe some special program will need to be written (such as a small Pascal program, I am not meaning some major development project to produce a special program, just something which will do what is required for the particular transcription project), yet for someone to use two such Private Use Area encodings in order to facilitate the task of getting the information content accurately from the document into the computer, it seems a perfectly reasonable thing to do. The transcriber might need to do the transcribing of the original document during certain daytime hours at a table in a secure library environment during a time frame arranged by prior appointment and permissions. Once the transcribed data is in the computer, either keyed in while in the library or transcribed from notes made using a pencil, the transcriber and other interested people throughout the world can analyse the meaning of the text of the document almost anywhere. 
In such circumstances of some people trying to understand such documents, maybe using the two codes within the Private Use Area together with an ordinary TrueType font which has U+XXXX implemented so as to show a glyph of an arrow starting by going straight upwards then going steeply diagonally upwards in a bend dexter direction until it reaches the point of the arrow, (as if the back half of the arrow were as in U+2191 and the front half of the arrow were as in U+2196) and U+XXXY implemented as an arrow going straight upwards until it reaches the point of the arrow, (similar to U+2191) would be a way of researchers having a look at the transcribed text of the document in a convenient manner. I only suggest those particular glyphs as examples in this posting, please feel free to use whatever glyph designs you wish. Certainly, the use of such Private Use Area codes would only have any validity in their use amongst a group of users of the Unicode system who had agreed to use those particular Private Use Area encodings to have those meanings. Yet the use of such a Private Use Area encoding could, I feel, be very useful amongst such a group of researchers in that it would get the document transcription job done and would have the considerable advantage that if the transcribed file were to be displayed in a program such as WordPad or Word that in order to be able to understand an indication of the presence in the original document of any regular Unicode character combined above any other regular Unicode
RE: [unicode] Re[2]: Pronunciation of U+0429
Radovan Garabik wrote:
RG but I guess it is influenced by orthography.
AV What's the orthography got to do with it??
RG if the children in schools are taught that щ is pronounced as шч, they (those who are paying attention) will remember it and then use this pronunciation when asked to pronounce each phoneme of a given word.
Uh!? Are you thinking about children from ethnic minorities? Russian children are supposed to be already able to speak Russian when they go to school: I guess what they learn is that a sound has that letter, not the other way round.
Marco
Re: Tildes on vowels
William Overington wrote (Friday, August 09, 2002 6:00 PM, Subject: Re: Tildes on vowels):
WO Well, why not go ahead and decide on two code points within the Private Use Area as values for XXXX and XXXY, post them in this list and perhaps that action will lead to that facility becoming available as a facility to document transcribers all around the world.
There have been several messages sent to this list about why this would be inappropriate. Just read the answers to some of your recent discussions, and you'll understand what I mean.
Stefan
Re: Taboo Variants
John H. Jenkins wrote:
JJ Yes, because you do not *encode* characters using IDC's. You describe them. This is carefully explained in the standard.
I stand corrected.
JJ Of course, using the taboo variant selector is about as vague as an IDC, so it doesn't make that much difference.
My point is that if the commonly encountered taboo variants are already encoded in CJK-B, then either the other taboo variants should also be added to CJK-B or they could be *described* using IDCs. Adding a taboo variant selector does make a difference, because then there'll be more than one way to reference the same character. On the other hand, given the lack of font support for CJK-B, perhaps a taboo variant selector would be preferable ... now I don't know where I stand on this !
JJ As to the proposed location, note that the byte-order mark got stuck with a bunch of Arabic compatibility forms.
U+FEFF is only stuck with a bunch of Arabic compatibility forms because it's the little-endian of U+FFFE, and as far as I'm aware it's not actually a BOM character, but a code point that is used solely with the semantic of BOM (TR28 Section 3.9).
JJ Sometimes the odd character gets stuck in an odd place; as you say, there wasn't any room left in the more logical location, and this spot in the KangXi radicals block was pretty much never going to be used otherwise. Six of one, as it were.
I simply can't accept this. For argument's sake, what are you going to do when I publish the manuscript copy of a draft edition of the Kangxi dictionary that I recently purchased in a second-hand bookstore in London that includes ten supplementary radicals not found in the printed editions ? In principle, as has been argued convincingly in another thread recently, you can never assume that any unused code point will always remain vacant. The Kangxi Radical block may look as if it will never change, but we shouldn't rely on that being the case.
Given that there are going to be proposals for additional CJK symbols and punctuation marks in the future (if no-one else does I've got a few I'll propose), surely it would be better to simply create a CJK Symbols and Punctuation B block for the proposed IDEOGRAPHIC TABOO VARIATION INDICATOR. It's irrelevant that the block will only have one character to start with. It's got to be better than polluting other blocks with characters that just don't belong there.
Andrew
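The relationship between U+FEFF and U+FFFE touched on above is easy to see at the byte level. The following Python 3 sketch shows that the two byte orders of U+FEFF are mirror images, and that reading the bytes with the wrong byte order yields U+FFFE; the function name `sniff_utf16` is made up for this illustration.

```python
BOM = "\uFEFF"

big_endian = BOM.encode("utf-16-be")     # bytes FE FF
little_endian = BOM.encode("utf-16-le")  # bytes FF FE


def sniff_utf16(data: bytes) -> str:
    """Guess the byte order of UTF-16 data from its first two bytes."""
    if data[:2] == b"\xfe\xff":
        return "utf-16-be"
    if data[:2] == b"\xff\xfe":
        return "utf-16-le"
    return "unknown"


print(big_endian.hex(), little_endian.hex())
# Decoding little-endian bytes as if they were big-endian gives U+FFFE.
print(f"U+{ord(little_endian.decode('utf-16-be')):04X}")
print(sniff_utf16("\uFEFFabc".encode("utf-16-le")))
```

Because U+FFFE is a noncharacter that should not occur in interchanged text, meeting it at the start of a stream is a strong hint that the byte order has been misread.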
Re: Taboo Variants
John H. Jenkins wrote: Of course, using the taboo variant selector is about as vague as an IDC, so it doesn't make that much difference. Actually, on second thoughts, why do we need a taboo variant selector when we already have generic variation selectors (U+FE00 through U+FE0F) ? The Standardized Variants document http://www.unicode.org/Public/UNIDATA/StandardizedVariants.html states : quote Han Variants At this time no Han variants exist. When they do, a table will be inserted here. /quote Surely if there ever was a place to put taboo-form variants, this is it. Andrew C. West http://uk.geocities.com/babelstone1357/
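[Editorial sketch] To make the variation-selector idea concrete, here is a small Python illustration of what a base character followed by a selector from U+FE00..U+FE0F looks like as data. The pairing of U+5F18 with VARIATION SELECTOR-1 is purely hypothetical (the message itself notes that no Han variants are defined), and the helper names are invented for the example.

```python
import unicodedata

HONG = "\u5f18"  # the character "hong" discussed earlier in the thread
VS1 = "\ufe00"   # VARIATION SELECTOR-1


def is_variation_selector(ch: str) -> bool:
    """True for the sixteen generic variation selectors U+FE00..U+FE0F."""
    return 0xFE00 <= ord(ch) <= 0xFE0F


def strip_variation_selectors(text: str) -> str:
    """Fall back to the plain base characters, dropping any selectors."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


# Hypothetical sequence: NOT a standardized variant, for illustration only.
sequence = HONG + VS1

print([f"U+{ord(ch):04X}" for ch in sequence])
print(unicodedata.name(VS1))
print(strip_variation_selectors(sequence) == HONG)
```

The point of the design is that a process which does not understand the selector can ignore it and still see the canonical character.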
Re: [unicode] Re[2]: Pronunciation of U+0429
On Fri, Aug 09, 2002 at 07:16:09PM +0200, Marco Cimarosti wrote: Radovan Garabik wrote: RG but I guess it is influenced by orthography. What's the orthography got to do with it?? If the children in schools are taught that щ is pronounced as шч, they (those who are paying attention) will remember it and then use this pronunciation when asked to pronounce each phoneme of a given word. Uh!? Are you thinking about children from ethnic minorities? No, I am speaking about Russians. Russian children are supposed to be already able to speak Russian when they go to school: I guess what they learn is that sound has that letter, not the other way round. I have no idea how it is in the Russian school system, but: 1) they can speak a dialect 2) as was already pointed out, щ, when transcribed phonetically, is written as шч in Russian literature. When I was being taught Russian (5th grade, elementary school), there was never ever a mention that щ can be pronounced differently from the шч combination. Indeed, when our teacher explained Cyrillic, she made a special effort to explain that in Russian, the шч combination is written as щ (with some exceptions, of course, such as счастие). Also in Russian textbooks, it was written everywhere that where pronunciation is concerned, щ=шч. But again, these were textbooks written by Slovaks, for Slovak pupils (and not particularly good ones, e.g. until I started to read real Russian literature I had no idea that ё is often written as е. It took me some time to get out of this confusion :-)) -- Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/
Re: Taboo Variants
On Friday, August 9, 2002, at 11:38 AM, Andrew C. West wrote: My point is that if the commonly encountered taboo variants are already encoded in CJK-B, then either the other taboo variants should also be added to CJK-B or they could be *described* using IDCs. Encoding them was a mistake, pure and simple. We didn't monitor the IRG well enough in the CJK-B encoding process, or we would have objected to this kind of cruft. And describing them is a valid approach. It depends on what's more important to you: the appearance (which IDS's are better at), or the semantic (which is explicit with the TVS). Adding a taboo variant selector does make a difference, because then there'll be more than one way to reference the same character. Well, yes and no. Even though we've already got taboo variants encoded, we have no way to flag in a text that the purpose they're serving is taboo variants. The interesting thing about the taboo variants is precisely that meaning: This is character X written in a deliberately distorted way. You identified the taboo variants you found in Ext B not based on anything in the standard, but because of your outside knowledge. A student encountering them in a text may well be stymied until she goes to her professor. Meanwhile, multiple encodings of the same Han character are *already* a major problem. This is one reason why the UTC is determined to be stricter in the future to keep it from continuing to happen. -- John H. Jenkins [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Taboo Variants
Andrew C. West scripsit: Given that there's going to be proposals for additional CJK symbols and punctuation marks in the future (if no-one else does I've got a few I'll propose), surely it would be better to simply create a CJK Symbols and Punctuation B block for the proposed IDEOGRAPHIC TABOO VARIATION INDICATOR. It's irrelevant that the block will only have one character to start with. It's got to be better than polluting other blocks with characters that just don't belong there. Blocks exist to keep things simple for allocators (i.e. UTC and WG2), and not to allow end-users to make deductions about them; all such deductions are quite illegitimate. (If this isn't actually written down anywhere, it should be.) ISO 10646 (but not Unicode) does have the notion of labelled collections, which may be open (i.e. include currently unassigned codepoints) or closed. Regrettably, I can't cite examples, as AFAIK the list of collections is not online anywhere. -- John Cowan [EMAIL PROTECTED] http://www.ccil.org/~cowan http://www.reutershealth.com Unified Gaelic in Cyrillic script! http://groups.yahoo.com/group/Celticonlang
Re: Taboo Variants
John Cowan wrote: Blocks exist to keep things simple for allocators (i.e. UTC and WG2), and not to allow end-users to make deductions about them; all such deductions are quite illegitimate. (If this isn't actually written down anywhere, it should be.) Surely assigning a character to a block with other like-minded characters IS keeping things simple for allocators, and randomly assigning miscellaneous characters all over the place makes it as confusing to allocators as to end-users. Surely that's the whole point of having designated block names. It sounds to me that what you're suggesting is that characters should be allocated sequentially from U+ up, with no gaps. Would that not be the simplest solution for allocators!? After all, as long as the end-user sees the glyph that they're expecting, they don't care what code point it's mapped to (indeed, as you imply, code points should be invisible to the end-user). Andrew C. West http://uk.geocities.com/babelstone1357/
Re: Taboo Variants
At 10:54 AM 8/9/02 -0700, Andrew C. West wrote: Actually, on second thoughts, why do we need a taboo variant selector when we already have generic variation selectors (U+FE00 through U+FE0F) ? The Standardized Variants document http://www.unicode.org/Public/UNIDATA/StandardizedVariants.html states : quote Han Variants At this time no Han variants exist. When they do, a table will be inserted here. /quote Surely if there ever was a place to put taboo-form variants, this is it. The difference being that the table matches a certain number of ideographs with specific variants, where the taboo variant selector potentially matches any ideograph with an (unspecified) taboo variant.
Re[2]: [unicode] Re[2]: Pronunciation of U+0429
Hello Radovan, RG that is indeed the official pronunciation, No, it really isn't! RG not even if you ask your fellow innocent Russian speakers RG please read for me this word v e r y s l o w l y RG and listen carefully? No, it isn't. The [StS] pronunciation has been considered a dialect pronunciation for 50 years now. The official, standard pronunciation is [S'], and has been for a long time. RG We were certainly taught to pronounce [U+0429] as [StS] (soft [tS] before soft RG vowels, of course), [tS] is _always_ soft in Russian. RG but I guess it is influenced by orthography. What's the orthography got to do with it?? RG if the children in schools are taught that [U+0429] is pronounced RG as [StS], Trust me, they aren't. -- Anatoly Vorobey, my journal (in Russian): http://www.livejournal.com/users/avva/ [EMAIL PROTECTED] http://pobox.com/~mellon/ Angels can fly because they take themselves lightly - G.K.Chesterton
Re: Taboo Variants
Lest everyone go scrabbling off the deep end and drown on this particular thread, I would like to point out the following facts: U+2FDF IDEOGRAPHIC TABOO VARIATION INDICATOR was accepted by the UTC on April 30, 2002. However, when the proposal was taken into WG2 it met a wall of opposition led by China. WG2 did *NOT* accept the character, and it is not a part of the FPDAM 2 currently being ballotted for inclusion in 10646. The UTC will have to deal with this mismatch (along with a number of others) in its upcoming meeting this month. China's clear preference is to simply encode all the taboo variants as separate characters. At the WG2 meeting, they pointed out a number of instances already encoded in Extension B, as you have. And with China not wanting an IDEOGRAPHIC TABOO VARIATION INDICATOR encoded, many other members of WG2 will defer to their opinion on the topic. This issue clearly needs to be worked further in the IRG context before a consensus will emerge. At any rate, don't consider it a done deal. What matters is what eventually gets published in the final, approved Amendment 2 for ISO/IEC 10646, which *will* match what we publish in Unicode 4.0. --Ken
Re: Taboo Variants
Andrew C. West scripsit: It sounds to me that what you're suggesting is that characters should be allocated sequentially from U+ up, with no gaps. Would that not be the most simple solution for allocators !? Only if they acted sequentially, which they did not and do not. Different scripts are being worked on simultaneously, and without block allocation it would be impossible to keep them from stepping on each others' code points. But once the job is done, the notion of blocks is dispensable. -- John Cowan [EMAIL PROTECTED] www.ccil.org/~cowan www.reutershealth.com In computer science, we stand on each other's feet. --Brian K. Reid
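[Editorial sketch] The argument that block membership licenses no deductions about a character can be illustrated with a short Python example. The block table below is hand-copied and covers only the ranges named in this thread; the function name block_of is an invention for the example, since the Python standard library has no block lookup.

```python
import unicodedata

# A few block ranges mentioned in this thread (start, end, name).
# This is a tiny hand-written excerpt, not a complete block table.
BLOCKS = [
    (0x2F00, 0x2FDF, "Kangxi Radicals"),
    (0x3000, 0x303F, "CJK Symbols and Punctuation"),
    (0xFE00, 0xFE0F, "Variation Selectors"),
    (0xFE70, 0xFEFF, "Arabic Presentation Forms-B"),
]


def block_of(code_point: int):
    """Hypothetical helper: return the block name, or None if not listed."""
    for start, end, name in BLOCKS:
        if start <= code_point <= end:
            return name
    return None


# U+FEFF lives in an Arabic block, yet its properties are not Arabic at all:
# its general category is Cf (format), which the block name cannot tell you.
print(block_of(0xFEFF), unicodedata.category("\ufeff"))
print(block_of(0x2FDF))
```

In other words, the properties have to come from the character database, not from where the allocators happened to find room.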
Re: Pronunciation of U+0429
JC so unnatural to peoples with more phonemic orthographies. Russian orthography is pretty *phonemic*, excluding historic forms such as the -ogo genitive or the soft sign with the 2nd person singular of the verb. Most accent-counting languages tend to reduce sounds rather heavily in nonstressed syllables, however, and in those cases a phonemic orthography doesn't help a lot. Philipp [EMAIL PROTECTED] Chaos reigns within / Reflect, repent, and reboot / Order shall return
Re: Pronunciation of U+0429
Philipp Reichmuth scripsit: Russian orthography is pretty *phonemic*, excluding historic forms such as the -ogo genitive or the soft sign with the 2nd person singular of the verb. Most accent-counting languages tend to reduce sounds rather heavily in nonstressed syllables, however, and in those cases a phonemic orthography doesn't help a lot. I take it to be rather morphophonemic, much like German orthography. -- John Cowan [EMAIL PROTECTED] http://www.reutershealth.com Mr. Lane, if you ever wish anything that I can do all you will have to do will be to send me a telegram asking and it will be done. Mr. Hearst, if you ever get a telegram from me asking you to do anything you can put the telegram down as a forgery.
OT Laugh for the day - I liked the title of this security related article
and the first few sentences as well Barry Caplan www.i18n.com http://www.securitymanagement.com/library/000599.html How to Keep Out Bad Characters By DeQuendre Neeley The business world is one of constant motion. But it is not just people who are on the move. It is also information. Businesses today depend on the efficient exchange of information, for which they rely increasingly on the Internet and other computer networks. Unfortunately, in the digital world, as in its physical counterpart, bad characters will sometimes try to slip in with the good.
Re: Is U+0140 (l with middle dot) ever used?
I asked my Catalan contacts about this issue; something like _ IMO, in Catalan [L][·][L] is preferred to [L·][L] because L-dot is not really a separate letter, like Spanish ñ, but simply a separator just like an ordinary -. Actually, AFAIK, in Catalan typography if one needs to compose with exaggerated letter spacing, middle dot is dealt with as a separate symbol, and thus paral·lel looks like P A R A L · L E L and not P A R A L· L E L _ I just received an answer about this issue. Translated below: _ From: Hèctor Alós i Font [EMAIL PROTECTED] Date: Thu, 08 Aug 2002 08:17:18 +0200 Subject: Re: [esperpentu] Fwd: Is U+0140 (l with middle dot) ever used? Vi pravas: temas pri memstara signo, ne alglulajxo al antauxa lo. Nuntempe en la hispaniaj klavaroj (legxo Majó), temas pri memstara signo tajpita per Maj+3. Mi memoras tamen malnovajn tajpilojn kun aparta klavo l+mezpunkto. You're right: it's a standalone symbol, not an addition to the previous L. On current Spanish keyboards (Majó law), it's a separate symbol located at Shift+3. But I remember older typewriters with a separate key L + middle dot. Principe temas pri mezalta punkto, sed estas homoj uzantaj normalan punkton: ekzemple la kataluna eldono de El Periódico ( http://www.elperiodico.com/EDICION/portada.htm?l=CAT ). Persone mi konsideras tion suficxe malbela - kvankam estas vere, ke tio apenaux konfuzas: tuj sekve, sen spaco, estas minuskla litero, malkiel okazas kun la vera punkto. In principle it is a dot at mid line height, but some people use the normal period: e.g. the Catalan edition of the newspaper El Periódico ( http://www.elperiodico.com/EDICION/portada.htm?l=CAT ). Personally I find it rather ugly -- though it's true that this practice is hardly ambiguous: right after the period, with no space, there's a lower case letter, unlike what happens with a real period.
Gxi estas uzata ankaux katalune kiel apartigilo ekz-e en kelkaj fakaj eldonoj de mezepokaj tekstoj: se mi bone komprenas, tiel oni indikas, ke en la originalo estis unu sola vorto, sed nuntempe oni skribus dise. In Catalan it is also used as a separator, e.g. in scholarly editions of medieval texts: IIUC, it is thus noted that in the original something is written as a single word, which nowadays we'd write separately. Mi rimarkis gxian uzon ankaux en la okcitana (Zamen·hof), sed mi tre dubas, ke tio estas norma uzo - simple kataluna influo. Eble portugallingvanoj povus imiti :) I noted the use of middle dot also in Occitan (Zamen·hof) [thus distinguishing a foreign nh, here Polish, from the Occitan digraph nh], but I strongly doubt that this is normative -- it's probably just some Catalan influence. Maybe Portuguese speakers could do the same :) [nh also occurs in Portuguese]. Kaj jes gxi estas efektive cxiutage uzata: amaseto da vortoj gxin enhavas, kvankam la barcelona (nenorma) prononco ne distingas inter l kaj l·l - sed jes duobligas suficxe multajn aliajn konsononantojn. And, yes, L + middle dot + L is indeed used daily: a smallish number of Catalan words contain it, even if the Barcelona (non-normative) pronunciation doesn't distinguish between L and L·L, though it doubles a number of other consonants. _ So, unless it is (or becomes) used in any other language, U+0140 seems about to disappear from actual usage, with or without any official deprecation. As for the reported usage of the normal period, it suffers from the known problems of having a punctuation sign used as a letter symbol (word division, word count, alphasorting etc.). Hm. But middle dot is not only a letter symbol. It's also used as a bullet, a tab filling, even a box-drawing char. Shouldn't Unicode provide a way to separate this duality? -- António MARTINS-Tuválkin [EMAIL PROTECTED] http://www.tuvalkin.web.pt/bandeira/
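[Editorial sketch] The two spellings discussed in this message, precomposed U+0140 versus a plain l followed by U+00B7 MIDDLE DOT, can be compared in Python. This is only an illustration: the helper name fold is invented, and using NFKC to fold the two forms together is one possible choice, not something the thread prescribes.

```python
import unicodedata

# "paral·lel" spelled two ways.
precomposed = "para\u0140lel"  # uses U+0140 LATIN SMALL LETTER L WITH MIDDLE DOT
separate = "paral\u00b7lel"    # uses l + U+00B7 MIDDLE DOT


def fold(word: str) -> str:
    """Hypothetical helper: compatibility-normalize so both spellings match."""
    return unicodedata.normalize("NFKC", word)


print(len(precomposed), len(separate))     # the strings differ in length
print(precomposed == separate)             # and do not compare equal as-is
print(fold(precomposed) == fold(separate)) # but agree after NFKC
```

U+0140 carries a compatibility decomposition to l + U+00B7, which is why the folded forms agree; canonical normalization (NFC or NFD) leaves the two spellings distinct.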
Re: Taboo Variants
Andrew C. West wrote: [re: proposed IDEOGRAPHIC TABOO VARIATION INDICATOR] Given that there's going to be proposals for additional CJK symbols and punctuation marks in the future (if no-one else does I've got a few I'll propose), surely it would be better to simply create a CJK Symbols and Punctuation B block for the proposed IDEOGRAPHIC TABOO VARIATION INDICATOR. It's irrelevant that the block will only have one character to start with. It's got to be better than polluting other blocks with characters that just don't belong there. There's an unassigned block right next to the other ideographic variation selectors, at U+FE10..U+FE1F. *If* there are going to be variation selectors for particular semantics, I would have thought that's the obvious place to encode them. However, it doesn't make much sense to me to suddenly change from encoding variants using separate code points, to encoding them using variation selectors. Arguably variation selectors would have been the better approach if they had been used from the start (in particular, there would have been no need for any of the compatibility ideographs). However, requiring implementations to handle lots of separately encoded variant characters *and* variation selectors is the worst of both worlds IMHO. -- David Hopwood [EMAIL PROTECTED] Home page PGP public key: http://www.users.zetnet.co.uk/hopwood/ RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01 Nothing in this message is intended to be legally binding.
Re[2]: Pronunciation of U+0429
Hello John, Russian orthography is pretty *phonemic*, excluding historic forms such as the -ogo genitive or the soft sign with the 2nd person singular of the verb. Most accent-counting languages tend to reduce sounds rather heavily in nonstressed syllables, however, and in those cases a phonemic orthography doesn't help a lot. JC I take it to be rather morphophonemic, much like German orthography. Yep. Russian phonetists usually call phonemes what Western phonetists call morphonemes, so they have no problem with calling Russian orthography _phonemic_. -- Anatoly Vorobey, my journal (in Russian): http://www.livejournal.com/users/avva/ [EMAIL PROTECTED] http://pobox.com/~mellon/ Angels can fly because they take themselves lightly - G.K.Chesterton
Re: Is U+0140 (l with middle dot) ever used?
António Martins-Tuválkin antonio at tuvalkin dot web dot pt wrote: Hm. But middle dot is not only a letter symbol. It's also used as a bullet, a tab filling, even a box-drawing char. Shouldn't Unicode provide a way to separate this duality? It should, and does. Unicode has plenty of bullet operators, hyphen bullets, dot operators, little black circles and squares and triangles, all kinds of stuff to fill these various typographical needs. The only question is whether people will actually use these new goodies, or continue to settle for whatever their keyboard and favorite 8-bit code page gave them. -Doug Ewell Fullerton, California
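[Editorial sketch] A few of the dedicated characters alluded to above can be listed with Python's unicodedata module. The selection is the editor's, not the poster's, and the helper name describe is invented for the example.

```python
import unicodedata

# A handful of dot-like characters with distinct intended uses.
CANDIDATES = ["\u00b7", "\u2022", "\u2043", "\u2219", "\u22c5"]


def describe(ch: str) -> str:
    """Hypothetical helper: code point plus the official character name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in CANDIDATES:
    # The general category separates punctuation (Po) from math symbols (Sm).
    print(describe(ch), unicodedata.category(ch))
```

Seeing the names side by side shows how the bullet and operator roles have their own code points, leaving U+00B7 for its punctuation-like uses.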
Re: Digraphs as Distinct Logical Units
Philipp Reichmuth uzsv2k at uni dash bonn dot de wrote: What about round-trip compatibility? UTC and WG2 apparently decided that some degree of compatibility with this relatively new (1997) DPRK standard could be sacrificed. The horizontal-bar fractions can be mapped to the existing Unicode fractions, and the only thing lost in round-tripping is the exact glyph shape. Likewise the emphasized name syllables; the only loss of information is the emphasis, not the plain-text identity of the syllables. -Doug Ewell Fullerton, California
Re: Taboo Variants
John Cowan jcowan at reutershealth dot com wrote: ISO 10646 (but not Unicode) does have the notion of labelled collections, which may be open (i.e. include currently unassigned codepoints) or closed. Regrettably, I can't cite examples, as AFAIK the list of collections is not online anywhere. http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2499.pdf pages 37 through 46. -Doug Ewell Fullerton, California
Re: Digraphs as Distinct Logical Units
On Fri, 9 Aug 2002, Doug Ewell wrote: Re: Mixed up priorities From: Michael Everson Date: Sun Oct 24 1999 - 06:34:24 EDT [...] (I just love that name, don't you? I could say it all day, if only I knew how. !Xóõ !Xóõ !Xóõ.) -Doug Ewell Fullerton, California which makes one wonder if the above comment is a quote or yours. roozbeh
Re: Digraphs as Distinct Logical Units
Roozbeh Pournader roozbeh at sharif dot edu wrote: Was there anything decided about using variant selectors for selecting exact shapes? StandardizedVariants.html doesn't list anything for vulgar fractions. I assume they decided the distinction wasn't worth making. -Doug Ewell Fullerton, California