Re: Major Defect in Combining Classes of Tibetan Vowels
On Wed, Jun 25, 2003 at 02:10:44 -0700, Andrew C. West wrote:

> I've never really understood normalization, but it seems to me that normalising bcuig 0F56, 0F45, 0F74, 0F72, 0F42 to bciug 0F56, 0F45, 0F72, 0F74, 0F42 is wrong, as bciug could conceivably be a shorthand abbreviation for a completely different word with a gigu [i] on the first syllable and a shabkyu [u] on the second syllable.

Err, as in this particular case one vowel sign is above and the other one is below the stack - i.e. they don't interact spatially - you cannot really distinguish them. ;)

SY, Uwe
-- 
[EMAIL PROTECTED] | Zu Grunde kommen
http://www.ptc.spbu.ru/~uwe/ | Ist zu Grunde gehen
Re: Major Defect in Combining Classes of Tibetan Vowels
On Wed, Jun 25, 2003 at 07:31:51 -0700, Andrew C. West wrote:

> > Err, as in this particular case one vowel sign is above and the other one is below the stack - i.e. they don't interact spatially - you cannot really distinguish them. ;)
>
> I know that the vowel signs do not interact with each other typographically, but what's that got to do with anything? I'm talking about the logical ordering of the Unicode codepoints used to encode some Tibetan text, not the physical appearance of the glyphs that are used to render that sequence of codepoints. What I'm suggesting is that although cui 0F45, 0F74, 0F72 and ciu 0F45, 0F72, 0F74 should be rendered identically, the logical ordering of the codepoints representing the vowels may represent lexical differences that would be lost during the process of normalisation.

And given that the two look identical in writing in the first place, this lexical difference had a chance to originate exactly *where*? You are putting the cart before the horse.

Also note that the original question from Chris is about things that do interact spatially.

SY, Uwe
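For anyone who wants to see the reordering under discussion, here is a minimal sketch in Python using the standard unicodedata module (Python is only an illustration choice; nothing in the thread uses it). It relies on the canonical combining classes in the Unicode Character Database, where U+0F72 (gigu) has class 130 and U+0F74 (shabkyu) has class 132, so canonical ordering puts the gigu first whatever order was typed. The names BCUIG, BCIUG and combining_classes are made up for this example.

```python
import unicodedata

# The two spellings from the thread: ba, ca, then the two vowel signs, then ga.
BCUIG = "\u0F56\u0F45\u0F74\u0F72\u0F42"  # shabkyu (u) typed before gigu (i)
BCIUG = "\u0F56\u0F45\u0F72\u0F74\u0F42"  # gigu (i) typed before shabkyu (u)


def combining_classes(text):
    """Return (code point, canonical combining class) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.combining(ch)) for ch in text]


if __name__ == "__main__":
    print(combining_classes(BCUIG))
    # Marks with different non-zero classes are sorted by class, so both
    # spellings end up as the same code point sequence after normalization.
    print(unicodedata.normalize("NFD", BCUIG) == BCIUG)
    print(unicodedata.normalize("NFD", BCIUG) == BCIUG)
```

Both comparisons come out true, which is exactly the loss of typed order that the thread is arguing about: after normalization nothing records which vowel sign came first.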
Re: Major Defect in Combining Classes of Tibetan Vowels
On Wed, Jun 25, 2003 at 09:08:10 -0700, Peter Lofting wrote:

> A list of common contractions would help here. I've seen at least one such published collection in the past which listed common contractions found in U-Med running text. However I don't have it with me. Does anyone on-line have access to a document like this?

A sample list of dbu can contractions from Schmidt grammar:

http://snark.ptc.spbu.ru/~uwe/tibex/contractions/contractions.html

SY, Uwe
Re: Fw: Karelian ASSR
On Fri, Dec 27, 2002 at 01:43:48 +0000, António Martins-Tuválkin wrote:

> due to the new language law of the Russian Federation that makes Cyrillics compulsory for all the languages within the Federation.

That's a very controversial law, but one correction is due nonetheless: for all *state* languages. The Constitution says that the republics shall have the right to institute their own state languages. This law puts a constraint on that right. My understanding is that if a republic wants to institute a state language that is not written in Cyrillic, the decision must be made at the federal level.

SY, Uwe
Re: Errata in language/script list: xUSSR languages
On Tue, Jul 31, 2001 at 17:58:57 +0700, Kairat A. Rakhim wrote:

> > Nenets Latin, Cyrillic
>
> What is 'Nenets'?

http://directory.google.com/Top/Regional/Europe/Russia/Society_and_Culture/Nationalities/Arctic_and_Siberian/Nenets/

http://directory.google.com/Top/Science/Social_Sciences/Language_and_Linguistics/Natural_Languages/Finno-Ugric_Languages/Nenets/

SY, Uwe
Re: Metafont to something real
On Tue, Mar 06, 2001 at 06:56:04 -0800, Nelson H. F. Beebe wrote:

> The standard TeX Computer Modern fonts are available in the CTAN archives in both Metafont and Adobe Type 1 format. The latter can be found in any CTAN mirror, e.g., ftp://ctan.math.utah.edu/tex-archive/fonts/cm/ps-type1

Beware that at least the BlueSky fonts have buggy AFMs. Apparently the converter was not following MF rules and it produced more than one kerning pair (KPX) entry for some glyphs. E.g. cmr12.afm gives:

   KPX k a -54.396
   KPX k a -27.197

which is obviously an error. Here's the roman.mf:

   ligtable "k": if serifs: "v": "a" kern -u#, fi\\
    "w": "e" kern k#, "a" kern k#, "o" kern k#, "c" kern k#;

Reading this is not for mere mortals, so let's examine the output of

   tfmtopl cmr12.tfm

In LIGTABLE we have (uwe: in pl files the notation 'C x' means literal character 'x'):

   (LABEL C k)
   (LABEL C v)
   (KRN C a R -0.054398)   <--
   (LABEL C w)
   (KRN C e R -0.027199)
   (KRN C a R -0.027199)   <--
   (KRN C o R -0.027199)
   (KRN C c R -0.027199)
   (STOP)

My understanding of Appendix F of the MFbook is that a ligtable program stops on the first match, so for (k,a) the correct kerning is -0.054398. But apparently the converter was buggy. The fix is to remove all the duplicates except the first (adjusting the StartKernPairs line accordingly).

Depending on the program that you feed the AFMs to, this might be ok, give you a wrong kerning, or give you an error message about a duplicate KPX. Actually, that's how I found it - a user of the Lout batch formatter reported that Lout complains about the BlueSky fonts.

Yes - I tried to report this problem but never heard back.

SY, Uwe
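The fix described above (keep the first KPX entry for each glyph pair, drop later duplicates, and correct the count on the StartKernPairs line) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a tool anyone in the thread used: the function name dedupe_kern_pairs is made up, the code looks only at KPX lines inside a StartKernPairs/EndKernPairs section, and in the sample data the third pair is invented for the demonstration (only the two "k a" lines come from the post).

```python
def dedupe_kern_pairs(afm_text):
    """Drop repeated KPX pairs (first one wins) and fix the pair count."""
    out = []
    seen = set()
    start_index = None  # position of the current StartKernPairs line in out
    kept = 0
    for line in afm_text.splitlines():
        fields = line.split()
        keyword = fields[0] if fields else ""
        if keyword == "StartKernPairs":
            start_index = len(out)
            seen.clear()
            kept = 0
        elif keyword == "EndKernPairs" and start_index is not None:
            # Rewrite the header now that the real number of pairs is known.
            out[start_index] = f"StartKernPairs {kept}"
            start_index = None
        elif keyword == "KPX" and start_index is not None and len(fields) >= 3:
            pair = (fields[1], fields[2])
            if pair in seen:
                continue  # a later duplicate: the first match already won
            seen.add(pair)
            kept += 1
        out.append(line)
    return "\n".join(out) + "\n"


# Illustrative input; the "k o" line is a made-up filler entry.
SAMPLE = """StartKernData
StartKernPairs 3
KPX k a -54.396
KPX k a -27.197
KPX k o -27.197
EndKernPairs
EndKernData
"""

if __name__ == "__main__":
    print(dedupe_kern_pairs(SAMPLE), end="")
```

Keeping the first entry mirrors the stop-on-first-match reading of the ligtable program given above; a converter that believed the last entry should win would need the opposite policy.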
Re: Latin digraph characters
On Wed, Feb 28, 2001 at 11:19:37 -0800, Antoine Leca wrote:

> [utf-8] [koi8-r] ;-( I know I should upgrade my mailer.
>
> Also, Don Knuth gives Пафнутий for his first name, which does not sound very Russian to me.

It's Russian. Though, surely, not of Russian/Slavic origin.

He was born on May 4 (Julian) in a village just a few miles away from a monastery famous for and named after the Russian saint of the same name. Incidentally, the Russian Orthodox Church celebrates the memory of this saint on May 1 (Julian). Hence the name choice is not that surprising given the time and the place of birth.

http://users.kaluga.ru/school6/chebishev/Family.htm
http://www.days.ru/Life/life964.htm

SY, Uwe
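A side note on the mailer trouble mentioned in the quoted text: a mailer that shows KOI8-R bytes as if they were Latin-1 turns the name above into the string ðÁÆÎÕÔÉÊ. The sketch below, using only Python's built-in codecs, shows the damage and its repair. The function names are made up for this example, and it assumes the text really was KOI8-R misread as Latin-1 with no other conversion in between.

```python
def koi8r_shown_as_latin1(text):
    """Simulate a mailer that displays KOI8-R bytes as Latin-1 characters."""
    return text.encode("koi8_r").decode("latin-1")


def repair_latin1_mojibake(garbled):
    """Undo the damage: recover the original bytes, then decode as KOI8-R."""
    return garbled.encode("latin-1").decode("koi8_r")


if __name__ == "__main__":
    name = "Пафнутий"
    garbled = koi8r_shown_as_latin1(name)
    print(garbled)                          # ðÁÆÎÕÔÉÊ
    print(repair_latin1_mojibake(garbled))  # Пафнутий
```

The round trip is lossless here because Latin-1 maps every byte value to exactly one character, so no information is lost when the bytes are misread.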
Re: [OT] What is DEL for?
On Wed, Feb 21, 2001 at 06:29:29 -0800, Marco Cimarosti wrote:

> What is the function of ASCII control code 0x7F (DEL) in text interchange? Particularly, what effect or interpretation might it have in communication protocols, terminal protocols and, especially, inside text files?
>
> My interest is about the function of this character in *contemporary* platforms and software, although I wouldn't dislike historical information, as far as it is clearly flagged as such.

AFAIK, the history is that on punched media (cards, paper tape) DEL was used to delete a character, as it was represented as holes in all positions. For paper tape the following demonstrates it nicely:

   $ echo -ne '\177' | /usr/games/ppt
   ___________
   | oooo.ooo|
   ___________

On DEC (and, I believe, other) terminals the <-- "Rubout" key (PC keyboards have the "BackSpace" key in this position) generates DEL. So emacs, The One True Editor :-), uses the ^H key (i.e. backspace) for help - which causes a lot of confusion for new users who have PC keyboards that generate backspace (^H) for the <-- key.

SY, Uwe
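The "holes in all positions" point can be checked with a little bit arithmetic. The sketch below is an illustration only: the row layout is a simplified seven-column picture of my own (one column per ASCII bit, most significant first), not the exact output format of ppt, and the function names are made up.

```python
DEL = 0x7F  # seven one-bits: a hole in every data position of 7-bit ASCII


def tape_row(code):
    """Draw a 7-bit code as a tape row: 'o' is a hole, '.' is unpunched."""
    return "".join("o" if (code >> bit) & 1 else "." for bit in range(6, -1, -1))


def overpunch(code):
    """Punch every remaining hole in a row that already holds a character.

    Holes cannot be filled back in, so punching more holes is a bitwise OR.
    """
    return code | DEL


if __name__ == "__main__":
    print(tape_row(ord("A")))             # o.....o
    print(tape_row(overpunch(ord("A"))))  # ooooooo
```

Whatever character was on the row, punching all the remaining holes yields DEL, which is why a reader could simply skip such rows as deleted.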
Re: [OT] What is DEL for?
On Wed, Feb 21, 2001 at 09:42:53 -0800, Marco Cimarosti wrote:

> 1) What happens if emacs loads Doug Ewell's text file (i.e. a text file containing "ABCdelDEF") and then saves it? Would the file's content be changed to "ABDEF"?

No. I don't think any program interprets file contents in this way.

> 2) Could emacs be invoked with a text file as the keyboard input?

No. It needs a real tty.

SY, Uwe
Re: Sanskrit Transliteration Characters
On Tue, Feb 20, 2001 at 12:32:01 +0000, Otto Stolz wrote:

> > That's why I made and posted the CSX mapping. There is a LOT of old CSX-encoded material. With this mapping I can use existing software (like the mentioned perl module) to convert it to Unicode and use emacs to view/edit it.
>
> This implicitly answers the original question from Krishna Desikachary:
>
> > > Does a Unicode standard exist for these characters?

Exactly. All these chars can be encoded with Unicode, and one can use the CSX mapping table I've sent to the list with existing programs to convert texts in the legacy CSX encoding to Unicode. I should have been more explicit about this perhaps.

> It could be helpful to have your mapping (or a link to it) in the Unicode.org WWW pages, cf. http://www.unicode.org/unicode/faq/mappings/mappings.html.

Some time ago I submitted it to Mark Leisher for inclusion in his CSets collection http://crl.nmsu.edu/~mleisher/csets.html as CSX & co. are not "official" standards. I haven't heard back from him yet.

SY, Uwe
Re: Sanskrit Transliteration Characters
On Mon, Feb 19, 2001 at 19:47:43 -0800, Krishna Desikachary wrote:

> a) There is an internationally accepted set of extra chars that are included in Roman (Latin) script to transcribe Sanskrit texts in Roman script. Does a Unicode standard exist for these characters? Were these ever standardised even outside the realm of Unicode?

Yes, there are CS (classical sanskrit), CSX (CS eXtended) and now CSX+ 8-bit character sets for transliteration of Indic languages. CSX+ covers all the essential characters needed for ISO 15919, the draft standard for transliteration of Indian languages.

http://ourworld.compuserve.com/homepages/stone_catend/translit.htm

Find attached a mapping file for CSX I wrote to convert a Pali dictionary to Unicode (with perl's Unicode::Map module).

Contact Dr. John D. Smith for further info: http://bombay.oriental.cam.ac.uk/index.html

> b) Are there any public or commercial fonts available that follow these standards?

Dr. Smith's site carries some. Itrans by Avinash Chopde is also distributed with some CS fonts: http://www.aczone.com/

SY, Uwe

#
#Name: CSX to Unicode table
#Unicode version: 3.0
#Table version: 1.00
#Table format: Format A
#Date: 2000-12-24
#Authors: Valeriy Ushakov [EMAIL PROTECTED]
#General notes: CSX is based on CP437
#See http://bombay.oriental.cam.ac.uk/john/mahabharata/csx.html
#
#Format: Four tab-separated columns
#Column #1 is the CSX code (in hex as 0xXX)
#Column #2 is the Unicode (in hex as 0xXXXX or 0xXXXX+0xXXXX)
#Column #3 is the Unicode name (follows a comment sign, '#')
#
#The entries are in CSX order
#
#CSX	UCS	comb.	#unicode name
#
0x00	0x0000	#NULL
0x01	0x0001	#START OF HEADING
0x02	0x0002	#START OF TEXT
0x03	0x0003	#END OF TEXT
0x04	0x0004	#END OF TRANSMISSION
0x05	0x0005	#ENQUIRY
0x06	0x0006	#ACKNOWLEDGE
0x07	0x0007	#BELL
0x08	0x0008	#BACKSPACE
0x09	0x0009	#HORIZONTAL TABULATION
0x0a	0x000a	#LINE FEED
0x0b	0x000b	#VERTICAL TABULATION
0x0c	0x000c	#FORM FEED
0x0d	0x000d	#CARRIAGE RETURN
0x0e	0x000e	#SHIFT OUT
0x0f	0x000f	#SHIFT IN
0x10	0x0010	#DATA LINK ESCAPE
0x11	0x0011	#DEVICE CONTROL ONE
0x12	0x0012	#DEVICE CONTROL TWO
0x13	0x0013	#DEVICE CONTROL THREE
0x14	0x0014	#DEVICE CONTROL FOUR
0x15	0x0015	#NEGATIVE ACKNOWLEDGE
0x16	0x0016	#SYNCHRONOUS IDLE
0x17	0x0017	#END OF TRANSMISSION BLOCK
0x18	0x0018	#CANCEL
0x19	0x0019	#END OF MEDIUM
0x1a	0x001a	#SUBSTITUTE
0x1b	0x001b	#ESCAPE
0x1c	0x001c	#FILE SEPARATOR
0x1d	0x001d	#GROUP SEPARATOR
0x1e	0x001e	#RECORD SEPARATOR
0x1f	0x001f	#UNIT SEPARATOR
0x20	0x0020	#SPACE
0x21	0x0021	#EXCLAMATION MARK
0x22	0x0022	#QUOTATION MARK
0x23	0x0023	#NUMBER SIGN
0x24	0x0024	#DOLLAR SIGN
0x25	0x0025	#PERCENT SIGN
0x26	0x0026	#AMPERSAND
0x27	0x0027	#APOSTROPHE
0x28	0x0028	#LEFT PARENTHESIS
0x29	0x0029	#RIGHT PARENTHESIS
0x2a	0x002a	#ASTERISK
0x2b	0x002b	#PLUS SIGN
0x2c	0x002c	#COMMA
0x2d	0x002d	#HYPHEN-MINUS
0x2e	0x002e	#FULL STOP
0x2f	0x002f	#SOLIDUS
0x30	0x0030	#DIGIT ZERO
0x31	0x0031	#DIGIT ONE
0x32	0x0032	#DIGIT TWO
0x33	0x0033	#DIGIT THREE
0x34	0x0034	#DIGIT FOUR
0x35	0x0035	#DIGIT FIVE
0x36	0x0036	#DIGIT SIX
0x37	0x0037	#DIGIT SEVEN
0x38	0x0038	#DIGIT EIGHT
0x39	0x0039	#DIGIT NINE
0x3a	0x003a	#COLON
0x3b	0x003b	#SEMICOLON
0x3c	0x003c	#LESS-THAN SIGN
0x3d	0x003d	#EQUALS SIGN
0x3e	0x003e	#GREATER-THAN SIGN
0x3f	0x003f	#QUESTION MARK
0x40	0x0040	#COMMERCIAL AT
0x41	0x0041	#LATIN CAPITAL LETTER A
0x42	0x0042	#LATIN CAPITAL LETTER B
0x43	0x0043	#LATIN CAPITAL LETTER C
0x44	0x0044	#LATIN CAPITAL LETTER D
0x45	0x0045	#LATIN CAPITAL LETTER E
0x46	0x0046	#LATIN CAPITAL LETTER F
0x47	0x0047	#LATIN CAPITAL LETTER G
0x48	0x0048	#LATIN CAPITAL LETTER H
0x49	0x0049	#LATIN CAPITAL LETTER I
0x4a	0x004a	#LATIN CAPITAL LETTER J
0x4b	0x004b	#LATIN CAPITAL LETTER K
0x4c	0x004c	#LATIN CAPITAL LETTER L
0x4d	0x004d	#LATIN CAPITAL LETTER M
0x4e	0x004e	#LATIN
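For readers without perl's Unicode::Map at hand, a table in this layout is easy to apply directly. The following is a rough Python sketch, not the conversion actually used for the Pali dictionary: the function names are made up, fields are split on any whitespace rather than strictly on tabs, rows that are cut short are skipped, and the last row of the sample is hypothetical (it only shows how a 0xXXXX+0xXXXX entry would be read and is not taken from the real CSX table).

```python
def parse_mapping(text):
    """Read 'legacy-code  unicode  #name' rows into {byte value: string}."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # header comments and blank lines
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # truncated or malformed row
        code = int(fields[0], 16)
        # A value like 0xXXXX+0xXXXX stands for a sequence of code points.
        table[code] = "".join(chr(int(cp, 16)) for cp in fields[1].split("+"))
    return table


def decode(data, table):
    """Convert legacy bytes to a string; unmapped bytes become U+FFFD."""
    return "".join(table.get(byte, "\ufffd") for byte in data)


SAMPLE = (
    "#CSX\tUCS\tcomb.\t#unicode name\n"
    "0x41\t0x0041\t#LATIN CAPITAL LETTER A\n"
    "0x42\t0x0042\t#LATIN CAPITAL LETTER B\n"
    "0xff\t0x0061+0x0301\t#hypothetical row, NOT from the real CSX table\n"
)

if __name__ == "__main__":
    table = parse_mapping(SAMPLE)
    print(decode(b"AB", table))
```

Mapping unknown bytes to U+FFFD rather than raising an error is a design choice for this sketch; for a real conversion of legacy texts it may be safer to stop on the first unmapped byte.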
Re: Daniels and Bright Tibetan Query
On Wed, Jan 31, 2001 at 08:10:44 -0800, James E. Agenbroad wrote:

> In the chapter on Tibetan in Daniels and Bright's The world's writing systems (page 434) about prescript symbols: "There are six radicals that never occur with a prescript: wa, ra, la, ha, and 'a chung." Does anyone know what the sixth one is or should it be "five"? Thanks in advance.

a chen

SY, Uwe
Re: Java and Unicode
On Thu, Nov 16, 2000 at 05:58:27 -0800, Elliotte Rusty Harold wrote:

> public char charAt(int index)
>
> This method is used to walk strings, looking at each character in turn, a useful thing to do. Clearly it would be possible to replace it with a method with a String return type like this:
>
> public String characterAt(int index)

And what method will you use to obtain the (single) character in the returned string? :-)

SY, Uwe
Re: Cyrillic -
-----Original Message-----
From: Aleksandar Poposki [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 28, 2000 4:04 PM
To: [EMAIL PROTECTED]
Subject: Your opinion

> I'm the Webmaster of the Macedonian Orthodox Church website located at www.m-p-c.org. When I started this project I was not very familiar with Unicode and used 'home-made' fonts for Cyrillic characters, but learning about Unicode, I see it is the best way to go, as it is the International standard. Keeping this in mind, and other difficulties I've had, I wish to ask:

Do you plan to have Old Church Slavonic (OCS) in your pages? Unicode lacks support for "letter titlo" (i.e. titlo with a letter) used quite productively in OCS (in Russia at least), so you can't use Unicode to write "The Lord" (with "slovo-titlo") or "The Gospel" (with "glagol-titlo").

SY, Uwe
Re: Cyrillic -
On Fri, Sep 29, 2000 at 15:55:41 -0800, John Cowan wrote:

> > What is genuinely missing is IOTIFIED A. Because LITTLE YUS and IOTIFIED A fell together in Russian as /ja/, Peter eliminated the latter and adopted a modified form of LITTLE YUS, now CYRILLIC LETTER YA.
>
> > But aren't IOTIFIED A and YA just glyph variants (with LITTLE YUS lacking a parallel glyph in Peter's civil alphabet, merging with YA instead).
>
> Historically YA is a glyph variant of LITTLE YUS, not of IOTIFIED A, I am told. So given that we have already encoded YA and LITTLE YUS (unavoidable, really, considering how different they look), IOTIFIED A has no representation.

My, rather limited, understanding is that at that time the two letters, LITTLE YUS and IOTIFIED A, were no longer denoting distinct sounds and were used more or less interchangeably (i.e. they were more or less glyph variants by that time), and so Peter merged them into one letter YA, with a glyph for it being based on a glyph for LITTLE YUS.

In other words, iotified a (ya) survived in Peter's secular Russian alphabet as a character but lost its Slavonic glyph, while little yus disappeared as a character but its glyph survived in the new alphabet. Thus Peter's YA is *character* YA (== iotified a) with a glyph based on a glyph for little yus.

But the important point here is that the "old" alphabet and the "new" alphabet were "disjoint". With regard to Russian they are disjoint in time. With regard to Slavonic - the new alphabet was "secular Russian", while the old one was "Church Slavonic", and the two never really mixed. The "typeface" aspect is important too: writing one of the languages in the other's typeface is clearly perceived as either a visual pun or transliteration.

So, in theory, you'll never find *glyph* YA (reversed R) and *glyph* IOTIFIED A (i-a) in one homogeneous text, as this is made impossible by either synchronic or diachronic constraints.

So it seems that for Slavonic one should use LITTLE YUS to encode little yus and YA to encode iotified a (which my grammar book of Slavonic calls just "ya"). For Russian there's no LITTLE YUS, and character YA is used to encode ya.

Of course it's still possible to develop a typeface with all three glyphs (little yus, iotified a, ya) in it and use OpenType to choose the correct one. This is not dissimilar to, say, mixed Serbian and Russian cursive text with different glyphs for certain characters. (And the latter has already been discussed to death on this list.)

All this, of course, is Russian-centric. I don't know how things developed in other Slavic languages, especially in the South Slavic languages, which are closer to Church Slavonic (also southern by its origin) than East Slavic Russian is.

PS: Sorry if this sounds a little confusing - 6am is not the best time for writing, from memory, short essays on the history of the Cyrillic alphabet in Russia.

SY, Uwe
Digits (Was: What a difference a glyph makes...)
On Wed, Jul 26, 2000 at 12:02:15 -0800, [EMAIL PROTECTED] wrote:

> > This reminds me of "Are DIGIT SEVEN and DIGIT SEVEN WITH STROKE distinct characters?"
>
> Yeah, our decimal number system has at least thirteen digits:
>
> DIGIT ONE

Add another ONE here: digit one with bottom stroke:

    /|
    _|_

This bottom stroke in ONE was mandatory, just like slashed zero, for submitting punching jobs (you know, in those batch days when punched cards were still in active use and you had an option to submit a handwritten text of your program to be punched for you).

SY, Uwe
Re: .TTF to .GIFs-- back again...
On Sun, Jul 16, 2000 at 16:12:53 -0800, Robert Wheelock wrote:

> 1. Convert a TrueType (or EPS Type 1) font's characters into individual .GIF (or .BMP) images

GhostScript? It dropped GIF support because of licensing issues, but it supports plenty of other graphic formats. You just need to write a little script.

> 2. The reverse: use individual .GIF (or .BMP) images to build a useful TrueType (or EPS Type 1) font.

FontLab with ScanFont http://www.fontlab.com/. FontLab is generally regarded as being the best font editor. Drawing glyphs from scratch is a weakness of FontLab v3.x, so several font designers I know prefer to draw somewhere else (e.g. Fontographer) but do all other work in FontLab. But since you work with scans, this weakness should be irrelevant in your situation.

Or you can try your luck with GNU fontutils (limn + bzrto).

SY, Uwe
Re: Pronunciation of Unicode
On Fri, Jul 14, 2000 at 14:33:53 -0800, Tex Texin wrote:

> And do we know which locale we are debating the pronunciation of? Michael is in Ireland, ...

My manager, a native Irish speaker (she's an absolutely lovely person - the best boss I ever had), would pronounce it with a final /kozh/, I think ;-)

In Russian I would occasionally pronounce it without the initial /j/, as /oo ni 'kod/, because of the universal pronunciation of 'uni-' as /oo ni/ in Russian.

SY, Uwe
Re: Chinese characters in Java Applet
On Thu, Jun 22, 2000 at 02:20:39 -0800, Parvinder Singh(EHPT) wrote:

> I am trying to display Chinese characters stored in Unicode format in an Oracle database through a Java applet in the browser. The applet uses JDBC calls and the thin driver. Oracle resides on a Sun Solaris server. But the applet is not showing the characters correctly. My browser has Chinese fonts. Do I need to have something else on the client side? What additional things are needed to get the Chinese characters displayed in the applet?

Yes, you need to tell the client-side AWT which platform fonts to use. I posted sample font.properties entries for win32 just a few days ago; Solaris is not very different. If you missed that post of mine, just drop me a note and I'll forward it to you.

SY, Uwe