Re: Special Type Sorts Tray 2001
In a message dated 2001-09-30 9:19:31 Pacific Daylight Time, [EMAIL PROTECTED] writes:

> I have been thinking recently that it would be useful to have presentation
> forms for a ct ligature character and various long s ligatures so that one
> may transcribe printed works from the 18th century into Unicode while
> keeping the typographic style intact.

As mentioned, this can already be done with ZWJ, although fonts may not be able to render it correctly. (But this is always true for any newly added glyph, no matter how it is encoded.)

> In view of these various situations and possibly various others that people
> might like to post into this thread, I write to put forward the suggestion
> that as a discussion on this list various users of the Unicode
> specification might like to agree informally a collection of characters
> called Special Type Sorts Tray 2001 or STST2001 to be defined in the Private
> Use Area in, say, the range U+E700 through to U+E7FF in the hope that
> perhaps by there being some informal agreement perhaps someone with a font
> generating package might like to add them into a font and maybe various
> small yet significant benefits to the facilities available for encoding text
> might be achieved.

You might want to take a look at the ConScript Unicode Registry, which was originally intended for "constructed" and artificial scripts, but which could also be used for this purpose.

> Please know that I am specifically suggesting that this be a discussion
> amongst the user community: I am not suggesting that the Unicode Consortium
> endorse this suggestion as I am fully aware that the rules for the use of
> the Private Use Area specifically say that no assignment to a particular set
> of characters will ever be endorsed by the Unicode Consortium.

OK, then ConScript might be a suitable venue for this proposed encoding after all.

> I declare an interest in the choice of U+E700 to U+E7FF as the range for
> STST2001 in that I have been defining and publishing,

This range is already taken in ConScript, but several other ranges are available, and as David mentioned, you'll probably need a lot more than 256 code points.

ConScript is the work of Michael Everson and John Cowan. You should check with them.

http://www.evertype.com/standards/csur/index.html
http://www.evertype.com/standards/csur/conscript-table.html

-Doug Ewell
Fullerton, California
Re: Special Type Sorts Tray 2001 (derives from Egyptian Transliteration Characters)
On Sun, Sep 30, 2001 at 04:59:49PM +0100, William Overington wrote:

> In view of these various situations and possibly various others that people
> might like to post into this thread, I write to put forward the suggestion
> that as a discussion on this list various users of the Unicode
> specification might like to agree informally a collection of characters
> called Special Type Sorts Tray 2001 or STST2001 to be defined in the Private
> Use Area in, say, the range U+E700 through to U+E7FF in the hope that

All those characters can be encoded in Unicode already. Use a ZWJ for the ligated characters. And all those characters can be displayed on an OpenType system; the H with line below and the hyphen with diaeresis can be displayed on my xterm with overprinted combining characters. The rest of the world has a solution for this; a hacked solution may be usable on some systems that can't get it right, but there's no need to standardize it.

Did you notice that all the characters you mentioned are for Latin scripts? Some other scripts, in normal use, can take more than 256 glyphs to be rendered correctly; see the Arabic pre-shaped glyphs and the precomposed Hangul characters for examples.

I bet I can fill that range with Latin examples alone. Malay Grammar has a ligated ng. Lakota has at least a couple dozen non-precomposed letters. Lithuanian needs its couple dozen. Math books will arbitrarily compose any letter with any symbol; I can get a couple dozen examples from what I have on hand. The Fraktur ligatures probably add up to a couple dozen more. I don't think I'd have any problem coming up with 256 examples, all clearly documented as to source with scans, by the end of the day.

> Maybe someday some of the characters might be promoted to become regular
> Unicode characters by the Unicode Consortium, maybe not.

Not likely. Unicode refuses to encode more ligatures and precomposed characters.

--
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org
"I saw a daemon stare into my face, and an angel touch my breast; each one softly calls my name . . . the daemon scares me less." - "Disciple", Stuart Davis
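The point made above, that the requested sorts can be written as sequences of characters Unicode already has, can be sketched in Python. This is only an illustration: the labels are informal, and the choice of U+0331 COMBINING MACRON BELOW for the "line below" is an assumption, since the thread does not name the combining mark. Whether a font actually renders the ZWJ sequences as ligatures is a separate matter.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER, used here to request a ligature

# (informal label, sequence of already-encoded characters)
# Assumption: "line below" is written with U+0331 COMBINING MACRON BELOW.
sequences = [
    ("capital H with line below", "H\u0331"),
    ("small h with line below", "h\u0331"),
    ("hyphen with diaeresis", "-\u0308"),
    ("ct ligature request", "c" + ZWJ + "t"),
    ("long s + long s ligature request", "\u017f" + ZWJ + "\u017f"),
]


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for label, seq in sequences:
    print(label)
    for line in describe(seq):
        print("   ", line)
```

The text stays plain base letters plus joiners and combining marks, so searching and sorting still see "c" and "t"; only the rendering layer needs to know about the ligature.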
RE: Special Type Sorts Tray 2001 (derives from Egyptian Transliteration Characters)
William,

It looks as if, for real multilingual support, you need to run your text through a layout engine. If that is the case, then you can remap certain characters or character combinations into the U+FDD0 to U+FDEF Unicode range and use this special noncharacter area for whatever purpose the font and layout engine need.

If the private area becomes standardized, then it is no longer a private area.

Carl

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> Behalf Of William Overington
> Sent: Sunday, September 30, 2001 9:00 AM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED];
> [EMAIL PROTECTED]
> Subject: Special Type Sorts Tray 2001 (derives from Egyptian
> Transliteration Characters)
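Carl's suggestion of remapping character combinations into the U+FDD0 to U+FDEF noncharacter range for internal use can be sketched as follows. This is a hypothetical illustration, not anyone's actual layout engine: the table `INTERNAL_SLOTS` and the functions `to_internal` and `to_interchange` are invented names, and the two ZWJ sequences are just the ligatures mentioned in the thread.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER


def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and for the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


# Hypothetical internal table: interchange sequence -> noncharacter slot.
INTERNAL_SLOTS = {
    "c" + ZWJ + "t": "\ufdd0",
    "\u017f" + ZWJ + "\u017f": "\ufdd1",
}


def to_internal(text):
    """Replace each known sequence with its private internal slot."""
    for seq, slot in INTERNAL_SLOTS.items():
        text = text.replace(seq, slot)
    return text


def to_interchange(text):
    """Undo to_internal before the text leaves the process."""
    for seq, slot in INTERNAL_SLOTS.items():
        text = text.replace(slot, seq)
    return text


word = "a" + "c" + ZWJ + "t" + "ion"
internal = to_internal(word)
print([f"U+{ord(ch):04X}" for ch in internal])
print(to_interchange(internal) == word)
```

The slots only ever exist inside the process; text is converted back with `to_interchange` before it is stored or sent, which is what keeps the arrangement private rather than a shared encoding.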
Unicode Conf. game idea: Vowel Karuta
You play karuta like normal, but on the cards are IPA vowels. So the guy says /i::/ or whatever and you have to pick up that vowel.

じゅういっちゃん (Juuitchan)

Well, I guess what you say is true, I could never be the right kind of girl for you, I could never be your woman - White Town
Re: Egyptian Transliteration Characters
> >The missing characters can be characterised as follows:
> >LATIN CAPITAL LETTER H WITH LINE BELOW
> >LATIN SMALL LETTER H WITH LINE BELOW

When I saw this I remembered that there is a letter H with a line across it that is used in Maltese. I remembered this from seeing it in a catalogue of metal type which listed the accents needed for various European languages, not from a linguistic perspective, so I do not know if that letter would be appropriate for your needs.

My thoughts are that, as the use is for transliteration for study rather than for transcription as a direct record, it might perhaps be a suitable choice for your use, even if only on a temporary basis. The big advantage is that the letters are not only already coded in Unicode, as U+0126 for LATIN CAPITAL LETTER H WITH STROKE and U+0127 for LATIN SMALL LETTER H WITH STROKE (the 0126 and 0127 being hexadecimal representations), but also that both are often included in fonts that are available now.

If someone happens to be using an older version of Word in which the font being used lacks those characters, then later versions of several fonts, including Arial and Times New Roman, that do contain the characters are available free from the http://www.microsoft.com/typography/fontpack/default.htm webpage.

In the Microsoft Word program one simply uses Insert Symbol and then finds the desired character in the display provided. One can even set up shortcuts so that some combination such as Alt + Shift + H gives the one character and Alt + H gives the other character when entering text on an ordinary English keyboard.

I do have a further suggestion regarding the use of the Private Use Area, though as that has a wider context, I will start a new thread for that suggestion.

William Overington

30 September 2001
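As a small aside on the hexadecimal notation mentioned above, the following Python sketch turns the four hex digits of a U+ designation into the character itself and looks up its name. The helper name `describe_code_point` is made up for this illustration.

```python
import unicodedata


def describe_code_point(hex_digits):
    """Turn the hex digits of a U+ designation into (decimal, character, name)."""
    cp = int(hex_digits, 16)  # e.g. "0126" is hexadecimal for 294
    ch = chr(cp)
    return cp, ch, unicodedata.name(ch)


for digits in ("0126", "0127"):
    cp, ch, name = describe_code_point(digits)
    print(f"U+{digits} = decimal {cp}: {ch} {name}")
```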
Special Type Sorts Tray 2001 (derives from Egyptian Transliteration Characters)
In a recent thread entitled Egyptian Transliteration Characters, a request was made for various characters including the following.

LATIN CAPITAL LETTER H WITH LINE BELOW
LATIN SMALL LETTER H WITH LINE BELOW

There was also a suggestion from a participant in the thread for a character HYPHEN WITH DIAERESIS for use in preparing a vocabulary list in German.

I have been thinking recently that it would be useful to have presentation forms for a ct ligature character and various long s ligatures so that one may transcribe printed works from the 18th century into Unicode while keeping the typographic style intact.

There is already U+017F LATIN SMALL LETTER LONG S and U+FB05 LATIN SMALL LIGATURE LONG S T in regular Unicode.

I am thinking of such characters as LATIN SMALL LIGATURE LONG S LONG S and LATIN SMALL LIGATURE LONG S L and LATIN SMALL LIGATURE LONG S B and so on. There are perhaps about a dozen long s ligatures that could usefully be encoded.

In view of these various situations, and possibly various others that people might like to post into this thread, I write to put forward a suggestion. As a discussion on this list, various users of the Unicode specification might like to agree informally on a collection of characters, called Special Type Sorts Tray 2001 or STST2001, to be defined in the Private Use Area in, say, the range U+E700 through to U+E7FF. The hope is that, there being some informal agreement, someone with a font generating package might like to add them into a font, and that various small yet significant benefits to the facilities available for encoding text might be achieved.

Maybe someday some of the characters might be promoted to become regular Unicode characters by the Unicode Consortium, maybe not. I feel that it is better to have available soon, rather than not to have available, some informal list with some level of agreement amongst users, even if only tacit agreement, so that it is possible to use Unicode to encode the various characters for the various purposes.

Please know that I am specifically suggesting that this be a discussion amongst the user community: I am not suggesting that the Unicode Consortium endorse this suggestion, as I am fully aware that the rules for the use of the Private Use Area specifically say that no assignment to a particular set of characters will ever be endorsed by the Unicode Consortium. So, whilst recognizing that that statement in the specification may not preclude the Unicode Consortium from saying that some particular usage of the Private Use Area is wrong in some way, the absence of any encouragement from the Unicode Consortium over the definition of Special Type Sorts Tray 2001 should not be seen as in any way an objection to it being defined.

I declare an interest in the choice of U+E700 to U+E7FF as the range for STST2001, in that I have been defining and publishing, as part of my research, designations for a number of characters in the Private Use Area for a specific application area, namely for use in Java programming for the DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) system. This particular range does not conflict with the codes that I am using in that project, so the choice of U+E700 to U+E7FF as the range would be particularly convenient to me. If anyone is interested to see those definitions then they are in the DVB-MHP section of http://www.users.globalnet.co.uk/~ngo which is our family webspace in England. There are references in various of the documents, namely the Contemporary introduction, the document about Sequential text files and their applications, and the second and third documents about the Astrolabe Channel numerical pointer.

It is hard to even guess how many characters there are that people might like to suggest for STST2001. Maybe there will be only a few and sorts can be added gradually over a number of years, or maybe the tray will be filled up quickly and starting another tray will need to be considered. Hopefully STST2001 will be a useful facility, and then when someone chooses to put forward a suggestion for a character to be available, sometimes adding it to STST2001 will be a suitable solution.

A suitable procedure would perhaps be that someone suggesting a character should allow eight days for discussion; then, if the suggestion does not conflict with an existing definition and no good reason has been put forward as to why the suggestion should not be included, the suggestion becomes included in STST2001. A good reason might be that, unknown to the person making the suggestion, the character sort is already defined in regular Unicode.

I feel that a special type sorts tray within the Private Use Area, agreed informally by people within the user community, would be a very useful facility.

William Overington

30 September 2001
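The bookkeeping behind the proposal (a 256-slot tray at U+E700 to U+E7FF, with a suggestion accepted after eight days if nobody has raised a good objection) can be sketched in Python. Everything here is hypothetical: the `Tray` class and its method names are invented for illustration, and the slot given to the example ligature is arbitrary, since the proposal assigns no code points.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

TRAY_START, TRAY_END = 0xE700, 0xE7FF    # range suggested in the proposal
DISCUSSION_PERIOD = timedelta(days=8)    # "allow eight days for discussion"


@dataclass
class Tray:
    assigned: dict = field(default_factory=dict)  # code point -> name
    pending: dict = field(default_factory=dict)   # code point -> (name, date)

    def propose(self, code_point, name, proposed_on):
        """Record a suggestion; reject anything outside the tray or already taken."""
        if not TRAY_START <= code_point <= TRAY_END:
            raise ValueError(f"U+{code_point:04X} is outside the tray")
        if code_point in self.assigned or code_point in self.pending:
            raise ValueError(f"U+{code_point:04X} conflicts with an existing entry")
        self.pending[code_point] = (name, proposed_on)

    def withdraw(self, code_point):
        """Drop a pending suggestion after a good objection was raised."""
        self.pending.pop(code_point, None)

    def settle(self, today):
        """Accept every pending suggestion whose discussion period has elapsed."""
        accepted = []
        for cp, (name, proposed_on) in list(self.pending.items()):
            if today - proposed_on >= DISCUSSION_PERIOD:
                self.assigned[cp] = name
                del self.pending[cp]
                accepted.append(cp)
        return accepted


tray = Tray()
# Arbitrary example slot; the proposal itself assigns no code points.
tray.propose(0xE700, "LATIN SMALL LIGATURE LONG S LONG S", date(2001, 9, 30))
print(tray.settle(date(2001, 10, 7)))   # seven days: still under discussion
print(tray.settle(date(2001, 10, 8)))   # eight days: accepted
```

The conflict check only covers entries inside this one tray; it says nothing about other private agreements that use the same range.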
Re: Missing Arabic and Syriac characters in Unicode
>From: Philipp Reichmuth <[EMAIL PROTECTED]>
>Reply-To: Philipp Reichmuth <[EMAIL PROTECTED]>
>To: Roozbeh Pournader <[EMAIL PROTECTED]>
>CC: Miikka-Markus Alhonen <[EMAIL PROTECTED]>, Unicode List <[EMAIL PROTECTED]>
>Subject: Re: Missing Arabic and Syriac characters in Unicode
>Date: Sun, 30 Sep 2001 12:54:17 +0200
>
>Hi folks!
>
>RP> At least not in the Korans I've seen. In those, Turned Damma is clearly
>RP> used to mark an /u:/ sound when a Waw is not there (and only that). It is
>RP> not an ornament in any way. I'm talking about Iranian Korans.
>
>It's clearly a character then. It definitely makes sense in an Iranian
>context from the viewpoint of the Persian use of Arabic script.
>
>BTW does it represent /u:/ or /u/? In the Qur'an, /u:/ would probably
>represented by Waw and would be read by a Persian as /u/, wouldn't it?
>While damma would be read as /o/. Could you point me to a location in
>an Iranian Qur'an where there is one of these?
>
>Greetings
> Philipp mailto:[EMAIL PROTECTED]

Inverted Damma, vertical Kasrah, etc. are accentuated forms of Damma, Kasrah, etc. They are variants (alternate ways of writing) used extensively in Qurans published in India and Pakistan. In the Qurans published in the Middle East these are usually represented by Damma followed by a small waw and Kasrah followed by a small yay. If anyone wishes, I could try scanning and sending examples from published copies of the Quran.

Since these are variants, one could define them as ligatures in fonts (and I have done so), so that when one wishes these to appear in a work published for people of a certain area, all one has to do is change the font. Since these are two different ways of representing the same vowel sounds, different Unicode positions may not be advisable. In any case these are not ornamental or decorative marks; rather, they are a different way of representing existing Unicode characters.

Regards
Abdul-Majid Bhurgri
Re: Turned Damma [was Re: Missing Arabic and Syriac characters in Unicode]
On Sun, 30 Sep 2001, Philipp Reichmuth wrote:

> BTW does it represent /u:/ or /u/? In the Qur'an, /u:/ would probably
> represented by Waw and would be read by a Persian as /u/, wouldn't it?
> While damma would be read as /o/. Could you point me to a location in
> an Iranian Qur'an where there is one of these?

Well, beginners read the vowels as they read them in Persian, but you are taught the real pronunciation in Iranian high schools. So in short, Turned Damma is pronounced like a Waw: if you are a beginner, you pronounce both as /u/; if you are an expert, you pronounce both as /u:/.

I don't have an old Koran handy (since the mark is only used in Korans published before 1980). I will look and tell you when I get home.

roozbeh
Re: Missing Arabic and Syriac characters in Unicode
Hi folks!

RP> At least not in the Korans I've seen. In those, Turned Damma is clearly
RP> used to mark an /u:/ sound when a Waw is not there (and only that). It is
RP> not an ornament in any way. I'm talking about Iranian Korans.

It's clearly a character then. It definitely makes sense in an Iranian context from the viewpoint of the Persian use of Arabic script.

BTW does it represent /u:/ or /u/? In the Qur'an, /u:/ would probably be represented by Waw and would be read by a Persian as /u/, wouldn't it? While damma would be read as /o/. Could you point me to a location in an Iranian Qur'an where there is one of these?

Greetings
Philipp
mailto:[EMAIL PROTECTED]
__
Errors have occurred / We won't tell you where or why / Lazy programmers
Re: Missing Arabic and Syriac characters in Unicode
On Sun, 30 Sep 2001, Philipp Reichmuth wrote:

> >> This includes 'Subscript Alef' and 'Turned Damma' (Ulta Pesh), used in
> >> Iran and Pakistan;
>
> MMA> I think these are also used in Arab countries, because even my Arabic teacher
> MMA> who's from Syria referred to this "ulta pesh" as a "Koranic sign".
>
> Hm, as far as I understand it, it is mainly used as a calligraphic
> sign in Arab-speaking countries and carries no phonetic or
> recitational information of its own. I've checked through my own
> copies of the Qur'an briefly, but as far as I can see, it's used only
> in calligraphic script as an ornamental sign. Since Qur'anic verses
> tend to be rather ornately decorated, the association with the Qur'an
> appears quite straightforward. On the other hand, all of my copies are
> printed (says something already) either in Egypt or Sudan, so they
> need not be representative.

At least not in the Korans I've seen. In those, Turned Damma is clearly used to mark an /u:/ sound when a Waw is not there (and only that). It is not an ornament in any way. I'm talking about Iranian Korans.

roozbeh