Character properties
Is there a place on unicode.org which describes the concept of properties in greater detail? -Original Message- From: Kenneth Whistler [mailto:[EMAIL PROTECTED]] Other properties accrue more directly to characters, per se. They attach to the abstract character, and get associated with a code point more indirectly by virtue of the encoding of that character. The numeric value of a character would be a good example of this. No one expects an unassigned code point or an assigned dingbat character or a left bracket to have a numeric value property (except perhaps a future generation of Unicabbalists). There are no corresponding features in other character sets usually. Correct. Before the development of the Unicode Standard, character encoding committees tended to leave property assignments either up to implementations (considering them obvious) or up to standardization committees whose charter was character processing -- e.g. SC22/WG15 POSIX in the ISO context. The development of a universal character encoding necessitated changing that, bringing character property development and standardization under the same roof as character encoding.
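As a rough illustration of the numeric value property discussed above, here is a minimal Python sketch. It assumes the standard library's unicodedata module, whose data comes from the Unicode Character Database shipped with the interpreter; the helper name numeric_value and the sample characters are chosen only for this example.

```python
import unicodedata


def numeric_value(ch):
    """Return the Unicode numeric value of ch, or None if it has none."""
    return unicodedata.numeric(ch, None)


# A digit, a vulgar fraction, a Thai digit, a left bracket, and a dingbat.
samples = ["7", "\u00bd", "\u0e53", "[", "\u2708"]

for ch in samples:
    name = unicodedata.name(ch, "<no name>")
    print(f"U+{ord(ch):04X} {name}: {numeric_value(ch)}")
```

The digit, the fraction, and the Thai digit report a value, while the bracket and the airplane dingbat report None, which matches the expectation in the message that the property attaches to the abstract character and not to every code point.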
Re: Common input methods for IPA
Marc Wilhelm Küster kuester at saphor dot net wrote: Lukas' German-based phonetic keyboard is something I'll definitely take a deeper look into -- what I saw so far on the quoted URL is promising. It comes closest to a turn-key solution for Germany for the time being. At the same time I'll also have a look at the UniPad keyboard. Remember that SC UniPad is a standalone Unicode text editor. Keyboards designed for UniPad can't really be used with anything else. Also, I haven't made my IPA keyboard for UniPad available yet, for the reasons I mentioned: it's based on the images at gy.com (which are somewhat difficult to interpret and which can hardly be said to depict a standard IPA keyboard), plus there are several characters missing that I haven't been able to find in Unicode. If you still want to have a look at it, though, I can e-mail it to you (and for that matter, to anyone who might be able to help me find those missing characters!). I have released six other custom keyboards for UniPad; anyone who is interested should check http://www.unipad.org/keyboard/ for more information. -Doug Ewell Fullerton, California
Re: Proposal: Ligatures w/ ZWJ in OpenType
Concerning the use of ZWJ to request ligation in the Latin script (and, less contentiously, the use of ZWNJ to prevent it), many -- including some experts and UTC members -- have stated that ZWJ should only be used in exceptional circumstances, or when the requested ligature is necessary grammatically or orthographically instead of stylistically (however that is determined). I'm starting to see why I disagree so strongly with this position. It's not that I'm eager to pepper my text with ZWJs or to require other writers to do the same, or even that I think modern English text in most circumstances really requires much more than the basic f-ligatures. No, what bothers me is that the ZWJ/ZWNJ ligation scheme is starting to look just like the DOA (deprecated on arrival) Plane 14 language tags. In each case, Unicode has created a mechanism to solve a genuine (if limited) need, but then told us -- officially or unofficially -- that we should not use it, or that it is reserved for use with special protocols which are never defined or mentioned again. I think I've lost the battle regarding Plane 14 tags -- though I can't promise I'll never use them in plain text without those mysterious special protocols -- but the fight for ZWJ ligation continues. The UTC may have intended that ZWJ ligation be used only in rare and exceptional circumstances, but UAX #27, revised section 13.2 doesn't say that. It says that ZWJ and ZWNJ *may be used* to request ligation or non-ligation, and that font vendors should add ZWJ to their ligature mapping tables as appropriate. It does acknowledge that some fonts won't (or shouldn't) include glyphs for every possible ligature, and never claims that they must (or should). It specifically does *not* say that ZWJ ligation is to be restricted to certain orthographies, or to cases where ligation changes the meaning of the text. 
As Michael and Asmus have pointed out, without ZWJ ligation we will continue to see numerous, very serious proposals to add more ligated presentation forms to Unicode. Is that what we want? Not everyone will buy into the notion that AAT and OpenType will automagically handle all ligation scenarios. ZWJ/ZWNJ for ligation control is part of Unicode. It is not always the best solution, but it is *a* solution, and should be available to the user without restriction or discouragement. -Doug Ewell Fullerton, California
Is UniCode's Thai character representation is acceptable by TISI or not?
Hi Samphan, Thank U for Your kind response. Please let me know whether Unicode's Thai character representation is acceptable by TISI or not? It is very essential to our project. Thanks in Advance. Regards, Sreedhar M. - Original Message - From: Samphan Raruenrom [EMAIL PROTECTED] To: Sreedhar.M [EMAIL PROTECTED] Sent: Monday, July 15, 2002 4:46 PM Subject: Re: tis-620 tis-620 is in the process of registering as iso-8859-11 so you can use the proposal for info about tis-620, the chart is at the end. http://www.nectec.or.th/it-standards/iso8859-11/index.html Sreedhar.M wrote: Hi Samphan, Thank You for Your kind information. But the URL You specified is displaying links which are all in the Thai language. I need the information in English. Please let me know whether there is any information available in English regarding this. Thanks in Advance. -- Samphan Raruenrom Information Research and Development Division, National Electronics and Computer Technology Center, Thailand. http://www.nectec.or.th/home/index.html
Re: Proposal: Ligatures w/ ZWJ in OpenType
On Monday, July 15, 2002, at 09:58 AM, Doug Ewell wrote: No, what bothers me is that the ZWJ/ZWNJ ligation scheme is starting to look just like the DOA (deprecated on arrival) Plane 14 language tags. In each case, Unicode has created a mechanism to solve a genuine (if limited) need, but then told us -- officially or unofficially -- that we should not use it, or that it is reserved for use with special protocols which are never defined or mentioned again. I'm not sure I agree with you here. The position of the UTC is not that ZWJ should never be used and we're sorry we added it, which is the case of the Plane 14 language tags. It's that the ZWJ should not be the primary mechanism for providing ligature support in many cases. That's as far as it goes. The UTC may have intended that ZWJ ligation be used only in rare and exceptional circumstances, but UAX #27, revised section 13.2 doesn't say that. The latest word is the Unicode 3.2 document, not the Unicode 3.1 document. It says: Ligatures and Latin Typography (addition) It is the task of the rendering system to select a ligature (where ligatures are possible) as part of the task of creating the most pleasing line layout. Fonts that provide more ligatures give the rendering system more options. However, defining the locations where ligatures are possible cannot be done by the rendering system, because there are many languages in which this depends not on simple letter pair context but on the meaning of the word in question. ZWJ and ZWNJ are to be used for the latter task, marking the non-regular cases where ligatures are required or prohibited. This is different from selecting a degree of ligation for stylistic reasons. Such selection is best done with style markup. See Unicode Technical Report #20, Unicode in XML and other Markup Languages for more information. 
It says that ZWJ and ZWNJ *may be used* to request ligation or non-ligation, and that font vendors should add ZWJ to their ligature mapping tables as appropriate. It does acknowledge that some fonts won't (or shouldn't) include glyphs for every possible ligature, and never claims that they must (or should). It specifically does *not* say that ZWJ ligation is to be restricted to certain orthographies, or to cases where ligation changes the meaning of the text. This is correct. Nor is this changed in Unicode 3.2. The goal is to make the ZWJ mechanism available to people who feel it is appropriate to meet their needs, but to try to inform them that in the majority of cases, a higher-level protocol would be better. Adobe doesn't have to revise InDesign, for example, to insert ZWJ all over when a user selects text and turns optional ligatures on. OTOH, the hope is that if ligatures are available InDesign will honor the ZWJ-marked ones, even if ligation has been turned off. John Hudson has recommended what seems a reasonable way to handle this in OT. Apple will be releasing new versions of its font tools in the near future, and the documentation will include a recommendation for how this can be done with AAT. We've been revising our own fonts as the opportunity presents itself to support ZWJ as well. (The system and ATSUI-savvy applications require no revision.) The push-back coming from the font community on the issue has to do mostly with the communications problem that they weren't aware of it in as timely a fashion as would have been best, and the concern that font developers and application/OS developers will be forced to add ligature support where they have felt it inappropriate in the past. ZWJ/ZWNJ for ligation control is part of Unicode. It is not always the best solution, but it is *a* solution, and should be available to the user without restriction or discouragement. It's discouraged when it's inappropriate. It isn't deprecated. 
There are numerous places where Unicode provides multiple ways of representing something. In this instance, Unicode is trying to delineate where a particular mechanism is appropriate and where inappropriate. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
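To make the mechanism under discussion concrete, here is a minimal Python sketch of what ZWJ and ZWNJ look like in plain text. The code points U+200D and U+200C are from the standard; the helper strip_join_controls and the two sample words are assumptions made for illustration, and whether a ligature actually appears still depends on the font and the layout engine.

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER: asks for a ligature at this position
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER: asks that no ligature be formed here


def strip_join_controls(text):
    """Remove ZWJ/ZWNJ, e.g. before a naive search or comparison."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")


# German "Auflage" is Auf + lage; an fl ligature across that boundary is
# conventionally avoided, and the letters alone do not reveal the boundary.
prohibited = "Auf" + ZWNJ + "lage"

# Hypothetical case: an author explicitly asks for a ct ligature.
requested = "ac" + ZWJ + "t"

for label, word in (("prohibited", prohibited), ("requested", requested)):
    print(label, [f"U+{ord(ch):04X}" for ch in word])

# Both controls are invisible format characters (general category Cf).
print(unicodedata.category(ZWJ), unicodedata.category(ZWNJ))
```

Stripping the controls is only reasonable for text like the Latin examples here; in scripts where ZWJ and ZWNJ affect cursive joining or conjunct formation, removing them can change what is displayed.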
General question
Good morning Unicadets -- This one came in to the Unicode office. If anyone has any hints, please reply to the sender directly. Thanks, Rick Date/Time:Mon Jul 15 05:48:43 EDT 2002 Contact: [EMAIL PROTECTED] Report Type: General question where can I get the free tool to translate my c++ source code to Unicode compliant for internationalization?
Re: Common input methods for IPA
Doug, Let's take this IPA keylayout discussion to the [EMAIL PROTECTED] list. I'll be posting a comparison chart of four different layouts there shortly. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Common input methods for IPA
At 08:24 -0700 2002-07-15, Doug Ewell wrote: Also, I haven't made my IPA keyboard for UniPad available yet, for the reasons I mentioned: it's based on the images at gy.com (which are somewhat difficult to interpret and which can hardly be said to depict a standard IPA keyboard), plus there are several characters missing that I haven't been able to find in Unicode. I did also find some characters in SIL's key layouts which are not encoded, though a couple of the missing ones are handled now by the new UPA additions. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Proposal: Ligatures w/ ZWJ in OpenType
At 08:58 AM 15-07-02, Doug Ewell wrote: No, what bothers me is that the ZWJ/ZWNJ ligation scheme is starting to look just like the DOA (deprecated on arrival) Plane 14 language tags. In each case, Unicode has created a mechanism to solve a genuine (if limited) need, but then told us -- officially or unofficially -- that we should not use it, or that it is reserved for use with special protocols which are never defined or mentioned again. ... ZWJ/ZWNJ for ligation control is part of Unicode. It is not always the best solution, but it is *a* solution, and should be available to the user without restriction or discouragement. I don't think I am trying to discourage people from using ZWJ/ZWNJ for ligation control, or to impose restrictions upon it, but I do have concerns about the practicalities of implementing such control in a way that provides users of ZWJ with the results they desire while not breaking existing ligature implementations. I really am trying to figure out a clear and consistent way to make ZWJ work. Of course, I can only try to propose part of the solution, because ZWJ has an impact not only on how fonts are made but on how layout engines handle the relationship of control characters and glyphs. The implementation note in TR27 stating that font developers should add glyph substitution lookups for ZWJ sequences to their fonts seems to me to display an incomplete understanding of the technology involved. The comments on 'Ligatures and Latin Typography' -- naive comments, I think: the layout issues involved are in no way limited to Latin typography -- in TR28, instead of clarifying the situation, retreat to a vaguer position. Perhaps the idea is that, by keeping things vague, the UTC permits freedom of implementation, but so far all I am seeing in response is confusion: confusion about what ZWJ signifies in text, and how it should be implemented in line layout. 
If Doug is worried that ZWJ will be 'deprecated on arrival', he might also worry that ZWJ will be so variously interpreted as to become useless as a reliable means of achieving any consistent result. I have other, more general concerns, about the poor communication between the UTC and the people who make fonts. This is not the UTC's fault. Unlike other technologies that are related to and influenced by Unicode, e.g. web standards and technology, there is no parallel organisation governing the development of font software, no 'Font Technology Consortium'. This means that communication between UTC and font developers has been, at best, ad hoc. I am trying to do something to rectify this situation, since I believe it will benefit everyone if UTC can rely on more regular, consistent and informed involvement from the type industry, and font developers can receive and digest information from the UTC that has an impact on font technology in a timely fashion. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] Language must belong to the Other -- to my linguistic community as a whole -- before it can belong to me, so that the self comes to its unique articulation in a medium which is always at some level indifferent to it. - Terry Eagleton
Re: TrueType signature bits
Thanks for the confirmation. Do you know what role the unicode bits play in the use of the font - in MS Word, for example ? As far as I can see, even if the bits are set carelessly, or not set at all, the font seems to work in Word. Raymond
C++ unicode
I am sure there is no tool, free or otherwise, for making C++ code Unicode compliant. If you are talking of Windows code, you could make a start by just putting #define UNICODE before the Windows headers. Then, you will have a lot of editing to do. I have just finished that sort of project, so as to be able to handle arbitrary (non-ANSI) filenames. As in my Fontlist v.5, http://ourworld.compuserve.com/homepages/RaymondM Raymond Mercier
Re: TrueType signature bits
At 10:11 PM 7/15/02 +0100, Raymond Mercier wrote: Do you know what role the unicode bits play in the use of the font Among other things, the new X font architecture will try to use them to pick a font in a culturally acceptable typeface.
Re: TrueType signature bits
At 02:11 PM 15-07-02, Raymond Mercier wrote: Do you know what role the unicode bits play in the use of the font - in MS Word, for example ? As far as I can see, even if the bits are set carelessly, or not set at all, the font seems to work in Word. At the moment, they are mainly of use when the font lacks 8-bit codepage support -- e.g. it is an Indic font, for which there are no registered codepages on the system -- and an application needs to determine whether this font is suitable to display a particular text. There are likely to be other fallback mechanisms behind this one, e.g. checking for the presence of particular characters in the font cmap table. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] Language must belong to the Other -- to my linguistic community as a whole -- before it can belong to me, so that the self comes to its unique articulation in a medium which is always at some level indifferent to it. - Terry Eagleton
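The lookup described above can be sketched roughly as follows in Python. This is only an illustration of the idea (check the Unicode range bit, then fall back to the cmap), not a description of what Word or any real application does. The function names are hypothetical, the four range words are assumed to have been read already from the font's OS/2 table, and the bit numbers used (15 for Devanagari, 24 for Thai) follow the OpenType OS/2 table specification.

```python
def range_bit_set(unicode_ranges, bit):
    """Test one Unicode range bit.

    unicode_ranges holds the four 32-bit ulUnicodeRange1..4 values in order,
    so bit 0 is the lowest bit of the first word and bit 32 the lowest bit
    of the second.
    """
    word, offset = divmod(bit, 32)
    return bool((unicode_ranges[word] >> offset) & 1)


def font_can_display(text, unicode_ranges, script_bit, cmap_codepoints):
    """Trust the range bit if it is set; otherwise fall back to the cmap."""
    if range_bit_set(unicode_ranges, script_bit):
        return True
    # The bits may be unset or set carelessly, so check actual coverage.
    return all(ord(ch) in cmap_codepoints for ch in text)


DEVANAGARI_BIT = 15
THAI_BIT = 24

# Hypothetical font that sets only the Devanagari bit.
ranges = (1 << DEVANAGARI_BIT, 0, 0, 0)
print(range_bit_set(ranges, DEVANAGARI_BIT))
print(range_bit_set(ranges, THAI_BIT))
```

A font with no bits set but with the needed characters in its cmap still passes the fallback check, which is consistent with the observation that such fonts seem to work anyway.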
Re: Is UniCode's Thai character representation is acceptable by TISI or not?
Sreedhar M wrote: Thank U for Your kind response. Please let me know whether Unicode's Thai character representation is acceptable by TISI or not? It is very essential to our project. Yes. TISI took part in defining the representation of Thai characters in ISO 10646 (and so, indirectly, in Unicode). Unicode has a backward-compatibility goal, so it carries the whole Thai block of TIS-620 over directly: unicode = tis620 - 0xa0 + 0x0e00. That is ideal and eases the transition of code: we can modify our code just a little to make it work with both TIS-620 and Unicode (see libinthai, a Thai word-break library, as an example). However, there are still some problems that go beyond the assignment of code points, namely character properties. There are some mistakes in the Unicode character properties for Thai characters, and you have to code around them.
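The offset formula above can be sketched in Python as follows. This is a minimal illustration, not the libinthai code: the function name is made up for this example, bytes below 0x80 are assumed to be plain ASCII, and only the byte ranges that TIS-620 actually assigns (0xA1 to 0xDA and 0xDF to 0xFB) are accepted.

```python
def tis620_to_unicode(data):
    """Convert TIS-620 bytes to a str using unicode = tis620 - 0xa0 + 0x0e00."""
    out = []
    for b in data:
        if b < 0x80:
            # Assumed to be ASCII, which maps to the same code points.
            out.append(chr(b))
        elif 0xA1 <= b <= 0xDA or 0xDF <= b <= 0xFB:
            # Thai block: a fixed offset into U+0E00..U+0E7F.
            out.append(chr(b - 0xA0 + 0x0E00))
        else:
            raise ValueError(f"byte 0x{b:02X} is not assigned in TIS-620")
    return "".join(out)


greeting = tis620_to_unicode(b"\xca\xc7\xd1\xca\xb4\xd5")
print(greeting)
print([f"U+{ord(ch):04X}" for ch in greeting])
```

Python also ships a tis_620 codec, so in practice data.decode("tis_620") does the same job; the explicit loop is here only to show that the mapping is a single offset.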