RE: the Unicode range and code page range bits in the TrueType OS/2 table
Microsoft applications use both of these to try to determine if a font is likely to support a certain range. Some fonts do not properly set those values, but most do, especially common ones. Chris Pratley, Group Program Manager, Microsoft Office. Sent with Office XP on Windows XP.

-Original Message- From: Yung-Fong Tang [mailto:[EMAIL PROTECTED]] Sent: February 7, 2002 6:45 PM To: Brian Stell; Deborah Goldsmith; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: the Unicode range and code page range bits in the TrueType OS/2 table

Dear i18n folks at Unicode.org: Do we know of ANY application on ANY platform that uses the Unicode range or code page range fields in the TrueType OS/2 table to support different languages? Do Microsoft applications depend on them? Deborah: how about Mac OS and Mac OS apps? Does any Linux application use them? Ken: do you know of any Adobe software that depends on them? I heard a rumor that those bits are usually unset and left as 0, but I found that some fonts do have them set when I use ttfdump to look at them. Thanks
RE: the Unicode range and code page range bits in the TrueType OS/2 table
At 00:19 2/8/2002, Chris Pratley wrote: Microsoft applications use both of these to try to determine if a font is likely to support a certain range. Some fonts do not properly set those values but most do, especially common ones.

Chris, how do you define a 'properly set' Unicode range in the OS/2 table? Correct codepage support is self-evident: a font should indicate codepage support only if its cmap table includes *all* the characters in that codepage. Our current production tool (FontLab 4.0) indicates support for a Unicode range if *any* of the characters in that range are supported. This seems to me, on analysis, to be the best approach, since few fonts will support all the characters in a Unicode range, the definition of a Unicode range may change over time as new characters are added, and arbitrarily insisting on a certain percentage of the characters in a Unicode range is, well, arbitrary. I seem to recall that this approach is approved by your colleagues in the MS type group, but I would be interested to know if your opinion, as an MS app developer, differs.

John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] ... es ist ein unwiederbringliches Bild der Vergangenheit, das mit jeder Gegenwart zu verschwinden droht, die sich nicht in ihm gemeint erkannte. ... every image of the past that is not recognized by the present as one of its own concerns threatens to disappear irretrievably. Walter Benjamin
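The fields under discussion are the four 32-bit ulUnicodeRange values in the OpenType OS/2 table, which together form a 128-bit mask of claimed Unicode ranges. A minimal sketch (Python, purely illustrative) of how an application might test such a bit; the bit assignments follow the OpenType OS/2 specification, and the sample values below are hypothetical, not taken from any real font:

```python
# Sketch: interpreting the ulUnicodeRange bit fields from a font's OS/2 table.
# The four 32-bit fields form one 128-bit mask; bit numbers follow the
# OpenType OS/2 specification (e.g. bit 0 = Basic Latin, bit 7 = Greek,
# bit 9 = Cyrillic). Sample values are made up for illustration.

def range_bit_set(bit, ul1, ul2=0, ul3=0, ul4=0):
    """Return True if the given ulUnicodeRange bit (0-127) is set."""
    mask = ul1 | (ul2 << 32) | (ul3 << 64) | (ul4 << 96)
    return bool((mask >> bit) & 1)

# Hypothetical font claiming Basic Latin (bit 0) and Cyrillic (bit 9):
ul1 = (1 << 0) | (1 << 9)
print(range_bit_set(0, ul1))   # Basic Latin claimed
print(range_bit_set(7, ul1))   # Greek not claimed
```

An application like Word would consult bits such as these (together with the cmap) when deciding whether a font is a plausible candidate for a run of text.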
Re: Unicode and Security
Elliotte Rusty Harold wrote: The problem is that all of these or any other client-based solution you come up with is only going to be implemented in some clients. Many, and at least initially most, users are not going to have any such protections. This needs to be cut off at the protocol level.

Rather, the problem is that replacing just one of the many existing character encodings with an allegedly secure one would serve only some (rather few!) users. Finding a solution that works with all character encodings alike is much more efficient (and is probably feasible, in contrast to the solution advocated by ERH). One possible solution to the e-mail spoofing problem is cryptographic authentication. This is independent of the underlying character encoding, and it is already widely available. I said 'allegedly secure' because no character encoding standard can really prevent this sort of spoofing (we have had enough examples in this thread, based on bare ASCII). Trying to find a spoofing-proof character encoding is comparable to the task of finding an alphabet that does not allow the spelling of any insults. Best wishes, Otto Stolz
Re: Unicode and Security
At 17:42 -0500 2002-02-07, John Cowan wrote: The only widely-deployed alternative approach I know of is ETSI GSM 03.38 (used in mobile telephony). A truly bizarre character set: it supports English, French, mainland Scandinavian languages, Italian, Spanish with graves, and GREEK SHOUTING. On my Nokia I am forced to write SMS messages in Irish with graves for ÀàÌìÒòÙù, but I am awarded Éé. The Nokia does have ÁáÍíÓóÚú available for spelling names in the phone book, but the accents are stripped off if they are sent in a text message. :-( -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Unicode and Security
At 15:53 -0500 2002-02-07, Elliotte Rusty Harold wrote: For text files, probably not. But for the domain name system the world very well might. Indeed, maybe it should unless this problem can be dealt with. I suspect it can be dealt with by prohibiting script mixing in domain names (e.g. each component of the name must be entirely Greek or entirely Cyrillic or entirely Latin etc. Note: something_Cyrillic.something_greek.com is OK.) Does anybody really need mixed Latin and Greek domain names? Certainly. Some years ago the European Court upheld the right of a Belgian man whose father was Belgian and mother was Greek to spell his hyphenated last name in both scripts. Why should he not be allowed to register a domain based on his own name? I don't think this has anything to do with Unicode. In Unicode, we wish to make all the world's writing systems available to everyone. Thieves and cheats will use it if they wish, but this detracts not one whit from the nobility of our enterprise. -- Michael Everson *** Everson Typography *** http://www.evertype.com
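The "no script mixing" rule Elliotte proposes amounts to a per-label check. The sketch below (illustrative only) uses Unicode character names from Python's unicodedata module as a rough stand-in for the real Script property; a production implementation would consult the Scripts.txt data file and handle common/neutral characters explicitly:

```python
# Sketch of a "no script mixing" policy for domain labels: reject a label
# that mixes Latin, Greek, and Cyrillic letters. Character-name prefixes
# serve as a rough proxy for the Script property; digits, hyphens, and
# other neutral characters are simply ignored here.
import unicodedata

SCRIPTS = ("LATIN", "GREEK", "CYRILLIC")

def label_scripts(label):
    """Return the set of (approximated) scripts used in a label."""
    found = set()
    for ch in label:
        name = unicodedata.name(ch, "")
        for script in SCRIPTS:
            if name.startswith(script):
                found.add(script)
    return found

def is_single_script(label):
    return len(label_scripts(label)) <= 1

print(is_single_script("example"))        # all Latin -> True
print(is_single_script("ex\u0430mple"))   # Cyrillic 'а' mixed in -> False
```

Note that, as Everson's example shows, such a policy has real costs for legitimate mixed-script names; it is a trade-off, not a free fix.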
RE: Unicode and Security: Domain Names
-Original Message- From: Tom Gewecke [mailto:[EMAIL PROTECTED]] Sent: Thursday, February 07, 2002 6:20 PM To: [EMAIL PROTECTED] Subject: Re: Unicode and Security: Domain Names I note that companies like Verisign already claim to offer domain names in dozens of languages and scripts. Apparently these are converted by something called RACE encoding to ASCII for actual use on the internet. Does anyone know anything about RACE encoding and its properties?

I wrote an article on IDNS in December of 2000 which discusses the approaches that were being debated at that time, including RACE. RACE is briefly described in that article. You can find it at: http://www-106.ibm.com/developerworks/library/u-domains.html I tried to find an updated internet draft on RACE, but it looks like nothing exists after version 4, which has been archived. I'm guessing that draft names which include the text BRACE, TRACE, and GRACE are probably RACE variations, however. Check them out at: http://www.ietf.org/internet-drafts/ Suzanne Topping BizWonk Inc. [EMAIL PROTECTED]
RE: the Unicode range and code page range bits in the TrueType OS/2 table
On 02/08/2002 03:01:31 AM John Hudson wrote: Chris, how do you define a 'properly set' Unicode range in the OS/2 table? Correct codepage support is self-evident: a font should indicate codepage support only if its cmap table includes *all* the characters in that codepage.

Well, there are some gray areas. There are fonts out there that have the bit for cp1252 set but that don't have the euro or the upper/lowercase z-caron. And, I will confess, there are fonts out there that really stretch their claims to supporting codepage X. For example, when we were completing our Yi font a couple of years ago, we wanted it to work in Word 97 and Word 2000. There was a problem in that Word 2000 had a bunch of font-linking things going on to try to keep the user from seeing boxes, but the algorithms were completely unaware of Yi. I ended up having to set codepage bits for Japanese and (I think) Central European (some Latin codepage other than cp1252) in order to make Word 2000 actually use the font -- if I didn't, then Word would quietly substitute Times New Roman or a Far East font for characters that the font really did support, including about half the Yi range. The claims of supporting those two codepages were very tenuous: many of the cp1250 characters were not supported by the font, and I think there was exactly one character from cp932 that we actually supported -- 30FB. I'm sure we're not the only ones who have ever stretched things like this. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
Re: Unicode and Security: Domain Names
In a message dated 2002-02-08 8:23:22 Pacific Standard Time, [EMAIL PROTECTED] writes: Does anyone know anything about RACE encoding and its properties? I wrote an article on IDNS in December of 2000 which discusses the approaches which were being debated at that time, including RACE. RACE is briefly described in that article. You can find it at: http://www-106.ibm.com/developerworks/library/u-domains.html I tried to find an updated internet draft on RACE, but it looks like nothing exists after version 4, which has been archived. I'm guessing that draft names which include the text BRACE, TRACE, and GRACE are probably RACE variations however. Check them out at: http://www.ietf.org/internet-drafts/

An ACE (ASCII-Compatible Encoding) has been chosen for IDN, and it is neither RACE nor DUDE. Its working name was AMC-ACE-Z, and it has since been renamed Punycode. (No, I don't like the name either.) A search for punycode in the internet-drafts directory that Suzanne mentioned will reveal the details you are looking for. Beware that in addition to Punycode, there is another step in the IDN process called nameprep, which is basically an extended form of normalization to keep compatibility characters, non-spacing marks, directional overrides, and such out of domain names. Converting an arbitrary string through Punycode does not necessarily make it IDN-ready. -Doug Ewell Fullerton, California (address will soon change to dewell at adelphia dot net)
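Doug's two-step picture (Punycode plus nameprep) can be tried directly in a modern Python, whose standard library ships a punycode codec for the bare RFC 3492 algorithm that grew out of AMC-ACE-Z, and an idna codec that also applies the preparation step and adds the xn-- ACE prefix used in the DNS:

```python
# Bare Punycode versus the full IDN conversion, using Python's stdlib codecs.
# "bücher" is the classic example label from the Punycode specification.
label = "bücher"
print(label.encode("punycode"))         # bare Punycode: b'bcher-kva'
print("bücher.example".encode("idna"))  # full ToASCII: b'xn--bcher-kva.example'
```

This illustrates Doug's caveat: the first call is only the encoding step, while the second also performs the nameprep-style preparation and per-label prefixing that make a name actually usable in the DNS.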
Re: the Unicode range and code page range bits in the TrueType OS/2 table
OK, let me ask again, since my original question was not clear enough: do font vendors set the ulUnicodeRange bits in the OS/2 table? Do applications or OSes depend on ulUnicodeRange, and for what purpose?

Ken Lunde wrote: Frank, You wrote: Ken: Do you know of any Adobe software that depends on that? I heard a rumor that those bits are usually unset and left as 0. But I found that some fonts have them set if I use ttfdump to look at them. Our OpenType fonts include 'OS/2' tables, and we populate these fields with meaningful information. To what extent our applications actually make use of it, I don't know. Regards... -- Ken
Re: Unicode and Security
Hi Elliotte and others, ERH Does anybody really need mixed Latin and Greek domain names? This is the wrong approach altogether. If we want to be universal, we can't exclude cases on the heuristic basis of "no one is probably going to need this." BTW, people will certainly want mixed Han and Latin characters, where the problem arises with fullwidth forms to some extent, and people will probably want mixed Cyrillic and Latin domain names as well (one starts seeing mixed scripts in business names, for instance). Philipp mailto:[EMAIL PROTECTED] ___ Hal, open the file / Hal, open the damn file, Hal / open the, please Hal
Re[2]: Unicode and Security
Hello Asmus and others,

I'm not sure Unicode can be fixed at this point. The flaws may be too deeply embedded. The real solution may involve waiting until companies and people start losing significant amounts of money as a result of the flaws in Unicode, and then throwing it away and replacing it with something else.

AF This sounds nice and dramatic, but misses the point that the kinds of AF issues you highlighted are absolutely common to *all* character sets AF containing Latin and Greek, or Latin and Cyrillic characters, suggesting AF that you are simply grandstanding here, instead of trying to find real AF solutions to your problem.

Oh, it is entirely possible to design a character set that supports all of Latin, Cyrillic and Greek without being susceptible to this problem beyond the familiar 1-l-|, 0-O dimension. The main premise is to encode glyphs instead of characters, so that one glyph A is used in all three of these alphabets. Roundtrip compatibility with legacy character sets would be a problem, though. It looks like there is a decision between kludge A (roundtrip compatibility missing) and kludge B (easier spoofability). However, for URLs etc., roundtrip compatibility is not really necessary, I think.

AF Earlier, you accused Unicode of being in denial about security AF issues: It is you who is in denial about some underlying AF realities, among which is that there are security issues that AF cannot be fixed by designing a 'better' character set.

I am sure they can be fixed by designing a better character set that is better suited to a given problem. A lot of problems can be avoided by regarding a character set as an application-specific entity to some extent. This is not what we want, of course; we want a universal encoding across all applications. This being our premise, the resulting problems, which you cannot possibly deny, will have to be dealt with in one way or the other.

To me, it seems a better idea to fix problems that arise directly from the way we encode our characters already on the character-set level as far as possible, even if it just means notifying people that mixing characters from different alphabets may lead to misinterpretations, and to denote common glyph similarities in the standard, such as the glyph A, or for that matter the character A, being indiscernible in several alphabets.

Philipp mailto:[EMAIL PROTECTED] ___ Seeing my great fault / Through darkening blue windows / I begin again
Re: Unicode and Security
At 15:53 -0500 2002-02-07, Elliotte Rusty Harold wrote: For text files, probably not. But for the domain name system the world very well might. Indeed, maybe it should unless this problem can be dealt with. I suspect it can be dealt with by prohibiting script mixing in domain names (e.g. each component of the name must be entirely Greek or entirely Cyrillic or entirely Latin etc. Note: something_Cyrillic.something_greek.com is OK.) Does anybody really need mixed Latin and Greek domain names? Not only that, why limit the alleged security risks to domain names? Why not the part of an email address before the @? The allowed characters for that are specified in a different RFC from the one for domain names, and have nothing at all to do with DNS. And how many variations of numerals are there in Unicode? After all, every place you could use a domain name, you could use the actual IP address too. How many ways might that be spoofed? Barry
Re[2]: Unicode and Security
At 06:18 PM 2/8/02 +0100, Philipp Reichmuth wrote: Oh, it is very well possible to design a character set that supports all of Latin, Cyrillic and Greek without being susceptible to this problem beyond the familiar 1-l-|, 0-O dimension. The main premise is to encode glyphs instead of characters so that one glyph A is used in all three of these alphabets. Roundtrip compatibility with legacy character sets would be a problem, though. It looks like there is the decision between kludge A (roundtrip compatibility missing) and kludge B (easier spoofability). If your statement was phrased differently, i.e. saying that domain name registration and resolution should not allow a distinction between A.com and A.com where one uses the Greek and one the Latin A, that would be a different matter. Such action would close this spoofing loophole very effectively w/o restricting the registration of meaningful names. However, there may be subtle issues with such an approach. But the important thing is that it does not fiddle with the character set as such. However, for URLs etc., roundtrip compatibility is not really necessary, I think. I beg to differ. Roundtrip convertibility is very important since URLs live in documents encoded in Unicode, ISO/IEC 8859-7, even Shift-JIS etc. that are all not 'glyph' encodings. Whatever specialized 'character set' gets used transiently in resolving the domain name is one issue, but it better be easily possible to convert between it and the form URLs are actually stored in hypertext. I am sure they can be fixed by designing a better character set that is better suited to a given problem. A lot of problems can be avoided by regarding a character set as an application-specific entity to some extent. This is not what we want, of course; we want a universal encoding across all applications. This being our premise, the resulting problems which you cannot possibly deny will have to be dealt with in one way or the other. 
Nobody argues that spoofing and other security issues shouldn't get addressed. To me, it seems a better idea to fix problems that arise directly from the way we encode our characters already on the character set level as far as possible, even if it just means notifying people that mixing characters from different alphabets may lead to misinterpretations and to denote common glyph similarities in the standard, such as the glyph A or for that matter the character A being indiscernible in several alphabets. And we are certainly doing that. But, while A is an important character, there are nearly 70,000 han characters out there, some with distinctions so subtle that many fonts will not show them and many users will not recognize them. This has not featured in this discussion so far, nicely showing how our perceptions of issues are colored by our personal experience with scripts and languages. For han characters even my simple suggestion above is probably not practical. A./
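Asmus's idea of refusing to register look-alike names can be sketched as comparing "skeletons": fold visually confusable characters to one representative before checking a candidate against existing registrations. The folding table below is a tiny hypothetical sample for illustration, not a real confusables database (the Unicode Consortium later standardized such data in UTS #39):

```python
# Sketch: collision detection via confusable folding at registration time.
# The CONFUSABLE table here is a toy sample; real systems would use a full
# confusables data file covering far more characters.
CONFUSABLE = {
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u03BF": "o",  # GREEK SMALL LETTER OMICRON
    "\u043E": "o",  # CYRILLIC SMALL LETTER O
}

def skeleton(name):
    """Fold each confusable character to its representative."""
    return "".join(CONFUSABLE.get(ch, ch) for ch in name)

def collides(candidate, registered):
    """True if the candidate's skeleton matches an existing registration."""
    return skeleton(candidate) in {skeleton(r) for r in registered}

registered = {"A.com"}
print(collides("\u0391.com", registered))  # Greek-Alpha spoof detected
```

As Asmus notes, the point is that this check lives in the registration and resolution policy, not in the character set itself.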
RE: Unicode and Security: Domain Names
Moreover, the IDN WG documents are in final call, so if you have comments to make on them, now is the time. Visit http://www.i-d-n.net/ and sub-scribe (with a hyphen here so that listar does not interpret my post as a command!) to their mailing list (and read their archives) before doing so. The documents in last call are: 1. Internationalizing Domain Names in Applications (IDNA) http://www.ietf.org/internet-drafts/draft-ietf-idn-idna-06.txt 2. Stringprep Profile for Internationalized Host Names http://www.ietf.org/internet-drafts/draft-ietf-idn-nameprep-07.txt 3. Punycode version 0.3.3 http://www.ietf.org/internet-drafts/draft-ietf-idn-punycode-00.txt 4. Preparation of Internationalized Strings (stringprep) http://www.ietf.org/internet-drafts/draft-hoffman-stringprep-00.txt and the last call will end on Feb 11th 2002, 23h59m GMT-5. There is little time left. YA
RE: Unicode and Security: Domain Names
I want to review these documents, but since time is short, maybe someone can answer my question... Are the actual domain names as stored in the DB going to be canonical normalized Unicode strings? It seems this would go a long way towards preventing spoofing ... no one would be allowed to register a non-canonical normalized domain name. Then, a resolver would be required to normalize any request string before the actual resolve. So my questions are: 1 - Am I way off base here? If so, why? 2 - If not, is it already addressed in these docs? 3 - If it is not in the docs, and the request makes sense, then I will make the effort to beat the deadline, which is next Monday. Thanks! Barry At 10:37 AM 2/8/2002 -0800, Yves Arrouye wrote: Moreover, the IDN WG documents are in final call, so if you have comments to make on them, now is the time. Visit http://www.i-d-n.net/ and sub-scribe (with a hyphen here so that listar does not interpret my post as a command!) to their mailing list (and read their archives) before doing so. The documents in last call are: 1. Internationalizing Domain Names in Applications (IDNA) http://www.ietf.org/internet-drafts/draft-ietf-idn-idna-06.txt 2. Stringprep Profile for Internationalized Host Names http://www.ietf.org/internet-drafts/draft-ietf-idn-nameprep-07.txt 3. Punycode version 0.3.3 http://www.ietf.org/internet-drafts/draft-ietf-idn-punycode-00.txt 4. Preparation of Internationalized Strings (stringprep) http://www.ietf.org/internet-drafts/draft-hoffman-stringprep-00.txt and the last call will end on Feb 11th 2002, 23h59m GMT-5. There is little time left. YA
21st Unicode Conference, May 2002, Dublin, Ireland
First European IUC in two years! Twenty-first International Unicode Conference (IUC21) Unicode, Localization and the Web: The Global Connection http://www.unicode.org/iuc/iuc21 May 14-17, 2002 Dublin, Ireland Just 13 weeks to go! The Unicode Standard has become the foundation for all modern text processing. It is used on large machines, tiny portable devices, and for distributed processing across the Internet. The standard brings cost-reducing efficiency to international applications and enables the exchange of text in an ever increasing list of natural languages. New technologies and innovative Internet applications, as well as the evolving Unicode Standard, bring new challenges along with their new capabilities. The Twenty-first International Unicode Conference (IUC21) will explore the opportunities created by the latest advances and how to leverage them, as well as potential pitfalls to be aware of, and problem areas that need further research. Conference attendees will include managers, software engineers, systems analysts, font designers, graphic designers, content developers, technical writers, and product marketing personnel, involved in the development, deployment or use of Unicode software or content, and the globalization of software and the Internet. CONFERENCE WEB SITE, PROGRAM and REGISTRATION The Conference Program and Registration form will be available soon at the Conference Web site: http://www.unicode.org/iuc/iuc21 CONFERENCE SPONSORS Agfa Monotype Corporation Basis Technology Corporation Localisation Research Centre Microsoft Corporation Reuters Ltd. Sun Microsystems, Inc. World Wide Web Consortium (W3C) GLOBAL COMPUTING SHOWCASE Visit the Showcase to find out more about products supporting the Unicode Standard, and products and services that can help you globalize/localize your software, documentation and Internet content. For details, visit the Conference Web site. 
CONFERENCE VENUE The Conference will take place at: The Burlington Hotel Upper Leeson Street Dublin 4, Ireland Tel: (+353 1) 660 5222 Fax: (+353 1) 660 8496 CONFERENCE MANAGEMENT Global Meeting Services Inc. 8949 Lombard Place, #416 San Diego, CA 92122, USA Tel: +1 858 638 0206 (voice) +1 858 638 0504 (fax) Email: [EMAIL PROTECTED] or: [EMAIL PROTECTED] THE UNICODE CONSORTIUM The Unicode Consortium was founded as a non-profit organization in 1991. It is dedicated to the development, maintenance and promotion of The Unicode Standard, a worldwide character encoding. The Unicode Standard encodes the characters of the world's principal scripts and languages, and is code-for-code identical to the international standard ISO/IEC 10646. In addition to cooperating with ISO on the future development of ISO/IEC 10646, the Consortium is responsible for providing character properties and algorithms for use in implementations. Today the membership base of the Unicode Consortium includes major computer corporations, software producers, database vendors, research institutions, international agencies and various user groups. For further information on the Unicode Standard, visit the Unicode Web site at http://www.unicode.org or e-mail [EMAIL PROTECTED] * * * * * Unicode(r) and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission. -- -- Visit our Internet site at http://www.reuters.com Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Reuters Ltd.
RE: Unicode and Security: Domain Names
Are the actual domain names as stored in the DB going to be canonical normalized Unicode strings? It seems this would go a long way towards preventing spoofing ...

Names will be stored according to a normalization called Nameprep. Read the Stringprep (general framework) and Nameprep (IDN application, or Stringprep profile) documents for details. This normalization includes a step of normalizing using NFKC, but it does more than that.

no one would be allowed to register a non-canonical normalized domain name. Then, a resolver would be required to normalize any request string before the actual resolve.

To keep resolver load the same as today, client applications will do the normalization of their requests. If they don't normalize properly, the lookup will just fail. Read the IDNA document for more info on this. All normalized strings are encoded in a so-called ASCII-Compatible Encoding, which uses the restricted set of characters used in the DNS today (letters, digits, and hyphen, except at the extremities) for host names (which are different from STD13 names, cf. SRV RRs for example). Read IDNA, again, and Punycode, the chosen encoding. YA
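The NFKC step Yves mentions is easy to illustrate with Python's unicodedata: visually distinct compatibility spellings collapse to a single stored form, which is exactly what removes one class of spoofing opportunities.

```python
# NFKC folds compatibility characters to their canonical counterparts, so
# fullwidth and ligature spellings of a name map to one normalized string.
import unicodedata

fullwidth = "\uFF41\uFF42\uFF43"   # fullwidth 'a', 'b', 'c'
ligature = "o\uFB00ice"            # 'office' written with an ff-ligature
print(unicodedata.normalize("NFKC", fullwidth))  # 'abc'
print(unicodedata.normalize("NFKC", ligature))   # 'office'
```

Nameprep adds further steps on top of this (case folding, prohibited characters, bidi checks), so NFKC alone is not the whole preparation.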
RE: Unicode and Security: Domain Names
The recent discussions on this list about Internet domain name spoofing through substitution of Unicode characters that have similar, or identical, glyphs concern an issue that has recently appeared in print in a prominent journal:

@String{j-CACM = "Communications of the ACM"}

@Article{Gabrilovich:2002:IRH,
  author =        "Evgeniy Gabrilovich and Alex Gontmakher",
  title =         "Inside risks: The homograph attack",
  journal =       j-CACM,
  volume =        "45",
  number =        "2",
  pages =         "128--128",
  month =         feb,
  year =          "2002",
  CODEN =         "CACMA2",
  ISSN =          "0001-0782",
  bibdate =       "Wed Jan 30 17:45:01 MST 2002",
  bibsource =     "http://www.acm.org/pubs/contents/journals/cacm/",
  acknowledgement = ack-nhfb,
}

Bruce Schneier also discussed this in the 15-Mar-2001, 15-Jul-2001, 15-Sep-2001, and 15-Nov-2001 issues of the CRYPTO-GRAM newsletter (available at http://www.counterpane.com/crypto-gram.html ) and gave these links for more info: http://www.theregister.co.uk/content/55/21573.html http://www.securityfocus.com/bid/3461 http://www.counterpane.com/crypto-gram-0007.html#9 http://www.securityfocus.com/focus/ids/articles/utf8.html

- Nelson H. F. Beebe, Center for Scientific Computing, Department of Mathematics, 322 INSCC, University of Utah, 155 S 1400 E RM 233, Salt Lake City, UT 84112-0090, USA
- Tel: +1 801 581 5254; FAX: +1 801 585 1640, +1 801 581 4148
- Internet e-mail: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]
- URL: http://www.math.utah.edu/~beebe
Re: Re[2]: Unicode and Security
Asmus is absolutely right about Latin, Greek and Cyrillic. And the response that Unicode should be encoding glyphs instead of characters is, to say the least, misguided. No character encodings have ever been predicated on that. For an example of how many glyphs are available just for the letter A, look at: http://www.macchiato.com/utc/glyph_variation.html There have been attempts to develop glyph standards (AFII was one). All have foundered. Mark — Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο πάντα — Ὁμήρου Μαργίτῃ [For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr] http://www.macchiato.com - Original Message - From: Philipp Reichmuth [EMAIL PROTECTED] To: Asmus Freytag [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Friday, February 08, 2002 09:18 Subject: Re[2]: Unicode and Security Hello Asmus and others, I'm not sure Unicode can be fixed at this point. The flaws may be too deeply embedded. The real solution may involve waiting until companies and people start losing significant amounts of money as a result of the flaws in Unicode, and then throwing it away and replacing it with something else. AF This sounds nice and dramatic, but misses the point that the kinds of AF issues you highlighted are absolutely common to *all* character sets AF containing Latin and Greek, or Latin and Cyrillic characters, suggesting AF that you are simply grandstanding here, instead of trying to find real AF solutions to your problem. Oh, it is very well possible to design a character set that supports all of Latin, Cyrillic and Greek without being susceptible to this problem beyond the familiar 1-l-|, 0-O dimension. The main premise is to encode glyphs instead of characters so that one glyph A is used in all three of these alphabets. Roundtrip compatibility with legacy character sets would be a problem, though. It looks like there is the decision between kludge A (roundtrip compatibility missing) and kludge B (easier spoofability). 
However, for URLs etc., roundtrip compatibility is not really necessary, I think. AF Earlier, you accused Unicode of being in denial about security AF issues: It is you who is in denial about some underlying AF realities, among which is that there are security issues that AF cannot be fixed by designing a 'better' character set. I am sure they can be fixed by designing a better character set that is better suited to a given problem. A lot of problems can be avoided by regarding a character set as an application-specific entity to some extent. This is not what we want, of course; we want a universal encoding across all applications. This being our premise, the resulting problems which you cannot possibly deny will have to be dealt with in one way or the other. To me, it seems a better idea to fix problems that arise directly from the way we encode our characters already on the character set level as far as possible, even if it just means notifying people that mixing characters from different alphabets may lead to misinterpretations and to denote common glyph similarities in the standard, such as the glyph A or for that part the character A being indiscernible in several alphabets. Philippmailto:[EMAIL PROTECTED] ___ Seeing my great fault / Through darkening blue windows / I begin again
Arabic indexes
I have here a book with separate English, Hebrew and Arabic indexes. In the English index, the indexed words appear (as is conventional) with a page number after them (that is, to their right). In the Hebrew index, the words likewise appear with a page number after them (that is, to their left). In the Arabic index, however, the indexed words appear with a page number before them (that is, to their right). Is this regular practice in Arabic indexing, or some bizarre bidi glitch? -- John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED] To say that Bilbo's breath was taken away is no description at all. There are no words left to express his staggerment, since Men changed the language that they learned of elves in the days when all the world was wonderful. --_The Hobbit_