Re: Unicode and Security
I have long advocated more intelligent GUIs to help distinguish spoofing names. I think the technique could also help with the Traditional vs. Simplified Chinese issue, by helping people type in one or the other but not mix them. I coded up a quick demo of what I mean (very rough, I warn you). Try: http://www.macchiato.com/utc/despoofing

Mark
— Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο πάντα — Ὁμήρου Μαργίτῃ
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]
http://www.macchiato.com

- Original Message -
From: John Cowan [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Saturday, February 09, 2002 18:28
Subject: Re: Unicode and Security

[EMAIL PROTECTED] scripsit:

Let's keep going. Latin Y, Greek Upsilon, Cyrillic U. Wait a minute, that Cyrillic U doesn't look *quite* the same. Oh well, it's close enough, right?

And then there's the Cyrillic U with the straight descender, which actually does look just like its Latin and Greek counterparts. I guess we just can't afford to have two kinds of Cyrillic U around: off with their heads (or tails)!

Unfortunately, there goes all those Turkic languages written in Cyrillic. Well, they should Romanize anyway. In fact all languages should Romanize: it simplifies everything so much, and if we get rid of diacritics while we're at it, well, the ASCII Consortium (off-net, but cached in part at http://www.google.com/search?q=cache:IRueJQ1bA-4C:www.wholehog.fsnet.co.uk/robert/ascii/+ASCII+Consortiumhl=en) will find it a dream come true.

And there was much rejoicing.

-- John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED]
To say that Bilbo's breath was taken away is no description at all. There are no words left to express his staggerment, since Men changed the language that they learned of elves in the days when all the world was wonderful. --_The Hobbit_
Re: Unicode and Security
Elliotte Rusty Harold scripsit: Another possibility is a super-normalization that does combine similar looking Unicode characters; e.g. in the domain name system we might decide that microsoft.com with Latin o's or Cyrillic o's or Greek o's is to resolve to the same address. In that case comment NOW, TODAY or TOMORROW, to the IETF IDN lists so that they can extend the nameprep process to do such things. (They will be resistant at this stage, no doubt, but it's worth a try.) The Unicode list can't help you. -- John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED] To say that Bilbo's breath was taken away is no description at all. There are no words left to express his staggerment, since Men changed the language that they learned of elves in the days when all the world was wonderful. --_The Hobbit_
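The "super-normalization" idea quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fold table is a hypothetical, hand-picked handful of look-alikes, not an official or complete list, and the function names are invented for this example. Nothing here is part of nameprep.

```python
# Hypothetical, hand-picked fold table. It is NOT an official or complete
# list of confusable characters; it only covers a few obvious look-alikes.
CONFUSABLE_FOLD = {
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
}


def fold_confusables(name: str) -> str:
    """Lowercase the name and replace each listed look-alike by its Latin twin."""
    return "".join(CONFUSABLE_FOLD.get(ch, ch) for ch in name.lower())


def same_registration(a: str, b: str) -> bool:
    """Treat two names as one registration if they fold to the same string."""
    return fold_confusables(a) == fold_confusables(b)


# "microsoft.com" with a Greek omicron and a Cyrillic o in place of the Latin o's
spoof = "micr\u03bfs\u043eft.com"
print(same_registration(spoof, "microsoft.com"))
```

The hard part is not the folding but deciding what belongs in the table, which is exactly the point other posters in this thread raise about edge cases.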
RE: Unicode and Security
Doug, I agree. I used to do security consulting and found that the biggest problem was that people tried to come up with solutions for the wrong problem. We can go back to the typewriter days, when there was no difference between 1 and l or between 0 and O. Do you blame ASCII if you type ST0P instead of STOP? Reexamine the problem and potential solutions.

Carl

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of [EMAIL PROTECTED]
Sent: Sunday, February 10, 2002 5:24 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: Unicode and Security

In a message dated 2002-02-10 13:00:19 Pacific Standard Time, [EMAIL PROTECTED] writes:

However, I do continue to maintain that character confusion is a real security risk that will have real impact on users, and that needs to be considered in any system that uses Unicode.

We have already established that similar-looking characters can cause confusion in Unicode-based systems. However, we have also established that ISO 8859-1, 8859-5 (Cyrillic), 8859-7 (Greek), and even ASCII can suffer from this same problem. It is unrealistic to suggest that the problem began with Unicode.

In some domains the problem might be severe enough to eliminate Unicode from consideration in favor of less extensive character sets like Latin-1. That would be a shame, but until the Unicode consortium addresses at a root level the real security implications of their work, security conscious developers will look elsewhere. (I notice the Unicode 3.0 book does not even have the word security in its index.) Many more developers who are at best tangentially conscious of security issues will go ahead and develop insecure systems because they don't realize the security implications of adopting Unicode.

Companies and individuals that choose to throw out the baby with the bath water will achieve the kind of results that that approach usually delivers.
Companies and individuals that wish to establish their own definitions of, and policies for dealing with, confusable characters are free to do so. As I stated earlier, and nobody could refute, there is no consistent way to determine which sets of characters are confusable with each other, other than in the most obvious cases like o/omicron. So of course neither the Unicode Consortium nor WG2 has taken it upon themselves to draw up such a list. This must be a local decision. Another possibility is a super-normalization that does combine similar looking Unicode characters; e.g. in the domain name system we might decide that microsoft.com with Latin o's or Cyrillic o's or Greek o's is to resolve to the same address. No separate registration would be necessary or possible. This would require detailed analysis of the tens of thousands of Unicode characters allowed in domain names by fluent speakers of various languages; not easy, not cheap, but perhaps necessary. Besides, the security improvements, this proposal would also improve the system's usability. Aren't sure whether that URL on the bus used an o or an omicron? Doesn't matter, type either one. Adding this sort of unification to the nameprep stage might have been possible about a year or so ago. It's probably too late now. Actually, people have been talking about the security problems with HTML for years. Search engines have gone to some effort to eliminate spamdexers that use these techniques. The log in HTML's eye does not, however, negate the existence of the log in Unicode's eye. Again (and again), the problem is not unique to Unicode. Existing character sets also contain confusables. Blaming Unicode for exacerbating the problem by offering so many characters is like blaming your local ice cream shop for offering 31 flavors, because that makes it so much more difficult to choose. -Doug Ewell Fullerton, California (address will soon change to dewell at adelphia dot net)
Re: Unicode and Security
* Elliotte Rusty Harold | | Let's say I register microsoft.com, only the fifth letter isn't a | lower-case Latin o. It's actually a lower case Greek omicron. I'll grant you that this is possible, perhaps even likely, and that it may cause problems, but I'm far from convinced that this in any way supports the there are security problems in Unicode thesis. There are many characters which look alike, and yet are different, which can cause problems of this kind. There are for example already viruses which exploit the visual similarity between 1 and l in the Windows system font to keep themselves from being discovered in file listings. So if this really is considered a problem it would seem to me that you would need to deal with the problem of [EMAIL PROTECTED], [EMAIL PROTECTED], and [EMAIL PROTECTED] looking very similar to [EMAIL PROTECTED] in lots of fonts. To exploit this, all you need to know is what email client someone uses, and usually every email they write will have that information in its headers. It seems to me that this problem really needs some other fix than the merging of all similar-looking characters in all character sets. I just can't see that working. Similarly, the security problems caused by using Unicode encoding tricks to hide or mangle text in, say, contracts, is no different from using HTML or CSS (or whatever) tricks to achieve the same effect, and yet nobody is talking about security problems with HTML or CSS. See [1] for one way of dealing with it that is now being worked on. So while I accept that there is a problem it does not seem to me that Unicode is the problem. To me the problem seems to be the complexity of the relationship between the bytes sent to the user and what the user actually sees and reacts to. That complexity is not going to disappear, and aspects of the same problem exist with just about any information representation, so clearly the solution must be something other than changing all of these syntaxes/formats/encodings. 
In the specific case you cite, for example, a better solution might be for the user's email client to keep track of all the user's contacts and for it to indicate in some clearly visible way whether the current email comes from one of them or not. Whether it uses string matching of email addresses or digital signatures to do that doesn't really matter; it solves the problem in your example either way. [1] URL: http://www.w3.org/TR/xmldsig-core/#sec-Seen -- Lars Marius Garshol, Ontopian URL: http://www.ontopia.net ISO SC34/WG3, OASIS GeoLang TCURL: http://www.garshol.priv.no
Re: Unicode and Security
In a message dated 2002-02-09 13:00:59 Pacific Standard Time, [EMAIL PROTECTED] writes: It seems to me that this problem really needs some other fix than the merging of all similar-looking characters in all character sets. I just can't see that working. Even the merging part wouldn't work. Let's say that I, like Ken Sakamura or Bernard Miller before me, have decided that I know much more about character encoding than the Unicode Consortium or WG2, and I am going to develop my own character encoding that will solve the problem of confusables once and for all. OK, we start with the easy ones. Latin A, Greek Alpha, and Cyrillic A all get unified. Latin E, Greek Epsilon, Cyrillic E, unified. Hey, this is easier than I thought. Latin B, Greek Beta, Cyrillic Ve. Ha! I'm smart enough to know that Ve gets unified with B and Beta, even though it represents a different sound. Just like Han unification! Boy, those Unicode dolts really missed something there. Let's keep going. Latin Y, Greek Upsilon, Cyrillic U. Wait a minute, that Cyrillic U doesn't look *quite* the same. Oh well, it's close enough, right? Let's try some lower-case letters. Latin a, Greek alpha, Cyrillic a. That Greek alpha looks kinda cursive, doesn't it? Should we unify it or not. Hmmm... How about Latin n and Greek eta? Is that descender on the eta significant or not? Hey, you could stick an eta in the middle of a Web address and really fool somebody. Better unify. How about Latin v and Greek nu? Different glyphs or not? In 9-point MS Sans Serif, they're pretty close, aren't they? (And don't forget Armenian vo!) Same goes for Latin y and Greek gamma. Well, you get the point. The world of alphabetic confusables is just not that simple or that 1-to-1. There are more edge cases, in fact, than obvious cases such as the a/alpha or o/omicron that we keep hearing about. 
And if I were trying to design this hypothetical Uniglyph encoding to get rid of those pesky confusables, and still provide support for alphabetic scripts besides Latin, I would eventually have to face the fact that it *can't be done*. Oh, sure, it can be done for a/alpha and o/omicron, so I can make a sales presentation or a picket sign. But a complete technical solution, uh, no. -Doug Ewell Fullerton, California (address will soon change to dewell at adelphia dot net)
Re: Unicode and Security
[EMAIL PROTECTED] scripsit:

Let's keep going. Latin Y, Greek Upsilon, Cyrillic U. Wait a minute, that Cyrillic U doesn't look *quite* the same. Oh well, it's close enough, right?

And then there's the Cyrillic U with the straight descender, which actually does look just like its Latin and Greek counterparts. I guess we just can't afford to have two kinds of Cyrillic U around: off with their heads (or tails)!

Unfortunately, there goes all those Turkic languages written in Cyrillic. Well, they should Romanize anyway. In fact all languages should Romanize: it simplifies everything so much, and if we get rid of diacritics while we're at it, well, the ASCII Consortium (off-net, but cached in part at http://www.google.com/search?q=cache:IRueJQ1bA-4C:www.wholehog.fsnet.co.uk/robert/ascii/+ASCII+Consortiumhl=en) will find it a dream come true.

And there was much rejoicing.

-- John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED]
To say that Bilbo's breath was taken away is no description at all. There are no words left to express his staggerment, since Men changed the language that they learned of elves in the days when all the world was wonderful. --_The Hobbit_
Re: Unicode and Security
Elliotte Rusty Harold wrote:

The problem is that all of these or any other client-based solution you come up with is only going to be implemented in some clients. Many, and at least initially most, users are not going to have any such protections. This needs to be cut off at the protocol level.

Rather, the problem is that replacing just one of the many existing character encodings with an allegedly secure one would serve only some (rather few!) users. Finding a solution that works with all character encodings alike is much more efficient (and is probably feasible, in contrast to the solution advocated by ERH).

One possible solution for the e-mail spoofing problem is cryptographic authentication. This is independent of the underlying character encoding, and it is already widely available.

I said 'allegedly secure' because no character encoding standard can really prevent this sort of spoofing (we have had enough examples in this thread, based on bare ASCII). Trying to find a spoofing-proof character encoding is comparable to the task of finding an alphabet in which no insults can be spelled.

Best wishes,
Otto Stolz
Re: Unicode and Security
At 17:42 -0500 2002-02-07, John Cowan wrote: The only widely-deployed alternative approach I know of is ETSI GSM 03.38 (used in mobile telephony), A truly bizarre character set: it supports English, French, mainland Scandinavian languages, Italian, Spanish with Graves, and GREEK SHOUTING. On my Nokia I am forced to write SMS messages in Irish with graves for ÀàÌìÒòÙù but I am awarded Éé. The Nokia does have ÁáÍíÓóÚú available for spelling names in the phone book, but the accents are stripped off if they are sent in a text message. :-( -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Unicode and Security
At 15:53 -0500 2002-02-07, Elliotte Rusty Harold wrote: For text files, probably not. But for the domain name system the world very well might. Indeed, maybe it should unless this problem can be dealt with. I suspect it can be dealt with by prohibiting script mixing in domain names (e.g. each component of the name must be entirely Greek or entirely Cyrillic or entirely Latin etc. Note: something_Cyrillic.something_greek.com is OK.) Does anybody really need mixed Latin and Greek domain names? Certainly. Some years ago the European Court upheld the right of a Belgian man whose father was Belgian and mother was Greek to spell his hyphenated last name in both scripts. Why should he not be allowed to register a domain based on his own name? I don't think this has anything to do with Unicode. In Unicode, we wish to make all the world's writing system available to everyone. Thieves and cheats will use it if they wish, but this detracts not one whit from the nobility of our enterprise. -- Michael Everson *** Everson Typography *** http://www.evertype.com
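For concreteness, the per-label rule quoted above (each component entirely Greek, or entirely Cyrillic, or entirely Latin) could be checked roughly as follows. This is a sketch under a loud assumption: it uses the first word of each character's Unicode name as a crude stand-in for its script, which works for the basic Latin, Greek and Cyrillic letters but is not a real script property lookup. The function names are invented for this example.

```python
import unicodedata


def label_scripts(label: str) -> set:
    """Return the set of 'scripts' used in one domain label.

    Assumption: the first word of the Unicode character name (LATIN, GREEK,
    CYRILLIC, ...) is used as a rough proxy for the script. Digits and
    hyphens are treated as neutral and ignored.
    """
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def has_mixed_script_label(domain: str) -> bool:
    """True if any single dot-separated label mixes more than one script."""
    return any(len(label_scripts(label)) > 1 for label in domain.split("."))


# Latin letters with one Greek omicron inside the same label
print(has_mixed_script_label("micr\u03bfsoft.com"))
# One Cyrillic label, one Greek label, one Latin label: allowed by the rule
print(has_mixed_script_label("\u043f\u0440\u0438\u043c\u0435\u0440.\u03b5\u03bb.com"))
```

Note that a check like this would also reject the hyphenated two-script surname described in the reply above, since the hyphen is ignored and both scripts land in one label; that is precisely the objection being made.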
RE: Unicode and Security: Domain Names
-Original Message-
From: Tom Gewecke [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 07, 2002 6:20 PM
To: [EMAIL PROTECTED]
Subject: Re: Unicode and Security: Domain Names

I note that companies like Verisign already claim to offer domain names in dozens of languages and scripts. Apparently these are converted by something called RACE encoding to ASCII for actual use on the internet. Does anyone know anything about RACE encoding and its properties?

I wrote an article on IDNS in December of 2000 that discusses the approaches being debated at that time, including RACE, which is briefly described there. You can find it at: http://www-106.ibm.com/developerworks/library/u-domains.html

I tried to find an updated internet draft on RACE, but it looks like nothing exists after version 4, which has been archived. I'm guessing, however, that draft names which include the text BRACE, TRACE, and GRACE are probably RACE variations. Check them out at: http://www.ietf.org/internet-drafts/

Suzanne Topping
BizWonk Inc.
[EMAIL PROTECTED]
Re: Unicode and Security: Domain Names
In a message dated 2002-02-08 8:23:22 Pacific Standard Time, [EMAIL PROTECTED] writes:

Does anyone know anything about RACE encoding and its properties?

I wrote an article on IDNS in December of 2000 which discusses the approaches which were being debated at that time, including RACE. RACE is briefly described in that article. You can find it at: http://www-106.ibm.com/developerworks/library/u-domains.html I tried to find an updated internet draft on RACE, but looks like nothing exists after version 4, which has been archived. I'm guessing that draft names which include the text BRACE, TRACE, and GRACE are probably RACE variations however. Check them out at: http://www.ietf.org/internet-drafts/

An ACE (ASCII-Compatible Encoding) has been chosen for IDN, and it is neither RACE nor DUDE. Its working name was AMC-ACE-Z, and it has since been renamed Punycode. (No, I don't like the name either.) A search for punycode in the internet-drafts directory that Suzanne mentioned will reveal the details you are looking for.

Beware that in addition to Punycode, there is another step in the IDN process called nameprep, which is basically an extended form of normalization to keep compatibility characters, non-spacing marks, directional overrides, and such out of domain names. Converting an arbitrary string through Punycode does not necessarily make it IDN-ready.

-Doug Ewell
Fullerton, California
(address will soon change to dewell at adelphia dot net)
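The difference between raw Punycode and the full nameprep-plus-Punycode pipeline can be tried out with Python's standard codecs. Note the assumptions: these codecs implement the RFCs that were published after this thread (RFC 3492 for "punycode", RFC 3490 with nameprep for "idna"), so they may differ in detail from the drafts under discussion, and the label "bücher" and the domain "example" are just illustrative choices.

```python
# Raw Punycode: only the ASCII-compatible encoding step, no nameprep.
label = "b\u00fccher"                      # "bücher"
ace = label.encode("punycode")
print(ace)                                 # b'bcher-kva'
assert ace.decode("punycode") == label     # the encoding is reversible

# Punycode alone does no case folding, so an upper-case spelling of the
# same word yields a different ASCII string.
raw_upper = "B\u00dcCHER".encode("punycode")
assert raw_upper != ace

# The "idna" codec applies nameprep first and then Punycode with the
# "xn--" prefix, so both spellings end up as the same host name.
lower_host = "b\u00fccher.example".encode("idna")
upper_host = "B\u00dcCHER.example".encode("idna")
print(lower_host)                          # b'xn--bcher-kva.example'
assert lower_host == upper_host
```

This is the sense in which a string pushed through Punycode alone is not necessarily IDN-ready: the preparation step decides which strings count as the same name.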
Re: Unicode and Security
Hi Elliotte and others,

ERH Does anybody really need mixed Latin and Greek domain names?

This is the wrong approach altogether. If we want to be universal, we can't exclude cases on a heuristic basis of "no one is probably going to need this".

BTW, people will certainly want mixed Han and Latin characters, where the problem arises with fullwidth forms to some extent, and people will probably want mixed Cyrillic and Latin domain names as well (one starts seeing mixed scripts in business names, for instance).

Philipp mailto:[EMAIL PROTECTED]
___
Hal, open the file / Hal, open the damn file, Hal / open the, please Hal
Re[2]: Unicode and Security
Hello Asmus and others, I'm not sure Unicode can be fixed at this point. The flaws may be too deeply embedded. The real solution may involve waiting until companies and people start losing significant amounts of money as a result of the flaws in Unicode, and then throwing it away and replacing it with something else. AF This sounds nice and dramatic, but misses the point that the kinds of AF issues you highlighted are absolutely common to *all* character sets AF containing Latin and Greek, or Latin and Cyrillic characters, suggesting AF that you are simply grandstanding here, instead of trying to find real AF solutions to your problem. Oh, it is very well possible to design a character set that supports all of Latin, Cyrillic and Greek without being susceptible to this problem beyond the familiar 1-l-|, 0-O dimension. The main premise is to encode glyphs instead of characters so that one glyph A is used in all three of these alphabets. Roundtrip compatibility with legacy character sets would be a problem, though. It looks like there is the decision between kludge A (roundtrip compatibility missing) and kludge B (easier spoofability). However, for URLs etc., roundtrip compatibility is not really necessary, I think. AF Earlier, you accused Unicode of being in denial about security AF issues: It is you who is in denial about some underlying AF realities, among which is that there are security issues that AF cannot be fixed by designing a 'better' character set. I am sure they can be fixed by designing a better character set that is better suited to a given problem. A lot of problems can be avoided by regarding a character set as an application-specific entity to some extent. This is not what we want, of course; we want a universal encoding across all applications. This being our premise, the resulting problems which you cannot possibly deny will have to be dealt with in one way or the other. 
To me, it seems a better idea to fix problems that arise directly from the way we encode our characters already on the character set level as far as possible, even if it just means notifying people that mixing characters from different alphabets may lead to misinterpretations and to denote common glyph similarities in the standard, such as the glyph A or for that part the character A being indiscernible in several alphabets. Philippmailto:[EMAIL PROTECTED] ___ Seeing my great fault / Through darkening blue windows / I begin again
Re: Unicode and Security
At 15:53 -0500 2002-02-07, Elliotte Rusty Harold wrote: For text files, probably not. But for the domain name system the world very well might. Indeed, maybe it should unless this problem can be dealt with. I suspect it can be dealt with by prohibiting script mixing in domain names (e.g. each component of the name must be entirely Greek or entirely Cyrillic or entirely Latin etc. Note: something_Cyrillic.something_greek.com is OK.) Does anybody really need mixed Latin and Greek domain names? Not only that, why limit the alleged security risks to domain names? Why not the part of an email address before the @? the allowed characters for that are specified in a different RFC than that for domain names, and has nothing to do at all with DNS. And how many variations of numerals are there in Unicode? After all, every place you could use a domain name, you could use the actual IP address too. How many ways might that be spoofed? Barry
Re[2]: Unicode and Security
At 06:18 PM 2/8/02 +0100, Philipp Reichmuth wrote: Oh, it is very well possible to design a character set that supports all of Latin, Cyrillic and Greek without being susceptible to this problem beyond the familiar 1-l-|, 0-O dimension. The main premise is to encode glyphs instead of characters so that one glyph A is used in all three of these alphabets. Roundtrip compatibility with legacy character sets would be a problem, though. It looks like there is the decision between kludge A (roundtrip compatibility missing) and kludge B (easier spoofability). If your statement was phrased differently, i.e. saying that domain name registration and resolution should not allow a distinction between A.com and A.com where one uses the Greek and one the Latin A, that would be a different matter. Such action would close this spoofing loophole very effectively w/o restricting the registration of meaningful names. However, there may be subtle issues with such an approach. But the important thing is that it does not fiddle with the character set as such. However, for URLs etc., roundtrip compatibility is not really necessary, I think. I beg to differ. Roundtrip convertibility is very important since URLs live in documents encoded in Unicode, ISO/IEC 8859-7, even Shift-JIS etc. that are all not 'glyph' encodings. Whatever specialized 'character set' gets used transiently in resolving the domain name is one issue, but it better be easily possible to convert between it and the form URLs are actually stored in hypertext. I am sure they can be fixed by designing a better character set that is better suited to a given problem. A lot of problems can be avoided by regarding a character set as an application-specific entity to some extent. This is not what we want, of course; we want a universal encoding across all applications. This being our premise, the resulting problems which you cannot possibly deny will have to be dealt with in one way or the other. 
Nobody argues that spoofing and other security issues shouldn't get addressed. To me, it seems a better idea to fix problems that arise directly from the way we encode our characters already on the character set level as far as possible, even if it just means notifying people that mixing characters from different alphabets may lead to misinterpretations and to denote common glyph similarities in the standard, such as the glyph A or for that part the character A being indiscernible in several alphabets. And we are certainly doing that. But, while A is an important character, there are nearly 70,000 han characters out there, some with distinctions so subtle that many fonts will not show them and many users will not recognize them. This has not featured in this discussion so far, nicely showing how our perception of issues are colored by our personal experience with scripts and languages. For han characters even my simple suggestion above is probably not practical. A./
RE: Unicode and Security: Domain Names
Moreover, the IDN WG documents are in final call, so if you have comments to make on them, now is the time. Visit http://www.i-d-n.net/ and sub-scribe (with a hyphen here so that listar does not interpret my post as a command!) to their mailing list (and read their archives) before doing so. The documents in last call are: 1. Internationalizing Domain Names in Applications (IDNA) http://www.ietf.org/internet-drafts/draft-ietf-idn-idna-06.txt 2. Stringprep Profile for Internationalized Host Names http://www.ietf.org/internet-drafts/draft-ietf-idn-nameprep-07.txt 3. Punycode version 0.3.3 http://www.ietf.org/internet-drafts/draft-ietf-idn-punycode-00.txt 4. Preparation of Internationalized Strings (stringprep) http://www.ietf.org/internet-drafts/draft-hoffman-stringprep-00.txt and the last call will end on Feb 11th 2002, 23h59m GMT-5. There is little time left. YA
RE: Unicode and Security: Domain Names
I want to review these documents, but since time is short, maybe someone can answer my question... Are the actual domain names as stored in the DB going to be canonical normalized Unicode strings? It seems this would go a long way towards preventing spoofing ... no one would be allowed to register a non-canonical normalized domain name. Then, a resolver would be required to normalize any request string before the actual resolve. So my questions are: 1 - Am I way off base here? If so, why? 2 - If not, is it already addressed in these docs? 3 - If it is not in the docs, and the request makes sense, then I will make the effort to beat the deadline, which is next Monday. Thanks! Barry At 10:37 AM 2/8/2002 -0800, Yves Arrouye wrote: Moreover, the IDN WG documents are in final call, so if you have comments to make on them, now is the time. Visit http://www.i-d-n.net/ and sub-scribe (with a hyphen here so that listar does not interpret my post as a command!) to their mailing list (and read their archives) before doing so. The documents in last call are: 1. Internationalizing Domain Names in Applications (IDNA) http://www.ietf.org/internet-drafts/draft-ietf-idn-idna-06.txt 2. Stringprep Profile for Internationalized Host Names http://www.ietf.org/internet-drafts/draft-ietf-idn-nameprep-07.txt 3. Punycode version 0.3.3 http://www.ietf.org/internet-drafts/draft-ietf-idn-punycode-00.txt 4. Preparation of Internationalized Strings (stringprep) http://www.ietf.org/internet-drafts/draft-hoffman-stringprep-00.txt and the last call will end on Feb 11th 2002, 23h59m GMT-5. There is little time left. YA
RE: Unicode and Security: Domain Names
Are the actual domain names as stored in the DB going to be canonical normalized Unicode strings? It seems this would go a long way towards preventing spoofing ...

Names will be stored according to a normalization called Nameprep. Read the Stringprep (general framework) and Nameprep (IDN application, or Stringprep profile) documents for details. This normalization includes a step of normalizing using NFKC, but it does more than that.

no one would be allowed to register a non-canonical normalized domain name. Then, a resolver would be required to normalize any request string before the actual resolve.

To keep the resolver's load the same as today, client applications will do the normalization of their requests. If they don't normalize properly, the lookup will just fail. Read the IDNA document for more info on this.

All normalized strings are encoded in a so-called ASCII Compatible Encoding, which uses the restricted set of characters used in the DNS today (letters, digits, and hyphen except at the extremities) for host names (which are different than STD13 names, cf. SRV RRs for example). Read IDNA, again, and Punycode, the chosen encoding.

YA
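A small Python sketch may help show what the NFKC step does and does not buy. Assumptions are stated loudly: `prepare` below is a simplified stand-in (NFKC plus lowercasing), not the actual Nameprep profile, and the registry dictionary and its address are hypothetical.

```python
import unicodedata


def prepare(label: str) -> str:
    """Simplified stand-in for Nameprep: NFKC normalization plus lowercasing.

    The real profile does more (mapping tables, prohibited characters);
    this only illustrates the normalization step mentioned above.
    """
    return unicodedata.normalize("NFKC", label).lower()


# Hypothetical zone data: names are stored in prepared form only.
registry = {prepare("example"): "192.0.2.1"}


def client_lookup(label: str):
    """The client prepares its own request; an unprepared name just fails."""
    return registry.get(prepare(label))


# Fullwidth "Example" (compatibility characters) folds to plain ASCII.
print(client_lookup("\uff25\uff58\uff41\uff4d\uff50\uff4c\uff45"))
# A Cyrillic "a" in place of the Latin one is NOT unified by NFKC.
print(client_lookup("ex\u0430mple"))
```

So normalization of this kind removes compatibility variants such as fullwidth forms, but cross-script look-alikes remain distinct names.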
RE: Unicode and Security: Domain Names
The recent discussion on this list of Internet domain name spoofing through substitution of Unicode characters that have similar, or identical, glyphs concerns an issue that has recently appeared in print in a prominent journal:

@String{j-CACM = "Communications of the ACM"}

@Article{Gabrilovich:2002:IRH,
  author =          "Evgeniy Gabrilovich and Alex Gontmakher",
  title =           "Inside risks: The homograph attack",
  journal =         j-CACM,
  volume =          "45",
  number =          "2",
  pages =           "128--128",
  month =           feb,
  year =            "2002",
  CODEN =           "CACMA2",
  ISSN =            "0001-0782",
  bibdate =         "Wed Jan 30 17:45:01 MST 2002",
  bibsource =       "http://www.acm.org/pubs/contents/journals/cacm/",
  acknowledgement = ack-nhfb,
}

Bruce Schneier also discussed this in the 15-Mar-2001, 15-Jul-2001, 15-Sep-2001, and 15-Nov-2001 issues of the CRYPTO-GRAM newsletter (available at http://www.counterpane.com/crypto-gram.html ) and gave these links for more info:

http://www.theregister.co.uk/content/55/21573.html
http://www.securityfocus.com/bid/3461
http://www.counterpane.com/crypto-gram-0007.html#9
http://www.securityfocus.com/focus/ids/articles/utf8.html

Nelson H. F. Beebe                      Tel: +1 801 581 5254
Center for Scientific Computing         FAX: +1 801 585 1640, +1 801 581 4148
University of Utah                      Internet e-mail: [EMAIL PROTECTED]
Department of Mathematics, 322 INSCC    [EMAIL PROTECTED] [EMAIL PROTECTED]
155 S 1400 E RM 233                     [EMAIL PROTECTED]
Salt Lake City, UT 84112-0090, USA      URL: http://www.math.utah.edu/~beebe
Re: Re[2]: Unicode and Security
Asmus is absolutely right about Latin, Greek and Cyrillic. And the response that Unicode should be encoding glyphs instead of characters is, in the least, misguided. No character encodings have ever been predicated on that. For an example of how many glyphs are available just for the letter A, look at: http://www.macchiato.com/utc/glyph_variation.html There have been attempts to develop glyph standards (AFII was one). All have floundered. Mark — Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο πάντα — Ὁμήρου Μαργίτῃ [For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr] http://www.macchiato.com - Original Message - From: Philipp Reichmuth [EMAIL PROTECTED] To: Asmus Freytag [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Friday, February 08, 2002 09:18 Subject: Re[2]: Unicode and Security Hello Asmus and others, I'm not sure Unicode can be fixed at this point. The flaws may be too deeply embedded. The real solution may involve waiting until companies and people start losing significant amounts of money as a result of the flaws in Unicode, and then throwing it away and replacing it with something else. AF This sounds nice and dramatic, but misses the point that the kinds of AF issues you highlighted are absolutely common to *all* character sets AF containing Latin and Greek, or Latin and Cyrillic characters, suggesting AF that you are simply grandstanding here, instead of trying to find real AF solutions to your problem. Oh, it is very well possible to design a character set that supports all of Latin, Cyrillic and Greek without being susceptible to this problem beyond the familiar 1-l-|, 0-O dimension. The main premise is to encode glyphs instead of characters so that one glyph A is used in all three of these alphabets. Roundtrip compatibility with legacy character sets would be a problem, though. It looks like there is the decision between kludge A (roundtrip compatibility missing) and kludge B (easier spoofability). 
However, for URLs etc., roundtrip compatibility is not really necessary, I think. AF Earlier, you accused Unicode of being in denial about security AF issues: It is you who is in denial about some underlying AF realities, among which is that there are security issues that AF cannot be fixed by designing a 'better' character set. I am sure they can be fixed by designing a better character set that is better suited to a given problem. A lot of problems can be avoided by regarding a character set as an application-specific entity to some extent. This is not what we want, of course; we want a universal encoding across all applications. This being our premise, the resulting problems which you cannot possibly deny will have to be dealt with in one way or the other. To me, it seems a better idea to fix problems that arise directly from the way we encode our characters already on the character set level as far as possible, even if it just means notifying people that mixing characters from different alphabets may lead to misinterpretations and to denote common glyph similarities in the standard, such as the glyph A or for that part the character A being indiscernible in several alphabets. Philippmailto:[EMAIL PROTECTED] ___ Seeing my great fault / Through darkening blue windows / I begin again
Reversible bidi (wrote RE: Unicode and Security)
Otto Stolz wrote:

Gaspar Sinai wrote:
Just because some companies who have influence on Unicode Consortium use some algorithm, like backing store and re-mapping, it does not mean that this is the only way. [...] Yudit does convert the input to view order and back.

Now, this reveals the real problem. From this description, I gather that Gaspar's editor does not preserve the backing store, hence it has to reconstruct it from the rendering. As the rendering process is an n-to-1 mapping, its reverse is, intrinsically, ambiguous. So, the attempt to reconstruct the original character sequence from the visual appearance is bound to fail, in the general case.

Dankeschön, Otto! I have been wondering for all the duration of this discussion what the heck Gaspar and everybody else were talking about. Now I begin to understand. Could we please drop all this garbage about security (this is not the Anti-fraud Mailing List!) and talk about this implementation problem?

As I see it, dropping the backing store after running the bidi algorithm is not necessarily a bad idea. But a condition must be respected: each character's *embedding* levels and *override* information should be preserved together with the text. With this additional data in hand, it is not impossible to define a *reversed* Bidi algorithm which effectively recovers the backing store from the visual order. Roozbeh Pournader, I, and other people have discussed this at length on this list, and a very similar algorithm is actually implemented as part of ICU.

Such a reversed Bidi technique does not necessarily restore a bit-wise copy of the original backing store. However, the resulting backing store is guaranteed to (a) have the same logical order as the original and (b) have the same nesting of bidi embedding and overrides. The only things that this approach drops are redundant bidi controls (such as a LTR embedding within an already LTR segment), but is this all bad?
Even the John Cowan's example becomes perfectly unambiguous, if the bidi embedding levels are retained: Case 1: From visual order: the Arabs = BARA-LA And bidi levels:222 Get logical order: the Arabs = AL-ARAB Case 2: From visual order: the Arabs = BARA-LA And bidi levels:322 Get logical order: AL-ARAB = the Arabs It is not perfectly clear whether this approach is more or less functional than the traditional approach of maintaining the backing store. What is important, is that the two techniques have the same result. My impression is that, although this reverse bidi requires more processing (text must undergo two bidi algorithms vs. one), it makes the editing of text a little bit easier, both for the programmer and for the user. Roozbeh and I also considered that, as the embedding level are available during the editing process, it would also be possible to *display* them (e.g., in the form of stacks of horizontal arrows drawn under the text), and this would make clear to the user the exact reading order. _ Marco
RE: Reversible bidi (wrote RE: Unicode and Security)
I (Marco Cimarosti) wrote:

Even the John Cowan's example becomes perfectly unambiguous, if the bidi embedding levels are retained:

Case 1:
From visual order: the Arabs = BARA-LA
And bidi levels:222
Get logical order: the Arabs = AL-ARAB

Case 2:
From visual order: the Arabs = BARA-LA
And bidi levels:322
Get logical order: AL-ARAB = the Arabs

Errata: the bidi levels in the examples should actually be lowered by one level. I must also add that the paragraph's overall embedding level should be retained as well, although this would almost always be identical to the lowest embedding level in the text.

So, here is the correction:

Even John Cowan's example becomes perfectly unambiguous, provided that the bidi embedding levels are retained:

Case 1:
From visual order: the Arabs = BARA-LA
And bidi levels:111
And paragraph level:0
You get logical order: the Arabs = AL-ARAB

Case 2:
From visual order: the Arabs = BARA-LA
And bidi levels:211
And paragraph level:1
You get logical order: AL-ARAB = the Arabs

_ Marco
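A minimal sketch of the reversal described above, offered as an illustration rather than as the algorithm ICU or anyone on this list actually implements. It assumes that each visual-order character carries its resolved embedding level, and it simply undoes the reordering rule of the bidi algorithm (reverse every run at a given level or higher) by replaying the reversals from the lowest odd level upward. The function name visual_to_logical and the per-character level lists are assumptions made for the example; the level strings in the message above were damaged in the archive, so the lists below are one reading of them (Arabic at level 1 in an LTR paragraph, English at level 2 in an RTL paragraph). The paragraph level itself is not needed for this toy case and is ignored.

```python
def visual_to_logical(visual, levels):
    """Recover logical order from visual order plus per-character levels.

    The display reordering reverses runs from the highest level down to the
    lowest odd level; each reversal is its own inverse, so replaying them
    in the opposite order (lowest odd level first) undoes the reordering.
    """
    if len(visual) != len(levels):
        raise ValueError("need exactly one level per character")
    odd_levels = [lvl for lvl in levels if lvl % 2 == 1]
    if not odd_levels:
        return visual  # purely left-to-right: nothing was reordered
    pairs = list(zip(visual, levels))
    for level in range(min(odd_levels), max(levels) + 1):
        i = 0
        while i < len(pairs):
            if pairs[i][1] >= level:
                # Find the end of this run of characters at `level` or higher.
                j = i
                while j < len(pairs) and pairs[j][1] >= level:
                    j += 1
                pairs[i:j] = pairs[i:j][::-1]
                i = j
            else:
                i += 1
    return "".join(ch for ch, _ in pairs)


VISUAL = "the Arabs = BARA-LA"

# Case 1 (assumed levels): LTR paragraph, Arabic run at level 1.
case1 = visual_to_logical(VISUAL, [0] * 12 + [1] * 7)

# Case 2 (assumed levels): RTL paragraph, English run at level 2.
case2 = visual_to_logical(VISUAL, [2] * 9 + [1] * 10)

print(case1)
print(case2)
```

The same visual string yields two different logical strings depending only on the levels, which is the point of the example: the levels carry exactly the information that the rendering alone loses.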
RE: Unicode and Security
Gaspar Sinai wrote:

I am thinking about electronically signed Unicode text documents that are rendered correctly or believed to be rendered correctly, still they look different, seem to contain additional or do not seem to contain some text when viewed with different viewers due to some ambiguities inherent in the standard.

This sounds like a rendering (application) issue, not a character encoding (Unicode) issue. If the application or operating environment doesn't properly support complex script rendering (and/or if the client doesn't have the right fonts installed) then text in complex scripts might be rendered incorrectly, or not at all. Chances are such text would either be nonsensical, look like gobbledegook, or display as a string of empty boxes indicating missing glyphs. Would you sign something like that?

Can you give an example of some text or document a person might be fooled into signing that would mean one thing if rendered correctly and something entirely different when rendered incorrectly?

- Chris
RE: Unicode and Security
John Hudson wrote:

I can make an OpenType font that uses contextual substitution to replace the phrase 'The licensee also agrees to pay the type designer $10,000 every time he uses the lowercase e' with a series of invisible non-spacing glyphs. Of course, the backing store will contain my dastardly hidden clause and that is the text the unwitting victim will electronically sign. Hahahaha, he laughed maniacally!

How about a font that displays any number following a dollar sign as only 10% of the actual value in the backing text?

As John pointed out, this sort of thing isn't a Unicode problem. One could just as easily employ the same kind of hidden rendering rules with ASCII text. The only way to prevent this sort of fraud altogether would be to throw out complex script rendering and encode glyphs, not characters... I don't think anyone seriously wants to go back down that route, and anyway it would probably take decades and a huge effort to make such a standard properly covering all the scripts already in Unicode, and there would undoubtedly still be other problems.

There are plenty of ways paper documents can be altered, added to or just plain forged by someone intent on fraud, some of them extremely difficult to detect. I don't know, but it's probably safest to assume that the situation is similar with electronic documents, whatever security systems are in place. That's one reason why you should always keep a duplicate copy of any contract you sign, whether it's an electronic document you digitally sign or a paper document you sign with a pen.

- Chris
Re: Unicode and Security
At 11:54 AM -0700 2/6/02, John H. Jenkins wrote:

Right, but the problem right now is that people are typing things like www.whitehouse.com instead of www.whitehouse.gov (or, for that matter, www.unicode.com). How likely is it that someone will accidentally type www.s?mple.com instead of www.sample.com?

Somebody could easily follow a link to such a site, possibly through a pop-up or some spyware installed on their system, and never realize they weren't at the actual site.

Security and spoofing are very real issues that were never, as far as I know, even considered in the design of Unicode. It's unclear whether or not the problem can be fixed now. The Unicode community has been in serious denial about this for some time. That other technologies also have or contribute to these problems in no way absolves Unicode of its problems.

--
Elliotte Rusty Harold | [EMAIL PROTECTED] | Writer/Programmer
The XML Bible, 2nd Edition (Hungry Minds, 2001)
http://www.ibiblio.org/xml/books/bible2/
http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/
Read Cafe au Lait for Java News: http://www.cafeaulait.org/
Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/
Re: Unicode and Security
* Elliotte Rusty Harold
|
| Security and spoofing are very real issues that were never, as far
| as I know, even considered in the design of Unicode. It's unclear
| whether or not the problem can be fixed now. The Unicode community
| has been in serious denial about this for some time. That other
| technologies also have or contribute to these problems in no way
| absolves Unicode of its problems.

Could you explain what the problem is, as you see it? I've heard mumblings about this from various directions for a long time, but could never make any sense out of them. Is there a problem? If so, what is it?

It seems to me that as security problems go C/C++ is infinitely worse than Unicode, and anyone at all serious about security should start there, rather than with character sets.

--
Lars Marius Garshol, Ontopian         URL: http://www.ontopia.net
ISO SC34/WG3, OASIS GeoLang TC        URL: http://www.garshol.priv.no
Re: Unicode and Security
I've been thinking about security issues in Unicode, and I've come up with one that's quite scary and worse than any I've heard before. It uses only plaintext, no fonts involved, doesn't require buggy software, and works over e-mail instead of the Web. All it requires added to the existing infrastructure is internationalized domain names. So in the hope that this becomes a self-defeating prophecy, here's the scenario: I as a reporter or industrial spy or detective working on a divorce case, have learned the identities and internal e-mail addresses of two people, call them Alice and Bob, at Microsoft (or just about any other large company). I've somehow communicated with these people personally, for instance on an e-mail list completely unrelated to work but for which they use their work e-mail so I'm familiar with their style and signature files. Or perhaps, I've communicated with them on work related matters before. In any case, it's not hard to get two people who know each other at a large company to send you e-mail. Of course, they would presumably be careful not to give me secret company information since they know they're talking to an outsider. For the sake of argument, let's call the company they work at Microsoft, but this attack could hit most companies with a .com address. Let's say I register microsoft.com, only the fifth letter isn't a lower-case Latin o. It's actually a lower case Greek omicron. I then forge a believable letter from [EMAIL PROTECTED] to [EMAIL PROTECTED] saying Can you please update me on your budget? Bob, noticing that the e-mail appears to come from Alice, whom he knows and trusts, fires off a reply with his confidential information. Only it doesn't go to Alice. It goes to me. I can then reply to Bob, asking for clarification or more details. I can ask him to attach the latest build of his software. I can carry on a conversation in which Bob believes me to be Alice and spills his guts. This is very, very bad. 
E-mail forgery has been a problem for a long time, but it's always been one-way. You couldn't trick somebody into sending you a reply because doing so required using a different e-mail address than the one they expected, thus revealing the message as forged. With a Unicode enabled mailer, that's no longer true. If the fonts Bob (not me, but Bob) chooses for his e-mail program do not make a clear distinction between an o and an omicron, this works. There are lots of other attacks. The Cyrillic and Greek alphabets provide lots of options for replacing single letters in Latin domain names. I'm not sure whether or not the internationalized domain names working group has fully grokked this or not. Like Unicode, they seem to be trying to pass the buck. In particular, they state http://www.ietf.org/internet-drafts/draft-ietf-idn-requirements-09.txt: Specifying requirements for internationalized domain names does not itself raise any new security issues. However, any change to the DNS MAY affect the security of any protocol that relies on the DNS or on DNS names. A thorough evaluation of those protocols for security concerns will be needed when they are developed. In particular, IDNs MUST be compatible with DNSSEC and, if multiple charsets or representation forms are permitted, the implications of this name-spoof MUST be throughly understood. In other words, it's not our fault. Blame the client software. Sounds distressingly like the Unicode Consortium's approach to these issues. Interestingly, my attack works with a single character representation (Unicode). It is not dependent on multiple charsets. I don't know if the IDN working group has thought of this problem. I hope they have, and consider it their responsibility to prevent. I also hope the Unicode consortium and vendors of client software think about these problems. But I don't think we can count on client software getting this right. (Hell, Microsoft, can't even stop e-mail from running scripts.) 
The problem needs to be fixed closer to the source.

--
Elliotte Rusty Harold
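To make the scenario above concrete, here is a small sketch showing that the two domain names are distinct strings even though many fonts draw them identically. It assumes nothing beyond the Python standard library; the helper names describe and differing_positions are invented for this illustration. The last step uses Python's built-in "idna" codec, which follows the older IDNA 2003 rules, purely to show that the spoofed name would travel on the wire in a visibly different ASCII-compatible form; that is an assumption about one possible deployment, not a statement of what the IDN working group will specify.

```python
import unicodedata


def describe(text):
    """List each character as 'U+XXXX NAME' so lookalikes become visible."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def differing_positions(a, b):
    """Return (index, char_in_a, char_in_b) wherever two strings differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


genuine = "microsoft.com"
spoofed = "micr\u03bfsoft.com"  # fifth letter is GREEK SMALL LETTER OMICRON

print(genuine == spoofed)                     # the strings are not equal
print(differing_positions(genuine, spoofed))  # only position 4 differs
print(describe(spoofed)[4])

# ASCII-compatible encodings (Python's IDNA 2003 codec): the genuine name is
# unchanged, while the spoofed label gains an "xn--" prefix.
genuine_ace = genuine.encode("idna")
spoofed_ace = spoofed.encode("idna")
print(genuine_ace, spoofed_ace)
```

A mail client that showed the code point names, or the encoded form, next to any address containing non-ASCII letters would give Bob a chance to notice the substitution; whether real clients would do so is exactly the open question in this thread.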
Re: Unicode and Security
On Thu, Feb 07, 2002 at 10:34:20AM -0500, Elliotte Rusty Harold wrote:

Security and spoofing are very real issues that were never, as far as I know, even considered in the design of Unicode.

Unicode is a character encoding, not a glyph encoding. Furthermore, it's a superset of a number of preexisting character sets, so that it was possible for those users to move to Unicode without problems. Since important preexisting character sets separated Greek, Cyrillic and Latin scripts, Unicode had to. Had Unicode not chosen to follow these principles, ISO 10646 would have, and it would have become the dominant character set, with the same problems.

In any case, what is your solution? When the American Mathematical Society says "We need a SMALL CIRCLE for the mathematical texts", do you say "no, we already have the unified LATGRKCRY SMALL O"? After they show you that the two are distinct characters in their texts, do you still refuse because someone might get confused? The Universal Character Set can't afford to not encode characters like that.

--
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing with the youth. -- Information Society, Peace and Love, Inc.
Re: Unicode and Security
At 12:22 -0500 2002-02-07, Elliotte Rusty Harold wrote: For the sake of argument, let's call the company they work at Microsoft, but this attack could hit most companies with a .com address. Let's say I register microsoft.com, only the fifth letter isn't a lower-case Latin o. It's actually a lower case Greek omicron. I then forge a believable letter from [EMAIL PROTECTED] to [EMAIL PROTECTED] saying Can you please update me on your budget? Bob, noticing that the e-mail appears to come from Alice, whom he knows and trusts, fires off a reply with his confidential information. Only it doesn't go to Alice. It goes to me. I can then reply to Bob, asking for clarification or more details. I can ask him to attach the latest build of his software. I can carry on a conversation in which Bob believes me to be Alice and spills his guts. This is very, very bad. It isn't Unicode's fault that some letters look like others. That's a fault of history. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Unicode and Security
At 11:53 AM 2/7/02 -0600, David Starner wrote:

a superset of a number of preexisting character sets, so that it was possible for those users to move to Unicode without problems. Since important preexisting character sets separated Greek, Cyrillic and Latin scripts, Unicode had to. Had Unicode not chosen to follow these principles, ISO 10646 would have, and it would have become the dominant character set, with the same problems.

Actually, this discussion ignores that, in order to be workable, a character set standard for *cased* scripts must support context-free case transitions. That's why B, B, and B need to be separated, since they lowercase into the three different characters 'b', 'beta' and 'small B'. That they are also considered to come from different scripts just reinforces that argument.

However, the Latin character that looks like a capital D with stroke can lowercase into a straight 'd with stroke' or a curly form, which is an Icelandic letter. As long as the two lowercase forms aren't unified, and little speaks in favor of that, least of all legacy, then the two uppercase forms must be separated as well.

The one exception that survived (Turkish I) is causing innumerable problems, which supports the rule I gave at the outset.

Any workable multilingual character set containing these characters will allow spoofing on the character level, and all existing ones (including 8859-7 for Latin/Greek, for example) do.

But, as the discussion shows, spoofing on the word level (.com for .gov) is alive and well, and supported by any character set whatsoever. For that reason, it seems to promise little gain to try to chase the holy grail of a multilingual character set that somehow avoids the character level spoofing, if the word level spoofing can go on unchecked.

A./
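The case-pairing argument above can be checked directly. This is a minimal sketch, assuming that Python's str.lower() and the unicodedata module reflect the default, context-free Unicode case mappings; the helper name lowercase_pairs is hypothetical and exists only for this example.

```python
import unicodedata


def lowercase_pairs(chars):
    """Map each capital letter's Unicode name to the name of its lowercase form."""
    return {unicodedata.name(ch): unicodedata.name(ch.lower()) for ch in chars}


# Three capitals that most fonts draw with the same glyph.
lookalike_b = lowercase_pairs(["\u0042", "\u0392", "\u0412"])

# Two capitals that look like "D with stroke" but lowercase differently.
lookalike_d = lowercase_pairs(["\u0110", "\u00D0"])

for upper, lower in {**lookalike_b, **lookalike_d}.items():
    print(f"{upper} -> {lower}")
```

Because each capital has exactly one lowercase partner, lowercasing needs no knowledge of language or context. The same built-in mapping turns "I" into "i" and never into the dotless form Turkish expects, which is the surviving exception mentioned above.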
Re: Unicode and Security
At 12:22 PM 2/7/2002 -0500, Elliotte Rusty Harold wrote:

I've been thinking about security issues in Unicode, and I've come up with one that's quite scary and worse than any I've heard before. It uses only plaintext, no fonts involved, doesn't require buggy software, and works over e-mail instead of the Web. All it requires added to the existing infrastructure is internationalized domain names. So in the hope that this becomes a self-defeating prophecy, here's the scenario:

<snip>

Can you please update me on your budget? Bob, noticing that the e-mail appears to come from Alice, whom he knows and trusts, fires off a reply with his confidential information. Only it doesn't go to Alice. It goes to me. I can then reply to Bob, asking for clarification or more details. I can ask him to attach the latest build of his software. I can carry on a conversation in which Bob believes me to be Alice and spills his guts. This is very, very bad.

This is precisely the problem digital signing is meant to solve. Signing means that Alice has encrypted the message with her private key before sending to Bob. Bob then decrypts the message using Alice's public key. If the message does not decrypt, then Bob should not trust that the message is from Alice. This algorithm works independently of transport mechanism (email, etc.) or domains. Alice's key stays with Alice, not with the domain.

Of course, how you exchange trusted keys in the first place is another matter, but I am sure this is all covered in a security FAQ somewhere.

E-mail forgery has been a problem for a long time, but it's always been one-way. You couldn't trick somebody into sending you a reply because doing so required using a different e-mail address than the one they expected, thus revealing the message as forged.

There are many, many ways to get a response from someone via email, even if the address is not recognized or forged. Most involve social engineering approaches more than anything else. My mailbox filled with spam will attest to that!

With a Unicode enabled mailer, that's no longer true. If the fonts Bob (not me, but Bob) chooses for his e-mail program do not make a clear distinction between an o and an omicron, this works. There are lots of other attacks. The Cyrillic and Greek alphabets provide lots of options for replacing single letters in Latin domain names.

Unless all messages are signed (technically feasible), then there is no trust at all. When Outlook/Exchange supports, in fact requires, messages to be signed, then this problem will start to dwindle away, at least in the email realm.

Of course if there is a method to judge the level of trust for properly signed messages that arrive from folks you don't know (a human fallibility), then knowing the origin of the message might not help much either. My inbound spam can be verifiably signed, but it is still spam.

In other words, it's not our fault. Blame the client software. Sounds distressingly like the Unicode Consortium's approach to these issues. Interestingly, my attack works with a single character representation (Unicode).

Your attack is only a social engineering attack, not a technical weakness inherent in any protocol or character set (even though there may be such issues).

Barry
Re: Unicode and Security
On Thu, Feb 07, 2002 at 12:22:18PM -0500, Elliotte Rusty Harold wrote: Interestingly, my attack works with a single character representation (Unicode). It is not dependent on multiple charsets. It also works with EUC-JP (and other Japanese charsets), all 8-bit Russian representations, all 8-bit Greek representations . . . The problem needs to be fixed closer to the source. How about a solution that doesn't involve the destruction of Unicode as a useful tool? The fact that MD5 sums matching doesn't prove that the files match is not a bug in MD5 sums. Likewise, the fact that glyphs may look alike in a _character_ is not a bug in the character encoding. -- David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber) Pointless website: http://dvdeug.dhis.org What we've got is a blue-light special on truth. It's the hottest thing with the youth. -- Information Society, Peace and Love, Inc.
Re: Unicode and Security
On Thu, Feb 07, 2002 at 10:34:20AM -0500, Elliotte Rusty Harold wrote: Unicode is a character encoding, not a glyph encoding. Furthermore, it's a superset of a number of preexisting character sets, so that it was possible for those users to move to Unicode without problems. Since important preexisting character sets seperated Greek, Cyrillic and Latin scripts, Unicode had to. Had Unicode not chosen to follow these principles, ISO 10646 would have, and it would have become the dominant character set, with the same problems. I know why these choices were made. That has nothing to do with the question of whether the finished product will or will not cause security breaches. In any case, what is your solution? When the American Mathematical Society says We need a SMALL CIRCLE for the mathematical texts, do you say no, we already have the unified LATGRKCRY SMALL O? After they show you that the two are distinct characters in their texts, do you still refuse because someone might get confused? The Universal Character Set can't afford to not encode characters like that. I'm not sure Unicode can be fixed at this point. The flaws may be too deeply embedded. The real solution may involve waiting until companies and people start losing significant amounts of money as a result of the flaws in Unicode, and then throwing it away and replacing it with something else. I don't like that solution, but not liking it doesn't mean it ain't gonna happen as soon as Exxon loses a few billion dollars because somebody spoofed them and thereby gained access to their bidding plans for oil leases. Don't be surprised when some large companies start issuing memos forbidding the use of Unicode, or blocking all non-ASCII domain names at their firewall. One possible solution at the domain name system level might be to limit domain names to a single Unicode block or group. For instance, Greek domain names could be allowed but not domain names that mix Greek with Latin. 
Similarly, you couldn't mix Latin with Cyrillic or Cyrillic with Greek. That would at least vastly reduce the possibility for domain spoofing, if not eliminate it entirely.

Interesting tidbit: app1e.com (not APPLE.COM but APP1E.COM) is in fact already registered. This attack may not be as theoretical as I initially thought.

--
Elliotte Rusty Harold
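The single-script restriction suggested above could be sketched roughly as follows. This is a heuristic illustration only: Python's standard library does not expose the Unicode Script property, so the sketch guesses a character's script from the first word of its character name, and it applies the rule to each dot-separated label separately so that a Greek name under .com is not rejected for its Latin suffix. The names SCRIPTS, scripts_in_label and is_single_script are assumptions made for the example, not part of any IDN draft.

```python
import unicodedata

# Scripts discussed in this thread; anything else is lumped together as OTHER.
SCRIPTS = ("LATIN", "GREEK", "CYRILLIC")


def scripts_in_label(label):
    """Guess the set of scripts used by the letters of one domain label."""
    found = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue  # digits and hyphens are shared by all scripts
        first_word = unicodedata.name(ch, "").split(" ", 1)[0]
        found.add(first_word if first_word in SCRIPTS else "OTHER")
    return found


def is_single_script(domain):
    """True if no label of the domain mixes letters from different scripts."""
    return all(len(scripts_in_label(label)) <= 1 for label in domain.split("."))


print(is_single_script("microsoft.com"))        # all Latin
print(is_single_script("micr\u03bfsoft.com"))   # Latin mixed with a Greek omicron
print(is_single_script("app1e.com"))            # digit lookalike, still one script
```

The last case shows the limit of the approach: app1e.com passes, because the lookalike there is a digit rather than a letter from another script. A label written entirely in lookalike letters from one non-Latin script would pass as well.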
RE: Unicode and Security
I'm not sure Unicode can be fixed at this point. The flaws may be too deeply embedded. The real solution may involve waiting until companies and people start losing significant amounts of money as a result of the flaws in Unicode, and then throwing it away and replacing it with something else. I don't like that solution, but not liking it doesn't mean it ain't gonna happen as soon as Exxon loses a few billion dollars because somebody spoofed them and thereby gained access to their bidding plans for oil leases. Don't be surprised when some large companies start issuing memos forbidding the use of Unicode, or blocking all non-ASCII domain names at their firewall. Doom! Doom! Doom! End is nigh, repent ye sinners! Interesting tidbit: app1e.com (not APPLE.COM but APP1E.COM) is in fact already registered. This attack may not be as theoretical as I initially thought. Interestingly enough, I find this (and whitehouse.com and whitehouse.org, and micros0ft.com, and ...) a good example for Unicode being largely irrelevant. Sure, Unicode gives more possibilities for abuse, but I fail to see how a character encoding standard can stop people from being stupid and not using public keys or some other means of trust in cases where it matters. Analogously, people will keep opening executable attachments promising sex, regardless of whether the 's', 'e', and 'x' are Latin letters or not.
Re: Unicode and Security
At 10:16 AM -0800 2/7/02, Barry Caplan wrote: This is precisely the problem digital signing is meant to solve. Signing means that Alice has encrypted the message with her private key before sending to Bob. Bob then unencrypts the message using Alice's public key. If the message does not unencrypt, then Bob should not trust that the message is from Alice. This algorithm works independent of transport mechanism (email, etc.), or domains. Alice's key stays with Alice,not with the domain. Of course, how you exchange trusted keys in the first place is another matter, but I am sure this is all covered on a security FAQ somewhere. That's very nice in theory, but it's not the way people use e-mail in practice and it's not going to be. Microsoft, a company with a very technically literate employee base, might be able to implement this scheme (though I doubt it). A company like Exxon never could. The system's just too cumbersome. E-mail forgery has been a problem for a long time, but it's always been one-way. You couldn't trick somebody into sending you a reply because doing so required using a different e-mail address than the one they expected, thus revealing the message as forged. There are many many ways to get a response from someone via email, even if the address is not recognized or forged. Most involve social engineering approaches more than anything else. My mailbox filled with spam will attest the that! Yes, but that doesn't address the fact that this makes the problem far worse. With a Unicode enabled mailer, that's no longer true. If the fonts Bob (not me, but Bob) chooses for his e-mail program do not make a clear distinction between an o and an omicron, this works. There are lots of other attacks. The Cyrillic and Greek alphabets provide lots of options for replacing single letters in Latin domain names. Unless all messages are signed (technically feasible) , then there is no trust at all. 
When Outlook/Exchange supports, in fact requires, messages to be signed, then this problem will start to dwindle away, at least in the email realm.

Would that it were so, but it's not. As you suggest, people do trust e-mail even when they shouldn't. Trust is a human question decided by human beings, not a boolean answer that comes out of a computer algorithm. I can trust that the message I'm replying to came from a person named Barry Caplan even if I have no proof of that whatsoever.

Of course if there is a method to judge the level of trust for properly signed messages that arrive from folks you don't know (a human fallibility), then knowing the origin of the message might not help much either. My inbound spam can be verifiably signed, but it is still spam.

In other words, it's not our fault. Blame the client software. Sounds distressingly like the Unicode Consortium's approach to these issues. Interestingly, my attack works with a single character representation (Unicode).

Your attack is only a social engineering attack, not a technical weakness inherent in any protocol, or character set (even though there may be such issues)

Technical systems can be more or less resistant to social engineering attacks. It is the task of the system designers to make the system more resistant. I'm reminded of an IBM mainframe system about a decade ago where it was possible to change your password by appending a slash and the new password to the old password when logging in. Few users knew this but hackers did. It wasn't very hard to convince a user on the phone that they needed to set their account to debugging mode by logging in and appending /DEBUG to the password. This had the effect of changing their account password to DEBUG, which you knew and they didn't.
(It's been awhile, but I vaguely recall that this was the hack Phiber Optic used to break into the New York City Public School System computers) This particular system was poorly designed and thus vulnerable to a social engineering attack. But make no mistake: it was very much a design flaw in the system. It was not the user's fault for not knowing about an obscure option to change their password at the login prompt. It should not have been there in the first place, and once discovered it needed to be taken out. That there were other social engineering attacks on the system didn't change the need to fix this problem. Design choices have security consequences. It is not enough to claim that your system is secure when used properly or when implemented properly. The system must be designed in such a way that it is natural to use it properly and it is easy to implement properly. Furthermore, failure to do so should be obvious. When a system is being used incorrectly, the problem needs to be brutally obvious. In Unicode, it is not. -- +---++---+ | Elliotte Rusty Harold | [EMAIL PROTECTED] | Writer/Programmer |
Re: Unicode and Security
At 11:34 AM -0800 2/7/02, Asmus Freytag wrote: But, as the discussion shows, spoofing on the word level (.com for .gov) is alive and well, and supported by any character set whatsoever. For that reason, it seems to promise little gain to try to chase the holy grail of a multilingual character set that somehow avoids the character level spoofing, if the word level spoofing can go on unchecked. Burglary at the broken window level is alive and well. Therefore there's little point to putting locks on doors. I hope the fallacy of the above is obvious, but when translated into the computer security domain it's all too common a rationalization, as this thread demonstrates. There are many ways to socially engineer someone into doing something they shouldn't do. This is just one of them, and one that's mostly theoretical at the current time. However, we still need to plug the hole. That there are other, less damaging holes (or even more damaging ones) is no excuse for not fixing this one. Just to pull a number out of a hat, imagine there are 10,000 attacks a day using spoofing in the current system. Is this any justification for opening up a hole that will add 10,000 more? Of course it's not. -- +---++---+ | Elliotte Rusty Harold | [EMAIL PROTECTED] | Writer/Programmer | +---++---+ | The XML Bible, 2nd Edition (Hungry Minds, 2001) | | http://www.ibiblio.org/xml/books/bible2/ | | http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/ | +--+-+ | Read Cafe au Lait for Java News: http://www.cafeaulait.org/ | | Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/ | +--+-+
Re: Unicode and Security
On Thu, Feb 07, 2002 at 01:21:29PM -0500, Elliotte Rusty Harold wrote: I'm not sure Unicode can be fixed at this point. The flaws may be too deeply embedded. The real solution may involve waiting until companies and people start losing significant amounts of money as a result of the flaws in Unicode, and then throwing it away and replacing it with something else. What else? As we keep pointing out, almost every character in Unicode that normally has the same glyph as another is in Unicode with good reason. To change that to something that would fit your goals will cost billions right now just for the change, and then you end with a character set that can't round trip all the others in common use, and that is more painful to use for Greeks and Russian, and completely unusable for mathematicians. I seriously doubt the world would go to a massively inferior character set because of the security holes you're talking about. -- David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber) Pointless website: http://dvdeug.dhis.org What we've got is a blue-light special on truth. It's the hottest thing with the youth. -- Information Society, Peace and Love, Inc.
Re: Unicode and Security
At 12:28 PM -0800 2/7/02, John Hudson wrote: 1. The software industry has already devised mechanisms to protect against e-mail forgery, e.g. private-public key encryption. And nobody uses them because they're too complex. 2. What you describe is criminal fraud and there are laws to protect against such 'spoofing' and to punish those who perpetrate it. Yes, there are. And there are laws to protect against spam and denial of service attacks and cracking systems. For that matter there are laws to protect against burglary, but I still have locks on my front door. Laws are no substitute for prevention. -- Elliotte Rusty Harold
Re: Unicode and Security
Elliotte Rusty Harold wrote: Interesting tidbit: app1e.com (not APPLE.COM but APP1E.COM) is in fact already registered. This attack may not be as theoretical as I initially thought. And a0l.com. (It even has a website: www.a0l.com.) [Which, incidentally, induces a security hazard, by attempting to download a free/10 sec download that fills forms and brings offers based on websites you visit, offered by The Gator Corporation -- even authenticated by Verisign. But would *you* trust a free download offered by The Gator Corporation, even if it is really them?? Those guys are bigtime ad-spy facilitation gators, all right.] But this kind of stuff is old news, as other examples raised by people have shown. The problem needs to be fixed closer to the source. It simply isn't practical to try to fix the problem of visual spoofing by trying to prevent it in the character encoding, or for that matter in the text-based protocols. As Barry Caplan pointed out, there are other, more robust means of determining trusted identity, to foil the cases like your Bob and Alice email scenario. But of course, the best technology won't prevent stupid, gullible people from falling into traps set for them by unscrupulous, cunning scam artists. And nobody is going to keep the dedicated industrial or military spies from finding ways to crack supposedly secure systems, if for no other reason than secure systems are administered by fallible humans. The Unicode community has been in serious denial about this for some time. That other technologies also have or contribute to these problems in no way absolves Unicode of its problems. Well, yeah, I guess I'm still in denial. 
*hehe* As Asmus, and any number of other people on this list have pointed out, the same problem of Latin/Greek/Cyrillic letter spoofing that has you so worried is present in any number of other 8-bit character sets, and because of the nature of the writing systems, the nature of case, and the requirements on textual processing, would still be present in any alternative to Unicode that some other character encoding committee could come up with. Even if we sat down to do it all over again, with a big Security Is Our Primary Concern! banner posted on the wall for every committee meeting, Unicode 2 would still end up with separate Latin, Greek, and Cyrillic alphabets encoded. Not to do so would make any proposed new standard crash and burn before it left the runway. The only widely-deployed alternative approach I know of is ETSI GSM 03.38 (used in mobile telephony), which has Greek (uppercase only) added to Latin, using the same codes for the uppercase Greek letters which look like Latin (ABEHIKMNOPTXYZ). But this approach is so patently nonextensible and so unworkable for any significant text processing requirements, that SC2, ANSI, or other major players in character encoding have never seriously considered such an approach for character encoding. So perhaps turnabout is fair play here. I'd say that a certain portion of the security community has been in serious denial about the nature of character encodings for some time. --Ken
Re: Unicode and Security
Thursday, February 7, 2002 Would making the about to be misled respondent type the address of the intended person (with a roman 'o', not a greek omicron) and then having the system see if they match detect and thwart such tricks? The respondent is already typing so it's not a large extra burden. Regards, Jim Agenbroad (disclaimer and addresses at bottom) On Thu, 7 Feb 2002, Michael Everson wrote: At 12:22 -0500 2002-02-07, Elliotte Rusty Harold wrote: For the sake of argument, let's call the company they work at Microsoft, but this attack could hit most companies with a .com address. Let's say I register microsoft.com, only the fifth letter isn't a lower-case Latin o. It's actually a lower case Greek omicron. I then forge a believable letter from [EMAIL PROTECTED] to [EMAIL PROTECTED] saying Can you please update me on your budget? Bob, noticing that the e-mail appears to come from Alice, whom he knows and trusts, fires off a reply with his confidential information. Only it doesn't go to Alice. It goes to me. I can then reply to Bob, asking for clarification or more details. I can ask him to attach the latest build of his software. I can carry on a conversation in which Bob believes me to be Alice and spills his guts. This is very, very bad. It isn't Unicode's fault that some letters look like others. That's a fault of history. -- Michael Everson *** Everson Typography *** http://www.evertype.com Regards, Jim Agenbroad ( [EMAIL PROTECTED] ) It is not true that people stop pursuing their dreams because they grow old, they grow old because they stop pursuing their dreams. Adapted from a letter by Gabriel Garcia Marquez. The above are purely personal opinions, not necessarily the official views of any government or any agency of any. Addresses: Office: Phone: 202 707-9612; Fax: 202 707-0955; US mail: I.T.S. Sys.Dev.Gp.4, Library of Congress, 101 Independence Ave. SE, Washington, D.C. 20540-9334 U.S.A. Home: Phone: 301 946-7326; US mail: Box 291, Garrett Park, MD 20896.
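The retype-and-compare check suggested at the top of this message can be sketched in a few lines of Python. This is only a rough illustration, not something proposed on the list: the function names and the sample domains are hypothetical, and treating "match" as code point equality after NFC normalization and case folding is an assumption.

```python
import unicodedata


def normalized(address: str) -> str:
    # Assumption: "match" means equal code points after NFC + case folding.
    return unicodedata.normalize("NFC", address).casefold()


def retyped_matches(displayed: str, typed: str) -> bool:
    """True if what the user typed is the same string the client would reply to."""
    return normalized(displayed) == normalized(typed)


def mismatch_positions(displayed: str, typed: str) -> list:
    """Indexes where the two strings differ (compared pairwise, shortest length)."""
    pairs = zip(normalized(displayed), normalized(typed))
    return [i for i, (a, b) in enumerate(pairs) if a != b]


# Hypothetical data: the reply address uses GREEK SMALL LETTER OMICRON (U+03BF)
# as the fifth letter; the user types a plain Latin 'o'.
spoofed_reply_to = "micr\u03bfsoft.com"
typed_by_user = "microsoft.com"

if not retyped_matches(spoofed_reply_to, typed_by_user):
    for i in mismatch_positions(spoofed_reply_to, typed_by_user):
        ch = spoofed_reply_to[i]
        print(f"position {i}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The check only helps if the user types the address from memory; pasting the displayed address back in would copy the omicron along with it.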
Re: Unicode and Security
At 11:42 2/7/2002, Elliotte Rusty Harold wrote: Burglary at the broken window level is alive and well. Therefore there's little point to putting locks on doors. I hope the fallacy of the above is obvious, but when translated into the computer security domain it's all too common a rationalization, as this thread demonstrates. I disagree. Suggesting that many of the benefits of the Unicode encoding model should be abandoned because they might be abused is like saying 'Burglary at the broken window level is alive and well. Therefore there's little point in possessing anything.' Of course there is a point to putting locks on doors, but that is analogous to putting locks on e-mail, not to obsessing about one potential security problem in one particular software standard. If you were able to fix all the 'flaws' in Unicode, you would a) be left with a less useful character encoding standard, b) still be facing all the remaining security holes in all the remaining software standards and applications, and c) have done nothing to combat user ignorance and gullibility just waiting to be taken advantage of. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] ... es ist ein unwiederbringliches Bild der Vergangenheit, das mit jeder Gegenwart zu verschwinden droht, die sich nicht in ihm gemeint erkannte. ... every image of the past that is not recognized by the present as one of its own concerns threatens to disappear irretrievably. Walter Benjamin
Re: Unicode and Security
I think this is probably beginning to get off-topic, but At 12:45 2/7/2002, Elliotte Rusty Harold wrote: 1. The software industry has already devised mechanisms to protect against e-mail forgery, e.g. private-public key encryption. And nobody uses them because they're too complex. I think fewer people use them than should not because they are too complex but because a) not enough people know about them and b) too many of the people who know about them believe them to be a lot more complex than they are. A few messages ago you suggested that some companies might introduce 'no Unicode' policies in order to protect against spoofing (despite the fact that many alternative character encodings would leave them equally vulnerable). I think it is far more likely that companies would introduce compulsory e-mail encryption and signing. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] ... es ist ein unwiederbringliches Bild der Vergangenheit, das mit jeder Gegenwart zu verschwinden droht, die sich nicht in ihm gemeint erkannte. ... every image of the past that is not recognized by the present as one of its own concerns threatens to disappear irretrievably. Walter Benjamin
Re: Unicode and Security
At 10:21 2/7/2002, Elliotte Rusty Harold wrote: I'm not sure Unicode can be fixed at this point. The flaws may be too deeply embedded. What flaws? The fact that glyphs in different scripts may be similar or identical in some typefaces, and misrepresentation is possible because Unicode separately encodes these glyphs as distinct characters? I'm sorry, but that is the nature of writing systems, and Unicode's encoding of these characters is inherited from existing standard character sets. Is this a flaw? Is this as great a flaw as glyph-based encoding would have been? Is it as great a flaw as hampering backwards compatibility with other encodings would have been? In your examples, you seem to ignore two things: 1. The software industry has already devised mechanisms to protect against e-mail forgery, e.g. private-public key encryption. 2. What you describe is criminal fraud and there are laws to protect against such 'spoofing' and to punish those who perpetrate it. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] ... es ist ein unwiederbringliches Bild der Vergangenheit, das mit jeder Gegenwart zu verschwinden droht, die sich nicht in ihm gemeint erkannte. ... every image of the past that is not recognized by the present as one of its own concerns threatens to disappear irretrievably. Walter Benjamin
Re: Unicode and Security
At 02:42 PM 2/7/2002 -0500, Elliotte Rusty Harold wrote: At 11:34 AM -0800 2/7/02, Asmus Freytag wrote: But, as the discussion shows, spoofing on the word level (.com for .gov) is alive and well, and supported by any character set whatsoever. For that reason, it seems to promise little gain to try to chase the holy grail of a multilingual character set that somehow avoids the character level spoofing, if the word level spoofing can go on unchecked. Burglary at the broken window level is alive and well. Therefore there's little point to putting locks on doors. I hope the fallacy of the above is obvious, but when translated into the computer security domain it's all too common a rationalization, as this thread demonstrates. It is not obvious to me that there is a fallacy at all, let alone what it is. Instead of stating that we should be able to infer the fallacy, please state it, and a possible solution explicitly. It seems to me we have already proposed working, and available (if not elegant) solutions to the issue of trust of content. Now the issue seems to be trust of domain names. My browser already has built in support for identifying groups of domains I can assign varying levels of trust to, based on certificate technology. Not elegant, but available. Similarly, something for email could be done using today's technology. More importantly, wrt DNS: under what circumstances can you, today, or in the future, actually trust that the address resolving information you get is accurate? None, really. The packets go too many places on the way that could change them. And even if it is accurate, which of course it usually is, how can you be sure that packets at a lower level will actually be delivered, as intended, and not misdirected or copied elsewhere? You can't, really, for the same reason. This is the nature of the system, especially at the IP level. 
None of this has the slightest bit to do with what characters are used for domain names, and hence will not go away with any changes to DNS. It has everything to do with why data should be encrypted if you care about security of data. There are many ways to socially engineer someone into doing something they shouldn't do. This is just one of them, and one that's mostly theoretical at the current time. However, we still need to plug the hole. That there are other, less damaging holes (or even more damaging ones) is no excuse for not fixing this one. The source code for bind is available. Go ahead and fix it. Good luck persuading people to upgrade such a mission critical part of the internet though. Just to pull a number out of a hat, imagine there are 10,000 attacks a day using spoofing in the current system. Is this any justification for opening up a hole that will add 10,000 more? Of course it's not. I still don't see the attack as anything but social engineering. That a telemarketer or door-to-door salesman can get my credit card info by misrepresenting their intent does not mean there is a flaw in either the phone numbering scheme, or the credit card system. Your attack is exactly analogous. Barry
Re: Unicode and Security
Kenneth Whistler wrote: The only widely-deployed alternative approach I know of is ETSI GSM 03.38 (used in mobile telephony), A truly bizarre character set: it supports English, French, mainland Scandinavian languages, Italian, Spanish with Graves, and GREEK SHOUTING. -- John Cowan [EMAIL PROTECTED] http://www.reutershealth.com I amar prestar aen, han mathon ne nen,http://www.ccil.org/~cowan han mathon ne chae, a han noston ne 'wilith. --Galadriel, _LOTR:FOTR_
Re: Unicode and Security: Domain Names
I note that companies like Verisign already claim to offer domain names in dozens of languages and scripts. Apparently these are converted by something called RACE encoding to ASCII for actual use on the internet. Does anyone know anything about RACE encoding and its properties?
RE: Unicode and Security: Domain Names
It is one of the competitors for internationalized domain names. The ACE stands for ASCII Compatible Encoding. The encoding which appears likely to gain overall acceptance is called DUDE and can be found here: http://www.i-d-n.net/draft/draft-ietf-idn-dude-02.txt There are several ACE encoding demos on the 'Net (Mark Davis has one at www.macchiato.com, I have one at www.inter-locale.com) http://www.i-d-n.net is where you can find out about a whole zoo of Unicode transfer encoding schemes proposed for use in DNS, plus the relevant issues, of which there turn out to be a number when creating I18n domain names. The early implementers have mostly ignored these issues and the interplay between the ultimate standard and existing registrars should be interesting. Regards, Addison Addison P. Phillips Globalization Architect / Manager, Globalization Engineering webMethods, Inc. | The Business Integration Company 432 Lakeside Drive, Sunnyvale, California, USA +1 408.962.5487 (phone) +1 408.210.3569 (mobile) - Internationalization is an architecture. It is not a feature. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Tom Gewecke Sent: Thursday, February 07, 2002 3:20 PM To: [EMAIL PROTECTED] Subject: Re: Unicode and Security: Domain Names I note that companies like Verisign already claim to offer domain names in dozens of languages and scripts. Apparently these are converted by something called RACE encoding to ASCII for actual use on the internet. Does anyone know anything about RACE encoding and its properties?
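To make concrete what an ASCII Compatible Encoding does, here is a small Python sketch. It does not implement RACE or DUDE. It uses Punycode (RFC 3492), a different ACE that happens to ship as a codec in Python's standard library, purely as a stand-in to show the property under discussion. The helper names `to_ace` and `from_ace` and the sample labels are hypothetical.

```python
def to_ace(label: str) -> str:
    """Encode one Unicode label as an ASCII-only string (Punycode as a stand-in ACE)."""
    return label.encode("punycode").decode("ascii")


def from_ace(ace: str) -> str:
    """Decode the ASCII form back to the original Unicode label."""
    return ace.encode("ascii").decode("punycode")


latin_label = "microsoft"
greek_o_label = "micr\u03bfsoft"  # fifth letter is GREEK SMALL LETTER OMICRON

for label in (latin_label, greek_o_label):
    ace = to_ace(label)
    # The ACE form is pure ASCII and round-trips to the original label.
    print(repr(label), "->", ace, "->", repr(from_ace(ace)))
```

The two labels may look identical on screen, but their ACE forms differ, so the DNS itself never confuses them. The confusion, if any, happens in what is displayed to the user.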
Re: Unicode and Security
At 01:21 PM 2/7/02 -0500, Elliotte Rusty Harold wrote: I'm not sure Unicode can be fixed at this point. The flaws may be too deeply embedded. The real solution may involve waiting until companies and people start losing significant amounts of money as a result of the flaws in Unicode, and then throwing it away and replacing it with something else. This sounds nice and dramatic, but misses the point that the kinds of issues you highlighted are absolutely common to *all* character sets containing Latin and Greek, or Latin and Cyrillic characters, suggesting that you are simply grandstanding here, instead of trying to find real solutions to your problem. Earlier, you accused Unicode of being in denial about security issues: It is you who is in denial about some underlying realities, among which is that there are security issues that cannot be fixed by designing a 'better' character set. You remind me of the people who keep on designing perpetual motion devices, even after the laws of thermodynamics proved the futility of such efforts. If you are interested in advancing security you would stop barking up this blind alley and focus your energy on attacking the problems with other means. Plenty of suggestions have been made in this space over the last few days. Some or all of these should be explored. But if we learned anything useful in this exchange, it is that no security scheme should be designed so that it is dependent on the character encoding as primary defense against spoofing. Doing so would burden the character encoding with a task it will never be capable of fulfilling, since it would mean seriously compromising support for the tasks for which it was created in the first place. A./
Re: Unicode and Security
On Thu, 7 Feb 2002, Elliotte Rusty Harold wrote: Trust is a human question decided by human beings, not a boolean answer that comes out of a computer algorithm. I can trust that the message I'm replying to came from a person named Barry Caplan even if I have no proof of that whatsoever. Or that the book you're reading has been written by a person named Nicolas Bourbaki... (Sorry, I love the idea. I could not stop myself.) roozbeh
Re: Unicode and Security
At 04:17 AM 2/8/2002 +0330, Roozbeh Pournader wrote: On Thu, 7 Feb 2002, Elliotte Rusty Harold wrote: Trust is a human question decided by human beings, not a boolean answer that comes out of a computer algorithm. I can trust that the message I'm replying to came from a person named Barry Caplan even if I have no proof of that whatsoever. Or that the book you're reading has been written by a person named Nicolas Bourbaki... (Sorry, I love the idea. I could not stop myself.) roozbeh On what basis can Elliotte know that a message purported to be from Barry Caplan actually is from Barry Caplan, or that there even is a Barry Caplan? The person writing this, who claims to be Barry Caplan, has never met anyone named Elliotte Rusty Harold to the best of his recollection. He (Barry Caplan) does claim to personally be acquainted with many others on this list though - hi - sorry I missed you in DC! :) Best Regards, Barry Caplan www.i18n.com - coming soon, preview available now News | Tools | Process for Global Software Team I18N
Solipsism (was RE: Unicode and Security)
What makes me think you exist, anyway? ;^) - rick (or so I say) -Original Message- From: Barry Caplan [mailto:[EMAIL PROTECTED]] Sent: Thursday, 7 February 2002 17:13 To: Unicode List Subject: Re: Unicode and Security At 04:17 AM 2/8/2002 +0330, Roozbeh Pournader wrote: On Thu, 7 Feb 2002, Elliotte Rusty Harold wrote: Trust is a human question decided by human beings, not a boolean answer that comes out of a computer algorithm. I can trust that the message I'm replying to came from a person named "Barry Caplan" even if I have no proof of that whatsoever. Or that the book you're reading has been written by a person named "Nicolas Bourbaki"... (Sorry, I love the idea. I could not stop myself.) roozbeh On what basis can "Elliotte" know that a message purported to be from "Barry Caplan" actually is from "Barry Caplan", or that there even is a "Barry Caplan"? The person writing this, who claims to be "Barry Caplan", has never met anyone named "Elliotte Rusty Harold" to the best of his recollection. He ("Barry Caplan") does claim to personally be acquainted with many others on this list though - hi - sorry I missed you in DC! :) Best Regards, Barry Caplan www.i18n.com - coming soon, preview available now News | Tools | Process for Global Software Team I18N
Re: Unicode and Security
At 4:31 PM -0500 2/7/02, James E. Agenbroad wrote: Thursday, February 7, 2002 Would making the about to be misled respondent type the address of the intended person (with a roman 'o', not a greek omicron) and then having the system see if they match detect and thwart such tricks? The respondent is already typing so it's not a large extra burden. Yes, that would probably work, though users would complain. Having the outgoing SMTP server drop all messages addressed to spoofs of the corporate domain works too on an enterprise level. And using message authentication based on public-key certification works too. The problem is that all of these or any other client-based solution you come up with is only going to be implemented in some clients. Many, and at least initially most, users are not going to have any such protections. This needs to be cut off at the protocol level. It is far better to prevent the spoofed messages from being sent in the first place than to offer clients tools to stop them once they're free in the ether. The maintainers of the Net and the Web at all levels from local sys admins to ISPs to spec implementers to spec writers to router vendors are rushing from hole to hole, trying to plug them faster than the script kiddies can exploit them. Even Microsoft is starting to recognize their culpability for producing an insecure infrastructure. This is a result of years of Internet development in all layers from the physical hardware on up to the browser without a real understanding of security. For past protocols like HTTP and URLs, we can plead ignorance and lack of imagination. We never knew how bad things were going to get. Now we do. We no longer have any excuses for knowingly designing systems that are open to spoofing, denial of service, or outright system cracking. Mistakes will of course continue to be made, but we have to try to make as few as possible and fix the problems where we can as soon as we can. 
There are legacy problems in HTTP, DNS, URLs, and many other systems; but when we're designing something truly new like internationalized domain names it only makes sense to avoid these known problems. -- Elliotte Rusty Harold
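The idea in the message above of having the outgoing SMTP server drop mail addressed to spoofs of the corporate domain could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the `CONFUSABLES` table is a tiny hand-picked sample rather than a complete or authoritative list, and `skeleton` and `is_spoof_of` are hypothetical helper names, not part of any mail server's API.

```python
# Hypothetical, deliberately tiny look-alike table: maps a character to the
# Latin letter it can be mistaken for in many fonts.
CONFUSABLES = {
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "1": "l",       # digit one for Latin l, as in app1e.com
    "0": "o",       # digit zero for Latin o, as in a0l.com
}


def skeleton(domain: str) -> str:
    """Collapse each look-alike character to one representative letter."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in domain.casefold())


def is_spoof_of(recipient_domain: str, corporate_domain: str) -> bool:
    """True if the recipient domain is a different string that looks the same."""
    same_string = recipient_domain.casefold() == corporate_domain.casefold()
    return not same_string and skeleton(recipient_domain) == skeleton(corporate_domain)


# An outgoing relay would refuse delivery when this returns True.
print(is_spoof_of("micr\u03bfsoft.com", "microsoft.com"))
print(is_spoof_of("microsoft.com", "microsoft.com"))
```

A check like this protects only the one domain the relay is configured for, which is the enterprise-level limitation the message itself points out.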
Re: Unicode and Security
At 5:12 PM -0800 2/7/02, Barry Caplan wrote: On what basis can Elliotte know that a message purported to be from Barry Caplan actually is from Barry Caplan, or that there even is a Barry Caplan? The person writing this, who claims to be Barry Caplan, has never met anyone named Elliotte Rusty Harold to the best of his recollection. He (Barry Caplan) does claim to personally be acquainted with many others on this list though - hi - sorry I missed you in DC! :) My point is exactly that I have no knowledge of this, but trust is not about knowledge. Trust is a decision made in the human brain on a not necessarily rational basis. In a rational world, trust would only be given to statements with some level of proof. We do not live in this rational world. In practice untrustworthy entities will be trusted both as to identity and other statements. Our system should be robust in the face of this. -- Elliotte Rusty Harold
Re: Unicode and Security
At 10:21 AM 2/7/02, Elliotte Rusty Harold wrote: I don't like that solution, but not liking it doesn't mean it ain't gonna happen as soon as Exxon loses a few billion dollars because somebody spoofed them and thereby gained access to their bidding plans for oil leases. Enron lost a few billion dollars, and iirc Unicode was not involved. -- Curtis Clark http://www.csupomona.edu/~jcclark/ Mockingbird Font Works http://www.mockfont.com/
Re: Unicode and Security
Elliotte Rusty Harold wrote: For past protocols like HTTP and URLs, we can plead ignorance and lack of imagination. We never knew how bad things were going to get. Now we do. We no longer have any excuses for knowingly designing systems that are open to spoofing, denial of service, or outright system cracking. Mistakes will of course continue to be made, but we have to try to make as few as possible and fix the problems where we can as soon as we can. There are legacy problems in HTTP, DNS, URLs, and many other systems; but when we're designing something truly new like internationalized domain names it only makes sense to avoid these known problems. And I'm with you all the way to this point. Where we part company, I think is at the implied and so... If the basic requirements are that we find a way (for IDN) to present meaningful strings to end users (note, not any natural language phrase, but just a suitably contained, meaningful subset thereof that users can live with) and then find a foolproof way to map that to IP numbers, *and* that those meaningful strings be truly internationalized and not just the current restricted subset of ASCII, then we have a problem. Either you have to more or less completely ignore the structure and integrity of writing systems, and try to constrain down the problem to a totally etic, psychological perception-based notion of no visual confusion allowed in visible symbols to be represented in strings, anywhere, anytime. Or you have to admit that internationalizing the strings even just the teensiest bit (e.g. allowing Cyrillic in the door along with ASCII, or for that matter just allowing in accented Latin letters along with ASCII) is going to increase the confusability level in visible symbols used in strings. 
The reductio ad absurdum of the first position is that allowing even a single additional character in domain names, no matter how distinct or innocuous, incrementally increases the opportunity for confusion, spoofing, or other monkey business over the current situation. So if we no longer have any excuses to do anything that might knowingly increase the opportunity for security holes, then logically, we should just shut down the whole IDN effort and proclaim to the world, Let them eat ASCII! Heck, it doesn't even have to be close to visual confusability to cause a problem. What if IDN allowed just two Han characters in, and nothing else, and those Han characters were for nihon (Japanese for Japan). Then somebody could register Microsoftnihon.com and never mind the naive user -- the knowledgeable, biliterate English/Japanese user could be spoofed into thinking that was Microsoft's Japan division, instead of Trojans 'R Us. I think that rather than coming to the Unicode list to proclaim Unicode is a security risk! The sky is falling! the better way to conceive this is that globalization of the IT infrastructure of the world is a difficult business that presents many new possibilities for security risks if internationalization of existing protocols and the handling of textual data from around the world is not done carefully. If the customers of the Internet are demanding that it be internationalized better than it currently is (and I believe they are), and if part of that internationalization is responding to demands that Japan be able to have Japanese domain names, China have Chinese domain names, etc., as I believe it is, then we just have to come to grips with the complexity of text handling that that implies. 
And in turn that means that just as years ago system programmers learned to their chagrin that their systems broke because they had been doing casemapping with c -= 0x20 assignments, so Internet protocol developers are going to have to learn that their security is broken if it depends on the structure and constraints of ASCII, or on the use of small glyph sets where all the glyphs are visually distinct from each other. --Ken
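The `c -= 0x20` casemapping failure mentioned above is easy to reproduce. The sketch below is an illustration in Python rather than the C that the original systems would have used, and `ascii_style_upper` is a hypothetical name. It applies the ASCII-era trick to any lowercase character and compares the result with Python's own Unicode-aware `str.upper`.

```python
def ascii_style_upper(text: str) -> str:
    """Uppercase the ASCII way: subtract 0x20 from every lowercase character."""
    return "".join(chr(ord(ch) - 0x20) if ch.islower() else ch for ch in text)


samples = ["unicode", "\u00ff", "\u00df"]  # plain ASCII, y with diaeresis, sharp s
for s in samples:
    naive = ascii_style_upper(s)
    proper = s.upper()
    # Show both results as code points so the difference is unambiguous.
    print(
        [f"U+{ord(c):04X}" for c in s],
        "naive:", [f"U+{ord(c):04X}" for c in naive],
        "str.upper:", [f"U+{ord(c):04X}" for c in proper],
    )
```

For plain ASCII the two agree. For U+00FF the subtraction lands on U+00DF (sharp s, a different lowercase letter) while the real uppercase is U+0178, and for U+00DF it lands on U+00BF (inverted question mark) while the real uppercase is the two-letter string "SS".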
Re: [idn] RE: Unicode and Security
[EMAIL PROTECTED] observed: Analogously, people will keep opening executable attachments promising sex, regardless of whether the 's', 'e', and 'x' are Latin letters or not. They're not, of course: U+0455 U+0435 U+0445 -Doug Ewell Fullerton, California (address will soon change to dewell at adelphia dot net)
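The three code points Doug lists can be inspected with Python's standard `unicodedata` module. This is just a quick check of what those characters are; the variable names are made up for the example.

```python
import unicodedata

latin_word = "sex"
lookalike_word = "\u0455\u0435\u0445"  # the three code points from the message

# Print each code point with its official character name.
for ch in latin_word + lookalike_word:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# Despite rendering alike in many fonts, the strings are not equal.
print(latin_word == lookalike_word)
```

All three look-alikes are Cyrillic small letters (DZE, IE, HA), so a filter that matches the Latin string byte for byte would not catch the second spelling.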
Re: Unicode and Security
On Wednesday, February 6, 2002, at 11:12 AM, Lars Kristan wrote: Maybe digitally signed messages and bank accounts are not that good of an example, since people would be more careful there. Another case where this may get exploited will be domain names, once Unicode is allowed there. While www.example.com may be a company I trust, www.example.com with a Cyrillic 'a' in it may be a hacker (and no, I did not imply he/she would be from a county that uses Cyrillic) trying to get me to visit the site. Right, but right now is that people are typing things like www.whitehouse. com instead of www.whitehouse.gov (or, for that matter, www.unicode.com). How likely is it that someone will accidentally type www.s$B'Q(Bmple.com instead of www.sample.com? The original focus was on digital signatures, and I still don't get the objection. Because I don't know *precisely* what bytes Microsoft Word or Adobe Acrobat use, do I refuse to sign documents they create? Is that the idea? I mean, good heavens, I don't even know *precisely* what bytes Mail. app is going to use for this email. Should I refuse to sign it? == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
RE: Unicode and Security
Well, I was tempted to join the discussion for a while now, but one of the things that stopped me was that I didn't quite understand why it was so focused on the bidi stuff. To make a certain portion of the text look like something else should be easier than that. OK, invisible non-spacing glyphs would be just one more method, I guess. I was thinking of replacing some characters with their look-alikes (probably even rendered from the same data in a font), like using U+0430 instead of U+0061 (Cyrillic 'a' instead of Latin 'a').

Maybe digitally signed messages and bank accounts are not that good of an example, since people would be more careful there. Another case where this may get exploited will be domain names, once Unicode is allowed there. While www.example.com may be a company I trust, www.example.com with a Cyrillic 'a' in it may be a hacker (and no, I did not imply he/she would be from a country that uses Cyrillic) trying to get me to visit the site.

Yes, it's a fraud. And I want to thank John for pointing that out. But we're making it a hell of a lot easier now. In ASCII, all one could try was www.examp1e.com and a couple of other tricks, but it was maybe 10 tricks in ASCII, some more in case of Latin 1. How many are there with Unicode? Uh, a million?

Well, nothing wrong with Unicode of course. Just means that there will need to be an option in your browser to reject any site without a digital certificate, and perhaps it will need to be turned on by default. So, there are ways to fight this (and I am afraid relying on police will not do it), but maybe these things should be well in place before someone gets a chance to exploit the new ways.

Just a thought.

Regards,
Lars

-Original Message-
From: John Hudson [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, February 06, 2002 01:54
To: Unicode List
Subject: Re: Unicode and Security

At 09:39 2/5/2002, John H. Jenkins wrote:

> Y'know, I must confess to not following this thread at all. Yes, it is impossible to tell from the glyphs on the screen what sequence of Unicode characters was used to generate them. Just *how*, exactly, is this a security problem?

I was wondering the same thing. I can make an OpenType font that uses contextual substitution to replace the phrase 'The licensee also agrees to pay the type designer $10,000 every time he uses the lowercase e' with a series of invisible non-spacing glyphs. Of course, the backing store will contain my dastardly hidden clause and that is the text the unwitting victim will electronically sign. Hahahaha, he laughed maniacally!

This has nothing to do with encoding, does not rely on difficult and totally improbable manipulation of a bidirectional algorithm and, most relevantly, is *not* a security problem in the OpenType font specification. It is an example of fraud. I suppose if there was a software solution to all such dangers, we wouldn't need police, felony charges, the court system, prisons, or any of the other things we rely on to protect honest people against dishonest.

John Hudson
Tiro Typeworks www.tiro.com
Vancouver, BC [EMAIL PROTECTED]

... es ist ein unwiederbringliches Bild der Vergangenheit, das mit jeder Gegenwart zu verschwinden droht, die sich nicht in ihm gemeint erkannte.
... every image of the past that is not recognized by the present as one of its own concerns threatens to disappear irretrievably.
Walter Benjamin
RE: Unicode and Security
> Well, nothing wrong with Unicode of course. Just means that there will need to be an option in your browser to reject any site without a digital certificate, and perhaps it will need to be turned on by default. So,

Nothing prevents sites running frauds from getting a certificate matching their name. If the price of certificates drops, or if the fraud has good enough margins, it will not even be a big inconvenience.

YA
Re: Unicode and Security
On Wed, Feb 06, 2002 at 07:12:19PM +0100, Lars Kristan wrote:

> Well, I was tempted to join the discussion for a while now, but one of the things that stopped me was that I didn't quite understand why it was so focused on the bidi stuff.

Because it can have a dramatic effect, whereas changing look-alikes has no effect on the displayed text.

> Yes, it's a fraud. And I want to thank John for pointing that out. But we're making it a hell of a lot easier now. In ASCII, all one could try was www.examp1e.com and a couple of other tricks, but it was maybe 10 tricks in ASCII, some more in case of Latin 1. How many are there with Unicode? Uh, a million?

How often does it matter? I can see registrars not registering stuff that was obviously an attempt to defraud, but you won't get there if you type it in yourself. It's easier for someone to set up a forged Microsoft link, but it's easy to check that. Rather than everyone being digitally signed, just checking if it's multiscript and popping up a warning will catch most of the cases. You could colorcode the major scripts with confusables . . .

--
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing with the youth. -- Information Society, Peace and Love, Inc.
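
The multiscript check suggested above might look roughly like the following Python sketch. It rests on a loud assumption: Python's standard library has no script property, so the first word of each letter's Unicode character name ("LATIN", "CYRILLIC", "GREEK", ...) is used as a stand-in for its script. The function names are hypothetical, and a real implementation would want proper script data.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Approximate the set of scripts used by the letters in a label.

    Assumption: the first word of a character's Unicode name stands in
    for its script. This is a rough heuristic, not the Script property.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def mixed_script_labels(domain: str) -> list:
    """Return the dot-separated labels that mix more than one script."""
    return [label for label in domain.split(".") if len(scripts_in(label)) > 1]


if __name__ == "__main__":
    for name in ["www.example.com", "www.ex\u0430mple.com"]:
        suspicious = mixed_script_labels(name)
        print(ascii(name), "-> warn" if suspicious else "-> ok")
```

Checking label by label rather than the whole name means a label written entirely in one non-Latin script under a Latin top-level domain does not trigger the warning; only a label that mixes scripts internally does.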
Re: Unicode and Security
At 11:54 AM 2/6/2002 -0700, John H. Jenkins wrote:

> The original focus was on digital signatures, and I still don't get the objection. Because I don't know *precisely* what bytes Microsoft Word or Adobe Acrobat use, do I refuse to sign documents they create? Is that the idea? I mean, good heavens, I don't even know *precisely* what bytes Mail.app is going to use for this email. Should I refuse to sign it?

I don't think the main issue is whether or not you should sign it. I think the main issue the original poster tried to raise is that, as the recipient of such a signed document, he is not persuaded he should trust it. This is a serious issue, although as several have noted, not a Unicode-only one.

No one doubts the security of the encryption algorithms used for signing. But the issue of trust is critical. In the analog world, people are expected to read and understand documents, and in general, the world's legal systems are set up to recognize that a signature (or stamp or seal or whatever) is binding evidence that such care was taken (even if it wasn't really taken). In the digital world, individual behavior and legal processes both may not be so well formed to support the technology of digital signatures. I believe this is what the original point was. IANAL, but enforceability of such a kluged, digitally-signed document seems in doubt.

There is a long history of that type of contract support in our US legal systems, and probably others as well. There will surely be difficulties adapting it to the digital domain, but I think the basis for support is already there.

Anyway, it is not well known, but maybe should be, that the purpose of digital signatures is to verify who the sender is, and to verify that the document has not been changed in transit. That it might contain tricky language or information is an important thing to note, but the reader still needs to rely on the document's contents with the same skeptical eye as if it were not printed.

Just as the Unicode bi-di algorithm makes no claims at reversibility, digital signing algorithms make no claim that the signed contents are correct, or even useful.
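
A minimal Python sketch of that point: a signature binds to the bytes of a message, not to how they look. HMAC-SHA256 from the standard library is used here purely as a stand-in for a real public-key signature scheme, and the key and message text are made up for the example.

```python
import hashlib
import hmac

DEMO_KEY = b"demo-key"  # hypothetical key, for illustration only


def sign(text: str) -> str:
    """Return a hex tag over the UTF-8 bytes of text (stand-in for a signature)."""
    return hmac.new(DEMO_KEY, text.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(text: str, tag: str) -> bool:
    """Check that tag matches the bytes of text."""
    return hmac.compare_digest(sign(text), tag)


if __name__ == "__main__":
    signed_text = "Send payment to www.example.com"
    tag = sign(signed_text)
    # Same appearance in most fonts, but with CYRILLIC SMALL LETTER A (U+0430).
    lookalike = "Send payment to www.ex\u0430mple.com"
    print("original verifies: ", verify(signed_text, tag))
    print("lookalike verifies:", verify(lookalike, tag))
```

The tag tells the recipient who produced the bytes and that they arrived unchanged. It says nothing about whether the text is honest, and a look-alike substitution made after signing fails verification precisely because the bytes differ.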
Re: Unicode and Security
On Tue, Feb 05, 2002 at 01:27:49PM +0900, Gaspar Sinai wrote:

> Talking about characters: I think bi-di should not be in Unicode Standard because it is not a character. It is an algorithm.

Why would that fix the problem? Then everyone would just choose their own algorithm, and instead of a couple different renderings, with the ability to check it against the standard, you'd get a thousand, each equally correct.

> I feel sorry for interrupting in the Let's praise and celebrate Unicode mood of this mailing list.

Head over to the POSIX list and start complaining about the maldesign of fixed width buffers and see how long they listen to you. This is the Unicode list - that means people here are interested in working in Unicode. The BIDI algorithm is frozen - seriously changing it would break way too many implementations to be considered. (Note that gets - so broken that the GNU linker will complain if you use it - is a standard part of a POSIX system. There's no evidence that the BIDI algorithm is anywhere near that broken.)

> I wish there was another world character standard besides Unicode and not only half-hearted attempts like bytext.

Unicode has its problems, but it works. It takes a lot of work to create a character standard, and it's hard to find a bunch of people to work on a project to go against the industry leader without serious problems in that leader. Anyone on this list could produce a better Unicode than Unicode, just like any Unix person could produce a better Unix than Unix. But it's not going to be enough better that it's worth losing backward compatibility, and any serious changes will never get consensus. So a standard is entrenched. (Cf. Fortran, Unix, ASCII).

The result is that you get the bizarre ideas of individuals, like Bytext and Rosetta, never really fully fleshed out or implemented, and the Japanese-centric universal charsets like Tron and ISO-2022-INT-1. (I've heard rumors of other cultures producing universal charsets that fix Unicode's bugs for their language only. I'm not familiar with them, though.)

The first are too quirky to be useful. (Bytext's author compared it to lambda calculus and Unicode to arithmetic. In some ways, it's an accurate comparison; while Church numbers are interesting, every real system directly supports arithmetic on binary numbers, as that's much more efficient and simple.)

The latter don't support non-Japanese scripts as well as Unicode, and don't sell well to non-Japanese audiences. ISO-2022-INT-1 supports 7 94x94 character charsets for CJK audiences (roughly 60,000 characters before any sort of unification), and ISO-8859-1 and ISO-8859-7*, leaving the Russians, the Hungarians, the Arabs and many more out in the cold. To the best of my knowledge, there's not enough information available to the non-Japanese speaker to implement Tron. (Not only is information available about ISO-10646-1/Unicode in more languages, English is also more generally known than Japanese.) Again to the best of my knowledge, there have been no improvements to non-CJK sections of Tron (besides Braille) after Unicode 2.0, whereas Unicode has been continually updated to keep up - Unicode 3.2 handles more archaic documents, more languages and more scripts than ever before, as well as better linguistic and mathematical support.

In all honesty, I only care about the CJK parts of Unicode in that they convince people to implement Unicode so I can play with the Latin, Greek, Cherokee, IPA and Mathematics sections. Encoding 50,000 more Han ideographs produces a lot less interest in me than encoding Gothic. A lot of the audience is the same - who cares about ancient Greek? Will it handle the Dhammapada in Pali without error? It appears the serious attempts to topple Unicode - Tron, for example - forgot that, and looked to their own issues, leaving Unicode to be the only real attempt to serve the needs of everyone and hence victor.

* It seems there's a discrepancy in what ISO-2022-INT-1 encodes. Another source adds ISO-8859-2 and ISO-8859-5, still leaving the Arabs, the residents of the Baltic states, Hindi and a lot of the rest of the world out.

--
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing with the youth. -- Information Society, Peace and Love, Inc.
Re: Unicode and Security
At 13:27 +0900 2002-02-05, Gaspar Sinai wrote:

> Just because some companies who have influence on Unicode Consortium use some algorithm, like backing store and re-mapping, it does not mean that this is the only way. And I don't even think they do in cases when character conversion is necessary.

Backing store and remapping are fundamental principles of Unicode. They are implemented by people who want to implement the Unicode standard.

> For me it is very important what a naive user sees on the screen.

For me, too.

> Yudit does convert the input to view order and back. Text direction and end of line is clearly indicated. [...] If the standard wants me to confuse the user, I would rather dump the standard than comply.

I haven't been able to follow how I, the user, am confused by the Unicode Standard. It sounds to me as though you want a Show Invisibles option to disassemble Hebrew or Arabic text and display them in LTR order without any ligation so that the user can see what is in the backing store. That's a valid thing to want to do, but it's a special case of rendering, which has little to do with the algorithm.

> I wish there was another world character standard besides Unicode and not only half-hearted attempts like bytext. Talking about characters: I think bi-di should not be in Unicode Standard because it is not a character. It is an algorithm.

Yes, it is. The Unicode Standard does not just encode characters. It also provides tools for implementation.

> I feel sorry for interrupting in the Let's praise and celebrate Unicode mood of this mailing list.

We like Unicode. We work to make it better. Sometimes people come to us with problems that aren't problems, or raise issues that have been dealt with many times before. Sometimes people bring us real problems that need real solutions. We're an intelligent bunch, methinks, and we can tell the difference. Unicode may have warts, but it's a lot better than ISO 2022.
-- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Unicode and Security
Gaspar Sinai has a valid point insofar as there is a possible ambiguity in bidi text. However, he is absolutely wrong in blaming the Unicode bidi algorithm for this problem.

Gaspar Sinai had written:

> change products or to change the standard and use a reversible bidi.

and later:

> Hold on there! You admit that unicode algorithm is *really* not reversible?

He completely failed to acknowledge the fact that the bidi rendering process is intrinsically not reversible, in the general case. And he did not mention John Cowan's (IIRC) simple example illustrating this fact. In other words, he is arguing from a wrong premise, so his conclusions are not sound. If he will not get his basic facts right, this whole discussion is indeed surreal, and mostly a waste of time.

Gaspar Sinai wrote:

> Just because some companies who have influence on Unicode Consortium use some algorithm, like backing store and re-mapping, it does not mean that this is the only way. [...] Yudit does convert the input to view order and back.

Now, this reveals the real problem. From this description, I gather that Gaspar's editor does not preserve the backing store, hence it has to reconstruct it from the rendering. As the rendering process is an n-to-1 mapping, its reverse is, intrinsically, ambiguous. So, the attempt to reconstruct the original character sequence from the visual appearance is bound to fail, in the general case. Now Gaspar asks everybody else to comply with his own approach, and does not even see that this approach will not work!

> Text direction and end of line is clearly indicated. The Unicode values of the characters in the cluster under the cursor are clearly indicated.

These are good features to have in a decent editor; but they are entirely unrelated to the perceived problem. They can easily be implemented in an editor that keeps the backing store.

> In all cases what you view can be converted back to the *same* bitstream - except for illegal encoded text but that leaves clearly visible traces in the screen, as it should.

Fine. And a lot easier to attain, if the original bitstream is not discarded, in the first place ;-)

> If the standard wants me to confuse the user, I would rather dump the standard than comply.

That is certainly not the standard's aim. Rather the bidi part of the standard wants to describe established practice for bidi writing.

> I updated: http://www.yudit.org/security/

It would be honest to describe the facts, as they are in reality, and not overstate, or even falsify, them in order to drive a point home. E. g.:

> Unicode Bidirectional Algorithm is non-reversible.

Rather: Bidi text may be ambiguous, if you cannot determine where to start reading. E. g. "the arabs = SBARA EHT" (where uppercase represents the arabic equivalent, written right to left) can be read from either side. Nested levels of RTL, and LTR, clauses may render the interpretation of bidi text even more problematic. The ambiguity is normally resolved in one of two ways:
- The starting direction is determined from the context, e. g. you would start reading the preceding example from the left, as it is embedded in an English (i. e. LTR) paragraph; you would start reading this very same line from the other side if it were embedded in an Arabic (i. e. RTL) paragraph.
- Embedded levels are usually delimited by quotes, or other contextual hints.

> That means that if text converted back from display order we can not get back the same text.

Rather: ... we will not get back the same text, in every single case.

> Imagine someone signing a digital unicode document. He is looking at his viewer but what he signs is the bitstream.

He is probably signing a document that he has entered himself. Where could the ambiguity come from, if he has not deliberately introduced it, himself?

> At yudit.org we advise you: please never sign digitally a Unicode document - or sign it knowing your own risk.

Rather: Make sure what you sign, in particular regarding bidi documents. If you want to sign the clauses you entered, in logical order, then sign your e-mail (or other Unicode text); if you want to sign the rendering, then apply your signature to an image, or pdf, file. In both cases, try to express your points (particularly the nesting of clauses written in opposite directions) as unambiguously as possible.

Btw.: Decent software should make clear and obvious to the user what he is really signing.

Best wishes,
Otto Stolz
Re: Unicode and Security
Y'know, I must confess to not following this thread at all. Yes, it is impossible to tell from the glyphs on the screen what sequence of Unicode characters was used to generate them. Just *how*, exactly, is this a security problem? == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
RE: Unicode and Security
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 I don't see why pick on bidi. Unicode rendering is not reversible in Latin too - from the rendering you cannot and should not be able to tell whether a character was decomposed or precomposed. Looking at some text, you would not be able to tell whether there are or aren't trailing spaces. This goes for good old ASCII too. What's the point? What has it to do with security? Jony -BEGIN PGP SIGNATURE- Version: PGPfreeware 7.0.3 for non-commercial use http://www.pgp.com iQA/AwUBPGBFxRV5/en3UelbEQKZtwCfWhhpuyS8Jf35/FCJltIpiNW3iTEAnRYW lagDbQlCy5wSd5rmvGOfCGfb =SZkL -END PGP SIGNATURE-
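
Both observations can be checked with a small Python sketch using the standard unicodedata module. The sample strings and the helper name same_after_nfc are made up for the illustration.

```python
import unicodedata

PRECOMPOSED = "caf\u00e9"   # e-acute as one code point, U+00E9
DECOMPOSED = "cafe\u0301"   # 'e' followed by COMBINING ACUTE ACCENT, U+0301


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print("raw equal:       ", PRECOMPOSED == DECOMPOSED)
    print("equal after NFC: ", same_after_nfc(PRECOMPOSED, DECOMPOSED))
    # Plain ASCII has the same property: trailing spaces are invisible.
    line, padded = "total: 10", "total: 10   "
    print("ascii raw equal: ", line == padded)
    print("lengths:         ", len(line), len(padded))
```

The two spellings of the accented word are different code point sequences that render alike, and the two ASCII lines differ only in trailing spaces, so neither difference can be recovered by looking at the rendering.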
Re: Unicode and Security
Gaspar Sinai scripsit:

> So common language is screenshots... Ok. I updated the page.

Thank you.

> Now the exact same file is viewed with two different viewers at the bottom of this page: http://www.yudit.org/security/

Outlook Express, at least the version you are using, has a bug; it is failing to set the overall directionality to RTL even though the first character is strongly RTL. The fact that some implementations are buggy is hardly an argument against either the use of bidi or Unicode. Furthermore, the example is perverse: you are providing a sentence that looks like it's meant to be read as English, but in Arabic reading order.

> I maintain my view that if there is no proven reversible logical-to-viewed/viewed-to-logical electronic signatures should be avoided.

There is nothing to be done about the fact that the visual appearance

the Arabs = BARA-LA
Islam = MALSI-LA

(using caps for RTL as usual) can be read as either Arabic-to-English or English-to-Arabic, depending on the larger context. If you saw it written down in isolation, you wouldn't know which way to read it either: nothing in the mere appearance of such a text tells you whether it is basically in English or Arabic. Therefore, this appearance can have either of two encodings:

AL-ARAB = the Arabs\nAL-ISLAM = Islam
the Arabs = AL-ARAB\nIslam = AL-ISLAM

--
John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED]
To say that Bilbo's breath was taken away is no description at all. There are no words left to express his staggerment, since Men changed the language that they learned of elves in the days when all the world was wonderful. --_The Hobbit_
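
The ambiguity can be reproduced with a deliberately tiny toy model in Python. This is not the Unicode bidirectional algorithm (UAX #9); it is a sketch under stated assumptions: text is given as explicit runs already tagged as right-to-left or not, the neutral " = " is tagged with the paragraph direction, and there is no nesting. The name render is hypothetical.

```python
def render(runs, paragraph_rtl):
    """Return the left-to-right screen string for logically ordered runs.

    runs: list of (text, is_rtl) pairs in logical (reading) order.
    paragraph_rtl: True if the paragraph as a whole reads right to left.
    Toy model only: runs are laid out in paragraph order, and the
    characters of each RTL run are reversed on screen.
    """
    ordered = reversed(runs) if paragraph_rtl else runs
    return "".join(text[::-1] if is_rtl else text for text, is_rtl in ordered)


if __name__ == "__main__":
    # Caps stand for Arabic, as in the message above.
    english_first = [("the Arabs", False), (" = ", False), ("AL-ARAB", True)]
    arabic_first = [("AL-ARAB", True), (" = ", True), ("the Arabs", False)]
    print(render(english_first, paragraph_rtl=False))
    print(render(arabic_first, paragraph_rtl=True))
```

Two different logical sequences produce the same screen string, so no function that sees only the screen string can return the right logical sequence in both cases. That is the sense in which the mapping cannot be reversed, whatever algorithm produces it.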
Re: Unicode and Security
On 04-02-2002 11:15:25 John Cowan wrote: Outlook Express, at least the version you are using, has a bug; it is failing to set the overall directionality to RTL even though the first character is strongly RTL. The fact that some implementations are buggy is hardly an argument against either the use of bidi or Unicode. Of course the bidi algorithm permits using a higher-level protocol to set the paragraph direction (see note under rule P3, TUS 3.0 page 61). Thus one could argue that this isn't necessarily a bug in Outlook Express -- at least it isn't a violation of the standard. Bob
Re: Unicode and Security
On Mon, 4 Feb 2002, John Cowan wrote:

> Gaspar Sinai scripsit:
>
> > Now the exact same file is viewed with two different viewers at the bottom of this page: http://www.yudit.org/security/
>
> Outlook Express, at least the version you are using, has a bug; it is failing to set the overall directionality to RTL even though the first character is strongly RTL. The fact that some implementations are buggy is hardly an argument against either the use of bidi or Unicode.

I am sorry but someone on this list has just said:

| The bidi algorithm is anything but vague. Any
| implementation can be rigorously tested against two
| reference implementations, to ensure fully compatible
| implementation.

So does this mean that Microsoft does not rigorously test their products? Or does this mean the test is wrong? Or maybe the algorithm is vague? I expect at least one yes answer here.

Come on guys this is only *one* example. And it happened in MS Outlook too. (No more screenshots please, none of my friends use that product any more.)

I am ready to publish regularly bad renderings of the *buggy* implementations of the non-vague unicode BIDI (or the non-buggy implementations of the *vague* BIDI - take your choice).

I wonder which costs more: to regularly patch and change products, or to change the standard and use a reversible bidi.

It may take some time to find the bug - but the bug will be there...

Cheers
gaspar
Re: Unicode and Security
Gaspar Sinai...

Pursuing this kind of trivia hunt for bugs in an environment employing Unicode is not any different than pursuing the same kind of bugs in any other environment. It is within the purview of the security community to find such bugs before hackers find them. But those bugs are not character set bugs, they are software bugs!

> I wonder which costs more: to regularly patch and change products or to change the standard and use a reversible bidi.

Oh come now... That sounds like your real agenda -- you must have an algorithm that you like better, and apparently you think that if implemented in software it would be less bug prone. Well, changing the Unicode bidi algorithm to use a reversible bidi still isn't going to solve the problem that all software has bugs!

So if you find a bug in the bidi algorithm or the reference implementations, please let people know. It would be helpful. But at this point, changing it to be reversible isn't an option.

Rick
Re: Unicode and Security
Hello,

Before you call this thread a waste of time, and out of curiosity: what were the considerations put forth which determined the way the bidi algorithm is (UAX #9)? I.e., what were the pros and cons of a reversible bidi? Also, who makes up the 'bidi community'? The users or the developer(s) of the bidi algorithm?

Thank you

-- Mohammed Elzubeir

"Mark Davis" [EMAIL PROTECTED] 02/04/02 10:13AM

> Outlook Express, at least the version you are using, has a bug;
>
> The BIDI algorithm is not reversible, and could not be made reversible without eliminating features that are important to the bidi community. This was considered at the time the bidi algorithm was developed.
>
> This thread is a waste of time.
>
> Mark
Re: Unicode and Security
On Mon, 4 Feb 2002, Mark Davis wrote:

> > Outlook Express, at least the version you are using, has a bug;
>
> This is not a bug; it is specifically cited in the Bidirectional Conformance section of Chapter 3 as one of the ways a higher-level protocol can override the BIDI algorithm. I otherwise agree with John about the perversity (perversion ;-) of the examples.
>
> > change products or to change the standard and use a reversible bidi.
>
> The BIDI algorithm is not reversible, and could not be made reversible without eliminating features that are important to the bidi community. This was considered at the time the bidi algorithm was developed.

Hold on there! You admit that unicode algorithm is *really* not reversible? I was just bluffing because I just saw that there is no reverse algorithm published in the standard!

Can you imagine the implications of this? Imagine someone signing a digital unicode document. He is looking at his viewer but what he signs is the ___bitstream___. So you claim that this guy who might have no connection to software industry at all will be able to run an algorithm - that does not exist - in his head?

> This thread is a waste of time.

If unicode bi-di algorithm was reversible none of this would happen. Software developers, who are flesh and blood people, would be able to do a clean room implementation of the algorithm and the reverse of it. The correctness of the software could be *automatically* checked by just reversing the view and checking it against the bitstream.

Instead of the automatic check, now there are test cases, and if there is a nasty bug the reply is, oh well, sorry for that, and plug in another fix and test case.

I feel I saw this attitude before... Is it only me?

Gaspar
Re: Unicode and Security
Gaspar Sinai scripsit:

> Hold on there! You admit that unicode algorithm is *really* not reversible? I was just bluffing because I just saw that there is no reverse algorithm published in the standard!

It can't be reversible, as my little English = CIBARA demonstration showed. The only way to make a reversible algorithm would be to abandon the principle of phonetic internal ordering.

> Can you imagine the implications of this? Imagine someone signing a digital unicode document. He is looking at his viewer but what he signs is the ___bitstream___. So you claim that this guy who might have no connection to software industry at all will be able to run an algorithm - that does not exist - in his head?

No Real World document is going to make sense read both ways. It will make sense one way, thus:

BARA-LA AW MALSI-AL mean the Arabs and Islam respectively.

The other order will make no sense at all.

--
John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED]
To say that Bilbo's breath was taken away is no description at all. There are no words left to express his staggerment, since Men changed the language that they learned of elves in the days when all the world was wonderful. --_The Hobbit_
Re: Unicode and Security
>> This thread is a waste of time.

Gaspar> If unicode bi-di algorithm was reversible none of this would
Gaspar> happen. Software developers, who are flesh and blood people, would
Gaspar> be able to do a clean room implementation of the algorithm and the
Gaspar> reverse of it. The correctness of the software could be
Gaspar> *automatically* checked by just reversing the view and checking it
Gaspar> against the bitstream.

Gaspar> Instead of the automatic check, now there are test cases and if
Gaspar> there is a nasty bug the reply is, oh well, sorry for that, and
Gaspar> plug in another fix and test case.

Gaspar> I feel I saw this attitude before... Is it only me?

I don't understand your reasoning. Applying the bidi algorithm or a higher-level protocol does not change the backing store. Applying the bidi algorithm is essentially a one-way transformation, but the original information need not be thrown away. Yudit differentiates the backing store and the display, does it not?

And as for signing a Unicode document, the fact that the user is implicitly signing the __bitstream__ and not the __document__ is probably the right thing to do. To be meaningful, the data will be displayed the same everywhere, barring incorrect renderers. And in the case of incorrect rendering, it is the __bitstream__ that remains correct, and that is what the user signed.

A user types some text on a computer and signs it. Is the user signing the idea expressed by the text or the presentation of the text? They are signing the idea. The presentation can have all kinds of flaws that do not represent the original idea, such as a printer that can't print the letter e.

-
Mark Leisher
Computing Research Lab
New Mexico State University
Box 30001, Dept. 3CRL
Las Cruces, NM 88003

"Orthodoxy, of whatever color, seems to demand a lifeless, imitative style."
-- Politics and the English Language, George Orwell
Re: Unicode and Security
> No Real World document is going to make sense read both ways. It will make sense one way, thus: "BARA-LA AW MALSI-AL mean the Arabs and Islam respectively". The other order will make no sense at all.

Good style might say to put in a line break so you know what's going on. I don't know if that would help. Maybe it would do more harm than good. Let's ask an Israeli. They probably have to deal with this on a daily basis.

→　じゅういっちゃん　←
　だんせいらしさむよう
Re: Unicode and Security
Gaspar wrote: The BIDI algorithm is not reversible, and could not be made reversible without eliminating features that are important to the bidi community. This was considered at the time the bidi algorithm was developed. Hold on there! You admit that unicode alrgorithm is *really* not reversable? I was just bluffing because I just saw that their is no reverse algorithm published in the standard! Of course it isn't reversible. (echoing John Cowan) The bidi algorithm is a set of steps for going from a logical representation of text to a specification of the *actual* directionality for rendering in lines. But there are inherent ambiguities in trying to reverse the process, to go from line-rendered text display to a logical representation of text. In addition to John Cowan's example of ambiguity caused by assumption of the default rendering order, you could always introduce extraneous embedding levels that would resolve the same, or you could have otherwise undetectable differences that would result in the same measurement and display of text, such as one em space versus a sequence of two en spaces. Gaspar continued: Can you imagine the implications of this? Imagine somone signing a digital unicode document. He is looking at his viewer but what he signs is the ___bitstream___. So you claim that this guy who might have no connection to software industry at all will be able to run an algorithm - that does not exist - in his head? Reading and understanding the content of text is no guarantee of being able to reverse a rendering process to intuit the exact order of characters which was used to produce that text -- ever. This is not merely a Unicode (and ISO 10646) issue, but even crops up in the severely limited context of ASCII text rendered with monowidth fonts. A trivial example of this can be found in otherwise undetectable spaces at ends of lines, or in ambiguities with regard to whether a particular spacing was produced by tabulation or insertion of multiple spaces. 
This thread is a waste of time. I agree with Mark about that. If the Unicode bi-di algorithm was reversible none of this would happen. Nonsense. Software developers, who are flesh and blood people, would be able to do a clean room implementation of the algorithm and the reverse of it. The correctness of the software could be *automatically* checked by just reversing the view and checking it against the bitstream. Think again. Instead of the automatic check, now there are test cases and if there is a nasty bug the reply is, oh well, sorry for that, and plug in another fix and test case. I feel I saw this attitude before... Is it only me? 'fraid so. By the way, I just checked www.yudit.org and noted that among the future plans for Yudit are: * Waiting for a standard that makes more sense than Unicode and jump ship. with "that makes more sense" pointing to http://www.bytext.org/ Oh ho! I think the readers of this list who considered the virtues of ByText would find that an interesting indication of judgement. --Ken
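The ASCII case Ken mentions (tabs versus runs of spaces, invisible trailing spaces) can be sketched in a few lines. This is only a toy model: the `render` function below is a hypothetical stand-in for a monowidth display, assuming 8-column tab stops, and is not taken from any real viewer.

```python
def render(logical: str, tab_width: int = 8) -> str:
    """Toy model of a monowidth display (an assumption, not a real renderer).

    Tabs advance to the next tab stop and trailing spaces cannot be seen,
    so both are normalized away before comparison.
    """
    lines = logical.split("\n")
    return "\n".join(line.expandtabs(tab_width).rstrip(" ") for line in lines)


# Two different backing stores...
with_tab = "total\t42"
with_spaces = "total   42  "

# ...that this toy display shows identically.
print(repr(render(with_tab)))
print(repr(render(with_spaces)))
print(with_tab == with_spaces, render(with_tab) == render(with_spaces))
```

Because two distinct inputs map to one output, no function can recover the original from the display alone; the same argument applies, with more cases, to bidi reordering.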
Re: Unicode and Security
On Mon, 4 Feb 2002, Mark Leisher wrote: [...cut some stuff to save room...] I don't understand your reasoning. Applying the bidi algorithm or a higher-level protocol does not change the backing store. Applying the bidi algorithm is essentially a one-way transformation, but the original information need not be thrown away. Yudit differentiates the backing store and the display, does it not? Thank you for mentioning Yudit - I don't need advertisement, there are enough users. Just because some companies who have influence on the Unicode Consortium use some algorithm, like backing store and re-mapping, it does not mean that this is the only way. And I don't even think they do in cases when character conversion is necessary. For me it is very important what a naive user sees on the screen. Yudit does convert the input to view order and back. Text direction and end of line are clearly indicated. The Unicode values of the characters in the cluster under the cursor are clearly indicated. In all cases what you view can be converted back to the *same* bitstream - except for illegally encoded text, but that leaves clearly visible traces on the screen, as it should. If the standard wants me to confuse the user, I would rather dump the standard than comply. I wish there was another world character standard besides Unicode and not only half-hearted attempts like bytext. Talking about characters: I think bi-di should not be in the Unicode Standard because it is not a character. It is an algorithm. I also start to think this thread is a waste of time. This thread won't solve our problem. I feel sorry for interrupting the "Let's praise and celebrate Unicode" mood of this mailing list. gaspar I updated: http://www.yudit.org/security/ I wanted to remove it after solving the problem, but it seems that this page will stay.
Re: Unicode and Security
From: Gaspar Sinai [EMAIL PROTECTED] If the standard wants me to confuse the user, I would rather dump the standard than comply. Well, don't let the door hit you in the a** on the way out? The users will be less confused than you realize -- only people who walk in with agendas see the flaws you claim. Talking about characters: I think bi-di should not be in Unicode Standard because it is not a character. It is an algorithm. And it is documented as such. Clearly what you want of Unicode does not match what it actually is -- when my wife and I realized such about each other, she became my ex-wife. Since that is your goal here, I guess your divorce from Unicode should not be a surprise? [snip out of order] Thank you for mentioning Yudit - I don't need advertisement, there are enough users. Perhaps some will leave if you are honest about your divorce though -- you might be surprised how many people follow the standard? I also start think this thread is a waste of time. This thread won't solve the our problem. The only issue though is that we do not have a problem here? I feel sorry for interrupting in the Let's praise and celebrate Unicode mood of this mailing list. Sorry, that's not the mood of the list. But in order to have a healthy respect for the people who give sound and reasonable arguments, we must show a matching lack of respect for those who give specious arguments. I updated: http://www.yudit.org/security/ I wanted to remove it after solving the problem, but it seems that this page will stay. The problem is solved, though. The real problem at this point can be found at http://www.yudit.org/gaspar/ though. MichKa Michael Kaplan Trigeminal Software, Inc. -- http://www.trigeminal.com/
Re: Unicode and Security
At 02:15 PM 2/3/2002 +0900, you wrote: On Sat, 2 Feb 2002, David Starner wrote: [...several lines cut to save room...] I think I'm missing your perspective. To me, these are minor quirks. Why do you see them as huge problems? I am thinking about electronically signed Unicode text documents that are rendered correctly or believed to be rendered correctly, still they look different, seem to contain additional or do not seem to contain some text when viewed with different viewers due to some ambiguities inherent in the standard. An electronically signed document allows you to trust who wrote it, and that the *byte sequence* hasn't been tampered with. It implies nothing at all, trust-wise, about what software you should use to interpret it. You would go through the trouble to verify a signature, but trust the .doc extension and some machine's implementation of Word with your money? Makes no sense. That being said, identifying security issues of existing programs and/or protocols when they intersect with Unicode-based data is an important issue, and one I intend to cover regularly on www.i18n.com, once it launches this month. For those of you that have specific issues to write about, or are interested in providing a series of security-related articles (length and frequency TBD), please contact me off-list. I think there are endless examples already out there to provide, and I know of at least one that is serious. Let's find more! Best Regards, Barry Caplan www.i18n.com - coming soon, preview available now News | Tools | Process for Global Software Team I18N
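Barry's point that a signature vouches for the byte sequence, not for what a viewer shows, can be made concrete with a small sketch. SHA-256 is used here only as a stand-in for the digest step of a signature scheme (an assumption for illustration), and the lookalike domain reuses the Latin versus Cyrillic "o" case raised elsewhere in the thread.

```python
import hashlib


def digest(text: str) -> str:
    """SHA-256 over the UTF-8 bytes; a stand-in for what a signature covers."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


latin = "microsoft.com"              # Latin letters only
mixed = "micr\u043Es\u043Eft.com"    # U+043E CYRILLIC SMALL LETTER O, twice

# In many fonts the two strings look the same, but the bytes differ,
# so anything computed over the bytes differs too.
print(latin, len(latin.encode("utf-8")), digest(latin)[:16])
print(mixed, len(mixed.encode("utf-8")), digest(mixed)[:16])
```

A verifier that checks the digest learns that the bytes are intact; it learns nothing about whether two viewers will draw those bytes the same way.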
Re: Unicode and Security
On Sun, 3 Feb 2002, Asmus Freytag wrote: The bidi algorithm is anything but vague. Any implementation can be rigorously tested against two reference implementations, to ensure fully compatible implementation. Sorry guys to be this short this time, but I kicked life into my Windows laptop and made an example for BIDI. That pretty much took my time away... The following page contains my view of the Unicode BIDI algorithm (with screenshots). http://www.yudit.org/security/ This page is not linked up anywhere yet - I just made it for this list. My apologies for being such a bastard - my nature is to be paranoid. Gaspar
Re: Unicode and Security
On Sun, 3 Feb 2002, John Cowan wrote: Gaspar Sinai scripsit: The following page contains my view of Unicode BIDI algorithm (with screenshots). http://www.yudit.org/security/ Oooo-kay. This is not a Unicode problem per se: it is about embedded text vs. text that is not embedded. The Yudit and IE versions are displaying a text (Java code) that is essentially in Latin script (LTR) with some RTL inclusions. However, when the Java application actually runs, it displays three separate and distinct texts, each of which is an RTL text with some LTR inclusions. They are assumed to be RTL text, by the bidi rules, because they begin with a strong RTL character. Similar things happen when you construct XML documents with RTL element names: the bidi rules, which are meant for true text and not computer-readable stuff, sometimes produce visually confusing results. So it is perfectly ok? I can make a non-embedded example too. I do not have time to make childish examples and screenshots to get my point through. I have a job to do and text processing is just my hobby. The rendering problems are all side effects of the Unicode bi-di algorithm. If the Unicode bidi algorithm were proven to be reversible (logical->display; display->logical) I would not go to bed worrying about my signed documents. That's my view of the problem. Cheers gaspar
Re: Unicode and Security
Gaspar Sinai scripsit: The following page contains my view of Unicode BIDI algorithm (with screenshots). http://www.yudit.org/security/ Oooo-kay. This is not a Unicode problem per se: it is about embedded text vs. text that is not embedded. The Yudit and IE versions are displaying a text (Java code) that is essentially in Latin script (LTR) with some RTL inclusions. However, when the Java application actually runs, it displays three separate and distinct texts, each of which is an RTL text with some LTR inclusions. They are assumed to be RTL text, by the bidi rules, because they begin with a strong RTL character. Similar things happen when you construct XML documents with RTL element names: the bidi rules, which are meant for true text and not computer-readable stuff, sometimes produce visually confusing results. -- John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED] To say that Bilbo's breath was taken away is no description at all. There are no words left to express his staggerment, since Men changed the language that they learned of elves in the days when all the world was wonderful. --_The Hobbit_
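The "begins with a strong RTL character" rule John describes can be sketched with Python's standard `unicodedata` module. This is a simplified reading of the first-strong-character rule only: the function name `paragraph_direction` is made up for illustration, it ignores higher-level protocols and the directional isolates added in later versions of the algorithm, and it is not a bidi implementation.

```python
import unicodedata


def paragraph_direction(text: str) -> str:
    """Guess the base direction from the first strong character.

    Bidi classes "L" (left-to-right) and "R"/"AL" (right-to-left) are
    strong; digits, spaces and punctuation are skipped.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return "LTR"  # default when no strong character is found


# Source code: starts with Latin, so the whole line is laid out LTR.
print(paragraph_direction('String s = "\u05D0\u05D1\u05D2 abc";'))
# The string literal on its own: starts with Hebrew, so it is laid out RTL.
print(paragraph_direction("\u05D0\u05D1\u05D2 abc"))
```

The same characters therefore get a different base direction depending on whether they are viewed inside the source line or as a separate text, which is the embedded versus non-embedded difference in the screenshots.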
Re: Unicode and Security
Gaspar Sinai scripsit: So it is perfectly ok? I can make a non-embedded example too. I do not have time to make childish examples and screenshots to get through my point. I have a job to do and text processing is just my hobby. Mine too, but it's difficult to understand the merits of an objection when no actual examples of the problem are given. -- John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED] To say that Bilbo's breath was taken away is no description at all. There are no words left to express his staggerment, since Men changed the language that they learned of elves in the days when all the world was wonderful. --_The Hobbit_
Re: Unicode and Security
On Sun, 3 Feb 2002, John Cowan wrote: Gaspar Sinai scripsit: So it is perfectly ok? I can make a non-embedded example too. I do not have time to make childish examples and screenshots to get through my point. I have a job to do and text processing is just my hobby. Mine too, but it's difficult to understand the merits of an objection when no actual examples of the problem are given. So the common language is screenshots... Ok. I updated the page. Now the exact same file is viewed with two different viewers at the bottom of this page: http://www.yudit.org/security/ I maintain my view that if there is no proven reversible logical-to-viewed/viewed-to-logical mapping, electronic signatures should be avoided. And the bottom line is: I don't really care if Unicode will admit that this is a problem. If my reasoning (not my screenshots) convinces *some* people not to electronically sign Unicode text I think I did those guys good - and that is enough satisfaction for me. Cheers gaspar
Re: Unicode and Security
On Mon, Feb 04, 2002 at 02:25:05PM +0900, Gaspar Sinai wrote: And the bottom line is: I don't really care if Unicode will admit that this is a problem. If my reasoning (not my screenshots) convince *some* people not to sign electronically unicode text I think I did those guys good - and that is enough satisfaction for me. Why not just warn against signing documents with bidi in them? Odds are, people who would run into this, if warned against using Unicode, would use ISO-8859-6/8 - which is often run through the same bidi algorithm. And what if you don't do those guys good? They miss a multimillion dollar account because they can't work with the client, or they fall for something more common because they're worrying about Unicode? -- David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber) Pointless website: http://dvdeug.dhis.org What we've got is a blue-light special on truth. It's the hottest thing with the youth. -- Information Society, Peace and Love, Inc.
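David's narrower suggestion, warning only about documents that actually contain bidi content, could look roughly like the sketch below. The function name `bidi_findings` and the report format are hypothetical; the set of control characters is limited to the marks, embeddings and overrides discussed in this thread (LRM, RLM, LRE, RLE, PDF, LRO, RLO).

```python
import unicodedata

# Directional marks, embeddings, overrides and PDF.
BIDI_CONTROLS = {"\u200E", "\u200F", "\u202A", "\u202B",
                 "\u202C", "\u202D", "\u202E"}


def bidi_findings(text: str) -> list:
    """List (index, code point, reason) for characters that trigger reordering."""
    findings = []
    for index, ch in enumerate(text):
        code_point = f"U+{ord(ch):04X}"
        if ch in BIDI_CONTROLS:
            findings.append((index, code_point, "bidi control"))
        elif unicodedata.bidirectional(ch) in ("R", "AL"):
            findings.append((index, code_point, "strong right-to-left character"))
    return findings


print(bidi_findings("plain ASCII text"))
print(bidi_findings("\u202Etext1\u202C"))
```

A signing tool could show such a list to the signer, or refuse to proceed, instead of rejecting Unicode text wholesale.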
Re: Unicode and Security
A while back there was some discussion of security. You could start by checking the list archives for those threads. Is Unicode secure? What character standards can be considered secure? What does security really mean for a character encoding? In my opinion, security is related to bugs in software, not to specifications of character encodings. No matter what character encoding you use, you are subject to certain types of security problems in certain environments if you don't write correct and robust programs! The uneasiness you are experiencing at this time is manifest only because Unicode is a relatively new character encoding and software/program environments in which Unicode is found have not been subjected to the same degree of scrutiny and analysis as previous environments which used, for example, only ASCII. I would also like to know your opinion about the need to create another or an 'intermediate' standard. There is no need to do that. The scenarios you present are related to misinterpretations by software, not to any real problems with the specification of Unicode itself. If you precisely specify the input that your software will accept in secure situations where interpretation matters, and specify what things your software will NOT accept as substitutes, then you will not have these kinds of security problems. There is, perhaps, a need for the security community to discuss the types of security attacks that could be mounted against naive software that accepts Unicode strings in secure situations. That's my opinion. Rick
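Rick's advice to specify exactly what input is accepted, and what is not accepted as a substitute, amounts to an allowlist. A minimal sketch follows; the allowed set (lowercase ASCII letters, digits, hyphen, dot) is purely an assumption chosen for illustration, and a real application would define its own.

```python
import string

# Hypothetical policy: only these characters are accepted in a name.
ALLOWED = frozenset(string.ascii_lowercase + string.digits + "-.")


def rejected_characters(name: str) -> list:
    """Return the code points in `name` that fall outside the allowlist."""
    return [f"U+{ord(ch):04X}" for ch in name if ch not in ALLOWED]


print(rejected_characters("microsoft.com"))        # nothing rejected
print(rejected_characters("micr\u043Esoft.com"))   # Cyrillic o is rejected
```

Rejecting by code point, rather than by appearance, means a lookalike is never silently accepted in place of the character the specification names.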
Re: Unicode and Security
On Sun, Feb 03, 2002 at 11:41:11AM +0900, Gaspar Sinai wrote: I had the following problems where unicode could not be used because of security issues. In all cases the signer of a document can be lured into believing that the wording of the document he/she is about to sign is different. This seems more like a legal issue than anything else. It's not legal to lure someone into believing that the wording of the document to be signed is different. I think you're trying to apply a technical solution to a legal problem. 1. Character Order Problem The BIDI algorithm is too complex and not reversible. I could create a BIDI document where only RLO LRO and PDF characters were used, and the WORD, JAVA and KDE produced different word ordering. I don't have access to MS platform now to reproduce this but as far as I can tell it was like: RLOtext1PDFU+0020RLOtext2PDF Because the BIDI algorithm is too complex and vague it can be said that these programs all displayed the text correctly, still differently. text1 text2 text2 text1 If you support the RLO/PDF characters, the answer is 1txet 2txet, if I'm reading it right. If you don't, then there's no reason to run the bidi algorithm, and the answer is text1 text2. Whether ligature forming will actually happen or not is completely up to the font. If the font does have the ligature, it will be formed. The standard does not define all the compulsory ligatures. The whole point of this is that ligatures shouldn't be something most users have to worry about, and they shouldn't be something that changes meaning. If I'm using Times New Roman, it should make the ff, fi, and ffi ligatures automatically. If I switch the document to an old-style font, it should do ct and st automatically. b) Hidden Marks It is possible to make a combining mark, like a negation mark, appear in the base character's body, making it invisible. It is nearly impossible to test the rendering engine for all possible combinations. Sure. 3. 
Text Search Problem It is possible to create texts that look the same, but they cannot be searched because even when fully decomposed and ordered they will be different. I don't see a solution for this. U+0030, U+004F, U+006F, U+039F, U+041E, U+0555, U+0A66, U+0AE6, U+0B66, U+0C66, U+0CE6, U+0E50, U+0ED0, U+1040, U+17E0, U+2070, U+2080, U+2134, U+25CB, U+25EF, U+274D, and U+3007 are all closed circular shapes. But while they could be confused when used inappropriately, they each have distinct meaning and use. If you want text to be searchable, then encode it properly. If you don't, well, that's your choice. This is true in preexisting standards, too - any that include two of the Latin, Cyrillic and Greek scripts. I think I'm missing your perspective. To me, these are minor quirks. Why do you see them as huge problems? -- David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber) Pointless website: http://dvdeug.dhis.org What we've got is a blue-light special on truth. It's the hottest thing with the youth. -- Information Society, Peace and Love, Inc.
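The search point, that lookalikes stay distinct "even when fully decomposed", can be checked with Python's standard `unicodedata.normalize`. The sketch below uses compatibility decomposition (NFKD) as a hypothetical search key; the helper name `search_key` is made up for illustration.

```python
import unicodedata


def search_key(text: str) -> str:
    """Compatibility-decompose text; one plausible key for searching."""
    return unicodedata.normalize("NFKD", text)


# Four of the circular shapes from the list above.
circles = {
    "DIGIT ZERO": "\u0030",
    "LATIN CAPITAL LETTER O": "\u004F",
    "GREEK CAPITAL LETTER OMICRON": "\u039F",
    "CYRILLIC CAPITAL LETTER O": "\u041E",
}

for name, ch in circles.items():
    print(name, [f"U+{ord(c):04X}" for c in search_key(ch)])

# Normalization does fold variants of the *same* character,
# for example U+2070 SUPERSCRIPT ZERO to the plain digit.
print(search_key("\u2070") == "0")
```

Normalization folds different encodings of one character together; it deliberately does not merge characters from different scripts that merely look alike, which is why a visual-confusability check has to be a separate step.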
Re: Unicode and Security
On Sat, 2 Feb 2002, David Starner wrote: [...several lines cut to save room...] I think I'm missing your perspective. To me, these are minor quirks. Why do you see them as huge problems? I am thinking about electronically signed Unicode text documents that are rendered correctly or believed to be rendered correctly, still they look different, seem to contain additional or do not seem to contain some text when viewed with different viewers due to some ambiguities inherent in the standard. It might be just a minor quirk unless it costs me transferring all the money from my bank account to a person unintentionally... Can all the cases be identified and clarified or are there an infinite number of back-doors in the standard? Thank you, Gaspar
Re: Unicode and Security
On Sun, Feb 03, 2002 at 02:15:51PM +0900, Gaspar Sinai wrote: I am thinking about electronically signed Unicode text documents that are rendered correctly or believed to be rendered correctly, still they look different, seem to contain additional or do not seem to contain some text when viewed with different viewers due to some ambiguities inherent in the standard. Some CR's at the right place might produce the same effect in a pure ASCII document. The O/0 and 1/l/| confusables exist in ASCII. It might be just a minor quirk unless it costs me transferring all the money from my bank account to a person unintentionally... There seem to be much easier ways to scam money than to exploit something like this. Promise the world, take their money and run has been changed more by Ebay than Unicode. If you don't trust someone, don't deal with them. If they do pull something like this, it's no more legal than any other form of scam. Can all the cases be identified and clarified or are there an infinite number of back-doors in the standard? Since the only way to fix all these problems would be to prescribe a specific font and specific manner to render text using that font, it's unlikely they will be fixed. But there aren't an infinite number of back-doors in the standard, as it's logically a finite document. -- David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber) Pointless website: http://dvdeug.dhis.org What we've got is a blue-light special on truth. It's the hottest thing with the youth. -- Information Society, Peace and Love, Inc.