I agree with Roozbeh; in the particular case cited, there is not a problem.
There is a very significant problem with transcoding, because different vendors mean different things by a tag like "SJIS". The IANA registry, unfortunately, does not specify the mapping corresponding to this tag in sufficient detail for precise conversions. See, for example, http://www.w3.org/TR/japanese-xml/ for an extended discussion. However, precise tags have been proposed to IANA to resolve these problems (in Appendix D of this document) by providing precise mappings, at least for Japanese. http://oss.software.ibm.com/icu/charset/index.html lists some of the variation among different character mapping tables for other code sets.

However, all this being said, the variations are usually a very small percentage of the total, and usually restricted to a few punctuation marks or symbols. And to the best of my knowledge, people do not vary in how they interpret the ISO 8859 series.

Thus, while the document wants to point to the problem, it would be misleading to give people the impression that a large number of characters will cause security problems, when it is really restricted to a very small number of cases. What would be productive and useful would be to identify and list those characters that could have problems in practice.

Mark
__________
http://www.macchiato.com
“Eppur si muove”

----- Original Message -----
From: "Roozbeh Pournader" <[EMAIL PROTECTED]>
To: "IDN" <[EMAIL PROTECTED]>
Sent: Tuesday, May 28, 2002 05:22
Subject: Re: [idn] Re: Legacy charset conversion in draft-ietf-idn-idna-08.txt

> > The basic attack: Alice runs on host that uses Latin-1 for
> > input/output and enters www.µbank.com (where µ is 8859-1 0xB5). The
> > domain is registered using U+00B5, but Alice's application transcode
> > the string using U+03BC. Either Alice can't connect (if the other
> > domain doesn't exist) or she ends up talking to someone else (if the
> > other domain does exist).
>
> I'm sorry, but your example doesn't work. In nameprep, when doing Unicode
> Normalization, U+00B5 is mapped to U+03BC. So these will be the same
> domain name, and have the same ACE label.
>
> roozbeh
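
Both points in this thread can be checked with a short script. The following is only a sketch, and rests on assumptions that are not part of the original messages: it uses Python's built-in "idna" codec (an IDNA 2003 implementation, so not necessarily identical to the nameprep of the draft under discussion), and it picks Python's "shift_jis" and "cp932" codecs as two stand-ins for different vendors' readings of "SJIS". The helper name ace_label is hypothetical.

```python
import unicodedata


def ace_label(label: str) -> bytes:
    """Return the ACE form of a label using Python's IDNA 2003 codec.

    The codec applies nameprep before Punycode encoding. It is used here
    as a stand-in for the nameprep step discussed in the thread.
    """
    return label.encode("idna")


# Latin-1 byte 0xB5 decodes to U+00B5 MICRO SIGN.
micro = b"\xb5".decode("latin-1")
# GREEK SMALL LETTER MU, the character the attack scenario transcodes to.
mu = "\u03bc"

# Compatibility normalization (NFKC) folds U+00B5 into U+03BC.
nfkc_same = unicodedata.normalize("NFKC", micro) == mu

# Both spellings of the label therefore produce the same ACE label.
ace_same = ace_label(micro + "bank") == ace_label(mu + "bank")

# The same "SJIS" byte pair decoded by two different mapping tables.
# Codec choice is an assumption made for illustration.
wave_sjis = b"\x81\x60".decode("shift_jis")
wave_cp932 = b"\x81\x60".decode("cp932")

if __name__ == "__main__":
    print("U+00B5 normalizes to U+03BC:", nfkc_same)
    print("Same ACE label:", ace_same)
    print("shift_jis 0x8160 -> U+%04X" % ord(wave_sjis))
    print("cp932     0x8160 -> U+%04X" % ord(wave_cp932))
```

The first two checks correspond to Roozbeh's observation about the Latin-1 example; the last two show one punctuation-like character on which two mapping tables for the same tag disagree, which is the kind of case that would be worth listing.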
