Erik van der Poel <erik at vanderpoel dot org> wrote:

> Let's take an analogy. The P.O. Box system. Right now, it uses numbers
> like P.O. Box 3256. What would happen if the Postal Service decided to
> use Unicode, where some of the characters are only slightly different,
> and the postman inadvertently put some important mail into the wrong
> box, one that was registered by an evil person, using a name that was
> only slightly different from the PO box of some company?
>
> Wouldn't that company try to get the Postal Service to use a smaller
> set of symbols (say, digits) rather than this confusing Unicode? Maybe
> that company would even try to sue the Postal Service.

Sadly, the Postal Service is fully capable of putting mail in the wrong box without the help of Unicode. Sorry. Anyway...

The Postal Service probably would have instituted some type of subset of valid characters for use in P.O. box identifiers. (Or alternatively, they would have a blacklist of invalid characters.) At some point they would discover, or have pointed out to them, confusables that they hadn't thought of. They would probably then amend their list to exclude the newly discovered confusable.

THEN, because they are the Postal Service and have complete authority over the entire system, they would be able to discontinue the use of the evil P.O. box and require the user to change its name, or change it for him, or refund his deposit.

Meanwhile, the reason they switched to Unicode in the first place was precisely that they wanted to offer a wider variety of characters to their users, for whatever reason. Isn't that why domain names were internationalized? Isn't that the reason *anyone* switches to Unicode? They must have had some benefits in mind by adopting a larger repertoire. By switching away from "this confusing Unicode" they are giving up those perceived benefits.

If you have an application that lends itself to a limited repertoire, like automobile license plates or P.O. box numbers or house numbers, use that limited repertoire. If you need a wider repertoire, you can use Unicode and you can *still* implement a subset. The problem, as always, is in determining what the subset should be.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
