On Sun, Oct 23, 2005 at 10:55:34PM +0900, Dan Kogai wrote:
: To make matters worse, there is not just one "yen sign" in
: Unicode. Take a look at this.
:
: ¥ U+00A5 YEN SIGN
: ￥ U+FFE5 FULLWIDTH YEN SIGN
:
: Though they look and grok the same to humans, computers handle them
: differently. This happened when the Unicode Consortium decided to make
: the BMP round-trippable against legacy encodings. They were distinct in
: the JIS standards, so they are distinct in Unicode too.
:
: Maybe we should avoid other symbols like this for sigils -- those not
: in ASCII that have 'fullwidth' variations. q($) and q(\) are okay
: (or too late) because they are already in ASCII. q(¥) should be
: avoided because you can hardly tell the difference from q(￥) in the
: display.
:
: But this will also outlaw the cent sign. I have attached a list of
: those affected. As you see, most have ASCII equivalents but some
: do not.
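You can regenerate something like the list Dan mentions from the Unicode Character Database. Below is a minimal Python sketch, assuming that "fullwidth variation" means a character whose compatibility decomposition is tagged `<wide>`. It scans the Halfwidth and Fullwidth Forms block (U+FF00 to U+FFEF) with the standard `unicodedata` module and reports whether each narrow companion is ASCII. The function name `wide_variants` is hypothetical.

```python
import unicodedata


def wide_variants(start=0xFF00, end=0xFFEF):
    """Return (wide, narrow) pairs for characters tagged <wide> in the UCD.

    Unassigned code points have an empty decomposition and are skipped.
    """
    pairs = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        decomp = unicodedata.decomposition(ch)  # e.g. "<wide> 00A5" for U+FFE5
        if decomp.startswith("<wide> "):
            narrow = chr(int(decomp.split()[1], 16))
            pairs.append((ch, narrow))
    return pairs


if __name__ == "__main__":
    for wide, narrow in wide_variants():
        kind = "ASCII" if narrow.isascii() else "non-ASCII"
        print(f"U+{ord(wide):04X} -> U+{ord(narrow):04X} "
              f"{unicodedata.name(narrow)} ({kind})")
```

Most entries in the output map into printable ASCII. A few, such as FULLWIDTH CENT SIGN and FULLWIDTH YEN SIGN, map to Latin-1 characters instead, which is the "some are not" case.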
We'd have to outlaw A..Z as well. :-)

I think a better plan might just be to say that we'll treat any fullwidth character as equivalent to its narrow companion, at least when used as an operator. Canonicalizing identifiers may be another matter, though. On the other hand, certain of the double-width characters are likely to be confused with two singles, such as

    ＝ FF1D FULLWIDTH EQUALS SIGN
    ＿ FF3F FULLWIDTH LOW LINE

so maybe they should be equivalent to == and __, or outlawed. And one could (un)reasonably argue that

    ～ FF5E FULLWIDTH TILDE

ought to mean ~~ rather than ~. But in general we need to go slow on such decisions. For now, just sticking our toe into Latin-1 is enough, as long as we're looking ahead for visual pitfalls.

As for the ¥ pitfall, so far we've intentionally been careful to use it only where an operator is expected, whereas \ is legal only where a term is expected. So at least for Perl code, we can translate a legacy ¥ into different codepoints depending on whether it appears where a term or an operator is expected. (Whether a given Japanese font distinguishes them is another issue, of course. I have a "Unicode" font on my machine that prints backslash as ¥, which I find slightly irritating, but that will doubtless be par for the course in Japan for the foreseeable future. Maybe that's a good reason to allow the double-width backslash as an alias for normal backslash. Maybe not.)

Anyway, I think people will be able to distinguish visually between "A ¥ B" and "¥X" as long as we keep the operator/term distinction.

Larry
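Here is a rough Python sketch of the two mechanisms described above. It is not taken from any Perl implementation, and both function names are hypothetical.

- `fold_operator` folds each fullwidth character in an operator token to its narrow companion through its `<wide>` compatibility decomposition. It maps FULLWIDTH EQUALS SIGN and FULLWIDTH LOW LINE to `==` and `__`, which is one of the options floated above, and leaves FULLWIDTH TILDE as a single `~`.
- `disambiguate` decides what an ambiguous backslash/yen glyph should be from its position. Where a term is expected it becomes `\`, and where an operator is expected it becomes U+00A5.

The grammar is a deliberately toy assumption: alphanumeric runs are terms, closing brackets end a term, and anything else is an operator or opener.

```python
import unicodedata

# Hypothetical policy: fullwidth forms likely to be read as two narrow
# characters fold to the doubled operator (one option from the message).
# FULLWIDTH TILDE (U+FF5E) is deliberately left to fold to a single "~".
DOUBLED = {
    "\uFF1D": "==",  # FULLWIDTH EQUALS SIGN
    "\uFF3F": "__",  # FULLWIDTH LOW LINE
}


def fold_operator(op: str) -> str:
    """Fold every fullwidth character in an operator token to its narrow companion."""
    out = []
    for ch in op:
        if ch in DOUBLED:
            out.append(DOUBLED[ch])
            continue
        decomp = unicodedata.decomposition(ch)  # e.g. "<wide> 002B" for U+FF0B
        if decomp.startswith("<wide> "):
            out.append(chr(int(decomp.split()[1], 16)))
        else:
            out.append(ch)
    return "".join(out)


# Glyphs that a legacy Shift-JIS 0x5C byte may have become, depending on
# how the text was transcoded: a real backslash or a YEN SIGN.
AMBIGUOUS = {"\\", "\u00A5"}
CLOSERS = ")]}"


def disambiguate(src: str) -> str:
    """Pick backslash or YEN SIGN for each ambiguous glyph by its position.

    Toy grammar (hypothetical): runs of letters, digits and "_" are terms,
    closing brackets end a term, and any other non-space character is an
    infix operator or opener after which a term is expected.
    """
    out = []
    expect_term = True
    i = 0
    while i < len(src):
        ch = src[i]
        if ch in AMBIGUOUS:
            if expect_term:
                out.append("\\")      # prefix backslash; a term still follows
            else:
                out.append("\u00A5")  # infix YEN SIGN operator
                expect_term = True
            i += 1
        elif ch.isspace():
            out.append(ch)
            i += 1
        elif ch.isalnum() or ch == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            out.append(src[i:j])
            expect_term = False
            i = j
        else:
            out.append(ch)
            expect_term = ch not in CLOSERS
            i += 1
    return "".join(out)
```

With these rules, `disambiguate("A \\ B")` yields the operator form `A ¥ B`, and `disambiguate("¥X")` yields the term form `\X`. The two readings never collide because the parser always knows which of the two it is expecting.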