On Mon, 14 May 2007 09:42:13 +1000, Aldo Cortesi wrote:

> I don't want to be in a situation where I need to mechanically "clean"
> code (say, from a submitted patch) with a tool because I can't
> reliably verify it by eye.

But you can't reliably verify by eye. That's orders of magnitude more difficult than debugging by eye, and we all know that you can't reliably debug anything but the most trivial programs by eye. If you're relying on cursory visual inspection to recognize harmful code, you're already vulnerable to trojans.

> We should learn from the plethora of Unicode-related security problems
> that have cropped up in the last few years.

Of course we should. And one of the things we should learn is when and how Unicode is a risk, rather than imagining that Unicode is some sort of mystical contamination that creates security problems just by being used.

> - Non-ASCII identifiers would be a barrier to code exchange. If I know
> Python I should be able to easily read any piece of code written
> in it, regardless of the linguistic origin of the author. If PEP
> 3131 is accepted, this will no longer be the case.

But it isn't the case now, so nothing would change. Code exchange regardless of human language is a nice principle, but it doesn't work in practice. How do you use "any piece of code ... regardless of the linguistic origin of the author" when you don't know what the functions and classes and arguments _mean_?

Here's a tiny docstring from one of the functions in the standard library, translated (more or less) into Portuguese. If you can't read Portuguese at least well enough to get by, how could you possibly use this function? What would you use it for? What does it do? What arguments does it take?

    def dirsorteinsercao(a, x, baixo=0, elevado=None):
        """da o artigo x insercao na lista a, e mantem-na a supondo
        classificado e classificado. Se x estiver ja em a, introduza-o
        a direita do x direita mais. Os args opcionais baixos (defeito 0)
        e elevados (len(a) do defeito) limitam a fatia de a a ser
        procurarado.
        """
        # not a non-ASCII character in sight (unless I missed one...)

[Apologies to Portuguese speakers for the dog's breakfast I'm sure Babelfish and I made of the translation.]
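The post shows only the signature and docstring. For anyone who wants to test the claim that the algorithm can be followed without reading the docstring, here is a minimal sketch of a possible body. It assumes the function mirrors the pure-Python `bisect.insort_right` from the standard library (whose docstring the translation appears to be based on); the local name `meio` ("middle") is hypothetical, and the other identifiers are kept from the translated version.

```python
def dirsorteinsercao(a, x, baixo=0, elevado=None):
    # Sketch only: assumes the same logic as the pure-Python
    # bisect.insort_right. `meio` is a hypothetical local name.
    if baixo < 0:
        raise ValueError("baixo must be non-negative")
    if elevado is None:
        elevado = len(a)
    # Binary search over a[baixo:elevado] for the insertion point
    # that lies to the right of any items equal to x.
    while baixo < elevado:
        meio = (baixo + elevado) // 2
        if x < a[meio]:
            elevado = meio      # x belongs in the left half
        else:
            baixo = meio + 1    # x goes after a[meio], even when equal
    a.insert(baixo, x)


numeros = [1, 3, 3, 7]
dirsorteinsercao(numeros, 3)
print(numeros)  # [1, 3, 3, 3, 7]
```

Even with the identifiers left in Portuguese, the binary search is short enough to trace by hand, which is exactly why it makes a poor stand-in for a real module.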

The particular function I chose is probably small enough and obvious enough that you could work out what it does just by following the algorithm. You might even be able to guess what it is, because Portuguese is similar enough to other Latin languages that most people can guess what some of the words might mean (elevados could be height, maybe?).

Now multiply that difficulty by a thousand for a non-trivial module with multiple classes and dozens of methods and functions. And you might not even know what language it is in.

No, code exchange regardless of natural language is a nice principle, but it doesn't exist except in very special circumstances.

> A Python project that uses Urdu identifiers throughout is just as
> useless to me, from a code-exchange point of view, as one written in
> Perl.

That's because you can't read it, not because it uses Unicode. It could be written entirely in ASCII and still be unreadable and impossible to understand.

> - Unicode is harder to work with than ASCII in ways that are more
> important in code than in human-language text. Human eyes don't care
> if two visually indistinguishable characters are used interchangeably.
> Interpreters do. There is no doubt that people will accidentally
> introduce mistakes into their code because of this.

That's no different from typos in ASCII. There's no doubt that we'll give the same answer we've always given for this problem: unit tests, pylint and pychecker.

-- 
Steven.

-- 
http://mail.python.org/mailman/listinfo/python-list