On Thu, Jan 20, 2011 at 03:51:05AM +0100, Victor Stinner wrote:
> For a lesson at school, it is nice to write examples in the
> mother language, instead of using "raw" english with ASCII identifiers
> and filenames.
Then use this::

    import cafe as café

When you do things this way you do not have to translate from unknown
encodings into unicode. Everything is within python source where you have
a defined encoding. Teaching students to write non-portable code (relying
on the filesystem encoding, where your solution is "don't upload anything
that has non-ASCII filenames to PyPI") seems like the exact opposite of how
you'd want to shape a young student's understanding of good programming
practices.

> In a school, you can use the same configuration
> (encoding) on all computers.
>
In a school computer lab perhaps. But not on all the students' and
professors' machines. How many professors will be cursing python when they
discover that the example code that they wrote on their Linux workstation
doesn't work when the students try to use it in their Windows computer lab?
How many students will be upset when the code they turn in runs on their
professor's test machine if the lab computers were booted into the Linux
partition but not if they were booted into Windows?

>
> > > > * Specify an encoding per platform and stick to that.
> > >
> > > It doesn't work: on UNIX/BSD, the user chooses its own encoding and all
> > > programs will use it.
> > >
> > (...) This prevents getting a mixture of encodings of modules (...)
>
> If you have an issue with encodings, when have to fix it when you create
> a module (on disk), not when you load a module (it is too late).
>
It's not too late to throw a clear error about what's wrong.

> > I haven't looked at your patch so
> > perhaps you have an ingenous method of translating from the unicode
> > representation of the module in the import statement to the bytes in
> > arbitrary encodings on the filesystem that I haven't thought of.
>
> On Windows, My patch tries to avoid any conversion: it uses unicode
> everywhere.
>
> On other OSes, it uses the Python filesystem encoding to encode a module
> name (as it is done for any other operation on the filesystem with an
> unicode filename).
>
The other interfaces are somewhat of a red herring here. As I wrote in
another email, importing modules has ramifications that open(), for
instance, does not. Additionally, those other filesystem operations have
been growing the ability to take byte values and encoding parameters
because unicode translation via a single filesystem encoding is a good
default but not a complete solution. I think that this problem demands a
complete solution, however, and it seems to me that limiting the scope of
the problem is the most pleasant way to get one.

Your solution creates modules which aren't portable. One of my proposals
creates python code which isn't portable. The other one suffers some of
the same portability disadvantages as your solution but allows for tools
that could automatically correct modules.

> --
>
> Python 3 supports bytes filename to be able to read/copy/delete
> undecodable filenames, filenames stored in a encoding different than the
> system encoding, broken filenames. It is also possible to access these
> files using PEP 383 (with surrogate characters). This is useful to use
> Python on an old system.
>
> > If you don't, however, then really - ASCII-only seems like the sanest
> > of the three solutions I can think of.
>
> But a (Python 3) module is not supposed to have a broken filename. If it
> is the case, you have better to fix its name, instead of trying to fix
> the problem later (in Python).
>
We agree that there should not be broken module names. However, it seems
we disagree very hotly about what counts as broken. You think that if a
module is named appropriately on one system but is not portable to another
system, that's fine.
I think that portability between systems is very important, and that
sacrificing it so that someone can locally use a module with non-ASCII
characters in its name doesn't bring a justifiable reward.

> With UTF-8 filesystem encoding (eg. on Mac OS X, and most Linux setups),
> it is already possible to use non-ASCII module names.
>
Tangent: This is not true about Linux. On Linux, UTF-8 is only an
interpretation of the filesystem bytes, one that the user selects by
setting their system locale. Setting the system locale to ASCII for use in
system-wide scripts is quite common, as is changing locale settings in
other parts of the world (as I can tell you from the bug reports that
colleagues CC me on so that I can fix the problems with unicode support in
their python2 programs). Allowing module names incompatible with ASCII
without specifying an encoding will just lead to bug reports down the line.

Relatively few programmers understand the difference between the python
unicode abstraction and the byte representations possible for those
strings. Allowing non-ASCII characters in module filenames without
specifying an encoding sets a trap for these programmers to fall into when
they move beyond their studies to programming for customers, PyPI
downloaders, etc., who don't have the same environment as they do.

-Toshio
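
A minimal sketch of the byte-representation problem described above,
assuming a module named "café" and three possible filesystem encodings.
The helper names filename_bytes and visible_under are hypothetical and
exist only for this illustration. The sketch also decodes strictly, which
is a simplification: as far as I know, Python 3 itself decodes filenames
with the surrogateescape error handler from PEP 383 rather than failing.

```python
# The module name as it would appear in an import statement:
# "cafe" with an acute accent on the final letter.
MODULE_NAME = "caf\u00e9"


def filename_bytes(name, fs_encoding):
    """Bytes the module file would have on disk under fs_encoding.

    Returns None when the name cannot be encoded at all.
    """
    try:
        return (name + ".py").encode(fs_encoding)
    except UnicodeEncodeError:
        return None


def visible_under(disk_bytes, fs_encoding):
    """Decode on-disk bytes the way a machine using fs_encoding would.

    Returns None when the bytes are not valid in that encoding.
    """
    try:
        return disk_bytes.decode(fs_encoding)
    except UnicodeDecodeError:
        return None


utf8_name = filename_bytes(MODULE_NAME, "utf-8")
latin1_name = filename_bytes(MODULE_NAME, "latin-1")
ascii_name = filename_bytes(MODULE_NAME, "ascii")

# One unicode module name, three different outcomes on disk.
for label, value in [("utf-8", utf8_name),
                     ("latin-1", latin1_name),
                     ("ascii", ascii_name)]:
    print(label, ascii(value))

# A file created on a Latin-1 machine is not a valid UTF-8 name, and a
# file created on a UTF-8 machine shows up under a different name on a
# Latin-1 machine, so "import" of the original name finds neither.
print(ascii(visible_under(latin1_name, "utf-8")))
print(ascii(visible_under(utf8_name, "latin-1")))
```

In this sketch the same import statement maps to different bytes on each
machine, which is why a module file copied between machines with different
locale settings can stop being importable even though the source that
imports it has not changed.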
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com