On 17 August 2015 at 18:12, Erik Bray <erik.m.b...@gmail.com> wrote:
> On Thu, Aug 13, 2015 at 4:42 AM, 俞博文 <steven...@hotmail.com> wrote:
>> Dear Maintainers:
>>
>> This problem occurred when
>> 1. Windows platform
>> 2. Python is installed on non-Latin path (for example: path contains Chinese
>> character).
>> 3. try to "pip install theano"
>>
>> And I found the problem is in distutils.command.build_scripts module's
>> copy_scripts function, on line 106
>>
>>     executable = os.fsencode(executable)
>>     shebang = b"#!" + executable + post_interp + b"\n"
>>     try:
>>         shebang.decode('utf-8')
>>
>> actually os.fsencode will encode the path into GBK encoding on windows, it's
>> certainly that will fail to decode via utf-8.
>>
>> Solution:
>>
>>     #executable = os.fsencode(executable) (delete this line)
>>     executable = executable.encode('utf-8')
>>
>> Theano successfully installed after this patch.
>
> Hi,
>
> This is a bit tricky--I think, from the *nix perspective, using
> os.fsencode() looks like the correct approach here. However, if
> sys.getfilesystemencoding() != 'utf-8', and if the result of
> os.fsencode(executable) is not decodable as utf-8, then that's going
> to be a problem for the Python interpreter which begins reading a file
> as utf-8 until it gets to the coding token.
>
> Unfortunately this is a bit contradictory--if the path to the
> interpreter in the local filesystem encoding is not UTF-8 it is
> impossible to parse that file in Python. On Windows this shouldn't
> matter--I agree with your patch, that it should just write the shebang
> line in UTF-8. However, on *nix systems it really should be using
> os.fsencode, I think.
>
> I wonder if this was brought up in the discussion around PEP-263. I
> feel like as long as the file encoding is declared to be the same as
> whatever encoding was used to write the shebang line, that this
> should be valid. However, the Python interpreter still tries to
> interpret the shebang line as UTF-8, and hence falls over in your
> case. This is unfortunate...
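
The failure described in the original post can be reproduced on any platform with a short sketch. The path below is made up for illustration, and an explicit "gbk" codec stands in for what os.fsencode() would return on a Chinese-locale Windows system using the "mbcs" filesystem encoding. That substitution is an assumption; Python 3.6 and later use UTF-8 as the Windows filesystem encoding, so os.fsencode() itself would not show the problem there.

```python
# Hypothetical interpreter path containing Chinese characters.
executable = "C:\\Python34中文\\python.exe"
post_interp = b""

# Assumption: "gbk" approximates os.fsencode() on a GBK-locale Windows system.
encoded = executable.encode("gbk")
shebang = b"#!" + encoded + post_interp + b"\n"

# Same check as in the quoted distutils snippet: can the line be read as UTF-8?
try:
    shebang.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False

print("shebang decodable as UTF-8:", decodable)
```

The GBK bytes for the non-ASCII part of the path are not a valid UTF-8 sequence, which is why the decode step raises.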

There are a number of questions here, which I don't currently have
time to dig into, I'm afraid:

1. The original post specifies Windows, so I'll stick to that. Unix is
   a whole other situation, and I won't cover that as I have no
   expertise there. But it will need reviewing by someone who does
   know.

2. Where is the shebang being used? I can think of at least 3
   possibilities, and they are all parsed with different code.

   If it's written to a .py file executed by the user (via the
   launcher) it should be UTF-8 as that's what the launcher uses.

   If it's written to the embedded python script in a pip (distlib)
   single-file exe wrapper, it should probably also use UTF-8 as the
   distlib wrappers use code derived from the launcher code (I
   believe) and therefore probably also use UTF-8.

   If it's an old-style setuptools 2-file exe wrapper (.exe and
   -script.py) then it should use whatever that exe requires - I have
   no idea what that might be, but UTF-8 is still the only really sane
   choice, it's just that the setuptools wrapper was written some time
   ago and may not have made that choice. Someone should check.

3. Long story short, use UTF-8, but you may need to check the code
   that interprets the shebang just to be sure. Any actual patch needs
   to be conditional on the OS as well (unless it turns out that UTF-8
   is the right answer everywhere, which frankly I doubt...)

Paul
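
One possible shape for an OS-conditional patch, as a minimal sketch only. The helper name encode_shebang and its platform parameter are hypothetical and not part of distutils. The choice of UTF-8 on Windows assumes that whatever reads the shebang (the launcher or an exe wrapper) expects UTF-8, which still needs checking as noted above.

```python
import os
import sys


def encode_shebang(executable, post_interp=b"", platform=sys.platform):
    """Build a shebang line as bytes (hypothetical helper, not distutils API).

    On Windows the interpreter path is written as UTF-8; elsewhere the
    filesystem encoding is kept via os.fsencode().
    """
    if platform == "win32":
        # Assumption: the Windows launcher and wrappers read the line as UTF-8.
        encoded = executable.encode("utf-8")
    else:
        encoded = os.fsencode(executable)
    return b"#!" + encoded + post_interp + b"\n"


# Hypothetical Windows path with non-Latin characters.
line = encode_shebang("C:\\Python34中文\\python.exe", platform="win32")
print(line.decode("utf-8"), end="")
```

Passing the platform as a parameter is only there so both branches can be exercised from one machine; a real patch would presumably test sys.platform or os.name directly.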