jinx ;-)

On Thu, Dec 15, 2016 at 4:38 AM, Nick Coghlan <ncogh...@gmail.com> wrote:
> On 15 December 2016 at 21:17, Toshio Kuratomi <a.bad...@gmail.com> wrote:
>> My one concern is precisely this variety. For instance, if I get a
>> report that my application is raising a UnicodeError on RHEL7 when run
>> under cron (which uses the C locale) I might then try to replicate the
>> error on Fedora using the same LC_ALL=C locale. With this change I
>> would fail to reproduce the error.
>
> But with the current patch you *would* get a visible warning on stderr saying:
>
>     Python detected LC_CTYPE=C. Setting LC_ALL & LANG to C.UTF-8.
>
warning is not enough IMHO, but....

>
> Agreed, and my original idea upstream included an environment variable
> override to account for that case:
I do think this is sufficient. Debugging already requires setting an env
var (LC_*=C) so setting a second one in addition is not a big deal. The
internet will have outdated information on debugging for a few years and
then people will figure out the new invocation and adapt.

>> I think the library is the appropriate place. Otherwise you end up
>> with a python application failing when run under mod_wsgi[*]_ which
>> you can't debug using the command line interpreter.
>
> There's one pragmatic problem with that, and one that's a question of
> appropriate division of responsibilities in terms of understanding the
> runtime's context of use.
>
> The pragmatic problem is that the main CPython binary calls
> https://docs.python.org/3/c-api/sys.html#c.Py_DecodeLocale to convert
> the command line arguments from char* to wchar_t* before it calls
> Py_Main, which means we have to override the locale *before* we hand
> over control to the dynamically linked library. Otherwise we end up in
> exactly the same situation that click complains about: by the time we
> find out there's a problem with the locale, some work has already been
> done using the wrong setting.
>
I thought about this one and decided it isn't really a problem. Just
make the check in both places. The CPython binary is choosing/needs to
preprocess the arguments before calling Py_Main so it needs to check and
set locale to do its job. libpython needs to make the same check before
it knows it can do its job successfully.

(Actually, this looks like an API choice on libpython's part. The
arguments are decoded from raw bytes but they aren't assigned any
semantic meaning. If libpython handled the decoding as well, then this
wouldn't be a concern).

> The architectural problem is that when you embed CPython, it really is
> one of the embedding application's responsibilities to configure the
> locale such that the interpreter plays nice with the rest of the
> application.
> It's one thing to second guess the shell from directly
> inside a C-level main() function when we know POSIX makes some really
> old ASCII-centric assumptions and that developers are prone to writing
> "LANG=C" rather than "LANG=C.UTF-8" to turn off their locale settings,
> but something else entirely to second guess a GUI application like
> Blender (where arbitrary amounts of code may have already run before
> the CPython runtime gets initialised) or an application platform with
> its own environment management system like Apache httpd.
>
Yeah, this one is much tougher, although I disagree on the reason it's a
problem. I do not think it's necessarily the embedding application's
responsibility to make sure the embedded interpreter can run, but the
nature of the environment variables being process-global means that the
library can't set them without affecting the application as a whole.
That's a big no-no.

I'd almost say that internalizing the click behaviour could be the
correct design here. Having the library check that it has a locale with
non-ASCII capabilities and fail if it doesn't would be helpful. That
would quickly point to differences in behaviour between running under
mod_wsgi and /usr/bin/python, for instance, prompting the user to fix
the mod_wsgi deployment in advance. OTOH, users don't run into the
problem all the time (it depends on the data being processed and how it
is handled) so it seems heavy-handed to do it this way. (I suppose by the
same argument I'd have to say that click is doing it wrong to force
users to address ASCII-only locales...)

The costs here are very steep in both directions, so I don't see any
good ways to address it yet. The best I can offer so far is for the
library to check and warn if an ASCII-only locale is used. That way
someone who encounters a UnicodeError in code deployed under mod_wsgi is
shown how to debug this when they run it under /usr/bin/python.
So something like this from the library:

    libpython detected LC_CTYPE=C. Some encoding errors may occur. Use
    PYTHONALLOWCLOCALE=1 LC_CTYPE=C /usr/bin/python if debugging this
    under /usr/bin/python.

-Toshio
_______________________________________________
python-devel mailing list -- python-devel@lists.fedoraproject.org
To unsubscribe send an email to python-devel-le...@lists.fedoraproject.org
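For illustration only, here is a minimal sketch of the check-and-warn idea
from the message above, written in Python rather than in libpython's C.
The function names are hypothetical, PYTHONALLOWCLOCALE is only the
override proposed in the message (not an existing setting), and the ASCII
test is a codec-name lookup in the spirit of the click check mentioned in
the thread. The sketch only warns; it never changes the process-global
environment.

```python
import codecs
import sys

# Wording taken from the proposed message; PYTHONALLOWCLOCALE is the
# proposed override named there, not an existing CPython setting.
WARNING = (
    "libpython detected LC_CTYPE=C. Some encoding errors may occur. "
    "Use PYTHONALLOWCLOCALE=1 LC_CTYPE=C /usr/bin/python "
    "if debugging this under /usr/bin/python."
)


def is_ascii_only(encoding_name):
    """Return True if the named codec resolves to plain ASCII."""
    try:
        return codecs.lookup(encoding_name).name == "ascii"
    except LookupError:
        # Assumption: an unknown codec is treated conservatively,
        # as if it could not handle non-ASCII text.
        return True


def warn_if_ascii_locale(encoding_name, stream=None):
    """Print the warning when the locale encoding is ASCII-only.

    Returns True if a warning was written. The environment is left
    untouched, because it is shared with the embedding application.
    """
    if stream is None:
        stream = sys.stderr
    if is_ascii_only(encoding_name):
        print(WARNING, file=stream)
        return True
    return False
```

An embedding application could call
warn_if_ascii_locale(locale.getpreferredencoding(False)) once at start-up.
Passing the encoding in explicitly keeps the check testable without
changing the real locale.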