jinx ;-)

On Thu, Dec 15, 2016 at 4:38 AM, Nick Coghlan <ncogh...@gmail.com> wrote:
> On 15 December 2016 at 21:17, Toshio Kuratomi <a.bad...@gmail.com> wrote:
>> My one concern is precisely this variety.  For instance, if I get a
>> report that my application is raising a UnicodeError on RHEL7 when run
>> under cron (which uses the C locale) I might then try to replicate the
>> error on Fedora using the same LC_ALL=C locale.  With this change I
>> would fail to reproduce the error.
>
> But with the current patch you *would* get a visible warning on stderr saying:
>
>     Python detected LC_CTYPE=C. Setting LC_ALL & LANG to C.UTF-8.
>
A warning is not enough IMHO, but....
>
> Agreed, and my original idea upstream included an environment variable
> override to account for that case:

I do think this is sufficient.  Debugging already requires setting an
env var (LC_*=C) so setting a second one in addition is not a big
deal.  The internet will have outdated information on debugging for a
few years and then people will figure out the new invocation and
adapt.
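
As a rough sketch of what the "second env var" debugging invocation could
look like when scripted: the helper names below are hypothetical, and
PYTHONALLOWCLOCALE is only the variable name proposed later in this message,
so an interpreter without the patch will simply ignore it.

```python
import os
import subprocess
import sys


def c_locale_debug_env(base_env=None):
    """Return a copy of the environment set up to reproduce C-locale bugs.

    LC_ALL=C forces the C locale; PYTHONALLOWCLOCALE=1 is the override name
    proposed in this thread (an assumption, not a released interpreter flag).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LC_ALL"] = "C"
    env["PYTHONALLOWCLOCALE"] = "1"
    return env


def run_under_c_locale(code):
    """Run a snippet in a child interpreter using the debug environment."""
    return subprocess.run(
        [sys.executable, "-c", code],
        env=c_locale_debug_env(),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
```

The caller's own environment is left untouched; only the child process sees
the two variables.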

>> I think the library is the appropriate place.  Otherwise you end up
>> with a python application failing when run under mod_wsgi[*]_ which
>> you can't debug using the command line interpreter.
>
> There's one pragmatic problem with that, and one that's a question of
> appropriate division of responsibilities in terms of understanding the
> runtime's context of use.
>
> The pragmatic problem is that the main CPython binary calls
> https://docs.python.org/3/c-api/sys.html#c.Py_DecodeLocale to convert
> the command line arguments from char* to wchar_t* before it calls
> Py_Main, which means we have to override the locale *before* we hand
> over control to the dynamically linked library. Otherwise we end up in
> exactly the same situation that click complains about: by the time we
> find out there's a problem with the locale, some work has already been
> done using the wrong setting.
>
I thought about this one and decided it isn't really a problem.  Just
make the check in both places.  The CPython binary is choosing/needs
to preprocess the arguments before calling Py_Main so it needs to
check and set locale to do its job.  libpython needs to make the same
check before it knows it can do its job successfully.  (Actually, this
looks like an API choice on libpython's part.  The arguments are
decoded from raw bytes but they aren't assigned any semantic meaning.
If libpython handled the decoding as well, then this wouldn't be a
concern).

> The architectural problem is that when you embed CPython, it really is
> one of the embedding application's responsibilities to configure the
> locale such that the interpreter plays nice with the rest of the
> application. It's one thing to second guess the shell from directly
> inside a C-level main() function when we know POSIX makes some really
> old ASCII-centric assumptions and that developers are prone to writing
> "LANG=C" rather than "LANG=C.UTF-8" to turn off their locale settings,
> but something else entirely to second guess a GUI application like
> Blender (where arbitrary amounts of code may have already run before
> the CPython runtime gets initialised) or an application platform with
> its own environment management system like Apache httpd.
>
Yeah, this one is much tougher, although I disagree on the reason it's
a problem.  I do not think it's necessarily the embedding
application's responsibility to make sure the embedded interpreter can
run, but the nature of environment variables being process-global
means that the library can't set them without affecting the
application as a whole.  That's a big no-no.

I'd almost say that internalizing the click behaviour could be the
correct design here.  Having the library check that it has a locale with
non-ascii capabilities and fail if it doesn't would be helpful.  That
would quickly point to differences in behaviours running under a
mod_wsgi vs /usr/bin/python, for instance, prompting the user to fix
the mod_wsgi deployment in advance.  OTOH, users don't run into the
problem all the time (it depends on the data being processed and how
it is handled) so it seems heavy handed to do it this way (I suppose
by the same argument I'd have to say that click is doing it wrong to
force users to address ascii-only locales...)
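
To make the "check and fail" idea concrete, here is a minimal Python-level
sketch.  It is not how click or libpython implement it; the function names
and the error text are hypothetical, and what the locale encoding reports
under LC_ALL=C may differ between Python versions.

```python
import codecs
import locale


def is_ascii_only(encoding):
    """Return True if the named encoding resolves to plain ASCII."""
    try:
        return codecs.lookup(encoding).name == "ascii"
    except LookupError:
        # Unknown encoding name: assume it cannot handle non-ASCII text.
        return True


def require_non_ascii_locale(encoding=None):
    """Fail early if the locale encoding cannot represent non-ASCII text."""
    if encoding is None:
        # False: report the current setting without calling setlocale().
        encoding = locale.getpreferredencoding(False)
    if is_ascii_only(encoding):
        # Hypothetical message, written for this sketch only.
        raise RuntimeError(
            "locale encoding %r is ASCII-only; configure a UTF-8 locale"
            % encoding
        )
    return encoding
```

Called once at startup, this would surface a misconfigured mod_wsgi
deployment immediately rather than on the first non-ascii input.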

The costs here are very steep in both directions... so I don't see any
good ways to address it yet.  The best I can offer so far is for the
library to check and warn if an ascii-only locale is used.  That way
someone who encounters a UnicodeError in code deployed under mod_wsgi
is shown how to debug this when they run it under /usr/bin/python.  So
something like this from the library:

    libpython detected LC_CTYPE=C. Some encoding errors may occur.
    Use PYTHONALLOWCLOCALE=1 LC_CTYPE=C /usr/bin/python if debugging this
    under /usr/bin/python.
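
A rough Python-level sketch of that check-and-warn behaviour follows.  The
real check would live in C inside libpython; warn_if_c_locale is a
hypothetical name, and treating "POSIX" the same as "C" is my assumption.

```python
import locale
import sys

# Message text as proposed in this email.
C_LOCALE_WARNING = (
    "libpython detected LC_CTYPE=C. Some encoding errors may occur.\n"
    "Use PYTHONALLOWCLOCALE=1 LC_CTYPE=C /usr/bin/python if debugging this\n"
    "under /usr/bin/python.\n"
)


def warn_if_c_locale(lc_ctype=None, stream=None):
    """Write the warning if LC_CTYPE is the C locale; return True if warned."""
    if lc_ctype is None:
        # With no second argument, setlocale() only queries the setting.
        lc_ctype = locale.setlocale(locale.LC_CTYPE)
    if stream is None:
        stream = sys.stderr
    if lc_ctype in ("C", "POSIX"):
        stream.write(C_LOCALE_WARNING)
        return True
    return False
```

Note that this only reports the problem.  It never touches the environment,
so the embedding application's process-global settings stay as they were.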

-Toshio
_______________________________________________
python-devel mailing list -- python-devel@lists.fedoraproject.org
To unsubscribe send an email to python-devel-le...@lists.fedoraproject.org
