A WSGI script file is not loaded as the __main__ module, which is how a script file given as an argument to command-line Python is treated. From memory, the code for importing that script file as the __main__ module, along with all the special treatment that goes with it, is very convoluted and not something that can be reused by anything else. I even recollect it being implemented in an internal C function of CPython that can't be called from anywhere else.
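To make the distinction concrete, here is a minimal sketch, assuming a loader that compiles the script source and executes it in a freshly created module object. This is not mod_wsgi's actual loading code, and the module name `_example_wsgi_script` is made up for illustration. It only shows that under such a loader the usual `if __name__ == '__main__':` guard in a script never fires.

```python
import types

# Hypothetical script source, standing in for the contents of a WSGI script file.
SCRIPT_SOURCE = """
ran_as_main = False

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'It works!\\n']

if __name__ == '__main__':
    ran_as_main = True
"""


def load_script(source, module_name, filename="<wsgi-script>"):
    """Compile the source and execute it in a new module with the given name."""
    module = types.ModuleType(module_name)
    module.__file__ = filename
    code = compile(source, filename, "exec")
    exec(code, module.__dict__)
    return module


# The module name is an assumption for illustration only.
module = load_script(SCRIPT_SOURCE, "_example_wsgi_script")
print(module.__name__)     # the name chosen by the loader, not __main__
print(module.ran_as_main)  # False: the __main__ guard did not run
```

Because the loader chooses the name, several script files can live in one interpreter under different names, which is exactly what a single `__main__` cannot offer.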
That was back in Python 1.X/2.X days though. The question thus is whether Python 3.X has refactored the code for processing the script file as the __main__ module out into a separate pure Python module. If that has been done, and one can dictate a different name for the module besides __main__ (which cannot be used in mod_wsgi, since one could technically have multiple WSGI script files loaded in the same interpreter context, each needing a different name), then maybe it could be reused. This also depends on whether module reloading for embedded mode of mod_wsgi can still be handled. So there are quite a lot of technical problems that would need to be solved first.

If you have the time and can at least identify for me where in the CPython code (or Python stdlib) the importing of the __main__ Python script file is handled with the behaviour you need, that would give me a head start in working out whether it is practical. A starting point may be the runpy module, but after all the rewrites of the Python module system over time I am not sure where it is even handled these days.

So it may be possible now that mod_wsgi only supports Python 3. It definitely wouldn't have been possible when it was supporting both Python 2 and 3, as I am pretty sure the way it was done in Python 2 meant the code wasn't reusable (or that could have been Python 1.X).

On Fri, 22 Mar 2024 at 13:36, Lucas Thode <thode...@gmail.com> wrote:

> The problem with your statement that "As to the initial WSGI script file, it is not a module import and so any special language encoding definition in a magic header of the file is ignored and it should just use whatever the Python lang/locale is set to." is that the CPython interpreter itself does not ignore magic headers and zipapp functionality when passed a script file on the command line. Indeed, `python3 myapp.pyz`, where myapp.pyz is a valid Python zipapp, will run the Python code in the zipapp's __main__.py.
> Should I file a bug against mod_wsgi regarding its lack of support for what is normal Python functionality in every other context (not just imported modules)?
>
> On Thursday, March 21, 2024 at 9:00:35 PM UTC-5 Graham Dumpleton wrote:
>
> Depends a little bit on whether you are using embedded mode or daemon mode of mod_wsgi, or whether using mod_wsgi-express.
>
> The Python embedded in Apache when not using mod_wsgi-express should by default inherit the system default locale. This is often the C or POSIX locale from memory and not any variant of UTF-8 because Linux distros don't necessarily do sane things, although this may actually have changed.
>
> What is calculated for language/locale for specific HTTP requests to Apache based on Apache's rules makes no difference.
>
> If you are using daemon mode of mod_wsgi you can use the lang/locale option to the WSGIDaemonProcess directive to explicitly set it for those processes.
>
> https://modwsgi.readthedocs.io/en/master/configuration-directives/WSGIDaemonProcess.html#lang
> https://modwsgi.readthedocs.io/en/master/configuration-directives/WSGIDaemonProcess.html#locale
>
> I can't remember if there is a way of overriding it for embedded mode easily besides setting it in systemd or other startup files which start up Apache. I don't think so, so it is governed by what the Apache process inherits from the system. You can possibly use Python functions to change it after the process started, but that may be too late for stuff which is already imported.
>
> If you are using mod_wsgi-express, it tries to set things itself to a sane value if not set by the --locale command line option.
>
> Bit of a description about it in:
>
> https://github.com/GrahamDumpleton/mod_wsgi/blob/f54eadd6da8e3da0faccd497d4165de435b97242/docs/release-notes/version-4.4.3.rst#features-changed
>
> * The behaviour of the --locale option to mod_wsgi-express has changed.
> Previously if this option was not defined, then both of the locales en_US.UTF-8 and C.UTF-8 have at times been hardwired as the default locale. These locales are though not always present. As a consequence, a new algorithm is now used. If the --locale option is supplied, the argument will be used as the locale. If no argument is supplied, the default locale for the executing mod_wsgi-express process will be used. If that however is C or POSIX, then an attempt will be made to use either the en_US.UTF-8 or C.UTF-8 locales and if that is not possible only then fallback to the default locale of the mod_wsgi-express process. In other words, unless you override the default language locale, an attempt is made to use an English language locale with UTF-8 encoding.*
>
> So the wisest thing to do if you have a special requirement is to set --locale option.
>
> If you force mod_wsgi-express into embedded mode though, it possibly just inherits whatever parent shell is using again, I can't remember if mod_wsgi-express tries to set it in the parent process as well so inherited in the child process.
>
> As to the initial WSGI script file, it is not a module import and so any special language encoding definition in a magic header of the file is ignored and it should just use whatever the Python lang/locale is set to.
>
> If you need such a thing to be honoured then don't put your real code in the WSGI script file and instead hold your project code in a distinct Python package structure and import modules from it in the WSGI script file.
>
> Not sure if this answers your question or not. My memory is very murky about some of this stuff, especially what happens in embedded mode.
>
> Graham
>
> On Fri, 22 Mar 2024 at 12:31, Lucas Thode <thod...@gmail.com> wrote:
>
> What determines which encoding mod_wsgi uses when it reads WSGI scripts: Apache's configured locale (which for me is en_us.UTF8), or something else?
> (I ask about this because mod_wsgi appears to do low-level manual hackery when reading wsgi script files instead of going through importlib or runpy, which means that it can't handle a zipapp or even something that uses a PEP 263 magic comment to convey encoding information, the latter making it impossible even to "wrap" a zipapp with a loader shim unless something else gives.)
>
> Minimized example (works when you run it using python3 breaks.py, breaks with the errors below if you try to load it using `mod_wsgi-express start-server breaks.py` using a mod_wsgi-express installed into a venv with pip install; note that you will have to save breaks.py as latin1/iso-8859-1 to cause this to break):
>
> $ cat breaks.py
> # coding: latin1
> import sys
> from wsgiref.simple_server import make_server
>
> def application(environ, start_response):
>     start_response('200 OK', [('Content-Type', 'text/plain')])
>     message = 'It works!\n'
>     version = 'Python v' + sys.version.split()[0] + '\n'
>     response = '\n'.join([message, version])
>     return [response.encode()]
>
> def main():
>     with make_server('', 8100, application) as httpd:
>         httpd.serve_forever()
>
> blow_up_unicode = 'â(¡' # \xe2\x28\xa1
>
> if __name__ == '__main__':
>     main()
>
> Errors it generates when run under mod_wsgi-express:
>
> [Thu Mar 21 20:12:56.942439 2024] [wsgi:error] [pid 3288289:tid 140356515776384] mod_wsgi (pid=3288289): Failed to exec Python script file '/tmp/mod_wsgi-localhost:8000:1000/handler.wsgi'.
> [Thu Mar 21 20:12:56.942486 2024] [wsgi:error] [pid 3288289:tid 140356515776384] mod_wsgi (pid=3288289): Exception occurred processing WSGI script '/tmp/mod_wsgi-localhost:8000:1000/handler.wsgi'.
> [Thu Mar 21 20:12:56.943223 2024] [wsgi:error] [pid 3288289:tid 140356515776384] Traceback (most recent call last):
> [Thu Mar 21 20:12:56.943329 2024] [wsgi:error] [pid 3288289:tid 140356515776384] File "/tmp/mod_wsgi-localhost:8000:1000/handler.wsgi", line 90, in <module>
> [Thu Mar 21 20:12:56.943335 2024] [wsgi:error] [pid 3288289:tid 140356515776384] handler = mod_wsgi.server.ApplicationHandler(entry_point,
> [Thu Mar 21 20:12:56.943337 2024] [wsgi:error] [pid 3288289:tid 140356515776384] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> [Thu Mar 21 20:12:56.943345 2024] [wsgi:error] [pid 3288289:tid 140356515776384] File "/home/lucas/wsgizip/lib/python3.11/site-packages/mod_wsgi/server/__init__.py", line 1475, in __init__
> [Thu Mar 21 20:12:56.943348 2024] [wsgi:error] [pid 3288289:tid 140356515776384] code = compile(fp.read(), entry_point, 'exec',
> [Thu Mar 21 20:12:56.943350 2024] [wsgi:error] [pid 3288289:tid 140356515776384] ^^^^^^^^^
> [Thu Mar 21 20:12:56.943356 2024] [wsgi:error] [pid 3288289:tid 140356515776384] File "<frozen codecs>", line 322, in decode
> [Thu Mar 21 20:12:56.943371 2024] [wsgi:error] [pid 3288289:tid 140356515776384] UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 458: invalid continuation byte
>
> --
> You received this message because you are subscribed to the Google Groups "modwsgi" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to modwsgi+u...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/modwsgi/18b16e2e-4e3c-49f7-84af-7351e4619687n%40googlegroups.com <https://groups.google.com/d/msgid/modwsgi/18b16e2e-4e3c-49f7-84af-7351e4619687n%40googlegroups.com?utm_medium=email&utm_source=footer>.
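As a possible starting point for the question at the top of this thread, the standard library's `runpy.run_path()` is a pure Python entry point that executes a script file under a caller-chosen module name via its `run_name` argument. Its documentation says the path may also be a zipfile containing a top-level `__main__.py`. The sketch below is only an experiment, not a proposal for how mod_wsgi should load scripts: the file contents, the `wsgi_script_1` name and the temporary directory are all made up, and it says nothing about whether module reloading in embedded mode could still be handled. It writes a latin1-encoded script with a PEP 263 comment, similar in spirit to breaks.py, and runs it under a name other than `__main__`.

```python
import os
import runpy
import tempfile

# Hypothetical script contents: a PEP 263 coding comment plus a non-ASCII
# string literal whose latin1 bytes (0xe2 0x28 0xa1) are not valid UTF-8.
SCRIPT_TEXT = (
    "# coding: latin1\n"
    "text = '\u00e2(\u00a1'\n"
    "ran_as_main = (__name__ == '__main__')\n"
)

with tempfile.TemporaryDirectory() as tmpdir:
    script_path = os.path.join(tmpdir, "app.wsgi")
    # Save the file as latin1 so that reading it as UTF-8 would fail.
    with open(script_path, "w", encoding="latin1", newline="\n") as fp:
        fp.write(SCRIPT_TEXT)

    # run_name sets the module's __name__; "wsgi_script_1" is an arbitrary
    # illustrative name. The return value is the resulting globals dict.
    namespace = runpy.run_path(script_path, run_name="wsgi_script_1")

print(namespace["__name__"])
print(namespace["ran_as_main"])
print(namespace["text"] == "\u00e2(\u00a1")
```

One caveat worth checking before relying on this: the runpy documentation warns that functions and classes defined by the executed code are not guaranteed to work correctly after a runpy function has returned, and points to importlib for cases where that matters. A WSGI application callable has to outlive the load step, so that limitation would need to be evaluated first.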