Hi,

I started to modify Python internals to explicitly pass the Python thread state ("tstate") to internal C functions:
https://vstinner.github.io/cpython-pass-tstate.html

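To make the idea concrete, here is a minimal, self-contained sketch of the two styles. It does not use CPython's real headers: every Toy* and toy_* name is hypothetical and only stands in for PyThreadState, _PyThreadState_GET() and an internal function. The counter exists only to show how many times the "current thread state" is looked up.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for PyThreadState; all names in this sketch are hypothetical. */
typedef struct ToyThreadState {
    int recursion_depth;
} ToyThreadState;

/* Stand-in for the global that a macro like _PyThreadState_GET() would read. */
static ToyThreadState *toy_current_tstate = NULL;
static int toy_lookup_count = 0;   /* how many times the global was read */

static ToyThreadState *toy_tstate_get(void)
{
    toy_lookup_count++;
    return toy_current_tstate;
}

/* Style 1: the internal function fetches the thread state itself. */
static int toy_enter_call_implicit(void)
{
    ToyThreadState *tstate = toy_tstate_get();
    assert(tstate != NULL);
    return ++tstate->recursion_depth;
}

/* Style 2: the caller passes tstate explicitly; no lookup happens here. */
static int toy_enter_call(ToyThreadState *tstate)
{
    assert(tstate != NULL);
    return ++tstate->recursion_depth;
}

/* An entry point fetches tstate once and hands it down to its callees. */
static int toy_call_three_levels(void)
{
    ToyThreadState *tstate = toy_tstate_get();
    toy_enter_call(tstate);
    toy_enter_call(tstate);
    return toy_enter_call(tstate);
}
```

With the explicit style, three nested calls cost one lookup; with the implicit style, each call does its own lookup. Whether that difference matters in practice depends on how expensive the real lookup becomes.
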
Until subinterpreters are fully implemented, it's unclear whether getting the current Python thread state using the _PyThreadState_GET() macro (which uses an atomic read) will remain efficient. For example, right now, the "GIL state" API doesn't support subinterpreters: fixing this may require adding a lock somewhere, which may make _PyThreadState_GET() less efficient. Sorry, I don't have numbers, since I haven't tried to implement these changes yet: I was blocked by other issues. We can only guess at this point.

To me, it sounds like a good practice to pass the current Python thread state to internal C functions. It seems like most core developers agreed with that in my previous python-dev thread "Pass the Python thread state to internal C functions":
https://mail.python.org/archives/list/python-dev@python.org/thread/PQBGECVGVYFTVDLBYURLCXA3T7IPEHHO/#Q4IPXMQIM5YRLZLHADUGSUT4ZLXQ6MYY

The question is now whether we should "propagate" tstate to function calls in the latest VECTORCALL calling convention (which is currently private). Petr Viktorin plans to make the VECTORCALL APIs public in Python 3.9, as planned in PEP 590:
https://bugs.python.org/issue39245

I explicitly added Stefan Behnel in copy, since Cython would be directly impacted by such a change. Cython is the kind of project which may benefit from having tstate accessible directly.

I started to move more and more things from "globals" to "per interpreter". For example, the garbage collector is now "per interpreter" (lives in PyInterpreterState). Small integer singletons are now also "per interpreter": int("1") is now a different object in each interpreter, whereas it was shared previously. Later, even the "None" singleton (and all other singletons) should be made "per interpreter". Getting a "per interpreter" object requires reading the interpreter state from the Python thread state: call _PyThreadState_GET(). Avoiding _PyThreadState_GET() calls reduces the risk of making Python slower with the upcoming subinterpreter changes.

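As a rough illustration of "per interpreter" singletons reached through tstate, here is a toy model. It is only a sketch: the Toy* types, the cache size and toy_get_small_int() are hypothetical and do not match CPython's real layout. The only point is that the object returned for the same value depends on which interpreter the thread state belongs to.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NSMALLINTS 4   /* hypothetical size; the real cache is larger */

typedef struct ToyObject {
    long refcnt;
    long value;
} ToyObject;

/* Toy stand-in for the interpreter state: it owns its own small ints. */
typedef struct ToyInterpreterState {
    ToyObject small_ints[TOY_NSMALLINTS];
    int gc_enabled;                        /* GC state is per interpreter too */
} ToyInterpreterState;

/* Toy stand-in for the thread state: it points at its interpreter. */
typedef struct ToyThreadState {
    ToyInterpreterState *interp;
} ToyThreadState;

static void toy_interp_init(ToyInterpreterState *interp)
{
    for (long i = 0; i < TOY_NSMALLINTS; i++) {
        interp->small_ints[i].refcnt = 1;
        interp->small_ints[i].value = i;
    }
    interp->gc_enabled = 1;
}

/* Return this interpreter's cached object for value, with a new reference,
   or NULL if value is outside the toy cache. tstate is passed explicitly,
   so no "current thread state" lookup is needed here. */
static ToyObject *toy_get_small_int(ToyThreadState *tstate, long value)
{
    assert(tstate != NULL && tstate->interp != NULL);
    if (value < 0 || value >= TOY_NSMALLINTS) {
        return NULL;
    }
    ToyObject *obj = &tstate->interp->small_ints[value];
    obj->refcnt++;   /* plain increment: the object is never shared */
    return obj;
}
```

Two thread states that belong to different interpreters get different objects for the same value, while repeated requests within one interpreter return the same object.
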
For the long term, the goal is that each subinterpreter has its own isolated world: no Python object should be shared, no state should be shared. The intent is to avoid any need for locking, to maximize performance when running interpreters in parallel. Obviously, each interpreter will have its own private GIL ;-)

Py_INCREF() and Py_DECREF() would require slow atomic operations if Python objects were shared. If objects are not shared between interpreters, Py_INCREF() and Py_DECREF() can remain as fast as they are today. Any additional locking may kill performance.

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/PIXJAJPWKDGHSQD65VOO2B7FDLU2QLHH/