Hi,

I started to modify Python internals to explicitly pass the Python
thread state ("tstate") to internal C functions:
https://vstinner.github.io/cpython-pass-tstate.html

Until subinterpreters are fully implemented, it's unclear whether
getting the current Python thread state using the _PyThreadState_GET()
macro (which uses an atomic read) will remain efficient. For example,
right now, the "GIL state" API doesn't support subinterpreters: fixing
this may require adding a lock somewhere, which may make
_PyThreadState_GET() less efficient. Sorry, I don't have numbers,
since I haven't tried to implement these changes yet: I was
blocked by other issues. We can only guess at this point.
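
To make the two styles concrete, here is a minimal sketch. It is not
CPython code: ThreadState, current_tstate, get_current_tstate() and the
two enter_call_*() functions are hypothetical stand-ins, and the atomic
read only mimics what _PyThreadState_GET() does today.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState; not the real CPython struct. */
typedef struct {
    int recursion_depth;
} ThreadState;

/* Stand-in for the runtime slot holding the current thread state. */
static _Atomic(ThreadState *) current_tstate;

/* Stand-in for _PyThreadState_GET(): one atomic read per call. */
static ThreadState *get_current_tstate(void)
{
    return atomic_load_explicit(&current_tstate, memory_order_relaxed);
}

/* Style 1: every internal function fetches the state again. */
static int enter_call_implicit(void)
{
    ThreadState *tstate = get_current_tstate();
    assert(tstate != NULL);
    return ++tstate->recursion_depth;
}

/* Style 2: the caller fetched tstate once and passes it down. */
static int enter_call_explicit(ThreadState *tstate)
{
    assert(tstate != NULL);
    return ++tstate->recursion_depth;
}
```

With style 2, a caller reads the state once and reuses it for every
nested call, so the cost of the lookup (whatever it becomes later) is
paid a single time per entry point.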

To me, it sounds like a good practice to pass the current Python
thread state to internal C functions. It seems like most core
developers agreed with that in my previous python-dev thread "Pass the
Python thread state to internal C functions":

https://mail.python.org/archives/list/python-dev@python.org/thread/PQBGECVGVYFTVDLBYURLCXA3T7IPEHHO/#Q4IPXMQIM5YRLZLHADUGSUT4ZLXQ6MYY

The question is now whether we should "propagate" tstate to function
calls in the latest VECTORCALL calling convention (which is currently
private). Petr Viktorin plans to make VECTORCALL APIs public in Python
3.9, as planned in PEP 590:
https://bugs.python.org/issue39245
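
As a rough illustration of what "propagating" tstate could look like,
here is a sketch. The first typedef mirrors the shape of the vectorcall
signature from PEP 590; the second one, with a leading tstate
parameter, is purely hypothetical and not a proposal for an exact API.
Object, ThreadState and first_arg() are stand-ins invented for the
example, and nargsf is treated as a plain count (the real API also
stores a flag bit in it).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for PyObject and PyThreadState; not the real structs. */
typedef struct Object { int id; } Object;
typedef struct { int ncalls; } ThreadState;

/* Shape of the existing vectorcall signature (PEP 590). */
typedef Object *(*vectorcall_fn)(Object *callable, Object *const *args,
                                 size_t nargsf, Object *kwnames);

/* Hypothetical variant that also receives the thread state. */
typedef Object *(*vectorcall_tstate_fn)(ThreadState *tstate,
                                        Object *callable,
                                        Object *const *args,
                                        size_t nargsf, Object *kwnames);

/* Example callee: returns its first positional argument, or NULL.
   It uses tstate directly instead of looking it up again. */
static Object *first_arg(ThreadState *tstate, Object *callable,
                         Object *const *args, size_t nargsf,
                         Object *kwnames)
{
    (void)callable;
    (void)kwnames;
    assert(tstate != NULL);
    tstate->ncalls++;
    return nargsf > 0 ? args[0] : NULL;
}
```

The trade-off is one extra argument on every call in exchange for the
callee never having to fetch the thread state itself.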

I explicitly added Stefan Behnel in copy, since Cython should be
directly impacted by such a change. Cython is the kind of project which
may benefit from having tstate accessible directly.

I started to move more and more things from "globals" to "per
interpreter". For example, the garbage collector is now "per
interpreter" (lives in PyInterpreterState). Small integer singletons
are now also "per interpreter": int("1") is now a different object in
each interpreter, whereas it was shared previously. Later, even the
"None" singleton (and all other singletons) should be made "per
interpreter". Getting a "per interpreter" object requires starting from
the Python thread state: call _PyThreadState_GET(). Avoiding
_PyThreadState_GET() calls reduces the risk of making Python slower
with upcoming subinterpreter changes.
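
Here is a small sketch of why the lookup goes through tstate. The
types and names (IntObject, InterpState, ThreadState, interp_init(),
get_small_int(), NSMALLINTS) are simplified stand-ins chosen for the
example, not the real CPython definitions; the only assumption taken
from CPython is that the thread state holds a pointer to its
interpreter.

```c
#include <assert.h>
#include <stddef.h>

#define NSMALLINTS 4   /* illustrative size only */

/* Hypothetical stand-in for a Python int object. */
typedef struct {
    long value;
} IntObject;

/* Stand-in for the interpreter state: owns its own small int cache. */
typedef struct {
    IntObject small_ints[NSMALLINTS];
} InterpState;

/* Stand-in for the thread state: points to its interpreter. */
typedef struct {
    InterpState *interp;
} ThreadState;

/* Fill the cache of one interpreter with the values 0..NSMALLINTS-1. */
static void interp_init(InterpState *interp)
{
    for (long i = 0; i < NSMALLINTS; i++) {
        interp->small_ints[i].value = i;
    }
}

/* Return the cached object for a small value, or NULL if the value
   is outside the cached range. */
static IntObject *get_small_int(ThreadState *tstate, long value)
{
    assert(tstate != NULL && tstate->interp != NULL);
    if (value < 0 || value >= NSMALLINTS) {
        return NULL;
    }
    return &tstate->interp->small_ints[value];
}
```

Two thread states attached to different interpreters get different
objects for the same value, while repeated lookups within one
interpreter return the same object.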

In the long term, the goal is that each subinterpreter has its own
isolated world: no Python object should be shared, no state should be
shared. The intent is to avoid any need for locking, to maximize
performance when running interpreters in parallel. Obviously, each
interpreter will have its own private GIL ;-) Py_INCREF() and
Py_DECREF() would require slow atomic operations if Python objects are
shared. If objects are not shared between interpreters, Py_INCREF()
and Py_DECREF() can remain as fast as they are today. Any additional
locking may kill performance.
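
The refcounting point can be sketched with C11 atomics. This is not
how Py_INCREF() and Py_DECREF() are written; LocalObject, SharedObject
and the four helper functions are hypothetical names used only to
contrast a plain counter with an atomic one.

```c
#include <assert.h>
#include <stdatomic.h>

/* Object owned by one interpreter: its GIL already serializes access,
   so a plain integer is enough. */
typedef struct {
    long refcnt;
} LocalObject;

/* Object shared between interpreters running in parallel: every
   update has to be an atomic read-modify-write. */
typedef struct {
    atomic_long refcnt;
} SharedObject;

static void local_incref(LocalObject *op)
{
    op->refcnt++;
}

/* Returns 1 when the last reference was dropped. */
static int local_decref(LocalObject *op)
{
    assert(op->refcnt > 0);
    return --op->refcnt == 0;
}

static void shared_incref(SharedObject *op)
{
    atomic_fetch_add_explicit(&op->refcnt, 1, memory_order_relaxed);
}

/* Returns 1 when the last reference was dropped. */
static int shared_decref(SharedObject *op)
{
    assert(atomic_load(&op->refcnt) > 0);
    /* fetch_sub returns the previous value: 1 means we hit zero. */
    return atomic_fetch_sub_explicit(&op->refcnt, 1,
                                     memory_order_acq_rel) == 1;
}
```

Both versions give the same counts; the difference is only in cost.
The atomic variant typically forces cache-line synchronization between
cores, which is the overhead that isolated interpreters would avoid.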

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
