New submission from Eric Snow <[email protected]>:
`_Py_Identifier` has been useful but at this point there is a faster and
simpler approach we could take as a replacement: statically initialize the
objects as fields on `_PyRuntimeState` and reference them directly through a
macro.
This would involve the following:
* add a `PyUnicodeObject` field (not a pointer) to `_PyRuntimeState` for each
string that currently uses `_Py_IDENTIFIER()`
* initialize each object as part of the static initializer for `_PyRuntimeState`
* make each "immortal" (e.g. start with a really high refcount)
* add a macro to look up a given string
* update each location that currently uses `_Py_IDENTIFIER()` to use the new
macro instead
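The steps above can be sketched in a toy model. This is only an illustration of the layout, not CPython code: `toy_str` stands in for `PyUnicodeObject`, `toy_runtime` for `_PyRuntimeState`, and `TOY_ID()` for the proposed lookup macro. All of those names, the refcount value, and the chosen strings are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PyUnicodeObject (the real struct has more fields). */
typedef struct {
    long refcnt;        /* starts very high so the object is effectively immortal */
    size_t length;      /* length in bytes, excluding the trailing NUL */
    const char *data;   /* points at the string literal in static storage */
} toy_str;

/* Assumed "really high" refcount; the actual value is a design choice. */
#define TOY_IMMORTAL_REFCNT 999999999L

/* Static initializer for one string object. */
#define TOY_STR_INIT(LITERAL) \
    { TOY_IMMORTAL_REFCNT, sizeof(LITERAL) - 1, LITERAL }

/* Hypothetical stand-in for _PyRuntimeState: the objects are fields,
 * not pointers, so they live in the static data section. */
typedef struct {
    struct {
        toy_str append;
        toy_str join;
        toy_str keys;
    } ids;
} toy_runtime_state;

/* Everything is initialized at compile time; nothing is allocated lazily. */
static toy_runtime_state toy_runtime = {
    .ids = {
        .append = TOY_STR_INIT("append"),
        .join = TOY_STR_INIT("join"),
        .keys = TOY_STR_INIT("keys"),
    },
};

/* Lookup macro: resolves to a fixed address and cannot fail. */
#define TOY_ID(NAME) (&toy_runtime.ids.NAME)

/* Small helper for the example: compare a static string with a C string. */
static int toy_id_matches(const toy_str *s, const char *text)
{
    assert(s != NULL && text != NULL);
    return strlen(text) == s->length
        && memcmp(s->data, text, s->length) == 0;
}
```

A call site would then write something like `toy_id_matches(TOY_ID(keys), "keys")`, with no per-module static variable and no error path for the lookup itself.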
As part of this, we would also do the following:
* get rid of all C-API functions with `_Py_Identifier` parameters
* get rid of the old runtime state related to identifiers
* get rid of `_Py_Identifier`, `_Py_IDENTIFIER()`, etc.
(Note that there are several hundred uses of `_Py_IDENTIFIER()`, including a
number of duplicates.)
Pros:
* reduces indirection (and extra calls) for C-API calls that use the strings
(making the code easier to understand and speeding it up)
* the objects are referenced from a fixed address in the static data section
(speeding things up and allowing the C compiler to optimize better)
* there is no lazy allocation (or lookup, etc.) so there are fewer possible
failures when the objects get used (thus less error return checking)
* simplifies the runtime state
* saves memory (a little, at least)
* making the strings per-interpreter would be simpler (if needed)
* reduces the number of static variables in any given C module
* reduces the number of functions in the ("private") C-API
* "deep frozen" modules can use these strings
* other commonly-used strings could be pre-allocated by adding
`_PyRuntimeState` fields for them
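To illustrate the "no lazy allocation" point, here is a toy contrast between a lazily created identifier and a statically initialized one. It is a simplified model under stated assumptions, not the real `_Py_Identifier` machinery: `toy_lazy_id`, `toy_lazy_get()`, and the other `toy_*` names are hypothetical, and plain C strings stand in for string objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Lazy model: the object is created on first use, which can fail. */
typedef struct {
    const char *literal;  /* source text of the identifier */
    char *object;         /* NULL until the first successful lookup */
} toy_lazy_id;

static const char *toy_lazy_get(toy_lazy_id *id)
{
    if (id->object == NULL) {
        size_t n = strlen(id->literal) + 1;
        id->object = malloc(n);
        if (id->object == NULL) {
            return NULL;  /* allocation failure must be reported */
        }
        memcpy(id->object, id->literal, n);
    }
    return id->object;
}

/* Caller of the lazy variant: needs an error return and a check. */
int toy_lookup_lazy(size_t *out_len)
{
    static toy_lazy_id id = {"keys", NULL};
    assert(out_len != NULL);
    const char *s = toy_lazy_get(&id);
    if (s == NULL) {
        return -1;
    }
    *out_len = strlen(s);
    return 0;
}

/* Static model: the data exists before main() runs, at a fixed address. */
static const char toy_static_keys[] = "keys";

/* Caller of the static variant: no failure mode, so no error return. */
size_t toy_lookup_static(void)
{
    return sizeof(toy_static_keys) - 1;
}
```

The lazy caller has to propagate a possible failure, while the static caller can return the value directly; that difference is what removes error checks at each use.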
Cons:
* churn
* adding a string to the list requires modifying a separate file from the one
where you actually want to use the string
* strings can get "orphaned" (we could prevent this with a check in `make
check`)
* some PyPI packages may rely on `_Py_IDENTIFIER()` (even though it is
"private" C-API)
* some strings may never get used for any given ./python invocation
Note that with a basic partial implementation (GH-30928) I'm seeing a 1%
improvement in performance (see
https://github.com/faster-cpython/ideas/issues/230).
----------
assignee: eric.snow
components: Interpreter Core
messages: 411799
nosy: eric.snow, serhiy.storchaka, vstinner
priority: normal
pull_requests: 29107
severity: normal
stage: needs patch
status: open
title: Replace _Py_IDENTIFIER() with statically initialized objects.
versions: Python 3.11
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue46541>
_______________________________________