On Wed, Feb 2, 2022 at 2:48 PM Eric Snow <ericsnowcurren...@gmail.com>
wrote:

> I'm planning on moving us to a simpler, more efficient alternative to
> _Py_IDENTIFIER(), but want to see if there are any objections first
> before moving ahead.  Also see https://bugs.python.org/issue46541.
>
> _Py_IDENTIFIER() was added in 2011 to replace several internal string
> object caches and to support cleaning up the cached objects during
> finalization.  A number of "private" functions (each with a
> _Py_Identifier param) were added at that time, mostly corresponding to
> existing functions that take PyObject* or char*.  Note that at present
> there are several hundred uses of _Py_IDENTIFIER(), including a number
> of duplicates.
>
> My plan is to replace our use of _Py_IDENTIFIER() with statically
> initialized string objects (as fields under _PyRuntimeState).  That
> involves the following:
>
> * add a PyUnicodeObject field (not a pointer) to _PyRuntimeState for
> each string that currently uses _Py_IDENTIFIER() (or
> _Py_static_string())
> * statically initialize each object as part of the initializer for
> _PyRuntimeState
> * add a macro to look up a given global string
> * update each location that currently uses _Py_IDENTIFIER() to use the
> new macro instead
>
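
For illustration, the plan above can be modeled in plain C without any
CPython headers. This is only a sketch under stated assumptions: the
names demo_string, demo_runtime_state, DEMO_STRING_INIT and
DEMO_GET_STRING are hypothetical, and demo_string merely stands in for
PyUnicodeObject, whose real layout and initializer are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a string object (NOT PyUnicodeObject). */
typedef struct {
    size_t length;
    const char *data;
} demo_string;

/* One field per global string, stored by value rather than by pointer. */
typedef struct {
    struct {
        demo_string join;
        demo_string close;
    } strings;
} demo_runtime_state;

/* Builds a static initializer from a string literal. */
#define DEMO_STRING_INIT(LITERAL) { sizeof(LITERAL) - 1, LITERAL }

/* Statically initialized: nothing is allocated at run time. */
static demo_runtime_state demo_runtime = {
    .strings = {
        .join = DEMO_STRING_INIT("join"),
        .close = DEMO_STRING_INIT("close"),
    },
};

/* Lookup macro: expands to the fixed address of the named field. */
#define DEMO_GET_STRING(NAME) (&demo_runtime.strings.NAME)

/* Example caller: compares a C string against the global "join". */
int demo_is_join(const char *name)
{
    assert(name != NULL);
    const demo_string *s = DEMO_GET_STRING(join);
    return strlen(name) == s->length
        && memcmp(name, s->data, s->length) == 0;
}
```

In this model, adding a new global string means adding a field and an
initializer line in one central place, which matches the "separate
file" inconvenience listed under Cons below.
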
> Pros:
>
> * reduces indirection (and extra calls) for C-API functions that need
> the strings (making the code a little easier to understand and
> speeding it up)
> * the objects are referenced from a fixed address in the static data
> section instead of the heap (speeding things up and allowing the C
> compiler to optimize better)
> * there is no lazy allocation (or lookup, etc.) so there are fewer
> possible failures when the objects get used (thus less error return
> checking)
> * saves memory (a little, at least)
> * if per-interpreter strings are ever needed, this approach makes
> that simpler
> * helps us get rid of several hundred static variables throughout the code
> base
> * allows us to get rid of _Py_IDENTIFIER() and a bunch of related
> C-API functions
> * "deep frozen" modules can use the global strings
> * commonly-used strings could be pre-allocated by adding
> _PyRuntimeState fields for them
>
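
The "no lazy allocation" point can be shown with a small plain-C model.
This is a sketch only, loosely modeled on the general lazy-cache
pattern and not CPython's actual implementation; lazy_string,
lazy_identifier, lazy_get, static_string and static_get_join are all
hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap-allocated string used by the lazy pattern. */
typedef struct {
    size_t length;
    char *data;
} lazy_string;

/* Lazy pattern: the object is created on first use and then cached. */
typedef struct {
    const char *literal;
    lazy_string *object;   /* NULL until the first successful lookup */
} lazy_identifier;

/* May return NULL if allocation fails, so every caller must check. */
lazy_string *lazy_get(lazy_identifier *id)
{
    if (id->object == NULL) {
        lazy_string *s = malloc(sizeof(*s));
        if (s == NULL) {
            return NULL;
        }
        s->length = strlen(id->literal);
        s->data = malloc(s->length + 1);
        if (s->data == NULL) {
            free(s);
            return NULL;
        }
        memcpy(s->data, id->literal, s->length + 1);
        id->object = s;
    }
    return id->object;
}

/* Static pattern: the object lives in static data at a fixed address. */
typedef struct {
    size_t length;
    const char *data;
} static_string;

static static_string static_join = { 4, "join" };

/* Cannot fail, so callers need no error check. */
const static_string *static_get_join(void)
{
    return &static_join;
}
```

Every call site of lazy_get() has to handle a NULL result, while
static_get_join() always returns a valid address.
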
> Cons:
>
> * a little less convenient: adding a global string requires modifying
> a separate file from the one where you actually want to use the string
> * strings can get "orphaned" (I'm planning on checking in CI)
> * some strings may never get used for any given ./python invocation
> (not that big a difference though)
>
> I have a PR up (https://github.com/python/cpython/pull/30928) that
> adds the global strings and replaces use of _Py_IDENTIFIER() in our
> code base, except for in non-builtin stdlib extension modules.  (Those
> will be handled separately if we proceed.)  The PR also adds a CI
> check for "orphaned" strings.  It leaves _Py_IDENTIFIER() for now, but
> disallows any Py_BUILD_CORE code from using it.
>
> With that change I'm seeing a 1% improvement in performance (see
> https://github.com/faster-cpython/ideas/issues/230).
>
> I'd also like to actually get rid of _Py_IDENTIFIER(), along with
> other related API including ~14 (private) C-API functions.  Dropping
> all that helps reduce maintenance costs.  However, at least one PyPI
> project (blender) is using _Py_IDENTIFIER().  So, before we could get
> rid of it, we'd first have to deal with that project (and any others).
>

Datapoint: an internal code search turns up blender, multidict, and
typed_ast as open source users of _Py_IDENTIFIER.  Those are easy to
clean up via PRs.  There are a couple of internal uses as well, all of
which are similarly easy to address and live only in code that is
already expected to need API cleanup tweaks between CPython versions.

Overall, though, I think it is worthwhile to settle the broader
strategy question among the performance-focused folks.

-gps


>
> To sum up, I wanted to see if there are any objections before I start
> merging anything.  Thanks!
>
> -eric
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/DNMZAMB4M6RVR76RDZMUK2WRLI6KAAYS/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BCQJ6AZMPTI2DGFQPC27RUIFJQGDIOQD/
Code of Conduct: http://python.org/psf/codeofconduct/
