On Tue, Jun 23, 2020 at 16:56, Stefan Behnel <stefan...@behnel.de> wrote:
> > Adding a new member breaks the stable ABI (PEP 384), especially for
> > types declared statically (e.g. ``static PyTypeObject MyType =
> > {...};``). In Python 3.4, the PEP 442 "Safe object finalization" added
> > the ``tp_finalize`` member at the end of the ``PyTypeObject`` structure.
> > For ABI backward compatibility, a new ``Py_TPFLAGS_HAVE_FINALIZE`` type
> > flag was required to announce if the type structure contains the
> > ``tp_finalize`` member. The flag was removed in Python 3.8 (`bpo-32388
> > <https://bugs.python.org/issue32388>`_).
>
> Probably not the best example. I think this is pretty much normal API
> evolution. Changing the deallocation protocol for objects is going to
> impact any public API in one way or another. PyTypeObject is also not
> exposed with its struct fields in the limited API, so your point regarding
> "tp_print" is also not a strong one.

PEP 442 doesn't break backward compatibility: C extensions using
tp_dealloc continue to work.

But adding a new member to PyTypeObject caused practical
implementation issues. I'm not sure why you are mentioning the limited
C API: most C extensions don't use it and declare their types
statically.

I'm not trying to describe the Py_TPFLAGS_HAVE_FINALIZE story as a
major blocker in CPython's history. It's just one of many examples of
the difficulties involved in evolving CPython internals.


> > Same CPython design since 1990: structures and reference counting
> > -----------------------------------------------------------------
> > Members of ``PyObject`` and ``PyTupleObject`` structures have not
> > changed since the "Initial revision" commit (1990)
>
> While I see an advantage in hiding the details of PyObject (specifically
> memory management internals), I would argue that there simply isn't much to
> improve in PyTupleObject, so these two don't fly at the same level for me.

There are different reasons to make PyTupleObject opaque:

* Prevent access to members of its "PyObject ob_base" member (disallow
accessing "tuple->ob_base.ob_refcnt" directly).

* Prevent C extensions from making assumptions about how a Python
implementation stores a tuple. Currently, C extensions are designed to
get the best performance with CPython, but that makes them run slower
on PyPy.

* It becomes possible to experiment with a more efficient
PyTupleObject layout, in terms of memory footprint or runtime
performance, depending on the use case. For example, storing numbers
directly as numbers rather than as PyObject*. Or maybe using a
different layout that makes PyList_AsTuple() an O(1) operation. I had
a similar idea about converting a bytearray into bytes without having
to copy memory; that would also require modifying PyBytesObject to
experiment with such an idea. An array of PyObject* is not necessarily
the most efficient storage for all use cases.
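To make the idea concrete, here is a minimal sketch in plain C (mock structures and hypothetical names, not CPython code) of what an "opaque tuple" could look like: the struct layout lives only in the implementation file, and callers go through accessor functions, so the storage strategy can change later without breaking them.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for PyObject. */
typedef struct object { int refcnt; } object_t;

/* In a public header this would only be a forward declaration:
 * "typedef struct tuple tuple_t;" -- the layout below stays private. */
typedef struct tuple {
    object_t ob_base;    /* hidden: no more t->ob_base.refcnt */
    size_t size;
    object_t **items;    /* hidden: layout is free to change */
} tuple_t;

tuple_t *tuple_new(size_t size) {
    tuple_t *t = malloc(sizeof(*t));
    t->ob_base.refcnt = 1;
    t->size = size;
    t->items = calloc(size, sizeof(object_t *));
    return t;
}

/* Accessor functions: the only way callers touch a tuple. */
size_t tuple_size(const tuple_t *t) { return t->size; }
object_t *tuple_get_item(const tuple_t *t, size_t i) { return t->items[i]; }
void tuple_set_item(tuple_t *t, size_t i, object_t *v) { t->items[i] = v; }
```

With this shape, switching the backing storage (specialized number storage, shared buffers, etc.) only requires recompiling the implementation, not the extensions built on top of it.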


> My feeling is that PyPy specifically is better served with the HPy API,
> which is different enough to consider it a mostly separate API, or an
> evolution of the limited API, if you want. Suggesting that extension
> authors support two different APIs is much, but forcing them to support the
> existing CPython C-API (for legacy reasons) and the changed CPython C-API
> (for future compatibility), and then asking them to support a separate
> C-API in addition (for platform independence, with performance penalties)
> seems stretching it a lot.

PEP 620 changes the C API to make it converge towards the limited C
API, but it also prepares C extensions to ease their migration to HPy.

For example, by design, HPy doesn't give direct access to the
PyTupleObject.ob_item member. Enforcing usage of the PyTuple_GetItem()
function or the PyTuple_GET_ITEM() macro should ease migration to
HPy_GetItem_i().


I disagree that extension authors have to support two C APIs. Many of
PEP 620's incompatible C API changes are already completed, and I was
surprised by the very low number of extensions affected by these
changes. In practice, most extensions use simple and regular C code;
they don't "abuse" the C API. Cython itself is affected by most
changes since Cython basically uses all C API features :-) But in
practice, only a minority of extensions written with Cython are
affected, since they (indirectly, via Cython) only use a subset of the
C API.

Also, once an extension is updated for the incompatible changes, it
remains compatible with old Python versions. When a new function is
used, pythoncapi_compat.h can be used to support old Python versions.
It is not as if code has to be duplicated to support two unrelated
APIs.


> > * (**Completed**) Add new functions ``Py_SET_TYPE()``, ``Py_SET_REFCNT()`` 
> > and
> >   ``Py_SET_SIZE()``. The ``Py_TYPE()``, ``Py_REFCNT()`` and
> >   ``Py_SIZE()`` macros become functions which cannot be used as l-value.
> > * (**Completed**) New C API functions must not return borrowed
> >   references.
> > * (**In Progress**) Provide ``pythoncapi_compat.h`` header file.
> > * (**In Progress**) Make structures opaque, add getter and setter
> >   functions.
> > * (**Not Started**) Deprecate ``PySequence_Fast_ITEMS()``.
> > * (**Not Started**) Convert ``PyTuple_GET_ITEM()`` and
> >   ``PyList_GET_ITEM()`` macros to static inline functions.
>
> Most of these have the potential to break code, sometimes needlessly,
> AFAICT.

The Py_SET_xxx() functions are designed to allow experimenting with
tagged pointers in CPython. Do you mean that tagged pointers are not
worth experimenting with? Neil's early proof-of-concept was promising:
https://mail.python.org/archives/list/capi-...@python.org/thread/EGAY55ZWMF2WSEMP7VAZSFZCZ4VARU7L/#EGAY55ZWMF2WSEMP7VAZSFZCZ4VARU7L

PyPy decided to abandon tagged pointers, since they weren't really
worth it in PyPy. But PyPy and CPython have very different designs;
IMO the performance gain would be more interesting in CPython than in
PyPy.
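For readers unfamiliar with the technique, here is a tiny self-contained sketch of tagged pointers in plain C (an illustration, not Neil's actual patch): small integers are encoded directly in the "pointer" value using the low bit, which real object pointers never set thanks to alignment. This is also why direct structure access becomes a problem: a tagged value has no struct behind it to dereference, so operations like reading a type or reference count must go through functions that check the tag first.

```c
#include <stdint.h>

/* A reference is either a real object pointer (low bit 0, guaranteed
 * by alignment) or a small integer tagged with the low bit set. */
typedef uintptr_t ref_t;

static int is_tagged_int(ref_t r) { return (int)(r & 1); }

static ref_t tag_int(intptr_t value) {
    /* Shift the value up and set the tag bit. */
    return ((uintptr_t)value << 1) | 1u;
}

static intptr_t untag_int(ref_t r) {
    /* Arithmetic right shift restores the sign (true on mainstream
     * compilers, though implementation-defined in strict ISO C). */
    return (intptr_t)r >> 1;
}

static ref_t from_object(void *obj) { return (uintptr_t)obj; }
```

A `ref_t` cannot be passed to code that pokes at `ob_refcnt` or `ob_type` fields directly, which is exactly why accessor functions like Py_SET_TYPE() and Py_SET_REFCNT() need to exist before such an experiment is possible.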


> Especially the efforts to block away the internal data structures
> annoy me. It's obviously ok if we don't require other implementations to
> provide this access, but CPython has these data structures and I think it
> should continue to expose them.

CPython continues to expose structures in its internal C API.


> If we remove CPython specific features from the (de-facto) "official public
> Python C-API", then I think there should be a "public CPython 3.X C-API"
> that actively exposes the data structures natively, not just an "internal"
> one. That way, extension authors can take the usual decision between
> performance, maintenance effort and platform independence.

I would like to promote "portable" C code, rather than promote writing
CPython-specific code.

I mean that the "default" should be the portable API, and writing
CPython-specific code would be a deliberate opt-in choice.


> >     typedef struct {
> >         PyObject ob_base;
> >         double ob_fval;
> >     } PyFloatObject;
>
> Please keep PyFloat_AS_DOUBLE() and friends do what they currently do.

If PyFloatObject becomes opaque, the PyFloat_AS_DOUBLE() macro must
become a function call.
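A plain-C sketch (mock struct and hypothetical names, not CPython code) of the difference: today the macro compiles to a direct member access, which only works because the struct layout is public; with an opaque struct, the same operation must cross a function-call boundary so the layout can stay private to the implementation.

```c
/* Mock of a float object with a public layout. */
typedef struct { int refcnt; double fval; } float_obj;

/* Macro version: expands to a direct member access at the call site,
 * so every extension bakes in the struct layout at compile time. */
#define FLOAT_AS_DOUBLE_MACRO(op) ((op)->fval)

/* Function version: the only form that still works if float_obj
 * becomes opaque, since the member access moves behind this call. */
double float_as_double(const float_obj *op) { return op->fval; }
```

Both spellings return the same value; the trade-off is that the function keeps ABI freedom for the implementation, at the cost of a call that the compiler can no longer inline across the API boundary.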


> > Making ``PyTypeObject`` structure opaque breaks C extensions declaring
> > types statically (e.g. ``static PyTypeObject MyType = {...};``).
>
> Not necessarily. There was an unimplemented feature proposed in PEP-3121,
> the PyType_Copy() function.
>
> https://www.python.org/dev/peps/pep-3121/#specification
>
> PyTypeObject does not have to be opaque. But it also doesn't have to be the
> same thing for defining and for using types. You could still define a type
> with a PyTypeObject struct and then copy it over into a heap type or other
> internal type structure from there.

A practical issue is that many C extensions refer directly to a type
using something like "&MyType". Example in CPython:

#define PyUnicode_CheckExact(op) Py_IS_TYPE(op, &PyUnicode_Type)

If PyType_Copy(&PyUnicode_Type) is used to allocate the real unicode
type as a heap type, code using &PyUnicode_Type will fail.

See https://bugs.python.org/issue40601 "[C API] Hide static types from
the limited C API" about this issue. This issue concerns
subinterpreters: each subinterpreter should have its own copies of
types.
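The failure mode can be demonstrated with a small plain-C sketch (mock types, with a hypothetical type_copy() standing in for the proposed PyType_Copy()): once the real type is a heap copy, pointer comparisons against the static definition no longer match.

```c
#include <stdlib.h>
#include <string.h>

typedef struct { const char *name; } type_obj;
typedef struct { type_obj *type; } obj;

/* The static definition extensions take the address of ("&MyType"). */
static type_obj static_unicode_type = { "str" };

/* Hypothetical PyType_Copy(): allocate the "real" type from the
 * static definition, as PEP 3121 proposed. */
type_obj *type_copy(const type_obj *src) {
    type_obj *t = malloc(sizeof(*t));
    memcpy(t, src, sizeof(*t));
    return t;
}

/* The common CheckExact pattern compares type pointers directly. */
int check_exact(const obj *op, const type_obj *tp) {
    return op->type == tp;
}
```

Any instance created from the heap copy fails `check_exact(op, &static_unicode_type)`, even though it passes against the copy itself: the identity of the static address is lost, which is what breaks existing "&MyType" code.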


> Whether that's better than using PyType_FromSpec(), maybe not, but at least
> it doesn't mean we have to break existing code that uses static extension
> type definitions.

If we choose the PyType_Copy() way, we must stop referring to types as
"PyTypeObject*" internally, and maybe use "PyHeapTypeObject*" or
something else instead. Currently, static types and heap types are
interchangeable on purpose.


> I haven't come across a use
> case yet where I had to change a ref-count by more than 1, but allowing
> users to arbitrarily do that may require way more infrastructure under the
> hood than allowing them to create or remove a single reference to an
> object. I think explicit is really better than implicit here.

Py_SET_REFCNT() is not Py_INCREF(). It's used for special cases like
free lists, resurrecting an object, saving/restoring the reference
count during resurrection, etc.
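A minimal plain-C sketch (mock object, hypothetical names) of the kind of write Py_SET_REFCNT() exists for: restoring an arbitrary reference count around object resurrection, which Py_INCREF()-style +1/-1 operations cannot express in one step.

```c
typedef struct { long refcnt; } obj_t;

static long obj_refcnt(const obj_t *o) { return o->refcnt; }
static void obj_set_refcnt(obj_t *o, long n) { o->refcnt = n; }

/* During finalization the count has dropped to 0; the finalizer may
 * have created new references ("resurrection"), so the runtime writes
 * the count back directly rather than incrementing one by one. */
static int finalize(obj_t *o, long new_refs) {
    obj_set_refcnt(o, new_refs);  /* arbitrary write, not +1/-1 */
    return new_refs > 0;          /* nonzero: object was resurrected */
}
```

This is runtime machinery, not something ordinary extension code should need, but it has to be expressible through the public accessor once the structure is opaque.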


> The same does not seem to apply to "Py_SET_TYPE()" and "Py_SET_SIZE()",
> since any object or (applicable) container implementation would normally
> have to know its type and size, regardless of any implementation details.

Py_SET_TYPE() is needed to set tp_base on types declared statically.
"tp_base = &PyType_Type" doesn't work on Visual Studio if I recall
correctly. See for example the numpy fix:
https://github.com/numpy/numpy/commit/a96b18e3d4d11be31a321999cda4b795ea9eccaa

Py_SET_SIZE() is needed for types which inherit from PyVarObject, like
PyListObject.
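Here is a plain-C sketch (mock object headers, hypothetical names) of the setter-function pattern: with opaque structures, assigning fields like the object's type or a variable-size object's length directly is no longer possible, so runtime setters become the supported spelling, including for filling in a base type that a static initializer could not express.

```c
#include <stddef.h>

typedef struct type_s type_t;
typedef struct { int refcnt; type_t *type; } obj_t;       /* like PyObject */
typedef struct { obj_t base; ptrdiff_t size; } varobj_t;  /* like PyVarObject */
struct type_s { varobj_t base; type_t *tp_base; };

/* Setter functions in the spirit of Py_SET_TYPE() / Py_SET_SIZE():
 * the only supported way to write these fields from outside. */
static void obj_set_type(obj_t *o, type_t *t) { o->type = t; }
static void varobj_set_size(varobj_t *o, ptrdiff_t n) { o->size = n; }
static type_t *obj_get_type(obj_t *o) { return o->type; }
```

A statically declared type can then be completed at module init time: set its type (and base) through the setters instead of relying on address constants in the initializer, which mirrors the workaround applied in the numpy fix linked above.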


> > The important part is coordination and finding a balance between CPython
> > evolutions and backward compatibility. For example, breaking a random,
> > old, obscure and unmaintained C extension on PyPI is less severe than
> > breaking numpy.
>
> This sounds like a common CI testing infrastructure would help all sides.
> Currently, we have something like that mostly working by having different
> projects integrate with each other's master branch, e.g. Pandas, NumPy,
> Cython, and notifying each other of detected breakages. It's mostly every
> project setting up its own CI on travis&Co here, so a bit of duplicated
> work on all sides. Not sure if that's inherently bad, but there's
> definitely some room for generalisation and improvements.

I wrote https://github.com/vstinner/pythonci to test Cython, numpy and
a few other projects against the next Python version (master branch).

At first, I had even written a section of the PEP, "please test your
project on the next Python version", but I removed it since it doesn't
require any change in CPython itself, and we cannot require people to
do it.


> Again, thanks Victor for pushing these efforts. Even if me and others are
> giving you a hard time getting your proposals accepted, I appreciate the
> work that you put into improving the ecosystem(s).

Thanks Stefan for your very useful feedback :-) I'm sure it will help
enhance the PEP. I'm open to considering the removal of a bunch of
incompatible changes, like making the PyObject structure opaque.

If you look at my PyObject (https://bugs.python.org/issue39573) and
PyTypeObject (https://bugs.python.org/issue40170) issues, the changes
that I already pushed mostly abstract access to these structures.

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QJWZYS6G43BQFML4IMQIWBICOWXU3C44/
Code of Conduct: http://python.org/psf/codeofconduct/
