On Tue, Jun 23, 2020 at 16:56, Stefan Behnel <stefan...@behnel.de> wrote:

> > Adding a new member breaks the stable ABI (PEP 384), especially for
> > types declared statically (e.g. ``static PyTypeObject MyType =
> > {...};``). In Python 3.4, the PEP 442 "Safe object finalization" added
> > the ``tp_finalize`` member at the end of the ``PyTypeObject`` structure.
> > For ABI backward compatibility, a new ``Py_TPFLAGS_HAVE_FINALIZE`` type
> > flag was required to announce if the type structure contains the
> > ``tp_finalize`` member. The flag was removed in Python 3.8 (`bpo-32388
> > <https://bugs.python.org/issue32388>`_).
>
> Probably not the best example. I think this is pretty much normal API
> evolution. Changing the deallocation protocol for objects is going to
> impact any public API in one way or another. PyTypeObject is also not
> exposed with its struct fields in the limited API, so your point regarding
> "tp_print" is also not a strong one.
The PEP 442 doesn't break backward compatibility: C extensions using
tp_dealloc continue to work. But adding a new member to PyTypeObject caused
practical implementation issues.

I'm not sure why you are mentioning the limited C API. Most C extensions
don't use it and declare their types as static types.

I'm not trying to describe the Py_TPFLAGS_HAVE_FINALIZE story as a major
blocker in CPython's history. It's just one of many examples of how hard it
is to evolve CPython internals.

> > Same CPython design since 1990: structures and reference counting
> > -----------------------------------------------------------------
> >
> > Members of ``PyObject`` and ``PyTupleObject`` structures have not
> > changed since the "Initial revision" commit (1990)
>
> While I see an advantage in hiding the details of PyObject (specifically
> memory management internals), I would argue that there simply isn't much to
> improve in PyTupleObject, so these two don't fly at the same level for me.

There are different reasons to make PyTupleObject opaque:

* Prevent access to members of its "PyObject ob_base" member (disallow
  accessing "tuple->ob_base.ob_refcnt" directly).

* Prevent C extensions from making assumptions about how a Python
  implementation stores a tuple. Currently, C extensions are designed to
  get the best performance with CPython, but that makes them run slower on
  PyPy.

* Make it possible to experiment with a more efficient PyTupleObject
  layout, in terms of memory footprint or runtime performance, depending on
  the use case. For example, storing numbers directly as numbers rather
  than as PyObject pointers. Or maybe use a different layout to make
  PyList_AsTuple() an O(1) operation. I had a similar idea about converting
  a bytearray into a bytes object without having to copy memory; that also
  requires modifying PyBytesObject to experiment with such an idea. An
  array of PyObject* is not necessarily the most efficient storage for all
  use cases.
> My feeling is that PyPy specifically is better served with the HPy API,
> which is different enough to consider it a mostly separate API, or an
> evolution of the limited API, if you want. Suggesting that extension
> authors support two different APIs is much, but forcing them to support the
> existing CPython C-API (for legacy reasons) and the changed CPython C-API
> (for future compatibility), and then asking them to support a separate
> C-API in addition (for platform independence, with performance penalties)
> seems stretching it a lot.

The PEP 620 changes the C API to make it converge towards the limited C
API, but it also prepares C extensions to ease their migration to HPy. For
example, by design, HPy doesn't give direct access to the
PyTupleObject.ob_item member. Enforcing usage of the PyTuple_GetItem()
function or the PyTuple_GET_ITEM() macro should ease migration to
HPy_GetItem_i().

I disagree that extension authors have to support two C APIs. Many of the
PEP 620 incompatible C API changes are already completed, and I was
surprised by the very low number of extensions affected by these changes.
In practice, most extensions use simple and regular C code; they don't
"abuse" the C API. Cython itself is affected by most changes, since Cython
basically uses all C API features :-) But in practice, only a minority of
extensions written with Cython are affected, since they (indirectly, via
Cython) only use a subset of the C API.

Also, once an extension is updated for incompatible changes, it remains
compatible with old Python versions. When a new function is used,
pythoncapi_compat.h can be used to support old Python versions. It is not
like code has to be duplicated to support two unrelated APIs.

> > * (**Completed**) Add new functions ``Py_SET_TYPE()``,
> >   ``Py_SET_REFCNT()`` and ``Py_SET_SIZE()``. The ``Py_TYPE()``,
> >   ``Py_REFCNT()`` and ``Py_SIZE()`` macros become functions which
> >   cannot be used as l-value.
> > * (**Completed**) New C API functions must not return borrowed
> >   references.
> > * (**In Progress**) Provide ``pythoncapi_compat.h`` header file.
> > * (**In Progress**) Make structures opaque, add getter and setter
> >   functions.
> > * (**Not Started**) Deprecate ``PySequence_Fast_ITEMS()``.
> > * (**Not Started**) Convert ``PyTuple_GET_ITEM()`` and
> >   ``PyList_GET_ITEM()`` macros to static inline functions.
>
> Most of these have the potential to break code, sometimes needlessly,
> AFAICT.

The Py_SET_xxx() functions are designed to allow experimenting with tagged
pointers in CPython. Do you mean that tagged pointers are not worth
experimenting with? Neil's early proof-of-concept was promising:
https://mail.python.org/archives/list/capi-...@python.org/thread/EGAY55ZWMF2WSEMP7VAZSFZCZ4VARU7L/#EGAY55ZWMF2WSEMP7VAZSFZCZ4VARU7L

PyPy decided to abandon tagged pointers, since they weren't really worth it
in PyPy. But PyPy and CPython have very different designs; IMO the
performance gain would be more interesting in CPython than in PyPy.

> Especially the efforts to block away the internal data structures
> annoy me. It's obviously ok if we don't require other implementations to
> provide this access, but CPython has these data structures and I think it
> should continue to expose them.

CPython continues to expose structures in its internal C API.

> If we remove CPython specific features from the (de-facto) "official public
> Python C-API", then I think there should be a "public CPython 3.X C-API"
> that actively exposes the data structures natively, not just an "internal"
> one. That way, extension authors can take the usual decision between
> performance, maintenance effort and platform independence.

I would like to promote "portable" C code, rather than promote writing
CPython specific code. I mean that the "default" should be the portable
API, and writing CPython specific code would be a deliberate opt-in choice.
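To illustrate why the accessor functions matter for such experiments: with
tagged pointers, a "PyObject *" value may not point at a real structure at
all, so every field access must go through code that can check the tag
first. A toy encoding, entirely my own sketch and not CPython or PyPy code:

```c
#include <stdint.h>

/* Toy tagged-pointer encoding (not CPython code): real object pointers
   are at least 2-byte aligned, so their low bit is always 0; small
   integers are stored shifted left with the low bit set to 1. Any code
   that dereferenced ob_refcnt or ob_type directly would crash on tagged
   values, which is why accessor functions must mediate every access. */
typedef uintptr_t handle;

static handle
tag_int(intptr_t value)
{
    return ((uintptr_t)value << 1) | 1u;
}

static int
handle_is_tagged_int(handle h)
{
    return (h & 1u) != 0;
}

static intptr_t
untag_int(handle h)
{
    /* Right shift of a negative value is implementation-defined in
       strict C, but behaves as an arithmetic shift on the platforms
       CPython supports. */
    return (intptr_t)h >> 1;
}
```

Under such a scheme, Py_REFCNT() and Py_SET_REFCNT() can branch on the tag
bit internally, while code writing "ob->ob_refcnt" directly cannot.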
> >    typedef struct {
> >        PyObject ob_base;
> >        double ob_fval;
> >    } PyFloatObject;
>
> Please keep PyFloat_AS_DOUBLE() and friends do what they currently do.

If PyFloatObject becomes opaque, the PyFloat_AS_DOUBLE() macro must become
a function call.

> > Making ``PyTypeObject`` structure opaque breaks C extensions declaring
> > types statically (e.g. ``static PyTypeObject MyType = {...};``).
>
> Not necessarily. There was an unimplemented feature proposed in PEP-3121,
> the PyType_Copy() function.
>
> https://www.python.org/dev/peps/pep-3121/#specification
>
> PyTypeObject does not have to be opaque. But it also doesn't have to be the
> same thing for defining and for using types. You could still define a type
> with a PyTypeObject struct and then copy it over into a heap type or other
> internal type structure from there.

A practical issue is that many C extensions refer directly to a type using
something like "&MyType". Example in CPython:

    #define PyUnicode_CheckExact(op) Py_IS_TYPE(op, &PyUnicode_Type)

If PyType_Copy(&PyUnicode_Type) is used to allocate the real unicode type
as a heap type, code using &PyUnicode_Type will fail. See
https://bugs.python.org/issue40601 "[C API] Hide static types from the
limited C API" about this issue. This issue concerns subinterpreters: each
subinterpreter should have its own copy of each type.

> Whether that's better than using PyType_FromSpec(), maybe not, but at least
> it doesn't mean we have to break existing code that uses static extension
> type definitions.

If we choose the PyType_Copy() way, we must stop referring to types as
"PyTypeObject*" internally, and maybe use "PyHeapTypeObject*" or something
else. Currently, static types and heap types are interchangeable on
purpose.
> I haven't come across a use
> case yet where I had to change a ref-count by more than 1, but allowing
> users to arbitrarily do that may require way more infrastructure under the
> hood than allowing them to create or remove a single reference to an
> object. I think explicit is really better than implicit here.

Py_SET_REFCNT() is not Py_INCREF(). It's used for special cases like free
lists, resurrecting an object, or saving/restoring the reference count
during resurrection.

> The same does not seem to apply to "Py_SET_TYPE()" and "Py_SET_SIZE()",
> since any object or (applicable) container implementation would normally
> have to know its type and size, regardless of any implementation details.

Py_SET_TYPE() is needed to set the type (ob_type) of types declared
statically: "ob_type = &PyType_Type" doesn't work in a static initializer
on Visual Studio, if I recall correctly. See for example the numpy fix:
https://github.com/numpy/numpy/commit/a96b18e3d4d11be31a321999cda4b795ea9eccaa

Py_SET_SIZE() is needed for types which inherit from PyVarObject, like
PyListObject.

> > The important part is coordination and finding a balance between CPython
> > evolutions and backward compatibility. For example, breaking a random,
> > old, obscure and unmaintained C extension on PyPI is less severe than
> > breaking numpy.
>
> This sounds like a common CI testing infrastructure would help all sides.
> Currently, we have something like that mostly working by having different
> projects integrate with each other's master branch, e.g. Pandas, NumPy,
> Cython, and notifying each other of detected breakages. It's mostly every
> project setting up its own CI on travis&Co here, so a bit of duplicated
> work on all sides. Not sure if that's inherently bad, but there's
> definitely some room for generalisation and improvements.

I wrote https://github.com/vstinner/pythonci to test cython, numpy and a
few other projects on the next Python version (master branch).
I had even written a section of the PEP, "please test your project on the
next Python version", but I removed it since it doesn't require any change
in CPython itself, and we cannot require people to do it.

> Again, thanks Victor for pushing these efforts. Even if me and others are
> giving you a hard time getting your proposals accepted, I appreciate the
> work that you put into improving the ecosystem(s).

Thanks Stefan for your very useful feedback :-) I'm sure that it will help
to enhance the PEP. I'm open to considering the removal of a bunch of the
incompatible changes, like making the PyObject structure opaque. If you
look at my PyObject (https://bugs.python.org/issue39573) and PyTypeObject
(https://bugs.python.org/issue40170) issues, the changes that I have
already pushed are mostly changes to abstract access to these structures.

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QJWZYS6G43BQFML4IMQIWBICOWXU3C44/
Code of Conduct: http://python.org/psf/codeofconduct/