Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
>I'm repeating myself a bit, but my previous thread of this ended up
>being about something else, and also since then I've been on an
>expedition to the hostile waters of python-dev.
>
>I'm crazy enough to believe that I'm proposing a technical solution to
>alleviate the problems we've faced as a community the past year. No,
>this will NOT be about NA, and certainly not governance, but do please
>allow me one paragraph of musings before the meaty stuff.
>
>I believe the Achilles heel of NumPy is the C API and the
>PyArrayObject. The reliance we all have on the NumPy C API means there
>can in practice only be one "array" type per Python process. This makes
>people *very* afraid of creative forking or new competing array
>libraries (since they just can't live in parallel -- like Cython and
>Pyrex can!), and every new feature has to go into ndarray to fully
>realise itself. This in turn means that experimentation with new
>features has to happen within one or a few release cycles, it cannot
>happen in the wild and by competition and by seeing what works over the
>course of years before finally making it into upstream. Finally, if any
>new great idea can really only be implemented decently if it also
>impacts thousands of users...that's bad both for morale and developer
>recruitment.
>
>The meat:
>
>There's already of course been work on making the NumPy C API work
>through an indirection layer to make a more stable ABI. This is about
>changing the ideas of how that indirection should happen, so that you
>could in theory implement the C API independently of NumPy.
>
>You could for instance make a "mini-NumPy" that only contains the bare
>essentials, and load that in the same process as the real NumPy, and
>use the C API against objects from both libraries.
>
>I'll assume that we can get a PEP through by waving a magic wand, since
>that makes it easier to focus on essentials.
>There are many more or less ugly hacks to make it work on any existing
>CPython [1], and they wouldn't be so ugly if there were PEP blessing
>for the general idea.
>
>Imagine if PyTypeObject grew an extra pointer "tp_customslots", which
>pointed to an array of these:
>
>typedef struct {
>    unsigned long tpe_id;
>    void *tpe_data;
>} PyTypeObjectCustomSlot;
>
>The ID space is partitioned to anyone who asks, and NumPy is given a
>large chunk. To insert a "custom slot", you stick it in this list. And
>you search it linearly for, say, PYTYPE_CUSTOM_NUMPY_SLOT (each type
>will typically have 0-3 entries so the search is very fast).
>
>I've benchmarked something very similar recently, and the overhead in a
>"hot" situation is on the order of 4-6 cycles. (As for cache, you can
>at least stick the slot array right next to the type object in memory.)
>
>Now, a NumPy array would populate this list with 1-2 entries pointing
>to tables of function pointers for the NumPy C API. This lookup through
>the PyTypeObject would in part replace the current import_array()
>mechanism.
>
>I'd actually propose two such custom slots for ndarray for starters:
>
>a) One PEP 3118-like binary description that exposes raw data pointers
>(without the PEP 3118 red tape)

To be clearer: the custom slot in the PyTypeObject would contain an
offset that you could add to your PyObject* to get to this information.

Dag

>b) A function pointer table for a suitable subset of the NumPy C API
>(obviously not array construction and so on)
>
>The all-important PyArray_DATA/DIMS/... would be macros that try for a)
>first, but fall back to b). Things like PyArray_Check would actually
>check for support of these slots, "duck typing", rather than the Python
>type (of course, this could only be done at a major revision like NumPy
>2.0 or 3.0).
>
>The overhead should be on the order of 5 cycles per C API call.
>That should be fine for anything but the use of PyArray_DATA inside a
>tight loop (which is a bad idea anyway).
>
>For now I just want to establish if there's support for this general
>idea, and see if I can get some weight behind a PEP (and ideally a
>co-author), which would make this a general approach and something more
>than an ugly NumPy-specific hack. We'd also have good use for such a
>PEP in Cython (and, I believe, Numba/SciPy in CEP 1000).
>
>Dag
>
>[1] There are many ways of doing similar things in current Python, such
>as standardising across many participating projects on using a common
>metaclass. Here's another alternative that doesn't add such
>inter-project dependencies but is more renegade:
>http://wiki.cython.org/enhancements/cep1001
>_______________________________________________
>NumPy-Discussion mailing list
>NumPy-Discussion@scipy.org
>http://mail.scipy.org/mailman/listinfo/numpy-discussion
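
To make the lookup described in the quoted proposal a bit more concrete,
here is a minimal C sketch. It is only an illustration under stated
assumptions, not NumPy or CPython code: MockTypeObject and MockObject
stand in for PyTypeObject and PyObject so that it compiles without
Python.h, the slot IDs are made up, the zero-ID terminator for the slot
list is my assumption (the proposal does not say how the list ends), and
RawArrayInfo, RawInfoSlot and ArrayFuncTable are hypothetical payloads
for slots a) and b). array_data() plays the role of a PyArray_DATA that
tries a) first and falls back to b); array_check() is the "duck typing"
version of PyArray_Check.

```c
#include <assert.h>
#include <stddef.h>

/* Slot entry, as given in the proposal. */
typedef struct {
    unsigned long tpe_id;
    void *tpe_data;
} PyTypeObjectCustomSlot;

/* Hypothetical stand-ins for PyTypeObject / PyObject. */
typedef struct {
    const char *tp_name;
    PyTypeObjectCustomSlot *tp_customslots; /* assumed: ends with tpe_id == 0 */
} MockTypeObject;

typedef struct {
    MockTypeObject *ob_type;
} MockObject;

/* Made-up slot IDs; the proposal only says NumPy would get a range. */
#define SLOT_END        0UL
#define SLOT_RAW_INFO   0x4e500001UL  /* slot a): raw data description */
#define SLOT_FUNC_TABLE 0x4e500002UL  /* slot b): function pointer table */

/* Linear search over the (short) slot list; NULL if the ID is absent. */
static void *find_custom_slot(const MockTypeObject *tp, unsigned long id)
{
    assert(tp != NULL);
    if (tp->tp_customslots == NULL)
        return NULL;
    for (const PyTypeObjectCustomSlot *s = tp->tp_customslots;
         s->tpe_id != SLOT_END; ++s) {
        if (s->tpe_id == id)
            return s->tpe_data;
    }
    return NULL;
}

/* Hypothetical payloads for the two slots. */
typedef struct { void *data; int ndim; } RawArrayInfo;
typedef struct { size_t info_offset; } RawInfoSlot; /* offset from object start */
typedef struct { void *(*get_data)(MockObject *); } ArrayFuncTable;

/* Duck-typed check: an object counts as an array if its type has either slot. */
static int array_check(MockObject *obj)
{
    return find_custom_slot(obj->ob_type, SLOT_RAW_INFO) != NULL
        || find_custom_slot(obj->ob_type, SLOT_FUNC_TABLE) != NULL;
}

/* Try slot a) (offset into the object) first, then fall back to slot b). */
static void *array_data(MockObject *obj)
{
    RawInfoSlot *raw = find_custom_slot(obj->ob_type, SLOT_RAW_INFO);
    if (raw != NULL) {
        RawArrayInfo *info = (RawArrayInfo *)((char *)obj + raw->info_offset);
        return info->data;
    }
    ArrayFuncTable *ft = find_custom_slot(obj->ob_type, SLOT_FUNC_TABLE);
    return ft != NULL ? ft->get_data(obj) : NULL;
}

/* Example type 1: embeds RawArrayInfo and exposes slot a). */
typedef struct { MockObject base; RawArrayInfo info; } FastArray;
static RawInfoSlot fast_raw = { offsetof(FastArray, info) };
static PyTypeObjectCustomSlot fast_slots[] = {
    { SLOT_RAW_INFO, &fast_raw }, { SLOT_END, NULL }
};
static MockTypeObject FastArray_Type = { "FastArray", fast_slots };

/* Example type 2: keeps its layout private and exposes only slot b). */
typedef struct { MockObject base; void *hidden; } SlowArray;
static void *slow_get_data(MockObject *obj) { return ((SlowArray *)obj)->hidden; }
static ArrayFuncTable slow_funcs = { slow_get_data };
static PyTypeObjectCustomSlot slow_slots[] = {
    { SLOT_FUNC_TABLE, &slow_funcs }, { SLOT_END, NULL }
};
static MockTypeObject SlowArray_Type = { "SlowArray", slow_slots };

/* Example type 3: no custom slots at all, so it is not an array. */
static MockTypeObject Plain_Type = { "Plain", NULL };
```

With these definitions, a FastArray and a SlowArray both pass
array_check() and hand back their buffer through array_data(), even
though neither shares a base type with the other, while an object of
Plain_Type fails the check. Two independent libraries could each define
such a type in the same process, which is the point of the proposal.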