On Tue, Jun 23, 2020 at 7:52 AM Stefan Behnel <stefan...@behnel.de> wrote:

> I agree that this is more explicit when it comes to resource management,
> but there is nothing that beats direct native data structure access when it
> comes to speed.


From the perspective of the function that wants to get access to the
contents of an object, direct access will be the fastest.  However, more
globally, from the perspective of the runtime as a whole, supporting direct
access in all situations typically makes other things slower.  So, unless
there is hot code doing a lot of direct structure access, there can be a
big net loss.

The JNI has a good compromise, at least for arrays of primitive types such as
the various float and integer types.  When requesting the contents of an
array, the runtime might be able to give you a direct pointer, but, if not,
what you get back might be a copy of the contents of the array.  To tell you
which happened, the API reports through an out parameter (isCopy) whether the
pointer refers to a copy.

https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#Get_PrimitiveType_ArrayElements_routines
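
To make the shape of that contract concrete, here is a minimal sketch in plain
C.  It is not the real JNI: toy_array, toy_get_elements, toy_release_elements
and the TOY_* modes are hypothetical names, loosely modeled on the
Get<PrimitiveType>ArrayElements / Release<PrimitiveType>ArrayElements pair.
The "pinnable" flag stands in for whatever the runtime decides internally.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy runtime object, NOT the real JNI. */
typedef struct {
    int   *data;      /* storage owned by the "runtime" */
    size_t len;
    int    pinnable;  /* nonzero: a direct pointer can be handed out */
} toy_array;

/* Release modes, loosely modeled on JNI's 0 and JNI_ABORT. */
enum { TOY_COPY_BACK_AND_FREE = 0, TOY_ABORT = 1 };

/* Returns the elements; *is_copy says whether the caller got a copy. */
int *toy_get_elements(toy_array *arr, int *is_copy)
{
    assert(arr != NULL);
    if (arr->pinnable) {
        if (is_copy) *is_copy = 0;
        return arr->data;                 /* direct access */
    }
    int *copy = malloc(arr->len * sizeof *copy);
    if (copy == NULL) return NULL;
    memcpy(copy, arr->data, arr->len * sizeof *copy);
    if (is_copy) *is_copy = 1;
    return copy;                          /* caller works on a copy */
}

/* Must be called exactly once for every successful toy_get_elements. */
void toy_release_elements(toy_array *arr, int *elems, int mode)
{
    if (elems == arr->data) return;       /* direct pointer: nothing to undo */
    if (mode != TOY_ABORT)                /* write changes back to the array */
        memcpy(arr->data, elems, arr->len * sizeof *elems);
    free(elems);
}

/* Caller code is the same either way; returns is_copy, or -1 on failure. */
int toy_double_all(toy_array *arr)
{
    int is_copy = 0;
    int *elems = toy_get_elements(arr, &is_copy);
    if (elems == NULL) return -1;
    for (size_t i = 0; i < arr->len; i++)
        elems[i] *= 2;
    toy_release_elements(arr, elems, TOY_COPY_BACK_AND_FREE);
    return is_copy;
}
```

The point of the sketch is that the caller never needs to know the internal
layout: it gets a pointer, is told whether it is a copy, and hands it back.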

You have to release these when you're done, kind of like what Python's
buffer protocol does today.  For small to medium-sized arrays, there is an
API for just copying things out into a user-provided buffer, which might be
faster.

https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#Get_PrimitiveType_ArrayRegion_routines
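
A rough sketch of the copy-out style, again in plain C rather than the real
JNI.  toy_src, toy_get_region and toy_sum_prefix are hypothetical names modeled
on Get<PrimitiveType>ArrayRegion; the -1 return for an out-of-range request is
my assumption for the sketch (the JNI raises an exception instead).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy runtime object, NOT the real JNI. */
typedef struct {
    const int *data;  /* storage owned by the "runtime" */
    size_t     len;
} toy_src;

/* Copies elements [start, start + len) into the caller's buffer.
   Returns 0 on success, -1 if the range is out of bounds. */
int toy_get_region(const toy_src *arr, size_t start, size_t len, int *buf)
{
    assert(arr != NULL && buf != NULL);
    if (start > arr->len || len > arr->len - start)
        return -1;
    memcpy(buf, arr->data + start, len * sizeof *buf);
    return 0;
}

/* Sums up to the first 8 elements using a stack buffer; nothing to release. */
long toy_sum_prefix(const toy_src *arr)
{
    int buf[8];
    size_t n = arr->len < 8 ? arr->len : 8;
    if (toy_get_region(arr, 0, n, buf) != 0)
        return -1;
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```

Since the caller owns the buffer, there is no release step and the runtime
never has to keep the array in place while C code is looking at it.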

The C-API is currently inconsistent in what kind of access you can get to
the contents of an object.  As I mentioned in the other thread, it would be
beneficial to alternative implementations of Python to have more uniformity
in what the C-API provides.

> If a "PyObject*[]" is not what the runtime uses internally
> as data structure, then why hand it out as an interface to users who
> require performance? There's PyIter_Next() already for those who don't.
>

For arrays of pointers to objects that may be under the management of a
moving garbage collector, what looks like direct access would actually be
emulated and dramatically slower than doing PyIter_Next.

> If the intention is to switch to a more efficient internal data structure
> inside of CPython (or expose in PyPy whatever that uses), then I would look
> more at PEP-393 for a good interface here, or "array.array". It's perfectly
> fine to have 20 different internal array types, as long as they are
> explicitly and safely exposed to users.
>

The internal data structure might be exactly the same, but, for example, a
moving GC might make efficient direct access to the data structure from C
code impossible.  If you are not dealing with reference types, I agree, we
can do much better.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FUGK3ECIXVQ5B6J6G7GUFCSHAWBBOTAI/
Code of Conduct: http://python.org/psf/codeofconduct/
