Dag Sverre Seljebotn wrote:
On 05/16/2012 02:47 PM, Mark Shannon wrote:
Stefan Behnel wrote:
Dag Sverre Seljebotn, 16.05.2012 12:48:
On 05/16/2012 11:50 AM, "Martin v. Löwis" wrote:
Agreed in general, but in this case, it's really not that easy. A C function call involves a certain overhead all by itself, so calling into the C-API multiple times may be substantially more costly than, say, calling through a function pointer once and then running over a returned C array comparing numbers. And definitely way more costly than running over an array that the type struct points to directly. We are not talking about hundreds of entries here, just a few. A linear scan in 64-bit steps over something like a hundred bytes in the L1 cache should hardly be measurable.
I give up, then. I fail to understand the problem. Apparently, you want
to do something with the value you get from this lookup operation, but
that something won't involve function calls (or else the function call
overhead for the lookup wouldn't be relevant).
In our specific case the value would be an offset added to the PyObject*, and there we would find a pointer to a C function (together with a 64-bit signature), and calling that C function (after checking the 64-bit signature) is our final objective.
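To make that concrete, here is a minimal C sketch of the mechanism being described. The entry layout, the source of the offset, and the signature value are all hypothetical, made up for illustration; the point is only that the lookup yields an offset into the object, and at that offset sits a 64-bit signature next to a C function pointer that is called only if the signature matches what the caller expects.

#include <Python.h>
#include <stdint.h>

typedef struct {
    uint64_t signature;   /* encodes the C-level argument/return types */
    void    *funcptr;     /* C entry point for that signature */
} native_call_entry;

/* Signature id this caller was compiled against (hypothetical value). */
#define SIG_DOUBLE_TO_DOUBLE 0x1122334455667788ULL

static double
call_native(PyObject *obj, Py_ssize_t offset, double arg, int *ok)
{
    /* The offset obtained from the (fast) lookup is added to the object
     * pointer; the entry found there is used only if the 64-bit signature
     * matches what this caller can generate a direct call for. */
    native_call_entry *entry =
        (native_call_entry *)((char *)obj + offset);

    if (entry->signature == SIG_DOUBLE_TO_DOUBLE) {
        typedef double (*sigfunc)(PyObject *, double);
        *ok = 1;
        return ((sigfunc)entry->funcptr)(obj, arg);
    }
    *ok = 0;   /* no match: fall back to a normal Python-level call */
    return 0.0;
}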

I think the use case hasn't been communicated all that clearly yet. Let's
give it another try.

Imagine we have two sides, one that provides a callable and the other side that wants to call it. Both sides are implemented in C, so the callee has a C signature and the caller has the arguments available as C data types. The signature may or may not match the argument types exactly (float vs. double, int vs. long, ...), because the caller and the callee know nothing about each other initially, they just happen to appear in the same program at runtime. All they know is that they could call each other through Python space, but that would require data conversion, tuple packing, calling, tuple unpacking, data unpacking, and then potentially the same thing on the way back. They want to avoid that overhead.
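To illustrate the overhead being avoided, here is a hedged contrast between the two call paths. user_func and user_func_impl are hypothetical stand-ins for the callee; the boxed path uses only existing C API calls.

#include <Python.h>

/* Calling through Python space: box the double, build an argument tuple,
 * dispatch through the callable, then unbox the result again. */
static double
call_via_python(PyObject *user_func, double x, int *err)
{
    PyObject *res = PyObject_CallFunction(user_func, "d", x);
    if (res == NULL) {
        *err = 1;
        return 0.0;
    }
    double out = PyFloat_AsDouble(res);
    Py_DECREF(res);
    *err = (out == -1.0 && PyErr_Occurred()) ? 1 : 0;
    return out;
}

/* Calling a matching C entry point directly: no boxing, no tuple. */
typedef double (*dd_func)(double);

static double
call_direct(dd_func user_func_impl, double x)
{
    return user_func_impl(x);
}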

Now, the caller needs to figure out if the callee has a compatible signature. The callee may provide more than one signature (i.e. more than one C call entry point), perhaps because it is implemented to deal with different input data types efficiently, or perhaps because it can efficiently convert them to its expected input. So, there is a signature on the caller side given by the argument types it holds, and a couple of signatures on the callee side that can accept different C data input. Then the caller needs to find out which signatures there are and match them against what it can efficiently call. It may even be a JIT compiler that can generate an efficient call signature on the fly, given a suitable signature on the callee side.
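A rough C sketch of that matching step; the table layout and the signature encoding are assumptions for illustration, not an agreed format. The callee exports a small array of (signature, function pointer) entries, and the caller scans it for any signature it knows how to call.

#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint64_t signature;   /* e.g. a hash or packed encoding of "double (double)" */
    void    *funcptr;
} sig_entry;

/* Scan the callee's exported table for any of the signatures this caller
 * can generate a fast call for; return the matching entry or NULL. */
static const sig_entry *
find_matching_signature(const sig_entry *table, size_t n,
                        const uint64_t *wanted, size_t n_wanted)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n_wanted; j++) {
            if (table[i].signature == wanted[j])
                return &table[i];
        }
    }
    return NULL;   /* no match: fall back to a normal Python-level call */
}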


An example for this is an algorithm that evaluates a user provided function on a large NumPy array. The caller knows what array type it is operating on, and the user provided function may be designed to efficiently operate on arrays of int, float and double entries.

Given that use case, can I suggest the following:

Separate the discovery of the function from its use. By this I mean: first look up the function (outside of the loop), then use the function (inside the loop).
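In C, that suggestion might look roughly like the following sketch, where get_native_entry() is a hypothetical discovery helper rather than an existing API: the C entry point is resolved once, then called repeatedly with no further lookups.

#include <Python.h>

/* Hypothetical discovery helper: returns the callee's double->double C
 * entry point, or NULL if it does not provide one (not an existing API). */
typedef double (*dd_func)(double);
extern dd_func get_native_entry(PyObject *callable);

static void
apply_to_array(PyObject *callable, double *data, Py_ssize_t n)
{
    /* Discovery: once, outside the loop. */
    dd_func f = get_native_entry(callable);
    if (f != NULL) {
        /* Use: inside the loop, with no further lookups. */
        for (Py_ssize_t i = 0; i < n; i++)
            data[i] = f(data[i]);
    }
    /* else: fall back to calling through Python space per element */
}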

We would obviously do that when we can. But Cython is a compiler/code translator, and we don't control use cases. You can easily make up use cases (= Cython code people write) where you can't easily separate the two.

For instance, the Sage project has hundreds of thousands of lines of object-oriented Cython code (NOT just array-oriented, but also graphs and trees and stuff), which is all based on Cython's own fast vtable dispatches a la C++. They might want to clean up their code and use more generic callback objects in some places.

Other users currently pass around C pointers for callback functions, and we'd like to tell them "pass around these nicer Python callables instead, honestly, the penalty is only 2 ns per call". (*Regardless* of how they use them, not only when they use them in a loop where we can statically pull out the function pointer acquisition. Saying "this is only non-sluggish if you do x, y, z" puts users off.)

Why not pass around a PyCFunction object instead of a C function pointer? It contains two fields: the function pointer and the object (self), which is exactly what you want.

Of course, the PyCFunction object only allows a limited range of function types, which is why I am suggesting a variant which supports a wider range of C function pointer types.
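For reference, a hedged sketch of what the existing PyCFunction machinery already exposes; PyCFunction_Check, PyCFunction_GET_FLAGS, PyCFunction_GET_FUNCTION and PyCFunction_GET_SELF are existing C API macros. Note that the extracted pointer still has the generic boxed signature, which is exactly the limitation the proposed wider-signature variant would lift.

#include <Python.h>

static PyObject *
call_pycfunction_directly(PyObject *func, PyObject *args)
{
    if (PyCFunction_Check(func) &&
        PyCFunction_GET_FLAGS(func) == METH_VARARGS)
    {
        /* The two fields mentioned above: the C function pointer and self. */
        PyCFunction meth = PyCFunction_GET_FUNCTION(func);
        PyObject   *self = PyCFunction_GET_SELF(func);
        /* The pointer still has the generic
         * PyObject *(*)(PyObject *, PyObject *) signature, so the arguments
         * are still boxed in a tuple. */
        return meth(self, args);
    }
    /* Not a plain METH_VARARGS PyCFunction: use the normal call protocol. */
    return PyObject_CallObject(func, args);
}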

Is a single extra indirection in obj->func() rather than func() really that inefficient?
If you are passing around raw pointers, you have already given up on
dynamic type checking.


I'm not asking you to consider the details of all that. Just to allow some kind of high-performance extensibility of PyTypeObject, so that we can *stop* bothering python-dev with specific requirements from our parallel universe of nearly-all-Cython-and-Fortran-and-C++ codebases :-)

If I read it correctly, you have two problems you wish to solve:
1. A fast callable that can be passed around (see above)
2. Fast access to that callable from a type.

The solution to 2 is the _PyType_Lookup() function. By the time you have fixed your proposed solution to properly handle subclassing, I doubt it will be any quicker than _PyType_Lookup().
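A short sketch of that second point, assuming a hypothetical method name "evaluate". _PyType_Lookup() walks the type's MRO (so subclassing is handled) and returns a borrowed reference, or NULL without setting an exception; resolve once per type and reuse for every instance.

#include <Python.h>

static PyObject *
lookup_evaluate(PyObject *obj)
{
    PyObject *name = PyUnicode_InternFromString("evaluate");
    if (name == NULL)
        return NULL;

    /* Borrowed reference (or NULL, no exception set). */
    PyObject *meth = _PyType_Lookup(Py_TYPE(obj), name);
    Py_DECREF(name);
    Py_XINCREF(meth);   /* own the reference before handing it out */
    return meth;
}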

Cheers,
Mark.
