On 05/16/2012 10:24 PM, Robert Bradshaw wrote:
On Wed, May 16, 2012 at 11:33 AM, "Martin v. Löwis"<mar...@v.loewis.de>  wrote:
Does this use case make sense to everyone?

The reason why we are discussing this on python-dev is that we are looking
for a general way to expose these C level signatures within the Python
ecosystem. And Dag's idea was to expose them as part of the type object,
basically as an addition to the current Python level tp_call() slot.

The use case makes sense, yet there is also a long-standing solution already
to expose APIs and function pointers: the capsule objects.

If you want to avoid dictionary lookups on the server side, implement
tp_getattro, comparing addresses of interned strings.

Yes, that's an idea worth looking at. Implementing tp_getattro to avoid
dictionary lookup overhead is a good point, and worth trying at least.
One drawback is that this approach does require the GIL (as does
_PyType_Lookup).
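For what it's worth, the interned-string trick can be sketched in pure Python, with identity comparison standing in for the C-level address comparison (the attribute name and the signature string below are made up for illustration):

```python
import sys

# Hypothetical slot name; in C this would be an interned PyObject*
# whose address tp_getattro compares directly.
FASTCALL_ATTR = sys.intern("__fastcall_signature__")

def getattro(name):
    # sys.intern() guarantees one canonical object per string value,
    # so "is" (pointer equality at the C level) replaces hashing
    # and probing in a dict.
    if sys.intern(name) is FASTCALL_ATTR:
        return "d->d"    # made-up signature encoding: double -> double
    raise AttributeError(name)

print(getattro("__fastcall_signature__"))  # d->d
```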

Regarding the C function being so fast that the dictionary lookup
dominates (or at least accounts for a significant share of the call
time): yes, this happens all the time. For example, one might be solving
differential equations where the "user input" is essentially a set of
(usually simple) functions of the form double f(double), together with
their derivatives.

To underline how this is performance critical to us, perhaps a full Cython example is useful.

The following Cython code is a real-world use case. It is simplified a little, but not contrived in the essentials: for instance, undergraduate engineering students might pick up Cython just to play with simple scalar functions like this.

from numpy import sin
# assume sin is a Python callable and that NumPy decides to support
# our spec to also support getting a "double (*sinfuncptr)(double)".

# Our mission: avoid making the user manually import "sin" from C,
# and instead allow just using the NumPy object while still being fast.

# define a function to integrate
cpdef double f(double x):
    return sin(x * x) # guess on signature and use "fastcall"!

# the integrator
def integrate(func, double a, double b, int n):
    cdef double s = 0
    cdef double dx = (b - a) / n
    for i in range(n):
        # This is also a fastcall, but can be cached so doesn't
        # matter...
        s += func(a + i * dx)
    return s * dx

integrate(f, 0, 1, 1000000)
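A plain-Python rendering of the same integrator (stdlib only, no Cython) shows what the loop computes; here every func(...) call pays the full Python call overhead that the proposed fastcall spec would eliminate:

```python
from math import sin

def f(x):
    return sin(x * x)

def integrate(func, a, b, n):
    # Left-endpoint Riemann sum of func over [a, b] with n steps.
    s = 0.0
    dx = (b - a) / n
    for i in range(n):
        s += func(a + i * dx)
    return s * dx

result = integrate(f, 0.0, 1.0, 1_000_000)
print(result)  # ~0.3102683, the Fresnel-type integral of sin(x^2) on [0, 1]
```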

There are two problems here:

- The "sin" global can be reassigned (monkey-patched) between any two calls to "f", with no way for "f" to know; even "sin" itself could do the reassignment. So any caching of the function pointer would need a guard that checks for reassignment...

- The fastcall inside "f" is separated from the loop in "integrate". And since "f" is often in another module, we can't rely on full static whole-program analysis.

These monkey-patching problems disappear if the lookup itself is cheap enough to be negligible.
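To make the first problem concrete, here is a tiny pure-Python demonstration (using math.sin and math.cos purely for illustration) of a global being rebound between calls:

```python
import math

sin = math.sin

def f(x):
    return sin(x * x)   # the global "sin" is looked up on every call

a = f(2.0)              # uses math.sin
sin = math.cos          # monkey-patch the global between calls
b = f(2.0)              # silently uses math.cos now; f had no way to know

print(a == math.sin(4.0), b == math.cos(4.0))  # True True
```

Any scheme that caches the underlying C function pointer therefore needs a guard against exactly this rebinding, unless the lookup is cheap enough to simply repeat every call.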

Some rough numbers:

- With the tp_flags hack, the dispatch overhead is about 2 ns (a metaclass gives something similar; the problem there is more how to synchronize that metaclass across multiple third-party libraries).

- A dict lookup is about 20 ns.

- The sin function itself is about 35 ns, and "f" is probably only 2-3 ns; there could very easily be multiple such functions, defined in different modules and chained together to build up a formula.
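Plugging those estimates together (all figures taken from the rough numbers above) shows why the dispatch mechanism matters:

```python
# All figures are the rough per-call estimates quoted above, in nanoseconds.
fastcall_ns = 2     # tp_flags-style pointer fetch
dict_ns = 20        # per-call dict lookup for the signature
sin_ns = 35         # the actual sin() work

print(f"dict lookup adds {dict_ns / sin_ns:.0%} on top of sin()")      # ~57%
print(f"fastcall adds {fastcall_ns / sin_ns:.0%} on top of sin()")     # ~6%
```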

Dag
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev