On 05/17/2012 08:13 PM, Dag Sverre Seljebotn wrote:
Mark Shannon <m...@hotpy.org> wrote:
Dag Sverre Seljebotn wrote:
from numpy import sin
# assume sin is a Python callable and that NumPy decides to support
# our spec to also support getting a "double (*sinfuncptr)(double)".

# Our mission: Avoid making the user manually import "sin" from C,
# but allow just using the NumPy object and still be fast.

# define a function to integrate
cpdef double f(double x):
    return sin(x * x) # guess on signature and use "fastcall"!

# the integrator
def integrate(func, double a, double b, int n):
    cdef double s = 0
    cdef double dx = (b - a) / n
    for i in range(n):
        # This is also a fastcall, but can be cached so doesn't
        # matter...
        s += func(a + i * dx)
    return s * dx

integrate(f, 0, 1, 1000000)

There are two problems here:

- The "sin" global can be reassigned (monkey-patched) between calls
  to "f", with no way for "f" to know. Even "sin" itself could do the
  reassignment. So you'd need to check for reassignment to do caching...

Since Cython allows static typing why not just declare that func can
treat sin as if it can't be monkeypatched?

If you want to manually declare stuff, you can always use a C function
pointer too...

Moving the load of a global variable out of the loop does seem to be a
rather obvious optimisation, if it were declared to be legal.

In case you didn't notice, there were no global variable loads inside
the loop...

You can keep chasing such optimisations, but there are *always* cases
where they don't apply (and you need to save the situation by manual
typing).

Anyway: We should really discuss Cython on the Cython list. If my
motivating example wasn't good enough for you there's really nothing I
can do.

Some rough numbers:

- The tp_flags hack has a 2 ns overhead (a metaclass costs something
  similar; the problem there is more how to synchronize that metaclass
  across multiple 3rd party libraries)

Does your approach handle subtyping properly?

Not really.


- Dict lookup 20 ns

Did you time _PyType_Lookup() ?

No, didn't get around to it yet (and thanks for pointing it out).
(Though the GIL requirement is an issue too for Cython.)

- The sin function is about 35 ns. And, "f" is probably only 2-3 ns,
  and there could very easily be multiple such functions, defined in
  different modules, in a chain, in order to build up a formula.
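
For anyone wanting to get a feel for numbers of this kind, here is a rough sketch using the standard "timeit" module. It measures from the Python level, so interpreter overhead is included and the results are not comparable to the C-level figures above; "per_call_ns" is a hypothetical helper name.

```python
import timeit

def per_call_ns(stmt, setup, number=200_000, repeat=5):
    """Best-of-`repeat` time per execution of `stmt`, in nanoseconds."""
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    return best / number * 1e9

# A dict lookup and a call to math.sin, each timed in isolation.
dict_ns = per_call_ns("d['sin']", "d = {'sin': 1}")
sin_ns = per_call_ns("sin(0.5)", "from math import sin")

print(f"dict lookup: {dict_ns:.1f} ns, sin call: {sin_ns:.1f} ns")
```

Taking the minimum over several repeats reduces noise from other processes; the absolute values will vary by machine and Python version.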


Such micro timings are meaningless, because the working set often tends
to fit in the hardware cache. A level 2 cache miss can take hundreds of
cycles.

I'm sorry if my rant wasn't clear: Such micro-benchmarks do in fact
mimic very closely what you'd do if you were to, say, integrate an
ordinary differential equation. You *do* have a tight loop like that,
just hammering on floating point numbers. Making that specific use case
more convenient was actually what spawned this discussion on the NumPy
list over a month ago...

Dag


I find this sort of response arrogant -- do you know the details of
every use case for a programming language under the sun?

Many Cython users are scientists. And in scientific computing in
particular you *really* have the whole range of problems and working
sets. Honestly. In some codes you only really care about the speed of
the disk controller. In other cases you can spend *many seconds* working
almost only in L1 or perhaps L2 cache (for instance when integrating
ordinary differential equations in a few variables, which is not
entirely different in nature from the example I posted). (Then, those
many seconds are replicated many million times for different parameters
on a large cluster, and a 2x speedup translates directly into large
amounts of saved money.)
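
To make the shape of such a loop concrete, here is a minimal forward Euler sketch in plain Python. It is an illustration only, not code from any real project; "euler" and "rhs" are hypothetical names. The whole state is a couple of floats, so the working set stays in cache and the per-step callback overhead dominates.

```python
import math

def euler(rhs, y0, t0, t1, n):
    """Integrate dy/dt = rhs(t, y) from t0 to t1 with n forward Euler steps."""
    dt = (t1 - t0) / n
    t, y = t0, y0
    for i in range(n):
        y += dt * rhs(t, y)      # one callback per step
        t = t0 + (i + 1) * dt
    return y

# dy/dt = -y with y(0) = 1 has the exact solution exp(-t).
approx = euler(lambda t, y: -y, 1.0, 0.0, 1.0, 100_000)
exact = math.exp(-1.0)
```

With 100,000 steps the result should agree with the exact value to several digits; the point is the loop structure, not the accuracy of the method.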

Also, with numerical codes you block up the problem so that loads to L2
are amortized over sufficient FLOPs (when you can).
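
As a sketch of what "blocking up the problem" means, here is a tiled matrix multiply in plain Python. It only shows the loop structure: in pure Python the cache benefit is swamped by interpreter overhead, and "matmul_blocked" and the block size "bs" are hypothetical names chosen for illustration.

```python
def matmul_blocked(a, b, n, bs):
    """Multiply two n x n matrices (lists of lists) in bs x bs tiles."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                # All work on this tile triple is done before moving on,
                # so each loaded tile is reused for many multiply-adds.
                for i in range(ii, min(ii + bs, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + bs, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + bs, n)):
                            row_c[j] += aik * row_b[j]
    return c

n = 6
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i * j + 1) for j in range(n)] for i in range(n)]

# Reference result from the straightforward triple loop.
naive = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
blocked = matmul_blocked(a, b, n, 4)
```

The block size would normally be tuned so that the tiles in flight fit in the cache level being targeted.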

Every time Cython becomes able to do stuff more easily in this domain,
people thank us that they didn't have to dig up Fortran but can stay
closer to Python.

Sorry for going off on a rant. I find that people will give well-meant
advice about performance, but that advice is just generalizing from
computer programs in entirely different domains (web apps?), and
sweeping generalizations have a way of giving the wrong answer.

Dag
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/d.s.seljebotn%40astro.uio.no

