On 04/10/2012 03:38 PM, Dag Sverre Seljebotn wrote:
> On 04/10/2012 03:29 PM, Nathaniel Smith wrote:
>> On Tue, Apr 10, 2012 at 2:15 PM, Dag Sverre Seljebotn
>> <d.s.seljeb...@astro.uio.no> wrote:
>>> On 04/10/2012 03:10 PM, Dag Sverre Seljebotn wrote:
>>>> On 04/10/2012 03:00 PM, Nathaniel Smith wrote:
>>>>> On Tue, Apr 10, 2012 at 1:39 PM, Dag Sverre Seljebotn
>>>>> <d.s.seljeb...@astro.uio.no> wrote:
>>>>>> On 04/10/2012 12:37 PM, Nathaniel Smith wrote:
>>>>>>> On Tue, Apr 10, 2012 at 1:57 AM, Travis Oliphant
>>>>>>> <tra...@continuum.io> wrote:
>>>>>>>> On Apr 9, 2012, at 7:21 PM, Nathaniel Smith wrote:
>>>>>>>>
>>>>>>>> ...isn't this an operation that will be performed once per
>>>>>>>> compiled function? Is the overhead of the easy, robust method
>>>>>>>> (calling ctypes.cast) actually measurable compared to, you know,
>>>>>>>> running an optimizing compiler?
>>>>>>>>
>>>>>>>> Yes, there can be significant overhead. The compiler is run once
>>>>>>>> and creates the function. This function is then potentially used
>>>>>>>> many, many times. Also, it is entirely conceivable that the
>>>>>>>> "build" step happens at a separate "compilation" time, and Numba
>>>>>>>> actually loads a pre-compiled version of the function from disk,
>>>>>>>> which it then uses at run-time.
>>>>>>>>
>>>>>>>> I have been playing with a version of this using scipy.integrate,
>>>>>>>> and unfortunately the overhead of ctypes.cast is rather
>>>>>>>> significant --- to the point of making the code path that uses
>>>>>>>> these function pointers useless, when without the ctypes.cast
>>>>>>>> overhead the speed-up is 3-5x.
>>>>>>>
>>>>>>> Ah, I was assuming that you'd do the cast once, outside of the
>>>>>>> inner loop (at the same time you did type-compatibility checking
>>>>>>> and so forth).
>>>>>>>
>>>>>>>> In general, I think NumPy will need its own simple
>>>>>>>> function-pointer object to use when handing raw function pointers
>>>>>>>> between Python and C. SciPy can then re-use this object, which
>>>>>>>> also has a useful C-API for things like signature checking. I
>>>>>>>> have seen that ctypes is nice but very slow and without a
>>>>>>>> compelling C-API.
>>>>>>>
>>>>>>> Sounds reasonable to me. Probably nicer than violating ctypes's
>>>>>>> abstraction boundary, and with no real downsides.
>>>>>>>
>>>>>>>> The kind of new C-level cfuncptr object I imagine has attributes:
>>>>>>>>
>>>>>>>> void *func_ptr;
>>>>>>>> char *signature;  /* something like 'dd->d' to indicate a
>>>>>>>>                      function that takes two doubles and returns
>>>>>>>>                      a double */
>>>>>>>
>>>>>>> This looks like it's setting us up for trouble later. We already
>>>>>>> have a robust mechanism for describing types -- dtypes. We should
>>>>>>> use that instead of inventing Yet Another baby type system. We'll
>>>>>>> need to convert between this representation and dtypes anyway if
>>>>>>> you want to use these pointers for ufunc loops... and if we just
>>>>>>> use dtypes from the start, we'll avoid having to break the API the
>>>>>>> first time someone wants to pass a struct or array or something.
>>>>>>
>>>>>> For some of the things we'd like to do with Cython down the line,
>>>>>> something very fast like what Travis describes is exactly what we
>>>>>> need; specifically, if you have Cython code like
>>>>>>
>>>>>> cdef double f(func):
>>>>>>     return func(3.4)
>>>>>>
>>>>>> that may NOT be called in a loop.
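[For concreteness, a minimal stand-alone sketch of the kind of cfuncptr
object and check-then-cast pattern discussed above. All names here
(cfuncptr, binary_d_func) are hypothetical illustrations, not an actual
NumPy API, and casting void* to a function pointer is not strictly ISO C
though it is the usual practice on mainstream platforms:

    #include <stdio.h>
    #include <string.h>

    typedef struct {
        void *func_ptr;        /* raw function pointer */
        const char *signature; /* e.g. "dd->d": two doubles in, one out */
    } cfuncptr;

    typedef double (*binary_d_func)(double, double);

    static double add(double a, double b) { return a + b; }

    int main(void)
    {
        cfuncptr fp = { (void *)add, "dd->d" };
        double acc = 0.0;
        int i;

        /* Check the signature and cast once, outside the inner loop --
         * the amortization Nathaniel describes above. */
        if (strcmp(fp.signature, "dd->d") != 0) {
            fprintf(stderr, "signature mismatch\n");
            return 1;
        }
        binary_d_func f = (binary_d_func)fp.func_ptr;

        /* The hot loop then pays only for the indirect call. */
        for (i = 0; i < 1000000; i++)
            acc += f((double)i, 2.0);

        printf("%g\n", acc);
        return 0;
    }

In the scipy.integrate case, the check-and-cast would presumably happen
once per integration call and be amortized over all N function samples.]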
>>>>>>
>>>>>> But I do agree that this sounds like overkill for NumPy+numba at
>>>>>> the moment; certainly for scipy.integrate, where you can amortize
>>>>>> over N function samples. But Travis perhaps has a use case I didn't
>>>>>> think of.
>>>>>
>>>>> It sounds sort of like you're disagreeing with me, but I can't tell
>>>>> about what, so maybe I was unclear :-).
>>>>>
>>>>> All I was saying was that a list of dtype objects is probably a
>>>>> better way to write down a function signature than some ad-hoc
>>>>> string language. In both cases you'd do some type-compatibility
>>>>> checking up front and then use C calling afterwards, and I don't see
>>>>> why type-checking would be faster or slower for one representation
>>>>> than the other. (Certainly one wouldn't have to support all possible
>>>>> dtypes
>>>
>>> Rereading this, perhaps this is the statement you seek: yes, doing a
>>> simple strcmp is much, much faster than jumping all around in memory
>>> to check the equality of two lists of dtypes. If it is a string less
>>> than 8 bytes in length, with the comparison string known at compile
>>> time (the Cython case), then the comparison is only a couple of CPU
>>> instructions, as you can check 64 bits at a time.
>>
>> Right, that's what I wasn't getting until you mentioned strcmp :-).
>>
>> That said, the core numpy dtypes are singletons. For this purpose the
>> signature could be stored as a C array of PyArray_Descr*, but even if
>> we store it in a Python tuple/list, we'd still end up with a
>> contiguous array of PyArray_Descr*'s. (I'm assuming that we would
>> guarantee that it was always-and-only a real PyTupleObject* here.) So
>> for the function we're talking about, the check would compile down to
>> doing the equivalent of a 3*pointersize-byte strcmp instead of a
>> 5-byte strcmp. That's admittedly worse, but I think the difference
>> between these two comparisons is unlikely to be measurable,
>> considering that they're followed immediately by a cache miss when we
>> actually jump to the function pointer.
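[To make the two checks being compared above concrete, a sketch of both
side by side. Here descr is a stand-in for PyArray_Descr so the snippet
compiles without the NumPy headers, and the helper names are made up:

    #include <string.h>

    typedef struct { int unused; } descr;

    /* Core dtypes are singletons, so dtype equality is pointer
     * equality. */
    static descr double_descr_storage;
    static descr *double_descr = &double_descr_storage;

    /* String form: with "dd->d" shorter than 8 bytes and known at
     * compile time, a compiler can reduce this strcmp to roughly one
     * 64-bit comparison. */
    static int check_string(const char *sig)
    {
        return strcmp(sig, "dd->d") == 0;
    }

    /* dtype form: a contiguous array of descr pointers (two inputs, one
     * output). Three pointer compares -- the "3*pointersize-byte strcmp"
     * mentioned above. */
    static int check_descrs(descr *sig[3])
    {
        return sig[0] == double_descr
            && sig[1] == double_descr
            && sig[2] == double_descr;
    }

    int main(void)
    {
        descr *sig[3] = { double_descr, double_descr, double_descr };
        return !(check_string("dd->d") && check_descrs(sig));
    }
]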
Actually, I think the performance hit is a problem in the Cython case.
While there's no place to explicitly pre-check the signature, it will
very often be the case that everything is in L1 cache already. Consider
f being called in a loop. (And the whole point of the exercise is to
avoid the user having to type the "func" argument.)

Dag
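[To illustrate the point, a rough sketch of what code generated for the
f example above might do when the check cannot be hoisted: func arrives
untyped, so the signature test runs on every call. All names here are
hypothetical:

    #include <string.h>

    typedef struct {
        void *func_ptr;
        const char *signature;
    } cfuncptr;

    typedef double (*unary_d_func)(double);

    /* cdef double f(func): return func(3.4) */
    static double f(cfuncptr *func)
    {
        /* No place to pre-check: this test runs on every call. With the
         * loop hot and everything in L1 cache, a short strcmp against a
         * compile-time-known string is only a couple of instructions. */
        if (strcmp(func->signature, "d->d") == 0)
            return ((unary_d_func)func->func_ptr)(3.4);

        /* Otherwise fall back to a slow generic Python-level call
         * (omitted here). */
        return 0.0;
    }

    static double square(double x) { return x * x; }

    int main(void)
    {
        cfuncptr fp = { (void *)square, "d->d" };
        double total = 0.0;
        int i;

        /* f itself called in a loop, as in the scenario above. */
        for (i = 0; i < 1000; i++)
            total += f(&fp);

        return total > 0.0 ? 0 : 1;
    }
]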