On 04/10/2012 03:29 PM, Nathaniel Smith wrote:
> On Tue, Apr 10, 2012 at 2:15 PM, Dag Sverre Seljebotn
> <d.s.seljeb...@astro.uio.no> wrote:
>> On 04/10/2012 03:10 PM, Dag Sverre Seljebotn wrote:
>>> On 04/10/2012 03:00 PM, Nathaniel Smith wrote:
>>>> On Tue, Apr 10, 2012 at 1:39 PM, Dag Sverre Seljebotn
>>>> <d.s.seljeb...@astro.uio.no> wrote:
>>>>> On 04/10/2012 12:37 PM, Nathaniel Smith wrote:
>>>>>> On Tue, Apr 10, 2012 at 1:57 AM, Travis Oliphant<tra...@continuum.io>
>>>>>> wrote:
>>>>>>> On Apr 9, 2012, at 7:21 PM, Nathaniel Smith wrote:
>>>>>>>
>>>>>>> ...isn't this an operation that will be performed once per compiled
>>>>>>> function? Is the overhead of the easy, robust method (calling
>>>>>>> ctypes.cast) actually measurable as compared to, you know, running
>>>>>>> an optimizing compiler?
>>>>>>>
>>>>>>> Yes, there can be significant overhead. The compiler is run once and
>>>>>>> creates the function. This function is then potentially used many,
>>>>>>> many times. Also, it is entirely conceivable that the "build" step
>>>>>>> happens at a separate "compilation" time, and Numba actually loads a
>>>>>>> pre-compiled version of the function from disk which it then uses at
>>>>>>> run-time.
>>>>>>>
>>>>>>> I have been playing with a version of this using scipy.integrate and
>>>>>>> unfortunately the overhead of ctypes.cast is rather significant ---
>>>>>>> to the point of making the code-path using these function pointers
>>>>>>> to be useless when without the ctypes.cast overhead the speed up is
>>>>>>> 3-5x.
>>>>>>
>>>>>> Ah, I was assuming that you'd do the cast once outside of the inner
>>>>>> loop (at the same time you did type compatibility checking and so
>>>>>> forth).
>>>>>>
>>>>>>> In general, I think NumPy will need its own simple function-pointer
>>>>>>> object to use when handing over raw-function pointers between Python
>>>>>>> and C. SciPy can then re-use this object which also has a useful
>>>>>>> C-API for things like signature checking. I have seen that ctypes is
>>>>>>> nice but very slow and without a compelling C-API.
>>>>>>
>>>>>> Sounds reasonable to me. Probably nicer than violating ctypes's
>>>>>> abstraction boundary, and with no real downsides.
>>>>>>
>>>>>>> The kind of new C-level cfuncptr object I imagine has attributes:
>>>>>>>
>>>>>>> void *func_ptr;
>>>>>>> char *signature string /* something like 'dd->d' to indicate a
>>>>>>> function that takes two doubles and returns a double */
>>>>>>
>>>>>> This looks like it's setting us up for trouble later. We already have
>>>>>> a robust mechanism for describing types -- dtypes. We should use that
>>>>>> instead of inventing Yet Another baby type system. We'll need to
>>>>>> convert between this representation and dtypes anyway if you want to
>>>>>> use these pointers for ufunc loops... and if we just use dtypes from
>>>>>> the start, we'll avoid having to break the API the first time someone
>>>>>> wants to pass a struct or array or something.
>>>>>
>>>>> For some of the things we'd like to do with Cython down the line,
>>>>> something very fast like what Travis describes is exactly what we need;
>>>>> specifically, if you have Cython code like
>>>>>
>>>>> cdef double f(func):
>>>>>     return func(3.4)
>>>>>
>>>>> that may NOT be called in a loop.
>>>>>
>>>>> But I do agree that this sounds overkill for NumPy+numba at the moment;
>>>>> certainly for scipy.integrate where you can amortize over N function
>>>>> samples. But Travis perhaps has a usecase I didn't think of.
>>>>
>>>> It sounds sort of like you're disagreeing with me but I can't tell
>>>> about what, so maybe I was unclear :-).
>>>>
>>>> All I was saying was that a list-of-dtype-objects was probably a
>>>> better way to write down a function signature than some ad-hoc string
>>>> language. In both cases you'd do some type-compatibility-checking up
>>>> front and then use C calling afterwards, and I don't see why
>>>> type-checking would be faster or slower for one representation than
>>>> the other. (Certainly one wouldn't have to support all possible dtypes
>>
>> Rereading this, perhaps this is the statement you seek: Yes, doing a
>> simple strcmp is much, much faster than jumping all around in memory to
>> check the equality of two lists of dtypes. If it is a string less than 8
>> bytes in length with the comparison string known at compile-time (the
>> Cython case) then the comparison is only a couple of CPU instructions,
>> as you can check 64 bits at the time.
>
> Right, that's what I wasn't getting until you mentioned strcmp :-).
>
> That said, the core numpy dtypes are singletons. For this purpose, the
> signature could be stored as C array of PyArray_Descr*, but even if we
> store it in a Python tuple/list, we'd still end up with a contiguous
> array of PyArray_Descr*'s. (I'm assuming that we would guarantee that
> it was always-and-only a real PyTupleObject* here.) So for the
> function we're talking about, the check would compile down to doing
> the equivalent of a 3*pointersize-byte strcmp, instead of a 5-byte
> strcmp. That's admittedly worse, but I think the difference between
> these two comparisons is unlikely to be measurable, considering that
> they're followed immediately by a cache miss when we actually jump to
> the function pointer.

Yes, for singletons you're almost as well off. But if you have a struct
argument, say

    void f(double x, struct {double a, float b} y);

then PEP 3118 gives you the string "dT{df}", whereas with NumPy dtypes
you won't have a singleton? I can agree that that is a minor issue
though (you could always *make* NumPy dtypes always be singletons).

I think the real argument is that for Cython, it just wouldn't do to
rely on NumPy dtypes (or NumPy being installed at all) for something as
basic as calling a C-level function; and strings are a simple
substitute.

And since it is a format defined in PEP 3118, NumPy should already
support these kinds of strings internally (i.e. conversion to/from
dtype).

Dag
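
P.S. To make the string idea concrete, here is a rough sketch in plain
Python with no NumPy dependency. The helper names (format_string,
signature, is_compatible) are made up for illustration and are not an
existing API. The only things taken from PEP 3118 are the type codes 'd'
(double) and 'f' (float) and the T{...} grouping for structs, so a struct
holding a double and a float becomes "T{df}". The sketch covers only the
argument list, not a return type.

```python
# Hypothetical helpers, for illustration only. From PEP 3118 we use just
# the codes 'd' (double), 'f' (float) and the T{...} struct grouping.

def format_string(arg):
    """Return a PEP 3118-style format string for one argument.

    A scalar is given as its one-character type code (a str); a struct
    is given as a list of its fields, which may themselves be structs.
    """
    if isinstance(arg, str):
        return arg
    return "T{" + "".join(format_string(field) for field in arg) + "}"


def signature(args):
    """Concatenate the per-argument format strings into one byte string."""
    return "".join(format_string(arg) for arg in args).encode("ascii")


def is_compatible(provided, expected):
    """The whole type check is a flat byte comparison (think strcmp)."""
    return provided == expected


# void f(double x, struct {double a, float b} y);
provided = signature(["d", ["d", "f"]])
expected = b"dT{df}"
print(provided, is_compatible(provided, expected))
```

The struct case costs nothing extra at check time: the signature is still
one flat byte string, so the caller never has to walk a tree of type
objects or import NumPy to decide whether the pointer can be called.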