Re: [Cython] Compile-time duck typing

Robert Bradshaw Sat, 24 May 2008 13:43:23 -0700

On May 24, 2008, at 1:03 PM, Dag Sverre Seljebotn wrote:

> I'll follow up with a sample getitem implementation, so you need not
> follow up this thread until then. But I really wanted to explain
> compile-time duck typing of return types properly (see below).
>
> Robert Bradshaw wrote:
>> On May 24, 2008, at 10:16 AM, Dag Sverre Seljebotn wrote:
>>> In order to solve these problems, one could start to create
>>> complicated solutions like allowing "self.dtype" as a return type
>>> in the method signature if an assumption is placed on self,
>>> specifying "nested types" (i.e. tuple(int, int, int), with a
>>> following combinatorial explosion in the manual overloads one needs
>>> to create), etc.
>>
>> Ah, but in this case it seems much simpler (from the user's
>> perspective) to resolve arr[1,2,3] as __getitem__(1,2,3) and let
>> function overloading handle this the normal way. BTW, in terms of
>> focus, I think handling slicing is much lower on the priority list
>> than a lot of other things (the relative gain here is much smaller).
>
> But this makes our __getitem__ different from Python's! If we do that,
> we should rather make up a wholly different, new syntax (__cgetitem__,
> __cgetslice__, and so on); but I do not like to take this direction.


We're going to have to avoid the tuple packing/unpacking somehow if
we're going for speed, so it makes sense in this case not to pack the
indices at all, rather than have special unpacking code on the other
end (and I think it looks cleaner too).
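
To make the packing cost concrete, here is a minimal sketch in plain
Python rather than Cython. The class Grid3 and the method name getitem3
are hypothetical, chosen only for illustration; they are not an existing
or proposed Cython API. The standard protocol hands arr[1, 2, 3] to
__getitem__ as one packed tuple, while the unpacked form takes three
plain integers.

```python
class Grid3:
    """Toy 3-d array over a flat list (hypothetical, for illustration only)."""

    def __init__(self, shape):
        self.shape = shape
        n0, n1, n2 = shape
        self.data = list(range(n0 * n1 * n2))

    def __getitem__(self, key):
        # Standard Python protocol: arr[1, 2, 3] arrives as ONE packed tuple,
        # which has to be unpacked again before it can be used.
        i, j, k = key
        return self.getitem3(i, j, k)

    def getitem3(self, i, j, k):
        # What an unpacked call could target: three plain integers, no tuple.
        _, n1, n2 = self.shape
        return self.data[(i * n1 + j) * n2 + k]


arr = Grid3((2, 3, 4))
packed = arr[1, 2, 3]             # builds a tuple, then unpacks it
unpacked = arr.getitem3(1, 2, 3)  # no intermediate tuple
```

Both calls reach the same element; the difference is only whether a
tuple object is created and taken apart along the way.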

> It's OK to not do any optimization for slicing, but it's very important
> that slices correctly fall back to the Python [] operator. As long as
> the Python __getitem__ interface is kept, I must fall back to the []
> operator manually (and also take tuples for n-d indices).

Yep.

> (Also I find the prospect of manually creating multiple overloads
> depending on the number of dimensions somewhat distasteful. Of course,
> I'll do it if there's not enough time, but I'd like to at least have a
> path forward that *can* lead there eventually, and *then* hack it.)

True. A generic n-ary unpacker that gets unrolled completely at  
compile time may be much more complicated to implement (and read).  
IMHO, not as important as making 1, 2, and 3-dimensional indexing as  
fast as possible (possibly falling back to a runtime loop for more).  
Not as powerful, but more realistic to actually get done (especially  
given all the other things you're planning to do).
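
A rough sketch of that trade-off, again in plain Python: hand-written
fast paths for one, two and three dimensions, and a runtime loop for
anything larger. The function name flat_offset and the strides layout
are assumptions made for the example, not anything taken from an
existing implementation.

```python
def flat_offset(index, strides):
    """Offset of an n-d index into a flat buffer, given per-axis strides."""
    n = len(index)
    # Specialised paths for the common low-dimensional cases.
    if n == 1:
        return index[0] * strides[0]
    if n == 2:
        return index[0] * strides[0] + index[1] * strides[1]
    if n == 3:
        return (index[0] * strides[0]
                + index[1] * strides[1]
                + index[2] * strides[2])
    # Generic fallback: a runtime loop for four or more dimensions.
    offset = 0
    for i, s in zip(index, strides):
        offset += i * s
    return offset


offset_3d = flat_offset((1, 2, 3), (12, 4, 1))
offset_4d = flat_offset((1, 0, 2, 3), (24, 12, 4, 1))
```

In compiled code the three fast paths would be separate overloads picked
at compile time; the sketch only shows the shape of the split.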

>>> (In order for this to work there's a small hitch: One must support
>>> code like this:
>>>
>>> cdef generic get_something(as_str):
>>>     if as_str: return "asdf"
>>>     else: return 3432
>>>
>>> This can be fixed simply by having return type mismatches for
>>> instantiated generics converted into runtime errors rather than
>>> halting compilation; this emulates Python behaviour nicely.)
>>
>> So here "generic" would become the more general of the two, i.e. an
>> object. For generic inline functions, would it get optimized away
>> (i.e. if one knew as_str at compile time, it would know the return
>> type exactly?)
>
> No, this is all wrong.
>
> If having "generic" as the return value simply resulted in the more
> general of the types, I wouldn't bother with it -- after all, the
> programmer knows which types can be returned, and would be able to
> specify object manually!
>
> I'll exemplify using the function above. If you don't like what you
> see, read footnote [1].
>
> Working calling code:
> (1): cdef char* chbuf = get_something(as_str=True) # chbuf = "asdf"
> (2): cdef object s = get_something(as_str=True) # s = str("asdf")
> (3): cdef object o_n = get_something(as_str=False) # o_n = int(3432)
> (4): cdef int i_n = get_something(as_str=False) # i_n = int(3432)
>
> I.e. this creates four different instances of get_something, each one
> with different semantics because of the return type. I.e. (1)
> instantiates
>
> cdef char* get_something(as_str): ....
>
> which of course makes 'return "asdf"' return a string literal pointer.
> (I suppose this will change into an error if that auto-coercion is
> removed :-)). (2) and (3) both use the same instantiation, and their
> code returns object (like your guessed behaviour). (4) turns into
>
> cdef int get_something(as_str): ...
>
> OK, so obviously for (1) and (4) there will be a type mismatch in the
> line of code that's not run. That's where I proposed to change it into
> a run-time error (because "those spots should not be reachable"). I.e.,
> suppose this call is done:
>
> cdef int n = get_something(as_str=True)
>
> This uses instantiation (4) from above (the int return one). I'll now
> write out the proposed body of this function:
>
> cdef int get_something(as_str):
>      if as_str:
>          "asdf" # evaluate and discard expression in the return statement
>          # Then explicitly, and always, raise the coercion error.
>          # The point is: Usually this place is not reached!
>          raise TypeError("Cannot coerce str to C int")
>          # or whatever you have now for <int><object>"a"...
>      else:
>          return 3432

Note: one concern is keeping the size of this body down, especially  
if it's inlined in some tight loop.
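
For reference, the behaviour of the instantiation quoted above can be
emulated in plain Python. This is only a sketch under stated
assumptions: instantiate_get_something is a hypothetical helper, and the
check happens at run time in a closure, whereas in the proposal the
compiler would resolve each branch statically and the mismatched branch
would contain nothing but the raise.

```python
def instantiate_get_something(return_type):
    """Build one 'instantiation' of get_something for a requested return type.

    Hypothetical emulation: the proposal would do this at compile time.
    """
    def coerce(value):
        # An 'object' return type accepts anything; otherwise the value
        # must already have the requested type.
        if return_type is object or isinstance(value, return_type):
            return value
        raise TypeError(
            f"Cannot coerce {type(value).__name__} to {return_type.__name__}")

    def get_something(as_str):
        if as_str:
            return coerce("asdf")
        else:
            return coerce(3432)

    return get_something


get_int = instantiate_get_something(int)     # like instantiation (4)
get_obj = instantiate_get_something(object)  # like instantiations (2), (3)
```

Calling get_int(False) returns the integer, and get_int(True) raises
TypeError only when the mismatched branch is actually reached.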

>
> Instantiation (1) of the function is symmetric to this, raising an
> exception if control reaches the place where the integer is returned.
>
> So the end result is that the "int-return-type" instantiation of
> get_something returns the proper, native C int when called with
> as_str=False, and raises a coercion exception when called with
> as_str=True.
>
> [1] Even if this may seem hard to wrap one's head around, the end of
> the story for the end-user is rather pleasing; one gets more or less
> the same behaviour as if get_something was declared with an "object"
> return type. It should be natural to use. But no object coercion is
> involved for the compiler, so speed is maintained.

Interesting idea, I think Haskell has something like this. It's like  
type coercion going the opposite direction--one wants an int result  
so it changes the expression itself (perhaps after passing through  
several layers? How feasible is this?). I'd rather be a bit more  
explicit (especially for ease of doing type inference for statements  
like "x = arr[5]").

Essentially, what you really want is __getitem__ to return a variety  
of types, determined at compile time, and without coercion through an  
object. For inlined functions perhaps we could have a phase  
automatically optimizing away <type><object>x where there is a direct  
conversion from x to type (if the <object> wasn't explicitly  
requested by the user). No good solution to the generic problem is  
coming to mind right now though...
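
A toy version of such a phase, written in plain Python over a made-up
expression tree. Everything here is an assumption for illustration: the
Var and Cast node types, the DIRECT conversion table and the simplify
function do not correspond to actual Cython compiler internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str
    ctype: str


@dataclass(frozen=True)
class Cast:
    target: str
    operand: object
    explicit: bool = False  # True if the user wrote the cast by hand


# Hypothetical table of direct C-level conversions (source, target).
DIRECT = {("int", "double"), ("int", "long"), ("float", "double")}


def static_type(node):
    return node.target if isinstance(node, Cast) else node.ctype


def simplify(node):
    """Rewrite <T><object>x to <T>x (or x) when the object cast is implicit
    and x converts directly to T."""
    if not isinstance(node, Cast):
        return node
    inner = simplify(node.operand)
    if (isinstance(inner, Cast) and inner.target == "object"
            and not inner.explicit):
        src = static_type(inner.operand)
        if src == node.target:
            return inner.operand  # same type: no cast needed at all
        if (src, node.target) in DIRECT:
            return Cast(node.target, inner.operand, node.explicit)
    return Cast(node.target, inner, node.explicit)


x = Var("x", "int")
implicit = simplify(Cast("double", Cast("object", x)))
explicit = simplify(Cast("double", Cast("object", x, explicit=True)))
```

The implicit round trip through object is dropped, while the cast the
user asked for explicitly is left alone.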

- Robert

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
