On May 24, 2008, at 1:03 PM, Dag Sverre Seljebotn wrote:
> I'll follow up with a sample getitem implementation, so you need not
> follow up this thread until then. But I really wanted to explain
> compile-time duck typing of return types properly (see below).
>
> Robert Bradshaw wrote:
>> On May 24, 2008, at 10:16 AM, Dag Sverre Seljebotn wrote:
>>> In order to solve these problems, one could start to create
>>> complicated solutions like allowing "self.dtype" as a return type in
>>> the method signature if an assumption is placed on self, specifying
>>> "nested types" (i.e. tuple(int, int, int), with a following
>>> combinatorial explosion in the manual overloads one needs to create),
>>> etc.
>>
>> Ah, but in this case it seems much simpler (from the user's
>> perspective) to resolve arr[1,2,3] as __getitem__(1,2,3) and let
>> function overloading handle this the normal way. BTW, in terms of
>> focus, I think handling slicing is much lower on the priority list
>> than a lot of other things (the relative gain here is much smaller).
>
> But this makes our __getitem__ different from Python's! If we do that,
> we should rather make up a wholly different, new syntax (__cgetitem__,
> __cgetslice__, and so on); but I do not like to take this direction.

We're going to have to avoid the tuple packing/unpacking somehow if
we're going for speed, so it makes sense in this case to not pack them
at all rather than have special unpacking code on the other end (and
I think it looks cleaner too).

> It's OK to not do any optimization for slicing, but it's very important
> that slices correctly fall back to the Python [] operator. As long as
> the Python __getitem__ interface is kept, I must fall back to the []
> operator manually (and also take tuples for n-d indices).

Yep.

> (Also I find the prospect of manually creating multiple overloads
> depending on the number of dimensions somewhat distasteful. Of course,
> I'll do it if there's not enough time, but I'd like to at least have a
> path forward that *can* lead there eventually, and *then* hack it.)

True. A generic n-ary unpacker that gets unrolled completely at
compile time may be much more complicated to implement (and read).
IMHO, not as important as making 1, 2, and 3-dimensional indexing as
fast as possible (possibly falling back to a runtime loop for more).
Not as powerful, but more realistic to actually get done (especially
given all the other things you're planning to do).
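
As a rough illustration of the fast-path idea, here is a plain Python
sketch (not Cython, and not anything that exists in the compiler). The
Grid class, its row-major layout, and its attribute names are all made
up for this example. It hand-unrolls the 1-, 2- and 3-index cases and
falls back to a runtime loop for anything larger. Slices and bounds
checking are deliberately left out.

```python
class Grid:
    """Hypothetical flat, row-major n-d array (illustration only)."""

    def __init__(self, shape, data):
        self.shape = tuple(shape)
        self.data = list(data)
        # Row-major strides, counted in elements.
        strides = []
        step = 1
        for extent in reversed(self.shape):
            strides.append(step)
            step *= extent
        self.strides = tuple(reversed(strides))

    def __getitem__(self, index):
        # Python hands arr[1, 2, 3] to __getitem__ as one tuple.
        if not isinstance(index, tuple):
            index = (index,)
        n = len(index)
        if n != len(self.shape):
            raise IndexError("wrong number of indices")
        s = self.strides
        # Hand-unrolled fast paths for the common dimensionalities.
        if n == 1:
            return self.data[index[0] * s[0]]
        if n == 2:
            return self.data[index[0] * s[0] + index[1] * s[1]]
        if n == 3:
            return self.data[index[0] * s[0] + index[1] * s[1]
                             + index[2] * s[2]]
        # Generic fallback: a runtime loop over the remaining cases.
        offset = 0
        for i, stride in zip(index, s):
            offset += i * stride
        return self.data[offset]


g2 = Grid((2, 3), range(6))
g4 = Grid((2, 2, 2, 2), range(16))
```

Here g2[1, 2] takes the unrolled 2-d branch, while g4[1, 0, 1, 1] goes
through the loop. In a compiled setting the branch on n could in
principle be resolved at compile time, which is the part a generic
unroller would have to automate.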
>>> (In order for this to work there's a small hitch: One must support
>>> code like this:
>>>
>>> cdef generic get_something(as_str):
>>>     if as_str: return "asdf"
>>>     else: return 3432
>>>
>>> This can be fixed simply by having return type mismatches for
>>> instantiated generics converted into runtime errors rather than
>>> halting compilation; this emulates Python behaviour nicely.)
>>
>> So here "generic" would become the more general of the two, i.e. an
>> object. For generic inline functions, would it get optimized away
>> (i.e. if one knew as_str at compile time, it would know the return
>> type exactly?)
>
> No, this is all wrong.
>
> If having "generic" as the return value simply resulted in the more
> general of the types, I wouldn't bother with it -- after all, the
> programmer knows which types can be returned, and would be able to
> specify object manually!
>
> I'll exemplify using the function above. If you don't like what you
> see, read footnote [1].
>
> Working calling code:
> (1): cdef char* chbuf = get_something(as_str=True) # chbuf = "asdf"
> (2): cdef object s = get_something(as_str=True) # s = str("asdf")
> (3): cdef object o_n = get_something(as_str=False) # o_n = int(3432)
> (4): cdef int i_n = get_something(as_str=False) # i_n = int(3432)
>
> I.e. this creates four different instances of get_something, each one
> with different semantics because of the return type. I.e. (1) instantiates
>
> cdef char* get_something(as_str): ....
>
> which of course makes 'return "asdf"' return a string literal pointer.
> (I suppose this will change into an error if that auto-coercion is
> removed :-)). (2) and (3) both use the same instantiation, and their
> code returns object (like your guessed behaviour). (4) turns into
>
> cdef int get_something(as_str): ...
>
> OK, so obviously for (1) and (4) there will be a type mismatch in the
> line of code that's not run. That's where I proposed to change it into
> a run-time error (because "those spots should not be reachable"). I.e.,
> suppose this call is done:
>
> cdef int n = get_something(as_str=True)
>
> This uses instantiation (4) from above (the int return one). I'll now
> write out the proposed body of this function:
>
> cdef int get_something(as_str):
>     if as_str:
>         "asdf"  # evaluate and discard expression in the return statement
>         # Then explicitly, and always, raise the coercion error.
>         # The point is: Usually this place is not reached!
>         raise TypeError("Cannot coerce str to C int")
>         # or whatever you have now for <int><object>"a"...
>     else:
>         return 3432

Note: one concern is keeping the size of this body down, especially
if it's inlined in some tight loop.

> Instantiation (1) of the function is symmetric to this, raising an
> exception if control reaches the place where the integer is returned.
>
> So the end result is that the "int-return-type" instantiation of
> get_something returns the proper, native C int when called with
> as_str=False, and raises a coercion exception when called with
> as_str=True.
>
> [1] Even if this may seem hard to wrap one's head around, the end of
> the story for the end-user is rather pleasing; one gets more or less
> the same behaviour as if get_something was declared with an "object"
> return type. It should be natural to use. But no object coercion is
> involved for the compiler, so speed is maintained.

Interesting idea, I think Haskell has something like this. It's like
type coercion going the opposite direction--one wants an int result
so it changes the expression itself (perhaps after passing through
several layers? How feasible is this?). I'd rather be a bit more
explicit (especially for ease of doing type inference for statements
like "x = arr[5]").

Essentially, what you really want is __getitem__ to return a variety
of types, determined at compile time, and without coercion through an
object. For inlined functions perhaps we could have a phase
automatically optimizing away <type><object>x where there is a direct
conversion from x to type (if the <object> wasn't explicitly
requested by the user). No good solution to the generic problem is
coming to mind right now though...
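
To pin down the semantics being discussed, here is a small runtime
emulation in plain Python. It is only a sketch: the instantiate helper
is hypothetical, and in the proposal the specialisation would happen
per call site at compile time with no object in between, which Python
cannot show. What it does show is the observable behaviour: the "int"
instantiation returns the int on the matching branch and raises a
coercion error on the other one.

```python
def get_something(as_str):
    # Stand-in for the "cdef generic" body quoted above.
    if as_str:
        return "asdf"
    else:
        return 3432


def instantiate(func, return_type):
    """Hypothetical helper: emulate one instantiation of a generic.

    The requested return type is fixed up front; a branch that yields
    a different type turns into a runtime TypeError instead of a
    compile-time failure.
    """
    def specialised(*args, **kwargs):
        value = func(*args, **kwargs)
        if return_type is object:
            # The "object" instantiation accepts whatever comes back.
            return value
        if type(value) is not return_type:
            raise TypeError(
                "Cannot coerce %s to %s"
                % (type(value).__name__, return_type.__name__)
            )
        return value
    return specialised


get_int = instantiate(get_something, int)     # like call site (4)
get_obj = instantiate(get_something, object)  # like call sites (2), (3)
```

With these, get_int(as_str=False) gives 3432, get_int(as_str=True)
raises TypeError, and get_obj accepts either branch.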
- Robert