Re: [Cython] String types with Python 2.x and 3.x

Stefan Behnel Mon, 14 Sep 2009 12:05:55 -0700

Robert Bradshaw wrote:
> On Sep 13, 2009, at 12:39 PM, Stefan Behnel wrote:
>>>>        cdef str s = "some string"
>>>>        cdef char* cs = s
>>>>
>>> I'm inclined for a warning... and that warning would not be generated
>>> in this case: "cdef char* cs = <bytes>s", right?
>> Sure.
> 
> That could be bad, <bytes>s doesn't actually do a typecheck,  
> especially if the bytes -> char* is eventually optimized. One should  
> do <bytes?>s or <object>s (neither of which generate a warning).


To me, that's just like casting an int to a void*. I don't see a reason to
special-case some casts while we already allow all that dangerous C stuff.
If nothing else, a cast is a clear way to say "I know better!". And if you
actually do not know better, you'll see where that gets you. Not Cython's
problem.
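
The difference between the unchecked <bytes>s cast and the checked <bytes?>s
cast discussed above can be sketched in plain Python. This is only an analogy,
not Cython's generated code: typing.cast() stands in for the unchecked cast,
and checked_cast() is a hypothetical helper standing in for the checked one.

```python
from typing import cast


def checked_cast(expected_type, value):
    """Hypothetical helper: return value unchanged, or raise TypeError on a mismatch."""
    if not isinstance(value, expected_type):
        raise TypeError(
            "expected %s, got %s" % (expected_type.__name__, type(value).__name__)
        )
    return value


text = u"some string"

# Unchecked: typing.cast() returns its argument untouched, whatever its type,
# so the wrong object silently travels on under the wrong label.
unchecked = cast(bytes, text)

# Checked: the mismatch surfaces at the cast instead of at some later use.
try:
    checked_cast(bytes, text)
    mismatch_detected = False
except TypeError:
    mismatch_detected = True
```

The unchecked form is the "I know better!" case: nothing is verified, and any
resulting breakage shows up wherever the value is used next.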


>> changing the argument/return value types from "object" to the  
>> right types will allow Cython to do actual type checking.
> 
> Often the type checking will be redundant with the type checking that  
> happens inside the method, so I'm not so sure this is a good idea.

I meant compile time type checking, which won't hurt performance but helps
in making the C-API safer and also allows Cython to do some optimisations.

For example, I only noticed recently that literal Python strings were
always treated as "object" in Cython. So things like u"".join() were never
associated with the unicode type.


>>>> And "str", "bytes" and "unicode" wouldn't be assignable to each
>>>> other, right? Or would you also leave that to runtime?
>>> "bytes" <-> "unicode" (obviously?) would not be assignable, though for
>>> the case of "bytes" <-> "str" or "str" <-> "unicode", we could
>>> generate similar Cython compile warnings as for the "[unsigned ]char
>>> *" conversions.
>> Yes, I guess that's a similar case.
> 
> I'd be inclined to outright disallow them, favoring requiring a <bytes?>,
> <unicode?> or <object> cast.

Perfectly fine with me.


> Currently, though, I can't think  
> of any reason to type str/bytes/unicode variables at all.

You should take a look at the call optimisations for builtin types. I've
been adding to them for a while now, and they really make a huge difference.

For example, this:

        cdef unicode u = some_unicode_string
        s = u.encode('UTF-8')

will now result in a straight C call to the UTF-8 encoder, instead of
looking up the method, calling it, and having it look up the codec
internally. I find that pretty cool.
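
The two call paths can be illustrated in plain Python. This is a rough sketch
of the idea only, not what Cython emits: codecs.utf_8_encode() is used here as
a stand-in for calling the encoder directly, and the sample string is made up.

```python
import codecs

text = u"gr\u00fc\u00dfe"

# Generic path: look up the "encode" method on the object, call it, and let it
# find the codec by name.
via_method = text.encode("UTF-8")

# Direct path: call the UTF-8 encoder function itself, with no method or codec
# lookup. It returns a (bytes, characters_consumed) pair.
via_codec, consumed = codecs.utf_8_encode(text)
```

Both paths produce the same bytes; the gain from knowing the static type is
only in skipping the lookups.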

Stefan