Dag Sverre Seljebotn, 15.05.2010 11:28:
> Stefan Behnel wrote:
>> latest cython-devel can infer the type of a for-loop variable when
>> iterating over C arrays, C pointers and Python strings. It will infer
>> Py_UNICODE for unicode strings, but plain 'object' for a bytes string, as
>> this returns sliced strings in Py2 and integers in Py3, so there is no
>> common C type. So the following will infer c to be a plain Python object:
>>
>>       cdef bytes s = b'abcdefg'
>>
>>       c = s[4]
>>       for c in s:
>>           pass
>>
>> However, this:
>>
>>       c = b'abcdefg'[4]
>>       for c in b'abcdefg':
>>           pass
>>
>> will infer 'char' for c, as the bytes literal starts off as a char* string.
>> The main problem here is that 'char' does not behave like a Python bytes
>> object at all. I doubt that iterating over bytes literals is a common use
>> case, but I'm not sure about the 'least surprising' thing to do here.
>>
>> Should we special case this to prevent breaking Python-2 semantics, or
>> should we expect that users will usually want 'char' as a result anyway?
>>
>> Both behaviours are easy to get with a simple cast, so this is really only
>> a matter of consistency and least surprise. The thing that really bites me
>> here is that the bytes type in Py3 *does* return integers on iteration. So
>> returning 'char' on indexing and iteration would be both more efficient and
>> more future proof. But it would also be impossible to keep consistent in
>> Python-2, as faking it would mean that an untyped bytes object would return
>> a substring, whereas a typed one would return an integer. And I don't
>> really want to inject a type check branch into each getitem call to
>> override that behaviour...
>>
>> So ISTM that the only way to make this consistent is to follow Python 2 for
>> now, including literals, and to accept the different (but also consistent)
>> behaviour when running in Python 3.
>>
> "In the face of ambiguity, refuse the temptation to guess"?

I'm not sure this applies here. We have existing Python semantics for this, 
after all. They just differ between Python 2 and Python 3. This is just a 
case where we can't easily guarantee one specific behaviour as we do not 
control the type's implementation.
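
To make the difference concrete, here is a small plain-Python sketch (not Cython, and not part of the type inference code) that reports what indexing and iteration over a bytes object produce under whichever interpreter runs it. The helper name element_kinds is made up for illustration.

```python
import sys

def element_kinds(data):
    """Return the type names produced by indexing and by iterating over data."""
    # Type of a single indexed element (index 4, as in the example above).
    indexed = type(data[4]).__name__
    # Distinct type names seen while iterating over the whole object.
    iterated = sorted({type(item).__name__ for item in data})
    return indexed, iterated

s = b'abcdefg'
indexed, iterated = element_kinds(s)
# Python 2 reports 'str' (a length-1 string) for both,
# Python 3 reports 'int' for both.
print(sys.version_info[0], indexed, iterated)
```

The same source line therefore has two different, but each internally consistent, meanings depending on the runtime, which is why no single C type fits both.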


> I.e., I'd just disallow it from the language (that is, require a cast),
> because of this issue. I don't see iterating over string literals as
> important enough that one can't require a cast.

Indexing with a dynamically calculated index may be somewhat more
important, though, and a cast makes that a lot uglier. Also, requiring a
cast would prevent us from compiling plain Python code that uses this,
which I guess makes a case for following the Py2 semantics.
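
For comparison, at the Python level either result can be requested explicitly in a way that behaves the same on Python 2 and Python 3. This is only an analogue of the Cython cast discussed above, not the cast itself; the helper names byte_as_bytes and byte_as_int are hypothetical, and the sketch assumes a non-negative index.

```python
def byte_as_bytes(data, index):
    """Return the element at index as a length-1 bytes object (Py2-style result)."""
    # Slicing a bytes object yields bytes on both Python 2 and Python 3.
    return data[index:index + 1]

def byte_as_int(data, index):
    """Return the element at index as an integer (Py3-style result)."""
    # ord() accepts a length-1 bytes object on both Python 2 and Python 3.
    return ord(data[index:index + 1])

s = b'abcdefg'
i = 2 + 2  # a dynamically calculated index
print(byte_as_bytes(s, i), byte_as_int(s, i))
```

Both helpers add visible noise around what would otherwise be a plain s[i], which illustrates the readability cost of requiring an explicit conversion at every indexing site.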

Stefan
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
