On May 15, 2010, at 3:38 AM, Stefan Behnel wrote:

> Dag Sverre Seljebotn, 15.05.2010 11:28:
>> Stefan Behnel wrote:
>>> latest cython-devel can infer the type of a for-loop variable when
>>> iterating over C arrays, C pointers and Python strings. It will infer
>>> Py_UNICODE for unicode strings, but plain 'object' for a bytes string, as
>>> this returns sliced strings in Py2 and integers in Py3, so there is no
>>> common C type. So the following will infer c to be a plain Python object:
>>>
>>>      cdef bytes s = b'abcdefg'
>>>
>>>      c = s[4]
>>>      for c in s:
>>>          pass
>>>
>>> However, this:
>>>
>>>      c = b'abcdefg'[4]
>>>      for c in b'abcdefg':
>>>          pass
>>>
>>> will infer 'char' for c, as the bytes literal starts off as a char* string.
>>> The main problem here is that 'char' does not behave like a Python bytes
>>> object at all. I doubt that iterating over bytes literals is a common use
>>> case, but I'm not sure about the 'least surprising' thing to do here.
>>>
>>> Should we special case this to prevent breaking Python-2 semantics, or
>>> should we expect that users will usually want 'char' as a result anyway?
>>>
>>> Both behaviours are easy to get with a simple cast, so this is really only
>>> a matter of consistency and least surprise. The thing that really bites me
>>> here is that the bytes type in Py3 *does* return integers on iteration. So
>>> returning 'char' on indexing and iteration would be both more efficient and
>>> more future proof. But it would also be impossible to keep consistent in
>>> Python-2, as faking it would mean that an untyped bytes object would return
>>> a substring, whereas a typed one would return an integer. And I don't
>>> really want to inject a type check branch into each getitem call to
>>> override that behaviour...
>>>
>>> So ISTM that the only way to make this consistent is to follow Python 2 for
>>> now, including literals, and to accept the different (but also consistent)
>>> behaviour when running in Python 3.
>>>
>> "In the face of ambiguity, refuse the temptation to guess"?
>
> I'm not sure this applies here. We have existing Python semantics for this,
> after all. They just differ between Python 2 and Python 3. This is just a
> case where we can't easily guarantee one specific behaviour as we do not
> control the type's implementation.

Sounds like a perfect example of where the -3 flag should be used.

>> I.e., I'd just disallow it from the language (that is, require a cast),
>> because of this issue. I don't see iterating over string literals as
>> important enough that one can't require a cast.
>
> Indexing based on a dynamically calculated index may be somewhat more
> important though, and a cast makes that a lot more ugly. Also, requiring a
> cast would prevent us from compiling Python code that uses this (I guess
> that makes a case for following the Py2 semantics).

+1

- Robert
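
For reference, a minimal sketch of the Python 3 bytes semantics the thread
refers to, written in plain Python rather than Cython. The variable names are
illustrative only and do not come from the thread. Under Python 2, the same
indexing and iteration would produce one-character strings instead of
integers, which is the inconsistency being discussed.

```python
# Python 3 behaviour of the bytes type (run with a Python 3 interpreter).
s = b'abcdefg'

# Indexing returns an integer (the byte value), not a 1-byte string.
item = s[4]

# Iteration returns integers as well.
items = [c for c in s]

# A one-element slice returns a bytes object on both Python 2 and
# Python 3, so it behaves the same regardless of version.
sub = s[4:5]

print(item)       # 101, the byte value of 'e'
print(items[:3])  # [97, 98, 99]
print(sub)        # b'e'
```

Slicing rather than indexing is one way to get a substring result that does
not depend on the Python version, at the cost of creating a new bytes object.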

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
