[Cython] Indexing and looping over bytes literals

Stefan Behnel Sat, 15 May 2010 02:13:52 -0700

Hi,

the latest cython-devel can infer the type of a for-loop variable when 
iterating over C arrays, C pointers and Python strings. It will infer 
Py_UNICODE for unicode strings, but plain 'object' for a bytes string, 
because indexing or iterating over bytes yields one-character strings in 
Py2 and integers in Py3, so there is no common C type. So the following 
will infer c to be a plain Python object:


     cdef bytes s = b'abcdefg'

     c = s[4]
     for c in s:
         pass

However, this:

     c = b'abcdefg'[4]
     for c in b'abcdefg':
         pass

will infer 'char' for c, as the bytes literal starts off as a char* string. 
The main problem here is that 'char' does not behave like a Python bytes 
object at all. I doubt that iterating over bytes literals is a common use 
case, but I'm not sure about the 'least surprising' thing to do here.
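
To make the mismatch concrete, here is a rough pure-Python sketch (Python 3 
only, not Cython). It assumes that a Python int is a fair stand-in for what a 
C 'char' holds, and that a length-1 slice is a fair stand-in for what Python 2 
indexing returns. The variable names are made up for illustration.

```python
# Python 3 only. An int stands in for a C 'char' here (an assumption made
# purely for illustration; no Cython type inference is involved).
data = b'abcdefg'

as_int = data[4]       # integer value of the byte, like a C 'char'
as_bytes = data[4:5]   # length-1 bytes object, like Py2 indexing returns

print(as_int == ord('e'))    # True
print(as_bytes == b'e')      # True
print(as_int == as_bytes)    # False: an integer never equals a bytes object
```

Code written against the bytes-like result (comparisons with b'e', string 
methods and so on) would silently stop matching if c became an integer.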

Should we special case this to prevent breaking Python-2 semantics, or 
should we expect that users will usually want 'char' as a result anyway?

Both behaviours are easy to get with a simple cast, so this is really only 
a matter of consistency and least surprise. The thing that really bites me 
here is that the bytes type in Py3 *does* return integers on iteration. So 
returning 'char' on indexing and iteration would be both more efficient and 
more future-proof. But it would also be impossible to keep consistent in 
Python-2, as faking it would mean that an untyped bytes object would return 
a substring, whereas a typed one would return an integer. And I don't 
really want to inject a type check branch into each getitem call to 
override that behaviour...

So ISTM that the only way to make this consistent is to follow Python 2 for 
now, including literals, and to accept the different (but also consistent) 
behaviour when running in Python 3.
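
For user code that needs one behaviour regardless of the Python version, 
something along these lines should work in plain Python. This is only a 
sketch: the helper names iter_ints and iter_byte_strings are hypothetical and 
not part of Cython or the standard library.

```python
def iter_ints(s):
    """Yield each byte of s as an integer, on Python 2 and Python 3."""
    # bytearray iterates as integers on both major versions.
    return iter(bytearray(s))


def iter_byte_strings(s):
    """Yield each byte of s as a length-1 bytes object, on Python 2 and 3."""
    # Slicing (unlike indexing) returns a bytes object on both versions.
    return (s[i:i + 1] for i in range(len(s)))


print(list(iter_ints(b'abc')))
print(list(iter_byte_strings(b'abc')))
```

Neither helper depends on what plain indexing or iteration returns, so the 
result is the same under either set of semantics.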

Opinions?

Stefan
