Hi,
the latest cython-devel can infer the type of a for-loop variable when
iterating over C arrays, C pointers and Python strings. It infers
Py_UNICODE for unicode strings, but plain 'object' for a bytes string,
because indexing or iterating over a bytes object yields length-1 strings
in Py2 and integers in Py3, so there is no common C type. So the following
will infer c to be a plain Python object:

    cdef bytes s = b'abcdefg'
    c = s[4]
    for c in s:
        pass

However, this:

    c = b'abcdefg'[4]
    for c in b'abcdefg':
        pass

will infer 'char' for c, as the bytes literal starts off as a char* string.
The main problem here is that 'char' does not behave like a Python bytes
object at all. I doubt that iterating over bytes literals is a common use
case, but I'm not sure about the 'least surprising' thing to do here.
Should we special-case this to avoid breaking Python 2 semantics, or
should we expect that users will usually want 'char' as a result anyway?
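
To make the difference concrete, here is a minimal sketch in plain Python (not Cython) that contrasts a length-1 bytes object with the integer a C 'char' would hold. The variable names are only illustrative, and the integer is obtained with ord() here as a stand-in for what the inferred 'char' would contain:

```python
# A length-1 bytes object, as Python 2 returns when indexing a byte string.
as_bytes = b'abcdefg'[4:5]

# The integer value of that byte, standing in for a C 'char'.
as_char = ord(as_bytes)

# Concatenation works on bytes, arithmetic works on the integer.
joined = as_bytes + b'f'
shifted = as_char + 1

# The two values never compare equal, although they denote the same byte.
same = (as_char == as_bytes)

print(joined, shifted, same)
```

So code that compares c against b'e' or concatenates it would change meaning once c is inferred as 'char'.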
Both behaviours are easy to get with a simple cast, so this is really only
a matter of consistency and least surprise. The thing that really bites me
here is that the bytes type in Py3 *does* return integers on iteration. So
returning 'char' on indexing and iteration would be both more efficient and
more future-proof. But it would also be impossible to keep consistent in
Python 2, as faking it would mean that an untyped bytes object would return
a substring, whereas a typed one would return an integer. And I don't
really want to inject a type check branch into each getitem call to
override that behaviour...
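
For reference, a small sketch of the semantics in question, written in plain Python and assumed to run under Python 3. It also shows two spellings that I believe give the same result in both major versions: slicing for a substring and bytearray for integers. These are ordinary Python idioms, not something Cython generates:

```python
data = b'abcdefg'

# Python 3: indexing and iterating over bytes both yield integers.
# (Under Python 2 these would be length-1 strings instead.)
item = data[4]
items = list(data)

# Slicing yields a length-1 bytes object in Python 2 and Python 3 alike.
chunks = [data[i:i + 1] for i in range(len(data))]

# Iterating over a bytearray yields integers in Python 2 and Python 3 alike.
codes = list(bytearray(data))

print(item, chunks[4], codes)
```

The slicing form corresponds to the Python 2 behaviour, the bytearray form to the 'char'-like behaviour.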
So ISTM that the only way to make this consistent is to follow Python 2 for
now, including literals, and to accept the different (but also consistent)
behaviour when running in Python 3.
Opinions?
Stefan